* [PATCH 0/2] xen/arm: Convert runstate address during hypcall @ 2020-06-11 11:58 Bertrand Marquis 2020-06-11 11:58 ` [PATCH 1/2] " Bertrand Marquis 2020-06-11 11:58 ` [PATCH 2/2] xen/arm: Support runstate crossing pages Bertrand Marquis 0 siblings, 2 replies; 26+ messages in thread From: Bertrand Marquis @ 2020-06-11 11:58 UTC (permalink / raw) To: xen-devel Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, nd, Volodymyr Babchuk, Roger Pau Monné This series introduces changes to convert the runstate address (a virtual address provided by the guest) during the hypercall on Arm. It keeps the current strategy on x86 but moves some of the common code inside the architecture code. The series is divided into 2 patches to allow an easier review of the specific code handling the case where the guest area actually crosses 2 pages. Bertrand Marquis (2): xen/arm: Convert runstate address during hypcall xen/arm: Support runstate crossing pages xen/arch/arm/domain.c | 154 ++++++++++++++++++++++++++++++----- xen/arch/x86/domain.c | 30 ++++++- xen/arch/x86/x86_64/domain.c | 4 +- xen/common/domain.c | 19 ++--- xen/include/asm-arm/domain.h | 9 ++ xen/include/asm-x86/domain.h | 16 ++++ xen/include/xen/domain.h | 4 + xen/include/xen/sched.h | 16 +--- 8 files changed, 199 insertions(+), 53 deletions(-) -- 2.17.1 ^ permalink raw reply [flat|nested] 26+ messages in thread
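The "crossing pages" case the cover letter mentions comes down to simple page arithmetic: a runstate area registered at a guest virtual address fits in a single page only if its offset within that page leaves room for the whole structure. A minimal sketch of that check — the `PAGE_*` constants and the trimmed-down structure below are illustrative stand-ins, not Xen's actual definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)   /* 4 KiB pages, as with a 4K granule */
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Return 1 if an object of `size` bytes at guest virtual address `va`
 * would span two pages, 0 if it fits entirely within one page. */
static int crosses_page(uintptr_t va, size_t size)
{
    uintptr_t offset = va & ~PAGE_MASK;  /* byte offset within the page */
    return offset > PAGE_SIZE - size;
}
```

Patch 1/2 rejects the crossing case with -EINVAL; patch 2/2 is what adds support for it.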
* [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-11 11:58 [PATCH 0/2] xen/arm: Convert runstate address during hypcall Bertrand Marquis @ 2020-06-11 11:58 ` Bertrand Marquis 2020-06-11 18:16 ` Stefano Stabellini 2020-06-12 10:53 ` Julien Grall 2020-06-11 11:58 ` [PATCH 2/2] xen/arm: Support runstate crossing pages Bertrand Marquis 1 sibling, 2 replies; 26+ messages in thread From: Bertrand Marquis @ 2020-06-11 11:58 UTC (permalink / raw) To: xen-devel Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, nd, Volodymyr Babchuk, Roger Pau Monné At the moment on Arm, a Linux guest running with KPTI enabled will cause the following error when a context switch happens in user mode: (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0 This patch modifies the register runstate area handling on Arm to convert the guest address during the hypercall. During runtime context switches the area is mapped to update the guest runstate copy. The patch also removes the initial copy which was done during the hypercall handling, as this is done on the current core when the context is restored to go back to guest execution on Arm. As a consequence, if the register runstate hypercall is called on one vcpu for another vcpu, the area will not be updated until that vcpu is executed (on Arm only). On x86, the behaviour is not modified: the address conversion is done during the context switch and the area is updated fully during the hypercall. Inline functions in headers could not be used, as the architecture domain.h is included before the global domain.h, making it impossible to use struct vcpu inside the architecture header. This should not have any performance impact, as the hypercall is usually only called once per vcpu. 
Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> --- xen/arch/arm/domain.c | 109 +++++++++++++++++++++++++++++------ xen/arch/x86/domain.c | 30 +++++++++- xen/arch/x86/x86_64/domain.c | 4 +- xen/common/domain.c | 19 ++---- xen/include/asm-arm/domain.h | 8 +++ xen/include/asm-x86/domain.h | 16 +++++ xen/include/xen/domain.h | 4 ++ xen/include/xen/sched.h | 16 +---- 8 files changed, 153 insertions(+), 53 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 31169326b2..739059234f 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -19,6 +19,7 @@ #include <xen/sched.h> #include <xen/softirq.h> #include <xen/wait.h> +#include <xen/domain_page.h> #include <asm/alternative.h> #include <asm/cpuerrata.h> @@ -275,36 +276,104 @@ static void ctxt_switch_to(struct vcpu *n) virt_timer_restore(n); } -/* Update per-VCPU guest runstate shared memory area (if registered). */ -static void update_runstate_area(struct vcpu *v) +void arch_cleanup_runstate_guest(struct vcpu *v) { - void __user *guest_handle = NULL; - struct vcpu_runstate_info runstate; + spin_lock(&v->arch.runstate_guest.lock); - if ( guest_handle_is_null(runstate_guest(v)) ) - return; + /* cleanup previous page if any */ + if ( v->arch.runstate_guest.page ) + { + put_page_and_type(v->arch.runstate_guest.page); + v->arch.runstate_guest.page = NULL; + v->arch.runstate_guest.offset = 0; + } - memcpy(&runstate, &v->runstate, sizeof(runstate)); + spin_unlock(&v->arch.runstate_guest.lock); +} - if ( VM_ASSIST(v->domain, runstate_update_flag) ) +int arch_setup_runstate_guest(struct vcpu *v, + struct vcpu_register_runstate_memory_area area) +{ + struct page_info *page; + unsigned offset; + + spin_lock(&v->arch.runstate_guest.lock); + + /* cleanup previous page if any */ + if ( v->arch.runstate_guest.page ) { - guest_handle = &v->runstate_guest.p->state_entry_time + 1; - guest_handle--; - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; - __raw_copy_to_guest(guest_handle, - (void 
*)(&runstate.state_entry_time + 1) - 1, 1); - smp_wmb(); + put_page_and_type(v->arch.runstate_guest.page); + v->arch.runstate_guest.page = NULL; + v->arch.runstate_guest.offset = 0; + } + + offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) + { + spin_unlock(&v->arch.runstate_guest.lock); + gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); + return -EINVAL; + } + + /* provided address must be aligned to a 64bit */ + if ( offset % alignof(struct vcpu_runstate_info) ) + { + spin_unlock(&v->arch.runstate_guest.lock); + gprintk(XENLOG_DEBUG, "Runstate pointer is not aligned\n"); + return -EINVAL; + } + + page = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); + if ( !page ) + { + spin_unlock(&v->arch.runstate_guest.lock); + gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); + return -EINVAL; } - __copy_to_guest(runstate_guest(v), &runstate, 1); + v->arch.runstate_guest.page = page; + v->arch.runstate_guest.offset = offset; + + spin_unlock(&v->arch.runstate_guest.lock); + + return 0; +} + + +/* Update per-VCPU guest runstate shared memory area (if registered). 
*/ +static void update_runstate_area(struct vcpu *v) +{ + struct vcpu_runstate_info *guest_runstate; + void *p; + + spin_lock(&v->arch.runstate_guest.lock); - if ( guest_handle ) + if ( v->arch.runstate_guest.page ) { - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; + p = __map_domain_page(v->arch.runstate_guest.page); + guest_runstate = p + v->arch.runstate_guest.offset; + + if ( VM_ASSIST(v->domain, runstate_update_flag) ) + { + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; + smp_wmb(); + } + + memcpy(guest_runstate, &v->runstate, sizeof(v->runstate)); smp_wmb(); - __raw_copy_to_guest(guest_handle, - (void *)(&runstate.state_entry_time + 1) - 1, 1); + + if ( VM_ASSIST(v->domain, runstate_update_flag) ) + { + v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; + guest_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; + smp_wmb(); + } + + unmap_domain_page(p); } + + spin_unlock(&v->arch.runstate_guest.lock); } static void schedule_tail(struct vcpu *prev) @@ -560,6 +629,8 @@ int arch_vcpu_create(struct vcpu *v) v->arch.saved_context.sp = (register_t)v->arch.cpu_info; v->arch.saved_context.pc = (register_t)continue_new_vcpu; + spin_lock_init(&v->arch.runstate_guest.lock); + /* Idle VCPUs don't need the rest of this setup */ if ( is_idle_vcpu(v) ) return rc; diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index fee6c3931a..32bbed87d5 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1642,6 +1642,30 @@ void paravirt_ctxt_switch_to(struct vcpu *v) wrmsr_tsc_aux(v->arch.msrs->tsc_aux); } +int arch_setup_runstate_guest(struct vcpu *v, + struct vcpu_register_runstate_memory_area area) +{ + struct vcpu_runstate_info runstate; + + runstate_guest(v) = area.addr.h; + + if ( v == current ) + { + __copy_to_guest(runstate_guest(v), &v->runstate, 1); + } + else + { + vcpu_runstate_get(v, &runstate); + __copy_to_guest(runstate_guest(v), &runstate, 1); + } + return 0; +} + +void 
arch_cleanup_runstate_guest(struct vcpu *v) +{ + set_xen_guest_handle(runstate_guest(v), NULL); +} + /* Update per-VCPU guest runstate shared memory area (if registered). */ bool update_runstate_area(struct vcpu *v) { @@ -1660,8 +1684,8 @@ bool update_runstate_area(struct vcpu *v) if ( VM_ASSIST(v->domain, runstate_update_flag) ) { guest_handle = has_32bit_shinfo(v->domain) - ? &v->runstate_guest.compat.p->state_entry_time + 1 - : &v->runstate_guest.native.p->state_entry_time + 1; + ? &v->arch.runstate_guest.compat.p->state_entry_time + 1 + : &v->arch.runstate_guest.native.p->state_entry_time + 1; guest_handle--; runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; __raw_copy_to_guest(guest_handle, @@ -1674,7 +1698,7 @@ bool update_runstate_area(struct vcpu *v) struct compat_vcpu_runstate_info info; XLAT_vcpu_runstate_info(&info, &runstate); - __copy_to_guest(v->runstate_guest.compat, &info, 1); + __copy_to_guest(v->arch.runstate_guest.compat, &info, 1); rc = true; } else diff --git a/xen/arch/x86/x86_64/domain.c b/xen/arch/x86/x86_64/domain.c index c46dccc25a..b879e8dd2c 100644 --- a/xen/arch/x86/x86_64/domain.c +++ b/xen/arch/x86/x86_64/domain.c @@ -36,7 +36,7 @@ arch_compat_vcpu_op( break; rc = 0; - guest_from_compat_handle(v->runstate_guest.compat, area.addr.h); + guest_from_compat_handle(v->arch.runstate_guest.compat, area.addr.h); if ( v == current ) { @@ -49,7 +49,7 @@ arch_compat_vcpu_op( vcpu_runstate_get(v, &runstate); XLAT_vcpu_runstate_info(&info, &runstate); } - __copy_to_guest(v->runstate_guest.compat, &info, 1); + __copy_to_guest(v->arch.runstate_guest.compat, &info, 1); break; } diff --git a/xen/common/domain.c b/xen/common/domain.c index 7cc9526139..0ca6bf4dbc 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -727,7 +727,10 @@ int domain_kill(struct domain *d) if ( cpupool_move_domain(d, cpupool0) ) return -ERESTART; for_each_vcpu ( d, v ) + { + arch_cleanup_runstate_guest(v); unmap_vcpu_info(v); + } d->is_dying = DOMDYING_dead; /* Mem 
event cleanup has to go here because the rings * have to be put before we call put_domain. */ @@ -1167,7 +1170,7 @@ int domain_soft_reset(struct domain *d) for_each_vcpu ( d, v ) { - set_xen_guest_handle(runstate_guest(v), NULL); + arch_cleanup_runstate_guest(v); unmap_vcpu_info(v); } @@ -1484,7 +1487,6 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg) case VCPUOP_register_runstate_memory_area: { struct vcpu_register_runstate_memory_area area; - struct vcpu_runstate_info runstate; rc = -EFAULT; if ( copy_from_guest(&area, arg, 1) ) @@ -1493,18 +1495,7 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg) if ( !guest_handle_okay(area.addr.h, 1) ) break; - rc = 0; - runstate_guest(v) = area.addr.h; - - if ( v == current ) - { - __copy_to_guest(runstate_guest(v), &v->runstate, 1); - } - else - { - vcpu_runstate_get(v, &runstate); - __copy_to_guest(runstate_guest(v), &runstate, 1); - } + rc = arch_setup_runstate_guest(v, area); break; } diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 4e2f582006..3a7f53e13d 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -11,6 +11,7 @@ #include <asm/vgic.h> #include <asm/vpl011.h> #include <public/hvm/params.h> +#include <public/vcpu.h> #include <xen/serial.h> #include <xen/rbtree.h> @@ -206,6 +207,13 @@ struct arch_vcpu */ bool need_flush_to_ram; + /* runstate guest info */ + struct { + struct page_info *page; /* guest page */ + unsigned offset; /* offset in page */ + spinlock_t lock; /* lock to access page */ + } runstate_guest; + } __cacheline_aligned; void vcpu_show_execution_state(struct vcpu *); diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index e8cee3d5e5..f917b48cb8 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -11,6 +11,11 @@ #include <asm/x86_emulate.h> #include <public/vcpu.h> #include <public/hvm/hvm_info_table.h> +#ifdef CONFIG_COMPAT 
+#include <compat/vcpu.h> +DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t); +#endif + #define has_32bit_shinfo(d) ((d)->arch.has_32bit_shinfo) @@ -626,6 +631,17 @@ struct arch_vcpu struct { bool next_interrupt_enabled; } monitor; + +#ifndef CONFIG_COMPAT +# define runstate_guest(v) ((v)->arch.runstate_guest) + XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */ +#else +# define runstate_guest(v) ((v)->arch.runstate_guest.native) + union { + XEN_GUEST_HANDLE(vcpu_runstate_info_t) native; + XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat; + } runstate_guest; +#endif }; struct guest_memory_policy diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h index 7e51d361de..d5d73ce997 100644 --- a/xen/include/xen/domain.h +++ b/xen/include/xen/domain.h @@ -63,6 +63,10 @@ void arch_vcpu_destroy(struct vcpu *v); int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset); void unmap_vcpu_info(struct vcpu *v); +int arch_setup_runstate_guest(struct vcpu *v, + struct vcpu_register_runstate_memory_area area); +void arch_cleanup_runstate_guest(struct vcpu *v); + int arch_domain_create(struct domain *d, struct xen_domctl_createdomain *config); diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index ac53519d7f..fac030fb83 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -29,11 +29,6 @@ #include <public/vcpu.h> #include <public/event_channel.h> -#ifdef CONFIG_COMPAT -#include <compat/vcpu.h> -DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t); -#endif - /* * Stats * @@ -166,16 +161,7 @@ struct vcpu struct sched_unit *sched_unit; struct vcpu_runstate_info runstate; -#ifndef CONFIG_COMPAT -# define runstate_guest(v) ((v)->runstate_guest) - XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */ -#else -# define runstate_guest(v) ((v)->runstate_guest.native) - union { - XEN_GUEST_HANDLE(vcpu_runstate_info_t) native; - XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat; - } 
runstate_guest; /* guest address */ -#endif + unsigned int new_state; /* Has the FPU been initialised? */ -- 2.17.1 ^ permalink raw reply related [flat|nested] 26+ messages in thread
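The heart of the Arm-side rework above is the update sequence in the new `update_runstate_area()`: raise `XEN_RUNSTATE_UPDATE` in both Xen's copy and the guest-visible copy, copy the whole record (the source still carries the flag, so the guest copy keeps it through the `memcpy`), then lower it. A single-threaded sketch of that sequence with a trimmed-down stand-in structure (not the real `struct vcpu_runstate_info`), with the `smp_wmb()` barriers of the real code reduced to comments:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RUNSTATE_UPDATE (1ULL << 63)   /* stand-in for XEN_RUNSTATE_UPDATE */

/* Trimmed-down stand-in for struct vcpu_runstate_info. */
struct runstate {
    uint32_t state;
    uint64_t state_entry_time;   /* top bit doubles as the "in flux" flag */
};

/* Mirror of the update sequence in the patched update_runstate_area(). */
static void publish_runstate(struct runstate *guest, struct runstate *xen)
{
    xen->state_entry_time   |= RUNSTATE_UPDATE;
    guest->state_entry_time |= RUNSTATE_UPDATE;
    /* smp_wmb(); -- the flag must be visible before the body changes */

    memcpy(guest, xen, sizeof(*xen));  /* flag stays set: the source has it */
    /* smp_wmb(); -- the new body must be visible before the flag clears */

    xen->state_entry_time   &= ~RUNSTATE_UPDATE;
    guest->state_entry_time &= ~RUNSTATE_UPDATE;
    /* smp_wmb(); */
}
```

A guest reader that observes the flag set knows the record is mid-rewrite and should retry; that is what the `runstate_update_flag` VM assist enables.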
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-11 11:58 ` [PATCH 1/2] " Bertrand Marquis @ 2020-06-11 18:16 ` Stefano Stabellini 2020-06-11 18:24 ` Julien Grall 2020-06-12 8:07 ` Bertrand Marquis 2020-06-12 10:53 ` Julien Grall 1 sibling, 2 replies; 26+ messages in thread From: Stefano Stabellini @ 2020-06-11 18:16 UTC (permalink / raw) To: Bertrand Marquis Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné On Thu, 11 Jun 2020, Bertrand Marquis wrote: > At the moment on Arm, a Linux guest running with KTPI enabled will > cause the following error when a context switch happens in user mode: > (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0 > > This patch is modifying the register runstate area handling on arm to > convert the guest address during the hypercall. During runtime context > switches the area is mapped to update the guest runstate copy. > The patch is also removing the initial copy which was done during the > hypercall handling as this is done on the current core when the context > is restore to go back to guest execution on arm. > > As a consequence if the register runstate hypercall is called on one > vcpu for an other vcpu, the area will not be updated until the vcpu > will be executed (on arm only). > > On x86, the behaviour is not modified, the address conversion is done > during the context switch and the area is updated fully during the > hypercall. > inline functions in headers could not be used as the architecture > domain.h is included before the global domain.h making it impossible > to use the struct vcpu inside the architecture header. > This should not have any performance impact as the hypercall is only > called once per vcpu usually. 
> > Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> > --- > xen/arch/arm/domain.c | 109 +++++++++++++++++++++++++++++------ > xen/arch/x86/domain.c | 30 +++++++++- > xen/arch/x86/x86_64/domain.c | 4 +- > xen/common/domain.c | 19 ++---- > xen/include/asm-arm/domain.h | 8 +++ > xen/include/asm-x86/domain.h | 16 +++++ > xen/include/xen/domain.h | 4 ++ > xen/include/xen/sched.h | 16 +---- > 8 files changed, 153 insertions(+), 53 deletions(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index 31169326b2..739059234f 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -19,6 +19,7 @@ > #include <xen/sched.h> > #include <xen/softirq.h> > #include <xen/wait.h> > +#include <xen/domain_page.h> > > #include <asm/alternative.h> > #include <asm/cpuerrata.h> > @@ -275,36 +276,104 @@ static void ctxt_switch_to(struct vcpu *n) > virt_timer_restore(n); > } > > -/* Update per-VCPU guest runstate shared memory area (if registered). */ > -static void update_runstate_area(struct vcpu *v) > +void arch_cleanup_runstate_guest(struct vcpu *v) > { > - void __user *guest_handle = NULL; > - struct vcpu_runstate_info runstate; > + spin_lock(&v->arch.runstate_guest.lock); > > - if ( guest_handle_is_null(runstate_guest(v)) ) > - return; > + /* cleanup previous page if any */ > + if ( v->arch.runstate_guest.page ) > + { > + put_page_and_type(v->arch.runstate_guest.page); > + v->arch.runstate_guest.page = NULL; > + v->arch.runstate_guest.offset = 0; > + } > > - memcpy(&runstate, &v->runstate, sizeof(runstate)); > + spin_unlock(&v->arch.runstate_guest.lock); > +} > > - if ( VM_ASSIST(v->domain, runstate_update_flag) ) > +int arch_setup_runstate_guest(struct vcpu *v, > + struct vcpu_register_runstate_memory_area area) > +{ > + struct page_info *page; > + unsigned offset; > + > + spin_lock(&v->arch.runstate_guest.lock); > + > + /* cleanup previous page if any */ > + if ( v->arch.runstate_guest.page ) > { > - guest_handle = 
&v->runstate_guest.p->state_entry_time + 1; > - guest_handle--; > - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > - __raw_copy_to_guest(guest_handle, > - (void *)(&runstate.state_entry_time + 1) - 1, 1); > - smp_wmb(); > + put_page_and_type(v->arch.runstate_guest.page); > + v->arch.runstate_guest.page = NULL; > + v->arch.runstate_guest.offset = 0; > + } > + > + offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; > + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) > + { > + spin_unlock(&v->arch.runstate_guest.lock); > + gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); > + return -EINVAL; > + } > + > + /* provided address must be aligned to a 64bit */ > + if ( offset % alignof(struct vcpu_runstate_info) ) > + { > + spin_unlock(&v->arch.runstate_guest.lock); > + gprintk(XENLOG_DEBUG, "Runstate pointer is not aligned\n"); > + return -EINVAL; > + } > + > + page = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); > + if ( !page ) > + { > + spin_unlock(&v->arch.runstate_guest.lock); > + gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); I would make all these XENLOG_WARNING or XENLOG_ERR, they are errors after all. This last one I'd say "Could not map runstate pointer" and also print the address. > + return -EINVAL; > + } > > - __copy_to_guest(runstate_guest(v), &runstate, 1); > + v->arch.runstate_guest.page = page; > + v->arch.runstate_guest.offset = offset; > + > + spin_unlock(&v->arch.runstate_guest.lock); > + > + return 0; > +} > + > + > +/* Update per-VCPU guest runstate shared memory area (if registered). 
*/ > +static void update_runstate_area(struct vcpu *v) > +{ > + struct vcpu_runstate_info *guest_runstate; > + void *p; > + > + spin_lock(&v->arch.runstate_guest.lock); > > - if ( guest_handle ) > + if ( v->arch.runstate_guest.page ) > { > - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > + p = __map_domain_page(v->arch.runstate_guest.page); > + guest_runstate = p + v->arch.runstate_guest.offset; > + > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > + { > + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; I think that this write to guest_runstate should use write_atomic or another atomic write operation. > + smp_wmb(); > + } > + > + memcpy(guest_runstate, &v->runstate, sizeof(v->runstate)); > smp_wmb(); > - __raw_copy_to_guest(guest_handle, > - (void *)(&runstate.state_entry_time + 1) - 1, 1); > + > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > + { > + v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > + guest_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; same here > + smp_wmb(); > + } > + > + unmap_domain_page(p); > } > + > + spin_unlock(&v->arch.runstate_guest.lock); > } > > static void schedule_tail(struct vcpu *prev) > @@ -560,6 +629,8 @@ int arch_vcpu_create(struct vcpu *v) > v->arch.saved_context.sp = (register_t)v->arch.cpu_info; > v->arch.saved_context.pc = (register_t)continue_new_vcpu; > > + spin_lock_init(&v->arch.runstate_guest.lock); > + > /* Idle VCPUs don't need the rest of this setup */ > if ( is_idle_vcpu(v) ) > return rc; > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c > index fee6c3931a..32bbed87d5 100644 > --- a/xen/arch/x86/domain.c > +++ b/xen/arch/x86/domain.c > @@ -1642,6 +1642,30 @@ void paravirt_ctxt_switch_to(struct vcpu *v) > wrmsr_tsc_aux(v->arch.msrs->tsc_aux); > } > > +int arch_setup_runstate_guest(struct vcpu *v, > + struct vcpu_register_runstate_memory_area area) > +{ > + struct vcpu_runstate_info runstate; > + > + 
runstate_guest(v) = area.addr.h; > + > + if ( v == current ) > + { > + __copy_to_guest(runstate_guest(v), &v->runstate, 1); > + } > + else > + { > + vcpu_runstate_get(v, &runstate); > + __copy_to_guest(runstate_guest(v), &runstate, 1); > + } > + return 0; > +} > + > +void arch_cleanup_runstate_guest(struct vcpu *v) > +{ > + set_xen_guest_handle(runstate_guest(v), NULL); > +} > + > /* Update per-VCPU guest runstate shared memory area (if registered). */ > bool update_runstate_area(struct vcpu *v) > { > @@ -1660,8 +1684,8 @@ bool update_runstate_area(struct vcpu *v) > if ( VM_ASSIST(v->domain, runstate_update_flag) ) > { > guest_handle = has_32bit_shinfo(v->domain) > - ? &v->runstate_guest.compat.p->state_entry_time + 1 > - : &v->runstate_guest.native.p->state_entry_time + 1; > + ? &v->arch.runstate_guest.compat.p->state_entry_time + 1 > + : &v->arch.runstate_guest.native.p->state_entry_time + 1; > guest_handle--; > runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > __raw_copy_to_guest(guest_handle, > @@ -1674,7 +1698,7 @@ bool update_runstate_area(struct vcpu *v) > struct compat_vcpu_runstate_info info; > > XLAT_vcpu_runstate_info(&info, &runstate); > - __copy_to_guest(v->runstate_guest.compat, &info, 1); > + __copy_to_guest(v->arch.runstate_guest.compat, &info, 1); > rc = true; > } > else > diff --git a/xen/arch/x86/x86_64/domain.c b/xen/arch/x86/x86_64/domain.c > index c46dccc25a..b879e8dd2c 100644 > --- a/xen/arch/x86/x86_64/domain.c > +++ b/xen/arch/x86/x86_64/domain.c > @@ -36,7 +36,7 @@ arch_compat_vcpu_op( > break; > > rc = 0; > - guest_from_compat_handle(v->runstate_guest.compat, area.addr.h); > + guest_from_compat_handle(v->arch.runstate_guest.compat, area.addr.h); > > if ( v == current ) > { > @@ -49,7 +49,7 @@ arch_compat_vcpu_op( > vcpu_runstate_get(v, &runstate); > XLAT_vcpu_runstate_info(&info, &runstate); > } > - __copy_to_guest(v->runstate_guest.compat, &info, 1); > + __copy_to_guest(v->arch.runstate_guest.compat, &info, 1); > > break; > } > diff 
--git a/xen/common/domain.c b/xen/common/domain.c > index 7cc9526139..0ca6bf4dbc 100644 > --- a/xen/common/domain.c > +++ b/xen/common/domain.c > @@ -727,7 +727,10 @@ int domain_kill(struct domain *d) > if ( cpupool_move_domain(d, cpupool0) ) > return -ERESTART; > for_each_vcpu ( d, v ) > + { > + arch_cleanup_runstate_guest(v); > unmap_vcpu_info(v); > + } > d->is_dying = DOMDYING_dead; > /* Mem event cleanup has to go here because the rings > * have to be put before we call put_domain. */ > @@ -1167,7 +1170,7 @@ int domain_soft_reset(struct domain *d) > > for_each_vcpu ( d, v ) > { > - set_xen_guest_handle(runstate_guest(v), NULL); > + arch_cleanup_runstate_guest(v); > unmap_vcpu_info(v); > } > > @@ -1484,7 +1487,6 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg) > case VCPUOP_register_runstate_memory_area: > { > struct vcpu_register_runstate_memory_area area; > - struct vcpu_runstate_info runstate; > > rc = -EFAULT; > if ( copy_from_guest(&area, arg, 1) ) > @@ -1493,18 +1495,7 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg) > if ( !guest_handle_okay(area.addr.h, 1) ) > break; > > - rc = 0; > - runstate_guest(v) = area.addr.h; > - > - if ( v == current ) > - { > - __copy_to_guest(runstate_guest(v), &v->runstate, 1); > - } > - else > - { > - vcpu_runstate_get(v, &runstate); > - __copy_to_guest(runstate_guest(v), &runstate, 1); > - } > + rc = arch_setup_runstate_guest(v, area); > > break; > } > diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h > index 4e2f582006..3a7f53e13d 100644 > --- a/xen/include/asm-arm/domain.h > +++ b/xen/include/asm-arm/domain.h > @@ -11,6 +11,7 @@ > #include <asm/vgic.h> > #include <asm/vpl011.h> > #include <public/hvm/params.h> > +#include <public/vcpu.h> > #include <xen/serial.h> > #include <xen/rbtree.h> > > @@ -206,6 +207,13 @@ struct arch_vcpu > */ > bool need_flush_to_ram; > > + /* runstate guest info */ > + struct { > + struct page_info *page; 
/* guest page */ > + unsigned offset; /* offset in page */ > + spinlock_t lock; /* lock to access page */ > + } runstate_guest; > + > } __cacheline_aligned; > > void vcpu_show_execution_state(struct vcpu *); > diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h > index e8cee3d5e5..f917b48cb8 100644 > --- a/xen/include/asm-x86/domain.h > +++ b/xen/include/asm-x86/domain.h > @@ -11,6 +11,11 @@ > #include <asm/x86_emulate.h> > #include <public/vcpu.h> > #include <public/hvm/hvm_info_table.h> > +#ifdef CONFIG_COMPAT > +#include <compat/vcpu.h> > +DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t); > +#endif > + > > #define has_32bit_shinfo(d) ((d)->arch.has_32bit_shinfo) > > @@ -626,6 +631,17 @@ struct arch_vcpu > struct { > bool next_interrupt_enabled; > } monitor; > + > +#ifndef CONFIG_COMPAT > +# define runstate_guest(v) ((v)->arch.runstate_guest) > + XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */ > +#else > +# define runstate_guest(v) ((v)->arch.runstate_guest.native) > + union { > + XEN_GUEST_HANDLE(vcpu_runstate_info_t) native; > + XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat; > + } runstate_guest; > +#endif > }; > > struct guest_memory_policy > diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h > index 7e51d361de..d5d73ce997 100644 > --- a/xen/include/xen/domain.h > +++ b/xen/include/xen/domain.h > @@ -63,6 +63,10 @@ void arch_vcpu_destroy(struct vcpu *v); > int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset); > void unmap_vcpu_info(struct vcpu *v); > > +int arch_setup_runstate_guest(struct vcpu *v, > + struct vcpu_register_runstate_memory_area area); NIT: code style, the indentation is off > +void arch_cleanup_runstate_guest(struct vcpu *v); > + > int arch_domain_create(struct domain *d, > struct xen_domctl_createdomain *config); > > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h > index ac53519d7f..fac030fb83 100644 > --- a/xen/include/xen/sched.h > +++ 
b/xen/include/xen/sched.h > @@ -29,11 +29,6 @@ > #include <public/vcpu.h> > #include <public/event_channel.h> > > -#ifdef CONFIG_COMPAT > -#include <compat/vcpu.h> > -DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t); > -#endif > - > /* > * Stats > * > @@ -166,16 +161,7 @@ struct vcpu > struct sched_unit *sched_unit; > > struct vcpu_runstate_info runstate; > -#ifndef CONFIG_COMPAT > -# define runstate_guest(v) ((v)->runstate_guest) > - XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */ > -#else > -# define runstate_guest(v) ((v)->runstate_guest.native) > - union { > - XEN_GUEST_HANDLE(vcpu_runstate_info_t) native; > - XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat; > - } runstate_guest; /* guest address */ > -#endif > + > unsigned int new_state; > > /* Has the FPU been initialised? */ > -- > 2.17.1 > ^ permalink raw reply [flat|nested] 26+ messages in thread
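The alignment check quoted in the review above (`offset % alignof(struct vcpu_runstate_info)`) enforces natural alignment, so the structure's 64-bit fields never straddle an alignment boundary. A sketch of the same test with a stand-in structure (the real `struct vcpu_runstate_info` has more fields, but its `uint64_t` members give it the same natural alignment):

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Stand-in for struct vcpu_runstate_info. */
struct runstate {
    uint32_t state;
    uint64_t state_entry_time;
};

/* Returns 0 when the address would pass the patch's alignment check. */
static uintptr_t misalignment(uintptr_t va)
{
    return va % alignof(struct runstate);
}
```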
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-11 18:16 ` Stefano Stabellini @ 2020-06-11 18:24 ` Julien Grall 2020-06-11 18:50 ` Stefano Stabellini 2020-06-12 8:07 ` Bertrand Marquis 1 sibling, 1 reply; 26+ messages in thread From: Julien Grall @ 2020-06-11 18:24 UTC (permalink / raw) To: Stefano Stabellini Cc: Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Bertrand Marquis, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné > > + return -EINVAL; > > } > > > > - __copy_to_guest(runstate_guest(v), &runstate, 1); > > + v->arch.runstate_guest.page = page; > > + v->arch.runstate_guest.offset = offset; > > + > > + spin_unlock(&v->arch.runstate_guest.lock); > > + > > + return 0; > > +} > > + > > + > > +/* Update per-VCPU guest runstate shared memory area (if registered). */ > > +static void update_runstate_area(struct vcpu *v) > > +{ > > + struct vcpu_runstate_info *guest_runstate; > > + void *p; > > + > > + spin_lock(&v->arch.runstate_guest.lock); > > > > - if ( guest_handle ) > > + if ( v->arch.runstate_guest.page ) > > { > > - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > > + p = __map_domain_page(v->arch.runstate_guest.page); > > + guest_runstate = p + v->arch.runstate_guest.offset; > > + > > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > > + { > > + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > > + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > > I think that this write to guest_runstate should use write_atomic or > another atomic write operation. I thought about suggesting the same, but guest_copy_* helpers may not do a single memory write to state_entry_time. What are you trying to prevent with the write_atomic()? Cheers, ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-11 18:24 ` Julien Grall @ 2020-06-11 18:50 ` Stefano Stabellini 2020-06-11 19:38 ` Julien Grall 0 siblings, 1 reply; 26+ messages in thread From: Stefano Stabellini @ 2020-06-11 18:50 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Bertrand Marquis, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné On Thu, 11 Jun 2020, Julien Grall wrote: > > > + return -EINVAL; > > > } > > > > > > - __copy_to_guest(runstate_guest(v), &runstate, 1); > > > + v->arch.runstate_guest.page = page; > > > + v->arch.runstate_guest.offset = offset; > > > + > > > + spin_unlock(&v->arch.runstate_guest.lock); > > > + > > > + return 0; > > > +} > > > + > > > + > > > +/* Update per-VCPU guest runstate shared memory area (if registered). */ > > > +static void update_runstate_area(struct vcpu *v) > > > +{ > > > + struct vcpu_runstate_info *guest_runstate; > > > + void *p; > > > + > > > + spin_lock(&v->arch.runstate_guest.lock); > > > > > > - if ( guest_handle ) > > > + if ( v->arch.runstate_guest.page ) > > > { > > > - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > > > + p = __map_domain_page(v->arch.runstate_guest.page); > > > + guest_runstate = p + v->arch.runstate_guest.offset; > > > + > > > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > > > + { > > > + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > > > + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > > > > I think that this write to guest_runstate should use write_atomic or > > another atomic write operation. > > I thought about suggesting the same, but guest_copy_* helpers may not > do a single memory write to state_entry_time. > What are you trying to prevent with the write_atomic()? I am thinking that without using an atomic write, it would be (at least theoretically) possible for a guest to see a partial write to state_entry_time, which is not good. 
In theory, the set of assembly instructions generated by the compiler could go through an intermediate state that we don't want the guest to see. In practice, I doubt that any possible combination of assembly instructions generated by the compiler could lead to something harmful. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-11 18:50 ` Stefano Stabellini @ 2020-06-11 19:38 ` Julien Grall 2020-06-12 1:09 ` Stefano Stabellini 0 siblings, 1 reply; 26+ messages in thread From: Julien Grall @ 2020-06-11 19:38 UTC (permalink / raw) To: Stefano Stabellini, Julien Grall Cc: Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Bertrand Marquis, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné Hi Stefano, On 11/06/2020 19:50, Stefano Stabellini wrote: > On Thu, 11 Jun 2020, Julien Grall wrote: >>>> + return -EINVAL; >>>> } >>>> >>>> - __copy_to_guest(runstate_guest(v), &runstate, 1); >>>> + v->arch.runstate_guest.page = page; >>>> + v->arch.runstate_guest.offset = offset; >>>> + >>>> + spin_unlock(&v->arch.runstate_guest.lock); >>>> + >>>> + return 0; >>>> +} >>>> + >>>> + >>>> +/* Update per-VCPU guest runstate shared memory area (if registered). */ >>>> +static void update_runstate_area(struct vcpu *v) >>>> +{ >>>> + struct vcpu_runstate_info *guest_runstate; >>>> + void *p; >>>> + >>>> + spin_lock(&v->arch.runstate_guest.lock); >>>> >>>> - if ( guest_handle ) >>>> + if ( v->arch.runstate_guest.page ) >>>> { >>>> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >>>> + p = __map_domain_page(v->arch.runstate_guest.page); >>>> + guest_runstate = p + v->arch.runstate_guest.offset; >>>> + >>>> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >>>> + { >>>> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >>>> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; >>> >>> I think that this write to guest_runstate should use write_atomic or >>> another atomic write operation. >> >> I thought about suggesting the same, but guest_copy_* helpers may not >> do a single memory write to state_entry_time. >> What are you trying to prevent with the write_atomic()? 
> > I am thinking that without using an atomic write, it would be (at least > theoretically) possible for a guest to see a partial write to > state_entry_time, which is not good. It is already the case with the existing implementation as Xen may write byte by byte. So are you suggesting the existing code is also buggy? > In theory, the set of assembly > instructions generated by the compiler could go through an intermediate > state that we don't want the guest to see. In practice, I doubt that any > possible combination of assembly instructions generated by the compiler > could lead to something harmful. Well, I think you first need to define the theoretical states you are worried about. Then we can discuss whether they can happen in practice and how we can prevent them in the existing and new code. So what sort of states are you concerned about? Cheers, ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-11 19:38 ` Julien Grall @ 2020-06-12 1:09 ` Stefano Stabellini 2020-06-12 8:13 ` Bertrand Marquis 2020-06-12 9:53 ` Julien Grall 0 siblings, 2 replies; 26+ messages in thread From: Stefano Stabellini @ 2020-06-12 1:09 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Bertrand Marquis, Roger Pau Monné, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Julien Grall On Thu, 11 Jun 2020, Julien Grall wrote: > Hi Stefano, > > On 11/06/2020 19:50, Stefano Stabellini wrote: > > On Thu, 11 Jun 2020, Julien Grall wrote: > > > > > + return -EINVAL; > > > > > } > > > > > > > > > > - __copy_to_guest(runstate_guest(v), &runstate, 1); > > > > > + v->arch.runstate_guest.page = page; > > > > > + v->arch.runstate_guest.offset = offset; > > > > > + > > > > > + spin_unlock(&v->arch.runstate_guest.lock); > > > > > + > > > > > + return 0; > > > > > +} > > > > > + > > > > > + > > > > > +/* Update per-VCPU guest runstate shared memory area (if registered). > > > > > */ > > > > > +static void update_runstate_area(struct vcpu *v) > > > > > +{ > > > > > + struct vcpu_runstate_info *guest_runstate; > > > > > + void *p; > > > > > + > > > > > + spin_lock(&v->arch.runstate_guest.lock); > > > > > > > > > > - if ( guest_handle ) > > > > > + if ( v->arch.runstate_guest.page ) > > > > > { > > > > > - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > > > > > + p = __map_domain_page(v->arch.runstate_guest.page); > > > > > + guest_runstate = p + v->arch.runstate_guest.offset; > > > > > + > > > > > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > > > > > + { > > > > > + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > > > > > + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > > > > > > > > I think that this write to guest_runstate should use write_atomic or > > > > another atomic write operation. 
> > > > > > I thought about suggesting the same, but guest_copy_* helpers may not > > > do a single memory write to state_entry_time. > > > What are you trying to prevent with the write_atomic()? > > > > I am thinking that without using an atomic write, it would be (at least > > theoretically) possible for a guest to see a partial write to > > state_entry_time, which is not good. > > It is already the case with existing implementation as Xen may write byte by > byte. So are you suggesting the existing code is also buggy? Writing byte by byte is a different case. That is OK. In that case, the guest could see the state after 3 bytes written and it would be fine and consistent. If this hadn't been the case, then yes, the existing code would also be buggy. So if we did the write with a memcpy, it would be fine, no need for atomics: memcpy(&guest_runstate->state_entry_time, &v->runstate.state_entry_time, XXX); The |= case is different: GCC could implement it in any way it likes, including going through a zero-write to any of the bytes in the word, or doing an addition then a subtraction. GCC doesn't make any guarantees. If we want guarantees we need to use atomics. ^ permalink raw reply [flat|nested] 26+ messages in thread
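The distinction being drawn here — a read-modify-write like `|=` versus a single store of a precomputed value — can be sketched as follows. This is an illustrative model, not Xen code: the `XEN_RUNSTATE_UPDATE` constant mirrors the public ABI, but the helper names are hypothetical, and GCC's `__atomic_store_n` stands in for Xen's `write_atomic()`:

```c
#include <stdint.h>

#define XEN_RUNSTATE_UPDATE (1ULL << 63)  /* top bit of state_entry_time */

/*
 * Read-modify-write: the compiler may implement this as any sequence
 * of loads and stores, so a concurrent reader could in principle
 * observe an intermediate value.
 */
static void set_flag_rmw(uint64_t *state_entry_time)
{
    *state_entry_time |= XEN_RUNSTATE_UPDATE;
}

/*
 * Single store of a value computed beforehand: the shared location
 * goes directly from the old value to the new one (assuming the
 * architecture makes aligned 64-bit stores single-copy atomic,
 * as Arm64 does for the sizes at hand).
 */
static void set_flag_single_store(uint64_t *shared, uint64_t local)
{
    __atomic_store_n(shared, local | XEN_RUNSTATE_UPDATE, __ATOMIC_RELAXED);
}
```

Both variants end with the flag set; the difference is only in what a concurrent reader may observe in between.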
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-12 1:09 ` Stefano Stabellini @ 2020-06-12 8:13 ` Bertrand Marquis 2020-06-13 0:24 ` Stefano Stabellini 2020-06-12 9:53 ` Julien Grall 1 sibling, 1 reply; 26+ messages in thread From: Bertrand Marquis @ 2020-06-12 8:13 UTC (permalink / raw) To: Stefano Stabellini Cc: Roger Pau Monné, Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Julien Grall > On 12 Jun 2020, at 02:09, Stefano Stabellini <sstabellini@kernel.org> wrote: > > On Thu, 11 Jun 2020, Julien Grall wrote: >> Hi Stefano, >> >> On 11/06/2020 19:50, Stefano Stabellini wrote: >>> On Thu, 11 Jun 2020, Julien Grall wrote: >>>>>> + return -EINVAL; >>>>>> } >>>>>> >>>>>> - __copy_to_guest(runstate_guest(v), &runstate, 1); >>>>>> + v->arch.runstate_guest.page = page; >>>>>> + v->arch.runstate_guest.offset = offset; >>>>>> + >>>>>> + spin_unlock(&v->arch.runstate_guest.lock); >>>>>> + >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> + >>>>>> +/* Update per-VCPU guest runstate shared memory area (if registered). >>>>>> */ >>>>>> +static void update_runstate_area(struct vcpu *v) >>>>>> +{ >>>>>> + struct vcpu_runstate_info *guest_runstate; >>>>>> + void *p; >>>>>> + >>>>>> + spin_lock(&v->arch.runstate_guest.lock); >>>>>> >>>>>> - if ( guest_handle ) >>>>>> + if ( v->arch.runstate_guest.page ) >>>>>> { >>>>>> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >>>>>> + p = __map_domain_page(v->arch.runstate_guest.page); >>>>>> + guest_runstate = p + v->arch.runstate_guest.offset; >>>>>> + >>>>>> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >>>>>> + { >>>>>> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >>>>>> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; >>>>> >>>>> I think that this write to guest_runstate should use write_atomic or >>>>> another atomic write operation. 
>>>> >>>> I thought about suggesting the same, but guest_copy_* helpers may not >>>> do a single memory write to state_entry_time. >>>> What are you trying to prevent with the write_atomic()? >>> >>> I am thinking that without using an atomic write, it would be (at least >>> theoretically) possible for a guest to see a partial write to >>> state_entry_time, which is not good. >> >> It is already the case with existing implementation as Xen may write byte by >> byte. So are you suggesting the existing code is also buggy? > > Writing byte by byte is a different case. That is OK. In that case, the > guest could see the state after 3 bytes written and it would be fine and > consistent. If this hadn't been the case, then yes, the existing code > would also be buggy. > > So if we did the write with a memcpy, it would be fine, no need for > atomics: > > memcpy(&guest_runstate->state_entry_time, > &v->runstate.state_entry_time, > XXX); > > > The |= case is different: GCC could implement it in any way it likes, > including going through a zero-write to any of the bytes in the word, or > doing an addition then a subtraction. GCC doesn't make any guarantees. > If we want guarantees we need to use atomics. Wouldn’t that require all accesses to state_entry_time to also use atomic operations? In that case we could not propagate the changes to a guest without changing the interface itself. As the copy window needs to be protected, the write barriers are there to make sure that during the copy the bit is set and that when we unset it, the copy is done. I added a barrier after the memcpy for this purpose, to make sure that when/if we unset the bit the copy has already been done. Cheers Bertrand ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-12 8:13 ` Bertrand Marquis @ 2020-06-13 0:24 ` Stefano Stabellini 2020-06-15 14:09 ` Bertrand Marquis 0 siblings, 1 reply; 26+ messages in thread From: Stefano Stabellini @ 2020-06-13 0:24 UTC (permalink / raw) To: Bertrand Marquis Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Roger Pau Monné, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Julien Grall [-- Attachment #1: Type: text/plain, Size: 3798 bytes --] On Fri, 12 Jun 2020, Bertrand Marquis wrote: > > On 12 Jun 2020, at 02:09, Stefano Stabellini <sstabellini@kernel.org> wrote: > > > > On Thu, 11 Jun 2020, Julien Grall wrote: > >> Hi Stefano, > >> > >> On 11/06/2020 19:50, Stefano Stabellini wrote: > >>> On Thu, 11 Jun 2020, Julien Grall wrote: > >>>>>> + return -EINVAL; > >>>>>> } > >>>>>> > >>>>>> - __copy_to_guest(runstate_guest(v), &runstate, 1); > >>>>>> + v->arch.runstate_guest.page = page; > >>>>>> + v->arch.runstate_guest.offset = offset; > >>>>>> + > >>>>>> + spin_unlock(&v->arch.runstate_guest.lock); > >>>>>> + > >>>>>> + return 0; > >>>>>> +} > >>>>>> + > >>>>>> + > >>>>>> +/* Update per-VCPU guest runstate shared memory area (if registered). 
> >>>>>> */ > >>>>>> +static void update_runstate_area(struct vcpu *v) > >>>>>> +{ > >>>>>> + struct vcpu_runstate_info *guest_runstate; > >>>>>> + void *p; > >>>>>> + > >>>>>> + spin_lock(&v->arch.runstate_guest.lock); > >>>>>> > >>>>>> - if ( guest_handle ) > >>>>>> + if ( v->arch.runstate_guest.page ) > >>>>>> { > >>>>>> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > >>>>>> + p = __map_domain_page(v->arch.runstate_guest.page); > >>>>>> + guest_runstate = p + v->arch.runstate_guest.offset; > >>>>>> + > >>>>>> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > >>>>>> + { > >>>>>> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > >>>>>> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > >>>>> > >>>>> I think that this write to guest_runstate should use write_atomic or > >>>>> another atomic write operation. > >>>> > >>>> I thought about suggesting the same, but guest_copy_* helpers may not > >>>> do a single memory write to state_entry_time. > >>>> What are you trying to prevent with the write_atomic()? > >>> > >>> I am thinking that without using an atomic write, it would be (at least > >>> theoretically) possible for a guest to see a partial write to > >>> state_entry_time, which is not good. > >> > >> It is already the case with existing implementation as Xen may write byte by > >> byte. So are you suggesting the existing code is also buggy? > > > > Writing byte by byte is a different case. That is OK. In that case, the > > guest could see the state after 3 bytes written and it would be fine and > > consistent. If this hadn't been the case, then yes, the existing code > > would also be buggy. 
> > > > So if we did the write with a memcpy, it would be fine, no need for > > atomics: > > > > memcpy(&guest_runstate->state_entry_time, > > &v->runstate.state_entry_time, > > XXX); > > > > > > The |= case is different: GCC could implement it in any way it likes, > > including going through a zero-write to any of the bytes in the word, or > > doing an addition then a subtraction. GCC doesn't make any guarantees. > > If we want guarantees we need to use atomics. > > Wouldn’t that require all accesses to state_entry_time to use also atomic operations ? > In this case we could not propagate the changes to a guest without changing the interface itself. > > As the copy time needs to be protected, the write barriers are there to make sure that during the copy the bit is set and that when we unset it, the copy is done. > I added for this purpose a barrier after the memcpy to make sure that when/if we unset the bit the copy has already been done. As you say, we have a flag to mark a transitioning period: the flag is XEN_RUNSTATE_UPDATE. So, I think it is OK if we don't use atomics during the transitioning period. But we need to make sure to use atomics for the update of the XEN_RUNSTATE_UPDATE flag itself. Does it make sense? Or maybe I misunderstood some of the things you wrote? ^ permalink raw reply [flat|nested] 26+ messages in thread
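The protocol being suggested — single atomic stores for the XEN_RUNSTATE_UPDATE transitions, plain writes fenced by barriers in between — can be sketched as follows. This is an illustrative model with simplified types and helper names, not the actual Xen code; `__atomic_store_n` and `__atomic_thread_fence` stand in for `write_atomic()` and `smp_wmb()`:

```c
#include <stdint.h>
#include <string.h>

#define XEN_RUNSTATE_UPDATE (1ULL << 63)

/* Simplified stand-in for struct vcpu_runstate_info. */
struct runstate_sketch {
    int state;
    uint64_t state_entry_time;
    uint64_t time[4];
};

static void publish_runstate(struct runstate_sketch *guest,
                             const struct runstate_sketch *local)
{
    /* Single atomic store: readers see the flag either set or clear. */
    __atomic_store_n(&guest->state_entry_time,
                     local->state_entry_time | XEN_RUNSTATE_UPDATE,
                     __ATOMIC_RELAXED);
    __atomic_thread_fence(__ATOMIC_RELEASE);          /* smp_wmb() */

    /* Bulk update; no atomicity needed while the flag is set. */
    guest->state = local->state;
    memcpy(guest->time, local->time, sizeof(guest->time));

    __atomic_thread_fence(__ATOMIC_RELEASE);          /* smp_wmb() */
    /* A second single atomic store clears the flag, publishing the update. */
    __atomic_store_n(&guest->state_entry_time,
                     local->state_entry_time, __ATOMIC_RELAXED);
}
```

Only the two stores that toggle the flag need to be atomic; everything between them is protected by the flag itself plus the barriers.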
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-13 0:24 ` Stefano Stabellini @ 2020-06-15 14:09 ` Bertrand Marquis 2020-06-15 20:30 ` Stefano Stabellini 0 siblings, 1 reply; 26+ messages in thread From: Bertrand Marquis @ 2020-06-15 14:09 UTC (permalink / raw) To: Stefano Stabellini Cc: Roger Pau Monné, Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Julien Grall > On 13 Jun 2020, at 01:24, Stefano Stabellini <sstabellini@kernel.org> wrote: > > On Fri, 12 Jun 2020, Bertrand Marquis wrote: >>> On 12 Jun 2020, at 02:09, Stefano Stabellini <sstabellini@kernel.org> wrote: >>> >>> On Thu, 11 Jun 2020, Julien Grall wrote: >>>> Hi Stefano, >>>> >>>> On 11/06/2020 19:50, Stefano Stabellini wrote: >>>>> On Thu, 11 Jun 2020, Julien Grall wrote: >>>>>>>> + return -EINVAL; >>>>>>>> } >>>>>>>> >>>>>>>> - __copy_to_guest(runstate_guest(v), &runstate, 1); >>>>>>>> + v->arch.runstate_guest.page = page; >>>>>>>> + v->arch.runstate_guest.offset = offset; >>>>>>>> + >>>>>>>> + spin_unlock(&v->arch.runstate_guest.lock); >>>>>>>> + >>>>>>>> + return 0; >>>>>>>> +} >>>>>>>> + >>>>>>>> + >>>>>>>> +/* Update per-VCPU guest runstate shared memory area (if registered). 
>>>>>>>> */ >>>>>>>> +static void update_runstate_area(struct vcpu *v) >>>>>>>> +{ >>>>>>>> + struct vcpu_runstate_info *guest_runstate; >>>>>>>> + void *p; >>>>>>>> + >>>>>>>> + spin_lock(&v->arch.runstate_guest.lock); >>>>>>>> >>>>>>>> - if ( guest_handle ) >>>>>>>> + if ( v->arch.runstate_guest.page ) >>>>>>>> { >>>>>>>> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >>>>>>>> + p = __map_domain_page(v->arch.runstate_guest.page); >>>>>>>> + guest_runstate = p + v->arch.runstate_guest.offset; >>>>>>>> + >>>>>>>> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >>>>>>>> + { >>>>>>>> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >>>>>>>> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; >>>>>>> >>>>>>> I think that this write to guest_runstate should use write_atomic or >>>>>>> another atomic write operation. >>>>>> >>>>>> I thought about suggesting the same, but guest_copy_* helpers may not >>>>>> do a single memory write to state_entry_time. >>>>>> What are you trying to prevent with the write_atomic()? >>>>> >>>>> I am thinking that without using an atomic write, it would be (at least >>>>> theoretically) possible for a guest to see a partial write to >>>>> state_entry_time, which is not good. >>>> >>>> It is already the case with existing implementation as Xen may write byte by >>>> byte. So are you suggesting the existing code is also buggy? >>> >>> Writing byte by byte is a different case. That is OK. In that case, the >>> guest could see the state after 3 bytes written and it would be fine and >>> consistent. If this hadn't been the case, then yes, the existing code >>> would also be buggy. 
>>> >>> So if we did the write with a memcpy, it would be fine, no need for >>> atomics: >>> >>> memcpy(&guest_runstate->state_entry_time, >>> &v->runstate.state_entry_time, >>> XXX); >>> >>> >>> The |= case is different: GCC could implement it in any way it likes, >>> including going through a zero-write to any of the bytes in the word, or >>> doing an addition then a subtraction. GCC doesn't make any guarantees. >>> If we want guarantees we need to use atomics. >> >> Wouldn’t that require all accesses to state_entry_time to use also atomic operations ? >> In this case we could not propagate the changes to a guest without changing the interface itself. >> >> As the copy time needs to be protected, the write barriers are there to make sure that during the copy the bit is set and that when we unset it, the copy is done. >> I added for this purpose a barrier after the memcpy to make sure that when/if we unset the bit the copy has already been done. > > As you say, we have a flag to mark a transitiong period, the flag is > XEN_RUNSTATE_UPDATE. So, I think it is OK if we don't use atomics during > the transitioning period. But we need to make sure to use atomics for the > update of the XEN_RUNSTATE_UPDATE flag itself. > > Does it make sense? Or maybe I misunderstood some of the things you > wrote? To achieve this you would do an atomic operation on state_entry_time to set/unset the XEN_RUNSTATE_UPDATE bit. This field holds a flag in the upper bit but also a value, so all operations on state_entry_time would need to be changed to use atomic operations. This state_entry_time might also be accessed by the guest on other cores at the same time (to retrieve the time part). To prevent something being used as both atomic and non-atomic, specific types are usually used (atomic_t), and this structure is also used by guests, so modifying it will not be easy. Or did I misunderstand something here? Regards Bertrand ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-15 14:09 ` Bertrand Marquis @ 2020-06-15 20:30 ` Stefano Stabellini 2020-06-15 20:44 ` Julien Grall 0 siblings, 1 reply; 26+ messages in thread From: Stefano Stabellini @ 2020-06-15 20:30 UTC (permalink / raw) To: Bertrand Marquis Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Roger Pau Monné, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Julien Grall [-- Attachment #1: Type: text/plain, Size: 5231 bytes --] On Mon, 15 Jun 2020, Bertrand Marquis wrote: > > On 13 Jun 2020, at 01:24, Stefano Stabellini <sstabellini@kernel.org> wrote: > > > > On Fri, 12 Jun 2020, Bertrand Marquis wrote: > >>> On 12 Jun 2020, at 02:09, Stefano Stabellini <sstabellini@kernel.org> wrote: > >>> > >>> On Thu, 11 Jun 2020, Julien Grall wrote: > >>>> Hi Stefano, > >>>> > >>>> On 11/06/2020 19:50, Stefano Stabellini wrote: > >>>>> On Thu, 11 Jun 2020, Julien Grall wrote: > >>>>>>>> + return -EINVAL; > >>>>>>>> } > >>>>>>>> > >>>>>>>> - __copy_to_guest(runstate_guest(v), &runstate, 1); > >>>>>>>> + v->arch.runstate_guest.page = page; > >>>>>>>> + v->arch.runstate_guest.offset = offset; > >>>>>>>> + > >>>>>>>> + spin_unlock(&v->arch.runstate_guest.lock); > >>>>>>>> + > >>>>>>>> + return 0; > >>>>>>>> +} > >>>>>>>> + > >>>>>>>> + > >>>>>>>> +/* Update per-VCPU guest runstate shared memory area (if registered). 
> >>>>>>>> */ > >>>>>>>> +static void update_runstate_area(struct vcpu *v) > >>>>>>>> +{ > >>>>>>>> + struct vcpu_runstate_info *guest_runstate; > >>>>>>>> + void *p; > >>>>>>>> + > >>>>>>>> + spin_lock(&v->arch.runstate_guest.lock); > >>>>>>>> > >>>>>>>> - if ( guest_handle ) > >>>>>>>> + if ( v->arch.runstate_guest.page ) > >>>>>>>> { > >>>>>>>> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > >>>>>>>> + p = __map_domain_page(v->arch.runstate_guest.page); > >>>>>>>> + guest_runstate = p + v->arch.runstate_guest.offset; > >>>>>>>> + > >>>>>>>> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > >>>>>>>> + { > >>>>>>>> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > >>>>>>>> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > >>>>>>> > >>>>>>> I think that this write to guest_runstate should use write_atomic or > >>>>>>> another atomic write operation. > >>>>>> > >>>>>> I thought about suggesting the same, but guest_copy_* helpers may not > >>>>>> do a single memory write to state_entry_time. > >>>>>> What are you trying to prevent with the write_atomic()? > >>>>> > >>>>> I am thinking that without using an atomic write, it would be (at least > >>>>> theoretically) possible for a guest to see a partial write to > >>>>> state_entry_time, which is not good. > >>>> > >>>> It is already the case with existing implementation as Xen may write byte by > >>>> byte. So are you suggesting the existing code is also buggy? > >>> > >>> Writing byte by byte is a different case. That is OK. In that case, the > >>> guest could see the state after 3 bytes written and it would be fine and > >>> consistent. If this hadn't been the case, then yes, the existing code > >>> would also be buggy. 
> >>> > >>> So if we did the write with a memcpy, it would be fine, no need for > >>> atomics: > >>> > >>> memcpy(&guest_runstate->state_entry_time, > >>> &v->runstate.state_entry_time, > >>> XXX); > >>> > >>> > >>> The |= case is different: GCC could implement it in any way it likes, > >>> including going through a zero-write to any of the bytes in the word, or > >>> doing an addition then a subtraction. GCC doesn't make any guarantees. > >>> If we want guarantees we need to use atomics. > >> > >> Wouldn’t that require all accesses to state_entry_time to use also atomic operations ? > >> In this case we could not propagate the changes to a guest without changing the interface itself. > >> > >> As the copy time needs to be protected, the write barriers are there to make sure that during the copy the bit is set and that when we unset it, the copy is done. > >> I added for this purpose a barrier after the memcpy to make sure that when/if we unset the bit the copy has already been done. > > > > As you say, we have a flag to mark a transitiong period, the flag is > > XEN_RUNSTATE_UPDATE. So, I think it is OK if we don't use atomics during > > the transitioning period. But we need to make sure to use atomics for the > > update of the XEN_RUNSTATE_UPDATE flag itself. > > > > Does it make sense? Or maybe I misunderstood some of the things you > > wrote? > > To achieve this you would do an atomic operation on state_entry_time to set/unset the XEN_RUNSTATE_UPDATE bit. > This field is holding a flag in the upper bit but also a value, so all operations on state_entry_time would need to be changed to use atomic operations. I don't think that all operations on state_entry_time need to be atomic. Only the bit write to XEN_RUNSTATE_UPDATE. More on this below. > Also this state_entry_time might also be accessed by the guest on other cores at the same time (to retrieve the time part). Yes but they are all just readers, right? 
> To prevent something being used as atomic and non atomic, specific types are usually used (atomic_t) and this structure is also used by guests so modifying it will not be easy. > > Or did I missunderstood something here ? I was not suggesting to use an atomic_t type. I was only suggesting to use an atomic operation, i.e. calling write_u32_atomic directly (or something like that). I would not change the type of state_entry_time. Also, using memcpy would be acceptable because we only need to update one byte. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-15 20:30 ` Stefano Stabellini @ 2020-06-15 20:44 ` Julien Grall 0 siblings, 0 replies; 26+ messages in thread From: Julien Grall @ 2020-06-15 20:44 UTC (permalink / raw) To: Stefano Stabellini Cc: Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Bertrand Marquis, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné On Mon, 15 Jun 2020 at 21:30, Stefano Stabellini <sstabellini@kernel.org> wrote: > > On Mon, 15 Jun 2020, Bertrand Marquis wrote: > > > On 13 Jun 2020, at 01:24, Stefano Stabellini <sstabellini@kernel.org> wrote: > > > > > > On Fri, 12 Jun 2020, Bertrand Marquis wrote: > > >>> On 12 Jun 2020, at 02:09, Stefano Stabellini <sstabellini@kernel.org> wrote: > > >>> > > >>> On Thu, 11 Jun 2020, Julien Grall wrote: > > >>>> Hi Stefano, > > >>>> > > >>>> On 11/06/2020 19:50, Stefano Stabellini wrote: > > >>>>> On Thu, 11 Jun 2020, Julien Grall wrote: > > >>>>>>>> + return -EINVAL; > > >>>>>>>> } > > >>>>>>>> > > >>>>>>>> - __copy_to_guest(runstate_guest(v), &runstate, 1); > > >>>>>>>> + v->arch.runstate_guest.page = page; > > >>>>>>>> + v->arch.runstate_guest.offset = offset; > > >>>>>>>> + > > >>>>>>>> + spin_unlock(&v->arch.runstate_guest.lock); > > >>>>>>>> + > > >>>>>>>> + return 0; > > >>>>>>>> +} > > >>>>>>>> + > > >>>>>>>> + > > >>>>>>>> +/* Update per-VCPU guest runstate shared memory area (if registered). 
> > >>>>>>>> */ > > >>>>>>>> +static void update_runstate_area(struct vcpu *v) > > >>>>>>>> +{ > > >>>>>>>> + struct vcpu_runstate_info *guest_runstate; > > >>>>>>>> + void *p; > > >>>>>>>> + > > >>>>>>>> + spin_lock(&v->arch.runstate_guest.lock); > > >>>>>>>> > > >>>>>>>> - if ( guest_handle ) > > >>>>>>>> + if ( v->arch.runstate_guest.page ) > > >>>>>>>> { > > >>>>>>>> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > > >>>>>>>> + p = __map_domain_page(v->arch.runstate_guest.page); > > >>>>>>>> + guest_runstate = p + v->arch.runstate_guest.offset; > > >>>>>>>> + > > >>>>>>>> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > > >>>>>>>> + { > > >>>>>>>> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > > >>>>>>>> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > > >>>>>>> > > >>>>>>> I think that this write to guest_runstate should use write_atomic or > > >>>>>>> another atomic write operation. > > >>>>>> > > >>>>>> I thought about suggesting the same, but guest_copy_* helpers may not > > >>>>>> do a single memory write to state_entry_time. > > >>>>>> What are you trying to prevent with the write_atomic()? > > >>>>> > > >>>>> I am thinking that without using an atomic write, it would be (at least > > >>>>> theoretically) possible for a guest to see a partial write to > > >>>>> state_entry_time, which is not good. > > >>>> > > >>>> It is already the case with existing implementation as Xen may write byte by > > >>>> byte. So are you suggesting the existing code is also buggy? > > >>> > > >>> Writing byte by byte is a different case. That is OK. In that case, the > > >>> guest could see the state after 3 bytes written and it would be fine and > > >>> consistent. If this hadn't been the case, then yes, the existing code > > >>> would also be buggy. 
> > >>> > > >>> So if we did the write with a memcpy, it would be fine, no need for > > >>> atomics: > > >>> > > >>> memcpy(&guest_runstate->state_entry_time, > > >>> &v->runstate.state_entry_time, > > >>> XXX); > > >>> > > >>> > > >>> The |= case is different: GCC could implement it in any way it likes, > > >>> including going through a zero-write to any of the bytes in the word, or > > >>> doing an addition then a subtraction. GCC doesn't make any guarantees. > > >>> If we want guarantees we need to use atomics. > > >> > > >> Wouldn’t that require all accesses to state_entry_time to use also atomic operations ? > > >> In this case we could not propagate the changes to a guest without changing the interface itself. > > >> > > >> As the copy time needs to be protected, the write barriers are there to make sure that during the copy the bit is set and that when we unset it, the copy is done. > > >> I added for this purpose a barrier after the memcpy to make sure that when/if we unset the bit the copy has already been done. > > > > > > As you say, we have a flag to mark a transitiong period, the flag is > > > XEN_RUNSTATE_UPDATE. So, I think it is OK if we don't use atomics during > > > the transitioning period. But we need to make sure to use atomics for the > > > update of the XEN_RUNSTATE_UPDATE flag itself. > > > > > > Does it make sense? Or maybe I misunderstood some of the things you > > > wrote? > > > > To achieve this you would do an atomic operation on state_entry_time to set/unset the XEN_RUNSTATE_UPDATE bit. > > This field is holding a flag in the upper bit but also a value, so all operations on state_entry_time would need to be changed to use atomic operations. > > I don't think that all operations on state_entry_time need to be atomic. > Only the bit write to XEN_RUNSTATE_UPDATE. More on this below. > > > > Also this state_entry_time might also be accessed by the guest on other cores at the same time (to retrieve the time part). 
> > Yes but they are all just readers, right? This is only one part of the equation. The second part is that readers *must not* process the rest of the content while XEN_RUNSTATE_UPDATE is set. > > I was not suggesting to use an atomic_t type. I was only suggesting to > use an atomic operation, i.e. calling write_u32_atomic directly (or > something like that.) I would not change the type of state_entry_time. > Also using memcpy would be acceptable due to the fact that we only need > to update one byte. Please don't use memcpy. It is wrong to assume it performs an atomic copy (even if it may happen to be correct for a single byte). At the same time, let's avoid introducing more atomic bit operations on guest memory that can be read-write (see XSA-295). In this case a write_atomic() is sufficient, as it would do a single write for sizes up to 64 bits. Cheers, ^ permalink raw reply [flat|nested] 26+ messages in thread
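The reader-side half of that contract — what a guest is expected to do while the flag may be set — can be sketched like this. This is an illustrative seqlock-style retry loop under the same simplified model as above, not any specific guest code:

```c
#include <stdint.h>
#include <string.h>

#define XEN_RUNSTATE_UPDATE (1ULL << 63)

/* Simplified stand-in for struct vcpu_runstate_info. */
struct runstate_sketch {
    int state;
    uint64_t state_entry_time;
    uint64_t time[4];
};

/*
 * Retry until a snapshot is taken with the update flag clear and the
 * entry time unchanged across the copy: a reader must not consume the
 * contents while XEN_RUNSTATE_UPDATE is set.
 */
static void read_runstate(const struct runstate_sketch *shared,
                          struct runstate_sketch *out)
{
    uint64_t t1, t2;

    do {
        t1 = __atomic_load_n(&shared->state_entry_time, __ATOMIC_ACQUIRE);
        memcpy(out, shared, sizeof(*out));
        __atomic_thread_fence(__ATOMIC_ACQUIRE);      /* smp_rmb() */
        t2 = __atomic_load_n(&shared->state_entry_time, __ATOMIC_ACQUIRE);
    } while ( (t1 & XEN_RUNSTATE_UPDATE) || t1 != t2 );
}
```

With no writer in flight, the loop exits on the first pass; a concurrent update forces a retry rather than letting the reader see a torn snapshot.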
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-12 1:09 ` Stefano Stabellini 2020-06-12 8:13 ` Bertrand Marquis @ 2020-06-12 9:53 ` Julien Grall 2020-06-13 0:24 ` Stefano Stabellini 1 sibling, 1 reply; 26+ messages in thread From: Julien Grall @ 2020-06-12 9:53 UTC (permalink / raw) To: Stefano Stabellini Cc: Roger Pau Monné, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Bertrand Marquis, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Julien Grall On 12/06/2020 02:09, Stefano Stabellini wrote: > On Thu, 11 Jun 2020, Julien Grall wrote: >> Hi Stefano, >> >> On 11/06/2020 19:50, Stefano Stabellini wrote: >>> On Thu, 11 Jun 2020, Julien Grall wrote: >>>>>> + return -EINVAL; >>>>>> } >>>>>> >>>>>> - __copy_to_guest(runstate_guest(v), &runstate, 1); >>>>>> + v->arch.runstate_guest.page = page; >>>>>> + v->arch.runstate_guest.offset = offset; >>>>>> + >>>>>> + spin_unlock(&v->arch.runstate_guest.lock); >>>>>> + >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> + >>>>>> +/* Update per-VCPU guest runstate shared memory area (if registered). >>>>>> */ >>>>>> +static void update_runstate_area(struct vcpu *v) >>>>>> +{ >>>>>> + struct vcpu_runstate_info *guest_runstate; >>>>>> + void *p; >>>>>> + >>>>>> + spin_lock(&v->arch.runstate_guest.lock); >>>>>> >>>>>> - if ( guest_handle ) >>>>>> + if ( v->arch.runstate_guest.page ) >>>>>> { >>>>>> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >>>>>> + p = __map_domain_page(v->arch.runstate_guest.page); >>>>>> + guest_runstate = p + v->arch.runstate_guest.offset; >>>>>> + >>>>>> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >>>>>> + { >>>>>> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >>>>>> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; >>>>> >>>>> I think that this write to guest_runstate should use write_atomic or >>>>> another atomic write operation. 
>>>> >>>> I thought about suggesting the same, but guest_copy_* helpers may not >>>> do a single memory write to state_entry_time. >>>> What are you trying to prevent with the write_atomic()? >>> >>> I am thinking that without using an atomic write, it would be (at least >>> theoretically) possible for a guest to see a partial write to >>> state_entry_time, which is not good. >> >> It is already the case with existing implementation as Xen may write byte by >> byte. So are you suggesting the existing code is also buggy? It looks like I may have misread the code as we only copy one byte. But I still think this is fragile. For this context, I agree that a write_atomic() should do the job. However, I still want to answer to your comments below. > > Writing byte by byte is a different case. That is OK. In that case, the > guest could see the state after 3 bytes written and it would be fine and > consistent. Why? What does actually prevent a guest to see an in-between value? To give a concrete example, if the original value is 0xabc and you want to write 0xdef. Why would the guest never see 0xabf or 0xaec? > If this hadn't been the case, then yes, the existing code > would also be buggy. > > So if we did the write with a memcpy, it would be fine, no need for > atomics: > > memcpy(&guest_runstate->state_entry_time, > &v->runstate.state_entry_time, > XXX); > > > The |= case is different: GCC could implement it in any way it likes, > including going through a zero-write to any of the bytes in the word, or > doing an addition then a subtraction. GCC doesn't make any guarantees. > If we want guarantees we need to use atomics. Yes GCC can generate assembly for |= in any way it likes. But so does for memcpy(). So I still don't understand why one would be fine for you and not the other... Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-12 9:53 ` Julien Grall @ 2020-06-13 0:24 ` Stefano Stabellini 0 siblings, 0 replies; 26+ messages in thread From: Stefano Stabellini @ 2020-06-13 0:24 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Bertrand Marquis, Roger Pau Monné, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Julien Grall

On Fri, 12 Jun 2020, Julien Grall wrote:
> > Writing byte by byte is a different case. That is OK. In that case, the
> > guest could see the state after 3 bytes written and it would be fine and
> > consistent.
>
> Why? What does actually prevent a guest to see an in-between value?
>
> To give a concrete example, if the original value is 0xabc and you want to
> write 0xdef. Why would the guest never see 0xabf or 0xaec?

That is a good question, but the minimum granularity is 1 byte. So if:

  current: 0xaabbcc
  new:     0xddeeff

Can the guest see one of the following?

  0xaabbff
  0xaaeecc

Yes, it can. I don't think it is a problem in this case because we only change 1 byte, so to continue with the example:

  current: 0xaabbcc
  new:     0xffbbcc

So the only values the VM can see are either 0xaabbcc or 0xffbbcc.

> > If this hadn't been the case, then yes, the existing code
> > would also be buggy.
> >
> > So if we did the write with a memcpy, it would be fine, no need for
> > atomics:
> >
> >     memcpy(&guest_runstate->state_entry_time,
> >            &v->runstate.state_entry_time,
> >            XXX);
> >
> > The |= case is different: GCC could implement it in any way it likes,
> > including going through a zero-write to any of the bytes in the word, or
> > doing an addition then a subtraction. GCC doesn't make any guarantees.
> > If we want guarantees we need to use atomics.
It is down to the implementation of memcpy to guarantee that the only thing memcpy does is memory copies. memcpy doesn't specify whether it is going to use byte-copies or word-copies, but it should only do copies. If we have to write memcpy in assembly to make it so, so be it :-) ^ permalink raw reply [flat|nested] 26+ messages in thread
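The byte-granularity argument above can be checked mechanically: model a copy that lands one byte at a time and enumerate every state a concurrent reader could observe. The helpers below are invented for this standalone illustration (they are not Xen code); the values reuse the 0xaabbcc/0xddeeff example from the thread.

```c
#include <stdint.h>

/* What a reader may observe after `bytes_written` of the 8 bytes of a
 * 64-bit field have been copied: a mix of new (already written) and
 * old (not yet written) bytes of the in-memory representation. */
uint64_t partial_copy(uint64_t oldval, uint64_t newval, unsigned bytes_written)
{
    uint64_t out = oldval;
    unsigned char *dst = (unsigned char *)&out;
    const unsigned char *src = (const unsigned char *)&newval;
    unsigned i;

    for ( i = 0; i < bytes_written && i < sizeof(out); i++ )
        dst[i] = src[i];
    return out;
}

/* Returns 1 if some intermediate mix differs from both the old and the
 * new value, i.e. a torn value is observable by the reader. */
int torn_value_observable(uint64_t oldval, uint64_t newval)
{
    unsigned k;

    for ( k = 1; k < sizeof(uint64_t); k++ )
    {
        uint64_t seen = partial_copy(oldval, newval, k);

        if ( seen != oldval && seen != newval )
            return 1;
    }
    return 0;
}
```

When old and new differ in several bytes (0xaabbcc vs 0xddeeff), mixes such as 0xaabbff are observable; when they differ in exactly one byte — the XEN_RUNSTATE_UPDATE case — every mix collapses to either the old or the new value, which is the safety property being argued for here.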
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-11 18:16 ` Stefano Stabellini 2020-06-11 18:24 ` Julien Grall @ 2020-06-12 8:07 ` Bertrand Marquis 1 sibling, 0 replies; 26+ messages in thread From: Bertrand Marquis @ 2020-06-12 8:07 UTC (permalink / raw) To: Stefano Stabellini Cc: Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné > On 11 Jun 2020, at 19:16, Stefano Stabellini <sstabellini@kernel.org> wrote: > > On Thu, 11 Jun 2020, Bertrand Marquis wrote: >> At the moment on Arm, a Linux guest running with KTPI enabled will >> cause the following error when a context switch happens in user mode: >> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0 >> >> This patch is modifying the register runstate area handling on arm to >> convert the guest address during the hypercall. During runtime context >> switches the area is mapped to update the guest runstate copy. >> The patch is also removing the initial copy which was done during the >> hypercall handling as this is done on the current core when the context >> is restore to go back to guest execution on arm. >> >> As a consequence if the register runstate hypercall is called on one >> vcpu for an other vcpu, the area will not be updated until the vcpu >> will be executed (on arm only). >> >> On x86, the behaviour is not modified, the address conversion is done >> during the context switch and the area is updated fully during the >> hypercall. >> inline functions in headers could not be used as the architecture >> domain.h is included before the global domain.h making it impossible >> to use the struct vcpu inside the architecture header. >> This should not have any performance impact as the hypercall is only >> called once per vcpu usually. 
>> >> Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> >> --- >> xen/arch/arm/domain.c | 109 +++++++++++++++++++++++++++++------ >> xen/arch/x86/domain.c | 30 +++++++++- >> xen/arch/x86/x86_64/domain.c | 4 +- >> xen/common/domain.c | 19 ++---- >> xen/include/asm-arm/domain.h | 8 +++ >> xen/include/asm-x86/domain.h | 16 +++++ >> xen/include/xen/domain.h | 4 ++ >> xen/include/xen/sched.h | 16 +---- >> 8 files changed, 153 insertions(+), 53 deletions(-) >> >> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c >> index 31169326b2..739059234f 100644 >> --- a/xen/arch/arm/domain.c >> +++ b/xen/arch/arm/domain.c >> @@ -19,6 +19,7 @@ >> #include <xen/sched.h> >> #include <xen/softirq.h> >> #include <xen/wait.h> >> +#include <xen/domain_page.h> >> >> #include <asm/alternative.h> >> #include <asm/cpuerrata.h> >> @@ -275,36 +276,104 @@ static void ctxt_switch_to(struct vcpu *n) >> virt_timer_restore(n); >> } >> >> -/* Update per-VCPU guest runstate shared memory area (if registered). 
*/ >> -static void update_runstate_area(struct vcpu *v) >> +void arch_cleanup_runstate_guest(struct vcpu *v) >> { >> - void __user *guest_handle = NULL; >> - struct vcpu_runstate_info runstate; >> + spin_lock(&v->arch.runstate_guest.lock); >> >> - if ( guest_handle_is_null(runstate_guest(v)) ) >> - return; >> + /* cleanup previous page if any */ >> + if ( v->arch.runstate_guest.page ) >> + { >> + put_page_and_type(v->arch.runstate_guest.page); >> + v->arch.runstate_guest.page = NULL; >> + v->arch.runstate_guest.offset = 0; >> + } >> >> - memcpy(&runstate, &v->runstate, sizeof(runstate)); >> + spin_unlock(&v->arch.runstate_guest.lock); >> +} >> >> - if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> +int arch_setup_runstate_guest(struct vcpu *v, >> + struct vcpu_register_runstate_memory_area area) >> +{ >> + struct page_info *page; >> + unsigned offset; >> + >> + spin_lock(&v->arch.runstate_guest.lock); >> + >> + /* cleanup previous page if any */ >> + if ( v->arch.runstate_guest.page ) >> { >> - guest_handle = &v->runstate_guest.p->state_entry_time + 1; >> - guest_handle--; >> - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >> - __raw_copy_to_guest(guest_handle, >> - (void *)(&runstate.state_entry_time + 1) - 1, 1); >> - smp_wmb(); >> + put_page_and_type(v->arch.runstate_guest.page); >> + v->arch.runstate_guest.page = NULL; >> + v->arch.runstate_guest.offset = 0; >> + } >> + >> + offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; >> + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) >> + { >> + spin_unlock(&v->arch.runstate_guest.lock); >> + gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); >> + return -EINVAL; >> + } >> + >> + /* provided address must be aligned to a 64bit */ >> + if ( offset % alignof(struct vcpu_runstate_info) ) >> + { >> + spin_unlock(&v->arch.runstate_guest.lock); >> + gprintk(XENLOG_DEBUG, "Runstate pointer is not aligned\n"); >> + return -EINVAL; >> + } >> + >> + page = get_page_from_gva(v, (vaddr_t)area.addr.v, 
GV2M_WRITE); >> + if ( !page ) >> + { >> + spin_unlock(&v->arch.runstate_guest.lock); >> + gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); > > I would make all these XENLOG_WARNING or XENLOG_ERR, they are errors > after all. This last one I'd say "Could not map runstate pointer" and also > print the address. Ok, I will fix it to Warning and change the message like this. > > >> + return -EINVAL; >> } >> >> - __copy_to_guest(runstate_guest(v), &runstate, 1); >> + v->arch.runstate_guest.page = page; >> + v->arch.runstate_guest.offset = offset; >> + >> + spin_unlock(&v->arch.runstate_guest.lock); >> + >> + return 0; >> +} >> + >> + >> +/* Update per-VCPU guest runstate shared memory area (if registered). */ >> +static void update_runstate_area(struct vcpu *v) >> +{ >> + struct vcpu_runstate_info *guest_runstate; >> + void *p; >> + >> + spin_lock(&v->arch.runstate_guest.lock); >> >> - if ( guest_handle ) >> + if ( v->arch.runstate_guest.page ) >> { >> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >> + p = __map_domain_page(v->arch.runstate_guest.page); >> + guest_runstate = p + v->arch.runstate_guest.offset; >> + >> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> + { >> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > > I think that this write to guest_runstate should use write_atomic or > another atomic write operation. I will answer at the end of the discussion on the subject. 
> > >> + smp_wmb(); >> + } >> + >> + memcpy(guest_runstate, &v->runstate, sizeof(v->runstate)); >> smp_wmb(); >> - __raw_copy_to_guest(guest_handle, >> - (void *)(&runstate.state_entry_time + 1) - 1, 1); >> + >> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> + { >> + v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >> + guest_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; > > same here > > >> + smp_wmb(); >> + } >> + >> + unmap_domain_page(p); >> } >> + >> + spin_unlock(&v->arch.runstate_guest.lock); >> } >> >> static void schedule_tail(struct vcpu *prev) >> @@ -560,6 +629,8 @@ int arch_vcpu_create(struct vcpu *v) >> v->arch.saved_context.sp = (register_t)v->arch.cpu_info; >> v->arch.saved_context.pc = (register_t)continue_new_vcpu; >> >> + spin_lock_init(&v->arch.runstate_guest.lock); >> + >> /* Idle VCPUs don't need the rest of this setup */ >> if ( is_idle_vcpu(v) ) >> return rc; >> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c >> index fee6c3931a..32bbed87d5 100644 >> --- a/xen/arch/x86/domain.c >> +++ b/xen/arch/x86/domain.c >> @@ -1642,6 +1642,30 @@ void paravirt_ctxt_switch_to(struct vcpu *v) >> wrmsr_tsc_aux(v->arch.msrs->tsc_aux); >> } >> >> +int arch_setup_runstate_guest(struct vcpu *v, >> + struct vcpu_register_runstate_memory_area area) >> +{ >> + struct vcpu_runstate_info runstate; >> + >> + runstate_guest(v) = area.addr.h; >> + >> + if ( v == current ) >> + { >> + __copy_to_guest(runstate_guest(v), &v->runstate, 1); >> + } >> + else >> + { >> + vcpu_runstate_get(v, &runstate); >> + __copy_to_guest(runstate_guest(v), &runstate, 1); >> + } >> + return 0; >> +} >> + >> +void arch_cleanup_runstate_guest(struct vcpu *v) >> +{ >> + set_xen_guest_handle(runstate_guest(v), NULL); >> +} >> + >> /* Update per-VCPU guest runstate shared memory area (if registered). 
*/ >> bool update_runstate_area(struct vcpu *v) >> { >> @@ -1660,8 +1684,8 @@ bool update_runstate_area(struct vcpu *v) >> if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> { >> guest_handle = has_32bit_shinfo(v->domain) >> - ? &v->runstate_guest.compat.p->state_entry_time + 1 >> - : &v->runstate_guest.native.p->state_entry_time + 1; >> + ? &v->arch.runstate_guest.compat.p->state_entry_time + 1 >> + : &v->arch.runstate_guest.native.p->state_entry_time + 1; >> guest_handle--; >> runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >> __raw_copy_to_guest(guest_handle, >> @@ -1674,7 +1698,7 @@ bool update_runstate_area(struct vcpu *v) >> struct compat_vcpu_runstate_info info; >> >> XLAT_vcpu_runstate_info(&info, &runstate); >> - __copy_to_guest(v->runstate_guest.compat, &info, 1); >> + __copy_to_guest(v->arch.runstate_guest.compat, &info, 1); >> rc = true; >> } >> else >> diff --git a/xen/arch/x86/x86_64/domain.c b/xen/arch/x86/x86_64/domain.c >> index c46dccc25a..b879e8dd2c 100644 >> --- a/xen/arch/x86/x86_64/domain.c >> +++ b/xen/arch/x86/x86_64/domain.c >> @@ -36,7 +36,7 @@ arch_compat_vcpu_op( >> break; >> >> rc = 0; >> - guest_from_compat_handle(v->runstate_guest.compat, area.addr.h); >> + guest_from_compat_handle(v->arch.runstate_guest.compat, area.addr.h); >> >> if ( v == current ) >> { >> @@ -49,7 +49,7 @@ arch_compat_vcpu_op( >> vcpu_runstate_get(v, &runstate); >> XLAT_vcpu_runstate_info(&info, &runstate); >> } >> - __copy_to_guest(v->runstate_guest.compat, &info, 1); >> + __copy_to_guest(v->arch.runstate_guest.compat, &info, 1); >> >> break; >> } >> diff --git a/xen/common/domain.c b/xen/common/domain.c >> index 7cc9526139..0ca6bf4dbc 100644 >> --- a/xen/common/domain.c >> +++ b/xen/common/domain.c >> @@ -727,7 +727,10 @@ int domain_kill(struct domain *d) >> if ( cpupool_move_domain(d, cpupool0) ) >> return -ERESTART; >> for_each_vcpu ( d, v ) >> + { >> + arch_cleanup_runstate_guest(v); >> unmap_vcpu_info(v); >> + } >> d->is_dying = DOMDYING_dead; >> /* Mem 
event cleanup has to go here because the rings >> * have to be put before we call put_domain. */ >> @@ -1167,7 +1170,7 @@ int domain_soft_reset(struct domain *d) >> >> for_each_vcpu ( d, v ) >> { >> - set_xen_guest_handle(runstate_guest(v), NULL); >> + arch_cleanup_runstate_guest(v); >> unmap_vcpu_info(v); >> } >> >> @@ -1484,7 +1487,6 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg) >> case VCPUOP_register_runstate_memory_area: >> { >> struct vcpu_register_runstate_memory_area area; >> - struct vcpu_runstate_info runstate; >> >> rc = -EFAULT; >> if ( copy_from_guest(&area, arg, 1) ) >> @@ -1493,18 +1495,7 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg) >> if ( !guest_handle_okay(area.addr.h, 1) ) >> break; >> >> - rc = 0; >> - runstate_guest(v) = area.addr.h; >> - >> - if ( v == current ) >> - { >> - __copy_to_guest(runstate_guest(v), &v->runstate, 1); >> - } >> - else >> - { >> - vcpu_runstate_get(v, &runstate); >> - __copy_to_guest(runstate_guest(v), &runstate, 1); >> - } >> + rc = arch_setup_runstate_guest(v, area); >> >> break; >> } >> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h >> index 4e2f582006..3a7f53e13d 100644 >> --- a/xen/include/asm-arm/domain.h >> +++ b/xen/include/asm-arm/domain.h >> @@ -11,6 +11,7 @@ >> #include <asm/vgic.h> >> #include <asm/vpl011.h> >> #include <public/hvm/params.h> >> +#include <public/vcpu.h> >> #include <xen/serial.h> >> #include <xen/rbtree.h> >> >> @@ -206,6 +207,13 @@ struct arch_vcpu >> */ >> bool need_flush_to_ram; >> >> + /* runstate guest info */ >> + struct { >> + struct page_info *page; /* guest page */ >> + unsigned offset; /* offset in page */ >> + spinlock_t lock; /* lock to access page */ >> + } runstate_guest; >> + >> } __cacheline_aligned; >> >> void vcpu_show_execution_state(struct vcpu *); >> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h >> index e8cee3d5e5..f917b48cb8 100644 >> --- 
a/xen/include/asm-x86/domain.h >> +++ b/xen/include/asm-x86/domain.h >> @@ -11,6 +11,11 @@ >> #include <asm/x86_emulate.h> >> #include <public/vcpu.h> >> #include <public/hvm/hvm_info_table.h> >> +#ifdef CONFIG_COMPAT >> +#include <compat/vcpu.h> >> +DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t); >> +#endif >> + >> >> #define has_32bit_shinfo(d) ((d)->arch.has_32bit_shinfo) >> >> @@ -626,6 +631,17 @@ struct arch_vcpu >> struct { >> bool next_interrupt_enabled; >> } monitor; >> + >> +#ifndef CONFIG_COMPAT >> +# define runstate_guest(v) ((v)->arch.runstate_guest) >> + XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */ >> +#else >> +# define runstate_guest(v) ((v)->arch.runstate_guest.native) >> + union { >> + XEN_GUEST_HANDLE(vcpu_runstate_info_t) native; >> + XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat; >> + } runstate_guest; >> +#endif >> }; >> >> struct guest_memory_policy >> diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h >> index 7e51d361de..d5d73ce997 100644 >> --- a/xen/include/xen/domain.h >> +++ b/xen/include/xen/domain.h >> @@ -63,6 +63,10 @@ void arch_vcpu_destroy(struct vcpu *v); >> int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset); >> void unmap_vcpu_info(struct vcpu *v); >> >> +int arch_setup_runstate_guest(struct vcpu *v, >> + struct vcpu_register_runstate_memory_area area); > > NIT: code style, the indentation is off Ok > > >> +void arch_cleanup_runstate_guest(struct vcpu *v); >> + >> int arch_domain_create(struct domain *d, >> struct xen_domctl_createdomain *config); >> >> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h >> index ac53519d7f..fac030fb83 100644 >> --- a/xen/include/xen/sched.h >> +++ b/xen/include/xen/sched.h >> @@ -29,11 +29,6 @@ >> #include <public/vcpu.h> >> #include <public/event_channel.h> >> >> -#ifdef CONFIG_COMPAT >> -#include <compat/vcpu.h> >> -DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t); >> -#endif >> - >> /* >> * Stats >> * 
>> @@ -166,16 +161,7 @@ struct vcpu >> struct sched_unit *sched_unit; >> >> struct vcpu_runstate_info runstate; >> -#ifndef CONFIG_COMPAT >> -# define runstate_guest(v) ((v)->runstate_guest) >> - XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */ >> -#else >> -# define runstate_guest(v) ((v)->runstate_guest.native) >> - union { >> - XEN_GUEST_HANDLE(vcpu_runstate_info_t) native; >> - XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat; >> - } runstate_guest; /* guest address */ >> -#endif >> + >> unsigned int new_state; >> >> /* Has the FPU been initialised? */ >> -- >> 2.17.1 ^ permalink raw reply [flat|nested] 26+ messages in thread
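For context on why the flag/barrier ordering in the update_runstate_area() hunk above matters: the consumer is expected to treat XEN_RUNSTATE_UPDATE as a retry signal, re-reading the area while an update is in flight. Below is a rough single-file model of such a reader loop; the struct and function names are invented for the sketch (this is not the Linux implementation, which additionally uses proper barriers where the comments indicate).

```c
#include <stdint.h>

#define XEN_RUNSTATE_UPDATE (UINT64_C(1) << 63)

/* Simplified stand-in for struct vcpu_runstate_info. */
struct runstate_model {
    int      state;
    uint64_t state_entry_time;
    uint64_t time[4];
};

/* Snapshot the shared area, retrying while the hypervisor signals an
 * in-progress update (flag set) or state_entry_time changed mid-copy. */
void runstate_snapshot(const volatile struct runstate_model *shared,
                       struct runstate_model *out)
{
    uint64_t t;

    do {
        t = shared->state_entry_time;
        /* smp_rmb() here in real code */
        out->state = shared->state;
        out->time[0] = shared->time[0];
        out->time[1] = shared->time[1];
        out->time[2] = shared->time[2];
        out->time[3] = shared->time[3];
        /* smp_rmb() here in real code */
        out->state_entry_time = shared->state_entry_time;
    } while ( (t & XEN_RUNSTATE_UPDATE) ||
              (out->state_entry_time & XEN_RUNSTATE_UPDATE) ||
              out->state_entry_time != t );
}

/* Single-threaded demo: with no update in flight the snapshot matches. */
int snapshot_demo(void)
{
    struct runstate_model shared = {
        .state = 1, .state_entry_time = 12345, .time = { 1, 2, 3, 4 },
    };
    struct runstate_model snap;

    runstate_snapshot(&shared, &snap);
    return snap.state == 1 && snap.state_entry_time == 12345 &&
           snap.time[3] == 4;
}
```

This is why the hypervisor side must set the flag, barrier, copy the payload, barrier, then clear the flag: the reader relies on that ordering to detect a concurrent update.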
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-11 11:58 ` [PATCH 1/2] " Bertrand Marquis 2020-06-11 18:16 ` Stefano Stabellini @ 2020-06-12 10:53 ` Julien Grall 2020-06-12 14:13 ` Bertrand Marquis 2020-06-12 16:51 ` Bertrand Marquis 1 sibling, 2 replies; 26+ messages in thread From: Julien Grall @ 2020-06-12 10:53 UTC (permalink / raw) To: Bertrand Marquis, xen-devel Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, nd, Volodymyr Babchuk, Roger Pau Monné Hi Bertrand, On 11/06/2020 12:58, Bertrand Marquis wrote: > At the moment on Arm, a Linux guest running with KTPI enabled will > cause the following error when a context switch happens in user mode: > (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0 I think you want to add a bit more context explaining the reason for the failure. I.e. this is because the virtual address used by the runstate is only accessible when running in kernel space. > > This patch is modifying the register runstate area handling on arm to > convert the guest address during the hypercall. During runtime context > switches the area is mapped to update the guest runstate copy. > The patch is also removing the initial copy which was done during the > hypercall handling as this is done on the current core when the context > is restore to go back to guest execution on arm. This explains what the commit does but not why we want to translate the virtual address at hypercall time. More importantly, this doesn't explain the restrictions added on the hypercall and why they are fine. Note that they should also be documented in the public header. > > As a consequence if the register runstate hypercall is called on one > vcpu for an other vcpu, the area will not be updated until the vcpu > will be executed (on arm only). The code below suggests the hypercall will just fail if you call it from a different vCPU. Is that correct? 
> > On x86, the behaviour is not modified, the address conversion is done > during the context switch and the area is updated fully during the > hypercall. > inline functions in headers could not be used as the architecture > domain.h is included before the global domain.h making it impossible > to use the struct vcpu inside the architecture header. > This should not have any performance impact as the hypercall is only > called once per vcpu usually. > > Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> > --- > xen/arch/arm/domain.c | 109 +++++++++++++++++++++++++++++------ > xen/arch/x86/domain.c | 30 +++++++++- > xen/arch/x86/x86_64/domain.c | 4 +- > xen/common/domain.c | 19 ++---- > xen/include/asm-arm/domain.h | 8 +++ > xen/include/asm-x86/domain.h | 16 +++++ > xen/include/xen/domain.h | 4 ++ > xen/include/xen/sched.h | 16 +---- > 8 files changed, 153 insertions(+), 53 deletions(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index 31169326b2..739059234f 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -19,6 +19,7 @@ > #include <xen/sched.h> > #include <xen/softirq.h> > #include <xen/wait.h> > +#include <xen/domain_page.h> > > #include <asm/alternative.h> > #include <asm/cpuerrata.h> > @@ -275,36 +276,104 @@ static void ctxt_switch_to(struct vcpu *n) > virt_timer_restore(n); > } > > -/* Update per-VCPU guest runstate shared memory area (if registered). */ > -static void update_runstate_area(struct vcpu *v) > +void arch_cleanup_runstate_guest(struct vcpu *v) I would prefer if this is name arch_vcpu_cleanup_runstate() as this is per-vCPU and not per-domain information. 
> { > - void __user *guest_handle = NULL; > - struct vcpu_runstate_info runstate; > + spin_lock(&v->arch.runstate_guest.lock); > > - if ( guest_handle_is_null(runstate_guest(v)) ) > - return; > + /* cleanup previous page if any */ > + if ( v->arch.runstate_guest.page ) > + { > + put_page_and_type(v->arch.runstate_guest.page); get_page_from_gva() is only grabbing a reference on the page. So you want to use put_page() here. Note that we don't have type reference on Arm, so it is equivalent to put_page(). But this wouldn't be technically correct :). > + v->arch.runstate_guest.page = NULL; > + v->arch.runstate_guest.offset = 0; > + } > > - memcpy(&runstate, &v->runstate, sizeof(runstate)); > + spin_unlock(&v->arch.runstate_guest.lock); > +} > > - if ( VM_ASSIST(v->domain, runstate_update_flag) ) > +int arch_setup_runstate_guest(struct vcpu *v, > + struct vcpu_register_runstate_memory_area area) The indentation looks off here. Also, same remark for the naming. > +{ > + struct page_info *page; > + unsigned offset; > + > + spin_lock(&v->arch.runstate_guest.lock); > + > + /* cleanup previous page if any */ > + if ( v->arch.runstate_guest.page ) > { > - guest_handle = &v->runstate_guest.p->state_entry_time + 1; > - guest_handle--; > - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > - __raw_copy_to_guest(guest_handle, > - (void *)(&runstate.state_entry_time + 1) - 1, 1); > - smp_wmb(); > + put_page_and_type(v->arch.runstate_guest.page); Same remark here. Although I would prefer if we try to have a common helper for cleaning up the runstate. Maybe cleanup_runstate_vcpu_locked()? 
> + v->arch.runstate_guest.page = NULL; > + v->arch.runstate_guest.offset = 0; > + } > + > + offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; > + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) > + { > + spin_unlock(&v->arch.runstate_guest.lock); > + gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); > + return -EINVAL; > + } > + > + /* provided address must be aligned to a 64bit */ > + if ( offset % alignof(struct vcpu_runstate_info) ) > + { > + spin_unlock(&v->arch.runstate_guest.lock); > + gprintk(XENLOG_DEBUG, "Runstate pointer is not aligned\n"); > + return -EINVAL; > + } > + > + page = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); In the current implementation, 0 was used to unregister the runstate area. I think we want to keep that feature and not throw an error. > + if ( !page ) > + { > + spin_unlock(&v->arch.runstate_guest.lock); > + gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); > + return -EINVAL; > } > > - __copy_to_guest(runstate_guest(v), &runstate, 1); > + v->arch.runstate_guest.page = page; > + v->arch.runstate_guest.offset = offset; > + In the current implementation, the runstate area was updated with the latest information during the hypercall. This doesn't seem to happen anymore. Is there any specific reason? > + spin_unlock(&v->arch.runstate_guest.lock); > + > + return 0; > +} > + > + > +/* Update per-VCPU guest runstate shared memory area (if registered). 
*/ > +static void update_runstate_area(struct vcpu *v) > +{ > + struct vcpu_runstate_info *guest_runstate; > + void *p; > + > + spin_lock(&v->arch.runstate_guest.lock); > > - if ( guest_handle ) > + if ( v->arch.runstate_guest.page ) > { > - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > + p = __map_domain_page(v->arch.runstate_guest.page); > + guest_runstate = p + v->arch.runstate_guest.offset; > + > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > + { > + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > + smp_wmb(); > + } > + > + memcpy(guest_runstate, &v->runstate, sizeof(v->runstate)); > smp_wmb(); > - __raw_copy_to_guest(guest_handle, > - (void *)(&runstate.state_entry_time + 1) - 1, 1); > + > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > + { > + v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > + guest_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; > + smp_wmb(); Why do you need this barrier? [...] > diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h > index 4e2f582006..3a7f53e13d 100644 > --- a/xen/include/asm-arm/domain.h > +++ b/xen/include/asm-arm/domain.h > @@ -11,6 +11,7 @@ > #include <asm/vgic.h> > #include <asm/vpl011.h> > #include <public/hvm/params.h> > +#include <public/vcpu.h> Why do you need to add this new include? > #include <xen/serial.h> > #include <xen/rbtree.h> > > @@ -206,6 +207,13 @@ struct arch_vcpu > */ > bool need_flush_to_ram; > > + /* runstate guest info */ > + struct { > + struct page_info *page; /* guest page */ > + unsigned offset; /* offset in page */ Please use unsigned int. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 26+ messages in thread
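The two sanity checks under review in the quoted arch_setup_runstate_guest() — rejecting an area that would cross a page boundary (until patch 2/2 adds support) and requiring 64-bit alignment — can be exercised in isolation. This sketch mirrors the logic of the hunk with stand-in types; PAGE_SIZE and the structure layout are assumptions made for the illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Stand-in for struct vcpu_runstate_info: only its size (40 bytes)
 * and alignment (8 bytes) matter for these checks. */
struct runstate_info_model {
    uint64_t state_entry_time;
    uint64_t time[4];
};

/* Mirror of the checks in the quoted hunk: reject areas that would
 * cross a page boundary or are not 64-bit aligned.  Returns 1 when
 * the guest virtual address would be accepted. */
int runstate_addr_ok(uintptr_t va)
{
    size_t offset = va & (PAGE_SIZE - 1);

    if ( offset > PAGE_SIZE - sizeof(struct runstate_info_model) )
        return 0;               /* runstate would cross pages */
    if ( offset % _Alignof(struct runstate_info_model) )
        return 0;               /* pointer is not 64-bit aligned */
    return 1;
}
```

For example, with the 40-byte structure an offset of 0xFD8 (4056) still fits within the page, 0xFF8 does not, and 0x004 is rejected for alignment.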
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-12 10:53 ` Julien Grall @ 2020-06-12 14:13 ` Bertrand Marquis 2020-06-12 19:56 ` Julien Grall 2020-06-12 16:51 ` Bertrand Marquis 1 sibling, 1 reply; 26+ messages in thread From: Bertrand Marquis @ 2020-06-12 14:13 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné Hi Julien, > On 12 Jun 2020, at 11:53, Julien Grall <julien@xen.org> wrote: > > Hi Bertrand, > > On 11/06/2020 12:58, Bertrand Marquis wrote: >> At the moment on Arm, a Linux guest running with KTPI enabled will >> cause the following error when a context switch happens in user mode: >> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0 > > I think you want to add a bit more context explaining the reason on the failure. I.e. this is because the virtual address used by the runstate is only accessible when running in kernel space. Ok > >> This patch is modifying the register runstate area handling on arm to >> convert the guest address during the hypercall. During runtime context >> switches the area is mapped to update the guest runstate copy. >> The patch is also removing the initial copy which was done during the >> hypercall handling as this is done on the current core when the context >> is restore to go back to guest execution on arm. > > This is explaining what the commit does but not why we want to translate the virtual address at hypercall time. More importantly, this doesn't explain the restrictions added on the hypercall and why they are fine. > > Note that they also should be documented in the public header. Ok > >> As a consequence if the register runstate hypercall is called on one >> vcpu for an other vcpu, the area will not be updated until the vcpu >> will be executed (on arm only). 
> > The code below suggests the hypercall will just fail if you call it from a different vCPU. Is that correct? No, the hypercall will work, but the area in this case won’t be updated. When the hypercall is called on the current vcpu, the area will be updated when the context is restored before returning to the guest, so there is no need to do it an extra time in the hypercall itself. When this is done for a different vcpu, the current code no longer updates the area. I did think at first of doing it, but the complexity it was adding (mainly due to the dual page case) was very high and I could not really think of a use case or find one in Linux. But now that I think of it I could, in that specific case, use a copy_to_guest. Do you think this is something which would make sense to restore? Anyway I will fix the commit message to be clearer on that part. 
>> Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> >> --- >> xen/arch/arm/domain.c | 109 +++++++++++++++++++++++++++++------ >> xen/arch/x86/domain.c | 30 +++++++++- >> xen/arch/x86/x86_64/domain.c | 4 +- >> xen/common/domain.c | 19 ++---- >> xen/include/asm-arm/domain.h | 8 +++ >> xen/include/asm-x86/domain.h | 16 +++++ >> xen/include/xen/domain.h | 4 ++ >> xen/include/xen/sched.h | 16 +---- >> 8 files changed, 153 insertions(+), 53 deletions(-) >> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c >> index 31169326b2..739059234f 100644 >> --- a/xen/arch/arm/domain.c >> +++ b/xen/arch/arm/domain.c >> @@ -19,6 +19,7 @@ >> #include <xen/sched.h> >> #include <xen/softirq.h> >> #include <xen/wait.h> >> +#include <xen/domain_page.h> >> #include <asm/alternative.h> >> #include <asm/cpuerrata.h> >> @@ -275,36 +276,104 @@ static void ctxt_switch_to(struct vcpu *n) >> virt_timer_restore(n); >> } >> -/* Update per-VCPU guest runstate shared memory area (if registered). */ >> -static void update_runstate_area(struct vcpu *v) >> +void arch_cleanup_runstate_guest(struct vcpu *v) > > I would prefer if this is name arch_vcpu_cleanup_runstate() as this is per-vCPU and not per-domain information. Ok > >> { >> - void __user *guest_handle = NULL; >> - struct vcpu_runstate_info runstate; >> + spin_lock(&v->arch.runstate_guest.lock); >> - if ( guest_handle_is_null(runstate_guest(v)) ) >> - return; >> + /* cleanup previous page if any */ >> + if ( v->arch.runstate_guest.page ) >> + { >> + put_page_and_type(v->arch.runstate_guest.page); > > get_page_from_gva() is only grabbing a reference on the page. So you want to use put_page() here. > > Note that we don't have type reference on Arm, so it equivalent to put_page(). But this wouldn't be technically correct :). Ok, thanks for the explanation. 
> >> + v->arch.runstate_guest.page = NULL; >> + v->arch.runstate_guest.offset = 0; >> + } >> - memcpy(&runstate, &v->runstate, sizeof(runstate)); >> + spin_unlock(&v->arch.runstate_guest.lock); >> +} >> - if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> +int arch_setup_runstate_guest(struct vcpu *v, >> + struct vcpu_register_runstate_memory_area area) > > The indentation looks off here. > > Also, same remark for the naming. Ok > > >> +{ >> + struct page_info *page; >> + unsigned offset; >> + >> + spin_lock(&v->arch.runstate_guest.lock); >> + >> + /* cleanup previous page if any */ >> + if ( v->arch.runstate_guest.page ) >> { >> - guest_handle = &v->runstate_guest.p->state_entry_time + 1; >> - guest_handle--; >> - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >> - __raw_copy_to_guest(guest_handle, >> - (void *)(&runstate.state_entry_time + 1) - 1, 1); >> - smp_wmb(); >> + put_page_and_type(v->arch.runstate_guest.page); > > Same remark here. Although I would prefer if we try to have a common helper to cleaning up the runstate. Maybe cleanup_runstate_vcpu_locked()? Agree I will add a static function and use it on those 2 locations. > >> + v->arch.runstate_guest.page = NULL; >> + v->arch.runstate_guest.offset = 0; >> + } >> + >> + offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; >> + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) >> + { >> + spin_unlock(&v->arch.runstate_guest.lock); >> + gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); >> + return -EINVAL; >> + } >> + >> + /* provided address must be aligned to a 64bit */ >> + if ( offset % alignof(struct vcpu_runstate_info) ) >> + { >> + spin_unlock(&v->arch.runstate_guest.lock); >> + gprintk(XENLOG_DEBUG, "Runstate pointer is not aligned\n"); >> + return -EINVAL; >> + } >> + >> + page = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); > > In the current implementation, 0 was used to unregister the runstate area. I think we want to keep that feature and not throw an error. 
I will add this use case (I missed it). > >> + if ( !page ) >> + { >> + spin_unlock(&v->arch.runstate_guest.lock); >> + gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); >> + return -EINVAL; >> } >> - __copy_to_guest(runstate_guest(v), &runstate, 1); >> + v->arch.runstate_guest.page = page; >> + v->arch.runstate_guest.offset = offset; >> + > > In the current implementation, the runstate area was update with the latest information during the hypercall. This doesn't seem to happen anymore. Is there any specific reason? As explained before, this is still happening on the current vcpu as a result of the context switch back to the guest at the end of the hypercall. This is not done if called for a different vcpu anymore, but I could add it back using copy_to_guest > >> + spin_unlock(&v->arch.runstate_guest.lock); >> + >> + return 0; >> +} >> + >> + >> +/* Update per-VCPU guest runstate shared memory area (if registered). */ >> +static void update_runstate_area(struct vcpu *v) >> +{ >> + struct vcpu_runstate_info *guest_runstate; >> + void *p; >> + >> + spin_lock(&v->arch.runstate_guest.lock); >> - if ( guest_handle ) >> + if ( v->arch.runstate_guest.page ) >> { >> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >> + p = __map_domain_page(v->arch.runstate_guest.page); >> + guest_runstate = p + v->arch.runstate_guest.offset; >> + >> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> + { >> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; >> + smp_wmb(); >> + } >> + >> + memcpy(guest_runstate, &v->runstate, sizeof(v->runstate)); >> smp_wmb(); >> - __raw_copy_to_guest(guest_handle, >> - (void *)(&runstate.state_entry_time + 1) - 1, 1); >> + >> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> + { >> + v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >> + guest_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; >> + smp_wmb(); > > Why do you need this barrier? 
As the bit is supposed to be used to wait for an update to be done, I felt it was better to push this out as soon as possible. As there is already one after the memcpy, the cost should be fairly limited here. > > > [...] > >> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h >> index 4e2f582006..3a7f53e13d 100644 >> --- a/xen/include/asm-arm/domain.h >> +++ b/xen/include/asm-arm/domain.h >> @@ -11,6 +11,7 @@ >> #include <asm/vgic.h> >> #include <asm/vpl011.h> >> #include <public/hvm/params.h> >> +#include <public/vcpu.h> > > Why do you need to add this new include? > >> #include <xen/serial.h> >> #include <xen/rbtree.h> >> @@ -206,6 +207,13 @@ struct arch_vcpu >> */ >> bool need_flush_to_ram; >> + /* runstate guest info */ >> + struct { >> + struct page_info *page; /* guest page */ >> + unsigned offset; /* offset in page */ > > Please use unsigned int. Ok Thanks for your comments. Bertrand ^ permalink raw reply [flat|nested] 26+ messages in thread
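The offset checks being discussed in this review are self-contained enough to model in plain C. The sketch below mirrors the registration-time validation from the patch (structure layout simplified, PAGE_SIZE assumed to be 4 KiB as on Arm; this is an illustration, not the hypervisor code itself):

```c
#include <stdalign.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Simplified stand-in for the public vcpu_runstate_info layout. */
struct vcpu_runstate_info {
    int      state;
    uint64_t state_entry_time;
    uint64_t time[4];
};

/* Mirror of the registration-time checks in patch 1: the area must not
 * straddle a page boundary and must be 64-bit aligned. Returns 0 if
 * the address is acceptable, nonzero otherwise. */
static int check_runstate_va(uintptr_t va)
{
    unsigned long offset = va & ~PAGE_MASK;

    if ( offset > PAGE_SIZE - sizeof(struct vcpu_runstate_info) )
        return -1;                      /* crosses a page boundary */

    if ( offset % alignof(struct vcpu_runstate_info) )
        return -1;                      /* not 64-bit aligned */

    return 0;
}
```

Patch 2 removes the first check; the alignment check remains in both versions.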
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall
  2020-06-12 14:13     ` Bertrand Marquis
@ 2020-06-12 19:56       ` Julien Grall
  0 siblings, 0 replies; 26+ messages in thread
From: Julien Grall @ 2020-06-12 19:56 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Jan Beulich, xen-devel, nd, Volodymyr Babchuk,
	Roger Pau Monné

On 12/06/2020 15:13, Bertrand Marquis wrote:
> Hi Julien,

Hi,

>> On 12 Jun 2020, at 11:53, Julien Grall <julien@xen.org> wrote:
>> The code below suggests the hypercall will just fail if you call it
>> from a different vCPU. Is that correct?
>
> No, the hypercall will work, but in this case the area won't be updated.
> When the hypercall is called on the current vCPU, the area will be
> updated when the context is restored before returning to the guest, so
> there is no need to do it an extra time in the hypercall itself.

I am afraid this is not correct: update_runstate_area() is only called
when context switching between 2 vCPUs. So the only way this could be
called on return from the hypercall is if the vCPU gets preempted.

> When this is done for a different vCPU, the current code is not
> updating the area anymore.
>
> I did think at first of doing it, but the complexity it added (mainly
> due to the dual-page case) was very high and I could not really think
> of a use case or find one in Linux.

You may want to have an updated view without forcing the vCPU to exit to
the hypervisor and do a context switch.

> But now that I think of it I could, in that specific case, use a
> copy_to_guest.

I think you should be able to call update_runstate_area() directly.

>
> Do you think this is something which would make sense to restore?

I would like to avoid diverging too much from the current definition of
the hypercall. In this case, I don't think the restriction you add is
necessary.
>> >>> + if ( !page ) >>> + { >>> + spin_unlock(&v->arch.runstate_guest.lock); >>> + gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); >>> + return -EINVAL; >>> } >>> - __copy_to_guest(runstate_guest(v), &runstate, 1); >>> + v->arch.runstate_guest.page = page; >>> + v->arch.runstate_guest.offset = offset; >>> + >> >> In the current implementation, the runstate area was update with the latest information during the hypercall. This doesn't seem to happen anymore. Is there any specific reason? > > As explained before, this is still happening on the current vcpu as a result of the context switch back to the guest at the end of the hypercall. See above, I don't believe this is correct. :). >>> + spin_unlock(&v->arch.runstate_guest.lock); >>> + >>> + return 0; >>> +} >>> + >>> + >>> +/* Update per-VCPU guest runstate shared memory area (if registered). */ >>> +static void update_runstate_area(struct vcpu *v) >>> +{ >>> + struct vcpu_runstate_info *guest_runstate; >>> + void *p; >>> + >>> + spin_lock(&v->arch.runstate_guest.lock); >>> - if ( guest_handle ) >>> + if ( v->arch.runstate_guest.page ) >>> { >>> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >>> + p = __map_domain_page(v->arch.runstate_guest.page); >>> + guest_runstate = p + v->arch.runstate_guest.offset; >>> + >>> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >>> + { >>> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >>> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; >>> + smp_wmb(); >>> + } >>> + >>> + memcpy(guest_runstate, &v->runstate, sizeof(v->runstate)); >>> smp_wmb(); >>> - __raw_copy_to_guest(guest_handle, >>> - (void *)(&runstate.state_entry_time + 1) - 1, 1); >>> + >>> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >>> + { >>> + v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >>> + guest_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; >>> + smp_wmb(); >> >> Why do you need this barrier? 
>
> As the bit is supposed to be used to wait for an update to be done, I
> felt it was better to push this out as soon as possible.

smp_wmb() only ensures that any write before the barrier will be seen
before any write after the barrier. It doesn't guarantee the other end
will see it right after it...

> As there is already one after the memcpy, the cost should be fairly
> limited here.

... so this feels more like a micro-optimization (happy to be proved
wrong). The memory barriers are already tricky enough to use/understand,
so I would rather not use more than necessary.

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread
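For context, the XEN_RUNSTATE_UPDATE bit these barriers publish is consumed by the guest like a sequence lock: readers retry while the bit is set, or while state_entry_time changed mid-copy. A plain userspace model of the reader side (layout simplified; barriers elided since this model is single-threaded):

```c
#include <stdint.h>
#include <string.h>

#define XEN_RUNSTATE_UPDATE (1ULL << 63)

/* Simplified stand-in for the public vcpu_runstate_info layout. */
struct vcpu_runstate_info {
    int      state;
    uint64_t state_entry_time;   /* top bit doubles as the update flag */
    uint64_t time[4];
};

/* Reader side of the protocol: retry while the hypervisor has
 * XEN_RUNSTATE_UPDATE set, or if state_entry_time changed during the
 * copy. In real guest code an smp_rmb() would sit between the flag
 * read and the copy; the explicit cast drops volatile for memcpy. */
static struct vcpu_runstate_info
read_runstate(const volatile struct vcpu_runstate_info *shared)
{
    struct vcpu_runstate_info snap;
    uint64_t t;

    do {
        t = shared->state_entry_time;
        memcpy(&snap, (const void *)shared, sizeof(snap));
    } while ( (t & XEN_RUNSTATE_UPDATE) || snap.state_entry_time != t );

    return snap;
}
```

This is why the writer must set the bit before the copy and clear it after: the barriers order those three stores so a concurrent reader never trusts a half-written snapshot.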
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-12 10:53 ` Julien Grall 2020-06-12 14:13 ` Bertrand Marquis @ 2020-06-12 16:51 ` Bertrand Marquis 2020-06-12 20:31 ` Julien Grall 1 sibling, 1 reply; 26+ messages in thread From: Bertrand Marquis @ 2020-06-12 16:51 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné Hi, > On 12 Jun 2020, at 11:53, Julien Grall <julien@xen.org> wrote: > > Hi Bertrand, > > On 11/06/2020 12:58, Bertrand Marquis wrote: >> At the moment on Arm, a Linux guest running with KTPI enabled will >> cause the following error when a context switch happens in user mode: >> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0 > > I think you want to add a bit more context explaining the reason on the failure. I.e. this is because the virtual address used by the runstate is only accessible when running in kernel space. > >> This patch is modifying the register runstate area handling on arm to >> convert the guest address during the hypercall. During runtime context >> switches the area is mapped to update the guest runstate copy. >> The patch is also removing the initial copy which was done during the >> hypercall handling as this is done on the current core when the context >> is restore to go back to guest execution on arm. > > This is explaining what the commit does but not why we want to translate the virtual address at hypercall time. More importantly, this doesn't explain the restrictions added on the hypercall and why they are fine. > > Note that they also should be documented in the public header. > >> As a consequence if the register runstate hypercall is called on one >> vcpu for an other vcpu, the area will not be updated until the vcpu >> will be executed (on arm only). > > The code below suggests the hypercall will just fail if you call it from a different vCPU. 
Is that correct? > >> On x86, the behaviour is not modified, the address conversion is done >> during the context switch and the area is updated fully during the >> hypercall. >> inline functions in headers could not be used as the architecture >> domain.h is included before the global domain.h making it impossible >> to use the struct vcpu inside the architecture header. >> This should not have any performance impact as the hypercall is only >> called once per vcpu usually. >> Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> >> --- >> xen/arch/arm/domain.c | 109 +++++++++++++++++++++++++++++------ >> xen/arch/x86/domain.c | 30 +++++++++- >> xen/arch/x86/x86_64/domain.c | 4 +- >> xen/common/domain.c | 19 ++---- >> xen/include/asm-arm/domain.h | 8 +++ >> xen/include/asm-x86/domain.h | 16 +++++ >> xen/include/xen/domain.h | 4 ++ >> xen/include/xen/sched.h | 16 +---- >> 8 files changed, 153 insertions(+), 53 deletions(-) >> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c >> index 31169326b2..739059234f 100644 >> --- a/xen/arch/arm/domain.c >> +++ b/xen/arch/arm/domain.c >> @@ -19,6 +19,7 @@ >> #include <xen/sched.h> >> #include <xen/softirq.h> >> #include <xen/wait.h> >> +#include <xen/domain_page.h> >> #include <asm/alternative.h> >> #include <asm/cpuerrata.h> >> @@ -275,36 +276,104 @@ static void ctxt_switch_to(struct vcpu *n) >> virt_timer_restore(n); >> } >> -/* Update per-VCPU guest runstate shared memory area (if registered). */ >> -static void update_runstate_area(struct vcpu *v) >> +void arch_cleanup_runstate_guest(struct vcpu *v) > > I would prefer if this is name arch_vcpu_cleanup_runstate() as this is per-vCPU and not per-domain information. 
> >> { >> - void __user *guest_handle = NULL; >> - struct vcpu_runstate_info runstate; >> + spin_lock(&v->arch.runstate_guest.lock); >> - if ( guest_handle_is_null(runstate_guest(v)) ) >> - return; >> + /* cleanup previous page if any */ >> + if ( v->arch.runstate_guest.page ) >> + { >> + put_page_and_type(v->arch.runstate_guest.page); > > get_page_from_gva() is only grabbing a reference on the page. So you want to use put_page() here. > > Note that we don't have type reference on Arm, so it equivalent to put_page(). But this wouldn't be technically correct :). > >> + v->arch.runstate_guest.page = NULL; >> + v->arch.runstate_guest.offset = 0; >> + } >> - memcpy(&runstate, &v->runstate, sizeof(runstate)); >> + spin_unlock(&v->arch.runstate_guest.lock); >> +} >> - if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> +int arch_setup_runstate_guest(struct vcpu *v, >> + struct vcpu_register_runstate_memory_area area) > > The indentation looks off here. > > Also, same remark for the naming. > > >> +{ >> + struct page_info *page; >> + unsigned offset; >> + >> + spin_lock(&v->arch.runstate_guest.lock); >> + >> + /* cleanup previous page if any */ >> + if ( v->arch.runstate_guest.page ) >> { >> - guest_handle = &v->runstate_guest.p->state_entry_time + 1; >> - guest_handle--; >> - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >> - __raw_copy_to_guest(guest_handle, >> - (void *)(&runstate.state_entry_time + 1) - 1, 1); >> - smp_wmb(); >> + put_page_and_type(v->arch.runstate_guest.page); > > Same remark here. Although I would prefer if we try to have a common helper to cleaning up the runstate. Maybe cleanup_runstate_vcpu_locked()? 
> >> + v->arch.runstate_guest.page = NULL; >> + v->arch.runstate_guest.offset = 0; >> + } >> + >> + offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; >> + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) >> + { >> + spin_unlock(&v->arch.runstate_guest.lock); >> + gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); >> + return -EINVAL; >> + } >> + >> + /* provided address must be aligned to a 64bit */ >> + if ( offset % alignof(struct vcpu_runstate_info) ) >> + { >> + spin_unlock(&v->arch.runstate_guest.lock); >> + gprintk(XENLOG_DEBUG, "Runstate pointer is not aligned\n"); >> + return -EINVAL; >> + } >> + >> + page = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); > > In the current implementation, 0 was used to unregister the runstate area. I think we want to keep that feature and not throw an error. > >> + if ( !page ) >> + { >> + spin_unlock(&v->arch.runstate_guest.lock); >> + gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); >> + return -EINVAL; >> } >> - __copy_to_guest(runstate_guest(v), &runstate, 1); >> + v->arch.runstate_guest.page = page; >> + v->arch.runstate_guest.offset = offset; >> + > > In the current implementation, the runstate area was update with the latest information during the hypercall. This doesn't seem to happen anymore. Is there any specific reason? > >> + spin_unlock(&v->arch.runstate_guest.lock); >> + >> + return 0; >> +} >> + >> + >> +/* Update per-VCPU guest runstate shared memory area (if registered). 
*/ >> +static void update_runstate_area(struct vcpu *v) >> +{ >> + struct vcpu_runstate_info *guest_runstate; >> + void *p; >> + >> + spin_lock(&v->arch.runstate_guest.lock); >> - if ( guest_handle ) >> + if ( v->arch.runstate_guest.page ) >> { >> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >> + p = __map_domain_page(v->arch.runstate_guest.page); >> + guest_runstate = p + v->arch.runstate_guest.offset; >> + >> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> + { >> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; >> + smp_wmb(); >> + } >> + >> + memcpy(guest_runstate, &v->runstate, sizeof(v->runstate)); >> smp_wmb(); >> - __raw_copy_to_guest(guest_handle, >> - (void *)(&runstate.state_entry_time + 1) - 1, 1); >> + >> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> + { >> + v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >> + guest_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; >> + smp_wmb(); > > Why do you need this barrier? > > > [...] > >> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h >> index 4e2f582006..3a7f53e13d 100644 >> --- a/xen/include/asm-arm/domain.h >> +++ b/xen/include/asm-arm/domain.h >> @@ -11,6 +11,7 @@ >> #include <asm/vgic.h> >> #include <asm/vpl011.h> >> #include <public/hvm/params.h> >> +#include <public/vcpu.h> > > Why do you need to add this new include? Sorry I forgot to answer to this one. This is needed to have the definition of vcpu_register_runstate_memory_area. Bertrand ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-12 16:51 ` Bertrand Marquis @ 2020-06-12 20:31 ` Julien Grall 2020-06-15 14:01 ` Bertrand Marquis 0 siblings, 1 reply; 26+ messages in thread From: Julien Grall @ 2020-06-12 20:31 UTC (permalink / raw) To: Bertrand Marquis Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné On 12/06/2020 17:51, Bertrand Marquis wrote: Hi, >>> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h >>> index 4e2f582006..3a7f53e13d 100644 >>> --- a/xen/include/asm-arm/domain.h >>> +++ b/xen/include/asm-arm/domain.h >>> @@ -11,6 +11,7 @@ >>> #include <asm/vgic.h> >>> #include <asm/vpl011.h> >>> #include <public/hvm/params.h> >>> +#include <public/vcpu.h> >> >> Why do you need to add this new include? > > Sorry I forgot to answer to this one. > This is needed to have the definition of vcpu_register_runstate_memory_area. I might be missing something because nothing you have added this header seem to require vcpu_register_runstate_memory_area. So should the include be done in a different header? Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-12 20:31 ` Julien Grall @ 2020-06-15 14:01 ` Bertrand Marquis 0 siblings, 0 replies; 26+ messages in thread From: Bertrand Marquis @ 2020-06-15 14:01 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné > On 12 Jun 2020, at 21:31, Julien Grall <julien@xen.org> wrote: > > > > On 12/06/2020 17:51, Bertrand Marquis wrote: > > Hi, > >>>> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h >>>> index 4e2f582006..3a7f53e13d 100644 >>>> --- a/xen/include/asm-arm/domain.h >>>> +++ b/xen/include/asm-arm/domain.h >>>> @@ -11,6 +11,7 @@ >>>> #include <asm/vgic.h> >>>> #include <asm/vpl011.h> >>>> #include <public/hvm/params.h> >>>> +#include <public/vcpu.h> >>> >>> Why do you need to add this new include? >> Sorry I forgot to answer to this one. >> This is needed to have the definition of vcpu_register_runstate_memory_area. > > I might be missing something because nothing you have added this header seem to require vcpu_register_runstate_memory_area. So should the include be done in a different header? Right. This was required before when I had the definitions of the interface per arch to have a static inline for x86. This is not needed and I will remove that include. Cheers Bertrand ^ permalink raw reply [flat|nested] 26+ messages in thread
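For reference, the guest side of the interface discussed in this subthread registers a per-vCPU area once at boot. A Linux-style sketch (wrapper and helper names as used in Linux guests, assumed here, not part of this patch):

```c
/* Guest-side sketch (Linux-style names, assumed): register a per-vCPU
 * runstate area. With KPTI, this kernel virtual address is unmapped
 * while the user page tables are active, which is why translating it
 * once at hypercall time avoids the page-table-walk failure that
 * motivates this series. */
static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate);

static void xen_setup_runstate_info(unsigned int cpu)
{
    struct vcpu_register_runstate_memory_area area;

    area.addr.v = &per_cpu(xen_runstate, cpu);

    if ( HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area,
                            cpu, &area) )
        pr_warn("could not register runstate area for vCPU %u\n", cpu);
}
```

Because the registered address is a per-CPU kernel variable, it normally sits well inside a page, which is why the page-crossing case handled by patch 2 is rare in practice.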
* [PATCH 2/2] xen/arm: Support runstate crossing pages 2020-06-11 11:58 [PATCH 0/2] xen/arm: Convert runstate address during hypcall Bertrand Marquis 2020-06-11 11:58 ` [PATCH 1/2] " Bertrand Marquis @ 2020-06-11 11:58 ` Bertrand Marquis 2020-06-12 1:10 ` Stefano Stabellini 2020-06-12 12:14 ` Julien Grall 1 sibling, 2 replies; 26+ messages in thread From: Bertrand Marquis @ 2020-06-11 11:58 UTC (permalink / raw) To: xen-devel; +Cc: Volodymyr Babchuk, nd, Julien Grall, Stefano Stabellini Add support for runstate area register with the structure crossing pages The code is storing up to 2 pages reference during the hypercall. During a context switch, the code is computing where the state_entry_time is and is breaking the memcpy in 2 parts when it is required. Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> --- xen/arch/arm/domain.c | 101 +++++++++++++++++++++++++---------- xen/include/asm-arm/domain.h | 5 +- 2 files changed, 76 insertions(+), 30 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 739059234f..d847cb00f2 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -280,11 +280,16 @@ void arch_cleanup_runstate_guest(struct vcpu *v) { spin_lock(&v->arch.runstate_guest.lock); - /* cleanup previous page if any */ - if ( v->arch.runstate_guest.page ) + /* cleanup previous pages if any */ + if ( v->arch.runstate_guest.page[0] ) { - put_page_and_type(v->arch.runstate_guest.page); - v->arch.runstate_guest.page = NULL; + put_page_and_type(v->arch.runstate_guest.page[0]); + v->arch.runstate_guest.page[0] = NULL; + if ( v->arch.runstate_guest.page[1] ) + { + put_page_and_type(v->arch.runstate_guest.page[1]); + v->arch.runstate_guest.page[1] = NULL; + } v->arch.runstate_guest.offset = 0; } @@ -294,26 +299,25 @@ void arch_cleanup_runstate_guest(struct vcpu *v) int arch_setup_runstate_guest(struct vcpu *v, struct vcpu_register_runstate_memory_area area) { - struct page_info *page; + struct page_info *page[2] = {NULL,NULL}; 
unsigned offset; spin_lock(&v->arch.runstate_guest.lock); - /* cleanup previous page if any */ - if ( v->arch.runstate_guest.page ) + /* cleanup previous pages if any */ + if ( v->arch.runstate_guest.page[0] ) { - put_page_and_type(v->arch.runstate_guest.page); - v->arch.runstate_guest.page = NULL; + put_page_and_type(v->arch.runstate_guest.page[0]); + v->arch.runstate_guest.page[0] = NULL; + if ( v->arch.runstate_guest.page[1] ) + { + put_page_and_type(v->arch.runstate_guest.page[1]); + v->arch.runstate_guest.page[1] = NULL; + } v->arch.runstate_guest.offset = 0; } offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; - if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) - { - spin_unlock(&v->arch.runstate_guest.lock); - gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); - return -EINVAL; - } /* provided address must be aligned to a 64bit */ if ( offset % alignof(struct vcpu_runstate_info) ) @@ -323,15 +327,30 @@ int arch_setup_runstate_guest(struct vcpu *v, return -EINVAL; } - page = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); - if ( !page ) + page[0] = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); + if ( !page[0] ) { spin_unlock(&v->arch.runstate_guest.lock); gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); return -EINVAL; } - v->arch.runstate_guest.page = page; + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) + { + /* guest area is crossing pages */ + page[1] = get_page_from_gva(v, (vaddr_t)area.addr.v + PAGE_SIZE, + GV2M_WRITE); + if ( !page[1] ) + { + put_page_and_type(v->arch.runstate_guest.page[0]); + spin_unlock(&v->arch.runstate_guest.lock); + gprintk(XENLOG_DEBUG, "Runstate pointer is not fully mapped\n"); + return -EINVAL; + } + } + + v->arch.runstate_guest.page[0] = page[0]; + v->arch.runstate_guest.page[1] = page[1]; v->arch.runstate_guest.offset = offset; spin_unlock(&v->arch.runstate_guest.lock); @@ -343,34 +362,60 @@ int arch_setup_runstate_guest(struct vcpu *v, /* Update per-VCPU guest 
runstate shared memory area (if registered). */ static void update_runstate_area(struct vcpu *v) { - struct vcpu_runstate_info *guest_runstate; - void *p; + void *p[2]; + unsigned offset, time_offset; + uint64_t *state_entry_time; spin_lock(&v->arch.runstate_guest.lock); - if ( v->arch.runstate_guest.page ) + if ( v->arch.runstate_guest.page[0] ) { - p = __map_domain_page(v->arch.runstate_guest.page); - guest_runstate = p + v->arch.runstate_guest.offset; + p[0] = __map_domain_page(v->arch.runstate_guest.page[0]); + if ( v->arch.runstate_guest.page[1] ) + p[1] = __map_domain_page(v->arch.runstate_guest.page[1]); + else + p[1] = NULL; + offset = v->arch.runstate_guest.offset; if ( VM_ASSIST(v->domain, runstate_update_flag) ) { + time_offset = offset + + offsetof(struct vcpu_runstate_info, state_entry_time); + + if ( time_offset < PAGE_SIZE ) + state_entry_time = p[0] + time_offset; + else + state_entry_time = p[1] + (time_offset - PAGE_SIZE); + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; - guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; + *state_entry_time |= XEN_RUNSTATE_UPDATE; smp_wmb(); } + else + state_entry_time = NULL; + + if ( p[1] ) + { + memcpy(p[0] + offset, &v->runstate, PAGE_SIZE - offset); + memcpy(p[1], &v->runstate + (PAGE_SIZE - offset), + sizeof(v->runstate) - (PAGE_SIZE - offset)); + } + else + memcpy(p[0] + offset, &v->runstate, sizeof(v->runstate)); - memcpy(guest_runstate, &v->runstate, sizeof(v->runstate)); + /* copy must be done before switching the bit */ smp_wmb(); - if ( VM_ASSIST(v->domain, runstate_update_flag) ) + if ( state_entry_time ) { v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; - guest_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; + *state_entry_time &= ~XEN_RUNSTATE_UPDATE; smp_wmb(); } - unmap_domain_page(p); + unmap_domain_page(p[0]); + if ( p[1] ) + unmap_domain_page(p[1]); } spin_unlock(&v->arch.runstate_guest.lock); diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 
3a7f53e13d..61e32f1ed5 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -209,8 +209,9 @@ struct arch_vcpu /* runstate guest info */ struct { - struct page_info *page; /* guest page */ - unsigned offset; /* offset in page */ + /* we need 2 pages in case the guest area is crossing pages */ + struct page_info *page[2]; /* guest pages */ + unsigned offset; /* offset in first page */ spinlock_t lock; /* lock to access page */ } runstate_guest; -- 2.17.1 ^ permalink raw reply related [flat|nested] 26+ messages in thread
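The two-part copy above can be modeled in plain C. One detail worth flagging: in the posted diff, `&v->runstate + (PAGE_SIZE - offset)` advances the source pointer in units of the structure size, not bytes, so a byte-wise cast appears to be needed; the model below does all source arithmetic through `char *`:

```c
#include <string.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

/* Model of the split copy in update_runstate_area(): when the
 * registered area crosses a page boundary, the first
 * (PAGE_SIZE - offset) bytes land at the end of the first mapped page
 * and the remainder at the start of the second. All source arithmetic
 * is done in bytes via char * casts. */
static void copy_across_pages(void *p0, void *p1, unsigned long offset,
                              const void *src, size_t len)
{
    if ( offset + len > PAGE_SIZE )
    {
        size_t first = PAGE_SIZE - offset;

        memcpy((char *)p0 + offset, src, first);
        memcpy(p1, (const char *)src + first, len - first);
    }
    else
        memcpy((char *)p0 + offset, src, len);
}
```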
* Re: [PATCH 2/2] xen/arm: Support runstate crossing pages
  2020-06-11 11:58 ` [PATCH 2/2] xen/arm: Support runstate crossing pages Bertrand Marquis
@ 2020-06-12  1:10   ` Stefano Stabellini
  2020-06-12 11:37     ` Julien Grall
  0 siblings, 1 reply; 26+ messages in thread
From: Stefano Stabellini @ 2020-06-12  1:10 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Volodymyr Babchuk, xen-devel, nd, Julien Grall, Stefano Stabellini

On Thu, 11 Jun 2020, Bertrand Marquis wrote:
> Add support for runstate area register with the structure crossing pages
> The code is storing up to 2 pages reference during the hypercall.
> During a context switch, the code is computing where the
> state_entry_time is and is breaking the memcpy in 2 parts when it is
> required.
>
> Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>

Clearly a lot of effort went into this patch, thank you Bertrand.

The change is complex for the feature it adds. I wonder if we could use
ioremap (or maybe something similar but using a different virtual space)
to simplify it. Julien, do you have good ideas?

Otherwise I don't think there is much we can do to make this patch
simpler. As ugly as it is (sorry Bertrand, it is not your fault!) the
patch looks correct.
> --- > xen/arch/arm/domain.c | 101 +++++++++++++++++++++++++---------- > xen/include/asm-arm/domain.h | 5 +- > 2 files changed, 76 insertions(+), 30 deletions(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index 739059234f..d847cb00f2 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -280,11 +280,16 @@ void arch_cleanup_runstate_guest(struct vcpu *v) > { > spin_lock(&v->arch.runstate_guest.lock); > > - /* cleanup previous page if any */ > - if ( v->arch.runstate_guest.page ) > + /* cleanup previous pages if any */ > + if ( v->arch.runstate_guest.page[0] ) > { > - put_page_and_type(v->arch.runstate_guest.page); > - v->arch.runstate_guest.page = NULL; > + put_page_and_type(v->arch.runstate_guest.page[0]); > + v->arch.runstate_guest.page[0] = NULL; > + if ( v->arch.runstate_guest.page[1] ) > + { > + put_page_and_type(v->arch.runstate_guest.page[1]); > + v->arch.runstate_guest.page[1] = NULL; > + } > v->arch.runstate_guest.offset = 0; > } > > @@ -294,26 +299,25 @@ void arch_cleanup_runstate_guest(struct vcpu *v) > int arch_setup_runstate_guest(struct vcpu *v, > struct vcpu_register_runstate_memory_area area) > { > - struct page_info *page; > + struct page_info *page[2] = {NULL,NULL}; > unsigned offset; > > spin_lock(&v->arch.runstate_guest.lock); > > - /* cleanup previous page if any */ > - if ( v->arch.runstate_guest.page ) > + /* cleanup previous pages if any */ > + if ( v->arch.runstate_guest.page[0] ) > { > - put_page_and_type(v->arch.runstate_guest.page); > - v->arch.runstate_guest.page = NULL; > + put_page_and_type(v->arch.runstate_guest.page[0]); > + v->arch.runstate_guest.page[0] = NULL; > + if ( v->arch.runstate_guest.page[1] ) > + { > + put_page_and_type(v->arch.runstate_guest.page[1]); > + v->arch.runstate_guest.page[1] = NULL; > + } > v->arch.runstate_guest.offset = 0; > } > > offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; > - if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) > - { > - 
spin_unlock(&v->arch.runstate_guest.lock); > - gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); > - return -EINVAL; > - } > > /* provided address must be aligned to a 64bit */ > if ( offset % alignof(struct vcpu_runstate_info) ) > @@ -323,15 +327,30 @@ int arch_setup_runstate_guest(struct vcpu *v, > return -EINVAL; > } > > - page = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); > - if ( !page ) > + page[0] = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); > + if ( !page[0] ) > { > spin_unlock(&v->arch.runstate_guest.lock); > gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); > return -EINVAL; > } > > - v->arch.runstate_guest.page = page; > + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) > + { > + /* guest area is crossing pages */ > + page[1] = get_page_from_gva(v, (vaddr_t)area.addr.v + PAGE_SIZE, > + GV2M_WRITE); > + if ( !page[1] ) > + { > + put_page_and_type(v->arch.runstate_guest.page[0]); > + spin_unlock(&v->arch.runstate_guest.lock); > + gprintk(XENLOG_DEBUG, "Runstate pointer is not fully mapped\n"); > + return -EINVAL; > + } > + } > + > + v->arch.runstate_guest.page[0] = page[0]; > + v->arch.runstate_guest.page[1] = page[1]; > v->arch.runstate_guest.offset = offset; > > spin_unlock(&v->arch.runstate_guest.lock); > @@ -343,34 +362,60 @@ int arch_setup_runstate_guest(struct vcpu *v, > /* Update per-VCPU guest runstate shared memory area (if registered). 
*/ > static void update_runstate_area(struct vcpu *v) > { > - struct vcpu_runstate_info *guest_runstate; > - void *p; > + void *p[2]; > + unsigned offset, time_offset; > + uint64_t *state_entry_time; > > spin_lock(&v->arch.runstate_guest.lock); > > - if ( v->arch.runstate_guest.page ) > + if ( v->arch.runstate_guest.page[0] ) > { > - p = __map_domain_page(v->arch.runstate_guest.page); > - guest_runstate = p + v->arch.runstate_guest.offset; > + p[0] = __map_domain_page(v->arch.runstate_guest.page[0]); > + if ( v->arch.runstate_guest.page[1] ) > + p[1] = __map_domain_page(v->arch.runstate_guest.page[1]); > + else > + p[1] = NULL; > + offset = v->arch.runstate_guest.offset; > > if ( VM_ASSIST(v->domain, runstate_update_flag) ) > { > + time_offset = offset + > + offsetof(struct vcpu_runstate_info, state_entry_time); > + > + if ( time_offset < PAGE_SIZE ) > + state_entry_time = p[0] + time_offset; > + else > + state_entry_time = p[1] + (time_offset - PAGE_SIZE); > + > v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > - guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > + *state_entry_time |= XEN_RUNSTATE_UPDATE; > smp_wmb(); > } > + else > + state_entry_time = NULL; > + > + if ( p[1] ) > + { > + memcpy(p[0] + offset, &v->runstate, PAGE_SIZE - offset); > + memcpy(p[1], &v->runstate + (PAGE_SIZE - offset), > + sizeof(v->runstate) - (PAGE_SIZE - offset)); > + } > + else > + memcpy(p[0] + offset, &v->runstate, sizeof(v->runstate)); > > - memcpy(guest_runstate, &v->runstate, sizeof(v->runstate)); > + /* copy must be done before switching the bit */ > smp_wmb(); > > - if ( VM_ASSIST(v->domain, runstate_update_flag) ) > + if ( state_entry_time ) > { > v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > - guest_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; > + *state_entry_time &= ~XEN_RUNSTATE_UPDATE; > smp_wmb(); > } > > - unmap_domain_page(p); > + unmap_domain_page(p[0]); > + if ( p[1] ) > + unmap_domain_page(p[1]); > } > > 
spin_unlock(&v->arch.runstate_guest.lock); > diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h > index 3a7f53e13d..61e32f1ed5 100644 > --- a/xen/include/asm-arm/domain.h > +++ b/xen/include/asm-arm/domain.h > @@ -209,8 +209,9 @@ struct arch_vcpu > > /* runstate guest info */ > struct { > - struct page_info *page; /* guest page */ > - unsigned offset; /* offset in page */ > + /* we need 2 pages in case the guest area is crossing pages */ > + struct page_info *page[2]; /* guest pages */ > + unsigned offset; /* offset in first page */ > spinlock_t lock; /* lock to access page */ > } runstate_guest; ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 2/2] xen/arm: Support runstate crossing pages 2020-06-12 1:10 ` Stefano Stabellini @ 2020-06-12 11:37 ` Julien Grall 0 siblings, 0 replies; 26+ messages in thread From: Julien Grall @ 2020-06-12 11:37 UTC (permalink / raw) To: Stefano Stabellini, Bertrand Marquis; +Cc: xen-devel, nd, Volodymyr Babchuk Hi, On 12/06/2020 02:10, Stefano Stabellini wrote: > On Thu, 11 Jun 2020, Bertrand Marquis wrote: >> Add support for runstate area register with the structure crossing pages >> The code is storing up to 2 pages reference during the hypercall. >> During a context switch, the code is computing where the >> state_entry_time is and is breaking the memcpy in 2 parts when it is >> required. >> >> Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> > > Clearly a lot of effort went into this patch, thank you Bertrand. > > The change is complex for the feature it adds. I wonder if we could use > ioremap (or maybe something similar but using a different virtual space) > to simplify it. Julien, do you have good ideas? ioremap() is for MMIO regions, so you would want to use vmap(). Note that the former is just a wrapper of the latter. vmap() is probably a good start for now, but not ideal for the long term because the vmap() area is fairly small (768MB) and if we want to go towards a secret-free hypervisor on Arm, we would want to restrict the visibility of the mapping to the other CPUs. I think it would be good to have some per-CPU/per-domain mapping to limit the waste of the address space. But this will require deduplicating page-tables for arm64 (I was actually looking at it over the past few weekends). For the time being, I would suggest using vmap(). The memory is always direct mapped on arm64, so it should make no difference for security concerns. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 26+ messages in thread
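Julien's vmap() suggestion would collapse the two-page handling back into a single-page-like path. A pseudocode-level sketch of what registration could look like (assuming Xen's `void *vmap(const mfn_t *mfn, unsigned int nr)` and `vunmap()`; the `runstate` field name is hypothetical, and error unwinding is elided):

```c
/* Sketch only: assumes both page references were already taken with
 * get_page_from_gva() and that the runstate lock is held.  Not actual
 * Xen code. */
mfn_t mfn[2];
unsigned int nr = page[1] ? 2 : 1;
void *va;

mfn[0] = page_to_mfn(page[0]);
if ( page[1] )
    mfn[1] = page_to_mfn(page[1]);

/* Map the (possibly discontiguous) frames at contiguous virtual
 * addresses, so the context-switch path can use a single memcpy again. */
va = vmap(mfn, nr);
if ( !va )
    return -ENOMEM;        /* after dropping the page references */

v->arch.runstate_guest.runstate = va + offset;  /* hypothetical field */
```

Cleanup would then be a vunmap() plus dropping the page references. The trade-off Julien notes still applies: each registered vCPU permanently consumes up to two pages of the 768MB vmap() area.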
* Re: [PATCH 2/2] xen/arm: Support runstate crossing pages 2020-06-11 11:58 ` [PATCH 2/2] xen/arm: Support runstate crossing pages Bertrand Marquis 2020-06-12 1:10 ` Stefano Stabellini @ 2020-06-12 12:14 ` Julien Grall 2020-06-12 16:13 ` Bertrand Marquis 1 sibling, 1 reply; 26+ messages in thread From: Julien Grall @ 2020-06-12 12:14 UTC (permalink / raw) To: Bertrand Marquis, xen-devel; +Cc: nd, Volodymyr Babchuk, Stefano Stabellini Hi, On 11/06/2020 12:58, Bertrand Marquis wrote: > Add support for runstate area register with the structure crossing pages Well, this has always been supported until your previous patch. In general, we try not to break things in the middle of a series so we can still bisect it. I think this patch can be simplified a lot by mapping the two pages contiguously (see my previous answer). With that it would be feasible to fold this patch into #1. > The code is storing up to 2 pages reference during the hypercall. > During a context switch, the code is computing where the > state_entry_time is and is breaking the memcpy in 2 parts when it is > required. 
> > Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> > --- > xen/arch/arm/domain.c | 101 +++++++++++++++++++++++++---------- > xen/include/asm-arm/domain.h | 5 +- > 2 files changed, 76 insertions(+), 30 deletions(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index 739059234f..d847cb00f2 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -280,11 +280,16 @@ void arch_cleanup_runstate_guest(struct vcpu *v) > { > spin_lock(&v->arch.runstate_guest.lock); > > - /* cleanup previous page if any */ > - if ( v->arch.runstate_guest.page ) > + /* cleanup previous pages if any */ > + if ( v->arch.runstate_guest.page[0] ) > { > - put_page_and_type(v->arch.runstate_guest.page); > - v->arch.runstate_guest.page = NULL; > + put_page_and_type(v->arch.runstate_guest.page[0]); > + v->arch.runstate_guest.page[0] = NULL; > + if ( v->arch.runstate_guest.page[1] ) > + { > + put_page_and_type(v->arch.runstate_guest.page[1]); > + v->arch.runstate_guest.page[1] = NULL; > + } I think this can be moved outside of the first if. > v->arch.runstate_guest.offset = 0; > } > > @@ -294,26 +299,25 @@ void arch_cleanup_runstate_guest(struct vcpu *v) > int arch_setup_runstate_guest(struct vcpu *v, > struct vcpu_register_runstate_memory_area area) > { > - struct page_info *page; > + struct page_info *page[2] = {NULL,NULL}; I don't think you need the temporary variable. You can directly update page[0] as it is protected by the lock. The nice benefit is that you could take advantage of a common helper to clean up and reduce the complexity of the code. 
> unsigned offset; > > spin_lock(&v->arch.runstate_guest.lock); > > - /* cleanup previous page if any */ > - if ( v->arch.runstate_guest.page ) > + /* cleanup previous pages if any */ > + if ( v->arch.runstate_guest.page[0] ) > { > - put_page_and_type(v->arch.runstate_guest.page); > - v->arch.runstate_guest.page = NULL; > + put_page_and_type(v->arch.runstate_guest.page[0]); > + v->arch.runstate_guest.page[0] = NULL; > + if ( v->arch.runstate_guest.page[1] ) > + { > + put_page_and_type(v->arch.runstate_guest.page[1]); > + v->arch.runstate_guest.page[1] = NULL; > + } > v->arch.runstate_guest.offset = 0; > } > > offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; > - if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) > - { > - spin_unlock(&v->arch.runstate_guest.lock); > - gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); > - return -EINVAL; > - } > > /* provided address must be aligned to a 64bit */ > if ( offset % alignof(struct vcpu_runstate_info) ) > @@ -323,15 +327,30 @@ int arch_setup_runstate_guest(struct vcpu *v, > return -EINVAL; > } > > - page = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); > - if ( !page ) > + page[0] = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); > + if ( !page[0] ) > { > spin_unlock(&v->arch.runstate_guest.lock); > gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); > return -EINVAL; > } > > - v->arch.runstate_guest.page = page; > + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) > + { > + /* guest area is crossing pages */ > + page[1] = get_page_from_gva(v, (vaddr_t)area.addr.v + PAGE_SIZE, > + GV2M_WRITE); > + if ( !page[1] ) > + { > + put_page_and_type(v->arch.runstate_guest.page[0]); v->arch.runstate_guest.page[0] would be NULL as you set it afterwards. So you want to set v->arch.runstate_guest.page[0] beforehand. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 2/2] xen/arm: Support runstate crossing pages 2020-06-12 12:14 ` Julien Grall @ 2020-06-12 16:13 ` Bertrand Marquis 0 siblings, 0 replies; 26+ messages in thread From: Bertrand Marquis @ 2020-06-12 16:13 UTC (permalink / raw) To: Julien Grall; +Cc: xen-devel, nd, Volodymyr Babchuk, Stefano Stabellini Hi, > On 12 Jun 2020, at 13:14, Julien Grall <julien@xen.org> wrote: > > Hi, > > On 11/06/2020 12:58, Bertrand Marquis wrote: >> Add support for runstate area register with the structure crossing pages > > Well, this has always been supported until your previous patch. In general, we try not to break things in the middle of a series so we can still bisect it. > > I think this patch can be simplified a lot by mapping the two pages contiguously (see my previous answer). With that it would be feasible to fold this patch into #1. I will dig into switching to something using vmap. Worst-case scenario we would consume 8k * 128 vCPUs, which is still 1M, but that should be acceptable as it could greatly simplify the code. > >> The code is storing up to 2 pages reference during the hypercall. >> During a context switch, the code is computing where the >> state_entry_time is and is breaking the memcpy in 2 parts when it is >> required. 
>> Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> >> --- >> xen/arch/arm/domain.c | 101 +++++++++++++++++++++++++---------- >> xen/include/asm-arm/domain.h | 5 +- >> 2 files changed, 76 insertions(+), 30 deletions(-) >> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c >> index 739059234f..d847cb00f2 100644 >> --- a/xen/arch/arm/domain.c >> +++ b/xen/arch/arm/domain.c >> @@ -280,11 +280,16 @@ void arch_cleanup_runstate_guest(struct vcpu *v) >> { >> spin_lock(&v->arch.runstate_guest.lock); >> - /* cleanup previous page if any */ >> - if ( v->arch.runstate_guest.page ) >> + /* cleanup previous pages if any */ >> + if ( v->arch.runstate_guest.page[0] ) >> { >> - put_page_and_type(v->arch.runstate_guest.page); >> - v->arch.runstate_guest.page = NULL; >> + put_page_and_type(v->arch.runstate_guest.page[0]); >> + v->arch.runstate_guest.page[0] = NULL; >> + if ( v->arch.runstate_guest.page[1] ) >> + { >> + put_page_and_type(v->arch.runstate_guest.page[1]); >> + v->arch.runstate_guest.page[1] = NULL; >> + } > > I think this can be moved outside of the first if. Agree > >> v->arch.runstate_guest.offset = 0; >> } >> @@ -294,26 +299,25 @@ void arch_cleanup_runstate_guest(struct vcpu *v) >> int arch_setup_runstate_guest(struct vcpu *v, >> struct vcpu_register_runstate_memory_area area) >> { >> - struct page_info *page; >> + struct page_info *page[2] = {NULL,NULL}; > > I don't think you need the temporary variable. You can directly update page[0] as it is protected by the lock. The nice benefits is you could take advantage of a common helper to cleanup and reduce the complexity of the code. Would make sense yes (and remove the unlock everywhere). I will try that, thanks for the hint. 
> >> unsigned offset; >> spin_lock(&v->arch.runstate_guest.lock); >> - /* cleanup previous page if any */ >> - if ( v->arch.runstate_guest.page ) >> + /* cleanup previous pages if any */ >> + if ( v->arch.runstate_guest.page[0] ) >> { >> - put_page_and_type(v->arch.runstate_guest.page); >> - v->arch.runstate_guest.page = NULL; >> + put_page_and_type(v->arch.runstate_guest.page[0]); >> + v->arch.runstate_guest.page[0] = NULL; >> + if ( v->arch.runstate_guest.page[1] ) >> + { >> + put_page_and_type(v->arch.runstate_guest.page[1]); >> + v->arch.runstate_guest.page[1] = NULL; >> + } >> v->arch.runstate_guest.offset = 0; >> } >> offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; >> - if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) >> - { >> - spin_unlock(&v->arch.runstate_guest.lock); >> - gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); >> - return -EINVAL; >> - } >> /* provided address must be aligned to a 64bit */ >> if ( offset % alignof(struct vcpu_runstate_info) ) >> @@ -323,15 +327,30 @@ int arch_setup_runstate_guest(struct vcpu *v, >> return -EINVAL; >> } >> - page = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); >> - if ( !page ) >> + page[0] = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); >> + if ( !page[0] ) >> { >> spin_unlock(&v->arch.runstate_guest.lock); >> gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); >> return -EINVAL; >> } >> - v->arch.runstate_guest.page = page; >> + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) >> + { >> + /* guest area is crossing pages */ >> + page[1] = get_page_from_gva(v, (vaddr_t)area.addr.v + PAGE_SIZE, >> + GV2M_WRITE); >> + if ( !page[1] ) >> + { >> + put_page_and_type(v->arch.runstate_guest.page[0]); > v->arch.runstate_guest.page[0] would be NULL as you set it afterwards. So you want to set v->arch.runstate_guest.page[0] beforehand. Nice catch, should have been page[0] alone. Thanks Bertrand ^ permalink raw reply [flat|nested] 26+ messages in thread