xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] xen/arm: Convert runstate address during hypcall
@ 2020-07-28 15:52 Bertrand Marquis
  2020-07-28 19:04 ` Stefano Stabellini
  2020-07-28 19:54 ` Jan Beulich
  0 siblings, 2 replies; 14+ messages in thread
From: Bertrand Marquis @ 2020-07-28 15:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Jan Beulich, nd, Volodymyr Babchuk,
	Roger Pau Monné

At the moment on Arm, a Linux guest running with KTPI enabled will
cause the following error when a context switch happens in user mode:
(XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0

The error is caused by the virtual address for the runstate area
registered by the guest only being accessible when the guest is running
in kernel space when KPTI is enabled.

To solve this issue, this patch is doing the translation from virtual
address to physical address during the hypercall and mapping the
required pages using vmap. This is removing the conversion from virtual
to physical address during the context switch which is solving the
problem with KPTI.

This is done only on arm architecture, the behaviour on x86 is not
modified by this patch and the address conversion is done as before
during each context switch.

This is introducing several limitations in comparison to the previous
behaviour (on arm only):
- if the guest is remapping the area at a different physical address Xen
will continue to update the area at the previous physical address. As
the area is in kernel space and usually defined as a global variable this
is something which is believed not to happen. If this is required by a
guest, it will have to call the hypercall with the new area (even if it
is at the same virtual address).
- the area needs to be mapped during the hypercall. For the same reasons
as for the previous case, even if the area is registered for a different
vcpu. It is believed that registering an area using a virtual address
unmapped is not something done.

inline functions in headers could not be used as the architecture
domain.h is included before the global domain.h making it impossible
to use the struct vcpu inside the architecture header.
This should not have any performance impact as the hypercall is only
called once per vcpu usually.

Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>

---
  Changes in v2
    - use vmap to map the pages during the hypercall.
    - reintroduce initial copy during hypercall.

---
 xen/arch/arm/domain.c        | 160 +++++++++++++++++++++++++++++++----
 xen/arch/x86/domain.c        |  30 ++++++-
 xen/arch/x86/x86_64/domain.c |   4 +-
 xen/common/domain.c          |  19 ++---
 xen/include/asm-arm/domain.h |   9 ++
 xen/include/asm-x86/domain.h |  16 ++++
 xen/include/xen/domain.h     |   5 ++
 xen/include/xen/sched.h      |  16 +---
 8 files changed, 207 insertions(+), 52 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 31169326b2..c595438bd9 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -19,6 +19,7 @@
 #include <xen/sched.h>
 #include <xen/softirq.h>
 #include <xen/wait.h>
+#include <xen/vmap.h>
 
 #include <asm/alternative.h>
 #include <asm/cpuerrata.h>
@@ -275,36 +276,157 @@ static void ctxt_switch_to(struct vcpu *n)
     virt_timer_restore(n);
 }
 
-/* Update per-VCPU guest runstate shared memory area (if registered). */
-static void update_runstate_area(struct vcpu *v)
+static void cleanup_runstate_vcpu_locked(struct vcpu *v)
+{
+    if ( v->arch.runstate_guest )
+    {
+        vunmap((void *)((unsigned long)v->arch.runstate_guest & PAGE_MASK));
+
+        put_page(v->arch.runstate_guest_page[0]);
+
+        if ( v->arch.runstate_guest_page[1] )
+        {
+            put_page(v->arch.runstate_guest_page[1]);
+        }
+        v->arch.runstate_guest = NULL;
+    }
+}
+
+void arch_vcpu_cleanup_runstate(struct vcpu *v)
 {
-    void __user *guest_handle = NULL;
+    spin_lock(&v->arch.runstate_guest_lock);
+
+    cleanup_runstate_vcpu_locked(v);
+
+    spin_unlock(&v->arch.runstate_guest_lock);
+}
+
+static int setup_runstate_vcpu_locked(struct vcpu *v, vaddr_t vaddr)
+{
+    unsigned int offset;
+    mfn_t mfn[2];
+    struct page_info *page;
+    unsigned int numpages;
     struct vcpu_runstate_info runstate;
+    void *p;
 
-    if ( guest_handle_is_null(runstate_guest(v)) )
-        return;
+    /* user can pass a NULL address to unregister a previous area */
+    if ( vaddr == 0 )
+        return 0;
 
-    memcpy(&runstate, &v->runstate, sizeof(runstate));
+    offset = vaddr & ~PAGE_MASK;
 
-    if ( VM_ASSIST(v->domain, runstate_update_flag) )
+    /* provided address must be aligned to a 64bit */
+    if ( offset % alignof(struct vcpu_runstate_info) )
     {
-        guest_handle = &v->runstate_guest.p->state_entry_time + 1;
-        guest_handle--;
-        runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
-        __raw_copy_to_guest(guest_handle,
-                            (void *)(&runstate.state_entry_time + 1) - 1, 1);
-        smp_wmb();
+        gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
+                "Invalid offset\n", vaddr);
+        return -EINVAL;
+    }
+
+    page = get_page_from_gva(v, vaddr, GV2M_WRITE);
+    if ( !page )
+    {
+        gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
+                "Page is not mapped\n", vaddr);
+        return -EINVAL;
+    }
+    mfn[0] = page_to_mfn(page);
+    v->arch.runstate_guest_page[0] = page;
+
+    if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) )
+    {
+        /* guest area is crossing pages */
+        page = get_page_from_gva(v, vaddr + PAGE_SIZE, GV2M_WRITE);
+        if ( !page )
+        {
+            put_page(v->arch.runstate_guest_page[0]);
+            gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
+                    "2nd Page is not mapped\n", vaddr);
+            return -EINVAL;
+        }
+        mfn[1] = page_to_mfn(page);
+        v->arch.runstate_guest_page[1] = page;
+        numpages = 2;
+    }
+    else
+    {
+        v->arch.runstate_guest_page[1] = NULL;
+        numpages = 1;
+    }
+
+    p = vmap(mfn, numpages);
+    if ( !p )
+    {
+        put_page(v->arch.runstate_guest_page[0]);
+        if ( numpages == 2 )
+            put_page(v->arch.runstate_guest_page[1]);
+
+        gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
+                "vmap error\n", vaddr);
+        return -EINVAL;
     }
 
-    __copy_to_guest(runstate_guest(v), &runstate, 1);
+    v->arch.runstate_guest = p + offset;
 
-    if ( guest_handle )
+    if (v == current)
     {
-        runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+        memcpy(v->arch.runstate_guest, &v->runstate, sizeof(v->runstate));
+    }
+    else
+    {
+        vcpu_runstate_get(v, &runstate);
+        memcpy(v->arch.runstate_guest, &runstate, sizeof(v->runstate));
+    }
+
+    return 0;
+}
+
+int arch_vcpu_setup_runstate(struct vcpu *v,
+                             struct vcpu_register_runstate_memory_area area)
+{
+    int rc;
+
+    spin_lock(&v->arch.runstate_guest_lock);
+
+    /* cleanup if we are recalled */
+    cleanup_runstate_vcpu_locked(v);
+
+    rc = setup_runstate_vcpu_locked(v, (vaddr_t)area.addr.v);
+
+    spin_unlock(&v->arch.runstate_guest_lock);
+
+    return rc;
+}
+
+
+/* Update per-VCPU guest runstate shared memory area (if registered). */
+static void update_runstate_area(struct vcpu *v)
+{
+    spin_lock(&v->arch.runstate_guest_lock);
+
+    if ( v->arch.runstate_guest )
+    {
+        if ( VM_ASSIST(v->domain, runstate_update_flag) )
+        {
+            v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
+            v->arch.runstate_guest->state_entry_time |= XEN_RUNSTATE_UPDATE;
+            smp_wmb();
+        }
+
+        memcpy(v->arch.runstate_guest, &v->runstate, sizeof(v->runstate));
+
+        /* copy must be done before switching the bit */
         smp_wmb();
-        __raw_copy_to_guest(guest_handle,
-                            (void *)(&runstate.state_entry_time + 1) - 1, 1);
+
+        if ( VM_ASSIST(v->domain, runstate_update_flag) )
+        {
+            v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+            v->arch.runstate_guest->state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+        }
     }
+
+    spin_unlock(&v->arch.runstate_guest_lock);
 }
 
 static void schedule_tail(struct vcpu *prev)
@@ -560,6 +682,8 @@ int arch_vcpu_create(struct vcpu *v)
     v->arch.saved_context.sp = (register_t)v->arch.cpu_info;
     v->arch.saved_context.pc = (register_t)continue_new_vcpu;
 
+    spin_lock_init(&v->arch.runstate_guest_lock);
+
     /* Idle VCPUs don't need the rest of this setup */
     if ( is_idle_vcpu(v) )
         return rc;
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fee6c3931a..98910b7cf1 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1642,6 +1642,30 @@ void paravirt_ctxt_switch_to(struct vcpu *v)
         wrmsr_tsc_aux(v->arch.msrs->tsc_aux);
 }
 
+int arch_vcpu_setup_runstate(struct vcpu *v,
+                             struct vcpu_register_runstate_memory_area area)
+{
+    struct vcpu_runstate_info runstate;
+
+    runstate_guest(v) = area.addr.h;
+
+    if ( v == current )
+    {
+        __copy_to_guest(runstate_guest(v), &v->runstate, 1);
+    }
+    else
+    {
+        vcpu_runstate_get(v, &runstate);
+        __copy_to_guest(runstate_guest(v), &runstate, 1);
+    }
+    return 0;
+}
+
+void arch_vcpu_cleanup_runstate(struct vcpu *v)
+{
+    set_xen_guest_handle(runstate_guest(v), NULL);
+}
+
 /* Update per-VCPU guest runstate shared memory area (if registered). */
 bool update_runstate_area(struct vcpu *v)
 {
@@ -1660,8 +1684,8 @@ bool update_runstate_area(struct vcpu *v)
     if ( VM_ASSIST(v->domain, runstate_update_flag) )
     {
         guest_handle = has_32bit_shinfo(v->domain)
-            ? &v->runstate_guest.compat.p->state_entry_time + 1
-            : &v->runstate_guest.native.p->state_entry_time + 1;
+            ? &v->arch.runstate_guest.compat.p->state_entry_time + 1
+            : &v->arch.runstate_guest.native.p->state_entry_time + 1;
         guest_handle--;
         runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
         __raw_copy_to_guest(guest_handle,
@@ -1674,7 +1698,7 @@ bool update_runstate_area(struct vcpu *v)
         struct compat_vcpu_runstate_info info;
 
         XLAT_vcpu_runstate_info(&info, &runstate);
-        __copy_to_guest(v->runstate_guest.compat, &info, 1);
+        __copy_to_guest(v->arch.runstate_guest.compat, &info, 1);
         rc = true;
     }
     else
diff --git a/xen/arch/x86/x86_64/domain.c b/xen/arch/x86/x86_64/domain.c
index c46dccc25a..b879e8dd2c 100644
--- a/xen/arch/x86/x86_64/domain.c
+++ b/xen/arch/x86/x86_64/domain.c
@@ -36,7 +36,7 @@ arch_compat_vcpu_op(
             break;
 
         rc = 0;
-        guest_from_compat_handle(v->runstate_guest.compat, area.addr.h);
+        guest_from_compat_handle(v->arch.runstate_guest.compat, area.addr.h);
 
         if ( v == current )
         {
@@ -49,7 +49,7 @@ arch_compat_vcpu_op(
             vcpu_runstate_get(v, &runstate);
             XLAT_vcpu_runstate_info(&info, &runstate);
         }
-        __copy_to_guest(v->runstate_guest.compat, &info, 1);
+        __copy_to_guest(v->arch.runstate_guest.compat, &info, 1);
 
         break;
     }
diff --git a/xen/common/domain.c b/xen/common/domain.c
index e9be05f1d0..cd8595b186 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -727,7 +727,10 @@ int domain_kill(struct domain *d)
         if ( cpupool_move_domain(d, cpupool0) )
             return -ERESTART;
         for_each_vcpu ( d, v )
+        {
+            arch_vcpu_cleanup_runstate(v);
             unmap_vcpu_info(v);
+        }
         d->is_dying = DOMDYING_dead;
         /* Mem event cleanup has to go here because the rings 
          * have to be put before we call put_domain. */
@@ -1167,7 +1170,7 @@ int domain_soft_reset(struct domain *d)
 
     for_each_vcpu ( d, v )
     {
-        set_xen_guest_handle(runstate_guest(v), NULL);
+        arch_vcpu_cleanup_runstate(v);
         unmap_vcpu_info(v);
     }
 
@@ -1494,7 +1497,6 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
     case VCPUOP_register_runstate_memory_area:
     {
         struct vcpu_register_runstate_memory_area area;
-        struct vcpu_runstate_info runstate;
 
         rc = -EFAULT;
         if ( copy_from_guest(&area, arg, 1) )
@@ -1503,18 +1505,7 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( !guest_handle_okay(area.addr.h, 1) )
             break;
 
-        rc = 0;
-        runstate_guest(v) = area.addr.h;
-
-        if ( v == current )
-        {
-            __copy_to_guest(runstate_guest(v), &v->runstate, 1);
-        }
-        else
-        {
-            vcpu_runstate_get(v, &runstate);
-            __copy_to_guest(runstate_guest(v), &runstate, 1);
-        }
+        rc = arch_vcpu_setup_runstate(v, area);
 
         break;
     }
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 4e2f582006..9df4d10abb 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -206,6 +206,15 @@ struct arch_vcpu
      */
     bool need_flush_to_ram;
 
+    /* runstate guest lock */
+    spinlock_t runstate_guest_lock;
+
+    /* runstate guest info */
+    struct vcpu_runstate_info *runstate_guest;
+
+    /* runstate pages mapped for runstate_guest */
+    struct page_info *runstate_guest_page[2];
+
 }  __cacheline_aligned;
 
 void vcpu_show_execution_state(struct vcpu *);
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 6fd94c2e14..c369d22ccc 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -11,6 +11,11 @@
 #include <asm/x86_emulate.h>
 #include <public/vcpu.h>
 #include <public/hvm/hvm_info_table.h>
+#ifdef CONFIG_COMPAT
+#include <compat/vcpu.h>
+DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t);
+#endif
+
 
 #define has_32bit_shinfo(d)    ((d)->arch.has_32bit_shinfo)
 
@@ -627,6 +632,17 @@ struct arch_vcpu
     struct {
         bool next_interrupt_enabled;
     } monitor;
+
+#ifndef CONFIG_COMPAT
+# define runstate_guest(v) ((v)->arch.runstate_guest)
+    XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */
+#else
+# define runstate_guest(v) ((v)->arch.runstate_guest.native)
+    union {
+        XEN_GUEST_HANDLE(vcpu_runstate_info_t) native;
+        XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
+    } runstate_guest;
+#endif
 };
 
 struct guest_memory_policy
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index 7e51d361de..5e8cbba31d 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -5,6 +5,7 @@
 #include <xen/types.h>
 
 #include <public/xen.h>
+#include <public/vcpu.h>
 #include <asm/domain.h>
 #include <asm/numa.h>
 
@@ -63,6 +64,10 @@ void arch_vcpu_destroy(struct vcpu *v);
 int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset);
 void unmap_vcpu_info(struct vcpu *v);
 
+int arch_vcpu_setup_runstate(struct vcpu *v,
+                             struct vcpu_register_runstate_memory_area area);
+void arch_vcpu_cleanup_runstate(struct vcpu *v);
+
 int arch_domain_create(struct domain *d,
                        struct xen_domctl_createdomain *config);
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ac53519d7f..fac030fb83 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -29,11 +29,6 @@
 #include <public/vcpu.h>
 #include <public/event_channel.h>
 
-#ifdef CONFIG_COMPAT
-#include <compat/vcpu.h>
-DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t);
-#endif
-
 /*
  * Stats
  *
@@ -166,16 +161,7 @@ struct vcpu
     struct sched_unit *sched_unit;
 
     struct vcpu_runstate_info runstate;
-#ifndef CONFIG_COMPAT
-# define runstate_guest(v) ((v)->runstate_guest)
-    XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */
-#else
-# define runstate_guest(v) ((v)->runstate_guest.native)
-    union {
-        XEN_GUEST_HANDLE(vcpu_runstate_info_t) native;
-        XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
-    } runstate_guest; /* guest address */
-#endif
+
     unsigned int     new_state;
 
     /* Has the FPU been initialised? */
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] xen/arm: Convert runstate address during hypcall
  2020-07-28 15:52 [PATCH v2] xen/arm: Convert runstate address during hypcall Bertrand Marquis
@ 2020-07-28 19:04 ` Stefano Stabellini
  2020-07-29  6:47   ` Bertrand Marquis
  2020-07-28 19:54 ` Jan Beulich
  1 sibling, 1 reply; 14+ messages in thread
From: Stefano Stabellini @ 2020-07-28 19:04 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Jan Beulich, xen-devel, nd,
	Volodymyr Babchuk, Roger Pau Monné

On Tue, 28 Jul 2020, Bertrand Marquis wrote:
> At the moment on Arm, a Linux guest running with KTPI enabled will
> cause the following error when a context switch happens in user mode:
> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
> 
> The error is caused by the virtual address for the runstate area
> registered by the guest only being accessible when the guest is running
> in kernel space when KPTI is enabled.
> 
> To solve this issue, this patch is doing the translation from virtual
> address to physical address during the hypercall and mapping the
> required pages using vmap. This is removing the conversion from virtual
> to physical address during the context switch which is solving the
> problem with KPTI.
> 
> This is done only on arm architecture, the behaviour on x86 is not
> modified by this patch and the address conversion is done as before
> during each context switch.
> 
> This is introducing several limitations in comparison to the previous
> behaviour (on arm only):
> - if the guest is remapping the area at a different physical address Xen
> will continue to update the area at the previous physical address. As
> the area is in kernel space and usually defined as a global variable this
> is something which is believed not to happen. If this is required by a
> guest, it will have to call the hypercall with the new area (even if it
> is at the same virtual address).
> - the area needs to be mapped during the hypercall. For the same reasons
> as for the previous case, even if the area is registered for a different
> vcpu. It is believed that registering an area using a virtual address
> unmapped is not something done.
> 
> inline functions in headers could not be used as the architecture
> domain.h is included before the global domain.h making it impossible
> to use the struct vcpu inside the architecture header.
> This should not have any performance impact as the hypercall is only
> called once per vcpu usually.
> 
> Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
> ---
>   Changes in v2
>     - use vmap to map the pages during the hypercall.
>     - reintroduce initial copy during hypercall.
> 
> ---
>  xen/arch/arm/domain.c        | 160 +++++++++++++++++++++++++++++++----
>  xen/arch/x86/domain.c        |  30 ++++++-
>  xen/arch/x86/x86_64/domain.c |   4 +-
>  xen/common/domain.c          |  19 ++---
>  xen/include/asm-arm/domain.h |   9 ++
>  xen/include/asm-x86/domain.h |  16 ++++
>  xen/include/xen/domain.h     |   5 ++
>  xen/include/xen/sched.h      |  16 +---
>  8 files changed, 207 insertions(+), 52 deletions(-)
> 
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 31169326b2..c595438bd9 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -19,6 +19,7 @@
>  #include <xen/sched.h>
>  #include <xen/softirq.h>
>  #include <xen/wait.h>
> +#include <xen/vmap.h>
>  
>  #include <asm/alternative.h>
>  #include <asm/cpuerrata.h>
> @@ -275,36 +276,157 @@ static void ctxt_switch_to(struct vcpu *n)
>      virt_timer_restore(n);
>  }
>  
> -/* Update per-VCPU guest runstate shared memory area (if registered). */
> -static void update_runstate_area(struct vcpu *v)
> +static void cleanup_runstate_vcpu_locked(struct vcpu *v)
> +{
> +    if ( v->arch.runstate_guest )
> +    {
> +        vunmap((void *)((unsigned long)v->arch.runstate_guest & PAGE_MASK));
> +
> +        put_page(v->arch.runstate_guest_page[0]);
> +
> +        if ( v->arch.runstate_guest_page[1] )
> +        {
> +            put_page(v->arch.runstate_guest_page[1]);
> +        }
> +        v->arch.runstate_guest = NULL;
> +    }
> +}
> +
> +void arch_vcpu_cleanup_runstate(struct vcpu *v)
>  {
> -    void __user *guest_handle = NULL;
> +    spin_lock(&v->arch.runstate_guest_lock);
> +
> +    cleanup_runstate_vcpu_locked(v);
> +
> +    spin_unlock(&v->arch.runstate_guest_lock);
> +}
> +
> +static int setup_runstate_vcpu_locked(struct vcpu *v, vaddr_t vaddr)
> +{
> +    unsigned int offset;
> +    mfn_t mfn[2];
> +    struct page_info *page;
> +    unsigned int numpages;
>      struct vcpu_runstate_info runstate;
> +    void *p;
>  
> -    if ( guest_handle_is_null(runstate_guest(v)) )
> -        return;
> +    /* user can pass a NULL address to unregister a previous area */
> +    if ( vaddr == 0 )
> +        return 0;
>  
> -    memcpy(&runstate, &v->runstate, sizeof(runstate));
> +    offset = vaddr & ~PAGE_MASK;
>  
> -    if ( VM_ASSIST(v->domain, runstate_update_flag) )
> +    /* provided address must be aligned to a 64bit */
> +    if ( offset % alignof(struct vcpu_runstate_info) )
>      {
> -        guest_handle = &v->runstate_guest.p->state_entry_time + 1;
> -        guest_handle--;
> -        runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
> -        __raw_copy_to_guest(guest_handle,
> -                            (void *)(&runstate.state_entry_time + 1) - 1, 1);
> -        smp_wmb();
> +        gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
> +                "Invalid offset\n", vaddr);
> +        return -EINVAL;
> +    }
> +
> +    page = get_page_from_gva(v, vaddr, GV2M_WRITE);
> +    if ( !page )
> +    {
> +        gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
> +                "Page is not mapped\n", vaddr);
> +        return -EINVAL;
> +    }
> +    mfn[0] = page_to_mfn(page);
> +    v->arch.runstate_guest_page[0] = page;
> +
> +    if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) )
> +    {
> +        /* guest area is crossing pages */
> +        page = get_page_from_gva(v, vaddr + PAGE_SIZE, GV2M_WRITE);
> +        if ( !page )
> +        {
> +            put_page(v->arch.runstate_guest_page[0]);
> +            gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
> +                    "2nd Page is not mapped\n", vaddr);
> +            return -EINVAL;
> +        }
> +        mfn[1] = page_to_mfn(page);
> +        v->arch.runstate_guest_page[1] = page;
> +        numpages = 2;
> +    }
> +    else
> +    {
> +        v->arch.runstate_guest_page[1] = NULL;
> +        numpages = 1;
> +    }
> +
> +    p = vmap(mfn, numpages);
> +    if ( !p )
> +    {
> +        put_page(v->arch.runstate_guest_page[0]);
> +        if ( numpages == 2 )
> +            put_page(v->arch.runstate_guest_page[1]);
> +
> +        gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
> +                "vmap error\n", vaddr);
> +        return -EINVAL;
>      }
>  
> -    __copy_to_guest(runstate_guest(v), &runstate, 1);
> +    v->arch.runstate_guest = p + offset;
>  
> -    if ( guest_handle )
> +    if (v == current)
>      {
> -        runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
> +        memcpy(v->arch.runstate_guest, &v->runstate, sizeof(v->runstate));
> +    }
> +    else
> +    {
> +        vcpu_runstate_get(v, &runstate);
> +        memcpy(v->arch.runstate_guest, &runstate, sizeof(v->runstate));
> +    }
> +
> +    return 0;
> +}


The arm32 build breaks with:

domain.c: In function 'setup_runstate_vcpu_locked':
domain.c:322:9: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'vaddr_t' [-Werror=format=]
         gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
         ^
domain.c:330:9: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'vaddr_t' [-Werror=format=]
         gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
         ^
domain.c:344:13: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'vaddr_t' [-Werror=format=]
             gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
             ^
domain.c:365:9: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'vaddr_t' [-Werror=format=]
         gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
         ^
cc1: all warnings being treated as errors


> +int arch_vcpu_setup_runstate(struct vcpu *v,
> +                             struct vcpu_register_runstate_memory_area area)
> +{
> +    int rc;
> +
> +    spin_lock(&v->arch.runstate_guest_lock);
> +
> +    /* cleanup if we are recalled */
> +    cleanup_runstate_vcpu_locked(v);
> +
> +    rc = setup_runstate_vcpu_locked(v, (vaddr_t)area.addr.v);
> +
> +    spin_unlock(&v->arch.runstate_guest_lock);
> +
> +    return rc;
> +}
> +
> +
> +/* Update per-VCPU guest runstate shared memory area (if registered). */
> +static void update_runstate_area(struct vcpu *v)
> +{
> +    spin_lock(&v->arch.runstate_guest_lock);
> +
> +    if ( v->arch.runstate_guest )
> +    {
> +        if ( VM_ASSIST(v->domain, runstate_update_flag) )
> +        {
> +            v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
> +            v->arch.runstate_guest->state_entry_time |= XEN_RUNSTATE_UPDATE;

Please use write_atomic (as suggested by Julien here:
https://marc.info/?l=xen-devel&m=159225391619240)


> +            smp_wmb();
> +        }
> +
> +        memcpy(v->arch.runstate_guest, &v->runstate, sizeof(v->runstate));
> +
> +        /* copy must be done before switching the bit */
>          smp_wmb();
> -        __raw_copy_to_guest(guest_handle,
> -                            (void *)(&runstate.state_entry_time + 1) - 1, 1);
> +
> +        if ( VM_ASSIST(v->domain, runstate_update_flag) )
> +        {
> +            v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
> +            v->arch.runstate_guest->state_entry_time &= ~XEN_RUNSTATE_UPDATE;

and here too

The rest looks OK to me.


> +        }
>      }
> +
> +    spin_unlock(&v->arch.runstate_guest_lock);
>  }
>  
>  static void schedule_tail(struct vcpu *prev)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] xen/arm: Convert runstate address during hypcall
  2020-07-28 15:52 [PATCH v2] xen/arm: Convert runstate address during hypcall Bertrand Marquis
  2020-07-28 19:04 ` Stefano Stabellini
@ 2020-07-28 19:54 ` Jan Beulich
  2020-07-29  7:08   ` Bertrand Marquis
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2020-07-28 19:54 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, xen-devel, nd, Volodymyr Babchuk,
	Roger Pau Monné

On 28.07.2020 17:52, Bertrand Marquis wrote:
> At the moment on Arm, a Linux guest running with KTPI enabled will
> cause the following error when a context switch happens in user mode:
> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
> 
> The error is caused by the virtual address for the runstate area
> registered by the guest only being accessible when the guest is running
> in kernel space when KPTI is enabled.
> 
> To solve this issue, this patch is doing the translation from virtual
> address to physical address during the hypercall and mapping the
> required pages using vmap. This is removing the conversion from virtual
> to physical address during the context switch which is solving the
> problem with KPTI.
> 
> This is done only on arm architecture, the behaviour on x86 is not
> modified by this patch and the address conversion is done as before
> during each context switch.
> 
> This is introducing several limitations in comparison to the previous
> behaviour (on arm only):
> - if the guest is remapping the area at a different physical address Xen
> will continue to update the area at the previous physical address. As
> the area is in kernel space and usually defined as a global variable this
> is something which is believed not to happen. If this is required by a
> guest, it will have to call the hypercall with the new area (even if it
> is at the same virtual address).
> - the area needs to be mapped during the hypercall. For the same reasons
> as for the previous case, even if the area is registered for a different
> vcpu. It is believed that registering an area using a virtual address
> unmapped is not something done.

Beside me thinking that an in-use and stable ABI can't be changed like
this, no matter what is "believed" kernel code may or may not do, I
also don't think having arch-es diverge in behavior here is a good
idea. Use of commonly available interfaces shouldn't lead to head
aches or surprises when porting code from one arch to another. I'm
pretty sure it was suggested before: Why don't you simply introduce
a physical address based hypercall (and then also on x86 at the same
time, keeping functional parity)? I even seem to recall giving a
suggestion how to fit this into a future "physical addresses only"
model, as long as we can settle on the basic principles of that
conversion path that we want to go sooner or later anyway (as I
understand).

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1642,6 +1642,30 @@ void paravirt_ctxt_switch_to(struct vcpu *v)
>           wrmsr_tsc_aux(v->arch.msrs->tsc_aux);
>   }
>   
> +int arch_vcpu_setup_runstate(struct vcpu *v,
> +                             struct vcpu_register_runstate_memory_area area)
> +{
> +    struct vcpu_runstate_info runstate;
> +
> +    runstate_guest(v) = area.addr.h;
> +
> +    if ( v == current )
> +    {
> +        __copy_to_guest(runstate_guest(v), &v->runstate, 1);
> +    }

Pointless braces (and I think there are more instances).

> +    else
> +    {
> +        vcpu_runstate_get(v, &runstate);
> +        __copy_to_guest(runstate_guest(v), &runstate, 1);
> +    }
> +    return 0;

Missing blank line before main "return".

Jan


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] xen/arm: Convert runstate address during hypcall
  2020-07-28 19:04 ` Stefano Stabellini
@ 2020-07-29  6:47   ` Bertrand Marquis
  0 siblings, 0 replies; 14+ messages in thread
From: Bertrand Marquis @ 2020-07-29  6:47 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap,
	Jan Beulich, Xen-devel, nd, Volodymyr Babchuk,
	Roger Pau Monné



> On 28 Jul 2020, at 21:04, Stefano Stabellini <sstabellini@kernel.org> wrote:
> 
> On Tue, 28 Jul 2020, Bertrand Marquis wrote:
>> At the moment on Arm, a Linux guest running with KTPI enabled will
>> cause the following error when a context switch happens in user mode:
>> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
>> 
>> The error is caused by the virtual address for the runstate area
>> registered by the guest only being accessible when the guest is running
>> in kernel space when KPTI is enabled.
>> 
>> To solve this issue, this patch is doing the translation from virtual
>> address to physical address during the hypercall and mapping the
>> required pages using vmap. This is removing the conversion from virtual
>> to physical address during the context switch which is solving the
>> problem with KPTI.
>> 
>> This is done only on arm architecture, the behaviour on x86 is not
>> modified by this patch and the address conversion is done as before
>> during each context switch.
>> 
>> This is introducing several limitations in comparison to the previous
>> behaviour (on arm only):
>> - if the guest is remapping the area at a different physical address Xen
>> will continue to update the area at the previous physical address. As
>> the area is in kernel space and usually defined as a global variable this
>> is something which is believed not to happen. If this is required by a
>> guest, it will have to call the hypercall with the new area (even if it
>> is at the same virtual address).
>> - the area needs to be mapped during the hypercall. For the same reasons
>> as for the previous case, even if the area is registered for a different
>> vcpu. It is believed that registering an area using a virtual address
>> unmapped is not something done.
>> 
>> inline functions in headers could not be used as the architecture
>> domain.h is included before the global domain.h making it impossible
>> to use the struct vcpu inside the architecture header.
>> This should not have any performance impact as the hypercall is only
>> called once per vcpu usually.
>> 
>> Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
>> ---
>>  Changes in v2
>>    - use vmap to map the pages during the hypercall.
>>    - reintroduce initial copy during hypercall.
>> 
>> ---
>> xen/arch/arm/domain.c        | 160 +++++++++++++++++++++++++++++++----
>> xen/arch/x86/domain.c        |  30 ++++++-
>> xen/arch/x86/x86_64/domain.c |   4 +-
>> xen/common/domain.c          |  19 ++---
>> xen/include/asm-arm/domain.h |   9 ++
>> xen/include/asm-x86/domain.h |  16 ++++
>> xen/include/xen/domain.h     |   5 ++
>> xen/include/xen/sched.h      |  16 +---
>> 8 files changed, 207 insertions(+), 52 deletions(-)
>> 
>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>> index 31169326b2..c595438bd9 100644
>> --- a/xen/arch/arm/domain.c
>> +++ b/xen/arch/arm/domain.c
>> @@ -19,6 +19,7 @@
>> #include <xen/sched.h>
>> #include <xen/softirq.h>
>> #include <xen/wait.h>
>> +#include <xen/vmap.h>
>> 
>> #include <asm/alternative.h>
>> #include <asm/cpuerrata.h>
>> @@ -275,36 +276,157 @@ static void ctxt_switch_to(struct vcpu *n)
>>     virt_timer_restore(n);
>> }
>> 
>> -/* Update per-VCPU guest runstate shared memory area (if registered). */
>> -static void update_runstate_area(struct vcpu *v)
>> +static void cleanup_runstate_vcpu_locked(struct vcpu *v)
>> +{
>> +    if ( v->arch.runstate_guest )
>> +    {
>> +        vunmap((void *)((unsigned long)v->arch.runstate_guest & PAGE_MASK));
>> +
>> +        put_page(v->arch.runstate_guest_page[0]);
>> +
>> +        if ( v->arch.runstate_guest_page[1] )
>> +        {
>> +            put_page(v->arch.runstate_guest_page[1]);
>> +        }
>> +        v->arch.runstate_guest = NULL;
>> +    }
>> +}
>> +
>> +void arch_vcpu_cleanup_runstate(struct vcpu *v)
>> {
>> -    void __user *guest_handle = NULL;
>> +    spin_lock(&v->arch.runstate_guest_lock);
>> +
>> +    cleanup_runstate_vcpu_locked(v);
>> +
>> +    spin_unlock(&v->arch.runstate_guest_lock);
>> +}
>> +
>> +static int setup_runstate_vcpu_locked(struct vcpu *v, vaddr_t vaddr)
>> +{
>> +    unsigned int offset;
>> +    mfn_t mfn[2];
>> +    struct page_info *page;
>> +    unsigned int numpages;
>>     struct vcpu_runstate_info runstate;
>> +    void *p;
>> 
>> -    if ( guest_handle_is_null(runstate_guest(v)) )
>> -        return;
>> +    /* user can pass a NULL address to unregister a previous area */
>> +    if ( vaddr == 0 )
>> +        return 0;
>> 
>> -    memcpy(&runstate, &v->runstate, sizeof(runstate));
>> +    offset = vaddr & ~PAGE_MASK;
>> 
>> -    if ( VM_ASSIST(v->domain, runstate_update_flag) )
>> +    /* provided address must be aligned to a 64bit */
>> +    if ( offset % alignof(struct vcpu_runstate_info) )
>>     {
>> -        guest_handle = &v->runstate_guest.p->state_entry_time + 1;
>> -        guest_handle--;
>> -        runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
>> -        __raw_copy_to_guest(guest_handle,
>> -                            (void *)(&runstate.state_entry_time + 1) - 1, 1);
>> -        smp_wmb();
>> +        gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
>> +                "Invalid offset\n", vaddr);
>> +        return -EINVAL;
>> +    }
>> +
>> +    page = get_page_from_gva(v, vaddr, GV2M_WRITE);
>> +    if ( !page )
>> +    {
>> +        gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
>> +                "Page is not mapped\n", vaddr);
>> +        return -EINVAL;
>> +    }
>> +    mfn[0] = page_to_mfn(page);
>> +    v->arch.runstate_guest_page[0] = page;
>> +
>> +    if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) )
>> +    {
>> +        /* guest area is crossing pages */
>> +        page = get_page_from_gva(v, vaddr + PAGE_SIZE, GV2M_WRITE);
>> +        if ( !page )
>> +        {
>> +            put_page(v->arch.runstate_guest_page[0]);
>> +            gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
>> +                    "2nd Page is not mapped\n", vaddr);
>> +            return -EINVAL;
>> +        }
>> +        mfn[1] = page_to_mfn(page);
>> +        v->arch.runstate_guest_page[1] = page;
>> +        numpages = 2;
>> +    }
>> +    else
>> +    {
>> +        v->arch.runstate_guest_page[1] = NULL;
>> +        numpages = 1;
>> +    }
>> +
>> +    p = vmap(mfn, numpages);
>> +    if ( !p )
>> +    {
>> +        put_page(v->arch.runstate_guest_page[0]);
>> +        if ( numpages == 2 )
>> +            put_page(v->arch.runstate_guest_page[1]);
>> +
>> +        gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
>> +                "vmap error\n", vaddr);
>> +        return -EINVAL;
>>     }
>> 
>> -    __copy_to_guest(runstate_guest(v), &runstate, 1);
>> +    v->arch.runstate_guest = p + offset;
>> 
>> -    if ( guest_handle )
>> +    if (v == current)
>>     {
>> -        runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
>> +        memcpy(v->arch.runstate_guest, &v->runstate, sizeof(v->runstate));
>> +    }
>> +    else
>> +    {
>> +        vcpu_runstate_get(v, &runstate);
>> +        memcpy(v->arch.runstate_guest, &runstate, sizeof(v->runstate));
>> +    }
>> +
>> +    return 0;
>> +}
> 
> 
> The arm32 build breaks with:
> 
> domain.c: In function 'setup_runstate_vcpu_locked':
> domain.c:322:9: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'vaddr_t' [-Werror=format=]
>         gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
>         ^
> domain.c:330:9: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'vaddr_t' [-Werror=format=]
>         gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
>         ^
> domain.c:344:13: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'vaddr_t' [-Werror=format=]
>             gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
>             ^
> domain.c:365:9: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'vaddr_t' [-Werror=format=]
>         gprintk(XENLOG_WARNING, "Cannot map runstate pointer at 0x%lx: "
>         ^
> cc1: all warnings being treated as errors

My bad. I tested x86 and arm64 build but forgot the arm32.
I will fix that.

> 
> 
>> +int arch_vcpu_setup_runstate(struct vcpu *v,
>> +                             struct vcpu_register_runstate_memory_area area)
>> +{
>> +    int rc;
>> +
>> +    spin_lock(&v->arch.runstate_guest_lock);
>> +
>> +    /* cleanup if we are recalled */
>> +    cleanup_runstate_vcpu_locked(v);
>> +
>> +    rc = setup_runstate_vcpu_locked(v, (vaddr_t)area.addr.v);
>> +
>> +    spin_unlock(&v->arch.runstate_guest_lock);
>> +
>> +    return rc;
>> +}
>> +
>> +
>> +/* Update per-VCPU guest runstate shared memory area (if registered). */
>> +static void update_runstate_area(struct vcpu *v)
>> +{
>> +    spin_lock(&v->arch.runstate_guest_lock);
>> +
>> +    if ( v->arch.runstate_guest )
>> +    {
>> +        if ( VM_ASSIST(v->domain, runstate_update_flag) )
>> +        {
>> +            v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
>> +            v->arch.runstate_guest->state_entry_time |= XEN_RUNSTATE_UPDATE;
> 
> Please use write_atomic (as suggested by Julien here:
> https://marc.info/?l=xen-devel&m=159225391619240)

I will do that.

> 
> 
>> +            smp_wmb();
>> +        }
>> +
>> +        memcpy(v->arch.runstate_guest, &v->runstate, sizeof(v->runstate));
>> +
>> +        /* copy must be done before switching the bit */
>>         smp_wmb();
>> -        __raw_copy_to_guest(guest_handle,
>> -                            (void *)(&runstate.state_entry_time + 1) - 1, 1);
>> +
>> +        if ( VM_ASSIST(v->domain, runstate_update_flag) )
>> +        {
>> +            v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
>> +            v->arch.runstate_guest->state_entry_time &= ~XEN_RUNSTATE_UPDATE;
> 
> and here too
> 
> The rest looks OK to me.

Thanks for the review.
Regards
Bertrand

> 
> 
>> +        }
>>     }
>> +
>> +    spin_unlock(&v->arch.runstate_guest_lock);
>> }
>> 
>> static void schedule_tail(struct vcpu *prev)



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] xen/arm: Convert runstate address during hypcall
  2020-07-28 19:54 ` Jan Beulich
@ 2020-07-29  7:08   ` Bertrand Marquis
  2020-07-29 18:41     ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Bertrand Marquis @ 2020-07-29  7:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Xen-devel, nd, Volodymyr Babchuk,
	Roger Pau Monné



> On 28 Jul 2020, at 21:54, Jan Beulich <jbeulich@suse.com> wrote:
> 
> On 28.07.2020 17:52, Bertrand Marquis wrote:
>> At the moment on Arm, a Linux guest running with KTPI enabled will
>> cause the following error when a context switch happens in user mode:
>> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
>> The error is caused by the virtual address for the runstate area
>> registered by the guest only being accessible when the guest is running
>> in kernel space when KPTI is enabled.
>> To solve this issue, this patch is doing the translation from virtual
>> address to physical address during the hypercall and mapping the
>> required pages using vmap. This is removing the conversion from virtual
>> to physical address during the context switch which is solving the
>> problem with KPTI.
>> This is done only on arm architecture, the behaviour on x86 is not
>> modified by this patch and the address conversion is done as before
>> during each context switch.
>> This is introducing several limitations in comparison to the previous
>> behaviour (on arm only):
>> - if the guest is remapping the area at a different physical address Xen
>> will continue to update the area at the previous physical address. As
>> the area is in kernel space and usually defined as a global variable this
>> is something which is believed not to happen. If this is required by a
>> guest, it will have to call the hypercall with the new area (even if it
>> is at the same virtual address).
>> - the area needs to be mapped during the hypercall. For the same reasons
>> as for the previous case, even if the area is registered for a different
>> vcpu. It is believed that registering an area using a virtual address
>> unmapped is not something done.
> 
> Beside me thinking that an in-use and stable ABI can't be changed like
> this, no matter what is "believed" kernel code may or may not do, I
> also don't think having arch-es diverge in behavior here is a good
> idea. Use of commonly available interfaces shouldn't lead to head
> aches or surprises when porting code from one arch to another. I'm
> pretty sure it was suggested before: Why don't you simply introduce
> a physical address based hypercall (and then also on x86 at the same
> time, keeping functional parity)? I even seem to recall giving a
> suggestion how to fit this into a future "physical addresses only"
> model, as long as we can settle on the basic principles of that
> conversion path that we want to go sooner or later anyway (as I
> understand).

I fully agree with the “physical address only” model and i think it must be
done. Introducing a new hypercall taking a physical address as parameter
is the long term solution (and I would even volunteer to do it in a new patchset).
But this would not solve the issue here unless linux is modified.
So I do see this patch as a “bug fix”.

> 
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -1642,6 +1642,30 @@ void paravirt_ctxt_switch_to(struct vcpu *v)
>>          wrmsr_tsc_aux(v->arch.msrs->tsc_aux);
>>  }
>>  +int arch_vcpu_setup_runstate(struct vcpu *v,
>> +                             struct vcpu_register_runstate_memory_area area)
>> +{
>> +    struct vcpu_runstate_info runstate;
>> +
>> +    runstate_guest(v) = area.addr.h;
>> +
>> +    if ( v == current )
>> +    {
>> +        __copy_to_guest(runstate_guest(v), &v->runstate, 1);
>> +    }
> 
> Pointless braces (and I think there are more instances).

So:
if cond
   instruction
else
{
   xxx
}

is something that should be done in Xen ?

Sorry if i do those kind of mistakes in the future as i am more used to a model
where no braces is an absolute no-go. I will try to remember this.

> 
>> +    else
>> +    {
>> +        vcpu_runstate_get(v, &runstate);
>> +        __copy_to_guest(runstate_guest(v), &runstate, 1);
>> +    }
>> +    return 0;
> 
> Missing blank line before main "return".

I will fix it.
Bertrand


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] xen/arm: Convert runstate address during hypcall
  2020-07-29  7:08   ` Bertrand Marquis
@ 2020-07-29 18:41     ` Jan Beulich
  2020-07-30  1:30       ` Stefano Stabellini
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2020-07-29 18:41 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Xen-devel, nd, Volodymyr Babchuk,
	Roger Pau Monné

On 29.07.2020 09:08, Bertrand Marquis wrote:
> 
> 
>> On 28 Jul 2020, at 21:54, Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 28.07.2020 17:52, Bertrand Marquis wrote:
>>> At the moment on Arm, a Linux guest running with KTPI enabled will
>>> cause the following error when a context switch happens in user mode:
>>> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
>>> The error is caused by the virtual address for the runstate area
>>> registered by the guest only being accessible when the guest is running
>>> in kernel space when KPTI is enabled.
>>> To solve this issue, this patch is doing the translation from virtual
>>> address to physical address during the hypercall and mapping the
>>> required pages using vmap. This is removing the conversion from virtual
>>> to physical address during the context switch which is solving the
>>> problem with KPTI.
>>> This is done only on arm architecture, the behaviour on x86 is not
>>> modified by this patch and the address conversion is done as before
>>> during each context switch.
>>> This is introducing several limitations in comparison to the previous
>>> behaviour (on arm only):
>>> - if the guest is remapping the area at a different physical address Xen
>>> will continue to update the area at the previous physical address. As
>>> the area is in kernel space and usually defined as a global variable this
>>> is something which is believed not to happen. If this is required by a
>>> guest, it will have to call the hypercall with the new area (even if it
>>> is at the same virtual address).
>>> - the area needs to be mapped during the hypercall. For the same reasons
>>> as for the previous case, even if the area is registered for a different
>>> vcpu. It is believed that registering an area using a virtual address
>>> unmapped is not something done.
>>
>> Beside me thinking that an in-use and stable ABI can't be changed like
>> this, no matter what is "believed" kernel code may or may not do, I
>> also don't think having arch-es diverge in behavior here is a good
>> idea. Use of commonly available interfaces shouldn't lead to head
>> aches or surprises when porting code from one arch to another. I'm
>> pretty sure it was suggested before: Why don't you simply introduce
>> a physical address based hypercall (and then also on x86 at the same
>> time, keeping functional parity)? I even seem to recall giving a
>> suggestion how to fit this into a future "physical addresses only"
>> model, as long as we can settle on the basic principles of that
>> conversion path that we want to go sooner or later anyway (as I
>> understand).
> 
> I fully agree with the “physical address only” model and i think it must be
> done. Introducing a new hypercall taking a physical address as parameter
> is the long term solution (and I would even volunteer to do it in a new patchset).
> But this would not solve the issue here unless linux is modified.
> So I do see this patch as a “bug fix”.

Well, it is sort of implied by my previous reply that we won't get away
without an OS side change here. The prereq to get away without would be
that it is okay to change the behavior of a hypercall like you do, and
that it is okay to make the behavior diverge between arch-es. I think
I've made pretty clear that I don't think either is really an option.

>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -1642,6 +1642,30 @@ void paravirt_ctxt_switch_to(struct vcpu *v)
>>>           wrmsr_tsc_aux(v->arch.msrs->tsc_aux);
>>>   }
>>>   +int arch_vcpu_setup_runstate(struct vcpu *v,
>>> +                             struct vcpu_register_runstate_memory_area area)
>>> +{
>>> +    struct vcpu_runstate_info runstate;
>>> +
>>> +    runstate_guest(v) = area.addr.h;
>>> +
>>> +    if ( v == current )
>>> +    {
>>> +        __copy_to_guest(runstate_guest(v), &v->runstate, 1);
>>> +    }
>>
>> Pointless braces (and I think there are more instances).
> 
> So:
> if cond
>     instruction
> else
> {
>     xxx
> }
> 
> is something that should be done in Xen ?

Yes. Especially in coding styles placing opening braces on their own
lines this is, afaik, quite common.

Jan


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] xen/arm: Convert runstate address during hypcall
  2020-07-29 18:41     ` Jan Beulich
@ 2020-07-30  1:30       ` Stefano Stabellini
  2020-07-31  6:39         ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Stefano Stabellini @ 2020-07-30  1:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Bertrand Marquis, Xen-devel, nd,
	Volodymyr Babchuk, Roger Pau Monné

[-- Attachment #1: Type: text/plain, Size: 5095 bytes --]

On Wed, 29 Jul 2020, Jan Beulich wrote:
> On 29.07.2020 09:08, Bertrand Marquis wrote:
> > > On 28 Jul 2020, at 21:54, Jan Beulich <jbeulich@suse.com> wrote:
> > > 
> > > On 28.07.2020 17:52, Bertrand Marquis wrote:
> > > > At the moment on Arm, a Linux guest running with KTPI enabled will
> > > > cause the following error when a context switch happens in user mode:
> > > > (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
> > > > The error is caused by the virtual address for the runstate area
> > > > registered by the guest only being accessible when the guest is running
> > > > in kernel space when KPTI is enabled.
> > > > To solve this issue, this patch is doing the translation from virtual
> > > > address to physical address during the hypercall and mapping the
> > > > required pages using vmap. This is removing the conversion from virtual
> > > > to physical address during the context switch which is solving the
> > > > problem with KPTI.
> > > > This is done only on arm architecture, the behaviour on x86 is not
> > > > modified by this patch and the address conversion is done as before
> > > > during each context switch.
> > > > This is introducing several limitations in comparison to the previous
> > > > behaviour (on arm only):
> > > > - if the guest is remapping the area at a different physical address Xen
> > > > will continue to update the area at the previous physical address. As
> > > > the area is in kernel space and usually defined as a global variable
> > > > this
> > > > is something which is believed not to happen. If this is required by a
> > > > guest, it will have to call the hypercall with the new area (even if it
> > > > is at the same virtual address).
> > > > - the area needs to be mapped during the hypercall. For the same reasons
> > > > as for the previous case, even if the area is registered for a different
> > > > vcpu. It is believed that registering an area using a virtual address
> > > > unmapped is not something done.
> > > 
> > > Beside me thinking that an in-use and stable ABI can't be changed like
> > > this, no matter what is "believed" kernel code may or may not do, I
> > > also don't think having arch-es diverge in behavior here is a good
> > > idea. Use of commonly available interfaces shouldn't lead to head
> > > aches or surprises when porting code from one arch to another. I'm
> > > pretty sure it was suggested before: Why don't you simply introduce
> > > a physical address based hypercall (and then also on x86 at the same
> > > time, keeping functional parity)? I even seem to recall giving a
> > > suggestion how to fit this into a future "physical addresses only"
> > > model, as long as we can settle on the basic principles of that
> > > conversion path that we want to go sooner or later anyway (as I
> > > understand).
> > 
> > I fully agree with the “physical address only” model and i think it must be
> > done. Introducing a new hypercall taking a physical address as parameter
> > is the long term solution (and I would even volunteer to do it in a new
> > patchset).
> > But this would not solve the issue here unless linux is modified.
> > So I do see this patch as a “bug fix”.
> 
> Well, it is sort of implied by my previous reply that we won't get away
> without an OS side change here. The prereq to get away without would be
> that it is okay to change the behavior of a hypercall like you do, and
> that it is okay to make the behavior diverge between arch-es. I think
> I've made pretty clear that I don't think either is really an option.

Hi Jan,

This is a difficult problem to solve and the current situation honestly
sucks: there is no way to solve the problem without making compromises.

The new hypercall is good-to-have in any case (it is a better interface)
but it is not a full solution.  If we introduce a new hypercall we fix
new guests but don't fix existing guests. If we change Linux in any way,
we are still going to have problems with all already-released kernel
binaries. Leaving the issue unfixed is not an option either because the
problem can't be ignored.

There is no simple way out of this situation.


After careful considerations and many discussions (most of them on
xen-devel [1][2]) we came to the realization that changing the behavior
for this hypercall on ARM is the best we can do:

1) it solves the problem for existing guests
2) it has no ill effects as far as we know


Are we happy about it? We are not. It is the best we can do given the
constraints. The rule to never change hypercalls is a good one to have
and I fully support it, but in this unique circumstance it is best to
make an exception.

The idea is to minimize the impact to x86 so that's why Bertrand's patch
is introducing a divergence between Arm and x86 on this hypercall.


In summary, please consider this patch together with its context. This
has been the only time to my memory when we had to do this -- it is
certainly a unique situation.



[1] https://marc.info/?l=xen-devel&m=158946660216159
[2] https://marc.info/?l=xen-devel&m=159067967432381

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] xen/arm: Convert runstate address during hypcall
  2020-07-30  1:30       ` Stefano Stabellini
@ 2020-07-31  6:39         ` Jan Beulich
  2020-07-31 10:12           ` Julien Grall
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2020-07-31  6:39 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap,
	Bertrand Marquis, Xen-devel, nd, Volodymyr Babchuk,
	Roger Pau Monné

On 30.07.2020 03:30, Stefano Stabellini wrote:
> On Wed, 29 Jul 2020, Jan Beulich wrote:
>> On 29.07.2020 09:08, Bertrand Marquis wrote:
>>>> On 28 Jul 2020, at 21:54, Jan Beulich <jbeulich@suse.com> wrote:
>>>> On 28.07.2020 17:52, Bertrand Marquis wrote:
>>>>> At the moment on Arm, a Linux guest running with KTPI enabled will
>>>>> cause the following error when a context switch happens in user mode:
>>>>> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
>>>>> The error is caused by the virtual address for the runstate area
>>>>> registered by the guest only being accessible when the guest is running
>>>>> in kernel space when KPTI is enabled.
>>>>> To solve this issue, this patch is doing the translation from virtual
>>>>> address to physical address during the hypercall and mapping the
>>>>> required pages using vmap. This is removing the conversion from virtual
>>>>> to physical address during the context switch which is solving the
>>>>> problem with KPTI.
>>>>> This is done only on arm architecture, the behaviour on x86 is not
>>>>> modified by this patch and the address conversion is done as before
>>>>> during each context switch.
>>>>> This is introducing several limitations in comparison to the previous
>>>>> behaviour (on arm only):
>>>>> - if the guest is remapping the area at a different physical address Xen
>>>>> will continue to update the area at the previous physical address. As
>>>>> the area is in kernel space and usually defined as a global variable
>>>>> this
>>>>> is something which is believed not to happen. If this is required by a
>>>>> guest, it will have to call the hypercall with the new area (even if it
>>>>> is at the same virtual address).
>>>>> - the area needs to be mapped during the hypercall. For the same reasons
>>>>> as for the previous case, even if the area is registered for a different
>>>>> vcpu. It is believed that registering an area using a virtual address
>>>>> unmapped is not something done.
>>>>
>>>> Beside me thinking that an in-use and stable ABI can't be changed like
>>>> this, no matter what is "believed" kernel code may or may not do, I
>>>> also don't think having arch-es diverge in behavior here is a good
>>>> idea. Use of commonly available interfaces shouldn't lead to head
>>>> aches or surprises when porting code from one arch to another. I'm
>>>> pretty sure it was suggested before: Why don't you simply introduce
>>>> a physical address based hypercall (and then also on x86 at the same
>>>> time, keeping functional parity)? I even seem to recall giving a
>>>> suggestion how to fit this into a future "physical addresses only"
>>>> model, as long as we can settle on the basic principles of that
>>>> conversion path that we want to go sooner or later anyway (as I
>>>> understand).
>>>
>>> I fully agree with the “physical address only” model and i think it must be
>>> done. Introducing a new hypercall taking a physical address as parameter
>>> is the long term solution (and I would even volunteer to do it in a new
>>> patchset).
>>> But this would not solve the issue here unless linux is modified.
>>> So I do see this patch as a “bug fix”.
>>
>> Well, it is sort of implied by my previous reply that we won't get away
>> without an OS side change here. The prereq to get away without would be
>> that it is okay to change the behavior of a hypercall like you do, and
>> that it is okay to make the behavior diverge between arch-es. I think
>> I've made pretty clear that I don't think either is really an option.
> 
> This is a difficult problem to solve and the current situation honestly
> sucks: there is no way to solve the problem without making compromises.
> 
> The new hypercall is good-to-have in any case (it is a better interface)
> but it is not a full solution.  If we introduce a new hypercall we fix
> new guests but don't fix existing guests. If we change Linux in any way,
> we are still going to have problems with all already-released kernel
> binaries. Leaving the issue unfixed is not an option either because the
> problem can't be ignored.

We're fixing other issues without breaking the ABI. Where's the
problem of backporting the kernel side change (which I anticipate
to not be overly involved)?

If the plan remains to be to make an ABI breaking change, then I
think this will need an explicit vote.

Jan


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] xen/arm: Convert runstate address during hypcall
  2020-07-31  6:39         ` Jan Beulich
@ 2020-07-31 10:12           ` Julien Grall
  2020-07-31 10:18             ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Julien Grall @ 2020-07-31 10:12 UTC (permalink / raw)
  To: Jan Beulich, Stefano Stabellini
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap,
	Bertrand Marquis, Xen-devel, nd, Volodymyr Babchuk,
	Roger Pau Monné

Hi Jan,

On 31/07/2020 07:39, Jan Beulich wrote:
> We're fixing other issues without breaking the ABI. Where's the
> problem of backporting the kernel side change (which I anticipate
> to not be overly involved)?
This means you can't take advantage of the runstage on existing Linux 
without any modification.

> If the plan remains to be to make an ABI breaking change,

For a theoritical PoV, this is a ABI breakage. However, I fail to see 
how the restrictions added would affect OSes at least on Arm.

In particular, you can't change the VA -> PA on Arm without going 
through an invalid mapping. So I wouldn't expect this to happen for the 
runstate.

The only part that *may* be an issue is if the guest is registering the 
runstate with an initially invalid VA. Although, I have yet to see that 
in practice. Maybe you know?

>  then I
> think this will need an explicit vote.

I was under the impression that the two Arm maintainers (Stefano and I) 
already agreed with the approach here. Therefore, given the ABI breakage 
is only affecting Arm, why would we need a vote?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] xen/arm: Convert runstate address during hypcall
  2020-07-31 10:12           ` Julien Grall
@ 2020-07-31 10:18             ` Jan Beulich
  2020-07-31 14:36               ` Bertrand Marquis
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2020-07-31 10:18 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Bertrand Marquis, Xen-devel, nd,
	Volodymyr Babchuk, Roger Pau Monné

On 31.07.2020 12:12, Julien Grall wrote:
> On 31/07/2020 07:39, Jan Beulich wrote:
>> We're fixing other issues without breaking the ABI. Where's the
>> problem of backporting the kernel side change (which I anticipate
>> to not be overly involved)?
> This means you can't take advantage of the runstage on existing Linux 
> without any modification.
> 
>> If the plan remains to be to make an ABI breaking change,
> 
> For a theoritical PoV, this is a ABI breakage. However, I fail to see 
> how the restrictions added would affect OSes at least on Arm.

"OSes" covering what? Just Linux?

> In particular, you can't change the VA -> PA on Arm without going 
> through an invalid mapping. So I wouldn't expect this to happen for the 
> runstate.
> 
> The only part that *may* be an issue is if the guest is registering the 
> runstate with an initially invalid VA. Although, I have yet to see that 
> in practice. Maybe you know?

I'm unaware of any such use, but this means close to nothing.

>>  then I
>> think this will need an explicit vote.
> 
> I was under the impression that the two Arm maintainers (Stefano and I) 
> already agreed with the approach here. Therefore, given the ABI breakage 
> is only affecting Arm, why would we need a vote?

The problem here is of conceptual nature: You're planning to
make the behavior of a common hypercall diverge between
architectures, and in a retroactive fashion. Imo that's nothing
we should do even for new hypercalls, if _at all_ avoidable. If
we allow this here, we'll have a precedent that people later
may (and based on my experience will, sooner or later) reference,
to get their own change justified.

Jan


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] xen/arm: Convert runstate address during hypcall
  2020-07-31 10:18             ` Jan Beulich
@ 2020-07-31 14:36               ` Bertrand Marquis
  2020-07-31 23:03                 ` Stefano Stabellini
  0 siblings, 1 reply; 14+ messages in thread
From: Bertrand Marquis @ 2020-07-31 14:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Xen-devel, nd, Volodymyr Babchuk,
	Roger Pau Monné



> On 31 Jul 2020, at 12:18, Jan Beulich <jbeulich@suse.com> wrote:
> 
> On 31.07.2020 12:12, Julien Grall wrote:
>> On 31/07/2020 07:39, Jan Beulich wrote:
>>> We're fixing other issues without breaking the ABI. Where's the
>>> problem of backporting the kernel side change (which I anticipate
>>> to not be overly involved)?
>> This means you can't take advantage of the runstage on existing Linux 
>> without any modification.
>> 
>>> If the plan remains to be to make an ABI breaking change,
>> 
>> For a theoritical PoV, this is a ABI breakage. However, I fail to see 
>> how the restrictions added would affect OSes at least on Arm.
> 
> "OSes" covering what? Just Linux?
> 
>> In particular, you can't change the VA -> PA on Arm without going 
>> through an invalid mapping. So I wouldn't expect this to happen for the 
>> runstate.
>> 
>> The only part that *may* be an issue is if the guest is registering the 
>> runstate with an initially invalid VA. Although, I have yet to see that 
>> in practice. Maybe you know?
> 
> I'm unaware of any such use, but this means close to nothing.
> 
>>> then I
>>> think this will need an explicit vote.
>> 
>> I was under the impression that the two Arm maintainers (Stefano and I) 
>> already agreed with the approach here. Therefore, given the ABI breakage 
>> is only affecting Arm, why would we need a vote?
> 
> The problem here is of conceptual nature: You're planning to
> make the behavior of a common hypercall diverge between
> architectures, and in a retroactive fashion. Imo that's nothing
> we should do even for new hypercalls, if _at all_ avoidable. If
> we allow this here, we'll have a precedent that people later
> may (and based on my experience will, sooner or later) reference,
> to get their own change justified.

After a discussion with Jan, he is proposing to have a guest config setting to
turn on or off the translation of the address during the hypercall and add a
global Xen command line parameter to set the global default behaviour. 
With this was done on arm could be done on x86 and the current behaviour
would be kept by default but possible to modify by configuration.

@Jan: please correct me if i said something wrong
@others: what is your view on this solution ?

Bertrand



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] xen/arm: Convert runstate address during hypcall
  2020-07-31 14:36               ` Bertrand Marquis
@ 2020-07-31 23:03                 ` Stefano Stabellini
  2020-08-14  9:25                   ` Bertrand Marquis
  0 siblings, 1 reply; 14+ messages in thread
From: Stefano Stabellini @ 2020-07-31 23:03 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Jan Beulich, Xen-devel, nd,
	Volodymyr Babchuk, Roger Pau Monné

On Fri, 31 Jul 2020, Bertrand Marquis wrote:
> > On 31 Jul 2020, at 12:18, Jan Beulich <jbeulich@suse.com> wrote:
> > 
> > On 31.07.2020 12:12, Julien Grall wrote:
> >> On 31/07/2020 07:39, Jan Beulich wrote:
> >>> We're fixing other issues without breaking the ABI. Where's the
> >>> problem of backporting the kernel side change (which I anticipate
> >>> to not be overly involved)?
> >> This means you can't take advantage of the runstage on existing Linux 
> >> without any modification.
> >> 
> >>> If the plan remains to be to make an ABI breaking change,
> >> 
> >> For a theoritical PoV, this is a ABI breakage. However, I fail to see 
> >> how the restrictions added would affect OSes at least on Arm.
> > 
> > "OSes" covering what? Just Linux?
> > 
> >> In particular, you can't change the VA -> PA on Arm without going 
> >> through an invalid mapping. So I wouldn't expect this to happen for the 
> >> runstate.
> >> 
> >> The only part that *may* be an issue is if the guest is registering the 
> >> runstate with an initially invalid VA. Although, I have yet to see that 
> >> in practice. Maybe you know?
> > 
> > I'm unaware of any such use, but this means close to nothing.
> > 
> >>> then I
> >>> think this will need an explicit vote.
> >> 
> >> I was under the impression that the two Arm maintainers (Stefano and I) 
> >> already agreed with the approach here. Therefore, given the ABI breakage 
> >> is only affecting Arm, why would we need a vote?
> > 
> > The problem here is of conceptual nature: You're planning to
> > make the behavior of a common hypercall diverge between
> > architectures, and in a retroactive fashion. Imo that's nothing
> > we should do even for new hypercalls, if _at all_ avoidable. If
> > we allow this here, we'll have a precedent that people later
> > may (and based on my experience will, sooner or later) reference,
> > to get their own change justified.

Please let's avoid "slippery slope" arguments
(https://en.wikipedia.org/wiki/Slippery_slope)

We shouldn't consider this instance as the first in a long series of bad
decisions on hypercall compatibility. Each new case, if there will be
any, will have to be considered based on its own merits. Also, let's
keep in mind that there have been no other cases in the last 8 years. (I
would like to repeat my support for hypercall ABI compatibility.)


I would also kindly ask not to put the discussion on a "conceptual"
level: there is no way to fix all guests and also keep compatibility.
From a conceptual point of view, it is already game over :-)


> After a discussion with Jan, he is proposing to have a guest config setting to
> turn on or off the translation of the address during the hypercall and add a
> global Xen command line parameter to set the global default behaviour. 
> With this was done on arm could be done on x86 and the current behaviour
> would be kept by default but possible to modify by configuration.
> 
> @Jan: please correct me if i said something wrong
> @others: what is your view on this solution ?

Having options to turn on or off the new behavior could be good-to-have
if we find a guest that actually requires the old behavior. Today we
don't know of any such cases. We have strong reasons to believe that
there aren't any on ARM (see Julien's explanation in regards to the
temporary invalid mappings.) In fact, it is one of the factors that led
us to think this patch is the right approach.

That said, I am also OK with adding such a parameter now, but we need to
choose the default value carefully.


We need the new behavior as default on ARM because we need the fix to
work for all guests. I don't think we want to explain how you always
need to set config_foobar otherwise things don't work. It has to work
out of the box.

It would be nice if we had the same default on x86 too, although I
understand if Jan and Andrew don't want to make the same change on x86,
at least initially.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] xen/arm: Convert runstate address during hypcall
  2020-07-31 23:03                 ` Stefano Stabellini
@ 2020-08-14  9:25                   ` Bertrand Marquis
  2020-08-20 10:23                     ` Julien Grall
  0 siblings, 1 reply; 14+ messages in thread
From: Bertrand Marquis @ 2020-08-14  9:25 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Jan Beulich, Julien Grall, Xen-devel, nd, Volodymyr Babchuk,
	Andrew Cooper, George Dunlap, Ian Jackson, Wei Liu,
	Roger Pau Monné



> On 1 Aug 2020, at 00:03, Stefano Stabellini <sstabellini@kernel.org> wrote:
> 
> On Fri, 31 Jul 2020, Bertrand Marquis wrote:
>>> On 31 Jul 2020, at 12:18, Jan Beulich <jbeulich@suse.com> wrote:
>>> 
>>> On 31.07.2020 12:12, Julien Grall wrote:
>>>> On 31/07/2020 07:39, Jan Beulich wrote:
>>>>> We're fixing other issues without breaking the ABI. Where's the
>>>>> problem of backporting the kernel side change (which I anticipate
>>>>> to not be overly involved)?
>>>> This means you can't take advantage of the runstage on existing Linux 
>>>> without any modification.
>>>> 
>>>>> If the plan remains to be to make an ABI breaking change,
>>>> 
>>>> For a theoritical PoV, this is a ABI breakage. However, I fail to see 
>>>> how the restrictions added would affect OSes at least on Arm.
>>> 
>>> "OSes" covering what? Just Linux?
>>> 
>>>> In particular, you can't change the VA -> PA on Arm without going 
>>>> through an invalid mapping. So I wouldn't expect this to happen for the 
>>>> runstate.
>>>> 
>>>> The only part that *may* be an issue is if the guest is registering the 
>>>> runstate with an initially invalid VA. Although, I have yet to see that 
>>>> in practice. Maybe you know?
>>> 
>>> I'm unaware of any such use, but this means close to nothing.
>>> 
>>>>> then I
>>>>> think this will need an explicit vote.
>>>> 
>>>> I was under the impression that the two Arm maintainers (Stefano and I) 
>>>> already agreed with the approach here. Therefore, given the ABI breakage 
>>>> is only affecting Arm, why would we need a vote?
>>> 
>>> The problem here is of conceptual nature: You're planning to
>>> make the behavior of a common hypercall diverge between
>>> architectures, and in a retroactive fashion. Imo that's nothing
>>> we should do even for new hypercalls, if _at all_ avoidable. If
>>> we allow this here, we'll have a precedent that people later
>>> may (and based on my experience will, sooner or later) reference,
>>> to get their own change justified.
> 
> Please let's avoid "slippery slope" arguments
> (https://en.wikipedia.org/wiki/Slippery_slope)
> 
> We shouldn't consider this instance as the first in a long series of bad
> decisions on hypercall compatibility. Each new case, if there will be
> any, will have to be considered based on its own merits. Also, let's
> keep in mind that there have been no other cases in the last 8 years. (I
> would like to repeat my support for hypercall ABI compatibility.)
> 
> 
> I would also kindly ask not to put the discussion on a "conceptual"
> level: there is no way to fix all guests and also keep compatibility.
> From a conceptual point of view, it is already game over :-)
> 
> 
>> After a discussion with Jan, he is proposing to have a guest config setting to
>> turn on or off the translation of the address during the hypercall and add a
>> global Xen command line parameter to set the global default behaviour. 
>> With this was done on arm could be done on x86 and the current behaviour
>> would be kept by default but possible to modify by configuration.
>> 
>> @Jan: please correct me if i said something wrong
>> @others: what is your view on this solution ?
> 
> Having options to turn on or off the new behavior could be good-to-have
> if we find a guest that actually requires the old behavior. Today we
> don't know of any such cases. We have strong reasons to believe that
> there aren't any on ARM (see Julien's explanation in regards to the
> temporary invalid mappings.) In fact, it is one of the factors that led
> us to think this patch is the right approach.
> 
> That said, I am also OK with adding such a parameter now, but we need to
> choose the default value carefully.

This would also mean keeping support in the code for old and new behaviour
which might make the code bigger and more complex.

> 
> 
> We need the new behavior as default on ARM because we need the fix to
> work for all guests. I don't think we want to explain how you always
> need to set config_foobar otherwise things don't work. It has to work
> out of the box.
> 
> It would be nice if we had the same default on x86 too, although I
> understand if Jan and Andrew don't want to make the same change on x86,
> at least initially.

So you mean here adding a parameter but only on Arm ?
Should it be a command line parameter ? a configuration parameter ? both ?

It seems that with this patch i touched some kind of sensible area.
Should i just abandon it and see later to work on adding a new hypercall using
a physical address as parameter ?

Cheers
Bertrand



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] xen/arm: Convert runstate address during hypcall
  2020-08-14  9:25                   ` Bertrand Marquis
@ 2020-08-20 10:23                     ` Julien Grall
  0 siblings, 0 replies; 14+ messages in thread
From: Julien Grall @ 2020-08-20 10:23 UTC (permalink / raw)
  To: Bertrand Marquis, Stefano Stabellini
  Cc: Jan Beulich, Xen-devel, nd, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Ian Jackson, Wei Liu, Roger Pau Monné

Hi,

Sorry for the late answer.

On 14/08/2020 10:25, Bertrand Marquis wrote:
> 
> 
>> On 1 Aug 2020, at 00:03, Stefano Stabellini <sstabellini@kernel.org> wrote:
>>
>> On Fri, 31 Jul 2020, Bertrand Marquis wrote:
>>>> On 31 Jul 2020, at 12:18, Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 31.07.2020 12:12, Julien Grall wrote:
>>>>> On 31/07/2020 07:39, Jan Beulich wrote:
>>>>>> We're fixing other issues without breaking the ABI. Where's the
>>>>>> problem of backporting the kernel side change (which I anticipate
>>>>>> to not be overly involved)?
>>>>> This means you can't take advantage of the runstage on existing Linux
>>>>> without any modification.
>>>>>
>>>>>> If the plan remains to be to make an ABI breaking change,
>>>>>
>>>>> For a theoritical PoV, this is a ABI breakage. However, I fail to see
>>>>> how the restrictions added would affect OSes at least on Arm.
>>>>
>>>> "OSes" covering what? Just Linux?
>>>>
>>>>> In particular, you can't change the VA -> PA on Arm without going
>>>>> through an invalid mapping. So I wouldn't expect this to happen for the
>>>>> runstate.
>>>>>
>>>>> The only part that *may* be an issue is if the guest is registering the
>>>>> runstate with an initially invalid VA. Although, I have yet to see that
>>>>> in practice. Maybe you know?
>>>>
>>>> I'm unaware of any such use, but this means close to nothing.
>>>>
>>>>>> then I
>>>>>> think this will need an explicit vote.
>>>>>
>>>>> I was under the impression that the two Arm maintainers (Stefano and I)
>>>>> already agreed with the approach here. Therefore, given the ABI breakage
>>>>> is only affecting Arm, why would we need a vote?
>>>>
>>>> The problem here is of conceptual nature: You're planning to
>>>> make the behavior of a common hypercall diverge between
>>>> architectures, and in a retroactive fashion. Imo that's nothing
>>>> we should do even for new hypercalls, if _at all_ avoidable. If
>>>> we allow this here, we'll have a precedent that people later
>>>> may (and based on my experience will, sooner or later) reference,
>>>> to get their own change justified.
>>
>> Please let's avoid "slippery slope" arguments
>> (https://en.wikipedia.org/wiki/Slippery_slope)
>>
>> We shouldn't consider this instance as the first in a long series of bad
>> decisions on hypercall compatibility. Each new case, if there will be
>> any, will have to be considered based on its own merits. Also, let's
>> keep in mind that there have been no other cases in the last 8 years. (I
>> would like to repeat my support for hypercall ABI compatibility.)
>>
>>
>> I would also kindly ask not to put the discussion on a "conceptual"
>> level: there is no way to fix all guests and also keep compatibility.
>>  From a conceptual point of view, it is already game over :-)
>>
>>
>>> After a discussion with Jan, he is proposing to have a guest config setting to
>>> turn on or off the translation of the address during the hypercall and add a
>>> global Xen command line parameter to set the global default behaviour.
>>> With this was done on arm could be done on x86 and the current behaviour
>>> would be kept by default but possible to modify by configuration.
>>>
>>> @Jan: please correct me if i said something wrong
>>> @others: what is your view on this solution ?
>>
>> Having options to turn on or off the new behavior could be good-to-have
>> if we find a guest that actually requires the old behavior. Today we
>> don't know of any such cases. We have strong reasons to believe that
>> there aren't any on ARM (see Julien's explanation in regards to the
>> temporary invalid mappings.) In fact, it is one of the factors that led
>> us to think this patch is the right approach.
>>
>> That said, I am also OK with adding such a parameter now, but we need to
>> choose the default value carefully.

I agree with that :).

> 
> This would also mean keeping support in the code for old and new behaviour
> which might make the code bigger and more complex.

I am concerned with that as well. However, this concern is also going to 
be true if we introduce an hypercall using a physical address as 
parameter. Indeed, the old hypercall will not go away.

If we introduce a second hypercall, you will also have to think about 
the interactions between the two. For instance:
     - The firmware may register the runstate using the old hypercall, 
while the OS may register using the new hypercall.
     - Can an OS use a mix of the two hypercalls?

For more details, you can have a look at the original attempt for a new 
hypercall (see [1]).

The approach you discussed with Jan has the advantage to not require any 
change in the guest software stack. So this would be my preference over 
a new hypercall.

>>
>>
>> We need the new behavior as default on ARM because we need the fix to
>> work for all guests. I don't think we want to explain how you always
>> need to set config_foobar otherwise things don't work. It has to work
>> out of the box.
>>
>> It would be nice if we had the same default on x86 too, although I
>> understand if Jan and Andrew don't want to make the same change on x86,
>> at least initially.
> 
> So you mean here adding a parameter but only on Arm ?
> Should it be a command line parameter ? a configuration parameter ? both ?
> 
> It seems that with this patch i touched some kind of sensible area.
> Should i just abandon it and see later to work on adding a new hypercall using
> a physical address as parameter ?

I would suggest to mention the thread in the next community call.

Cheers,

[1] <1558721577-13958-3-git-send-email-andrii.anisov@gmail.com>


-- 
Julien Grall


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-08-20 10:24 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-28 15:52 [PATCH v2] xen/arm: Convert runstate address during hypcall Bertrand Marquis
2020-07-28 19:04 ` Stefano Stabellini
2020-07-29  6:47   ` Bertrand Marquis
2020-07-28 19:54 ` Jan Beulich
2020-07-29  7:08   ` Bertrand Marquis
2020-07-29 18:41     ` Jan Beulich
2020-07-30  1:30       ` Stefano Stabellini
2020-07-31  6:39         ` Jan Beulich
2020-07-31 10:12           ` Julien Grall
2020-07-31 10:18             ` Jan Beulich
2020-07-31 14:36               ` Bertrand Marquis
2020-07-31 23:03                 ` Stefano Stabellini
2020-08-14  9:25                   ` Bertrand Marquis
2020-08-20 10:23                     ` Julien Grall

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).