* [PATCH 0/2] xen/arm: Convert runstate address during hypcall @ 2020-06-11 11:58 Bertrand Marquis 2020-06-11 11:58 ` [PATCH 1/2] " Bertrand Marquis 2020-06-11 11:58 ` [PATCH 2/2] xen/arm: Support runstate crossing pages Bertrand Marquis 0 siblings, 2 replies; 26+ messages in thread From: Bertrand Marquis @ 2020-06-11 11:58 UTC (permalink / raw) To: xen-devel Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, nd, Volodymyr Babchuk, Roger Pau Monné This series introduces changes to convert the runstate address (a virtual address provided by the guest) during the hypercall on Arm. It keeps the current strategy on x86 but moves some of the common code inside the architecture code. The series is divided into 2 patches to allow an easier review of the specific code handling the case where the guest area actually crosses 2 pages. Bertrand Marquis (2): xen/arm: Convert runstate address during hypcall xen/arm: Support runstate crossing pages xen/arch/arm/domain.c | 154 ++++++++++++++++++++++++++++++----- xen/arch/x86/domain.c | 30 ++++++- xen/arch/x86/x86_64/domain.c | 4 +- xen/common/domain.c | 19 ++--- xen/include/asm-arm/domain.h | 9 ++ xen/include/asm-x86/domain.h | 16 ++++ xen/include/xen/domain.h | 4 + xen/include/xen/sched.h | 16 +--- 8 files changed, 199 insertions(+), 53 deletions(-) -- 2.17.1 ^ permalink raw reply [flat|nested] 26+ messages in thread
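The "crossing pages" case the cover letter mentions comes down to simple page arithmetic: a runstate area registered at a guest virtual address fits in a single page only if its offset within that page leaves room for the whole structure. A minimal sketch of that check — the `PAGE_*` constants and the trimmed-down structure below are illustrative stand-ins, not Xen's actual definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)   /* 4 KiB pages, as with a 4K granule */
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Return 1 if an object of `size` bytes at guest virtual address `va`
 * would span two pages, 0 if it fits entirely within one page. */
static int crosses_page(uintptr_t va, size_t size)
{
    uintptr_t offset = va & ~PAGE_MASK;  /* byte offset within the page */
    return offset > PAGE_SIZE - size;
}
```

Patch 1/2 rejects the crossing case with -EINVAL; patch 2/2 is what adds support for it.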
* [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-11 11:58 [PATCH 0/2] xen/arm: Convert runstate address during hypcall Bertrand Marquis @ 2020-06-11 11:58 ` Bertrand Marquis 2020-06-11 18:16 ` Stefano Stabellini 2020-06-12 10:53 ` Julien Grall 2020-06-11 11:58 ` [PATCH 2/2] xen/arm: Support runstate crossing pages Bertrand Marquis 1 sibling, 2 replies; 26+ messages in thread From: Bertrand Marquis @ 2020-06-11 11:58 UTC (permalink / raw) To: xen-devel Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, nd, Volodymyr Babchuk, Roger Pau Monné At the moment on Arm, a Linux guest running with KPTI enabled will cause the following error when a context switch happens in user mode: (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0 This patch modifies the register runstate area handling on Arm to convert the guest address during the hypercall. During runtime context switches the area is mapped to update the guest runstate copy. The patch also removes the initial copy which was done during the hypercall handling, as this is done on the current core when the context is restored to go back to guest execution on Arm. As a consequence, if the register runstate hypercall is called on one vcpu for another vcpu, the area will not be updated until that vcpu is executed (on Arm only). On x86, the behaviour is not modified: the address conversion is done during the context switch and the area is updated fully during the hypercall. Inline functions in headers could not be used, as the architecture domain.h is included before the global domain.h, making it impossible to use struct vcpu inside the architecture header. This should not have any performance impact, as the hypercall is usually only called once per vcpu. 
Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> --- xen/arch/arm/domain.c | 109 +++++++++++++++++++++++++++++------ xen/arch/x86/domain.c | 30 +++++++++- xen/arch/x86/x86_64/domain.c | 4 +- xen/common/domain.c | 19 ++---- xen/include/asm-arm/domain.h | 8 +++ xen/include/asm-x86/domain.h | 16 +++++ xen/include/xen/domain.h | 4 ++ xen/include/xen/sched.h | 16 +---- 8 files changed, 153 insertions(+), 53 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 31169326b2..739059234f 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -19,6 +19,7 @@ #include <xen/sched.h> #include <xen/softirq.h> #include <xen/wait.h> +#include <xen/domain_page.h> #include <asm/alternative.h> #include <asm/cpuerrata.h> @@ -275,36 +276,104 @@ static void ctxt_switch_to(struct vcpu *n) virt_timer_restore(n); } -/* Update per-VCPU guest runstate shared memory area (if registered). */ -static void update_runstate_area(struct vcpu *v) +void arch_cleanup_runstate_guest(struct vcpu *v) { - void __user *guest_handle = NULL; - struct vcpu_runstate_info runstate; + spin_lock(&v->arch.runstate_guest.lock); - if ( guest_handle_is_null(runstate_guest(v)) ) - return; + /* cleanup previous page if any */ + if ( v->arch.runstate_guest.page ) + { + put_page_and_type(v->arch.runstate_guest.page); + v->arch.runstate_guest.page = NULL; + v->arch.runstate_guest.offset = 0; + } - memcpy(&runstate, &v->runstate, sizeof(runstate)); + spin_unlock(&v->arch.runstate_guest.lock); +} - if ( VM_ASSIST(v->domain, runstate_update_flag) ) +int arch_setup_runstate_guest(struct vcpu *v, + struct vcpu_register_runstate_memory_area area) +{ + struct page_info *page; + unsigned offset; + + spin_lock(&v->arch.runstate_guest.lock); + + /* cleanup previous page if any */ + if ( v->arch.runstate_guest.page ) { - guest_handle = &v->runstate_guest.p->state_entry_time + 1; - guest_handle--; - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; - __raw_copy_to_guest(guest_handle, - (void 
*)(&runstate.state_entry_time + 1) - 1, 1); - smp_wmb(); + put_page_and_type(v->arch.runstate_guest.page); + v->arch.runstate_guest.page = NULL; + v->arch.runstate_guest.offset = 0; + } + + offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) + { + spin_unlock(&v->arch.runstate_guest.lock); + gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); + return -EINVAL; + } + + /* provided address must be aligned to a 64bit */ + if ( offset % alignof(struct vcpu_runstate_info) ) + { + spin_unlock(&v->arch.runstate_guest.lock); + gprintk(XENLOG_DEBUG, "Runstate pointer is not aligned\n"); + return -EINVAL; + } + + page = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); + if ( !page ) + { + spin_unlock(&v->arch.runstate_guest.lock); + gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); + return -EINVAL; } - __copy_to_guest(runstate_guest(v), &runstate, 1); + v->arch.runstate_guest.page = page; + v->arch.runstate_guest.offset = offset; + + spin_unlock(&v->arch.runstate_guest.lock); + + return 0; +} + + +/* Update per-VCPU guest runstate shared memory area (if registered). 
*/ +static void update_runstate_area(struct vcpu *v) +{ + struct vcpu_runstate_info *guest_runstate; + void *p; + + spin_lock(&v->arch.runstate_guest.lock); - if ( guest_handle ) + if ( v->arch.runstate_guest.page ) { - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; + p = __map_domain_page(v->arch.runstate_guest.page); + guest_runstate = p + v->arch.runstate_guest.offset; + + if ( VM_ASSIST(v->domain, runstate_update_flag) ) + { + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; + smp_wmb(); + } + + memcpy(guest_runstate, &v->runstate, sizeof(v->runstate)); smp_wmb(); - __raw_copy_to_guest(guest_handle, - (void *)(&runstate.state_entry_time + 1) - 1, 1); + + if ( VM_ASSIST(v->domain, runstate_update_flag) ) + { + v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; + guest_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; + smp_wmb(); + } + + unmap_domain_page(p); } + + spin_unlock(&v->arch.runstate_guest.lock); } static void schedule_tail(struct vcpu *prev) @@ -560,6 +629,8 @@ int arch_vcpu_create(struct vcpu *v) v->arch.saved_context.sp = (register_t)v->arch.cpu_info; v->arch.saved_context.pc = (register_t)continue_new_vcpu; + spin_lock_init(&v->arch.runstate_guest.lock); + /* Idle VCPUs don't need the rest of this setup */ if ( is_idle_vcpu(v) ) return rc; diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index fee6c3931a..32bbed87d5 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1642,6 +1642,30 @@ void paravirt_ctxt_switch_to(struct vcpu *v) wrmsr_tsc_aux(v->arch.msrs->tsc_aux); } +int arch_setup_runstate_guest(struct vcpu *v, + struct vcpu_register_runstate_memory_area area) +{ + struct vcpu_runstate_info runstate; + + runstate_guest(v) = area.addr.h; + + if ( v == current ) + { + __copy_to_guest(runstate_guest(v), &v->runstate, 1); + } + else + { + vcpu_runstate_get(v, &runstate); + __copy_to_guest(runstate_guest(v), &runstate, 1); + } + return 0; +} + +void 
arch_cleanup_runstate_guest(struct vcpu *v) +{ + set_xen_guest_handle(runstate_guest(v), NULL); +} + /* Update per-VCPU guest runstate shared memory area (if registered). */ bool update_runstate_area(struct vcpu *v) { @@ -1660,8 +1684,8 @@ bool update_runstate_area(struct vcpu *v) if ( VM_ASSIST(v->domain, runstate_update_flag) ) { guest_handle = has_32bit_shinfo(v->domain) - ? &v->runstate_guest.compat.p->state_entry_time + 1 - : &v->runstate_guest.native.p->state_entry_time + 1; + ? &v->arch.runstate_guest.compat.p->state_entry_time + 1 + : &v->arch.runstate_guest.native.p->state_entry_time + 1; guest_handle--; runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; __raw_copy_to_guest(guest_handle, @@ -1674,7 +1698,7 @@ bool update_runstate_area(struct vcpu *v) struct compat_vcpu_runstate_info info; XLAT_vcpu_runstate_info(&info, &runstate); - __copy_to_guest(v->runstate_guest.compat, &info, 1); + __copy_to_guest(v->arch.runstate_guest.compat, &info, 1); rc = true; } else diff --git a/xen/arch/x86/x86_64/domain.c b/xen/arch/x86/x86_64/domain.c index c46dccc25a..b879e8dd2c 100644 --- a/xen/arch/x86/x86_64/domain.c +++ b/xen/arch/x86/x86_64/domain.c @@ -36,7 +36,7 @@ arch_compat_vcpu_op( break; rc = 0; - guest_from_compat_handle(v->runstate_guest.compat, area.addr.h); + guest_from_compat_handle(v->arch.runstate_guest.compat, area.addr.h); if ( v == current ) { @@ -49,7 +49,7 @@ arch_compat_vcpu_op( vcpu_runstate_get(v, &runstate); XLAT_vcpu_runstate_info(&info, &runstate); } - __copy_to_guest(v->runstate_guest.compat, &info, 1); + __copy_to_guest(v->arch.runstate_guest.compat, &info, 1); break; } diff --git a/xen/common/domain.c b/xen/common/domain.c index 7cc9526139..0ca6bf4dbc 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -727,7 +727,10 @@ int domain_kill(struct domain *d) if ( cpupool_move_domain(d, cpupool0) ) return -ERESTART; for_each_vcpu ( d, v ) + { + arch_cleanup_runstate_guest(v); unmap_vcpu_info(v); + } d->is_dying = DOMDYING_dead; /* Mem 
event cleanup has to go here because the rings * have to be put before we call put_domain. */ @@ -1167,7 +1170,7 @@ int domain_soft_reset(struct domain *d) for_each_vcpu ( d, v ) { - set_xen_guest_handle(runstate_guest(v), NULL); + arch_cleanup_runstate_guest(v); unmap_vcpu_info(v); } @@ -1484,7 +1487,6 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg) case VCPUOP_register_runstate_memory_area: { struct vcpu_register_runstate_memory_area area; - struct vcpu_runstate_info runstate; rc = -EFAULT; if ( copy_from_guest(&area, arg, 1) ) @@ -1493,18 +1495,7 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg) if ( !guest_handle_okay(area.addr.h, 1) ) break; - rc = 0; - runstate_guest(v) = area.addr.h; - - if ( v == current ) - { - __copy_to_guest(runstate_guest(v), &v->runstate, 1); - } - else - { - vcpu_runstate_get(v, &runstate); - __copy_to_guest(runstate_guest(v), &runstate, 1); - } + rc = arch_setup_runstate_guest(v, area); break; } diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 4e2f582006..3a7f53e13d 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -11,6 +11,7 @@ #include <asm/vgic.h> #include <asm/vpl011.h> #include <public/hvm/params.h> +#include <public/vcpu.h> #include <xen/serial.h> #include <xen/rbtree.h> @@ -206,6 +207,13 @@ struct arch_vcpu */ bool need_flush_to_ram; + /* runstate guest info */ + struct { + struct page_info *page; /* guest page */ + unsigned offset; /* offset in page */ + spinlock_t lock; /* lock to access page */ + } runstate_guest; + } __cacheline_aligned; void vcpu_show_execution_state(struct vcpu *); diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index e8cee3d5e5..f917b48cb8 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -11,6 +11,11 @@ #include <asm/x86_emulate.h> #include <public/vcpu.h> #include <public/hvm/hvm_info_table.h> +#ifdef CONFIG_COMPAT 
+#include <compat/vcpu.h> +DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t); +#endif + #define has_32bit_shinfo(d) ((d)->arch.has_32bit_shinfo) @@ -626,6 +631,17 @@ struct arch_vcpu struct { bool next_interrupt_enabled; } monitor; + +#ifndef CONFIG_COMPAT +# define runstate_guest(v) ((v)->arch.runstate_guest) + XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */ +#else +# define runstate_guest(v) ((v)->arch.runstate_guest.native) + union { + XEN_GUEST_HANDLE(vcpu_runstate_info_t) native; + XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat; + } runstate_guest; +#endif }; struct guest_memory_policy diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h index 7e51d361de..d5d73ce997 100644 --- a/xen/include/xen/domain.h +++ b/xen/include/xen/domain.h @@ -63,6 +63,10 @@ void arch_vcpu_destroy(struct vcpu *v); int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset); void unmap_vcpu_info(struct vcpu *v); +int arch_setup_runstate_guest(struct vcpu *v, + struct vcpu_register_runstate_memory_area area); +void arch_cleanup_runstate_guest(struct vcpu *v); + int arch_domain_create(struct domain *d, struct xen_domctl_createdomain *config); diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index ac53519d7f..fac030fb83 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -29,11 +29,6 @@ #include <public/vcpu.h> #include <public/event_channel.h> -#ifdef CONFIG_COMPAT -#include <compat/vcpu.h> -DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t); -#endif - /* * Stats * @@ -166,16 +161,7 @@ struct vcpu struct sched_unit *sched_unit; struct vcpu_runstate_info runstate; -#ifndef CONFIG_COMPAT -# define runstate_guest(v) ((v)->runstate_guest) - XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */ -#else -# define runstate_guest(v) ((v)->runstate_guest.native) - union { - XEN_GUEST_HANDLE(vcpu_runstate_info_t) native; - XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat; - } 
runstate_guest; /* guest address */ -#endif + unsigned int new_state; /* Has the FPU been initialised? */ -- 2.17.1 ^ permalink raw reply related [flat|nested] 26+ messages in thread
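The heart of the Arm-side rework above is the update sequence in the new `update_runstate_area()`: raise `XEN_RUNSTATE_UPDATE` in both Xen's copy and the guest-visible copy, copy the whole record (the source still carries the flag, so the guest copy keeps it through the `memcpy`), then lower it. A single-threaded sketch of that sequence with a trimmed-down stand-in structure (not the real `struct vcpu_runstate_info`), with the `smp_wmb()` barriers of the real code reduced to comments:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RUNSTATE_UPDATE (1ULL << 63)   /* stand-in for XEN_RUNSTATE_UPDATE */

/* Trimmed-down stand-in for struct vcpu_runstate_info. */
struct runstate {
    uint32_t state;
    uint64_t state_entry_time;   /* top bit doubles as the "in flux" flag */
};

/* Mirror of the update sequence in the patched update_runstate_area(). */
static void publish_runstate(struct runstate *guest, struct runstate *xen)
{
    xen->state_entry_time   |= RUNSTATE_UPDATE;
    guest->state_entry_time |= RUNSTATE_UPDATE;
    /* smp_wmb(); -- the flag must be visible before the body changes */

    memcpy(guest, xen, sizeof(*xen));  /* flag stays set: the source has it */
    /* smp_wmb(); -- the new body must be visible before the flag clears */

    xen->state_entry_time   &= ~RUNSTATE_UPDATE;
    guest->state_entry_time &= ~RUNSTATE_UPDATE;
    /* smp_wmb(); */
}
```

A guest reader that observes the flag set knows the record is mid-rewrite and should retry; that is what the `runstate_update_flag` VM assist enables.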
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-11 11:58 ` [PATCH 1/2] " Bertrand Marquis @ 2020-06-11 18:16 ` Stefano Stabellini 2020-06-11 18:24 ` Julien Grall 2020-06-12 8:07 ` Bertrand Marquis 2020-06-12 10:53 ` Julien Grall 1 sibling, 2 replies; 26+ messages in thread From: Stefano Stabellini @ 2020-06-11 18:16 UTC (permalink / raw) To: Bertrand Marquis Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné On Thu, 11 Jun 2020, Bertrand Marquis wrote: > At the moment on Arm, a Linux guest running with KTPI enabled will > cause the following error when a context switch happens in user mode: > (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0 > > This patch is modifying the register runstate area handling on arm to > convert the guest address during the hypercall. During runtime context > switches the area is mapped to update the guest runstate copy. > The patch is also removing the initial copy which was done during the > hypercall handling as this is done on the current core when the context > is restore to go back to guest execution on arm. > > As a consequence if the register runstate hypercall is called on one > vcpu for an other vcpu, the area will not be updated until the vcpu > will be executed (on arm only). > > On x86, the behaviour is not modified, the address conversion is done > during the context switch and the area is updated fully during the > hypercall. > inline functions in headers could not be used as the architecture > domain.h is included before the global domain.h making it impossible > to use the struct vcpu inside the architecture header. > This should not have any performance impact as the hypercall is only > called once per vcpu usually. 
> > Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> > --- > xen/arch/arm/domain.c | 109 +++++++++++++++++++++++++++++------ > xen/arch/x86/domain.c | 30 +++++++++- > xen/arch/x86/x86_64/domain.c | 4 +- > xen/common/domain.c | 19 ++---- > xen/include/asm-arm/domain.h | 8 +++ > xen/include/asm-x86/domain.h | 16 +++++ > xen/include/xen/domain.h | 4 ++ > xen/include/xen/sched.h | 16 +---- > 8 files changed, 153 insertions(+), 53 deletions(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index 31169326b2..739059234f 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -19,6 +19,7 @@ > #include <xen/sched.h> > #include <xen/softirq.h> > #include <xen/wait.h> > +#include <xen/domain_page.h> > > #include <asm/alternative.h> > #include <asm/cpuerrata.h> > @@ -275,36 +276,104 @@ static void ctxt_switch_to(struct vcpu *n) > virt_timer_restore(n); > } > > -/* Update per-VCPU guest runstate shared memory area (if registered). */ > -static void update_runstate_area(struct vcpu *v) > +void arch_cleanup_runstate_guest(struct vcpu *v) > { > - void __user *guest_handle = NULL; > - struct vcpu_runstate_info runstate; > + spin_lock(&v->arch.runstate_guest.lock); > > - if ( guest_handle_is_null(runstate_guest(v)) ) > - return; > + /* cleanup previous page if any */ > + if ( v->arch.runstate_guest.page ) > + { > + put_page_and_type(v->arch.runstate_guest.page); > + v->arch.runstate_guest.page = NULL; > + v->arch.runstate_guest.offset = 0; > + } > > - memcpy(&runstate, &v->runstate, sizeof(runstate)); > + spin_unlock(&v->arch.runstate_guest.lock); > +} > > - if ( VM_ASSIST(v->domain, runstate_update_flag) ) > +int arch_setup_runstate_guest(struct vcpu *v, > + struct vcpu_register_runstate_memory_area area) > +{ > + struct page_info *page; > + unsigned offset; > + > + spin_lock(&v->arch.runstate_guest.lock); > + > + /* cleanup previous page if any */ > + if ( v->arch.runstate_guest.page ) > { > - guest_handle = 
&v->runstate_guest.p->state_entry_time + 1; > - guest_handle--; > - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > - __raw_copy_to_guest(guest_handle, > - (void *)(&runstate.state_entry_time + 1) - 1, 1); > - smp_wmb(); > + put_page_and_type(v->arch.runstate_guest.page); > + v->arch.runstate_guest.page = NULL; > + v->arch.runstate_guest.offset = 0; > + } > + > + offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; > + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) > + { > + spin_unlock(&v->arch.runstate_guest.lock); > + gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); > + return -EINVAL; > + } > + > + /* provided address must be aligned to a 64bit */ > + if ( offset % alignof(struct vcpu_runstate_info) ) > + { > + spin_unlock(&v->arch.runstate_guest.lock); > + gprintk(XENLOG_DEBUG, "Runstate pointer is not aligned\n"); > + return -EINVAL; > + } > + > + page = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); > + if ( !page ) > + { > + spin_unlock(&v->arch.runstate_guest.lock); > + gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); I would make all these XENLOG_WARNING or XENLOG_ERR, they are errors after all. This last one I'd say "Could not map runstate pointer" and also print the address. > + return -EINVAL; > + } > > - __copy_to_guest(runstate_guest(v), &runstate, 1); > + v->arch.runstate_guest.page = page; > + v->arch.runstate_guest.offset = offset; > + > + spin_unlock(&v->arch.runstate_guest.lock); > + > + return 0; > +} > + > + > +/* Update per-VCPU guest runstate shared memory area (if registered). 
*/ > +static void update_runstate_area(struct vcpu *v) > +{ > + struct vcpu_runstate_info *guest_runstate; > + void *p; > + > + spin_lock(&v->arch.runstate_guest.lock); > > - if ( guest_handle ) > + if ( v->arch.runstate_guest.page ) > { > - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > + p = __map_domain_page(v->arch.runstate_guest.page); > + guest_runstate = p + v->arch.runstate_guest.offset; > + > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > + { > + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; I think that this write to guest_runstate should use write_atomic or another atomic write operation. > + smp_wmb(); > + } > + > + memcpy(guest_runstate, &v->runstate, sizeof(v->runstate)); > smp_wmb(); > - __raw_copy_to_guest(guest_handle, > - (void *)(&runstate.state_entry_time + 1) - 1, 1); > + > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > + { > + v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > + guest_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; same here > + smp_wmb(); > + } > + > + unmap_domain_page(p); > } > + > + spin_unlock(&v->arch.runstate_guest.lock); > } > > static void schedule_tail(struct vcpu *prev) > @@ -560,6 +629,8 @@ int arch_vcpu_create(struct vcpu *v) > v->arch.saved_context.sp = (register_t)v->arch.cpu_info; > v->arch.saved_context.pc = (register_t)continue_new_vcpu; > > + spin_lock_init(&v->arch.runstate_guest.lock); > + > /* Idle VCPUs don't need the rest of this setup */ > if ( is_idle_vcpu(v) ) > return rc; > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c > index fee6c3931a..32bbed87d5 100644 > --- a/xen/arch/x86/domain.c > +++ b/xen/arch/x86/domain.c > @@ -1642,6 +1642,30 @@ void paravirt_ctxt_switch_to(struct vcpu *v) > wrmsr_tsc_aux(v->arch.msrs->tsc_aux); > } > > +int arch_setup_runstate_guest(struct vcpu *v, > + struct vcpu_register_runstate_memory_area area) > +{ > + struct vcpu_runstate_info runstate; > + > + 
runstate_guest(v) = area.addr.h; > + > + if ( v == current ) > + { > + __copy_to_guest(runstate_guest(v), &v->runstate, 1); > + } > + else > + { > + vcpu_runstate_get(v, &runstate); > + __copy_to_guest(runstate_guest(v), &runstate, 1); > + } > + return 0; > +} > + > +void arch_cleanup_runstate_guest(struct vcpu *v) > +{ > + set_xen_guest_handle(runstate_guest(v), NULL); > +} > + > /* Update per-VCPU guest runstate shared memory area (if registered). */ > bool update_runstate_area(struct vcpu *v) > { > @@ -1660,8 +1684,8 @@ bool update_runstate_area(struct vcpu *v) > if ( VM_ASSIST(v->domain, runstate_update_flag) ) > { > guest_handle = has_32bit_shinfo(v->domain) > - ? &v->runstate_guest.compat.p->state_entry_time + 1 > - : &v->runstate_guest.native.p->state_entry_time + 1; > + ? &v->arch.runstate_guest.compat.p->state_entry_time + 1 > + : &v->arch.runstate_guest.native.p->state_entry_time + 1; > guest_handle--; > runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > __raw_copy_to_guest(guest_handle, > @@ -1674,7 +1698,7 @@ bool update_runstate_area(struct vcpu *v) > struct compat_vcpu_runstate_info info; > > XLAT_vcpu_runstate_info(&info, &runstate); > - __copy_to_guest(v->runstate_guest.compat, &info, 1); > + __copy_to_guest(v->arch.runstate_guest.compat, &info, 1); > rc = true; > } > else > diff --git a/xen/arch/x86/x86_64/domain.c b/xen/arch/x86/x86_64/domain.c > index c46dccc25a..b879e8dd2c 100644 > --- a/xen/arch/x86/x86_64/domain.c > +++ b/xen/arch/x86/x86_64/domain.c > @@ -36,7 +36,7 @@ arch_compat_vcpu_op( > break; > > rc = 0; > - guest_from_compat_handle(v->runstate_guest.compat, area.addr.h); > + guest_from_compat_handle(v->arch.runstate_guest.compat, area.addr.h); > > if ( v == current ) > { > @@ -49,7 +49,7 @@ arch_compat_vcpu_op( > vcpu_runstate_get(v, &runstate); > XLAT_vcpu_runstate_info(&info, &runstate); > } > - __copy_to_guest(v->runstate_guest.compat, &info, 1); > + __copy_to_guest(v->arch.runstate_guest.compat, &info, 1); > > break; > } > diff 
--git a/xen/common/domain.c b/xen/common/domain.c > index 7cc9526139..0ca6bf4dbc 100644 > --- a/xen/common/domain.c > +++ b/xen/common/domain.c > @@ -727,7 +727,10 @@ int domain_kill(struct domain *d) > if ( cpupool_move_domain(d, cpupool0) ) > return -ERESTART; > for_each_vcpu ( d, v ) > + { > + arch_cleanup_runstate_guest(v); > unmap_vcpu_info(v); > + } > d->is_dying = DOMDYING_dead; > /* Mem event cleanup has to go here because the rings > * have to be put before we call put_domain. */ > @@ -1167,7 +1170,7 @@ int domain_soft_reset(struct domain *d) > > for_each_vcpu ( d, v ) > { > - set_xen_guest_handle(runstate_guest(v), NULL); > + arch_cleanup_runstate_guest(v); > unmap_vcpu_info(v); > } > > @@ -1484,7 +1487,6 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg) > case VCPUOP_register_runstate_memory_area: > { > struct vcpu_register_runstate_memory_area area; > - struct vcpu_runstate_info runstate; > > rc = -EFAULT; > if ( copy_from_guest(&area, arg, 1) ) > @@ -1493,18 +1495,7 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg) > if ( !guest_handle_okay(area.addr.h, 1) ) > break; > > - rc = 0; > - runstate_guest(v) = area.addr.h; > - > - if ( v == current ) > - { > - __copy_to_guest(runstate_guest(v), &v->runstate, 1); > - } > - else > - { > - vcpu_runstate_get(v, &runstate); > - __copy_to_guest(runstate_guest(v), &runstate, 1); > - } > + rc = arch_setup_runstate_guest(v, area); > > break; > } > diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h > index 4e2f582006..3a7f53e13d 100644 > --- a/xen/include/asm-arm/domain.h > +++ b/xen/include/asm-arm/domain.h > @@ -11,6 +11,7 @@ > #include <asm/vgic.h> > #include <asm/vpl011.h> > #include <public/hvm/params.h> > +#include <public/vcpu.h> > #include <xen/serial.h> > #include <xen/rbtree.h> > > @@ -206,6 +207,13 @@ struct arch_vcpu > */ > bool need_flush_to_ram; > > + /* runstate guest info */ > + struct { > + struct page_info *page; 
/* guest page */ > + unsigned offset; /* offset in page */ > + spinlock_t lock; /* lock to access page */ > + } runstate_guest; > + > } __cacheline_aligned; > > void vcpu_show_execution_state(struct vcpu *); > diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h > index e8cee3d5e5..f917b48cb8 100644 > --- a/xen/include/asm-x86/domain.h > +++ b/xen/include/asm-x86/domain.h > @@ -11,6 +11,11 @@ > #include <asm/x86_emulate.h> > #include <public/vcpu.h> > #include <public/hvm/hvm_info_table.h> > +#ifdef CONFIG_COMPAT > +#include <compat/vcpu.h> > +DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t); > +#endif > + > > #define has_32bit_shinfo(d) ((d)->arch.has_32bit_shinfo) > > @@ -626,6 +631,17 @@ struct arch_vcpu > struct { > bool next_interrupt_enabled; > } monitor; > + > +#ifndef CONFIG_COMPAT > +# define runstate_guest(v) ((v)->arch.runstate_guest) > + XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */ > +#else > +# define runstate_guest(v) ((v)->arch.runstate_guest.native) > + union { > + XEN_GUEST_HANDLE(vcpu_runstate_info_t) native; > + XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat; > + } runstate_guest; > +#endif > }; > > struct guest_memory_policy > diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h > index 7e51d361de..d5d73ce997 100644 > --- a/xen/include/xen/domain.h > +++ b/xen/include/xen/domain.h > @@ -63,6 +63,10 @@ void arch_vcpu_destroy(struct vcpu *v); > int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset); > void unmap_vcpu_info(struct vcpu *v); > > +int arch_setup_runstate_guest(struct vcpu *v, > + struct vcpu_register_runstate_memory_area area); NIT: code style, the indentation is off > +void arch_cleanup_runstate_guest(struct vcpu *v); > + > int arch_domain_create(struct domain *d, > struct xen_domctl_createdomain *config); > > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h > index ac53519d7f..fac030fb83 100644 > --- a/xen/include/xen/sched.h > +++ 
b/xen/include/xen/sched.h > @@ -29,11 +29,6 @@ > #include <public/vcpu.h> > #include <public/event_channel.h> > > -#ifdef CONFIG_COMPAT > -#include <compat/vcpu.h> > -DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t); > -#endif > - > /* > * Stats > * > @@ -166,16 +161,7 @@ struct vcpu > struct sched_unit *sched_unit; > > struct vcpu_runstate_info runstate; > -#ifndef CONFIG_COMPAT > -# define runstate_guest(v) ((v)->runstate_guest) > - XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */ > -#else > -# define runstate_guest(v) ((v)->runstate_guest.native) > - union { > - XEN_GUEST_HANDLE(vcpu_runstate_info_t) native; > - XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat; > - } runstate_guest; /* guest address */ > -#endif > + > unsigned int new_state; > > /* Has the FPU been initialised? */ > -- > 2.17.1 > ^ permalink raw reply [flat|nested] 26+ messages in thread
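The alignment check quoted in the review above (`offset % alignof(struct vcpu_runstate_info)`) enforces natural alignment, so the structure's 64-bit fields never straddle an alignment boundary. A sketch of the same test with a stand-in structure (the real `struct vcpu_runstate_info` has more fields, but its `uint64_t` members give it the same natural alignment):

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Stand-in for struct vcpu_runstate_info. */
struct runstate {
    uint32_t state;
    uint64_t state_entry_time;
};

/* Returns 0 when the address would pass the patch's alignment check. */
static uintptr_t misalignment(uintptr_t va)
{
    return va % alignof(struct runstate);
}
```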
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-11 18:16 ` Stefano Stabellini @ 2020-06-11 18:24 ` Julien Grall 2020-06-11 18:50 ` Stefano Stabellini 2020-06-12 8:07 ` Bertrand Marquis 1 sibling, 1 reply; 26+ messages in thread From: Julien Grall @ 2020-06-11 18:24 UTC (permalink / raw) To: Stefano Stabellini Cc: Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Bertrand Marquis, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné > > + return -EINVAL; > > } > > > > - __copy_to_guest(runstate_guest(v), &runstate, 1); > > + v->arch.runstate_guest.page = page; > > + v->arch.runstate_guest.offset = offset; > > + > > + spin_unlock(&v->arch.runstate_guest.lock); > > + > > + return 0; > > +} > > + > > + > > +/* Update per-VCPU guest runstate shared memory area (if registered). */ > > +static void update_runstate_area(struct vcpu *v) > > +{ > > + struct vcpu_runstate_info *guest_runstate; > > + void *p; > > + > > + spin_lock(&v->arch.runstate_guest.lock); > > > > - if ( guest_handle ) > > + if ( v->arch.runstate_guest.page ) > > { > > - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > > + p = __map_domain_page(v->arch.runstate_guest.page); > > + guest_runstate = p + v->arch.runstate_guest.offset; > > + > > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > > + { > > + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > > + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > > I think that this write to guest_runstate should use write_atomic or > another atomic write operation. I thought about suggesting the same, but guest_copy_* helpers may not do a single memory write to state_entry_time. What are you trying to prevent with the write_atomic()? Cheers, ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-11 18:24 ` Julien Grall @ 2020-06-11 18:50 ` Stefano Stabellini 2020-06-11 19:38 ` Julien Grall 0 siblings, 1 reply; 26+ messages in thread From: Stefano Stabellini @ 2020-06-11 18:50 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Bertrand Marquis, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné On Thu, 11 Jun 2020, Julien Grall wrote: > > > + return -EINVAL; > > > } > > > > > > - __copy_to_guest(runstate_guest(v), &runstate, 1); > > > + v->arch.runstate_guest.page = page; > > > + v->arch.runstate_guest.offset = offset; > > > + > > > + spin_unlock(&v->arch.runstate_guest.lock); > > > + > > > + return 0; > > > +} > > > + > > > + > > > +/* Update per-VCPU guest runstate shared memory area (if registered). */ > > > +static void update_runstate_area(struct vcpu *v) > > > +{ > > > + struct vcpu_runstate_info *guest_runstate; > > > + void *p; > > > + > > > + spin_lock(&v->arch.runstate_guest.lock); > > > > > > - if ( guest_handle ) > > > + if ( v->arch.runstate_guest.page ) > > > { > > > - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > > > + p = __map_domain_page(v->arch.runstate_guest.page); > > > + guest_runstate = p + v->arch.runstate_guest.offset; > > > + > > > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > > > + { > > > + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > > > + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > > > > I think that this write to guest_runstate should use write_atomic or > > another atomic write operation. > > I thought about suggesting the same, but guest_copy_* helpers may not > do a single memory write to state_entry_time. > What are you trying to prevent with the write_atomic()? I am thinking that without using an atomic write, it would be (at least theoretically) possible for a guest to see a partial write to state_entry_time, which is not good. 
In theory, the set of assembly instructions generated by the compiler could go through an intermediate state that we don't want the guest to see. In practice, I doubt that any possible combination of assembly instructions generated by the compiler could lead to something harmful. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-11 18:50 ` Stefano Stabellini @ 2020-06-11 19:38 ` Julien Grall 2020-06-12 1:09 ` Stefano Stabellini 0 siblings, 1 reply; 26+ messages in thread From: Julien Grall @ 2020-06-11 19:38 UTC (permalink / raw) To: Stefano Stabellini, Julien Grall Cc: Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Bertrand Marquis, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné Hi Stefano, On 11/06/2020 19:50, Stefano Stabellini wrote: > On Thu, 11 Jun 2020, Julien Grall wrote: >>>> + return -EINVAL; >>>> } >>>> >>>> - __copy_to_guest(runstate_guest(v), &runstate, 1); >>>> + v->arch.runstate_guest.page = page; >>>> + v->arch.runstate_guest.offset = offset; >>>> + >>>> + spin_unlock(&v->arch.runstate_guest.lock); >>>> + >>>> + return 0; >>>> +} >>>> + >>>> + >>>> +/* Update per-VCPU guest runstate shared memory area (if registered). */ >>>> +static void update_runstate_area(struct vcpu *v) >>>> +{ >>>> + struct vcpu_runstate_info *guest_runstate; >>>> + void *p; >>>> + >>>> + spin_lock(&v->arch.runstate_guest.lock); >>>> >>>> - if ( guest_handle ) >>>> + if ( v->arch.runstate_guest.page ) >>>> { >>>> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >>>> + p = __map_domain_page(v->arch.runstate_guest.page); >>>> + guest_runstate = p + v->arch.runstate_guest.offset; >>>> + >>>> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >>>> + { >>>> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >>>> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; >>> >>> I think that this write to guest_runstate should use write_atomic or >>> another atomic write operation. >> >> I thought about suggesting the same, but guest_copy_* helpers may not >> do a single memory write to state_entry_time. >> What are you trying to prevent with the write_atomic()? 
> > I am thinking that without using an atomic write, it would be (at least > theoretically) possible for a guest to see a partial write to > state_entry_time, which is not good. It is already the case with the existing implementation as Xen may write byte by byte. So are you suggesting the existing code is also buggy? > In theory, the set of assembly > instructions generated by the compiler could go through an intermediate > state that we don't want the guest to see. In practice, I doubt that any > possible combination of assembly instructions generated by the compiler > could lead to something harmful. Well, I think you first need to define the theoretical states you are worried about. Then we can discuss whether they can happen in practice and how we can prevent them in the existing and new code. So what sort of states are you concerned about? Cheers, ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-11 19:38 ` Julien Grall @ 2020-06-12 1:09 ` Stefano Stabellini 2020-06-12 8:13 ` Bertrand Marquis 2020-06-12 9:53 ` Julien Grall 0 siblings, 2 replies; 26+ messages in thread From: Stefano Stabellini @ 2020-06-12 1:09 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Bertrand Marquis, Roger Pau Monné, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Julien Grall On Thu, 11 Jun 2020, Julien Grall wrote: > Hi Stefano, > > On 11/06/2020 19:50, Stefano Stabellini wrote: > > On Thu, 11 Jun 2020, Julien Grall wrote: > > > > > + return -EINVAL; > > > > > } > > > > > > > > > > - __copy_to_guest(runstate_guest(v), &runstate, 1); > > > > > + v->arch.runstate_guest.page = page; > > > > > + v->arch.runstate_guest.offset = offset; > > > > > + > > > > > + spin_unlock(&v->arch.runstate_guest.lock); > > > > > + > > > > > + return 0; > > > > > +} > > > > > + > > > > > + > > > > > +/* Update per-VCPU guest runstate shared memory area (if registered). > > > > > */ > > > > > +static void update_runstate_area(struct vcpu *v) > > > > > +{ > > > > > + struct vcpu_runstate_info *guest_runstate; > > > > > + void *p; > > > > > + > > > > > + spin_lock(&v->arch.runstate_guest.lock); > > > > > > > > > > - if ( guest_handle ) > > > > > + if ( v->arch.runstate_guest.page ) > > > > > { > > > > > - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > > > > > + p = __map_domain_page(v->arch.runstate_guest.page); > > > > > + guest_runstate = p + v->arch.runstate_guest.offset; > > > > > + > > > > > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > > > > > + { > > > > > + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > > > > > + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > > > > > > > > I think that this write to guest_runstate should use write_atomic or > > > > another atomic write operation. 
> > > > > > I thought about suggesting the same, but guest_copy_* helpers may not > > > do a single memory write to state_entry_time. > > > What are you trying to prevent with the write_atomic()? > > > > I am thinking that without using an atomic write, it would be (at least > > theoretically) possible for a guest to see a partial write to > > state_entry_time, which is not good. > > It is already the case with existing implementation as Xen may write byte by > byte. So are you suggesting the existing code is also buggy? Writing byte by byte is a different case. That is OK. In that case, the guest could see the state after 3 bytes written and it would be fine and consistent. If this hadn't been the case, then yes, the existing code would also be buggy. So if we did the write with a memcpy, it would be fine, no need for atomics: memcpy(&guest_runstate->state_entry_time, &v->runstate.state_entry_time, XXX); The |= case is different: GCC could implement it in any way it likes, including going through a zero-write to any of the bytes in the word, or doing an addition then a subtraction. GCC doesn't make any guarantees. If we want guarantees we need to use atomics. ^ permalink raw reply [flat|nested] 26+ messages in thread
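The distinction being drawn here — a read-modify-write like `|=` versus a single store of a precomputed value — can be sketched as follows. This is an illustrative model, not Xen code: the `XEN_RUNSTATE_UPDATE` constant mirrors the public ABI, but the helper names are hypothetical, and GCC's `__atomic_store_n` stands in for Xen's `write_atomic()`:

```c
#include <stdint.h>

#define XEN_RUNSTATE_UPDATE (1ULL << 63)  /* top bit of state_entry_time */

/*
 * Read-modify-write: the compiler may implement this as any sequence
 * of loads and stores, so a concurrent reader could in principle
 * observe an intermediate value.
 */
static void set_flag_rmw(uint64_t *state_entry_time)
{
    *state_entry_time |= XEN_RUNSTATE_UPDATE;
}

/*
 * Single store of a value computed beforehand: the shared location
 * goes directly from the old value to the new one (assuming the
 * architecture makes aligned 64-bit stores single-copy atomic,
 * as Arm64 does for the sizes at hand).
 */
static void set_flag_single_store(uint64_t *shared, uint64_t local)
{
    __atomic_store_n(shared, local | XEN_RUNSTATE_UPDATE, __ATOMIC_RELAXED);
}
```

Both variants end with the flag set; the difference is only in what a concurrent reader may observe in between.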
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-12 1:09 ` Stefano Stabellini @ 2020-06-12 8:13 ` Bertrand Marquis 2020-06-13 0:24 ` Stefano Stabellini 2020-06-12 9:53 ` Julien Grall 1 sibling, 1 reply; 26+ messages in thread From: Bertrand Marquis @ 2020-06-12 8:13 UTC (permalink / raw) To: Stefano Stabellini Cc: Roger Pau Monné, Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Julien Grall > On 12 Jun 2020, at 02:09, Stefano Stabellini <sstabellini@kernel.org> wrote: > > On Thu, 11 Jun 2020, Julien Grall wrote: >> Hi Stefano, >> >> On 11/06/2020 19:50, Stefano Stabellini wrote: >>> On Thu, 11 Jun 2020, Julien Grall wrote: >>>>>> + return -EINVAL; >>>>>> } >>>>>> >>>>>> - __copy_to_guest(runstate_guest(v), &runstate, 1); >>>>>> + v->arch.runstate_guest.page = page; >>>>>> + v->arch.runstate_guest.offset = offset; >>>>>> + >>>>>> + spin_unlock(&v->arch.runstate_guest.lock); >>>>>> + >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> + >>>>>> +/* Update per-VCPU guest runstate shared memory area (if registered). >>>>>> */ >>>>>> +static void update_runstate_area(struct vcpu *v) >>>>>> +{ >>>>>> + struct vcpu_runstate_info *guest_runstate; >>>>>> + void *p; >>>>>> + >>>>>> + spin_lock(&v->arch.runstate_guest.lock); >>>>>> >>>>>> - if ( guest_handle ) >>>>>> + if ( v->arch.runstate_guest.page ) >>>>>> { >>>>>> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >>>>>> + p = __map_domain_page(v->arch.runstate_guest.page); >>>>>> + guest_runstate = p + v->arch.runstate_guest.offset; >>>>>> + >>>>>> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >>>>>> + { >>>>>> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >>>>>> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; >>>>> >>>>> I think that this write to guest_runstate should use write_atomic or >>>>> another atomic write operation. 
>>>> >>>> I thought about suggesting the same, but guest_copy_* helpers may not >>>> do a single memory write to state_entry_time. >>>> What are you trying to prevent with the write_atomic()? >>> >>> I am thinking that without using an atomic write, it would be (at least >>> theoretically) possible for a guest to see a partial write to >>> state_entry_time, which is not good. >> >> It is already the case with existing implementation as Xen may write byte by >> byte. So are you suggesting the existing code is also buggy? > > Writing byte by byte is a different case. That is OK. In that case, the > guest could see the state after 3 bytes written and it would be fine and > consistent. If this hadn't been the case, then yes, the existing code > would also be buggy. > > So if we did the write with a memcpy, it would be fine, no need for > atomics: > > memcpy(&guest_runstate->state_entry_time, > &v->runstate.state_entry_time, > XXX); > > > The |= case is different: GCC could implement it in any way it likes, > including going through a zero-write to any of the bytes in the word, or > doing an addition then a subtraction. GCC doesn't make any guarantees. > If we want guarantees we need to use atomics. Wouldn’t that require all accesses to state_entry_time to also use atomic operations? In that case we could not propagate the changes to a guest without changing the interface itself. As the copy window needs to be protected, the write barriers are there to make sure that during the copy the bit is set and that when we unset it, the copy is done. I added a barrier after the memcpy for this purpose, to make sure that when/if we unset the bit the copy has already been done. Cheers Bertrand ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-12 8:13 ` Bertrand Marquis @ 2020-06-13 0:24 ` Stefano Stabellini 2020-06-15 14:09 ` Bertrand Marquis 0 siblings, 1 reply; 26+ messages in thread From: Stefano Stabellini @ 2020-06-13 0:24 UTC (permalink / raw) To: Bertrand Marquis Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Roger Pau Monné, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Julien Grall [-- Attachment #1: Type: text/plain, Size: 3798 bytes --] On Fri, 12 Jun 2020, Bertrand Marquis wrote: > > On 12 Jun 2020, at 02:09, Stefano Stabellini <sstabellini@kernel.org> wrote: > > > > On Thu, 11 Jun 2020, Julien Grall wrote: > >> Hi Stefano, > >> > >> On 11/06/2020 19:50, Stefano Stabellini wrote: > >>> On Thu, 11 Jun 2020, Julien Grall wrote: > >>>>>> + return -EINVAL; > >>>>>> } > >>>>>> > >>>>>> - __copy_to_guest(runstate_guest(v), &runstate, 1); > >>>>>> + v->arch.runstate_guest.page = page; > >>>>>> + v->arch.runstate_guest.offset = offset; > >>>>>> + > >>>>>> + spin_unlock(&v->arch.runstate_guest.lock); > >>>>>> + > >>>>>> + return 0; > >>>>>> +} > >>>>>> + > >>>>>> + > >>>>>> +/* Update per-VCPU guest runstate shared memory area (if registered). 
> >>>>>> */ > >>>>>> +static void update_runstate_area(struct vcpu *v) > >>>>>> +{ > >>>>>> + struct vcpu_runstate_info *guest_runstate; > >>>>>> + void *p; > >>>>>> + > >>>>>> + spin_lock(&v->arch.runstate_guest.lock); > >>>>>> > >>>>>> - if ( guest_handle ) > >>>>>> + if ( v->arch.runstate_guest.page ) > >>>>>> { > >>>>>> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > >>>>>> + p = __map_domain_page(v->arch.runstate_guest.page); > >>>>>> + guest_runstate = p + v->arch.runstate_guest.offset; > >>>>>> + > >>>>>> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > >>>>>> + { > >>>>>> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > >>>>>> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > >>>>> > >>>>> I think that this write to guest_runstate should use write_atomic or > >>>>> another atomic write operation. > >>>> > >>>> I thought about suggesting the same, but guest_copy_* helpers may not > >>>> do a single memory write to state_entry_time. > >>>> What are you trying to prevent with the write_atomic()? > >>> > >>> I am thinking that without using an atomic write, it would be (at least > >>> theoretically) possible for a guest to see a partial write to > >>> state_entry_time, which is not good. > >> > >> It is already the case with existing implementation as Xen may write byte by > >> byte. So are you suggesting the existing code is also buggy? > > > > Writing byte by byte is a different case. That is OK. In that case, the > > guest could see the state after 3 bytes written and it would be fine and > > consistent. If this hadn't been the case, then yes, the existing code > > would also be buggy. 
> > > > So if we did the write with a memcpy, it would be fine, no need for > > atomics: > > > > memcpy(&guest_runstate->state_entry_time, > > &v->runstate.state_entry_time, > > XXX); > > > > > > The |= case is different: GCC could implement it in any way it likes, > > including going through a zero-write to any of the bytes in the word, or > > doing an addition then a subtraction. GCC doesn't make any guarantees. > > If we want guarantees we need to use atomics. > > Wouldn’t that require all accesses to state_entry_time to use also atomic operations ? > In this case we could not propagate the changes to a guest without changing the interface itself. > > As the copy time needs to be protected, the write barriers are there to make sure that during the copy the bit is set and that when we unset it, the copy is done. > I added for this purpose a barrier after the memcpy to make sure that when/if we unset the bit the copy has already been done. As you say, we have a flag to mark a transitioning period: the flag is XEN_RUNSTATE_UPDATE. So, I think it is OK if we don't use atomics during the transitioning period. But we need to make sure to use atomics for the update of the XEN_RUNSTATE_UPDATE flag itself. Does it make sense? Or maybe I misunderstood some of the things you wrote? ^ permalink raw reply [flat|nested] 26+ messages in thread
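The protocol being suggested — single atomic stores for the XEN_RUNSTATE_UPDATE transitions, plain writes fenced by barriers in between — can be sketched as follows. This is an illustrative model with simplified types and helper names, not the actual Xen code; `__atomic_store_n` and `__atomic_thread_fence` stand in for `write_atomic()` and `smp_wmb()`:

```c
#include <stdint.h>
#include <string.h>

#define XEN_RUNSTATE_UPDATE (1ULL << 63)

/* Simplified stand-in for struct vcpu_runstate_info. */
struct runstate_sketch {
    int state;
    uint64_t state_entry_time;
    uint64_t time[4];
};

static void publish_runstate(struct runstate_sketch *guest,
                             const struct runstate_sketch *local)
{
    /* Single atomic store: readers see the flag either set or clear. */
    __atomic_store_n(&guest->state_entry_time,
                     local->state_entry_time | XEN_RUNSTATE_UPDATE,
                     __ATOMIC_RELAXED);
    __atomic_thread_fence(__ATOMIC_RELEASE);          /* smp_wmb() */

    /* Bulk update; no atomicity needed while the flag is set. */
    guest->state = local->state;
    memcpy(guest->time, local->time, sizeof(guest->time));

    __atomic_thread_fence(__ATOMIC_RELEASE);          /* smp_wmb() */
    /* A second single atomic store clears the flag, publishing the update. */
    __atomic_store_n(&guest->state_entry_time,
                     local->state_entry_time, __ATOMIC_RELAXED);
}
```

Only the two stores that toggle the flag need to be atomic; everything between them is protected by the flag itself plus the barriers.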
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-13 0:24 ` Stefano Stabellini @ 2020-06-15 14:09 ` Bertrand Marquis 2020-06-15 20:30 ` Stefano Stabellini 0 siblings, 1 reply; 26+ messages in thread From: Bertrand Marquis @ 2020-06-15 14:09 UTC (permalink / raw) To: Stefano Stabellini Cc: Roger Pau Monné, Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Julien Grall > On 13 Jun 2020, at 01:24, Stefano Stabellini <sstabellini@kernel.org> wrote: > > On Fri, 12 Jun 2020, Bertrand Marquis wrote: >>> On 12 Jun 2020, at 02:09, Stefano Stabellini <sstabellini@kernel.org> wrote: >>> >>> On Thu, 11 Jun 2020, Julien Grall wrote: >>>> Hi Stefano, >>>> >>>> On 11/06/2020 19:50, Stefano Stabellini wrote: >>>>> On Thu, 11 Jun 2020, Julien Grall wrote: >>>>>>>> + return -EINVAL; >>>>>>>> } >>>>>>>> >>>>>>>> - __copy_to_guest(runstate_guest(v), &runstate, 1); >>>>>>>> + v->arch.runstate_guest.page = page; >>>>>>>> + v->arch.runstate_guest.offset = offset; >>>>>>>> + >>>>>>>> + spin_unlock(&v->arch.runstate_guest.lock); >>>>>>>> + >>>>>>>> + return 0; >>>>>>>> +} >>>>>>>> + >>>>>>>> + >>>>>>>> +/* Update per-VCPU guest runstate shared memory area (if registered). 
>>>>>>>> */ >>>>>>>> +static void update_runstate_area(struct vcpu *v) >>>>>>>> +{ >>>>>>>> + struct vcpu_runstate_info *guest_runstate; >>>>>>>> + void *p; >>>>>>>> + >>>>>>>> + spin_lock(&v->arch.runstate_guest.lock); >>>>>>>> >>>>>>>> - if ( guest_handle ) >>>>>>>> + if ( v->arch.runstate_guest.page ) >>>>>>>> { >>>>>>>> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >>>>>>>> + p = __map_domain_page(v->arch.runstate_guest.page); >>>>>>>> + guest_runstate = p + v->arch.runstate_guest.offset; >>>>>>>> + >>>>>>>> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >>>>>>>> + { >>>>>>>> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >>>>>>>> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; >>>>>>> >>>>>>> I think that this write to guest_runstate should use write_atomic or >>>>>>> another atomic write operation. >>>>>> >>>>>> I thought about suggesting the same, but guest_copy_* helpers may not >>>>>> do a single memory write to state_entry_time. >>>>>> What are you trying to prevent with the write_atomic()? >>>>> >>>>> I am thinking that without using an atomic write, it would be (at least >>>>> theoretically) possible for a guest to see a partial write to >>>>> state_entry_time, which is not good. >>>> >>>> It is already the case with existing implementation as Xen may write byte by >>>> byte. So are you suggesting the existing code is also buggy? >>> >>> Writing byte by byte is a different case. That is OK. In that case, the >>> guest could see the state after 3 bytes written and it would be fine and >>> consistent. If this hadn't been the case, then yes, the existing code >>> would also be buggy. 
>>> >>> So if we did the write with a memcpy, it would be fine, no need for >>> atomics: >>> >>> memcpy(&guest_runstate->state_entry_time, >>> &v->runstate.state_entry_time, >>> XXX); >>> >>> >>> The |= case is different: GCC could implement it in any way it likes, >>> including going through a zero-write to any of the bytes in the word, or >>> doing an addition then a subtraction. GCC doesn't make any guarantees. >>> If we want guarantees we need to use atomics. >> >> Wouldn’t that require all accesses to state_entry_time to use also atomic operations ? >> In this case we could not propagate the changes to a guest without changing the interface itself. >> >> As the copy time needs to be protected, the write barriers are there to make sure that during the copy the bit is set and that when we unset it, the copy is done. >> I added for this purpose a barrier after the memcpy to make sure that when/if we unset the bit the copy has already been done. > > As you say, we have a flag to mark a transitiong period, the flag is > XEN_RUNSTATE_UPDATE. So, I think it is OK if we don't use atomics during > the transitioning period. But we need to make sure to use atomics for the > update of the XEN_RUNSTATE_UPDATE flag itself. > > Does it make sense? Or maybe I misunderstood some of the things you > wrote? To achieve this you would do an atomic operation on state_entry_time to set/unset the XEN_RUNSTATE_UPDATE bit. This field holds a flag in the upper bit but also a value, so all operations on state_entry_time would need to be changed to use atomic operations. This state_entry_time might also be accessed by the guest on other cores at the same time (to retrieve the time part). To prevent something being used as both atomic and non-atomic, specific types are usually used (atomic_t), and this structure is also used by guests, so modifying it will not be easy. Or did I misunderstand something here? Regards Bertrand ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-15 14:09 ` Bertrand Marquis @ 2020-06-15 20:30 ` Stefano Stabellini 2020-06-15 20:44 ` Julien Grall 0 siblings, 1 reply; 26+ messages in thread From: Stefano Stabellini @ 2020-06-15 20:30 UTC (permalink / raw) To: Bertrand Marquis Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Roger Pau Monné, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Julien Grall [-- Attachment #1: Type: text/plain, Size: 5231 bytes --] On Mon, 15 Jun 2020, Bertrand Marquis wrote: > > On 13 Jun 2020, at 01:24, Stefano Stabellini <sstabellini@kernel.org> wrote: > > > > On Fri, 12 Jun 2020, Bertrand Marquis wrote: > >>> On 12 Jun 2020, at 02:09, Stefano Stabellini <sstabellini@kernel.org> wrote: > >>> > >>> On Thu, 11 Jun 2020, Julien Grall wrote: > >>>> Hi Stefano, > >>>> > >>>> On 11/06/2020 19:50, Stefano Stabellini wrote: > >>>>> On Thu, 11 Jun 2020, Julien Grall wrote: > >>>>>>>> + return -EINVAL; > >>>>>>>> } > >>>>>>>> > >>>>>>>> - __copy_to_guest(runstate_guest(v), &runstate, 1); > >>>>>>>> + v->arch.runstate_guest.page = page; > >>>>>>>> + v->arch.runstate_guest.offset = offset; > >>>>>>>> + > >>>>>>>> + spin_unlock(&v->arch.runstate_guest.lock); > >>>>>>>> + > >>>>>>>> + return 0; > >>>>>>>> +} > >>>>>>>> + > >>>>>>>> + > >>>>>>>> +/* Update per-VCPU guest runstate shared memory area (if registered). 
> >>>>>>>> */ > >>>>>>>> +static void update_runstate_area(struct vcpu *v) > >>>>>>>> +{ > >>>>>>>> + struct vcpu_runstate_info *guest_runstate; > >>>>>>>> + void *p; > >>>>>>>> + > >>>>>>>> + spin_lock(&v->arch.runstate_guest.lock); > >>>>>>>> > >>>>>>>> - if ( guest_handle ) > >>>>>>>> + if ( v->arch.runstate_guest.page ) > >>>>>>>> { > >>>>>>>> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > >>>>>>>> + p = __map_domain_page(v->arch.runstate_guest.page); > >>>>>>>> + guest_runstate = p + v->arch.runstate_guest.offset; > >>>>>>>> + > >>>>>>>> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > >>>>>>>> + { > >>>>>>>> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > >>>>>>>> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > >>>>>>> > >>>>>>> I think that this write to guest_runstate should use write_atomic or > >>>>>>> another atomic write operation. > >>>>>> > >>>>>> I thought about suggesting the same, but guest_copy_* helpers may not > >>>>>> do a single memory write to state_entry_time. > >>>>>> What are you trying to prevent with the write_atomic()? > >>>>> > >>>>> I am thinking that without using an atomic write, it would be (at least > >>>>> theoretically) possible for a guest to see a partial write to > >>>>> state_entry_time, which is not good. > >>>> > >>>> It is already the case with existing implementation as Xen may write byte by > >>>> byte. So are you suggesting the existing code is also buggy? > >>> > >>> Writing byte by byte is a different case. That is OK. In that case, the > >>> guest could see the state after 3 bytes written and it would be fine and > >>> consistent. If this hadn't been the case, then yes, the existing code > >>> would also be buggy. 
> >>> > >>> So if we did the write with a memcpy, it would be fine, no need for > >>> atomics: > >>> > >>> memcpy(&guest_runstate->state_entry_time, > >>> &v->runstate.state_entry_time, > >>> XXX); > >>> > >>> > >>> The |= case is different: GCC could implement it in any way it likes, > >>> including going through a zero-write to any of the bytes in the word, or > >>> doing an addition then a subtraction. GCC doesn't make any guarantees. > >>> If we want guarantees we need to use atomics. > >> > >> Wouldn’t that require all accesses to state_entry_time to use also atomic operations ? > >> In this case we could not propagate the changes to a guest without changing the interface itself. > >> > >> As the copy time needs to be protected, the write barriers are there to make sure that during the copy the bit is set and that when we unset it, the copy is done. > >> I added for this purpose a barrier after the memcpy to make sure that when/if we unset the bit the copy has already been done. > > > > As you say, we have a flag to mark a transitiong period, the flag is > > XEN_RUNSTATE_UPDATE. So, I think it is OK if we don't use atomics during > > the transitioning period. But we need to make sure to use atomics for the > > update of the XEN_RUNSTATE_UPDATE flag itself. > > > > Does it make sense? Or maybe I misunderstood some of the things you > > wrote? > > To achieve this you would do an atomic operation on state_entry_time to set/unset the XEN_RUNSTATE_UPDATE bit. > This field is holding a flag in the upper bit but also a value, so all operations on state_entry_time would need to be changed to use atomic operations. I don't think that all operations on state_entry_time need to be atomic. Only the bit write to XEN_RUNSTATE_UPDATE. More on this below. > Also this state_entry_time might also be accessed by the guest on other cores at the same time (to retrieve the time part). Yes but they are all just readers, right? 
> To prevent something being used as atomic and non atomic, specific types are usually used (atomic_t) and this structure is also used by guests so modifying it will not be easy. > > Or did I missunderstood something here ? I was not suggesting to use an atomic_t type. I was only suggesting to use an atomic operation, i.e. calling write_u32_atomic directly (or something like that). I would not change the type of state_entry_time. Also, using memcpy would be acceptable because we only need to update one byte. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-15 20:30 ` Stefano Stabellini @ 2020-06-15 20:44 ` Julien Grall 0 siblings, 0 replies; 26+ messages in thread From: Julien Grall @ 2020-06-15 20:44 UTC (permalink / raw) To: Stefano Stabellini Cc: Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Bertrand Marquis, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné On Mon, 15 Jun 2020 at 21:30, Stefano Stabellini <sstabellini@kernel.org> wrote: > > On Mon, 15 Jun 2020, Bertrand Marquis wrote: > > > On 13 Jun 2020, at 01:24, Stefano Stabellini <sstabellini@kernel.org> wrote: > > > > > > On Fri, 12 Jun 2020, Bertrand Marquis wrote: > > >>> On 12 Jun 2020, at 02:09, Stefano Stabellini <sstabellini@kernel.org> wrote: > > >>> > > >>> On Thu, 11 Jun 2020, Julien Grall wrote: > > >>>> Hi Stefano, > > >>>> > > >>>> On 11/06/2020 19:50, Stefano Stabellini wrote: > > >>>>> On Thu, 11 Jun 2020, Julien Grall wrote: > > >>>>>>>> + return -EINVAL; > > >>>>>>>> } > > >>>>>>>> > > >>>>>>>> - __copy_to_guest(runstate_guest(v), &runstate, 1); > > >>>>>>>> + v->arch.runstate_guest.page = page; > > >>>>>>>> + v->arch.runstate_guest.offset = offset; > > >>>>>>>> + > > >>>>>>>> + spin_unlock(&v->arch.runstate_guest.lock); > > >>>>>>>> + > > >>>>>>>> + return 0; > > >>>>>>>> +} > > >>>>>>>> + > > >>>>>>>> + > > >>>>>>>> +/* Update per-VCPU guest runstate shared memory area (if registered). 
> > >>>>>>>> */ > > >>>>>>>> +static void update_runstate_area(struct vcpu *v) > > >>>>>>>> +{ > > >>>>>>>> + struct vcpu_runstate_info *guest_runstate; > > >>>>>>>> + void *p; > > >>>>>>>> + > > >>>>>>>> + spin_lock(&v->arch.runstate_guest.lock); > > >>>>>>>> > > >>>>>>>> - if ( guest_handle ) > > >>>>>>>> + if ( v->arch.runstate_guest.page ) > > >>>>>>>> { > > >>>>>>>> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > > >>>>>>>> + p = __map_domain_page(v->arch.runstate_guest.page); > > >>>>>>>> + guest_runstate = p + v->arch.runstate_guest.offset; > > >>>>>>>> + > > >>>>>>>> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > > >>>>>>>> + { > > >>>>>>>> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > > >>>>>>>> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > > >>>>>>> > > >>>>>>> I think that this write to guest_runstate should use write_atomic or > > >>>>>>> another atomic write operation. > > >>>>>> > > >>>>>> I thought about suggesting the same, but guest_copy_* helpers may not > > >>>>>> do a single memory write to state_entry_time. > > >>>>>> What are you trying to prevent with the write_atomic()? > > >>>>> > > >>>>> I am thinking that without using an atomic write, it would be (at least > > >>>>> theoretically) possible for a guest to see a partial write to > > >>>>> state_entry_time, which is not good. > > >>>> > > >>>> It is already the case with existing implementation as Xen may write byte by > > >>>> byte. So are you suggesting the existing code is also buggy? > > >>> > > >>> Writing byte by byte is a different case. That is OK. In that case, the > > >>> guest could see the state after 3 bytes written and it would be fine and > > >>> consistent. If this hadn't been the case, then yes, the existing code > > >>> would also be buggy. 
> > >>> > > >>> So if we did the write with a memcpy, it would be fine, no need for > > >>> atomics: > > >>> > > >>> memcpy(&guest_runstate->state_entry_time, > > >>> &v->runstate.state_entry_time, > > >>> XXX); > > >>> > > >>> > > >>> The |= case is different: GCC could implement it in any way it likes, > > >>> including going through a zero-write to any of the bytes in the word, or > > >>> doing an addition then a subtraction. GCC doesn't make any guarantees. > > >>> If we want guarantees we need to use atomics. > > >> > > >> Wouldn’t that require all accesses to state_entry_time to use also atomic operations ? > > >> In this case we could not propagate the changes to a guest without changing the interface itself. > > >> > > >> As the copy time needs to be protected, the write barriers are there to make sure that during the copy the bit is set and that when we unset it, the copy is done. > > >> I added for this purpose a barrier after the memcpy to make sure that when/if we unset the bit the copy has already been done. > > > > > > As you say, we have a flag to mark a transitiong period, the flag is > > > XEN_RUNSTATE_UPDATE. So, I think it is OK if we don't use atomics during > > > the transitioning period. But we need to make sure to use atomics for the > > > update of the XEN_RUNSTATE_UPDATE flag itself. > > > > > > Does it make sense? Or maybe I misunderstood some of the things you > > > wrote? > > > > To achieve this you would do an atomic operation on state_entry_time to set/unset the XEN_RUNSTATE_UPDATE bit. > > This field is holding a flag in the upper bit but also a value, so all operations on state_entry_time would need to be changed to use atomic operations. > > I don't think that all operations on state_entry_time need to be atomic. > Only the bit write to XEN_RUNSTATE_UPDATE. More on this below. > > > > Also this state_entry_time might also be accessed by the guest on other cores at the same time (to retrieve the time part). 
> > Yes but they are all just readers, right? This is only one part of the equation. The second part is that readers *must not* process the rest of the content while XEN_RUNSTATE_UPDATE is set. > > I was not suggesting to use an atomic_t type. I was only suggesting to > use an atomic operation, i.e. calling write_u32_atomic directly (or > something like that.) I would not change the type of state_entry_time. > Also using memcpy would be acceptable due to the fact that we only need > to update one byte. Please don't use memcpy. It is wrong to assume it performs an atomic copy (even if it may happen to be correct for a single byte). At the same time, let's avoid introducing more atomic bit operations on guest memory that can be read-write (see XSA-295). In this case a write_atomic() is sufficient, as it would do a single write for sizes up to 64 bits. Cheers, ^ permalink raw reply [flat|nested] 26+ messages in thread
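The reader-side half of that contract — what a guest is expected to do while the flag may be set — can be sketched like this. This is an illustrative seqlock-style retry loop under the same simplified model as above, not any specific guest code:

```c
#include <stdint.h>
#include <string.h>

#define XEN_RUNSTATE_UPDATE (1ULL << 63)

/* Simplified stand-in for struct vcpu_runstate_info. */
struct runstate_sketch {
    int state;
    uint64_t state_entry_time;
    uint64_t time[4];
};

/*
 * Retry until a snapshot is taken with the update flag clear and the
 * entry time unchanged across the copy: a reader must not consume the
 * contents while XEN_RUNSTATE_UPDATE is set.
 */
static void read_runstate(const struct runstate_sketch *shared,
                          struct runstate_sketch *out)
{
    uint64_t t1, t2;

    do {
        t1 = __atomic_load_n(&shared->state_entry_time, __ATOMIC_ACQUIRE);
        memcpy(out, shared, sizeof(*out));
        __atomic_thread_fence(__ATOMIC_ACQUIRE);      /* smp_rmb() */
        t2 = __atomic_load_n(&shared->state_entry_time, __ATOMIC_ACQUIRE);
    } while ( (t1 & XEN_RUNSTATE_UPDATE) || t1 != t2 );
}
```

With no writer in flight, the loop exits on the first pass; a concurrent update forces a retry rather than letting the reader see a torn snapshot.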
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-12 1:09 ` Stefano Stabellini 2020-06-12 8:13 ` Bertrand Marquis @ 2020-06-12 9:53 ` Julien Grall 2020-06-13 0:24 ` Stefano Stabellini 1 sibling, 1 reply; 26+ messages in thread From: Julien Grall @ 2020-06-12 9:53 UTC (permalink / raw) To: Stefano Stabellini Cc: Roger Pau Monné, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Bertrand Marquis, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Julien Grall On 12/06/2020 02:09, Stefano Stabellini wrote: > On Thu, 11 Jun 2020, Julien Grall wrote: >> Hi Stefano, >> >> On 11/06/2020 19:50, Stefano Stabellini wrote: >>> On Thu, 11 Jun 2020, Julien Grall wrote: >>>>>> + return -EINVAL; >>>>>> } >>>>>> >>>>>> - __copy_to_guest(runstate_guest(v), &runstate, 1); >>>>>> + v->arch.runstate_guest.page = page; >>>>>> + v->arch.runstate_guest.offset = offset; >>>>>> + >>>>>> + spin_unlock(&v->arch.runstate_guest.lock); >>>>>> + >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> + >>>>>> +/* Update per-VCPU guest runstate shared memory area (if registered). >>>>>> */ >>>>>> +static void update_runstate_area(struct vcpu *v) >>>>>> +{ >>>>>> + struct vcpu_runstate_info *guest_runstate; >>>>>> + void *p; >>>>>> + >>>>>> + spin_lock(&v->arch.runstate_guest.lock); >>>>>> >>>>>> - if ( guest_handle ) >>>>>> + if ( v->arch.runstate_guest.page ) >>>>>> { >>>>>> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >>>>>> + p = __map_domain_page(v->arch.runstate_guest.page); >>>>>> + guest_runstate = p + v->arch.runstate_guest.offset; >>>>>> + >>>>>> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >>>>>> + { >>>>>> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >>>>>> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; >>>>> >>>>> I think that this write to guest_runstate should use write_atomic or >>>>> another atomic write operation. 
>>>> >>>> I thought about suggesting the same, but guest_copy_* helpers may not >>>> do a single memory write to state_entry_time. >>>> What are you trying to prevent with the write_atomic()? >>> >>> I am thinking that without using an atomic write, it would be (at least >>> theoretically) possible for a guest to see a partial write to >>> state_entry_time, which is not good. >> >> It is already the case with existing implementation as Xen may write byte by >> byte. So are you suggesting the existing code is also buggy? It looks like I may have misread the code as we only copy one byte. But I still think this is fragile. For this context, I agree that a write_atomic() should do the job. However, I still want to answer to your comments below. > > Writing byte by byte is a different case. That is OK. In that case, the > guest could see the state after 3 bytes written and it would be fine and > consistent. Why? What does actually prevent a guest to see an in-between value? To give a concrete example, if the original value is 0xabc and you want to write 0xdef. Why would the guest never see 0xabf or 0xaec? > If this hadn't been the case, then yes, the existing code > would also be buggy. > > So if we did the write with a memcpy, it would be fine, no need for > atomics: > > memcpy(&guest_runstate->state_entry_time, > &v->runstate.state_entry_time, > XXX); > > > The |= case is different: GCC could implement it in any way it likes, > including going through a zero-write to any of the bytes in the word, or > doing an addition then a subtraction. GCC doesn't make any guarantees. > If we want guarantees we need to use atomics. Yes GCC can generate assembly for |= in any way it likes. But so does for memcpy(). So I still don't understand why one would be fine for you and not the other... Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-12 9:53 ` Julien Grall @ 2020-06-13 0:24 ` Stefano Stabellini 0 siblings, 0 replies; 26+ messages in thread From: Stefano Stabellini @ 2020-06-13 0:24 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Bertrand Marquis, Roger Pau Monné, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Julien Grall

On Fri, 12 Jun 2020, Julien Grall wrote:
> > Writing byte by byte is a different case. That is OK. In that case, the
> > guest could see the state after 3 bytes written and it would be fine and
> > consistent.
>
> Why? What does actually prevent a guest to see an in-between value?
>
> To give a concrete example, if the original value is 0xabc and you want to
> write 0xdef. Why would the guest never see 0xabf or 0xaec?

That is a good question, but the minimum granularity is 1 byte. So if:

  current: 0xaabbcc
  new:     0xddeeff

Can the guest see one of the following?

  0xaabbff
  0xaaeecc

Yes, it can. I don't think it is a problem in this case because we only change 1 byte, so to continue with the example:

  current: 0xaabbcc
  new:     0xffbbcc

So the only values the VM can see are either 0xaabbcc or 0xffbbcc.

> > If this hadn't been the case, then yes, the existing code
> > would also be buggy.
> >
> > So if we did the write with a memcpy, it would be fine, no need for
> > atomics:
> >
> >     memcpy(&guest_runstate->state_entry_time,
> >            &v->runstate.state_entry_time,
> >            XXX);
> >
> > The |= case is different: GCC could implement it in any way it likes,
> > including going through a zero-write to any of the bytes in the word, or
> > doing an addition then a subtraction. GCC doesn't make any guarantees.
> > If we want guarantees we need to use atomics.
It is down to the implementation of memcpy to guarantee that the only thing memcpy does is memory copies. memcpy doesn't specify whether it is going to use byte-copies or word-copies, but it should only do copies. If we have to write memcpy in assembly to make it so, so be it :-) ^ permalink raw reply [flat|nested] 26+ messages in thread
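The byte-granularity argument above can be checked mechanically: model a copy that lands one byte at a time and enumerate every state a concurrent reader could observe. The helpers below are invented for this standalone illustration (they are not Xen code); the values reuse the 0xaabbcc/0xddeeff example from the thread.

```c
#include <stdint.h>

/* What a reader may observe after `bytes_written` of the 8 bytes of a
 * 64-bit field have been copied: a mix of new (already written) and
 * old (not yet written) bytes of the in-memory representation. */
uint64_t partial_copy(uint64_t oldval, uint64_t newval, unsigned bytes_written)
{
    uint64_t out = oldval;
    unsigned char *dst = (unsigned char *)&out;
    const unsigned char *src = (const unsigned char *)&newval;
    unsigned i;

    for ( i = 0; i < bytes_written && i < sizeof(out); i++ )
        dst[i] = src[i];
    return out;
}

/* Returns 1 if some intermediate mix differs from both the old and the
 * new value, i.e. a torn value is observable by the reader. */
int torn_value_observable(uint64_t oldval, uint64_t newval)
{
    unsigned k;

    for ( k = 1; k < sizeof(uint64_t); k++ )
    {
        uint64_t seen = partial_copy(oldval, newval, k);

        if ( seen != oldval && seen != newval )
            return 1;
    }
    return 0;
}
```

When old and new differ in several bytes (0xaabbcc vs 0xddeeff), mixes such as 0xaabbff are observable; when they differ in exactly one byte — the XEN_RUNSTATE_UPDATE case — every mix collapses to either the old or the new value, which is the safety property being argued for here.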
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-11 18:16 ` Stefano Stabellini 2020-06-11 18:24 ` Julien Grall @ 2020-06-12 8:07 ` Bertrand Marquis 1 sibling, 0 replies; 26+ messages in thread From: Bertrand Marquis @ 2020-06-12 8:07 UTC (permalink / raw) To: Stefano Stabellini Cc: Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné > On 11 Jun 2020, at 19:16, Stefano Stabellini <sstabellini@kernel.org> wrote: > > On Thu, 11 Jun 2020, Bertrand Marquis wrote: >> At the moment on Arm, a Linux guest running with KTPI enabled will >> cause the following error when a context switch happens in user mode: >> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0 >> >> This patch is modifying the register runstate area handling on arm to >> convert the guest address during the hypercall. During runtime context >> switches the area is mapped to update the guest runstate copy. >> The patch is also removing the initial copy which was done during the >> hypercall handling as this is done on the current core when the context >> is restore to go back to guest execution on arm. >> >> As a consequence if the register runstate hypercall is called on one >> vcpu for an other vcpu, the area will not be updated until the vcpu >> will be executed (on arm only). >> >> On x86, the behaviour is not modified, the address conversion is done >> during the context switch and the area is updated fully during the >> hypercall. >> inline functions in headers could not be used as the architecture >> domain.h is included before the global domain.h making it impossible >> to use the struct vcpu inside the architecture header. >> This should not have any performance impact as the hypercall is only >> called once per vcpu usually. 
>> >> Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> >> --- >> xen/arch/arm/domain.c | 109 +++++++++++++++++++++++++++++------ >> xen/arch/x86/domain.c | 30 +++++++++- >> xen/arch/x86/x86_64/domain.c | 4 +- >> xen/common/domain.c | 19 ++---- >> xen/include/asm-arm/domain.h | 8 +++ >> xen/include/asm-x86/domain.h | 16 +++++ >> xen/include/xen/domain.h | 4 ++ >> xen/include/xen/sched.h | 16 +---- >> 8 files changed, 153 insertions(+), 53 deletions(-) >> >> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c >> index 31169326b2..739059234f 100644 >> --- a/xen/arch/arm/domain.c >> +++ b/xen/arch/arm/domain.c >> @@ -19,6 +19,7 @@ >> #include <xen/sched.h> >> #include <xen/softirq.h> >> #include <xen/wait.h> >> +#include <xen/domain_page.h> >> >> #include <asm/alternative.h> >> #include <asm/cpuerrata.h> >> @@ -275,36 +276,104 @@ static void ctxt_switch_to(struct vcpu *n) >> virt_timer_restore(n); >> } >> >> -/* Update per-VCPU guest runstate shared memory area (if registered). 
*/ >> -static void update_runstate_area(struct vcpu *v) >> +void arch_cleanup_runstate_guest(struct vcpu *v) >> { >> - void __user *guest_handle = NULL; >> - struct vcpu_runstate_info runstate; >> + spin_lock(&v->arch.runstate_guest.lock); >> >> - if ( guest_handle_is_null(runstate_guest(v)) ) >> - return; >> + /* cleanup previous page if any */ >> + if ( v->arch.runstate_guest.page ) >> + { >> + put_page_and_type(v->arch.runstate_guest.page); >> + v->arch.runstate_guest.page = NULL; >> + v->arch.runstate_guest.offset = 0; >> + } >> >> - memcpy(&runstate, &v->runstate, sizeof(runstate)); >> + spin_unlock(&v->arch.runstate_guest.lock); >> +} >> >> - if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> +int arch_setup_runstate_guest(struct vcpu *v, >> + struct vcpu_register_runstate_memory_area area) >> +{ >> + struct page_info *page; >> + unsigned offset; >> + >> + spin_lock(&v->arch.runstate_guest.lock); >> + >> + /* cleanup previous page if any */ >> + if ( v->arch.runstate_guest.page ) >> { >> - guest_handle = &v->runstate_guest.p->state_entry_time + 1; >> - guest_handle--; >> - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >> - __raw_copy_to_guest(guest_handle, >> - (void *)(&runstate.state_entry_time + 1) - 1, 1); >> - smp_wmb(); >> + put_page_and_type(v->arch.runstate_guest.page); >> + v->arch.runstate_guest.page = NULL; >> + v->arch.runstate_guest.offset = 0; >> + } >> + >> + offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; >> + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) >> + { >> + spin_unlock(&v->arch.runstate_guest.lock); >> + gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); >> + return -EINVAL; >> + } >> + >> + /* provided address must be aligned to a 64bit */ >> + if ( offset % alignof(struct vcpu_runstate_info) ) >> + { >> + spin_unlock(&v->arch.runstate_guest.lock); >> + gprintk(XENLOG_DEBUG, "Runstate pointer is not aligned\n"); >> + return -EINVAL; >> + } >> + >> + page = get_page_from_gva(v, (vaddr_t)area.addr.v, 
GV2M_WRITE); >> + if ( !page ) >> + { >> + spin_unlock(&v->arch.runstate_guest.lock); >> + gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); > > I would make all these XENLOG_WARNING or XENLOG_ERR, they are errors > after all. This last one I'd say "Could not map runstate pointer" and also > print the address. Ok, I will fix it to Warning and change the message like this. > > >> + return -EINVAL; >> } >> >> - __copy_to_guest(runstate_guest(v), &runstate, 1); >> + v->arch.runstate_guest.page = page; >> + v->arch.runstate_guest.offset = offset; >> + >> + spin_unlock(&v->arch.runstate_guest.lock); >> + >> + return 0; >> +} >> + >> + >> +/* Update per-VCPU guest runstate shared memory area (if registered). */ >> +static void update_runstate_area(struct vcpu *v) >> +{ >> + struct vcpu_runstate_info *guest_runstate; >> + void *p; >> + >> + spin_lock(&v->arch.runstate_guest.lock); >> >> - if ( guest_handle ) >> + if ( v->arch.runstate_guest.page ) >> { >> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >> + p = __map_domain_page(v->arch.runstate_guest.page); >> + guest_runstate = p + v->arch.runstate_guest.offset; >> + >> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> + { >> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > > I think that this write to guest_runstate should use write_atomic or > another atomic write operation. I will answer at the end of the discussion on the subject. 
> > >> + smp_wmb(); >> + } >> + >> + memcpy(guest_runstate, &v->runstate, sizeof(v->runstate)); >> smp_wmb(); >> - __raw_copy_to_guest(guest_handle, >> - (void *)(&runstate.state_entry_time + 1) - 1, 1); >> + >> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> + { >> + v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >> + guest_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; > > same here > > >> + smp_wmb(); >> + } >> + >> + unmap_domain_page(p); >> } >> + >> + spin_unlock(&v->arch.runstate_guest.lock); >> } >> >> static void schedule_tail(struct vcpu *prev) >> @@ -560,6 +629,8 @@ int arch_vcpu_create(struct vcpu *v) >> v->arch.saved_context.sp = (register_t)v->arch.cpu_info; >> v->arch.saved_context.pc = (register_t)continue_new_vcpu; >> >> + spin_lock_init(&v->arch.runstate_guest.lock); >> + >> /* Idle VCPUs don't need the rest of this setup */ >> if ( is_idle_vcpu(v) ) >> return rc; >> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c >> index fee6c3931a..32bbed87d5 100644 >> --- a/xen/arch/x86/domain.c >> +++ b/xen/arch/x86/domain.c >> @@ -1642,6 +1642,30 @@ void paravirt_ctxt_switch_to(struct vcpu *v) >> wrmsr_tsc_aux(v->arch.msrs->tsc_aux); >> } >> >> +int arch_setup_runstate_guest(struct vcpu *v, >> + struct vcpu_register_runstate_memory_area area) >> +{ >> + struct vcpu_runstate_info runstate; >> + >> + runstate_guest(v) = area.addr.h; >> + >> + if ( v == current ) >> + { >> + __copy_to_guest(runstate_guest(v), &v->runstate, 1); >> + } >> + else >> + { >> + vcpu_runstate_get(v, &runstate); >> + __copy_to_guest(runstate_guest(v), &runstate, 1); >> + } >> + return 0; >> +} >> + >> +void arch_cleanup_runstate_guest(struct vcpu *v) >> +{ >> + set_xen_guest_handle(runstate_guest(v), NULL); >> +} >> + >> /* Update per-VCPU guest runstate shared memory area (if registered). 
*/ >> bool update_runstate_area(struct vcpu *v) >> { >> @@ -1660,8 +1684,8 @@ bool update_runstate_area(struct vcpu *v) >> if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> { >> guest_handle = has_32bit_shinfo(v->domain) >> - ? &v->runstate_guest.compat.p->state_entry_time + 1 >> - : &v->runstate_guest.native.p->state_entry_time + 1; >> + ? &v->arch.runstate_guest.compat.p->state_entry_time + 1 >> + : &v->arch.runstate_guest.native.p->state_entry_time + 1; >> guest_handle--; >> runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >> __raw_copy_to_guest(guest_handle, >> @@ -1674,7 +1698,7 @@ bool update_runstate_area(struct vcpu *v) >> struct compat_vcpu_runstate_info info; >> >> XLAT_vcpu_runstate_info(&info, &runstate); >> - __copy_to_guest(v->runstate_guest.compat, &info, 1); >> + __copy_to_guest(v->arch.runstate_guest.compat, &info, 1); >> rc = true; >> } >> else >> diff --git a/xen/arch/x86/x86_64/domain.c b/xen/arch/x86/x86_64/domain.c >> index c46dccc25a..b879e8dd2c 100644 >> --- a/xen/arch/x86/x86_64/domain.c >> +++ b/xen/arch/x86/x86_64/domain.c >> @@ -36,7 +36,7 @@ arch_compat_vcpu_op( >> break; >> >> rc = 0; >> - guest_from_compat_handle(v->runstate_guest.compat, area.addr.h); >> + guest_from_compat_handle(v->arch.runstate_guest.compat, area.addr.h); >> >> if ( v == current ) >> { >> @@ -49,7 +49,7 @@ arch_compat_vcpu_op( >> vcpu_runstate_get(v, &runstate); >> XLAT_vcpu_runstate_info(&info, &runstate); >> } >> - __copy_to_guest(v->runstate_guest.compat, &info, 1); >> + __copy_to_guest(v->arch.runstate_guest.compat, &info, 1); >> >> break; >> } >> diff --git a/xen/common/domain.c b/xen/common/domain.c >> index 7cc9526139..0ca6bf4dbc 100644 >> --- a/xen/common/domain.c >> +++ b/xen/common/domain.c >> @@ -727,7 +727,10 @@ int domain_kill(struct domain *d) >> if ( cpupool_move_domain(d, cpupool0) ) >> return -ERESTART; >> for_each_vcpu ( d, v ) >> + { >> + arch_cleanup_runstate_guest(v); >> unmap_vcpu_info(v); >> + } >> d->is_dying = DOMDYING_dead; >> /* Mem 
event cleanup has to go here because the rings >> * have to be put before we call put_domain. */ >> @@ -1167,7 +1170,7 @@ int domain_soft_reset(struct domain *d) >> >> for_each_vcpu ( d, v ) >> { >> - set_xen_guest_handle(runstate_guest(v), NULL); >> + arch_cleanup_runstate_guest(v); >> unmap_vcpu_info(v); >> } >> >> @@ -1484,7 +1487,6 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg) >> case VCPUOP_register_runstate_memory_area: >> { >> struct vcpu_register_runstate_memory_area area; >> - struct vcpu_runstate_info runstate; >> >> rc = -EFAULT; >> if ( copy_from_guest(&area, arg, 1) ) >> @@ -1493,18 +1495,7 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg) >> if ( !guest_handle_okay(area.addr.h, 1) ) >> break; >> >> - rc = 0; >> - runstate_guest(v) = area.addr.h; >> - >> - if ( v == current ) >> - { >> - __copy_to_guest(runstate_guest(v), &v->runstate, 1); >> - } >> - else >> - { >> - vcpu_runstate_get(v, &runstate); >> - __copy_to_guest(runstate_guest(v), &runstate, 1); >> - } >> + rc = arch_setup_runstate_guest(v, area); >> >> break; >> } >> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h >> index 4e2f582006..3a7f53e13d 100644 >> --- a/xen/include/asm-arm/domain.h >> +++ b/xen/include/asm-arm/domain.h >> @@ -11,6 +11,7 @@ >> #include <asm/vgic.h> >> #include <asm/vpl011.h> >> #include <public/hvm/params.h> >> +#include <public/vcpu.h> >> #include <xen/serial.h> >> #include <xen/rbtree.h> >> >> @@ -206,6 +207,13 @@ struct arch_vcpu >> */ >> bool need_flush_to_ram; >> >> + /* runstate guest info */ >> + struct { >> + struct page_info *page; /* guest page */ >> + unsigned offset; /* offset in page */ >> + spinlock_t lock; /* lock to access page */ >> + } runstate_guest; >> + >> } __cacheline_aligned; >> >> void vcpu_show_execution_state(struct vcpu *); >> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h >> index e8cee3d5e5..f917b48cb8 100644 >> --- 
a/xen/include/asm-x86/domain.h >> +++ b/xen/include/asm-x86/domain.h >> @@ -11,6 +11,11 @@ >> #include <asm/x86_emulate.h> >> #include <public/vcpu.h> >> #include <public/hvm/hvm_info_table.h> >> +#ifdef CONFIG_COMPAT >> +#include <compat/vcpu.h> >> +DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t); >> +#endif >> + >> >> #define has_32bit_shinfo(d) ((d)->arch.has_32bit_shinfo) >> >> @@ -626,6 +631,17 @@ struct arch_vcpu >> struct { >> bool next_interrupt_enabled; >> } monitor; >> + >> +#ifndef CONFIG_COMPAT >> +# define runstate_guest(v) ((v)->arch.runstate_guest) >> + XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */ >> +#else >> +# define runstate_guest(v) ((v)->arch.runstate_guest.native) >> + union { >> + XEN_GUEST_HANDLE(vcpu_runstate_info_t) native; >> + XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat; >> + } runstate_guest; >> +#endif >> }; >> >> struct guest_memory_policy >> diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h >> index 7e51d361de..d5d73ce997 100644 >> --- a/xen/include/xen/domain.h >> +++ b/xen/include/xen/domain.h >> @@ -63,6 +63,10 @@ void arch_vcpu_destroy(struct vcpu *v); >> int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset); >> void unmap_vcpu_info(struct vcpu *v); >> >> +int arch_setup_runstate_guest(struct vcpu *v, >> + struct vcpu_register_runstate_memory_area area); > > NIT: code style, the indentation is off Ok > > >> +void arch_cleanup_runstate_guest(struct vcpu *v); >> + >> int arch_domain_create(struct domain *d, >> struct xen_domctl_createdomain *config); >> >> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h >> index ac53519d7f..fac030fb83 100644 >> --- a/xen/include/xen/sched.h >> +++ b/xen/include/xen/sched.h >> @@ -29,11 +29,6 @@ >> #include <public/vcpu.h> >> #include <public/event_channel.h> >> >> -#ifdef CONFIG_COMPAT >> -#include <compat/vcpu.h> >> -DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t); >> -#endif >> - >> /* >> * Stats >> * 
>> @@ -166,16 +161,7 @@ struct vcpu >> struct sched_unit *sched_unit; >> >> struct vcpu_runstate_info runstate; >> -#ifndef CONFIG_COMPAT >> -# define runstate_guest(v) ((v)->runstate_guest) >> - XEN_GUEST_HANDLE(vcpu_runstate_info_t) runstate_guest; /* guest address */ >> -#else >> -# define runstate_guest(v) ((v)->runstate_guest.native) >> - union { >> - XEN_GUEST_HANDLE(vcpu_runstate_info_t) native; >> - XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat; >> - } runstate_guest; /* guest address */ >> -#endif >> + >> unsigned int new_state; >> >> /* Has the FPU been initialised? */ >> -- >> 2.17.1 ^ permalink raw reply [flat|nested] 26+ messages in thread
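For context on why the flag/barrier ordering in the update_runstate_area() hunk above matters: the consumer is expected to treat XEN_RUNSTATE_UPDATE as a retry signal, re-reading the area while an update is in flight. Below is a rough single-file model of such a reader loop; the struct and function names are invented for the sketch (this is not the Linux implementation, which additionally uses proper barriers where the comments indicate).

```c
#include <stdint.h>

#define XEN_RUNSTATE_UPDATE (UINT64_C(1) << 63)

/* Simplified stand-in for struct vcpu_runstate_info. */
struct runstate_model {
    int      state;
    uint64_t state_entry_time;
    uint64_t time[4];
};

/* Snapshot the shared area, retrying while the hypervisor signals an
 * in-progress update (flag set) or state_entry_time changed mid-copy. */
void runstate_snapshot(const volatile struct runstate_model *shared,
                       struct runstate_model *out)
{
    uint64_t t;

    do {
        t = shared->state_entry_time;
        /* smp_rmb() here in real code */
        out->state = shared->state;
        out->time[0] = shared->time[0];
        out->time[1] = shared->time[1];
        out->time[2] = shared->time[2];
        out->time[3] = shared->time[3];
        /* smp_rmb() here in real code */
        out->state_entry_time = shared->state_entry_time;
    } while ( (t & XEN_RUNSTATE_UPDATE) ||
              (out->state_entry_time & XEN_RUNSTATE_UPDATE) ||
              out->state_entry_time != t );
}

/* Single-threaded demo: with no update in flight the snapshot matches. */
int snapshot_demo(void)
{
    struct runstate_model shared = {
        .state = 1, .state_entry_time = 12345, .time = { 1, 2, 3, 4 },
    };
    struct runstate_model snap;

    runstate_snapshot(&shared, &snap);
    return snap.state == 1 && snap.state_entry_time == 12345 &&
           snap.time[3] == 4;
}
```

This is why the hypervisor side must set the flag, barrier, copy the payload, barrier, then clear the flag: the reader relies on that ordering to detect a concurrent update.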
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-11 11:58 ` [PATCH 1/2] " Bertrand Marquis 2020-06-11 18:16 ` Stefano Stabellini @ 2020-06-12 10:53 ` Julien Grall 2020-06-12 14:13 ` Bertrand Marquis 2020-06-12 16:51 ` Bertrand Marquis 1 sibling, 2 replies; 26+ messages in thread From: Julien Grall @ 2020-06-12 10:53 UTC (permalink / raw) To: Bertrand Marquis, xen-devel Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, nd, Volodymyr Babchuk, Roger Pau Monné Hi Bertrand, On 11/06/2020 12:58, Bertrand Marquis wrote: > At the moment on Arm, a Linux guest running with KTPI enabled will > cause the following error when a context switch happens in user mode: > (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0 I think you want to add a bit more context explaining the reason for the failure. I.e. this is because the virtual address used by the runstate is only accessible when running in kernel space. > > This patch is modifying the register runstate area handling on arm to > convert the guest address during the hypercall. During runtime context > switches the area is mapped to update the guest runstate copy. > The patch is also removing the initial copy which was done during the > hypercall handling as this is done on the current core when the context > is restore to go back to guest execution on arm. This explains what the commit does but not why we want to translate the virtual address at hypercall time. More importantly, this doesn't explain the restrictions added on the hypercall and why they are fine. Note that they should also be documented in the public header. > > As a consequence if the register runstate hypercall is called on one > vcpu for an other vcpu, the area will not be updated until the vcpu > will be executed (on arm only). The code below suggests the hypercall will just fail if you call it from a different vCPU. Is that correct? 
> > On x86, the behaviour is not modified, the address conversion is done > during the context switch and the area is updated fully during the > hypercall. > inline functions in headers could not be used as the architecture > domain.h is included before the global domain.h making it impossible > to use the struct vcpu inside the architecture header. > This should not have any performance impact as the hypercall is only > called once per vcpu usually. > > Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> > --- > xen/arch/arm/domain.c | 109 +++++++++++++++++++++++++++++------ > xen/arch/x86/domain.c | 30 +++++++++- > xen/arch/x86/x86_64/domain.c | 4 +- > xen/common/domain.c | 19 ++---- > xen/include/asm-arm/domain.h | 8 +++ > xen/include/asm-x86/domain.h | 16 +++++ > xen/include/xen/domain.h | 4 ++ > xen/include/xen/sched.h | 16 +---- > 8 files changed, 153 insertions(+), 53 deletions(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index 31169326b2..739059234f 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -19,6 +19,7 @@ > #include <xen/sched.h> > #include <xen/softirq.h> > #include <xen/wait.h> > +#include <xen/domain_page.h> > > #include <asm/alternative.h> > #include <asm/cpuerrata.h> > @@ -275,36 +276,104 @@ static void ctxt_switch_to(struct vcpu *n) > virt_timer_restore(n); > } > > -/* Update per-VCPU guest runstate shared memory area (if registered). */ > -static void update_runstate_area(struct vcpu *v) > +void arch_cleanup_runstate_guest(struct vcpu *v) I would prefer if this is name arch_vcpu_cleanup_runstate() as this is per-vCPU and not per-domain information. 
> { > - void __user *guest_handle = NULL; > - struct vcpu_runstate_info runstate; > + spin_lock(&v->arch.runstate_guest.lock); > > - if ( guest_handle_is_null(runstate_guest(v)) ) > - return; > + /* cleanup previous page if any */ > + if ( v->arch.runstate_guest.page ) > + { > + put_page_and_type(v->arch.runstate_guest.page); get_page_from_gva() is only grabbing a reference on the page. So you want to use put_page() here. Note that we don't have type reference on Arm, so it is equivalent to put_page(). But this wouldn't be technically correct :). > + v->arch.runstate_guest.page = NULL; > + v->arch.runstate_guest.offset = 0; > + } > > - memcpy(&runstate, &v->runstate, sizeof(runstate)); > + spin_unlock(&v->arch.runstate_guest.lock); > +} > > - if ( VM_ASSIST(v->domain, runstate_update_flag) ) > +int arch_setup_runstate_guest(struct vcpu *v, > + struct vcpu_register_runstate_memory_area area) The indentation looks off here. Also, same remark for the naming. > +{ > + struct page_info *page; > + unsigned offset; > + > + spin_lock(&v->arch.runstate_guest.lock); > + > + /* cleanup previous page if any */ > + if ( v->arch.runstate_guest.page ) > { > - guest_handle = &v->runstate_guest.p->state_entry_time + 1; > - guest_handle--; > - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > - __raw_copy_to_guest(guest_handle, > - (void *)(&runstate.state_entry_time + 1) - 1, 1); > - smp_wmb(); > + put_page_and_type(v->arch.runstate_guest.page); Same remark here. Although I would prefer if we try to have a common helper for cleaning up the runstate. Maybe cleanup_runstate_vcpu_locked()? 
> + v->arch.runstate_guest.page = NULL; > + v->arch.runstate_guest.offset = 0; > + } > + > + offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; > + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) > + { > + spin_unlock(&v->arch.runstate_guest.lock); > + gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); > + return -EINVAL; > + } > + > + /* provided address must be aligned to a 64bit */ > + if ( offset % alignof(struct vcpu_runstate_info) ) > + { > + spin_unlock(&v->arch.runstate_guest.lock); > + gprintk(XENLOG_DEBUG, "Runstate pointer is not aligned\n"); > + return -EINVAL; > + } > + > + page = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); In the current implementation, 0 was used to unregister the runstate area. I think we want to keep that feature and not throw an error. > + if ( !page ) > + { > + spin_unlock(&v->arch.runstate_guest.lock); > + gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); > + return -EINVAL; > } > > - __copy_to_guest(runstate_guest(v), &runstate, 1); > + v->arch.runstate_guest.page = page; > + v->arch.runstate_guest.offset = offset; > + In the current implementation, the runstate area was updated with the latest information during the hypercall. This doesn't seem to happen anymore. Is there any specific reason? > + spin_unlock(&v->arch.runstate_guest.lock); > + > + return 0; > +} > + > + > +/* Update per-VCPU guest runstate shared memory area (if registered). 
*/ > +static void update_runstate_area(struct vcpu *v) > +{ > + struct vcpu_runstate_info *guest_runstate; > + void *p; > + > + spin_lock(&v->arch.runstate_guest.lock); > > - if ( guest_handle ) > + if ( v->arch.runstate_guest.page ) > { > - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > + p = __map_domain_page(v->arch.runstate_guest.page); > + guest_runstate = p + v->arch.runstate_guest.offset; > + > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > + { > + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > + smp_wmb(); > + } > + > + memcpy(guest_runstate, &v->runstate, sizeof(v->runstate)); > smp_wmb(); > - __raw_copy_to_guest(guest_handle, > - (void *)(&runstate.state_entry_time + 1) - 1, 1); > + > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > + { > + v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > + guest_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; > + smp_wmb(); Why do you need this barrier? [...] > diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h > index 4e2f582006..3a7f53e13d 100644 > --- a/xen/include/asm-arm/domain.h > +++ b/xen/include/asm-arm/domain.h > @@ -11,6 +11,7 @@ > #include <asm/vgic.h> > #include <asm/vpl011.h> > #include <public/hvm/params.h> > +#include <public/vcpu.h> Why do you need to add this new include? > #include <xen/serial.h> > #include <xen/rbtree.h> > > @@ -206,6 +207,13 @@ struct arch_vcpu > */ > bool need_flush_to_ram; > > + /* runstate guest info */ > + struct { > + struct page_info *page; /* guest page */ > + unsigned offset; /* offset in page */ Please use unsigned int. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 26+ messages in thread
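The two sanity checks under review in the quoted arch_setup_runstate_guest() — rejecting an area that would cross a page boundary (until patch 2/2 adds support) and requiring 64-bit alignment — can be exercised in isolation. This sketch mirrors the logic of the hunk with stand-in types; PAGE_SIZE and the structure layout are assumptions made for the illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Stand-in for struct vcpu_runstate_info: only its size (40 bytes)
 * and alignment (8 bytes) matter for these checks. */
struct runstate_info_model {
    uint64_t state_entry_time;
    uint64_t time[4];
};

/* Mirror of the checks in the quoted hunk: reject areas that would
 * cross a page boundary or are not 64-bit aligned.  Returns 1 when
 * the guest virtual address would be accepted. */
int runstate_addr_ok(uintptr_t va)
{
    size_t offset = va & (PAGE_SIZE - 1);

    if ( offset > PAGE_SIZE - sizeof(struct runstate_info_model) )
        return 0;               /* runstate would cross pages */
    if ( offset % _Alignof(struct runstate_info_model) )
        return 0;               /* pointer is not 64-bit aligned */
    return 1;
}
```

For example, with the 40-byte structure an offset of 0xFD8 (4056) still fits within the page, 0xFF8 does not, and 0x004 is rejected for alignment.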
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-12 10:53 ` Julien Grall @ 2020-06-12 14:13 ` Bertrand Marquis 2020-06-12 19:56 ` Julien Grall 2020-06-12 16:51 ` Bertrand Marquis 1 sibling, 1 reply; 26+ messages in thread From: Bertrand Marquis @ 2020-06-12 14:13 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné Hi Julien, > On 12 Jun 2020, at 11:53, Julien Grall <julien@xen.org> wrote: > > Hi Bertrand, > > On 11/06/2020 12:58, Bertrand Marquis wrote: >> At the moment on Arm, a Linux guest running with KTPI enabled will >> cause the following error when a context switch happens in user mode: >> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0 > > I think you want to add a bit more context explaining the reason on the failure. I.e. this is because the virtual address used by the runstate is only accessible when running in kernel space. Ok > >> This patch is modifying the register runstate area handling on arm to >> convert the guest address during the hypercall. During runtime context >> switches the area is mapped to update the guest runstate copy. >> The patch is also removing the initial copy which was done during the >> hypercall handling as this is done on the current core when the context >> is restore to go back to guest execution on arm. > > This is explaining what the commit does but not why we want to translate the virtual address at hypercall time. More importantly, this doesn't explain the restrictions added on the hypercall and why they are fine. > > Note that they also should be documented in the public header. Ok > >> As a consequence if the register runstate hypercall is called on one >> vcpu for an other vcpu, the area will not be updated until the vcpu >> will be executed (on arm only). 
> > The code below suggests the hypercall will just fail if you call it from a different vCPU. Is that correct? No, the hypercall will work, but the area in this case won’t be updated. When the hypercall is called on the current vcpu, the area will be updated when the context is restored before returning to the guest, so there is no need to do it an extra time in the hypercall itself. When this is done for a different vcpu, the current code no longer updates the area. I did think at first of doing it, but the complexity it was adding (mainly due to the dual page case) was very high and I could not really think of a use case or find one in Linux. But now that I think of it I could, in that specific case, use a copy_to_guest. Do you think this is something which would make sense to restore? Anyway I will fix the commit message to be clearer on that part. 
>> Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> >> --- >> xen/arch/arm/domain.c | 109 +++++++++++++++++++++++++++++------ >> xen/arch/x86/domain.c | 30 +++++++++- >> xen/arch/x86/x86_64/domain.c | 4 +- >> xen/common/domain.c | 19 ++---- >> xen/include/asm-arm/domain.h | 8 +++ >> xen/include/asm-x86/domain.h | 16 +++++ >> xen/include/xen/domain.h | 4 ++ >> xen/include/xen/sched.h | 16 +---- >> 8 files changed, 153 insertions(+), 53 deletions(-) >> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c >> index 31169326b2..739059234f 100644 >> --- a/xen/arch/arm/domain.c >> +++ b/xen/arch/arm/domain.c >> @@ -19,6 +19,7 @@ >> #include <xen/sched.h> >> #include <xen/softirq.h> >> #include <xen/wait.h> >> +#include <xen/domain_page.h> >> #include <asm/alternative.h> >> #include <asm/cpuerrata.h> >> @@ -275,36 +276,104 @@ static void ctxt_switch_to(struct vcpu *n) >> virt_timer_restore(n); >> } >> -/* Update per-VCPU guest runstate shared memory area (if registered). */ >> -static void update_runstate_area(struct vcpu *v) >> +void arch_cleanup_runstate_guest(struct vcpu *v) > > I would prefer if this is name arch_vcpu_cleanup_runstate() as this is per-vCPU and not per-domain information. Ok > >> { >> - void __user *guest_handle = NULL; >> - struct vcpu_runstate_info runstate; >> + spin_lock(&v->arch.runstate_guest.lock); >> - if ( guest_handle_is_null(runstate_guest(v)) ) >> - return; >> + /* cleanup previous page if any */ >> + if ( v->arch.runstate_guest.page ) >> + { >> + put_page_and_type(v->arch.runstate_guest.page); > > get_page_from_gva() is only grabbing a reference on the page. So you want to use put_page() here. > > Note that we don't have type reference on Arm, so it equivalent to put_page(). But this wouldn't be technically correct :). Ok, thanks for the explanation. 
> >> + v->arch.runstate_guest.page = NULL; >> + v->arch.runstate_guest.offset = 0; >> + } >> - memcpy(&runstate, &v->runstate, sizeof(runstate)); >> + spin_unlock(&v->arch.runstate_guest.lock); >> +} >> - if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> +int arch_setup_runstate_guest(struct vcpu *v, >> + struct vcpu_register_runstate_memory_area area) > > The indentation looks off here. > > Also, same remark for the naming. Ok > > >> +{ >> + struct page_info *page; >> + unsigned offset; >> + >> + spin_lock(&v->arch.runstate_guest.lock); >> + >> + /* cleanup previous page if any */ >> + if ( v->arch.runstate_guest.page ) >> { >> - guest_handle = &v->runstate_guest.p->state_entry_time + 1; >> - guest_handle--; >> - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >> - __raw_copy_to_guest(guest_handle, >> - (void *)(&runstate.state_entry_time + 1) - 1, 1); >> - smp_wmb(); >> + put_page_and_type(v->arch.runstate_guest.page); > > Same remark here. Although I would prefer if we try to have a common helper to cleaning up the runstate. Maybe cleanup_runstate_vcpu_locked()? Agree I will add a static function and use it on those 2 locations. > >> + v->arch.runstate_guest.page = NULL; >> + v->arch.runstate_guest.offset = 0; >> + } >> + >> + offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; >> + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) >> + { >> + spin_unlock(&v->arch.runstate_guest.lock); >> + gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); >> + return -EINVAL; >> + } >> + >> + /* provided address must be aligned to a 64bit */ >> + if ( offset % alignof(struct vcpu_runstate_info) ) >> + { >> + spin_unlock(&v->arch.runstate_guest.lock); >> + gprintk(XENLOG_DEBUG, "Runstate pointer is not aligned\n"); >> + return -EINVAL; >> + } >> + >> + page = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); > > In the current implementation, 0 was used to unregister the runstate area. I think we want to keep that feature and not throw an error. 
I will add this use case (I missed it). > >> + if ( !page ) >> + { >> + spin_unlock(&v->arch.runstate_guest.lock); >> + gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); >> + return -EINVAL; >> } >> - __copy_to_guest(runstate_guest(v), &runstate, 1); >> + v->arch.runstate_guest.page = page; >> + v->arch.runstate_guest.offset = offset; >> + > > In the current implementation, the runstate area was update with the latest information during the hypercall. This doesn't seem to happen anymore. Is there any specific reason? As explained before, this is still happening on the current vcpu as a result of the context switch back to the guest at the end of the hypercall. This is not done if called for a different vcpu anymore, but I could add it back using copy_to_guest > >> + spin_unlock(&v->arch.runstate_guest.lock); >> + >> + return 0; >> +} >> + >> + >> +/* Update per-VCPU guest runstate shared memory area (if registered). */ >> +static void update_runstate_area(struct vcpu *v) >> +{ >> + struct vcpu_runstate_info *guest_runstate; >> + void *p; >> + >> + spin_lock(&v->arch.runstate_guest.lock); >> - if ( guest_handle ) >> + if ( v->arch.runstate_guest.page ) >> { >> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >> + p = __map_domain_page(v->arch.runstate_guest.page); >> + guest_runstate = p + v->arch.runstate_guest.offset; >> + >> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> + { >> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; >> + smp_wmb(); >> + } >> + >> + memcpy(guest_runstate, &v->runstate, sizeof(v->runstate)); >> smp_wmb(); >> - __raw_copy_to_guest(guest_handle, >> - (void *)(&runstate.state_entry_time + 1) - 1, 1); >> + >> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> + { >> + v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >> + guest_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; >> + smp_wmb(); > > Why do you need this barrier? 
As the bit is supposed to be used to wait for an update to be done, I felt it was better to push this out as soon as possible. As there is already one after the memcpy, the cost should be fairly limited here. > > > [...] > >> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h >> index 4e2f582006..3a7f53e13d 100644 >> --- a/xen/include/asm-arm/domain.h >> +++ b/xen/include/asm-arm/domain.h >> @@ -11,6 +11,7 @@ >> #include <asm/vgic.h> >> #include <asm/vpl011.h> >> #include <public/hvm/params.h> >> +#include <public/vcpu.h> > > Why do you need to add this new include? > >> #include <xen/serial.h> >> #include <xen/rbtree.h> >> @@ -206,6 +207,13 @@ struct arch_vcpu >> */ >> bool need_flush_to_ram; >> + /* runstate guest info */ >> + struct { >> + struct page_info *page; /* guest page */ >> + unsigned offset; /* offset in page */ > > Please use unsigned int. Ok Thanks for your comments. Bertrand ^ permalink raw reply [flat|nested] 26+ messages in thread
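The offset checks being discussed in this review are self-contained enough to model in plain C. The sketch below mirrors the registration-time validation from the patch (structure layout simplified, PAGE_SIZE assumed to be 4 KiB as on Arm; this is an illustration, not the hypervisor code itself):

```c
#include <stdalign.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Simplified stand-in for the public vcpu_runstate_info layout. */
struct vcpu_runstate_info {
    int      state;
    uint64_t state_entry_time;
    uint64_t time[4];
};

/* Mirror of the registration-time checks in patch 1: the area must not
 * straddle a page boundary and must be 64-bit aligned. Returns 0 if
 * the address is acceptable, nonzero otherwise. */
static int check_runstate_va(uintptr_t va)
{
    unsigned long offset = va & ~PAGE_MASK;

    if ( offset > PAGE_SIZE - sizeof(struct vcpu_runstate_info) )
        return -1;                      /* crosses a page boundary */

    if ( offset % alignof(struct vcpu_runstate_info) )
        return -1;                      /* not 64-bit aligned */

    return 0;
}
```

Patch 2 removes the first check; the alignment check remains in both versions.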
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall
  2020-06-12 14:13     ` Bertrand Marquis
@ 2020-06-12 19:56       ` Julien Grall
  0 siblings, 0 replies; 26+ messages in thread
From: Julien Grall @ 2020-06-12 19:56 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Jan Beulich, xen-devel, nd, Volodymyr Babchuk,
	Roger Pau Monné

On 12/06/2020 15:13, Bertrand Marquis wrote:
> Hi Julien,

Hi,

>> On 12 Jun 2020, at 11:53, Julien Grall <julien@xen.org> wrote:
>> The code below suggests the hypercall will just fail if you call it
>> from a different vCPU. Is that correct?
>
> No, the hypercall will work, but in this case the area won't be updated.
> When the hypercall is called on the current vCPU, the area will be
> updated when the context is restored before returning to the guest, so
> there is no need to do it an extra time in the hypercall itself.

I am afraid this is not correct: update_runstate_area() is only called
when context switching between 2 vCPUs. So the only way this could be
called on return from the hypercall is if the vCPU gets preempted.

> When this is done for a different vCPU, the current code is not
> updating the area anymore.
>
> I did think at first of doing it, but the complexity it added (mainly
> due to the dual-page case) was very high and I could not really think
> of a use case or find one in Linux.

You may want to have an updated view without forcing the vCPU to exit to
the hypervisor and do a context switch.

> But now that I think of it I could, in that specific case, use a
> copy_to_guest.

I think you should be able to call update_runstate_area() directly.

>
> Do you think this is something which would make sense to restore?

I would like to avoid diverging too much from the current definition of
the hypercall. In this case, I don't think the restriction you add is
necessary.
>> >>> + if ( !page ) >>> + { >>> + spin_unlock(&v->arch.runstate_guest.lock); >>> + gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); >>> + return -EINVAL; >>> } >>> - __copy_to_guest(runstate_guest(v), &runstate, 1); >>> + v->arch.runstate_guest.page = page; >>> + v->arch.runstate_guest.offset = offset; >>> + >> >> In the current implementation, the runstate area was update with the latest information during the hypercall. This doesn't seem to happen anymore. Is there any specific reason? > > As explained before, this is still happening on the current vcpu as a result of the context switch back to the guest at the end of the hypercall. See above, I don't believe this is correct. :). >>> + spin_unlock(&v->arch.runstate_guest.lock); >>> + >>> + return 0; >>> +} >>> + >>> + >>> +/* Update per-VCPU guest runstate shared memory area (if registered). */ >>> +static void update_runstate_area(struct vcpu *v) >>> +{ >>> + struct vcpu_runstate_info *guest_runstate; >>> + void *p; >>> + >>> + spin_lock(&v->arch.runstate_guest.lock); >>> - if ( guest_handle ) >>> + if ( v->arch.runstate_guest.page ) >>> { >>> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >>> + p = __map_domain_page(v->arch.runstate_guest.page); >>> + guest_runstate = p + v->arch.runstate_guest.offset; >>> + >>> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >>> + { >>> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >>> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; >>> + smp_wmb(); >>> + } >>> + >>> + memcpy(guest_runstate, &v->runstate, sizeof(v->runstate)); >>> smp_wmb(); >>> - __raw_copy_to_guest(guest_handle, >>> - (void *)(&runstate.state_entry_time + 1) - 1, 1); >>> + >>> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >>> + { >>> + v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >>> + guest_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; >>> + smp_wmb(); >> >> Why do you need this barrier? 
>
> As the bit is supposed to be used to wait for an update to be done, I
> felt it was better to push this out as soon as possible.

smp_wmb() only ensures that any write before the barrier will be seen
before any write after the barrier. It doesn't guarantee the other end
will see it right after it...

> As there is already one after the memcpy, the cost should be fairly
> limited here.

... so this feels more like a micro-optimization (happy to be proved
wrong). The memory barriers are already tricky enough to use/understand,
so I would rather not use more than necessary.

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread
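For context, the XEN_RUNSTATE_UPDATE bit these barriers publish is consumed by the guest like a sequence lock: readers retry while the bit is set, or while state_entry_time changed mid-copy. A plain userspace model of the reader side (layout simplified; barriers elided since this model is single-threaded):

```c
#include <stdint.h>
#include <string.h>

#define XEN_RUNSTATE_UPDATE (1ULL << 63)

/* Simplified stand-in for the public vcpu_runstate_info layout. */
struct vcpu_runstate_info {
    int      state;
    uint64_t state_entry_time;   /* top bit doubles as the update flag */
    uint64_t time[4];
};

/* Reader side of the protocol: retry while the hypervisor has
 * XEN_RUNSTATE_UPDATE set, or if state_entry_time changed during the
 * copy. In real guest code an smp_rmb() would sit between the flag
 * read and the copy; the explicit cast drops volatile for memcpy. */
static struct vcpu_runstate_info
read_runstate(const volatile struct vcpu_runstate_info *shared)
{
    struct vcpu_runstate_info snap;
    uint64_t t;

    do {
        t = shared->state_entry_time;
        memcpy(&snap, (const void *)shared, sizeof(snap));
    } while ( (t & XEN_RUNSTATE_UPDATE) || snap.state_entry_time != t );

    return snap;
}
```

This is why the writer must set the bit before the copy and clear it after: the barriers order those three stores so a concurrent reader never trusts a half-written snapshot.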
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-12 10:53 ` Julien Grall 2020-06-12 14:13 ` Bertrand Marquis @ 2020-06-12 16:51 ` Bertrand Marquis 2020-06-12 20:31 ` Julien Grall 1 sibling, 1 reply; 26+ messages in thread From: Bertrand Marquis @ 2020-06-12 16:51 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné Hi, > On 12 Jun 2020, at 11:53, Julien Grall <julien@xen.org> wrote: > > Hi Bertrand, > > On 11/06/2020 12:58, Bertrand Marquis wrote: >> At the moment on Arm, a Linux guest running with KTPI enabled will >> cause the following error when a context switch happens in user mode: >> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0 > > I think you want to add a bit more context explaining the reason on the failure. I.e. this is because the virtual address used by the runstate is only accessible when running in kernel space. > >> This patch is modifying the register runstate area handling on arm to >> convert the guest address during the hypercall. During runtime context >> switches the area is mapped to update the guest runstate copy. >> The patch is also removing the initial copy which was done during the >> hypercall handling as this is done on the current core when the context >> is restore to go back to guest execution on arm. > > This is explaining what the commit does but not why we want to translate the virtual address at hypercall time. More importantly, this doesn't explain the restrictions added on the hypercall and why they are fine. > > Note that they also should be documented in the public header. > >> As a consequence if the register runstate hypercall is called on one >> vcpu for an other vcpu, the area will not be updated until the vcpu >> will be executed (on arm only). > > The code below suggests the hypercall will just fail if you call it from a different vCPU. 
Is that correct? > >> On x86, the behaviour is not modified, the address conversion is done >> during the context switch and the area is updated fully during the >> hypercall. >> inline functions in headers could not be used as the architecture >> domain.h is included before the global domain.h making it impossible >> to use the struct vcpu inside the architecture header. >> This should not have any performance impact as the hypercall is only >> called once per vcpu usually. >> Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> >> --- >> xen/arch/arm/domain.c | 109 +++++++++++++++++++++++++++++------ >> xen/arch/x86/domain.c | 30 +++++++++- >> xen/arch/x86/x86_64/domain.c | 4 +- >> xen/common/domain.c | 19 ++---- >> xen/include/asm-arm/domain.h | 8 +++ >> xen/include/asm-x86/domain.h | 16 +++++ >> xen/include/xen/domain.h | 4 ++ >> xen/include/xen/sched.h | 16 +---- >> 8 files changed, 153 insertions(+), 53 deletions(-) >> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c >> index 31169326b2..739059234f 100644 >> --- a/xen/arch/arm/domain.c >> +++ b/xen/arch/arm/domain.c >> @@ -19,6 +19,7 @@ >> #include <xen/sched.h> >> #include <xen/softirq.h> >> #include <xen/wait.h> >> +#include <xen/domain_page.h> >> #include <asm/alternative.h> >> #include <asm/cpuerrata.h> >> @@ -275,36 +276,104 @@ static void ctxt_switch_to(struct vcpu *n) >> virt_timer_restore(n); >> } >> -/* Update per-VCPU guest runstate shared memory area (if registered). */ >> -static void update_runstate_area(struct vcpu *v) >> +void arch_cleanup_runstate_guest(struct vcpu *v) > > I would prefer if this is name arch_vcpu_cleanup_runstate() as this is per-vCPU and not per-domain information. 
> >> { >> - void __user *guest_handle = NULL; >> - struct vcpu_runstate_info runstate; >> + spin_lock(&v->arch.runstate_guest.lock); >> - if ( guest_handle_is_null(runstate_guest(v)) ) >> - return; >> + /* cleanup previous page if any */ >> + if ( v->arch.runstate_guest.page ) >> + { >> + put_page_and_type(v->arch.runstate_guest.page); > > get_page_from_gva() is only grabbing a reference on the page. So you want to use put_page() here. > > Note that we don't have type reference on Arm, so it equivalent to put_page(). But this wouldn't be technically correct :). > >> + v->arch.runstate_guest.page = NULL; >> + v->arch.runstate_guest.offset = 0; >> + } >> - memcpy(&runstate, &v->runstate, sizeof(runstate)); >> + spin_unlock(&v->arch.runstate_guest.lock); >> +} >> - if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> +int arch_setup_runstate_guest(struct vcpu *v, >> + struct vcpu_register_runstate_memory_area area) > > The indentation looks off here. > > Also, same remark for the naming. > > >> +{ >> + struct page_info *page; >> + unsigned offset; >> + >> + spin_lock(&v->arch.runstate_guest.lock); >> + >> + /* cleanup previous page if any */ >> + if ( v->arch.runstate_guest.page ) >> { >> - guest_handle = &v->runstate_guest.p->state_entry_time + 1; >> - guest_handle--; >> - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >> - __raw_copy_to_guest(guest_handle, >> - (void *)(&runstate.state_entry_time + 1) - 1, 1); >> - smp_wmb(); >> + put_page_and_type(v->arch.runstate_guest.page); > > Same remark here. Although I would prefer if we try to have a common helper to cleaning up the runstate. Maybe cleanup_runstate_vcpu_locked()? 
> >> + v->arch.runstate_guest.page = NULL; >> + v->arch.runstate_guest.offset = 0; >> + } >> + >> + offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; >> + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) >> + { >> + spin_unlock(&v->arch.runstate_guest.lock); >> + gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); >> + return -EINVAL; >> + } >> + >> + /* provided address must be aligned to a 64bit */ >> + if ( offset % alignof(struct vcpu_runstate_info) ) >> + { >> + spin_unlock(&v->arch.runstate_guest.lock); >> + gprintk(XENLOG_DEBUG, "Runstate pointer is not aligned\n"); >> + return -EINVAL; >> + } >> + >> + page = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); > > In the current implementation, 0 was used to unregister the runstate area. I think we want to keep that feature and not throw an error. > >> + if ( !page ) >> + { >> + spin_unlock(&v->arch.runstate_guest.lock); >> + gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); >> + return -EINVAL; >> } >> - __copy_to_guest(runstate_guest(v), &runstate, 1); >> + v->arch.runstate_guest.page = page; >> + v->arch.runstate_guest.offset = offset; >> + > > In the current implementation, the runstate area was update with the latest information during the hypercall. This doesn't seem to happen anymore. Is there any specific reason? > >> + spin_unlock(&v->arch.runstate_guest.lock); >> + >> + return 0; >> +} >> + >> + >> +/* Update per-VCPU guest runstate shared memory area (if registered). 
*/ >> +static void update_runstate_area(struct vcpu *v) >> +{ >> + struct vcpu_runstate_info *guest_runstate; >> + void *p; >> + >> + spin_lock(&v->arch.runstate_guest.lock); >> - if ( guest_handle ) >> + if ( v->arch.runstate_guest.page ) >> { >> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >> + p = __map_domain_page(v->arch.runstate_guest.page); >> + guest_runstate = p + v->arch.runstate_guest.offset; >> + >> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> + { >> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >> + guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; >> + smp_wmb(); >> + } >> + >> + memcpy(guest_runstate, &v->runstate, sizeof(v->runstate)); >> smp_wmb(); >> - __raw_copy_to_guest(guest_handle, >> - (void *)(&runstate.state_entry_time + 1) - 1, 1); >> + >> + if ( VM_ASSIST(v->domain, runstate_update_flag) ) >> + { >> + v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; >> + guest_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; >> + smp_wmb(); > > Why do you need this barrier? > > > [...] > >> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h >> index 4e2f582006..3a7f53e13d 100644 >> --- a/xen/include/asm-arm/domain.h >> +++ b/xen/include/asm-arm/domain.h >> @@ -11,6 +11,7 @@ >> #include <asm/vgic.h> >> #include <asm/vpl011.h> >> #include <public/hvm/params.h> >> +#include <public/vcpu.h> > > Why do you need to add this new include? Sorry I forgot to answer to this one. This is needed to have the definition of vcpu_register_runstate_memory_area. Bertrand ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-12 16:51 ` Bertrand Marquis @ 2020-06-12 20:31 ` Julien Grall 2020-06-15 14:01 ` Bertrand Marquis 0 siblings, 1 reply; 26+ messages in thread From: Julien Grall @ 2020-06-12 20:31 UTC (permalink / raw) To: Bertrand Marquis Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné On 12/06/2020 17:51, Bertrand Marquis wrote: Hi, >>> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h >>> index 4e2f582006..3a7f53e13d 100644 >>> --- a/xen/include/asm-arm/domain.h >>> +++ b/xen/include/asm-arm/domain.h >>> @@ -11,6 +11,7 @@ >>> #include <asm/vgic.h> >>> #include <asm/vpl011.h> >>> #include <public/hvm/params.h> >>> +#include <public/vcpu.h> >> >> Why do you need to add this new include? > > Sorry I forgot to answer to this one. > This is needed to have the definition of vcpu_register_runstate_memory_area. I might be missing something because nothing you have added this header seem to require vcpu_register_runstate_memory_area. So should the include be done in a different header? Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] xen/arm: Convert runstate address during hypcall 2020-06-12 20:31 ` Julien Grall @ 2020-06-15 14:01 ` Bertrand Marquis 0 siblings, 0 replies; 26+ messages in thread From: Bertrand Marquis @ 2020-06-15 14:01 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich, xen-devel, nd, Volodymyr Babchuk, Roger Pau Monné > On 12 Jun 2020, at 21:31, Julien Grall <julien@xen.org> wrote: > > > > On 12/06/2020 17:51, Bertrand Marquis wrote: > > Hi, > >>>> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h >>>> index 4e2f582006..3a7f53e13d 100644 >>>> --- a/xen/include/asm-arm/domain.h >>>> +++ b/xen/include/asm-arm/domain.h >>>> @@ -11,6 +11,7 @@ >>>> #include <asm/vgic.h> >>>> #include <asm/vpl011.h> >>>> #include <public/hvm/params.h> >>>> +#include <public/vcpu.h> >>> >>> Why do you need to add this new include? >> Sorry I forgot to answer to this one. >> This is needed to have the definition of vcpu_register_runstate_memory_area. > > I might be missing something because nothing you have added this header seem to require vcpu_register_runstate_memory_area. So should the include be done in a different header? Right. This was required before when I had the definitions of the interface per arch to have a static inline for x86. This is not needed and I will remove that include. Cheers Bertrand ^ permalink raw reply [flat|nested] 26+ messages in thread
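For reference, the guest side of the interface discussed in this subthread registers a per-vCPU area once at boot. A Linux-style sketch (wrapper and helper names as used in Linux guests, assumed here, not part of this patch):

```c
/* Guest-side sketch (Linux-style names, assumed): register a per-vCPU
 * runstate area. With KPTI, this kernel virtual address is unmapped
 * while the user page tables are active, which is why translating it
 * once at hypercall time avoids the page-table-walk failure that
 * motivates this series. */
static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate);

static void xen_setup_runstate_info(unsigned int cpu)
{
    struct vcpu_register_runstate_memory_area area;

    area.addr.v = &per_cpu(xen_runstate, cpu);

    if ( HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area,
                            cpu, &area) )
        pr_warn("could not register runstate area for vCPU %u\n", cpu);
}
```

Because the registered address is a per-CPU kernel variable, it normally sits well inside a page, which is why the page-crossing case handled by patch 2 is rare in practice.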
* [PATCH 2/2] xen/arm: Support runstate crossing pages 2020-06-11 11:58 [PATCH 0/2] xen/arm: Convert runstate address during hypcall Bertrand Marquis 2020-06-11 11:58 ` [PATCH 1/2] " Bertrand Marquis @ 2020-06-11 11:58 ` Bertrand Marquis 2020-06-12 1:10 ` Stefano Stabellini 2020-06-12 12:14 ` Julien Grall 1 sibling, 2 replies; 26+ messages in thread From: Bertrand Marquis @ 2020-06-11 11:58 UTC (permalink / raw) To: xen-devel; +Cc: Volodymyr Babchuk, nd, Julien Grall, Stefano Stabellini Add support for runstate area register with the structure crossing pages The code is storing up to 2 pages reference during the hypercall. During a context switch, the code is computing where the state_entry_time is and is breaking the memcpy in 2 parts when it is required. Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> --- xen/arch/arm/domain.c | 101 +++++++++++++++++++++++++---------- xen/include/asm-arm/domain.h | 5 +- 2 files changed, 76 insertions(+), 30 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 739059234f..d847cb00f2 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -280,11 +280,16 @@ void arch_cleanup_runstate_guest(struct vcpu *v) { spin_lock(&v->arch.runstate_guest.lock); - /* cleanup previous page if any */ - if ( v->arch.runstate_guest.page ) + /* cleanup previous pages if any */ + if ( v->arch.runstate_guest.page[0] ) { - put_page_and_type(v->arch.runstate_guest.page); - v->arch.runstate_guest.page = NULL; + put_page_and_type(v->arch.runstate_guest.page[0]); + v->arch.runstate_guest.page[0] = NULL; + if ( v->arch.runstate_guest.page[1] ) + { + put_page_and_type(v->arch.runstate_guest.page[1]); + v->arch.runstate_guest.page[1] = NULL; + } v->arch.runstate_guest.offset = 0; } @@ -294,26 +299,25 @@ void arch_cleanup_runstate_guest(struct vcpu *v) int arch_setup_runstate_guest(struct vcpu *v, struct vcpu_register_runstate_memory_area area) { - struct page_info *page; + struct page_info *page[2] = {NULL,NULL}; 
unsigned offset; spin_lock(&v->arch.runstate_guest.lock); - /* cleanup previous page if any */ - if ( v->arch.runstate_guest.page ) + /* cleanup previous pages if any */ + if ( v->arch.runstate_guest.page[0] ) { - put_page_and_type(v->arch.runstate_guest.page); - v->arch.runstate_guest.page = NULL; + put_page_and_type(v->arch.runstate_guest.page[0]); + v->arch.runstate_guest.page[0] = NULL; + if ( v->arch.runstate_guest.page[1] ) + { + put_page_and_type(v->arch.runstate_guest.page[1]); + v->arch.runstate_guest.page[1] = NULL; + } v->arch.runstate_guest.offset = 0; } offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; - if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) - { - spin_unlock(&v->arch.runstate_guest.lock); - gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); - return -EINVAL; - } /* provided address must be aligned to a 64bit */ if ( offset % alignof(struct vcpu_runstate_info) ) @@ -323,15 +327,30 @@ int arch_setup_runstate_guest(struct vcpu *v, return -EINVAL; } - page = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); - if ( !page ) + page[0] = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); + if ( !page[0] ) { spin_unlock(&v->arch.runstate_guest.lock); gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); return -EINVAL; } - v->arch.runstate_guest.page = page; + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) + { + /* guest area is crossing pages */ + page[1] = get_page_from_gva(v, (vaddr_t)area.addr.v + PAGE_SIZE, + GV2M_WRITE); + if ( !page[1] ) + { + put_page_and_type(v->arch.runstate_guest.page[0]); + spin_unlock(&v->arch.runstate_guest.lock); + gprintk(XENLOG_DEBUG, "Runstate pointer is not fully mapped\n"); + return -EINVAL; + } + } + + v->arch.runstate_guest.page[0] = page[0]; + v->arch.runstate_guest.page[1] = page[1]; v->arch.runstate_guest.offset = offset; spin_unlock(&v->arch.runstate_guest.lock); @@ -343,34 +362,60 @@ int arch_setup_runstate_guest(struct vcpu *v, /* Update per-VCPU guest 
runstate shared memory area (if registered). */ static void update_runstate_area(struct vcpu *v) { - struct vcpu_runstate_info *guest_runstate; - void *p; + void *p[2]; + unsigned offset, time_offset; + uint64_t *state_entry_time; spin_lock(&v->arch.runstate_guest.lock); - if ( v->arch.runstate_guest.page ) + if ( v->arch.runstate_guest.page[0] ) { - p = __map_domain_page(v->arch.runstate_guest.page); - guest_runstate = p + v->arch.runstate_guest.offset; + p[0] = __map_domain_page(v->arch.runstate_guest.page[0]); + if ( v->arch.runstate_guest.page[1] ) + p[1] = __map_domain_page(v->arch.runstate_guest.page[1]); + else + p[1] = NULL; + offset = v->arch.runstate_guest.offset; if ( VM_ASSIST(v->domain, runstate_update_flag) ) { + time_offset = offset + + offsetof(struct vcpu_runstate_info, state_entry_time); + + if ( time_offset < PAGE_SIZE ) + state_entry_time = p[0] + time_offset; + else + state_entry_time = p[1] + (time_offset - PAGE_SIZE); + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; - guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; + *state_entry_time |= XEN_RUNSTATE_UPDATE; smp_wmb(); } + else + state_entry_time = NULL; + + if ( p[1] ) + { + memcpy(p[0] + offset, &v->runstate, PAGE_SIZE - offset); + memcpy(p[1], &v->runstate + (PAGE_SIZE - offset), + sizeof(v->runstate) - (PAGE_SIZE - offset)); + } + else + memcpy(p[0] + offset, &v->runstate, sizeof(v->runstate)); - memcpy(guest_runstate, &v->runstate, sizeof(v->runstate)); + /* copy must be done before switching the bit */ smp_wmb(); - if ( VM_ASSIST(v->domain, runstate_update_flag) ) + if ( state_entry_time ) { v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; - guest_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; + *state_entry_time &= ~XEN_RUNSTATE_UPDATE; smp_wmb(); } - unmap_domain_page(p); + unmap_domain_page(p[0]); + if ( p[1] ) + unmap_domain_page(p[1]); } spin_unlock(&v->arch.runstate_guest.lock); diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 
3a7f53e13d..61e32f1ed5 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -209,8 +209,9 @@ struct arch_vcpu /* runstate guest info */ struct { - struct page_info *page; /* guest page */ - unsigned offset; /* offset in page */ + /* we need 2 pages in case the guest area is crossing pages */ + struct page_info *page[2]; /* guest pages */ + unsigned offset; /* offset in first page */ spinlock_t lock; /* lock to access page */ } runstate_guest; -- 2.17.1 ^ permalink raw reply related [flat|nested] 26+ messages in thread
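The two-part copy above can be modeled in plain C. One detail worth flagging: in the posted diff, `&v->runstate + (PAGE_SIZE - offset)` advances the source pointer in units of the structure size, not bytes, so a byte-wise cast appears to be needed; the model below does all source arithmetic through `char *`:

```c
#include <string.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

/* Model of the split copy in update_runstate_area(): when the
 * registered area crosses a page boundary, the first
 * (PAGE_SIZE - offset) bytes land at the end of the first mapped page
 * and the remainder at the start of the second. All source arithmetic
 * is done in bytes via char * casts. */
static void copy_across_pages(void *p0, void *p1, unsigned long offset,
                              const void *src, size_t len)
{
    if ( offset + len > PAGE_SIZE )
    {
        size_t first = PAGE_SIZE - offset;

        memcpy((char *)p0 + offset, src, first);
        memcpy(p1, (const char *)src + first, len - first);
    }
    else
        memcpy((char *)p0 + offset, src, len);
}
```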
* Re: [PATCH 2/2] xen/arm: Support runstate crossing pages
  2020-06-11 11:58 ` [PATCH 2/2] xen/arm: Support runstate crossing pages Bertrand Marquis
@ 2020-06-12  1:10   ` Stefano Stabellini
  2020-06-12 11:37     ` Julien Grall
  0 siblings, 1 reply; 26+ messages in thread
From: Stefano Stabellini @ 2020-06-12  1:10 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Volodymyr Babchuk, xen-devel, nd, Julien Grall, Stefano Stabellini

On Thu, 11 Jun 2020, Bertrand Marquis wrote:
> Add support for runstate area register with the structure crossing pages
> The code is storing up to 2 pages reference during the hypercall.
> During a context switch, the code is computing where the
> state_entry_time is and is breaking the memcpy in 2 parts when it is
> required.
>
> Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>

Clearly a lot of effort went into this patch, thank you Bertrand.

The change is complex for the feature it adds. I wonder if we could use
ioremap (or maybe something similar but using a different virtual space)
to simplify it. Julien, do you have good ideas?

Otherwise I don't think there is much we can do to make this patch
simpler. As ugly as it is (sorry Bertrand, it is not your fault!) the
patch looks correct.
> --- > xen/arch/arm/domain.c | 101 +++++++++++++++++++++++++---------- > xen/include/asm-arm/domain.h | 5 +- > 2 files changed, 76 insertions(+), 30 deletions(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index 739059234f..d847cb00f2 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -280,11 +280,16 @@ void arch_cleanup_runstate_guest(struct vcpu *v) > { > spin_lock(&v->arch.runstate_guest.lock); > > - /* cleanup previous page if any */ > - if ( v->arch.runstate_guest.page ) > + /* cleanup previous pages if any */ > + if ( v->arch.runstate_guest.page[0] ) > { > - put_page_and_type(v->arch.runstate_guest.page); > - v->arch.runstate_guest.page = NULL; > + put_page_and_type(v->arch.runstate_guest.page[0]); > + v->arch.runstate_guest.page[0] = NULL; > + if ( v->arch.runstate_guest.page[1] ) > + { > + put_page_and_type(v->arch.runstate_guest.page[1]); > + v->arch.runstate_guest.page[1] = NULL; > + } > v->arch.runstate_guest.offset = 0; > } > > @@ -294,26 +299,25 @@ void arch_cleanup_runstate_guest(struct vcpu *v) > int arch_setup_runstate_guest(struct vcpu *v, > struct vcpu_register_runstate_memory_area area) > { > - struct page_info *page; > + struct page_info *page[2] = {NULL,NULL}; > unsigned offset; > > spin_lock(&v->arch.runstate_guest.lock); > > - /* cleanup previous page if any */ > - if ( v->arch.runstate_guest.page ) > + /* cleanup previous pages if any */ > + if ( v->arch.runstate_guest.page[0] ) > { > - put_page_and_type(v->arch.runstate_guest.page); > - v->arch.runstate_guest.page = NULL; > + put_page_and_type(v->arch.runstate_guest.page[0]); > + v->arch.runstate_guest.page[0] = NULL; > + if ( v->arch.runstate_guest.page[1] ) > + { > + put_page_and_type(v->arch.runstate_guest.page[1]); > + v->arch.runstate_guest.page[1] = NULL; > + } > v->arch.runstate_guest.offset = 0; > } > > offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; > - if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) > - { > - 
spin_unlock(&v->arch.runstate_guest.lock); > - gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); > - return -EINVAL; > - } > > /* provided address must be aligned to a 64bit */ > if ( offset % alignof(struct vcpu_runstate_info) ) > @@ -323,15 +327,30 @@ int arch_setup_runstate_guest(struct vcpu *v, > return -EINVAL; > } > > - page = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); > - if ( !page ) > + page[0] = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); > + if ( !page[0] ) > { > spin_unlock(&v->arch.runstate_guest.lock); > gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); > return -EINVAL; > } > > - v->arch.runstate_guest.page = page; > + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) > + { > + /* guest area is crossing pages */ > + page[1] = get_page_from_gva(v, (vaddr_t)area.addr.v + PAGE_SIZE, > + GV2M_WRITE); > + if ( !page[1] ) > + { > + put_page_and_type(v->arch.runstate_guest.page[0]); > + spin_unlock(&v->arch.runstate_guest.lock); > + gprintk(XENLOG_DEBUG, "Runstate pointer is not fully mapped\n"); > + return -EINVAL; > + } > + } > + > + v->arch.runstate_guest.page[0] = page[0]; > + v->arch.runstate_guest.page[1] = page[1]; > v->arch.runstate_guest.offset = offset; > > spin_unlock(&v->arch.runstate_guest.lock); > @@ -343,34 +362,60 @@ int arch_setup_runstate_guest(struct vcpu *v, > /* Update per-VCPU guest runstate shared memory area (if registered). 
*/ > static void update_runstate_area(struct vcpu *v) > { > - struct vcpu_runstate_info *guest_runstate; > - void *p; > + void *p[2]; > + unsigned offset, time_offset; > + uint64_t *state_entry_time; > > spin_lock(&v->arch.runstate_guest.lock); > > - if ( v->arch.runstate_guest.page ) > + if ( v->arch.runstate_guest.page[0] ) > { > - p = __map_domain_page(v->arch.runstate_guest.page); > - guest_runstate = p + v->arch.runstate_guest.offset; > + p[0] = __map_domain_page(v->arch.runstate_guest.page[0]); > + if ( v->arch.runstate_guest.page[1] ) > + p[1] = __map_domain_page(v->arch.runstate_guest.page[1]); > + else > + p[1] = NULL; > + offset = v->arch.runstate_guest.offset; > > if ( VM_ASSIST(v->domain, runstate_update_flag) ) > { > + time_offset = offset + > + offsetof(struct vcpu_runstate_info, state_entry_time); > + > + if ( time_offset < PAGE_SIZE ) > + state_entry_time = p[0] + time_offset; > + else > + state_entry_time = p[1] + (time_offset - PAGE_SIZE); > + > v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > - guest_runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; > + *state_entry_time |= XEN_RUNSTATE_UPDATE; > smp_wmb(); > } > + else > + state_entry_time = NULL; > + > + if ( p[1] ) > + { > + memcpy(p[0] + offset, &v->runstate, PAGE_SIZE - offset); > + memcpy(p[1], &v->runstate + (PAGE_SIZE - offset), > + sizeof(v->runstate) - (PAGE_SIZE - offset)); > + } > + else > + memcpy(p[0] + offset, &v->runstate, sizeof(v->runstate)); > > - memcpy(guest_runstate, &v->runstate, sizeof(v->runstate)); > + /* copy must be done before switching the bit */ > smp_wmb(); > > - if ( VM_ASSIST(v->domain, runstate_update_flag) ) > + if ( state_entry_time ) > { > v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > - guest_runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE; > + *state_entry_time &= ~XEN_RUNSTATE_UPDATE; > smp_wmb(); > } > > - unmap_domain_page(p); > + unmap_domain_page(p[0]); > + if ( p[1] ) > + unmap_domain_page(p[1]); > } > > 
spin_unlock(&v->arch.runstate_guest.lock); > diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h > index 3a7f53e13d..61e32f1ed5 100644 > --- a/xen/include/asm-arm/domain.h > +++ b/xen/include/asm-arm/domain.h > @@ -209,8 +209,9 @@ struct arch_vcpu > > /* runstate guest info */ > struct { > - struct page_info *page; /* guest page */ > - unsigned offset; /* offset in page */ > + /* we need 2 pages in case the guest area is crossing pages */ > + struct page_info *page[2]; /* guest pages */ > + unsigned offset; /* offset in first page */ > spinlock_t lock; /* lock to access page */ > } runstate_guest; ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 2/2] xen/arm: Support runstate crossing pages 2020-06-12 1:10 ` Stefano Stabellini @ 2020-06-12 11:37 ` Julien Grall 0 siblings, 0 replies; 26+ messages in thread From: Julien Grall @ 2020-06-12 11:37 UTC (permalink / raw) To: Stefano Stabellini, Bertrand Marquis; +Cc: xen-devel, nd, Volodymyr Babchuk Hi, On 12/06/2020 02:10, Stefano Stabellini wrote: > On Thu, 11 Jun 2020, Bertrand Marquis wrote: >> Add support for runstate area register with the structure crossing pages >> The code is storing up to 2 pages reference during the hypercall. >> During a context switch, the code is computing where the >> state_entry_time is and is breaking the memcpy in 2 parts when it is >> required. >> >> Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> > > Clearly a lot of effort went into this patch, thank you Bertrand. > > The change is complex for the feature it adds. I wonder if we could use > ioremap (or maybe something similar but using a different virtual space) > to simplify it. Julien, do you have good ideas? ioremap() is for MMIO regions, so you would want to use vmap(). Note that the former is just a wrapper of the latter. vmap() is probably a good start for now, but not ideal for the long term because the vmap() area is fairly small (768MB) and if we want to go towards a secret-free hypervisor on Arm, we would want to restrict the visibility of the mapping to the other CPUs. I think it would be good to have some per-CPU/per-domain mapping to limit the waste of the address space. But this will require deduplicating page-tables for arm64 (I was actually looking at it over the past few weekends). For the time being, I would suggest using vmap(). The memory is always direct mapped on arm64, so it should make no difference for security concerns. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 26+ messages in thread
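Julien's vmap() suggestion would collapse the two-page handling back into a single-page-like path. A pseudocode-level sketch of what registration could look like (assuming Xen's `void *vmap(const mfn_t *mfn, unsigned int nr)` and `vunmap()`; the `runstate` field name is hypothetical, and error unwinding is elided):

```c
/* Sketch only: assumes both page references were already taken with
 * get_page_from_gva() and that the runstate lock is held.  Not actual
 * Xen code. */
mfn_t mfn[2];
unsigned int nr = page[1] ? 2 : 1;
void *va;

mfn[0] = page_to_mfn(page[0]);
if ( page[1] )
    mfn[1] = page_to_mfn(page[1]);

/* Map the (possibly discontiguous) frames at contiguous virtual
 * addresses, so the context-switch path can use a single memcpy again. */
va = vmap(mfn, nr);
if ( !va )
    return -ENOMEM;        /* after dropping the page references */

v->arch.runstate_guest.runstate = va + offset;  /* hypothetical field */
```

Cleanup would then be a vunmap() plus dropping the page references. The trade-off Julien notes still applies: each registered vCPU permanently consumes up to two pages of the 768MB vmap() area.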
* Re: [PATCH 2/2] xen/arm: Support runstate crossing pages 2020-06-11 11:58 ` [PATCH 2/2] xen/arm: Support runstate crossing pages Bertrand Marquis 2020-06-12 1:10 ` Stefano Stabellini @ 2020-06-12 12:14 ` Julien Grall 2020-06-12 16:13 ` Bertrand Marquis 1 sibling, 1 reply; 26+ messages in thread From: Julien Grall @ 2020-06-12 12:14 UTC (permalink / raw) To: Bertrand Marquis, xen-devel; +Cc: nd, Volodymyr Babchuk, Stefano Stabellini Hi, On 11/06/2020 12:58, Bertrand Marquis wrote: > Add support for runstate area register with the structure crossing pages Well, this has always been supported until your previous patch. In general, we try not to break things in the middle of a series so we can still bisect it. I think this patch can be simplified a lot by mapping the two pages contiguously (see my previous answer). With that it would be feasible to fold this patch into #1. > The code is storing up to 2 pages reference during the hypercall. > During a context switch, the code is computing where the > state_entry_time is and is breaking the memcpy in 2 parts when it is > required. 
> > Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> > --- > xen/arch/arm/domain.c | 101 +++++++++++++++++++++++++---------- > xen/include/asm-arm/domain.h | 5 +- > 2 files changed, 76 insertions(+), 30 deletions(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index 739059234f..d847cb00f2 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -280,11 +280,16 @@ void arch_cleanup_runstate_guest(struct vcpu *v) > { > spin_lock(&v->arch.runstate_guest.lock); > > - /* cleanup previous page if any */ > - if ( v->arch.runstate_guest.page ) > + /* cleanup previous pages if any */ > + if ( v->arch.runstate_guest.page[0] ) > { > - put_page_and_type(v->arch.runstate_guest.page); > - v->arch.runstate_guest.page = NULL; > + put_page_and_type(v->arch.runstate_guest.page[0]); > + v->arch.runstate_guest.page[0] = NULL; > + if ( v->arch.runstate_guest.page[1] ) > + { > + put_page_and_type(v->arch.runstate_guest.page[1]); > + v->arch.runstate_guest.page[1] = NULL; > + } I think this can be moved outside of the first if. > v->arch.runstate_guest.offset = 0; > } > > @@ -294,26 +299,25 @@ void arch_cleanup_runstate_guest(struct vcpu *v) > int arch_setup_runstate_guest(struct vcpu *v, > struct vcpu_register_runstate_memory_area area) > { > - struct page_info *page; > + struct page_info *page[2] = {NULL,NULL}; I don't think you need the temporary variable. You can directly update page[0] as it is protected by the lock. The nice benefit is that you could take advantage of a common helper to clean up and reduce the complexity of the code. 
> unsigned offset; > > spin_lock(&v->arch.runstate_guest.lock); > > - /* cleanup previous page if any */ > - if ( v->arch.runstate_guest.page ) > + /* cleanup previous pages if any */ > + if ( v->arch.runstate_guest.page[0] ) > { > - put_page_and_type(v->arch.runstate_guest.page); > - v->arch.runstate_guest.page = NULL; > + put_page_and_type(v->arch.runstate_guest.page[0]); > + v->arch.runstate_guest.page[0] = NULL; > + if ( v->arch.runstate_guest.page[1] ) > + { > + put_page_and_type(v->arch.runstate_guest.page[1]); > + v->arch.runstate_guest.page[1] = NULL; > + } > v->arch.runstate_guest.offset = 0; > } > > offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; > - if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) > - { > - spin_unlock(&v->arch.runstate_guest.lock); > - gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); > - return -EINVAL; > - } > > /* provided address must be aligned to a 64bit */ > if ( offset % alignof(struct vcpu_runstate_info) ) > @@ -323,15 +327,30 @@ int arch_setup_runstate_guest(struct vcpu *v, > return -EINVAL; > } > > - page = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); > - if ( !page ) > + page[0] = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); > + if ( !page[0] ) > { > spin_unlock(&v->arch.runstate_guest.lock); > gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); > return -EINVAL; > } > > - v->arch.runstate_guest.page = page; > + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) > + { > + /* guest area is crossing pages */ > + page[1] = get_page_from_gva(v, (vaddr_t)area.addr.v + PAGE_SIZE, > + GV2M_WRITE); > + if ( !page[1] ) > + { > + put_page_and_type(v->arch.runstate_guest.page[0]); v->arch.runstate_guest.page[0] would be NULL as you set it afterwards. So you want to set v->arch.runstate_guest.page[0] beforehand. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 2/2] xen/arm: Support runstate crossing pages 2020-06-12 12:14 ` Julien Grall @ 2020-06-12 16:13 ` Bertrand Marquis 0 siblings, 0 replies; 26+ messages in thread From: Bertrand Marquis @ 2020-06-12 16:13 UTC (permalink / raw) To: Julien Grall; +Cc: xen-devel, nd, Volodymyr Babchuk, Stefano Stabellini Hi, > On 12 Jun 2020, at 13:14, Julien Grall <julien@xen.org> wrote: > > Hi, > > On 11/06/2020 12:58, Bertrand Marquis wrote: >> Add support for runstate area register with the structure crossing pages > > Well, this has always been supported until your previous patch. In general, we try not to break things in the middle of a series so we can still bisect it. > > I think this patch can be simplified a lot by mapping the two pages contiguously (see my previous answer). With that it would be feasible to fold this patch into #1. I will dig into switching to something using vmap. Worst-case scenario we would consume 8k * 128 vCPUs, which is still 1M, but that should be acceptable as it could greatly simplify the code. > >> The code is storing up to 2 pages reference during the hypercall. >> During a context switch, the code is computing where the >> state_entry_time is and is breaking the memcpy in 2 parts when it is >> required. 
>> Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> >> --- >> xen/arch/arm/domain.c | 101 +++++++++++++++++++++++++---------- >> xen/include/asm-arm/domain.h | 5 +- >> 2 files changed, 76 insertions(+), 30 deletions(-) >> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c >> index 739059234f..d847cb00f2 100644 >> --- a/xen/arch/arm/domain.c >> +++ b/xen/arch/arm/domain.c >> @@ -280,11 +280,16 @@ void arch_cleanup_runstate_guest(struct vcpu *v) >> { >> spin_lock(&v->arch.runstate_guest.lock); >> - /* cleanup previous page if any */ >> - if ( v->arch.runstate_guest.page ) >> + /* cleanup previous pages if any */ >> + if ( v->arch.runstate_guest.page[0] ) >> { >> - put_page_and_type(v->arch.runstate_guest.page); >> - v->arch.runstate_guest.page = NULL; >> + put_page_and_type(v->arch.runstate_guest.page[0]); >> + v->arch.runstate_guest.page[0] = NULL; >> + if ( v->arch.runstate_guest.page[1] ) >> + { >> + put_page_and_type(v->arch.runstate_guest.page[1]); >> + v->arch.runstate_guest.page[1] = NULL; >> + } > > I think this can be moved outside of the first if. Agree > >> v->arch.runstate_guest.offset = 0; >> } >> @@ -294,26 +299,25 @@ void arch_cleanup_runstate_guest(struct vcpu *v) >> int arch_setup_runstate_guest(struct vcpu *v, >> struct vcpu_register_runstate_memory_area area) >> { >> - struct page_info *page; >> + struct page_info *page[2] = {NULL,NULL}; > > I don't think you need the temporary variable. You can directly update page[0] as it is protected by the lock. The nice benefits is you could take advantage of a common helper to cleanup and reduce the complexity of the code. Would make sense yes (and remove the unlock everywhere). I will try that, thanks for the hint. 
> >> unsigned offset; >> spin_lock(&v->arch.runstate_guest.lock); >> - /* cleanup previous page if any */ >> - if ( v->arch.runstate_guest.page ) >> + /* cleanup previous pages if any */ >> + if ( v->arch.runstate_guest.page[0] ) >> { >> - put_page_and_type(v->arch.runstate_guest.page); >> - v->arch.runstate_guest.page = NULL; >> + put_page_and_type(v->arch.runstate_guest.page[0]); >> + v->arch.runstate_guest.page[0] = NULL; >> + if ( v->arch.runstate_guest.page[1] ) >> + { >> + put_page_and_type(v->arch.runstate_guest.page[1]); >> + v->arch.runstate_guest.page[1] = NULL; >> + } >> v->arch.runstate_guest.offset = 0; >> } >> offset = ((vaddr_t)area.addr.v) & ~PAGE_MASK; >> - if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) >> - { >> - spin_unlock(&v->arch.runstate_guest.lock); >> - gprintk(XENLOG_DEBUG, "Runstate is crossing pages\n"); >> - return -EINVAL; >> - } >> /* provided address must be aligned to a 64bit */ >> if ( offset % alignof(struct vcpu_runstate_info) ) >> @@ -323,15 +327,30 @@ int arch_setup_runstate_guest(struct vcpu *v, >> return -EINVAL; >> } >> - page = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); >> - if ( !page ) >> + page[0] = get_page_from_gva(v, (vaddr_t)area.addr.v, GV2M_WRITE); >> + if ( !page[0] ) >> { >> spin_unlock(&v->arch.runstate_guest.lock); >> gprintk(XENLOG_DEBUG, "Runstate pointer is not mapped\n"); >> return -EINVAL; >> } >> - v->arch.runstate_guest.page = page; >> + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) >> + { >> + /* guest area is crossing pages */ >> + page[1] = get_page_from_gva(v, (vaddr_t)area.addr.v + PAGE_SIZE, >> + GV2M_WRITE); >> + if ( !page[1] ) >> + { >> + put_page_and_type(v->arch.runstate_guest.page[0]); > v->arch.runstate_guest.page[0] would be NULL as you set it afterwards. So you want to set v->arch.runstate_guest.page[0] beforehand. Nice catch, should have been page[0] alone. Thanks Bertrand ^ permalink raw reply [flat|nested] 26+ messages in thread