From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Jan Beulich"
Subject: Re: [PATCH RFC] x86/emulate: implement hvmemul_cmpxchg() with an actual CMPXCHG
Date: Wed, 29 Mar 2017 07:55:43 -0600
Message-ID: <58DBD8FF020000780014A113@prv-mh.provo.novell.com>
References: <1490361899-18303-1-git-send-email-rcojocaru@bitdefender.com> <58DA510A0200007800148F6F@prv-mh.provo.novell.com> <925827a5-b346-1733-3c0a-64eaa7b3e251@bitdefender.com> <58DA5B7E020000780014900C@prv-mh.provo.novell.com>
To: Razvan Cojocaru
Cc: andrew.cooper3@citrix.com, paul.durrant@citrix.com, Tim Deegan, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

>>> On 28.03.17 at 12:50, wrote:
> On 03/28/2017 01:47 PM, Jan Beulich wrote:
>>>>> On 28.03.17 at 12:27, wrote:
>>> On 03/28/2017 01:03 PM, Jan Beulich wrote:
>>>>>>> On 28.03.17 at 11:14, wrote:
>>>>> I'm not sure that the RETRY model is what the guest OS expects. AFAIK, a
>>>>> failed CMPXCHG should happen just once, with the proper registers and ZF
>>>>> set. The guest surely expects neither that the instruction resume until
>>>>> it succeeds, nor that some hidden loop goes on for an indeterminate
>>>>> amount of time until a CMPXCHG succeeds.
>>>>
>>>> The guest doesn't observe the CMPXCHG failing - RETRY leads to
>>>> the instruction being restarted instead of completed.
>>>
>>> Indeed, but it works differently with hvm_emulate_one_vm_event(), where
>>> RETRY currently would have the instruction be re-executed (properly
>>> re-executed, not just re-emulated) by the guest.
>>
>> Right - see my other reply to Andrew: The function likely would
>> need to tell apart guest CMPXCHG uses from us using the insn to
>> carry out the write by some other one. That may involve
>> adjustments to the memory write logic in x86_emulate() itself, as
>> the late failure of the comparison then would also need to be
>> communicated back (via ZF clear) to the guest.
>
> Exactly, it would require quite some reworking of x86_emulate().

I had imagined it to be less intrusive (outside of x86_emulate()), but
I've now learned why Andrew was able to get rid of
X86EMUL_CMPXCHG_FAILED - the apparently intended behavior was never
implemented. Attached is a first take at it, which has seen smoke
testing, but nothing more. The way it ends up, I don't think this can
reasonably be considered for 4.9 at this point in time. (Also Cc-ing
Tim for the shadow code changes, even if this isn't really a proper
patch submission.)
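To illustrate the failure semantics being discussed, here is a minimal,
hypothetical C sketch - not Xen code; emul_cmpxchg, its return values,
and the zf out-parameter are stand-ins for the emulator's ->cmpxchg()
hook and the X86EMUL_* codes. The point is that a mismatch completes
the instruction exactly once: the value actually found in memory is
reported back through the "expected" pointer (as the patch does via
*p_old, so it can land in rAX), and the ZF analogue is cleared.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the emulator's cmpxchg hook:
 *  - success: memory updated, "ZF" set, return 0 (think X86EMUL_OKAY);
 *  - mismatch: no hidden retry loop - the actual memory value is
 *    written back through *p_old, "ZF" is cleared, and a distinct
 *    failure code is returned (think X86EMUL_CMPXCHG_FAILED).
 */
static int emul_cmpxchg(uint64_t *mem, uint64_t *p_old, uint64_t new_val,
                        int *zf)
{
    uint64_t expected = *p_old;

    if ( __atomic_compare_exchange_n(mem, &expected, new_val, 0,
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST) )
    {
        *zf = 1;
        return 0;
    }

    *p_old = expected;  /* report the value actually seen, exactly once */
    *zf = 0;
    return 1;
}
```

A guest-visible failure thus happens once, with the registers and ZF
reflecting the actual memory contents, rather than the instruction
being silently restarted until it happens to succeed.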
Jan

[Attachment: x86emul-cmpxchg-fail.patch]

x86emul: correctly handle CMPXCHG* comparison failures

If the ->cmpxchg() hook finds a mismatch, we should deal with this the
same as when the "manual" comparison reports a mismatch.

This involves reverting bfce0e62c3 ("x86/emul: Drop
X86EMUL_CMPXCHG_FAILED"), albeit with X86EMUL_CMPXCHG_FAILED now
becoming a value distinct from X86EMUL_RETRY.

In order to not leave mixed code also fully switch affected functions
from paddr_t to intpte_t.

Signed-off-by: Jan Beulich
---
The code could be further simplified if we could rely on all
->cmpxchg() hooks always using CMPXCHG, but for now we need to cope
with them using plain writes (and hence accept the double reads if
CMPXCHG is actually being used).
Note that the patch doesn't address the incorrectness of there not
being a memory write even in the comparison-failed case.

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5236,16 +5236,17 @@ static int ptwr_emulated_read(
 
 static int ptwr_emulated_update(
     unsigned long addr,
-    paddr_t old,
-    paddr_t val,
+    intpte_t *p_old,
+    intpte_t val,
     unsigned int bytes,
-    unsigned int do_cmpxchg,
     struct ptwr_emulate_ctxt *ptwr_ctxt)
 {
     unsigned long mfn;
     unsigned long unaligned_addr = addr;
     struct page_info *page;
     l1_pgentry_t pte, ol1e, nl1e, *pl1e;
+    intpte_t old = p_old ? *p_old : 0;
+    unsigned int offset = 0;
     struct vcpu *v = current;
     struct domain *d = v->domain;
     int ret;
@@ -5259,28 +5260,30 @@ static int ptwr_emulated_update(
     }
 
     /* Turn a sub-word access into a full-word access. */
-    if ( bytes != sizeof(paddr_t) )
+    if ( bytes != sizeof(val) )
     {
-        paddr_t full;
-        unsigned int rc, offset = addr & (sizeof(paddr_t)-1);
+        intpte_t full;
+        unsigned int rc;
+
+        offset = addr & (sizeof(full) - 1);
 
         /* Align address; read full word. */
-        addr &= ~(sizeof(paddr_t)-1);
-        if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
+        addr &= ~(sizeof(full) - 1);
+        if ( (rc = copy_from_user(&full, (void *)addr, sizeof(full))) != 0 )
         {
             x86_emul_pagefault(0, /* Read fault. */
-                               addr + sizeof(paddr_t) - rc,
+                               addr + sizeof(full) - rc,
                                &ptwr_ctxt->ctxt);
             return X86EMUL_EXCEPTION;
         }
         /* Mask out bits provided by caller. */
-        full &= ~((((paddr_t)1 << (bytes*8)) - 1) << (offset*8));
+        full &= ~((((intpte_t)1 << (bytes * 8)) - 1) << (offset * 8));
         /* Shift the caller value and OR in the missing bits. */
-        val  &= (((paddr_t)1 << (bytes*8)) - 1);
+        val  &= (((intpte_t)1 << (bytes * 8)) - 1);
         val <<= (offset)*8;
         val  |= full;
         /* Also fill in missing parts of the cmpxchg old value. */
-        old  &= (((paddr_t)1 << (bytes*8)) - 1);
+        old  &= (((intpte_t)1 << (bytes * 8)) - 1);
         old <<= (offset)*8;
         old  |= full;
     }
@@ -5302,7 +5305,7 @@ static int ptwr_emulated_update(
     {
     default:
         if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
-             !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
+             !p_old && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
         {
             /*
              * If this is an upper-half write to a PAE PTE then we assume that
@@ -5333,21 +5336,26 @@ static int ptwr_emulated_update(
     /* Checked successfully: do the update (write or cmpxchg). */
     pl1e = map_domain_page(_mfn(mfn));
     pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK));
-    if ( do_cmpxchg )
+    if ( p_old )
     {
-        int okay;
-        intpte_t t = old;
         ol1e = l1e_from_intpte(old);
 
-        okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e),
-                                          &t, l1e_get_intpte(nl1e), _mfn(mfn));
-        okay = (okay && t == old);
+        if ( !paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e),
+                                         &old, l1e_get_intpte(nl1e), _mfn(mfn)) )
+            ret = X86EMUL_UNHANDLEABLE;
+        else if ( l1e_get_intpte(ol1e) == old )
+            ret = X86EMUL_OKAY;
+        else
+        {
+            *p_old = old >> (offset * 8);
+            ret = X86EMUL_CMPXCHG_FAILED;
+        }
 
-        if ( !okay )
+        if ( ret != X86EMUL_OKAY )
         {
             unmap_domain_page(pl1e);
             put_page_from_l1e(nl1e, d);
-            return X86EMUL_RETRY;
+            return ret;
         }
     }
     else
@@ -5374,9 +5382,9 @@ static int ptwr_emulated_write(
     unsigned int bytes,
     struct x86_emulate_ctxt *ctxt)
 {
-    paddr_t val = 0;
+    intpte_t val = 0;
 
-    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes )
+    if ( (bytes > sizeof(val)) || (bytes & (bytes - 1)) || !bytes )
     {
         MEM_LOG("ptwr_emulate: bad write size (addr=%lx, bytes=%u)",
                 offset, bytes);
@@ -5385,9 +5393,9 @@ static int ptwr_emulated_write(
 
     memcpy(&val, p_data, bytes);
 
-    return ptwr_emulated_update(
-        offset, 0, val, bytes, 0,
-        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
+    return ptwr_emulated_update(offset, NULL, val, bytes,
+                                container_of(ctxt, struct ptwr_emulate_ctxt,
+                                             ctxt));
 }
 
 static int ptwr_emulated_cmpxchg(
@@ -5398,21 +5406,20 @@ static int ptwr_emulated_cmpxchg(
     unsigned int bytes,
     struct x86_emulate_ctxt *ctxt)
 {
-    paddr_t old = 0, new = 0;
+    intpte_t new = 0;
 
-    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes -1)) )
+    if ( (bytes > sizeof(new)) || (bytes & (bytes -1)) )
     {
         MEM_LOG("ptwr_emulate: bad cmpxchg size (addr=%lx, bytes=%u)",
                 offset, bytes);
         return X86EMUL_UNHANDLEABLE;
     }
 
-    memcpy(&old, p_old, bytes);
     memcpy(&new, p_new, bytes);
 
-    return ptwr_emulated_update(
-        offset, old, new, bytes, 1,
-        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
+    return ptwr_emulated_update(offset, p_old, new, bytes,
+                                container_of(ctxt, struct ptwr_emulate_ctxt,
+                                             ctxt));
 }
 
 static int pv_emul_is_mem_write(const struct x86_emulate_state *state,
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -285,7 +285,7 @@ hvm_emulate_cmpxchg(enum x86_segment seg
     struct sh_emulate_ctxt *sh_ctxt =
         container_of(ctxt, struct sh_emulate_ctxt, ctxt);
     struct vcpu *v = current;
-    unsigned long addr, old, new;
+    unsigned long addr, new = 0;
     int rc;
 
     if ( bytes > sizeof(long) )
@@ -296,12 +296,10 @@ hvm_emulate_cmpxchg(enum x86_segment seg
     if ( rc )
         return rc;
 
-    old = new = 0;
-    memcpy(&old, p_old, bytes);
     memcpy(&new, p_new, bytes);
 
     return v->arch.paging.mode->shadow.x86_emulate_cmpxchg(
-               v, addr, old, new, bytes, sh_ctxt);
+               v, addr, p_old, new, bytes, sh_ctxt);
 }
 
 static const struct x86_emulate_ops hvm_shadow_emulator_ops = {
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -4755,11 +4755,11 @@ sh_x86_emulate_write(struct vcpu *v, uns
 
 static int
 sh_x86_emulate_cmpxchg(struct vcpu *v, unsigned long vaddr,
-                       unsigned long old, unsigned long new,
-                       unsigned int bytes, struct sh_emulate_ctxt *sh_ctxt)
+                       unsigned long *p_old, unsigned long new,
+                       unsigned int bytes, struct sh_emulate_ctxt *sh_ctxt)
 {
     void *addr;
-    unsigned long prev;
+    unsigned long prev, old = *p_old;
     int rv = X86EMUL_OKAY;
 
     /* Unaligned writes are only acceptable on HVM */
@@ -4783,7 +4783,10 @@ sh_x86_emulate_cmpxchg(struct vcpu *v, u
     }
 
     if ( prev != old )
-        rv = X86EMUL_RETRY;
+    {
+        *p_old = prev;
+        rv = X86EMUL_CMPXCHG_FAILED;
+    }
 
     SHADOW_DEBUG(EMULATE, "va %#lx was %#lx expected %#lx"
                  " wanted %#lx now %#lx bytes %u\n",
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -1880,6 +1880,9 @@ protmode_load_seg(
 
     default:
         return rc;
+
+    case X86EMUL_CMPXCHG_FAILED:
+        return X86EMUL_RETRY;
     }
 
     /* Force the Accessed flag in our local copy. */
@@ -6702,6 +6705,7 @@ x86_emulate(
         break;
 
     case X86EMUL_OPC(0x0f, 0xb0): case X86EMUL_OPC(0x0f, 0xb1): /* cmpxchg */
+        fail_if(!ops->cmpxchg);
        /* Save real source value, then compare EAX against destination. */
         src.orig_val = src.val;
         src.val = _regs.r(ax);
@@ -6710,8 +6714,17 @@ x86_emulate(
         if ( _regs.eflags & X86_EFLAGS_ZF )
         {
             /* Success: write back to memory. */
-            dst.val = src.orig_val;
+            dst.val = src.val;
+            rc = ops->cmpxchg(dst.mem.seg, dst.mem.off, &dst.val,
+                              &src.orig_val, dst.bytes, ctxt);
+            if ( rc == X86EMUL_CMPXCHG_FAILED )
+            {
+                _regs.eflags &= ~X86_EFLAGS_ZF;
+                rc = X86EMUL_OKAY;
+            }
         }
+        if ( _regs.eflags & X86_EFLAGS_ZF )
+            dst.type = OP_NONE;
         else
         {
             /* Failure: write the value we saw to EAX. */
@@ -7016,6 +7029,7 @@ x86_emulate(
 
             if ( memcmp(old, aux, op_bytes) )
             {
+            cmpxchgNb_failed:
                 /* Expected != actual: store actual to rDX:rAX and clear ZF. */
                 _regs.r(ax) = !(rex_prefix & REX_W) ? old->u32[0] : old->u64[0];
                 _regs.r(dx) = !(rex_prefix & REX_W) ? old->u32[1] : old->u64[1];
@@ -7025,7 +7039,7 @@ x86_emulate(
             {
                 /*
                  * Expected == actual: Get proposed value, attempt atomic cmpxchg
-                 * and set ZF.
+                 * and set ZF if successful.
                  */
                 if ( !(rex_prefix & REX_W) )
                 {
@@ -7038,10 +7052,20 @@ x86_emulate(
                     aux->u64[1] = _regs.r(cx);
                 }
 
-                if ( (rc = ops->cmpxchg(ea.mem.seg, ea.mem.off, old, aux,
-                                        op_bytes, ctxt)) != X86EMUL_OKAY )
+                switch ( rc = ops->cmpxchg(ea.mem.seg, ea.mem.off, old, aux,
+                                           op_bytes, ctxt) )
+                {
+                case X86EMUL_OKAY:
+                    _regs.eflags |= X86_EFLAGS_ZF;
+                    break;
+
+                case X86EMUL_CMPXCHG_FAILED:
+                    rc = X86EMUL_OKAY;
+                    goto cmpxchgNb_failed;
+
+                default:
                     goto done;
-                _regs.eflags |= X86_EFLAGS_ZF;
+                }
             }
             break;
         }
@@ -8049,6 +8073,8 @@ x86_emulate(
         rc = ops->cmpxchg(
             dst.mem.seg, dst.mem.off, &dst.orig_val,
             &dst.val, dst.bytes, ctxt);
+        if ( rc == X86EMUL_CMPXCHG_FAILED )
+            rc = X86EMUL_RETRY;
     }
     else
     {
--- a/xen/arch/x86/x86_emulate/x86_emulate.h
+++ b/xen/arch/x86/x86_emulate/x86_emulate.h
@@ -153,6 +153,8 @@ struct x86_emul_fpu_aux {
 #define X86EMUL_EXCEPTION      2
 /* Retry the emulation for some reason. No state modified. */
 #define X86EMUL_RETRY          3
+ /* (cmpxchg accessor): CMPXCHG failed. */
+#define X86EMUL_CMPXCHG_FAILED 4
 /*
  * Operation fully done by one of the hooks:
  * - validate(): operation completed (except common insn retire logic)
@@ -160,7 +162,7 @@ struct x86_emul_fpu_aux {
  * - read_io() / write_io(): bypass GPR update (non-string insns only)
  * Undefined behavior when used anywhere else.
  */
-#define X86EMUL_DONE           4
+#define X86EMUL_DONE           5
 
 /* FPU sub-types which may be requested via ->get_fpu(). */
 enum x86_emulate_fpu_type {
@@ -250,6 +252,8 @@ struct x86_emulate_ops
     /*
      * cmpxchg: Emulate an atomic (LOCKed) CMPXCHG operation.
      * @p_old: [IN ] Pointer to value expected to be current at @addr.
+     *         [OUT] Pointer to value found at @addr (may always be
+     *               updated, meaningful for X86EMUL_CMPXCHG_FAILED only).
      * @p_new: [IN ] Pointer to value to write to @addr.
      * @bytes: [IN ] Operation size (up to 8 (x86/32) or 16 (x86/64) bytes).
      */
--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -89,7 +89,7 @@ struct shadow_paging_mode {
                                            void *src, u32 bytes,
                                            struct sh_emulate_ctxt *sh_ctxt);
     int           (*x86_emulate_cmpxchg )(struct vcpu *v, unsigned long va,
-                                          unsigned long old,
+                                          unsigned long *old,
                                           unsigned long new,
                                           unsigned int bytes,
                                           struct sh_emulate_ctxt *sh_ctxt);