From mboxrd@z Thu Jan 1 00:00:00 1970
From: Razvan Cojocaru
Subject: Re: [PATCH RFC] x86/emulate: implement hvmemul_cmpxchg() with an actual CMPXCHG
Date: Wed, 29 Mar 2017 18:04:47 +0300
Message-ID: <185bbccb-0156-07fe-d060-6135cae07caf@bitdefender.com>
References: <1490361899-18303-1-git-send-email-rcojocaru@bitdefender.com>
 <58DA510A0200007800148F6F@prv-mh.provo.novell.com>
 <925827a5-b346-1733-3c0a-64eaa7b3e251@bitdefender.com>
 <58DA5B7E020000780014900C@prv-mh.provo.novell.com>
 <58DBD8FF020000780014A113@prv-mh.provo.novell.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------98F4263CC98C85C8C22EE572"
Errors-To: xen-devel-bounces@lists.xen.org
Sender: "Xen-devel"
To: Jan Beulich
Cc: andrew.cooper3@citrix.com, paul.durrant@citrix.com, Tim Deegan,
 xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

This is a multi-part message in MIME format.
--------------98F4263CC98C85C8C22EE572
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

On 03/29/2017 05:00 PM, Razvan Cojocaru wrote:
> On 03/29/2017 04:55 PM, Jan Beulich wrote:
>>>>> On 28.03.17 at 12:50, wrote:
>>> On 03/28/2017 01:47 PM, Jan Beulich wrote:
>>>>>>> On 28.03.17 at 12:27, wrote:
>>>>> On 03/28/2017 01:03 PM, Jan Beulich wrote:
>>>>>>>>> On 28.03.17 at 11:14, wrote:
>>>>>>> I'm not sure that the RETRY model is what the guest OS expects. AFAIK, a
>>>>>>> failed CMPXCHG should happen just once, with the proper registers and ZF
>>>>>>> set. The guest surely expects neither that the instruction resume until
>>>>>>> it succeeds, nor that some hidden loop goes on for an indeterminate
>>>>>>> amount of time until a CMPXCHG succeeds.
>>>>>>
>>>>>> The guest doesn't observe the CMPXCHG failing - RETRY leads to
>>>>>> the instruction being restarted instead of completed.
>>>>>
>>>>> Indeed, but it works differently with hvm_emulate_one_vm_event() where
>>>>> RETRY currently would have the instruction be re-executed (properly
>>>>> re-executed, not just re-emulated) by the guest.
>>>>
>>>> Right - see my other reply to Andrew: The function likely would
>>>> need to tell apart guest CMPXCHG uses from us using the insn to
>>>> carry out the write by some other one. That may involve
>>>> adjustments to the memory write logic in x86_emulate() itself, as
>>>> the late failure of the comparison then would also need to be
>>>> communicated back (via ZF clear) to the guest.
>>>
>>> Exactly, it would require quite some reworking of x86_emulate().
>>
>> I had imagined it to be less intrusive (outside of x86_emulate()),
>> but I've now learned why Andrew was able to get rid of
>> X86EMUL_CMPXCHG_FAILED - the apparently intended behavior
>> was never implemented. Attached a first take at it, which has
>> seen smoke testing, but nothing more. The way it ends up being
>> I don't think this can reasonably be considered for 4.9 at this
>> point in time. (Also Cc-ing Tim for the shadow code changes,
>> even if this isn't really a proper patch submission.)
>
> Thanks! I'll give a spin with a modified version of my CMPXCHG patch as
> soon as possible.

With the attached patch, where hvmemul_cmpxchg() now returns
X86EMUL_CMPXCHG_FAILED if __cmpxchg() fails, my (32-bit) Windows 7 guest
gets stuck at the "Starting Windows" screen.
Its state appears to be:

# ./xenctx -a 3
cs:eip: 0008:8bcd85d6
flags: 00200246 cid i z p
ss:esp: 0010:82736b9c
eax: 00000000   ebx: 84f3a678   ecx: 84ee2610   edx: 001eb615
esi: 40008000   edi: 82739d20   ebp: 82736c20
 ds: 0023        es: 0023        fs: 0030        gs: 0000
cr0: 8001003b   cr2: 8fd94000   cr3: 00185000   cr4: 000406f9
dr0: 00000000   dr1: 00000000   dr2: 00000000   dr3: 00000000
dr6: fffe0ff0   dr7: 00000400
Code (instr addr 8bcd85d6)
47 fc 83 c7 14 4e 75 ef 5f 5e c3 cc cc cc cc cc cc 8b ff fb f4 cc cc cc
cc cc 8b ff 55 8b ec

# ./xenctx -a 3
cs:eip: 0008:8bcd85d6
flags: 00200246 cid i z p
ss:esp: 0010:82736b9c
eax: 00000000   ebx: 84f3a678   ecx: 84ee2610   edx: 002ca60d
esi: 40008000   edi: 82739d20   ebp: 82736c20
 ds: 0023        es: 0023        fs: 0030        gs: 0000
cr0: 8001003b   cr2: 8fd94000   cr3: 00185000   cr4: 000406f9
dr0: 00000000   dr1: 00000000   dr2: 00000000   dr3: 00000000
dr6: fffe0ff0   dr7: 00000400
Code (instr addr 8bcd85d6)
47 fc 83 c7 14 4e 75 ef 5f 5e c3 cc cc cc cc cc cc 8b ff fb f4 cc cc cc
cc cc 8b ff 55 8b ec

This only happens in SMP scenarios (my guest had 10 VCPUs for easy
reproduction). With a single VCPU, the guest booted fine. So something
is still not right when a CMPXCHG fails in a race-type situation (unless
something's obviously wrong with my patch, but I don't see it).
Thanks,
Razvan

--------------98F4263CC98C85C8C22EE572
Content-Type: text/x-diff; name="real_cmpxchg.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="real_cmpxchg.patch"

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 2d92957..b946ef7 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1029,6 +1030,77 @@ static int hvmemul_wbinvd_discard(
     return X86EMUL_OKAY;
 }
 
+static int hvmemul_vaddr_to_mfn(
+    unsigned long addr,
+    mfn_t *mfn,
+    uint32_t pfec,
+    struct x86_emulate_ctxt *ctxt)
+{
+    paddr_t gpa = addr & ~PAGE_MASK;
+    struct page_info *page;
+    p2m_type_t p2mt;
+    unsigned long gfn;
+    struct vcpu *curr = current;
+    struct hvm_emulate_ctxt *hvmemul_ctxt =
+        container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
+
+    gfn = paging_gva_to_gfn(curr, addr, &pfec);
+
+    if ( gfn == gfn_x(INVALID_GFN) )
+    {
+        pagefault_info_t pfinfo = {};
+
+        if ( ( pfec & PFEC_page_paged ) || ( pfec & PFEC_page_shared ) )
+            return X86EMUL_RETRY;
+
+        pfinfo.linear = addr;
+        pfinfo.ec = pfec;
+
+        x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
+        return X86EMUL_EXCEPTION;
+    }
+
+    gpa |= (paddr_t)gfn << PAGE_SHIFT;
+
+    /*
+     * No need to do the P2M lookup for internally handled MMIO, benefiting
+     * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC accesses,
+     * - newer Windows (like Server 2012) for HPET accesses.
+     */
+    if ( !nestedhvm_vcpu_in_guestmode(curr) && hvm_mmio_internal(gpa) )
+        return X86EMUL_UNHANDLEABLE;
+
+    page = get_page_from_gfn(curr->domain, gfn, &p2mt, P2M_UNSHARE);
+
+    if ( !page )
+        return X86EMUL_UNHANDLEABLE;
+
+    if ( p2m_is_paging(p2mt) )
+    {
+        put_page(page);
+        p2m_mem_paging_populate(curr->domain, gfn);
+        return X86EMUL_RETRY;
+    }
+
+    if ( p2m_is_shared(p2mt) )
+    {
+        put_page(page);
+        return X86EMUL_RETRY;
+    }
+
+    if ( p2m_is_grant(p2mt) )
+    {
+        put_page(page);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    *mfn = _mfn(page_to_mfn(page));
+
+    put_page(page);
+
+    return X86EMUL_OKAY;
+}
+
 static int hvmemul_cmpxchg(
     enum x86_segment seg,
     unsigned long offset,
@@ -1037,8 +1109,70 @@ static int hvmemul_cmpxchg(
     unsigned int bytes,
     struct x86_emulate_ctxt *ctxt)
 {
-    /* Fix this in case the guest is really relying on r-m-w atomicity. */
-    return hvmemul_write(seg, offset, p_new, bytes, ctxt);
+    unsigned long addr, reps = 1;
+    int rc = X86EMUL_OKAY;
+    unsigned long old = 0, new = 0;
+    uint32_t pfec = PFEC_page_present | PFEC_write_access;
+    struct hvm_emulate_ctxt *hvmemul_ctxt =
+        container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
+    mfn_t mfn[2];
+    void *map = NULL;
+    struct domain *currd = current->domain;
+
+    if ( is_x86_system_segment(seg) )
+        pfec |= PFEC_implicit;
+    else if ( hvmemul_ctxt->seg_reg[x86_seg_ss].attr.fields.dpl == 3 )
+        pfec |= PFEC_user_mode;
+
+    rc = hvmemul_virtual_to_linear(
+        seg, offset, bytes, &reps, hvm_access_write, hvmemul_ctxt, &addr);
+
+    if ( rc != X86EMUL_OKAY || !bytes )
+        return rc;
+
+    rc = hvmemul_vaddr_to_mfn(addr, &mfn[0], pfec, ctxt);
+
+    if ( rc != X86EMUL_OKAY )
+        return rc;
+
+    if ( likely(((addr + bytes - 1) & PAGE_MASK) == (addr & PAGE_MASK)) )
+    {
+        /* Whole write fits on a single page. */
+        mfn[1] = INVALID_MFN;
+        map = map_domain_page(mfn[0]);
+    }
+    else
+    {
+        rc = hvmemul_vaddr_to_mfn((addr + bytes - 1) & PAGE_MASK, &mfn[1],
+                                  pfec, ctxt);
+        if ( rc != X86EMUL_OKAY )
+            return rc;
+
+        map = vmap(mfn, 2);
+    }
+
+    if ( !map )
+        return X86EMUL_UNHANDLEABLE;
+
+    map += (addr & ~PAGE_MASK);
+
+    memcpy(&old, p_old, bytes);
+    memcpy(&new, p_new, bytes);
+
+    if ( __cmpxchg(map, old, new, bytes) != old )
+        rc = X86EMUL_CMPXCHG_FAILED;
+
+    paging_mark_dirty(currd, mfn[0]);
+
+    if ( unlikely(mfn_valid(mfn[1])) )
+    {
+        paging_mark_dirty(currd, mfn[1]);
+        vunmap((void *)((unsigned long)map & PAGE_MASK));
+    }
+    else
+        unmap_domain_page(map);
+
+    return rc;
 }
 
 static int hvmemul_validate(

--------------98F4263CC98C85C8C22EE572--