From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Aravindh Puthiyaparambil (aravindp)" Subject: Re: Issue policing writes from Xen to PV domain memory Date: Fri, 9 May 2014 17:48:34 +0000 Message-ID: <97A500D504438F4ABC02EBA81613CC63317ED7CE@xmb-aln-x02.cisco.com> References: <97A500D504438F4ABC02EBA81613CC63317E7BBC@xmb-aln-x02.cisco.com> <5360BDA7020000780000D9F4@g0-1-119.ukb-fw-asa.gns.novell.com> <97A500D504438F4ABC02EBA81613CC63317E95B0@xmb-aln-x02.cisco.com> <53637B80020000780000E5F1@mail.emea.novell.com> <97A500D504438F4ABC02EBA81613CC63317EA268@xmb-aln-x02.cisco.com> <53674A17020000780000ED37@mail.emea.novell.com> <97A500D504438F4ABC02EBA81613CC63317EA99C@xmb-aln-x02.cisco.com> <5368B299020000780000F2FD@mail.emea.novell.com> <97A500D504438F4ABC02EBA81613CC63317EBC1B@xmb-aln-x02.cisco.com> <5369F079020000780000FAA8@mail.emea.novell.com> <97A500D504438F4ABC02EBA81613CC63317ED4A5@xmb-aln-x02.cisco.com> <536C96B20200007800010BD7@mail.emea.novell.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mail6.bemta14.messagelabs.com ([193.109.254.103]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1WiouV-000133-Ed for xen-devel@lists.xenproject.org; Fri, 09 May 2014 17:48:39 +0000 In-Reply-To: <536C96B20200007800010BD7@mail.emea.novell.com> Content-Language: en-US List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Jan Beulich Cc: "xen-devel@lists.xenproject.org" , "TimDeegan(tim@xen.org)" List-Id: xen-devel@lists.xenproject.org >> The above sequence of events would not cause an EPT violation (I >> realize this can happen only in the guest context) or pagefault (The >> page is present and marked writable in the guest). All that would >> happen is an event to be sent to the listener from __hvm_copy() and no >cascading faults will occur. > >I have to admit that I'm unable to spot where such an event gets sent. I had preceded this by mentioning "Now take the similar scenario in the hypothetical HVM + EPT case where we are policing Xen writes to guest memory." The code would look something like this if it was implemented. Please note this was just a quick hacky write up :-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index da220bf..8ce99d5 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -2750,6 +2750,65 @@ static enum hvm_copy_result __hvm_copy( return HVMCOPY_unhandleable; } + if ( unlikely(mem_event_check_ring(&curr->domain->mem_event->access)) ) + { + struct p2m_domain *p2m = p2m_get_hostp2m(curr->domain); + mfn_t gmfn; + p2m_type_t _p2mt; + p2m_access_t p2ma; + bool_t violation = 0; + mem_event_request_t *req_ptr = NULL; + + gmfn = p2m->get_entry(p2m, gfn, &_p2mt, &p2ma, 0, NULL);; + + /* If the access is against the permissions, then send to mem_event */ + switch (p2ma) + { + case p2m_access_n: + case p2m_access_n2rwx: + case p2m_access_x: + violation = 1; + break; + + case p2m_access_w: + case p2m_access_wx: + if ( flags & HVMCOPY_from_guest ) + violation = 1; + break; + + case p2m_access_r: + case p2m_access_rx: + case p2m_access_rx2rw: + if ( flags & HVMCOPY_to_guest ) + violation = 1; + break; + + case p2m_access_rw: + case p2m_access_rwx: + break; + } + + if ( violation ) + { + paddr_t gpa = (gfn << PAGE_SHIFT) + + (flags & HVMCOPY_virt ? + (addr & ((1 << PAGE_SHIFT) - 1)) : 0); + + if ( !p2m_mem_access_check(gpa, (flags & HVMCOPY_virt ? 1 : 0), + addr, 0, 1, 0, &req_ptr) ) + { + if ( unlikely(req_ptr != NULL) ) + { + mem_access_send_req(curr->domain, req_ptr); + xfree(req_ptr); + } + put_page(page); + return HVMCOPY_unhandleable; + } + } + } + p = (char *)__map_domain_page(page) + (addr & ~PAGE_MASK); if ( flags & HVMCOPY_to_guest ) >What is quite clear is that you don't want multiple nested runstate area >updates to occur in the first place, i.e. you need to properly deal with the >_first_ page fault occurring instead of waiting until multiple such events pile >up. And I'm afraid doing so will require some (perhaps gross) hackery... OK, I will take a stab at this. >> Looking at what the solution for the ring being full in the PV case >> whether we are policing Xen writes or not, calling wait() will not >> work due to the scenario I had mentioned a while back and is shown above >in the stack trace. >> I am repeating that flow here >> mem_event_claim_slot() -> >> mem_event_wait_slot() -> >> wait_event(mem_event_wait_try_grab(med, &rc) != - >EBUSY) >> >> wait_event() macro looks like this: >> do { >> if ( mem_event_wait_try_grab(med, &rc) != -EBUSY ) >> break; >> for ( ; ; ) { >> prepare_to_wait(&med->wq); >> if ( mem_event_wait_try_grab(med, &rc) != -EBUSY ) >> break; >> wait(); >> } >> finish_wait(&med->wq); >> } while (0) >> >> In the case where the ring is full, wait() gets called and the cpu >> gets scheduled away. But since it is in middle of a pagefault, when it >> runs again it ends up in handle_exception_saved and the same pagefault is >tried again. >> But since finish_wait() never ends up being called wqv->esp never >> becomes 0 and hence the assert fires on the next go around. So I think >> we should be calling >> process_pending_softirqs() instead of wait() for PV domains. > >That would effectively be a spin wait then, which is surely not the right thing. >But I don't follow your explanation above anyway - when coming back from >wait(), the state is the same as the original one, so the page fault handling >continues, it's not being retried. Let me step through this. The pagefault for runstate occurs and wait_event() gets called and the ring is full. wait_event(): do { if ( mem_event_wait_try_grab(med, &rc) != -EBUSY ) ==> This will return EBUSY break; for ( ; ; ) { prepare_to_wait(&med->wq); if ( mem_event_wait_try_grab(med, &rc) != -EBUSY ) ==> This will return EBUSY break; wait(); ==> Write to runstate area occurs... } finish_wait(&med->wq); } while (0) ...now this write to runstate will again cause a pagefault and we will end up in wait_event() again. The previous attempt would have called prepare_to_wait() but finish_wait() was not called. finish_wait()->__finish_wait() is where wqv->esp gets reset to NULL. So now: wait_event(): do { if ( mem_event_wait_try_grab(med, &rc) != -EBUSY ) ==> This will return EBUSY break; for ( ; ; ) { prepare_to_wait(&med->wq); ==> Assertion 'wqv->esp == 0' fails if ( mem_event_wait_try_grab(med, &rc) != -EBUSY ) break; wait(); } finish_wait(&med->wq); } while (0) Does this make sense? Thanks, Aravindh