From mboxrd@z Thu Jan 1 00:00:00 1970 From: Razvan Cojocaru Subject: Re: [PATCH RFC V9 4/5] xen, libxc: Request page fault injection via libxc Date: Tue, 02 Sep 2014 12:18:06 +0300 Message-ID: <54058B4E.9060001@bitdefender.com> References: <1409226482-12657-1-git-send-email-rcojocaru@bitdefender.com> <1409226482-12657-4-git-send-email-rcojocaru@bitdefender.com> <53FF36A1020000780002EAED@mail.emea.novell.com> <53FF1BD8.5010401@bitdefender.com> <53FF38A6020000780002EB2B@mail.emea.novell.com> <54002F43.4070802@bitdefender.com> <5400638A020000780002EFD6@mail.emea.novell.com> <540421E1.9020505@bitdefender.com> <540453C8020000780002F59C@mail.emea.novell.com> <54045E7C.50604@bitdefender.com> <54047D1D020000780002F73A@mail.emea.novell.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mail6.bemta4.messagelabs.com ([85.158.143.247]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1XOkEB-0000t0-MS for xen-devel@lists.xenproject.org; Tue, 02 Sep 2014 09:18:15 +0000 In-Reply-To: <54047D1D020000780002F73A@mail.emea.novell.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Jan Beulich Cc: kevin.tian@intel.com, eddie.dong@intel.com, stefano.stabellini@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, jun.nakajima@intel.com, xen-devel@lists.xenproject.org, ian.jackson@eu.citrix.com, ian.campbell@citrix.com List-Id: xen-devel@lists.xenproject.org On 09/01/2014 03:05 PM, Jan Beulich wrote: >>>> On 01.09.14 at 13:54, wrote: >> On 09/01/2014 12:08 PM, Jan Beulich wrote: >>>>>> On 01.09.14 at 09:36, wrote: >>>> On 08/29/2014 12:27 PM, Jan Beulich wrote: >>>>>>>> On 29.08.14 at 09:44, wrote: >>>>>> I do understand the preference for a VCPU-based mechanism from a >>>>>> concurrency point of view, but that would simply potentially fail for >>>>>> us, hence defeating the purpose of the patch. I'm also not sure how that >>>>>> would be useful in the general case either, since the same problem that >>>>>> applies to us would seem to apply to the general case as well. >>>>> >>>>> Yeah, the whole thing probably needs a bit more thinking so that the >>>>> interface doesn't end up being a BitDefender-special. Indeed together >>>>> with the address space qualification, the interface might not be very >>>>> useful when made vCPU-bound. And taking it a little further into the >>>>> "generic" direction, allowing this to only inject #PF doesn't make a >>>>> very nice interface either. Plus we already have HVMOP_inject_trap, >>>>> i.e. your first line of thinking (and eventual explaining as the >>>>> motivation for a patch) should be why that can't be used. >>>> >>>> I'd say that it's memory-introspection specific rather than 3rd-party >>>> vendor specific. Without this this patch, memory-introspection support >>>> in general is impacted / less flexible, since there's no other way to >>>> bring swapped out pages back in. >>>> >>>> For all the reasons you've explained (at least as far as I understand >>>> it) there's not much room to go more generic - so maybe just renaming >>>> the libxc wrapper to something more specific ( >>>> xc_domain_request_usermode_pagefault?) is the solution here? >>> >>> Maybe, but only after you explained why the existing interface can >>> neither be used nor suitably extended. >> >> As far as I understand the HVMOP_inject_trap interface, it is simply (in >> this case) a way to trigger the equivalent of hvm_inject_page_fault() >> from userspace (via libxc). >> >> We need two additional checks: >> >> 1. That CR3 matches, because the way the application works, we need to >> inject a page fault related to the address space of whatever process is >> interesting at the time, and >> >> 2. That we're in usermode (the CPL check), because we know that it's >> safe to inject a page fault when we're in user mode, but it's not always >> safe to do so in kernel mode. >> >> The current interface seems to be a low-level, basic wrapper around >> hvm_inject_trap(). >> >> What we're trying to do is ask for a page fault when we're both A) in >> usermode, and B) when a target process matches - and are only interested >> in page faults, no other trap vector. >> >> Technically, the checks could, probably, be moved into (and new >> parameters added to) hvm_inject_trap() & friends, but it seems unlikely >> that users other than memory introspection applications would be >> interested in them, while they would possibly add to the complexity of >> the interface. The rest of the clients would have to carry dummy >> parameters around to use it. > > I'm not sure: Injecting faults with certain restrictions (like in your > case CPL and CR3) does seem to make quite some sense in > general when the fault isn't intended to be terminal (of a process > or the entire VM). It's just that so far we used fault injection only > to indicate error conditions. > > But as always - let's see if others are of differing opinion before > settling on either route. While waiting for other opinions, I've started looking at the HVMOP_inject_trap code, and it uses a per-VCPU data-saving model similar to the one we've been discussing (and agreed that does not work for the explained use-cases). The first giveaway is the libxc function: /* * Injects a hardware/software CPU trap, to take effect the next time the HVM * resumes. */ int xc_hvm_inject_trap( xc_interface *xch, domid_t dom, int vcpu, uint32_t vector, uint32_t type, uint32_t error_code, uint32_t insn_len, uint64_t cr2); which takes an "int vcpu" parameter. Then, this happens in xen/arch/x86/hvm/hvm.c: case HVMOP_inject_trap: { xen_hvm_inject_trap_t tr; struct domain *d; struct vcpu *v; if ( copy_from_guest(&tr, arg, 1 ) ) return -EFAULT; rc = rcu_lock_remote_domain_by_id(tr.domid, &d); if ( rc != 0 ) return rc; rc = -EINVAL; if ( !is_hvm_domain(d) ) goto param_fail8; rc = xsm_hvm_control(XSM_DM_PRIV, d, op); if ( rc ) goto param_fail8; rc = -ENOENT; if ( tr.vcpuid >= d->max_vcpus || (v = d->vcpu[tr.vcpuid]) == NULL ) goto param_fail8; if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ) rc = -EBUSY; else { v->arch.hvm_vcpu.inject_trap.vector = tr.vector; v->arch.hvm_vcpu.inject_trap.type = tr.type; v->arch.hvm_vcpu.inject_trap.error_code = tr.error_code; v->arch.hvm_vcpu.inject_trap.insn_len = tr.insn_len; v->arch.hvm_vcpu.inject_trap.cr2 = tr.cr2; rc = 0; } param_fail8: rcu_unlock_domain(d); break; } The "v->arch.hvm_vcpu.inject_trap. = initializer;" code again points to a per-VCPU implementation. And finally, hvm_do_resume() is also VCPU-dependent: void hvm_do_resume(struct vcpu *v) { struct domain *d = v->domain; struct hvm_ioreq_server *s; check_wakeup_from_wait(); if ( is_hvm_vcpu(v) ) pt_restore_timer(v); list_for_each_entry ( s, &d->arch.hvm_domain.ioreq_server.list, list_entry ) { struct hvm_ioreq_vcpu *sv; list_for_each_entry ( sv, &s->ioreq_vcpu_list, list_entry ) { if ( sv->vcpu == v ) { if ( !hvm_wait_for_io(sv, get_ioreq(s, v)) ) return; break; } } } /* Inject pending hw/sw trap */ if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ) { hvm_inject_trap(&v->arch.hvm_vcpu.inject_trap); v->arch.hvm_vcpu.inject_trap.vector = -1; } } While we need to set the data per-domain and have whatever VCPU inject the page fault - _but_only_if_ it's in usermode and its CR3 points to something interesting. Hope I've been following the code correctly. Thanks, Razvan Cojocaru