From mboxrd@z Thu Jan  1 00:00:00 1970
From: Razvan Cojocaru <rcojocaru@bitdefender.com>
Subject: Re: [PATCH RFC V9 4/5] xen,
 libxc: Request page fault injection via libxc
Date: Wed, 10 Sep 2014 12:30:48 +0300
Message-ID: <54101A48.8030401@bitdefender.com>
References: <53FF38A6020000780002EB2B@mail.emea.novell.com>	<54002F43.4070802@bitdefender.com>	<5400638A020000780002EFD6@mail.emea.novell.com>	<540421E1.9020505@bitdefender.com>	<540453C8020000780002F59C@mail.emea.novell.com>	<54045E7C.50604@bitdefender.com>	<54047D1D020000780002F73A@mail.emea.novell.com>	<54058B4E.9060001@bitdefender.com>	<20140902132434.GA24202@deinos.phlegethon.org>	<CAFLBxZZgxHW-fVw4pro6Kv8K5DAaBj6t7ubxEEdD1sRzSsXvtA@mail.gmail.com>
	<20140909201435.GB82414@deinos.phlegethon.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20140909201435.GB82414@deinos.phlegethon.org>
To: Tim Deegan <tim@xen.org>, George Dunlap <dunlapg@umich.edu>
Cc: "Tian, Kevin" <kevin.tian@intel.com>, Ian Campbell <ian.campbell@citrix.com>, Stefano Stabellini <stefano.stabellini@eu.citrix.com>, Jun Nakajima <jun.nakajima@intel.com>, Andrew Cooper <andrew.cooper3@citrix.com>, "Dong,
	Eddie" <eddie.dong@intel.com>, Jan Beulich <JBeulich@suse.com>, xen-devel <xen-devel@lists.xenproject.org>, Ian Jackson <ian.jackson@eu.citrix.com>
List-Id: xen-devel@lists.xenproject.org

On 09/09/2014 11:14 PM, Tim Deegan wrote:
> At 17:57 +0100 on 09 Sep (1410281829), George Dunlap wrote:
>> On Tue, Sep 2, 2014 at 2:24 PM, Tim Deegan <tim@xen.org> wrote:
>>> Hi,
>>>
>>> At 12:18 +0300 on 02 Sep (1409656686), Razvan Cojocaru wrote:
>>>> What we need instead is to set the data per-domain and have whichever
>>>> VCPU qualifies inject the page fault - _but_only_if_ it's in user mode
>>>> and its CR3 points to something interesting.
>>>
>>> That's a strange and specific thing to ask the hypervisor to do for
>>> you.  Given that you can already trap CR3 changes as mem-events, can
>>> you trigger the fault injection in response to the context switch?
>>> I guess that would probably catch it in kernel mode. :(
>>
>> I was wondering, rather than special-casing inject_trap, would it make
>> sense for the memory controller to be able to get notifications when
>> certain more complex conditions happen (e.g., "some vcpu is in user
>> mode with this CR3")?  Then the controller could ask to be notified
>> when the event happens, and when it does, just call inject_fault.
> 
> Yes, that sounds like a better place to put this kind of test.  As
> part of the mem_event trigger framework it doesn't seem nearly so out
> of place (and it avoids many of the problems of clashes between
> different event injection paths).

Do you mean someplace here (hvm.c)?

int hvm_set_cr3(unsigned long value)
{
    struct vcpu *v = current;
    struct page_info *page;
    unsigned long old;

    if ( hvm_paging_enabled(v) && !paging_mode_hap(v->domain) &&
         (value != v->arch.hvm_vcpu.guest_cr[3]) )
    {
        /* Shadow-mode CR3 change. Check PDBR and update refcounts. */
        HVM_DBG_LOG(DBG_LEVEL_VMMU, "CR3 value = %lx", value);
        page = get_page_from_gfn(v->domain, value >> PAGE_SHIFT,
                                 NULL, P2M_ALLOC);
        if ( !page )
            goto bad_cr3;

        put_page(pagetable_get_page(v->arch.guest_table));
        v->arch.guest_table = pagetable_from_page(page);

        HVM_DBG_LOG(DBG_LEVEL_VMMU, "Update CR3 value = %lx", value);
    }

    old = v->arch.hvm_vcpu.guest_cr[3];
    v->arch.hvm_vcpu.guest_cr[3] = value;
    paging_update_cr3(v);
    hvm_memory_event_cr3(value, old);
    return X86EMUL_OKAY;

 bad_cr3:
    gdprintk(XENLOG_ERR, "Invalid CR3\n");
    domain_crash(v->domain);
    return X86EMUL_UNHANDLEABLE;
}

That is, alongside hvm_memory_event_cr3(value, old), have another function
that checks the new value against an array of watched CR3s and, if it
matches and v is in user mode, sends out an event?
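
Roughly, as a standalone sketch of that check (struct cr3_watch_list,
cr3_is_watched(), cr3_watch_should_notify() and MAX_WATCHED_CR3 are
hypothetical names, not existing Xen code, and the caller is assumed to
supply the vCPU's current privilege level):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-domain list of "interesting" CR3 values. */
#define MAX_WATCHED_CR3 16

struct cr3_watch_list {
    uint64_t cr3[MAX_WATCHED_CR3];
    size_t   count;              /* number of valid entries in cr3[] */
};

/* Linear search; adequate for a handful of watched address spaces. */
static bool cr3_is_watched(const struct cr3_watch_list *wl, uint64_t cr3)
{
    for ( size_t i = 0; i < wl->count; i++ )
        if ( wl->cr3[i] == cr3 )
            return true;
    return false;
}

/*
 * Would be called next to hvm_memory_event_cr3(value, old).
 * cpl: current privilege level of the vCPU (0 = kernel, 3 = user).
 * Returns true if an event should be sent to the controller.
 */
static bool cr3_watch_should_notify(const struct cr3_watch_list *wl,
                                    uint64_t new_cr3, unsigned int cpl)
{
    return cpl == 3 && cr3_is_watched(wl, new_cr3);
}
```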

As I've explained in my earlier reply to Tamas, the way we currently use
the injection request hypercall, the conditions for injection are
normally already met when we issue it, so the fault is injected
immediately.

Also, I'm not sure the "is it user mode?" check would work reliably at
the point where hvm_set_cr3() is called. Don't CR3 loads always happen
in kernel mode?
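
For what it's worth, architecturally a guest MOV to CR3 raises #GP unless
CPL is 0, so a write that reaches hvm_set_cr3() by that path was issued in
kernel mode. A standalone model of why a user-mode test placed there could
never fire (the helper names are hypothetical, and reading CPL from the low
two bits of the CS selector is an assumption that holds in the usual
protected/long mode case):

```c
#include <stdbool.h>
#include <stdint.h>

/* CPL is normally held in the RPL field (bits 1:0) of the CS selector. */
static unsigned int cpl_from_cs(uint16_t cs_selector)
{
    return cs_selector & 3;
}

/* MOV to CR3 faults (#GP) unless executed at CPL 0. */
static bool mov_to_cr3_allowed(uint16_t cs_selector)
{
    return cpl_from_cs(cs_selector) == 0;
}

/*
 * A "user mode?" test evaluated while handling a MOV to CR3: the write
 * only gets this far at CPL 0, so the test is always false.
 */
static bool user_mode_at_cr3_write(uint16_t cs_selector)
{
    return mov_to_cr3_allowed(cs_selector) && cpl_from_cs(cs_selector) == 3;
}
```

In other words, the user-mode part of the condition would have to be
evaluated at some later exit, not in the CR3-write path itself.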

>> That way, inject_fault isn't special-cased at all; and one could
>> imagine designing the "condition" such that any number of interesting
>> conditions could be trapped.
>>
>> Thoughts?
>>
>> But ultimately, as Tim said, you're basically just *hoping* that it
>> won't take too long to happen to be at the hypervisor when the proper
>> condition happens.  If the process in question isn't getting many
>> interrupts, or is spending the vast majority of its time in the
>> kernel, you may end up waiting an unbounded amount of time to be able
>> to "catch" it in user mode.  It seems like it would be better to find
>> a reliable way to trap on the return into user mode, in which case you
>> wouldn't need to have a special "wait for this complicated event to
>> happen" call at all, would you?
> 
> Yeah; I was thinking about page-protecting the process's stack as an
> approach to this.  Breakpointing the return address might work too but
> would probably cause more false alarms -- you'd at least need to walk
> up past the libc/win32 wrappers to avoid trapping every thread.
> 
> Ideally there'd be something vcpu-specific we could tinker with
> (e.g. arranging MSRs so that SYSRET will fault) once we see the right
> CR3 (assuming intercepting CR3 is cheap enough in this case).
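
For reference, one concrete way to make SYSRET fault is to clear EFER.SCE
(bit 0 of IA32_EFER, MSR 0xC0000080): with SCE clear, both SYSCALL and
SYSRET raise #UD. A minimal sketch of arming that per vCPU only while the
watched CR3 is loaded; struct vcpu_trap_state, effective_efer() and
on_cr3_write() are hypothetical, it assumes #UD is intercepted so the
fault actually exits to the hypervisor, and it does nothing for guests
that use SYSENTER/SYSEXIT instead:

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_EFER  0xC0000080u   /* IA32_EFER */
#define EFER_SCE  (1ull << 0)   /* SYSCALL/SYSRET enable */

/* Hypothetical per-vCPU state kept by the introspection logic. */
struct vcpu_trap_state {
    uint64_t guest_efer;   /* EFER value the guest believes it has */
    bool     armed;        /* true while the watched CR3 is loaded */
};

/*
 * EFER value actually loaded for the vCPU: SCE is cleared while armed,
 * so the next SYSRET (and SYSCALL) raises #UD, which the hypervisor
 * would need to intercept. Guest reads of EFER should see guest_efer.
 */
static uint64_t effective_efer(const struct vcpu_trap_state *s)
{
    return s->armed ? (s->guest_efer & ~EFER_SCE) : s->guest_efer;
}

/* Called on every intercepted CR3 write: arm only for the watched CR3. */
static void on_cr3_write(struct vcpu_trap_state *s, uint64_t new_cr3,
                         uint64_t watched_cr3)
{
    s->armed = (new_cr3 == watched_cr3);
}
```

Since SYSCALL faults as well while armed, every system call made from the
watched address space would also exit, which is part of the cost.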

All valid suggestions; however, they would seem to have a greater impact
on guest responsiveness, since there would be quite a lot of CR3 loads
and SYSRETs to intercept.


Thanks,
Razvan Cojocaru