From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Lengyel, Tamas"
Subject: Re: [PATCH 2/2] altp2m: Implement p2m_get_mem_access for altp2m views
Date: Thu, 28 Jan 2016 10:17:22 -0700
Message-ID:
References: <1453925201-15926-1-git-send-email-tlengyel@novetta.com>
 <1453925201-15926-2-git-send-email-tlengyel@novetta.com>
 <56AA27D402000078000CC080@prv-mh.provo.novell.com>
 <56AA2DC0.9060806@bitdefender.com>
 <56AA31C6.4030202@bitdefender.com>
 <56AA4284.3080700@bitdefender.com>
 <56AA4A27.3020202@bitdefender.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1903587504215883977=="
Return-path:
Received: from mail6.bemta3.messagelabs.com ([195.245.230.39])
 by lists.xen.org with esmtp (Exim 4.72) (envelope-from )
 id 1aOqCX-0000Bg-5g for xen-devel@lists.xenproject.org;
 Thu, 28 Jan 2016 17:17:45 +0000
Received: by mail-vk0-f47.google.com with SMTP id e185so27526661vkb.1 for ;
 Thu, 28 Jan 2016 09:17:42 -0800 (PST)
In-Reply-To: <56AA4A27.3020202@bitdefender.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Razvan Cojocaru
Cc: Wei Liu , Ian Campbell , Stefano Stabellini , George Dunlap ,
 Andrew Cooper , Ian Jackson , Stefano Stabellini , Jan Beulich ,
 xen-devel@lists.xenproject.org, Keir Fraser
List-Id: xen-devel@lists.xenproject.org

--===============1903587504215883977==
Content-Type: multipart/alternative; boundary=001a11440176a096de052a6818e5

--001a11440176a096de052a6818e5
Content-Type: text/plain; charset=UTF-8

On Thu, Jan 28, 2016 at 10:04 AM, Razvan Cojocaru wrote:

> On 01/28/2016 06:40 PM, Lengyel, Tamas wrote:
> > On Thu, Jan 28, 2016 at 9:32 AM, Razvan Cojocaru wrote:
> > > On 01/28/2016 05:58 PM, Lengyel, Tamas wrote:
> > > > On Thu, Jan 28, 2016 at 8:20 AM, Razvan Cojocaru wrote:
> > > > > On 01/28/2016 05:12 PM, Lengyel, Tamas wrote:
> > > > > > On Jan 28, 2016 8:02 AM, "Razvan Cojocaru" <
> > > > > > rcojocaru@bitdefender.com> wrote:
> > > > > > > On 01/28/2016 04:42 PM, Lengyel, Tamas wrote:
> > > > > > > > On Jan 28, 2016 6:38 AM, "Jan Beulich" wrote:
> > > > > > > > > >>> On 27.01.16 at 21:06, <tlengyel@novetta.com> wrote:
> > > > > > > > > > --- a/xen/arch/x86/mm/p2m.c
> > > > > > > > > > +++ b/xen/arch/x86/mm/p2m.c
> > > > > > > > > > @@ -1572,7 +1572,9 @@ void p2m_mem_access_emulate_check(struct vcpu *v,
> > > > > > > > > >          bool_t violation = 1;
> > > > > > > > > >          const struct vm_event_mem_access *data = &rsp->u.mem_access;
> > > > > > > > > >
> > > > > > > > > > -        if ( p2m_get_mem_access(v->domain, _gfn(data->gfn), &access) == 0 )
> > > > > > > > > > +        if ( p2m_get_mem_access(v->domain, _gfn(data->gfn),
> > > > > > > > > > +                                altp2m_active(v->domain) ? vcpu_altp2m(v).p2midx : 0,
> > > > > > > > > > +                                &access) == 0 )
> > > > > > > > >
> > > > > > > > > This looks to be a behavioral change beyond what title and
> > > > > > > > > description say, and it's not clear whether that's actually
> > > > > > > > > the behavior everyone wants.
> > > > > > > >
> > > > > > > > I'm fairly confident it's exactly the expected behavior when
> > > > > > > > one uses mem_access in altp2m tables and emulation. Right now,
> > > > > > > > because of the lack of this, AFAIK emulation would not work
> > > > > > > > correctly with altp2m. But Razvan probably can chime in as he
> > > > > > > > uses this path actively.
> > > > > > >
> > > > > > > I've done an experiment to see how much slower using altp2m
> > > > > > > would be as compared to emulation - so I'm not a big user of
> > > > > > > the feature, but I did find it cumbersome to have to work with
> > > > > > > two sets of APIs (one for what could arguably be called the
> > > > > > > default altp2m view, i.e. the regular xc_set_mem_access(), and
> > > > > > > one for altp2m, i.e. xc_altp2m_set_mem_access()).
> > > > > > > Furthermore, the APIs do not currently offer the same features
> > > > > > > (most notably, xc_altp2m_get_mem_access() is completely
> > > > > > > missing). I've mentioned this to Tamas while initially trying
> > > > > > > to get it to work.
> > > > > > >
> > > > > > > Now, whether the behaviour I expect is what everyone expects
> > > > > > > is, of course, wide open to debate. But I think we can all
> > > > > > > agree that the altp2m interface can, and probably should, be
> > > > > > > improved.
> > > > > >
> > > > > > There is that, but also, what is the exact logic behind doing
> > > > > > this check before emulation? AFAIU emulation happens in response
> > > > > > to a vm_event so we should be fairly certain that this check
> > > > > > succeeds as it just verifies that indeed the permissions are
> > > > > > restricted by mem_access in the p2m (and with altp2m this should
> > > > > > be the active one). But when is this check normally expected to
> > > > > > fail?
> > > > >
> > > > > That check is important, please do not remove it. A vm_event is
> > > > > sent into userspace to our monitoring application, but the
> > > > > monitoring application can actually remove the page restrictions
> > > > > before replying, so in that case emulation is pointless - there
> > > > > will be no more page faults for that instruction.
> > > >
> > > > I see, but then why would you reply with VM_EVENT_FLAG_EMULATE? You
> > > > know you removed the permission before sending the reply, so this
> > > > sounds like something specific to your application.
> > >
> > > It's cheap insurance that things go right. If there's some issue with
> > > page rights, or some external tool somehow does an
> > > xc_set_mem_access(), things won't go wrong.
> >
> > I can see this working for your application if you don't cache the
> > mem_access permissions locally and you don't want to query for it
> > before deciding to send the emulate flag in the response or not.
> > Although, I think that would be the best way to go here.
>
> Querying is out of the question, for obvious performance reasons. That's
> why we've cached the registers in the vm_event request - we could have
> not done that and instead just query them via libxc. But one small
> decision like that and the monitored guest is running twice as slow.
> This way, you can just set the emulate flag and have the hypervisor do
> the right thing anyway, with no extra userspace <-> hypervisor roundtrips.
>
> Caching might work, but then again that's extra work, memory used in the
> application (in _each_ application, not just ours). So on one hand, we
> have the current scenario where things can't go wrong and the solution
> is in one place, vs. the other scenario, where each application needs to
> solve the problem by doing tracking / caching / querying that the HV
> does anyway in p2m, and pay with a possible guest crash or freeze for
> failure.
>
> > > And they will go wrong if Xen thinks it should emulate the next
> > > instruction and the next instruction is not the one that has caused
> > > the original fault.
> >
> > How could that happen? When the vCPU is resumed after the fault, isn't
> > the same instruction guaranteed to be retried?
>
> The instruction is the same, but if the page restrictions have been
> lifted (somehow) and the EMULATE flag is still set, the original
> instruction will run normally (because it won't trigger another page
> fault). But the HV will still think that it needs to emulate the next
> page fault, and so it will emulate whatever instruction causes the next
> page fault (if it matches the emulate conditions).
>
> > > I would think that benefits any application.
> >
> > It's just a bit of an obscure exception. From an API perspective I would
> > rather have Xen do what I tell it to do - in this case emulate - rather
> > than it doing something else silently behind the scenes that you really
> > only find out about if you read the code.
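
The stale-flag scenario described above, and the view selection the patch introduces, can be sketched in a few lines of standalone C. This is only an illustrative model under stated assumptions, not Xen code: the toy_* types, the access enum, active_view() and emulate_check() are all made up for the example, view 0 stands in for the host p2m, and only write violations are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT Xen's types. */
enum access { ACCESS_RWX, ACCESS_RX, ACCESS_R, ACCESS_N };

#define NR_VIEWS 4   /* view 0 plays the role of the host p2m */
#define NR_GFNS  8

struct toy_domain {
    bool altp2m_active;
    enum access perms[NR_VIEWS][NR_GFNS];
};

struct toy_vcpu {
    struct toy_domain *d;
    unsigned int p2midx;    /* altp2m view the vCPU is currently running on */
    bool emulate_pending;   /* armed when the monitor replied with "emulate" */
};

/* Pick the view whose permissions actually produced the fault. */
unsigned int active_view(const struct toy_vcpu *v)
{
    return v->d->altp2m_active ? v->p2midx : 0;
}

/* True if permission 'a' would still fault on a write access. */
bool blocks_write(enum access a)
{
    return a != ACCESS_RWX;
}

/*
 * Model of the pre-emulation check: arm emulation only if the page is
 * still restricted in the active view. Otherwise the request is dropped,
 * so a stale flag cannot be applied to a later, unrelated fault.
 */
bool emulate_check(struct toy_vcpu *v, size_t gfn)
{
    unsigned int idx = active_view(v);

    assert(idx < NR_VIEWS && gfn < NR_GFNS);
    v->emulate_pending = blocks_write(v->d->perms[idx][gfn]);
    return v->emulate_pending;
}
```

In this model, a restriction that lives only in an altp2m view is invisible when the lookup goes through view 0, so the check would decline to emulate even though the access keeps faulting in the active view.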
>
> But the way the emulation code works now, it _can't_ emulate (see above
> explanation). Emulation currently only happens as a result of a page
> fault, and there will be no page fault if the page restrictions are
> lifted. I am thinking about a better way to achieve this, but until then
> I think it's a good idea to keep the check in.
>
> I hope I've been able to shed more light on this.

Sure, makes sense. Since AFAIK you guys are the only ones really using this
path, I'm cool with keeping it as it is. I was really just wondering about
the logic behind it. Without a reference implementation using this path it's
not exactly trivial to figure out why things are the way they are.

Jan,
with the explanation above by Razvan, when using emulation with altp2m the
correct check here is to see whether the altp2m permissions are still
restricted; otherwise there is no need to emulate. So this patch actually
makes the two systems work together correctly. Without this patch, only the
hostp2m permissions are checked, and those may not carry the restrictions
that actually caused the fault. In that case emulation is skipped, the
instruction faults again, and the VM hangs in an endless fault loop.

Tamas

--001a11440176a096de052a6818e5
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable


--001a11440176a096de052a6818e5-- --===============1903587504215883977== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel --===============1903587504215883977==--