* hap_invlpg() vs INVLPGA
@ 2016-01-29 13:24 Jan Beulich
  2016-01-29 13:57 ` Egger, Christoph
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2016-01-29 13:24 UTC (permalink / raw)
  To: Christoph Egger; +Cc: xen-devel

Christoph,

in commit dd6de3ab99 ("Implement Nested-on-Nested") you added
code to hap_invlpg() supposedly emulating INVLPGA. I've been
stumbling across this a number of times in the past, not being able
to make the connection between (a) VMX/EPT and INVLPGA and
(b) SVM's INVLPGA intercept and this function.

I'm asking in the context of a reported crash resulting from the
nv_p2m field being NULL during emulation of an INVLPG instruction
in a guest with nesting enabled but - afaict - not actually used. Of
course I could submit a patch adding a NULL check here, but I'd
like to understand what this code is for, and hence whether the
better fix wouldn't be to get rid of it.
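
For illustration only, the NULL check mentioned above could take roughly
the following shape. This is a sketch under stated assumptions: every
name ending in _sketch is a hypothetical stand-in, not one of Xen's
actual types or functions, and the real hap_invlpg() may be structured
differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a nested p2m; it only counts flushes. */
struct p2m_sketch {
    unsigned int flushes;
};

/* Hypothetical stand-in for the per-vCPU nested state. */
struct vcpu_sketch {
    bool nested_enabled;          /* nesting enabled for the domain */
    struct p2m_sketch *nv_p2m;    /* may be NULL while nesting is unused */
};

/* Hypothetical flush helper: records that a flush took place. */
void p2m_flush_sketch(struct p2m_sketch *p2m)
{
    p2m->flushes++;
}

/* Returns true if a nested p2m was flushed, false if there was nothing to do. */
bool invlpg_sketch(struct vcpu_sketch *v)
{
    /* Guard: nesting enabled but never used leaves nv_p2m NULL. */
    if ( !v->nested_enabled || v->nv_p2m == NULL )
        return false;

    p2m_flush_sketch(v->nv_p2m);
    return true;
}
```

With such a guard, a vCPU that has nesting enabled but no nested p2m
would simply skip the flush instead of dereferencing a NULL pointer.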

Jan

* Re: hap_invlpg() vs INVLPGA
  2016-01-29 13:24 hap_invlpg() vs INVLPGA Jan Beulich
@ 2016-01-29 13:57 ` Egger, Christoph
  2016-01-29 14:02   ` Egger, Christoph
  0 siblings, 1 reply; 10+ messages in thread
From: Egger, Christoph @ 2016-01-29 13:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 29/01/16 14:24, Jan Beulich wrote:
> Christoph,
> 
> in commit dd6de3ab99 ("Implement Nested-on-Nested") you added
> code to hap_invlpg() supposedly emulating INVLPGA. I've been
> stumbling across this a number of times in the past, not being able
> to make the connection between (a) VMX/EPT and INVLPGA and
> (b) SVM's INVLPGA intercept and this function.

When you boot Windows 7 as L1 guest and XP-Mode as L2 guest then
L2 guest uses INVLPG instruction to invalidate a page and L1 guest
handles this via using INVLPGA instruction.

The INVLPG intercept flushes the nested hap p2m which is effectively
a TLB flush to the L1 guest. Then this intercept is injected into
L1 guest.

The INVLPGA instruction enforces a new ASID.

If the nested hap p2m is NULL then p2m_flush() should effectively
be a noop but it may not crash the guest.

What I don't remember is if Windows 7 must be 32bit or 64bit
to reproduce this.

Christoph

> I'm asking in the context of a reported crash resulting from the
> nv_p2m field being NULL during emulation of an INVLPG instruction
> in a guest with nesting enabled but - afaict - not actually used. Of
> course I could submit a patch adding a NULL check here, but I'd
> like to understand what this code is for, and hence whether the
> better fix wouldn't be to get rid of it.
> 
> Jan

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Managing directors: Dr. Ralf Herbrich, Christian Schlaeger
VAT ID: DE289237879
Registered at the Charlottenburg district court (Amtsgericht Charlottenburg), HRB 149173 B

* Re: hap_invlpg() vs INVLPGA
  2016-01-29 13:57 ` Egger, Christoph
@ 2016-01-29 14:02   ` Egger, Christoph
  2016-01-29 15:53     ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Egger, Christoph @ 2016-01-29 14:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 29/01/16 14:57, Egger, Christoph wrote:
> On 29/01/16 14:24, Jan Beulich wrote:
>> Christoph,
>>
>> in commit dd6de3ab99 ("Implement Nested-on-Nested") you added
>> code to hap_invlpg() supposedly emulating INVLPGA. I've been
>> stumbling across this a number of times in the past, not being able
>> to make the connection between (a) VMX/EPT and INVLPGA and
>> (b) SVM's INVLPGA intercept and this function.
> 
> When you boot Windows 7 as L1 guest and XP-Mode as L2 guest then
> L2 guest uses INVLPG instruction to invalidate a page and L1 guest
> handles this via using INVLPGA instruction.
> 
> The INVLPG intercept flushes the nested hap p2m which is effectively
> a TLB flush to the L1 guest.

... actually to the L2 guest. Sorry for the typo.

> Then this intercept is injected into L1 guest.
> 
> The INVLPGA instruction enforces a new ASID.
> 
> If the nested hap p2m is NULL then p2m_flush() should effectively
> be a noop but it may not crash the guest.
> 
> What I don't remember is if Windows 7 must be 32bit or 64bit
> to reproduce this.
> 
> Christoph
> 
>> I'm asking in the context of a reported crash resulting from the
>> nv_p2m field being NULL during emulation of an INVLPG instruction
>> in a guest with nesting enabled but - afaict - not actually used. Of
>> course I could submit a patch adding a NULL check here, but I'd
>> like to understand what this code is for, and hence whether the
>> better fix wouldn't be to get rid of it.
>>
>> Jan
> 

* Re: hap_invlpg() vs INVLPGA
  2016-01-29 14:02   ` Egger, Christoph
@ 2016-01-29 15:53     ` Jan Beulich
  2016-01-29 17:09       ` Egger, Christoph
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2016-01-29 15:53 UTC (permalink / raw)
  To: Christoph Egger; +Cc: xen-devel

>>> On 29.01.16 at 15:02, <chegger@amazon.de> wrote:
> On 29/01/16 14:57, Egger, Christoph wrote:
>> On 29/01/16 14:24, Jan Beulich wrote:
>>> Christoph,
>>>
>>> in commit dd6de3ab99 ("Implement Nested-on-Nested") you added
>>> code to hap_invlpg() supposedly emulating INVLPGA. I've been
>>> stumbling across this a number of times in the past, not being able
>>> to make the connection between (a) VMX/EPT and INVLPGA and
>>> (b) SVM's INVLPGA intercept and this function.
>> 
>> When you boot Windows 7 as L1 guest and XP-Mode as L2 guest then
>> L2 guest uses INVLPG instruction to invalidate a page and L1 guest
>> handles this via using INVLPGA instruction.
>> 
>> The INVLPG intercept flushes the nested hap p2m which is effectively
>> a TLB flush to the L1 guest.
> 
> ... actually to the L2 guest. Sorry for the typo.

So if the L1 guest does an INVLPGA, we should see an INVLPGA
intercept, not an INVLPG one.

>> Then this intercept is injected into L1 guest.

This, otoh, reads as if you imply we intercept the L2's INVLPG.
Yet the INVLPG intercept gets cleared when the domain uses
NPT (and your original change also didn't alter any intercept
settings). Hence I'm still lost how hap_invlpg() can be reached
in that case other than via emulating INVLPG in the instruction
emulator.

>> The INVLPGA instruction enforces a new ASID.
>> 
>> If the nested hap p2m is NULL then p2m_flush() should effectively
>> be a noop but it may not crash the guest.

s/may not/should not/ ?

Jan

* Re: hap_invlpg() vs INVLPGA
  2016-01-29 15:53     ` Jan Beulich
@ 2016-01-29 17:09       ` Egger, Christoph
  2016-02-01  8:04         ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Egger, Christoph @ 2016-01-29 17:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 29/01/16 16:53, Jan Beulich wrote:
>>>> On 29.01.16 at 15:02, <chegger@amazon.de> wrote:
>> On 29/01/16 14:57, Egger, Christoph wrote:
>>> On 29/01/16 14:24, Jan Beulich wrote:
>>>> Christoph,
>>>>
>>>> in commit dd6de3ab99 ("Implement Nested-on-Nested") you added
>>>> code to hap_invlpg() supposedly emulating INVLPGA. I've been
>>>> stumbling across this a number of times in the past, not being able
>>>> to make the connection between (a) VMX/EPT and INVLPGA and
>>>> (b) SVM's INVLPGA intercept and this function.
>>>
>>> When you boot Windows 7 as L1 guest and XP-Mode as L2 guest then
>>> L2 guest uses INVLPG instruction to invalidate a page and L1 guest
>>> handles this via using INVLPGA instruction.
>>>
>>> The INVLPG intercept flushes the nested hap p2m which is effectively
>>> a TLB flush to the L1 guest.
>>
>> ... actually to the L2 guest. Sorry for the typo.
> 
> So if the L1 guest does an INVLPGA, we should see an INVLPGA
> intercept, not an INVLPG one.

INVLPG intercept comes first from L2 then INVLPGA from L1.

>>> Then this intercept is injected into L1 guest.
> 
> This, otoh, reads as if you imply we intercept the L2's INVLPG.
> Yet the INVLPG intercept gets cleared when the domain uses
> NPT (and your original change also didn't alter any intercept
> settings). Hence I'm still lost how hap_invlpg() can be reached
> in that case other than via emulating INVLPG in the instruction
> emulator.

svm_invlpg_intercept() and vmx_invlpg_intercept() call
paging_invlpg().  paging_invlpg() calls hap_invlpg()
as initialized in xen/arch/x86/mm/hap/hap.c
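
The call chain described above can be sketched as a dispatch through a
per-mode hook. This is a simplified model, not the actual Xen code: all
names ending in _sketch are hypothetical, and the real functions take
different arguments.

```c
#include <stdbool.h>

/* Hypothetical per-mode hook table, loosely modelled on a paging mode. */
struct paging_mode_sketch {
    bool (*invlpg)(unsigned long linear);
};

/* Counts how often the HAP hook was reached. */
unsigned int hap_calls_sketch;

/* Stand-in for the HAP-specific handler. */
bool hap_invlpg_sketch(unsigned long linear)
{
    (void)linear;
    hap_calls_sketch++;
    return true;
}

/* The hook is wired up once, where the HAP mode is defined. */
const struct paging_mode_sketch hap_mode_sketch = {
    .invlpg = hap_invlpg_sketch,
};

/* Generic entry point: dispatches to whichever mode is active. */
bool paging_invlpg_sketch(const struct paging_mode_sketch *mode,
                          unsigned long linear)
{
    return mode->invlpg(linear);
}

/* Stand-in for the vendor-specific intercept handlers. */
void invlpg_intercept_sketch(const struct paging_mode_sketch *mode,
                             unsigned long linear)
{
    paging_invlpg_sketch(mode, linear);
}
```

In this model the intercept handler never names the HAP function
directly; it reaches it only because the active mode's hook points there.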

>>> The INVLPGA instruction enforces a new ASID.
>>>
>>> If the nested hap p2m is NULL then p2m_flush() should effectively
>>> be a noop but it may not crash the guest.
> 
> s/may not/should not/ ?

Yes.

Christoph

* Re: hap_invlpg() vs INVLPGA
  2016-01-29 17:09       ` Egger, Christoph
@ 2016-02-01  8:04         ` Jan Beulich
  2016-02-01  8:14           ` Egger, Christoph
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2016-02-01  8:04 UTC (permalink / raw)
  To: Christoph Egger; +Cc: xen-devel

>>> On 29.01.16 at 18:09, <chegger@amazon.de> wrote:
> On 29/01/16 16:53, Jan Beulich wrote:
>>>>> On 29.01.16 at 15:02, <chegger@amazon.de> wrote:
>>> On 29/01/16 14:57, Egger, Christoph wrote:
>>>> On 29/01/16 14:24, Jan Beulich wrote:
>>>>> Christoph,
>>>>>
>>>>> in commit dd6de3ab99 ("Implement Nested-on-Nested") you added
>>>>> code to hap_invlpg() supposedly emulating INVLPGA. I've been
>>>>> stumbling across this a number of times in the past, not being able
>>>>> to make the connection between (a) VMX/EPT and INVLPGA and
>>>>> (b) SVM's INVLPGA intercept and this function.
>>>>
>>>> When you boot Windows 7 as L1 guest and XP-Mode as L2 guest then
>>>> L2 guest uses INVLPG instruction to invalidate a page and L1 guest
>>>> handles this via using INVLPGA instruction.
>>>>
>>>> The INVLPG intercept flushes the nested hap p2m which is effectively
>>>> a TLB flush to the L1 guest.
>>>
>>> ... actually to the L2 guest. Sorry for the typo.
>> 
>> So if the L1 guest does an INVLPGA, we should see an INVLPGA
>> intercept, not an INVLPG one.
> 
> INVLPG intercept comes first from L2 then INVLPGA from L1.

I.e. Xen's action should be in response to the intercepted INVLPGA,
which afaict wouldn't lead to hap_invlpg().

>>>> Then this intercept is injected into L1 guest.
>> 
>> This, otoh, reads as if you imply we intercept the L2's INVLPG.
>> Yet the INVLPG intercept gets cleared when the domain uses
>> NPT (and your original change also didn't alter any intercept
>> settings). Hence I'm still lost how hap_invlpg() can be reached
>> in that case other than via emulating INVLPG in the instruction
>> emulator.
> 
> svm_invlpg_intercept() and vmx_invlpg_intercept() call
> paging_invlpg().  paging_invlpg() calls hap_invlpg()
> as initialized in xen/arch/x86/mm/hap/hap.c

That's all fine, but according to my previous reply: How does
execution reach svm_invlpg_intercept() when the INVLPG
intercept gets disabled for domains using HAP (NPT)?

Jan

* Re: hap_invlpg() vs INVLPGA
  2016-02-01  8:04         ` Jan Beulich
@ 2016-02-01  8:14           ` Egger, Christoph
  2016-02-01  9:00             ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Egger, Christoph @ 2016-02-01  8:14 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 01/02/16 09:04, Jan Beulich wrote:
>>> This, otoh, reads as if you imply we intercept the L2's INVLPG.
>>> Yet the INVLPG intercept gets cleared when the domain uses
>>> NPT (and your original change also didn't alter any intercept
>>> settings). Hence I'm still lost how hap_invlpg() can be reached
>>> in that case other than via emulating INVLPG in the instruction
>>> emulator.
>>
>> svm_invlpg_intercept() and vmx_invlpg_intercept() call
>> paging_invlpg().  paging_invlpg() calls hap_invlpg()
>> as initialized in xen/arch/x86/mm/hap/hap.c
> 
> That's all fine, but according to my previous reply: How does
> execution reach svm_invlpg_intercept() when the INVLPG
> intercept gets disabled for domains using HAP (NPT)?

The intercept bitmask for L1 guest and L2 guest gets binary or'ed
when emulating the VMENTRY for the L1 guest.
That way you get also intercepts for the L1 hypervisor.
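
A minimal sketch of the OR'ing described above, assuming a single 32-bit
intercept word. The function name and the two bit constants are
illustrative assumptions, not Xen's definitions.

```c
#include <stdint.h>

/* Illustrative bit positions within one 32-bit intercept word. */
#define INTERCEPT_INVLPG_SKETCH   (UINT32_C(1) << 25)
#define INTERCEPT_INVLPGA_SKETCH  (UINT32_C(1) << 26)

/*
 * for_l1: intercepts the L0 hypervisor uses while running the L1 guest.
 * for_l2: intercepts the L1 hypervisor requested for its L2 guest.
 * The result is the mask that is active while the L2 guest runs.
 */
uint32_t merge_intercepts_sketch(uint32_t for_l1, uint32_t for_l2)
{
    return for_l1 | for_l2;
}
```

For example, if the mask used for L1 has the INVLPG bit clear but L1
requests it for L2, the merged mask has the bit set.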

Christoph

* Re: hap_invlpg() vs INVLPGA
  2016-02-01  8:14           ` Egger, Christoph
@ 2016-02-01  9:00             ` Jan Beulich
  2016-02-01  9:41               ` Egger, Christoph
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2016-02-01  9:00 UTC (permalink / raw)
  To: Christoph Egger; +Cc: xen-devel

>>> On 01.02.16 at 09:14, <chegger@amazon.de> wrote:
> On 01/02/16 09:04, Jan Beulich wrote:
>>>> This, otoh, reads as if you imply we intercept the L2's INVLPG.
>>>> Yet the INVLPG intercept gets cleared when the domain uses
>>>> NPT (and your original change also didn't alter any intercept
>>>> settings). Hence I'm still lost how hap_invlpg() can be reached
>>>> in that case other than via emulating INVLPG in the instruction
>>>> emulator.
>>>
>>> svm_invlpg_intercept() and vmx_invlpg_intercept() call
>>> paging_invlpg().  paging_invlpg() calls hap_invlpg()
>>> as initialized in xen/arch/x86/mm/hap/hap.c
>> 
>> That's all fine, but according to my previous reply: How does
>> execution reach svm_invlpg_intercept() when the INVLPG
>> intercept gets disabled for domains using HAP (NPT)?
> 
> The intercept bitmask for L1 guest and L2 guest gets binary or'ed
> when emulating the VMENTRY for the L1 guest.
> That way you get also intercepts for the L1 hypervisor.

Okay, I can see this perhaps being correct (albeit unexpected)
for general1-intercepts (because all 32 bits are defined), but
clearly this is broken for e.g. general2-intercepts (where the
guest could set flags the hypervisor doesn't know about),
leading to the BUG() in nsvm_vmcb_guest_intercepts_exitcode().
Hence I didn't expect such behavior to be there in the first place.

And then this still doesn't make svm_invlpg_intercept() reachable:
While the L2 guest runs, the INVLPG intercept would be reflected
to the L1 guest. Whereas while the L1 guest runs, the intercept
would be off.

Jan

* Re: hap_invlpg() vs INVLPGA
  2016-02-01  9:00             ` Jan Beulich
@ 2016-02-01  9:41               ` Egger, Christoph
  2016-02-01  9:58                 ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Egger, Christoph @ 2016-02-01  9:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 01/02/16 10:00, Jan Beulich wrote:
>>>> On 01.02.16 at 09:14, <chegger@amazon.de> wrote:
>> On 01/02/16 09:04, Jan Beulich wrote:
>>>>> This, otoh, reads as if you imply we intercept the L2's INVLPG.
>>>>> Yet the INVLPG intercept gets cleared when the domain uses
>>>>> NPT (and your original change also didn't alter any intercept
>>>>> settings). Hence I'm still lost how hap_invlpg() can be reached
>>>>> in that case other than via emulating INVLPG in the instruction
>>>>> emulator.
>>>>
>>>> svm_invlpg_intercept() and vmx_invlpg_intercept() call
>>>> paging_invlpg().  paging_invlpg() calls hap_invlpg()
>>>> as initialized in xen/arch/x86/mm/hap/hap.c
>>>
>>> That's all fine, but according to my previous reply: How does
>>> execution reach svm_invlpg_intercept() when the INVLPG
>>> intercept gets disabled for domains using HAP (NPT)?
>>
>> The intercept bitmask for L1 guest and L2 guest gets binary or'ed
>> when emulating the VMENTRY for the L1 guest.
>> That way you get also intercepts for the L1 hypervisor.
> 
> Okay, I can see this perhaps being correct (albeit unexpected)
> for general1-intercepts (because all 32 bits are defined), but
> clearly this is broken for e.g. general2-intercepts (where the
> guest could set flags the hypervisor doesn't know about),
> leading to the BUG() in nsvm_vmcb_guest_intercepts_exitcode().
> Hence I didn't expect such behavior to be there in the first place.

Whenever new intercepts get defined then those must be added.

> And then this still doesn't make svm_invlpg_intercept() reachable:
> While the L2 guest runs, the INVLPG intercept would be reflected
> to the L1 guest. Whereas while the L1 guest runs, the intercept
> would be off.

While this is correct, L0 hypervisor must flush the nested hap or
whatever the L1 hypervisor does has no real effect to the L2 guest,
otherwise because the TLB/MMU pagetable walk is not different.

Christoph

* Re: hap_invlpg() vs INVLPGA
  2016-02-01  9:41               ` Egger, Christoph
@ 2016-02-01  9:58                 ` Jan Beulich
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2016-02-01  9:58 UTC (permalink / raw)
  To: Christoph Egger; +Cc: xen-devel

>>> On 01.02.16 at 10:41, <chegger@amazon.de> wrote:
> On 01/02/16 10:00, Jan Beulich wrote:
>>>>> On 01.02.16 at 09:14, <chegger@amazon.de> wrote:
>>> On 01/02/16 09:04, Jan Beulich wrote:
>>>>>> This, otoh, reads as if you imply we intercept the L2's INVLPG.
>>>>>> Yet the INVLPG intercept gets cleared when the domain uses
>>>>>> NPT (and your original change also didn't alter any intercept
>>>>>> settings). Hence I'm still lost how hap_invlpg() can be reached
>>>>>> in that case other than via emulating INVLPG in the instruction
>>>>>> emulator.
>>>>>
>>>>> svm_invlpg_intercept() and vmx_invlpg_intercept() call
>>>>> paging_invlpg().  paging_invlpg() calls hap_invlpg()
>>>>> as initialized in xen/arch/x86/mm/hap/hap.c
>>>>
>>>> That's all fine, but according to my previous reply: How does
>>>> execution reach svm_invlpg_intercept() when the INVLPG
>>>> intercept gets disabled for domains using HAP (NPT)?
>>>
>>> The intercept bitmask for L1 guest and L2 guest gets binary or'ed
>>> when emulating the VMENTRY for the L1 guest.
>>> That way you get also intercepts for the L1 hypervisor.
>> 
>> Okay, I can see this perhaps being correct (albeit unexpected)
>> for general1-intercepts (because all 32 bits are defined), but
>> clearly this is broken for e.g. general2-intercepts (where the
>> guest could set flags the hypervisor doesn't know about),
>> leading to the BUG() in nsvm_vmcb_guest_intercepts_exitcode().
>> Hence I didn't expect such behavior to be there in the first place.
> 
> Whenever new intercepts get defined then those must be added.

I'm sorry, but no - this attitude is why nested mode can't be
expected to become supported any time soon. Unknown intercepts
must be explicitly filtered out and/or unknown L2 exits must be
handled gracefully (to at least the hypervisor).
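
One possible shape of such filtering, as a sketch only: the bits the L1
hypervisor requests are masked against the set the L0 hypervisor knows
how to handle before being merged. The mask value and all names are
hypothetical.

```c
#include <stdint.h>

/* Hypothetical: the intercept bits the L0 hypervisor knows how to handle. */
#define KNOWN_INTERCEPTS_SKETCH  UINT32_C(0x0000ffff)

/*
 * for_l1: intercepts the L0 hypervisor uses while running the L1 guest.
 * for_l2: intercepts requested by the L1 hypervisor for its L2 guest.
 * Requested bits outside the known set are dropped before merging.
 */
uint32_t merge_known_intercepts_sketch(uint32_t for_l1, uint32_t for_l2)
{
    return for_l1 | (for_l2 & KNOWN_INTERCEPTS_SKETCH);
}
```

With this kind of mask an unknown bit set by the guest never reaches the
merged word, so it cannot produce an exit the hypervisor has no handler
for.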

>> And then this still doesn't make svm_invlpg_intercept() reachable:
>> While the L2 guest runs, the INVLPG intercept would be reflected
>> to the L1 guest. Whereas while the L1 guest runs, the intercept
>> would be off.
> 
> While this is correct, L0 hypervisor must flush the nested hap or
> whatever the L1 hypervisor does has no real effect to the L2 guest,
> otherwise because the TLB/MMU pagetable walk is not different.

I don't understand: You agree that svm_invlpg_intercept() is
unreachable when the guest uses HAP, but at the same time
you say that what it does is required for correct operation?

Jan
