kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
@ 2009-01-11  9:12 Marcelo Tosatti
  2009-01-11  9:20 ` Avi Kivity
  0 siblings, 1 reply; 16+ messages in thread
From: Marcelo Tosatti @ 2009-01-11  9:12 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alexander Graf, kvm, joerg.roedel

On Wed, Jan 07, 2009 at 01:32:41PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Let me shoot at one direction: a shadow page with PGE bit in either
>> state is created. Later that shadow page is nuked (via mmu notifiers,
>> for example). 
>
> I doubt that mmu notifiers were invoked in this case (the bug would be  
> very rare); in any case we flush the tlb.

There are other events that zap shadow pages. Anyway, someone else
should figure why NPT dislikes duplicate shadow page tables for 
the same address space. How about this, untested:

KVM: MMU: zero base_role on TDP mmu context initialization

A recent change which propagates CR4.PGE to shadow page roles broke NPT,
perhaps due to the allocation of duplicate shadow trees for the same
address space (actual details unknown).

In the meantime, since guest CR4.PGE is controlled by HW with
NPT, and the relevant role information for TDP is passed directly to
kvm_mmu_get_page, zero base_role on TDP mmu context init.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 10bdb2a..44ffcf6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2231,6 +2231,8 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 		context->root_level = PT32_ROOT_LEVEL;
 	}
 
+	vcpu->arch.mmu.base_role = 0;
+
 	return 0;
 }
 



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
  2009-01-11  9:12 [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings Marcelo Tosatti
@ 2009-01-11  9:20 ` Avi Kivity
  0 siblings, 0 replies; 16+ messages in thread
From: Avi Kivity @ 2009-01-11  9:20 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Alexander Graf, kvm, joerg.roedel

Marcelo Tosatti wrote:
> On Wed, Jan 07, 2009 at 01:32:41PM +0200, Avi Kivity wrote:
>   
>> Marcelo Tosatti wrote:
>>     
>>> Let me shoot at one direction: a shadow page with PGE bit in either
>>> state is created. Later that shadow page is nuked (via mmu notifiers,
>>> for example). 
>>>       
>> I doubt that mmu notifiers were invoked in this case (the bug would be  
>> very rare); in any case we flush the tlb.
>>     
>
> There are other events that zap shadow pages. Anyway, someone else
> should figure why NPT dislikes duplicate shadow page tables for 
> the same address space. How about this, untested:
>
> KVM: MMU: zero base_role on TDP mmu context initialization
>
> A recent change which propagates CR4.PGE to shadow page roles broke NPT,
> perhaps due to the allocation of duplicate shadow trees for the same
> address space (actual details unknown).
>
> In the meantime, since guest CR4.PGE is controlled by HW with
> NPT, and the relevant role information for TDP is passed directly to
> kvm_mmu_get_page, zero base_role on TDP mmu context init.
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 10bdb2a..44ffcf6 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2231,6 +2231,8 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
>  		context->root_level = PT32_ROOT_LEVEL;
>  	}
>  
> +	vcpu->arch.mmu.base_role = 0;
> +
>  	return 0;
>  }
>  
>   

Won't even compile, will it?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
  2009-01-09  0:36                       ` Marcelo Tosatti
@ 2009-01-09 10:43                         ` Alexander Graf
  0 siblings, 0 replies; 16+ messages in thread
From: Alexander Graf @ 2009-01-09 10:43 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, kvm, joerg.roedel





On 09.01.2009, at 01:36, Marcelo Tosatti <mtosatti@redhat.com> wrote:

> On Thu, Jan 08, 2009 at 08:53:21PM +0100, Alexander Graf wrote:
>> Sorry for the late reply - I wanted to know who kvm hangs in the host
>> kernel context :)
>>
>> On 07.01.2009, at 14:46, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>>
>>> On Wed, Jan 07, 2009 at 01:32:41PM +0200, Avi Kivity wrote:
>>>> Marcelo Tosatti wrote:
>>>>> Let me shoot at one direction: a shadow page with PGE bit in  
>>>>> either
>>>>> state is created. Later that shadow page is nuked (via mmu
>>>>> notifiers,
>>>>> for example).
>>>>
>>>> I doubt that mmu notifiers were invoked in this case (the bug would
>>>> be
>>>> very rare); in any case we flush the tlb.
>>>
>>> This comment is worrying
>>>
>>>       /*
>>>        * FIXME: Tis shouldn't be necessary here, but there is a  
>>> flush
>>>        * missing in the MMU code. Until we find this bug, flush the
>>>        * complete TLB here on an NPF
>>>        */
>>>       if (npt_enabled)
>>>               svm_flush_tlb(&svm->vcpu);
>>>
>>
>> This is in, because netbench in an npt-guest failed after a few  
>> minutes
>> (see Alex W's bug report) and this appeard to fix it.
>
> Right. The comment is scary, thats all. Maybe the hypothetical missing
> flush is also the cause for this bug you're not. Or not.
>
>>> Alexander, you might want to try this patch, -ENONPT here (and  
>>> revert
>>> the previous
>>> one).
>>
>> Eh, what?
>
> I meant I have no NPT box to test myself easily.

Ah, what a pity :o

>
> "revert the previous one" = remove the !tdp_enabled test on set_cr4.
>
> The patch below is just a hack to flush the TLB of remote vcpu's  
> before
> updating the host TLB. To confirm an experimental theory (read:  
> guess).

Ok, will do.

>
>
> Hum, NPT is surely not available inside the guest for ESX to use?

Npt can only be used with SVM, and I haven't seen esx call svm  
instructions at all so far.

Alex

> Yeah, gathering more information would be helpful.
>
>>> I have no clue, what else could be causing this?
>>>
>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>> index 10bdb2a..bf68e5b 100644
>>> --- a/arch/x86/kvm/mmu.c
>>> +++ b/arch/x86/kvm/mmu.c
>>> @@ -33,6 +33,7 @@
>>> #include <asm/cmpxchg.h>
>>> #include <asm/io.h>
>>> #include <asm/vmx.h>
>>> +#include <asm/tlbflush.h>
>>>
>>> /*
>>> * When setting this variable to true it enables Two-Dimensional-
>>> Paging
>>> @@ -1850,6 +1851,11 @@ static int __direct_map(struct kvm_vcpu  
>>> *vcpu,
>>> gpa_t v, int write,
>>>
>>>       if (*iterator.sptep == shadow_trap_nonpresent_pte) {
>>>           pseudo_gfn = (iterator.addr & PT64_DIR_BASE_ADDR_MASK) >>
>>> PAGE_SHIFT;
>>> +
>>> +                        kvm_flush_remote_tlbs(vcpu->kvm);
>>> +                        kvm_mmu_flush_tlb(vcpu);
>>> +                        __flush_tlb();
>>> +
>>>           sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
>>>                         iterator.level - 1,
>>>                         1, ACC_ALL, iterator.sptep);
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
  2009-01-08 19:53                     ` Alexander Graf
@ 2009-01-09  0:36                       ` Marcelo Tosatti
  2009-01-09 10:43                         ` Alexander Graf
  0 siblings, 1 reply; 16+ messages in thread
From: Marcelo Tosatti @ 2009-01-09  0:36 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Avi Kivity, kvm, joerg.roedel

On Thu, Jan 08, 2009 at 08:53:21PM +0100, Alexander Graf wrote:
> Sorry for the late reply - I wanted to know who kvm hangs in the host  
> kernel context :)
>
> On 07.01.2009, at 14:46, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
>> On Wed, Jan 07, 2009 at 01:32:41PM +0200, Avi Kivity wrote:
>>> Marcelo Tosatti wrote:
>>>> Let me shoot at one direction: a shadow page with PGE bit in either
>>>> state is created. Later that shadow page is nuked (via mmu  
>>>> notifiers,
>>>> for example).
>>>
>>> I doubt that mmu notifiers were invoked in this case (the bug would  
>>> be
>>> very rare); in any case we flush the tlb.
>>
>> This comment is worrying
>>
>>        /*
>>         * FIXME: Tis shouldn't be necessary here, but there is a flush
>>         * missing in the MMU code. Until we find this bug, flush the
>>         * complete TLB here on an NPF
>>         */
>>        if (npt_enabled)
>>                svm_flush_tlb(&svm->vcpu);
>>
>
> This is in, because netbench in an npt-guest failed after a few minutes 
> (see Alex W's bug report) and this appeard to fix it.

Right. The comment is scary, thats all. Maybe the hypothetical missing
flush is also the cause for this bug you're not. Or not.

>> Alexander, you might want to try this patch, -ENONPT here (and revert 
>> the previous
>> one).
>
> Eh, what?

I meant I have no NPT box to test myself easily.

"revert the previous one" = remove the !tdp_enabled test on set_cr4.

The patch below is just a hack to flush the TLB of remote vcpu's before
updating the host TLB. To confirm an experimental theory (read: guess).

Hum, NPT is surely not available inside the guest for ESX to use?

Yeah, gathering more information would be helpful.

>> I have no clue, what else could be causing this?
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 10bdb2a..bf68e5b 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -33,6 +33,7 @@
>> #include <asm/cmpxchg.h>
>> #include <asm/io.h>
>> #include <asm/vmx.h>
>> +#include <asm/tlbflush.h>
>>
>> /*
>>  * When setting this variable to true it enables Two-Dimensional- 
>> Paging
>> @@ -1850,6 +1851,11 @@ static int __direct_map(struct kvm_vcpu *vcpu, 
>> gpa_t v, int write,
>>
>>        if (*iterator.sptep == shadow_trap_nonpresent_pte) {
>>            pseudo_gfn = (iterator.addr & PT64_DIR_BASE_ADDR_MASK) >>  
>> PAGE_SHIFT;
>> +
>> +                        kvm_flush_remote_tlbs(vcpu->kvm);
>> +                        kvm_mmu_flush_tlb(vcpu);
>> +                        __flush_tlb();
>> +
>>            sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
>>                          iterator.level - 1,
>>                          1, ACC_ALL, iterator.sptep);
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
  2009-01-07 13:46                   ` Marcelo Tosatti
@ 2009-01-08 19:53                     ` Alexander Graf
  2009-01-09  0:36                       ` Marcelo Tosatti
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Graf @ 2009-01-08 19:53 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, kvm, joerg.roedel

Sorry for the late reply - I wanted to know who kvm hangs in the host  
kernel context :)

On 07.01.2009, at 14:46, Marcelo Tosatti <mtosatti@redhat.com> wrote:

> On Wed, Jan 07, 2009 at 01:32:41PM +0200, Avi Kivity wrote:
>> Marcelo Tosatti wrote:
>>> Let me shoot at one direction: a shadow page with PGE bit in either
>>> state is created. Later that shadow page is nuked (via mmu  
>>> notifiers,
>>> for example).
>>
>> I doubt that mmu notifiers were invoked in this case (the bug would  
>> be
>> very rare); in any case we flush the tlb.
>
> This comment is worrying
>
>        /*
>         * FIXME: Tis shouldn't be necessary here, but there is a flush
>         * missing in the MMU code. Until we find this bug, flush the
>         * complete TLB here on an NPF
>         */
>        if (npt_enabled)
>                svm_flush_tlb(&svm->vcpu);
>

This is in, because netbench in an npt-guest failed after a few  
minutes (see Alex W's bug report) and this appeard to fix it.


> Alexander, you might want to try this patch, -ENONPT here (and  
> revert the previous
> one).

Eh, what?

> I have no clue, what else could be causing this?
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 10bdb2a..bf68e5b 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -33,6 +33,7 @@
> #include <asm/cmpxchg.h>
> #include <asm/io.h>
> #include <asm/vmx.h>
> +#include <asm/tlbflush.h>
>
> /*
>  * When setting this variable to true it enables Two-Dimensional- 
> Paging
> @@ -1850,6 +1851,11 @@ static int __direct_map(struct kvm_vcpu  
> *vcpu, gpa_t v, int write,
>
>        if (*iterator.sptep == shadow_trap_nonpresent_pte) {
>            pseudo_gfn = (iterator.addr & PT64_DIR_BASE_ADDR_MASK) >>  
> PAGE_SHIFT;
> +
> +                        kvm_flush_remote_tlbs(vcpu->kvm);
> +                        kvm_mmu_flush_tlb(vcpu);
> +                        __flush_tlb();
> +
>            sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
>                          iterator.level - 1,
>                          1, ACC_ALL, iterator.sptep);
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
  2009-01-07 11:32                 ` Avi Kivity
@ 2009-01-07 13:46                   ` Marcelo Tosatti
  2009-01-08 19:53                     ` Alexander Graf
  0 siblings, 1 reply; 16+ messages in thread
From: Marcelo Tosatti @ 2009-01-07 13:46 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alexander Graf, kvm, joerg.roedel

On Wed, Jan 07, 2009 at 01:32:41PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Let me shoot at one direction: a shadow page with PGE bit in either
>> state is created. Later that shadow page is nuked (via mmu notifiers,
>> for example). 
>
> I doubt that mmu notifiers were invoked in this case (the bug would be  
> very rare); in any case we flush the tlb.

This comment is worrying

        /*
         * FIXME: Tis shouldn't be necessary here, but there is a flush
         * missing in the MMU code. Until we find this bug, flush the
         * complete TLB here on an NPF
         */
        if (npt_enabled)
                svm_flush_tlb(&svm->vcpu);

Alexander, you might want to try this patch, -ENONPT here (and revert the previous
one). I have no clue, what else could be causing this?

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 10bdb2a..bf68e5b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -33,6 +33,7 @@
 #include <asm/cmpxchg.h>
 #include <asm/io.h>
 #include <asm/vmx.h>
+#include <asm/tlbflush.h>
 
 /*
  * When setting this variable to true it enables Two-Dimensional-Paging
@@ -1850,6 +1851,11 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 
 		if (*iterator.sptep == shadow_trap_nonpresent_pte) {
 			pseudo_gfn = (iterator.addr & PT64_DIR_BASE_ADDR_MASK) >> PAGE_SHIFT;
+
+                        kvm_flush_remote_tlbs(vcpu->kvm);
+                        kvm_mmu_flush_tlb(vcpu);
+                        __flush_tlb();
+                        
 			sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
 					      iterator.level - 1,
 					      1, ACC_ALL, iterator.sptep);

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
  2009-01-07 10:43               ` Marcelo Tosatti
@ 2009-01-07 11:32                 ` Avi Kivity
  2009-01-07 13:46                   ` Marcelo Tosatti
  0 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2009-01-07 11:32 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Alexander Graf, kvm, joerg.roedel

Marcelo Tosatti wrote:
> Let me shoot at one direction: a shadow page with PGE bit in either
> state is created. Later that shadow page is nuked (via mmu notifiers,
> for example). 

I doubt that mmu notifiers were invoked in this case (the bug would be 
very rare); in any case we flush the tlb.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
  2009-01-07 10:19             ` Avi Kivity
@ 2009-01-07 10:43               ` Marcelo Tosatti
  2009-01-07 11:32                 ` Avi Kivity
  0 siblings, 1 reply; 16+ messages in thread
From: Marcelo Tosatti @ 2009-01-07 10:43 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alexander Graf, kvm, joerg.roedel

On Wed, Jan 07, 2009 at 12:19:26PM +0200, Avi Kivity wrote:
> Alexander Graf wrote:
>> Using this patch it works. But if I read it correctly, that doesn't
>> actually fix anything but only treats NPT/EPT special, which it
>> shouldn't, should it? 
>
> The patch doesn't fix the bug but is nevertheless correct.  cr4.pge only  
> matters to the mmu if using the shadow mmu; with tdp it only wastes  
> memory (and exposes the bug which you encountered).
>
> So, wrt to the bug you saw, it's a workaround, but it's also a correct  
> fix for another bug.
>
>> Maybe this actually even breaks EPT?
>>   
>
> It shouldn't.
>
>> I remember having seen a lot of CR4 hacks in svm.c when npt is enabled.
>> Maybe that is related?
>>   
>
> No.  cr4 controls the guest mmu, but with npt the guest mmu is  
> completely virtualized, so we need to ignore those bits.

Let me shoot at one direction: a shadow page with PGE bit in either
state is created. Later that shadow page is nuked (via mmu notifiers,
for example). Then set_cr4 changes base_role.pge to a different value,
and a fault creates a new shadow page and instantiates that in the tree.

Perhaps a svm_flush_tlb is required in such case, when updating a
previously valid pagetable entry? Joerg?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
  2009-01-07  6:49           ` Alexander Graf
@ 2009-01-07 10:19             ` Avi Kivity
  2009-01-07 10:43               ` Marcelo Tosatti
  0 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2009-01-07 10:19 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Marcelo Tosatti, kvm, joerg.roedel

Alexander Graf wrote:
> Using this patch it works. But if I read it correctly, that doesn't
> actually fix anything but only treats NPT/EPT special, which it
> shouldn't, should it? 

The patch doesn't fix the bug but is nevertheless correct.  cr4.pge only 
matters to the mmu if using the shadow mmu; with tdp it only wastes 
memory (and exposes the bug which you encountered).

So, wrt to the bug you saw, it's a workaround, but it's also a correct 
fix for another bug.

> Maybe this actually even breaks EPT?
>   

It shouldn't.

> I remember having seen a lot of CR4 hacks in svm.c when npt is enabled.
> Maybe that is related?
>   

No.  cr4 controls the guest mmu, but with npt the guest mmu is 
completely virtualized, so we need to ignore those bits.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
  2009-01-06 16:43         ` Marcelo Tosatti
@ 2009-01-07  6:49           ` Alexander Graf
  2009-01-07 10:19             ` Avi Kivity
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Graf @ 2009-01-07  6:49 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, kvm, joerg.roedel

Marcelo Tosatti wrote:
> On Tue, Jan 06, 2009 at 04:29:59PM +0200, Avi Kivity wrote:
>   
>> Yes...
>>
>> Looks like kvm_unsync_page can be folded into mmu_need_write_protect  
>> (after which we can drop lookup_page(), which is not a good API).  But  
>> that's after we solve the current problem.
>>
>> Looks like the addition of a second role for non-pge mode confuses the  
>> mmu.  After the second page is created, mmu_need_write_protect() will  
>> return 1, but previously existing sptes can still be writable?
>>
>> Looks like we need to call rmap_write_protect() when the new page is  
>> created.
>>     
>
> I'm not sure about the details, but I suspect that multiple shadows
> confuse NPT somehow.
>
> Alexander can you give this a try:
>   

Using this patch it works. But if I read it correctly, that doesn't
actually fix anything but only treats NPT/EPT special, which it
shouldn't, should it? Maybe this actually even breaks EPT?
I remember having seen a lot of CR4 hacks in svm.c when npt is enabled.
Maybe that is related?

Alex


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
  2009-01-06 14:29       ` Avi Kivity
  2009-01-06 15:06         ` Marcelo Tosatti
@ 2009-01-06 16:43         ` Marcelo Tosatti
  2009-01-07  6:49           ` Alexander Graf
  1 sibling, 1 reply; 16+ messages in thread
From: Marcelo Tosatti @ 2009-01-06 16:43 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alexander Graf, kvm, joerg.roedel

On Tue, Jan 06, 2009 at 04:29:59PM +0200, Avi Kivity wrote:
> Yes...
>
> Looks like kvm_unsync_page can be folded into mmu_need_write_protect  
> (after which we can drop lookup_page(), which is not a good API).  But  
> that's after we solve the current problem.
>
> Looks like the addition of a second role for non-pge mode confuses the  
> mmu.  After the second page is created, mmu_need_write_protect() will  
> return 1, but previously existing sptes can still be writable?
>
> Looks like we need to call rmap_write_protect() when the new page is  
> created.

I'm not sure about the details, but I suspect that multiple shadows
confuse NPT somehow.

Alexander can you give this a try:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 36aa576..2c6579e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -363,7 +363,8 @@ void kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 	}
 	kvm_x86_ops->set_cr4(vcpu, cr4);
 	vcpu->arch.cr4 = cr4;
-	vcpu->arch.mmu.base_role.cr4_pge = !!(cr4 & X86_CR4_PGE);
+	if (!tdp_enabled)
+		vcpu->arch.mmu.base_role.cr4_pge = !!(cr4 & X86_CR4_PGE);
 	kvm_mmu_sync_global(vcpu);
 	kvm_mmu_reset_context(vcpu);
 }

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
  2009-01-06 14:29       ` Avi Kivity
@ 2009-01-06 15:06         ` Marcelo Tosatti
  2009-01-06 16:43         ` Marcelo Tosatti
  1 sibling, 0 replies; 16+ messages in thread
From: Marcelo Tosatti @ 2009-01-06 15:06 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alexander Graf, kvm, joerg.roedel

On Tue, Jan 06, 2009 at 04:29:59PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> On Tue, Jan 06, 2009 at 12:41:40PM +0200, Avi Kivity wrote:
>>   
>>> Alexander Graf wrote:
>>>     
>>>> Avi Kivity wrote:
>>>>         
>>>>> From: Avi Kivity <avi@redhat.com>
>>>>>
>>>>> Don't allow a vcpu with cr4.pge cleared to use a shadow page created with
>>>>> cr4.pge set; this might cause a cr3 switch not to sync ptes that have the
>>>>> global bit set (the global bit has no effect if !cr4.pge).
>>>>>
>>>>> This can only occur on smp with different cr4.pge settings for different
>>>>> vcpus (since a cr4 change will resync the shadow ptes), but there's no
>>>>> cost to being correct here.
>>>>>
>>>>> Signed-off-by: Avi Kivity <avi@redhat.com>
>>>>>
>>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>>> index f49bfd0..ab8ef1d 100644
>>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>>> @@ -183,6 +183,7 @@ union kvm_mmu_page_role {
>>>>>  		unsigned metaphysical:1;
>>>>>  		unsigned access:3;
>>>>>  		unsigned invalid:1;
>>>>> +		unsigned cr4_pge:1;
>>>>>  	};
>>>>>  };
>>>>>  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>> index c4da7fb..aa4575c 100644
>>>>> --- a/arch/x86/kvm/x86.c
>>>>> +++ b/arch/x86/kvm/x86.c
>>>>> @@ -363,6 +363,7 @@ void kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>>>>>  	}
>>>>>  	kvm_x86_ops->set_cr4(vcpu, cr4);
>>>>>  	vcpu->arch.cr4 = cr4;
>>>>> +	vcpu->arch.mmu.base_role.cr4_pge = !!(cr4 & X86_CR4_PGE);
>>>>>               
>>>> This line broke VMware ESX bootup using NPT on one VCPU for me. If I
>>>> comment it out and apply my patches to actually make ESX run, it boots
>>>> again.
>>>>         
>>> I think this might be the problem:
>>>
>>>     
>>>> static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
>>>>                   bool can_unsync)
>>>> {
>>>>     struct kvm_mmu_page *shadow;
>>>>
>>>>     shadow = kvm_mmu_lookup_page(vcpu->kvm, gfn);
>>>>     if (shadow) {
>>>>         if (shadow->role.level != PT_PAGE_TABLE_LEVEL)
>>>>             return 1;
>>>>         if (shadow->unsync)
>>>>             return 0;
>>>>         if (can_unsync && oos_shadow)
>>>>             return kvm_unsync_page(vcpu, shadow);
>>>>         return 1;
>>>>     }
>>>>     return 0;
>>>> }
>>>>
>>>>       
>>> lookup_page() is not deterministic if there are multiple shadows for 
>>> a  page, and the patch increases multiple shadows.
>>>
>>> Marcelo, shouldn't there be a for_each in there?
>>>     
>>
>> static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) {
>>
>>         unsigned index;
>>         struct hlist_head *bucket;
>>         struct kvm_mmu_page *s;
>>         struct hlist_node *node, *n;
>>
>>         index = kvm_page_table_hashfn(sp->gfn);
>>         bucket = &vcpu->kvm->arch.mmu_page_hash[index];
>>         /* don't unsync if pagetable is shadowed with multiple roles */
>>         hlist_for_each_entry_safe(s, node, n, bucket, hash_link) {
>>                 if (s->gfn != sp->gfn || s->role.metaphysical)
>>                         continue;
>>                 if (s->role.word != sp->role.word)
>>                         return 1;
>>         }
>>
>> This one?
>>   
>
> Yes...
>
> Looks like kvm_unsync_page can be folded into mmu_need_write_protect  
> (after which we can drop lookup_page(), which is not a good API).  But  
> that's after we solve the current problem.
>
> Looks like the addition of a second role for non-pge mode confuses the  
> mmu.  After the second page is created, mmu_need_write_protect() will  
> return 1, but previously existing sptes can still be writable?
>
> Looks like we need to call rmap_write_protect() when the new page is  
> created.

kvm_mmu_get_page:

        hlist_for_each_entry_safe(sp, node, tmp, bucket, hash_link)
                if (sp->gfn == gfn) {
                        if (sp->unsync)
                                if (kvm_sync_page(vcpu, sp))
                                        continue;

So rmap_write_protect is called when the new page is created. Looking 
at it now.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
  2009-01-06 14:11     ` Marcelo Tosatti
@ 2009-01-06 14:29       ` Avi Kivity
  2009-01-06 15:06         ` Marcelo Tosatti
  2009-01-06 16:43         ` Marcelo Tosatti
  0 siblings, 2 replies; 16+ messages in thread
From: Avi Kivity @ 2009-01-06 14:29 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Alexander Graf, kvm, joerg.roedel

Marcelo Tosatti wrote:
> On Tue, Jan 06, 2009 at 12:41:40PM +0200, Avi Kivity wrote:
>   
>> Alexander Graf wrote:
>>     
>>> Avi Kivity wrote:
>>>   
>>>       
>>>> From: Avi Kivity <avi@redhat.com>
>>>>
>>>> Don't allow a vcpu with cr4.pge cleared to use a shadow page created with
>>>> cr4.pge set; this might cause a cr3 switch not to sync ptes that have the
>>>> global bit set (the global bit has no effect if !cr4.pge).
>>>>
>>>> This can only occur on smp with different cr4.pge settings for different
>>>> vcpus (since a cr4 change will resync the shadow ptes), but there's no
>>>> cost to being correct here.
>>>>
>>>> Signed-off-by: Avi Kivity <avi@redhat.com>
>>>>
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index f49bfd0..ab8ef1d 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -183,6 +183,7 @@ union kvm_mmu_page_role {
>>>>  		unsigned metaphysical:1;
>>>>  		unsigned access:3;
>>>>  		unsigned invalid:1;
>>>> +		unsigned cr4_pge:1;
>>>>  	};
>>>>  };
>>>>  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index c4da7fb..aa4575c 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -363,6 +363,7 @@ void kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>>>>  	}
>>>>  	kvm_x86_ops->set_cr4(vcpu, cr4);
>>>>  	vcpu->arch.cr4 = cr4;
>>>> +	vcpu->arch.mmu.base_role.cr4_pge = !!(cr4 & X86_CR4_PGE);
>>>>       
>>>>         
>>> This line broke VMware ESX bootup using NPT on one VCPU for me. If I
>>> comment it out and apply my patches to actually make ESX run, it boots
>>> again.
>>>   
>>>       
>> I think this might be the problem:
>>
>>     
>>> static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
>>>                   bool can_unsync)
>>> {
>>>     struct kvm_mmu_page *shadow;
>>>
>>>     shadow = kvm_mmu_lookup_page(vcpu->kvm, gfn);
>>>     if (shadow) {
>>>         if (shadow->role.level != PT_PAGE_TABLE_LEVEL)
>>>             return 1;
>>>         if (shadow->unsync)
>>>             return 0;
>>>         if (can_unsync && oos_shadow)
>>>             return kvm_unsync_page(vcpu, shadow);
>>>         return 1;
>>>     }
>>>     return 0;
>>> }
>>>
>>>       
>> lookup_page() is not deterministic if there are multiple shadows for a  
>> page, and the patch increases multiple shadows.
>>
>> Marcelo, shouldn't there be a for_each in there?
>>     
>
> static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) {
>
>         unsigned index;
>         struct hlist_head *bucket;
>         struct kvm_mmu_page *s;
>         struct hlist_node *node, *n;
>
>         index = kvm_page_table_hashfn(sp->gfn);
>         bucket = &vcpu->kvm->arch.mmu_page_hash[index];
>         /* don't unsync if pagetable is shadowed with multiple roles */
>         hlist_for_each_entry_safe(s, node, n, bucket, hash_link) {
>                 if (s->gfn != sp->gfn || s->role.metaphysical)
>                         continue;
>                 if (s->role.word != sp->role.word)
>                         return 1;
>         }
>
> This one?
>   

Yes...

Looks like kvm_unsync_page can be folded into mmu_need_write_protect 
(after which we can drop lookup_page(), which is not a good API).  But 
that's after we solve the current problem.

Looks like the addition of a second role for non-pge mode confuses the 
mmu.  After the second page is created, mmu_need_write_protect() will 
return 1, but previously existing sptes can still be writable?

Looks like we need to call rmap_write_protect() when the new page is 
created.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
  2009-01-06 10:41   ` Avi Kivity
@ 2009-01-06 14:11     ` Marcelo Tosatti
  2009-01-06 14:29       ` Avi Kivity
  0 siblings, 1 reply; 16+ messages in thread
From: Marcelo Tosatti @ 2009-01-06 14:11 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alexander Graf, kvm, joerg.roedel

On Tue, Jan 06, 2009 at 12:41:40PM +0200, Avi Kivity wrote:
> Alexander Graf wrote:
>> Avi Kivity wrote:
>>   
>>> From: Avi Kivity <avi@redhat.com>
>>>
>>> Don't allow a vcpu with cr4.pge cleared to use a shadow page created with
>>> cr4.pge set; this might cause a cr3 switch not to sync ptes that have the
>>> global bit set (the global bit has no effect if !cr4.pge).
>>>
>>> This can only occur on smp with different cr4.pge settings for different
>>> vcpus (since a cr4 change will resync the shadow ptes), but there's no
>>> cost to being correct here.
>>>
>>> Signed-off-by: Avi Kivity <avi@redhat.com>
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index f49bfd0..ab8ef1d 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -183,6 +183,7 @@ union kvm_mmu_page_role {
>>>  		unsigned metaphysical:1;
>>>  		unsigned access:3;
>>>  		unsigned invalid:1;
>>> +		unsigned cr4_pge:1;
>>>  	};
>>>  };
>>>  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index c4da7fb..aa4575c 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -363,6 +363,7 @@ void kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>>>  	}
>>>  	kvm_x86_ops->set_cr4(vcpu, cr4);
>>>  	vcpu->arch.cr4 = cr4;
>>> +	vcpu->arch.mmu.base_role.cr4_pge = !!(cr4 & X86_CR4_PGE);
>>>       
>>
>> This line broke VMware ESX bootup using NPT on one VCPU for me. If I
>> comment it out and apply my patches to actually make ESX run, it boots
>> again.
>>   
>
> I think this might be the problem:
>
>> static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
>>                   bool can_unsync)
>> {
>>     struct kvm_mmu_page *shadow;
>>
>>     shadow = kvm_mmu_lookup_page(vcpu->kvm, gfn);
>>     if (shadow) {
>>         if (shadow->role.level != PT_PAGE_TABLE_LEVEL)
>>             return 1;
>>         if (shadow->unsync)
>>             return 0;
>>         if (can_unsync && oos_shadow)
>>             return kvm_unsync_page(vcpu, shadow);
>>         return 1;
>>     }
>>     return 0;
>> }
>>
>
> lookup_page() is not deterministic if there are multiple shadows for a  
> page, and the patch increases multiple shadows.
>
> Marcelo, shouldn't there be a for_each in there?

static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) {

        unsigned index;
        struct hlist_head *bucket;
        struct kvm_mmu_page *s;
        struct hlist_node *node, *n;

        index = kvm_page_table_hashfn(sp->gfn);
        bucket = &vcpu->kvm->arch.mmu_page_hash[index];
        /* don't unsync if pagetable is shadowed with multiple roles */
        hlist_for_each_entry_safe(s, node, n, bucket, hash_link) {
                if (s->gfn != sp->gfn || s->role.metaphysical)
                        continue;
                if (s->role.word != sp->role.word)
                        return 1;
        }

This one?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
  2009-01-05 14:56 ` Alexander Graf
@ 2009-01-06 10:41   ` Avi Kivity
  2009-01-06 14:11     ` Marcelo Tosatti
  0 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2009-01-06 10:41 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, joerg.roedel, Marcelo Tosatti

Alexander Graf wrote:
> Avi Kivity wrote:
>   
>> From: Avi Kivity <avi@redhat.com>
>>
>> Don't allow a vcpu with cr4.pge cleared to use a shadow page created with
>> cr4.pge set; this might cause a cr3 switch not to sync ptes that have the
>> global bit set (the global bit has no effect if !cr4.pge).
>>
>> This can only occur on smp with different cr4.pge settings for different
>> vcpus (since a cr4 change will resync the shadow ptes), but there's no
>> cost to being correct here.
>>
>> Signed-off-by: Avi Kivity <avi@redhat.com>
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index f49bfd0..ab8ef1d 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -183,6 +183,7 @@ union kvm_mmu_page_role {
>>  		unsigned metaphysical:1;
>>  		unsigned access:3;
>>  		unsigned invalid:1;
>> +		unsigned cr4_pge:1;
>>  	};
>>  };
>>  
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c4da7fb..aa4575c 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -363,6 +363,7 @@ void kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>>  	}
>>  	kvm_x86_ops->set_cr4(vcpu, cr4);
>>  	vcpu->arch.cr4 = cr4;
>> +	vcpu->arch.mmu.base_role.cr4_pge = !!(cr4 & X86_CR4_PGE);
>>   
>>     
>
> This line broke VMware ESX bootup using NPT on one VCPU for me. If I
> comment it out and apply my patches to actually make ESX run, it boots
> again.
>   

I think this might be the problem:

> static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
>                   bool can_unsync)
> {
>     struct kvm_mmu_page *shadow;
>
>     shadow = kvm_mmu_lookup_page(vcpu->kvm, gfn);
>     if (shadow) {
>         if (shadow->role.level != PT_PAGE_TABLE_LEVEL)
>             return 1;
>         if (shadow->unsync)
>             return 0;
>         if (can_unsync && oos_shadow)
>             return kvm_unsync_page(vcpu, shadow);
>         return 1;
>     }
>     return 0;
> }
>

lookup_page() is not deterministic if there are multiple shadows for a 
page, and the patch increases multiple shadows.

Marcelo, shouldn't there be a for_each in there?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
       [not found] <20081221184146.8E00B250012@cleopatra.tlv.redhat.com>
@ 2009-01-05 14:56 ` Alexander Graf
  2009-01-06 10:41   ` Avi Kivity
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Graf @ 2009-01-05 14:56 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, joerg.roedel

Avi Kivity wrote:
> From: Avi Kivity <avi@redhat.com>
>
> Don't allow a vcpu with cr4.pge cleared to use a shadow page created with
> cr4.pge set; this might cause a cr3 switch not to sync ptes that have the
> global bit set (the global bit has no effect if !cr4.pge).
>
> This can only occur on smp with different cr4.pge settings for different
> vcpus (since a cr4 change will resync the shadow ptes), but there's no
> cost to being correct here.
>
> Signed-off-by: Avi Kivity <avi@redhat.com>
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f49bfd0..ab8ef1d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -183,6 +183,7 @@ union kvm_mmu_page_role {
>  		unsigned metaphysical:1;
>  		unsigned access:3;
>  		unsigned invalid:1;
> +		unsigned cr4_pge:1;
>  	};
>  };
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c4da7fb..aa4575c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -363,6 +363,7 @@ void kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>  	}
>  	kvm_x86_ops->set_cr4(vcpu, cr4);
>  	vcpu->arch.cr4 = cr4;
> +	vcpu->arch.mmu.base_role.cr4_pge = !!(cr4 & X86_CR4_PGE);
>   

This line broke VMware ESX bootup using NPT on one VCPU for me. If I
comment it out and apply my patches to actually make ESX run, it boots
again.

Alex

>  	kvm_mmu_sync_global(vcpu);
>  	kvm_mmu_reset_context(vcpu);
>  }
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-commits" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>   


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2009-01-11  9:20 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-01-11  9:12 [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings Marcelo Tosatti
2009-01-11  9:20 ` Avi Kivity
     [not found] <20081221184146.8E00B250012@cleopatra.tlv.redhat.com>
2009-01-05 14:56 ` Alexander Graf
2009-01-06 10:41   ` Avi Kivity
2009-01-06 14:11     ` Marcelo Tosatti
2009-01-06 14:29       ` Avi Kivity
2009-01-06 15:06         ` Marcelo Tosatti
2009-01-06 16:43         ` Marcelo Tosatti
2009-01-07  6:49           ` Alexander Graf
2009-01-07 10:19             ` Avi Kivity
2009-01-07 10:43               ` Marcelo Tosatti
2009-01-07 11:32                 ` Avi Kivity
2009-01-07 13:46                   ` Marcelo Tosatti
2009-01-08 19:53                     ` Alexander Graf
2009-01-09  0:36                       ` Marcelo Tosatti
2009-01-09 10:43                         ` Alexander Graf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).