* [PATCH] x86/kvm/mmu: make mmu->prev_roots cache work for NPT case
From: Vitaly Kuznetsov @ 2019-02-22 16:46 UTC
To: kvm
Cc: Paolo Bonzini, Radim Krčmář, Junaid Shahid, linux-kernel

I noticed that fast_cr3_switch() always fails when we switch back from L2
to L1 as it is not able to find a cached root. This is odd: the host's CR3
usually stays the same, so we expect to always follow the fast path. Turns
out the problem is that the page role is always mismatched because
kvm_mmu_get_page() filters out cr4_pae when direct; the value is stored in
the page header and later compared with new_role in
cached_root_available(). As cr4_pae is always set in long mode, the
prev_roots cache is dysfunctional.

The problem appeared after we introduced kvm_calc_mmu_role_common():
previously, we were only setting this bit for the shadow MMU root, but now
we set it for everything. Restore the original behavior.

Fixes: 7dcd57552008 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
RFC: Alternatively, I can suggest two solutions:
- Do not clear cr4_pae in kvm_mmu_get_page() and check direct on call sites
  (detect_write_misaligned(), get_written_sptes()).
- Filter cr4_pae out when direct in kvm_mmu_new_cr3().

The code in kvm_mmu_get_page() is very ancient, I'm afraid to touch it :-)
---
 arch/x86/kvm/mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f2d1d230d5b8..c729e98eee49 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4791,7 +4791,6 @@ static union kvm_mmu_role kvm_calc_mmu_role_common(struct kvm_vcpu *vcpu,
 
 	role.base.access = ACC_ALL;
 	role.base.nxe = !!is_nx(vcpu);
-	role.base.cr4_pae = !!is_pae(vcpu);
 	role.base.cr0_wp = is_write_protection(vcpu);
 	role.base.smm = is_smm(vcpu);
 	role.base.guest_mode = is_guest_mode(vcpu);
@@ -4871,6 +4870,8 @@ kvm_calc_shadow_mmu_root_page_role(struct kvm_vcpu *vcpu, bool base_only)
 {
 	union kvm_mmu_role role = kvm_calc_mmu_role_common(vcpu, base_only);
 
+	role.base.cr4_pae = !!is_pae(vcpu);
+
 	role.base.smep_andnot_wp = role.ext.cr4_smep &&
 		!is_write_protection(vcpu);
 	role.base.smap_andnot_wp = role.ext.cr4_smap &&
-- 
2.20.1
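[For readers reconstructing the failure mode, here is a standalone C model
of the role-word mismatch described above. It is not part of the patch: the
bitfield layout is reduced to the bits that matter here and is an
illustrative assumption, not the kernel's actual kvm_mmu_page_role layout.]

/* role_mismatch.c - standalone model; build with: cc -o role_mismatch role_mismatch.c */
#include <stdio.h>

/* Reduced stand-in for union kvm_mmu_page_role: only the relevant bits. */
union role {
	struct {
		unsigned level   : 4;
		unsigned direct  : 1;
		unsigned cr4_pae : 1;
	};
	unsigned word;
};

int main(void)
{
	union role wanted = {0}, stored = {0};

	/*
	 * After 7dcd57552008, kvm_calc_mmu_role_common() sets cr4_pae for
	 * every MMU, so the role computed on a CR3 switch has the bit set
	 * in long mode even for a direct (TDP/NPT) root.
	 */
	wanted.level = 4;
	wanted.direct = 1;
	wanted.cr4_pae = 1;

	/*
	 * kvm_mmu_get_page() clears cr4_pae for direct roots before the
	 * role is stored in the page header.
	 */
	stored = wanted;
	if (stored.direct)
		stored.cr4_pae = 0;

	/*
	 * cached_root_available() compares the raw role words, so the
	 * lookup misses on every L2->L1 switch.
	 */
	printf("cached root match: %s\n",
	       wanted.word == stored.word ? "yes" : "no");
	return 0;
}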
* Re: [PATCH] x86/kvm/mmu: make mmu->prev_roots cache work for NPT case
From: Paolo Bonzini @ 2019-02-22 17:17 UTC
To: Vitaly Kuznetsov, kvm
Cc: Radim Krčmář, Junaid Shahid, linux-kernel

On 22/02/19 17:46, Vitaly Kuznetsov wrote:
> I noticed that fast_cr3_switch() always fails when we switch back from L2
> to L1 as it is not able to find a cached root. This is odd: the host's CR3
> usually stays the same, so we expect to always follow the fast path. Turns
> out the problem is that the page role is always mismatched because
> kvm_mmu_get_page() filters out cr4_pae when direct; the value is stored in
> the page header and later compared with new_role in
> cached_root_available(). As cr4_pae is always set in long mode, the
> prev_roots cache is dysfunctional.

Really cr4_pae means "are the PTEs 8 bytes". So I think your patch is
correct, but on top we should set it to 1 (not zero!!) for
kvm_calc_shadow_ept_root_page_role, init_kvm_nested_mmu and
kvm_calc_tdp_mmu_root_page_role. Or maybe everything breaks with that
change.

> - Do not clear cr4_pae in kvm_mmu_get_page() and check direct on call sites
>   (detect_write_misaligned(), get_written_sptes()).

These only run with shadow page tables, by the way.

Paolo
* Re: [PATCH] x86/kvm/mmu: make mmu->prev_roots cache work for NPT case
From: Vitaly Kuznetsov @ 2019-02-22 18:49 UTC
To: Paolo Bonzini, kvm
Cc: Radim Krčmář, Junaid Shahid, linux-kernel

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 22/02/19 17:46, Vitaly Kuznetsov wrote:
>> I noticed that fast_cr3_switch() always fails when we switch back from L2
>> to L1 as it is not able to find a cached root. This is odd: the host's CR3
>> usually stays the same, so we expect to always follow the fast path. Turns
>> out the problem is that the page role is always mismatched because
>> kvm_mmu_get_page() filters out cr4_pae when direct; the value is stored in
>> the page header and later compared with new_role in
>> cached_root_available(). As cr4_pae is always set in long mode, the
>> prev_roots cache is dysfunctional.
>
> Really cr4_pae means "are the PTEs 8 bytes". So I think your patch is
> correct, but on top we should set it to 1 (not zero!!) for
> kvm_calc_shadow_ept_root_page_role, init_kvm_nested_mmu and
> kvm_calc_tdp_mmu_root_page_role. Or maybe everything breaks with that
> change.

Yes, exactly. If we put '1' there, kvm_mmu_get_page() will again filter
it out and we won't be able to find the root in the prev_roots cache :-(

>> - Do not clear cr4_pae in kvm_mmu_get_page() and check direct on call sites
>>   (detect_write_misaligned(), get_written_sptes()).
>
> These only run with shadow page tables, by the way.

Yes, and that's why I think it may make sense to move the filtering
logic there. At least in other cases cr4_pae will always be equal to
is_pae().

It seems I know too little about shadow paging and all these corner
cases :-(

-- 
Vitaly
* Re: [PATCH] x86/kvm/mmu: make mmu->prev_roots cache work for NPT case
From: Paolo Bonzini @ 2019-02-22 20:29 UTC
To: Vitaly Kuznetsov, kvm
Cc: Radim Krčmář, Junaid Shahid, linux-kernel

On 22/02/19 19:49, Vitaly Kuznetsov wrote:
>> Really cr4_pae means "are the PTEs 8 bytes". So I think your patch is
>> correct, but on top we should set it to 1 (not zero!!) for
>> kvm_calc_shadow_ept_root_page_role, init_kvm_nested_mmu and
>> kvm_calc_tdp_mmu_root_page_role. Or maybe everything breaks with that
>> change.
>
> Yes, exactly. If we put '1' there, kvm_mmu_get_page() will again filter
> it out and we won't be able to find the root in the prev_roots cache :-(

Well, of course then kvm_mmu_get_page() would have to remove the
filtering.

Paolo
* [PATCH v2 RFC] x86/kvm/mmu: make mmu->prev_roots cache work for NPT case
From: Vitaly Kuznetsov @ 2019-02-23 11:15 UTC
To: kvm
Cc: Paolo Bonzini, Radim Krčmář, Junaid Shahid, linux-kernel

Alternative patch: remove the filtering from kvm_mmu_get_page() and check
for direct at the call sites. The cr4_pae setting in
kvm_calc_mmu_role_common() can be preserved for consistency.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/mmu.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f2d1d230d5b8..7fb8118f2af6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2420,8 +2420,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	role = vcpu->arch.mmu->mmu_role.base;
 	role.level = level;
 	role.direct = direct;
-	if (role.direct)
-		role.cr4_pae = 0;
 	role.access = access;
 	if (!vcpu->arch.mmu->direct_map
 	    && vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL) {
@@ -5176,7 +5174,7 @@ static bool detect_write_misaligned(struct kvm_mmu_page *sp, gpa_t gpa,
 		 gpa, bytes, sp->role.word);
 
 	offset = offset_in_page(gpa);
-	pte_size = sp->role.cr4_pae ? 8 : 4;
+	pte_size = (sp->role.cr4_pae && !sp->role.direct) ? 8 : 4;
 
 	/*
 	 * Sometimes, the OS only writes the last one bytes to update status
@@ -5200,7 +5198,7 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
 	page_offset = offset_in_page(gpa);
 	level = sp->role.level;
 	*nspte = 1;
-	if (!sp->role.cr4_pae) {
+	if (!sp->role.cr4_pae || sp->role.direct) {
 		page_offset <<= 1;	/* 32->64 */
 		/*
 		 * A 32-bit pde maps 4MB while the shadow pdes map
-- 
2.20.1
* Re: [PATCH v2 RFC] x86/kvm/mmu: make mmu->prev_roots cache work for NPT case
From: Vitaly Kuznetsov @ 2019-03-07 14:07 UTC
To: Paolo Bonzini, Radim Krčmář
Cc: kvm, Junaid Shahid, linux-kernel

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> Alternative patch: remove the filtering from kvm_mmu_get_page() and check
> for direct at the call sites. The cr4_pae setting in
> kvm_calc_mmu_role_common() can be preserved for consistency.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/mmu.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index f2d1d230d5b8..7fb8118f2af6 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2420,8 +2420,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>  	role = vcpu->arch.mmu->mmu_role.base;
>  	role.level = level;
>  	role.direct = direct;
> -	if (role.direct)
> -		role.cr4_pae = 0;
>  	role.access = access;
>  	if (!vcpu->arch.mmu->direct_map
>  	    && vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL) {
> @@ -5176,7 +5174,7 @@ static bool detect_write_misaligned(struct kvm_mmu_page *sp, gpa_t gpa,
>  		 gpa, bytes, sp->role.word);
>  
>  	offset = offset_in_page(gpa);
> -	pte_size = sp->role.cr4_pae ? 8 : 4;
> +	pte_size = (sp->role.cr4_pae && !sp->role.direct) ? 8 : 4;
>  
>  	/*
>  	 * Sometimes, the OS only writes the last one bytes to update status
> @@ -5200,7 +5198,7 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
>  	page_offset = offset_in_page(gpa);
>  	level = sp->role.level;
>  	*nspte = 1;
> -	if (!sp->role.cr4_pae) {
> +	if (!sp->role.cr4_pae || sp->role.direct) {
>  		page_offset <<= 1;	/* 32->64 */
>  		/*
>  		 * A 32-bit pde maps 4MB while the shadow pdes map

While I personally prefer this approach to not setting role.cr4_pae in
kvm_calc_mmu_role_common(), I'd like to get the maintainers' opinion (I
did test the patch with ept=off, but I have to admit I don't know much
about shadow page tables, which actually use detect_write_misaligned()/
get_written_sptes()).

Paolo, Radim, (anyone else) - any thoughts?

-- 
Vitaly
* Re: [PATCH v2 RFC] x86/kvm/mmu: make mmu->prev_roots cache work for NPT case
From: Sean Christopherson @ 2019-03-07 16:06 UTC
To: Vitaly Kuznetsov
Cc: Paolo Bonzini, Radim Krčmář, kvm, Junaid Shahid, linux-kernel

On Thu, Mar 07, 2019 at 03:07:05PM +0100, Vitaly Kuznetsov wrote:
> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>
> > Alternative patch: remove the filtering from kvm_mmu_get_page() and check
> > for direct at the call sites. The cr4_pae setting in
> > kvm_calc_mmu_role_common() can be preserved for consistency.
> >
> > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > ---
> >  arch/x86/kvm/mmu.c | 6 ++----
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index f2d1d230d5b8..7fb8118f2af6 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -2420,8 +2420,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> >  	role = vcpu->arch.mmu->mmu_role.base;
> >  	role.level = level;
> >  	role.direct = direct;
> > -	if (role.direct)
> > -		role.cr4_pae = 0;
> >  	role.access = access;
> >  	if (!vcpu->arch.mmu->direct_map
> >  	    && vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL) {
> > @@ -5176,7 +5174,7 @@ static bool detect_write_misaligned(struct kvm_mmu_page *sp, gpa_t gpa,
> >  		 gpa, bytes, sp->role.word);
> >  
> >  	offset = offset_in_page(gpa);
> > -	pte_size = sp->role.cr4_pae ? 8 : 4;
> > +	pte_size = (sp->role.cr4_pae && !sp->role.direct) ? 8 : 4;
> >  
> >  	/*
> >  	 * Sometimes, the OS only writes the last one bytes to update status
> > @@ -5200,7 +5198,7 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
> >  	page_offset = offset_in_page(gpa);
> >  	level = sp->role.level;
> >  	*nspte = 1;
> > -	if (!sp->role.cr4_pae) {
> > +	if (!sp->role.cr4_pae || sp->role.direct) {
> >  		page_offset <<= 1;	/* 32->64 */
> >  		/*
> >  		 * A 32-bit pde maps 4MB while the shadow pdes map
>
> While I personally prefer this approach to not setting role.cr4_pae in
> kvm_calc_mmu_role_common(), I'd like to get the maintainers' opinion (I
> did test the patch with ept=off, but I have to admit I don't know much
> about shadow page tables, which actually use detect_write_misaligned()/
> get_written_sptes()).
>
> Paolo, Radim, (anyone else) - any thoughts?

The changes to detect_write_misaligned() and get_written_sptes() are
wrong/unnecessary, as those functions should only be called for indirect
shadow PTEs, e.g.:

	for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn) {
		if (detect_write_misaligned(sp, gpa, bytes) ||
		      detect_write_flooding(sp)) {
			kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
			++vcpu->kvm->stat.mmu_flooded;
			continue;
		}

		spte = get_written_sptes(sp, gpa, &npte);
		if (!spte)
			continue;

		...
	}

If anything, they could WARN_ON(sp->role.direct), but IMO that's overkill
since they're static helpers with a single call site (above).

I'm missing the background for this patch: why does clearing role.cr4_pae
for direct SPTEs cause problems with the prev_roots cache?
* Re: [PATCH v2 RFC] x86/kvm/mmu: make mmu->prev_roots cache work for NPT case
From: Vitaly Kuznetsov @ 2019-03-07 16:41 UTC
To: Sean Christopherson
Cc: Paolo Bonzini, Radim Krčmář, kvm, Junaid Shahid, linux-kernel

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> On Thu, Mar 07, 2019 at 03:07:05PM +0100, Vitaly Kuznetsov wrote:
>> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>>
>> > Alternative patch: remove the filtering from kvm_mmu_get_page() and check
>> > for direct at the call sites. The cr4_pae setting in
>> > kvm_calc_mmu_role_common() can be preserved for consistency.
>> >
>> > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> > ---
>> >  arch/x86/kvm/mmu.c | 6 ++----
>> >  1 file changed, 2 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> > index f2d1d230d5b8..7fb8118f2af6 100644
>> > --- a/arch/x86/kvm/mmu.c
>> > +++ b/arch/x86/kvm/mmu.c
>> > @@ -2420,8 +2420,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>> >  	role = vcpu->arch.mmu->mmu_role.base;
>> >  	role.level = level;
>> >  	role.direct = direct;
>> > -	if (role.direct)
>> > -		role.cr4_pae = 0;
>> >  	role.access = access;
>> >  	if (!vcpu->arch.mmu->direct_map
>> >  	    && vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL) {
>> > @@ -5176,7 +5174,7 @@ static bool detect_write_misaligned(struct kvm_mmu_page *sp, gpa_t gpa,
>> >  		 gpa, bytes, sp->role.word);
>> >  
>> >  	offset = offset_in_page(gpa);
>> > -	pte_size = sp->role.cr4_pae ? 8 : 4;
>> > +	pte_size = (sp->role.cr4_pae && !sp->role.direct) ? 8 : 4;
>> >  
>> >  	/*
>> >  	 * Sometimes, the OS only writes the last one bytes to update status
>> > @@ -5200,7 +5198,7 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
>> >  	page_offset = offset_in_page(gpa);
>> >  	level = sp->role.level;
>> >  	*nspte = 1;
>> > -	if (!sp->role.cr4_pae) {
>> > +	if (!sp->role.cr4_pae || sp->role.direct) {
>> >  		page_offset <<= 1;	/* 32->64 */
>> >  		/*
>> >  		 * A 32-bit pde maps 4MB while the shadow pdes map
>>
>> While I personally prefer this approach to not setting role.cr4_pae in
>> kvm_calc_mmu_role_common(), I'd like to get the maintainers' opinion (I
>> did test the patch with ept=off, but I have to admit I don't know much
>> about shadow page tables, which actually use detect_write_misaligned()/
>> get_written_sptes()).
>>
>> Paolo, Radim, (anyone else) - any thoughts?
>
> The changes to detect_write_misaligned() and get_written_sptes() are
> wrong/unnecessary, as those functions should only be called for indirect
> shadow PTEs, e.g.:
>
> 	for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn) {
> 		if (detect_write_misaligned(sp, gpa, bytes) ||
> 		      detect_write_flooding(sp)) {
> 			kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
> 			++vcpu->kvm->stat.mmu_flooded;
> 			continue;
> 		}
>
> 		spte = get_written_sptes(sp, gpa, &npte);
> 		if (!spte)
> 			continue;
>
> 		...
> 	}
>
> If anything, they could WARN_ON(sp->role.direct), but IMO that's overkill
> since they're static helpers with a single call site (above).
>
> I'm missing the background for this patch: why does clearing role.cr4_pae
> for direct SPTEs cause problems with the prev_roots cache?

Oh, thank you for taking a look! You've probably missed my original v1
patch where I tried to explain the issue:

"I noticed that fast_cr3_switch() always fails when we switch back from
L2 to L1 as it is not able to find a cached root. This is odd: the
host's CR3 usually stays the same, so we expect to always follow the
fast path. Turns out the problem is that the page role is always
mismatched because kvm_mmu_get_page() filters out cr4_pae when direct;
the value is stored in the page header and later compared with new_role
in cached_root_available(). As cr4_pae is always set in long mode, the
prev_roots cache is dysfunctional.

The problem appeared after we introduced kvm_calc_mmu_role_common():
previously, we were only setting this bit for the shadow MMU root, but
now we set it for everything. Restore the original behavior.

Fixes: 7dcd57552008 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
"

and the patch was just removing the role.cr4_pae setting and moving it
to kvm_calc_shadow_mmu_root_page_role(). This is an alternative
approach: always set cr4_pae, but count on kvm_mmu_get_page() filtering
out cr4_pae when direct is set.

-- 
Vitaly
* Re: [PATCH v2 RFC] x86/kvm/mmu: make mmu->prev_roots cache work for NPT case
From: Sean Christopherson @ 2019-03-07 18:59 UTC
To: Vitaly Kuznetsov
Cc: Paolo Bonzini, Radim Krčmář, kvm, Junaid Shahid, linux-kernel

On Thu, Mar 07, 2019 at 05:41:39PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
>
> > On Thu, Mar 07, 2019 at 03:07:05PM +0100, Vitaly Kuznetsov wrote:
> >> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> >>
> >> > Alternative patch: remove the filtering from kvm_mmu_get_page() and check
> >> > for direct at the call sites. The cr4_pae setting in
> >> > kvm_calc_mmu_role_common() can be preserved for consistency.
> >> >
> >> > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> > ---
> >> >  arch/x86/kvm/mmu.c | 6 ++----
> >> >  1 file changed, 2 insertions(+), 4 deletions(-)
> >> >
> >> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> >> > index f2d1d230d5b8..7fb8118f2af6 100644
> >> > --- a/arch/x86/kvm/mmu.c
> >> > +++ b/arch/x86/kvm/mmu.c
> >> > @@ -2420,8 +2420,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> >> >  	role = vcpu->arch.mmu->mmu_role.base;
> >> >  	role.level = level;
> >> >  	role.direct = direct;
> >> > -	if (role.direct)
> >> > -		role.cr4_pae = 0;
> >> >  	role.access = access;
> >> >  	if (!vcpu->arch.mmu->direct_map
> >> >  	    && vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL) {
> >> > @@ -5176,7 +5174,7 @@ static bool detect_write_misaligned(struct kvm_mmu_page *sp, gpa_t gpa,
> >> >  		 gpa, bytes, sp->role.word);
> >> >  
> >> >  	offset = offset_in_page(gpa);
> >> > -	pte_size = sp->role.cr4_pae ? 8 : 4;
> >> > +	pte_size = (sp->role.cr4_pae && !sp->role.direct) ? 8 : 4;
> >> >  
> >> >  	/*
> >> >  	 * Sometimes, the OS only writes the last one bytes to update status
> >> > @@ -5200,7 +5198,7 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
> >> >  	page_offset = offset_in_page(gpa);
> >> >  	level = sp->role.level;
> >> >  	*nspte = 1;
> >> > -	if (!sp->role.cr4_pae) {
> >> > +	if (!sp->role.cr4_pae || sp->role.direct) {
> >> >  		page_offset <<= 1;	/* 32->64 */
> >> >  		/*
> >> >  		 * A 32-bit pde maps 4MB while the shadow pdes map
> >>
> >> While I personally prefer this approach to not setting role.cr4_pae in
> >> kvm_calc_mmu_role_common(), I'd like to get the maintainers' opinion (I
> >> did test the patch with ept=off, but I have to admit I don't know much
> >> about shadow page tables, which actually use detect_write_misaligned()/
> >> get_written_sptes()).
> >>
> >> Paolo, Radim, (anyone else) - any thoughts?
> >
> > The changes to detect_write_misaligned() and get_written_sptes() are
> > wrong/unnecessary, as those functions should only be called for indirect
> > shadow PTEs, e.g.:
> >
> > 	for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn) {
> > 		if (detect_write_misaligned(sp, gpa, bytes) ||
> > 		      detect_write_flooding(sp)) {
> > 			kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
> > 			++vcpu->kvm->stat.mmu_flooded;
> > 			continue;
> > 		}
> >
> > 		spte = get_written_sptes(sp, gpa, &npte);
> > 		if (!spte)
> > 			continue;
> >
> > 		...
> > 	}
> >
> > If anything, they could WARN_ON(sp->role.direct), but IMO that's overkill
> > since they're static helpers with a single call site (above).
> >
> > I'm missing the background for this patch: why does clearing role.cr4_pae
> > for direct SPTEs cause problems with the prev_roots cache?
>
> Oh, thank you for taking a look! You've probably missed my original v1
> patch where I tried to explain the issue:
>
> "I noticed that fast_cr3_switch() always fails when we switch back from
> L2 to L1 as it is not able to find a cached root. This is odd: the
> host's CR3 usually stays the same, so we expect to always follow the
> fast path. Turns out the problem is that the page role is always
> mismatched because kvm_mmu_get_page() filters out cr4_pae when direct;
> the value is stored in the page header and later compared with new_role
> in cached_root_available(). As cr4_pae is always set in long mode, the
> prev_roots cache is dysfunctional.
>
> The problem appeared after we introduced kvm_calc_mmu_role_common():
> previously, we were only setting this bit for the shadow MMU root, but
> now we set it for everything. Restore the original behavior.
>
> Fixes: 7dcd57552008 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
> "
>
> and the patch was just removing the role.cr4_pae setting and moving it
> to kvm_calc_shadow_mmu_root_page_role(). This is an alternative
> approach: always set cr4_pae, but count on kvm_mmu_get_page() filtering
> out cr4_pae when direct is set.

Ah, but neither approach solves the underlying problem that KVM can't
identify a shadow page that is associated with a nested EPT entry. For
example, kvm_calc_shadow_ept_root_page_role() should set cr4_pae=1,
i.e. it always has 8-byte guest PTEs regardless of PAE. But that would
break e.g. __kvm_sync_page(), as it would always mismatch if L2 is a
non-PAE guest (I think I got that right...).

I think what we could do is repurpose role's nxe, cr0_wp, and
sm{a,e}p_andnot_wp bits to uniquely identify a nested EPT/NPT entry.
E.g. cr0_wp=1 and sm{a,e}p_andnot_wp=1 are an impossible combination.
I'll throw together a patch to see what breaks. In fact, I think we
could revamp kvm_calc_shadow_ept_root_page_role() to completely ignore
all legacy paging bits, i.e. handling changes in L2's configuration is
L1's problem.
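[The patch Sean mentions is not part of this thread. Purely as an
illustration of the "impossible combination" idea, here is a hypothetical
sketch; the function signature and role fields follow the kernel of this
era approximately, and this is not what was eventually merged.]

/*
 * Hypothetical sketch, not a posted patch: tag nested EPT roots with a
 * bit combination that legacy paging can never produce (CR0.WP=1 forces
 * the *_andnot_wp bits to 0), so the role word of a nested EPT page can
 * never collide with a legacy-paging role word in the prev_roots cache.
 */
static union kvm_mmu_role
kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty)
{
	union kvm_mmu_role role = {0};

	/* Impossible combination under legacy paging: */
	role.base.cr0_wp = 1;
	role.base.smep_andnot_wp = 1;
	role.base.smap_andnot_wp = 1;

	/* Ignore L2's legacy paging configuration entirely. */
	role.base.level = PT64_ROOT_4LEVEL;
	role.base.direct = false;
	role.base.ad_disabled = !accessed_dirty;
	role.base.guest_mode = true;
	role.base.access = ACC_ALL;

	return role;
}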
* Re: [PATCH v2 RFC] x86/kvm/mmu: make mmu->prev_roots cache work for NPT case
From: Sean Christopherson @ 2019-03-07 19:02 UTC
To: Vitaly Kuznetsov
Cc: Paolo Bonzini, Radim Krčmář, kvm, Junaid Shahid, linux-kernel

On Thu, Mar 07, 2019 at 10:59:46AM -0800, Sean Christopherson wrote:
> I think what we could do is repurpose role's nxe, cr0_wp, and
> sm{a,e}p_andnot_wp bits to uniquely identify a nested EPT/NPT entry.

Ignore the "NPT" comment; this would only apply to EPT.

> E.g. cr0_wp=1 and sm{a,e}p_andnot_wp=1 are an impossible combination.
> I'll throw together a patch to see what breaks. In fact, I think we
> could revamp kvm_calc_shadow_ept_root_page_role() to completely ignore
> all legacy paging bits, i.e. handling changes in L2's configuration is
> L1's problem.