From: Paolo Bonzini
To: Junaid Shahid
Cc: kvm@vger.kernel.org, andreslc@google.com, pfeiner@google.com, guangrong.xiao@linux.intel.com
Subject: Re: [PATCH v2 4/5] kvm: x86: mmu: Lockless access tracking for Intel CPUs without EPT A bits.
Date: Fri, 25 Nov 2016 10:45:28 +0100
In-Reply-To: <41116680.VFaKihePAT@js-desktop.mtv.corp.google.com>
References: <1478646030-101103-5-git-send-email-junaids@google.com> <41116680.VFaKihePAT@js-desktop.mtv.corp.google.com>

On 24/11/2016 04:50, Junaid Shahid wrote:
> On Monday, November 21, 2016 03:42:23 PM Paolo Bonzini wrote:
>> Please introduce a new function spte_is_access_tracking_enabled(u64
>> old_spte, u64 new_spte) and use it here:
>>
>> 	if (shadow_accessed_mask ?
>> 	    spte_is_bit_cleared(old_spte, new_spte, shadow_accessed_mask) :
>> 	    spte_is_access_tracking_enabled(old_spte, new_spte)) {
>> 		flush |= !shadow_accessed_mask;
>> 		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
>> 	}
>
> I think we can just set flush = true in the "then" block instead of
> flush |= !shadow_accessed_mask.
>
> And while we are at it, is there any reason to flush the TLB when setting
> the A or D bit in the PTE? If not, we can remove this earlier block, since
> the clearing case is now handled by the separate if blocks for accessed
> and dirty:

Hmm, flushing the TLB is expensive, so I would have thought that we want to
avoid a shootdown (kvm_flush_remote_tlbs) for the A and D bits. But it's
probably rare enough that it doesn't matter, and the existing code has

	/*
	 * Flush TLB when accessed/dirty bits are changed in the page tables,
	 * to guarantee consistency between TLB and page tables.
	 */
	if (spte_is_bit_changed(old_spte, new_spte,
				shadow_accessed_mask | shadow_dirty_mask))
		ret = true;

so yeah.
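(For reference, spte_is_bit_changed is just a masked comparison, something
like:

	static bool spte_is_bit_changed(u64 old_spte, u64 new_spte, u64 bit_mask)
	{
		/* True if any bit covered by bit_mask differs between the two. */
		return (old_spte & bit_mask) != (new_spte & bit_mask);
	}

so we already pay for a flush whenever either bit flips in either
direction.)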
> 	if (spte_is_bit_changed(old_spte, new_spte,
> 				shadow_accessed_mask | shadow_dirty_mask))
> 		flush = true;
>
> Also, instead of spte_is_access_tracking_enabled(), I've added
> is_accessed_spte, as you suggested later, and used that here as well.

Yes, that makes more sense. Thanks!

>>> -	if (spte_is_bit_cleared(old_spte, new_spte, shadow_dirty_mask))
>>> +
>>> +	if (shadow_dirty_mask ?
>>> +	    spte_is_bit_cleared(old_spte, new_spte, shadow_dirty_mask) :
>>> +	    writable_cleared)
>>
>> writable_cleared can be inlined here and written as
>>
>> 	spte_is_bit_cleared(old_spte, new_spte, PT_WRITABLE_MASK)
>>
>> so
>>
>> 	if (shadow_dirty_mask ?
>> 	    spte_is_bit_cleared(old_spte, new_spte, shadow_dirty_mask) :
>> 	    spte_is_bit_cleared(old_spte, new_spte, PT_WRITABLE_MASK)) {
>> 		flush |= !shadow_dirty_mask;
>> 		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
>> 	}
>
>>> 		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
>>>
>>> -	return ret;
>>> +	return flush;
>>>  }
>>
>> But please anticipate the changes to this function, except for
>> introducing is_access_track_spte of course, to a separate patch.
>
> Sorry, I didn't exactly understand what you meant by the last line. But I
> have made it like this:
>
> 	if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte)) {
> 		flush = true;
> 		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
> 	}
>
> 	if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte)) {
> 		flush = true;
> 		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
> 	}

The idea is to split this patch in two. You can first refactor the function
to have the above code and introduce is_accessed_spte/is_dirty_spte; then
you add the lockless access tracking on top. But it's okay if we leave it
for a subsequent review. There are going to be many changes already between
v2 and v3!

> I've replaced the checks with if (is_accessed_spte...) /
> if (is_dirty_spte...)

Good!

>>> +	/*
>>> +	 * Any PTE marked for access tracking should also be marked for dirty
>>> +	 * tracking (by being non-writable)
>>> +	 */
>>> +	spte &= ~PT_WRITABLE_MASK;
>>
>> This should be implicit in the definition of
>> shadow_acc_track_mask/value, so it's not necessary (or it can be a
>> WARN_ONCE).
>
> It can't be handled by shadow_acc_track_mask/value, but I have changed
> shadow_acc_track_saved_bits_mask to save only the R/X bits, which achieves
> the same result.
>
>>> +	spte &= ~(shadow_acc_track_saved_bits_mask <<
>>> +		  shadow_acc_track_saved_bits_shift);
>>
>> Should there be a WARN_ON if these bits are not zero?
>
> No, these bits can be non-zero from a previous invocation of
> mark_spte_for_access_track; they are not cleared when the PTE is restored
> to its normal state. (Though we could do that and then have a WARN_ONCE
> here.)

Ok, if it's not too hard, do add it. I think it's worth having more
self-checks.
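Something like this should be enough (untested sketch, and it assumes the
restore path is changed to clear the saved-bits window when it copies the
bits back):

	/* The restore path must have cleared the saved bits already. */
	WARN_ONCE(spte & (shadow_acc_track_saved_bits_mask <<
			  shadow_acc_track_saved_bits_shift),
		  "kvm: saved access-track bits were not cleared\n");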
>>> @@ -2919,10 +3063,17 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>>>  	 *
>>>  	 * Compare with set_spte where instead shadow_dirty_mask is set.
>>>  	 */
>>> -	if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) != spte)
>>> +	if (cmpxchg64(sptep, old_spte, new_spte) != old_spte)
>>>  		return false;
>>>
>>> -	kvm_vcpu_mark_page_dirty(vcpu, gfn);
>>> +	if (remove_write_prot) {
>>
>> Stupid question ahead, why not call kvm_vcpu_mark_page_accessed in this
>> function?
>
> We could, but I kept that call in the same paths as before for consistency
> with the other cases. Plus, even though it is a cheap call, why add it to
> the fast PF path if it is not necessary?

Makes sense, I said it was a stupid question. :)

>>> @@ -2953,36 +3104,43 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
>>>
>>>  	/*
>>> -	 * Check if it is a spurious fault caused by TLB lazily flushed.
>>> +	 * Check whether the memory access that caused the fault would
>>> +	 * still cause it if it were to be performed right now. If not,
>>> +	 * then this is a spurious fault caused by TLB lazily flushed,
>>> +	 * or some other CPU has already fixed the PTE after the
>>> +	 * current CPU took the fault.
>>> +	 *
>>> +	 * If Write-Only mappings ever become supported, then the
>>> +	 * condition below would need to be changed appropriately.
>>>  	 *
>>>  	 * Need not check the access of upper level table entries since
>>>  	 * they are always ACC_ALL.
>>>  	 */
>>> -	if (is_writable_pte(spte)) {
>>> -		ret = true;
>>> +	if (((spte & PT_PRESENT_MASK) && !remove_write_prot) ||
>>> +	    valid_exec_access) {
>>> +		fault_handled = true;
>>>  		break;
>>>  	}
>>
>> Let's separate the three conditions (R/W/X):
>>
>> 	if (error_code & PFERR_FETCH_MASK) {
>> 		if ((spte & (shadow_x_mask | shadow_nx_mask))
>> 		    == shadow_x_mask) {
>> 			fault_handled = true;
>> 			break;
>> 		}
>> 	}
>> 	if (error_code & PFERR_WRITE_MASK) {
>> 		if (is_writable_pte(spte)) {
>> 			fault_handled = true;
>> 			break;
>> 		}
>> 		remove_write_prot =
>> 			spte_can_locklessly_be_made_writable(spte);
>> 	}
>> 	if (!(error_code & PFERR_PRESENT_MASK)) {
>> 		if (!is_access_track_spte(spte)) {
>> 			fault_handled = true;
>> 			break;
>> 		}
>> 		remove_acc_track = true;
>> 	}
>
> I think the third block is incorrect; e.g. it will set fault_handled =
> true even for a completely zero PTE.

A completely zero PTE would have been filtered out earlier by the
is_shadow_present_pte check, wouldn't it?

>>>
>>> +/* This is only supposed to be used for non-EPT mappings */
>>
>> It's only used for non-EPT mappings, why is it only *supposed* to be
>> used for non-EPT mappings? It seems to me that it would work.
>>
>>> static bool need_remote_flush(u64 old, u64 new)
>>> {
>
> It would work, but it will return true in at least one case (probably the
> only one) where it doesn't need to, e.g. going from an acc-track PTE to a
> zeroed PTE. Though maybe that is not a big deal. Of course, we could also
> just update it to handle acc-track.

Yeah, I think it's not a big deal. But actually, is it really used only for
non-EPT? Nested virtualization uses kvm_mmu_pte_write for EPT as well.
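For reference, the current code is more or less the following (from memory,
so modulo details):

	static bool need_remote_flush(u64 old, u64 new)
	{
		if (!is_shadow_present_pte(old))
			return false;
		if (!is_shadow_present_pte(new))
			return true;
		if ((old ^ new) & PT64_BASE_ADDR_MASK)
			return true;
		/*
		 * Flip NX so that "execute allowed" reads as 1, then flush
		 * if the new SPTE takes away any permission the old one
		 * granted.
		 */
		old ^= shadow_nx_mask;
		new ^= shadow_nx_mask;
		return (old & ~new & PT64_PERM_MASK) != 0;
	}

An acc-track SPTE going to a zeroed PTE hits the !is_shadow_present_pte(new)
case, which is exactly the spurious flush you mention.

Paolo

> Thanks,
> Junaid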