From: Junaid Shahid <junaids@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, andreslc@google.com, pfeiner@google.com,
	guangrong.xiao@linux.intel.com
Subject: Re: [PATCH v2 4/5] kvm: x86: mmu: Lockless access tracking for Intel CPUs without EPT A bits.
Date: Mon, 28 Nov 2016 18:43:07 -0800
Message-ID: <1857180.zXFPSRIdpL@js-desktop.mtv.corp.google.com>
In-Reply-To: <f393f6b8-c61a-a408-55ce-58a3cf7087a5@redhat.com>

On Friday, November 25, 2016 10:45:28 AM Paolo Bonzini wrote:
> 
> On 24/11/2016 04:50, Junaid Shahid wrote:
> > On Monday, November 21, 2016 03:42:23 PM Paolo Bonzini wrote:
> >> Please introduce a new function spte_is_access_tracking_enabled(u64 
> >> old_spte, u64 new_spte) and use it here:
> >>
> >> 	if (shadow_accessed_mask ?
> >> 	    spte_is_bit_cleared(old_spte, new_spte, shadow_accessed_mask) :
> >> 	    spte_is_access_tracking_enabled(old_spte, new_spte)) {
> >> 		flush |= !shadow_accessed_mask;
> >> 		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
> >> 	}
> >>
> > 
> > I think we can just set flush = true in the then block instead of 
> > flush |= !shadow_accessed_mask.
> > 
> > And while we are at it, is there any reason to flush the TLB when setting the
> > A or D bit in the PTE? If not, we can remove this earlier block since the
> > clearing case is now handled in the separate if blocks for accessed and dirty:
> 
> Hmm, flushing the TLB is expensive.  I would have thought that we want to avoid
> a shootdown (kvm_flush_remote_tlbs) for the A and D bits.  But it's probably rare
> enough that it doesn't matter, and the existing code has
> 
>         /*
>          * Flush TLB when accessed/dirty bits are changed in the page tables,
>          * to guarantee consistency between TLB and page tables.
>          */
>         if (spte_is_bit_changed(old_spte, new_spte,
>                                 shadow_accessed_mask | shadow_dirty_mask))
>                 ret = true;
> 
> so yeah.

Ok. So I’ll remove the existing spte_is_bit_changed block and set flush = true
inside the separate blocks that check accessed and dirty masks.
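Roughly, the tail of mmu_spte_update() would then look like this (just a
sketch of the intent, combining the pieces above; not the final patch):

        /*
         * The spte_is_bit_changed() block goes away; whether to flush is
         * decided directly in the accessed/dirty checks below.
         */
        if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte)) {
                flush = true;
                kvm_set_pfn_accessed(spte_to_pfn(old_spte));
        }

        if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte)) {
                flush = true;
                kvm_set_pfn_dirty(spte_to_pfn(old_spte));
        }

        return flush;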

> >> But please anticipate the changes to this function, except for 
> >> introducing is_access_track_spte of course, to a separate function.
> > 
> > Sorry, I didn’t exactly understand what you meant by the last line. But I have
> > made it like this:
> > 
> > 	if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte)) {
> > 		flush = true;
> > 		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
> > 	}
> > 	if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte)) {
> > 		flush = true;
> > 		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
> > 	}
> 
> The idea is to split this patch in two.  You can first refactor the function to
> have the above code and introduce is_accessed_spte/is_dirty_spte.  Then, you
> add the lockless access tracking on top.  But it's okay if we leave it for a
> subsequent review.  There are going to be many changes already between v2 and v3!

Thanks for the clarification. I guess I can just add the other patch now to
separate out the refactoring from the rest of the access tracking changes.

> >>> +	spte &= ~(shadow_acc_track_saved_bits_mask <<
> >>> +		  shadow_acc_track_saved_bits_shift);
> >>
> >> Should there be a WARN_ON if these bits are not zero?
> > 
> > No, these bits can be non-zero from a previous instance of 
> > mark_spte_for_access_track. They are not cleared when the PTE is restored to
> > normal state. (Though we could do that and then have a WARN_ONCE here.)
> 
> Ok, if it's not too hard, do add it.  I think it's worth having
> more self-checks.

Sure. Will do.
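Concretely, I'm thinking of clearing the saved bits when the PTE is restored
to its normal state, and then having something along these lines in
mark_spte_for_access_track(), in place of the masking above (just a sketch):

        /*
         * If the restore path clears the saved bits, they should always be
         * zero by the time we get here.
         */
        WARN_ONCE(spte & (shadow_acc_track_saved_bits_mask <<
                          shadow_acc_track_saved_bits_shift),
                  "kvm: saved bits in a non-access-tracked SPTE are not zero\n");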

> 
> >>> @@ -2953,36 +3104,43 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
> >>>  
> >>>  		/*
> >>> -		 * Check if it is a spurious fault caused by TLB lazily flushed.
> >>> +		 * Check whether the memory access that caused the fault would
> >>> +		 * still cause it if it were to be performed right now. If not,
> >>> +		 * then this is a spurious fault caused by TLB lazily flushed,
> >>> +		 * or some other CPU has already fixed the PTE after the
> >>> +		 * current CPU took the fault.
> >>> +		 *
> >>> +		 * If Write-Only mappings ever become supported, then the
> >>> +		 * condition below would need to be changed appropriately.
> >>>  		 *
> >>>  		 * Need not check the access of upper level table entries since
> >>>  		 * they are always ACC_ALL.
> >>>  		 */
> >>> -		if (is_writable_pte(spte)) {
> >>> -			ret = true;
> >>> +		if (((spte & PT_PRESENT_MASK) && !remove_write_prot) ||
> >>> +		    valid_exec_access) {
> >>> +			fault_handled = true;
> >>>  			break;
> >>>  		}
> >>
> >> Let's separate the three conditions (R/W/X):
> >>
> >> 		if (error_code & PFERR_FETCH_MASK) {
> >> 			if ((spte & (shadow_x_mask|shadow_nx_mask))
> >> 			    == shadow_x_mask) {
> >> 				fault_handled = true;
> >> 				break;
> >> 			}
> >> 		}
> >> 		if (error_code & PFERR_WRITE_MASK) {
> >> 			if (is_writable_pte(spte)) {
> >> 				fault_handled = true;
> >> 				break;
> >> 			}
> >> 			remove_write_prot =
> >> 				spte_can_locklessly_be_made_writable(spte);
> >> 		}
> >> 		if (!(error_code & PFERR_PRESENT_MASK)) {
> >> 			if (!is_access_track_spte(spte)) {
> >> 				fault_handled = true;
> >> 				break;
> >> 			}
> >> 			remove_acc_track = true;
> >> 		}
> > 
> > I think the third block is incorrect e.g. it will set fault_handled = true even
> > for a completely zero PTE.
> 
> A completely zero PTE would have been filtered before by the
> is_shadow_present_pte check, wouldn't it?

Oh, the is_shadow_present_pte check was actually removed in the patch. We could
add it back, minus the ret = true statement, and then it would filter the zero 
PTE case. But I still think that the other form:

                if ((error_code & PFERR_USER_MASK) &&
                    (spte & PT_PRESENT_MASK)) {
                        fault_handled = true;
                        break;
                }

is simpler as it is directly analogous to the cases for fetch and write.
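For concreteness, the three cases would then read roughly like this (just a
sketch that puts your fetch and write blocks together with the read check
above; not the exact code from the patch):

                if (error_code & PFERR_FETCH_MASK) {
                        if ((spte & (shadow_x_mask | shadow_nx_mask)) ==
                            shadow_x_mask) {
                                fault_handled = true;
                                break;
                        }
                }

                if (error_code & PFERR_WRITE_MASK) {
                        if (is_writable_pte(spte)) {
                                fault_handled = true;
                                break;
                        }

                        remove_write_prot =
                                spte_can_locklessly_be_made_writable(spte);
                }

                if ((error_code & PFERR_USER_MASK) &&
                    (spte & PT_PRESENT_MASK)) {
                        fault_handled = true;
                        break;
                }
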
Please let me know if you think otherwise.

> >>> +/* This is only supposed to be used for non-EPT mappings */
> >>
> >> It's only used for non-EPT mappings, why is it only *supposed* to be 
> >> used for non-EPT mappings?  It seems to me that it would work.
> >>
> >>>  static bool need_remote_flush(u64 old, u64 new)
> >>>  {
> > 
> > It would work but it will return true in at least one (probably only) case 
> > where it doesn’t need to e.g. going from an acc-track PTE to a zeroed PTE.
> > Though maybe that is not a big deal. Of course, we could also just update it 
> > to handle acc-track. 
> 
> Yeah, I think it's not a big deal.  But actually is it really used only
> for non-EPT?  Nested virtualization uses kvm_mmu_pte_write for EPT as well.
 
Ok. I guess it might be more accurate to say indirect mappings instead of
non-EPT mappings. In any case, I’ll just remove the comment.

Thanks,
Junaid


Thread overview: 56+ messages
2016-10-27  2:19 [PATCH 0/4] Lockless Access Tracking for Intel CPUs without EPT A bits Junaid Shahid
2016-10-27  2:19 ` [PATCH 1/4] kvm: x86: mmu: Use symbolic constants for EPT Violation Exit Qualifications Junaid Shahid
2016-11-02 18:03   ` Paolo Bonzini
2016-11-02 21:40     ` Junaid Shahid
2016-10-27  2:19 ` [PATCH 2/4] kvm: x86: mmu: Rename spte_is_locklessly_modifiable() Junaid Shahid
2016-10-27  2:19 ` [PATCH 3/4] kvm: x86: mmu: Fast Page Fault path retries Junaid Shahid
2016-10-27  2:19 ` [PATCH 4/4] kvm: x86: mmu: Lockless access tracking for Intel CPUs without EPT A bits Junaid Shahid
2016-11-02 18:01   ` Paolo Bonzini
2016-11-02 21:42     ` Junaid Shahid
2016-11-08 23:00 ` [PATCH v2 0/5] Lockless Access Tracking " Junaid Shahid
2016-11-08 23:00   ` [PATCH v2 1/5] kvm: x86: mmu: Use symbolic constants for EPT Violation Exit Qualifications Junaid Shahid
2016-11-21 13:06     ` Paolo Bonzini
2016-11-08 23:00   ` [PATCH v2 2/5] kvm: x86: mmu: Rename spte_is_locklessly_modifiable() Junaid Shahid
2016-11-21 13:07     ` Paolo Bonzini
2016-11-08 23:00   ` [PATCH v2 3/5] kvm: x86: mmu: Fast Page Fault path retries Junaid Shahid
2016-11-21 13:13     ` Paolo Bonzini
2016-11-08 23:00   ` [PATCH v2 4/5] kvm: x86: mmu: Lockless access tracking for Intel CPUs without EPT A bits Junaid Shahid
2016-11-21 14:42     ` Paolo Bonzini
2016-11-24  3:50       ` Junaid Shahid
2016-11-25  9:45         ` Paolo Bonzini
2016-11-29  2:43           ` Junaid Shahid [this message]
2016-11-29  8:09             ` Paolo Bonzini
2016-11-30  0:59               ` Junaid Shahid
2016-11-30 11:09                 ` Paolo Bonzini
2016-12-01 22:54       ` Junaid Shahid
2016-12-02  8:33         ` Paolo Bonzini
2016-12-05 22:57           ` Junaid Shahid
2016-11-08 23:00   ` [PATCH v2 5/5] kvm: x86: mmu: Update documentation for fast page fault mechanism Junaid Shahid
2016-12-07  0:46 ` [PATCH v3 0/8] Lockless Access Tracking for Intel CPUs without EPT A bits Junaid Shahid
2016-12-07  0:46   ` [PATCH v3 1/8] kvm: x86: mmu: Use symbolic constants for EPT Violation Exit Qualifications Junaid Shahid
2016-12-15  6:50     ` Xiao Guangrong
2016-12-15 23:06       ` Junaid Shahid
2016-12-07  0:46   ` [PATCH v3 2/8] kvm: x86: mmu: Rename spte_is_locklessly_modifiable() Junaid Shahid
2016-12-15  6:51     ` Xiao Guangrong
2016-12-07  0:46   ` [PATCH v3 3/8] kvm: x86: mmu: Fast Page Fault path retries Junaid Shahid
2016-12-15  7:20     ` Xiao Guangrong
2016-12-15 23:36       ` Junaid Shahid
2016-12-16 13:13         ` Xiao Guangrong
2016-12-17  0:36           ` Junaid Shahid
2016-12-07  0:46   ` [PATCH v3 4/8] kvm: x86: mmu: Refactor accessed/dirty checks in mmu_spte_update/clear Junaid Shahid
2016-12-07  0:46   ` [PATCH v3 5/8] kvm: x86: mmu: Introduce a no-tracking version of mmu_spte_update Junaid Shahid
2016-12-07  0:46   ` [PATCH v3 6/8] kvm: x86: mmu: Do not use bit 63 for tracking special SPTEs Junaid Shahid
2016-12-07  0:46   ` [PATCH v3 7/8] kvm: x86: mmu: Lockless access tracking for Intel CPUs without EPT A bits Junaid Shahid
2016-12-14 16:28     ` Paolo Bonzini
2016-12-14 22:36       ` Junaid Shahid
2016-12-14 23:35         ` Paolo Bonzini
2016-12-16 13:04     ` Xiao Guangrong
2016-12-16 15:23       ` Paolo Bonzini
2016-12-17  0:01         ` Junaid Shahid
2016-12-21  9:49         ` Xiao Guangrong
2016-12-21 18:00           ` Paolo Bonzini
2016-12-17  2:04       ` Junaid Shahid
2016-12-17 14:19         ` Paolo Bonzini
2016-12-20  3:36           ` Junaid Shahid
2016-12-20  9:01             ` Paolo Bonzini
2016-12-07  0:46   ` [PATCH v3 8/8] kvm: x86: mmu: Update documentation for fast page fault mechanism Junaid Shahid
