From: Paolo Bonzini <pbonzini@redhat.com>
To: Junaid Shahid <junaids@google.com>
Cc: kvm@vger.kernel.org, andreslc@google.com, pfeiner@google.com,
	guangrong.xiao@linux.intel.com
Subject: Re: [PATCH v2 4/5] kvm: x86: mmu: Lockless access tracking for Intel CPUs without EPT A bits.
Date: Fri, 25 Nov 2016 10:45:28 +0100
Message-ID: <f393f6b8-c61a-a408-55ce-58a3cf7087a5@redhat.com>
In-Reply-To: <41116680.VFaKihePAT@js-desktop.mtv.corp.google.com>



On 24/11/2016 04:50, Junaid Shahid wrote:
> On Monday, November 21, 2016 03:42:23 PM Paolo Bonzini wrote:
>> Please introduce a new function spte_is_access_tracking_enabled(u64 
>> old_spte, u64 new_spte) and use it here:
>>
>> 	if (shadow_accessed_mask ?
>> 	    spte_is_bit_cleared(old_spte, new_spte, shadow_accessed_mask) :
>> 	    spte_is_access_tracking_enabled(old_spte, new_spte)) {
>> 		flush |= !shadow_accessed_mask;
>> 		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
>> 	}
>>
> 
> I think we can just set flush = true in the then block instead of 
> flush |= !shadow_accessed_mask.
> 
> And while we are at it, is there any reason to flush the TLB when setting the
> A or D bit in the PTE? If not, we can remove this earlier block since the
> clearing case is now handled in the separate if blocks for accessed and dirty:

Hmm, flushing the TLB is expensive.  I would have thought that we want to avoid
a shootdown (kvm_flush_remote_tlbs) for the A and D bits.  But it's probably rare
enough that it doesn't matter, and the existing code has

        /*
         * Flush TLB when accessed/dirty bits are changed in the page tables,
         * to guarantee consistency between TLB and page tables.
         */
        if (spte_is_bit_changed(old_spte, new_spte,
                                shadow_accessed_mask | shadow_dirty_mask))
                ret = true;

so yeah.

> 	if (spte_is_bit_changed(old_spte, new_spte,
>                            shadow_accessed_mask | shadow_dirty_mask))
> 		flush = true;
> 
> Also, instead of spte_is_access_tracking_enabled(), I’ve added is_accessed_spte
> as you suggested later and used that here as well.

Yes, that makes more sense.  Thanks!
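For reference, a sketch of what such helpers could look like (my sketch, not
the patch code; it assumes the is_access_track_spte() predicate from the
series):

	static bool is_accessed_spte(u64 spte)
	{
		/* Without a hardware A bit, "not access-tracked" means accessed. */
		return shadow_accessed_mask ? (spte & shadow_accessed_mask)
					    : !is_access_track_spte(spte);
	}

	static bool is_dirty_spte(u64 spte)
	{
		/* Without a hardware D bit, writability stands in for dirty. */
		return shadow_dirty_mask ? (spte & shadow_dirty_mask)
					 : (spte & PT_WRITABLE_MASK);
	}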

>>> -	if (spte_is_bit_cleared(old_spte, new_spte, shadow_dirty_mask))
>>> +
>>> +	if (shadow_dirty_mask ?
>>> +	    spte_is_bit_cleared(old_spte, new_spte, shadow_dirty_mask) :
>>> +	    writable_cleared)
>>
>> writable_cleared can be inline here and written as
>>
>> 	spte_is_bit_cleared(old_spte, new_spte, PT_WRITABLE_MASK)
>>
>> so
>>
>> 	if (shadow_dirty_mask ?
>> 	    spte_is_bit_cleared(old_spte, new_spte, shadow_dirty_mask) :
>> 	    spte_is_bit_cleared(old_spte, new_spte, PT_WRITABLE_MASK)) {
>> 		flush |= !shadow_dirty_mask;
>> 		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
>> 	}
>>>  		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
>>>  
>>> -	return ret;
>>> +	return flush;
>>>  }
>>
>> But please anticipate the changes to this function, except for 
>> introducing is_access_track_spte of course, to a separate function.
> 
> Sorry, I didn’t exactly understand what you meant by the last line. But I have
> made it like this:
> 
> 	if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte)) {
> 		flush = true;
> 		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
> 	}
> 	if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte)) {
> 		flush = true;
> 		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
> 	}

The idea is to split this patch in two.  You can first refactor the function to
have the above code and introduce is_accessed_spte/is_dirty_spte.  Then, you
add the lockless access tracking on top.  But it's okay if we leave it for a
subsequent review.  There are going to be many changes already between v2 and v3!

> I’ve replaced the checks with if (is_accessed_spte...) / if (is_dirty_spte...)

Good!

>>> +	/*
>>> +	 * Any PTE marked for access tracking should also be marked for dirty
>>> +	 * tracking (by being non-writable)
>>> +	 */
>>> +	spte &= ~PT_WRITABLE_MASK;
>>
>> This should be implicit in the definition of 
>> shadow_acc_track_mask/value, so it's not necessary (or it can be a 
>> WARN_ONCE).
> 
> It can't be handled by shadow_acc_track_mask/value, but I have changed
> shadow_acc_track_saved_bits_mask to save only the R/X bits, which achieves the
> same result.
> 
>>
>>> +	spte &= ~(shadow_acc_track_saved_bits_mask <<
>>> +		  shadow_acc_track_saved_bits_shift);
>>
>> Should there be a WARN_ON if these bits are not zero?
> 
> No, these bits can be non-zero from a previous instance of 
> mark_spte_for_access_track. They are not cleared when the PTE is restored to
> normal state. (Though we could do that and then have a WARN_ONCE here.)

Ok, if it's not too hard, do add it.  I think it's worth having
more self-checks.
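Assuming the restore path clears the saved-bit region, the check in
mark_spte_for_access_track() could look roughly like this (a sketch using the
mask/shift names from the patch, not the exact code):

	/*
	 * Verify the saved-bit slots are free before stashing the R/X bits
	 * there, then put the SPTE into the access-tracking state.
	 */
	WARN_ONCE(spte & (shadow_acc_track_saved_bits_mask <<
			  shadow_acc_track_saved_bits_shift),
		  "kvm: mmu: access-track saved-bit slots are not zero\n");

	spte |= (spte & shadow_acc_track_saved_bits_mask) <<
		shadow_acc_track_saved_bits_shift;
	spte &= ~shadow_acc_track_mask;
	spte |= shadow_acc_track_value;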

>>> @@ -2919,10 +3063,17 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>>>  	 *
>>>  	 * Compare with set_spte where instead shadow_dirty_mask is set.
>>>  	 */
>>> -	if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) != spte)
>>> +	if (cmpxchg64(sptep, old_spte, new_spte) != old_spte)
>>>  		return false;
>>>  
>>> -	kvm_vcpu_mark_page_dirty(vcpu, gfn);
>>> +	if (remove_write_prot) {
>>
>> Stupid question ahead, why not call kvm_vcpu_mark_page_accessed in this 
>> function?
> 
> We could, but I kept that call in the same paths as before for consistency with
> the other cases. Plus, even though it is a cheap call, why add it to the fast 
> PF path if it is not necessary?

Makes sense, I said it was a stupid question. :)

>>> @@ -2953,36 +3104,43 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
>>>  
>>>  		/*
>>> -		 * Check if it is a spurious fault caused by TLB lazily flushed.
>>> +		 * Check whether the memory access that caused the fault would
>>> +		 * still cause it if it were to be performed right now. If not,
>>> +		 * then this is a spurious fault caused by TLB lazily flushed,
>>> +		 * or some other CPU has already fixed the PTE after the
>>> +		 * current CPU took the fault.
>>> +		 *
>>> +		 * If Write-Only mappings ever become supported, then the
>>> +		 * condition below would need to be changed appropriately.
>>>  		 *
>>>  		 * Need not check the access of upper level table entries since
>>>  		 * they are always ACC_ALL.
>>>  		 */
>>> -		if (is_writable_pte(spte)) {
>>> -			ret = true;
>>> +		if (((spte & PT_PRESENT_MASK) && !remove_write_prot) ||
>>> +		    valid_exec_access) {
>>> +			fault_handled = true;
>>>  			break;
>>>  		}
>>
>> Let's separate the three conditions (R/W/X):
>>
>> 		if (error_code & PFERR_FETCH_MASK) {
>> 			if ((spte & (shadow_x_mask|shadow_nx_mask))
>> 			    == shadow_x_mask) {
>> 				fault_handled = true;
>> 				break;
>> 			}
>> 		}
>> 		if (error_code & PFERR_WRITE_MASK) {
>> 			if (is_writable_pte(spte)) {
>> 				fault_handled = true;
>> 				break;
>> 			}
>> 			remove_write_prot =
>> 				spte_can_locklessly_be_made_writable(spte);
>> 		}
>> 		if (!(error_code & PFERR_PRESENT_MASK)) {
>> 			if (!is_access_track_spte(spte)) {
>> 				fault_handled = true;
>> 				break;
>> 			}
>> 			remove_acc_track = true;
>> 		}
> 
> I think the third block is incorrect, e.g. it will set fault_handled = true
> even for a completely zero PTE.

A completely zero PTE would have been filtered before by the
is_shadow_present_pte check, wouldn't it?
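In other words, by the time the R/W/X blocks run, the retry loop has already
dealt with non-present SPTEs; roughly (a control-flow sketch only, not the
patch code):

	for (;;) {
		/* ... lockless walk down to the leaf SPTE ... */

		/*
		 * A completely zero PTE is dealt with here and never reaches
		 * the R/W/X blocks below.
		 */
		if (!is_shadow_present_pte(spte))
			break;

		/* R/W/X blocks from the snippet above, then the cmpxchg
		 * fix-up and the retry-count check. */
	}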

>>>  
>>> +/* This is only supposed to be used for non-EPT mappings */
>>
>> It's only used for non-EPT mappings, why is it only *supposed* to be 
>> used for non-EPT mappings?  It seems to me that it would work.
>>
>>>  static bool need_remote_flush(u64 old, u64 new)
>>>  {
> 
> It would work, but it will return true in at least one (probably the only)
> case where it doesn’t need to, e.g. going from an acc-track PTE to a zeroed
> PTE. Though maybe that is not a big deal. Of course, we could also just
> update it to handle acc-track.

Yeah, I think it's not a big deal.  But actually is it really used only
for non-EPT?  Nested virtualization uses kvm_mmu_pte_write for EPT as well.
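For context, need_remote_flush() currently reads roughly as follows; the
acc-track -> zeroed case you mention takes the second early return (a sketch
of the current shape, assuming acc-track SPTEs count as shadow-present):

	static bool need_remote_flush(u64 old, u64 new)
	{
		if (!is_shadow_present_pte(old))
			return false;
		if (!is_shadow_present_pte(new))
			return true;	/* acc-track -> zeroed PTE lands here */
		if ((old ^ new) & PT64_BASE_ADDR_MASK)
			return true;
		old ^= shadow_nx_mask;
		new ^= shadow_nx_mask;
		return (old & ~new & PT64_PERM_MASK) != 0;
	}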

Paolo


> 
> 
> Thanks,
> Junaid
> 
