All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Ben Gardon <bgardon@google.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
	Peter Shier <pshier@google.com>,
	Peter Feiner <pfeiner@google.com>,
	Junaid Shahid <junaids@google.com>,
	Jim Mattson <jmattson@google.com>,
	Yulei Zhang <yulei.kernel@gmail.com>,
	Wanpeng Li <kernellwp@gmail.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Subject: Re: [PATCH 11/24] kvm: x86/mmu: Put TDP MMU PT walks in RCU read-critical section
Date: Wed, 20 Jan 2021 14:19:32 -0800	[thread overview]
Message-ID: <YAisdPTXGDqzil5G@google.com> (raw)
In-Reply-To: <20210112181041.356734-12-bgardon@google.com>

On Tue, Jan 12, 2021, Ben Gardon wrote:                                         
> In order to enable concurrent modifications to the paging structures in       
> the TDP MMU, threads must be able to safely remove pages of page table        
> memory while other threads are traversing the same memory. To ensure          
> threads do not access PT memory after it is freed, protect PT memory          
> with RCU.                                                                     
                                                                                
Normally I like splitting up patches, but the three RCU patches (11-13) probably
need to be combined into a single patch.  I assume you introduced the RCU       
readers in a separate patch to isolate deadlocks, but it's impossible to give   
this patch a proper review without peeking ahead to see how what's actually     
being protected with RCU.                                                       
                                                                                
The combined changelog should also explain why READING_SHADOW_PAGE_TABLES isn't 
a good solution.  I suspect the answer is because the longer-running walks would
disable IRQs for too long, but that should be explicitly documented.

> Reviewed-by: Peter Feiner <pfeiner@google.com>                                
>                                                                               
> Signed-off-by: Ben Gardon <bgardon@google.com>                                
> ---                                                                           
>  arch/x86/kvm/mmu/tdp_mmu.c | 53 ++++++++++++++++++++++++++++++++++++--       
>  1 file changed, 51 insertions(+), 2 deletions(-)                             
>                                                                               
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c          
> index e8f35cd46b4c..662907d374b3 100644                                       
> --- a/arch/x86/kvm/mmu/tdp_mmu.c                                              
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c                                              
> @@ -458,11 +458,14 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
>   * Return true if this function yielded, the TLBs were flushed, and the      
>   * iterator's traversal was reset. Return false if a yield was not needed.   
>   */                                                                          
> -static bool tdp_mmu_iter_flush_cond_resched(struct kvm *kvm, struct tdp_iter *iter)
> +static bool tdp_mmu_iter_flush_cond_resched(struct kvm *kvm,                 
> +             struct tdp_iter *iter)                                          
                                                                                
Unrelated newline.                                                              
                                                                                
>  {                                                                            
>       if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {                 
>               kvm_flush_remote_tlbs(kvm);                                     
> +             rcu_read_unlock();                                              
                                                                                
I'm 99% certain rcu_read_unlock() can be moved before the TLB flush.  IIUC, RCU 
is protecting only the host kernel's software walks; the only true "writer" is  
immediately preceded by a remote TLB flush (in patch 13).                       
                                                                                
        kvm_flush_remote_tlbs_with_address(kvm, gfn,                            
                                           KVM_PAGES_PER_HPAGE(level));         
                                                                                
        call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);                  
                                                                                
That also resolves an inconsistency with zap_gfn_range(), which unlocks before
doing the remote flush.  Ditto for zap_collapsible_spte_range(), and I think a
few other flows.

>  		cond_resched_lock(&kvm->mmu_lock);
> +		rcu_read_lock();
>  		tdp_iter_refresh_walk(iter);
>  		return true;
>  	} else
> @@ -483,7 +486,9 @@ static bool tdp_mmu_iter_flush_cond_resched(struct kvm *kvm, struct tdp_iter *it
>  static bool tdp_mmu_iter_cond_resched(struct kvm *kvm, struct tdp_iter *iter)
>  {
>  	if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
> +		rcu_read_unlock();
>  		cond_resched_lock(&kvm->mmu_lock);
> +		rcu_read_lock();
>  		tdp_iter_refresh_walk(iter);
>  		return true;
>  	} else
> @@ -508,6 +513,8 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
>  	gfn_t last_goal_gfn = start;
>  	bool flush_needed = false;
>  
> +	rcu_read_lock();
> +
>  	tdp_root_for_each_pte(iter, root, start, end) {
>  		/* Ensure forward progress has been made before yielding. */
>  		if (can_yield && iter.goal_gfn != last_goal_gfn &&
> @@ -538,6 +545,8 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
>  		tdp_mmu_set_spte(kvm, &iter, 0);
>  		flush_needed = true;
>  	}
> +
> +	rcu_read_unlock();

Unlock before TLB flush.  <-------

>  	return flush_needed;
>  }

...

> @@ -844,6 +863,8 @@ static int set_tdp_spte(struct kvm *kvm, struct kvm_memory_slot *slot,
>  	u64 new_spte;
>  	int need_flush = 0;
>  
> +	rcu_read_lock();
> +
>  	WARN_ON(pte_huge(*ptep));
>  
>  	new_pfn = pte_pfn(*ptep);
> @@ -872,6 +893,8 @@ static int set_tdp_spte(struct kvm *kvm, struct kvm_memory_slot *slot,
>  	if (need_flush)
>  		kvm_flush_remote_tlbs_with_address(kvm, gfn, 1);
>  
> +	rcu_read_unlock();

Unlock before flush?

> +
>  	return 0;
>  }
>  
  
...

> @@ -1277,10 +1322,14 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
>  
>  	*root_level = vcpu->arch.mmu->shadow_root_level;
>  
> +	rcu_read_lock();

Hrm, isn't this an existing bug?  And also not really the correct fix?  mmu_lock
is not held here, so the existing code has no protections.  Using
walk_shadow_page_lockless_begin/end() feels more appropriate for this particular
walk.

> +
>  	tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) {
>  		leaf = iter.level;
>  		sptes[leaf] = iter.old_spte;
>  	}
>  
> +	rcu_read_unlock();
> +
>  	return leaf;
>  }
> -- 
> 2.30.0.284.gd98b1dd5eaa7-goog
> 

  reply	other threads:[~2021-01-20 22:30 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-12 18:10 [PATCH 00/24] Allow parallel page faults with TDP MMU Ben Gardon
2021-01-12 18:10 ` [PATCH 01/24] locking/rwlocks: Add contention detection for rwlocks Ben Gardon
2021-01-12 21:20   ` kernel test robot
2021-01-12 18:10 ` [PATCH 02/24] sched: Add needbreak " Ben Gardon
2021-01-12 18:10 ` [PATCH 03/24] sched: Add cond_resched_rwlock Ben Gardon
2021-01-12 18:10 ` [PATCH 04/24] kvm: x86/mmu: change TDP MMU yield function returns to match cond_resched Ben Gardon
2021-01-20 18:38   ` Sean Christopherson
2021-01-21 20:22     ` Paolo Bonzini
2021-01-26 14:11     ` Paolo Bonzini
2021-01-12 18:10 ` [PATCH 05/24] kvm: x86/mmu: Fix yielding in TDP MMU Ben Gardon
2021-01-20 19:28   ` Sean Christopherson
2021-01-22  1:06     ` Ben Gardon
2021-01-12 18:10 ` [PATCH 06/24] kvm: x86/mmu: Skip no-op changes in TDP MMU functions Ben Gardon
2021-01-20 19:51   ` Sean Christopherson
2021-01-25 23:51     ` Ben Gardon
2021-01-12 18:10 ` [PATCH 07/24] kvm: x86/mmu: Add comment on __tdp_mmu_set_spte Ben Gardon
2021-01-26 14:13   ` Paolo Bonzini
2021-01-12 18:10 ` [PATCH 08/24] kvm: x86/mmu: Add lockdep when setting a TDP MMU SPTE Ben Gardon
2021-01-20 19:58   ` Sean Christopherson
2021-01-26 14:13   ` Paolo Bonzini
2021-01-12 18:10 ` [PATCH 09/24] kvm: x86/mmu: Don't redundantly clear TDP MMU pt memory Ben Gardon
2021-01-20 20:06   ` Sean Christopherson
2021-01-26 14:14   ` Paolo Bonzini
2021-01-12 18:10 ` [PATCH 10/24] kvm: x86/mmu: Factor out handle disconnected pt Ben Gardon
2021-01-20 20:30   ` Sean Christopherson
2021-01-26 14:14   ` Paolo Bonzini
2021-01-12 18:10 ` [PATCH 11/24] kvm: x86/mmu: Put TDP MMU PT walks in RCU read-critical section Ben Gardon
2021-01-20 22:19   ` Sean Christopherson [this message]
2021-01-12 18:10 ` [PATCH 12/24] kvm: x86/kvm: RCU dereference tdp mmu page table links Ben Gardon
2021-01-22 18:32   ` Sean Christopherson
2021-01-26 18:17     ` Ben Gardon
2021-01-12 18:10 ` [PATCH 13/24] kvm: x86/mmu: Only free tdp_mmu pages after a grace period Ben Gardon
2021-01-12 18:10 ` [PATCH 14/24] kvm: mmu: Wrap mmu_lock lock / unlock in a function Ben Gardon
2021-01-13  2:35   ` kernel test robot
2021-01-13  2:35     ` kernel test robot
2021-01-12 18:10 ` [PATCH 15/24] kvm: mmu: Wrap mmu_lock cond_resched and needbreak Ben Gardon
2021-01-21  0:19   ` Sean Christopherson
2021-01-21 20:17     ` Paolo Bonzini
2021-01-26 14:38     ` Paolo Bonzini
2021-01-26 17:47       ` Ben Gardon
2021-01-26 17:55         ` Paolo Bonzini
2021-01-26 18:11           ` Ben Gardon
2021-01-26 20:47             ` Paolo Bonzini
2021-01-27 20:08               ` Ben Gardon
2021-01-27 20:55                 ` Paolo Bonzini
2021-01-27 21:20                   ` Ben Gardon
2021-01-28  8:18                     ` Paolo Bonzini
2021-01-12 18:10 ` [PATCH 16/24] kvm: mmu: Wrap mmu_lock assertions Ben Gardon
2021-01-26 14:29   ` Paolo Bonzini
2021-01-12 18:10 ` [PATCH 17/24] kvm: mmu: Move mmu_lock to struct kvm_arch Ben Gardon
2021-01-12 18:10 ` [PATCH 18/24] kvm: x86/mmu: Use an rwlock for the x86 TDP MMU Ben Gardon
2021-01-21  0:45   ` Sean Christopherson
2021-01-12 18:10 ` [PATCH 19/24] kvm: x86/mmu: Protect tdp_mmu_pages with a lock Ben Gardon
2021-01-21 19:22   ` Sean Christopherson
2021-01-21 21:32     ` Sean Christopherson
2021-01-26 14:27       ` Paolo Bonzini
2021-01-26 21:47         ` Ben Gardon
2021-01-26 22:02         ` Sean Christopherson
2021-01-26 22:09           ` Sean Christopherson
2021-01-27 12:40           ` Paolo Bonzini
2021-01-26 13:37   ` Paolo Bonzini
2021-01-26 21:07     ` Ben Gardon
2021-01-12 18:10 ` [PATCH 20/24] kvm: x86/mmu: Add atomic option for setting SPTEs Ben Gardon
2021-01-13  0:05   ` kernel test robot
2021-01-13  0:05     ` kernel test robot
2021-01-26 14:21   ` Paolo Bonzini
2021-01-12 18:10 ` [PATCH 21/24] kvm: x86/mmu: Use atomic ops to set SPTEs in TDP MMU map Ben Gardon
2021-01-12 18:10 ` [PATCH 22/24] kvm: x86/mmu: Flush TLBs after zap in TDP MMU PF handler Ben Gardon
2021-01-21  0:05   ` Sean Christopherson
2021-01-12 18:10 ` [PATCH 23/24] kvm: x86/mmu: Freeze SPTEs in disconnected pages Ben Gardon
2021-01-12 18:10 ` [PATCH 24/24] kvm: x86/mmu: Allow parallel page faults for the TDP MMU Ben Gardon
2021-01-21  0:55   ` Sean Christopherson
2021-01-26 21:57     ` Ben Gardon
2021-01-27 17:14       ` Sean Christopherson
2021-01-26 13:37   ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YAisdPTXGDqzil5G@google.com \
    --to=seanjc@google.com \
    --cc=bgardon@google.com \
    --cc=jmattson@google.com \
    --cc=junaids@google.com \
    --cc=kernellwp@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=pfeiner@google.com \
    --cc=pshier@google.com \
    --cc=vkuznets@redhat.com \
    --cc=xiaoguangrong.eric@gmail.com \
    --cc=yulei.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.