From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <bonzini@gnu.org>
Cc: Wanpeng Li <kernellwp@gmail.com>, Marc Zyngier <maz@kernel.org>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	kvm <kvm@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	Ben Gardon <bgardon@google.com>
Subject: Re: [PATCH v2 09/10] KVM: Don't take mmu_lock for range invalidation unless necessary
Date: Tue, 20 Apr 2021 01:17:15 +0000	[thread overview]
Message-ID: <YH4rm4W57R85tMKE@google.com> (raw)
In-Reply-To: <051f78aa-7bf8-0832-aee6-b4157a1853a0@gnu.org>

On Tue, Apr 20, 2021, Paolo Bonzini wrote:
> On 19/04/21 17:09, Sean Christopherson wrote:
> > > - this loses the rwsem fairness.  On the other hand, mm/mmu_notifier.c's
> > > own interval-tree-based filter is also using a similar mechanism that is
> > > likewise not fair, so it should be okay.
> > 
> > The one concern I had with an unfair mechanism of this nature is that, in theory,
> > the memslot update could be blocked indefinitely.
> 
> Yep, that's why I mentioned it.
> 
> > > @@ -1333,9 +1351,22 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
> > >   	WARN_ON(gen & KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS);
> > >   	slots->generation = gen | KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS;
> > > -	down_write(&kvm->mmu_notifier_slots_lock);
> > > +	/*
> > > +	 * This cannot be an rwsem because the MMU notifier must not run
> > > +	 * inside the critical section.  A sleeping rwsem cannot exclude
> > > +	 * that.
> > 
> > How on earth did you decipher that from the splat?  I stared at it for a good
> > five minutes and was completely befuddled.
> 
> Just scratch that, it makes no sense.  It's much simpler, but you have
> to look at include/linux/mmu_notifier.h to figure it out:

LOL, glad you could figure it out; I wasn't getting anywhere, mmu_notifier.h or
not.

>     invalidate_range_start
>       take pseudo lock
>       down_read()           (*)
>       release pseudo lock
>     invalidate_range_end
>       take pseudo lock      (**)
>       up_read()
>       release pseudo lock
> 
> At point (*) we take the mmu_notifier_slots_lock inside the pseudo lock;
> at point (**) we take the pseudo lock inside the mmu_notifier_slots_lock.
> 
> This could cause a deadlock (ignoring for a second that the pseudo lock
> is not a lock):
> 
> - invalidate_range_start waits on down_read(), because the rwsem is
> held by install_new_memslots
> 
> - install_new_memslots waits on down_write(), because the rwsem is
> held till (another) invalidate_range_end finishes
> 
> - invalidate_range_end waits on the pseudo lock, held by
> invalidate_range_start.
> 
> Removing the fairness of the rwsem breaks the cycle (in lockdep terms,
> it would change the *shared* rwsem readers into *shared recursive*
> readers).  This also means that there's no need for a raw spinlock.

Ahh, thanks, this finally made things click.
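
Purely as a note to self, the cycle maps onto a trivial userspace model: three
pthreads plus a writer-preferring rwlock standing in for a fair rwsem, with the
"pseudo lock" dependency faked via a mutex+condvar.  Toy code, all names
invented, obviously not the kernel logic:

/*
 * Build with "gcc -pthread".  Run with any argument to make the rwlock
 * writer-preferring ("fair", like a kernel rwsem): it then gets killed by
 * the alarm() instead of printing "no deadlock".
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t slots_lock;
static pthread_mutex_t pseudo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pseudo_cond = PTHREAD_COND_INITIALIZER;
static int other_start_done;

/* Notifier A: read-locks across start()..end(); its end() then waits on
 * notifier B's start() (the "pseudo lock" dependency). */
static void *notifier_a(void *arg)
{
	pthread_rwlock_rdlock(&slots_lock);
	sleep(2);				/* let the writer queue up */
	pthread_mutex_lock(&pseudo_lock);
	while (!other_start_done)
		pthread_cond_wait(&pseudo_cond, &pseudo_lock);
	pthread_mutex_unlock(&pseudo_lock);
	pthread_rwlock_unlock(&slots_lock);
	return arg;
}

/* install_new_memslots(): waits for the write lock behind notifier A. */
static void *updater(void *arg)
{
	sleep(1);
	pthread_rwlock_wrlock(&slots_lock);
	pthread_rwlock_unlock(&slots_lock);
	return arg;
}

/* Notifier B: its start() read-locks; a fair rwsem queues it behind the
 * pending writer, so it never wakes notifier A's end(). */
static void *notifier_b(void *arg)
{
	sleep(2);
	pthread_rwlock_rdlock(&slots_lock);
	pthread_mutex_lock(&pseudo_lock);
	other_start_done = 1;
	pthread_cond_signal(&pseudo_cond);
	pthread_mutex_unlock(&pseudo_lock);
	pthread_rwlock_unlock(&slots_lock);
	return arg;
}

int main(int argc, char **argv)
{
	pthread_rwlockattr_t attr;
	pthread_t a, b, c;

	pthread_rwlockattr_init(&attr);
	if (argc > 1)		/* pending writer blocks new readers */
		pthread_rwlockattr_setkind_np(&attr,
			PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
	pthread_rwlock_init(&slots_lock, &attr);

	alarm(10);		/* SIGALRM's default action kills us on deadlock */
	pthread_create(&a, NULL, notifier_a, NULL);
	pthread_create(&b, NULL, updater, NULL);
	pthread_create(&c, NULL, notifier_b, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	pthread_join(c, NULL);
	puts("no deadlock");
	return 0;
}

With glibc's default reader-preferring rwlock, notifier B's read lock jumps
ahead of the pending writer and everything completes, which matches your point
that dropping the rwsem's fairness is what breaks the cycle.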

> Given this simple explanation, I think it's okay to include this

LOL, "simple".

> patch in the merge window pull request, with the fix after my
> signature squashed in.  The fix actually undoes a lot of the
> changes to __kvm_handle_hva_range that this patch made, so the
> result is relatively simple.  You can already find the result
> in kvm/queue.

...

>  static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
>  						  const struct kvm_hva_range *range)
>  {
> @@ -515,10 +495,6 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
>  	idx = srcu_read_lock(&kvm->srcu);
> -	if (range->must_lock &&
> -	    kvm_mmu_lock_and_check_handler(kvm, range, &locked))
> -		goto out_unlock;
> -
>  	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>  		slots = __kvm_memslots(kvm, i);
>  		kvm_for_each_memslot(slot, slots) {
> @@ -547,8 +523,14 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
>  			gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot);
>  			gfn_range.slot = slot;
> -			if (kvm_mmu_lock_and_check_handler(kvm, range, &locked))
> -				goto out_unlock;
> +			if (!locked) {
> +				locked = true;
> +				KVM_MMU_LOCK(kvm);
> +				if (!IS_KVM_NULL_FN(range->on_lock))
> +					range->on_lock(kvm, range->start, range->end);
> +				if (IS_KVM_NULL_FN(range->handler))
> +					break;

This can/should be "goto out_unlock"; "break" only takes us out of the memslots
walk, whereas we want to get out of the address space loop.  Not a functional
problem, but we might walk all SMM memslots unnecessarily.
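
E.g. (completely untested, and it would mean resurrecting the out_unlock label
that this fixup removes):

			if (!locked) {
				locked = true;
				KVM_MMU_LOCK(kvm);
				if (!IS_KVM_NULL_FN(range->on_lock))
					range->on_lock(kvm, range->start, range->end);
				if (IS_KVM_NULL_FN(range->handler))
					goto out_unlock;
			}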

> +			}
>  			ret |= range->handler(kvm, &gfn_range);
>  		}
> @@ -557,7 +539,6 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
>  	if (range->flush_on_ret && (ret || kvm->tlbs_dirty))
>  		kvm_flush_remote_tlbs(kvm);
> -out_unlock:
>  	if (locked)
>  		KVM_MMU_UNLOCK(kvm);
> @@ -580,7 +561,6 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
>  		.pte		= pte,
>  		.handler	= handler,
>  		.on_lock	= (void *)kvm_null_fn,
> -		.must_lock	= false,
>  		.flush_on_ret	= true,
>  		.may_block	= false,
>  	};
> @@ -600,7 +580,6 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn
>  		.pte		= __pte(0),
>  		.handler	= handler,
>  		.on_lock	= (void *)kvm_null_fn,
> -		.must_lock	= false,
>  		.flush_on_ret	= false,
>  		.may_block	= false,
>  	};
> @@ -620,13 +599,11 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
>  	 * .change_pte() must be surrounded by .invalidate_range_{start,end}(),

While you're squashing, want to change the above comma to a period?

>  	 * If mmu_notifier_count is zero, then start() didn't find a relevant
>  	 * memslot and wasn't forced down the slow path; rechecking here is
> -	 * unnecessary.  This can only occur if memslot updates are blocked;
> -	 * otherwise, mmu_notifier_count is incremented unconditionally.
> +	 * unnecessary.
>  	 */
> -	if (!kvm->mmu_notifier_count) {
> -		lockdep_assert_held(&kvm->mmu_notifier_slots_lock);
> +	WARN_ON_ONCE(!READ_ONCE(kvm->mn_active_invalidate_count));
> +	if (!kvm->mmu_notifier_count)
>  		return;
> -	}
>  	kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
>  }

...

> @@ -1333,9 +1315,22 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
>  	WARN_ON(gen & KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS);
>  	slots->generation = gen | KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS;
> -	down_write(&kvm->mmu_notifier_slots_lock);
> +	/*
> +	 * This cannot be an rwsem because the MMU notifier must not run
> +	 * inside the critical section, which cannot be excluded with a
> +	 * sleeping rwsem.

Any objection to replacing this comment with a rephrased version of your
statement about "shared" vs. "shared recursive" readers and breaking the
fairness cycle?  IIUC, it's not "running inside the critical section" that's
problematic; it's that sleeping in down_write() can cause deadlock by blocking
future readers.
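
Something along these lines, wording very much not final:

	/*
	 * This cannot be an rwsem: a fair rwsem queues new readers behind a
	 * waiting writer, so sleeping in down_write() until all in-flight
	 * invalidations drain would block future invalidate_range_start()
	 * calls, and invalidate_range_end() may in turn be waiting on one of
	 * those blocked start() calls, i.e. deadlock.  In lockdep terms, the
	 * notifier side needs "shared recursive" readers, not "shared" ones.
	 */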

Thanks much!

> +	 */
> +	spin_lock(&kvm->mn_invalidate_lock);
> +	prepare_to_rcuwait(&kvm->mn_memslots_update_rcuwait);
> +	while (kvm->mn_active_invalidate_count) {
> +		set_current_state(TASK_UNINTERRUPTIBLE);
> +		spin_unlock(&kvm->mn_invalidate_lock);
> +		schedule();
> +		spin_lock(&kvm->mn_invalidate_lock);
> +	}
> +	finish_rcuwait(&kvm->mn_memslots_update_rcuwait);
>  	rcu_assign_pointer(kvm->memslots[as_id], slots);
> -	up_write(&kvm->mmu_notifier_slots_lock);
> +	spin_unlock(&kvm->mn_invalidate_lock);
>  	synchronize_srcu_expedited(&kvm->srcu);
> -- 
> 2.26.2
> 
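
And just to write down my mental model of the other half: I'm assuming the
notifier side pairs with the above roughly like this (sketch from memory, not
copied from kvm/queue, so the details may be off):

static void kvm_mmu_notifier_invalidate_range_start(...)
{
	...
	/* Bump the count before doing any of the actual invalidation work. */
	spin_lock(&kvm->mn_invalidate_lock);
	kvm->mn_active_invalidate_count++;
	spin_unlock(&kvm->mn_invalidate_lock);
	...
}

static void kvm_mmu_notifier_invalidate_range_end(...)
{
	bool wake;
	...
	spin_lock(&kvm->mn_invalidate_lock);
	wake = !--kvm->mn_active_invalidate_count;
	spin_unlock(&kvm->mn_invalidate_lock);

	/* Pairs with the rcuwait loop in install_new_memslots(). */
	if (wake)
		rcuwait_wake_up(&kvm->mn_memslots_update_rcuwait);
	...
}

I.e. the count is elevated across start()..end(), and install_new_memslots()
just waits for it to hit zero, no rwsem required.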

Thread overview: 84+ messages

2021-04-02  0:56 [PATCH v2 00/10] KVM: Consolidate and optimize MMU notifiers Sean Christopherson
2021-04-02  0:56 ` [PATCH v2 01/10] KVM: Assert that notifier count is elevated in .change_pte() Sean Christopherson
2021-04-02 11:08   ` Paolo Bonzini
2021-04-02  0:56 ` [PATCH v2 02/10] KVM: Move x86's MMU notifier memslot walkers to generic code Sean Christopherson
2021-04-02  0:56 ` [PATCH v2 03/10] KVM: arm64: Convert to the gfn-based MMU notifier callbacks Sean Christopherson
2021-04-12 10:12   ` Marc Zyngier
2021-04-02  0:56 ` [PATCH v2 04/10] KVM: MIPS/MMU: " Sean Christopherson
2021-04-02  0:56 ` [PATCH v2 05/10] KVM: PPC: " Sean Christopherson
2021-04-02  0:56 ` [PATCH v2 06/10] KVM: Kill off the old hva-based " Sean Christopherson
2021-04-02  0:56 ` [PATCH v2 07/10] KVM: Move MMU notifier's mmu_lock acquisition into common helper Sean Christopherson
2021-04-02  9:35   ` Paolo Bonzini
2021-04-02 14:59     ` Sean Christopherson
2021-04-02  0:56 ` [PATCH v2 08/10] KVM: Take mmu_lock when handling MMU notifier iff the hva hits a memslot Sean Christopherson
2021-04-02  0:56 ` [PATCH v2 09/10] KVM: Don't take mmu_lock for range invalidation unless necessary Sean Christopherson
2021-04-02  9:34   ` Paolo Bonzini
2021-04-02 14:59     ` Sean Christopherson
2021-04-19  8:49   ` Wanpeng Li
2021-04-19 13:50     ` Paolo Bonzini
2021-04-19 15:09       ` Sean Christopherson
2021-04-19 22:09         ` Paolo Bonzini
2021-04-20  1:17           ` Sean Christopherson [this message]
2021-04-02  0:56 ` [PATCH v2 10/10] KVM: x86/mmu: Allow yielding during MMU notifier unmap/zap, if possible Sean Christopherson
2021-04-02 12:17 ` [PATCH v2 00/10] KVM: Consolidate and optimize MMU notifiers Paolo Bonzini
2021-04-12 10:27   ` Marc Zyngier
