From: Suzuki K Poulose <Suzuki.Poulose@arm.com>
To: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org, marc.zyngier@arm.com, andreyknvl@google.com,
Will Deacon <Will.Deacon@arm.com>,
linux-kernel@vger.kernel.org, pbonzini@redhat.com,
paulmck@linux.vnet.ibm.com, kvmarm@lists.cs.columbia.edu,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 1/2] kvm: Fix mmu_notifier release race
Date: Fri, 28 Apr 2017 18:20:04 +0100 [thread overview]
Message-ID: <611d0ad2-f907-d41c-cdc1-5977c247b104@arm.com> (raw)
In-Reply-To: <c1232b1d-ad82-794b-1b86-4d0cc0d4cd7f@arm.com>
On 26/04/17 17:03, Suzuki K Poulose wrote:
> On 25/04/17 19:49, Radim Krčmář wrote:
>> 2017-04-24 11:10+0100, Suzuki K Poulose:
>>> KVM uses mmu_notifier (wherever available) to keep track
>>> of the changes to the mm of the guest. The guest shadow page
>>> tables are released when the VM exits via mmu_notifier->ops.release().
>>> There is a rare chance that mmu_notifier->release could be
>>> called more than once via two different paths, which could end
>>> up in a use-after-free of the kvm instance (such as [0]).
>>>
>>> e.g:
>>>
>>> thread A                               thread B
>>> --------                               --------
>>> get_signal->                           kvm_destroy_vm()->
>>>  do_exit->                              mmu_notifier_unregister->
>>>   exit_mm->                              kvm_arch_flush_shadow_all()->
>>>    exit_mmap->                            spin_lock(&kvm->mmu_lock)
>>>     mmu_notifier_release->                ....
>>>      kvm_arch_flush_shadow_all()->        ....
>>>       ... spin_lock(&kvm->mmu_lock)       ....
>>>                                           spin_unlock(&kvm->mmu_lock)
>>>                                           kvm_arch_free_kvm()
>>>       *** use after free of kvm ***
>>
>> I don't understand this race ...
>> a piece of code in mmu_notifier_unregister() says:
>>
>> /*
>> * Wait for any running method to finish, of course including
>> * ->release if it was run by mmu_notifier_release instead of us.
>> */
>> synchronize_srcu(&srcu);
>>
>> and code before that removes the notifier from the list, so it cannot be
>> called after we pass this point. mmu_notifier_release() does roughly
>> the same and explains it as:
>>
>> /*
>> * synchronize_srcu here prevents mmu_notifier_release from returning to
>> * exit_mmap (which would proceed with freeing all pages in the mm)
>> * until the ->release method returns, if it was invoked by
>> * mmu_notifier_unregister.
>> *
>> * The mmu_notifier_mm can't go away from under us because one mm_count
>> * is held by exit_mmap.
>> */
>> synchronize_srcu(&srcu);
>>
>> The call of mmu_notifier->release is protected by srcu in both cases and
>> while it seems possible that mmu_notifier->release would be called
>> twice, I don't see a combination that could result in use-after-free
>> from mmu_notifier_release after mmu_notifier_unregister() has returned.
>
> Thanks for bringing it up. I am also wondering why this gets triggered
> (but it does get triggered for sure!).
>
> The only difference I can spot between the _unregister and _release paths
> is the way we hold srcu_read_lock across the deletion of the entry from
> the list.
>
> In mmu_notifier_unregister() we do :
>
> 	id = srcu_read_lock(&srcu);
> 	/*
> 	 * exit_mmap will block in mmu_notifier_release to guarantee
> 	 * that ->release is called before freeing the pages.
> 	 */
> 	if (mn->ops->release)
> 		mn->ops->release(mn, mm);
> 	srcu_read_unlock(&srcu, id);
>
> ## Releases the srcu lock here and then goes on to grab the spin_lock.
>
> 	spin_lock(&mm->mmu_notifier_mm->lock);
> 	/*
> 	 * Can not use list_del_rcu() since __mmu_notifier_release
> 	 * can delete it before we hold the lock.
> 	 */
> 	hlist_del_init_rcu(&mn->hlist);
> 	spin_unlock(&mm->mmu_notifier_mm->lock);
>
> While in mmu_notifier_release() we hold it until the node(s) are deleted
> from the list:
>
> 	/*
> 	 * SRCU here will block mmu_notifier_unregister until
> 	 * ->release returns.
> 	 */
> 	id = srcu_read_lock(&srcu);
> 	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist)
> 		/*
> 		 * If ->release runs before mmu_notifier_unregister it must be
> 		 * handled, as it's the only way for the driver to flush all
> 		 * existing sptes and stop the driver from establishing any more
> 		 * sptes before all the pages in the mm are freed.
> 		 */
> 		if (mn->ops->release)
> 			mn->ops->release(mn, mm);
>
> 	spin_lock(&mm->mmu_notifier_mm->lock);
> 	while (unlikely(!hlist_empty(&mm->mmu_notifier_mm->list))) {
> 		mn = hlist_entry(mm->mmu_notifier_mm->list.first,
> 				 struct mmu_notifier,
> 				 hlist);
> 		/*
> 		 * We arrived before mmu_notifier_unregister so
> 		 * mmu_notifier_unregister will do nothing other than to wait
> 		 * for ->release to finish and for mmu_notifier_unregister to
> 		 * return.
> 		 */
> 		hlist_del_init_rcu(&mn->hlist);
> 	}
> 	spin_unlock(&mm->mmu_notifier_mm->lock);
> 	srcu_read_unlock(&srcu, id);
>
> ## The lock is released only after the deletion of the node.
>
> Both are followed by a synchronize_srcu(). Now, I am wondering if the
> unregister path could miss the SRCU read lock held in the _release() path
> and go on to finish its synchronize_srcu() before the item is deleted.
> Maybe we should do the read_unlock after the deletion of the node in
> _unregister (as we do in _release())?
I haven't been able to reproduce the mmu_notifier race condition that leads
to the KVM use-after-free reported at [1]. I will leave it running (with
tracepoints/ftrace) over the weekend.
>
>>
>> Doesn't [2/2] solve the exact same issue (that the release method cannot
>> be called twice in parallel)?
>
> Not really. This could be a race between a release() and one of the other
> notifier callbacks. e.g., in [0] we were hitting a use-after-free in
> kvm_unmap_hva(), where the unregister could have succeeded and released
> the KVM.
But I can reproduce this problem [0], and we need the [2/2] for arm/arm64.
[0] http://lkml.kernel.org/r/febea966-3767-21ff-3c40-1a76d1399138@suse.de
[1] http://lkml.kernel.org/r/CAAeHK+x8udHKq9xa1zkTO6ax5E8Dk32HYWfaT05FMchL2cr48g@mail.gmail.com
Thanks
Suzuki
Thread overview: 10+ messages
2017-04-24 10:10 [PATCH 0/2] kvm: Fixes for race conditions Suzuki K Poulose
2017-04-24 10:10 ` [PATCH 1/2] kvm: Fix mmu_notifier release race Suzuki K Poulose
2017-04-25 15:37 ` Christoffer Dall
2017-04-25 18:49 ` Radim Krčmář
2017-04-26 16:03 ` Suzuki K Poulose
2017-04-26 16:17 ` Paul E. McKenney
2017-04-28 17:20 ` Suzuki K Poulose [this message]
2017-05-03 13:13 ` Suzuki K Poulose
2017-04-24 10:10 ` [PATCH 2/2] kvm: arm/arm64: Fix race in resetting stage2 PGD Suzuki K Poulose
2017-04-24 12:27 ` Christoffer Dall