From: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
To: "pbonzini@redhat.com" <pbonzini@redhat.com>,
"sean.j.christopherson@intel.com"
<sean.j.christopherson@intel.com>,
"vkuznets@redhat.com" <vkuznets@redhat.com>,
"wanpengli@tencent.com" <wanpengli@tencent.com>,
"jmattson@google.com" <jmattson@google.com>,
"joro@8bytes.org" <joro@8bytes.org>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"mingo@redhat.com" <mingo@redhat.com>,
"bp@alien8.de" <bp@alien8.de>, "x86@kernel.org" <x86@kernel.org>,
"hpa@zytor.com" <hpa@zytor.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: Felipe Franciosi <felipe@nutanix.com>
Subject: Re: [RFC PATCH] KVM: x86: Fix APIC page invalidation race
Date: Sat, 6 Jun 2020 05:00:18 +0000
Message-ID: <75DCBAE1-6DC6-4450-9697-AD27891B497B@nutanix.com>
In-Reply-To: <20200606042627.61070-1-eiichi.tsukata@nutanix.com>
Hello,
The race window I described in the commit message is quite small, so the problem is difficult to reproduce as-is.
With the following 'delay' patch applied, however, it becomes very easy to reproduce.
```
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c17e6eb9ad43..b6728bf80a7d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -55,6 +55,7 @@
 #include <linux/sched/stat.h>
 #include <linux/sched/isolation.h>
 #include <linux/mem_encrypt.h>
+#include <linux/delay.h>
 
 #include <trace/events/kvm.h>
 
@@ -8161,8 +8162,10 @@ int kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
 	 * Update it when it becomes invalid.
 	 */
 	apic_address = gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
-	if (start <= apic_address && apic_address < end)
+	if (start <= apic_address && apic_address < end) {
 		kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
+		mdelay(1000);
+	}
 
 	return 0;
 }
```
Steps to Reproduce:
- Start a Windows VM (e.g. Windows Server 2016) and play a YouTube video in it to generate frequent VM entries/exits.
- Run 'stress --vm X --vm-bytes Y' to create memory pressure so that the APIC page gets swapped out.
- The Windows guest will crash with BugCheck 0x109.
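For anyone who wants the ordering spelled out, the race from the commit message quoted below can be modelled in user space. This is only a sketch under my own simplifying assumptions, not kernel code: struct host_page, struct model_vcpu, request_reload(), enter_guest() and vcpu_sees_stale_page() are hypothetical names made up for this illustration. The pending flag stands in for KVM_REQ_APIC_PAGE_RELOAD and the cached pointer stands in for the APIC address held in the VMCS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the race; none of this is kernel code. */
struct host_page {
	bool mapped;	/* false once the invalidator has unmapped the page */
};

struct model_vcpu {
	const struct host_page *apic_cache;	/* stands in for the VMCS APIC address */
	bool reload_pending;			/* stands in for KVM_REQ_APIC_PAGE_RELOAD */
};

/* Invalidator side: ask the vCPU to refresh its cached address. */
static void request_reload(struct model_vcpu *vcpu)
{
	vcpu->reload_pending = true;
}

/*
 * vCPU side: on guest entry, re-resolve the APIC page if a reload was
 * requested. 'backing' is whatever page the lookup yields at that moment.
 */
static void enter_guest(struct model_vcpu *vcpu, const struct host_page *backing)
{
	if (vcpu->reload_pending) {
		vcpu->apic_cache = backing;
		vcpu->reload_pending = false;
	}
}

/*
 * Returns true if the vCPU ends up caching an unmapped page.
 * request_before_unmap == true models raising the request from
 * ..._range_start(); false models raising it only after the unmap.
 */
static bool vcpu_sees_stale_page(bool request_before_unmap)
{
	struct host_page old_page = { .mapped = true };
	struct host_page new_page = { .mapped = true };
	struct model_vcpu vcpu = { .apic_cache = &old_page, .reload_pending = false };

	if (request_before_unmap) {
		request_reload(&vcpu);
		/* vCPU wins the race: the old page is still mapped, so it is re-cached. */
		enter_guest(&vcpu, &old_page);
		old_page.mapped = false;	/* invalidator actually unmaps the page */
		/* No request is pending any more, so the cache is never refreshed. */
		enter_guest(&vcpu, &new_page);
	} else {
		old_page.mapped = false;	/* unmap first */
		request_reload(&vcpu);
		enter_guest(&vcpu, &new_page);	/* reload now picks up the new page */
	}

	return !vcpu.apic_cache->mapped;
}
```

In this model, vcpu_sees_stale_page(true) returns true (the vCPU keeps pointing at the unmapped page) and vcpu_sees_stale_page(false) returns false. The mdelay() in the patch above simply widens the gap between the request and the unmap, so a real vCPU almost always gets its reload in before the unmap.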
Thanks,
Eiichi
> On Jun 6, 2020, at 13:26, Eiichi Tsukata <eiichi.tsukata@nutanix.com> wrote:
>
> Commit b1394e745b94 ("KVM: x86: fix APIC page invalidation") tried to
> fix inappropriate APIC page invalidation by re-introducing arch specific
> kvm_arch_mmu_notifier_invalidate_range() and calling it from
> kvm_mmu_notifier_invalidate_range_start(). But there can still be the
> following race, because the VMCS APIC address cache can be updated
> *before* the page is unmapped.
>
> Race:
> (Invalidator) kvm_mmu_notifier_invalidate_range_start()
> (Invalidator) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD)
> (KVM VCPU) vcpu_enter_guest()
> (KVM VCPU) kvm_vcpu_reload_apic_access_page()
> (Invalidator) actually unmap page
>
> Symptom:
> The above race can let the guest OS access an already freed page and
> see corrupted APIC register values. In particular, Windows checks for
> LAPIC modification, so the race can cause a BSOD with BugCheck
> CRITICAL_STRUCTURE_CORRUPTION (109). These symptoms are the same as
> those we previously saw in
> https://bugzilla.kernel.org/show_bug.cgi?id=197951 and those we are
> currently seeing in
> https://bugzilla.redhat.com/show_bug.cgi?id=1751017 .
>
> To prevent the guest OS from accessing an already freed page, this patch calls
> kvm_arch_mmu_notifier_invalidate_range() from
> kvm_mmu_notifier_invalidate_range() instead of ..._range_start().
>
> Fixes: b1394e745b94 ("KVM: x86: fix APIC page invalidation")
> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
> ---
> arch/x86/kvm/x86.c | 7 ++-----
> include/linux/kvm_host.h | 4 ++--
> virt/kvm/kvm_main.c | 26 ++++++++++++++++----------
> 3 files changed, 20 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c17e6eb9ad43..1700aade39d1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8150,9 +8150,8 @@ static void vcpu_load_eoi_exitmap(struct kvm_vcpu *vcpu)
> kvm_x86_ops.load_eoi_exitmap(vcpu, eoi_exit_bitmap);
> }
>
> -int kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
> - unsigned long start, unsigned long end,
> - bool blockable)
> +void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
> + unsigned long start, unsigned long end)
> {
> unsigned long apic_address;
>
> @@ -8163,8 +8162,6 @@ int kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
> apic_address = gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
> if (start <= apic_address && apic_address < end)
> kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
> -
> - return 0;
> }
>
> void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 131cc1527d68..92efa39ea3d7 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1406,8 +1406,8 @@ static inline long kvm_arch_vcpu_async_ioctl(struct file *filp,
> }
> #endif /* CONFIG_HAVE_KVM_VCPU_ASYNC_IOCTL */
>
> -int kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
> - unsigned long start, unsigned long end, bool blockable);
> +void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
> + unsigned long start, unsigned long end);
>
> #ifdef CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE
> int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 731c1e517716..77aa91fb08d2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -155,10 +155,9 @@ static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm);
> static unsigned long long kvm_createvm_count;
> static unsigned long long kvm_active_vms;
>
> -__weak int kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
> - unsigned long start, unsigned long end, bool blockable)
> +__weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
> + unsigned long start, unsigned long end)
> {
> - return 0;
> }
>
> bool kvm_is_zone_device_pfn(kvm_pfn_t pfn)
> @@ -384,6 +383,18 @@ static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
> return container_of(mn, struct kvm, mmu_notifier);
> }
>
> +static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn,
> + struct mm_struct *mm,
> + unsigned long start, unsigned long end)
> +{
> + struct kvm *kvm = mmu_notifier_to_kvm(mn);
> + int idx;
> +
> + idx = srcu_read_lock(&kvm->srcu);
> + kvm_arch_mmu_notifier_invalidate_range(kvm, start, end);
> + srcu_read_unlock(&kvm->srcu, idx);
> +}
> +
> static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
> struct mm_struct *mm,
> unsigned long address,
> @@ -408,7 +419,6 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
> {
> struct kvm *kvm = mmu_notifier_to_kvm(mn);
> int need_tlb_flush = 0, idx;
> - int ret;
>
> idx = srcu_read_lock(&kvm->srcu);
> spin_lock(&kvm->mmu_lock);
> @@ -425,14 +435,9 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
> kvm_flush_remote_tlbs(kvm);
>
> spin_unlock(&kvm->mmu_lock);
> -
> - ret = kvm_arch_mmu_notifier_invalidate_range(kvm, range->start,
> - range->end,
> - mmu_notifier_range_blockable(range));
> -
> srcu_read_unlock(&kvm->srcu, idx);
>
> - return ret;
> + return 0;
> }
>
> static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
> @@ -538,6 +543,7 @@ static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
> }
>
> static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
> + .invalidate_range = kvm_mmu_notifier_invalidate_range,
> .invalidate_range_start = kvm_mmu_notifier_invalidate_range_start,
> .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end,
> .clear_flush_young = kvm_mmu_notifier_clear_flush_young,
> --
> 2.21.3
>