linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
Cc: "sean.j.christopherson@intel.com"
	<sean.j.christopherson@intel.com>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>,
	"wanpengli@tencent.com" <wanpengli@tencent.com>,
	"jmattson@google.com" <jmattson@google.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"bp@alien8.de" <bp@alien8.de>, "x86@kernel.org" <x86@kernel.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Felipe Franciosi <felipe@nutanix.com>,
	"rkrcmar@redhat.com" <rkrcmar@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [RFC PATCH] KVM: x86: Fix APIC page invalidation race
Date: Tue, 9 Jun 2020 11:54:06 +0200	[thread overview]
Message-ID: <6d2d2faf-116f-8c71-fda2-3fc052952dee@redhat.com> (raw)
In-Reply-To: <7B9024C7-98D0-4940-91AE-40BCDE555C8F@nutanix.com>

On 09/06/20 03:04, Eiichi Tsukata wrote:
> 
> 
>> On Jun 8, 2020, at 22:13, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 06/06/20 06:26, Eiichi Tsukata wrote:
>>> Commit b1394e745b94 ("KVM: x86: fix APIC page invalidation") tried to
>>> fix inappropriate APIC page invalidation by re-introducing arch specific
>>> kvm_arch_mmu_notifier_invalidate_range() and calling it from
>>> kvm_mmu_notifier_invalidate_range_start. But threre could be the
>>> following race because VMCS APIC address cache can be updated
>>> *before* it is unmapped.
>>>
>>> Race:
>>>  (Invalidator) kvm_mmu_notifier_invalidate_range_start()
>>>  (Invalidator) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD)
>>>  (KVM VCPU) vcpu_enter_guest()
>>>  (KVM VCPU) kvm_vcpu_reload_apic_access_page()
>>>  (Invalidator) actually unmap page
>>>
>>> Symptom:
>>>  The above race can make Guest OS see already freed page and Guest OS
>>> will see broken APIC register values.
>>
>> This is not exactly the issue.  The values in the APIC-access page do
>> not really matter, the problem is that the host physical address values
>> won't match between the page tables and the APIC-access page address.
>> Then the processor will not trap APIC accesses, and will instead show
>> the raw contents of the APIC-access page (zeroes), and cause the crash
>> as you mention below.
>>
>> Still, the race explains the symptoms and the patch matches this text in
>> include/linux/mmu_notifier.h:
>>
>> 	 * If the subsystem
>>         * can't guarantee that no additional references are taken to
>>         * the pages in the range, it has to implement the
>>         * invalidate_range() notifier to remove any references taken
>>         * after invalidate_range_start().
>>
>> where the "additional reference" is in the VMCS: because we have to
>> account for kvm_vcpu_reload_apic_access_page running between
>> invalidate_range_start() and invalidate_range_end(), we need to
>> implement invalidate_range().
>>
>> The patch seems good, but I'd like Andrea Arcangeli to take a look as
>> well so I've CCed him.
>>
>> Thank you very much!
>>
>> Paolo
>>
> 
> Hello Paolo
> 
> Thanks for detailed explanation!
> I’ll fix the commit message like this:

No need to resend, the patch is good.  Here is my take on the commit message:

    Commit b1394e745b94 ("KVM: x86: fix APIC page invalidation") tried
    to fix inappropriate APIC page invalidation by re-introducing arch
    specific kvm_arch_mmu_notifier_invalidate_range() and calling it from
    kvm_mmu_notifier_invalidate_range_start. However, the patch left a
    possible race where the VMCS APIC address cache is updated *before*
    it is unmapped:
    
      (Invalidator) kvm_mmu_notifier_invalidate_range_start()
      (Invalidator) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD)
      (KVM VCPU) vcpu_enter_guest()
      (KVM VCPU) kvm_vcpu_reload_apic_access_page()
      (Invalidator) actually unmap page
    
    Because of the above race, there can be a mismatch between the
    host physical address stored in the APIC_ACCESS_PAGE VMCS field and
    the host physical address stored in the EPT entry for the APIC GPA
    (0xfee0000).  When this happens, the processor will not trap APIC
    accesses, and will instead show the raw contents of the APIC-access page.
    Because Windows OS periodically checks for unexpected modifications to
    the LAPIC register, this will show up as a BSOD crash with BugCheck
    CRITICAL_STRUCTURE_CORRUPTION (109) we are currently seeing in
    https://bugzilla.redhat.com/show_bug.cgi?id=1751017.
    
    The root cause of the issue is that kvm_arch_mmu_notifier_invalidate_range()
    cannot guarantee that no additional references are taken to the pages in
    the range before kvm_mmu_notifier_invalidate_range_end().  Fortunately,
    this case is supported by the MMU notifier API, as documented in
    include/linux/mmu_notifier.h:
    
             * If the subsystem
             * can't guarantee that no additional references are taken to
             * the pages in the range, it has to implement the
             * invalidate_range() notifier to remove any references taken
             * after invalidate_range_start().
    
    The fix therefore is to reload the APIC-access page field in the VMCS
    from kvm_mmu_notifier_invalidate_range() instead of ..._range_start().

Thanks,

Paolo


  reply	other threads:[~2020-06-09  9:54 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-06  4:26 [RFC PATCH] KVM: x86: Fix APIC page invalidation race Eiichi Tsukata
2020-06-06  5:00 ` Eiichi Tsukata
2020-06-08 13:13 ` Paolo Bonzini
2020-06-09  1:04   ` Eiichi Tsukata
2020-06-09  9:54     ` Paolo Bonzini [this message]
2020-06-09 13:36       ` Eiichi Tsukata
     [not found]       ` <202006191317431160122@wangsu.com>
2020-06-19 12:12         ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6d2d2faf-116f-8c71-fda2-3fc052952dee@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=bp@alien8.de \
    --cc=eiichi.tsukata@nutanix.com \
    --cc=felipe@nutanix.com \
    --cc=hpa@zytor.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=sean.j.christopherson@intel.com \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).