linux-kernel.vger.kernel.org archive mirror
From: Lai Jiangshan <laijs@linux.alibaba.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [PATCH 1/4] KVM: X86: Fix tlb flush for tdp in kvm_invalidate_pcid()
Date: Fri, 22 Oct 2021 08:22:19 +0800
Message-ID: <a79bbdb6-9d24-7674-2a77-f1f68b64635f@linux.alibaba.com>
In-Reply-To: <YXF+pG0yGA0TQZww@google.com>



On 2021/10/21 22:52, Sean Christopherson wrote:
> On Thu, Oct 21, 2021, Lai Jiangshan wrote:
>>
>>
>> On 2021/10/21 02:26, Sean Christopherson wrote:
>>> On Wed, Oct 20, 2021, Lai Jiangshan wrote:
>>>> On 2021/10/19 23:25, Sean Christopherson wrote:
>>>> I just read some of the interception policy in vmx.c: if EPT=1 but
>>>> vmx_need_pf_intercept() returns true for some reason/config, #PF is intercepted.
>>>> But CR3 writes are not intercepted, which means there will be an EPT violation
>>>> _after_ (IIUC) the CR3 write if the GPA of the new CR3 exceeds the guest maxphyaddr
>>>> limit.  KVM then queues a fault to the guest, also _after_ the CR3 write, but the
>>>> guest expects the fault before the write.
>>>>
>>>> IIUC, it can be fixed by intercepting CR3 writes or by reverting the CR3 write in
>>>> the EPT violation handler.
>>>
>>> KVM implicitly does the latter by emulating the faulting instruction.
>>>
>>>     static int handle_ept_violation(struct kvm_vcpu *vcpu)
>>>     {
>>> 	...
>>>
>>> 	/*
>>> 	 * Check that the GPA doesn't exceed physical memory limits, as that is
>>> 	 * a guest page fault.  We have to emulate the instruction here, because
>>> 	 * if the illegal address is that of a paging structure, then
>>> 	 * EPT_VIOLATION_ACC_WRITE bit is set.  Alternatively, if supported we
>>> 	 * would also use advanced VM-exit information for EPT violations to
>>> 	 * reconstruct the page fault error code.
>>> 	 */
>>> 	if (unlikely(allow_smaller_maxphyaddr && kvm_vcpu_is_illegal_gpa(vcpu, gpa)))
>>> 		return kvm_emulate_instruction(vcpu, 0);
>>>
>>> 	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
>>>     }
>>>
>>> and injecting a #GP when kvm_set_cr3() fails.
>>
>> I think the EPT violation happens *after* the CR3 write, so the instruction to be
>> emulated is not the CR3 write.  The emulation will queue a fault into the guest, but
>> a recursive EPT violation then happens since CR3 still exceeds the maxphyaddr limit.
> 
> Doh, you're correct.  I think my mind wandered into thinking about what would
> happen with PDPTRs and forgot to get back to normal MOV CR3.
> 
> So yeah, the only way to correctly handle this would be to intercept CR3 loads.
> I'm guessing that would have a noticeable impact on guest performance.

I think we can detect it in handle_ept_violation() by checking the CR3 value,
and trigger a triple fault if that is the case, so that the VMM can exit.  I don't
think any OS relies on the reserved bits in CR3 and the corresponding #GP.
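
To make the idea concrete, here is a rough userspace model of the check, not a
patch.  The enum and both helper names are made up for illustration.  In KVM
the CR3 value would presumably come from kvm_read_cr3() and the triple fault
would be raised via KVM_REQ_TRIPLE_FAULT, but that mapping is an assumption
and untested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; KVM has no such enum. */
enum cr3_check {
	CR3_LEGAL,		/* fall through to normal EPT violation handling */
	CR3_TRIPLE_FAULT,	/* CR3 is beyond guest MAXPHYADDR: shut the guest down */
};

/* True if any bit at or above guest MAXPHYADDR is set in cr3. */
static bool cr3_exceeds_maxphyaddr(uint64_t cr3, unsigned int maxphyaddr)
{
	/* Shifting a 64-bit value by 64 or more is undefined in C. */
	assert(maxphyaddr > 0 && maxphyaddr < 64);
	return (cr3 & (~0ULL << maxphyaddr)) != 0;
}

/* Decide what the EPT violation handler would do for a given guest CR3. */
static enum cr3_check check_guest_cr3(uint64_t cr3, unsigned int maxphyaddr)
{
	return cr3_exceeds_maxphyaddr(cr3, maxphyaddr) ? CR3_TRIPLE_FAULT : CR3_LEGAL;
}
```

The low 12 bits (PCID or flags) never trip the check; only bits at or above
the guest's MAXPHYADDR do.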

> 
> Paolo, I'll leave this one for you to decide, we have pretty much written off
> allow_smaller_maxphyaddr :-)
> 

Thread overview: 17+ messages
2021-10-19 11:01 [PATCH 0/4] KVM: X86: Improve guest TLB flushing Lai Jiangshan
2021-10-19 11:01 ` [PATCH 1/4] KVM: X86: Fix tlb flush for tdp in kvm_invalidate_pcid() Lai Jiangshan
2021-10-19 15:25   ` Sean Christopherson
2021-10-20  9:54     ` Lai Jiangshan
2021-10-20 18:26       ` Sean Christopherson
2021-10-21  1:27         ` Lai Jiangshan
2021-10-21 14:52           ` Sean Christopherson
2021-10-21 17:13             ` Paolo Bonzini
2021-10-21 17:32               ` Jim Mattson
2021-10-22  0:22             ` Lai Jiangshan [this message]
2021-10-19 11:01 ` [PATCH 2/4] KVM: X86: Cache CR3 in prev_roots when PCID is disabled Lai Jiangshan
2021-10-21 17:43   ` Paolo Bonzini
2021-10-22  2:11     ` Lai Jiangshan
2021-10-19 11:01 ` [PATCH 3/4] KVM: X86: Use smp_rmb() to pair with smp_wmb() in mmu_try_to_unsync_pages() Lai Jiangshan
2021-10-21  2:32   ` Lai Jiangshan
2021-10-21 17:44   ` Paolo Bonzini
2021-10-19 11:01 ` [PATCH 4/4] KVM: X86: Don't unload MMU in kvm_vcpu_flush_tlb_guest() Lai Jiangshan
