linux-kernel.vger.kernel.org archive mirror
From: Sean Christopherson <seanjc@google.com>
To: Lai Jiangshan <laijs@linux.alibaba.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [PATCH 1/4] KVM: X86: Fix tlb flush for tdp in kvm_invalidate_pcid()
Date: Thu, 21 Oct 2021 14:52:20 +0000
Message-ID: <YXF+pG0yGA0TQZww@google.com>
In-Reply-To: <55abc519-b528-ddaa-120d-8d157b520623@linux.alibaba.com>

On Thu, Oct 21, 2021, Lai Jiangshan wrote:
> 
> 
> On 2021/10/21 02:26, Sean Christopherson wrote:
> > On Wed, Oct 20, 2021, Lai Jiangshan wrote:
> > > On 2021/10/19 23:25, Sean Christopherson wrote:
> > > > I just read some of the interception policy in vmx.c: if EPT=1 but vmx_need_pf_intercept()
> > > > returns true for some reason/config, #PF is intercepted.  But CR3 writes are not
> > > > intercepted, which means there will be an EPT fault _after_ (IIUC) the CR3 write if
> > > > the GPA of the new CR3 exceeds the guest maxphyaddr limit.  And KVM queues a fault to
> > > > the guest, which is also _after_ the CR3 write, but the guest expects the fault before
> > > > the write.
> > > > 
> > > > IIUC, it can be fixed by intercepting CR3 writes or reverting the CR3 write in the EPT
> > > > violation handler.
> > 
> > KVM implicitly does the latter by emulating the faulting instruction.
> > 
> >    static int handle_ept_violation(struct kvm_vcpu *vcpu)
> >    {
> > 	...
> > 
> > 	/*
> > 	 * Check that the GPA doesn't exceed physical memory limits, as that is
> > 	 * a guest page fault.  We have to emulate the instruction here, because
> > 	 * if the illegal address is that of a paging structure, then
> > 	 * EPT_VIOLATION_ACC_WRITE bit is set.  Alternatively, if supported we
> > 	 * would also use advanced VM-exit information for EPT violations to
> > 	 * reconstruct the page fault error code.
> > 	 */
> > 	if (unlikely(allow_smaller_maxphyaddr && kvm_vcpu_is_illegal_gpa(vcpu, gpa)))
> > 		return kvm_emulate_instruction(vcpu, 0);
> > 
> > 	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
> >    }
> > 
> > and injecting a #GP when kvm_set_cr3() fails.
> 
> I think the EPT violation happens *after* the CR3 write, so the instruction to be
> emulated is not the CR3 write.  The emulation will queue a fault into the guest, but
> a recursive EPT violation then happens because CR3 still exceeds the maxphyaddr limit.

Doh, you're correct.  I think my mind wandered into thinking about what would
happen with PDPTRs and forgot to get back to normal MOV CR3.

So yeah, the only way to correctly handle this would be to intercept CR3 loads.
I'm guessing that would have a noticeable impact on guest performance.
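
For illustration only, here is a minimal userspace sketch of the ordering that
intercepting CR3 loads would give: the MAXPHYADDR check runs before the write,
so an illegal value faults with the old CR3 still in place.  This is not KVM
code.  struct model_vcpu, gpa_is_illegal() and model_mov_to_cr3() are
hypothetical names, rsvd_bits() is only modeled on the KVM helper of the same
name, and the sketch assumes the PCID no-flush bit (bit 63) and the other CR3
flag bits can be ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with bits [s, e] set (inclusive); modeled on KVM's rsvd_bits(). */
static uint64_t rsvd_bits(int s, int e)
{
	if (e < s)
		return 0;
	return ((2ULL << (e - s)) - 1) << s;
}

/* Hypothetical stand-in for the vCPU state this example needs. */
struct model_vcpu {
	uint64_t cr3;		/* currently loaded guest CR3 */
	int maxphyaddr;		/* guest MAXPHYADDR, e.g. 40 */
};

/* True if any bit at or above the guest's MAXPHYADDR is set. */
static bool gpa_is_illegal(const struct model_vcpu *vcpu, uint64_t gpa)
{
	return (gpa & rsvd_bits(vcpu->maxphyaddr, 63)) != 0;
}

/*
 * Model of an intercepted MOV to CR3.  Returns 0 and commits the new
 * value if it is legal; returns 1 (caller would inject #GP) and leaves
 * CR3 untouched otherwise.  Assumption: flag bits in CR3 are ignored.
 */
static int model_mov_to_cr3(struct model_vcpu *vcpu, uint64_t new_cr3)
{
	if (gpa_is_illegal(vcpu, new_cr3))
		return 1;

	vcpu->cr3 = new_cr3;
	return 0;
}
```

With maxphyaddr set to 40, loading 1ULL << 40 returns 1 and leaves cr3
unchanged, which is the fault-before-write ordering the guest expects.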

Paolo, I'll leave this one for you to decide; we have pretty much written off
allow_smaller_maxphyaddr :-)


Thread overview: 17+ messages
2021-10-19 11:01 [PATCH 0/4] KVM: X86: Improve guest TLB flushing Lai Jiangshan
2021-10-19 11:01 ` [PATCH 1/4] KVM: X86: Fix tlb flush for tdp in kvm_invalidate_pcid() Lai Jiangshan
2021-10-19 15:25   ` Sean Christopherson
2021-10-20  9:54     ` Lai Jiangshan
2021-10-20 18:26       ` Sean Christopherson
2021-10-21  1:27         ` Lai Jiangshan
2021-10-21 14:52           ` Sean Christopherson [this message]
2021-10-21 17:13             ` Paolo Bonzini
2021-10-21 17:32               ` Jim Mattson
2021-10-22  0:22             ` Lai Jiangshan
2021-10-19 11:01 ` [PATCH 2/4] KVM: X86: Cache CR3 in prev_roots when PCID is disabled Lai Jiangshan
2021-10-21 17:43   ` Paolo Bonzini
2021-10-22  2:11     ` Lai Jiangshan
2021-10-19 11:01 ` [PATCH 3/4] KVM: X86: Use smp_rmb() to pair with smp_wmb() in mmu_try_to_unsync_pages() Lai Jiangshan
2021-10-21  2:32   ` Lai Jiangshan
2021-10-21 17:44   ` Paolo Bonzini
2021-10-19 11:01 ` [PATCH 4/4] KVM: X86: Don't unload MMU in kvm_vcpu_flush_tlb_guest() Lai Jiangshan
