From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	 Yan Zhao <yan.y.zhao@intel.com>,
	Isaku Yamahata <isaku.yamahata@intel.com>,
	 Michael Roth <michael.roth@amd.com>,
	Yu Zhang <yu.c.zhang@linux.intel.com>,
	 Chao Peng <chao.p.peng@linux.intel.com>,
	Fuad Tabba <tabba@google.com>,
	 David Matlack <dmatlack@google.com>
Subject: Re: [PATCH 02/16] KVM: x86: Remove separate "bit" defines for page fault error code masks
Date: Thu, 29 Feb 2024 10:40:03 -0800
Message-ID: <ZeDPgx1O_AuR2Iz3@google.com>
In-Reply-To: <CABgObfbtPJ6AAX9GnjNscPRTbNAOtamdxX677kx_r=zd4scw6w@mail.gmail.com>

On Thu, Feb 29, 2024, Paolo Bonzini wrote:
> On Wed, Feb 28, 2024 at 3:46 AM Sean Christopherson <seanjc@google.com> wrote:
> > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > index 60f21bb4c27b..e8b620a85627 100644
> > --- a/arch/x86/kvm/mmu.h
> > +++ b/arch/x86/kvm/mmu.h
> > @@ -213,7 +213,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> >          */
> >         u64 implicit_access = access & PFERR_IMPLICIT_ACCESS;
> >         bool not_smap = ((rflags & X86_EFLAGS_AC) | implicit_access) == X86_EFLAGS_AC;
> > -       int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;
> > +       int index = (pfec + (not_smap << ilog2(PFERR_RSVD_MASK))) >> 1;
> 
> Just use "(pfec + (not_smap ? PFERR_RSVD_MASK : 0)) >> 1".
> 
> Likewise below, "pte_access & PT_USER_MASK ? PFERR_RSVD_MASK : 0".
> 
> No need to even check what the compiler produces, it will be either
> exactly the same code or a bunch of cmov instructions.

I couldn't resist :-)

The second one generates identical code, but for this one:

  int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;

gcc generates almost bizarrely different code in the call from vcpu_mmio_gva_to_gpa().
clang is clever enough to realize "pfec" can only contain USER_MASK and/or WRITE_MASK,
and so does a ton of dead code elimination and other optimizations.  But for some
reason, gcc doesn't appear to realize that, and generates a MOVSX when computing
"index", i.e. sign-extends the result of the ADD (at least, I think that's what it's
doing).
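
Stripped down to a standalone sketch (PFERR constants hard-coded, and the
helper names invented purely for illustration, not proposed code), the two
forms being compared are:

	/* Shift form: the bool is promoted to int and shifted up to
	 * bit 3, i.e. the RSVD position.
	 */
	static int index_shift(unsigned int pfec, bool not_smap)
	{
		return (pfec + (not_smap << 3)) >> 1;
	}

	/* Ternary form: an explicit 0/8 select; same value either way. */
	static int index_ternary(unsigned int pfec, bool not_smap)
	{
		return (pfec + (not_smap ? 8 : 0)) >> 1;
	}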

There's no actual bug today, and the vcpu_mmio_gva_to_gpa() path is super safe
since KVM fully controls the error code.  But the call from FNAME(walk_addr_generic)
uses a _much_ more dynamic error code.

If an error code with unexpected bits set managed to get into permission_fault(),
I'm pretty sure we'd end up with out-of-bounds accesses.  KVM sanity checks that
PK and RSVD aren't set, 

	WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));

but KVM unnecessarily uses an ADD instead of an OR, here

	int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;

and here

		/* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */
		offset = (pfec & ~1) +
			((pte_access & PT_USER_MASK) << (PFERR_RSVD_BIT - PT_USER_SHIFT));

i.e. if the WARN fired, KVM would generate completely unexpected values due to
adding two RSVD bit flags.
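
To make the carry concrete, assume the WARN fired, i.e. pfec showed up with
RSVD already set (hypothetical value; bit 3 == RSVD, bit 4 == FETCH):

	pfec            = 0b01001   (PRESENT | RSVD)
	not_smap << 3   = 0b01000   (RSVD again)

	ADD: 0b01001 + 0b01000 = 0b10001   carry flips bit 4, a phantom FETCH
	OR:  0b01001 | 0b01000 = 0b01001   duplicate RSVD bit is absorbed

so after the >> 1, the ADD yields index 8 where the OR would yield index 4.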

And if _really_ unexpected flags make their way into permission_fault(), e.g. the
upcoming RMP flag (bit 31) or Intel's SGX flag (bit 15), then the use of index

	fault = (mmu->permissions[index] >> pte_access) & 1;

could generate a read waaay outside of the array.  It can't/shouldn't happen in
practice since KVM shouldn't be trying to emulate RMP violations or faults in SGX
enclaves, but it's unnecessarily dangerous.
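
To put a rough number on it: permissions[] is indexed by error code bits 4:1,
i.e. it has only 16 entries, so absent any masking on the way in,

	SGX (bit 15):  index = 0x8000 >> 1     = 0x4000
	RMP (bit 31):  index = 0x80000000 >> 1 = 0x40000000

either of which lands thousands (or billions) of entries past the end.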

Long story short, I think we should get to the below (I'll post a separate series,
assuming I'm not missing something).

	unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);
	unsigned int pfec = access & (PFERR_PRESENT_MASK |
				      PFERR_WRITE_MASK |
				      PFERR_USER_MASK |
				      PFERR_FETCH_MASK);

	/*
	 * For explicit supervisor accesses, SMAP is disabled if EFLAGS.AC = 1.
	 * For implicit supervisor accesses, SMAP cannot be overridden.
	 *
	 * SMAP applies only to supervisor accesses, so for user accesses the
	 * value of not_smap has no bearing on the result.
	 *
	 * We put the SMAP checking bit in place of the PFERR_RSVD_MASK bit;
	 * this bit will always be zero in pfec, but it will be one in index
	 * if SMAP checks are being disabled.
	 */
	u64 implicit_access = access & PFERR_IMPLICIT_ACCESS;
	bool not_smap = ((rflags & X86_EFLAGS_AC) | implicit_access) == X86_EFLAGS_AC;
	int index = (pfec | (not_smap ? PFERR_RSVD_MASK : 0)) >> 1;
	u32 errcode = PFERR_PRESENT_MASK;
	bool fault;

	kvm_mmu_refresh_passthrough_bits(vcpu, mmu);

	fault = (mmu->permissions[index] >> pte_access) & 1;

	/*
	 * Sanity check that no bits are set in the legacy #PF error code
	 * (bits 31:0) other than the supported permission bits (see above).
	 */
	WARN_ON_ONCE(pfec != (unsigned int)access);

	if (unlikely(mmu->pkru_mask)) {
		u32 pkru_bits, offset;

		/*
		 * PKRU defines 32 bits: there are 16 domains and 2
		 * attribute bits per domain in pkru.  pte_pkey is the
		 * index of the protection domain, so pte_pkey * 2 is
		 * the index of the first bit for the domain.
		 */
		pkru_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3;

		/* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */
		offset = (pfec & ~1) | (pte_access & PT_USER_MASK ? PFERR_RSVD_MASK : 0);

		pkru_bits &= mmu->pkru_mask >> offset;
		errcode |= -pkru_bits & PFERR_PK_MASK;
		fault |= (pkru_bits != 0);
	}

	return -(u32)fault & errcode;
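
And for completeness, a worked example of the PKRU half with hypothetical
values, say pte_pkey = 5 and vcpu->arch.pkru = 0x0800 (WD set for key 5):

	pkru_bits = (0x0800 >> (5 * 2)) & 3 = 0b10      (AD=0, WD=1)

For a user-mode write to a user-accessible page, pfec = PRESENT | WRITE |
USER = 0b0111, so

	offset    = (0b0111 & ~1) | 0b1000 = 14
	pkru_bits &= mmu->pkru_mask >> 14;

and if pkru_mask says WD is relevant for that combination, pkru_bits stays
non-zero, the access faults, and errcode picks up PFERR_PK_MASK via the
-pkru_bits trick.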
