From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	 Yan Zhao <yan.y.zhao@intel.com>,
	Isaku Yamahata <isaku.yamahata@intel.com>,
	 Michael Roth <michael.roth@amd.com>,
	Yu Zhang <yu.c.zhang@linux.intel.com>,
	 Chao Peng <chao.p.peng@linux.intel.com>,
	Fuad Tabba <tabba@google.com>,
	 David Matlack <dmatlack@google.com>
Subject: Re: [PATCH 02/16] KVM: x86: Remove separate "bit" defines for page fault error code masks
Date: Thu, 29 Feb 2024 10:40:03 -0800
Message-ID: <ZeDPgx1O_AuR2Iz3@google.com>
In-Reply-To: <CABgObfbtPJ6AAX9GnjNscPRTbNAOtamdxX677kx_r=zd4scw6w@mail.gmail.com>

On Thu, Feb 29, 2024, Paolo Bonzini wrote:
> On Wed, Feb 28, 2024 at 3:46 AM Sean Christopherson <seanjc@google.com> wrote:
> > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > index 60f21bb4c27b..e8b620a85627 100644
> > --- a/arch/x86/kvm/mmu.h
> > +++ b/arch/x86/kvm/mmu.h
> > @@ -213,7 +213,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> >          */
> >         u64 implicit_access = access & PFERR_IMPLICIT_ACCESS;
> >         bool not_smap = ((rflags & X86_EFLAGS_AC) | implicit_access) == X86_EFLAGS_AC;
> > -       int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;
> > +       int index = (pfec + (not_smap << ilog2(PFERR_RSVD_MASK))) >> 1;
> 
> Just use "(pfec + (not_smap ? PFERR_RSVD_MASK : 0)) >> 1".
> 
> Likewise below, "pte_access & PT_USER_MASK ? PFERR_RSVD_MASK : 0".
> 
> No need to even check what the compiler produces, it will be either
> exactly the same code or a bunch of cmov instructions.

I couldn't resist :-)

The second one generates identical code, but for this one:

  int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;

gcc generates almost bizarrely different code in the call from vcpu_mmio_gva_to_gpa().
clang is clever enough to realize "pfec" can only contain USER_MASK and/or WRITE_MASK,
and so does a ton of dead code elimination and other optimizations.  But for some
reason, gcc doesn't appear to realize that, and generates a MOVSX when computing
"index", i.e. sign-extends the result of the ADD (at least, I think that's what it's
doing).
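
A standalone sketch for eyeballing the codegen difference, e.g. at -O2 with
both compilers (hardcoded PFERR_* values instead of the kernel headers, so
treat it only as an approximation of the real call sites):

  #include <stdbool.h>

  #define PFERR_RSVD_BIT   3
  #define PFERR_RSVD_MASK  (1U << PFERR_RSVD_BIT)

  /* "bool" promotes to a signed int before the shift; this is the form
   * where gcc was seen emitting a MOVSX on the result of the ADD.
   */
  int index_shift(unsigned int pfec, bool not_smap)
  {
          return (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;
  }

  /* Paolo's suggested form; everything stays an unsigned mask. */
  int index_ternary(unsigned int pfec, bool not_smap)
  {
          return (pfec | (not_smap ? PFERR_RSVD_MASK : 0)) >> 1;
  }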

There's no actual bug today, and the vcpu_mmio_gva_to_gpa() path is super safe
since KVM fully controls the error code.  But the call from FNAME(walk_addr_generic)
uses a _much_ more dynamic error code.

If an error code with unexpected bits set managed to get into permission_fault(),
I'm pretty sure we'd end up with out-of-bounds accesses.  KVM sanity checks that
PK and RSVD aren't set, 

	WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));

but KVM unnecessarily uses an ADD instead of an OR, here

	int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;

and here

		/* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */
		offset = (pfec & ~1) +
			((pte_access & PT_USER_MASK) << (PFERR_RSVD_BIT - PT_USER_SHIFT));

i.e. if the WARN fired, KVM would generate completely unexpected values due to
adding two RSVD bit flags.
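
To make the carry concrete (hypothetical pfec values, reachable only if the
WARN above fired):

  #define PFERR_USER_MASK   (1U << 2)
  #define PFERR_RSVD_MASK   (1U << 3)
  #define PFERR_FETCH_MASK  (1U << 4)

  unsigned int pfec = PFERR_RSVD_MASK | PFERR_USER_MASK;  /* 0b01100 */

  /* ADD: the two RSVD flags carry into bit 4, i.e. RSVD is lost and
   * the result spuriously looks like a FETCH fault.
   */
  unsigned int bad  = pfec + PFERR_RSVD_MASK;             /* 0b10100 */

  /* OR: idempotent, RSVD stays put. */
  unsigned int good = pfec | PFERR_RSVD_MASK;             /* 0b01100 */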

And if _really_ unexpected flags make their way into permission_fault(), e.g. the
upcoming RMP flag (bit 31) or Intel's SGX flag (bit 15), then the use of index

	fault = (mmu->permissions[index] >> pte_access) & 1;

could generate a read waaay outside of the array.  It can't/shouldn't happen in
practice since KVM shouldn't be trying to emulate RMP violations or faults in SGX
enclaves, but it's unnecessarily dangerous.
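
Rough arithmetic on the SGX case, using the fact that mmu->permissions[] is
indexed by pfec bits [4:1], i.e. has only 16 valid slots:

  unsigned int pfec = (1U << 15) | 1;  /* SGX flag + PRESENT */
  int index = pfec >> 1;               /* 16384, vs. 16 valid entries */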

Long story short, I think we should get to the below (I'll post a separate series,
assuming I'm not missing something).

	unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);
	unsigned int pfec = access & (PFERR_PRESENT_MASK |
				      PFERR_WRITE_MASK |
				      PFERR_USER_MASK |
				      PFERR_FETCH_MASK);

	/*
	 * For explicit supervisor accesses, SMAP is disabled if EFLAGS.AC = 1.
	 * For implicit supervisor accesses, SMAP cannot be overridden.
	 *
	 * SMAP applies only to supervisor accesses; whether or not
	 * not_smap is set has no bearing on the result for user
	 * accesses.
	 *
	 * We put the SMAP checking bit in place of the PFERR_RSVD_MASK bit;
	 * this bit will always be zero in pfec, but it will be one in index
	 * if SMAP checks are being disabled.
	 */
	u64 implicit_access = access & PFERR_IMPLICIT_ACCESS;
	bool not_smap = ((rflags & X86_EFLAGS_AC) | implicit_access) == X86_EFLAGS_AC;
	int index = (pfec | (not_smap ? PFERR_RSVD_MASK : 0)) >> 1;
	u32 errcode = PFERR_PRESENT_MASK;
	bool fault;

	kvm_mmu_refresh_passthrough_bits(vcpu, mmu);

	fault = (mmu->permissions[index] >> pte_access) & 1;

	/*
	 * Sanity check that no bits are set in the legacy #PF error code
	 * (bits 31:0) other than the supported permission bits (see above).
	 */
	WARN_ON_ONCE(pfec != (unsigned int)access);

	if (unlikely(mmu->pkru_mask)) {
		u32 pkru_bits, offset;

		/*
		 * PKRU defines 32 bits: there are 16 domains and 2
		 * attribute bits per domain in pkru.  pte_pkey is the
		 * index of the protection domain, so pte_pkey * 2 is
		 * the index of the first bit for the domain.
		 */
		pkru_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3;

		/* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */
		offset = (pfec & ~1) | (pte_access & PT_USER_MASK ? PFERR_RSVD_MASK : 0);

		pkru_bits &= mmu->pkru_mask >> offset;
		errcode |= -pkru_bits & PFERR_PK_MASK;
		fault |= (pkru_bits != 0);
	}

	return -(u32)fault & errcode;
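
For comparison, the "super safe" MMIO path builds the error code entirely
from KVM-known state, roughly like this (paraphrased from
vcpu_mmio_gva_to_gpa(), not the verbatim kernel code):

  u64 access = ((static_call(kvm_x86_get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0) |
               (write ? PFERR_WRITE_MASK : 0);

  /* pfec can only ever contain USER and/or WRITE here, hence clang's DCE. */
  if (vcpu_match_mmio_gva(vcpu, gva) &&
      !permission_fault(vcpu, vcpu->arch.walk_mmu,
                        vcpu->arch.mmio_access, 0, access))
          return 1;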
