From: Sean Christopherson <seanjc@google.com>
To: Yang Weijiang <weijiang.yang@intel.com>
Cc: pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org,  chao.gao@intel.com,
	rick.p.edgecombe@intel.com, mlevitsk@redhat.com,
	 john.allen@amd.com, Aaron Lewis <aaronlewis@google.com>,
	 Jim Mattson <jmattson@google.com>,
	Oliver Upton <oupton@google.com>,
	 Mingwei Zhang <mizhang@google.com>
Subject: Re: [PATCH v10 20/27] KVM: VMX: Emulate read and write to CET MSRs
Date: Tue, 12 Mar 2024 15:55:07 -0700
Message-ID: <ZfDdS8rtVtyEr0UR@google.com>
In-Reply-To: <20240219074733.122080-21-weijiang.yang@intel.com>

-non-KVM people, +Mingwei, Aaron, Oliver, and Jim

On Sun, Feb 18, 2024, Yang Weijiang wrote:
>  	case MSR_IA32_PERF_CAPABILITIES:
>  		if (data && !vcpu_to_pmu(vcpu)->version)
>  			return 1;

Ha, perfect, this is already in the diff context.

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c0ed69353674..281c3fe728c5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1849,6 +1849,36 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
>  }
>  EXPORT_SYMBOL_GPL(kvm_msr_allowed);
>  
> +#define CET_US_RESERVED_BITS		GENMASK(9, 6)
> +#define CET_US_SHSTK_MASK_BITS		GENMASK(1, 0)
> +#define CET_US_IBT_MASK_BITS		(GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
> +#define CET_US_LEGACY_BITMAP_BASE(data)	((data) >> 12)
> +
> +static bool is_set_cet_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u64 data,
> +				   bool host_initiated)
> +{

...

> +	/*
> +	 * If KVM supports the MSR, i.e. has enumerated the MSR existence to
> +	 * userspace, then userspace is allowed to write '0' irrespective of
> +	 * whether or not the MSR is exposed to the guest.
> +	 */
> +	if (!host_initiated || data)
> +		return false;

...

> @@ -1951,6 +2017,20 @@ static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
>  		    !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
>  			return 1;
>  		break;
> +	case MSR_IA32_U_CET:
> +	case MSR_IA32_S_CET:
> +		if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
> +		    !guest_can_use(vcpu, X86_FEATURE_IBT))
> +			return 1;

As pointed out by Mingwei in a conversation about PERF_CAPABILITIES, rejecting
host *reads* while allowing host writes of '0' is inconsistent.  Which, while
arguably par for the course for KVM's ABI, will likely result in the exact problem
we're trying to avoid: killing userspace because it attempts to access an MSR KVM
has said exists.

PERF_CAPABILITIES has a similar, but opposite, problem where KVM returns a non-zero
value on reads, but rejects that same non-zero value on write.  PERF_CAPABILITIES
is even more complicated because KVM stuffs a non-zero value at vCPU creation, but
that's not really relevant to this discussion, just another data point for how
messed up this all is.

Also relevant to this discussion are KVM's PV MSRs, e.g. MSR_KVM_ASYNC_PF_ACK,
as KVM rejects attempts to write '0' if the guest doesn't support the MSR, but
if and only if userspace has enabled KVM_CAP_ENFORCE_PV_FEATURE_CPUID.

Coming to the point, this mess is getting too hard to maintain, both from a code
perspective and a "what is KVM's ABI?" perspective.

Rather than play whack-a-mole and inevitably end up with bugs and/or inconsistencies,
what if we (a) return KVM_MSR_RET_INVALID when an MSR access is denied based on
guest CPUID, (b) wrap userspace MSR accesses at the very top level and convert
KVM_MSR_RET_INVALID to "success" when KVM reported the MSR as savable and userspace
is reading or writing '0', and (c) drop all of the host_initiated checks that
exist purely to exempt userspace access from guest CPUID checks.
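
A minimal userspace sketch of steps (a) and (c), assuming a toy vCPU with a
single feature flag.  All sketch_* names and the numeric value of
SKETCH_MSR_RET_UNSUPPORTED are hypothetical stand-ins, not KVM definitions.
It only contrasts the current "exempt userspace" pattern with a handler that
reports "unsupported" no matter who initiated the access:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the proposed return code; value is illustrative. */
#define SKETCH_MSR_RET_UNSUPPORTED 2

/* Toy vCPU: one flag standing in for a guest CPUID feature bit. */
struct sketch_vcpu {
	bool guest_has_feature;
};

/* Current pattern: host-initiated accesses bypass the guest CPUID check. */
static int sketch_check_old(const struct sketch_vcpu *vcpu, bool host_initiated)
{
	if (!host_initiated && !vcpu->guest_has_feature)
		return 1;

	return 0;
}

/*
 * Proposed pattern: the handler no longer looks at host_initiated and simply
 * reports "unsupported"; a top-level wrapper decides what userspace may do.
 */
static int sketch_check_new(const struct sketch_vcpu *vcpu)
{
	if (!vcpu->guest_has_feature)
		return SKETCH_MSR_RET_UNSUPPORTED;

	return 0;
}
```

With the feature absent, sketch_check_old() gives different answers for guest
and host accesses, whereas sketch_check_new() gives one answer and leaves the
userspace policy to a single caller.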

The only possible hiccup I can think of is that this could subtly break userspace
that is setting CPUID _after_ MSRs, but my understanding is that we've agreed to
draw a line and say that that's unsupported.  And I think it's low risk, because
I don't see how code like this:

	case MSR_TSC_AUX:
		if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
			return 1;

		if (!host_initiated &&
		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
		    !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
			return 1;

		if (guest_cpuid_is_intel(vcpu) && (data >> 32) != 0)
			return 1;

can possibly work if userspace sets MSRs first.  The RDTSCP/RDPID checks are
exempt, but the vendor in guest CPUID would be '0', not Intel's magic string,
and so setting MSRs before CPUID would fail, at least if the target vCPU model
is Intel.

P.S. I also want to rename KVM_MSR_RET_INVALID => KVM_MSR_RET_UNSUPPORTED, because
I can never remember that "invalid" doesn't mean the value was invalid, it means
the MSR index was invalid.

It'll take a few patches, but I believe we can end up with something like this:

static bool kvm_is_msr_to_save(u32 msr_index)
{
	unsigned int i;

	for (i = 0; i < num_msrs_to_save; i++) {
		if (msrs_to_save[i] == msr_index)
			return true;
	}

	return false;
}

typedef int (*msr_uaccess_t)(struct kvm_vcpu *vcpu, u32 index, u64 *data,
			     bool host_initiated);

static __always_inline int kvm_do_msr_uaccess(struct kvm_vcpu *vcpu, u32 msr,
					      u64 *data, bool host_initiated,
					      enum kvm_msr_access rw,
					      msr_uaccess_t msr_uaccess_fn)
{
	const char *op = rw == MSR_TYPE_W ? "wrmsr" : "rdmsr";
	int ret;

	BUILD_BUG_ON(rw != MSR_TYPE_R && rw != MSR_TYPE_W);

	/*
	 * Zero the data on read failures to avoid leaking stack data to the
	 * guest and/or userspace, e.g. if the failure is ignored below.
	 */
	ret = msr_uaccess_fn(vcpu, msr, data, host_initiated);
	if (ret && rw == MSR_TYPE_R)
		*data = 0;

	if (ret != KVM_MSR_RET_UNSUPPORTED)
		return ret;

	/*
	 * Userspace is allowed to read MSRs, and write '0' to MSRs, that KVM
	 * reports as to-be-saved, even if an MSR isn't fully supported.
	 * Simply check that @data is '0', which covers both the write '0' case
	 * and all reads (in which case @data is zeroed on failure; see above).
	 */
	if (kvm_is_msr_to_save(msr) && !*data)
		return 0;

	if (!ignore_msrs) {
		kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n",
				      op, msr, *data);
		return ret;
	}

	if (report_ignored_msrs)
		kvm_pr_unimpl("ignored %s: 0x%x data 0x%llx\n", op, msr, *data);
	
	return 0;
}
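
To make the resulting behavior concrete, here is a standalone userspace model
of the wrapper's decision logic.  It is only a sketch: every model_* name is
hypothetical, the return-code value is illustrative, the to-be-saved list is
hard-coded to two CET MSR indices, and the logging paths are dropped.  The
stub handler stands in for an MSR handler that always denies access based on
guest CPUID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel definitions; the value is illustrative. */
#define MODEL_MSR_RET_UNSUPPORTED 2

enum model_access { MODEL_READ, MODEL_WRITE };

/* Models the ignore_msrs module param and the to-be-saved list. */
static bool model_ignore_msrs;
static const uint32_t model_msrs_to_save[] = { 0x6a0, 0x6a2 };

static bool model_is_msr_to_save(uint32_t msr)
{
	size_t n = sizeof(model_msrs_to_save) / sizeof(model_msrs_to_save[0]);

	for (size_t i = 0; i < n; i++) {
		if (model_msrs_to_save[i] == msr)
			return true;
	}
	return false;
}

/* Stub handler: behaves as if guest CPUID lacks the relevant feature. */
static int model_handler_unsupported(uint32_t msr, uint64_t *data)
{
	(void)msr;
	(void)data;
	return MODEL_MSR_RET_UNSUPPORTED;
}

static int model_do_msr_uaccess(uint32_t msr, uint64_t *data,
				enum model_access rw,
				int (*fn)(uint32_t, uint64_t *))
{
	int ret = fn(msr, data);

	/* Zero the data on read failures so callers never see stale data. */
	if (ret && rw == MODEL_READ)
		*data = 0;

	if (ret != MODEL_MSR_RET_UNSUPPORTED)
		return ret;

	/* Reads (zeroed above) and writes of '0' succeed for to-be-saved MSRs. */
	if (model_is_msr_to_save(msr) && !*data)
		return 0;

	return model_ignore_msrs ? 0 : ret;
}

/* Usage example: exercises the two cases the proposal cares about most. */
void model_selfcheck(void)
{
	uint64_t val = 0xdead;

	/* Read of a to-be-saved MSR the guest lacks: succeeds, reads as 0. */
	assert(model_do_msr_uaccess(0x6a0, &val, MODEL_READ,
				    model_handler_unsupported) == 0);
	assert(val == 0);

	/* A non-zero write to the same MSR is still rejected. */
	val = 1;
	assert(model_do_msr_uaccess(0x6a0, &val, MODEL_WRITE,
				    model_handler_unsupported) ==
	       MODEL_MSR_RET_UNSUPPORTED);
}
```

Calling model_selfcheck() from a main() is enough to run the checks.  The
model makes the symmetry visible: reads and writes of '0' are treated the
same way for any MSR on the to-be-saved list, which is the inconsistency the
current per-MSR host_initiated checks fail to avoid.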
