Clean up a handful of interrelated warts in the kernel's handling of VMX:

  - Enable VMX in IA32_FEATURE_CONTROL during boot instead of on-demand
    during KVM load to avoid future contention over writing the MSR.

  - Rework VMX feature reporting so that it is accurate and up-to-date,
    now and in the future.

  - Consolidate code across CPUs that support VMX.

The patches are based on tip/x86/cpu, commit

  87d6021b8143 ("x86/math-emu: Limit MATH_EMULATION to 486SX compatibles")

Please let me know if you'd prefer not to receive the full patch set on
future versions of the series. I cc'd everyone on all patches to provide
the full picture, e.g. the motivation behind things like the perf patch.

This series stems from two separate but related issues. The first issue,
pointed out by Boris in the SGX enabling series[1], is that the kernel
currently doesn't ensure the IA32_FEATURE_CONTROL MSR is configured during
boot. The second issue is that the kernel's reporting of VMX features is
stale, potentially inaccurate, and difficult to maintain.

== IA32_FEATURE_CONTROL ==
Lack of IA32_FEATURE_CONTROL configuration during boot isn't a functional
issue in the current kernel as the majority of platforms set and lock
IA32_FEATURE_CONTROL in firmware. And when the MSR is left unlocked, KVM
is the only subsystem that writes IA32_FEATURE_CONTROL. That will change
if/when SGX support is enabled, as SGX will also want to fully enable
itself when IA32_FEATURE_CONTROL is unlocked.

== VMX Feature Reporting ==
VMX features are not enumerated via CPUID, but instead are enumerated
through VMX MSRs. As a result, new VMX features are not automatically
reported via /proc/cpuinfo.

An attempt was made long ago to report interesting and/or meaningful VMX
features by synthesizing select features into a Linux-defined cpufeatures
word. Synthetic feature flags worked for the initial purpose, but the
existence of the synthetic flags was forgotten almost immediately, e.g.
only one new flag (EPT A/D) has been added in the decade since the
synthetic VMX features were introduced, while VMX and KVM have gained
support for many new features.

Placing the synthetic flags in x86_capability also allows them to be
queried via cpu_has() and company, which is misleading as the flags exist
purely for reporting via /proc/cpuinfo. KVM, the only in-kernel user of
VMX, ignores the flags.

Last but not least, VMX features are reported in /proc/cpuinfo even when
VMX is unusable due to lack of enabling in IA32_FEATURE_CONTROL.

== Caveats ==
All of the testing of non-standard flows was done in a VM, as I don't have
a system that leaves IA32_FEATURE_CONTROL unlocked, or locks it with VMX
disabled.

The Centaur and Zhaoxin changes are somewhat speculative, as I haven't
confirmed they actually support IA32_FEATURE_CONTROL, or that they want to
gain "official" KVM support. I assume they unofficially support KVM given
that both CPUs went through the effort of enumerating VMX features. That
in turn would require them to support IA32_FEATURE_CONTROL since KVM will
fault and refuse to load if the MSR doesn't exist.

[1] https://lkml.kernel.org/r/20190925085156.GA3891@zn.tnic

Sean Christopherson (16):
  x86/intel: Initialize IA32_FEATURE_CONTROL MSR at boot
  x86/mce: WARN once if IA32_FEATURE_CONTROL MSR is left unlocked
  x86/centaur: Use common IA32_FEATURE_CONTROL MSR initialization
  x86/zhaoxin: Use common IA32_FEATURE_CONTROL MSR initialization
  KVM: VMX: Drop initialization of IA32_FEATURE_CONTROL MSR
  x86/cpu: Clear VMX feature flag if VMX is not fully enabled
  KVM: VMX: Use VMX feature flag to query BIOS enabling
  KVM: VMX: Check for full VMX support when verifying CPU compatibility
  x86/vmx: Introduce VMX_FEATURES_*
  x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs
  x86/cpu: Print VMX features as separate line item in /proc/cpuinfo
  x86/cpufeatures: Drop synthetic VMX feature flags
  KVM: VMX: Use VMX_FEATURE_* flags to define VMCS control bits
  x86/cpufeatures: Clean up synthetic virtualization flags
  perf/x86: Provide stubs of KVM helpers for non-Intel CPUs
  KVM: VMX: Allow KVM_INTEL when building for Centaur and/or Zhaoxin CPUs

 MAINTAINERS                           |   2 +-
 arch/x86/Kconfig.cpu                  |   8 ++
 arch/x86/boot/mkcpustr.c              |   1 +
 arch/x86/include/asm/cpufeatures.h    |  15 +---
 arch/x86/include/asm/perf_event.h     |  22 +++--
 arch/x86/include/asm/processor.h      |   4 +
 arch/x86/include/asm/vmx.h            | 105 +++++++++++-----------
 arch/x86/include/asm/vmxfeatures.h    | 121 ++++++++++++++++++++++++++
 arch/x86/kernel/cpu/Makefile          |   6 +-
 arch/x86/kernel/cpu/centaur.c         |  35 +-------
 arch/x86/kernel/cpu/common.c          |   3 +
 arch/x86/kernel/cpu/cpu.h             |   4 +
 arch/x86/kernel/cpu/feature_control.c | 106 ++++++++++++++++++++++
 arch/x86/kernel/cpu/intel.c           |  49 +----------
 arch/x86/kernel/cpu/mce/intel.c       |   7 +-
 arch/x86/kernel/cpu/mkcapflags.sh     |  15 +++-
 arch/x86/kernel/cpu/proc.c            |  15 ++++
 arch/x86/kernel/cpu/zhaoxin.c         |  35 +-------
 arch/x86/kvm/Kconfig                  |   9 +-
 arch/x86/kvm/vmx/vmx.c                |  41 ++-------
 20 files changed, 368 insertions(+), 235 deletions(-)
 create mode 100644 arch/x86/include/asm/vmxfeatures.h
 create mode 100644 arch/x86/kernel/cpu/feature_control.c

--
2.22.0

Opportunistically initialize IA32_FEATURE_CONTROL MSR to enable VMX when
the MSR is left unlocked by BIOS. Configuring IA32_FEATURE_CONTROL at
boot time paves the way for similar enabling of other features, e.g.
Software Guard Extensions (SGX).

Temporarily leave equivalent KVM code in place in order to avoid
introducing a regression on Centaur and Zhaoxin CPUs, e.g. removing
KVM's code would leave the MSR unlocked on those CPUs and would break
existing functionality if people are loading kvm_intel on Centaur and/or
Zhaoxin. Defer enablement of the boot-time configuration on Centaur and
Zhaoxin to future patches to aid bisection.

Note, Local Machine Check Exceptions (LMCE) are also supported by the
kernel and enabled via IA32_FEATURE_CONTROL, but the kernel currently
uses LMCE if and only if the feature is explicitly enabled by BIOS. Keep
the current behavior to avoid introducing bugs; future patches can opt
in to opportunistic enabling if it's deemed desirable to do so.

Always lock IA32_FEATURE_CONTROL if it exists, even if the CPU doesn't
support VMX, so that other existing and future kernel code that queries
IA32_FEATURE_CONTROL can assume it's locked.

Start from a clean slate when constructing the value to write to
IA32_FEATURE_CONTROL, i.e. ignore whatever value BIOS left in the MSR so
as not to enable random features or fault on the WRMSR.

Suggested-by: Borislav Petkov <bp@suse.de>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/Kconfig.cpu                  |  4 ++++
 arch/x86/kernel/cpu/Makefile          |  1 +
 arch/x86/kernel/cpu/cpu.h             |  4 ++++
 arch/x86/kernel/cpu/feature_control.c | 30 +++++++++++++++++++++++++++
 arch/x86/kernel/cpu/intel.c           |  2 ++
 5 files changed, 41 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/feature_control.c

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index af9c967782f6..aafc14a0abf7 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -387,6 +387,10 @@ config X86_DEBUGCTLMSR
 	def_bool y
 	depends on !(MK6 || MWINCHIPC6 || MWINCHIP3D || MCYRIXIII || M586MMX || M586TSC || M586 || M486SX || M486) && !UML
 
+config X86_FEATURE_CONTROL_MSR
+	def_bool y
+	depends on CPU_SUP_INTEL
+
 menuconfig PROCESSOR_SELECT
 	bool "Supported processor vendors" if EXPERT
 	---help---
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index d7a1e5a9331c..df5ad0cfe3e9 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -29,6 +29,7 @@ obj-y += umwait.o
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
 
+obj-$(CONFIG_X86_FEATURE_CONTROL_MSR) += feature_control.o
 ifdef CONFIG_CPU_SUP_INTEL
 obj-y += intel.o intel_pconfig.o
 obj-$(CONFIG_PM) += intel_epb.o
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index c0e2407abdd6..d2750f53a0cb 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -62,4 +62,8 @@ unsigned int aperfmperf_get_khz(int cpu);
 
 extern void x86_spec_ctrl_setup_ap(void);
 
+#ifdef CONFIG_X86_FEATURE_CONTROL_MSR
+void init_feature_control_msr(struct cpuinfo_x86 *c);
+#endif
+
 #endif /* ARCH_X86_CPU_H */
diff --git a/arch/x86/kernel/cpu/feature_control.c b/arch/x86/kernel/cpu/feature_control.c
new file mode 100644
index 000000000000..57b928e64cf5
--- /dev/null
+++ b/arch/x86/kernel/cpu/feature_control.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/tboot.h>
+
+#include <asm/cpufeature.h>
+#include <asm/msr-index.h>
+#include <asm/processor.h>
+
+void init_feature_control_msr(struct cpuinfo_x86 *c)
+{
+	u64 msr;
+
+	if (rdmsrl_safe(MSR_IA32_FEATURE_CONTROL, &msr))
+		return;
+
+	if (msr & FEATURE_CONTROL_LOCKED)
+		return;
+
+	/*
+	 * Ignore whatever value BIOS left in the MSR to avoid enabling random
+	 * features or faulting on the WRMSR.
+	 */
+	msr = FEATURE_CONTROL_LOCKED;
+
+	if (cpu_has(c, X86_FEATURE_VMX)) {
+		msr |= FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
+		if (tboot_enabled())
+			msr |= FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX;
+	}
+	wrmsrl(MSR_IA32_FEATURE_CONTROL, msr);
+}
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index c2fdc00df163..15d59224e2f8 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -755,6 +755,8 @@ static void init_intel(struct cpuinfo_x86 *c)
 	/* Work around errata */
 	srat_detect_node(c);
 
+	init_feature_control_msr(c);
+
 	if (cpu_has(c, X86_FEATURE_VMX))
 		detect_vmx_virtcap(c);
 
--
2.22.0

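For anyone who wants to reason about the resulting MSR value without booting a kernel, the decision made by init_feature_control_msr() can be modeled in user space. This is only a sketch under stated assumptions, not kernel code: feature_control_value() is a hypothetical helper, has_vmx and tboot stand in for cpu_has(c, X86_FEATURE_VMX) and tboot_enabled(), and the bit positions are the architectural IA32_FEATURE_CONTROL layout (lock in bit 0, VMXON inside SMX in bit 1, VMXON outside SMX in bit 2).

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural IA32_FEATURE_CONTROL bits touched by this patch. */
#define FEATURE_CONTROL_LOCKED				(1ULL << 0)
#define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1ULL << 1)
#define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX	(1ULL << 2)

/*
 * Hypothetical user-space model: return the value the MSR would hold after
 * boot-time initialization, given the value firmware left behind.
 */
static uint64_t feature_control_value(uint64_t bios_msr, bool has_vmx,
				      bool tboot)
{
	uint64_t msr;

	/* A locked MSR cannot be rewritten, so firmware's value stands. */
	if (bios_msr & FEATURE_CONTROL_LOCKED)
		return bios_msr;

	/* Clean slate: drop whatever unlocked bits firmware left set. */
	msr = FEATURE_CONTROL_LOCKED;

	if (has_vmx) {
		msr |= FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
		if (tboot)
			msr |= FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX;
	}
	return msr;
}
```

Note that the model always returns a locked value, including for CPUs without VMX, which mirrors the "always lock" rule in the changelog.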
WARN if the IA32_FEATURE_CONTROL MSR is somehow left unlocked now that
CPU initialization unconditionally locks the MSR.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/mce/intel.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index 88cd9598fa57..1008f14b803b 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -117,11 +117,10 @@ static bool lmce_supported(void)
 	 * generate a #GP fault.
 	 */
 	rdmsrl(MSR_IA32_FEATURE_CONTROL, tmp);
-	if ((tmp & (FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_LMCE)) ==
-		   (FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_LMCE))
-		return true;
+	if (WARN_ON_ONCE(!(tmp & FEATURE_CONTROL_LOCKED)))
+		return false;
 
-	return false;
+	return tmp & FEATURE_CONTROL_LMCE;
 }
 
 bool mce_intel_cmci_poll(void)
--
2.22.0

Use the recently added IA32_FEATURE_CONTROL MSR initialization sequence
to opportunistically enable VMX support when running on a Centaur CPU.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/Kconfig.cpu          | 2 +-
 arch/x86/kernel/cpu/centaur.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index aafc14a0abf7..9e4e41424dc2 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -389,7 +389,7 @@ config X86_DEBUGCTLMSR
 
 config X86_FEATURE_CONTROL_MSR
 	def_bool y
-	depends on CPU_SUP_INTEL
+	depends on CPU_SUP_INTEL || CPU_SUP_CENTAUR
 
 menuconfig PROCESSOR_SELECT
 	bool "Supported processor vendors" if EXPERT
diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c
index 14433ff5b828..a6ca4c31c1b6 100644
--- a/arch/x86/kernel/cpu/centaur.c
+++ b/arch/x86/kernel/cpu/centaur.c
@@ -250,6 +250,8 @@ static void init_centaur(struct cpuinfo_x86 *c)
 	set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC);
 #endif
 
+	init_feature_control_msr(c);
+
 	if (cpu_has(c, X86_FEATURE_VMX))
 		centaur_detect_vmx_virtcap(c);
 }
--
2.22.0

Use the recently added IA32_FEATURE_CONTROL MSR initialization sequence
to opportunistically enable VMX support when running on a Zhaoxin CPU.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/Kconfig.cpu          | 2 +-
 arch/x86/kernel/cpu/zhaoxin.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 9e4e41424dc2..e78f39adae7b 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -389,7 +389,7 @@ config X86_DEBUGCTLMSR
 
 config X86_FEATURE_CONTROL_MSR
 	def_bool y
-	depends on CPU_SUP_INTEL || CPU_SUP_CENTAUR
+	depends on CPU_SUP_INTEL || CPU_SUP_CENTAUR || CPU_SUP_ZHAOXIN
 
 menuconfig PROCESSOR_SELECT
 	bool "Supported processor vendors" if EXPERT
diff --git a/arch/x86/kernel/cpu/zhaoxin.c b/arch/x86/kernel/cpu/zhaoxin.c
index 8e6f2f4b4afe..01b05a4a5a85 100644
--- a/arch/x86/kernel/cpu/zhaoxin.c
+++ b/arch/x86/kernel/cpu/zhaoxin.c
@@ -141,6 +141,8 @@ static void init_zhaoxin(struct cpuinfo_x86 *c)
 	set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC);
 #endif
 
+	init_feature_control_msr(c);
+
 	if (cpu_has(c, X86_FEATURE_VMX))
 		zhaoxin_detect_vmx_virtcap(c);
 }
--
2.22.0

Remove the code to initialize IA32_FEATURE_CONTROL MSR when KVM is
loaded now that the MSR is initialized during boot on all CPUs that
support VMX, i.e. can possibly load kvm_intel.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 48 +++++++++++++++++-------------------------
 1 file changed, 19 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d4575ffb3cec..23c9e4b91b31 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2192,24 +2192,26 @@ static __init int vmx_disabled_by_bios(void)
 	u64 msr;
 
 	rdmsrl(MSR_IA32_FEATURE_CONTROL, msr);
-	if (msr & FEATURE_CONTROL_LOCKED) {
-		/* launched w/ TXT and VMX disabled */
-		if (!(msr & FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX)
-			&& tboot_enabled())
-			return 1;
-		/* launched w/o TXT and VMX only enabled w/ TXT */
-		if (!(msr & FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX)
-			&& (msr & FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX)
-			&& !tboot_enabled()) {
-			printk(KERN_WARNING "kvm: disable TXT in the BIOS or "
-				"activate TXT before enabling KVM\n");
-			return 1;
-		}
-		/* launched w/o TXT and VMX disabled */
-		if (!(msr & FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX)
-			&& !tboot_enabled())
-			return 1;
+
+	if (WARN_ON_ONCE(!(msr & FEATURE_CONTROL_LOCKED)))
+		return 1;
+
+	/* launched w/ TXT and VMX disabled */
+	if (!(msr & FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX) &&
+	    tboot_enabled())
+		return 1;
+	/* launched w/o TXT and VMX only enabled w/ TXT */
+	if (!(msr & FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX) &&
+	    (msr & FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX) &&
+	    !tboot_enabled()) {
+		pr_warn("kvm: disable TXT in the BIOS or "
+			"activate TXT before enabling KVM\n");
+		return 1;
 	}
+	/* launched w/o TXT and VMX disabled */
+	if (!(msr & FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX) &&
+	    !tboot_enabled())
+		return 1;
 
 	return 0;
 }
@@ -2226,7 +2228,6 @@ static int hardware_enable(void)
 {
 	int cpu = raw_smp_processor_id();
 	u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
-	u64 old, test_bits;
 
 	if (cr4_read_shadow() & X86_CR4_VMXE)
 		return -EBUSY;
@@ -2254,17 +2255,6 @@ static int hardware_enable(void)
 	 */
 	crash_enable_local_vmclear(cpu);
 
-	rdmsrl(MSR_IA32_FEATURE_CONTROL, old);
-
-	test_bits = FEATURE_CONTROL_LOCKED;
-	test_bits |= FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
-	if (tboot_enabled())
-		test_bits |= FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX;
-
-	if ((old & test_bits) != test_bits) {
-		/* enable and lock */
-		wrmsrl(MSR_IA32_FEATURE_CONTROL, old | test_bits);
-	}
 	kvm_cpu_vmxon(phys_addr);
 	if (enable_ept)
 		ept_sync_global();
--
2.22.0

Now that the IA32_FEATURE_CONTROL MSR is guaranteed to be configured and
locked, clear the VMX capability flag if the IA32_FEATURE_CONTROL MSR is
not supported or if BIOS disabled VMX, i.e. locked IA32_FEATURE_CONTROL
and did not set the appropriate VMX enable bit.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/feature_control.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/feature_control.c b/arch/x86/kernel/cpu/feature_control.c
index 57b928e64cf5..74c76159a046 100644
--- a/arch/x86/kernel/cpu/feature_control.c
+++ b/arch/x86/kernel/cpu/feature_control.c
@@ -7,13 +7,19 @@
 
 void init_feature_control_msr(struct cpuinfo_x86 *c)
 {
+	bool tboot = tboot_enabled();
 	u64 msr;
 
-	if (rdmsrl_safe(MSR_IA32_FEATURE_CONTROL, &msr))
+	if (rdmsrl_safe(MSR_IA32_FEATURE_CONTROL, &msr)) {
+		if (cpu_has(c, X86_FEATURE_VMX)) {
+			pr_err_once("x86/cpu: VMX disabled, IA32_FEATURE_CONTROL MSR unsupported\n");
+			clear_cpu_cap(c, X86_FEATURE_VMX);
+		}
 		return;
+	}
 
 	if (msr & FEATURE_CONTROL_LOCKED)
-		return;
+		goto update_caps;
 
 	/*
 	 * Ignore whatever value BIOS left in the MSR to avoid enabling random
@@ -23,8 +29,19 @@ void init_feature_control_msr(struct cpuinfo_x86 *c)
 
 	if (cpu_has(c, X86_FEATURE_VMX)) {
 		msr |= FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
-		if (tboot_enabled())
+		if (tboot)
 			msr |= FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX;
 	}
 	wrmsrl(MSR_IA32_FEATURE_CONTROL, msr);
+
+update_caps:
+	if (!cpu_has(c, X86_FEATURE_VMX))
+		return;
+
+	if ((tboot && !(msr & FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX)) ||
+	    (!tboot && !(msr & FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX))) {
+		pr_err_once("x86/cpu: VMX disabled by BIOS (TXT %s)\n",
+			    tboot ? "enabled" : "disabled");
+		clear_cpu_cap(c, X86_FEATURE_VMX);
+	}
 }
--
2.22.0

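The update_caps check boils down to "exactly one enable bit matters, and which one depends on whether tboot is active". A small user-space model may make the truth table easier to see. This is an illustrative sketch only: vmx_usable() is a hypothetical name, tboot stands in for tboot_enabled(), and the bit positions are the architectural IA32_FEATURE_CONTROL layout. As in the patch, the lock bit is not re-checked because the MSR is assumed to be locked by the time this runs.

```c
#include <stdbool.h>
#include <stdint.h>

#define FEATURE_CONTROL_LOCKED				(1ULL << 0)
#define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1ULL << 1)
#define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX	(1ULL << 2)

/*
 * Hypothetical model of the update_caps decision: VMX stays advertised only
 * if the enable bit matching the current launch environment is set.
 */
static bool vmx_usable(uint64_t msr, bool tboot)
{
	/* Inside SMX (tboot active) vs. outside SMX selects the bit. */
	uint64_t required = tboot ? FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX
				  : FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;

	return (msr & required) != 0;
}
```

For example, a firmware value of 0x3 (locked, inside-SMX only) keeps VMX usable under tboot but not on a regular boot, which is the case the old KVM code warned about with "disable TXT in the BIOS or activate TXT before enabling KVM".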
Replace KVM's manual checks on IA32_FEATURE_CONTROL with a query on the
boot CPU's VMX feature flag. The VMX flag is now cleared during boot if
VMX isn't fully enabled via IA32_FEATURE_CONTROL, including the case
where IA32_FEATURE_CONTROL isn't supported.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 26 +-------------------------
 1 file changed, 1 insertion(+), 25 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 23c9e4b91b31..510f8a778fca 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2189,31 +2189,7 @@ static __init int cpu_has_kvm_support(void)
 
 static __init int vmx_disabled_by_bios(void)
 {
-	u64 msr;
-
-	rdmsrl(MSR_IA32_FEATURE_CONTROL, msr);
-
-	if (WARN_ON_ONCE(!(msr & FEATURE_CONTROL_LOCKED)))
-		return 1;
-
-	/* launched w/ TXT and VMX disabled */
-	if (!(msr & FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX) &&
-	    tboot_enabled())
-		return 1;
-	/* launched w/o TXT and VMX only enabled w/ TXT */
-	if (!(msr & FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX) &&
-	    (msr & FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX) &&
-	    !tboot_enabled()) {
-		pr_warn("kvm: disable TXT in the BIOS or "
-			"activate TXT before enabling KVM\n");
-		return 1;
-	}
-	/* launched w/o TXT and VMX disabled */
-	if (!(msr & FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX) &&
-	    !tboot_enabled())
-		return 1;
-
-	return 0;
+	return !boot_cpu_has(X86_FEATURE_VMX);
 }
 
 static void kvm_cpu_vmxon(u64 addr)
--
2.22.0

Explicitly check the current CPU's VMX feature flag when verifying
compatibility across physical CPUs. This effectively adds a check on
IA32_FEATURE_CONTROL to ensure that VMX is fully enabled on all CPUs.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 510f8a778fca..a482949063f2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6837,6 +6837,11 @@ static int __init vmx_check_processor_compat(void)
 	struct vmcs_config vmcs_conf;
 	struct vmx_capability vmx_cap;
 
+	if (!this_cpu_has(X86_FEATURE_VMX)) {
+		pr_err("kvm: VMX is disabled on CPU %d\n", smp_processor_id());
+		return -EIO;
+	}
+
 	if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
 		return -EIO;
 	if (nested)
--
2.22.0

Add a VMX specific variant of X86_FEATURE_* flags, which will eventually
supplant the synthetic VMX flags defined in cpufeatures word 8. Use the
Intel-defined layouts for the major VMX execution controls so that their
word entries can be directly populated from their respective MSRs, and
so that the VMX_FEATURE_* flags can be used to define the existing bit
definitions in asm/vmx.h, i.e. force developers to define a VMX_FEATURE
flag when adding support for a new hardware feature.

The majority of Intel's (and compatible CPU's) VMX capabilities are
enumerated via MSRs and not CPUID, i.e. querying /proc/cpuinfo doesn't
naturally provide any insight into the virtualization capabilities of
VMX enabled CPUs. Commit e38e05a85828d ("x86: extended "flags" to show
virtualization HW feature in /proc/cpuinfo") attempted to address the
issue by synthesizing select VMX features into a Linux-defined word in
cpufeatures.

The synthetic cpufeatures approach has several flaws:

  - The set of synthesized VMX flags has become extremely stale with
    respect to the full set of VMX features, e.g. only one new flag
    (EPT A/D) has been added in the decade since the introduction of
    the synthetic VMX features. Failure to keep the VMX flags up to
    date is likely due to the lack of a mechanism that forces developers
    to consider whether or not a new feature is worth reporting.

  - The synthetic flags may be misinterpreted as affecting kernel
    behavior, whereas KVM, the kernel's sole consumer of VMX, completely
    ignores the synthetic flags.

  - New CPU vendors that support VMX have duplicated the hideous code
    that propagates VMX features from MSRs to cpufeatures. Bringing the
    synthetic VMX flags up to date would exacerbate the copy+paste
    trainwreck.

Define separate VMX_FEATURE flags to set the stage for enumerating VMX
capabilities outside of the cpu_has() framework, and for adding
functional usage of VMX_FEATURE_* to help ensure the features reported
via /proc/cpuinfo are up to date with respect to kernel recognition of
VMX capabilities.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 MAINTAINERS                        |   2 +-
 arch/x86/include/asm/processor.h   |   1 +
 arch/x86/include/asm/vmxfeatures.h | 116 +++++++++++++++++++++++++++++
 3 files changed, 118 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/include/asm/vmxfeatures.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 296de2b51c83..a6ba0ddabeb2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9017,7 +9017,7 @@ F: arch/x86/include/uapi/asm/svm.h
 F: arch/x86/include/asm/kvm*
 F: arch/x86/include/asm/pvclock-abi.h
 F: arch/x86/include/asm/svm.h
-F: arch/x86/include/asm/vmx.h
+F: arch/x86/include/asm/vmx*.h
 F: arch/x86/kernel/kvm.c
 F: arch/x86/kernel/kvmclock.c
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 6e0a3b43d027..4c3f41d7be5f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -24,6 +24,7 @@ struct vm86;
 #include <asm/special_insns.h>
 #include <asm/fpu/types.h>
 #include <asm/unwind_hints.h>
+#include <asm/vmxfeatures.h>
 
 #include <linux/personality.h>
 #include <linux/cache.h>
diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h
new file mode 100644
index 000000000000..ab82e3643d0c
--- /dev/null
+++ b/arch/x86/include/asm/vmxfeatures.h
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_VMXFEATURES_H
+#define _ASM_X86_VMXFEATURES_H
+
+/*
+ * Note: If the comment begins with a quoted string, that string is used
+ * in /proc/cpuinfo instead of the macro name. If the string is "",
+ * this feature bit is not displayed in /proc/cpuinfo at all.
+ */
+
+/* Pin-Based VM-Execution Controls, EPT, VPID (misc) and VM-Functions, word 0 */
+#define VMX_FEATURE_INTR_EXITING ( 0*32+ 0) /* "" VM-Exit on vectored interrupts */
+#define VMX_FEATURE_RESERVED_PIN_1 ( 0*32+ 1) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_PIN_2 ( 0*32+ 2) /* "" Reserved */
+#define VMX_FEATURE_NMI_EXITING ( 0*32+ 3) /* "" VM-Exit on NMIs */
+#define VMX_FEATURE_RESERVED_PIN_4 ( 0*32+ 4) /* "" Reserved */
+#define VMX_FEATURE_VIRTUAL_NMIS ( 0*32+ 5) /* NMI virtualization */
+#define VMX_FEATURE_PREEMPTION_TIMER ( 0*32+ 6) /* VMX Preemption Timer */
+#define VMX_FEATURE_POSTED_INTR ( 0*32+ 7) /* Posted Interrupts */
+#define VMX_FEATURE_RESERVED_PIN_8 ( 0*32+ 8) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_PIN_9 ( 0*32+ 9) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_PIN_10 ( 0*32+ 10) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_PIN_11 ( 0*32+ 11) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_PIN_12 ( 0*32+ 12) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_PIN_13 ( 0*32+ 13) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_PIN_14 ( 0*32+ 14) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_PIN_15 ( 0*32+ 15) /* "" Reserved */
+
+#define VMX_FEATURE_EPT_EXECUTE_ONLY ( 0*32+ 16) /* "ept_x_only" EPT entries can be execute only */
+#define VMX_FEATURE_EPT_AD ( 0*32+ 17) /* EPT Accessed/Dirty bits */
+#define VMX_FEATURE_EPT_1GB ( 0*32+ 18) /* 1GB EPT pages */
+#define VMX_FEATURE_RESERVED_EPT_3 ( 0*32+ 19) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_EPT_4 ( 0*32+ 20) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_EPT_5 ( 0*32+ 21) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_EPT_6 ( 0*32+ 22) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_EPT_7 ( 0*32+ 23) /* "" Reserved */
+
+#define VMX_FEATURE_INVVPID ( 0*32+ 24) /* INVVPID is supported */
+#define VMX_FEATURE_RESERVED_MISC_1 ( 0*32+ 25) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_MISC_2 ( 0*32+ 26) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_MISC_3 ( 0*32+ 27) /* "" Reserved */
+
+#define VMX_FEATURE_EPTP_SWITCHING ( 0*32+ 28) /* EPTP switching (in guest) */
+#define VMX_FEATURE_RESERVED_FUNC_1 ( 0*32+ 29) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_FUNC_2 ( 0*32+ 30) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_FUNC_3 ( 0*32+ 31) /* "" Reserved */
+
+/* Primary Processor-Based VM-Execution Controls, word 1 */
+#define VMX_FEATURE_RESERVED_PRIMARY_0 ( 1*32+ 0) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_PRIMARY_1 ( 1*32+ 1) /* "" Reserved */
+#define VMX_FEATURE_VIRTUAL_INTR_PENDING ( 1*32+ 2) /* "" VM-Exit if INTRs are unblocked in guest */
+#define VMX_FEATURE_TSC_OFFSETTING ( 1*32+ 3) /* Offset hardware TSC when read in guest */
+#define VMX_FEATURE_RESERVED_PRIMARY_4 ( 1*32+ 4) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_PRIMARY_5 ( 1*32+ 5) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_PRIMARY_6 ( 1*32+ 6) /* "" Reserved */
+#define VMX_FEATURE_HLT_EXITING ( 1*32+ 7) /* "" VM-Exit on HLT */
+#define VMX_FEATURE_RESERVED_PRIMARY_8 ( 1*32+ 8) /* "" Reserved */
+#define VMX_FEATURE_INVLPG_EXITING ( 1*32+ 9) /* "" VM-Exit on INVLPG */
+#define VMX_FEATURE_MWAIT_EXITING ( 1*32+ 10) /* "" VM-Exit on MWAIT */
+#define VMX_FEATURE_RDPMC_EXITING ( 1*32+ 11) /* "" VM-Exit on RDPMC */
+#define VMX_FEATURE_RDTSC_EXITING ( 1*32+ 12) /* "" VM-Exit on RDTSC */
+#define VMX_FEATURE_RESERVED_PRIMARY_13 ( 1*32+ 13) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_PRIMARY_14 ( 1*32+ 14) /* "" Reserved */
+#define VMX_FEATURE_CR3_LOAD_EXITING ( 1*32+ 15) /* "" VM-Exit on writes to CR3 */
+#define VMX_FEATURE_CR3_STORE_EXITING ( 1*32+ 16) /* "" VM-Exit on reads from CR3 */
+#define VMX_FEATURE_RESERVED_PRIMARY_17 ( 1*32+ 17) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_PRIMARY_18 ( 1*32+ 18) /* "" Reserved */
+#define VMX_FEATURE_CR8_LOAD_EXITING ( 1*32+ 19) /* "" VM-Exit on writes to CR8 */
+#define VMX_FEATURE_CR8_STORE_EXITING ( 1*32+ 20) /* "" VM-Exit on reads from CR8 */
+#define VMX_FEATURE_VIRTUAL_TPR ( 1*32+ 21) /* TPR virtualization, a.k.a. TPR shadow */
+#define VMX_FEATURE_VIRTUAL_NMI_PENDING ( 1*32+ 22) /* "" VM-Exit if NMIs are unblocked in guest */
+#define VMX_FEATURE_MOV_DR_EXITING ( 1*32+ 23) /* "" VM-Exit on accesses to debug registers */
+#define VMX_FEATURE_UNCOND_IO_EXITING ( 1*32+ 24) /* "" VM-Exit on *all* IN{S} and OUT{S}*/
+#define VMX_FEATURE_USE_IO_BITMAPS ( 1*32+ 25) /* "" VM-Exit based on I/O port */
+#define VMX_FEATURE_RESERVED_PRIMARY_26 ( 1*32+ 26) /* "" Reserved */
+#define VMX_FEATURE_MONITOR_TRAP_FLAG ( 1*32+ 27) /* "mtf" VMX single-step VM-Exits */
+#define VMX_FEATURE_USE_MSR_BITMAPS ( 1*32+ 28) /* "" VM-Exit based on MSR index */
+#define VMX_FEATURE_MONITOR_EXITING ( 1*32+ 29) /* "" VM-Exit on MONITOR (MWAIT's accomplice) */
+#define VMX_FEATURE_PAUSE_EXITING ( 1*32+ 30) /* "" VM-Exit on PAUSE (unconditionally) */
+#define VMX_FEATURE_SEC_CONTROLS ( 1*32+ 31) /* "" Enable Secondary VM-Execution Controls */
+
+/* Secondary Processor-Based VM-Execution Controls, word 2 */
+#define VMX_FEATURE_VIRT_APIC_ACCESSES ( 2*32+ 0) /* "" Virtualize memory mapped APIC accesses */
+#define VMX_FEATURE_EPT ( 2*32+ 1) /* Extended Page Tables, a.k.a. Two-Dimensional Paging */
+#define VMX_FEATURE_DESC_EXITING ( 2*32+ 2) /* "" VM-Exit on {S,L}*DT instructions */
+#define VMX_FEATURE_RDTSCP ( 2*32+ 3) /* "" Enable RDTSCP in guest */
+#define VMX_FEATURE_VIRTUAL_X2APIC ( 2*32+ 4) /* "" Virtualize X2APIC for the guest */
+#define VMX_FEATURE_VPID ( 2*32+ 5) /* Virtual Processor ID (TLB ASID modifier) */
+#define VMX_FEATURE_WBINVD_EXITING ( 2*32+ 6) /* "" VM-Exit on WBINVD */
+#define VMX_FEATURE_UNRESTRICTED_GUEST ( 2*32+ 7) /* Allow Big Real Mode and other "invalid" states */
+#define VMX_FEATURE_APIC_REGISTER_VIRT ( 2*32+ 8) /* "" Hardware emulation of reads to the virtual-APIC */
+#define VMX_FEATURE_VIRT_INTR_DELIVERY ( 2*32+ 9) /* "" Evaluation and delivery of pending virtual interrupts */
+#define VMX_FEATURE_PAUSE_LOOP_EXITING ( 2*32+ 10) /* "ple" Conditionally VM-Exit on PAUSE at CPL0 */
+#define VMX_FEATURE_RDRAND_EXITING ( 2*32+ 11) /* "" VM-Exit on RDRAND*/
+#define VMX_FEATURE_INVPCID ( 2*32+ 12) /* "" Enable INVPCID in guest */
+#define VMX_FEATURE_VMFUNC ( 2*32+ 13) /* "" Enable VM-Functions (leaf dependent) */
+#define VMX_FEATURE_SHADOW_VMCS ( 2*32+ 14) /* VMREAD/VMWRITE in guest can access shadow VMCS */
+#define VMX_FEATURE_ENCLS_EXITING ( 2*32+ 15) /* "" VM-Exit on ENCLS (leaf dependent) */
+#define VMX_FEATURE_RDSEED_EXITING ( 2*32+ 16) /* "" VM-Exit on RDSEED */
+#define VMX_FEATURE_PAGE_MOD_LOGGING ( 2*32+ 17) /* "pml" Log dirty pages into buffer */
+#define VMX_FEATURE_EPT_VIOLATION_VE ( 2*32+ 18) /* "" Conditionally reflect EPT violations as #VE exceptions */
+#define VMX_FEATURE_PT_CONCEAL_VMX ( 2*32+ 19) /* "" Suppress VMX indicators in Processor Trace */
+#define VMX_FEATURE_XSAVES ( 2*32+ 20) /* "" Enable XSAVES and XRSTORS in guest */
+#define VMX_FEATURE_RESERVED_SECONDARY_21 ( 2*32+ 21) /* "" Reserved */
+#define VMX_FEATURE_MODE_BASED_EPT_EXEC ( 2*32+ 22) /* Enable separate EPT EXEC bits for supervisor vs. user */
+#define VMX_FEATURE_RESERVED_SECONDARY_23 ( 2*32+ 23) /* "" Reserved */
+#define VMX_FEATURE_PT_USE_GPA ( 2*32+ 24) /* "" Processor Trace logs GPAs */
+#define VMX_FEATURE_TSC_SCALING ( 2*32+ 25) /* Scale hardware TSC when read in guest */
+#define VMX_FEATURE_RESERVED_SECONDARY_26 ( 2*32+ 26) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_SECONDARY_27 ( 2*32+ 27) /* "" Reserved */
+#define VMX_FEATURE_ENCLV_EXITING ( 2*32+ 28) /* "" VM-Exit on ENCLV (leaf dependent) */
+#define VMX_FEATURE_RESERVED_SECONDARY_29 ( 2*32+ 29) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_SECONDARY_30 ( 2*32+ 30) /* "" Reserved */
+#define VMX_FEATURE_RESERVED_SECONDARY_31 ( 2*32+ 31) /* "" Reserved */
+
+#endif /* _ASM_X86_VMXFEATURES_H */
--
2.22.0

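The (word*32 + bit) encoding used by the VMX_FEATURE_* flags mirrors X86_FEATURE_*: dividing by 32 selects the capability word and the remainder selects the bit within it. The following user-space sketch shows the arithmetic on a plain array. The helper names vmx_feature_set() and vmx_feature_test() and the array size NR_WORDS are hypothetical and exist only for illustration; only the four feature values are copied from the header above.

```c
#include <stdbool.h>
#include <stdint.h>

/* A few values copied from vmxfeatures.h for illustration. */
#define VMX_FEATURE_EPT_AD		( 0*32+ 17)
#define VMX_FEATURE_TSC_OFFSETTING	( 1*32+  3)
#define VMX_FEATURE_EPT			( 2*32+  1)
#define VMX_FEATURE_VPID		( 2*32+  5)

/* Hypothetical: one 32-bit word per group of VM-execution controls. */
#define NR_WORDS 3

/* Mark a feature as supported: word = feature / 32, bit = feature % 32. */
static void vmx_feature_set(uint32_t *words, unsigned int feature)
{
	words[feature / 32] |= 1u << (feature % 32);
}

/* Query a feature using the same word/bit split. */
static bool vmx_feature_test(const uint32_t *words, unsigned int feature)
{
	return (words[feature / 32] & (1u << (feature % 32))) != 0;
}
```

Because word 1 and word 2 use the Intel-defined bit positions, the allowed-1 half of the corresponding capability MSR can be stored into the word directly, with no per-bit translation.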
Add an entry in struct cpuinfo_x86 to track VMX capabilities and fill the
capabilities during IA32_FEATURE_CONTROL MSR initialization.

Make the VMX capabilities dependent on X86_FEATURE_CONTROL_MSR and
X86_FEATURE_NAMES so as to avoid unnecessary overhead on CPUs that can't
possibly support VMX, or when /proc/cpuinfo is not available.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/Kconfig.cpu                  |  4 ++
 arch/x86/include/asm/processor.h      |  3 ++
 arch/x86/include/asm/vmxfeatures.h    |  5 +++
 arch/x86/kernel/cpu/common.c          |  3 ++
 arch/x86/kernel/cpu/feature_control.c | 59 +++++++++++++++++++++++++++
 5 files changed, 74 insertions(+)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index e78f39adae7b..e7571bd0f515 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -391,6 +391,10 @@ config X86_FEATURE_CONTROL_MSR
 	def_bool y
 	depends on CPU_SUP_INTEL || CPU_SUP_CENTAUR || CPU_SUP_ZHAOXIN
 
+config X86_VMX_FEATURE_NAMES
+	def_bool y
+	depends on X86_FEATURE_CONTROL_MSR && X86_FEATURE_NAMES
+
 menuconfig PROCESSOR_SELECT
 	bool "Supported processor vendors" if EXPERT
 	---help---
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 4c3f41d7be5f..3b5dc9b1e7c4 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -84,6 +84,9 @@ struct cpuinfo_x86 {
 #ifdef CONFIG_X86_64
 	/* Number of 4K pages in DTLB/ITLB combined(in pages): */
 	int			x86_tlbsize;
+#endif
+#ifdef CONFIG_X86_VMX_FEATURE_NAMES
+	__u32			vmx_capability[NVMXINTS];
 #endif
 	__u8			x86_virt_bits;
 	__u8			x86_phys_bits;
diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h
index ab82e3643d0c..d33ea1c165fd 100644
--- a/arch/x86/include/asm/vmxfeatures.h
+++ b/arch/x86/include/asm/vmxfeatures.h
@@ -2,6 +2,11 @@
 #ifndef _ASM_X86_VMXFEATURES_H
 #define _ASM_X86_VMXFEATURES_H
 
+/*
+ * Defines VMX CPU feature bits
+ */
+#define NVMXINTS			3 /* N 32-bit words worth of info */
+
 /*
  * Note: If the comment begins with a quoted string, that string is used
  * in /proc/cpuinfo instead of the macro name.  If the string is "",
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9ae7d1bcd4f4..33537556dac6 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1421,6 +1421,9 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 #endif
 	c->x86_cache_alignment = c->x86_clflush_size;
 	memset(&c->x86_capability, 0, sizeof(c->x86_capability));
+#ifdef CONFIG_X86_VMX_FEATURE_NAMES
+	memset(&c->vmx_capability, 0, sizeof(c->vmx_capability));
+#endif
 
 	generic_identify(c);
 
diff --git a/arch/x86/kernel/cpu/feature_control.c b/arch/x86/kernel/cpu/feature_control.c
index 74c76159a046..43eb65e8cd18 100644
--- a/arch/x86/kernel/cpu/feature_control.c
+++ b/arch/x86/kernel/cpu/feature_control.c
@@ -4,6 +4,61 @@
 #include <asm/cpufeature.h>
 #include <asm/msr-index.h>
 #include <asm/processor.h>
+#include <asm/vmx.h>
+
+#ifdef CONFIG_X86_VMX_FEATURE_NAMES
+enum vmx_feature_leafs {
+	MISC_FEATURES = 0,
+	PRIMARY_PROC_CONTROLS,
+	SECONDARY_PROC_CONTROLS,
+	NR_VMX_FEATURE_WORDS,
+};
+
+#define EPT_BIT(x) BIT(VMX_FEATURE_##x & 0x1f)
+
+static void init_vmx_capabilities(struct cpuinfo_x86 *c)
+{
+	u32 supported, funcs, ept, vpid, ign;
+
+	BUILD_BUG_ON(NVMXINTS != NR_VMX_FEATURE_WORDS);
+
+	/*
+	 * The high bits contain the allowed-1 settings, i.e. features that can
+	 * be turned on.  The low bits contain the allowed-0 settings, i.e.
+	 * features that can be turned off.  Ignore the allowed-0 settings,
+	 * if a feature can be turned on then it's supported.
+	 */
+	rdmsr(MSR_IA32_VMX_PINBASED_CTLS, ign, supported);
+	rdmsr_safe(MSR_IA32_VMX_VMFUNC, &ign, &funcs);
+
+	/*
+	 * Except for EPT+VPID, which enumerates support for both in a single
+	 * MSR, low for EPT, high for VPID.
+	 */
+	rdmsr_safe(MSR_IA32_VMX_EPT_VPID_CAP, &ept, &vpid);
+
+	/* Pin, EPT, VPID and VM-Func are merged into a single word. */
+	WARN_ON_ONCE(supported >> 16);
+	WARN_ON_ONCE(funcs >> 4);
+	c->vmx_capability[MISC_FEATURES] = (supported & 0xffff) |
+					   ((vpid & 0x1) << 24) |
+					   ((funcs & 0xf) << 28);
+
+	/* EPT bits are scattered and must be manually handled. */
+	if (ept & VMX_EPT_EXECUTE_ONLY_BIT)
+		c->vmx_capability[MISC_FEATURES] |= EPT_BIT(EPT_EXECUTE_ONLY);
+	if (ept & VMX_EPT_AD_BIT)
+		c->vmx_capability[MISC_FEATURES] |= EPT_BIT(EPT_AD);
+	if (ept & VMX_EPT_1GB_PAGE_BIT)
+		c->vmx_capability[MISC_FEATURES] |= EPT_BIT(EPT_1GB);
+
+	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, ign, supported);
+	c->vmx_capability[PRIMARY_PROC_CONTROLS] = supported;
+
+	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS2, &ign, &supported);
+	c->vmx_capability[SECONDARY_PROC_CONTROLS] = supported;
+}
+#endif /* CONFIG_X86_VMX_FEATURE_NAMES */
 
 void init_feature_control_msr(struct cpuinfo_x86 *c)
 {
@@ -43,5 +98,9 @@ void init_feature_control_msr(struct cpuinfo_x86 *c)
 			pr_err_once("x86/cpu: VMX disabled by BIOS (TXT %s)\n",
 				    tboot ? "enabled" : "disabled");
 		clear_cpu_cap(c, X86_FEATURE_VMX);
+	} else {
+#ifdef CONFIG_X86_VMX_FEATURE_NAMES
+		init_vmx_capabilities(c);
+#endif
 	}
 }
-- 
2.22.0
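For readers following the layout of word 0, here is a minimal userspace sketch of the packing done in init_vmx_capabilities(). The helper name pack_misc_features() is hypothetical and not part of the patch. The masks and shifts are copied from the hunk above. The EPT bits are left out because their positions are defined in vmxfeatures.h rather than by a fixed shift.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: pack the pin-based allowed-1
 * bits, the VPID capability bit and the VM-function bits into one 32-bit
 * word, using the same masks and shifts as init_vmx_capabilities().
 * EPT bits are intentionally omitted here.
 */
static uint32_t pack_misc_features(uint32_t pin_allowed1, uint32_t vpid_cap,
				   uint32_t vmfuncs)
{
	return (pin_allowed1 & 0xffff) |	/* bits 15:0, pin controls */
	       ((vpid_cap & 0x1) << 24) |	/* bit 24, first VPID cap bit */
	       ((vmfuncs & 0xf) << 28);		/* bits 31:28, VM functions */
}
```

Anything above bit 15 of the pin controls or above bit 3 of the VM functions is silently dropped by the masks, which is presumably why the patch pairs them with WARN_ON_ONCE().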
Add support for generating VMX feature names in capflags.c and printing the resulting names in /proc/cpuinfo as "vmx flags" when VMX support is detected. Do not print VMX flags if no bits are set in word 0, which includes Pin controls. INTR and NMI exiting are fundamental pillars of virtualization, if they're not supported then the CPU is broken, it does not actually support VMX, or the kernel wasn't built with support for the target CPU. Remove all code which sets the synthetic VMX flags in cpufeatures so that duplicate VMX features are not printed in "flags" and "vmx flags". Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> --- arch/x86/boot/mkcpustr.c | 1 + arch/x86/kernel/cpu/Makefile | 5 ++-- arch/x86/kernel/cpu/centaur.c | 35 ---------------------- arch/x86/kernel/cpu/intel.c | 49 ------------------------------- arch/x86/kernel/cpu/mkcapflags.sh | 15 +++++++--- arch/x86/kernel/cpu/proc.c | 15 ++++++++++ arch/x86/kernel/cpu/zhaoxin.c | 35 ---------------------- 7 files changed, 30 insertions(+), 125 deletions(-) diff --git a/arch/x86/boot/mkcpustr.c b/arch/x86/boot/mkcpustr.c index 9caa10e82217..da0ccc5de538 100644 --- a/arch/x86/boot/mkcpustr.c +++ b/arch/x86/boot/mkcpustr.c @@ -15,6 +15,7 @@ #include "../include/asm/required-features.h" #include "../include/asm/disabled-features.h" #include "../include/asm/cpufeatures.h" +#include "../include/asm/vmxfeatures.h" #include "../kernel/cpu/capflags.c" int main(void) diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile index df5ad0cfe3e9..025cbfd45687 100644 --- a/arch/x86/kernel/cpu/Makefile +++ b/arch/x86/kernel/cpu/Makefile @@ -54,11 +54,12 @@ obj-$(CONFIG_ACRN_GUEST) += acrn.o ifdef CONFIG_X86_FEATURE_NAMES quiet_cmd_mkcapflags = MKCAP $@ - cmd_mkcapflags = $(CONFIG_SHELL) $(srctree)/$(src)/mkcapflags.sh $< $@ + cmd_mkcapflags = $(CONFIG_SHELL) $(srctree)/$(src)/mkcapflags.sh $@ $^ cpufeature = $(src)/../../include/asm/cpufeatures.h +vmxfeature = 
$(src)/../../include/asm/vmxfeatures.h -$(obj)/capflags.c: $(cpufeature) $(src)/mkcapflags.sh FORCE +$(obj)/capflags.c: $(cpufeature) $(vmxfeature) $(src)/mkcapflags.sh FORCE $(call if_changed,mkcapflags) endif targets += capflags.c diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c index a6ca4c31c1b6..be11c796926b 100644 --- a/arch/x86/kernel/cpu/centaur.c +++ b/arch/x86/kernel/cpu/centaur.c @@ -18,13 +18,6 @@ #define RNG_ENABLED (1 << 3) #define RNG_ENABLE (1 << 6) /* MSR_VIA_RNG */ -#define X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW 0x00200000 -#define X86_VMX_FEATURE_PROC_CTLS_VNMI 0x00400000 -#define X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS 0x80000000 -#define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC 0x00000001 -#define X86_VMX_FEATURE_PROC_CTLS2_EPT 0x00000002 -#define X86_VMX_FEATURE_PROC_CTLS2_VPID 0x00000020 - static void init_c3(struct cpuinfo_x86 *c) { u32 lo, hi; @@ -119,31 +112,6 @@ static void early_init_centaur(struct cpuinfo_x86 *c) } } -static void centaur_detect_vmx_virtcap(struct cpuinfo_x86 *c) -{ - u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2; - - rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high); - msr_ctl = vmx_msr_high | vmx_msr_low; - - if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW) - set_cpu_cap(c, X86_FEATURE_TPR_SHADOW); - if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_VNMI) - set_cpu_cap(c, X86_FEATURE_VNMI); - if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS) { - rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2, - vmx_msr_low, vmx_msr_high); - msr_ctl2 = vmx_msr_high | vmx_msr_low; - if ((msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC) && - (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW)) - set_cpu_cap(c, X86_FEATURE_FLEXPRIORITY); - if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_EPT) - set_cpu_cap(c, X86_FEATURE_EPT); - if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID) - set_cpu_cap(c, X86_FEATURE_VPID); - } -} - static void init_centaur(struct cpuinfo_x86 *c) { #ifdef CONFIG_X86_32 @@ -251,9 +219,6 @@ static void 
init_centaur(struct cpuinfo_x86 *c) #endif init_feature_control_msr(c); - - if (cpu_has(c, X86_FEATURE_VMX)) - centaur_detect_vmx_virtcap(c); } #ifdef CONFIG_X86_32 diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 15d59224e2f8..594d2686ad52 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -494,52 +494,6 @@ static void srat_detect_node(struct cpuinfo_x86 *c) #endif } -static void detect_vmx_virtcap(struct cpuinfo_x86 *c) -{ - /* Intel VMX MSR indicated features */ -#define X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW 0x00200000 -#define X86_VMX_FEATURE_PROC_CTLS_VNMI 0x00400000 -#define X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS 0x80000000 -#define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC 0x00000001 -#define X86_VMX_FEATURE_PROC_CTLS2_EPT 0x00000002 -#define X86_VMX_FEATURE_PROC_CTLS2_VPID 0x00000020 -#define x86_VMX_FEATURE_EPT_CAP_AD 0x00200000 - - u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2; - u32 msr_vpid_cap, msr_ept_cap; - - clear_cpu_cap(c, X86_FEATURE_TPR_SHADOW); - clear_cpu_cap(c, X86_FEATURE_VNMI); - clear_cpu_cap(c, X86_FEATURE_FLEXPRIORITY); - clear_cpu_cap(c, X86_FEATURE_EPT); - clear_cpu_cap(c, X86_FEATURE_VPID); - clear_cpu_cap(c, X86_FEATURE_EPT_AD); - - rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high); - msr_ctl = vmx_msr_high | vmx_msr_low; - if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW) - set_cpu_cap(c, X86_FEATURE_TPR_SHADOW); - if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_VNMI) - set_cpu_cap(c, X86_FEATURE_VNMI); - if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS) { - rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2, - vmx_msr_low, vmx_msr_high); - msr_ctl2 = vmx_msr_high | vmx_msr_low; - if ((msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC) && - (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW)) - set_cpu_cap(c, X86_FEATURE_FLEXPRIORITY); - if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_EPT) { - set_cpu_cap(c, X86_FEATURE_EPT); - rdmsr(MSR_IA32_VMX_EPT_VPID_CAP, - msr_ept_cap, msr_vpid_cap); - if 
(msr_ept_cap & x86_VMX_FEATURE_EPT_CAP_AD) - set_cpu_cap(c, X86_FEATURE_EPT_AD); - } - if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID) - set_cpu_cap(c, X86_FEATURE_VPID); - } -} - #define MSR_IA32_TME_ACTIVATE 0x982 /* Helpers to access TME_ACTIVATE MSR */ @@ -757,9 +711,6 @@ static void init_intel(struct cpuinfo_x86 *c) init_feature_control_msr(c); - if (cpu_has(c, X86_FEATURE_VMX)) - detect_vmx_virtcap(c); - if (cpu_has(c, X86_FEATURE_TME)) detect_tme(c); diff --git a/arch/x86/kernel/cpu/mkcapflags.sh b/arch/x86/kernel/cpu/mkcapflags.sh index aed45b8895d5..1db560ed2ca3 100644 --- a/arch/x86/kernel/cpu/mkcapflags.sh +++ b/arch/x86/kernel/cpu/mkcapflags.sh @@ -6,8 +6,7 @@ set -e -IN=$1 -OUT=$2 +OUT=$1 dump_array() { @@ -15,6 +14,7 @@ dump_array() SIZE=$2 PFX=$3 POSTFIX=$4 + IN=$5 PFX_SZ=$(echo $PFX | wc -c) TABS="$(printf '\t\t\t\t\t')" @@ -57,11 +57,18 @@ trap 'rm "$OUT"' EXIT echo "#endif" echo "" - dump_array "x86_cap_flags" "NCAPINTS*32" "X86_FEATURE_" "" + dump_array "x86_cap_flags" "NCAPINTS*32" "X86_FEATURE_" "" $2 echo "" - dump_array "x86_bug_flags" "NBUGINTS*32" "X86_BUG_" "NCAPINTS*32" + dump_array "x86_bug_flags" "NBUGINTS*32" "X86_BUG_" "NCAPINTS*32" $2 + echo "" + echo "#ifdef CONFIG_X86_VMX_FEATURE_NAMES" + echo "#ifndef _ASM_X86_VMXFEATURES_H" + echo "#include <asm/vmxfeatures.h>" + echo "#endif" + dump_array "x86_vmx_flags" "NVMXINTS*32" "VMX_FEATURE_" "" $3 + echo "#endif /* CONFIG_X86_VMX_FEATURE_NAMES */" ) > $OUT trap - EXIT diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c index cb2e49810d68..4eec8889b0ff 100644 --- a/arch/x86/kernel/cpu/proc.c +++ b/arch/x86/kernel/cpu/proc.c @@ -7,6 +7,10 @@ #include "cpu.h" +#ifdef CONFIG_X86_VMX_FEATURE_NAMES +extern const char * const x86_vmx_flags[NVMXINTS*32]; +#endif + /* * Get CPU information for use by the procfs. 
*/ @@ -102,6 +106,17 @@ static int show_cpuinfo(struct seq_file *m, void *v) if (cpu_has(c, i) && x86_cap_flags[i] != NULL) seq_printf(m, " %s", x86_cap_flags[i]); +#ifdef CONFIG_X86_VMX_FEATURE_NAMES + if (cpu_has(c, X86_FEATURE_VMX) && c->vmx_capability[0]) { + seq_puts(m, "\nvmx flags\t:"); + for (i = 0; i < 32*NVMXINTS; i++) { + if (test_bit(i, (unsigned long *)c->vmx_capability) && + x86_vmx_flags[i] != NULL) + seq_printf(m, " %s", x86_vmx_flags[i]); + } + } +#endif + seq_puts(m, "\nbugs\t\t:"); for (i = 0; i < 32*NBUGINTS; i++) { unsigned int bug_bit = 32*NCAPINTS + i; diff --git a/arch/x86/kernel/cpu/zhaoxin.c b/arch/x86/kernel/cpu/zhaoxin.c index 01b05a4a5a85..edfc7cc4ec33 100644 --- a/arch/x86/kernel/cpu/zhaoxin.c +++ b/arch/x86/kernel/cpu/zhaoxin.c @@ -16,13 +16,6 @@ #define RNG_ENABLED (1 << 3) #define RNG_ENABLE (1 << 8) /* MSR_ZHAOXIN_RNG */ -#define X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW 0x00200000 -#define X86_VMX_FEATURE_PROC_CTLS_VNMI 0x00400000 -#define X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS 0x80000000 -#define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC 0x00000001 -#define X86_VMX_FEATURE_PROC_CTLS2_EPT 0x00000002 -#define X86_VMX_FEATURE_PROC_CTLS2_VPID 0x00000020 - static void init_zhaoxin_cap(struct cpuinfo_x86 *c) { u32 lo, hi; @@ -89,31 +82,6 @@ static void early_init_zhaoxin(struct cpuinfo_x86 *c) } -static void zhaoxin_detect_vmx_virtcap(struct cpuinfo_x86 *c) -{ - u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2; - - rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high); - msr_ctl = vmx_msr_high | vmx_msr_low; - - if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW) - set_cpu_cap(c, X86_FEATURE_TPR_SHADOW); - if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_VNMI) - set_cpu_cap(c, X86_FEATURE_VNMI); - if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS) { - rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2, - vmx_msr_low, vmx_msr_high); - msr_ctl2 = vmx_msr_high | vmx_msr_low; - if ((msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC) && - (msr_ctl & 
X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW)) - set_cpu_cap(c, X86_FEATURE_FLEXPRIORITY); - if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_EPT) - set_cpu_cap(c, X86_FEATURE_EPT); - if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID) - set_cpu_cap(c, X86_FEATURE_VPID); - } -} - static void init_zhaoxin(struct cpuinfo_x86 *c) { early_init_zhaoxin(c); @@ -142,9 +110,6 @@ static void init_zhaoxin(struct cpuinfo_x86 *c) #endif init_feature_control_msr(c); - - if (cpu_has(c, X86_FEATURE_VMX)) - zhaoxin_detect_vmx_virtcap(c); } #ifdef CONFIG_X86_32 -- 2.22.0
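The "vmx flags" loop added to show_cpuinfo() can be approximated in userspace as follows. This is only a sketch under stated assumptions: format_vmx_flags() is a hypothetical name, the bitmap is tested with plain shifts instead of test_bit(), and output goes to a caller-supplied buffer instead of a seq_file. It keeps the two rules from the patch: nothing is emitted when word 0 is empty, and bits without a name are skipped.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define NVMXINTS 3	/* matches the patch: three 32-bit words */

/*
 * Hypothetical helper: append " <name>" for every set bit that has a name.
 * Unnamed bits are skipped, like the x86_vmx_flags[i] != NULL check in
 * show_cpuinfo().  Returns the number of names written; returns 0 and
 * leaves buf empty when word 0 has no bits set.
 */
static int format_vmx_flags(const uint32_t caps[NVMXINTS],
			    const char *const names[NVMXINTS * 32],
			    char *buf, size_t len)
{
	size_t used = 0;
	int count = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	if (!caps[0])
		return 0;

	for (unsigned int i = 0; i < NVMXINTS * 32; i++) {
		int n;

		if (!(caps[i / 32] & (UINT32_C(1) << (i % 32))) || !names[i])
			continue;

		n = snprintf(buf + used, len - used, " %s", names[i]);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial name */
			break;
		}
		used += (size_t)n;
		count++;
	}
	return count;
}
```

With a name table that labels bit 0 and bit 65 (word 2, bit 1), capabilities of {0x1, 0, 0x2} would yield two space-prefixed names in the buffer.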
Remove the synthetic VMX feature flags from word 8 as they have been
superseded by VMX_FEATURE_*.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 0652d3eed9bd..76070800187a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -222,15 +222,8 @@
 #define X86_FEATURE_IBRS_ENHANCED	( 7*32+30) /* Enhanced IBRS */
 
 /* Virtualization flags: Linux defined, word 8 */
-#define X86_FEATURE_TPR_SHADOW		( 8*32+ 0) /* Intel TPR Shadow */
-#define X86_FEATURE_VNMI		( 8*32+ 1) /* Intel Virtual NMI */
-#define X86_FEATURE_FLEXPRIORITY	( 8*32+ 2) /* Intel FlexPriority */
-#define X86_FEATURE_EPT			( 8*32+ 3) /* Intel Extended Page Table */
-#define X86_FEATURE_VPID		( 8*32+ 4) /* Intel Virtual Processor ID */
-
 #define X86_FEATURE_VMMCALL		( 8*32+15) /* Prefer VMMCALL to VMCALL */
 #define X86_FEATURE_XENPV		( 8*32+16) /* "" Xen paravirtual guest */
-#define X86_FEATURE_EPT_AD		( 8*32+17) /* Intel Extended Page Table access-dirty bit */
 #define X86_FEATURE_VMCALL		( 8*32+18) /* "" Hypervisor supports the VMCALL instruction */
 #define X86_FEATURE_VMW_VMMCALL		( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
 
-- 
2.22.0
Define the VMCS execution control flags (consumed by KVM) using their associated VMX_FEATURE_* to provide a strong hint that new VMX features are expected to be added to VMX_FEATURE and considered for reporting via /proc/cpuinfo. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> --- arch/x86/include/asm/vmx.h | 105 +++++++++++++++++++------------------ 1 file changed, 55 insertions(+), 50 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 1835767aa335..9fbba31be825 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -15,67 +15,70 @@ #include <linux/bitops.h> #include <linux/types.h> #include <uapi/asm/vmx.h> +#include <asm/vmxfeatures.h> + +#define VMCS_CONTROL_BIT(x) BIT(VMX_FEATURE_##x & 0x1f) /* * Definitions of Primary Processor-Based VM-Execution Controls. */ -#define CPU_BASED_VIRTUAL_INTR_PENDING 0x00000004 -#define CPU_BASED_USE_TSC_OFFSETING 0x00000008 -#define CPU_BASED_HLT_EXITING 0x00000080 -#define CPU_BASED_INVLPG_EXITING 0x00000200 -#define CPU_BASED_MWAIT_EXITING 0x00000400 -#define CPU_BASED_RDPMC_EXITING 0x00000800 -#define CPU_BASED_RDTSC_EXITING 0x00001000 -#define CPU_BASED_CR3_LOAD_EXITING 0x00008000 -#define CPU_BASED_CR3_STORE_EXITING 0x00010000 -#define CPU_BASED_CR8_LOAD_EXITING 0x00080000 -#define CPU_BASED_CR8_STORE_EXITING 0x00100000 -#define CPU_BASED_TPR_SHADOW 0x00200000 -#define CPU_BASED_VIRTUAL_NMI_PENDING 0x00400000 -#define CPU_BASED_MOV_DR_EXITING 0x00800000 -#define CPU_BASED_UNCOND_IO_EXITING 0x01000000 -#define CPU_BASED_USE_IO_BITMAPS 0x02000000 -#define CPU_BASED_MONITOR_TRAP_FLAG 0x08000000 -#define CPU_BASED_USE_MSR_BITMAPS 0x10000000 -#define CPU_BASED_MONITOR_EXITING 0x20000000 -#define CPU_BASED_PAUSE_EXITING 0x40000000 -#define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS 0x80000000 +#define CPU_BASED_VIRTUAL_INTR_PENDING VMCS_CONTROL_BIT(VIRTUAL_INTR_PENDING) +#define CPU_BASED_USE_TSC_OFFSETING 
VMCS_CONTROL_BIT(TSC_OFFSETTING) +#define CPU_BASED_HLT_EXITING VMCS_CONTROL_BIT(HLT_EXITING) +#define CPU_BASED_INVLPG_EXITING VMCS_CONTROL_BIT(INVLPG_EXITING) +#define CPU_BASED_MWAIT_EXITING VMCS_CONTROL_BIT(MWAIT_EXITING) +#define CPU_BASED_RDPMC_EXITING VMCS_CONTROL_BIT(RDPMC_EXITING) +#define CPU_BASED_RDTSC_EXITING VMCS_CONTROL_BIT(RDTSC_EXITING) +#define CPU_BASED_CR3_LOAD_EXITING VMCS_CONTROL_BIT(CR3_LOAD_EXITING) +#define CPU_BASED_CR3_STORE_EXITING VMCS_CONTROL_BIT(CR3_STORE_EXITING) +#define CPU_BASED_CR8_LOAD_EXITING VMCS_CONTROL_BIT(CR8_LOAD_EXITING) +#define CPU_BASED_CR8_STORE_EXITING VMCS_CONTROL_BIT(CR8_STORE_EXITING) +#define CPU_BASED_TPR_SHADOW VMCS_CONTROL_BIT(VIRTUAL_TPR) +#define CPU_BASED_VIRTUAL_NMI_PENDING VMCS_CONTROL_BIT(VIRTUAL_NMI_PENDING) +#define CPU_BASED_MOV_DR_EXITING VMCS_CONTROL_BIT(MOV_DR_EXITING) +#define CPU_BASED_UNCOND_IO_EXITING VMCS_CONTROL_BIT(UNCOND_IO_EXITING) +#define CPU_BASED_USE_IO_BITMAPS VMCS_CONTROL_BIT(USE_IO_BITMAPS) +#define CPU_BASED_MONITOR_TRAP_FLAG VMCS_CONTROL_BIT(MONITOR_TRAP_FLAG) +#define CPU_BASED_USE_MSR_BITMAPS VMCS_CONTROL_BIT(USE_MSR_BITMAPS) +#define CPU_BASED_MONITOR_EXITING VMCS_CONTROL_BIT(MONITOR_EXITING) +#define CPU_BASED_PAUSE_EXITING VMCS_CONTROL_BIT(PAUSE_EXITING) +#define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS VMCS_CONTROL_BIT(SEC_CONTROLS) #define CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR 0x0401e172 /* * Definitions of Secondary Processor-Based VM-Execution Controls. 
*/ -#define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x00000001 -#define SECONDARY_EXEC_ENABLE_EPT 0x00000002 -#define SECONDARY_EXEC_DESC 0x00000004 -#define SECONDARY_EXEC_RDTSCP 0x00000008 -#define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE 0x00000010 -#define SECONDARY_EXEC_ENABLE_VPID 0x00000020 -#define SECONDARY_EXEC_WBINVD_EXITING 0x00000040 -#define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080 -#define SECONDARY_EXEC_APIC_REGISTER_VIRT 0x00000100 -#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x00000200 -#define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400 -#define SECONDARY_EXEC_RDRAND_EXITING 0x00000800 -#define SECONDARY_EXEC_ENABLE_INVPCID 0x00001000 -#define SECONDARY_EXEC_ENABLE_VMFUNC 0x00002000 -#define SECONDARY_EXEC_SHADOW_VMCS 0x00004000 -#define SECONDARY_EXEC_ENCLS_EXITING 0x00008000 -#define SECONDARY_EXEC_RDSEED_EXITING 0x00010000 -#define SECONDARY_EXEC_ENABLE_PML 0x00020000 -#define SECONDARY_EXEC_PT_CONCEAL_VMX 0x00080000 -#define SECONDARY_EXEC_XSAVES 0x00100000 -#define SECONDARY_EXEC_PT_USE_GPA 0x01000000 -#define SECONDARY_EXEC_MODE_BASED_EPT_EXEC 0x00400000 -#define SECONDARY_EXEC_TSC_SCALING 0x02000000 +#define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES VMCS_CONTROL_BIT(VIRT_APIC_ACCESSES) +#define SECONDARY_EXEC_ENABLE_EPT VMCS_CONTROL_BIT(EPT) +#define SECONDARY_EXEC_DESC VMCS_CONTROL_BIT(DESC_EXITING) +#define SECONDARY_EXEC_RDTSCP VMCS_CONTROL_BIT(RDTSCP) +#define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE VMCS_CONTROL_BIT(VIRTUAL_X2APIC) +#define SECONDARY_EXEC_ENABLE_VPID VMCS_CONTROL_BIT(VPID) +#define SECONDARY_EXEC_WBINVD_EXITING VMCS_CONTROL_BIT(WBINVD_EXITING) +#define SECONDARY_EXEC_UNRESTRICTED_GUEST VMCS_CONTROL_BIT(UNRESTRICTED_GUEST) +#define SECONDARY_EXEC_APIC_REGISTER_VIRT VMCS_CONTROL_BIT(APIC_REGISTER_VIRT) +#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY VMCS_CONTROL_BIT(VIRT_INTR_DELIVERY) +#define SECONDARY_EXEC_PAUSE_LOOP_EXITING VMCS_CONTROL_BIT(PAUSE_LOOP_EXITING) +#define SECONDARY_EXEC_RDRAND_EXITING 
VMCS_CONTROL_BIT(RDRAND_EXITING) +#define SECONDARY_EXEC_ENABLE_INVPCID VMCS_CONTROL_BIT(INVPCID) +#define SECONDARY_EXEC_ENABLE_VMFUNC VMCS_CONTROL_BIT(VMFUNC) +#define SECONDARY_EXEC_SHADOW_VMCS VMCS_CONTROL_BIT(SHADOW_VMCS) +#define SECONDARY_EXEC_ENCLS_EXITING VMCS_CONTROL_BIT(ENCLS_EXITING) +#define SECONDARY_EXEC_RDSEED_EXITING VMCS_CONTROL_BIT(RDSEED_EXITING) +#define SECONDARY_EXEC_ENABLE_PML VMCS_CONTROL_BIT(PAGE_MOD_LOGGING) +#define SECONDARY_EXEC_PT_CONCEAL_VMX VMCS_CONTROL_BIT(PT_CONCEAL_VMX) +#define SECONDARY_EXEC_XSAVES VMCS_CONTROL_BIT(XSAVES) +#define SECONDARY_EXEC_MODE_BASED_EPT_EXEC VMCS_CONTROL_BIT(MODE_BASED_EPT_EXEC) +#define SECONDARY_EXEC_PT_USE_GPA VMCS_CONTROL_BIT(PT_USE_GPA) +#define SECONDARY_EXEC_TSC_SCALING VMCS_CONTROL_BIT(TSC_SCALING) #define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE 0x04000000 -#define PIN_BASED_EXT_INTR_MASK 0x00000001 -#define PIN_BASED_NMI_EXITING 0x00000008 -#define PIN_BASED_VIRTUAL_NMIS 0x00000020 -#define PIN_BASED_VMX_PREEMPTION_TIMER 0x00000040 -#define PIN_BASED_POSTED_INTR 0x00000080 +#define PIN_BASED_EXT_INTR_MASK VMCS_CONTROL_BIT(INTR_EXITING) +#define PIN_BASED_NMI_EXITING VMCS_CONTROL_BIT(NMI_EXITING) +#define PIN_BASED_VIRTUAL_NMIS VMCS_CONTROL_BIT(VIRTUAL_NMIS) +#define PIN_BASED_VMX_PREEMPTION_TIMER VMCS_CONTROL_BIT(PREEMPTION_TIMER) +#define PIN_BASED_POSTED_INTR VMCS_CONTROL_BIT(POSTED_INTR) #define PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR 0x00000016 @@ -114,7 +117,9 @@ #define VMX_MISC_MSR_LIST_MULTIPLIER 512 /* VMFUNC functions */ -#define VMX_VMFUNC_EPTP_SWITCHING 0x00000001 +#define VMFUNC_CONTROL_BIT(x) BIT((VMX_FEATURE_##x & 0x1f) - 28) + +#define VMX_VMFUNC_EPTP_SWITCHING VMFUNC_CONTROL_BIT(EPTP_SWITCHING) #define VMFUNC_EPTP_ENTRIES 512 static inline u32 vmx_basic_vmcs_revision_id(u64 vmx_basic) -- 2.22.0
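To make the VMCS_CONTROL_BIT() conversion concrete, here is a small standalone sketch. It assumes the three VMX_FEATURE_* values quoted from vmxfeatures.h in Paolo's reply further down the thread, and it uses a local 32-bit BIT() instead of the kernel's. Masking with 0x1f keeps only the bit position within the 32-bit word, so each macro expands to the same constant that the diff removes.

```c
#include <stdint.h>

/* Local stand-in for the kernel's BIT(); 32-bit for this sketch only. */
#define BIT(n)			(UINT32_C(1) << (n))

/* Values as quoted from vmxfeatures.h elsewhere in this thread. */
#define VMX_FEATURE_RDSEED_EXITING	(2*32 + 16)
#define VMX_FEATURE_PAGE_MOD_LOGGING	(2*32 + 17)
#define VMX_FEATURE_XSAVES		(2*32 + 20)

/* Same shape as the macro added to vmx.h by the patch. */
#define VMCS_CONTROL_BIT(x)	BIT(VMX_FEATURE_##x & 0x1f)

/* Hypothetical helper: which 32-bit capability word a feature lives in. */
static inline unsigned int vmx_feature_word(unsigned int feature)
{
	return feature >> 5;
}
```

Under these assumptions VMCS_CONTROL_BIT(RDSEED_EXITING) is 0x00010000, VMCS_CONTROL_BIT(PAGE_MOD_LOGGING) is 0x00020000 and VMCS_CONTROL_BIT(XSAVES) is 0x00100000, matching the SECONDARY_EXEC_* literals being replaced, and all three features fall in word 2.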
Shift the remaining synthetic virtualization flags so that the flags
are contiguous starting from bit 0.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 76070800187a..1899ddce53f7 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -222,10 +222,10 @@
 #define X86_FEATURE_IBRS_ENHANCED	( 7*32+30) /* Enhanced IBRS */
 
 /* Virtualization flags: Linux defined, word 8 */
-#define X86_FEATURE_VMMCALL		( 8*32+15) /* Prefer VMMCALL to VMCALL */
-#define X86_FEATURE_XENPV		( 8*32+16) /* "" Xen paravirtual guest */
-#define X86_FEATURE_VMCALL		( 8*32+18) /* "" Hypervisor supports the VMCALL instruction */
-#define X86_FEATURE_VMW_VMMCALL		( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
+#define X86_FEATURE_VMMCALL		( 8*32+ 0) /* Prefer VMMCALL to VMCALL */
+#define X86_FEATURE_XENPV		( 8*32+ 1) /* "" Xen paravirtual guest */
+#define X86_FEATURE_VMCALL		( 8*32+ 2) /* "" Hypervisor supports the VMCALL instruction */
+#define X86_FEATURE_VMW_VMMCALL		( 8*32+ 3) /* "" VMware prefers VMMCALL hypercall instruction */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
 #define X86_FEATURE_FSGSBASE		( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
-- 
2.22.0
Provide stubs for perf_guest_get_msrs() and intel_pt_handle_vmx() when
building without support for Intel CPUs, i.e. CPU_SUP_INTEL=n.  Lack of
stubs is not currently a problem as the only user, KVM_INTEL, takes a
dependency on CPU_SUP_INTEL=y.  Provide the stubs for all CPUs so that
KVM_INTEL can be built for any CPU with compatible hardware support,
e.g. Centaur and Zhaoxin CPUs.

Note, the existing stub for perf_guest_get_msrs() is essentially dead
code as KVM selects CONFIG_PERF_EVENTS, i.e. the only user guarantees
the full implementation is built.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/perf_event.h | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index ee26e9215f18..29964b0e1075 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -322,17 +322,10 @@ struct perf_guest_switch_msr {
 	u64 host, guest;
 };
 
-extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
 extern void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap);
 extern void perf_check_microcode(void);
 extern int x86_perf_rdpmc_index(struct perf_event *event);
 #else
-static inline struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
-{
-	*nr = 0;
-	return NULL;
-}
-
 static inline void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
 {
 	memset(cap, 0, sizeof(*cap));
@@ -342,8 +335,23 @@ static inline void perf_events_lapic_init(void)	{ }
 static inline void perf_check_microcode(void) { }
 #endif
 
+#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
+extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
+#else
+static inline struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
+{
+	*nr = 0;
+	return NULL;
+}
+#endif
+
 #ifdef CONFIG_CPU_SUP_INTEL
 extern void intel_pt_handle_vmx(int on);
+#else
+static inline void intel_pt_handle_vmx(int on)
+{
+
+}
 #endif
 
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD)
-- 
2.22.0
Change the dependency for KVM_INTEL, i.e. KVM w/ VMX, from Intel CPUs to
any CPU that has IA32_FEATURE_CONTROL MSR and thus VMX functionality.
This effectively allows building KVM_INTEL for Centaur and Zhaoxin CPUs.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/Kconfig | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 840e12583b85..42c7a23c5f28 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -60,13 +60,12 @@ config KVM
 	  If unsure, say N.
 
 config KVM_INTEL
-	tristate "KVM for Intel processors support"
+	tristate "KVM for Intel (and compatible) processors support"
 	depends on KVM
-	# for perf_guest_get_msrs():
-	depends on CPU_SUP_INTEL
+	depends on X86_FEATURE_CONTROL_MSR
 	---help---
-	  Provides support for KVM on Intel processors equipped with the VT
-	  extensions.
+	  Provides support for KVM on processors equipped with Intel's VT
+	  extensions, a.k.a. Virtual Machine Extensions (VMX).
 
 	  To compile this as a module, choose M here: the module
 	  will be called kvm-intel.
-- 
2.22.0
On Fri, Oct 4, 2019 at 2:56 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Remove the code to initialize IA32_FEATURE_CONTROL MSR when KVM is
> loaded now that the MSR is initialized during boot on all CPUs that
> support VMX, i.e. can possibly load kvm_intel.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
On Fri, Oct 4, 2019 at 2:56 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Replace KVM's manual checks on IA32_FEATURE_CONTROL with a query on the
> boot CPU's VMX feature flag. The VMX flag is now cleared during boot if
> VMX isn't fully enabled via IA32_FEATURE_CONTROL, including the case
> where IA32_FEATURE_CONTROL isn't supported.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
On Fri, Oct 4, 2019 at 2:56 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Explicitly check the current CPU's VMX feature flag when verifying
> compatibility across physical CPUs. This effectively adds a check on
> IA32_FEATURE_CONTROL to ensure that VMX is fully enabled on all CPUs.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
On 04/10/19 23:56, Sean Christopherson wrote:
> Always lock IA32_FEATURE_CONTROL if it exists, even if the CPU doesn't
> support VMX, so that other existing and future kernel code that queries
> IA32_FEATURE_CONTROL can assume it's locked.
Possibly stupid question: why bother locking it? It makes sense to lock
the MSR bits to _off_ in the firmware, but if the BIOS hasn't locked it,
why should the OS?
It seems to me that locking introduces a lot of complication.
Paolo
On 04/10/19 23:56, Sean Christopherson wrote:
> +#define VMX_FEATURE_RDSEED_EXITING ( 2*32+ 16) /* "" VM-Exit on RDSEED */
> +#define VMX_FEATURE_PAGE_MOD_LOGGING ( 2*32+ 17) /* "pml" Log dirty pages into buffer */
> +#define VMX_FEATURE_EPT_VIOLATION_VE ( 2*32+ 18) /* "" Conditionally reflect EPT violations as #VE exceptions */
> +#define VMX_FEATURE_PT_CONCEAL_VMX ( 2*32+ 19) /* "" Suppress VMX indicators in Processor Trace */
> +#define VMX_FEATURE_XSAVES ( 2*32+ 20) /* "" Enable XSAVES and XRSTORS in guest */
> +#define VMX_FEATURE_RESERVED_PIN_21 ( 2*32+ 21) /* "" Reserved */
> +#define VMX_FEATURE_MODE_BASED_EPT_EXEC ( 2*32+ 22) /* Enable separate EPT EXEC bits for supervisor vs. user */
> +#define VMX_FEATURE_RESERVED_PIN_23 ( 2*32+ 23) /* "" Reserved */
> +#define VMX_FEATURE_PT_USE_GPA ( 2*32+ 24) /* "" Processor Trace logs GPAs */
> +#define VMX_FEATURE_TSC_SCALING ( 2*32+ 25) /* Scale hardware TSC when read in guest */
> +#define VMX_FEATURE_RESERVED_PIN_26 ( 2*32+ 26) /* "" Reserved */
> +#define VMX_FEATURE_RESERVED_PIN_27 ( 2*32+ 27) /* "" Reserved */
> +#define VMX_FEATURE_ENCLV_EXITING ( 2*32+ 28) /* "" VM-Exit on ENCLV (leaf dependent) */
> +#define VMX_FEATURE_RESERVED_PIN_29 ( 2*32+ 29) /* "" Reserved */
> +#define VMX_FEATURE_RESERVED_PIN_30 ( 2*32+ 30) /* "" Reserved */
> +#define VMX_FEATURE_RESERVED_PIN_31 ( 2*32+ 31) /* "" Reserved */
Pasto on the reserved bit names, but in general why is it useful to
define them?
Paolo
On Mon, Oct 07, 2019 at 07:05:32PM +0200, Paolo Bonzini wrote:
> On 04/10/19 23:56, Sean Christopherson wrote:
> > Always lock IA32_FEATURE_CONTROL if it exists, even if the CPU doesn't
> > support VMX, so that other existing and future kernel code that queries
> > IA32_FEATURE_CONTROL can assume it's locked.
>
> Possibly stupid question: why bother locking it? It makes sense to lock
> the MSR bits to _off_ in the firmware, but if the BIOS hasn't locked it,
> why should the OS?
>
> It seems to me that locking introduces a lot of complication.
None of the enable bits take effect until the MSR is locked. If I had to
guess, ucode likely goes and pokes the enabled features during the WRMSR
with the lock bit set, as opposed to the relevant features querying the
MSR value as needed (querying the MSR is likely slow).
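The point that the enable bits only take effect once the MSR is locked can be expressed as a small predicate. This is a sketch, not the series' code: the macro and function names are hypothetical, while the bit positions (bit 0 lock, bit 1 VMX inside SMX, bit 2 VMX outside SMX) follow the SDM's description of IA32_FEATURE_CONTROL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions per the SDM's IA32_FEATURE_CONTROL. */
#define FEAT_CTL_LOCKED				(UINT64_C(1) << 0)
#define FEAT_CTL_VMX_ENABLED_INSIDE_SMX		(UINT64_C(1) << 1)
#define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX	(UINT64_C(1) << 2)

/*
 * VMX is usable only if the MSR is locked AND the enable bit matching the
 * current mode (inside or outside SMX) is set.  An enable bit without the
 * lock bit does not count.
 */
static bool vmx_usable(uint64_t feat_ctl, bool in_smx)
{
	uint64_t required = FEAT_CTL_LOCKED |
			    (in_smx ? FEAT_CTL_VMX_ENABLED_INSIDE_SMX
				    : FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX);

	return (feat_ctl & required) == required;
}
```

A value of 0x4 (enabled outside SMX but unlocked) therefore reports VMX as unusable, which is the case the boot-time code resolves by setting the enable bits and the lock bit together.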
On 04/10/19 23:56, Sean Christopherson wrote:
> + /*
> + * The high bits contain the allowed-1 settings, i.e. features that can
> + * be turned on. The low bits contain the allowed-0 settings, i.e.
> + * features that can be turned off. Ignore the allowed-0 settings,
> + * if a feature can be turned on then it's supported.
> + */
> + rdmsr(MSR_IA32_VMX_PINBASED_CTLS, ign, supported);
For QEMU, we're defining a feature as supported if a feature can be
turned both on and off. Since msr_low and msr_high can be defined
respectively as must-be-one and can-be-one, the features become
"msr_high & ~msr_low".
Also, shouldn't this use the "true" feature availability MSRs if available?
Paolo
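The two definitions of "supported" being compared here differ only in how the low half of the capability MSR is treated. A minimal sketch, with hypothetical helper names and the MSR passed in as a plain 64-bit value rather than read with rdmsr():

```c
#include <stdint.h>

/*
 * Patch's definition: a control is supported if it may be set to 1,
 * i.e. its bit is set in the high (allowed-1) half of the MSR.
 */
static uint32_t ctls_allowed1(uint64_t msr)
{
	return (uint32_t)(msr >> 32);
}

/*
 * QEMU-style definition described above: a control is a feature only if
 * it can be both set and cleared, i.e. msr_high & ~msr_low.
 */
static uint32_t ctls_toggleable(uint64_t msr)
{
	uint32_t low = (uint32_t)msr;		/* must-be-one bits */
	uint32_t high = (uint32_t)(msr >> 32);	/* can-be-one bits */

	return high & ~low;
}
```

For an illustrative (made-up) pin-based value with allowed-1 bits 0xff and must-be-one bits 0x16, the first helper returns 0xff and the second 0xe9, so the difference is exactly the always-on controls.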
On 04/10/19 23:56, Sean Christopherson wrote:
> Add support for generating VMX feature names in capflags.c and printing
> the resulting names in /proc/cpuinfo as "vmx flags" when VMX support is
> detected. Do not print VMX flags if no bits are set in word 0, which
> includes Pin controls. INTR and NMI exiting are fundamental pillars of
> virtualization, if they're not supported then the CPU is broken, it does
> not actually support VMX, or the kernel wasn't built with support for
> the target CPU.
>
> Remove all code which sets the synthetic VMX flags in cpufeatures so
> that duplicate VMX features are not printed in "flags" and "vmx flags".
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
> arch/x86/boot/mkcpustr.c | 1 +
> arch/x86/kernel/cpu/Makefile | 5 ++--
> arch/x86/kernel/cpu/centaur.c | 35 ----------------------
> arch/x86/kernel/cpu/intel.c | 49 -------------------------------
> arch/x86/kernel/cpu/mkcapflags.sh | 15 +++++++---
> arch/x86/kernel/cpu/proc.c | 15 ++++++++++
> arch/x86/kernel/cpu/zhaoxin.c | 35 ----------------------
> 7 files changed, 30 insertions(+), 125 deletions(-)
>
> diff --git a/arch/x86/boot/mkcpustr.c b/arch/x86/boot/mkcpustr.c
> index 9caa10e82217..da0ccc5de538 100644
> --- a/arch/x86/boot/mkcpustr.c
> +++ b/arch/x86/boot/mkcpustr.c
> @@ -15,6 +15,7 @@
> #include "../include/asm/required-features.h"
> #include "../include/asm/disabled-features.h"
> #include "../include/asm/cpufeatures.h"
> +#include "../include/asm/vmxfeatures.h"
> #include "../kernel/cpu/capflags.c"
>
> int main(void)
> diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
> index df5ad0cfe3e9..025cbfd45687 100644
> --- a/arch/x86/kernel/cpu/Makefile
> +++ b/arch/x86/kernel/cpu/Makefile
> @@ -54,11 +54,12 @@ obj-$(CONFIG_ACRN_GUEST) += acrn.o
>
> ifdef CONFIG_X86_FEATURE_NAMES
> quiet_cmd_mkcapflags = MKCAP $@
> - cmd_mkcapflags = $(CONFIG_SHELL) $(srctree)/$(src)/mkcapflags.sh $< $@
> + cmd_mkcapflags = $(CONFIG_SHELL) $(srctree)/$(src)/mkcapflags.sh $@ $^
>
> cpufeature = $(src)/../../include/asm/cpufeatures.h
> +vmxfeature = $(src)/../../include/asm/vmxfeatures.h
>
> -$(obj)/capflags.c: $(cpufeature) $(src)/mkcapflags.sh FORCE
> +$(obj)/capflags.c: $(cpufeature) $(vmxfeature) $(src)/mkcapflags.sh FORCE
> $(call if_changed,mkcapflags)
> endif
> targets += capflags.c
> diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c
> index a6ca4c31c1b6..be11c796926b 100644
> --- a/arch/x86/kernel/cpu/centaur.c
> +++ b/arch/x86/kernel/cpu/centaur.c
> @@ -18,13 +18,6 @@
> #define RNG_ENABLED (1 << 3)
> #define RNG_ENABLE (1 << 6) /* MSR_VIA_RNG */
>
> -#define X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW 0x00200000
> -#define X86_VMX_FEATURE_PROC_CTLS_VNMI 0x00400000
> -#define X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS 0x80000000
> -#define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC 0x00000001
> -#define X86_VMX_FEATURE_PROC_CTLS2_EPT 0x00000002
> -#define X86_VMX_FEATURE_PROC_CTLS2_VPID 0x00000020
> -
> static void init_c3(struct cpuinfo_x86 *c)
> {
> u32 lo, hi;
> @@ -119,31 +112,6 @@ static void early_init_centaur(struct cpuinfo_x86 *c)
> }
> }
>
> -static void centaur_detect_vmx_virtcap(struct cpuinfo_x86 *c)
> -{
> - u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2;
> -
> - rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high);
> - msr_ctl = vmx_msr_high | vmx_msr_low;
> -
> - if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW)
> - set_cpu_cap(c, X86_FEATURE_TPR_SHADOW);
> - if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_VNMI)
> - set_cpu_cap(c, X86_FEATURE_VNMI);
> - if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS) {
> - rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2,
> - vmx_msr_low, vmx_msr_high);
> - msr_ctl2 = vmx_msr_high | vmx_msr_low;
> - if ((msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC) &&
> - (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW))
> - set_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
> - if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_EPT)
> - set_cpu_cap(c, X86_FEATURE_EPT);
> - if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID)
> - set_cpu_cap(c, X86_FEATURE_VPID);
> - }
> -}
> -
> static void init_centaur(struct cpuinfo_x86 *c)
> {
> #ifdef CONFIG_X86_32
> @@ -251,9 +219,6 @@ static void init_centaur(struct cpuinfo_x86 *c)
> #endif
>
> init_feature_control_msr(c);
> -
> - if (cpu_has(c, X86_FEATURE_VMX))
> - centaur_detect_vmx_virtcap(c);
> }
>
> #ifdef CONFIG_X86_32
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 15d59224e2f8..594d2686ad52 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -494,52 +494,6 @@ static void srat_detect_node(struct cpuinfo_x86 *c)
> #endif
> }
>
> -static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
> -{
> - /* Intel VMX MSR indicated features */
> -#define X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW 0x00200000
> -#define X86_VMX_FEATURE_PROC_CTLS_VNMI 0x00400000
> -#define X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS 0x80000000
> -#define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC 0x00000001
> -#define X86_VMX_FEATURE_PROC_CTLS2_EPT 0x00000002
> -#define X86_VMX_FEATURE_PROC_CTLS2_VPID 0x00000020
> -#define x86_VMX_FEATURE_EPT_CAP_AD 0x00200000
> -
> - u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2;
> - u32 msr_vpid_cap, msr_ept_cap;
> -
> - clear_cpu_cap(c, X86_FEATURE_TPR_SHADOW);
> - clear_cpu_cap(c, X86_FEATURE_VNMI);
> - clear_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
> - clear_cpu_cap(c, X86_FEATURE_EPT);
> - clear_cpu_cap(c, X86_FEATURE_VPID);
> - clear_cpu_cap(c, X86_FEATURE_EPT_AD);
> -
> - rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high);
> - msr_ctl = vmx_msr_high | vmx_msr_low;
> - if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW)
> - set_cpu_cap(c, X86_FEATURE_TPR_SHADOW);
> - if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_VNMI)
> - set_cpu_cap(c, X86_FEATURE_VNMI);
> - if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS) {
> - rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2,
> - vmx_msr_low, vmx_msr_high);
> - msr_ctl2 = vmx_msr_high | vmx_msr_low;
> - if ((msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC) &&
> - (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW))
> - set_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
> - if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_EPT) {
> - set_cpu_cap(c, X86_FEATURE_EPT);
> - rdmsr(MSR_IA32_VMX_EPT_VPID_CAP,
> - msr_ept_cap, msr_vpid_cap);
> - if (msr_ept_cap & x86_VMX_FEATURE_EPT_CAP_AD)
> - set_cpu_cap(c, X86_FEATURE_EPT_AD);
> - }
> - if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID)
> - set_cpu_cap(c, X86_FEATURE_VPID);
> - }
> -}
> -
> #define MSR_IA32_TME_ACTIVATE 0x982
>
> /* Helpers to access TME_ACTIVATE MSR */
> @@ -757,9 +711,6 @@ static void init_intel(struct cpuinfo_x86 *c)
>
> init_feature_control_msr(c);
>
> - if (cpu_has(c, X86_FEATURE_VMX))
> - detect_vmx_virtcap(c);
> -
> if (cpu_has(c, X86_FEATURE_TME))
> detect_tme(c);
>
> diff --git a/arch/x86/kernel/cpu/mkcapflags.sh b/arch/x86/kernel/cpu/mkcapflags.sh
> index aed45b8895d5..1db560ed2ca3 100644
> --- a/arch/x86/kernel/cpu/mkcapflags.sh
> +++ b/arch/x86/kernel/cpu/mkcapflags.sh
> @@ -6,8 +6,7 @@
>
> set -e
>
> -IN=$1
> -OUT=$2
> +OUT=$1
>
> dump_array()
> {
> @@ -15,6 +14,7 @@ dump_array()
> SIZE=$2
> PFX=$3
> POSTFIX=$4
> + IN=$5
>
> PFX_SZ=$(echo $PFX | wc -c)
> TABS="$(printf '\t\t\t\t\t')"
> @@ -57,11 +57,18 @@ trap 'rm "$OUT"' EXIT
> echo "#endif"
> echo ""
>
> - dump_array "x86_cap_flags" "NCAPINTS*32" "X86_FEATURE_" ""
> + dump_array "x86_cap_flags" "NCAPINTS*32" "X86_FEATURE_" "" $2
> echo ""
>
> - dump_array "x86_bug_flags" "NBUGINTS*32" "X86_BUG_" "NCAPINTS*32"
> + dump_array "x86_bug_flags" "NBUGINTS*32" "X86_BUG_" "NCAPINTS*32" $2
> + echo ""
>
> + echo "#ifdef CONFIG_X86_VMX_FEATURE_NAMES"
> + echo "#ifndef _ASM_X86_VMXFEATURES_H"
> + echo "#include <asm/vmxfeatures.h>"
> + echo "#endif"
> + dump_array "x86_vmx_flags" "NVMXINTS*32" "VMX_FEATURE_" "" $3
> + echo "#endif /* CONFIG_X86_VMX_FEATURE_NAMES */"
> ) > $OUT
>
> trap - EXIT
> diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
> index cb2e49810d68..4eec8889b0ff 100644
> --- a/arch/x86/kernel/cpu/proc.c
> +++ b/arch/x86/kernel/cpu/proc.c
> @@ -7,6 +7,10 @@
>
> #include "cpu.h"
>
> +#ifdef CONFIG_X86_VMX_FEATURE_NAMES
> +extern const char * const x86_vmx_flags[NVMXINTS*32];
> +#endif
> +
> /*
> * Get CPU information for use by the procfs.
> */
> @@ -102,6 +106,17 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> if (cpu_has(c, i) && x86_cap_flags[i] != NULL)
> seq_printf(m, " %s", x86_cap_flags[i]);

I'm afraid this is going to break some scripts in the wild. I would
simply remove the seq_puts below.

Paolo

> +#ifdef CONFIG_X86_VMX_FEATURE_NAMES
> + if (cpu_has(c, X86_FEATURE_VMX) && c->vmx_capability[0]) {
> + seq_puts(m, "\nvmx flags\t:");
> + for (i = 0; i < 32*NVMXINTS; i++) {
> + if (test_bit(i, (unsigned long *)c->vmx_capability) &&
> + x86_vmx_flags[i] != NULL)
> + seq_printf(m, " %s", x86_vmx_flags[i]);
> + }
> + }
> +#endif
> +
> seq_puts(m, "\nbugs\t\t:");
> for (i = 0; i < 32*NBUGINTS; i++) {
> unsigned int bug_bit = 32*NCAPINTS + i;
> diff --git a/arch/x86/kernel/cpu/zhaoxin.c b/arch/x86/kernel/cpu/zhaoxin.c
> index 01b05a4a5a85..edfc7cc4ec33 100644
> --- a/arch/x86/kernel/cpu/zhaoxin.c
> +++ b/arch/x86/kernel/cpu/zhaoxin.c
> @@ -16,13 +16,6 @@
> #define RNG_ENABLED (1 << 3)
> #define RNG_ENABLE (1 << 8) /* MSR_ZHAOXIN_RNG */
>
> -#define X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW 0x00200000
> -#define X86_VMX_FEATURE_PROC_CTLS_VNMI 0x00400000
> -#define X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS 0x80000000
> -#define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC 0x00000001
> -#define X86_VMX_FEATURE_PROC_CTLS2_EPT 0x00000002
> -#define X86_VMX_FEATURE_PROC_CTLS2_VPID 0x00000020
> -
> static void init_zhaoxin_cap(struct cpuinfo_x86 *c)
> {
> u32 lo, hi;
> @@ -89,31 +82,6 @@ static void early_init_zhaoxin(struct cpuinfo_x86 *c)
>
> }
>
> -static void zhaoxin_detect_vmx_virtcap(struct cpuinfo_x86 *c)
> -{
> - u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2;
> -
> - rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high);
> - msr_ctl = vmx_msr_high | vmx_msr_low;
> -
> - if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW)
> - set_cpu_cap(c, X86_FEATURE_TPR_SHADOW);
> - if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_VNMI)
> - set_cpu_cap(c, X86_FEATURE_VNMI);
> - if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS) {
> - rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2,
> - vmx_msr_low, vmx_msr_high);
> - msr_ctl2 = vmx_msr_high | vmx_msr_low;
> - if ((msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC) &&
> - (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW))
> - set_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
> - if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_EPT)
> - set_cpu_cap(c, X86_FEATURE_EPT);
> - if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID)
> - set_cpu_cap(c, X86_FEATURE_VPID);
> - }
> -}
> -
> static void init_zhaoxin(struct cpuinfo_x86 *c)
> {
> early_init_zhaoxin(c);
> @@ -142,9 +110,6 @@ static void init_zhaoxin(struct cpuinfo_x86 *c)
> #endif
>
> init_feature_control_msr(c);
> -
> - if (cpu_has(c, X86_FEATURE_VMX))
> - zhaoxin_detect_vmx_virtcap(c);
> }
>
> #ifdef CONFIG_X86_32
>
On Mon, Oct 07, 2019 at 07:08:28PM +0200, Paolo Bonzini wrote:
> On 04/10/19 23:56, Sean Christopherson wrote:
> > +#define VMX_FEATURE_RDSEED_EXITING ( 2*32+ 16) /* "" VM-Exit on RDSEED */
> > +#define VMX_FEATURE_PAGE_MOD_LOGGING ( 2*32+ 17) /* "pml" Log dirty pages into buffer */
> > +#define VMX_FEATURE_EPT_VIOLATION_VE ( 2*32+ 18) /* "" Conditionally reflect EPT violations as #VE exceptions */
> > +#define VMX_FEATURE_PT_CONCEAL_VMX ( 2*32+ 19) /* "" Suppress VMX indicators in Processor Trace */
> > +#define VMX_FEATURE_XSAVES ( 2*32+ 20) /* "" Enable XSAVES and XRSTORS in guest */
> > +#define VMX_FEATURE_RESERVED_PIN_21 ( 2*32+ 21) /* "" Reserved */
> > +#define VMX_FEATURE_MODE_BASED_EPT_EXEC ( 2*32+ 22) /* Enable separate EPT EXEC bits for supervisor vs. user */
> > +#define VMX_FEATURE_RESERVED_PIN_23 ( 2*32+ 23) /* "" Reserved */
> > +#define VMX_FEATURE_PT_USE_GPA ( 2*32+ 24) /* "" Processor Trace logs GPAs */
> > +#define VMX_FEATURE_TSC_SCALING ( 2*32+ 25) /* Scale hardware TSC when read in guest */
> > +#define VMX_FEATURE_RESERVED_PIN_26 ( 2*32+ 26) /* "" Reserved */
> > +#define VMX_FEATURE_RESERVED_PIN_27 ( 2*32+ 27) /* "" Reserved */
> > +#define VMX_FEATURE_ENCLV_EXITING ( 2*32+ 28) /* "" VM-Exit on ENCLV (leaf dependent) */
> > +#define VMX_FEATURE_RESERVED_PIN_29 ( 2*32+ 29) /* "" Reserved */
> > +#define VMX_FEATURE_RESERVED_PIN_30 ( 2*32+ 30) /* "" Reserved */
> > +#define VMX_FEATURE_RESERVED_PIN_31 ( 2*32+ 31) /* "" Reserved */
>
> Pasto on the reserved bit names, but in general why is it useful to
> define them?
Doh. It was mostly so that I could more easily cross-reference the SDM
and visually see that I wasn't skipping bits. No argument from me if it'd
be preferable to omit them.
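For what it's worth, leaving the reserved bits undefined should be harmless for the /proc/cpuinfo output, since the print loop in the patch skips any bit whose name slot is NULL. A minimal standalone sketch of that behavior follows. VMX_WORDS, FEAT(), vmx_flag_names and format_vmx_flags() are hypothetical stand-ins rather than kernel symbols, and the "tsc_scaling" string is an assumption based on the usual lowercased-define naming; only "pml" is taken from the quoted header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define VMX_WORDS	3			/* hypothetical stand-in for NVMXINTS */
#define FEAT(word, bit)	((word) * 32 + (bit))	/* same word*32+bit encoding as the defines */

/* Sparse name table: bits with no define (or a "" name) are simply left NULL. */
static const char * const vmx_flag_names[VMX_WORDS * 32] = {
	[FEAT(2, 17)] = "pml",
	[FEAT(2, 25)] = "tsc_scaling",	/* assumed name, not confirmed by the thread */
};

/*
 * Append " <name>" to buf for every set capability bit that has a name.
 * Returns the number of characters written, excluding the terminating NUL.
 */
static size_t format_vmx_flags(const uint32_t caps[VMX_WORDS], char *buf, size_t size)
{
	size_t used = 0;
	unsigned int i;

	assert(size > 0);
	buf[0] = '\0';

	for (i = 0; i < VMX_WORDS * 32; i++) {
		int n;

		/* Skip clear bits and set bits that have no name (e.g. reserved). */
		if (!(caps[i / 32] & (UINT32_C(1) << (i % 32))) || !vmx_flag_names[i])
			continue;

		n = snprintf(buf + used, size - used, " %s", vmx_flag_names[i]);
		if (n < 0 || (size_t)n >= size - used)
			break;	/* error or out of room: stop appending */
		used += (size_t)n;
	}
	return used;
}
```

In this sketch a set bit with no entry in the table (bit 21 of word 2, say) produces no output at all, which is the same result a `""`-named reserved define would give.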
On Mon, Oct 07, 2019 at 07:11:24PM +0200, Paolo Bonzini wrote:
> On 04/10/19 23:56, Sean Christopherson wrote:
> > + /*
> > + * The high bits contain the allowed-1 settings, i.e. features that can
> > + * be turned on. The low bits contain the allowed-0 settings, i.e.
> > + * features that can be turned off. Ignore the allowed-0 settings,
> > + * if a feature can be turned on then it's supported.
> > + */
> > + rdmsr(MSR_IA32_VMX_PINBASED_CTLS, ign, supported);
>
> For QEMU, we're defining a feature as supported if a feature can be
> turned both on and off. Since msr_low and msr_high can be defined
> respectively as must-be-one and can-be-one, the features become
> "msr_high & ~msr_low".

That makes sense for Qemu, but I don't think it's appropriate for this
type of reporting. E.g. if EPT and Unrestricted Guest are must-be-one on
a hypothetical (virtual) CPU, it'd be odd to not list them as a supported
feature.

For actual hardware (well, Intel hardware), as proposed it's a moot point.
The only features that are must-be-one (even without "true" MSRs) and are
documented in the SDM are CR3_LOAD_EXITING, CR3_STORE_EXITING,
SAVE_DEBUG_CONTROLS, and LOAD_DEBUG_CONTROLS, none of which are reported
in /proc/cpuinfo.

> Also, shouldn't this use the "true" feature availability MSRs if available?

Only if incorporating the "& ~msr_low" can-be-one logic. If a feature is
considered supported if it must-be-one or can-be-one then the true MSR and
vanilla MSR will yield the same feature set.

>
> Paolo
On Mon, Oct 07, 2019 at 07:12:37PM +0200, Paolo Bonzini wrote:
> On 04/10/19 23:56, Sean Christopherson wrote:
> > diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
> > index cb2e49810d68..4eec8889b0ff 100644
> > --- a/arch/x86/kernel/cpu/proc.c
> > +++ b/arch/x86/kernel/cpu/proc.c
> > @@ -7,6 +7,10 @@
> >
> > #include "cpu.h"
> >
> > +#ifdef CONFIG_X86_VMX_FEATURE_NAMES
> > +extern const char * const x86_vmx_flags[NVMXINTS*32];
> > +#endif
> > +
> > /*
> > * Get CPU information for use by the procfs.
> > */
> > @@ -102,6 +106,17 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> > if (cpu_has(c, i) && x86_cap_flags[i] != NULL)
> > seq_printf(m, " %s", x86_cap_flags[i]);
>
> I'm afraid this is going to break some scripts in the wild. I would
> simply remove the seq_puts below.

Can you elaborate? I'm having trouble connecting the dots...

> Paolo
>
> > +#ifdef CONFIG_X86_VMX_FEATURE_NAMES
> > + if (cpu_has(c, X86_FEATURE_VMX) && c->vmx_capability[0]) {
> > + seq_puts(m, "\nvmx flags\t:");
> > + for (i = 0; i < 32*NVMXINTS; i++) {
> > + if (test_bit(i, (unsigned long *)c->vmx_capability) &&
> > + x86_vmx_flags[i] != NULL)
> > + seq_printf(m, " %s", x86_vmx_flags[i]);
> > + }
> > + }
> > +#endif
> > +
> > seq_puts(m, "\nbugs\t\t:");
> > for (i = 0; i < 32*NBUGINTS; i++) {
> > unsigned int bug_bit = 32*NCAPINTS + i;
On 07/10/19 21:54, Sean Christopherson wrote:
>> For QEMU, we're defining a feature as supported if a feature can be
>> turned both on and off. Since msr_low and msr_high can be defined
>> respectively as must-be-one and can-be-one, the features become
>> "msr_high & ~msr_low".
>
> That makes sense for Qemu, but I don't think it's appropriate for this
> type of reporting. E.g. if EPT and Unrestricted Guest are must-be-one on
> a hypothetical (virtual) CPU, it'd be odd to not list them as a supported
> feature.
>
> For actual hardware (well, Intel hardware), as proposed it's a moot point.
> The only features that are must-be-one (even without "true" MSRs) and are
> documented in the SDM are CR3_LOAD_EXITING, CR3_STORE_EXITING,
> SAVE_DEBUG_CONTROLS, and LOAD_DEBUG_CONTROLS, none of which are reported
> in /proc/cpuinfo.
>
>> Also, shouldn't this use the "true" feature availability MSRs if available?
>
> Only if incorporating the "& ~msr_low" can-be-one logic. If a feature is
> considered supported if it must-be-one or can-be-one then the true MSR and
> vanilla MSR will yield the same feature set.
Ok, that all makes sense.
Paolo
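To make the two definitions of "supported" discussed above concrete, here is a small standalone sketch. It assumes only what the thread states: the low 32 bits of a VMX control capability MSR are the must-be-one (allowed-0) settings and the high 32 bits are the can-be-one (allowed-1) settings. The struct and function names are hypothetical and are not taken from the kernel or from QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two halves of a VMX control capability MSR. */
struct vmx_ctl_msr {
	uint32_t low;	/* allowed-0 settings: a set bit means the control must be 1 */
	uint32_t high;	/* allowed-1 settings: a set bit means the control may be 1 */
};

/* Split a raw 64-bit MSR value into its two halves. */
static struct vmx_ctl_msr vmx_ctl_msr_split(uint64_t raw)
{
	struct vmx_ctl_msr msr = {
		.low  = (uint32_t)raw,
		.high = (uint32_t)(raw >> 32),
	};

	/* A must-be-one bit that cannot be one would leave no legal setting. */
	assert((msr.low & ~msr.high) == 0);
	return msr;
}

/* Reporting-style definition: a feature is supported if it can be turned on. */
static uint32_t vmx_supported_for_reporting(struct vmx_ctl_msr msr)
{
	return msr.high;
}

/* QEMU-style definition: supported only if it can be turned both on and off. */
static uint32_t vmx_supported_toggleable(struct vmx_ctl_msr msr)
{
	return msr.high & ~msr.low;
}
```

The first helper never looks at the low half, which is why the reporting-style definition gives the same answer for the vanilla and "true" MSRs whenever the two differ only in their must-be-one bits, as described above.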
On 07/10/19 21:56, Sean Christopherson wrote:
> On Mon, Oct 07, 2019 at 07:12:37PM +0200, Paolo Bonzini wrote:
>> On 04/10/19 23:56, Sean Christopherson wrote:
>>> diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
>>> index cb2e49810d68..4eec8889b0ff 100644
>>> --- a/arch/x86/kernel/cpu/proc.c
>>> +++ b/arch/x86/kernel/cpu/proc.c
>>> @@ -7,6 +7,10 @@
>>>
>>> #include "cpu.h"
>>>
>>> +#ifdef CONFIG_X86_VMX_FEATURE_NAMES
>>> +extern const char * const x86_vmx_flags[NVMXINTS*32];
>>> +#endif
>>> +
>>> /*
>>> * Get CPU information for use by the procfs.
>>> */
>>> @@ -102,6 +106,17 @@ static int show_cpuinfo(struct seq_file *m, void *v)
>>> if (cpu_has(c, i) && x86_cap_flags[i] != NULL)
>>> seq_printf(m, " %s", x86_cap_flags[i]);
>>
>> I'm afraid this is going to break some scripts in the wild. I would
>> simply remove the seq_puts below.
>
> Can you elaborate? I'm having trouble connecting the dots...

Somebody is bound to have scripts doing "grep ^flags.*ept /proc/cpuinfo"
or checking for VMX flags under some kind of "if (/^flags/)", so it's
safer not to separate VMX and non-VMX flags.

Paolo

>> Paolo
>>
>>> +#ifdef CONFIG_X86_VMX_FEATURE_NAMES
>>> + if (cpu_has(c, X86_FEATURE_VMX) && c->vmx_capability[0]) {
>>> + seq_puts(m, "\nvmx flags\t:");
>>> + for (i = 0; i < 32*NVMXINTS; i++) {
>>> + if (test_bit(i, (unsigned long *)c->vmx_capability) &&
>>> + x86_vmx_flags[i] != NULL)
>>> + seq_printf(m, " %s", x86_vmx_flags[i]);
>>> + }
>>> + }
>>> +#endif
>>> +
>>> seq_puts(m, "\nbugs\t\t:");
>>> for (i = 0; i < 32*NBUGINTS; i++) {
>>> unsigned int bug_bit = 32*NCAPINTS + i;
On Mon, Oct 7, 2019 at 11:57 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 07/10/19 21:56, Sean Christopherson wrote:
> > On Mon, Oct 07, 2019 at 07:12:37PM +0200, Paolo Bonzini wrote:
> >> On 04/10/19 23:56, Sean Christopherson wrote:
> >>> diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
> >>> index cb2e49810d68..4eec8889b0ff 100644
> >>> --- a/arch/x86/kernel/cpu/proc.c
> >>> +++ b/arch/x86/kernel/cpu/proc.c
> >>> @@ -7,6 +7,10 @@
> >>>
> >>> #include "cpu.h"
> >>>
> >>> +#ifdef CONFIG_X86_VMX_FEATURE_NAMES
> >>> +extern const char * const x86_vmx_flags[NVMXINTS*32];
> >>> +#endif
> >>> +
> >>> /*
> >>> * Get CPU information for use by the procfs.
> >>> */
> >>> @@ -102,6 +106,17 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> >>> if (cpu_has(c, i) && x86_cap_flags[i] != NULL)
> >>> seq_printf(m, " %s", x86_cap_flags[i]);
> >>
> >> I'm afraid this is going to break some scripts in the wild. I would
> >> simply remove the seq_puts below.
> >
> > Can you elaborate? I'm having trouble connecting the dots...
>
> Somebody is bound to have scripts doing "grep ^flags.*ept /proc/cpuinfo"
> or checking for VMX flags under some kind of "if (/^flags/)", so it's
> safer not to separate VMX and non-VMX flags.
Yep. Not quite that exact syntax, but we do have, e.g.:
./x86/feature_check.sh ept
...and you can imagine what feature_check.sh does.
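As an illustration of the breakage being described, the sketch below mimics a checker that only inspects lines starting with "flags". It is a hypothetical stand-in, not the actual feature_check.sh, and unlike the grep pattern quoted above it matches whole space-delimited tokens rather than substrings. The sample cpuinfo text used with it is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if any line of @cpuinfo that starts with "flags" lists @flag
 * as a whitespace-delimited token after the colon.  Lines such as
 * "vmx flags\t: ..." do not start with "flags" and are therefore ignored.
 */
static bool flags_line_has(const char *cpuinfo, const char *flag)
{
	size_t flag_len;
	const char *line;

	assert(cpuinfo && flag);
	flag_len = strlen(flag);
	line = cpuinfo;

	while (*line) {
		size_t len = strcspn(line, "\n");
		const char *end = line + len;

		if (len >= 5 && strncmp(line, "flags", 5) == 0) {
			const char *p = memchr(line, ':', len);

			p = p ? p + 1 : end;
			while (p < end) {
				size_t tok = 0;

				/* Skip separators, then measure the next token. */
				while (p < end && (*p == ' ' || *p == '\t'))
					p++;
				while (p + tok < end && p[tok] != ' ' && p[tok] != '\t')
					tok++;
				if (tok && tok == flag_len && memcmp(p, flag, tok) == 0)
					return true;
				p += tok;
			}
		}

		/* Advance to the start of the next line, if any. */
		line = end;
		if (*line == '\n')
			line++;
	}
	return false;
}
```

Fed a "flags" line that still contains ept, the check succeeds. Once ept appears only on a separate "vmx flags" line, the same check fails even though the CPU has not changed, which is the concern raised above.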
On Tue, Oct 08, 2019 at 08:57:30AM +0200, Paolo Bonzini wrote:
> On 07/10/19 21:56, Sean Christopherson wrote:
> > On Mon, Oct 07, 2019 at 07:12:37PM +0200, Paolo Bonzini wrote:
> >> On 04/10/19 23:56, Sean Christopherson wrote:
> >>> diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
> >>> index cb2e49810d68..4eec8889b0ff 100644
> >>> --- a/arch/x86/kernel/cpu/proc.c
> >>> +++ b/arch/x86/kernel/cpu/proc.c
> >>> @@ -7,6 +7,10 @@
> >>>
> >>> #include "cpu.h"
> >>>
> >>> +#ifdef CONFIG_X86_VMX_FEATURE_NAMES
> >>> +extern const char * const x86_vmx_flags[NVMXINTS*32];
> >>> +#endif
> >>> +
> >>> /*
> >>> * Get CPU information for use by the procfs.
> >>> */
> >>> @@ -102,6 +106,17 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> >>> if (cpu_has(c, i) && x86_cap_flags[i] != NULL)
> >>> seq_printf(m, " %s", x86_cap_flags[i]);
> >>
> >> I'm afraid this is going to break some scripts in the wild. I would
> >> simply remove the seq_puts below.
> >
> > Can you elaborate? I'm having trouble connecting the dots...
>
> Somebody is bound to have scripts doing "grep ^flags.*ept /proc/cpuinfo"
> or checking for VMX flags under some kind of "if (/^flags/)", so it's
> safer not to separate VMX and non-VMX flags.
Are the names of the flags considered ABI? If so, then the rename of
"vnmi" to "virtual_nmis" also needs to be dropped. :-(
On 09/10/19 21:16, Sean Christopherson wrote:
> On Tue, Oct 08, 2019 at 08:57:30AM +0200, Paolo Bonzini wrote:
>> On 07/10/19 21:56, Sean Christopherson wrote:
>>> On Mon, Oct 07, 2019 at 07:12:37PM +0200, Paolo Bonzini wrote:
>>>> On 04/10/19 23:56, Sean Christopherson wrote:
>>>>> diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
>>>>> index cb2e49810d68..4eec8889b0ff 100644
>>>>> --- a/arch/x86/kernel/cpu/proc.c
>>>>> +++ b/arch/x86/kernel/cpu/proc.c
>>>>> @@ -7,6 +7,10 @@
>>>>>
>>>>> #include "cpu.h"
>>>>>
>>>>> +#ifdef CONFIG_X86_VMX_FEATURE_NAMES
>>>>> +extern const char * const x86_vmx_flags[NVMXINTS*32];
>>>>> +#endif
>>>>> +
>>>>> /*
>>>>> * Get CPU information for use by the procfs.
>>>>> */
>>>>> @@ -102,6 +106,17 @@ static int show_cpuinfo(struct seq_file *m, void *v)
>>>>> if (cpu_has(c, i) && x86_cap_flags[i] != NULL)
>>>>> seq_printf(m, " %s", x86_cap_flags[i]);
>>>>
>>>> I'm afraid this is going to break some scripts in the wild. I would
>>>> simply remove the seq_puts below.
>>>
>>> Can you elaborate? I'm having trouble connecting the dots...
>>
>> Somebody is bound to have scripts doing "grep ^flags.*ept /proc/cpuinfo"
>> or checking for VMX flags under some kind of "if (/^flags/)", so it's
>> safer not to separate VMX and non-VMX flags.
>
> Are the names of the flags considered ABI? If so, then the rename of
> "vnmi" to "virtual_nmis" also needs to be dropped. :-(
Yes, they are. :/
Paolo