* [PATCH v6 0/7] KVM: PKS Virtualization support
@ 2022-02-21  8:08 Chenyi Qiang
  2022-02-21  8:08 ` [PATCH v6 1/7] KVM: VMX: Introduce PKS VMCS fields Chenyi Qiang
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Chenyi Qiang @ 2022-02-21  8:08 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Xiaoyao Li
  Cc: Chenyi Qiang, kvm, linux-kernel

This patch series is based on top of v8 PKS core support kernel patchset:
https://lore.kernel.org/lkml/20220127175505.851391-1-ira.weiny@intel.com/

Note: If you read SDM section 4.6.1 and are confused by its statement about
data writes to supervisor-mode addresses:

  If CR0.WP = 0, data may be written to any supervisor-mode address with
  a protection key for which write access is permitted.

This may seem to conflict slightly with 4.6.2:

  if WDi = 1, write accesses are not permitted if CR0.WP = 1. (If CR0.WP
  = 0, IA32_PKRS.WDi does not affect write accesses to supervisor-mode
  address with protection key i.)

In fact, the statement in 4.6.1 doesn't say "a protection key with the
appropriate WDi bit set." The reader should instead refer to Section
4.6.2 for the definition of what that means. We will follow up on this
internally to make it clearer in the SDM.
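
To make the interaction concrete, here is a minimal illustrative sketch
(not kernel or KVM code; the helper name is made up) of the effective
permission for a data write to a supervisor-mode address with protection
key i when PKS is enabled, assuming the paging permissions themselves
allow the write:

  /* Illustrative only: combines the rules of SDM 4.6.1 and 4.6.2. */
  static bool pks_write_allowed(u64 pkrs, int key, bool cr0_wp)
  {
          bool ad = pkrs & (1ULL << (2 * key));      /* ADi */
          bool wd = pkrs & (1ULL << (2 * key + 1));  /* WDi */

          if (ad)
                  return false;  /* ADi=1 blocks all data accesses */
          if (wd && cr0_wp)
                  return false;  /* WDi=1 blocks writes only if CR0.WP=1 */
          return true;           /* with CR0.WP=0, WDi is ignored */
  }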

---

Protection Keys for Supervisor Pages (PKS) is a feature that extends the
Protection Keys architecture to support thread-specific permission
restrictions on supervisor pages.

PKS works similarly to the existing PKU feature (protection keys for user
pages). Both perform an additional check after the normal paging permission
checks are done. Accesses or writes can be disabled via an MSR update,
without TLB flushes, when permissions change. If this additional check is
violated, a #PF occurs and the PFEC.PK bit is set.

PKS introduces the IA32_PKRS MSR to manage supervisor protection key
rights. The MSR contains 16 pairs of ADi and WDi bits. Each pair governs
the group of pages tagged with the corresponding key in their leaf
paging-structure entries (bits[62:59]). Currently, IA32_PKRS is not
supported by the XSAVES architecture.
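
As a rough illustration of that layout (a sketch only; the macro and
helper names are made up, not taken from these patches), the key in a
leaf PTE selects a two-bit AD/WD pair in IA32_PKRS like this:

  #define PTE_PKEY_SHIFT  59
  #define PTE_PKEY_MASK   0xfULL

  /* Illustrative only: return the {WDi, ADi} pair selected by a leaf PTE. */
  static u32 pkrs_bits_for_pte(u64 pte, u64 pkrs)
  {
          u32 key = (pte >> PTE_PKEY_SHIFT) & PTE_PKEY_MASK;

          return (pkrs >> (2 * key)) & 3;  /* bit 0 = ADi, bit 1 = WDi */
  }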

This patchset adds virtualization of PKS in KVM. It implements PKS CPUID
enumeration, VM-entry/VM-exit configuration, MSR exposure, nested support,
etc. Currently, PKS is not yet supported for shadow paging.

Detailed information about PKS can be found in the latest Intel 64 and
IA-32 Architectures Software Developer's Manual.

---

Changelogs:

v5->v6
- PKRS is preserved on INIT. Add the PKRS reset operation in kvm_vcpu_reset.
  (Sean)
- Track pkrs as u32. Add code to WARN on bits 63:32 being set in the VMCS field.
  (Sean)
- Adjust the MSR intercept and entry/exit controls in VMCS according to
  guest CPUID. This resolves the issue when userspace re-enables this feature.
  (Sean)
- Split the VMX restriction on PKS support (entry/exit load controls) out of
  common x86, and put the TDP restriction together with PKU in common x86.
  (Sean)
- Thanks to Sean for revising the comments in mmu.c related to
  update_pkr_bitmask, which make the pkr bitmask cache usage clearer.
- v5: https://lore.kernel.org/lkml/20210811101126.8973-1-chenyi.qiang@intel.com/

v4->v5
- Make the setting of the MSR intercept/VMCS control bits not dependent on
  guest CR4.PKS, and set them if PKS is exposed to the guest. (Suggested by Sean)
- Add pkrs to standard register caching mechanism to help update
  vcpu->arch.pkrs on demand. Add related helper functions. (Suggested by Sean)
- Do the real pkrs update in VMCS field in vmx_vcpu_reset and
  vmx_sync_vmcs_host_state(). (Sean)
- Add a new mmu_role cr4_pks instead of smushing PKU and PKS together.
  (Sean & Paolo)
- v4: https://lore.kernel.org/lkml/20210205083706.14146-1-chenyi.qiang@intel.com/

v3->v4
- Make the MSR intercept and load-controls setting depend on CR4.PKS value
- shadow the guest pkrs and make it usable in PKS emulation
- add the cr4_pke and cr4_pks check in pkr_mask update
- squash PATCH 2 and PATCH 5 to make the dependencies read more clear
- v3: https://lore.kernel.org/lkml/20201105081805.5674-1-chenyi.qiang@intel.com/

v2->v3:
- No function changes since last submit
- rebase on the latest PKS kernel support:
  https://lore.kernel.org/lkml/20201102205320.1458656-1-ira.weiny@intel.com/
- add MSR_IA32_PKRS to the vmx_possible_passthrough_msrs[]
- RFC v2: https://lore.kernel.org/lkml/20201014021157.18022-1-chenyi.qiang@intel.com/

v1->v2:
- rebase on the latest PKS kernel support:
  https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3
- add a kvm-unit-tests for PKS
- add the check in kvm_init_msr_list for PKRS
- place the X86_CR4_PKS in mmu_role_bits in kvm_set_cr4
- add the support to expose VM_{ENTRY, EXIT}_LOAD_IA32_PKRS in nested
  VMX MSR
- RFC v1: https://lore.kernel.org/lkml/20200807084841.7112-1-chenyi.qiang@intel.com/

---

Chenyi Qiang (7):
  KVM: VMX: Introduce PKS VMCS fields
  KVM: VMX: Add proper cache tracking for PKRS
  KVM: X86: Expose IA32_PKRS MSR
  KVM: MMU: Rename the pkru to pkr
  KVM: MMU: Add support for PKS emulation
  KVM: VMX: Expose PKS to guest
  KVM: VMX: Enable PKS for nested VM

 arch/x86/include/asm/kvm_host.h |  17 ++++--
 arch/x86/include/asm/vmx.h      |   6 ++
 arch/x86/kvm/cpuid.c            |  13 ++--
 arch/x86/kvm/kvm_cache_regs.h   |   7 +++
 arch/x86/kvm/mmu.h              |  27 +++++----
 arch/x86/kvm/mmu/mmu.c          | 101 ++++++++++++++++++++------------
 arch/x86/kvm/vmx/capabilities.h |   6 ++
 arch/x86/kvm/vmx/nested.c       |  38 +++++++++++-
 arch/x86/kvm/vmx/vmcs.h         |   1 +
 arch/x86/kvm/vmx/vmcs12.c       |   2 +
 arch/x86/kvm/vmx/vmcs12.h       |   4 ++
 arch/x86/kvm/vmx/vmx.c          |  92 ++++++++++++++++++++++++++---
 arch/x86/kvm/vmx/vmx.h          |   7 ++-
 arch/x86/kvm/x86.c              |  10 +++-
 arch/x86/kvm/x86.h              |   8 +++
 arch/x86/mm/pkeys.c             |   6 ++
 include/linux/pkeys.h           |   6 ++
 17 files changed, 280 insertions(+), 71 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v6 1/7] KVM: VMX: Introduce PKS VMCS fields
  2022-02-21  8:08 [PATCH v6 0/7] KVM: PKS Virtualization support Chenyi Qiang
@ 2022-02-21  8:08 ` Chenyi Qiang
  2022-03-30 20:05   ` Sean Christopherson
  2022-02-21  8:08 ` [PATCH v6 2/7] KVM: VMX: Add proper cache tracking for PKRS Chenyi Qiang
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Chenyi Qiang @ 2022-02-21  8:08 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Xiaoyao Li
  Cc: Chenyi Qiang, kvm, linux-kernel

PKS (Protection Keys for Supervisor Pages) is a feature that extends the
Protection Keys architecture to support thread-specific permission
restrictions on supervisor pages.

A new PKS MSR (PKRS) is defined to support PKS; it holds a set of
permissions associated with each protection domain.

Two VMCS fields, {HOST,GUEST}_IA32_PKRS, are introduced in the
{host,guest}-state areas to store the respective values of PKRS.

Every VM exit saves PKRS into the guest-state area.
If VM_EXIT_LOAD_IA32_PKRS = 1, VM exit loads PKRS from the host-state
area.
If VM_ENTRY_LOAD_IA32_PKRS = 1, VM entry loads PKRS from the guest-state
area.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/include/asm/vmx.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 0ffaa3156a4e..7962d506ba91 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -95,6 +95,7 @@
 #define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
 #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
 #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
+#define VM_EXIT_LOAD_IA32_PKRS			0x20000000
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
 
@@ -108,6 +109,7 @@
 #define VM_ENTRY_LOAD_BNDCFGS                   0x00010000
 #define VM_ENTRY_PT_CONCEAL_PIP			0x00020000
 #define VM_ENTRY_LOAD_IA32_RTIT_CTL		0x00040000
+#define VM_ENTRY_LOAD_IA32_PKRS			0x00400000
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
 
@@ -245,12 +247,16 @@ enum vmcs_field {
 	GUEST_BNDCFGS_HIGH              = 0x00002813,
 	GUEST_IA32_RTIT_CTL		= 0x00002814,
 	GUEST_IA32_RTIT_CTL_HIGH	= 0x00002815,
+	GUEST_IA32_PKRS			= 0x00002818,
+	GUEST_IA32_PKRS_HIGH		= 0x00002819,
 	HOST_IA32_PAT			= 0x00002c00,
 	HOST_IA32_PAT_HIGH		= 0x00002c01,
 	HOST_IA32_EFER			= 0x00002c02,
 	HOST_IA32_EFER_HIGH		= 0x00002c03,
 	HOST_IA32_PERF_GLOBAL_CTRL	= 0x00002c04,
 	HOST_IA32_PERF_GLOBAL_CTRL_HIGH	= 0x00002c05,
+	HOST_IA32_PKRS			= 0x00002c06,
+	HOST_IA32_PKRS_HIGH		= 0x00002c07,
 	PIN_BASED_VM_EXEC_CONTROL       = 0x00004000,
 	CPU_BASED_VM_EXEC_CONTROL       = 0x00004002,
 	EXCEPTION_BITMAP                = 0x00004004,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 2/7] KVM: VMX: Add proper cache tracking for PKRS
  2022-02-21  8:08 [PATCH v6 0/7] KVM: PKS Virtualization support Chenyi Qiang
  2022-02-21  8:08 ` [PATCH v6 1/7] KVM: VMX: Introduce PKS VMCS fields Chenyi Qiang
@ 2022-02-21  8:08 ` Chenyi Qiang
  2022-03-30 20:42   ` Sean Christopherson
  2022-02-21  8:08 ` [PATCH v6 3/7] KVM: X86: Expose IA32_PKRS MSR Chenyi Qiang
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Chenyi Qiang @ 2022-02-21  8:08 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Xiaoyao Li
  Cc: Chenyi Qiang, kvm, linux-kernel

Add PKRS caching into the standard register caching mechanism in order
to take advantage of the availability checks provided by regs_avail.

This is because vcpu->arch.pkrs will rarely be accessed by KVM, only for
host userspace MSR reads and GVA->GPA translation in the following
patches. It is unnecessary to keep it up to date at all times.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/kvm_cache_regs.h   |  7 +++++++
 arch/x86/kvm/vmx/vmx.c          | 11 +++++++++++
 arch/x86/kvm/vmx/vmx.h          |  3 ++-
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1384517d7709..75940aeb5f67 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -177,6 +177,7 @@ enum kvm_reg {
 	VCPU_EXREG_SEGMENTS,
 	VCPU_EXREG_EXIT_INFO_1,
 	VCPU_EXREG_EXIT_INFO_2,
+	VCPU_EXREG_PKRS,
 };
 
 enum {
@@ -632,6 +633,7 @@ struct kvm_vcpu_arch {
 	unsigned long cr8;
 	u32 host_pkru;
 	u32 pkru;
+	u32 pkrs;
 	u32 hflags;
 	u64 efer;
 	u64 apic_base;
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 3febc342360c..2b2540ca584f 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -177,6 +177,13 @@ static inline u64 kvm_read_edx_eax(struct kvm_vcpu *vcpu)
 		| ((u64)(kvm_rdx_read(vcpu) & -1u) << 32);
 }
 
+static inline u32 kvm_read_pkrs(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_register_is_available(vcpu, VCPU_EXREG_PKRS))
+		static_call(kvm_x86_cache_reg)(vcpu, VCPU_EXREG_PKRS);
+	return vcpu->arch.pkrs;
+}
+
 static inline void enter_guest_mode(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.hflags |= HF_GUEST_MASK;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4ac676066d60..0496afe786fa 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2269,6 +2269,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 {
 	unsigned long guest_owned_bits;
+	u64 ia32_pkrs;
 
 	kvm_register_mark_available(vcpu, reg);
 
@@ -2303,6 +2304,16 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 		vcpu->arch.cr4 &= ~guest_owned_bits;
 		vcpu->arch.cr4 |= vmcs_readl(GUEST_CR4) & guest_owned_bits;
 		break;
+	case VCPU_EXREG_PKRS:
+		/*
+		 * The high 32 bits of PKRS are reserved and attempting to write
+		 * non-zero value will cause #GP. KVM intentionally drops those
+		 * bits.
+		 */
+		ia32_pkrs = vmcs_read64(GUEST_IA32_PKRS);
+		WARN_ON_ONCE(ia32_pkrs >> 32);
+		vcpu->arch.pkrs = ia32_pkrs;
+		break;
 	default:
 		KVM_BUG_ON(1, vcpu->kvm);
 		break;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 7f2c82e7f38f..da5e95a6694c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -500,7 +500,8 @@ BUILD_CONTROLS_SHADOW(secondary_exec, SECONDARY_VM_EXEC_CONTROL)
 				(1 << VCPU_EXREG_CR3) |         \
 				(1 << VCPU_EXREG_CR4) |         \
 				(1 << VCPU_EXREG_EXIT_INFO_1) | \
-				(1 << VCPU_EXREG_EXIT_INFO_2))
+				(1 << VCPU_EXREG_EXIT_INFO_2) | \
+				(1 << VCPU_EXREG_PKRS))
 
 static inline struct kvm_vmx *to_kvm_vmx(struct kvm *kvm)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 3/7] KVM: X86: Expose IA32_PKRS MSR
  2022-02-21  8:08 [PATCH v6 0/7] KVM: PKS Virtualization support Chenyi Qiang
  2022-02-21  8:08 ` [PATCH v6 1/7] KVM: VMX: Introduce PKS VMCS fields Chenyi Qiang
  2022-02-21  8:08 ` [PATCH v6 2/7] KVM: VMX: Add proper cache tracking for PKRS Chenyi Qiang
@ 2022-02-21  8:08 ` Chenyi Qiang
  2022-02-21  8:08 ` [PATCH v6 4/7] KVM: MMU: Rename the pkru to pkr Chenyi Qiang
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Chenyi Qiang @ 2022-02-21  8:08 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Xiaoyao Li
  Cc: Chenyi Qiang, kvm, linux-kernel

Protection Keys for Supervisor Pages (PKS) uses the IA32_PKRS MSR (PKRS) at
index 0x6E1 to allow software to manage supervisor key rights, i.e. it
can enforce additional permission checks besides the normal paging
protections via an MSR update, without TLB flushes, when permissions
change.

For performance considerations, the PKRS intercept in KVM will be disabled
when PKS is supported in the guest so that PKRS can be accessed without a
VM exit.

PKS introduces dedicated control fields in the VMCS to switch PKRS, which
only handle the restore part. In addition, every VM exit saves PKRS into
the guest-state area of the VMCS, while VM entry does not save the host
value, due to the expectation that the host won't change the MSR often.
Update the host's value in the VMCS manually if the MSR has been changed
by the kernel since the last time the VMCS was run.

Introduce a function get_current_pkrs() in arch/x86/mm/pkeys.c to export
the per-cpu variable pkrs_cache, avoiding frequent RDMSRs of PKRS.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 arch/x86/kvm/vmx/vmcs.h |  1 +
 arch/x86/kvm/vmx/vmx.c  | 66 +++++++++++++++++++++++++++++++++++++----
 arch/x86/kvm/vmx/vmx.h  |  2 +-
 arch/x86/kvm/x86.c      | 10 ++++++-
 arch/x86/kvm/x86.h      |  6 ++++
 arch/x86/mm/pkeys.c     |  6 ++++
 include/linux/pkeys.h   |  6 ++++
 7 files changed, 90 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h
index e325c290a816..ee37741b2b9d 100644
--- a/arch/x86/kvm/vmx/vmcs.h
+++ b/arch/x86/kvm/vmx/vmcs.h
@@ -42,6 +42,7 @@ struct vmcs_host_state {
 #ifdef CONFIG_X86_64
 	u16           ds_sel, es_sel;
 #endif
+	u32           pkrs;
 };
 
 struct vmcs_controls_shadow {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0496afe786fa..b3d5412b9481 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -28,6 +28,7 @@
 #include <linux/tboot.h>
 #include <linux/trace_events.h>
 #include <linux/entry-kvm.h>
+#include <linux/pkeys.h>
 
 #include <asm/apic.h>
 #include <asm/asm.h>
@@ -172,6 +173,7 @@ static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = {
 	MSR_CORE_C3_RESIDENCY,
 	MSR_CORE_C6_RESIDENCY,
 	MSR_CORE_C7_RESIDENCY,
+	MSR_IA32_PKRS,
 };
 
 /*
@@ -1121,6 +1123,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 #endif
 	unsigned long fs_base, gs_base;
 	u16 fs_sel, gs_sel;
+	u32 host_pkrs;
 	int i;
 
 	vmx->req_immediate_exit = false;
@@ -1156,6 +1159,20 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 	 */
 	host_state->ldt_sel = kvm_read_ldt();
 
+	/*
+	 * Update the host pkrs vmcs field before vcpu runs.
+	 * The setting of VM_EXIT_LOAD_IA32_PKRS can ensure
+	 * kvm_cpu_cap_has(X86_FEATURE_PKS) &&
+	 * guest_cpuid_has(vcpu, X86_FEATURE_PKS)
+	 */
+	if (vm_exit_controls_get(vmx) & VM_EXIT_LOAD_IA32_PKRS) {
+		host_pkrs = get_current_pkrs();
+		if (unlikely(host_pkrs != host_state->pkrs)) {
+			vmcs_write64(HOST_IA32_PKRS, host_pkrs);
+			host_state->pkrs = host_pkrs;
+		}
+	}
+
 #ifdef CONFIG_X86_64
 	savesegment(ds, host_state->ds_sel);
 	savesegment(es, host_state->es_sel);
@@ -1912,6 +1929,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_DEBUGCTLMSR:
 		msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
 		break;
+	case MSR_IA32_PKRS:
+		if (!kvm_cpu_cap_has(X86_FEATURE_PKS) ||
+		    (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_PKS)))
+			return 1;
+		msr_info->data = kvm_read_pkrs(vcpu);
+		break;
 	default:
 	find_uret_msr:
 		msr = vmx_find_uret_msr(vmx, msr_info->index);
@@ -2253,7 +2277,17 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		}
 		ret = kvm_set_msr_common(vcpu, msr_info);
 		break;
-
+	case MSR_IA32_PKRS:
+		if (!kvm_pkrs_valid(data))
+			return 1;
+		if (!kvm_cpu_cap_has(X86_FEATURE_PKS) ||
+		    (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_PKS)))
+			return 1;
+		vcpu->arch.pkrs = data;
+		kvm_register_mark_available(vcpu, VCPU_EXREG_PKRS);
+		vmcs_write64(GUEST_IA32_PKRS, data);
+		break;
 	default:
 	find_uret_msr:
 		msr = vmx_find_uret_msr(vmx, msr_index);
@@ -2544,7 +2578,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	      VM_EXIT_LOAD_IA32_EFER |
 	      VM_EXIT_CLEAR_BNDCFGS |
 	      VM_EXIT_PT_CONCEAL_PIP |
-	      VM_EXIT_CLEAR_IA32_RTIT_CTL;
+	      VM_EXIT_CLEAR_IA32_RTIT_CTL |
+	      VM_EXIT_LOAD_IA32_PKRS;
 	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
 				&_vmexit_control) < 0)
 		return -EIO;
@@ -2568,7 +2603,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	      VM_ENTRY_LOAD_IA32_EFER |
 	      VM_ENTRY_LOAD_BNDCFGS |
 	      VM_ENTRY_PT_CONCEAL_PIP |
-	      VM_ENTRY_LOAD_IA32_RTIT_CTL;
+	      VM_ENTRY_LOAD_IA32_RTIT_CTL |
+	      VM_ENTRY_LOAD_IA32_PKRS;
 	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
 				&_vmentry_control) < 0)
 		return -EIO;
@@ -4162,7 +4198,8 @@ static u32 vmx_vmentry_ctrl(void)
 				  VM_ENTRY_LOAD_IA32_RTIT_CTL);
 	/* Loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically */
 	return vmentry_ctrl &
-		~(VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | VM_ENTRY_LOAD_IA32_EFER);
+		~(VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | VM_ENTRY_LOAD_IA32_EFER |
+		  VM_ENTRY_LOAD_IA32_PKRS);
 }
 
 static u32 vmx_vmexit_ctrl(void)
@@ -4174,7 +4211,8 @@ static u32 vmx_vmexit_ctrl(void)
 				 VM_EXIT_CLEAR_IA32_RTIT_CTL);
 	/* Loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically */
 	return vmexit_ctrl &
-		~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER);
+		~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER |
+		  VM_EXIT_LOAD_IA32_PKRS);
 }
 
 static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
@@ -5887,6 +5925,8 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 		       vmcs_read64(GUEST_IA32_PERF_GLOBAL_CTRL));
 	if (vmentry_ctl & VM_ENTRY_LOAD_BNDCFGS)
 		pr_err("BndCfgS = 0x%016llx\n", vmcs_read64(GUEST_BNDCFGS));
+	if (vmentry_ctl & VM_ENTRY_LOAD_IA32_PKRS)
+		pr_err("PKRS = 0x%016llx\n", vmcs_read64(GUEST_IA32_PKRS));
 	pr_err("Interruptibility = %08x  ActivityState = %08x\n",
 	       vmcs_read32(GUEST_INTERRUPTIBILITY_INFO),
 	       vmcs_read32(GUEST_ACTIVITY_STATE));
@@ -5928,6 +5968,8 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 		       vmcs_read64(HOST_IA32_PERF_GLOBAL_CTRL));
 	if (vmcs_read32(VM_EXIT_MSR_LOAD_COUNT) > 0)
 		vmx_dump_msrs("host autoload", &vmx->msr_autoload.host);
+	if (vmexit_ctl & VM_EXIT_LOAD_IA32_PKRS)
+		pr_err("PKRS = 0x%016llx\n", vmcs_read64(HOST_IA32_PKRS));
 
 	pr_err("*** Control State ***\n");
 	pr_err("PinBased=%08x CPUBased=%08x SecondaryExec=%08x\n",
@@ -7357,6 +7399,20 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
 	vmx_update_exception_bitmap(vcpu);
+
+	if (kvm_cpu_cap_has(X86_FEATURE_PKS)) {
+		if (guest_cpuid_has(vcpu, X86_FEATURE_PKS)) {
+			vmx_disable_intercept_for_msr(vcpu, MSR_IA32_PKRS, MSR_TYPE_RW);
+
+			vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_IA32_PKRS);
+			vm_exit_controls_setbit(vmx, VM_EXIT_LOAD_IA32_PKRS);
+		} else {
+			vmx_enable_intercept_for_msr(vcpu, MSR_IA32_PKRS, MSR_TYPE_RW);
+
+			vm_entry_controls_clearbit(vmx, VM_ENTRY_LOAD_IA32_PKRS);
+			vm_exit_controls_clearbit(vmx, VM_EXIT_LOAD_IA32_PKRS);
+		}
+	}
 }
 
 static __init void vmx_set_cpu_caps(void)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index da5e95a6694c..d704ba3a4af7 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -352,7 +352,7 @@ struct vcpu_vmx {
 	struct lbr_desc lbr_desc;
 
 	/* Save desired MSR intercept (read: pass-through) state */
-#define MAX_POSSIBLE_PASSTHROUGH_MSRS	15
+#define MAX_POSSIBLE_PASSTHROUGH_MSRS	16
 	struct {
 		DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
 		DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9e43d756312f..8e61373d89d1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1356,7 +1356,7 @@ static const u32 msrs_to_save_all[] = {
 	MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
 	MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
 	MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
-	MSR_IA32_UMWAIT_CONTROL,
+	MSR_IA32_UMWAIT_CONTROL, MSR_IA32_PKRS,
 
 	MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
 	MSR_ARCH_PERFMON_FIXED_CTR0 + 2,
@@ -6506,6 +6506,10 @@ static void kvm_init_msr_list(void)
 				intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2)
 				continue;
 			break;
+		case MSR_IA32_PKRS:
+			if (!kvm_cpu_cap_has(X86_FEATURE_PKS))
+				continue;
+			break;
 		case MSR_ARCH_PERFMON_PERFCTR0 ... MSR_ARCH_PERFMON_PERFCTR0 + 17:
 			if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_PERFCTR0 >=
 			    min(INTEL_PMC_MAX_GENERIC, x86_pmu.num_counters_gp))
@@ -11233,6 +11237,10 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	kvm_set_rflags(vcpu, X86_EFLAGS_FIXED);
 	kvm_rip_write(vcpu, 0xfff0);
 
+	/* PKRS is preserved on INIT */
+	if (!init_event && kvm_cpu_cap_has(X86_FEATURE_PKS))
+		__kvm_set_msr(vcpu, MSR_IA32_PKRS, 0, true);
+
 	vcpu->arch.cr3 = 0;
 	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 635b75f9e145..8b752cebbefc 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -452,6 +452,12 @@ static inline void kvm_machine_check(void)
 #endif
 }
 
+static inline bool kvm_pkrs_valid(u64 data)
+{
+	/* bit[63,32] must be zero */
+	return !(data >> 32);
+}
+
 void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
 void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
 int kvm_spec_ctrl_test_value(u64 value);
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 50cbb65439a9..4396c4be18cb 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -490,4 +490,10 @@ void pks_update_exception(struct pt_regs *regs, int pkey, u32 protection)
 }
 EXPORT_SYMBOL_GPL(pks_update_exception);
 
+u32 get_current_pkrs(void)
+{
+	return this_cpu_read(pkrs_cache);
+}
+EXPORT_SYMBOL_GPL(get_current_pkrs);
+
 #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index a642c875a04e..6915b43e2ffc 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -98,6 +98,8 @@ static inline void pks_mk_readwrite(int pkey)
 typedef bool (*pks_key_callback)(struct pt_regs *regs, unsigned long address,
 				 bool write);
 
+u32 get_current_pkrs(void);
+
 #else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
 
 static inline bool pks_available(void)
@@ -112,6 +114,10 @@ static inline void pks_update_exception(struct pt_regs *regs,
 					int pkey,
 					u32 protection)
 { }
+static inline u32 get_current_pkrs(void)
+{
+	return 0;
+}
 
 #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 4/7] KVM: MMU: Rename the pkru to pkr
  2022-02-21  8:08 [PATCH v6 0/7] KVM: PKS Virtualization support Chenyi Qiang
                   ` (2 preceding siblings ...)
  2022-02-21  8:08 ` [PATCH v6 3/7] KVM: X86: Expose IA32_PKRS MSR Chenyi Qiang
@ 2022-02-21  8:08 ` Chenyi Qiang
  2022-02-21  8:08 ` [PATCH v6 5/7] KVM: MMU: Add support for PKS emulation Chenyi Qiang
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Chenyi Qiang @ 2022-02-21  8:08 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Xiaoyao Li
  Cc: Chenyi Qiang, kvm, linux-kernel

PKRU represents the PKU register used in the protection key rights check
for user pages. Protection Keys for Supervisor Pages (PKS) extends the
protection key architecture to cover supervisor pages.

Rename the *pkru*-related variables and functions to *pkr*, which stands
for both PKRU and PKRS. This makes sense because PKS and PKU each
have:
 - a single control register (PKRU and PKRS)
 - the same number of keys (16 in total)
 - the same format in control registers (Access and Write disable bits)

PKS and PKU can also share the same bitmap, pkr_mask, to cache the
conditions where protection key checks are needed, because they share
almost the same requirements for PK restrictions to cause a fault, except
that they apply to different pages (supervisor and user pages).

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/mmu.h              | 12 ++++++------
 arch/x86/kvm/mmu/mmu.c          | 10 +++++-----
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 75940aeb5f67..c37cd23b6764 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -455,7 +455,7 @@ struct kvm_mmu {
 	* with PFEC.RSVD replaced by ACC_USER_MASK from the page tables.
 	* Each domain has 2 bits which are ANDed with AD and WD from PKRU.
 	*/
-	u32 pkru_mask;
+	u32 pkr_mask;
 
 	u64 *pae_root;
 	u64 *pml4_root;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e9fbb2c8bbe2..9e216d205c8d 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -275,8 +275,8 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	u32 errcode = PFERR_PRESENT_MASK;
 
 	WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));
-	if (unlikely(mmu->pkru_mask)) {
-		u32 pkru_bits, offset;
+	if (unlikely(mmu->pkr_mask)) {
+		u32 pkr_bits, offset;
 
 		/*
 		* PKRU defines 32 bits, there are 16 domains and 2
@@ -284,15 +284,15 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 		* index of the protection domain, so pte_pkey * 2 is
 		* is the index of the first bit for the domain.
 		*/
-		pkru_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3;
+		pkr_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3;
 
 		/* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */
 		offset = (pfec & ~1) +
 			((pte_access & PT_USER_MASK) << (PFERR_RSVD_BIT - PT_USER_SHIFT));
 
-		pkru_bits &= mmu->pkru_mask >> offset;
-		errcode |= -pkru_bits & PFERR_PK_MASK;
-		fault |= (pkru_bits != 0);
+		pkr_bits &= mmu->pkr_mask >> offset;
+		errcode |= -pkr_bits & PFERR_PK_MASK;
+		fault |= (pkr_bits != 0);
 	}
 
 	return -(u32)fault & errcode;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 593093b52395..83597161b5f9 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4620,12 +4620,12 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
 * away both AD and WD.  For all reads or if the last condition holds, WD
 * only will be masked away.
 */
-static void update_pkru_bitmask(struct kvm_mmu *mmu)
+static void update_pkr_bitmask(struct kvm_mmu *mmu)
 {
 	unsigned bit;
 	bool wp;
 
-	mmu->pkru_mask = 0;
+	mmu->pkr_mask = 0;
 
 	if (!is_cr4_pke(mmu))
 		return;
@@ -4660,7 +4660,7 @@ static void update_pkru_bitmask(struct kvm_mmu *mmu)
 		/* PKRU.WD stops write access. */
 		pkey_bits |= (!!check_write) << 1;
 
-		mmu->pkru_mask |= (pkey_bits & 3) << pfec;
+		mmu->pkr_mask |= (pkey_bits & 3) << pfec;
 	}
 }
 
@@ -4672,7 +4672,7 @@ static void reset_guest_paging_metadata(struct kvm_vcpu *vcpu,
 
 	reset_rsvds_bits_mask(vcpu, mmu);
 	update_permission_bitmask(mmu, false);
-	update_pkru_bitmask(mmu);
+	update_pkr_bitmask(mmu);
 }
 
 static void paging64_init_context(struct kvm_mmu *context)
@@ -4946,7 +4946,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 	context->direct_map = false;
 
 	update_permission_bitmask(context, true);
-	context->pkru_mask = 0;
+	context->pkr_mask = 0;
 	reset_rsvds_bits_mask_ept(vcpu, context, execonly, huge_page_level);
 	reset_ept_shadow_zero_bits_mask(vcpu, context, execonly);
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 5/7] KVM: MMU: Add support for PKS emulation
  2022-02-21  8:08 [PATCH v6 0/7] KVM: PKS Virtualization support Chenyi Qiang
                   ` (3 preceding siblings ...)
  2022-02-21  8:08 ` [PATCH v6 4/7] KVM: MMU: Rename the pkru to pkr Chenyi Qiang
@ 2022-02-21  8:08 ` Chenyi Qiang
  2022-03-30 21:27   ` Sean Christopherson
  2022-02-21  8:08 ` [PATCH v6 6/7] KVM: VMX: Expose PKS to guest Chenyi Qiang
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Chenyi Qiang @ 2022-02-21  8:08 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Xiaoyao Li
  Cc: Chenyi Qiang, kvm, linux-kernel

Up until now, pkr_mask had 0 bits for supervisor pages (the U/S bit in the
page tables replaces PFEC.RSVD in the page fault error code).
For PKS support, fill in the bits using the same algorithm used for
user-mode pages, but with CR4.PKE replaced by CR4.PKS. Because of this
change, CR4.PKS must also be included in the MMU role.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 10 ++--
 arch/x86/kvm/mmu.h              | 17 +++---
 arch/x86/kvm/mmu/mmu.c          | 91 +++++++++++++++++++++------------
 3 files changed, 76 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c37cd23b6764..5c53efe0012e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -370,6 +370,7 @@ union kvm_mmu_extended_role {
 		unsigned int cr4_smap:1;
 		unsigned int cr4_smep:1;
 		unsigned int cr4_la57:1;
+		unsigned int cr4_pks:1;
 		unsigned int efer_lma:1;
 	};
 };
@@ -450,10 +451,11 @@ struct kvm_mmu {
 	u8 permissions[16];
 
 	/*
-	* The pkru_mask indicates if protection key checks are needed.  It
-	* consists of 16 domains indexed by page fault error code bits [4:1],
-	* with PFEC.RSVD replaced by ACC_USER_MASK from the page tables.
-	* Each domain has 2 bits which are ANDed with AD and WD from PKRU.
+	* The pkr_mask indicates if protection key checks are needed.
+	* It consists of 16 domains indexed by page fault error code
+	* bits[4:1] with PFEC.RSVD replaced by ACC_USER_MASK from the
+	* page tables. Each domain has 2 bits which are ANDed with AD
+	* and WD from PKRU/PKRS.
 	*/
 	u32 pkr_mask;
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 9e216d205c8d..0dacd3bb8602 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -45,7 +45,8 @@
 #define PT32E_ROOT_LEVEL 3
 
 #define KVM_MMU_CR4_ROLE_BITS (X86_CR4_PSE | X86_CR4_PAE | X86_CR4_LA57 | \
-			       X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE)
+			       X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE | \
+			       X86_CR4_PKS)
 
 #define KVM_MMU_CR0_ROLE_BITS (X86_CR0_PG | X86_CR0_WP)
 
@@ -277,14 +278,18 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));
 	if (unlikely(mmu->pkr_mask)) {
 		u32 pkr_bits, offset;
+		u32 pkr;
 
 		/*
-		* PKRU defines 32 bits, there are 16 domains and 2
-		* attribute bits per domain in pkru.  pte_pkey is the
-		* index of the protection domain, so pte_pkey * 2 is
-		* is the index of the first bit for the domain.
+		* PKRU and PKRS both define 32 bits. There are 16 domains
+		* and 2 attribute bits per domain in them. pte_pkey is the
+		* index of the protection domain, so pte_pkey * 2 is the
+		* index of the first bit for the domain. The use of PKRU
+		* versus PKRS is selected by the address type, as determined
+		* by the U/S bit in the paging-structure entries.
 		*/
-		pkr_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3;
+		pkr = pte_access & PT_USER_MASK ? vcpu->arch.pkru : kvm_read_pkrs(vcpu);
+		pkr_bits = (pkr >> pte_pkey * 2) & 3;
 
 		/* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */
 		offset = (pfec & ~1) +
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 83597161b5f9..ea19ccfa4da3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -220,6 +220,7 @@ BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, smep, X86_CR4_SMEP);
 BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, smap, X86_CR4_SMAP);
 BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, pke, X86_CR4_PKE);
 BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, la57, X86_CR4_LA57);
+BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, pks, X86_CR4_PKS);
 BUILD_MMU_ROLE_REGS_ACCESSOR(efer, nx, EFER_NX);
 BUILD_MMU_ROLE_REGS_ACCESSOR(efer, lma, EFER_LMA);
 
@@ -242,6 +243,7 @@ BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smep);
 BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smap);
 BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pke);
 BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, la57);
+BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pks);
 BUILD_MMU_ROLE_ACCESSOR(base, efer, nx);
 
 static struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
@@ -4597,37 +4599,58 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
 }
 
 /*
-* PKU is an additional mechanism by which the paging controls access to
-* user-mode addresses based on the value in the PKRU register.  Protection
-* key violations are reported through a bit in the page fault error code.
-* Unlike other bits of the error code, the PK bit is not known at the
-* call site of e.g. gva_to_gpa; it must be computed directly in
-* permission_fault based on two bits of PKRU, on some machine state (CR4,
-* CR0, EFER, CPL), and on other bits of the error code and the page tables.
-*
-* In particular the following conditions come from the error code, the
-* page tables and the machine state:
-* - PK is always zero unless CR4.PKE=1 and EFER.LMA=1
-* - PK is always zero if RSVD=1 (reserved bit set) or F=1 (instruction fetch)
-* - PK is always zero if U=0 in the page tables
-* - PKRU.WD is ignored if CR0.WP=0 and the access is a supervisor access.
-*
-* The PKRU bitmask caches the result of these four conditions.  The error
-* code (minus the P bit) and the page table's U bit form an index into the
-* PKRU bitmask.  Two bits of the PKRU bitmask are then extracted and ANDed
-* with the two bits of the PKRU register corresponding to the protection key.
-* For the first three conditions above the bits will be 00, thus masking
-* away both AD and WD.  For all reads or if the last condition holds, WD
-* only will be masked away.
-*/
+ * Protection Key Rights (PKR) is an additional mechanism by which data accesses
+ * with 4-level or 5-level paging (EFER.LMA=1) may be disabled based on the
+ * Protection Key Rights Userspace (PKRU) or Protection Key Rights Supervisor
+ * (PKRS) registers.  The Protection Key (PK) used for an access is a 4-bit
+ * value specified in bits 62:59 of the leaf PTE used to translate the address.
+ *
+ * PKRU and PKRS are 32-bit registers, with 16 2-bit entries consisting of an
+ * access-disable (AD) and write-disable (WD) bit.  The PK from the leaf PTE is
+ * used to index the appropriate PKR (see below), e.g. PK=1 would consume bits
+ * 3:2 (bit 3 == write-disable, bit 2 == access-disable).
+ *
+ * The PK register (PKRU vs. PKRS) indexed by the PK depends on the type of
+ * _address_ (not access type!).  For a user-mode address, PKRU is used; for a
+ * supervisor-mode address, PKRS is used.  An address is supervisor-mode if the
+ * U/S flag (bit 2) is 0 in at least one of the paging-structure entries, i.e.
+ * an address is user-mode if the U/S flag is 1 in _all_ entries.  Again, this
+ * is the address type, not the access type, e.g. a supervisor-mode _access_
+ * will consume PKRU if the _address_ is a user-mode address.
+ *
+ * As alluded to above, PKR checks are only performed for data accesses; code
+ * fetches are not subject to PKR checks.  Terminal page faults (!PRESENT or
+ * PFEC.RSVD=1) are also not subject to PKR checks.
+ *
+ * PKR write-disable checks for supervisor-mode _accesses_ are performed if and
+ * only if CR0.WP=1 (though access-disable checks still apply).
+ *
+ * In summary, PKR checks are based on (a) EFER.LMA, (b) CR4.PKE or CR4.PKS,
+ * (c) CR0.WP, (d) the PK in the leaf PTE, (e) two bits from the corresponding
+ * PKR{S,U} entry, (f) the access type (derived from the other PFEC bits), and
+ * (g) the address type (retrieved from the paging-structure entries).
+ *
+ * To avoid conditional branches in permission_fault(), the PKR bitmask caches
+ * the above inputs, except for (e) the PKR{S,U} entry.  The FETCH, USER, and
+ * WRITE bits of the PFEC and the effective value of the paging-structures' U/S
+ * bit (slotted into the PFEC.RSVD position, bit 3) are used to index into the
+ * PKR bitmask (similar to the 4-bit Protection Key itself).  The two bits of
+ * the PKR bitmask "entry" are then extracted and ANDed with the two bits of
+ * the PKR{S,U} register corresponding to the address type and protection key.
+ *
+ * E.g. for all values where PFEC.FETCH=1, the corresponding pkr_bitmask bits
+ * will be 00b, thus masking away the AD and WD bits from the PKR{S,U} register
+ * to suppress PKR checks on code fetches.
+ */
 static void update_pkr_bitmask(struct kvm_mmu *mmu)
 {
 	unsigned bit;
 	bool wp;
-
+	bool cr4_pke = is_cr4_pke(mmu);
+	bool cr4_pks = is_cr4_pks(mmu);
 	mmu->pkr_mask = 0;
 
-	if (!is_cr4_pke(mmu))
+	if (!cr4_pke && !cr4_pks)
 		return;
 
 	wp = is_cr0_wp(mmu);
@@ -4645,19 +4668,22 @@ static void update_pkr_bitmask(struct kvm_mmu *mmu)
 		pte_user = pfec & PFERR_RSVD_MASK;
 
 		/*
-		 * Only need to check the access which is not an
-		 * instruction fetch and is to a user page.
+		 * need to check the access which is not an
+		 * instruction fetch and
+		 * - if cr4_pke 1-setting when accessing a user page.
+		 * - if cr4_pks 1-setting when accessing a supervisor page.
 		 */
-		check_pkey = (!ff && pte_user);
+		check_pkey = !ff && (pte_user ? cr4_pke : cr4_pks);
+
 		/*
-		 * write access is controlled by PKRU if it is a
-		 * user access or CR0.WP = 1.
+		 * write access is controlled by PKRU/PKRS if
+		 * it is a user access or CR0.WP = 1.
 		 */
 		check_write = check_pkey && wf && (uf || wp);
 
-		/* PKRU.AD stops both read and write access. */
+		/* PKRU/PKRS.AD stops both read and write access. */
 		pkey_bits = !!check_pkey;
-		/* PKRU.WD stops write access. */
+		/* PKRU/PKRS.WD stops write access. */
 		pkey_bits |= (!!check_write) << 1;
 
 		mmu->pkr_mask |= (pkey_bits & 3) << pfec;
@@ -4708,6 +4734,7 @@ static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu,
 		/* PKEY and LA57 are active iff long mode is active. */
 		ext.cr4_pke = ____is_efer_lma(regs) && ____is_cr4_pke(regs);
 		ext.cr4_la57 = ____is_efer_lma(regs) && ____is_cr4_la57(regs);
+		ext.cr4_pks = ____is_efer_lma(regs) && ____is_cr4_pks(regs);
 		ext.efer_lma = ____is_efer_lma(regs);
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 6/7] KVM: VMX: Expose PKS to guest
  2022-02-21  8:08 [PATCH v6 0/7] KVM: PKS Virtualization support Chenyi Qiang
                   ` (4 preceding siblings ...)
  2022-02-21  8:08 ` [PATCH v6 5/7] KVM: MMU: Add support for PKS emulation Chenyi Qiang
@ 2022-02-21  8:08 ` Chenyi Qiang
  2022-03-30 21:30   ` Sean Christopherson
  2022-02-21  8:08 ` [PATCH v6 7/7] KVM: VMX: Enable PKS for nested VM Chenyi Qiang
  2022-03-07  6:09 ` [PATCH v6 0/7] KVM: PKS Virtualization support Chenyi Qiang
  7 siblings, 1 reply; 17+ messages in thread
From: Chenyi Qiang @ 2022-02-21  8:08 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Xiaoyao Li
  Cc: Chenyi Qiang, kvm, linux-kernel

Existence of PKS is enumerated via CPUID.(EAX=7,ECX=0):ECX[31]. It is
enabled by setting CR4.PKS when long mode is active. PKS is currently only
implemented when EPT is enabled and requires support for the
VM_{ENTRY,EXIT}_LOAD_IA32_PKRS VMCS controls.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/cpuid.c            | 13 +++++++++----
 arch/x86/kvm/vmx/capabilities.h |  6 ++++++
 arch/x86/kvm/vmx/vmx.c          | 14 +++++++++++---
 arch/x86/kvm/x86.h              |  2 ++
 5 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5c53efe0012e..4de4b9c617e4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -114,7 +114,8 @@
 			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
 			  | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
-			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
+			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
+			  | X86_CR4_PKS))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
 
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 3902c28fb6cb..2457dadad7ef 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -547,18 +547,23 @@ void kvm_set_cpu_caps(void)
 		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
 		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
 		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ |
-		F(SGX_LC) | F(BUS_LOCK_DETECT)
+		F(SGX_LC) | F(BUS_LOCK_DETECT) | F(PKS)
 	);
 	/* Set LA57 based on hardware capability. */
 	if (cpuid_ecx(7) & F(LA57))
 		kvm_cpu_cap_set(X86_FEATURE_LA57);
 
 	/*
-	 * PKU not yet implemented for shadow paging and requires OSPKE
-	 * to be set on the host. Clear it if that is not the case
+	 * Protection Keys are not supported for shadow paging.  PKU further
+	 * requires OSPKE to be set on the host in order to use {RD,WR}PKRU to
+	 * save/restore the guests PKRU.
 	 */
-	if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
+	if (!tdp_enabled) {
 		kvm_cpu_cap_clear(X86_FEATURE_PKU);
+		kvm_cpu_cap_clear(X86_FEATURE_PKS);
+	} else if (!boot_cpu_has(X86_FEATURE_OSPKE)) {
+		kvm_cpu_cap_clear(X86_FEATURE_PKU);
+	}
 
 	kvm_cpu_cap_mask(CPUID_7_EDX,
 		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 959b59d13b5a..ab1868a9c177 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -105,6 +105,12 @@ static inline bool cpu_has_load_perf_global_ctrl(void)
 	       (vmcs_config.vmexit_ctrl & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL);
 }
 
+static inline bool cpu_has_load_ia32_pkrs(void)
+{
+	return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PKRS) &&
+	       (vmcs_config.vmexit_ctrl & VM_EXIT_LOAD_IA32_PKRS);
+}
+
 static inline bool cpu_has_vmx_mpx(void)
 {
 	return (vmcs_config.vmexit_ctrl & VM_EXIT_CLEAR_BNDCFGS) &&
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b3d5412b9481..6fc70ddeff5a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3270,7 +3270,7 @@ void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 		}
 
 		/*
-		 * SMEP/SMAP/PKU is disabled if CPU is in non-paging mode in
+		 * SMEP/SMAP/PKU/PKS is disabled if CPU is in non-paging mode in
 		 * hardware.  To emulate this behavior, SMEP/SMAP/PKU needs
 		 * to be manually disabled when guest switches to non-paging
 		 * mode.
@@ -3278,10 +3278,11 @@ void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 		 * If !enable_unrestricted_guest, the CPU is always running
 		 * with CR0.PG=1 and CR4 needs to be modified.
 		 * If enable_unrestricted_guest, the CPU automatically
-		 * disables SMEP/SMAP/PKU when the guest sets CR0.PG=0.
+		 * disables SMEP/SMAP/PKU/PKS when the guest sets CR0.PG=0.
 		 */
 		if (!is_paging(vcpu))
-			hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
+			hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE |
+				    X86_CR4_PKS);
 	}
 
 	vmcs_writel(CR4_READ_SHADOW, cr4);
@@ -7454,6 +7455,13 @@ static __init void vmx_set_cpu_caps(void)
 
 	if (cpu_has_vmx_waitpkg())
 		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
+
+	/*
+	 * If VM_{ENTRY, EXIT}_LOAD_IA32_PKRS are not supported,
+	 * don't expose PKS either.
+	 */
+	if (cpu_has_load_ia32_pkrs())
+		kvm_cpu_cap_check_and_set(X86_FEATURE_PKS);
 }
 
 static void vmx_request_immediate_exit(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 8b752cebbefc..8de5ea9f8e25 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -497,6 +497,8 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
 		__reserved_bits |= X86_CR4_VMXE;        \
 	if (!__cpu_has(__c, X86_FEATURE_PCID))          \
 		__reserved_bits |= X86_CR4_PCIDE;       \
+	if (!__cpu_has(__c, X86_FEATURE_PKS))		\
+		__reserved_bits |= X86_CR4_PKS;		\
 	__reserved_bits;                                \
 })
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 7/7] KVM: VMX: Enable PKS for nested VM
  2022-02-21  8:08 [PATCH v6 0/7] KVM: PKS Virtualization support Chenyi Qiang
                   ` (5 preceding siblings ...)
  2022-02-21  8:08 ` [PATCH v6 6/7] KVM: VMX: Expose PKS to guest Chenyi Qiang
@ 2022-02-21  8:08 ` Chenyi Qiang
  2022-03-30 21:47   ` Sean Christopherson
  2022-03-07  6:09 ` [PATCH v6 0/7] KVM: PKS Virtualization support Chenyi Qiang
  7 siblings, 1 reply; 17+ messages in thread
From: Chenyi Qiang @ 2022-02-21  8:08 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Xiaoyao Li
  Cc: Chenyi Qiang, kvm, linux-kernel

The PKS MSR is passed through to the guest directly. Configure the MSR to
match the L0/L1 settings so that a nested VM runs PKS properly.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 38 ++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/vmx/vmcs12.c |  2 ++
 arch/x86/kvm/vmx/vmcs12.h |  4 ++++
 arch/x86/kvm/vmx/vmx.c    |  1 +
 arch/x86/kvm/vmx/vmx.h    |  2 ++
 5 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f235f77cbc03..c42a1df385ef 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -252,6 +252,10 @@ static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx,
 	dest->ds_sel = src->ds_sel;
 	dest->es_sel = src->es_sel;
 #endif
+	if (unlikely(src->pkrs != dest->pkrs)) {
+		vmcs_write64(HOST_IA32_PKRS, src->pkrs);
+		dest->pkrs = src->pkrs;
+	}
 }
 
 static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
@@ -685,6 +689,9 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_IA32_PRED_CMD, MSR_TYPE_W);
 
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PKRS, MSR_TYPE_RW);
+
 	kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);
 
 	vmx->nested.force_msr_bitmap_recalc = false;
@@ -2433,6 +2440,10 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 		if (kvm_mpx_supported() && vmx->nested.nested_run_pending &&
 		    (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
 			vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs);
+
+		if (vmx->nested.nested_run_pending &&
+		    (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS))
+			vmcs_write64(GUEST_IA32_PKRS, vmcs12->guest_ia32_pkrs);
 	}
 
 	if (nested_cpu_has_xsaves(vmcs12))
@@ -2521,6 +2532,11 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending ||
 	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
 		vmcs_write64(GUEST_BNDCFGS, vmx->nested.vmcs01_guest_bndcfgs);
+	if (kvm_cpu_cap_has(X86_FEATURE_PKS) &&
+	    (!vmx->nested.nested_run_pending ||
+	     !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS)))
+		vmcs_write64(GUEST_IA32_PKRS, vmx->nested.vmcs01_guest_pkrs);
+
 	vmx_set_rflags(vcpu, vmcs12->guest_rflags);
 
 	/* EXCEPTION_BITMAP and CR0_GUEST_HOST_MASK should basically be the
@@ -2897,6 +2913,10 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
 					   vmcs12->host_ia32_perf_global_ctrl)))
 		return -EINVAL;
 
+	if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PKRS) &&
+		CC(!kvm_pkrs_valid(vmcs12->host_ia32_pkrs)))
+		return -EINVAL;
+
 #ifdef CONFIG_X86_64
 	ia32e = !!(vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE);
 #else
@@ -3049,6 +3069,10 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
 	if (nested_check_guest_non_reg_state(vmcs12))
 		return -EINVAL;
 
+	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS) &&
+	    CC(!kvm_pkrs_valid(vmcs12->guest_ia32_pkrs)))
+		return -EINVAL;
+
 	return 0;
 }
 
@@ -3377,6 +3401,9 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 	if (kvm_mpx_supported() &&
 		!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
 		vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
+	if (kvm_cpu_cap_has(X86_FEATURE_PKS) &&
+	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS))
+		vmx->nested.vmcs01_guest_pkrs = vmcs_read64(GUEST_IA32_PKRS);
 
 	/*
 	 * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
@@ -4022,6 +4049,7 @@ static bool is_vmcs12_ext_field(unsigned long field)
 	case GUEST_IDTR_BASE:
 	case GUEST_PENDING_DBG_EXCEPTIONS:
 	case GUEST_BNDCFGS:
+	case GUEST_IA32_PKRS:
 		return true;
 	default:
 		break;
@@ -4073,6 +4101,8 @@ static void sync_vmcs02_to_vmcs12_rare(struct kvm_vcpu *vcpu,
 		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
 	if (kvm_mpx_supported())
 		vmcs12->guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
+	if (guest_cpuid_has(vcpu, X86_FEATURE_PKS))
+		vmcs12->guest_ia32_pkrs = vmcs_read64(GUEST_IA32_PKRS);
 
 	vmx->nested.need_sync_vmcs02_to_vmcs12_rare = false;
 }
@@ -4310,6 +4340,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 		WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
 					 vmcs12->host_ia32_perf_global_ctrl));
 
+	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PKRS)
+		vmcs_write64(GUEST_IA32_PKRS, vmcs12->host_ia32_pkrs);
+
 	/* Set L1 segment info according to Intel SDM
 	    27.5.2 Loading Host Segment and Descriptor-Table Registers */
 	seg = (struct kvm_segment) {
@@ -6527,7 +6560,8 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
 		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
-		VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
+		VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
+		VM_EXIT_LOAD_IA32_PKRS;
 	msrs->exit_ctls_high |=
 		VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 		VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
@@ -6547,7 +6581,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 		VM_ENTRY_IA32E_MODE |
 #endif
 		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
-		VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
+		VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | VM_ENTRY_LOAD_IA32_PKRS;
 	msrs->entry_ctls_high |=
 		(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER);
 
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index cab6ba7a5005..f3b964e1496c 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -62,9 +62,11 @@ const unsigned short vmcs_field_to_offset_table[] = {
 	FIELD64(GUEST_PDPTR2, guest_pdptr2),
 	FIELD64(GUEST_PDPTR3, guest_pdptr3),
 	FIELD64(GUEST_BNDCFGS, guest_bndcfgs),
+	FIELD64(GUEST_IA32_PKRS, guest_ia32_pkrs),
 	FIELD64(HOST_IA32_PAT, host_ia32_pat),
 	FIELD64(HOST_IA32_EFER, host_ia32_efer),
 	FIELD64(HOST_IA32_PERF_GLOBAL_CTRL, host_ia32_perf_global_ctrl),
+	FIELD64(HOST_IA32_PKRS, host_ia32_pkrs),
 	FIELD(PIN_BASED_VM_EXEC_CONTROL, pin_based_vm_exec_control),
 	FIELD(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control),
 	FIELD(EXCEPTION_BITMAP, exception_bitmap),
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 2a45f026ee11..42cb4a2b806a 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -185,6 +185,8 @@ struct __packed vmcs12 {
 	u16 host_gs_selector;
 	u16 host_tr_selector;
 	u16 guest_pml_index;
+	u64 host_ia32_pkrs;
+	u64 guest_ia32_pkrs;
 };
 
 /*
@@ -359,6 +361,8 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(host_gs_selector, 992);
 	CHECK_OFFSET(host_tr_selector, 994);
 	CHECK_OFFSET(guest_pml_index, 996);
+	CHECK_OFFSET(host_ia32_pkrs, 998);
+	CHECK_OFFSET(guest_ia32_pkrs, 1006);
 }
 
 extern const unsigned short vmcs_field_to_offset_table[];
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6fc70ddeff5a..978f4a61771f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7248,6 +7248,7 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
 	cr4_fixed1_update(X86_CR4_PKE,        ecx, feature_bit(PKU));
 	cr4_fixed1_update(X86_CR4_UMIP,       ecx, feature_bit(UMIP));
 	cr4_fixed1_update(X86_CR4_LA57,       ecx, feature_bit(LA57));
+	cr4_fixed1_update(X86_CR4_PKS,        ecx, feature_bit(PKS));
 
 #undef cr4_fixed1_update
 }
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index d704ba3a4af7..78bca3d14ed4 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -222,6 +222,8 @@ struct nested_vmx {
 	u64 vmcs01_debugctl;
 	u64 vmcs01_guest_bndcfgs;
 
+	u64 vmcs01_guest_pkrs;
+
 	/* to migrate it to L1 if L2 writes to L1's CR8 directly */
 	int l1_tpr_threshold;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 0/7] KVM: PKS Virtualization support
  2022-02-21  8:08 [PATCH v6 0/7] KVM: PKS Virtualization support Chenyi Qiang
                   ` (6 preceding siblings ...)
  2022-02-21  8:08 ` [PATCH v6 7/7] KVM: VMX: Enable PKS for nested VM Chenyi Qiang
@ 2022-03-07  6:09 ` Chenyi Qiang
  7 siblings, 0 replies; 17+ messages in thread
From: Chenyi Qiang @ 2022-03-07  6:09 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Xiaoyao Li
  Cc: kvm, linux-kernel

Kindly ping for comments.

On 2/21/2022 4:08 PM, Chenyi Qiang wrote:
> This patch series is based on top of v8 PKS core support kernel patchset:
> https://lore.kernel.org/lkml/20220127175505.851391-1-ira.weiny@intel.com/
> 
> Note: If you read SDM section 4.6.1 and are confused by its statement about
> data writes to supervisor-mode addresses:
> 
>    If CR0.WP = 0, data may be written to any supervisor-mode address with
>    a protection key for which write access is permitted.
> 
> This may seem to conflict slightly with 4.6.2:
> 
>    if WDi = 1, write accesses are not permitted if CR0.WP = 1. (If CR0.WP
>    = 0, IA32_PKRS.WDi does not affect write accesses to supervisor-mode
>    address with protection key i.)
> 
> In fact, the statement in 4.6.1 doesn't say "a protection key with the
> appropriate WDi bit set." The reader should instead refer to Section
> 4.6.2 for the definition of what that means. We will follow up on this
> internally to make it clearer in the SDM.
> 
> ---
> 
> Protection Keys for Supervisor Pages (PKS) is a feature that extends the
> Protection Keys architecture to support thread-specific permission
> restrictions on supervisor pages.
> 
> PKS works similarly to the existing PKU feature (protection keys for user
> pages). Both perform an additional check after the normal paging permission
> checks are done. Accesses or writes can be disabled via an MSR update,
> without TLB flushes, when permissions change. If this additional check is
> violated, a #PF occurs and the PFEC.PK bit is set.
> 
> PKS introduces MSR IA32_PKRS to manage supervisor protection key
> rights. The MSR contains 16 pairs of ADi and WDi bits. Each pair
> advertises on a group of pages with the same key which is set in the
> leaf paging-structure entries(bits[62:59]). Currently, IA32_PKRS is not
> supported by XSAVES architecture.
> 
> This patchset aims to add the virtualization of PKS in KVM. It
> implemented PKS CPUID enumeration, vmentry/vmexit configuration, MSR
> exposure, nested supported etc. Currently, PKS is not yet supported for
> shadow paging.
> 
> Detailed information about PKS can be found in the latest Intel 64 and
> IA-32 Architectures Software Developer's Manual.
> 
> ---
> 
> Changelogs:
> 
> v5->v6
> - PKRS is preserved on INIT. Add the PKRS reset operation in kvm_vcpu_reset.
>    (Sean)
> - Track the pkrs as u32. Add the code WARN on bits 64:32 being set in VMCS field.
>    (Sean)
> - Adjust the MSR intercept and entry/exit control in VMCS according to
>    guest CPUID. This resolve the issue when userspace re-enable this feature.
>    (Sean)
> - Split VMX restriction on PKS support(entry/exit load controls) out of
>    common x86. And put tdp restriction together with PKU in common x86.
>    (Sean)
> - Thanks for Sean to revise the comments in mmu.c related to
>    update_pkr_bitmap, which make it more clear for pkr bitmask cache usage.
> - v5: https://lore.kernel.org/lkml/20210811101126.8973-1-chenyi.qiang@intel.com/
> 
> v4->v5
> - Make setting of MSR intercept/vmcs control bits not dependent on guest.CR4.PKS.
>    And set them if PKS is exposed to guest. (Suggested by Sean)
> - Add pkrs to standard register caching mechanism to help update
>    vcpu->arch.pkrs on demand. Add related helper functions. (Suggested by Sean)
> - Do the real pkrs update in VMCS field in vmx_vcpu_reset and
>    vmx_sync_vmcs_host_state(). (Sean)
> - Add a new mmu_role cr4_pks instead of smushing PKU and PKS together.
>    (Sean & Paolo)
> - v4: https://lore.kernel.org/lkml/20210205083706.14146-1-chenyi.qiang@intel.com/
> 
> v3->v4
> - Make the MSR intercept and load-controls setting depend on CR4.PKS value
> - shadow the guest pkrs and make it usable in PKS emultion
> - add the cr4_pke and cr4_pks check in pkr_mask update
> - squash PATCH 2 and PATCH 5 to make the dependencies read more clear
> - v3: https://lore.kernel.org/lkml/20201105081805.5674-1-chenyi.qiang@intel.com/
> 
> v2->v3:
> - No function changes since last submit
> - rebase on the latest PKS kernel support:
>    https://lore.kernel.org/lkml/20201102205320.1458656-1-ira.weiny@intel.com/
> - add MSR_IA32_PKRS to the vmx_possible_passthrough_msrs[]
> - RFC v2: https://lore.kernel.org/lkml/20201014021157.18022-1-chenyi.qiang@intel.com/
> 
> v1->v2:
> - rebase on the latest PKS kernel support:
>    https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3
> - add a kvm-unit-tests for PKS
> - add the check in kvm_init_msr_list for PKRS
> - place the X86_CR4_PKS in mmu_role_bits in kvm_set_cr4
> - add the support to expose VM_{ENTRY, EXIT}_LOAD_IA32_PKRS in nested
>    VMX MSR
> - RFC v1: https://lore.kernel.org/lkml/20200807084841.7112-1-chenyi.qiang@intel.com/
> 
> ---
> 
> Chenyi Qiang (7):
>    KVM: VMX: Introduce PKS VMCS fields
>    KVM: VMX: Add proper cache tracking for PKRS
>    KVM: X86: Expose IA32_PKRS MSR
>    KVM: MMU: Rename the pkru to pkr
>    KVM: MMU: Add support for PKS emulation
>    KVM: VMX: Expose PKS to guest
>    KVM: VMX: Enable PKS for nested VM
> 
>   arch/x86/include/asm/kvm_host.h |  17 ++++--
>   arch/x86/include/asm/vmx.h      |   6 ++
>   arch/x86/kvm/cpuid.c            |  13 ++--
>   arch/x86/kvm/kvm_cache_regs.h   |   7 +++
>   arch/x86/kvm/mmu.h              |  27 +++++----
>   arch/x86/kvm/mmu/mmu.c          | 101 ++++++++++++++++++++------------
>   arch/x86/kvm/vmx/capabilities.h |   6 ++
>   arch/x86/kvm/vmx/nested.c       |  38 +++++++++++-
>   arch/x86/kvm/vmx/vmcs.h         |   1 +
>   arch/x86/kvm/vmx/vmcs12.c       |   2 +
>   arch/x86/kvm/vmx/vmcs12.h       |   4 ++
>   arch/x86/kvm/vmx/vmx.c          |  92 ++++++++++++++++++++++++++---
>   arch/x86/kvm/vmx/vmx.h          |   7 ++-
>   arch/x86/kvm/x86.c              |  10 +++-
>   arch/x86/kvm/x86.h              |   8 +++
>   arch/x86/mm/pkeys.c             |   6 ++
>   include/linux/pkeys.h           |   6 ++
>   17 files changed, 280 insertions(+), 71 deletions(-)
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 1/7] KVM: VMX: Introduce PKS VMCS fields
  2022-02-21  8:08 ` [PATCH v6 1/7] KVM: VMX: Introduce PKS VMCS fields Chenyi Qiang
@ 2022-03-30 20:05   ` Sean Christopherson
  0 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2022-03-30 20:05 UTC (permalink / raw)
  To: Chenyi Qiang
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Xiaoyao Li, kvm, linux-kernel

On Mon, Feb 21, 2022, Chenyi Qiang wrote:
> PKS(Protection Keys for Supervisor Pages) is a feature that extends the
> Protection Key architecture to support thread-specific permission
> restrictions on supervisor pages.
> 
> A new PKS MSR(PKRS) is defined in kernel to support PKS, which holds a
> set of permissions associated with each protection domain.
> 
> Two VMCS fields {HOST,GUEST}_IA32_PKRS are introduced in
> {host,guest}-state area to store the respective values of PKRS.
> 
> Every VM exit saves PKRS into guest-state area.
> If VM_EXIT_LOAD_IA32_PKRS = 1, VM exit loads PKRS from the host-state
> area.
> If VM_ENTRY_LOAD_IA32_PKRS = 1, VM entry loads PKRS from the guest-state
> area.
> 
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> Reviewed-by: Jim Mattson <jmattson@google.com>
> ---

Reviewed-by: Sean Christopherson <seanjc@google.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Add proper cache tracking for PKRS
  2022-02-21  8:08 ` [PATCH v6 2/7] KVM: VMX: Add proper cache tracking for PKRS Chenyi Qiang
@ 2022-03-30 20:42   ` Sean Christopherson
  2022-03-31  5:39     ` Chenyi Qiang
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2022-03-30 20:42 UTC (permalink / raw)
  To: Chenyi Qiang
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Xiaoyao Li, kvm, linux-kernel

On Mon, Feb 21, 2022, Chenyi Qiang wrote:
> Add PKRS caching into the standard register caching mechanism in order
> to take advantage of the availability checks provided by regs_avail.
> 
> This is because vcpu->arch.pkrs will be rarely accessed by KVM, only in
> the case of host userspace MSR reads and GVA->GPA translation in
> following patches. It is unnecessary to keep it up-to-date at all times.

It might be worth throwing in a blurb that the potential benefits of this caching
are tenuous.

Barring userspace weirdness, the MSR read is not a hot path.

permission_fault() is slightly more common, but I would be surprised if caching
actually provides meaningful performance benefit.  The PKRS checks are done only
once per virtual access, i.e. only on the final translation, so the cache will get
a hit if and only if there are multiple translations in a single round of emulation,
where a "round of emulation" ends upon entry to the guest.  With unrestricted
guest, i.e. for all intents and purposes every VM using PKRS, there aren't _that_
many scenarios where KVM will (a) emulate in the first place and (b) emulate enough
accesses for the caching to be meaningful.

That said, this is basically "free", so I've no objection to adding it.  But I do
think it's worth documenting that it's nice-to-have so that we don't hesitate to
rip it out in the future if there's a strong reason to drop the caching.
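
For illustration, a minimal sketch of what such a cache-backed read helper
could look like (VCPU_EXREG_PKRS and the exact call sites here are assumptions
based on the series description, not code lifted from the patch):

	static inline u32 kvm_read_pkrs(struct kvm_vcpu *vcpu)
	{
		/* Lazily pull GUEST_IA32_PKRS out of the VMCS on first use. */
		if (!kvm_register_is_available(vcpu, VCPU_EXREG_PKRS))
			static_call(kvm_x86_cache_reg)(vcpu, VCPU_EXREG_PKRS);

		return vcpu->arch.pkrs;
	}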

> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---

Reviewed-by: Sean Christopherson <seanjc@google.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 5/7] KVM: MMU: Add support for PKS emulation
  2022-02-21  8:08 ` [PATCH v6 5/7] KVM: MMU: Add support for PKS emulation Chenyi Qiang
@ 2022-03-30 21:27   ` Sean Christopherson
  2022-03-31  5:43     ` Chenyi Qiang
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2022-03-30 21:27 UTC (permalink / raw)
  To: Chenyi Qiang
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Xiaoyao Li, kvm, linux-kernel

On Mon, Feb 21, 2022, Chenyi Qiang wrote:
> @@ -277,14 +278,18 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  	WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));
>  	if (unlikely(mmu->pkr_mask)) {
>  		u32 pkr_bits, offset;
> +		u32 pkr;
>  
>  		/*
> -		* PKRU defines 32 bits, there are 16 domains and 2
> -		* attribute bits per domain in pkru.  pte_pkey is the
> -		* index of the protection domain, so pte_pkey * 2 is
> -		* is the index of the first bit for the domain.
> +		* PKRU and PKRS both define 32 bits. There are 16 domains
> +		* and 2 attribute bits per domain in them. pte_key is the
> +		* index of the protection domain, so pte_pkey * 2 is the
> +		* index of the first bit for the domain. The use of PKRU
> +		* versus PKRS is selected by the address type, as determined
> +		* by the U/S bit in the paging-structure entries.
>  		*/
> -		pkr_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3;
> +		pkr = pte_access & PT_USER_MASK ? vcpu->arch.pkru : kvm_read_pkrs(vcpu);

Blindly reading PKRU/PKRS is wrong.  I think this magic insanity will be functionally
correct due to update_pkr_bitmask() clearing the appropriate bits in pkr_mask based
on CR4.PK*, but the read should never happen.  PKRU is benign, but I believe reading
PKRS will result in VMREAD to an invalid field if PKRU is supported and enabled, but
PKRS is not supported.

I believe the easiest solution is:

		if (pte_access & PT_USER_MASK)
			pkr = is_cr4_pke(mmu) ? vcpu->arch.pkru : 0;
		else
			pkr = is_cr4_pks(mmu) ? kvm_read_pkrs(vcpu) : 0;

The is_cr4_pk*() helpers are restricted to mmu.c, but this presents a good
opportunity to extract the PKR stuff to a separate, non-inline helper (as a prep
patch).  E.g.


	WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));
	if (unlikely(mmu->pkr_mask)) {
		u32 pkr_bits = kvm_mmu_pkr_bits(vcpu, mmu, pte_access, pte_pkey);

		errcode |= -pkr_bits & PFERR_PK_MASK;
		fault |= (pkr_bits != 0);
	}

	return -(u32)fault & errcode;

permission_fault() is inline because it's heavily used for shadow paging, but
when using TDP, it's far less performance critical.  PKR is TDP-only, so moving
it out-of-line should be totally ok (this is also why this path is "unlikely").
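
A rough sketch of that extracted helper, assuming the is_cr4_pke()/is_cr4_pks()
accessors and keeping the existing pkr_mask/offset filtering (pfec is passed
through here because the offset computation needs it, which slightly diverges
from the call above):

	static u32 kvm_mmu_pkr_bits(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
				    unsigned int pte_access, unsigned int pte_pkey,
				    u32 pfec)
	{
		u32 pkr_bits, offset, pkr;

		/* The U/S bit of the translation selects PKRU vs. PKRS. */
		if (pte_access & PT_USER_MASK)
			pkr = is_cr4_pke(mmu) ? vcpu->arch.pkru : 0;
		else
			pkr = is_cr4_pks(mmu) ? kvm_read_pkrs(vcpu) : 0;

		pkr_bits = (pkr >> (pte_pkey * 2)) & 3;

		/* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */
		offset = (pfec & ~1) +
			 ((pte_access & PT_USER_MASK) << (PFERR_RSVD_BIT - PT_USER_SHIFT));

		return pkr_bits & (mmu->pkr_mask >> offset);
	}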

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 6/7] KVM: VMX: Expose PKS to guest
  2022-02-21  8:08 ` [PATCH v6 6/7] KVM: VMX: Expose PKS to guest Chenyi Qiang
@ 2022-03-30 21:30   ` Sean Christopherson
  0 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2022-03-30 21:30 UTC (permalink / raw)
  To: Chenyi Qiang
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Xiaoyao Li, kvm, linux-kernel

On Mon, Feb 21, 2022, Chenyi Qiang wrote:
> @@ -7454,6 +7455,13 @@ static __init void vmx_set_cpu_caps(void)
>  
>  	if (cpu_has_vmx_waitpkg())
>  		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
> +
> +	/*
> +	 * If not support VM_{ENTRY, EXIT}_LOAD_IA32_PKRS,
> +	 * don't expose the PKS as well.
> +	 */

I wouldn't bother with the comment; this pattern is common and the behavior is
self-explanatory.

> +	if (cpu_has_load_ia32_pkrs())
> +		kvm_cpu_cap_check_and_set(X86_FEATURE_PKS);
>  }

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 7/7] KVM: VMX: Enable PKS for nested VM
  2022-02-21  8:08 ` [PATCH v6 7/7] KVM: VMX: Enable PKS for nested VM Chenyi Qiang
@ 2022-03-30 21:47   ` Sean Christopherson
  2022-03-31  6:08     ` Chenyi Qiang
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2022-03-30 21:47 UTC (permalink / raw)
  To: Chenyi Qiang
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Xiaoyao Li, kvm, linux-kernel

On Mon, Feb 21, 2022, Chenyi Qiang wrote:
> PKS MSR passes through guest directly. Configure the MSR to match the
> L0/L1 settings so that nested VM runs PKS properly.
> 
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 38 ++++++++++++++++++++++++++++++++++++--
>  arch/x86/kvm/vmx/vmcs12.c |  2 ++
>  arch/x86/kvm/vmx/vmcs12.h |  4 ++++
>  arch/x86/kvm/vmx/vmx.c    |  1 +
>  arch/x86/kvm/vmx/vmx.h    |  2 ++
>  5 files changed, 45 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index f235f77cbc03..c42a1df385ef 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -252,6 +252,10 @@ static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx,
>  	dest->ds_sel = src->ds_sel;
>  	dest->es_sel = src->es_sel;
>  #endif
> +	if (unlikely(src->pkrs != dest->pkrs)) {
> +		vmcs_write64(HOST_IA32_PKRS, src->pkrs);
> +		dest->pkrs = src->pkrs;
> +	}

It's worth adding a helper for this, a la vmx_set_host_fs_gs(), though this one
can probably be an inline in vmx.h.  E.g. to yield

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index bfa37c7665a5..906a2913a886 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -252,10 +252,7 @@ static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx,
        dest->ds_sel = src->ds_sel;
        dest->es_sel = src->es_sel;
 #endif
-       if (unlikely(src->pkrs != dest->pkrs)) {
-               vmcs_write64(HOST_IA32_PKRS, src->pkrs);
-               dest->pkrs = src->pkrs;
-       }
+       vmx_set_host_pkrs(dest, src->pkrs);
 }

 static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 35fee600fae7..b6b5f1a46544 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1157,10 +1157,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
         */
        if (vm_exit_controls_get(vmx) & VM_EXIT_LOAD_IA32_PKRS) {
                host_pkrs = get_current_pkrs();
-               if (unlikely(host_pkrs != host_state->pkrs)) {
-                       vmcs_write64(HOST_IA32_PKRS, host_pkrs);
-                       host_state->pkrs = host_pkrs;
-               }
+               vmx_set_host_pkrs(host_state, host_pkrs);
        }

 #ifdef CONFIG_X86_64

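For reference, one possible shape for that inline helper in vmx.h; a sketch
only, with any struct or field names beyond those visible in the diffs being
assumptions:

	static inline void vmx_set_host_pkrs(struct vmcs_host_state *host, u32 pkrs)
	{
		/* Write HOST_IA32_PKRS only when the cached value is stale. */
		if (unlikely(pkrs != host->pkrs)) {
			vmcs_write64(HOST_IA32_PKRS, pkrs);
			host->pkrs = pkrs;
		}
	}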

>  }
>  
>  static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
> @@ -685,6 +689,9 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
>  	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
>  					 MSR_IA32_PRED_CMD, MSR_TYPE_W);
>  
> +	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
> +					 MSR_IA32_PKRS, MSR_TYPE_RW);
> +
>  	kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);
>  
>  	vmx->nested.force_msr_bitmap_recalc = false;
> @@ -2433,6 +2440,10 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
>  		if (kvm_mpx_supported() && vmx->nested.nested_run_pending &&
>  		    (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
>  			vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs);
> +
> +		if (vmx->nested.nested_run_pending &&
> +		    (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS))
> +			vmcs_write64(GUEST_IA32_PKRS, vmcs12->guest_ia32_pkrs);
>  	}
>  
>  	if (nested_cpu_has_xsaves(vmcs12))
> @@ -2521,6 +2532,11 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>  	if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending ||
>  	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
>  		vmcs_write64(GUEST_BNDCFGS, vmx->nested.vmcs01_guest_bndcfgs);
> +	if (kvm_cpu_cap_has(X86_FEATURE_PKS) &&
> +	    (!vmx->nested.nested_run_pending ||
> +	     !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS)))
> +		vmcs_write64(GUEST_IA32_PKRS, vmx->nested.vmcs01_guest_pkrs);
> +
>  	vmx_set_rflags(vcpu, vmcs12->guest_rflags);
>  
>  	/* EXCEPTION_BITMAP and CR0_GUEST_HOST_MASK should basically be the
> @@ -2897,6 +2913,10 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
>  					   vmcs12->host_ia32_perf_global_ctrl)))
>  		return -EINVAL;
>  
> +	if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PKRS) &&
> +		CC(!kvm_pkrs_valid(vmcs12->host_ia32_pkrs)))

Please align the indentation:

	if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PKRS) &&
	    CC(!kvm_pkrs_valid(vmcs12->host_ia32_pkrs)))
		return -EINVAL;

> +		return -EINVAL;
> +
>  #ifdef CONFIG_X86_64
>  	ia32e = !!(vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE);
>  #else
> @@ -3049,6 +3069,10 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
>  	if (nested_check_guest_non_reg_state(vmcs12))
>  		return -EINVAL;
>  
> +	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS) &&
> +	    CC(!kvm_pkrs_valid(vmcs12->guest_ia32_pkrs)))
> +		return -EINVAL;
> +
>  	return 0;
>  }
>  
> @@ -3377,6 +3401,9 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
>  	if (kvm_mpx_supported() &&
>  		!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
>  		vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
> +	if (kvm_cpu_cap_has(X86_FEATURE_PKS) &&
> +	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS))

This needs to read the current PKRS if from_vmentry == false, e.g.

	if (kvm_cpu_cap_has(X86_FEATURE_PKS) && 
	    (!from_vmentry ||
	     !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS)))

because in the migration case, if nested state is set after MSR state, the value
needs to come from the current MSR value, which was propagated to vmcs02 (which
this code calls vmcs01, but whatever).

Note, I'm pretty sure the GUEST_BNDCFGS code is broken, surprise surprise.

> +		vmx->nested.vmcs01_guest_pkrs = vmcs_read64(GUEST_IA32_PKRS);
>  
>  	/*
>  	 * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
> @@ -4022,6 +4049,7 @@ static bool is_vmcs12_ext_field(unsigned long field)
>  	case GUEST_IDTR_BASE:
>  	case GUEST_PENDING_DBG_EXCEPTIONS:
>  	case GUEST_BNDCFGS:
> +	case GUEST_IA32_PKRS:
>  		return true;
>  	default:
>  		break;
> @@ -4073,6 +4101,8 @@ static void sync_vmcs02_to_vmcs12_rare(struct kvm_vcpu *vcpu,
>  		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
>  	if (kvm_mpx_supported())
>  		vmcs12->guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
> +	if (guest_cpuid_has(vcpu, X86_FEATURE_PKS))

This needs to check vmx->nested.msrs.entry_ctls_* (I can never remember if it's
the high or low part...).  The SDM states PKRS is saved "if the processor supports
the 1-setting of the 'load PKRS' VM-entry control", which is different than PKRS
being supported in CPUID.  Also, guest CPUID is userspace controlled, e.g. userspace
could induce a failed VMREAD by giving a garbage CPUID model, whereas vmx->nested.msrs
can only be restricted by userspace, i.e. is trusted.

Happily, checking vmx->nested.msrs is also a performance win, as guest_cpuid_has()
can require walking a large array.
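
E.g., a sketch of the sync keyed off the nested VMX capability MSRs instead
(assuming entry_ctls_high is where the bit lives, as the follow-up below
suggests):

	if (vmx->nested.msrs.entry_ctls_high & VM_ENTRY_LOAD_IA32_PKRS)
		vmcs12->guest_ia32_pkrs = vmcs_read64(GUEST_IA32_PKRS);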

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 2/7] KVM: VMX: Add proper cache tracking for PKRS
  2022-03-30 20:42   ` Sean Christopherson
@ 2022-03-31  5:39     ` Chenyi Qiang
  0 siblings, 0 replies; 17+ messages in thread
From: Chenyi Qiang @ 2022-03-31  5:39 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Xiaoyao Li, kvm, linux-kernel



On 3/31/2022 4:42 AM, Sean Christopherson wrote:
> On Mon, Feb 21, 2022, Chenyi Qiang wrote:
>> Add PKRS caching into the standard register caching mechanism in order
>> to take advantage of the availability checks provided by regs_avail.
>>
>> This is because vcpu->arch.pkrs will be rarely accessed by KVM, only in
>> the case of host userspace MSR reads and GVA->GPA translation in
>> following patches. It is unnecessary to keep it up-to-date at all times.
> 
> It might be worth throwing in a blurb that the potential benefits of this caching
> are tenuous.
> 
> Barring userspace weirdness, the MSR read is not a hot path.
> 
> permission_fault() is slightly more common, but I would be surprised if caching
> actually provides meaningful performance benefit.  The PKRS checks are done only
> once per virtual access, i.e. only on the final translation, so the cache will get
> a hit if and only if there are multiple translations in a single round of emulation,
> where a "round of emulation" ends upon entry to the guest.  With unrestricted
> guest, i.e. for all intents and purposes every VM using PKRS, there aren't _that_
> many scenarios where KVM will (a) emulate in the first place and (b) emulate enough
> accesses for the caching to be meaningful.
> 
> That said, this is basically "free", so I've no objection to adding it.  But I do
> think it's worth documenting that it's nice-to-have so that we don't hesitate to
> rip it out in the future if there's a strong reason to drop the caching.
> 

OK, will add this note in the commit message.

>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>> ---
> 
> Reviewed-by: Sean Christopherson <seanjc@google.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 5/7] KVM: MMU: Add support for PKS emulation
  2022-03-30 21:27   ` Sean Christopherson
@ 2022-03-31  5:43     ` Chenyi Qiang
  0 siblings, 0 replies; 17+ messages in thread
From: Chenyi Qiang @ 2022-03-31  5:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Xiaoyao Li, kvm, linux-kernel



On 3/31/2022 5:27 AM, Sean Christopherson wrote:
> On Mon, Feb 21, 2022, Chenyi Qiang wrote:
>> @@ -277,14 +278,18 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>>   	WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));
>>   	if (unlikely(mmu->pkr_mask)) {
>>   		u32 pkr_bits, offset;
>> +		u32 pkr;
>>   
>>   		/*
>> -		* PKRU defines 32 bits, there are 16 domains and 2
>> -		* attribute bits per domain in pkru.  pte_pkey is the
>> -		* index of the protection domain, so pte_pkey * 2 is
>> -		* is the index of the first bit for the domain.
>> +		* PKRU and PKRS both define 32 bits. There are 16 domains
>> +		* and 2 attribute bits per domain in them. pte_key is the
>> +		* index of the protection domain, so pte_pkey * 2 is the
>> +		* index of the first bit for the domain. The use of PKRU
>> +		* versus PKRS is selected by the address type, as determined
>> +		* by the U/S bit in the paging-structure entries.
>>   		*/
>> -		pkr_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3;
>> +		pkr = pte_access & PT_USER_MASK ? vcpu->arch.pkru : kvm_read_pkrs(vcpu);
> 
> Blindly reading PKRU/PKRS is wrong.  I think this magic insanity will be functionally
> correct due to update_pkr_bitmask() clearing the appropriate bits in pkr_mask based
> on CR4.PK*, but the read should never happen.  PKRU is benign, but I believe reading
> PKRS will result in VMREAD to an invalid field if PKRU is supported and enabled, but
> PKRS is not supported.
> 

Nice catch.

> I believe the easiest solution is:
> 
> 		if (pte_access & PT_USER_MASK)
> 			pkr = is_cr4_pke(mmu) ? vcpu->arch.pkru : 0;
> 		else
> 			pkr = is_cr4_pks(mmu) ? kvm_read_pkrs(vcpu) : 0;
> 
> The is_cr4_pk*() helpers are restricted to mmu.c, but this presents a good
> opportunity to extract the PKR stuff to a separate, non-inline helper (as a prep
> patch).  E.g.
> 
> 
> 	WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));
> 	if (unlikely(mmu->pkr_mask)) {
> 		u32 pkr_bits = kvm_mmu_pkr_bits(vcpu, mmu, pte_access, pte_pkey);
> 
> 		errcode |= -pkr_bits & PFERR_PK_MASK;
> 		fault |= (pkr_bits != 0);
> 	}
> 
> 	return -(u32)fault & errcode;
> 
> permission_fault() is inline because it's heavily used for shadow paging, but
> when using TDP, it's far less performance critical.  PKR is TDP-only, so moving
> it out-of-line should be totally ok (this is also why this path is "unlikely").

Makes sense, will do it.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 7/7] KVM: VMX: Enable PKS for nested VM
  2022-03-30 21:47   ` Sean Christopherson
@ 2022-03-31  6:08     ` Chenyi Qiang
  0 siblings, 0 replies; 17+ messages in thread
From: Chenyi Qiang @ 2022-03-31  6:08 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Xiaoyao Li, kvm, linux-kernel



On 3/31/2022 5:47 AM, Sean Christopherson wrote:
> On Mon, Feb 21, 2022, Chenyi Qiang wrote:
>> PKS MSR passes through guest directly. Configure the MSR to match the
>> L0/L1 settings so that nested VM runs PKS properly.
>>
>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>> ---
>>   arch/x86/kvm/vmx/nested.c | 38 ++++++++++++++++++++++++++++++++++++--
>>   arch/x86/kvm/vmx/vmcs12.c |  2 ++
>>   arch/x86/kvm/vmx/vmcs12.h |  4 ++++
>>   arch/x86/kvm/vmx/vmx.c    |  1 +
>>   arch/x86/kvm/vmx/vmx.h    |  2 ++
>>   5 files changed, 45 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index f235f77cbc03..c42a1df385ef 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -252,6 +252,10 @@ static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx,
>>   	dest->ds_sel = src->ds_sel;
>>   	dest->es_sel = src->es_sel;
>>   #endif
>> +	if (unlikely(src->pkrs != dest->pkrs)) {
>> +		vmcs_write64(HOST_IA32_PKRS, src->pkrs);
>> +		dest->pkrs = src->pkrs;
>> +	}
> 
> It's worth adding a helper for this, a la vmx_set_host_fs_gs(), though this one
> can probably be an inline in vmx.h.  E.g. to yield
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index bfa37c7665a5..906a2913a886 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -252,10 +252,7 @@ static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx,
>          dest->ds_sel = src->ds_sel;
>          dest->es_sel = src->es_sel;
>   #endif
> -       if (unlikely(src->pkrs != dest->pkrs)) {
> -               vmcs_write64(HOST_IA32_PKRS, src->pkrs);
> -               dest->pkrs = src->pkrs;
> -       }
> +       vmx_set_host_pkrs(dest, src->pkrs);
>   }
> 
>   static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 35fee600fae7..b6b5f1a46544 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1157,10 +1157,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
>           */
>          if (vm_exit_controls_get(vmx) & VM_EXIT_LOAD_IA32_PKRS) {
>                  host_pkrs = get_current_pkrs();
> -               if (unlikely(host_pkrs != host_state->pkrs)) {
> -                       vmcs_write64(HOST_IA32_PKRS, host_pkrs);
> -                       host_state->pkrs = host_pkrs;
> -               }
> +               vmx_set_host_pkrs(host_state, host_pkrs);
>          }
> 
>   #ifdef CONFIG_X86_64
> 
> 

Will do.

>>   }
>>   
>>   static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
>> @@ -685,6 +689,9 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
>>   	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
>>   					 MSR_IA32_PRED_CMD, MSR_TYPE_W);
>>   
>> +	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
>> +					 MSR_IA32_PKRS, MSR_TYPE_RW);
>> +
>>   	kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);
>>   
>>   	vmx->nested.force_msr_bitmap_recalc = false;
>> @@ -2433,6 +2440,10 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
>>   		if (kvm_mpx_supported() && vmx->nested.nested_run_pending &&
>>   		    (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
>>   			vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs);
>> +
>> +		if (vmx->nested.nested_run_pending &&
>> +		    (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS))
>> +			vmcs_write64(GUEST_IA32_PKRS, vmcs12->guest_ia32_pkrs);
>>   	}
>>   
>>   	if (nested_cpu_has_xsaves(vmcs12))
>> @@ -2521,6 +2532,11 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>>   	if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending ||
>>   	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
>>   		vmcs_write64(GUEST_BNDCFGS, vmx->nested.vmcs01_guest_bndcfgs);
>> +	if (kvm_cpu_cap_has(X86_FEATURE_PKS) &&
>> +	    (!vmx->nested.nested_run_pending ||
>> +	     !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS)))
>> +		vmcs_write64(GUEST_IA32_PKRS, vmx->nested.vmcs01_guest_pkrs);
>> +
>>   	vmx_set_rflags(vcpu, vmcs12->guest_rflags);
>>   
>>   	/* EXCEPTION_BITMAP and CR0_GUEST_HOST_MASK should basically be the
>> @@ -2897,6 +2913,10 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
>>   					   vmcs12->host_ia32_perf_global_ctrl)))
>>   		return -EINVAL;
>>   
>> +	if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PKRS) &&
>> +		CC(!kvm_pkrs_valid(vmcs12->host_ia32_pkrs)))
> 
> Please align the indentation:
> 
> 	if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PKRS) &&
> 	    CC(!kvm_pkrs_valid(vmcs12->host_ia32_pkrs)))
> 		return -EINVAL;
> 

Fixed.

>> +		return -EINVAL;
>> +
>>   #ifdef CONFIG_X86_64
>>   	ia32e = !!(vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE);
>>   #else
>> @@ -3049,6 +3069,10 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
>>   	if (nested_check_guest_non_reg_state(vmcs12))
>>   		return -EINVAL;
>>   
>> +	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS) &&
>> +	    CC(!kvm_pkrs_valid(vmcs12->guest_ia32_pkrs)))
>> +		return -EINVAL;
>> +
>>   	return 0;
>>   }
>>   
>> @@ -3377,6 +3401,9 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
>>   	if (kvm_mpx_supported() &&
>>   		!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
>>   		vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
>> +	if (kvm_cpu_cap_has(X86_FEATURE_PKS) &&
>> +	    !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS))
> 
> This needs to read the current PKRS if from_vmentry == false, e.g.
> 
> 	if (kvm_cpu_cap_has(X86_FEATURE_PKS) &&
> 	    (!from_vmentry ||
> 	     !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS)))
> 
> because in the migration case, if nested state is set after MSR state, the value
> needs to come from the current MSR value, which was propagated to vmcs02 (which
> this code calls vmcs01, but whatever).
> 
> Note, I'm pretty sure the GUEST_BNDCFGS code is broken, surprise surprise.
> 

Yes, I missed the migration case and see your point here. Will fix it and 
also the GUEST_BNDCFGS code in a separate patch.

>> +		vmx->nested.vmcs01_guest_pkrs = vmcs_read64(GUEST_IA32_PKRS);
>>   
>>   	/*
>>   	 * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
>> @@ -4022,6 +4049,7 @@ static bool is_vmcs12_ext_field(unsigned long field)
>>   	case GUEST_IDTR_BASE:
>>   	case GUEST_PENDING_DBG_EXCEPTIONS:
>>   	case GUEST_BNDCFGS:
>> +	case GUEST_IA32_PKRS:
>>   		return true;
>>   	default:
>>   		break;
>> @@ -4073,6 +4101,8 @@ static void sync_vmcs02_to_vmcs12_rare(struct kvm_vcpu *vcpu,
>>   		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
>>   	if (kvm_mpx_supported())
>>   		vmcs12->guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
>> +	if (guest_cpuid_has(vcpu, X86_FEATURE_PKS))
> 
> This needs to check vmx->nested.msrs.entry_ctls_* (I can never remember if it's
> the high or low part...).  The SDM states PKRS is saved "if the processor supports
> the 1-setting of the 'load PKRS' VM-entry control", which is different than PKRS
> being supported in CPUID.  Also, guest CPUID is userspace controlled, e.g. userspace
> could induce a failed VMREAD by giving a garbage CPUID model, whereas vmx->nested.msrs
> can only be restricted by userspace, i.e. is trusted.
> 
> Happily, checking vmx->nested.msrs is also a performance win, as guest_cpuid_has()
> can require walking a large array.

Makes sense. Checking vmx->nested.msrs.entry_ctls_high is more accurate.


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-03-31  6:09 UTC | newest]

Thread overview: 17+ messages
2022-02-21  8:08 [PATCH v6 0/7] KVM: PKS Virtualization support Chenyi Qiang
2022-02-21  8:08 ` [PATCH v6 1/7] KVM: VMX: Introduce PKS VMCS fields Chenyi Qiang
2022-03-30 20:05   ` Sean Christopherson
2022-02-21  8:08 ` [PATCH v6 2/7] KVM: VMX: Add proper cache tracking for PKRS Chenyi Qiang
2022-03-30 20:42   ` Sean Christopherson
2022-03-31  5:39     ` Chenyi Qiang
2022-02-21  8:08 ` [PATCH v6 3/7] KVM: X86: Expose IA32_PKRS MSR Chenyi Qiang
2022-02-21  8:08 ` [PATCH v6 4/7] KVM: MMU: Rename the pkru to pkr Chenyi Qiang
2022-02-21  8:08 ` [PATCH v6 5/7] KVM: MMU: Add support for PKS emulation Chenyi Qiang
2022-03-30 21:27   ` Sean Christopherson
2022-03-31  5:43     ` Chenyi Qiang
2022-02-21  8:08 ` [PATCH v6 6/7] KVM: VMX: Expose PKS to guest Chenyi Qiang
2022-03-30 21:30   ` Sean Christopherson
2022-02-21  8:08 ` [PATCH v6 7/7] KVM: VMX: Enable PKS for nested VM Chenyi Qiang
2022-03-30 21:47   ` Sean Christopherson
2022-03-31  6:08     ` Chenyi Qiang
2022-03-07  6:09 ` [PATCH v6 0/7] KVM: PKS Virtualization support Chenyi Qiang
