KVM Archive on lore.kernel.org
 help / color / Atom feed
* [RESEND PATCH v10 00/10] Enable Sub-Page Write Protection Support
@ 2020-01-02  6:13 Yang Weijiang
  2020-01-02  6:13 ` [RESEND PATCH v10 01/10] Documentation: Add EPT based Subpage Protection and related APIs Yang Weijiang
                   ` (9 more replies)
  0 siblings, 10 replies; 32+ messages in thread
From: Yang Weijiang @ 2020-01-02  6:13 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, jmattson, sean.j.christopherson
  Cc: yu.c.zhang, alazar, edwin.zhai, Yang Weijiang

EPT-Based Sub-Page write Protection(SPP) allows Virtual Machine Monitor(VMM)
specify write-permission for guest physical memory at a sub-page(128 byte)
granularity. When SPP works, HW enforces write-access check for sub-pages
within a protected 4KB page.

The feature targets to provide fine-grained memory protection for
usages such as memory guard and VM introspection etc.

SPP is active when the "sub-page write protection" (bit 23) is 1 in
Secondary VM-Execution Controls. The feature is backed with a Sub-Page
Permission Table(SPPT), and subpage permission vector is stored in the
leaf entry of SPPT. The root page is referenced via a Sub-Page Permission
Table Pointer (SPPTP) in VMCS.

To enable SPP for guest memory, the guest page should be first mapped
to a 4KB EPT entry, then set SPP bit 61 of the corresponding entry. 
While HW walks EPT, it traverses SPPT with the gpa to look up the sub-page
permission vector within SPPT leaf entry. If the corresponding bit is set,
write to sub-page is permitted, otherwise, SPP induced EPT violation is generated.

This patch serial passed SPP function test and selftest on Ice-Lake platform.

Please refer to the SPP introduction document in this patch set and
Intel SDM for details:

Intel SDM:
https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf

Patch 1: Documentation for SPP and related API.
Patch 2: Add control flags for Sub-Page Protection(SPP).
Patch 3: Add SPP Table setup functions.
Patch 4: Add functions to create/destroy SPP bitmap block.
Patch 5: Introduce user-space SPP IOCTLs.
Patch 6: Set up SPP paging table at vmentry/vmexit.
Patch 7: Enable Lazy mode SPP protection.
Patch 8: Handle SPP protected pages when VM memory changes.
Patch 9: Add SPP protection check in emulation case.
Patch 10: SPP selftest.

Change logs:

v9 ->v10
  1. Cleared SPP active flag on VM resetting.
  2. Added trancepoints on subpage setup and SPP induced vmexits.
  3. Fixed a few code issues reported by Intel test robot.

v8 ->v9:
  1. Added SPP protection check in pte prefetch case.
  2. Flushed EPT rmap to remove existing mappings of the target gfns.
  3. Modified documentation to reflect recent changes.
  4. Other minor code refactor.

v7 -> v8:
  1. Changed ioctl interface definition per Paolo's comments.
  2. Replaced SPP_INIT ioctl funciton with KVM_ENABLE_CAP.
  3. Removed SPP bit from X86 feature word.
  4. Returned instruction length to user-space when SPP induced EPT
     violation happens, this is to provide flexibility to use SPP,
     revert write or track write.
  5. Modified selftest application and added into this serial.
  6. Simplified SPP permission vector check.
  7. Moved spp.c and spp.h to kvm/mmu folder.
  8. Other code fix according to Paolo's feedback and testing.

v6 -> v7:
  1. Configured all available protected pages once SPP induced vmexit
     happens since there's no PRESENT bit in SPPT leaf entry.
  2. Changed SPP protection check flow in tdp_page_fault().
  3. Code refactor and minior fixes.

v5 -> v6:
  1. Added SPP protection patch for emulation cases per Jim's review.
  2. Modified documentation and added API description per Jim's review.
  3. Other minior changes suggested by Jim.

v4 -> v5:
  1. Enable SPP support for Hugepage(1GB/2MB) to extend application.
  2. Make SPP miss vm-exit handler as the unified place to set up SPPT.
  3. If SPP protected pages are access-tracked or dirty-page-tracked,
     store SPP flag in reserved address bit, restore it in
     fast_page_fault() handler.
  4. Move SPP specific functions to vmx/spp.c and vmx/spp.h
  5. Rebased code to kernel v5.3
  6. Other change suggested by KVM community.
  
v3 -> v4:
  1. Modified documentation to make it consistent with patches.
  2. Allocated SPPT root page in init_spp() instead of vmx_set_cr3() to
     avoid SPPT miss error.
  3. Added back co-developers and sign-offs.

v2 -> v3:                                                                
  1. Rebased patches to kernel 5.1 release                                
  2. Deferred SPPT setup to EPT fault handler if the page is not
     available while set_subpage() is being called.
  3. Added init IOCTL to reduce extra cost if SPP is not used.
  4. Refactored patch structure, cleaned up cross referenced functions.
  5. Added code to deal with memory swapping/migration/shrinker cases.

v2 -> v1:
  1. Rebased to 4.20-rc1
  2. Move VMCS change to a separated patch.
  3. Code refine and Bug fix 



Yang Weijiang (10):
  Documentation: Add EPT based Subpage Protection and related APIs
  vmx: spp: Add control flags for Sub-Page Protection(SPP)
  mmu: spp: Add SPP Table setup functions
  mmu: spp: Add functions to operate SPP access bitmap
  x86: spp: Introduce user-space SPP IOCTLs
  vmx: spp: Set up SPP paging table at vmentry/vmexit
  mmu: spp: Enable Lazy mode SPP protection
  mmu: spp: Handle SPP protected pages when VM memory changes
  x86: spp: Add SPP protection check in emulation
  kvm: selftests: selftest for Sub-Page protection

 Documentation/virt/kvm/api.txt                |  39 ++
 Documentation/virtual/kvm/spp_kvm.txt         | 179 +++++
 arch/x86/include/asm/kvm_host.h               |  11 +-
 arch/x86/include/asm/vmx.h                    |  10 +
 arch/x86/include/uapi/asm/vmx.h               |   2 +
 arch/x86/kvm/mmu.h                            |   2 +
 arch/x86/kvm/mmu/mmu.c                        | 106 ++-
 arch/x86/kvm/mmu/spp.c                        | 660 ++++++++++++++++++
 arch/x86/kvm/mmu/spp.h                        |  35 +
 arch/x86/kvm/trace.h                          |  66 ++
 arch/x86/kvm/vmx/capabilities.h               |   5 +
 arch/x86/kvm/vmx/vmx.c                        | 104 ++-
 arch/x86/kvm/x86.c                            | 136 +++-
 include/uapi/linux/kvm.h                      |  17 +
 tools/testing/selftests/kvm/Makefile          |   2 +-
 tools/testing/selftests/kvm/lib/kvm_util.c    |   1 +
 tools/testing/selftests/kvm/x86_64/spp_test.c | 234 +++++++
 17 files changed, 1599 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/virtual/kvm/spp_kvm.txt
 create mode 100644 arch/x86/kvm/mmu/spp.c
 create mode 100644 arch/x86/kvm/mmu/spp.h
 create mode 100644 tools/testing/selftests/kvm/x86_64/spp_test.c

-- 
2.17.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RESEND PATCH v10 01/10] Documentation: Add EPT based Subpage Protection and related APIs
  2020-01-02  6:13 [RESEND PATCH v10 00/10] Enable Sub-Page Write Protection Support Yang Weijiang
@ 2020-01-02  6:13 ` Yang Weijiang
  2020-01-02  6:13 ` [RESEND PATCH v10 02/10] vmx: spp: Add control flags for Sub-Page Protection(SPP) Yang Weijiang
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2020-01-02  6:13 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, jmattson, sean.j.christopherson
  Cc: yu.c.zhang, alazar, edwin.zhai, Yang Weijiang

Co-developed-by: yi.z.zhang@linux.intel.com
Signed-off-by: yi.z.zhang@linux.intel.com
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 Documentation/virt/kvm/api.txt        |  39 ++++++
 Documentation/virtual/kvm/spp_kvm.txt | 179 ++++++++++++++++++++++++++
 2 files changed, 218 insertions(+)
 create mode 100644 Documentation/virtual/kvm/spp_kvm.txt

diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
index ebb37b34dcfc..a08c944d8eb6 100644
--- a/Documentation/virt/kvm/api.txt
+++ b/Documentation/virt/kvm/api.txt
@@ -4168,6 +4168,45 @@ This ioctl issues an ultravisor call to terminate the secure guest,
 unpins the VPA pages and releases all the device pages that are used to
 track the secure pages by hypervisor.
 
+4.122 KVM_SUBPAGES_GET_ACCESS
+
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_subpage_info (in/out)
+Returns: 0 on success, < 0 on error
+
+#define KVM_SUBPAGE_MAX_PAGES   512
+struct kvm_subpage {
+	__u64 gfn_base;    /* the first page gfn of the contiguous pages */
+	__u32 npages; /* number of 4K pages */
+	__u32 flags;  /* reserved to 0 now */
+	__u32 access_map[0]; /* start place of bitmap array */
+};
+
+This ioctl fetches subpage permission from contiguous pages starting with
+gfn. npages is the number of contiguous pages to fetch. access_map contains permission
+vectors fetched for all the pages.
+
+4.123 KVM_SUBPAGES_SET_ACCESS
+
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_subpage_info (in/out)
+Returns: 0 on success, < 0 on error
+
+#define KVM_SUBPAGE_MAX_PAGES   512
+struct kvm_subpage {
+	__u64 gfn_base;    /* the first page gfn of the contiguous pages */
+	__u32 npages; /* number of 4K pages */
+	__u32 flags;  /* reserved to 0 now */
+	__u32 access_map[0]; /* start place of bitmap array */
+};
+
+This ioctl sets subpage permission for contiguous pages starting with gfn. npages is
+the number of contiguous pages to set. access_map contains permission vectors for all the
+pages. Since during execution of the ioctl, it holds mmu_lock, so limits the MAX pages
+to 512 to reduce the impact to EPT.
+
 5. The kvm_run structure
 ------------------------
 
diff --git a/Documentation/virtual/kvm/spp_kvm.txt b/Documentation/virtual/kvm/spp_kvm.txt
new file mode 100644
index 000000000000..1b41125e0cb1
--- /dev/null
+++ b/Documentation/virtual/kvm/spp_kvm.txt
@@ -0,0 +1,179 @@
+EPT-Based Sub-Page Protection (SPP) for KVM
+====================================================
+
+1.Overview
+  EPT-based Sub-Page Protection(SPP) allows VMM to specify
+  fine-grained(128byte per sub-page) write-protection for guest physical
+  memory. When it's enabled, the CPU enforces write-access permission
+  for the sub-pages within a 4KB page, if corresponding bit is set in
+  permission vector, write to sub-page region is allowed, otherwise,
+  it's prevented with a EPT violation.
+
+  *Note*: In current implementation, SPP is exclusive with nested flag,
+  if it's on, SPP feature won't work.
+
+2.SPP Operation
+  Sub-Page Protection Table (SPPT) is introduced to manage sub-page
+  write-access permission.
+
+  It is active when:
+  a) nested flag is turned off.
+  b) "sub-page write protection" VM-execution control is 1.
+  c) SPP is initialized with KVM_ENABLE_CAP ioctl and sub-class KVM_CAP_X86_SPP.
+  d) Sub-page permissions are set with KVM_SUBPAGES_SET_ACCESS ioctl.
+     see below sections for details.
+
+  __________________________________________________________________________
+
+  How SPP hardware works:
+  __________________________________________________________________________
+
+  Guest write access --> GPA --> Walk EPT --> EPT leaf entry -----|
+  |---------------------------------------------------------------|
+  |-> if VMexec_control.spp && ept_leaf_entry.spp_bit (bit 61)
+       |
+       |-> <false> --> EPT legacy behavior
+       |
+       |
+       |-> <true>  --> if ept_leaf_entry.writable
+                        |
+                        |-> <true>  --> Ignore SPP
+                        |
+                        |-> <false> --> GPA --> Walk SPP 4-level table--|
+                                                                        |
+  |------------<----------get-the-SPPT-point-from-VMCS-field-----<------|
+  |
+  Walk SPP L4E table
+  |
+  |---> if-entry-misconfiguration ------------>-------|-------<---------|
+   |                                                  |                 |
+  else                                                |                 |
+   |                                                  |                 |
+   |   |------------------SPP VMexit<-----------------|                 |
+   |   |                                                                |
+   |   |-> exit_qualification & sppt_misconfig --> sppt misconfig       |
+   |   |                                                                |
+   |   |-> exit_qualification & sppt_miss --> sppt miss                 |
+   |---|                                                                |
+       |                                                                |
+  walk SPPT L3E--|--> if-entry-misconfiguration------------>------------|
+                 |                                                      |
+                else                                                    |
+                 |                                                      |
+                 |                                                      |
+          walk SPPT L2E --|--> if-entry-misconfiguration-------->-------|
+                          |                                             |
+                         else                                           |
+                          |                                             |
+                          |                                             |
+                   walk SPPT L1E --|-> if-entry-misconfiguration--->----|
+                                   |
+                                 else
+                                   |
+                                   |-> if sub-page writable
+                                   |-> <true>  allow, write access
+                                   |-> <false> disallow, EPT violation
+  ______________________________________________________________________________
+
+3.IOCTL Interfaces
+
+    KVM_ENABLE_CAP(capability: KVM_CAP_X86_SPP):
+    Allocate storage for sub-page permission vectors and SPPT root page.
+
+    KVM_SUBPAGES_GET_ACCESS:
+    Get sub-page write permission vectors for given contiguous guest pages.
+
+    KVM_SUBPAGES_SET_ACCESS
+    Set SPP bit in EPT leaf entries for given contiguous guest pages. The
+    actual SPPT setup is triggered when SPP miss vm-exit is handled.
+
+    struct kvm_subpage{
+		__u64 gfn_base;    /* the first page gfn of the contiguous pages */
+		__u32 npages;      /* number of 4K pages */
+		__u32 flags;       /* reserved to 0 now */
+		__u32 access_map[0]; /* start place of bitmap array */
+    };
+
+    #define KVM_SUBPAGES_GET_ACCESS   _IOR(KVMIO,  0x49, __u64)
+    #define KVM_SUBPAGES_SET_ACCESS   _IOW(KVMIO,  0x4a, __u64)
+
+4.Set Sub-Page Permission
+
+  * To enable SPP protection, KVM user-space application sets sub-page permission
+    via KVM_SUBPAGES_SET_ACCESS ioctl:
+    (1) It first stores the access permissions in bitmap array.
+
+    (2) Then, if the target 4KB pages are mapped as PT_PAGE_TABLE_LEVEL entry in EPT,
+	it sets SPP bit of the corresponding entry to mark sub-page protection.
+	If the 4KB pages are mapped within PT_DIRECTORY_LEVEL or PT_PDPE_LEVEL entry,
+	it first zaps the hugepage entries so as to let following memory access to trigger
+	EPT violation, there the gfn is check against SPP permission bitmap and
+	proper level is selected to set up EPT entry.
+
+
+   The SPPT paging structure format is as below:
+
+   Format of the SPPT L4E, L3E, L2E:
+   | Bit    | Contents                                                                 |
+   | :----- | :------------------------------------------------------------------------|
+   | 0      | Valid entry when set; indicates whether the entry is present             |
+   | 11:1   | Reserved (0)                                                             |
+   | N-1:12 | Physical address of 4KB aligned SPPT LX-1 Table referenced by this entry |
+   | 51:N   | Reserved (0)                                                             |
+   | 63:52  | Reserved (0)                                                             |
+   Note: N is the physical address width supported by the processor. X is the page level
+
+   Format of the SPPT L1E:
+   | Bit   | Contents                                                          |
+   | :---- | :---------------------------------------------------------------- |
+   | 0+2i  | Write permission for i-th 128 byte sub-page region.               |
+   | 1+2i  | Reserved (0).                                                     |
+   Note: 0<=i<=31
+
+5.SPPT-induced VM exit
+
+  * SPPT miss and misconfiguration induced VM exit
+
+    A SPPT missing VM exit occurs when walk the SPPT, there is no SPPT
+    misconfiguration but a paging-structure entry is not
+    present in any of L4E/L3E/L2E entries.
+
+    A SPPT misconfiguration VM exit occurs when reserved bits or unsupported values
+    are set in SPPT entry.
+
+    *NOTE* SPPT miss and SPPT misconfigurations can occur only due to
+    "eligible" memory write, this excludes, e.g., guest paging structure,
+    please refer to SDM 28.2 for details of "non-eligible" cases.
+
+  * SPP permission induced VM exit
+    SPP sub-page permission induced violation is reported as EPT violation
+    therefore causes VM exit.
+
+6.SPPT-induced VM exit handling
+
+  #define EXIT_REASON_SPP                 66
+
+  static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
+    ...
+    [EXIT_REASON_SPP]                     = handle_spp,
+    ...
+  };
+
+  New exit qualification for SPPT-induced vmexits.
+
+  | Bit   | Contents                                                          |
+  | :---- | :---------------------------------------------------------------- |
+  | 10:0  | Reserved (0).                                                     |
+  | 11    | SPPT VM exit type. Set for SPPT Miss, cleared for SPPT Misconfig. |
+  | 12    | NMI unblocking due to IRET                                        |
+  | 63:13 | Reserved (0)                                                      |
+
+  * SPPT miss induced VM exit
+    Set up SPPT entries correctly.
+
+  * SPPT misconfiguration induced VM exit
+    This is left to user-space application to handle.
+
+  * SPP permission induced VM exit
+    This is left to user-space application to handle, e.g.,
+    retry the fault instruction or skip it.
-- 
2.17.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RESEND PATCH v10 02/10] vmx: spp: Add control flags for Sub-Page Protection(SPP)
  2020-01-02  6:13 [RESEND PATCH v10 00/10] Enable Sub-Page Write Protection Support Yang Weijiang
  2020-01-02  6:13 ` [RESEND PATCH v10 01/10] Documentation: Add EPT based Subpage Protection and related APIs Yang Weijiang
@ 2020-01-02  6:13 ` Yang Weijiang
  2020-01-10 16:58   ` Sean Christopherson
  2020-01-02  6:13 ` [RESEND PATCH v10 03/10] mmu: spp: Add SPP Table setup functions Yang Weijiang
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Yang Weijiang @ 2020-01-02  6:13 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, jmattson, sean.j.christopherson
  Cc: yu.c.zhang, alazar, edwin.zhai, Yang Weijiang

Check SPP capability in MSR_IA32_VMX_PROCBASED_CTLS2, its 23-bit
indicates SPP capability. Enable SPP feature bit in CPU capabilities
bitmap if it's supported.

Co-developed-by: He Chen <he.chen@linux.intel.com>
Signed-off-by: He Chen <he.chen@linux.intel.com>
Co-developed-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/vmx.h      | 1 +
 arch/x86/kvm/mmu.h              | 2 ++
 arch/x86/kvm/vmx/capabilities.h | 5 +++++
 arch/x86/kvm/vmx/vmx.c          | 9 +++++++++
 4 files changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 1835767aa335..de79a4de0723 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -68,6 +68,7 @@
 #define SECONDARY_EXEC_XSAVES			0x00100000
 #define SECONDARY_EXEC_PT_USE_GPA		0x01000000
 #define SECONDARY_EXEC_MODE_BASED_EPT_EXEC	0x00400000
+#define SECONDARY_EXEC_ENABLE_SPP		0x00800000
 #define SECONDARY_EXEC_TSC_SCALING              0x02000000
 #define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE	0x04000000
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index d55674f44a18..5b2807222465 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -26,6 +26,8 @@
 #define PT_PAGE_SIZE_MASK (1ULL << PT_PAGE_SIZE_SHIFT)
 #define PT_PAT_MASK (1ULL << 7)
 #define PT_GLOBAL_MASK (1ULL << 8)
+#define PT_SPP_SHIFT 61
+#define PT_SPP_MASK (1ULL << PT_SPP_SHIFT)
 #define PT64_NX_SHIFT 63
 #define PT64_NX_MASK (1ULL << PT64_NX_SHIFT)
 
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 7aa69716d516..3d79fd116687 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -241,6 +241,11 @@ static inline bool cpu_has_vmx_pml(void)
 	return vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_ENABLE_PML;
 }
 
+static inline bool cpu_has_vmx_ept_spp(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_ENABLE_SPP;
+}
+
 static inline bool vmx_xsaves_supported(void)
 {
 	return vmcs_config.cpu_based_2nd_exec_ctrl &
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e3394c839dea..5713e8a6224c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -60,6 +60,7 @@
 #include "vmcs12.h"
 #include "vmx.h"
 #include "x86.h"
+#include "../mmu/spp.h"
 
 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
@@ -111,6 +112,7 @@ module_param_named(pml, enable_pml, bool, S_IRUGO);
 
 static bool __read_mostly dump_invalid_vmcs = 0;
 module_param(dump_invalid_vmcs, bool, 0644);
+static bool __read_mostly spp_supported = 0;
 
 #define MSR_BITMAP_MODE_X2APIC		1
 #define MSR_BITMAP_MODE_X2APIC_APICV	2
@@ -2391,6 +2393,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 			SECONDARY_EXEC_RDSEED_EXITING |
 			SECONDARY_EXEC_RDRAND_EXITING |
 			SECONDARY_EXEC_ENABLE_PML |
+			SECONDARY_EXEC_ENABLE_SPP |
 			SECONDARY_EXEC_TSC_SCALING |
 			SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE |
 			SECONDARY_EXEC_PT_USE_GPA |
@@ -4039,6 +4042,9 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
 	if (!enable_pml)
 		exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
 
+	if (!spp_supported)
+		exec_control &= ~SECONDARY_EXEC_ENABLE_SPP;
+
 	if (vmx_xsaves_supported()) {
 		/* Exposing XSAVES only when XSAVE is exposed */
 		bool xsaves_enabled =
@@ -7630,6 +7636,9 @@ static __init int hardware_setup(void)
 	if (!cpu_has_vmx_flexpriority())
 		flexpriority_enabled = 0;
 
+	if (cpu_has_vmx_ept_spp() && enable_ept)
+		spp_supported = 1;
+
 	if (!cpu_has_virtual_nmis())
 		enable_vnmi = 0;
 
-- 
2.17.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RESEND PATCH v10 03/10] mmu: spp: Add SPP Table setup functions
  2020-01-02  6:13 [RESEND PATCH v10 00/10] Enable Sub-Page Write Protection Support Yang Weijiang
  2020-01-02  6:13 ` [RESEND PATCH v10 01/10] Documentation: Add EPT based Subpage Protection and related APIs Yang Weijiang
  2020-01-02  6:13 ` [RESEND PATCH v10 02/10] vmx: spp: Add control flags for Sub-Page Protection(SPP) Yang Weijiang
@ 2020-01-02  6:13 ` Yang Weijiang
  2020-01-10 17:26   ` Sean Christopherson
  2020-01-10 17:40   ` Sean Christopherson
  2020-01-02  6:13 ` [RESEND PATCH v10 04/10] mmu: spp: Add functions to operate SPP access bitmap Yang Weijiang
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 32+ messages in thread
From: Yang Weijiang @ 2020-01-02  6:13 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, jmattson, sean.j.christopherson
  Cc: yu.c.zhang, alazar, edwin.zhai, Yang Weijiang

SPPT is a 4-level paging structure similar to EPT, when SPP is
armed for target physical page, bit 61 of the corresponding
EPT entry is flaged, then SPPT is traversed with the gfn,
the leaf entry of SPPT contains the access bitmap of subpages
inside the target 4KB physical page, one bit per 128-byte subpage.

Co-developed-by: He Chen <he.chen@linux.intel.com>
Signed-off-by: He Chen <he.chen@linux.intel.com>
Co-developed-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |   5 +-
 arch/x86/kvm/mmu/spp.c          | 228 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/mmu/spp.h          |  10 ++
 3 files changed, 242 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kvm/mmu/spp.c
 create mode 100644 arch/x86/kvm/mmu/spp.h

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b79cd6aa4075..9506c9d40895 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -261,7 +261,8 @@ union kvm_mmu_page_role {
 		unsigned smap_andnot_wp:1;
 		unsigned ad_disabled:1;
 		unsigned guest_mode:1;
-		unsigned :6;
+		unsigned spp:1;
+		unsigned reserved:5;
 
 		/*
 		 * This is left at the top of the word so that
@@ -956,6 +957,8 @@ struct kvm_arch {
 
 	struct kvm_pmu_event_filter *pmu_event_filter;
 	struct task_struct *nx_lpage_recovery_thread;
+
+	hpa_t sppt_root;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/mmu/spp.c b/arch/x86/kvm/mmu/spp.c
new file mode 100644
index 000000000000..5fca08af705c
--- /dev/null
+++ b/arch/x86/kvm/mmu/spp.c
@@ -0,0 +1,228 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "spp.h"
+
+#define for_each_shadow_spp_entry(_vcpu, _addr, _walker)    \
+	for (shadow_spp_walk_init(&(_walker), _vcpu, _addr);	\
+	     shadow_walk_okay(&(_walker));			\
+	     shadow_walk_next(&(_walker)))
+
+static void shadow_spp_walk_init(struct kvm_shadow_walk_iterator *iterator,
+				 struct kvm_vcpu *vcpu, u64 addr)
+{
+	iterator->addr = addr;
+	iterator->shadow_addr = vcpu->kvm->arch.sppt_root;
+
+	/* SPP Table is a 4-level paging structure */
+	iterator->level = PT64_ROOT_4LEVEL;
+}
+
+static bool __rmap_open_subpage_bit(struct kvm *kvm,
+				    struct kvm_rmap_head *rmap_head)
+{
+	struct rmap_iterator iter;
+	bool flush = false;
+	u64 *sptep;
+	u64 spte;
+
+	for_each_rmap_spte(rmap_head, &iter, sptep) {
+		/*
+		 * SPP works only when the page is write-protected
+		 * and SPP bit is set in EPT leaf entry.
+		 */
+		flush |= spte_write_protect(sptep, false);
+		spte = *sptep | PT_SPP_MASK;
+		flush |= mmu_spte_update(sptep, spte);
+	}
+
+	return flush;
+}
+
+static int kvm_spp_open_write_protect(struct kvm *kvm,
+				      struct kvm_memory_slot *slot,
+				      gfn_t gfn)
+{
+	struct kvm_rmap_head *rmap_head;
+	bool flush = false;
+
+	/*
+	 * SPP is only supported with 4KB level1 memory page, check
+	 * if the page is mapped in EPT leaf entry.
+	 */
+	rmap_head = __gfn_to_rmap(gfn, PT_PAGE_TABLE_LEVEL, slot);
+
+	if (!rmap_head->val)
+		return -EFAULT;
+
+	flush |= __rmap_open_subpage_bit(kvm, rmap_head);
+
+	if (flush)
+		kvm_flush_remote_tlbs(kvm);
+
+	return 0;
+}
+
+static bool __rmap_clear_subpage_bit(struct kvm *kvm,
+				     struct kvm_rmap_head *rmap_head)
+{
+	struct rmap_iterator iter;
+	bool flush = false;
+	u64 *sptep;
+	u64 spte;
+
+	for_each_rmap_spte(rmap_head, &iter, sptep) {
+		spte = (*sptep & ~PT_SPP_MASK);
+		flush |= mmu_spte_update(sptep, spte);
+	}
+
+	return flush;
+}
+
+static int kvm_spp_clear_write_protect(struct kvm *kvm,
+				       struct kvm_memory_slot *slot,
+				       gfn_t gfn)
+{
+	struct kvm_rmap_head *rmap_head;
+	bool flush = false;
+
+	rmap_head = __gfn_to_rmap(gfn, PT_PAGE_TABLE_LEVEL, slot);
+
+	if (!rmap_head->val)
+		return -EFAULT;
+
+	flush |= __rmap_clear_subpage_bit(kvm, rmap_head);
+
+	if (flush)
+		kvm_flush_remote_tlbs(kvm);
+
+	return 0;
+}
+
+struct kvm_mmu_page *kvm_spp_get_page(struct kvm_vcpu *vcpu,
+				      gfn_t gfn,
+				      unsigned int level)
+{
+	struct kvm_mmu_page *sp;
+	union kvm_mmu_page_role role;
+
+	role = vcpu->arch.mmu->mmu_role.base;
+	role.level = level;
+	role.direct = true;
+	role.spp = true;
+
+	for_each_valid_sp(vcpu->kvm, sp, gfn) {
+		if (sp->gfn != gfn)
+			continue;
+		if (sp->role.word != role.word)
+			continue;
+		if (sp->role.spp && sp->role.level == level)
+			goto out;
+	}
+
+	sp = kvm_mmu_alloc_page(vcpu, true);
+	sp->gfn = gfn;
+	sp->role = role;
+	hlist_add_head(&sp->hash_link,
+		       &vcpu->kvm->arch.mmu_page_hash
+		       [kvm_page_table_hashfn(gfn)]);
+	clear_page(sp->spt);
+out:
+	return sp;
+}
+EXPORT_SYMBOL_GPL(kvm_spp_get_page);
+
+static void link_spp_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
+				 struct kvm_mmu_page *sp)
+{
+	u64 spte;
+
+	spte = __pa(sp->spt) | PT_PRESENT_MASK;
+
+	mmu_spte_set(sptep, spte);
+
+	mmu_page_add_parent_pte(vcpu, sp, sptep);
+}
+
+static u64 format_spp_spte(u32 spp_wp_bitmap)
+{
+	u64 new_spte = 0;
+	int i = 0;
+
+	/*
+	 * One 4K page contains 32 sub-pages, in SPP table L4E, old bits
+	 * are reserved, so we need to transfer u32 subpage write
+	 * protect bitmap to u64 SPP L4E format.
+	 */
+	while (i < 32) {
+		if (spp_wp_bitmap & (1ULL << i))
+			new_spte |= 1ULL << (i * 2);
+		i++;
+	}
+
+	return new_spte;
+}
+
+static void spp_spte_set(u64 *sptep, u64 new_spte)
+{
+	__set_spte(sptep, new_spte);
+}
+
+bool is_spp_spte(struct kvm_mmu_page *sp)
+{
+	return sp->role.spp;
+}
+
+#define SPPT_ENTRY_PHA_MASK (0xFFFFFFFFFF << 12)
+
+int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
+			    u32 access_map, gfn_t gfn)
+{
+	struct kvm_shadow_walk_iterator iter;
+	struct kvm_mmu_page *sp;
+	gfn_t pseudo_gfn;
+	u64 old_spte, spp_spte;
+	int ret = -EFAULT;
+
+	/* direct_map spp start */
+	if (!VALID_PAGE(vcpu->kvm->arch.sppt_root))
+		return -EFAULT;
+
+	for_each_shadow_spp_entry(vcpu, (u64)gfn << PAGE_SHIFT, iter) {
+		if (iter.level == PT_PAGE_TABLE_LEVEL) {
+			spp_spte = format_spp_spte(access_map);
+			old_spte = mmu_spte_get_lockless(iter.sptep);
+			if (old_spte != spp_spte)
+				spp_spte_set(iter.sptep, spp_spte);
+			ret = 0;
+			break;
+		}
+
+		if (!is_shadow_present_pte(*iter.sptep)) {
+			u64 base_addr = iter.addr;
+
+			base_addr &= PT64_LVL_ADDR_MASK(iter.level);
+			pseudo_gfn = base_addr >> PAGE_SHIFT;
+			spp_spte = *iter.sptep;
+			sp = kvm_spp_get_page(vcpu, pseudo_gfn,
+					      iter.level - 1);
+			link_spp_shadow_page(vcpu, iter.sptep, sp);
+		} else if (iter.level == PT_DIRECTORY_LEVEL  &&
+			   !(spp_spte & PT_PRESENT_MASK) &&
+			   (spp_spte & SPPT_ENTRY_PHA_MASK)) {
+			spp_spte = *iter.sptep;
+			spp_spte |= PT_PRESENT_MASK;
+			spp_spte_set(iter.sptep, spp_spte);
+		}
+	}
+
+	kvm_flush_remote_tlbs(vcpu->kvm);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_spp_setup_structure);
+
+inline u64 construct_spptp(unsigned long root_hpa)
+{
+	return root_hpa & PAGE_MASK;
+}
+EXPORT_SYMBOL_GPL(construct_spptp);
+
diff --git a/arch/x86/kvm/mmu/spp.h b/arch/x86/kvm/mmu/spp.h
new file mode 100644
index 000000000000..8ef94b7a2057
--- /dev/null
+++ b/arch/x86/kvm/mmu/spp.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_X86_VMX_SPP_H
+#define __KVM_X86_VMX_SPP_H
+
+bool is_spp_spte(struct kvm_mmu_page *sp);
+u64 construct_spptp(unsigned long root_hpa);
+int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
+			    u32 access_map, gfn_t gfn);
+
+#endif /* __KVM_X86_VMX_SPP_H */
-- 
2.17.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RESEND PATCH v10 04/10] mmu: spp: Add functions to operate SPP access bitmap
  2020-01-02  6:13 [RESEND PATCH v10 00/10] Enable Sub-Page Write Protection Support Yang Weijiang
                   ` (2 preceding siblings ...)
  2020-01-02  6:13 ` [RESEND PATCH v10 03/10] mmu: spp: Add SPP Table setup functions Yang Weijiang
@ 2020-01-02  6:13 ` Yang Weijiang
  2020-01-10 17:38   ` Sean Christopherson
  2020-01-02  6:13 ` [RESEND PATCH v10 05/10] x86: spp: Introduce user-space SPP IOCTLs Yang Weijiang
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Yang Weijiang @ 2020-01-02  6:13 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, jmattson, sean.j.christopherson
  Cc: yu.c.zhang, alazar, edwin.zhai, Yang Weijiang

Create access bitmap for SPP subpages, the bitmap can
be accessed with a gfn. The initial access bitmap for each
physical page is 0xFFFFFFFF, meaning SPP is not enabled for the
subpages.

Co-developed-by: He Chen <he.chen@linux.intel.com>
Signed-off-by: He Chen <he.chen@linux.intel.com>
Co-developed-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/kvm/mmu/spp.c          | 332 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/mmu/spp.h          |  12 ++
 include/uapi/linux/kvm.h        |   8 +
 4 files changed, 354 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9506c9d40895..f5145b86d620 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -812,6 +812,7 @@ struct kvm_lpage_info {
 
 struct kvm_arch_memory_slot {
 	struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
+	u32 *subpage_wp_info;
 	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
 	unsigned short *gfn_track[KVM_PAGE_TRACK_MAX];
 };
@@ -959,6 +960,7 @@ struct kvm_arch {
 	struct task_struct *nx_lpage_recovery_thread;
 
 	hpa_t sppt_root;
+	bool spp_active;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/mmu/spp.c b/arch/x86/kvm/mmu/spp.c
index 5fca08af705c..edab5ec83ef3 100644
--- a/arch/x86/kvm/mmu/spp.c
+++ b/arch/x86/kvm/mmu/spp.c
@@ -17,6 +17,21 @@ static void shadow_spp_walk_init(struct kvm_shadow_walk_iterator *iterator,
 	iterator->level = PT64_ROOT_4LEVEL;
 }
 
+u32 *gfn_to_subpage_wp_info(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	unsigned long idx;
+
+	if (!slot->arch.subpage_wp_info)
+		return NULL;
+
+	idx = gfn_to_index(gfn, slot->base_gfn, PT_PAGE_TABLE_LEVEL);
+	if (idx > slot->npages - 1)
+		return NULL;
+
+	return &slot->arch.subpage_wp_info[idx];
+}
+EXPORT_SYMBOL_GPL(gfn_to_subpage_wp_info);
+
 static bool __rmap_open_subpage_bit(struct kvm *kvm,
 				    struct kvm_rmap_head *rmap_head)
 {
@@ -172,6 +187,20 @@ bool is_spp_spte(struct kvm_mmu_page *sp)
 	return sp->role.spp;
 }
 
+static int kvm_spp_level_pages(gfn_t gfn_lower, gfn_t gfn_upper, int level)
+{
+	int page_num = KVM_PAGES_PER_HPAGE(level);
+	gfn_t gfn_max = (gfn_lower & ~(page_num - 1)) + page_num - 1;
+	int ret;
+
+	if (gfn_upper <= gfn_max)
+		ret = gfn_upper - gfn_lower + 1;
+	else
+		ret = gfn_max - gfn_lower + 1;
+
+	return ret;
+}
+
 #define SPPT_ENTRY_PHA_MASK (0xFFFFFFFFFF << 12)
 
 int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
@@ -220,6 +249,309 @@ int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
 }
 EXPORT_SYMBOL_GPL(kvm_spp_setup_structure);
 
+int vmx_spp_flush_sppt(struct kvm *kvm, u64 gfn_base, u32 npages)
+{
+	struct kvm_shadow_walk_iterator iter;
+	struct kvm_vcpu *vcpu;
+	gfn_t gfn = gfn_base;
+	gfn_t gfn_max = gfn_base + npages - 1;
+	u64 spde;
+	int count;
+	bool flush = false;
+
+	vcpu = kvm_get_vcpu(kvm, 0);
+	if (!VALID_PAGE(vcpu->kvm->arch.sppt_root))
+		return -EFAULT;
+
+	for (; gfn <= gfn_max; gfn++) {
+		for_each_shadow_spp_entry(vcpu, (u64)gfn << PAGE_SHIFT, iter) {
+			if (!is_shadow_present_pte(*iter.sptep))
+				break;
+
+			if (iter.level == PT_DIRECTORY_LEVEL) {
+				spde = *iter.sptep;
+				spde &= ~PT_PRESENT_MASK;
+				spp_spte_set(iter.sptep, spde);
+				count = kvm_spp_level_pages(gfn,
+							    gfn_max,
+							    PT_DIRECTORY_LEVEL);
+				flush = true;
+				if (count >= npages)
+					goto out;
+				gfn += count;
+				break;
+			}
+		}
+	}
+out:
+	if (flush)
+		kvm_flush_remote_tlbs(kvm);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vmx_spp_flush_sppt);
+
+static int kvm_spp_create_bitmaps(struct kvm *kvm)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *memslot;
+	int i, j, ret;
+	u32 *buff;
+
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		slots = __kvm_memslots(kvm, i);
+		kvm_for_each_memslot(memslot, slots) {
+			buff = kvzalloc(memslot->npages *
+				sizeof(*memslot->arch.subpage_wp_info),
+				GFP_KERNEL);
+
+			if (!buff) {
+				ret = -ENOMEM;
+				goto out_free;
+			}
+			memslot->arch.subpage_wp_info = buff;
+
+			for (j = 0; j < memslot->npages; j++)
+				buff[j] = FULL_SPP_ACCESS;
+		}
+	}
+
+	return 0;
+out_free:
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		slots = __kvm_memslots(kvm, i);
+		kvm_for_each_memslot(memslot, slots) {
+			if (memslot->arch.subpage_wp_info) {
+				kvfree(memslot->arch.subpage_wp_info);
+				memslot->arch.subpage_wp_info = NULL;
+			}
+		}
+	}
+
+	return ret;
+}
+
+int vmx_spp_init(struct kvm *kvm)
+{
+	int i, ret;
+	struct kvm_vcpu *vcpu;
+	int root_level;
+	struct kvm_mmu_page *ssp_sp;
+	bool first_root = true;
+
+	/* SPP feature is exclusive with nested VM.*/
+	if (kvm_x86_ops->get_nested_state)
+		return -EPERM;
+
+	if (kvm->arch.spp_active)
+		return 0;
+
+	ret = kvm_spp_create_bitmaps(kvm);
+
+	if (ret)
+		return ret;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (first_root) {
+			/* prepare caches for SPP setup.*/
+			mmu_topup_memory_caches(vcpu);
+			root_level = vcpu->arch.mmu->shadow_root_level;
+			ssp_sp = kvm_spp_get_page(vcpu, 0, root_level);
+			first_root = false;
+			vcpu->kvm->arch.sppt_root = __pa(ssp_sp->spt);
+		}
+		++ssp_sp->root_count;
+		kvm_make_request(KVM_REQ_LOAD_CR3, vcpu);
+	}
+
+	kvm->arch.spp_active = true;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vmx_spp_init);
+
+int kvm_spp_get_permission(struct kvm *kvm, u64 gfn, u32 npages,
+			   u32 *access_map)
+{
+	u32 *access;
+	struct kvm_memory_slot *slot;
+	int i;
+
+	if (!kvm->arch.spp_active)
+		return -ENODEV;
+
+	for (i = 0; i < npages; i++, gfn++) {
+		slot = gfn_to_memslot(kvm, gfn);
+		if (!slot)
+			return -EFAULT;
+		access = gfn_to_subpage_wp_info(slot, gfn);
+		if (!access)
+			return -EFAULT;
+		access_map[i] = *access;
+	}
+
+	return i;
+}
+EXPORT_SYMBOL_GPL(kvm_spp_get_permission);
+
+static void kvm_spp_zap_pte(struct kvm *kvm, u64 *spte, int level)
+{
+	u64 pte;
+
+	pte = *spte;
+	if (is_shadow_present_pte(pte) && is_last_spte(pte, level)) {
+		drop_spte(kvm, spte);
+		if (is_large_pte(pte))
+			--kvm->stat.lpages;
+	}
+}
+
+static bool kvm_spp_flush_rmap(struct kvm *kvm, u64 gfn_min, u64 gfn_max)
+{
+	u64 *sptep;
+	struct rmap_iterator iter;
+	struct kvm_rmap_head *rmap_head;
+	int level;
+	struct kvm_memory_slot *slot;
+	bool flush = false;
+
+	slot = gfn_to_memslot(kvm, gfn_min);
+	if (!slot)
+		return false;
+
+	for (; gfn_min <= gfn_max; gfn_min++) {
+		for (level = PT_PAGE_TABLE_LEVEL;
+		     level <= PT_DIRECTORY_LEVEL; level++) {
+			rmap_head = __gfn_to_rmap(gfn_min, level, slot);
+			for_each_rmap_spte(rmap_head, &iter, sptep) {
+				pte_list_remove(rmap_head, sptep);
+				flush = true;
+			}
+		}
+	}
+
+	return flush;
+}
+
+int kvm_spp_set_permission(struct kvm *kvm, u64 gfn, u32 npages,
+			   u32 *access_map)
+{
+	gfn_t old_gfn = gfn;
+	u32 *access;
+	struct kvm_memory_slot *slot;
+	struct kvm_shadow_walk_iterator iterator;
+	struct kvm_vcpu *vcpu;
+	gfn_t gfn_max;
+	int i, count, level;
+	bool flush = false;
+
+	if (!kvm->arch.spp_active)
+		return -ENODEV;
+
+	vcpu = kvm_get_vcpu(kvm, 0);
+	if (!VALID_PAGE(vcpu->kvm->arch.sppt_root))
+		return -EFAULT;
+
+	for (i = 0; i < npages; i++, gfn++) {
+		slot = gfn_to_memslot(kvm, gfn);
+		if (!slot)
+			return -EFAULT;
+
+		access = gfn_to_subpage_wp_info(slot, gfn);
+		if (!access)
+			return -EFAULT;
+		*access = access_map[i];
+	}
+
+	gfn = old_gfn;
+	gfn_max = gfn + npages - 1;
+	vcpu = kvm_get_vcpu(kvm, 0);
+
+	if (!vcpu || (vcpu && !VALID_PAGE(vcpu->arch.mmu->root_hpa)))
+		goto out;
+
+	flush = kvm_spp_flush_rmap(kvm, gfn, gfn_max);
+
+	for (i = 0; gfn <= gfn_max; i++, gfn++) {
+		for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
+			if (!is_shadow_present_pte(*iterator.sptep))
+				break;
+
+			if (iterator.level == PT_PAGE_TABLE_LEVEL) {
+				if (kvm_spp_mark_protection(kvm,
+							    gfn,
+							    access_map[i]) < 0)
+					return -EFAULT;
+				break;
+			} else if (is_large_pte(*iterator.sptep)) {
+				level = iterator.level;
+				if (access_map[i] == FULL_SPP_ACCESS)
+					break;
+				count = kvm_spp_level_pages(gfn,
+							    gfn_max,
+							    level);
+				kvm_spp_zap_pte(kvm, iterator.sptep, level);
+				flush = true;
+				if (count >= npages)
+					goto out;
+				gfn += count - 1;
+			}
+		}
+	}
+out:
+	if (flush)
+		kvm_flush_remote_tlbs(kvm);
+	return npages;
+}
+
+int kvm_spp_mark_protection(struct kvm *kvm, u64 gfn, u32 access)
+{
+	struct kvm_memory_slot *slot;
+	struct kvm_rmap_head *rmap_head;
+	int ret;
+
+	if (!kvm->arch.spp_active)
+		return -ENODEV;
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot)
+		return -EFAULT;
+
+	/*
+	 * check whether the target 4KB page exists in EPT leaf
+	 * entry.If it's there, just flag SPP bit of the entry,
+	 * defer the setup to SPPT miss induced vm-exit  handler.
+	 */
+	rmap_head = __gfn_to_rmap(gfn, PT_PAGE_TABLE_LEVEL, slot);
+
+	if (rmap_head->val) {
+		/*
+		 * if all subpages are not writable, open SPP bit in
+		 * EPT leaf entry to enable SPP protection for
+		 * corresponding page.
+		 */
+		if (access != FULL_SPP_ACCESS) {
+			ret = kvm_spp_open_write_protect(kvm, slot, gfn);
+			if (ret)
+				return ret;
+		} else {
+			ret = kvm_spp_clear_write_protect(kvm, slot, gfn);
+			if (ret)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
+void kvm_spp_free_memslot(struct kvm_memory_slot *free,
+			  struct kvm_memory_slot *dont)
+{
+	if (!dont || free->arch.subpage_wp_info !=
+	    dont->arch.subpage_wp_info) {
+		kvfree(free->arch.subpage_wp_info);
+		free->arch.subpage_wp_info = NULL;
+	}
+}
+
 inline u64 construct_spptp(unsigned long root_hpa)
 {
 	return root_hpa & PAGE_MASK;
diff --git a/arch/x86/kvm/mmu/spp.h b/arch/x86/kvm/mmu/spp.h
index 8ef94b7a2057..99d93fdc74b5 100644
--- a/arch/x86/kvm/mmu/spp.h
+++ b/arch/x86/kvm/mmu/spp.h
@@ -2,9 +2,21 @@
 #ifndef __KVM_X86_VMX_SPP_H
 #define __KVM_X86_VMX_SPP_H
 
+#define FULL_SPP_ACCESS		((u32)((1ULL << 32) - 1))
+
+int kvm_spp_get_permission(struct kvm *kvm, u64 gfn, u32 npages,
+			   u32 *access_map);
+int kvm_spp_set_permission(struct kvm *kvm, u64 gfn, u32 npages,
+			   u32 *access_map);
+int kvm_spp_mark_protection(struct kvm *kvm, u64 gfn, u32 access);
 bool is_spp_spte(struct kvm_mmu_page *sp);
 u64 construct_spptp(unsigned long root_hpa);
 int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
 			    u32 access_map, gfn_t gfn);
+int vmx_spp_flush_sppt(struct kvm *kvm, u64 gfn_base, u32 npages);
+void kvm_spp_free_memslot(struct kvm_memory_slot *free,
+			  struct kvm_memory_slot *dont);
+int vmx_spp_init(struct kvm *kvm);
+u32 *gfn_to_subpage_wp_info(struct kvm_memory_slot *slot, gfn_t gfn);
 
 #endif /* __KVM_X86_VMX_SPP_H */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f0a16b4adbbd..eabd55ec5af7 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -102,6 +102,14 @@ struct kvm_userspace_memory_region {
 	__u64 userspace_addr; /* start of the userspace allocated memory */
 };
 
+/* for KVM_SUBPAGES_GET_ACCESS and KVM_SUBPAGES_SET_ACCESS */
+struct kvm_subpage {
+	__u64 gfn_base; /* the first page gfn of the contiguous pages */
+	__u32 npages;   /* number of 4K pages */
+	__u32 flags;    /* reserved to 0 now */
+	__u32 access_map[0]; /* start place of bitmap array */
+};
+
 /*
  * The bit 0 ~ bit 15 of kvm_memory_region::flags are visible for userspace,
  * other bits are reserved for kvm internal use which are defined in
-- 
2.17.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RESEND PATCH v10 05/10] x86: spp: Introduce user-space SPP IOCTLs
  2020-01-02  6:13 [RESEND PATCH v10 00/10] Enable Sub-Page Write Protection Support Yang Weijiang
                   ` (3 preceding siblings ...)
  2020-01-02  6:13 ` [RESEND PATCH v10 04/10] mmu: spp: Add functions to operate SPP access bitmap Yang Weijiang
@ 2020-01-02  6:13 ` Yang Weijiang
  2020-01-10 18:10   ` Sean Christopherson
  2020-01-02  6:13 ` [RESEND PATCH v10 06/10] vmx: spp: Set up SPP paging table at vmentry/vmexit Yang Weijiang
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Yang Weijiang @ 2020-01-02  6:13 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, jmattson, sean.j.christopherson
  Cc: yu.c.zhang, alazar, edwin.zhai, Yang Weijiang

User application, e.g., QEMU or VMI, must initialize SPP
before gets/sets SPP subpages, the dynamic initialization is to
reduce the extra storage cost if the SPP feature is not not used.

Co-developed-by: He Chen <he.chen@linux.intel.com>
Signed-off-by: He Chen <he.chen@linux.intel.com>
Co-developed-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  4 ++
 arch/x86/kvm/mmu/spp.c          | 44 +++++++++++++++
 arch/x86/kvm/mmu/spp.h          |  9 ++++
 arch/x86/kvm/vmx/vmx.c          | 15 ++++++
 arch/x86/kvm/x86.c              | 95 ++++++++++++++++++++++++++++++++-
 include/uapi/linux/kvm.h        |  3 ++
 6 files changed, 169 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f5145b86d620..c7a9f03f39a7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1238,6 +1238,10 @@ struct kvm_x86_ops {
 
 	bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
 	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
+
+	int (*init_spp)(struct kvm *kvm);
+	int (*flush_subpages)(struct kvm *kvm, u64 gfn, u32 npages);
+	int (*get_inst_len)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/mmu/spp.c b/arch/x86/kvm/mmu/spp.c
index edab5ec83ef3..6f611e04e817 100644
--- a/arch/x86/kvm/mmu/spp.c
+++ b/arch/x86/kvm/mmu/spp.c
@@ -558,3 +558,47 @@ inline u64 construct_spptp(unsigned long root_hpa)
 }
 EXPORT_SYMBOL_GPL(construct_spptp);
 
+int kvm_vm_ioctl_get_subpages(struct kvm *kvm,
+			      u64 gfn,
+			      u32 npages,
+			      u32 *access_map)
+{
+	int ret;
+
+	mutex_lock(&kvm->slots_lock);
+	ret = kvm_spp_get_permission(kvm, gfn, npages, access_map);
+	mutex_unlock(&kvm->slots_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_vm_ioctl_get_subpages);
+
+int kvm_vm_ioctl_set_subpages(struct kvm *kvm,
+			      u64 gfn,
+			      u32 npages,
+			      u32 *access_map)
+{
+	int ret;
+
+	if (!kvm_x86_ops->flush_subpages)
+		return -EINVAL;
+
+	spin_lock(&kvm->mmu_lock);
+	ret = kvm_x86_ops->flush_subpages(kvm, gfn, npages);
+	spin_unlock(&kvm->mmu_lock);
+
+	if (ret < 0)
+		return ret;
+
+	mutex_lock(&kvm->slots_lock);
+	spin_lock(&kvm->mmu_lock);
+
+	ret = kvm_spp_set_permission(kvm, gfn, npages, access_map);
+
+	spin_unlock(&kvm->mmu_lock);
+	mutex_unlock(&kvm->slots_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_vm_ioctl_set_subpages);
+
diff --git a/arch/x86/kvm/mmu/spp.h b/arch/x86/kvm/mmu/spp.h
index 99d93fdc74b5..daa69bd274da 100644
--- a/arch/x86/kvm/mmu/spp.h
+++ b/arch/x86/kvm/mmu/spp.h
@@ -3,6 +3,7 @@
 #define __KVM_X86_VMX_SPP_H
 
 #define FULL_SPP_ACCESS		((u32)((1ULL << 32) - 1))
+#define KVM_SUBPAGE_MAX_PAGES   512
 
 int kvm_spp_get_permission(struct kvm *kvm, u64 gfn, u32 npages,
 			   u32 *access_map);
@@ -11,6 +12,14 @@ int kvm_spp_set_permission(struct kvm *kvm, u64 gfn, u32 npages,
 int kvm_spp_mark_protection(struct kvm *kvm, u64 gfn, u32 access);
 bool is_spp_spte(struct kvm_mmu_page *sp);
 u64 construct_spptp(unsigned long root_hpa);
+int kvm_vm_ioctl_get_subpages(struct kvm *kvm,
+			      u64 gfn,
+			      u32 npages,
+			      u32 *access_map);
+int kvm_vm_ioctl_set_subpages(struct kvm *kvm,
+			      u64 gfn,
+			      u32 npages,
+			      u32 *access_map);
 int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
 			    u32 access_map, gfn_t gfn);
 int vmx_spp_flush_sppt(struct kvm *kvm, u64 gfn_base, u32 npages);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5713e8a6224c..24e4e1c47f42 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1424,6 +1424,11 @@ static bool emulation_required(struct kvm_vcpu *vcpu)
 	return emulate_invalid_guest_state && !guest_state_valid(vcpu);
 }
 
+static int vmx_get_inst_len(struct kvm_vcpu *vcpu)
+{
+	return vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+}
+
 static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu);
 
 unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
@@ -7705,6 +7710,12 @@ static __init int hardware_setup(void)
 		kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
 	}
 
+	if (!spp_supported) {
+		kvm_x86_ops->flush_subpages = NULL;
+		kvm_x86_ops->init_spp = NULL;
+		kvm_x86_ops->get_inst_len = NULL;
+	}
+
 	if (!cpu_has_vmx_preemption_timer())
 		enable_preemption_timer = false;
 
@@ -7917,6 +7928,10 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.nested_get_evmcs_version = NULL,
 	.need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
 	.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
+
+	.flush_subpages = vmx_spp_flush_sppt,
+	.init_spp = vmx_spp_init,
+	.get_inst_len = vmx_get_inst_len,
 };
 
 static void vmx_cleanup_l1d_flush(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf917139de6b..fb7da000ceaf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -26,6 +26,7 @@
 #include "cpuid.h"
 #include "pmu.h"
 #include "hyperv.h"
+#include "mmu/spp.h"
 
 #include <linux/clocksource.h>
 #include <linux/interrupt.h>
@@ -3335,6 +3336,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_HYPERV_ENLIGHTENED_VMCS:
 		r = kvm_x86_ops->nested_enable_evmcs != NULL;
 		break;
+	case KVM_CAP_X86_SPP:
+		r = KVM_SUBPAGE_MAX_PAGES;
+		break;
 	default:
 		break;
 	}
@@ -4114,7 +4118,6 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 			return -ENOTTY;
 
 		return kvm_x86_ops->enable_direct_tlbflush(vcpu);
-
 	default:
 		return -EINVAL;
 	}
@@ -4830,6 +4833,9 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		kvm->arch.exception_payload_enabled = cap->args[0];
 		r = 0;
 		break;
+	case KVM_CAP_X86_SPP:
+		r =  kvm_x86_ops->init_spp(kvm);
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -5134,6 +5140,93 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	case KVM_SET_PMU_EVENT_FILTER:
 		r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp);
 		break;
+	case KVM_SUBPAGES_GET_ACCESS: {
+		struct kvm_subpage spp_info, *pinfo;
+		u32 total;
+
+		r = -ENODEV;
+		if (!kvm->arch.spp_active)
+			goto out;
+
+		r = -EFAULT;
+		if (copy_from_user(&spp_info, argp, sizeof(spp_info)))
+			goto out;
+
+		r = -EINVAL;
+		if (spp_info.flags != 0 ||
+		    spp_info.npages > KVM_SUBPAGE_MAX_PAGES)
+			goto out;
+		r = 0;
+		if (!spp_info.npages)
+			goto out;
+
+		total = sizeof(spp_info) +
+			sizeof(spp_info.access_map[0]) * spp_info.npages;
+		pinfo = kvzalloc(total, GFP_KERNEL_ACCOUNT);
+
+		r = -ENOMEM;
+		if (!pinfo)
+			goto out;
+
+		r = -EFAULT;
+		if (copy_from_user(pinfo, argp, total))
+			goto out;
+
+		r = kvm_vm_ioctl_get_subpages(kvm,
+					      pinfo->gfn_base,
+					      pinfo->npages,
+					      pinfo->access_map);
+		if (r != pinfo->npages)
+			goto out;
+
+		r = -EFAULT;
+		if (copy_to_user(argp, pinfo, total))
+			goto out;
+
+		r = pinfo->npages;
+		kfree(pinfo);
+		break;
+	}
+	case KVM_SUBPAGES_SET_ACCESS: {
+		struct kvm_subpage spp_info, *pinfo;
+		u32 total;
+
+		r = -ENODEV;
+		if (!kvm->arch.spp_active)
+			goto out;
+
+		r = -EFAULT;
+		if (copy_from_user(&spp_info, argp, sizeof(spp_info)))
+			goto out;
+
+		r = -EINVAL;
+		if (spp_info.flags != 0 ||
+		    spp_info.npages > KVM_SUBPAGE_MAX_PAGES)
+			goto out;
+
+		r = 0;
+		if (!spp_info.npages)
+			goto out;
+
+		total = sizeof(spp_info) +
+			sizeof(spp_info.access_map[0]) * spp_info.npages;
+		pinfo = kvzalloc(total, GFP_KERNEL_ACCOUNT);
+
+		r = -ENOMEM;
+		if (!pinfo)
+			goto out;
+
+		r = -EFAULT;
+		if (copy_from_user(pinfo, argp, total))
+			goto out;
+
+		r = kvm_vm_ioctl_set_subpages(kvm,
+					      pinfo->gfn_base,
+					      pinfo->npages,
+					      pinfo->access_map);
+		kfree(pinfo);
+		break;
+	}
 	default:
 		r = -ENOTTY;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index eabd55ec5af7..09e5e8e6e6dd 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1017,6 +1017,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PPC_GUEST_DEBUG_SSTEP 176
 #define KVM_CAP_ARM_NISV_TO_USER 177
 #define KVM_CAP_ARM_INJECT_EXT_DABT 178
+#define KVM_CAP_X86_SPP 179
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1272,6 +1273,8 @@ struct kvm_vfio_spapr_tce {
 					struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR          _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
+#define KVM_SUBPAGES_GET_ACCESS   _IOR(KVMIO,  0x49, __u64)
+#define KVM_SUBPAGES_SET_ACCESS   _IOW(KVMIO,  0x4a, __u64)
 
 /* enable ucontrol for s390 */
 struct kvm_s390_ucas_mapping {
-- 
2.17.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RESEND PATCH v10 06/10] vmx: spp: Set up SPP paging table at vmentry/vmexit
  2020-01-02  6:13 [RESEND PATCH v10 00/10] Enable Sub-Page Write Protection Support Yang Weijiang
                   ` (4 preceding siblings ...)
  2020-01-02  6:13 ` [RESEND PATCH v10 05/10] x86: spp: Introduce user-space SPP IOCTLs Yang Weijiang
@ 2020-01-02  6:13 ` Yang Weijiang
  2020-01-10 17:55   ` Sean Christopherson
  2020-01-10 18:04   ` Sean Christopherson
  2020-01-02  6:13 ` [RESEND PATCH v10 07/10] mmu: spp: Enable Lazy mode SPP protection Yang Weijiang
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 32+ messages in thread
From: Yang Weijiang @ 2020-01-02  6:13 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, jmattson, sean.j.christopherson
  Cc: yu.c.zhang, alazar, edwin.zhai, Yang Weijiang

If write to subpage is not allowed, EPT violation generates
and it's handled in fast_page_fault().

In current implementation, SPPT setup is only handled in handle_spp()
vmexit handler, it's triggered when SPP bit is set in EPT leaf
entry while SPPT entries are not ready.

A SPP specific bit(11) is added to exit_qualification and a new
exit reason(66) is introduced for SPP.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Co-developed-by: He Chen <he.chen@linux.intel.com>
Signed-off-by: He Chen <he.chen@linux.intel.com>
Co-developed-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/vmx.h      |  9 ++++
 arch/x86/include/uapi/asm/vmx.h |  2 +
 arch/x86/kvm/mmu/mmu.c          | 69 +++++++++++++++++++++++++---
 arch/x86/kvm/mmu/spp.c          | 13 ++++++
 arch/x86/kvm/mmu/spp.h          |  2 +
 arch/x86/kvm/trace.h            | 66 +++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c          | 80 ++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c              |  4 ++
 include/uapi/linux/kvm.h        |  6 +++
 9 files changed, 244 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index de79a4de0723..81faf6607a9e 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -213,6 +213,8 @@ enum vmcs_field {
 	XSS_EXIT_BITMAP_HIGH            = 0x0000202D,
 	ENCLS_EXITING_BITMAP		= 0x0000202E,
 	ENCLS_EXITING_BITMAP_HIGH	= 0x0000202F,
+	SPPT_POINTER			= 0x00002030,
+	SPPT_POINTER_HIGH		= 0x00002031,
 	TSC_MULTIPLIER                  = 0x00002032,
 	TSC_MULTIPLIER_HIGH             = 0x00002033,
 	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
@@ -534,6 +536,13 @@ struct vmx_msr_entry {
 #define EPT_VIOLATION_EXECUTABLE	(1 << EPT_VIOLATION_EXECUTABLE_BIT)
 #define EPT_VIOLATION_GVA_TRANSLATED	(1 << EPT_VIOLATION_GVA_TRANSLATED_BIT)
 
+/*
+ * Exit Qualifications for SPPT-Induced vmexits
+ */
+#define SPPT_INDUCED_EXIT_TYPE_BIT     11
+#define SPPT_INDUCED_EXIT_TYPE         (1 << SPPT_INDUCED_EXIT_TYPE_BIT)
+#define SPPT_INTR_INFO_UNBLOCK_NMI     INTR_INFO_UNBLOCK_NMI
+
 /*
  * VM-instruction error numbers
  */
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index 3eb8411ab60e..4833a39a799b 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -86,6 +86,7 @@
 #define EXIT_REASON_PML_FULL            62
 #define EXIT_REASON_XSAVES              63
 #define EXIT_REASON_XRSTORS             64
+#define EXIT_REASON_SPP                 66
 #define EXIT_REASON_UMWAIT              67
 #define EXIT_REASON_TPAUSE              68
 
@@ -145,6 +146,7 @@
 	{ EXIT_REASON_ENCLS,                 "ENCLS" }, \
 	{ EXIT_REASON_RDSEED,                "RDSEED" }, \
 	{ EXIT_REASON_PML_FULL,              "PML_FULL" }, \
+	{ EXIT_REASON_SPP,                   "SPP" }, \
 	{ EXIT_REASON_XSAVES,                "XSAVES" }, \
 	{ EXIT_REASON_XRSTORS,               "XRSTORS" }, \
 	{ EXIT_REASON_UMWAIT,                "UMWAIT" }, \
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6f92b40d798c..c41791ebee65 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -20,6 +20,7 @@
 #include "x86.h"
 #include "kvm_cache_regs.h"
 #include "cpuid.h"
+#include "spp.h"
 
 #include <linux/kvm_host.h>
 #include <linux/types.h>
@@ -177,6 +178,7 @@ module_param(dbg, bool, 0644);
 /* The mask for the R/X bits in EPT PTEs */
 #define PT64_EPT_READABLE_MASK			0x1ull
 #define PT64_EPT_EXECUTABLE_MASK		0x4ull
+#define PT64_SPP_SAVED_BIT	(1ULL << (PT64_SECOND_AVAIL_BITS_SHIFT + 1))
 
 #include <trace/events/kvm.h>
 
@@ -200,6 +202,7 @@ enum {
 	RET_PF_RETRY = 0,
 	RET_PF_EMULATE = 1,
 	RET_PF_INVALID = 2,
+	RET_PF_USERSPACE = 3,
 };
 
 struct pte_list_desc {
@@ -979,6 +982,11 @@ static u64 mark_spte_for_access_track(u64 spte)
 		shadow_acc_track_saved_bits_shift;
 	spte &= ~shadow_acc_track_mask;
 
+	if (spte & PT_SPP_MASK) {
+		spte &= ~PT_SPP_MASK;
+		spte |= PT64_SPP_SAVED_BIT;
+	}
+
 	return spte;
 }
 
@@ -1677,9 +1685,15 @@ static bool spte_wrprot_for_clear_dirty(u64 *sptep)
 {
 	bool was_writable = test_and_clear_bit(PT_WRITABLE_SHIFT,
 					       (unsigned long *)sptep);
+	bool was_spp_armed = test_and_clear_bit(PT_SPP_SHIFT,
+					       (unsigned long *)sptep);
+
 	if (was_writable && !spte_ad_enabled(*sptep))
 		kvm_set_pfn_dirty(spte_to_pfn(*sptep));
 
+	if (was_spp_armed)
+		*sptep |= PT64_SPP_SAVED_BIT;
+
 	return was_writable;
 }
 
@@ -3478,7 +3492,8 @@ static bool page_fault_can_be_fast(u32 error_code)
  */
 static bool
 fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
-			u64 *sptep, u64 old_spte, u64 new_spte)
+			u64 *sptep, u64 old_spte, u64 new_spte,
+			bool spp_protected)
 {
 	gfn_t gfn;
 
@@ -3499,7 +3514,8 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	if (cmpxchg64(sptep, old_spte, new_spte) != old_spte)
 		return false;
 
-	if (is_writable_pte(new_spte) && !is_writable_pte(old_spte)) {
+	if ((is_writable_pte(new_spte) && !is_writable_pte(old_spte)) ||
+	    spp_protected) {
 		/*
 		 * The gfn of direct spte is stable since it is
 		 * calculated by sp->gfn.
@@ -3534,6 +3550,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 	struct kvm_shadow_walk_iterator iterator;
 	struct kvm_mmu_page *sp;
 	bool fault_handled = false;
+	bool spp_protected = false;
 	u64 spte = 0ull;
 	uint retry_count = 0;
 
@@ -3585,7 +3602,30 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 		if ((error_code & PFERR_WRITE_MASK) &&
 		    spte_can_locklessly_be_made_writable(spte))
 		{
-			new_spte |= PT_WRITABLE_MASK;
+			/*
+			 * Record write protect fault caused by
+			 * Sub-page Protection, let VMI decide
+			 * the next step.
+			 */
+			if (spte & PT_SPP_MASK) {
+				int len = kvm_x86_ops->get_inst_len(vcpu);
+
+				fault_handled = true;
+				vcpu->run->exit_reason = KVM_EXIT_SPP;
+				vcpu->run->spp.addr = gva;
+				vcpu->run->spp.ins_len = len;
+				trace_kvm_spp_induced_page_fault(vcpu,
+								 gva,
+								 len);
+				break;
+			}
+
+			if (was_spp_armed(new_spte)) {
+				restore_spp_bit(&new_spte);
+				spp_protected = true;
+			} else {
+				new_spte |= PT_WRITABLE_MASK;
+			}
 
 			/*
 			 * Do not fix write-permission on the large spte.  Since
@@ -3604,7 +3644,8 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 
 		/* Verify that the fault can be handled in the fast path */
 		if (new_spte == spte ||
-		    !is_access_allowed(error_code, new_spte))
+		    (!is_access_allowed(error_code, new_spte) &&
+		    !spp_protected))
 			break;
 
 		/*
@@ -3614,7 +3655,8 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 		 */
 		fault_handled = fast_pf_fix_direct_spte(vcpu, sp,
 							iterator.sptep, spte,
-							new_spte);
+							new_spte,
+							spp_protected);
 		if (fault_handled)
 			break;
 
@@ -3740,6 +3782,12 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 		    (mmu->root_level >= PT64_ROOT_4LEVEL || mmu->direct_map)) {
 			mmu_free_root_page(vcpu->kvm, &mmu->root_hpa,
 					   &invalid_list);
+			if (vcpu->kvm->arch.spp_active) {
+				vcpu->kvm->arch.spp_active = false;
+				mmu_free_root_page(vcpu->kvm,
+						   &vcpu->kvm->arch.sppt_root,
+						   &invalid_list);
+			}
 		} else {
 			for (i = 0; i < 4; ++i)
 				if (mmu->pae_root[i] != 0)
@@ -5223,6 +5271,8 @@ void kvm_init_mmu(struct kvm_vcpu *vcpu, bool reset_roots)
 		uint i;
 
 		vcpu->arch.mmu->root_hpa = INVALID_PAGE;
+		if (!vcpu->kvm->arch.spp_active)
+			vcpu->kvm->arch.sppt_root = INVALID_PAGE;
 
 		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
 			vcpu->arch.mmu->prev_roots[i] = KVM_MMU_ROOT_INFO_INVALID;
@@ -5539,6 +5589,10 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
 		r = vcpu->arch.mmu->page_fault(vcpu, cr2,
 					       lower_32_bits(error_code),
 					       false);
+
+		if (vcpu->run->exit_reason == KVM_EXIT_SPP)
+			r = RET_PF_USERSPACE;
+
 		WARN_ON(r == RET_PF_INVALID);
 	}
 
@@ -5546,7 +5600,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
 		return 1;
 	if (r < 0)
 		return r;
-
+	if (r == RET_PF_USERSPACE)
+		return 0;
 	/*
 	 * Before emulating the instruction, check if the error code
 	 * was due to a RO violation while translating the guest page.
@@ -6372,6 +6427,8 @@ unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
 	return nr_mmu_pages;
 }
 
+#include "spp.c"
+
 void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
 {
 	kvm_mmu_unload(vcpu);
diff --git a/arch/x86/kvm/mmu/spp.c b/arch/x86/kvm/mmu/spp.c
index 6f611e04e817..3ec434140967 100644
--- a/arch/x86/kvm/mmu/spp.c
+++ b/arch/x86/kvm/mmu/spp.c
@@ -17,6 +17,18 @@ static void shadow_spp_walk_init(struct kvm_shadow_walk_iterator *iterator,
 	iterator->level = PT64_ROOT_4LEVEL;
 }
 
+/* Restore an spp armed PTE */
+void restore_spp_bit(u64 *spte)
+{
+	*spte &= ~PT64_SPP_SAVED_BIT;
+	*spte |= PT_SPP_MASK;
+}
+
+bool was_spp_armed(u64 spte)
+{
+	return !!(spte & PT64_SPP_SAVED_BIT);
+}
+
 u32 *gfn_to_subpage_wp_info(struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	unsigned long idx;
@@ -459,6 +471,7 @@ int kvm_spp_set_permission(struct kvm *kvm, u64 gfn, u32 npages,
 		if (!access)
 			return -EFAULT;
 		*access = access_map[i];
+		trace_kvm_spp_set_subpages(vcpu, gfn, *access);
 	}
 
 	gfn = old_gfn;
diff --git a/arch/x86/kvm/mmu/spp.h b/arch/x86/kvm/mmu/spp.h
index daa69bd274da..5ffe06d2cd6f 100644
--- a/arch/x86/kvm/mmu/spp.h
+++ b/arch/x86/kvm/mmu/spp.h
@@ -11,6 +11,8 @@ int kvm_spp_set_permission(struct kvm *kvm, u64 gfn, u32 npages,
 			   u32 *access_map);
 int kvm_spp_mark_protection(struct kvm *kvm, u64 gfn, u32 access);
 bool is_spp_spte(struct kvm_mmu_page *sp);
+void restore_spp_bit(u64 *spte);
+bool was_spp_armed(u64 spte);
 u64 construct_spptp(unsigned long root_hpa);
 int kvm_vm_ioctl_get_subpages(struct kvm *kvm,
 			      u64 gfn,
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 7c741a0c5f80..293b91d516b1 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1496,6 +1496,72 @@ TRACE_EVENT(kvm_nested_vmenter_failed,
 		__print_symbolic(__entry->err, VMX_VMENTER_INSTRUCTION_ERRORS))
 );
 
+TRACE_EVENT(kvm_spp_set_subpages,
+	TP_PROTO(struct kvm_vcpu *vcpu, gfn_t gfn, u32 access),
+	TP_ARGS(vcpu, gfn, access),
+
+	TP_STRUCT__entry(
+		__field(int, vcpu_id)
+		__field(gfn_t, gfn)
+		__field(u32, access)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id = vcpu->vcpu_id;
+		__entry->gfn = gfn;
+		__entry->access = access;
+	),
+
+	TP_printk("vcpu %d gfn %llx access %x",
+		  __entry->vcpu_id,
+		  __entry->gfn,
+		  __entry->access)
+);
+
+TRACE_EVENT(kvm_spp_induced_exit,
+	TP_PROTO(struct kvm_vcpu *vcpu, gpa_t gpa, u32 qualification),
+	TP_ARGS(vcpu, gpa, qualification),
+
+	TP_STRUCT__entry(
+		__field(int, vcpu_id)
+		__field(gpa_t, gpa)
+		__field(u32, qualification)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id = vcpu->vcpu_id;
+		__entry->gpa = gpa;
+		__entry->qualification = qualification;
+	),
+
+	TP_printk("vcpu %d gpa %llx qualificaiton %x",
+		  __entry->vcpu_id,
+		  __entry->gpa,
+		  __entry->qualification)
+);
+
+TRACE_EVENT(kvm_spp_induced_page_fault,
+	TP_PROTO(struct kvm_vcpu *vcpu, gpa_t gpa, int inst_len),
+	TP_ARGS(vcpu, gpa, inst_len),
+
+	TP_STRUCT__entry(
+		__field(int, vcpu_id)
+		__field(gpa_t, gpa)
+		__field(int, inst_len)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id = vcpu->vcpu_id;
+		__entry->gpa = gpa;
+		__entry->inst_len = inst_len;
+	),
+
+	TP_printk("vcpu %d gpa %llx inst_len %d",
+		  __entry->vcpu_id,
+		  __entry->gpa,
+		  __entry->inst_len)
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 24e4e1c47f42..97d862c79124 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -200,7 +200,6 @@ static const struct {
 	[VMENTER_L1D_FLUSH_EPT_DISABLED] = {"EPT disabled", false},
 	[VMENTER_L1D_FLUSH_NOT_REQUIRED] = {"not required", false},
 };
-
 #define L1D_CACHE_ORDER 4
 static void *vmx_l1d_flush_pages;
 
@@ -2999,6 +2998,7 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	bool update_guest_cr3 = true;
 	unsigned long guest_cr3;
 	u64 eptp;
+	u64 spptp;
 
 	guest_cr3 = cr3;
 	if (enable_ept) {
@@ -3027,6 +3027,12 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 
 	if (update_guest_cr3)
 		vmcs_writel(GUEST_CR3, guest_cr3);
+
+	if (kvm->arch.spp_active && VALID_PAGE(vcpu->kvm->arch.sppt_root)) {
+		spptp = construct_spptp(vcpu->kvm->arch.sppt_root);
+		vmcs_write64(SPPT_POINTER, spptp);
+		vmx_flush_tlb(vcpu, true);
+	}
 }
 
 int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
@@ -5361,6 +5367,74 @@ static int handle_monitor_trap(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+int handle_spp(struct kvm_vcpu *vcpu)
+{
+	unsigned long exit_qualification;
+	struct kvm_memory_slot *slot;
+	gpa_t gpa;
+	gfn_t gfn;
+
+	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+
+	/*
+	 * SPP VM exit happened while executing iret from NMI,
+	 * "blocked by NMI" bit has to be set before next VM entry.
+	 * There are errata that may cause this bit to not be set:
+	 * AAK134, BY25.
+	 */
+	if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
+	    (exit_qualification & SPPT_INTR_INFO_UNBLOCK_NMI))
+		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
+			      GUEST_INTR_STATE_NMI);
+
+	vcpu->arch.exit_qualification = exit_qualification;
+	if (exit_qualification & SPPT_INDUCED_EXIT_TYPE) {
+		int page_num = KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL);
+		u32 *access;
+		gfn_t gfn_max;
+
+		/*
+		 * SPPT missing
+		 * We don't set SPP write access for the corresponding
+		 * GPA, if we haven't setup, we need to construct
+		 * SPP table here.
+		 */
+		gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+		gfn = gpa >> PAGE_SHIFT;
+		trace_kvm_spp_induced_exit(vcpu, gpa, exit_qualification);
+		/*
+		 * In level 1 of SPPT, there's no PRESENT bit, all data is
+		 * regarded as permission vector, so need to check from
+		 * level 2 to set up the vector if target page is protected.
+		 */
+		spin_lock(&vcpu->kvm->mmu_lock);
+		gfn &= ~(page_num - 1);
+		gfn_max = gfn + page_num - 1;
+		for (; gfn <= gfn_max; gfn++) {
+			slot = gfn_to_memslot(vcpu->kvm, gfn);
+			if (!slot)
+				continue;
+			access = gfn_to_subpage_wp_info(slot, gfn);
+			if (access && *access != FULL_SPP_ACCESS)
+				kvm_spp_setup_structure(vcpu,
+							*access,
+							gfn);
+		}
+		spin_unlock(&vcpu->kvm->mmu_lock);
+		return 1;
+	}
+	/*
+	 * SPPT Misconfig
+	 * This is probably caused by some mis-configuration in SPPT
+	 * entries, cannot handle it here, escalate the fault to
+	 * emulator.
+	 */
+	WARN_ON(1);
+	vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
+	vcpu->run->hw.hardware_exit_reason = EXIT_REASON_SPP;
+	return 0;
+}
+
 static int handle_monitor(struct kvm_vcpu *vcpu)
 {
 	printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
@@ -5575,6 +5649,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_INVVPID]                 = handle_vmx_instruction,
 	[EXIT_REASON_RDRAND]                  = handle_invalid_op,
 	[EXIT_REASON_RDSEED]                  = handle_invalid_op,
+	[EXIT_REASON_SPP]                     = handle_spp,
 	[EXIT_REASON_PML_FULL]		      = handle_pml_full,
 	[EXIT_REASON_INVPCID]                 = handle_invpcid,
 	[EXIT_REASON_VMFUNC]		      = handle_vmx_instruction,
@@ -5807,6 +5882,9 @@ void dump_vmcs(void)
 		pr_err("PostedIntrVec = 0x%02x\n", vmcs_read16(POSTED_INTR_NV));
 	if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT))
 		pr_err("EPT pointer = 0x%016llx\n", vmcs_read64(EPT_POINTER));
+	if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_SPP))
+		pr_err("SPPT pointer = 0x%016llx\n", vmcs_read64(SPPT_POINTER));
+
 	n = vmcs_read32(CR3_TARGET_COUNT);
 	for (i = 0; i + 1 < n; i += 4)
 		pr_err("CR3 target%u=%016lx target%u=%016lx\n",
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fb7da000ceaf..a9d7fc21dad6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9782,6 +9782,7 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 	}
 
 	kvm_page_track_free_memslot(free, dont);
+	kvm_spp_free_memslot(free, dont);
 }
 
 int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
@@ -10406,3 +10407,6 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pml_full);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pi_irte_update);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_spp_set_subpages);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_spp_induced_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_spp_induced_page_fault);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 09e5e8e6e6dd..c0f3162ee46a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -244,6 +244,7 @@ struct kvm_hyperv_exit {
 #define KVM_EXIT_IOAPIC_EOI       26
 #define KVM_EXIT_HYPERV           27
 #define KVM_EXIT_ARM_NISV         28
+#define KVM_EXIT_SPP              29
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -401,6 +402,11 @@ struct kvm_run {
 		struct {
 			__u8 vector;
 		} eoi;
+		/* KVM_EXIT_SPP */
+		struct {
+			__u64 addr;
+			__u8 ins_len;
+		} spp;
 		/* KVM_EXIT_HYPERV */
 		struct kvm_hyperv_exit hyperv;
 		/* KVM_EXIT_ARM_NISV */
-- 
2.17.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RESEND PATCH v10 07/10] mmu: spp: Enable Lazy mode SPP protection
  2020-01-02  6:13 [RESEND PATCH v10 00/10] Enable Sub-Page Write Protection Support Yang Weijiang
                   ` (5 preceding siblings ...)
  2020-01-02  6:13 ` [RESEND PATCH v10 06/10] vmx: spp: Set up SPP paging table at vmentry/vmexit Yang Weijiang
@ 2020-01-02  6:13 ` Yang Weijiang
  2020-01-02  6:13 ` [RESEND PATCH v10 08/10] mmu: spp: Handle SPP protected pages when VM memory changes Yang Weijiang
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2020-01-02  6:13 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, jmattson, sean.j.christopherson
  Cc: yu.c.zhang, alazar, edwin.zhai, Yang Weijiang

To deal with SPP protected 4KB pages within hugepage(2MB,1GB etc),
the hugepage entry is first zapped when set subpage permission, then
in tdp_page_fault(), it checks whether the gfn should be mapped to
PT_PAGE_TABLE_LEVEL or PT_DIRECTORY_LEVEL level depending on gfn
inclusion of SPP protected page range.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/mmu/mmu.c | 20 ++++++++++++++++++++
 arch/x86/kvm/mmu/spp.c | 43 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/mmu/spp.h |  2 ++
 3 files changed, 65 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c41791ebee65..aada0a3552b2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3246,6 +3246,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
 	unsigned access = sp->role.access;
 	int i, ret;
 	gfn_t gfn;
+	u32 *wp_bitmap;
 
 	gfn = kvm_mmu_page_get_gfn(sp, start - sp->spt);
 	slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, access & ACC_WRITE_MASK);
@@ -3259,6 +3260,13 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
 	for (i = 0; i < ret; i++, gfn++, start++) {
 		mmu_set_spte(vcpu, start, access, 0, sp->role.level, gfn,
 			     page_to_pfn(pages[i]), true, true);
+		if (vcpu->kvm->arch.spp_active) {
+			wp_bitmap = gfn_to_subpage_wp_info(slot, gfn);
+			if (wp_bitmap && *wp_bitmap != FULL_SPP_ACCESS)
+				kvm_spp_mark_protection(vcpu->kvm,
+							gfn,
+							*wp_bitmap);
+		}
 		put_page(pages[i]);
 	}
 
@@ -3372,6 +3380,15 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write,
 			   map_writable);
 	direct_pte_prefetch(vcpu, it.sptep);
 	++vcpu->stat.pf_fixed;
+	if (level == PT_PAGE_TABLE_LEVEL) {
+		int ret;
+		u32 access;
+
+		ret = kvm_spp_get_permission(vcpu->kvm, gfn, 1, &access);
+		if (ret == 1  && access != FULL_SPP_ACCESS)
+			kvm_spp_mark_protection(vcpu->kvm, gfn, access);
+	}
+
 	return ret;
 }
 
@@ -4338,6 +4355,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 		if (level > PT_DIRECTORY_LEVEL &&
 		    !check_hugepage_cache_consistency(vcpu, gfn, level))
 			level = PT_DIRECTORY_LEVEL;
+
+		check_spp_protection(vcpu, gfn, &force_pt_level, &level);
+
 		gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
 	}
 
diff --git a/arch/x86/kvm/mmu/spp.c b/arch/x86/kvm/mmu/spp.c
index 3ec434140967..f63541b42385 100644
--- a/arch/x86/kvm/mmu/spp.c
+++ b/arch/x86/kvm/mmu/spp.c
@@ -571,6 +571,49 @@ inline u64 construct_spptp(unsigned long root_hpa)
 }
 EXPORT_SYMBOL_GPL(construct_spptp);
 
+static bool is_spp_protected(struct kvm_memory_slot *slot, gfn_t gfn, int level)
+{
+	int page_num = KVM_PAGES_PER_HPAGE(level);
+	u32 *access;
+	gfn_t gfn_max;
+
+	gfn &= ~(page_num - 1);
+	gfn_max = gfn + page_num - 1;
+	for (; gfn <= gfn_max; gfn++) {
+		access = gfn_to_subpage_wp_info(slot, gfn);
+		if (access && *access != FULL_SPP_ACCESS)
+			return true;
+	}
+	return false;
+}
+
+bool check_spp_protection(struct kvm_vcpu *vcpu, gfn_t gfn,
+			  bool *force_pt_level, int *level)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_memory_slot *slot;
+	bool protected;
+	int old_level = *level;
+
+	if (!kvm->arch.spp_active)
+		return false;
+
+	slot = gfn_to_memslot(kvm, gfn);
+
+	if (!slot)
+		return false;
+	protected = is_spp_protected(slot, gfn, PT_DIRECTORY_LEVEL);
+
+	if (protected) {
+		*level = PT_PAGE_TABLE_LEVEL;
+		*force_pt_level = true;
+	} else if (*level == PT_PDPE_LEVEL &&
+		   is_spp_protected(slot, gfn, PT_PDPE_LEVEL))
+		*level = PT_DIRECTORY_LEVEL;
+
+	return (old_level != *level);
+}
+
 int kvm_vm_ioctl_get_subpages(struct kvm *kvm,
 			      u64 gfn,
 			      u32 npages,
diff --git a/arch/x86/kvm/mmu/spp.h b/arch/x86/kvm/mmu/spp.h
index 5ffe06d2cd6f..0baf0f9ac135 100644
--- a/arch/x86/kvm/mmu/spp.h
+++ b/arch/x86/kvm/mmu/spp.h
@@ -13,6 +13,8 @@ int kvm_spp_mark_protection(struct kvm *kvm, u64 gfn, u32 access);
 bool is_spp_spte(struct kvm_mmu_page *sp);
 void restore_spp_bit(u64 *spte);
 bool was_spp_armed(u64 spte);
+bool check_spp_protection(struct kvm_vcpu *vcpu, gfn_t gfn,
+			  bool *force_pt_level, int *level);
 u64 construct_spptp(unsigned long root_hpa);
 int kvm_vm_ioctl_get_subpages(struct kvm *kvm,
 			      u64 gfn,
-- 
2.17.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RESEND PATCH v10 08/10] mmu: spp: Handle SPP protected pages when VM memory changes
  2020-01-02  6:13 [RESEND PATCH v10 00/10] Enable Sub-Page Write Protection Support Yang Weijiang
                   ` (6 preceding siblings ...)
  2020-01-02  6:13 ` [RESEND PATCH v10 07/10] mmu: spp: Enable Lazy mode SPP protection Yang Weijiang
@ 2020-01-02  6:13 ` Yang Weijiang
  2020-01-02  6:13 ` [RESEND PATCH v10 09/10] x86: spp: Add SPP protection check in emulation Yang Weijiang
  2020-01-02  6:13 ` [RESEND PATCH v10 10/10] kvm: selftests: selftest for Sub-Page protection Yang Weijiang
  9 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2020-01-02  6:13 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, jmattson, sean.j.christopherson
  Cc: yu.c.zhang, alazar, edwin.zhai, Yang Weijiang

Host page swapping/migration may change the translation in
EPT leaf entry, if the target page is SPP protected,
re-enable SPP protection in MMU notifier. If SPPT shadow
page is reclaimed, the level1 pages don't have rmap to clear.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/mmu/mmu.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index aada0a3552b2..3fcc6ee71f15 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1913,6 +1913,19 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 			new_spte &= ~PT_WRITABLE_MASK;
 			new_spte &= ~SPTE_HOST_WRITEABLE;
 
+			/*
+			 * if it's EPT leaf entry and the physical page is
+			 * SPP protected, then re-enable SPP protection for
+			 * the page.
+			 */
+			if (kvm->arch.spp_active &&
+			    level == PT_PAGE_TABLE_LEVEL) {
+				u32 *access = gfn_to_subpage_wp_info(slot, gfn);
+
+				if (access && *access != FULL_SPP_ACCESS)
+					new_spte |= PT_SPP_MASK;
+			}
+
 			new_spte = mark_spte_for_access_track(new_spte);
 
 			mmu_spte_clear_track_bits(sptep);
@@ -2763,6 +2776,10 @@ static bool mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 	pte = *spte;
 	if (is_shadow_present_pte(pte)) {
 		if (is_last_spte(pte, sp->role.level)) {
+			/* SPPT leaf entries don't have rmaps*/
+			if (sp->role.level == PT_PAGE_TABLE_LEVEL &&
+			    is_spp_spte(sp))
+				return true;
 			drop_spte(kvm, spte);
 			if (is_large_pte(pte))
 				--kvm->stat.lpages;
-- 
2.17.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RESEND PATCH v10 09/10] x86: spp: Add SPP protection check in emulation
  2020-01-02  6:13 [RESEND PATCH v10 00/10] Enable Sub-Page Write Protection Support Yang Weijiang
                   ` (7 preceding siblings ...)
  2020-01-02  6:13 ` [RESEND PATCH v10 08/10] mmu: spp: Handle SPP protected pages when VM memory changes Yang Weijiang
@ 2020-01-02  6:13 ` Yang Weijiang
  2020-01-02  6:13 ` [RESEND PATCH v10 10/10] kvm: selftests: selftest for Sub-Page protection Yang Weijiang
  9 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2020-01-02  6:13 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, jmattson, sean.j.christopherson
  Cc: yu.c.zhang, alazar, edwin.zhai, Yang Weijiang

In instruction/mmio emulation cases, if the target write memroy
is SPP protected, exit to user-space to handle it as if it's
caused by SPP induced EPT violation due to guest write.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/x86.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a9d7fc21dad6..f2fb95249bf6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5731,6 +5731,37 @@ static const struct read_write_emulator_ops write_emultor = {
 	.write = true,
 };
 
+static bool is_emulator_spp_protected(struct kvm_vcpu *vcpu,
+				      gpa_t gpa,
+				      unsigned int bytes)
+{
+	gfn_t gfn, start_gfn, end_gfn;
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_memory_slot *slot;
+	u32 *access;
+
+	if (!kvm->arch.spp_active)
+		return false;
+
+	start_gfn = gpa >> PAGE_SHIFT;
+	end_gfn = (gpa + bytes) >> PAGE_SHIFT;
+	for (gfn = start_gfn; gfn <= end_gfn; gfn++) {
+		slot = gfn_to_memslot(kvm, gfn);
+		if (slot) {
+			access = gfn_to_subpage_wp_info(slot, gfn);
+			if (access && *access != FULL_SPP_ACCESS) {
+				vcpu->run->exit_reason = KVM_EXIT_SPP;
+				vcpu->run->spp.addr = gfn;
+				vcpu->run->spp.ins_len =
+					kvm_x86_ops->get_inst_len(vcpu);
+				return true;
+			}
+		}
+	}
+
+	return false;
+}
+
 static int emulator_read_write_onepage(unsigned long addr, void *val,
 				       unsigned int bytes,
 				       struct x86_exception *exception,
@@ -5761,6 +5792,9 @@ static int emulator_read_write_onepage(unsigned long addr, void *val,
 			return X86EMUL_PROPAGATE_FAULT;
 	}
 
+	if (write && is_emulator_spp_protected(vcpu, gpa, bytes))
+		return X86EMUL_UNHANDLEABLE;
+
 	if (!ret && ops->read_write_emulate(vcpu, gpa, val, bytes))
 		return X86EMUL_CONTINUE;
 
@@ -6837,6 +6871,9 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
 		return 1;
 
 	if (r == EMULATION_FAILED) {
+		if (vcpu->run->exit_reason == KVM_EXIT_SPP)
+			return 0;
+
 		if (reexecute_instruction(vcpu, cr2, write_fault_to_spt,
 					emulation_type))
 			return 1;
-- 
2.17.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RESEND PATCH v10 10/10] kvm: selftests: selftest for Sub-Page protection
  2020-01-02  6:13 [RESEND PATCH v10 00/10] Enable Sub-Page Write Protection Support Yang Weijiang
                   ` (8 preceding siblings ...)
  2020-01-02  6:13 ` [RESEND PATCH v10 09/10] x86: spp: Add SPP protection check in emulation Yang Weijiang
@ 2020-01-02  6:13 ` Yang Weijiang
  9 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2020-01-02  6:13 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, jmattson, sean.j.christopherson
  Cc: yu.c.zhang, alazar, edwin.zhai, Yang Weijiang

Sub-Page Permission(SPP) is to protect finer granularity subpages
(128Byte each) within a 4KB page. It's not enabled in KVM by default,
the test first initializes the SPP runtime environment with
KVM_ENABLE_CAP ioctl, then sets protection with KVM_SUBPAGES_SET_ACCESS
for the target guest page, check permissions with KVM_SUBPAGES_GET_ACCESS
to make sure they are set as expected.

Two steps in guest code to very whether SPP is working:
1) protect all 128byte subpages, write data to each subpage
to see if SPP induced EPT violation happening. 2)unprotect all
subpages, again write data to each subpage to see if SPP still
works or not.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 tools/testing/selftests/kvm/Makefile          |   2 +-
 tools/testing/selftests/kvm/lib/kvm_util.c    |   1 +
 tools/testing/selftests/kvm/x86_64/spp_test.c | 234 ++++++++++++++++++
 3 files changed, 236 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/spp_test.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 3138a916574a..48582b7d1963 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -29,7 +29,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/xss_msr_test
 TEST_GEN_PROGS_x86_64 += clear_dirty_log_test
 TEST_GEN_PROGS_x86_64 += dirty_log_test
 TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus
-
+TEST_GEN_PROGS_x86_64 += x86_64/spp_test
 TEST_GEN_PROGS_aarch64 += clear_dirty_log_test
 TEST_GEN_PROGS_aarch64 += dirty_log_test
 TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 41cf45416060..bc0a25f4276b 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1486,6 +1486,7 @@ static struct exit_reason {
 	{KVM_EXIT_UNKNOWN, "UNKNOWN"},
 	{KVM_EXIT_EXCEPTION, "EXCEPTION"},
 	{KVM_EXIT_IO, "IO"},
+	{KVM_EXIT_SPP, "SPP"},
 	{KVM_EXIT_HYPERCALL, "HYPERCALL"},
 	{KVM_EXIT_DEBUG, "DEBUG"},
 	{KVM_EXIT_HLT, "HLT"},
diff --git a/tools/testing/selftests/kvm/x86_64/spp_test.c b/tools/testing/selftests/kvm/x86_64/spp_test.c
new file mode 100644
index 000000000000..2e83ff60768b
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/spp_test.c
@@ -0,0 +1,234 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Sub-Page Permission test
+ *
+ * Copyright (C) 2019, Intel Corp.
+ *
+ */
+
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "../../lib/kvm_util_internal.h"
+#include "linux/kvm.h"
+
+#define VCPU_ID           1
+#define PAGE_SIZE         (4096)
+#define SPP_GUARD_SIZE    (16 * PAGE_SIZE)
+#define SPP_GUARD_MEMSLOT (1)
+#define SPP_GUARD_PAGES   (SPP_GUARD_SIZE / PAGE_SIZE)
+#define SPP_GUARD_GPA      0x10000000
+
+#define SUBPAGE_ACCESS_DEFAULT   (0x0)
+#define SUBPAGE_ACCESS_FULL      (0xFFFFFFFF)
+#define START_SPP_VM_ADDR        (0x700000)
+#define SUBPAGE_SIZE             (128)
+
+vm_vaddr_t vspp_start;
+vm_paddr_t pspp_start;
+
+void guest_code(void)
+{
+	uint8_t *iterator = (uint8_t *)vspp_start;
+	int count;
+
+	GUEST_SYNC(1);
+	/*
+	 * expect EPT violation induced by SPP in each interation since
+	 * the full page is protected by SPP.
+	 */
+	for (count = 0; count < PAGE_SIZE / SUBPAGE_SIZE; count++) {
+		*(uint32_t *)(iterator) = 0x99;
+		iterator += SUBPAGE_SIZE;
+	}
+	GUEST_SYNC(2);
+	iterator = (uint8_t *)vspp_start;
+
+	/*
+	 * don't expect EPT violation happen since SPP is disabled
+	 * for the page
+	 */
+	for (count = 0; count < PAGE_SIZE / SUBPAGE_SIZE; count++) {
+		*(uint32_t *)(iterator) = 0x99;
+		iterator += SUBPAGE_SIZE;
+	}
+}
+
+void prepare_test(struct kvm_vm **g_vm, struct kvm_run **g_run)
+{
+	void *spp_hva;
+	struct kvm_vm *vm;
+	struct kvm_run *run;
+	/* Create VM, SPP is only valid for 4KB page mode */
+	*g_vm = vm_create_default(VCPU_ID, 0, guest_code);
+	vm = *g_vm;
+
+	*g_run = vcpu_state(vm, VCPU_ID);
+	run = *g_run;
+
+	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, SPP_GUARD_GPA,
+				    SPP_GUARD_MEMSLOT, SPP_GUARD_PAGES, 0);
+
+	pspp_start = vm_phy_pages_alloc(vm, 1, SPP_GUARD_GPA,
+					SPP_GUARD_MEMSLOT);
+
+	memset(addr_gpa2hva(vm, SPP_GUARD_GPA), 0x0, PAGE_SIZE);
+
+	virt_map(vm, START_SPP_VM_ADDR, SPP_GUARD_GPA, PAGE_SIZE, 0);
+
+	vspp_start = vm_vaddr_alloc(vm, PAGE_SIZE, START_SPP_VM_ADDR,
+				    SPP_GUARD_MEMSLOT, 0);
+
+	spp_hva = addr_gva2hva(vm, vspp_start);
+
+	pspp_start = addr_hva2gpa(vm, spp_hva);
+
+	printf("SPP protected zone: size = %d, gva = 0x%lx, gpa = 0x%lx, "
+	       "hva = 0x%p\n", PAGE_SIZE, vspp_start, pspp_start, spp_hva);
+
+	/* make sure the virtual address is visible to VM. */
+	sync_global_to_guest(vm, vspp_start);
+
+	vcpu_run(vm, VCPU_ID);
+
+	TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+		    "exit reason: %u (%s),\n", run->exit_reason,
+		     exit_reason_str(run->exit_reason));
+}
+
+void setup_spp(struct kvm_vm *vm)
+{
+	struct kvm_enable_cap cap;
+	int ret = 0;
+	struct kvm_subpage *sp;
+	int len;
+	memset(&cap, 0, sizeof(cap));
+	cap.cap = KVM_CAP_X86_SPP;
+	cap.flags = 0;
+
+	/* initialize the SPP runtime environment.*/
+	ret = ioctl(vm->fd, KVM_ENABLE_CAP, &cap);
+	TEST_ASSERT(ret == 0, "KVM_CAP_X86_SPP failed.");
+	len = sizeof(*sp) + sizeof(__u32);
+	printf("SPP initialized successfully.\n");
+
+	sp = malloc(len);
+	TEST_ASSERT(sp > 0, "Low memory 1!");
+	memset(sp, 0, len);
+	/* set up SPP protection for the page. */
+	sp->npages = 1;
+	sp->gfn_base = pspp_start >> 12;
+	sp->access_map[0] = SUBPAGE_ACCESS_DEFAULT;
+	ret = ioctl(vm->fd, KVM_SUBPAGES_SET_ACCESS, sp);
+
+	TEST_ASSERT(ret == 1, "KVM_SUBPAGES_SET_ACCESS failed. ret = 0x%x, "
+		    "gfn_base = 0x%llx\n", ret, sp->gfn_base);
+	printf("set spp protection info: gfn = 0x%llx, access = 0x%x, "
+	       "npages = %d\n", sp->gfn_base, sp->access_map[0],
+	       sp->npages);
+
+	memset(sp, 0, len);
+	/* make sure the SPP permission bits are actully set as expected. */
+	sp->npages = 1;
+	sp->gfn_base = pspp_start >> 12;
+
+	ret = ioctl(vm->fd, KVM_SUBPAGES_GET_ACCESS, sp);
+
+	TEST_ASSERT(ret == 1, "KVM_SUBPAGES_GET_ACCESS failed.");
+
+	TEST_ASSERT(sp->access_map[0] == SUBPAGE_ACCESS_DEFAULT,
+		    "subpage access didn't match.");
+	printf("get spp protection info: gfn = 0x%llx, access = 0x%x, "
+	       "npages = %d\n", sp->gfn_base,
+	       sp->access_map[0], sp->npages);
+
+	free(sp);
+	printf("got matched subpage permission vector.\n");
+	printf("expect VM exits caused by SPP below.\n");
+}
+
+void unset_spp(struct kvm_vm *vm)
+{
+	struct kvm_subpage *sp;
+	int len;
+
+	len = sizeof(*sp) + sizeof(__u32);
+	sp = malloc(len);
+	TEST_ASSERT(sp > 0, "Low memory 2!");
+	memset(sp, 0, len);
+
+	/* now unprotect the SPP to the page.*/
+	sp->npages = 1;
+	sp->gfn_base = pspp_start >> 12;
+	sp->access_map[0] = SUBPAGE_ACCESS_FULL;
+	ioctl(vm->fd, KVM_SUBPAGES_SET_ACCESS, sp);
+
+	printf("unset SPP protection at gfn: 0x%llx\n", sp->gfn_base);
+	printf("expect NO VM exits caused by SPP below.\n");
+	free(sp);
+}
+
+#define TEST_SYNC_FIELDS   KVM_SYNC_X86_REGS
+
+void run_test(struct kvm_vm *vm, struct kvm_run *run)
+{
+	int loop;
+	int ept_fault = 0;
+	struct kvm_regs regs;
+
+	run->kvm_valid_regs = TEST_SYNC_FIELDS;
+	vcpu_run(vm, VCPU_ID);
+
+	for (loop = 0; loop < PAGE_SIZE / SUBPAGE_SIZE; loop++) {
+		/*
+		 * if everything goes correctly, should get VM exit
+		 * with KVM_EXIT_SPP.
+		 */
+		TEST_ASSERT(run->exit_reason == KVM_EXIT_SPP,
+			    "exit reason: %u (%s),\n", run->exit_reason,
+			    exit_reason_str(run->exit_reason));
+		printf("%d - exit reason: %s\n", loop + 1,
+		       exit_reason_str(run->exit_reason));
+		ept_fault++;
+
+		vcpu_regs_get(vm, VCPU_ID, &regs);
+
+		run->s.regs.regs.rip += run->spp.ins_len;
+
+		run->kvm_valid_regs = TEST_SYNC_FIELDS;
+		run->kvm_dirty_regs = KVM_SYNC_X86_REGS;
+
+		vcpu_run(vm, VCPU_ID);
+	}
+
+	printf("total EPT violation count: %d\n", ept_fault);
+}
+
+int main(int argc, char *argv[])
+{
+	struct kvm_vm *vm;
+	struct kvm_run *run;
+
+	prepare_test(&vm, &run);
+
+	setup_spp(vm);
+
+	run_test(vm, run);
+
+	unset_spp(vm);
+
+	vcpu_run(vm, VCPU_ID);
+
+	printf("completed SPP test successfully!\n");
+
+	kvm_vm_free(vm);
+
+	return 0;
+}
+
-- 
2.17.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 02/10] vmx: spp: Add control flags for Sub-Page Protection(SPP)
  2020-01-02  6:13 ` [RESEND PATCH v10 02/10] vmx: spp: Add control flags for Sub-Page Protection(SPP) Yang Weijiang
@ 2020-01-10 16:58   ` Sean Christopherson
  2020-01-13  5:44     ` Yang Weijiang
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2020-01-10 16:58 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, alazar, edwin.zhai

On Thu, Jan 02, 2020 at 02:13:11PM +0800, Yang Weijiang wrote:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index e3394c839dea..5713e8a6224c 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -60,6 +60,7 @@
>  #include "vmcs12.h"
>  #include "vmx.h"
>  #include "x86.h"
> +#include "../mmu/spp.h"

The ".." should be unnecessary, e.g. x86.h is obviously a level up.

>  MODULE_AUTHOR("Qumranet");
>  MODULE_LICENSE("GPL");
> @@ -111,6 +112,7 @@ module_param_named(pml, enable_pml, bool, S_IRUGO);
>  
>  static bool __read_mostly dump_invalid_vmcs = 0;
>  module_param(dump_invalid_vmcs, bool, 0644);
> +static bool __read_mostly spp_supported = 0;

s/spp_supported/enable_spp to be consistent with all the other booleans.

Is there a reason this isn't exposed as a module param?

And if this is to be on by default, then the flag itself should be
initialized to '1' so that it's clear to readers that the feature is
enabled by default (if it's supported).  Looking at only this code, I would
think that SPP is forced off and can't be enabled.

That being said, turning on the enable_spp control flag should be the last
patch in the series, i.e. it shouldn't be turned on until all the
underlying support code is in place.  So, I would keep this as is, but
invert the code in hardware_setup() below.  That way the flag exists and
is checked, but can't be turned on without modifying the code.  Then when
all is said and done, you can add a patch to introduce the module param
and turn on the flag by default (if that's indeed what we want).

>  #define MSR_BITMAP_MODE_X2APIC		1
>  #define MSR_BITMAP_MODE_X2APIC_APICV	2
> @@ -2391,6 +2393,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>  			SECONDARY_EXEC_RDSEED_EXITING |
>  			SECONDARY_EXEC_RDRAND_EXITING |
>  			SECONDARY_EXEC_ENABLE_PML |
> +			SECONDARY_EXEC_ENABLE_SPP |
>  			SECONDARY_EXEC_TSC_SCALING |
>  			SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE |
>  			SECONDARY_EXEC_PT_USE_GPA |
> @@ -4039,6 +4042,9 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
>  	if (!enable_pml)
>  		exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
>  
> +	if (!spp_supported)
> +		exec_control &= ~SECONDARY_EXEC_ENABLE_SPP;
> +
>  	if (vmx_xsaves_supported()) {
>  		/* Exposing XSAVES only when XSAVE is exposed */
>  		bool xsaves_enabled =
> @@ -7630,6 +7636,9 @@ static __init int hardware_setup(void)
>  	if (!cpu_has_vmx_flexpriority())
>  		flexpriority_enabled = 0;
>  
> +	if (cpu_has_vmx_ept_spp() && enable_ept)
> +		spp_supported = 1;

As above, invert this to disable spp when it's not supported, or when EPT
is disabled (or not supported).

> +
>  	if (!cpu_has_virtual_nmis())
>  		enable_vnmi = 0;
>  
> -- 
> 2.17.2
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 03/10] mmu: spp: Add SPP Table setup functions
  2020-01-02  6:13 ` [RESEND PATCH v10 03/10] mmu: spp: Add SPP Table setup functions Yang Weijiang
@ 2020-01-10 17:26   ` Sean Christopherson
  2020-01-13  6:00     ` Yang Weijiang
  2020-01-10 17:40   ` Sean Christopherson
  1 sibling, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2020-01-10 17:26 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, alazar, edwin.zhai

On Thu, Jan 02, 2020 at 02:13:12PM +0800, Yang Weijiang wrote:
> SPPT is a 4-level paging structure similar to EPT, when SPP is

How does SPP interact with 5-level EPT?

> armed for target physical page, bit 61 of the corresponding
> EPT entry is flaged, then SPPT is traversed with the gfn,
> the leaf entry of SPPT contains the access bitmap of subpages
> inside the target 4KB physical page, one bit per 128-byte subpage.
> 
> Co-developed-by: He Chen <he.chen@linux.intel.com>
> Signed-off-by: He Chen <he.chen@linux.intel.com>
> Co-developed-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---

...

> +static u64 format_spp_spte(u32 spp_wp_bitmap)
> +{
> +	u64 new_spte = 0;
> +	int i = 0;
> +
> +	/*
> +	 * One 4K page contains 32 sub-pages, in SPP table L4E, old bits

Is this

	One 4k page constains 32 sub-pages in SPP table L4E.  Old bits are...

or
	One 4k page contains 32 sub-pages.  In SPP table L4E, old bits are...

or
	???

> +	 * are reserved, so we need to transfer u32 subpage write

Wrap comments at 80 columns to save lines.

> +	 * protect bitmap to u64 SPP L4E format.

What is a "page" in "one 4k page"?  What old bits?  Why not just track a
64-bit value?  I understand *what* the code below does, but I have no clue
why or whether it's correct.

> +	 */
> +	while (i < 32) {
> +		if (spp_wp_bitmap & (1ULL << i))
> +			new_spte |= 1ULL << (i * 2);
> +		i++;
> +	}

	for (i = 0; i < 32; i++)
		new_spte |= (spp_wp_bitmap & BIT_ULL(i)) << i;

At the very least, use a for loop.

> +
> +	return new_spte;
> +}
> +
> +static void spp_spte_set(u64 *sptep, u64 new_spte)
> +{
> +	__set_spte(sptep, new_spte);
> +}
> +
> +bool is_spp_spte(struct kvm_mmu_page *sp)
> +{
> +	return sp->role.spp;
> +}
> +
> +#define SPPT_ENTRY_PHA_MASK (0xFFFFFFFFFF << 12)
> +
> +int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
> +			    u32 access_map, gfn_t gfn)
> +{
> +	struct kvm_shadow_walk_iterator iter;
> +	struct kvm_mmu_page *sp;
> +	gfn_t pseudo_gfn;
> +	u64 old_spte, spp_spte;
> +	int ret = -EFAULT;
> +
> +	/* direct_map spp start */
> +	if (!VALID_PAGE(vcpu->kvm->arch.sppt_root))
> +		return -EFAULT;
> +
> +	for_each_shadow_spp_entry(vcpu, (u64)gfn << PAGE_SHIFT, iter) {
> +		if (iter.level == PT_PAGE_TABLE_LEVEL) {
> +			spp_spte = format_spp_spte(access_map);
> +			old_spte = mmu_spte_get_lockless(iter.sptep);
> +			if (old_spte != spp_spte)
> +				spp_spte_set(iter.sptep, spp_spte);
> +			ret = 0;
> +			break;
> +		}
> +
> +		if (!is_shadow_present_pte(*iter.sptep)) {
> +			u64 base_addr = iter.addr;
> +
> +			base_addr &= PT64_LVL_ADDR_MASK(iter.level);
> +			pseudo_gfn = base_addr >> PAGE_SHIFT;
> +			spp_spte = *iter.sptep;
> +			sp = kvm_spp_get_page(vcpu, pseudo_gfn,
> +					      iter.level - 1);
> +			link_spp_shadow_page(vcpu, iter.sptep, sp);
> +		} else if (iter.level == PT_DIRECTORY_LEVEL  &&
> +			   !(spp_spte & PT_PRESENT_MASK) &&
> +			   (spp_spte & SPPT_ENTRY_PHA_MASK)) {
> +			spp_spte = *iter.sptep;
> +			spp_spte |= PT_PRESENT_MASK;
> +			spp_spte_set(iter.sptep, spp_spte);
> +		}
> +	}
> +
> +	kvm_flush_remote_tlbs(vcpu->kvm);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(kvm_spp_setup_structure);
> +
> +inline u64 construct_spptp(unsigned long root_hpa)
> +{
> +	return root_hpa & PAGE_MASK;
> +}
> +EXPORT_SYMBOL_GPL(construct_spptp);
> +
> diff --git a/arch/x86/kvm/mmu/spp.h b/arch/x86/kvm/mmu/spp.h
> new file mode 100644
> index 000000000000..8ef94b7a2057
> --- /dev/null
> +++ b/arch/x86/kvm/mmu/spp.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __KVM_X86_VMX_SPP_H
> +#define __KVM_X86_VMX_SPP_H
> +
> +bool is_spp_spte(struct kvm_mmu_page *sp);
> +u64 construct_spptp(unsigned long root_hpa);
> +int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
> +			    u32 access_map, gfn_t gfn);
> +
> +#endif /* __KVM_X86_VMX_SPP_H */
> -- 
> 2.17.2
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 04/10] mmu: spp: Add functions to operate SPP access bitmap
  2020-01-02  6:13 ` [RESEND PATCH v10 04/10] mmu: spp: Add functions to operate SPP access bitmap Yang Weijiang
@ 2020-01-10 17:38   ` Sean Christopherson
  2020-01-13  6:15     ` Yang Weijiang
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2020-01-10 17:38 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, alazar, edwin.zhai

On Thu, Jan 02, 2020 at 02:13:13PM +0800, Yang Weijiang wrote:
> Create access bitmap for SPP subpages, the bitmap can
> be accessed with a gfn. The initial access bitmap for each
> physical page is 0xFFFFFFFF, meaning SPP is not enabled for the
> subpages.

Wrap changelogs at ~75 chars.

Create access bitmap for SPP subpages, the bitmap can be accessed with a
gfn.  The initial access bitmap for each physical page is 0xFFFFFFFF,
meaning SPP is not enabled for the subpages.

There needs to be a *lot* more information provided in all of the changelogs
for this series.  I understand the basic concepts of SPP, but nothing in the
documentation or changelogs explains how KVM generates the SPP tables based
on userspace input.  Essentially, explain the design in decent detail, with
a focus on *why* KVM does what it does.

> Co-developed-by: He Chen <he.chen@linux.intel.com>
> Signed-off-by: He Chen <he.chen@linux.intel.com>
> Co-developed-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |   2 +
>  arch/x86/kvm/mmu/spp.c          | 332 ++++++++++++++++++++++++++++++++
>  arch/x86/kvm/mmu/spp.h          |  12 ++
>  include/uapi/linux/kvm.h        |   8 +
>  4 files changed, 354 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9506c9d40895..f5145b86d620 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -812,6 +812,7 @@ struct kvm_lpage_info {
>  
>  struct kvm_arch_memory_slot {
>  	struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
> +	u32 *subpage_wp_info;
>  	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
>  	unsigned short *gfn_track[KVM_PAGE_TRACK_MAX];
>  };
> @@ -959,6 +960,7 @@ struct kvm_arch {
>  	struct task_struct *nx_lpage_recovery_thread;
>  
>  	hpa_t sppt_root;
> +	bool spp_active;
>  };
>  
>  struct kvm_vm_stat {
> diff --git a/arch/x86/kvm/mmu/spp.c b/arch/x86/kvm/mmu/spp.c
> index 5fca08af705c..edab5ec83ef3 100644
> --- a/arch/x86/kvm/mmu/spp.c
> +++ b/arch/x86/kvm/mmu/spp.c
> @@ -17,6 +17,21 @@ static void shadow_spp_walk_init(struct kvm_shadow_walk_iterator *iterator,
>  	iterator->level = PT64_ROOT_4LEVEL;
>  }
>  
> +u32 *gfn_to_subpage_wp_info(struct kvm_memory_slot *slot, gfn_t gfn)
> +{
> +	unsigned long idx;
> +
> +	if (!slot->arch.subpage_wp_info)
> +		return NULL;
> +
> +	idx = gfn_to_index(gfn, slot->base_gfn, PT_PAGE_TABLE_LEVEL);
> +	if (idx > slot->npages - 1)
> +		return NULL;
> +
> +	return &slot->arch.subpage_wp_info[idx];
> +}
> +EXPORT_SYMBOL_GPL(gfn_to_subpage_wp_info);
> +
>  static bool __rmap_open_subpage_bit(struct kvm *kvm,
>  				    struct kvm_rmap_head *rmap_head)
>  {
> @@ -172,6 +187,20 @@ bool is_spp_spte(struct kvm_mmu_page *sp)
>  	return sp->role.spp;
>  }
>  
> +static int kvm_spp_level_pages(gfn_t gfn_lower, gfn_t gfn_upper, int level)
> +{
> +	int page_num = KVM_PAGES_PER_HPAGE(level);
> +	gfn_t gfn_max = (gfn_lower & ~(page_num - 1)) + page_num - 1;
> +	int ret;
> +
> +	if (gfn_upper <= gfn_max)
> +		ret = gfn_upper - gfn_lower + 1;
> +	else
> +		ret = gfn_max - gfn_lower + 1;
> +
> +	return ret;
> +}
> +
>  #define SPPT_ENTRY_PHA_MASK (0xFFFFFFFFFF << 12)

There's almost certainly an existing macro for this.

>  
>  int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
> @@ -220,6 +249,309 @@ int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
>  }
>  EXPORT_SYMBOL_GPL(kvm_spp_setup_structure);
>  
> +int vmx_spp_flush_sppt(struct kvm *kvm, u64 gfn_base, u32 npages)
> +{
> +	struct kvm_shadow_walk_iterator iter;
> +	struct kvm_vcpu *vcpu;
> +	gfn_t gfn = gfn_base;
> +	gfn_t gfn_max = gfn_base + npages - 1;

s/gfn_max/gfn_end.  "max" makes me think this is literally walking every
possible gfn.

> +	u64 spde;
> +	int count;
> +	bool flush = false;
> +
> +	vcpu = kvm_get_vcpu(kvm, 0);
> +	if (!VALID_PAGE(vcpu->kvm->arch.sppt_root))
> +		return -EFAULT;
> +
> +	for (; gfn <= gfn_max; gfn++) {
> +		for_each_shadow_spp_entry(vcpu, (u64)gfn << PAGE_SHIFT, iter) {
> +			if (!is_shadow_present_pte(*iter.sptep))
> +				break;
> +
> +			if (iter.level == PT_DIRECTORY_LEVEL) {
> +				spde = *iter.sptep;
> +				spde &= ~PT_PRESENT_MASK;
> +				spp_spte_set(iter.sptep, spde);
> +				count = kvm_spp_level_pages(gfn,
> +							    gfn_max,
> +							    PT_DIRECTORY_LEVEL);
> +				flush = true;
> +				if (count >= npages)
> +					goto out;
> +				gfn += count;
> +				break;
> +			}
> +		}
> +	}
> +out:
> +	if (flush)
> +		kvm_flush_remote_tlbs(kvm);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(vmx_spp_flush_sppt);
> +
> +static int kvm_spp_create_bitmaps(struct kvm *kvm)
> +{
> +	struct kvm_memslots *slots;
> +	struct kvm_memory_slot *memslot;
> +	int i, j, ret;
> +	u32 *buff;
> +
> +	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> +		slots = __kvm_memslots(kvm, i);
> +		kvm_for_each_memslot(memslot, slots) {
> +			buff = kvzalloc(memslot->npages *
> +				sizeof(*memslot->arch.subpage_wp_info),
> +				GFP_KERNEL);
> +
> +			if (!buff) {
> +				ret = -ENOMEM;
> +				goto out_free;
> +			}
> +			memslot->arch.subpage_wp_info = buff;
> +
> +			for (j = 0; j < memslot->npages; j++)
> +				buff[j] = FULL_SPP_ACCESS;
> +		}
> +	}
> +
> +	return 0;
> +out_free:
> +	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> +		slots = __kvm_memslots(kvm, i);
> +		kvm_for_each_memslot(memslot, slots) {
> +			if (memslot->arch.subpage_wp_info) {
> +				kvfree(memslot->arch.subpage_wp_info);
> +				memslot->arch.subpage_wp_info = NULL;
> +			}
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +int vmx_spp_init(struct kvm *kvm)
> +{
> +	int i, ret;
> +	struct kvm_vcpu *vcpu;
> +	int root_level;
> +	struct kvm_mmu_page *ssp_sp;
> +	bool first_root = true;
> +
> +	/* SPP feature is exclusive with nested VM.*/
> +	if (kvm_x86_ops->get_nested_state)
> +		return -EPERM;
> +
> +	if (kvm->arch.spp_active)
> +		return 0;
> +
> +	ret = kvm_spp_create_bitmaps(kvm);
> +
> +	if (ret)
> +		return ret;
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (first_root) {
> +			/* prepare caches for SPP setup.*/
> +			mmu_topup_memory_caches(vcpu);
> +			root_level = vcpu->arch.mmu->shadow_root_level;
> +			ssp_sp = kvm_spp_get_page(vcpu, 0, root_level);
> +			first_root = false;
> +			vcpu->kvm->arch.sppt_root = __pa(ssp_sp->spt);
> +		}
> +		++ssp_sp->root_count;
> +		kvm_make_request(KVM_REQ_LOAD_CR3, vcpu);
> +	}
> +
> +	kvm->arch.spp_active = true;
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(vmx_spp_init);
> +
> +int kvm_spp_get_permission(struct kvm *kvm, u64 gfn, u32 npages,
> +			   u32 *access_map)
> +{
> +	u32 *access;
> +	struct kvm_memory_slot *slot;
> +	int i;
> +
> +	if (!kvm->arch.spp_active)
> +		return -ENODEV;
> +
> +	for (i = 0; i < npages; i++, gfn++) {
> +		slot = gfn_to_memslot(kvm, gfn);
> +		if (!slot)
> +			return -EFAULT;
> +		access = gfn_to_subpage_wp_info(slot, gfn);
> +		if (!access)
> +			return -EFAULT;
> +		access_map[i] = *access;
> +	}
> +
> +	return i;
> +}
> +EXPORT_SYMBOL_GPL(kvm_spp_get_permission);
> +
> +static void kvm_spp_zap_pte(struct kvm *kvm, u64 *spte, int level)
> +{
> +	u64 pte;
> +
> +	pte = *spte;
> +	if (is_shadow_present_pte(pte) && is_last_spte(pte, level)) {
> +		drop_spte(kvm, spte);
> +		if (is_large_pte(pte))
> +			--kvm->stat.lpages;
> +	}
> +}
> +
> +static bool kvm_spp_flush_rmap(struct kvm *kvm, u64 gfn_min, u64 gfn_max)
> +{
> +	u64 *sptep;
> +	struct rmap_iterator iter;
> +	struct kvm_rmap_head *rmap_head;
> +	int level;
> +	struct kvm_memory_slot *slot;
> +	bool flush = false;
> +
> +	slot = gfn_to_memslot(kvm, gfn_min);
> +	if (!slot)
> +		return false;
> +
> +	for (; gfn_min <= gfn_max; gfn_min++) {
> +		for (level = PT_PAGE_TABLE_LEVEL;
> +		     level <= PT_DIRECTORY_LEVEL; level++) {
> +			rmap_head = __gfn_to_rmap(gfn_min, level, slot);
> +			for_each_rmap_spte(rmap_head, &iter, sptep) {
> +				pte_list_remove(rmap_head, sptep);
> +				flush = true;
> +			}
> +		}
> +	}
> +
> +	return flush;
> +}
> +
> +int kvm_spp_set_permission(struct kvm *kvm, u64 gfn, u32 npages,
> +			   u32 *access_map)
> +{
> +	gfn_t old_gfn = gfn;
> +	u32 *access;
> +	struct kvm_memory_slot *slot;
> +	struct kvm_shadow_walk_iterator iterator;
> +	struct kvm_vcpu *vcpu;
> +	gfn_t gfn_max;
> +	int i, count, level;
> +	bool flush = false;
> +
> +	if (!kvm->arch.spp_active)
> +		return -ENODEV;
> +
> +	vcpu = kvm_get_vcpu(kvm, 0);
> +	if (!VALID_PAGE(vcpu->kvm->arch.sppt_root))
> +		return -EFAULT;
> +
> +	for (i = 0; i < npages; i++, gfn++) {
> +		slot = gfn_to_memslot(kvm, gfn);
> +		if (!slot)
> +			return -EFAULT;
> +
> +		access = gfn_to_subpage_wp_info(slot, gfn);
> +		if (!access)
> +			return -EFAULT;
> +		*access = access_map[i];
> +	}
> +
> +	gfn = old_gfn;
> +	gfn_max = gfn + npages - 1;
> +	vcpu = kvm_get_vcpu(kvm, 0);
> +
> +	if (!vcpu || (vcpu && !VALID_PAGE(vcpu->arch.mmu->root_hpa)))
> +		goto out;
> +
> +	flush = kvm_spp_flush_rmap(kvm, gfn, gfn_max);
> +
> +	for (i = 0; gfn <= gfn_max; i++, gfn++) {
> +		for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
> +			if (!is_shadow_present_pte(*iterator.sptep))
> +				break;
> +
> +			if (iterator.level == PT_PAGE_TABLE_LEVEL) {
> +				if (kvm_spp_mark_protection(kvm,
> +							    gfn,
> +							    access_map[i]) < 0)
> +					return -EFAULT;
> +				break;
> +			} else if (is_large_pte(*iterator.sptep)) {
> +				level = iterator.level;
> +				if (access_map[i] == FULL_SPP_ACCESS)
> +					break;
> +				count = kvm_spp_level_pages(gfn,
> +							    gfn_max,
> +							    level);
> +				kvm_spp_zap_pte(kvm, iterator.sptep, level);
> +				flush = true;
> +				if (count >= npages)
> +					goto out;
> +				gfn += count - 1;
> +			}
> +		}
> +	}
> +out:
> +	if (flush)
> +		kvm_flush_remote_tlbs(kvm);
> +	return npages;
> +}
> +
> +int kvm_spp_mark_protection(struct kvm *kvm, u64 gfn, u32 access)
> +{
> +	struct kvm_memory_slot *slot;
> +	struct kvm_rmap_head *rmap_head;
> +	int ret;
> +
> +	if (!kvm->arch.spp_active)
> +		return -ENODEV;
> +
> +	slot = gfn_to_memslot(kvm, gfn);
> +	if (!slot)
> +		return -EFAULT;
> +
> +	/*
> +	 * check whether the target 4KB page exists in EPT leaf
> +	 * entry.If it's there, just flag SPP bit of the entry,
> +	 * defer the setup to SPPT miss induced vm-exit  handler.
> +	 */
> +	rmap_head = __gfn_to_rmap(gfn, PT_PAGE_TABLE_LEVEL, slot);
> +
> +	if (rmap_head->val) {
> +		/*
> +		 * if all subpages are not writable, open SPP bit in
> +		 * EPT leaf entry to enable SPP protection for
> +		 * corresponding page.
> +		 */
> +		if (access != FULL_SPP_ACCESS) {
> +			ret = kvm_spp_open_write_protect(kvm, slot, gfn);
> +			if (ret)
> +				return ret;
> +		} else {
> +			ret = kvm_spp_clear_write_protect(kvm, slot, gfn);
> +			if (ret)
> +				return ret;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +void kvm_spp_free_memslot(struct kvm_memory_slot *free,
> +			  struct kvm_memory_slot *dont)
> +{
> +	if (!dont || free->arch.subpage_wp_info !=
> +	    dont->arch.subpage_wp_info) {
> +		kvfree(free->arch.subpage_wp_info);
> +		free->arch.subpage_wp_info = NULL;
> +	}
> +}
> +
>  inline u64 construct_spptp(unsigned long root_hpa)
>  {
>  	return root_hpa & PAGE_MASK;
> diff --git a/arch/x86/kvm/mmu/spp.h b/arch/x86/kvm/mmu/spp.h
> index 8ef94b7a2057..99d93fdc74b5 100644
> --- a/arch/x86/kvm/mmu/spp.h
> +++ b/arch/x86/kvm/mmu/spp.h
> @@ -2,9 +2,21 @@
>  #ifndef __KVM_X86_VMX_SPP_H
>  #define __KVM_X86_VMX_SPP_H
>  
> +#define FULL_SPP_ACCESS		((u32)((1ULL << 32) - 1))
> +
> +int kvm_spp_get_permission(struct kvm *kvm, u64 gfn, u32 npages,
> +			   u32 *access_map);
> +int kvm_spp_set_permission(struct kvm *kvm, u64 gfn, u32 npages,
> +			   u32 *access_map);
> +int kvm_spp_mark_protection(struct kvm *kvm, u64 gfn, u32 access);
>  bool is_spp_spte(struct kvm_mmu_page *sp);
>  u64 construct_spptp(unsigned long root_hpa);
>  int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
>  			    u32 access_map, gfn_t gfn);
> +int vmx_spp_flush_sppt(struct kvm *kvm, u64 gfn_base, u32 npages);
> +void kvm_spp_free_memslot(struct kvm_memory_slot *free,
> +			  struct kvm_memory_slot *dont);
> +int vmx_spp_init(struct kvm *kvm);
> +u32 *gfn_to_subpage_wp_info(struct kvm_memory_slot *slot, gfn_t gfn);
>  
>  #endif /* __KVM_X86_VMX_SPP_H */
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index f0a16b4adbbd..eabd55ec5af7 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -102,6 +102,14 @@ struct kvm_userspace_memory_region {
>  	__u64 userspace_addr; /* start of the userspace allocated memory */
>  };
>  
> +/* for KVM_SUBPAGES_GET_ACCESS and KVM_SUBPAGES_SET_ACCESS */
> +struct kvm_subpage {
> +	__u64 gfn_base; /* the first page gfn of the contiguous pages */
> +	__u32 npages;   /* number of 4K pages */
> +	__u32 flags;    /* reserved to 0 now */
> +	__u32 access_map[0]; /* start place of bitmap array */
> +};
> +
>  /*
>   * The bit 0 ~ bit 15 of kvm_memory_region::flags are visible for userspace,
>   * other bits are reserved for kvm internal use which are defined in
> -- 
> 2.17.2
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 03/10] mmu: spp: Add SPP Table setup functions
  2020-01-02  6:13 ` [RESEND PATCH v10 03/10] mmu: spp: Add SPP Table setup functions Yang Weijiang
  2020-01-10 17:26   ` Sean Christopherson
@ 2020-01-10 17:40   ` Sean Christopherson
  2020-01-13  6:04     ` Yang Weijiang
  1 sibling, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2020-01-10 17:40 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, alazar, edwin.zhai

On Thu, Jan 02, 2020 at 02:13:12PM +0800, Yang Weijiang wrote:
> SPPT is a 4-level paging structure similar to EPT, when SPP is
> armed for target physical page, bit 61 of the corresponding
> EPT entry is flaged, then SPPT is traversed with the gfn,
> the leaf entry of SPPT contains the access bitmap of subpages
> inside the target 4KB physical page, one bit per 128-byte subpage.
> 
> Co-developed-by: He Chen <he.chen@linux.intel.com>
> Signed-off-by: He Chen <he.chen@linux.intel.com>
> Co-developed-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |   5 +-
>  arch/x86/kvm/mmu/spp.c          | 228 ++++++++++++++++++++++++++++++++
>  arch/x86/kvm/mmu/spp.h          |  10 ++

This patch is completely untesteable.  It adds spp.c but doesn't compile
it.  Actually, everything up until patch 06 is untestable.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 06/10] vmx: spp: Set up SPP paging table at vmentry/vmexit
  2020-01-02  6:13 ` [RESEND PATCH v10 06/10] vmx: spp: Set up SPP paging table at vmentry/vmexit Yang Weijiang
@ 2020-01-10 17:55   ` Sean Christopherson
  2020-01-13  6:50     ` Yang Weijiang
  2020-01-10 18:04   ` Sean Christopherson
  1 sibling, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2020-01-10 17:55 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, alazar, edwin.zhai

On Thu, Jan 02, 2020 at 02:13:15PM +0800, Yang Weijiang wrote:
> If write to subpage is not allowed, EPT violation generates
> and it's handled in fast_page_fault().
> 
> In current implementation, SPPT setup is only handled in handle_spp()
> vmexit handler, it's triggered when SPP bit is set in EPT leaf
> entry while SPPT entries are not ready.
> 
> A SPP specific bit(11) is added to exit_qualification and a new
> exit reason(66) is introduced for SPP.

...

> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 6f92b40d798c..c41791ebee65 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6372,6 +6427,8 @@ unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
>  	return nr_mmu_pages;
>  }
>  
> +#include "spp.c"
> +

Unless there is a *very* good reason for these shenanigans, spp.c needs
to built via the Makefile like any other source.  If this is justified
for whatever reason, then that justification needs to be very clearly
stated in the changelog.

In general, the code organization of this entire series likely needs to
be overhauled.  There are gobs exports which are either completely
unnecessary or completely backswards.

E.g. exporting VMX-only functions from spp.c, which presumably are only
callbed by VMX.

	EXPORT_SYMBOL_GPL(vmx_spp_flush_sppt);
	EXPORT_SYMBOL_GPL(vmx_spp_init);

Exporting ioctl helpers from the same file, which are presumably called
only from x86.c.

	EXPORT_SYMBOL_GPL(kvm_vm_ioctl_get_subpages);
	EXPORT_SYMBOL_GPL(kvm_vm_ioctl_set_subpages);

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 06/10] vmx: spp: Set up SPP paging table at vmentry/vmexit
  2020-01-02  6:13 ` [RESEND PATCH v10 06/10] vmx: spp: Set up SPP paging table at vmentry/vmexit Yang Weijiang
  2020-01-10 17:55   ` Sean Christopherson
@ 2020-01-10 18:04   ` Sean Christopherson
  2020-01-13  8:10     ` Yang Weijiang
  1 sibling, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2020-01-10 18:04 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, alazar, edwin.zhai

On Thu, Jan 02, 2020 at 02:13:15PM +0800, Yang Weijiang wrote:
> @@ -3585,7 +3602,30 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
>  		if ((error_code & PFERR_WRITE_MASK) &&
>  		    spte_can_locklessly_be_made_writable(spte))
>  		{
> -			new_spte |= PT_WRITABLE_MASK;
> +			/*
> +			 * Record write protect fault caused by
> +			 * Sub-page Protection, let VMI decide
> +			 * the next step.
> +			 */
> +			if (spte & PT_SPP_MASK) {
> +				int len = kvm_x86_ops->get_inst_len(vcpu);

There's got to be a better way to handle SPP exits than adding a helper
to retrieve the instruction length.

> +
> +				fault_handled = true;
> +				vcpu->run->exit_reason = KVM_EXIT_SPP;
> +				vcpu->run->spp.addr = gva;
> +				vcpu->run->spp.ins_len = len;

s/ins_len/insn_len to be consistent with other KVM nomenclature.

> +				trace_kvm_spp_induced_page_fault(vcpu,
> +								 gva,
> +								 len);
> +				break;
> +			}
> +
> +			if (was_spp_armed(new_spte)) {
> +				restore_spp_bit(&new_spte);
> +				spp_protected = true;
> +			} else {
> +				new_spte |= PT_WRITABLE_MASK;
> +			}
>  
>  			/*
>  			 * Do not fix write-permission on the large spte.  Since

...

> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 24e4e1c47f42..97d862c79124 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -200,7 +200,6 @@ static const struct {
>  	[VMENTER_L1D_FLUSH_EPT_DISABLED] = {"EPT disabled", false},
>  	[VMENTER_L1D_FLUSH_NOT_REQUIRED] = {"not required", false},
>  };
> -

Spurious whitepsace.

>  #define L1D_CACHE_ORDER 4
>  static void *vmx_l1d_flush_pages;
>  
> @@ -2999,6 +2998,7 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  	bool update_guest_cr3 = true;
>  	unsigned long guest_cr3;
>  	u64 eptp;
> +	u64 spptp;
>  
>  	guest_cr3 = cr3;
>  	if (enable_ept) {
> @@ -3027,6 +3027,12 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  
>  	if (update_guest_cr3)
>  		vmcs_writel(GUEST_CR3, guest_cr3);
> +
> +	if (kvm->arch.spp_active && VALID_PAGE(vcpu->kvm->arch.sppt_root)) {
> +		spptp = construct_spptp(vcpu->kvm->arch.sppt_root);
> +		vmcs_write64(SPPT_POINTER, spptp);
> +		vmx_flush_tlb(vcpu, true);

Why is SPP so special that it gets to force TLB flushes?

> +	}
>  }
>  
>  int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> @@ -5361,6 +5367,74 @@ static int handle_monitor_trap(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>  
> +int handle_spp(struct kvm_vcpu *vcpu)

Can be static.

> +{
> +	unsigned long exit_qualification;
> +	struct kvm_memory_slot *slot;
> +	gpa_t gpa;
> +	gfn_t gfn;
> +
> +	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
> +
> +	/*
> +	 * SPP VM exit happened while executing iret from NMI,
> +	 * "blocked by NMI" bit has to be set before next VM entry.
> +	 * There are errata that may cause this bit to not be set:
> +	 * AAK134, BY25.
> +	 */
> +	if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
> +	    (exit_qualification & SPPT_INTR_INFO_UNBLOCK_NMI))
> +		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
> +			      GUEST_INTR_STATE_NMI);
> +
> +	vcpu->arch.exit_qualification = exit_qualification;

	if (WARN_ON(!(exit_qualification & SPPT_INDUCED_EXIT_TYPE)))
		goto out_err;

	<handle spp exit>

	return 1;

out_err:
	vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
	vcpu->run->hw.hardware_exit_reason = EXIT_REASON_SPP;
	return 0;

> +	if (exit_qualification & SPPT_INDUCED_EXIT_TYPE) {
> +		int page_num = KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL);

The compiler is probably clever enough to make these constants, but if
this logic is a fundamental property of SPP then it should be defined as
a macro somewhere.

> +		u32 *access;
> +		gfn_t gfn_max;
> +
> +		/*
> +		 * SPPT missing
> +		 * We don't set SPP write access for the corresponding
> +		 * GPA, if we haven't setup, we need to construct
> +		 * SPP table here.
> +		 */
> +		gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> +		gfn = gpa >> PAGE_SHIFT;

gpa_to_gfn()

> +		trace_kvm_spp_induced_exit(vcpu, gpa, exit_qualification);
> +		/*
> +		 * In level 1 of SPPT, there's no PRESENT bit, all data is
> +		 * regarded as permission vector, so need to check from
> +		 * level 2 to set up the vector if target page is protected.
> +		 */
> +		spin_lock(&vcpu->kvm->mmu_lock);
> +		gfn &= ~(page_num - 1);



> +		gfn_max = gfn + page_num - 1;

s/gfn_max/gfn_end

> +		for (; gfn <= gfn_max; gfn++) {

My preference would be to do:
		gfn_end = gfn + page_num;

		for ( ; gfn < gfn_end; gfn++)

> +			slot = gfn_to_memslot(vcpu->kvm, gfn);
> +			if (!slot)
> +				continue;
> +			access = gfn_to_subpage_wp_info(slot, gfn);
> +			if (access && *access != FULL_SPP_ACCESS)
> +				kvm_spp_setup_structure(vcpu,
> +							*access,
> +							gfn);
> +		}
> +		spin_unlock(&vcpu->kvm->mmu_lock);
> +		return 1;
> +	}
> +	/*
> +	 * SPPT Misconfig
> +	 * This is probably caused by some mis-configuration in SPPT
> +	 * entries, cannot handle it here, escalate the fault to
> +	 * emulator.
> +	 */
> +	WARN_ON(1);
> +	vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
> +	vcpu->run->hw.hardware_exit_reason = EXIT_REASON_SPP;
> +	return 0;
> +}
> +
>  static int handle_monitor(struct kvm_vcpu *vcpu)
>  {
>  	printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
> @@ -5575,6 +5649,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>  	[EXIT_REASON_INVVPID]                 = handle_vmx_instruction,
>  	[EXIT_REASON_RDRAND]                  = handle_invalid_op,
>  	[EXIT_REASON_RDSEED]                  = handle_invalid_op,
> +	[EXIT_REASON_SPP]                     = handle_spp,
>  	[EXIT_REASON_PML_FULL]		      = handle_pml_full,
>  	[EXIT_REASON_INVPCID]                 = handle_invpcid,
>  	[EXIT_REASON_VMFUNC]		      = handle_vmx_instruction,
> @@ -5807,6 +5882,9 @@ void dump_vmcs(void)
>  		pr_err("PostedIntrVec = 0x%02x\n", vmcs_read16(POSTED_INTR_NV));
>  	if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT))
>  		pr_err("EPT pointer = 0x%016llx\n", vmcs_read64(EPT_POINTER));
> +	if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_SPP))
> +		pr_err("SPPT pointer = 0x%016llx\n", vmcs_read64(SPPT_POINTER));
> +
>  	n = vmcs_read32(CR3_TARGET_COUNT);
>  	for (i = 0; i + 1 < n; i += 4)
>  		pr_err("CR3 target%u=%016lx target%u=%016lx\n",
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fb7da000ceaf..a9d7fc21dad6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9782,6 +9782,7 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
>  	}
>  
>  	kvm_page_track_free_memslot(free, dont);
> +	kvm_spp_free_memslot(free, dont);
>  }
>  
>  int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
> @@ -10406,3 +10407,6 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pml_full);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pi_irte_update);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_spp_set_subpages);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_spp_induced_exit);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_spp_induced_page_fault);
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 09e5e8e6e6dd..c0f3162ee46a 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -244,6 +244,7 @@ struct kvm_hyperv_exit {
>  #define KVM_EXIT_IOAPIC_EOI       26
>  #define KVM_EXIT_HYPERV           27
>  #define KVM_EXIT_ARM_NISV         28
> +#define KVM_EXIT_SPP              29
>  
>  /* For KVM_EXIT_INTERNAL_ERROR */
>  /* Emulate instruction failed. */
> @@ -401,6 +402,11 @@ struct kvm_run {
>  		struct {
>  			__u8 vector;
>  		} eoi;
> +		/* KVM_EXIT_SPP */
> +		struct {
> +			__u64 addr;
> +			__u8 ins_len;
> +		} spp;
>  		/* KVM_EXIT_HYPERV */
>  		struct kvm_hyperv_exit hyperv;
>  		/* KVM_EXIT_ARM_NISV */
> -- 
> 2.17.2
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 05/10] x86: spp: Introduce user-space SPP IOCTLs
  2020-01-02  6:13 ` [RESEND PATCH v10 05/10] x86: spp: Introduce user-space SPP IOCTLs Yang Weijiang
@ 2020-01-10 18:10   ` Sean Christopherson
  2020-01-13  8:21     ` Yang Weijiang
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2020-01-10 18:10 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, alazar, edwin.zhai

On Thu, Jan 02, 2020 at 02:13:14PM +0800, Yang Weijiang wrote:
> User application, e.g., QEMU or VMI, must initialize SPP
> before gets/sets SPP subpages, the dynamic initialization is to
> reduce the extra storage cost if the SPP feature is not not used.
> 
> Co-developed-by: He Chen <he.chen@linux.intel.com>
> Signed-off-by: He Chen <he.chen@linux.intel.com>
> Co-developed-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  4 ++
>  arch/x86/kvm/mmu/spp.c          | 44 +++++++++++++++
>  arch/x86/kvm/mmu/spp.h          |  9 ++++
>  arch/x86/kvm/vmx/vmx.c          | 15 ++++++
>  arch/x86/kvm/x86.c              | 95 ++++++++++++++++++++++++++++++++-
>  include/uapi/linux/kvm.h        |  3 ++
>  6 files changed, 169 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f5145b86d620..c7a9f03f39a7 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1238,6 +1238,10 @@ struct kvm_x86_ops {
>  
>  	bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
>  	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> +
> +	int (*init_spp)(struct kvm *kvm);
> +	int (*flush_subpages)(struct kvm *kvm, u64 gfn, u32 npages);
> +	int (*get_inst_len)(struct kvm_vcpu *vcpu);

If this is necessary, which hopefully it isn't, then get_insn_len() to be
consistent with other KVM nomenclature.

A comment for the series overall, it needs a lot of work to properly order
code between patches.  E.g. this patch introduces get_inst_len() without
any justification in the changelog and without a user.  At best it's
confusing, at worst this series will be impossible to bisect.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 02/10] vmx: spp: Add control flags for Sub-Page Protection(SPP)
  2020-01-10 16:58   ` Sean Christopherson
@ 2020-01-13  5:44     ` Yang Weijiang
  0 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2020-01-13  5:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	alazar, edwin.zhai

On Fri, Jan 10, 2020 at 08:58:59AM -0800, Sean Christopherson wrote:
> On Thu, Jan 02, 2020 at 02:13:11PM +0800, Yang Weijiang wrote:
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index e3394c839dea..5713e8a6224c 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -60,6 +60,7 @@
> >  #include "vmcs12.h"
> >  #include "vmx.h"
> >  #include "x86.h"
> > +#include "../mmu/spp.h"
> 
> The ".." should be unnecessary, e.g. x86.h is obviously a level up.
>
Sean, thanks a lot for the feedback! Will change this.
> >  MODULE_AUTHOR("Qumranet");
> >  MODULE_LICENSE("GPL");
> > @@ -111,6 +112,7 @@ module_param_named(pml, enable_pml, bool, S_IRUGO);
> >  
> >  static bool __read_mostly dump_invalid_vmcs = 0;
> >  module_param(dump_invalid_vmcs, bool, 0644);
> > +static bool __read_mostly spp_supported = 0;
> 
> s/spp_supported/enable_spp to be consistent with all the other booleans.
> 
> Is there a reason this isn't exposed as a module param?
> 
Yes, in original versions, SPP is enbled by a module param, so called
"static enable", considering the SPP bitmap pre-allocated is a bit
large, the from v3, it's changed to "dynamic enable", i.e., user
application need to enable SPP via init_spp IOCTL(later changed to via
ENABLE_CAP) to remove the pre-allocation, so the flag now is used to
cross-check SPP status between functions. Will change the name.

> And if this is to be on by default, then the flag itself should be
> initialized to '1' so that it's clear to readers that the feature is
> enabled by default (if it's supported).  Looking at only this code, I would
> think that SPP is forced off and can't be enabled.
> 
> That being said, turning on the enable_spp control flag should be the last
> patch in the series, i.e. it shouldn't be turned on until all the
> underlying support code is in place.  So, I would keep this as is, but
> invert the code in hardware_setup() below.  That way the flag exists and
> is checked, but can't be turned on without modifying the code.  Then when
> all is said and done, you can add a patch to introduce the module param
> and turn on the flag by default (if that's indeed what we want).
> 
You're right, I'll re-order the patch to enable SPP bit in the last
patch, thanks!

> >  #define MSR_BITMAP_MODE_X2APIC		1
> >  #define MSR_BITMAP_MODE_X2APIC_APICV	2
> > @@ -2391,6 +2393,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
> >  			SECONDARY_EXEC_RDSEED_EXITING |
> >  			SECONDARY_EXEC_RDRAND_EXITING |
> >  			SECONDARY_EXEC_ENABLE_PML |
> > +			SECONDARY_EXEC_ENABLE_SPP |
> >  			SECONDARY_EXEC_TSC_SCALING |
> >  			SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE |
> >  			SECONDARY_EXEC_PT_USE_GPA |
> > @@ -4039,6 +4042,9 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
> >  	if (!enable_pml)
> >  		exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
> >  
> > +	if (!spp_supported)
> > +		exec_control &= ~SECONDARY_EXEC_ENABLE_SPP;
> > +
> >  	if (vmx_xsaves_supported()) {
> >  		/* Exposing XSAVES only when XSAVE is exposed */
> >  		bool xsaves_enabled =
> > @@ -7630,6 +7636,9 @@ static __init int hardware_setup(void)
> >  	if (!cpu_has_vmx_flexpriority())
> >  		flexpriority_enabled = 0;
> >  
> > +	if (cpu_has_vmx_ept_spp() && enable_ept)
> > +		spp_supported = 1;
> 
> As above, invert this to disable spp when it's not supported, or when EPT
> is disabled (or not supported).
>
Sure,thank you!
> > +
> >  	if (!cpu_has_virtual_nmis())
> >  		enable_vnmi = 0;
> >  
> > -- 
> > 2.17.2
> > 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 03/10] mmu: spp: Add SPP Table setup functions
  2020-01-10 17:26   ` Sean Christopherson
@ 2020-01-13  6:00     ` Yang Weijiang
  0 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2020-01-13  6:00 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	alazar, edwin.zhai

On Fri, Jan 10, 2020 at 09:26:11AM -0800, Sean Christopherson wrote:
> On Thu, Jan 02, 2020 at 02:13:12PM +0800, Yang Weijiang wrote:
> > SPPT is a 4-level paging structure similar to EPT, when SPP is
> 
> How does SPP interact with 5-level EPT?
>
It should work, will add the test later.

> > armed for target physical page, bit 61 of the corresponding
> > EPT entry is flaged, then SPPT is traversed with the gfn,
> > the leaf entry of SPPT contains the access bitmap of subpages
> > inside the target 4KB physical page, one bit per 128-byte subpage.
> > 
> > Co-developed-by: He Chen <he.chen@linux.intel.com>
> > Signed-off-by: He Chen <he.chen@linux.intel.com>
> > Co-developed-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > ---
> 
> ...
> 
> > +static u64 format_spp_spte(u32 spp_wp_bitmap)
> > +{
> > +	u64 new_spte = 0;
> > +	int i = 0;
> > +
> > +	/*
> > +	 * One 4K page contains 32 sub-pages, in SPP table L4E, old bits
> 
> Is this
> 
> 	One 4k page constains 32 sub-pages in SPP table L4E.  Old bits are...
> 
> or
> 	One 4k page contains 32 sub-pages.  In SPP table L4E, old bits are...
> 
> or
> 	???
>
The second case, there's a typo, old should be odd. will modify it,
thank you!

> > +	 * are reserved, so we need to transfer u32 subpage write
> 
> Wrap comments at 80 columns to save lines.
> 
> > +	 * protect bitmap to u64 SPP L4E format.
> 
> What is a "page" in "one 4k page"?  What old bits?  Why not just track a
> 64-bit value?  I understand *what* the code below does, but I have no clue
> why or whether it's correct.
old should be "odd", according to SDM, the write-permission is tracked
via even bits(2i), then 32*128 = 4KB. odd bits are reserved now.
> 
> > +	 */
> > +	while (i < 32) {
> > +		if (spp_wp_bitmap & (1ULL << i))
> > +			new_spte |= 1ULL << (i * 2);
> > +		i++;
> > +	}
> 
> 	for (i = 0; i < 32; i++)
> 		new_spte |= (spp_wp_bitmap & BIT_ULL(i)) << i;
> 
> At the very least, use a for loop.
Sure, will change it, thank you!

> 
> > +
> > +	return new_spte;
> > +}
> > +
> > +static void spp_spte_set(u64 *sptep, u64 new_spte)
> > +{
> > +	__set_spte(sptep, new_spte);
> > +}
> > +
> > +bool is_spp_spte(struct kvm_mmu_page *sp)
> > +{
> > +	return sp->role.spp;
> > +}
> > +
> > +#define SPPT_ENTRY_PHA_MASK (0xFFFFFFFFFF << 12)
> > +
> > +int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
> > +			    u32 access_map, gfn_t gfn)
> > +{
> > +	struct kvm_shadow_walk_iterator iter;
> > +	struct kvm_mmu_page *sp;
> > +	gfn_t pseudo_gfn;
> > +	u64 old_spte, spp_spte;
> > +	int ret = -EFAULT;
> > +
> > +	/* direct_map spp start */
> > +	if (!VALID_PAGE(vcpu->kvm->arch.sppt_root))
> > +		return -EFAULT;
> > +
> > +	for_each_shadow_spp_entry(vcpu, (u64)gfn << PAGE_SHIFT, iter) {
> > +		if (iter.level == PT_PAGE_TABLE_LEVEL) {
> > +			spp_spte = format_spp_spte(access_map);
> > +			old_spte = mmu_spte_get_lockless(iter.sptep);
> > +			if (old_spte != spp_spte)
> > +				spp_spte_set(iter.sptep, spp_spte);
> > +			ret = 0;
> > +			break;
> > +		}
> > +
> > +		if (!is_shadow_present_pte(*iter.sptep)) {
> > +			u64 base_addr = iter.addr;
> > +
> > +			base_addr &= PT64_LVL_ADDR_MASK(iter.level);
> > +			pseudo_gfn = base_addr >> PAGE_SHIFT;
> > +			spp_spte = *iter.sptep;
> > +			sp = kvm_spp_get_page(vcpu, pseudo_gfn,
> > +					      iter.level - 1);
> > +			link_spp_shadow_page(vcpu, iter.sptep, sp);
> > +		} else if (iter.level == PT_DIRECTORY_LEVEL  &&
> > +			   !(spp_spte & PT_PRESENT_MASK) &&
> > +			   (spp_spte & SPPT_ENTRY_PHA_MASK)) {
> > +			spp_spte = *iter.sptep;
> > +			spp_spte |= PT_PRESENT_MASK;
> > +			spp_spte_set(iter.sptep, spp_spte);
> > +		}
> > +	}
> > +
> > +	kvm_flush_remote_tlbs(vcpu->kvm);
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_spp_setup_structure);
> > +
> > +inline u64 construct_spptp(unsigned long root_hpa)
> > +{
> > +	return root_hpa & PAGE_MASK;
> > +}
> > +EXPORT_SYMBOL_GPL(construct_spptp);
> > +
> > diff --git a/arch/x86/kvm/mmu/spp.h b/arch/x86/kvm/mmu/spp.h
> > new file mode 100644
> > index 000000000000..8ef94b7a2057
> > --- /dev/null
> > +++ b/arch/x86/kvm/mmu/spp.h
> > @@ -0,0 +1,10 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __KVM_X86_VMX_SPP_H
> > +#define __KVM_X86_VMX_SPP_H
> > +
> > +bool is_spp_spte(struct kvm_mmu_page *sp);
> > +u64 construct_spptp(unsigned long root_hpa);
> > +int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
> > +			    u32 access_map, gfn_t gfn);
> > +
> > +#endif /* __KVM_X86_VMX_SPP_H */
> > -- 
> > 2.17.2
> > 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 03/10] mmu: spp: Add SPP Table setup functions
  2020-01-10 17:40   ` Sean Christopherson
@ 2020-01-13  6:04     ` Yang Weijiang
  0 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2020-01-13  6:04 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	alazar, edwin.zhai

On Fri, Jan 10, 2020 at 09:40:27AM -0800, Sean Christopherson wrote:
> On Thu, Jan 02, 2020 at 02:13:12PM +0800, Yang Weijiang wrote:
> > SPPT is a 4-level paging structure similar to EPT, when SPP is
> > armed for target physical page, bit 61 of the corresponding
> > EPT entry is flaged, then SPPT is traversed with the gfn,
> > the leaf entry of SPPT contains the access bitmap of subpages
> > inside the target 4KB physical page, one bit per 128-byte subpage.
> > 
> > Co-developed-by: He Chen <he.chen@linux.intel.com>
> > Signed-off-by: He Chen <he.chen@linux.intel.com>
> > Co-developed-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |   5 +-
> >  arch/x86/kvm/mmu/spp.c          | 228 ++++++++++++++++++++++++++++++++
> >  arch/x86/kvm/mmu/spp.h          |  10 ++
> 
> This patch is completely untesteable.  It adds spp.c but doesn't compile
> it.  Actually, everything up until patch 06 is untestable.
OK, will re-order the patches in a reasonable sequence. thank you!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 04/10] mmu: spp: Add functions to operate SPP access bitmap
  2020-01-10 17:38   ` Sean Christopherson
@ 2020-01-13  6:15     ` Yang Weijiang
  0 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2020-01-13  6:15 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	alazar, edwin.zhai

On Fri, Jan 10, 2020 at 09:38:04AM -0800, Sean Christopherson wrote:
> On Thu, Jan 02, 2020 at 02:13:13PM +0800, Yang Weijiang wrote:
> > Create access bitmap for SPP subpages, the bitmap can
> > be accessed with a gfn. The initial access bitmap for each
> > physical page is 0xFFFFFFFF, meaning SPP is not enabled for the
> > subpages.
> 
> Wrap changelogs at ~75 chars.
> 
> Create access bitmap for SPP subpages, the bitmap can be accessed with a
> gfn.  The initial access bitmap for each physical page is 0xFFFFFFFF,
> meaning SPP is not enabled for the subpages.
> 
> There needs to be a *lot* more information provided in all of the changelogs
> for this series.  I understand the basic concepts of SPP, but nothing in the
> documentation or changelogs explains how KVM generates the SPP tables based
> on userspace input.  Essentially, explain the design in decent detail, with
> a focus on *why* KVM does what it does.
>
OK, will modify the documentation to add SPPT setup section, thanks!

> > +static int kvm_spp_level_pages(gfn_t gfn_lower, gfn_t gfn_upper, int level)
> > +{
> > +	int page_num = KVM_PAGES_PER_HPAGE(level);
> > +	gfn_t gfn_max = (gfn_lower & ~(page_num - 1)) + page_num - 1;
> > +	int ret;
> > +
> > +	if (gfn_upper <= gfn_max)
> > +		ret = gfn_upper - gfn_lower + 1;
> > +	else
> > +		ret = gfn_max - gfn_lower + 1;
> > +
> > +	return ret;
> > +}
> > +
> >  #define SPPT_ENTRY_PHA_MASK (0xFFFFFFFFFF << 12)
> 
> There's almost certainly an existing macro for this.
>
Sure, will remove it.
> >  
> >  int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
> > @@ -220,6 +249,309 @@ int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_spp_setup_structure);
> >  
> > +int vmx_spp_flush_sppt(struct kvm *kvm, u64 gfn_base, u32 npages)
> > +{
> > +	struct kvm_shadow_walk_iterator iter;
> > +	struct kvm_vcpu *vcpu;
> > +	gfn_t gfn = gfn_base;
> > +	gfn_t gfn_max = gfn_base + npages - 1;
> 
> s/gfn_max/gfn_end.  "max" makes me think this is literally walking every
> possible gfn.
> 
Make sense, will change it.

> > +	u64 spde;
> > +	int count;
> > +	bool flush = false;
> > +
> >  /*
> >   * The bit 0 ~ bit 15 of kvm_memory_region::flags are visible for userspace,
> >   * other bits are reserved for kvm internal use which are defined in
> > -- 
> > 2.17.2
> > 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 06/10] vmx: spp: Set up SPP paging table at vmentry/vmexit
  2020-01-10 17:55   ` Sean Christopherson
@ 2020-01-13  6:50     ` Yang Weijiang
  0 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2020-01-13  6:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	alazar, edwin.zhai

On Fri, Jan 10, 2020 at 09:55:37AM -0800, Sean Christopherson wrote:
> On Thu, Jan 02, 2020 at 02:13:15PM +0800, Yang Weijiang wrote:
> > If write to subpage is not allowed, EPT violation generates
> > and it's handled in fast_page_fault().
> > 
> > In current implementation, SPPT setup is only handled in handle_spp()
> > vmexit handler, it's triggered when SPP bit is set in EPT leaf
> > entry while SPPT entries are not ready.
> > 
> > A SPP specific bit(11) is added to exit_qualification and a new
> > exit reason(66) is introduced for SPP.
> 
> ...
> 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 6f92b40d798c..c41791ebee65 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -6372,6 +6427,8 @@ unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
> >  	return nr_mmu_pages;
> >  }
> >  
> > +#include "spp.c"
> > +
> 
> Unless there is a *very* good reason for these shenanigans, spp.c needs
> to built via the Makefile like any other source.  If this is justified
> for whatever reason, then that justification needs to be very clearly
> stated in the changelog.

Yes, it looks odd. When extracted the SPP code from mmu.c, I found a lot
of functions in mmu.c should be exposed so that spp.c can see them, I
took them as unnecessary modification to mmu.c, so just add the spp.c file back
to mmu.c, if you suggest change it with a seperate object file, I'll do it.

> 
> In general, the code organization of this entire series likely needs to
> be overhauled.  There are gobs exports which are either completely
> unnecessary or completely backswards.
> 
> E.g. exporting VMX-only functions from spp.c, which presumably are only
> callbed by VMX.
> 
> 	EXPORT_SYMBOL_GPL(vmx_spp_flush_sppt);
> 	EXPORT_SYMBOL_GPL(vmx_spp_init);
> 
> Exporting ioctl helpers from the same file, which are presumably called
> only from x86.c.
> 
> 	EXPORT_SYMBOL_GPL(kvm_vm_ioctl_get_subpages);
> 	EXPORT_SYMBOL_GPL(kvm_vm_ioctl_set_subpages);

Thanks for the suggestion, I'll go over the patches and or-organize them.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 06/10] vmx: spp: Set up SPP paging table at vmentry/vmexit
  2020-01-10 18:04   ` Sean Christopherson
@ 2020-01-13  8:10     ` Yang Weijiang
  2020-01-13 17:33       ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Yang Weijiang @ 2020-01-13  8:10 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	alazar, edwin.zhai

On Fri, Jan 10, 2020 at 10:04:59AM -0800, Sean Christopherson wrote:
> On Thu, Jan 02, 2020 at 02:13:15PM +0800, Yang Weijiang wrote:
> > @@ -3585,7 +3602,30 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
> >  		if ((error_code & PFERR_WRITE_MASK) &&
> >  		    spte_can_locklessly_be_made_writable(spte))
> >  		{
> > -			new_spte |= PT_WRITABLE_MASK;
> > +			/*
> > +			 * Record write protect fault caused by
> > +			 * Sub-page Protection, let VMI decide
> > +			 * the next step.
> > +			 */
> > +			if (spte & PT_SPP_MASK) {
> > +				int len = kvm_x86_ops->get_inst_len(vcpu);
> 
> There's got to be a better way to handle SPP exits than adding a helper
> to retrieve the instruction length.
>
The fault instruction was skipped by kvm_skip_emulated_instruction()
before, but Paolo suggested leave the re-do or skip option to user-space
to make it flexible for write protection or write tracking, so return
length to user-space.

> > +
> > +				fault_handled = true;
> > +				vcpu->run->exit_reason = KVM_EXIT_SPP;
> > +				vcpu->run->spp.addr = gva;
> > +				vcpu->run->spp.ins_len = len;
> 
> s/ins_len/insn_len to be consistent with other KVM nomenclature.
> 
OK.

> > +				trace_kvm_spp_induced_page_fault(vcpu,
> > +								 gva,
> > +								 len);
> > +				break;
> > +			}
> > +
> > +			if (was_spp_armed(new_spte)) {
> > +				restore_spp_bit(&new_spte);
> > +				spp_protected = true;
> > +			} else {
> > +				new_spte |= PT_WRITABLE_MASK;
> > +			}
> >  
> >  			/*
> >  			 * Do not fix write-permission on the large spte.  Since
> 
> ...
> 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 24e4e1c47f42..97d862c79124 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -200,7 +200,6 @@ static const struct {
> >  	[VMENTER_L1D_FLUSH_EPT_DISABLED] = {"EPT disabled", false},
> >  	[VMENTER_L1D_FLUSH_NOT_REQUIRED] = {"not required", false},
> >  };
> > -
> 
> Spurious whitepsace.
>
OK.

> >  #define L1D_CACHE_ORDER 4
> >  static void *vmx_l1d_flush_pages;
> >  
> > @@ -2999,6 +2998,7 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> >  	bool update_guest_cr3 = true;
> >  	unsigned long guest_cr3;
> >  	u64 eptp;
> > +	u64 spptp;
> >  
> >  	guest_cr3 = cr3;
> >  	if (enable_ept) {
> > @@ -3027,6 +3027,12 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> >  
> >  	if (update_guest_cr3)
> >  		vmcs_writel(GUEST_CR3, guest_cr3);
> > +
> > +	if (kvm->arch.spp_active && VALID_PAGE(vcpu->kvm->arch.sppt_root)) {
> > +		spptp = construct_spptp(vcpu->kvm->arch.sppt_root);
> > +		vmcs_write64(SPPT_POINTER, spptp);
> > +		vmx_flush_tlb(vcpu, true);
> 
> Why is SPP so special that it gets to force TLB flushes?
I double checked the code, there's a call to vmx_flush_tlb() in
mmu_load(), so it's unnecessary here, thank you!

> > +	}
> >  }
> >  
> >  int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> > @@ -5361,6 +5367,74 @@ static int handle_monitor_trap(struct kvm_vcpu *vcpu)
> >  	return 1;
> >  }
> >  
> > +int handle_spp(struct kvm_vcpu *vcpu)
> 
> Can be static.
>
Thanks!

> > +{
> > +	unsigned long exit_qualification;
> > +	struct kvm_memory_slot *slot;
> > +	gpa_t gpa;
> > +	gfn_t gfn;
> > +
> > +	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
> > +
> > +	/*
> > +	 * SPP VM exit happened while executing iret from NMI,
> > +	 * "blocked by NMI" bit has to be set before next VM entry.
> > +	 * There are errata that may cause this bit to not be set:
> > +	 * AAK134, BY25.
> > +	 */
> > +	if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
> > +	    (exit_qualification & SPPT_INTR_INFO_UNBLOCK_NMI))
> > +		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
> > +			      GUEST_INTR_STATE_NMI);
> > +
> > +	vcpu->arch.exit_qualification = exit_qualification;
> 
> 	if (WARN_ON(!(exit_qualification & SPPT_INDUCED_EXIT_TYPE)))
> 		goto out_err;
> 
> 	<handle spp exit>
> 
> 	return 1;
> 
> out_err:
> 	vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
> 	vcpu->run->hw.hardware_exit_reason = EXIT_REASON_SPP;
> 	return 0;
>
Sure, will change it.

> > +	if (exit_qualification & SPPT_INDUCED_EXIT_TYPE) {
> > +		int page_num = KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL);
> 
> The compiler is probably clever enough to make these constants, but if
> this logic is a fundamental property of SPP then it should be defined as
> a macro somewhere.
>
OK, will change it.

> > +		u32 *access;
> > +		gfn_t gfn_max;
> > +
> > +		/*
> > +		 * SPPT missing
> > +		 * We don't set SPP write access for the corresponding
> > +		 * GPA, if we haven't setup, we need to construct
> > +		 * SPP table here.
> > +		 */
> > +		gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> > +		gfn = gpa >> PAGE_SHIFT;
> 
> gpa_to_gfn()
>
OK.

> > +		trace_kvm_spp_induced_exit(vcpu, gpa, exit_qualification);
> > +		/*
> > +		 * In level 1 of SPPT, there's no PRESENT bit, all data is
> > +		 * regarded as permission vector, so need to check from
> > +		 * level 2 to set up the vector if target page is protected.
> > +		 */
> > +		spin_lock(&vcpu->kvm->mmu_lock);
> > +		gfn &= ~(page_num - 1);
> 
> 
> 
> > +		gfn_max = gfn + page_num - 1;
> 
> s/gfn_max/gfn_end
OK.

> 
> > +		for (; gfn <= gfn_max; gfn++) {
> 
> My preference would be to do:
> 		gfn_end = gfn + page_num;
> 
> 		for ( ; gfn < gfn_end; gfn++)
>
Thank you!
> > +			slot = gfn_to_memslot(vcpu->kvm, gfn);


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 05/10] x86: spp: Introduce user-space SPP IOCTLs
  2020-01-10 18:10   ` Sean Christopherson
@ 2020-01-13  8:21     ` Yang Weijiang
  0 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2020-01-13  8:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	alazar, edwin.zhai

On Fri, Jan 10, 2020 at 10:10:53AM -0800, Sean Christopherson wrote:
> On Thu, Jan 02, 2020 at 02:13:14PM +0800, Yang Weijiang wrote:
> > User application, e.g., QEMU or VMI, must initialize SPP
> > before gets/sets SPP subpages, the dynamic initialization is to
> > reduce the extra storage cost if the SPP feature is not not used.
> > 
> > Co-developed-by: He Chen <he.chen@linux.intel.com>
> > Signed-off-by: He Chen <he.chen@linux.intel.com>
> > Co-developed-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  4 ++
> >  arch/x86/kvm/mmu/spp.c          | 44 +++++++++++++++
> >  arch/x86/kvm/mmu/spp.h          |  9 ++++
> >  arch/x86/kvm/vmx/vmx.c          | 15 ++++++
> >  arch/x86/kvm/x86.c              | 95 ++++++++++++++++++++++++++++++++-
> >  include/uapi/linux/kvm.h        |  3 ++
> >  6 files changed, 169 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index f5145b86d620..c7a9f03f39a7 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1238,6 +1238,10 @@ struct kvm_x86_ops {
> >  
> >  	bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> >  	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> > +
> > +	int (*init_spp)(struct kvm *kvm);
> > +	int (*flush_subpages)(struct kvm *kvm, u64 gfn, u32 npages);
> > +	int (*get_inst_len)(struct kvm_vcpu *vcpu);
> 
> If this is necessary, which hopefully it isn't, then get_insn_len() to be
> consistent with other KVM nomenclature.
>
Yep, will change it.

> A comment for the series overall, it needs a lot of work to properly order
> code between patches.  E.g. this patch introduces get_inst_len() without
> any justification in the changelog and without a user.  At best it's
> confusing, at worst this series will be impossible to bisect.

I'll double check the patch and add more comments on some confusing
points. Meanwhile, will re-order some code to make the serial testable,
thanks a lot for your careful review!


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 06/10] vmx: spp: Set up SPP paging table at vmentry/vmexit
  2020-01-13  8:10     ` Yang Weijiang
@ 2020-01-13 17:33       ` Sean Christopherson
  2020-01-13 18:55         ` Adalbert Lazăr
  2020-01-14  3:08         ` Yang Weijiang
  0 siblings, 2 replies; 32+ messages in thread
From: Sean Christopherson @ 2020-01-13 17:33 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, alazar, edwin.zhai

On Mon, Jan 13, 2020 at 04:10:50PM +0800, Yang Weijiang wrote:
> On Fri, Jan 10, 2020 at 10:04:59AM -0800, Sean Christopherson wrote:
> > On Thu, Jan 02, 2020 at 02:13:15PM +0800, Yang Weijiang wrote:
> > > @@ -3585,7 +3602,30 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
> > >  		if ((error_code & PFERR_WRITE_MASK) &&
> > >  		    spte_can_locklessly_be_made_writable(spte))
> > >  		{
> > > -			new_spte |= PT_WRITABLE_MASK;
> > > +			/*
> > > +			 * Record write protect fault caused by
> > > +			 * Sub-page Protection, let VMI decide
> > > +			 * the next step.
> > > +			 */
> > > +			if (spte & PT_SPP_MASK) {
> > > +				int len = kvm_x86_ops->get_inst_len(vcpu);
> > 
> > There's got to be a better way to handle SPP exits than adding a helper
> > to retrieve the instruction length.
> >
> The fault instruction was skipped by kvm_skip_emulated_instruction()
> before, but Paolo suggested leave the re-do or skip option to user-space
> to make it flexible for write protection or write tracking, so return
> length to user-space.

Sorry, my comment was unclear.  I have no objection to punting the fault
to userspace, it's the mechanics of how it's done that I dislike.

Specifically, (a) using run->exit_reason to propagate the SPP exit up the
stack, e.g. instead of modifying affected call stacks to play nice with
any exit to userspace, (b) assuming ->get_insn_len() will always be
accurate, e.g. see the various caveats in skip_emulated_instruction() for
both VMX and SVM, and (c) duplicating the state capture code in every
location that can encounter a SPP fault.

What I'm hoping is that it's possible to modify the call stacks to
explicitly propagate an exit to userspace and/or SPP fault, and shove all
the state capture into a common location, e.g. handle_ept_violation().

Side topic, assuming the userspace VMI is going to be instrospecting the
faulting instruction, won't it decode the instruction?  I.e. calculate
the instruction length anyways?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 06/10] vmx: spp: Set up SPP paging table at vmentry/vmexit
  2020-01-13 17:33       ` Sean Christopherson
@ 2020-01-13 18:55         ` Adalbert Lazăr
  2020-01-13 21:47           ` Sean Christopherson
  2020-01-14  3:08         ` Yang Weijiang
  1 sibling, 1 reply; 32+ messages in thread
From: Adalbert Lazăr @ 2020-01-13 18:55 UTC (permalink / raw)
  To: Sean Christopherson, Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, edwin.zhai,
	tamas, mathieu.tarral

On Mon, 13 Jan 2020 09:33:58 -0800, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> On Mon, Jan 13, 2020 at 04:10:50PM +0800, Yang Weijiang wrote:
> > On Fri, Jan 10, 2020 at 10:04:59AM -0800, Sean Christopherson wrote:
> > > On Thu, Jan 02, 2020 at 02:13:15PM +0800, Yang Weijiang wrote:
> > > > @@ -3585,7 +3602,30 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
> > > >  		if ((error_code & PFERR_WRITE_MASK) &&
> > > >  		    spte_can_locklessly_be_made_writable(spte))
> > > >  		{
> > > > -			new_spte |= PT_WRITABLE_MASK;
> > > > +			/*
> > > > +			 * Record write protect fault caused by
> > > > +			 * Sub-page Protection, let VMI decide
> > > > +			 * the next step.
> > > > +			 */
> > > > +			if (spte & PT_SPP_MASK) {
> > > > +				int len = kvm_x86_ops->get_inst_len(vcpu);
> > > 
> > > There's got to be a better way to handle SPP exits than adding a helper
> > > to retrieve the instruction length.
> > >
> > The fault instruction was skipped by kvm_skip_emulated_instruction()
> > before, but Paolo suggested leave the re-do or skip option to user-space
> > to make it flexible for write protection or write tracking, so return
> > length to user-space.
> 
> Sorry, my comment was unclear.  I have no objection to punting the fault
> to userspace, it's the mechanics of how it's done that I dislike.
> 
> Specifically, (a) using run->exit_reason to propagate the SPP exit up the
> stack, e.g. instead of modifying affected call stacks to play nice with
> any exit to userspace, (b) assuming ->get_insn_len() will always be
> accurate, e.g. see the various caveats in skip_emulated_instruction() for
> both VMX and SVM, and (c) duplicating the state capture code in every
> location that can encounter a SPP fault.
> 
> What I'm hoping is that it's possible to modify the call stacks to
> explicitly propagate an exit to userspace and/or SPP fault, and shove all
> the state capture into a common location, e.g. handle_ept_violation().
> 
> Side topic, assuming the userspace VMI is going to be instrospecting the
> faulting instruction, won't it decode the instruction?  I.e. calculate
> the instruction length anyways?

Indeed, we decode the instruction from userspace. I don't know if the
instruction length helps other projects. Added Tamas and Mathieu.

In our last VMI API proposal, the breakpoint event had the instruction
length sent to userspace, but I can't remember why.

https://lore.kernel.org/kvm/20190809160047.8319-62-alazar@bitdefender.com/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 06/10] vmx: spp: Set up SPP paging table at vmentry/vmexit
  2020-01-13 18:55         ` Adalbert Lazăr
@ 2020-01-13 21:47           ` Sean Christopherson
  0 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2020-01-13 21:47 UTC (permalink / raw)
  To: Adalbert Lazăr
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	edwin.zhai, tamas, mathieu.tarral

On Mon, Jan 13, 2020 at 08:55:46PM +0200, Adalbert Lazăr wrote:
> On Mon, 13 Jan 2020 09:33:58 -0800, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> > On Mon, Jan 13, 2020 at 04:10:50PM +0800, Yang Weijiang wrote:
> > > On Fri, Jan 10, 2020 at 10:04:59AM -0800, Sean Christopherson wrote:
> > > > On Thu, Jan 02, 2020 at 02:13:15PM +0800, Yang Weijiang wrote:
> > > > > @@ -3585,7 +3602,30 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
> > > > >  		if ((error_code & PFERR_WRITE_MASK) &&
> > > > >  		    spte_can_locklessly_be_made_writable(spte))
> > > > >  		{
> > > > > -			new_spte |= PT_WRITABLE_MASK;
> > > > > +			/*
> > > > > +			 * Record write protect fault caused by
> > > > > +			 * Sub-page Protection, let VMI decide
> > > > > +			 * the next step.
> > > > > +			 */
> > > > > +			if (spte & PT_SPP_MASK) {
> > > > > +				int len = kvm_x86_ops->get_inst_len(vcpu);
> > > > 
> > > > There's got to be a better way to handle SPP exits than adding a helper
> > > > to retrieve the instruction length.
> > > >
> > > The fault instruction was skipped by kvm_skip_emulated_instruction()
> > > before, but Paolo suggested leave the re-do or skip option to user-space
> > > to make it flexible for write protection or write tracking, so return
> > > length to user-space.
> > 
> > Sorry, my comment was unclear.  I have no objection to punting the fault
> > to userspace, it's the mechanics of how it's done that I dislike.
> > 
> > Specifically, (a) using run->exit_reason to propagate the SPP exit up the
> > stack, e.g. instead of modifying affected call stacks to play nice with
> > any exit to userspace, (b) assuming ->get_insn_len() will always be
> > accurate, e.g. see the various caveats in skip_emulated_instruction() for
> > both VMX and SVM, and (c) duplicating the state capture code in every
> > location that can encounter a SPP fault.
> > 
> > What I'm hoping is that it's possible to modify the call stacks to
> > explicitly propagate an exit to userspace and/or SPP fault, and shove all
> > the state capture into a common location, e.g. handle_ept_violation().
> > 
> > Side topic, assuming the userspace VMI is going to be instrospecting the
> > faulting instruction, won't it decode the instruction?  I.e. calculate
> > the instruction length anyways?
> 
> Indeed, we decode the instruction from userspace. I don't know if the
> instruction length helps other projects. Added Tamas and Mathieu.
> 
> In our last VMI API proposal, the breakpoint event had the instruction
> length sent to userspace, but I can't remember why.

INT3 is trap-like, i.e. the VM-Exit occurs after the instruction retires.
It's impossible for software to know how far to unwind RIP without the
instruction length being provided by hardware/KVM, e.g. if the guest is
being silly and prepends ignored prefixes on the INT3.

Self-aware software has a priori knowledge of what's being patched in,
and practically speaking I don't any well-behaved sane software uses
prefixes with INT3, but from a VMM's perspective it's legal and possible.

> 
> https://lore.kernel.org/kvm/20190809160047.8319-62-alazar@bitdefender.com/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 06/10] vmx: spp: Set up SPP paging table at vmentry/vmexit
  2020-01-13 17:33       ` Sean Christopherson
  2020-01-13 18:55         ` Adalbert Lazăr
@ 2020-01-14  3:08         ` Yang Weijiang
  2020-01-14 18:58           ` Sean Christopherson
  1 sibling, 1 reply; 32+ messages in thread
From: Yang Weijiang @ 2020-01-14  3:08 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	alazar, edwin.zhai

On Mon, Jan 13, 2020 at 09:33:58AM -0800, Sean Christopherson wrote:
> On Mon, Jan 13, 2020 at 04:10:50PM +0800, Yang Weijiang wrote:
> > On Fri, Jan 10, 2020 at 10:04:59AM -0800, Sean Christopherson wrote:
> > > On Thu, Jan 02, 2020 at 02:13:15PM +0800, Yang Weijiang wrote:
> > > > @@ -3585,7 +3602,30 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
> > > >  		if ((error_code & PFERR_WRITE_MASK) &&
> > > >  		    spte_can_locklessly_be_made_writable(spte))
> > > >  		{
> > > > -			new_spte |= PT_WRITABLE_MASK;
> > > > +			/*
> > > > +			 * Record write protect fault caused by
> > > > +			 * Sub-page Protection, let VMI decide
> > > > +			 * the next step.
> > > > +			 */
> > > > +			if (spte & PT_SPP_MASK) {
> > > > +				int len = kvm_x86_ops->get_inst_len(vcpu);
> > > 
> > > There's got to be a better way to handle SPP exits than adding a helper
> > > to retrieve the instruction length.
> > >
> > The fault instruction was skipped by kvm_skip_emulated_instruction()
> > before, but Paolo suggested leave the re-do or skip option to user-space
> > to make it flexible for write protection or write tracking, so return
> > length to user-space.
> 
> Sorry, my comment was unclear.  I have no objection to punting the fault
> to userspace, it's the mechanics of how it's done that I dislike.
> 
> Specifically, (a) using run->exit_reason to propagate the SPP exit up the
> stack, e.g. instead of modifying affected call stacks to play nice with
> any exit to userspace, (b) assuming ->get_insn_len() will always be
> accurate, e.g. see the various caveats in skip_emulated_instruction() for
> both VMX and SVM, and (c) duplicating the state capture code in every
> location that can encounter a SPP fault.
How about calling skip_emulated_instruction() in KVM before exit to
userspace, but still return the skipped instruction length, if userspace
would like to re-execute the instruction, it can unwind RIP or simply
rely on KVM?

> 
> What I'm hoping is that it's possible to modify the call stacks to
> explicitly propagate an exit to userspace and/or SPP fault, and shove all
> the state capture into a common location, e.g. handle_ept_violation().
>
The problem is, the state capture code in fast_page_fault() and
emulation case share different causes, the former is generic occurence
of SPP induced EPT violation, the latter is atually a "faked" one while
detecting emulation instruction is writing some SPP protected area, so I
seperated them.

> Side topic, assuming the userspace VMI is going to be instrospecting the
> faulting instruction, won't it decode the instruction?  I.e. calculate
> the instruction length anyways?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 06/10] vmx: spp: Set up SPP paging table at vmentry/vmexit
  2020-01-14  3:08         ` Yang Weijiang
@ 2020-01-14 18:58           ` Sean Christopherson
  2020-01-15  1:36             ` Yang Weijiang
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2020-01-14 18:58 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, alazar, edwin.zhai

On Tue, Jan 14, 2020 at 11:08:20AM +0800, Yang Weijiang wrote:
> On Mon, Jan 13, 2020 at 09:33:58AM -0800, Sean Christopherson wrote:
> > On Mon, Jan 13, 2020 at 04:10:50PM +0800, Yang Weijiang wrote:
> > > On Fri, Jan 10, 2020 at 10:04:59AM -0800, Sean Christopherson wrote:
> > > > On Thu, Jan 02, 2020 at 02:13:15PM +0800, Yang Weijiang wrote:
> > > > > @@ -3585,7 +3602,30 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
> > > > >  		if ((error_code & PFERR_WRITE_MASK) &&
> > > > >  		    spte_can_locklessly_be_made_writable(spte))
> > > > >  		{
> > > > > -			new_spte |= PT_WRITABLE_MASK;
> > > > > +			/*
> > > > > +			 * Record write protect fault caused by
> > > > > +			 * Sub-page Protection, let VMI decide
> > > > > +			 * the next step.
> > > > > +			 */
> > > > > +			if (spte & PT_SPP_MASK) {
> > > > > +				int len = kvm_x86_ops->get_inst_len(vcpu);
> > > > 
> > > > There's got to be a better way to handle SPP exits than adding a helper
> > > > to retrieve the instruction length.
> > > >
> > > The fault instruction was skipped by kvm_skip_emulated_instruction()
> > > before, but Paolo suggested leave the re-do or skip option to user-space
> > > to make it flexible for write protection or write tracking, so return
> > > length to user-space.
> > 
> > Sorry, my comment was unclear.  I have no objection to punting the fault
> > to userspace, it's the mechanics of how it's done that I dislike.
> > 
> > Specifically, (a) using run->exit_reason to propagate the SPP exit up the
> > stack, e.g. instead of modifying affected call stacks to play nice with
> > any exit to userspace, (b) assuming ->get_insn_len() will always be
> > accurate, e.g. see the various caveats in skip_emulated_instruction() for
> > both VMX and SVM, and (c) duplicating the state capture code in every
> > location that can encounter a SPP fault.
>
> How about calling skip_emulated_instruction() in KVM before exit to

I'm confused.  It sounds like KVM_EXIT_SPP provides the instruction length
because it skips an instruction before exiting to userspace.  But if KVM
is is emulating an instruction, it shouldn't be doing
{kvm_}skip_emulated_instruction(), e.g. if emulation fails due to a SPP
violation (returns KVM_EXIT_SPP) then GUEST_RIP should still point at the
exiting instruction.  Ditto for the fast_page_fault() case, RIP shouldn't
be advanced.

What am I missing?

> userspace, but still return the skipped instruction length, if userspace
> would like to re-execute the instruction, it can unwind RIP or simply
> rely on KVM?

I'm not convinced the instruction length needs to be provided to userspace
for this case.  Obviously it's not difficult to provide the info, I just
don't understand the value added by doing so.  As above, RIP shouldn't
need to be unwound, and blindly skipping an instruction seems like an odd
thing for a VMI engine to do.

> > What I'm hoping is that it's possible to modify the call stacks to
> > explicitly propagate an exit to userspace and/or SPP fault, and shove all
> > the state capture into a common location, e.g. handle_ept_violation().
> >
> The problem is, the state capture code in fast_page_fault() and
> emulation case share different causes, the former is generic occurence
> of SPP induced EPT violation, the latter is atually a "faked" one while
> detecting emulation instruction is writing some SPP protected area, so I
> seperated them.

Can we make SPP dependent on unrestricted guest so that the only entry
point to the emulator is through handle_ept_violation()?  And thus the
only path to triggering KVM_EXIT_SPP would also be through
handle_ept_violation(); (I think, might be forgetting a different emulation
path).

>
> > Side topic, assuming the userspace VMI is going to be instrospecting the
> > faulting instruction, won't it decode the instruction?  I.e. calculate
> > the instruction length anyways?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RESEND PATCH v10 06/10] vmx: spp: Set up SPP paging table at vmentry/vmexit
  2020-01-14 18:58           ` Sean Christopherson
@ 2020-01-15  1:36             ` Yang Weijiang
  0 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2020-01-15  1:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	alazar, edwin.zhai

On Tue, Jan 14, 2020 at 10:58:08AM -0800, Sean Christopherson wrote:
> On Tue, Jan 14, 2020 at 11:08:20AM +0800, Yang Weijiang wrote:
> > On Mon, Jan 13, 2020 at 09:33:58AM -0800, Sean Christopherson wrote:
> > > On Mon, Jan 13, 2020 at 04:10:50PM +0800, Yang Weijiang wrote:
> > > > On Fri, Jan 10, 2020 at 10:04:59AM -0800, Sean Christopherson wrote:
> > > > > On Thu, Jan 02, 2020 at 02:13:15PM +0800, Yang Weijiang wrote:
> > > > > > @@ -3585,7 +3602,30 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
> > > > > >  		if ((error_code & PFERR_WRITE_MASK) &&
> > > > > >  		    spte_can_locklessly_be_made_writable(spte))
> > > > > >  		{
> > > > > > -			new_spte |= PT_WRITABLE_MASK;
> > > > > > +			/*
> > > > > > +			 * Record write protect fault caused by
> > > > > > +			 * Sub-page Protection, let VMI decide
> > > > > > +			 * the next step.
> > > > > > +			 */
> > > > > > +			if (spte & PT_SPP_MASK) {
> > > > > > +				int len = kvm_x86_ops->get_inst_len(vcpu);
> > > > > 
> > > > > There's got to be a better way to handle SPP exits than adding a helper
> > > > > to retrieve the instruction length.
> > > > >
> > > > The fault instruction was skipped by kvm_skip_emulated_instruction()
> > > > before, but Paolo suggested leave the re-do or skip option to user-space
> > > > to make it flexible for write protection or write tracking, so return
> > > > length to user-space.
> > > 
> > > Sorry, my comment was unclear.  I have no objection to punting the fault
> > > to userspace, it's the mechanics of how it's done that I dislike.
> > > 
> > > Specifically, (a) using run->exit_reason to propagate the SPP exit up the
> > > stack, e.g. instead of modifying affected call stacks to play nice with
> > > any exit to userspace, (b) assuming ->get_insn_len() will always be
> > > accurate, e.g. see the various caveats in skip_emulated_instruction() for
> > > both VMX and SVM, and (c) duplicating the state capture code in every
> > > location that can encounter a SPP fault.
> >
> > How about calling skip_emulated_instruction() in KVM before exit to
> 
> I'm confused.  It sounds like KVM_EXIT_SPP provides the instruction length
> because it skips an instruction before exiting to userspace.  But if KVM
> is is emulating an instruction, it shouldn't be doing
> {kvm_}skip_emulated_instruction(), e.g. if emulation fails due to a SPP
> violation (returns KVM_EXIT_SPP) then GUEST_RIP should still point at the
> exiting instruction.  Ditto for the fast_page_fault() case, RIP shouldn't
> be advanced.
There're two SPP usages, one is for write-protection the other is for
write-tracking. If the first case is being used, KVM ignores the write
, i.e., write to the memory is discarded. The second case is, if
userspace is tracking memory write through SPP, then it's notified via
KVM_EXIT_SPP but still let the write take effect by unprotecting the
subpage, i.e., like generic 4KB access-tracking.

In the first case, no necessity to re-try the faulted instruction,
the second case, a re-try is necessary, so I would skip current instruction
first, then if it's actually the second case, userspace should take action
based on the instruction lenght returned.
> 
> What am I missing?
>

> > userspace, but still return the skipped instruction length, if userspace
> > would like to re-execute the instruction, it can unwind RIP or simply
> > rely on KVM?
> 
> I'm not convinced the instruction length needs to be provided to userspace
> for this case.  Obviously it's not difficult to provide the info, I just
> don't understand the value added by doing so.  As above, RIP shouldn't
> need to be unwound, and blindly skipping an instruction seems like an odd
> thing for a VMI engine to do.
In the last review by Paolo, he mentioned SPP could be used in
access-tracing manner, it's flexible to provide instruction length to
userspace, so I removed instruction-skip in KVM but let userspace to
decide.
> 
> > > What I'm hoping is that it's possible to modify the call stacks to
> > > explicitly propagate an exit to userspace and/or SPP fault, and shove all
> > > the state capture into a common location, e.g. handle_ept_violation().
> > >
> > The problem is, the state capture code in fast_page_fault() and
> > emulation case share different causes, the former is generic occurence
> > of SPP induced EPT violation, the latter is atually a "faked" one while
> > detecting emulation instruction is writing some SPP protected area, so I
> > seperated them.
> 
> Can we make SPP dependent on unrestricted guest so that the only entry
> point to the emulator is through handle_ept_violation()?  And thus the
> only path to triggering KVM_EXIT_SPP would also be through
> handle_ept_violation(); (I think, might be forgetting a different emulation
> path).
> 
I don't got your point, from my understanding, instruction emulation is
used in several cases regarless of guest working mode. As long as any memory write happens
e.g., string ops, port ops etc, SPP write-protection
check should be applied to let the userspace capture the event.

> >
> > > Side topic, assuming the userspace VMI is going to be instrospecting the
> > > faulting instruction, won't it decode the instruction?  I.e. calculate
> > > the instruction length anyways?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RESEND PATCH v10 06/10] vmx: spp: Set up SPP paging table at vmentry/vmexit
  2020-01-02  5:18 [RESEND PATCH v10 00/10] Enable Sub-Page Write Protection Support Yang Weijiang
@ 2020-01-02  5:19 ` Yang Weijiang
  0 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2020-01-02  5:19 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, jmattson, sean.j.christopherson
  Cc: yu.c.zhang, alazar, edwin.zhai, Yang Weijiang

If write to subpage is not allowed, EPT violation generates
and it's handled in fast_page_fault().

In current implementation, SPPT setup is only handled in handle_spp()
vmexit handler, it's triggered when SPP bit is set in EPT leaf
entry while SPPT entries are not ready.

A SPP specific bit(11) is added to exit_qualification and a new
exit reason(66) is introduced for SPP.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Co-developed-by: He Chen <he.chen@linux.intel.com>
Signed-off-by: He Chen <he.chen@linux.intel.com>
Co-developed-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/vmx.h      |  9 ++++
 arch/x86/include/uapi/asm/vmx.h |  2 +
 arch/x86/kvm/mmu/mmu.c          | 69 +++++++++++++++++++++++++---
 arch/x86/kvm/mmu/spp.c          | 17 ++++++-
 arch/x86/kvm/mmu/spp.h          |  4 +-
 arch/x86/kvm/trace.h            | 66 +++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c          | 80 ++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c              |  4 ++
 include/uapi/linux/kvm.h        |  6 +++
 9 files changed, 247 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index de79a4de0723..81faf6607a9e 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -213,6 +213,8 @@ enum vmcs_field {
 	XSS_EXIT_BITMAP_HIGH            = 0x0000202D,
 	ENCLS_EXITING_BITMAP		= 0x0000202E,
 	ENCLS_EXITING_BITMAP_HIGH	= 0x0000202F,
+	SPPT_POINTER			= 0x00002030,
+	SPPT_POINTER_HIGH		= 0x00002031,
 	TSC_MULTIPLIER                  = 0x00002032,
 	TSC_MULTIPLIER_HIGH             = 0x00002033,
 	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
@@ -534,6 +536,13 @@ struct vmx_msr_entry {
 #define EPT_VIOLATION_EXECUTABLE	(1 << EPT_VIOLATION_EXECUTABLE_BIT)
 #define EPT_VIOLATION_GVA_TRANSLATED	(1 << EPT_VIOLATION_GVA_TRANSLATED_BIT)
 
+/*
+ * Exit Qualifications for SPPT-Induced vmexits
+ */
+#define SPPT_INDUCED_EXIT_TYPE_BIT     11
+#define SPPT_INDUCED_EXIT_TYPE         (1 << SPPT_INDUCED_EXIT_TYPE_BIT)
+#define SPPT_INTR_INFO_UNBLOCK_NMI     INTR_INFO_UNBLOCK_NMI
+
 /*
  * VM-instruction error numbers
  */
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index 3eb8411ab60e..4833a39a799b 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -86,6 +86,7 @@
 #define EXIT_REASON_PML_FULL            62
 #define EXIT_REASON_XSAVES              63
 #define EXIT_REASON_XRSTORS             64
+#define EXIT_REASON_SPP                 66
 #define EXIT_REASON_UMWAIT              67
 #define EXIT_REASON_TPAUSE              68
 
@@ -145,6 +146,7 @@
 	{ EXIT_REASON_ENCLS,                 "ENCLS" }, \
 	{ EXIT_REASON_RDSEED,                "RDSEED" }, \
 	{ EXIT_REASON_PML_FULL,              "PML_FULL" }, \
+	{ EXIT_REASON_SPP,                   "SPP" }, \
 	{ EXIT_REASON_XSAVES,                "XSAVES" }, \
 	{ EXIT_REASON_XRSTORS,               "XRSTORS" }, \
 	{ EXIT_REASON_UMWAIT,                "UMWAIT" }, \
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6f92b40d798c..c41791ebee65 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -20,6 +20,7 @@
 #include "x86.h"
 #include "kvm_cache_regs.h"
 #include "cpuid.h"
+#include "spp.h"
 
 #include <linux/kvm_host.h>
 #include <linux/types.h>
@@ -177,6 +178,7 @@ module_param(dbg, bool, 0644);
 /* The mask for the R/X bits in EPT PTEs */
 #define PT64_EPT_READABLE_MASK			0x1ull
 #define PT64_EPT_EXECUTABLE_MASK		0x4ull
+#define PT64_SPP_SAVED_BIT	(1ULL << (PT64_SECOND_AVAIL_BITS_SHIFT + 1))
 
 #include <trace/events/kvm.h>
 
@@ -200,6 +202,7 @@ enum {
 	RET_PF_RETRY = 0,
 	RET_PF_EMULATE = 1,
 	RET_PF_INVALID = 2,
+	RET_PF_USERSPACE = 3,
 };
 
 struct pte_list_desc {
@@ -979,6 +982,11 @@ static u64 mark_spte_for_access_track(u64 spte)
 		shadow_acc_track_saved_bits_shift;
 	spte &= ~shadow_acc_track_mask;
 
+	if (spte & PT_SPP_MASK) {
+		spte &= ~PT_SPP_MASK;
+		spte |= PT64_SPP_SAVED_BIT;
+	}
+
 	return spte;
 }
 
@@ -1677,9 +1685,15 @@ static bool spte_wrprot_for_clear_dirty(u64 *sptep)
 {
 	bool was_writable = test_and_clear_bit(PT_WRITABLE_SHIFT,
 					       (unsigned long *)sptep);
+	bool was_spp_armed = test_and_clear_bit(PT_SPP_SHIFT,
+					       (unsigned long *)sptep);
+
 	if (was_writable && !spte_ad_enabled(*sptep))
 		kvm_set_pfn_dirty(spte_to_pfn(*sptep));
 
+	if (was_spp_armed)
+		*sptep |= PT64_SPP_SAVED_BIT;
+
 	return was_writable;
 }
 
@@ -3478,7 +3492,8 @@ static bool page_fault_can_be_fast(u32 error_code)
  */
 static bool
 fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
-			u64 *sptep, u64 old_spte, u64 new_spte)
+			u64 *sptep, u64 old_spte, u64 new_spte,
+			bool spp_protected)
 {
 	gfn_t gfn;
 
@@ -3499,7 +3514,8 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	if (cmpxchg64(sptep, old_spte, new_spte) != old_spte)
 		return false;
 
-	if (is_writable_pte(new_spte) && !is_writable_pte(old_spte)) {
+	if ((is_writable_pte(new_spte) && !is_writable_pte(old_spte)) ||
+	    spp_protected) {
 		/*
 		 * The gfn of direct spte is stable since it is
 		 * calculated by sp->gfn.
@@ -3534,6 +3550,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 	struct kvm_shadow_walk_iterator iterator;
 	struct kvm_mmu_page *sp;
 	bool fault_handled = false;
+	bool spp_protected = false;
 	u64 spte = 0ull;
 	uint retry_count = 0;
 
@@ -3585,7 +3602,30 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 		if ((error_code & PFERR_WRITE_MASK) &&
 		    spte_can_locklessly_be_made_writable(spte))
 		{
-			new_spte |= PT_WRITABLE_MASK;
+			/*
+			 * Record write protect fault caused by
+			 * Sub-page Protection, let VMI decide
+			 * the next step.
+			 */
+			if (spte & PT_SPP_MASK) {
+				int len = kvm_x86_ops->get_inst_len(vcpu);
+
+				fault_handled = true;
+				vcpu->run->exit_reason = KVM_EXIT_SPP;
+				vcpu->run->spp.addr = gva;
+				vcpu->run->spp.ins_len = len;
+				trace_kvm_spp_induced_page_fault(vcpu,
+								 gva,
+								 len);
+				break;
+			}
+
+			if (was_spp_armed(new_spte)) {
+				restore_spp_bit(&new_spte);
+				spp_protected = true;
+			} else {
+				new_spte |= PT_WRITABLE_MASK;
+			}
 
 			/*
 			 * Do not fix write-permission on the large spte.  Since
@@ -3604,7 +3644,8 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 
 		/* Verify that the fault can be handled in the fast path */
 		if (new_spte == spte ||
-		    !is_access_allowed(error_code, new_spte))
+		    (!is_access_allowed(error_code, new_spte) &&
+		    !spp_protected))
 			break;
 
 		/*
@@ -3614,7 +3655,8 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 		 */
 		fault_handled = fast_pf_fix_direct_spte(vcpu, sp,
 							iterator.sptep, spte,
-							new_spte);
+							new_spte,
+							spp_protected);
 		if (fault_handled)
 			break;
 
@@ -3740,6 +3782,12 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 		    (mmu->root_level >= PT64_ROOT_4LEVEL || mmu->direct_map)) {
 			mmu_free_root_page(vcpu->kvm, &mmu->root_hpa,
 					   &invalid_list);
+			if (vcpu->kvm->arch.spp_active) {
+				vcpu->kvm->arch.spp_active = false;
+				mmu_free_root_page(vcpu->kvm,
+						   &vcpu->kvm->arch.sppt_root,
+						   &invalid_list);
+			}
 		} else {
 			for (i = 0; i < 4; ++i)
 				if (mmu->pae_root[i] != 0)
@@ -5223,6 +5271,8 @@ void kvm_init_mmu(struct kvm_vcpu *vcpu, bool reset_roots)
 		uint i;
 
 		vcpu->arch.mmu->root_hpa = INVALID_PAGE;
+		if (!vcpu->kvm->arch.spp_active)
+			vcpu->kvm->arch.sppt_root = INVALID_PAGE;
 
 		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
 			vcpu->arch.mmu->prev_roots[i] = KVM_MMU_ROOT_INFO_INVALID;
@@ -5539,6 +5589,10 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
 		r = vcpu->arch.mmu->page_fault(vcpu, cr2,
 					       lower_32_bits(error_code),
 					       false);
+
+		if (vcpu->run->exit_reason == KVM_EXIT_SPP)
+			r = RET_PF_USERSPACE;
+
 		WARN_ON(r == RET_PF_INVALID);
 	}
 
@@ -5546,7 +5600,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
 		return 1;
 	if (r < 0)
 		return r;
-
+	if (r == RET_PF_USERSPACE)
+		return 0;
 	/*
 	 * Before emulating the instruction, check if the error code
 	 * was due to a RO violation while translating the guest page.
@@ -6372,6 +6427,8 @@ unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
 	return nr_mmu_pages;
 }
 
+#include "spp.c"
+
 void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
 {
 	kvm_mmu_unload(vcpu);
diff --git a/arch/x86/kvm/mmu/spp.c b/arch/x86/kvm/mmu/spp.c
index 46a17c3c2cf3..3ec434140967 100644
--- a/arch/x86/kvm/mmu/spp.c
+++ b/arch/x86/kvm/mmu/spp.c
@@ -17,6 +17,18 @@ static void shadow_spp_walk_init(struct kvm_shadow_walk_iterator *iterator,
 	iterator->level = PT64_ROOT_4LEVEL;
 }
 
+/* Restore an spp armed PTE */
+void restore_spp_bit(u64 *spte)
+{
+	*spte &= ~PT64_SPP_SAVED_BIT;
+	*spte |= PT_SPP_MASK;
+}
+
+bool was_spp_armed(u64 spte)
+{
+	return !!(spte & PT64_SPP_SAVED_BIT);
+}
+
 u32 *gfn_to_subpage_wp_info(struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	unsigned long idx;
@@ -187,7 +199,7 @@ bool is_spp_spte(struct kvm_mmu_page *sp)
 	return sp->role.spp;
 }
 
-int kvm_spp_level_pages(gfn_t gfn_lower, gfn_t gfn_upper, int level)
+static int kvm_spp_level_pages(gfn_t gfn_lower, gfn_t gfn_upper, int level)
 {
 	int page_num = KVM_PAGES_PER_HPAGE(level);
 	gfn_t gfn_max = (gfn_lower & ~(page_num - 1)) + page_num - 1;
@@ -404,7 +416,7 @@ static void kvm_spp_zap_pte(struct kvm *kvm, u64 *spte, int level)
 	}
 }
 
-bool kvm_spp_flush_rmap(struct kvm *kvm, u64 gfn_min, u64 gfn_max)
+static bool kvm_spp_flush_rmap(struct kvm *kvm, u64 gfn_min, u64 gfn_max)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
@@ -459,6 +471,7 @@ int kvm_spp_set_permission(struct kvm *kvm, u64 gfn, u32 npages,
 		if (!access)
 			return -EFAULT;
 		*access = access_map[i];
+		trace_kvm_spp_set_subpages(vcpu, gfn, *access);
 	}
 
 	gfn = old_gfn;
diff --git a/arch/x86/kvm/mmu/spp.h b/arch/x86/kvm/mmu/spp.h
index 370a6b71e143..5ffe06d2cd6f 100644
--- a/arch/x86/kvm/mmu/spp.h
+++ b/arch/x86/kvm/mmu/spp.h
@@ -11,7 +11,9 @@ int kvm_spp_set_permission(struct kvm *kvm, u64 gfn, u32 npages,
 			   u32 *access_map);
 int kvm_spp_mark_protection(struct kvm *kvm, u64 gfn, u32 access);
 bool is_spp_spte(struct kvm_mmu_page *sp);
-inline u64 construct_spptp(unsigned long root_hpa);
+void restore_spp_bit(u64 *spte);
+bool was_spp_armed(u64 spte);
+u64 construct_spptp(unsigned long root_hpa);
 int kvm_vm_ioctl_get_subpages(struct kvm *kvm,
 			      u64 gfn,
 			      u32 npages,
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 7c741a0c5f80..293b91d516b1 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1496,6 +1496,72 @@ TRACE_EVENT(kvm_nested_vmenter_failed,
 		__print_symbolic(__entry->err, VMX_VMENTER_INSTRUCTION_ERRORS))
 );
 
+TRACE_EVENT(kvm_spp_set_subpages,
+	TP_PROTO(struct kvm_vcpu *vcpu, gfn_t gfn, u32 access),
+	TP_ARGS(vcpu, gfn, access),
+
+	TP_STRUCT__entry(
+		__field(int, vcpu_id)
+		__field(gfn_t, gfn)
+		__field(u32, access)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id = vcpu->vcpu_id;
+		__entry->gfn = gfn;
+		__entry->access = access;
+	),
+
+	TP_printk("vcpu %d gfn %llx access %x",
+		  __entry->vcpu_id,
+		  __entry->gfn,
+		  __entry->access)
+);
+
+TRACE_EVENT(kvm_spp_induced_exit,
+	TP_PROTO(struct kvm_vcpu *vcpu, gpa_t gpa, u32 qualification),
+	TP_ARGS(vcpu, gpa, qualification),
+
+	TP_STRUCT__entry(
+		__field(int, vcpu_id)
+		__field(gpa_t, gpa)
+		__field(u32, qualification)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id = vcpu->vcpu_id;
+		__entry->gpa = gpa;
+		__entry->qualification = qualification;
+	),
+
+	TP_printk("vcpu %d gpa %llx qualificaiton %x",
+		  __entry->vcpu_id,
+		  __entry->gpa,
+		  __entry->qualification)
+);
+
+TRACE_EVENT(kvm_spp_induced_page_fault,
+	TP_PROTO(struct kvm_vcpu *vcpu, gpa_t gpa, int inst_len),
+	TP_ARGS(vcpu, gpa, inst_len),
+
+	TP_STRUCT__entry(
+		__field(int, vcpu_id)
+		__field(gpa_t, gpa)
+		__field(int, inst_len)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id = vcpu->vcpu_id;
+		__entry->gpa = gpa;
+		__entry->inst_len = inst_len;
+	),
+
+	TP_printk("vcpu %d gpa %llx inst_len %d",
+		  __entry->vcpu_id,
+		  __entry->gpa,
+		  __entry->inst_len)
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 24e4e1c47f42..97d862c79124 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -200,7 +200,6 @@ static const struct {
 	[VMENTER_L1D_FLUSH_EPT_DISABLED] = {"EPT disabled", false},
 	[VMENTER_L1D_FLUSH_NOT_REQUIRED] = {"not required", false},
 };
-
 #define L1D_CACHE_ORDER 4
 static void *vmx_l1d_flush_pages;
 
@@ -2999,6 +2998,7 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	bool update_guest_cr3 = true;
 	unsigned long guest_cr3;
 	u64 eptp;
+	u64 spptp;
 
 	guest_cr3 = cr3;
 	if (enable_ept) {
@@ -3027,6 +3027,12 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 
 	if (update_guest_cr3)
 		vmcs_writel(GUEST_CR3, guest_cr3);
+
+	if (kvm->arch.spp_active && VALID_PAGE(vcpu->kvm->arch.sppt_root)) {
+		spptp = construct_spptp(vcpu->kvm->arch.sppt_root);
+		vmcs_write64(SPPT_POINTER, spptp);
+		vmx_flush_tlb(vcpu, true);
+	}
 }
 
 int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
@@ -5361,6 +5367,74 @@ static int handle_monitor_trap(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+int handle_spp(struct kvm_vcpu *vcpu)
+{
+	unsigned long exit_qualification;
+	struct kvm_memory_slot *slot;
+	gpa_t gpa;
+	gfn_t gfn;
+
+	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+
+	/*
+	 * SPP VM exit happened while executing iret from NMI,
+	 * "blocked by NMI" bit has to be set before next VM entry.
+	 * There are errata that may cause this bit to not be set:
+	 * AAK134, BY25.
+	 */
+	if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
+	    (exit_qualification & SPPT_INTR_INFO_UNBLOCK_NMI))
+		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
+			      GUEST_INTR_STATE_NMI);
+
+	vcpu->arch.exit_qualification = exit_qualification;
+	if (exit_qualification & SPPT_INDUCED_EXIT_TYPE) {
+		int page_num = KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL);
+		u32 *access;
+		gfn_t gfn_max;
+
+		/*
+		 * SPPT missing
+		 * We don't set SPP write access for the corresponding
+		 * GPA, if we haven't setup, we need to construct
+		 * SPP table here.
+		 */
+		gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+		gfn = gpa >> PAGE_SHIFT;
+		trace_kvm_spp_induced_exit(vcpu, gpa, exit_qualification);
+		/*
+		 * In level 1 of SPPT, there's no PRESENT bit, all data is
+		 * regarded as permission vector, so need to check from
+		 * level 2 to set up the vector if target page is protected.
+		 */
+		spin_lock(&vcpu->kvm->mmu_lock);
+		gfn &= ~(page_num - 1);
+		gfn_max = gfn + page_num - 1;
+		for (; gfn <= gfn_max; gfn++) {
+			slot = gfn_to_memslot(vcpu->kvm, gfn);
+			if (!slot)
+				continue;
+			access = gfn_to_subpage_wp_info(slot, gfn);
+			if (access && *access != FULL_SPP_ACCESS)
+				kvm_spp_setup_structure(vcpu,
+							*access,
+							gfn);
+		}
+		spin_unlock(&vcpu->kvm->mmu_lock);
+		return 1;
+	}
+	/*
+	 * SPPT Misconfig
+	 * This is probably caused by some mis-configuration in SPPT
+	 * entries, cannot handle it here, escalate the fault to
+	 * emulator.
+	 */
+	WARN_ON(1);
+	vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
+	vcpu->run->hw.hardware_exit_reason = EXIT_REASON_SPP;
+	return 0;
+}
+
 static int handle_monitor(struct kvm_vcpu *vcpu)
 {
 	printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
@@ -5575,6 +5649,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_INVVPID]                 = handle_vmx_instruction,
 	[EXIT_REASON_RDRAND]                  = handle_invalid_op,
 	[EXIT_REASON_RDSEED]                  = handle_invalid_op,
+	[EXIT_REASON_SPP]                     = handle_spp,
 	[EXIT_REASON_PML_FULL]		      = handle_pml_full,
 	[EXIT_REASON_INVPCID]                 = handle_invpcid,
 	[EXIT_REASON_VMFUNC]		      = handle_vmx_instruction,
@@ -5807,6 +5882,9 @@ void dump_vmcs(void)
 		pr_err("PostedIntrVec = 0x%02x\n", vmcs_read16(POSTED_INTR_NV));
 	if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT))
 		pr_err("EPT pointer = 0x%016llx\n", vmcs_read64(EPT_POINTER));
+	if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_SPP))
+		pr_err("SPPT pointer = 0x%016llx\n", vmcs_read64(SPPT_POINTER));
+
 	n = vmcs_read32(CR3_TARGET_COUNT);
 	for (i = 0; i + 1 < n; i += 4)
 		pr_err("CR3 target%u=%016lx target%u=%016lx\n",
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fb7da000ceaf..a9d7fc21dad6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9782,6 +9782,7 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 	}
 
 	kvm_page_track_free_memslot(free, dont);
+	kvm_spp_free_memslot(free, dont);
 }
 
 int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
@@ -10406,3 +10407,6 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pml_full);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pi_irte_update);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_spp_set_subpages);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_spp_induced_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_spp_induced_page_fault);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 09e5e8e6e6dd..c0f3162ee46a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -244,6 +244,7 @@ struct kvm_hyperv_exit {
 #define KVM_EXIT_IOAPIC_EOI       26
 #define KVM_EXIT_HYPERV           27
 #define KVM_EXIT_ARM_NISV         28
+#define KVM_EXIT_SPP              29
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -401,6 +402,11 @@ struct kvm_run {
 		struct {
 			__u8 vector;
 		} eoi;
+		/* KVM_EXIT_SPP */
+		struct {
+			__u64 addr;
+			__u8 ins_len;
+		} spp;
 		/* KVM_EXIT_HYPERV */
 		struct kvm_hyperv_exit hyperv;
 		/* KVM_EXIT_ARM_NISV */
-- 
2.17.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, back to index

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-02  6:13 [RESEND PATCH v10 00/10] Enable Sub-Page Write Protection Support Yang Weijiang
2020-01-02  6:13 ` [RESEND PATCH v10 01/10] Documentation: Add EPT based Subpage Protection and related APIs Yang Weijiang
2020-01-02  6:13 ` [RESEND PATCH v10 02/10] vmx: spp: Add control flags for Sub-Page Protection(SPP) Yang Weijiang
2020-01-10 16:58   ` Sean Christopherson
2020-01-13  5:44     ` Yang Weijiang
2020-01-02  6:13 ` [RESEND PATCH v10 03/10] mmu: spp: Add SPP Table setup functions Yang Weijiang
2020-01-10 17:26   ` Sean Christopherson
2020-01-13  6:00     ` Yang Weijiang
2020-01-10 17:40   ` Sean Christopherson
2020-01-13  6:04     ` Yang Weijiang
2020-01-02  6:13 ` [RESEND PATCH v10 04/10] mmu: spp: Add functions to operate SPP access bitmap Yang Weijiang
2020-01-10 17:38   ` Sean Christopherson
2020-01-13  6:15     ` Yang Weijiang
2020-01-02  6:13 ` [RESEND PATCH v10 05/10] x86: spp: Introduce user-space SPP IOCTLs Yang Weijiang
2020-01-10 18:10   ` Sean Christopherson
2020-01-13  8:21     ` Yang Weijiang
2020-01-02  6:13 ` [RESEND PATCH v10 06/10] vmx: spp: Set up SPP paging table at vmentry/vmexit Yang Weijiang
2020-01-10 17:55   ` Sean Christopherson
2020-01-13  6:50     ` Yang Weijiang
2020-01-10 18:04   ` Sean Christopherson
2020-01-13  8:10     ` Yang Weijiang
2020-01-13 17:33       ` Sean Christopherson
2020-01-13 18:55         ` Adalbert Lazăr
2020-01-13 21:47           ` Sean Christopherson
2020-01-14  3:08         ` Yang Weijiang
2020-01-14 18:58           ` Sean Christopherson
2020-01-15  1:36             ` Yang Weijiang
2020-01-02  6:13 ` [RESEND PATCH v10 07/10] mmu: spp: Enable Lazy mode SPP protection Yang Weijiang
2020-01-02  6:13 ` [RESEND PATCH v10 08/10] mmu: spp: Handle SPP protected pages when VM memory changes Yang Weijiang
2020-01-02  6:13 ` [RESEND PATCH v10 09/10] x86: spp: Add SPP protection check in emulation Yang Weijiang
2020-01-02  6:13 ` [RESEND PATCH v10 10/10] kvm: selftests: selftest for Sub-Page protection Yang Weijiang
  -- strict thread matches above, loose matches on Subject: below --
2020-01-02  5:18 [RESEND PATCH v10 00/10] Enable Sub-Page Write Protection Support Yang Weijiang
2020-01-02  5:19 ` [RESEND PATCH v10 06/10] vmx: spp: Set up SPP paging table at vmentry/vmexit Yang Weijiang

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git