linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
  2017-10-13 23:11 [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support Zhang Yi
@ 2017-10-13 16:57 ` Jim Mattson
  2017-10-13 21:13   ` Paolo Bonzini
  2017-10-16  0:01   ` Yi Zhang
  2017-10-13 23:12 ` [PATCH RFC 01/10] KVM: VMX: Added EPT Subpage Protection Documentation Zhang Yi
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 29+ messages in thread
From: Jim Mattson @ 2017-10-13 16:57 UTC (permalink / raw)
  To: Zhang Yi; +Cc: kvm list, LKML, Paolo Bonzini, Radim Krčmář

I'll ask before Paolo does: Can you please add kvm-unit-tests to
exercise all of this new code?

BTW, what generation of hardware do we need to exercise this code ourselves?

On Fri, Oct 13, 2017 at 4:11 PM, Zhang Yi <yi.z.zhang@linux.intel.com> wrote:
> From: Zhang Yi Z <yi.z.zhang@linux.intel.com>
>
> Hi All,
>
> Here is a patch series which adds EPT-Based Sub-page Write Protection support. The software developer manual covering this feature is available at:
>
> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
>
> See Chapter 4, "EPT-BASED SUB-PAGE PERMISSIONS".
>
> Introduction:
>
> EPT-Based Sub-page Write Protection, referred to as SPP, is a capability which allows Virtual Machine Monitors (VMMs) to specify write permissions for guest physical memory at a sub-page (128-byte) granularity. When this capability is utilized, the CPU enforces write-access permissions for sub-page regions of 4K pages as specified by the VMM. EPT-based sub-page permissions are intended to enable fine-grained memory write enforcement by a VMM for security (guest OS monitoring) and for usages such as device virtualization and memory checkpointing.
>
> How SPP Works:
>
> SPP is active when the "sub-page write protection" VM-execution control is 1. A new 4-level paging structure named the SPP page table (SPPT) is introduced; the SPPT is walked with the guest physical address to derive a 64-bit "sub-page permission" value containing the sub-page write permissions. The lookup from guest-physical addresses to sub-page region permissions is determined by this set of SPPT paging structures.
>
> The SPPT is used to look up write permission bits for the 128-byte sub-page regions contained in a 4KB guest physical page. EPT specifies the 4KB page-level privileges that software is allowed when accessing the guest physical address, whereas the SPPT defines write permissions for software at 128-byte granularity within a 4KB page. Write accesses prevented due to sub-page permissions looked up via the SPPT are reported as EPT violation VM exits. As with EPT, a logical processor uses the SPPT to look up sub-page region write permissions for guest-physical addresses only when those addresses are used to access memory.
>
> Guest write access --> GPA --> Walk EPT --> EPT leaf entry -┐
> ┌-----------------------------------------------------------┘
> └-> if VMexec_control.spp && ept_leaf_entry.spp_bit (bit 61)
>      |
>      └-> <false> --> EPT legacy behavior
>      |
>      |
>      └-> <true>  --> if ept_leaf_entry.writable
>                       |
>                       └-> <true>  --> Ignore SPP
>                       |
>                       └-> <false> --> GPA --> Walk SPP 4-level table--┐
>                                                                       |
> ┌----------<----------get-the-SPPT-pointer-from-VMCS-field-----<------┘
> |
> Walk SPP L4E table
> |
> └┐--> entry misconfiguration ------------>----------┐<----------------┐
>  |                                                  |                 |
> else                                                |                 |
>  |                                                  |                 |
>  |   ┌------------------SPP VMexit<-----------------┘                 |
>  |   |                                                                |
>  |   └-> exit_qualification & sppt_misconfig --> sppt misconfig       |
>  |   |                                                                |
>  |   └-> exit_qualification & sppt_miss --> sppt miss                 |
>  └--┐                                                                 |
>     |                                                                 |
> walk SPPT L3E--┐--> if-entry-misconfiguration------------>------------┘
>                |                                                      |
>               else                                                    |
>                |                                                      |
>                |                                                      |
>         walk SPPT L2E --┐--> if-entry-misconfiguration-------->-------┘
>                         |                                             |
>                        else                                           |
>                         |                                             |
>                         |                                             |
>                  walk SPPT L1E --┐-> if-entry-misconfiguration--->----┘
>                                  |
>                                 else
>                                  |
>                                  └-> if sub-page writable
>                                       └-> <true>  allow, write access
>                                       └-> <false> disallow, EPT violation
>
> Patch-sets Description:
>
> Patch 1: Documentation.
>
> Patch 2: This patch adds reporting of the SPP capability from the VMX Procbased MSR; per the hardware specification, bit 23 is the control for the SPP capability.
>
> Patch 3: Add a new secondary processor-based VM-execution control bit defined as "sub-page write permission"; as in the VMX Procbased MSR, bit 23 is the enable bit for SPP.
> We also introduced a kernel parameter "enable_ept_spp"; SPP is active when the "Sub-page Write Protection" bit in the Secondary VM-Execution Controls is set and the kernel parameter is enabled with "enable_ept_spp=1".
>
> Patch 4: Introduced the SPPTP and the SPP page table.
> The sub-page permission table is referenced via a 64-bit control field called the Sub-Page Permission Table Pointer (SPPTP), which contains a 4K-aligned physical address. The index and encoding for this VMCS field are defined as 0x2030 at this time.
> This patch introduced the SPP paging structures, whose root page is created at KVM MMU page initialization.
> We also added an MMU page role type, spp, to distinguish whether a page is an SPP page or an EPT page.
>
> Patch 5: Introduced the SPP-induced VM exit and its handler.
> Accesses using guest-physical addresses may cause SPP-induced VM exits due to an SPPT misconfiguration or an SPPT miss. The basic VM exit reason code reported for SPP-induced VM exits is 66.
>
> Also introduced is the new exit qualification for SPPT-induced VM exits.
>
> | Bit   | Contents                                                          |
> | :---- | :---------------------------------------------------------------- |
> | 10:0  | Reserved (0).                                                     |
> | 11    | SPPT VM exit type. Set for SPPT Miss, cleared for SPPT Misconfig. |
> | 12    | NMI unblocking due to IRET                                        |
> | 63:13 | Reserved (0)                                                      |
>
> Patch 6: Added a handler for EPT sub-page write-protection faults.
> A control bit in EPT leaf paging-structure entries is defined as "Sub-Page Permission" (the SPP bit). The bit position is 61; it is chosen from among the bits that are currently ignored by the processor and available to software.
> While hardware walks the SPP page table, if the sub-page region write permission bit is set, the write is allowed; otherwise the write is disallowed and results in an EPT violation.
> We need to detect this case in the EPT violation handler and trigger a user-space exit, returning the write-protected address (GVA) to user space (QEMU).
>
> Patch 7: Introduce ioctls to set/get Sub-Page Write Protection.
> We introduced two ioctls to let a user application set/get the sub-page write-protection bitmap per gfn; each gfn corresponds to a bitmap.
> The user application (QEMU, or some other security control daemon) will set the protection bitmap via this ioctl.
> The API is defined as:
>         struct kvm_subpage {
>                 __u64 base_gfn;
>                 __u64 npages;
>                 /* sub-page write-access bitmap array */
>                 __u32 access_map[SUBPAGE_MAX_BITMAP];
>                 } sp;
>         kvm_vm_ioctl(s, KVM_SUBPAGES_SET_ACCESS, &sp)
>         kvm_vm_ioctl(s, KVM_SUBPAGES_GET_ACCESS, &sp)
>
> Patch 8 ~ Patch 9: Set up the SPP page table and update the EPT leaf entry indicated by the SPP enable bit.
> If the sub-page write permission VM-execution control is set, treatment of write accesses to guest-physical addresses depends on the state of the accumulated write-access bit (position 1) and the sub-page permission bit (position 61) in the EPT leaf paging-structure entry.
> Software updates the EPT leaf entry's sub-page permission bit in kvm_set_subpage (patch 7). If the EPT write-access bit is 0 and the SPP bit is 1 in the leaf EPT paging-structure entry that maps a 4KB page, the hardware looks up a VMM-managed Sub-Page Permission Table (SPPT), which is prepared by kvm_set_subpage (patch 8).
> The hardware uses the guest-physical address, specifically bits 11:7 of the address accessed, to look up the SPPT and fetch a write permission bit for the 128-byte sub-page region being accessed within the 4K guest-physical page. If the sub-page region write permission bit is set, the write is allowed; otherwise the write is disallowed and results in an EPT violation.
> Guest-physical pages mapped via leaf EPT paging structures for which the accumulated write-access bit and the SPP bit are both clear (0) generate EPT violations on memory write accesses. Guest-physical pages mapped via EPT paging structures for which the accumulated write-access bit is set (1) allow writes, effectively ignoring the SPP bit in the leaf EPT paging structure.
> Software sets up SPP page table levels 4, 3, and 2 in the same way as the EPT page structures, and fills the level-1 page via a 32-bit bitmap per single 4K page, so each page can be divided into 32 x 128-byte sub-pages.
>
> The SPP L4E/L3E/L2E format is defined in the table below.
>
> | Bit    | Contents                                                               |
> | :----- | :--------------------------------------------------------------------- |
> | 0      | Valid entry when set; indicates whether the entry is present           |
> | 11:1   | Reserved (0)                                                           |
> | N-1:12 | Physical address of 4K aligned SPPT LX-1 Table referenced by the entry |
> | 51:N   | Reserved (0)                                                           |
> | 63:52  | Reserved (0)                                                           |
> Note: N is the physical address width supported by the processor; X is the page level.
>
> The SPP L1E format is defined in the table below.
> | Bit   | Contents                                                          |
> | :---- | :---------------------------------------------------------------- |
> | 0+2i  | Write permission for i-th 128 byte sub-page region.               |
> | 1+2i  | Reserved (0).                                                     |
> Note: `0<=i<=31`
>
>
> Zhang Yi Z (10):
>   KVM: VMX: Added EPT Subpage Protection Documentation.
>   x86/cpufeature: Add intel Sub-Page Protection to CPU features
>   KVM: VMX: Added VMX SPP feature flags and VM-Execution Controls.
>   KVM: VMX: Introduce the SPPTP and SPP page table.
>   KVM: VMX: Introduce SPP-Induced vm exit and it's handle.
>   KVM: VMX: Added handle of SPP write protection fault.
>   KVM: VMX: Introduce ioctls to set/get Sub-Page Write Protection.
>   KVM: VMX: Update the EPT leaf entry indicated with the SPP enable bit.
>   KVM: VMX: Added setup spp page structure.
>   KVM: VMX: implement setup SPP page structure in spp miss.
>
>  Documentation/virtual/kvm/spp_design_kvm.txt | 272 +++++++++++++++++++++
>  arch/x86/include/asm/cpufeatures.h           |   1 +
>  arch/x86/include/asm/kvm_host.h              |  18 +-
>  arch/x86/include/asm/vmx.h                   |  10 +
>  arch/x86/include/uapi/asm/vmx.h              |   2 +
>  arch/x86/kernel/cpu/intel.c                  |   4 +
>  arch/x86/kvm/mmu.c                           | 340 ++++++++++++++++++++++++++-
>  arch/x86/kvm/mmu.h                           |   1 +
>  arch/x86/kvm/vmx.c                           | 104 ++++++++
>  arch/x86/kvm/x86.c                           |  99 +++++++-
>  include/linux/kvm_host.h                     |   5 +
>  include/uapi/linux/kvm.h                     |  16 ++
>  virt/kvm/kvm_main.c                          |  26 ++
>  13 files changed, 893 insertions(+), 5 deletions(-)
>  create mode 100644 Documentation/virtual/kvm/spp_design_kvm.txt
>
> --
> 2.7.4
>


* Re: [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
  2017-10-13 16:57 ` Jim Mattson
@ 2017-10-13 21:13   ` Paolo Bonzini
  2017-10-16  0:08     ` Yi Zhang
  2017-10-16  0:01   ` Yi Zhang
  1 sibling, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2017-10-13 21:13 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Zhang Yi, kvm list, LKML, Radim Krčmář, Alex Williamson


> I'll ask before Paolo does: Can you please add kvm-unit-tests to
> exercise all of this new code?

More specifically it should be the api/ unit tests because this code
can only be triggered by specific code in the host.

However, as things stand I'm not sure about how userspace would use it.
Only allowing blocking of writes means that we cannot (for example) use
it to do sub-page passthrough in VFIO.  That would be useful when the
MSI-X table does not fit a full page, but would require blocking reads
as well.  And the introspection facility by Mihai uses a completely
different API for the introspector, based on sockets rather than ioctls.
So I'm not sure this is the right API at all.

Paolo

> BTW, what generation of hardware do we need to exercise this code ourselves?
> 
> On Fri, Oct 13, 2017 at 4:11 PM, Zhang Yi <yi.z.zhang@linux.intel.com> wrote:
> > From: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> >
> > Hi All,
> >
> > Here is a patch series which adds EPT-Based Sub-page Write Protection
> > support. The software developer manual covering this feature is available at:
> >
> > https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> >
> > See Chapter 4, "EPT-BASED SUB-PAGE PERMISSIONS".
> >
> > Introduction:
> >
> > EPT-Based Sub-page Write Protection, referred to as SPP, is a capability
> > which allows Virtual Machine Monitors (VMMs) to specify write permissions
> > for guest physical memory at a sub-page (128-byte) granularity.  When this
> > capability is utilized, the CPU enforces write-access permissions for
> > sub-page regions of 4K pages as specified by the VMM. EPT-based sub-page
> > permissions are intended to enable fine-grained memory write enforcement
> > by a VMM for security (guest OS monitoring) and for usages such as device
> > virtualization and memory checkpointing.
> >
> > How SPP Works:
> >
> > SPP is active when the "sub-page write protection" VM-execution control is
> > 1. A new 4-level paging structure named the SPP page table (SPPT) is
> > introduced; the SPPT is walked with the guest physical address to derive a
> > 64-bit "sub-page permission" value containing the sub-page write
> > permissions. The lookup from guest-physical addresses to sub-page region
> > permissions is determined by this set of SPPT paging structures.
> >
> > The SPPT is used to look up write permission bits for the 128-byte
> > sub-page regions contained in a 4KB guest physical page. EPT specifies the
> > 4KB page-level privileges that software is allowed when accessing the
> > guest physical address, whereas the SPPT defines write permissions for
> > software at 128-byte granularity within a 4KB page. Write accesses
> > prevented due to sub-page permissions looked up via the SPPT are reported
> > as EPT violation VM exits. As with EPT, a logical processor uses the SPPT
> > to look up sub-page region write permissions for guest-physical addresses
> > only when those addresses are used to access memory.
> >
> > Guest write access --> GPA --> Walk EPT --> EPT leaf entry -┐
> > ┌-----------------------------------------------------------┘
> > └-> if VMexec_control.spp && ept_leaf_entry.spp_bit (bit 61)
> >      |
> >      └-> <false> --> EPT legacy behavior
> >      |
> >      |
> >      └-> <true>  --> if ept_leaf_entry.writable
> >                       |
> >                       └-> <true>  --> Ignore SPP
> >                       |
> >                       └-> <false> --> GPA --> Walk SPP 4-level table--┐
> >                                                                       |
> > ┌----------<----------get-the-SPPT-pointer-from-VMCS-field-----<------┘
> > |
> > Walk SPP L4E table
> > |
> > └┐--> entry misconfiguration ------------>----------┐<----------------┐
> >  |                                                  |                 |
> > else                                                |                 |
> >  |                                                  |                 |
> >  |   ┌------------------SPP VMexit<-----------------┘                 |
> >  |   |                                                                |
> >  |   └-> exit_qualification & sppt_misconfig --> sppt misconfig       |
> >  |   |                                                                |
> >  |   └-> exit_qualification & sppt_miss --> sppt miss                 |
> >  └--┐                                                                 |
> >     |                                                                 |
> > walk SPPT L3E--┐--> if-entry-misconfiguration------------>------------┘
> >                |                                                      |
> >               else                                                    |
> >                |                                                      |
> >                |                                                      |
> >         walk SPPT L2E --┐--> if-entry-misconfiguration-------->-------┘
> >                         |                                             |
> >                        else                                           |
> >                         |                                             |
> >                         |                                             |
> >                  walk SPPT L1E --┐-> if-entry-misconfiguration--->----┘
> >                                  |
> >                                 else
> >                                  |
> >                                  └-> if sub-page writable
> >                                       └-> <true>  allow, write access
> >                                       └-> <false> disallow, EPT violation
> >
> > Patch-sets Description:
> >
> > Patch 1: Documentation.
> >
> > Patch 2: This patch adds reporting of the SPP capability from the VMX
> > Procbased MSR; per the hardware specification, bit 23 is the control for
> > the SPP capability.
> >
> > Patch 3: Add a new secondary processor-based VM-execution control bit
> > defined as "sub-page write permission"; as in the VMX Procbased MSR, bit
> > 23 is the enable bit for SPP.
> > We also introduced a kernel parameter "enable_ept_spp"; SPP is active when
> > the "Sub-page Write Protection" bit in the Secondary VM-Execution Controls
> > is set and the kernel parameter is enabled with "enable_ept_spp=1".
> >
> > Patch 4: Introduced the SPPTP and the SPP page table.
> > The sub-page permission table is referenced via a 64-bit control field
> > called the Sub-Page Permission Table Pointer (SPPTP), which contains a
> > 4K-aligned physical address. The index and encoding for this VMCS field
> > are defined as 0x2030 at this time.
> > This patch introduced the SPP paging structures, whose root page is
> > created at KVM MMU page initialization.
> > We also added an MMU page role type, spp, to distinguish whether a page is
> > an SPP page or an EPT page.
> >
> > Patch 5: Introduced the SPP-induced VM exit and its handler.
> > Accesses using guest-physical addresses may cause SPP-induced VM exits due
> > to an SPPT misconfiguration or an SPPT miss. The basic VM exit reason code
> > reported for SPP-induced VM exits is 66.
> >
> > Also introduced is the new exit qualification for SPPT-induced VM exits.
> >
> > | Bit   | Contents                                                          |
> > | :---- | :---------------------------------------------------------------- |
> > | 10:0  | Reserved (0).                                                     |
> > | 11    | SPPT VM exit type. Set for SPPT Miss, cleared for SPPT Misconfig. |
> > | 12    | NMI unblocking due to IRET                                        |
> > | 63:13 | Reserved (0)                                                      |
> >
> > Patch 6: Added a handler for EPT sub-page write-protection faults.
> > A control bit in EPT leaf paging-structure entries is defined as
> > "Sub-Page Permission" (the SPP bit). The bit position is 61; it is chosen
> > from among the bits that are currently ignored by the processor and
> > available to software.
> > While hardware walks the SPP page table, if the sub-page region write
> > permission bit is set, the write is allowed; otherwise the write is
> > disallowed and results in an EPT violation.
> > We need to detect this case in the EPT violation handler and trigger a
> > user-space exit, returning the write-protected address (GVA) to user
> > space (QEMU).
> >
> > Patch 7: Introduce ioctls to set/get Sub-Page Write Protection.
> > We introduced two ioctls to let a user application set/get the sub-page
> > write-protection bitmap per gfn; each gfn corresponds to a bitmap.
> > The user application (QEMU, or some other security control daemon) will
> > set the protection bitmap via this ioctl.
> > The API is defined as:
> >         struct kvm_subpage {
> >                 __u64 base_gfn;
> >                 __u64 npages;
> >                 /* sub-page write-access bitmap array */
> >                 __u32 access_map[SUBPAGE_MAX_BITMAP];
> >                 } sp;
> >         kvm_vm_ioctl(s, KVM_SUBPAGES_SET_ACCESS, &sp)
> >         kvm_vm_ioctl(s, KVM_SUBPAGES_GET_ACCESS, &sp)
> >
> > Patch 8 ~ Patch 9: Set up the SPP page table and update the EPT leaf
> > entry indicated by the SPP enable bit.
> > If the sub-page write permission VM-execution control is set, treatment of
> > write accesses to guest-physical addresses depends on the state of the
> > accumulated write-access bit (position 1) and the sub-page permission bit
> > (position 61) in the EPT leaf paging-structure entry.
> > Software updates the EPT leaf entry's sub-page permission bit in
> > kvm_set_subpage (patch 7). If the EPT write-access bit is 0 and the SPP
> > bit is 1 in the leaf EPT paging-structure entry that maps a 4KB page, the
> > hardware looks up a VMM-managed Sub-Page Permission Table (SPPT), which is
> > prepared by kvm_set_subpage (patch 8).
> > The hardware uses the guest-physical address, specifically bits 11:7 of
> > the address accessed, to look up the SPPT and fetch a write permission bit
> > for the 128-byte sub-page region being accessed within the 4K
> > guest-physical page. If the sub-page region write permission bit is set,
> > the write is allowed; otherwise the write is disallowed and results in an
> > EPT violation.
> > Guest-physical pages mapped via leaf EPT paging structures for which the
> > accumulated write-access bit and the SPP bit are both clear (0) generate
> > EPT violations on memory write accesses. Guest-physical pages mapped via
> > EPT paging structures for which the accumulated write-access bit is set
> > (1) allow writes, effectively ignoring the SPP bit in the leaf EPT paging
> > structure.
> > Software sets up SPP page table levels 4, 3, and 2 in the same way as the
> > EPT page structures, and fills the level-1 page via a 32-bit bitmap per
> > single 4K page, so each page can be divided into 32 x 128-byte sub-pages.
> >
> > The SPP L4E/L3E/L2E format is defined in the table below.
> >
> > | Bit    | Contents                                                               |
> > | :----- | :--------------------------------------------------------------------- |
> > | 0      | Valid entry when set; indicates whether the entry is present           |
> > | 11:1   | Reserved (0)                                                           |
> > | N-1:12 | Physical address of 4K aligned SPPT LX-1 Table referenced by the entry |
> > | 51:N   | Reserved (0)                                                           |
> > | 63:52  | Reserved (0)                                                           |
> > Note: N is the physical address width supported by the processor; X is
> > the page level.
> >
> > The SPP L1E format is defined in the table below.
> > | Bit   | Contents                                                          |
> > | :---- | :---------------------------------------------------------------- |
> > | 0+2i  | Write permission for i-th 128 byte sub-page region.               |
> > | 1+2i  | Reserved (0).                                                     |
> > Note: `0<=i<=31`
> >
> >
> > Zhang Yi Z (10):
> >   KVM: VMX: Added EPT Subpage Protection Documentation.
> >   x86/cpufeature: Add intel Sub-Page Protection to CPU features
> >   KVM: VMX: Added VMX SPP feature flags and VM-Execution Controls.
> >   KVM: VMX: Introduce the SPPTP and SPP page table.
> >   KVM: VMX: Introduce SPP-Induced vm exit and it's handle.
> >   KVM: VMX: Added handle of SPP write protection fault.
> >   KVM: VMX: Introduce ioctls to set/get Sub-Page Write Protection.
> >   KVM: VMX: Update the EPT leaf entry indicated with the SPP enable bit.
> >   KVM: VMX: Added setup spp page structure.
> >   KVM: VMX: implement setup SPP page structure in spp miss.
> >
> >  Documentation/virtual/kvm/spp_design_kvm.txt | 272 +++++++++++++++++++++
> >  arch/x86/include/asm/cpufeatures.h           |   1 +
> >  arch/x86/include/asm/kvm_host.h              |  18 +-
> >  arch/x86/include/asm/vmx.h                   |  10 +
> >  arch/x86/include/uapi/asm/vmx.h              |   2 +
> >  arch/x86/kernel/cpu/intel.c                  |   4 +
> >  arch/x86/kvm/mmu.c                           | 340
> >  ++++++++++++++++++++++++++-
> >  arch/x86/kvm/mmu.h                           |   1 +
> >  arch/x86/kvm/vmx.c                           | 104 ++++++++
> >  arch/x86/kvm/x86.c                           |  99 +++++++-
> >  include/linux/kvm_host.h                     |   5 +
> >  include/uapi/linux/kvm.h                     |  16 ++
> >  virt/kvm/kvm_main.c                          |  26 ++
> >  13 files changed, 893 insertions(+), 5 deletions(-)
> >  create mode 100644 Documentation/virtual/kvm/spp_design_kvm.txt
> >
> > --
> > 2.7.4
> >
> 


* [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
@ 2017-10-13 23:11 Zhang Yi
  2017-10-13 16:57 ` Jim Mattson
                   ` (12 more replies)
  0 siblings, 13 replies; 29+ messages in thread
From: Zhang Yi @ 2017-10-13 23:11 UTC (permalink / raw)
  To: kvm, linux-kernel; +Cc: pbonzini, rkrcmar, Zhang Yi Z

From: Zhang Yi Z <yi.z.zhang@linux.intel.com>

Hi All,

Here is a patch series which adds EPT-Based Sub-page Write Protection support. The software developer manual covering this feature is available at:

https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

See Chapter 4, "EPT-BASED SUB-PAGE PERMISSIONS".

Introduction:

EPT-Based Sub-page Write Protection, referred to as SPP, is a capability which allows Virtual Machine Monitors (VMMs) to specify write permissions for guest physical memory at a sub-page (128-byte) granularity. When this capability is utilized, the CPU enforces write-access permissions for sub-page regions of 4K pages as specified by the VMM. EPT-based sub-page permissions are intended to enable fine-grained memory write enforcement by a VMM for security (guest OS monitoring) and for usages such as device virtualization and memory checkpointing.

How SPP Works:

SPP is active when the "sub-page write protection" VM-execution control is 1. A new 4-level paging structure named the SPP page table (SPPT) is introduced; the SPPT is walked with the guest physical address to derive a 64-bit "sub-page permission" value containing the sub-page write permissions. The lookup from guest-physical addresses to sub-page region permissions is determined by this set of SPPT paging structures.

The SPPT is used to look up write permission bits for the 128-byte sub-page regions contained in a 4KB guest physical page. EPT specifies the 4KB page-level privileges that software is allowed when accessing the guest physical address, whereas the SPPT defines write permissions for software at 128-byte granularity within a 4KB page. Write accesses prevented due to sub-page permissions looked up via the SPPT are reported as EPT violation VM exits. As with EPT, a logical processor uses the SPPT to look up sub-page region write permissions for guest-physical addresses only when those addresses are used to access memory.

Guest write access --> GPA --> Walk EPT --> EPT leaf entry -┐
┌-----------------------------------------------------------┘
└-> if VMexec_control.spp && ept_leaf_entry.spp_bit (bit 61)
     |
     └-> <false> --> EPT legacy behavior
     |
     |
     └-> <true>  --> if ept_leaf_entry.writable
                      |
                      └-> <true>  --> Ignore SPP
                      |
                      └-> <false> --> GPA --> Walk SPP 4-level table--┐
                                                                      |
┌----------<----------get-the-SPPT-pointer-from-VMCS-field-----<------┘
|
Walk SPP L4E table
|
└┐--> entry misconfiguration ------------>----------┐<----------------┐
 |                                                  |                 |
else                                                |                 |
 |                                                  |                 |
 |   ┌------------------SPP VMexit<-----------------┘                 |
 |   |                                                                |
 |   └-> exit_qualification & sppt_misconfig --> sppt misconfig       |
 |   |                                                                |
 |   └-> exit_qualification & sppt_miss --> sppt miss                 |
 └--┐                                                                 |
    |                                                                 |
walk SPPT L3E--┐--> if-entry-misconfiguration------------>------------┘
               |                                                      |
              else                                                    |
               |                                                      |
               |                                                      |
        walk SPPT L2E --┐--> if-entry-misconfiguration-------->-------┘
                        |                                             |
                       else                                           |
                        |                                             |
                        |                                             |
                 walk SPPT L1E --┐-> if-entry-misconfiguration--->----┘
                                  |
                                 else
                                  |
                                  └-> if sub-page writable
                                       └-> <true>  allow, write access
                                       └-> <false> disallow, EPT violation

Patch-sets Description:

Patch 1: Documentation.

Patch 2: This patch adds reporting of the SPP capability from the VMX Procbased MSR; per the hardware specification, bit 23 is the control for the SPP capability.

Patch 3: Add a new secondary processor-based VM-execution control bit defined as "sub-page write permission"; as in the VMX Procbased MSR, bit 23 is the enable bit for SPP.
We also introduced a kernel parameter "enable_ept_spp"; SPP is active when the "Sub-page Write Protection" bit in the Secondary VM-Execution Controls is set and the kernel parameter is enabled with "enable_ept_spp=1".

Patch 4: Introduced the SPPTP and the SPP page table.
The sub-page permission table is referenced via a 64-bit control field called the Sub-Page Permission Table Pointer (SPPTP), which contains a 4K-aligned physical address. The index/encoding for this VMCS field is defined as 0x2030 at this time.
This patch introduced the SPP paging structures, whose root page is created at KVM MMU page initialization.
We also added an mmu page role type, spp, to distinguish an SPP page from an EPT page.

Patch 5: Introduced the SPP-induced VM exit and its handler.
Accesses using guest-physical addresses may cause SPP-induced VM exits due to an SPPT misconfiguration or an SPPT miss. The basic VM-exit reason code reported for SPP-induced VM exits is 66.

Also introduced the new exit qualification for SPPT-induced vmexits.

| Bit   | Contents                                                          |
| :---- | :---------------------------------------------------------------- |
| 10:0  | Reserved (0).                                                     |
| 11    | SPPT VM exit type. Set for SPPT Miss, cleared for SPPT Misconfig. |
| 12    | NMI unblocking due to IRET                                        |
| 63:13 | Reserved (0)                                                      |

Patch 6: Added a handler for EPT sub-page write-protection faults.
A control bit in EPT leaf paging-structure entries is defined as "Sub-Page Permission" (the SPP bit). Its position is 61; it is chosen from among the bits that are currently ignored by the processor and available to software.
While hardware walks the SPP page table, if the sub-page region write-permission bit is set, the write is allowed; otherwise the write is disallowed and results in an EPT violation.
We need to detect this case in the EPT violation handler and trigger a user-space exit, returning the write-protected address (GPA) to user space (qemu).

Patch 7: Introduce ioctls to set/get Sub-Page Write Protection.
We introduced 2 ioctls to let a user application set/get the sub-page write-protection bitmap per gfn; each gfn corresponds to one bitmap.
The user application (qemu, or some other security control daemon) sets the protection bitmap via this ioctl.
The API is defined as:
	struct kvm_subpage {
		__u64 base_gfn;
		__u64 npages;
		/* sub-page write-access bitmap array */
		__u32 access_map[SUBPAGE_MAX_BITMAP];
	} sp;
	kvm_vm_ioctl(s, KVM_SUBPAGES_SET_ACCESS, &sp)
	kvm_vm_ioctl(s, KVM_SUBPAGES_GET_ACCESS, &sp)

Patch 8 ~ Patch 9: Set up the SPP page table and update the EPT leaf entry indicated by the SPP enable bit.
If the sub-page write permission VM-execution control is set, treatment of write accesses to guest-physical addresses depends on the state of the accumulated write-access bit (position 1) and the sub-page permission bit (position 61) in the EPT leaf paging-structure entry.
Software updates the EPT leaf entry's sub-page permission bit in kvm_set_subpage (patch 7). If the write-access bit is 0 and the SPP bit is 1 in the leaf EPT paging-structure entry that maps a 4KB page, the hardware looks up a VMM-managed Sub-Page Permission Table (SPPT), which is prepared by kvm_set_subpage (patch 8).
The hardware uses the guest-physical address, specifically bits 11:7 of the address accessed, to look up the SPPT and fetch a write-permission bit for the 128-byte sub-page region being accessed within the 4K guest-physical page. If the sub-page region write-permission bit is set, the write is allowed; otherwise the write is disallowed and results in an EPT violation.
Guest-physical pages mapped via leaf EPT paging structures for which the accumulated write-access bit and the SPP bit are both clear (0) generate EPT violations on memory write accesses. Guest-physical pages mapped via EPT paging structures for which the accumulated write-access bit is set (1) allow writes, effectively ignoring the SPP bit in the leaf EPT paging structure.
Software sets up SPP page-table levels 4, 3, and 2 just like the EPT page structures, and fills the level-1 entries from the 32-bit bitmap for each 4K page, which is thus divided into 32 sub-pages of 128 bytes.

The SPP L4E L3E L2E is defined as below figure.

| Bit    | Contents                                                               |
| :----- | :--------------------------------------------------------------------- |
| 0      | Valid entry when set; indicates whether the entry is present           |
| 11:1   | Reserved (0)                                                           |
| N-1:12 | Physical address of 4K aligned SPPT LX-1 Table referenced by the entry |
| 51:N   | Reserved (0)                                                           |
| 63:52  | Reserved (0)                                                           |
Note: N is the physical address width supported by the processor, X is the page level

The SPP L1E format is defined as below figure.
| Bit   | Contents                                                          |
| :---- | :---------------------------------------------------------------- |
| 0+2i  | Write permission for i-th 128 byte sub-page region.               |
| 1+2i  | Reserved (0).                                                     |
Note: `0<=i<=31`


Zhang Yi Z (10):
  KVM: VMX: Added EPT Subpage Protection Documentation.
  x86/cpufeature: Add intel Sub-Page Protection to CPU features
  KVM: VMX: Added VMX SPP feature flags and VM-Execution Controls.
  KVM: VMX: Introduce the SPPTP and SPP page table.
  KVM: VMX: Introduce SPP-Induced vm exit and it's handle.
  KVM: VMX: Added handle of SPP write protection fault.
  KVM: VMX: Introduce ioctls to set/get Sub-Page Write Protection.
  KVM: VMX: Update the EPT leaf entry indicated with the SPP enable bit.
  KVM: VMX: Added setup spp page structure.
  KVM: VMX: implement setup SPP page structure in spp miss.

 Documentation/virtual/kvm/spp_design_kvm.txt | 272 +++++++++++++++++++++
 arch/x86/include/asm/cpufeatures.h           |   1 +
 arch/x86/include/asm/kvm_host.h              |  18 +-
 arch/x86/include/asm/vmx.h                   |  10 +
 arch/x86/include/uapi/asm/vmx.h              |   2 +
 arch/x86/kernel/cpu/intel.c                  |   4 +
 arch/x86/kvm/mmu.c                           | 340 ++++++++++++++++++++++++++-
 arch/x86/kvm/mmu.h                           |   1 +
 arch/x86/kvm/vmx.c                           | 104 ++++++++
 arch/x86/kvm/x86.c                           |  99 +++++++-
 include/linux/kvm_host.h                     |   5 +
 include/uapi/linux/kvm.h                     |  16 ++
 virt/kvm/kvm_main.c                          |  26 ++
 13 files changed, 893 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/virtual/kvm/spp_design_kvm.txt

-- 
2.7.4


* [PATCH RFC 01/10] KVM: VMX: Added EPT Subpage Protection Documentation.
  2017-10-13 23:11 [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support Zhang Yi
  2017-10-13 16:57 ` Jim Mattson
@ 2017-10-13 23:12 ` Zhang Yi
  2017-10-13 23:12 ` [PATCH RFC 02/10] x86/cpufeature: Add intel Sub-Page Protection to CPU features Zhang Yi
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Zhang Yi @ 2017-10-13 23:12 UTC (permalink / raw)
  To: kvm, linux-kernel; +Cc: pbonzini, rkrcmar, Zhang Yi Z

From: Zhang Yi Z <yi.z.zhang@linux.intel.com>

Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: He Chen <he.chen@linux.intel.com>
---
 Documentation/virtual/kvm/spp_design_kvm.txt | 272 +++++++++++++++++++++++++++
 1 file changed, 272 insertions(+)
 create mode 100644 Documentation/virtual/kvm/spp_design_kvm.txt

diff --git a/Documentation/virtual/kvm/spp_design_kvm.txt b/Documentation/virtual/kvm/spp_design_kvm.txt
new file mode 100644
index 0000000..7bdab9e
--- /dev/null
+++ b/Documentation/virtual/kvm/spp_design_kvm.txt
@@ -0,0 +1,272 @@
+DRAFT: EPT-Based Sub-Page Protection (SPP) Design Doc for KVM
+=============================================================
+
+1. Overview
+
+EPT-based Sub-Page Protection (SPP) is a capability that allows Virtual
+Machine Monitors to specify write-protection for guest physical memory
+at a sub-page (128 byte) granularity. When this capability is utilized,
+the CPU enforces write-access permissions for sub-page regions of 4K
+pages as specified by the VMM.
+
+2. Operation of SPP
+
+Sub-Page Protection Table (SPPT) is introduced to manage sub-page
+write-access.
+
+SPPT is active when the "sub-page write protection" VM-execution control
+is 1. SPPT looks up the guest physical addresses to derive a 64 bit
+"sub-page permission" value containing sub-page write permissions. The
+lookup from guest-physical addresses to the sub-page region permissions
+is determined by a set of SPPT paging structures.
+
+When the "sub-page write protection" VM-execution control is 1, the SPPT
+is used to look up write permission bits for the 128 byte sub-page regions
+contained in the 4KB guest physical page. EPT specifies the 4KB page
+level privileges that software is allowed when accessing the guest
+physical address, whereas SPPT defines the write permissions for software
+at the 128 byte granularity regions within a 4KB page. Write accesses
+prevented due to sub-page permissions looked up via SPPT are reported as
+EPT violation VM exits. Similar to EPT, a logical processor uses the SPPT
+to look up sub-page region write permissions for guest-physical addresses
+only when those addresses are used to access memory.
+_______________________________________________________________________________
+
+Guest write access --> GPA --> Walk EPT --> EPT leaf entry -┐
+┌-----------------------------------------------------------┘
+└-> if VMexec_control.spp && ept_leaf_entry.spp_bit (bit 61)
+     |
+     └-> <false> --> EPT legacy behavior
+     |
+     |
+     └-> <true>  --> if ept_leaf_entry.writable
+                      |
+                      └-> <true>  --> Ignore SPP
+                      |
+		      └-> <false> --> GPA --> Walk SPP 4-level table--┐
+                                                                      |
+┌------------<----------get-the-SPPT-point-from-VMCS-filed-----<------┘
+|
+Walk SPP L4E table
+|
+└┐--> entry misconfiguration ------------>----------┐<----------------┐
+ |                                                  |                 |
+else                                                |                 |
+ |                                                  |                 |
+ |   ┌------------------SPP VMexit<-----------------┘                 |
+ |   |                                                                |
+ |   └-> exit_qualification & sppt_misconfig --> sppt misconfig       |
+ |   |                                                                |
+ |   └-> exit_qualification & sppt_miss --> sppt miss                 |
+ └--┐                                                                 |
+    |                                                                 |
+walk SPPT L3E--┐--> if-entry-misconfiguration------------>------------┘
+               |                                                      |
+	      else                                                    |
+	       |                                                      |
+	       |                                                      |
+        walk SPPT L2E --┐--> if-entry-misconfiguration-------->-------┘
+                        |                                             |
+                       else                                           |
+			|                                             |
+			|                                             |
+	         walk SPPT L1E --┐-> if-entry-misconfiguration--->----┘
+                                 |
+			        else
+				 |
+                                 └-> if sub-page writable
+                                      └-> <true>  allow, write access
+	                              └-> <false> disallow, EPT violation
+______________________________________________________________________________
+
+3. Interfaces
+
+* Feature enabling
+
+Add "spp=on" to KVM module parameter to enable SPP feature, default is off.
+
+* Get/Set sub-page write access permission
+
+New KVM ioctl:
+
+`KVM_SUBPAGES_GET_ACCESS`:
+Get the sub-page write-access bitmap corresponding to a given range of contiguous gfns.
+
+`KVM_SUBPAGES_SET_ACCESS`
+Set the sub-page write-access bitmap corresponding to a given range of contiguous gfns.
+
+```c
+/* for KVM_SUBPAGES_GET_ACCESS and KVM_SUBPAGES_SET_ACCESS */
+struct kvm_subpage_info {
+	__u64 gfn;
+	__u64 npages; /* number of 4K pages */
+	__u64 *access_map; /* sub-page write-access bitmap array */
+};
+
+#define KVM_SUBPAGES_GET_ACCESS   _IOR(KVMIO,  0x49, struct kvm_subpage_info)
+#define KVM_SUBPAGES_SET_ACCESS   _IOW(KVMIO,  0x4a, struct kvm_subpage_info)
+```
+
+4. SPPT initialization
+
+* SPPT root page allocation
+
+  SPPT is referenced via a 64-bit control field called "sub-page
+  protection table pointer" (SPPTP, encoding 0x2030) which contains a
+  4K-aligned physical address.
+
+  The SPPT is a 4-level table, like EPT. So, as for EPT, when KVM loads
+  the MMU, we allocate a root page for the SPPT L4 table.
+
+* EPT leaf entry SPP bit
+
+  The SPP bit is cleared (0) by default, which disables SPP.
+
+5. Set/Get Sub-Page access bitmap for bunch of guest physical pages
+
+* To utilize the SPP feature, the system admin sets a sub-page write-access
+  bitmap via the SPP KVM ioctl `KVM_SUBPAGES_SET_ACCESS`, which prepares the following:
+
+   1. Get the corresponding EPT leaf entry via the guest physical address.
+   2. If it is a 4K page frame, set bit 61 to enable sub-page protection on this page.
+   3. Set up the SPP page structures; the structure formats are listed below.
+
+   Format of the SPPT L4E, L3E, L2E:
+   | Bit    | Contents                                                                 |
+   | :----- | :------------------------------------------------------------------------|
+   | 0      | Valid entry when set; indicates whether the entry is present             |
+   | 11:1   | Reserved (0)                                                             |
+   | N-1:12 | Physical address of 4KB aligned SPPT LX-1 Table referenced by this entry |
+   | 51:N   | Reserved (0)                                                             |
+   | 63:52  | Reserved (0)                                                             |
+   Note: N is the physical address width supported by the processor. X is the page level
+
+   Format of the SPPT L1E:
+   | Bit   | Contents                                                          |
+   | :---- | :---------------------------------------------------------------- |
+   | 0+2i  | Write permission for i-th 128 byte sub-page region.               |
+   | 1+2i  | Reserved (0).                                                     |
+   Note: `0<=i<=31`
+
+   4. Update the subpage info in the memory slot structure.
+
+* Sub-page write access bitmap setting pseudo-code:
+
+```c
+static int kvm_mmu_set_subpages(struct kvm_vcpu *vcpu,
+				struct kvm_subpage_info *spp_info)
+{
	gfn_t gfn = spp_info->gfn;
+	u64 *access_map = spp_info->access_map;
+
+	sanity_check();
+
+	/* SPP works when the page is unwritable */
+	if (set_ept_leaf_level_unwritable(gfn) == success)
+
+		if (kvm_mmu_setup_spp_structure(gfn) == success)
+
+			set_subpage_slot_info(access_map);
+
+}
+```
+
+A user can get the subpage info via the SPP KVM ioctl `KVM_SUBPAGES_GET_ACCESS`,
+which reads it from the memory slot structure corresponding to the given gpa.
+
+* Sub-page get subpage info pseudo-code:
+
+```c
+static int kvm_mmu_get_subpages(struct kvm_vcpu *vcpu,
+				struct kvm_subpage_info *spp_info)
+{
	gfn_t gfn = spp_info->gfn;
+
+	sanity_check(gfn);
+	spp_info = get_subpage_slot_info(gfn);
+}
+
+```
+
+5. SPPT-induced vmexits
+
+* SPP VM exits
+
+Accesses using guest physical addresses may cause VM exits due to an SPPT
+misconfiguration or an SPPT miss.
+
+An SPPT misconfiguration VM exit occurs when, in the course of translating
+a guest physical address, the logical processor encounters a leaf EPT
+paging-structure entry mapping a 4KB page with SPP enabled, and during the
+SPPT lookup an SPPT paging-structure entry contains an unsupported
+value.
+
+An SPPT miss VM exit occurs when, during the SPPT lookup, there is no
+SPPT misconfiguration but an SPPT paging-structure entry at some level is
+not present.
+
+NOTE: SPPT misconfigurations and SPPT misses can occur only due to an
+attempt to write memory with a guest physical address.
+
+* EPT violation vmexits due to SPPT
+
+EPT violations due to memory write accesses disallowed due to sub-page
+protection permissions specified in the SPPT are reported via EPT
+violation VM exits.
+
+6. SPPT-induced vmexits handling
+
+```c
+#define EXIT_REASON_SPP                 66
+
+static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
+	...
+	[EXIT_REASON_SPP]                     = handle_spp,
+	...
+};
+```
+
+New exit qualification for SPPT-induced vmexits.
+
+| Bit   | Contents                                                          |
+| :---- | :---------------------------------------------------------------- |
+| 10:0  | Reserved (0).                                                     |
+| 11    | SPPT VM exit type. Set for SPPT Miss, cleared for SPPT Misconfig. |
+| 12    | NMI unblocking due to IRET                                        |
+| 63:13 | Reserved (0)                                                      |
+
+In addition to the exit qualification, Guest Linear Address and Guest
+Physical Address fields will be reported.
+
+* SPPT miss and misconfiguration
+
+Allocate a page for the SPPT entry and set the entry correctly.
+
+
+SPP VMexit handler Pseudo-code:
+```c
+static int handle_spp(kvm_vcpu *vcpu)
+{
+	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+	if (exit_qualification & SPP_EXIT_TYPE_BIT) {
+		/* SPPT Miss */
+		/* We don't set SPP write access for the corresponding
+		 * GPA, leave it unwritable, so no need to construct
+		 * SPP table here. */
+	} else {
+		/* SPPT Misconfig */
+		vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
+		vcpu->run->hw.hardware_exit_reason = EXIT_REASON_SPP;
+	}
+	return 0;
+}
+```
+
+* EPT violation vmexits due to SPPT
+
+While hardware walks the SPP page table, if the sub-page region write
+permission bit is set, the write is allowed; otherwise the write is
+disallowed and results in an EPT violation.
+
+We need to detect this case in the EPT violation handler and trigger a
+user-space exit, returning the write-protected address (GPA) to user (qemu).
-- 
2.7.4


* [PATCH RFC 02/10] x86/cpufeature: Add intel Sub-Page Protection to CPU features
  2017-10-13 23:11 [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support Zhang Yi
  2017-10-13 16:57 ` Jim Mattson
  2017-10-13 23:12 ` [PATCH RFC 01/10] KVM: VMX: Added EPT Subpage Protection Documentation Zhang Yi
@ 2017-10-13 23:12 ` Zhang Yi
  2017-10-13 23:13 ` [PATCH RFC 03/10] KVM: VMX: Added VMX SPP feature flags and VM-Execution Controls Zhang Yi
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Zhang Yi @ 2017-10-13 23:12 UTC (permalink / raw)
  To: kvm, linux-kernel; +Cc: pbonzini, rkrcmar, Zhang Yi Z

From: Zhang Yi Z <yi.z.zhang@linux.intel.com>

Adds reporting of the SPP capability from the VMX Procbased MSR.
According to the hardware spec, bit 23 is the control bit of the SPP
capability.

Defined X86_FEATURE_SPP under intel X86 VT-x CPU features.

Defined the X86_VMX_FEATURE_PROC_CTLS2_SPP in intel VMX MSR indicated
features, And enable SPP capability by this MSR.

Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: He Chen <he.chen@linux.intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/intel.c        | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 2519c6c..36cd3b9 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -211,6 +211,7 @@
 #define X86_FEATURE_FLEXPRIORITY ( 8*32+ 2) /* Intel FlexPriority */
 #define X86_FEATURE_EPT         ( 8*32+ 3) /* Intel Extended Page Table */
 #define X86_FEATURE_VPID        ( 8*32+ 4) /* Intel Virtual Processor ID */
+#define X86_FEATURE_SPP         ( 8*32+ 5) /* Intel EPT-based Sub-Page Write Protection */
 
 #define X86_FEATURE_VMMCALL     ( 8*32+15) /* Prefer vmmcall to vmcall */
 #define X86_FEATURE_XENPV       ( 8*32+16) /* "" Xen paravirtual guest */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index dfa90a3..242978b 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -425,6 +425,7 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
 #define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC	0x00000001
 #define X86_VMX_FEATURE_PROC_CTLS2_EPT		0x00000002
 #define X86_VMX_FEATURE_PROC_CTLS2_VPID		0x00000020
+#define X86_VMX_FEATURE_PROC_CTLS2_SPP		0x00800000
 
 	u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2;
 
@@ -433,6 +434,7 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
 	clear_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
 	clear_cpu_cap(c, X86_FEATURE_EPT);
 	clear_cpu_cap(c, X86_FEATURE_VPID);
+	clear_cpu_cap(c, X86_FEATURE_SPP);
 
 	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high);
 	msr_ctl = vmx_msr_high | vmx_msr_low;
@@ -451,6 +453,8 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
 			set_cpu_cap(c, X86_FEATURE_EPT);
 		if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID)
 			set_cpu_cap(c, X86_FEATURE_VPID);
+		if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_SPP)
+			set_cpu_cap(c, X86_FEATURE_SPP);
 	}
 }
 
-- 
2.7.4


* [PATCH RFC 03/10] KVM: VMX: Added VMX SPP feature flags and VM-Execution Controls.
  2017-10-13 23:11 [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support Zhang Yi
                   ` (2 preceding siblings ...)
  2017-10-13 23:12 ` [PATCH RFC 02/10] x86/cpufeature: Add intel Sub-Page Protection to CPU features Zhang Yi
@ 2017-10-13 23:13 ` Zhang Yi
  2017-10-13 23:13 ` [PATCH RFC 04/10] KVM: VMX: Introduce the SPPTP and SPP page table Zhang Yi
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Zhang Yi @ 2017-10-13 23:13 UTC (permalink / raw)
  To: kvm, linux-kernel; +Cc: pbonzini, rkrcmar, Zhang Yi Z

From: Zhang Yi Z <yi.z.zhang@linux.intel.com>

Add new secondary processor-based VM-execution control bit which
defined as "sub-page write permission", same as VMX Procbased MSR,
bit 23 is the enable bit of SPP.

We also introduced an enable_ept_spp parameter to control whether
SPP is on or off; the default is off while enabling work is still in
progress.

Now SPP is active when the "Sub-page Write Protection" bit in the
Secondary VM-Execution Controls is set and the module parameter is
enabled with "spp=on".

Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: He Chen <he.chen@linux.intel.com>
---
 arch/x86/include/asm/vmx.h |  1 +
 arch/x86/kvm/vmx.c         | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index caec841..633dff5 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -77,6 +77,7 @@
 #define SECONDARY_EXEC_RDSEED			0x00010000
 #define SECONDARY_EXEC_ENABLE_PML               0x00020000
 #define SECONDARY_EXEC_XSAVES			0x00100000
+#define SECONDARY_EXEC_ENABLE_SPP		0x00800000
 #define SECONDARY_EXEC_TSC_SCALING              0x02000000
 
 #define PIN_BASED_EXT_INTR_MASK                 0x00000001
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8ed90f7c..1a2ca87 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -82,6 +82,9 @@ module_param_named(unrestricted_guest,
 static bool __read_mostly enable_ept_ad_bits = 1;
 module_param_named(eptad, enable_ept_ad_bits, bool, S_IRUGO);
 
+static bool __read_mostly enable_ept_spp;
+module_param_named(spp, enable_ept_spp, bool, S_IRUGO);
+
 static bool __read_mostly emulate_invalid_guest_state = true;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 
@@ -1307,6 +1310,11 @@ static inline bool cpu_has_vmx_pml(void)
 	return vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_ENABLE_PML;
 }
 
+static inline bool cpu_has_vmx_ept_spp(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_ENABLE_SPP;
+}
+
 static inline bool cpu_has_vmx_tsc_scaling(void)
 {
 	return vmcs_config.cpu_based_2nd_exec_ctrl &
@@ -3660,6 +3668,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 			SECONDARY_EXEC_RDSEED |
 			SECONDARY_EXEC_RDRAND |
 			SECONDARY_EXEC_ENABLE_PML |
+			SECONDARY_EXEC_ENABLE_SPP |
 			SECONDARY_EXEC_TSC_SCALING |
 			SECONDARY_EXEC_ENABLE_VMFUNC;
 		if (adjust_vmx_controls(min2, opt2,
@@ -5323,6 +5332,9 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
 	if (!enable_pml)
 		exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
 
+	if (!enable_ept_spp)
+		exec_control &= ~SECONDARY_EXEC_ENABLE_SPP;
+
 	if (vmx_xsaves_supported()) {
 		/* Exposing XSAVES only when XSAVE is exposed */
 		bool xsaves_enabled =
@@ -6753,11 +6765,15 @@ static __init int hardware_setup(void)
 		enable_ept = 0;
 		enable_unrestricted_guest = 0;
 		enable_ept_ad_bits = 0;
+		enable_ept_spp = 0;
 	}
 
 	if (!cpu_has_vmx_ept_ad_bits() || !enable_ept)
 		enable_ept_ad_bits = 0;
 
+	if (!cpu_has_vmx_ept_spp())
+		enable_ept_spp = 0;
+
 	if (!cpu_has_vmx_unrestricted_guest())
 		enable_unrestricted_guest = 0;
 
-- 
2.7.4


* [PATCH RFC 04/10] KVM: VMX: Introduce the SPPTP and SPP page table.
  2017-10-13 23:11 [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support Zhang Yi
                   ` (3 preceding siblings ...)
  2017-10-13 23:13 ` [PATCH RFC 03/10] KVM: VMX: Added VMX SPP feature flags and VM-Execution Controls Zhang Yi
@ 2017-10-13 23:13 ` Zhang Yi
  2017-10-13 23:14 ` [PATCH RFC 05/10] KVM: VMX: Introduce SPP-Induced vm exit and it's handle Zhang Yi
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Zhang Yi @ 2017-10-13 23:13 UTC (permalink / raw)
  To: kvm, linux-kernel; +Cc: pbonzini, rkrcmar, Zhang Yi Z

From: Zhang Yi Z <yi.z.zhang@linux.intel.com>

The SPPT has a 4-level paging structure similar to EPT, except for
the L1E.

The sub-page permission table is referenced via a 64-bit control field
called the Sub-Page Permission Table Pointer (SPPTP), which contains a
4K-aligned physical address. The index/encoding for this VMCS field
is defined as 0x2030 at this time.

The format of the SPPTP is shown in the figure below:

-------------------------------------------------------------------------
| Bit    | Contents                                                     |
|        |                                                              |
:-----------------------------------------------------------------------|
| 11:0   | Reserved (0)                                                 |
| N-1:12 | Physical address of 4KB aligned SPPT L4E Table               |
| 51:N   | Reserved (0)                                                 |
| 63:52  | Reserved (0)                                                 |
------------------------------------------------------------------------|

Note: N is the physical address width supported by the processor.

This patch introduced the SPP paging structures, whose root page is
created at KVM MMU page initialization and freed when MMU pages are freed.

As with the EPT page table, we initialize the SPPT
and write the SPPT pointer into the VMCS field.
We also added an mmu page role type, spp, to distinguish an SPP page
from an EPT page.

Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: He Chen <he.chen@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  4 +++-
 arch/x86/include/asm/vmx.h      |  2 ++
 arch/x86/kvm/mmu.c              | 39 +++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/vmx.c              | 16 ++++++++++++++++
 4 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c73e493..5e8fdda 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -256,7 +256,8 @@ union kvm_mmu_page_role {
 		unsigned smep_andnot_wp:1;
 		unsigned smap_andnot_wp:1;
 		unsigned ad_disabled:1;
-		unsigned :7;
+		unsigned spp:1;
+		unsigned reserved:6;
 
 		/*
 		 * This is left at the top of the word so that
@@ -345,6 +346,7 @@ struct kvm_mmu {
 	void (*update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			   u64 *spte, const void *pte);
 	hpa_t root_hpa;
+	hpa_t sppt_root;
 	union kvm_mmu_page_role base_role;
 	u8 root_level;
 	u8 shadow_root_level;
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 633dff5..55bac23 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -211,6 +211,8 @@ enum vmcs_field {
 	VMWRITE_BITMAP                  = 0x00002028,
 	XSS_EXIT_BITMAP                 = 0x0000202C,
 	XSS_EXIT_BITMAP_HIGH            = 0x0000202D,
+	SPPT_POINTER			= 0x00002030,
+	SPPT_POINTER_HIGH		= 0x00002031,
 	TSC_MULTIPLIER                  = 0x00002032,
 	TSC_MULTIPLIER_HIGH             = 0x00002033,
 	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index eca30c1..32a374c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2287,6 +2287,28 @@ static void clear_sp_write_flooding_count(u64 *spte)
 	__clear_sp_write_flooding_count(sp);
 }
 
+static struct kvm_mmu_page *kvm_mmu_get_spp_page(struct kvm_vcpu *vcpu,
+						 gfn_t gfn,
+						 unsigned level)
+
+{
+	struct kvm_mmu_page *sp;
+	union kvm_mmu_page_role role;
+
+	role = vcpu->arch.mmu.base_role;
+	role.level = level;
+	role.direct = true;
+	role.spp = true;
+
+	sp = kvm_mmu_alloc_page(vcpu, true);
+	sp->gfn = gfn;
+	sp->role = role;
+	hlist_add_head(&sp->hash_link,
+		       &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
+	clear_page(sp->spt);
+	return sp;
+}
+
 static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 					     gfn_t gfn,
 					     gva_t gaddr,
@@ -3319,7 +3341,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
 static void mmu_free_roots(struct kvm_vcpu *vcpu)
 {
 	int i;
-	struct kvm_mmu_page *sp;
+	struct kvm_mmu_page *sp, *spp_sp;
 	LIST_HEAD(invalid_list);
 
 	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
@@ -3329,16 +3351,24 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
 	    (vcpu->arch.mmu.root_level >= PT64_ROOT_4LEVEL ||
 	     vcpu->arch.mmu.direct_map)) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
+		hpa_t spp_root = vcpu->arch.mmu.sppt_root;
 
 		spin_lock(&vcpu->kvm->mmu_lock);
 		sp = page_header(root);
+		spp_sp = page_header(spp_root);
 		--sp->root_count;
+		--spp_sp->root_count;
 		if (!sp->root_count && sp->role.invalid) {
 			kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
 			kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
 		}
+		if (!spp_sp->root_count && spp_sp->role.invalid) {
+			kvm_mmu_prepare_zap_page(vcpu->kvm, spp_sp, &invalid_list);
+			kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
+		}
 		spin_unlock(&vcpu->kvm->mmu_lock);
 		vcpu->arch.mmu.root_hpa = INVALID_PAGE;
+		vcpu->arch.mmu.sppt_root = INVALID_PAGE;
 		return;
 	}
 
@@ -3375,7 +3405,7 @@ static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
 
 static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 {
-	struct kvm_mmu_page *sp;
+	struct kvm_mmu_page *sp, *spp_sp;
 	unsigned i;
 
 	if (vcpu->arch.mmu.shadow_root_level >= PT64_ROOT_4LEVEL) {
@@ -3386,9 +3416,13 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 		}
 		sp = kvm_mmu_get_page(vcpu, 0, 0,
 				vcpu->arch.mmu.shadow_root_level, 1, ACC_ALL);
+		spp_sp = kvm_mmu_get_spp_page(vcpu, 0,
+				vcpu->arch.mmu.shadow_root_level);
 		++sp->root_count;
+		++spp_sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
 		vcpu->arch.mmu.root_hpa = __pa(sp->spt);
+		vcpu->arch.mmu.sppt_root = __pa(spp_sp->spt);
 	} else if (vcpu->arch.mmu.shadow_root_level == PT32E_ROOT_LEVEL) {
 		for (i = 0; i < 4; ++i) {
 			hpa_t root = vcpu->arch.mmu.pae_root[i];
@@ -5021,6 +5055,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.walk_mmu = &vcpu->arch.mmu;
 	vcpu->arch.mmu.root_hpa = INVALID_PAGE;
+	vcpu->arch.mmu.sppt_root = INVALID_PAGE;
 	vcpu->arch.mmu.translate_gpa = translate_gpa;
 	vcpu->arch.nested_mmu.translate_gpa = translate_nested_gpa;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1a2ca87..a4ac08a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -253,6 +253,7 @@ struct __packed vmcs12 {
 	u64 eoi_exit_bitmap3;
 	u64 eptp_list_address;
 	u64 xss_exit_bitmap;
+	u64 sppt_pointer;
 	u64 guest_physical_address;
 	u64 vmcs_link_pointer;
 	u64 pml_address;
@@ -775,6 +776,7 @@ static const unsigned short vmcs_field_to_offset_table[] = {
 	FIELD64(EOI_EXIT_BITMAP3, eoi_exit_bitmap3),
 	FIELD64(EPTP_LIST_ADDRESS, eptp_list_address),
 	FIELD64(XSS_EXIT_BITMAP, xss_exit_bitmap),
+	FIELD64(SPPT_POINTER, sppt_pointer),
 	FIELD64(GUEST_PHYSICAL_ADDRESS, guest_physical_address),
 	FIELD64(VMCS_LINK_POINTER, vmcs_link_pointer),
 	FIELD64(PML_ADDRESS, pml_address),
@@ -4323,10 +4325,16 @@ static u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long root_hpa)
 	return eptp;
 }
 
+static inline u64 construct_spptp(unsigned long root_hpa)
+{
+	return root_hpa & PAGE_MASK;
+}
+
 static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
 	unsigned long guest_cr3;
 	u64 eptp;
+	u64 spptp;
 
 	guest_cr3 = cr3;
 	if (enable_ept) {
@@ -4339,6 +4347,12 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 		ept_load_pdptrs(vcpu);
 	}
 
+	if ((vcpu->arch.mmu.sppt_root != INVALID_PAGE) &&
+	    enable_ept_spp) {
+		spptp = construct_spptp(vcpu->arch.mmu.sppt_root);
+		vmcs_write64(SPPT_POINTER, spptp);
+	}
+
 	vmx_flush_tlb(vcpu);
 	vmcs_writel(GUEST_CR3, guest_cr3);
 }
@@ -8754,6 +8768,8 @@ static void dump_vmcs(void)
 		pr_err("PostedIntrVec = 0x%02x\n", vmcs_read16(POSTED_INTR_NV));
 	if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT))
 		pr_err("EPT pointer = 0x%016llx\n", vmcs_read64(EPT_POINTER));
+	if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_SPP))
+		pr_err("SPPT pointer = 0x%016llx\n", vmcs_read64(SPPT_POINTER));
 	n = vmcs_read32(CR3_TARGET_COUNT);
 	for (i = 0; i + 1 < n; i += 4)
 		pr_err("CR3 target%u=%016lx target%u=%016lx\n",
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC 05/10] KVM: VMX: Introduce SPP-Induced VM exit and its handler.
  2017-10-13 23:11 [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support Zhang Yi
                   ` (4 preceding siblings ...)
  2017-10-13 23:13 ` [PATCH RFC 04/10] KVM: VMX: Introduce the SPPTP and SPP page table Zhang Yi
@ 2017-10-13 23:14 ` Zhang Yi
  2017-10-13 23:14 ` [PATCH RFC 06/10] KVM: VMX: Added handle of SPP write protection fault Zhang Yi
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Zhang Yi @ 2017-10-13 23:14 UTC (permalink / raw)
  To: kvm, linux-kernel; +Cc: pbonzini, rkrcmar, Zhang Yi Z

From: Zhang Yi Z <yi.z.zhang@linux.intel.com>

Accesses using guest-physical addresses may cause SPP-induced VM exits
due to an SPPT misconfiguration or an SPPT miss. The basic VM-exit
reason code reported for SPP-induced VM exits is 66.

An SPPT misconfiguration VM exit occurs when, in the course of
translating a guest-physical address, the logical processor encounters
a leaf EPT paging-structure entry that maps a 4KB page with the
sub-page write-permission control bit set, and during the SPPT lookup
an SPPT paging-structure entry contains an unsupported value.

An SPPT miss VM exit occurs when, in the course of translating a
guest-physical address, the logical processor encounters a leaf EPT
paging-structure entry with the sub-page write-permission control bit
set, and during the SPPT lookup there is no SPPT misconfiguration but
some level of the SPPT paging structure is not present.

SPPT misconfigurations and SPPT misses can occur only on an attempt to
write memory using a guest-physical address.
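
As a rough illustration of how a handler can tell the two causes apart,
the sketch below models the exit-qualification check. The bit position
mirrors SPPT_INDUCED_EXIT_TYPE from this patch; it is an assumption
taken from this RFC, not a mainline kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit 11 of the SPP exit qualification (SPPT_INDUCED_EXIT_TYPE in this
 * patch) is set for an SPPT miss and clear for an SPPT misconfiguration.
 */
#define SPPT_INDUCED_EXIT_TYPE (1ULL << 11)

/* Returns true for an SPPT miss, false for an SPPT misconfiguration. */
static inline bool spp_exit_is_sppt_miss(uint64_t exit_qualification)
{
	return exit_qualification & SPPT_INDUCED_EXIT_TYPE;
}
```

This matches the split in handle_spp() below: the miss case returns to
the guest so the table can be built, while a misconfiguration is
reported to user space.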

Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: He Chen <he.chen@linux.intel.com>
---
 arch/x86/include/asm/vmx.h      |  7 +++++++
 arch/x86/include/uapi/asm/vmx.h |  2 ++
 arch/x86/kvm/vmx.c              | 45 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 54 insertions(+)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 55bac23..7f1b824 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -543,6 +543,13 @@ struct vmx_msr_entry {
 #define EPT_VIOLATION_GVA_TRANSLATED	(1 << EPT_VIOLATION_GVA_TRANSLATED_BIT)
 
 /*
+ * Exit Qualifications for SPPT-Induced VM Exits
+ */
+#define SPPT_INDUCED_EXIT_TYPE_BIT	11
+#define SPPT_INDUCED_EXIT_TYPE		(1 << SPPT_INDUCED_EXIT_TYPE_BIT)
+#define SPPT_INTR_INFO_UNBLOCK_NMI	INTR_INFO_UNBLOCK_NMI
+
+/*
  * VM-instruction error numbers
  */
 enum vm_instruction_error_number {
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index 690a2dc..d632264 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -84,6 +84,7 @@
 #define EXIT_REASON_PML_FULL            62
 #define EXIT_REASON_XSAVES              63
 #define EXIT_REASON_XRSTORS             64
+#define EXIT_REASON_SPP                 66
 
 #define VMX_EXIT_REASONS \
 	{ EXIT_REASON_EXCEPTION_NMI,         "EXCEPTION_NMI" }, \
@@ -140,6 +141,7 @@
 	{ EXIT_REASON_ENCLS,                 "ENCLS" }, \
 	{ EXIT_REASON_RDSEED,                "RDSEED" }, \
 	{ EXIT_REASON_PML_FULL,              "PML_FULL" }, \
+	{ EXIT_REASON_SPP,                   "SPP" }, \
 	{ EXIT_REASON_XSAVES,                "XSAVES" }, \
 	{ EXIT_REASON_XRSTORS,               "XRSTORS" }
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a4ac08a..fa4f548 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7997,6 +7997,50 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 	return kvm_skip_emulated_instruction(vcpu);
 }
 
+static int handle_spp(struct kvm_vcpu *vcpu)
+{
+	unsigned long exit_qualification;
+
+	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+
+	/*
+	 * SPP VM exit happened while executing iret from NMI,
+	 * "blocked by NMI" bit has to be set before next VM entry.
+	 * There are errata that may cause this bit to not be set:
+	 * AAK134, BY25.
+	 */
+	if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
+	    (exit_qualification & SPPT_INTR_INFO_UNBLOCK_NMI))
+		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
+			      GUEST_INTR_STATE_NMI);
+
+	pr_debug("SPP: SPP exit_qualification=%lx\n", exit_qualification);
+
+	vcpu->arch.exit_qualification = exit_qualification;
+
+	if (exit_qualification & SPPT_INDUCED_EXIT_TYPE) {
+		/*
+		 * SPPT Miss
+		 * We haven't set SPP write access for the
+		 * corresponding GPA yet, so the SPP table
+		 * needs to be constructed here.
+		 */
+		pr_debug("SPP: %s: SPPT Miss!!!\n", __func__);
+		return 1;
+	}
+
+	/*
+	 * SPPT Misconfig
+	 * An SPPT paging-structure entry contained an unsupported
+	 * value; the SPPT is probably set up in an incorrect format.
+	 */
+	WARN_ON(1);
+	vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
+	vcpu->run->hw.hardware_exit_reason = EXIT_REASON_SPP;
+	pr_alert("SPP: %s: SPPT Misconfiguration!!!\n", __func__);
+	return 0;
+}
+
 static int handle_pml_full(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qualification;
@@ -8194,6 +8238,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_INVVPID]                 = handle_invvpid,
 	[EXIT_REASON_RDRAND]                  = handle_invalid_op,
 	[EXIT_REASON_RDSEED]                  = handle_invalid_op,
+	[EXIT_REASON_SPP]                     = handle_spp,
 	[EXIT_REASON_XSAVES]                  = handle_xsaves,
 	[EXIT_REASON_XRSTORS]                 = handle_xrstors,
 	[EXIT_REASON_PML_FULL]		      = handle_pml_full,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC 06/10] KVM: VMX: Added handle of SPP write protection fault.
  2017-10-13 23:11 [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support Zhang Yi
                   ` (5 preceding siblings ...)
  2017-10-13 23:14 ` [PATCH RFC 05/10] KVM: VMX: Introduce SPP-Induced VM exit and its handler Zhang Yi
@ 2017-10-13 23:14 ` Zhang Yi
  2017-10-13 23:14 ` [PATCH RFC 07/10] KVM: VMX: Introduce ioctls to set/get Sub-Page Write Protection Zhang Yi
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Zhang Yi @ 2017-10-13 23:14 UTC (permalink / raw)
  To: kvm, linux-kernel; +Cc: pbonzini, rkrcmar, Zhang Yi Z

From: Zhang Yi Z <yi.z.zhang@linux.intel.com>

A control bit in EPT leaf paging-structure entries is defined as
“Sub-Page Permission” (the SPP bit); its bit position is 61.

While the hardware walks the SPP page table, if the sub-page region's
write-permission bit is set the write is allowed; otherwise the write
is disallowed and results in an EPT violation.

We need to detect this case in the EPT violation handler and trigger
a user-space exit, returning the write-protected address (GPA) to
user space (QEMU).
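
A hypothetical sketch of how a VMM's run loop might pick up this new
exit. KVM_EXIT_SPP = 28 and the spp.addr field come from this patch;
the stub struct below merely stands in for the relevant slice of
struct kvm_run and is not a real kernel layout.

```c
#include <stdint.h>

/* From this patch series (not in mainline headers). */
#define KVM_EXIT_SPP 28

/* Minimal stand-in for the relevant part of struct kvm_run. */
struct run_stub {
	uint32_t exit_reason;
	uint64_t spp_addr;	/* run->spp.addr in the patch's layout */
};

/*
 * Returns the write-protected GPA on an SPP exit, or UINT64_MAX if the
 * exit was something else and should be dispatched elsewhere.
 */
static uint64_t handle_spp_exit(const struct run_stub *run)
{
	if (run->exit_reason != KVM_EXIT_SPP)
		return UINT64_MAX;
	/* The VMM decides here how to react to the blocked write. */
	return run->spp_addr;
}
```

In a real VMM this check would sit in the switch over run->exit_reason
after KVM_RUN returns.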

Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: He Chen <he.chen@linux.intel.com>
---
 arch/x86/kvm/mmu.c       | 19 +++++++++++++++++++
 arch/x86/kvm/mmu.h       |  1 +
 include/uapi/linux/kvm.h |  5 +++++
 3 files changed, 25 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 32a374c..1fbe467 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3232,6 +3232,21 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 		if ((error_code & PFERR_WRITE_MASK) &&
 		    spte_can_locklessly_be_made_writable(spte))
 		{
+			/*
+			 * Record write protect fault caused by
+			 * Sub-page Protection
+			 */
+			if (spte & PT_SPP_MASK) {
+				fault_handled = true;
+
+				vcpu->run->exit_reason = KVM_EXIT_SPP;
+				vcpu->run->spp.addr = gva;
+				kvm_skip_emulated_instruction(vcpu);
+
+				/* Let QEMU decide how to handle this. */
+				break;
+			}
+
 			new_spte |= PT_WRITABLE_MASK;
 
 			/*
@@ -4966,6 +4981,10 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
 
 	r = vcpu->arch.mmu.page_fault(vcpu, cr2, lower_32_bits(error_code),
 				      false);
+
+	if (vcpu->run->exit_reason == KVM_EXIT_SPP)
+		return 0;
+
 	if (r < 0)
 		return r;
 	if (!r)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 64a2dbd..c860efe 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -25,6 +25,7 @@
 #define PT_PAGE_SIZE_MASK (1ULL << PT_PAGE_SIZE_SHIFT)
 #define PT_PAT_MASK (1ULL << 7)
 #define PT_GLOBAL_MASK (1ULL << 8)
+#define PT_SPP_MASK (1ULL << 61)
 #define PT64_NX_SHIFT 63
 #define PT64_NX_MASK (1ULL << PT64_NX_SHIFT)
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 8388875..0cd821e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -234,6 +234,7 @@ struct kvm_hyperv_exit {
 #define KVM_EXIT_S390_STSI        25
 #define KVM_EXIT_IOAPIC_EOI       26
 #define KVM_EXIT_HYPERV           27
+#define KVM_EXIT_SPP              28
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -389,6 +390,10 @@ struct kvm_run {
 		struct {
 			__u8 vector;
 		} eoi;
+		/* KVM_EXIT_SPP */
+		struct {
+			__u64 addr;
+		} spp;
 		/* KVM_EXIT_HYPERV */
 		struct kvm_hyperv_exit hyperv;
 		/* Fix the size of the union. */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC 07/10] KVM: VMX: Introduce ioctls to set/get Sub-Page Write Protection.
  2017-10-13 23:11 [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support Zhang Yi
                   ` (6 preceding siblings ...)
  2017-10-13 23:14 ` [PATCH RFC 06/10] KVM: VMX: Added handle of SPP write protection fault Zhang Yi
@ 2017-10-13 23:14 ` Zhang Yi
  2017-10-13 23:14 ` [PATCH RFC 08/10] KVM: VMX: Update the EPT leaf entry indicated with the SPP enable bit Zhang Yi
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Zhang Yi @ 2017-10-13 23:14 UTC (permalink / raw)
  To: kvm, linux-kernel; +Cc: pbonzini, rkrcmar, Zhang Yi Z

From: Zhang Yi Z <yi.z.zhang@linux.intel.com>

Introduce two ioctls that let a user application set/get the sub-page
write-protection bitmap per gfn; each gfn corresponds to one bitmap.

The user application (QEMU, or some other security control daemon)
sets the protection bitmap via these ioctls.

The API is defined as:

struct kvm_subpage {
	__u64 base_gfn;
	__u64 npages;
	/* sub-page write-access bitmap array */
	__u32 access_map[SUBPAGE_MAX_BITMAP];
} sp;

kvm_vm_ioctl(s, KVM_SUBPAGES_SET_ACCESS, &sp)
kvm_vm_ioctl(s, KVM_SUBPAGES_GET_ACCESS, &sp)
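
A hedged usage sketch: filling struct kvm_subpage so that only the
first 128-byte region of each page stays writable. The struct layout
and the convention that bit i of an access_map entry grants write
access to sub-page i are taken from this series; none of this is
mainline API, and the helper name is made up.

```c
#include <stdint.h>
#include <string.h>

#define SUBPAGE_MAX_BITMAP 128

/* Userspace mirror of the struct introduced by this series. */
struct kvm_subpage {
	uint64_t base_gfn;
	uint64_t npages;
	uint32_t access_map[SUBPAGE_MAX_BITMAP]; /* one 32-bit map per gfn */
};

/*
 * Prepare a request that leaves only sub-page 0 (bytes 0-127) of each
 * page writable; the remaining 31 sub-pages become write-protected.
 */
static void spp_protect_all_but_first(struct kvm_subpage *sp,
				      uint64_t base_gfn, uint64_t npages)
{
	uint64_t i;

	memset(sp, 0, sizeof(*sp));
	sp->base_gfn = base_gfn;
	sp->npages = npages;
	for (i = 0; i < npages && i < SUBPAGE_MAX_BITMAP; i++)
		sp->access_map[i] = 0x1;	/* bit 0: sub-page 0 writable */
}
```

The filled struct would then be passed to the VM file descriptor via
kvm_vm_ioctl(s, KVM_SUBPAGES_SET_ACCESS, &sp), as shown above.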

Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: He Chen <he.chen@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  8 ++++
 arch/x86/kvm/mmu.c              | 49 ++++++++++++++++++++
 arch/x86/kvm/vmx.c              | 19 ++++++++
 arch/x86/kvm/x86.c              | 99 ++++++++++++++++++++++++++++++++++++++++-
 include/linux/kvm_host.h        |  5 +++
 include/uapi/linux/kvm.h        | 11 +++++
 virt/kvm/kvm_main.c             | 26 +++++++++++
 7 files changed, 216 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5e8fdda..763cd7e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -345,6 +345,8 @@ struct kvm_mmu {
 	void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva);
 	void (*update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			   u64 *spte, const void *pte);
+	int (*get_subpages)(struct kvm *kvm, struct kvm_subpage *spp_info);
+	int (*set_subpages)(struct kvm *kvm, struct kvm_subpage *spp_info);
 	hpa_t root_hpa;
 	hpa_t sppt_root;
 	union kvm_mmu_page_role base_role;
@@ -703,6 +705,7 @@ struct kvm_lpage_info {
 
 struct kvm_arch_memory_slot {
 	struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
+	u32 *subpage_wp_info;
 	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
 	unsigned short *gfn_track[KVM_PAGE_TRACK_MAX];
 };
@@ -1063,6 +1066,8 @@ struct kvm_x86_ops {
 	void (*cancel_hv_timer)(struct kvm_vcpu *vcpu);
 
 	void (*setup_mce)(struct kvm_vcpu *vcpu);
+	int (*get_subpages)(struct kvm *kvm, struct kvm_subpage *spp_info);
+	int (*set_subpages)(struct kvm *kvm, struct kvm_subpage *spp_info);
 };
 
 struct kvm_arch_async_pf {
@@ -1254,6 +1259,9 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u64 error_code,
 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
 void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu);
 
+int kvm_mmu_get_subpages(struct kvm *kvm, struct kvm_subpage *spp_info);
+int kvm_mmu_set_subpages(struct kvm *kvm, struct kvm_subpage *spp_info);
+
 void kvm_enable_tdp(void);
 void kvm_disable_tdp(void);
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1fbe467..6c92d19 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1342,6 +1342,15 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
 	return sptep;
 }
 
+static u32 *gfn_to_subpage_wp_info(struct kvm_memory_slot *slot,
+				   gfn_t gfn)
+{
+	unsigned long idx;
+
+	idx = gfn_to_index(gfn, slot->base_gfn, PT_PAGE_TABLE_LEVEL);
+	return &slot->arch.subpage_wp_info[idx];
+}
+
 #define for_each_rmap_spte(_rmap_head_, _iter_, _spte_)			\
 	for (_spte_ = rmap_get_first(_rmap_head_, _iter_);		\
 	     _spte_; _spte_ = rmap_get_next(_iter_))
@@ -3971,6 +3980,44 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 	return 0;
 }
 
+int kvm_mmu_get_subpages(struct kvm *kvm, struct kvm_subpage *spp_info)
+{
+	u32 *access = spp_info->access_map;
+	gfn_t gfn = spp_info->base_gfn;
+	int npages = spp_info->npages;
+	struct kvm_memory_slot *slot;
+	int i;
+
+	for (i = 0; i < npages; i++, gfn++) {
+		slot = gfn_to_memslot(kvm, gfn);
+		if (!slot)
+			return -EFAULT;
+		access[i] = *gfn_to_subpage_wp_info(slot, gfn);
+	}
+
+	return i;
+}
+
+int kvm_mmu_set_subpages(struct kvm *kvm, struct kvm_subpage *spp_info)
+{
+	u32 access = spp_info->access_map[0];
+	gfn_t gfn = spp_info->base_gfn;
+	int npages = spp_info->npages;
+	struct kvm_memory_slot *slot;
+	u32 *wp_map;
+	int i;
+
+	for (i = 0; i < npages; i++, gfn++) {
+		slot = gfn_to_memslot(kvm, gfn);
+		if (!slot)
+			return -EFAULT;
+		wp_map = gfn_to_subpage_wp_info(slot, gfn);
+		*wp_map = access;
+	}
+
+	return i;
+}
+
 static void nonpaging_init_context(struct kvm_vcpu *vcpu,
 				   struct kvm_mmu *context)
 {
@@ -4523,6 +4570,8 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	context->get_cr3 = get_cr3;
 	context->get_pdptr = kvm_pdptr_read;
 	context->inject_page_fault = kvm_inject_page_fault;
+	context->get_subpages = kvm_x86_ops->get_subpages;
+	context->set_subpages = kvm_x86_ops->set_subpages;
 
 	if (!is_paging(vcpu)) {
 		context->nx = false;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fa4f548..9116b53 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6879,6 +6879,11 @@ static __init int hardware_setup(void)
 		kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
 	}
 
+	if (!enable_ept_spp) {
+		kvm_x86_ops->get_subpages = NULL;
+		kvm_x86_ops->set_subpages = NULL;
+	}
+
 	if (cpu_has_vmx_preemption_timer() && enable_preemption_timer) {
 		u64 vmx_msr;
 
@@ -12014,6 +12019,18 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 			~FEATURE_CONTROL_LMCE;
 }
 
+static int vmx_get_subpages(struct kvm *kvm,
+			    struct kvm_subpage *spp_info)
+{
+	return kvm_get_subpages(kvm, spp_info);
+}
+
+static int vmx_set_subpages(struct kvm *kvm,
+			    struct kvm_subpage *spp_info)
+{
+	return kvm_set_subpages(kvm, spp_info);
+}
+
 static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.cpu_has_kvm_support = cpu_has_kvm_support,
 	.disabled_by_bios = vmx_disabled_by_bios,
@@ -12139,6 +12156,8 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 #endif
 
 	.setup_mce = vmx_setup_mce,
+	.get_subpages = vmx_get_subpages,
+	.set_subpages = vmx_set_subpages,
 };
 
 static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cd17b7d..9c6fc52 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4010,6 +4010,18 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 	return r;
 }
 
+static int kvm_vm_ioctl_get_subpages(struct kvm *kvm,
+				     struct kvm_subpage *spp_info)
+{
+	return kvm_arch_get_subpages(kvm, spp_info);
+}
+
+static int kvm_vm_ioctl_set_subpages(struct kvm *kvm,
+				     struct kvm_subpage *spp_info)
+{
+	return kvm_arch_set_subpages(kvm, spp_info);
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg)
 {
@@ -4270,6 +4282,40 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = kvm_vm_ioctl_enable_cap(kvm, &cap);
 		break;
 	}
+	case KVM_SUBPAGES_GET_ACCESS: {
+		struct kvm_subpage spp_info;
+
+		r = -EFAULT;
+		if (copy_from_user(&spp_info, argp, sizeof(spp_info)))
+			goto out;
+
+		r = -EINVAL;
+		if (spp_info.npages == 0 ||
+		    spp_info.npages > SUBPAGE_MAX_BITMAP)
+			goto out;
+
+		r = kvm_vm_ioctl_get_subpages(kvm, &spp_info);
+		if (copy_to_user(argp, &spp_info, sizeof(spp_info))) {
+			r = -EFAULT;
+			goto out;
+		}
+		break;
+	}
+	case KVM_SUBPAGES_SET_ACCESS: {
+		struct kvm_subpage spp_info;
+
+		r = -EFAULT;
+		if (copy_from_user(&spp_info, argp, sizeof(spp_info)))
+			goto out;
+
+		r = -EINVAL;
+		if (spp_info.npages == 0 ||
+		    spp_info.npages > SUBPAGE_MAX_BITMAP)
+			goto out;
+
+		r = kvm_vm_ioctl_set_subpages(kvm, &spp_info);
+		break;
+	}
 	default:
 		r = -ENOTTY;
 	}
@@ -8240,6 +8286,34 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_page_track_cleanup(kvm);
 }
 
+int kvm_subpage_create_memslot(struct kvm_memory_slot *slot,
+			       unsigned long npages)
+{
+	int lpages;
+
+	lpages = gfn_to_index(slot->base_gfn + npages - 1,
+			      slot->base_gfn, 1) + 1;
+
+	slot->arch.subpage_wp_info =
+	      kvzalloc(lpages * sizeof(*slot->arch.subpage_wp_info),
+		       GFP_KERNEL);
+
+	if (!slot->arch.subpage_wp_info)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void kvm_subpage_free_memslot(struct kvm_memory_slot *free,
+			      struct kvm_memory_slot *dont)
+{
+	if (!dont || free->arch.subpage_wp_info !=
+		dont->arch.subpage_wp_info) {
+		kvfree(free->arch.subpage_wp_info);
+		free->arch.subpage_wp_info = NULL;
+	}
+}
+
 void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 			   struct kvm_memory_slot *dont)
 {
@@ -8261,6 +8335,7 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 	}
 
 	kvm_page_track_free_memslot(free, dont);
+	kvm_subpage_free_memslot(free, dont);
 }
 
 int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
@@ -8312,8 +8387,12 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
 	if (kvm_page_track_create_memslot(slot, npages))
 		goto out_free;
 
-	return 0;
+	if (kvm_subpage_create_memslot(slot, npages))
+		goto out_free_page_track;
 
+	return 0;
+out_free_page_track:
+	kvm_page_track_free_memslot(slot, NULL);
 out_free:
 	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
 		kvfree(slot->arch.rmap[i]);
@@ -8790,6 +8869,24 @@ int kvm_arch_update_irqfd_routing(struct kvm *kvm, unsigned int host_irq,
 	return kvm_x86_ops->update_pi_irte(kvm, host_irq, guest_irq, set);
 }
 
+int kvm_arch_get_subpages(struct kvm *kvm,
+			  struct kvm_subpage *spp_info)
+{
+	if (!kvm_x86_ops->get_subpages)
+		return -EINVAL;
+
+	return kvm_x86_ops->get_subpages(kvm, spp_info);
+}
+
+int kvm_arch_set_subpages(struct kvm *kvm,
+			  struct kvm_subpage *spp_info)
+{
+	if (!kvm_x86_ops->set_subpages)
+		return -EINVAL;
+
+	return kvm_x86_ops->set_subpages(kvm, spp_info);
+}
+
 bool kvm_vector_hashing_enabled(void)
 {
 	return vector_hashing;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6882538..9f33a57 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -803,6 +803,11 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
 
+int kvm_get_subpages(struct kvm *kvm, struct kvm_subpage *spp_info);
+int kvm_set_subpages(struct kvm *kvm, struct kvm_subpage *spp_info);
+int kvm_arch_get_subpages(struct kvm *kvm, struct kvm_subpage *spp_info);
+int kvm_arch_set_subpages(struct kvm *kvm, struct kvm_subpage *spp_info);
+
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
 static inline struct kvm *kvm_arch_alloc_vm(void)
 {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0cd821e..fca4dc7 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -101,6 +101,15 @@ struct kvm_userspace_memory_region {
 	__u64 userspace_addr; /* start of the userspace allocated memory */
 };
 
+/* for KVM_SUBPAGES_GET_ACCESS and KVM_SUBPAGES_SET_ACCESS */
+#define SUBPAGE_MAX_BITMAP 128
+struct kvm_subpage {
+	__u64 base_gfn;
+	__u64 npages;
+	 /* sub-page write-access bitmap array */
+	__u32 access_map[SUBPAGE_MAX_BITMAP];
+};
+
 /*
  * The bit 0 ~ bit 15 of kvm_memory_region::flags are visible for userspace,
  * other bits are reserved for kvm internal use which are defined in
@@ -1184,6 +1193,8 @@ struct kvm_vfio_spapr_tce {
 					struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR          _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
+#define KVM_SUBPAGES_GET_ACCESS   _IOR(KVMIO,  0x49, __u64)
+#define KVM_SUBPAGES_SET_ACCESS   _IOW(KVMIO,  0x4a, __u64)
 
 /* enable ucontrol for s390 */
 struct kvm_s390_ucas_mapping {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9deb5a2..9a51ee4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1104,6 +1104,32 @@ int kvm_get_dirty_log(struct kvm *kvm,
 }
 EXPORT_SYMBOL_GPL(kvm_get_dirty_log);
 
+int kvm_get_subpages(struct kvm *kvm,
+		     struct kvm_subpage *spp_info)
+{
+	int ret;
+
+	mutex_lock(&kvm->slots_lock);
+	ret = kvm_mmu_get_subpages(kvm, spp_info);
+	mutex_unlock(&kvm->slots_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_get_subpages);
+
+int kvm_set_subpages(struct kvm *kvm,
+		     struct kvm_subpage *spp_info)
+{
+	int ret;
+
+	mutex_lock(&kvm->slots_lock);
+	ret = kvm_mmu_set_subpages(kvm, spp_info);
+	mutex_unlock(&kvm->slots_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_set_subpages);
+
 #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
 /**
  * kvm_get_dirty_log_protect - get a snapshot of dirty pages, and if any pages
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC 08/10] KVM: VMX: Update the EPT leaf entry indicated with the SPP enable bit.
  2017-10-13 23:11 [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support Zhang Yi
                   ` (7 preceding siblings ...)
  2017-10-13 23:14 ` [PATCH RFC 07/10] KVM: VMX: Introduce ioctls to set/get Sub-Page Write Protection Zhang Yi
@ 2017-10-13 23:14 ` Zhang Yi
  2017-10-13 23:14 ` [PATCH RFC 09/10] KVM: VMX: Added setup spp page structure Zhang Yi
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Zhang Yi @ 2017-10-13 23:14 UTC (permalink / raw)
  To: kvm, linux-kernel; +Cc: pbonzini, rkrcmar, Zhang Yi Z

From: Zhang Yi Z <yi.z.zhang@linux.intel.com>

If the sub-page write-permission VM-execution control is set,
treatment of write accesses to guest-physical addresses depends on
the state of the accumulated write-access bit (position 1) and the
sub-page permission bit (position 61) in the leaf EPT paging-structure
entry.

Software updates the sub-page permission bit in the EPT leaf entry
during kvm_set_subpage. If the EPT write-access bit is 0 and the SPP
bit is 1 in the leaf EPT paging-structure entry that maps a 4KB page,
the hardware looks up a VMM-managed Sub-Page Permission Table (SPPT),
which is also prepared by kvm_set_subpage.
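
The interplay of the two bits amounts to a small decision table on the
leaf EPT entry; a sketch of that logic follows. The bit positions match
this series, but the enum names are invented for illustration.

```c
#include <stdint.h>

#define EPT_WRITE_BIT	(1ULL << 1)	/* accumulated write-access bit */
#define EPT_SPP_BIT	(1ULL << 61)	/* sub-page permission bit */

enum write_treatment {
	WRITE_ALLOWED,		/* W=1: SPP bit is ignored */
	WRITE_SPPT_LOOKUP,	/* W=0, SPP=1: hardware consults the SPPT */
	WRITE_EPT_VIOLATION,	/* W=0, SPP=0: ordinary EPT violation */
};

/* Classify how the CPU treats a write through this leaf EPT entry. */
static enum write_treatment classify_write(uint64_t ept_leaf)
{
	if (ept_leaf & EPT_WRITE_BIT)
		return WRITE_ALLOWED;
	return (ept_leaf & EPT_SPP_BIT) ? WRITE_SPPT_LOOKUP
					: WRITE_EPT_VIOLATION;
}
```

This is why kvm_set_subpage both clears the write bit and sets the SPP
bit: only that combination routes writes through the SPPT.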

Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 100 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6c92d19..0bda9eb 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1580,6 +1580,87 @@ int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static bool __rmap_open_subpage_bit(struct kvm *kvm,
+				    struct kvm_rmap_head *rmap_head)
+{
+	struct rmap_iterator iter;
+	bool flush = false;
+	u64 *sptep;
+	u64 spte;
+
+	for_each_rmap_spte(rmap_head, &iter, sptep) {
+		/*
+		 * SPP works only when the page is unwritable
+		 * and SPP bit is set
+		 */
+		flush |= spte_write_protect(sptep, false);
+		spte = *sptep | PT_SPP_MASK;
+		flush |= mmu_spte_update(sptep, spte);
+	}
+
+	return flush;
+}
+
+static int kvm_mmu_open_subpage_write_protect(struct kvm *kvm,
+					      struct kvm_memory_slot *slot,
+					      gfn_t gfn)
+{
+	struct kvm_rmap_head *rmap_head;
+	bool flush = false;
+
+	/*
+	 * We only support SPP on normal 4K level-1 page frames;
+	 * if it is a huge page, we drop it.
+	 */
+	rmap_head = __gfn_to_rmap(gfn, PT_PAGE_TABLE_LEVEL, slot);
+
+	if (!rmap_head->val)
+		return -EFAULT;
+
+	flush |= __rmap_open_subpage_bit(kvm, rmap_head);
+
+	if (flush)
+		kvm_flush_remote_tlbs(kvm);
+
+	return 0;
+}
+
+static bool __rmap_clear_subpage_bit(struct kvm *kvm,
+				     struct kvm_rmap_head *rmap_head)
+{
+	struct rmap_iterator iter;
+	bool flush = false;
+	u64 *sptep;
+	u64 spte;
+
+	for_each_rmap_spte(rmap_head, &iter, sptep) {
+		spte = (*sptep & ~PT_SPP_MASK) | PT_WRITABLE_MASK;
+		flush |= mmu_spte_update(sptep, spte);
+	}
+
+	return flush;
+}
+
+static int kvm_mmu_clear_subpage_write_protect(struct kvm *kvm,
+					       struct kvm_memory_slot *slot,
+					       gfn_t gfn)
+{
+	struct kvm_rmap_head *rmap_head;
+	bool flush = false;
+
+	rmap_head = __gfn_to_rmap(gfn, PT_PAGE_TABLE_LEVEL, slot);
+
+	if (!rmap_head->val)
+		return -EFAULT;
+
+	flush |= __rmap_clear_subpage_bit(kvm, rmap_head);
+
+	if (flush)
+		kvm_flush_remote_tlbs(kvm);
+
+	return 0;
+}
+
 bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 				    struct kvm_memory_slot *slot, u64 gfn)
 {
@@ -4005,12 +4086,31 @@ int kvm_mmu_set_subpages(struct kvm *kvm, struct kvm_subpage *spp_info)
 	int npages = spp_info->npages;
 	struct kvm_memory_slot *slot;
 	u32 *wp_map;
+	int ret;
 	int i;
 
 	for (i = 0; i < npages; i++, gfn++) {
 		slot = gfn_to_memslot(kvm, gfn);
 		if (!slot)
 			return -EFAULT;
+
+		/*
+		 * Set the SPP bit in the EPT leaf entry to write-protect
+		 * the sub-pages in the corresponding page.
+		 */
+		if (access != (u32)((1ULL << 32) - 1))
+			ret = kvm_mmu_open_subpage_write_protect(
+			kvm, slot, gfn);
+		else
+			ret = kvm_mmu_clear_subpage_write_protect(
+			kvm, slot, gfn);
+
+		if (ret) {
+			pr_info("SPP: didn't find gfn:%llx at EPT leaf level 1.\n"
+				"Huge pages are not yet supported by SPP;\n"
+				"please try disabling huge pages.\n", gfn);
+			return -EFAULT;
+		}
 		wp_map = gfn_to_subpage_wp_info(slot, gfn);
 		*wp_map = access;
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC 09/10] KVM: VMX: Added setup spp page structure.
  2017-10-13 23:11 [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support Zhang Yi
                   ` (8 preceding siblings ...)
  2017-10-13 23:14 ` [PATCH RFC 08/10] KVM: VMX: Update the EPT leaf entry indicated with the SPP enable bit Zhang Yi
@ 2017-10-13 23:14 ` Zhang Yi
  2017-10-13 23:16 ` [PATCH RFC 10/10] KVM: VMX: implement setup SPP page structure in spp miss Zhang Yi
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Zhang Yi @ 2017-10-13 23:14 UTC (permalink / raw)
  To: kvm, linux-kernel; +Cc: pbonzini, rkrcmar, Zhang Yi Z

From: Zhang Yi Z <yi.z.zhang@linux.intel.com>

The hardware uses the guest-physical address, specifically bits 11:7
of the address accessed, to look up the SPPT and fetch a
write-permission bit for the 128-byte sub-page region being accessed
within the 4KB guest-physical page. If the sub-page region's
write-permission bit is set, the write is allowed; otherwise the
write is disallowed and results in an EPT violation.

Guest-physical pages mapped via leaf EPT paging structures for which
both the accumulated write-access bit and the SPP bit are clear (0)
generate EPT violations on memory write accesses. Guest-physical
pages mapped via EPT paging structures for which the accumulated
write-access bit is set (1) allow writes, effectively ignoring the
SPP bit in the leaf EPT paging structure.

Software sets up SPP page-table levels 4, 3 and 2 alongside the EPT
page structures, and fills level 1 from the 32-bit bitmap kept for
each 4KB page, dividing the page into 32 sub-pages of 128 bytes each.
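
The bits-11:7 lookup described above can be sketched as a pure index
computation. The convention that a set bit in the per-page 32-bit map
means "write allowed" follows this series' description; the helper
names are invented.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bits 11:7 of the accessed address select one of the 32 128-byte
 * sub-page regions within a 4KB page.
 */
static unsigned int subpage_index(uint64_t gpa)
{
	return (gpa >> 7) & 0x1f;
}

/*
 * access_map is the per-page 32-bit bitmap from this series;
 * bit i grants write access to sub-page i.
 */
static bool subpage_write_allowed(uint32_t access_map, uint64_t gpa)
{
	return access_map & (1u << subpage_index(gpa));
}
```

So a write to offset 0x80 of a protected page consults bit 1 of the
bitmap, while a write anywhere in the first 128 bytes consults bit 0.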

Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |   4 ++
 arch/x86/kvm/mmu.c              | 123 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 125 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 763cd7e..ef50d98 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1256,6 +1256,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u64 error_code,
 		       void *insn, int insn_len);
+
+int kvm_mmu_setup_spp_structure(struct kvm_vcpu *vcpu,
+				u32 access_map, gfn_t gfn);
+
 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
 void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu);
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0bda9eb..c229324 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -174,6 +174,11 @@ struct kvm_shadow_walk_iterator {
 		({ spte = mmu_spte_get_lockless(_walker.sptep); 1; });	\
 	     __shadow_walk_next(&(_walker), spte))
 
+#define for_each_shadow_spp_entry(_vcpu, _addr, _walker)    \
+	for (shadow_spp_walk_init(&(_walker), _vcpu, _addr);	\
+	     shadow_walk_okay(&(_walker));			\
+	     shadow_walk_next(&(_walker)))
+
 static struct kmem_cache *pte_list_desc_cache;
 static struct kmem_cache *mmu_page_header_cache;
 static struct percpu_counter kvm_total_used_mmu_pages;
@@ -394,6 +399,11 @@ static int is_shadow_present_pte(u64 pte)
 	return (pte != 0) && !is_mmio_spte(pte);
 }
 
+static int is_spp_mide_page_present(u64 pte)
+{
+	return pte & PT_PRESENT_MASK;
+}
+
 static int is_large_pte(u64 pte)
 {
 	return pte & PT_PAGE_SIZE_MASK;
@@ -413,6 +423,11 @@ static bool is_executable_pte(u64 spte)
 	return (spte & (shadow_x_mask | shadow_nx_mask)) == shadow_x_mask;
 }
 
+static bool is_spp_spte(struct kvm_mmu_page *sp)
+{
+	return sp->role.spp;
+}
+
 static kvm_pfn_t spte_to_pfn(u64 pte)
 {
 	return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
@@ -2512,6 +2527,16 @@ static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
 	}
 }
 
+static void shadow_spp_walk_init(struct kvm_shadow_walk_iterator *iterator,
+				 struct kvm_vcpu *vcpu, u64 addr)
+{
+	iterator->addr = addr;
+	iterator->shadow_addr = vcpu->arch.mmu.sppt_root;
+
+	/* SPP Table is a 4-level paging structure */
+	iterator->level = 4;
+}
+
 static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
 {
 	if (iterator->level < PT_PAGE_TABLE_LEVEL)
@@ -2562,6 +2587,18 @@ static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
 		mark_unsync(sptep);
 }
 
+static void link_spp_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
+				 struct kvm_mmu_page *sp)
+{
+	u64 spte;
+
+	spte = __pa(sp->spt) | PT_PRESENT_MASK;
+
+	mmu_spte_set(sptep, spte);
+
+	mmu_page_add_parent_pte(vcpu, sp, sptep);
+}
+
 static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 				   unsigned direct_access)
 {
@@ -2592,7 +2629,13 @@ static bool mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 
 	pte = *spte;
 	if (is_shadow_present_pte(pte)) {
-		if (is_last_spte(pte, sp->role.level)) {
+		if (is_spp_spte(sp)) {
+			if (sp->role.level == PT_PAGE_TABLE_LEVEL)
+				/* SPP pages do not need to release rmap. */
+				return true;
+			child = page_header(pte & PT64_BASE_ADDR_MASK);
+			drop_parent_pte(child, spte);
+		} else if (is_last_spte(pte, sp->role.level)) {
 			drop_spte(kvm, spte);
 			if (is_large_pte(pte))
 				--kvm->stat.lpages;
@@ -4061,6 +4104,77 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 	return 0;
 }
 
+static u64 format_spp_spte(u32 spp_wp_bitmap)
+{
+	u64 new_spte = 0;
+	int i = 0;
+
+	/*
+	 * One 4K page contains 32 sub-pages, in SPP table L4E, old bits
+	 * are reserved, so we need to transfer u32 subpage write
+	 * protect bitmap to u64 SPP L4E format.
+	 */
+	while (i < 32) {
+		if (spp_wp_bitmap & (1ULL << i))
+			new_spte |= 1ULL << (i * 2);
+
+		i++;
+	}
+
+	return new_spte;
+}
+
+static void mmu_spp_spte_set(u64 *sptep, u64 new_spte)
+{
+	__set_spte(sptep, new_spte);
+}
+
+int kvm_mmu_setup_spp_structure(struct kvm_vcpu *vcpu,
+				u32 access_map, gfn_t gfn)
+{
+	struct kvm_shadow_walk_iterator iter;
+	struct kvm_mmu_page *sp;
+	gfn_t pseudo_gfn;
+	u64 old_spte, spp_spte;
+	struct kvm *kvm = vcpu->kvm;
+
+	spin_lock(&kvm->mmu_lock);
+
+	/* direct_map spp start */
+
+	if (!VALID_PAGE(vcpu->arch.mmu.sppt_root))
+		goto out_unlock;
+
+	for_each_shadow_spp_entry(vcpu, (u64)gfn << PAGE_SHIFT, iter) {
+		if (iter.level == PT_PAGE_TABLE_LEVEL) {
+			spp_spte = format_spp_spte(access_map);
+			old_spte = mmu_spte_get_lockless(iter.sptep);
+			if (old_spte != spp_spte) {
+				mmu_spp_spte_set(iter.sptep, spp_spte);
+				kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+			}
+			break;
+		}
+
+		if (!is_spp_mide_page_present(*iter.sptep)) {
+			u64 base_addr = iter.addr;
+
+			base_addr &= PT64_LVL_ADDR_MASK(iter.level);
+			pseudo_gfn = base_addr >> PAGE_SHIFT;
+			sp = kvm_mmu_get_spp_page(vcpu, pseudo_gfn,
+						  iter.level - 1);
+			link_spp_shadow_page(vcpu, iter.sptep, sp);
+		}
+	}
+
+	spin_unlock(&kvm->mmu_lock);
+	return 0;
+
+out_unlock:
+	spin_unlock(&kvm->mmu_lock);
+	return -EFAULT;
+}
+
 int kvm_mmu_get_subpages(struct kvm *kvm, struct kvm_subpage *spp_info)
 {
 	u32 *access = spp_info->access_map;
@@ -4085,9 +4199,10 @@ int kvm_mmu_set_subpages(struct kvm *kvm, struct kvm_subpage *spp_info)
 	gfn_t gfn = spp_info->base_gfn;
 	int npages = spp_info->npages;
 	struct kvm_memory_slot *slot;
+	struct kvm_vcpu *vcpu;
 	u32 *wp_map;
 	int ret;
-	int i;
+	int i, j;
 
 	for (i = 0; i < npages; i++, gfn++) {
 		slot = gfn_to_memslot(kvm, gfn);
@@ -4111,6 +4226,10 @@ int kvm_mmu_set_subpages(struct kvm *kvm, struct kvm_subpage *spp_info)
 				"Please try to disable the huge page\n", gfn);
 			return -EFAULT;
 		}
+
+		kvm_for_each_vcpu(j, vcpu, kvm)
+			kvm_mmu_setup_spp_structure(vcpu, access, gfn);
+
 		wp_map = gfn_to_subpage_wp_info(slot, gfn);
 		*wp_map = access;
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC 10/10] KVM: VMX: implement setup SPP page structure in spp miss.
  2017-10-13 23:11 [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support Zhang Yi
                   ` (9 preceding siblings ...)
  2017-10-13 23:14 ` [PATCH RFC 09/10] KVM: VMX: Added setup spp page structure Zhang Yi
@ 2017-10-13 23:16 ` Zhang Yi
  2017-10-18  7:09 ` [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support Christoph Hellwig
  2017-11-04  0:12 ` Yi Zhang
  12 siblings, 0 replies; 29+ messages in thread
From: Zhang Yi @ 2017-10-13 23:16 UTC (permalink / raw)
  To: kvm, linux-kernel; +Cc: pbonzini, rkrcmar, Zhang Yi Z

From: Zhang Yi Z <yi.z.zhang@linux.intel.com>

We should also set up the SPP page structure when we catch an
SPPT miss; in some cases, such as vCPU hotplug, the SPP page
table must be updated from the SPP miss handler.

Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/mmu.c              | 12 ++++++++++++
 arch/x86/kvm/vmx.c              |  8 ++++++++
 3 files changed, 22 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ef50d98..bc56c4c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1260,6 +1260,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u64 error_code,
 int kvm_mmu_setup_spp_structure(struct kvm_vcpu *vcpu,
 				u32 access_map, gfn_t gfn);
 
+int kvm_mmu_get_spp_acsess_map(struct kvm *kvm, u32 *access_map, gfn_t gfn);
+
 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
 void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu);
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c229324..88b8571 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4129,6 +4129,17 @@ static void mmu_spp_spte_set(u64 *sptep, u64 new_spte)
 	__set_spte(sptep, new_spte);
 }
 
+int kvm_mmu_get_spp_acsess_map(struct kvm *kvm, u32 *access_map, gfn_t gfn)
+{
+	struct kvm_memory_slot *slot;
+
+	slot = gfn_to_memslot(kvm, gfn);
+	*access_map = *gfn_to_subpage_wp_info(slot, gfn);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_get_spp_acsess_map);
+
 int kvm_mmu_setup_spp_structure(struct kvm_vcpu *vcpu,
 				u32 access_map, gfn_t gfn)
 {
@@ -4174,6 +4185,7 @@ int kvm_mmu_setup_spp_structure(struct kvm_vcpu *vcpu,
 	spin_unlock(&kvm->mmu_lock);
 	return -EFAULT;
 }
+EXPORT_SYMBOL_GPL(kvm_mmu_setup_spp_structure);
 
 int kvm_mmu_get_subpages(struct kvm *kvm, struct kvm_subpage *spp_info)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9116b53..c4cd773 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8005,6 +8005,9 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 static int handle_spp(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qualification;
+	gpa_t gpa;
+	gfn_t gfn;
+	u32 map;
 
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 
@@ -8031,6 +8034,11 @@ static int handle_spp(struct kvm_vcpu *vcpu)
 		 * SPP table here.
 		 */
 		pr_debug("SPP: %s: SPPT Miss!!!\n", __func__);
+
+		gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+		gfn = gpa >> PAGE_SHIFT;
+		kvm_mmu_get_spp_acsess_map(vcpu->kvm, &map, gfn);
+		kvm_mmu_setup_spp_structure(vcpu, map, gfn);
 		return 1;
 	}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
  2017-10-13 16:57 ` Jim Mattson
  2017-10-13 21:13   ` Paolo Bonzini
@ 2017-10-16  0:01   ` Yi Zhang
  1 sibling, 0 replies; 29+ messages in thread
From: Yi Zhang @ 2017-10-16  0:01 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, LKML, Paolo Bonzini, Radim Krčmář

[-- Attachment #1: Type: text/plain, Size: 13483 bytes --]

Thanks for your review Jim.

On 2017-10-13 at 09:57:45 -0700, Jim Mattson wrote:
> I'll ask before Paolo does: Can you please add kvm-unit-tests to
> exercise all of this new code?
It should be an API/ioctl tool rather than a kvm-unit-test. Actually,
I have prepared a draft version of the tool, embedded in the QEMU
command line, meaning that we can set/get the sub-page protection via a
QEMU command.

Attached is the QEMU patch. BTW, it is a pre-design version; I will
send a formal QEMU patch to the qemu list after the API/ioctl is fixed
on the KVM side.

> 
> BTW, what generation of hardware do we need to exercise this code ourselves?
As far as I know, this feature will be enabled on Intel's
next-generation Ice Lake chips.

> 
> On Fri, Oct 13, 2017 at 4:11 PM, Zhang Yi <yi.z.zhang@linux.intel.com> wrote:
> > From: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> >
> > Hi All,
> >
> > Here is a patch series which adds EPT-Based Sub-page Write Protection Support. You can find its software developer manual at:
> >
> > https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> >
> > In Chapter 4 EPT-BASED SUB-PAGE PERMISSIONS.
> >
> > Introduction:
> >
> > EPT-Based Sub-page Write Protection, referred to as SPP, is a capability which allows Virtual Machine Monitors (VMMs) to specify write permissions for guest physical memory at a sub-page (128-byte) granularity.  When this capability is utilized, the CPU enforces write-access permissions for sub-page regions of 4K pages as specified by the VMM. EPT-based sub-page permissions are intended to enable fine-grained memory write enforcement by a VMM for security (guest OS monitoring) and usages such as device virtualization and memory checkpointing.
> >
> > How SPP Works:
> >
> > SPP is active when the "sub-page write protection" VM-execution control is 1. A new 4-level paging structure named the SPP page table (SPPT) is introduced; it is looked up by guest-physical address to derive a 64-bit "sub-page permission" value containing sub-page write permissions. The lookup from guest-physical addresses to the sub-page region permissions is determined by this set of SPPT paging structures.
> >
> > The SPPT is used to look up write permission bits for the 128-byte sub-page regions contained in the 4KB guest physical page. EPT specifies the 4KB-page-level privileges that software is allowed when accessing the guest physical address, whereas the SPPT defines the write permissions for software at 128-byte granularity within a 4KB page. Write accesses prevented due to sub-page permissions looked up via the SPPT are reported as EPT violation VM exits. Similar to EPT, a logical processor uses the SPPT to look up sub-page region write permissions for guest-physical addresses only when those addresses are used to access memory.
> >
> > Guest write access --> GPA --> Walk EPT --> EPT leaf entry -┐
> > ┌-----------------------------------------------------------┘
> > └-> if VMexec_control.spp && ept_leaf_entry.spp_bit (bit 61)
> >      |
> >      └-> <false> --> EPT legacy behavior
> >      |
> >      |
> >      └-> <true>  --> if ept_leaf_entry.writable
> >                       |
> >                       └-> <true>  --> Ignore SPP
> >                       |
> >                       └-> <false> --> GPA --> Walk SPP 4-level table--┐
> >                                                                       |
> > ┌------------<----------get-the-SPPT-point-from-VMCS-filed-----<------┘
> > |
> > Walk SPP L4E table
> > |
> > └┐--> entry misconfiguration ------------>----------┐<----------------┐
> >  |                                                  |                 |
> > else                                                |                 |
> >  |                                                  |                 |
> >  |   ┌------------------SPP VMexit<-----------------┘                 |
> >  |   |                                                                |
> >  |   └-> exit_qualification & sppt_misconfig --> sppt misconfig       |
> >  |   |                                                                |
> >  |   └-> exit_qualification & sppt_miss --> sppt miss                 |
> >  └--┐                                                                 |
> >     |                                                                 |
> > walk SPPT L3E--┐--> if-entry-misconfiguration------------>------------┘
> >                |                                                      |
> >               else                                                    |
> >                |                                                      |
> >                |                                                      |
> >         walk SPPT L2E --┐--> if-entry-misconfiguration-------->-------┘
> >                         |                                             |
> >                        else                                           |
> >                         |                                             |
> >                         |                                             |
> >                  walk SPPT L1E --┐-> if-entry-misconfiguration--->----┘
> >                                  |
> >                                 else
> >                                  |
> >                                  └-> if sub-page writable
> >                                       └-> <true>  allow, write access
> >                                       └-> <false> disallow, EPT violation
> >
> > Patch-sets Description:
> >
> > Patch 1: Documentation.
> >
> > Patch 2: This patch adds reporting of the SPP capability from the VMX Procbased MSR; according to the hardware spec, bit 23 is the control for the SPP capability.
> >
> > Patch 3: Add a new secondary processor-based VM-execution control bit defined as "sub-page write permission"; as in the VMX Procbased MSR, bit 23 is the enable bit of SPP.
> > We also introduced a kernel parameter, "enable_ept_spp"; now SPP is active when "Sub-page Write Protection" in the Secondary VM-Execution Controls is set and the kernel parameter is enabled with "enable_ept_spp=1".
> >
> > Patch 4: Introduced the SPPTP and SPP page table.
> > The sub-page permission table is referenced via a 64-bit control field called the Sub-Page Permission Table Pointer (SPPTP), which contains a 4K-aligned physical address. The index and encoding for this VMCS field are defined as 0x2030 at this time. The format of the SPPTP is shown in figure 2 below.
> > This patch introduced the SPP paging structures, whose root page is created at KVM MMU page initialization.
> > We also added an MMU page role type, spp, to distinguish whether a page is an SPP page or an EPT page.
> >
> > Patch 5: Introduced the SPP-induced VM exit and its handler.
> > Accesses using guest-physical addresses may cause SPP-induced VM exits due to an SPPT misconfiguration or an SPPT miss. The basic VM exit reason code reported for SPP-induced VM exits is 66.
> >
> > Also introduced the new exit qualification for SPPT-induced vmexits.
> >
> > | Bit   | Contents                                                          |
> > | :---- | :---------------------------------------------------------------- |
> > | 10:0  | Reserved (0).                                                     |
> > | 11    | SPPT VM exit type. Set for SPPT Miss, cleared for SPPT Misconfig. |
> > | 12    | NMI unblocking due to IRET                                        |
> > | 63:13 | Reserved (0)                                                      |
> >
> > Patch 6: Added a handler for EPT sub-page write-protection faults.
> > A control bit in EPT leaf paging-structure entries is defined as “Sub-Page Permission” (SPP bit). The bit position is 61; it is chosen from among the bits that are currently ignored by the processor and available to software.
> > While hardware is walking the SPP page table, if the sub-page region write permission bit is set, the write is allowed; otherwise the write is disallowed and results in an EPT violation.
> > We need to catch this case in the EPT violation handler and trigger a user-space exit, returning the write-protected address (GVA) to the user (QEMU).
> >
> > Patch 7: Introduce ioctls to set/get Sub-Page Write Protection.
> > We introduced two ioctls to let a user application set/get the sub-page write-protection bitmap per gfn; each gfn corresponds to a bitmap.
> > The user application, QEMU or some other security control daemon, will set the protection bitmap via this ioctl.
> > The API is defined as:
> >         struct kvm_subpage {
> >                 __u64 base_gfn;
> >                 __u64 npages;
> >                 /* sub-page write-access bitmap array */
> >                 __u32 access_map[SUBPAGE_MAX_BITMAP];
> >                 }sp;
> >         kvm_vm_ioctl(s, KVM_SUBPAGES_SET_ACCESS, &sp)
> >         kvm_vm_ioctl(s, KVM_SUBPAGES_GET_ACCESS, &sp)
> >
> > Patch 8 ~ Patch 9: Set up the SPP page table and update the EPT leaf entry indicated with the SPP enable bit.
> > If the sub-page write permission VM-execution control is set, treatment of write accesses to guest-physical addresses depends on the state of the accumulated write-access bit (position 1) and the sub-page permission bit (position 61) in the leaf EPT paging-structure entry.
> > Software updates the EPT leaf entry's sub-page permission bit in kvm_set_subpage (patch 7). If the EPT write-access bit is set to 0 and the SPP bit is set to 1 in the leaf EPT paging-structure entry that maps a 4KB page, then the hardware will look up a VMM-managed Sub-Page Permission Table (SPPT), which is prepared by kvm_set_subpage (patch 8).
> > The hardware uses the guest-physical address and bits 11:7 of the address accessed to look up the SPPT and fetch a write permission bit for the 128-byte sub-page region being accessed within the 4K guest-physical page. If the sub-page region write permission bit is set, the write is allowed; otherwise the write is disallowed and results in an EPT violation.
> > Guest-physical pages mapped via leaf EPT paging structures for which the accumulated write-access bit and the SPP bit are both clear (0) generate EPT violations on memory write accesses. Guest-physical pages mapped via EPT paging structures for which the accumulated write-access bit is set (1) allow writes, effectively ignoring the SPP bit in the leaf EPT paging structure.
> > Software sets up SPP page-table levels 4, 3 and 2 alongside the EPT paging structures, and fills the level-1 page from the 32-bit bitmap kept per 4K page, so each page can be divided into 32 sub-pages of 128 bytes.
> >
> > The SPP L4E L3E L2E is defined as below figure.
> >
> > | Bit    | Contents                                                               |
> > | :----- | :--------------------------------------------------------------------- |
> > | 0      | Valid entry when set; indicates whether the entry is present           |
> > | 11:1   | Reserved (0)                                                           |
> > | N-1:12 | Physical address of 4K aligned SPPT LX-1 Table referenced by the entry |
> > | 51:N   | Reserved (0)                                                           |
> > | 63:52  | Reserved (0)                                                           |
> > Note: N is the physical address width supported by the processor, X is the page level
> >
> > The SPP L1E format is defined as below figure.
> > | Bit   | Contents                                                          |
> > | :---- | :---------------------------------------------------------------- |
> > | 0+2i  | Write permission for i-th 128 byte sub-page region.               |
> > | 1+2i  | Reserved (0).                                                     |
> > Note: `0<=i<=31`
> >
> >
> > Zhang Yi Z (10):
> >   KVM: VMX: Added EPT Subpage Protection Documentation.
> >   x86/cpufeature: Add intel Sub-Page Protection to CPU features
> >   KVM: VMX: Added VMX SPP feature flags and VM-Execution Controls.
> >   KVM: VMX: Introduce the SPPTP and SPP page table.
> >   KVM: VMX: Introduce SPP-Induced vm exit and it's handle.
> >   KVM: VMX: Added handle of SPP write protection fault.
> >   KVM: VMX: Introduce ioctls to set/get Sub-Page Write Protection.
> >   KVM: VMX: Update the EPT leaf entry indicated with the SPP enable bit.
> >   KVM: VMX: Added setup spp page structure.
> >   KVM: VMX: implement setup SPP page structure in spp miss.
> >
> >  Documentation/virtual/kvm/spp_design_kvm.txt | 272 +++++++++++++++++++++
> >  arch/x86/include/asm/cpufeatures.h           |   1 +
> >  arch/x86/include/asm/kvm_host.h              |  18 +-
> >  arch/x86/include/asm/vmx.h                   |  10 +
> >  arch/x86/include/uapi/asm/vmx.h              |   2 +
> >  arch/x86/kernel/cpu/intel.c                  |   4 +
> >  arch/x86/kvm/mmu.c                           | 340 ++++++++++++++++++++++++++-
> >  arch/x86/kvm/mmu.h                           |   1 +
> >  arch/x86/kvm/vmx.c                           | 104 ++++++++
> >  arch/x86/kvm/x86.c                           |  99 +++++++-
> >  include/linux/kvm_host.h                     |   5 +
> >  include/uapi/linux/kvm.h                     |  16 ++
> >  virt/kvm/kvm_main.c                          |  26 ++
> >  13 files changed, 893 insertions(+), 5 deletions(-)
> >  create mode 100644 Documentation/virtual/kvm/spp_design_kvm.txt
> >
> > --
> > 2.7.4
> >

[-- Attachment #2: 0001-x86-Intel-Sub-Page-Protection-support.patch --]
[-- Type: text/x-diff, Size: 10314 bytes --]

>From a369bed5d986dccb3ca36dc5a27c6220ca2d1405 Mon Sep 17 00:00:00 2001
From: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Date: Tue, 14 Mar 2017 15:11:38 +0800
Subject: [PATCH] x86: Intel Sub-Page Protection support

Signed-off-by: He Chen <he.chen@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
---
 hmp-commands.hx           | 26 ++++++++++++++++++++++++++
 hmp.c                     | 26 ++++++++++++++++++++++++++
 hmp.h                     |  2 ++
 include/sysemu/kvm.h      |  2 ++
 kvm-all.c                 | 40 ++++++++++++++++++++++++++++++++++++++++
 linux-headers/linux/kvm.h | 15 +++++++++++++++
 qapi-schema.json          | 41 +++++++++++++++++++++++++++++++++++++++++
 qmp.c                     | 43 +++++++++++++++++++++++++++++++++++++++++++
 target/i386/kvm.c         | 22 ++++++++++++++++++++++
 9 files changed, 217 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8819281..7a57411 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1766,6 +1766,32 @@ Set QOM property @var{property} of object at location @var{path} to value @var{v
 ETEXI
 
     {
+        .name       = "get-subpage",
+        .args_type  = "base_gfn:l,npages:l,filename:str",
+        .params     = "base_gfn npages filename",
+        .help       = "get the write-protect bitmap setting of sub-page protection",
+        .cmd        = hmp_get_subpage,
+    },
+
+STEXI
+@item get-subpage @var{base_gfn} @var{npages} @var{file}
+Get the write-protect bitmap setting of sub-page protection in the range of @var{base_gfn} to @var{base_gfn} + @var{npages}
+ETEXI
+
+    {
+        .name       = "set-subpage",
+        .args_type  = "base_gfn:l,npages:l,wp_map:i",
+        .params     = "base_gfn npages",
+        .help       = "set the write-protect bitmap setting of sub-page protection",
+        .cmd        = hmp_set_subpage,
+    },
+
+STEXI
+@item set-subpage @var{base_gfn} @var{npages}
+Set the write-protect bitmap setting of sub-page protection in the range of @var{base_gfn} to @var{base_gfn} + @var{npages}
+ETEXI
+
+    {
         .name       = "info",
         .args_type  = "item:s?",
         .params     = "[subcommand]",
diff --git a/hmp.c b/hmp.c
index 261843f..7d217e9 100644
--- a/hmp.c
+++ b/hmp.c
@@ -2614,3 +2614,29 @@ void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict)
     }
     qapi_free_GuidInfo(info);
 }
+
+void hmp_get_subpage(Monitor *mon, const QDict *qdict)
+{
+    uint64_t base_gfn = qdict_get_int(qdict, "base_gfn");
+    uint64_t npages = qdict_get_int(qdict, "npages");
+    const char *filename = qdict_get_str(qdict, "filename");
+    Error *err = NULL;
+
+    monitor_printf(mon, "base_gfn: %ld, npages: %ld, file: %s\n", base_gfn, npages, filename);
+
+    qmp_get_subpage(base_gfn, npages, filename, &err);
+    hmp_handle_error(mon, &err);
+}
+
+void hmp_set_subpage(Monitor *mon, const QDict *qdict)
+{
+    uint64_t base_gfn = qdict_get_int(qdict, "base_gfn");
+    uint64_t npages = qdict_get_int(qdict, "npages");
+    uint32_t wp_map = qdict_get_int(qdict, "wp_map");
+    Error *err = NULL;
+
+    monitor_printf(mon, "base_gfn: %ld, npages: %ld, wp_map: %d\n", base_gfn, npages, wp_map);
+
+    qmp_set_subpage(base_gfn, npages, wp_map, &err);
+    hmp_handle_error(mon, &err);
+}
diff --git a/hmp.h b/hmp.h
index 799fd37..b72143f 100644
--- a/hmp.h
+++ b/hmp.h
@@ -138,5 +138,7 @@ void hmp_rocker_of_dpa_groups(Monitor *mon, const QDict *qdict);
 void hmp_info_dump(Monitor *mon, const QDict *qdict);
 void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict);
 void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict);
+void hmp_get_subpage(Monitor *mon, const QDict *qdict);
+void hmp_set_subpage(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 24281fc..f7c1340 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -528,4 +528,6 @@ int kvm_set_one_reg(CPUState *cs, uint64_t id, void *source);
  */
 int kvm_get_one_reg(CPUState *cs, uint64_t id, void *target);
 int kvm_get_max_memslots(void);
+int kvm_get_subpage_wp_map(uint64_t base_gfn, uint32_t *buf, uint64_t len);
+int kvm_set_subpage_wp_map(uint64_t base_gfn, uint64_t npages, uint32_t wp_map);
 #endif
diff --git a/kvm-all.c b/kvm-all.c
index 9040bd5..58cc0a4 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -2593,6 +2593,46 @@ int kvm_get_one_reg(CPUState *cs, uint64_t id, void *target)
     return r;
 }
 
+int kvm_get_subpage_wp_map(uint64_t base_gfn, uint32_t *buf,
+		           uint64_t len)
+{
+	KVMState *s = kvm_state;
+	struct kvm_subpage sp = {};
+	int n = len;
+
+	sp.base_gfn = base_gfn;
+	sp.npages = len;
+
+
+	if (kvm_vm_ioctl(s, KVM_SUBPAGES_GET_ACCESS, &sp) < 0) {
+		DPRINTF("ioctl failed %d\n", errno);
+		return -1;
+	}
+
+	memcpy(buf, sp.access_map, n * sizeof(uint32_t));
+
+	return n;
+}
+
+int kvm_set_subpage_wp_map(uint64_t base_gfn, uint64_t npages,
+		           uint32_t wp_map)
+{
+	KVMState *s = kvm_state;
+	struct kvm_subpage sp = {};
+
+	sp.base_gfn = base_gfn;
+	sp.npages = npages;
+	sp.access_map[0] = wp_map;
+
+
+	if (kvm_vm_ioctl(s, KVM_SUBPAGES_SET_ACCESS, &sp) < 0) {
+		DPRINTF("ioctl failed %d\n", errno);
+		return -1;
+	}
+
+	return 0;
+}
+
 static void kvm_accel_class_init(ObjectClass *oc, void *data)
 {
     AccelClass *ac = ACCEL_CLASS(oc);
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 4e082a8..69de005 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -205,6 +205,7 @@ struct kvm_hyperv_exit {
 #define KVM_EXIT_S390_STSI        25
 #define KVM_EXIT_IOAPIC_EOI       26
 #define KVM_EXIT_HYPERV           27
+#define KVM_EXIT_SPP              28
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -360,6 +361,10 @@ struct kvm_run {
 		struct {
 			__u8 vector;
 		} eoi;
+		/* KVM_EXIT_SPP */
+		struct {
+			__u64 addr;
+		} spp;
 		/* KVM_EXIT_HYPERV */
 		struct kvm_hyperv_exit hyperv;
 		/* Fix the size of the union. */
@@ -1126,6 +1131,8 @@ enum kvm_device_type {
 					struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR          _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
+#define KVM_SUBPAGES_GET_ACCESS   _IOR(KVMIO,  0x49, __u64)
+#define KVM_SUBPAGES_SET_ACCESS   _IOW(KVMIO,  0x4a, __u64)
 
 /* enable ucontrol for s390 */
 struct kvm_s390_ucas_mapping {
@@ -1354,4 +1361,12 @@ struct kvm_assigned_msix_entry {
 #define KVM_X2APIC_API_USE_32BIT_IDS            (1ULL << 0)
 #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)
 
+/* for KVM_SUBPAGES_GET_ACCESS and KVM_SUBPAGES_SET_ACCESS */
+#define SUBPAGE_MAX_BITMAP 256
+struct kvm_subpage {
+       __u64 base_gfn;
+       __u64 npages;
+       __u32 access_map[SUBPAGE_MAX_BITMAP]; /* sub-page write-access bitmap array */
+};
+
 #endif /* __LINUX_KVM_H */
diff --git a/qapi-schema.json b/qapi-schema.json
index 32b4a4b..d6b46bb 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -6267,3 +6267,44 @@
 # Since 2.9
 ##
 { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }
+
+##
+# @get-subpage:
+#
+# This command will get setting information of sub-page
+# protection.
+#
+# Since: 2.10
+#
+# Example:
+#
+# -> { "execute": "get-subpage",
+#      "arguments": { "base_gfn": 0x1000,
+#                     "npages": 10,
+#                     "filename": "/tmp/spp_info" } }
+# <- { "return": {} }
+#
+##
+{ 'command': 'get-subpage',
+  'data': {'base_gfn': 'uint64', 'npages': 'uint64', 'filename': 'str'} }
+
+
+##
+# @set-subpage:
+#
+# This command will set sub-page protection for given GFNs.
+#
+# Since: 2.10
+#
+# Example:
+#
+# -> { "execute": "set-subpage",
+#      "arguments": { "base_gfn": 0x1000,
+#                     "npages": 10,
+#                     "wp_map": 0xffff0000 } }
+# <- { "return": {} }
+#
+##
+{ 'command': 'set-subpage',
+  'data': {'base_gfn': 'uint64', 'npages': 'uint64', 'wp_map': 'uint32'} }
+
diff --git a/qmp.c b/qmp.c
index fa82b59..274efdb 100644
--- a/qmp.c
+++ b/qmp.c
@@ -717,3 +717,46 @@ ACPIOSTInfoList *qmp_query_acpi_ospm_status(Error **errp)
 
     return head;
 }
+
+#define SUBPAGE_BUF_LEN 256
+void qmp_get_subpage(uint64_t base_gfn, uint64_t npages,
+		     const char *filename, Error **errp)
+{
+    FILE *f;
+    uint64_t n;
+    uint32_t buf[SUBPAGE_BUF_LEN];
+
+    f = fopen(filename, "wb");
+    if (!f) {
+        error_setg_file_open(errp, errno, filename);
+        return;
+    }
+
+    while (npages != 0) {
+        n = npages;
+        if (n > SUBPAGE_BUF_LEN)
+            n = SUBPAGE_BUF_LEN;
+        if (kvm_get_subpage_wp_map(base_gfn, buf, n) < 0) {
+            error_setg(errp, QERR_IO_ERROR);
+            goto exit;
+        }
+        if (fwrite(buf, 4, n, f) != n) {
+            error_setg(errp, QERR_IO_ERROR);
+            goto exit;
+        }
+        base_gfn += n;
+        npages -= n;
+    }
+
+exit:
+    fclose(f);
+}
+
+void qmp_set_subpage(uint64_t base_gfn, uint64_t npages,
+		     uint32_t wp_map, Error **errp)
+{
+    if (kvm_set_subpage_wp_map(base_gfn, npages, wp_map) < 0) {
+        error_setg(errp, QERR_IO_ERROR);
+    }
+}
+
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 472399f..18a43d7 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -3147,6 +3147,23 @@ static int kvm_handle_debug(X86CPU *cpu,
     return ret;
 }
 
+static int kvm_handle_spp(uint64_t addr)
+{
+    /*
+    uint64_t base_gfn = addr >> 12;
+    uint64_t offset = addr & ((1 << 12) - 1);
+    int subpage_index = offset >> 7;
+    uint32_t mask;
+
+    kvm_get_subpage_wp_map(base_gfn, &mask, 1);
+    mask |= 1UL << subpage_index;
+    return kvm_set_subpage_wp_map(base_gfn, 1, mask);
+    */
+
+    fprintf(stderr, "QEMU-SPP: we are in kvm_handle_spp now, addr=0x%lx!\n", addr);
+    return 0;
+}
+
 void kvm_arch_update_guest_debug(CPUState *cpu, struct kvm_guest_debug *dbg)
 {
     const uint8_t type_code[] = {
@@ -3240,6 +3257,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
         ioapic_eoi_broadcast(run->eoi.vector);
         ret = 0;
         break;
+    case KVM_EXIT_SPP:
+        DPRINTF("handle_spp\n");
+        kvm_handle_spp(run->spp.addr);
+        ret = 0;
+        break;
     default:
         fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
         ret = -1;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
  2017-10-13 21:13   ` Paolo Bonzini
@ 2017-10-16  0:08     ` Yi Zhang
  2017-10-18  9:35       ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Yi Zhang @ 2017-10-16  0:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm list, LKML, Radim Krčmář,
	Alex Williamson

Thanks for your quick response, Paolo.

On 2017-10-13 at 17:13:25 -0400, Paolo Bonzini wrote:
> 
> > I'll ask before Paolo does: Can you please add kvm-unit-tests to
> > exercise all of this new code?
> 
> More specifically it should be the api/ unit tests because this code
> can only be triggered by specific code in the host.
> 
> However, as things stand I'm not sure about how userspace would use it.
> Only allowing blocking of writes means that we cannot (for example) use
> it to do sub-page passthrough in VFIO.  That would be useful when the
> MSI-X table does not fit a full page, but would require blocking reads

Blocking read access is a good point; I will report your advice to the
hardware designers and hope it can be applied in the next-next
generation. ^_^

> as well.  And the introspection facility by Mihai uses a completely
> different API for the introspector, based on sockets rather than ioctls.
> So I'm not sure this is the right API at all.

Currently we only block write access. As an example of how it is used,
we now use it in a security daemon:
Consider a server launched in host user space and a client launched in
the guest kernel. Yes, they communicate over sockets. The guest kernel
wants to protect a special area to prevent every process, including
the kernel itself, from modifying it. The client sends the guest
physical address over the secure socket to the server side, and the
server updates this protection in KVM. Thus, all write accesses to
that specific guest area will be blocked.

The current implementation covers only the second half (maybe third
^_^) of this example: 'How does KVM set write protection on a specific
GFN?'

Maybe a user-space tool which uses an ioctl to let the KVM MMU update
the write protection is a better choice.

> 
> Paolo
> 
> > BTW, what generation of hardware do we need to exercise this code ourselves?
> > 
> > On Fri, Oct 13, 2017 at 4:11 PM, Zhang Yi <yi.z.zhang@linux.intel.com> wrote:
> > > From: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > >
> > > Hi All,
> > >
> > > Here is a patch-series which adding EPT-Based Sub-page Write Protection
> > > Support. You can get It's software developer manuals from:
> > >
> > > https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> > >
> > > In Chapter 4 EPT-BASED SUB-PAGE PERMISSIONS.
> > >
> > > Introduction:
> > >
> > > EPT-Based Sub-page Write Protection referred to as SPP, it is a capability
> > > which allow Virtual Machine Monitors(VMM) to specify write-permission for
> > > guest physical memory at a sub-page(128 byte) granularity.  When this
> > > capability is utilized, the CPU enforces write-access permissions for
> > > sub-page regions of 4K pages as specified by the VMM. EPT-based sub-page
> > > permissions is intended to enable fine-grained memory write enforcement by
> > > a VMM for security(guest OS monitoring) and usages such as device
> > > virtualization and memory check-point.
> > >
> > > How SPP Works:
> > >
> > > SPP is active when the "sub-page write protection" VM-execution control is
> > > 1. A new 4-level paging structure named SPP page table(SPPT) is
> > > introduced, SPPT will look up the guest physical addresses to derive a 64
> > > bit "sub-page permission" value containing sub-page write permissions. The
> > > lookup from guest-physical addresses to the sub-page region permissions is
> > > determined by a set of this SPPT paging structures.
> > >
> > > The SPPT is used to lookup write permission bits for the 128 byte sub-page
> > > regions containing in the 4KB guest physical page. EPT specifies the 4KB
> > > page level privileges that software is allowed when accessing the guest
> > > physical address, whereas SPPT defines the write permissions for software
> > > at the 128 byte granularity regions within a 4KB page. Write accesses
> > > prevented due to sub-page permissions looked up via SPPT are reported as
> > > EPT violation VM exits. Similar to EPT, a logical processor uses SPPT to
> > > lookup sub-page region write permissions for guest-physical addresses only
> > > when those addresses are used to access memory.
> > >
> > > Guest write access --> GPA --> Walk EPT --> EPT leaf entry -┐
> > > ┌-----------------------------------------------------------┘
> > > └-> if VMexec_control.spp && ept_leaf_entry.spp_bit (bit 61)
> > >      |
> > >      └-> <false> --> EPT legacy behavior
> > >      |
> > >      |
> > >      └-> <true>  --> if ept_leaf_entry.writable
> > >                       |
> > >                       └-> <true>  --> Ignore SPP
> > >                       |
> > >                       └-> <false> --> GPA --> Walk SPP 4-level table--┐
> > >                                                                       |
> > > ┌------------<----------get-the-SPPT-point-from-VMCS-filed-----<------┘
> > > |
> > > Walk SPP L4E table
> > > |
> > > └┐--> entry misconfiguration ------------>----------┐<----------------┐
> > >  |                                                  |                 |
> > > else                                                |                 |
> > >  |                                                  |                 |
> > >  |   ┌------------------SPP VMexit<-----------------┘                 |
> > >  |   |                                                                |
> > >  |   └-> exit_qualification & sppt_misconfig --> sppt misconfig       |
> > >  |   |                                                                |
> > >  |   └-> exit_qualification & sppt_miss --> sppt miss                 |
> > >  └--┐                                                                 |
> > >     |                                                                 |
> > > walk SPPT L3E--┐--> if-entry-misconfiguration------------>------------┘
> > >                |                                                      |
> > >               else                                                    |
> > >                |                                                      |
> > >                |                                                      |
> > >         walk SPPT L2E --┐--> if-entry-misconfiguration-------->-------┘
> > >                         |                                             |
> > >                        else                                           |
> > >                         |                                             |
> > >                         |                                             |
> > >                  walk SPPT L1E --┐-> if-entry-misconfiguration--->----┘
> > >                                  |
> > >                                 else
> > >                                  |
> > >                                  └-> if sub-page writable
> > >                                       └-> <true>  allow, write access
> > >                                       └-> <false> disallow, EPT violation
> > >
> > > Patch-sets Description:
> > >
> > > Patch 1: Documentation.
> > >
> > > Patch 2: This patch adds reporting SPP capability from VMX Procbased MSR,
> > > according to the definition of hardware spec, bit 23 is the control of the
> > > SPP capability.
> > >
> > > Patch 3: Add new secondary processor-based VM-execution control bit which
> > > defined as "sub-page write permission", same as VMX Procbased MSR, bit 23
> > > is the enable bit of SPP.
> > > Also we introduced a kernel parameter "enable_ept_spp", now SPP is active
> > > when the "Sub-page Write Protection" in Secondary  VM-Execution Control is
> > > set and enable the kernel parameter by "enable_ept_spp=1".
> > >
> > > Patch 4: Introduced the spptp and spp page table.
> > > The sub-page permission table is referenced via a 64-bit control field
> > > called Sub-Page Permission Table Pointer (SPPTP) which contains a
> > > 4K-aligned physical address. The index and encoding for this VMCS field if
> > > defined 0x2030 at this time The format of SPPTP is shown in below figure
> > > 2:
> > > this patch introduced the Spp paging structures, which root page will
> > > created at kvm mmu page initialization.
> > > Also we added a mmu page role type spp to distinguish it is a spp page or a
> > > EPT page.
> > >
> > > Patch 5: Introduced the SPP-Induced VM exit and it's handle.
> > > Accesses using guest-physical addresses may cause SPP-induced VM exits due
> > > to an SPPT misconfiguration or an SPPT miss. The basic VM exit reason code
> > > reporte for SPP-induced VM exits is 66.
> > >
> > > Also introduced the new exit qualification for SPPT-induced vmexits.
> > >
> > > | Bit   | Contents
> > > | |
> > > | :---- | :----------------------------------------------------------------
> > > | |
> > > | 10:0  | Reserved (0).
> > > | |
> > > | 11    | SPPT VM exit type. Set for SPPT Miss, cleared for SPPT Misconfig.
> > > | |
> > > | 12    | NMI unblocking due to IRET
> > > | |
> > > | 63:13 | Reserved (0)
> > > | |
> > >
> > > Patch 6: Added a handle of EPT subpage write protection fault.
> > > A control bit in EPT leaf paging-structure entries is defined as “Sub-Page
> > > Permission” (SPP bit). The bit position is 61; it is chosen from among the
> > > bits that are currently ignored by the processor and available to
> > > software.
> > > While hardware walking the SPP page table, If the sub-page region write
> > > permission bit is set, the write is allowed, else the write is disallowed
> > > and results in an EPT violation.
> > > We need peek this case in EPT violation handler, and trigger a user-space
> > > exit, return the write protected address(GVA) to user(qemu).
> > >
> > > Patch 7: Introduce ioctls to set/get Sub-Page Write Protection.
> > > We introduced 2 ioctls to let user application to set/get subpage write
> > > protection bitmap per gfn, each gfn corresponds to a bitmap.
> > > The user application, qemu, or some other security control daemon. will set
> > > the protection bitmap via this ioctl.
> > > the API defined as:
> > >         struct kvm_subpage {
> > >                 __u64 base_gfn;
> > >                 __u64 npages;
> > >                 /* sub-page write-access bitmap array */
> > >                 __u32 access_map[SUBPAGE_MAX_BITMAP];
> > >                 }sp;
> > >         kvm_vm_ioctl(s, KVM_SUBPAGES_SET_ACCESS, &sp)
> > >         kvm_vm_ioctl(s, KVM_SUBPAGES_GET_ACCESS, &sp)
> > >
> > > Patch 8 ~ Patch 9: Setup spp page table and update the EPT leaf entry
> > > indicated with the SPP enable bit.
> > > If the sub-page write permission VM-execution control is set, treatment of
> > > write accesses to guest-physical accesses depends on the state of the
> > > accumulated write-access bit (position 1) and sub-page permission bit
> > > (position 61) in the EPT leaf paging-structure.
> > > Software will update the EPT leaf entry sub-page permission bit while
> > > kvm_set_subpage(patch 7). If the EPT write-access bit set to 0 and the SPP
> > > bit set to 1 in the leaf EPT paging-structure entry that maps a 4KB page,
> > > then the hardware will look up a VMM-managed Sub-Page Permission Table
> > > (SPPT), which will be prepared by setup kvm_set_subpage(patch 8).
> > > The hardware uses the guest-physical address and bits 11:7 of the address
> > > accessed to lookup the SPPT to fetch a write permission bit for the 128
> > > byte wide sub-page region being accessed within the 4K guest-physical
> > > page. If the sub-page region write permission bit is set, the write is
> > > allowed, otherwise the write is disallowed and results in an EPT
> > > violation.
> > > Guest-physical pages mapped via leaf EPT-paging-structures for which the
> > > accumulated write-access bit and the SPP bits are both clear (0) generate
> > > EPT violations on memory writes accesses. Guest-physical pages mapped via
> > > EPT-paging-structure for which the accumulated write-access bit is set (1)
> > > allow writes, effectively ignoring the SPP bit on the leaf EPT-paging
> > > structure.
> > > Software will setup the spp page table level4,3,2 as well as EPT page
> > > structure, and fill the level 1 page via the 32 bit bitmaps per a single
> > > 4K page. Now it could be divided to 32 x 128 sub-pages.
> > >
> > > The SPP L4E L3E L2E is defined as below figure.
> > >
> > > | Bit    | Contents
> > > | |
> > > | :----- |
> > > | :--------------------------------------------------------------------- |
> > > | 0      | Valid entry when set; indicates whether the entry is present
> > > | |
> > > | 11:1   | Reserved (0)
> > > | |
> > > | N-1:12 | Physical address of 4K aligned SPPT LX-1 Table referenced by the
> > > | entry |
> > > | 51:N   | Reserved (0)
> > > | |
> > > | 63:52  | Reserved (0)
> > > | |
> > > Note: N is the physical address width supported by the processor, X is the
> > > page level
> > >
> > > The SPP L1E format is defined as below figure.
> > > | Bit   | Contents
> > > | |
> > > | :---- | :----------------------------------------------------------------
> > > | |
> > > | 0+2i  | Write permission for i-th 128 byte sub-page region.
> > > | |
> > > | 1+2i  | Reserved (0).
> > > | |
> > > Note: `0<=i<=31`
> > >
> > >
> > > Zhang Yi Z (10):
> > >   KVM: VMX: Added EPT Subpage Protection Documentation.
> > >   x86/cpufeature: Add intel Sub-Page Protection to CPU features
> > >   KVM: VMX: Added VMX SPP feature flags and VM-Execution Controls.
> > >   KVM: VMX: Introduce the SPPTP and SPP page table.
> > >   KVM: VMX: Introduce SPP-Induced vm exit and it's handle.
> > >   KVM: VMX: Added handle of SPP write protection fault.
> > >   KVM: VMX: Introduce ioctls to set/get Sub-Page Write Protection.
> > >   KVM: VMX: Update the EPT leaf entry indicated with the SPP enable bit.
> > >   KVM: VMX: Added setup spp page structure.
> > >   KVM: VMX: implement setup SPP page structure in spp miss.
> > >
> > >  Documentation/virtual/kvm/spp_design_kvm.txt | 272 +++++++++++++++++++++
> > >  arch/x86/include/asm/cpufeatures.h           |   1 +
> > >  arch/x86/include/asm/kvm_host.h              |  18 +-
> > >  arch/x86/include/asm/vmx.h                   |  10 +
> > >  arch/x86/include/uapi/asm/vmx.h              |   2 +
> > >  arch/x86/kernel/cpu/intel.c                  |   4 +
> > >  arch/x86/kvm/mmu.c                           | 340
> > >  ++++++++++++++++++++++++++-
> > >  arch/x86/kvm/mmu.h                           |   1 +
> > >  arch/x86/kvm/vmx.c                           | 104 ++++++++
> > >  arch/x86/kvm/x86.c                           |  99 +++++++-
> > >  include/linux/kvm_host.h                     |   5 +
> > >  include/uapi/linux/kvm.h                     |  16 ++
> > >  virt/kvm/kvm_main.c                          |  26 ++
> > >  13 files changed, 893 insertions(+), 5 deletions(-)
> > >  create mode 100644 Documentation/virtual/kvm/spp_design_kvm.txt
> > >
> > > --
> > > 2.7.4
> > >
> > 


* Re: [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
  2017-10-13 23:11 [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support Zhang Yi
                   ` (10 preceding siblings ...)
  2017-10-13 23:16 ` [PATCH RFC 10/10] KVM: VMX: implement setup SPP page structure in spp miss Zhang Yi
@ 2017-10-18  7:09 ` Christoph Hellwig
  2017-10-18 14:02   ` Yi Zhang
  2017-11-04  0:12 ` Yi Zhang
  12 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2017-10-18  7:09 UTC (permalink / raw)
  To: Zhang Yi; +Cc: kvm, linux-kernel, pbonzini, rkrcmar

> We introduced 2 ioctls to let user application to set/get subpage write protection bitmap per gfn, each gfn corresponds to a bitmap.
> The user application, qemu, or some other security control daemon. will set the protection bitmap via this ioctl.
> the API defined as:
> 	struct kvm_subpage {
> 		__u64 base_gfn;
> 		__u64 npages;
> 		/* sub-page write-access bitmap array */
> 		__u32 access_map[SUBPAGE_MAX_BITMAP];
> 		}sp;
> 	kvm_vm_ioctl(s, KVM_SUBPAGES_SET_ACCESS, &sp)
> 	kvm_vm_ioctl(s, KVM_SUBPAGES_GET_ACCESS, &sp)

What is the use case for this feature?


* Re: [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
  2017-10-16  0:08     ` Yi Zhang
@ 2017-10-18  9:35       ` Paolo Bonzini
  2017-10-18 14:07         ` Yi Zhang
  2017-10-18 14:13         ` Mihai Donțu
  0 siblings, 2 replies; 29+ messages in thread
From: Paolo Bonzini @ 2017-10-18  9:35 UTC (permalink / raw)
  To: Jim Mattson, kvm list, LKML, Radim Krčmář,
	Alex Williamson

On 16/10/2017 02:08, Yi Zhang wrote:
>> And the introspection facility by Mihai uses a completely
>> different API for the introspector, based on sockets rather than ioctls.
>> So I'm not sure this is the right API at all.
>
> Currently we only block write access. As an example of how it is used,
> we now use it in a security daemon:

Understood.  However, I think QEMU is the wrong place to set this up.

If the kernel wants to protect _itself_, it should use a hypercall.  If
an introspector appliance wants to protect the guest kernel, it should
use the socket that connects it to the hypervisor.

Paolo

> Consider a server launched in host user space and a client launched in
> the guest kernel. Yes, they communicate over sockets. The guest kernel
> wants to protect a special area to prevent every process, including
> the kernel itself, from modifying it. The client sends the guest
> physical address over the secure socket to the server side, and the
> server updates this protection in KVM. Thus, all write accesses to
> that specific guest area will be blocked.
> 
> The current implementation covers only the second half (maybe third
> ^_^) of this example: 'How does KVM set write protection on a specific
> GFN?'
> 
> Maybe a user-space tool which uses an ioctl to let the KVM MMU update
> the write protection is a better choice.
> 


* Re: [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
  2017-10-18  7:09 ` [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support Christoph Hellwig
@ 2017-10-18 14:02   ` Yi Zhang
  0 siblings, 0 replies; 29+ messages in thread
From: Yi Zhang @ 2017-10-18 14:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kvm, linux-kernel, pbonzini, rkrcmar

[-- Attachment #1: Type: text/plain, Size: 962 bytes --]

On 2017-10-18 at 00:09:36 -0700, Christoph Hellwig wrote:
> > We introduced 2 ioctls to let user application to set/get subpage write protection bitmap per gfn, each gfn corresponds to a bitmap.
> > The user application, qemu, or some other security control daemon. will set the protection bitmap via this ioctl.
> > the API defined as:
> > 	struct kvm_subpage {
> > 		__u64 base_gfn;
> > 		__u64 npages;
> > 		/* sub-page write-access bitmap array */
> > 		__u32 access_map[SUBPAGE_MAX_BITMAP];
> > 		}sp;
> > 	kvm_vm_ioctl(s, KVM_SUBPAGES_SET_ACCESS, &sp)
> > 	kvm_vm_ioctl(s, KVM_SUBPAGES_GET_ACCESS, &sp)
> 
> What is the use case for this feature?

Thanks for your review, Chris.

I have prepared a draft version of a tool embedded in the QEMU command
line, meaning that we can set/get the sub-page protection via QEMU
commands.

The QEMU patch is attached; it is a pre-design version. I am
considering changing the interface to a hypercall, as Paolo advised.


[-- Attachment #2: 0001-x86-Intel-Sub-Page-Protection-support.patch --]
[-- Type: text/x-diff, Size: 10314 bytes --]

>From a369bed5d986dccb3ca36dc5a27c6220ca2d1405 Mon Sep 17 00:00:00 2001
From: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Date: Tue, 14 Mar 2017 15:11:38 +0800
Subject: [PATCH] x86: Intel Sub-Page Protection support

Signed-off-by: He Chen <he.chen@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
---
 hmp-commands.hx           | 26 ++++++++++++++++++++++++++
 hmp.c                     | 26 ++++++++++++++++++++++++++
 hmp.h                     |  2 ++
 include/sysemu/kvm.h      |  2 ++
 kvm-all.c                 | 40 ++++++++++++++++++++++++++++++++++++++++
 linux-headers/linux/kvm.h | 15 +++++++++++++++
 qapi-schema.json          | 41 +++++++++++++++++++++++++++++++++++++++++
 qmp.c                     | 43 +++++++++++++++++++++++++++++++++++++++++++
 target/i386/kvm.c         | 22 ++++++++++++++++++++++
 9 files changed, 217 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8819281..7a57411 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1766,6 +1766,32 @@ Set QOM property @var{property} of object at location @var{path} to value @var{v
 ETEXI
 
     {
+        .name       = "get-subpage",
+        .args_type  = "base_gfn:l,npages:l,filename:str",
+        .params     = "base_gfn npages filename",
+        .help       = "get the write-protect bitmap setting of sub-page protectio",
+        .cmd        = hmp_get_subpage,
+    },
+
+STEXI
+@item get-subpage @var{base_gfn} @var{npages} @var{file}
+Get the write-protect bitmap setting of sub-page protection in the range of @var{base_gfn} to @var{base_gfn} + @var{npages}
+ETEXI
+
+    {
+        .name       = "set-subpage",
+        .args_type  = "base_gfn:l,npages:l,wp_map:i",
+        .params     = "base_gfn npages",
+        .help       = "set the write-protect bitmap setting of sub-page protectio",
+        .cmd        = hmp_set_subpage,
+    },
+
+STEXI
+@item set-subpage @var{base_gfn} @var{npages} @var{wp_map}
+Set the write-protect bitmap @var{wp_map} of sub-page protection in the range of @var{base_gfn} to @var{base_gfn} + @var{npages}
+ETEXI
+
+    {
         .name       = "info",
         .args_type  = "item:s?",
         .params     = "[subcommand]",
diff --git a/hmp.c b/hmp.c
index 261843f..7d217e9 100644
--- a/hmp.c
+++ b/hmp.c
@@ -2614,3 +2614,29 @@ void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict)
     }
     qapi_free_GuidInfo(info);
 }
+
+void hmp_get_subpage(Monitor *mon, const QDict *qdict)
+{
+    uint64_t base_gfn = qdict_get_int(qdict, "base_gfn");
+    uint64_t npages = qdict_get_int(qdict, "npages");
+    const char *filename = qdict_get_str(qdict, "filename");
+    Error *err = NULL;
+
+    monitor_printf(mon, "base_gfn: %ld, npages: %ld, file: %s\n", base_gfn, npages, filename);
+
+    qmp_get_subpage(base_gfn, npages, filename, &err);
+    hmp_handle_error(mon, &err);
+}
+
+void hmp_set_subpage(Monitor *mon, const QDict *qdict)
+{
+    uint64_t base_gfn = qdict_get_int(qdict, "base_gfn");
+    uint64_t npages = qdict_get_int(qdict, "npages");
+    uint32_t wp_map = qdict_get_int(qdict, "wp_map");
+    Error *err = NULL;
+
+    monitor_printf(mon, "base_gfn: %ld, npages: %ld, wp_map: %d\n", base_gfn, npages, wp_map);
+
+    qmp_set_subpage(base_gfn, npages, wp_map, &err);
+    hmp_handle_error(mon, &err);
+}
diff --git a/hmp.h b/hmp.h
index 799fd37..b72143f 100644
--- a/hmp.h
+++ b/hmp.h
@@ -138,5 +138,7 @@ void hmp_rocker_of_dpa_groups(Monitor *mon, const QDict *qdict);
 void hmp_info_dump(Monitor *mon, const QDict *qdict);
 void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict);
 void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict);
+void hmp_get_subpage(Monitor *mon, const QDict *qdict);
+void hmp_set_subpage(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 24281fc..f7c1340 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -528,4 +528,6 @@ int kvm_set_one_reg(CPUState *cs, uint64_t id, void *source);
  */
 int kvm_get_one_reg(CPUState *cs, uint64_t id, void *target);
 int kvm_get_max_memslots(void);
+int kvm_get_subpage_wp_map(uint64_t base_gfn, uint32_t *buf, uint64_t len);
+int kvm_set_subpage_wp_map(uint64_t base_gfn, uint64_t npages, uint32_t wp_map);
 #endif
diff --git a/kvm-all.c b/kvm-all.c
index 9040bd5..58cc0a4 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -2593,6 +2593,46 @@ int kvm_get_one_reg(CPUState *cs, uint64_t id, void *target)
     return r;
 }
 
+int kvm_get_subpage_wp_map(uint64_t base_gfn, uint32_t *buf,
+		           uint64_t len)
+{
+	KVMState *s = kvm_state;
+	struct kvm_subpage sp = {};
+	int n = len;
+
+	sp.base_gfn = base_gfn;
+	sp.npages = len;
+
+
+	if (kvm_vm_ioctl(s, KVM_SUBPAGES_GET_ACCESS, &sp) < 0) {
+		DPRINTF("ioctl failed %d\n", errno);
+		return -1;
+	}
+
+	memcpy(buf, sp.access_map, n * sizeof(uint32_t));
+
+	return n;
+}
+
+int kvm_set_subpage_wp_map(uint64_t base_gfn, uint64_t npages,
+		           uint32_t wp_map)
+{
+	KVMState *s = kvm_state;
+	struct kvm_subpage sp = {};
+
+	sp.base_gfn = base_gfn;
+	sp.npages = npages;
+	sp.access_map[0] = wp_map;
+
+
+	if (kvm_vm_ioctl(s, KVM_SUBPAGES_SET_ACCESS, &sp) < 0) {
+		DPRINTF("ioctl failed %d\n", errno);
+		return -1;
+	}
+
+	return 0;
+}
+
 static void kvm_accel_class_init(ObjectClass *oc, void *data)
 {
     AccelClass *ac = ACCEL_CLASS(oc);
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 4e082a8..69de005 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -205,6 +205,7 @@ struct kvm_hyperv_exit {
 #define KVM_EXIT_S390_STSI        25
 #define KVM_EXIT_IOAPIC_EOI       26
 #define KVM_EXIT_HYPERV           27
+#define KVM_EXIT_SPP              28
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -360,6 +361,10 @@ struct kvm_run {
 		struct {
 			__u8 vector;
 		} eoi;
+		/* KVM_EXIT_SPP */
+		struct {
+			__u64 addr;
+		} spp;
 		/* KVM_EXIT_HYPERV */
 		struct kvm_hyperv_exit hyperv;
 		/* Fix the size of the union. */
@@ -1126,6 +1131,8 @@ enum kvm_device_type {
 					struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR          _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
+#define KVM_SUBPAGES_GET_ACCESS   _IOR(KVMIO,  0x49, __u64)
+#define KVM_SUBPAGES_SET_ACCESS   _IOW(KVMIO,  0x4a, __u64)
 
 /* enable ucontrol for s390 */
 struct kvm_s390_ucas_mapping {
@@ -1354,4 +1361,12 @@ struct kvm_assigned_msix_entry {
 #define KVM_X2APIC_API_USE_32BIT_IDS            (1ULL << 0)
 #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)
 
+/* for KVM_SUBPAGES_GET_ACCESS and KVM_SUBPAGES_SET_ACCESS */
+#define SUBPAGE_MAX_BITMAP 256
+struct kvm_subpage {
+       __u64 base_gfn;
+       __u64 npages;
+       __u32 access_map[SUBPAGE_MAX_BITMAP]; /* sub-page write-access bitmap array */
+};
+
 #endif /* __LINUX_KVM_H */
diff --git a/qapi-schema.json b/qapi-schema.json
index 32b4a4b..d6b46bb 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -6267,3 +6267,44 @@
 # Since 2.9
 ##
 { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }
+
+##
+# @get-subpage:
+#
+# This command will get the sub-page protection settings of the
+# given GFNs and write them to a file.
+#
+# Since: 2.10
+#
+# Example:
+#
+# -> { "execute": "get-subpage",
+#      "arguments": { "base_gfn": 0x1000,
+#                     "npages": 10,
+#                     "filename": "/tmp/spp_info" } }
+# <- { "return": {} }
+#
+##
+{ 'command': 'get-subpage',
+  'data': {'base_gfn': 'uint64', 'npages': 'uint64', 'filename': 'str'} }
+
+
+##
+# @set-subpage:
+#
+# This command will set sub-page protection for the given GFNs.
+#
+# Since: 2.10
+#
+# Example:
+#
+# -> { "execute": "set-subpage",
+#      "arguments": { "base_gfn": 0x1000,
+#                     "npages": 10,
+#                     "wp_map": 0xffff0000 } }
+# <- { "return": {} }
+#
+##
+{ 'command': 'set-subpage',
+  'data': {'base_gfn': 'uint64', 'npages': 'uint64', 'wp_map': 'uint32'} }
+
diff --git a/qmp.c b/qmp.c
index fa82b59..274efdb 100644
--- a/qmp.c
+++ b/qmp.c
@@ -717,3 +717,46 @@ ACPIOSTInfoList *qmp_query_acpi_ospm_status(Error **errp)
 
     return head;
 }
+
+#define SUBPAGE_BUF_LEN 256
+void qmp_get_subpage(uint64_t base_gfn, uint64_t npages,
+		     const char *filename, Error **errp)
+{
+    FILE *f;
+    uint64_t n;
+    uint32_t buf[SUBPAGE_BUF_LEN];
+
+    f = fopen(filename, "wb");
+    if (!f) {
+        error_setg_file_open(errp, errno, filename);
+        return;
+    }
+
+    while (npages != 0) {
+        n = npages;
+        /* clamp each chunk to the local buffer size */
+        n = MIN(n, SUBPAGE_BUF_LEN);
+        if (kvm_get_subpage_wp_map(base_gfn, buf, n) < 0) {
+            error_setg(errp, QERR_IO_ERROR);
+            goto exit;
+        }
+        if (fwrite(buf, sizeof(uint32_t), n, f) != n) {
+            error_setg(errp, QERR_IO_ERROR);
+            goto exit;
+        }
+        base_gfn += n;
+        npages -= n;
+    }
+
+exit:
+    fclose(f);
+}
+
+void qmp_set_subpage(uint64_t base_gfn, uint64_t npages,
+		     uint32_t wp_map, Error **errp)
+{
+    if (kvm_set_subpage_wp_map(base_gfn, npages, wp_map) < 0) {
+        error_setg(errp, QERR_IO_ERROR);
+    }
+}
+
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 472399f..18a43d7 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -3147,6 +3147,23 @@ static int kvm_handle_debug(X86CPU *cpu,
     return ret;
 }
 
+static int kvm_handle_spp(uint64_t addr)
+{
+    /*
+    uint64_t base_gfn = addr >> 12;
+    uint64_t offset = addr & ((1 << 12) - 1);
+    int subpage_index = offset >> 7;
+    uint32_t mask;
+
+    kvm_get_subpage_wp_map(base_gfn, &mask, 1);
+    mask |= 1UL << subpage_index;
+    return kvm_set_subpage_wp_map(base_gfn, 1, mask);
+    */
+
+    fprintf(stderr, "QEMU-SPP: we are in kvm_handle_spp now, addr=0x%lx!\n", addr);
+    return 0;
+}
+
 void kvm_arch_update_guest_debug(CPUState *cpu, struct kvm_guest_debug *dbg)
 {
     const uint8_t type_code[] = {
@@ -3240,6 +3257,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
         ioapic_eoi_broadcast(run->eoi.vector);
         ret = 0;
         break;
+    case KVM_EXIT_SPP:
+        DPRINTF("handle_spp\n");
+        kvm_handle_spp(run->spp.addr);
+        ret = 0;
+        break;
     default:
         fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
         ret = -1;
-- 
2.7.4



* Re: [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
  2017-10-18  9:35       ` Paolo Bonzini
@ 2017-10-18 14:07         ` Yi Zhang
  2017-10-19 11:57           ` Paolo Bonzini
  2017-10-18 14:13         ` Mihai Donțu
  1 sibling, 1 reply; 29+ messages in thread
From: Yi Zhang @ 2017-10-18 14:07 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm list, LKML, Radim Krčmář,
	Alex Williamson

On 2017-10-18 at 11:35:12 +0200, Paolo Bonzini wrote:
> >
> > Currently,  We only block the write access, As far as I know an example,
> > we now using it in a security daemon:
> 
> Understood.  However, I think QEMU is the wrong place to set this up.
> 
> If the kernel wants to protect _itself_, it should use a hypercall.  If
> an introspector appliance wants to protect the guest kernel, it should
> use the socket that connects it to the hypervisor.
> 
> Paolo
> 

Thanks Paolo,

Yes, that makes sense; I will look into switching the interface to a
hypercall. How about keeping the two interfaces together (hypercall +
ioctl)? Consider that if the VMM manager had some way to intercept
guest kernel memory accesses, the page protection would act like a
hardware watchpoint; wouldn't that give the VMM manager an easy way to
debug the guest kernel?

Apart from the interface change, could you please help review the rest
of the patch series? Just skip the ioctl patch (patch 7).
Thank you very much Paolo.


* Re: [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
  2017-10-18  9:35       ` Paolo Bonzini
  2017-10-18 14:07         ` Yi Zhang
@ 2017-10-18 14:13         ` Mihai Donțu
  2017-10-20  8:47           ` Yi Zhang
  1 sibling, 1 reply; 29+ messages in thread
From: Mihai Donțu @ 2017-10-18 14:13 UTC (permalink / raw)
  To: Paolo Bonzini, Jim Mattson, kvm list, LKML,
	Radim Krčmář,
	Alex Williamson

On Wed, 2017-10-18 at 11:35 +0200, Paolo Bonzini wrote:
> On 16/10/2017 02:08, Yi Zhang wrote:
> > > And the introspection facility by Mihai uses a completely
> > > different API for the introspector, based on sockets rather than ioctls.
> > > So I'm not sure this is the right API at all.
> > 
> > Currently,  We only block the write access, As far as I know an example,
> > we now using it in a security daemon:
> 
> Understood.  However, I think QEMU is the wrong place to set this up.
> 
> If the kernel wants to protect _itself_, it should use a hypercall.  If
> an introspector appliance wants to protect the guest kernel, it should
> use the socket that connects it to the hypervisor.

We have been looking at using SPP for VMI for quite some time. If a
guest kernel will be able to control it (can it do so with EPT?), then
a simple switch that disables this ability would be useful, as an
introspector wouldn't want the guest it is trying to protect to
interfere with it.

Also, if Intel doesn't have a specific use case for it that requires
separate access to SPP control, then maybe we can fold it into the VMI 
API we are working on?

Thanks,

> > Consider It has a server which launching in the host user-space, and a
> > client launching in the guest kernel. Yes, they are communicate with
> > sockets. The guest kernel wanna protect a special area to prevent all
> > the process including the kernel itself modify this area. the client
> > could send the guest physical address via the security socket to server
> > side, and server would update these protection into KVM. Thus, all the
> > write access in a guest specific area will be blocked.
> > 
> > Now the implementation only on the second half(maybe third ^_^) of this
> > example: 'How kvm set the write-protect into a specific GFN?'
> > 
> > Maybe a user space tools which use ioctl let kvm mmu update the
> > write-protection is a better choice.

-- 
Mihai Donțu


* Re: [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
  2017-10-18 14:07         ` Yi Zhang
@ 2017-10-19 11:57           ` Paolo Bonzini
  2017-10-20  8:51             ` Yi Zhang
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2017-10-19 11:57 UTC (permalink / raw)
  To: Jim Mattson, kvm list, LKML, Radim Krčmář,
	Alex Williamson

On 18/10/2017 16:07, Yi Zhang wrote:
> On 2017-10-18 at 11:35:12 +0200, Paolo Bonzini wrote:
>>>
>>> Currently,  We only block the write access, As far as I know an example,
>>> we now using it in a security daemon:
>>
>> Understood.  However, I think QEMU is the wrong place to set this up.
>>
>> If the kernel wants to protect _itself_, it should use a hypercall.  If
>> an introspector appliance wants to protect the guest kernel, it should
>> use the socket that connects it to the hypervisor.
>>
>> Paolo
>>
> 
> Thanks Paolo,
> 
> Yes, that correctable, I will think about to switch the interface to a
> hypercall,  How about we keep these 2 interface together(hyper call +
> ioctl)? think about that if VMM manager have some way could intercept
> the guest kernel memory accessing, the page protection would like a
> hardware watch point, is it an easy way to let VMM manager debug the
> guest kernel?

I would leave out the ioctl without a use case.  It's always tricky to
add APIs without a user, as the risk of bit rot is high.  But if
somebody comes up with a matching useful patch for QEMU or kvmtool, it's
fine.

> Except the interface change, could you please help to review the other
> patch series? just skip the ioctl patch( patch 7). 

Yes, of course.

Paolo


* Re: [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
  2017-10-18 14:13         ` Mihai Donțu
@ 2017-10-20  8:47           ` Yi Zhang
  2017-10-20 17:06             ` Mihai Donțu
  0 siblings, 1 reply; 29+ messages in thread
From: Yi Zhang @ 2017-10-20  8:47 UTC (permalink / raw)
  To: Mihai Donțu
  Cc: Paolo Bonzini, Jim Mattson, kvm list, LKML,
	Radim Krčmář,
	Alex Williamson

On 2017-10-18 at 17:13:18 +0300, Mihai Donțu wrote:
> On Wed, 2017-10-18 at 11:35 +0200, Paolo Bonzini wrote:
> > On 16/10/2017 02:08, Yi Zhang wrote:
> > > > And the introspection facility by Mihai uses a completely
> > > > different API for the introspector, based on sockets rather than ioctls.
> > > > So I'm not sure this is the right API at all.
> > > 
> > > Currently,  We only block the write access, As far as I know an example,
> > > we now using it in a security daemon:
> > 
> > Understood.  However, I think QEMU is the wrong place to set this up.
> > 
> > If the kernel wants to protect _itself_, it should use a hypercall.  If
> > an introspector appliance wants to protect the guest kernel, it should
> > use the socket that connects it to the hypervisor.
> 
> We have been looking at using SPP for VMI for quite some time. If a
> guest kernel will be able to control it (can it do so with EPT?) then
> it would be useful a simple switch that disables this ability, as an
> introspector wouldn't want the guest is trying to protect to interfere
> with it.

Would you mind providing more information and some history about your
investigation?

> 
> Also, if Intel doesn't have a specific use case for it that requires
> separate access to SPP control, then maybe we can fold it into the VMI 
> API we are working on?

That would be excellent, as we really don't have a specific use case at
this time.
BTW, I have already submitted an SPP implementation draft on the Xen side.
When you get some time, please take a look and see whether it matches
your requirements.

> 
> Thanks,
> 
> > > Consider It has a server which launching in the host user-space, and a
> > > client launching in the guest kernel. Yes, they are communicate with
> > > sockets. The guest kernel wanna protect a special area to prevent all
> > > the process including the kernel itself modify this area. the client
> > > could send the guest physical address via the security socket to server
> > > side, and server would update these protection into KVM. Thus, all the
> > > write access in a guest specific area will be blocked.
> > > 
> > > Now the implementation only on the second half(maybe third ^_^) of this
> > > example: 'How kvm set the write-protect into a specific GFN?'
> > > 
> > > Maybe a user space tools which use ioctl let kvm mmu update the
> > > write-protection is a better choice.
> 
> -- 
> Mihai Donțu
> 


* Re: [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
  2017-10-19 11:57           ` Paolo Bonzini
@ 2017-10-20  8:51             ` Yi Zhang
  0 siblings, 0 replies; 29+ messages in thread
From: Yi Zhang @ 2017-10-20  8:51 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm list, LKML, Radim Krčmář,
	Alex Williamson

On 2017-10-19 at 13:57:12 +0200, Paolo Bonzini wrote:
> 
> I would leave out the ioctl without a use case.  It's always tricky to
> add APIs without a user, as the risk of bit rot is high.  But if
> somebody comes up with a matching useful patch for QEMU or kvmtool, it's
> fine.

That's fine, leave it out.

> 
> > Except the interface change, could you please help to review the other
> > patch series? just skip the ioctl patch( patch 7). 
> 
> Yes, of course.
Thanks.
> 
> Paolo


* Re: [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
  2017-10-20  8:47           ` Yi Zhang
@ 2017-10-20 17:06             ` Mihai Donțu
  2017-10-24  7:52               ` Yi Zhang
  0 siblings, 1 reply; 29+ messages in thread
From: Mihai Donțu @ 2017-10-20 17:06 UTC (permalink / raw)
  To: Yi Zhang
  Cc: Paolo Bonzini, Jim Mattson, kvm list, LKML,
	Radim Krčmář,
	Alex Williamson

On Fri, 2017-10-20 at 16:47 +0800, Yi Zhang wrote:
> On 2017-10-18 at 17:13:18 +0300, Mihai Donțu wrote:
> > On Wed, 2017-10-18 at 11:35 +0200, Paolo Bonzini wrote:
> > > On 16/10/2017 02:08, Yi Zhang wrote:
> > > > > And the introspection facility by Mihai uses a completely
> > > > > different API for the introspector, based on sockets rather than ioctls.
> > > > > So I'm not sure this is the right API at all.
> > > > 
> > > > Currently,  We only block the write access, As far as I know an example,
> > > > we now using it in a security daemon:
> > > 
> > > Understood.  However, I think QEMU is the wrong place to set this up.
> > > 
> > > If the kernel wants to protect _itself_, it should use a hypercall.  If
> > > an introspector appliance wants to protect the guest kernel, it should
> > > use the socket that connects it to the hypervisor.
> > 
> > We have been looking at using SPP for VMI for quite some time. If a
> > guest kernel will be able to control it (can it do so with EPT?) then
> > it would be useful a simple switch that disables this ability, as an
> > introspector wouldn't want the guest is trying to protect to interfere
> > with it.
> 
> Could you mind to provide more information and history about your
> investigation?

We are using VMI to secure certain parts of a guest kernel in memory
(e.g. preventing a certain data structure from being overwritten).
However, that part sometimes happens to be placed in the same page as
other data, of no interest to us, that gets written frequently. This
makes using the EPT problematic (a 4K page is just too big and
generates too many violations). SPP, with its 128-byte granularity, is
ideal here.

> > Also, if Intel doesn't have a specific use case for it that requires
> > separate access to SPP control, then maybe we can fold it into the VMI 
> > API we are working on?
> 
> That's totally Excellent as we really don't have a specific user case at
> this time.

OK. We will spend some time thinking about a proper way of exposing SPP
through the VMI API.

For example, we are currently working on implementing something similar to this:

  kvm_set_page_access( struct kvm *kvm, gfn_t gfn, u8 access );

The simplest approach would be to add something like:

  kvm_set_sub_page_access( struct kvm *kvm, gfn_t gfn, u32 mask );

where each bit of 'mask' indicates the write-allowed state of the
corresponding 128-byte sub-page.

> BTW, I have already submit the SPP implementation draft in Xen side.
> when you got some time, you can take a look at if that match your
> requirement.

I believe my colleague Răzvan Cojocaru has already commented on that
patch set. :-)

> > > > Consider It has a server which launching in the host user-space, and a
> > > > client launching in the guest kernel. Yes, they are communicate with
> > > > sockets. The guest kernel wanna protect a special area to prevent all
> > > > the process including the kernel itself modify this area. the client
> > > > could send the guest physical address via the security socket to server
> > > > side, and server would update these protection into KVM. Thus, all the
> > > > write access in a guest specific area will be blocked.
> > > > 
> > > > Now the implementation only on the second half(maybe third ^_^) of this
> > > > example: 'How kvm set the write-protect into a specific GFN?'
> > > > 
> > > > Maybe a user space tools which use ioctl let kvm mmu update the
> > > > write-protection is a better choice.

-- 
Mihai Donțu


* Re: [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
  2017-10-20 17:06             ` Mihai Donțu
@ 2017-10-24  7:52               ` Yi Zhang
  0 siblings, 0 replies; 29+ messages in thread
From: Yi Zhang @ 2017-10-24  7:52 UTC (permalink / raw)
  To: Mihai Donțu
  Cc: Paolo Bonzini, Jim Mattson, kvm list, LKML,
	Radim Krčmář,
	Alex Williamson

On 2017-10-20 at 20:06:47 +0300, Mihai Donțu wrote:
> On Fri, 2017-10-20 at 16:47 +0800, Yi Zhang wrote:
> > Could you mind to provide more information and history about your
> > investigation?
> 
> We are using VMI to secure certain parts of a guest kernel in memory
> (like prevent a certain data structure from being overriten). However,
> it sometimes happens for that part to be placed in the same page with
> other data, of no interest to us, that gets written frequently. This
> makes using the EPT problematic (a 4k page is just too big and
> generates too many violations). However, SPP (with its 128 bytes
> granularity) is ideal here.
> 

> > > Also, if Intel doesn't have a specific use case for it that requires
> > > separate access to SPP control, then maybe we can fold it into the VMI 
> > > API we are working on?
> > 
> > That's totally Excellent as we really don't have a specific user case at
> > this time.
> 
> OK. We will spend some time thinking at a proper way of exposing SPP
> with the VMI API.
> 
> For example, we now work on implementing something similar to this:
> 
>   kvm_set_page_access( struct kvm *kvm, gfn_t gfn, u8 access );
> 
> The simplest approach would be to add something like:
> 
>   kvm_set_sub_page_access( struct kvm *kvm, gfn_t gfn, u32 mask );
> 
> where every bit from 'mask' indicates the write-allowed state of every
> 128-byte subpage.

Got it; that seems very compatible with our current implementation.
> 
> > BTW, I have already submit the SPP implementation draft in Xen side.
> > when you got some time, you can take a look at if that match your
> > requirement.
> 
> I believe my colleague Răzvan Cojocaru has already commented on that
> patch set. :-)

Oh, yes, please pass my best thanks along to him.

> 
> -- 
> Mihai Donțu
> 


* Re: [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
  2017-10-13 23:11 [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support Zhang Yi
                   ` (11 preceding siblings ...)
  2017-10-18  7:09 ` [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support Christoph Hellwig
@ 2017-11-04  0:12 ` Yi Zhang
  2017-11-04 16:54   ` Paolo Bonzini
  12 siblings, 1 reply; 29+ messages in thread
From: Yi Zhang @ 2017-11-04  0:12 UTC (permalink / raw)
  To: kvm, linux-kernel; +Cc: pbonzini, rkrcmar, ravi.sahita

On 2017-10-14 at 07:11:28 +0800, Zhang Yi wrote:
> From: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> 
> Hi All,
> 
> Here is a patch series adding EPT-Based Sub-page Write Protection support. The software developer manual is available at:
> 
> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> 
> In Chapter 4 EPT-BASED SUB-PAGE PERMISSIONS.
> 
> Introduction:
> 
> EPT-Based Sub-page Write Protection, referred to as SPP, is a capability which allows Virtual Machine Monitors (VMMs) to specify write permissions for guest physical memory at a sub-page (128-byte) granularity.  When this capability is utilized, the CPU enforces write-access permissions for sub-page regions of 4K pages as specified by the VMM. EPT-based sub-page permissions are intended to enable fine-grained memory write enforcement by a VMM for security (guest OS monitoring) and for usages such as device virtualization and memory checkpointing.
> 
> How SPP Works:
> 
> SPP is active when the "sub-page write protection" VM-execution control is 1. A new 4-level paging structure, named the SPP page table (SPPT), is introduced; the SPPT is walked with the guest physical address to derive a 64-bit "sub-page permission" value containing sub-page write permissions. The lookup from guest-physical addresses to the sub-page region permissions is determined by this set of SPPT paging structures.
> 
> The SPPT is used to look up write permission bits for the 128-byte sub-page regions contained in the 4KB guest physical page. EPT specifies the 4KB-page-level privileges that software is allowed when accessing the guest physical address, whereas the SPPT defines the write permissions for software at 128-byte granularity within a 4KB page. Write accesses prevented due to sub-page permissions looked up via the SPPT are reported as EPT violation VM exits. As with EPT, a logical processor uses the SPPT to look up sub-page region write permissions for guest-physical addresses only when those addresses are used to access memory.
> 
> Guest write access --> GPA --> Walk EPT --> EPT leaf entry -┐
> ┌-----------------------------------------------------------┘
> └-> if VMexec_control.spp && ept_leaf_entry.spp_bit (bit 61)
>      |
>      └-> <false> --> EPT legacy behavior
>      |
>      |
>      └-> <true>  --> if ept_leaf_entry.writable
>                       |
>                       └-> <true>  --> Ignore SPP
>                       |
> 		      └-> <false> --> GPA --> Walk SPP 4-level table--┐
>                                                                       |
> ┌------------<----------get-the-SPPT-point-from-VMCS-filed-----<------┘
> |
> Walk SPP L4E table
> |
> └┐--> entry misconfiguration ------------>----------┐<----------------┐
>  |                                                  |                 |
> else                                                |                 |
>  |                                                  |                 |
>  |   ┌------------------SPP VMexit<-----------------┘                 |
>  |   |                                                                |
>  |   └-> exit_qualification & sppt_misconfig --> sppt misconfig       |
>  |   |                                                                |
>  |   └-> exit_qualification & sppt_miss --> sppt miss                 |
>  └--┐                                                                 |
>     |                                                                 |
> walk SPPT L3E--┐--> if-entry-misconfiguration------------>------------┘
>                |                                                      |
> 	      else                                                    |
> 	       |                                                      |
> 	       |                                                      |
>         walk SPPT L2E --┐--> if-entry-misconfiguration-------->-------┘
>                         |                                             |
>                        else                                           |
> 			|                                             |
> 			|                                             |
> 	         walk SPPT L1E --┐-> if-entry-misconfiguration--->----┘
>                                  |
> 			        else
> 				 |
>                                  └-> if sub-page writable
>                                       └-> <true>  allow, write access
> 	                              └-> <false> disallow, EPT violation
> 
> Patch-sets Description:
> 
> Patch 1: Documentation.
> 
> Patch 2: This patch adds reporting of the SPP capability from the VMX Procbased MSR; per the hardware specification, bit 23 controls the SPP capability.
> 
> Patch 3: Adds a new secondary processor-based VM-execution control bit, defined as "sub-page write permission"; as in the VMX Procbased MSR, bit 23 is the SPP enable bit.
> We also introduce a kernel parameter, "enable_ept_spp". SPP is now active when "Sub-page Write Protection" is set in the Secondary VM-Execution Controls and the kernel parameter is enabled with "enable_ept_spp=1".
> 
> Patch 4: Introduces the SPPTP and the SPP page table.
> The sub-page permission table is referenced via a 64-bit control field called the Sub-Page Permission Table Pointer (SPPTP), which contains a 4K-aligned physical address. The index and encoding for this VMCS field are currently defined as 0x2030; the format of the SPPTP is shown in figure 2.
> This patch introduces the SPP paging structures, whose root page is created at KVM MMU page initialization.
> We also add an MMU page role type, spp, to distinguish an SPP page from an EPT page.
> 
> Patch 5: Introduces the SPP-induced VM exit and its handling.
> Accesses using guest-physical addresses may cause SPP-induced VM exits due to an SPPT misconfiguration or an SPPT miss. The basic VM-exit reason code reported for SPP-induced VM exits is 66.
> 
> Also introduces the new exit qualification for SPPT-induced VM exits.
> 
> | Bit   | Contents                                                          |
> | :---- | :---------------------------------------------------------------- |
> | 10:0  | Reserved (0).                                                     |
> | 11    | SPPT VM exit type. Set for SPPT Miss, cleared for SPPT Misconfig. |
> | 12    | NMI unblocking due to IRET                                        |
> | 63:13 | Reserved (0)                                                      |
> 
> Patch 6: Adds handling of EPT sub-page write-protection faults.
> A control bit in EPT leaf paging-structure entries is defined as “Sub-Page Permission” (SPP bit). The bit position is 61; it is chosen from among the bits that are currently ignored by the processor and available to software.
> While the hardware walks the SPP page table, if the sub-page region write permission bit is set, the write is allowed; otherwise the write is disallowed and results in an EPT violation.
> We need to detect this case in the EPT violation handler and trigger a user-space exit, returning the write-protected address (GVA) to user space (QEMU).
> 
> Patch 7: Introduces ioctls to set/get Sub-Page Write Protection.
> We introduce two ioctls to let a user application set/get the sub-page write-protection bitmap per gfn; each gfn corresponds to one bitmap.
> The user application (QEMU, or some other security control daemon) sets the protection bitmap via these ioctls.
> The API is defined as:
> 	struct kvm_subpage {
> 		__u64 base_gfn;
> 		__u64 npages;
> 		/* sub-page write-access bitmap array */
> 		__u32 access_map[SUBPAGE_MAX_BITMAP];
> 		}sp;
> 	kvm_vm_ioctl(s, KVM_SUBPAGES_SET_ACCESS, &sp)
> 	kvm_vm_ioctl(s, KVM_SUBPAGES_GET_ACCESS, &sp)
> 
> Patch 8 ~ Patch 9: Set up the SPP page table and update the EPT leaf entry indicated by the SPP enable bit.
> If the sub-page write permission VM-execution control is set, treatment of write accesses to guest-physical addresses depends on the state of the accumulated write-access bit (position 1) and the sub-page permission bit (position 61) in the EPT leaf paging-structure entry.
> Software updates the EPT leaf entry's sub-page permission bit in kvm_set_subpage (patch 7). If the EPT write-access bit is 0 and the SPP bit is 1 in the leaf EPT paging-structure entry that maps a 4KB page, the hardware looks up a VMM-managed Sub-Page Permission Table (SPPT), which is prepared by kvm_set_subpage (patch 8).
> The hardware uses the guest-physical address, specifically bits 11:7 of the address accessed, to look up the SPPT and fetch a write permission bit for the 128-byte sub-page region being accessed within the 4K guest-physical page. If the sub-page region write permission bit is set, the write is allowed; otherwise the write is disallowed and results in an EPT violation.
> Guest-physical pages mapped via leaf EPT paging structures for which the accumulated write-access bit and the SPP bit are both clear (0) generate EPT violations on memory write accesses. Guest-physical pages mapped via EPT paging structures for which the accumulated write-access bit is set (1) allow writes, effectively ignoring the SPP bit in the leaf EPT paging structure.
> Software sets up SPP page table levels 4, 3 and 2 in the same way as the EPT page structures, and fills the level-1 page from the 32-bit bitmap for a single 4K page, which is thereby divided into 32 x 128-byte sub-pages.
> 
> The SPP L4E/L3E/L2E format is defined in the table below.
> 
> | Bit    | Contents                                                               |
> | :----- | :--------------------------------------------------------------------- |
> | 0      | Valid entry when set; indicates whether the entry is present           |
> | 11:1   | Reserved (0)                                                           |
> | N-1:12 | Physical address of 4K aligned SPPT LX-1 Table referenced by the entry |
> | 51:N   | Reserved (0)                                                           |
> | 63:52  | Reserved (0)                                                           |
> Note: N is the physical address width supported by the processor, X is the page level
> 
> The SPP L1E format is defined in the table below.
> | Bit   | Contents                                                          |
> | :---- | :---------------------------------------------------------------- |
> | 0+2i  | Write permission for i-th 128 byte sub-page region.               |
> | 1+2i  | Reserved (0).                                                     |
> Note: `0<=i<=31`
> 
> 
> Zhang Yi Z (10):
>   KVM: VMX: Added EPT Subpage Protection Documentation.
>   x86/cpufeature: Add intel Sub-Page Protection to CPU features
>   KVM: VMX: Added VMX SPP feature flags and VM-Execution Controls.
>   KVM: VMX: Introduce the SPPTP and SPP page table.
>   KVM: VMX: Introduce SPP-Induced vm exit and it's handle.
>   KVM: VMX: Added handle of SPP write protection fault.
>   KVM: VMX: Introduce ioctls to set/get Sub-Page Write Protection.
>   KVM: VMX: Update the EPT leaf entry indicated with the SPP enable bit.
>   KVM: VMX: Added setup spp page structure.
>   KVM: VMX: implement setup SPP page structure in spp miss.
> 
>  Documentation/virtual/kvm/spp_design_kvm.txt | 272 +++++++++++++++++++++
>  arch/x86/include/asm/cpufeatures.h           |   1 +
>  arch/x86/include/asm/kvm_host.h              |  18 +-
>  arch/x86/include/asm/vmx.h                   |  10 +
>  arch/x86/include/uapi/asm/vmx.h              |   2 +
>  arch/x86/kernel/cpu/intel.c                  |   4 +
>  arch/x86/kvm/mmu.c                           | 340 ++++++++++++++++++++++++++-
>  arch/x86/kvm/mmu.h                           |   1 +
>  arch/x86/kvm/vmx.c                           | 104 ++++++++
>  arch/x86/kvm/x86.c                           |  99 +++++++-
>  include/linux/kvm_host.h                     |   5 +
>  include/uapi/linux/kvm.h                     |  16 ++
>  virt/kvm/kvm_main.c                          |  26 ++
>  13 files changed, 893 insertions(+), 5 deletions(-)
>  create mode 100644 Documentation/virtual/kvm/spp_design_kvm.txt
> 
> -- 
> 2.7.4
> 
Adding Ravi, 

Does anyone have further comments on the current implementation? It is an
important feature in our next-generation chipset.

Regards
Yi.


* Re: [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
  2017-11-04  0:12 ` Yi Zhang
@ 2017-11-04 16:54   ` Paolo Bonzini
  2017-11-10 15:39     ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2017-11-04 16:54 UTC (permalink / raw)
  To: kvm, linux-kernel, rkrcmar, ravi.sahita

On 04/11/2017 01:12, Yi Zhang wrote:
>>
> Adding Ravi, 
> 
> Does anyone have further comments on current implementation, it is a
> important feature in our next generation chip-set.

What matters is not the feature, but the use case; without a use case,
there is no point in including code for SPP in KVM.  KVM doesn't use
VMFUNC or #VE for example, because they are not necessary.

SPP may become useful once we have the introspection interface.  Or, if
another hypervisor uses it, support for nested SPP may be useful (for
example we support nested VMFUNC and should get nested #VE sooner or
later, even though the features are not used on bare metal).

Right now, however, supporting SPP does not seem to be particularly
important honestly.

Paolo


* Re: [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
  2017-11-04 16:54   ` Paolo Bonzini
@ 2017-11-10 15:39     ` Paolo Bonzini
  2017-11-13 10:37       ` Yi Zhang
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2017-11-10 15:39 UTC (permalink / raw)
  To: kvm, linux-kernel, rkrcmar, ravi.sahita

On 04/11/2017 17:54, Paolo Bonzini wrote:
> On 04/11/2017 01:12, Yi Zhang wrote:
>>>
>> Adding Ravi, 
>>
>> Does anyone have further comments on current implementation, it is a
>> important feature in our next generation chip-set.
> 
> What matters is not the feature, but the use case; without a use case,
> there is no point in including code for SPP in KVM.  KVM doesn't use
> VMFUNC or #VE for example, because they are not necessary.
> 
> SPP may become useful once we have the introspection interface.  Or, if
> another hypervisor uses it, support for nested SPP may be useful (for
> example we support nested VMFUNC and should get nested #VE sooner or
> later, even though the features are not used on bare metal).
> 
> Right now, however, supporting SPP does not seem to be particularly
> important honestly.

Hi Yi Zhang,

are you going to work on nested SPP?  I guess that would be most useful
way to add SPP support to KVM (and you could also test it with
kvm-unit-tests).

Thanks,

Paolo


* Re: [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support.
  2017-11-10 15:39     ` Paolo Bonzini
@ 2017-11-13 10:37       ` Yi Zhang
  0 siblings, 0 replies; 29+ messages in thread
From: Yi Zhang @ 2017-11-13 10:37 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel, rkrcmar, ravi.sahita

On 2017-11-10 at 16:39:27 +0100, Paolo Bonzini wrote:
> On 04/11/2017 17:54, Paolo Bonzini wrote:
> > On 04/11/2017 01:12, Yi Zhang wrote:
> >>>
> >> Adding Ravi, 
> >>
> >> Does anyone have further comments on current implementation, it is a
> >> important feature in our next generation chip-set.
> > 
> > What matters is not the feature, but the use case; without a use case,
> > there is no point in including code for SPP in KVM.  KVM doesn't use
> > VMFUNC or #VE for example, because they are not necessary.
> > 
> > SPP may become useful once we have the introspection interface.  Or, if
> > another hypervisor uses it, support for nested SPP may be useful (for
> > example we support nested VMFUNC and should get nested #VE sooner or
> > later, even though the features are not used on bare metal).
> > 
> > Right now, however, supporting SPP does not seem to be particularly
> > important honestly.
> 
> Hi Yi Zhang,
> 
> are you going to work on nested SPP?  I guess that would be most useful
> way to add SPP support to KVM (and you could also test it with
> kvm-unit-tests).
Hi Paolo,

We haven't planned for nested support yet. So far a lot of hardware
assistance work has gone into the current SPP implementation, and it
will be applied in the next-generation Ice Lake chipset.

Regards
Yi.
> 
> Thanks,
> 
> Paolo



Thread overview: 29+ messages
2017-10-13 23:11 [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support Zhang Yi
2017-10-13 16:57 ` Jim Mattson
2017-10-13 21:13   ` Paolo Bonzini
2017-10-16  0:08     ` Yi Zhang
2017-10-18  9:35       ` Paolo Bonzini
2017-10-18 14:07         ` Yi Zhang
2017-10-19 11:57           ` Paolo Bonzini
2017-10-20  8:51             ` Yi Zhang
2017-10-18 14:13         ` Mihai Donțu
2017-10-20  8:47           ` Yi Zhang
2017-10-20 17:06             ` Mihai Donțu
2017-10-24  7:52               ` Yi Zhang
2017-10-16  0:01   ` Yi Zhang
2017-10-13 23:12 ` [PATCH RFC 01/10] KVM: VMX: Added EPT Subpage Protection Documentation Zhang Yi
2017-10-13 23:12 ` [PATCH RFC 02/10] x86/cpufeature: Add intel Sub-Page Protection to CPU features Zhang Yi
2017-10-13 23:13 ` [PATCH RFC 03/10] KVM: VMX: Added VMX SPP feature flags and VM-Execution Controls Zhang Yi
2017-10-13 23:13 ` [PATCH RFC 04/10] KVM: VMX: Introduce the SPPTP and SPP page table Zhang Yi
2017-10-13 23:14 ` [PATCH RFC 05/10] KVM: VMX: Introduce SPP-Induced vm exit and it's handle Zhang Yi
2017-10-13 23:14 ` [PATCH RFC 06/10] KVM: VMX: Added handle of SPP write protection fault Zhang Yi
2017-10-13 23:14 ` [PATCH RFC 07/10] KVM: VMX: Introduce ioctls to set/get Sub-Page Write Protection Zhang Yi
2017-10-13 23:14 ` [PATCH RFC 08/10] KVM: VMX: Update the EPT leaf entry indicated with the SPP enable bit Zhang Yi
2017-10-13 23:14 ` [PATCH RFC 09/10] KVM: VMX: Added setup spp page structure Zhang Yi
2017-10-13 23:16 ` [PATCH RFC 10/10] KVM: VMX: implement setup SPP page structure in spp miss Zhang Yi
2017-10-18  7:09 ` [PATCH RFC 00/10] Intel EPT-Based Sub-page Write Protection Support Christoph Hellwig
2017-10-18 14:02   ` Yi Zhang
2017-11-04  0:12 ` Yi Zhang
2017-11-04 16:54   ` Paolo Bonzini
2017-11-10 15:39     ` Paolo Bonzini
2017-11-13 10:37       ` Yi Zhang
