* [RFC PATCH part-5 00/22] VMX emulation
@ 2023-03-12 18:02 Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 01/22] pkvm: x86: Add memcpy lib Jason Chen CJ
                   ` (22 more replies)
  0 siblings, 23 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ

This patch set is part 5 of the RFC series. It introduces VMX
emulation for pKVM on the Intel platform.

The host VM wants the capability to run its own guests, so it needs VMX
support.

pKVM is designed to emulate VMX for the host VM based on shadow VMCS.
This requires support for the "VMCS shadowing" feature in the VMX
secondary processor-based VM-execution controls field [1].

An alternative way to emulate VMX is based on the enlightened VMCS
(evmcs) introduced by Hyper-V nesting support. evmcs uses normal memory
reads/writes instead of VMWRITE/VMREAD instructions, so it is a
flexible software solution to emulate VMX and does not need the "VMCS
shadowing" feature; however, making evmcs work for pKVM would require
refactoring the KVM Hyper-V code. To avoid changing that part of the
code, we choose to use shadow VMCS in this RFC.

    +--------------------+   +-----------------+
    |     host VM        |   |   guest VM      |
    |                    |   |                 |
    |        +---------+ |   |                 |
    |        | vmcs12* | |   |                 |
    |        +---------+ |   |                 |
    +--------------------+   +-----------------+
    +------------------------------------------+       +---------+
    |     +---------+         +---------+      |       | shadow  |
    |     | vmcs01* |         | vmcs02* +------+---+-->|  vcpu   |
    |     +---------+         +---------+      |   |   |  state  |
    |                      +---------------+   |   |   +---------+
    |                      | cached_vmcs12 +---+---+
    | pKVM                 +---------------+   |
    +------------------------------------------+

 [*]vmcs12: virtual vmcs of a nested guest
 [*]vmcs02: vmcs of a nested guest
 [*]vmcs01: vmcs of host VM

"VMCS shadowing" use a shadow vmcs page (vmcs02) to cache vmcs fields
accessing from host VM through VMWRITE/VMREAD, avoid causing vmexit.
The fields cached in vmcs02 is pre-defined by VMREAD/VMWRITE bitmap.
Meanwhile for other fields not in VMREAD/VMWRITE bitmap, accessing from
host VM cause VMREAD/VMWRITE vmexit, pKVM need to cache them in another
place - cached_vmcs12 is introduced for this purpose.
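
As an illustration, a minimal sketch of how a VMWRITE vmexit for a
non-shadowed field could be handled (the helper names below are
hypothetical; only the flow follows the design described above):

    /*
     * A VMWRITE from the host VM caused a vmexit because the field is
     * not set in the VMWRITE bitmap, i.e. it is not shadowed by vmcs02.
     */
    static void handle_nonshadow_vmwrite(struct shadow_vcpu_state *shadow_vcpu,
                                         unsigned long field, u64 value)
    {
            /* non-shadowed fields live in the software cache ... */
            cached_vmcs12_write(shadow_vcpu, field, value);      /* hypothetical */

            /*
             * ... and are marked dirty so they can be flushed into vmcs02
             * before the next vmlaunch/vmresume of the nested guest.
             */
            mark_cached_field_dirty(shadow_vcpu, field);         /* hypothetical */
    }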

The vmcs02 page used in root mode is kept in the structure
shadow_vcpu_state, which is allocated and then donated by the host VM
when it initializes the vcpus for its launched (nested) guest. The same
applies to the cached_vmcs12 field.

pKVM uses vmcs02 for two purposes: one, mentioned above, is as the
shadow vmcs page of the nested guest while the host VM programs its vmcs
fields; the other is as the ordinary (active) vmcs for the same guest
during vmlaunch/vmresume.

For a nested guest, while the host VM programs its vmcs, its virtual
vmcs (vmcs12) is, per the above, saved in two places: vmcs02 for the
shadowed fields and cached_vmcs12 for the non-shadowed fields. The
cached_vmcs12 fields in turn fall into two groups: emulated fields and
host-state fields. The emulated fields are mostly security-related
control fields, which shall be emulated to their physical values and
filled into vmcs02 before vmcs02 becomes active for vmlaunch/vmresume of
the nested guest. The host-state fields are the guest state of the host
vcpu; they shall be restored to the guest-state fields of the host
vcpu's vmcs (vmcs01) before returning to the host VM.

Below is a summary of the contents of the different vmcs field groups
in each of the above-mentioned vmcs:

              host state      guest state          control
 ---------------------------------------------------------------
 vmcs12:      host VM's     nested guest's     set by host VM
 vmcs02:       pKVM's       nested guest's   set by host VM + pKVM*
 vmcs01:       pKVM's        host VM's          set by pKVM

 [*]the security related control fields of vmcs02 are controlled by pKVM
  (e.g., EPT_POINTER)

Below is the brief vmcs emulation method for the different vmcs field
groups of a nested guest:

                host state      guest state   security related control
 ---------------------------------------------------------------------
 virtual vmcs:  cached_vmcs12*     vmcs02*          emulated*

 [*]cached_vmcs12: vmexit, then set/get the value to/from cached_vmcs12
 [*]vmcs02:        no vmexit; directly shadowed through vmcs02
 [*]emulated:      vmexit, then do the emulation

The vmcs02 & cached_vmcs12 is sync back to vmcs12 during VMCLEAR
emulation, and updated from vmcs12 when emulating VMPTRLD. And before
the nested guest vmentry(vmlaunch/vmresume emulation), the vmcs02 is
further sync dirty fields(caused by vmwrite) from cached_vmcs12 and
update emulated fields through emulation.
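
In other words, the rough lifecycle of the vmcs02/cached_vmcs12 pair is
(the step names are descriptive, not the actual functions of this
series):

    VMPTRLD(vmcs12):    copy shadowed fields vmcs12 -> vmcs02,
                        copy non-shadowed fields vmcs12 -> cached_vmcs12
    VMWRITE vmexit:     update cached_vmcs12, mark the field dirty
    vmlaunch/vmresume:  flush dirty cached_vmcs12 fields -> vmcs02,
                        refresh the emulated control fields -> vmcs02,
                        run the nested guest with vmcs02 as the active vmcs
    VMCLEAR(vmcs12):    sync vmcs02 + cached_vmcs12 back -> vmcs12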

INVEPT/INVVPID is for now emulated in a simplified way by doing a
global INVEPT.

VMX MSRs are emulated by pKVM as well to expose the VMX capabilities
to the host VM; the PT, SMM, VMCS shadowing and VMFUNC features are
filtered out.
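
For example (illustrative only, assuming a hypothetical helper to read
the physical MSR value), the value reported for
MSR_IA32_VMX_PROCBASED_CTLS2 would get its VMCS-shadowing allowed-1 bit
(in the high 32 bits) cleared before being returned to the host VM:

    u64 val = read_phys_vmx_msr(MSR_IA32_VMX_PROCBASED_CTLS2);  /* hypothetical */

    /* hide "VMCS shadowing" from the host VM */
    val &= ~((u64)SECONDARY_EXEC_SHADOW_VMCS << 32);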

[1]: SDM: Virtual Machine Control Structures chapter, VMCS TYPES.

Haiwei Li (2):
  pkvm: x86: Do guest address translation per page granularity
  pkvm: x86: Add check for guest address translation

Jason Chen CJ (19):
  pkvm: x86: Add memcpy lib
  pkvm: x86: Add memory operation APIs for host VM
  pkvm: x86: Add hypercalls for shadow_vm/vcpu init & teardown
  KVM: VMX: Add new kvm_x86_ops vm_free
  KVM: VMX: Add initialization/teardown for shadow vm/vcpu
  pkvm: x86: Add hash table mapping for shadow vcpu based on vmcs12_pa
  pkvm: x86: Add VMXON/VMXOFF emulation
  KVM: VMX: Add more vmcs and vmcs12 fields definition
  pkvm: x86: Init vmcs read/write bitmap for vmcs emulation
  pkvm: x86: Initialize emulated fields for vmcs emulation
  pkvm: x86: Add msr ops for pKVM hypervisor
  pkvm: x86: Move _init_host_state_area to pKVM hypervisor
  pkvm: x86: Add vmcs_load/clear_track APIs
  pkvm: x86: Add VMPTRLD/VMCLEAR emulation
  pkvm: x86: Add VMREAD/VMWRITE emulation
  pkvm: x86: Add VMLAUNCH/VMRESUME emulation
  pkvm: x86: Add INVEPT/INVVPID emulation
  pkvm: x86: Initialize msr_bitmap for vmsr
  pkvm: x86: Add vmx msr emulation

Tina Zhang (1):
  pkvm: x86: Add has_vmcs_field() API for physical vmx capability check

 arch/x86/include/asm/kvm-x86-ops.h            |    1 +
 arch/x86/include/asm/kvm_host.h               |    5 +
 arch/x86/include/asm/kvm_pkvm.h               |   14 +
 arch/x86/include/asm/pkvm_image_vars.h        |    3 +-
 arch/x86/include/asm/vmx.h                    |    4 +
 arch/x86/kvm/vmx/pkvm/hyp/Makefile            |    6 +-
 arch/x86/kvm/vmx/pkvm/hyp/cpu.h               |   23 +
 arch/x86/kvm/vmx/pkvm/hyp/init_finalise.c     |    3 +
 arch/x86/kvm/vmx/pkvm/hyp/lib/memcpy_64.S     |   26 +
 arch/x86/kvm/vmx/pkvm/hyp/memory.c            |  216 ++++
 arch/x86/kvm/vmx/pkvm/hyp/memory.h            |   11 +
 arch/x86/kvm/vmx/pkvm/hyp/nested.c            | 1030 +++++++++++++++++
 arch/x86/kvm/vmx/pkvm/hyp/nested.h            |   27 +
 arch/x86/kvm/vmx/pkvm/hyp/pkvm.c              |  342 ++++++
 arch/x86/kvm/vmx/pkvm/hyp/pkvm_hyp.h          |   82 ++
 .../vmx/pkvm/hyp/pkvm_nested_vmcs_fields.h    |  195 ++++
 arch/x86/kvm/vmx/pkvm/hyp/vmexit.c            |  174 ++-
 arch/x86/kvm/vmx/pkvm/hyp/vmsr.c              |   88 ++
 arch/x86/kvm/vmx/pkvm/hyp/vmsr.h              |   11 +
 arch/x86/kvm/vmx/pkvm/hyp/vmx.c               |   77 ++
 arch/x86/kvm/vmx/pkvm/hyp/vmx.h               |   23 +
 arch/x86/kvm/vmx/pkvm/include/pkvm.h          |    5 +
 arch/x86/kvm/vmx/pkvm/pkvm_constants.c        |    4 +
 arch/x86/kvm/vmx/pkvm/pkvm_host.c             |  181 +--
 arch/x86/kvm/vmx/vmcs12.c                     |    6 +
 arch/x86/kvm/vmx/vmcs12.h                     |   16 +-
 arch/x86/kvm/vmx/vmx.c                        |   14 +-
 arch/x86/kvm/x86.c                            |    1 +
 include/linux/kvm_host.h                      |    8 +
 29 files changed, 2459 insertions(+), 137 deletions(-)
 create mode 100644 arch/x86/kvm/vmx/pkvm/hyp/lib/memcpy_64.S
 create mode 100644 arch/x86/kvm/vmx/pkvm/hyp/nested.c
 create mode 100644 arch/x86/kvm/vmx/pkvm/hyp/nested.h
 create mode 100644 arch/x86/kvm/vmx/pkvm/hyp/pkvm_nested_vmcs_fields.h
 create mode 100644 arch/x86/kvm/vmx/pkvm/hyp/vmsr.c
 create mode 100644 arch/x86/kvm/vmx/pkvm/hyp/vmsr.h
 create mode 100644 arch/x86/kvm/vmx/pkvm/hyp/vmx.c

-- 
2.25.1


* [RFC PATCH part-5 01/22] pkvm: x86: Add memcpy lib
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
@ 2023-03-12 18:02 ` Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 02/22] pkvm: x86: Add memory operation APIs for host VM Jason Chen CJ
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ

pKVM needs its own memcpy library; it cannot directly use
arch/x86/lib/memcpy_64.S, as that implementation relies on the
ALTERNATIVE section, which pKVM does not support yet.

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/x86/kvm/vmx/pkvm/hyp/Makefile        |  1 +
 arch/x86/kvm/vmx/pkvm/hyp/lib/memcpy_64.S | 26 +++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/arch/x86/kvm/vmx/pkvm/hyp/Makefile b/arch/x86/kvm/vmx/pkvm/hyp/Makefile
index fe852bd43a7e..9c410ec96f45 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/Makefile
+++ b/arch/x86/kvm/vmx/pkvm/hyp/Makefile
@@ -17,6 +17,7 @@ pkvm-hyp-y	:= vmx_asm.o vmexit.o memory.o early_alloc.o pgtable.o mmu.o pkvm.o \
 ifndef CONFIG_PKVM_INTEL_DEBUG
 lib-dir		:= lib
 pkvm-hyp-y	+= $(lib-dir)/memset_64.o
+pkvm-hyp-y	+= $(lib-dir)/memcpy_64.o
 pkvm-hyp-$(CONFIG_RETPOLINE)	+= $(lib-dir)/retpoline.o
 pkvm-hyp-$(CONFIG_DEBUG_LIST)	+= $(lib-dir)/list_debug.o
 endif
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/lib/memcpy_64.S b/arch/x86/kvm/vmx/pkvm/hyp/lib/memcpy_64.S
new file mode 100644
index 000000000000..b976f646d352
--- /dev/null
+++ b/arch/x86/kvm/vmx/pkvm/hyp/lib/memcpy_64.S
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2002 Andi Kleen */
+
+#include <linux/linkage.h>
+
+/*
+ * memcpy - Copy a memory block.
+ *
+ * Input:
+ *  rdi destination
+ *  rsi source
+ *  rdx count
+ *
+ * Output:
+ * rax original destination
+ *
+ * This is enhanced fast string memcpy. It is faster and
+ * simpler than old memcpy.
+ */
+
+SYM_FUNC_START(memcpy)
+	movq %rdi, %rax
+	movq %rdx, %rcx
+	rep movsb
+	RET
+SYM_FUNC_END(memcpy)
-- 
2.25.1


* [RFC PATCH part-5 02/22] pkvm: x86: Add memory operation APIs for host VM
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 01/22] pkvm: x86: Add memcpy lib Jason Chen CJ
@ 2023-03-12 18:02 ` Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 03/22] pkvm: x86: Do guest address translation per page granularity Jason Chen CJ
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ

Add the below memory operation APIs for the host VM:
- gva2gpa
- read_gva/write_gva
- read_gpa/write_gpa
These ops will be used later for vmx instruction emulation; for example,
the vmxon instruction takes the pointer of the vmxon region from guest
memory, which means pKVM needs to read its content from a gva.
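
For illustration, a minimal sketch of how a later VMXON emulation path
could use these APIs to fetch the vmxon-region pointer (operand_gva and
the error handling are hypothetical here; only the call pattern
matters):

    gpa_t vmxon_ptr;
    struct x86_exception e;

    /* read the 64-bit vmxon-region pointer operand from guest memory */
    if (read_gva(vcpu, operand_gva, &vmxon_ptr, sizeof(vmxon_ptr), &e) < 0)
            return -EFAULT;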

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/x86/kvm/vmx/pkvm/hyp/memory.c | 106 +++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/pkvm/hyp/memory.h |  11 +++
 arch/x86/kvm/vmx/pkvm/hyp/vmexit.c |   1 +
 3 files changed, 118 insertions(+)

diff --git a/arch/x86/kvm/vmx/pkvm/hyp/memory.c b/arch/x86/kvm/vmx/pkvm/hyp/memory.c
index d3e479860189..e99fa72cedac 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/memory.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/memory.c
@@ -6,7 +6,10 @@
 #include <linux/types.h>
 #include <asm/kvm_pkvm.h>
 
+#include <pkvm.h>
 #include "memory.h"
+#include "pgtable.h"
+#include "pkvm_hyp.h"
 
 unsigned long __page_base_offset;
 unsigned long __symbol_base_offset;
@@ -63,3 +66,106 @@ bool mem_range_included(struct mem_range *child, struct mem_range *parent)
 {
 	return parent->start <= child->start && child->end <= parent->end;
 }
+
+void *host_gpa2hva(unsigned long gpa)
+{
+	/* host gpa = hpa */
+	return pkvm_phys_to_virt(gpa);
+}
+
+extern struct pkvm_pgtable_ops mmu_ops;
+static struct pkvm_mm_ops mm_ops = {
+	.phys_to_virt = host_gpa2hva,
+};
+
+static int check_translation(struct kvm_vcpu *vcpu, gpa_t gpa,
+		u64 prot, u32 access, struct x86_exception *exception)
+{
+	/* TODO: exception for #PF */
+	return 0;
+}
+
+int gva2gpa(struct kvm_vcpu *vcpu, gva_t gva, gpa_t *gpa,
+		u32 access, struct x86_exception *exception)
+{
+	struct pkvm_pgtable guest_mmu;
+	gpa_t _gpa;
+	u64 prot;
+	int pg_level;
+
+	/* caller should ensure exception is not NULL */
+	WARN_ON(exception == NULL);
+
+	memset(exception, 0, sizeof(*exception));
+
+	/*TODO: support other paging mode beside long mode */
+	guest_mmu.root_pa = vcpu->arch.cr3 & PAGE_MASK;
+	pkvm_pgtable_init(&guest_mmu, &mm_ops, &mmu_ops, &pkvm_hyp->mmu_cap, false);
+	pkvm_pgtable_lookup(&guest_mmu, (unsigned long)gva,
+			(unsigned long *)&_gpa, &prot, &pg_level);
+	*gpa = _gpa;
+	if (_gpa == INVALID_ADDR)
+		return -EFAULT;
+
+	return check_translation(vcpu, _gpa, prot, access, exception);
+}
+
+/* only support host VM now */
+static int copy_gva(struct kvm_vcpu *vcpu, gva_t gva, void *addr,
+		unsigned int bytes, struct x86_exception *exception, bool from_guest)
+{
+	u32 access = VMX_AR_DPL(vmcs_read32(GUEST_SS_AR_BYTES)) == 3 ? PFERR_USER_MASK : 0;
+	gpa_t gpa;
+	void *hva;
+	int ret;
+
+	/*FIXME: need check the gva per page granularity */
+	ret = gva2gpa(vcpu, gva, &gpa, access, exception);
+	if (ret)
+		return ret;
+
+	hva = host_gpa2hva(gpa);
+	if (from_guest)
+		memcpy(addr, hva, bytes);
+	else
+		memcpy(hva, addr, bytes);
+
+	return bytes;
+}
+
+int read_gva(struct kvm_vcpu *vcpu, gva_t gva, void *addr,
+		unsigned int bytes, struct x86_exception *exception)
+{
+	return copy_gva(vcpu, gva, addr, bytes, exception, true);
+}
+
+int write_gva(struct kvm_vcpu *vcpu, gva_t gva, void *addr,
+		unsigned int bytes, struct x86_exception *exception)
+{
+	return copy_gva(vcpu, gva, addr, bytes, exception, false);
+}
+
+/* only support host VM now */
+static int copy_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, void *addr,
+		unsigned int bytes, bool from_guest)
+{
+	void *hva;
+
+	hva = host_gpa2hva(gpa);
+	if (from_guest)
+		memcpy(addr, hva, bytes);
+	else
+		memcpy(hva, addr, bytes);
+
+	return bytes;
+}
+
+int read_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, void *addr, unsigned int bytes)
+{
+	return copy_gpa(vcpu, gpa, addr, bytes, true);
+}
+
+int write_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, void *addr, unsigned int bytes)
+{
+	return copy_gpa(vcpu, gpa, addr, bytes, false);
+}
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/memory.h b/arch/x86/kvm/vmx/pkvm/hyp/memory.h
index c9175272096b..4a75d8dff1b3 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/memory.h
+++ b/arch/x86/kvm/vmx/pkvm/hyp/memory.h
@@ -20,4 +20,15 @@ struct mem_range {
 bool find_mem_range(unsigned long addr, struct mem_range *range);
 bool mem_range_included(struct mem_range *child, struct mem_range *parent);
 
+#include <linux/kvm_host.h>
+void *host_gpa2hva(unsigned long gpa);
+int gva2gpa(struct kvm_vcpu *vcpu, gva_t gva, gpa_t *gpa,
+		u32 access, struct x86_exception *exception);
+int read_gva(struct kvm_vcpu *vcpu, gva_t gva, void *addr,
+		unsigned int bytes, struct x86_exception *exception);
+int write_gva(struct kvm_vcpu *vcpu, gva_t gva, void *addr,
+		unsigned int bytes, struct x86_exception *exception);
+int read_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, void *addr, unsigned int bytes);
+int write_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, void *addr, unsigned int bytes);
+
 #endif
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c b/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
index e8015a6830b0..02224d93384a 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
@@ -154,6 +154,7 @@ int pkvm_main(struct kvm_vcpu *vcpu)
 		}
 
 		vcpu->arch.cr2 = native_read_cr2();
+		vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
 
 		vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
 		vmx->exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
-- 
2.25.1


* [RFC PATCH part-5 03/22] pkvm: x86: Do guest address translation per page granularity
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 01/22] pkvm: x86: Add memcpy lib Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 02/22] pkvm: x86: Add memory operation APIs for host VM Jason Chen CJ
@ 2023-03-12 18:02 ` Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 04/22] pkvm: x86: Add check for guest address translation Jason Chen CJ
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Haiwei Li

From: Haiwei Li <haiwei.li@intel.com>

Guest memory operations like read_gva/write_gva/read_gpa/write_gpa only
do address translation for the current page. This is not correct if
such an operation accesses data that crosses the current page boundary.

Fix the above issue in these functions by doing address translation at
page granularity.
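
For example, with 4KiB pages a 16-byte access starting at gpa 0x1ff8 is
now split into two chunks: the first 8 bytes up to the page boundary
(offset_in_pg = 0xff8, len = 0x1000 - 0xff8 = 8), and the remaining
8 bytes handled in a second iteration on the next page (with its own
translation in the gva case).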

Signed-off-by: Haiwei Li <haiwei.li@intel.com>
---
 arch/x86/kvm/vmx/pkvm/hyp/memory.c | 65 ++++++++++++++++++++----------
 1 file changed, 44 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/vmx/pkvm/hyp/memory.c b/arch/x86/kvm/vmx/pkvm/hyp/memory.c
index e99fa72cedac..a42669ccf89c 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/memory.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/memory.c
@@ -110,27 +110,47 @@ int gva2gpa(struct kvm_vcpu *vcpu, gva_t gva, gpa_t *gpa,
 	return check_translation(vcpu, _gpa, prot, access, exception);
 }
 
-/* only support host VM now */
-static int copy_gva(struct kvm_vcpu *vcpu, gva_t gva, void *addr,
-		unsigned int bytes, struct x86_exception *exception, bool from_guest)
+static inline int __copy_gpa(struct kvm_vcpu *vcpu, void *addr, gpa_t gpa,
+			     unsigned int size, unsigned int pg_size,
+			     bool from_guest)
 {
-	u32 access = VMX_AR_DPL(vmcs_read32(GUEST_SS_AR_BYTES)) == 3 ? PFERR_USER_MASK : 0;
-	gpa_t gpa;
+	unsigned int len, offset_in_pg;
 	void *hva;
-	int ret;
 
-	/*FIXME: need check the gva per page granularity */
-	ret = gva2gpa(vcpu, gva, &gpa, access, exception);
-	if (ret)
-		return ret;
+	offset_in_pg = (unsigned int)gpa & (pg_size - 1);
+	len = (size > (pg_size - offset_in_pg)) ? (pg_size - offset_in_pg) : size;
 
 	hva = host_gpa2hva(gpa);
 	if (from_guest)
-		memcpy(addr, hva, bytes);
+		memcpy(addr, hva, len);
 	else
-		memcpy(hva, addr, bytes);
+		memcpy(hva, addr, len);
 
-	return bytes;
+	return len;
+}
+
+/* only support host VM now */
+static int copy_gva(struct kvm_vcpu *vcpu, gva_t gva, void *addr,
+		unsigned int bytes, struct x86_exception *exception, bool from_guest)
+{
+	u32 access = VMX_AR_DPL(vmcs_read32(GUEST_SS_AR_BYTES)) == 3 ? PFERR_USER_MASK : 0;
+	gpa_t gpa;
+	unsigned int len;
+	int ret = 0;
+
+	while ((bytes > 0) && (ret == 0)) {
+		ret = gva2gpa(vcpu, gva, &gpa, access, exception);
+		if (ret >= 0) {
+			len = __copy_gpa(vcpu, addr, gpa, bytes, PAGE_SIZE, from_guest);
+			if (len == 0)
+				return -EINVAL;
+			gva += len;
+			addr += len;
+			bytes -= len;
+		}
+	}
+
+	return ret;
 }
 
 int read_gva(struct kvm_vcpu *vcpu, gva_t gva, void *addr,
@@ -149,15 +169,18 @@ int write_gva(struct kvm_vcpu *vcpu, gva_t gva, void *addr,
 static int copy_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, void *addr,
 		unsigned int bytes, bool from_guest)
 {
-	void *hva;
-
-	hva = host_gpa2hva(gpa);
-	if (from_guest)
-		memcpy(addr, hva, bytes);
-	else
-		memcpy(hva, addr, bytes);
+	unsigned int len;
+
+	while (bytes > 0) {
+		len = __copy_gpa(vcpu, addr, gpa, bytes, PAGE_SIZE, from_guest);
+		if (len == 0)
+			return -EINVAL;
+		gpa += len;
+		addr += len;
+		bytes -= len;
+	}
 
-	return bytes;
+	return 0;
 }
 
 int read_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, void *addr, unsigned int bytes)
-- 
2.25.1


* [RFC PATCH part-5 04/22] pkvm: x86: Add check for guest address translation
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (2 preceding siblings ...)
  2023-03-12 18:02 ` [RFC PATCH part-5 03/22] pkvm: x86: Do guest address translation per page granularity Jason Chen CJ
@ 2023-03-12 18:02 ` Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 05/22] pkvm: x86: Add hypercalls for shadow_vm/vcpu init & teardown Jason Chen CJ
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Haiwei Li

From: Haiwei Li <haiwei.li@intel.com>

During guest address translation, it needs to be checked whether an
exception occurs, triggered by an invalid address or a permission
violation.

Callers who invoke read_gva/write_gva should check the `exception` and
handle it, e.g. by injecting a #PF into the guest.
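
A minimal caller sketch (inject_pf() is a hypothetical helper, shown
only to illustrate the expected pattern):

    struct x86_exception e;

    if (read_gva(vcpu, gva, buf, len, &e) < 0) {
            /* translation or permission check failed: reflect the fault */
            if (e.vector == PF_VECTOR)
                    inject_pf(vcpu, &e);    /* hypothetical */
            return;
    }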

Signed-off-by: Haiwei Li <haiwei.li@intel.com>
---
 arch/x86/kvm/vmx/pkvm/hyp/memory.c | 97 ++++++++++++++++++++++++++++--
 1 file changed, 92 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/pkvm/hyp/memory.c b/arch/x86/kvm/vmx/pkvm/hyp/memory.c
index a42669ccf89c..6a400aef1bd8 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/memory.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/memory.c
@@ -78,11 +78,97 @@ static struct pkvm_mm_ops mm_ops = {
 	.phys_to_virt = host_gpa2hva,
 };
 
-static int check_translation(struct kvm_vcpu *vcpu, gpa_t gpa,
+static int check_translation(struct kvm_vcpu *vcpu, gva_t gva, gpa_t gpa,
 		u64 prot, u32 access, struct x86_exception *exception)
 {
-	/* TODO: exception for #PF */
+	u16 errcode = 0;
+	bool page_rw_flags_on = true;
+	bool user_mode_addr = true;
+	const int user_mode_access = access & PFERR_USER_MASK;
+	const int write_access = access & PFERR_WRITE_MASK;
+	bool cr4_smap = vmcs_readl(GUEST_CR4) & X86_CR4_SMAP;
+	bool cr0_wp = vmcs_readl(GUEST_CR0) & X86_CR0_WP;
+
+	/*
+	 * As pkvm hypervisor will not do instruction emulation, here we do not
+	 * expect guest memory access for instruction fetch.
+	 */
+	WARN_ON(access & PFERR_FETCH_MASK);
+
+	/* pte is not present */
+	if (gpa == INVALID_ADDR) {
+		goto check_fault;
+	} else {
+		errcode |= PFERR_PRESENT_MASK;
+
+		/*TODO: check reserved bits and PK */
+
+		/* check for R/W */
+		if ((prot & _PAGE_RW) == 0) {
+			if (write_access && (user_mode_access || cr0_wp))
+				/*
+				 * case 1: Supermode and wp is 1
+				 * case 2: Usermode
+				 */
+				goto check_fault;
+			page_rw_flags_on = false;
+		}
+
+		/* check for U/S */
+		if ((prot & _PAGE_USER) == 0) {
+			user_mode_addr = false;
+			if (user_mode_access)
+				goto check_fault;
+		}
+
+		/*
+		 * When SMAP is on, we only need to apply check when address is
+		 * user-mode address.
+		 *
+		 * Also SMAP only impacts the supervisor-mode access.
+		 */
+		/* if SMAP is enabled and supervisor-mode access */
+		if (cr4_smap && (!user_mode_access) && user_mode_addr) {
+			bool acflag = vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_AC;
+
+			/* read from user mode address, eflags.ac = 0 */
+			if ((!write_access) && (!acflag)) {
+				goto check_fault;
+			} else if (write_access) {
+				/* write to user mode address */
+
+				/* cr0.wp = 0, eflags.ac = 0 */
+				if ((!cr0_wp) && (!acflag))
+					goto check_fault;
+
+				/*
+				 * cr0.wp = 1, eflags.ac = 1, r/w flag is 0
+				 * on any paging structure entry
+				 */
+				if (cr0_wp && acflag && (!page_rw_flags_on))
+					goto check_fault;
+
+				/* cr0.wp = 1, eflags.ac = 0 */
+				if (cr0_wp && (!acflag))
+					goto check_fault;
+			} else {
+				/* do nothing */
+			}
+		}
+	}
+
 	return 0;
+
+check_fault:
+	errcode |= write_access | user_mode_access;
+	exception->error_code = errcode;
+	exception->vector = PF_VECTOR;
+	exception->error_code_valid = true;
+	exception->address = gva;
+	exception->nested_page_fault = false;
+	exception->async_page_fault = false;
+	return -EFAULT;
+
 }
 
 int gva2gpa(struct kvm_vcpu *vcpu, gva_t gva, gpa_t *gpa,
@@ -104,10 +190,8 @@ int gva2gpa(struct kvm_vcpu *vcpu, gva_t gva, gpa_t *gpa,
 	pkvm_pgtable_lookup(&guest_mmu, (unsigned long)gva,
 			(unsigned long *)&_gpa, &prot, &pg_level);
 	*gpa = _gpa;
-	if (_gpa == INVALID_ADDR)
-		return -EFAULT;
 
-	return check_translation(vcpu, _gpa, prot, access, exception);
+	return check_translation(vcpu, gva, _gpa, prot, access, exception);
 }
 
 static inline int __copy_gpa(struct kvm_vcpu *vcpu, void *addr, gpa_t gpa,
@@ -138,6 +222,9 @@ static int copy_gva(struct kvm_vcpu *vcpu, gva_t gva, void *addr,
 	unsigned int len;
 	int ret = 0;
 
+	if (!from_guest)
+		access |= PFERR_WRITE_MASK;
+
 	while ((bytes > 0) && (ret == 0)) {
 		ret = gva2gpa(vcpu, gva, &gpa, access, exception);
 		if (ret >= 0) {
-- 
2.25.1


* [RFC PATCH part-5 05/22] pkvm: x86: Add hypercalls for shadow_vm/vcpu init & teardown
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (3 preceding siblings ...)
  2023-03-12 18:02 ` [RFC PATCH part-5 04/22] pkvm: x86: Add check for guest address translation Jason Chen CJ
@ 2023-03-12 18:02 ` Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 06/22] KVM: VMX: Add new kvm_x86_ops vm_free Jason Chen CJ
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ, Chuanxiao Dong

The host VM is able to create and launch its guests based on virtual
VMX and virtual EPT. pKVM is expected to provide the necessary emulation
(e.g. VMX, EPT) for the corresponding guest vcpus, so introduce the data
structures pkvm_shadow_vm & shadow_vcpu_state to manage/maintain the
state & information for such emulation of each guest vm/vcpu.

shadow_vm_handle is used as the identifier for a specific shadow vm
created by the host VM; it links to the pkvm_shadow_vm pointer for this
shadow vm, which is followed by a shadow_vcpu array corresponding to
the vcpus created for this vm.

The shadow vm/vcpu data structures for a specific vm are allocated by
the host VM during its initialization, then passed to pKVM through the
newly added hypercalls PKVM_HC_INIT_SHADOW_VM & PKVM_HC_INIT_SHADOW_VCPU.
Meanwhile, the hypercalls PKVM_HC_TEARDOWN_SHADOW_VM &
PKVM_HC_TEARDOWN_SHADOW_VCPU are used by the host VM when it wants to
tear down a created vm.

In the future, after page ownership management is supported in pKVM,
these shadow vm/vcpu data structure pages shall be donated from the host
VM to pKVM at target vm init, and returned from pKVM to the host VM at
vm teardown, through the hypercalls mentioned above.
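
For reference, the shadow_vcpu handle introduced by this patch packs the
shadow_vm_handle into the high 32 bits and the vcpu index into the low
32 bits, e.g. (illustrative values):

    /* shadow_vm_handle 3, vcpu index 2 */
    s64 handle = ((s64)3 << SHADOW_VM_HANDLE_SHIFT) | 2;    /* 0x300000002 */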

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
---
 arch/x86/include/asm/kvm_pkvm.h      |   4 +
 arch/x86/kvm/vmx/pkvm/hyp/Makefile   |   3 +
 arch/x86/kvm/vmx/pkvm/hyp/pkvm.c     | 297 +++++++++++++++++++++++++++
 arch/x86/kvm/vmx/pkvm/hyp/pkvm_hyp.h |  70 +++++++
 arch/x86/kvm/vmx/pkvm/hyp/vmexit.c   |  13 ++
 5 files changed, 387 insertions(+)

diff --git a/arch/x86/include/asm/kvm_pkvm.h b/arch/x86/include/asm/kvm_pkvm.h
index 0142b3dc3c01..6e8fee717e5d 100644
--- a/arch/x86/include/asm/kvm_pkvm.h
+++ b/arch/x86/include/asm/kvm_pkvm.h
@@ -16,6 +16,10 @@
 
 /* PKVM Hypercalls */
 #define PKVM_HC_INIT_FINALISE		1
+#define PKVM_HC_INIT_SHADOW_VM		2
+#define PKVM_HC_INIT_SHADOW_VCPU	3
+#define PKVM_HC_TEARDOWN_SHADOW_VM	4
+#define PKVM_HC_TEARDOWN_SHADOW_VCPU	5
 
 extern struct memblock_region pkvm_sym(hyp_memory)[];
 extern unsigned int pkvm_sym(hyp_memblock_nr);
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/Makefile b/arch/x86/kvm/vmx/pkvm/hyp/Makefile
index 9c410ec96f45..7c6f71f18676 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/Makefile
+++ b/arch/x86/kvm/vmx/pkvm/hyp/Makefile
@@ -16,8 +16,11 @@ pkvm-hyp-y	:= vmx_asm.o vmexit.o memory.o early_alloc.o pgtable.o mmu.o pkvm.o \
 
 ifndef CONFIG_PKVM_INTEL_DEBUG
 lib-dir		:= lib
+lib2-dir	:= ../../../../../../lib
 pkvm-hyp-y	+= $(lib-dir)/memset_64.o
 pkvm-hyp-y	+= $(lib-dir)/memcpy_64.o
+pkvm-hyp-y	+= $(lib2-dir)/find_bit.o
+pkvm-hyp-y	+= $(lib2-dir)/hweight.o
 pkvm-hyp-$(CONFIG_RETPOLINE)	+= $(lib-dir)/retpoline.o
 pkvm-hyp-$(CONFIG_DEBUG_LIST)	+= $(lib-dir)/list_debug.o
 endif
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/pkvm.c b/arch/x86/kvm/vmx/pkvm/hyp/pkvm.c
index a5f776195af6..b110ac43a792 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/pkvm.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/pkvm.c
@@ -5,4 +5,301 @@
 
 #include <pkvm.h>
 
+#include "pkvm_hyp.h"
+
 struct pkvm_hyp *pkvm_hyp;
+
+#define MAX_SHADOW_VMS	255
+#define HANDLE_OFFSET 1
+
+#define to_shadow_vm_handle(vcpu_handle)	((s64)(vcpu_handle) >> SHADOW_VM_HANDLE_SHIFT)
+#define to_shadow_vcpu_idx(vcpu_handle)		((s64)(vcpu_handle) & SHADOW_VCPU_INDEX_MASK)
+
+static DECLARE_BITMAP(shadow_vms_bitmap, MAX_SHADOW_VMS);
+static pkvm_spinlock_t shadow_vms_lock = __PKVM_SPINLOCK_UNLOCKED;
+struct shadow_vm_ref {
+	atomic_t refcount;
+	struct pkvm_shadow_vm *vm;
+};
+static struct shadow_vm_ref shadow_vms_ref[MAX_SHADOW_VMS];
+
+#define SHADOW_VCPU_ARRAY(vm) \
+	((struct shadow_vcpu_array *)((void *)(vm) + sizeof(struct pkvm_shadow_vm)))
+
+static int allocate_shadow_vm_handle(struct pkvm_shadow_vm *vm)
+{
+	struct shadow_vm_ref *vm_ref;
+	int handle;
+
+	/* The shadow_vm_handle is an int so cannot exceed the INT_MAX */
+	BUILD_BUG_ON(MAX_SHADOW_VMS > INT_MAX);
+
+	pkvm_spin_lock(&shadow_vms_lock);
+
+	handle = find_next_zero_bit(shadow_vms_bitmap, MAX_SHADOW_VMS,
+				    HANDLE_OFFSET);
+	if ((u32)handle < MAX_SHADOW_VMS) {
+		__set_bit(handle, shadow_vms_bitmap);
+		vm->shadow_vm_handle = handle;
+		vm_ref = &shadow_vms_ref[handle];
+		vm_ref->vm = vm;
+		atomic_set(&vm_ref->refcount, 1);
+	} else
+		handle = -ENOMEM;
+
+	pkvm_spin_unlock(&shadow_vms_lock);
+
+	return handle;
+}
+
+static struct pkvm_shadow_vm *free_shadow_vm_handle(int handle)
+{
+	struct shadow_vm_ref *vm_ref;
+	struct pkvm_shadow_vm *vm = NULL;
+
+	pkvm_spin_lock(&shadow_vms_lock);
+
+	if ((u32)handle >= MAX_SHADOW_VMS)
+		goto out;
+
+	vm_ref = &shadow_vms_ref[handle];
+	if ((atomic_cmpxchg(&vm_ref->refcount, 1, 0) != 1)) {
+		pkvm_err("%s: VM%d is busy, refcount %d\n",
+			 __func__, handle, atomic_read(&vm_ref->refcount));
+		goto out;
+	}
+
+	vm = vm_ref->vm;
+
+	vm_ref->vm = NULL;
+	__clear_bit(handle, shadow_vms_bitmap);
+out:
+	pkvm_spin_unlock(&shadow_vms_lock);
+	return vm;
+}
+
+int __pkvm_init_shadow_vm(unsigned long kvm_va,
+			  unsigned long shadow_pa,
+			  size_t shadow_size)
+{
+	struct pkvm_shadow_vm *vm;
+
+	if (!PAGE_ALIGNED(shadow_pa) ||
+		!PAGE_ALIGNED(shadow_size) ||
+		(shadow_size != PAGE_ALIGN(sizeof(struct pkvm_shadow_vm)
+					   + pkvm_shadow_vcpu_array_size())))
+		return -EINVAL;
+
+	vm = pkvm_phys_to_virt(shadow_pa);
+
+	memset(vm, 0, shadow_size);
+	pkvm_spin_lock_init(&vm->lock);
+
+	vm->host_kvm_va = kvm_va;
+	return allocate_shadow_vm_handle(vm);
+}
+
+unsigned long __pkvm_teardown_shadow_vm(int shadow_vm_handle)
+{
+	struct pkvm_shadow_vm *vm = free_shadow_vm_handle(shadow_vm_handle);
+
+	if (!vm)
+		return 0;
+
+	memset(vm, 0, sizeof(struct pkvm_shadow_vm) + pkvm_shadow_vcpu_array_size());
+
+	return pkvm_virt_to_phys(vm);
+}
+
+static struct pkvm_shadow_vm *get_shadow_vm(int shadow_vm_handle)
+{
+	struct shadow_vm_ref *vm_ref;
+
+	if ((u32)shadow_vm_handle >= MAX_SHADOW_VMS)
+		return NULL;
+
+	vm_ref = &shadow_vms_ref[shadow_vm_handle];
+	return atomic_inc_not_zero(&vm_ref->refcount) ? vm_ref->vm : NULL;
+}
+
+static void put_shadow_vm(int shadow_vm_handle)
+{
+	struct shadow_vm_ref *vm_ref;
+
+	if ((u32)shadow_vm_handle >= MAX_SHADOW_VMS)
+		return;
+
+	vm_ref = &shadow_vms_ref[shadow_vm_handle];
+	WARN_ON(atomic_dec_if_positive(&vm_ref->refcount) <= 0);
+}
+
+struct shadow_vcpu_state *get_shadow_vcpu(s64 shadow_vcpu_handle)
+{
+	int shadow_vm_handle = to_shadow_vm_handle(shadow_vcpu_handle);
+	u32 vcpu_idx = to_shadow_vcpu_idx(shadow_vcpu_handle);
+	struct shadow_vcpu_ref *vcpu_ref;
+	struct shadow_vcpu_state *vcpu;
+	struct pkvm_shadow_vm *vm;
+
+	if (vcpu_idx >= KVM_MAX_VCPUS)
+		return NULL;
+
+	vm = get_shadow_vm(shadow_vm_handle);
+	if (!vm)
+		return NULL;
+
+	vcpu_ref = &SHADOW_VCPU_ARRAY(vm)->ref[vcpu_idx];
+	vcpu = atomic_inc_not_zero(&vcpu_ref->refcount) ? vcpu_ref->vcpu : NULL;
+
+	put_shadow_vm(shadow_vm_handle);
+	return vcpu;
+}
+
+void put_shadow_vcpu(s64 shadow_vcpu_handle)
+{
+	int shadow_vm_handle = to_shadow_vm_handle(shadow_vcpu_handle);
+	u32 vcpu_idx = to_shadow_vcpu_idx(shadow_vcpu_handle);
+	struct shadow_vcpu_ref *vcpu_ref;
+	struct pkvm_shadow_vm *vm;
+
+	if (vcpu_idx >= KVM_MAX_VCPUS)
+		return;
+
+	vm = get_shadow_vm(shadow_vm_handle);
+	if (!vm)
+		return;
+
+	vcpu_ref = &SHADOW_VCPU_ARRAY(vm)->ref[vcpu_idx];
+	WARN_ON(atomic_dec_if_positive(&vcpu_ref->refcount) <= 0);
+
+	put_shadow_vm(shadow_vm_handle);
+}
+
+static s64 attach_shadow_vcpu_to_vm(struct pkvm_shadow_vm *vm,
+				    struct shadow_vcpu_state *shadow_vcpu)
+{
+	struct shadow_vcpu_ref *vcpu_ref;
+	u32 vcpu_idx;
+
+	/*
+	 * Shadow_vcpu_handle is a s64 value combined with shadow_vm_handle
+	 * and shadow_vcpu index from the array. So the array size cannot be
+	 * larger than the shadow_vcpu index mask.
+	 */
+	BUILD_BUG_ON(KVM_MAX_VCPUS > SHADOW_VCPU_INDEX_MASK);
+
+	/*
+	 * Save a shadow_vm pointer in shadow_vcpu requires additional
+	 * get so that later when use this pointer at runtime no need
+	 * to get again. This will be put when detaching this shadow_vcpu.
+	 */
+	shadow_vcpu->vm = get_shadow_vm(vm->shadow_vm_handle);
+	if (!shadow_vcpu->vm)
+		return -EINVAL;
+
+	pkvm_spin_lock(&vm->lock);
+
+	if (vm->created_vcpus == KVM_MAX_VCPUS) {
+		pkvm_spin_unlock(&vm->lock);
+		return -EINVAL;
+	}
+
+	vcpu_idx = vm->created_vcpus;
+	shadow_vcpu->shadow_vcpu_handle =
+		to_shadow_vcpu_handle(vm->shadow_vm_handle, vcpu_idx);
+	vcpu_ref = &SHADOW_VCPU_ARRAY(vm)->ref[vcpu_idx];
+	vcpu_ref->vcpu = shadow_vcpu;
+	vm->created_vcpus++;
+	atomic_set(&vcpu_ref->refcount, 1);
+
+	pkvm_spin_unlock(&vm->lock);
+
+	return shadow_vcpu->shadow_vcpu_handle;
+}
+
+static struct shadow_vcpu_state *
+detach_shadow_vcpu_from_vm(struct pkvm_shadow_vm *vm, s64 shadow_vcpu_handle)
+{
+	u32 vcpu_idx = to_shadow_vcpu_idx(shadow_vcpu_handle);
+	struct shadow_vcpu_state *shadow_vcpu = NULL;
+	struct shadow_vcpu_ref *vcpu_ref;
+
+	if (vcpu_idx >= KVM_MAX_VCPUS)
+		return NULL;
+
+	pkvm_spin_lock(&vm->lock);
+
+	vcpu_ref = &SHADOW_VCPU_ARRAY(vm)->ref[vcpu_idx];
+	if ((atomic_cmpxchg(&vcpu_ref->refcount, 1, 0) != 1)) {
+		pkvm_err("%s: VM%d shadow_vcpu%d is busy, refcount %d\n",
+			 __func__, vm->shadow_vm_handle, vcpu_idx,
+			 atomic_read(&vcpu_ref->refcount));
+	} else {
+		shadow_vcpu = vcpu_ref->vcpu;
+		vcpu_ref->vcpu = NULL;
+	}
+
+	pkvm_spin_unlock(&vm->lock);
+
+	if (shadow_vcpu)
+		/*
+		 * Paired with the get_shadow_vm when saving the shadow_vm pointer
+		 * during attaching shadow_vcpu.
+		 */
+		put_shadow_vm(shadow_vcpu->vm->shadow_vm_handle);
+
+	return shadow_vcpu;
+}
+
+s64 __pkvm_init_shadow_vcpu(struct kvm_vcpu *hvcpu, int shadow_vm_handle,
+			    unsigned long vcpu_va, unsigned long shadow_pa,
+			    size_t shadow_size)
+{
+	struct pkvm_shadow_vm *vm;
+	struct shadow_vcpu_state *shadow_vcpu;
+	struct x86_exception e;
+	s64 shadow_vcpu_handle;
+	int ret;
+
+	if (!PAGE_ALIGNED(shadow_pa) || !PAGE_ALIGNED(shadow_size) ||
+		(shadow_size != PAGE_ALIGN(sizeof(struct shadow_vcpu_state))) ||
+		(pkvm_hyp->vmcs_config.size > PAGE_SIZE))
+		return -EINVAL;
+
+	shadow_vcpu = pkvm_phys_to_virt(shadow_pa);
+	memset(shadow_vcpu, 0, shadow_size);
+
+	ret = read_gva(hvcpu, vcpu_va, &shadow_vcpu->vmx, sizeof(struct vcpu_vmx), &e);
+	if (ret < 0)
+		return -EINVAL;
+
+	vm = get_shadow_vm(shadow_vm_handle);
+	if (!vm)
+		return -EINVAL;
+
+	shadow_vcpu_handle = attach_shadow_vcpu_to_vm(vm, shadow_vcpu);
+
+	put_shadow_vm(shadow_vm_handle);
+
+	return shadow_vcpu_handle;
+}
+
+unsigned long __pkvm_teardown_shadow_vcpu(s64 shadow_vcpu_handle)
+{
+	int shadow_vm_handle = to_shadow_vm_handle(shadow_vcpu_handle);
+	struct shadow_vcpu_state *shadow_vcpu;
+	struct pkvm_shadow_vm *vm = get_shadow_vm(shadow_vm_handle);
+
+	if (!vm)
+		return 0;
+
+	shadow_vcpu = detach_shadow_vcpu_from_vm(vm, shadow_vcpu_handle);
+
+	put_shadow_vm(shadow_vm_handle);
+
+	if (!shadow_vcpu)
+		return 0;
+
+	memset(shadow_vcpu, 0, sizeof(struct shadow_vcpu_state));
+	return pkvm_virt_to_phys(shadow_vcpu);
+}
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/pkvm_hyp.h b/arch/x86/kvm/vmx/pkvm/hyp/pkvm_hyp.h
index e84296a714a2..f15a49b3be5d 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/pkvm_hyp.h
+++ b/arch/x86/kvm/vmx/pkvm/hyp/pkvm_hyp.h
@@ -5,6 +5,76 @@
 #ifndef __PKVM_HYP_H
 #define __PKVM_HYP_H
 
+#include <asm/pkvm_spinlock.h>
+
+/*
+ * A container for the vcpu state that hyp needs to maintain for protected VMs.
+ */
+struct shadow_vcpu_state {
+	/*
+	 * A unique id to the shadow vcpu, which is combined by
+	 * shadow_vm_handle and shadow_vcpu index in the array.
+	 * As shadow_vm_handle is in the high end and it is an
+	 * int, so define the shadow_vcpu_handle as a s64.
+	 */
+	s64 shadow_vcpu_handle;
+
+	struct pkvm_shadow_vm *vm;
+
+	struct vcpu_vmx vmx;
+} __aligned(PAGE_SIZE);
+
+#define SHADOW_VM_HANDLE_SHIFT		32
+#define SHADOW_VCPU_INDEX_MASK		((1UL << SHADOW_VM_HANDLE_SHIFT) - 1)
+#define to_shadow_vcpu_handle(vm_handle, vcpu_idx)		\
+		(((s64)(vm_handle) << SHADOW_VM_HANDLE_SHIFT) | \
+		 ((vcpu_idx) & SHADOW_VCPU_INDEX_MASK))
+
+/*
+ * Shadow_vcpu_array will be appended to the end of the pkvm_shadow_vm area
+ * implicitly, so that the shadow_vcpu_state pointer cannot be got directly
+ * from the pkvm_shadow_vm, but needs to be done through the interface
+ * get/put_shadow_vcpu. This can prevent the shadow_vcpu_state pointer from
+ * being abused without getting/putting the refcount.
+ */
+struct shadow_vcpu_array {
+	struct shadow_vcpu_ref {
+		atomic_t refcount;
+		struct shadow_vcpu_state *vcpu;
+	} ref[KVM_MAX_VCPUS];
+} __aligned(PAGE_SIZE);
+
+static inline size_t pkvm_shadow_vcpu_array_size(void)
+{
+	return sizeof(struct shadow_vcpu_array);
+}
+
+/*
+ * Holds the relevant data for running a protected vm.
+ */
+struct pkvm_shadow_vm {
+	/* A unique id to the shadow structs in the hyp shadow area. */
+	int shadow_vm_handle;
+
+	/* Number of vcpus for the vm. */
+	int created_vcpus;
+
+	/* The host's kvm va. */
+	unsigned long host_kvm_va;
+
+	pkvm_spinlock_t lock;
+} __aligned(PAGE_SIZE);
+
+int __pkvm_init_shadow_vm(unsigned long kvm_va, unsigned long shadow_pa,
+			  size_t shadow_size);
+unsigned long __pkvm_teardown_shadow_vm(int shadow_vm_handle);
+s64 __pkvm_init_shadow_vcpu(struct kvm_vcpu *hvcpu, int shadow_vm_handle,
+			    unsigned long vcpu_va, unsigned long shadow_pa,
+			    size_t shadow_size);
+unsigned long __pkvm_teardown_shadow_vcpu(s64 shadow_vcpu_handle);
+struct shadow_vcpu_state *get_shadow_vcpu(s64 shadow_vcpu_handle);
+void put_shadow_vcpu(s64 shadow_vcpu_handle);
+
 extern struct pkvm_hyp *pkvm_hyp;
 
 #endif
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c b/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
index 02224d93384a..6b82b6be612c 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
@@ -8,6 +8,7 @@
 #include <pkvm.h>
 #include "vmexit.h"
 #include "ept.h"
+#include "pkvm_hyp.h"
 #include "debug.h"
 
 #define CR4	4
@@ -88,6 +89,18 @@ static unsigned long handle_vmcall(struct kvm_vcpu *vcpu)
 	case PKVM_HC_INIT_FINALISE:
 		__pkvm_init_finalise(vcpu, a0, a1);
 		break;
+	case PKVM_HC_INIT_SHADOW_VM:
+		ret = __pkvm_init_shadow_vm(a0, a1, a2);
+		break;
+	case PKVM_HC_INIT_SHADOW_VCPU:
+		ret = __pkvm_init_shadow_vcpu(vcpu, a0, a1, a2, a3);
+		break;
+	case PKVM_HC_TEARDOWN_SHADOW_VM:
+		ret = __pkvm_teardown_shadow_vm(a0);
+		break;
+	case PKVM_HC_TEARDOWN_SHADOW_VCPU:
+		ret = __pkvm_teardown_shadow_vcpu(a0);
+		break;
 	default:
 		ret = -EINVAL;
 	}
-- 
2.25.1


* [RFC PATCH part-5 06/22] KVM: VMX: Add new kvm_x86_ops vm_free
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (4 preceding siblings ...)
  2023-03-12 18:02 ` [RFC PATCH part-5 05/22] pkvm: x86: Add hypercalls for shadow_vm/vcpu init & teardown Jason Chen CJ
@ 2023-03-12 18:02 ` Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 07/22] KVM: VMX: Add initialization/teardown for shadow vm/vcpu Jason Chen CJ
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ, Chuanxiao Dong

pKVM expects the shadow vm to be torn down after all shadow vcpus are
torn down, as the shadow vcpu data structures are attached to the shadow
vm. Meanwhile, the existing kvm_x86_ops vm_destroy is called before
vcpu_free, so add a new kvm_x86_ops vm_free which is called after all
vcpus are freed.

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
---
 arch/x86/include/asm/kvm-x86-ops.h | 1 +
 arch/x86/include/asm/kvm_host.h    | 1 +
 arch/x86/kvm/x86.c                 | 1 +
 3 files changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index abccd51dcfca..444ff48ef2ac 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -22,6 +22,7 @@ KVM_X86_OP(vcpu_after_set_cpuid)
 KVM_X86_OP(vm_init)
 KVM_X86_OP_OPTIONAL(vm_destroy)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate)
+KVM_X86_OP_OPTIONAL(vm_free)
 KVM_X86_OP(vcpu_create)
 KVM_X86_OP(vcpu_free)
 KVM_X86_OP(vcpu_reset)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c3cf849a1370..3dea471bfca4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1529,6 +1529,7 @@ struct kvm_x86_ops {
 	unsigned int vm_size;
 	int (*vm_init)(struct kvm *kvm);
 	void (*vm_destroy)(struct kvm *kvm);
+	void (*vm_free)(struct kvm *kvm);
 
 	/* Create, but do not attach this VCPU */
 	int (*vcpu_precreate)(struct kvm *kvm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 84ddeabbf94b..877715426dac 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12309,6 +12309,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_page_track_cleanup(kvm);
 	kvm_xen_destroy_vm(kvm);
 	kvm_hv_destroy_vm(kvm);
+	static_call_cond(kvm_x86_vm_free)(kvm);
 }
 
 static void memslot_rmap_free(struct kvm_memory_slot *slot)
-- 
2.25.1


* [RFC PATCH part-5 07/22] KVM: VMX: Add initialization/teardown for shadow vm/vcpu
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (5 preceding siblings ...)
  2023-03-12 18:02 ` [RFC PATCH part-5 06/22] KVM: VMX: Add new kvm_x86_ops vm_free Jason Chen CJ
@ 2023-03-12 18:02 ` Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 08/22] pkvm: x86: Add hash table mapping for shadow vcpu based on vmcs12_pa Jason Chen CJ
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ, Chuanxiao Dong

Add initialization/teardown of the shadow vm & shadow vcpu from the
corresponding kvm_x86_ops.

The initialization allocates the shadow vm or shadow vcpu data structure
according to the size exposed through pkvm_constants, then issues a
hypercall to pass the data structure's address to pKVM.

The teardown issues a hypercall to pKVM asking to tear down the
corresponding shadow vm or shadow vcpu data structure in the hypervisor,
then finally frees the related memory.

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
---
 arch/x86/include/asm/kvm_host.h        |  4 ++
 arch/x86/include/asm/kvm_pkvm.h        | 10 ++++
 arch/x86/kvm/vmx/pkvm/pkvm_constants.c |  4 ++
 arch/x86/kvm/vmx/pkvm/pkvm_host.c      | 76 ++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c                 | 14 ++++-
 include/linux/kvm_host.h               |  8 +++
 6 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3dea471bfca4..74f0954c6899 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1750,6 +1750,10 @@ struct kvm_arch_async_pf {
 	bool direct_map;
 };
 
+struct kvm_protected_vm {
+	int shadow_vm_handle;
+};
+
 extern u32 __read_mostly kvm_nr_uret_msrs;
 extern u64 __read_mostly host_efer;
 extern bool __read_mostly allow_smaller_maxphyaddr;
diff --git a/arch/x86/include/asm/kvm_pkvm.h b/arch/x86/include/asm/kvm_pkvm.h
index 6e8fee717e5d..4e9531d88417 100644
--- a/arch/x86/include/asm/kvm_pkvm.h
+++ b/arch/x86/include/asm/kvm_pkvm.h
@@ -6,6 +6,8 @@
 #ifndef _ASM_X86_KVM_PKVM_H
 #define _ASM_X86_KVM_PKVM_H
 
+#include <linux/kvm_host.h>
+
 #ifdef CONFIG_PKVM_INTEL
 
 #include <linux/memblock.h>
@@ -131,8 +133,16 @@ static inline int hyp_pre_reserve_check(void)
 
 u64 hyp_total_reserve_pages(void);
 
+int pkvm_init_shadow_vm(struct kvm *kvm);
+void pkvm_teardown_shadow_vm(struct kvm *kvm);
+int pkvm_init_shadow_vcpu(struct kvm_vcpu *vcpu);
+void pkvm_teardown_shadow_vcpu(struct kvm_vcpu *vcpu);
 #else
 static inline void kvm_hyp_reserve(void) {}
+static inline int pkvm_init_shadow_vm(struct kvm *kvm) { return 0; }
+static inline void pkvm_teardown_shadow_vm(struct kvm *kvm) {}
+static inline int pkvm_init_shadow_vcpu(struct kvm_vcpu *vcpu) { return 0; }
+static inline void pkvm_teardown_shadow_vcpu(struct kvm_vcpu *vcpu) {}
 #endif
 
 #endif
diff --git a/arch/x86/kvm/vmx/pkvm/pkvm_constants.c b/arch/x86/kvm/vmx/pkvm/pkvm_constants.c
index 729147e6b85f..c6dc35b52664 100644
--- a/arch/x86/kvm/vmx/pkvm/pkvm_constants.c
+++ b/arch/x86/kvm/vmx/pkvm/pkvm_constants.c
@@ -7,9 +7,13 @@
 #include <linux/bug.h>
 #include <vdso/limits.h>
 #include <buddy_memory.h>
+#include <vmx/vmx.h>
+#include "hyp/pkvm_hyp.h"
 
 int main(void)
 {
 	DEFINE(PKVM_VMEMMAP_ENTRY_SIZE, sizeof(struct hyp_page));
+	DEFINE(PKVM_SHADOW_VM_SIZE, sizeof(struct pkvm_shadow_vm) + pkvm_shadow_vcpu_array_size());
+	DEFINE(PKVM_SHADOW_VCPU_STATE_SIZE, sizeof(struct shadow_vcpu_state));
 	return 0;
 }
diff --git a/arch/x86/kvm/vmx/pkvm/pkvm_host.c b/arch/x86/kvm/vmx/pkvm/pkvm_host.c
index 8ea2d64236d0..2dff1123b61f 100644
--- a/arch/x86/kvm/vmx/pkvm/pkvm_host.c
+++ b/arch/x86/kvm/vmx/pkvm/pkvm_host.c
@@ -869,6 +869,82 @@ static __init int pkvm_init_finalise(void)
 	return ret;
 }
 
+int pkvm_init_shadow_vm(struct kvm *kvm)
+{
+	struct kvm_protected_vm *pkvm = &kvm->pkvm;
+	size_t shadow_sz;
+	void *shadow_addr;
+	int ret;
+
+	shadow_sz = PAGE_ALIGN(PKVM_SHADOW_VM_SIZE);
+	shadow_addr = alloc_pages_exact(shadow_sz, GFP_KERNEL_ACCOUNT);
+	if (!shadow_addr)
+		return -ENOMEM;
+
+	ret = kvm_hypercall3(PKVM_HC_INIT_SHADOW_VM, (unsigned long)kvm,
+					  (unsigned long)__pa(shadow_addr), shadow_sz);
+	if (ret < 0)
+		goto free_page;
+
+	pkvm->shadow_vm_handle = ret;
+
+	return 0;
+free_page:
+	free_pages_exact(shadow_addr, shadow_sz);
+	return ret;
+}
+
+void pkvm_teardown_shadow_vm(struct kvm *kvm)
+{
+	struct kvm_protected_vm *pkvm = &kvm->pkvm;
+	unsigned long pa;
+
+	pa = kvm_hypercall1(PKVM_HC_TEARDOWN_SHADOW_VM, pkvm->shadow_vm_handle);
+	if (!pa)
+		return;
+
+	free_pages_exact(__va(pa), PAGE_ALIGN(PKVM_SHADOW_VM_SIZE));
+}
+
+int pkvm_init_shadow_vcpu(struct kvm_vcpu *vcpu)
+{
+	struct kvm_protected_vm *pkvm = &vcpu->kvm->pkvm;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	s64 shadow_vcpu_handle;
+	size_t shadow_sz;
+	void *shadow_addr;
+
+	shadow_sz = PAGE_ALIGN(PKVM_SHADOW_VCPU_STATE_SIZE);
+	shadow_addr = alloc_pages_exact(shadow_sz, GFP_KERNEL_ACCOUNT);
+	if (!shadow_addr)
+		return -ENOMEM;
+
+	shadow_vcpu_handle = kvm_hypercall4(PKVM_HC_INIT_SHADOW_VCPU,
+					    pkvm->shadow_vm_handle, (unsigned long)vmx,
+					    (unsigned long)__pa(shadow_addr), shadow_sz);
+	if (shadow_vcpu_handle < 0)
+		goto free_page;
+
+	vcpu->pkvm_shadow_vcpu_handle = shadow_vcpu_handle;
+
+	return 0;
+
+free_page:
+	free_pages_exact(shadow_addr, shadow_sz);
+	return -EINVAL;
+}
+
+void pkvm_teardown_shadow_vcpu(struct kvm_vcpu *vcpu)
+{
+	unsigned long pa = kvm_hypercall1(PKVM_HC_TEARDOWN_SHADOW_VCPU,
+					  vcpu->pkvm_shadow_vcpu_handle);
+
+	if (!pa)
+		return;
+
+	free_pages_exact(__va(pa), PAGE_ALIGN(PKVM_SHADOW_VCPU_STATE_SIZE));
+}
+
 __init int pkvm_init(void)
 {
 	int ret = 0, cpu;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6e9723306992..61ae4c1c713d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -48,6 +48,7 @@
 #include <asm/spec-ctrl.h>
 #include <asm/virtext.h>
 #include <asm/vmx.h>
+#include <asm/kvm_pkvm.h>
 
 #include "capabilities.h"
 #include "cpuid.h"
@@ -7329,6 +7330,8 @@ static void vmx_vcpu_free(struct kvm_vcpu *vcpu)
 	free_vpid(vmx->vpid);
 	nested_vmx_free_vcpu(vcpu);
 	free_loaded_vmcs(vmx->loaded_vmcs);
+
+	pkvm_teardown_shadow_vcpu(vcpu);
 }
 
 static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
@@ -7426,7 +7429,7 @@ static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 		WRITE_ONCE(to_kvm_vmx(vcpu->kvm)->pid_table[vcpu->vcpu_id],
 			   __pa(&vmx->pi_desc) | PID_TABLE_ENTRY_VALID);
 
-	return 0;
+	return pkvm_init_shadow_vcpu(vcpu);
 
 free_vmcs:
 	free_loaded_vmcs(vmx->loaded_vmcs);
@@ -7468,7 +7471,13 @@ static int vmx_vm_init(struct kvm *kvm)
 			break;
 		}
 	}
-	return 0;
+
+	return pkvm_init_shadow_vm(kvm);
+}
+
+static void vmx_vm_free(struct kvm *kvm)
+{
+	pkvm_teardown_shadow_vm(kvm);
 }
 
 static int __init vmx_check_processor_compat(void)
@@ -8104,6 +8113,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.vm_size = sizeof(struct kvm_vmx),
 	.vm_init = vmx_vm_init,
 	.vm_destroy = vmx_vm_destroy,
+	.vm_free = vmx_vm_free,
 
 	.vcpu_precreate = vmx_vcpu_precreate,
 	.vcpu_create = vmx_vcpu_create,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4f26b244f6d0..faab9a30002f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -390,6 +390,12 @@ struct kvm_vcpu {
 	 */
 	struct kvm_memory_slot *last_used_slot;
 	u64 last_used_slot_gen;
+
+	/*
+	 * Save the handle returned from the pkvm when init a shadow vcpu. This
+	 * will be used when teardown this shadow vcpu.
+	 */
+	s64 pkvm_shadow_vcpu_handle;
 };
 
 /*
@@ -805,6 +811,8 @@ struct kvm {
 	struct notifier_block pm_notifier;
 #endif
 	char stats_id[KVM_STATS_NAME_SIZE];
+
+	struct kvm_protected_vm pkvm;
 };
 
 #define kvm_err(fmt, ...) \
-- 
2.25.1


* [RFC PATCH part-5 08/22] pkvm: x86: Add hash table mapping for shadow vcpu based on vmcs12_pa
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (6 preceding siblings ...)
  2023-03-12 18:02 ` [RFC PATCH part-5 07/22] KVM: VMX: Add initialization/teardown for shadow vm/vcpu Jason Chen CJ
@ 2023-03-12 18:02 ` Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 09/22] pkvm: x86: Add VMXON/VMXOFF emulation Jason Chen CJ
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ, Chuanxiao Dong

The host VM executes vmptrld(vmcs12) then vmlaunch to launch its guest,
while pKVM needs to find the corresponding shadow_vcpu_state based on
vmcs12 to do the vmptrld emulation (the real vmcs page of the guest -
vmcs02 - shall be kept in shadow_vcpu_state; it will be added in the
following patches).

Use the hash table shadow_vcpu_table to build the mapping between
vmcs12_pa and shadow_vcpu_state. pKVM is then able to quickly find the
shadow_vcpu_state from vmcs12_pa when emulating vmptrld.

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
---
 arch/x86/kvm/vmx/pkvm/hyp/pkvm.c     | 47 +++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/pkvm/hyp/pkvm_hyp.h |  4 +++
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/pkvm/hyp/pkvm.c b/arch/x86/kvm/vmx/pkvm/hyp/pkvm.c
index b110ac43a792..9efedba2b3c9 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/pkvm.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/pkvm.c
@@ -3,6 +3,7 @@
  * Copyright (C) 2022 Intel Corporation
  */
 
+#include <linux/hashtable.h>
 #include <pkvm.h>
 
 #include "pkvm_hyp.h"
@@ -26,6 +27,10 @@ static struct shadow_vm_ref shadow_vms_ref[MAX_SHADOW_VMS];
 #define SHADOW_VCPU_ARRAY(vm) \
 	((struct shadow_vcpu_array *)((void *)(vm) + sizeof(struct pkvm_shadow_vm)))
 
+#define SHADOW_VCPU_HASH_BITS		10
+DEFINE_HASHTABLE(shadow_vcpu_table, SHADOW_VCPU_HASH_BITS);
+static pkvm_spinlock_t shadow_vcpu_table_lock = __PKVM_SPINLOCK_UNLOCKED;
+
 static int allocate_shadow_vm_handle(struct pkvm_shadow_vm *vm)
 {
 	struct shadow_vm_ref *vm_ref;
@@ -133,6 +138,37 @@ static void put_shadow_vm(int shadow_vm_handle)
 	WARN_ON(atomic_dec_if_positive(&vm_ref->refcount) <= 0);
 }
 
+static void add_shadow_vcpu_vmcs12_map(struct shadow_vcpu_state *vcpu)
+{
+	pkvm_spin_lock(&shadow_vcpu_table_lock);
+	hash_add(shadow_vcpu_table, &vcpu->hnode, vcpu->vmcs12_pa);
+	pkvm_spin_unlock(&shadow_vcpu_table_lock);
+}
+
+static void remove_shadow_vcpu_vmcs12_map(struct shadow_vcpu_state *vcpu)
+{
+	pkvm_spin_lock(&shadow_vcpu_table_lock);
+	hash_del(&vcpu->hnode);
+	pkvm_spin_unlock(&shadow_vcpu_table_lock);
+}
+
+s64 find_shadow_vcpu_handle_by_vmcs(unsigned long vmcs12_pa)
+{
+	struct shadow_vcpu_state *shadow_vcpu;
+	s64 handle = -1;
+
+	pkvm_spin_lock(&shadow_vcpu_table_lock);
+	hash_for_each_possible(shadow_vcpu_table, shadow_vcpu, hnode, vmcs12_pa) {
+		if (shadow_vcpu->vmcs12_pa == vmcs12_pa) {
+			handle = shadow_vcpu->shadow_vcpu_handle;
+			break;
+		}
+	}
+	pkvm_spin_unlock(&shadow_vcpu_table_lock);
+
+	return handle;
+}
+
 struct shadow_vcpu_state *get_shadow_vcpu(s64 shadow_vcpu_handle)
 {
 	int shadow_vm_handle = to_shadow_vm_handle(shadow_vcpu_handle);
@@ -197,6 +233,8 @@ static s64 attach_shadow_vcpu_to_vm(struct pkvm_shadow_vm *vm,
 	if (!shadow_vcpu->vm)
 		return -EINVAL;
 
+	add_shadow_vcpu_vmcs12_map(shadow_vcpu);
+
 	pkvm_spin_lock(&vm->lock);
 
 	if (vm->created_vcpus == KVM_MAX_VCPUS) {
@@ -241,12 +279,14 @@ detach_shadow_vcpu_from_vm(struct pkvm_shadow_vm *vm, s64 shadow_vcpu_handle)
 
 	pkvm_spin_unlock(&vm->lock);
 
-	if (shadow_vcpu)
+	if (shadow_vcpu) {
+		remove_shadow_vcpu_vmcs12_map(shadow_vcpu);
 		/*
 		 * Paired with the get_shadow_vm when saving the shadow_vm pointer
 		 * during attaching shadow_vcpu.
 		 */
 		put_shadow_vm(shadow_vcpu->vm->shadow_vm_handle);
+	}
 
 	return shadow_vcpu;
 }
@@ -258,6 +298,7 @@ s64 __pkvm_init_shadow_vcpu(struct kvm_vcpu *hvcpu, int shadow_vm_handle,
 	struct pkvm_shadow_vm *vm;
 	struct shadow_vcpu_state *shadow_vcpu;
 	struct x86_exception e;
+	unsigned long vmcs12_va;
 	s64 shadow_vcpu_handle;
 	int ret;
 
@@ -273,6 +314,10 @@ s64 __pkvm_init_shadow_vcpu(struct kvm_vcpu *hvcpu, int shadow_vm_handle,
 	if (ret < 0)
 		return -EINVAL;
 
+	vmcs12_va = (unsigned long)shadow_vcpu->vmx.vmcs01.vmcs;
+	if (gva2gpa(hvcpu, vmcs12_va, (gpa_t *)&shadow_vcpu->vmcs12_pa, 0, &e))
+		return -EINVAL;
+
 	vm = get_shadow_vm(shadow_vm_handle);
 	if (!vm)
 		return -EINVAL;
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/pkvm_hyp.h b/arch/x86/kvm/vmx/pkvm/hyp/pkvm_hyp.h
index f15a49b3be5d..c574831c6d18 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/pkvm_hyp.h
+++ b/arch/x86/kvm/vmx/pkvm/hyp/pkvm_hyp.h
@@ -21,6 +21,9 @@ struct shadow_vcpu_state {
 
 	struct pkvm_shadow_vm *vm;
 
+	struct hlist_node hnode;
+	unsigned long vmcs12_pa;
+
 	struct vcpu_vmx vmx;
 } __aligned(PAGE_SIZE);
 
@@ -74,6 +77,7 @@ s64 __pkvm_init_shadow_vcpu(struct kvm_vcpu *hvcpu, int shadow_vm_handle,
 unsigned long __pkvm_teardown_shadow_vcpu(s64 shadow_vcpu_handle);
 struct shadow_vcpu_state *get_shadow_vcpu(s64 shadow_vcpu_handle);
 void put_shadow_vcpu(s64 shadow_vcpu_handle);
+s64 find_shadow_vcpu_handle_by_vmcs(unsigned long vmcs12_pa);
 
 extern struct pkvm_hyp *pkvm_hyp;
 
-- 
2.25.1


* [RFC PATCH part-5 09/22] pkvm: x86: Add VMXON/VMXOFF emulation
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (7 preceding siblings ...)
  2023-03-12 18:02 ` [RFC PATCH part-5 08/22] pkvm: x86: Add hash table mapping for shadow vcpu based on vmcs12_pa Jason Chen CJ
@ 2023-03-12 18:02 ` Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 10/22] pkvm: x86: Add has_vmcs_field() API for physical vmx capability check Jason Chen CJ
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ

The host VM keeps the capability to launch its own guests based on VMX,
so pKVM needs to provide VMX emulation for it. This covers the different
VMX instructions - VMXON/VMXOFF, VMPTRLD/VMCLEAR, VMWRITE/VMREAD, and
VMRESUME/VMLAUNCH.

This patch introduces nested.c and provides emulation for the VMXON and
VMXOFF vmx instructions for the host VM.

The emulation simply does a state check and a revision id validation of
the vmxon region passed to the VMXON instruction; the physical VMX stays
enabled after pKVM initialization.

More thorough permission checks are left as TODOs.
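
For reference, the sequence below is a minimal, hypothetical sketch of
what the host VM side is expected to do before its VMXON reaches this
handler (loosely mirroring KVM's kvm_cpu_vmxon()); the revision id it
writes into the vmxon region is what validate_vmcs_revision_id() checks
against pkvm_hyp->vmcs_config:

	/* host VM side, illustrative only - not part of this patch */
	u64 basic, vmxon_pa;
	u32 *vmxon_region = (u32 *)get_zeroed_page(GFP_KERNEL);

	rdmsrl(MSR_IA32_VMX_BASIC, basic);
	*vmxon_region = (u32)basic & 0x7fffffff;	/* VMCS revision id */
	vmxon_pa = __pa(vmxon_region);
	cr4_set_bits(X86_CR4_VMXE);
	asm volatile("vmxon %0" : : "m"(vmxon_pa) : "cc", "memory");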

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/x86/kvm/vmx/pkvm/hyp/Makefile |   2 +-
 arch/x86/kvm/vmx/pkvm/hyp/nested.c | 195 +++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/pkvm/hyp/nested.h |  11 ++
 arch/x86/kvm/vmx/pkvm/hyp/vmexit.c |  12 ++
 4 files changed, 219 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/pkvm/hyp/Makefile b/arch/x86/kvm/vmx/pkvm/hyp/Makefile
index 7c6f71f18676..660fd611395f 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/Makefile
+++ b/arch/x86/kvm/vmx/pkvm/hyp/Makefile
@@ -12,7 +12,7 @@ ccflags-y += -D__PKVM_HYP__
 virt-dir	:= ../../../../../../$(KVM_PKVM)
 
 pkvm-hyp-y	:= vmx_asm.o vmexit.o memory.o early_alloc.o pgtable.o mmu.o pkvm.o \
-		   init_finalise.o ept.o idt.o irq.o
+		   init_finalise.o ept.o idt.o irq.o nested.o
 
 ifndef CONFIG_PKVM_INTEL_DEBUG
 lib-dir		:= lib
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/nested.c b/arch/x86/kvm/vmx/pkvm/hyp/nested.c
new file mode 100644
index 000000000000..f5e2eb8f51c8
--- /dev/null
+++ b/arch/x86/kvm/vmx/pkvm/hyp/nested.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022 Intel Corporation
+ */
+
+#include <pkvm.h>
+
+#include "pkvm_hyp.h"
+#include "debug.h"
+
+enum VMXResult {
+	VMsucceed,
+	VMfailValid,
+	VMfailInvalid,
+};
+
+static void nested_vmx_result(enum VMXResult result, int error_number)
+{
+	u64 rflags = vmcs_readl(GUEST_RFLAGS);
+
+	rflags &= ~(X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_AF |
+			X86_EFLAGS_ZF | X86_EFLAGS_SF | X86_EFLAGS_OF);
+
+	if (result == VMfailValid) {
+		rflags |= X86_EFLAGS_ZF;
+		vmcs_write32(VM_INSTRUCTION_ERROR, error_number);
+	} else if (result == VMfailInvalid) {
+		rflags |= X86_EFLAGS_CF;
+	} else {
+		/* VMsucceed, do nothing */
+	}
+
+	if (result != VMsucceed)
+		pkvm_err("VMX failed: %d/%d", result, error_number);
+
+	vmcs_writel(GUEST_RFLAGS, rflags);
+}
+
+static int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
+			u32 vmx_instruction_info, gva_t *ret)
+{
+	gva_t off;
+	struct kvm_segment s;
+
+	/*
+	 * According to Vol. 3B, "Information for VM Exits Due to Instruction
+	 * Execution", on an exit, vmx_instruction_info holds most of the
+	 * addressing components of the operand. Only the displacement part
+	 * is put in exit_qualification (see 3B, "Basic VM-Exit Information").
+	 * For how an actual address is calculated from all these components,
+	 * refer to Vol. 1, "Operand Addressing".
+	 */
+	int  scaling = vmx_instruction_info & 3;
+	int  addr_size = (vmx_instruction_info >> 7) & 7;
+	bool is_reg = vmx_instruction_info & (1u << 10);
+	int  seg_reg = (vmx_instruction_info >> 15) & 7;
+	int  index_reg = (vmx_instruction_info >> 18) & 0xf;
+	bool index_is_valid = !(vmx_instruction_info & (1u << 22));
+	int  base_reg       = (vmx_instruction_info >> 23) & 0xf;
+	bool base_is_valid  = !(vmx_instruction_info & (1u << 27));
+
+	if (is_reg) {
+		/* TODO: inject #UD */
+		return 1;
+	}
+
+	/* Addr = segment_base + offset */
+	/* offset = base + [index * scale] + displacement */
+	off = exit_qualification; /* holds the displacement */
+	if (addr_size == 1)
+		off = (gva_t)sign_extend64(off, 31);
+	else if (addr_size == 0)
+		off = (gva_t)sign_extend64(off, 15);
+	if (base_is_valid)
+		off += vcpu->arch.regs[base_reg];
+	if (index_is_valid)
+		off += vcpu->arch.regs[index_reg] << scaling;
+
+	if (seg_reg == VCPU_SREG_FS)
+		s.base = vmcs_readl(GUEST_FS_BASE);
+	if (seg_reg == VCPU_SREG_GS)
+		s.base = vmcs_readl(GUEST_GS_BASE);
+
+	/* TODO: support more cpu mode beside long mode */
+	/*
+	 * The effective address, i.e. @off, of a memory operand is truncated
+	 * based on the address size of the instruction.  Note that this is
+	 * the *effective address*, i.e. the address prior to accounting for
+	 * the segment's base.
+	 */
+	if (addr_size == 1) /* 32 bit */
+		off &= 0xffffffff;
+	else if (addr_size == 0) /* 16 bit */
+		off &= 0xffff;
+
+	/*
+	 * The virtual/linear address is never truncated in 64-bit
+	 * mode, e.g. a 32-bit address size can yield a 64-bit virtual
+	 * address when using FS/GS with a non-zero base.
+	 */
+	if (seg_reg == VCPU_SREG_FS || seg_reg == VCPU_SREG_GS)
+		*ret = s.base + off;
+	else
+		*ret = off;
+
+	/* TODO: check addr is canonical, otherwise inject #GP/#SS */
+
+	return 0;
+}
+
+static int nested_vmx_get_vmptr(struct kvm_vcpu *vcpu, gpa_t *vmpointer,
+				int *ret)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	gva_t gva;
+	struct x86_exception e;
+	int r;
+
+	if (get_vmx_mem_address(vcpu, vmx->exit_qualification,
+			vmcs_read32(VMX_INSTRUCTION_INFO), &gva)) {
+		*ret = 1;
+		return -EINVAL;
+	}
+
+	r = read_gva(vcpu, gva, vmpointer, sizeof(*vmpointer), &e);
+	if (r < 0) {
+		/*TODO: handle memory failure exception */
+		*ret = 1;
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int validate_vmcs_revision_id(struct kvm_vcpu *vcpu, gpa_t vmpointer)
+{
+	struct vmcs_config *vmcs_config = &pkvm_hyp->vmcs_config;
+	u32 rev_id;
+
+	read_gpa(vcpu, vmpointer, &rev_id, sizeof(rev_id));
+
+	return (rev_id == vmcs_config->revision_id);
+}
+
+static bool check_vmx_permission(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	bool permit = true;
+
+	/*TODO: check more env (cr, cpl) and inject #UD/#GP */
+	if (!vmx->nested.vmxon)
+		permit = false;
+
+	return permit;
+}
+
+int handle_vmxon(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	gpa_t vmptr;
+	int r;
+
+	/*TODO: check env error(cr, efer, rflags, cpl) */
+	if (vmx->nested.vmxon) {
+		nested_vmx_result(VMfailValid, VMXERR_VMXON_IN_VMX_ROOT_OPERATION);
+	} else {
+		if (nested_vmx_get_vmptr(vcpu, &vmptr, &r)) {
+			nested_vmx_result(VMfailInvalid, 0);
+			return r;
+		} else if (!validate_vmcs_revision_id(vcpu, vmptr)) {
+			nested_vmx_result(VMfailInvalid, 0);
+		} else {
+			vmx->nested.vmxon_ptr = vmptr;
+			vmx->nested.vmxon = true;
+
+			nested_vmx_result(VMsucceed, 0);
+		}
+	}
+
+	return 0;
+}
+
+int handle_vmxoff(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (check_vmx_permission(vcpu)) {
+		vmx->nested.vmxon = false;
+		vmx->nested.vmxon_ptr = INVALID_GPA;
+
+		nested_vmx_result(VMsucceed, 0);
+	}
+
+	return 0;
+}
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/nested.h b/arch/x86/kvm/vmx/pkvm/hyp/nested.h
new file mode 100644
index 000000000000..2d21edaddb25
--- /dev/null
+++ b/arch/x86/kvm/vmx/pkvm/hyp/nested.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022 Intel Corporation
+ */
+#ifndef __PKVM_NESTED_H
+#define __PKVM_NESTED_H
+
+int handle_vmxon(struct kvm_vcpu *vcpu);
+int handle_vmxoff(struct kvm_vcpu *vcpu);
+
+#endif
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c b/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
index 6b82b6be612c..fa67cab803a8 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
@@ -9,6 +9,7 @@
 #include "vmexit.h"
 #include "ept.h"
 #include "pkvm_hyp.h"
+#include "nested.h"
 #include "debug.h"
 
 #define CR4	4
@@ -168,6 +169,7 @@ int pkvm_main(struct kvm_vcpu *vcpu)
 
 		vcpu->arch.cr2 = native_read_cr2();
 		vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
+		vcpu->arch.regs[VCPU_REGS_RSP] = vmcs_readl(GUEST_RSP);
 
 		vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
 		vmx->exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
@@ -194,6 +196,16 @@ int pkvm_main(struct kvm_vcpu *vcpu)
 			handle_write_msr(vcpu);
 			skip_instruction = true;
 			break;
+		case EXIT_REASON_VMON:
+			pkvm_dbg("CPU%d vmexit reason: VMXON.\n", vcpu->cpu);
+			handle_vmxon(vcpu);
+			skip_instruction = true;
+			break;
+		case EXIT_REASON_VMOFF:
+			pkvm_dbg("CPU%d vmexit reason: VMXOFF.\n", vcpu->cpu);
+			handle_vmxoff(vcpu);
+			skip_instruction = true;
+			break;
 		case EXIT_REASON_XSETBV:
 			handle_xsetbv(vcpu);
 			skip_instruction = true;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH part-5 10/22] pkvm: x86: Add has_vmcs_field() API for physical vmx capability check
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (8 preceding siblings ...)
  2023-03-12 18:02 ` [RFC PATCH part-5 09/22] pkvm: x86: Add VMXON/VMXOFF emulation Jason Chen CJ
@ 2023-03-12 18:02 ` Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 11/22] KVM: VMX: Add more vmcs and vmcs12 fields definition Jason Chen CJ
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Tina Zhang, Chuanxiao Dong, Jason Chen CJ

From: Tina Zhang <tina.zhang@intel.com>

Some fields in the VMCS exist only on processors that support the
1-setting of the corresponding control fields [1]. A VMREAD/VMWRITE
from/to an unsupported VMCS component leads to VMfailValid [2].

Introduce a function called has_vmcs_field() which can be used to check
whether a field exists in the VMCS before using VMREAD/VMWRITE to access
it.
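
A minimal usage sketch (the call site and the field chosen here are only
examples, not part of this patch):

	if (has_vmcs_field(ENCLS_EXITING_BITMAP))
		vmcs_write64(ENCLS_EXITING_BITMAP, -1ull);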

[1]: SDM: Appendix B Field Encoding in VMCS, NOTES.
[2]: SDM: VMX Instruction Reference chapter, VMWRITE/VMREAD.

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/x86/kvm/vmx/pkvm/hyp/nested.c | 115 +++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/pkvm/pkvm_host.c  |  29 ++++++++
 2 files changed, 144 insertions(+)

diff --git a/arch/x86/kvm/vmx/pkvm/hyp/nested.c b/arch/x86/kvm/vmx/pkvm/hyp/nested.c
index f5e2eb8f51c8..31ad33f2cdbf 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/nested.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/nested.c
@@ -8,6 +8,121 @@
 #include "pkvm_hyp.h"
 #include "debug.h"
 
+/*
+ * According to SDM Appendix B Field Encoding in VMCS, some fields only
+ * exist on processors that support the 1-setting of the corresponding
+ * fields in the control regs.
+ */
+static bool has_vmcs_field(u16 encoding)
+{
+	struct nested_vmx_msrs *msrs = &pkvm_hyp->vmcs_config.nested;
+
+	switch (encoding) {
+	case MSR_BITMAP:
+		return msrs->procbased_ctls_high & CPU_BASED_USE_MSR_BITMAPS;
+	case VIRTUAL_APIC_PAGE_ADDR:
+	case VIRTUAL_APIC_PAGE_ADDR_HIGH:
+	case TPR_THRESHOLD:
+		return msrs->procbased_ctls_high & CPU_BASED_TPR_SHADOW;
+	case SECONDARY_VM_EXEC_CONTROL:
+		return msrs->procbased_ctls_high &
+			CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
+
+	case VIRTUAL_PROCESSOR_ID:
+		return msrs->secondary_ctls_high & SECONDARY_EXEC_ENABLE_VPID;
+	case XSS_EXIT_BITMAP:
+		return msrs->secondary_ctls_high & SECONDARY_EXEC_XSAVES;
+	case PML_ADDRESS:
+		return msrs->secondary_ctls_high & SECONDARY_EXEC_ENABLE_PML;
+	case VM_FUNCTION_CONTROL:
+		return msrs->secondary_ctls_high & SECONDARY_EXEC_ENABLE_VMFUNC;
+	case EPT_POINTER:
+		return msrs->secondary_ctls_high & SECONDARY_EXEC_ENABLE_EPT;
+	case EOI_EXIT_BITMAP0:
+	case EOI_EXIT_BITMAP1:
+	case EOI_EXIT_BITMAP2:
+	case EOI_EXIT_BITMAP3:
+		return msrs->secondary_ctls_high &
+			SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
+	case VMREAD_BITMAP:
+	case VMWRITE_BITMAP:
+		return msrs->secondary_ctls_high & SECONDARY_EXEC_SHADOW_VMCS;
+	case ENCLS_EXITING_BITMAP:
+		return msrs->secondary_ctls_high &
+			SECONDARY_EXEC_ENCLS_EXITING;
+	case GUEST_INTR_STATUS:
+		return msrs->secondary_ctls_high &
+			SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
+	case GUEST_PML_INDEX:
+		return msrs->secondary_ctls_high & SECONDARY_EXEC_ENABLE_PML;
+	case APIC_ACCESS_ADDR:
+	case APIC_ACCESS_ADDR_HIGH:
+		return msrs->secondary_ctls_high &
+			SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+	case TSC_MULTIPLIER:
+	case TSC_MULTIPLIER_HIGH:
+		return msrs->secondary_ctls_high &
+			SECONDARY_EXEC_TSC_SCALING;
+	case GUEST_PHYSICAL_ADDRESS:
+	case GUEST_PHYSICAL_ADDRESS_HIGH:
+		return msrs->secondary_ctls_high &
+			SECONDARY_EXEC_ENABLE_EPT;
+	case GUEST_PDPTR0:
+	case GUEST_PDPTR0_HIGH:
+	case GUEST_PDPTR1:
+	case GUEST_PDPTR1_HIGH:
+	case GUEST_PDPTR2:
+	case GUEST_PDPTR2_HIGH:
+	case GUEST_PDPTR3:
+	case GUEST_PDPTR3_HIGH:
+		return msrs->secondary_ctls_high & SECONDARY_EXEC_ENABLE_EPT;
+	case PLE_GAP:
+	case PLE_WINDOW:
+		return msrs->secondary_ctls_high &
+			SECONDARY_EXEC_PAUSE_LOOP_EXITING;
+
+	case VMX_PREEMPTION_TIMER_VALUE:
+		return msrs->pinbased_ctls_high &
+			PIN_BASED_VMX_PREEMPTION_TIMER;
+	case POSTED_INTR_DESC_ADDR:
+		return msrs->pinbased_ctls_high & PIN_BASED_POSTED_INTR;
+	case POSTED_INTR_NV:
+		return msrs->pinbased_ctls_high & PIN_BASED_POSTED_INTR;
+	case GUEST_IA32_PAT:
+	case GUEST_IA32_PAT_HIGH:
+		return (msrs->entry_ctls_high & VM_ENTRY_LOAD_IA32_PAT) ||
+			(msrs->exit_ctls_high & VM_EXIT_SAVE_IA32_PAT);
+	case GUEST_IA32_EFER:
+	case GUEST_IA32_EFER_HIGH:
+		return (msrs->entry_ctls_high & VM_ENTRY_LOAD_IA32_EFER) ||
+			(msrs->exit_ctls_high & VM_EXIT_SAVE_IA32_EFER);
+	case GUEST_IA32_PERF_GLOBAL_CTRL:
+	case GUEST_IA32_PERF_GLOBAL_CTRL_HIGH:
+		return msrs->entry_ctls_high & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
+	case GUEST_BNDCFGS:
+	case GUEST_BNDCFGS_HIGH:
+		return (msrs->entry_ctls_high & VM_ENTRY_LOAD_BNDCFGS) ||
+			(msrs->exit_ctls_high & VM_EXIT_CLEAR_BNDCFGS);
+	case GUEST_IA32_RTIT_CTL:
+	case GUEST_IA32_RTIT_CTL_HIGH:
+		return (msrs->entry_ctls_high & VM_ENTRY_LOAD_IA32_RTIT_CTL) ||
+			(msrs->exit_ctls_high & VM_EXIT_CLEAR_IA32_RTIT_CTL);
+	case HOST_IA32_PAT:
+	case HOST_IA32_PAT_HIGH:
+		return msrs->exit_ctls_high & VM_EXIT_LOAD_IA32_PAT;
+	case HOST_IA32_EFER:
+	case HOST_IA32_EFER_HIGH:
+		return msrs->exit_ctls_high & VM_EXIT_LOAD_IA32_EFER;
+	case HOST_IA32_PERF_GLOBAL_CTRL:
+	case HOST_IA32_PERF_GLOBAL_CTRL_HIGH:
+		return msrs->exit_ctls_high & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
+	case EPTP_LIST_ADDRESS:
+		return msrs->vmfunc_controls & VMX_VMFUNC_EPTP_SWITCHING;
+	default:
+		return true;
+	}
+}
+
 enum VMXResult {
 	VMsucceed,
 	VMfailValid,
diff --git a/arch/x86/kvm/vmx/pkvm/pkvm_host.c b/arch/x86/kvm/vmx/pkvm/pkvm_host.c
index 2dff1123b61f..4ea82a147af5 100644
--- a/arch/x86/kvm/vmx/pkvm/pkvm_host.c
+++ b/arch/x86/kvm/vmx/pkvm/pkvm_host.c
@@ -426,6 +426,33 @@ static __init void pkvm_host_deinit_vmx(struct pkvm_host_vcpu *vcpu)
 		vmx->vmcs01.msr_bitmap = NULL;
 }
 
+static __init void pkvm_host_setup_nested_vmx_cap(struct pkvm_hyp *pkvm)
+{
+	struct nested_vmx_msrs *msrs = &pkvm->vmcs_config.nested;
+
+	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
+		msrs->procbased_ctls_low,
+		msrs->procbased_ctls_high);
+
+	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS2,
+			&msrs->secondary_ctls_low,
+			&msrs->secondary_ctls_high);
+
+	rdmsr(MSR_IA32_VMX_PINBASED_CTLS,
+		msrs->pinbased_ctls_low,
+		msrs->pinbased_ctls_high);
+
+	rdmsrl_safe(MSR_IA32_VMX_VMFUNC, &msrs->vmfunc_controls);
+
+	rdmsr(MSR_IA32_VMX_EXIT_CTLS,
+		msrs->exit_ctls_low,
+		msrs->exit_ctls_high);
+
+	rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
+		msrs->entry_ctls_low,
+		msrs->entry_ctls_high);
+}
+
 static __init int pkvm_host_check_and_setup_vmx_cap(struct pkvm_hyp *pkvm)
 {
 	struct vmcs_config *vmcs_config = &pkvm->vmcs_config;
@@ -476,6 +503,8 @@ static __init int pkvm_host_check_and_setup_vmx_cap(struct pkvm_hyp *pkvm)
 		pr_info("vmentry_ctrl 0x%x\n", vmcs_config->vmentry_ctrl);
 	}
 
+	pkvm_host_setup_nested_vmx_cap(pkvm);
+
 	return ret;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH part-5 11/22] KVM: VMX: Add more vmcs and vmcs12 fields definition
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (9 preceding siblings ...)
  2023-03-12 18:02 ` [RFC PATCH part-5 10/22] pkvm: x86: Add has_vmcs_field() API for physical vmx capability check Jason Chen CJ
@ 2023-03-12 18:02 ` Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 12/22] pkvm: x86: Init vmcs read/write bitmap for vmcs emulation Jason Chen CJ
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ

Add more fields definition for vmcs and vmcs12, which can be used to
extend vmcs shadow fields support for VMX emulation.

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/x86/include/asm/vmx.h |  4 ++++
 arch/x86/kvm/vmx/vmcs12.c  |  6 ++++++
 arch/x86/kvm/vmx/vmcs12.h  | 16 ++++++++++++++--
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 498dc600bd5c..d9f119bab5b2 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -322,6 +322,10 @@ enum vmcs_field {
 	CR3_TARGET_VALUE2               = 0x0000600c,
 	CR3_TARGET_VALUE3               = 0x0000600e,
 	EXIT_QUALIFICATION              = 0x00006400,
+	EXIT_IO_RCX	                = 0x00006402,
+	EXIT_IO_RSI	                = 0x00006404,
+	EXIT_IO_RDI	                = 0x00006406,
+	EXIT_IO_RIP	                = 0x00006408,
 	GUEST_LINEAR_ADDRESS            = 0x0000640a,
 	GUEST_CR0                       = 0x00006800,
 	GUEST_CR3                       = 0x00006802,
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 2251b60920f8..6ab29b869914 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -112,6 +112,8 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD(GUEST_SYSENTER_CS, guest_sysenter_cs),
 	FIELD(HOST_IA32_SYSENTER_CS, host_ia32_sysenter_cs),
 	FIELD(VMX_PREEMPTION_TIMER_VALUE, vmx_preemption_timer_value),
+	FIELD(PLE_GAP, ple_gap),
+	FIELD(PLE_WINDOW, ple_window),
 	FIELD(CR0_GUEST_HOST_MASK, cr0_guest_host_mask),
 	FIELD(CR4_GUEST_HOST_MASK, cr4_guest_host_mask),
 	FIELD(CR0_READ_SHADOW, cr0_read_shadow),
@@ -150,5 +152,9 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD(HOST_IA32_SYSENTER_EIP, host_ia32_sysenter_eip),
 	FIELD(HOST_RSP, host_rsp),
 	FIELD(HOST_RIP, host_rip),
+	FIELD(EXIT_IO_RCX, exit_io_rcx),
+	FIELD(EXIT_IO_RSI, exit_io_rsi),
+	FIELD(EXIT_IO_RDI, exit_io_rdi),
+	FIELD(EXIT_IO_RIP, exit_io_rip),
 };
 const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs12_field_offsets);
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 01936013428b..92483940bb40 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -117,7 +117,11 @@ struct __packed vmcs12 {
 	natural_width host_ia32_sysenter_eip;
 	natural_width host_rsp;
 	natural_width host_rip;
-	natural_width paddingl[8]; /* room for future expansion */
+	natural_width exit_io_rcx;
+	natural_width exit_io_rsi;
+	natural_width exit_io_rdi;
+	natural_width exit_io_rip;
+	natural_width paddingl[4]; /* room for future expansion */
 	u32 pin_based_vm_exec_control;
 	u32 cpu_based_vm_exec_control;
 	u32 exception_bitmap;
@@ -165,7 +169,9 @@ struct __packed vmcs12 {
 	u32 guest_sysenter_cs;
 	u32 host_ia32_sysenter_cs;
 	u32 vmx_preemption_timer_value;
-	u32 padding32[7]; /* room for future expansion */
+	u32 ple_gap;
+	u32 ple_window;
+	u32 padding32[5]; /* room for future expansion */
 	u16 virtual_processor_id;
 	u16 posted_intr_nv;
 	u16 guest_es_selector;
@@ -292,6 +298,10 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(host_ia32_sysenter_eip, 656);
 	CHECK_OFFSET(host_rsp, 664);
 	CHECK_OFFSET(host_rip, 672);
+	CHECK_OFFSET(exit_io_rcx, 680);
+	CHECK_OFFSET(exit_io_rsi, 688);
+	CHECK_OFFSET(exit_io_rdi, 696);
+	CHECK_OFFSET(exit_io_rip, 704);
 	CHECK_OFFSET(pin_based_vm_exec_control, 744);
 	CHECK_OFFSET(cpu_based_vm_exec_control, 748);
 	CHECK_OFFSET(exception_bitmap, 752);
@@ -339,6 +349,8 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(guest_sysenter_cs, 920);
 	CHECK_OFFSET(host_ia32_sysenter_cs, 924);
 	CHECK_OFFSET(vmx_preemption_timer_value, 928);
+	CHECK_OFFSET(ple_gap, 932);
+	CHECK_OFFSET(ple_window, 936);
 	CHECK_OFFSET(virtual_processor_id, 960);
 	CHECK_OFFSET(posted_intr_nv, 962);
 	CHECK_OFFSET(guest_es_selector, 964);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH part-5 12/22] pkvm: x86: Init vmcs read/write bitmap for vmcs emulation
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (10 preceding siblings ...)
  2023-03-12 18:02 ` [RFC PATCH part-5 11/22] KVM: VMX: Add more vmcs and vmcs12 fields definition Jason Chen CJ
@ 2023-03-12 18:02 ` Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 13/22] pkvm: x86: Initialize emulated fields " Jason Chen CJ
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ

As pKVM is designed to use shadow vmcs to support nested guests, the
vmread/vmwrite bitmaps are prepared by filtering out the shadowed field
bits. These bitmaps are finally written into the VMREAD_BITMAP/
VMWRITE_BITMAP fields to indicate for which fields the VMREAD/VMWRITE
instructions are intercepted. Meanwhile, the shadowed fields can be
accessed by the host VM with VMREAD/VMWRITE directly without causing a
vmexit [1].

Introduce pkvm_nested_vmcs_fields.h to pre-define the shadow fields,
derived from vmx/vmcs_shadow_fields.h.

[1]: SDM: Virtual Machine Control Structures chapter, VMCS TYPES.
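
As a hedged sketch, once a guest vmcs02 is set up with the "VMCS
shadowing" execution control, the bitmaps built by
init_vmcs_shadow_fields() would be installed along these lines (the
exact call site is not part of this patch, and the __pa() translation is
only illustrative for the hypervisor context):

	secondary_exec_controls_setbit(vmx, SECONDARY_EXEC_SHADOW_VMCS);
	vmcs_write64(VMREAD_BITMAP, __pa(vmx_vmread_bitmap));
	vmcs_write64(VMWRITE_BITMAP, __pa(vmx_vmwrite_bitmap));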

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/x86/kvm/vmx/pkvm/hyp/init_finalise.c     |   3 +
 arch/x86/kvm/vmx/pkvm/hyp/nested.c            |  77 +++++++++
 arch/x86/kvm/vmx/pkvm/hyp/nested.h            |   1 +
 .../vmx/pkvm/hyp/pkvm_nested_vmcs_fields.h    | 156 ++++++++++++++++++
 4 files changed, 237 insertions(+)

diff --git a/arch/x86/kvm/vmx/pkvm/hyp/init_finalise.c b/arch/x86/kvm/vmx/pkvm/hyp/init_finalise.c
index 8c585a73237a..c16b53b7bcd0 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/init_finalise.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/init_finalise.c
@@ -17,6 +17,7 @@
 #include "mmu.h"
 #include "ept.h"
 #include "vmx.h"
+#include "nested.h"
 #include "debug.h"
 
 void *pkvm_mmu_pgt_base;
@@ -288,6 +289,8 @@ int __pkvm_init_finalise(struct kvm_vcpu *vcpu, struct pkvm_section sections[],
 	if (ret)
 		goto out;
 
+	pkvm_init_nest();
+
 	pkvm_init = true;
 
 switch_pgt:
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/nested.c b/arch/x86/kvm/vmx/pkvm/hyp/nested.c
index 31ad33f2cdbf..8ae37feda5ff 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/nested.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/nested.c
@@ -129,6 +129,78 @@ enum VMXResult {
 	VMfailInvalid,
 };
 
+struct shadow_vmcs_field {
+	u16	encoding;
+	u16	offset;
+};
+
+static u8 vmx_vmread_bitmap[PAGE_SIZE] __aligned(PAGE_SIZE);
+static u8 vmx_vmwrite_bitmap[PAGE_SIZE] __aligned(PAGE_SIZE);
+
+static struct shadow_vmcs_field shadow_read_only_fields[] = {
+#define SHADOW_FIELD_RO(x, y) { x, offsetof(struct vmcs12, y) },
+#include "pkvm_nested_vmcs_fields.h"
+};
+static int max_shadow_read_only_fields =
+	ARRAY_SIZE(shadow_read_only_fields);
+static struct shadow_vmcs_field shadow_read_write_fields[] = {
+#define SHADOW_FIELD_RW(x, y) { x, offsetof(struct vmcs12, y) },
+#include "pkvm_nested_vmcs_fields.h"
+};
+static int max_shadow_read_write_fields =
+	ARRAY_SIZE(shadow_read_write_fields);
+
+static void init_vmcs_shadow_fields(void)
+{
+	int i, j;
+
+	memset(vmx_vmread_bitmap, 0xff, PAGE_SIZE);
+	memset(vmx_vmwrite_bitmap, 0xff, PAGE_SIZE);
+
+	for (i = j = 0; i < max_shadow_read_only_fields; i++) {
+		struct shadow_vmcs_field entry = shadow_read_only_fields[i];
+		u16 field = entry.encoding;
+
+		if (!has_vmcs_field(field))
+			continue;
+
+		if (vmcs_field_width(field) == VMCS_FIELD_WIDTH_U64 &&
+		    (i + 1 == max_shadow_read_only_fields ||
+		     shadow_read_only_fields[i + 1].encoding != field + 1)) {
+			pkvm_err("Missing field from shadow_read_only_field %x\n",
+			       field + 1);
+		}
+
+		clear_bit(field, (unsigned long *)vmx_vmread_bitmap);
+		if (field & 1)
+			continue;
+		shadow_read_only_fields[j++] = entry;
+	}
+	max_shadow_read_only_fields = j;
+
+	for (i = j = 0; i < max_shadow_read_write_fields; i++) {
+		struct shadow_vmcs_field entry = shadow_read_write_fields[i];
+		u16 field = entry.encoding;
+
+		if (!has_vmcs_field(field))
+			continue;
+
+		if (vmcs_field_width(field) == VMCS_FIELD_WIDTH_U64 &&
+		    (i + 1 == max_shadow_read_write_fields ||
+		     shadow_read_write_fields[i + 1].encoding != field + 1)) {
+			pkvm_err("Missing field from shadow_read_write_field %x\n",
+			       field + 1);
+		}
+
+		clear_bit(field, (unsigned long *)vmx_vmwrite_bitmap);
+		clear_bit(field, (unsigned long *)vmx_vmread_bitmap);
+		if (field & 1)
+			continue;
+		shadow_read_write_fields[j++] = entry;
+	}
+	max_shadow_read_write_fields = j;
+}
+
 static void nested_vmx_result(enum VMXResult result, int error_number)
 {
 	u64 rflags = vmcs_readl(GUEST_RFLAGS);
@@ -308,3 +380,8 @@ int handle_vmxoff(struct kvm_vcpu *vcpu)
 
 	return 0;
 }
+
+void pkvm_init_nest(void)
+{
+	init_vmcs_shadow_fields();
+}
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/nested.h b/arch/x86/kvm/vmx/pkvm/hyp/nested.h
index 2d21edaddb25..16b70b13e80e 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/nested.h
+++ b/arch/x86/kvm/vmx/pkvm/hyp/nested.h
@@ -7,5 +7,6 @@
 
 int handle_vmxon(struct kvm_vcpu *vcpu);
 int handle_vmxoff(struct kvm_vcpu *vcpu);
+void pkvm_init_nest(void);
 
 #endif
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/pkvm_nested_vmcs_fields.h b/arch/x86/kvm/vmx/pkvm/hyp/pkvm_nested_vmcs_fields.h
new file mode 100644
index 000000000000..4380d415428f
--- /dev/null
+++ b/arch/x86/kvm/vmx/pkvm/hyp/pkvm_nested_vmcs_fields.h
@@ -0,0 +1,156 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022 Intel Corporation
+ */
+#if !defined(SHADOW_FIELD_RW) && !defined(SHADOW_FIELD_RO)
+BUILD_BUG_ON(1)
+#endif
+
+#ifndef SHADOW_FIELD_RW
+#define SHADOW_FIELD_RW(x, y)
+#endif
+#ifndef SHADOW_FIELD_RO
+#define SHADOW_FIELD_RO(x, y)
+#endif
+
+/*
+ * Shadow fields for vmcs02:
+ *
+ * These fields are HW shadowing in vmcs02, we try to shadow all non-host
+ * fields except emulated ones.
+ * Host state fields need to be recorded in cached_vmcs12 and restored to vmcs01's
+ * guest state when returning to L1 host, so please ensure __NO__ host fields below.
+ */
+
+/* 16-bits */
+SHADOW_FIELD_RW(POSTED_INTR_NV, posted_intr_nv)
+SHADOW_FIELD_RW(GUEST_ES_SELECTOR, guest_es_selector)
+SHADOW_FIELD_RW(GUEST_CS_SELECTOR, guest_cs_selector)
+SHADOW_FIELD_RW(GUEST_SS_SELECTOR, guest_ss_selector)
+SHADOW_FIELD_RW(GUEST_DS_SELECTOR, guest_ds_selector)
+SHADOW_FIELD_RW(GUEST_FS_SELECTOR, guest_fs_selector)
+SHADOW_FIELD_RW(GUEST_GS_SELECTOR, guest_gs_selector)
+SHADOW_FIELD_RW(GUEST_LDTR_SELECTOR, guest_ldtr_selector)
+SHADOW_FIELD_RW(GUEST_TR_SELECTOR, guest_tr_selector)
+SHADOW_FIELD_RW(GUEST_INTR_STATUS, guest_intr_status)
+SHADOW_FIELD_RW(GUEST_PML_INDEX, guest_pml_index)
+
+/* 32-bits */
+SHADOW_FIELD_RW(PIN_BASED_VM_EXEC_CONTROL, pin_based_vm_exec_control)
+SHADOW_FIELD_RW(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control)
+SHADOW_FIELD_RW(SECONDARY_VM_EXEC_CONTROL, secondary_vm_exec_control)
+SHADOW_FIELD_RW(EXCEPTION_BITMAP, exception_bitmap)
+SHADOW_FIELD_RW(PAGE_FAULT_ERROR_CODE_MASK, page_fault_error_code_mask)
+SHADOW_FIELD_RW(PAGE_FAULT_ERROR_CODE_MATCH, page_fault_error_code_match)
+SHADOW_FIELD_RW(CR3_TARGET_COUNT, cr3_target_count)
+SHADOW_FIELD_RW(VM_EXIT_MSR_STORE_COUNT, vm_exit_msr_store_count)
+SHADOW_FIELD_RW(VM_EXIT_MSR_LOAD_COUNT, vm_exit_msr_load_count)
+SHADOW_FIELD_RW(VM_ENTRY_MSR_LOAD_COUNT, vm_entry_msr_load_count)
+SHADOW_FIELD_RW(VM_ENTRY_INTR_INFO_FIELD, vm_entry_intr_info_field)
+SHADOW_FIELD_RW(VM_ENTRY_EXCEPTION_ERROR_CODE, vm_entry_exception_error_code)
+SHADOW_FIELD_RW(VM_ENTRY_INSTRUCTION_LEN, vm_entry_instruction_len)
+SHADOW_FIELD_RW(TPR_THRESHOLD, tpr_threshold)
+SHADOW_FIELD_RW(GUEST_ES_LIMIT, guest_es_limit)
+SHADOW_FIELD_RW(GUEST_CS_LIMIT, guest_cs_limit)
+SHADOW_FIELD_RW(GUEST_SS_LIMIT, guest_ss_limit)
+SHADOW_FIELD_RW(GUEST_DS_LIMIT, guest_ds_limit)
+SHADOW_FIELD_RW(GUEST_FS_LIMIT, guest_fs_limit)
+SHADOW_FIELD_RW(GUEST_GS_LIMIT, guest_gs_limit)
+SHADOW_FIELD_RW(GUEST_LDTR_LIMIT, guest_ldtr_limit)
+SHADOW_FIELD_RW(GUEST_TR_LIMIT, guest_tr_limit)
+SHADOW_FIELD_RW(GUEST_GDTR_LIMIT, guest_gdtr_limit)
+SHADOW_FIELD_RW(GUEST_IDTR_LIMIT, guest_idtr_limit)
+SHADOW_FIELD_RW(GUEST_ES_AR_BYTES, guest_es_ar_bytes)
+SHADOW_FIELD_RW(GUEST_CS_AR_BYTES, guest_cs_ar_bytes)
+SHADOW_FIELD_RW(GUEST_SS_AR_BYTES, guest_ss_ar_bytes)
+SHADOW_FIELD_RW(GUEST_DS_AR_BYTES, guest_ds_ar_bytes)
+SHADOW_FIELD_RW(GUEST_FS_AR_BYTES, guest_fs_ar_bytes)
+SHADOW_FIELD_RW(GUEST_GS_AR_BYTES, guest_gs_ar_bytes)
+SHADOW_FIELD_RW(GUEST_LDTR_AR_BYTES, guest_ldtr_ar_bytes)
+SHADOW_FIELD_RW(GUEST_TR_AR_BYTES, guest_tr_ar_bytes)
+SHADOW_FIELD_RW(GUEST_INTERRUPTIBILITY_INFO, guest_interruptibility_info)
+SHADOW_FIELD_RW(GUEST_ACTIVITY_STATE, guest_activity_state)
+SHADOW_FIELD_RW(GUEST_SYSENTER_CS, guest_sysenter_cs)
+SHADOW_FIELD_RW(VMX_PREEMPTION_TIMER_VALUE, vmx_preemption_timer_value)
+SHADOW_FIELD_RW(PLE_GAP, ple_gap)
+SHADOW_FIELD_RW(PLE_WINDOW, ple_window)
+
+/* Natural width */
+SHADOW_FIELD_RW(CR0_GUEST_HOST_MASK, cr0_guest_host_mask)
+SHADOW_FIELD_RW(CR4_GUEST_HOST_MASK, cr4_guest_host_mask)
+SHADOW_FIELD_RW(CR0_READ_SHADOW, cr0_read_shadow)
+SHADOW_FIELD_RW(CR4_READ_SHADOW, cr4_read_shadow)
+SHADOW_FIELD_RW(GUEST_CR0, guest_cr0)
+SHADOW_FIELD_RW(GUEST_CR3, guest_cr3)
+SHADOW_FIELD_RW(GUEST_CR4, guest_cr4)
+SHADOW_FIELD_RW(GUEST_ES_BASE, guest_es_base)
+SHADOW_FIELD_RW(GUEST_CS_BASE, guest_cs_base)
+SHADOW_FIELD_RW(GUEST_SS_BASE, guest_ss_base)
+SHADOW_FIELD_RW(GUEST_DS_BASE, guest_ds_base)
+SHADOW_FIELD_RW(GUEST_FS_BASE, guest_fs_base)
+SHADOW_FIELD_RW(GUEST_GS_BASE, guest_gs_base)
+SHADOW_FIELD_RW(GUEST_LDTR_BASE, guest_ldtr_base)
+SHADOW_FIELD_RW(GUEST_TR_BASE, guest_tr_base)
+SHADOW_FIELD_RW(GUEST_GDTR_BASE, guest_gdtr_base)
+SHADOW_FIELD_RW(GUEST_IDTR_BASE, guest_idtr_base)
+SHADOW_FIELD_RW(GUEST_DR7, guest_dr7)
+SHADOW_FIELD_RW(GUEST_RSP, guest_rsp)
+SHADOW_FIELD_RW(GUEST_RIP, guest_rip)
+SHADOW_FIELD_RW(GUEST_RFLAGS, guest_rflags)
+SHADOW_FIELD_RW(GUEST_PENDING_DBG_EXCEPTIONS, guest_pending_dbg_exceptions)
+SHADOW_FIELD_RW(GUEST_SYSENTER_ESP, guest_sysenter_esp)
+SHADOW_FIELD_RW(GUEST_SYSENTER_EIP, guest_sysenter_eip)
+
+/* 64-bit */
+SHADOW_FIELD_RW(TSC_OFFSET, tsc_offset)
+SHADOW_FIELD_RW(TSC_OFFSET_HIGH, tsc_offset)
+SHADOW_FIELD_RW(VIRTUAL_APIC_PAGE_ADDR, virtual_apic_page_addr)
+SHADOW_FIELD_RW(VIRTUAL_APIC_PAGE_ADDR_HIGH, virtual_apic_page_addr)
+SHADOW_FIELD_RW(APIC_ACCESS_ADDR, apic_access_addr)
+SHADOW_FIELD_RW(APIC_ACCESS_ADDR_HIGH, apic_access_addr)
+SHADOW_FIELD_RW(TSC_MULTIPLIER, tsc_multiplier)
+SHADOW_FIELD_RW(TSC_MULTIPLIER_HIGH, tsc_multiplier)
+SHADOW_FIELD_RW(GUEST_IA32_DEBUGCTL, guest_ia32_debugctl)
+SHADOW_FIELD_RW(GUEST_IA32_DEBUGCTL_HIGH, guest_ia32_debugctl)
+SHADOW_FIELD_RW(GUEST_IA32_PAT, guest_ia32_pat)
+SHADOW_FIELD_RW(GUEST_IA32_PAT_HIGH, guest_ia32_pat)
+SHADOW_FIELD_RW(GUEST_IA32_EFER, guest_ia32_efer)
+SHADOW_FIELD_RW(GUEST_IA32_EFER_HIGH, guest_ia32_efer)
+SHADOW_FIELD_RW(GUEST_IA32_PERF_GLOBAL_CTRL, guest_ia32_perf_global_ctrl)
+SHADOW_FIELD_RW(GUEST_IA32_PERF_GLOBAL_CTRL_HIGH, guest_ia32_perf_global_ctrl)
+SHADOW_FIELD_RW(GUEST_PDPTR0, guest_pdptr0)
+SHADOW_FIELD_RW(GUEST_PDPTR0_HIGH, guest_pdptr0)
+SHADOW_FIELD_RW(GUEST_PDPTR1, guest_pdptr1)
+SHADOW_FIELD_RW(GUEST_PDPTR1_HIGH, guest_pdptr1)
+SHADOW_FIELD_RW(GUEST_PDPTR2, guest_pdptr2)
+SHADOW_FIELD_RW(GUEST_PDPTR2_HIGH, guest_pdptr2)
+SHADOW_FIELD_RW(GUEST_PDPTR3, guest_pdptr3)
+SHADOW_FIELD_RW(GUEST_PDPTR3_HIGH, guest_pdptr3)
+SHADOW_FIELD_RW(GUEST_BNDCFGS, guest_bndcfgs)
+SHADOW_FIELD_RW(GUEST_BNDCFGS_HIGH, guest_bndcfgs)
+
+/* 32-bits */
+SHADOW_FIELD_RO(VM_INSTRUCTION_ERROR, vm_instruction_error)
+SHADOW_FIELD_RO(VM_EXIT_REASON, vm_exit_reason)
+SHADOW_FIELD_RO(VM_EXIT_INTR_INFO, vm_exit_intr_info)
+SHADOW_FIELD_RO(VM_EXIT_INTR_ERROR_CODE, vm_exit_intr_error_code)
+SHADOW_FIELD_RO(IDT_VECTORING_INFO_FIELD, idt_vectoring_info_field)
+SHADOW_FIELD_RO(IDT_VECTORING_ERROR_CODE, idt_vectoring_error_code)
+SHADOW_FIELD_RO(VM_EXIT_INSTRUCTION_LEN, vm_exit_instruction_len)
+SHADOW_FIELD_RO(VMX_INSTRUCTION_INFO, vmx_instruction_info)
+
+/* Natural width */
+SHADOW_FIELD_RO(EXIT_QUALIFICATION, exit_qualification)
+SHADOW_FIELD_RO(EXIT_IO_RCX, exit_io_rcx)
+SHADOW_FIELD_RO(EXIT_IO_RSI, exit_io_rsi)
+SHADOW_FIELD_RO(EXIT_IO_RDI, exit_io_rdi)
+SHADOW_FIELD_RO(EXIT_IO_RIP, exit_io_rip)
+SHADOW_FIELD_RO(GUEST_LINEAR_ADDRESS, guest_linear_address)
+
+/* 64-bit */
+SHADOW_FIELD_RO(GUEST_PHYSICAL_ADDRESS, guest_physical_address)
+SHADOW_FIELD_RO(GUEST_PHYSICAL_ADDRESS_HIGH, guest_physical_address)
+
+#undef SHADOW_FIELD_RW
+#undef SHADOW_FIELD_RO
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH part-5 13/22] pkvm: x86: Initialize emulated fields for vmcs emulation
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (11 preceding siblings ...)
  2023-03-12 18:02 ` [RFC PATCH part-5 12/22] pkvm: x86: Init vmcs read/write bitmap for vmcs emulation Jason Chen CJ
@ 2023-03-12 18:02 ` Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 14/22] pkvm: x86: Add msr ops for pKVM hypervisor Jason Chen CJ
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ

For the vmcs shadow fields, the host VM accesses them directly through
VMREAD/VMWRITE without a vmexit.

Meanwhile, for the other vmcs fields, a VMREAD/VMWRITE from the host VM
causes a vmread/vmwrite vmexit which pKVM needs to handle. Such fields
fall into two categories: host state fields, which pKVM only needs to
record with the value the host VM set, and emulated fields, which pKVM
must emulate to ensure isolation and security.

Introduce a macro EMULATED_FIELD_RW in pkvm_nested_vmcs_fields.h to help
pre-define the emulated fields for vmcs emulation, and use it to
initialize the emulated_fields[] array for future emulation (see the
sketch below).
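
A hedged sketch of how emulated_fields[] is expected to be consumed by
the vmwrite vmexit handler added later in this series;
is_emulated_field() is a hypothetical helper standing in for a lookup
over emulated_fields[]:

	/* on a vmwrite vmexit for a non-shadowed field */
	vmcs12_write_any(vmcs12, field, offset, value);
	if (is_emulated_field(field))
		/* re-emulate into vmcs02 before the next nested vmentry */
		vmx->nested.dirty_vmcs12 = true;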

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/x86/kvm/vmx/pkvm/hyp/nested.c            | 23 +++++++++++
 .../vmx/pkvm/hyp/pkvm_nested_vmcs_fields.h    | 41 ++++++++++++++++++-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/pkvm/hyp/nested.c b/arch/x86/kvm/vmx/pkvm/hyp/nested.c
index 8ae37feda5ff..8e6d0f01819a 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/nested.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/nested.c
@@ -149,6 +149,12 @@ static struct shadow_vmcs_field shadow_read_write_fields[] = {
 };
 static int max_shadow_read_write_fields =
 	ARRAY_SIZE(shadow_read_write_fields);
+static struct shadow_vmcs_field emulated_fields[] = {
+#define EMULATED_FIELD_RW(x, y) { x, offsetof(struct vmcs12, y) },
+#include "pkvm_nested_vmcs_fields.h"
+};
+static int max_emulated_fields =
+	ARRAY_SIZE(emulated_fields);
 
 static void init_vmcs_shadow_fields(void)
 {
@@ -201,6 +207,22 @@ static void init_vmcs_shadow_fields(void)
 	max_shadow_read_write_fields = j;
 }
 
+static void init_emulated_vmcs_fields(void)
+{
+	int i, j;
+
+	for (i = j = 0; i < max_emulated_fields; i++) {
+		struct shadow_vmcs_field entry = emulated_fields[i];
+		u16 field = entry.encoding;
+
+		if (!has_vmcs_field(field))
+			continue;
+
+		emulated_fields[j++] = entry;
+	}
+	max_emulated_fields = j;
+}
+
 static void nested_vmx_result(enum VMXResult result, int error_number)
 {
 	u64 rflags = vmcs_readl(GUEST_RFLAGS);
@@ -384,4 +406,5 @@ int handle_vmxoff(struct kvm_vcpu *vcpu)
 void pkvm_init_nest(void)
 {
 	init_vmcs_shadow_fields();
+	init_emulated_vmcs_fields();
 }
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/pkvm_nested_vmcs_fields.h b/arch/x86/kvm/vmx/pkvm/hyp/pkvm_nested_vmcs_fields.h
index 4380d415428f..8666cda4ee6d 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/pkvm_nested_vmcs_fields.h
+++ b/arch/x86/kvm/vmx/pkvm/hyp/pkvm_nested_vmcs_fields.h
@@ -2,10 +2,13 @@
 /*
  * Copyright (C) 2022 Intel Corporation
  */
-#if !defined(SHADOW_FIELD_RW) && !defined(SHADOW_FIELD_RO)
+#if !defined(EMULATED_FIELD_RW) && !defined(SHADOW_FIELD_RW) && !defined(SHADOW_FIELD_RO)
 BUILD_BUG_ON(1)
 #endif
 
+#ifndef EMULATED_FIELD_RW
+#define EMULATED_FIELD_RW(x, y)
+#endif
 #ifndef SHADOW_FIELD_RW
 #define SHADOW_FIELD_RW(x, y)
 #endif
@@ -13,6 +16,41 @@ BUILD_BUG_ON(1)
 #define SHADOW_FIELD_RO(x, y)
 #endif
 
+/*
+ * Emulated fields for vmcs02:
+ *
+ * These fields are recorded in cached_vmcs12, and should be emulated to
+ * real value in vmcs02 before vmcs01 active.
+ */
+/* 16-bits */
+EMULATED_FIELD_RW(VIRTUAL_PROCESSOR_ID, virtual_processor_id)
+
+/* 32-bits */
+EMULATED_FIELD_RW(VM_EXIT_CONTROLS, vm_exit_controls)
+EMULATED_FIELD_RW(VM_ENTRY_CONTROLS, vm_entry_controls)
+
+/* 64-bits, what about their HIGH 32 fields?  */
+EMULATED_FIELD_RW(IO_BITMAP_A, io_bitmap_a)
+EMULATED_FIELD_RW(IO_BITMAP_B, io_bitmap_b)
+EMULATED_FIELD_RW(MSR_BITMAP, msr_bitmap)
+EMULATED_FIELD_RW(VM_EXIT_MSR_STORE_ADDR, vm_exit_msr_store_addr)
+EMULATED_FIELD_RW(VM_EXIT_MSR_LOAD_ADDR, vm_exit_msr_load_addr)
+EMULATED_FIELD_RW(VM_ENTRY_MSR_LOAD_ADDR, vm_entry_msr_load_addr)
+EMULATED_FIELD_RW(XSS_EXIT_BITMAP, xss_exit_bitmap)
+EMULATED_FIELD_RW(POSTED_INTR_DESC_ADDR, posted_intr_desc_addr)
+EMULATED_FIELD_RW(PML_ADDRESS, pml_address)
+EMULATED_FIELD_RW(VM_FUNCTION_CONTROL, vm_function_control)
+EMULATED_FIELD_RW(EPT_POINTER, ept_pointer)
+EMULATED_FIELD_RW(EOI_EXIT_BITMAP0, eoi_exit_bitmap0)
+EMULATED_FIELD_RW(EOI_EXIT_BITMAP1, eoi_exit_bitmap1)
+EMULATED_FIELD_RW(EOI_EXIT_BITMAP2, eoi_exit_bitmap2)
+EMULATED_FIELD_RW(EOI_EXIT_BITMAP3, eoi_exit_bitmap3)
+EMULATED_FIELD_RW(EPTP_LIST_ADDRESS, eptp_list_address)
+EMULATED_FIELD_RW(VMREAD_BITMAP, vmread_bitmap)
+EMULATED_FIELD_RW(VMWRITE_BITMAP, vmwrite_bitmap)
+EMULATED_FIELD_RW(ENCLS_EXITING_BITMAP, encls_exiting_bitmap)
+EMULATED_FIELD_RW(VMCS_LINK_POINTER, vmcs_link_pointer)
+
 /*
  * Shadow fields for vmcs02:
  *
@@ -152,5 +190,6 @@ SHADOW_FIELD_RO(GUEST_LINEAR_ADDRESS, guest_linear_address)
 SHADOW_FIELD_RO(GUEST_PHYSICAL_ADDRESS, guest_physical_address)
 SHADOW_FIELD_RO(GUEST_PHYSICAL_ADDRESS_HIGH, guest_physical_address)
 
+#undef EMULATED_FIELD_RW
 #undef SHADOW_FIELD_RW
 #undef SHADOW_FIELD_RO
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH part-5 14/22] pkvm: x86: Add msr ops for pKVM hypervisor
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (12 preceding siblings ...)
  2023-03-12 18:02 ` [RFC PATCH part-5 13/22] pkvm: x86: Initialize emulated fields " Jason Chen CJ
@ 2023-03-12 18:02 ` Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 15/22] pkvm: x86: Move _init_host_state_area to " Jason Chen CJ
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ

Add pkvm msr ops and avoid using the Linux msr ops directly, removing
the dependency on linking to the exception table (EXTABLE).
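
A short usage sketch, assuming the same calling convention as the Linux
rdmsr()/rdmsrl()/wrmsrl() macros they replace:

	u32 low, high;
	u64 efer;

	pkvm_rdmsr(MSR_IA32_VMX_BASIC, low, high);
	pkvm_rdmsrl(MSR_EFER, efer);
	pkvm_wrmsrl(MSR_EFER, efer);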

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/x86/kvm/vmx/pkvm/hyp/cpu.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/kvm/vmx/pkvm/hyp/cpu.h b/arch/x86/kvm/vmx/pkvm/hyp/cpu.h
index c49074292f7c..896c2984ffa6 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/cpu.h
+++ b/arch/x86/kvm/vmx/pkvm/hyp/cpu.h
@@ -13,6 +13,29 @@ static inline u64 pkvm_msr_read(u32 reg)
 	return (((u64)msrh << 32U) | msrl);
 }
 
+#define pkvm_rdmsr(msr, low, high)              \
+do {                                            \
+	u64 __val = pkvm_msr_read(msr);         \
+	(void)((low) = (u32)__val);             \
+	(void)((high) = (u32)(__val >> 32));    \
+} while (0)
+
+#define pkvm_rdmsrl(msr, val)                   \
+	((val) = pkvm_msr_read((msr)))
+
+static inline void pkvm_msr_write(u32 reg, u64 msr_val)
+{
+	asm volatile (" wrmsr " : : "c" (reg), "a" ((u32)msr_val), "d" ((u32)(msr_val >> 32U)));
+}
+
+#define pkvm_wrmsr(msr, low, high)              	\
+do {                                            	\
+	u64 __val = (u64)(high) << 32 | (u64)(low); 	\
+	pkvm_msr_write(msr, __val);             	\
+} while (0)
+
+#define pkvm_wrmsrl(msr, val)   pkvm_msr_write(msr, val)
+
 #ifdef CONFIG_PKVM_INTEL_DEBUG
 #include <linux/smp.h>
 static inline u64 get_pcpu_id(void)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH part-5 15/22] pkvm: x86: Move _init_host_state_area to pKVM hypervisor
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (13 preceding siblings ...)
  2023-03-12 18:02 ` [RFC PATCH part-5 14/22] pkvm: x86: Add msr ops for pKVM hypervisor Jason Chen CJ
@ 2023-03-12 18:02 ` Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 16/22] pkvm: x86: Add vmcs_load/clear_track APIs Jason Chen CJ
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ

During the VMPTRLD emulation for a nested guest, pKVM needs to
initialize the shadow vmcs's host state area based on the hypervisor's
own settings as well, so move this function from pkvm_host.c to the
hypervisor directory.

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/x86/kvm/vmx/pkvm/hyp/Makefile   |  2 +-
 arch/x86/kvm/vmx/pkvm/hyp/vmx.c      | 77 ++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/pkvm/hyp/vmx.h      |  2 +
 arch/x86/kvm/vmx/pkvm/include/pkvm.h |  1 +
 arch/x86/kvm/vmx/pkvm/pkvm_host.c    | 75 +--------------------------
 5 files changed, 82 insertions(+), 75 deletions(-)

diff --git a/arch/x86/kvm/vmx/pkvm/hyp/Makefile b/arch/x86/kvm/vmx/pkvm/hyp/Makefile
index 660fd611395f..ca6d43509ddc 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/Makefile
+++ b/arch/x86/kvm/vmx/pkvm/hyp/Makefile
@@ -12,7 +12,7 @@ ccflags-y += -D__PKVM_HYP__
 virt-dir	:= ../../../../../../$(KVM_PKVM)
 
 pkvm-hyp-y	:= vmx_asm.o vmexit.o memory.o early_alloc.o pgtable.o mmu.o pkvm.o \
-		   init_finalise.o ept.o idt.o irq.o nested.o
+		   init_finalise.o ept.o idt.o irq.o nested.o vmx.o
 
 ifndef CONFIG_PKVM_INTEL_DEBUG
 lib-dir		:= lib
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/vmx.c b/arch/x86/kvm/vmx/pkvm/hyp/vmx.c
new file mode 100644
index 000000000000..fec99c567d07
--- /dev/null
+++ b/arch/x86/kvm/vmx/pkvm/hyp/vmx.c
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <pkvm.h>
+#include "cpu.h"
+
+void pkvm_init_host_state_area(struct pkvm_pcpu *pcpu, int cpu)
+{
+	unsigned long a;
+#ifdef CONFIG_PKVM_INTEL_DEBUG
+	u32 high, low;
+	struct desc_ptr dt;
+	u16 selector;
+#endif
+
+	vmcs_writel(HOST_CR0, native_read_cr0() & ~X86_CR0_TS);
+	vmcs_writel(HOST_CR3, pcpu->cr3);
+	vmcs_writel(HOST_CR4, native_read_cr4());
+
+#ifdef CONFIG_PKVM_INTEL_DEBUG
+	savesegment(cs, selector);
+	vmcs_write16(HOST_CS_SELECTOR, selector);
+	savesegment(ss, selector);
+	vmcs_write16(HOST_SS_SELECTOR, selector);
+	savesegment(ds, selector);
+	vmcs_write16(HOST_DS_SELECTOR, selector);
+	savesegment(es, selector);
+	vmcs_write16(HOST_ES_SELECTOR, selector);
+	savesegment(fs, selector);
+	vmcs_write16(HOST_FS_SELECTOR, selector);
+	pkvm_rdmsrl(MSR_FS_BASE, a);
+	vmcs_writel(HOST_FS_BASE, a);
+	savesegment(gs, selector);
+	vmcs_write16(HOST_GS_SELECTOR, selector);
+	pkvm_rdmsrl(MSR_GS_BASE, a);
+	vmcs_writel(HOST_GS_BASE, a);
+
+	vmcs_write16(HOST_TR_SELECTOR, GDT_ENTRY_TSS*8);
+	vmcs_writel(HOST_TR_BASE, (unsigned long)&get_cpu_entry_area(cpu)->tss.x86_tss);
+
+	native_store_gdt(&dt);
+	vmcs_writel(HOST_GDTR_BASE, dt.address);
+	vmcs_writel(HOST_IDTR_BASE, (unsigned long)(&pcpu->idt_page));
+
+	pkvm_rdmsr(MSR_IA32_SYSENTER_CS, low, high);
+	vmcs_write32(HOST_IA32_SYSENTER_CS, low);
+
+	pkvm_rdmsrl(MSR_IA32_SYSENTER_ESP, a);
+	vmcs_writel(HOST_IA32_SYSENTER_ESP, a);
+
+	pkvm_rdmsrl(MSR_IA32_SYSENTER_EIP, a);
+	vmcs_writel(HOST_IA32_SYSENTER_EIP, a);
+#else
+	vmcs_write16(HOST_CS_SELECTOR, __KERNEL_CS);
+	vmcs_write16(HOST_SS_SELECTOR, __KERNEL_DS);
+	vmcs_write16(HOST_DS_SELECTOR, __KERNEL_DS);
+	vmcs_write16(HOST_ES_SELECTOR, 0);
+	vmcs_write16(HOST_TR_SELECTOR, GDT_ENTRY_TSS*8);
+	vmcs_write16(HOST_FS_SELECTOR, 0);
+	vmcs_write16(HOST_GS_SELECTOR, 0);
+	vmcs_writel(HOST_FS_BASE, 0);
+	vmcs_writel(HOST_GS_BASE, 0);
+
+	vmcs_writel(HOST_TR_BASE, (unsigned long)&pcpu->tss);
+	vmcs_writel(HOST_GDTR_BASE, (unsigned long)(&pcpu->gdt_page));
+	vmcs_writel(HOST_IDTR_BASE, (unsigned long)(&pcpu->idt_page));
+
+	vmcs_write16(HOST_GS_SELECTOR, __KERNEL_DS);
+	vmcs_writel(HOST_GS_BASE, cpu);
+#endif
+
+	/* MSR area */
+	pkvm_rdmsrl(MSR_EFER, a);
+	vmcs_write64(HOST_IA32_EFER, a);
+
+	pkvm_rdmsrl(MSR_IA32_CR_PAT, a);
+	vmcs_write64(HOST_IA32_PAT, a);
+}
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/vmx.h b/arch/x86/kvm/vmx/pkvm/hyp/vmx.h
index 178139d1b61f..35369cc3b646 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/vmx.h
+++ b/arch/x86/kvm/vmx/pkvm/hyp/vmx.h
@@ -50,4 +50,6 @@ static inline void vmx_enable_irq_window(struct vcpu_vmx *vmx)
 	exec_controls_setbit(vmx, CPU_BASED_INTR_WINDOW_EXITING);
 }
 
+void pkvm_init_host_state_area(struct pkvm_pcpu *pcpu, int cpu);
+
 #endif
diff --git a/arch/x86/kvm/vmx/pkvm/include/pkvm.h b/arch/x86/kvm/vmx/pkvm/include/pkvm.h
index 292d48d8ee44..d5393d477df1 100644
--- a/arch/x86/kvm/vmx/pkvm/include/pkvm.h
+++ b/arch/x86/kvm/vmx/pkvm/include/pkvm.h
@@ -98,6 +98,7 @@ extern struct pkvm_hyp *pkvm_sym(pkvm_hyp);
 
 PKVM_DECLARE(void, __pkvm_vmx_vmexit(void));
 PKVM_DECLARE(int, pkvm_main(struct kvm_vcpu *vcpu));
+PKVM_DECLARE(void, pkvm_init_host_state_area(struct pkvm_pcpu *pcpu, int cpu));
 
 PKVM_DECLARE(void *, pkvm_early_alloc_contig(unsigned int nr_pages));
 PKVM_DECLARE(void *, pkvm_early_alloc_page(void));
diff --git a/arch/x86/kvm/vmx/pkvm/pkvm_host.c b/arch/x86/kvm/vmx/pkvm/pkvm_host.c
index 4ea82a147af5..cbba3033ba63 100644
--- a/arch/x86/kvm/vmx/pkvm/pkvm_host.c
+++ b/arch/x86/kvm/vmx/pkvm/pkvm_host.c
@@ -240,84 +240,11 @@ static __init void init_guest_state_area(struct pkvm_host_vcpu *vcpu, int cpu)
 	vmcs_write64(VMCS_LINK_POINTER, -1ull);
 }
 
-static __init void _init_host_state_area(struct pkvm_pcpu *pcpu, int cpu)
-{
-	unsigned long a;
-#ifdef CONFIG_PKVM_INTEL_DEBUG
-	u32 high, low;
-	struct desc_ptr dt;
-	u16 selector;
-#endif
-
-	vmcs_writel(HOST_CR0, read_cr0() & ~X86_CR0_TS);
-	vmcs_writel(HOST_CR3, pcpu->cr3);
-	vmcs_writel(HOST_CR4, native_read_cr4());
-
-#ifdef CONFIG_PKVM_INTEL_DEBUG
-	savesegment(cs, selector);
-	vmcs_write16(HOST_CS_SELECTOR, selector);
-	savesegment(ss, selector);
-	vmcs_write16(HOST_SS_SELECTOR, selector);
-	savesegment(ds, selector);
-	vmcs_write16(HOST_DS_SELECTOR, selector);
-	savesegment(es, selector);
-	vmcs_write16(HOST_ES_SELECTOR, selector);
-	savesegment(fs, selector);
-	vmcs_write16(HOST_FS_SELECTOR, selector);
-	rdmsrl(MSR_FS_BASE, a);
-	vmcs_writel(HOST_FS_BASE, a);
-	savesegment(gs, selector);
-	vmcs_write16(HOST_GS_SELECTOR, selector);
-	rdmsrl(MSR_GS_BASE, a);
-	vmcs_writel(HOST_GS_BASE, a);
-
-	vmcs_write16(HOST_TR_SELECTOR, GDT_ENTRY_TSS*8);
-	vmcs_writel(HOST_TR_BASE, (unsigned long)&get_cpu_entry_area(cpu)->tss.x86_tss);
-
-	native_store_gdt(&dt);
-	vmcs_writel(HOST_GDTR_BASE, dt.address);
-	vmcs_writel(HOST_IDTR_BASE, (unsigned long)(&pcpu->idt_page));
-
-	rdmsr(MSR_IA32_SYSENTER_CS, low, high);
-	vmcs_write32(HOST_IA32_SYSENTER_CS, low);
-
-	rdmsrl(MSR_IA32_SYSENTER_ESP, a);
-	vmcs_writel(HOST_IA32_SYSENTER_ESP, a);
-
-	rdmsrl(MSR_IA32_SYSENTER_EIP, a);
-	vmcs_writel(HOST_IA32_SYSENTER_EIP, a);
-#else
-	vmcs_write16(HOST_CS_SELECTOR, __KERNEL_CS);
-	vmcs_write16(HOST_SS_SELECTOR, __KERNEL_DS);
-	vmcs_write16(HOST_DS_SELECTOR, __KERNEL_DS);
-	vmcs_write16(HOST_ES_SELECTOR, 0);
-	vmcs_write16(HOST_TR_SELECTOR, GDT_ENTRY_TSS*8);
-	vmcs_write16(HOST_FS_SELECTOR, 0);
-	vmcs_write16(HOST_GS_SELECTOR, 0);
-	vmcs_writel(HOST_FS_BASE, 0);
-	vmcs_writel(HOST_GS_BASE, 0);
-
-	vmcs_writel(HOST_TR_BASE, (unsigned long)&pcpu->tss);
-	vmcs_writel(HOST_GDTR_BASE, (unsigned long)(&pcpu->gdt_page));
-	vmcs_writel(HOST_IDTR_BASE, (unsigned long)(&pcpu->idt_page));
-
-	vmcs_write16(HOST_GS_SELECTOR, __KERNEL_DS);
-	vmcs_writel(HOST_GS_BASE, cpu);
-#endif
-
-	/* MSR area */
-	rdmsrl(MSR_EFER, a);
-	vmcs_write64(HOST_IA32_EFER, a);
-
-	rdmsrl(MSR_IA32_CR_PAT, a);
-	vmcs_write64(HOST_IA32_PAT, a);
-}
-
 static __init void init_host_state_area(struct pkvm_host_vcpu *vcpu, int cpu)
 {
 	struct pkvm_pcpu *pcpu = vcpu->pcpu;
 
-	_init_host_state_area(pcpu, cpu);
+	pkvm_sym(pkvm_init_host_state_area)(pcpu, cpu);
 
 	/*host RIP*/
 	vmcs_writel(HOST_RIP, (unsigned long)pkvm_sym(__pkvm_vmx_vmexit));
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH part-5 16/22] pkvm: x86: Add vmcs_load/clear_track APIs
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (14 preceding siblings ...)
  2023-03-12 18:02 ` [RFC PATCH part-5 15/22] pkvm: x86: Move _init_host_state_area to " Jason Chen CJ
@ 2023-03-12 18:02 ` Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 17/22] pkvm: x86: Add VMPTRLD/VMCLEAR emulation Jason Chen CJ
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ, Chuanxiao Dong

Add vmcs_load_track/vmcs_clear_track to track vmcs load & clear, in
preparation for the following VMPTRLD emulation.

Using these tracking APIs for vmcs load/clear is necessary whenever pKVM
wants to know the current_vmcs pointer. For example, if pKVM handles an
NMI in root mode, it may need to temporarily switch the vmcs to the host
vcpu's one to inject the NMI and then switch back to current_vmcs, as
the usage sketch below illustrates.
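
A minimal usage sketch of that pattern; the NMI-injection details are
hypothetical here, only the load/clear tracking comes from this patch:

	struct vmcs *cur = pkvm_host_vcpu->current_vmcs;

	if (cur != vmx->vmcs01.vmcs) {
		vmcs_load_track(vmx, vmx->vmcs01.vmcs);
		vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, NMI_VECTOR |
			     INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK);
		vmcs_load_track(vmx, cur);
	}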

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/x86/kvm/vmx/pkvm/hyp/vmx.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/x86/kvm/vmx/pkvm/hyp/vmx.h b/arch/x86/kvm/vmx/pkvm/hyp/vmx.h
index 35369cc3b646..54c17e256107 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/vmx.h
+++ b/arch/x86/kvm/vmx/pkvm/hyp/vmx.h
@@ -50,6 +50,27 @@ static inline void vmx_enable_irq_window(struct vcpu_vmx *vmx)
 	exec_controls_setbit(vmx, CPU_BASED_INTR_WINDOW_EXITING);
 }
 
+static inline void vmcs_load_track(struct vcpu_vmx *vmx, struct vmcs *vmcs)
+{
+	struct pkvm_host_vcpu *pkvm_host_vcpu = vmx_to_pkvm_hvcpu(vmx);
+
+	pkvm_host_vcpu->current_vmcs = vmcs;
+	barrier();
+	vmcs_load(vmcs);
+}
+
+static inline void vmcs_clear_track(struct vcpu_vmx *vmx, struct vmcs *vmcs)
+{
+	struct pkvm_host_vcpu *pkvm_host_vcpu = vmx_to_pkvm_hvcpu(vmx);
+
+	/* vmcs_clear might clear non-current vmcs */
+	if (pkvm_host_vcpu->current_vmcs == vmcs)
+		pkvm_host_vcpu->current_vmcs = NULL;
+
+	barrier();
+	vmcs_clear(vmcs);
+}
+
 void pkvm_init_host_state_area(struct pkvm_pcpu *pcpu, int cpu);
 
 #endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH part-5 17/22] pkvm: x86: Add VMPTRLD/VMCLEAR emulation
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (15 preceding siblings ...)
  2023-03-12 18:02 ` [RFC PATCH part-5 16/22] pkvm: x86: Add vmcs_load/clear_track APIs Jason Chen CJ
@ 2023-03-12 18:02 ` Jason Chen CJ
  2023-03-12 18:02 ` [RFC PATCH part-5 18/22] pkvm: x86: Add VMREAD/VMWRITE emulation Jason Chen CJ
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ, Chuanxiao Dong

pKVM is designed to emulate VMX for the host VM based on shadow vmcs.
The shadow vmcs page (vmcs02) used in root mode is kept in the structure
shadow_vcpu_state, which is allocated and then donated from the host VM
when it initializes vcpus for its launched (nested) guest. The same
holds for the cached_vmcs12 field, which is used to cache the
non-shadowed vmcs fields.

pKVM uses vmcs02 as the shadow vmcs pointer of the nested guest while
the host VM programs its vmcs fields, then switches vmcs02 to an
ordinary vmcs for the vmlaunch/vmresume of the same guest.

For a nested guest, while the host VM programs its vmcs, its virtual
vmcs (vmcs12) is saved in two places: the shadowed fields live in
vmcs02, which the host VM writes directly with VMWRITE without a vmexit,
and the rest live in cached_vmcs12, which is filled by the vmwrite
vmexit handler upon the vmexit triggered by a VMWRITE instruction from
the host VM.

The cached_vmcs12 fields in turn fall into two parts: emulated fields
and host state fields. The emulated fields shall be emulated to their
physical values and then filled into vmcs02 before vmcs02 becomes active
for the vmlaunch/vmresume of the nested guest. The host state fields are
guest state of the host vcpu; they shall be restored to the guest state
of the host vcpu's vmcs (vmcs01) before returning to the host VM.

Below is a summary of the contents of the different vmcs fields in each
of the above mentioned vmcs:

               host state      guest state          control
 ---------------------------------------------------------------
 vmcs12*:       host VM	      nested guest         host VM
 vmcs02*:        pKVM         nested guest      host VM + pKVM*
 vmcs01*:        pKVM           host VM              pKVM

 *vmcs12: virtual vmcs of a nested guest
 *vmcs02: vmcs of a nested guest
 *vmcs01: vmcs of host VM
 *the security related control fields of vmcs02 is controlled by pKVM
  (e.g., EPT_POINTER)

Below is how the different vmcs fields are emulated for a nested
guest:

                host state      guest state         control
 ---------------------------------------------------------------
 virtual vmcs:  cached_vmcs12*     vmcs02*          emulated*

 *cached_vmcs12: vmexit then get value from cached_vmcs12
 *vmcs02:        no-vmexit and directly shadow from vmcs02
 *emulated:      vmexit then do the emulation

This patch provides emulation for the VMPTRLD and VMCLEAR vmx
instructions.

For VMPTRLD, pKVM first finds the shadow_vcpu_state (and from it the
cached_vmcs12 & vmcs02) based on the vmcs12 pointer fetched from the
instruction, then copies the whole virtual vmcs - the vmcs12 content -
into the corresponding cached_vmcs12. The vmcs02 is then filled from 3
different parts:
- host state fields: initialized by pKVM, as pKVM is the real host
- shadow fields: copied from cached_vmcs12
- emulated fields: synced & emulated from cached_vmcs12

For VMCLEAR, the vmcs02 shadow fields are copied back to cached_vmcs12,
and the whole cached_vmcs12 is then saved to the virtual vmcs pointer -
vmcs12. A rough outline of the VMPTRLD flow is sketched below.
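
The outline below is only a rough, non-authoritative sketch of that
VMPTRLD flow, stitched together from helpers in this series; error
handling, locking and the exact call order are omitted, and
cached_vmcs12, vmcs02, vmcs02_pa and pcpu stand for values reachable
from the looked-up shadow_vcpu_state and the current physical cpu:

	handle = find_shadow_vcpu_handle_by_vmcs(vmptr);
	shadow_vcpu = get_shadow_vcpu(handle);

	/* copy the whole virtual vmcs into cached_vmcs12 */
	read_gpa(vcpu, vmptr, cached_vmcs12, VMCS12_SIZE);

	/* program vmcs02 while it is the active (ordinary) vmcs */
	vmcs_load_track(vmx, vmcs02);
	pkvm_init_host_state_area(pcpu, vcpu->cpu);
	copy_shadow_fields_vmcs12_to_vmcs02(vmx, cached_vmcs12);
	sync_vmcs12_dirty_fields_to_vmcs02(vmx, cached_vmcs12);
	vmcs_clear_track(vmx, vmcs02);

	/* from now on the host VM shadows vmcs02 via vmcs01 */
	set_shadow_indicator(vmcs02);
	vmcs_load_track(vmx, vmx->vmcs01.vmcs);
	vmcs_write64(VMCS_LINK_POINTER, vmcs02_pa);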

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/x86/kvm/vmx/pkvm/hyp/nested.c   | 268 +++++++++++++++++++++++++++
 arch/x86/kvm/vmx/pkvm/hyp/nested.h   |   2 +
 arch/x86/kvm/vmx/pkvm/hyp/pkvm_hyp.h |   8 +
 arch/x86/kvm/vmx/pkvm/hyp/vmexit.c   |  10 +
 arch/x86/kvm/vmx/pkvm/include/pkvm.h |   2 +
 5 files changed, 290 insertions(+)

diff --git a/arch/x86/kvm/vmx/pkvm/hyp/nested.c b/arch/x86/kvm/vmx/pkvm/hyp/nested.c
index 8e6d0f01819a..dab002ff3c68 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/nested.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/nested.c
@@ -6,6 +6,7 @@
 #include <pkvm.h>
 
 #include "pkvm_hyp.h"
+#include "vmx.h"
 #include "debug.h"
 
 /**
@@ -223,6 +224,11 @@ static void init_emulated_vmcs_fields(void)
 	max_emulated_fields = j;
 }
 
+static bool is_host_fields(unsigned long field)
+{
+	return (((field) >> 10U) & 0x3U) == 3U;
+}
+
 static void nested_vmx_result(enum VMXResult result, int error_number)
 {
 	u64 rflags = vmcs_readl(GUEST_RFLAGS);
@@ -363,6 +369,163 @@ static bool check_vmx_permission(struct kvm_vcpu *vcpu)
 	return permit;
 }
 
+static void clear_shadow_indicator(struct vmcs *vmcs)
+{
+	vmcs->hdr.shadow_vmcs = 0;
+}
+
+static void set_shadow_indicator(struct vmcs *vmcs)
+{
+	vmcs->hdr.shadow_vmcs = 1;
+}
+
+/* current vmcs is vmcs02 */
+static void copy_shadow_fields_vmcs02_to_vmcs12(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
+{
+	const struct shadow_vmcs_field *fields[] = {
+		shadow_read_write_fields,
+		shadow_read_only_fields
+	};
+	const int max_fields[] = {
+		max_shadow_read_write_fields,
+		max_shadow_read_only_fields
+	};
+	struct shadow_vmcs_field field;
+	unsigned long val;
+	int i, q;
+
+	for (q = 0; q < ARRAY_SIZE(fields); q++) {
+		for (i = 0; i < max_fields[q]; i++) {
+			field = fields[q][i];
+			val = __vmcs_readl(field.encoding);
+			if (is_host_fields((field.encoding))) {
+				pkvm_err("%s: field 0x%x is host field, please remove from shadowing!",
+						__func__, field.encoding);
+				continue;
+			}
+			vmcs12_write_any(vmcs12, field.encoding, field.offset, val);
+		}
+	}
+}
+
+/* current vmcs is vmcs02 */
+static void copy_shadow_fields_vmcs12_to_vmcs02(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
+{
+	const struct shadow_vmcs_field *fields[] = {
+		shadow_read_write_fields,
+		shadow_read_only_fields
+	};
+	const int max_fields[] = {
+		max_shadow_read_write_fields,
+		max_shadow_read_only_fields
+	};
+	struct shadow_vmcs_field field;
+	unsigned long val;
+	int i, q;
+
+	for (q = 0; q < ARRAY_SIZE(fields); q++) {
+		for (i = 0; i < max_fields[q]; i++) {
+			field = fields[q][i];
+			val = vmcs12_read_any(vmcs12, field.encoding,
+					      field.offset);
+			if (is_host_fields((field.encoding))) {
+				pkvm_err("%s: field 0x%x is host field, please remove from shadowing!",
+						__func__, field.encoding);
+				continue;
+			}
+			__vmcs_writel(field.encoding, val);
+		}
+	}
+}
+
+/* current vmcs is vmcs02*/
+static u64 emulate_field_for_vmcs02(struct vcpu_vmx *vmx, u16 field, u64 virt_val)
+{
+	u64 val = virt_val;
+
+	switch (field) {
+	case VM_ENTRY_CONTROLS:
+		/* L1 host wishes to use its own MSRs for L2 guest?
+		 * emulate it by enabling vmentry load for such guest states
+		 * then use vmcs01 saved guest states as vmcs02's guest states
+		 */
+		if ((val & VM_ENTRY_LOAD_IA32_EFER) != VM_ENTRY_LOAD_IA32_EFER)
+			val |= VM_ENTRY_LOAD_IA32_EFER;
+		if ((val & VM_ENTRY_LOAD_IA32_PAT) != VM_ENTRY_LOAD_IA32_PAT)
+			val |= VM_ENTRY_LOAD_IA32_PAT;
+		if ((val & VM_ENTRY_LOAD_DEBUG_CONTROLS) != VM_ENTRY_LOAD_DEBUG_CONTROLS)
+			val |= VM_ENTRY_LOAD_DEBUG_CONTROLS;
+		break;
+	case VM_EXIT_CONTROLS:
+		/* L1 host wishes to keep use MSRs from L2 guest after its VMExit?
+		 * emulate it by enabling vmexit save for such guest states
+		 * then vmcs01 shall take these guest states as its before L1 VMEntry
+		 *
+		 * And vmcs01 shall still keep enabling vmexit load such guest states as
+		 * pkvm need restore from its host states
+		 */
+		if ((val & VM_EXIT_LOAD_IA32_EFER) != VM_EXIT_LOAD_IA32_EFER)
+			val |= (VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER);
+		if ((val & VM_EXIT_LOAD_IA32_PAT) != VM_EXIT_LOAD_IA32_PAT)
+			val |= (VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT);
+		/* host always in 64bit mode */
+		val |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
+		break;
+	}
+	return val;
+}
+
+/* current vmcs is vmcs02*/
+static void sync_vmcs12_dirty_fields_to_vmcs02(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
+{
+	struct shadow_vmcs_field field;
+	unsigned long val, phys_val;
+	int i;
+
+	if (vmx->nested.dirty_vmcs12) {
+		for (i = 0; i < max_emulated_fields; i++) {
+			field = emulated_fields[i];
+			val = vmcs12_read_any(vmcs12, field.encoding, field.offset);
+			phys_val = emulate_field_for_vmcs02(vmx, field.encoding, val);
+			__vmcs_writel(field.encoding, phys_val);
+		}
+		vmx->nested.dirty_vmcs12 = false;
+	}
+}
+
+static void nested_release_vmcs12(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct pkvm_host_vcpu *pkvm_hvcpu = to_pkvm_hvcpu(vcpu);
+	struct shadow_vcpu_state *cur_shadow_vcpu = pkvm_hvcpu->current_shadow_vcpu;
+	struct vmcs *vmcs02;
+	struct vmcs12 *vmcs12;
+
+	if (vmx->nested.current_vmptr == INVALID_GPA)
+		return;
+
+	/* cur_shadow_vcpu must be valid here */
+	vmcs02 = (struct vmcs *)cur_shadow_vcpu->vmcs02;
+	vmcs12 = (struct vmcs12 *)cur_shadow_vcpu->cached_vmcs12;
+	vmcs_load_track(vmx, vmcs02);
+	copy_shadow_fields_vmcs02_to_vmcs12(vmx, vmcs12);
+
+	vmcs_clear_track(vmx, vmcs02);
+	clear_shadow_indicator(vmcs02);
+
+	/*disable shadowing*/
+	vmcs_load_track(vmx, vmx->loaded_vmcs->vmcs);
+	secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_SHADOW_VMCS);
+	vmcs_write64(VMCS_LINK_POINTER, INVALID_GPA);
+
+	write_gpa(vcpu, vmx->nested.current_vmptr, vmcs12, VMCS12_SIZE);
+	vmx->nested.dirty_vmcs12 = false;
+	vmx->nested.current_vmptr = INVALID_GPA;
+	pkvm_hvcpu->current_shadow_vcpu = NULL;
+
+	put_shadow_vcpu(cur_shadow_vcpu->shadow_vcpu_handle);
+}
+
 int handle_vmxon(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -379,6 +542,8 @@ int handle_vmxon(struct kvm_vcpu *vcpu)
 		} else if (!validate_vmcs_revision_id(vcpu, vmptr)) {
 			nested_vmx_result(VMfailInvalid, 0);
 		} else {
+			vmx->nested.current_vmptr = INVALID_GPA;
+			vmx->nested.dirty_vmcs12 = false;
 			vmx->nested.vmxon_ptr = vmptr;
 			vmx->nested.vmxon = true;
 
@@ -403,6 +568,109 @@ int handle_vmxoff(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+int handle_vmptrld(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct pkvm_host_vcpu *pkvm_hvcpu = to_pkvm_hvcpu(vcpu);
+	struct shadow_vcpu_state *shadow_vcpu;
+	struct vmcs *vmcs02;
+	struct vmcs12 *vmcs12;
+	gpa_t vmptr;
+	int r;
+
+	if (check_vmx_permission(vcpu)) {
+		if (nested_vmx_get_vmptr(vcpu, &vmptr, &r)) {
+			nested_vmx_result(VMfailValid, VMXERR_VMPTRLD_INVALID_ADDRESS);
+			return r;
+		} else if (vmptr == vmx->nested.vmxon_ptr) {
+			nested_vmx_result(VMfailValid, VMXERR_VMPTRLD_VMXON_POINTER);
+		} else if (!validate_vmcs_revision_id(vcpu, vmptr)) {
+			nested_vmx_result(VMfailValid, VMXERR_VMPTRLD_INCORRECT_VMCS_REVISION_ID);
+		} else {
+			if (vmx->nested.current_vmptr != vmptr) {
+				s64 handle;
+
+				nested_release_vmcs12(vcpu);
+
+				handle = find_shadow_vcpu_handle_by_vmcs(vmptr);
+				shadow_vcpu = handle > 0 ? get_shadow_vcpu(handle) : NULL;
+				if ((handle > 0) && shadow_vcpu) {
+					vmcs02 = (struct vmcs *)shadow_vcpu->vmcs02;
+					vmcs12 = (struct vmcs12 *) shadow_vcpu->cached_vmcs12;
+
+					read_gpa(vcpu, vmptr, vmcs12, VMCS12_SIZE);
+					vmx->nested.dirty_vmcs12 = true;
+
+					if (!shadow_vcpu->vmcs02_inited) {
+						memset(vmcs02, 0, pkvm_hyp->vmcs_config.size);
+						vmcs02->hdr.revision_id = pkvm_hyp->vmcs_config.revision_id;
+						vmcs_load_track(vmx, vmcs02);
+						pkvm_init_host_state_area(pkvm_hvcpu->pcpu, vcpu->cpu);
+						vmcs_writel(HOST_RIP, (unsigned long)__pkvm_vmx_vmexit);
+						shadow_vcpu->last_cpu = vcpu->cpu;
+						shadow_vcpu->vmcs02_inited = true;
+					} else {
+						vmcs_load_track(vmx, vmcs02);
+						if (shadow_vcpu->last_cpu != vcpu->cpu) {
+							pkvm_init_host_state_area(pkvm_hvcpu->pcpu, vcpu->cpu);
+							shadow_vcpu->last_cpu = vcpu->cpu;
+						}
+					}
+					copy_shadow_fields_vmcs12_to_vmcs02(vmx, vmcs12);
+					sync_vmcs12_dirty_fields_to_vmcs02(vmx, vmcs12);
+					vmcs_clear_track(vmx, vmcs02);
+					set_shadow_indicator(vmcs02);
+
+					/* enable shadowing */
+					vmcs_load_track(vmx, vmx->loaded_vmcs->vmcs);
+					vmcs_write64(VMREAD_BITMAP, __pkvm_pa_symbol(vmx_vmread_bitmap));
+					vmcs_write64(VMWRITE_BITMAP, __pkvm_pa_symbol(vmx_vmwrite_bitmap));
+					secondary_exec_controls_setbit(vmx, SECONDARY_EXEC_SHADOW_VMCS);
+					vmcs_write64(VMCS_LINK_POINTER, __pkvm_pa(vmcs02));
+
+					vmx->nested.current_vmptr = vmptr;
+					pkvm_hvcpu->current_shadow_vcpu = shadow_vcpu;
+
+					nested_vmx_result(VMsucceed, 0);
+				} else {
+					nested_vmx_result(VMfailValid, VMXERR_VMPTRLD_INVALID_ADDRESS);
+				}
+			} else {
+				nested_vmx_result(VMsucceed, 0);
+			}
+		}
+	}
+
+	return 0;
+}
+
+int handle_vmclear(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	gpa_t vmptr;
+	u32 zero = 0;
+	int r;
+
+	if (check_vmx_permission(vcpu)) {
+		if (nested_vmx_get_vmptr(vcpu, &vmptr, &r)) {
+			nested_vmx_result(VMfailValid, VMXERR_VMPTRLD_INVALID_ADDRESS);
+			return r;
+		} else if (vmptr == vmx->nested.vmxon_ptr) {
+			nested_vmx_result(VMfailValid, VMXERR_VMCLEAR_VMXON_POINTER);
+		} else {
+			if (vmx->nested.current_vmptr == vmptr)
+				nested_release_vmcs12(vcpu);
+
+			write_gpa(vcpu, vmptr + offsetof(struct vmcs12, launch_state),
+					&zero, sizeof(zero));
+
+			nested_vmx_result(VMsucceed, 0);
+		}
+	}
+
+	return 0;
+}
+
 void pkvm_init_nest(void)
 {
 	init_vmcs_shadow_fields();
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/nested.h b/arch/x86/kvm/vmx/pkvm/hyp/nested.h
index 16b70b13e80e..a228b0fdc15d 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/nested.h
+++ b/arch/x86/kvm/vmx/pkvm/hyp/nested.h
@@ -7,6 +7,8 @@
 
 int handle_vmxon(struct kvm_vcpu *vcpu);
 int handle_vmxoff(struct kvm_vcpu *vcpu);
+int handle_vmptrld(struct kvm_vcpu *vcpu);
+int handle_vmclear(struct kvm_vcpu *vcpu);
 void pkvm_init_nest(void);
 
 #endif
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/pkvm_hyp.h b/arch/x86/kvm/vmx/pkvm/hyp/pkvm_hyp.h
index c574831c6d18..82a59b5d7fd5 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/pkvm_hyp.h
+++ b/arch/x86/kvm/vmx/pkvm/hyp/pkvm_hyp.h
@@ -23,8 +23,16 @@ struct shadow_vcpu_state {
 
 	struct hlist_node hnode;
 	unsigned long vmcs12_pa;
+	bool vmcs02_inited;
 
 	struct vcpu_vmx vmx;
+
+	/* assume vmcs02 is one page */
+	u8 vmcs02[PAGE_SIZE] __aligned(PAGE_SIZE);
+	u8 cached_vmcs12[VMCS12_SIZE] __aligned(PAGE_SIZE);
+
+	/* The last cpu this vmcs02 runs with */
+	int last_cpu;
 } __aligned(PAGE_SIZE);
 
 #define SHADOW_VM_HANDLE_SHIFT		32
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c b/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
index fa67cab803a8..b2cfb87983a8 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
@@ -206,6 +206,16 @@ int pkvm_main(struct kvm_vcpu *vcpu)
 			handle_vmxoff(vcpu);
 			skip_instruction = true;
 			break;
+		case EXIT_REASON_VMPTRLD:
+			pkvm_dbg("CPU%d vmexit reason: VMPTRLD.\n", vcpu->cpu);
+			handle_vmptrld(vcpu);
+			skip_instruction = true;
+			break;
+		case EXIT_REASON_VMCLEAR:
+			pkvm_dbg("CPU%d vmexit reason: VMCLEAR.\n", vcpu->cpu);
+			handle_vmclear(vcpu);
+			skip_instruction = true;
+			break;
 		case EXIT_REASON_XSETBV:
 			handle_xsetbv(vcpu);
 			skip_instruction = true;
diff --git a/arch/x86/kvm/vmx/pkvm/include/pkvm.h b/arch/x86/kvm/vmx/pkvm/include/pkvm.h
index d5393d477df1..9b45627853b3 100644
--- a/arch/x86/kvm/vmx/pkvm/include/pkvm.h
+++ b/arch/x86/kvm/vmx/pkvm/include/pkvm.h
@@ -35,6 +35,8 @@ struct pkvm_host_vcpu {
 	struct vmcs *vmxarea;
 	struct vmcs *current_vmcs;
 
+	void *current_shadow_vcpu;
+
 	bool pending_nmi;
 };
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH part-5 18/22] pkvm: x86: Add VMREAD/VMWRITE emulation
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (16 preceding siblings ...)
  2023-03-12 18:02 ` [RFC PATCH part-5 17/22] pkvm: x86: Add VMPTRLD/VMCLEAR emulation Jason Chen CJ
@ 2023-03-12 18:02 ` Jason Chen CJ
  2023-03-12 18:03 ` [RFC PATCH part-5 19/22] pkvm: x86: Add VMLAUNCH/VMRESUME emulation Jason Chen CJ
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:02 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ

Provide emulation for VMREAD and VMWRITE vmx instructions.

VMREAD/VMWRITE of non-shadowed vmcs fields from the host VM causes a
vmexit. Add vmexit handlers to manage these non-shadowed vmcs fields,
which fall into two different parts:
- emulated fields: recorded in cached_vmcs12, with dirty_vmcs12 set to
  indicate that emulation is needed before vmcs02 takes effect.
- host state fields: recorded in cached_vmcs12 and restored as the guest
  state of vmcs01 when returning back to the host VM.
The operand decoding shared by both handlers is sketched below.
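
For reference, both handlers decode their operands from VMX_INSTRUCTION_INFO
following the SDM layout: bit 10 selects a register (vs. memory) operand,
bits 6:3 give the register carrying the value, and bits 31:28 the register
holding the field encoding. A standalone sketch (not part of the patch; the
instr_info value below is made up for illustration):

  #include <stdint.h>
  #include <stdio.h>

  static const char *gpr[16] = {
          "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
          "r8",  "r9",  "r10", "r11", "r12", "r13", "r14", "r15"
  };

  int main(void)
  {
          /* hypothetical register-form VMWRITE: field encoding in rax,
           * value in rbx */
          uint32_t instr_info = (0u << 28) | (1u << 10) | (3u << 3);

          printf("value in %s, field encoding in %s\n",
                 gpr[(instr_info >> 3) & 0xf], gpr[(instr_info >> 28) & 0xf]);
          return 0;
  }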

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/x86/include/asm/pkvm_image_vars.h |   3 +-
 arch/x86/kvm/vmx/pkvm/hyp/nested.c     | 138 +++++++++++++++++++++++++
 arch/x86/kvm/vmx/pkvm/hyp/nested.h     |   2 +
 arch/x86/kvm/vmx/pkvm/hyp/vmexit.c     |  10 ++
 4 files changed, 152 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pkvm_image_vars.h b/arch/x86/include/asm/pkvm_image_vars.h
index 598c60302bac..967ee323a5c0 100644
--- a/arch/x86/include/asm/pkvm_image_vars.h
+++ b/arch/x86/include/asm/pkvm_image_vars.h
@@ -16,7 +16,8 @@ PKVM_ALIAS(sme_me_mask);
 #endif
 
 PKVM_ALIAS(__default_kernel_pte_mask);
-
+PKVM_ALIAS(vmcs12_field_offsets);
+PKVM_ALIAS(nr_vmcs12_fields);
 #endif
 
 #endif
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/nested.c b/arch/x86/kvm/vmx/pkvm/hyp/nested.c
index dab002ff3c68..fd8755621cc8 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/nested.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/nested.c
@@ -229,6 +229,18 @@ static bool is_host_fields(unsigned long field)
 	return (((field) >> 10U) & 0x3U) == 3U;
 }
 
+static bool is_emulated_fields(unsigned long field_encoding)
+{
+	int i;
+
+	for (i = 0; i < max_emulated_fields; i++) {
+		if ((unsigned long)emulated_fields[i].encoding == field_encoding)
+			return true;
+	}
+
+	return false;
+}
+
 static void nested_vmx_result(enum VMXResult result, int error_number)
 {
 	u64 rflags = vmcs_readl(GUEST_RFLAGS);
@@ -671,6 +683,132 @@ int handle_vmclear(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+int handle_vmwrite(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct pkvm_host_vcpu *pkvm_hvcpu = to_pkvm_hvcpu(vcpu);
+	struct shadow_vcpu_state *cur_shadow_vcpu = pkvm_hvcpu->current_shadow_vcpu;
+	struct vmcs12 *vmcs12 = (struct vmcs12 *)cur_shadow_vcpu->cached_vmcs12;
+	u32 instr_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+	struct x86_exception e;
+	unsigned long field;
+	short offset;
+	gva_t gva;
+	int r, reg;
+	u64 value = 0;
+
+	if (check_vmx_permission(vcpu)) {
+		if (vmx->nested.current_vmptr == INVALID_GPA) {
+			nested_vmx_result(VMfailInvalid, 0);
+		} else {
+			if (instr_info & BIT(10)) {
+				reg = ((instr_info) >> 3) & 0xf;
+				value = vcpu->arch.regs[reg];
+			} else {
+				if (get_vmx_mem_address(vcpu, vmx->exit_qualification,
+							instr_info, &gva))
+					return 1;
+
+				r = read_gva(vcpu, gva, &value, 8, &e);
+				if (r < 0) {
+					/*TODO: handle memory failure exception */
+					return r;
+				}
+			}
+
+			reg = ((instr_info) >> 28) & 0xf;
+			field = vcpu->arch.regs[reg];
+
+			offset = get_vmcs12_field_offset(field);
+			if (offset < 0) {
+				nested_vmx_result(VMfailInvalid, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
+				return 0;
+			}
+
+			/*TODO: check vcpu supports "VMWRITE to any supported field in the VMCS"*/
+			if (vmcs_field_readonly(field)) {
+				nested_vmx_result(VMfailInvalid, VMXERR_VMWRITE_READ_ONLY_VMCS_COMPONENT);
+				return 0;
+			}
+
+			/*
+			 * Some Intel CPUs intentionally drop the reserved bits of the AR byte
+			 * fields on VMWRITE.  Emulate this behavior to ensure consistent KVM
+			 * behavior regardless of the underlying hardware, e.g. if an AR_BYTE
+			 * field is intercepted for VMWRITE but not VMREAD (in L1), then VMREAD
+			 * from L1 will return a different value than VMREAD from L2 (L1 sees
+			 * the stripped down value, L2 sees the full value as stored by KVM).
+			 */
+			if (field >= GUEST_ES_AR_BYTES && field <= GUEST_TR_AR_BYTES)
+				value &= 0x1f0ff;
+
+			vmcs12_write_any(vmcs12, field, offset, value);
+
+			if (is_emulated_fields(field)) {
+				vmx->nested.dirty_vmcs12 = true;
+				nested_vmx_result(VMsucceed, 0);
+			} else if (is_host_fields(field)) {
+				nested_vmx_result(VMsucceed, 0);
+			} else {
+				pkvm_err("%s: not include emulated fields 0x%lx, please add!\n",
+						__func__, field);
+				nested_vmx_result(VMfailInvalid, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
+			}
+		}
+	}
+
+	return 0;
+}
+
+int handle_vmread(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct pkvm_host_vcpu *pkvm_hvcpu = to_pkvm_hvcpu(vcpu);
+	struct shadow_vcpu_state *cur_shadow_vcpu = pkvm_hvcpu->current_shadow_vcpu;
+	struct vmcs12 *vmcs12 = (struct vmcs12 *)cur_shadow_vcpu->cached_vmcs12;
+	u32 instr_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+	struct x86_exception e;
+	unsigned long field;
+	short offset;
+	gva_t gva = 0;
+	int r, reg;
+	u64 value;
+
+	if (check_vmx_permission(vcpu)) {
+		if (vmx->nested.current_vmptr == INVALID_GPA) {
+			nested_vmx_result(VMfailInvalid, 0);
+		} else {
+			/* Decode instruction info and find the field to read */
+			reg = ((instr_info) >> 28) & 0xf;
+			field = vcpu->arch.regs[reg];
+
+			offset = get_vmcs12_field_offset(field);
+			if (offset < 0) {
+				nested_vmx_result(VMfailInvalid, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
+			} else {
+				value = vmcs12_read_any(vmcs12, field, offset);
+				if (instr_info & BIT(10)) {
+					reg = ((instr_info) >> 3) & 0xf;
+					vcpu->arch.regs[reg] = value;
+				} else {
+					if (get_vmx_mem_address(vcpu, vmx->exit_qualification,
+								instr_info, &gva))
+						return 1;
+
+					r = write_gva(vcpu, gva, &value, 8, &e);
+					if (r < 0) {
+						/*TODO: handle memory failure exception */
+						return r;
+					}
+				}
+				nested_vmx_result(VMsucceed, 0);
+			}
+		}
+	}
+
+	return 0;
+}
+
 void pkvm_init_nest(void)
 {
 	init_vmcs_shadow_fields();
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/nested.h b/arch/x86/kvm/vmx/pkvm/hyp/nested.h
index a228b0fdc15d..5fc76bdb135a 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/nested.h
+++ b/arch/x86/kvm/vmx/pkvm/hyp/nested.h
@@ -9,6 +9,8 @@ int handle_vmxon(struct kvm_vcpu *vcpu);
 int handle_vmxoff(struct kvm_vcpu *vcpu);
 int handle_vmptrld(struct kvm_vcpu *vcpu);
 int handle_vmclear(struct kvm_vcpu *vcpu);
+int handle_vmwrite(struct kvm_vcpu *vcpu);
+int handle_vmread(struct kvm_vcpu *vcpu);
 void pkvm_init_nest(void);
 
 #endif
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c b/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
index b2cfb87983a8..d4f2a408e6e9 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
@@ -216,6 +216,16 @@ int pkvm_main(struct kvm_vcpu *vcpu)
 			handle_vmclear(vcpu);
 			skip_instruction = true;
 			break;
+		case EXIT_REASON_VMREAD:
+			pkvm_dbg("CPU%d vmexit reason: VMREAD.\n", vcpu->cpu);
+			handle_vmread(vcpu);
+			skip_instruction = true;
+			break;
+		case EXIT_REASON_VMWRITE:
+			pkvm_dbg("CPU%d vmexit reason: VMWRITE.\n", vcpu->cpu);
+			handle_vmwrite(vcpu);
+			skip_instruction = true;
+			break;
 		case EXIT_REASON_XSETBV:
 			handle_xsetbv(vcpu);
 			skip_instruction = true;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH part-5 19/22] pkvm: x86: Add VMLAUNCH/VMRESUME emulation
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (17 preceding siblings ...)
  2023-03-12 18:02 ` [RFC PATCH part-5 18/22] pkvm: x86: Add VMREAD/VMWRITE emulation Jason Chen CJ
@ 2023-03-12 18:03 ` Jason Chen CJ
  2023-03-12 18:03 ` [RFC PATCH part-5 20/22] pkvm: x86: Add INVEPT/INVVPID emulation Jason Chen CJ
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:03 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ

Provide emulation for VMLAUNCH and VMRESUME vmx instructions.

As pKVM uses vmcs02 to shadow most of the vmcs12 guest fields, it does not
need to take care of those fields before vmcs02 becomes active. Meanwhile
there are still emulated fields cached in cached_vmcs12, so pKVM needs to
sync & emulate this part of the vmcs12 guest fields from cached_vmcs12 into
vmcs02 before it becomes active.

Another thing is that after a nested guest vmexit (vmcs02 is current) and
before the host vcpu vmentry (vmcs01 is current), pKVM needs to prepare
vmcs01's guest state fields by restoring them from vmcs12's host state - the
vmcs12 host state is where the host vcpu wants to return to.
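
As a toy model of the host-state restore described above (not pKVM code; the
MSR values are made up), the EFER the host VM resumes with is either the
value it programmed into vmcs12's host state, or - when it did not ask for
"load IA32_EFER" on VM exit - the value the nested guest was last running
with:

  #include <stdint.h>
  #include <stdio.h>

  #define VM_EXIT_LOAD_IA32_EFER (1u << 21)   /* bit position per the SDM */

  static uint64_t vmcs01_guest_efer(uint32_t vm_exit_controls,
                                    uint64_t vmcs12_host_efer,
                                    uint64_t l2_guest_efer)
  {
          if (vm_exit_controls & VM_EXIT_LOAD_IA32_EFER)
                  return vmcs12_host_efer;   /* L1 reloads its own EFER */
          return l2_guest_efer;              /* L1 keeps the value L2 ran with */
  }

  int main(void)
  {
          printf("0x%llx\n", (unsigned long long)
                 vmcs01_guest_efer(VM_EXIT_LOAD_IA32_EFER, 0xd01, 0x500));
          printf("0x%llx\n", (unsigned long long)
                 vmcs01_guest_efer(0, 0xd01, 0x500));
          return 0;
  }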

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/x86/kvm/vmx/pkvm/hyp/nested.c | 149 +++++++++++++++++++++++++
 arch/x86/kvm/vmx/pkvm/hyp/nested.h |   3 +
 arch/x86/kvm/vmx/pkvm/hyp/vmexit.c | 170 ++++++++++++++++-------------
 3 files changed, 247 insertions(+), 75 deletions(-)

diff --git a/arch/x86/kvm/vmx/pkvm/hyp/nested.c b/arch/x86/kvm/vmx/pkvm/hyp/nested.c
index fd8755621cc8..73fa66ba95bd 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/nested.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/nested.c
@@ -450,6 +450,15 @@ static void copy_shadow_fields_vmcs12_to_vmcs02(struct vcpu_vmx *vmx, struct vmc
 	}
 }
 
+/* current vmcs is vmcs01*/
+static void save_vmcs01_fields_for_emulation(struct vcpu_vmx *vmx)
+{
+	vmx->vcpu.arch.efer = vmcs_read64(GUEST_IA32_EFER);
+	vmx->vcpu.arch.pat = vmcs_read64(GUEST_IA32_PAT);
+	vmx->vcpu.arch.dr7 = vmcs_readl(GUEST_DR7);
+	vmx->nested.pre_vmenter_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
+}
+
 /* current vmcs is vmcs02*/
 static u64 emulate_field_for_vmcs02(struct vcpu_vmx *vmx, u16 field, u64 virt_val)
 {
@@ -505,6 +514,66 @@ static void sync_vmcs12_dirty_fields_to_vmcs02(struct vcpu_vmx *vmx, struct vmcs
 	}
 }
 
+/* current vmcs is vmcs02*/
+static void update_vmcs02_fields_for_emulation(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
+{
+	/* L1 host wishes to use its own MSRs for L2 guest?
+	 * vmcs02 shall use such guest states in vmcs01 as its guest states
+	 */
+	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER) != VM_ENTRY_LOAD_IA32_EFER)
+		vmcs_write64(GUEST_IA32_EFER, vmx->vcpu.arch.efer);
+	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) != VM_ENTRY_LOAD_IA32_PAT)
+		vmcs_write64(GUEST_IA32_PAT, vmx->vcpu.arch.pat);
+	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) != VM_ENTRY_LOAD_DEBUG_CONTROLS) {
+		vmcs_writel(GUEST_DR7, vmx->vcpu.arch.dr7);
+		vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl);
+	}
+}
+
+/* current vmcs is vmcs01, set vmcs01 guest state with vmcs02 host state */
+static void prepare_vmcs01_guest_state(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
+{
+	vmcs_writel(GUEST_CR0, vmcs12->host_cr0);
+	vmcs_writel(GUEST_CR3, vmcs12->host_cr3);
+	vmcs_writel(GUEST_CR4, vmcs12->host_cr4);
+
+	vmcs_writel(GUEST_SYSENTER_ESP, vmcs12->host_ia32_sysenter_esp);
+	vmcs_writel(GUEST_SYSENTER_EIP, vmcs12->host_ia32_sysenter_eip);
+	vmcs_write32(GUEST_SYSENTER_CS, vmcs12->host_ia32_sysenter_cs);
+
+	/* Both cases want vmcs01 to take EFER/PAT from L2
+	 * 1. L1 host wishes to load its own MSRs on L2 guest VMExit
+	 *    such vmcs12's host states shall be set as vmcs01's guest states
+	 * 2. L1 host wishes to keep use MSRs from L2 guest after its VMExit
+	 *    such vmcs02's guest state shall be set as vmcs01's guest states
+	 *    the vmcs02's guest state were recorded in vmcs12 host
+	 *
+	 * For case 1, IA32_PERF_GLOBAL_CTRL is separately checked.
+	 */
+	vmcs_write64(GUEST_IA32_EFER, vmcs12->host_ia32_efer);
+	vmcs_write64(GUEST_IA32_PAT, vmcs12->host_ia32_pat);
+	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL)
+		vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL, vmcs12->host_ia32_perf_global_ctrl);
+
+	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->host_cs_selector);
+	vmcs_write16(GUEST_DS_SELECTOR, vmcs12->host_ds_selector);
+	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->host_es_selector);
+	vmcs_write16(GUEST_FS_SELECTOR, vmcs12->host_fs_selector);
+	vmcs_write16(GUEST_GS_SELECTOR, vmcs12->host_gs_selector);
+	vmcs_write16(GUEST_SS_SELECTOR, vmcs12->host_ss_selector);
+	vmcs_write16(GUEST_TR_SELECTOR, vmcs12->host_tr_selector);
+
+	vmcs_writel(GUEST_FS_BASE, vmcs12->host_fs_base);
+	vmcs_writel(GUEST_GS_BASE, vmcs12->host_gs_base);
+	vmcs_writel(GUEST_TR_BASE, vmcs12->host_tr_base);
+	vmcs_writel(GUEST_GDTR_BASE, vmcs12->host_gdtr_base);
+	vmcs_writel(GUEST_IDTR_BASE, vmcs12->host_idtr_base);
+
+	vmcs_writel(GUEST_RIP, vmcs12->host_rip);
+	vmcs_writel(GUEST_RSP, vmcs12->host_rsp);
+	vmcs_writel(GUEST_RFLAGS, 0x2);
+}
+
 static void nested_release_vmcs12(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -538,6 +607,38 @@ static void nested_release_vmcs12(struct kvm_vcpu *vcpu)
 	put_shadow_vcpu(cur_shadow_vcpu->shadow_vcpu_handle);
 }
 
+static void nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct pkvm_host_vcpu *pkvm_hvcpu = to_pkvm_hvcpu(vcpu);
+	struct shadow_vcpu_state *cur_shadow_vcpu = pkvm_hvcpu->current_shadow_vcpu;
+	struct vmcs *vmcs02 = (struct vmcs *)cur_shadow_vcpu->vmcs02;
+	struct vmcs12 *vmcs12 = (struct vmcs12 *)cur_shadow_vcpu->cached_vmcs12;
+
+	if (vmx->nested.current_vmptr == INVALID_GPA) {
+		nested_vmx_result(VMfailInvalid, 0);
+	} else if (vmcs12->launch_state == launch) {
+		/* VMLAUNCH_NONCLEAR_VMCS or VMRESUME_NONLAUNCHED_VMCS */
+		nested_vmx_result(VMfailValid,
+			launch ? VMXERR_VMLAUNCH_NONCLEAR_VMCS : VMXERR_VMRESUME_NONLAUNCHED_VMCS);
+	} else {
+		/* save vmcs01 guest state for possible emulation */
+		save_vmcs01_fields_for_emulation(vmx);
+
+		/* switch to vmcs02 */
+		vmcs_clear_track(vmx, vmcs02);
+		clear_shadow_indicator(vmcs02);
+		vmcs_load_track(vmx, vmcs02);
+
+		sync_vmcs12_dirty_fields_to_vmcs02(vmx, vmcs12);
+
+		update_vmcs02_fields_for_emulation(vmx, vmcs12);
+
+		/* mark guest mode */
+		vcpu->arch.hflags |= HF_GUEST_MASK;
+	}
+}
+
 int handle_vmxon(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -809,6 +910,54 @@ int handle_vmread(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+int handle_vmresume(struct kvm_vcpu *vcpu)
+{
+	if (check_vmx_permission(vcpu))
+		nested_vmx_run(vcpu, false);
+
+	return 0;
+}
+
+int handle_vmlaunch(struct kvm_vcpu *vcpu)
+{
+	if (check_vmx_permission(vcpu))
+		nested_vmx_run(vcpu, true);
+
+	return 0;
+}
+
+int nested_vmexit(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct pkvm_host_vcpu *pkvm_hvcpu = to_pkvm_hvcpu(vcpu);
+	struct shadow_vcpu_state *cur_shadow_vcpu = pkvm_hvcpu->current_shadow_vcpu;
+	struct vmcs *vmcs02 = (struct vmcs *)cur_shadow_vcpu->vmcs02;
+	struct vmcs12 *vmcs12 = (struct vmcs12 *)cur_shadow_vcpu->cached_vmcs12;
+
+	/* clear guest mode if need switch back to host */
+	vcpu->arch.hflags &= ~HF_GUEST_MASK;
+
+	/* L1 host wishes to keep use MSRs from L2 guest after its VMExit?
+	 * save vmcs02 guest state for later vmcs01 guest state preparation
+	 */
+	if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER) != VM_EXIT_LOAD_IA32_EFER)
+		vmcs12->host_ia32_efer = vmcs_read64(GUEST_IA32_EFER);
+	if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PAT) != VM_EXIT_LOAD_IA32_PAT)
+		vmcs12->host_ia32_pat = vmcs_read64(GUEST_IA32_PAT);
+
+	if (!vmcs12->launch_state)
+		vmcs12->launch_state = 1;
+
+	/* switch to vmcs01 */
+	vmcs_clear_track(vmx, vmcs02);
+	set_shadow_indicator(vmcs02);
+	vmcs_load_track(vmx, vmx->loaded_vmcs->vmcs);
+
+	prepare_vmcs01_guest_state(vmx, vmcs12);
+
+	return 0;
+}
+
 void pkvm_init_nest(void)
 {
 	init_vmcs_shadow_fields();
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/nested.h b/arch/x86/kvm/vmx/pkvm/hyp/nested.h
index 5fc76bdb135a..3f785be165c2 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/nested.h
+++ b/arch/x86/kvm/vmx/pkvm/hyp/nested.h
@@ -11,6 +11,9 @@ int handle_vmptrld(struct kvm_vcpu *vcpu);
 int handle_vmclear(struct kvm_vcpu *vcpu);
 int handle_vmwrite(struct kvm_vcpu *vcpu);
 int handle_vmread(struct kvm_vcpu *vcpu);
+int handle_vmresume(struct kvm_vcpu *vcpu);
+int handle_vmlaunch(struct kvm_vcpu *vcpu);
+int nested_vmexit(struct kvm_vcpu *vcpu);
 void pkvm_init_nest(void);
 
 #endif
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c b/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
index d4f2a408e6e9..27b6518032b5 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
@@ -159,7 +159,7 @@ int pkvm_main(struct kvm_vcpu *vcpu)
 	int launch = 1;
 
 	do {
-		bool skip_instruction = false;
+		bool skip_instruction = false, guest_exit = false;
 
 		if (__pkvm_vmx_vcpu_run(vcpu->arch.regs, launch)) {
 			pkvm_err("%s: CPU%d run_vcpu failed with error 0x%x\n",
@@ -174,87 +174,107 @@ int pkvm_main(struct kvm_vcpu *vcpu)
 		vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
 		vmx->exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 
-		switch (vmx->exit_reason.full) {
-		case EXIT_REASON_CPUID:
-			handle_cpuid(vcpu);
-			skip_instruction = true;
-			break;
-		case EXIT_REASON_CR_ACCESS:
-			pkvm_dbg("CPU%d vmexit_reason: CR_ACCESS.\n", vcpu->cpu);
-			handle_cr(vcpu);
-			skip_instruction = true;
-			break;
-		case EXIT_REASON_MSR_READ:
-			pkvm_dbg("CPU%d vmexit_reason: MSR_READ 0x%lx\n",
-					vcpu->cpu, vcpu->arch.regs[VCPU_REGS_RCX]);
-			handle_read_msr(vcpu);
-			skip_instruction = true;
-			break;
-		case EXIT_REASON_MSR_WRITE:
-			pkvm_dbg("CPU%d vmexit_reason: MSR_WRITE 0x%lx\n",
-					vcpu->cpu, vcpu->arch.regs[VCPU_REGS_RCX]);
-			handle_write_msr(vcpu);
-			skip_instruction = true;
-			break;
-		case EXIT_REASON_VMON:
-			pkvm_dbg("CPU%d vmexit reason: VMXON.\n", vcpu->cpu);
-			handle_vmxon(vcpu);
-			skip_instruction = true;
-			break;
-		case EXIT_REASON_VMOFF:
-			pkvm_dbg("CPU%d vmexit reason: VMXOFF.\n", vcpu->cpu);
-			handle_vmxoff(vcpu);
-			skip_instruction = true;
-			break;
-		case EXIT_REASON_VMPTRLD:
-			pkvm_dbg("CPU%d vmexit reason: VMPTRLD.\n", vcpu->cpu);
-			handle_vmptrld(vcpu);
-			skip_instruction = true;
-			break;
-		case EXIT_REASON_VMCLEAR:
-			pkvm_dbg("CPU%d vmexit reason: VMCLEAR.\n", vcpu->cpu);
-			handle_vmclear(vcpu);
-			skip_instruction = true;
-			break;
-		case EXIT_REASON_VMREAD:
-			pkvm_dbg("CPU%d vmexit reason: VMREAD.\n", vcpu->cpu);
-			handle_vmread(vcpu);
-			skip_instruction = true;
-			break;
-		case EXIT_REASON_VMWRITE:
-			pkvm_dbg("CPU%d vmexit reason: VMWRITE.\n", vcpu->cpu);
-			handle_vmwrite(vcpu);
-			skip_instruction = true;
-			break;
-		case EXIT_REASON_XSETBV:
-			handle_xsetbv(vcpu);
-			skip_instruction = true;
-			break;
-		case EXIT_REASON_VMCALL:
-			vcpu->arch.regs[VCPU_REGS_RAX] = handle_vmcall(vcpu);
-			skip_instruction = true;
-			break;
-		case EXIT_REASON_EPT_VIOLATION:
-			if (handle_host_ept_violation(vmcs_read64(GUEST_PHYSICAL_ADDRESS)))
+		if (is_guest_mode(vcpu)) {
+			guest_exit = true;
+			nested_vmexit(vcpu);
+		} else {
+			switch (vmx->exit_reason.full) {
+			case EXIT_REASON_CPUID:
+				handle_cpuid(vcpu);
 				skip_instruction = true;
-			break;
-		case EXIT_REASON_INTERRUPT_WINDOW:
-			handle_irq_window(vcpu);
-			break;
-		default:
-			pkvm_dbg("CPU%d: Unsupported vmexit reason 0x%x.\n", vcpu->cpu, vmx->exit_reason.full);
-			skip_instruction = true;
-			break;
+				break;
+			case EXIT_REASON_CR_ACCESS:
+				pkvm_dbg("CPU%d vmexit_reason: CR_ACCESS.\n", vcpu->cpu);
+				handle_cr(vcpu);
+				skip_instruction = true;
+				break;
+			case EXIT_REASON_MSR_READ:
+				pkvm_dbg("CPU%d vmexit_reason: MSR_READ 0x%lx\n",
+						vcpu->cpu, vcpu->arch.regs[VCPU_REGS_RCX]);
+				handle_read_msr(vcpu);
+				skip_instruction = true;
+				break;
+			case EXIT_REASON_MSR_WRITE:
+				pkvm_dbg("CPU%d vmexit_reason: MSR_WRITE 0x%lx\n",
+						vcpu->cpu, vcpu->arch.regs[VCPU_REGS_RCX]);
+				handle_write_msr(vcpu);
+				skip_instruction = true;
+				break;
+			case EXIT_REASON_VMLAUNCH:
+				handle_vmlaunch(vcpu);
+				break;
+			case EXIT_REASON_VMRESUME:
+				handle_vmresume(vcpu);
+				break;
+			case EXIT_REASON_VMON:
+				pkvm_dbg("CPU%d vmexit reason: VMXON.\n", vcpu->cpu);
+				handle_vmxon(vcpu);
+				skip_instruction = true;
+				break;
+			case EXIT_REASON_VMOFF:
+				pkvm_dbg("CPU%d vmexit reason: VMXOFF.\n", vcpu->cpu);
+				handle_vmxoff(vcpu);
+				skip_instruction = true;
+				break;
+			case EXIT_REASON_VMPTRLD:
+				pkvm_dbg("CPU%d vmexit reason: VMPTRLD.\n", vcpu->cpu);
+				handle_vmptrld(vcpu);
+				skip_instruction = true;
+				break;
+			case EXIT_REASON_VMCLEAR:
+				pkvm_dbg("CPU%d vmexit reason: VMCLEAR.\n", vcpu->cpu);
+				handle_vmclear(vcpu);
+				skip_instruction = true;
+				break;
+			case EXIT_REASON_VMREAD:
+				pkvm_dbg("CPU%d vmexit reason: VMREAD.\n", vcpu->cpu);
+				handle_vmread(vcpu);
+				skip_instruction = true;
+				break;
+			case EXIT_REASON_VMWRITE:
+				pkvm_dbg("CPU%d vmexit reason: VMWRITE.\n", vcpu->cpu);
+				handle_vmwrite(vcpu);
+				skip_instruction = true;
+				break;
+			case EXIT_REASON_XSETBV:
+				handle_xsetbv(vcpu);
+				skip_instruction = true;
+				break;
+			case EXIT_REASON_VMCALL:
+				vcpu->arch.regs[VCPU_REGS_RAX] = handle_vmcall(vcpu);
+				skip_instruction = true;
+				break;
+			case EXIT_REASON_EPT_VIOLATION:
+				if (handle_host_ept_violation(vmcs_read64(GUEST_PHYSICAL_ADDRESS)))
+					skip_instruction = true;
+				break;
+			case EXIT_REASON_INTERRUPT_WINDOW:
+				handle_irq_window(vcpu);
+				break;
+			default:
+				pkvm_dbg("CPU%d: Unsupported vmexit reason 0x%x.\n", vcpu->cpu, vmx->exit_reason.full);
+				skip_instruction = true;
+				break;
+			}
 		}
 
-		/* now only need vmresume */
-		launch = 0;
+		if (is_guest_mode(vcpu)) {
+			/*
+			 * L2 VMExit -> L2 VMEntry: vmresume
+			 * L1 VMExit -> L2 VMEntry: vmlaunch
+			 * as vmcs02 is clear every time
+			 */
+			launch = guest_exit ? 0 : 1;
+		} else {
+			handle_pending_events(vcpu);
+
+			/* pkvm_host only need vmresume */
+			launch = 0;
+		}
 
 		if (skip_instruction)
 			skip_emulated_instruction();
 
-		handle_pending_events(vcpu);
-
 		native_write_cr2(vcpu->arch.cr2);
 	} while (1);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH part-5 20/22] pkvm: x86: Add INVEPT/INVVPID emulation
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (18 preceding siblings ...)
  2023-03-12 18:03 ` [RFC PATCH part-5 19/22] pkvm: x86: Add VMLAUNCH/VMRESUME emulation Jason Chen CJ
@ 2023-03-12 18:03 ` Jason Chen CJ
  2023-03-12 18:03 ` [RFC PATCH part-5 21/22] pkvm: x86: Initialize msr_bitmap for vmsr Jason Chen CJ
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:03 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ

INVEPT & INVVPID cause a vmexit unconditionally, so pKVM must have handlers
for them.

This is a temporary solution which simply does a global invept for both
invept and invvpid. Once pKVM supports shadow EPT, such emulation shall be
done based on the shadow EPT.

Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/x86/kvm/vmx/pkvm/hyp/vmexit.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c b/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
index 27b6518032b5..8e7392010887 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
@@ -251,6 +251,11 @@ int pkvm_main(struct kvm_vcpu *vcpu)
 			case EXIT_REASON_INTERRUPT_WINDOW:
 				handle_irq_window(vcpu);
 				break;
+			case EXIT_REASON_INVEPT:
+			case EXIT_REASON_INVVPID:
+				ept_sync_global();
+				skip_instruction = true;
+				break;
 			default:
 				pkvm_dbg("CPU%d: Unsupported vmexit reason 0x%x.\n", vcpu->cpu, vmx->exit_reason.full);
 				skip_instruction = true;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH part-5 21/22] pkvm: x86: Initialize msr_bitmap for vmsr
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (19 preceding siblings ...)
  2023-03-12 18:03 ` [RFC PATCH part-5 20/22] pkvm: x86: Add INVEPT/INVVPID emulation Jason Chen CJ
@ 2023-03-12 18:03 ` Jason Chen CJ
  2023-03-12 18:03 ` [RFC PATCH part-5 22/22] pkvm: x86: Add vmx msr emulation Jason Chen CJ
  2023-03-13 16:58 ` [RFC PATCH part-5 00/22] VMX emulation Sean Christopherson
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:03 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ, Chuanxiao Dong

Introduce an enable_msr_interception API to initialize the msr_bitmap; based
on it, pKVM can set up the list of virtual MSRs which need to be trapped and
emulated in the hypervisor.
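
As a worked example of the 4KB bitmap layout the new helper assumes (read
bitmaps in bytes 0-2047, write bitmaps in bytes 2048-4095, with a 1024-byte
offset for the 0xc0000000-based MSR range), the following standalone snippet
- not part of the patch - computes the byte/bit positions for two MSRs:

  #include <stdio.h>

  static void show(unsigned int msr)
  {
          unsigned int read_offset = (msr & 0xc0000000U) ? 1024U : 0U;
          unsigned int index = (msr & 0x1FFFU) >> 3;
          unsigned int bit = msr & 0x7U;

          printf("MSR 0x%x: read byte %u, write byte %u, bit %u\n",
                 msr, read_offset + index, 2048U + read_offset + index, bit);
  }

  int main(void)
  {
          show(0x485);        /* MSR_IA32_VMX_MISC */
          show(0xc0000080);   /* MSR_EFER, in the high range */
          return 0;
  }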

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/x86/kvm/vmx/pkvm/hyp/Makefile   |  2 +-
 arch/x86/kvm/vmx/pkvm/hyp/vmexit.c   | 13 +----
 arch/x86/kvm/vmx/pkvm/hyp/vmsr.c     | 73 ++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/pkvm/hyp/vmsr.h     | 11 +++++
 arch/x86/kvm/vmx/pkvm/include/pkvm.h |  2 +
 arch/x86/kvm/vmx/pkvm/pkvm_host.c    |  1 +
 6 files changed, 89 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx/pkvm/hyp/Makefile b/arch/x86/kvm/vmx/pkvm/hyp/Makefile
index ca6d43509ddc..fc75cdd9fc79 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/Makefile
+++ b/arch/x86/kvm/vmx/pkvm/hyp/Makefile
@@ -12,7 +12,7 @@ ccflags-y += -D__PKVM_HYP__
 virt-dir	:= ../../../../../../$(KVM_PKVM)
 
 pkvm-hyp-y	:= vmx_asm.o vmexit.o memory.o early_alloc.o pgtable.o mmu.o pkvm.o \
-		   init_finalise.o ept.o idt.o irq.o nested.o vmx.o
+		   init_finalise.o ept.o idt.o irq.o nested.o vmx.o vmsr.o
 
 ifndef CONFIG_PKVM_INTEL_DEBUG
 lib-dir		:= lib
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c b/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
index 8e7392010887..307514f44ec9 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/vmexit.c
@@ -9,6 +9,7 @@
 #include "vmexit.h"
 #include "ept.h"
 #include "pkvm_hyp.h"
+#include "vmsr.h"
 #include "nested.h"
 #include "debug.h"
 
@@ -109,18 +110,6 @@ static unsigned long handle_vmcall(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
-static void handle_read_msr(struct kvm_vcpu *vcpu)
-{
-	/* simply return 0 for non-supported MSRs */
-	vcpu->arch.regs[VCPU_REGS_RAX] = 0;
-	vcpu->arch.regs[VCPU_REGS_RDX] = 0;
-}
-
-static void handle_write_msr(struct kvm_vcpu *vcpu)
-{
-	/*No emulation for msr write now*/
-}
-
 static void handle_xsetbv(struct kvm_vcpu *vcpu)
 {
 	u32 eax = (u32)(vcpu->arch.regs[VCPU_REGS_RAX] & -1u);
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/vmsr.c b/arch/x86/kvm/vmx/pkvm/hyp/vmsr.c
new file mode 100644
index 000000000000..360b0333b84f
--- /dev/null
+++ b/arch/x86/kvm/vmx/pkvm/hyp/vmsr.c
@@ -0,0 +1,73 @@
+// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+/*
+ * Copyright (C) 2018-2022 Intel Corporation
+ */
+
+#include <pkvm.h>
+#include "cpu.h"
+#include "debug.h"
+
+#define INTERCEPT_DISABLE		(0U)
+#define INTERCEPT_READ			(1U << 0U)
+#define INTERCEPT_WRITE			(1U << 1U)
+#define INTERCEPT_READ_WRITE		(INTERCEPT_READ | INTERCEPT_WRITE)
+
+static unsigned int emulated_ro_guest_msrs[] = {
+	/* DUMMY */
+};
+
+static void enable_msr_interception(u8 *bitmap, unsigned int msr_arg, unsigned int mode)
+{
+	unsigned int read_offset = 0U;
+	unsigned int write_offset = 2048U;
+	unsigned int msr = msr_arg;
+	u8 msr_bit;
+	unsigned int msr_index;
+
+	if ((msr <= 0x1FFFU) || ((msr >= 0xc0000000U) && (msr <= 0xc0001fffU))) {
+		if ((msr & 0xc0000000U) != 0U) {
+			read_offset = read_offset + 1024U;
+			write_offset = write_offset + 1024U;
+		}
+
+		msr &= 0x1FFFU;
+		msr_bit = (u8)(1U << (msr & 0x7U));
+		msr_index = msr >> 3U;
+
+		if ((mode & INTERCEPT_READ) == INTERCEPT_READ)
+			bitmap[read_offset + msr_index] |= msr_bit;
+		else
+			bitmap[read_offset + msr_index] &= ~msr_bit;
+
+		if ((mode & INTERCEPT_WRITE) == INTERCEPT_WRITE)
+			bitmap[write_offset + msr_index] |= msr_bit;
+		else
+			bitmap[write_offset + msr_index] &= ~msr_bit;
+	} else {
+		pkvm_err("%s, Invalid MSR: 0x%x", __func__, msr);
+	}
+}
+
+int handle_read_msr(struct kvm_vcpu *vcpu)
+{
+	/* simply return 0 for non-supported MSRs */
+	vcpu->arch.regs[VCPU_REGS_RAX] = 0;
+	vcpu->arch.regs[VCPU_REGS_RDX] = 0;
+
+	return 0;
+}
+
+int handle_write_msr(struct kvm_vcpu *vcpu)
+{
+	/*No emulation for msr write now*/
+	return 0;
+}
+
+void init_msr_emulation(struct vcpu_vmx *vmx)
+{
+	int i;
+	u8 *bitmap = (u8 *)vmx->loaded_vmcs->msr_bitmap;
+
+	for (i = 0; i < ARRAY_SIZE(emulated_ro_guest_msrs); i++)
+		enable_msr_interception(bitmap, emulated_ro_guest_msrs[i], INTERCEPT_READ);
+}
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/vmsr.h b/arch/x86/kvm/vmx/pkvm/hyp/vmsr.h
new file mode 100644
index 000000000000..1f39a37996f4
--- /dev/null
+++ b/arch/x86/kvm/vmx/pkvm/hyp/vmsr.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022 Intel Corporation
+ */
+#ifndef _PKVM_VMSR_H_
+#define _PKVM_VMSR_H_
+
+int handle_read_msr(struct kvm_vcpu *vcpu);
+int handle_write_msr(struct kvm_vcpu *vcpu);
+
+#endif
diff --git a/arch/x86/kvm/vmx/pkvm/include/pkvm.h b/arch/x86/kvm/vmx/pkvm/include/pkvm.h
index 9b45627853b3..59bbe645baaa 100644
--- a/arch/x86/kvm/vmx/pkvm/include/pkvm.h
+++ b/arch/x86/kvm/vmx/pkvm/include/pkvm.h
@@ -106,6 +106,8 @@ PKVM_DECLARE(void *, pkvm_early_alloc_contig(unsigned int nr_pages));
 PKVM_DECLARE(void *, pkvm_early_alloc_page(void));
 PKVM_DECLARE(void, pkvm_early_alloc_init(void *virt, unsigned long size));
 
+PKVM_DECLARE(void, init_msr_emulation(struct vcpu_vmx *vmx));
+
 PKVM_DECLARE(void, noop_handler(void));
 PKVM_DECLARE(void, nmi_handler(void));
 
diff --git a/arch/x86/kvm/vmx/pkvm/pkvm_host.c b/arch/x86/kvm/vmx/pkvm/pkvm_host.c
index cbba3033ba63..90d7cddde9ef 100644
--- a/arch/x86/kvm/vmx/pkvm/pkvm_host.c
+++ b/arch/x86/kvm/vmx/pkvm/pkvm_host.c
@@ -280,6 +280,7 @@ static __init void init_execution_control(struct vcpu_vmx *vmx,
 	/* guest handles exception directly */
 	vmcs_write32(EXCEPTION_BITMAP, 0);
 
+	pkvm_sym(init_msr_emulation(vmx));
 	vmcs_write64(MSR_BITMAP, __pa(vmx->vmcs01.msr_bitmap));
 
 	/*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH part-5 22/22] pkvm: x86: Add vmx msr emulation
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (20 preceding siblings ...)
  2023-03-12 18:03 ` [RFC PATCH part-5 21/22] pkvm: x86: Initialize msr_bitmap for vmsr Jason Chen CJ
@ 2023-03-12 18:03 ` Jason Chen CJ
  2023-03-13 16:58 ` [RFC PATCH part-5 00/22] VMX emulation Sean Christopherson
  22 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-12 18:03 UTC (permalink / raw)
  To: kvm; +Cc: Jason Chen CJ, Chuanxiao Dong

The host VM sees the VMX capability, but with reduced features. pKVM needs to
provide emulation of these vmx msrs to report the supported VMX capabilities
to the host VM.
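
For the VMX control capability MSRs (e.g. MSR_IA32_VMX_PROCBASED_CTLS2), the
low 32 bits report the allowed 0-settings and the high 32 bits the allowed
1-settings, so hiding a feature from the host VM amounts to clearing its bit
in the high half before returning the value. A standalone sketch (the
hardware value below is made up for illustration):

  #include <stdint.h>
  #include <stdio.h>

  #define SECONDARY_EXEC_DESC             0x00000004u
  #define SECONDARY_EXEC_ENABLE_VMFUNC    0x00002000u
  #define SECONDARY_EXEC_SHADOW_VMCS      0x00004000u

  int main(void)
  {
          uint32_t high = 0x0053cfffu;    /* hypothetical allowed-1 bits */
          uint32_t hide = SECONDARY_EXEC_DESC | SECONDARY_EXEC_ENABLE_VMFUNC |
                          SECONDARY_EXEC_SHADOW_VMCS;

          printf("host VM sees allowed-1 bits: 0x%08x\n", high & ~hide);
          return 0;
  }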

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
---
 arch/x86/kvm/vmx/pkvm/hyp/nested.c            | 65 +++++++++++++++++++
 arch/x86/kvm/vmx/pkvm/hyp/nested.h            |  8 +++
 .../vmx/pkvm/hyp/pkvm_nested_vmcs_fields.h    |  2 +-
 arch/x86/kvm/vmx/pkvm/hyp/vmsr.c              | 25 +++++--
 4 files changed, 94 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/pkvm/hyp/nested.c b/arch/x86/kvm/vmx/pkvm/hyp/nested.c
index 73fa66ba95bd..429bfe7bb309 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/nested.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/nested.c
@@ -6,9 +6,71 @@
 #include <pkvm.h>
 
 #include "pkvm_hyp.h"
+#include "nested.h"
+#include "cpu.h"
 #include "vmx.h"
 #include "debug.h"
 
+/*
+ * Not support shadow vmcs & vmfunc;
+ * Not support descriptor-table exiting
+ * as it requires guest memory access
+ * to decode and emulate instructions
+ * which is not supported for protected VM.
+ */
+#define NESTED_UNSUPPORTED_2NDEXEC 		\
+	(SECONDARY_EXEC_SHADOW_VMCS | 		\
+	 SECONDARY_EXEC_ENABLE_VMFUNC | 	\
+	 SECONDARY_EXEC_DESC)
+
+static const unsigned int vmx_msrs[] = {
+	LIST_OF_VMX_MSRS
+};
+
+bool is_vmx_msr(unsigned long msr)
+{
+	bool found = false;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(vmx_msrs); i++) {
+		if (msr == vmx_msrs[i]) {
+			found = true;
+			break;
+		}
+	}
+
+	return found;
+}
+
+int read_vmx_msr(struct kvm_vcpu *vcpu, unsigned long msr, u64 *val)
+{
+	u32 low, high;
+	int err = 0;
+
+	pkvm_rdmsr(msr, low, high);
+
+	switch (msr) {
+	case MSR_IA32_VMX_PROCBASED_CTLS2:
+		high &= ~NESTED_UNSUPPORTED_2NDEXEC;
+		break;
+	case MSR_IA32_VMX_MISC:
+		/* not support PT, SMM */
+		low &= ~(MSR_IA32_VMX_MISC_INTEL_PT | BIT(28));
+		break;
+	case MSR_IA32_VMX_VMFUNC:
+		/* not support vmfunc */
+		low = high = 0;
+		break;
+	default:
+		err = -EACCES;
+		break;
+	}
+
+	*val = (u64)high << 32 | (u64)low;
+
+	return err;
+}
+
 /**
  * According to SDM Appendix B Field Encoding in VMCS, some fields only
  * exist on processor that support the 1-setting of the corresponding
@@ -492,6 +554,9 @@ static u64 emulate_field_for_vmcs02(struct vcpu_vmx *vmx, u16 field, u64 virt_va
 		/* host always in 64bit mode */
 		val |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
 		break;
+	case SECONDARY_VM_EXEC_CONTROL:
+		val &= ~NESTED_UNSUPPORTED_2NDEXEC;
+		break;
 	}
 	return val;
 }
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/nested.h b/arch/x86/kvm/vmx/pkvm/hyp/nested.h
index 3f785be165c2..24cf731e96dd 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/nested.h
+++ b/arch/x86/kvm/vmx/pkvm/hyp/nested.h
@@ -16,4 +16,12 @@ int handle_vmlaunch(struct kvm_vcpu *vcpu);
 int nested_vmexit(struct kvm_vcpu *vcpu);
 void pkvm_init_nest(void);
 
+#define LIST_OF_VMX_MSRS        		\
+	MSR_IA32_VMX_MISC,                      \
+	MSR_IA32_VMX_PROCBASED_CTLS2,           \
+	MSR_IA32_VMX_VMFUNC
+
+bool is_vmx_msr(unsigned long msr);
+int read_vmx_msr(struct kvm_vcpu *vcpu, unsigned long msr, u64 *val);
+
 #endif
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/pkvm_nested_vmcs_fields.h b/arch/x86/kvm/vmx/pkvm/hyp/pkvm_nested_vmcs_fields.h
index 8666cda4ee6d..7b0f1d73d76c 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/pkvm_nested_vmcs_fields.h
+++ b/arch/x86/kvm/vmx/pkvm/hyp/pkvm_nested_vmcs_fields.h
@@ -28,6 +28,7 @@ EMULATED_FIELD_RW(VIRTUAL_PROCESSOR_ID, virtual_processor_id)
 /* 32-bits */
 EMULATED_FIELD_RW(VM_EXIT_CONTROLS, vm_exit_controls)
 EMULATED_FIELD_RW(VM_ENTRY_CONTROLS, vm_entry_controls)
+EMULATED_FIELD_RW(SECONDARY_VM_EXEC_CONTROL, secondary_vm_exec_control)
 
 /* 64-bits, what about their HIGH 32 fields?  */
 EMULATED_FIELD_RW(IO_BITMAP_A, io_bitmap_a)
@@ -77,7 +78,6 @@ SHADOW_FIELD_RW(GUEST_PML_INDEX, guest_pml_index)
 /* 32-bits */
 SHADOW_FIELD_RW(PIN_BASED_VM_EXEC_CONTROL, pin_based_vm_exec_control)
 SHADOW_FIELD_RW(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control)
-SHADOW_FIELD_RW(SECONDARY_VM_EXEC_CONTROL, secondary_vm_exec_control)
 SHADOW_FIELD_RW(EXCEPTION_BITMAP, exception_bitmap)
 SHADOW_FIELD_RW(PAGE_FAULT_ERROR_CODE_MASK, page_fault_error_code_mask)
 SHADOW_FIELD_RW(PAGE_FAULT_ERROR_CODE_MATCH, page_fault_error_code_match)
diff --git a/arch/x86/kvm/vmx/pkvm/hyp/vmsr.c b/arch/x86/kvm/vmx/pkvm/hyp/vmsr.c
index 360b0333b84f..ec7476debf25 100644
--- a/arch/x86/kvm/vmx/pkvm/hyp/vmsr.c
+++ b/arch/x86/kvm/vmx/pkvm/hyp/vmsr.c
@@ -5,6 +5,7 @@
 
 #include <pkvm.h>
 #include "cpu.h"
+#include "nested.h"
 #include "debug.h"
 
 #define INTERCEPT_DISABLE		(0U)
@@ -13,7 +14,7 @@
 #define INTERCEPT_READ_WRITE		(INTERCEPT_READ | INTERCEPT_WRITE)
 
 static unsigned int emulated_ro_guest_msrs[] = {
-	/* DUMMY */
+	LIST_OF_VMX_MSRS,
 };
 
 static void enable_msr_interception(u8 *bitmap, unsigned int msr_arg, unsigned int mode)
@@ -50,11 +51,25 @@ static void enable_msr_interception(u8 *bitmap, unsigned int msr_arg, unsigned i
 
 int handle_read_msr(struct kvm_vcpu *vcpu)
 {
-	/* simply return 0 for non-supported MSRs */
-	vcpu->arch.regs[VCPU_REGS_RAX] = 0;
-	vcpu->arch.regs[VCPU_REGS_RDX] = 0;
+	unsigned long msr = vcpu->arch.regs[VCPU_REGS_RCX];
+	int ret = 0;
+	u32 low = 0, high = 0;
+	u64 val;
 
-	return 0;
+	/* For non-supported MSRs, return low=high=0 by default */
+	if (is_vmx_msr(msr)) {
+		ret = read_vmx_msr(vcpu, msr, &val);
+		if (!ret) {
+			low = (u32)val;
+			high = (u32)(val >> 32);
+		}
+	}
+	pkvm_dbg("%s: CPU%d Value of msr 0x%lx: low=0x%x, high=0x%x\n", __func__, vcpu->cpu, msr, low, high);
+
+	vcpu->arch.regs[VCPU_REGS_RAX] = low;
+	vcpu->arch.regs[VCPU_REGS_RDX] = high;
+
+	return ret;
 }
 
 int handle_write_msr(struct kvm_vcpu *vcpu)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH part-5 00/22] VMX emulation
  2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
                   ` (21 preceding siblings ...)
  2023-03-12 18:03 ` [RFC PATCH part-5 22/22] pkvm: x86: Add vmx msr emulation Jason Chen CJ
@ 2023-03-13 16:58 ` Sean Christopherson
  2023-03-14 16:29   ` Jason Chen CJ
  22 siblings, 1 reply; 34+ messages in thread
From: Sean Christopherson @ 2023-03-13 16:58 UTC (permalink / raw)
  To: Jason Chen CJ; +Cc: kvm

On Mon, Mar 13, 2023, Jason Chen CJ wrote:
> This patch set is part-5 of this RFC patches. It introduces VMX
> emulation for pKVM on Intel platform.
> 
> Host VM wants the capability to run its guest, it needs VMX support.

No, the host VM only needs a way to request pKVM to run a VM.  If we go down the
rabbit hole of pKVM on x86, I think we should take the red pill[*] and go all the
way down said rabbit hole by heavily paravirtualizing the KVM=>pKVM interface.

Except for VMCALL vs. VMMCALL, it should be possible to eliminate all traces of
VMX and SVM from the interface.  That means no VMCS emulation, no EPT shadowing,
etc.  As a bonus, any paravirt stuff we do for pKVM x86 would also be usable for
KVM-on-KVM nested virtualization.

E.g. an idea floating around my head is to add a paravirt paging interface for
KVM-on-KVM so that L1's (KVM-high in this RFC) doesn't need to maintain its own
TDP page tables.  I haven't pursued that idea in any real capacity since most
nested virtualization use cases for KVM involve running an older L1 kernel and/or
a non-KVM L1 hypervisor, i.e. there's no concrete use case to justify the development
and maintenance cost.  But if the PV code is "needed" by pKVM anyways...

[*] You take the blue pill, the story ends, you wake up in your bed and believe
    whatever you want to believe. You take the red pill, you stay in wonderland,
    and I show you how deep the rabbit hole goes.

    -Morpheus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH part-5 00/22] VMX emulation
  2023-03-13 16:58 ` [RFC PATCH part-5 00/22] VMX emulation Sean Christopherson
@ 2023-03-14 16:29   ` Jason Chen CJ
  2023-06-08 21:38     ` Dmytro Maluka
  0 siblings, 1 reply; 34+ messages in thread
From: Jason Chen CJ @ 2023-03-14 16:29 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm

On Mon, Mar 13, 2023 at 09:58:27AM -0700, Sean Christopherson wrote:
> On Mon, Mar 13, 2023, Jason Chen CJ wrote:
> > This patch set is part-5 of this RFC patches. It introduces VMX
> > emulation for pKVM on Intel platform.
> > 
> > Host VM wants the capability to run its guest, it needs VMX support.
> 
> No, the host VM only needs a way to request pKVM to run a VM.  If we go down the
> rabbit hole of pKVM on x86, I think we should take the red pill[*] and go all the
> way down said rabbit hole by heavily paravirtualizing the KVM=>pKVM interface.

hi, Sean,

Like I mentioned in the reply for "[RFC PATCH part-1 0/5] pKVM on Intel
Platform Introduction", we hope VMX emulation can be there at least for
normal VM support.

> 
> Except for VMCALL vs. VMMCALL, it should be possible to eliminate all traces of
> VMX and SVM from the interface.  That means no VMCS emulation, no EPT shadowing,
> etc.  As a bonus, any paravirt stuff we do for pKVM x86 would also be usable for
> KVM-on-KVM nested virtualization.
> 
> E.g. an idea floating around my head is to add a paravirt paging interface for
> KVM-on-KVM so that L1's (KVM-high in this RFC) doesn't need to maintain its own
> TDP page tables.  I haven't pursued that idea in any real capacity since most
> nested virtualization use cases for KVM involve running an older L1 kernel and/or
> a non-KVM L1 hypervisor, i.e. there's no concrete use case to justify the development
> and maintenance cost.  But if the PV code is "needed" by pKVM anyways...

Yes, I agree, we could have performance & mem cost benefit by using
paravirt stuff for KVM-on-KVM nested virtualization. May I know do I
miss other benefit you saw?

> 
> [*] You take the blue pill, the story ends, you wake up in your bed and believe
>     whatever you want to believe. You take the red pill, you stay in wonderland,
>     and I show you how deep the rabbit hole goes.
> 
>     -Morpheus

-- 

Thanks
Jason CJ Chen

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH part-5 00/22] VMX emulation
  2023-03-14 16:29   ` Jason Chen CJ
@ 2023-06-08 21:38     ` Dmytro Maluka
  2023-06-09  2:07       ` Chen, Jason CJ
  2023-06-15 21:13       ` Nadav Amit
  0 siblings, 2 replies; 34+ messages in thread
From: Dmytro Maluka @ 2023-06-08 21:38 UTC (permalink / raw)
  To: Jason Chen CJ, Sean Christopherson
  Cc: kvm, android-kvm, Dmitry Torokhov, Tomasz Nowicki,
	Grzegorz Jaszczyk, Keir Fraser

On 3/14/23 17:29, Jason Chen CJ wrote:
> On Mon, Mar 13, 2023 at 09:58:27AM -0700, Sean Christopherson wrote:
>> On Mon, Mar 13, 2023, Jason Chen CJ wrote:
>>> This patch set is part-5 of this RFC patches. It introduces VMX
>>> emulation for pKVM on Intel platform.
>>>
>>> Host VM wants the capability to run its guest, it needs VMX support.
>>
>> No, the host VM only needs a way to request pKVM to run a VM.  If we go down the
>> rabbit hole of pKVM on x86, I think we should take the red pill[*] and go all the
>> way down said rabbit hole by heavily paravirtualizing the KVM=>pKVM interface.
> 
> hi, Sean,
> 
> Like I mentioned in the reply for "[RFC PATCH part-1 0/5] pKVM on Intel
> Platform Introduction", we hope VMX emulation can be there at least for
> normal VM support.
> 
>>
>> Except for VMCALL vs. VMMCALL, it should be possible to eliminate all traces of
>> VMX and SVM from the interface.  That means no VMCS emulation, no EPT shadowing,
>> etc.  As a bonus, any paravirt stuff we do for pKVM x86 would also be usable for
>> KVM-on-KVM nested virtualization.
>>
>> E.g. an idea floating around my head is to add a paravirt paging interface for
>> KVM-on-KVM so that L1's (KVM-high in this RFC) doesn't need to maintain its own
>> TDP page tables.  I haven't pursued that idea in any real capacity since most
>> nested virtualization use cases for KVM involve running an older L1 kernel and/or
>> a non-KVM L1 hypervisor, i.e. there's no concrete use case to justify the development
>> and maintenance cost.  But if the PV code is "needed" by pKVM anyways...
> 
> Yes, I agree, we could have performance & mem cost benefit by using
> paravirt stuff for KVM-on-KVM nested virtualization. May I know do I
> miss other benefit you saw?

As I see it, the advantages of a PV design for pKVM are:

- performance
- memory cost
- code simplicity (of the pKVM hypervisor, first of all)
- better alignment with the pKVM on ARM

Regarding performance, I actually suspect it may even be the least significant
of the above. I guess with a PV design we'd have roughly as many extra vmexits
as we have now (just due to hypercalls instead of traps on emulated VMX
instructions etc), so perhaps the performance improvement would be not as big
as we might expect (am I wrong?).

But the memory cost advantage seems to be very attractive. With the emulated
design pKVM needs to maintain shadow page tables (and other shadow structures
too, but page tables are the most memory demanding). Moreover, the number of
shadow page tables is obviously proportional to the number of VMs running, and
since pKVM reserves all its memory upfront preparing for the worst case, we
have pretty restrictive limits on the maximum number of VMs [*] (and if we run
fewer VMs than this limit, we waste memory).

To give some numbers, on a machine with 8GB of RAM, on ChromeOS with this
pKVM-on-x86 PoC currently we have pKVM memory cost of 229MB (and it only allows
up to 10 VMs running simultaneously), while on Android (ARM) it is afaik only
44MB. According to my analysis, if we get rid of all the shadow tables in pKVM,
we should have 44MB on x86 too (regardless of the maximum number of VMs).

[*] And some other limits too, e.g. on the maximum number of DMA-capable
devices, since pKVM also needs shadow IOMMU page tables if we have only 1-stage
IOMMU.

> 
>>
>> [*] You take the blue pill, the story ends, you wake up in your bed and believe
>>     whatever you want to believe. You take the red pill, you stay in wonderland,
>>     and I show you how deep the rabbit hole goes.
>>
>>     -Morpheus
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [RFC PATCH part-5 00/22] VMX emulation
  2023-06-08 21:38     ` Dmytro Maluka
@ 2023-06-09  2:07       ` Chen, Jason CJ
  2023-06-09  8:34         ` Dmytro Maluka
  2023-06-15 21:13       ` Nadav Amit
  1 sibling, 1 reply; 34+ messages in thread
From: Chen, Jason CJ @ 2023-06-09  2:07 UTC (permalink / raw)
  To: Dmytro Maluka, Christopherson,, Sean
  Cc: kvm, android-kvm, Dmitry Torokhov, Tomasz Nowicki,
	Grzegorz Jaszczyk, Keir Fraser, Chen, Jason CJ

> -----Original Message-----
> From: Dmytro Maluka <dmy@semihalf.com>
> Sent: Friday, June 9, 2023 5:38 AM
> To: Chen, Jason CJ <jason.cj.chen@intel.com>; Christopherson,, Sean
> <seanjc@google.com>
> Cc: kvm@vger.kernel.org; android-kvm@google.com; Dmitry Torokhov
> <dtor@chromium.org>; Tomasz Nowicki <tn@semihalf.com>; Grzegorz Jaszczyk
> <jaz@semihalf.com>; Keir Fraser <keirf@google.com>
> Subject: Re: [RFC PATCH part-5 00/22] VMX emulation
> 
> On 3/14/23 17:29, Jason Chen CJ wrote:
> > On Mon, Mar 13, 2023 at 09:58:27AM -0700, Sean Christopherson wrote:
> >> On Mon, Mar 13, 2023, Jason Chen CJ wrote:
> >>> This patch set is part-5 of this RFC patches. It introduces VMX
> >>> emulation for pKVM on Intel platform.
> >>>
> >>> Host VM wants the capability to run its guest, it needs VMX support.
> >>
> >> No, the host VM only needs a way to request pKVM to run a VM.  If we
> >> go down the rabbit hole of pKVM on x86, I think we should take the
> >> red pill[*] and go all the way down said rabbit hole by heavily paravirtualizing
> the KVM=>pKVM interface.
> >
> > hi, Sean,
> >
> > Like I mentioned in the reply for "[RFC PATCH part-1 0/5] pKVM on
> > Intel Platform Introduction", we hope VMX emulation can be there at
> > least for normal VM support.
> >
> >>
> >> Except for VMCALL vs. VMMCALL, it should be possible to eliminate all
> >> traces of VMX and SVM from the interface.  That means no VMCS
> >> emulation, no EPT shadowing, etc.  As a bonus, any paravirt stuff we
> >> do for pKVM x86 would also be usable for KVM-on-KVM nested virtualization.
> >>
> >> E.g. an idea floating around my head is to add a paravirt paging
> >> interface for KVM-on-KVM so that L1's (KVM-high in this RFC) doesn't
> >> need to maintain its own TDP page tables.  I haven't pursued that
> >> idea in any real capacity since most nested virtualization use cases
> >> for KVM involve running an older L1 kernel and/or a non-KVM L1
> >> hypervisor, i.e. there's no concrete use case to justify the development and
> maintenance cost.  But if the PV code is "needed" by pKVM anyways...
> >
> > Yes, I agree, we could have performance & mem cost benefit by using
> > paravirt stuff for KVM-on-KVM nested virtualization. May I know do I
> > miss other benefit you saw?
> 
> As I see it, the advantages of a PV design for pKVM are:
> 
> - performance
> - memory cost
> - code simplicity (of the pKVM hypervisor, first of all)
> - better alignment with the pKVM on ARM
> 
> Regarding performance, I actually suspect it may even be the least significant of
> the above. I guess with a PV design we'd have roughly as many extra vmexits as
> we have now (just due to hypercalls instead of traps on emulated VMX
> instructions etc), so perhaps the performance improvement would be not as big
> as we might expect (am I wrong?).

I think with a PV design we can benefit from skipping shadowing. For example, a TLB
flush could be done in the hypervisor directly, while with shadow EPT it has to be
emulated by destroying the shadow EPT page table entries and then re-shadowing upon
the next EPT violation.
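
For illustration only, a rough sketch of the two paths (the helper and struct
names below are hypothetical, not actual pKVM code):

/*
 * Emulated VMX: an INVEPT from the host VM traps into pKVM, which can only
 * drop the shadow EPT entries and rebuild them lazily on later EPT violations.
 */
static void emulate_invept(struct shadow_vcpu_state *shadow_vcpu)
{
	pkvm_shadow_ept_zap(shadow_vcpu);	/* hypothetical helper */
	/* mappings are re-created later in the EPT-violation handler */
}

/*
 * PV: the host VM asks pKVM to unmap directly, so pKVM just updates the single
 * hypervisor-owned EPT and flushes the TLB itself.
 */
static void handle_pv_unmap(struct pkvm_guest *guest, u64 gfn, u64 nr_pages)
{
	pkvm_ept_unmap(guest, gfn, nr_pages);	/* hypothetical helper */
	pkvm_flush_ept_tlb(guest);		/* single INVEPT, no re-shadowing */
}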

Based on PV, with well-designed interfaces, I suppose we can also come up with a
general design for nested support on KVM-on-hypervisor (e.g., do it first for
KVM-on-KVM, then extend it to support KVM-on-pKVM and others).

> 
> But the memory cost advantage seems to be very attractive. With the emulated
> design pKVM needs to maintain shadow page tables (and other shadow
> structures too, but page tables are the most memory demanding). Moreover,
> the number of shadow page tables is obviously proportional to the number of
> VMs running, and since pKVM reserves all its memory upfront preparing for the
> worst case, we have pretty restrictive limits on the maximum number of VMs [*]
> (and if we run fewer VMs than this limit, we waste memory).
> 
> To give some numbers, on a machine with 8GB of RAM, on ChromeOS with this
> pKVM-on-x86 PoC currently we have pKVM memory cost of 229MB (and it only
> allows up to 10 VMs running simultaneously), while on Android (ARM) it is afaik
> only 44MB. According to my analysis, if we get rid of all the shadow tables in
> pKVM, we should have 44MB on x86 too (regardless of the maximum number of
> VMs).
> 
> [*] And some other limits too, e.g. on the maximum number of DMA-capable
> devices, since pKVM also needs shadow IOMMU page tables if we have only 1-
> stage IOMMU.

I may not have captured your meaning. Do you mean the device wants 2-stage
translation while we only have a 1-stage IOMMU? If so, I'm not sure there is a
real use case.

Per my understanding, for a PV IOMMU the simplest implementation is to just
maintain the 1-stage DMA mapping in the hypervisor, as the guest most likely
only wants a 1-stage DMA mapping for its device. If, for an IOMMU with nested
capability, the guest also wants to use that nested capability (e.g., for vSVA),
we can further extend the PV IOMMU interfaces.
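
For illustration, a minimal sketch of what the host-side map path of such a PV
IOMMU could look like (the hypercall number, name and argument packing below
are made up for this example, not a defined interface):

/* depends on <asm/kvm_para.h> for kvm_hypercall4() */
static long pv_iommu_map(u32 dev_id, u64 iova, u64 pa, u64 size, u32 prot)
{
	/* ask the hypervisor to install the 1-stage DMA mapping on our behalf */
	return kvm_hypercall4(PV_IOMMU_HC_MAP,
			      ((u64)dev_id << 32) | prot, iova, pa, size);
}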

> 
> >
> >>
> >> [*] You take the blue pill, the story ends, you wake up in your bed and believe
> >>     whatever you want to believe. You take the red pill, you stay in wonderland,
> >>     and I show you how deep the rabbit hole goes.
> >>
> >>     -Morpheus
> >

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH part-5 00/22] VMX emulation
  2023-06-09  2:07       ` Chen, Jason CJ
@ 2023-06-09  8:34         ` Dmytro Maluka
  2023-06-13 19:50           ` Sean Christopherson
  2023-06-15  3:59           ` Chen, Jason CJ
  0 siblings, 2 replies; 34+ messages in thread
From: Dmytro Maluka @ 2023-06-09  8:34 UTC (permalink / raw)
  To: Chen, Jason CJ, Christopherson,, Sean
  Cc: kvm, android-kvm, Dmitry Torokhov, Tomasz Nowicki,
	Grzegorz Jaszczyk, Keir Fraser

On 6/9/23 04:07, Chen, Jason CJ wrote:
>> -----Original Message-----
>> From: Dmytro Maluka <dmy@semihalf.com>
>> Sent: Friday, June 9, 2023 5:38 AM
>> To: Chen, Jason CJ <jason.cj.chen@intel.com>; Christopherson,, Sean
>> <seanjc@google.com>
>> Cc: kvm@vger.kernel.org; android-kvm@google.com; Dmitry Torokhov
>> <dtor@chromium.org>; Tomasz Nowicki <tn@semihalf.com>; Grzegorz Jaszczyk
>> <jaz@semihalf.com>; Keir Fraser <keirf@google.com>
>> Subject: Re: [RFC PATCH part-5 00/22] VMX emulation
>>
>> On 3/14/23 17:29, Jason Chen CJ wrote:
>>> On Mon, Mar 13, 2023 at 09:58:27AM -0700, Sean Christopherson wrote:
>>>> On Mon, Mar 13, 2023, Jason Chen CJ wrote:
>>>>> This patch set is part-5 of this RFC patches. It introduces VMX
>>>>> emulation for pKVM on Intel platform.
>>>>>
>>>>> Host VM wants the capability to run its guest, it needs VMX support.
>>>>
>>>> No, the host VM only needs a way to request pKVM to run a VM.  If we
>>>> go down the rabbit hole of pKVM on x86, I think we should take the
>>>> red pill[*] and go all the way down said rabbit hole by heavily paravirtualizing
>> the KVM=>pKVM interface.
>>>
>>> hi, Sean,
>>>
>>> Like I mentioned in the reply for "[RFC PATCH part-1 0/5] pKVM on
>>> Intel Platform Introduction", we hope VMX emulation can be there at
>>> least for normal VM support.
>>>
>>>>
>>>> Except for VMCALL vs. VMMCALL, it should be possible to eliminate all
>>>> traces of VMX and SVM from the interface.  That means no VMCS
>>>> emulation, no EPT shadowing, etc.  As a bonus, any paravirt stuff we
>>>> do for pKVM x86 would also be usable for KVM-on-KVM nested virtualization.
>>>>
>>>> E.g. an idea floating around my head is to add a paravirt paging
>>>> interface for KVM-on-KVM so that L1's (KVM-high in this RFC) doesn't
>>>> need to maintain its own TDP page tables.  I haven't pursued that
>>>> idea in any real capacity since most nested virtualization use cases
>>>> for KVM involve running an older L1 kernel and/or a non-KVM L1
>>>> hypervisor, i.e. there's no concrete use case to justify the development and
>> maintenance cost.  But if the PV code is "needed" by pKVM anyways...
>>>
>>> Yes, I agree, we could have performance & mem cost benefit by using
>>> paravirt stuff for KVM-on-KVM nested virtualization. May I know do I
>>> miss other benefit you saw?
>>
>> As I see it, the advantages of a PV design for pKVM are:
>>
>> - performance
>> - memory cost
>> - code simplicity (of the pKVM hypervisor, first of all)
>> - better alignment with the pKVM on ARM
>>
>> Regarding performance, I actually suspect it may even be the least significant of
>> the above. I guess with a PV design we'd have roughly as many extra vmexits as
>> we have now (just due to hypercalls instead of traps on emulated VMX
>> instructions etc), so perhaps the performance improvement would be not as big
>> as we might expect (am I wrong?).
> 
> I think with PV design, we can benefit from skip shadowing. For example, a TLB flush
> could be done in hypervisor directly, while shadowing EPT need emulate it by destroy
> shadow EPT page table entries then do next shadowing upon ept violation.

Yeah indeed, good point.

Is my understanding correct: TLB flush is still gonna be requested by
the host VM via a hypercall, but the benefit is that the hypervisor
merely needs to do INVEPT?

> 
> Based on PV, with well-designed interfaces, I suppose we can also make some general
> design for nested support on KVM-on-hypervisor (e.g., we can do first for KVM-on-KVM
> then extend to support KVM-on-pKVM and others)

Yep, as Sean suggested. Forgot to mention this too.

> 
>>
>> But the memory cost advantage seems to be very attractive. With the emulated
>> design pKVM needs to maintain shadow page tables (and other shadow
>> structures too, but page tables are the most memory demanding). Moreover,
>> the number of shadow page tables is obviously proportional to the number of
>> VMs running, and since pKVM reserves all its memory upfront preparing for the
>> worst case, we have pretty restrictive limits on the maximum number of VMs [*]
>> (and if we run fewer VMs than this limit, we waste memory).
>>
>> To give some numbers, on a machine with 8GB of RAM, on ChromeOS with this
>> pKVM-on-x86 PoC currently we have pKVM memory cost of 229MB (and it only
>> allows up to 10 VMs running simultaneously), while on Android (ARM) it is afaik
>> only 44MB. According to my analysis, if we get rid of all the shadow tables in
>> pKVM, we should have 44MB on x86 too (regardless of the maximum number of
>> VMs).
>>
>> [*] And some other limits too, e.g. on the maximum number of DMA-capable
>> devices, since pKVM also needs shadow IOMMU page tables if we have only 1-
>> stage IOMMU.
> 
> I may not capture your meaning. Do you mean device want 2-stage while we only
> have 1-stage IOMMU? If so, not sure if there is real use case.
> 
> Per my understanding, if for PV IOMMU, the simplest implementation is just
> maintain 1-stage DMA mapping in the hypervisor as guest most likely just want 
> 1-stage DMA mapping for its device,  so if for IOMMU w/ nested capability meantime
> guest want use its nested capability (e.g., for vSVA), we can further extend the PV
> IOMMU interfaces.

Sorry, I wasn't clear enough. I mean, on the host or guest side we need
just 1-stage IOMMU, but pKVM needs to ensure memory protection. So if
2-stage is available, pKVM can just use it, but if not, currently in
pKVM on Intel we use shadow page tables for that (just as a consequence
of the overall "mostly emulated" design). (So as a result, in
particular, pKVM memory footprint depends on the max number of PCI
devices allowed by pKVM.) And yeah, with a PV IOMMU we can avoid the
need for shadow page tables while still having only 1-stage IOMMU,
that's exactly my point.

> 
>>
>>>
>>>>
>>>> [*] You take the blue pill, the story ends, you wake up in your bed and believe
>>>>     whatever you want to believe. You take the red pill, you stay in wonderland,
>>>>     and I show you how deep the rabbit hole goes.
>>>>
>>>>     -Morpheus
>>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH part-5 00/22] VMX emulation
  2023-06-09  8:34         ` Dmytro Maluka
@ 2023-06-13 19:50           ` Sean Christopherson
  2023-06-15 18:07             ` Dmytro Maluka
                               ` (2 more replies)
  2023-06-15  3:59           ` Chen, Jason CJ
  1 sibling, 3 replies; 34+ messages in thread
From: Sean Christopherson @ 2023-06-13 19:50 UTC (permalink / raw)
  To: Dmytro Maluka
  Cc: Jason CJ Chen, kvm, android-kvm, Dmitry Torokhov, Tomasz Nowicki,
	Grzegorz Jaszczyk, Keir Fraser

On Fri, Jun 09, 2023, Dmytro Maluka wrote:
> On 6/9/23 04:07, Chen, Jason CJ wrote:
> > I think with PV design, we can benefit from skip shadowing. For example, a TLB flush
> > could be done in hypervisor directly, while shadowing EPT need emulate it by destroy
> > shadow EPT page table entries then do next shadowing upon ept violation.

This is a bit misleading.  KVM has an effective TLB for nested TDP only for 4KiB
pages; larger shadow pages are never allowed to go out-of-sync, i.e. KVM doesn't
wait until L1 does a TLB flush to update SPTEs.  KVM does "unload" roots, e.g. to
emulate INVEPT, but that usually just ends up being an extra slow TLB flush in L0,
because nested TDP SPTEs rarely go unsync in practice.  The patterns for hypervisors
managing VM memory don't typically trigger the types of PTE modifications that
result in unsync SPTEs.

I actually have a (very tiny) patch sitting around somewhere to disable unsync support
when TDP is enabled.  There is a very, very theoretical bug where KVM might fail
to honor when a guest TDP PTE change is architecturally supposed to be visible,
and the simplest fix (by far) is to disable unsync support.  Disabling TDP+unsync
is a viable fix because unsync support is almost never used for nested TDP.  Legacy
shadow paging on the other hand *significantly* benefits from unsync support, e.g.
when the guest is managing CoW mappings. I haven't gotten around to posting the
patch to disable unsync on TDP purely because the flaw is almost comically theoretical.

Anyways, the point is that the TLB flushing side of nested TDP isn't all that
interesting.

> Yeah indeed, good point.
> 
> Is my understanding correct: TLB flush is still gonna be requested by
> the host VM via a hypercall, but the benefit is that the hypervisor
> merely needs to do INVEPT?

Maybe?  A paravirt paging scheme could do whatever it wanted.  The APIs could be
designed in such a way that L1 never needs to explicitly request a TLB flush,
e.g. if the contract is that changes must always become immediately visible to L2.

And TLB flushing is but one small aspect of page table shadowing.  With PV paging,
L1 wouldn't need to manage hardware-defined page tables, i.e. could use any arbitrary
data type.  E.g. KVM as L1 could use an XArray to track L2 mappings.  And L0 in
turn wouldn't need to have vendor specific code, i.e. pKVM on x86 (potentially
*all* architectures) could have a single nested paging scheme for both Intel and
AMD, as opposed to needing code to deal with the differences between EPT and NPT.

A few months back, I mentally worked through the flows[*] (I forget why I was
thinking about PV paging), and I'm pretty sure that adapting x86's TDP MMU to
support PV paging would be easy-ish, e.g. kvm_tdp_mmu_map() would become an
XArray insertion (to track the L2 mapping) + hypercall (to inform L0 of the new
mapping).
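
To make this concrete, a hypothetical sketch of what the L1 side could look
like (the hypercall number KVM_HC_PV_MAP, the l2_mappings XArray and the
argument layout are all invented for illustration, not an existing interface):

#include <linux/xarray.h>
#include <linux/kvm_types.h>
#include <asm/kvm_para.h>

/* invented: an arbitrary software structure replacing hardware page tables */
static DEFINE_XARRAY(l2_mappings);

static long pv_tdp_map(gfn_t gfn, kvm_pfn_t pfn, u64 attrs)
{
	void *old;

	/* track the L2 mapping locally, no EPT/NPT format involved */
	old = xa_store(&l2_mappings, gfn, xa_mk_value(pfn), GFP_KERNEL);
	if (xa_is_err(old))
		return xa_err(old);

	/* inform L0 so it can install gfn -> pfn in the real 2nd-stage tables */
	return kvm_hypercall3(KVM_HC_PV_MAP, gfn, pfn, attrs);
}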

[*] I even thought of a catchy name, KVM Paravirt Only Paging, a.k.a. KPOP ;-)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [RFC PATCH part-5 00/22] VMX emulation
  2023-06-09  8:34         ` Dmytro Maluka
  2023-06-13 19:50           ` Sean Christopherson
@ 2023-06-15  3:59           ` Chen, Jason CJ
  1 sibling, 0 replies; 34+ messages in thread
From: Chen, Jason CJ @ 2023-06-15  3:59 UTC (permalink / raw)
  To: Dmytro Maluka, Christopherson,, Sean
  Cc: kvm, android-kvm, Dmitry Torokhov, Tomasz Nowicki,
	Grzegorz Jaszczyk, Keir Fraser, Chen, Jason CJ

> -----Original Message-----
> From: Dmytro Maluka <dmy@semihalf.com>
> Sent: Friday, June 9, 2023 4:35 PM
> To: Chen, Jason CJ <jason.cj.chen@intel.com>; Christopherson,, Sean
> <seanjc@google.com>
> Cc: kvm@vger.kernel.org; android-kvm@google.com; Dmitry Torokhov
> <dtor@chromium.org>; Tomasz Nowicki <tn@semihalf.com>; Grzegorz Jaszczyk
> <jaz@semihalf.com>; Keir Fraser <keirf@google.com>
> Subject: Re: [RFC PATCH part-5 00/22] VMX emulation
> 
> On 6/9/23 04:07, Chen, Jason CJ wrote:
> >> -----Original Message-----
> >> From: Dmytro Maluka <dmy@semihalf.com>
> >> Sent: Friday, June 9, 2023 5:38 AM
> >> To: Chen, Jason CJ <jason.cj.chen@intel.com>; Christopherson,, Sean
> >> <seanjc@google.com>
> >> Cc: kvm@vger.kernel.org; android-kvm@google.com; Dmitry Torokhov
> >> <dtor@chromium.org>; Tomasz Nowicki <tn@semihalf.com>; Grzegorz
> >> Jaszczyk <jaz@semihalf.com>; Keir Fraser <keirf@google.com>
> >> Subject: Re: [RFC PATCH part-5 00/22] VMX emulation
> >>
> >> On 3/14/23 17:29, Jason Chen CJ wrote:
> >>> On Mon, Mar 13, 2023 at 09:58:27AM -0700, Sean Christopherson wrote:
> >>>> On Mon, Mar 13, 2023, Jason Chen CJ wrote:
> >>>>> This patch set is part-5 of this RFC patches. It introduces VMX
> >>>>> emulation for pKVM on Intel platform.
> >>>>>
> >>>>> Host VM wants the capability to run its guest, it needs VMX support.
> >>>>
> >>>> No, the host VM only needs a way to request pKVM to run a VM.  If
> >>>> we go down the rabbit hole of pKVM on x86, I think we should take
> >>>> the red pill[*] and go all the way down said rabbit hole by heavily
> >>>> paravirtualizing
> >> the KVM=>pKVM interface.
> >>>
> >>> hi, Sean,
> >>>
> >>> Like I mentioned in the reply for "[RFC PATCH part-1 0/5] pKVM on
> >>> Intel Platform Introduction", we hope VMX emulation can be there at
> >>> least for normal VM support.
> >>>
> >>>>
> >>>> Except for VMCALL vs. VMMCALL, it should be possible to eliminate
> >>>> all traces of VMX and SVM from the interface.  That means no VMCS
> >>>> emulation, no EPT shadowing, etc.  As a bonus, any paravirt stuff
> >>>> we do for pKVM x86 would also be usable for KVM-on-KVM nested
> virtualization.
> >>>>
> >>>> E.g. an idea floating around my head is to add a paravirt paging
> >>>> interface for KVM-on-KVM so that L1's (KVM-high in this RFC)
> >>>> doesn't need to maintain its own TDP page tables.  I haven't
> >>>> pursued that idea in any real capacity since most nested
> >>>> virtualization use cases for KVM involve running an older L1 kernel
> >>>> and/or a non-KVM L1 hypervisor, i.e. there's no concrete use case
> >>>> to justify the development and
> >> maintenance cost.  But if the PV code is "needed" by pKVM anyways...
> >>>
> >>> Yes, I agree, we could have performance & mem cost benefit by using
> >>> paravirt stuff for KVM-on-KVM nested virtualization. May I know do I
> >>> miss other benefit you saw?
> >>
> >> As I see it, the advantages of a PV design for pKVM are:
> >>
> >> - performance
> >> - memory cost
> >> - code simplicity (of the pKVM hypervisor, first of all)
> >> - better alignment with the pKVM on ARM
> >>
> >> Regarding performance, I actually suspect it may even be the least
> >> significant of the above. I guess with a PV design we'd have roughly
> >> as many extra vmexits as we have now (just due to hypercalls instead
> >> of traps on emulated VMX instructions etc), so perhaps the
> >> performance improvement would be not as big as we might expect (am I
> wrong?).
> >
> > I think with PV design, we can benefit from skip shadowing. For
> > example, a TLB flush could be done in hypervisor directly, while
> > shadowing EPT need emulate it by destroy shadow EPT page table entries then
> do next shadowing upon ept violation.
> 
> Yeah indeed, good point.
> 
> Is my understanding correct: TLB flush is still gonna be requested by the host VM
> via a hypercall, but the benefit is that the hypervisor merely needs to do INVEPT?

Sorry for the late response. In my P.O.V., the EPT should be totally owned by the
hypervisor, so the host VM will not trigger TLB flushes as it does not manage the
EPT directly.

> 
> >
> > Based on PV, with well-designed interfaces, I suppose we can also make
> > some general design for nested support on KVM-on-hypervisor (e.g., we
> > can do first for KVM-on-KVM then extend to support KVM-on-pKVM and
> > others)
> 
> Yep, as Sean suggested. Forgot to mention this too.
> 
> >
> >>
> >> But the memory cost advantage seems to be very attractive. With the
> >> emulated design pKVM needs to maintain shadow page tables (and other
> >> shadow structures too, but page tables are the most memory
> >> demanding). Moreover, the number of shadow page tables is obviously
> >> proportional to the number of VMs running, and since pKVM reserves
> >> all its memory upfront preparing for the worst case, we have pretty
> >> restrictive limits on the maximum number of VMs [*] (and if we run fewer
> VMs than this limit, we waste memory).
> >>
> >> To give some numbers, on a machine with 8GB of RAM, on ChromeOS with
> >> this
> >> pKVM-on-x86 PoC currently we have pKVM memory cost of 229MB (and it
> >> only allows up to 10 VMs running simultaneously), while on Android
> >> (ARM) it is afaik only 44MB. According to my analysis, if we get rid
> >> of all the shadow tables in pKVM, we should have 44MB on x86 too
> >> (regardless of the maximum number of VMs).
> >>
> >> [*] And some other limits too, e.g. on the maximum number of
> >> DMA-capable devices, since pKVM also needs shadow IOMMU page tables
> >> if we have only 1- stage IOMMU.
> >
> > I may not capture your meaning. Do you mean device want 2-stage while
> > we only have 1-stage IOMMU? If so, not sure if there is real use case.
> >
> > Per my understanding, if for PV IOMMU, the simplest implementation is
> > just maintain 1-stage DMA mapping in the hypervisor as guest most
> > likely just want 1-stage DMA mapping for its device,  so if for IOMMU
> > w/ nested capability meantime guest want use its nested capability
> > (e.g., for vSVA), we can further extend the PV IOMMU interfaces.
> 
> Sorry, I wasn't clear enough. I mean, on the host or guest side we need just 1-
> stage IOMMU, but pKVM needs to ensure memory protection. So if 2-stage is
> available, pKVM can just use it, but if not, currently in pKVM on Intel we use
> shadow page tables for that (just as a consequence of the overall "mostly
> emulated" design). (So as a result, in particular, pKVM memory footprint
> depends on the max number of PCI devices allowed by pKVM.) And yeah, with a
> PV IOMMU we can avoid the need for shadow page tables while still having only
> 1-stage IOMMU, that's exactly my point.
> 
> >
> >>
> >>>
> >>>>
> >>>> [*] You take the blue pill, the story ends, you wake up in your bed and
> believe
> >>>>     whatever you want to believe. You take the red pill, you stay in
> wonderland,
> >>>>     and I show you how deep the rabbit hole goes.
> >>>>
> >>>>     -Morpheus
> >>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH part-5 00/22] VMX emulation
  2023-06-13 19:50           ` Sean Christopherson
@ 2023-06-15 18:07             ` Dmytro Maluka
  2023-06-20 15:46             ` Jason Chen CJ
  2023-09-05  9:47             ` Jason Chen CJ
  2 siblings, 0 replies; 34+ messages in thread
From: Dmytro Maluka @ 2023-06-15 18:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jason CJ Chen, kvm, android-kvm, Dmitry Torokhov, Tomasz Nowicki,
	Grzegorz Jaszczyk, Keir Fraser

On 6/13/23 21:50, Sean Christopherson wrote:
> On Fri, Jun 09, 2023, Dmytro Maluka wrote:
>> Yeah indeed, good point.
>>
>> Is my understanding correct: TLB flush is still gonna be requested by
>> the host VM via a hypercall, but the benefit is that the hypervisor
>> merely needs to do INVEPT?
> 
> Maybe?  A paravirt paging scheme could do whatever it wanted.  The APIs could be
> designed in such a way that L1 never needs to explicitly request a TLB flush,
> e.g. if the contract is that changes must always become immediately visible to L2.
> 
> And TLB flushing is but one small aspect of page table shadowing.  With PV paging,
> L1 wouldn't need to manage hardware-defined page tables, i.e. could use any arbitrary
> data type.  E.g. KVM as L1 could use an XArray to track L2 mappings.  And L0 in
> turn wouldn't need to have vendor specific code, i.e. pKVM on x86 (potentially
> *all* architectures) could have a single nested paging scheme for both Intel and
> AMD, as opposed to needing code to deal with the differences between EPT and NPT.
> 
> A few months back, I mentally worked through the flows[*] (I forget why I was
> thinking about PV paging), and I'm pretty sure that adapting x86's TDP MMU to
> support PV paging would be easy-ish, e.g. kvm_tdp_mmu_map() would become an
> XArray insertion (to track the L2 mapping) + hypercall (to inform L1 of the new
> mapping).
> 
> [*] I even though of a catchy name, KVM Paravirt Only Paging, a.k.a. KPOP ;-)

Yeap indeed, thanks. (I should have thought myself that it's rather
pointless to use hardware-defined page tables and TLB semantics in L1 if
we go full PV.) In pKVM on ARM [1] it already looks similar to what you
described and is pretty simple: L1 pins the guest page, issues
__pkvm_host_map_guest hypercall to map it, and remembers it in an RB-tree
to unpin it later.

One concern though: can this be done lock-efficiently? For example, in
this pKVM-ARM code in [1] this (hypercall + RB-tree insertion) is done
under write-locked kvm->mmu_lock, so I assume it is prone to contention
when there are stage-2 page faults occurring simultaneously on multiple
CPUs from the same VM. In pKVM on Intel we also have the same per-VM
lock contention issue, though in L0 (see
pkvm_handle_shadow_ept_violation() in [2]) and we are already seeing
~50% perf drop caused by it in some benchmarks.

(To be precise, though, eliminating this per-VM write-lock would not be
enough for eliminating the contention: on both ARM and x86 there is also
global locking in pKVM in L0 down the road [3], for different reasons.)

[1] https://android.googlesource.com/kernel/common/+/d73b3af21fb90f6556383865af6ee16e4735a4a6/arch/arm64/kvm/mmu.c#1341

[2] https://lore.kernel.org/all/20230312180345.1778588-9-jason.cj.chen@intel.com/

[3] https://android.googlesource.com/kernel/common/+/d73b3af21fb90f6556383865af6ee16e4735a4a6/arch/arm64/kvm/hyp/nvhe/mem_protect.c#2176


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH part-5 00/22] VMX emulation
  2023-06-08 21:38     ` Dmytro Maluka
  2023-06-09  2:07       ` Chen, Jason CJ
@ 2023-06-15 21:13       ` Nadav Amit
  1 sibling, 0 replies; 34+ messages in thread
From: Nadav Amit @ 2023-06-15 21:13 UTC (permalink / raw)
  To: Dmytro Maluka
  Cc: Jason Chen CJ, Sean Christopherson, kvm, android-kvm,
	Dmitry Torokhov, Tomasz Nowicki, Grzegorz Jaszczyk, Keir Fraser


> On Jun 8, 2023, at 2:38 PM, Dmytro Maluka <dmy@semihalf.com> wrote:
> 
> On 3/14/23 17:29, Jason Chen CJ wrote:
>> On Mon, Mar 13, 2023 at 09:58:27AM -0700, Sean Christopherson wrote:
>>> On Mon, Mar 13, 2023, Jason Chen CJ wrote:
>>>> This patch set is part-5 of this RFC patches. It introduces VMX
>>>> emulation for pKVM on Intel platform.
>>>> 
>>>> Host VM wants the capability to run its guest, it needs VMX support.
>>> 
>>> No, the host VM only needs a way to request pKVM to run a VM.  If we go down the
>>> rabbit hole of pKVM on x86, I think we should take the red pill[*] and go all the
>>> way down said rabbit hole by heavily paravirtualizing the KVM=>pKVM interface.
>> 
>> hi, Sean,
>> 
>> Like I mentioned in the reply for "[RFC PATCH part-1 0/5] pKVM on Intel
>> Platform Introduction", we hope VMX emulation can be there at least for
>> normal VM support.

Just in case the PV approach is taken, please consider consulting with other
hypervisor vendors (e.g., Microsoft, VMware) before you define a PV
interface.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH part-5 00/22] VMX emulation
  2023-06-13 19:50           ` Sean Christopherson
  2023-06-15 18:07             ` Dmytro Maluka
@ 2023-06-20 15:46             ` Jason Chen CJ
  2023-09-05  9:47             ` Jason Chen CJ
  2 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-06-20 15:46 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Dmytro Maluka, kvm, android-kvm, Dmitry Torokhov, Tomasz Nowicki,
	Grzegorz Jaszczyk, Keir Fraser

On Tue, Jun 13, 2023 at 12:50:52PM -0700, Sean Christopherson wrote:
> On Fri, Jun 09, 2023, Dmytro Maluka wrote:
> > On 6/9/23 04:07, Chen, Jason CJ wrote:
> > > I think with PV design, we can benefit from skip shadowing. For example, a TLB flush
> > > could be done in hypervisor directly, while shadowing EPT need emulate it by destroy
> > > shadow EPT page table entries then do next shadowing upon ept violation.
> 
> This is a bit misleading.  KVM has an effective TLB for nested TDP only for 4KiB
> pages; larger shadow pages are never allowed to go out-of-sync, i.e. KVM doesn't
> wait until L1 does a TLB flush to update SPTEs.  KVM does "unload" roots, e.g. to
> emulate INVEPT, but that usually just ends up being an extra slow TLB flush in L0,
> because nested TDP SPTEs rarely go unsync in practice.  The patterns for hypervisors
> managing VM memory don't typically trigger the types of PTE modifications that
> result in unsync SPTEs.
> 
> I actually have a (very tiny) patch sitting around somwhere to disable unsync support
> when TDP is enabled.  There is a very, very thoeretical bug where KVM might fail
> to honor when a guest TDP PTE change is architecturally supposed to be visible,
> and the simplest fix (by far) is to disable unsync support.  Disabling TDP+unsync
> is a viable fix because unsync support is almost never used for nested TDP.  Legacy
> shadow paging on the other hand *significantly* benefits from unsync support, e.g.
> when the guest is managing CoW mappings. I haven't gotten around to posting the
> patch to disable unsync on TDP purely because the flaw is almost comically theoretical.
> 
> Anyways, the point is that the TLB flushing side of nested TDP isn't all that
> interesting.

Agree. Thanks for pointing it out! I was thinking in terms of a comparison
with the current RFC pKVM-on-x86 solution. :-(

To me, the KVM page table shadowing mechanism (e.g., unsync & sync page)
is too heavy & complicated. If we have the KPOP solution, IIUC, we may be
able to totally remove all the shadowing stuff, right? :-)

BTW, KPOP raises questions about supporting access tracking & dirty page
logging, which may need additional PV interfaces. MMIO faults could be
another issue if we want to keep the optimization based on EPT MISCONFIG
on the IA platform.

> 
> > Yeah indeed, good point.
> > 
> > Is my understanding correct: TLB flush is still gonna be requested by
> > the host VM via a hypercall, but the benefit is that the hypervisor
> > merely needs to do INVEPT?
> 
> Maybe?  A paravirt paging scheme could do whatever it wanted.  The APIs could be
> designed in such a way that L1 never needs to explicitly request a TLB flush,
> e.g. if the contract is that changes must always become immediately visible to L2.
> 
> And TLB flushing is but one small aspect of page table shadowing.  With PV paging,
> L1 wouldn't need to manage hardware-defined page tables, i.e. could use any arbitrary
> data type.  E.g. KVM as L1 could use an XArray to track L2 mappings.  And L0 in
> turn wouldn't need to have vendor specific code, i.e. pKVM on x86 (potentially
> *all* architectures) could have a single nested paging scheme for both Intel and
> AMD, as opposed to needing code to deal with the differences between EPT and NPT.
> 
> A few months back, I mentally worked through the flows[*] (I forget why I was
> thinking about PV paging), and I'm pretty sure that adapting x86's TDP MMU to
> support PV paging would be easy-ish, e.g. kvm_tdp_mmu_map() would become an
> XArray insertion (to track the L2 mapping) + hypercall (to inform L1 of the new
> mapping).
> 
> [*] I even though of a catchy name, KVM Paravirt Only Paging, a.k.a. KPOP ;-)

-- 

Thanks
Jason CJ Chen

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH part-5 00/22] VMX emulation
  2023-06-13 19:50           ` Sean Christopherson
  2023-06-15 18:07             ` Dmytro Maluka
  2023-06-20 15:46             ` Jason Chen CJ
@ 2023-09-05  9:47             ` Jason Chen CJ
  2 siblings, 0 replies; 34+ messages in thread
From: Jason Chen CJ @ 2023-09-05  9:47 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Dmytro Maluka, kvm, android-kvm, Dmitry Torokhov, Tomasz Nowicki,
	Grzegorz Jaszczyk, Keir Fraser

On Tue, Jun 13, 2023 at 12:50:52PM -0700, Sean Christopherson wrote:
> Maybe?  A paravirt paging scheme could do whatever it wanted.  The APIs could be
> designed in such a way that L1 never needs to explicitly request a TLB flush,
> e.g. if the contract is that changes must always become immediately visible to L2.
> 
> And TLB flushing is but one small aspect of page table shadowing.  With PV paging,
> L1 wouldn't need to manage hardware-defined page tables, i.e. could use any arbitrary
> data type.  E.g. KVM as L1 could use an XArray to track L2 mappings.  And L0 in
> turn wouldn't need to have vendor specific code, i.e. pKVM on x86 (potentially
> *all* architectures) could have a single nested paging scheme for both Intel and
> AMD, as opposed to needing code to deal with the differences between EPT and NPT.
> 
> A few months back, I mentally worked through the flows[*] (I forget why I was
> thinking about PV paging), and I'm pretty sure that adapting x86's TDP MMU to
> support PV paging would be easy-ish, e.g. kvm_tdp_mmu_map() would become an
> XArray insertion (to track the L2 mapping) + hypercall (to inform L1 of the new
> mapping).
> 
> [*] I even though of a catchy name, KVM Paravirt Only Paging, a.k.a. KPOP ;-)

hi, Sean & all,

I did a POC[1] to support KPOP (KVM Paravirt Only Paging) for KVM-on-KVM
nested guests. I am not sure whether such a solution is welcome in the KVM
community; I would appreciate any advice/direction you can give. From what I
saw, the solution is straightforward and has a lower memory cost (no double
page tables), but a rough benchmark based on stress-ng shows less than 1%
improvement for both the cpu & vm stress tests, compared to the legacy
shadowing-mode nested guest solution.

Brief idea of this POC
----------------------

The brief idea of the POC is to intercept the x86 KVM MMU interfaces below
and turn them into three KPOP hypercalls - KVM_HC_KPOP_MMU_LOAD_UNLOAD,
KVM_HC_KPOP_MMU_MAP & KVM_HC_KPOP_MMU_UNMAP (a rough L1-side sketch follows
the list):

- int (*mmu_load)(struct kvm_vcpu *vcpu);
  This op (from L1) leads to the KVM_HC_KPOP_MMU_LOAD_UNLOAD hypercall for MMU
  load to L0 KVM; L0 KVM creates the L2 guest MMU page table and ensures the
  vcpu loads it as the root pgd when the corresponding nested vcpu is running.

- void (*mmu_unload)(struct kvm_vcpu *vcpu);
  This op (from L1) leads to the KVM_HC_KPOP_MMU_LOAD_UNLOAD hypercall for MMU
  unload to L0 KVM; L0 KVM puts & frees the corresponding L2 guest MMU page
  table.

- bool (*mmu_set_spte_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);
  This op (from L1) leads to the KVM_HC_KPOP_MMU_MAP hypercall for MMU remap
  to L0 KVM; L0 KVM remaps the range's MMU mapping for all previously loaded
  L2 guest MMU page tables that belong to the L2 "kvm" and whose as_id
  (address space id) is the same as range->slot->as_id.

- bool (*mmu_unmap_gfn_range)(struct kvm *kvm, struct kvm_gfn_range *range);
  This op (from L1) leads to the KVM_HC_KPOP_MMU_UNMAP hypercall for MMU unmap
  to L0 KVM; L0 KVM unmaps the range's MMU mapping for all previously loaded
  L2 guest MMU page tables that belong to the L2 "kvm" and whose as_id is the
  same as range->slot->as_id.

- void (*mmu_zap_gfn_range)(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
  This op (from L1) leads to the KVM_HC_KPOP_MMU_UNMAP hypercall for MMU unmap
  to L0 KVM; L0 KVM unmaps the {start, end} MMU mapping for all previously
  loaded L2 guest MMU page tables that belong to the L2 "kvm" (for all as_id).

- void (*mmu_zap_all)(struct kvm *kvm, bool fast);
  This op (from L1) leads to the KVM_HC_KPOP_MMU_UNMAP hypercall for MMU unmap
  to L0 KVM; L0 KVM zaps all MMU mappings for all previously loaded L2 guest
  MMU page tables that belong to the L2 "kvm" (for all as_id).

- The page fault handling function (direct_page_fault) in L1 KVM is also
  changed in this POC to support KPOP MMU mapping: it leads to the
  KVM_HC_KPOP_MMU_MAP hypercall, and L0 KVM leverages kvm_tdp_mmu_map to do
  the MMU page mapping for the previously loaded L2 guest MMU page table.
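
For illustration, a rough sketch of how the L1 side could issue these
hypercalls (this is not the actual POC code from [1]; the argument packing is
only a guess):

#include <linux/kvm_types.h>
#include <asm/kvm_para.h>

/* KVM_HC_KPOP_* are the hypercall numbers introduced by the POC */
static long kpop_hc_mmu_map(u64 vcpu_holder, u64 as_id, gfn_t gfn, kvm_pfn_t pfn)
{
	/* called from L1's page fault path instead of installing an SPTE */
	return kvm_hypercall4(KVM_HC_KPOP_MMU_MAP, vcpu_holder, as_id, gfn, pfn);
}

static long kpop_hc_mmu_unmap(u64 vcpu_holder, u64 as_id, gfn_t start, gfn_t end)
{
	/* called from L1's mmu_unmap_gfn_range/mmu_zap_gfn_range interception */
	return kvm_hypercall4(KVM_HC_KPOP_MMU_UNMAP, vcpu_holder, as_id, start, end);
}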

How is the guest MMU page table identified?
-------------------------------------------

The L2 guest MMU page table is identified by its L1 vcpu_holder & as_id.
L1 KVM runs an L2 vcpu after loading the L2 vcpu info into the corresponding
vcpu_holder - for x86 this is the vmcs. When L1 KVM does mmu_load for an L2
guest MMU page table, it also assigns an as_id to that table - for x86 it is
based on whether the vcpu is running in SMM mode.

In this POC, L0 KVM maintains the L2 guest MMUs for L1 KVM in a per-VM
hash table hashed by the vcpu_holders. Struct kpop_guest_mmu and several
APIs are introduced for managing the L2 guest MMUs:

struct kpop_guest_mmu {
        struct hlist_node hnode;
        u64 vcpu_holder;
        u64 kvm_id;
        u64 as_id;
        hpa_t root_hpa;
        refcount_t count;
};

- int kpop_alloc_guest_mmu(struct kvm_vcpu *vcpu, u64 vcpu_holder, u64
  kvm_id, u64 as_id)
- void kpop_put_guest_mmu(struct kvm_vcpu *vcpu, u64 vcpu_holder, u64
  kvm_id, u64 as_id)
- struct kpop_guest_mmu *kpop_find_guest_mmu(struct kvm *kvm, u64
  vcpu_holder, u64 as_id)
- int kpop_reload_guest_mmu(struct kvm_vcpu *vcpu, bool check_vcpu)
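
For illustration, a sketch of how L0 might use the struct and APIs above when
handling the "load" half of KVM_HC_KPOP_MMU_LOAD_UNLOAD (not the actual POC
code; the reuse/refcount policy shown is an assumption):

static int kpop_handle_mmu_load(struct kvm_vcpu *vcpu, u64 vcpu_holder,
				u64 kvm_id, u64 as_id)
{
	struct kpop_guest_mmu *guest_mmu;

	/* reuse an already-loaded L2 root for this (vcpu_holder, as_id) pair */
	guest_mmu = kpop_find_guest_mmu(vcpu->kvm, vcpu_holder, as_id);
	if (guest_mmu) {
		refcount_inc(&guest_mmu->count);
		return 0;
	}

	/* first load: allocate a fresh L2 guest MMU page table */
	return kpop_alloc_guest_mmu(vcpu, vcpu_holder, kvm_id, as_id);
}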


TODOs & OPENs
-------------

There are still a lot of TODOs:

- L2 translation info (XArray) in L1 KVM
  L1 KVM may need to maintain translation info (ngpa-to-gpa) for L2 guests;
  one possible use case is MMIO fault optimization. A simple way is to
  maintain a translation info XArray in L1 KVM (see the sketch after this
  list).

- support UMIP emulation
  UMIP emulation requires L0 KVM to do instruction emulation for the L2
  guest, which needs nested address translation. Usually this would be done
  by the guest_kpop_mmu's gva_to_gpa op (the unimplemented kpop_gva_to_gpa in
  my POC). We either do such translation based on an L1-maintained
  translation table (in which case an XArray may not be a good choice for the
  L1 translation table), or we maintain another new translation table (e.g.,
  another XArray) in L0 for the L2 guest.

- age/test_age
  age/test_age MMU interfaces should be supported, e.g., for swap in the L1 VM.

- page track
  page track should be supported, e.g., for GVT graphics page table shadowing usage.

- dirty log
  dirty log should be supported for VM migration.
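
For the first TODO item above, a minimal sketch of what such an L1-side
lookaside could look like (the names and the page-granularity keying are
assumptions, not part of the POC):

#include <linux/xarray.h>
#include <linux/kvm_types.h>

static DEFINE_XARRAY(l2_ngfn_to_gfn);	/* ngpa -> gpa, keyed by frame number */

static int kpop_record_translation(gfn_t ngfn, gfn_t gfn)
{
	return xa_err(xa_store(&l2_ngfn_to_gfn, ngfn, xa_mk_value(gfn),
			       GFP_KERNEL));
}

static bool kpop_translate(gfn_t ngfn, gfn_t *gfn)
{
	void *entry = xa_load(&l2_ngfn_to_gfn, ngfn);

	if (!entry)
		return false;	/* no cached translation */
	*gfn = xa_to_value(entry);
	return true;
}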

[1]: https://github.com/intel-staging/pKVM-IA/tree/KPOP_RFC

-- 

Thanks
Jason CJ Chen

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2023-09-05 16:04 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-12 18:02 [RFC PATCH part-5 00/22] VMX emulation Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 01/22] pkvm: x86: Add memcpy lib Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 02/22] pkvm: x86: Add memory operation APIs for for host VM Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 03/22] pkvm: x86: Do guest address translation per page granularity Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 04/22] pkvm: x86: Add check for guest address translation Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 05/22] pkvm: x86: Add hypercalls for shadow_vm/vcpu init & teardown Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 06/22] KVM: VMX: Add new kvm_x86_ops vm_free Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 07/22] KVM: VMX: Add initialization/teardown for shadow vm/vcpu Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 08/22] pkvm: x86: Add hash table mapping for shadow vcpu based on vmcs12_pa Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 09/22] pkvm: x86: Add VMXON/VMXOFF emulation Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 10/22] pkvm: x86: Add has_vmcs_field() API for physical vmx capability check Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 11/22] KVM: VMX: Add more vmcs and vmcs12 fields definition Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 12/22] pkvm: x86: Init vmcs read/write bitmap for vmcs emulation Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 13/22] pkvm: x86: Initialize emulated fields " Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 14/22] pkvm: x86: Add msr ops for pKVM hypervisor Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 15/22] pkvm: x86: Move _init_host_state_area to " Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 16/22] pkvm: x86: Add vmcs_load/clear_track APIs Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 17/22] pkvm: x86: Add VMPTRLD/VMCLEAR emulation Jason Chen CJ
2023-03-12 18:02 ` [RFC PATCH part-5 18/22] pkvm: x86: Add VMREAD/VMWRITE emulation Jason Chen CJ
2023-03-12 18:03 ` [RFC PATCH part-5 19/22] pkvm: x86: Add VMLAUNCH/VMRESUME emulation Jason Chen CJ
2023-03-12 18:03 ` [RFC PATCH part-5 20/22] pkvm: x86: Add INVEPT/INVVPID emulation Jason Chen CJ
2023-03-12 18:03 ` [RFC PATCH part-5 21/22] pkvm: x86: Initialize msr_bitmap for vmsr Jason Chen CJ
2023-03-12 18:03 ` [RFC PATCH part-5 22/22] pkvm: x86: Add vmx msr emulation Jason Chen CJ
2023-03-13 16:58 ` [RFC PATCH part-5 00/22] VMX emulation Sean Christopherson
2023-03-14 16:29   ` Jason Chen CJ
2023-06-08 21:38     ` Dmytro Maluka
2023-06-09  2:07       ` Chen, Jason CJ
2023-06-09  8:34         ` Dmytro Maluka
2023-06-13 19:50           ` Sean Christopherson
2023-06-15 18:07             ` Dmytro Maluka
2023-06-20 15:46             ` Jason Chen CJ
2023-09-05  9:47             ` Jason Chen CJ
2023-06-15  3:59           ` Chen, Jason CJ
2023-06-15 21:13       ` Nadav Amit

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.