* [RFC PATCH 00/10] Basic KVM SGX Virtualization support
From: Kai Huang @ 2017-05-08  5:24 UTC
  To: pbonzini, rkrcmar, kvm

Hi,

This RFC series is the KVM part of basic KVM SGX virtualization support (EPC
static partitioning + Launch Control + SGX2 support). Qemu also needs to be
changed to support KVM SGX virtualization; the Qemu part will be sent out
separately in the future.

You can also find this series and the Qemu changes at the github repos below:

https://github.com/01org/kvm-sgx.git
https://github.com/01org/qemu-sgx.git

KVM SGX virtualization needs to work with the host SGX driver (explained
below), which has not been upstreamed yet, so part of this series depends on
the SGX driver. You can find the SGX driver at the github repo below.

https://github.com/jsakkine-intel/linux-sgx

The SGX specification can be found in the latest Intel SDM as Volume 3D (below).

https://software.intel.com/sites/default/files/managed/7c/f1/332831-sdm-vol-3d.pdf

The SGX specification is relatively complicated (it takes the entire Volume
3D) and it is unrealistic to list all hardware details here. Below is a brief
SGX overview (which I think is necessary background for discussing the design)
and the high level design. Please help to review and give comments. Thanks!

============================   SGX Overview   ===========================

- Enclave

Intel Software Guard Extensions (SGX) is a set of instructions and memory
access mechanisms that provide security protection for sensitive applications
and data. SGX allows an application to set aside part of its address space as
an *enclave*, a protected area that provides confidentiality and integrity
even in the presence of privileged malware. Accesses to the enclave memory
area from any software not resident in the enclave are prevented, including
accesses from privileged software. The diagram below illustrates the presence
of an enclave in an application.

	|-----------------------|
	|                       |
	|   |---------------|   |
	|   |   OS kernel   |   |       |-----------------------|
	|   |---------------|   |       |                       |
	|   |               |   |       |   |---------------|   |
	|   |---------------|   |       |   | Entry table   |   |
	|   |   Enclave     |---|-----> |   |---------------|   |
	|   |---------------|   |       |   | Enclave stack |   |
	|   |   App code    |   |       |   |---------------|   |
	|   |---------------|   |       |   | Enclave heap  |   |
	|   |   Enclave     |   |       |   |---------------|   |
	|   |---------------|   |       |   | Enclave code  |   |
	|   |   App code    |   |       |   |---------------|   |
	|   |---------------|   |       |                       |
	|           |           |       |-----------------------|
	|-----------------------|

SGX comes in two sets of extensions: SGX1 provides basic enclave support, and
SGX2 adds flexibility in runtime management of enclave resources and thread
execution within an enclave.

- Enclave Page Cache

The Enclave Page Cache (EPC) is the physical resource used to back enclave
memory. EPC is divided into 4K pages, each always aligned on a 4K boundary.
Hardware performs additional access control checks to restrict access to EPC
pages. The Enclave Page Cache Map (EPCM) is a secure structure, invisible to
software, that holds one entry per EPC page and is used by hardware to track
the status of each EPC page. Typically EPC and EPCM are reserved by BIOS as
Processor Reserved Memory, but the actual amount, size, and layout of EPC are
model-specific and depend on BIOS settings. EPC is enumerated via the new SGX
CPUID, and is reported as reserved memory.

An EPC page is either invalid or valid. There are 4 valid EPC page types in
SGX1: regular EPC page, SGX Enclave Control Structure (SECS) page, Thread
Control Structure (TCS) page, and Version Array (VA) page. SGX2 adds the
Trimmed EPC page. Each enclave is associated with one SECS page, and each
thread in an enclave is associated with one TCS page. VA pages are used in
EPC page eviction and reload. A Trimmed EPC page is a 4K enclave page that is
about to be freed (trimmed).

- ENCLS and ENCLU

Two new instructions, ENCLS and ENCLU, are introduced to manage enclaves and
EPC. ENCLS can only run in ring 0, while ENCLU can only run in ring 3. Both
have multiple leaf functions, with EAX indicating the specific leaf function.
The specification of ENCLS and ENCLU can be found in SDM Chapter 41, SGX
Instruction References.
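
For illustration, a minimal sketch (not code from this series; the 'encls'
helper is made up) of invoking an ENCLS leaf from C, with EAX selecting the
leaf function and RBX/RCX/RDX carrying leaf-specific parameters:

	/* Illustrative sketch only: ENCLS is a single opcode (0F 01 CF). */
	static inline int encls(u32 leaf, unsigned long rbx, unsigned long rcx,
				unsigned long rdx)
	{
		int ret;

		asm volatile(".byte 0x0f, 0x01, 0xcf"
			     : "=a" (ret)
			     : "a" (leaf), "b" (rbx), "c" (rcx), "d" (rdx)
			     : "memory", "cc");

		return ret;	/* 0 on success, or an SGX error code */
	}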

- Discovering SGX capability

CPUID.0x7.0:EBX.SGX[bit 2] reports the availability of SGX, and detailed SGX
info can be enumerated via the new CPUID leaf 0x12.

CPUID.0x12.0x0 enumerates SGX capability (ex, SGX1, SGX2), including enclave
instruction opcode support. CPUID.0x12.0x1 enumerates SGX capability of
processor state configuration and enclave configuration in the SECS structure.
CPUID.0x12.0x2 (and following indexes, if valid) enumerates EPC resources:
starting from CPUID.0x12.0x2, each index reports one valid EPC section (base,
size), until CPUID reports an invalid EPC section. Typically multiple EPC
sections only exist on multi-socket server machines (which don't exist yet),
and a client machine or single-socket server reports just one EPC section.

Please refer to Chapter 37.7.2 Intel SGX Resource Enumeration Leaves for
detailed info of SGX CPUID.
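
As an illustrative sketch (kernel-style C; bit layouts per the SDM chapter
above), the EPC enumeration loop looks roughly like:

	u32 eax, ebx, ecx, edx;
	u64 base, size;
	int i;

	/* Walk CPUID.0x12 sub-leaves starting at 2 to find EPC sections. */
	for (i = 2; ; i++) {
		cpuid_count(0x12, i, &eax, &ebx, &ecx, &edx);
		if ((eax & 0xf) != 0x1)	/* sub-leaf type 1: valid EPC section */
			break;
		base = ((u64)(ebx & 0xfffff) << 32) | (eax & 0xfffff000);
		size = ((u64)(edx & 0xfffff) << 32) | (ecx & 0xfffff000);
		/* (base, size) describes one physical EPC section */
	}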

On processors that support SGX, SGX can also be opted in or out via the
SGX_ENABLE bit (bit 18) of the IA32_FEATURE_CONTROL MSR. If the SGX_ENABLE
bit is cleared while IA32_FEATURE_CONTROL is locked, SGX is disabled on the
processor. SGX CPUID leaf 0x12 is still available if SGX is opted out via
IA32_FEATURE_CONTROL. The SDM doesn't specify the exact info that SGX CPUID
0x12 will report in this case, but likely it will report invalid SGX info. If
SGX is opted in, SGX CPUID 0x12 reports valid SGX info. ENCLS and ENCLU will
either #UD or #GP, depending on the values of CPUID.0x7.0:EBX.SGX,
IA32_FEATURE_CONTROL.SGX_ENABLE and IA32_FEATURE_CONTROL.LOCK.

Please refer to Chapter 37.7.1 Intel SGX Opt-in Configuration for detailed info.
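
A minimal sketch of the opt-in check, assuming the bit positions above (the
helper name is made up; SGX_ENABLE has no kernel define yet):

	/* Illustrative sketch only. */
	static bool sgx_opted_in(void)
	{
		u64 fc;

		rdmsrl(MSR_IA32_FEATURE_CONTROL, fc);

		/* SGX is usable only if bit 18 is set and the MSR is locked. */
		return (fc & FEATURE_CONTROL_LOCKED) && (fc & (1ULL << 18));
	}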

- SGX Launch Control

On processors that support SGX, the IA32_SGXLEPUBKEYHASH[0-3] MSRs contain the
hash of an RSA public key. A Launch Enclave (LE) can only run if it is signed
with the corresponding RSA private key. Without SGX Launch Control, hardware
can only run an LE signed with Intel's RSA key. SGX Launch Control allows
software to change IA32_SGXLEPUBKEYHASHn at runtime, allowing the processor
to run a 3rd party's LE.

SGX Launch Control adds a new SGX_LAUNCH_CONTROL_ENABLE bit (bit 17) to the
IA32_FEATURE_CONTROL MSR. If SGX_LAUNCH_CONTROL_ENABLE[bit 17] is 1,
IA32_SGXLEPUBKEYHASHn remain writable at runtime after
IA32_FEATURE_CONTROL.LOCK is set; otherwise they are read-only. Typically
BIOS allows the user to set up a 3rd party's IA32_SGXLEPUBKEYHASHn before
IA32_FEATURE_CONTROL is locked, and to choose whether IA32_SGXLEPUBKEYHASHn
remain changeable at runtime as well. However this depends on the BIOS
implementation.

CPUID.0x7.0:ECX[bit 30] reports the availability of bit 17 of
IA32_FEATURE_CONTROL, meaning the processor only supports SGX Launch Control
when CPUID.0x7.0:ECX[bit 30] is 1.
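
As a sketch (the MSR index macro and helper are made up; 0x8C is
IA32_SGXLEPUBKEYHASH0 per the SDM), switching the launch key hash at runtime
would look like:

	#define MSR_IA32_SGXLEPUBKEYHASH0	0x0000008c

	/* Only legal if IA32_FEATURE_CONTROL bit 17 was set before locking. */
	static void set_le_pubkey_hash(const u64 hash[4])
	{
		int i;

		for (i = 0; i < 4; i++)
			wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, hash[i]);
	}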

- SGX interaction with VMX

A new 64-bit ENCLS-exiting bitmap control field is added to the VMCS (encoding
202EH) to control VMEXIT on ENCLS leaf functions, and a new "Enable ENCLS
exiting" control bit (bit 15) is defined in the secondary processor-based VM
execution controls. Setting "Enable ENCLS exiting" to 1 enables the
ENCLS-exiting bitmap control.

Support for the 1-setting of "Enable ENCLS exiting" is enumerated from
IA32_VMX_PROCBASED_CTLS2[bit 47], which tracks CPUID.[EAX=0x7,ECX=0].EBX.SGX.

A new ENCLS VM exit reason (60) is also added to the Basic Exit Reasons.

The pseudocode below shows how the above execution control works:

    IF ( (in VMX non-root operation) and (Enable_ENCLS_EXITING = 1) )
        Then
            IF ( ((EAX < 63) and (ENCLS_EXITING_Bitmap[EAX] = 1)) or
                 ((EAX > 62) and (ENCLS_EXITING_Bitmap[63] = 1)) )
            Then
                Set VMCS.EXIT_REASON = ENCLS;
                Deliver VM exit;
            FI;
    FI;

VM exits that originate within an enclave set the following two bits before
delivering the VM exit to the VMM:
    - Bit 27 in the Exit Reason field of the Basic VM-exit information.
    - Bit 4 in the Interruptibility State of the Guest Non-Register State of
      the VMCS.

Refer to 42.5 Interactions with VMX, 27.2.1 Basic VM-Exit Information, and
27.3.4 Saving Non-Register State.
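
In KVM terms, enabling the trap looks roughly like the sketch below
(SECONDARY_EXEC_ENCLS_EXITING is defined in patch 02 of this series;
ENCLS_EXITING_BITMAP here stands in for the new 202EH VMCS field):

	/* Illustrative sketch: trap on every ENCLS leaf. */
	vmcs_set_bits(SECONDARY_VM_EXEC_CONTROL, SECONDARY_EXEC_ENCLS_EXITING);
	vmcs_write64(ENCLS_EXITING_BITMAP, ~0ULL);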

=========================   High Level Design   ==========================

- Qemu Changes

EPC is a limited resource. Typically the EPC and EPCM together are 32M, 64M,
or 128M, configurable in BIOS. In order to use EPC more efficiently between
different KVM guests, we add additional Qemu parameters to allow the
administrator to specify a guest's EPC size when it is created. We also add
two additional parameters for SGX Launch Control. Specifically, the following
SGX parameters are added:

	# qemu-system-x86_64 -sgx epc=<size>,lehash='256-bit value string',lewr

The 'epc' parameter specifies the guest's EPC size; any MB-aligned value is
supported. 'lehash' specifies the guest's initial IA32_SGXLEPUBKEYHASHn
value, and 'lewr' specifies whether the guest's IA32_SGXLEPUBKEYHASHn are
writable by the guest OS. 'epc' is mandatory; both 'lehash' and 'lewr' are
optional. Normally with 'lewr' specified, 'lehash' is not needed (the default
value is Intel's hash), as the guest OS is able to change
IA32_SGXLEPUBKEYHASHn as it wishes.

With the 'epc' parameter, Qemu is responsible for notifying KVM of the guest's
EPC base and size. The EPC base address is calculated by Qemu internally
(according to chip type, memory size, etc).

With 'lehash' specified, Qemu sets the guest's IA32_SGXLEPUBKEYHASHn to the
value specified. With 'lewr' specified, Qemu sets bit 17 of the guest's
IA32_FEATURE_CONTROL to 1.

- Expose SGX to guest

The SGX feature is exposed to the guest via the SGX CPUID. Looking at the SGX
CPUID, we can report the same CPUID info to the guest as on native for most of
the SGX CPUID. By reporting the same CPUID the guest is able to use the full
capacity of SGX, and KVM doesn't need to emulate that info.

There are two exceptions. The first is that KVM obviously cannot report
physical EPC to the guest, but should report the guest's (virtual) EPC base
and size (which are notified by Qemu as mentioned above). The second is
SECS.ATTRIBUTES, which is reported by CPUID.0x12.0x1:EAX-EDX. In particular,
it is SECS.ATTRIBUTES.XFRM (bits 127:64) that needs emulation. It reports
which XFRM bits can be set when creating an enclave via ENCLS[ECREATE]. As the
guest may not support all XFRM bits supported by hardware,
CPUID.0x12.0x1:ECX-EDX should also only report the guest's supported XFRM
bits.

All other CPUID info can be reported to the guest just as on native.

And we only report one EPC section to the guest (only CPUID.0x12.0x2 is
valid).
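
A sketch of the intended XFRM emulation (host_xfrm stands for what hardware's
CPUID.0x12.0x1:ECX-EDX reports; guest_supported_xcr0 is KVM's existing
per-vcpu mask of XSAVE features exposed to the guest):

	/* Illustrative sketch: clamp reported XFRM to what guest can use. */
	u64 guest_xfrm = host_xfrm & vcpu->arch.guest_supported_xcr0;

	entry->ecx = (u32)guest_xfrm;		/* XFRM bits 31:0  */
	entry->edx = (u32)(guest_xfrm >> 32);	/* XFRM bits 63:32 */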

- Initializing SGX for guest

As mentioned above, the guest's EPC base and size are determined by Qemu, and
KVM needs Qemu to notify it of that info before it can initialize SGX for the
guest. To avoid a new IOCTL for this purpose (ex, KVM_SET_EPC), KVM
initializes the guest's SGX in KVM_SET_CPUID2, where Qemu passes the guest's
SGX CPUID, which includes the guest's EPC base and size.

Also, the SDM says the SGX CPUID is actually thread-specific: software cannot
assume all logical processors report the same SGX CPUID. Initializing the
guest's SGX in KVM_SET_CPUID2 provides an opportunity for KVM to check whether
the SGX CPUID passed by Qemu is valid and consistent across all VCPUs.

- EPC management

On the host side there's the SGX driver, which serves host SGX applications
from userspace. It detects the SGX feature and manages all EPC pages. To work
with the SGX driver simultaneously, we have to use a 'unified model', in which
the SGX driver still manages EPC and KVM calls the driver's APIs to
allocate/free EPC pages, etc. However KVM cannot call the driver's APIs
directly: on machines without the SGX feature, the SGX driver won't be loaded,
and calling the driver's APIs directly would make KVM unable to be loaded
either. Instead, KVM uses symbol_get to get the driver's APIs at runtime,
which avoids this issue.

For KVM guests, there are two approaches to managing EPC: static partitioning
and oversubscription. In static partitioning, all EPC pages are allocated to
the guest when it is created and are freed only when the guest is destroyed.
In oversubscription, EPC pages are allocated to the guest on demand, and EPC
pages allocated to a guest can be evicted by KVM and reassigned to other
guests. An access to a guest EPC page with no physical EPC mapped causes an
EPT violation (or a PF in case of shadowing), upon which a physical EPC page
is allocated to the guest (and reloaded into the enclave if required).

-- Static partitioning

Static partitioning is a simple approach. KVM only needs to allocate all EPC
pages when the guest is created and set up the mapping. All ENCLS leaf
functions will run perfectly in the guest, so KVM doesn't need to turn on
ENCLS VMEXIT.

However, KVM needs to turn on ENCLS VMEXIT if KVM doesn't expose SGX to the
guest, or if the guest has turned off SGX via IA32_FEATURE_CONTROL.SGX_ENABLE,
as in such cases ENCLS run in the guest may behave differently than on native:
on hardware SGX is indeed enabled, but according to the SDM, running ENCLS
while the guest's SGX environment is abnormal should cause #UD or #GP. KVM
needs to trap ENCLS to emulate this behavior.
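
A sketch of that emulation (guest_cpuid_has_sgx is added in a later patch;
kvm_queue_exception{_e} are KVM's existing exception injection helpers):

	/* Illustrative sketch of an ENCLS exit handler for this case. */
	static int handle_encls(struct kvm_vcpu *vcpu)
	{
		if (!guest_cpuid_has_sgx(vcpu))
			/* Guest doesn't see SGX at all: ENCLS must #UD. */
			kvm_queue_exception(vcpu, UD_VECTOR);
		else
			/* SGX disabled via guest's IA32_FEATURE_CONTROL: #GP. */
			kvm_queue_exception_e(vcpu, GP_VECTOR, 0);

		return 1;	/* resume the guest */
	}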

-- Oversubscription

While oversubscription is better in terms of functionality, it needs a more
complicated implementation. Below is a brief explanation of what needs to be
done to support EPC oversubscription between guests.

Below is the sequence to evict a regular EPC page:

	1) Select one or multiple regular EPC pages from one enclave
	2) Remove EPT/PT mapping for selected EPC pages
	3) Send IPIs to remote CPUs to flush TLB of selected EPC pages
	4) EBLOCK on selected EPC pages
	5) ETRACK on enclave's SECS page
	6) allocate one available slot (8-byte) in VA page
	7) EWB on selected EPC pages

With EWB taking:

	- a VA slot, to store the eviction version info.
	- one normal 4K page in memory, to store the encrypted content of the
	  EPC page.
	- one struct PCMD in memory, to store metadata.
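
Putting the eviction steps together as a sketch (every helper below is
hypothetical, standing in for the corresponding ENCLS leaf or mapping
operation; step numbers refer to the sequence above):

	/* Illustrative sketch of evicting one regular EPC page. */
	static int evict_epc_page(struct sgx_epc_page *epg,
				  struct sgx_epc_page *secs,
				  struct sgx_pageinfo *pginfo, void *va_slot)
	{
		int ret;

		zap_and_flush_mappings(epg);		/* steps 2 and 3 */
		ret = __eblock(epc_addr(epg));		/* step 4: EBLOCK */
		if (ret)
			return ret;
		ret = __etrack(epc_addr(secs));		/* step 5: ETRACK */
		if (ret)
			return ret;
		/* step 7: pginfo carries the backing 4K page and PCMD buffer */
		return __ewb(pginfo, epc_addr(epg), va_slot);
	}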

And below is the sequence to evict an SECS page or VA page:

	1) locate the SECS (or VA) page
	2) remove EPT/PT mapping for the SECS (or VA) page
	3) Send IPIs to remote CPUs to flush TLB
	4) allocate one available slot (8-byte) in a VA page
	5) EWB on the SECS (or VA) page

And to evict an SECS page, all regular EPC pages that belong to that SECS must
be evicted first, otherwise EWB returns the SGX_CHILD_PRESENT error.

And to reload an EPC page:

	1) ELDU/ELDB on the EPC page
	2) set up EPT/PT mapping

With ELDU/ELDB taking:

	- the location of the SECS page
	- the linear address of the enclave's 4K page (that we are going to
	  reload)
	- the VA slot (used in EWB)
	- the 4K page in memory (used in EWB)
	- the struct PCMD in memory (used in EWB)

Therefore, to support EPC oversubscription for guests, KVM needs to know:

	1) the EPC page type (SECS, regular page, VA page, etc)
	2) the EPC page status (whether blocked) -- the guest may already
	   have run EBLOCK
	3) the location of the SECS page -- both eviction & reload need it.

Besides the above, KVM also needs to manage the allocation of VA slots; a VA
page is itself an EPC page and could potentially trigger EPC oversubscription.

To get the above info, KVM needs to trap ENCLS from all guests, and maintain
info of all EPC pages and all enclaves from all guests. Specifically, KVM
needs to turn on ENCLS VMEXIT for all guests, and upon ENCLS VMEXIT, KVM needs
to parse the ENCLS parameters (so that we can update EPC/enclave info
according to which ENCLS leaf the guest is running, and its parameters). KVM
also needs to either run ENCLS on behalf of the guest (and skip this ENCLS),
or use MTF to return to the guest and let the guest run this ENCLS again. For
the former, KVM needs to reconstruct the guest's ENCLS parameters and remap
the guest's virtual addresses to KVM kernel addresses (as all addresses in the
guest's ENCLS parameters are guest virtual addresses), and run ENCLS in KVM on
behalf of the guest. For the latter, upon ENCLS VMEXIT, KVM needs to
temporarily turn off ENCLS VMEXIT, turn on MTF VMEXIT, and enter the guest to
let it run this ENCLS again. This time ENCLS VMEXIT won't happen and MTF
VMEXIT will happen after ENCLS is executed. Upon MTF VMEXIT, we turn ENCLS
VMEXIT back on and turn MTF VMEXIT off again.

The diagrams below compare the two approaches: running ENCLS in KVM, and using MTF.


	--------------------------------------------------------------
				|	ENCLS		|
	--------------------------------------------------------------
				|	   	    /|\
		ENCLS VMEXIT	|			| VMENTRY
				|			|
			       \|/			|
		
		1) parse ENCLS parameters
		2) reconstruct(remap) guest's ENCLS parameters
		3) run ENCLS on behalf of guest (and skip ENCLS)
		4) on success, update EPC/enclave info, or inject error

		   	1) Run ENCLS in KVM


	--------------------------------------------------------------
			 	 |	ENCLS		|
	--------------------------------------------------------------
				| /|\		       |/|\
			ENCLS	|  | VMENTRY    MTF    | | VMENTRY
		       VMEXIT	|  |	       VMEXIT  | |
		       	       \|/ |		      \|/|
	1) Turn off ENCLS VMEXIT	   1) Turn off MTF VMEXIT
	2) Turn on MTF VMEXIT		   2) Turn on ENCLS VMEXIT
	3) cache ENCLS parameters          3) check ENCLS succeeds or not, and
	   (as ENCLS will change RAX-RDX)     only on success, parse cached
	   				      ENCLS parameters, and update
					      EPC/enclave info

			2) Using MTF

Note that in the MTF approach, checking the ENCLS status (success or not) is
tricky, as ENCLS can either return an error via the EAX register, or just
cause #UD or #GP. The former case is relatively easy for KVM to check, but for
the latter KVM needs to trap #UD and #GP from the guest, and also needs to
check whether the #UD or #GP happened while running ENCLS.
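
A sketch of the VMCS toggling in the MTF approach (vmcs_{set,clear}_bits and
CPU_BASED_MONITOR_TRAP_FLAG already exist in KVM; SECONDARY_EXEC_ENCLS_EXITING
is defined in patch 02):

	/* On ENCLS VMEXIT: let the guest re-execute ENCLS, then trap back. */
	vmcs_clear_bits(SECONDARY_VM_EXEC_CONTROL,
			SECONDARY_EXEC_ENCLS_EXITING);
	vmcs_set_bits(CPU_BASED_VM_EXEC_CONTROL, CPU_BASED_MONITOR_TRAP_FLAG);

	/* On the subsequent MTF VMEXIT: restore ENCLS trapping. */
	vmcs_set_bits(SECONDARY_VM_EXEC_CONTROL,
		      SECONDARY_EXEC_ENCLS_EXITING);
	vmcs_clear_bits(CPU_BASED_VM_EXEC_CONTROL,
			CPU_BASED_MONITOR_TRAP_FLAG);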

In this patch series we only support 'static partitioning'; 'oversubscription'
can be added when it is required. We currently do support nested SGX
(mentioned below), and with 'oversubscription' supporting nested SGX would be
very complicated.

- Guest's EPC memory slot implementation

The guest's (virtual) EPC is implemented as a private memory slot in KVM; Qemu
is not aware of the existence of this EPC slot. Using a private slot, we avoid
an mmap in Qemu to get the EPC slot's host virtual address, and KVM doesn't
need to handle such an mmap from Qemu for the EPC slot. We don't want to
implement such mmap support in the SGX driver either.

A dedicated kvm_epc_ops is added for the VMA of the EPC slot, and EPC pages
are allocated via vma->vm_ops->fault. This is the natural way to support
'oversubscription' (if we need to support it in the future) and works nicely
for 'static partitioning' as well.

- Nested SGX

Currently, nested SGX is also supported for 'static partitioning'. As
mentioned above, in the normal case KVM (L0) doesn't need to turn on ENCLS
VMEXIT, but KVM cannot assume the L1 hypervisor's behavior, so if ENCLS VMEXIT
is turned on in L1, KVM (L0) must also turn on ENCLS VMEXIT, but let L1 handle
such ENCLS VMEXITs from the L2 guest.

Supporting nested SGX with 'oversubscription' would be very complicated, as
both L0 and L1 may turn on ENCLS VMEXIT, and both L0 and L1 need to maintain
and update EPC/enclave info from guests as explained above.

Kai Huang (10):
  x86: add SGX Launch Control definition to cpufeature
  kvm: vmx: add ENCLS VMEXIT detection
  kvm: vmx: detect presence of host SGX driver
  kvm: sgx: new functions to init and destroy SGX for guest
  kvm: x86: add KVM_GET_SUPPORTED_CPUID SGX support
  kvm: x86: add KVM_SET_CPUID2 SGX support
  kvm: vmx: add SGX IA32_FEATURE_CONTROL MSR emulation
  kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  kvm: vmx: handle ENCLS VMEXIT
  kvm: vmx: handle VMEXIT from SGX Enclave

 arch/x86/include/asm/cpufeatures.h |   1 +
 arch/x86/include/asm/kvm_host.h    |   9 +-
 arch/x86/include/asm/msr-index.h   |   7 +
 arch/x86/include/asm/vmx.h         |   4 +
 arch/x86/include/uapi/asm/vmx.h    |   5 +-
 arch/x86/kvm/Makefile              |   2 +-
 arch/x86/kvm/cpuid.c               |  21 +-
 arch/x86/kvm/cpuid.h               |  22 ++
 arch/x86/kvm/sgx.c                 | 463 +++++++++++++++++++++++
 arch/x86/kvm/sgx.h                 | 105 ++++++
 arch/x86/kvm/svm.c                 |  11 +-
 arch/x86/kvm/vmx.c                 | 752 +++++++++++++++++++++++++++++++++++--
 12 files changed, 1362 insertions(+), 40 deletions(-)
 create mode 100644 arch/x86/kvm/sgx.c
 create mode 100644 arch/x86/kvm/sgx.h

-- 
2.11.0


* [PATCH 01/10] x86: add SGX Launch Control definition to cpufeature
From: Kai Huang @ 2017-05-08  5:24 UTC
  To: pbonzini, rkrcmar, kvm

For Intel CPUs that support SGX Launch Control, CPUID.0x7.0:ECX[bit 30]
reports the availability of the 1-setting of bit 17 of the
IA32_FEATURE_CONTROL MSR, which enables runtime configuration of SGX Launch
Control via IA32_SGXLEPUBKEYHASHn.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 61eba9423b5c..e31c06ac3c65 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -292,6 +292,7 @@
 #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */
 #define X86_FEATURE_LA57	(16*32+16) /* 5-level page tables */
 #define X86_FEATURE_RDPID	(16*32+22) /* RDPID instruction */
+#define X86_FEATURE_SGX_LAUNCH_CONTROL (16*32+30) /* SGX Launch Control */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (ebx), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV (17*32+0) /* MCA overflow recovery support */
-- 
2.11.0


* [PATCH 02/10] kvm: vmx: add ENCLS VMEXIT detection
From: Kai Huang @ 2017-05-08  5:24 UTC
  To: pbonzini, rkrcmar, kvm

This patch detects whether ENCLS VMEXIT is supported. A new bool module
parameter 'enable_sgx' is also added to control whether to enable SGX
virtualization. SGX virtualization is disabled if hardware doesn't support
ENCLS VMEXIT.

ENCLS VMEXIT is disabled in vmx_secondary_exec_control; turning it on or off
is done in a later patch.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 arch/x86/include/asm/vmx.h |  1 +
 arch/x86/kvm/vmx.c         | 22 +++++++++++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index cc54b7026567..f7ac249ce83d 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -72,6 +72,7 @@
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING	0x00000400
 #define SECONDARY_EXEC_ENABLE_INVPCID		0x00001000
 #define SECONDARY_EXEC_SHADOW_VMCS              0x00004000
+#define SECONDARY_EXEC_ENCLS_EXITING		0x00008000
 #define SECONDARY_EXEC_ENABLE_PML               0x00020000
 #define SECONDARY_EXEC_XSAVES			0x00100000
 #define SECONDARY_EXEC_TSC_SCALING              0x02000000
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 259e9b28ccf8..050a143414e1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -108,6 +108,9 @@ static u64 __read_mostly host_xss;
 static bool __read_mostly enable_pml = 1;
 module_param_named(pml, enable_pml, bool, S_IRUGO);
 
+static bool __read_mostly enable_sgx = 1;
+module_param_named(sgx, enable_sgx, bool, S_IRUGO);
+
 #define KVM_VMX_TSC_MULTIPLIER_MAX     0xffffffffffffffffULL
 
 /* Guest_tsc -> host_tsc conversion requires 64-bit division.  */
@@ -1123,6 +1126,12 @@ static inline bool cpu_has_vmx_virtual_intr_delivery(void)
 		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
 }
 
+static inline bool cpu_has_vmx_encls_vmexit(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_ENCLS_EXITING;
+}
+
 /*
  * Comment's format: document - errata name - stepping - processor name.
  * Refer from
@@ -3585,7 +3594,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 			SECONDARY_EXEC_SHADOW_VMCS |
 			SECONDARY_EXEC_XSAVES |
 			SECONDARY_EXEC_ENABLE_PML |
-			SECONDARY_EXEC_TSC_SCALING;
+			SECONDARY_EXEC_TSC_SCALING |
+			SECONDARY_EXEC_ENCLS_EXITING;
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
 					&_cpu_based_2nd_exec_control) < 0)
@@ -5160,6 +5170,13 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
 	if (!enable_pml)
 		exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
 
+	/*
+	 * ENCLS VMEXIT is controlled in vmx_cpuid_update. This function is
+	 * called in many places, and we don't want ENCLS VMEXIT to be
+	 * enabled unexpectedly.
+	 */
+	exec_control &= ~SECONDARY_EXEC_ENCLS_EXITING;
+
 	return exec_control;
 }
 
@@ -6655,6 +6672,9 @@ static __init int hardware_setup(void)
 
 	kvm_mce_cap_supported |= MCG_LMCE_P;
 
+	if (!cpu_has_vmx_encls_vmexit())
+		enable_sgx = 0;
+
 	return alloc_kvm_area();
 
 out:
-- 
2.11.0


* [PATCH 03/10] kvm: vmx: detect presence of host SGX driver
From: Kai Huang @ 2017-05-08  5:24 UTC
  To: pbonzini, rkrcmar, kvm

On the host side there's the SGX driver, which serves host SGX applications.
It detects the SGX feature and manages all EPC pages. KVM needs to cooperate
with the SGX driver in terms of EPC management, because they are both EPC
consumers. We should go for a 'unified model', in which the SGX driver manages
all EPC pages and KVM simply calls the driver's APIs to allocate/free EPC
pages, etc. However KVM cannot call the driver's APIs directly: on machines
without the SGX feature, the SGX driver won't be loaded, and calling the
driver's APIs directly would make KVM unable to be loaded either. Instead,
KVM uses symbol_get to get the driver's APIs at runtime, which avoids this
issue.

This patch adds new functions to initialize and destroy KVM SGX support;
currently KVM simply calls symbol_get{put} for all necessary driver APIs. The
symbols are only released when KVM exits, to prevent the SGX driver from being
unloaded during KVM's lifetime. Note KVM completely trusts the SGX driver for
SGX feature and EPC resource detection and won't detect SGX by itself.

Two new files, arch/x86/kvm/sgx.c{h}, are added to hold the bulk of the KVM SGX code.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 arch/x86/kvm/Makefile |   2 +-
 arch/x86/kvm/sgx.c    | 163 ++++++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/sgx.h    |  34 +++++++++++
 arch/x86/kvm/vmx.c    |  10 ++++
 4 files changed, 208 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kvm/sgx.c
 create mode 100644 arch/x86/kvm/sgx.h

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 3bff20710471..015712e666fd 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -17,7 +17,7 @@ kvm-y			+= x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
 
 kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)	+= assigned-dev.o iommu.o
 
-kvm-intel-y		+= vmx.o pmu_intel.o
+kvm-intel-y		+= vmx.o pmu_intel.o sgx.o
 kvm-amd-y		+= svm.o pmu_amd.o
 
 obj-$(CONFIG_KVM)	+= kvm.o
diff --git a/arch/x86/kvm/sgx.c b/arch/x86/kvm/sgx.c
new file mode 100644
index 000000000000..4b65b1bb1f30
--- /dev/null
+++ b/arch/x86/kvm/sgx.c
@@ -0,0 +1,163 @@
+/*
+ * KVM SGX Virtualization support.
+ *
+ * Copyright (c) 2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Author:	Kai Huang <kai.huang@linux.intel.com>
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/cpufeature.h>	/* boot_cpu_has */
+#include <asm/processor.h>	/* cpuid */
+#include <linux/smp.h>
+#include <linux/module.h>
+#include "sgx.h"
+
+/* Debug helpers... */
+#define	sgx_debug(fmt, ...)	\
+	printk(KERN_DEBUG "KVM: SGX: %s: "fmt, __func__, ## __VA_ARGS__)
+#define	sgx_info(fmt, ...)	\
+	printk(KERN_INFO "KVM: SGX: "fmt, ## __VA_ARGS__)
+#define	sgx_err(fmt, ...)	\
+	printk(KERN_ERR "KVM: SGX: "fmt, ## __VA_ARGS__)
+
+/*
+ * EPC pages are managed by SGX driver. KVM needs to call SGX driver's APIs
+ * to allocate/free EPC page, etc.
+ *
+ * However KVM cannot call the SGX driver's APIs directly: on machines
+ * without SGX support, the SGX driver cannot be loaded, therefore if KVM
+ * called the driver's APIs directly, KVM couldn't be loaded either, which
+ * is not acceptable. Instead, KVM uses the symbol_get{put} pair to get the
+ * driver's APIs at runtime, and simply disables SGX if those symbols
+ * cannot be found.
+ */
+struct required_sgx_driver_symbols {
+	struct sgx_epc_page *(*alloc_epc_page)(unsigned int flags);
+	/*
+	 * Currently SGX driver's sgx_free_page has 'struct sgx_encl *encl'
+	 * as parameter. We need to honor that.
+	 */
+	int (*free_epc_page)(struct sgx_epc_page *epg, struct sgx_encl *encl);
+	/*
+	 * get/put (map/unmap) kernel virtual address of given EPC page.
+	 * The namings are aligned to SGX driver's APIs.
+	 */
+	void *(*get_epc_page)(struct sgx_epc_page *epg);
+	void (*put_epc_page)(void *epc_page_vaddr);
+};
+
+static struct required_sgx_driver_symbols sgx_driver_symbols = {
+	.alloc_epc_page = NULL,
+	.free_epc_page = NULL,
+	.get_epc_page = NULL,
+	.put_epc_page = NULL,
+};
+
+static inline struct sgx_epc_page *sgx_alloc_epc_page(unsigned int flags)
+{
+	struct sgx_epc_page *epg;
+
+	BUG_ON(!sgx_driver_symbols.alloc_epc_page);
+
+	epg = sgx_driver_symbols.alloc_epc_page(flags);
+
+	/* sgx_alloc_page returns ERR_PTR(error_code) instead of NULL */
+	return IS_ERR_OR_NULL(epg) ? NULL : epg;
+}
+
+static inline void sgx_free_epc_page(struct sgx_epc_page *epg)
+{
+	BUG_ON(!sgx_driver_symbols.free_epc_page);
+
+	sgx_driver_symbols.free_epc_page(epg, NULL);
+}
+
+static inline void *sgx_kmap_epc_page(struct sgx_epc_page *epg)
+{
+	BUG_ON(!sgx_driver_symbols.get_epc_page);
+
+	return sgx_driver_symbols.get_epc_page(epg);
+}
+
+static inline void sgx_kunmap_epc_page(void *addr)
+{
+	BUG_ON(!sgx_driver_symbols.put_epc_page);
+
+	sgx_driver_symbols.put_epc_page(addr);
+}
+
+static inline u64 sgx_epc_page_to_pfn(struct sgx_epc_page *epg)
+{
+	return (u64)(epg->pa >> PAGE_SHIFT);
+}
+
+static void put_sgx_driver_symbols(void);
+
+static int get_sgx_driver_symbols(void)
+{
+	sgx_driver_symbols.alloc_epc_page = symbol_get(sgx_alloc_page);
+	if (!sgx_driver_symbols.alloc_epc_page)
+		goto error;
+	sgx_driver_symbols.free_epc_page = symbol_get(sgx_free_page);
+	if (!sgx_driver_symbols.free_epc_page)
+		goto error;
+	sgx_driver_symbols.get_epc_page = symbol_get(sgx_get_page);
+	if (!sgx_driver_symbols.get_epc_page)
+		goto error;
+	sgx_driver_symbols.put_epc_page = symbol_get(sgx_put_page);
+	if (!sgx_driver_symbols.put_epc_page)
+		goto error;
+
+	return 0;
+
+error:
+	put_sgx_driver_symbols();
+	return -EFAULT;
+}
+
+static void put_sgx_driver_symbols(void)
+{
+	if (sgx_driver_symbols.alloc_epc_page)
+		symbol_put(sgx_alloc_page);
+	if (sgx_driver_symbols.free_epc_page)
+		symbol_put(sgx_free_page);
+	if (sgx_driver_symbols.get_epc_page)
+		symbol_put(sgx_get_page);
+	if (sgx_driver_symbols.put_epc_page)
+		symbol_put(sgx_put_page);
+
+	memset(&sgx_driver_symbols, 0, sizeof (sgx_driver_symbols));
+}
+
+int sgx_init(void)
+{
+	int r;
+
+	r = get_sgx_driver_symbols();
+	if (r) {
+		sgx_err("SGX driver is not loaded.\n");
+		return r;
+	}
+
+	sgx_info("SGX virtualization supported.\n");
+
+	return 0;
+}
+
+void sgx_destroy(void)
+{
+	put_sgx_driver_symbols();
+}
diff --git a/arch/x86/kvm/sgx.h b/arch/x86/kvm/sgx.h
new file mode 100644
index 000000000000..ff2766eeae33
--- /dev/null
+++ b/arch/x86/kvm/sgx.h
@@ -0,0 +1,34 @@
+/*
+ * KVM SGX Virtualization support.
+ *
+ * Copyright (c) 2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Author:	Kai Huang <kai.huang@linux.intel.com>
+ */
+
+#ifndef	ARCH_X86_KVM_SGX_H
+#define	ARCH_X86_KVM_SGX_H
+
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/bitops.h>
+#include <linux/kvm_host.h>
+#include <asm/sgx.h>
+
+int sgx_init(void);
+void sgx_destroy(void);
+
+#endif
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 050a143414e1..4b368a0af9bd 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -52,6 +52,8 @@
 #include "trace.h"
 #include "pmu.h"
 
+#include "sgx.h"
+
 #define __ex(x) __kvm_handle_fault_on_reboot(x)
 #define __ex_clear(x, reg) \
 	____kvm_handle_fault_on_reboot(x, "xor " reg " , " reg)
@@ -11657,6 +11659,11 @@ static int __init vmx_init(void)
 	if (r)
 		return r;
 
+	if (enable_sgx) {
+		if (sgx_init())
+			enable_sgx = 0;
+	}
+
 #ifdef CONFIG_KEXEC_CORE
 	rcu_assign_pointer(crash_vmclear_loaded_vmcss,
 			   crash_vmclear_local_loaded_vmcss);
@@ -11672,6 +11679,9 @@ static void __exit vmx_exit(void)
 	synchronize_rcu();
 #endif
 
+	if (enable_sgx)
+		sgx_destroy();
+
 	kvm_exit();
 }
 
-- 
2.11.0


* [PATCH 04/10] kvm: sgx: new functions to init and destroy SGX for guest
From: Kai Huang @ 2017-05-08  5:24 UTC
  To: pbonzini, rkrcmar, kvm

Add a new kvm_sgx structure to keep per-guest SGX state, including the guest's
CPUID and EPC slot info. The initialization function checks the consistency of
the SGX cpuid info from Qemu (returning an error in case Qemu did something
wrong) and creates the EPC slot (only once, when first called). If anything
goes wrong, by returning an error to Qemu we allow it to stop creating vcpus
or just kill the guest.

The EPC slot is implemented as a private memory slot by KVM. This is the
easiest way, as we don't expose a 'file' to userspace for Qemu to mmap to get
a userspace virtual address for the EPC slot, and we don't want to use the SGX
driver's mmap for this purpose either.

EPC pages are actually allocated via the vma->vm_ops->fault associated with
the EPC slot's vma, to comply with the current hva_to_pfn implementation, so
that hva_to_pfn works for EPC as well.

A new kvm_epc structure is also added to represent the guest's EPC slot info,
and a new kvm_epc_page structure is added to track the status of each guest
EPC page. It keeps track of all physical EPC pages allocated to the guest,
which is needed when KVM wants to free them all upon guest destruction. Btw,
the SGX driver doesn't have sgx_epc_pfn_to_page, so KVM needs to do the
bookkeeping. What's more, we can expand it in the future, ex, to support EPC
oversubscription between KVM guests (where a guest's EPC page will have more
states).

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |   4 +-
 arch/x86/kvm/sgx.c              | 300 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/sgx.h              |  71 ++++++++++
 3 files changed, 374 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 74ef58c8ff53..1d622334fc0e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -40,7 +40,7 @@
 #define KVM_MAX_VCPU_ID 1023
 #define KVM_USER_MEM_SLOTS 509
 /* memory slots that are not exposed to userspace */
-#define KVM_PRIVATE_MEM_SLOTS 3
+#define KVM_PRIVATE_MEM_SLOTS 4
 #define KVM_MEM_SLOTS_NUM (KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS)
 
 #define KVM_PIO_PAGE_OFFSET 1
@@ -817,6 +817,8 @@ struct kvm_arch {
 
 	bool x2apic_format;
 	bool x2apic_broadcast_quirk_disabled;
+
+	void *priv;	/* x86 vendor specific data */
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/sgx.c b/arch/x86/kvm/sgx.c
index 4b65b1bb1f30..a7040e6380a5 100644
--- a/arch/x86/kvm/sgx.c
+++ b/arch/x86/kvm/sgx.c
@@ -104,6 +104,306 @@ static inline u64 sgx_epc_page_to_pfn(struct sgx_epc_page *epg)
 	return (u64)(epg->pa >> PAGE_SHIFT);
 }
 
+static int __sgx_eremove(struct sgx_epc_page *epg)
+{
+	void *addr;
+	int r;
+
+	addr = sgx_kmap_epc_page(epg);
+	r = __eremove(addr);
+	sgx_kunmap_epc_page(addr);
+	if (unlikely(r)) {
+		sgx_err("__eremove error: EPC pfn 0x%lx, r %d\n",
+				(unsigned long)sgx_epc_page_to_pfn(epg),
+				r);
+	}
+
+	return r;
+}
+
+/* By the time we reach here, the mmap_sem should already be held */
+static int kvm_epc_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct kvm_sgx *sgx = (struct kvm_sgx *)vma->vm_private_data;
+	struct kvm *kvm;
+	struct sgx_epc_page *epg;
+	struct kvm_epc_page *gepg;
+	u64 gfn, pfn;
+
+	BUG_ON(!sgx);
+	kvm = sgx->kvm;
+
+	gfn = to_epc(sgx)->base_gfn + (((unsigned long)vmf->address -
+				vma->vm_start) >> PAGE_SHIFT);
+	gepg = gfn_to_guest_epc_page(kvm, gfn);
+
+	/*
+	 * SGX driver doesn't support recycling EPC pages back from KVM
+	 * guests yet, and it doesn't support out-of-EPC killer either,
+	 * therefore if we don't use SGX_ALLOC_ATOMIC here, this function
+	 * may never return in case SGX driver cannot recycle enough EPC
+	 * pages from host SGX applications.
+	 */
+	epg = sgx_alloc_epc_page(SGX_ALLOC_ATOMIC);
+	if (!epg) {
+		/* Unable to allocate EPC. Kill the guest */
+		sgx_err("kvm 0x%p, gfn 0x%lx: out of EPC when trying to "
+				"map EPC to guest.\n", kvm, (unsigned long)gfn);
+		goto error;
+	}
+
+	pfn = sgx_epc_page_to_pfn(epg);
+	if (vm_insert_pfn(vma, (unsigned long)vmf->address,
+			(unsigned long)pfn)) {
+		sgx_err("kvm 0x%p, gfn 0x%lx: failed to install host mapping "
+				"on: hva 0x%lx, pfn 0x%lx\n", kvm,
+				(unsigned long)gfn,
+				(unsigned long)vmf->address,
+				(unsigned long)pfn);
+		sgx_free_epc_page(epg);
+		goto error;
+	}
+
+	/* Book keeping physical EPC page allocated/mapped to particular GFN */
+	gepg->epg = epg;
+
+	return VM_FAULT_NOPAGE;	/* EPC has not 'struct page' associated */
+error:
+	return VM_FAULT_SIGBUS;
+}
+
+static void kvm_epc_close(struct vm_area_struct *vma)
+{
+}
+
+static struct vm_operations_struct kvm_epc_ops =  {
+	.fault = kvm_epc_fault,
+	/* close to prevent vma to be merged. */
+	.close = kvm_epc_close,
+};
+
+static void kvm_init_epc_table(struct kvm_epc_page *epc_table, u64 npages)
+{
+	u64 i;
+
+	for (i = 0; i < npages; i++)  {
+		struct kvm_epc_page *gepg = epc_table + i;
+
+		gepg->epg = NULL;
+	}
+}
+
+static void kvm_destroy_epc_table(struct kvm_epc_page *epc_table,
+		u64 npages)
+{
+	u64 i;
+	int r;
+
+	/*
+	 * We need to call EREMOVE explicitly, rather than sgx_free_epc_page,
+	 * in the first round, as sgx_free_page (which sgx_free_epc_page
+	 * calls) provided by the SGX driver always does EREMOVE and adds the
+	 * EPC page back to sgx_free_list if there's no error. We don't keep
+	 * SECS pages on a temporary list but rely on sgx_free_epc_page to
+	 * free all EPC pages in the second round, so just use EREMOVE in the
+	 * first round.
+	 */
+	for (i = 0; i < npages; i++) {
+		struct kvm_epc_page *gepg = epc_table + i;
+		struct sgx_epc_page *epg;
+
+		if (!gepg->epg)
+			continue;
+
+		epg = gepg->epg;
+		r = __sgx_eremove(epg);
+		if (r == SGX_CHILD_PRESENT) {
+			sgx_debug("EREMOVE SECS (0x%lx) prior to regular EPC\n",
+				(unsigned long)sgx_epc_page_to_pfn(epg));
+		}
+	}
+
+	/*
+	 * EREMOVE on invalid EPC (which has been removed from enclave) will
+	 * simply return success.
+	 */
+	for (i = 0; i < npages; i++) {
+		struct kvm_epc_page *gepg = epc_table + i;
+		struct sgx_epc_page *epg;
+
+		if (!gepg->epg)
+			continue;
+
+		epg = gepg->epg;
+		sgx_free_epc_page(epg);
+	}
+}
+
+static int kvm_init_epc(struct kvm *kvm, u64 epc_base_pfn, u64 epc_npages)
+{
+	struct kvm_sgx *sgx = to_sgx(kvm);
+	struct vm_area_struct *vma;
+	struct kvm_memory_slot *slot;
+	struct kvm_epc_page *epc_table;
+	int r;
+
+	r = x86_set_memory_region(kvm, SGX_EPC_MEMSLOT,
+			epc_base_pfn << PAGE_SHIFT, epc_npages << PAGE_SHIFT);
+	if (r) {
+		sgx_debug("x86_set_memory_region failed: %d\n", r);
+		return r;
+	}
+
+	slot = id_to_memslot(kvm_memslots(kvm), SGX_EPC_MEMSLOT);
+	BUG_ON(!slot);
+
+	epc_table = alloc_pages_exact(epc_npages * sizeof (struct kvm_epc_page),
+			GFP_KERNEL);
+	if (!epc_table) {
+		sgx_debug("unable to alloc guest EPC table.\n");
+		x86_set_memory_region(kvm, SGX_EPC_MEMSLOT, 0, 0);
+		return -ENOMEM;
+	}
+
+	kvm_init_epc_table(epc_table, epc_npages);
+
+	sgx->epc.epc_table = epc_table;
+	sgx->epc.base_gfn = slot->base_gfn;
+	sgx->epc.npages = slot->npages;
+
+	vma = find_vma_intersection(kvm->mm, slot->userspace_addr,
+			slot->userspace_addr + 1);
+	BUG_ON(!vma);
+
+	/* EPC has no 'struct page' associated */
+	vma->vm_flags |= VM_PFNMAP;
+	vma->vm_flags &= ~(VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC | VM_MAYSHARE);
+	vma->vm_ops = &kvm_epc_ops;
+	vma->vm_private_data = (void *)sgx;
+
+	return 0;
+}
+
+static void kvm_destroy_epc(struct kvm *kvm)
+{
+	struct kvm_sgx *sgx = to_sgx(kvm);
+	struct kvm_epc_page *epc_table = to_epc(sgx)->epc_table;
+	u64 npages = to_epc(sgx)->npages;
+
+	/*
+	 * See kvm_arch_destroy_vm, which is also the reason that we don't
+	 * keep slot in kvm_epc structure, as slot may already have been
+	 * destroyed during abnormal exit.
+	 */
+	if (current->mm == kvm->mm)
+		x86_set_memory_region(kvm, SGX_EPC_MEMSLOT, 0, 0);
+
+	kvm_destroy_epc_table(epc_table, npages);
+
+	free_pages_exact(epc_table, npages * sizeof (struct kvm_epc_page));
+}
+
+static int kvm_populate_epc(struct kvm *kvm, u64 epc_base_pfn,
+		u64 epc_npages)
+{
+	int i;
+
+	for (i = 0; i < epc_npages; i++) {
+		gfn_t gfn = epc_base_pfn + i;
+		/* This will trigger vma->vm_ops->fault to populate EPC */
+		kvm_pfn_t pfn = gfn_to_pfn(kvm, gfn);
+		if (is_error_pfn(pfn))
+			return -EFAULT;	/* Cannot use ENOMEM */
+	}
+	return 0;
+}
+
+/*
+ * Initialize SGX for a particular guest. This function may be called several
+ * times by the caller. If the guest's SGX has not been initialized (this
+ * function is called for the first time), we create the kvm_sgx structure
+ * and initialize it. If the guest's SGX has already been initialized, we
+ * then check whether the SGX cpuid from Qemu is consistent with the existing
+ * one. If Qemu did something wrong, by returning an error here we allow Qemu
+ * to stop creating vcpus, or just kill the guest. We also populate all EPC
+ * for the guest, as oversubscription is not supported.
+ */
+int kvm_init_sgx(struct kvm *kvm, struct sgx_cpuinfo *sgxinfo)
+{
+	struct kvm_sgx *sgx = to_sgx(kvm);
+	u64 epc_base_pfn, epc_npages;
+	int r;
+
+	if (!sgxinfo)
+		return -EINVAL;
+
+	if (sgx) {
+		/*
+		 * Already inited? We then check whether EPC base and size
+		 * equal to saved value.
+		 */
+
+		if (memcmp(&(sgx->sgxinfo), sgxinfo,
+					sizeof(struct sgx_cpuinfo))) {
+			sgx_debug("SGX CPUID inconsistency from Qemu\n");
+			return -EINVAL;
+		}
+		else
+			return 0;
+	}
+
+	epc_base_pfn = sgxinfo->epc_base  >> PAGE_SHIFT;
+	epc_npages = sgxinfo->epc_size >> PAGE_SHIFT;
+
+	sgx = kzalloc(sizeof(struct kvm_sgx), GFP_KERNEL);
+	if (!sgx) {
+		sgx_debug("out of memory\n");
+		return -ENOMEM;
+	}
+	sgx->kvm = kvm;
+	memcpy(&(sgx->sgxinfo), sgxinfo, sizeof(struct sgx_cpuinfo));
+	/* Make to_sgx(kvm) work */
+	kvm->arch.priv = sgx;
+
+	/* Init EPC for guest */
+	r = kvm_init_epc(kvm, epc_base_pfn, epc_npages);
+	if (r) {
+		sgx_debug("kvm_create_epc_slot failed.\n");
+		kfree(sgx);
+		kvm->arch.priv = NULL;
+		return r;
+	}
+
+	/* Populate all EPC pages for guest when it is created. */
+	r = kvm_populate_epc(kvm, epc_base_pfn, epc_npages);
+	if (r) {
+		sgx_debug("kvm_populate_epc failed.\n");
+		/* EPC slot will be destroyed when guest is destroyed */
+		kvm_destroy_epc(kvm);
+		kfree(sgx);
+		kvm->arch.priv = NULL;
+		return r;
+	}
+
+	return 0;
+}
+
+void kvm_destroy_sgx(struct kvm *kvm)
+{
+	struct kvm_sgx *sgx = to_sgx(kvm);
+
+	if (sgx) {
+		kvm_destroy_epc(kvm);
+		kfree(sgx);
+	}
+
+	kvm->arch.priv = NULL;
+}
+
 static void put_sgx_driver_symbols(void);
 
 static int get_sgx_driver_symbols(void)
diff --git a/arch/x86/kvm/sgx.h b/arch/x86/kvm/sgx.h
index ff2766eeae33..8a8f1235c19c 100644
--- a/arch/x86/kvm/sgx.h
+++ b/arch/x86/kvm/sgx.h
@@ -27,8 +27,79 @@
 #include <linux/bitops.h>
 #include <linux/kvm_host.h>
 #include <asm/sgx.h>
+#include <uapi/asm/sgx.h>	/* ENCLS error code */
 
 int sgx_init(void);
 void sgx_destroy(void);
 
+struct kvm_epc_page {
+	/* valid if physical EPC page is mapped to guest EPC gfn */
+	struct sgx_epc_page *epg;
+};
+
+struct kvm_epc {
+	u64 base_gfn;
+	u64 npages;
+	struct kvm_epc_page *epc_table;
+};
+
+/*
+ * SGX capability from SGX CPUID.
+ */
+struct sgx_cpuinfo {
+#define SGX_CAP_SGX1    (1UL << 0)
+#define SGX_CAP_SGX2    (1UL << 1)
+    u32 cap;
+    u32 miscselect;
+    u32 max_enclave_size64;
+    u32 max_enclave_size32;
+    u32 secs_attr_bitmask[4];
+    u64 epc_base;
+    u64 epc_size;
+};
+
+/*
+ * SGX per-VM structure
+ */
+struct kvm_sgx {
+	struct kvm *kvm;
+	struct sgx_cpuinfo sgxinfo;
+	struct kvm_epc epc;
+};
+
+#define	to_sgx(_kvm)	((struct kvm_sgx *)((_kvm)->arch.priv))
+#define	to_epc(_sgx)	((struct kvm_epc *)(&((_sgx)->epc)))
+
+static inline bool is_valid_epc_gfn(struct kvm *kvm, u64 gfn)
+{
+	struct kvm_sgx *sgx = to_sgx(kvm);
+	struct kvm_epc *epc = to_epc(sgx);
+
+	return ((gfn >= epc->base_gfn) && (gfn < epc->base_gfn + epc->npages));
+}
+
+static inline struct kvm_epc_page *gfn_to_guest_epc_page(struct kvm *kvm, u64 gfn)
+{
+	struct kvm_sgx *sgx = to_sgx(kvm);
+	struct kvm_epc *epc = to_epc(sgx);
+
+	BUG_ON(!is_valid_epc_gfn(kvm, gfn));
+
+	return epc->epc_table + (gfn - epc->base_gfn);
+}
+
+static inline u64 guest_epc_page_to_gfn(struct kvm *kvm, struct kvm_epc_page *gepg)
+{
+	struct kvm_sgx *sgx = to_sgx(kvm);
+	struct kvm_epc *epc = to_epc(sgx);
+
+	return epc->base_gfn + (gepg - epc->epc_table);
+}
+
+/* EPC slot is created by KVM as private slot. */
+#define SGX_EPC_MEMSLOT		(KVM_USER_MEM_SLOTS + 3)
+
+int kvm_init_sgx(struct kvm *kvm, struct sgx_cpuinfo *sgxinfo);
+void kvm_destroy_sgx(struct kvm *kvm);
+
 #endif
-- 
2.11.0


* [PATCH 05/10] kvm: x86: add KVM_GET_SUPPORTED_CPUID SGX support
From: Kai Huang @ 2017-05-08  5:24 UTC
  To: pbonzini, rkrcmar, kvm

This patch adds SGX CPUID support for the KVM_GET_SUPPORTED_CPUID IOCTL. We
must only expose the SGX CPUID when enable_sgx is true; enable_sgx may be
false, for example, when the user deliberately disables SGX, or when SGX
initialization fails, in which cases hardware still reports valid SGX CPUID.

As enable_sgx is not exposed to arch/x86/kvm/cpuid.c, we need to handle the
SGX related CPUID in vmx.c, for which kvm_x86_ops->set_supported_cpuid is
extended to meet SGX's needs, and do_cpuid_1_ent is also exposed to VMX.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +-
 arch/x86/kvm/cpuid.c            | 13 ++++----
 arch/x86/kvm/cpuid.h            |  2 ++
 arch/x86/kvm/svm.c              |  5 ++-
 arch/x86/kvm/vmx.c              | 71 +++++++++++++++++++++++++++++++++++++++--
 5 files changed, 83 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1d622334fc0e..d7254f36b17d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -978,7 +978,8 @@ struct kvm_x86_ops {
 
 	void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
 
-	void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry);
+	int (*set_supported_cpuid)(u32 func, u32 index,
+			struct kvm_cpuid_entry2 *entry, int *nent, int maxnent);
 
 	bool (*has_wbinvd_exit)(void);
 
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index efde6cc50875..d2c396b0b32f 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -274,7 +274,7 @@ static void cpuid_mask(u32 *word, int wordnum)
 	*word &= boot_cpu_data.x86_capability[wordnum];
 }
 
-static void do_cpuid_1_ent(struct kvm_cpuid_entry2 *entry, u32 function,
+void do_cpuid_1_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			   u32 index)
 {
 	entry->function = function;
@@ -283,6 +283,7 @@ static void do_cpuid_1_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		    &entry->eax, &entry->ebx, &entry->ecx, &entry->edx);
 	entry->flags = 0;
 }
+EXPORT_SYMBOL_GPL(do_cpuid_1_ent);
 
 static int __do_cpuid_ent_emulated(struct kvm_cpuid_entry2 *entry,
 				   u32 func, u32 index, int *nent, int maxnent)
@@ -402,7 +403,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	switch (function) {
 	case 0:
-		entry->eax = min(entry->eax, (u32)0xd);
+		entry->eax = min(entry->eax, (u32)0x12);
 		break;
 	case 1:
 		entry->edx &= kvm_cpuid_1_edx_x86_features;
@@ -573,6 +574,9 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		}
 		break;
 	}
+	case 0x12:
+		  /* Intel SGX CPUID. Passthrough to VMX to handle. */
+		  break;
 	case KVM_CPUID_SIGNATURE: {
 		static const char signature[12] = "KVMKVMKVM\0\0";
 		const u32 *sigptr = (const u32 *)signature;
@@ -651,10 +655,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		break;
 	}
 
-	kvm_x86_ops->set_supported_cpuid(function, entry);
-
-	r = 0;
-
+	r = kvm_x86_ops->set_supported_cpuid(function, index, entry, nent, maxnent);
 out:
 	put_cpu();
 
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 35058c2c0eea..de658f4fa1c6 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -6,6 +6,8 @@
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu);
 bool kvm_mpx_supported(void);
+void do_cpuid_1_ent(struct kvm_cpuid_entry2 *entry, u32 function,
+			   u32 index);
 struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
 					      u32 function, u32 index);
 int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5fba70646c32..678b30d2a188 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4988,7 +4988,8 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
 		entry->ecx &= ~bit(X86_FEATURE_X2APIC);
 }
 
-static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
+static int svm_set_supported_cpuid(u32 func, u32 index,
+		struct kvm_cpuid_entry2 *entry, int *nent, int maxnent)
 {
 	switch (func) {
 	case 0x1:
@@ -5017,6 +5018,8 @@ static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
 
 		break;
 	}
+
+	return 0;
 }
 
 static int svm_get_lpage_level(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4b368a0af9bd..31de95986dbd 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9493,10 +9493,75 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 		nested_vmx_cr_fixed1_bits_update(vcpu);
 }
 
-static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
+static int vmx_set_supported_cpuid(u32 func, u32 index,
+		struct kvm_cpuid_entry2 *entry, int *nent, int maxnent)
 {
-	if (func == 1 && nested)
-		entry->ecx |= bit(X86_FEATURE_VMX);
+	int r = -E2BIG;
+
+	switch (func) {
+	case 0x1:
+		if (nested)
+			entry->ecx |= bit(X86_FEATURE_VMX);
+		break;
+	case 0x7:
+		if (index == 0 && enable_sgx) {
+			entry->ebx |= bit(X86_FEATURE_SGX);
+			if (boot_cpu_has(X86_FEATURE_SGX_LAUNCH_CONTROL))
+				entry->ecx |=
+					bit(X86_FEATURE_SGX_LAUNCH_CONTROL);
+		}
+		break;
+	case 0x12: {
+		WARN_ON(index != 0);
+
+		if (enable_sgx) {
+			if (*nent >= maxnent)
+				goto out;
+
+			/* do_cpuid_1_ent has already been called for index 0 */
+			entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+
+			/* Index 1: SECS.ATTRIBUTES */
+			do_cpuid_1_ent(++entry, 0x12, 0x1);
+			entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+			++*nent;
+
+			if (*nent >= maxnent)
+				goto out;
+
+			/*
+			 * Index 2: EPC section
+			 *
+			 * Note: We only report one EPC section as userspace
+			 * doesn't need to know physical EPC info. In fact,
+			 * KVM_SET_CPUID2 should contain guest's virtual EPC
+			 * base & size, in which case one virtual EPC section
+			 * is obviously enough for guest.
+			 */
+			do_cpuid_1_ent(++entry, 0x12, 0x2);
+			entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+			/*
+			 * Don't report physical EPC info as userspace doesn't
+			 * need to know.
+			 */
+			entry->eax &= 0xf;
+			entry->ebx = 0;
+			entry->ecx &= 0xf;
+			entry->edx = 0;
+			++*nent;
+		}
+		else
+			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
+
+		break;
+	}
+	default:
+		break;
+	}
+
+	r = 0;
+out:
+	return r;
 }
 
 static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
-- 
2.11.0


* [PATCH 06/10] kvm: x86: add KVM_SET_CPUID2 SGX support
From: Kai Huang @ 2017-05-08  5:24 UTC
  To: pbonzini, rkrcmar, kvm

This patch adds SGX CPUID support for KVM_SET_CPUID2. Besides setting up
guest's SGX CPUID, guest's SGX is initialized in KVM_SET_CPUID2 as well.
This is because, to avoid adding a new IOCTL to set guest's EPC base & size,
we need to get such info from KVM_SET_CPUID2 (where userspace will set up
guest's EPC base & size), and guest's SGX can only be initialized after KVM
knows such info.

Initializing guest's SGX may fail, so kvm_x86_ops->cpuid_update is changed
to return an integer to reflect whether guest's SGX was initialized
successfully or not. kvm_update_cpuid is also moved to be called before
kvm_x86_ops->cpuid_update, as guest's SGX CPUID.0x12.1 depends on
vcpu->arch.guest_supported_xcr0 (which is set in kvm_update_cpuid).

Also a new kvm_x86_ops->vm_destroy is added for VMX, in which guest's SGX is
destroyed.
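
For illustration, below is a minimal userspace-side sketch (not part of this
patch; the helper and struct names are made up) of how the guest's virtual
EPC base & size could be encoded into CPUID.0x12 index 2. The field layout
mirrors the decoding done in vmx_cpuid_get_sgx_cpuinfo() in this patch:

#include <stdint.h>

struct cpuid_leaf {
	uint32_t eax, ebx, ecx, edx;
};

static void encode_virt_epc_leaf(struct cpuid_leaf *leaf,
				 uint64_t epc_base, uint64_t epc_size)
{
	/* EAX[3:0] = 1: this sub-leaf describes a valid EPC section */
	leaf->eax = 0x1 | (uint32_t)(epc_base & 0xfffff000);
	/* EBX[19:0]: bits 51:32 of the EPC base */
	leaf->ebx = (uint32_t)((epc_base >> 32) & 0xfffff);
	/* ECX[3:0] = 1: section properties; ECX[31:12]: size bits 31:12 */
	leaf->ecx = 0x1 | (uint32_t)(epc_size & 0xfffff000);
	/* EDX[19:0]: bits 51:32 of the EPC size */
	leaf->edx = (uint32_t)((epc_size >> 32) & 0xfffff);
}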

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |   2 +-
 arch/x86/kvm/cpuid.c            |   8 ++-
 arch/x86/kvm/cpuid.h            |  20 +++++++
 arch/x86/kvm/svm.c              |   6 ++-
 arch/x86/kvm/vmx.c              | 113 +++++++++++++++++++++++++++++++++++++++-
 5 files changed, 143 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d7254f36b17d..38cbb1eb652f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -894,7 +894,7 @@ struct kvm_x86_ops {
 	void (*hardware_unsetup)(void);            /* __exit */
 	bool (*cpu_has_accelerated_tpr)(void);
 	bool (*cpu_has_high_real_mode_segbase)(void);
-	void (*cpuid_update)(struct kvm_vcpu *vcpu);
+	int (*cpuid_update)(struct kvm_vcpu *vcpu);
 
 	int (*vm_init)(struct kvm *kvm);
 	void (*vm_destroy)(struct kvm *kvm);
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index d2c396b0b32f..11a13afef373 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -220,8 +220,10 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
 	vcpu->arch.cpuid_nent = cpuid->nent;
 	cpuid_fix_nx_cap(vcpu);
 	kvm_apic_set_version(vcpu);
-	kvm_x86_ops->cpuid_update(vcpu);
 	r = kvm_update_cpuid(vcpu);
+	if (r)
+		goto out;
+	r = kvm_x86_ops->cpuid_update(vcpu);
 
 out:
 	vfree(cpuid_entries);
@@ -243,8 +245,10 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
 		goto out;
 	vcpu->arch.cpuid_nent = cpuid->nent;
 	kvm_apic_set_version(vcpu);
-	kvm_x86_ops->cpuid_update(vcpu);
 	r = kvm_update_cpuid(vcpu);
+	if (r)
+		goto out;
+	r = kvm_x86_ops->cpuid_update(vcpu);
 out:
 	return r;
 }
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index de658f4fa1c6..7d10f2884779 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -155,6 +155,26 @@ static inline bool guest_cpuid_has_rdtscp(struct kvm_vcpu *vcpu)
 }
 
 /*
+ * Only checks CPUID.0x7.0x0:EBX.SGX. SDM says if this bit is 1, the logical
+ * processor supports SGX and the SGX CPUID leaf 0x12.
+ */
+static inline bool guest_cpuid_has_sgx(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry(vcpu, 0x7, 0);
+	return best && (best->ebx & bit(X86_FEATURE_SGX));
+}
+
+static inline bool guest_cpuid_has_sgx_launch_control(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry(vcpu, 0x7, 0);
+	return best && (best->ecx & bit(X86_FEATURE_SGX_LAUNCH_CONTROL));
+}
+
+/*
  * NRIPS is provided through cpuidfn 0x8000000a.edx bit 3
  */
 #define BIT_NRIPS	3
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 678b30d2a188..d5a5410ba623 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4972,7 +4972,7 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 	return 0;
 }
 
-static void svm_cpuid_update(struct kvm_vcpu *vcpu)
+static int svm_cpuid_update(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct kvm_cpuid_entry2 *entry;
@@ -4981,11 +4981,13 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
 	svm->nrips_enabled = !!guest_cpuid_has_nrips(&svm->vcpu);
 
 	if (!kvm_vcpu_apicv_active(vcpu))
-		return;
+		return 0;
 
 	entry = kvm_find_cpuid_entry(vcpu, 1, 0);
 	if (entry)
 		entry->ecx &= ~bit(X86_FEATURE_X2APIC);
+
+	return 0;
 }
 
 static int svm_set_supported_cpuid(u32 func, u32 index,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 31de95986dbd..3c1cc94e7e6d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9447,11 +9447,109 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
 #undef cr4_fixed1_update
 }
 
-static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
+/* This should be called after vcpu's SGX CPUID has been properly set up */
+static void vmx_cpuid_get_sgx_cpuinfo(struct kvm_vcpu *vcpu, struct
+		sgx_cpuinfo *sgxinfo)
+{
+	struct kvm_cpuid_entry2 *best;
+	u64 base, size;
+
+	BUG_ON(!sgxinfo);
+
+	/* See comments in detect_sgx... */
+	memset(sgxinfo, 0, sizeof(struct sgx_cpuinfo));
+
+	best = kvm_find_cpuid_entry(vcpu, 0x12, 0);
+	if (!best)
+		goto not_supported;
+	if (!(best->eax & SGX_CAP_SGX1))
+		goto not_supported;
+
+	sgxinfo->cap = best->eax;
+	sgxinfo->miscselect = best->ebx;
+	sgxinfo->max_enclave_size32 = best->edx & 0xff;
+	sgxinfo->max_enclave_size64 = (best->edx & 0xff00) >> 8;
+
+	best = kvm_find_cpuid_entry(vcpu, 0x12, 1);
+	if (!best)
+		goto not_supported;
+
+	sgxinfo->secs_attr_bitmask[0] = best->eax;
+	sgxinfo->secs_attr_bitmask[1] = best->ebx;
+	sgxinfo->secs_attr_bitmask[2] = best->ecx;
+	sgxinfo->secs_attr_bitmask[3] = best->edx;
+
+	best = kvm_find_cpuid_entry(vcpu, 0x12, 2);
+	if (!best)
+		goto not_supported;
+	if (!(best->eax & 0x1) || !(best->ecx & 0x1))
+		goto not_supported;
+
+	base = (((u64)(best->ebx & 0xfffff)) << 32) | (best->eax & 0xfffff000);
+	size = (((u64)(best->edx & 0xfffff)) << 32) | (best->ecx & 0xfffff000);
+	if (!base || !size)
+		goto not_supported;
+
+	sgxinfo->epc_base = base;
+	sgxinfo->epc_size = size;
+
+	return;
+
+not_supported:
+	memset(sgxinfo, 0, sizeof(struct sgx_cpuinfo));
+}
+
+static int vmx_cpuid_update_sgx(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+	struct sgx_cpuinfo si;
+	int r;
+
+	/* Nothing to check if SGX is not enabled for guest */
+	if (!guest_cpuid_has_sgx(vcpu))
+		return 0;
+
+	/*
+	 * Update CPUID.0x12.0x1 according to vcpu->arch.guest_supported_xcr0,
+	 * which is calculated in kvm_update_cpuid. This is the reason we
+	 * change the order of kvm_x86_ops->cpuid_update and kvm_update_cpuid.
+	 */
+	best = kvm_find_cpuid_entry(vcpu, 0x12, 0x1);
+	if (!best)
+		return -EFAULT;
+	best->ecx &= (unsigned int)(vcpu->arch.guest_supported_xcr0 & 0xffffffff);
+	best->ecx |= 0x3;
+	best->edx &= (unsigned int)(vcpu->arch.guest_supported_xcr0 >> 32);
+
+	/*
+	 * Make sure all SGX CPUIDs are properly set in KVM_SET_CPUID2 from
+	 * userspace. vmx_cpuid_get_sgx_cpuinfo will report invalid SGX if
+	 * any SGX CPUID is not properly setup in KVM_SET_CPUID2.
+	 */
+	vmx_cpuid_get_sgx_cpuinfo(vcpu, &si);
+	if (!(si.cap & SGX_CAP_SGX1))
+		return -EFAULT;
+
+	/*
+	 * Initialize guest's SGX stuff here. To avoid adding a new IOCTL
+	 * for userspace to pass in guest's (virtual) EPC base and size, KVM
+	 * gets such info from KVM_SET_CPUID2. Initializing guest's SGX here
+	 * also provides a way for KVM to check whether userspace did
+	 * everything right about the SGX CPUID (e.g., inconsistent SGX
+	 * CPUID between vcpus being passed to KVM). In case of any error,
+	 * we return an error to reflect that userspace got the CPUID wrong.
+	 */
+	r = kvm_init_sgx(vcpu->kvm, &si);
+	if (r)
+		return r;
+
+	return 0;
+}
+
+static int vmx_cpuid_update(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 secondary_exec_ctl = vmx_secondary_exec_control(vmx);
+	int r = 0;
 
 	if (vmx_rdtscp_supported()) {
 		bool rdtscp_enabled = guest_cpuid_has_rdtscp(vcpu);
@@ -9491,6 +9589,12 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 
 	if (nested_vmx_allowed(vcpu))
 		nested_vmx_cr_fixed1_bits_update(vcpu);
+
+	r = vmx_cpuid_update_sgx(vcpu);
+	if (r)
+		return r;
+
+	return 0;
 }
 
 static int vmx_set_supported_cpuid(u32 func, u32 index,
@@ -11589,6 +11693,11 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 			~FEATURE_CONTROL_LMCE;
 }
 
+static void vmx_vm_destroy(struct kvm *kvm)
+{
+	kvm_destroy_sgx(kvm);
+}
+
 static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.cpu_has_kvm_support = cpu_has_kvm_support,
 	.disabled_by_bios = vmx_disabled_by_bios,
@@ -11600,6 +11709,8 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.cpu_has_accelerated_tpr = report_flexpriority,
 	.cpu_has_high_real_mode_segbase = vmx_has_high_real_mode_segbase,
 
+	.vm_destroy = vmx_vm_destroy,
+
 	.vcpu_create = vmx_create_vcpu,
 	.vcpu_free = vmx_free_vcpu,
 	.vcpu_reset = vmx_vcpu_reset,
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 07/10] kvm: vmx: add SGX IA32_FEATURE_CONTROL MSR emulation
  2017-05-08  5:24 [RFC PATCH 00/10] Basic KVM SGX Virtualization support Kai Huang
                   ` (5 preceding siblings ...)
  2017-05-08  5:24 ` [PATCH 06/10] kvm: x86: add KVM_SET_CPUID2 " Kai Huang
@ 2017-05-08  5:24 ` Kai Huang
  2017-05-08  5:24 ` [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support Kai Huang
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 78+ messages in thread
From: Kai Huang @ 2017-05-08  5:24 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm

If CPUID.0x7.0:EBX.SGX == 1, IA32_FEATURE_CONTROL bit 18 (SGX enable) is valid.
If CPUID.0x7.0:ECX[bit30] = 1, IA32_FEATURE_CONTROL bit 17 (SGX Launch Control)
is valid. This patch emulates the two new bits of the IA32_FEATURE_CONTROL MSR.
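
For reference, a guest-side sketch (not part of this patch) of the check a
guest kernel is expected to perform before using SGX, using the
FEATURE_CONTROL_* defines added below:

static bool sgx_enabled_in_bios(void)
{
	u64 fc;

	rdmsrl(MSR_IA32_FEATURE_CONTROL, fc);

	/* Both the lock bit and the SGX enable bit must be set */
	return (fc & (FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_SGX_ENABLE))
	       == (FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_SGX_ENABLE);
}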

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 arch/x86/include/asm/msr-index.h |  2 ++
 arch/x86/kvm/vmx.c               | 28 ++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index d8b5f8ab8ef9..e3770f570bb9 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -422,6 +422,8 @@
 #define FEATURE_CONTROL_LOCKED				(1<<0)
 #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1<<1)
 #define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX	(1<<2)
+#define FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE	(1<<17)
+#define FEATURE_CONTROL_SGX_ENABLE			(1<<18)
 #define FEATURE_CONTROL_LMCE				(1<<20)
 
 #define MSR_IA32_APICBASE		0x0000001b
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3c1cc94e7e6d..a16539594a99 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3329,6 +3329,20 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vmx->msr_ia32_feature_control = data;
 		if (msr_info->host_initiated && data == 0)
 			vmx_leave_nested(vcpu);
+		/*
+		 * If guest's FEATURE_CONTROL_SGX_ENABLE is disabled, shall
+		 * we also clear vcpu's SGX CPUID? SDM (chapter 37.7.7.1)
+		 * says FEATURE_CONTROL_SGX_ENABLE bit doesn't reflect SGX
+		 * CPUID but in reality seems if FEATURE_CONTROL_SGX_ENABLE
+		 * is disabled, SGX CPUID will reports (at least) invalid EPC.
+		 * But looks we cannot just simply clear vcpu's SGX CPUID,
+		 * as Qemu may write IA32_FEATURE_CONTROL *before* or *after*
+		 * KVM_SET_CPUID2. If KVM_SET_CPUID2 is called first, and we
+		 * clear vcpu's SGX CPUID here, we will not be able to enable
+		 * SGX again as SGX CPUID info has already lost. Therefore do
+		 * nothing here. We assume guest will always check whether
+		 * SGX has been enabled in BIOS before using SGX.
+		 */
 		break;
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
 		if (!msr_info->host_initiated)
@@ -9594,6 +9608,20 @@ static int vmx_cpuid_update(struct kvm_vcpu *vcpu)
 	if (r)
 		return r;
 
+	if (guest_cpuid_has_sgx(vcpu)) {
+		/*
+		 * If CPUID.0x7.0:EBX.SGX = 1, SGX can be opted in/out in BIOS
+		 * via IA32_FEATURE_CONTROL bit 18. If CPUID.0x7.0:ECX[bit30]
+		 * = 1, IA32_FEATURE_CONTROL bit 17 is valid to enable runtime
+		 * SGX Launch Control.
+		 */
+		to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |=
+			FEATURE_CONTROL_SGX_ENABLE;
+		if (guest_cpuid_has_sgx_launch_control(vcpu))
+			to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |=
+				FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE;
+	}
+
 	return 0;
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-08  5:24 [RFC PATCH 00/10] Basic KVM SGX Virtualization support Kai Huang
                   ` (6 preceding siblings ...)
  2017-05-08  5:24 ` [PATCH 07/10] kvm: vmx: add SGX IA32_FEATURE_CONTROL MSR emulation Kai Huang
@ 2017-05-08  5:24 ` Kai Huang
  2017-05-12  0:32   ` Huang, Kai
  2017-05-08  5:24 ` [PATCH 09/10] kvm: vmx: handle ENCLS VMEXIT Kai Huang
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 78+ messages in thread
From: Kai Huang @ 2017-05-08  5:24 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm

If SGX runtime launch control is enabled on the host (IA32_FEATURE_CONTROL[17]
is set), KVM can support running multiple guests, each running an LE signed
with a different RSA pubkey. KVM traps IA32_SGXLEPUBKEYHASHn MSR writes from
the guest and keeps the values internally in the vcpu; when the vcpu is
scheduled in, KVM writes those values to the real IA32_SGXLEPUBKEYHASHn MSRs.
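
For illustration, userspace could set the guest's hash MSRs with a
host-initiated KVM_SET_MSRS call along these lines (hypothetical helper;
vcpu_fd is assumed to be an open KVM vcpu file descriptor):

#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

static int set_guest_le_hash(int vcpu_fd, const uint64_t hash[4])
{
	struct {
		struct kvm_msrs header;
		struct kvm_msr_entry entries[4];
	} msrs;
	int i;

	memset(&msrs, 0, sizeof(msrs));
	msrs.header.nmsrs = 4;
	for (i = 0; i < 4; i++) {
		/* MSR_IA32_SGXLEPUBKEYHASH0..3 are 0x8c..0x8f */
		msrs.entries[i].index = 0x8c + i;
		msrs.entries[i].data = hash[i];
	}

	/* Returns the number of MSRs actually set */
	return ioctl(vcpu_fd, KVM_SET_MSRS, &msrs);
}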

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 arch/x86/include/asm/msr-index.h |   5 ++
 arch/x86/kvm/vmx.c               | 123 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 128 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e3770f570bb9..70482b951b0f 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -417,6 +417,11 @@
 #define MSR_IA32_TSC_ADJUST             0x0000003b
 #define MSR_IA32_BNDCFGS		0x00000d90
 
+#define MSR_IA32_SGXLEPUBKEYHASH0	0x0000008c
+#define MSR_IA32_SGXLEPUBKEYHASH1	0x0000008d
+#define MSR_IA32_SGXLEPUBKEYHASH2	0x0000008e
+#define MSR_IA32_SGXLEPUBKEYHASH3	0x0000008f
+
 #define MSR_IA32_XSS			0x00000da0
 
 #define FEATURE_CONTROL_LOCKED				(1<<0)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a16539594a99..c96332b9dd44 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -656,6 +656,9 @@ struct vcpu_vmx {
 	 */
 	u64 msr_ia32_feature_control;
 	u64 msr_ia32_feature_control_valid_bits;
+
+	/* SGX Launch Control public key hash */
+	u64 msr_ia32_sgxlepubkeyhash[4];
 };
 
 enum segment_cache_field {
@@ -2244,6 +2247,70 @@ static void decache_tsc_multiplier(struct vcpu_vmx *vmx)
 	vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
 }
 
+static bool cpu_sgx_lepubkeyhash_writable(void)
+{
+	u64 val, sgx_lc_enabled_mask = (FEATURE_CONTROL_LOCKED |
+			FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE);
+
+	rdmsrl(MSR_IA32_FEATURE_CONTROL, val);
+
+	return ((val & sgx_lc_enabled_mask) == sgx_lc_enabled_mask);
+}
+
+static bool vmx_sgx_lc_disabled_in_bios(struct kvm_vcpu *vcpu)
+{
+	return (to_vmx(vcpu)->msr_ia32_feature_control & FEATURE_CONTROL_LOCKED)
+		&& (!(to_vmx(vcpu)->msr_ia32_feature_control &
+				FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE));
+}
+
+#define	SGX_INTEL_DEFAULT_LEPUBKEYHASH0		0xa6053e051270b7ac
+#define	SGX_INTEL_DEFAULT_LEPUBKEYHASH1	        0x6cfbe8ba8b3b413d
+#define	SGX_INTEL_DEFAULT_LEPUBKEYHASH2		0xc4916d99f2b3735d
+#define	SGX_INTEL_DEFAULT_LEPUBKEYHASH3		0xd4f8c05909f9bb3b
+
+static void vmx_sgx_init_lepubkeyhash(struct kvm_vcpu *vcpu)
+{
+	u64 h0, h1, h2, h3;
+
+	/*
+	 * If runtime launch control is enabled (IA32_SGXLEPUBKEYHASHn is
+	 * writable), we set guest's default value to be Intel's default
+	 * hash (which is a fixed value and can be hard-coded). Otherwise,
+	 * guest can only use machine's IA32_SGXLEPUBKEYHASHn so set guest's
+	 * default to that.
+	 */
+	if (cpu_sgx_lepubkeyhash_writable()) {
+		h0 = SGX_INTEL_DEFAULT_LEPUBKEYHASH0;
+		h1 = SGX_INTEL_DEFAULT_LEPUBKEYHASH1;
+		h2 = SGX_INTEL_DEFAULT_LEPUBKEYHASH2;
+		h3 = SGX_INTEL_DEFAULT_LEPUBKEYHASH3;
+	}
+	else {
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH0, h0);
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH1, h1);
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH2, h2);
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH3, h3);
+	}
+
+	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[0] = h0;
+	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[1] = h1;
+	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[2] = h2;
+	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[3] = h3;
+}
+
+static void vmx_sgx_lepubkeyhash_load(struct kvm_vcpu *vcpu)
+{
+	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0,
+			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[0]);
+	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH1,
+			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[1]);
+	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH2,
+			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[2]);
+	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH3,
+			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[3]);
+}
+
 /*
  * Switches to specified vcpu, until a matching vcpu_put(), but assumes
  * vcpu mutex is already taken.
@@ -2316,6 +2383,14 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 	vmx_vcpu_pi_load(vcpu, cpu);
 	vmx->host_pkru = read_pkru();
+
+	/*
+	 * Load guest's SGX LE pubkey hash if runtime launch control is
+	 * enabled.
+	 */
+	if (guest_cpuid_has_sgx_launch_control(vcpu) &&
+			cpu_sgx_lepubkeyhash_writable())
+		vmx_sgx_lepubkeyhash_load(vcpu);
 }
 
 static void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
@@ -3225,6 +3300,19 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_FEATURE_CONTROL:
 		msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control;
 		break;
+	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
+		/*
+		 * SDM 35.1 Model-Specific Registers, table 35-2.
+		 * Read permitted if CPUID.0x12.0:EAX[0] = 1. (We have
+		 * guaranteed this will be true if guest_cpuid_has_sgx
+		 * is true.)
+		 */
+		if (!guest_cpuid_has_sgx(vcpu))
+			return 1;
+		msr_info->data =
+			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[msr_info->index -
+			MSR_IA32_SGXLEPUBKEYHASH0];
+		break;
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
 		if (!nested_vmx_allowed(vcpu))
 			return 1;
@@ -3344,6 +3432,37 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 * SGX has been enabled in BIOS before using SGX.
 		 */
 		break;
+	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
+		/*
+		 * SDM 35.1 Model-Specific Registers, table 35-2.
+		 * - If CPUID.0x7.0:ECX[30] = 1, FEATURE_CONTROL[17] is
+		 * available.
+		 * - Write permitted if CPUID.0x12.0:EAX[0] = 1 &&
+		 * FEATURE_CONTROL[17] = 1 && FEATURE_CONTROL[0] = 1.
+		 */
+		if (!guest_cpuid_has_sgx(vcpu) ||
+				!guest_cpuid_has_sgx_launch_control(vcpu))
+			return 1;
+		/*
+		 * Don't let userspace set guest's IA32_SGXLEPUBKEYHASHn,
+		 * if machine's IA32_SGXLEPUBKEYHASHn cannot be changed at
+		 * runtime. Note to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash are
+		 * set to default in vmx_create_vcpu, therefore the guest is
+		 * still able to read the machine's IA32_SGXLEPUBKEYHASHn via
+		 * rdmsr.
+		 */
+		if (!cpu_sgx_lepubkeyhash_writable())
+			return 1;
+		/*
+		 * If guest's FEATURE_CONTROL[17] is not set, guest's
+		 * IA32_SGXLEPUBKEYHASHn are not writable from the guest.
+		 */
+		if (vmx_sgx_lc_disabled_in_bios(vcpu) &&
+				!msr_info->host_initiated)
+			return 1;
+		to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[msr_index -
+			MSR_IA32_SGXLEPUBKEYHASH0] = data;
+		break;
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
 		if (!msr_info->host_initiated)
 			return 1; /* they are read-only */
@@ -9305,6 +9424,10 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 		vmx->nested.vpid02 = allocate_vpid();
 	}
 
+	/* Set vcpu's default IA32_SGXLEPUBKEYHASHn */
+	if (enable_sgx && boot_cpu_has(X86_FEATURE_SGX_LAUNCH_CONTROL))
+		vmx_sgx_init_lepubkeyhash(&vmx->vcpu);
+
 	vmx->nested.posted_intr_nv = -1;
 	vmx->nested.current_vmptr = -1ull;
 	vmx->nested.current_vmcs12 = NULL;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 09/10] kvm: vmx: handle ENCLS VMEXIT
  2017-05-08  5:24 [RFC PATCH 00/10] Basic KVM SGX Virtualization support Kai Huang
                   ` (7 preceding siblings ...)
  2017-05-08  5:24 ` [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support Kai Huang
@ 2017-05-08  5:24 ` Kai Huang
  2017-05-08  8:08   ` Paolo Bonzini
  2017-05-08  5:24 ` [PATCH 10/10] kvm: vmx: handle VMEXIT from SGX Enclave Kai Huang
  2017-05-08  5:24 ` [PATCH 11/11] kvm: vmx: workaround FEATURE_CONTROL[17] is not set by BIOS Kai Huang
  10 siblings, 1 reply; 78+ messages in thread
From: Kai Huang @ 2017-05-08  5:24 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm

This patch handles ENCLS VMEXIT. ENCLS VMEXIT doesn't always need to be
turned on; in fact it should not be turned on in most cases, as the guest can
run ENCLS perfectly well in non-root mode. However there are some cases where
we need to trap ENCLS and emulate it, as in those cases ENCLS in the guest
may behave differently than on native hardware (for example, when hardware
supports SGX but SGX is not exposed to the guest, and the guest runs ENCLS
deliberately anyway).

For nested SGX support, we need to turn on ENCLS VMEXIT if the L1 hypervisor
has turned it on, and such an ENCLS VMEXIT from L2 (the nested guest) will be
handled by the L1 hypervisor.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 arch/x86/include/asm/vmx.h      |   2 +
 arch/x86/include/uapi/asm/vmx.h |   4 +-
 arch/x86/kvm/vmx.c              | 265 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 270 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index f7ac249ce83d..2f24290b7f9d 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -202,6 +202,8 @@ enum vmcs_field {
 	XSS_EXIT_BITMAP_HIGH            = 0x0000202D,
 	TSC_MULTIPLIER                  = 0x00002032,
 	TSC_MULTIPLIER_HIGH             = 0x00002033,
+	ENCLS_EXITING_BITMAP		= 0x0000202E,
+	ENCLS_EXITING_BITMAP_HIGH	= 0x0000202F,
 	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
 	GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
 	VMCS_LINK_POINTER               = 0x00002800,
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index 14458658e988..2bcd967d5c83 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -77,6 +77,7 @@
 #define EXIT_REASON_XSETBV              55
 #define EXIT_REASON_APIC_WRITE          56
 #define EXIT_REASON_INVPCID             58
+#define EXIT_REASON_ENCLS		60
 #define EXIT_REASON_PML_FULL            62
 #define EXIT_REASON_XSAVES              63
 #define EXIT_REASON_XRSTORS             64
@@ -130,7 +131,8 @@
 	{ EXIT_REASON_INVVPID,               "INVVPID" }, \
 	{ EXIT_REASON_INVPCID,               "INVPCID" }, \
 	{ EXIT_REASON_XSAVES,                "XSAVES" }, \
-	{ EXIT_REASON_XRSTORS,               "XRSTORS" }
+	{ EXIT_REASON_XRSTORS,               "XRSTORS" }, \
+	{ EXIT_REASON_ENCLS,		     "ENCLS" }
 
 #define VMX_ABORT_SAVE_GUEST_MSR_FAIL        1
 #define VMX_ABORT_LOAD_HOST_PDPTE_FAIL       2
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c96332b9dd44..b5f37982e975 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -254,6 +254,7 @@ struct __packed vmcs12 {
 	u64 eoi_exit_bitmap2;
 	u64 eoi_exit_bitmap3;
 	u64 xss_exit_bitmap;
+	u64 encls_exiting_bitmap;
 	u64 guest_physical_address;
 	u64 vmcs_link_pointer;
 	u64 guest_ia32_debugctl;
@@ -780,6 +781,7 @@ static const unsigned short vmcs_field_to_offset_table[] = {
 	FIELD64(EOI_EXIT_BITMAP2, eoi_exit_bitmap2),
 	FIELD64(EOI_EXIT_BITMAP3, eoi_exit_bitmap3),
 	FIELD64(XSS_EXIT_BITMAP, xss_exit_bitmap),
+	FIELD64(ENCLS_EXITING_BITMAP, encls_exiting_bitmap),
 	FIELD64(GUEST_PHYSICAL_ADDRESS, guest_physical_address),
 	FIELD64(VMCS_LINK_POINTER, vmcs_link_pointer),
 	FIELD64(GUEST_IA32_DEBUGCTL, guest_ia32_debugctl),
@@ -1402,6 +1404,11 @@ static inline bool nested_cpu_has_posted_intr(struct vmcs12 *vmcs12)
 	return vmcs12->pin_based_vm_exec_control & PIN_BASED_POSTED_INTR;
 }
 
+static inline bool nested_cpu_has_encls_exit(struct vmcs12 *vmcs12)
+{
+	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENCLS_EXITING);
+}
+
 static inline bool is_nmi(u32 intr_info)
 {
 	return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
@@ -2312,6 +2319,128 @@ static void vmx_sgx_lepubkeyhash_load(struct kvm_vcpu *vcpu)
 }
 
 /*
+ * Setup ENCLS VMEXIT on current VMCS according to encls_vmexit_bitmap.
+ * If encls_vmexit_bitmap is 0, we also disable ENCLS VMEXIT in secondary
+ * execution control. Otherwise we enable ENCLS VMEXIT.
+ *
+ * Must be called after vcpu is loaded.
+ */
+static void vmx_set_encls_vmexit_bitmap(struct kvm_vcpu *vcpu, u64
+		encls_vmexit_bitmap)
+{
+	u32 secondary_exec_ctl = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
+
+	if (encls_vmexit_bitmap)
+		secondary_exec_ctl |= SECONDARY_EXEC_ENCLS_EXITING;
+	else
+		secondary_exec_ctl &= ~SECONDARY_EXEC_ENCLS_EXITING;
+
+	vmcs_write64(ENCLS_EXITING_BITMAP, encls_vmexit_bitmap);
+	vmcs_write32(SECONDARY_VM_EXEC_CONTROL, secondary_exec_ctl);
+}
+
+static void vmx_enable_encls_vmexit_all(struct kvm_vcpu *vcpu)
+{
+	vmx_set_encls_vmexit_bitmap(vcpu, -1ULL);
+}
+
+/* Disable ENCLS VMEXIT on current VMCS. Must be called after vcpu is loaded. */
+static void vmx_disable_encls_vmexit(struct kvm_vcpu *vcpu)
+{
+	vmx_set_encls_vmexit_bitmap(vcpu, 0);
+}
+
+static bool vmx_sgx_enabled_in_bios(struct kvm_vcpu *vcpu)
+{
+	u32 sgx_opted_in = FEATURE_CONTROL_SGX_ENABLE | FEATURE_CONTROL_LOCKED;
+
+	return (to_vmx(vcpu)->msr_ia32_feature_control & sgx_opted_in) ==
+		sgx_opted_in;
+}
+
+static void vmx_update_encls_vmexit(struct kvm_vcpu *vcpu)
+{
+	/* Hardware doesn't support SGX */
+	if (!cpu_has_vmx_encls_vmexit())
+		return;
+
+	/*
+	 * ENCLS error check sequence:
+	 *
+	 * 1) IF CR0.PE = 0 (real mode), or RFLAGS.VM = 1 (virtual-8086 mode),
+	 *    or SMM mode, or CPUID.0x12.0x0:EAX.SGX1 = 0
+	 *	#UD
+	 *
+	 * 2) IF CPL > 0
+	 *	#UD
+	 *
+	 * 3) VMEXIT if enabled
+	 *
+	 * 4) IA32_FEATURE_CONTROL.LOCK, or IA32_FEATURE_CONTROL.SGX_ENABLE = 0
+	 *	#GP
+	 *
+	 * 5) IF RAX = invalid leaf function
+	 *	#GP
+	 *
+	 * 6) IF CR0.PG = 0 (paging disabled)
+	 *	#GP
+	 *
+	 * 7) IF not in 64-bit mode, and DS.type is expand-down data
+	 *	#GP
+	 *
+	 *    Note: non 64-bit mode (32-bit mode) means:
+	 *	- protected mode
+	 *	- IA32e mode's compatibility mode (IA32_EFER.LMA = 1, CS.L = 1)
+	 *
+	 *    Currently KVM doesn't do anything in terms of compatibility mode
+	 *    (SECONDARY_VM_EXEC_CONTROL[bit 2] (descriptor-table exiting) is not
+	 *    enabled, so KVM won't trap any segment register operation in
+	 *    guest). We don't have to trap ENCLS for compatibility mode as
+	 *    ENCLS will behave just the same in the guest.
+	 *
+	 * So, to correctly emulate ENCLS, below ENCLS VMEXIT policy is applied:
+	 *
+	 * - For 1), in real mode, SMM mode, no need to trap ENCLS (we cannot
+	 *   actually, as this check happens before VMEXIT).
+	 *
+	 * - If SGX is not exposed to guest (guest_cpuid_has_sgx(vcpu) == 0), or
+	 *   SGX is not enabled in guest's BIOS (vmx->msr_ia32_feature_control
+	 *   doesn't have SGX_ENABLE or LOCK bit set), we need to turn on ENCLS
+	 *   VMEXIT for protected mode and long mode. The reason is, we need
+	 *   to inject #UD for the former and inject #GP for the latter. The
+	 *   hardware actually has SGX support and it is indeed enabled in
+	 *   the physical BIOS, so ENCLS may behave differently from what the
+	 *   SDM describes when running in the guest.
+	 *
+	 * - For 5), 6), 7), no need to trap ENCLS, as ENCLS will just cause
+	 *   #GP while running in guest.
+	 *
+	 * Most importantly:
+	 *
+	 * - If guest supports SGX, and SGX is enabled in guest's BIOS, on the
+	 *   contrary we don't want to turn on ENCLS VMEXIT, as ENCLS can
+	 *   perfectly run in guest while having the same hardware behavior.
+	 *   Trapping ENCLS from the guest is pointless and only hurts performance.
+	 */
+
+	/* It's pointless to update ENCLS VMEXIT while guest in real mode */
+	if (to_vmx(vcpu)->rmode.vm86_active)
+		return;
+
+	if (!guest_cpuid_has_sgx(vcpu) || !vmx_sgx_enabled_in_bios(vcpu)) {
+		vmx_enable_encls_vmexit_all(vcpu);
+		return;
+	}
+
+	/* If ENCLS VMEXIT is turned on nested, don't disable it */
+	if (nested && is_guest_mode(vcpu) &&
+			nested_cpu_has_encls_exit(get_vmcs12(vcpu)))
+		return;
+
+	vmx_disable_encls_vmexit(vcpu);
+}
+
+/*
  * Switches to specified vcpu, until a matching vcpu_put(), but assumes
  * vcpu mutex is already taken.
  */
@@ -3417,6 +3546,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vmx->msr_ia32_feature_control = data;
 		if (msr_info->host_initiated && data == 0)
 			vmx_leave_nested(vcpu);
+
+		/* SGX may be enabled/disabled in guest's BIOS */
+		vmx_update_encls_vmexit(vcpu);
+
 		/*
 		 * If guest's FEATURE_CONTROL_SGX_ENABLE is disabled, shall
 		 * we also clear vcpu's SGX CPUID? SDM (chapter 37.7.7.1)
@@ -4131,6 +4264,9 @@ static void vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 		msr->data = efer & ~EFER_LME;
 	}
 	setup_msrs(vmx);
+
+	/* Possible mode change */
+	vmx_update_encls_vmexit(vcpu);
 }
 
 #ifdef CONFIG_X86_64
@@ -4337,6 +4473,9 @@ static void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 
 	/* depends on vcpu->arch.cr0 to be set to a new value */
 	vmx->emulation_required = emulation_required(vcpu);
+
+	/* Possible mode change */
+	vmx_update_encls_vmexit(vcpu);
 }
 
 static u64 construct_eptp(unsigned long root_hpa)
@@ -4548,6 +4687,9 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
 
 out:
 	vmx->emulation_required = emulation_required(vcpu);
+
+	/* Possible mode change */
+	vmx_update_encls_vmexit(vcpu);
 }
 
 static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
@@ -7992,6 +8134,73 @@ static int handle_preemption_timer(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int nested_handle_encls_exit(struct kvm_vcpu *vcpu)
+{
+	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+
+	if (guest_cpuid_has_sgx(vcpu)) {
+		/*
+		 * Which means SGX is exposed to L1 but is disabled in
+		 * L1's BIOS. We should inject #GP according to SDM
+		 * (Chapter 37.7.1 Intel SGX Opt-in Configuration).
+		 *
+		 * nested_cpu_has_encls_exit cannot be true as in this
+		 * case we have allowed L1 to handle ENCLS VMEXIT.
+		 */
+		BUG_ON(nested_cpu_has_encls_exit(vmcs12));
+
+		kvm_inject_gp(vcpu, 0);
+	}
+	else {
+		/*
+		 * Which means we didn't expose SGX to L1 at all. Inject
+		 * #UD according to SDM.
+		 */
+		kvm_queue_exception(vcpu, UD_VECTOR);
+	}
+
+	return 1;
+}
+
+/*
+ * Handle an ENCLS VMEXIT due to unexpected ENCLS in the guest, i.e.
+ * executing ENCLS when SGX is not exposed to the guest, or when SGX is
+ * disabled in the guest's BIOS.
+ *
+ * Return 1 if handled, 0 if not handled
+ */
+static int handle_unexpected_encls(struct kvm_vcpu *vcpu)
+{
+	if (guest_cpuid_has_sgx(vcpu) && vmx_sgx_enabled_in_bios(vcpu))
+		return 0;
+
+	if (!guest_cpuid_has_sgx(vcpu))
+		kvm_queue_exception(vcpu, UD_VECTOR);
+	else	/* !vmx_sgx_enabled_in_bios(vcpu)) */
+		kvm_inject_gp(vcpu, 0);
+
+	kvm_x86_ops->skip_emulated_instruction(vcpu);
+
+	return 1;
+}
+
+static int handle_encls(struct kvm_vcpu *vcpu)
+{
+	/* Handle ENCLS VMEXIT from L2 */
+	if (is_guest_mode(vcpu))
+		return nested_handle_encls_exit(vcpu);
+
+	/*
+	 * Handle unexpected ENCLS VMEXIT. If successfully handled we can
+	 * just return to guest to run.
+	 */
+	if (handle_unexpected_encls(vcpu))
+		return 1;
+
+	/* So far ENCLS is not trapped in normal cases. */
+	return -EFAULT;
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
@@ -8043,6 +8252,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_XRSTORS]                 = handle_xrstors,
 	[EXIT_REASON_PML_FULL]		      = handle_pml_full,
 	[EXIT_REASON_PREEMPTION_TIMER]	      = handle_preemption_timer,
+	[EXIT_REASON_ENCLS]		      = handle_encls,
 };
 
 static const int kvm_vmx_max_exit_handlers =
@@ -8356,6 +8566,43 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 	case EXIT_REASON_PML_FULL:
 		/* We don't expose PML support to L1. */
 		return false;
+	case EXIT_REASON_ENCLS:
+		/*
+		 * So far we don't trap ENCLS in normal case (meaning SGX is
+		 * exposed to guest and SGX is enabled in guest's BIOS).
+		 * If SGX is enabled in L1 hypervisor properly, L1 hypervisor
+		 * should take care of this ENCLS VMEXIT, otherwise L0
+		 * hypervisor should handle this ENCLS VMEXIT and inject proper
+		 * error (#UD or #GP) according to ENCLS behavior in abnormal
+		 * SGX environment.
+		 */
+		if (guest_cpuid_has_sgx(vcpu) &&
+				vmx_sgx_enabled_in_bios(vcpu)) {
+			/*
+			 * As explained above, if SGX in L1 hypervisor is
+			 * normal, ENCLS VMEXIT from L2 guest should be due
+			 * to L1 turned on ENCLS VMEXIT, as L0 won't turn on
+			 * ENCLS VMEXIT in this case. We don't want to handle
+			 * this case in L0 as we really don't know how to,
+			 * and instead, we depend on L1 hypervisor to handle.
+			 */
+			WARN_ON(!nested_cpu_has_encls_exit(vmcs12));
+			return true;
+		}
+		else if (guest_cpuid_has_sgx(vcpu)) {
+			/*
+			 * If SGX is exposed to L1 but SGX is not turned on
+			 * in L1's BIOS, then L1 may or may not turn on ENCLS
+			 * VMEXIT. If ENCLS VMEXIT is turned on in L1, VMEXIT
+			 * happens prior to FEATURE_CONTROL check, so we inject
+			 * ENCLS VMEXIT to L1. Otherwise we let L0 inject #GP
+			 * directly to L2.
+			 */
+			return nested_cpu_has_encls_exit(vmcs12);
+		}
+		else {
+			return false;
+		}
 	default:
 		return true;
 	}
@@ -9743,8 +9990,19 @@ static int vmx_cpuid_update(struct kvm_vcpu *vcpu)
 		if (guest_cpuid_has_sgx_launch_control(vcpu))
 			to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |=
 				FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE;
+
+		/*
+		 * To reflect hardware behavior, we must allow the guest to
+		 * set ENCLS exiting if we expose SGX to the guest.
+		 */
+		if (nested_vmx_allowed(vcpu))
+			to_vmx(vcpu)->nested.nested_vmx_secondary_ctls_high |=
+				SECONDARY_EXEC_ENCLS_EXITING;
 	}
 
+	/* SGX CPUID may be changed */
+	vmx_update_encls_vmexit(vcpu);
+
 	return 0;
 }
 
@@ -10491,6 +10749,13 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 		if (exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)
 			vmcs_write64(APIC_ACCESS_ADDR, -1ull);
 
+		/* If L1 has turned on ENCLS vmexit, we need to honor that. */
+		if (nested_cpu_has_encls_exit(vmcs12)) {
+			exec_control |= SECONDARY_EXEC_ENCLS_EXITING;
+			vmcs_write64(ENCLS_EXITING_BITMAP,
+					vmcs12->encls_exiting_bitmap);
+		}
+
 		vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
 	}
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 10/10] kvm: vmx: handle VMEXIT from SGX Enclave
  2017-05-08  5:24 [RFC PATCH 00/10] Basic KVM SGX Virtualization support Kai Huang
                   ` (8 preceding siblings ...)
  2017-05-08  5:24 ` [PATCH 09/10] kvm: vmx: handle ENCLS VMEXIT Kai Huang
@ 2017-05-08  5:24 ` Kai Huang
  2017-05-08  8:22   ` Paolo Bonzini
  2017-05-08  5:24 ` [PATCH 11/11] kvm: vmx: workaround FEATURE_CONTROL[17] is not set by BIOS Kai Huang
  10 siblings, 1 reply; 78+ messages in thread
From: Kai Huang @ 2017-05-08  5:24 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm

VMX adds a new bit to both exit_reason and the guest interruptibility state
to indicate whether a VMEXIT happened inside an enclave. Several instructions
are also invalid or behave differently inside an enclave according to the
SDM. This patch handles those cases.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 arch/x86/include/asm/vmx.h      |   1 +
 arch/x86/include/uapi/asm/vmx.h |   1 +
 arch/x86/kvm/vmx.c              | 120 +++++++++++++++++++++++++++++++++-------
 3 files changed, 103 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 2f24290b7f9d..ec91f68f4511 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -351,6 +351,7 @@ enum vmcs_field {
 #define GUEST_INTR_STATE_MOV_SS		0x00000002
 #define GUEST_INTR_STATE_SMI		0x00000004
 #define GUEST_INTR_STATE_NMI		0x00000008
+#define GUEST_INTR_STATE_ENCLAVE_INTR	0x00000010
 
 /* GUEST_ACTIVITY_STATE flags */
 #define GUEST_ACTIVITY_ACTIVE		0
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index 2bcd967d5c83..6f18898c003d 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -26,6 +26,7 @@
 
 
 #define VMX_EXIT_REASONS_FAILED_VMENTRY         0x80000000
+#define VMX_EXIT_REASON_FROM_ENCLAVE		0x08000000
 
 #define EXIT_REASON_EXCEPTION_NMI       0
 #define EXIT_REASON_EXTERNAL_INTERRUPT  1
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b5f37982e975..1022295ba925 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2628,6 +2628,24 @@ static void vmx_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
 		vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, interruptibility);
 }
 
+static bool vmx_exit_from_enclave(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * We have 2 bits to indicate whether a VMEXIT happened from an
+	 * enclave -- bit 27 in VM_EXIT_REASON, and bit 4 in
+	 * GUEST_INTERRUPTIBILITY_INFO. Currently we use the latter. Note
+	 * that we never clear this bit; we assume hardware clears it when a
+	 * VMEXIT happens outside an enclave, which should be the case.
+	 *
+	 * Alternatively, we could use bit 27 in VM_EXIT_REASON, by adding a
+	 * bool in vmx, setting it in vmx_handle_exit when that bit is set,
+	 * and clearing it right before vmentry to the guest.
+	 */
+	return vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
+		GUEST_INTR_STATE_ENCLAVE_INTR ? true : false;
+}
+
 static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
 {
 	unsigned long rip;
@@ -5457,6 +5475,25 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
 	return exec_control;
 }
 
+static void vmcs_set_secondary_exec_control(u32 new_ctl)
+{
+	/*
+	 * These bits in the secondary execution controls field
+	 * are dynamic, the others are mostly based on the hypervisor
+	 * architecture and the guest's CPUID.  Do not touch the
+	 * dynamic bits.
+	 */
+	u32 mask =
+		SECONDARY_EXEC_SHADOW_VMCS |
+		SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
+		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+
+	u32 cur_ctl = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
+
+	vmcs_write32(SECONDARY_VM_EXEC_CONTROL,
+		     (new_ctl & ~mask) | (cur_ctl & mask));
+}
+
 static void ept_set_mmio_spte_mask(void)
 {
 	/*
@@ -6305,6 +6342,12 @@ static void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
 
 static int handle_cpuid(struct kvm_vcpu *vcpu)
 {
+	/* CPUID is invalid in enclave */
+	if (vmx_exit_from_enclave(vcpu)) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
 	return kvm_emulate_cpuid(vcpu);
 }
 
@@ -6378,6 +6421,16 @@ static int handle_vmcall(struct kvm_vcpu *vcpu)
 
 static int handle_invd(struct kvm_vcpu *vcpu)
 {
+	/*
+	 * SDM 39.6.5 INVD Handling when Enclaves Are Enabled.
+	 *
+	 * Spec says INVD causes #GP if EPC is enabled.
+	 */
+	if (vmx_exit_from_enclave(vcpu)) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
 	return emulate_instruction(vcpu, 0) == EMULATE_DONE;
 }
 
@@ -6399,6 +6452,18 @@ static int handle_rdpmc(struct kvm_vcpu *vcpu)
 
 static int handle_wbinvd(struct kvm_vcpu *vcpu)
 {
+	/*
+	 * SDM 39.6.5 INVD Handling when Enclaves Are Enabled.
+	 *
+	 * Spec says INVD causes #GP if EPC is enabled.
+	 *
+	 * FIXME: Does this also apply to WBINVD?
+	 */
+	if (vmx_exit_from_enclave(vcpu)) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
 	return kvm_emulate_wbinvd(vcpu);
 }
 
@@ -6977,6 +7042,31 @@ static __exit void hardware_unsetup(void)
  */
 static int handle_pause(struct kvm_vcpu *vcpu)
 {
+	/*
+	 * SDM 39.6.3 PAUSE Instruction.
+	 *
+	 * SDM suggests, if VMEXIT caused by 'PAUSE-loop exiting', VMM should
+	 * disable 'PAUSE-loop exiting' so PAUSE can be executed in Enclave
+	 * again without further PAUSE-looping VMEXIT.
+	 *
+	 * SDM suggests, if VMEXIT caused by 'PAUSE exiting', VMM should disable
+	 * 'PAUSE exiting' so PAUSE can be executed in Enclave again without
+	 * further PAUSE VMEXIT.
+	 */
+	if (vmx_exit_from_enclave(vcpu)) {
+		u32 exec_ctl, secondary_exec_ctl;
+
+		exec_ctl = vmx_exec_control(to_vmx(vcpu));
+		exec_ctl &= ~CPU_BASED_PAUSE_EXITING;
+		vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, exec_ctl);
+
+		secondary_exec_ctl = vmx_secondary_exec_control(to_vmx(vcpu));
+		secondary_exec_ctl &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
+		vmcs_set_secondary_exec_control(secondary_exec_ctl);
+
+		return 1;
+	}
+
 	if (ple_gap)
 		grow_ple_window(vcpu);
 
@@ -8876,6 +8966,17 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 		return 0;
 	}
 
+	/* Bit 27 of exit_reason will be set if the VMEXIT is from an SGX enclave. */
+	if (exit_reason & VMX_EXIT_REASON_FROM_ENCLAVE) {
+		/*
+		 * Need to clear bit 27, otherwise the subsequent lookup in
+		 * kvm_vmx_exit_handlers would fail. From here on we rely on
+		 * bit 4 of GUEST_INTERRUPTIBILITY_INFO to determine whether
+		 * the VMEXIT came from an enclave.
+		exit_reason &= ~VMX_EXIT_REASON_FROM_ENCLAVE;
+	}
+
 	/*
 	 * Note:
 	 * Do not try to fix EXIT_REASON_EPT_MISCONFIG if it caused by
@@ -9768,25 +9869,6 @@ static int vmx_get_lpage_level(void)
 		return PT_PDPE_LEVEL;
 }
 
-static void vmcs_set_secondary_exec_control(u32 new_ctl)
-{
-	/*
-	 * These bits in the secondary execution controls field
-	 * are dynamic, the others are mostly based on the hypervisor
-	 * architecture and the guest's CPUID.  Do not touch the
-	 * dynamic bits.
-	 */
-	u32 mask =
-		SECONDARY_EXEC_SHADOW_VMCS |
-		SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
-		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-
-	u32 cur_ctl = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
-
-	vmcs_write32(SECONDARY_VM_EXEC_CONTROL,
-		     (new_ctl & ~mask) | (cur_ctl & mask));
-}
-
 /*
  * Generate MSR_IA32_VMX_CR{0,4}_FIXED1 according to CPUID. Only set bits
  * (indicating "allowed-1") if they are supported in the guest's CPUID.
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 11/11] kvm: vmx: workaround FEATURE_CONTROL[17] is not set by BIOS
  2017-05-08  5:24 [RFC PATCH 00/10] Basic KVM SGX Virtualization support Kai Huang
                   ` (9 preceding siblings ...)
  2017-05-08  5:24 ` [PATCH 10/10] kvm: vmx: handle VMEXIT from SGX Enclave Kai Huang
@ 2017-05-08  5:24 ` Kai Huang
  2017-05-08  5:29   ` Huang, Kai
  10 siblings, 1 reply; 78+ messages in thread
From: Kai Huang @ 2017-05-08  5:24 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm

Even if this bit is not set by BIOS, the current ucode patch allows writes
to IA32_SGXLEPUBKEYHASHn.
---
 arch/x86/kvm/vmx.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1022295ba925..9e687ce45b48 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2254,12 +2254,20 @@ static void decache_tsc_multiplier(struct vcpu_vmx *vmx)
 	vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
 }
 
+#define	UCODE_PATCH
 static bool cpu_sgx_lepubkeyhash_writable(void)
 {
 	u64 val, sgx_lc_enabled_mask = (FEATURE_CONTROL_LOCKED |
 			FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE);
 
 	rdmsrl(MSR_IA32_FEATURE_CONTROL, val);
+#ifdef UCODE_PATCH
+	/*
+	 * current ucode patch can support write to IA32_SGXLEPUBKEYHASHn
+	 * even if FEATURE_CONTROL[17] is not set.
+	 */
+	val |=  FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE;
+#endif
 
 	return ((val & sgx_lc_enabled_mask) == sgx_lc_enabled_mask);
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH 11/11] kvm: vmx: workaround FEATURE_CONTROL[17] is not set by BIOS
  2017-05-08  5:24 ` [PATCH 11/11] kvm: vmx: workaround FEATURE_CONTROL[17] is not set by BIOS Kai Huang
@ 2017-05-08  5:29   ` Huang, Kai
  0 siblings, 0 replies; 78+ messages in thread
From: Huang, Kai @ 2017-05-08  5:29 UTC (permalink / raw)
  To: Kai Huang, pbonzini, rkrcmar, kvm

Oops.. Please ignore this patch :)

Thanks,
-Kai

On 5/8/2017 5:24 PM, Kai Huang wrote:
> Even if this bit is not set by BIOS, the current ucode patch allows writes
> to IA32_SGXLEPUBKEYHASHn.
> ---
>  arch/x86/kvm/vmx.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 1022295ba925..9e687ce45b48 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2254,12 +2254,20 @@ static void decache_tsc_multiplier(struct vcpu_vmx *vmx)
>  	vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
>  }
>
> +#define	UCODE_PATCH
>  static bool cpu_sgx_lepubkeyhash_writable(void)
>  {
>  	u64 val, sgx_lc_enabled_mask = (FEATURE_CONTROL_LOCKED |
>  			FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE);
>
>  	rdmsrl(MSR_IA32_FEATURE_CONTROL, val);
> +#ifdef UCODE_PATCH
> +	/*
> +	 * current ucode patch can support write to IA32_SGXLEPUBKEYHASHn
> +	 * even if FEATURE_CONTROL[17] is not set.
> +	 */
> +	val |=  FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE;
> +#endif
>
>  	return ((val & sgx_lc_enabled_mask) == sgx_lc_enabled_mask);
>  }
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 09/10] kvm: vmx: handle ENCLS VMEXIT
  2017-05-08  5:24 ` [PATCH 09/10] kvm: vmx: handle ENCLS VMEXIT Kai Huang
@ 2017-05-08  8:08   ` Paolo Bonzini
  2017-05-10  1:30     ` Huang, Kai
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2017-05-08  8:08 UTC (permalink / raw)
  To: Kai Huang, rkrcmar, kvm



On 08/05/2017 07:24, Kai Huang wrote:
> This patch handles ENCLS VMEXIT. ENCLS VMEXIT doesn't need to be always turned
> on, actually it should not be turned on in most cases, as guest can run ENCLS
> perfectly in non-root mode. However there are some cases we need to trap ENCLS
> and emulate as in those cases ENCLS in guest may behavor differently with
> in native (for example, when hardware supports SGX but SGX is not exposed to
> guest, and if guest runs ENCLS deliberately, it may have different behavior to
> on native).
> 
> In case of nested SGX support, we need to turn on ENCLS VMEXIT if L1 hypervisor
> has turned on ENCLS VMEXIT, and such ENCLS VMEXIT from L2 (nested guest) will
> be handled by L1 hypervisor.
> 
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> ---
>  arch/x86/include/asm/vmx.h      |   2 +
>  arch/x86/include/uapi/asm/vmx.h |   4 +-
>  arch/x86/kvm/vmx.c              | 265 ++++++++++++++++++++++++++++++++++++++++

Please try to move more code to sgx.c.

Paolo

>  3 files changed, 270 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index f7ac249ce83d..2f24290b7f9d 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -202,6 +202,8 @@ enum vmcs_field {
>  	XSS_EXIT_BITMAP_HIGH            = 0x0000202D,
>  	TSC_MULTIPLIER                  = 0x00002032,
>  	TSC_MULTIPLIER_HIGH             = 0x00002033,
> +	ENCLS_EXITING_BITMAP		= 0x0000202E,
> +	ENCLS_EXITING_BITMAP_HIGH	= 0x0000202F,
>  	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
>  	GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
>  	VMCS_LINK_POINTER               = 0x00002800,
> diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
> index 14458658e988..2bcd967d5c83 100644
> --- a/arch/x86/include/uapi/asm/vmx.h
> +++ b/arch/x86/include/uapi/asm/vmx.h
> @@ -77,6 +77,7 @@
>  #define EXIT_REASON_XSETBV              55
>  #define EXIT_REASON_APIC_WRITE          56
>  #define EXIT_REASON_INVPCID             58
> +#define EXIT_REASON_ENCLS		60
>  #define EXIT_REASON_PML_FULL            62
>  #define EXIT_REASON_XSAVES              63
>  #define EXIT_REASON_XRSTORS             64
> @@ -130,7 +131,8 @@
>  	{ EXIT_REASON_INVVPID,               "INVVPID" }, \
>  	{ EXIT_REASON_INVPCID,               "INVPCID" }, \
>  	{ EXIT_REASON_XSAVES,                "XSAVES" }, \
> -	{ EXIT_REASON_XRSTORS,               "XRSTORS" }
> +	{ EXIT_REASON_XRSTORS,               "XRSTORS" }, \
> +	{ EXIT_REASON_ENCLS,		     "ENCLS" }
>  
>  #define VMX_ABORT_SAVE_GUEST_MSR_FAIL        1
>  #define VMX_ABORT_LOAD_HOST_PDPTE_FAIL       2

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 10/10] kvm: vmx: handle VMEXIT from SGX Enclave
  2017-05-08  5:24 ` [PATCH 10/10] kvm: vmx: handle VMEXIT from SGX Enclave Kai Huang
@ 2017-05-08  8:22   ` Paolo Bonzini
  2017-05-11  9:34     ` Huang, Kai
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2017-05-08  8:22 UTC (permalink / raw)
  To: Kai Huang, rkrcmar, kvm



On 08/05/2017 07:24, Kai Huang wrote:
> @@ -6977,6 +7042,31 @@ static __exit void hardware_unsetup(void)
>   */
>  static int handle_pause(struct kvm_vcpu *vcpu)
>  {
> +	/*
> +	 * SDM 39.6.3 PAUSE Instruction.
> +	 *
> +	 * SDM suggests, if VMEXIT caused by 'PAUSE-loop exiting', VMM should
> +	 * disable 'PAUSE-loop exiting' so PAUSE can be executed in Enclave
> +	 * again without further PAUSE-looping VMEXIT.
> +	 *
> +	 * SDM suggests, if VMEXIT caused by 'PAUSE exiting', VMM should disable
> +	 * 'PAUSE exiting' so PAUSE can be executed in Enclave again without
> +	 * further PAUSE VMEXIT.
> +	 */

How is PLE re-enabled?

I don't understand the interaction of the internal control registers
(paragraph 41.1.4) with VMX.  How can you migrate the VM between EENTER
and EEXIT?

In addition, paragraph 41.1.4 does not include the parts of CR_SAVE_FS*
and CR_SAVE_GS* (base, limit, access rights) and does not include
CR_ENCLAVE_ENTRY_IP.

Paolo

> +	if (vmx_exit_from_enclave(vcpu)) {
> +		u32 exec_ctl, secondary_exec_ctl;
> +
> +		exec_ctl = vmx_exec_control(to_vmx(vcpu));
> +		exec_ctl &= ~CPU_BASED_PAUSE_EXITING;
> +		vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, exec_ctl);
> +
> +		secondary_exec_ctl = vmx_secondary_exec_control(to_vmx(vcpu));
> +		secondary_exec_ctl &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
> +		vmcs_set_secondary_exec_control(secondary_exec_ctl);
> +
> +		return 1;
> +	}
> +

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 09/10] kvm: vmx: handle ENCLS VMEXIT
  2017-05-08  8:08   ` Paolo Bonzini
@ 2017-05-10  1:30     ` Huang, Kai
  0 siblings, 0 replies; 78+ messages in thread
From: Huang, Kai @ 2017-05-10  1:30 UTC (permalink / raw)
  To: Paolo Bonzini, Kai Huang, rkrcmar, kvm



On 5/8/2017 8:08 PM, Paolo Bonzini wrote:
>
>
> On 08/05/2017 07:24, Kai Huang wrote:
>> This patch handles ENCLS VMEXIT. ENCLS VMEXIT doesn't need to be always turned
>> on, actually it should not be turned on in most cases, as guest can run ENCLS
>> perfectly in non-root mode. However there are some cases we need to trap ENCLS
>> and emulate as in those cases ENCLS in guest may behavor differently with
>> in native (for example, when hardware supports SGX but SGX is not exposed to
>> guest, and if guest runs ENCLS deliberately, it may have different behavior to
>> on native).
>>
>> In case of nested SGX support, we need to turn on ENCLS VMEXIT if L1 hypervisor
>> has turned on ENCLS VMEXIT, and such ENCLS VMEXIT from L2 (nested guest) will
>> be handled by L1 hypervisor.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>> ---
>>  arch/x86/include/asm/vmx.h      |   2 +
>>  arch/x86/include/uapi/asm/vmx.h |   4 +-
>>  arch/x86/kvm/vmx.c              | 265 ++++++++++++++++++++++++++++++++++++++++
>
> Please try to move more code to sgx.c.

Hi Paolo,

Thanks for comments. Will try to do this in next version.

Thanks,
-Kai
>
> Paolo
>
>>  3 files changed, 270 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
>> index f7ac249ce83d..2f24290b7f9d 100644
>> --- a/arch/x86/include/asm/vmx.h
>> +++ b/arch/x86/include/asm/vmx.h
>> @@ -202,6 +202,8 @@ enum vmcs_field {
>>  	XSS_EXIT_BITMAP_HIGH            = 0x0000202D,
>>  	TSC_MULTIPLIER                  = 0x00002032,
>>  	TSC_MULTIPLIER_HIGH             = 0x00002033,
>> +	ENCLS_EXITING_BITMAP		= 0x0000202E,
>> +	ENCLS_EXITING_BITMAP_HIGH	= 0x0000202F,
>>  	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
>>  	GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
>>  	VMCS_LINK_POINTER               = 0x00002800,
>> diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
>> index 14458658e988..2bcd967d5c83 100644
>> --- a/arch/x86/include/uapi/asm/vmx.h
>> +++ b/arch/x86/include/uapi/asm/vmx.h
>> @@ -77,6 +77,7 @@
>>  #define EXIT_REASON_XSETBV              55
>>  #define EXIT_REASON_APIC_WRITE          56
>>  #define EXIT_REASON_INVPCID             58
>> +#define EXIT_REASON_ENCLS		60
>>  #define EXIT_REASON_PML_FULL            62
>>  #define EXIT_REASON_XSAVES              63
>>  #define EXIT_REASON_XRSTORS             64
>> @@ -130,7 +131,8 @@
>>  	{ EXIT_REASON_INVVPID,               "INVVPID" }, \
>>  	{ EXIT_REASON_INVPCID,               "INVPCID" }, \
>>  	{ EXIT_REASON_XSAVES,                "XSAVES" }, \
>> -	{ EXIT_REASON_XRSTORS,               "XRSTORS" }
>> +	{ EXIT_REASON_XRSTORS,               "XRSTORS" }, \
>> +	{ EXIT_REASON_ENCLS,		     "ENCLS" }
>>
>>  #define VMX_ABORT_SAVE_GUEST_MSR_FAIL        1
>>  #define VMX_ABORT_LOAD_HOST_PDPTE_FAIL       2
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 10/10] kvm: vmx: handle VMEXIT from SGX Enclave
  2017-05-08  8:22   ` Paolo Bonzini
@ 2017-05-11  9:34     ` Huang, Kai
  2017-06-19  5:02       ` Huang, Kai
  0 siblings, 1 reply; 78+ messages in thread
From: Huang, Kai @ 2017-05-11  9:34 UTC (permalink / raw)
  To: Paolo Bonzini, Kai Huang, rkrcmar, kvm



On 5/8/2017 8:22 PM, Paolo Bonzini wrote:
>
>
> On 08/05/2017 07:24, Kai Huang wrote:
>> @@ -6977,6 +7042,31 @@ static __exit void hardware_unsetup(void)
>>   */
>>  static int handle_pause(struct kvm_vcpu *vcpu)
>>  {
>> +	/*
>> +	 * SDM 39.6.3 PAUSE Instruction.
>> +	 *
>> +	 * SDM suggests, if VMEXIT caused by 'PAUSE-loop exiting', VMM should
>> +	 * disable 'PAUSE-loop exiting' so PAUSE can be executed in Enclave
>> +	 * again without further PAUSE-looping VMEXIT.
>> +	 *
>> +	 * SDM suggests, if VMEXIT caused by 'PAUSE exiting', VMM should disable
>> +	 * 'PAUSE exiting' so PAUSE can be executed in Enclave again without
>> +	 * further PAUSE VMEXIT.
>> +	 */
>
> How is PLE re-enabled?

Currently it will not be enabled again. Perhaps we could re-enable it on 
another VMEXIT, provided that VMEXIT is not a PLE VMEXIT?
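
A rough sketch of that idea (hypothetical, not part of this series; it
assumes the caller knows the exit reason and uses vmx_exit_from_enclave()
from patch 10):

static void vmx_maybe_reenable_pause_exiting(struct kvm_vcpu *vcpu,
		u32 exit_reason)
{
	u32 exec_ctl;

	/* Don't re-arm on PAUSE exits or while exiting from an enclave */
	if (exit_reason == EXIT_REASON_PAUSE_INSTRUCTION ||
	    vmx_exit_from_enclave(vcpu))
		return;

	exec_ctl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
	exec_ctl |= CPU_BASED_PAUSE_EXITING;
	vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, exec_ctl);

	/* SECONDARY_EXEC_PAUSE_LOOP_EXITING could be re-armed similarly */
}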

>
> I don't understand the interaction of the internal control registers
> (paragraph 41.1.4) with VMX.  How can you migrate the VM between EENTER
> and EEXIT?

Current SGX hardware architecture doesn't support live migration, as 
SGX's key architecture is not migratable. For example, some keys are 
persistent and bound to hardware (sealing and attestation). Therefore, 
right now, if SGX is exposed to a guest, live migration is not supported.

>
> In addition, paragraph 41.1.4 does not include the parts of CR_SAVE_FS*
> and CR_SAVE_GS* (base, limit, access rights) and does not include
> CR_ENCLAVE_ENTRY_IP.

The CPU can exit an enclave via EEXIT, or by an Asynchronous Enclave Exit 
(AEX). All non-EEXIT enclave exits are referred to as AEX. When an AEX 
happens, a so-called "synthetic state" is created on the CPU to prevent 
any software from observing *secrets* in the CPU state at AEX time. What 
exactly goes into the "synthetic state" is described in SDM 40.3.

So in my understanding, the CPU won't put something like 
"CR_ENCLAVE_ENTRY_IP" into RIP. Actually, during AEX, the Asynchronous 
Exit Pointer (AEP), which is in normal memory, is pushed onto the stack, 
and IRET returns to the AEP to continue running. The AEP typically points 
to a small piece of code which basically calls ERESUME so that we can go 
back into the enclave to run.
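
As an illustration only (hypothetical user-space code, simplified): on an
AEX the synthetic state already loads RAX with the ERESUME leaf (3), RBX
with the TCS address and RCX with the AEP itself, so the AEP trampoline
can be a bare ENCLU that re-enters the enclave:

/* Top-level asm: the AEP points at this label */
asm(".pushsection .text\n"
    ".global aep_trampoline\n"
    "aep_trampoline:\n\t"
    "enclu\n"		/* ENCLU[ERESUME] resumes the interrupted enclave */
    ".popsection\n");
extern void aep_trampoline(void);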

Hope my reply answered your questions?

Thanks,
-Kai

>
> Paolo
>
>> +	if (vmx_exit_from_enclave(vcpu)) {
>> +		u32 exec_ctl, secondary_exec_ctl;
>> +
>> +		exec_ctl = vmx_exec_control(to_vmx(vcpu));
>> +		exec_ctl &= ~CPU_BASED_PAUSE_EXITING;
>> +		vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, exec_ctl);
>> +
>> +		secondary_exec_ctl = vmx_secondary_exec_control(to_vmx(vcpu));
>> +		secondary_exec_ctl &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
>> +		vmcs_set_secondary_exec_control(secondary_exec_ctl);
>> +
>> +		return 1;
>> +	}
>> +
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-08  5:24 ` [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support Kai Huang
@ 2017-05-12  0:32   ` Huang, Kai
  2017-05-12  3:28     ` [intel-sgx-kernel-dev] " Andy Lutomirski
  0 siblings, 1 reply; 78+ messages in thread
From: Huang, Kai @ 2017-05-12  0:32 UTC (permalink / raw)
  To: Kai Huang, pbonzini, rkrcmar, kvm, intel-sgx-kernel-dev
  Cc: jarkko.sakkinen, sean.j.christopherson, haim.cohen, haitao.huang

Hi Paolo/Radim,

I'd like to start a discussion regarding IA32_SGXLEPUBKEYHASHn handling 
here. I also copied the SGX driver mailing list (which it looks like I 
should have done when sending out this series, sorry) and Sean, Haim and 
Haitao from Intel so we can have a better discussion.

Basically, IA32_SGXLEPUBKEYHASHn (or, more generally speaking, SGX 
Launch Control) allows us to run different Launch Enclaves (LEs) signed 
with different RSA keys. An LE can only be initialized successfully -- 
specifically, by EINIT -- when the value of IA32_SGXLEPUBKEYHASHn 
matches the key used to sign the LE. So before calling EINIT for an LE, 
we have to make sure IA32_SGXLEPUBKEYHASHn contains the matching value. 
One relevant fact is that only EINIT uses IA32_SGXLEPUBKEYHASHn; after 
EINIT, the other ENCLS/ENCLU leaf functions (e.g. EGETKEY) run 
correctly even if the MSRs are changed to other values.

To support KVM guests running their own LEs, KVM traps 
IA32_SGXLEPUBKEYHASHn MSR writes and keeps the values in the vcpu 
internally, and KVM needs to write the cached values to the real MSRs 
before the guest runs EINIT. The problem is that on the host side we 
also run LEs, probably multiple LEs (it seems the SGX driver currently 
plans to run a single in-kernel LE, but I am not familiar with the 
details, and IMO we should not assume the host will only run one LE). 
Therefore, if KVM changes the physical MSRs for a guest, the host may 
not be able to run its LE, as it may not write the right MSR values 
back. There are two approaches to make the host and KVM guests work 
together:

1. Anyone who wants to run an LE is responsible for writing the 
correct values to IA32_SGXLEPUBKEYHASHn.

My current patch is based on this assumption. For a KVM guest, 
naturally, we write the cached values to the real MSRs when the vcpu is 
scheduled in. For the host, the SGX driver should write its own values 
to the MSRs when it performs EINIT for an LE.

One argument against this approach is that a KVM guest should never 
have an impact on the host side, meaning the host should not be aware 
of such MSR changes. In that case, if the host does some performance 
optimization that doesn't actively update the MSRs, the physical MSRs 
may contain incorrect values when the host runs EINIT. Instead, KVM 
should be responsible for restoring the original MSRs, which brings us 
to approach 2 below.

2. KVM should restore the MSRs after changing them for the guest.

To do this, the simplest way for KVM is: 1) save the original physical 
MSR values and load the guest's values before VMENTRY; 2) on VMEXIT, 
write the original values back to the physical MSRs.
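
A rough sketch of what I mean (the host_sgxlepubkeyhash field below is 
hypothetical, just for illustration):

	/* Before VMENTRY: save host values, load guest values. */
	static void vmx_sgx_lepubkeyhash_swap_in(struct vcpu_vmx *vmx)
	{
		int i;

		for (i = 0; i < 4; i++) {
			rdmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i,
			       vmx->host_sgxlepubkeyhash[i]);
			wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i,
			       vmx->msr_ia32_sgxlepubkeyhash[i]);
		}
	}

	/* On VMEXIT: write the saved host values back. */
	static void vmx_sgx_lepubkeyhash_swap_out(struct vcpu_vmx *vmx)
	{
		int i;

		for (i = 0; i < 4; i++)
			wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i,
			       vmx->host_sgxlepubkeyhash[i]);
	}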

To me this approach is also arguable, as a KVM guest is actually just 
a normal process (OK, maybe not that normal), and a KVM guest should be 
treated the same as other processes that run LEs, which means approach 
1 is also reasonable.

And approach 2 has more performance impact than approach 1 for KVM, as 
it reads/writes IA32_SGXLEPUBKEYHASHn on each VMEXIT/VMENTRY, while 
approach 1 only writes the MSRs when a vcpu is scheduled in, which is 
less frequent.

I'd like to hear all your comments, and hopefully we can reach some 
agreement on this.

Another thing, not quite related to selecting an approach above: 
whichever approach we choose, KVM still suffers the performance loss of 
writing (and/or reading) the IA32_SGXLEPUBKEYHASHn MSRs, either when a 
vcpu is scheduled in or on each VMEXIT/VMENTRY. Given that 
IA32_SGXLEPUBKEYHASHn is only used by EINIT, we can actually optimize 
by trapping EINIT from the guest and only updating the MSRs on an EINIT 
VMEXIT. This works for approach 1, but for approach 2 we would have to 
do something tricky during VMEXIT/VMENTRY to check whether the MSRs 
have been changed by an EINIT VMEXIT, and only restore the original 
values if an EINIT VMEXIT has happened. The guest's LE continues to run 
even after the physical MSRs are changed back to the original values.

But trapping ENCLS requires either 1) KVM to run ENCLS on behalf of 
the guest, in which case we have to reconstruct and remap the guest's 
ENCLS parameters and skip the ENCLS for the guest; or 2) using MTF to 
let the guest run ENCLS again, while still trapping ENCLS. Either case 
would introduce more complicated code and potentially more bugs, and I 
don't think we should do this just to save the time of writing the 
MSRs. If we need to turn on ENCLS VMEXIT anyway, we can optimize this.

Thank you in advance.

Thanks,
-Kai

On 5/8/2017 5:24 PM, Kai Huang wrote:
> If SGX runtime launch control is enabled on the host (IA32_FEATURE_CONTROL[17]
> is set), KVM can support running multiple guests, each running an LE signed
> with a different RSA pubkey. KVM traps IA32_SGXLEPUBKEYHASHn MSR writes from
> the guest and keeps the values in the vcpu internally, and when the vcpu is
> scheduled in, KVM writes those values to the real IA32_SGXLEPUBKEYHASHn MSRs.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> ---
>  arch/x86/include/asm/msr-index.h |   5 ++
>  arch/x86/kvm/vmx.c               | 123 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 128 insertions(+)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index e3770f570bb9..70482b951b0f 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -417,6 +417,11 @@
>  #define MSR_IA32_TSC_ADJUST             0x0000003b
>  #define MSR_IA32_BNDCFGS		0x00000d90
>
> +#define MSR_IA32_SGXLEPUBKEYHASH0	0x0000008c
> +#define MSR_IA32_SGXLEPUBKEYHASH1	0x0000008d
> +#define MSR_IA32_SGXLEPUBKEYHASH2	0x0000008e
> +#define MSR_IA32_SGXLEPUBKEYHASH3	0x0000008f
> +
>  #define MSR_IA32_XSS			0x00000da0
>
>  #define FEATURE_CONTROL_LOCKED				(1<<0)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a16539594a99..c96332b9dd44 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -656,6 +656,9 @@ struct vcpu_vmx {
>  	 */
>  	u64 msr_ia32_feature_control;
>  	u64 msr_ia32_feature_control_valid_bits;
> +
> +	/* SGX Launch Control public key hash */
> +	u64 msr_ia32_sgxlepubkeyhash[4];
>  };
>
>  enum segment_cache_field {
> @@ -2244,6 +2247,70 @@ static void decache_tsc_multiplier(struct vcpu_vmx *vmx)
>  	vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
>  }
>
> +static bool cpu_sgx_lepubkeyhash_writable(void)
> +{
> +	u64 val, sgx_lc_enabled_mask = (FEATURE_CONTROL_LOCKED |
> +			FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE);
> +
> +	rdmsrl(MSR_IA32_FEATURE_CONTROL, val);
> +
> +	return ((val & sgx_lc_enabled_mask) == sgx_lc_enabled_mask);
> +}
> +
> +static bool vmx_sgx_lc_disabled_in_bios(struct kvm_vcpu *vcpu)
> +{
> +	return (to_vmx(vcpu)->msr_ia32_feature_control & FEATURE_CONTROL_LOCKED)
> +		&& (!(to_vmx(vcpu)->msr_ia32_feature_control &
> +				FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE));
> +}
> +
> +#define	SGX_INTEL_DEFAULT_LEPUBKEYHASH0		0xa6053e051270b7ac
> +#define	SGX_INTEL_DEFAULT_LEPUBKEYHASH1	        0x6cfbe8ba8b3b413d
> +#define	SGX_INTEL_DEFAULT_LEPUBKEYHASH2		0xc4916d99f2b3735d
> +#define	SGX_INTEL_DEFAULT_LEPUBKEYHASH3		0xd4f8c05909f9bb3b
> +
> +static void vmx_sgx_init_lepubkeyhash(struct kvm_vcpu *vcpu)
> +{
> +	u64 h0, h1, h2, h3;
> +
> +	/*
> +	 * If runtime launch control is enabled (IA32_SGXLEPUBKEYHASHn is
> +	 * writable), we set guest's default value to be Intel's default
> +	 * hash (which is fixed value and can be hard-coded). Otherwise,
> +	 * guest can only use machine's IA32_SGXLEPUBKEYHASHn so set guest's
> +	 * default to that.
> +	 */
> +	if (cpu_sgx_lepubkeyhash_writable()) {
> +		h0 = SGX_INTEL_DEFAULT_LEPUBKEYHASH0;
> +		h1 = SGX_INTEL_DEFAULT_LEPUBKEYHASH1;
> +		h2 = SGX_INTEL_DEFAULT_LEPUBKEYHASH2;
> +		h3 = SGX_INTEL_DEFAULT_LEPUBKEYHASH3;
> +	}
> +	else {
> +		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH0, h0);
> +		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH1, h1);
> +		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH2, h2);
> +		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH3, h3);
> +	}
> +
> +	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[0] = h0;
> +	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[1] = h1;
> +	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[2] = h2;
> +	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[3] = h3;
> +}
> +
> +static void vmx_sgx_lepubkeyhash_load(struct kvm_vcpu *vcpu)
> +{
> +	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0,
> +			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[0]);
> +	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH1,
> +			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[1]);
> +	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH2,
> +			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[2]);
> +	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH3,
> +			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[3]);
> +}
> +
>  /*
>   * Switches to specified vcpu, until a matching vcpu_put(), but assumes
>   * vcpu mutex is already taken.
> @@ -2316,6 +2383,14 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>
>  	vmx_vcpu_pi_load(vcpu, cpu);
>  	vmx->host_pkru = read_pkru();
> +
> +	/*
> +	 * Load guset's SGX LE pubkey hash if runtime launch control is
> +	 * enabled.
> +	 */
> +	if (guest_cpuid_has_sgx_launch_control(vcpu) &&
> +			cpu_sgx_lepubkeyhash_writable())
> +		vmx_sgx_lepubkeyhash_load(vcpu);
>  }
>
>  static void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
> @@ -3225,6 +3300,19 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_IA32_FEATURE_CONTROL:
>  		msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control;
>  		break;
> +	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
> +		/*
> +		 * SDM 35.1 Model-Specific Registers, table 35-2.
> +		 * Read permitted if CPUID.0x12.0:EAX[0] = 1. (We have
> +		 * guaranteed this will be true if guest_cpuid_has_sgx
> +		 * is true.)
> +		 */
> +		if (!guest_cpuid_has_sgx(vcpu))
> +			return 1;
> +		msr_info->data =
> +			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[msr_info->index -
> +			MSR_IA32_SGXLEPUBKEYHASH0];
> +		break;
>  	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
>  		if (!nested_vmx_allowed(vcpu))
>  			return 1;
> @@ -3344,6 +3432,37 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		 * SGX has been enabled in BIOS before using SGX.
>  		 */
>  		break;
> +	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
> +		/*
> +		 * SDM 35.1 Model-Specific Registers, table 35-2.
> +		 * - If CPUID.0x7.0:ECX[30] = 1, FEATURE_CONTROL[17] is
> +		 * available.
> +		 * - Write permitted if CPUID.0x12.0:EAX[0] = 1 &&
> +		 * FEATURE_CONTROL[17] = 1 && FEATURE_CONTROL[0] = 1.
> +		 */
> +		if (!guest_cpuid_has_sgx(vcpu) ||
> +				!guest_cpuid_has_sgx_launch_control(vcpu))
> +			return 1;
> +		/*
> +		 * Don't let userspace set guest's IA32_SGXLEPUBKEYHASHn,
> +		 * if machine's IA32_SGXLEPUBKEYHASHn cannot be changed at
> +		 * runtime. Note to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash are
> +		 * set to default in vmx_create_vcpu therefore guest is able
> +		 * to get the machine's IA32_SGXLEPUBKEYHASHn by rdmsr in
> +		 * guest.
> +		 */
> +		if (!cpu_sgx_lepubkeyhash_writable())
> +			return 1;
> +		/*
> +		 * If guest's FEATURE_CONTROL[17] is not set, guest's
> +		 * IA32_SGXLEPUBKEYHASHn are not writeable from guest.
> +		 */
> +		if (!vmx_sgx_lc_disabled_in_bios(vcpu) &&
> +				!msr_info->host_initiated)
> +			return 1;
> +		to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[msr_index -
> +			MSR_IA32_SGXLEPUBKEYHASH0] = data;
> +		break;
>  	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
>  		if (!msr_info->host_initiated)
>  			return 1; /* they are read-only */
> @@ -9305,6 +9424,10 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
>  		vmx->nested.vpid02 = allocate_vpid();
>  	}
>
> +	/* Set vcpu's default IA32_SGXLEPUBKEYHASHn */
> +	if (enable_sgx && boot_cpu_has(X86_FEATURE_SGX_LAUNCH_CONTROL))
> +		vmx_sgx_init_lepubkeyhash(&vmx->vcpu);
> +
>  	vmx->nested.posted_intr_nv = -1;
>  	vmx->nested.current_vmptr = -1ull;
>  	vmx->nested.current_vmcs12 = NULL;
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-12  0:32   ` Huang, Kai
@ 2017-05-12  3:28     ` Andy Lutomirski
  2017-05-12  4:56       ` Huang, Kai
  2017-05-15 12:46       ` Jarkko Sakkinen
  0 siblings, 2 replies; 78+ messages in thread
From: Andy Lutomirski @ 2017-05-12  3:28 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Kai Huang, Paolo Bonzini, Radim Krcmar, kvm list,
	intel-sgx-kernel-dev, haim.cohen

[resending due to some kind of kernel.org glitch -- sorry if anyone
gets duplicates]

On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> My current patch is based on this assumption. For KVM guest, naturally, we
> will write the cached value to real MSRs when vcpu is scheduled in. For
> host, SGX driver should write its own value to MSRs when it performs EINIT
> for LE.

This seems unnecessarily slow (perhaps *extremely* slow) to me.  I
would propose a totally different solution:

Have a percpu variable that stores the current SGXLEPUBKEYHASH along
with whatever lock is needed (probably just a mutex).  Users of EINIT
will take the mutex, compare the percpu variable to the desired value,
and, if it's different, do WRMSR and update the percpu variable.
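
Roughly, and completely untested (all names made up; callers hold the 
mutex so the compare-and-write plus the subsequent EINIT are atomic 
with respect to other EINIT users, and preemption must be off so the 
percpu cache matches the CPU doing EINIT; the cache would be primed 
from the real MSRs at boot):

	static DEFINE_PER_CPU(u64 [4], sgx_lepubkeyhash_cache);
	static DEFINE_MUTEX(sgx_lepubkeyhash_lock);

	/* Call with sgx_lepubkeyhash_lock held, right before EINIT. */
	static void sgx_update_lepubkeyhash(const u64 *hash)
	{
		u64 *cache = this_cpu_ptr(sgx_lepubkeyhash_cache);
		int i;

		for (i = 0; i < 4; i++) {
			if (cache[i] != hash[i]) {
				wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i,
				       hash[i]);
				cache[i] = hash[i];
			}
		}
	}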

KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
support the same handling as the host.  There is no action required at
all on KVM guest entry and exit.

FWIW, I think that KVM will, in the long run, want to trap EINIT for
other reasons: someone is going to want to implement policy for what
enclaves are allowed that applies to guests as well as the host.
Also, some day Intel may fix its architectural design flaw [1] by
allowing EINIT to personalize the enclave's keying, and, if it's done
by a new argument to EINIT instead of an MSR, KVM will have to trap
EINIT to handle it.

>
> One argument against this approach is KVM guest should never have impact on
> host side, meaning host should not be aware of such MSR change

As a somewhat generic comment, I don't like this approach to KVM
development.  KVM mucks with lots of important architectural control
registers, and, in all too many cases, it tries to do so independently
of the other arch/x86 code.  This ends up causing all kinds of grief.

Can't KVM and the real x86 arch code cooperate for real?  The host and
the KVM code are in arch/x86 in the same source tree.

>
> 2. KVM should restore MSRs after changing for guest.

No, this is IMO silly.  Don't restore it on each exit, in the user
return hook, or anywhere else.  Just make sure the host knows it was
changed.

> Another thing is, not quite related to selecting which approach above, and
> either we choose approach 1 or approach 2, KVM still suffers the performance
> loss of writing (and/or reading) to IA32_SGXLEPUBKEYHASHn MSRs, either when
> vcpu scheduled in or during each VMEXIT/VMENTRY. Given the fact that the
> IA32_SGXLEPUBKEYHASHn will only be used by EINIT, We can actually do some
> optimization by trapping EINIT from guest and only update MSRs in EINIT
> VMEXIT.

Yep.

> But trapping ENCLS requires either 1) KVM to run ENCLS on behalf of guest,
> in which case we have to reconstruct and remap guest's ENCLS parameters and
> skip the ENCLS for guest; 2) using MTF to let guest to run ENCLS again,
> while still trapping ENCLS.

I would advocate for the former approach.  (But you can't remap the
parameters due to TOCTOU issues, locking, etc.  Just copy them.  I
don't see why this is any more complicated than emulating any other
instruction that accesses memory.)

If necessary for some reason, trap EINIT when the SGXLEPUBKEYHASH is
wrong and then clear the exit flag once the MSRs are in sync.  You'll
need to be careful to avoid races in which the host's value leaks into
the guest.  I think you'll find that this is more complicated, less
flexible, and less performant than just handling ENCLS[EINIT] directly
in the host.

[1] Guests that steal sealed data from each other or from the host can
manipulate that data without compromising the hypervisor by simply
loading the same enclave that its rightful owner would use.  If you're
trying to use SGX to protect your crypto credentials so that, if
stolen, they can't be used outside the guest, I would consider this to
be a major flaw.  It breaks the security model in a multi-tenant cloud
situation.  I've complained about it before.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-12  3:28     ` [intel-sgx-kernel-dev] " Andy Lutomirski
@ 2017-05-12  4:56       ` Huang, Kai
  2017-05-12  6:11         ` Andy Lutomirski
  2017-05-15 12:46       ` Jarkko Sakkinen
  1 sibling, 1 reply; 78+ messages in thread
From: Huang, Kai @ 2017-05-12  4:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kai Huang, Paolo Bonzini, Radim Krcmar, kvm list,
	intel-sgx-kernel-dev, haim.cohen

Hi Andy,

Thanks for your helpful comments! See my reply below, and some questions 
as well.

On 5/12/2017 3:28 PM, Andy Lutomirski wrote:
> [resending due to some kind of kernel.org glitch -- sorry if anyone
> gets duplicates]
>
> On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>> My current patch is based on this assumption. For KVM guest, naturally, we
>> will write the cached value to real MSRs when vcpu is scheduled in. For
>> host, SGX driver should write its own value to MSRs when it performs EINIT
>> for LE.
>
> This seems unnecessarily slow (perhaps *extremely* slow) to me.  I
> would propose a totally different solution:

I am not sure whether the cost of writing to 4 MSRs would be 
*extremely* slow, as when a vcpu is scheduled in, KVM is already doing 
vmcs_load, writing to several MSRs, etc.

>
> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
> with whatever lock is needed (probably just a mutex).  Users of EINIT
> will take the mutex, compare the percpu variable to the desired value,
> and, if it's different, do WRMSR and update the percpu variable.
>
> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
> state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
> support the same handling as the host.  There is no action required at
> all on KVM guest entry and exit.

This is doable, but the SGX driver needs to do those things and expose 
interfaces for KVM to use. In terms of the percpu data, it is nice to 
have, but I am not sure whether it is mandatory, as IMO EINIT is not 
even on a performance-critical path. We can simply read the old values 
out of the MSRs and compare whether the old equals the new.

>
> FWIW, I think that KVM will, in the long run, want to trap EINIT for
> other reasons: someone is going to want to implement policy for what
> enclaves are allowed that applies to guests as well as the host.

I am not very convinced why "what enclaves are allowed" on the host 
would apply to the guest. Can you elaborate? I mean, in general, 
virtualization just focuses on emulating hardware behavior. If a native 
machine is able to run any LE, the virtual machine should be able to as 
well (of course, with the guest's IA32_FEATURE_CONTROL[bit 17] set).

> Also, some day Intel may fix its architectural design flaw [1] by
> allowing EINIT to personalize the enclave's keying, and, if it's done
> by a new argument to EINIT instead of an MSR, KVM will have to trap
> EINIT to handle it.

This flaw looks like a different issue from the one above (host 
enclave policy applying to the guest)?

>
>>
>> One argument against this approach is KVM guest should never have impact on
>> host side, meaning host should not be aware of such MSR change
>
> As a somewhat generic comment, I don't like this approach to KVM
> development.  KVM mucks with lots of important architectural control
> registers, and, in all too many cases, it tries to do so independently
> of the other arch/x86 code.  This ends up causing all kinds of grief.
>
> Can't KVM and the real x86 arch code cooperate for real?  The host and
> the KVM code are in arch/x86 in the same source tree.

Currently the host-side SGX driver, which is pretty much 
self-contained, implements all SGX related stuff.

>
>>
>> 2. KVM should restore MSRs after changing for guest.
>
> No, this is IMO silly.  Don't restore it on each exit, in the user
> return hook, or anywhere else.  Just make sure the host knows it was
> changed.
>
>> Another thing is, not quite related to selecting which approach above, and
>> either we choose approach 1 or approach 2, KVM still suffers the performance
>> loss of writing (and/or reading) to IA32_SGXLEPUBKEYHASHn MSRs, either when
>> vcpu scheduled in or during each VMEXIT/VMENTRY. Given the fact that the
>> IA32_SGXLEPUBKEYHASHn will only be used by EINIT, We can actually do some
>> optimization by trapping EINIT from guest and only update MSRs in EINIT
>> VMEXIT.
>
> Yep.
>
>> But trapping ENCLS requires either 1) KVM to run ENCLS on behalf of guest,
>> in which case we have to reconstruct and remap guest's ENCLS parameters and
>> skip the ENCLS for guest; 2) using MTF to let guest to run ENCLS again,
>> while still trapping ENCLS.
>
> I would advocate for the former approach.  (But you can't remap the
> parameters due to TOCTOU issues, locking, etc.  Just copy them.  I
> don't see why this is any more complicated than emulating any other
> instruction that accesses memory.)

No, you cannot just copy: all addresses in the guest's ENCLS 
parameters are guest virtual addresses, so we cannot use them to 
execute ENCLS in KVM. If any guest virtual address is used in the ENCLS 
parameters, for example PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc., you 
have to remap it to a KVM virtual address.

Btw, what is the TOCTOU issue? Would you also elaborate on the locking 
issue?

>
> If necessary for some reason, trap EINIT when the SGXLEPUBKEYHASH is
> wrong and then clear the exit flag once the MSRs are in sync.  You'll
> need to be careful to avoid races in which the host's value leaks into
> the guest.  I think you'll find that this is more complicated, less
> flexible, and less performant than just handling ENCLS[EINIT] directly
> in the host.

Sorry, I don't quite follow this part. Why would the host's value leak 
into the guest? I suppose the *value* means the host's 
IA32_SGXLEPUBKEYHASHn? The guest's MSR reads/writes are always trapped 
and emulated by KVM.

>
> [1] Guests that steal sealed data from each other or from the host can
> manipulate that data without compromising the hypervisor by simply
> loading the same enclave that its rightful owner would use.  If you're
> trying to use SGX to protect your crypto credentials so that, if
> stolen, they can't be used outside the guest, I would consider this to
> be a major flaw.  It breaks the security model in a multi-tenant cloud
> situation.  I've complained about it before.
>

It looks like potentially only the guest's IA32_SGXLEPUBKEYHASHn may 
be leaked? In that case, even if it is leaked, it seems we cannot dig 
anything out beyond the hash value itself?

Thanks,
-Kai

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-12  4:56       ` Huang, Kai
@ 2017-05-12  6:11         ` Andy Lutomirski
  2017-05-12 18:48           ` Christopherson, Sean J
                             ` (2 more replies)
  0 siblings, 3 replies; 78+ messages in thread
From: Andy Lutomirski @ 2017-05-12  6:11 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Andy Lutomirski, Kai Huang, Paolo Bonzini, Radim Krcmar,
	kvm list, intel-sgx-kernel-dev, haim.cohen

On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> I am not sure whether the cost of writing to 4 MSRs would be *extremely*
> slow, as when vcpu is scheduled in, KVM is already doing vmcs_load, writing
> to several MSRs, etc.

I'm speculating that these MSRs may be rather unoptimized and hence
unusually slow.

>
>>
>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>> will take the mutex, compare the percpu variable to the desired value,
>> and, if it's different, do WRMSR and update the percpu variable.
>>
>> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
>> state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
>> support the same handling as the host.  There is no action required at
>> all on KVM guest entry and exit.
>
>
> This is doable, but SGX driver needs to do those things and expose
> interfaces for KVM to use. In terms of the percpu data, it is nice to have,
> but I am not sure whether it is mandatory, as IMO EINIT is not even in
> performance critical path. We can simply read old value from MSRs out and
> compare whether the old equals to the new.

I think the SGX driver should probably live in arch/x86, and the
interface could be a simple percpu variable that is exported (from the
main kernel image, not from a module).

>
>>
>> FWIW, I think that KVM will, in the long run, want to trap EINIT for
>> other reasons: someone is going to want to implement policy for what
>> enclaves are allowed that applies to guests as well as the host.
>
>
> I am not very convinced why "what enclaves are allowed" in host would apply
> to guest. Can you elaborate? I mean in general virtualization just focus
> emulating hardware behavior. If a native machine is able to run any LE, the
> virtual machine should be able to as well (of course, with guest's
> IA32_FEATURE_CONTROL[bit 17] set).

I strongly disagree.  I can imagine two classes of sensible policies
for launch control:

1. Allow everything.  This seems quite sensible to me.

2. Allow some things, and make sure that VMs have at least as
restrictive a policy as host root has.  After all, what's the point of
restricting enclaves in the host if host code can simply spawn a
little VM to run otherwise-disallowed enclaves?

>
>> Also, some day Intel may fix its architectural design flaw [1] by
>> allowing EINIT to personalize the enclave's keying, and, if it's done
>> by a new argument to EINIT instead of an MSR, KVM will have to trap
>> EINIT to handle it.
>
>
> Looks this flaw is not the same issue as above (host enclave policy applies
> to guest)?

It's related.  Without this flaw, it might make sense to apply a looser
policy in the guest than in the host.  With this flaw, I think your
policy fails to have any real effect if you don't enforce it on
guests.

>
>>
>>>
>>> One argument against this approach is KVM guest should never have impact
>>> on
>>> host side, meaning host should not be aware of such MSR change
>>
>>
>> As a somewhat generic comment, I don't like this approach to KVM
>> development.  KVM mucks with lots of important architectural control
>> registers, and, in all too many cases, it tries to do so independently
>> of the other arch/x86 code.  This ends up causing all kinds of grief.
>>
>> Can't KVM and the real x86 arch code cooperate for real?  The host and
>> the KVM code are in arch/x86 in the same source tree.
>
>
> Currently on host SGX driver, which is pretty much self-contained,
> implements all SGX related stuff.

I will probably NAK this if it comes my way for inclusion upstream.
Just because it can be self-contained doesn't mean it should be
self-contained.

>>
>> I would advocate for the former approach.  (But you can't remap the
>> parameters due to TOCTOU issues, locking, etc.  Just copy them.  I
>> don't see why this is any more complicated than emulating any other
>> instruction that accesses memory.)
>
>
> No you cannot just copy. Because all address in guest's ENCLS parameters are
> guest's virtual address, we cannot use them to execute ENCLS in KVM. If any
> guest virtual addresses is used in ENCLS parameters, for example,
> PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc, you have to remap them to KVM's
> virtual address.
>
> Btw, what is TOCTOU issue? would you also elaborate locking issue?

I was partially mis-remembering how this worked.  It looks like
SIGSTRUCT and EINITTOKEN could be copied but SECS would have to be
mapped.  If KVM applied some policy to the launchable enclaves, it
would want to make sure that it only looks at fields that are copied
to make sure that the enclave that gets launched is the one it
verified.  The locking issue I'm imagining is that the SECS (or
whatever else might be mapped) doesn't disappear and get reused for
something else while it's mapped in the host.  Presumably KVM has an
existing mechanism for this, but maybe SECS is special because it's
not quite normal memory IIRC.

>
>>
>> If necessary for some reason, trap EINIT when the SGXLEPUBKEYHASH is
>> wrong and then clear the exit flag once the MSRs are in sync.  You'll
>> need to be careful to avoid races in which the host's value leaks into
>> the guest.  I think you'll find that this is more complicated, less
>> flexible, and less performant than just handling ENCLS[EINIT] directly
>> in the host.
>
>
> Sorry I don't quite follow this part. Why would host's value leaks into
> guest? I suppose the *value* means host's IA32_SGXLEPUBKEYHASHn? guest's MSR
> read/write is always trapped and emulated by KVM.

You'd need to make sure that this sequence of events doesn't happen:

 - Guest does EINIT and it exits.
 - Host updates the MSRs and the ENCLS-exiting bitmap.
 - Guest is preempted before it retries EINIT.
 - A different host thread launches an enclave, thus changing the MSRs.
 - Guest resumes and runs EINIT without exiting with the wrong MSR values.
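
(For illustration, a rough, untested sketch of the direct-emulation 
variant that avoids this race by construction -- a fragment of KVM's 
EINIT exit handler, where sgx_update_lepubkeyhash() and the mutex are 
the assumed names from the percpu scheme above, __einit() stands in for 
the driver's ENCLS[EINIT] wrapper, and sigstruct/token have already 
been copied from guest memory and the SECS mapped:

	mutex_lock(&sgx_lepubkeyhash_lock);
	preempt_disable();
	sgx_update_lepubkeyhash(to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash);
	ret = __einit(sigstruct, token, secs);	/* ENCLS[EINIT] */
	preempt_enable();
	mutex_unlock(&sgx_lepubkeyhash_lock);

No host thread can change the MSRs between the WRMSRs and the EINIT, 
and the guest never runs EINIT without exiting.)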

>
>>
>> [1] Guests that steal sealed data from each other or from the host can
>> manipulate that data without compromising the hypervisor by simply
>> loading the same enclave that its rightful owner would use.  If you're
>> trying to use SGX to protect your crypto credentials so that, if
>> stolen, they can't be used outside the guest, I would consider this to
>> be a major flaw.  It breaks the security model in a multi-tenant cloud
>> situation.  I've complained about it before.
>>
>
> Looks potentially only guest's IA32_SGXLEPUBKEYHASHn may be leaked? In this
> case even it is leaked looks we cannot dig anything out just the hash value?

Not sure what you mean.  Are you asking about the lack of guest personalization?

Concretely, imagine I write an enclave that seals my TLS client
certificate's private key and offers an API to sign TLS certificate
requests with it.  This way, if my system is compromised, an attacker
can use the certificate only so long as they have access to my
machine.  If I kick them out or if they merely get the ability to read
the sealed data but not to execute code, the private key should still
be safe.  But, if this system is a VM guest, the attacker could run
the exact same enclave on another guest on the same physical CPU and
sign using my key.  Whoops!

^ permalink raw reply	[flat|nested] 78+ messages in thread

* RE: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-12  6:11         ` Andy Lutomirski
@ 2017-05-12 18:48           ` Christopherson, Sean J
  2017-05-12 20:50             ` Christopherson, Sean J
                               ` (2 more replies)
  2017-05-16  0:48           ` Huang, Kai
  2017-07-19 15:04           ` Sean Christopherson
  2 siblings, 3 replies; 78+ messages in thread
From: Christopherson, Sean J @ 2017-05-12 18:48 UTC (permalink / raw)
  To: 'Andy Lutomirski', Huang, Kai
  Cc: kvm list, Radim Krcmar, Cohen, Haim, intel-sgx-kernel-dev, Paolo Bonzini

Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> > I am not sure whether the cost of writing to 4 MSRs would be *extremely*
> > slow, as when vcpu is scheduled in, KVM is already doing vmcs_load, writing
> > to several MSRs, etc.
> 
> I'm speculating that these MSRs may be rather unoptimized and hence
> unusually slow.
> 

Good speculation :)  We've been told to expect that writing the hash MSRs
will be at least 2.5x slower than normal MSRs.

> >
> >>
> >> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
> >> with whatever lock is needed (probably just a mutex).  Users of EINIT
> >> will take the mutex, compare the percpu variable to the desired value,
> >> and, if it's different, do WRMSR and update the percpu variable.
> >>
> >> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
> >> state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
> >> support the same handling as the host.  There is no action required at
> >> all on KVM guest entry and exit.
> >
> >
> > This is doable, but SGX driver needs to do those things and expose
> > interfaces for KVM to use. In terms of the percpu data, it is nice to have,
> > but I am not sure whether it is mandatory, as IMO EINIT is not even in
> > performance critical path. We can simply read old value from MSRs out and
> > compare whether the old equals to the new.
> 
> I think the SGX driver should probably live in arch/x86, and the
> interface could be a simple percpu variable that is exported (from the
> main kernel image, not from a module).
> 

Agreed, this would make life easier for future SGX code that can't be
self-contained in the driver, e.g. EPC cgroup.  Future architectural
enhancements might also require tighter integration with the kernel.


> >>
> >> I would advocate for the former approach.  (But you can't remap the
> >> parameters due to TOCTOU issues, locking, etc.  Just copy them.  I
> >> don't see why this is any more complicated than emulating any other
> >> instruction that accesses memory.)
> >
> >
> > No you cannot just copy. Because all address in guest's ENCLS parameters are
> > guest's virtual address, we cannot use them to execute ENCLS in KVM. If any
> > guest virtual addresses is used in ENCLS parameters, for example,
> > PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc, you have to remap them to KVM's
> > virtual address.
> >
> > Btw, what is TOCTOU issue? would you also elaborate locking issue?
> 
> I was partially mis-remembering how this worked.  It looks like
> SIGSTRUCT and EINITTOKEN could be copied but SECS would have to be
> mapped.  If KVM applied some policy to the launchable enclaves, it
> would want to make sure that it only looks at fields that are copied
> to make sure that the enclave that gets launched is the one it
> verified.  The locking issue I'm imagining is that the SECS (or
> whatever else might be mapped) doesn't disappear and get reused for
> something else while it's mapped in the host.  Presumably KVM has an
> existing mechanism for this, but maybe SECS is special because it's
> not quite normal memory IIRC.
> 

Mapping the SECS in the host should not be an issue, AFAIK there aren't
any restrictions on the VA passed to EINIT as long as it resolves to a
SECS page in the EPCM, e.g. the SGX driver maps the SECS for EINIT with
an arbitrary VA.

I don't think emulating EINIT introduces any TOCTOU race conditions that
wouldn't already exist.  Evicting the SECS or modifying the page tables
on a different thread while executing EINIT is either a guest kernel bug
or bizarre behavior that the guest can already handle.  Similarly, KVM
would need special handling for evicting a guest's SECS, regardless of
EINIT emulation.

> >> [1] Guests that steal sealed data from each other or from the host can
> >> manipulate that data without compromising the hypervisor by simply
> >> loading the same enclave that its rightful owner would use.  If you're
> >> trying to use SGX to protect your crypto credentials so that, if
> >> stolen, they can't be used outside the guest, I would consider this to
> >> be a major flaw.  It breaks the security model in a multi-tenant cloud
> >> situation.  I've complained about it before.
> >>
> >
> > Looks potentially only guest's IA32_SGXLEPUBKEYHASHn may be leaked? In this
> > case even it is leaked looks we cannot dig anything out just the hash value?
> 
> Not sure what you mean.  Are you asking about the lack of guest
> personalization?
> 
> Concretely, imagine I write an enclave that seals my TLS client
> certificate's private key and offers an API to sign TLS certificate
> requests with it.  This way, if my system is compromised, an attacker
> can use the certificate only so long as they have access to my
> machine.  If I kick them out or if they merely get the ability to read
> the sealed data but not to execute code, the private key should still
> be safe.  But, if this system is a VM guest, the attacker could run
> the exact same enclave on another guest on the same physical CPU and
> sign using my key.  Whoops!

I know this issue has been raised internally as well, but I don't know
the status of the situation.  I'll follow up and provide any information
I can.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* RE: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-12 18:48           ` Christopherson, Sean J
@ 2017-05-12 20:50             ` Christopherson, Sean J
  2017-05-16  0:59             ` Huang, Kai
  2017-05-16  1:22             ` Huang, Kai
  2 siblings, 0 replies; 78+ messages in thread
From: Christopherson, Sean J @ 2017-05-12 20:50 UTC (permalink / raw)
  To: Christopherson, Sean J, 'Andy Lutomirski', Huang, Kai
  Cc: Paolo Bonzini, Cohen, Haim, intel-sgx-kernel-dev, kvm list, Radim Krcmar

Christopherson, Sean J <sean.j.christopherson@intel.com> wrote:
> Andy Lutomirski <luto@kernel.org> wrote:
> > On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com>
> > >> [1] Guests that steal sealed data from each other or from the host can
> > >> manipulate that data without compromising the hypervisor by simply
> > >> loading the same enclave that its rightful owner would use.  If you're
> > >> trying to use SGX to protect your crypto credentials so that, if
> > >> stolen, they can't be used outside the guest, I would consider this to
> > >> be a major flaw.  It breaks the security model in a multi-tenant cloud
> > >> situation.  I've complained about it before.
> > >>
> > >
> > > Looks potentially only guest's IA32_SGXLEPUBKEYHASHn may be leaked? In
> > > this case even it is leaked looks we cannot dig anything out just the
> > > hash value?
> > 
> > Not sure what you mean.  Are you asking about the lack of guest
> > personalization?
> > 
> > Concretely, imagine I write an enclave that seals my TLS client
> > certificate's private key and offers an API to sign TLS certificate
> > requests with it.  This way, if my system is compromised, an attacker
> > can use the certificate only so long as they have access to my
> > machine.  If I kick them out or if they merely get the ability to read
> > the sealed data but not to execute code, the private key should still
> > be safe.  But, if this system is a VM guest, the attacker could run
> > the exact same enclave on another guest on the same physical CPU and
> > sign using my key.  Whoops!
> 
> I know this issue has been raised internally as well, but I don't know
> the status of the situation.  I'll follow up and provide any information
> I can.

So, the key players are well aware of the value added by per-VM keys,
but, ultimately, shipping this feature is dependent on having strong
requests from customers.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-12  3:28     ` [intel-sgx-kernel-dev] " Andy Lutomirski
  2017-05-12  4:56       ` Huang, Kai
@ 2017-05-15 12:46       ` Jarkko Sakkinen
  2017-05-15 23:56         ` Huang, Kai
  1 sibling, 1 reply; 78+ messages in thread
From: Jarkko Sakkinen @ 2017-05-15 12:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Huang, Kai, kvm list, Radim Krcmar, haim.cohen,
	intel-sgx-kernel-dev, Paolo Bonzini

On Thu, May 11, 2017 at 08:28:37PM -0700, Andy Lutomirski wrote:
> [resending due to some kind of kernel.org glitch -- sorry if anyone
> gets duplicates]
> 
> On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> > My current patch is based on this assumption. For KVM guest, naturally, we
> > will write the cached value to real MSRs when vcpu is scheduled in. For
> > host, SGX driver should write its own value to MSRs when it performs EINIT
> > for LE.
> 
> This seems unnecessarily slow (perhaps *extremely* slow) to me.  I
> would propose a totally different solution:
> 
> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
> with whatever lock is needed (probably just a mutex).  Users of EINIT
> will take the mutex, compare the percpu variable to the desired value,
> and, if it's different, do WRMSR and update the percpu variable.

This is exactly what I've been suggesting internally: trap EINIT and
check the value and write conditionally.

I think this would be the best starting point.

/Jarkko

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-15 12:46       ` Jarkko Sakkinen
@ 2017-05-15 23:56         ` Huang, Kai
  2017-05-16 14:23           ` Paolo Bonzini
                             ` (2 more replies)
  0 siblings, 3 replies; 78+ messages in thread
From: Huang, Kai @ 2017-05-15 23:56 UTC (permalink / raw)
  To: Jarkko Sakkinen, Andy Lutomirski
  Cc: kvm list, Radim Krcmar, haim.cohen, intel-sgx-kernel-dev, Paolo Bonzini



On 5/16/2017 12:46 AM, Jarkko Sakkinen wrote:
> On Thu, May 11, 2017 at 08:28:37PM -0700, Andy Lutomirski wrote:
>> [resending due to some kind of kernel.org glitch -- sorry if anyone
>> gets duplicates]
>>
>> On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>> My current patch is based on this assumption. For KVM guest, naturally, we
>>> will write the cached value to real MSRs when vcpu is scheduled in. For
>>> host, SGX driver should write its own value to MSRs when it performs EINIT
>>> for LE.
>>
>> This seems unnecessarily slow (perhaps *extremely* slow) to me.  I
>> would propose a totally different solution:
>>
>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>> will take the mutex, compare the percpu variable to the desired value,
>> and, if it's different, do WRMSR and update the percpu variable.
>
> This is exactly what I've been suggesting internally: trap EINIT and
> check the value and write conditionally.
>
> I think this would be the best starting point.

OK. Assuming we are going to have this percpu variable for 
IA32_SGXLEPUBKEYHASHn, I suppose KVM will also update the guest's value 
into this percpu variable after KVM writes the guest's value to the 
hardware MSRs? And the host (SGX driver) needs to do the same thing 
(check the value and write conditionally), correct?

Thanks,
-Kai

>
> /Jarkko
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-12  6:11         ` Andy Lutomirski
  2017-05-12 18:48           ` Christopherson, Sean J
@ 2017-05-16  0:48           ` Huang, Kai
  2017-05-16 14:21             ` Paolo Bonzini
  2017-05-17  0:09             ` Andy Lutomirski
  2017-07-19 15:04           ` Sean Christopherson
  2 siblings, 2 replies; 78+ messages in thread
From: Huang, Kai @ 2017-05-16  0:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kai Huang, Paolo Bonzini, Radim Krcmar, kvm list,
	intel-sgx-kernel-dev, haim.cohen



On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>> I am not sure whether the cost of writing to 4 MSRs would be *extremely*
>> slow, as when vcpu is scheduled in, KVM is already doing vmcs_load, writing
>> to several MSRs, etc.
>
> I'm speculating that these MSRs may be rather unoptimized and hence
> unusually slow.
>
>>
>>>
>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>> will take the mutex, compare the percpu variable to the desired value,
>>> and, if it's different, do WRMSR and update the percpu variable.
>>>
>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
>>> state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
>>> support the same handling as the host.  There is no action required at
>>> all on KVM guest entry and exit.
>>
>>
>> This is doable, but SGX driver needs to do those things and expose
>> interfaces for KVM to use. In terms of the percpu data, it is nice to have,
>> but I am not sure whether it is mandatory, as IMO EINIT is not even in
>> performance critical path. We can simply read old value from MSRs out and
>> compare whether the old equals to the new.
>
> I think the SGX driver should probably live in arch/x86, and the
> interface could be a simple percpu variable that is exported (from the
> main kernel image, not from a module).
>
>>
>>>
>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for
>>> other reasons: someone is going to want to implement policy for what
>>> enclaves are allowed that applies to guests as well as the host.
>>
>>
>> I am not very convinced why "what enclaves are allowed" in host would apply
>> to guest. Can you elaborate? I mean in general virtualization just focus
>> emulating hardware behavior. If a native machine is able to run any LE, the
>> virtual machine should be able to as well (of course, with guest's
>> IA32_FEATURE_CONTROL[bit 17] set).
>
> I strongly disagree.  I can imagine two classes of sensible policies
> for launch control:
>
> 1. Allow everything.  This seems quite sensible to me.
>
> 2. Allow some things, and make sure that VMs have at least as
> restrictive a policy as host root has.  After all, what's the point of
> restricting enclaves in the host if host code can simply spawn a
> little VM to run otherwise-disallowed enclaves?

What's the current SGX driver launch control policy? Yes, 'allow 
everything' works for KVM, so let's skip this. Are we going to support 
allowing several LEs, or just one single LE? I know Jarkko is doing the 
in-kernel LE stuff, but I don't know the details.

I am trying to find a way that both doesn't break the host's launch 
control policy and is consistent with HW behavior (from the guest's 
view). Currently we can create a KVM guest with runtime changes to 
IA32_SGXLEPUBKEYHASHn either enabled or disabled. I introduced a Qemu 
parameter 'lewr' for this purpose. Actually, I introduced the below 
Qemu SGX parameters for creating a guest:

	-sgx epc=<size>,lehash='SHA-256 hash',lewr

where 'epc' specifies the guest's EPC size, 'lehash' specifies the 
(initial) value of the guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' 
specifies whether the guest is allowed to change its 
IA32_SGXLEPUBKEYHASHn at runtime.

If the host only allows one single LE to run, KVM can add a 
restriction that guests can only be created with runtime changes to 
IA32_SGXLEPUBKEYHASHn disabled, so that only the host-allowed (single) 
hash can be used by a guest. From the guest's view, it simply has 
IA32_FEATURE_CONTROL[bit17] cleared and has IA32_SGXLEPUBKEYHASHn with 
the default value being the host-allowed (single) hash.

If the host allows several LEs (but not everything), and we create a 
guest with 'lewr', then the behavior is not consistent with HW 
behavior: from the guest's point of view its hardware can actually run 
any LE, but we have to tell the guest that it is only allowed to change 
IA32_SGXLEPUBKEYHASHn to some specific values. One compromise solution 
is to not allow creating a guest with 'lewr' specified and, at the same 
time, to only allow creating guests with host-approved hashes specified 
in 'lehash'. This makes the guest's behavior consistent with HW 
behavior, but only allows the guest to run one LE (the one specified by 
'lehash' when the guest is created).

I'd like to hear comments from you guys.

Paolo, do you also have comments here from KVM's side?

Thanks,
-Kai

>
>>
>>> Also, some day Intel may fix its architectural design flaw [1] by
>>> allowing EINIT to personalize the enclave's keying, and, if it's done
>>> by a new argument to EINIT instead of an MSR, KVM will have to trap
>>> EINIT to handle it.
>>
>>
>> Looks this flaw is not the same issue as above (host enclave policy applies
>> to guest)?
>
> It's related.  Without this flaw, it might make sense to apply looser
> policy in the guest as in the host.  With this flaw, I think your
> policy fails to have any real effect if you don't enforce it on
> guests.
>
>>
>>>
>>>>
>>>> One argument against this approach is KVM guest should never have impact
>>>> on
>>>> host side, meaning host should not be aware of such MSR change
>>>
>>>
>>> As a somewhat generic comment, I don't like this approach to KVM
>>> development.  KVM mucks with lots of important architectural control
>>> registers, and, in all too many cases, it tries to do so independently
>>> of the other arch/x86 code.  This ends up causing all kinds of grief.
>>>
>>> Can't KVM and the real x86 arch code cooperate for real?  The host and
>>> the KVM code are in arch/x86 in the same source tree.
>>
>>
>> Currently on host SGX driver, which is pretty much self-contained,
>> implements all SGX related stuff.
>
> I will probably NAK this if it comes my way for inclusion upstream.
> Just because it can be self-contained doesn't mean it should be
> self-contained.
>
>>>
>>> I would advocate for the former approach.  (But you can't remap the
>>> parameters due to TOCTOU issues, locking, etc.  Just copy them.  I
>>> don't see why this is any more complicated than emulating any other
>>> instruction that accesses memory.)
>>
>>
>> No you cannot just copy. Because all address in guest's ENCLS parameters are
>> guest's virtual address, we cannot use them to execute ENCLS in KVM. If any
>> guest virtual addresses is used in ENCLS parameters, for example,
>> PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc, you have to remap them to KVM's
>> virtual address.
>>
>> Btw, what is TOCTOU issue? would you also elaborate locking issue?
>
> I was partially mis-remembering how this worked.  It looks like
> SIGSTRUCT and EINITTOKEN could be copied but SECS would have to be
> mapped.  If KVM applied some policy to the launchable enclaves, it
> would want to make sure that it only looks at fields that are copied
> to make sure that the enclave that gets launched is the one it
> verified.  The locking issue I'm imagining is that the SECS (or
> whatever else might be mapped) doesn't disappear and get reused for
> something else while it's mapped in the host.  Presumably KVM has an
> existing mechanism for this, but maybe SECS is special because it's
> not quite normal memory IIRC.
>
>>
>>>
>>> If necessary for some reason, trap EINIT when the SGXLEPUBKEYHASH is
>>> wrong and then clear the exit flag once the MSRs are in sync.  You'll
>>> need to be careful to avoid races in which the host's value leaks into
>>> the guest.  I think you'll find that this is more complicated, less
>>> flexible, and less performant than just handling ENCLS[EINIT] directly
>>> in the host.
>>
>>
>> Sorry I don't quite follow this part. Why would host's value leaks into
>> guest? I suppose the *value* means host's IA32_SGXLEPUBKEYHASHn? guest's MSR
>> read/write is always trapped and emulated by KVM.
>
> You'd need to make sure that this sequence of events doesn't happen:
>
>  - Guest does EINIT and it exits.
>  - Host updates the MSRs and the ENCLS-exiting bitmap.
>  - Guest is preempted before it retries EINIT.
>  - A different host thread launches an enclave, thus changing the MSRs.
>  - Guest resumes and runs EINIT without exiting with the wrong MSR values.
>
>>
>>>
>>> [1] Guests that steal sealed data from each other or from the host can
>>> manipulate that data without compromising the hypervisor by simply
>>> loading the same enclave that its rightful owner would use.  If you're
>>> trying to use SGX to protect your crypto credentials so that, if
>>> stolen, they can't be used outside the guest, I would consider this to
>>> be a major flaw.  It breaks the security model in a multi-tenant cloud
>>> situation.  I've complained about it before.
>>>
>>
>> Looks potentially only guest's IA32_SGXLEPUBKEYHASHn may be leaked? In this
>> case even it is leaked looks we cannot dig anything out just the hash value?
>
> Not sure what you mean.  Are you asking about the lack of guest personalization?
>
> Concretely, imagine I write an enclave that seals my TLS client
> certificate's private key and offers an API to sign TLS certificate
> requests with it.  This way, if my system is compromised, an attacker
> can use the certificate only so long as they have access to my
> machine.  If I kick them out or if they merely get the ability to read
> the sealed data but not to execute code, the private key should still
> be safe.  But, if this system is a VM guest, the attacker could run
> the exact same enclave on another guest on the same physical CPU and
> sign using my key.  Whoops!
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-12 18:48           ` Christopherson, Sean J
  2017-05-12 20:50             ` Christopherson, Sean J
@ 2017-05-16  0:59             ` Huang, Kai
  2017-05-16  1:22             ` Huang, Kai
  2 siblings, 0 replies; 78+ messages in thread
From: Huang, Kai @ 2017-05-16  0:59 UTC (permalink / raw)
  To: Christopherson, Sean J, 'Andy Lutomirski'
  Cc: kvm list, Radim Krcmar, Cohen, Haim, intel-sgx-kernel-dev, Paolo Bonzini



On 5/13/2017 6:48 AM, Christopherson, Sean J wrote:
> Andy Lutomirski <luto@kernel.org> wrote:
>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>> I am not sure whether the cost of writing to 4 MSRs would be *extremely*
>>> slow, as when vcpu is scheduled in, KVM is already doing vmcs_load, writing
>>> to several MSRs, etc.
>>
>> I'm speculating that these MSRs may be rather unoptimized and hence
>> unusually slow.
>>
>
> Good speculation :)  We've been told to expect that writing the hash MSRs
> will be at least 2.5x slower than normal MSRs.
>
>>>
>>>>
>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>> will take the mutex, compare the percpu variable to the desired value,
>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>
>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
>>>> state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
>>>> support the same handling as the host.  There is no action required at
>>>> all on KVM guest entry and exit.
>>>
>>>
>>> This is doable, but SGX driver needs to do those things and expose
>>> interfaces for KVM to use. In terms of the percpu data, it is nice to have,
>>> but I am not sure whether it is mandatory, as IMO EINIT is not even in
>>> performance critical path. We can simply read old value from MSRs out and
>>> compare whether the old equals to the new.
>>
>> I think the SGX driver should probably live in arch/x86, and the
>> interface could be a simple percpu variable that is exported (from the
>> main kernel image, not from a module).
>>
>
> Agreed, this would make life easier for future SGX code that can't be
> self-contained in the driver, e.g. EPC cgroup.  Future architectural
> enhancements might also require tighter integration with the kernel.


>
>
>>>>
>>>> I would advocate for the former approach.  (But you can't remap the
>>>> parameters due to TOCTOU issues, locking, etc.  Just copy them.  I
>>>> don't see why this is any more complicated than emulating any other
>>>> instruction that accesses memory.)
>>>
>>>
>>> No you cannot just copy. Because all address in guest's ENCLS parameters are
>>> guest's virtual address, we cannot use them to execute ENCLS in KVM. If any
>>> guest virtual addresses is used in ENCLS parameters, for example,
>>> PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc, you have to remap them to KVM's
>>> virtual address.
>>>
>>> Btw, what is TOCTOU issue? would you also elaborate locking issue?
>>
>> I was partially mis-remembering how this worked.  It looks like
>> SIGSTRUCT and EINITTOKEN could be copied but SECS would have to be
>> mapped.  If KVM applied some policy to the launchable enclaves, it
>> would want to make sure that it only looks at fields that are copied
>> to make sure that the enclave that gets launched is the one it
>> verified.  The locking issue I'm imagining is that the SECS (or
>> whatever else might be mapped) doesn't disappear and get reused for
>> something else while it's mapped in the host.  Presumably KVM has an
>> existing mechanism for this, but maybe SECS is special because it's
>> not quite normal memory IIRC.

I am thinking we might not need to check the values in SIGSTRUCT or 
EINITTOKEN, as KVM needs to emulate the guest's IA32_SGXLEPUBKEYHASHn 
writes anyway. If we decide we should apply the host's policy to KVM 
guests, it seems we can do the check when trapping the guest's writes to 
IA32_SGXLEPUBKEYHASHn, so that only host-approved values (maybe from a 
white-list or something?) are allowed to be written by the guest.
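For illustration, the check might be sketched roughly like below in 
KVM's WRMSR emulation path. sgx_le_hash_allowed() is a purely 
hypothetical host-policy hook and msr_ia32_sgxlepubkeyhash[] an assumed 
shadow field (and note that rejecting the write means injecting #GP, 
which has its own problems, see further down-thread):

/*
 * Sketch only: validate guest writes to IA32_SGXLEPUBKEYHASH{0-3}
 * against a host white-list before updating the vcpu's shadow copy.
 * sgx_le_hash_allowed() and msr_ia32_sgxlepubkeyhash[] are assumed.
 */
static int vmx_set_sgx_lepubkeyhash(struct kvm_vcpu *vcpu,
                                    struct msr_data *msr_info)
{
        struct vcpu_vmx *vmx = to_vmx(vcpu);
        u32 i = msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0;

        /* Host policy check: hypothetical white-list hook. */
        if (!sgx_le_hash_allowed(vcpu->kvm, i, msr_info->data))
                return 1;       /* non-zero return injects #GP */

        vmx->msr_ia32_sgxlepubkeyhash[i] = msr_info->data;
        return 0;
}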

Thanks,
-Kai

>>
>
> Mapping the SECS in the host should not be an issue, AFAIK there aren't
> any restrictions on the VA passed to EINIT as long as it resolves to a
> SECS page in the EPCM, e.g. the SGX driver maps the SECS for EINIT with
> an arbitrary VA.
>
> I don't think emulating EINIT introduces any TOCTOU race conditions that
> wouldn't already exist.  Evicting the SECS or modifying the page tables
> on a different thread while executing EINIT is either a guest kernel bug
> or bizarre behavior that the guest can already handle.  Similarly, KVM
> would need special handling for evicting a guest's SECS, regardless of
> EINIT emulation.

Agreed.

>
>>>> [1] Guests that steal sealed data from each other or from the host can
>>>> manipulate that data without compromising the hypervisor by simply
>>>> loading the same enclave that its rightful owner would use.  If you're
>>>> trying to use SGX to protect your crypto credentials so that, if
>>>> stolen, they can't be used outside the guest, I would consider this to
>>>> be a major flaw.  It breaks the security model in a multi-tenant cloud
>>>> situation.  I've complained about it before.
>>>>
>>>
>>> Looks like potentially only the guest's IA32_SGXLEPUBKEYHASHn may be leaked?
>>> In that case, even if it is leaked, it looks like we cannot dig anything out
>>> beyond the hash value?
>>
>> Not sure what you mean.  Are you asking about the lack of guest
>> personalization?
>>
>> Concretely, imagine I write an enclave that seals my TLS client
>> certificate's private key and offers an API to sign TLS certificate
>> requests with it.  This way, if my system is compromised, an attacker
>> can use the certificate only so long as they have access to my
>> machine.  If I kick them out or if they merely get the ability to read
>> the sealed data but not to execute code, the private key should still
>> be safe.  But, if this system is a VM guest, the attacker could run
>> the exact same enclave on another guest on the same physical CPU and
>> sign using my key.  Whoops!
>
> I know this issue has been raised internally as well, but I don't know
> the status of the situation.  I'll follow up and provide any information
> I can.
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-12 18:48           ` Christopherson, Sean J
  2017-05-12 20:50             ` Christopherson, Sean J
  2017-05-16  0:59             ` Huang, Kai
@ 2017-05-16  1:22             ` Huang, Kai
  2 siblings, 0 replies; 78+ messages in thread
From: Huang, Kai @ 2017-05-16  1:22 UTC (permalink / raw)
  To: Christopherson, Sean J, 'Andy Lutomirski'
  Cc: kvm list, Radim Krcmar, Cohen, Haim, intel-sgx-kernel-dev, Paolo Bonzini



On 5/13/2017 6:48 AM, Christopherson, Sean J wrote:
> Andy Lutomirski <luto@kernel.org> wrote:
>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>> I am not sure whether the cost of writing to 4 MSRs would be *extremely*
>>> slow, as when vcpu is scheduled in, KVM is already doing vmcs_load, writing
>>> to several MSRs, etc.
>>
>> I'm speculating that these MSRs may be rather unoptimized and hence
>> unusually slow.
>>
>
> Good speculation :)  We've been told to expect that writing the hash MSRs
> will be at least 2.5x slower than normal MSRs.
>
>>>
>>>>
>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>> will take the mutex, compare the percpu variable to the desired value,
>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>
>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
>>>> state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
>>>> support the same handling as the host.  There is no action required at
>>>> all on KVM guest entry and exit.
>>>
>>>
>>> This is doable, but SGX driver needs to do those things and expose
>>> interfaces for KVM to use. In terms of the percpu data, it is nice to have,
>>> but I am not sure whether it is mandatory, as IMO EINIT is not even in
>>> performance critical path. We can simply read old value from MSRs out and
>>> compare whether the old equals to the new.
>>
>> I think the SGX driver should probably live in arch/x86, and the
>> interface could be a simple percpu variable that is exported (from the
>> main kernel image, not from a module).
>>
>
> Agreed, this would make life easier for future SGX code that can't be
> self-contained in the driver, e.g. EPC cgroup.  Future architectural
> enhancements might also require tighter integration with the kernel.

I think this is better as well. In this way we can leverage the SGX code 
more easily. Some SGX detection code can be done in arch/x86/ as well, 
so that other code can access, e.g., the SGX capabilities quickly.

Another thing is that the SDM actually says SGX CPUID is enumerated 
per-thread, and we should not assume SGX CPUID will report the same info 
on all processors. I think it's better to check this as well. Moving SGX 
detection to identify_cpu makes this easier.
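For illustration, such a consistency check during CPU identification 
could be sketched as below (the sgx_capabilities field on cpuinfo_x86 is 
an assumed addition, not existing code):

/*
 * Sketch only: SGX CPUID (leaf 0x12) is enumerated per-thread, so
 * verify each logical CPU reports the same capabilities as the boot
 * CPU, and disable SGX otherwise.
 */
static void detect_sgx(struct cpuinfo_x86 *c)
{
        unsigned int eax, ebx, ecx, edx;

        if (!cpu_has(c, X86_FEATURE_SGX))
                return;

        cpuid_count(0x12, 0, &eax, &ebx, &ecx, &edx);

        if (c == &boot_cpu_data) {
                c->sgx_capabilities = eax;      /* SGX1/SGX2 bits */
        } else if (eax != boot_cpu_data.sgx_capabilities) {
                pr_warn_once("SGX: inconsistent CPUID across CPUs, disabling\n");
                clear_cpu_cap(c, X86_FEATURE_SGX);
        }
}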

Thanks,
-Kai
>
>
>>>>
>>>> I would advocate for the former approach.  (But you can't remap the
>>>> parameters due to TOCTOU issues, locking, etc.  Just copy them.  I
>>>> don't see why this is any more complicated than emulating any other
>>>> instruction that accesses memory.)
>>>
>>>
>>> No, you cannot just copy. Because all addresses in guest's ENCLS parameters are
>>> guest virtual addresses, we cannot use them to execute ENCLS in KVM. If any
>>> guest virtual address is used in ENCLS parameters, for example,
>>> PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc, you have to remap them to KVM's
>>> virtual addresses.
>>>
>>> Btw, what is the TOCTOU issue? Would you also elaborate on the locking issue?
>>
>> I was partially mis-remembering how this worked.  It looks like
>> SIGSTRUCT and EINITTOKEN could be copied but SECS would have to be
>> mapped.  If KVM applied some policy to the launchable enclaves, it
>> would want to make sure that it only looks at fields that are copied
>> to make sure that the enclave that gets launched is the one it
>> verified.  The locking issue I'm imagining is that the SECS (or
>> whatever else might be mapped) doesn't disappear and get reused for
>> something else while it's mapped in the host.  Presumably KVM has an
>> existing mechanism for this, but maybe SECS is special because it's
>> not quite normal memory IIRC.
>>
>
> Mapping the SECS in the host should not be an issue, AFAIK there aren't
> any restrictions on the VA passed to EINIT as long as it resolves to a
> SECS page in the EPCM, e.g. the SGX driver maps the SECS for EINIT with
> an arbitrary VA.
>
> I don't think emulating EINIT introduces any TOCTOU race conditions that
> wouldn't already exist.  Evicting the SECS or modifying the page tables
> on a different thread while executing EINIT is either a guest kernel bug
> or bizarre behavior that the guest can already handle.  Similarly, KVM
> would need special handling for evicting a guest's SECS, regardless of
> EINIT emulation.
>
>>>> [1] Guests that steal sealed data from each other or from the host can
>>>> manipulate that data without compromising the hypervisor by simply
>>>> loading the same enclave that its rightful owner would use.  If you're
>>>> trying to use SGX to protect your crypto credentials so that, if
>>>> stolen, they can't be used outside the guest, I would consider this to
>>>> be a major flaw.  It breaks the security model in a multi-tenant cloud
>>>> situation.  I've complained about it before.
>>>>
>>>
>>> Looks like potentially only the guest's IA32_SGXLEPUBKEYHASHn may be leaked?
>>> In that case, even if it is leaked, it looks like we cannot dig anything out
>>> beyond the hash value?
>>
>> Not sure what you mean.  Are you asking about the lack of guest
>> personalization?
>>
>> Concretely, imagine I write an enclave that seals my TLS client
>> certificate's private key and offers an API to sign TLS certificate
>> requests with it.  This way, if my system is compromised, an attacker
>> can use the certificate only so long as they have access to my
>> machine.  If I kick them out or if they merely get the ability to read
>> the sealed data but not to execute code, the private key should still
>> be safe.  But, if this system is a VM guest, the attacker could run
>> the exact same enclave on another guest on the same physical CPU and
>> sign using my key.  Whoops!
>
> I know this issue has been raised internally as well, but I don't know
> the status of the situation.  I'll follow up and provide any information
> I can.
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-16  0:48           ` Huang, Kai
@ 2017-05-16 14:21             ` Paolo Bonzini
  2017-05-18  7:54               ` Huang, Kai
  2017-05-17  0:09             ` Andy Lutomirski
  1 sibling, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2017-05-16 14:21 UTC (permalink / raw)
  To: Huang, Kai, Andy Lutomirski
  Cc: Kai Huang, Radim Krcmar, kvm list,
	intel-sgx-kernel-dev@lists.01.org, haim.cohen



On 16/05/2017 02:48, Huang, Kai wrote:
> 
> 
> If host only allows one single LE to run, KVM can add a restriction that
> only allows to create KVM guest with runtime change to
> IA32_SGXLEPUBKEYHASHn disabled, so that only host allowed (single) hash
> can be used by guest. From guest's view, it simply has
> IA32_FEATURE_CONTROL[bit17] cleared and has IA32_SGXLEPUBKEYHASHn with
> default value to be host allowed (single) hash.
> 
> If host allows several LEs (but not everything), and if we create guest
> with 'lewr', then the behavior is not consistent with HW behavior, as
> from guest's hardware's point of view, we can actually run any LE but we
> have to tell guest that you are only allowed to change
> IA32_SGXLEPUBKEYHASHn to some specific values. One compromise solution
> is we don't allow to create guest with 'lewr' specified, and at the
> meantime, only allow to create guest with host approved hashes specified
> in 'lehash'. This will make guest's behavior consistent to HW behavior
> but only allows guest to run one LE (which is specified by 'lehash' when
> guest is created).
> 
> I'd like to hear comments from you guys.
> 
> Paolo, do you also have comments here from KVM's side?

I would start with read-only LE hash (same as the host), which is a
valid configuration anyway.  Then later we can trap EINIT to emulate
IA32_SGXLEPUBKEYHASHn.

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-15 23:56         ` Huang, Kai
@ 2017-05-16 14:23           ` Paolo Bonzini
  2017-05-17 14:21           ` Sean Christopherson
  2017-05-20 13:23           ` Jarkko Sakkinen
  2 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2017-05-16 14:23 UTC (permalink / raw)
  To: Huang, Kai, Jarkko Sakkinen, Andy Lutomirski
  Cc: kvm list, Radim Krcmar, haim.cohen, intel-sgx-kernel-dev@lists.01.org



On 16/05/2017 01:56, Huang, Kai wrote:
>>>
>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>> will take the mutex, compare the percpu variable to the desired value,
>>> and, if it's different, do WRMSR and update the percpu variable.
>>
>> This is exactly what I've been suggesting internally: trap EINIT and
>> check the value and write conditionally.
>>
>> I think this would be the best starting point.
> 
> OK. Assuming we are going to have this percpu variable for
> IA32_SGXLEPUBKEYHASHn, I suppose KVM also will update guest's value to
> this percpu variable after KVM writes guest's value to hardware MSR? And
> host (SGX driver) needs to do the same thing (check the value and write
> conditionally), correct?

The percpu variable is just an optimization.  If EINIT is not
performance critical, you could even do the WRMSR unconditionally; what
matters is having a mutex that covers both WRMSR and EINIT.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-16  0:48           ` Huang, Kai
  2017-05-16 14:21             ` Paolo Bonzini
@ 2017-05-17  0:09             ` Andy Lutomirski
  2017-05-18  7:45               ` Huang, Kai
  1 sibling, 1 reply; 78+ messages in thread
From: Andy Lutomirski @ 2017-05-17  0:09 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Andy Lutomirski, Kai Huang, Paolo Bonzini, Radim Krcmar,
	kvm list, intel-sgx-kernel-dev, haim.cohen

On Mon, May 15, 2017 at 5:48 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>
>
> On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
>>
>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com>
>> wrote:
>>>
>>> I am not sure whether the cost of writing to 4 MSRs would be *extremely*
>>> slow, as when vcpu is scheduled in, KVM is already doing vmcs_load,
>>> writing
>>> to several MSRs, etc.
>>
>>
>> I'm speculating that these MSRs may be rather unoptimized and hence
>> unusually slow.
>>
>>>
>>>>
>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>> will take the mutex, compare the percpu variable to the desired value,
>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>
>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
>>>> state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
>>>> support the same handling as the host.  There is no action required at
>>>> all on KVM guest entry and exit.
>>>
>>>
>>>
>>> This is doable, but SGX driver needs to do those things and expose
>>> interfaces for KVM to use. In terms of the percpu data, it is nice to
>>> have,
>>> but I am not sure whether it is mandatory, as IMO EINIT is not even in
>>> performance critical path. We can simply read old value from MSRs out and
>>> compare whether the old equals to the new.
>>
>>
>> I think the SGX driver should probably live in arch/x86, and the
>> interface could be a simple percpu variable that is exported (from the
>> main kernel image, not from a module).
>>
>>>
>>>>
>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for
>>>> other reasons: someone is going to want to implement policy for what
>>>> enclaves are allowed that applies to guests as well as the host.
>>>
>>>
>>>
>>> I am not very convinced why "what enclaves are allowed" in host would
>>> apply
>>> to guest. Can you elaborate? I mean in general virtualization just focuses on
>>> emulating hardware behavior. If a native machine is able to run any LE,
>>> the
>>> virtual machine should be able to as well (of course, with guest's
>>> IA32_FEATURE_CONTROL[bit 17] set).
>>
>>
>> I strongly disagree.  I can imagine two classes of sensible policies
>> for launch control:
>>
>> 1. Allow everything.  This seems quite sensible to me.
>>
>> 2. Allow some things, and make sure that VMs have at least as
>> restrictive a policy as host root has.  After all, what's the point of
>> restricting enclaves in the host if host code can simply spawn a
>> little VM to run otherwise-disallowed enclaves?
>
>
> What's the current SGX driver launch control policy? Yes allow everything
> works for KVM so let's skip this. Are we going to support allowing several
> LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE
> stuff, but I don't know the details.
>
> I am trying to find a way that we can both not break host launch control
> policy, and be consistent to HW behavior (from guest's view). Currently we
> can create a KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn either
> enabled or disabled. I introduced a Qemu parameter 'lewr' for this purpose.
> Actually I introduced below Qemu SGX parameters for creating guest:
>
>         -sgx epc=<size>,lehash='SHA-256 hash',lewr
>
> where 'epc' specifies guest's EPC size, lehash specifies (initial) value of
> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether guest is allowed
> to change guest's IA32_SGXLEPUBKEYHASHn at runtime.
>
> If host only allows one single LE to run, KVM can add a restriction that only
> allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
> disabled, so that only host allowed (single) hash can be used by guest. From
> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has
> IA32_SGXLEPUBKEYHASHn with default value to be host allowed (single) hash.
>
> If host allows several LEs (but not everything), and if we create guest with
> 'lewr', then the behavior is not consistent with HW behavior, as from
> guest's hardware's point of view, we can actually run any LE but we have to
> tell guest that you are only allowed to change IA32_SGXLEPUBKEYHASHn to some
> specific values. One compromise solution is we don't allow to create guest
> with 'lewr' specified, and at the same time, only allow to create guest with
> host approved hashes specified in 'lehash'. This will make guest's behavior
> consistent to HW behavior but only allows guest to run one LE (which is
> specified by 'lehash' when guest is created).

I'm not sure I entirely agree for a couple reasons.

1. I wouldn't be surprised if the kernel ends up implementing a policy
in which it checks all enclaves (not just LEs) for acceptability.  In
fact, if the kernel sticks with the "no LE at all or just
kernel-internal LE", then checking enclaves directly against some
admin- or distro-provided signer list seems reasonable.  This type of
policy can't be forwarded to a guest by restricting allowed LE
signers.  But this is mostly speculation since AFAIK no one has
seriously proposed any particular policy support and the plan was to
not have this for the initial implementation.

2. While matching hardware behavior is nice in principle, there
doesn't seem to be useful hardware behavior to match here.  If the
host had a list of five allowed LE signers, how exactly would it
restrict the MSRs?  They're not written atomically, so you can't
directly tell what's being written.  Also, the only way to fail an MSR
write is to send #GP, and Windows (and even Linux) may not expect
that.  Linux doesn't panic due to #GP on MSR writes these days, but
you still get a big fat warning.  I wouldn't be at all surprised if
Windows BSODs.  ENCLS[EINIT], on the other hand, returns an actual
error code.  I'm not sure that a sensible error code exists
("SGX_HYPERVISOR_SAID_NO?", perhaps), but SGX_INVALID_EINITTOKEN seems
to mean, more or less, "the CPU thinks you're not authorized to do
this", so forcing that error code could be entirely reasonable.

If the host policy is to allow a list of LE signers, you could return
SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
the list.
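
For illustration, the trapped-EINIT policy check could be sketched as
below, where host_le_signer_allowed() and sgx_emulate_einit() are
assumed helpers rather than existing code:

/*
 * Sketch only: on a trapped ENCLS[EINIT], consult host policy and
 * force SGX_INVALID_EINITTOKEN in RAX for a disallowed LE instead of
 * injecting an exception the guest doesn't expect.
 */
static int handle_encls_einit(struct kvm_vcpu *vcpu)
{
        if (!host_le_signer_allowed(vcpu)) {
                kvm_register_write(vcpu, VCPU_REGS_RAX,
                                   SGX_INVALID_EINITTOKEN);
                return kvm_skip_emulated_instruction(vcpu);
        }

        return sgx_emulate_einit(vcpu);         /* normal emulation path */
}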

--Andy

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-15 23:56         ` Huang, Kai
  2017-05-16 14:23           ` Paolo Bonzini
@ 2017-05-17 14:21           ` Sean Christopherson
  2017-05-18  8:14             ` Huang, Kai
  2017-05-20 13:23           ` Jarkko Sakkinen
  2 siblings, 1 reply; 78+ messages in thread
From: Sean Christopherson @ 2017-05-17 14:21 UTC (permalink / raw)
  To: Huang, Kai, Jarkko Sakkinen, Andy Lutomirski
  Cc: Paolo Bonzini, haim.cohen, intel-sgx-kernel-dev, kvm list, Radim Krcmar

On Tue, 2017-05-16 at 11:56 +1200, Huang, Kai wrote:
> 
> On 5/16/2017 12:46 AM, Jarkko Sakkinen wrote:
> > 
> > On Thu, May 11, 2017 at 08:28:37PM -0700, Andy Lutomirski wrote:
> > > 
> > > [resending due to some kind of kernel.org glitch -- sorry if anyone
> > > gets duplicates]
> > > 
> > > On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com>
> > > wrote:
> > > > 
> > > > My current patch is based on this assumption. For KVM guest, naturally,
> > > > we
> > > > will write the cached value to real MSRs when vcpu is scheduled in. For
> > > > host, SGX driver should write its own value to MSRs when it performs
> > > > EINIT
> > > > for LE.
> > > This seems unnecessarily slow (perhaps *extremely* slow) to me.  I
> > > would propose a totally different solution:
> > > 
> > > Have a percpu variable that stores the current SGXLEPUBKEYHASH along
> > > with whatever lock is needed (probably just a mutex).  Users of EINIT
> > > will take the mutex, compare the percpu variable to the desired value,
> > > and, if it's different, do WRMSR and update the percpu variable.
> > This is exactly what I've been suggesting internally: trap EINIT and
> > check the value and write conditionally.
> > 
> > I think this would be the best starting point.
> OK. Assuming we are going to have this percpu variable for 
> IA32_SGXLEPUBKEYHASHn, I suppose KVM also will update guest's value to 
> this percpu variable after KVM writes guest's value to hardware MSR? And 
> host (SGX driver) needs to do the same thing (check the value and write
> conditionally), correct?
> 
> Thanks,
> -Kai

Yes, the percpu variable is simply a cache so that the kernel doesn't have to do
four RDMSRs every time it wants to do EINIT.  KVM would still maintain shadow
copies of the MSRs for each vcpu for emulating RDMSR, WRMSR and EINIT.  I don't
think KVM would even need to be aware of the percpu variable, i.e. the entire
lock->(rd/wr)msr->EINIT->unlock sequence can probably be encapsulated in a
single function that is called from both the primary SGX driver and from KVM.
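
For illustration, a minimal sketch of that shared helper could look like
the below, where __einit() stands in for an ENCLS[EINIT] wrapper and the
per-cpu cache is assumed to be initialized from the real MSRs at boot:

/*
 * Sketch only: per-cpu shadow of the IA32_SGXLEPUBKEYHASH MSRs plus a
 * mutex serializing WRMSR and EINIT.  Preemption is disabled so the
 * task cannot migrate between updating this CPU's MSRs and executing
 * EINIT on them.
 */
static DEFINE_PER_CPU(u64, sgx_lepubkeyhash_cache[4]);
static DEFINE_MUTEX(sgx_einit_lock);

int sgx_einit(void *sigstruct, void *einittoken, void *secs,
              const u64 hash[4])
{
        int i, ret;

        mutex_lock(&sgx_einit_lock);
        preempt_disable();

        for (i = 0; i < 4; i++) {
                if (this_cpu_read(sgx_lepubkeyhash_cache[i]) != hash[i]) {
                        wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, hash[i]);
                        this_cpu_write(sgx_lepubkeyhash_cache[i], hash[i]);
                }
        }

        ret = __einit(sigstruct, einittoken, secs);

        preempt_enable();
        mutex_unlock(&sgx_einit_lock);

        return ret;
}

KVM's trapped-EINIT path would then call sgx_einit() with the vcpu's
shadowed hash values, and the driver with its own.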

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-17  0:09             ` Andy Lutomirski
@ 2017-05-18  7:45               ` Huang, Kai
  2017-06-06 20:52                 ` Huang, Kai
  0 siblings, 1 reply; 78+ messages in thread
From: Huang, Kai @ 2017-05-18  7:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kai Huang, Paolo Bonzini, Radim Krcmar, kvm list,
	intel-sgx-kernel-dev, haim.cohen



On 5/17/2017 12:09 PM, Andy Lutomirski wrote:
> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>
>>
>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
>>>
>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com>
>>> wrote:
>>>>
>>>> I am not sure whether the cost of writing to 4 MSRs would be *extremely*
>>>> slow, as when vcpu is scheduled in, KVM is already doing vmcs_load,
>>>> writing
>>>> to several MSRs, etc.
>>>
>>>
>>> I'm speculating that these MSRs may be rather unoptimized and hence
>>> unusually slow.
>>>
>>>>
>>>>>
>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>>> will take the mutex, compare the percpu variable to the desired value,
>>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>>
>>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
>>>>> state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
>>>>> support the same handling as the host.  There is no action required at
>>>>> all on KVM guest entry and exit.
>>>>
>>>>
>>>>
>>>> This is doable, but SGX driver needs to do those things and expose
>>>> interfaces for KVM to use. In terms of the percpu data, it is nice to
>>>> have,
>>>> but I am not sure whether it is mandatory, as IMO EINIT is not even in
>>>> performance critical path. We can simply read old value from MSRs out and
>>>> compare whether the old equals to the new.
>>>
>>>
>>> I think the SGX driver should probably live in arch/x86, and the
>>> interface could be a simple percpu variable that is exported (from the
>>> main kernel image, not from a module).
>>>
>>>>
>>>>>
>>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for
>>>>> other reasons: someone is going to want to implement policy for what
>>>>> enclaves are allowed that applies to guests as well as the host.
>>>>
>>>>
>>>>
>>>> I am not very convinced why "what enclaves are allowed" in host would
>>>> apply
>>>> to guest. Can you elaborate? I mean in general virtualization just focuses on
>>>> emulating hardware behavior. If a native machine is able to run any LE,
>>>> the
>>>> virtual machine should be able to as well (of course, with guest's
>>>> IA32_FEATURE_CONTROL[bit 17] set).
>>>
>>>
>>> I strongly disagree.  I can imagine two classes of sensible policies
>>> for launch control:
>>>
>>> 1. Allow everything.  This seems quite sensible to me.
>>>
>>> 2. Allow some things, and make sure that VMs have at least as
>>> restrictive a policy as host root has.  After all, what's the point of
>>> restricting enclaves in the host if host code can simply spawn a
>>> little VM to run otherwise-disallowed enclaves?
>>
>>
>> What's the current SGX driver launch control policy? Yes allow everything
>> works for KVM so let's skip this. Are we going to support allowing several
>> LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE
>> stuff, but I don't know the details.
>>
>> I am trying to find a way that we can both not break host launch control
>> policy, and be consistent to HW behavior (from guest's view). Currently we
>> can create a KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn either
>> enabled or disabled. I introduced a Qemu parameter 'lewr' for this purpose.
>> Actually I introduced below Qemu SGX parameters for creating guest:
>>
>>         -sgx epc=<size>,lehash='SHA-256 hash',lewr
>>
>> where 'epc' specifies guest's EPC size, lehash specifies (initial) value of
>> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether guest is allowed
>> to change guest's IA32_SGXLEPUBKEYHASHn at runtime.
>>
>> If host only allows one single LE to run, KVM can add a restriction that only
>> allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
>> disabled, so that only host allowed (single) hash can be used by guest. From
>> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has
>> IA32_SGXLEPUBKEYHASHn with default value to be host allowed (single) hash.
>>
>> If host allows several LEs (but not everything), and if we create guest with
>> 'lewr', then the behavior is not consistent with HW behavior, as from
>> guest's hardware's point of view, we can actually run any LE but we have to
>> tell guest that you are only allowed to change IA32_SGXLEPUBKEYHASHn to some
>> specific values. One compromise solution is we don't allow to create guest
>> with 'lewr' specified, and at the same time, only allow to create guest with
>> host approved hashes specified in 'lehash'. This will make guest's behavior
>> consistent to HW behavior but only allows guest to run one LE (which is
>> specified by 'lehash' when guest is created).
>
> I'm not sure I entirely agree for a couple reasons.
>
> 1. I wouldn't be surprised if the kernel ends up implementing a policy
> in which it checks all enclaves (not just LEs) for acceptability.  In
> fact, if the kernel sticks with the "no LE at all or just
> kernel-internal LE", then checking enclaves directly against some
> admin- or distro-provided signer list seems reasonable.  This type of
> policy can't be forwarded to a guest by restricting allowed LE
> signers.  But this is mostly speculation since AFAIK no one has
> seriously proposed any particular policy support and the plan was to
> not have this for the initial implementation.
>
> 2. While matching hardware behavior is nice in principle, there
> doesn't seem to be useful hardware behavior to match here.  If the
> host had a list of five allowed LE signers, how exactly would it
> restrict the MSRs?  They're not written atomically, so you can't
> directly tell what's being written.

In this case I actually plan to only allow creating a guest with runtime 
change to IA32_SGXLEPUBKEYHASHn disabled (i.e., without 'lewr' 
specified). If 'lewr' is specified, creating the guest will fail. And we 
only allow creating a guest with host-allowed hash values (via 
'lehash=hash-value'); if the 'hash-value' specified by 'lehash' is not 
allowed by the host, we also fail to create the guest.

We only allow creating a guest with 'lewr' specified when the host 
allows everything.

But in this way we are restricting the guest OS's ability to run LEs, as 
only the one LE specified by the 'lehash' parameter can be run. I think 
this won't hurt much, though, as multiple guests are still able to run 
different LEs?

> Also, the only way to fail an MSR
> write is to send #GP, and Windows (and even Linux) may not expect
> that.  Linux doesn't panic due to #GP on MSR writes these days, but
> you still get a big fat warning.  I wouldn't be at all surprised if
> Windows BSODs.

We cannot let writes of some particular values to the MSRs succeed while 
injecting #GP for writes of other values to the same MSRs. So #GP is not 
an option.

> ENCLS[EINIT], on the other hand, returns an actual
> error code.  I'm not sure that a sensible error code exists
> ("SGX_HYPERVISOR_SAID_NO?", perhaps),

Looks like no such error code exists. And we cannot return such an error 
code to the guest, as it would only be valid when ENCLS is run in the 
hypervisor.

> but SGX_INVALID_EINITTOKEN seems
> to mean, more or less, "the CPU thinks you're not authorized to do
> this", so forcing that error code could be entirely reasonable.
>
> If the host policy is to allow a list of LE signers, you could return
> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
> the list.

But this would be inconsistent with HW behavior. If the hash value in 
the guest's IA32_SGXLEPUBKEYHASHn matches the one used by EINIT, EINIT 
is not supposed to return SGX_INVALID_EINITTOKEN.

I think from the VMM's perspective, emulating HW behavior consistently 
with real HW behavior is very important.

Paolo, would you provide your comments?

>
> --Andy
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-16 14:21             ` Paolo Bonzini
@ 2017-05-18  7:54               ` Huang, Kai
  2017-05-18  8:58                 ` Paolo Bonzini
  0 siblings, 1 reply; 78+ messages in thread
From: Huang, Kai @ 2017-05-18  7:54 UTC (permalink / raw)
  To: Paolo Bonzini, Andy Lutomirski
  Cc: Kai Huang, Radim Krcmar, kvm list,
	intel-sgx-kernel-dev@lists.01.org, haim.cohen



On 5/17/2017 2:21 AM, Paolo Bonzini wrote:
>
>
> On 16/05/2017 02:48, Huang, Kai wrote:
>>
>>
>> If host only allows one single LE to run, KVM can add a restriction that
>> only allows to create KVM guest with runtime change to
>> IA32_SGXLEPUBKEYHASHn disabled, so that only host allowed (single) hash
>> can be used by guest. From guest's view, it simply has
>> IA32_FEATURE_CONTROL[bit17] cleared and has IA32_SGXLEPUBKEYHASHn with
>> default value to be host allowed (single) hash.
>>
>> If host allows several LEs (but not everything), and if we create guest
>> with 'lewr', then the behavior is not consistent with HW behavior, as
>> from guest's hardware's point of view, we can actually run any LE but we
>> have to tell guest that you are only allowed to change
>> IA32_SGXLEPUBKEYHASHn to some specific values. One compromise solution
>> is we don't allow to create guest with 'lewr' specified, and at the
>> same time, only allow to create guest with host approved hashes specified
>> in 'lehash'. This will make guest's behavior consistent to HW behavior
>> but only allows guest to run one LE (which is specified by 'lehash' when
>> guest is created).
>>
>> I'd like to hear comments from you guys.
>>
>> Paolo, do you also have comments here from KVM's side?
>
> I would start with read-only LE hash (same as the host), which is a
> valid configuration anyway.  Then later we can trap EINIT to emulate
> IA32_SGXLEPUBKEYHASHn.

You mean we can start with creating the guest without Qemu 'lewr' 
parameter support, always disallowing the guest to change 
IA32_SGXLEPUBKEYHASHn? Even in this case, KVM still needs to emulate 
IA32_SGXLEPUBKEYHASHn (allowing MSR reads but not writes), and to write 
the guest's value to the physical MSRs when running the guest (trapping 
EINIT and writing the MSRs during EINIT is really just a performance 
optimization), because the host can run multiple LEs and change the 
MSRs. Your suggestion only works when runtime change to 
IA32_SGXLEPUBKEYHASHn is disabled on the host (meaning the physical 
machine).
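
For illustration, the read-only emulation side could be a sketch like
the below, with msr_ia32_sgxlepubkeyhash[] as an assumed shadow field on
vcpu_vmx:

/*
 * Sketch only: the guest may read the hash MSRs, but writes take #GP
 * because the guest sees IA32_FEATURE_CONTROL bit 17 clear.  The shadow
 * value is still what gets loaded into the real MSRs around EINIT.
 */
static int sgx_lepubkeyhash_get(struct kvm_vcpu *vcpu, struct msr_data *msr)
{
        struct vcpu_vmx *vmx = to_vmx(vcpu);

        msr->data = vmx->msr_ia32_sgxlepubkeyhash[msr->index -
                                                  MSR_IA32_SGXLEPUBKEYHASH0];
        return 0;
}

static int sgx_lepubkeyhash_set(struct kvm_vcpu *vcpu, struct msr_data *msr)
{
        return 1;       /* non-zero return injects #GP */
}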

Thanks,
-Kai
>
> Paolo
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-17 14:21           ` Sean Christopherson
@ 2017-05-18  8:14             ` Huang, Kai
  2017-05-20 21:55               ` Andy Lutomirski
  0 siblings, 1 reply; 78+ messages in thread
From: Huang, Kai @ 2017-05-18  8:14 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Andy Lutomirski
  Cc: Paolo Bonzini, haim.cohen, intel-sgx-kernel-dev, kvm list, Radim Krcmar



On 5/18/2017 2:21 AM, Sean Christopherson wrote:
> On Tue, 2017-05-16 at 11:56 +1200, Huang, Kai wrote:
>>
>> On 5/16/2017 12:46 AM, Jarkko Sakkinen wrote:
>>>
>>> On Thu, May 11, 2017 at 08:28:37PM -0700, Andy Lutomirski wrote:
>>>>
>>>> [resending due to some kind of kernel.org glitch -- sorry if anyone
>>>> gets duplicates]
>>>>
>>>> On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com>
>>>> wrote:
>>>>>
>>>>> My current patch is based on this assumption. For KVM guest, naturally,
>>>>> we
>>>>> will write the cached value to real MSRs when vcpu is scheduled in. For
>>>>> host, SGX driver should write its own value to MSRs when it performs
>>>>> EINIT
>>>>> for LE.
>>>> This seems unnecessarily slow (perhaps *extremely* slow) to me.  I
>>>> would propose a totally different solution:
>>>>
>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>> will take the mutex, compare the percpu variable to the desired value,
>>>> and, if it's different, do WRMSR and update the percpu variable.
>>> This is exactly what I've been suggesting internally: trap EINIT and
>>> check the value and write conditionally.
>>>
>>> I think this would be the best starting point.
>> OK. Assuming we are going to have this percpu variable for
>> IA32_SGXLEPUBKEYHASHn, I suppose KVM also will update guest's value to
>> this percpu variable after KVM writes guest's value to hardware MSR? And
>> host (SGX driver) needs to do the same thing (check the value and write
>> conditionally), correct?
>>
>> Thanks,
>> -Kai
>
> Yes, the percpu variable is simply a cache so that the kernel doesn't have to do
> four RDMSRs every time it wants to do EINIT.  KVM would still maintain shadow
> copies of the MSRs for each vcpu for emulating RDMSR, WRMSR and EINIT.  I don't
> think KVM would even need to be aware of the percpu variable, i.e. the entire
> lock->(rd/wr)msr->EINIT->unlock sequence can probably be encapsulated in a
> single function that is called from both the primary SGX driver and from KVM.
>

You are making the assumption that KVM will run ENCLS on behalf of the 
guest. :)

If we don't need to look into guest's SIGSTRUCT, EINITTOKEN, etc, then I 
actually prefer using MTF, as with MTF we don't have to do all the 
remapping of guest virtual addresses to KVM virtual addresses. But if we 
need to look into guest's ENCLS parameters, for example, to locate the 
physical SECS page, or to update a physical EPC page's info (which KVM 
needs to maintain), maybe we can choose running ENCLS on behalf of the 
guest.

But if we are going to run ENCLS on behalf of the guest, I think 
providing a single function which does the MSR writes and EINIT for KVM 
should be a good idea.
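
For ordinary guest memory (SIGSTRUCT, PAGEINFO and friends) the
remapping step might look roughly like the sketch below; guest SECS/EPC
pages are special (not normal memory) and would need KVM's own EPC
bookkeeping instead:

/*
 * Sketch only: translate a guest virtual address from an ENCLS
 * parameter into a host mapping before running ENCLS in root mode.
 * The caller must kunmap() and put_page() when done; error handling
 * for the exception case is elided.
 */
static void *sgx_map_guest_page(struct kvm_vcpu *vcpu, gva_t gva)
{
        struct x86_exception ex;
        struct page *page;
        gpa_t gpa;

        gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, &ex);
        if (gpa == UNMAPPED_GVA)
                return NULL;

        page = gfn_to_page(vcpu->kvm, gpa_to_gfn(gpa));
        if (is_error_page(page))
                return NULL;

        return kmap(page) + offset_in_page(gpa);
}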

Thanks,
-Kai

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-18  7:54               ` Huang, Kai
@ 2017-05-18  8:58                 ` Paolo Bonzini
  0 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2017-05-18  8:58 UTC (permalink / raw)
  To: Huang, Kai, Andy Lutomirski
  Cc: Kai Huang, Radim Krcmar, kvm list,
	intel-sgx-kernel-dev@lists.01.org, haim.cohen



On 18/05/2017 09:54, Huang, Kai wrote:
>>
>> I would start with read-only LE hash (same as the host), which is a
>> valid configuration anyway.  Then later we can trap EINIT to emulate
>> IA32_SGXLEPUBKEYHASHn.
> 
> You mean we can start with creating guest without Qemu 'lewr' parameter
> support, and always disallowing guest to change IA32_SGXLEPUBKEYHASHn?
> Even in this way, KVM still needs to emulate IA32_SGXLEPUBKEYHASHn (just
> allow MSR reading but not writing), and write guest's value to physical
> MSRs when running guest (trapping EINIT and write MSRs during EINIT is
> really just performance optimization). Because host can run multiple LEs
> and change MSRs.

Oh, I didn't know this.  So I guess there isn't much benefit in skipping
the trapping of EINIT.

Paolo

> Your suggestion only works when runtime change to
> IA32_SGXLEPUBKEYHASHn is disabled on host (meaning physical machine).

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-15 23:56         ` Huang, Kai
  2017-05-16 14:23           ` Paolo Bonzini
  2017-05-17 14:21           ` Sean Christopherson
@ 2017-05-20 13:23           ` Jarkko Sakkinen
  2 siblings, 0 replies; 78+ messages in thread
From: Jarkko Sakkinen @ 2017-05-20 13:23 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Andy Lutomirski, kvm list, Radim Krcmar, haim.cohen,
	intel-sgx-kernel-dev, Paolo Bonzini

On Tue, May 16, 2017 at 11:56:38AM +1200, Huang, Kai wrote:
> 
> 
> On 5/16/2017 12:46 AM, Jarkko Sakkinen wrote:
> > On Thu, May 11, 2017 at 08:28:37PM -0700, Andy Lutomirski wrote:
> > > [resending due to some kind of kernel.org glitch -- sorry if anyone
> > > gets duplicates]
> > > 
> > > On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> > > > My current patch is based on this assumption. For KVM guest, naturally, we
> > > > will write the cached value to real MSRs when vcpu is scheduled in. For
> > > > host, SGX driver should write its own value to MSRs when it performs EINIT
> > > > for LE.
> > > 
> > > This seems unnecessarily slow (perhaps *extremely* slow) to me.  I
> > > would propose a totally different solution:
> > > 
> > > Have a percpu variable that stores the current SGXLEPUBKEYHASH along
> > > with whatever lock is needed (probably just a mutex).  Users of EINIT
> > > will take the mutex, compare the percpu variable to the desired value,
> > > and, if it's different, do WRMSR and update the percpu variable.
> > 
> > This is exactly what I've been suggesting internally: trap EINIT and
> > check the value and write conditionally.
> > 
> > I think this would be the best starting point.
> 
> OK. Assuming we are going to have this percpu variable for
> IA32_SGXLEPUBKEYHASHn, I suppose KVM also will update guest's value to this
> percpu variable after KVM writes guest's value to hardware MSR? And host
> (SGX driver) needs to do the same thing (check the value and write
> conditionally), correct?
> 
> Thanks,
> -Kai

This is how I would understand it, yes.

/Jarkko

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-18  8:14             ` Huang, Kai
@ 2017-05-20 21:55               ` Andy Lutomirski
  2017-05-23  5:43                 ` Huang, Kai
  0 siblings, 1 reply; 78+ messages in thread
From: Andy Lutomirski @ 2017-05-20 21:55 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Sean Christopherson, Jarkko Sakkinen, Andy Lutomirski,
	Paolo Bonzini, haim.cohen, intel-sgx-kernel-dev, kvm list,
	Radim Krcmar

On Thu, May 18, 2017 at 1:14 AM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> You are making the assumption that KVM will run ENCLS on behalf of the guest. :)
>
> If we don't need to look into guest's SIGSTRUCT, EINITTOKEN, etc, then I
> actually prefer using MTF, as with MTF we don't have to do all the
> remapping guest's virtual address to KVM's virtual address thing, if we
> don't need to look into guest's ENCLS parameter. But if we need to look into
> guest's ENCLS parameters, for example, to locate physical SECS page, or to
> update physical EPC page's info (that KVM needs to maintain), maybe we can
> choose running ENCLS on behalf of guest.

After thinking about this a bit, I don't see how MTF helps.
Currently, KVM works kind of like this:

local_irq_disable();
set up stuff;
VMRESUME;
restore some host state;
local_irq_enable();

If the guest is going to run with the EINIT-exiting bit clear, the
only way I see this working is to modify KVM along the lines of:

local_irq_disable();
set up stuff;
if (condition here) {
  WRMSR to SGXLEPUBKEYHASH;
  update percpu shadow copy;
  clear EINIT-exiting bit;
} else {
  set EINIT-exiting bit;
}
VMRESUME;
restore some host state;
local_irq_enable();

where "condition here" might be something like "the last VMRESUME
exited due to EINIT".

I don't see how MTF helps much.  And if I were the KVM maintainer, I
would probably prefer to trap EINIT instead of adding a special case
to the main vm entry code.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-20 21:55               ` Andy Lutomirski
@ 2017-05-23  5:43                 ` Huang, Kai
  2017-05-23  5:55                   ` Huang, Kai
  2017-05-23 16:34                   ` Andy Lutomirski
  0 siblings, 2 replies; 78+ messages in thread
From: Huang, Kai @ 2017-05-23  5:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Sean Christopherson, Jarkko Sakkinen, Paolo Bonzini, haim.cohen,
	intel-sgx-kernel-dev, kvm list, Radim Krcmar



On 5/21/2017 9:55 AM, Andy Lutomirski wrote:
> On Thu, May 18, 2017 at 1:14 AM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>> You are making the assumption that KVM will run ENCLS on behalf of the guest. :)
>>
>> If we don't need to look into guest's SIGSTRUCT, EINITTOKEN, etc, then I
>> actually prefer using MTF, as with MTF we don't have to do all the
>> remapping guest's virtual address to KVM's virtual address thing, if we
>> don't need to look into guest's ENCLS parameter. But if we need to look into
>> guest's ENCLS parameters, for example, to locate physical SECS page, or to
>> update physical EPC page's info (that KVM needs to maintain), maybe we can
>> choose running ENCLS on behalf of guest.
>
> After thinking about this a bit, I don't see how MTF helps.
> Currently, KVM works kind of like this:
>
> local_irq_disable();
> set up stuff;
> VMRESUME;
> restore some host state;
> local_irq_enable();
>
> If the guest is going to run with the EINIT-exiting bit clear, the
> only way I see this working is to modify KVM along the lines of:
>
> local_irq_disable();
> set up stuff;
> if (condition here) {
>   WRMSR to SGXLEPUBKEYHASH;
>   update percpu shadow copy;
>   clear EINIT-exiting bit;
> } else {
>   set EINIT-exiting bit;
> }
> VMRESUME;
> restore some host state;
> local_irq_enable();
>
> where "condition here" might be something like "the last VMRESUME
> exited due to EINIT".
>
> I don't see how MTF helps much.  And if I were the KVM maintainer, I
> would probably prefer to trap EINIT instead of adding a special case
> to the main vm entry code.
>

Hi Andy,

Thanks for your comments. However, I didn't intend to use MTF in that 
way. The idea of using MTF (along with ENCLS VMEXIT) is that, by turning 
on MTF VMEXIT upon ENCLS VMEXIT, we are able to take a single-step 
VMEXIT right after ENCLS, so that ENCLS runs in the guest single-stepped.

Let me explain below how the two approaches work in general, so that we 
can decide which is better. Only trapping EINIT in order to update 
IA32_SGXLEPUBKEYHASHn is relatively simple, but I'd compare the two in a 
more general way, assuming we may want to trap more ENCLS leaves in 
order to, e.g., track EPC/enclave status/info, in the future to support, 
e.g., EPC oversubscription between KVM guests.

The diagrams below show the basic idea of the two approaches.


	--------------------------------------------------------------
			|	ENCLS		|
	--------------------------------------------------------------
			|	   	       /|\
	ENCLS VMEXIT	|			| VMENTRY
			|			|
		       \|/			|
		
     		1) identify which ENCLS leaf (RAX)
     		2) reconstruct/remap guest's ENCLS parameters, ex:
			- remap any guest VA (virtual address) to KVM VA
                 	- reconstruct PAGEINFO
         	3) do whatever needed before ENCLS, ex:
			- updating MSRs before EINIT
     		4) run ENCLS on behalf of guest, and skip ENCLS
		5) emulate ENCLS result (succeeded or not)
			- update guest's RAX-RDX.
			- and/or inject #GP (or #UD).
		6) do whatever needed after ENCLS, ex:
			- updating EPC/Enclave status/info

		   	1) Run ENCLS on behalf of guest


	--------------------------------------------------------------
			 |	ENCLS		   |
	--------------------------------------------------------------
			|/|\		          |/|\
	ENCLS VMEXIT	| | VMENTRY    MTF VMEXIT | | VMENTRY
		        | |	                  | |
		       \|/|		         \|/|
	1) Turn off ENCLS VMEXIT      1) Turn off MTF VMEXIT
	2) Turn on MTF VMEXIT	      2) Turn on ENCLS VMEXIT
	3) cache ENCLS parameters     3) check whether ENCLS has run
	   (ENCLS changes RAX)        4) check whether ENCLS succeeded
	4) do whatever needed before     or not.
            ENCLS                      5) do whatever needed after ENCLS

			2) Using MTF

The concern with running ENCLS on behalf of the guest is emulating ENCLS 
errors. KVM needs to *correctly* emulate ENCLS errors to the guest so 
that the error we inject reflects the right behavior, as if ENCLS had 
run in the guest. Running ENCLS in root mode may potentially behave 
differently from running ENCLS in non-root mode, therefore we have to go 
through all possible error codes to make sure we can emulate them. And 
for some error codes, e.g., SGX_LOCKFAIL, we can handle them in KVM and 
don't have to inject an error into the guest. So the point is we have to 
go through all error codes to make sure KVM can emulate them correctly 
for the guest. Another argument is that Intel may add new error codes in 
the future when more SGX functionality is introduced, so emulating error 
codes may become a burden.

Using MTF is also a little bit tricky, as when we turn on MTF VMEXIT 
upon ENCLS VMEXIT, the MTF won't necessarily be pending at the end of 
that ENCLS. For example, MTF may be pending at the end of an interrupt 
(cannot recall exactly) if an event is pending during the VMENTRY from 
the ENCLS VMEXIT. Therefore we have to do additional work to check 
whether this MTF VMEXIT really happened after ENCLS ran (step 3 above). 
And depending on what we need to do, we may need to check whether ENCLS 
succeeded or not in the guest, which is also tricky, as ENCLS can fail 
by either setting an error code in RAX, or by generating #GP or #UD 
(step 4 above). We may still need to do gva->gpa->hpa translation, e.g., 
in order to locate the EPC/SECS page and update its status, depending on 
the purpose of trapping ENCLS.

But by using MTF, we don't have to worry about ENCLS error emulation at 
all, as ENCLS runs in the guest, thus we don't need to worry about the 
root-mode vs. non-root-mode difference. I think this is the major reason 
to use MTF.
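
For illustration, the control toggling itself would be small, something
like the sketch below (with SECONDARY_EXEC_ENCLS_EXITING standing for
the architectural ENCLS-exiting bit; the tricky "did ENCLS really
run/succeed" checks described above are elided):

/* Sketch only: single-step ENCLS via MTF. */
static int handle_encls(struct kvm_vcpu *vcpu)
{
        vmcs_clear_bits(SECONDARY_VM_EXEC_CONTROL,
                        SECONDARY_EXEC_ENCLS_EXITING);
        vmcs_set_bits(CPU_BASED_VM_EXEC_CONTROL,
                      CPU_BASED_MONITOR_TRAP_FLAG);
        /* cache RAX (leaf) and RBX/RCX/RDX (parameters) for the MTF exit */
        return 1;       /* re-enter the guest; ENCLS now runs natively */
}

static int handle_monitor_trap(struct kvm_vcpu *vcpu)
{
        vmcs_clear_bits(CPU_BASED_VM_EXEC_CONTROL,
                        CPU_BASED_MONITOR_TRAP_FLAG);
        vmcs_set_bits(SECONDARY_VM_EXEC_CONTROL,
                      SECONDARY_EXEC_ENCLS_EXITING);
        /* verify ENCLS actually executed and inspect its RAX/#GP/#UD result */
        return 1;
}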

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-23  5:43                 ` Huang, Kai
@ 2017-05-23  5:55                   ` Huang, Kai
  2017-05-23 16:34                   ` Andy Lutomirski
  1 sibling, 0 replies; 78+ messages in thread
From: Huang, Kai @ 2017-05-23  5:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Sean Christopherson, Jarkko Sakkinen, Paolo Bonzini, haim.cohen,
	intel-sgx-kernel-dev, kvm list, Radim Krcmar



On 5/23/2017 5:43 PM, Huang, Kai wrote:
>
>
> On 5/21/2017 9:55 AM, Andy Lutomirski wrote:
>> On Thu, May 18, 2017 at 1:14 AM, Huang, Kai
>> <kai.huang@linux.intel.com> wrote:
>>> You are making the assumption that KVM will run ENCLS on behalf of the guest. :)
>>>
>>> If we don't need to look into guest's SIGSTRUCT, EINITTOKEN, etc, then I
>>> actually prefer using MTF, as with MTF we don't have to do all the
>>> remapping guest's virtual address to KVM's virtual address thing, if we
>>> don't need to look into guest's ENCLS parameter. But if we need to
>>> look into
>>> guest's ENCLS parameters, for example, to locate physical SECS page,
>>> or to
>>> update physical EPC page's info (that KVM needs to maintain), maybe
>>> we can
>>> choose running ENCLS on behalf of guest.
>>
>> After thinking about this a bit, I don't see how MTF helps.
>> Currently, KVM works kind of like this:
>>
>> local_irq_disable();
>> set up stuff;
>> VMRESUME;
>> restore some host state;
>> local_irq_enable();
>>
>> If the guest is going to run with the EINIT-exiting bit clear, the
>> only way I see this working is to modify KVM along the lines of:
>>
>> local_irq_disable();
>> set up stuff;
>> if (condition here) {
>>   WRMSR to SGXLEPUBKEYHASH;
>>   update percpu shadow copy;
>>   clear EINIT-exiting bit;
>> } else {
>>   set EINIT-exiting bit;
>> }
>> VMRESUME;
>> restore some host state;
>> local_irq_enable();
>>
>> where "condition here" might be something like "the last VMRESUME
>> exited due to EINIT".
>>
>> I don't see how MTF helps much.  And if I were the KVM maintainer, I
>> would probably prefer to trap EINIT instead of adding a special case
>> to the main vm entry code.
>>
>
> Hi Andy,
>
> Thanks for your comments. However I didn't intend to use MTF in your
> way. The idea of using MTF (along with ENCLS VMEXIT) is, by turning on
> MTF VMEXIT upon ENCLS VMEXIT, we are able to mark a single step VMEXIT
> after ENCLS so that ENCLS can run in guest as single step.
>
> Let me explain how the two approaches work below in general, so that we
> can decide which is better. Only trapping EINIT in order to update
> IA32_SGXLEPUBKEYHASHn is relatively simpler but I'd compare the two in
> more general way, assuming we may want to trap more ENCLS in order to,
> ex, track EPC/Enclave status/info, in the future to support, ex, EPC
> oversubscription between KVM guests.
>
> Below diagram shows the basic idea of the two approaches.
>
>
>     --------------------------------------------------------------
>                     |       ENCLS           |
>     --------------------------------------------------------------
>                     |                      /|\
>     ENCLS VMEXIT    |                       | VMENTRY
>                     |                       |
>                    \|/                      |

Looks like the diagrams got broken. Sorry, I need to verify before 
sending next time. But it looks like they are still understandable?

Thanks,
Kai
>
>             1) identify which ENCLS leaf (RAX)
>             2) reconstruct/remap guest's ENCLS parameters, ex:
>             - remap any guest VA (virtual address) to KVM VA
>                     - reconstruct PAGEINFO
>             3) do whatever needed before ENCLS, ex:
>             - updating MSRs before EINIT
>             4) run ENCLS on behalf of guest, and skip ENCLS
>         5) emulate ENCLS result (succeeded or not)
>             - update guest's RAX-RDX.
>             - and/or inject #GP (or #UD).
>         6) do whatever needed after ENCLS, ex:
>             - updating EPC/Enclave status/info
>
>                1) Run ENCLS on behalf of guest
>
>
>     --------------------------------------------------------------
>                      |      ENCLS              |
>     --------------------------------------------------------------
>                     |/|\                      |/|\
>     ENCLS VMEXIT    | | VMENTRY    MTF VMEXIT | | VMENTRY
>                     | |                       | |
>                    \|/|                      \|/|
>     1) Turn off ENCLS VMEXIT      1) Turn off MTF VMEXIT
>     2) Turn on MTF VMEXIT         2) Turn on ENCLS VMEXIT
>     3) cache ENCLS parameters     3) check whether ENCLS has run
>        (ENCLS changes RAX)        4) check whether ENCLS succeeded
>     4) do whatever needed before     or not
>        ENCLS                      5) do whatever needed after ENCLS
>
>             2) Using MTF
>
> The concern of running ENCLS on behalf of guest is emulating ENCLS
> error. KVM needs to *correctly* emulate ENCLS error to guest so that the
> error we inject to guest can reflect the right behavior as if ENCLS run
> in guest. Running ENCLS in root-mode may be potentially different
> running ENCLS in non-root mode, therefore we have to go through all
> possible error codes to make sure we can emulate. And for some error
> code, ex, SGX_LOCKFAIL, we can handle it in KVM and don't have to inject
> error to guest. So the point is we have to go through all error code to
> make sure KVM can emulate ENCLS error code correctly for guest.
> Another argument is Intel may add new error codes in the future when
> more SGX functionalities are introduced, so emulating error code may be
> a burden.
>
> Using MTF is also a little bit tricky, as when we turn on MTF VMEXIT
> upon ENCLS VMEXIT, the MTF won't be absolutely pending at end of that
> ENCLS. For example, MTF may be pending at end of interrupt (cannot
> recall exactly) if event is pending during VMENTRY from ENCLS VMEXIT.
> Therefore we have to do additional thing to check whether this MTF
> VMEXIT really happens after ENCLS run (step 3 above). And depending on
> what we need to do, we may need to check whether ENCLS succeeded or not
> in guest, which is also tricky, as ENCLS can fail in either setting
> error code in RAX, or generating #GP or #UD (step 4 above). We may still
> need to do gva->gpa->hpa, ex, in order to locate EPC/SECS page and
> update status, depending on the purpose of trapping ENCLS.
>
> But by using MTF, we don't have to worry about ENCLS error emulation, as
> ENCLS runs in guest, thus we don't need to worry about this root-mode
> and non-root mode difference. I think this is the major reason that we
> want to use MTF.
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-23  5:43                 ` Huang, Kai
  2017-05-23  5:55                   ` Huang, Kai
@ 2017-05-23 16:34                   ` Andy Lutomirski
  2017-05-23 16:43                     ` Paolo Bonzini
  1 sibling, 1 reply; 78+ messages in thread
From: Andy Lutomirski @ 2017-05-23 16:34 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Andy Lutomirski, Sean Christopherson, Jarkko Sakkinen,
	Paolo Bonzini, haim.cohen, intel-sgx-kernel-dev, kvm list,
	Radim Krcmar

On Mon, May 22, 2017 at 10:43 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>
>
> On 5/21/2017 9:55 AM, Andy Lutomirski wrote:
>>
>> On Thu, May 18, 2017 at 1:14 AM, Huang, Kai <kai.huang@linux.intel.com>
>> wrote:
>>>
>>> You are making the assumption that KVM will run ENCLS on behalf of the guest. :)
>>>
>>> If we don't need to look into guest's SIGSTRUCT, EINITTOKEN, etc, then I
>>> actually prefer using MTF, as with MTF we don't have to do all the
>>> remapping guest's virtual address to KVM's virtual address thing, if we
>>> don't need to look into guest's ENCLS parameter. But if we need to look
>>> into
>>> guest's ENCLS parameters, for example, to locate physical SECS page, or
>>> to
>>> update physical EPC page's info (that KVM needs to maintain), maybe we
>>> can
>>> choose running ENCLS on behalf of guest.
>>
>>
>> After thinking about this a bit, I don't see how MTF helps.
>> Currently, KVM works kind of like this:
>>
>> local_irq_disable();
>> set up stuff;
>> VMRESUME;
>> restore some host state;
>> local_irq_enable();
>>
>> If the guest is going to run with the EINIT-exiting bit clear, the
>> only way I see this working is to modify KVM along the lines of:
>>
>> local_irq_disable();
>> set up stuff;
>> if (condition here) {
>>   WRMSR to SGXLEPUBKEYHASH;
>>   update percpu shadow copy;
>>   clear EINIT-exiting bit;
>> } else {
>>   set EINIT-exiting bit;
>> }
>> VMRESUME;
>> restore some host state;
>> local_irq_enable();
>>
>> where "condition here" might be something like "the last VMRESUME
>> exited due to EINIT".
>>
>> I don't see how MTF helps much.  And if I were the KVM maintainer, I
>> would probably prefer to trap EINIT instead of adding a special case
>> to the main vm entry code.
>>
>
> Hi Andy,
>
> Thanks for your comments. However, I didn't intend to use MTF in that way.
> The idea of using MTF (along with ENCLS VMEXIT) is that, by turning on MTF
> VMEXIT upon ENCLS VMEXIT, we are able to arm a single-step VMEXIT after
> ENCLS, so that ENCLS runs in the guest as a single step.
>
> Let me explain below how the two approaches work in general, so that we can
> decide which is better. Trapping only EINIT in order to update
> IA32_SGXLEPUBKEYHASHn is relatively simple, but I'd rather compare the two
> more generally, assuming we may want to trap more ENCLS leaf functions in
> the future, e.g., to track EPC/enclave status in order to support EPC
> oversubscription between KVM guests.
>
> The diagram below shows the basic idea of the two approaches.
>

...

>
>         --------------------------------------------------------------
>                          |      ENCLS              |
>         --------------------------------------------------------------
>                         |/|\                      |/|\
>         ENCLS VMEXIT    | | VMENTRY    MTF VMEXIT | | VMENTRY
>                         | |                       | |
>                        \|/|                      \|/|
>         1) Turn off ENCLS VMEXIT      1) Turn off MTF VMEXIT
>         2) turn on MTF VMEXIT         2) Turn on ENCLS VMEXIT
>         3) cache ENCLS parameters     3) check whether ENCLS has run
>            (ENCLS changes RAX)        4) check whether ENCLS succeeded
>         4) do whatever needed before     or not.
>            ENCLS                      5) do whatever needed after ENCLS
>
>                         2) Using MTF
>

...

> Using MTF is also a little bit tricky: when we turn on MTF VMEXIT upon
> ENCLS VMEXIT, the MTF won't necessarily be pending at the end of that
> ENCLS. For example, MTF may instead be pending at the end of an interrupt
> (I cannot recall exactly) if an event is pending during the VMENTRY from
> the ENCLS VMEXIT. Therefore we have to do additional work to check whether
> this MTF VMEXIT really happened after ENCLS ran (step 3 above). And
> depending on what we need to do, we may need to check whether ENCLS
> succeeded in the guest, which is also tricky, as ENCLS can fail either by
> setting an error code in RAX or by generating #GP or #UD (step 4 above).
> We may still need to do the gva->gpa->hpa translation, e.g., in order to
> locate the EPC/SECS page and update its status, depending on the purpose
> of trapping ENCLS.

I think there are some issues here.

First, you're making a big assumption that, when you resume the guest
with MTF set, the instruction that gets executed is still
ENCLS[EINIT].  That's not guaranteed as is -- you could race against
another vCPU that changes the instruction, the instruction could be in
IO space, host userspace could be messing with you, etc.  Second, I
don't think there's any precedent at all in KVM for doing this.
Third, you still need to make sure that the MSRs retain the value you
want them to have by the time ENCLS happens.  I think that, by the
time you resolve all of these issues, it'll look a lot like the
pseudocode I emailed out, and MTF won't be necessary any more.

>
> But by using MTF, we don't have to worry about ENCLS error emulation:
> ENCLS runs in the guest, so we don't need to worry about the root-mode
> vs. non-root-mode difference. I think this is the major reason to use
> MTF.

I don't see why error emulation is hard.  If the host does ENCLS on
behalf of the guest and it returns an error, can't you return exactly
the same error to the guest with no further processing?  The only
tricky case is where the host rejects due to its own policy and you
have to choose an error code.
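
(A minimal sketch of that "forward the error" flow, assuming a
hypothetical host-side sgx_einit() helper that runs ENCLS[EINIT] with
the guest's already-translated parameters and returns EINIT's RAX error
code, 0 on success:)

        static int handle_einit(struct kvm_vcpu *vcpu)
        {
                int ret = sgx_einit(vcpu);      /* host runs ENCLS[EINIT] */

                /* Host policy rejection has no architectural error code;
                 * SGX_INVALID_EINITTOKEN is arguably the closest fit. */
                if (ret == -EACCES)
                        ret = SGX_INVALID_EINITTOKEN;

                /* The guest sees exactly the RAX value EINIT produced. */
                kvm_register_write(vcpu, VCPU_REGS_RAX, ret);
                return kvm_skip_emulated_instruction(vcpu);
        }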

--Andy

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-23 16:34                   ` Andy Lutomirski
@ 2017-05-23 16:43                     ` Paolo Bonzini
  2017-05-24  8:20                       ` Huang, Kai
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2017-05-23 16:43 UTC (permalink / raw)
  To: Andy Lutomirski, Huang, Kai
  Cc: Sean Christopherson, Jarkko Sakkinen, haim.cohen,
	intel-sgx-kernel-dev@lists.01.org, kvm list, Radim Krcmar



On 23/05/2017 18:34, Andy Lutomirski wrote:
> 
>> Using MTF is also a little bit tricky: when we turn on MTF VMEXIT upon
>> ENCLS VMEXIT, the MTF won't necessarily be pending at the end of that
>> ENCLS. For example, MTF may instead be pending at the end of an interrupt
>> (I cannot recall exactly) if an event is pending during the VMENTRY from
>> the ENCLS VMEXIT. Therefore we have to do additional work to check whether
>> this MTF VMEXIT really happened after ENCLS ran (step 3 above). And
>> depending on what we need to do, we may need to check whether ENCLS
>> succeeded in the guest, which is also tricky, as ENCLS can fail either by
>> setting an error code in RAX or by generating #GP or #UD (step 4 above).
>> We may still need to do the gva->gpa->hpa translation, e.g., in order to
>> locate the EPC/SECS page and update its status, depending on the purpose
>> of trapping ENCLS.
> I think there are some issues here.
> 
> First, you're making a big assumption that, when you resume the guest
> with MTF set, the instruction that gets executed is still
> ENCLS[EINIT].  That's not guaranteed as is -- you could race against
> another vCPU that changes the instruction, the instruction could be in
> IO space, host userspace could be messing with you, etc.  Second, I
> don't think there's any precedent at all in KVM for doing this.
> Third, you still need to make sure that the MSRs retain the value you
> want them to have by the time ENCLS happens.  I think that, by the
> time you resolve all of these issues, it'll look a lot like the
> pseudocode I emailed out, and MTF won't be necessary any more.

Agreed.  Emulation in the host is better.

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-23 16:43                     ` Paolo Bonzini
@ 2017-05-24  8:20                       ` Huang, Kai
  0 siblings, 0 replies; 78+ messages in thread
From: Huang, Kai @ 2017-05-24  8:20 UTC (permalink / raw)
  To: Paolo Bonzini, Andy Lutomirski
  Cc: Sean Christopherson, Jarkko Sakkinen, haim.cohen,
	intel-sgx-kernel-dev@lists.01.org, kvm list, Radim Krcmar



On 5/24/2017 4:43 AM, Paolo Bonzini wrote:
>
>
> On 23/05/2017 18:34, Andy Lutomirski wrote:
>>
>>> Using MTF is also a little bit tricky: when we turn on MTF VMEXIT upon
>>> ENCLS VMEXIT, the MTF won't necessarily be pending at the end of that
>>> ENCLS. For example, MTF may instead be pending at the end of an interrupt
>>> (I cannot recall exactly) if an event is pending during the VMENTRY from
>>> the ENCLS VMEXIT. Therefore we have to do additional work to check whether
>>> this MTF VMEXIT really happened after ENCLS ran (step 3 above). And
>>> depending on what we need to do, we may need to check whether ENCLS
>>> succeeded in the guest, which is also tricky, as ENCLS can fail either by
>>> setting an error code in RAX or by generating #GP or #UD (step 4 above).
>>> We may still need to do the gva->gpa->hpa translation, e.g., in order to
>>> locate the EPC/SECS page and update its status, depending on the purpose
>>> of trapping ENCLS.
>> I think there are some issues here.
>>
>> First, you're making a big assumption that, when you resume the guest
>> with MTF set, the instruction that gets executed is still
>> ENCLS[EINIT].  That's not guaranteed as is -- you could race against
>> another vCPU that changes the instruction, the instruction could be in
>> IO space, host userspace could be messing with you, etc.  Second, I
>> don't think there's any precedent at all in KVM for doing this.
>> Third, you still need to make sure that the MSRs retain the value you
>> want them to have by the time ENCLS happens.  I think that, by the
>> time you resolve all of these issues, it'll look a lot like the
>> pseudocode I emailed out, and MTF won't be necessary any more.
>
> Agreed.  Emulation in the host is better.

Hi Andy/Paolo,

Thanks for the comments. I'll follow your suggestion in v2.

Thanks,
-Kai

>
> Paolo
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-18  7:45               ` Huang, Kai
@ 2017-06-06 20:52                 ` Huang, Kai
  2017-06-06 21:22                   ` Andy Lutomirski
  2017-06-08 12:31                   ` Jarkko Sakkinen
  0 siblings, 2 replies; 78+ messages in thread
From: Huang, Kai @ 2017-06-06 20:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kai Huang, Paolo Bonzini, Radim Krcmar, kvm list,
	intel-sgx-kernel-dev, haim.cohen, Jarkko Sakkinen



On 5/18/2017 7:45 PM, Huang, Kai wrote:
> 
> 
> On 5/17/2017 12:09 PM, Andy Lutomirski wrote:
>> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai 
>> <kai.huang@linux.intel.com> wrote:
>>>
>>>
>>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
>>>>
>>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com>
>>>> wrote:
>>>>>
>>>>> I am not sure whether the cost of writing to 4 MSRs would be
>>>>> *extremely* slow, as when a vcpu is scheduled in, KVM is already
>>>>> doing vmcs_load, writing to several MSRs, etc.
>>>>
>>>>
>>>> I'm speculating that these MSRs may be rather unoptimized and hence
>>>> unusually slow.
>>>>
>>>>>
>>>>>>
>>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>>>> will take the mutex, compare the percpu variable to the desired 
>>>>>> value,
>>>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>>>
>>>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its 
>>>>>> in-memory
>>>>>> state but *not* changing the MSRs.  KVM will trap and emulate 
>>>>>> EINIT to
>>>>>> support the same handling as the host.  There is no action 
>>>>>> required at
>>>>>> all on KVM guest entry and exit.
>>>>>
>>>>>
>>>>>
>>>>> This is doable, but the SGX driver needs to do those things and
>>>>> expose interfaces for KVM to use. In terms of the percpu data, it is
>>>>> nice to have, but I am not sure whether it is mandatory, as IMO EINIT
>>>>> is not even in the performance-critical path. We can simply read the
>>>>> old values out of the MSRs and compare whether the old equals the new.
>>>>
>>>>
>>>> I think the SGX driver should probably live in arch/x86, and the
>>>> interface could be a simple percpu variable that is exported (from the
>>>> main kernel image, not from a module).
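
(For concreteness, a minimal sketch of the percpu cache idea above. All
names here, including the MSR constant, are made up for illustration,
and it assumes EINIT users are already serialized by the mutex as
suggested:)

        struct sgx_lepubkeyhash {
                u64 hash[4];
        };
        static DEFINE_PER_CPU(struct sgx_lepubkeyhash, sgx_lehash_cache);

        /* Write IA32_SGXLEPUBKEYHASH0-3 only if they actually changed. */
        static void sgx_update_lepubkeyhash(const u64 *hash)
        {
                struct sgx_lepubkeyhash *cache;
                int i;

                preempt_disable();
                cache = this_cpu_ptr(&sgx_lehash_cache);
                for (i = 0; i < 4; i++) {
                        if (cache->hash[i] != hash[i]) {
                                wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, hash[i]);
                                cache->hash[i] = hash[i];
                        }
                }
                preempt_enable();
        }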
>>>>
>>>>>
>>>>>>
>>>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for
>>>>>> other reasons: someone is going to want to implement policy for what
>>>>>> enclaves are allowed that applies to guests as well as the host.
>>>>>
>>>>>
>>>>>
>>>>> I am not very convinced why "what enclaves are allowed" in the host
>>>>> would apply to the guest. Can you elaborate? I mean, in general,
>>>>> virtualization just focuses on emulating hardware behavior. If a
>>>>> native machine is able to run any LE, the virtual machine should be
>>>>> able to as well (of course, with the guest's
>>>>> IA32_FEATURE_CONTROL[bit 17] set).
>>>>
>>>>
>>>> I strongly disagree.  I can imagine two classes of sensible policies
>>>> for launch control:
>>>>
>>>> 1. Allow everything.  This seems quite sensible to me.
>>>>
>>>> 2. Allow some things, and make sure that VMs have at least as
>>>> restrictive a policy as host root has.  After all, what's the point of
>>>> restricting enclaves in the host if host code can simply spawn a
>>>> little VM to run otherwise-disallowed enclaves?
>>>
>>>
>>> What's the current SGX driver launch control policy? Yes, "allow
>>> everything" works for KVM, so let's skip this. Are we going to support
>>> allowing several LEs, or just allowing one single LE? I know Jarkko is
>>> doing in-kernel LE stuff but I don't know the details.
>>>
>>> I am trying to find a way that both does not break the host launch
>>> control policy and is consistent with HW behavior (from the guest's
>>> view). Currently we can create a KVM guest with runtime changes to
>>> IA32_SGXLEPUBKEYHASHn either enabled or disabled. I introduced a Qemu
>>> parameter 'lewr' for this purpose. Actually I introduced the Qemu SGX
>>> parameters below for creating a guest:
>>>
>>>         -sgx epc=<size>,lehash='SHA-256 hash',lewr
>>>
>>> where 'epc' specifies the guest's EPC size, 'lehash' specifies the
>>> (initial) value of the guest's IA32_SGXLEPUBKEYHASHn, and 'lewr'
>>> specifies whether the guest is allowed to change its
>>> IA32_SGXLEPUBKEYHASHn at runtime.
>>>
>>> If the host only allows one single LE to run, KVM can add a restriction
>>> that only allows creating a KVM guest with runtime changes to
>>> IA32_SGXLEPUBKEYHASHn disabled, so that only the host-allowed (single)
>>> hash can be used by the guest. From the guest's view, it simply has
>>> IA32_FEATURE_CONTROL[bit17] cleared and has IA32_SGXLEPUBKEYHASHn with
>>> the default value being the host-allowed (single) hash.
>>>
>>> If the host allows several LEs (but not everything), and if we create a
>>> guest with 'lewr', then the behavior is not consistent with HW
>>> behavior, as from the guest's hardware point of view it can actually
>>> run any LE, but we have to tell the guest that it is only allowed to
>>> change IA32_SGXLEPUBKEYHASHn to some specific values. One compromise
>>> solution is to not allow creating a guest with 'lewr' specified and, at
>>> the same time, only allow creating a guest with host-approved hashes
>>> specified in 'lehash'. This will make the guest's behavior consistent
>>> with HW behavior, but only allows the guest to run one LE (the one
>>> specified by 'lehash' when the guest is created).
>>
>> I'm not sure I entirely agree for a couple reasons.
>>
>> 1. I wouldn't be surprised if the kernel ends up implementing a policy
>> in which it checks all enclaves (not just LEs) for acceptability.  In
>> fact, if the kernel sticks with the "no LE at all or just
>> kernel-internal LE", then checking enclaves directly against some
>> admin- or distro-provided signer list seems reasonable.  This type of
>> policy can't be forwarded to a guest by restricting allowed LE
>> signers.  But this is mostly speculation since AFAIK no one has
>> seriously proposed any particular policy support and the plan was to
>> not have this for the initial implementation.
>>
>> 2. While matching hardware behavior is nice in principle, there
>> doesn't seem to be useful hardware behavior to match here.  If the
>> host had a list of five allowed LE signers, how exactly would it
>> restrict the MSRs?  They're not written atomically, so you can't
>> directly tell what's being written.
> 
> In this case I actually plan to just allow creating a guest with writes
> to the guest's IA32_SGXLEPUBKEYHASHn disabled (without 'lewr'
> specified). If 'lewr' is specified, creating the guest will fail. And we
> only allow creating a guest with host-allowed hash values (via
> 'lehash=hash-value'), and if the 'hash-value' specified by 'lehash' is
> not allowed by the host, we also fail to create the guest.
> 
> We can only allow creating a guest with 'lewr' specified when the host
> allows anything.
> 
> But in this way, we are restricting the guest OS's ability to run LEs,
> as only the one LE specified by the 'lehash' parameter can be run. But I
> think this won't hurt much, as multiple guests are still able to run
> different LEs?
> 
> Also, the only way to fail an MSR
>> write is to send #GP, and Windows (and even Linux) may not expect
>> that.  Linux doesn't panic due to #GP on MSR writes these days, but
>> you still get a big fat warning.  I wouldn't be at all surprised if
>> Windows BSODs.
> 
> We cannot allow writing some particular values to the MSRs successfully
> while injecting #GP when writing other values to the same MSRs. So #GP
> is not an option.
> 
> ENCLS[EINIT], on the other hand, returns an actual
>> error code.  I'm not sure that a sensible error code exists
>> ("SGX_HYPERVISOR_SAID_NO?", perhaps),
> 
> It looks like no such error code exists. And we cannot return such an
> error code to the guest, as it is only supposed to be valid when ENCLS
> is run in the hypervisor.
> 
> but SGX_INVALID_EINITTOKEN seems
>> to mean, more or less, "the CPU thinks you're not authorized to do
>> this", so forcing that error code could be entirely reasonable.
>>
>> If the host policy is to allow a list of LE signers, you could return
>> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
>> the list.
> 
> But this would be inconsistent with HW behavior. If the hash value in
> the guest's IA32_SGXLEPUBKEYHASHn matches the one passed to EINIT, EINIT
> is not supposed to return SGX_INVALID_EINITTOKEN.
> 
> I think that, from the VMM's perspective, emulating behavior consistent
> with real HW behavior is very important.
> 
> Paolo, would you provide your comments?

Hi all,

This has been quiet for a while and I'd like to restart the discussion.
Jarkko told me that currently he only supports one LE in the SGX driver,
but I am not sure whether he is going to extend this in the future. I
think this might also depend on requirements from customers.

Andy,

If we only support one LE in the driver, then we can only support the
same LE for all KVM guests, per your comment that the host kernel launch
control policy should also apply to KVM guests? Would you comment more?

Jarkko,

Could you help clarify the whole host-side launch control policy so that
we can have a better understanding together?

Thanks,
-Kai

> 
>>
>> --Andy
>>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-06 20:52                 ` Huang, Kai
@ 2017-06-06 21:22                   ` Andy Lutomirski
  2017-06-06 22:51                     ` Huang, Kai
  2017-06-08 12:31                   ` Jarkko Sakkinen
  1 sibling, 1 reply; 78+ messages in thread
From: Andy Lutomirski @ 2017-06-06 21:22 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Andy Lutomirski, Kai Huang, Paolo Bonzini, Radim Krcmar,
	kvm list, intel-sgx-kernel-dev, haim.cohen, Jarkko Sakkinen

On Tue, Jun 6, 2017 at 1:52 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>
>
> On 5/18/2017 7:45 PM, Huang, Kai wrote:
>>
>>
>>
>> On 5/17/2017 12:09 PM, Andy Lutomirski wrote:
>>>
>>> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai <kai.huang@linux.intel.com>
>>> wrote:
>>>>
>>>>
>>>>
>>>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
>>>>>
>>>>>
>>>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> I am not sure whether the cost of writing to 4 MSRs would be
>>>>>> *extremely*
>>>>>> slow, as when a vcpu is scheduled in, KVM is already doing vmcs_load,
>>>>>> writing
>>>>>> to several MSRs, etc.
>>>>>
>>>>>
>>>>>
>>>>> I'm speculating that these MSRs may be rather unoptimized and hence
>>>>> unusually slow.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>>>>> will take the mutex, compare the percpu variable to the desired
>>>>>>> value,
>>>>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>>>>
>>>>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its
>>>>>>> in-memory
>>>>>>> state but *not* changing the MSRs.  KVM will trap and emulate EINIT
>>>>>>> to
>>>>>>> support the same handling as the host.  There is no action required
>>>>>>> at
>>>>>>> all on KVM guest entry and exit.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> This is doable, but SGX driver needs to do those things and expose
>>>>>> interfaces for KVM to use. In terms of the percpu data, it is nice to
>>>>>> have,
>>>>>> but I am not sure whether it is mandatory, as IMO EINIT is not even in
>>>>>> performance critical path. We can simply read old value from MSRs out
>>>>>> and
>>>>>> compare whether the old equals to the new.
>>>>>
>>>>>
>>>>>
>>>>> I think the SGX driver should probably live in arch/x86, and the
>>>>> interface could be a simple percpu variable that is exported (from the
>>>>> main kernel image, not from a module).
>>>>>
>>>>>>
>>>>>>>
>>>>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for
>>>>>>> other reasons: someone is going to want to implement policy for what
>>>>>>> enclaves are allowed that applies to guests as well as the host.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I am not very convinced why "what enclaves are allowed" in host would
>>>>>> apply
>>>>>> to guest. Can you elaborate? I mean in general virtualization just
>>>>>> focus
>>>>>> emulating hardware behavior. If a native machine is able to run any
>>>>>> LE,
>>>>>> the
>>>>>> virtual machine should be able to as well (of course, with guest's
>>>>>> IA32_FEATURE_CONTROL[bit 17] set).
>>>>>
>>>>>
>>>>>
>>>>> I strongly disagree.  I can imagine two classes of sensible policies
>>>>> for launch control:
>>>>>
>>>>> 1. Allow everything.  This seems quite sensible to me.
>>>>>
>>>>> 2. Allow some things, and make sure that VMs have at least as
>>>>> restrictive a policy as host root has.  After all, what's the point of
>>>>> restricting enclaves in the host if host code can simply spawn a
>>>>> little VM to run otherwise-disallowed enclaves?
>>>>
>>>>
>>>>
>>>> What's the current SGX driver launch control policy? Yes allow
>>>> everything
>>>> works for KVM so lets skip this. Are we going to support allowing
>>>> several
>>>> LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE
>>>> stuff but I don't know the details.
>>>>
>>>> I am trying to find a way that we can both not break host launch control
>>>> policy, and be consistent to HW behavior (from guest's view). Currently
>>>> we
>>>> can create a KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
>>>> either
>>>> enabled or disabled. I introduced a Qemu parameter 'lewr' for this
>>>> purpose.
>>>> Actually I introduced below Qemu SGX parameters for creating guest:
>>>>
>>>>         -sgx epc=<size>,lehash='SHA-256 hash',lewr
>>>>
>>>> where 'epc' specifies guest's EPC size, lehash specifies (initial) value
>>>> of
>>>> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether guest is
>>>> allowed
>>>> to change guest's IA32_SGXLEPUBKEYHASHn at runtime.
>>>>
>>>> If host only allows one single LE to run, KVM can add a restriction that
>>>> only
>>>> allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
>>>> disabled, so that only host allowed (single) hash can be used by guest.
>>>> From
>>>> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has
>>>> IA32_SGXLEPUBKEYHASHn with default value to be host allowed (single)
>>>> hash.
>>>>
>>>> If host allows several LEs (but not everything), and if we create guest
>>>> with
>>>> 'lewr', then the behavior is not consistent with HW behavior, as from
>>>> guest's hardware's point of view, we can actually run any LE but we have
>>>> to
>>>> tell guest that you are only allowed to change IA32_SGXLEPUBKEYHASHn to
>>>> some
>>>> specific values. One compromise solution is we don't allow to create
>>>> guest
>>>> with 'lewr' specified, and at the same time, only allow to create guest
>>>> with
>>>> host approved hashes specified in 'lehash'. This will make guest's
>>>> behavior
>>>> consistent to HW behavior but only allows guest to run one LE (which is
>>>> specified by 'lehash' when guest is created).
>>>
>>>
>>> I'm not sure I entirely agree for a couple reasons.
>>>
>>> 1. I wouldn't be surprised if the kernel ends up implementing a policy
>>> in which it checks all enclaves (not just LEs) for acceptability.  In
>>> fact, if the kernel sticks with the "no LE at all or just
>>> kernel-internal LE", then checking enclaves directly against some
>>> admin- or distro-provided signer list seems reasonable.  This type of
>>> policy can't be forwarded to a guest by restricting allowed LE
>>> signers.  But this is mostly speculation since AFAIK no one has
>>> seriously proposed any particular policy support and the plan was to
>>> not have this for the initial implementation.
>>>
>>> 2. While matching hardware behavior is nice in principle, there
>>> doesn't seem to be useful hardware behavior to match here.  If the
>>> host had a list of five allowed LE signers, how exactly would it
>>> restrict the MSRs?  They're not written atomically, so you can't
>>> directly tell what's being written.
>>
>>
>> In this case I actually plan to just allow creating guest with guest's
>> IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is
>> specified, creating guest will fail. And we only allow creating guest with
>> host allowed hash values (with 'lehash=hash-value'), and if 'hash-value'
>> specified by 'lehash' is not allowed by host, we also fail to create guest.
>>
>> We can only allow creating guest with 'lewr' specified when host allows
>> anything.
>>
>> But in this way, we are restricting guest OS's ability to run LE, as only
>> one LE, that is specified by 'lehash' parameter, can be run. But I think
>> this won't hurt much, as multiple guests still are able to run different
>> LEs?
>>
>> Also, the only way to fail an MSR
>>>
>>> write is to send #GP, and Windows (and even Linux) may not expect
>>> that.  Linux doesn't panic due to #GP on MSR writes these days, but
>>> you still get a big fat warning.  I wouldn't be at all surprised if
>>> Windows BSODs.
>>
>>
>> We cannot allow writing some particular value to MSRs successfully, while
>> injecting #GP when writing other values to the same MSRs. So #GP is not
>> an option.
>>
>> ENCLS[EINIT], on the other hand, returns an actual
>>>
>>> error code.  I'm not sure that a sensible error code exists
>>> ("SGX_HYPERVISOR_SAID_NO?", perhaps),
>>
>>
>> It looks like no such error code exists. And we cannot return such an
>> error code to the guest, as it is only supposed to be valid when ENCLS
>> is run in the hypervisor.
>>
>> but SGX_INVALID_EINITTOKEN seems
>>>
>>> to mean, more or less, "the CPU thinks you're not authorized to do
>>> this", so forcing that error code could be entirely reasonable.
>>>
>>> If the host policy is to allow a list of LE signers, you could return
>>> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
>>> the list.
>>
>>
>> But this would be inconsistent with HW behavior. If the hash value in
>> the guest's IA32_SGXLEPUBKEYHASHn matches the one passed to EINIT, EINIT
>> is not supposed to return SGX_INVALID_EINITTOKEN.
>>
>> I think from VMM's perspective, emulating HW behavior to be consistent
>> with real HW behavior is very important.
>>
>> Paolo, would you provide your comments?
>
>
> Hi all,
>
>> This has been quiet for a while and I'd like to restart the discussion.
> Jarkko told me that currently he only supports one LE in SGX driver, but I
> am not sure whether he is going to extend in the future or not. I think this
> might also depend on requirements from customers.
>
> Andy,
>
> If we only support one LE in driver, then we can only support the same LE
> for all KVM guests, according to your comments that host kernel launch
>> control policy should also apply to KVM guests? Would you comment more?

I'm not at all convinced that, going forward, Linux's host-side launch
control policy will be entirely contained in the LE.  I'm also not
convinced that non-Linux guests will function at all under this type
of policy -- what if FreeBSD's LE is different for whatever reason?

>
> Jarkko,
>
> Could you help to clarify the whole launch control policy in host side so
> that we can have a better understanding together?
>
> Thanks,
> -Kai
>
>>
>>>
>>> --Andy
>>>
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-06 21:22                   ` Andy Lutomirski
@ 2017-06-06 22:51                     ` Huang, Kai
  2017-06-07 14:45                       ` Cohen, Haim
  0 siblings, 1 reply; 78+ messages in thread
From: Huang, Kai @ 2017-06-06 22:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kai Huang, Paolo Bonzini, Radim Krcmar, kvm list,
	intel-sgx-kernel-dev, haim.cohen, Jarkko Sakkinen



On 6/7/2017 9:22 AM, Andy Lutomirski wrote:
> On Tue, Jun 6, 2017 at 1:52 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>
>>
>> On 5/18/2017 7:45 PM, Huang, Kai wrote:
>>>
>>>
>>>
>>> On 5/17/2017 12:09 PM, Andy Lutomirski wrote:
>>>>
>>>> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai <kai.huang@linux.intel.com>
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
>>>>>>
>>>>>>
>>>>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> I am not sure whether the cost of writing to 4 MSRs would be
>>>>>>> *extremely*
>>>>>>> slow, as when a vcpu is scheduled in, KVM is already doing vmcs_load,
>>>>>>> writing
>>>>>>> to several MSRs, etc.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'm speculating that these MSRs may be rather unoptimized and hence
>>>>>> unusually slow.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>>>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>>>>>> will take the mutex, compare the percpu variable to the desired
>>>>>>>> value,
>>>>>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>>>>>
>>>>>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its
>>>>>>>> in-memory
>>>>>>>> state but *not* changing the MSRs.  KVM will trap and emulate EINIT
>>>>>>>> to
>>>>>>>> support the same handling as the host.  There is no action required
>>>>>>>> at
>>>>>>>> all on KVM guest entry and exit.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This is doable, but SGX driver needs to do those things and expose
>>>>>>> interfaces for KVM to use. In terms of the percpu data, it is nice to
>>>>>>> have,
>>>>>>> but I am not sure whether it is mandatory, as IMO EINIT is not even in
>>>>>>> performance critical path. We can simply read old value from MSRs out
>>>>>>> and
>>>>>>> compare whether the old equals to the new.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I think the SGX driver should probably live in arch/x86, and the
>>>>>> interface could be a simple percpu variable that is exported (from the
>>>>>> main kernel image, not from a module).
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for
>>>>>>>> other reasons: someone is going to want to implement policy for what
>>>>>>>> enclaves are allowed that applies to guests as well as the host.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I am not very convinced why "what enclaves are allowed" in host would
>>>>>>> apply
>>>>>>> to guest. Can you elaborate? I mean in general virtualization just
>>>>>>> focus
>>>>>>> emulating hardware behavior. If a native machine is able to run any
>>>>>>> LE,
>>>>>>> the
>>>>>>> virtual machine should be able to as well (of course, with guest's
>>>>>>> IA32_FEATURE_CONTROL[bit 17] set).
>>>>>>
>>>>>>
>>>>>>
>>>>>> I strongly disagree.  I can imagine two classes of sensible policies
>>>>>> for launch control:
>>>>>>
>>>>>> 1. Allow everything.  This seems quite sensible to me.
>>>>>>
>>>>>> 2. Allow some things, and make sure that VMs have at least as
>>>>>> restrictive a policy as host root has.  After all, what's the point of
>>>>>> restricting enclaves in the host if host code can simply spawn a
>>>>>> little VM to run otherwise-disallowed enclaves?
>>>>>
>>>>>
>>>>>
>>>>> What's the current SGX driver launch control policy? Yes allow
>>>>> everything
>>>>> works for KVM so lets skip this. Are we going to support allowing
>>>>> several
>>>>> LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE
>>>>> stuff but I don't know the details.
>>>>>
>>>>> I am trying to find a way that we can both not break host launch control
>>>>> policy, and be consistent to HW behavior (from guest's view). Currently
>>>>> we
>>>>> can create a KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
>>>>> either
>>>>> enabled or disabled. I introduced a Qemu parameter 'lewr' for this
>>>>> purpose.
>>>>> Actually I introduced below Qemu SGX parameters for creating guest:
>>>>>
>>>>>          -sgx epc=<size>,lehash='SHA-256 hash',lewr
>>>>>
>>>>> where 'epc' specifies guest's EPC size, lehash specifies (initial) value
>>>>> of
>>>>> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether guest is
>>>>> allowed
>>>>> to change guest's IA32_SGXLEPUBKEYHASHn at runtime.
>>>>>
>>>>> If host only allows one single LE to run, KVM can add a restriction that
>>>>> only
>>>>> allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
>>>>> disabled, so that only host allowed (single) hash can be used by guest.
>>>>> From
>>>>> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has
>>>>> IA32_SGXLEPUBKEYHASHn with default value to be host allowed (single)
>>>>> hash.
>>>>>
>>>>> If host allows several LEs (but not everything), and if we create guest
>>>>> with
>>>>> 'lewr', then the behavior is not consistent with HW behavior, as from
>>>>> guest's hardware's point of view, we can actually run any LE but we have
>>>>> to
>>>>> tell guest that you are only allowed to change IA32_SGXLEPUBKEYHASHn to
>>>>> some
>>>>> specific values. One compromise solution is we don't allow to create
>>>>> guest
>>>>> with 'lewr' specified, and at the same time, only allow to create guest
>>>>> with
>>>>> host approved hashes specified in 'lehash'. This will make guest's
>>>>> behavior
>>>>> consistent to HW behavior but only allows guest to run one LE (which is
>>>>> specified by 'lehash' when guest is created).
>>>>
>>>>
>>>> I'm not sure I entirely agree for a couple reasons.
>>>>
>>>> 1. I wouldn't be surprised if the kernel ends up implementing a policy
>>>> in which it checks all enclaves (not just LEs) for acceptability.  In
>>>> fact, if the kernel sticks with the "no LE at all or just
>>>> kernel-internal LE", then checking enclaves directly against some
>>>> admin- or distro-provided signer list seems reasonable.  This type of
>>>> policy can't be forwarded to a guest by restricting allowed LE
>>>> signers.  But this is mostly speculation since AFAIK no one has
>>>> seriously proposed any particular policy support and the plan was to
>>>> not have this for the initial implementation.
>>>>
>>>> 2. While matching hardware behavior is nice in principle, there
>>>> doesn't seem to be useful hardware behavior to match here.  If the
>>>> host had a list of five allowed LE signers, how exactly would it
>>>> restrict the MSRs?  They're not written atomically, so you can't
>>>> directly tell what's being written.
>>>
>>>
>>> In this case I actually plan to just allow creating guest with guest's
>>> IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is
>>> specified, creating guest will fail. And we only allow creating guest with
>>> host allowed hash values (with 'lehash=hash-value'), and if 'hash-value'
>>> specified by 'lehash' is not allowed by host, we also fail to create guest.
>>>
>>> We can only allow creating guest with 'lewr' specified when host allows
>>> anything.
>>>
>>> But in this way, we are restricting guest OS's ability to run LE, as only
>>> one LE, that is specified by 'lehash' parameter, can be run. But I think
>>> this won't hurt much, as multiple guests still are able to run different
>>> LEs?
>>>
>>> Also, the only way to fail an MSR
>>>>
>>>> write is to send #GP, and Windows (and even Linux) may not expect
>>>> that.  Linux doesn't panic due to #GP on MSR writes these days, but
>>>> you still get a big fat warning.  I wouldn't be at all surprised if
>>>> Windows BSODs.
>>>
>>>
>>> We cannot allow writing some particular value to MSRs successfully, while
>>> injecting #GP when writing other values to the same MSRs. So #GP is not
>>> an option.
>>>
>>> ENCLS[EINIT], on the other hand, returns an actual
>>>>
>>>> error code.  I'm not sure that a sensible error code exists
>>>> ("SGX_HYPERVISOR_SAID_NO?", perhaps),
>>>
>>>
>>> It looks like no such error code exists. And we cannot return such an
>>> error code to the guest, as it is only supposed to be valid when ENCLS
>>> is run in the hypervisor.
>>>
>>> but SGX_INVALID_EINITTOKEN seems
>>>>
>>>> to mean, more or less, "the CPU thinks you're not authorized to do
>>>> this", so forcing that error code could be entirely reasonable.
>>>>
>>>> If the host policy is to allow a list of LE signers, you could return
>>>> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
>>>> the list.
>>>
>>>
>>> But this would be inconsistent with HW behavior. If the hash value in
>>> the guest's IA32_SGXLEPUBKEYHASHn matches the one passed to EINIT, EINIT
>>> is not supposed to return SGX_INVALID_EINITTOKEN.
>>>
>>> I think from VMM's perspective, emulating HW behavior to be consistent
>>> with real HW behavior is very important.
>>>
>>> Paolo, would you provide your comments?
>>
>>
>> Hi all,
>>
>> This has been quiet for a while and I'd like to restart the discussion.
>> Jarkko told me that currently he only supports one LE in SGX driver, but I
>> am not sure whether he is going to extend in the future or not. I think this
>> might also depend on requirements from customers.
>>
>> Andy,
>>
>> If we only support one LE in driver, then we can only support the same LE
>> for all KVM guests, according to your comments that host kernel launch
>> control policy should also apply to KVM guests? Would you comment more?
> 
> I'm not at all convinced that, going forward, Linux's host-side launch
> control policy will be entirely contained in the LE.  I'm also not
> convinced that non-Linux guests will function at all under this type
> of policy -- what if FreeBSD's LE is different for whatever reason?

I am not convinced either. I think we need Jarkko to elaborate on how
the host-side launch control policy is implemented, or whether there is
any policy at all. I also tried to read the SGX driver code, but it
looks like I couldn't find any implementation of this.

Hi Jarkko,

Can you elaborate on this?

Thanks,
-Kai
> 
>>
>> Jarkko,
>>
>> Could you help to clarify the whole launch control policy in host side so
>> that we can have a better understanding together?
>>
>> Thanks,
>> -Kai
>>
>>>
>>>>
>>>> --Andy
>>>>
>>
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* RE: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-06 22:51                     ` Huang, Kai
@ 2017-06-07 14:45                       ` Cohen, Haim
  0 siblings, 0 replies; 78+ messages in thread
From: Cohen, Haim @ 2017-06-07 14:45 UTC (permalink / raw)
  To: Huang, Kai, Andy Lutomirski
  Cc: Kai Huang, Paolo Bonzini, Radim Krcmar, kvm list,
	intel-sgx-kernel-dev, Jarkko Sakkinen, Cohen, Haim

On 6/6/2017 6:52 PM, Huang, Kai wrote:
>On 6/7/2017 9:22 AM, Andy Lutomirski wrote:
>> On Tue, Jun 6, 2017 at 1:52 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>>
>>>
>>> On 5/18/2017 7:45 PM, Huang, Kai wrote:
>>>>
>>>>
>>>>
>>>> On 5/17/2017 12:09 PM, Andy Lutomirski wrote:
>>>>>
>>>>> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai
>>>>> <kai.huang@linux.intel.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai
>>>>>>> <kai.huang@linux.intel.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> I am not sure whether the cost of writing to 4 MSRs would be
>>>>>>>> *extremely*
>>>>>>>> slow, as when a vcpu is scheduled in, KVM is already doing
>>>>>>>> vmcs_load, writing to several MSRs, etc.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I'm speculating that these MSRs may be rather unoptimized and hence
>>>>>>> unusually slow.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>>>>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>>>>>>> will take the mutex, compare the percpu variable to the desired
>>>>>>>>> value,
>>>>>>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>>>>>>
>>>>>>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its
>>>>>>>>> in-memory
>>>>>>>>> state but *not* changing the MSRs.  KVM will trap and emulate EINIT
>>>>>>>>> to
>>>>>>>>> support the same handling as the host.  There is no action required
>>>>>>>>> at
>>>>>>>>> all on KVM guest entry and exit.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> This is doable, but SGX driver needs to do those things and expose
>>>>>>>> interfaces for KVM to use. In terms of the percpu data, it is nice to
>>>>>>>> have,
>>>>>>>> but I am not sure whether it is mandatory, as IMO EINIT is not even in
>>>>>>>> performance critical path. We can simply read old value from MSRs out
>>>>>>>> and
>>>>>>>> compare whether the old equals to the new.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I think the SGX driver should probably live in arch/x86, and the
>>>>>>> interface could be a simple percpu variable that is exported (from the
>>>>>>> main kernel image, not from a module).
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for
>>>>>>>>> other reasons: someone is going to want to implement policy for what
>>>>>>>>> enclaves are allowed that applies to guests as well as the host.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I am not very convinced why "what enclaves are allowed" in host would
>>>>>>>> apply
>>>>>>>> to guest. Can you elaborate? I mean in general virtualization just
>>>>>>>> focus
>>>>>>>> emulating hardware behavior. If a native machine is able to run any
>>>>>>>> LE,
>>>>>>>> the
>>>>>>>> virtual machine should be able to as well (of course, with guest's
>>>>>>>> IA32_FEATURE_CONTROL[bit 17] set).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I strongly disagree.  I can imagine two classes of sensible policies
>>>>>>> for launch control:
>>>>>>>
>>>>>>> 1. Allow everything.  This seems quite sensible to me.
>>>>>>>
>>>>>>> 2. Allow some things, and make sure that VMs have at least as
>>>>>>> restrictive a policy as host root has.  After all, what's the point of
>>>>>>> restricting enclaves in the host if host code can simply spawn a
>>>>>>> little VM to run otherwise-disallowed enclaves?
>>>>>>
>>>>>>
>>>>>>
>>>>>> What's the current SGX driver launch control policy? Yes allow
>>>>>> everything
>>>>>> works for KVM so lets skip this. Are we going to support allowing
>>>>>> several
>>>>>> LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE
>>>>>> stuff but I don't know the details.
>>>>>>
>>>>>> I am trying to find a way that we can both not break host launch control
>>>>>> policy, and be consistent to HW behavior (from guest's view). Currently
>>>>>> we
>>>>>> can create a KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
>>>>>> either
>>>>>> enabled or disabled. I introduced a Qemu parameter 'lewr' for this
>>>>>> purpose.
>>>>>> Actually I introduced below Qemu SGX parameters for creating guest:
>>>>>>
>>>>>>          -sgx epc=<size>,lehash='SHA-256 hash',lewr
>>>>>>
>>>>>> where 'epc' specifies guest's EPC size, lehash specifies (initial) value
>>>>>> of
>>>>>> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether guest is
>>>>>> allowed
>>>>>> to change guest's IA32_SGXLEPUBKEYHASHn at runtime.
>>>>>>
>>>>>> If host only allows one single LE to run, KVM can add a restriction that
>>>>>> only
>>>>>> allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
>>>>>> disabled, so that only host allowed (single) hash can be used by guest.
>>>>>> From
>>>>>> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has
>>>>>> IA32_SGXLEPUBKEYHASHn with default value to be host allowed (single)
>>>>>> hash.
>>>>>>
>>>>>> If host allows several LEs (but not everything), and if we create guest
>>>>>> with
>>>>>> 'lewr', then the behavior is not consistent with HW behavior, as from
>>>>>> guest's hardware's point of view, we can actually run any LE but we have
>>>>>> to
>>>>>> tell guest that you are only allowed to change IA32_SGXLEPUBKEYHASHn to
>>>>>> some
>>>>>> specific values. One compromise solution is we don't allow to create
>>>>>> guest
>>>>>> with 'lewr' specified, and at the same time, only allow to create guest
>>>>>> with
>>>>>> host approved hashes specified in 'lehash'. This will make guest's
>>>>>> behavior
>>>>>> consistent to HW behavior but only allows guest to run one LE (which is
>>>>>> specified by 'lehash' when guest is created).
>>>>>
>>>>>
>>>>> I'm not sure I entirely agree for a couple reasons.
>>>>>
>>>>> 1. I wouldn't be surprised if the kernel ends up implementing a policy
>>>>> in which it checks all enclaves (not just LEs) for acceptability.  In
>>>>> fact, if the kernel sticks with the "no LE at all or just
>>>>> kernel-internal LE", then checking enclaves directly against some
>>>>> admin- or distro-provided signer list seems reasonable.  This type of
>>>>> policy can't be forwarded to a guest by restricting allowed LE
>>>>> signers.  But this is mostly speculation since AFAIK no one has
>>>>> seriously proposed any particular policy support and the plan was to
>>>>> not have this for the initial implementation.
>>>>>
>>>>> 2. While matching hardware behavior is nice in principle, there
>>>>> doesn't seem to be useful hardware behavior to match here.  If the
>>>>> host had a list of five allowed LE signers, how exactly would it
>>>>> restrict the MSRs?  They're not written atomically, so you can't
>>>>> directly tell what's being written.
>>>>
>>>>
>>>> In this case I actually plan to just allow creating guest with guest's
>>>> IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is
>>>> specified, creating guest will fail. And we only allow creating guest with
>>>> host allowed hash values (with 'lehash=hash-value'), and if 'hash-value'
>>>> specified by 'lehash' is not allowed by host, we also fail to create guest.
>>>>
>>>> We can only allow creating guest with 'lewr' specified when host allows
>>>> anything.
>>>>
>>>> But in this way, we are restricting guest OS's ability to run LE, as only
>>>> one LE, that is specified by 'lehash' parameter, can be run. But I think
>>>> this won't hurt much, as multiple guests still are able to run different
>>>> LEs?
>>>>
>>>> Also, the only way to fail an MSR
>>>>>
>>>>> write is to send #GP, and Windows (and even Linux) may not expect
>>>>> that.  Linux doesn't panic due to #GP on MSR writes these days, but
>>>>> you still get a big fat warning.  I wouldn't be at all surprised if
>>>>> Windows BSODs.
>>>>
>>>>
>>>> We cannot allow writing some particular value to MSRs successfully, while
>>>> injecting #GP when writing other values to the same MSRs. So #GP is not
>>>> an option.
>>>>
>>>> ENCLS[EINIT], on the other hand, returns an actual
>>>>>
>>>>> error code.  I'm not sure that a sensible error code exists
>>>>> ("SGX_HYPERVISOR_SAID_NO?", perhaps),
>>>>
>>>>
>>>> It looks like no such error code exists. And we cannot return such an
>>>> error code to the guest, as it is only supposed to be valid when ENCLS
>>>> is run in the hypervisor.
>>>>
>>>> but SGX_INVALID_EINITTOKEN seems
>>>>>
>>>>> to mean, more or less, "the CPU thinks you're not authorized to do
>>>>> this", so forcing that error code could be entirely reasonable.
>>>>>
>>>>> If the host policy is to allow a list of LE signers, you could return
>>>>> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
>>>>> the list.
>>>>
>>>>
>>>> But this would be inconsistent with HW behavior. If the hash value in
>>>> the guest's IA32_SGXLEPUBKEYHASHn matches the one passed to EINIT, EINIT
>>>> is not supposed to return SGX_INVALID_EINITTOKEN.
>>>>
>>>> I think from VMM's perspective, emulating HW behavior to be consistent
>>>> with real HW behavior is very important.
>>>>
>>>> Paolo, would you provide your comments?
>>>
>>>
>>> Hi all,
>>>
>>> This has been quiet for a while and I'd like to restart the discussion.
>>> Jarkko told me that currently he only supports one LE in SGX driver, but I
>>> am not sure whether he is going to extend in the future or not. I think this
>>> might also depend on requirements from customers.
>>>
>>> Andy,
>>>
>>> If we only support one LE in driver, then we can only support the same LE
>>> for all KVM guests, according to your comments that host kernel launch
>>> control policy should also apply to KVM guests? Would you comment more?
>>
>> I'm not at all convinced that, going forward, Linux's host-side launch
>> control policy will be entirely contained in the LE.  I'm also not
>> convinced that non-Linux guests will function at all under this type
>> of policy -- what if FreeBSD's LE is different for whatever reason?
>
>I am not convinced either. I think we need Jarkko to elaborate on how
>the host-side launch control policy is implemented, or whether there is
>any policy at all. I also tried to read the SGX driver code, but it
>looks like I couldn't find any implementation of this.
>
>Hi Jarkko,
>
>Can you elaborate on this?
>
>Thanks,
>-Kai

I don't think the kernel's LE policy is relevant here.
As long as you allow the guest OS to set the IA32_SGXLEPUBKEYHASHn MSRs, either directly or via the VMM 'lehash' value, you don't need support for more than one LE in the kernel.
The host kernel will have one LE, and the guest kernel will have another "one" LE that may have a different hash value.
I agree with Andy that different OS distributions are likely to have different LE hash values, so the guest OS may require a different hash setting.
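
(For example, with the proposed Qemu syntax quoted above, a guest whose
distro signs its LE with a different key could be started along these
lines; the EPC size and the hash are placeholders, not real values:)

        qemu-system-x86_64 ... \
                -sgx epc=64M,lehash='<SHA-256 of the guest LE signer key>',lewr

(KVM/Qemu would then expose that hash as the guest's initial
IA32_SGXLEPUBKEYHASHn values, independently of the host's own LE.)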

>>
>>>
>>> Jarkko,
>>>
>>> Could you help to clarify the whole launch control policy in host side so
>>> that we can have a better understanding together?
>>>
>>> Thanks,
>>> -Kai
>>>
>>>>
>>>>>
>>>>> --Andy
>>>>>
>>>
>>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-06 20:52                 ` Huang, Kai
  2017-06-06 21:22                   ` Andy Lutomirski
@ 2017-06-08 12:31                   ` Jarkko Sakkinen
  2017-06-08 23:47                     ` Huang, Kai
  1 sibling, 1 reply; 78+ messages in thread
From: Jarkko Sakkinen @ 2017-06-08 12:31 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Andy Lutomirski, Kai Huang, Paolo Bonzini, Radim Krcmar,
	kvm list, intel-sgx-kernel-dev, haim.cohen

On Wed, Jun 07, 2017 at 08:52:42AM +1200, Huang, Kai wrote:
> 
> 
> On 5/18/2017 7:45 PM, Huang, Kai wrote:
> > 
> > 
> > On 5/17/2017 12:09 PM, Andy Lutomirski wrote:
> > > On Mon, May 15, 2017 at 5:48 PM, Huang, Kai
> > > <kai.huang@linux.intel.com> wrote:
> > > > 
> > > > 
> > > > On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
> > > > > 
> > > > > On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com>
> > > > > wrote:
> > > > > > 
> > > > > > I am not sure whether the cost of writing to 4 MSRs
> > > > > > would be *extremely*
> > > > > > slow, as when a vcpu is scheduled in, KVM is already doing vmcs_load,
> > > > > > writing
> > > > > > to several MSRs, etc.
> > > > > 
> > > > > 
> > > > > I'm speculating that these MSRs may be rather unoptimized and hence
> > > > > unusually slow.
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > Have a percpu variable that stores the current SGXLEPUBKEYHASH along
> > > > > > > with whatever lock is needed (probably just a mutex).  Users of EINIT
> > > > > > > will take the mutex, compare the percpu variable to
> > > > > > > the desired value,
> > > > > > > and, if it's different, do WRMSR and update the percpu variable.
> > > > > > > 
> > > > > > > KVM will implement writes to SGXLEPUBKEYHASH by
> > > > > > > updating its in-memory
> > > > > > > state but *not* changing the MSRs.  KVM will trap
> > > > > > > and emulate EINIT to
> > > > > > > support the same handling as the host.  There is no
> > > > > > > action required at
> > > > > > > all on KVM guest entry and exit.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > This is doable, but SGX driver needs to do those things and expose
> > > > > > interfaces for KVM to use. In terms of the percpu data, it is nice to
> > > > > > have,
> > > > > > but I am not sure whether it is mandatory, as IMO EINIT is not even in
> > > > > > performance critical path. We can simply read old value
> > > > > > from MSRs out and
> > > > > > compare whether the old equals to the new.
> > > > > 
> > > > > 
> > > > > I think the SGX driver should probably live in arch/x86, and the
> > > > > interface could be a simple percpu variable that is exported (from the
> > > > > main kernel image, not from a module).
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > FWIW, I think that KVM will, in the long run, want to trap EINIT for
> > > > > > > other reasons: someone is going to want to implement policy for what
> > > > > > > enclaves are allowed that applies to guests as well as the host.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > I am not very convinced why "what enclaves are allowed" in host would
> > > > > > apply
> > > > > > to guest. Can you elaborate? I mean in general
> > > > > > virtualization just focus
> > > > > > emulating hardware behavior. If a native machine is able
> > > > > > to run any LE,
> > > > > > the
> > > > > > virtual machine should be able to as well (of course, with guest's
> > > > > > IA32_FEATURE_CONTROL[bit 17] set).
> > > > > 
> > > > > 
> > > > > I strongly disagree.  I can imagine two classes of sensible policies
> > > > > for launch control:
> > > > > 
> > > > > 1. Allow everything.  This seems quite sensible to me.
> > > > > 
> > > > > 2. Allow some things, and make sure that VMs have at least as
> > > > > restrictive a policy as host root has.  After all, what's the point of
> > > > > restricting enclaves in the host if host code can simply spawn a
> > > > > little VM to run otherwise-disallowed enclaves?
> > > > 
> > > > 
> > > > What's the current SGX driver launch control policy? Yes allow
> > > > everything
> > > > works for KVM so lets skip this. Are we going to support
> > > > allowing several
> > > > LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE
> > > > stuff but I don't know the details.
> > > > 
> > > > I am trying to find a way that we can both not break host launch control
> > > > policy, and be consistent to HW behavior (from guest's view).
> > > > Currently we
> > > > can create a KVM guest with runtime change to
> > > > IA32_SGXLEPUBKEYHASHn either
> > > > enabled or disabled. I introduced a Qemu parameter 'lewr' for
> > > > this purpose.
> > > > Actually I introduced below Qemu SGX parameters for creating guest:
> > > > 
> > > >         -sgx epc=<size>,lehash='SHA-256 hash',lewr
> > > > 
> > > > where 'epc' specifies guest's EPC size, lehash specifies
> > > > (initial) value of
> > > > guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether
> > > > guest is allowed
> > > > to change guest's IA32_SGXLEPUBKEYHASHn at runtime.
> > > > 
> > > > If host only allows one single LE to run, KVM can add a restriction
> > > > that only
> > > > allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
> > > > disabled, so that only host allowed (single) hash can be used by
> > > > guest. From
> > > > guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has
> > > > IA32_SGXLEPUBKEYHASHn with default value to be host allowed
> > > > (single) hash.
> > > > 
> > > > If host allows several LEs (but not everything), and if we
> > > > create guest with
> > > > 'lewr', then the behavior is not consistent with HW behavior, as from
> > > > guest's hardware's point of view, we can actually run any LE but
> > > > we have to
> > > > tell guest that you are only allowed to change
> > > > IA32_SGXLEPUBKEYHASHn to some
> > > > specific values. One compromise solution is we don't allow to
> > > > create guest
> > > > with 'lewr' specified, and at the same time, only allow to
> > > > guest with
> > > > host approved hashes specified in 'lehash'. This will make
> > > > guest's behavior
> > > > consistent to HW behavior but only allows guest to run one LE (which is
> > > > specified by 'lehash' when guest is created).
> > > 
> > > I'm not sure I entirely agree for a couple reasons.
> > > 
> > > 1. I wouldn't be surprised if the kernel ends up implementing a policy
> > > in which it checks all enclaves (not just LEs) for acceptability.  In
> > > fact, if the kernel sticks with the "no LE at all or just
> > > kernel-internal LE", then checking enclaves directly against some
> > > admin- or distro-provided signer list seems reasonable.  This type of
> > > policy can't be forwarded to a guest by restricting allowed LE
> > > signers.  But this is mostly speculation since AFAIK no one has
> > > seriously proposed any particular policy support and the plan was to
> > > not have this for the initial implementation.
> > > 
> > > 2. While matching hardware behavior is nice in principle, there
> > > doesn't seem to be useful hardware behavior to match here.  If the
> > > host had a list of five allowed LE signers, how exactly would it
> > > restrict the MSRs?  They're not written atomically, so you can't
> > > directly tell what's being written.
> > 
> > In this case I actually plan to just allow creating guest with guest's
> > IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is
> > specified, creating guest will fail. And we only allow creating guest
> > with host allowed hash values (with 'lehash=hash-value'), and if
> > 'hash-value' specified by 'lehash' is not allowed by host, we also fail
> > to create guest.
> > 
> > We can only allow creating guest with 'lewr' specified when host allows
> > anything.
> > 
> > But in this way, we are restricting guest OS's ability to run LE, as
> > only one LE, that is specified by 'lehash' parameter, can be run. But I
> > think this won't hurt much, as multiple guests still are able to run
> > different LEs?
> > 
> > Also, the only way to fail an MSR
> > > write is to send #GP, and Windows (and even Linux) may not expect
> > > that.  Linux doesn't panic due to #GP on MSR writes these days, but
> > > you still get a big fat warning.  I wouldn't be at all surprised if
> > > Windows BSODs.
> > 
> > We cannot allow writing some particular value to MSRs successfully,
> > while injecting #GP when writing other values to the same MSRs. So #GP
> > is not option.
> > 
> > ENCLS[EINIT], on the other hand, returns an actual
> > > error code.  I'm not sure that a sensible error code exists
> > > ("SGX_HYPERVISOR_SAID_NO?", perhaps),
> > 
> > Looks no such error code exists. And we cannot return such error code to
> > guest as such error code is only supposed to be valid when ENCLS is run
> > in hypervisor.
> > 
> > but SGX_INVALID_EINITTOKEN seems
> > > to mean, more or less, "the CPU thinks you're not authorized to do
> > > this", so forcing that error code could be entirely reasonable.
> > > 
> > > If the host policy is to allow a list of LE signers, you could return
> > > SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
> > > the list.
> > 
> > But this would be inconsistent with HW behavior. If the hash value in
> > guest's IA32_SGXLEPUBKEYHASHn is matched with the one passed by EINIT,
> > EINIT is not supposed to return SGX_INVALID_EINITTOKEN.
> > 
> > I think from VMM's perspective, emulating HW behavior to be consistent
> > with real HW behavior is very important.
> > 
> > Paolo, would you provide your comments?
> 
> Hi all,
> 
> This has been quite for a while and I'd like to start discussion again.
> Jarkko told me that currently he only supports one LE in SGX driver, but I
> am not sure whether he is going to extend in the future or not. I think this
> might also depend on requirements from customers.
> 
> Andy,
> 
> If we only support one LE in driver, then we can only support the same LE
> for all KVM guests, according to your comments that host kernel launch
> control policy should also apply to KVM guests? WOuld you comments more?
> 
> Jarkko,
> 
> Could you help to clarify the whole launch control policy in host side so
> that we can have a better understanding together?
> 
> Thanks,
> -Kai

So, I have a pass-through LE. It creates an EINITTOKEN for anything.
Couldn't the VMM keep virtual values for the MSRs and ask the host-side LE
to create a token when it needs one?

/Jarkko

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-08 12:31                   ` Jarkko Sakkinen
@ 2017-06-08 23:47                     ` Huang, Kai
  2017-06-08 23:53                       ` Andy Lutomirski
  2017-06-10 12:23                       ` Jarkko Sakkinen
  0 siblings, 2 replies; 78+ messages in thread
From: Huang, Kai @ 2017-06-08 23:47 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Andy Lutomirski, Kai Huang, Paolo Bonzini, Radim Krcmar,
	kvm list, intel-sgx-kernel-dev, haim.cohen



On 6/9/2017 12:31 AM, Jarkko Sakkinen wrote:
> On Wed, Jun 07, 2017 at 08:52:42AM +1200, Huang, Kai wrote:
>>
>>
>> On 5/18/2017 7:45 PM, Huang, Kai wrote:
>>>
>>>
>>> On 5/17/2017 12:09 PM, Andy Lutomirski wrote:
>>>> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai
>>>> <kai.huang@linux.intel.com> wrote:
>>>>>
>>>>>
>>>>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
>>>>>>
>>>>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> I am not sure whether the cost of writing to 4 MSRs
>>>>>>> would be *extremely*
>>>>>>> slow, as when vcpu is schedule in, KVM is already doing vmcs_load,
>>>>>>> writing
>>>>>>> to several MSRs, etc.
>>>>>>
>>>>>>
>>>>>> I'm speculating that these MSRs may be rather unoptimized and hence
>>>>>> unusualy slow.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>>>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>>>>>> will take the mutex, compare the percpu variable to
>>>>>>>> the desired value,
>>>>>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>>>>>
>>>>>>>> KVM will implement writes to SGXLEPUBKEYHASH by
>>>>>>>> updating its in-memory
>>>>>>>> state but *not* changing the MSRs.  KVM will trap
>>>>>>>> and emulate EINIT to
>>>>>>>> support the same handling as the host.  There is no
>>>>>>>> action required at
>>>>>>>> all on KVM guest entry and exit.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This is doable, but SGX driver needs to do those things and expose
>>>>>>> interfaces for KVM to use. In terms of the percpu data, it is nice to
>>>>>>> have,
>>>>>>> but I am not sure whether it is mandatory, as IMO EINIT is not even in
>>>>>>> performance critical path. We can simply read old value
>>>>>>> from MSRs out and
>>>>>>> compare whether the old equals to the new.
>>>>>>
>>>>>>
>>>>>> I think the SGX driver should probably live in arch/x86, and the
>>>>>> interface could be a simple percpu variable that is exported (from the
>>>>>> main kernel image, not from a module).
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for
>>>>>>>> other reasons: someone is going to want to implement policy for what
>>>>>>>> enclaves are allowed that applies to guests as well as the host.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I am not very convinced why "what enclaves are allowed" in host would
>>>>>>> apply
>>>>>>> to guest. Can you elaborate? I mean in general
>>>>>>> virtualization just focus
>>>>>>> emulating hardware behavior. If a native machine is able
>>>>>>> to run any LE,
>>>>>>> the
>>>>>>> virtual machine should be able to as well (of course, with guest's
>>>>>>> IA32_FEATURE_CONTROL[bit 17] set).
>>>>>>
>>>>>>
>>>>>> I strongly disagree.  I can imagine two classes of sensible policies
>>>>>> for launch control:
>>>>>>
>>>>>> 1. Allow everything.  This seems quite sensible to me.
>>>>>>
>>>>>> 2. Allow some things, and make sure that VMs have at least as
>>>>>> restrictive a policy as host root has.  After all, what's the point of
>>>>>> restricting enclaves in the host if host code can simply spawn a
>>>>>> little VM to run otherwise-disallowed enclaves?
>>>>>
>>>>>
>>>>> What's the current SGX driver launch control policy? Yes allow
>>>>> everything
>>>>> works for KVM so lets skip this. Are we going to support
>>>>> allowing several
>>>>> LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE
>>>>> staff but I don't know details.
>>>>>
>>>>> I am trying to find a way that we can both not break host launch control
>>>>> policy, and be consistent to HW behavior (from guest's view).
>>>>> Currently we
>>>>> can create a KVM guest with runtime change to
>>>>> IA32_SGXLEPUBKEYHASHn either
>>>>> enabled or disabled. I introduced an Qemu parameter 'lewr' for
>>>>> this purpose.
>>>>> Actually I introduced below Qemu SGX parameters for creating guest:
>>>>>
>>>>>          -sgx epc=<size>,lehash='SHA-256 hash',lewr
>>>>>
>>>>> where 'epc' specifies guest's EPC size, lehash specifies
>>>>> (initial) value of
>>>>> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether
>>>>> guest is allowed
>>>>> to change guest's IA32_SGXLEPUBKEYHASHn at runtime.
>>>>>
>>>>> If host only allows one single LE to run, KVM can add a restrict
>>>>> that only
>>>>> allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
>>>>> disabled, so that only host allowed (single) hash can be used by
>>>>> guest. From
>>>>> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has
>>>>> IA32_SGXLEPUBKEYHASHn with default value to be host allowed
>>>>> (single) hash.
>>>>>
>>>>> If host allows several LEs (not but everything), and if we
>>>>> create guest with
>>>>> 'lewr', then the behavior is not consistent with HW behavior, as from
>>>>> guest's hardware's point of view, we can actually run any LE but
>>>>> we have to
>>>>> tell guest that you are only allowed to change
>>>>> IA32_SGXLEPUBKEYHASHn to some
>>>>> specific values. One compromise solution is we don't allow to
>>>>> create guest
>>>>> with 'lewr' specified, and at the meantime, only allow to create
>>>>> guest with
>>>>> host approved hashes specified in 'lehash'. This will make
>>>>> guest's behavior
>>>>> consistent to HW behavior but only allows guest to run one LE (which is
>>>>> specified by 'lehash' when guest is created).
>>>>
>>>> I'm not sure I entirely agree for a couple reasons.
>>>>
>>>> 1. I wouldn't be surprised if the kernel ends up implementing a policy
>>>> in which it checks all enclaves (not just LEs) for acceptability.  In
>>>> fact, if the kernel sticks with the "no LE at all or just
>>>> kernel-internal LE", then checking enclaves directly against some
>>>> admin- or distro-provided signer list seems reasonable.  This type of
>>>> policy can't be forwarded to a guest by restricting allowed LE
>>>> signers.  But this is mostly speculation since AFAIK no one has
>>>> seriously proposed any particular policy support and the plan was to
>>>> not have this for the initial implementation.
>>>>
>>>> 2. While matching hardware behavior is nice in principle, there
>>>> doesn't seem to be useful hardware behavior to match here.  If the
>>>> host had a list of five allowed LE signers, how exactly would it
>>>> restrict the MSRs?  They're not written atomically, so you can't
>>>> directly tell what's being written.
>>>
>>> In this case I actually plan to just allow creating guest with guest's
>>> IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is
>>> specified, creating guest will fail. And we only allow creating guest
>>> with host allowed hash values (with 'lehash=hash-value'), and if
>>> 'hash-value' specified by 'lehash' is not allowed by host, we also fail
>>> to create guest.
>>>
>>> We can only allow creating guest with 'lewr' specified when host allows
>>> anything.
>>>
>>> But in this way, we are restricting guest OS's ability to run LE, as
>>> only one LE, that is specified by 'lehash' parameter, can be run. But I
>>> think this won't hurt much, as multiple guests still are able to run
>>> different LEs?
>>>
>>> Also, the only way to fail an MSR
>>>> write is to send #GP, and Windows (and even Linux) may not expect
>>>> that.  Linux doesn't panic due to #GP on MSR writes these days, but
>>>> you still get a big fat warning.  I wouldn't be at all surprised if
>>>> Windows BSODs.
>>>
>>> We cannot allow writing some particular value to MSRs successfully,
>>> while injecting #GP when writing other values to the same MSRs. So #GP
>>> is not option.
>>>
>>> ENCLS[EINIT], on the other hand, returns an actual
>>>> error code.  I'm not sure that a sensible error code exists
>>>> ("SGX_HYPERVISOR_SAID_NO?", perhaps),
>>>
>>> Looks no such error code exists. And we cannot return such error code to
>>> guest as such error code is only supposed to be valid when ENCLS is run
>>> in hypervisor.
>>>
>>> but SGX_INVALID_EINITTOKEN seems
>>>> to mean, more or less, "the CPU thinks you're not authorized to do
>>>> this", so forcing that error code could be entirely reasonable.
>>>>
>>>> If the host policy is to allow a list of LE signers, you could return
>>>> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
>>>> the list.
>>>
>>> But this would be inconsistent with HW behavior. If the hash value in
>>> guest's IA32_SGXLEPUBKEYHASHn is matched with the one passed by EINIT,
>>> EINIT is not supposed to return SGX_INVALID_EINITTOKEN.
>>>
>>> I think from VMM's perspective, emulating HW behavior to be consistent
>>> with real HW behavior is very important.
>>>
>>> Paolo, would you provide your comments?
>>
>> Hi all,
>>
>> This has been quite for a while and I'd like to start discussion again.
>> Jarkko told me that currently he only supports one LE in SGX driver, but I
>> am not sure whether he is going to extend in the future or not. I think this
>> might also depend on requirements from customers.
>>
>> Andy,
>>
>> If we only support one LE in driver, then we can only support the same LE
>> for all KVM guests, according to your comments that host kernel launch
>> control policy should also apply to KVM guests? WOuld you comments more?
>>
>> Jarkko,
>>
>> Could you help to clarify the whole launch control policy in host side so
>> that we can have a better understanding together?
>>
>> Thanks,
>> -Kai
> 
> So. I have pass through LE. It creates EINITTOKEN for anything. Couldn't
> VMM keep virtual values for MSRs and ask host side LE create token when
> it needs to?

Hi Jarkko,

Thanks for replying. The VMM doesn't need the driver to generate an 
EINITTOKEN; the EINITTOKEN comes from the guest too, obtained when the VMM 
traps EINIT.

I think Andy's comment is: if the host SGX driver only allows certain 
parties' LEs to run (e.g. LEs from Intel, Redhat, etc.), then the SGX 
driver should govern LEs from KVM guests as well, by checking the SIGSTRUCT 
from the KVM guest (which KVM obtains by trapping the guest's EINIT).

In my understanding, although you only allow one LE in the kernel, you 
don't limit whose LE can be run (basically the kernel can run an LE signed 
by anyone, just only one LE while the kernel is running), so I don't see 
any limitation on KVM guests here.

But it may still be better if the SGX driver can provide a function like:

     int sgx_validate_sigstruct(struct sigstruct *sig);

for KVM to call, in case the driver is changed (e.g. to only allow LEs from 
some particular signers to run), but this is not necessary now. The KVM 
changes can be done later when the driver makes that change.
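
Just to illustrate what I have in mind, here is a rough sketch of how KVM's 
EINIT trap path could use such a hook (everything except 
sgx_validate_sigstruct() is a made-up name for the sketch, not existing 
code):

	/*
	 * Hypothetical sketch: KVM's EINIT trap handler asks the host
	 * driver whether the guest's SIGSTRUCT is approved before running
	 * EINIT on the guest's behalf.
	 */
	static int handle_guest_einit(struct kvm_vcpu *vcpu,
				      struct sigstruct *sigstruct,
				      struct einittoken *token)
	{
		int ret;

		ret = sgx_validate_sigstruct(sigstruct);
		if (ret)
			/* Signer not approved by the host: report an error
			 * (e.g. SGX_INVALID_EINITTOKEN) to the guest. */
			return sgx_inject_einit_error(vcpu, ret);

		/* Approved: run EINIT for the guest and inject whatever
		 * result the hardware returns. */
		return kvm_run_einit(vcpu, sigstruct, token);
	}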

Andy,

Am I understanding correctly? Does this make sense to you?

Thanks,
-Kai

> 
> /Jarkko
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-08 23:47                     ` Huang, Kai
@ 2017-06-08 23:53                       ` Andy Lutomirski
  2017-06-09 15:38                         ` Cohen, Haim
  2017-06-10 12:23                       ` Jarkko Sakkinen
  1 sibling, 1 reply; 78+ messages in thread
From: Andy Lutomirski @ 2017-06-08 23:53 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Jarkko Sakkinen, Andy Lutomirski, Kai Huang, Paolo Bonzini,
	Radim Krcmar, kvm list, intel-sgx-kernel-dev, haim.cohen

On Thu, Jun 8, 2017 at 4:47 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>
>
> On 6/9/2017 12:31 AM, Jarkko Sakkinen wrote:
>>
>> On Wed, Jun 07, 2017 at 08:52:42AM +1200, Huang, Kai wrote:
>>>
>>>
>>>
>>> On 5/18/2017 7:45 PM, Huang, Kai wrote:
>>>>
>>>>
>>>>
>>>> On 5/17/2017 12:09 PM, Andy Lutomirski wrote:
>>>>>
>>>>> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai
>>>>> <kai.huang@linux.intel.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai
>>>>>>> <kai.huang@linux.intel.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> I am not sure whether the cost of writing to 4 MSRs
>>>>>>>> would be *extremely*
>>>>>>>> slow, as when vcpu is schedule in, KVM is already doing vmcs_load,
>>>>>>>> writing
>>>>>>>> to several MSRs, etc.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I'm speculating that these MSRs may be rather unoptimized and hence
>>>>>>> unusualy slow.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH
>>>>>>>>> along
>>>>>>>>> with whatever lock is needed (probably just a mutex).  Users of
>>>>>>>>> EINIT
>>>>>>>>> will take the mutex, compare the percpu variable to
>>>>>>>>> the desired value,
>>>>>>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>>>>>>
>>>>>>>>> KVM will implement writes to SGXLEPUBKEYHASH by
>>>>>>>>> updating its in-memory
>>>>>>>>> state but *not* changing the MSRs.  KVM will trap
>>>>>>>>> and emulate EINIT to
>>>>>>>>> support the same handling as the host.  There is no
>>>>>>>>> action required at
>>>>>>>>> all on KVM guest entry and exit.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> This is doable, but SGX driver needs to do those things and expose
>>>>>>>> interfaces for KVM to use. In terms of the percpu data, it is nice
>>>>>>>> to
>>>>>>>> have,
>>>>>>>> but I am not sure whether it is mandatory, as IMO EINIT is not even
>>>>>>>> in
>>>>>>>> performance critical path. We can simply read old value
>>>>>>>> from MSRs out and
>>>>>>>> compare whether the old equals to the new.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I think the SGX driver should probably live in arch/x86, and the
>>>>>>> interface could be a simple percpu variable that is exported (from
>>>>>>> the
>>>>>>> main kernel image, not from a module).
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT
>>>>>>>>> for
>>>>>>>>> other reasons: someone is going to want to implement policy for
>>>>>>>>> what
>>>>>>>>> enclaves are allowed that applies to guests as well as the host.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I am not very convinced why "what enclaves are allowed" in host
>>>>>>>> would
>>>>>>>> apply
>>>>>>>> to guest. Can you elaborate? I mean in general
>>>>>>>> virtualization just focus
>>>>>>>> emulating hardware behavior. If a native machine is able
>>>>>>>> to run any LE,
>>>>>>>> the
>>>>>>>> virtual machine should be able to as well (of course, with guest's
>>>>>>>> IA32_FEATURE_CONTROL[bit 17] set).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I strongly disagree.  I can imagine two classes of sensible policies
>>>>>>> for launch control:
>>>>>>>
>>>>>>> 1. Allow everything.  This seems quite sensible to me.
>>>>>>>
>>>>>>> 2. Allow some things, and make sure that VMs have at least as
>>>>>>> restrictive a policy as host root has.  After all, what's the point
>>>>>>> of
>>>>>>> restricting enclaves in the host if host code can simply spawn a
>>>>>>> little VM to run otherwise-disallowed enclaves?
>>>>>>
>>>>>>
>>>>>>
>>>>>> What's the current SGX driver launch control policy? Yes allow
>>>>>> everything
>>>>>> works for KVM so lets skip this. Are we going to support
>>>>>> allowing several
>>>>>> LEs, or just allowing one single LE? I know Jarkko is doing in-kernel
>>>>>> LE
>>>>>> staff but I don't know details.
>>>>>>
>>>>>> I am trying to find a way that we can both not break host launch
>>>>>> control
>>>>>> policy, and be consistent to HW behavior (from guest's view).
>>>>>> Currently we
>>>>>> can create a KVM guest with runtime change to
>>>>>> IA32_SGXLEPUBKEYHASHn either
>>>>>> enabled or disabled. I introduced an Qemu parameter 'lewr' for
>>>>>> this purpose.
>>>>>> Actually I introduced below Qemu SGX parameters for creating guest:
>>>>>>
>>>>>>          -sgx epc=<size>,lehash='SHA-256 hash',lewr
>>>>>>
>>>>>> where 'epc' specifies guest's EPC size, lehash specifies
>>>>>> (initial) value of
>>>>>> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether
>>>>>> guest is allowed
>>>>>> to change guest's IA32_SGXLEPUBKEYHASHn at runtime.
>>>>>>
>>>>>> If host only allows one single LE to run, KVM can add a restrict
>>>>>> that only
>>>>>> allows to create KVM guest with runtime change to
>>>>>> IA32_SGXLEPUBKEYHASHn
>>>>>> disabled, so that only host allowed (single) hash can be used by
>>>>>> guest. From
>>>>>> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and
>>>>>> has
>>>>>> IA32_SGXLEPUBKEYHASHn with default value to be host allowed
>>>>>> (single) hash.
>>>>>>
>>>>>> If host allows several LEs (not but everything), and if we
>>>>>> create guest with
>>>>>> 'lewr', then the behavior is not consistent with HW behavior, as from
>>>>>> guest's hardware's point of view, we can actually run any LE but
>>>>>> we have to
>>>>>> tell guest that you are only allowed to change
>>>>>> IA32_SGXLEPUBKEYHASHn to some
>>>>>> specific values. One compromise solution is we don't allow to
>>>>>> create guest
>>>>>> with 'lewr' specified, and at the meantime, only allow to create
>>>>>> guest with
>>>>>> host approved hashes specified in 'lehash'. This will make
>>>>>> guest's behavior
>>>>>> consistent to HW behavior but only allows guest to run one LE (which
>>>>>> is
>>>>>> specified by 'lehash' when guest is created).
>>>>>
>>>>>
>>>>> I'm not sure I entirely agree for a couple reasons.
>>>>>
>>>>> 1. I wouldn't be surprised if the kernel ends up implementing a policy
>>>>> in which it checks all enclaves (not just LEs) for acceptability.  In
>>>>> fact, if the kernel sticks with the "no LE at all or just
>>>>> kernel-internal LE", then checking enclaves directly against some
>>>>> admin- or distro-provided signer list seems reasonable.  This type of
>>>>> policy can't be forwarded to a guest by restricting allowed LE
>>>>> signers.  But this is mostly speculation since AFAIK no one has
>>>>> seriously proposed any particular policy support and the plan was to
>>>>> not have this for the initial implementation.
>>>>>
>>>>> 2. While matching hardware behavior is nice in principle, there
>>>>> doesn't seem to be useful hardware behavior to match here.  If the
>>>>> host had a list of five allowed LE signers, how exactly would it
>>>>> restrict the MSRs?  They're not written atomically, so you can't
>>>>> directly tell what's being written.
>>>>
>>>>
>>>> In this case I actually plan to just allow creating guest with guest's
>>>> IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is
>>>> specified, creating guest will fail. And we only allow creating guest
>>>> with host allowed hash values (with 'lehash=hash-value'), and if
>>>> 'hash-value' specified by 'lehash' is not allowed by host, we also fail
>>>> to create guest.
>>>>
>>>> We can only allow creating guest with 'lewr' specified when host allows
>>>> anything.
>>>>
>>>> But in this way, we are restricting guest OS's ability to run LE, as
>>>> only one LE, that is specified by 'lehash' parameter, can be run. But I
>>>> think this won't hurt much, as multiple guests still are able to run
>>>> different LEs?
>>>>
>>>> Also, the only way to fail an MSR
>>>>>
>>>>> write is to send #GP, and Windows (and even Linux) may not expect
>>>>> that.  Linux doesn't panic due to #GP on MSR writes these days, but
>>>>> you still get a big fat warning.  I wouldn't be at all surprised if
>>>>> Windows BSODs.
>>>>
>>>>
>>>> We cannot allow writing some particular value to MSRs successfully,
>>>> while injecting #GP when writing other values to the same MSRs. So #GP
>>>> is not option.
>>>>
>>>> ENCLS[EINIT], on the other hand, returns an actual
>>>>>
>>>>> error code.  I'm not sure that a sensible error code exists
>>>>> ("SGX_HYPERVISOR_SAID_NO?", perhaps),
>>>>
>>>>
>>>> Looks no such error code exists. And we cannot return such error code to
>>>> guest as such error code is only supposed to be valid when ENCLS is run
>>>> in hypervisor.
>>>>
>>>> but SGX_INVALID_EINITTOKEN seems
>>>>>
>>>>> to mean, more or less, "the CPU thinks you're not authorized to do
>>>>> this", so forcing that error code could be entirely reasonable.
>>>>>
>>>>> If the host policy is to allow a list of LE signers, you could return
>>>>> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
>>>>> the list.
>>>>
>>>>
>>>> But this would be inconsistent with HW behavior. If the hash value in
>>>> guest's IA32_SGXLEPUBKEYHASHn is matched with the one passed by EINIT,
>>>> EINIT is not supposed to return SGX_INVALID_EINITTOKEN.
>>>>
>>>> I think from VMM's perspective, emulating HW behavior to be consistent
>>>> with real HW behavior is very important.
>>>>
>>>> Paolo, would you provide your comments?
>>>
>>>
>>> Hi all,
>>>
>>> This has been quite for a while and I'd like to start discussion again.
>>> Jarkko told me that currently he only supports one LE in SGX driver, but
>>> I
>>> am not sure whether he is going to extend in the future or not. I think
>>> this
>>> might also depend on requirements from customers.
>>>
>>> Andy,
>>>
>>> If we only support one LE in driver, then we can only support the same LE
>>> for all KVM guests, according to your comments that host kernel launch
>>> control policy should also apply to KVM guests? WOuld you comments more?
>>>
>>> Jarkko,
>>>
>>> Could you help to clarify the whole launch control policy in host side so
>>> that we can have a better understanding together?
>>>
>>> Thanks,
>>> -Kai
>>
>>
>> So. I have pass through LE. It creates EINITTOKEN for anything. Couldn't
>> VMM keep virtual values for MSRs and ask host side LE create token when
>> it needs to?
>
>
> Hi Jarkko,
>
> Thanks for replying. VMM doesn't need driver to generate EINITTOKEN. The
> EINITTOKEN is from guest too, upon VMM traps EINIT.
>
> I think Andy's comments is, if host SGX driver only allows someone's LE to
> run (ex, LE from Intel, Redhat, etc...), then SGX driver should also govern
> LEs from KVM guests as well, by checking SIGSTRUCT from KVM guest (the
> SIGSTRUCT is provided by KVM by trapping guest's EINIT).
>
> In my understanding, although you only allows one LE in kernel, but you
> won't limit who's LE can be run (basically kernel can run LE signed by
> anyone, but just one LE when kernel is running), so I don't see there is any
> limitation to KVM guests here.
>
> But it may still be better if SGX driver can provide function like:
>
>     int sgx_validate_sigstruct(struct sigstruct *sig);
>
> for KVM to call, in case driver is changed (ex, to only allows LEs from some
> particular ones to run), but this is not necessary now. KVM changes can be
> done later when driver make the changes.
>
> Andy,
>
> Am I understanding correctly? Does this make sense to you?

My understanding is that the kernel will not (at least initially)
allow users to supply an LE.  The kernel will handle launches all by
itself.  How it does so is an implementation detail, and it seems to
me that it will cause compatibility issues if guests have to use the
host's LE.

sgx_validate_sigstruct(...) sounds like it could be a good approach to me.

>
> Thanks,
> -Kai
>
>>
>> /Jarkko
>>
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* RE: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-08 23:53                       ` Andy Lutomirski
@ 2017-06-09 15:38                         ` Cohen, Haim
  0 siblings, 0 replies; 78+ messages in thread
From: Cohen, Haim @ 2017-06-09 15:38 UTC (permalink / raw)
  To: Andy Lutomirski, Huang, Kai
  Cc: Jarkko Sakkinen, Kai Huang, Paolo Bonzini, Radim Krcmar,
	kvm list, intel-sgx-kernel-dev, Cohen, Haim

On 6/8/2017 7:53 PM, Andy Lutomirski wrote:
>
>On Thu, Jun 8, 2017 at 4:47 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>
>>
>> On 6/9/2017 12:31 AM, Jarkko Sakkinen wrote:
>>>
>>> On Wed, Jun 07, 2017 at 08:52:42AM +1200, Huang, Kai wrote:
>>>>
>>>>
>>>>
>>>> On 5/18/2017 7:45 PM, Huang, Kai wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 5/17/2017 12:09 PM, Andy Lutomirski wrote:
>>>>>>
>>>>>> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai
>>>>>> <kai.huang@linux.intel.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai
>>>>>>>> <kai.huang@linux.intel.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I am not sure whether the cost of writing to 4 MSRs would be
>>>>>>>>> *extremely* slow, as when vcpu is schedule in, KVM is already
>>>>>>>>> doing vmcs_load, writing to several MSRs, etc.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I'm speculating that these MSRs may be rather unoptimized and
>>>>>>>> hence unusualy slow.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH
>>>>>>>>>> along with whatever lock is needed (probably just a mutex).
>>>>>>>>>> Users of EINIT will take the mutex, compare the percpu
>>>>>>>>>> variable to the desired value, and, if it's different, do
>>>>>>>>>> WRMSR and update the percpu variable.
>>>>>>>>>>
>>>>>>>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its
>>>>>>>>>> in-memory state but *not* changing the MSRs.  KVM will trap
>>>>>>>>>> and emulate EINIT to support the same handling as the host.
>>>>>>>>>> There is no action required at all on KVM guest entry and
>>>>>>>>>> exit.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is doable, but SGX driver needs to do those things and
>>>>>>>>> expose interfaces for KVM to use. In terms of the percpu data,
>>>>>>>>> it is nice to have, but I am not sure whether it is mandatory,
>>>>>>>>> as IMO EINIT is not even in performance critical path. We can
>>>>>>>>> simply read old value from MSRs out and compare whether the old
>>>>>>>>> equals to the new.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I think the SGX driver should probably live in arch/x86, and the
>>>>>>>> interface could be a simple percpu variable that is exported
>>>>>>>> (from the main kernel image, not from a module).
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> FWIW, I think that KVM will, in the long run, want to trap
>>>>>>>>>> EINIT for other reasons: someone is going to want to implement
>>>>>>>>>> policy for what enclaves are allowed that applies to guests as
>>>>>>>>>> well as the host.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I am not very convinced why "what enclaves are allowed" in host
>>>>>>>>> would apply to guest. Can you elaborate? I mean in general
>>>>>>>>> virtualization just focus emulating hardware behavior. If a
>>>>>>>>> native machine is able to run any LE, the virtual machine
>>>>>>>>> should be able to as well (of course, with guest's
>>>>>>>>> IA32_FEATURE_CONTROL[bit 17] set).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I strongly disagree.  I can imagine two classes of sensible
>>>>>>>> policies for launch control:
>>>>>>>>
>>>>>>>> 1. Allow everything.  This seems quite sensible to me.
>>>>>>>>
>>>>>>>> 2. Allow some things, and make sure that VMs have at least as
>>>>>>>> restrictive a policy as host root has.  After all, what's the
>>>>>>>> point of restricting enclaves in the host if host code can
>>>>>>>> simply spawn a little VM to run otherwise-disallowed enclaves?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> What's the current SGX driver launch control policy? Yes allow
>>>>>>> everything works for KVM so lets skip this. Are we going to
>>>>>>> support allowing several LEs, or just allowing one single LE? I
>>>>>>> know Jarkko is doing in-kernel LE staff but I don't know details.
>>>>>>>
>>>>>>> I am trying to find a way that we can both not break host launch
>>>>>>> control policy, and be consistent to HW behavior (from guest's
>>>>>>> view).
>>>>>>> Currently we
>>>>>>> can create a KVM guest with runtime change to
>>>>>>> IA32_SGXLEPUBKEYHASHn either enabled or disabled. I introduced an
>>>>>>> Qemu parameter 'lewr' for this purpose.
>>>>>>> Actually I introduced below Qemu SGX parameters for creating guest:
>>>>>>>
>>>>>>>          -sgx epc=<size>,lehash='SHA-256 hash',lewr
>>>>>>>
>>>>>>> where 'epc' specifies guest's EPC size, lehash specifies
>>>>>>> (initial) value of
>>>>>>> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether guest
>>>>>>> is allowed to change guest's IA32_SGXLEPUBKEYHASHn at runtime.
>>>>>>>
>>>>>>> If host only allows one single LE to run, KVM can add a restrict
>>>>>>> that only allows to create KVM guest with runtime change to
>>>>>>> IA32_SGXLEPUBKEYHASHn disabled, so that only host allowed
>>>>>>> (single) hash can be used by guest. From guest's view, it simply
>>>>>>> has IA32_FEATURE_CONTROL[bit17] cleared and has
>>>>>>> IA32_SGXLEPUBKEYHASHn with default value to be host allowed
>>>>>>> (single) hash.
>>>>>>>
>>>>>>> If host allows several LEs (not but everything), and if we create
>>>>>>> guest with 'lewr', then the behavior is not consistent with HW
>>>>>>> behavior, as from guest's hardware's point of view, we can
>>>>>>> actually run any LE but we have to tell guest that you are only
>>>>>>> allowed to change IA32_SGXLEPUBKEYHASHn to some specific values.
>>>>>>> One compromise solution is we don't allow to create guest with
>>>>>>> 'lewr' specified, and at the meantime, only allow to create guest
>>>>>>> with host approved hashes specified in 'lehash'. This will make
>>>>>>> guest's behavior consistent to HW behavior but only allows guest
>>>>>>> to run one LE (which is specified by 'lehash' when guest is
>>>>>>> created).
>>>>>>
>>>>>>
>>>>>> I'm not sure I entirely agree for a couple reasons.
>>>>>>
>>>>>> 1. I wouldn't be surprised if the kernel ends up implementing a
>>>>>> policy in which it checks all enclaves (not just LEs) for
>>>>>> acceptability.  In fact, if the kernel sticks with the "no LE at
>>>>>> all or just kernel-internal LE", then checking enclaves directly
>>>>>> against some
>>>>>> admin- or distro-provided signer list seems reasonable.  This type
>>>>>> of policy can't be forwarded to a guest by restricting allowed LE
>>>>>> signers.  But this is mostly speculation since AFAIK no one has
>>>>>> seriously proposed any particular policy support and the plan was
>>>>>> to not have this for the initial implementation.
>>>>>>
>>>>>> 2. While matching hardware behavior is nice in principle, there
>>>>>> doesn't seem to be useful hardware behavior to match here.  If the
>>>>>> host had a list of five allowed LE signers, how exactly would it
>>>>>> restrict the MSRs?  They're not written atomically, so you can't
>>>>>> directly tell what's being written.
>>>>>
>>>>>
>>>>> In this case I actually plan to just allow creating guest with
>>>>> guest's IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified).
>>>>> If 'lewr' is specified, creating guest will fail. And we only allow
>>>>> creating guest with host allowed hash values (with
>>>>> 'lehash=hash-value'), and if 'hash-value' specified by 'lehash' is
>>>>> not allowed by host, we also fail to create guest.
>>>>>
>>>>> We can only allow creating guest with 'lewr' specified when host
>>>>> allows anything.
>>>>>
>>>>> But in this way, we are restricting guest OS's ability to run LE,
>>>>> as only one LE, that is specified by 'lehash' parameter, can be
>>>>> run. But I think this won't hurt much, as multiple guests still are
>>>>> able to run different LEs?
>>>>>
>>>>> Also, the only way to fail an MSR
>>>>>>
>>>>>> write is to send #GP, and Windows (and even Linux) may not expect
>>>>>> that.  Linux doesn't panic due to #GP on MSR writes these days,
>>>>>> but you still get a big fat warning.  I wouldn't be at all
>>>>>> surprised if Windows BSODs.
>>>>>
>>>>>
>>>>> We cannot allow writing some particular value to MSRs successfully,
>>>>> while injecting #GP when writing other values to the same MSRs. So
>>>>> #GP is not option.
>>>>>
>>>>> ENCLS[EINIT], on the other hand, returns an actual
>>>>>>
>>>>>> error code.  I'm not sure that a sensible error code exists
>>>>>> ("SGX_HYPERVISOR_SAID_NO?", perhaps),
>>>>>
>>>>>
>>>>> Looks no such error code exists. And we cannot return such error
>>>>> code to guest as such error code is only supposed to be valid when
>>>>> ENCLS is run in hypervisor.
>>>>>
>>>>> but SGX_INVALID_EINITTOKEN seems
>>>>>>
>>>>>> to mean, more or less, "the CPU thinks you're not authorized to do
>>>>>> this", so forcing that error code could be entirely reasonable.
>>>>>>
>>>>>> If the host policy is to allow a list of LE signers, you could
>>>>>> return SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE
>>>>>> that isn't in the list.
>>>>>
>>>>>
>>>>> But this would be inconsistent with HW behavior. If the hash value
>>>>> in guest's IA32_SGXLEPUBKEYHASHn is matched with the one passed by
>>>>> EINIT, EINIT is not supposed to return SGX_INVALID_EINITTOKEN.
>>>>>
>>>>> I think from VMM's perspective, emulating HW behavior to be
>>>>> consistent with real HW behavior is very important.
>>>>>
>>>>> Paolo, would you provide your comments?
>>>>
>>>>
>>>> Hi all,
>>>>
>>>> This has been quite for a while and I'd like to start discussion again.
>>>> Jarkko told me that currently he only supports one LE in SGX driver,
>>>> but I am not sure whether he is going to extend in the future or
>>>> not. I think this might also depend on requirements from customers.
>>>>
>>>> Andy,
>>>>
>>>> If we only support one LE in driver, then we can only support the
>>>> same LE for all KVM guests, according to your comments that host
>>>> kernel launch control policy should also apply to KVM guests? WOuld you
>comments more?
>>>>
>>>> Jarkko,
>>>>
>>>> Could you help to clarify the whole launch control policy in host
>>>> side so that we can have a better understanding together?
>>>>
>>>> Thanks,
>>>> -Kai
>>>
>>>
>>> So. I have pass through LE. It creates EINITTOKEN for anything.
>>> Couldn't VMM keep virtual values for MSRs and ask host side LE create
>>> token when it needs to?
>>
>>
>> Hi Jarkko,
>>
>> Thanks for replying. VMM doesn't need driver to generate EINITTOKEN.
>> The EINITTOKEN is from guest too, upon VMM traps EINIT.
>>
>> I think Andy's comments is, if host SGX driver only allows someone's
>> LE to run (ex, LE from Intel, Redhat, etc...), then SGX driver should
>> also govern LEs from KVM guests as well, by checking SIGSTRUCT from
>> KVM guest (the SIGSTRUCT is provided by KVM by trapping guest's EINIT).
>>
>> In my understanding, although you only allows one LE in kernel, but
>> you won't limit who's LE can be run (basically kernel can run LE
>> signed by anyone, but just one LE when kernel is running), so I don't
>> see there is any limitation to KVM guests here.
>>
>> But it may still be better if SGX driver can provide function like:
>>
>>     int sgx_validate_sigstruct(struct sigstruct *sig);
>>
>> for KVM to call, in case driver is changed (ex, to only allows LEs
>> from some particular ones to run), but this is not necessary now. KVM
>> changes can be done later when driver make the changes.
>>
>> Andy,
>>
>> Am I understanding correctly? Does this make sense to you?
>
>My understanding is that the kernel will not (at least initially) allow users to supply
>an LE.  The kernel will handle launches all by itself.  How it does so is an
>implementation detail, and it seems to me that it will cause compatibility issues if
>guests have to use the host's LE.
>
>sgx_validate_sigstruct(...) sounds like it could be a good approach to me.
>

I don't think the kernel needs to support more than one LE.
To my understanding, the guest kernel may include a different LE and will 
require a different LE hash, set either by configuring 'lehash' when 
launching the guest, or by the guest kernel writing the hash value itself.
I assume the VMM should not block these MSR writes and should allow the 
guest to set the hash values for its own LE (by caching the MSR writes and 
applying the values on EINIT), as sketched below.
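
Roughly what I mean, as a sketch (the field and MSR constant names are my 
guesses here, not the actual patch code):

	/*
	 * Hypothetical sketch: a guest WRMSR to IA32_SGXLEPUBKEYHASH0..3
	 * is cached in vcpu state; the real MSRs are only written later,
	 * when EINIT is trapped.
	 */
	static int vmx_set_sgxlepubkeyhash(struct kvm_vcpu *vcpu,
					   u32 msr_index, u64 data)
	{
		struct vcpu_vmx *vmx = to_vmx(vcpu);

		vmx->msr_ia32_sgxlepubkeyhash[msr_index -
					      MSR_IA32_SGXLEPUBKEYHASH0] = data;
		return 0;
	}

Guest RDMSR would likewise be served from the cached values.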

So I don't see why the host kernel should provide an EINITTOKEN for the guests.
Moreover, I guess the approach of the LE providing tokens to any enclave is 
only the initial implementation; different distributions may decide to add 
some logic to the token generation, so it might be that the host LE or the 
guest LE will not provide a token to some enclave, and I don't think we'll 
want to override this kernel logic.

>>
>> Thanks,
>> -Kai
>>
>>>
>>> /Jarkko
>>>
>>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-08 23:47                     ` Huang, Kai
  2017-06-08 23:53                       ` Andy Lutomirski
@ 2017-06-10 12:23                       ` Jarkko Sakkinen
  2017-06-11 22:45                         ` Huang, Kai
  1 sibling, 1 reply; 78+ messages in thread
From: Jarkko Sakkinen @ 2017-06-10 12:23 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Andy Lutomirski, Kai Huang, Paolo Bonzini, Radim Krcmar,
	kvm list, intel-sgx-kernel-dev, haim.cohen

On Fri, Jun 09, 2017 at 11:47:13AM +1200, Huang, Kai wrote:
> In my understanding, although you only allows one LE in kernel, but you
> won't limit who's LE can be run (basically kernel can run LE signed by
> anyone, but just one LE when kernel is running), so I don't see there is any
> limitation to KVM guests here.
> 
> But it may still be better if SGX driver can provide function like:
> 
>     int sgx_validate_sigstruct(struct sigstruct *sig);
> 
> for KVM to call, in case driver is changed (ex, to only allows LEs from some
> particular ones to run), but this is not necessary now. KVM changes can be
> done later when driver make the changes.
> 
> Andy,
> 
> Am I understanding correctly? Does this make sense to you?
> 
> Thanks,
> -Kai

Nope. I don't even understand the *beginnings* of what that function would
do. I don't understand what the validation means here and what the VMM
would do if that function reports "success".

How would that work on a system where the MSRs cannot be changed?

In that kind of system the host OS must generate an EINITTOKEN for the LE
running inside the guest and maintain completely virtualized MSR values
for the guest.

/Jarkko

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-10 12:23                       ` Jarkko Sakkinen
@ 2017-06-11 22:45                         ` Huang, Kai
  2017-06-12  8:36                           ` Jarkko Sakkinen
  0 siblings, 1 reply; 78+ messages in thread
From: Huang, Kai @ 2017-06-11 22:45 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Andy Lutomirski, Kai Huang, Paolo Bonzini, Radim Krcmar,
	kvm list, intel-sgx-kernel-dev, haim.cohen



On 6/11/2017 12:23 AM, Jarkko Sakkinen wrote:
> On Fri, Jun 09, 2017 at 11:47:13AM +1200, Huang, Kai wrote:
>> In my understanding, although you only allows one LE in kernel, but you
>> won't limit who's LE can be run (basically kernel can run LE signed by
>> anyone, but just one LE when kernel is running), so I don't see there is any
>> limitation to KVM guests here.
>>
>> But it may still be better if SGX driver can provide function like:
>>
>>      int sgx_validate_sigstruct(struct sigstruct *sig);
>>
>> for KVM to call, in case driver is changed (ex, to only allows LEs from some
>> particular ones to run), but this is not necessary now. KVM changes can be
>> done later when driver make the changes.
>>
>> Andy,
>>
>> Am I understanding correctly? Does this make sense to you?
>>
>> Thanks,
>> -Kai
> 
> Nope. I don't even understand the *beginnings* what that function would
> do. I don't understand what the validation means here and what VMM would
> do if that functions reports "success".

The validation means that either sigstruct->modulus or 
SHA256(sigstruct->modulus) should be in an 'approved whitelist' maintained 
by the kernel (which I know doesn't exist now, but which Andy has sort of 
suggested we may, or should, have in the future); otherwise the function 
returns an error to indicate the LE from the guest is "unapproved by the 
host kernel/driver".

Andy, would you explain here?

> 
> How that would work on a system where MSRs cannot be changed?

This is simple: we just won't allow the guest to choose its own 
IA32_SGXLEPUBKEYHASHn by specifying a 'lehash' value in the Qemu parameters 
when creating the guest.

To elaborate, in my current design Qemu has the new parameters below to 
support SGX:

	# qemu-system-x86_64 -sgx epc=<size>,lehash=<sha-256 hash>,lewr

'epc=<size>' obviously specifies the guest's EPC size, 'lehash' specifies 
the guest's initial IA32_SGXLEPUBKEYHASHn (similar to the value configured 
in the BIOS of a real machine), and 'lewr' specifies whether the guest's 
IA32_SGXLEPUBKEYHASHn can be changed by the guest OS at runtime. Both 
'lehash' and 'lewr' are optional.

If the MSRs cannot be changed on the physical machine, then guest creation 
will fail if either 'lehash' or 'lewr' is specified; a rough sketch of 
that check follows.
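
Roughly, on the Qemu side (host_sgx_lc_writable(), SGXOptions and its 
fields are all made-up names, just to show the intended check):

	/*
	 * Hypothetical sketch: refuse 'lehash'/'lewr' at guest creation
	 * when the host's IA32_SGXLEPUBKEYHASHn MSRs are fixed.
	 */
	static int sgx_check_launch_options(SGXOptions *sgx_opts, Error **errp)
	{
		if (!host_sgx_lc_writable() &&
		    (sgx_opts->lehash_set || sgx_opts->lewr)) {
			error_setg(errp, "host launch-control hash MSRs are "
				   "fixed; 'lehash' and 'lewr' are not "
				   "supported on this machine");
			return -1;
		}
		return 0;
	}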

> 
> In that kind of system the host OS must generate EINITTOKEN for the LE
> running on inside the guest and maintain completely virtualized MSR
> values for the guest.

The host OS will not generate an EINITTOKEN for the guest under any 
circumstances, as the EINITTOKEN always comes from the guest's EINIT 
instruction. KVM traps EINIT from the guest, gets both the SIGSTRUCT and 
the EINITTOKEN from the EINIT leaf, updates the MSRs, and runs EINIT on 
behalf of the guest.

Btw, the purpose of having KVM trap EINIT is to write the guest's virtual 
IA32_SGXLEPUBKEYHASHn to the physical MSRs before running EINIT (a rough 
sketch follows). In fact KVM doesn't even need to trap EINIT: it could 
simply write the guest's MSR values to the real MSRs when the vcpu is 
scheduled in, as long as the SGX driver updates the host LE's hash in the 
MSRs before EINIT on the host side. KVM is not trying to guarantee that 
EINIT runs successfully here, but simply to emulate the guest's 
IA32_SGXLEPUBKEYHASHn and EINIT in the guest.
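
A rough sketch of that trap-and-run flow (handle_encls_einit(), 
kvm_inject_einit_result() and __einit(), a thin ENCLS[EINIT] wrapper, are 
all invented names for the sketch, not the actual patch code):

	/*
	 * Hypothetical sketch: on a trapped EINIT, sync the guest's
	 * virtual hash into the real MSRs, run EINIT on the guest's
	 * behalf, and report the hardware's result back unchanged.
	 * (This ignores the percpu-cache optimization discussed earlier.)
	 */
	static int handle_encls_einit(struct kvm_vcpu *vcpu,
				      struct sigstruct *sigstruct,
				      struct einittoken *token, void *secs)
	{
		struct vcpu_vmx *vmx = to_vmx(vcpu);
		int i, ret;

		for (i = 0; i < 4; i++)
			wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i,
			       vmx->msr_ia32_sgxlepubkeyhash[i]);

		ret = __einit(sigstruct, token, secs);

		/* Success or error code, the guest sees exactly what the
		 * hardware returned, matching real HW behavior. */
		return kvm_inject_einit_result(vcpu, ret);
	}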

Thanks,
-Kai

> 
> /Jarkko
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-11 22:45                         ` Huang, Kai
@ 2017-06-12  8:36                           ` Jarkko Sakkinen
  2017-06-12  9:53                             ` Huang, Kai
  0 siblings, 1 reply; 78+ messages in thread
From: Jarkko Sakkinen @ 2017-06-12  8:36 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Andy Lutomirski, Kai Huang, Paolo Bonzini, Radim Krcmar,
	kvm list, intel-sgx-kernel-dev, haim.cohen

On Mon, Jun 12, 2017 at 10:45:07AM +1200, Huang, Kai wrote:
> > > But it may still be better if SGX driver can provide function like:
> > > 
> > >      int sgx_validate_sigstruct(struct sigstruct *sig);
> > > 
> > > for KVM to call, in case driver is changed (ex, to only allows LEs from some
> > > particular ones to run), but this is not necessary now. KVM changes can be
> > > done later when driver make the changes.
> > > 
> > > Andy,
> > > 
> > > Am I understanding correctly? Does this make sense to you?
> > > 
> > > Thanks,
> > > -Kai
> > 
> > Nope. I don't even understand the *beginnings* what that function would
> > do. I don't understand what the validation means here and what VMM would
> > do if that functions reports "success".
> 
> The validation means either the sigstruct->modulus or
> SHA256(sigstruct->modulus) should be in a 'approved white-list' maintained
> by kernel (which I know doesn't exist now, but Andy some kind suggested we
> may or should have, in the future I guess), otherwise the function returns
> error to indicate the LE from guest is "unapproved by host kernel/driver".
> 
> Andy, would you explain here?

That can be considered, but I still have zero idea what this function is
and what its relation to the whitelist would be.

> > How that would work on a system where MSRs cannot be changed?
> 
> This is simple, we simply won't allow guest to choose its own
> IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when
> creating the guest.

Why not? You could keep virtual MSR values and ask the host LE to generate
a token if they match the modulus.

> To elaborate, currently in my design Qemu has below new parameters to
> support SGX:
> 
> 	# qemu-system-x86_64 -sgx, epc=<size>,lehash=<sha-256 hash>,lewr
> 
> The 'epc=<size>' specifies guest's EPC size obviously, lehash specifies
> guest's initial IA32_SGXLEPUBKEYHASHn (similar to the value configured in
> BIOS for real machine), and 'lewr' specifies whether guest's
> IA32_SGXLEPUBKEYHASHn can be changed by OS at runtime. The 'lehash' and
> 'lewr' are optional.
> 
> If MSRs cannot be changed on physical machine, then we will fail to create
> guest if either 'lehash' or 'lewr' is specified when creating the guest.
> 
> > 
> > In that kind of system the host OS must generate EINITTOKEN for the LE
> > running on inside the guest and maintain completely virtualized MSR
> > values for the guest.
> 
> The host OS will not generate EINITTOKEN for guest in any circumstances, as
> EINITTOKEN will always be from guest's EINIT instruction. KVM traps EINIT
> from guest and gets both SIGSTRUCT and EINITTOKEN from the EINIT leaf,
> update MSRs, and run EINIT on behalf of guest.

That seriously sounds like a stupid constraint, or I'm not getting
something (which might also be the case). If you trap EINIT anyway, you
could create a special case for the guest LE.

/Jarkko

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-12  8:36                           ` Jarkko Sakkinen
@ 2017-06-12  9:53                             ` Huang, Kai
  2017-06-12 16:24                               ` Andy Lutomirski
  2017-06-13 18:57                               ` Jarkko Sakkinen
  0 siblings, 2 replies; 78+ messages in thread
From: Huang, Kai @ 2017-06-12  9:53 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Andy Lutomirski, Kai Huang, Paolo Bonzini, Radim Krcmar,
	kvm list, intel-sgx-kernel-dev, haim.cohen



On 6/12/2017 8:36 PM, Jarkko Sakkinen wrote:
> On Mon, Jun 12, 2017 at 10:45:07AM +1200, Huang, Kai wrote:
>>>> But it may still be better if SGX driver can provide function like:
>>>>
>>>>       int sgx_validate_sigstruct(struct sigstruct *sig);
>>>>
>>>> for KVM to call, in case driver is changed (ex, to only allows LEs from some
>>>> particular ones to run), but this is not necessary now. KVM changes can be
>>>> done later when driver make the changes.
>>>>
>>>> Andy,
>>>>
>>>> Am I understanding correctly? Does this make sense to you?
>>>>
>>>> Thanks,
>>>> -Kai
>>>
>>> Nope. I don't even understand the *beginnings* what that function would
>>> do. I don't understand what the validation means here and what VMM would
>>> do if that functions reports "success".
>>
>> The validation means either the sigstruct->modulus or
>> SHA256(sigstruct->modulus) should be in a 'approved white-list' maintained
>> by kernel (which I know doesn't exist now, but Andy some kind suggested we
>> may or should have, in the future I guess), otherwise the function returns
>> error to indicate the LE from guest is "unapproved by host kernel/driver".
>>
>> Andy, would you explain here?
> 
> That can be considered but I still have zero idea what this function is
> and what its relation to whitelist would be.

The relation is that this function only returns success when 
sigstruct->modulus (or SHA256(sigstruct->modulus)) matches a value in the 
whitelist...

If the function returns success, the VMM does nothing special: it continues 
to run EINIT on behalf of the guest, and injects the result into the guest 
(either success or failure; the VMM doesn't care about the result, but it 
needs to report it to the guest to reflect HW behavior).

If the function returns an error, the VMM reports some error code to the 
guest; it doesn't even need to run EINIT anymore.

But IMO we don't have to add this function now. If your driver chooses to 
have such a whitelist, we can do this in the future.

> 
>>> How that would work on a system where MSRs cannot be changed?
>>
>> This is simple, we simply won't allow guest to choose its own
>> IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when
>> creating the guest.
> 
> Why not? You could have virtual MSRs and ask host LE to generate token
> if they match to modulus.

The guest has its own LE running inside, and the guest's LE will generate 
tokens for enclaves in the guest. The host will not generate tokens for the 
guest under any circumstances, because this is entirely the guest's behavior.

Virtualization is only about emulating the hardware's behavior, not about 
assuming, depending on, or changing the guest's SW behavior. We are not 
trying to make sure EINIT runs successfully in the guest; instead, we are 
trying to make sure EINIT shows exactly the same behavior as it would on a 
physical machine -- either success or failure, according to the guest's 
sigstruct and token.

If EINIT in the guest is supposed to fail (e.g. an incorrect token was 
generated by the guest's LE), KVM needs to inject the corresponding error 
into the guest, to reflect real hardware behavior. One more example: if KVM 
chose not to trap EINIT at all, how could you provide a token generated by 
the host LE to the guest? You simply cannot. The host LE will never 
generate a token for the guest, under any circumstance.

> 
>> To elaborate, currently in my design Qemu has below new parameters to
>> support SGX:
>>
>> 	# qemu-system-x86_64 -sgx, epc=<size>,lehash=<sha-256 hash>,lewr
>>
>> The 'epc=<size>' specifies guest's EPC size obviously, lehash specifies
>> guest's initial IA32_SGXLEPUBKEYHASHn (similar to the value configured in
>> BIOS for real machine), and 'lewr' specifies whether guest's
>> IA32_SGXLEPUBKEYHASHn can be changed by OS at runtime. The 'lehash' and
>> 'lewr' are optional.
>>
>> If MSRs cannot be changed on physical machine, then we will fail to create
>> guest if either 'lehash' or 'lewr' is specified when creating the guest.
>>
>>>
>>> In that kind of system the host OS must generate EINITTOKEN for the LE
>>> running on inside the guest and maintain completely virtualized MSR
>>> values for the guest.
>>
>> The host OS will not generate EINITTOKEN for guest in any circumstances, as
>> EINITTOKEN will always be from guest's EINIT instruction. KVM traps EINIT
>> from guest and gets both SIGSTRUCT and EINITTOKEN from the EINIT leaf,
>> update MSRs, and run EINIT on behalf of guest.
> 
> Seriously sounds like a stupid constraint or I'm not getting something
> (which also might be the case). If you anyway trap EINIT, you could
> create a special case for guest LE.

This is not a constraint; KVM has to emulate the hardware correctly. For 
this part please see my explanation above.

And let me explain the purpose of trapping EINIT again here.

When the guest is about to run EINIT, and the guest's 
SHA256(sigstruct->modulus) matches the guest's virtual 
IA32_SGXLEPUBKEYHASHn (and the other fields of the sigstruct and token are 
correctly populated as well), KVM needs to make sure that EINIT will run 
successfully in the guest, even if the physical IA32_SGXLEPUBKEYHASHn are 
not equal to the guest's virtual MSRs at that particular time. This is 
because, given the same conditions, EINIT would run successfully on a 
physical machine. KVM needs to emulate the right HW behavior.

How do we make sure the guest's EINIT runs successfully in this case? We 
need to propagate the guest's virtual MSRs to the physical MSRs before the 
guest runs EINIT. The whole purpose of having KVM trap EINIT from the 
guest is to guarantee that KVM can update the physical MSRs from the 
guest's virtual MSRs before EINIT.

Like I said before, KVM doesn't even need to trap EINIT for this purpose; 
for example, KVM can write the guest's virtual MSRs to the real MSRs when 
the vcpu is scheduled in (which is exactly what I did in this patch, btw), 
but this is considered performance-unfriendly, so Andy and Sean suggested 
we do it when trapping EINIT from the guest.

Another problem with updating the MSRs when the vcpu is scheduled in is 
that it will break the SGX driver if the driver applies a performance 
optimization to its own MSR updates, such as keeping per-cpu variables 
holding the MSRs' current values and only writing an MSR when the 
per-cpu value differs from the value to be written. Because of this, 
Andy and Sean also suggested that the SGX driver provide a function that 
updates the MSRs and runs EINIT together, protected by a mutex, and that 
KVM simply call that function (with KVM providing the sigstruct and 
token it obtains by trapping EINIT from the guest). This covers the MSR 
updates and EINIT for both the host and KVM guests. IMO this is a good 
idea as well, and you should consider doing it this way.
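
Below is a minimal sketch of what such a driver-owned helper could look 
like, assuming a per-cpu cache of the MSRs' current values. The names 
(sgx_einit(), sgx_lepubkeyhash_cache, __einit() as the ENCLS[EINIT] 
wrapper) are assumptions for illustration, not the driver's actual API:

#include <linux/mutex.h>
#include <linux/percpu.h>
#include <linux/smp.h>
#include <asm/msr.h>

static DEFINE_MUTEX(sgx_einit_lock);
static DEFINE_PER_CPU(u64 [4], sgx_lepubkeyhash_cache);

int sgx_einit(struct sgx_sigstruct *sigstruct, struct sgx_einittoken *token,
              void *secs, const u64 pubkeyhash[4])
{
        u64 *cache;
        int i, ret;

        mutex_lock(&sgx_einit_lock);
        preempt_disable();  /* keep the MSR writes and EINIT on one CPU */

        /* Only write IA32_SGXLEPUBKEYHASH0..3 when the cached value differs. */
        cache = per_cpu(sgx_lepubkeyhash_cache, smp_processor_id());
        for (i = 0; i < 4; i++) {
                if (cache[i] != pubkeyhash[i]) {
                        wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, pubkeyhash[i]);
                        cache[i] = pubkeyhash[i];
                }
        }

        ret = __einit(sigstruct, token, secs);  /* hypothetical ENCLS wrapper */

        preempt_enable();
        mutex_unlock(&sgx_einit_lock);
        return ret;
}

Both the host driver's own EINIT path and KVM's trap handler would go 
through this one function, so the per-cpu cache can never go stale 
behind the driver's back.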

Thanks,
-Kai

> 
> /Jarkko
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-12  9:53                             ` Huang, Kai
@ 2017-06-12 16:24                               ` Andy Lutomirski
  2017-06-12 22:08                                 ` Huang, Kai
  2017-06-13 18:57                               ` Jarkko Sakkinen
  1 sibling, 1 reply; 78+ messages in thread
From: Andy Lutomirski @ 2017-06-12 16:24 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Jarkko Sakkinen, Andy Lutomirski, Kai Huang, Paolo Bonzini,
	Radim Krcmar, kvm list, intel-sgx-kernel-dev, haim.cohen

On Mon, Jun 12, 2017 at 2:53 AM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>
>>>> How that would work on a system where MSRs cannot be changed?
>>>
>>>
>>> This is simple, we simply won't allow guest to choose its own
>>> IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when
>>> creating the guest.
>>
>>
>> Why not? You could have virtual MSRs and ask host LE to generate token
>> if they match to modulus.
>
>
> The guest has its own LE running inside, and guest's LE will generate token
> for enclaves in guest. The host will not generate token for guest in any
> circumstances, because this is totally guest's behavior.
>
> Virtualization is only about to emulate hardware's behavior, but not to
> assume, or depend on, or change guest's SW behavior. We are not trying to
> make sure EINIT will run successfully in guest, instead, we are trying to
> make sure EINIT will be just the same behavior as it runs on physical
> machine -- either success or failure, according to guest's sigstruct and
> token.

I disagree.  Virtualization can do whatever it wants, but a pretty
strong constraint is that the guest should be at least reasonably
functional.  A Windows guest, for example, shouldn't BSOD.  But the host
most certainly can restrict what the guest can do.  If a guest is
given pass-through access to a graphics card, the host is well within
its rights to impose thermal policy, prevent reflashing, etc.
Similarly, if a guest is given access to SGX, the host can and should
impose required policy on the guest.  If this means that an EINIT that
would have succeeded at host CPL 0 fails in the guest, so be it.

Of course, there isn't much in the way of host policy right now, so
this may not require any particular action until interesting host
policy shows up.

> This is not constraint, but KVM has to emulate hardware correctly. For this
> part please see my explanation above.
>
> And let me explain the purpose of trapping EINIT again here.
>
> When guest is about to run EINIT, if guest's SHA256(sigstruct->modulus)
> matches guest's virtual IA32_SGXLEPUBKEYHASHn (and if others are correctly
> populated in sigstruct and token as well), KVM needs to make sure that EINIT
> will run successfully in guest, even physical IA32_SGXLEPUBKEYHASHn are not
> equal to guest's virtual MSRs at this particular time.

True, so long as it doesn't contradict host policy to do so.

> This is because given
> the same condition, the EINIT will run successfully on physical machine. KVM
> needs to emulate the right HW behavior.

No.  The host needs to do this because KVM needs to work and be
useful, not because KVM needs to precisely match CPU behavior as seen
by VMX root.

To avoid confusion, I don't believe I've ever said that guests should
be restricted in which LEs they can use.  The types of restrictions
I'm talking about are that, if the host prevents user code from
running, say, a provisioning enclave that isn't whitelisted, then the
guest should have the same restriction applied.  This type of
restriction *can't* be usefully done by restricting acceptable MSR
values, but it's trivial by trapping EINIT.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-12 16:24                               ` Andy Lutomirski
@ 2017-06-12 22:08                                 ` Huang, Kai
  2017-06-12 23:00                                   ` Andy Lutomirski
  0 siblings, 1 reply; 78+ messages in thread
From: Huang, Kai @ 2017-06-12 22:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jarkko Sakkinen, Kai Huang, Paolo Bonzini, Radim Krcmar,
	kvm list, intel-sgx-kernel-dev, haim.cohen



On 6/13/2017 4:24 AM, Andy Lutomirski wrote:
> On Mon, Jun 12, 2017 at 2:53 AM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>>
>>>>> How that would work on a system where MSRs cannot be changed?
>>>>
>>>>
>>>> This is simple, we simply won't allow guest to choose its own
>>>> IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when
>>>> creating the guest.
>>>
>>>
>>> Why not? You could have virtual MSRs and ask host LE to generate token
>>> if they match to modulus.
>>
>>
>> The guest has its own LE running inside, and guest's LE will generate token
>> for enclaves in guest. The host will not generate token for guest in any
>> circumstances, because this is totally guest's behavior.
>>
>> Virtualization is only about to emulate hardware's behavior, but not to
>> assume, or depend on, or change guest's SW behavior. We are not trying to
>> make sure EINIT will run successfully in guest, instead, we are trying to
>> make sure EINIT will be just the same behavior as it runs on physical
>> machine -- either success or failure, according to guest's sigstruct and
>> token.
> 
> I disagree.  Virtualization can do whatever it wants, but a pretty
> strong constraint is that the guest should be at least reasonably
> functional.

Virtualization can only do whatever it wants on the parts that it can 
trap and emulate, and such *whatever* should not break the HW behavior 
presented to the guest. This is a fundamental principle of 
virtualization. I don't know whether there are some *minor* cases where 
the HW behavior emulation is not fully respected, but I believe that, if 
they exist, such cases are extremely rare, and we certainly have a good 
reason why they are OK.

Anyway, I'll leave this part to the KVM maintainers.

Paolo, Radim,

Sorry this thread has gotten a little long by now. Can you comment on this?

> A Windows guest, for example, shouldn't BSOD.  But the host
> most certainly can restrict what the guest can do.  If a guest is
> given pass-through access to a graphics card, the host is well within
> its rights to impose thermal policy, prevent reflashing, etc.

You need to look at whether those policies are implemented via PCIe 
configuration space or via the device's registers. If the policies are 
implemented via PCIe configuration space (which is trapped and emulated 
by the VMM/Qemu), you certainly can apply restrictions by emulating 
PCIe configuration space accesses. But if those policies are controlled 
through device registers, which the driver controls entirely, it's 
totally up to the driver, and the host cannot apply any policies. You 
can choose whether or not to pass through specific BARs of the device 
(e.g., you cannot pass through the BARs containing the MSI registers), 
but once the BARs are passed through to the guest, you cannot control 
the guest's behavior.


> Similarly, if a guest is given access to SGX, the host can and should
> impose required policy on the guest.  If this means that an EINIT that
> would have succeeded at host CPL 0 fails in the guest, so be it.

This is completely different, and I disagree. If EINIT can run at host 
CPL 0, it can run at the guest's CPL 0 as well -- unless the hardware 
doesn't support this, in which case we cannot support SGX virtualization 
at all.

The exception is if the HW provides, for example, specific capability 
bits that control whether EINIT can be run at CPL 0, and the hypervisor 
is able to trap those bits; then the hypervisor can manipulate them to 
make the guest think the HW doesn't allow EINIT to run at CPL 0, in 
which case it is quite reasonable that EINIT cannot run at CPL 0 in the 
guest (because that is HW behavior).

> 
> Of course, there isn't much in the way of host policy right now, so
> this may not require any particular action until interesting host
> policy shows up.
> 
>> This is not constraint, but KVM has to emulate hardware correctly. For this
>> part please see my explanation above.
>>
>> And let me explain the purpose of trapping EINIT again here.
>>
>> When guest is about to run EINIT, if guest's SHA256(sigstruct->modulus)
>> matches guest's virtual IA32_SGXLEPUBKEYHASHn (and if others are correctly
>> populated in sigstruct and token as well), KVM needs to make sure that EINIT
>> will run successfully in guest, even physical IA32_SGXLEPUBKEYHASHn are not
>> equal to guest's virtual MSRs at this particular time.
> 
> True, so long as it doesn't contradict host policy to do so.
> 
>> This is because given
>> the same condition, the EINIT will run successfully on physical machine. KVM
>> needs to emulate the right HW behavior.
> 
> No.  The host needs to do this because KVM needs to work and be
> useful, not because KVM needs to precisely match CPU behavior as seen
> by VMX root.

If we need to break HW behavior to make SGX useful, the maintainers may 
choose not to support SGX virtualization at all, on the grounds that 
this feature simply cannot be virtualized faithfully.

Anyway, I'll leave this to the KVM maintainers to determine.

> 
> To avoid confusion, I don't believe I've ever said that guests should
> be restricted in which LEs they can use.  The types of restrictions
> I'm talking about are that, if the host prevents user code from
> running, say, a provisioning enclave that isn't whitelisted, then the
> guest should have the same restriction applied.  This type of
> restriction *can't* be usefully done by restricting acceptable MSR
> values, but it's trivial by trapping EINIT.

OK. Sorry I didn't get your point before. I thought it was about 
restricting the LE.

I don't know whether the SGX driver will restrict running the 
provisioning enclave. In my understanding the provisioning enclave 
always comes from Intel. However, I am not an expert here and may well 
be wrong. Can you point out *exactly* which host restrictions must or 
should be applied to the guest, so that Jarkko can know whether he will 
support them or not? Otherwise I don't think we even need to talk about 
this topic at the current stage.

Thanks,
-Kai

> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-12 22:08                                 ` Huang, Kai
@ 2017-06-12 23:00                                   ` Andy Lutomirski
  2017-06-16  3:46                                     ` Huang, Kai
  0 siblings, 1 reply; 78+ messages in thread
From: Andy Lutomirski @ 2017-06-12 23:00 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Andy Lutomirski, Jarkko Sakkinen, Kai Huang, Paolo Bonzini,
	Radim Krcmar, kvm list, intel-sgx-kernel-dev, haim.cohen

On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>
> I don't know whether SGX driver will have restrict on running provisioning
> enclave. In my understanding provisioning enclave is always from Intel.
> However I am not expert here and probably be wrong. Can you point out
> *exactly* what restricts in host must/should be applied to guest so that
> Jarkko can know whether he will support those restricts or not? Otherwise I
> don't think we even need to talk about this topic at current stage.
>

The whole point is that I don't know.  But here are two types of
restriction I can imagine demand for:

1. Only a particular approved provisioning enclave may run (be it
Intel's or otherwise -- with a non-Intel LE, I think you can launch a
non-Intel provisioning enclave).  This would be done to restrict what
types of remote attestation can be done. (Intel supplies a remote
attestation service that uses some contractual policy that I don't
know.  Maybe a system owner wants a different policy applied to ISVs.)
 Imposing this policy on guests more or less requires filtering EINIT.

2. For kiosk-ish or single-purpose applications, I can imagine that
you would want to allow a specific list of enclave signers or even
enclave hashes. Maybe you would allow exactly one enclave hash.  You
could kludge this up with a restrictive LE policy, but you could also
do it for real by implementing the specific restriction in the kernel.
Then you'd want to impose it on the guest, and you'd do it by
filtering EINIT.

For the time being, I don't expect either policy to be implemented
right away, but I bet that something like this will eventually happen.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-12  9:53                             ` Huang, Kai
  2017-06-12 16:24                               ` Andy Lutomirski
@ 2017-06-13 18:57                               ` Jarkko Sakkinen
  2017-06-13 19:05                                 ` Jarkko Sakkinen
  2017-06-13 23:28                                 ` Huang, Kai
  1 sibling, 2 replies; 78+ messages in thread
From: Jarkko Sakkinen @ 2017-06-13 18:57 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Andy Lutomirski, Kai Huang, Paolo Bonzini, Radim Krcmar,
	kvm list, intel-sgx-kernel-dev, haim.cohen

On Mon, Jun 12, 2017 at 09:53:41PM +1200, Huang, Kai wrote:
> > > This is simple, we simply won't allow guest to choose its own
> > > IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when
> > > creating the guest.
> > 
> > Why not? You could have virtual MSRs and ask host LE to generate token
> > if they match to modulus.
> 
> The guest has its own LE running inside, and guest's LE will generate token
> for enclaves in guest. The host will not generate token for guest in any
> circumstances, because this is totally guest's behavior.

Why can't the host LE generate the token without the guest knowing it, 
and supply it with EINIT?

> > Seriously sounds like a stupid constraint or I'm not getting something
> > (which also might be the case). If you anyway trap EINIT, you could
> > create a special case for guest LE.
> 
> This is not constraint, but KVM has to emulate hardware correctly. For this
> part please see my explanation above.

I'm being totally honest with you: your explanation makes absolutely
zero sense to me. You don't need 1000+ words to explain the scenarios
where the "host as a delegate LE" approach would go wrong.

Please just pinpoint the scenarios where it goes wrong. I'll ignore
the text below.

/Jarkko

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-13 18:57                               ` Jarkko Sakkinen
@ 2017-06-13 19:05                                 ` Jarkko Sakkinen
  2017-06-13 20:13                                   ` Sean Christopherson
  2017-06-13 23:28                                 ` Huang, Kai
  1 sibling, 1 reply; 78+ messages in thread
From: Jarkko Sakkinen @ 2017-06-13 19:05 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Andy Lutomirski, Kai Huang, Paolo Bonzini, Radim Krcmar,
	kvm list, intel-sgx-kernel-dev, haim.cohen

On Tue, Jun 13, 2017 at 09:57:18PM +0300, Jarkko Sakkinen wrote:
> On Mon, Jun 12, 2017 at 09:53:41PM +1200, Huang, Kai wrote:
> > > > This is simple, we simply won't allow guest to choose its own
> > > > IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when
> > > > creating the guest.
> > > 
> > > Why not? You could have virtual MSRs and ask host LE to generate token
> > > if they match to modulus.
> > 
> > The guest has its own LE running inside, and guest's LE will generate token
> > for enclaves in guest. The host will not generate token for guest in any
> > circumstances, because this is totally guest's behavior.
> 
> Why can't host LE generate the token without guest knowning it and
> supply it with EINIT?
> > > Seriously sounds like a stupid constraint or I'm not getting something
> > > (which also might be the case). If you anyway trap EINIT, you could
> > > create a special case for guest LE.
> > 
> > This is not constraint, but KVM has to emulate hardware correctly. For this
> > part please see my explanation above.
> 
> I'm being now totally honest to your: your explanation makes absolutely
> zero sense to me. You don't need a 1000+ words to explain the scenarios
> where "host as a delegate LE" approach would go wrong.
> 
> Please just pinpoint the scenarios where it goes wrong. I'll ignore
> the text below.
> 
> /Jarkko

While reading this discussion, the biggest lesson for me has been that
this is a new argument for having an in-kernel LE, in addition to what
Andy has stated before: the MSRs *never* need to be updated on behalf
of the guest.

/Jarkko

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-13 19:05                                 ` Jarkko Sakkinen
@ 2017-06-13 20:13                                   ` Sean Christopherson
  2017-06-14  9:37                                     ` Jarkko Sakkinen
  0 siblings, 1 reply; 78+ messages in thread
From: Sean Christopherson @ 2017-06-13 20:13 UTC (permalink / raw)
  To: Jarkko Sakkinen, Huang, Kai
  Cc: kvm list, Radim Krcmar, intel-sgx-kernel-dev, Paolo Bonzini

On Tue, 2017-06-13 at 22:05 +0300, Jarkko Sakkinen wrote:
> On Tue, Jun 13, 2017 at 09:57:18PM +0300, Jarkko Sakkinen wrote:
> > 
> > On Mon, Jun 12, 2017 at 09:53:41PM +1200, Huang, Kai wrote:
> > > 
> > > > 
> > > > > 
> > > > > This is simple, we simply won't allow guest to choose its own
> > > > > IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter
> > > > > when
> > > > > creating the guest.
> > > > Why not? You could have virtual MSRs and ask host LE to generate token
> > > > if they match to modulus.
> > > The guest has its own LE running inside, and guest's LE will generate
> > > token
> > > for enclaves in guest. The host will not generate token for guest in any
> > > circumstances, because this is totally guest's behavior.
> > Why can't host LE generate the token without guest knowning it and
> > supply it with EINIT?
> > > 
> > > > 
> > > > Seriously sounds like a stupid constraint or I'm not getting something
> > > > (which also might be the case). If you anyway trap EINIT, you could
> > > > create a special case for guest LE.
> > > This is not constraint, but KVM has to emulate hardware correctly. For
> > > this
> > > part please see my explanation above.
> > I'm being now totally honest to your: your explanation makes absolutely
> > zero sense to me. You don't need a 1000+ words to explain the scenarios
> > where "host as a delegate LE" approach would go wrong.
> > 
> > Please just pinpoint the scenarios where it goes wrong. I'll ignore
> > the text below.
> > 
> > /Jarkko
> When I've been reading this discussion the biggest lesson for me has
> been that this is a new argument for having in-kernel LE in addition
> to what Andy has stated before: the MSRs *never* need to be updated on
> behalf of the guest.
> 
> /Jarkko

The MSRs need to be written to run a LE in the guest, because an 
EINITTOKEN can't be used to EINIT an enclave that requests access to the 
EINITTOKENKEY, i.e. a LE.  Preventing the guest from running its own LE 
is not an option, as the owner of the LE, e.g. the guest kernel or a 
userspace daemon, will likely disable SGX if its LE fails to run 
(including any ECALLS into the LE).  Allowing a guest to run a LE 
doesn't mean the host can't ignore/discard the guest's EINITTOKENs, 
assuming the host traps EINIT.
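
To illustrate the last point: a host that traps EINIT can let the 
guest's LE launch (that launch is gated by the hash MSRs, not by a 
token) while still applying its own policy to token-based launches. A 
hedged sketch, with host_policy_allows() as a purely hypothetical hook:

#include <linux/types.h>

#define SGX_ATTR_EINITTOKENKEY  (1ULL << 5)

static bool host_policy_allows(u64 attributes, const void *token); /* hypothetical */

/*
 * Hypothetical filter run by the host when EINIT is trapped from a
 * guest. An enclave requesting the EINITTOKENKEY attribute is a launch
 * enclave; it cannot be launched via a token anyway, so it is passed
 * through and gated only by the IA32_SGXLEPUBKEYHASHn comparison that
 * hardware performs. Everything else launches via an EINITTOKEN, which
 * the host is free to vet, ignore, or reject.
 */
static bool host_allows_guest_einit(u64 sigstruct_attributes,
                                    const void *einittoken)
{
        if (sigstruct_attributes & SGX_ATTR_EINITTOKENKEY)
                return true;  /* guest LE: let hardware decide */

        return host_policy_allows(sigstruct_attributes, einittoken);
}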

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-13 18:57                               ` Jarkko Sakkinen
  2017-06-13 19:05                                 ` Jarkko Sakkinen
@ 2017-06-13 23:28                                 ` Huang, Kai
  2017-06-14  9:44                                   ` Jarkko Sakkinen
  1 sibling, 1 reply; 78+ messages in thread
From: Huang, Kai @ 2017-06-13 23:28 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Andy Lutomirski, Kai Huang, Paolo Bonzini, Radim Krcmar,
	kvm list, intel-sgx-kernel-dev, haim.cohen



On 6/14/2017 6:57 AM, Jarkko Sakkinen wrote:
> On Mon, Jun 12, 2017 at 09:53:41PM +1200, Huang, Kai wrote:
>>>> This is simple, we simply won't allow guest to choose its own
>>>> IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when
>>>> creating the guest.
>>>
>>> Why not? You could have virtual MSRs and ask host LE to generate token
>>> if they match to modulus.
>>
>> The guest has its own LE running inside, and guest's LE will generate token
>> for enclaves in guest. The host will not generate token for guest in any
>> circumstances, because this is totally guest's behavior.
> 
> Why can't host LE generate the token without guest knowning it and
> supply it with EINIT?

I have said many times: virtualization is only about emulating hardware 
behavior. The same software that runs on a native machine is supposed to 
run in the guest. I don't care how you implement the host side -- 
whether the LE delegates tokens for other enclaves or not. Your 
implementation works on the host, and the exact same driver works in the 
guest. I have said several times that I don't even have to trap EINIT, 
in which case you simply cannot generate a token for a guest EINIT, 
because there is no EINIT coming from the guest at all!


>>> Seriously sounds like a stupid constraint or I'm not getting something
>>> (which also might be the case). If you anyway trap EINIT, you could
>>> create a special case for guest LE.
>>
>> This is not constraint, but KVM has to emulate hardware correctly. For this
>> part please see my explanation above.
> 
> I'm being now totally honest to your: your explanation makes absolutely
> zero sense to me. You don't need a 1000+ words to explain the scenarios
> where "host as a delegate LE" approach would go wrong.

This doesn't require 1000+ words. This is simple, and I have explained it:

This is not a constraint; KVM has to emulate the hardware correctly.

The additional 1000+ words that I spent a lot of time typing were trying 
to explain to you why we (at least Andy, Sean, and Haim, I believe) 
agreed to trap EINIT.


> 
> Please just pinpoint the scenarios where it goes wrong. I'll ignore
> the text below.

Of course you can, if you already understand why we agreed to trap 
EINIT.

I think Andy's comments just made things more complicated. He is trying 
to solve problems that don't exist in your implementation, and I believe 
we can handle them in the future, if they come up.

So let's focus on how to handle updating the MSRs and EINIT. I believe 
Andy, Sean, and I are all on the same page; I'm not sure whether you are.

For my part, I am perfectly fine not trapping EINIT (which is exactly 
what this patch does), but then you have to guarantee the correctness of 
the host side.

> 
> /Jarkko
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-13 20:13                                   ` Sean Christopherson
@ 2017-06-14  9:37                                     ` Jarkko Sakkinen
  2017-06-14 15:11                                       ` Christopherson, Sean J
  0 siblings, 1 reply; 78+ messages in thread
From: Jarkko Sakkinen @ 2017-06-14  9:37 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Huang, Kai, kvm list, Radim Krcmar, intel-sgx-kernel-dev, Paolo Bonzini

On Tue, Jun 13, 2017 at 01:13:04PM -0700, Sean Christopherson wrote:
 
> The MSRs need to be written to run a LE in the guest, EINITTOKEN can't be used
> to EINIT an enclave that is requesting access to the EINITTOKENKEY, i.e. a LE.
> Preventing the guest from running its own LE is not an option, as the owner of
> the LE, e.g. guest kernel or userspace daemon, will likely disable SGX if its LE
> fails to run (including any ECALLS into the LE).  Allowing a guest to run a LE
> doesn't mean the host can't ignore/discard the guest's EINITTOKENs, assuming the
> host traps EINIT.

[I started a one-week leave today but will peek at the MLs occasionally, 
so expect some delay in my follow-up responses]

Please, let's not use the term ECALL in these discussions. It's neither 
a hardware nor a kernel-specific concept. It's an abstraction that 
exists only in the Intel SDK. I have neither ECALLs nor OCALLs in my LE, 
for example. There are enough moving parts without that abstraction.

I'm looking at the section "EINIT - Initialize an Enclave for Execution"
from the SDM. I'm not seeing a branch in the pseudo code that checks for
ATTRIBUTES.EINITTOKENKEY.

39.1.4 states that "Only Launch Enclaves are allowed to launch without a
valid token." I'm not sure what I should deduce from that because that
statement is *incorrect*. If you control the MSRs, you can launch
anything you want to launch. I guess we should make a bug report of this
section as it's complete nonsense?

Table 41-56 does not show any key material bound to the key hash defined
in the MSRs.

Instead of teaching me stuff that I already know, I would just like to 
have it pinpointed where the "side-effect" is that creates the 
constraint you are claiming. I can then update the documentation so that 
we don't have to go through this discussion anymore :-)

/Jarkko

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-13 23:28                                 ` Huang, Kai
@ 2017-06-14  9:44                                   ` Jarkko Sakkinen
  0 siblings, 0 replies; 78+ messages in thread
From: Jarkko Sakkinen @ 2017-06-14  9:44 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Andy Lutomirski, Kai Huang, Paolo Bonzini, Radim Krcmar,
	kvm list, intel-sgx-kernel-dev, haim.cohen

On Wed, Jun 14, 2017 at 11:28:32AM +1200, Huang, Kai wrote:
> 
> 
> On 6/14/2017 6:57 AM, Jarkko Sakkinen wrote:
> > On Mon, Jun 12, 2017 at 09:53:41PM +1200, Huang, Kai wrote:
> > > > > This is simple, we simply won't allow guest to choose its own
> > > > > IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when
> > > > > creating the guest.
> > > > 
> > > > Why not? You could have virtual MSRs and ask host LE to generate token
> > > > if they match to modulus.
> > > 
> > > The guest has its own LE running inside, and guest's LE will generate token
> > > for enclaves in guest. The host will not generate token for guest in any
> > > circumstances, because this is totally guest's behavior.
> > 
> > Why can't host LE generate the token without guest knowning it and
> > supply it with EINIT?
> 
> I have said many times, virtualization is only about to emulate hardware
> behavior. The same software runs in native machine is supposed to run in
> guest. I don't care on host how you implement -- whether LE delegates token
> for other enclave or not. Your implementation works on host, the exact
> driver works in guest. I said several times, I even don't have to trap
> EINIT, in which case you simply cannot generate token for EINIT from guest,
> because there is no EINIT from guest!
> 
> 
> > > > Seriously sounds like a stupid constraint or I'm not getting something
> > > > (which also might be the case). If you anyway trap EINIT, you could
> > > > create a special case for guest LE.
> > > 
> > > This is not constraint, but KVM has to emulate hardware correctly. For this
> > > part please see my explanation above.
> > 
> > I'm being now totally honest to your: your explanation makes absolutely
> > zero sense to me. You don't need a 1000+ words to explain the scenarios
> > where "host as a delegate LE" approach would go wrong.
> 
> This doesn't require 1000+ words.. This is simple, and I have explained:
> 
> This is not constraint, but KVM has to emulate hardware correctly.
> 
> The additional 1000+ words that I spent lots of time on typing is trying to
> explain to you -- why we (at least Andy, Sean, Haim I believe) agreed to
> choose to trap EINIT.
> 
> 
> > 
> > Please just pinpoint the scenarios where it goes wrong. I'll ignore
> > the text below.
> 
> Of course you can, if you already understand why we agreed to choose to trap
> EINIT.
> 
> I think Andy's comments just made things more complicated. He is trying to
> solve problems that don't exist in your implementation, and I believe we can
> handle this in the future, if those problems come up.
> 
> So let's focus on how to handle updating MSRs and EINIT. I believe Andy, me,
> and Sean are all on the same page. Not sure whether you are or not.
> 
> For me I am perfectly fine not to trap EINIT (which is exactly this patch
> did), but you have to guarantee the correctness of host side.
> 
> > 
> > /Jarkko
> > 

I'm not yet seeing why the MSRs would ever need to be updated.

See my response to Sean for details.

There's probably some detail in the SDM that I'm not observing.

/Jarkko

^ permalink raw reply	[flat|nested] 78+ messages in thread

* RE: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-14  9:37                                     ` Jarkko Sakkinen
@ 2017-06-14 15:11                                       ` Christopherson, Sean J
  2017-06-14 17:03                                         ` Jarkko Sakkinen
  0 siblings, 1 reply; 78+ messages in thread
From: Christopherson, Sean J @ 2017-06-14 15:11 UTC (permalink / raw)
  To: 'Jarkko Sakkinen'
  Cc: Huang, Kai, kvm list, Radim Krcmar, intel-sgx-kernel-dev, Paolo Bonzini

Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> On Tue, Jun 13, 2017 at 01:13:04PM -0700, Sean Christopherson wrote:
>  
> > The MSRs need to be written to run a LE in the guest, EINITTOKEN can't be
> > used to EINIT an enclave that is requesting access to the EINITTOKENKEY,
> > i.e. a LE. Preventing the guest from running its own LE is not an option,
> > as the owner of the LE, e.g. guest kernel or userspace daemon, will likely
> > disable SGX if its LE fails to run (including any ECALLS into the LE).
> > Allowing a guest to run a LE doesn't mean the host can't ignore/discard the
> > guest's EINITTOKENs, assuming the host traps EINIT.
> 
> [I started one week leave today but will peek MLs seldomly so except
> some delay in my follow up responses]
> 
> Please, lets not use the term ECALL in these discussions. It's neither
> hardware nor kernel specific concept. It's abstraction that exists only
> in the Intel SDK. I have neither ECALLs nor OCALLs in my LE for example.
> There are enough moving parts without such abstraction.
> 
> I'm looking at the section "EINIT - Initialize an Enclave for Execution"
> from the SDM. I'm not seeing a branch in the pseudo code that checks for
> ATTRIBUTES.EINITTOKENKEY.

(* if controlled ATTRIBUTES are set, SIGSTRUCT must be signed using an authorized key *)
CONTROLLED_ATTRIBUTES <- 0000000000000020H;
IF (((DS:RCX.ATTRIBUTES & CONTROLLED_ATTRIBUTES) != 0) and (TMP_MRSIGNER != IA32_SGXLEPUBKEYHASH))
    RFLAG.ZF <- 1;
    RAX <- SGX_INVALID_ATTRIBUTE;
    GOTO EXIT;
FI;

Bit 5, i.e. 20H, corresponds to the EINITTOKENKEY.  This is also covered in the
text description under Intel SGX Launch Control Configuration - "The hash of the
public key used to sign the SIGSTRUCT of the Launch Enclave must equal the value
in the IA32_SGXLEPUBKEYHASH MSRs."
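
Rendered as plain C, the branch above is essentially the following, 
where TMP_MRSIGNER is the SHA-256 of SIGSTRUCT.MODULUS. This is a 
paraphrase of the SDM pseudocode, not driver or KVM code:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SGX_ATTR_EINITTOKENKEY  (1ULL << 5)  /* the controlled attribute, 20H */

/*
 * If the SIGSTRUCT requests a controlled attribute (today just
 * EINITTOKENKEY), EINIT fails with SGX_INVALID_ATTRIBUTE unless
 * SHA-256(SIGSTRUCT.MODULUS), i.e. TMP_MRSIGNER, equals the value in
 * the IA32_SGXLEPUBKEYHASHn MSRs.
 */
static bool controlled_attributes_permitted(uint64_t sigstruct_attributes,
                                            const uint8_t mrsigner[32],
                                            const uint8_t pubkeyhash[32])
{
        if (!(sigstruct_attributes & SGX_ATTR_EINITTOKENKEY))
                return true;  /* no controlled attribute requested */

        return memcmp(mrsigner, pubkeyhash, 32) == 0;
}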


> 39.1.4 states that "Only Launch Enclaves are allowed to launch without a
> valid token." I'm not sure what I should deduce from that because that
> statement is *incorrect*. If you control the MSRs, you can launch
> anything you want to launch. I guess we should make a bug report of this
> section as it's complete nonsense?

I wouldn't call it complete nonsense; there are far more egregious ambiguities
in the SDM.  If you read the statement in the context of someone learning about
SGX, it makes perfect sense: if it's not a launch enclave, it needs a token.
Sure, rewording the statement to something like "Only enclaves whose public key
hash equals the value in the IA32_SGXLEPUBKEYHASH MSRs are allowed to launch
without a token." is technically more accurate, but I wouldn't describe the
current wording as "complete nonsense".


> The table 41-56 does not show any key material bound to key hash defined
> in the MSRs.
> 
> Instead of teaching me stuff that I already know I would just like to
> get pinpointed where is the "side-effect" that makes the constraint that
> you are claiming. I can then update the documentation so that we don't
> have to go through this discussion anymore :-)
> 
> /Jarkko

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-14 15:11                                       ` Christopherson, Sean J
@ 2017-06-14 17:03                                         ` Jarkko Sakkinen
  0 siblings, 0 replies; 78+ messages in thread
From: Jarkko Sakkinen @ 2017-06-14 17:03 UTC (permalink / raw)
  To: Christopherson, Sean J
  Cc: Huang, Kai, kvm list, Radim Krcmar, intel-sgx-kernel-dev, Paolo Bonzini

On Wed, Jun 14, 2017 at 03:11:34PM +0000, Christopherson, Sean J wrote:
> Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> > On Tue, Jun 13, 2017 at 01:13:04PM -0700, Sean Christopherson wrote:
> >  
> > > The MSRs need to be written to run a LE in the guest, EINITTOKEN can't be
> > > used to EINIT an enclave that is requesting access to the EINITTOKENKEY,
> > > i.e. a LE. Preventing the guest from running its own LE is not an option,
> > > as the owner of the LE, e.g. guest kernel or userspace daemon, will likely
> > > disable SGX if its LE fails to run (including any ECALLS into the LE).
> > > Allowing a guest to run a LE doesn't mean the host can't ignore/discard the
> > > guest's EINITTOKENs, assuming the host traps EINIT.
> > 
> > [I started one week leave today but will peek MLs seldomly so except
> > some delay in my follow up responses]
> > 
> > Please, lets not use the term ECALL in these discussions. It's neither
> > hardware nor kernel specific concept. It's abstraction that exists only
> > in the Intel SDK. I have neither ECALLs nor OCALLs in my LE for example.
> > There are enough moving parts without such abstraction.
> > 
> > I'm looking at the section "EINIT - Initialize an Enclave for Execution"
> > from the SDM. I'm not seeing a branch in the pseudo code that checks for
> > ATTRIBUTES.EINITTOKENKEY.
> 
> (* if controlled ATTRIBUTES are set, SIGSTRUCT must be signed using an authorized key *)
> CONTROLLED_ATTRIBUTES <- 0000000000000020H;
> IF (((DS:RCX.ATTRIBUTES & CONTROLLED_ATTRIBUTES) != 0) and (TMP_MRSIGNER != IA32_SGXLEPUBKEYHASH))
>     RFLAG.ZF <- 1;
>     RAX <- SGX_INVALID_ATTRIBUTE;
>     GOTO EXIT;
> FI;
> 
> Bit 5, i.e. 20H, corresponds to the EINITTOKENKEY.  This is also covered in the
> text description under Intel SGX Launch Control Configuration - "The hash of the
> public key used to sign the SIGSTRUCT of the Launch Enclave must equal the value
> in the IA32_SGXLEPUBKEYHASH MSRs."

Thanks. I wonder why the naming is ambiguous (the value is exactly the
same as that of ATTRIBUTES.EINITTOKENKEY, but the name is different),
but there it is.

> > 39.1.4 states that "Only Launch Enclaves are allowed to launch without a
> > valid token." I'm not sure what I should deduce from that because that
> > statement is *incorrect*. If you control the MSRs, you can launch
> > anything you want to launch. I guess we should make a bug report of this
> > section as it's complete nonsense?
> 
> I wouldn't call it complete nonsense, there are far more egregious ambiguities
> in the SDM.  If you read the statement in the context of someone learning about
> SGX, it makes perfect sense: if it's not a launch enclave, it needs a token.
> Sure, rewording the statement to something like "Only enclaves whose public key
> hash equals the value in the IA32_SGXLEPUBKEYHASH MSRs are allowed to launch
> without a token." is technically more accurate, but I wouldn't describe the
> current wording as "complete nonsense".  

Agreed! That was a harsh overstatement.

I think that in this kind of material, accuracy still matters, 
especially when cryptography is involved.

I'll make updates to intel_sgx.rst. It's good to have this documented by 
the time the virtualization stuff is upstreamed.

/Jarkko

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-12 23:00                                   ` Andy Lutomirski
@ 2017-06-16  3:46                                     ` Huang, Kai
  2017-06-16  4:11                                       ` Andy Lutomirski
  0 siblings, 1 reply; 78+ messages in thread
From: Huang, Kai @ 2017-06-16  3:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jarkko Sakkinen, Kai Huang, Paolo Bonzini, Radim Krcmar,
	kvm list, intel-sgx-kernel-dev, haim.cohen



On 6/13/2017 11:00 AM, Andy Lutomirski wrote:
> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>
>> I don't know whether SGX driver will have restrict on running provisioning
>> enclave. In my understanding provisioning enclave is always from Intel.
>> However I am not expert here and probably be wrong. Can you point out
>> *exactly* what restricts in host must/should be applied to guest so that
>> Jarkko can know whether he will support those restricts or not? Otherwise I
>> don't think we even need to talk about this topic at current stage.
>>
> 
> The whole point is that I don't know.  But here are two types of
> restriction I can imagine demand for:
> 
> 1. Only a particular approved provisioning enclave may run (be it
> Intel's or otherwise -- with a non-Intel LE, I think you can launch a
> non-Intel provisioning enclave).  This would be done to restrict what
> types of remote attestation can be done. (Intel supplies a remote
> attestation service that uses some contractual policy that I don't
> know.  Maybe a system owner wants a different policy applied to ISVs.)
>   Imposing this policy on guests more or less requires filtering EINIT.

Hi Andy,

Sorry for the late reply.

What is the issue if the host and guest run provisioning enclaves from 
different vendors -- for example, the host runs Intel's provisioning 
enclave and the guest runs another vendor's? Or different guests run 
provisioning enclaves from different vendors?

One reason I am asking is that, on Xen (where we don't have the concept 
of a *host*), it's likely that we won't apply any policy at the Xen 
hypervisor level at all, and guests will be able to run any enclave from 
any signer as they wish.

Sorry, I don't quite understand (or have somewhat forgotten) the issues here.

> 
> 2. For kiosk-ish or single-purpose applications, I can imagine that
> you would want to allow a specific list of enclave signers or even
> enclave hashes. Maybe you would allow exactly one enclave hash.  You
> could kludge this up with a restrictive LE policy, but you could also
> do it for real by implementing the specific restriction in the kernel.
> Then you'd want to impose it on the guest, and you'd do it by
> filtering EINIT.

Assuming the enclave hash means the measurement of the enclave, and 
assuming we have a policy that only allows enclaves from one signer to 
run, would you also elaborate on the issue if the host and guest run 
enclaves from different signers? If the host has such a policy, and we 
allow creating guests on such a host, I think we will typically have the 
same policy in the guest (vetted by the guest's kernel). The owner of 
that host should be aware of the risk (if there is any) of creating a 
guest and running enclaves inside it.

Thanks,
-Kai

> 
> For the time being, I don't expect either policy to be implemented
> right away, but I bet that something like this will eventually happen.
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-16  3:46                                     ` Huang, Kai
@ 2017-06-16  4:11                                       ` Andy Lutomirski
  2017-06-16  4:33                                         ` Huang, Kai
  0 siblings, 1 reply; 78+ messages in thread
From: Andy Lutomirski @ 2017-06-16  4:11 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Andy Lutomirski, Jarkko Sakkinen, Kai Huang, Paolo Bonzini,
	Radim Krcmar, kvm list, intel-sgx-kernel-dev, haim.cohen

On Thu, Jun 15, 2017 at 8:46 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>
>
> On 6/13/2017 11:00 AM, Andy Lutomirski wrote:
>>
>> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com>
>> wrote:
>>>
>>>
>>> I don't know whether SGX driver will have restrict on running
>>> provisioning
>>> enclave. In my understanding provisioning enclave is always from Intel.
>>> However I am not expert here and probably be wrong. Can you point out
>>> *exactly* what restricts in host must/should be applied to guest so that
>>> Jarkko can know whether he will support those restricts or not? Otherwise
>>> I
>>> don't think we even need to talk about this topic at current stage.
>>>
>>
>> The whole point is that I don't know.  But here are two types of
>> restriction I can imagine demand for:
>>
>> 1. Only a particular approved provisioning enclave may run (be it
>> Intel's or otherwise -- with a non-Intel LE, I think you can launch a
>> non-Intel provisioning enclave).  This would be done to restrict what
>> types of remote attestation can be done. (Intel supplies a remote
>> attestation service that uses some contractual policy that I don't
>> know.  Maybe a system owner wants a different policy applied to ISVs.)
>>   Imposing this policy on guests more or less requires filtering EINIT.
>
>
> Hi Andy,
>
> Sorry for late reply.
>
> What is the issue if host and guest run provisioning enclave from different
> vendor, for example, host runs intel's provisioning enclave, and guest runs
> other vendor's provisioning enclave? Or different guests run provisioning
> enclaves from different vendors?

There's no issue unless someone has tried to impose a policy.  There
is clearly at least some interest in having policies that affect what
enclaves can run -- otherwise there wouldn't be LEs in the first
place.

>
> One reason I am asking is that, on Xen (where we don't have concept of
> *host*), it's likely that we won't apply any policy at Xen hypervisor at
> all, and guests will be able to run any enclave from any signer as their
> wish.

That seems entirely reasonable.  Someone may eventually ask Xen to add
support for SGX enclave restrictions, in which case you'll either have
to tell them that it won't happen or implement it.

>
> Sorry that I don't understand (or kind of forgot) the issues here.
>
>>
>> 2. For kiosk-ish or single-purpose applications, I can imagine that
>> you would want to allow a specific list of enclave signers or even
>> enclave hashes. Maybe you would allow exactly one enclave hash.  You
>> could kludge this up with a restrictive LE policy, but you could also
>> do it for real by implementing the specific restriction in the kernel.
>> Then you'd want to impose it on the guest, and you'd do it by
>> filtering EINIT.
>
> Assuming the enclave hash means measurement of enclave, and assuming we have
> a policy that we only allow enclave from one signer to run, would you also
> elaborate the issue that, if host and guest run enclaves from different
> signer? If host has such policy, and we are allowing creating guests on such
> host, I think that typically we will have the same policy in the guest

Yes, I presume this too, but.

> (vetted by guest's kernel). The owner of that host should be aware of the
> risk (if there's any) by creating guest and run enclave inside it.

No.  The host does not trust the guest in general.  If the host has a
policy that the only enclave that shall run is X, that doesn't mean
that the host shall reject all enclaves requested by the normal
userspace API except X but that, if /dev/kvm is used, then the user is
magically trusted to not load a guest that fails to respect the host
policy.  It means that the only enclave that shall run is X regardless
of what interface is used.  The host must only allow X to be loaded by
its userspace and the host must only allow X to be loaded by a guest.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-16  4:11                                       ` Andy Lutomirski
@ 2017-06-16  4:33                                         ` Huang, Kai
  2017-06-16  9:34                                           ` Huang, Kai
                                                             ` (2 more replies)
  0 siblings, 3 replies; 78+ messages in thread
From: Huang, Kai @ 2017-06-16  4:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jarkko Sakkinen, Kai Huang, Paolo Bonzini, Radim Krcmar,
	kvm list, intel-sgx-kernel-dev, haim.cohen



On 6/16/2017 4:11 PM, Andy Lutomirski wrote:
> On Thu, Jun 15, 2017 at 8:46 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>
>>
>> On 6/13/2017 11:00 AM, Andy Lutomirski wrote:
>>>
>>> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com>
>>> wrote:
>>>>
>>>>
>>>> I don't know whether SGX driver will have restrict on running
>>>> provisioning
>>>> enclave. In my understanding provisioning enclave is always from Intel.
>>>> However I am not expert here and probably be wrong. Can you point out
>>>> *exactly* what restricts in host must/should be applied to guest so that
>>>> Jarkko can know whether he will support those restricts or not? Otherwise
>>>> I
>>>> don't think we even need to talk about this topic at current stage.
>>>>
>>>
>>> The whole point is that I don't know.  But here are two types of
>>> restriction I can imagine demand for:
>>>
>>> 1. Only a particular approved provisioning enclave may run (be it
>>> Intel's or otherwise -- with a non-Intel LE, I think you can launch a
>>> non-Intel provisioning enclave).  This would be done to restrict what
>>> types of remote attestation can be done. (Intel supplies a remote
>>> attestation service that uses some contractual policy that I don't
>>> know.  Maybe a system owner wants a different policy applied to ISVs.)
>>>    Imposing this policy on guests more or less requires filtering EINIT.
>>
>>
>> Hi Andy,
>>
>> Sorry for late reply.
>>
>> What is the issue if host and guest run provisioning enclave from different
>> vendor, for example, host runs intel's provisioning enclave, and guest runs
>> other vendor's provisioning enclave? Or different guests run provisioning
>> enclaves from different vendors?
> 
> There's no issue unless someone has tried to impose a policy.  There
> is clearly at least some interest in having policies that affect what
> enclaves can run -- otherwise there wouldn't be LEs in the first
> place.
> 
>>
>> One reason I am asking is that, on Xen (where we don't have concept of
>> *host*), it's likely that we won't apply any policy at Xen hypervisor at
>> all, and guests will be able to run any enclave from any signer as their
>> wish.
> 
> That seems entirely reasonable.  Someone may eventually ask Xen to add
> support for SGX enclave restrictions, in which case you'll either have
> to tell them that it won't happen or implement it.
> 
>>
>> Sorry that I don't understand (or kind of forgot) the issues here.
>>
>>>
>>> 2. For kiosk-ish or single-purpose applications, I can imagine that
>>> you would want to allow a specific list of enclave signers or even
>>> enclave hashes. Maybe you would allow exactly one enclave hash.  You
>>> could kludge this up with a restrictive LE policy, but you could also
>>> do it for real by implementing the specific restriction in the kernel.
>>> Then you'd want to impose it on the guest, and you'd do it by
>>> filtering EINIT.
>>
>> Assuming the enclave hash means measurement of enclave, and assuming we have
>> a policy that we only allow enclave from one signer to run, would you also
>> elaborate the issue that, if host and guest run enclaves from different
>> signer? If host has such policy, and we are allowing creating guests on such
>> host, I think that typically we will have the same policy in the guest
> 
> Yes, I presume this too, but.
> 
>> (vetted by guest's kernel). The owner of that host should be aware of the
>> risk (if there's any) by creating guest and run enclave inside it.
> 
> No.  The host does not trust the guest in general.  If the host has a

I agree.

> policy that the only enclave that shall run is X, that doesn't mean
> that the host shall reject all enclaves requested by the normal
> userspace API except X but that, if /dev/kvm is used, then the user is
> magically trusted to not load a guest that fails to respect the host
> policy.  It means that the only enclave that shall run is X regardless
> of what interface is used.  The host must only allow X to be loaded by
> its userspace and the host must only allow X to be loaded by a guest.
> 

This is a theoretical thing. I think your statement makes sense only if 
we have a specific example proving there is an actual risk in allowing 
the guest to exceed the X approved by the host.

I will dig through your previous emails to see whether you have listed 
such real cases (I have somewhat forgotten, sorry), but if you don't 
mind, you can list such cases here.

Thanks,
-Kai

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-16  4:33                                         ` Huang, Kai
@ 2017-06-16  9:34                                           ` Huang, Kai
  2017-06-16 16:03                                           ` Andy Lutomirski
  2017-06-16 16:25                                           ` Andy Lutomirski
  2 siblings, 0 replies; 78+ messages in thread
From: Huang, Kai @ 2017-06-16  9:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jarkko Sakkinen, Kai Huang, Paolo Bonzini, Radim Krcmar,
	kvm list, intel-sgx-kernel-dev, haim.cohen



On 6/16/2017 4:33 PM, Huang, Kai wrote:
> 
> 
> On 6/16/2017 4:11 PM, Andy Lutomirski wrote:
>> On Thu, Jun 15, 2017 at 8:46 PM, Huang, Kai 
>> <kai.huang@linux.intel.com> wrote:
>>>
>>>
>>> On 6/13/2017 11:00 AM, Andy Lutomirski wrote:
>>>>
>>>> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> I don't know whether SGX driver will have restrict on running
>>>>> provisioning
>>>>> enclave. In my understanding provisioning enclave is always from 
>>>>> Intel.
>>>>> However I am not expert here and probably be wrong. Can you point out
>>>>> *exactly* what restricts in host must/should be applied to guest so 
>>>>> that
>>>>> Jarkko can know whether he will support those restricts or not? 
>>>>> Otherwise
>>>>> I
>>>>> don't think we even need to talk about this topic at current stage.
>>>>>
>>>>
>>>> The whole point is that I don't know.  But here are two types of
>>>> restriction I can imagine demand for:
>>>>
>>>> 1. Only a particular approved provisioning enclave may run (be it
>>>> Intel's or otherwise -- with a non-Intel LE, I think you can launch a
>>>> non-Intel provisioning enclave).  This would be done to restrict what
>>>> types of remote attestation can be done. (Intel supplies a remote
>>>> attestation service that uses some contractual policy that I don't
>>>> know.  Maybe a system owner wants a different policy applied to ISVs.)
>>>>    Imposing this policy on guests more or less requires filtering 
>>>> EINIT.
>>>
>>>
>>> Hi Andy,
>>>
>>> Sorry for late reply.
>>>
>>> What is the issue if host and guest run provisioning enclave from 
>>> different
>>> vendor, for example, host runs intel's provisioning enclave, and 
>>> guest runs
>>> other vendor's provisioning enclave? Or different guests run 
>>> provisioning
>>> enclaves from different vendors?
>>
>> There's no issue unless someone has tried to impose a policy.  There
>> is clearly at least some interest in having policies that affect what
>> enclaves can run -- otherwise there wouldn't be LEs in the first
>> place.
>>
>>>
>>> One reason I am asking is that, on Xen (where we don't have concept of
>>> *host*), it's likely that we won't apply any policy at Xen hypervisor at
>>> all, and guests will be able to run any enclave from any signer as their
>>> wish.
>>
>> That seems entirely reasonable.  Someone may eventually ask Xen to add
>> support for SGX enclave restrictions, in which case you'll either have
>> to tell them that it won't happen or implement it.
>>
>>>
>>> Sorry that I don't understand (or kind of forgot) the issues here.
>>>
>>>>
>>>> 2. For kiosk-ish or single-purpose applications, I can imagine that
>>>> you would want to allow a specific list of enclave signers or even
>>>> enclave hashes. Maybe you would allow exactly one enclave hash.  You
>>>> could kludge this up with a restrictive LE policy, but you could also
>>>> do it for real by implementing the specific restriction in the kernel.
>>>> Then you'd want to impose it on the guest, and you'd do it by
>>>> filtering EINIT.
>>>
>>> Assuming the enclave hash means measurement of enclave, and assuming 
>>> we have
>>> a policy that we only allow enclave from one signer to run, would you 
>>> also
>>> elaborate the issue that, if host and guest run enclaves from different
>>> signer? If host has such policy, and we are allowing creating guests 
>>> on such
>>> host, I think that typically we will have the same policy in the guest
>>
>> Yes, I presume this too, but.
>>
>>> (vetted by guest's kernel). The owner of that host should be aware of 
>>> the
>>> risk (if there's any) by creating guest and run enclave inside it.
>>
>> No.  The host does not trust the guest in general.  If the host has a
> 
> I agree.
> 
>> policy that the only enclave that shall run is X, that doesn't mean
>> that the host shall reject all enclaves requested by the normal
>> userspace API except X but that, if /dev/kvm is used, then the user is
>> magically trusted to not load a guest that fails to respect the host
>> policy.  It means that the only enclave that shall run is X regardless
>> of what interface is used.  The host must only allow X to be loaded by
>> its userspace and the host must only allow X to be loaded by a guest.
>>
> 
> This is theoretical thing. I think your statement makes sense only if we 
> have specific example that can prove there's actual risk when allowing 
> guest to exceed X approved by host.
> 
> I will dig more in your previous emails to see whether you have listed 
> such real cases (I some kind forgot sorry) but if you don't mind, you 
> can list such cases here.

Hi Andy,

I found an example you listed in a previous email, but it is related to 
SGX's key architecture rather than to host policy. I quote it below:

"Concretely, imagine I write an enclave that seals my TLS client
certificate's private key and offers an API to sign TLS certificate
requests with it.  This way, if my system is compromised, an attacker
can use the certificate only so long as they have access to my
machine.  If I kick them out or if they merely get the ability to read
the sealed data but not to execute code, the private key should still
be safe.  But, if this system is a VM guest, the attacker could run
the exact same enclave on another guest on the same physical CPU and
sign using my key.  Whoops!"

I think you will have this problem even if you apply the strictest 
policy on both host and guest -- only allowing one enclave from one 
signer to run. This is indeed a flaw, but virtualization cannot do 
anything to solve it -- unless we don't support virtualization at all :)

Sorry, I am just trying to find out whether there is a real case that 
genuinely requires applying the host's policy to the guest, and that 
causes problems if we don't.

Thanks,
-Kai

> 
> Thanks,
> -Kai

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-16  4:33                                         ` Huang, Kai
  2017-06-16  9:34                                           ` Huang, Kai
@ 2017-06-16 16:03                                           ` Andy Lutomirski
  2017-06-16 16:25                                           ` Andy Lutomirski
  2 siblings, 0 replies; 78+ messages in thread
From: Andy Lutomirski @ 2017-06-16 16:03 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Andy Lutomirski, Jarkko Sakkinen, Kai Huang, Paolo Bonzini,
	Radim Krcmar, kvm list, intel-sgx-kernel-dev, haim.cohen

On Thu, Jun 15, 2017 at 9:33 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>
>
> On 6/16/2017 4:11 PM, Andy Lutomirski wrote:
>>
>> On Thu, Jun 15, 2017 at 8:46 PM, Huang, Kai <kai.huang@linux.intel.com>
>> wrote:
>>>
>>>
>>>
>>> On 6/13/2017 11:00 AM, Andy Lutomirski wrote:
>>>>
>>>>
>>>> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com>
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> I don't know whether the SGX driver will have restrictions on running
>>>>> the provisioning enclave. In my understanding the provisioning enclave
>>>>> always comes from Intel. However, I am not an expert here and may be
>>>>> wrong. Can you point out *exactly* which restrictions on the host
>>>>> must/should be applied to the guest, so that Jarkko can know whether he
>>>>> will support those restrictions or not? Otherwise I don't think we even
>>>>> need to talk about this topic at the current stage.
>>>>>
>>>>
>>>> The whole point is that I don't know.  But here are two types of
>>>> restriction I can imagine demand for:
>>>>
>>>> 1. Only a particular approved provisioning enclave may run (be it
>>>> Intel's or otherwise -- with a non-Intel LE, I think you can launch a
>>>> non-Intel provisioning enclave).  This would be done to restrict what
>>>> types of remote attestation can be done. (Intel supplies a remote
>>>> attestation service that uses some contractual policy that I don't
>>>> know.  Maybe a system owner wants a different policy applied to ISVs.)
>>>>    Imposing this policy on guests more or less requires filtering EINIT.
>>>
>>>
>>>
>>> Hi Andy,
>>>
>>> Sorry for the late reply.
>>>
>>> What is the issue if the host and guest run provisioning enclaves from
>>> different vendors? For example, the host runs Intel's provisioning
>>> enclave while a guest runs another vendor's provisioning enclave. Or
>>> different guests run provisioning enclaves from different vendors?
>>
>>
>> There's no issue unless someone has tried to impose a policy.  There
>> is clearly at least some interest in having policies that affect what
>> enclaves can run -- otherwise there wouldn't be LEs in the first
>> place.
>>
>>>
>>> One reason I am asking is that, on Xen (where we don't have the concept
>>> of a *host*), it's likely that we won't apply any policy in the Xen
>>> hypervisor at all, and guests will be able to run any enclave from any
>>> signer as they wish.
>>
>>
>> That seems entirely reasonable.  Someone may eventually ask Xen to add
>> support for SGX enclave restrictions, in which case you'll either have
>> to tell them that it won't happen or implement it.
>>
>>>
>>> Sorry that I don't understand (or kind of forgot) the issues here.
>>>
>>>>
>>>> 2. For kiosk-ish or single-purpose applications, I can imagine that
>>>> you would want to allow a specific list of enclave signers or even
>>>> enclave hashes. Maybe you would allow exactly one enclave hash.  You
>>>> could kludge this up with a restrictive LE policy, but you could also
>>>> do it for real by implementing the specific restriction in the kernel.
>>>> Then you'd want to impose it on the guest, and you'd do it by
>>>> filtering EINIT.
>>>
>>>
>>> Assuming the enclave hash means the measurement of the enclave, and
>>> assuming we have a policy that only allows enclaves from one signer to
>>> run, would you also elaborate on the issue if the host and guest run
>>> enclaves from different signers? If the host has such a policy, and we
>>> allow creating guests on such a host, I think that typically we will
>>> have the same policy in the guest
>>
>>
>> Yes, I presume this too, but.
>>
>>> (vetted by the guest's kernel). The owner of that host should be aware
>>> of the risk (if there is any) of creating a guest and running enclaves
>>> inside it.
>>
>>
>> No.  The host does not trust the guest in general.  If the host has a
>
>
> I agree.
>
>> policy that the only enclave that shall run is X, that doesn't mean
>> that the host shall reject all enclaves requested by the normal
>> userspace API except X but that, if /dev/kvm is used, then the user is
>> magically trusted to not load a guest that fails to respect the host
>> policy.  It means that the only enclave that shall run is X regardless
>> of what interface is used.  The host must only allow X to be loaded by
>> its userspace and the host must only allow X to be loaded by a guest.
>>
>
> This is a theoretical thing. I think your statement makes sense only if
> we have a specific example proving there is an actual risk when we allow
> a guest to exceed the X approved by the host.

I would turn this around.  Can you come up with any example where the
host would have a restrictive policy but where that policy should not be
enforced for guests?

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-16  4:33                                         ` Huang, Kai
  2017-06-16  9:34                                           ` Huang, Kai
  2017-06-16 16:03                                           ` Andy Lutomirski
@ 2017-06-16 16:25                                           ` Andy Lutomirski
  2017-06-16 16:31                                             ` Christopherson, Sean J
  2 siblings, 1 reply; 78+ messages in thread
From: Andy Lutomirski @ 2017-06-16 16:25 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Andy Lutomirski, Jarkko Sakkinen, Kai Huang, Paolo Bonzini,
	Radim Krcmar, kvm list, intel-sgx-kernel-dev, haim.cohen

On Thu, Jun 15, 2017 at 9:33 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>
>
> On 6/16/2017 4:11 PM, Andy Lutomirski wrote:
>>
>> On Thu, Jun 15, 2017 at 8:46 PM, Huang, Kai <kai.huang@linux.intel.com>
>> wrote:
>>>
>>>
>>>
>>> On 6/13/2017 11:00 AM, Andy Lutomirski wrote:
>>>>
>>>>
>>>> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com>
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> I don't know whether the SGX driver will have restrictions on running
>>>>> the provisioning enclave. In my understanding the provisioning enclave
>>>>> always comes from Intel. However, I am not an expert here and may be
>>>>> wrong. Can you point out *exactly* which restrictions on the host
>>>>> must/should be applied to the guest, so that Jarkko can know whether he
>>>>> will support those restrictions or not? Otherwise I don't think we even
>>>>> need to talk about this topic at the current stage.
>>>>>
>>>>
>>>> The whole point is that I don't know.  But here are two types of
>>>> restriction I can imagine demand for:
>>>>
>>>> 1. Only a particular approved provisioning enclave may run (be it
>>>> Intel's or otherwise -- with a non-Intel LE, I think you can launch a
>>>> non-Intel provisioning enclave).  This would be done to restrict what
>>>> types of remote attestation can be done. (Intel supplies a remote
>>>> attestation service that uses some contractual policy that I don't
>>>> know.  Maybe a system owner wants a different policy applied to ISVs.)
>>>>    Imposing this policy on guests more or less requires filtering EINIT.
>>>
>>>
>>>
>>> Hi Andy,
>>>
>>> Sorry for the late reply.
>>>
>>> What is the issue if the host and guest run provisioning enclaves from
>>> different vendors? For example, the host runs Intel's provisioning
>>> enclave while a guest runs another vendor's provisioning enclave. Or
>>> different guests run provisioning enclaves from different vendors?
>>
>>
>> There's no issue unless someone has tried to impose a policy.  There
>> is clearly at least some interest in having policies that affect what
>> enclaves can run -- otherwise there wouldn't be LEs in the first
>> place.
>>
>>>
>>> One reason I am asking is that, on Xen (where we don't have the concept
>>> of a *host*), it's likely that we won't apply any policy in the Xen
>>> hypervisor at all, and guests will be able to run any enclave from any
>>> signer as they wish.
>>
>>
>> That seems entirely reasonable.  Someone may eventually ask Xen to add
>> support for SGX enclave restrictions, in which case you'll either have
>> to tell them that it won't happen or implement it.
>>
>>>
>>> Sorry that I don't understand (or kind of forgot) the issues here.
>>>
>>>>
>>>> 2. For kiosk-ish or single-purpose applications, I can imagine that
>>>> you would want to allow a specific list of enclave signers or even
>>>> enclave hashes. Maybe you would allow exactly one enclave hash.  You
>>>> could kludge this up with a restrictive LE policy, but you could also
>>>> do it for real by implementing the specific restriction in the kernel.
>>>> Then you'd want to impose it on the guest, and you'd do it by
>>>> filtering EINIT.
>>>
>>>
>>> Assuming the enclave hash means the measurement of the enclave, and
>>> assuming we have a policy that only allows enclaves from one signer to
>>> run, would you also elaborate on the issue if the host and guest run
>>> enclaves from different signers? If the host has such a policy, and we
>>> allow creating guests on such a host, I think that typically we will
>>> have the same policy in the guest
>>
>>
>> Yes, I presume this too, but.
>>
>>> (vetted by the guest's kernel). The owner of that host should be aware
>>> of the risk (if there is any) of creating a guest and running enclaves
>>> inside it.
>>
>>
>> No.  The host does not trust the guest in general.  If the host has a
>
>
> I agree.
>
>> policy that the only enclave that shall run is X, that doesn't mean
>> that the host shall reject all enclaves requested by the normal
>> userspace API except X but that, if /dev/kvm is used, then the user is
>> magically trusted to not load a guest that fails to respect the host
>> policy.  It means that the only enclave that shall run is X regardless
>> of what interface is used.  The host must only allow X to be loaded by
>> its userspace and the host must only allow X to be loaded by a guest.
>>
>
> This is a theoretical thing. I think your statement makes sense only if
> we have a specific example proving there is an actual risk when we allow
> a guest to exceed the X approved by the host.
>
> I will dig through your previous emails to see whether you have listed
> such real cases (sorry, I somewhat forgot), but if you don't mind, you
> can list such cases here.

I'm operating under the assumption that some kind of policy exists in
the first place.  I can imagine everything working fairly well without
any real policy, but apparently there are vendors who want restrictive
policies.  What I can't imagine is anyone who wants a restrictive
policy but is then okay with the host only partially enforcing it.

>
> Thanks,
> -Kai

^ permalink raw reply	[flat|nested] 78+ messages in thread

* RE: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-16 16:25                                           ` Andy Lutomirski
@ 2017-06-16 16:31                                             ` Christopherson, Sean J
  2017-06-16 16:43                                               ` Andy Lutomirski
  0 siblings, 1 reply; 78+ messages in thread
From: Christopherson, Sean J @ 2017-06-16 16:31 UTC (permalink / raw)
  To: 'Andy Lutomirski', Huang, Kai
  Cc: intel-sgx-kernel-dev, kvm list, Radim Krcmar, Paolo Bonzini

Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Jun 15, 2017 at 9:33 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> >
> >
> > On 6/16/2017 4:11 PM, Andy Lutomirski wrote:
> >>
> >> On Thu, Jun 15, 2017 at 8:46 PM, Huang, Kai <kai.huang@linux.intel.com>
> >> wrote:
> >>>
> >>>
> >>>
> >>> On 6/13/2017 11:00 AM, Andy Lutomirski wrote:
> >>>>
> >>>>
> >>>> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com>
> >>>> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> I don't know whether the SGX driver will have restrictions on running
> >>>>> the provisioning enclave. In my understanding the provisioning enclave
> >>>>> always comes from Intel. However, I am not an expert here and may be
> >>>>> wrong. Can you point out *exactly* which restrictions on the host
> >>>>> must/should be applied to the guest, so that Jarkko can know whether he
> >>>>> will support those restrictions or not? Otherwise I don't think we even
> >>>>> need to talk about this topic at the current stage.
> >>>>>
> >>>>
> >>>> The whole point is that I don't know.  But here are two types of
> >>>> restriction I can imagine demand for:
> >>>>
> >>>> 1. Only a particular approved provisioning enclave may run (be it
> >>>> Intel's or otherwise -- with a non-Intel LE, I think you can launch a
> >>>> non-Intel provisioning enclave).  This would be done to restrict what
> >>>> types of remote attestation can be done. (Intel supplies a remote
> >>>> attestation service that uses some contractual policy that I don't
> >>>> know.  Maybe a system owner wants a different policy applied to ISVs.)
> >>>>    Imposing this policy on guests more or less requires filtering EINIT.
> >>>
> >>>
> >>>
> >>> Hi Andy,
> >>>
> >>> Sorry for the late reply.
> >>>
> >>> What is the issue if the host and guest run provisioning enclaves from
> >>> different vendors? For example, the host runs Intel's provisioning
> >>> enclave while a guest runs another vendor's provisioning enclave. Or
> >>> different guests run provisioning enclaves from different vendors?
> >>
> >>
> >> There's no issue unless someone has tried to impose a policy.  There
> >> is clearly at least some interest in having policies that affect what
> >> enclaves can run -- otherwise there wouldn't be LEs in the first
> >> place.
> >>
> >>>
> >>> One reason I am asking is that, on Xen (where we don't have the concept
> >>> of a *host*), it's likely that we won't apply any policy in the Xen
> >>> hypervisor at all, and guests will be able to run any enclave from any
> >>> signer as they wish.
> >>
> >>
> >> That seems entirely reasonable.  Someone may eventually ask Xen to add
> >> support for SGX enclave restrictions, in which case you'll either have
> >> to tell them that it won't happen or implement it.
> >>
> >>>
> >>> Sorry that I don't understand (or kind of forgot) the issues here.
> >>>
> >>>>
> >>>> 2. For kiosk-ish or single-purpose applications, I can imagine that
> >>>> you would want to allow a specific list of enclave signers or even
> >>>> enclave hashes. Maybe you would allow exactly one enclave hash.  You
> >>>> could kludge this up with a restrictive LE policy, but you could also
> >>>> do it for real by implementing the specific restriction in the kernel.
> >>>> Then you'd want to impose it on the guest, and you'd do it by
> >>>> filtering EINIT.
> >>>
> >>>
> >>> Assuming the enclave hash means the measurement of the enclave, and
> >>> assuming we have a policy that only allows enclaves from one signer to
> >>> run, would you also elaborate on the issue if the host and guest run
> >>> enclaves from different signers? If the host has such a policy, and we
> >>> allow creating guests on such a host, I think that typically we will
> >>> have the same policy in the guest
> >>
> >>
> >> Yes, I presume this too, but.
> >>
> >>> (vetted by the guest's kernel). The owner of that host should be aware
> >>> of the risk (if there is any) of creating a guest and running enclaves
> >>> inside it.
> >>
> >>
> >> No.  The host does not trust the guest in general.  If the host has a
> >
> >
> > I agree.
> >
> >> policy that the only enclave that shall run is X, that doesn't mean
> >> that the host shall reject all enclaves requested by the normal
> >> userspace API except X but that, if /dev/kvm is used, then the user is
> >> magically trusted to not load a guest that fails to respect the host
> >> policy.  It means that the only enclave that shall run is X regardless
> >> of what interface is used.  The host must only allow X to be loaded by
> >> its userspace and the host must only allow X to be loaded by a guest.
> >>
> >
> > This is a theoretical thing. I think your statement makes sense only if
> > we have a specific example proving there is an actual risk when we allow
> > a guest to exceed the X approved by the host.
> >
> > I will dig through your previous emails to see whether you have listed
> > such real cases (sorry, I somewhat forgot), but if you don't mind, you
> > can list such cases here.
> 
> I'm operating under the assumption that some kind of policy exists in
> the first place.  I can imagine everything working fairly well without
> any real policy, but apparently there are vendors who want restrictive
> policies.  What I can't imagine is anyone who wants a restrictive
> policy but is then okay with the host only partially enforcing it.

I think there is a certain amount of inception going on here, i.e. the only
reason we're discussing LE enforced policies in the kernel is because the LE
architecture exists and can't be disabled.  The LE, as originally designed,
is intended to be a way for *userspace* to control what code can run on the
system, e.g. to provide a hook for anti-virus/malware to inspect an enclave
since it's impossible to inspect an enclave once it is running.

The kernel doesn't need an LE to restrict what enclaves can run, e.g. it can
perform inspection at any point during the initialization process.  This is
true for guest enclaves as well since the kernel can trap EINIT.  By making
the LE kernel-only we've bastardized the concept of the LE and have negated
the primary value provided by an LE[1][2].  In my opinion, the discussion of
the kernel's launch policies is much ado about nothing, e.g. if supported by
hardware, I think we'd opt to disable launch control completely.

[1] On a system with unlocked IA32_SGXLEPUBKEYHASH MSRs, the only value added
by using an LE to enforce the kernel's policies is defense-in-depth, e.g. an
attacker can't hide malicious code in an enclave even if it gains control of
the kernel.  I think this is a very minor benefit since running in an enclave
doesn't grant any new privileges and doesn't persist across system reset.

[2] I think it's safe to assume that any use case that requires locked hash
MSRs is out of scope for this discussion, given that the upstream kernel will
require unlocked MSRs.
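
To make the EINIT-trapping point concrete, the filtering could look
roughly like the sketch below.  This is illustration only -- the helper
names (sgx_read_guest_sigstruct, sgx_signer_allowed, sgx_emulate_einit,
sgx_inject_einit_error) are invented, not a real driver or KVM API:

/*
 * Hypothetical sketch: filter ENCLS[EINIT] from a guest so the host's
 * launch policy also applies to guest enclaves.
 */
static int handle_encls_einit(struct kvm_vcpu *vcpu)
{
	struct sgx_sigstruct sig;
	gva_t gva = kvm_register_read(vcpu, VCPU_REGS_RBX); /* SIGSTRUCT */

	if (sgx_read_guest_sigstruct(vcpu, gva, &sig))
		return 1;	/* fault injected, re-enter the guest */

	/* Enforce the host policy on the guest's enclave, too. */
	if (!sgx_signer_allowed(&sig))
		return sgx_inject_einit_error(vcpu, SGX_INVALID_EINITTOKEN);

	/* Policy passed: execute EINIT on the guest's behalf. */
	return sgx_emulate_einit(vcpu, &sig);
}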

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-06-16 16:31                                             ` Christopherson, Sean J
@ 2017-06-16 16:43                                               ` Andy Lutomirski
  0 siblings, 0 replies; 78+ messages in thread
From: Andy Lutomirski @ 2017-06-16 16:43 UTC (permalink / raw)
  To: Christopherson, Sean J
  Cc: Andy Lutomirski, Huang, Kai, intel-sgx-kernel-dev, kvm list,
	Radim Krcmar, Paolo Bonzini

On Fri, Jun 16, 2017 at 9:31 AM, Christopherson, Sean J
<sean.j.christopherson@intel.com> wrote:
> I think there is a certain amount of inception going on here, i.e. the only
> reason we're discussing LE enforced policies in the kernel is because the LE
> architecture exists and can't be disabled.  The LE, as originally designed,
> is intended to be a way for *userspace* to control what code can run on the
> system, e.g. to provide a hook for anti-virus/malware to inspect an enclave
> since it's impossible to inspect an enclave once it is running.
>
> The kernel doesn't need an LE to restrict what enclaves can run, e.g. it can
> perform inspection at any point during the initialization process.  This is
> true for guest enclaves as well since the kernel can trap EINIT.  By making
> the LE kernel-only we've bastardized the concept of the LE and have negated
> the primary value provided by an LE[1][2].  In my opinion, the discussion of
> the kernel's launch policies is much ado about nothing, e.g. if supported by
> hardware, I think we'd opt to disable launch control completely.

Agreed.

I don't think I've ever said that the kernel should implement
restrictions on what enclaves should run [1].  All I've said is that
(a) if the kernel does implement restrictions like this, it should
apply them to guests as well and (b) that the kernel should probably
trap EINIT because that's the most sensible way to deal with the MSRs.

[1] With the possible exception of provisioning enclaves.  I'm still
not convinced that anyone except root should be allowed to run an
enclave with the provision bit set, as that bit gives access to the
provisioning key, which is rather special.  From memory, it bypasses
the owner epoch, and it may have privacy issues.  Maybe this is a
nonissue, but I'd like to see someone seriously analyze how
provisioning enclaves that may not be signed by Intel affect the
overall security of the system and how Linux should handle them.  SGX
was designed under the assumption that provisioning enclaves would
only ever be signed by Intel, and that's not the case any more, and
dealing with this intelligently may require some thought.
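
The kind of gate I have in mind would be small.  A sketch (names are
assumed, with the attribute bit taken from the SDM's SECS.ATTRIBUTES
layout):

#include <linux/bitops.h>
#include <linux/capability.h>
#include <linux/types.h>

/* SECS.ATTRIBUTES.PROVISIONKEY is bit 4 per the SDM. */
#define SGX_ATTR_PROVISIONKEY	BIT_ULL(4)

/* Sketch: allow the PROVISIONKEY attribute only for privileged callers. */
static int sgx_check_attributes(u64 attributes)
{
	if ((attributes & SGX_ATTR_PROVISIONKEY) && !capable(CAP_SYS_ADMIN))
		return -EPERM;

	return 0;
}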

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 10/10] kvm: vmx: handle VMEXIT from SGX Enclave
  2017-05-11  9:34     ` Huang, Kai
@ 2017-06-19  5:02       ` Huang, Kai
  2017-06-27 15:29         ` Radim Krčmář
  0 siblings, 1 reply; 78+ messages in thread
From: Huang, Kai @ 2017-06-19  5:02 UTC (permalink / raw)
  To: Paolo Bonzini, Kai Huang, rkrcmar, kvm



On 5/11/2017 9:34 PM, Huang, Kai wrote:
> 
> 
> On 5/8/2017 8:22 PM, Paolo Bonzini wrote:
>>
>>
>> On 08/05/2017 07:24, Kai Huang wrote:
>>> @@ -6977,6 +7042,31 @@ static __exit void hardware_unsetup(void)
>>>   */
>>>  static int handle_pause(struct kvm_vcpu *vcpu)
>>>  {
>>> +    /*
>>> +     * SDM 39.6.3 PAUSE Instruction.
>>> +     *
>>> +     * SDM suggests, if VMEXIT caused by 'PAUSE-loop exiting', VMM should
>>> +     * disable 'PAUSE-loop exiting' so PAUSE can be executed in Enclave
>>> +     * again without further PAUSE-looping VMEXIT.
>>> +     *
>>> +     * SDM suggests, if VMEXIT caused by 'PAUSE exiting', VMM should disable
>>> +     * 'PAUSE exiting' so PAUSE can be executed in Enclave again without
>>> +     * further PAUSE VMEXIT.
>>> +     */
>>
>> How is PLE re-enabled?
> 
> Currently it will not be enabled again. Perhaps we could re-enable it at 
> another VMEXIT, if that VMEXIT is not a PLE VMEXIT?

Hi Paolo, all,

Sorry for the late reply.

Do you think it is feasible to turn PLE back on at a later VMEXIT, or at 
a VMEXIT that is not from an enclave?

Any suggestions, so that I can improve the next version of the RFC?

> 
>>
>> I don't understand the interaction of the internal control registers
>> (paragraph 41.1.4) with VMX.  How can you migrate the VM between EENTER
>> and EEXIT?
> 
> The current SGX hardware architecture doesn't support live migration, as 
> the key architecture of SGX is not migratable. For example, some keys are 
> persistent and bound to the hardware (sealing and attestation). Therefore, 
> right now, if SGX is exposed to a guest, live migration is not supported.

We recently had a discussion on this. We figured out that we are able to 
support SGX live migration with some kind of workaround -- basically, the 
idea is that we can ignore the source VM's EPC and depend on the 
destination VM's SGX driver and userspace SW stack to handle the *sudden 
loss of EPC*. But this will cause some inconsistency with HW behavior, 
and will depend on the driver's ability. I'll elaborate on this in the 
next version's design and RFC, and we can discuss whether to support it 
or not (along with snapshot support). But maybe we can also have a 
detailed discussion now, if you want to start?

Thanks,
-Kai
> 
>>
>> In addition, paragraph 41.1.4 does not include the parts of CR_SAVE_FS*
>> and CR_SAVE_GS* (base, limit, access rights) and does not include
>> CR_ENCLAVE_ENTRY_IP.
> 
> The CPU can exit an enclave via EEXIT, or by Asynchronous Enclave Exit 
> (AEX). All non-EEXIT enclave exits are referred to as AEX. When an AEX 
> happens, a so-called "synthetic state" is loaded onto the CPU to prevent 
> any software from observing enclave *secrets* in the CPU state at the 
> AEX. Exactly what is put into the "synthetic state" is described in SDM 
> 40.3.
> 
> So in my understanding, the CPU won't put something like 
> "CR_ENCLAVE_ENTRY_IP" into RIP. Actually, during an AEX, the Asynchronous 
> Exit Pointer (AEP), which is in normal memory, is pushed onto the stack, 
> and IRET returns to the AEP to continue running. The AEP typically points 
> to a small piece of code which basically calls ERESUME, so that we can go 
> back into the enclave.
> 
> I hope my reply answered your questions.
> 
> Thanks,
> -Kai
> 
>>
>> Paolo
>>
>>> +    if (vmx_exit_from_enclave(vcpu)) {
>>> +        u32 exec_ctl, secondary_exec_ctl;
>>> +
>>> +        exec_ctl = vmx_exec_control(to_vmx(vcpu));
>>> +        exec_ctl &= ~CPU_BASED_PAUSE_EXITING;
>>> +        vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, exec_ctl);
>>> +
>>> +        secondary_exec_ctl = vmx_secondary_exec_control(to_vmx(vcpu));
>>> +        secondary_exec_ctl &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
>>> +        vmcs_set_secondary_exec_control(secondary_exec_ctl);
>>> +
>>> +        return 1;
>>> +    }
>>> +
>>
> 
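
For completeness: the AEP trampoline described above can be a single
instruction, because the AEX synthetic state already loads RAX with the
ERESUME leaf (3) and RBX/RCX with the TCS and AEP.  Illustrative only,
not part of this series:

/* Userspace AEP trampoline: after an AEX the kernel IRETs here, and
 * ENCLU re-enters the enclave since RAX = 3 (ERESUME), RBX = TCS and
 * RCX = AEP were set by the AEX synthetic state. */
asm(".global sgx_aep\n"
    "sgx_aep:\n\t"
    "enclu\n");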

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 10/10] kvm: vmx: handle VMEXIT from SGX Enclave
  2017-06-19  5:02       ` Huang, Kai
@ 2017-06-27 15:29         ` Radim Krčmář
  2017-06-28 22:22           ` Huang, Kai
  0 siblings, 1 reply; 78+ messages in thread
From: Radim Krčmář @ 2017-06-27 15:29 UTC (permalink / raw)
  To: Huang, Kai; +Cc: Paolo Bonzini, Kai Huang, kvm

2017-06-19 17:02+1200, Huang, Kai:
> Hi Paolo, all,
> 
> Sorry for the late reply.

I'm sorry as well,

> Do you think it is feasible to turn PLE back on at a later VMEXIT, or at
> a VMEXIT that is not from an enclave?
> 
> Any suggestions, so that I can improve the next version of the RFC?

KVM doesn't enable "PAUSE exiting", it enables "PAUSE loop exiting".
SDM recommends disabling "PAUSE exiting" because the VM exits
(fault-like) on every PAUSE and needs intervention in order to progress,
but SDM doesn't say to disable "PAUSE loop exiting".

Being inside an enclave doesn't change the usefulness of PLE (yielding
the CPU to a task that isn't blocked), so I think it would be best to do
nothing with it, thanks.
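
Concretely, the enclave special case can just be deleted and the handler
kept as it is today -- roughly the following, simplified from memory
(details of the real code may differ):

/* A PAUSE-loop exit just yields the CPU and resumes the guest; nothing
 * enclave-specific is needed and the PLE controls stay enabled. */
static int handle_pause(struct kvm_vcpu *vcpu)
{
	if (ple_gap)
		grow_ple_window(vcpu);

	kvm_vcpu_on_spin(vcpu);
	return kvm_skip_emulated_instruction(vcpu);
}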

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 10/10] kvm: vmx: handle VMEXIT from SGX Enclave
  2017-06-27 15:29         ` Radim Krčmář
@ 2017-06-28 22:22           ` Huang, Kai
  0 siblings, 0 replies; 78+ messages in thread
From: Huang, Kai @ 2017-06-28 22:22 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: Paolo Bonzini, Kai Huang, kvm



On 6/28/2017 3:29 AM, Radim Krčmář wrote:
> 2017-06-19 17:02+1200, Huang, Kai:
>> Hi Paolo, all,
>>
>> Sorry for the late reply.
> 
> I'm sorry as well,
> 
>> Do you think it is feasible to turn PLE back on at a later VMEXIT, or at
>> a VMEXIT that is not from an enclave?
>>
>> Any suggestions, so that I can improve the next version of the RFC?
> 
> KVM doesn't enable "PAUSE exiting", it enables "PAUSE loop exiting".
> SDM recommends disabling "PAUSE exiting" because the VM exits
> (fault-like) on every PAUSE and needs intervention in order to progress,
> but SDM doesn't say to disable "PAUSE loop exiting".
> 
> Being inside an enclave doesn't change the usefulness of PLE (yielding
> the CPU to a task that isn't blocked), so I think it would be best to do
> nothing with it, thanks.

Hi Radim,

Thanks for the feedback. You are right; obviously I didn't read the SDM 
carefully enough. Will fix in the next version. :)

Thanks,
-Kai
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
  2017-05-12  6:11         ` Andy Lutomirski
  2017-05-12 18:48           ` Christopherson, Sean J
  2017-05-16  0:48           ` Huang, Kai
@ 2017-07-19 15:04           ` Sean Christopherson
  2 siblings, 0 replies; 78+ messages in thread
From: Sean Christopherson @ 2017-07-19 15:04 UTC (permalink / raw)
  To: Andy Lutomirski, Huang, Kai, Jarkko Sakkinen
  Cc: kvm list, Radim Krcmar, haim.cohen, intel-sgx-kernel-dev, Paolo Bonzini

On Thu, 2017-05-11 at 23:11 -0700, Andy Lutomirski wrote:
> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> > 
> > > Have a percpu variable that stores the current SGXLEPUBKEYHASH along
> > > with whatever lock is needed (probably just a mutex).  Users of EINIT
> > > will take the mutex, compare the percpu variable to the desired value,
> > > and, if it's different, do WRMSR and update the percpu variable.
> > > 
> > > KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
> > > state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
> > > support the same handling as the host.  There is no action required at
> > > all on KVM guest entry and exit.
> > 
> > This is doable, but the SGX driver needs to do those things and expose
> > interfaces for KVM to use. As for the percpu data, it is nice to have,
> > but I am not sure whether it is mandatory, as IMO EINIT is not even on a
> > performance-critical path. We could simply read the old values out of the
> > MSRs and compare whether they equal the new ones.
> I think the SGX driver should probably live in arch/x86, and the
> interface could be a simple percpu variable that is exported (from the
> main kernel image, not from a module).

Jarkko, what are your thoughts on moving the SGX code into arch/x86 and removing
the option to build it as a module?  This would simplify the KVM and EPC cgroup
implementations.
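
For reference, the lazy-update scheme Andy describes could look roughly
like the sketch below.  Names are placeholders and __einit() stands in
for the ENCLS[EINIT] wrapper -- this is not the actual driver code, and
it assumes the cache is initialized from the real MSR values at boot:

#include <linux/mutex.h>
#include <linux/percpu.h>
#include <linux/types.h>
#include <asm/msr.h>

struct sgx_le_hash {
	u64 h[4];	/* cached IA32_SGXLEPUBKEYHASH0..3 */
};

static DEFINE_PER_CPU(struct sgx_le_hash, sgx_le_hash_cache);
static DEFINE_MUTEX(sgx_einit_lock);

static int sgx_einit(void *sigstruct, void *token, void *secs,
		     const u64 hash[4])
{
	struct sgx_le_hash *cache;
	int i, ret;

	mutex_lock(&sgx_einit_lock);
	preempt_disable();	/* keep the WRMSRs + EINIT on one CPU */

	cache = this_cpu_ptr(&sgx_le_hash_cache);
	for (i = 0; i < 4; i++) {
		if (cache->h[i] != hash[i]) {
			wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, hash[i]);
			cache->h[i] = hash[i];
		}
	}

	ret = __einit(sigstruct, token, secs);	/* ENCLS[EINIT] wrapper */

	preempt_enable();
	mutex_unlock(&sgx_einit_lock);
	return ret;
}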

^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, other threads:[~2017-07-19 15:04 UTC | newest]

Thread overview: 78+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-08  5:24 [RFC PATCH 00/10] Basic KVM SGX Virtualization support Kai Huang
2017-05-08  5:24 ` [PATCH 01/10] x86: add SGX Launch Control definition to cpufeature Kai Huang
2017-05-08  5:24 ` [PATCH 02/10] kvm: vmx: add ENCLS VMEXIT detection Kai Huang
2017-05-08  5:24 ` [PATCH 03/10] kvm: vmx: detect presence of host SGX driver Kai Huang
2017-05-08  5:24 ` [PATCH 04/10] kvm: sgx: new functions to init and destory SGX for guest Kai Huang
2017-05-08  5:24 ` [PATCH 05/10] kvm: x86: add KVM_GET_SUPPORTED_CPUID SGX support Kai Huang
2017-05-08  5:24 ` [PATCH 06/10] kvm: x86: add KVM_SET_CPUID2 " Kai Huang
2017-05-08  5:24 ` [PATCH 07/10] kvm: vmx: add SGX IA32_FEATURE_CONTROL MSR emulation Kai Huang
2017-05-08  5:24 ` [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support Kai Huang
2017-05-12  0:32   ` Huang, Kai
2017-05-12  3:28     ` [intel-sgx-kernel-dev] " Andy Lutomirski
2017-05-12  4:56       ` Huang, Kai
2017-05-12  6:11         ` Andy Lutomirski
2017-05-12 18:48           ` Christopherson, Sean J
2017-05-12 20:50             ` Christopherson, Sean J
2017-05-16  0:59             ` Huang, Kai
2017-05-16  1:22             ` Huang, Kai
2017-05-16  0:48           ` Huang, Kai
2017-05-16 14:21             ` Paolo Bonzini
2017-05-18  7:54               ` Huang, Kai
2017-05-18  8:58                 ` Paolo Bonzini
2017-05-17  0:09             ` Andy Lutomirski
2017-05-18  7:45               ` Huang, Kai
2017-06-06 20:52                 ` Huang, Kai
2017-06-06 21:22                   ` Andy Lutomirski
2017-06-06 22:51                     ` Huang, Kai
2017-06-07 14:45                       ` Cohen, Haim
2017-06-08 12:31                   ` Jarkko Sakkinen
2017-06-08 23:47                     ` Huang, Kai
2017-06-08 23:53                       ` Andy Lutomirski
2017-06-09 15:38                         ` Cohen, Haim
2017-06-10 12:23                       ` Jarkko Sakkinen
2017-06-11 22:45                         ` Huang, Kai
2017-06-12  8:36                           ` Jarkko Sakkinen
2017-06-12  9:53                             ` Huang, Kai
2017-06-12 16:24                               ` Andy Lutomirski
2017-06-12 22:08                                 ` Huang, Kai
2017-06-12 23:00                                   ` Andy Lutomirski
2017-06-16  3:46                                     ` Huang, Kai
2017-06-16  4:11                                       ` Andy Lutomirski
2017-06-16  4:33                                         ` Huang, Kai
2017-06-16  9:34                                           ` Huang, Kai
2017-06-16 16:03                                           ` Andy Lutomirski
2017-06-16 16:25                                           ` Andy Lutomirski
2017-06-16 16:31                                             ` Christopherson, Sean J
2017-06-16 16:43                                               ` Andy Lutomirski
2017-06-13 18:57                               ` Jarkko Sakkinen
2017-06-13 19:05                                 ` Jarkko Sakkinen
2017-06-13 20:13                                   ` Sean Christopherson
2017-06-14  9:37                                     ` Jarkko Sakkinen
2017-06-14 15:11                                       ` Christopherson, Sean J
2017-06-14 17:03                                         ` Jarkko Sakkinen
2017-06-13 23:28                                 ` Huang, Kai
2017-06-14  9:44                                   ` Jarkko Sakkinen
2017-07-19 15:04           ` Sean Christopherson
2017-05-15 12:46       ` Jarkko Sakkinen
2017-05-15 23:56         ` Huang, Kai
2017-05-16 14:23           ` Paolo Bonzini
2017-05-17 14:21           ` Sean Christopherson
2017-05-18  8:14             ` Huang, Kai
2017-05-20 21:55               ` Andy Lutomirski
2017-05-23  5:43                 ` Huang, Kai
2017-05-23  5:55                   ` Huang, Kai
2017-05-23 16:34                   ` Andy Lutomirski
2017-05-23 16:43                     ` Paolo Bonzini
2017-05-24  8:20                       ` Huang, Kai
2017-05-20 13:23           ` Jarkko Sakkinen
2017-05-08  5:24 ` [PATCH 09/10] kvm: vmx: handle ENCLS VMEXIT Kai Huang
2017-05-08  8:08   ` Paolo Bonzini
2017-05-10  1:30     ` Huang, Kai
2017-05-08  5:24 ` [PATCH 10/10] kvm: vmx: handle VMEXIT from SGX Enclave Kai Huang
2017-05-08  8:22   ` Paolo Bonzini
2017-05-11  9:34     ` Huang, Kai
2017-06-19  5:02       ` Huang, Kai
2017-06-27 15:29         ` Radim Krčmář
2017-06-28 22:22           ` Huang, Kai
2017-05-08  5:24 ` [PATCH 11/11] kvm: vmx: workaround FEATURE_CONTROL[17] is not set by BIOS Kai Huang
2017-05-08  5:29   ` Huang, Kai
