* [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches
@ 2017-07-09  8:03 Kai Huang
  2017-07-09  8:04 ` [PATCH 01/15] xen: x86: expose SGX to HVM domain in CPU featureset Kai Huang
From: Kai Huang @ 2017-07-09  8:03 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich

Hi all,

This series contains the RFC design for Xen SGX virtualization support, along
with RFC draft patches.

Intel SGX (Software Guard Extensions) is a new set of instructions and memory
access mechanisms targeted at application developers seeking to protect
select code and data from disclosure or modification.

The SGX specification can be found in the latest Intel SDM as Volume 3D:

https://software.intel.com/sites/default/files/managed/7c/f1/332831-sdm-vol-3d.pdf

The SGX specification is relatively complicated (it takes up the entire
Volume 3D), so it is unrealistic to cover all hardware details here. The
first part of this design is a brief SGX introduction, which I think is
necessary background for the virtualization support. Part 2 is the design
itself, and some references are listed at the end.

In the first part I only introduce the info related to virtualization
support, although this is definitely not the most important part of SGX.
Other parts of SGX (mostly related to cryptography), i.e., enclave
measurement, the SGX key architecture, and Sealing & Attestation (which is
actually a critical feature), are omitted. Please refer to the SGX
specification for detailed info.

In the design there are some particular points where I don't know which
implementation is better. For those I added a question mark (?) to the right
of the menu entry. Your comments on those parts (and other comments as well,
of course) are highly appreciated.

Because SGX has lots of details, the design itself can only be high level,
so I have also included the RFC patches, which contain lots of the details.
Your comments on the patches are also highly appreciated.

The code can also be found at below github repo for your access:

    # git clone https://github.com/01org/xen-sgx -b rfc-v1

There is also another branch named 4.6-sgx, which is an older implementation
based on Xen 4.6. It is old, but it makes some different design choices
compared to these rfc-v1 patches (e.g., it adds a dedicated hypercall).

Please help to review and give comments. Thanks in advance.

==============================================================================

1. SGX Introduction
    1.1 Overview
        1.1.1 Enclave
        1.1.2 EPC (Enclave Page Cache)
        1.1.3 ENCLS and ENCLU
    1.2 Discovering SGX Capability
        1.2.1 Enumerate SGX via CPUID
        1.2.2 Intel SGX Opt-in Configuration
    1.3 Enclave Life Cycle
        1.3.1 Constructing & Destroying Enclave
        1.3.2 Enclave Entry and Exit
            1.3.2.1 Synchronous Entry and Exit
            1.3.2.2 Asynchronous Enclave Exit
        1.3.3 EPC Eviction and Reload
    1.4 SGX Launch Control
    1.5 SGX Interaction with IA32 and Intel 64 Architecture
2. SGX Virtualization Design
    2.1 High Level Toolstack Changes
        2.1.1 New 'epc' parameter
        2.1.2 New XL commands (?)
        2.1.3 Notify domain's virtual EPC base and size to Xen
        2.1.4 Launch Control Support (?)
    2.2 High Level Hypervisor Changes
        2.2.1 EPC Management (?)
        2.2.2 EPC Virtualization (?)
        2.2.3 Populate EPC for Guest
        2.2.4 New Dedicated Hypercall (?)
        2.2.5 Launch Control Support
        2.2.6 CPUID Emulation
        2.2.7 MSR Emulation
        2.2.8 EPT Violation & ENCLS Trapping Handling
        2.2.9 Guest Suspend & Resume
        2.2.10 Destroying Domain
    2.3 Additional Point: Live Migration, Snapshot Support (?)
3. Reference

1. SGX Introduction

1.1 Overview

1.1.1 Enclave

Intel Software Guard Extensions (SGX) is a set of instructions and memory
access mechanisms that provide secure access for sensitive applications and
data. SGX allows an application to set up a particular region of its address
space as an *enclave*, which is a protected area providing confidentiality
and integrity even in the presence of privileged malware. Accesses to the
enclave memory area from any software not resident in the enclave are
prevented, including those from privileged software. The diagram below
illustrates an enclave inside an application.

        |-----------------------|
        |                       |
        |   |---------------|   |
        |   |   OS kernel   |   |       |-----------------------|
        |   |---------------|   |       |                       |
        |   |               |   |       |   |---------------|   |
        |   |---------------|   |       |   | Entry table   |   |
        |   |   Enclave     |---|-----> |   |---------------|   |
        |   |---------------|   |       |   | Enclave stack |   |
        |   |   App code    |   |       |   |---------------|   |
        |   |---------------|   |       |   | Enclave heap  |   |
        |   |   Enclave     |   |       |   |---------------|   |
        |   |---------------|   |       |   | Enclave code  |   |
        |   |   App code    |   |       |   |---------------|   |
        |   |---------------|   |       |                       |
        |           |           |       |-----------------------|
        |-----------------------|

SGX comprises the SGX1 and SGX2 extensions. SGX1 provides basic enclave
support, and SGX2 allows additional flexibility in runtime management of
enclave resources and thread execution within an enclave.

1.1.2 EPC (Enclave Page Cache)

Just like normal application memory management, enclave memory management
can be divided into two parts: address space allocation and memory
commitment. Address space allocation means allocating a particular range of
linear address space for the enclave. Memory commitment means assigning
actual resources to the enclave.

The Enclave Page Cache (EPC) is the physical resource used to commit memory
to an enclave. The EPC is divided into 4K pages: an EPC page is 4K in size
and always aligned to a 4K boundary. Hardware performs additional access
control checks to restrict access to EPC pages. The Enclave Page Cache Map
(EPCM) is a secure structure which holds one entry for each EPC page, and is
used by hardware to track the status of each EPC page (it is invisible to
software). Typically the EPC and EPCM are reserved by BIOS as Processor
Reserved Memory, but the actual amount, size, and layout of the EPC are
model-specific and dependent on BIOS settings. The EPC is enumerated via the
new SGX CPUID leaf, and is reported as reserved memory.

An EPC page is either invalid or valid. There are 4 valid EPC page types in
SGX1: regular EPC page, SGX Enclave Control Structure (SECS) page, Thread
Control Structure (TCS) page, and Version Array (VA) page. SGX2 adds the
Trimmed EPC page. Each enclave is associated with one SECS page, and each
thread in an enclave is associated with one TCS page. VA pages are used in
EPC page eviction and reload. The Trimmed EPC page is introduced in SGX2 for
the case where a particular 4K page of an enclave is going to be freed
(trimmed) at runtime after the enclave has been initialized.

1.1.3 ENCLS and ENCLU

Two new instructions, ENCLS and ENCLU, are introduced to manage enclaves and
the EPC. ENCLS can only run in ring 0, while ENCLU can only run in ring 3.
Both ENCLS and ENCLU have multiple leaf functions, with EAX indicating the
specific leaf function.

SGX1 supports the below ENCLS and ENCLU leaves:

    ENCLS:
    - ECREATE, EADD, EEXTEND, EINIT, EREMOVE (Enclave build and destroy)
    - EPA, EBLOCK, ETRACK, EWB, ELDU/ELDB (EPC eviction & reload)

    ENCLU:
    - EENTER, EEXIT, ERESUME (Enclave entry, exit, re-enter)
    - EGETKEY, EREPORT (SGX key derivation, attestation)

Additionally, SGX2 supports the below ENCLS and ENCLU leaves for runtime
addition/removal of EPC pages to/from an enclave after the enclave has been
initialized, along with permission changes.

    ENCLS:
    - EAUG, EMODT, EMODPR
    
    ENCLU:
    - EACCEPT, EACCEPTCOPY, EMODPE

The VMM is able to interfere with ENCLS running in a guest (see 1.5.1 VMX
Changes for Supporting SGX Virtualization) but is unable to interfere with
ENCLU.

1.2 Discovering SGX Capability

1.2.1 Enumerate SGX via CPUID

If CPUID.0x7.0:EBX.SGX (bit 2) is 1, then the processor supports SGX, and
SGX capabilities and resources can be enumerated via the new SGX CPUID leaf
(0x12). CPUID.0x12.0x0 reports SGX capabilities, such as the presence of
SGX1 and SGX2, and the enclave's maximum size for both 32-bit and 64-bit
applications. CPUID.0x12.0x1 reports the availability of bits that can be
set in SECS.ATTRIBUTES. CPUID.0x12.0x2 reports the EPC resource's base and
size. A platform may support multiple EPC sections, and CPUID.0x12.0x3 and
further sub-leaves can be used to detect the existence of multiple EPC
sections (until CPUID reports an invalid EPC section).

Refer to 37.7.2 Intel SGX Resource Enumeration Leaves for full description of
SGX CPUID 0x12.
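
To make the register layout concrete, below is a small C sketch of decoding
one EPC sub-leaf (0x12.0x2 and above) based on the layout in the SDM. The
function name and the synthetic values in the comments are mine, not from
any real code:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative decoding of CPUID.0x12 sub-leaves >= 2 (EPC sections). */

#define SGX_CPUID_EPC_SECTION  0x1  /* EAX[3:0]: 0 = invalid, 1 = EPC section */

struct epc_section {
    uint64_t base;   /* physical base, 4K aligned */
    uint64_t size;   /* size in bytes, 4K aligned */
};

/* Returns 1 if the sub-leaf describes a valid EPC section, 0 otherwise. */
static int decode_epc_section(uint32_t eax, uint32_t ebx,
                              uint32_t ecx, uint32_t edx,
                              struct epc_section *sec)
{
    if ((eax & 0xf) != SGX_CPUID_EPC_SECTION)
        return 0;   /* no more EPC sections */

    /* base: EAX[31:12] = bits 31:12, EBX[19:0] = bits 51:32 */
    sec->base = (uint64_t)(eax & 0xfffff000) |
                ((uint64_t)(ebx & 0xfffff) << 32);
    /* size: ECX[31:12] = bits 31:12, EDX[19:0] = bits 51:32 */
    sec->size = (uint64_t)(ecx & 0xfffff000) |
                ((uint64_t)(edx & 0xfffff) << 32);
    return 1;
}
```

The enumeration loop simply increments the sub-leaf index until this decode
reports an invalid section.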

1.2.2 Intel SGX Opt-in Configuration

On processors that support Intel SGX, IA32_FEATURE_CONTROL also provides the
SGX_ENABLE bit (bit 18) to turn SGX on/off. Before system software can
enable and use SGX, BIOS is required to set
IA32_FEATURE_CONTROL.SGX_ENABLE = 1 to opt in to SGX.

Setting SGX_ENABLE follows the rules of IA32_FEATURE_CONTROL.LOCK (bit 0).
Software is considered to have opted into Intel SGX if and only if
IA32_FEATURE_CONTROL.SGX_ENABLE and IA32_FEATURE_CONTROL.LOCK are set to 1.

The setting of IA32_FEATURE_CONTROL.SGX_ENABLE (bit 18) is not reflected in
the SGX CPUID. Enclave instructions behave differently depending on the
value of CPUID.0x7.0x0:EBX.SGX and whether BIOS has opted in to SGX.

Refer to 37.7.1 Intel SGX Opt-in Configuration for more information.
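
The opt-in rule above boils down to a simple two-bit check on the MSR value;
a minimal sketch (the helper name is made up):

```c
#include <assert.h>
#include <stdint.h>

/* SGX is usable only if both IA32_FEATURE_CONTROL.LOCK (bit 0) and
 * SGX_ENABLE (bit 18) are set. */

#define FEATURE_CONTROL_LOCK        (1ull << 0)
#define FEATURE_CONTROL_SGX_ENABLE  (1ull << 18)

static int sgx_opted_in(uint64_t feature_control)
{
    return (feature_control & FEATURE_CONTROL_LOCK) &&
           (feature_control & FEATURE_CONTROL_SGX_ENABLE);
}
```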

1.3 Enclave Life Cycle

1.3.1 Constructing & Destroying Enclave

An enclave is created via the ENCLS[ECREATE] leaf by privileged software.
Basically, ECREATE converts an invalid EPC page into a SECS page, according
to a source SECS structure residing in normal memory. The source SECS
contains the enclave's info such as its base (linear) address, size,
attributes, measurement, etc.

After ECREATE, for each 4K page of the enclave's linear address space,
privileged software uses EADD and EEXTEND to add one EPC page to it. Enclave
code/data (residing in normal memory) is loaded into the enclave during EADD
for each of the enclave's 4K pages. After all EPC pages are added to the
enclave, privileged software calls EINIT to initialize the enclave, and then
the enclave is ready to run.

While the enclave is being constructed, the enclave measurement, which is a
SHA256 hash value, is also built according to the enclave's size, its
code/data and their locations in the enclave, etc. The measurement can be
used to uniquely identify the enclave. The SIGSTRUCT passed to the EINIT
leaf also contains the measurement specified by untrusted software, via
MRENCLAVE. EINIT checks the two measurements and will only succeed when the
two match.

An enclave is destroyed by running EREMOVE for each of the enclave's EPC
pages, and then for the enclave's SECS. EREMOVE reports an SGX_CHILD_PRESENT
error if it is called for a SECS while there are still regular EPC pages
that haven't been removed from the enclave.

Please refer to SDM chapter 39.1 Constructing an Enclave for more information.
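
The destroy ordering rule above (the SECS must be removed last) can be
modeled with a toy bookkeeping structure; all names here are illustrative,
not real hardware state:

```c
#include <assert.h>

/* Toy model of the EREMOVE ordering rule: a SECS page cannot be removed
 * while regular EPC pages ("children") still belong to the enclave. */

#define SGX_SUCCESS        0
#define SGX_CHILD_PRESENT  13   /* error code value is illustrative */

struct secs {
    int children;   /* regular EPC pages still in the enclave */
};

/* EREMOVE on a regular EPC page always detaches it from its SECS. */
static int eremove_child(struct secs *s)
{
    if (s->children > 0)
        s->children--;
    return SGX_SUCCESS;
}

/* EREMOVE on the SECS only succeeds once all children are gone. */
static int eremove_secs(struct secs *s)
{
    return s->children ? SGX_CHILD_PRESENT : SGX_SUCCESS;
}
```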

1.3.2 Enclave Entry and Exit

1.3.2.1 Synchronous Entry and Exit

After the enclave is constructed, non-privileged software uses ENCLU[EENTER]
to enter the enclave. While the process runs in the enclave, non-privileged
software can use ENCLU[EEXIT] to exit from the enclave and return to normal
mode.

1.3.2.2 Asynchronous Enclave Exit

Asynchronous and synchronous events, such as exceptions, interrupts, traps,
SMIs, and VM exits may occur while executing inside an enclave. These events
are referred to as Enclave Exiting Events (EEE). Upon an EEE, the processor
state is securely saved inside the enclave and then replaced by a synthetic
state to prevent leakage of secrets. The process of securely saving state and
establishing the synthetic state is called an Asynchronous Enclave Exit (AEX).

After an AEX, non-privileged software uses ENCLU[ERESUME] to re-enter the
enclave. The SGX userspace software maintains a small piece of code
(residing in normal memory) which basically calls ERESUME to re-enter the
enclave. The address of this piece of code is called the Asynchronous Exit
Pointer (AEP). The AEP is specified as a parameter to EENTER and is kept
internally in the enclave. Upon an AEX, the AEP is pushed onto the stack,
and upon return from EEE handling (e.g., via IRET), the AEP is loaded into
RIP and ERESUME is subsequently called to re-enter the enclave.

During an AEX the processor does the context save and restore automatically,
so no change to the interrupt handling of the OS kernel or VMM is required.
It is the SGX userspace software's responsibility to set up the AEP
correctly.

Please refer to SDM chapter 39.2 Enclave Entry and Exit for more information.

1.3.3 EPC Eviction and Reload

SGX also allows privileged software to evict any EPC pages that are used by
an enclave. The idea is the same as normal memory swapping.

Below is the sequence to evict regular EPC page:

	1) Select one or multiple regular EPC pages from one enclave
	2) Remove EPT/PT mapping for selected EPC pages
	3) Send IPIs to remote CPUs to flush TLB of selected EPC pages
	4) EBLOCK on selected EPC pages
	5) ETRACK on enclave's SECS page
	6) allocate one available slot (8-byte) in VA page
	7) EWB on selected EPC pages

With EWB taking:

	- a VA slot, to store the eviction version info.
	- one normal 4K page in memory, to store the encrypted content of
	  the EPC page.
	- one struct PCMD in memory, to store metadata.

    (A VA slot is an 8-byte slot in a VA page, which is a particular type
    of EPC page.)

And below is the sequence to evict an SECS page or VA page:

	1) locate the SECS (or VA) page
	2) remove the EPT/PT mapping for the SECS (or VA) page
	3) send IPIs to remote CPUs to flush the TLB
	4) allocate one available slot (8-byte) in a VA page
	5) EWB on the SECS (or VA) page

And to evict a SECS page, all regular EPC pages that belong to that SECS
must be evicted out first; otherwise EWB returns the SGX_CHILD_PRESENT
error.

And to reload an EPC page:

	1) ELDU/ELDB on EPC page
	2) setup EPT/PT mapping

With ELDU/ELDB taking:

	- location of SECS page
	- linear address of enclave's 4K page (that we are going to reload to)
	- VA slot (used in EWB)
	- 4K page in memory (used in EWB)
	- struct PCMD in memory (used in EWB)

Please refer to SDM chapter 39.5 EPC and Management of EPC pages for more
information.

Note that some instructions change behavior or become illegal when executed
inside an enclave; for example, CPUID, GETSEC, RDPMC, SGDT, SIDT, SLDT, STR
and VMCALL are illegal inside an enclave. Please refer to the SDM for the
full description of instruction behavior changes inside an enclave.

1.4 SGX Launch Control

SGX requires running a "Launch Enclave" (LE) before running any other
enclave. This is because the LE is the only enclave that does not require an
EINITTOKEN in EINIT. Running any other enclave requires a valid EINITTOKEN,
which contains a MAC of (the first 192 bytes of) the EINITTOKEN calculated
with the EINITTOKEN key. EINIT verifies the MAC by internally deriving the
EINITTOKEN key, and only an EINITTOKEN with a matching MAC is accepted by
EINIT. The EINITTOKEN key derivation depends on some info from the LE. The
typical process is that the LE generates the EINITTOKEN for another enclave
according to the LE itself and the target enclave, and calculates the MAC
using the EINITTOKEN key obtained via ENCLU[EGETKEY]. Only the LE is able to
get the EINITTOKEN key.

Running the LE requires the SHA256 hash of the LE signer's RSA public key
(SHA256 of sigstruct->modulus) to equal IA32_SGXLEPUBKEYHASH[0-3] (the 4
MSRs together make up the 256-bit SHA256 hash value).
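
For illustration, splitting a 32-byte SHA256 digest across the four 64-bit
MSRs might look like the sketch below. The assumption that each MSR holds 8
digest bytes in little-endian order is mine; the function name is made up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: pack a 32-byte SHA256 digest into the 4 IA32_SGXLEPUBKEYHASH
 * MSR values, assuming 8 little-endian digest bytes per MSR. */
static void hash_to_msrs(const uint8_t digest[32], uint64_t msrs[4])
{
    for (size_t i = 0; i < 4; i++) {
        msrs[i] = 0;
        for (size_t b = 0; b < 8; b++)
            msrs[i] |= (uint64_t)digest[i * 8 + b] << (b * 8);
    }
}
```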

If CPUID.0x7.0x0:EBX.SGX is set, then IA32_SGXLEPUBKEYHASHn are readable. If
CPUID.0x7.0x0:ECX.SGX_LAUNCH_CONTROL[bit 30] is set, then the
IA32_FEATURE_CONTROL MSR has the SGX_LAUNCH_CONTROL_ENABLE bit (bit 17)
available. Setting the SGX_LAUNCH_CONTROL_ENABLE bit to 1 enables runtime
changes of IA32_SGXLEPUBKEYHASHn after IA32_FEATURE_CONTROL is locked.
Otherwise, IA32_SGXLEPUBKEYHASHn are read-only after IA32_FEATURE_CONTROL is
locked. After reset, IA32_SGXLEPUBKEYHASHn are set to the SHA256 hash of
Intel's default RSA public key.

The above mechanism allows third parties to run their own LEs.

On a physical machine, typically BIOS provides an option to *lock* or
*unlock* IA32_SGXLEPUBKEYHASHn before transferring control to the OS. BIOS
may also provide an interface for the user to change the default value of
IA32_SGXLEPUBKEYHASHn, but what interfaces are provided is BIOS
implementation dependent.

1.5 SGX Interaction with IA32 and Intel 64 Architecture

SDM Chapter 42 describes SGX interactions with various features of the IA32
and Intel 64 architecture. Below are the major ones; refer to Chapter 42 for
the full description of SGX interactions with IA32 and Intel 64 features.

1.5.1 VMX Changes for Supporting SGX Virtualization

A new 64-bit ENCLS-exiting bitmap control field is added to the VMCS
(encoding 202EH) to control VMEXIT on ENCLS leaf functions. And a new
"Enable ENCLS exiting" control bit (bit 15) is defined in the secondary
processor-based VM-execution controls. Setting "Enable ENCLS exiting" to 1
enables the ENCLS-exiting bitmap control. The ENCLS-exiting bitmap controls
which ENCLS leaves will trigger VMEXIT.

Additionally, two new bits are added to indicate whether a VMEXIT (of any
kind) occurred from an enclave. The below two bits are set if the VMEXIT is
from an enclave:
    - Bit 27 in the Exit Reason field of the Basic VM-exit Information.
    - Bit 4 in the Interruptibility State of the Guest Non-Register State of
      the VMCS.

Refer to 42.5 Interactions with VMX, 27.2.1 Basic VM-Exit Information, and
27.3.4 Saving Non-Register State.
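
A hypervisor's exit handler can decode the "from enclave" indication in the
exit reason as sketched below (the helper names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Bit 27 of the VMCS exit reason indicates the VMEXIT happened while
 * executing inside an enclave; the basic exit reason itself is in the
 * low 16 bits. */

#define VMX_EXIT_REASON_FROM_ENCLAVE  (1u << 27)

static int vmexit_from_enclave(uint32_t exit_reason)
{
    return !!(exit_reason & VMX_EXIT_REASON_FROM_ENCLAVE);
}

static uint16_t basic_exit_reason(uint32_t exit_reason)
{
    return (uint16_t)(exit_reason & 0xffff);
}
```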

1.5.2 Interaction with XSAVE

SGX defines a sub-field called X-Feature Request Mask (XFRM) in the
attributes field of the SECS. On enclave entry, SGX hardware verifies that
the features requested by XFRM in SECS.ATTRIBUTES are already enabled in
XCR0.

Upon an AEX, SGX saves the processor extended state and miscellaneous state
to the enclave's state-save area (SSA), and clears the secrets from any
processor extended state used by the enclave (to prevent leaking secrets).

Refer to 42.7 Interaction with Processor Extended State and Miscellaneous
State.

1.5.3 Interaction with S States

When the processor goes into the S3-S5 states, the EPC is destroyed, and
thus all enclaves are destroyed as well.

Refer to 42.14 Interaction with S States.

2. SGX Virtualization Design

2.1 High Level Toolstack Changes

2.1.1 New 'epc' parameter

EPC is a limited resource. In order to use EPC efficiently among all
domains, when creating a guest, the administrator should be able to specify
the domain's virtual EPC size. And the admin should also be able to get all
domains' virtual EPC sizes.

For this purpose, a new 'epc = <size>' parameter is added to the XL
configuration file. This parameter specifies the guest's virtual EPC size.
The EPC base address will be calculated by the toolstack internally,
according to the guest's memory size, MMIO size, etc. 'epc' is in units of
MB, and any 1MB-aligned value will be accepted.
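
For example, with the proposed parameter, a guest config fragment might look
like the below (all values are made up for illustration):

```
# hypothetical HVM guest config using the proposed 'epc' parameter
name   = "sgx-guest"
memory = 2048        # guest RAM in MB
vcpus  = 2
epc    = 64          # 64MB of virtual EPC
```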

2.1.2 New XL commands (?)

The administrator should be able to get the physical EPC size, and all
domains' virtual EPC sizes. For this purpose, we can introduce 2 additional
commands:

    # xl sgxinfo

Which will print out physical EPC size, and other SGX info (such as SGX1, SGX2,
etc) if necessary.

    # xl sgxlist <did>

Which will print out particular domain's virtual EPC size, or list all virtual
EPC sizes for all supported domains.

Alternatively, we can also extend existing XL commands by adding new option

    # xl info -sgx

Which will print out physical EPC size along with other physinfo. And

    # xl list <did> -sgx

Which will print out domain's virtual EPC size.

Comments?

In my RFC patches I didn't implement these commands, as I don't know which
is better. In the github repo mentioned at the beginning, there's an old
branch in which I implemented 'xl sgxinfo' and 'xl sgxlist', but they are
implemented via a dedicated hypercall for SGX, which I am not sure is a good
option, so I didn't include it in my RFC patches.

2.1.3 Notify domain's virtual EPC base and size to Xen

Xen needs to know the guest's EPC base and size in order to populate EPC
pages for it. The toolstack notifies the EPC base and size to Xen via
XEN_DOMCTL_set_cpuid.

2.1.4 Launch Control Support (?)

Xen Launch Control support means supporting running multiple domains, each
running its own LE signed by a different owner (if hardware allows, as
explained below). As explained in 1.4 SGX Launch Control, EINIT for an LE
(Launch Enclave) only succeeds when SHA256(SIGSTRUCT.modulus) matches
IA32_SGXLEPUBKEYHASHn, and EINIT for other enclaves derives the EINITTOKEN
key according to IA32_SGXLEPUBKEYHASHn. Therefore, to support this, the
guest's virtual IA32_SGXLEPUBKEYHASHn must be written to the physical MSRs
before EINIT (which also means the physical IA32_SGXLEPUBKEYHASHn need to be
*unlocked* in BIOS before booting the OS).

For a physical machine, it is the BIOS writer's decision whether the BIOS
provides an interface for the user to specify a customized
IA32_SGXLEPUBKEYHASHn (it defaults to the digest of Intel's signing key
after reset). In reality, the OS's SGX driver may require BIOS to leave the
MSRs *unlocked* and may actively write the hash value to the MSRs in order
to run EINIT successfully; in this case, the driver does not depend on the
BIOS's capability (whether it allows the user to customize the
IA32_SGXLEPUBKEYHASHn value).

The problem for Xen is: do we need a new parameter, such as
'lehash=<SHA256>', to specify the default value of the guest's virtual
IA32_SGXLEPUBKEYHASHn? And do we need a new parameter, such as 'lewr', to
specify whether the guest's virtual MSRs are locked or not before handing
over to the guest's OS?

I tend not to introduce 'lehash', as it seems the SGX driver will actively
update the MSRs, and a new parameter would add additional changes for
upper-layer software (such as OpenStack). And 'lewr' is not needed either,
as Xen can always *unlock* the MSRs for the guest.

Please give comments?

Currently in my RFC patches the above two parameters are not implemented.
The Xen hypervisor will always *unlock* the MSRs. Whether there is a
'lehash' parameter or not doesn't impact the Xen hypervisor's emulation of
IA32_SGXLEPUBKEYHASHn. See the Xen hypervisor changes below for details.

2.2 High Level Xen Hypervisor Changes

2.2.1 EPC Management (?)

The Xen hypervisor needs to detect SGX, discover the EPC, and manage the EPC
before exposing SGX to guests. The EPC is detected via SGX CPUID 0x12.0x2.
It's possible that there are multiple EPC sections (enumerated via
sub-leaves 0x3 and so on, until an invalid EPC section is reported), but
this is only true on multi-socket server machines. For server machines there
are additional things that also need to be done, such as NUMA EPC,
scheduling, etc. We will support server machines in the future, but
currently we only support one EPC section.

The EPC is reported as reserved memory (so it is not reported as normal
memory). The EPC must be managed in 4K pages. The CPU hardware uses the EPCM
to track the status of each EPC page. Xen needs to manage the EPC and
provide functions to, e.g., alloc and free EPC pages for guests.

There are two ways to manage the EPC: manage it separately, or integrate it
into the existing memory management framework.

It is easy to manage the EPC separately, as currently the EPC is pretty
small (~100MB), and we can even put all pages in a single list. However it
is not flexible; for example, we would have to write new algorithms when the
EPC becomes larger (e.g., GBs), and new code to support NUMA EPC (although
this will not come in the short term).

Integrating the EPC into the existing memory management framework seems more
reasonable, as in this way we can reuse existing memory management data
structures/algorithms, and it will be more flexible for supporting a larger
EPC and potentially NUMA EPC. But modifying the MM framework has a higher
risk of breaking existing memory management code (potentially more bugs).

In my RFC patches we currently choose to manage the EPC separately. A new
structure, struct epc_page, is added to represent a single 4K EPC page. A
whole array of struct epc_page is allocated during EPC initialization, so
that the PFN of an EPC page and its struct epc_page can be converted to each
other by simple offset arithmetic.
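
That offset arithmetic can be sketched as below; the names and the global
layout are illustrative, not the actual patch code:

```c
#include <assert.h>
#include <stdint.h>

/* Toy sketch of PFN <-> struct epc_page conversion via a flat array
 * covering one EPC section. */

struct epc_page {
    unsigned int in_use;
};

/* Filled in during (hypothetical) EPC initialization. */
static struct epc_page epc_page_array[64];
static uint64_t epc_base_pfn = 0x80000;   /* illustrative base */

static struct epc_page *pfn_to_epc_page(uint64_t pfn)
{
    return &epc_page_array[pfn - epc_base_pfn];
}

static uint64_t epc_page_to_pfn(struct epc_page *page)
{
    return epc_base_pfn + (uint64_t)(page - epc_page_array);
}
```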

But maybe integrating EPC to MM framework is more reasonable. Comments?

2.2.2 EPC Virtualization (?)

This part is how to populate EPC for guests. We have 3 choices:
    - Static Partitioning
    - Oversubscription
    - Ballooning

Static partitioning means all EPC pages are allocated and mapped to the
guest when it is created, and there are no runtime changes of page table
mappings for EPC pages. Oversubscription means the Xen hypervisor supports
EPC page swapping between domains, i.e., Xen is able to evict an EPC page
from another domain and assign it to the domain that needs EPC. With
oversubscription, EPC can be assigned to a domain on demand, when an EPT
violation happens. Ballooning is similar to memory ballooning; it is
basically "static partitioning" + a balloon driver in the guest.

Static partitioning is the easiest way in terms of implementation, and there
will be no hypervisor overhead (except EPT overhead, of course), because
with static partitioning there are no EPT violations for EPC, and Xen
doesn't need to turn on ENCLS VMEXIT for the guest, as ENCLS runs perfectly
in non-root mode.

Ballooning is "static partitioning" + a balloon driver in the guest. Like
static partitioning, ballooning doesn't need to turn on ENCLS VMEXIT, and
doesn't have EPT violations for EPC either. To support ballooning, we need a
balloon driver in the guest to issue hypercalls to give up or reclaim EPC
pages. In terms of the hypercall, we have two choices: 1) add a new
hypercall for EPC ballooning; or 2) use the existing
XENMEM_{increase/decrease}_reservation with a new memory flag, i.e.,
XENMEMF_epc. I'll discuss adding a dedicated hypercall or not in more detail
later.

Oversubscription looks nice but requires a more complicated implementation.
Firstly, as explained in 1.3.3 EPC Eviction and Reload, we need to follow
specific steps to evict EPC pages, and in order to do that, Xen basically
needs to trap ENCLS from guests and keep track of EPC page status and
enclave info from all guests. This is because:
    - To evict a regular EPC page, Xen needs to know the SECS location.
    - Xen needs to know the EPC page type: evicting a regular EPC page and
      evicting a SECS or VA page take different steps.
    - Xen needs to know the EPC page status: whether the page is blocked or
      not.

This info can only be obtained by trapping ENCLS from the guest and parsing
its parameters (to identify the SECS page, etc). Parsing ENCLS parameters
means we need to know which ENCLS leaf is being trapped, and we need to
translate the guest's virtual addresses to get the physical addresses in
order to locate the EPC pages. And once ENCLS is trapped, we have to emulate
ENCLS in Xen, which means we need to reconstruct the ENCLS parameters by
remapping all the guest's virtual addresses to Xen's virtual addresses
(gva->gpa->pa->xen_va), as ENCLS always uses *effective addresses* which are
translated by the processor when running ENCLS.

    --------------------------------------------------------------
                |   ENCLS   |
    --------------------------------------------------------------
                |          /|\
    ENCLS VMEXIT|           | VMENTRY
                |           |
               \|/          |

		1) parse ENCLS parameters
		2) reconstruct (remap) guest's ENCLS parameters
		3) run ENCLS on behalf of guest (and skip ENCLS)
		4) on success, update EPC/enclave info, or inject error

And Xen needs to maintain each EPC page's status (type, blocked or not, in
an enclave or not, etc). Xen also needs to maintain all enclaves' info from
all guests, in order to find the correct SECS for a regular EPC page, and
the enclave's linear addresses as well.

So in general, "Static Partitioning" has the simplest implementation but is
obviously not the best way to use EPC efficiently; "Ballooning" has all the
pros of static partitioning but requires a guest balloon driver;
"Oversubscription" is best in terms of flexibility but requires a
complicated hypervisor implementation.

We have implemented "Static Partitioning" in the RFC patches, but need your
feedback on whether it is enough. If not, which one should we do at the next
stage -- ballooning or oversubscription? IMO ballooning may be good enough,
given the fact that currently normal memory is also managed via "static
partitioning" + "ballooning".

Comments?

2.2.3 Populate EPC for Guest

The toolstack notifies Xen about the domain's EPC base and size via
XEN_DOMCTL_set_cpuid, so currently Xen populates all EPC pages for the guest
in XEN_DOMCTL_set_cpuid, particularly in the handling of
XEN_DOMCTL_set_cpuid for CPUID.0x12.0x2. Once Xen has checked that the
values passed from the toolstack are valid, Xen allocates all EPC pages and
sets up the EPT mappings for the guest.
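
A sketch of that populate path is below. The allocator and EPT-mapping
calls are stubs standing in for the real Xen internals; none of these names
are from the actual patches:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy sketch: allocate EPC pages and "map" them starting at the guest's
 * EPC base GFN, mirroring the populate step in XEN_DOMCTL_set_cpuid. */

#define MAX_EPC_PAGES 16

static int free_pages = MAX_EPC_PAGES;       /* stand-in EPC allocator */
static uint64_t ept_map[MAX_EPC_PAGES];      /* recorded gfn mappings */
static size_t nr_mapped;

static int alloc_epc_page(void)
{
    if (free_pages == 0)
        return -1;
    free_pages--;
    return 0;
}

static void ept_map_epc(uint64_t gfn)
{
    ept_map[nr_mapped++] = gfn;
}

/* Returns 0 on success, -1 if the request exceeds available EPC. */
static int populate_epc(uint64_t epc_base_gfn, size_t nr_pages)
{
    if (nr_pages > (size_t)free_pages)
        return -1;   /* validity check before touching anything */
    for (size_t i = 0; i < nr_pages; i++) {
        alloc_epc_page();
        ept_map_epc(epc_base_gfn + i);
    }
    return 0;
}
```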

2.2.4 New Dedicated Hypercall (?)

So far, for all the changes mentioned above, without a dedicated new
hypercall we have to implement those changes as below:

    - xl sgxinfo (or xl info -sgx)

    The toolstack can do this by running SGX CPUID directly, along with
    checking the host CPU featureset.

    - xl sgxlist (or xl list -sgx)

    This is not quite straightforward. It looks like we have to extend
    xen_domctl_getdomaininfo. However, SGX is an Intel-specific feature, so
    I am not sure it's a good idea to extend xen_domctl_getdomaininfo.

    -  Populate EPC for guest

    In XEN_DOMCTL_set_cpuid, Xen populates EPC pages for guest after receiving
    EPC base and size from toolstack.

    - Potential EPC Ballooning

    Need to add new XENMEMF_epc and use existing
    XENMEM_{increase/decrease}_reservation.

With a new hypercall for SGX (i.e., XEN_sgx_op), all of the above can be
consolidated into the hypercall. We can also extend it to a more generic
hypercall for the Intel platform generally (i.e., XEN_intel_op). For
example, the new hypercall would look like:

    #define XEN_INTEL_SGX_physinfo  0x1
    struct xen_sgx_physinfo {
        /* OUT */
        unsigned long total_epc_pages;
        unsigned long free_epc_pages;
    };
    typedef struct xen_sgx_physinfo xen_sgx_physinfo_t;
    DEFINE_XEN_GUEST_HANDLE(xen_sgx_physinfo_t);

    #define XEN_INTEL_SGX_setup_epc 0x2
    struct xen_sgx_setup_epc {
        /* IN */
        domid_t domid;
        unsigned long epc_base_gfn;
        unsigned long total_epc_pages;
    };
    typedef struct xen_sgx_setup_epc xen_sgx_setup_epc_t;
    DEFINE_XEN_GUEST_HANDLE(xen_sgx_setup_epc_t);

    #define XEN_INTEL_SGX_dominfo   0x3
    struct xen_sgx_dominfo {
        /* IN */
        domid_t domid;
        /* OUT */
        unsigned long epc_base_gfn;
        unsigned long total_epc_pages;
    };
    typedef struct xen_sgx_dominfo xen_sgx_dominfo_t;
    DEFINE_XEN_GUEST_HANDLE(xen_sgx_dominfo_t);

    struct xen_sgx_op {
        /* XEN_INTEL_SGX_* */
        int cmd;
        union {
            struct xen_sgx_physinfo physinfo;
            struct xen_sgx_setup_epc setup_epc;
            struct xen_sgx_dominfo dominfo;
        } u;
    };
    typedef struct xen_sgx_op xen_sgx_op_t;
    DEFINE_XEN_GUEST_HANDLE(xen_sgx_op_t);

    /* New arch specific hypercall for Intel platform specific operations,
     * __HYPERVISOR_arch_0 is used by Xen x86 machine check... */
    #define __HYPERVISOR_intel_op  __HYPERVISOR_arch_1
    /* Currently only SGX uses this */
    #define XEN_INTEL_OP_sgx                (0x1 << 1)
    struct xen_intel_op {
        int cmd;    /* XEN_INTEL_OP_*** */
        union {
            struct xen_sgx_op sgx_op;
        } u;
    };
    typedef struct xen_intel_op xen_intel_op_t;
    DEFINE_XEN_GUEST_HANDLE(xen_intel_op_t);


In my RFC patches, the new hypercall is not implemented as I am not sure
whether it is a good idea.

Comments?

2.2.5 Launch Control Support

To support running multiple domains, each running its own LE signed by a
different owner, the physical machine's BIOS must leave IA32_SGXLEPUBKEYHASHn
*unlocked* before handing control to Xen. Xen will trap the domain's writes
to IA32_SGXLEPUBKEYHASHn, keep the values internally in the vcpu, and write
them to the physical MSRs when the vcpu is scheduled in. This guarantees that
when EINIT runs in the guest, the guest's virtual IA32_SGXLEPUBKEYHASHn
values have already been written to the physical MSRs.
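As a rough sketch of that trap-and-restore scheme (with the physical wrmsr mocked out and a simplified per-vcpu structure; this is not the actual Xen code):

```c
#include <assert.h>
#include <stdint.h>

/* Mocked physical IA32_SGXLEPUBKEYHASH0..3; real code would use wrmsrl(). */
static uint64_t phys_lehash[4];
static void wrmsr_lehash(int n, uint64_t val) { phys_lehash[n] = val; }

/* Per-vcpu shadow of the guest's virtual IA32_SGXLEPUBKEYHASHn. */
struct vcpu_sgx {
    uint64_t lehash[4];
    int lehash_dirty;
};

/* WRMSR intercept: only record the value in the vcpu; no MSR write yet. */
static void sgx_wrmsr_lehash(struct vcpu_sgx *v, int n, uint64_t val)
{
    v->lehash[n] = val;
    v->lehash_dirty = 1;
}

/*
 * Called when the vcpu is scheduled in: flush the shadow values to the
 * physical MSRs, so that a later EINIT in the guest uses the guest's
 * own launch-enclave hash.
 */
static void sgx_ctxt_switch_to(struct vcpu_sgx *v)
{
    int n;

    if ( !v->lehash_dirty )
        return;
    for ( n = 0; n < 4; n++ )
        wrmsr_lehash(n, v->lehash[n]);
    v->lehash_dirty = 0;
}
```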

The SGX_LAUNCH_CONTROL_ENABLE bit will always be set in the guest's
IA32_FEATURE_CONTROL MSR (see 2.1.4 Launch Control Support).

If the physical IA32_SGXLEPUBKEYHASHn are *locked* by the machine's BIOS,
then only MSR reads are allowed from the guest, and Xen will inject an error
for the guest's MSR writes.

If CPUID.0x7.0x0:ECX.SGX_LAUNCH_CONTROL is not present, then this feature
will not be exposed to the guest either, and the SGX_LAUNCH_CONTROL_ENABLE
bit is set to 0 (as it is invalid).

2.2.6 CPUID Emulation

Most of the native SGX CPUID info can be exposed to the guest, except the two
parts below:
    - Sub-leaf 0x2 needs to report the domain's virtual EPC base and size,
      instead of the physical EPC info.
    - Sub-leaf 0x1 needs to be consistent with the guest's XCR0. For the
      reasoning behind this, please refer to 1.5.2 Interaction with XSAVE.
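For the EPC sub-leaf, a sketch of how Xen could synthesize CPUID.0x12 sub-leaf 0x2 from the domain's virtual EPC layout might look as follows (hypothetical helper; the register encoding follows the SDM, with the low nibble 0001b marking a valid EPC section):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a domain's virtual EPC. */
struct epc_layout {
    uint64_t base_gfn;      /* guest frame number of EPC base */
    uint64_t npages;        /* EPC size in 4K pages */
};

/*
 * Fill CPUID.0x12 sub-leaf 0x2 with the domain's virtual EPC instead of
 * the physical EPC.  EAX/EBX carry bits 31:12 and 51:32 of the base;
 * ECX/EDX carry bits 31:12 and 51:32 of the size.
 */
static void sgx_cpuid_epc_subleaf(const struct epc_layout *epc,
                                  uint32_t *eax, uint32_t *ebx,
                                  uint32_t *ecx, uint32_t *edx)
{
    uint64_t base = epc->base_gfn << 12;
    uint64_t size = epc->npages << 12;

    *eax = (uint32_t)(base & 0xfffff000) | 0x1;
    *ebx = (uint32_t)(base >> 32) & 0xfffff;
    *ecx = (uint32_t)(size & 0xfffff000) | 0x1;
    *edx = (uint32_t)(size >> 32) & 0xfffff;
}
```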

2.2.7 MSR Emulation

The SGX_ENABLE bit in IA32_FEATURE_CONTROL is always set if SGX is exposed to
the guest, and the SGX_LAUNCH_CONTROL_ENABLE bit is handled as in 2.2.5. Any
write from the guest to IA32_FEATURE_CONTROL is ignored.

IA32_SGXLEPUBKEYHASHn emulation is described in 2.2.5.

2.2.8 EPT Violation & ENCLS Trapping Handling

This handling is only needed when Xen supports EPC oversubscription, as
explained above.

2.2.9 Guest Suspend & Resume

On hardware, EPC is destroyed when power goes to S3-S5. So Xen will destroy
the guest's EPC when the guest's power state goes to S3-S5. Currently Xen is
notified by Qemu of S state changes via HVM_PARAM_ACPI_S_STATE, and Xen will
destroy the EPC if the S state is S3-S5.

Specifically, Xen will run EREMOVE on each of the guest's EPC pages, as the
guest may not handle EPC suspend & resume correctly, in which case the
guest's EPC pages may physically still be valid. Xen therefore runs EREMOVE
to make sure all EPC pages become invalid; otherwise further EPC operations
in the guest may fault, as the guest assumes all EPC pages are invalid after
it is resumed.

For an SECS page, EREMOVE may fault with SGX_CHILD_PRESENT, in which case Xen
will put the SECS page on a list and run EREMOVE on it again after EREMOVE
has been run on all other EPC pages. This second EREMOVE on the SECS will
succeed, as all its children (regular EPC pages) have already been removed.
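The two-pass teardown can be sketched as follows, with EREMOVE mocked by a function that fails with SGX_CHILD_PRESENT on an SECS page that still has children (the real code would of course run the ENCLS leaf on each physical EPC page):

```c
#include <assert.h>
#include <stddef.h>

#define SGX_CHILD_PRESENT 13    /* EREMOVE error code */

/* Mock EPC page; real code tracks state via struct epc_page. */
struct epc_page {
    int is_secs;
    int nr_children;
    int valid;
    struct epc_page *parent;    /* SECS this regular page belongs to */
};

/* Mocked EREMOVE: fails on an SECS that still has child pages. */
static int eremove(struct epc_page *pg)
{
    if ( pg->is_secs && pg->nr_children )
        return SGX_CHILD_PRESENT;
    if ( pg->parent )
        pg->parent->nr_children--;
    pg->valid = 0;
    return 0;
}

/*
 * First pass: EREMOVE every page, deferring SECS pages that fail with
 * SGX_CHILD_PRESENT.  Second pass: retry the deferred SECS pages, which
 * now succeed as all their children have been removed.
 */
static void epc_teardown(struct epc_page *pages, int n)
{
    struct epc_page *deferred[64];  /* small fixed bound for this sketch */
    int i, nr_deferred = 0;

    for ( i = 0; i < n; i++ )
        if ( eremove(&pages[i]) == SGX_CHILD_PRESENT )
            deferred[nr_deferred++] = &pages[i];

    for ( i = 0; i < nr_deferred; i++ )
        eremove(deferred[i]);
}
```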

2.2.10 Destroying Domain

Normally Xen just frees all of a domain's EPC pages when it is destroyed. But
Xen will also run EREMOVE on all of the guest's EPC pages (as described in
2.2.9 above) before freeing them, as the guest may shut down unexpectedly
(e.g., the user kills the guest), in which case the guest's EPC may still be
valid.

2.3 Additional Point: Live Migration, Snapshot Support (?)

Actually, from the hardware's point of view, SGX is not migratable. There are
two reasons:

    - The SGX key architecture cannot be virtualized.

    Some keys are bound to the CPU, for example the Sealing key and the EREPORT
    key. If a VM is migrated to another machine, the same enclave will derive
    different keys. Taking the Sealing key as an example: the Sealing key is
    typically used by an enclave (which can get it via EGETKEY) to *seal* its
    secrets to the outside (e.g., persistent storage) for later use. If the
    Sealing key changes after VM migration, then the enclave can never get the
    sealed secrets back, as the old Sealing key cannot be recovered.

    - There is no ENCLS leaf to evict an EPC page to normal memory while at
    the same time keeping its content in EPC. Currently, once an EPC page is
    evicted, it becomes invalid. So technically we are unable to implement
    live migration (or checkpointing, or snapshot) for enclaves.

But with some workarounds, and given some facts about existing SGX drivers,
technically we are able to support live migration (and even checkpointing and
snapshot). This is because:

    - Changing a key (which is bound to the CPU) is not a problem in reality.

    Take the Sealing key as an example. Losing sealed data is not a problem,
    because the Sealing key is only supposed to encrypt secrets that can be
    provisioned again. The typical working model is: the enclave gets secrets
    provisioned from a remote service provider and uses the Sealing key to
    store them for later use. When the enclave tries to *unseal* using the
    Sealing key and the key has changed, the enclave will find the data
    corrupted (integrity check failure), and will ask for the secrets to be
    provisioned again from the remote side. Another reason is that, in a data
    center, VMs typically share lots of data, and as the Sealing key is bound
    to the CPU, data encrypted by one enclave on one machine cannot be shared
    by another enclave on another machine. So from the SGX application
    writer's point of view, the developer should treat the Sealing key as a
    changeable key, and should handle loss of sealed data anyway. The Sealing
    key should only be used to seal secrets that can easily be provisioned
    again.

    For other keys such as the EREPORT key and the Provisioning key, which
    are used for local and remote attestation, losing them is not a problem
    either, due to the second reason below.

    - Sudden loss of EPC is not a problem.

    On hardware, EPC is lost if the system goes to S3-S5, or is reset or shut
    down, and the SGX driver needs to handle loss of EPC due to power
    transitions. This is done by cooperation between the SGX driver and the
    userspace SGX SDK/apps. During live migration, however, there may be no
    power transition in the guest, so there may be no EPC loss either, and
    technically we cannot *really* live migrate an enclave (explained above),
    so it looks infeasible. But the fact is that both the Linux and Windows
    SGX drivers already support *sudden* loss of EPC (not just EPC loss
    across a power transition), which means both drivers are able to recover
    if EPC is lost at any point at runtime. With this, technically we are
    able to support live migration by simply ignoring EPC. After the VM is
    migrated, the destination VM will only suffer a *sudden* loss of EPC,
    which both the Windows and Linux SGX drivers are already able to handle.

    We must point out that such *sudden* loss of EPC is not hardware
    behavior, and SGX drivers for other OSes (such as FreeBSD) may not
    implement this, so for those guests the destination VM will behave in an
    unexpected manner. But I am not sure we need to care about other OSes.

For the same reason, we are able to support checkpointing for SGX guests
(Linux and Windows only).

For snapshot, we can support snapshotting an SGX guest by either:

    - Suspending the guest (S3-S5) before the snapshot. This works for all
      guests but requires the user to manually suspend the guest.
    - Issuing a hypercall to destroy the guest's EPC in save_vm. This only
      works for Linux and Windows but doesn't require user intervention.

What's your comments?

3. Reference

    - Intel SGX Homepage
    https://software.intel.com/en-us/sgx

    - Linux SGX SDK
    https://01.org/intel-software-guard-extensions

    - Linux SGX driver for upstreaming
    https://github.com/01org/linux-sgx

    - Intel SGX Specification (SDM Vol 3D)
    https://software.intel.com/sites/default/files/managed/7c/f1/332831-sdm-vol-3d.pdf

    - Paper: Intel SGX Explained
    https://eprint.iacr.org/2016/086.pdf

    - ISCA 2015 tutorial slides for Intel® SGX - Intel® Software
    https://software.intel.com/sites/default/files/332680-002.pdf

Kai Huang (15):
  xen: x86: expose SGX to HVM domain in CPU featureset
  xen: vmx: detect ENCLS VMEXIT
  xen: x86: add early stage SGX feature detection
  xen: mm: add ioremap_cache
  xen: p2m: new 'p2m_epc' type for EPC mapping
  xen: x86: add SGX basic EPC management
  xen: x86: add functions to populate and destroy EPC for domain
  xen: x86: add SGX cpuid handling support.
  xen: vmx: handle SGX related MSRs
  xen: vmx: handle ENCLS VMEXIT
  xen: vmx: handle VMEXIT from SGX enclave
  xen: x86: reset EPC when guest got suspended.
  xen: tools: add new 'epc' parameter support
  xen: tools: add SGX to applying CPUID policy
  xen: tools: expose EPC in ACPI table

 tools/firmware/hvmloader/util.c             |  23 +
 tools/firmware/hvmloader/util.h             |   3 +
 tools/libacpi/build.c                       |   3 +
 tools/libacpi/dsdt.asl                      |  49 ++
 tools/libacpi/dsdt_acpi_info.asl            |   6 +-
 tools/libacpi/libacpi.h                     |   1 +
 tools/libxc/include/xc_dom.h                |   4 +
 tools/libxc/include/xenctrl.h               |  10 +
 tools/libxc/xc_cpuid_x86.c                  |  68 ++-
 tools/libxl/libxl.h                         |   3 +-
 tools/libxl/libxl_cpuid.c                   |  15 +-
 tools/libxl/libxl_create.c                  |   9 +
 tools/libxl/libxl_dom.c                     |  36 +-
 tools/libxl/libxl_internal.h                |   2 +
 tools/libxl/libxl_nocpuid.c                 |   4 +-
 tools/libxl/libxl_types.idl                 |   6 +
 tools/libxl/libxl_x86.c                     |  12 +
 tools/libxl/libxl_x86_acpi.c                |   3 +
 tools/ocaml/libs/xc/xenctrl_stubs.c         |  11 +-
 tools/python/xen/lowlevel/xc/xc.c           |  11 +-
 tools/xl/xl_parse.c                         |   5 +
 xen/arch/x86/cpuid.c                        |  87 ++-
 xen/arch/x86/domctl.c                       |  47 +-
 xen/arch/x86/hvm/hvm.c                      |   3 +
 xen/arch/x86/hvm/vmx/Makefile               |   1 +
 xen/arch/x86/hvm/vmx/sgx.c                  | 871 ++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vmcs.c                 |  21 +
 xen/arch/x86/hvm/vmx/vmx.c                  |  73 +++
 xen/arch/x86/hvm/vmx/vvmx.c                 |  11 +
 xen/arch/x86/mm.c                           |  15 +-
 xen/arch/x86/mm/p2m-ept.c                   |   3 +
 xen/arch/x86/mm/p2m.c                       |  41 ++
 xen/include/asm-x86/cpufeature.h            |   4 +
 xen/include/asm-x86/cpuid.h                 |  26 +-
 xen/include/asm-x86/hvm/hvm.h               |   3 +
 xen/include/asm-x86/hvm/vmx/sgx.h           | 100 ++++
 xen/include/asm-x86/hvm/vmx/vmcs.h          |  10 +
 xen/include/asm-x86/hvm/vmx/vmx.h           |   3 +
 xen/include/asm-x86/msr-index.h             |   6 +
 xen/include/asm-x86/p2m.h                   |  12 +-
 xen/include/public/arch-x86/cpufeatureset.h |   3 +-
 xen/include/xen/vmap.h                      |   1 +
 xen/tools/gen-cpuid.py                      |   3 +
 43 files changed, 1607 insertions(+), 21 deletions(-)
 create mode 100644 xen/arch/x86/hvm/vmx/sgx.c
 create mode 100644 xen/include/asm-x86/hvm/vmx/sgx.h

-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 01/15] xen: x86: expose SGX to HVM domain in CPU featureset
  2017-07-09  8:03 [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Kai Huang
@ 2017-07-09  8:04 ` Kai Huang
  2017-07-12 11:09   ` Andrew Cooper
  2017-07-18 10:12   ` Andrew Cooper
  2017-07-09  8:09 ` [PATCH 02/15] xen: vmx: detect ENCLS VMEXIT Kai Huang
                   ` (15 subsequent siblings)
  16 siblings, 2 replies; 58+ messages in thread
From: Kai Huang @ 2017-07-09  8:04 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich

Expose SGX in the CPU featureset for HVM domains. SGX will not be supported
for PV domains, as ENCLS (which the SGX driver in the guest essentially runs)
must run in ring 0, while a PV kernel runs in ring 3. Theoretically we could
support SGX for PV domains, either by emulating the #GP caused by ENCLS
running in ring 3 or via a PV ENCLS, but this is really not necessary at this
stage. Currently SGX is only exposed to HAP HVM domains (we can add shadow
support in the future).

SGX Launch Control is also exposed in CPU featureset for HVM domain. SGX
Launch Control depends on SGX.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/include/public/arch-x86/cpufeatureset.h | 3 ++-
 xen/tools/gen-cpuid.py                      | 3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 97dd3534c5..b6c54e654e 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -193,7 +193,7 @@ XEN_CPUFEATURE(XSAVES,        4*32+ 3) /*S  XSAVES/XRSTORS instructions */
 /* Intel-defined CPU features, CPUID level 0x00000007:0.ebx, word 5 */
 XEN_CPUFEATURE(FSGSBASE,      5*32+ 0) /*A  {RD,WR}{FS,GS}BASE instructions */
 XEN_CPUFEATURE(TSC_ADJUST,    5*32+ 1) /*S  TSC_ADJUST MSR available */
-XEN_CPUFEATURE(SGX,           5*32+ 2) /*   Software Guard extensions */
+XEN_CPUFEATURE(SGX,           5*32+ 2) /*H  Intel Software Guard extensions */
 XEN_CPUFEATURE(BMI1,          5*32+ 3) /*A  1st bit manipulation extensions */
 XEN_CPUFEATURE(HLE,           5*32+ 4) /*A  Hardware Lock Elision */
 XEN_CPUFEATURE(AVX2,          5*32+ 5) /*A  AVX2 instructions */
@@ -229,6 +229,7 @@ XEN_CPUFEATURE(PKU,           6*32+ 3) /*H  Protection Keys for Userspace */
 XEN_CPUFEATURE(OSPKE,         6*32+ 4) /*!  OS Protection Keys Enable */
 XEN_CPUFEATURE(AVX512_VPOPCNTDQ, 6*32+14) /*A  POPCNT for vectors of DW/QW */
 XEN_CPUFEATURE(RDPID,         6*32+22) /*A  RDPID instruction */
+XEN_CPUFEATURE(SGX_LAUNCH_CONTROL, 6*32+30) /*H Intel SGX Launch Control */
 
 /* AMD-defined CPU features, CPUID level 0x80000007.edx, word 7 */
 XEN_CPUFEATURE(ITSC,          7*32+ 8) /*   Invariant TSC */
diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index 9ec4486f2b..1301eee310 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -256,6 +256,9 @@ def crunch_numbers(state):
         AVX512F: [AVX512DQ, AVX512IFMA, AVX512PF, AVX512ER, AVX512CD,
                   AVX512BW, AVX512VL, AVX512VBMI, AVX512_4VNNIW,
                   AVX512_4FMAPS, AVX512_VPOPCNTDQ],
+
+        # SGX Launch Control depends on SGX
+        SGX: [SGX_LAUNCH_CONTROL],
     }
 
     deep_features = tuple(sorted(deps.keys()))
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 02/15] xen: vmx: detect ENCLS VMEXIT
  2017-07-09  8:03 [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Kai Huang
  2017-07-09  8:04 ` [PATCH 01/15] xen: x86: expose SGX to HVM domain in CPU featureset Kai Huang
@ 2017-07-09  8:09 ` Kai Huang
  2017-07-12 11:11   ` Andrew Cooper
  2017-07-09  8:09 ` [PATCH 03/15] xen: x86: add early stage SGX feature detection Kai Huang
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Kai Huang @ 2017-07-09  8:09 UTC (permalink / raw)
  To: xen-devel; +Cc: andrew.cooper3, kevin.tian, jbeulich

If ENCLS VMEXIT is not present then we cannot support SGX virtualization.
This patch detects the presence of ENCLS VMEXIT. A Xen boot boolean parameter
'sgx' is also added to manually enable/disable SGX.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 17 +++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmcs.h |  3 +++
 2 files changed, 20 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 8103b20d29..ae7e6f9321 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -98,6 +98,9 @@ static void __init parse_ept_param(char *s)
 }
 custom_param("ept", parse_ept_param);
 
+static bool_t __read_mostly opt_sgx_enabled = 1;
+boolean_param("sgx", opt_sgx_enabled);
+
 /* Dynamic (run-time adjusted) execution control flags. */
 u32 vmx_pin_based_exec_control __read_mostly;
 u32 vmx_cpu_based_exec_control __read_mostly;
@@ -138,6 +141,7 @@ static void __init vmx_display_features(void)
     P(cpu_has_vmx_virt_exceptions, "Virtualisation Exceptions");
     P(cpu_has_vmx_pml, "Page Modification Logging");
     P(cpu_has_vmx_tsc_scaling, "TSC Scaling");
+    P(cpu_has_vmx_encls, "SGX ENCLS Exiting");
 #undef P
 
     if ( !printed )
@@ -243,6 +247,8 @@ static int vmx_init_vmcs_config(void)
             opt |= SECONDARY_EXEC_UNRESTRICTED_GUEST;
         if ( opt_pml_enabled )
             opt |= SECONDARY_EXEC_ENABLE_PML;
+        if ( opt_sgx_enabled )
+            opt |= SECONDARY_EXEC_ENABLE_ENCLS;
 
         /*
          * "APIC Register Virtualization" and "Virtual Interrupt Delivery"
@@ -336,6 +342,14 @@ static int vmx_init_vmcs_config(void)
         _vmx_secondary_exec_control &= ~ SECONDARY_EXEC_PAUSE_LOOP_EXITING;
     }
 
+    /*
+     * Turn off SGX if ENCLS VMEXIT is not present. Actually on real machine,
+     * if SGX CPUID is present (CPUID.0x7.0x0:EBX.SGX = 1), then ENCLS VMEXIT
+     * will always be present. We do the check anyway here.
+     */
+    if ( !(_vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_ENCLS) )
+        opt_sgx_enabled = 0;
+
     min = VM_EXIT_ACK_INTR_ON_EXIT;
     opt = VM_EXIT_SAVE_GUEST_PAT | VM_EXIT_LOAD_HOST_PAT |
           VM_EXIT_CLEAR_BNDCFGS;
@@ -1146,6 +1160,9 @@ static int construct_vmcs(struct vcpu *v)
     /* Disable PML anyway here as it will only be enabled in log dirty mode */
     v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
 
+    /* Disable ENCLS VMEXIT. It will only be turned on when needed. */
+    v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_ENCLS;
+
     /* Host data selectors. */
     __vmwrite(HOST_SS_SELECTOR, __HYPERVISOR_DS);
     __vmwrite(HOST_DS_SELECTOR, __HYPERVISOR_DS);
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index e3cdfdf576..889091da42 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -232,6 +232,7 @@ extern u32 vmx_vmentry_control;
 #define SECONDARY_EXEC_ENABLE_INVPCID           0x00001000
 #define SECONDARY_EXEC_ENABLE_VM_FUNCTIONS      0x00002000
 #define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING    0x00004000
+#define SECONDARY_EXEC_ENABLE_ENCLS             0x00008000
 #define SECONDARY_EXEC_ENABLE_PML               0x00020000
 #define SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS   0x00040000
 #define SECONDARY_EXEC_XSAVES                   0x00100000
@@ -312,6 +313,8 @@ extern u64 vmx_ept_vpid_cap;
     (vmx_secondary_exec_control & SECONDARY_EXEC_XSAVES)
 #define cpu_has_vmx_tsc_scaling \
     (vmx_secondary_exec_control & SECONDARY_EXEC_TSC_SCALING)
+#define cpu_has_vmx_encls \
+    (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_ENCLS)
 
 #define VMCS_RID_TYPE_MASK              0x80000000
 
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 03/15] xen: x86: add early stage SGX feature detection
  2017-07-09  8:03 [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Kai Huang
  2017-07-09  8:04 ` [PATCH 01/15] xen: x86: expose SGX to HVM domain in CPU featureset Kai Huang
  2017-07-09  8:09 ` [PATCH 02/15] xen: vmx: detect ENCLS VMEXIT Kai Huang
@ 2017-07-09  8:09 ` Kai Huang
  2017-07-19 14:23   ` Andrew Cooper
  2017-07-09  8:09 ` [PATCH 06/15] xen: x86: add SGX basic EPC management Kai Huang
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Kai Huang @ 2017-07-09  8:09 UTC (permalink / raw)
  To: xen-devel; +Cc: andrew.cooper3, kevin.tian, jbeulich

This patch adds early-stage SGX feature detection via SGX CPUID leaf 0x12. A
function detect_sgx is added to detect SGX info on each CPU (called from
vmx_cpu_up). The SDM says the SGX info returned by CPUID is per-thread, and
we cannot assume all threads will return the same SGX info, so we have to
detect SGX on each CPU. For simplicity, SGX is currently only supported when
all CPUs report the same SGX info.

The SDM also says it's possible to have multiple EPC sections, but this is
only for multi-socket servers, which we don't support now (there are other
things that need to be done as well, e.g., NUMA EPC, scheduling, etc.), so
currently only one EPC section is supported.

Dedicated files sgx.c and sgx.h are added (under the vmx directory, as SGX is
Intel-specific) for the bulk of the above SGX detection code, and for further
SGX code as well.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/hvm/vmx/Makefile     |   1 +
 xen/arch/x86/hvm/vmx/sgx.c        | 208 ++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vmcs.c       |   4 +
 xen/include/asm-x86/cpufeature.h  |   1 +
 xen/include/asm-x86/hvm/vmx/sgx.h |  45 +++++++++
 5 files changed, 259 insertions(+)
 create mode 100644 xen/arch/x86/hvm/vmx/sgx.c
 create mode 100644 xen/include/asm-x86/hvm/vmx/sgx.h

diff --git a/xen/arch/x86/hvm/vmx/Makefile b/xen/arch/x86/hvm/vmx/Makefile
index 04a29ce59d..f6bcf0d143 100644
--- a/xen/arch/x86/hvm/vmx/Makefile
+++ b/xen/arch/x86/hvm/vmx/Makefile
@@ -4,3 +4,4 @@ obj-y += realmode.o
 obj-y += vmcs.o
 obj-y += vmx.o
 obj-y += vvmx.o
+obj-y += sgx.o
diff --git a/xen/arch/x86/hvm/vmx/sgx.c b/xen/arch/x86/hvm/vmx/sgx.c
new file mode 100644
index 0000000000..6b41469371
--- /dev/null
+++ b/xen/arch/x86/hvm/vmx/sgx.c
@@ -0,0 +1,208 @@
+/*
+ * Intel Software Guard Extensions support
+ *
+ * Author: Kai Huang <kai.huang@linux.intel.com>
+ */
+
+#include <asm/cpufeature.h>
+#include <asm/msr-index.h>
+#include <asm/msr.h>
+#include <asm/hvm/vmx/sgx.h>
+#include <asm/hvm/vmx/vmcs.h>
+
+static struct sgx_cpuinfo __read_mostly sgx_cpudata[NR_CPUS];
+static struct sgx_cpuinfo __read_mostly boot_sgx_cpudata;
+
+static bool_t sgx_enabled_in_bios(void)
+{
+    uint64_t val, sgx_enabled = IA32_FEATURE_CONTROL_SGX_ENABLE |
+                                IA32_FEATURE_CONTROL_LOCK;
+
+    rdmsrl(MSR_IA32_FEATURE_CONTROL, val);
+
+    return (val & sgx_enabled) == sgx_enabled;
+}
+
+static void __detect_sgx(int cpu)
+{
+    struct sgx_cpuinfo *sgxinfo = &sgx_cpudata[cpu];
+    u32 eax, ebx, ecx, edx;
+
+    memset(sgxinfo, 0, sizeof(*sgxinfo));
+
+    /*
+     * In reality if SGX is not enabled in BIOS, SGX CPUID should report
+     * invalid SGX info, but we do the check anyway to make sure.
+     */
+    if ( !sgx_enabled_in_bios() )
+    {
+        printk("CPU%d: SGX disabled in BIOS.\n", cpu);
+        goto not_supported;
+    }
+
+    /*
+     * CPUID.0x12.0x0:
+     *
+     *  EAX [0]:    whether SGX1 is supported.
+     *      [1]:    whether SGX2 is supported.
+     *  EBX [31:0]: miscselect
+     *  ECX [31:0]: reserved
+     *  EDX [7:0]:  MaxEnclaveSize_Not64
+     *      [15:8]: MaxEnclaveSize_64
+     */
+    cpuid_count(SGX_CPUID, 0x0, &eax, &ebx, &ecx, &edx);
+    sgxinfo->cap = eax & (SGX_CAP_SGX1 | SGX_CAP_SGX2);
+    sgxinfo->miscselect = ebx;
+    sgxinfo->max_enclave_size32 = edx & 0xff;
+    sgxinfo->max_enclave_size64 = (edx & 0xff00) >> 8;
+
+    if ( !(eax & SGX_CAP_SGX1) )
+    {
+        /* We may reach here if BIOS doesn't enable SGX */
+        printk("CPU%d: CPUID.0x12.0x0 reports no SGX support.\n", cpu);
+        goto not_supported;
+    }
+
+    /*
+     * CPUID.0x12.0x1:
+     *
+     *  EAX [31:0]: bitmask of 1-setting of SECS.ATTRIBUTES[31:0]
+     *  EBX [31:0]: bitmask of 1-setting of SECS.ATTRIBUTES[63:32]
+     *  ECX [31:0]: bitmask of 1-setting of SECS.ATTRIBUTES[95:64]
+     *  EDX [31:0]: bitmask of 1-setting of SECS.ATTRIBUTES[127:96]
+     */
+    cpuid_count(SGX_CPUID, 0x1, &eax, &ebx, &ecx, &edx);
+    sgxinfo->secs_attr_bitmask[0] = eax;
+    sgxinfo->secs_attr_bitmask[1] = ebx;
+    sgxinfo->secs_attr_bitmask[2] = ecx;
+    sgxinfo->secs_attr_bitmask[3] = edx;
+
+    /*
+     * CPUID.0x12.0x2:
+     *
+     *  EAX [3:0]:      0000: this sub-leaf is invalid
+     *                  0001: this sub-leaf enumerates EPC resource
+     *      [11:4]:     reserved
+     *      [31:12]:    bits 31:12 of physical address of EPC base (when
+     *                  EAX[3:0] is 0001, which applies to following)
+     *  EBX [19:0]:     bits 51:32 of physical address of EPC base
+     *      [31:20]:    reserved
+     *  ECX [3:0]:      0000: EDX:ECX are 0
+     *                  0001: this is EPC section.
+     *      [11:4]:     reserved
+     *      [31:12]:    bits 31:12 of EPC size
+     *  EDX [19:0]:     bits 51:32 of EPC size
+     *      [31:20]:    reserved
+     *
+     *  TODO: So far assume there's only one EPC resource.
+     */
+    cpuid_count(SGX_CPUID, 0x2, &eax, &ebx, &ecx, &edx);
+    if ( !(eax & 0x1) || !(ecx & 0x1) )
+    {
+        /* We may reach here if BIOS doesn't enable SGX */
+        printk("CPU%d: CPUID.0x12.0x2 reports invalid EPC resource.\n", cpu);
+        goto not_supported;
+    }
+    sgxinfo->epc_base = (((u64)(ebx & 0xfffff)) << 32) | (eax & 0xfffff000);
+    sgxinfo->epc_size = (((u64)(edx & 0xfffff)) << 32) | (ecx & 0xfffff000);
+
+    return;
+
+not_supported:
+    memset(sgxinfo, 0, sizeof(*sgxinfo));
+}
+
+void detect_sgx(int cpu)
+{
+    /* Caller (vmx_cpu_up) has checked cpu_has_vmx_encls */
+    if ( !cpu_has_sgx || boot_cpu_data.cpuid_level < SGX_CPUID )
+    {
+        setup_clear_cpu_cap(X86_FEATURE_SGX);
+        return;
+    }
+
+    __detect_sgx(cpu);
+}
+
+static void __init disable_sgx(void)
+{
+    memset(&boot_sgx_cpudata, 0, sizeof (struct sgx_cpuinfo));
+    /*
+     * X86_FEATURE_SGX is cleared in boot_cpu_data so that cpu_has_sgx
+     * can be used anywhere to check whether SGX is supported by Xen.
+     *
+     * FIXME: also adjust boot_cpu_data.cpuid_level ?
+     */
+    setup_clear_cpu_cap(X86_FEATURE_SGX);
+}
+
+static void __init print_sgx_cpuinfo(struct sgx_cpuinfo *sgxinfo)
+{
+    printk("SGX: \n"
+           "\tCAP: %s,%s\n"
+           "\tEPC: [0x%"PRIx64", 0x%"PRIx64")\n",
+           boot_sgx_cpudata.cap & SGX_CAP_SGX1 ? "SGX1" : "",
+           boot_sgx_cpudata.cap & SGX_CAP_SGX2 ? "SGX2" : "",
+           boot_sgx_cpudata.epc_base,
+           boot_sgx_cpudata.epc_base + boot_sgx_cpudata.epc_size);
+}
+
+/*
+ * Check SGX CPUID info for all CPUs, and only support SGX when all CPUs
+ * report the same SGX info. SDM (37.7.2 Intel SGX Resource Enumeration Leaves)
+ * says "software should not assume that if Intel SGX instructions are
+ * supported on one hardware thread, they are also supported elsewhere.".
+ * For simplicity, we only support SGX when all CPUs report consistent SGX
+ * info.
+ *
+ * boot_sgx_cpudata is set to store the *common* SGX CPUID info.
+ */
+static bool_t __init check_sgx_consistency(void)
+{
+    int i;
+
+    for_each_online_cpu ( i )
+    {
+        struct sgx_cpuinfo *s = &sgx_cpudata[i];
+
+        if ( memcmp(&boot_sgx_cpudata, s, sizeof (*s)) )
+        {
+            printk("SGX inconsistency between CPU 0 and CPU %d. "
+                    "Disable SGX.\n", i);
+            memset(&boot_sgx_cpudata, 0,  sizeof (*s));
+            return false;
+        }
+    }
+
+    return true;
+}
+
+static int __init sgx_init(void)
+{
+    /* Assume CPU 0 is always online */
+    boot_sgx_cpudata = sgx_cpudata[0];
+
+    if ( !(boot_sgx_cpudata.cap & SGX_CAP_SGX1) )
+        goto not_supported;
+
+    if ( !check_sgx_consistency() )
+        goto not_supported;
+
+    print_sgx_cpuinfo(&boot_sgx_cpudata);
+
+    return 0;
+not_supported:
+    disable_sgx();
+    return -EINVAL;
+}
+__initcall(sgx_init);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index ae7e6f9321..518133bbfd 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -40,6 +40,7 @@
 #include <asm/shadow.h>
 #include <asm/tboot.h>
 #include <asm/apic.h>
+#include <asm/hvm/vmx/sgx.h>
 
 static bool_t __read_mostly opt_vpid_enabled = 1;
 boolean_param("vpid", opt_vpid_enabled);
@@ -696,6 +697,9 @@ int vmx_cpu_up(void)
 
     vmx_pi_per_cpu_init(cpu);
 
+    if ( cpu_has_vmx_encls )
+        detect_sgx(cpu);
+
     return 0;
 }
 
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 84cc51d2bd..9793f8c1c5 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -85,6 +85,7 @@
 
 /* CPUID level 0x00000007:0.ebx */
 #define cpu_has_fsgsbase        boot_cpu_has(X86_FEATURE_FSGSBASE)
+#define cpu_has_sgx             boot_cpu_has(X86_FEATURE_SGX)
 #define cpu_has_bmi1            boot_cpu_has(X86_FEATURE_BMI1)
 #define cpu_has_hle             boot_cpu_has(X86_FEATURE_HLE)
 #define cpu_has_avx2            boot_cpu_has(X86_FEATURE_AVX2)
diff --git a/xen/include/asm-x86/hvm/vmx/sgx.h b/xen/include/asm-x86/hvm/vmx/sgx.h
new file mode 100644
index 0000000000..5414d8237e
--- /dev/null
+++ b/xen/include/asm-x86/hvm/vmx/sgx.h
@@ -0,0 +1,45 @@
+/*
+ * Intel Software Guard Extensions support
+ *
+ * Copyright (c) 2016, Intel Corporation.
+ *
+ * Author: Kai Huang <kai.huang@linux.intel.com>
+ */
+#ifndef __ASM_X86_HVM_VMX_SGX_H__
+#define __ASM_X86_HVM_VMX_SGX_H__
+
+#include <xen/config.h>
+#include <xen/types.h>
+#include <xen/init.h>
+#include <asm/processor.h>
+
+#define SGX_CPUID 0x12
+
+/*
+ * SGX info reported by SGX CPUID.
+ *
+ * TODO:
+ *
+ * SDM (37.7.2 Intel SGX Resource Enumeration Leaves) actually says it's
+ * possible there are multiple EPC resources on the machine (CPUID.0x12,
+ * ECX starting with 0x2 enumerates available EPC resources until invalid
+ * EPC resource is returned). But this is only for multiple socket server,
+ * which we currently don't support (there are additional things that need to
+ * be done as well). So far for simplicity we assume there is only one EPC.
+ */
+struct sgx_cpuinfo {
+#define SGX_CAP_SGX1    (1UL << 0)
+#define SGX_CAP_SGX2    (1UL << 1)
+    uint32_t cap;
+    uint32_t miscselect;
+    uint8_t max_enclave_size64;
+    uint8_t max_enclave_size32;
+    uint32_t secs_attr_bitmask[4];
+    uint64_t epc_base;
+    uint64_t epc_size;
+};
+
+/* Detect SGX info for particular CPU via SGX CPUID */
+void detect_sgx(int cpu);
+
+#endif  /* __ASM_X86_HVM_VMX_SGX_H__ */
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 06/15] xen: x86: add SGX basic EPC management
  2017-07-09  8:03 [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Kai Huang
                   ` (2 preceding siblings ...)
  2017-07-09  8:09 ` [PATCH 03/15] xen: x86: add early stage SGX feature detection Kai Huang
@ 2017-07-09  8:09 ` Kai Huang
  2017-07-09  8:09 ` [PATCH 07/15] xen: x86: add functions to populate and destroy EPC for domain Kai Huang
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Kai Huang @ 2017-07-09  8:09 UTC (permalink / raw)
  To: xen-devel; +Cc: andrew.cooper3, kevin.tian, jbeulich

EPC is a limited resource reserved by the BIOS. Typically the EPC size ranges
from dozens of MB to more than a hundred MB. EPC is reported as reserved
memory in e820, not as normal memory. EPC must be managed in 4K pages.

From an implementation point of view, we can choose either to manage EPC
separately, or to extend the existing memory management code to support EPC.
The latter has the advantage of reusing the existing memory management
algorithms, but is more complicated to implement (thus more risky), while the
former is simpler but requires its own EPC management algorithm. Currently we
choose the former. Given that the EPC size is small, we simply put all EPC
pages into a single list, so allocation and freeing are very straightforward.

Like the 'struct page_info' that exists for each memory page, a 'struct
epc_page' is added to represent the status of each EPC page, and all 'struct
epc_page' entries are kept in an array allocated during SGX initialization.
The entire EPC is also mapped into Xen's virtual address space, so that each
EPC page's virtual address can be calculated as base virtual address +
offset.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/hvm/vmx/sgx.c        | 154 ++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/sgx.h |  19 +++++
 2 files changed, 173 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/sgx.c b/xen/arch/x86/hvm/vmx/sgx.c
index 6b41469371..f4c9b2f933 100644
--- a/xen/arch/x86/hvm/vmx/sgx.c
+++ b/xen/arch/x86/hvm/vmx/sgx.c
@@ -7,12 +7,89 @@
 #include <asm/cpufeature.h>
 #include <asm/msr-index.h>
 #include <asm/msr.h>
+#include <xen/errno.h>
+#include <xen/mm.h>
 #include <asm/hvm/vmx/sgx.h>
 #include <asm/hvm/vmx/vmcs.h>
 
 static struct sgx_cpuinfo __read_mostly sgx_cpudata[NR_CPUS];
 static struct sgx_cpuinfo __read_mostly boot_sgx_cpudata;
 
+/*
+ * epc_frametable is an array with one struct epc_page per EPC page, so that
+ * epc_page_to_mfn() and epc_mfn_to_page() are simple pointer arithmetic. The
+ * array is allocated dynamically according to the machine's EPC size.
+ */
+static struct epc_page *epc_frametable = NULL;
+/*
+ * The entire EPC is mapped into Xen's virtual address space up front, so
+ * that each EPC page's virtual address is epc_base_vaddr + offset.
+ */
+static void *epc_base_vaddr = NULL;
+
+/* Global free EPC pages list. */
+static struct list_head free_epc_list;
+static spinlock_t epc_lock;
+
+#define total_epc_npages (boot_sgx_cpudata.epc_size >> PAGE_SHIFT)
+#define epc_base_mfn (boot_sgx_cpudata.epc_base >> PAGE_SHIFT)
+
+/* Current number of free EPC pages in free_epc_list */
+static unsigned long free_epc_npages = 0;
+
+unsigned long epc_page_to_mfn(struct epc_page *epg)
+{
+    BUG_ON(!epc_frametable);
+    BUG_ON(!epc_base_mfn);
+
+    return epc_base_mfn + (epg - epc_frametable);
+}
+
+struct epc_page *epc_mfn_to_page(unsigned long mfn)
+{
+    BUG_ON(!epc_frametable);
+    BUG_ON(!epc_base_mfn);
+
+    return epc_frametable + (mfn - epc_base_mfn);
+}
+
+struct epc_page *alloc_epc_page(void)
+{
+    struct epc_page *epg;
+
+    spin_lock(&epc_lock);
+    epg = list_first_entry_or_null(&free_epc_list, struct epc_page, list);
+    if ( epg ) {
+        list_del(&epg->list);
+        free_epc_npages--;
+    }
+    spin_unlock(&epc_lock);
+
+    return epg;
+}
+
+void free_epc_page(struct epc_page *epg)
+{
+    spin_lock(&epc_lock);
+    list_add_tail(&epg->list, &free_epc_list);
+    free_epc_npages++;
+    spin_unlock(&epc_lock);
+}
+
+void *map_epc_page_to_xen(struct epc_page *epg)
+{
+    BUG_ON(!epc_base_vaddr);
+    BUG_ON(!epc_frametable);
+
+    return (void *)(((unsigned long)(epc_base_vaddr)) +
+            ((epg - epc_frametable) << PAGE_SHIFT));
+}
+
+void unmap_epc_page(void *addr)
+{
+    /* Nothing */
+}
+
 static bool_t sgx_enabled_in_bios(void)
 {
     uint64_t val, sgx_enabled = IA32_FEATURE_CONTROL_SGX_ENABLE |
@@ -177,6 +254,80 @@ static bool_t __init check_sgx_consistency(void)
     return true;
 }
 
+static inline int npages_to_order(unsigned long npages)
+{
+    int order = 0;
+
+    while ( (1 << order) < npages )
+        order++;
+
+    return order;
+}
+
+static int __init init_epc_frametable(unsigned long npages)
+{
+    unsigned long i, order;
+
+    order = npages * sizeof(struct epc_page);
+    order = (order + PAGE_SIZE - 1) >> PAGE_SHIFT;
+    order = npages_to_order(order);
+
+    epc_frametable = alloc_xenheap_pages(order, 0);
+    if ( !epc_frametable )
+        return -ENOMEM;
+
+    for ( i = 0; i < npages; i++ )
+    {
+        struct epc_page *epg = epc_frametable + i;
+
+        list_add_tail(&epg->list, &free_epc_list);
+    }
+
+    return 0;
+}
+
+static void destroy_epc_frametable(unsigned long npages)
+{
+    unsigned long order;
+
+    if ( !epc_frametable )
+        return;
+
+    order = npages * sizeof(struct epc_page);
+    order = (order + PAGE_SIZE - 1) >> PAGE_SHIFT;
+    order = npages_to_order(order);
+
+    free_xenheap_pages(epc_frametable, order);
+}
+
+static int __init sgx_init_epc(void)
+{
+    int r;
+
+    INIT_LIST_HEAD(&free_epc_list);
+    spin_lock_init(&epc_lock);
+
+    r = init_epc_frametable(total_epc_npages);
+    if ( r )
+    {
+        printk("Failed to allocate EPC frametable. Disable SGX.\n");
+        return r;
+    }
+
+    epc_base_vaddr = ioremap_cache(epc_base_mfn << PAGE_SHIFT,
+            total_epc_npages << PAGE_SHIFT);
+    if ( !epc_base_vaddr )
+    {
+        printk("Failed to ioremap_cache EPC. Disable SGX.\n");
+        destroy_epc_frametable(total_epc_npages);
+        return -EFAULT;
+    }
+
+    free_epc_npages = total_epc_npages;
+
+    return 0;
+}
+
 static int __init sgx_init(void)
 {
     /* Assume CPU 0 is always online */
@@ -188,6 +339,9 @@ static int __init sgx_init(void)
     if ( !check_sgx_consistency() )
         goto not_supported;
 
+    if ( sgx_init_epc() )
+        goto not_supported;
+
     print_sgx_cpuinfo(&boot_sgx_cpudata);
 
     return 0;
diff --git a/xen/include/asm-x86/hvm/vmx/sgx.h b/xen/include/asm-x86/hvm/vmx/sgx.h
index 5414d8237e..ff420e006e 100644
--- a/xen/include/asm-x86/hvm/vmx/sgx.h
+++ b/xen/include/asm-x86/hvm/vmx/sgx.h
@@ -12,6 +12,7 @@
 #include <xen/types.h>
 #include <xen/init.h>
 #include <asm/processor.h>
+#include <xen/list.h>
 
 #define SGX_CPUID 0x12
 
@@ -42,4 +43,22 @@ struct sgx_cpuinfo {
 /* Detect SGX info for particular CPU via SGX CPUID */
 void detect_sgx(int cpu);
 
+/*
+ * EPC page information structure. Each EPC page has one struct epc_page to
+ * keep its state, just like struct page_info for normal memory.
+ *
+ * In practice machines' EPC sizes do not exceed 100MB so far, so currently
+ * all free EPC pages are simply kept in a single global free list.
+ */
+struct epc_page {
+    struct list_head list;  /* all free EPC pages are in global free list. */
+};
+
+struct epc_page *alloc_epc_page(void);
+void free_epc_page(struct epc_page *epg);
+unsigned long epc_page_to_mfn(struct epc_page *epg);
+struct epc_page *epc_mfn_to_page(unsigned long mfn);
+void *map_epc_page_to_xen(struct epc_page *epg);
+void unmap_epc_page(void *addr);
+
 #endif  /* __ASM_X86_HVM_VMX_SGX_H__ */
-- 
2.11.0




* [PATCH 07/15] xen: x86: add functions to populate and destroy EPC for domain
  2017-07-09  8:03 [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Kai Huang
                   ` (3 preceding siblings ...)
  2017-07-09  8:09 ` [PATCH 06/15] xen: x86: add SGX basic EPC management Kai Huang
@ 2017-07-09  8:09 ` Kai Huang
  2017-07-09  8:09 ` [PATCH 09/15] xen: vmx: handle SGX related MSRs Kai Huang
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Kai Huang @ 2017-07-09  8:09 UTC (permalink / raw)
  To: xen-devel; +Cc: andrew.cooper3, kevin.tian, jbeulich

Add a per-domain structure to store per-domain SGX info. Currently only the
domain's EPC base and size are stored. Also add new functions for later use:
    - hvm_populate_epc  # populate EPC when EPC base & size are notified.
    - hvm_reset_epc     # reset the domain's EPC to be invalid; used when the
                          domain enters S3-S5, or is destroyed.
    - hvm_destroy_epc   # destroy and free the domain's EPC.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/hvm/vmx/sgx.c         | 315 +++++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vmx.c         |   3 +
 xen/include/asm-x86/hvm/vmx/sgx.h  |  14 ++
 xen/include/asm-x86/hvm/vmx/vmcs.h |   2 +
 4 files changed, 334 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/sgx.c b/xen/arch/x86/hvm/vmx/sgx.c
index f4c9b2f933..14379151e8 100644
--- a/xen/arch/x86/hvm/vmx/sgx.c
+++ b/xen/arch/x86/hvm/vmx/sgx.c
@@ -9,6 +9,8 @@
 #include <asm/msr.h>
 #include <xen/errno.h>
 #include <xen/mm.h>
+#include <xen/sched.h>
+#include <asm/p2m.h>
 #include <asm/hvm/vmx/sgx.h>
 #include <asm/hvm/vmx/vmcs.h>
 
@@ -90,6 +92,319 @@ void unmap_epc_page(void *addr)
     /* Nothing */
 }
 
+/* ENCLS opcode */
+#define ENCLS   .byte 0x0f, 0x01, 0xcf
+
+/*
+ * ENCLS leaf functions
+ *
+ * Currently only EREMOVE is needed.
+ */
+enum {
+    ECREATE = 0x0,
+    EADD    = 0x1,
+    EINIT   = 0x2,
+    EREMOVE = 0x3,
+    EDGBRD  = 0x4,
+    EDGBWR  = 0x5,
+    EEXTEND = 0x6,
+    ELDU    = 0x8,
+    EBLOCK  = 0x9,
+    EPA     = 0xA,
+    EWB     = 0xB,
+    ETRACK  = 0xC,
+    EAUG    = 0xD,
+    EMODPR  = 0xE,
+    EMODT   = 0xF,
+};
+
+/*
+ * ENCLS error code
+ *
+ * Currently we only need SGX_CHILD_PRESENT
+ */
+#define SGX_CHILD_PRESENT   13
+
+static inline int __encls(unsigned long rax, unsigned long rbx,
+                          unsigned long rcx, unsigned long rdx)
+{
+    int ret;
+
+    asm volatile ( "ENCLS;\n\t"
+            : "=a" (ret)
+            : "a" (rax), "b" (rbx), "c" (rcx), "d" (rdx)
+            : "memory", "cc");
+
+    return ret;
+}
+
+static inline int __eremove(void *epc)
+{
+    unsigned long rbx = 0, rdx = 0;
+
+    return __encls(EREMOVE, rbx, (unsigned long)epc, rdx);
+}
+
+static int sgx_eremove(struct epc_page *epg)
+{
+    void *addr = map_epc_page_to_xen(epg);
+    int ret;
+
+    BUG_ON(!addr);
+
+    ret = __eremove(addr);
+
+    unmap_epc_page(addr);
+
+    return ret;
+}
+
+/*
+ * Reset domain's EPC with EREMOVE. free_epc indicates whether to free the
+ * EPC pages during the reset. This is called when the domain goes into an
+ * S3-S5 state (with free_epc being false), and when the domain is destroyed
+ * (with free_epc being true).
+ *
+ * EREMOVE may be called for a SECS page while it still has children present,
+ * in which case SGX_CHILD_PRESENT is returned. Such SECS pages are kept on a
+ * temporary list, and after EREMOVE has been called for all other EPC pages,
+ * EREMOVE is called again for the SECS pages. This time SGX_CHILD_PRESENT
+ * can no longer occur, as all children have been removed.
+ *
+ * If EREMOVE returns an unexpected error, the EPC page is in an abnormal
+ * state, so it is not freed even when free_epc is true, as further use of
+ * such a page can cause unexpected errors, potentially damaging other
+ * domains.
+ */
+static int __hvm_reset_epc(struct domain *d, unsigned long epc_base_pfn,
+        unsigned long epc_npages, bool_t free_epc)
+{
+    struct list_head secs_list;
+    struct list_head *p, *tmp;
+    unsigned long i;
+    int ret = 0;
+
+    INIT_LIST_HEAD(&secs_list);
+
+    for ( i = 0; i < epc_npages; i++ )
+    {
+        struct epc_page *epg;
+        unsigned long gfn;
+        mfn_t mfn;
+        p2m_type_t t;
+        int r;
+
+        gfn = i + epc_base_pfn;
+        mfn = get_gfn_query(d, gfn, &t);
+        if ( unlikely(mfn_eq(mfn, INVALID_MFN)) )
+        {
+            printk("Domain %d: Reset EPC error: invalid MFN for gfn 0x%lx\n",
+                    d->domain_id, gfn);
+            put_gfn(d, gfn);
+            ret = -EFAULT;
+            continue;
+        }
+
+        if ( unlikely(!p2m_is_epc(t)) )
+        {
+            printk("Domain %d: Reset EPC error: (gfn 0x%lx, mfn 0x%lx): "
+                    "is not p2m_epc.\n", d->domain_id, gfn, mfn_x(mfn));
+            put_gfn(d, gfn);
+            ret = -EFAULT;
+            continue;
+        }
+
+        put_gfn(d, gfn);
+
+        epg = epc_mfn_to_page(mfn_x(mfn));
+
+        /* EREMOVE the EPC page to make it invalid */
+        r = sgx_eremove(epg);
+        if ( r == SGX_CHILD_PRESENT )
+        {
+            list_add_tail(&epg->list, &secs_list);
+            continue;
+        }
+
+        if ( r )
+        {
+            printk("Domain %d: Reset EPC error: (gfn 0x%lx, mfn 0x%lx): "
+                    "EREMOVE returns %d\n", d->domain_id, gfn, mfn_x(mfn), r);
+            ret = r;
+            if ( free_epc )
+                printk("WARNING: EPC (mfn 0x%lx) becomes abnormal. "
+                        "Remove it from useable EPC.", mfn_x(mfn));
+            continue;
+        }
+
+        if ( free_epc )
+        {
+            /* If EPC page is going to be freed, then also remove the mapping */
+            if ( clear_epc_p2m_entry(d, gfn, mfn) )
+            {
+                printk("Domain %d: Reset EPC error: (gfn 0x%lx, mfn 0x%lx): "
+                        "clear p2m entry failed.\n", d->domain_id, gfn,
+                        mfn_x(mfn));
+                ret = -EFAULT;
+            }
+            free_epc_page(epg);
+        }
+    }
+
+    list_for_each_safe(p, tmp, &secs_list)
+    {
+        struct epc_page *epg = list_entry(p, struct epc_page, list);
+        int r;
+
+        r = sgx_eremove(epg);
+        if ( r )
+        {
+            printk("Domain %d: Reset EPC error: mfn 0x%lx: "
+                    "EREMOVE returns %d for SECS page\n",
+                    d->domain_id, epc_page_to_mfn(epg), r);
+            ret = r;
+            list_del(p);
+
+            if ( free_epc )
+                printk("WARNING: EPC (mfn 0x%lx) becomes abnormal. "
+                        "Remove it from useable EPC.",
+                        epc_page_to_mfn(epg));
+            continue;
+        }
+
+        if ( free_epc )
+            free_epc_page(epg);
+    }
+
+    return ret;
+}
+
+static void __hvm_unpopulate_epc(struct domain *d, unsigned long epc_base_pfn,
+        unsigned long populated_npages)
+{
+    unsigned long i;
+
+    for ( i = 0; i < populated_npages; i++ )
+    {
+        struct epc_page *epg;
+        unsigned long gfn;
+        mfn_t mfn;
+        p2m_type_t t;
+
+        gfn = i + epc_base_pfn;
+        mfn = get_gfn_query(d, gfn, &t);
+        if ( unlikely(mfn_eq(mfn, INVALID_MFN)) )
+        {
+            /*
+             * __hvm_unpopulate_epc only called when creating the domain on
+             * failure, therefore we can just ignore this error.
+             */
+            printk("%s: Domain %u gfn 0x%lx returns invalid mfn\n", __func__,
+                    d->domain_id, gfn);
+            put_gfn(d, gfn);
+            continue;
+        }
+
+        if ( unlikely(!p2m_is_epc(t)) )
+        {
+            printk("%s: Domain %u gfn 0x%lx returns non-EPC p2m type: %d\n",
+                    __func__, d->domain_id, gfn, (int)t);
+            put_gfn(d, gfn);
+            continue;
+        }
+
+        put_gfn(d, gfn);
+
+        if ( clear_epc_p2m_entry(d, gfn, mfn) )
+        {
+            printk("clear_epc_p2m_entry failed: gfn 0x%lx, mfn 0x%lx\n",
+                    gfn, mfn_x(mfn));
+            continue;
+        }
+
+        epg = epc_mfn_to_page(mfn_x(mfn));
+        free_epc_page(epg);
+    }
+}
+
+static int __hvm_populate_epc(struct domain *d, unsigned long epc_base_pfn,
+        unsigned long epc_npages)
+{
+    unsigned long i;
+    int ret;
+
+    for ( i = 0; i < epc_npages; i++ )
+    {
+        struct epc_page *epg = alloc_epc_page();
+        unsigned long mfn;
+
+        if ( !epg )
+        {
+            printk("%s: Out of EPC\n", __func__);
+            ret = -ENOMEM;
+            goto err;
+        }
+
+        mfn = epc_page_to_mfn(epg);
+        ret = set_epc_p2m_entry(d, i + epc_base_pfn, _mfn(mfn));
+        if ( ret )
+        {
+            printk("%s: set_epc_p2m_entry failed with %d: gfn 0x%lx, "
+                    "mfn 0x%lx\n", __func__, ret, i + epc_base_pfn, mfn);
+            free_epc_page(epg);
+            goto err;
+        }
+    }
+
+    return 0;
+
+err:
+    __hvm_unpopulate_epc(d, epc_base_pfn, i);
+    return ret;
+}
+
+int hvm_populate_epc(struct domain *d, unsigned long epc_base_pfn,
+        unsigned long epc_npages)
+{
+    struct sgx_domain *sgx = to_sgx(d);
+    int ret;
+
+    if ( hvm_epc_populated(d) )
+        return -EBUSY;
+
+    if ( !epc_base_pfn || !epc_npages )
+        return -EINVAL;
+
+    if ( (ret = __hvm_populate_epc(d, epc_base_pfn, epc_npages)) )
+        return ret;
+
+    sgx->epc_base_pfn = epc_base_pfn;
+    sgx->epc_npages = epc_npages;
+
+    return 0;
+}
+
+/*
+ * Reset the domain's EPC with EREMOVE, optionally freeing the EPC pages.
+ *
+ * This function returns an error immediately if any unexpected error
+ * occurs during the process.
+ */
+int hvm_reset_epc(struct domain *d, bool_t free_epc)
+{
+    struct sgx_domain *sgx = to_sgx(d);
+
+    if ( !hvm_epc_populated(d) )
+        return 0;
+
+    return __hvm_reset_epc(d, sgx->epc_base_pfn, sgx->epc_npages, free_epc);
+}
+
+void hvm_destroy_epc(struct domain *d)
+{
+    hvm_reset_epc(d, true);
+}
+
 static bool_t sgx_enabled_in_bios(void)
 {
     uint64_t val, sgx_enabled = IA32_FEATURE_CONTROL_SGX_ENABLE |
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c53b24955a..243643111d 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -416,6 +416,9 @@ static int vmx_domain_initialise(struct domain *d)
 
 static void vmx_domain_destroy(struct domain *d)
 {
+    if ( hvm_epc_populated(d) )
+        hvm_destroy_epc(d);
+
     if ( !has_vlapic(d) )
         return;
 
diff --git a/xen/include/asm-x86/hvm/vmx/sgx.h b/xen/include/asm-x86/hvm/vmx/sgx.h
index ff420e006e..40f860662a 100644
--- a/xen/include/asm-x86/hvm/vmx/sgx.h
+++ b/xen/include/asm-x86/hvm/vmx/sgx.h
@@ -13,6 +13,7 @@
 #include <xen/init.h>
 #include <asm/processor.h>
 #include <xen/list.h>
+#include <public/hvm/params.h>   /* HVM_PARAM_SGX */
 
 #define SGX_CPUID 0x12
 
@@ -61,4 +62,17 @@ struct epc_page *epc_mfn_to_page(unsigned long mfn);
 void *map_epc_page_to_xen(struct epc_page *epg);
 void unmap_epc_page(void *addr);
 
+struct sgx_domain {
+    unsigned long epc_base_pfn;
+    unsigned long epc_npages;
+};
+
+#define to_sgx(d)   (&((d)->arch.hvm_domain.vmx.sgx))
+#define hvm_epc_populated(d)  (!!((d)->arch.hvm_domain.vmx.sgx.epc_base_pfn))
+
+int hvm_populate_epc(struct domain *d, unsigned long epc_base_pfn,
+        unsigned long epc_npages);
+int hvm_reset_epc(struct domain *d, bool_t free_epc);
+void hvm_destroy_epc(struct domain *d);
+
 #endif  /* __ASM_X86_HVM_VMX_SGX_H__ */
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 889091da42..6cfa5c3310 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -20,6 +20,7 @@
 
 #include <asm/hvm/io.h>
 #include <irq_vectors.h>
+#include <asm/hvm/vmx/sgx.h>
 
 extern void vmcs_dump_vcpu(struct vcpu *v);
 extern void setup_vmcs_dump(void);
@@ -62,6 +63,7 @@ struct vmx_domain {
     unsigned long apic_access_mfn;
     /* VMX_DOMAIN_* */
     unsigned int status;
+    struct sgx_domain sgx;
 };
 
 struct pi_desc {
-- 
2.11.0




* [PATCH 09/15] xen: vmx: handle SGX related MSRs
  2017-07-09  8:03 [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Kai Huang
                   ` (4 preceding siblings ...)
  2017-07-09  8:09 ` [PATCH 07/15] xen: x86: add functions to populate and destroy EPC for domain Kai Huang
@ 2017-07-09  8:09 ` Kai Huang
  2017-07-19 17:27   ` Andrew Cooper
  2017-07-09  8:09 ` [PATCH 10/15] xen: vmx: handle ENCLS VMEXIT Kai Huang
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Kai Huang @ 2017-07-09  8:09 UTC (permalink / raw)
  To: xen-devel; +Cc: andrew.cooper3, kevin.tian, jbeulich

This patch handles IA32_FEATURE_CONTROL and IA32_SGXLEPUBKEYHASHn MSRs.

For IA32_FEATURE_CONTROL, if SGX is exposed to the domain, the SGX_ENABLE bit
is always set. If SGX launch control is also exposed to the domain, and the
physical IA32_SGXLEPUBKEYHASHn are writable, the SGX_LAUNCH_CONTROL_ENABLE bit
is also always set. Writes to IA32_FEATURE_CONTROL are ignored.

For IA32_SGXLEPUBKEYHASHn, a new 'struct sgx_vcpu' is added to hold per-vcpu
SGX state; currently it contains the vcpu's virtual ia32_sgxlepubkeyhash[0-3].
Two booleans, 'readable' and 'writable', are also added to indicate whether
the virtual IA32_SGXLEPUBKEYHASHn are readable and writable.

The virtual ia32_sgxlepubkeyhash values are initialized along with the vcpu.
If the physical IA32_SGXLEPUBKEYHASHn are writable, the virtual values are set
to Intel's default value, as on a physical machine those MSRs hold Intel's
default value after reset. If the physical MSRs are not writable (i.e. *locked*
by the BIOS before handing over to Xen), we try to read them and use the
physical values as the defaults for the virtual MSRs. Note that rdmsr_safe is
used here: although the SDM says IA32_SGXLEPUBKEYHASHn are available for read
whenever SGX is present, in reality Skylake client machines (at least some,
depending on BIOS) do not expose those MSRs, so we use rdmsr_safe and set
'readable' to false if it returns an error.

For guest reads of IA32_SGXLEPUBKEYHASHn: if the physical MSRs are not
readable, the guest is not allowed to read them either; otherwise the vcpu's
virtual MSR values are returned.

For guest writes of IA32_SGXLEPUBKEYHASHn: the write is allowed only if the
physical MSRs are writable and SGX launch control is exposed to the domain;
otherwise an error is injected.

To make EINIT run successfully in the guest, the vcpu's virtual
IA32_SGXLEPUBKEYHASHn are written to the physical MSRs when the vcpu is
scheduled in.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/hvm/vmx/sgx.c         | 194 +++++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vmx.c         |  24 +++++
 xen/include/asm-x86/cpufeature.h   |   3 +
 xen/include/asm-x86/hvm/vmx/sgx.h  |  22 +++++
 xen/include/asm-x86/hvm/vmx/vmcs.h |   2 +
 xen/include/asm-x86/msr-index.h    |   6 ++
 6 files changed, 251 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/sgx.c b/xen/arch/x86/hvm/vmx/sgx.c
index 14379151e8..4944e57aef 100644
--- a/xen/arch/x86/hvm/vmx/sgx.c
+++ b/xen/arch/x86/hvm/vmx/sgx.c
@@ -405,6 +405,200 @@ void hvm_destroy_epc(struct domain *d)
     hvm_reset_epc(d, true);
 }
 
+/* Whether IA32_SGXLEPUBKEYHASHn are physically *unlocked* by BIOS */
+bool_t sgx_ia32_sgxlepubkeyhash_writable(void)
+{
+    uint64_t sgx_lc_enabled = IA32_FEATURE_CONTROL_SGX_ENABLE |
+                              IA32_FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE |
+                              IA32_FEATURE_CONTROL_LOCK;
+    uint64_t val;
+
+    rdmsrl(MSR_IA32_FEATURE_CONTROL, val);
+
+    return (val & sgx_lc_enabled) == sgx_lc_enabled;
+}
+
+bool_t domain_has_sgx(struct domain *d)
+{
+    /* hvm_epc_populated(d) implies CPUID has SGX */
+    return hvm_epc_populated(d);
+}
+
+bool_t domain_has_sgx_launch_control(struct domain *d)
+{
+    struct cpuid_policy *p = d->arch.cpuid;
+
+    if ( !domain_has_sgx(d) )
+        return false;
+
+    /* Unnecessary but check anyway */
+    if ( !cpu_has_sgx_launch_control )
+        return false;
+
+    return !!p->feat.sgx_launch_control;
+}
+
+/* Digest of Intel signing key. MSR's default value after reset. */
+#define SGX_INTEL_DEFAULT_LEPUBKEYHASH0 0xa6053e051270b7ac
+#define SGX_INTEL_DEFAULT_LEPUBKEYHASH1 0x6cfbe8ba8b3b413d
+#define SGX_INTEL_DEFAULT_LEPUBKEYHASH2 0xc4916d99f2b3735d
+#define SGX_INTEL_DEFAULT_LEPUBKEYHASH3 0xd4f8c05909f9bb3b
+
+void sgx_vcpu_init(struct vcpu *v)
+{
+    struct sgx_vcpu *sgxv = to_sgx_vcpu(v);
+
+    memset(sgxv, 0, sizeof (*sgxv));
+
+    if ( sgx_ia32_sgxlepubkeyhash_writable() )
+    {
+        /*
+         * If the physical MSRs are writable, set the vcpu's defaults to
+         * Intel's default values, since on a real machine the MSRs contain
+         * Intel's default values after reset.
+         */
+        sgxv->ia32_sgxlepubkeyhash[0] = SGX_INTEL_DEFAULT_LEPUBKEYHASH0;
+        sgxv->ia32_sgxlepubkeyhash[1] = SGX_INTEL_DEFAULT_LEPUBKEYHASH1;
+        sgxv->ia32_sgxlepubkeyhash[2] = SGX_INTEL_DEFAULT_LEPUBKEYHASH2;
+        sgxv->ia32_sgxlepubkeyhash[3] = SGX_INTEL_DEFAULT_LEPUBKEYHASH3;
+
+        sgxv->readable = 1;
+        sgxv->writable = domain_has_sgx_launch_control(v->domain);
+    }
+    else
+    {
+        uint64_t v;
+        /*
+         * Although the SDM says IA32_SGXLEPUBKEYHASHn are available for read
+         * whenever SGX is present, in reality Skylake client machines (at
+         * least some, depending on BIOS) do not expose these MSRs, so we
+         * cannot rely on cpu_has_sgx to determine whether the MSRs can be
+         * read; instead we always use rdmsr_safe.
+         */
+        sgxv->readable = rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH0, v) ? 0 : 1;
+
+        if ( !sgxv->readable )
+            return;
+
+        rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH0, sgxv->ia32_sgxlepubkeyhash[0]);
+        rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH1, sgxv->ia32_sgxlepubkeyhash[1]);
+        rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH2, sgxv->ia32_sgxlepubkeyhash[2]);
+        rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH3, sgxv->ia32_sgxlepubkeyhash[3]);
+    }
+}
+
+void sgx_ctxt_switch_to(struct vcpu *v)
+{
+    struct sgx_vcpu *sgxv = to_sgx_vcpu(v);
+
+    if ( sgxv->writable && sgx_ia32_sgxlepubkeyhash_writable() )
+    {
+        wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0, sgxv->ia32_sgxlepubkeyhash[0]);
+        wrmsrl(MSR_IA32_SGXLEPUBKEYHASH1, sgxv->ia32_sgxlepubkeyhash[1]);
+        wrmsrl(MSR_IA32_SGXLEPUBKEYHASH2, sgxv->ia32_sgxlepubkeyhash[2]);
+        wrmsrl(MSR_IA32_SGXLEPUBKEYHASH3, sgxv->ia32_sgxlepubkeyhash[3]);
+    }
+}
+
+int sgx_msr_read_intercept(struct vcpu *v, unsigned int msr, u64 *msr_content)
+{
+    struct sgx_vcpu *sgxv = to_sgx_vcpu(v);
+    u64 data;
+    int r = 1;
+
+    if ( !domain_has_sgx(v->domain) )
+        return 0;
+
+    switch ( msr )
+    {
+    case MSR_IA32_FEATURE_CONTROL:
+        data = (IA32_FEATURE_CONTROL_LOCK |
+                IA32_FEATURE_CONTROL_SGX_ENABLE);
+        /*
+         * If physical IA32_SGXLEPUBKEYHASHn are writable, then we always
+         * allow guest to be able to change IA32_SGXLEPUBKEYHASHn at runtime.
+         */
+        if ( sgx_ia32_sgxlepubkeyhash_writable() &&
+                domain_has_sgx_launch_control(v->domain) )
+            data |= IA32_FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE;
+
+        *msr_content = data;
+
+        break;
+    case MSR_IA32_SGXLEPUBKEYHASH0...MSR_IA32_SGXLEPUBKEYHASH3:
+        /*
+         * SDM 35.1 Model-Specific Registers, table 35-2.
+         *
+         * IA32_SGXLEPUBKEYHASH[0..3]:
+         *
+         * Read permitted if CPUID.0x12.0:EAX[0] = 1.
+         *
+         * In reality, the MSRs may not be readable even if SGX is present,
+         * in which case the guest is not allowed to read them either.
+         */
+        if ( !sgxv->readable )
+        {
+            r = 0;
+            break;
+        }
+
+        data = sgxv->ia32_sgxlepubkeyhash[msr - MSR_IA32_SGXLEPUBKEYHASH0];
+
+        *msr_content = data;
+
+        break;
+    default:
+        r = 0;
+        break;
+    }
+
+    return r;
+}
+
+int sgx_msr_write_intercept(struct vcpu *v, unsigned int msr, u64 msr_content)
+{
+    struct sgx_vcpu *sgxv = to_sgx_vcpu(v);
+    int r = 1;
+
+    if ( !domain_has_sgx(v->domain) )
+        return 0;
+
+    switch ( msr )
+    {
+    case MSR_IA32_FEATURE_CONTROL:
+        /* silently drop */
+        break;
+    case MSR_IA32_SGXLEPUBKEYHASH0...MSR_IA32_SGXLEPUBKEYHASH3:
+        /*
+         * SDM 35.1 Model-Specific Registers, table 35-2.
+         *
+         * IA32_SGXLEPUBKEYHASH[0..3]:
+         *
+         * - If CPUID.0x7.0:ECX[30] = 1, FEATURE_CONTROL[17] is available.
+         * - Write permitted if CPUID.0x12.0:EAX[0] = 1 &&
+         *      FEATURE_CONTROL[17] = 1 && FEATURE_CONTROL[0] = 1.
+         *
+         * sgxv->writable == 1 means that sgx_ia32_sgxlepubkeyhash_writable()
+         * and domain_has_sgx_launch_control(d) are both true.
+         */
+        if ( !sgxv->writable )
+        {
+            r = 0;
+            break;
+        }
+
+        sgxv->ia32_sgxlepubkeyhash[msr - MSR_IA32_SGXLEPUBKEYHASH0] =
+            msr_content;
+
+        break;
+    default:
+        r = 0;
+        break;
+    }
+
+    return r;
+}
+
 static bool_t sgx_enabled_in_bios(void)
 {
     uint64_t val, sgx_enabled = IA32_FEATURE_CONTROL_SGX_ENABLE |
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 243643111d..7ee5515bdc 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -470,6 +470,8 @@ static int vmx_vcpu_initialise(struct vcpu *v)
     if ( v->vcpu_id == 0 )
         v->arch.user_regs.rax = 1;
 
+    sgx_vcpu_init(v);
+
     return 0;
 }
 
@@ -1048,6 +1050,9 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
 
     if ( v->domain->arch.hvm_domain.pi_ops.switch_to )
         v->domain->arch.hvm_domain.pi_ops.switch_to(v);
+
+    if ( domain_has_sgx(v->domain) )
+        sgx_ctxt_switch_to(v);
 }
 
 
@@ -2876,10 +2881,20 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         __vmread(GUEST_IA32_DEBUGCTL, msr_content);
         break;
     case MSR_IA32_FEATURE_CONTROL:
+        /* If neither SGX nor nested virtualization is supported, this
+         * MSR should not be touched. */
+        if ( !sgx_msr_read_intercept(current, msr, msr_content) &&
+                !nvmx_msr_read_intercept(msr, msr_content) )
+            goto gp_fault;
+        break;
     case MSR_IA32_VMX_BASIC...MSR_IA32_VMX_VMFUNC:
         if ( !nvmx_msr_read_intercept(msr, msr_content) )
             goto gp_fault;
         break;
+    case MSR_IA32_SGXLEPUBKEYHASH0...MSR_IA32_SGXLEPUBKEYHASH3:
+        if ( !sgx_msr_read_intercept(current, msr, msr_content) )
+            goto gp_fault;
+        break;
     case MSR_IA32_MISC_ENABLE:
         rdmsrl(MSR_IA32_MISC_ENABLE, *msr_content);
         /* Debug Trace Store is not supported. */
@@ -3119,10 +3134,19 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
         break;
     }
     case MSR_IA32_FEATURE_CONTROL:
+        /* See vmx_msr_read_intercept */
+        if ( !sgx_msr_write_intercept(current, msr, msr_content) &&
+                !nvmx_msr_write_intercept(msr, msr_content) )
+            goto gp_fault;
+        break;
     case MSR_IA32_VMX_BASIC...MSR_IA32_VMX_TRUE_ENTRY_CTLS:
         if ( !nvmx_msr_write_intercept(msr, msr_content) )
             goto gp_fault;
         break;
+    case MSR_IA32_SGXLEPUBKEYHASH0...MSR_IA32_SGXLEPUBKEYHASH3:
+        if ( !sgx_msr_write_intercept(current, msr, msr_content) )
+            goto gp_fault;
+        break;
     case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
     case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(7):
     case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 9793f8c1c5..dfb17c4bd8 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -98,6 +98,9 @@
 #define cpu_has_smap            boot_cpu_has(X86_FEATURE_SMAP)
 #define cpu_has_sha             boot_cpu_has(X86_FEATURE_SHA)
 
+/* CPUID level 0x00000007:0.ecx */
+#define cpu_has_sgx_launch_control  boot_cpu_has(X86_FEATURE_SGX_LAUNCH_CONTROL)
+
 /* CPUID level 0x80000007.edx */
 #define cpu_has_itsc            boot_cpu_has(X86_FEATURE_ITSC)
 
diff --git a/xen/include/asm-x86/hvm/vmx/sgx.h b/xen/include/asm-x86/hvm/vmx/sgx.h
index 40f860662a..c460f61e5e 100644
--- a/xen/include/asm-x86/hvm/vmx/sgx.h
+++ b/xen/include/asm-x86/hvm/vmx/sgx.h
@@ -75,4 +75,26 @@ int hvm_populate_epc(struct domain *d, unsigned long epc_base_pfn,
 int hvm_reset_epc(struct domain *d, bool_t free_epc);
 void hvm_destroy_epc(struct domain *d);
 
+/* Per-vcpu SGX structure */
+struct sgx_vcpu {
+    uint64_t ia32_sgxlepubkeyhash[4];
+    /*
+     * Although the SDM says IA32_SGXLEPUBKEYHASHn are available for read
+     * whenever SGX is present, in reality Skylake client machines may not
+     * expose these MSRs even when SGX is present.
+     */
+    bool_t readable;
+    bool_t writable;
+};
+#define to_sgx_vcpu(v)  (&((v)->arch.hvm_vmx.sgx))
+
+bool_t sgx_ia32_sgxlepubkeyhash_writable(void);
+bool_t domain_has_sgx(struct domain *d);
+bool_t domain_has_sgx_launch_control(struct domain *d);
+
+void sgx_vcpu_init(struct vcpu *v);
+void sgx_ctxt_switch_to(struct vcpu *v);
+int sgx_msr_read_intercept(struct vcpu *v, unsigned int msr, u64 *msr_content);
+int sgx_msr_write_intercept(struct vcpu *v, unsigned int msr, u64 msr_content);
+
 #endif  /* __ASM_X86_HVM_VMX_SGX_H__ */
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 6cfa5c3310..fc0b9d85fd 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -160,6 +160,8 @@ struct arch_vmx_struct {
      * pCPU and wakeup the related vCPU.
      */
     struct pi_blocking_vcpu pi_blocking;
+
+    struct sgx_vcpu sgx;
 };
 
 int vmx_create_vmcs(struct vcpu *v);
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 771e7500af..16206a11b7 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -296,6 +296,12 @@
 #define IA32_FEATURE_CONTROL_SENTER_PARAM_CTL         0x7f00
 #define IA32_FEATURE_CONTROL_ENABLE_SENTER            0x8000
 #define IA32_FEATURE_CONTROL_SGX_ENABLE               0x40000
+#define IA32_FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE  0x20000
+
+#define MSR_IA32_SGXLEPUBKEYHASH0   0x0000008c
+#define MSR_IA32_SGXLEPUBKEYHASH1   0x0000008d
+#define MSR_IA32_SGXLEPUBKEYHASH2   0x0000008e
+#define MSR_IA32_SGXLEPUBKEYHASH3   0x0000008f
 
 #define MSR_IA32_TSC_ADJUST		0x0000003b
 
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 10/15] xen: vmx: handle ENCLS VMEXIT
  2017-07-09  8:03 [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Kai Huang
                   ` (5 preceding siblings ...)
  2017-07-09  8:09 ` [PATCH 09/15] xen: vmx: handle SGX related MSRs Kai Huang
@ 2017-07-09  8:09 ` Kai Huang
  2017-07-09  8:09 ` [PATCH 11/15] xen: vmx: handle VMEXIT from SGX enclave Kai Huang
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Kai Huang @ 2017-07-09  8:09 UTC (permalink / raw)
  To: xen-devel; +Cc: andrew.cooper3, kevin.tian, jbeulich

Currently EPC is statically allocated and mapped into the guest, so we
don't have to trap ENCLS: it runs perfectly in VMX non-root mode. But
exposing SGX to a guest means we also expose the ENABLE_ENCLS bit to the
L1 hypervisor, so we cannot stop L1 from enabling ENCLS VMEXIT. An ENCLS
VMEXIT from an L2 guest is simply injected into L1; any other ENCLS
VMEXIT is unexpected in L0, and we simply crash the domain.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c         | 10 ++++++++++
 xen/arch/x86/hvm/vmx/vvmx.c        | 11 +++++++++++
 xen/include/asm-x86/hvm/vmx/vmcs.h |  1 +
 xen/include/asm-x86/hvm/vmx/vmx.h  |  1 +
 4 files changed, 23 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 7ee5515bdc..ea3d468bb0 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -4126,6 +4126,16 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         vmx_handle_apic_write();
         break;
 
+    case EXIT_REASON_ENCLS:
+        /*
+         * Currently L0 doesn't enable ENCLS VMEXIT, but L0 cannot stop L1
+         * from enabling it.  An ENCLS VMEXIT from an L2 guest has already
+         * been handled, so reaching here is a bug.  We simply crash the
+         * domain.
+         */
+        domain_crash(v->domain);
+        break;
+
     case EXIT_REASON_PML_FULL:
         vmx_vcpu_flush_pml_buffer(v);
         break;
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 3560faec6d..7eb10738d9 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -2059,6 +2059,12 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
                SECONDARY_EXEC_ENABLE_VPID |
                SECONDARY_EXEC_UNRESTRICTED_GUEST |
                SECONDARY_EXEC_ENABLE_EPT;
+        /*
+         * If SGX is exposed to the guest, then the ENABLE_ENCLS bit must
+         * also be exposed to the guest.
+         */
+        if ( domain_has_sgx(d) )
+            data |= SECONDARY_EXEC_ENABLE_ENCLS;
         data = gen_vmx_msr(data, 0, host_data);
         break;
     case MSR_IA32_VMX_EXIT_CTLS:
@@ -2291,6 +2297,11 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
     case EXIT_REASON_VMXON:
     case EXIT_REASON_INVEPT:
     case EXIT_REASON_XSETBV:
+    /*
+     * L0 doesn't enable ENCLS VMEXIT, so an ENCLS VMEXIT must come from an
+     * L2 guest, caused by L1 having enabled ENCLS VMEXIT.
+     */
+    case EXIT_REASON_ENCLS:
         /* inject to L1 */
         nvcpu->nv_vmexit_pending = 1;
         break;
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index fc0b9d85fd..1350b7bc81 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -396,6 +396,7 @@ enum vmcs_field {
     VIRT_EXCEPTION_INFO             = 0x0000202a,
     XSS_EXIT_BITMAP                 = 0x0000202c,
+    ENCLS_EXITING_BITMAP            = 0x0000202e,
     TSC_MULTIPLIER                  = 0x00002032,
     GUEST_PHYSICAL_ADDRESS          = 0x00002400,
     VMCS_LINK_POINTER               = 0x00002800,
     GUEST_IA32_DEBUGCTL             = 0x00002802,
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 4889a64255..211f5c8058 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -210,6 +210,7 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
 #define EXIT_REASON_APIC_WRITE          56
 #define EXIT_REASON_INVPCID             58
 #define EXIT_REASON_VMFUNC              59
+#define EXIT_REASON_ENCLS               60
 #define EXIT_REASON_PML_FULL            62
 #define EXIT_REASON_XSAVES              63
 #define EXIT_REASON_XRSTORS             64
-- 
2.11.0



* [PATCH 11/15] xen: vmx: handle VMEXIT from SGX enclave
  2017-07-09  8:03 [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Kai Huang
                   ` (6 preceding siblings ...)
  2017-07-09  8:09 ` [PATCH 10/15] xen: vmx: handle ENCLS VMEXIT Kai Huang
@ 2017-07-09  8:09 ` Kai Huang
  2017-07-09  8:09 ` [PATCH 12/15] xen: x86: reset EPC when guest got suspended Kai Huang
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Kai Huang @ 2017-07-09  8:09 UTC (permalink / raw)
  To: xen-devel; +Cc: andrew.cooper3, kevin.tian, jbeulich

VMX adds a new bit to both the exit reason and the guest interruptibility
state to indicate whether a VMEXIT occurred in enclave mode. Several
instructions are also invalid, or behave differently, inside an enclave
according to the SDM. This patch handles those cases.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c         | 29 +++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmcs.h |  2 ++
 xen/include/asm-x86/hvm/vmx/vmx.h  |  2 ++
 3 files changed, 33 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index ea3d468bb0..d0c43ea0c8 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -57,6 +57,7 @@
 #include <asm/event.h>
 #include <asm/monitor.h>
 #include <public/arch-x86/cpuid.h>
+#include <asm/hvm/vmx/sgx.h>
 
 static bool_t __initdata opt_force_ept;
 boolean_param("force-ept", opt_force_ept);
@@ -3544,6 +3545,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
     unsigned long exit_qualification, exit_reason, idtv_info, intr_info = 0;
     unsigned int vector = 0, mode;
     struct vcpu *v = current;
+    bool_t exit_from_sgx_enclave;
 
     __vmread(GUEST_RIP,    &regs->rip);
     __vmread(GUEST_RSP,    &regs->rsp);
@@ -3569,6 +3571,11 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 
     perfc_incra(vmexits, exit_reason);
 
+    /* Several VMEXITs need special handling if the VMEXIT is from an
+     * enclave. Also clear bit 27, which has no further use. */
+    exit_from_sgx_enclave = !!(exit_reason & VMX_EXIT_REASONS_FROM_ENCLAVE);
+    exit_reason &= ~VMX_EXIT_REASONS_FROM_ENCLAVE;
+
     /* Handle the interrupt we missed before allowing any more in. */
     switch ( (uint16_t)exit_reason )
     {
@@ -4070,6 +4077,18 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         break;
 
     case EXIT_REASON_INVD:
+        /*
+         * SDM 39.6.5 INVD Handling when Enclaves Are Enabled:
+         *
+         * INVD causes #GP when executed inside an enclave.
+         * FIXME: WBINVD??
+         */
+        if ( exit_from_sgx_enclave )
+        {
+            hvm_inject_hw_exception(TRAP_gp_fault, 0);
+            break;
+        }
+        /* Otherwise fall through. */
     case EXIT_REASON_WBINVD:
     {
         update_guest_eip(); /* Safe: INVD, WBINVD */
@@ -4081,6 +4100,16 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
     {
         paddr_t gpa;
 
+        /*
+         * Currently an EPT violation from an enclave is not possible, as all
+         * EPC pages are statically allocated to the guest when the guest is
+         * created.  Simply crash the guest in this case.
+         */
+        if ( exit_from_sgx_enclave )
+        {
+            domain_crash(v->domain);
+            break;
+        }
         __vmread(GUEST_PHYSICAL_ADDRESS, &gpa);
         __vmread(EXIT_QUALIFICATION, &exit_qualification);
         ept_handle_violation(exit_qualification, gpa);
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 1350b7bc81..bbbc3d0d78 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -327,6 +327,8 @@ extern u64 vmx_ept_vpid_cap;
 #define VMX_INTR_SHADOW_MOV_SS          0x00000002
 #define VMX_INTR_SHADOW_SMI             0x00000004
 #define VMX_INTR_SHADOW_NMI             0x00000008
+#define VMX_INTR_ENCLAVE_INTR           0x00000010  /* VMEXIT was incident to
+                                                       enclave mode */
 
 #define VMX_BASIC_REVISION_MASK         0x7fffffff
 #define VMX_BASIC_VMCS_SIZE_MASK        (0x1fffULL << 32)
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 211f5c8058..2184d35246 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -153,6 +153,8 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
  * Exit Reasons
  */
 #define VMX_EXIT_REASONS_FAILED_VMENTRY 0x80000000
+/* Bit 27 is also set if VMEXIT is from SGX enclave mode */
+#define VMX_EXIT_REASONS_FROM_ENCLAVE   0x08000000
 
 #define EXIT_REASON_EXCEPTION_NMI       0
 #define EXIT_REASON_EXTERNAL_INTERRUPT  1
-- 
2.11.0



* [PATCH 12/15] xen: x86: reset EPC when guest got suspended.
  2017-07-09  8:03 [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Kai Huang
                   ` (7 preceding siblings ...)
  2017-07-09  8:09 ` [PATCH 11/15] xen: vmx: handle VMEXIT from SGX enclave Kai Huang
@ 2017-07-09  8:09 ` Kai Huang
  2017-07-09  8:10 ` [PATCH 04/15] xen: mm: add ioremap_cache Kai Huang
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Kai Huang @ 2017-07-09  8:09 UTC (permalink / raw)
  To: xen-devel; +Cc: andrew.cooper3, kevin.tian, jbeulich

EPC is destroyed when the power state goes to S3-S5. Emulate this behavior.

A new function s3_suspend is added to hvm_function_table for this purpose.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/hvm/hvm.c        | 3 +++
 xen/arch/x86/hvm/vmx/vmx.c    | 7 +++++++
 xen/include/asm-x86/hvm/hvm.h | 3 +++
 3 files changed, 13 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 70ddc81d44..1021cd7307 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3858,6 +3858,9 @@ static void hvm_s3_suspend(struct domain *d)
 
     hvm_vcpu_reset_state(d->vcpu[0], 0xf000, 0xfff0);
 
+    if ( hvm_funcs.s3_suspend )
+        hvm_funcs.s3_suspend(d);
+
     domain_unlock(d);
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index d0c43ea0c8..98c346178e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2290,6 +2290,12 @@ static bool vmx_get_pending_event(struct vcpu *v, struct x86_event *info)
     return true;
 }
 
+static void vmx_s3_suspend(struct domain *d)
+{
+    if ( domain_has_sgx(d) )
+        hvm_reset_epc(d, false);
+}
+
 static struct hvm_function_table __initdata vmx_function_table = {
     .name                 = "VMX",
     .cpu_up_prepare       = vmx_cpu_up_prepare,
@@ -2360,6 +2366,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
         .max_ratio = VMX_TSC_MULTIPLIER_MAX,
         .setup     = vmx_setup_tsc_scaling,
     },
+    .s3_suspend = vmx_s3_suspend,
 };
 
 /* Handle VT-d posted-interrupt when VCPU is blocked. */
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index b687e03dce..244b6566f2 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -226,6 +226,9 @@ struct hvm_function_table {
         /* Architecture function to setup TSC scaling ratio */
         void (*setup)(struct vcpu *v);
     } tsc_scaling;
+
+    /* Domain S3 suspend */
+    void (*s3_suspend)(struct domain *d);
 };
 
 extern struct hvm_function_table hvm_funcs;
-- 
2.11.0



* [PATCH 04/15] xen: mm: add ioremap_cache
  2017-07-09  8:03 [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Kai Huang
                   ` (8 preceding siblings ...)
  2017-07-09  8:09 ` [PATCH 12/15] xen: x86: reset EPC when guest got suspended Kai Huang
@ 2017-07-09  8:10 ` Kai Huang
  2017-07-11 20:14   ` Julien Grall
  2017-07-09  8:10 ` [PATCH 08/15] xen: x86: add SGX cpuid handling support Kai Huang
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Kai Huang @ 2017-07-09  8:10 UTC (permalink / raw)
  To: xen-devel; +Cc: andrew.cooper3, jbeulich

Currently Xen only has a non-cacheable version of ioremap. Although EPC
is reported as reserved memory in the e820 table, it can be mapped as
cacheable. This patch adds ioremap_cache (a cacheable version of
ioremap).

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/mm.c      | 15 +++++++++++++--
 xen/include/xen/vmap.h |  1 +
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 101ab33193..d0b6b3a247 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -6284,9 +6284,10 @@ void *__init arch_vmap_virt_end(void)
     return (void *)fix_to_virt(__end_of_fixed_addresses);
 }
 
-void __iomem *ioremap(paddr_t pa, size_t len)
+static void __iomem *__ioremap(paddr_t pa, size_t len, bool_t cache)
 {
     mfn_t mfn = _mfn(PFN_DOWN(pa));
+    unsigned int flags = cache ? PAGE_HYPERVISOR : PAGE_HYPERVISOR_NOCACHE;
     void *va;
 
     WARN_ON(page_is_ram_type(mfn_x(mfn), RAM_TYPE_CONVENTIONAL));
@@ -6299,12 +6300,22 @@ void __iomem *ioremap(paddr_t pa, size_t len)
         unsigned int offs = pa & (PAGE_SIZE - 1);
         unsigned int nr = PFN_UP(offs + len);
 
-        va = __vmap(&mfn, nr, 1, 1, PAGE_HYPERVISOR_NOCACHE, VMAP_DEFAULT) + offs;
+        va = __vmap(&mfn, nr, 1, 1, flags, VMAP_DEFAULT) + offs;
     }
 
     return (void __force __iomem *)va;
 }
 
+void __iomem *ioremap(paddr_t pa, size_t len)
+{
+    return __ioremap(pa, len, false);
+}
+
+void __iomem *ioremap_cache(paddr_t pa, size_t len)
+{
+    return __ioremap(pa, len, true);
+}
+
 int create_perdomain_mapping(struct domain *d, unsigned long va,
                              unsigned int nr, l1_pgentry_t **pl1tab,
                              struct page_info **ppg)
diff --git a/xen/include/xen/vmap.h b/xen/include/xen/vmap.h
index 369560e620..f6037e368c 100644
--- a/xen/include/xen/vmap.h
+++ b/xen/include/xen/vmap.h
@@ -24,6 +24,7 @@ void *vzalloc(size_t size);
 void vfree(void *va);
 
 void __iomem *ioremap(paddr_t, size_t);
+void __iomem *ioremap_cache(paddr_t, size_t);
 
 static inline void iounmap(void __iomem *va)
 {
-- 
2.11.0



* [PATCH 08/15] xen: x86: add SGX cpuid handling support.
  2017-07-09  8:03 [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Kai Huang
                   ` (9 preceding siblings ...)
  2017-07-09  8:10 ` [PATCH 04/15] xen: mm: add ioremap_cache Kai Huang
@ 2017-07-09  8:10 ` Kai Huang
  2017-07-12 10:56   ` Andrew Cooper
  2017-07-09  8:12 ` [PATCH 05/15] xen: p2m: new 'p2m_epc' type for EPC mapping Kai Huang
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Kai Huang @ 2017-07-09  8:10 UTC (permalink / raw)
  To: xen-devel; +Cc: andrew.cooper3, jbeulich

This patch adds SGX support to the CPUID handling. In init_guest_cpuid,
physical EPC info is reported for raw_policy and host_policy, but EPC is
hidden for pv_max_policy and hvm_max_policy: a particular domain's EPC
base and size come from the toolstack, so it is meaningless for those
policies to contain physical EPC info. Before a domain's EPC base and
size are properly configured, the guest's SGX CPUID should report invalid
EPC, which is also consistent with hardware behavior.

Currently all EPC pages are fully populated for a domain when it is
created. Xen gets the domain's EPC base and size from the toolstack via
XEN_DOMCTL_set_cpuid, so the domain's EPC pages are also populated in
XEN_DOMCTL_set_cpuid, after valid EPC base and size have been received.
Failure to populate EPC (e.g. because there are not enough free EPC
pages) results in domain creation failure, with XEN_DOMCTL_set_cpuid
returning an error.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/cpuid.c        | 87 ++++++++++++++++++++++++++++++++++++++++++++-
 xen/arch/x86/domctl.c       | 47 +++++++++++++++++++++++-
 xen/include/asm-x86/cpuid.h | 26 +++++++++++++-
 3 files changed, 157 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index d359e090f3..db896be2e8 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -9,6 +9,7 @@
 #include <asm/paging.h>
 #include <asm/processor.h>
 #include <asm/xstate.h>
+#include <asm/hvm/vmx/sgx.h>
 
 const uint32_t known_features[] = INIT_KNOWN_FEATURES;
 const uint32_t special_features[] = INIT_SPECIAL_FEATURES;
@@ -158,6 +159,44 @@ static void recalculate_xstate(struct cpuid_policy *p)
     }
 }
 
+static void recalculate_sgx(struct cpuid_policy *p, bool_t hide_epc)
+{
+    if ( !p->feat.sgx )
+    {
+        memset(&p->sgx, 0, sizeof (p->sgx));
+        return;
+    }
+
+    if ( !p->sgx.sgx1 )
+    {
+        memset(&p->sgx, 0, sizeof (p->sgx));
+        return;
+    }
+
+    /*
+     * SDM 42.7.2.1 SECS.ATTRIBUTES.XFRM:
+     *
+     * Legal values for SECS.ATTRIBUTES.XFRM conform to these requirements:
+     *  - XFRM[1:0] must be set to 0x3;
+     *  - If the processor does not support XSAVE, or if the system software
+     *    has not enabled XSAVE, then XFRM[63:2] must be 0.
+     *  - If the processor does support XSAVE, XFRM must contain a value that
+     *    would be legal if loaded into XCR0.
+     */
+    p->sgx.xfrm_low = 0x3;
+    p->sgx.xfrm_high = 0;
+    if ( p->basic.xsave )
+    {
+        p->sgx.xfrm_low |= p->xstate.xcr0_low;
+        p->sgx.xfrm_high |= p->xstate.xcr0_high;
+    }
+
+    if ( hide_epc )
+    {
+        memset(&p->sgx.raw[0x2], 0, sizeof (struct cpuid_leaf));
+    }
+}
+
 /*
  * Misc adjustments to the policy.  Mostly clobbering reserved fields and
  * duplicating shared fields.  Intentionally hidden fields are annotated.
@@ -239,7 +278,7 @@ static void __init calculate_raw_policy(void)
     {
         switch ( i )
         {
-        case 0x4: case 0x7: case 0xd:
+        case 0x4: case 0x7: case 0xd: case 0x12:
             /* Multi-invocation leaves.  Deferred. */
             continue;
         }
@@ -299,6 +338,19 @@ static void __init calculate_raw_policy(void)
         }
     }
 
+    if ( p->basic.max_leaf >= SGX_CPUID )
+    {
+        /*
+         * For the raw policy we just report native CPUID. Natively there may
+         * be multiple EPC sections (meaning subleaves 3, 4, ... may also be
+         * valid), but as the policy is for a guest, we only need one EPC
+         * section (subleaf 2).
+         */
+        cpuid_count_leaf(SGX_CPUID, 0, &p->sgx.raw[0]);
+        cpuid_count_leaf(SGX_CPUID, 1, &p->sgx.raw[1]);
+        cpuid_count_leaf(SGX_CPUID, 2, &p->sgx.raw[2]);
+    }
+
     /* Extended leaves. */
     cpuid_leaf(0x80000000, &p->extd.raw[0]);
     for ( i = 1; i < min(ARRAY_SIZE(p->extd.raw),
@@ -324,6 +376,8 @@ static void __init calculate_host_policy(void)
     cpuid_featureset_to_policy(boot_cpu_data.x86_capability, p);
     recalculate_xstate(p);
     recalculate_misc(p);
+    /* For host policy we report physical EPC */
+    recalculate_sgx(p, 0);
 
     if ( p->extd.svm )
     {
@@ -357,6 +411,11 @@ static void __init calculate_pv_max_policy(void)
     sanitise_featureset(pv_featureset);
     cpuid_featureset_to_policy(pv_featureset, p);
     recalculate_xstate(p);
+    /*
+     * For the PV policy we don't report physical EPC. In fact, SGX is
+     * currently disabled for PV guests altogether.
+     */
+    recalculate_sgx(p, 1);
 
     p->extd.raw[0xa] = EMPTY_LEAF; /* No SVM for PV guests. */
 }
@@ -413,6 +472,13 @@ static void __init calculate_hvm_max_policy(void)
     sanitise_featureset(hvm_featureset);
     cpuid_featureset_to_policy(hvm_featureset, p);
     recalculate_xstate(p);
+    /*
+     * For the HVM policy we don't report physical EPC. The CPUID policy
+     * should eventually report the VM's virtual EPC base and size, but that
+     * info comes from the toolstack; until Xen has been notified, the VM's
+     * CPUID policy reports invalid EPC.
+     */
+    recalculate_sgx(p, 1);
 }
 
 void __init init_guest_cpuid(void)
@@ -528,6 +594,12 @@ void recalculate_cpuid_policy(struct domain *d)
     if ( p->basic.max_leaf < XSTATE_CPUID )
         __clear_bit(X86_FEATURE_XSAVE, fs);
 
+    if ( p->basic.max_leaf < SGX_CPUID )
+    {
+        __clear_bit(X86_FEATURE_SGX, fs);
+        __clear_bit(X86_FEATURE_SGX_LAUNCH_CONTROL, fs);
+    }
+
     sanitise_featureset(fs);
 
     /* Fold host's FDP_EXCP_ONLY and NO_FPU_SEL into guest's view. */
@@ -550,6 +622,12 @@ void recalculate_cpuid_policy(struct domain *d)
 
     recalculate_xstate(p);
     recalculate_misc(p);
+    /*
+     * recalculate_cpuid_policy is also called on the domain's CPUID policy,
+     * which comes from the toolstack via XEN_DOMCTL_set_cpuid; therefore we
+     * cannot hide the domain's virtual EPC from the toolstack.
+     */
+    recalculate_sgx(p, 0);
 
     for ( i = 0; i < ARRAY_SIZE(p->cache.raw); ++i )
     {
@@ -645,6 +723,13 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
             *res = p->xstate.raw[subleaf];
             break;
 
+        case SGX_CPUID:
+            if ( !p->feat.sgx )
+                return;
+
+            *res = p->sgx.raw[subleaf];
+            break;
+
         default:
             *res = p->basic.raw[leaf];
             break;
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index f40e989fd8..7d49947a3e 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -53,6 +53,7 @@ static int update_domain_cpuid_info(struct domain *d,
     struct cpuid_policy *p = d->arch.cpuid;
     const struct cpuid_leaf leaf = { ctl->eax, ctl->ebx, ctl->ecx, ctl->edx };
     int old_vendor = p->x86_vendor;
+    int ret = 0;
 
     /*
      * Skip update for leaves we don't care about.  This avoids the overhead
@@ -74,6 +75,7 @@ static int update_domain_cpuid_info(struct domain *d,
         if ( ctl->input[0] == XSTATE_CPUID &&
              ctl->input[1] != 1 ) /* Everything else automatically calculated. */
             return 0;
+
         break;
 
     case 0x40000000: case 0x40000100:
@@ -104,6 +106,10 @@ static int update_domain_cpuid_info(struct domain *d,
             p->xstate.raw[ctl->input[1]] = leaf;
             break;
 
+        case SGX_CPUID:
+            p->sgx.raw[ctl->input[1]] = leaf;
+            break;
+
         default:
             p->basic.raw[ctl->input[0]] = leaf;
             break;
@@ -255,6 +261,45 @@ static int update_domain_cpuid_info(struct domain *d,
         }
         break;
 
+    case 0x12:
+    {
+        uint64_t base_pfn, npages;
+
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
+            break;
+
+        if ( ctl->input[1] != 2 )
+            break;
+
+        /* SGX is not enabled. */
+        if ( !p->feat.sgx || !p->sgx.sgx1 )
+            break;
+
+        /*
+         * If SGX is enabled in CPUID, then we expect valid EPC resource info
+         * in subleaf 0x2.  Return -EINVAL to notify the toolstack that
+         * something is wrong.
+         */
+        if ( !p->sgx.base_valid || !p->sgx.size_valid )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        base_pfn = (((uint64_t)(p->sgx.base_pfn_high)) << 20) |
+            (uint64_t)p->sgx.base_pfn_low;
+        npages = (((uint64_t)(p->sgx.npages_high)) << 20) |
+            (uint64_t)p->sgx.npages_low;
+
+        if ( !hvm_epc_populated(d) )
+            ret = hvm_populate_epc(d, base_pfn, npages);
+        else if ( base_pfn != to_sgx(d)->epc_base_pfn ||
+                  npages != to_sgx(d)->epc_npages )
+            ret = -EINVAL;
+
+        break;
+    }
     case 0x80000001:
         if ( is_pv_domain(d) && ((levelling_caps & LCAP_e1cd) == LCAP_e1cd) )
         {
@@ -299,7 +344,7 @@ static int update_domain_cpuid_info(struct domain *d,
         break;
     }
 
-    return 0;
+    return ret;
 }
 
 void arch_get_domain_info(const struct domain *d,
diff --git a/xen/include/asm-x86/cpuid.h b/xen/include/asm-x86/cpuid.h
index ac25908eca..326f267263 100644
--- a/xen/include/asm-x86/cpuid.h
+++ b/xen/include/asm-x86/cpuid.h
@@ -61,10 +61,11 @@ extern struct cpuidmasks cpuidmask_defaults;
 /* Whether or not cpuid faulting is available for the current domain. */
 DECLARE_PER_CPU(bool, cpuid_faulting_enabled);
 
-#define CPUID_GUEST_NR_BASIC      (0xdu + 1)
+#define CPUID_GUEST_NR_BASIC      (0x12u + 1)
 #define CPUID_GUEST_NR_FEAT       (0u + 1)
 #define CPUID_GUEST_NR_CACHE      (5u + 1)
 #define CPUID_GUEST_NR_XSTATE     (62u + 1)
+#define CPUID_GUEST_NR_SGX        (0x2u + 1)
 #define CPUID_GUEST_NR_EXTD_INTEL (0x8u + 1)
 #define CPUID_GUEST_NR_EXTD_AMD   (0x1cu + 1)
 #define CPUID_GUEST_NR_EXTD       MAX(CPUID_GUEST_NR_EXTD_INTEL, \
@@ -169,6 +170,29 @@ struct cpuid_policy
         } comp[CPUID_GUEST_NR_XSTATE];
     } xstate;
 
+    union {
+        struct cpuid_leaf raw[CPUID_GUEST_NR_SGX];
+
+        struct {
+            /* Subleaf 0. */
+            uint32_t sgx1:1, sgx2:1, :30;
+            uint32_t miscselect, /* c */ :32;
+            uint32_t maxenclavesize_n64:8, maxenclavesize_64:8, :16;
+
+            /* Subleaf 1. */
+            uint32_t init:1, debug:1, mode64:1, /*reserve*/:1, provisionkey:1,
+                     einittokenkey:1, :26;
+            uint32_t /* reserve */:32;
+            uint32_t xfrm_low, xfrm_high;
+
+            /* Subleaf 2. */
+            uint32_t base_valid:1, :11, base_pfn_low:20;
+            uint32_t base_pfn_high:20, :12;
+            uint32_t size_valid:1, :11, npages_low:20;
+            uint32_t npages_high:20, :12;
+        };
+    } sgx;
+
     /* Extended leaves: 0x800000xx */
     union {
         struct cpuid_leaf raw[CPUID_GUEST_NR_EXTD];
-- 
2.11.0



* [PATCH 05/15] xen: p2m: new 'p2m_epc' type for EPC mapping
  2017-07-09  8:03 [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Kai Huang
                   ` (10 preceding siblings ...)
  2017-07-09  8:10 ` [PATCH 08/15] xen: x86: add SGX cpuid handling support Kai Huang
@ 2017-07-09  8:12 ` Kai Huang
  2017-07-12 11:01   ` Andrew Cooper
  2017-07-09  8:14 ` [PATCH 13/15] xen: tools: add new 'epc' parameter support Kai Huang
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Kai Huang @ 2017-07-09  8:12 UTC (permalink / raw)
  To: xen-devel; +Cc: George.Dunlap, andrew.cooper3, kevin.tian, jbeulich

A new 'p2m_epc' type is added for EPC mappings. Two wrapper functions,
set_epc_p2m_entry and clear_epc_p2m_entry, are also added for later use.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 xen/arch/x86/mm/p2m-ept.c |  3 +++
 xen/arch/x86/mm/p2m.c     | 41 +++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/p2m.h | 12 ++++++++++--
 3 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index ecab56fbec..95929868dc 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -182,6 +182,9 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
             entry->a = !!cpu_has_vmx_ept_ad;
             entry->d = 0;
             break;
+        case p2m_epc:
+            entry->r = entry->w = entry->x = 1;
+            break;
     }
 
 
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index bee733dc46..29f42cb96d 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1176,6 +1176,12 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
     return ret;
 }
 
+int set_epc_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+{
+    return set_typed_p2m_entry(d, gfn, mfn, PAGE_ORDER_4K, p2m_epc,
+            p2m_get_hostp2m(d)->default_access);
+}
+
 /*
  * Returns:
  *    0        for success
@@ -1260,6 +1266,41 @@ int clear_identity_p2m_entry(struct domain *d, unsigned long gfn)
     return ret;
 }
 
+int clear_epc_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    mfn_t omfn;
+    p2m_type_t ot;
+    p2m_access_t oa;
+    int ret = 0;
+
+    gfn_lock(p2m, gfn, 0);
+
+    omfn = p2m->get_entry(p2m, gfn, &ot, &oa, 0, NULL, NULL);
+    if ( mfn_eq(omfn, INVALID_MFN) || !p2m_is_epc(ot) )
+    {
+        printk(XENLOG_G_WARNING
+                "d%d: invalid EPC map to clear: gfn 0x%lx, type %d.\n",
+                d->domain_id, gfn, ot);
+        goto out;
+    }
+    if ( !mfn_eq(mfn, omfn) )
+    {
+        printk(XENLOG_G_WARNING
+                "d%d: mistaken EPC mfn to clear: gfn 0x%lx, "
+                "omfn 0x%lx, mfn 0x%lx.\n",
+                d->domain_id, gfn, mfn_x(omfn), mfn_x(mfn));
+    }
+
+    ret = p2m_set_entry(p2m, gfn, INVALID_MFN, PAGE_ORDER_4K, p2m_invalid,
+            p2m->default_access);
+
+out:
+    gfn_unlock(p2m, gfn, 0);
+
+    return ret;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int set_shared_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index e736609241..a9e330dd3c 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -72,6 +72,7 @@ typedef enum {
     p2m_ram_broken = 13,          /* Broken page, access cause domain crash */
     p2m_map_foreign  = 14,        /* ram pages from foreign domain */
     p2m_ioreq_server = 15,
+    p2m_epc = 16,                 /* EPC */
 } p2m_type_t;
 
 /* Modifiers to the query */
@@ -142,10 +143,13 @@ typedef unsigned int p2m_query_t;
                             | p2m_to_mask(p2m_ram_logdirty) )
 #define P2M_SHARED_TYPES   (p2m_to_mask(p2m_ram_shared))
 
+#define P2M_EPC_TYPES   (p2m_to_mask(p2m_epc))
+
 /* Valid types not necessarily associated with a (valid) MFN. */
 #define P2M_INVALID_MFN_TYPES (P2M_POD_TYPES                  \
                                | p2m_to_mask(p2m_mmio_direct) \
-                               | P2M_PAGING_TYPES)
+                               | P2M_PAGING_TYPES             \
+                               | P2M_EPC_TYPES)
 
 /* Broken type: the frame backing this pfn has failed in hardware
  * and must not be touched. */
@@ -153,6 +157,7 @@ typedef unsigned int p2m_query_t;
 
 /* Useful predicates */
 #define p2m_is_ram(_t) (p2m_to_mask(_t) & P2M_RAM_TYPES)
+#define p2m_is_epc(_t) (p2m_to_mask(_t) & P2M_EPC_TYPES)
 #define p2m_is_hole(_t) (p2m_to_mask(_t) & P2M_HOLE_TYPES)
 #define p2m_is_mmio(_t) (p2m_to_mask(_t) & P2M_MMIO_TYPES)
 #define p2m_is_readonly(_t) (p2m_to_mask(_t) & P2M_RO_TYPES)
@@ -163,7 +168,7 @@ typedef unsigned int p2m_query_t;
 /* Grant types are *not* considered valid, because they can be
    unmapped at any time and, unless you happen to be the shadow or p2m
    implementations, there's no way of synchronising against that. */
-#define p2m_is_valid(_t) (p2m_to_mask(_t) & (P2M_RAM_TYPES | P2M_MMIO_TYPES))
+#define p2m_is_valid(_t) (p2m_to_mask(_t) & (P2M_RAM_TYPES | P2M_MMIO_TYPES | P2M_EPC_TYPES))
 #define p2m_has_emt(_t)  (p2m_to_mask(_t) & (P2M_RAM_TYPES | p2m_to_mask(p2m_mmio_direct)))
 #define p2m_is_pageable(_t) (p2m_to_mask(_t) & P2M_PAGEABLE_TYPES)
 #define p2m_is_paging(_t)   (p2m_to_mask(_t) & P2M_PAGING_TYPES)
@@ -634,6 +639,9 @@ int clear_identity_p2m_entry(struct domain *d, unsigned long gfn);
 int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
                     unsigned long gpfn, domid_t foreign_domid);
 
+int set_epc_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
+int clear_epc_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
+
 /* 
  * Populate-on-demand
  */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 13/15] xen: tools: add new 'epc' parameter support
  2017-07-09  8:03 [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Kai Huang
                   ` (11 preceding siblings ...)
  2017-07-09  8:12 ` [PATCH 05/15] xen: p2m: new 'p2m_epc' type for EPC mapping Kai Huang
@ 2017-07-09  8:14 ` Kai Huang
  2017-07-09  8:15 ` [PATCH 14/15] xen: tools: add SGX to applying CPUID policy Kai Huang
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Kai Huang @ 2017-07-09  8:14 UTC (permalink / raw)
  To: xen-devel; +Cc: wei.liu2, ian.jackson

In order to be able to configure a domain's EPC size when it is created, a new
'epc' parameter is added to the XL configuration file. Like 'memory', it
specifies the EPC size in MB. A new 'libxl_sgx_buildinfo' structure, which
contains the EPC base and size, is also added to libxl_domain_build_info. EPC
base and size are also added to 'xc_dom_image' in order to add the EPC to the
e820 table. The EPC base is calculated internally by the toolstack.
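For illustration (not part of the patch), a guest wanting a 64MB virtual EPC would add the new parameter to its xl configuration alongside the usual memory settings; all values below are hypothetical:

```
builder = "hvm"
memory = 2048
vcpus = 2
# EPC size in MB; the EPC base address is calculated internally
# by the toolstack, so only the size is specified here.
epc = 64
```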

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 tools/libxc/include/xc_dom.h |  4 ++++
 tools/libxl/libxl_create.c   |  9 +++++++++
 tools/libxl/libxl_dom.c      | 30 ++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_types.idl  |  6 ++++++
 tools/libxl/libxl_x86.c      | 12 ++++++++++++
 tools/xl/xl_parse.c          |  5 +++++
 7 files changed, 68 insertions(+)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index ce47058c41..be10af7002 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -203,6 +203,10 @@ struct xc_dom_image {
     xen_paddr_t lowmem_end;
     xen_paddr_t highmem_end;
     xen_pfn_t vga_hole_size;
+#if defined(__i386__) || defined(__x86_64__)
+    xen_paddr_t epc_base;
+    xen_paddr_t epc_size;
+#endif
 
     /* If unset disables the setup of the IOREQ pages. */
     bool device_model;
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index bffbc456c1..8710e53ffd 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -59,6 +59,13 @@ void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
                             LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
 }
 
+void libxl__sgx_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
+{
+    if (b_info->u.hvm.sgx.epckb == LIBXL_MEMKB_DEFAULT)
+        b_info->u.hvm.sgx.epckb = 0;
+    b_info->u.hvm.sgx.epcbase = 0;
+}
+
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
                                         libxl_domain_build_info *b_info)
 {
@@ -372,6 +379,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
         libxl_defbool_setdefault(&b_info->u.hvm.gfx_passthru, false);
 
         libxl__rdm_setdefault(gc, b_info);
+
+        libxl__sgx_setdefault(gc, b_info);
         break;
     case LIBXL_DOMAIN_TYPE_PV:
         libxl_defbool_setdefault(&b_info->u.pv.e820_host, false);
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 5d914a59ee..6d1d51d35d 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1124,6 +1124,36 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
         highmem_end = (1ull << 32) + (lowmem_end - mmio_start);
         lowmem_end = mmio_start;
     }
+#if defined(__i386__) || defined(__x86_64__)
+    if (info->u.hvm.sgx.epckb) {
+        /*
+         * FIXME:
+         *
+         * Currently EPC base is put at highmem_end + 8G, which should be
+         * safe in most cases.
+         *
+         * I am not quite sure which is the best way to calculate EPC base.
+         * IMO we can either:
+         * 1) put EPC between lowmem_end to mmio_start, but this brings
+         * additional logic to handle, ex, lowmem_end may become too small
+         * if EPC is large (shall we limit domain's EPC size?), and hvmloader
+         * will try to enlarge MMIO space until lowmem_end, or even relocate
+         * lowmem -- all those make things complicated, so putting EPC in
+         * the hole between lowmem_end and mmio_start is probably not good.
+         * 2) put EPC after highmem_end, but hvmloader may also relocate MMIO
+         * resource to the place after highmem_end. Maybe the ideal way is to
+         * put EPC right after highmem_end, and change hvmloader to detect
+         * EPC, and put high MMIO resource after EPC. I've done this but I
+         * found a strange bug where the EPT mappings of EPC (at least part of
+         * the mappings) will be removed, though I cannot yet find what removes them.
+         * Currently EPC base is put at highmem_end + 8G, and hvmloader code
+         * is not changed to handle EPC, but this should be safe for most cases.
+         */
+        info->u.hvm.sgx.epcbase = highmem_end + (2ULL << 32);
+    }
+    dom->epc_size = (info->u.hvm.sgx.epckb << 10);
+    dom->epc_base = info->u.hvm.sgx.epcbase;
+#endif
     dom->lowmem_end = lowmem_end;
     dom->highmem_end = highmem_end;
     dom->mmio_start = mmio_start;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index afe6652847..9a1d309dac 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1252,6 +1252,8 @@ _hidden int libxl__device_vkb_setdefault(libxl__gc *gc, libxl_device_vkb *vkb);
 _hidden int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci);
 _hidden void libxl__rdm_setdefault(libxl__gc *gc,
                                    libxl_domain_build_info *b_info);
+_hidden void libxl__sgx_setdefault(libxl__gc *gc,
+                                   libxl_domain_build_info *b_info);
 _hidden int libxl__device_p9_setdefault(libxl__gc *gc,
                                         libxl_device_p9 *p9);
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 22044259f3..9723c1fa46 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -450,6 +450,11 @@ libxl_altp2m_mode = Enumeration("altp2m_mode", [
     (3, "limited"),
     ], init_val = "LIBXL_ALTP2M_MODE_DISABLED")
 
+libxl_sgx_buildinfo = Struct("sgx_buildinfo", [
+    ("epcbase", uint64), # EPC base address
+    ("epckb", MemKB), # EPC size in KB
+    ], dir=DIR_IN)
+
 libxl_domain_build_info = Struct("domain_build_info",[
     ("max_vcpus",       integer),
     ("avail_vcpus",     libxl_bitmap),
@@ -564,6 +569,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("serial_list",      libxl_string_list),
                                        ("rdm", libxl_rdm_reserve),
                                        ("rdm_mem_boundary_memkb", MemKB),
+                                       ("sgx", libxl_sgx_buildinfo),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 455f6f0bed..35b0ff1ba3 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -530,6 +530,9 @@ int libxl__arch_domain_construct_memmap(libxl__gc *gc,
         if (dom->acpi_modules[i].length)
             e820_entries++;
 
+    if ( dom->epc_base && dom->epc_size )
+        e820_entries++;
+
     if (e820_entries >= E820MAX) {
         LOGD(ERROR, domid, "Ooops! Too many entries in the memory map!");
         rc = ERROR_INVAL;
@@ -570,6 +573,15 @@ int libxl__arch_domain_construct_memmap(libxl__gc *gc,
         e820[nr].addr = ((uint64_t)1 << 32);
         e820[nr].size = highmem_size;
         e820[nr].type = E820_RAM;
+        nr++;
+    }
+
+    /* EPC */
+    if (dom->epc_base && dom->epc_size) {
+        e820[nr].addr = dom->epc_base;
+        e820[nr].size = dom->epc_size;
+        e820[nr].type = E820_RESERVED;
+        nr++;
     }
 
     if (xc_domain_set_memory_map(CTX->xch, domid, e820, e820_entries) != 0) {
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 856a304b30..4a9be64f78 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1182,6 +1182,11 @@ void parse_config_data(const char *config_source,
 
         if (!xlu_cfg_get_long (config, "rdm_mem_boundary", &l, 0))
             b_info->u.hvm.rdm_mem_boundary_memkb = l * 1024;
+
+        if (!xlu_cfg_get_long (config, "epc", &l, 0)) {
+            /* Get EPC size. EPC base is calculated by toolstack later. */
+            b_info->u.hvm.sgx.epckb = l * 1024;
+        }
         break;
     case LIBXL_DOMAIN_TYPE_PV:
     {
-- 
2.11.0



* [PATCH 14/15] xen: tools: add SGX to applying CPUID policy
  2017-07-09  8:03 [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Kai Huang
                   ` (12 preceding siblings ...)
  2017-07-09  8:14 ` [PATCH 13/15] xen: tools: add new 'epc' parameter support Kai Huang
@ 2017-07-09  8:15 ` Kai Huang
  2017-07-09  8:16 ` [PATCH 15/15] xen: tools: expose EPC in ACPI table Kai Huang
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Kai Huang @ 2017-07-09  8:15 UTC (permalink / raw)
  To: xen-devel; +Cc: wei.liu2, ian.jackson, dave

In libxc, a new structure 'xc_cpuid_policy_build_info_t' is added to carry the
domain's EPC base and size info from libxl. libxl_cpuid_apply_policy is also
changed to take 'libxl_domain_build_info_t' as a parameter, from which the
domain's EPC base and size are obtained and passed to xc_cpuid_apply_policy.
xc_cpuid_apply_policy is extended to support the SGX CPUID leaf. If the
hypervisor doesn't report the SGX feature in the host featureset, then using
the 'epc' parameter results in domain creation failure, as SGX cannot be
supported.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 tools/libxc/include/xenctrl.h       | 10 ++++++
 tools/libxc/xc_cpuid_x86.c          | 68 ++++++++++++++++++++++++++++++++++---
 tools/libxl/libxl.h                 |  3 +-
 tools/libxl/libxl_cpuid.c           | 15 ++++++--
 tools/libxl/libxl_dom.c             |  6 +++-
 tools/libxl/libxl_nocpuid.c         |  4 ++-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 11 +++++-
 tools/python/xen/lowlevel/xc/xc.c   | 11 +++++-
 8 files changed, 117 insertions(+), 11 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 1629f412dd..b621b35dea 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1783,6 +1783,15 @@ int xc_domain_debug_control(xc_interface *xch,
                             uint32_t vcpu);
 
 #if defined(__i386__) || defined(__x86_64__)
+typedef struct xc_cpuid_policy_build_info_sgx {
+    uint64_t epc_base;
+    uint64_t epc_size;
+} xc_cpuid_policy_build_info_sgx_t;
+
+typedef struct xc_cpuid_policy_build_info {
+    xc_cpuid_policy_build_info_sgx_t sgx;
+} xc_cpuid_policy_build_info_t;
+
 int xc_cpuid_check(xc_interface *xch,
                    const unsigned int *input,
                    const char **config,
@@ -1794,6 +1803,7 @@ int xc_cpuid_set(xc_interface *xch,
                  char **config_transformed);
 int xc_cpuid_apply_policy(xc_interface *xch,
                           domid_t domid,
+                          xc_cpuid_policy_build_info_t *b_info,
                           uint32_t *featureset,
                           unsigned int nr_features);
 void xc_cpuid_to_str(const unsigned int *regs,
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 1bedf050b8..b7eb652db9 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -38,7 +38,7 @@ enum {
 #define clear_feature(idx, dst) ((dst) &= ~bitmaskof(idx))
 #define set_feature(idx, dst)   ((dst) |=  bitmaskof(idx))
 
-#define DEF_MAX_BASE 0x0000000du
+#define DEF_MAX_BASE 0x00000012u
 #define DEF_MAX_INTELEXT  0x80000008u
 #define DEF_MAX_AMDEXT    0x8000001cu
 
@@ -178,6 +178,8 @@ struct cpuid_domain_info
     /* HVM-only information. */
     bool pae;
     bool nestedhvm;
+
+    xc_cpuid_policy_build_info_t *b_info;
 };
 
 static void cpuid(const unsigned int *input, unsigned int *regs)
@@ -369,6 +371,12 @@ static void intel_xc_cpuid_policy(xc_interface *xch,
                                   const struct cpuid_domain_info *info,
                                   const unsigned int *input, unsigned int *regs)
 {
+    xc_cpuid_policy_build_info_t *b_info = info->b_info;
+    xc_cpuid_policy_build_info_sgx_t *sgx = NULL;
+
+    if ( b_info )
+        sgx = &b_info->sgx;
+
     switch ( input[0] )
     {
     case 0x00000004:
@@ -381,6 +389,30 @@ static void intel_xc_cpuid_policy(xc_interface *xch,
         regs[3] &= 0x3ffu;
         break;
 
+    case 0x00000012:
+        if ( !sgx ) {
+            regs[0] = regs[1] = regs[2] = regs[3] = 0;
+            break;
+        }
+
+        if ( !sgx->epc_base || !sgx->epc_size ) {
+            regs[0] = regs[1] = regs[2] = regs[3] = 0;
+            break;
+        }
+
+        if ( input[1] == 2 ) {
+            /*
+             * FIX EPC base and size for SGX CPUID leaf 2. Xen hypervisor is
+             * depending on XEN_DOMCTL_set_cpuid to know domain's EPC base
+             * and size.
+             */
+            regs[0] = (uint32_t)(sgx->epc_base & 0xfffff000) | 0x1;
+            regs[1] = (uint32_t)(sgx->epc_base >> 32);
+            regs[2] = (uint32_t)(sgx->epc_size & 0xfffff000) | 0x1;
+            regs[3] = (uint32_t)(sgx->epc_size >> 32);
+        }
+        break;
+
     case 0x80000000:
         if ( regs[0] > DEF_MAX_INTELEXT )
             regs[0] = DEF_MAX_INTELEXT;
@@ -444,6 +476,10 @@ static void xc_cpuid_hvm_policy(xc_interface *xch,
         regs[1] = regs[2] = regs[3] = 0;
         break;
 
+    case 0x00000012:
+        /* Intel SGX. Passthrough to Intel function */
+        break;
+
     case 0x80000000:
         /* Passthrough to cpu vendor specific functions */
         break;
@@ -649,12 +685,13 @@ void xc_cpuid_to_str(const unsigned int *regs, char **strs)
     }
 }
 
-static void sanitise_featureset(struct cpuid_domain_info *info)
+static int sanitise_featureset(struct cpuid_domain_info *info)
 {
     const uint32_t fs_size = xc_get_cpu_featureset_size();
     uint32_t disabled_features[fs_size];
     static const uint32_t deep_features[] = INIT_DEEP_FEATURES;
     unsigned int i, b;
+    xc_cpuid_policy_build_info_t *b_info = info->b_info;
 
     if ( info->hvm )
     {
@@ -707,9 +744,19 @@ static void sanitise_featureset(struct cpuid_domain_info *info)
             disabled_features[i] &= ~dfs[i];
         }
     }
+
+    /* Cannot support 'epc' parameter if SGX is unavailable */
+    if ( b_info && b_info->sgx.epc_base && b_info->sgx.epc_size )
+        if (!test_bit(X86_FEATURE_SGX, info->featureset)) {
+            printf("Xen hypervisor doesn't support SGX.\n");
+            return -EFAULT;
+        }
+
+    return 0;
 }
 
 int xc_cpuid_apply_policy(xc_interface *xch, domid_t domid,
+                          xc_cpuid_policy_build_info_t *b_info,
                           uint32_t *featureset,
                           unsigned int nr_features)
 {
@@ -722,6 +769,8 @@ int xc_cpuid_apply_policy(xc_interface *xch, domid_t domid,
     if ( rc )
         goto out;
 
+    info.b_info = b_info;
+
     cpuid(input, regs);
     base_max = (regs[0] <= DEF_MAX_BASE) ? regs[0] : DEF_MAX_BASE;
     input[0] = 0x80000000;
@@ -732,7 +781,9 @@ int xc_cpuid_apply_policy(xc_interface *xch, domid_t domid,
     else
         ext_max = (regs[0] <= DEF_MAX_INTELEXT) ? regs[0] : DEF_MAX_INTELEXT;
 
-    sanitise_featureset(&info);
+    rc = sanitise_featureset(&info);
+    if ( rc )
+        goto out;
 
     input[0] = 0;
     input[1] = XEN_CPUID_INPUT_UNUSED;
@@ -757,12 +808,21 @@ int xc_cpuid_apply_policy(xc_interface *xch, domid_t domid,
                 continue;
         }
 
+        /* Intel SGX */
+        if ( input[0] == 0x12 )
+        {
+            input[1]++;
+            /* Intel SGX has 3 leaves */
+            if ( input[1] < 3 )
+                continue;
+        }
+
         input[0]++;
         if ( !(input[0] & 0x80000000u) && (input[0] > base_max ) )
             input[0] = 0x80000000u;
 
         input[1] = XEN_CPUID_INPUT_UNUSED;
-        if ( (input[0] == 4) || (input[0] == 7) )
+        if ( (input[0] == 4) || (input[0] == 7) || input[0] == 0x12)
             input[1] = 0;
         else if ( input[0] == 0xd )
             input[1] = 1; /* Xen automatically calculates almost everything. */
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index cf8687aa7e..dad72bf277 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1949,7 +1949,8 @@ libxl_device_pci *libxl_device_pci_assignable_list(libxl_ctx *ctx, int *num);
 int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str);
 int libxl_cpuid_parse_config_xend(libxl_cpuid_policy_list *cpuid,
                                   const char* str);
-void libxl_cpuid_apply_policy(libxl_ctx *ctx, uint32_t domid);
+int libxl_cpuid_apply_policy(libxl_ctx *ctx, uint32_t domid,
+                             libxl_domain_build_info *info);
 void libxl_cpuid_set(libxl_ctx *ctx, uint32_t domid,
                      libxl_cpuid_policy_list cpuid);
 
diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c
index 24591e2461..550258bdf4 100644
--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -332,9 +332,20 @@ int libxl_cpuid_parse_config_xend(libxl_cpuid_policy_list *cpuid,
     return 0;
 }
 
-void libxl_cpuid_apply_policy(libxl_ctx *ctx, uint32_t domid)
+int libxl_cpuid_apply_policy(libxl_ctx *ctx, uint32_t domid,
+                             libxl_domain_build_info *info)
 {
-    xc_cpuid_apply_policy(ctx->xch, domid, NULL, 0);
+    xc_cpuid_policy_build_info_t cpuid_binfo;
+
+    memset(&cpuid_binfo, 0, sizeof (xc_cpuid_policy_build_info_t));
+
+    /* Currently only Intel SGX needs info when applying CPUID policy */
+    if (info->type == LIBXL_DOMAIN_TYPE_HVM) {
+        cpuid_binfo.sgx.epc_base = info->u.hvm.sgx.epcbase;
+        cpuid_binfo.sgx.epc_size = (info->u.hvm.sgx.epckb << 10);
+    }
+
+    return xc_cpuid_apply_policy(ctx->xch, domid, &cpuid_binfo, NULL, 0);
 }
 
 void libxl_cpuid_set(libxl_ctx *ctx, uint32_t domid,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 6d1d51d35d..9d05d2813e 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -535,7 +535,11 @@ int libxl__build_post(libxl__gc *gc, uint32_t domid,
         return ERROR_FAIL;
     }
 
-    libxl_cpuid_apply_policy(ctx, domid);
+    rc = libxl_cpuid_apply_policy(ctx, domid, info);
+    if (rc) {
+        LOG(ERROR, "Failed to apply CPUID policy (%d)", rc);
+        return ERROR_FAIL;
+    }
     if (info->cpuid != NULL)
         libxl_cpuid_set(ctx, domid, info->cpuid);
 
diff --git a/tools/libxl/libxl_nocpuid.c b/tools/libxl/libxl_nocpuid.c
index ef1161c434..70e0486e98 100644
--- a/tools/libxl/libxl_nocpuid.c
+++ b/tools/libxl/libxl_nocpuid.c
@@ -34,8 +34,10 @@ int libxl_cpuid_parse_config_xend(libxl_cpuid_policy_list *cpuid,
     return 0;
 }
 
-void libxl_cpuid_apply_policy(libxl_ctx *ctx, uint32_t domid)
+int libxl_cpuid_apply_policy(libxl_ctx *ctx, uint32_t domid,
+                             libxl_domain_build_info *info)
 {
+    return 0;
 }
 
 void libxl_cpuid_set(libxl_ctx *ctx, uint32_t domid,
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 5e455519d4..34f90bc630 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -796,7 +796,16 @@ CAMLprim value stub_xc_domain_cpuid_apply_policy(value xch, value domid)
 #if defined(__i386__) || defined(__x86_64__)
 	int r;
 
-	r = xc_cpuid_apply_policy(_H(xch), _D(domid), NULL, 0);
+    /*
+     * FIXME:
+     *
+     * Don't support passing SGX info to xc_cpuid_apply_policy here. To be
+     * honest I don't know the purpose of this CAML function, so I don't
+     * know whether we need to allow *caller* of this function to pass SGX
+     * info. As the EPC base is calculated internally by the toolstack, I
+     * think it is also impossible to pass the EPC base from the *user*.
+     */
+	r = xc_cpuid_apply_policy(_H(xch), _D(domid), NULL, NULL, 0);
 	if (r < 0)
 		failwith_xc(_H(xch));
 #else
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 5d112af6e0..a3e753589e 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -742,7 +742,16 @@ static PyObject *pyxc_dom_set_policy_cpuid(XcObject *self,
     if ( !PyArg_ParseTuple(args, "i", &domid) )
         return NULL;
 
-    if ( xc_cpuid_apply_policy(self->xc_handle, domid, NULL, 0) )
+    /*
+     * FIXME:
+     *
+     * Don't support passing SGX info to xc_cpuid_apply_policy here. To be
+     * honest I don't know the purpose of this python function, so I don't
+     * know whether we need to allow *caller* of this function to pass SGX
+     * info. As the EPC base is calculated internally by the toolstack, I
+     * think it is also impossible to pass the EPC base from the *user*.
+     */
+    if ( xc_cpuid_apply_policy(self->xc_handle, domid, NULL, NULL, 0) )
         return pyxc_error_to_exception(self->xc_handle);
 
     Py_INCREF(zero);
-- 
2.11.0



* [PATCH 15/15] xen: tools: expose EPC in ACPI table
  2017-07-09  8:03 [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Kai Huang
                   ` (13 preceding siblings ...)
  2017-07-09  8:15 ` [PATCH 14/15] xen: tools: add SGX to applying CPUID policy Kai Huang
@ 2017-07-09  8:16 ` Kai Huang
  2017-07-12 11:05   ` Andrew Cooper
                     ` (2 more replies)
  2017-07-11 14:13 ` [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Andrew Cooper
  2017-07-17  9:16 ` Wei Liu
  16 siblings, 3 replies; 58+ messages in thread
From: Kai Huang @ 2017-07-09  8:16 UTC (permalink / raw)
  To: xen-devel; +Cc: wei.liu2, ian.jackson, jbeulich, andrew.cooper3

On a physical machine the EPC is exposed in the ACPI table via the "INT0E0C"
device. Although the EPC can be discovered via CPUID, the Windows SGX driver
requires the EPC to be exposed in the ACPI table as well. This patch exposes
the EPC in the guest's ACPI table.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 tools/firmware/hvmloader/util.c  | 23 +++++++++++++++++++
 tools/firmware/hvmloader/util.h  |  3 +++
 tools/libacpi/build.c            |  3 +++
 tools/libacpi/dsdt.asl           | 49 ++++++++++++++++++++++++++++++++++++++++
 tools/libacpi/dsdt_acpi_info.asl |  6 +++--
 tools/libacpi/libacpi.h          |  1 +
 tools/libxl/libxl_x86_acpi.c     |  3 +++
 7 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index db5f240bb9..4a1da2d63a 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -330,6 +330,15 @@ cpuid(uint32_t idx, uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
         : "0" (idx) );
 }
 
+void cpuid_count(uint32_t idx, uint32_t count, uint32_t *eax,
+                 uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
+{
+    asm volatile (
+        "cpuid"
+        : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
+        : "0" (idx), "c" (count) );
+}
+
 static const char hex_digits[] = "0123456789abcdef";
 
 /* Write a two-character hex representation of 'byte' to digits[].
@@ -888,6 +897,18 @@ static uint8_t acpi_lapic_id(unsigned cpu)
     return LAPIC_ID(cpu);
 }
 
+static void get_epc_info(struct acpi_config *config)
+{
+    uint32_t eax, ebx, ecx, edx;
+
+    cpuid_count(0x12, 0x2, &eax, &ebx, &ecx, &edx);
+
+    config->epc_base = (((uint64_t)(ebx & 0xfffff)) << 32) |
+                       (uint64_t)(eax & 0xfffff000);
+    config->epc_size = (((uint64_t)(edx & 0xfffff)) << 32) |
+                       (uint64_t)(ecx & 0xfffff000);
+}
+
 void hvmloader_acpi_build_tables(struct acpi_config *config,
                                  unsigned int physical)
 {
@@ -920,6 +941,8 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
         config->pci_hi_len = pci_hi_mem_end - pci_hi_mem_start;
     }
 
+    get_epc_info(config);
+
     s = xenstore_read("platform/generation-id", "0:0");
     if ( s )
     {
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 6062f0b8cf..deac0abb86 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -112,6 +112,9 @@ int hpet_exists(unsigned long hpet_base);
 void cpuid(uint32_t idx, uint32_t *eax, uint32_t *ebx,
            uint32_t *ecx, uint32_t *edx);
 
+void cpuid_count(uint32_t idx, uint32_t count, uint32_t *eax,
+                 uint32_t *ebx, uint32_t *ecx, uint32_t *edx);
+
 /* Read the TSC register. */
 static inline uint64_t rdtsc(void)
 {
diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index f9881c9604..9d64856e26 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -54,6 +54,7 @@ struct acpi_info {
     uint32_t madt_lapic0_addr;  /* 16   - Address of first MADT LAPIC struct */
     uint32_t vm_gid_addr;       /* 20   - Address of VM generation id buffer */
     uint64_t pci_hi_min, pci_hi_len; /* 24, 32 - PCI I/O hole boundaries */
+    uint64_t epc_min, epc_len;  /* 40, 48 - EPC region */
 };
 
 static void set_checksum(
@@ -535,6 +536,8 @@ int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config)
         acpi_info->pci_hi_min = config->pci_hi_start;
         acpi_info->pci_hi_len = config->pci_hi_len;
     }
+    acpi_info->epc_min = config->epc_base;
+    acpi_info->epc_len = config->epc_size;
 
     /*
      * Fill in high-memory data structures, starting at @buf.
diff --git a/tools/libacpi/dsdt.asl b/tools/libacpi/dsdt.asl
index fa8ff317b2..25ce196028 100644
--- a/tools/libacpi/dsdt.asl
+++ b/tools/libacpi/dsdt.asl
@@ -441,6 +441,55 @@ DefinitionBlock ("DSDT.aml", "DSDT", 2, "Xen", "HVM", 0)
                 }
             }
         }
+
+        Device (EPC)
+        {
+            Name (_HID, EisaId ("INT0E0C"))
+            Name (_STR, Unicode ("Enclave Page Cache 1.5"))
+            Name (_MLS, Package (0x01)
+            {
+                Package (0x02)
+                {
+                    "en",
+                    Unicode ("Enclave Page Cache 1.5")
+                }
+            })
+            Name (RBUF, ResourceTemplate ()
+            {
+                QWordMemory (ResourceConsumer, PosDecode, MinFixed, MaxFixed,
+                    Cacheable, ReadWrite,
+                    0x0000000000000000, // Granularity
+                    0x0000000000000000, // Range Minimum
+                    0x0000000000000000, // Range Maximum
+                    0x0000000000000000, // Translation Offset
+                    0x0000000000000001, // Length
+                    ,, _Y03,
+                    AddressRangeMemory, TypeStatic)
+            })
+
+            Method(_CRS, 0, NotSerialized) // _CRS: Current Resource Settings
+            {
+                CreateQwordField (RBUF, \_SB.EPC._Y03._MIN, EMIN) // _MIN: Minimum Base Address
+                CreateQwordField (RBUF, \_SB.EPC._Y03._MAX, EMAX) // _MAX: Maximum Base Address
+                CreateQwordField (RBUF, \_SB.EPC._Y03._LEN, ELEN) // _LEN: Length
+                Store(\_SB.EMIN, EMIN)
+                Store(\_SB.ELEN, ELEN)
+                Add(EMIN, ELEN, EMAX)
+                Subtract(EMAX, One, EMAX)
+
+                Return (RBUF)
+            }
+
+            Method(_STA, 0, NotSerialized) // _STA: Status
+            {
+                IF ((\_SB.ELEN != Zero))
+                {
+                    Return (0x0F)
+                }
+
+                Return (Zero)
+            }
+        }
     }
     /* _S3 and _S4 are in separate SSDTs */
     Name (\_S5, Package (0x04) {
diff --git a/tools/libacpi/dsdt_acpi_info.asl b/tools/libacpi/dsdt_acpi_info.asl
index 0136dce55c..ac6b14f82f 100644
--- a/tools/libacpi/dsdt_acpi_info.asl
+++ b/tools/libacpi/dsdt_acpi_info.asl
@@ -5,7 +5,7 @@
         * BIOS region must match struct acpi_info in build.c and
         * be located at ACPI_INFO_PHYSICAL_ADDRESS = 0xFC000000
         */
-       OperationRegion(BIOS, SystemMemory, 0xFC000000, 40)
+       OperationRegion(BIOS, SystemMemory, 0xFC000000, 56)
        Field(BIOS, ByteAcc, NoLock, Preserve) {
            UAR1, 1,
            UAR2, 1,
@@ -21,6 +21,8 @@
            LMIN, 32,
            HMIN, 32,
            LLEN, 32,
-           HLEN, 32
+           HLEN, 32,
+           EMIN, 64,
+           ELEN, 64,
        }
     }
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index 2ed1ecfc8e..5645e0866b 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -63,6 +63,7 @@ struct acpi_config {
     /* PCI I/O hole */
     uint32_t pci_start, pci_len;
     uint64_t pci_hi_start, pci_hi_len;
+    uint64_t epc_base, epc_size;
 
     uint32_t table_flags;
     uint8_t acpi_revision;
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index c0a6e321ec..0d62a76590 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -156,6 +156,9 @@ static int init_acpi_config(libxl__gc *gc,
     config->lapic_id = acpi_lapic_id;
     config->acpi_revision = 5;
 
+    config->epc_base = b_info->u.hvm.sgx.epcbase;
+    config->epc_size = (b_info->u.hvm.sgx.epckb << 10);
+
     rc = 0;
 out:
     return rc;
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches
  2017-07-09  8:03 [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Kai Huang
                   ` (14 preceding siblings ...)
  2017-07-09  8:16 ` [PATCH 15/15] xen: tools: expose EPC in ACPI table Kai Huang
@ 2017-07-11 14:13 ` Andrew Cooper
  2017-07-17  6:08   ` Huang, Kai
  2017-07-17  9:16 ` Wei Liu
  16 siblings, 1 reply; 58+ messages in thread
From: Andrew Cooper @ 2017-07-11 14:13 UTC (permalink / raw)
  To: Kai Huang, xen-devel
  Cc: kevin.tian, sstabellini, wei.liu2, George.Dunlap, tim,
	ian.jackson, jbeulich

On 09/07/17 09:03, Kai Huang wrote:
> Hi all,
>
> This series is RFC Xen SGX virtualization support design and RFC draft patches.

Thank you very much for this design doc.

> 2. SGX Virtualization Design
>
> 2.1 High Level Toolstack Changes:
>
> 2.1.1 New 'epc' parameter
>
> EPC is a limited resource. In order to use EPC efficiently among all domains,
> the administrator should be able to specify a domain's virtual EPC size when
> creating the guest, and should also be able to query each domain's virtual
> EPC size.
>
> For this purpose, a new 'epc = <size>' parameter is added to XL configuration
> file. This parameter specifies the guest's virtual EPC size. The EPC base
> address will be calculated by the toolstack internally, according to the
> guest's memory size, MMIO size, etc. 'epc' is in MB units, and any 1MB-aligned
> value will be accepted.
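For illustration, a guest configuration fragment using the proposed parameter might look like the following (names and sizes are invented examples, not recommendations):

```
# Illustrative xl guest config using the proposed 'epc' parameter.
name    = "sgx-guest"
builder = "hvm"
memory  = 4096
vcpus   = 2
epc     = 64        # 64MB of virtual EPC; any 1MB-aligned value
```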

How will this interact with multi-package servers?  Even though it's fine 
to implement the single-package support first, the design should be 
extensible to the multi-package case.

First of all, what are the implications of multi-package SGX?

(Somewhere) you mention changes to scheduling.  I presume this is 
because a guest with EPC mappings in EPT must be scheduled on the same 
package, or ENCLU[EENTER] will fail.  I presume also that each package 
will have separate, unrelated private keys?

I presume there is no sensible way (even on native) for a single logical 
process to use multiple different enclaves?  By extension, does it make 
sense to try and offer parts of multiple enclaves to a single VM?

> 2.1.3 Notify domain's virtual EPC base and size to Xen
>
> Xen needs to know the guest's EPC base and size in order to populate EPC pages
> for it. The toolstack notifies the EPC base and size to Xen via
> XEN_DOMCTL_set_cpuid.

I am currently in the process of reworking the Xen/Toolstack interface 
when it comes to CPUID handling.  The latest design is available here: 
https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg00378.html 
but the end result will be the toolstack expressing its CPUID policy in 
terms of the architectural layout.

Therefore, I would expect that, however the setting is represented in 
the configuration file, xl/libxl would configure it with the hypervisor 
by setting CPUID.0x12[2] with the appropriate base and size.

> 2.1.4 Launch Control Support (?)
>
> Xen Launch Control support is about supporting multiple domains, each running
> its own LE signed by a different owner (if HW allows, explained below). As
> explained in 1.4 SGX Launch Control, EINIT for an LE (Launch Enclave) only
> succeeds when SHA256(SIGSTRUCT.modulus) matches IA32_SGXLEPUBKEYHASHn, and
> EINIT for other enclaves will derive the EINITTOKEN key according to
> IA32_SGXLEPUBKEYHASHn. Therefore, to support this, the guest's virtual
> IA32_SGXLEPUBKEYHASHn must be updated to the physical MSRs before EINIT (which
> also means the physical IA32_SGXLEPUBKEYHASHn needs to be *unlocked* in the
> BIOS before booting to the OS).
>
> For a physical machine, it is the BIOS writer's decision whether the BIOS
> provides an interface for the user to specify a customized
> IA32_SGXLEPUBKEYHASHn (it defaults to the digest of Intel's signing key after
> reset). In reality, the OS's SGX driver may require the BIOS to leave the MSRs
> *unlocked* and actively write the hash value to the MSRs in order to run EINIT
> successfully; in this case, the driver will not depend on the BIOS's
> capability (whether it allows the user to customize the IA32_SGXLEPUBKEYHASHn
> value).
>
> The problem is for Xen: do we need a new parameter, such as 'lehash=<SHA256>',
> to specify the default value of the guest's virtual IA32_SGXLEPUBKEYHASHn? And
> do we need a new parameter, such as 'lewr', to specify whether the guest's
> virtual MSRs are locked or not before handing over to the guest's OS?
>
> I tend not to introduce 'lehash', as it seems the SGX driver will actively
> update the MSRs, and a new parameter would require additional changes in
> upper-layer software (such as OpenStack). And 'lewr' is not needed either, as
> Xen can always *unlock* the MSRs for the guest.
>
> Please comment.
>
> Currently in my RFC patches the above two parameters are not implemented.
> The Xen hypervisor will always *unlock* the MSRs. Whether there is a 'lehash'
> parameter or not doesn't impact the Xen hypervisor's emulation of
> IA32_SGXLEPUBKEYHASHn. See the Xen hypervisor changes below for details.

Reading around, am I correct with the following?

1) Some processors have no launch control.  There is no restriction on 
which enclaves can boot.

2) Some Skylake client processors claim to have launch control, but the 
MSRs are unavailable (is this an erratum?).  These are limited to 
booting enclaves matching the Intel public key.

3) Launch control may be locked by the BIOS.  There may be a custom 
hash, or it might be the Intel default.  Xen can't adjust it at all, but 
can support running any number of VMs with matching enclaves.

4) Launch control may be unlocked by the BIOS.  In this case, Xen can 
context switch a hash per domain, and run all enclaves.

The eventual plans for CPUID and MSR levelling should allow all of these 
to be expressed in sensible ways, and I don't foresee any issues with 
supporting all of these scenarios.
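For scenario 4, the per-domain MSR handling could be modelled roughly as follows. This is a toy sketch with invented names; wrmsr is stubbed out, and counting the writes shows the lazy-update behaviour one would want on the context-switch path:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of lazily restoring a guest's virtual
 * IA32_SGXLEPUBKEYHASH{0..3} before it runs EINIT, for the unlocked-BIOS
 * case.  All names are illustrative, not real Xen code. */
static uint64_t phys_hash[4];  /* stand-in for the four physical MSRs */
static unsigned int nr_wrmsr;  /* counts the (expensive) MSR writes */

static void wrmsr_hash(unsigned int i, uint64_t val)
{
    phys_hash[i] = val;   /* real code would do an actual WRMSR here */
    nr_wrmsr++;
}

/* Skip writes that would be no-ops, so repeatedly running the same
 * domain costs no MSR writes. */
static void sgx_restore_lehash(const uint64_t vcpu_hash[4])
{
    for ( unsigned int i = 0; i < 4; i++ )
        if ( phys_hash[i] != vcpu_hash[i] )
            wrmsr_hash(i, vcpu_hash[i]);
}
```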



> 2.2 High Level Xen Hypervisor Changes:
>
> 2.2.1 EPC Management (?)
>
> The Xen hypervisor needs to detect SGX, discover EPC, and manage EPC before
> exposing SGX to guests. EPC is detected via SGX CPUID 0x12.0x2. It's possible
> that there are multiple EPC sections (enumerated via sub-leaves 0x3 and so on,
> until an invalid EPC section is reported), but this is only true on
> multi-socket server machines. For server machines additional things also need
> to be done, such as NUMA EPC, scheduling, etc. We will support server machines
> in the future, but currently we only support one EPC section.
>
> EPC is reported as reserved memory (so it is not reported as normal memory).
> EPC must be managed in 4K pages. CPU hardware uses the EPCM to track the
> status of each EPC page. Xen needs to manage EPC and provide functions to,
> e.g., allocate and free EPC pages for guests.
>
> There are two ways to manage EPC: manage EPC separately, or integrate it into
> the existing memory management framework.
>
> It is easy to manage EPC separately, as currently EPC is pretty small (~100MB),
> and we can even put it in a single list. However it is not flexible: for
> example, you will have to write new algorithms when EPC becomes larger, e.g.,
> GBs. And you will have to write new code to support NUMA EPC (although this
> will not come in the short term).
>
> Integrating EPC into the existing memory management framework seems more
> reasonable, as in this way we can reuse memory management data
> structures/algorithms, and it will be more flexible to support larger EPC and
> potentially NUMA EPC. But modifying the MM framework has a higher risk of
> breaking existing memory management code (potentially more bugs).
>
> In my RFC patches we currently choose to manage EPC separately. A new
> structure, epc_page, is added to represent a single 4K EPC page. A whole array
> of struct epc_page will be allocated during EPC initialization, so that the
> PFN of an EPC page and its 'struct epc_page' can each be derived from the
> other by simple offset arithmetic.
>
> But maybe integrating EPC into the MM framework is more reasonable. Comments?
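The flat-array layout described above could be sketched roughly as follows (names and fields are illustrative, not the actual RFC patch code):

```c
#include <assert.h>

/* Toy sketch of the "manage EPC separately" layout: one struct epc_page
 * per 4K EPC page in a flat array, so converting between an EPC page's
 * PFN and its struct epc_page is plain offset arithmetic. */
struct epc_page {
    struct epc_page *next;              /* e.g. free-list linkage */
};

static struct epc_page *epc_page_array; /* allocated at EPC init time */
static unsigned long epc_base_pfn;      /* first PFN of the EPC range */

static struct epc_page *epc_pfn_to_page(unsigned long pfn)
{
    return epc_page_array + (pfn - epc_base_pfn);
}

static unsigned long epc_page_to_pfn(const struct epc_page *page)
{
    return epc_base_pfn + (unsigned long)(page - epc_page_array);
}
```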
>
> 2.2.2 EPC Virtualization (?)

It looks like managing the EPC is very similar to managing the NVDIMM 
ranges.  We have a (set of) physical address ranges which need to be 
assigned to different domains at 4k ownership granularity.

I think integrating this into struct page_struct is the better way to go.

>
> This part is how to populate EPC for guests. We have 3 choices:
>      - Static Partitioning
>      - Oversubscription
>      - Ballooning
>
> Static Partitioning means all EPC pages will be allocated and mapped to the
> guest when it is created, and there is no runtime change of page-table
> mappings for EPC pages. Oversubscription means the Xen hypervisor supports EPC
> page swapping between domains, meaning Xen is able to evict an EPC page from
> another domain and assign it to the domain that needs the EPC. With
> oversubscription, EPC can be assigned to a domain on demand, when an EPT
> violation happens. Ballooning is similar to memory ballooning: it is basically
> "Static Partitioning" + "Balloon driver" in the guest.
>
> Static Partitioning is the easiest way in terms of implementation, and there
> will be no hypervisor overhead (except EPT overhead, of course), because in
> "Static Partitioning" there is no EPT violation for EPC, and Xen doesn't need
> to turn on ENCLS VMEXIT for the guest, as ENCLS runs perfectly in non-root
> mode.
>
> Ballooning is "Static Partitioning" + "Balloon driver" in the guest. Like
> "Static Partitioning", ballooning doesn't need to turn on ENCLS VMEXIT, and
> doesn't have EPT violations for EPC either. To support ballooning, we need a
> balloon driver in the guest to issue hypercalls to give up or reclaim EPC
> pages. In terms of the hypercall, we have two choices: 1) add a new hypercall
> for EPC ballooning; 2) use the existing XENMEM_{increase/decrease}_reservation
> with a new memory flag, e.g., XENMEMF_epc. I'll discuss adding a dedicated
> hypercall or not later.
>
> Oversubscription looks nice but it requires a more complicated implementation.
> Firstly, as explained in 1.3.3 EPC Eviction & Reload, we need to follow
> specific steps to evict EPC pages, and in order to do that, Xen basically
> needs to trap ENCLS from the guest and keep track of EPC page status and
> enclave info from all guests. This is because:
>      - To evict a regular EPC page, Xen needs to know the SECS location.
>      - Xen needs to know the EPC page type: evicting regular EPC pages and
>        evicting SECS or VA pages involve different steps.
>      - Xen needs to know the EPC page status: whether the page is blocked or
>        not.
>
> That information can only be obtained by trapping ENCLS from the guest and
> parsing its parameters (to identify the SECS page, etc). Parsing ENCLS
> parameters means we need to know which ENCLS leaf is being trapped, and we
> need to translate the guest's virtual address to get the physical address in
> order to locate the EPC page. And once ENCLS is trapped, we have to emulate
> ENCLS in Xen, which means we need to reconstruct the ENCLS parameters by
> remapping all of the guest's virtual addresses to Xen's virtual addresses
> (gva->gpa->pa->xen_va), as ENCLS always uses *effective addresses*, which are
> translated by the processor when running ENCLS.
>
>      --------------------------------------------------------------
>                  |   ENCLS   |
>      --------------------------------------------------------------
>                  |          /|\
>      ENCLS VMEXIT|           | VMENTRY
>                  |           |
>                 \|/          |
>
> 		1) parse ENCLS parameters
> 		2) reconstruct(remap) guest's ENCLS parameters
> 		3) run ENCLS on behalf of guest (and skip ENCLS)
> 		4) on success, update EPC/enclave info, or inject error
>
> And Xen needs to maintain each EPC page's status (type, blocked or not, in an
> enclave or not, etc). Xen also needs to maintain all enclaves' info from all
> guests, in order to find the correct SECS for a regular EPC page, as well as
> the enclave's linear address.
>
> So in general, "Static Partitioning" has the simplest implementation, but is
> obviously not the best way to use EPC efficiently; "Ballooning" has all the
> pros of Static Partitioning but requires a guest balloon driver;
> "Oversubscription" is best in terms of flexibility but requires a complicated
> hypervisor implementation.
>
> We have implemented "Static Partitioning" in the RFC patches, but need your
> feedback on whether it is enough. If not, which one should we do at the next
> stage -- Ballooning or Oversubscription? IMO Ballooning may be good enough,
> given the fact that currently memory is also "Static Partitioning" +
> "Ballooning".
>
> Comments?

Definitely go for static partitioning to begin with.  This is far 
simpler to implement.

I can't see a pressing usecase for oversubscription or ballooning.  Any 
datacenter work will be using exclusively static, and I expect static 
will be fine for all (or at least, most) client usecases.

>
> 2.2.3 Populate EPC for Guest
>
> The toolstack notifies Xen about the domain's EPC base and size via
> XEN_DOMCTL_set_cpuid, so currently Xen populates all EPC pages for the guest
> in XEN_DOMCTL_set_cpuid, particularly in handling XEN_DOMCTL_set_cpuid for
> CPUID.0x12.0x2. Once Xen has checked that the values passed from the toolstack
> are valid, Xen will allocate all EPC pages and set up EPT mappings for the
> guest.
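The populate-at-creation behaviour described above can be modelled as a toy sketch (names and numbers are invented): either every requested page is accounted for, or the DOMCTL fails and domain creation fails cleanly.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of all-at-creation EPC population: no partial allocation,
 * so a shortage of free EPC pages simply fails domain creation. */
static unsigned long free_epc_pages = 24;   /* pretend 96K of free EPC */

static bool populate_epc(unsigned long *domain_epc_pages, unsigned long nr)
{
    if ( nr > free_epc_pages )
        return false;  /* toolstack sees the DOMCTL fail -> create fails */
    free_epc_pages -= nr;
    *domain_epc_pages += nr;
    return true;
}
```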
>
> 2.2.4 New Dedicated Hypercall (?)

All this information should (eventually) be available via the 
appropriate SYSCTL_get_{cpuid,msr}_policy hypercalls.  I don't see any 
need for dedicated hypercalls.

> 2.2.9 Guest Suspend & Resume
>
> On hardware, EPC is destroyed when power goes to S3-S5, so Xen will destroy
> the guest's EPC when the guest's power goes into S3-S5. Currently Xen is
> notified of the S-state change by Qemu via HVM_PARAM_ACPI_S_STATE, and Xen
> will destroy the EPC if the S state is S3-S5.
>
> Specifically, Xen will run EREMOVE on each of the guest's EPC pages, as the
> guest may not handle EPC suspend & resume correctly, in which case the guest's
> EPC pages may physically still be valid, so Xen needs to run EREMOVE to make
> sure all EPC pages become invalid. Otherwise further operations on EPC in the
> guest may fault, as the guest assumes all EPC pages are invalid after it is
> resumed.
>
> For a SECS page, EREMOVE may fault with SGX_CHILD_PRESENT, in which case Xen
> will put the SECS page on a list and run EREMOVE on it again after EREMOVE has
> been run on all other EPC pages. This time the EREMOVE on the SECS will
> succeed, as all its children (regular EPC pages) have already been removed.
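The two-pass EREMOVE described above can be modelled as a self-contained toy (the types, the fixed-size retry list, and the eremove stub are all invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum epc_type { EPC_REGULAR, EPC_SECS, EPC_VA };

struct page {
    enum epc_type type;
    struct page *secs;   /* parent SECS for regular pages */
    bool valid;
};

/* Mimics EREMOVE: removing a SECS that still has valid children fails
 * (SGX_CHILD_PRESENT); everything else succeeds. */
static bool eremove(struct page *pages, size_t n, struct page *p)
{
    if ( p->type == EPC_SECS )
        for ( size_t i = 0; i < n; i++ )
            if ( pages[i].valid && pages[i].secs == p )
                return false;
    p->valid = false;
    return true;
}

static void remove_all(struct page *pages, size_t n)
{
    struct page *retry[16];
    size_t nretry = 0;

    for ( size_t i = 0; i < n; i++ )
        if ( pages[i].valid && !eremove(pages, n, &pages[i]) )
            retry[nretry++] = &pages[i]; /* SECS with children: defer */

    for ( size_t i = 0; i < nretry; i++ )
        eremove(pages, n, retry[i]);     /* children gone: now succeeds */
}
```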
>
> 2.2.10 Destroying Domain
>
> Normally Xen just frees all EPC pages for a domain when it is destroyed. But
> Xen will also do EREMOVE on all the guest's EPC pages (described in 2.2.7
> above) before freeing them, as the guest may shut down unexpectedly (e.g., the
> user kills the guest), in which case the guest's EPC may still be valid.
>
> 2.3 Additional Point: Live Migration, Snapshot Support (?)

How big is the EPC?  If we are talking MB rather than GB, movement of 
the EPC could be after the pause, which would add some latency to live 
migration but should work.  I expect that people would prefer to have 
the flexibility of migration even at the cost of extra latency.

~Andrew


* Re: [PATCH 04/15] xen: mm: add ioremap_cache
  2017-07-09  8:10 ` [PATCH 04/15] xen: mm: add ioremap_cache Kai Huang
@ 2017-07-11 20:14   ` Julien Grall
  2017-07-12  1:52     ` Huang, Kai
  2017-07-12  6:17     ` Jan Beulich
  0 siblings, 2 replies; 58+ messages in thread
From: Julien Grall @ 2017-07-11 20:14 UTC (permalink / raw)
  To: Kai Huang, xen-devel; +Cc: andrew.cooper3, jbeulich

Hi,

On 07/09/2017 09:10 AM, Kai Huang wrote:
> Currently Xen only has a non-cacheable version of ioremap. Although EPC is
> reported as reserved memory in the e820 map, it can be mapped as cacheable.
> This patch adds ioremap_cache (a cacheable version of ioremap).
> 
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> ---
>   xen/arch/x86/mm.c      | 15 +++++++++++++--
>   xen/include/xen/vmap.h |  1 +

First of all, this is common code and the "REST" maintainers should have 
been CCed for this include.

But xen/include/xen/vmap.h is common code and this is going to break ARM. 
We already have an inline implementation of ioremap_nocache. You should 
move the definition into x86-specific headers.

Please make sure to at least build test ARM when touching common code.

Cheers,

>   2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 101ab33193..d0b6b3a247 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -6284,9 +6284,10 @@ void *__init arch_vmap_virt_end(void)
>       return (void *)fix_to_virt(__end_of_fixed_addresses);
>   }
>   
> -void __iomem *ioremap(paddr_t pa, size_t len)
> +static void __iomem *__ioremap(paddr_t pa, size_t len, bool_t cache)
>   {
>       mfn_t mfn = _mfn(PFN_DOWN(pa));
> +    unsigned int flags = cache ? PAGE_HYPERVISOR : PAGE_HYPERVISOR_NOCACHE;
>       void *va;
>   
>       WARN_ON(page_is_ram_type(mfn_x(mfn), RAM_TYPE_CONVENTIONAL));
> @@ -6299,12 +6300,22 @@ void __iomem *ioremap(paddr_t pa, size_t len)
>           unsigned int offs = pa & (PAGE_SIZE - 1);
>           unsigned int nr = PFN_UP(offs + len);
>   
> -        va = __vmap(&mfn, nr, 1, 1, PAGE_HYPERVISOR_NOCACHE, VMAP_DEFAULT) + offs;
> +        va = __vmap(&mfn, nr, 1, 1, flags, VMAP_DEFAULT) + offs;
>       }
>   
>       return (void __force __iomem *)va;
>   }
>   
> +void __iomem *ioremap(paddr_t pa, size_t len)
> +{
> +    return __ioremap(pa, len, false);
> +}
> +
> +void __iomem *ioremap_cache(paddr_t pa, size_t len)
> +{
> +    return __ioremap(pa, len, true);
> +}
> +
>   int create_perdomain_mapping(struct domain *d, unsigned long va,
>                                unsigned int nr, l1_pgentry_t **pl1tab,
>                                struct page_info **ppg)
> diff --git a/xen/include/xen/vmap.h b/xen/include/xen/vmap.h
> index 369560e620..f6037e368c 100644
> --- a/xen/include/xen/vmap.h
> +++ b/xen/include/xen/vmap.h
> @@ -24,6 +24,7 @@ void *vzalloc(size_t size);
>   void vfree(void *va);
>   
>   void __iomem *ioremap(paddr_t, size_t);
> +void __iomem *ioremap_cache(paddr_t, size_t);
>   
>   static inline void iounmap(void __iomem *va)
>   {
> 

-- 
Julien Grall


* Re: [PATCH 04/15] xen: mm: add ioremap_cache
  2017-07-11 20:14   ` Julien Grall
@ 2017-07-12  1:52     ` Huang, Kai
  2017-07-12  7:13       ` Julien Grall
  2017-07-12  6:17     ` Jan Beulich
  1 sibling, 1 reply; 58+ messages in thread
From: Huang, Kai @ 2017-07-12  1:52 UTC (permalink / raw)
  To: Julien Grall, Kai Huang, xen-devel; +Cc: andrew.cooper3, jbeulich

Hi Julien,

Thanks for pointing out. I'll move to x86 specific.

I've cc-ed all maintainers reported by ./scripts/get_maintainer.pl, but 
it looks like this script doesn't report all maintainers. Sorry. I'll 
add the ARM maintainers next time.

Thanks,
-Kai

On 7/12/2017 8:14 AM, Julien Grall wrote:
> Hi,
> 
> On 07/09/2017 09:10 AM, Kai Huang wrote:
>> Currently Xen only has a non-cacheable version of ioremap. Although EPC is
>> reported as reserved memory in the e820 map, it can be mapped as cacheable.
>> This
>> patch adds ioremap_cache (a cacheable version of ioremap).
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>> ---
>>   xen/arch/x86/mm.c      | 15 +++++++++++++--
>>   xen/include/xen/vmap.h |  1 +
> 
> First of all, this is common code and the "REST" maintainers should have 
> been CCed for this include.
> 
> But xen/include/xen/vmap.h is common code and this is going to break ARM. 
> We already have an inline implementation of ioremap_nocache. You should 
> move the definition into x86-specific headers.
> 
> Please make sure to at least build test ARM when touching common code.
> 
> Cheers,
> 
>>   2 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
>> index 101ab33193..d0b6b3a247 100644
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -6284,9 +6284,10 @@ void *__init arch_vmap_virt_end(void)
>>       return (void *)fix_to_virt(__end_of_fixed_addresses);
>>   }
>> -void __iomem *ioremap(paddr_t pa, size_t len)
>> +static void __iomem *__ioremap(paddr_t pa, size_t len, bool_t cache)
>>   {
>>       mfn_t mfn = _mfn(PFN_DOWN(pa));
>> +    unsigned int flags = cache ? PAGE_HYPERVISOR : 
>> PAGE_HYPERVISOR_NOCACHE;
>>       void *va;
>>       WARN_ON(page_is_ram_type(mfn_x(mfn), RAM_TYPE_CONVENTIONAL));
>> @@ -6299,12 +6300,22 @@ void __iomem *ioremap(paddr_t pa, size_t len)
>>           unsigned int offs = pa & (PAGE_SIZE - 1);
>>           unsigned int nr = PFN_UP(offs + len);
>> -        va = __vmap(&mfn, nr, 1, 1, PAGE_HYPERVISOR_NOCACHE, 
>> VMAP_DEFAULT) + offs;
>> +        va = __vmap(&mfn, nr, 1, 1, flags, VMAP_DEFAULT) + offs;
>>       }
>>       return (void __force __iomem *)va;
>>   }
>> +void __iomem *ioremap(paddr_t pa, size_t len)
>> +{
>> +    return __ioremap(pa, len, false);
>> +}
>> +
>> +void __iomem *ioremap_cache(paddr_t pa, size_t len)
>> +{
>> +    return __ioremap(pa, len, true);
>> +}
>> +
>>   int create_perdomain_mapping(struct domain *d, unsigned long va,
>>                                unsigned int nr, l1_pgentry_t **pl1tab,
>>                                struct page_info **ppg)
>> diff --git a/xen/include/xen/vmap.h b/xen/include/xen/vmap.h
>> index 369560e620..f6037e368c 100644
>> --- a/xen/include/xen/vmap.h
>> +++ b/xen/include/xen/vmap.h
>> @@ -24,6 +24,7 @@ void *vzalloc(size_t size);
>>   void vfree(void *va);
>>   void __iomem *ioremap(paddr_t, size_t);
>> +void __iomem *ioremap_cache(paddr_t, size_t);
>>   static inline void iounmap(void __iomem *va)
>>   {
>>
> 


* Re: [PATCH 04/15] xen: mm: add ioremap_cache
  2017-07-11 20:14   ` Julien Grall
  2017-07-12  1:52     ` Huang, Kai
@ 2017-07-12  6:17     ` Jan Beulich
  2017-07-13  4:59       ` Huang, Kai
  1 sibling, 1 reply; 58+ messages in thread
From: Jan Beulich @ 2017-07-12  6:17 UTC (permalink / raw)
  To: kaih.linux; +Cc: andrew.cooper3, julien.grall, xen-devel

>>> Julien Grall <julien.grall@arm.com> 07/11/17 10:15 PM >>>
>On 07/09/2017 09:10 AM, Kai Huang wrote:
>> Currently Xen only has non-cacheable version of ioremap. Although EPC is
>> reported as reserved memory in e820 but it can be mapped as cacheable. This
>> patch adds ioremap_cache (cacheable version of ioremap).
>> 
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>> ---
>>   xen/arch/x86/mm.c      | 15 +++++++++++++--
>>   xen/include/xen/vmap.h |  1 +
>
>First of all, this is common code and the "REST" maintainers should have 
>been CCed for this include.
>
>But xen/include/xen/vmap.h is common code and this is going to break ARM. We 
>already have an inline implementation of ioremap_nocache. You should 
>move the definition into x86-specific headers.

Indeed, plus the ARM implementation actually shows how this would better
be done: have a function allowing more than just true/false to be passed in,
to eventually also allow having ioremap_wc() and the like as wrappers. As long
as it's x86-specific I'd then also suggest calling the new wrapper function
ioremap_wb() (as "cache" may also mean WT).

Jan
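Jan's suggested shape (one internal helper taking the final mapping flags, with thin named wrappers) might look roughly like this; the flag values and the encoded uintptr_t return are placeholders so the sketch is self-contained, not Xen's real PAGE_HYPERVISOR* constants or __vmap():

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_WB      0x1u   /* placeholder for PAGE_HYPERVISOR */
#define PAGE_NOCACHE 0x2u   /* placeholder for PAGE_HYPERVISOR_NOCACHE */
#define PAGE_WC      0x4u   /* placeholder for a write-combining type */

static uintptr_t __ioremap(uint64_t pa, size_t len, unsigned int flags)
{
    (void)len;
    /* Stand-in for __vmap(): encode the flags so callers can observe
     * which mapping type was requested. */
    return (uintptr_t)pa | flags;
}

static uintptr_t ioremap(uint64_t pa, size_t len)
{ return __ioremap(pa, len, PAGE_NOCACHE); }

static uintptr_t ioremap_wb(uint64_t pa, size_t len)
{ return __ioremap(pa, len, PAGE_WB); }

static uintptr_t ioremap_wc(uint64_t pa, size_t len)
{ return __ioremap(pa, len, PAGE_WC); }
```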



* Re: [PATCH 04/15] xen: mm: add ioremap_cache
  2017-07-12  1:52     ` Huang, Kai
@ 2017-07-12  7:13       ` Julien Grall
  2017-07-13  5:01         ` Huang, Kai
  0 siblings, 1 reply; 58+ messages in thread
From: Julien Grall @ 2017-07-12  7:13 UTC (permalink / raw)
  To: Huang, Kai, Kai Huang, xen-devel; +Cc: andrew.cooper3, jbeulich



On 07/12/2017 02:52 AM, Huang, Kai wrote:
> Hi Julien,

Hello Kai,

Please avoid top-posting.

> 
> Thanks for pointing out. I'll move to x86 specific.
> 
> I've cc-ed all maintainers reported by ./scripts/get_maintainer.pl, 
> looks this script doesn't report all maintainers. Sorry. I'll add ARM 
> maintainers next time. 

I would always double-check the result of scripts/get_maintainer.pl. I 
am aware of a bug in scripts/get_maintainer.pl where only the maintainers 
of the specific component (here x86) are listed, even when you touch 
common code.

In this case, I didn't ask to CC ARM maintainers, but CC "THE REST" 
group (see MAINTAINERS).

Cheers,

-- 
Julien Grall


* Re: [PATCH 08/15] xen: x86: add SGX cpuid handling support.
  2017-07-09  8:10 ` [PATCH 08/15] xen: x86: add SGX cpuid handling support Kai Huang
@ 2017-07-12 10:56   ` Andrew Cooper
  2017-07-13  5:42     ` Huang, Kai
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Cooper @ 2017-07-12 10:56 UTC (permalink / raw)
  To: Kai Huang, xen-devel; +Cc: jbeulich

On 09/07/17 10:10, Kai Huang wrote:
> This patch adds SGX to cpuid handling support. In init_guest_cpuid, for
> raw_policy and host_policy, physical EPC info is reported, but for
> pv_max_policy and hvm_max_policy EPC is hidden, as a particular domain's EPC
> base and size come from the toolstack, and it is meaningless for those
> policies to contain physical EPC info. Before a domain's EPC base and size are
> properly configured, the guest's SGX cpuid should report invalid EPC, which is
> also consistent with HW behavior.
>
> Currently all EPC pages are fully populated for a domain when it is created.
> Xen gets the domain's EPC base and size from the toolstack via
> XEN_DOMCTL_set_cpuid, so the domain's EPC pages are also populated in
> XEN_DOMCTL_set_cpuid, after receiving a valid EPC base and size. Failure to
> populate EPC (e.g., there are not enough free EPC pages) results in domain
> creation failure by making XEN_DOMCTL_set_cpuid return an error.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> ---
>   xen/arch/x86/cpuid.c        | 87 ++++++++++++++++++++++++++++++++++++++++++++-
>   xen/arch/x86/domctl.c       | 47 +++++++++++++++++++++++-
>   xen/include/asm-x86/cpuid.h | 26 +++++++++++++-
>   3 files changed, 157 insertions(+), 3 deletions(-)
>
> diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
> index d359e090f3..db896be2e8 100644
> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -9,6 +9,7 @@
>   #include <asm/paging.h>
>   #include <asm/processor.h>
>   #include <asm/xstate.h>
> +#include <asm/hvm/vmx/sgx.h>
>   
>   const uint32_t known_features[] = INIT_KNOWN_FEATURES;
>   const uint32_t special_features[] = INIT_SPECIAL_FEATURES;
> @@ -158,6 +159,44 @@ static void recalculate_xstate(struct cpuid_policy *p)
>       }
>   }
>   
> +static void recalculate_sgx(struct cpuid_policy *p, bool_t hide_epc)

Across the entire series, please use bool rather than bool_t.

Why do we need this hide_epc parameter?  If we aren't providing any epc 
resource to the guest, the entire sgx union should be zero and the SGX 
feature bit should be hidden.

> +{
> +    if ( !p->feat.sgx )
> +    {
> +        memset(&p->sgx, 0, sizeof (p->sgx));
> +        return;
> +    }
> +
> +    if ( !p->sgx.sgx1 )
> +    {
> +        memset(&p->sgx, 0, sizeof (p->sgx));
> +        return;
> +    }

These two clauses can be combined.

> +
> +    /*
> +     * SDM 42.7.2.1 SECS.ATTRIBUTE.XFRM:
> +     *
> +     * Legal value for SECS.ATTRIBUTE.XFRM conform to these requirements:
> +     *  - XFRM[1:0] must be set to 0x3;
> +     *  - If processor does not support XSAVE, or if the system software has not
> +     *    enabled XSAVE, then XFRM[63:2] must be 0.
> +     *  - If the processor does support XSAVE, XFRM must contain a value that
> +     *    would be legal if loaded into XCR0.
> +     */
> +    p->sgx.xfrm_low = 0x3;
> +    p->sgx.xfrm_high = 0;
> +    if ( p->basic.xsave )
> +    {
> +        p->sgx.xfrm_low |= p->xstate.xcr0_low;
> +        p->sgx.xfrm_high |= p->xstate.xcr0_high;
> +    }

There is a bug here, but it will disappear with my CPUID work.  At the 
moment, the job of this function is to sanitise values handed by the 
toolstack, which includes zeroing all the reserved bits.  This is 
because there is currently no way to signal a failure.

When I fix the toolstack interface, the toolstack will propose a new 
CPUID policy, and Xen will have a function to check it against the 
architectural requirements.  At that point, we will be applying checks, 
but not modifying the contents.

> +
> +    if ( hide_epc )
> +    {
> +        memset(&p->sgx.raw[0x2], 0, sizeof (struct cpuid_leaf));
> +    }
> +}
> +
>   /*
>    * Misc adjustments to the policy.  Mostly clobbering reserved fields and
>    * duplicating shared fields.  Intentionally hidden fields are annotated.
> @@ -239,7 +278,7 @@ static void __init calculate_raw_policy(void)
>       {
>           switch ( i )
>           {
> -        case 0x4: case 0x7: case 0xd:
> +        case 0x4: case 0x7: case 0xd: case 0x12:
>               /* Multi-invocation leaves.  Deferred. */
>               continue;
>           }
> @@ -299,6 +338,19 @@ static void __init calculate_raw_policy(void)
>           }
>       }
>   
> +    if ( p->basic.max_leaf >= SGX_CPUID )
> +    {
> +        /*
> +         * For raw policy we just report native CPUID. For EPC on native it's
> +         * possible that we will have multiple EPC sections (meaning subleaf 3,
> +         * 4, ... may also be valid), but as the policy is for guest so we only
> +         * need one EPC section (subleaf 2).
> +         */
> +        cpuid_count_leaf(SGX_CPUID, 0, &p->sgx.raw[0]);
> +        cpuid_count_leaf(SGX_CPUID, 0, &p->sgx.raw[0]);
> +        cpuid_count_leaf(SGX_CPUID, 0, &p->sgx.raw[0]);

Copy & paste error?  I presume you meant to use subleaves 1 and 2 here, 
rather than subleaf 0 each time?

> +    }
> +
>       /* Extended leaves. */
>       cpuid_leaf(0x80000000, &p->extd.raw[0]);
>       for ( i = 1; i < min(ARRAY_SIZE(p->extd.raw),
> @@ -324,6 +376,8 @@ static void __init calculate_host_policy(void)
>       cpuid_featureset_to_policy(boot_cpu_data.x86_capability, p);
>       recalculate_xstate(p);
>       recalculate_misc(p);
> +    /* For host policy we report physical EPC */
> +    recalculate_sgx(p, 0);
>   
>       if ( p->extd.svm )
>       {
> @@ -357,6 +411,11 @@ static void __init calculate_pv_max_policy(void)
>       sanitise_featureset(pv_featureset);
>       cpuid_featureset_to_policy(pv_featureset, p);
>       recalculate_xstate(p);
> +    /*
> +     * For PV policy we don't report physical EPC. Actually for PV policy
> +     * currently SGX will be disabled.
> +     */
> +    recalculate_sgx(p, 1);
>   
>       p->extd.raw[0xa] = EMPTY_LEAF; /* No SVM for PV guests. */
>   }
> @@ -413,6 +472,13 @@ static void __init calculate_hvm_max_policy(void)
>       sanitise_featureset(hvm_featureset);
>       cpuid_featureset_to_policy(hvm_featureset, p);
>       recalculate_xstate(p);
> +    /*
> +     * For HVM policy we don't report physical EPC. Actually cpuid policy
> +     * should report VM's virtual EPC base and size. However VM's virtual
> +     * EPC info will come from toolstack, and only after Xen is notified
> +     * VM's cpuid policy should report invalid EPC.
> +     */
> +    recalculate_sgx(p, 1);
>   }
>   
>   void __init init_guest_cpuid(void)
> @@ -528,6 +594,12 @@ void recalculate_cpuid_policy(struct domain *d)
>       if ( p->basic.max_leaf < XSTATE_CPUID )
>           __clear_bit(X86_FEATURE_XSAVE, fs);
>   
> +    if ( p->basic.max_leaf < SGX_CPUID )
> +    {
> +        __clear_bit(X86_FEATURE_SGX, fs);
> +        __clear_bit(X86_FEATURE_SGX_LAUNCH_CONTROL, fs);

Because you filled in the feature dependency graph for SGX_LAUNCH 
depending on SGX, this second clear bit isn't necessary.  Clearing SGX 
will cause sanitise_featureset() to automatically clear SGX_LAUNCH (and 
any future feature bits).
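
The dependency-driven clearing described above can be sketched as a tiny stand-alone model (hypothetical feature indices and a hand-rolled table for illustration, not Xen's real sanitise_featureset() or its feature numbering):

```c
#include <assert.h>

/* Hypothetical feature indices -- illustration only. */
enum { FEAT_SGX, FEAT_SGX_LC, FEAT_MAX };

/* deps[f][d] != 0 means feature d depends on feature f. */
static const int deps[FEAT_MAX][FEAT_MAX] = {
    [FEAT_SGX] = { [FEAT_SGX_LC] = 1 },
};

/* Clear every feature whose parent is absent, iterating to a fixed
 * point.  This mimics why clearing SGX alone is enough: SGX_LAUNCH
 * (and any future dependent bits) get dropped automatically. */
static void sanitise(int fs[FEAT_MAX])
{
    int changed;

    do {
        changed = 0;
        for ( int f = 0; f < FEAT_MAX; f++ )
        {
            if ( fs[f] )
                continue;
            for ( int d = 0; d < FEAT_MAX; d++ )
                if ( deps[f][d] && fs[d] )
                {
                    fs[d] = 0;
                    changed = 1;
                }
        }
    } while ( changed );
}
```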

> +    }
> +
>       sanitise_featureset(fs);
>   
>       /* Fold host's FDP_EXCP_ONLY and NO_FPU_SEL into guest's view. */
> @@ -550,6 +622,12 @@ void recalculate_cpuid_policy(struct domain *d)
>   
>       recalculate_xstate(p);
>       recalculate_misc(p);
> +    /*
> +     * recalculate_cpuid_policy is also called for domain's cpuid policy,
> +     * which is from toolstack via XEN_DOMCTL_set_cpuid, therefore we cannot
> +     * hide domain's virtual EPC from toolstack.
> +     */
> +    recalculate_sgx(p, 0);
>   
>       for ( i = 0; i < ARRAY_SIZE(p->cache.raw); ++i )
>       {
> @@ -645,6 +723,13 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
>               *res = p->xstate.raw[subleaf];
>               break;
>   
> +        case SGX_CPUID:
> +            if ( !p->feat.sgx )
> +                return;

|| subleaf >= ARRAY_SIZE(p->sgx.raw)

Otherwise, a guest CPUID query can read off the end of raw[].
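
A minimal stand-alone sketch of the bounds check being asked for (cut-down stand-in types, not Xen's real structures):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Cut-down stand-ins for Xen's types -- illustration only. */
struct cpuid_leaf { uint32_t a, b, c, d; };

struct policy {
    bool sgx_feat;
    struct cpuid_leaf sgx_raw[3];       /* CPUID_GUEST_NR_SGX == 3 */
};

/* Subleaf lookup with the suggested check: an out-of-range subleaf
 * yields all-zeroes instead of reading past the end of raw[]. */
static struct cpuid_leaf sgx_leaf(const struct policy *p, uint32_t subleaf)
{
    const struct cpuid_leaf empty = { 0 };

    if ( !p->sgx_feat || subleaf >= ARRAY_SIZE(p->sgx_raw) )
        return empty;

    return p->sgx_raw[subleaf];
}
```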

> +
> +            *res = p->sgx.raw[subleaf];
> +            break;
> +
>           default:
>               *res = p->basic.raw[leaf];
>               break;
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> index f40e989fd8..7d49947a3e 100644
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -53,6 +53,7 @@ static int update_domain_cpuid_info(struct domain *d,
>       struct cpuid_policy *p = d->arch.cpuid;
>       const struct cpuid_leaf leaf = { ctl->eax, ctl->ebx, ctl->ecx, ctl->edx };
>       int old_vendor = p->x86_vendor;
> +    int ret = 0;
>   
>       /*
>        * Skip update for leaves we don't care about.  This avoids the overhead
> @@ -74,6 +75,7 @@ static int update_domain_cpuid_info(struct domain *d,
>           if ( ctl->input[0] == XSTATE_CPUID &&
>                ctl->input[1] != 1 ) /* Everything else automatically calculated. */
>               return 0;
> +
>           break;
>   
>       case 0x40000000: case 0x40000100:
> @@ -104,6 +106,10 @@ static int update_domain_cpuid_info(struct domain *d,
>               p->xstate.raw[ctl->input[1]] = leaf;
>               break;
>   
> +        case SGX_CPUID:
> +            p->sgx.raw[ctl->input[1]] = leaf;
> +            break;

You also need to modify the higher switch statement so the toolstack 
can't cause Xen to write beyond the end of .raw[].

> +
>           default:
>               p->basic.raw[ctl->input[0]] = leaf;
>               break;
> @@ -255,6 +261,45 @@ static int update_domain_cpuid_info(struct domain *d,
>           }
>           break;
>   
> +    case 0x12:
> +    {
> +        uint64_t base_pfn, npages;
> +
> +        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
> +            break;
> +
> +        if ( ctl->input[1] != 2 )
> +            break;
> +
> +        /* SGX has not enabled */
> +        if ( !p->feat.sgx || !p->sgx.sgx1 )
> +            break;
> +
> +        /*
> +         * If SGX is enabled in CPUID, then we are expecting valid EPC resource
> +         * in sub-leaf 0x2. Return -EINVAL to notify the toolstack that
> +         * there's something wrong.
> +         */
> +        if ( !p->sgx.base_valid || !p->sgx.size_valid )

Is there any plausible usecase where only one of these is valid?  If 
not, why are they split?

> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        base_pfn = (((uint64_t)(p->sgx.base_pfn_high)) << 20) |
> +            (uint64_t)p->sgx.base_pfn_low;
> +        npages = (((uint64_t)(p->sgx.npages_high)) << 20) |
> +            (uint64_t)p->sgx.npages_low;
> +
> +        if ( !hvm_epc_populated(d) )
> +            ret = hvm_populate_epc(d, base_pfn, npages);
> +        else
> +            if ( base_pfn != to_sgx(d)->epc_base_pfn ||
> +                    npages != to_sgx(d)->epc_npages )
> +                ret = -EINVAL;
> +
> +        break;
> +    }
>       case 0x80000001:
>           if ( is_pv_domain(d) && ((levelling_caps & LCAP_e1cd) == LCAP_e1cd) )
>           {
> @@ -299,7 +344,7 @@ static int update_domain_cpuid_info(struct domain *d,
>           break;
>       }
>   
> -    return 0;
> +    return ret;
>   }
>   
>   void arch_get_domain_info(const struct domain *d,
> diff --git a/xen/include/asm-x86/cpuid.h b/xen/include/asm-x86/cpuid.h
> index ac25908eca..326f267263 100644
> --- a/xen/include/asm-x86/cpuid.h
> +++ b/xen/include/asm-x86/cpuid.h
> @@ -61,10 +61,11 @@ extern struct cpuidmasks cpuidmask_defaults;
>   /* Whether or not cpuid faulting is available for the current domain. */
>   DECLARE_PER_CPU(bool, cpuid_faulting_enabled);
>   
> -#define CPUID_GUEST_NR_BASIC      (0xdu + 1)
> +#define CPUID_GUEST_NR_BASIC      (0x12u + 1)
>   #define CPUID_GUEST_NR_FEAT       (0u + 1)
>   #define CPUID_GUEST_NR_CACHE      (5u + 1)
>   #define CPUID_GUEST_NR_XSTATE     (62u + 1)
> +#define CPUID_GUEST_NR_SGX        (0x2u + 1)
>   #define CPUID_GUEST_NR_EXTD_INTEL (0x8u + 1)
>   #define CPUID_GUEST_NR_EXTD_AMD   (0x1cu + 1)
>   #define CPUID_GUEST_NR_EXTD       MAX(CPUID_GUEST_NR_EXTD_INTEL, \
> @@ -169,6 +170,29 @@ struct cpuid_policy
>           } comp[CPUID_GUEST_NR_XSTATE];
>       } xstate;
>   
> +    union {
> +        struct cpuid_leaf raw[CPUID_GUEST_NR_SGX];
> +
> +        struct {
> +            /* Subleaf 0. */
> +            uint32_t sgx1:1, sgx2:1, :30;

Please use bool bitfields for these.

Something like:

bool sgx1:1, sgx2:1;
uint32_t :30;

should be fine.
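
As a quick sanity check, a stand-alone version of that layout (bitfield packing is compiler-dependent, but on the usual x86 GCC/Clang ABIs the two flags plus the anonymous 30-bit filler still occupy a single 32-bit word, matching one CPUID register):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the suggested subleaf-0 flag layout: bool bitfields for
 * the feature flags, with an anonymous filler for the reserved bits. */
struct sgx_subleaf0 {
    bool sgx1:1, sgx2:1;
    uint32_t :30;
};
```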

> +            uint32_t miscselect, /* c */ :32;
> +            uint32_t maxenclavesize_n64:8, maxenclavesize_64:8, :16;

uint8_t for these please, rather than an 8 bit bitfield.

Can we use clearer names, such as maxsize_legacy and maxsize_long? They 
will be accessed via p->sgx. anyway, so the "enclave" bit of context is 
already present.

> +
> +            /* Subleaf 1. */
> +            uint32_t init:1, debug:1, mode64:1, /*reserve*/:1, provisionkey:1,
> +                     einittokenkey:1, :26;

bools as well please here.

> +            uint32_t /* reserve */:32;
> +            uint32_t xfrm_low, xfrm_high;

uint64_t xfrm ?

The XSAVE words are apart because they are not adjacent in the 
architectural layout, but these are.

> +
> +            /* Subleaf 2. */
> +            uint32_t base_valid:1, :11, base_pfn_low:20;
> +            uint32_t base_pfn_high:20, :12;
> +            uint32_t size_valid:1, :11, npages_low:20;
> +            uint32_t npages_high:20, :12;
> +        };

Are the {base,size}_valid fields correct?  The manual says they are 4-bit 
fields rather than single-bit fields.

I would also drop the _pfn from the base names.  The fields still need 
shifting to get a sensible value.
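
The shifting involved can be illustrated with a small stand-alone helper (the 20-bit low/high split follows the patch's bitfields; consult the SDM for the architectural encoding, including the width of the valid fields):

```c
#include <assert.h>
#include <stdint.h>

/* Recombine the split 20-bit fields of CPUID.0x12 subleaf 2 the way
 * the patch does: the low field holds pfn bits 0-19, the high field
 * holds pfn bits 20-39. */
static uint64_t combine_pfn(uint32_t low20, uint32_t high20)
{
    return ((uint64_t)high20 << 20) | low20;
}

/* The result is still a frame number; shift by PAGE_SHIFT (12) to
 * obtain a byte address. */
static uint64_t pfn_to_paddr(uint64_t pfn)
{
    return pfn << 12;
}
```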

~Andrew

> +    } sgx;
> +
>       /* Extended leaves: 0x800000xx */
>       union {
>           struct cpuid_leaf raw[CPUID_GUEST_NR_EXTD];


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 05/15] xen: p2m: new 'p2m_epc' type for EPC mapping
  2017-07-09  8:12 ` [PATCH 05/15] xen: p2m: new 'p2m_epc' type for EPC mapping Kai Huang
@ 2017-07-12 11:01   ` Andrew Cooper
  2017-07-12 12:21     ` George Dunlap
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Cooper @ 2017-07-12 11:01 UTC (permalink / raw)
  To: Kai Huang, xen-devel; +Cc: George.Dunlap, kevin.tian, jbeulich

On 09/07/17 10:12, Kai Huang wrote:
> A new 'p2m_epc' type is added for EPC mapping type. Two wrapper functions
> set_epc_p2m_entry and clear_epc_p2m_entry are also added for further use.

Other groups in Intel have been looking to reduce the number of p2m 
types we have, so we can use more hardware defined bits in the EPT 
pagetable entries.

If we need a new type then we will certainly add one, but it is not 
clear why this type is needed.

~Andrew


* Re: [PATCH 15/15] xen: tools: expose EPC in ACPI table
  2017-07-09  8:16 ` [PATCH 15/15] xen: tools: expose EPC in ACPI table Kai Huang
@ 2017-07-12 11:05   ` Andrew Cooper
  2017-07-13  8:23     ` Huang, Kai
  2017-07-14 11:31   ` Jan Beulich
  2017-07-17 10:54   ` Roger Pau Monné
  2 siblings, 1 reply; 58+ messages in thread
From: Andrew Cooper @ 2017-07-12 11:05 UTC (permalink / raw)
  To: Kai Huang, xen-devel; +Cc: wei.liu2, ian.jackson, jbeulich

On 09/07/17 10:16, Kai Huang wrote:
> On physical machines EPC is exposed in the ACPI tables via "INT0E0C". Although
> EPC can be discovered via CPUID, the Windows driver requires EPC to be exposed
> in the ACPI tables as well. This patch exposes EPC in the ACPI tables.

:(

> diff --git a/tools/libacpi/dsdt.asl b/tools/libacpi/dsdt.asl
> index fa8ff317b2..25ce196028 100644
> --- a/tools/libacpi/dsdt.asl
> +++ b/tools/libacpi/dsdt.asl
> @@ -441,6 +441,55 @@ DefinitionBlock ("DSDT.aml", "DSDT", 2, "Xen", "HVM", 0)
>                   }
>               }
>           }
> +
> +        Device (EPC)

Would it not be better to put this into an SSDT, and only expose it to 
the guest if SGX is advertised?

~Andrew


* Re: [PATCH 01/15] xen: x86: expose SGX to HVM domain in CPU featureset
  2017-07-09  8:04 ` [PATCH 01/15] xen: x86: expose SGX to HVM domain in CPU featureset Kai Huang
@ 2017-07-12 11:09   ` Andrew Cooper
  2017-07-17  6:20     ` Huang, Kai
  2017-07-18 10:12   ` Andrew Cooper
  1 sibling, 1 reply; 58+ messages in thread
From: Andrew Cooper @ 2017-07-12 11:09 UTC (permalink / raw)
  To: Kai Huang, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, tim, ian.jackson, jbeulich

On 09/07/17 10:04, Kai Huang wrote:
> Expose SGX in the CPU featureset for HVM domains. SGX will not be supported
> for PV domains, as ENCLS (which the SGX driver in the guest essentially runs)
> must run in ring 0, while the PV kernel runs in ring 3. Theoretically we could
> support SGX in PV domains, either by emulating the #GP caused by ENCLS running
> in ring 3, or by PV ENCLS, but it is really not necessary at this stage.
> Currently SGX is only exposed to HAP HVM domains (we can add shadow support in
> the future).
>
> SGX Launch Control is also exposed in CPU featureset for HVM domain. SGX
> Launch Control depends on SGX.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>

I think it's perfectly reasonable to restrict this to HVM guests to start 
with, although I don't see how shadow vs HAP has any impact at this 
stage?  All that matters is that the EPC pages appear in the guest's p2m.

~Andrew


* Re: [PATCH 02/15] xen: vmx: detect ENCLS VMEXIT
  2017-07-09  8:09 ` [PATCH 02/15] xen: vmx: detect ENCLS VMEXIT Kai Huang
@ 2017-07-12 11:11   ` Andrew Cooper
  2017-07-12 18:54     ` Jan Beulich
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Cooper @ 2017-07-12 11:11 UTC (permalink / raw)
  To: Kai Huang, xen-devel; +Cc: kevin.tian, jbeulich

On 09/07/17 10:09, Kai Huang wrote:
> If ENCLS VMEXIT is not present then we cannot support SGX virtualization.
> This patch detects presence of ENCLS VMEXIT. A Xen boot boolean parameter
> 'sgx' is also added to manually enable/disable SGX.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>

At a minimum, you also need to modify calculate_hvm_max_policy() to hide 
SGX if we don't have ENCLS intercept support.

~Andrew


* Re: [PATCH 05/15] xen: p2m: new 'p2m_epc' type for EPC mapping
  2017-07-12 11:01   ` Andrew Cooper
@ 2017-07-12 12:21     ` George Dunlap
  2017-07-13  5:56       ` Huang, Kai
  0 siblings, 1 reply; 58+ messages in thread
From: George Dunlap @ 2017-07-12 12:21 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Kevin Tian, Kai Huang, George Dunlap, jbeulich, xen-devel


> On Jul 12, 2017, at 1:01 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> 
> On 09/07/17 10:12, Kai Huang wrote:
>> A new 'p2m_epc' type is added for EPC mapping type. Two wrapper functions
>> set_epc_p2m_entry and clear_epc_p2m_entry are also added for further use.
> 
> Other groups in Intel have been looking to reduce the number of p2m types we have, so we can use more hardware defined bits in the EPT pagetable entries.
> 
> If we need a new type then we will certainly add one, but it is not clear why this type is needed.

Does the hypervisor need to know which pages of a domain’s p2m 1) have valid config set up, but 2) aren’t accessible to itself or any other domain?

 -George

* Re: [PATCH 02/15] xen: vmx: detect ENCLS VMEXIT
  2017-07-12 11:11   ` Andrew Cooper
@ 2017-07-12 18:54     ` Jan Beulich
  2017-07-13  4:57       ` Huang, Kai
  0 siblings, 1 reply; 58+ messages in thread
From: Jan Beulich @ 2017-07-12 18:54 UTC (permalink / raw)
  To: andrew.cooper3, kaih.linux; +Cc: kevin.tian, xen-devel

>>> Andrew Cooper <andrew.cooper3@citrix.com> 07/12/17 1:12 PM >>>
>On 09/07/17 10:09, Kai Huang wrote:
>> If ENCLS VMEXIT is not present then we cannot support SGX virtualization.
>> This patch detects presence of ENCLS VMEXIT. A Xen boot boolean parameter
>> 'sgx' is also added to manually enable/disable SGX.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>
>At a minimum, you also need to modify calculate_hvm_max_policy() to hide 
>SGX if we don't have ENCLS intercept support.

Additionally I think the command line option should default to off initially
and it needs an entry in the command line option doc.

Jan



* Re: [PATCH 02/15] xen: vmx: detect ENCLS VMEXIT
  2017-07-12 18:54     ` Jan Beulich
@ 2017-07-13  4:57       ` Huang, Kai
  0 siblings, 0 replies; 58+ messages in thread
From: Huang, Kai @ 2017-07-13  4:57 UTC (permalink / raw)
  To: Jan Beulich, andrew.cooper3, kaih.linux; +Cc: kevin.tian, xen-devel



On 7/13/2017 6:54 AM, Jan Beulich wrote:
>>>> Andrew Cooper <andrew.cooper3@citrix.com> 07/12/17 1:12 PM >>>
>> On 09/07/17 10:09, Kai Huang wrote:
>>> If ENCLS VMEXIT is not present then we cannot support SGX virtualization.
>>> This patch detects presence of ENCLS VMEXIT. A Xen boot boolean parameter
>>> 'sgx' is also added to manually enable/disable SGX.
>>>
>>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>>
>> At a minimum, you also need to modify calculate_hvm_max_policy() to hide
>> SGX if we don't have ENCLS intercept support.

Actually IMO this is not needed, as I added an __initcall(sgx_init) (see 
patch 0003), where I call setup_clear_cpu_cap(X86_FEATURE_SGX) if for 
any reason boot_sgx_cpuidata (which contains the common SGX cpuid info 
shared by all cores) doesn't have valid SGX info. If ENCLS VMEXIT is not 
present, then detect_sgx won't be called for any core, so 
X86_FEATURE_SGX will be cleared in boot_cpu_data. As init_guest_cpuid is 
called after all __initcalls, if ENCLS VMEXIT is not present, or SGX is 
disabled via the boot parameter, then even the host_policy won't have 
SGX.

Of course if we changed the implementation of __initcall(sgx_init), we 
would probably need to explicitly clear SGX here. Anyway clearing SGX 
here doesn't do any harm, so I am completely fine with doing it if you 
think it is necessary.

> 
> Additionally I think the command line option should default to off initially
> and it needs an entry in the command line option doc.

Sure. I'll change default to 0 and change the doc as well.

Thanks,
-Kai
> 
> Jan
> 
> 
> 


* Re: [PATCH 04/15] xen: mm: add ioremap_cache
  2017-07-12  6:17     ` Jan Beulich
@ 2017-07-13  4:59       ` Huang, Kai
  0 siblings, 0 replies; 58+ messages in thread
From: Huang, Kai @ 2017-07-13  4:59 UTC (permalink / raw)
  To: Jan Beulich, kaih.linux; +Cc: andrew.cooper3, julien.grall, xen-devel



On 7/12/2017 6:17 PM, Jan Beulich wrote:
>>>> Julien Grall <julien.grall@arm.com> 07/11/17 10:15 PM >>>
>> On 07/09/2017 09:10 AM, Kai Huang wrote:
>>> Currently Xen only has a non-cacheable version of ioremap. Although EPC is
>>> reported as reserved memory in e820, it can be mapped as cacheable. This
>>> patch adds ioremap_cache (a cacheable version of ioremap).
>>>
>>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>>> ---
>>>    xen/arch/x86/mm.c      | 15 +++++++++++++--
>>>    xen/include/xen/vmap.h |  1 +
>>
>> First of all, this is common code and the "REST" maintainers should have
>> been CCed for this include.
>>
>> But xen/include/xen/vmap.h is common code and going to break ARM. We
>> already have an inline implementation of ioremap_nocache. You should
>> move the definition in x86 specific headers.
> 
> Indeed, plus the ARM implementation actually shows how this would better
> be done: Have a function allowing more than just true/false to be passed in,
> to eventually also allow having ioremap_wc() and the like as wrappers. As long
> as it's x86-specific I'd then also suggest calling the new wrapper function
> ioremap_wb() (as "cache" may also mean WT).

Hi Jan,

Thanks for comments. I'll do as you suggested.

Thanks,
-Kai
> 
> Jan
> 
> 
> 


* Re: [PATCH 04/15] xen: mm: add ioremap_cache
  2017-07-12  7:13       ` Julien Grall
@ 2017-07-13  5:01         ` Huang, Kai
  0 siblings, 0 replies; 58+ messages in thread
From: Huang, Kai @ 2017-07-13  5:01 UTC (permalink / raw)
  To: Julien Grall, Kai Huang, xen-devel; +Cc: andrew.cooper3, jbeulich



On 7/12/2017 7:13 PM, Julien Grall wrote:
> 
> 
> On 07/12/2017 02:52 AM, Huang, Kai wrote:
>> Hi Julien,
> 
> Hello Kai,
> 
> Please avoid top-posting.

Sorry. Will avoid in the future :)
> 
>>
>> Thanks for pointing out. I'll move to x86 specific.
>>
>> I've cc-ed all maintainers reported by ./scripts/get_maintainer.pl; it
>> looks like this script doesn't report all maintainers. Sorry. I'll add
>> ARM maintainers next time.
> 
> I would always double-check the result of scripts/get_maintainer.pl. I 
> am aware of a bug in scripts/get_maintainer.pl where only maintainers of 
> the specific component (here x86) are listed, even when you touch common 
> code.
> 
> In this case, I didn't ask to CC ARM maintainers, but CC "THE REST" 
> group (see MAINTAINERS).

Understood. I'll follow in the future.

Thanks,
-Kai
> 
> Cheers,
> 


* Re: [PATCH 08/15] xen: x86: add SGX cpuid handling support.
  2017-07-12 10:56   ` Andrew Cooper
@ 2017-07-13  5:42     ` Huang, Kai
  2017-07-14  7:37       ` Andrew Cooper
  0 siblings, 1 reply; 58+ messages in thread
From: Huang, Kai @ 2017-07-13  5:42 UTC (permalink / raw)
  To: Andrew Cooper, Kai Huang, xen-devel; +Cc: jbeulich



On 7/12/2017 10:56 PM, Andrew Cooper wrote:
> On 09/07/17 10:10, Kai Huang wrote:
>> This patch adds SGX to cpuid handling support. In init_guest_cpuid, for
>> raw_policy and host_policy, physical EPC info is reported, but for
>> pv_max_policy and hvm_max_policy EPC is hidden, as a particular domain's
>> EPC base and size come from the toolstack, and it is meaningless to
>> contain physical EPC info in them. Before a domain's EPC base and size
>> are properly configured, the guest's SGX cpuid should report invalid
>> EPC, which is also consistent with HW behavior.
>>
>> Currently all EPC pages are fully populated for a domain when it is
>> created. Xen gets the domain's EPC base and size from the toolstack via
>> XEN_DOMCTL_set_cpuid, so the domain's EPC pages are also populated in
>> XEN_DOMCTL_set_cpuid, after receiving a valid EPC base and size. Failure
>> to populate EPC (such as when there are not enough free EPC pages)
>> results in domain creation failure by making XEN_DOMCTL_set_cpuid
>> return an error.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>> ---
>>   xen/arch/x86/cpuid.c        | 87 ++++++++++++++++++++++++++++++++++++++++++++-
>>   xen/arch/x86/domctl.c       | 47 +++++++++++++++++++++++-
>>   xen/include/asm-x86/cpuid.h | 26 +++++++++++++-
>>   3 files changed, 157 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
>> index d359e090f3..db896be2e8 100644
>> --- a/xen/arch/x86/cpuid.c
>> +++ b/xen/arch/x86/cpuid.c
>> @@ -9,6 +9,7 @@
>>   #include <asm/paging.h>
>>   #include <asm/processor.h>
>>   #include <asm/xstate.h>
>> +#include <asm/hvm/vmx/sgx.h>
>>   const uint32_t known_features[] = INIT_KNOWN_FEATURES;
>>   const uint32_t special_features[] = INIT_SPECIAL_FEATURES;
>> @@ -158,6 +159,44 @@ static void recalculate_xstate(struct 
>> cpuid_policy *p)
>>       }
>>   }
>> +static void recalculate_sgx(struct cpuid_policy *p, bool_t hide_epc)
> 
> Across the entire series, please use bool rather than bool_t.
Hi Andrew,

Thank you very much for comments.

Will do.

> 
> Why do we need this hide_epc parameter?  If we aren't providing any epc 
> resource to the guest, the entire sgx union should be zero and the SGX 
> feature bit should be hidden.

My intention was to hide physical EPC info for pv_max_policy and 
hvm_max_policy (recalculate_sgx is also called by 
calculate_pv_max_policy and calculate_hvm_max_policy), as they are for 
guests and don't need physical EPC info. But keeping physical EPC info 
in them does no harm, so I think we can simply remove hide_epc.

IMO we cannot check whether EPC is valid and zero the sgx union in 
recalculate_sgx, as it is called for each CPUID subleaf. For example, it 
is called for SGX subleaves 0, 1, and then 2, and when subleaves 0 and 1 
are handled, the EPC resource is 0 (it hasn't been configured yet).

> 
>> +{
>> +    if ( !p->feat.sgx )
>> +    {
>> +        memset(&p->sgx, 0, sizeof (p->sgx));
>> +        return;
>> +    }
>> +
>> +    if ( !p->sgx.sgx1 )
>> +    {
>> +        memset(&p->sgx, 0, sizeof (p->sgx));
>> +        return;
>> +    }
> 
> These two clauses can be combined.

Will do.

> 
>> +
>> +    /*
>> +     * SDM 42.7.2.1 SECS.ATTRIBUTE.XFRM:
>> +     *
>> +     * Legal value for SECS.ATTRIBUTE.XFRM conform to these 
>> requirements:
>> +     *  - XFRM[1:0] must be set to 0x3;
>> +     *  - If processor does not support XSAVE, or if the system 
>> software has not
>> +     *    enabled XSAVE, then XFRM[63:2] must be 0.
>> +     *  - If the processor does support XSAVE, XFRM must contain a 
>> value that
>> +     *    would be legal if loaded into XCR0.
>> +     */
>> +    p->sgx.xfrm_low = 0x3;
>> +    p->sgx.xfrm_high = 0;
>> +    if ( p->basic.xsave )
>> +    {
>> +        p->sgx.xfrm_low |= p->xstate.xcr0_low;
>> +        p->sgx.xfrm_high |= p->xstate.xcr0_high;
>> +    }
> 
> There is a bug here, but it will disappear with my CPUID work.  At the 
> moment, the job of this function is to sanitise values handed by the 
> toolstack, which includes zeroing all the reserved bits.  This is 
> because there is currently no way to signal a failure.
> 
> When I fix the toolstack interface, the toolstack will propose a new 
> CPUID policy, and Xen will have a function to check it against the 
> architectural requirements.  At that point, we will be applying checks, 
> but not modifying the contents.

I think I need to look at your design first and then I should be able to 
understand your comment. :)

> 
>> +
>> +    if ( hide_epc )
>> +    {
>> +        memset(&p->sgx.raw[0x2], 0, sizeof (struct cpuid_leaf));
>> +    }
>> +}
>> +
>>   /*
>>    * Misc adjustments to the policy.  Mostly clobbering reserved 
>> fields and
>>    * duplicating shared fields.  Intentionally hidden fields are 
>> annotated.
>> @@ -239,7 +278,7 @@ static void __init calculate_raw_policy(void)
>>       {
>>           switch ( i )
>>           {
>> -        case 0x4: case 0x7: case 0xd:
>> +        case 0x4: case 0x7: case 0xd: case 0x12:
>>               /* Multi-invocation leaves.  Deferred. */
>>               continue;
>>           }
>> @@ -299,6 +338,19 @@ static void __init calculate_raw_policy(void)
>>           }
>>       }
>> +    if ( p->basic.max_leaf >= SGX_CPUID )
>> +    {
>> +        /*
>> +         * For raw policy we just report native CPUID. For EPC on 
>> native it's
>> +         * possible that we will have multiple EPC sections (meaning 
>> subleaf 3,
>> +         * 4, ... may also be valid), but as the policy is for guest 
>> so we only
>> +         * need one EPC section (subleaf 2).
>> +         */
>> +        cpuid_count_leaf(SGX_CPUID, 0, &p->sgx.raw[0]);
>> +        cpuid_count_leaf(SGX_CPUID, 0, &p->sgx.raw[0]);
>> +        cpuid_count_leaf(SGX_CPUID, 0, &p->sgx.raw[0]);
> 
> Copy & paste error?  I presume you meant to use subleaves 1 and 2 here,
> rather than subleaf 0 each time?

Oh sorry. Yes indeed I meant zero out subleaf 1 and 2.

> 
>> +    }
>> +
>>       /* Extended leaves. */
>>       cpuid_leaf(0x80000000, &p->extd.raw[0]);
>>       for ( i = 1; i < min(ARRAY_SIZE(p->extd.raw),
>> @@ -324,6 +376,8 @@ static void __init calculate_host_policy(void)
>>       cpuid_featureset_to_policy(boot_cpu_data.x86_capability, p);
>>       recalculate_xstate(p);
>>       recalculate_misc(p);
>> +    /* For host policy we report physical EPC */
>> +    recalculate_sgx(p, 0);
>>       if ( p->extd.svm )
>>       {
>> @@ -357,6 +411,11 @@ static void __init calculate_pv_max_policy(void)
>>       sanitise_featureset(pv_featureset);
>>       cpuid_featureset_to_policy(pv_featureset, p);
>>       recalculate_xstate(p);
>> +    /*
>> +     * For PV policy we don't report physical EPC. Actually for PV 
>> policy
>> +     * currently SGX will be disabled.
>> +     */
>> +    recalculate_sgx(p, 1);
>>       p->extd.raw[0xa] = EMPTY_LEAF; /* No SVM for PV guests. */
>>   }
>> @@ -413,6 +472,13 @@ static void __init calculate_hvm_max_policy(void)
>>       sanitise_featureset(hvm_featureset);
>>       cpuid_featureset_to_policy(hvm_featureset, p);
>>       recalculate_xstate(p);
>> +    /*
>> +     * For HVM policy we don't report physical EPC. Actually cpuid 
>> policy
>> +     * should report VM's virtual EPC base and size. However VM's 
>> virtual
>> +     * EPC info will come from toolstack, and only after Xen is notified
>> +     * VM's cpuid policy should report invalid EPC.
>> +     */
>> +    recalculate_sgx(p, 1);
>>   }
>>   void __init init_guest_cpuid(void)
>> @@ -528,6 +594,12 @@ void recalculate_cpuid_policy(struct domain *d)
>>       if ( p->basic.max_leaf < XSTATE_CPUID )
>>           __clear_bit(X86_FEATURE_XSAVE, fs);
>> +    if ( p->basic.max_leaf < SGX_CPUID )
>> +    {
>> +        __clear_bit(X86_FEATURE_SGX, fs);
>> +        __clear_bit(X86_FEATURE_SGX_LAUNCH_CONTROL, fs);
> 
> Because you filled in the feature dependency graph for SGX_LAUNCH 
> depending on SGX, this second clear bit isn't necessary.  Clearing SGX 
> will cause sanitise_featureset() to automatically clear SGX_LAUNCH (and 
> any future feature bits).

Yes you are right. I'll just clear SGX in next version.

> 
>> +    }
>> +
>>       sanitise_featureset(fs);
>>       /* Fold host's FDP_EXCP_ONLY and NO_FPU_SEL into guest's view. */
>> @@ -550,6 +622,12 @@ void recalculate_cpuid_policy(struct domain *d)
>>       recalculate_xstate(p);
>>       recalculate_misc(p);
>> +    /*
>> +     * recalculate_cpuid_policy is also called for domain's cpuid 
>> policy,
>> +     * which is from toolstack via XEN_DOMCTL_set_cpuid, therefore we 
>> cannot
>> +     * hide domain's virtual EPC from toolstack.
>> +     */
>> +    recalculate_sgx(p, 0);
>>       for ( i = 0; i < ARRAY_SIZE(p->cache.raw); ++i )
>>       {
>> @@ -645,6 +723,13 @@ void guest_cpuid(const struct vcpu *v, uint32_t 
>> leaf,
>>               *res = p->xstate.raw[subleaf];
>>               break;
>> +        case SGX_CPUID:
>> +            if ( !p->feat.sgx )
>> +                return;
> 
> || subleaf >= ARRAY_SIZE(p->sgx.raw)
> 
> Otherwise, a guest CPUID query can read off the end of raw[].

Oh yes. Thanks for pointing out.

> 
>> +
>> +            *res = p->sgx.raw[subleaf];
>> +            break;
>> +
>>           default:
>>               *res = p->basic.raw[leaf];
>>               break;
>> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
>> index f40e989fd8..7d49947a3e 100644
>> --- a/xen/arch/x86/domctl.c
>> +++ b/xen/arch/x86/domctl.c
>> @@ -53,6 +53,7 @@ static int update_domain_cpuid_info(struct domain *d,
>>       struct cpuid_policy *p = d->arch.cpuid;
>>       const struct cpuid_leaf leaf = { ctl->eax, ctl->ebx, ctl->ecx, 
>> ctl->edx };
>>       int old_vendor = p->x86_vendor;
>> +    int ret = 0;
>>       /*
>>        * Skip update for leaves we don't care about.  This avoids the 
>> overhead
>> @@ -74,6 +75,7 @@ static int update_domain_cpuid_info(struct domain *d,
>>           if ( ctl->input[0] == XSTATE_CPUID &&
>>                ctl->input[1] != 1 ) /* Everything else automatically 
>> calculated. */
>>               return 0;
>> +
>>           break;
>>       case 0x40000000: case 0x40000100:
>> @@ -104,6 +106,10 @@ static int update_domain_cpuid_info(struct domain 
>> *d,
>>               p->xstate.raw[ctl->input[1]] = leaf;
>>               break;
>> +        case SGX_CPUID:
>> +            p->sgx.raw[ctl->input[1]] = leaf;
>> +            break;
> 
> You also need to modify the higher switch statement so the toolstack 
> can't cause Xen to write beyond the end of .raw[].

Yes I understand your point now. Will do.

> 
>> +
>>           default:
>>               p->basic.raw[ctl->input[0]] = leaf;
>>               break;
>> @@ -255,6 +261,45 @@ static int update_domain_cpuid_info(struct domain 
>> *d,
>>           }
>>           break;
>> +    case 0x12:
>> +    {
>> +        uint64_t base_pfn, npages;
>> +
>> +        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
>> +            break;
>> +
>> +        if ( ctl->input[1] != 2 )
>> +            break;
>> +
>> +        /* SGX has not enabled */
>> +        if ( !p->feat.sgx || !p->sgx.sgx1 )
>> +            break;
>> +
>> +        /*
>> +         * If SGX is enabled in CPUID, then we are expecting valid 
>> EPC resource
>> +         * in sub-leaf 0x2. Return -EFAULT to notify toolstack that 
>> there's
>> +         * something wrong.
>> +         */
>> +        if ( !p->sgx.base_valid || !p->sgx.size_valid )
> 
> Is there any plausible usecase where only one of these is valid?  If 
> not, why are they split?

You mean why they are split in the SDM? I don't think there's any use 
case where only one is valid. In reality either both are valid or both 
are invalid; anything else would mean a bug in either the CPU ucode or 
the BIOS. This is just how the SDM defines them.

> 
>> +        {
>> +            ret = -EINVAL;
>> +            break;
>> +        }
>> +
>> +        base_pfn = (((uint64_t)(p->sgx.base_pfn_high)) << 20) |
>> +            (uint64_t)p->sgx.base_pfn_low;
>> +        npages = (((uint64_t)(p->sgx.npages_high)) << 20) |
>> +            (uint64_t)p->sgx.npages_low;
>> +
>> +        if ( !hvm_epc_populated(d) )
>> +            ret = hvm_populate_epc(d, base_pfn, npages);
>> +        else
>> +            if ( base_pfn != to_sgx(d)->epc_base_pfn ||
>> +                    npages != to_sgx(d)->epc_npages )
>> +                ret = -EINVAL;
>> +
>> +        break;
>> +    }
>>       case 0x80000001:
>>           if ( is_pv_domain(d) && ((levelling_caps & LCAP_e1cd) == 
>> LCAP_e1cd) )
>>           {
>> @@ -299,7 +344,7 @@ static int update_domain_cpuid_info(struct domain *d,
>>           break;
>>       }
>> -    return 0;
>> +    return ret;
>>   }
>>   void arch_get_domain_info(const struct domain *d,
>> diff --git a/xen/include/asm-x86/cpuid.h b/xen/include/asm-x86/cpuid.h
>> index ac25908eca..326f267263 100644
>> --- a/xen/include/asm-x86/cpuid.h
>> +++ b/xen/include/asm-x86/cpuid.h
>> @@ -61,10 +61,11 @@ extern struct cpuidmasks cpuidmask_defaults;
>>   /* Whether or not cpuid faulting is available for the current 
>> domain. */
>>   DECLARE_PER_CPU(bool, cpuid_faulting_enabled);
>> -#define CPUID_GUEST_NR_BASIC      (0xdu + 1)
>> +#define CPUID_GUEST_NR_BASIC      (0x12u + 1)
>>   #define CPUID_GUEST_NR_FEAT       (0u + 1)
>>   #define CPUID_GUEST_NR_CACHE      (5u + 1)
>>   #define CPUID_GUEST_NR_XSTATE     (62u + 1)
>> +#define CPUID_GUEST_NR_SGX        (0x2u + 1)
>>   #define CPUID_GUEST_NR_EXTD_INTEL (0x8u + 1)
>>   #define CPUID_GUEST_NR_EXTD_AMD   (0x1cu + 1)
>>   #define CPUID_GUEST_NR_EXTD       MAX(CPUID_GUEST_NR_EXTD_INTEL, \
>> @@ -169,6 +170,29 @@ struct cpuid_policy
>>           } comp[CPUID_GUEST_NR_XSTATE];
>>       } xstate;
>> +    union {
>> +        struct cpuid_leaf raw[CPUID_GUEST_NR_SGX];
>> +
>> +        struct {
>> +            /* Subleaf 0. */
>> +            uint32_t sgx1:1, sgx2:1, :30;
> 
> Please use bool bitfields for these.
> 
> Something like:
> 
> bool sgx1:1, sgx2:1;
> uint32_t :30;
> 
> should be fine.
> 

OK. Will do. Thanks.

>> +            uint32_t miscselect, /* c */ :32;
>> +            uint32_t maxenclavesize_n64:8, maxenclavesize_64:8, :16;
> 
> uint8_t for these please, rather than an 8 bit bitfield.
> 
> Can we use clearer names, such as maxsize_legacy and maxsize_long? They 
> will be accessed via p->sgx. anyway, so the "enclave" bit of context is 
> already present.

Sure. I'll change to the name you suggested.

> 
>> +
>> +            /* Subleaf 1. */
>> +            uint32_t init:1, debug:1, mode64:1, /*reserve*/:1, 
>> provisionkey:1,
>> +                     einittokenkey:1, :26;
> 
> bools as well please here.

Will do.

> 
>> +            uint32_t /* reserve */:32;
>> +            uint32_t xfrm_low, xfrm_high;
> 
> uint64_t xfrm ?
> 
> The XSAVE words are apart because they are not adjacent in the 
> architectural layout, but these are.

I think the reason I chose xfrm_low and xfrm_high is that I need to 
reference them in recalculate_sgx:

    if ( p->basic.xsave )
    {
        p->sgx.xfrm_low |= p->xstate.xcr0_low;
        p->sgx.xfrm_high |= p->xstate.xcr0_high;
    }

But I have no problem changing to xfrm.

> 
>> +
>> +            /* Subleaf 2. */
>> +            uint32_t base_valid:1, :11, base_pfn_low:20;
>> +            uint32_t base_pfn_high:20, :12;
>> +            uint32_t size_valid:1, :11, npages_low:20;
>> +            uint32_t npages_high:20, :12;
>> +        };
> 
> Are the {base,size}_valid fields correct?  The manual says they are 4-bit 
> fields rather than single-bit fields.

They are 4 bits in the SDM, but currently only the value 1 is defined 
(other values are reserved). I think for now a bool base_valid should be 
enough; we can extend it when new values come out. What's your suggestion?

> 
> I would also drop the _pfn from the base names.  The fields still need 
> shifting to get a sensible value.

OK. Will do.

> 
> ~Andrew
> 
>> +    } sgx;
>> +
>>       /* Extended leaves: 0x800000xx */
>>       union {
>>           struct cpuid_leaf raw[CPUID_GUEST_NR_EXTD];
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 05/15] xen: p2m: new 'p2m_epc' type for EPC mapping
  2017-07-12 12:21     ` George Dunlap
@ 2017-07-13  5:56       ` Huang, Kai
  0 siblings, 0 replies; 58+ messages in thread
From: Huang, Kai @ 2017-07-13  5:56 UTC (permalink / raw)
  To: George Dunlap, Andrew Cooper; +Cc: Kevin Tian, Kai Huang, jbeulich, xen-devel



On 7/13/2017 12:21 AM, George Dunlap wrote:
> 
>> On Jul 12, 2017, at 1:01 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>
>> On 09/07/17 10:12, Kai Huang wrote:
>>> A new 'p2m_epc' type is added for EPC mapping type. Two wrapper functions
>>> set_epc_p2m_entry and clear_epc_p2m_entry are also added for further use.
>>
>> Other groups in Intel have been looking to reduce the number of p2m types we have, so we can use more hardware defined bits in the EPT pagetable entries.
>>
>> If we need a new type then we will certainly add one, but it is not clear why this type is needed.
> 
> Does the hypervisor need to know which pages of a domain’s p2m 1) have valid config set up, but 2) aren’t accessible to itself or any other domain?

Hi Andrew, George,

Actually I haven't thought this through thoroughly, but at first glance 
there's no existing p2m_type that can reasonably be used for EPC. 
p2m_ram_rw and p2m_mmio_direct are probably the two potential 
candidates. For EPC with *static partitioning*, the Xen hypervisor just 
needs to set up the mappings and then leave them alone until the guest 
is destroyed, but for p2m_ram_rw and p2m_mmio_direct there is additional 
logic wherever Xen handles those two types. To me, adding 'p2m_epc' is 
more straightforward and safer. Maybe we could change it to a more 
generic name such as 'p2m_ram_encrypted'? But again, I am not sure other 
memory-encryption technologies can be treated the same way as EPC.

> 
>   -George


* Re: [PATCH 15/15] xen: tools: expose EPC in ACPI table
  2017-07-12 11:05   ` Andrew Cooper
@ 2017-07-13  8:23     ` Huang, Kai
  0 siblings, 0 replies; 58+ messages in thread
From: Huang, Kai @ 2017-07-13  8:23 UTC (permalink / raw)
  To: Andrew Cooper, Kai Huang, xen-devel; +Cc: ian.jackson, wei.liu2, jbeulich



On 7/12/2017 11:05 PM, Andrew Cooper wrote:
> On 09/07/17 10:16, Kai Huang wrote:
>> On physical machines EPC is exposed in the ACPI table via "INT0E0C".
>> Although EPC can be discovered via CPUID, the Windows driver requires
>> EPC to be exposed in the ACPI table as well. This patch exposes EPC in
>> the ACPI table.
> 
> :(
> 
>> diff --git a/tools/libacpi/dsdt.asl b/tools/libacpi/dsdt.asl
>> index fa8ff317b2..25ce196028 100644
>> --- a/tools/libacpi/dsdt.asl
>> +++ b/tools/libacpi/dsdt.asl
>> @@ -441,6 +441,55 @@ DefinitionBlock ("DSDT.aml", "DSDT", 2, "Xen", 
>> "HVM", 0)
>>                   }
>>               }
>>           }
>> +
>> +        Device (EPC)
> 
> Would it not be better to put this into an SSDT, and only expose it to 
> the guest if SGX is advertised?

You mean create a dedicated ssdt_epc.asl? I thought about this, but I am 
not quite sure whether we can, because the new EPC device needs to refer 
to \_SB.EMIN and \_SB.ELEN, which are in acpi_info, to get the EPC base 
and size. Can we refer to acpi_info from a dedicated ssdt_epc.asl?

Thanks,
-Kai

> 
> ~Andrew
> 


* Re: [PATCH 08/15] xen: x86: add SGX cpuid handling support.
  2017-07-13  5:42     ` Huang, Kai
@ 2017-07-14  7:37       ` Andrew Cooper
  2017-07-14 11:08         ` Jan Beulich
  2017-07-17  6:16         ` Huang, Kai
  0 siblings, 2 replies; 58+ messages in thread
From: Andrew Cooper @ 2017-07-14  7:37 UTC (permalink / raw)
  To: Huang, Kai, Kai Huang, xen-devel; +Cc: jbeulich

On 13/07/17 07:42, Huang, Kai wrote:
> On 7/12/2017 10:56 PM, Andrew Cooper wrote:
>> On 09/07/17 10:10, Kai Huang wrote:
>>
>> Why do we need this hide_epc parameter?  If we aren't providing any 
>> epc resource to the guest, the entire sgx union should be zero and 
>> the SGX feature bit should be hidden.
>
My intention was to hide the physical EPC info from pv_max_policy and 
hvm_max_policy (recalculate_sgx is also called by 
calculate_pv_max_policy and calculate_hvm_max_policy), as they are for 
guests and don't need the physical EPC info. But keeping the physical 
EPC info in them does no harm, so I think we can simply remove hide_epc.

It is my experience that providing half the information is worse than 
providing none or all of it, because developers are notorious for taking 
shortcuts when looking for features.

Patch 1 means that a PV guest will never have p->feat.sgx set. 
Therefore, we will hit the memset() below, and zero the whole of the SGX 
union.

>
IMO we cannot check whether the EPC is valid and zero the sgx union in 
recalculate_sgx, as it is called for each CPUID leaf update. For 
example, it is called for SGX subleaf 0, then 1, then 2; when subleaves 
0 and 1 are updated, the EPC resource is still 0 (it hasn't been 
configured yet).

recalculate_*() only get called when the toolstack makes updates to the 
policy.  It is an unfortunate side effect of the current implementation, 
but will be going away with my CPUID work.

The intended flow will be this:

At Xen boot:
* Calculates the raw, host and max policies (as we do today)

At domain create:
* Appropriate policy gets copied to make the default domain policy.
* Toolstack gets the whole policy at once with a new 
DOMCTL_get_cpuid_policy hypercall.
* Toolstack makes all adjustments (locally) that it wants to, based on 
configuration, etc.
* Toolstack makes a single DOMCTL_set_cpuid_policy hypercall.
* Xen audits the new policy proposed by the toolstack, resulting in a 
single yes/no decision.
** If not, the toolstack is told to try again.  This will likely result 
in xl asking the user to modify their .cfg file.
** If yes, the proposed policy becomes the actual policy.

This scheme will fix the current problem we have where the toolstack 
blindly proposes changes (one leaf at a time), and Xen has to zero the 
bits it doesn't like (because the toolstack has never traditionally 
checked the return value of the hypercall :( )

>
>
>>
>>> +
>>> +            /* Subleaf 2. */
>>> +            uint32_t base_valid:1, :11, base_pfn_low:20;
>>> +            uint32_t base_pfn_high:20, :12;
>>> +            uint32_t size_valid:1, :11, npages_low:20;
>>> +            uint32_t npages_high:20, :12;
>>> +        };
>>
>> Are the {base,size}_valid fields correct?  The manual says they are 
>> 4-bit fields rather than single-bit fields.
>
> They are 4 bits in the SDM, but currently only the value 1 is defined 
> (other values are reserved). I think for now a bool base_valid should 
> be enough; we can extend it when new values come out. What's your 
> suggestion?

Ok.  That can work for now.

>
>>
>> I would also drop the _pfn from the base names.  The fields still 
>> need shifting to get a sensible value.
>
> OK. Will do.

As a further thought, what about uint64_t base:40 and size:40?  That 
would reduce the complexity of calculating the values.

~Andrew


* Re: [PATCH 08/15] xen: x86: add SGX cpuid handling support.
  2017-07-14  7:37       ` Andrew Cooper
@ 2017-07-14 11:08         ` Jan Beulich
  2017-07-17  6:16         ` Huang, Kai
  1 sibling, 0 replies; 58+ messages in thread
From: Jan Beulich @ 2017-07-14 11:08 UTC (permalink / raw)
  To: Andrew Cooper, Kai Huang; +Cc: Kai Huang, xen-devel

>>> On 14.07.17 at 09:37, <andrew.cooper3@citrix.com> wrote:
> On 13/07/17 07:42, Huang, Kai wrote:
>> On 7/12/2017 10:56 PM, Andrew Cooper wrote:
>>> On 09/07/17 10:10, Kai Huang wrote:
>>>> +            /* Subleaf 2. */
>>>> +            uint32_t base_valid:1, :11, base_pfn_low:20;
>>>> +            uint32_t base_pfn_high:20, :12;
>>>> +            uint32_t size_valid:1, :11, npages_low:20;
>>>> +            uint32_t npages_high:20, :12;
>>>> +        };
>>>
>>> Are the {base,size}_valid fields correct?  The manual says they are 
>>> 4-bit fields rather than single-bit fields.
>>
>> They are 4 bits in the SDM, but currently only the value 1 is defined 
>> (other values are reserved). I think for now a bool base_valid should 
>> be enough; we can extend it when new values come out. What's your 
>> suggestion?
> 
> Ok.  That can work for now.
> 
>>
>>>
>>> I would also drop the _pfn from the base names.  The fields still 
>>> need shifting to get a sensible value.
>>
>> OK. Will do.
> 
> As a further thought, what about uint64_t base:40 and size:40?  That 
> would reduce the complexity of calculating the values.

But that may not really be portable. I've just checked the Intel
compiler (on Windows, admittedly), and it then starts the base
and size fields each on an 8-byte boundary. Hence all the other fields
would then also need to be uint64_t.

Jan



* Re: [PATCH 15/15] xen: tools: expose EPC in ACPI table
  2017-07-09  8:16 ` [PATCH 15/15] xen: tools: expose EPC in ACPI table Kai Huang
  2017-07-12 11:05   ` Andrew Cooper
@ 2017-07-14 11:31   ` Jan Beulich
  2017-07-17  6:11     ` Huang, Kai
  2017-07-17 10:54   ` Roger Pau Monné
  2 siblings, 1 reply; 58+ messages in thread
From: Jan Beulich @ 2017-07-14 11:31 UTC (permalink / raw)
  To: Kai Huang; +Cc: andrew.cooper3, wei.liu2, ian.jackson, xen-devel

>>> On 09.07.17 at 10:16, <kaih.linux@gmail.com> wrote:
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -330,6 +330,15 @@ cpuid(uint32_t idx, uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
>          : "0" (idx) );
>  }
>  
> +void cpuid_count(uint32_t idx, uint32_t count, uint32_t *eax,

Please name the first two leaf and subleaf.

> @@ -888,6 +897,18 @@ static uint8_t acpi_lapic_id(unsigned cpu)
>      return LAPIC_ID(cpu);
>  }
>  
> +static void get_epc_info(struct acpi_config *config)
> +{
> +    uint32_t eax, ebx, ecx, edx;
> +
> +    cpuid_count(0x12, 0x2, &eax, &ebx, &ecx, &edx);
> +
> +    config->epc_base = (((uint64_t)(ebx & 0xfffff)) << 32) |
> +                       (uint64_t)(eax & 0xfffff000);

Pointless cast.

> +    config->epc_size = (((uint64_t)(edx & 0xfffff)) << 32) |
> +                       (uint64_t)(ecx & 0xfffff000);

Again.

> --- a/tools/libacpi/dsdt.asl
> +++ b/tools/libacpi/dsdt.asl
> @@ -441,6 +441,55 @@ DefinitionBlock ("DSDT.aml", "DSDT", 2, "Xen", "HVM", 0)
>                  }
>              }
>          }
> +
> +        Device (EPC)
> +        {
> +            Name (_HID, EisaId ("INT0E0C"))
> +            Name (_STR, Unicode ("Enclave Page Cache 1.5"))
> +            Name (_MLS, Package (0x01)
> +            {
> +                Package (0x02)
> +                {
> +                    "en",
> +                    Unicode ("Enclave Page Cache 1.5")
> +                }
> +            })
> +            Name (RBUF, ResourceTemplate ()
> +            {
> +                QWordMemory (ResourceConsumer, PosDecode, MinFixed, MaxFixed,
> +                    Cacheable, ReadWrite,
> +                    0x0000000000000000, // Granularity
> +                    0x0000000000000000, // Range Minimum
> +                    0x0000000000000000, // Range Maximum
> +                    0x0000000000000000, // Translation Offset
> +                    0x0000000000000001, // Length
> +                    ,, _Y03,
> +                    AddressRangeMemory, TypeStatic)
> +            })
> +
> +            Method(_CRS, 0, NotSerialized) // _CRS: Current Resource Settings
> +            {
> +                CreateQwordField (RBUF, \_SB.EPC._Y03._MIN, EMIN) // _MIN: Minimum Base Address
> +                CreateQwordField (RBUF, \_SB.EPC._Y03._MAX, EMAX) // _MAX: Maximum Base Address
> +                CreateQwordField (RBUF, \_SB.EPC._Y03._LEN, ELEN) // _LEN: Length

Please see the comment in _SB.PCI0._CRS regarding operations
on qword fields. Even if we may not formally support the named
Windows versions anymore, we should continue to be careful
here. You could have noticed this by seeing that ...

> @@ -21,6 +21,8 @@
>             LMIN, 32,
>             HMIN, 32,
>             LLEN, 32,
> -           HLEN, 32
> +           HLEN, 32,
> +           EMIN, 64,
> +           ELEN, 64,
>         }

... there have been no 64-bit fields here so far.

> @@ -156,6 +156,9 @@ static int init_acpi_config(libxl__gc *gc,
>      config->lapic_id = acpi_lapic_id;
>      config->acpi_revision = 5;
>  
> +    config->epc_base = b_info->u.hvm.sgx.epcbase;
> +    config->epc_size = (b_info->u.hvm.sgx.epckb << 10);

Pointless parentheses. Plus I guess the field names could do with
an underscore separator in the middle - it took me a moment to
realize this is a kB value (explaining the shift by 10).

Jan



* Re: [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches
  2017-07-11 14:13 ` [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Andrew Cooper
@ 2017-07-17  6:08   ` Huang, Kai
  2017-07-21  9:04     ` Huang, Kai
  0 siblings, 1 reply; 58+ messages in thread
From: Huang, Kai @ 2017-07-17  6:08 UTC (permalink / raw)
  To: Andrew Cooper, Kai Huang, xen-devel
  Cc: kevin.tian, sstabellini, wei.liu2, George.Dunlap, ian.jackson,
	tim, jbeulich

Hi Andrew,

Thank you very much for the comments. Sorry for the late reply; please 
see my responses below.

On 7/12/2017 2:13 AM, Andrew Cooper wrote:
> On 09/07/17 09:03, Kai Huang wrote:
>> Hi all,
>>
>> This series is RFC Xen SGX virtualization support design and RFC draft 
>> patches.
> 
> Thank you very much for this design doc.
> 
>> 2. SGX Virtualization Design
>>
>> 2.1 High Level Toolstack Changes:
>>
>> 2.1.1 New 'epc' parameter
>>
>> EPC is a limited resource. In order to use EPC efficiently among all
>> domains, when creating a guest the administrator should be able to
>> specify the domain's virtual EPC size, and should also be able to get
>> every domain's virtual EPC size.
>>
>> For this purpose, a new 'epc = <size>' parameter is added to the XL
>> configuration file. This parameter specifies the guest's virtual EPC
>> size. The EPC base address will be calculated by the toolstack
>> internally, according to the guest's memory size, MMIO size, etc.
>> 'epc' is in MB units and any 1MB-aligned value will be accepted.
> 
> How will this interact with multi-package servers?  Even though it's fine 
> to implement the single-package support first, the design should be 
> extensible to the multi-package case.
> 
> First of all, what are the implications of multi-package SGX?
> 
> (Somewhere) you mention changes to scheduling.  I presume this is 
> because a guest with EPC mappings in EPT must be scheduled on the same 
> package, or ENCLU[EENTER] will fail.  I presume also that each package 
> will have separate, unrelated private keys?

The ENCLU[EENTER] leaf will continue to work on multi-package servers. 
Actually I was told that all existing ISA behavior documented in the SDM 
won't change for servers, as otherwise it would be a bad design :)

Unfortunately I was told I cannot talk about MP server SGX much now. 
Basically I can only talk about things already documented in the SDM 
(sorry :( ). But I guess the multiple EPC sections in CPUID are designed 
to cover MP servers, at least mainly (we can make a reasonable guess).

In terms of the design, I think we can follow the XL config file 
parameters for memory. The 'epc' parameter will always specify the total 
EPC size that the domain has. And we can use the existing NUMA-related 
parameters, such as setting cpus='...' to physically pin vCPUs to 
specific pCPUs, so that EPC will mostly be allocated from the related 
node. If that node runs out of EPC, we can decide whether to allocate 
EPC from another node or to fail domain creation. I know Linux supports 
a NUMA policy which specifies whether to allow allocating memory from 
other nodes; does Xen have such a policy? Sorry, I haven't checked this. 
If Xen has one, we need to choose whether to reuse the memory policy or 
introduce a new policy for EPC.

If we are going to support vNUMA EPC in the future, we can also use a 
similar way to configure vNUMA EPC in the XL config.

Sorry I mentioned scheduling; I should have said *potentially* :). My 
thinking was that, as SGX is per-thread, the SGX info reported by 
different CPU packages may differ (e.g. whether SGX2 is supported), so 
we might need the scheduler to be aware of SGX. But I think we don't 
have to consider this now.

What's your comments?

> 
> I presume there is no sensible way (even on native) for a single logical 
> process to use multiple different enclaves?  By extension, does it make 
> sense to try and offer parts of multiple enclaves to a single VM?

A native machine allows running multiple enclaves, even ones signed by 
multiple authors. SGX's only limitation is that before launching any 
other enclave, the Launch Enclave (LE) must be launched. The LE is the 
only enclave that doesn't require an EINITTOKEN in EINIT. For the LE, 
its signer (SHA256(sigstruct->modulus)) must equal the value in the 
IA32_SGXLEPUBKEYHASHn MSRs. The LE generates EINITTOKENs for other 
enclaves (EINIT for other enclaves requires an EINITTOKEN). For other 
enclaves, there's no limitation that the enclave's signer must match 
IA32_SGXLEPUBKEYHASHn, so the signer can be anybody. But for other 
enclaves, before running EINIT, the LE's signer (which equals 
IA32_SGXLEPUBKEYHASHn as explained above) needs to be written to 
IA32_SGXLEPUBKEYHASHn (the MSRs can change, for example, when there are 
multiple LEs running in the OS). This is because EINIT needs to perform 
the EINITTOKEN integrity check (the EINITTOKEN contains MAC info 
calculated by the LE, and EINIT needs the LE's IA32_SGXLEPUBKEYHASHn to 
derive the key to verify the MAC).

SGX in a VM doesn't change those behaviors, so in a VM the enclaves can 
also be signed by anyone, but Xen needs to emulate 
IA32_SGXLEPUBKEYHASHn so that when a VM is running, the correct 
IA32_SGXLEPUBKEYHASHn values are already in the physical MSRs.

> 
>> 2.1.3 Notify domain's virtual EPC base and size to Xen
>>
>> Xen needs to know guest's EPC base and size in order to populate EPC 
>> pages for
>> it. Toolstack notifies EPC base and size to Xen via XEN_DOMCTL_set_cpuid.
> 
> I am currently in the process of reworking the Xen/Toolstack interface 
> when it comes to CPUID handling.  The latest design is available here: 
> https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg00378.html 
> but the end result will be the toolstack expressing its CPUID policy in 
> terms of the architectural layout.
> 
> Therefore, I would expect that, however the setting is represented in 
> the configuration file, xl/libxl would configure it with the hypervisor 
> by setting CPUID.0x12[2] with the appropriate base and size.

I agree. I saw you are planning to introduce new 
XEN_DOMCTL_{get,set}_cpuid_policy hypercalls, which will allow the 
toolstack to query/set the whole CPUID policy in a single hypercall (if 
I understand correctly), so I think we should definitely use the new 
hypercalls.

I also saw you are planning to introduce a new hypercall to query the 
raw/host/pv_max/hvm_max CPUID policies (not just the featureset), so I 
think 'xl sgxinfo' (or 'xl info -sgx') can certainly use that to get the 
physical SGX info (EPC info). And 'xl sgxlist' (or 'xl list -sgx') can 
use XEN_DOMCTL_{get,set}_cpuid_policy to display a domain's SGX info 
(EPC info).

Btw, do you think we need 'xl sgxinfo' and 'xl sgxlist'? If we do, which 
is better: new 'xl sgxinfo' and 'xl sgxlist' commands, or extending the 
existing 'xl info' and 'xl list' to support SGX, as in 'xl info -sgx' 
and 'xl list -sgx' above?


> 
>> 2.1.4 Launch Control Support (?)
>>
>> Xen Launch Control support is about supporting running multiple
>> domains, each with its own LE signed by a different owner (if the HW
>> allows, explained below). As explained in 1.4 SGX Launch Control,
>> EINIT for the LE (Launch Enclave) only succeeds when
>> SHA256(SIGSTRUCT.modulus) matches IA32_SGXLEPUBKEYHASHn, and EINIT for
>> other enclaves derives the EINITTOKEN key according to
>> IA32_SGXLEPUBKEYHASHn. Therefore, to support this, the guest's virtual
>> IA32_SGXLEPUBKEYHASHn must be written to the physical MSRs before
>> EINIT (which also means the physical IA32_SGXLEPUBKEYHASHn need to be
>> *unlocked* in the BIOS before booting to the OS).
>>
>> For a physical machine, it is the BIOS writer's decision whether the
>> BIOS provides an interface for the user to specify a customized
>> IA32_SGXLEPUBKEYHASHn (it defaults to the digest of Intel's signing
>> key after reset). In reality, the OS's SGX driver may require the BIOS
>> to leave the MSRs *unlocked* and may actively write the hash value to
>> the MSRs in order to run EINIT successfully, as in this case the
>> driver does not depend on the BIOS's capability (whether it allows the
>> user to customize the IA32_SGXLEPUBKEYHASHn value).
>>
>> The problem for Xen is: do we need a new parameter, such as
>> 'lehash=<SHA256>', to specify the default value of the guest's virtual
>> IA32_SGXLEPUBKEYHASHn? And do we need a new parameter, such as 'lewr',
>> to specify whether the guest's virtual MSRs are locked or not before
>> handing control to the guest's OS?
>>
>> I tend not to introduce 'lehash', as it seems the SGX driver will
>> actively update the MSRs, and a new parameter would require additional
>> changes in upper-layer software (such as OpenStack). And 'lewr' is not
>> needed either, as Xen can always *unlock* the MSRs for the guest.
>>
>> Please give comments?
>>
>> Currently, in my RFC patches, the above two parameters are not
>> implemented. The Xen hypervisor always *unlocks* the MSRs. Whether
>> there is a 'lehash' parameter or not doesn't impact the Xen
>> hypervisor's emulation of IA32_SGXLEPUBKEYHASHn. See the Xen
>> hypervisor changes below for details.
> 
> Reading around, am I correct with the following?
> 
> 1) Some processors have no launch control.  There is no restriction on 
> which enclaves can boot.

Yes, some processors have no launch control. However, that doesn't mean 
there's no restriction on which enclaves can boot. On the contrary, on 
those machines only Intel's Launch Enclave (LE) can run, as 
IA32_SGXLEPUBKEYHASHn either doesn't exist or equals the digest of 
Intel's signing RSA pubkey. Although only Intel's LE can run, we can 
still run other enclaves from other signers. Please see my reply above.

> 
> 2) Some Skylake client processors claim to have launch control, but the 
> MSRs are unavailable (is this an erratum?).  These are limited to 
> booting enclaves matching the Intel public key.

Sorry I don't know whether this is an erratum. I will get back to you 
after confirming internally.


> 
> 3) Launch control may be locked by the BIOS.  There may be a custom 
> hash, or it might be the Intel default.  Xen can't adjust it at all, but 
> can support running any number of VMs with matching enclaves.

Yes, launch control may be locked by the BIOS, although this depends on 
whether the BIOS provides an interface for the user to configure it. I 
was told that typically the BIOS will leave launch control unlocked, as 
the SGX driver expects such behavior. But I am not sure we can always 
assume this.

Whether there is a custom hash also depends on the BIOS, which may or 
may not provide an interface for the user to configure one. So on 
physical machines, I think we need to consider all the cases. On a 
machine with launch control *unlocked*, Xen is able to dynamically 
change IA32_SGXLEPUBKEYHASHn so that it can run multiple VMs, each 
running an LE from a different signer. However, if launch control is 
*locked* in the BIOS, then Xen is still able to run multiple VMs, but 
all VMs can only run an LE from the signer that matches 
IA32_SGXLEPUBKEYHASHn (which in most cases should be the Intel default, 
but can be a custom hash if the BIOS allows the user to configure one).

Sorry, I am not quite sure about typical BIOS implementations. I will 
reach out internally and get back to you if I have something.

> 
> 4) Launch control may be unlocked by the BIOS.  In this case, Xen can 
> context switch a hash per domain, and run all enclaves.

Yes, assuming by "enclaves" you meant LEs.

> 
> The eventual plans for CPUID and MSR levelling should allow all of these 
> to be expressed in sensible ways, and I don't forsee any issues with 
> supporting all of these scenarios.

So do you think we should have 'lehash' and 'lewr' parameters in the XL 
config file? The former provides a custom hash, and the latter specifies 
whether to unlock the guest's launch control.

My thinking is that the SGX driver needs to *actively* write the LE's 
pubkey hash to IA32_SGXLEPUBKEYHASHn in *unlocked* mode, so 'lehash' 
alone is not needed. 'lehash' is only meaningful together with 'lewr', 
to provide a default hash value in locked mode; if we always use 
*unlocked* mode for the guest, 'lehash' is not necessary.
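For concreteness, the two parameters under discussion would look 
something like this in a guest config file (illustrative only; neither 
'lehash' nor 'lewr' is implemented in the RFC patches):

```
epc = 128       # guest virtual EPC size, in MB (this RFC's parameter)
lewr = 1        # hypothetical: leave guest IA32_SGXLEPUBKEYHASHn unlocked
lehash = '<sha256 hex digest of the LE signing key>'  # hypothetical: locked-mode default
```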

> 
> 
> 
>> 2.2 High Level Xen Hypervisor Changes:
>>
>> 2.2.1 EPC Management (?)
>>
>> Xen hypervisor needs to detect SGX, discover EPC, and manage EPC before
>> supporting SGX to guest. EPC is detected via SGX CPUID 0x12.0x2. It's 
>> possible
>> that there are multiple EPC sections (enumerated via sub-leaves 0x3 
>> and so on,
>> until invaid EPC is reported), but this is only true on 
>> multiple-socket server
>> machines. For server machines there are additional things also needs 
>> to be done,
>> such as NUMA EPC, scheduling, etc. We will support server machine in 
>> the future
>> but currently we only support one EPC.
>>
>> EPC is reported as reserved memory (so it is not reported as normal 
>> memory).
>> EPC must be managed in 4K pages. CPU hardware uses EPCM to track 
>> status of each
>> EPC pages. Xen needs to manage EPC and provide functions to, ie, alloc 
>> and free
>> EPC pages for guest.
>>
>> There are two ways to manage EPC: Manage EPC separately; or Integrate 
>> it to
>> existing memory management framework.
>>
>> It is easy to manage EPC separately, as currently EPC is pretty small
>> (~100MB), and we can even put all pages in a single list. However it is
>> not flexible: for example, you will have to write new algorithms when
>> EPC becomes larger (e.g., GBs), and you have to write new code to
>> support NUMA EPC (although this will not come in the short term).
>>
>> Integrating EPC into the existing memory management framework seems
>> more reasonable, as in this way we can reuse the memory management data
>> structures/algorithms, and it will be more flexible for supporting
>> larger EPC and potentially NUMA EPC. But modifying the MM framework has
>> a higher risk of breaking existing memory management code (potentially
>> more bugs).
>>
>> In my RFC patches we currently choose to manage EPC separately. A new
>> structure, epc_page, is added to represent a single 4K EPC page. A
>> whole array of struct epc_page is allocated during EPC initialization,
>> so that given either one, the PFN of an EPC page and its
>> 'struct epc_page' can be converted to each other by adding an offset.
>>
>> But maybe integrating EPC into the MM framework is more reasonable.
>> Comments?
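The array-plus-offset conversion described above can be sketched as follows. This is a standalone model, not the actual RFC code; the struct layout, `epc_base_pfn`, and the helper names are assumptions:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Sketch of the proposed scheme: one struct epc_page per 4K EPC page,
 * held in a single array allocated at init time, so PFN <-> struct
 * conversions are plain index arithmetic.
 */
struct epc_page {
    struct epc_page *next;  /* free-list link (illustrative) */
};

static struct epc_page *epc_pages;   /* array, one entry per EPC page */
static unsigned long epc_base_pfn;   /* PFN of the first EPC page */

static struct epc_page *epc_pfn_to_page(unsigned long pfn)
{
    return epc_pages + (pfn - epc_base_pfn);
}

static unsigned long epc_page_to_pfn(const struct epc_page *page)
{
    return epc_base_pfn + (page - epc_pages);
}
```

The round trip is O(1) in both directions, which is the main attraction of keeping EPC in its own contiguous tracking array rather than in the general frame table.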
>>
>> 2.2.2 EPC Virtualization (?)
> 
> It looks like managing the EPC is very similar to managing the NVDIMM 
> ranges.  We have a (set of) physical address ranges which need 4k 
> ownership granularity to different domains.
> 
> I think integrating this into struct page_struct is the better way to go.

Will do. So I assume we will introduce a new MEMF_epc, and use the 
existing alloc_domheap/xenheap_pages to allocate EPC? MEMF_epc can also 
be used if we need to support ballooning in the future (using the 
existing XENMEM_{decrease/increase}_reservation).

> 
>>
>> This part is how to populate EPC for guests. We have 3 choices:
>>      - Static Partitioning
>>      - Oversubscription
>>      - Ballooning
>>
>> Static Partitioning means all EPC pages will be allocated and mapped
>> to the guest when it is created, and there is no runtime change of
>> page table mappings for EPC pages. Oversubscription means the Xen
>> hypervisor supports EPC page swapping between domains, i.e., Xen is
>> able to evict an EPC page from one domain and assign it to the domain
>> that needs it. With oversubscription, EPC can be assigned to a domain
>> on demand, when an EPT violation happens. Ballooning is similar to
>> memory ballooning; it is basically "Static Partitioning" + "Balloon
>> driver" in the guest.
>>
>> Static Partitioning is the easiest way in terms of implementation, and
>> there will be no hypervisor overhead (except EPT overhead of course),
>> because with "Static Partitioning" there are no EPT violations for
>> EPC, and Xen doesn't need to turn on ENCLS VMEXIT for the guest, as
>> ENCLS runs perfectly in non-root mode.
>>
>> Ballooning is "Static Partitioning" + "Balloon driver" in the guest.
>> Like "Static Partitioning", ballooning doesn't need to turn on ENCLS
>> VMEXIT, and doesn't have EPT violations for EPC either. To support
>> ballooning, we need a balloon driver in the guest to issue hypercalls
>> to give up or reclaim EPC pages. In terms of the hypercall, we have
>> two choices: 1) add a new hypercall for EPC ballooning; 2) use the
>> existing XENMEM_{increase/decrease}_reservation with a new memory
>> flag, e.g., XENMEMF_epc. I'll discuss adding a dedicated hypercall (or
>> not) later.
>>
>> Oversubscription looks nice but it requires a more complicated
>> implementation. Firstly, as explained in 1.3.3 EPC Eviction & Reload,
>> we need to follow specific steps to evict EPC pages, and in order to
>> do that, Xen basically needs to trap ENCLS from the guest and keep
>> track of EPC page status and enclave info from all guests. This is
>> because:
>>      - To evict a regular EPC page, Xen needs to know the SECS
>>        location.
>>      - Xen needs to know the EPC page type: evicting regular EPC
>>        pages, SECS pages, and VA pages takes different steps.
>>      - Xen needs to know the EPC page status: whether the page is
>>        blocked or not.
>>
>> This information can only be obtained by trapping ENCLS from the guest
>> and parsing its parameters (to identify the SECS page, etc). Parsing
>> ENCLS parameters means we need to know which ENCLS leaf is being
>> trapped, and we need to translate the guest's virtual addresses to get
>> the physical addresses in order to locate the EPC pages. And once
>> ENCLS is trapped, we have to emulate ENCLS in Xen, which means we need
>> to reconstruct the ENCLS parameters by remapping all of the guest's
>> virtual addresses to Xen's virtual addresses (gva->gpa->pa->xen_va),
>> as ENCLS always uses *effective addresses*, which are translated by
>> the processor when running ENCLS.
>>
>>      --------------------------------------------------------------
>>                  |   ENCLS   |
>>      --------------------------------------------------------------
>>                  |          /|\
>>      ENCLS VMEXIT|           | VMENTRY
>>                  |           |
>>                 \|/          |
>>
>>         1) parse ENCLS parameters
>>         2) reconstruct(remap) guest's ENCLS parameters
>>         3) run ENCLS on behalf of guest (and skip ENCLS)
>>         4) on success, update EPC/enclave info, or inject error
>>
>> And Xen needs to maintain each EPC page's status (type, blocked or
>> not, in an enclave or not, etc). Xen also needs to maintain all
>> enclave info from all guests, in order to find the correct SECS for a
>> regular EPC page, as well as the enclave's linear address.
>>
>> So in general, "Static Partitioning" has the simplest implementation,
>> but is obviously not the best way to use EPC efficiently; "Ballooning"
>> has all the pros of Static Partitioning but requires a guest balloon
>> driver; "Oversubscription" is best in terms of flexibility but
>> requires a complicated hypervisor implementation.
>>
>> We have implemented "Static Partitioning" in the RFC patches, but need
>> your feedback on whether it is enough. If not, which one should we do
>> in the next stage -- Ballooning or Oversubscription? IMO Ballooning
>> may be good enough, given the fact that currently normal memory is
>> also "Static Partitioning" + "Ballooning".
>>
>> Comments?
> 
> Definitely go for static partitioning to begin with.  This is far 
> simpler to implement.
> 
> I can't see a pressing usecase for oversubscription or ballooning. Any 
> datacenter work will be using exclusively static, and I expect static 
> will be fine for all (or at least, most) client usecases.

Thanks. So for the first stage I will focus on static partitioning.

> 
>>
>> 2.2.3 Populate EPC for Guest
>>
>> The toolstack notifies Xen about the domain's EPC base and size via
>> XEN_DOMCTL_set_cpuid, so currently Xen populates all EPC pages for the
>> guest in XEN_DOMCTL_set_cpuid, particularly when handling
>> XEN_DOMCTL_set_cpuid for CPUID.0x12.0x2. Once Xen has checked that the
>> values passed from the toolstack are valid, Xen allocates all EPC
>> pages and sets up the EPT mappings for the guest.
>>
>> 2.2.4 New Dedicated Hypercall (?)
> 
> All this information should (eventually) be available via the 
> appropriate SYSCTL_get_{cpuid,msr}_policy hypercalls.  I don't see any 
> need for dedicated hypercalls.

Yes I agree.  Originally I was concerned that without a dedicated 
hypercall it would be hard to implement 'xl sgxinfo' and 'xl sgxlist', 
but according to your new CPUID enhancement plan, the two can be done 
via the new hypercalls to query Xen's and the domain's CPUID policy. See 
my reply above regarding "Notify Xen about guest's EPC info".

> 
>> 2.2.9 Guest Suspend & Resume
>>
>> On hardware, EPC is destroyed when the power state goes to S3-S5. So
>> Xen will destroy the guest's EPC when the guest's power state goes
>> into S3-S5. Currently Xen is notified by Qemu of S-state changes via
>> HVM_PARAM_ACPI_S_STATE, and Xen will destroy the EPC if the S state is
>> S3-S5.
>>
>> Specifically, Xen will run EREMOVE on each of the guest's EPC pages,
>> as the guest may not handle EPC suspend & resume correctly, in which
>> case the guest's EPC pages may physically still be valid, so Xen needs
>> to run EREMOVE to make sure all EPC pages become invalid. Otherwise
>> further operations on EPC in the guest may fault, as the guest assumes
>> all EPC pages are invalid after it is resumed.
>>
>> For SECS pages, EREMOVE may fault with SGX_CHILD_PRESENT, in which
>> case Xen will put the SECS page on a list and call EREMOVE on it again
>> after EREMOVE has been called on all other EPC pages. This time the
>> EREMOVE on the SECS will succeed, as all its children (regular EPC
>> pages) have already been removed.
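The two-pass teardown can be sketched as follows. This is a simulation with a stubbed EREMOVE (the real leaf is an ENCLS instruction); SGX_CHILD_PRESENT and the page-type names follow the SDM, while the structs and helpers are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define SGX_CHILD_PRESENT 30   /* EREMOVE error code from the SDM */

enum epc_type { EPC_REG, EPC_SECS, EPC_VA };

struct page {
    enum epc_type type;
    int valid;
    struct page *secs;    /* parent SECS for regular pages */
    struct page *retry;   /* list link for faulting SECS pages */
};

/* Stub: EREMOVE fails on a SECS that still has valid children. */
static int eremove(struct page *p, struct page *all, size_t n)
{
    if ( p->type == EPC_SECS )
        for ( size_t i = 0; i < n; i++ )
            if ( all[i].valid && all[i].secs == p )
                return SGX_CHILD_PRESENT;
    p->valid = 0;
    return 0;
}

/*
 * Pass 1: EREMOVE everything, queueing SECS pages that fault with
 * SGX_CHILD_PRESENT.  Pass 2: EREMOVE the queued SECS pages, which now
 * succeeds because their children were removed in pass 1.
 */
static void destroy_epc(struct page *pages, size_t n)
{
    struct page *retry_list = NULL;

    for ( size_t i = 0; i < n; i++ )
        if ( pages[i].valid && eremove(&pages[i], pages, n) )
        {
            pages[i].retry = retry_list;
            retry_list = &pages[i];
        }

    for ( ; retry_list; retry_list = retry_list->retry )
        assert(eremove(retry_list, pages, n) == 0);
}
```

The real implementation would of course issue the ENCLS[EREMOVE] leaf against machine addresses and handle further error codes; the sketch only captures the retry-list ordering.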
>>
>> 2.2.10 Destroying Domain
>>
>> Normally Xen just frees all EPC pages for a domain when it is
>> destroyed. But Xen will also do EREMOVE on all of the guest's EPC
>> pages (as described in 2.2.7 above) before freeing them, as the guest
>> may shut down unexpectedly (e.g., the user kills the guest), and in
>> this case the guest's EPC may still be valid.
>>
>> 2.3 Additional Point: Live Migration, Snapshot Support (?)
> 
> How big is the EPC?  If we are talking MB rather than GB, movement of 
> the EPC could be after the pause, which would add some latency to live 
> migration but should work.  I expect that people would prefer to have 
> the flexibility of migration even at the cost of extra latency.
> 

The EPC is typically ~100MB at maximum (as I have observed). The EPC is 
typically reserved by the BIOS, together with the EPCM (EPC map, which 
is invisible to software), as processor reserved memory (PRM). On real 
machines, both our internal development machines and machines from Dell, 
HP and Lenovo (that you can buy on the market now), the BIOS always 
provides 3 choices for the PRM size: 32M, 64M, and 128M. With 128M of 
PRM, the EPC is slightly less than 100M.

The problem is that EPC cannot be moved. I think you were suggesting 
moving the EPC by evicting it at the last stage, copying the evicted 
content to the remote machine, and then reloading it. However I don't 
think this will work, as EPC eviction itself needs to use a VA slot 
(which is itself EPC), so you can imagine that the VA slots cannot be 
moved to the remote machine. Even if they could, they could not be used 
to reload EPC on the remote machine, as the info in a VA slot is bound 
to the platform and cannot be used remotely.

To support live migration, we can only choose to ignore EPC during live 
migration and let the guest SGX driver/userspace SW stack handle 
restoring the enclaves (which is actually a lot simpler in the 
hypervisor/toolstack implementation). The guest SGX driver needs to 
handle loss of EPC anyway, as EPC is destroyed in S3-S5. The only 
difference is that to support live migration, the guest SGX driver needs 
to support *sudden* loss of EPC, which is not HW behavior. I was told 
that currently both the Windows and Linux SGX drivers already support 
*sudden* loss of EPC, which leaves us the question of whether we need to 
support SGX live migration (and snapshot).

> ~Andrew
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel

* Re: [PATCH 15/15] xen: tools: expose EPC in ACPI table
  2017-07-14 11:31   ` Jan Beulich
@ 2017-07-17  6:11     ` Huang, Kai
  0 siblings, 0 replies; 58+ messages in thread
From: Huang, Kai @ 2017-07-17  6:11 UTC (permalink / raw)
  To: Jan Beulich, Kai Huang; +Cc: andrew.cooper3, wei.liu2, ian.jackson, xen-devel



On 7/14/2017 11:31 PM, Jan Beulich wrote:
>>>> On 09.07.17 at 10:16, <kaih.linux@gmail.com> wrote:
>> --- a/tools/firmware/hvmloader/util.c
>> +++ b/tools/firmware/hvmloader/util.c
>> @@ -330,6 +330,15 @@ cpuid(uint32_t idx, uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
>>           : "0" (idx) );
>>   }
>>   
>> +void cpuid_count(uint32_t idx, uint32_t count, uint32_t *eax,
> 
> Please name the first two leaf and subleaf.

Sure will do.

> 
>> @@ -888,6 +897,18 @@ static uint8_t acpi_lapic_id(unsigned cpu)
>>       return LAPIC_ID(cpu);
>>   }
>>   
>> +static void get_epc_info(struct acpi_config *config)
>> +{
>> +    uint32_t eax, ebx, ecx, edx;
>> +
>> +    cpuid_count(0x12, 0x2, &eax, &ebx, &ecx, &edx);
>> +
>> +    config->epc_base = (((uint64_t)(ebx & 0xfffff)) << 32) |
>> +                       (uint64_t)(eax & 0xfffff000);
> 
> Pointless cast.
> 
>> +    config->epc_size = (((uint64_t)(edx & 0xfffff)) << 32) |
>> +                       (uint64_t)(ecx & 0xfffff000);
> 
> Again.

Will do.
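For reference, the cast-free form Jan is suggesting could look like the helper below. This is a sketch with illustrative register values, not the actual patch; `epc_field` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * CPUID.0x12, sub-leaf 0x2: EAX[31:12] and EBX[19:0] give the EPC base,
 * ECX[31:12] and EDX[19:0] the EPC size; the low 4 bits of EAX/ECX are
 * the "valid" encoding (1 == valid EPC section).  Both base and size
 * share the same split, so one helper covers both.  No cast is needed
 * on the low half: it is already masked to bits 31:12 and is promoted
 * when OR'd with the uint64_t high half.
 */
static uint64_t epc_field(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)(hi & 0xfffff) << 32) | (lo & 0xfffff000);
}
```

With that, `config->epc_base = epc_field(eax, ebx);` and `config->epc_size = epc_field(ecx, edx);` would replace the two open-coded expressions.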

> 
>> --- a/tools/libacpi/dsdt.asl
>> +++ b/tools/libacpi/dsdt.asl
>> @@ -441,6 +441,55 @@ DefinitionBlock ("DSDT.aml", "DSDT", 2, "Xen", "HVM", 0)
>>                   }
>>               }
>>           }
>> +
>> +        Device (EPC)
>> +        {
>> +            Name (_HID, EisaId ("INT0E0C"))
>> +            Name (_STR, Unicode ("Enclave Page Cache 1.5"))
>> +            Name (_MLS, Package (0x01)
>> +            {
>> +                Package (0x02)
>> +                {
>> +                    "en",
>> +                    Unicode ("Enclave Page Cache 1.5")
>> +                }
>> +            })
>> +            Name (RBUF, ResourceTemplate ()
>> +            {
>> +                QWordMemory (ResourceConsumer, PosDecode, MinFixed, MaxFixed,
>> +                    Cacheable, ReadWrite,
>> +                    0x0000000000000000, // Granularity
>> +                    0x0000000000000000, // Range Minimum
>> +                    0x0000000000000000, // Range Maximum
>> +                    0x0000000000000000, // Translation Offset
>> +                    0x0000000000000001, // Length
>> +                    ,, _Y03,
>> +                    AddressRangeMemory, TypeStatic)
>> +            })
>> +
>> +            Method(_CRS, 0, NotSerialized) // _CRS: Current Resource Settings
>> +            {
>> +                CreateQwordField (RBUF, \_SB.EPC._Y03._MIN, EMIN) // _MIN: Minimum Base Address
>> +                CreateQwordField (RBUF, \_SB.EPC._Y03._MAX, EMAX) // _MAX: Maximum Base Address
>> +                CreateQwordField (RBUF, \_SB.EPC._Y03._LEN, ELEN) // _LEN: Length
> 
> Please see the comment in _SB.PCI0._CRS regarding operations
> on qword fields. Even if we may not formally support the named
> Windows versions anymore, we should continue to be careful
> here. You could have noticed this by seeing that ...
> 
>> @@ -21,6 +21,8 @@
>>              LMIN, 32,
>>              HMIN, 32,
>>              LLEN, 32,
>> -           HLEN, 32
>> +           HLEN, 32,
>> +           EMIN, 64,
>> +           ELEN, 64,
>>          }
> 
> ... there have been no 64-bit fields here so far.

Thank you for pointing this out. I'll take a look.

> 
>> @@ -156,6 +156,9 @@ static int init_acpi_config(libxl__gc *gc,
>>       config->lapic_id = acpi_lapic_id;
>>       config->acpi_revision = 5;
>>   
>> +    config->epc_base = b_info->u.hvm.sgx.epcbase;
>> +    config->epc_size = (b_info->u.hvm.sgx.epckb << 10);
> 
> Pointless parentheses. Plus I guess the field names could do with
> an underscore separator in the middle - it took me a moment to
> realize this is a kB value (explaining the shift by 10).

Sure. will change to epc_kb and epc_base :)

Thanks,
-Kai
> 
> Jan
> 
> 

* Re: [PATCH 08/15] xen: x86: add SGX cpuid handling support.
  2017-07-14  7:37       ` Andrew Cooper
  2017-07-14 11:08         ` Jan Beulich
@ 2017-07-17  6:16         ` Huang, Kai
  1 sibling, 0 replies; 58+ messages in thread
From: Huang, Kai @ 2017-07-17  6:16 UTC (permalink / raw)
  To: Andrew Cooper, Kai Huang, xen-devel; +Cc: jbeulich



On 7/14/2017 7:37 PM, Andrew Cooper wrote:
> On 13/07/17 07:42, Huang, Kai wrote:
>> On 7/12/2017 10:56 PM, Andrew Cooper wrote:
>>> On 09/07/17 10:10, Kai Huang wrote:
>>>
>>> Why do we need this hide_epc parameter?  If we aren't providing any 
>>> epc resource to the guest, the entire sgx union should be zero and 
>>> the SGX feature bit should be hidden.
>>
>> My intention was to hide physical EPC info for pv_max_policy and 
>> hvm_max_policy (recalculate_sgx is also called by 
>> calculate_pv_max_policy and calculate_hvm_max_policy), as they are for 
>> guest and don't need physical EPC info. But keeping physical EPC info 
>> in them does no harm so I think we can simply remove hide_epc.
> 
> It is my experience that providing half the information is worse than 
> providing none or all of it, because developers are notorious for taking 
> shortcuts when looking for features.
> 
> Patch 1 means that a PV guest will never have p->feat.sgx set. 
> Therefore, we will hit the memset() below, and zero the whole of the SGX 
> union.

Yes I'll remove hide_epc. It is not absolutely needed.

> 
>>
>> IMO we cannot check whether EPC is valid and zero sgx union in 
>> recalculate_sgx, as it is called for each CPUID. For example, it is 
>> called for SGX subleaf 0, and 1, and then 2, and when subleaf 0 and 1 
>> are called, the EPC resource is 0 (hasn't been configured).
> 
> recalculate_*() only get called when the toolstack makes updates to the 
> policy.  It is an unfortunate side effect of the current implementation, 
> but will be going away with my CPUID work.
> 
> The intended flow will be this:
> 
> At Xen boot:
> * Calculates the raw, host and max policies (as we do today)
> 
> At domain create:
> * Appropriate policy gets copied to make the default domain policy.
> * Toolstack gets the whole policy at one with a new 
> DOMCTL_get_cpuid_policy hypercall.
> * Toolstack makes all adjustments (locally) that it wants to, based on 
> configuration, etc.
> * Toolstack makes a single DOMCTL_set_cpuid_policy hypercall.
> * Xen audits the new policy proposed by the toolstack, resulting in a 
> single yes/no decision.
> ** If not, the toolstack is told to try again.  This will likely result 
> in xl asking the user to modify their .cfg file.
> ** If yes, the proposed policy becomes the actual policy.
> 
> This scheme will fix the current problem we have where the toolstack 
> blindly proposes changes (one leaf at a time), and Xen has to zero the 
> bits it doesn't like (because the toolstack has never traditionally 
> checked the return value of the hypercall :( )

This is actually what I was looking for when implementing CPUID support 
for SGX. I think I'll wait for your work to be merged to Xen and then do 
my work above your work. :)

Thanks,
-Kai

> 
>>
>>
>>>
>>>> +
>>>> +            /* Subleaf 2. */
>>>> +            uint32_t base_valid:1, :11, base_pfn_low:20;
>>>> +            uint32_t base_pfn_high:20, :12;
>>>> +            uint32_t size_valid:1, :11, npages_low:20;
>>>> +            uint32_t npages_high:20, :12;
>>>> +        };
>>>
>>> Are the {base,size}_valid fields correct?  The manual says the are 
>>> 4-bit fields rather than single bit fields.
>>
>> They are 4 bits in SDM but actually currently only bit 1 is valid 
>> (other values are reserved). I think for now bool base_valid should be 
>> enough. We can extend when new values come out. What's your suggestion?
> 
> Ok.  That can work for now.
> 
>>
>>>
>>> I would also drop the _pfn from the base names.  The fields still 
>>> need shifting to get a sensible value.
>>
>> OK. Will do.
> 
> As a further thought, what about uint64_t base:40 and size:40?  That 
> would reduce the complexity of calculating the values.
> 
> ~Andrew
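Andrew's `uint64_t base:40` suggestion could be modelled like this. It is a sketch assuming C11 anonymous structs and GCC-style little-endian bitfield layout (which is what the Xen CPUID policy structures already rely on); the union name and the sample register values are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/*
 * CPUID.0x12 sub-leaf 2, viewed as two 64-bit halves: bits 12..51 of
 * {EBX:EAX} hold the EPC base >> 12, bits 12..51 of {EDX:ECX} hold the
 * size >> 12, and the low 4 bits of each half are the "valid" encoding.
 * Using 40-bit fields avoids stitching the 20-bit low/high parts
 * together by hand.
 */
union epc_subleaf {
    struct {
        uint64_t base_valid:4, :8, base:40, :12;
        uint64_t size_valid:4, :8, size:40, :12;
    };
    uint32_t raw[4];   /* eax, ebx, ecx, edx */
};
```

Recovering the physical values is then just `base << 12` and `size << 12`, with no masking or shifting of split 20-bit fields.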
> 

* Re: [PATCH 01/15] xen: x86: expose SGX to HVM domain in CPU featureset
  2017-07-12 11:09   ` Andrew Cooper
@ 2017-07-17  6:20     ` Huang, Kai
  0 siblings, 0 replies; 58+ messages in thread
From: Huang, Kai @ 2017-07-17  6:20 UTC (permalink / raw)
  To: Andrew Cooper, Kai Huang, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, ian.jackson, tim, jbeulich



On 7/12/2017 11:09 PM, Andrew Cooper wrote:
> On 09/07/17 10:04, Kai Huang wrote:
>> Expose SGX in CPU featureset for HVM domain. SGX will not be supported 
>> for
>> PV domain, as ENCLS (which SGX driver in guest essentially runs) must run
>> in ring 0, while PV kernel runs in ring 3. Theoretically we can 
>> support SGX
>> in PV domain via either emulating #GP caused by ENCLS running in ring 
>> 3, or
>> by PV ENCLS but it is really not necessary at this stage. And 
>> currently SGX
>> is only exposed to HAP HVM domain (we can add for shadow in the future).
>>
>> SGX Launch Control is also exposed in CPU featureset for HVM domain. SGX
>> Launch Control depends on SGX.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> 
> I think its perfectly reasonable to restrict to HVM guests to start 
> with, although I don't see how shadow vs HAP has any impact at this 
> stage?  All that matters is that the EPC pages appear in the guests p2m.

Hmm it seems I forgot to reply to this one. Sorry. Actually there's no 
difference between shadow and HAP for SGX, as currently the SGX 
functionality does not depend on EPT. I didn't expose SGX to shadow 
guests as I haven't had a chance to implement and test the shadow part. 
I will add shadow support in the next version.

Thanks,
-Kai
> 
> ~Andrew
> 

* Re: [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches
  2017-07-09  8:03 [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Kai Huang
                   ` (15 preceding siblings ...)
  2017-07-11 14:13 ` [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Andrew Cooper
@ 2017-07-17  9:16 ` Wei Liu
  2017-07-18  8:22   ` Huang, Kai
  16 siblings, 1 reply; 58+ messages in thread
From: Wei Liu @ 2017-07-17  9:16 UTC (permalink / raw)
  To: Kai Huang
  Cc: tim, kevin.tian, sstabellini, wei.liu2, George.Dunlap,
	andrew.cooper3, ian.jackson, xen-devel, jbeulich

Hi Kai

Thanks for this nice write-up.

Some comments and questions below.

On Sun, Jul 09, 2017 at 08:03:10PM +1200, Kai Huang wrote:
> Hi all,
> 
[...]
> 2. SGX Virtualization Design
> 
> 2.1 High Level Toolstack Changes:
> 
> 2.1.1 New 'epc' parameter
> 
> EPC is a limited resource. In order to use EPC efficiently among all domains,
> the administrator should be able to specify a domain's virtual EPC size when
> creating the guest. And the admin
> also should be able to get all domains' virtual EPC sizes.
> 
> For this purpose, a new 'epc = <size>' parameter is added to the XL configuration
> file. This parameter specifies the guest's virtual EPC size. The EPC base address
> will be calculated by the toolstack internally, according to the guest's memory size,
> MMIO size, etc. 'epc' is in MB units and any 1MB-aligned value will be accepted.
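As an illustration of the proposed parameter, a guest configuration might look like the fragment below. This is a sketch: 'epc' is the only new knob, and the other settings are ordinary xl options with example values:

```
# Guest config sketch: 93MB of virtual EPC (any 1MB-aligned size works)
name   = "sgx-guest"
memory = 4096
vcpus  = 4
epc    = 93
```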
> 
> 2.1.2 New XL commands (?)
> 
> Administrator should be able to get physical EPC size, and all domain's virtual
> EPC size. For this purpose, we can introduce 2 additional commands:
> 
>     # xl sgxinfo
> 
> Which will print out physical EPC size, and other SGX info (such as SGX1, SGX2,
> etc) if necessary.
> 
>     # xl sgxlist <did>
> 
> Which will print out particular domain's virtual EPC size, or list all virtual
> EPC sizes for all supported domains.
> 
> Alternatively, we can also extend existing XL commands by adding new option
> 
>     # xl info -sgx
> 
> Which will print out physical EPC size along with other physinfo. And
> 
>     # xl list <did> -sgx
> 
> Which will print out domain's virtual EPC size.
> 
> Comments?
> 

Can a guest have multiple EPC? If so, the proposed parameter is not good
enough.

Can a guest with EPC enabled be migrated? The answer to this question
can lead to multiple other questions.

Another question, is EPC going to be backed by normal memory? This is
related to memory accounting of the guest.

Is EPC going to be modeled as a device or another type of memory? This
is related to how we manage it in the toolstack.

Finally why do you not allow the users to specify the base address?

> In my RFC patches I didn't implement the commands as I don't know which
> is better. In the github repo I mentioned at the beginning, there's an old
> branch in which I implemented 'xl sgxinfo' and 'xl sgxlist', but they are
> implemented via dedicated hypercall for SGX, which I am not sure whether is a
> good option so I didn't include it in my RFC patches.
> 
> 2.1.3 Notify domain's virtual EPC base and size to Xen
> 
> Xen needs to know guest's EPC base and size in order to populate EPC pages for
> it. Toolstack notifies EPC base and size to Xen via XEN_DOMCTL_set_cpuid.
> 
> 2.1.4 Launch Control Support (?)
[...]
> 
> But maybe integrating EPC to MM framework is more reasonable. Comments?
> 
> 2.2.2 EPC Virtualization (?)
> 
> This part is how to populate EPC for guests. We have 3 choices:
>     - Static Partitioning
>     - Oversubscription
>     - Ballooning
> 

IMHO static partitioning is good enough as a starting point.

Ballooning is nice to have but please don't make it mandatory. Not all
guests have a balloon driver -- imagine a unikernel-style secure domain
running with EPC.


> 
> 2.3 Additional Point: Live Migration, Snapshot Support (?)
> 

Oh, here it is. Nice.

> Actually from hardware's point of view, SGX is not migratable. There are two
> reasons:
> 
>     - SGX key architecture cannot be virtualized.
> 
>     Some keys are bound to the CPU, for example the Sealing key, the EREPORT
>     key, etc. If a VM is migrated to another machine, the same enclave will derive
>     different keys. Taking the Sealing key as an example, the Sealing key is
>     typically used by an enclave (which can get the sealing key via EGETKEY) to *seal*
>     its secrets to the outside (e.g., persistent storage) for later use. If the Sealing
>     key changes after VM migration, then the enclave can never get the sealed
>     secrets back using the sealing key, as it has changed, and the old sealing key
>     cannot be recovered.
> 
>     - There's no ENCLS leaf to evict an EPC page to normal memory while, at the
>     same time, keeping its content in EPC. Currently once an EPC page is evicted,
>     the EPC page becomes invalid. So technically, we are unable to implement live
>     migration (or checkpointing, or snapshot) for enclaves.
> 
> But, with some workaround, and some facts of existing SGX driver, technically
> we are able to support Live migration (or even check pointing, snapshot). This
> is because:
> 
>     - Changing key (which is bound to CPU) is not a problem in reality
> 
>     Take the Sealing key as an example. Losing sealed data is not a problem,
>     because the sealing key is only supposed to encrypt secrets that can be
>     provisioned again. The typical work model is: the enclave gets secrets
>     provisioned from remote (the service provider), and uses the sealing key to
>     store them for further use. When the enclave tries to *unseal* using the
>     sealing key, if the sealing key has changed, the enclave will find the data
>     corrupted (integrity check failure), so it will ask for the secrets to be
>     provisioned again from remote. Another reason is, in a data center, VMs
>     typically share lots of data, and as the sealing key is bound to the CPU,
>     data encrypted by one enclave on one machine cannot be shared by another
>     enclave on another machine. So from an SGX app writer's point of view, the
>     developer should treat the Sealing key as a changeable key, and should
>     handle loss of sealed data anyway. The Sealing key should only be used to
>     seal secrets that can be easily provisioned again.
> 
>     For other keys such as EREPORT key and provisioning key, which are used for
>     local attestation and remote attestation, due to the second reason below,
>     losing them is not a problem either.
> 
>     - Sudden loss of EPC is not a problem.
> 
>     On hardware, EPC is lost if the system goes to S3-S5, or is reset or
>     shut down, and the SGX driver needs to handle loss of EPC due to power
>     transitions. This is done by cooperation between the SGX driver and the
>     userspace SGX SDK/apps. However during live migration, there may not be a
>     power transition in the guest, so there may not be an EPC loss during live
>     migration. And technically we cannot *really* live migrate enclaves
>     (explained above), so it looks infeasible. But the fact is that both the
>     Linux SGX driver and the Windows SGX driver already support *sudden* loss
>     of EPC (not just EPC loss during a power transition), which means both
>     drivers are able to recover in case EPC is lost at any point at runtime.
>     With this, technically we are able to support live migration by simply
>     ignoring EPC. After the VM is migrated, the destination VM will only
>     suffer a *sudden* loss of EPC, which both the Windows SGX driver and the
>     Linux SGX driver are already able to handle.
> 
>     But we must point out that such *sudden* loss of EPC is not hardware
>     behavior, and SGX drivers for other OSes (such as FreeBSD) may not
>     implement this, so for those guests, the destination VM will behave in an
>     unexpected manner. But I am not sure we need to care about other OSes.

Presumably it wouldn't be too hard for FreeBSD to replicate the
behaviour of Linux and Windows.

> 
> For the same reason, we are able to support checkpointing for SGX guests (only
> Linux and Windows).
> 
> For snapshot, we can support snapshot SGX guest by either:
> 
>     - Suspend the guest before snapshot (S3-S5). This works for all guests but
>       requires the user to manually suspend the guest.
>     - Issue a hypercall to destroy the guest's EPC in save_vm. This only works
>       for Linux and Windows but doesn't require user intervention.
> 
> What's your comments?
> 

IMHO it is of course good to have migration and snapshot support for
such guests.

* Re: [PATCH 15/15] xen: tools: expose EPC in ACPI table
  2017-07-09  8:16 ` [PATCH 15/15] xen: tools: expose EPC in ACPI table Kai Huang
  2017-07-12 11:05   ` Andrew Cooper
  2017-07-14 11:31   ` Jan Beulich
@ 2017-07-17 10:54   ` Roger Pau Monné
  2017-07-18  8:36     ` Huang, Kai
  2 siblings, 1 reply; 58+ messages in thread
From: Roger Pau Monné @ 2017-07-17 10:54 UTC (permalink / raw)
  To: Kai Huang; +Cc: ian.jackson, andrew.cooper3, wei.liu2, jbeulich, xen-devel

On Sun, Jul 09, 2017 at 08:16:05PM +1200, Kai Huang wrote:
> On physical machine EPC is exposed in ACPI table via "INT0E0C". Although EPC
> can be discovered by CPUID but Windows driver requires EPC to be exposed in
> ACPI table as well. This patch exposes EPC in ACPI table.
> 
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> ---
>  tools/firmware/hvmloader/util.c  | 23 +++++++++++++++++++
>  tools/firmware/hvmloader/util.h  |  3 +++

Is there any reason this needs to be done in hvmloader instead of
libacpi? I'm mostly asking this because PVH guests can also get ACPI
tables, so it would be good to be able to expose EPC to them using
ACPI.

Thanks, Roger.

* Re: [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches
  2017-07-17  9:16 ` Wei Liu
@ 2017-07-18  8:22   ` Huang, Kai
  2017-07-28 13:40     ` Wei Liu
  0 siblings, 1 reply; 58+ messages in thread
From: Huang, Kai @ 2017-07-18  8:22 UTC (permalink / raw)
  To: Wei Liu, Kai Huang
  Cc: kevin.tian, sstabellini, George.Dunlap, andrew.cooper3, tim,
	xen-devel, jbeulich, ian.jackson

Hi Wei,

Thank you very much for the comments. Please see my reply below.

On 7/17/2017 9:16 PM, Wei Liu wrote:
> Hi Kai
> 
> Thanks for this nice write-up.
> 
> Some comments and questions below.
> 
> On Sun, Jul 09, 2017 at 08:03:10PM +1200, Kai Huang wrote:
>> Hi all,
>>
> [...]
>> 2. SGX Virtualization Design
>>
>> 2.1 High Level Toolstack Changes:
>>
>> 2.1.1 New 'epc' parameter
>>
>> EPC is a limited resource. In order to use EPC efficiently among all domains,
>> the administrator should be able to specify a domain's virtual EPC size when
>> creating the guest, and should also be able to query every domain's virtual
>> EPC size.
>>
>> For this purpose, a new 'epc = <size>' parameter is added to the XL
>> configuration file. This parameter specifies the guest's virtual EPC size. The
>> EPC base address will be calculated by the toolstack internally, according to
>> the guest's memory size, MMIO size, etc. 'epc' is in MB units and any
>> 1MB-aligned value will be accepted.
>>
>> 2.1.2 New XL commands (?)
>>
>> The administrator should be able to get the physical EPC size and all domains'
>> virtual EPC sizes. For this purpose, we can introduce 2 additional commands:
>>
>>      # xl sgxinfo
>>
>> Which will print out physical EPC size, and other SGX info (such as SGX1, SGX2,
>> etc) if necessary.
>>
>>      # xl sgxlist <did>
>>
>> Which will print out a particular domain's virtual EPC size, or list the
>> virtual EPC sizes of all supported domains.
>>
>> Alternatively, we can also extend existing XL commands by adding new option
>>
>>      # xl info -sgx
>>
>> Which will print out physical EPC size along with other physinfo. And
>>
>>      # xl list <did> -sgx
>>
>> Which will print out domain's virtual EPC size.
>>
>> Comments?
>>
> 
> Can a guest have multiple EPC? If so, the proposed parameter is not good
> enough.

According to the SDM a machine may have multiple EPC sections, but "may" 
doesn't mean it must. EPC is typically reserved by BIOS as Processor 
Reserved Memory (PRM), and in my understanding a client machine doesn't 
need multiple EPC sections. Currently I don't see why we would need to 
expose multiple EPC sections to a guest; even if the physical machine 
reports multiple sections, exposing one to the guest is enough. For now, 
SGX should not be supported simultaneously with virtual NUMA for a 
single domain.

> 
> Can a guest with EPC enabled be migrated? The answer to this question
> can lead to multiple other questions.

See the last section of my design. I see you've already found it. :)

> 
> Another question, is EPC going to be backed by normal memory? This is
> related to memory accounting of the guest.

Although the SDM says EPC is typically allocated by BIOS as PRM, I think 
we can just treat EPC as PRM, so yes, I believe EPC is physically backed 
by normal memory. But EPC is reported as reserved memory in the e820 table.

> 
> Is EPC going to be modeled as a device or another type of memory? This
> is related to how we manage it in the toolstack.

I think we'd better treat EPC as another type of memory. I am not sure 
whether it should be modeled as a device; on a real machine EPC is also 
exposed in the ACPI table via an "INT0E0C" device under \_SB (though it 
is certainly not modeled as a PCIe device).

> 
> Finally why do you not allow the users to specify the base address?

I don't see any reason why the user needs to specify the base address. 
If we did, which address would they specify? On a real machine the BIOS 
sets the base address, and for a VM I think the toolstack/Xen should do 
the same.

> 
>> In my RFC patches I didn't implement these commands as I don't know which
>> option is better. In the github repo I mentioned at the beginning, there's an
>> old branch in which I implemented 'xl sgxinfo' and 'xl sgxlist', but they are
>> implemented via a dedicated hypercall for SGX, which I am not sure is a good
>> option, so I didn't include them in my RFC patches.
>>
>> 2.1.3 Notify domain's virtual EPC base and size to Xen
>>
>> Xen needs to know guest's EPC base and size in order to populate EPC pages for
>> it. Toolstack notifies EPC base and size to Xen via XEN_DOMCTL_set_cpuid.
>>
>> 2.1.4 Launch Control Support (?)
> [...]
>>
>> But maybe integrating EPC to MM framework is more reasonable. Comments?
>>
>> 2.2.2 EPC Virtualization (?)
>>
>> This part is how to populate EPC for guests. We have 3 choices:
>>      - Static Partitioning
>>      - Oversubscription
>>      - Ballooning
>>
> 
> IMHO static partitioning is good enough as a starting point.
> 
> Ballooning is nice to have but please don't make it mandatory. Not all
> guests have balloon driver -- imagine a unikernel style secure domain
> running with EPC.

That's a good point. Thanks.
> 
> 
>>
>> 2.3 Additional Point: Live Migration, Snapshot Support (?)
>>
> 
> Oh, here it is. Nice.
> 
>> Actually from hardware's point of view, SGX is not migratable. There are two
>> reasons:
>>
>>      - SGX key architecture cannot be virtualized.
>>
>>      For example, some keys are bound to the CPU, such as the Sealing key and
>>      the EREPORT key. If a VM is migrated to another machine, the same enclave
>>      will derive different keys. Taking the Sealing key as an example: it is
>>      typically used by an enclave (which can obtain it via EGETKEY) to *seal*
>>      its secrets outside the enclave (e.g. to persistent storage) for later
>>      use. If the Sealing key changes after VM migration, the enclave can never
>>      get the sealed secrets back, as the old sealing key cannot be recovered.
>>
>>      - There's no ENCLS instruction to evict an EPC page to normal memory
>>      while still keeping its content in EPC. Currently, once an EPC page is
>>      evicted, it becomes invalid. So technically we are unable to implement
>>      live migration (or checkpointing, or snapshot) for enclaves.
>>
>> But with some workarounds, and given some facts about existing SGX drivers,
>> we are technically able to support live migration (or even checkpointing and
>> snapshot). This is because:
>>
>>      - Changing key (which is bound to CPU) is not a problem in reality
>>
>>      Take the Sealing key as an example. Losing sealed data is not a problem,
>>      because the sealing key is only supposed to encrypt secrets that can be
>>      provisioned again. The typical model is: the enclave gets secrets
>>      provisioned from a remote service provider and uses the sealing key to
>>      store them for later use. When the enclave tries to *unseal* using the
>>      sealing key, if the key has changed, the enclave will find the data
>>      corrupted (integrity check failure) and will ask for the secrets to be
>>      provisioned again from remote. Another reason is that in a data center
>>      VMs typically share lots of data, and as the sealing key is bound to the
>>      CPU, data encrypted by one enclave on one machine cannot be shared by
>>      another enclave on another machine. So the SGX app developer should
>>      treat the Sealing key as a changeable key and handle loss of sealed data
>>      anyway; the Sealing key should only be used to seal secrets that can
>>      easily be provisioned again.
>>
>>      For other keys such as EREPORT key and provisioning key, which are used for
>>      local attestation and remote attestation, due to the second reason below,
>>      losing them is not a problem either.
>>
>>      - Sudden loss of EPC is not a problem.
>>
>>      On hardware, EPC is lost if the system goes to S3-S5, or resets, or
>>      shuts down, and the SGX driver needs to handle loss of EPC due to such
>>      power transitions. This is done by cooperation between the SGX driver
>>      and userspace SGX SDK/apps. During live migration there may be no power
>>      transition in the guest, so there may be no EPC loss during migration,
>>      and technically we cannot *really* live migrate enclaves (explained
>>      above), so it looks infeasible. But the fact is that both the Linux and
>>      Windows SGX drivers already support *sudden* loss of EPC (EPC loss not
>>      caused by a power transition), which means both drivers are able to
>>      recover in case EPC is lost at any point at runtime. With this, we can
>>      technically support live migration by simply ignoring EPC. After the VM
>>      is migrated, the destination VM will only suffer a *sudden* loss of
>>      EPC, which both the Windows and Linux SGX drivers are already able to
>>      handle.
>>
>>      But we must point out that such *sudden* loss of EPC is not hardware
>>      behavior, and SGX drivers for other OSes (such as FreeBSD) may not
>>      implement it, so those guests' destination VMs will behave in an
>>      unexpected manner. But I am not sure we need to care about other OSes.
> 
> Presumably it wouldn't be too hard for FreeBSD to replicate the
> behaviour of Linux and Windows.

The problem is that this is not hardware behavior. If the FreeBSD 
developers just look at the SDM, they may not expect such a sudden loss 
of EPC. But I guess they will probably just port the existing driver. :)

> 
>>
>> For the same reason, we are able to support checkpointing for SGX guests
>> (Linux and Windows only);
>>
>> For snapshot, we can support snapshot SGX guest by either:
>>
>>      - Suspend the guest before snapshot (S3-S5). This works for all guests
>>        but requires the user to manually suspend the guest.
>>      - Issue a hypercall to destroy the guest's EPC in save_vm. This only
>>        works for Linux and Windows but doesn't require user intervention.
>>
>> What's your comments?
>>
> 
> IMHO it is of course good to have migration and snapshot support for
> such guests.

Thanks. I have no problem supporting migration and snapshot if no one 
opposes.

Thanks,
-Kai

> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 15/15] xen: tools: expose EPC in ACPI table
  2017-07-17 10:54   ` Roger Pau Monné
@ 2017-07-18  8:36     ` Huang, Kai
  2017-07-18 10:21       ` Roger Pau Monné
  0 siblings, 1 reply; 58+ messages in thread
From: Huang, Kai @ 2017-07-18  8:36 UTC (permalink / raw)
  To: Roger Pau Monné, Kai Huang
  Cc: andrew.cooper3, ian.jackson, wei.liu2, jbeulich, xen-devel



On 7/17/2017 10:54 PM, Roger Pau Monné wrote:
> On Sun, Jul 09, 2017 at 08:16:05PM +1200, Kai Huang wrote:
>> On physical machines EPC is exposed in the ACPI table via an "INT0E0C"
>> device. Although EPC can be discovered via CPUID, the Windows driver requires
>> EPC to be exposed in the ACPI table as well. This patch exposes EPC in the
>> ACPI table.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>> ---
>>   tools/firmware/hvmloader/util.c  | 23 +++++++++++++++++++
>>   tools/firmware/hvmloader/util.h  |  3 +++
> 
> Is there any reason this needs to be done in hvmloader instead of
> libacpi? I'm mostly asking this because PVH guests can also get ACPI
> tables, so it would be good to be able to expose EPC to them using
> ACPI.

Hi Roger,

Thanks for the comments. I didn't deliberately choose hvmloader over 
libacpi. It seems libxl only builds the ACPI table when the guest is HVM 
and doesn't use any device model, and I think I have covered that case 
(see the changes to init_acpi_config). Is there anything I missed?

Thanks,
-Kai
> 
> Thanks, Roger.
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 01/15] xen: x86: expose SGX to HVM domain in CPU featureset
  2017-07-09  8:04 ` [PATCH 01/15] xen: x86: expose SGX to HVM domain in CPU featureset Kai Huang
  2017-07-12 11:09   ` Andrew Cooper
@ 2017-07-18 10:12   ` Andrew Cooper
  2017-07-18 22:41     ` Huang, Kai
  1 sibling, 1 reply; 58+ messages in thread
From: Andrew Cooper @ 2017-07-18 10:12 UTC (permalink / raw)
  To: Kai Huang, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, tim, ian.jackson, jbeulich

On 09/07/17 09:04, Kai Huang wrote:
> Expose SGX in the CPU featureset for HVM domains. SGX will not be supported
> for PV domains, as ENCLS (which the SGX driver in the guest essentially runs)
> must run in ring 0, while a PV kernel runs in ring 3. Theoretically we could
> support SGX in PV domains by either emulating the #GP caused by ENCLS running
> in ring 3, or by a PV ENCLS interface, but that is really not necessary at
> this stage. Currently SGX is only exposed to HAP HVM domains (we can add
> shadow support in the future).
>
> SGX Launch Control is also exposed in CPU featureset for HVM domain. SGX
> Launch Control depends on SGX.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> ---
>  xen/include/public/arch-x86/cpufeatureset.h | 3 ++-
>  xen/tools/gen-cpuid.py                      | 3 +++
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
> index 97dd3534c5..b6c54e654e 100644
> --- a/xen/include/public/arch-x86/cpufeatureset.h
> +++ b/xen/include/public/arch-x86/cpufeatureset.h
> @@ -193,7 +193,7 @@ XEN_CPUFEATURE(XSAVES,        4*32+ 3) /*S  XSAVES/XRSTORS instructions */
>  /* Intel-defined CPU features, CPUID level 0x00000007:0.ebx, word 5 */
>  XEN_CPUFEATURE(FSGSBASE,      5*32+ 0) /*A  {RD,WR}{FS,GS}BASE instructions */
>  XEN_CPUFEATURE(TSC_ADJUST,    5*32+ 1) /*S  TSC_ADJUST MSR available */
> -XEN_CPUFEATURE(SGX,           5*32+ 2) /*   Software Guard extensions */
> +XEN_CPUFEATURE(SGX,           5*32+ 2) /*H  Intel Software Guard extensions */
>  XEN_CPUFEATURE(BMI1,          5*32+ 3) /*A  1st bit manipulation extensions */
>  XEN_CPUFEATURE(HLE,           5*32+ 4) /*A  Hardware Lock Elision */
>  XEN_CPUFEATURE(AVX2,          5*32+ 5) /*A  AVX2 instructions */
> @@ -229,6 +229,7 @@ XEN_CPUFEATURE(PKU,           6*32+ 3) /*H  Protection Keys for Userspace */
>  XEN_CPUFEATURE(OSPKE,         6*32+ 4) /*!  OS Protection Keys Enable */
>  XEN_CPUFEATURE(AVX512_VPOPCNTDQ, 6*32+14) /*A  POPCNT for vectors of DW/QW */
>  XEN_CPUFEATURE(RDPID,         6*32+22) /*A  RDPID instruction */
> +XEN_CPUFEATURE(SGX_LAUNCH_CONTROL, 6*32+30) /*H Intel SGX Launch Control */

Could we abbreviate this to SGX_LC ?  It is certainly rather shorter to
write, and appears to be used elsewhere.
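Sketching the suggested rename (with a stub standing in for Xen's real
XEN_CPUFEATURE() macro, which actually generates featureset table entries
rather than plain constants):

```c
/* Stub of Xen's XEN_CPUFEATURE() macro, just to show the computed bit
 * index; the real macro lives in cpufeatureset.h and is expanded several
 * ways by gen-cpuid.py and the featureset machinery. */
#define XEN_CPUFEATURE(name, value) static const unsigned int name = (value);

XEN_CPUFEATURE(SGX_LC, 6*32+30)   /*H  Intel SGX Launch Control */
```

With word 6, bit 30, the feature occupies overall bit index 222 in the
flattened featureset.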

~Andrew


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 15/15] xen: tools: expose EPC in ACPI table
  2017-07-18  8:36     ` Huang, Kai
@ 2017-07-18 10:21       ` Roger Pau Monné
  2017-07-18 22:44         ` Huang, Kai
  0 siblings, 1 reply; 58+ messages in thread
From: Roger Pau Monné @ 2017-07-18 10:21 UTC (permalink / raw)
  To: Huang, Kai
  Cc: wei.liu2, andrew.cooper3, ian.jackson, xen-devel, jbeulich, Kai Huang

On Tue, Jul 18, 2017 at 08:36:15PM +1200, Huang, Kai wrote:
> 
> 
> On 7/17/2017 10:54 PM, Roger Pau Monné wrote:
> > On Sun, Jul 09, 2017 at 08:16:05PM +1200, Kai Huang wrote:
> > > On physical machines EPC is exposed in the ACPI table via an "INT0E0C"
> > > device. Although EPC can be discovered via CPUID, the Windows driver
> > > requires EPC to be exposed in the ACPI table as well. This patch exposes
> > > EPC in the ACPI table.
> > > 
> > > Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> > > ---
> > >   tools/firmware/hvmloader/util.c  | 23 +++++++++++++++++++
> > >   tools/firmware/hvmloader/util.h  |  3 +++
> > 
> > Is there any reason this needs to be done in hvmloader instead of
> > libacpi? I'm mostly asking this because PVH guests can also get ACPI
> > tables, so it would be good to be able to expose EPC to them using
> > ACPI.
> 
> Hi Roger,
> 
> Thanks for comments. I didn't deliberately choose to do in hvmloader instead
> of libacpi. It seems libxl only builds ACPI table when guest is HVM, and it
> doesn't use any device model, and I think I have covered this part (see
> changes to init_acpi_config). Is there anything that I missed?

dsdt.asl is only used for HVM guests; PVH guests basically get an
empty dsdt + dsdt_acpi_info + processor objects populated by mk_dsdt
(see the Makefile in libacpi), so they end up without the EPC Device
block.

It would be good if a new empty DSDT containing the EPC Device block
were created, or an SSDT used, and added to both HVM and PVH guests.

Alternatively you could also code the EPC Device block in mk_dsdt, but
that's going to be cumbersome IMHO.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 01/15] xen: x86: expose SGX to HVM domain in CPU featureset
  2017-07-18 10:12   ` Andrew Cooper
@ 2017-07-18 22:41     ` Huang, Kai
  0 siblings, 0 replies; 58+ messages in thread
From: Huang, Kai @ 2017-07-18 22:41 UTC (permalink / raw)
  To: Andrew Cooper, Kai Huang, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, ian.jackson, tim, jbeulich



On 7/18/2017 10:12 PM, Andrew Cooper wrote:
> On 09/07/17 09:04, Kai Huang wrote:
>> Expose SGX in the CPU featureset for HVM domains. SGX will not be supported
>> for PV domains, as ENCLS (which the SGX driver in the guest essentially runs)
>> must run in ring 0, while a PV kernel runs in ring 3. Theoretically we could
>> support SGX in PV domains by either emulating the #GP caused by ENCLS running
>> in ring 3, or by a PV ENCLS interface, but that is really not necessary at
>> this stage. Currently SGX is only exposed to HAP HVM domains (we can add
>> shadow support in the future).
>>
>> SGX Launch Control is also exposed in CPU featureset for HVM domain. SGX
>> Launch Control depends on SGX.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>> ---
>>   xen/include/public/arch-x86/cpufeatureset.h | 3 ++-
>>   xen/tools/gen-cpuid.py                      | 3 +++
>>   2 files changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
>> index 97dd3534c5..b6c54e654e 100644
>> --- a/xen/include/public/arch-x86/cpufeatureset.h
>> +++ b/xen/include/public/arch-x86/cpufeatureset.h
>> @@ -193,7 +193,7 @@ XEN_CPUFEATURE(XSAVES,        4*32+ 3) /*S  XSAVES/XRSTORS instructions */
>>   /* Intel-defined CPU features, CPUID level 0x00000007:0.ebx, word 5 */
>>   XEN_CPUFEATURE(FSGSBASE,      5*32+ 0) /*A  {RD,WR}{FS,GS}BASE instructions */
>>   XEN_CPUFEATURE(TSC_ADJUST,    5*32+ 1) /*S  TSC_ADJUST MSR available */
>> -XEN_CPUFEATURE(SGX,           5*32+ 2) /*   Software Guard extensions */
>> +XEN_CPUFEATURE(SGX,           5*32+ 2) /*H  Intel Software Guard extensions */
>>   XEN_CPUFEATURE(BMI1,          5*32+ 3) /*A  1st bit manipulation extensions */
>>   XEN_CPUFEATURE(HLE,           5*32+ 4) /*A  Hardware Lock Elision */
>>   XEN_CPUFEATURE(AVX2,          5*32+ 5) /*A  AVX2 instructions */
>> @@ -229,6 +229,7 @@ XEN_CPUFEATURE(PKU,           6*32+ 3) /*H  Protection Keys for Userspace */
>>   XEN_CPUFEATURE(OSPKE,         6*32+ 4) /*!  OS Protection Keys Enable */
>>   XEN_CPUFEATURE(AVX512_VPOPCNTDQ, 6*32+14) /*A  POPCNT for vectors of DW/QW */
>>   XEN_CPUFEATURE(RDPID,         6*32+22) /*A  RDPID instruction */
>> +XEN_CPUFEATURE(SGX_LAUNCH_CONTROL, 6*32+30) /*H Intel SGX Launch Control */
> 
> Could we abbreviate this to SGX_LC ?  It is certainly rather shorter to
> write, and appears to be used elsewhere.

Sure. Will do.

Thanks,
-Kai
> 
> ~Andrew
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 15/15] xen: tools: expose EPC in ACPI table
  2017-07-18 10:21       ` Roger Pau Monné
@ 2017-07-18 22:44         ` Huang, Kai
  0 siblings, 0 replies; 58+ messages in thread
From: Huang, Kai @ 2017-07-18 22:44 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: wei.liu2, andrew.cooper3, ian.jackson, xen-devel, jbeulich, Kai Huang



On 7/18/2017 10:21 PM, Roger Pau Monné wrote:
> On Tue, Jul 18, 2017 at 08:36:15PM +1200, Huang, Kai wrote:
>>
>>
>> On 7/17/2017 10:54 PM, Roger Pau Monné wrote:
>>> On Sun, Jul 09, 2017 at 08:16:05PM +1200, Kai Huang wrote:
>>>> On physical machines EPC is exposed in the ACPI table via an "INT0E0C"
>>>> device. Although EPC can be discovered via CPUID, the Windows driver
>>>> requires EPC to be exposed in the ACPI table as well. This patch exposes
>>>> EPC in the ACPI table.
>>>>
>>>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>>>> ---
>>>>    tools/firmware/hvmloader/util.c  | 23 +++++++++++++++++++
>>>>    tools/firmware/hvmloader/util.h  |  3 +++
>>>
>>> Is there any reason this needs to be done in hvmloader instead of
>>> libacpi? I'm mostly asking this because PVH guests can also get ACPI
>>> tables, so it would be good to be able to expose EPC to them using
>>> ACPI.
>>
>> Hi Roger,
>>
>> Thanks for comments. I didn't deliberately choose to do in hvmloader instead
>> of libacpi. It seems libxl only builds ACPI table when guest is HVM, and it
>> doesn't use any device model, and I think I have covered this part (see
>> changes to init_acpi_config). Is there anything that I missed?
> 
> dsdt.asl is only used for HVM guests, PVH guests basically get an
> empty dsdt + dsdt_acpi_info + processor objects populated by make_dsdt
> (see Makefile in libacpi), so they end up without the EPC Device
> block.
> 
> It would be good if a new empty dsdt is created, that contains the
> Device EPC block, or a ssdt is used, and it's added to both HVM/PVH
> guests.
> 
> Alternatively you could also code the EPC Device block in mk_dsdt, but
> that's going to be cumbersome IMHO.

Hi Roger,

I take your point. I think it's definitely better to cover PVH if we can. 
Let me see whether it is feasible.

Thanks,
-Kai
> 
> Thanks, Roger.
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 03/15] xen: x86: add early stage SGX feature detection
  2017-07-09  8:09 ` [PATCH 03/15] xen: x86: add early stage SGX feature detection Kai Huang
@ 2017-07-19 14:23   ` Andrew Cooper
  2017-07-21  9:17     ` Huang, Kai
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Cooper @ 2017-07-19 14:23 UTC (permalink / raw)
  To: Kai Huang, xen-devel; +Cc: kevin.tian, jbeulich

On 09/07/17 09:09, Kai Huang wrote:
> This patch adds early-stage SGX feature detection via SGX CPUID leaf 0x12. A
> function detect_sgx is added to detect SGX info on each CPU (called from
> vmx_cpu_up). The SDM says the SGX info returned by CPUID is per-thread, and we
> cannot assume all threads return the same SGX info, so we have to detect SGX
> on each CPU. For simplicity, SGX is currently only supported when all CPUs
> report the same SGX info.
> 
> The SDM also says it's possible to have multiple EPC sections, but only on
> multi-socket servers, which we don't support yet (other things would need to
> be done as well, e.g. NUMA EPC, scheduling, etc.), so currently only one EPC
> section is supported.
> 
> Dedicated files sgx.c and sgx.h are added (under the vmx directory, as SGX is
> Intel-specific) for the bulk of the above SGX detection code, and for further
> SGX code as well.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>

I am not sure putting this under hvm/ is a sensible move.  Almost
everything in this patch is currently common, and I can foresee us
wanting to introduce PV support, so it would be good to introduce this
in a guest-neutral location to begin with.

> ---
>  xen/arch/x86/hvm/vmx/Makefile     |   1 +
>  xen/arch/x86/hvm/vmx/sgx.c        | 208 ++++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/hvm/vmx/vmcs.c       |   4 +
>  xen/include/asm-x86/cpufeature.h  |   1 +
>  xen/include/asm-x86/hvm/vmx/sgx.h |  45 +++++++++
>  5 files changed, 259 insertions(+)
>  create mode 100644 xen/arch/x86/hvm/vmx/sgx.c
>  create mode 100644 xen/include/asm-x86/hvm/vmx/sgx.h
>
> diff --git a/xen/arch/x86/hvm/vmx/Makefile b/xen/arch/x86/hvm/vmx/Makefile
> index 04a29ce59d..f6bcf0d143 100644
> --- a/xen/arch/x86/hvm/vmx/Makefile
> +++ b/xen/arch/x86/hvm/vmx/Makefile
> @@ -4,3 +4,4 @@ obj-y += realmode.o
>  obj-y += vmcs.o
>  obj-y += vmx.o
>  obj-y += vvmx.o
> +obj-y += sgx.o
> diff --git a/xen/arch/x86/hvm/vmx/sgx.c b/xen/arch/x86/hvm/vmx/sgx.c
> new file mode 100644
> index 0000000000..6b41469371
> --- /dev/null
> +++ b/xen/arch/x86/hvm/vmx/sgx.c

This file looks like it should be arch/x86/sgx.c, given its current content.

> @@ -0,0 +1,208 @@
> +/*
> + * Intel Software Guard Extensions support

Please include a GPLv2 header.

> + *
> + * Author: Kai Huang <kai.huang@linux.intel.com>
> + */
> +
> +#include <asm/cpufeature.h>
> +#include <asm/msr-index.h>
> +#include <asm/msr.h>
> +#include <asm/hvm/vmx/sgx.h>
> +#include <asm/hvm/vmx/vmcs.h>
> +
> +static struct sgx_cpuinfo __read_mostly sgx_cpudata[NR_CPUS];
> +static struct sgx_cpuinfo __read_mostly boot_sgx_cpudata;

I don't think any of this is necessary.  The description says that all
EPCs across the server will be reported in CPUID subleaves, and our
implementation gives up if the data are non-identical across CPUs.

Therefore, we only need to keep one copy of the data, and check APs
against the master copy.


Let me see about splitting up a few bits of the existing CPUID
infrastructure, so we can use the host cpuid policy more effectively for
Xen related things.
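For background on the per-CPU data in question: each EPC section is
enumerated via CPUID leaf 0x12, subleaves 2 and upward. A standalone sketch
of decoding one such subleaf, following the SDM layout (illustrative code,
not taken from the patch):

```c
#include <stdint.h>

/*
 * Decode one EPC section from CPUID leaf 0x12, subleaf >= 2 (SDM Vol. 3D).
 * EAX[3:0] == 1 marks a valid EPC section; EAX[31:12] holds bits 31:12 and
 * EBX[19:0] holds bits 51:32 of the physical base. ECX/EDX encode the size
 * the same way. Returns 1 if the subleaf describes an EPC section, else 0.
 */
static int decode_epc_section(uint32_t eax, uint32_t ebx,
                              uint32_t ecx, uint32_t edx,
                              uint64_t *base, uint64_t *size)
{
    if ( (eax & 0xf) != 1 )
        return 0;

    *base = ((uint64_t)(ebx & 0xfffff) << 32) | (eax & 0xfffff000u);
    *size = ((uint64_t)(edx & 0xfffff) << 32) | (ecx & 0xfffff000u);
    return 1;
}
```

Checking APs against a boot-CPU master copy would then reduce to comparing
the (base, size) pairs decoded on each CPU.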

~Andrew


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 09/15] xen: vmx: handle SGX related MSRs
  2017-07-09  8:09 ` [PATCH 09/15] xen: vmx: handle SGX related MSRs Kai Huang
@ 2017-07-19 17:27   ` Andrew Cooper
  2017-07-21  9:42     ` Huang, Kai
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Cooper @ 2017-07-19 17:27 UTC (permalink / raw)
  To: Kai Huang, xen-devel; +Cc: kevin.tian, jbeulich

On 09/07/17 09:09, Kai Huang wrote:
> This patch handles IA32_FEATURE_CONTROL and IA32_SGXLEPUBKEYHASHn MSRs.
>
> For IA32_FEATURE_CONTROL, if SGX is exposed to the domain, then the
> SGX_ENABLE bit is always set. If SGX launch control is also exposed to the
> domain, and the physical IA32_SGXLEPUBKEYHASHn are writable, then the
> SGX_LAUNCH_CONTROL_ENABLE bit is also always set. Writes to
> IA32_FEATURE_CONTROL are ignored.
>
> For IA32_SGXLEPUBKEYHASHn, a new 'struct sgx_vcpu' is added for per-vcpu SGX
> state, and currently it holds the vcpu's virtual ia32_sgxlepubkeyhash[0-3].
> Two booleans, 'readable' and 'writable', are also added to indicate whether
> the virtual IA32_SGXLEPUBKEYHASHn are readable and writable.
>
> When a vcpu is initialized, its virtual ia32_sgxlepubkeyhash values are also
> initialized. If the physical IA32_SGXLEPUBKEYHASHn are writable, they are set
> to Intel's default value, as on a physical machine those MSRs hold Intel's
> default value after reset. If the physical MSRs are not writable (i.e.
> *locked* by BIOS before handing over to Xen), we try to read them and use the
> physical values as defaults for the virtual MSRs. Note that rdmsr_safe is
> used: although the SDM says that if SGX is present IA32_SGXLEPUBKEYHASHn are
> available for read, in reality Skylake client machines (at least some,
> depending on BIOS) don't have those MSRs available, so we use rdmsr_safe and
> set 'readable' to false if it returns an error code.
>
> For IA32_SGXLEPUBKEYHASHn MSR reads from the guest: if the physical MSRs are
> not readable, the guest is not allowed to read them either; otherwise the
> vcpu's virtual MSR value is returned.
>
> For IA32_SGXLEPUBKEYHASHn MSR writes from the guest: we allow the guest to
> write if both the physical MSRs are writable and SGX launch control is
> exposed to the domain; otherwise an error is injected.
>
> To make EINIT run successfully in the guest, the vcpu's virtual
> IA32_SGXLEPUBKEYHASHn are written to the physical MSRs when the vcpu is
> scheduled in.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> ---
>  xen/arch/x86/hvm/vmx/sgx.c         | 194 +++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/hvm/vmx/vmx.c         |  24 +++++
>  xen/include/asm-x86/cpufeature.h   |   3 +
>  xen/include/asm-x86/hvm/vmx/sgx.h  |  22 +++++
>  xen/include/asm-x86/hvm/vmx/vmcs.h |   2 +
>  xen/include/asm-x86/msr-index.h    |   6 ++
>  6 files changed, 251 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/vmx/sgx.c b/xen/arch/x86/hvm/vmx/sgx.c
> index 14379151e8..4944e57aef 100644
> --- a/xen/arch/x86/hvm/vmx/sgx.c
> +++ b/xen/arch/x86/hvm/vmx/sgx.c
> @@ -405,6 +405,200 @@ void hvm_destroy_epc(struct domain *d)
>      hvm_reset_epc(d, true);
>  }
>  
> +/* Whether IA32_SGXLEPUBKEYHASHn are physically *unlocked* by BIOS */
> +bool_t sgx_ia32_sgxlepubkeyhash_writable(void)
> +{
> +    uint64_t sgx_lc_enabled = IA32_FEATURE_CONTROL_SGX_ENABLE |
> +                              IA32_FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE |
> +                              IA32_FEATURE_CONTROL_LOCK;
> +    uint64_t val;
> +
> +    rdmsrl(MSR_IA32_FEATURE_CONTROL, val);
> +
> +    return (val & sgx_lc_enabled) == sgx_lc_enabled;
> +}
> +
> +bool_t domain_has_sgx(struct domain *d)
> +{
> +    /* hvm_epc_populated(d) implies CPUID has SGX */
> +    return hvm_epc_populated(d);
> +}
> +
> +bool_t domain_has_sgx_launch_control(struct domain *d)
> +{
> +    struct cpuid_policy *p = d->arch.cpuid;
> +
> +    if ( !domain_has_sgx(d) )
> +        return false;
> +
> +    /* Unnecessary but check anyway */
> +    if ( !cpu_has_sgx_launch_control )
> +        return false;
> +
> +    return !!p->feat.sgx_launch_control;
> +}

Both of these should simply be d->arch.cpuid->feat.{sgx,sgx_lc}, rather
than individual helpers.

The CPUID setup during host boot and domain construction should take
care of setting everything up properly, or hiding the features from the
guest.  The point of the work I've been doing is to prevent situations
where the guest can see SGX but something doesn't work because of Xen
using nested checks like this.

> +
> +/* Digest of Intel signing key. MSR's default value after reset. */
> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH0 0xa6053e051270b7ac
> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH1 0x6cfbe8ba8b3b413d
> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH2 0xc4916d99f2b3735d
> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH3 0xd4f8c05909f9bb3b
> +
> +void sgx_vcpu_init(struct vcpu *v)
> +{
> +    struct sgx_vcpu *sgxv = to_sgx_vcpu(v);
> +
> +    memset(sgxv, 0, sizeof (*sgxv));
> +
> +    if ( sgx_ia32_sgxlepubkeyhash_writable() )
> +    {
> +        /*
> +         * If physical MSRs are writable, set vcpu's default value to Intel's
> +         * default value. For real machine, after reset, MSRs contain Intel's
> +         * default value.
> +         */
> +        sgxv->ia32_sgxlepubkeyhash[0] = SGX_INTEL_DEFAULT_LEPUBKEYHASH0;
> +        sgxv->ia32_sgxlepubkeyhash[1] = SGX_INTEL_DEFAULT_LEPUBKEYHASH1;
> +        sgxv->ia32_sgxlepubkeyhash[2] = SGX_INTEL_DEFAULT_LEPUBKEYHASH2;
> +        sgxv->ia32_sgxlepubkeyhash[3] = SGX_INTEL_DEFAULT_LEPUBKEYHASH3;
> +
> +        sgxv->readable = 1;
> +        sgxv->writable = domain_has_sgx_launch_control(v->domain);
> +    }
> +    else
> +    {
> +        uint64_t v;
> +        /*
> +         * Although the SDM says that if SGX is present then
> +         * IA32_SGXLEPUBKEYHASHn are available for read, in reality on
> +         * Skylake client machines those MSRs may be unavailable even
> +         * when SGX is present, so we cannot rely on cpu_has_sgx to
> +         * determine whether we are able to read the MSRs; instead, we
> +         * always use rdmsr_safe.

Talking with Jun at XenSummit, I got the impression that the
availability of these MSRs is based on SGX_LC, not SGX.

Furthermore, that is my reading of 41.2.2 "Intel SGX Launch Control
Configuration", although the logic is expressed in terms of checking SGX
before SGX_LC.

> +         */
> +        sgxv->readable = rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH0, v) ? 0 : 1;
> +
> +        if ( !sgxv->readable )
> +            return;
> +
> +        rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH0, sgxv->ia32_sgxlepubkeyhash[0]);
> +        rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH1, sgxv->ia32_sgxlepubkeyhash[1]);
> +        rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH2, sgxv->ia32_sgxlepubkeyhash[2]);
> +        rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH3, sgxv->ia32_sgxlepubkeyhash[3]);
> +    }
> +}
> +
> +void sgx_ctxt_switch_to(struct vcpu *v)
> +{
> +    struct sgx_vcpu *sgxv = to_sgx_vcpu(v);
> +
> +    if ( sgxv->writable && sgx_ia32_sgxlepubkeyhash_writable() )

This causes a read of FEATURE_CONTROL on every context switch path,
which is inefficient.

Just like with CPUID policy, we will (eventually) have a generic MSR
policy for the guest to use.  In particular, I can foresee a usecase
where hardware has LC unlocked, but the host administrator wishes LC to
be locked from the guest's point of view.
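One way to avoid the per-context-switch FEATURE_CONTROL read is to latch
the writability once during SGX init. A stand-alone sketch under that
assumption (function and variable names are illustrative, not Xen's; the
bit values match the patch's msr-index.h additions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FEATURE_CONTROL bits, matching the patch's msr-index.h additions. */
#define FC_LOCK           0x1ULL
#define FC_SGX_LC_ENABLE  0x20000ULL

/*
 * Stand-in for rdmsrl(MSR_IA32_FEATURE_CONTROL, ...); on real hardware
 * this is the MSR read we want off the context-switch hot path.
 */
static uint64_t read_feature_control(void)
{
    return FC_LOCK | FC_SGX_LC_ENABLE;
}

/*
 * Latched once at SGX init: FEATURE_CONTROL cannot change after the
 * BIOS locks it, so there is no need to re-read it per context switch.
 */
static bool lepubkeyhash_writable;

static void sgx_init_lc_state(void)
{
    uint64_t fc = read_feature_control();

    /*
     * Hash MSRs are writable when FEATURE_CONTROL is locked with the
     * launch-control enable bit set (bit 0 = 1 && bit 17 = 1).
     */
    lepubkeyhash_writable = (fc & FC_LOCK) && (fc & FC_SGX_LC_ENABLE);
}

/* The context-switch path then tests two cached booleans only. */
static bool sgx_need_hash_restore(bool vcpu_writable)
{
    return vcpu_writable && lepubkeyhash_writable;
}
```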

> +    {
> +        wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0, sgxv->ia32_sgxlepubkeyhash[0]);
> +        wrmsrl(MSR_IA32_SGXLEPUBKEYHASH1, sgxv->ia32_sgxlepubkeyhash[1]);
> +        wrmsrl(MSR_IA32_SGXLEPUBKEYHASH2, sgxv->ia32_sgxlepubkeyhash[2]);
> +        wrmsrl(MSR_IA32_SGXLEPUBKEYHASH3, sgxv->ia32_sgxlepubkeyhash[3]);
> +    }
> +}
> +
> +int sgx_msr_read_intercept(struct vcpu *v, unsigned int msr, u64 *msr_content)
> +{
> +    struct sgx_vcpu *sgxv = to_sgx_vcpu(v);
> +    u64 data;
> +    int r = 1;
> +
> +    if ( !domain_has_sgx(v->domain) )
> +        return 0;
> +
> +    switch ( msr )
> +    {
> +    case MSR_IA32_FEATURE_CONTROL:
> +        data = (IA32_FEATURE_CONTROL_LOCK |
> +                IA32_FEATURE_CONTROL_SGX_ENABLE);
> +        /*
> +         * If physical IA32_SGXLEPUBKEYHASHn are writable, then we always
> +         * allow guest to be able to change IA32_SGXLEPUBKEYHASHn at runtime.
> +         */
> +        if ( sgx_ia32_sgxlepubkeyhash_writable() &&
> +                domain_has_sgx_launch_control(v->domain) )
> +            data |= IA32_FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE;
> +
> +        *msr_content = data;
> +
> +        break;

Newline here please.

> +    case MSR_IA32_SGXLEPUBKEYHASH0...MSR_IA32_SGXLEPUBKEYHASH3:

Spaces around ... please.  (It is only because of the #defines that this
isn't a syntax error.)

> +        /*
> +         * SDM 35.1 Model-Specific Registers, table 35-2.
> +         *
> +         * IA32_SGXLEPUBKEYHASH[0..3]:
> +         *
> +         * Read permitted if CPUID.0x12.0:EAX[0] = 1.
> +         *
> +         * In reality, MSRs may not be readable even SGX is present, in which
> +         * case guest is not allowed to read either.
> +         */
> +        if ( !sgxv->readable )
> +        {
> +            r = 0;
> +            break;
> +        }
> +
> +        data = sgxv->ia32_sgxlepubkeyhash[msr - MSR_IA32_SGXLEPUBKEYHASH0];
> +
> +        *msr_content = data;
> +
> +        break;
> +    default:
> +        r = 0;
> +        break;
> +    }
> +
> +    return r;
> +}
> +
> +int sgx_msr_write_intercept(struct vcpu *v, unsigned int msr, u64 msr_content)
> +{
> +    struct sgx_vcpu *sgxv = to_sgx_vcpu(v);
> +    int r = 1;
> +
> +    if ( !domain_has_sgx(v->domain) )
> +        return 0;
> +
> +    switch ( msr )
> +    {
> +    case MSR_IA32_FEATURE_CONTROL:
> +        /* sliently drop */

Silently dropping is not ok.  This change needs rebasing over c/s
46c3acb308 where I have fixed up the writeability of FEATURE_CONTROL.

> +        break;
> +    case MSR_IA32_SGXLEPUBKEYHASH0...MSR_IA32_SGXLEPUBKEYHASH3:
> +        /*
> +         * SDM 35.1 Model-Specific Registers, table 35-2.
> +         *
> +         * IA32_SGXLEPUBKEYHASH[0..3]:
> +         *
> +         * - If CPUID.0x7.0:ECX[30] = 1, FEATURE_CONTROL[17] is available.
> +         * - Write permitted if CPUID.0x12.0:EAX[0] = 1 &&
> +         *      FEATURE_CONTROL[17] = 1 && FEATURE_CONTROL[0] = 1.
> +         *
> +         * sgxv->writable == 1 means sgx_ia32_sgxlepubkeyhash_writable() and
> +         * domain_has_sgx_launch_control(d) both are true.
> +         */
> +        if ( !sgxv->writable )
> +        {
> +            r = 0;
> +            break;
> +        }
> +
> +        sgxv->ia32_sgxlepubkeyhash[msr - MSR_IA32_SGXLEPUBKEYHASH0] =
> +            msr_content;
> +
> +        break;
> +    default:
> +        r = 0;
> +        break;
> +    }
> +
> +    return r;
> +}
> +
>  static bool_t sgx_enabled_in_bios(void)
>  {
>      uint64_t val, sgx_enabled = IA32_FEATURE_CONTROL_SGX_ENABLE |
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 243643111d..7ee5515bdc 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -470,6 +470,8 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>      if ( v->vcpu_id == 0 )
>          v->arch.user_regs.rax = 1;
>  
> +    sgx_vcpu_init(v);
> +
>      return 0;
>  }
>  
> @@ -1048,6 +1050,9 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
>  
>      if ( v->domain->arch.hvm_domain.pi_ops.switch_to )
>          v->domain->arch.hvm_domain.pi_ops.switch_to(v);
> +
> +    if ( domain_has_sgx(v->domain) )
> +        sgx_ctxt_switch_to(v);
>  }
>  
>  
> @@ -2876,10 +2881,20 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>          __vmread(GUEST_IA32_DEBUGCTL, msr_content);
>          break;
>      case MSR_IA32_FEATURE_CONTROL:
> +        /* If neither SGX nor nested is supported, this MSR should not be
> +         * touched */
> +        if ( !sgx_msr_read_intercept(current, msr, msr_content) &&
> +                !nvmx_msr_read_intercept(msr, msr_content) )
> +            goto gp_fault;

Unfortunately, this logic is broken.  In the case that both SGX and VMX
are configured, the VMX handler will clobber the values set up by the
SGX handler.  Sergey has a VMX-policy series (v1 posted, v2 in the
works) to start addressing some of the issues on the VMX side, but
fundamentally, all reads like this need serving out of a single policy,
rather than having different subsystems fighting for control of the
values.  (The Xen MSR code is terrible for this at the moment.)
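The single-policy approach described above could look roughly like the
following toy model (types and bit names are simplified assumptions; no
such msr_policy infrastructure existed at the time of this thread). Each
feature contributes its FEATURE_CONTROL bits once at domain build, and
the read intercept becomes a plain lookup, so no subsystem can clobber
another's bits:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FC_LOCK            0x1ULL      /* IA32_FEATURE_CONTROL_LOCK */
#define FC_VMXON_OUTSIDE   0x4ULL      /* enable VMXON outside SMX */
#define FC_SGX_LC_ENABLE   0x20000ULL
#define FC_SGX_ENABLE      0x40000ULL

/* One per-domain MSR policy, built once at domain construction. */
struct msr_policy {
    uint64_t feature_control;
};

static void build_feature_control(struct msr_policy *p,
                                  bool has_vmx, bool has_sgx,
                                  bool has_sgx_lc)
{
    uint64_t fc = FC_LOCK;

    /* Each feature contributes its bits exactly once, in one place. */
    if ( has_vmx )
        fc |= FC_VMXON_OUTSIDE;
    if ( has_sgx )
        fc |= FC_SGX_ENABLE;
    if ( has_sgx_lc )
        fc |= FC_SGX_LC_ENABLE;

    p->feature_control = fc;
}

/* The read intercept serves the stored value; nothing to fight over. */
static uint64_t read_feature_control(const struct msr_policy *p)
{
    return p->feature_control;
}
```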

> +        break;
>      case MSR_IA32_VMX_BASIC...MSR_IA32_VMX_VMFUNC:
>          if ( !nvmx_msr_read_intercept(msr, msr_content) )
>              goto gp_fault;
>          break;
> +    case MSR_IA32_SGXLEPUBKEYHASH0...MSR_IA32_SGXLEPUBKEYHASH3:
> +        if ( !sgx_msr_read_intercept(current, msr, msr_content) )
> +            goto gp_fault;
> +        break;
>      case MSR_IA32_MISC_ENABLE:
>          rdmsrl(MSR_IA32_MISC_ENABLE, *msr_content);
>          /* Debug Trace Store is not supported. */
> @@ -3119,10 +3134,19 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>          break;
>      }
>      case MSR_IA32_FEATURE_CONTROL:
> +        /* See vmx_msr_read_intercept */
> +        if ( !sgx_msr_write_intercept(current, msr, msr_content) &&
> +                !nvmx_msr_write_intercept(msr, msr_content) )

Definitely needs a rebase.  nvmx_msr_write_intercept() has been removed.

~Andrew

> +            goto gp_fault;
> +        break;
>      case MSR_IA32_VMX_BASIC...MSR_IA32_VMX_TRUE_ENTRY_CTLS:
>          if ( !nvmx_msr_write_intercept(msr, msr_content) )
>              goto gp_fault;
>          break;
> +    case MSR_IA32_SGXLEPUBKEYHASH0...MSR_IA32_SGXLEPUBKEYHASH3:
> +        if ( !sgx_msr_write_intercept(current, msr, msr_content) )
> +            goto gp_fault;
> +        break;
>      case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
>      case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(7):
>      case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
> diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
> index 9793f8c1c5..dfb17c4bd8 100644
> --- a/xen/include/asm-x86/cpufeature.h
> +++ b/xen/include/asm-x86/cpufeature.h
> @@ -98,6 +98,9 @@
>  #define cpu_has_smap            boot_cpu_has(X86_FEATURE_SMAP)
>  #define cpu_has_sha             boot_cpu_has(X86_FEATURE_SHA)
>  
> +/* CPUID level 0x00000007:0.ecx */
> +#define cpu_has_sgx_launch_control  boot_cpu_has(X86_FEATURE_SGX_LAUNCH_CONTROL)
> +
>  /* CPUID level 0x80000007.edx */
>  #define cpu_has_itsc            boot_cpu_has(X86_FEATURE_ITSC)
>  
> diff --git a/xen/include/asm-x86/hvm/vmx/sgx.h b/xen/include/asm-x86/hvm/vmx/sgx.h
> index 40f860662a..c460f61e5e 100644
> --- a/xen/include/asm-x86/hvm/vmx/sgx.h
> +++ b/xen/include/asm-x86/hvm/vmx/sgx.h
> @@ -75,4 +75,26 @@ int hvm_populate_epc(struct domain *d, unsigned long epc_base_pfn,
>  int hvm_reset_epc(struct domain *d, bool_t free_epc);
>  void hvm_destroy_epc(struct domain *d);
>  
> +/* Per-vcpu SGX structure */
> +struct sgx_vcpu {
> +    uint64_t ia32_sgxlepubkeyhash[4];
> +    /*
> +     * Although SDM says if SGX is present, then IA32_SGXLEPUBKEYHASHn are
> +     * available for read, but in reality for SKYLAKE client machines,
> +     * those MSRs are not available if SGX is present.
> +     */
> +    bool_t readable;
> +    bool_t writable;
> +};
> +#define to_sgx_vcpu(v)  (&(v->arch.hvm_vmx.sgx))
> +
> +bool_t sgx_ia32_sgxlepubkeyhash_writable(void);
> +bool_t domain_has_sgx(struct domain *d);
> +bool_t domain_has_sgx_launch_control(struct domain *d);
> +
> +void sgx_vcpu_init(struct vcpu *v);
> +void sgx_ctxt_switch_to(struct vcpu *v);
> +int sgx_msr_read_intercept(struct vcpu *v, unsigned int msr, u64 *msr_content);
> +int sgx_msr_write_intercept(struct vcpu *v, unsigned int msr, u64 msr_content);
> +
>  #endif  /* __ASM_X86_HVM_VMX_SGX_H__ */
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 6cfa5c3310..fc0b9d85fd 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -160,6 +160,8 @@ struct arch_vmx_struct {
>       * pCPU and wakeup the related vCPU.
>       */
>      struct pi_blocking_vcpu pi_blocking;
> +
> +    struct sgx_vcpu sgx;
>  };
>  
>  int vmx_create_vmcs(struct vcpu *v);
> diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
> index 771e7500af..16206a11b7 100644
> --- a/xen/include/asm-x86/msr-index.h
> +++ b/xen/include/asm-x86/msr-index.h
> @@ -296,6 +296,12 @@
>  #define IA32_FEATURE_CONTROL_SENTER_PARAM_CTL         0x7f00
>  #define IA32_FEATURE_CONTROL_ENABLE_SENTER            0x8000
>  #define IA32_FEATURE_CONTROL_SGX_ENABLE               0x40000
> +#define IA32_FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE  0x20000
> +
> +#define MSR_IA32_SGXLEPUBKEYHASH0   0x0000008c
> +#define MSR_IA32_SGXLEPUBKEYHASH1   0x0000008d
> +#define MSR_IA32_SGXLEPUBKEYHASH2   0x0000008e
> +#define MSR_IA32_SGXLEPUBKEYHASH3   0x0000008f
>  
>  #define MSR_IA32_TSC_ADJUST		0x0000003b
>  



* Re: [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches
  2017-07-17  6:08   ` Huang, Kai
@ 2017-07-21  9:04     ` Huang, Kai
  0 siblings, 0 replies; 58+ messages in thread
From: Huang, Kai @ 2017-07-21  9:04 UTC (permalink / raw)
  To: Andrew Cooper, Kai Huang, xen-devel
  Cc: kevin.tian, sstabellini, wei.liu2, George.Dunlap, tim,
	ian.jackson, jbeulich



On 7/17/2017 6:08 PM, Huang, Kai wrote:
> Hi Andrew,
> 
> Thank you very much for comments. Sorry for late reply, and please see 
> my reply below.
> 
> On 7/12/2017 2:13 AM, Andrew Cooper wrote:
>> On 09/07/17 09:03, Kai Huang wrote:
>>> Hi all,
>>>
>>> This series is RFC Xen SGX virtualization support design and RFC 
>>> draft patches.
>>
>> Thankyou very much for this design doc.
>>
>>> 2. SGX Virtualization Design
>>>
>>> 2.1 High Level Toolstack Changes:
>>>
>>> 2.1.1 New 'epc' parameter
>>>
>>> EPC is a limited resource. In order to use EPC efficiently among all
>>> domains, when creating a guest, the administrator should be able to
>>> specify the domain's virtual EPC size. And the admin should also be
>>> able to get every domain's virtual EPC size.
>>>
>>> For this purpose, a new 'epc = <size>' parameter is added to the XL
>>> configuration file. This parameter specifies the guest's virtual EPC
>>> size. The EPC base address will be calculated by the toolstack
>>> internally, according to the guest's memory size, MMIO size, etc.
>>> 'epc' is in units of MB, and any 1MB-aligned value will be accepted.
>>
>> How will this interact with multi-package servers?  Even though its 
>> fine to implement the single-package support first, the design should 
>> be extensible to the multi-package case.
>>
>> First of all, what are the implications of multi-package SGX?
>>
>> (Somewhere) you mention changes to scheduling.  I presume this is 
>> because a guest with EPC mappings in EPT must be scheduled on the same 
>> package, or ENCLU[EENTER] will fail.  I presume also that each package 
>> will have separate, unrelated private keys?
> 
> ENCLU[EENTER] will continue to work on multi-package servers. Actually, 
> I was told all existing ISA behavior documented in the SDM won't change 
> for servers, as otherwise this would be a bad design :)
> 
> Unfortunately I was told I cannot talk about MP server SGX much now. 
> Basically I can only talk about things already documented in the SDM 
> (sorry :( ). But I guess multiple EPCs in CPUID are designed to cover 
> MP servers, at least mainly (we can make a reasonable guess).
> 
> In terms of the design, I think we can follow the XL config file 
> parameters for memory. The 'epc' parameter will always specify the 
> total EPC size that the domain has. And we can use existing 
> NUMA-related parameters, such as setting cpus='...' to physically pin 
> vCPUs to specific pCPUs, so that EPC will mostly be allocated from the 
> related node. If that node runs out of EPC, we can decide whether to 
> allocate EPC from another node, or fail to create the domain. I know 
> Linux supports a NUMA policy which can specify whether to allow 
> allocating memory from other nodes; does Xen have such a policy? Sorry, 
> I haven't checked this. If Xen has such a policy, we need to choose 
> whether to use the memory policy, or introduce a new policy for EPC.
> 
> If we are going to support vNUMA EPC in the future, we can also use a 
> similar way to configure vNUMA EPC in the XL config.
> 
> Sorry I mentioned scheduling. I should say *potentially* :). My 
> thinking was that, as SGX is per-thread, the SGX info reported by 
> different CPU packages may differ (e.g., whether SGX2 is supported), so 
> we may need the scheduler to be aware of SGX. But I think we don't have 
> to consider this now.
> 
> What's your comments?
> 
>>
>> I presume there is no sensible way (even on native) for a single 
>> logical process to use multiple different enclaves?  By extension, 
>> does it make sense to try and offer parts of multiple enclaves to a 
>> single VM?
> 
> A native machine allows running multiple enclaves, even signed by 
> multiple authors. SGX's only restriction is that before launching any 
> other enclave, the Launch Enclave (LE) must be launched. The LE is the 
> only enclave that doesn't require an EINITTOKEN in EINIT. For the LE, 
> its signer (SHA256(sigstruct->modulus)) must be equal to the value in 
> the IA32_SGXLEPUBKEYHASHn MSRs. The LE generates EINITTOKENs for other 
> enclaves (EINIT for other enclaves requires an EINITTOKEN). For other 
> enclaves, there's no limitation that the enclave's signer must match 
> IA32_SGXLEPUBKEYHASHn, so the signer can be anybody. But for other 
> enclaves, before running EINIT, the hash of the signer of the LE that 
> produced the token needs to be written back to IA32_SGXLEPUBKEYHASHn 
> (the MSRs can change, for example, when there are multiple LEs running 
> in the OS). This is because EINIT needs to perform the EINITTOKEN 
> integrity check (the EINITTOKEN contains MAC info calculated by the LE, 
> and EINIT needs the LE's IA32_SGXLEPUBKEYHASHn to derive the key to 
> verify the MAC).
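The launch-control rules described above can be captured in a toy model
(a simplified stand-alone sketch of the described behavior, not real SGX
code: plain hash comparison stands in for the actual SHA256 and MAC
cryptography, which is elided):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct pubkey_hash { uint64_t h[4]; };  /* one 256-bit signer hash */

/* Models the current contents of IA32_SGXLEPUBKEYHASH0..3. */
static struct pubkey_hash lepubkeyhash_msrs;

static bool hash_eq(const struct pubkey_hash *a,
                    const struct pubkey_hash *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}

/*
 * An LE carries no EINITTOKEN: EINIT succeeds only if the LE's signer
 * hash matches the MSRs at that moment.
 */
static bool einit_launch_enclave(const struct pubkey_hash *le_signer)
{
    return hash_eq(le_signer, &lepubkeyhash_msrs);
}

/*
 * Any other enclave needs a token minted by an LE; EINIT derives the
 * MAC-verification key from the *current* MSR values, so the MSRs must
 * still hold the hash of the LE that produced the token.  The MAC
 * check itself is elided in this model.
 */
static bool einit_other_enclave(const struct pubkey_hash *token_le_signer)
{
    return hash_eq(token_le_signer, &lepubkeyhash_msrs);
}
```

This is why Xen has to context-switch the virtual hash values into the
physical MSRs: whichever VM is running, its LE's signer hash must be in
the MSRs at EINIT time.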
> 
> SGX in VM doesn't change those behaviors, so in VM, the enclaves can 
> also be signed by anyone, but Xen needs to emulate IA32_SGXLEPUBKEYHASHn 
> so that when one VM is running, the correct IA32_SGXLEPUBKEYHASHn are 
> already in physical MSRs.
> 
>>
>>> 2.1.3 Notify domain's virtual EPC base and size to Xen
>>>
>>> Xen needs to know guest's EPC base and size in order to populate EPC 
>>> pages for
>>> it. Toolstack notifies EPC base and size to Xen via 
>>> XEN_DOMCTL_set_cpuid.
>>
>> I am currently in the process of reworking the Xen/Toolstack interface 
>> when it comes to CPUID handling.  The latest design is available here: 
>> https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg00378.html 
>> but the end result will be the toolstack expressing its CPUID policy 
>> in terms of the architectural layout.
>>
>> Therefore, I would expect that, however the setting is represented in 
>> the configuration file, xl/libxl would configure it with the 
>> hypervisor by setting CPUID.0x12[2] with the appropriate base and size.
> 
> I agree. I saw you are planning to introduce a new 
> XEN_DOMCTL_get{set}_cpuid_policy, which will allow the toolstack to 
> query/set the cpuid policy in a single hypercall (if I understand 
> correctly), so I think we should definitely use the new hypercalls.
> 
> I also saw you are planning to introduce a new hypercall to query the 
> raw/host/pv_max/hvm_max cpuid policies (not just the featureset), so I 
> think 'xl sgxinfo' (or xl info -sgx) can certainly use that to get the 
> physical SGX info (EPC info). And 'xl sgxlist' (or xl list -sgx) can 
> use XEN_DOMCTL_get{set}_cpuid_policy to display a domain's SGX info 
> (EPC info).
> 
> Btw, do you think we need 'xl sgxinfo' and 'xl sgxlist'? If we do, which 
> is better? New 'xl sgxinfo' and 'xl sgxlist', or extending existing 'xl 
> info' and 'xl list' to support SGX, such as 'xl info -sgx' and 'xl list 
> -sgx' above?
> 
> 
>>
>>> 2.1.4 Launch Control Support (?)
>>>
>>> Xen Launch Control support is about supporting running multiple
>>> domains, each running its own LE signed by a different owner (if the
>>> HW allows, explained below). As explained in 1.4 SGX Launch Control,
>>> EINIT for the LE (Launch Enclave) only succeeds when
>>> SHA256(SIGSTRUCT.modulus) matches IA32_SGXLEPUBKEYHASHn, and EINIT
>>> for other enclaves will derive the EINITTOKEN key according to
>>> IA32_SGXLEPUBKEYHASHn. Therefore, to support this, the guest's
>>> virtual IA32_SGXLEPUBKEYHASHn must be written to the physical MSRs
>>> before EINIT (which also means the physical IA32_SGXLEPUBKEYHASHn
>>> need to be *unlocked* in the BIOS before booting to the OS).
>>>
>>> For a physical machine, it is the BIOS writer's decision whether the
>>> BIOS provides an interface for the user to specify a customized
>>> IA32_SGXLEPUBKEYHASHn (it defaults to the digest of Intel's signing
>>> key after reset). In reality, the OS's SGX driver may require the
>>> BIOS to leave the MSRs *unlocked* and actively write the hash value
>>> to the MSRs in order to run EINIT successfully, as in this case the
>>> driver will not depend on the BIOS's capability (whether it allows
>>> the user to customize the IA32_SGXLEPUBKEYHASHn value).
>>>
>>> The problem for Xen is: do we need a new parameter, such as
>>> 'lehash=<SHA256>', to specify the default value of the guest's
>>> virtual IA32_SGXLEPUBKEYHASHn? And do we need a new parameter, such
>>> as 'lewr', to specify whether the guest's virtual MSRs are locked or
>>> not before handing over to the guest's OS?
>>>
>>> I tend not to introduce 'lehash', as it seems the SGX driver would
>>> actively update the MSRs. And a new parameter would add additional
>>> changes for upper-layer software (such as OpenStack). And 'lewr' is
>>> not needed either, as Xen can always *unlock* the MSRs for the guest.
>>>
>>> Please give comments?
>>>
>>> Currently in my RFC patches the above two parameters are not
>>> implemented. The Xen hypervisor will always *unlock* the MSRs.
>>> Whether there is a 'lehash' parameter or not doesn't impact the Xen
>>> hypervisor's emulation of IA32_SGXLEPUBKEYHASHn. See the Xen
>>> hypervisor changes below for details.
>>
>> Reading around, am I correct with the following?
>>
>> 1) Some processors have no launch control.  There is no restriction on 
>> which enclaves can boot.
> 
> Yes, some processors have no launch control. However, that doesn't 
> mean there's no restriction on which enclaves can boot. On the 
> contrary, on those machines only Intel's Launch Enclave (LE) can run, 
> as on those machines IA32_SGXLEPUBKEYHASHn either doesn't exist, or is 
> equal to the digest of Intel's signing RSA pubkey. However, although 
> only Intel's LE can run, we can still run other enclaves from other 
> signers. Please see my reply above.
> 
>>
>> 2) Some Skylake client processors claim to have launch control, but 
>> the MSRs are unavailable (is this an erratum?).  These are limited to 
>> booting enclaves matching the Intel public key.
> 
> Sorry I don't know whether this is an erratum. I will get back to you 
> after confirming internally.

Hi Andrew,

I raised this internally, and it turns out that in the latest SDM Intel 
has fixed the statement, so that the IA32_SGXLEPUBKEYHASHn MSRs are only 
available when both SGX and SGX_LC are present in CPUID. When I was 
writing the design and patches, I was referring to the old SDM, and the 
old one doesn't mention SGX_LC in CPUID as a condition. So it is my 
fault, and this statement has been fixed in the latest SDM (41.2.2 Intel 
SGX Launch Control Configuration):

https://software.intel.com/sites/default/files/managed/7c/f1/332831-sdm-vol-3d.pdf

However in latest SDM volume 4: Model-Specific Registers:

https://software.intel.com/sites/default/files/managed/22/0d/335592-sdm-vol-4.pdf

You can still see that for IA32_SGXLEPUBKEYHASHn (table 2-2, register 
address 8CH): "Read permitted If CPUID.(EAX=12H,ECX=0H):EAX[0]=1". So 
there's still an error in the SDM.

I don't think this will be an erratum. Intel will fix the error in vol 4 
in the next version of the SDM. We should refer to 41.2.2 as it has the 
accurate description.

> 
> 
>>
>> 3) Launch control may be locked by the BIOS.  There may be a custom 
>> hash, or it might be the Intel default.  Xen can't adjust it at all, 
>> but can support running any number of VMs with matching enclaves.
> 
> Yes, launch control may be locked by the BIOS, although this depends 
> on whether the BIOS provides an interface for the user to configure 
> it. I was told that typically the BIOS will unlock Launch Control, as 
> the SGX driver expects such behavior. But I am not sure we can always 
> assume this.
> 
> Whether there will be a custom hash also depends on the BIOS. The BIOS 
> may or may not provide an interface for the user to configure a custom 
> hash. So on a physical machine, I think we need to consider all the 
> cases. On a machine with Launch Control *unlocked*, Xen is able to 
> dynamically change IA32_SGXLEPUBKEYHASHn so that it can run multiple 
> VMs, each running an LE from a different signer. However, if launch 
> control is *locked* in the BIOS, then Xen is still able to run 
> multiple VMs, but all VMs can only run LEs from the signer that 
> matches IA32_SGXLEPUBKEYHASHn (which in most cases should be the Intel 
> default, but can be a custom hash if the BIOS allows the user to 
> configure one).
> 
> Sorry, I am not quite sure about the typical BIOS implementation. I 
> think I can reach out internally and get back to you if I have 
> something.

I also reached out internally to find the typical BIOS implementation in 
terms of SGX LC. Typically the BIOS will neither provide configuration 
options for the user to set a custom hash, nor to select whether the 
MSRs are locked or not. Typically for client machines, the MSRs are 
locked with the Intel default, and for server machines, the MSRs are 
unlocked. But we cannot rule out third parties providing a different 
BIOS that offers options for the user to choose locked/unlocked mode, 
and/or to specify a custom hash. Custom hash + locked mode may be useful 
for some special purposes (e.g., IT management) as it provides the most 
secure option -- even the kernel/VMM can only launch an LE signed by a 
particular signer. In the case of a VM, custom hash + locked mode may be 
even more useful than on bare metal, as a VM is usually supposed to run 
some particular-purpose appliance.

So I think it is better to keep the 'lehash' and 'lewr' XL parameters. 
They are both optional -- the former provides a custom hash, and the 
latter sets the VM to unlocked mode. If neither is specified, then the 
VM will be in locked mode, and the VM's virtual IA32_SGXLEPUBKEYHASHn 
will either have Intel's default value (when the physical machine is 
unlocked), or the machine's MSR values (when the machine is in locked 
mode). And when the physical machine is in locked mode, specifying 
either 'lehash' or 'lewr' will cause VM creation to fail.

So we have 3 XL parameters for SGX: 'epc', 'lehash' and 'lewr'; probably 
we should consolidate them into one XL parameter, such as 
sgx=['epc=<size>', 'lehash=<sha256>', 'lewr=[on|off]'] ?
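For illustration, the consolidated parameter might look like this in a 
guest config file (a hypothetical syntax sketch of the proposal above, 
not an implemented interface; the hash value is a placeholder):

```
# Guest with 128MB of virtual EPC, a custom LE pubkey hash, and
# writable (unlocked) virtual IA32_SGXLEPUBKEYHASHn MSRs:
sgx = [ 'epc=128', 'lehash=<sha256-of-LE-signing-key>', 'lewr=on' ]
```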

Thanks,
-Kai

> 
>>
>> 4) Launch control may be unlocked by the BIOS.  In this case, Xen can 
>> context switch a hash per domain, and run all enclaves.
> 
> Yes. I think by "enclave" you meant LE.
> 
>>
>> The eventual plans for CPUID and MSR levelling should allow all of 
>> these to be expressed in sensible ways, and I don't foresee any issues 
>> with supporting all of these scenarios.
> 
> So do you think we should have 'lehash' and 'lewr' parameters in the 
> XL config file? The former provides a custom hash, and the latter 
> controls whether to unlock the guest's Launch Control.
> 
> My thinking is that the SGX driver needs to *actively* write the LE's 
> pubkey hash to IA32_SGXLEPUBKEYHASHn in *unlocked* mode, so 'lehash' 
> alone is not needed. 'lehash' is only meaningful together with 'lewr', 
> to provide a default hash value in locked mode; if we always use 
> *unlocked* mode for the guest, 'lehash' is not necessary.
> 
>>
>>
>>
>>> 2.2 High Level Xen Hypervisor Changes:
>>>
>>> 2.2.1 EPC Management (?)
>>>
>>> The Xen hypervisor needs to detect SGX, discover EPC, and manage EPC
>>> before exposing SGX to guests. EPC is detected via SGX CPUID
>>> 0x12.0x2. It's possible that there are multiple EPC sections
>>> (enumerated via sub-leaves 0x3 and so on, until an invalid EPC is
>>> reported), but this is only true on multiple-socket server machines.
>>> For server machines there are additional things that also need to be
>>> done, such as NUMA EPC, scheduling, etc. We will support server
>>> machines in the future, but currently we only support one EPC.
>>>
>>> EPC is reported as reserved memory (so it is not reported as normal 
>>> memory).
>>> EPC must be managed in 4K pages. CPU hardware uses EPCM to track 
>>> status of each
>>> EPC pages. Xen needs to manage EPC and provide functions to, ie, 
>>> alloc and free
>>> EPC pages for guest.
>>>
>>> There are two ways to manage EPC: Manage EPC separately; or Integrate 
>>> it to
>>> existing memory management framework.
>>>
>>> It is easy to manage EPC separately, as currently EPC is pretty
>>> small (~100MB), and we can even put the pages in a single list.
>>> However it is not flexible; for example, you will have to write new
>>> algorithms when EPC becomes larger, e.g., GBs. And you have to write
>>> new code to support NUMA EPC (although this will not come in a short
>>> time).
>>>
>>> Integrating EPC into the existing memory management framework seems
>>> more reasonable, as in this way we can reuse the memory management
>>> data structures/algorithms, and it will be more flexible for
>>> supporting larger EPC and potentially NUMA EPC. But modifying the MM
>>> framework has a higher risk of breaking existing memory management
>>> code (potentially more bugs).
>>>
>>> In my RFC patches we currently choose to manage EPC separately. A
>>> new structure, epc_page, is added to represent a single 4K EPC page.
>>> A whole array of struct epc_page will be allocated during EPC
>>> initialization, so that either of an EPC page's PFN and its 'struct
>>> epc_page' can be obtained from the other by adding an offset.
>>>
>>> But maybe integrating EPC into the MM framework is more reasonable.
>>> Comments?
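The PFN-to-tracking-struct conversion described above is a
constant-offset array lookup; a stand-alone sketch (field and variable
names are illustrative, not from the RFC patches, and the base/size
values below are made up):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: one tracking struct per 4K EPC page. */
struct epc_page { unsigned int in_use; };

static struct epc_page *epc_page_array;  /* allocated at EPC init */
static unsigned long epc_base_pfn;
static unsigned long epc_npages;

/* PFN -> tracking struct: subtract the base, index the array. */
static struct epc_page *pfn_to_epc_page(unsigned long pfn)
{
    if ( pfn < epc_base_pfn || pfn >= epc_base_pfn + epc_npages )
        return NULL;
    return &epc_page_array[pfn - epc_base_pfn];
}

/* Tracking struct -> PFN: pointer difference plus the base. */
static unsigned long epc_page_to_pfn(const struct epc_page *pg)
{
    return epc_base_pfn + (unsigned long)(pg - epc_page_array);
}
```

Both directions are O(1), which is the point of allocating the whole
array up front rather than tracking pages in ad-hoc structures.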
>>>
>>> 2.2.2 EPC Virtualization (?)
>>
>> It looks like managing the EPC is very similar to managing the NVDIMM 
>> ranges.  We have a (set of) physical address ranges which need 4k 
>> ownership granularity to different domains.
>>
>> I think integrating this into struct page_struct is the better way to go.
> 
> Will do. So I assume we will introduce a new MEMF_epc, and use the 
> existing alloc_domheap/xenheap_pages to allocate EPC? MEMF_epc can 
> also be used if we need to support ballooning in the future (using the 
> existing XENMEM_{decrease/increase}_reservation).
> 
>>
>>>
>>> This part is how to populate EPC for guests. We have 3 choices:
>>>      - Static Partitioning
>>>      - Oversubscription
>>>      - Ballooning
>>>
>>> Static Partitioning means all EPC pages will be allocated and mapped
>>> to the guest when it is created, and there's no runtime change of
>>> the page table mappings for EPC pages. Oversubscription means the
>>> Xen hypervisor supports EPC page swapping between domains, meaning
>>> Xen is able to evict an EPC page from another domain and assign it
>>> to the domain that needs the EPC. With oversubscription, EPC can be
>>> assigned to a domain on demand, when an EPT violation happens.
>>> Ballooning is similar to memory ballooning. It is basically "Static
>>> Partitioning" + "Balloon driver" in the guest.
>>>
>>> Static Partitioning is the easiest way in terms of implementation,
>>> and there will be no hypervisor overhead (except EPT overhead, of
>>> course), because in "Static Partitioning" there are no EPT
>>> violations for EPC, and Xen doesn't need to turn on ENCLS VMEXIT for
>>> the guest, as ENCLS runs perfectly in non-root mode.
>>>
>>> Ballooning is "Static Partitioning" + "Balloon driver" in the guest.
>>> Like "Static Partitioning", ballooning doesn't need to turn on ENCLS
>>> VMEXIT, and doesn't have EPT violations for EPC either. To support
>>> ballooning, we need a balloon driver in the guest to issue
>>> hypercalls to give up or reclaim EPC pages. In terms of the
>>> hypercall, we have two choices: 1) add a new hypercall for EPC
>>> ballooning; or 2) use the existing
>>> XENMEM_{increase/decrease}_reservation with a new memory flag, i.e.,
>>> XENMEMF_epc. I'll discuss adding a dedicated hypercall or not later.
>>>
>>> Oversubscription looks nice but it requires a more complicated
>>> implementation. Firstly, as explained in 1.3.3 EPC Eviction &
>>> Reload, we need to follow specific steps to evict EPC pages, and in
>>> order to do that, Xen basically needs to trap ENCLS from the guest
>>> and keep track of EPC page status and enclave info from all guests.
>>> This is because:
>>>      - To evict a regular EPC page, Xen needs to know the SECS
>>>        location
>>>      - Xen needs to know the EPC page type: evicting a regular EPC
>>>        page and evicting SECS or VA pages take different steps.
>>>      - Xen needs to know the EPC page status: whether the page is
>>>        blocked or not.
>>>
>>> Those info can only be got by trapping ENCLS from guest, and parsing its
>>> parameters (to identify SECS page, etc). Parsing ENCLS parameters 
>>> means we need
>>> to know which ENCLS leaf is being trapped, and we need to translate 
>>> guest's
>>> virtual address to get physical address in order to locate EPC page. 
>>> And once
>>> ENCLS is trapped, we have to emulate ENCLS in Xen, which means we 
>>> need to
>>> reconstruct ENCLS parameters by remapping all of the guest's virtual
>>> addresses to Xen's virtual addresses (gva->gpa->pa->xen_va), as ENCLS
>>> always uses *effective addresses*, which are translated by the
>>> processor when ENCLS runs.
>>>
>>>      --------------------------------------------------------------
>>>                  |   ENCLS   |
>>>      --------------------------------------------------------------
>>>                  |          /|\
>>>      ENCLS VMEXIT|           | VMENTRY
>>>                  |           |
>>>                 \|/          |
>>>
>>>         1) parse ENCLS parameters
>>>         2) reconstruct(remap) guest's ENCLS parameters
>>>         3) run ENCLS on behalf of guest (and skip ENCLS)
>>>         4) on success, update EPC/enclave info, or inject error
>>>
>>> And Xen needs to maintain each EPC page's status (type, blocked or 
>>> not, in
>>> enclave or not, etc). Xen also needs to maintain all enclaves' info
>>> from all guests, in order to find the correct SECS for a regular EPC
>>> page, and the enclave's linear address as well.
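
To make steps 1)-4) of the diagram concrete, here is a toy model of the
tracking Xen would have to do (hypothetical code: the leaf numbers follow
the SDM's ENCLS encoding in EAX, and the identity gva_to_pa() stands in
for a real walk of the guest's page tables):

```c
#include <assert.h>
#include <stdint.h>

/* SDM ENCLS leaf encodings (passed by the guest in EAX). */
#define ENCLS_ECREATE 0x00
#define ENCLS_EADD    0x01

/* Per-EPC-page tracking info Xen would maintain. */
enum epc_type { EPC_FREE, EPC_SECS, EPC_REG };

struct epc_track {
    enum epc_type type;
    uint64_t secs_pa;       /* owning SECS, for regular pages */
};

#define EPC_NPAGES 8
static struct epc_track epc[EPC_NPAGES];

/* Stand-in for the gva->gpa->pa walk; identity-mapped in this model. */
static uint64_t gva_to_pa(uint64_t gva)
{
    return gva;
}

/*
 * Steps from the diagram: 1) the leaf is parsed from the guest's EAX,
 * 2) operand addresses are translated, 3) the leaf is "run" on behalf
 * of the guest, 4) tracking info is updated, or an error is injected.
 */
static int emulate_encls(uint32_t leaf, uint64_t page_gva, uint64_t secs_gva)
{
    uint64_t pa = gva_to_pa(page_gva);
    uint64_t secs_pa = gva_to_pa(secs_gva);

    switch ( leaf )
    {
    case ENCLS_ECREATE:
        epc[pa].type = EPC_SECS;         /* target page becomes a SECS */
        return 0;

    case ENCLS_EADD:
        if ( epc[secs_pa].type != EPC_SECS )
            return -1;                   /* would inject an error here */
        epc[pa].type = EPC_REG;
        epc[pa].secs_pa = secs_pa;
        return 0;
    }

    return -1;
}
```

The per-page type and SECS back-pointer recorded here are exactly what
the eviction steps above need: the SECS location for a regular page, and
the page type to pick the right eviction sequence.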
>>>
>>> So in general, "Static Partitioning" has the simplest implementation,
>>> but is obviously not the best way to use EPC efficiently; "Ballooning"
>>> has all the pros of Static Partitioning but requires a guest balloon
>>> driver; "Oversubscription" is best in terms of flexibility but requires
>>> a complicated hypervisor implementation.
>>>
>>> We have implemented "Static Partitioning" in the RFC patches, but we
>>> need your feedback on whether it is enough. If not, which one should we
>>> do at the next stage -- Ballooning or Oversubscription? IMO Ballooning
>>> may be good enough, given the fact that regular memory is currently
>>> also handled via "Static Partitioning" + "Ballooning".
>>>
>>> Comments?
>>
>> Definitely go for static partitioning to begin with.  This is far 
>> simpler to implement.
>>
>> I can't see a pressing usecase for oversubscription or ballooning. Any 
>> datacenter work will be using exclusively static, and I expect static 
>> will be fine for all (or at least, most) client usecases.
> 
> Thanks. So for the first stage I will focus on static partitioning.
> 
>>
>>>
>>> 2.2.3 Populate EPC for Guest
>>>
>>> Toolstack notifies Xen about domain's EPC base and size by 
>>> XEN_DOMCTL_set_cpuid,
>>> so currently Xen populates all EPC pages for guest in 
>>> XEN_DOMCTL_set_cpuid,
>>> particularly, in handling XEN_DOMCTL_set_cpuid for CPUID.0x12.0x2. 
>>> Once Xen
>>> checks that the values passed from the toolstack are valid, Xen will
>>> allocate all EPC pages and set up EPT mappings for the guest.
>>>
>>> 2.2.4 New Dedicated Hypercall (?)
>>
>> All this information should (eventually) be available via the 
>> appropriate SYSCTL_get_{cpuid,msr}_policy hypercalls.  I don't see any 
>> need for dedicated hypercalls.
> 
> Yes I agree.  Originally I was concerned that without a dedicated 
> hypercall it would be hard to implement 'xl sgxinfo' and 'xl sgxlist', 
> but according to your new CPUID enhancement plan, the two can be done 
> via the new hypercalls to query Xen's and the domain's cpuid policy. 
> See my reply above regarding "Notify Xen about guest's EPC info".
> 
>>
>>> 2.2.9 Guest Suspend & Resume
>>>
>>> On hardware, EPC is destroyed when power goes to S3-S5. So Xen will 
>>> destroy
>>> guest's EPC when guest's power goes into S3-S5. Currently Xen is 
>>> notified by
>>> Qemu of S-state changes via HVM_PARAM_ACPI_S_STATE, and Xen will
>>> destroy the EPC if the S-state is S3-S5.
>>>
>>> Specifically, Xen will run EREMOVE on each of the guest's EPC pages,
>>> as the guest may not handle EPC suspend & resume correctly, in which
>>> case the guest's EPC pages may physically still be valid. Xen
>>> therefore runs EREMOVE to make sure all EPC pages become invalid;
>>> otherwise further EPC operations in the guest may fault, as the guest
>>> assumes all EPC pages are invalid after it is resumed.
>>>
>>> For SECS pages, EREMOVE may fail with SGX_CHILD_PRESENT, in which
>>> case Xen will put the SECS page on a list, and run EREMOVE on it
>>> again after EREMOVE has been run on all other EPC pages. This time
>>> the EREMOVE on the SECS will succeed, as all its children (regular
>>> EPC pages) have already been removed.
>>>
>>> 2.2.10 Destroying Domain
>>>
>>> Normally Xen just frees all EPC pages for a domain when it is
>>> destroyed. But Xen will also run EREMOVE on all of the guest's EPC
>>> pages (as described in 2.2.7 above) before freeing them, as the guest
>>> may have shut down unexpectedly (e.g., the user killed the guest), in
>>> which case the guest's EPC may still be valid.
>>>
>>> 2.3 Additional Point: Live Migration, Snapshot Support (?)
>>
>> How big is the EPC?  If we are talking MB rather than GB, movement of 
>> the EPC could be after the pause, which would add some latency to live 
>> migration but should work.  I expect that people would prefer to have 
>> the flexibility of migration even at the cost of extra latency.
>>
> 
> The EPC is typically ~100MB at maximum (as I have observed). The EPC 
> is typically reserved by the BIOS, together with the EPCM (EPC map, 
> which is invisible to SW), as Processor Reserved Memory (PRM). On real 
> machines, both our internal development machines and machines from 
> Dell, HP, and Lenovo (that you can buy on the market now), the BIOS 
> always provides 3 choices for the PRM size: 32M, 64M, and 128M. With 
> 128M of PRM, the EPC is slightly less than 100M.
> 
> The problem is that EPC cannot be moved. I think you were suggesting 
> moving the EPC by evicting it at the last stage, copying the evicted 
> content to the remote side, and then reloading it. However, I don't 
> think this will work, as EPC eviction itself needs to use a VA slot 
> (which is itself EPC), so you can imagine that the VA slots cannot be 
> moved to the remote side. Even if they could, they could not be used 
> to reload EPC on the remote side, as the info in a VA slot is bound to 
> the platform and cannot be used on a remote one.
> 
> To support live migration, we can only choose to ignore the EPC during 
> live migration and let the guest SGX driver/userspace SW stack handle 
> restoring enclaves (which is actually a lot simpler in the 
> hypervisor/toolstack implementation). The guest SGX driver needs to 
> handle loss of EPC anyway, as EPC is destroyed in S3-S5. The only 
> difference is that to support live migration, the guest SGX driver 
> needs to support *sudden* loss of EPC, which is not HW behavior. I was 
> told that currently both the Windows & Linux SGX drivers already 
> support *sudden* loss of EPC, which leaves us the question of whether 
> we need to support SGX live migration (and snapshot) at all.
> 
>> ~Andrew
>>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 03/15] xen: x86: add early stage SGX feature detection
  2017-07-19 14:23   ` Andrew Cooper
@ 2017-07-21  9:17     ` Huang, Kai
  2017-07-22  1:06       ` Huang, Kai
  0 siblings, 1 reply; 58+ messages in thread
From: Huang, Kai @ 2017-07-21  9:17 UTC (permalink / raw)
  To: Andrew Cooper, Kai Huang, xen-devel; +Cc: kevin.tian, jbeulich



On 7/20/2017 2:23 AM, Andrew Cooper wrote:
> On 09/07/17 09:09, Kai Huang wrote:
>> This patch adds early stage SGX feature detection via SGX CPUID 0x12. Function
>> detect_sgx is added to detect SGX info on each CPU (called from vmx_cpu_up).
>> SDM says SGX info returned by CPUID is per-thread, and we cannot assume all
>> threads will return the same SGX info, so we have to detect SGX for each CPU.
>> For simplicity, currently SGX is only supported when all CPUs report the same
>> SGX info.
>>
>> SDM also says it's possible to have multiple EPC sections, but this is only for
>> multi-socket servers, which we don't support now (there are other things that
>> need to be done first, e.g., NUMA EPC, scheduling, etc.), so currently only
>> one EPC is supported.
>>
>> Dedicated files sgx.c and sgx.h are added (under the vmx directory as SGX is
>> Intel specific) for the bulk of the above SGX detection code, and for further
>> SGX code as well.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> 
> I am not sure putting this under hvm/ is a sensible move.  Almost
> everything in this patch is currently common, and I can foresee us
> wanting to introduce PV support, so it would be good to introduce this
> in a guest-neutral location to begin with.
> 
>> ---
>>   xen/arch/x86/hvm/vmx/Makefile     |   1 +
>>   xen/arch/x86/hvm/vmx/sgx.c        | 208 ++++++++++++++++++++++++++++++++++++++
>>   xen/arch/x86/hvm/vmx/vmcs.c       |   4 +
>>   xen/include/asm-x86/cpufeature.h  |   1 +
>>   xen/include/asm-x86/hvm/vmx/sgx.h |  45 +++++++++
>>   5 files changed, 259 insertions(+)
>>   create mode 100644 xen/arch/x86/hvm/vmx/sgx.c
>>   create mode 100644 xen/include/asm-x86/hvm/vmx/sgx.h
>>
>> diff --git a/xen/arch/x86/hvm/vmx/Makefile b/xen/arch/x86/hvm/vmx/Makefile
>> index 04a29ce59d..f6bcf0d143 100644
>> --- a/xen/arch/x86/hvm/vmx/Makefile
>> +++ b/xen/arch/x86/hvm/vmx/Makefile
>> @@ -4,3 +4,4 @@ obj-y += realmode.o
>>   obj-y += vmcs.o
>>   obj-y += vmx.o
>>   obj-y += vvmx.o
>> +obj-y += sgx.o
>> diff --git a/xen/arch/x86/hvm/vmx/sgx.c b/xen/arch/x86/hvm/vmx/sgx.c
>> new file mode 100644
>> index 0000000000..6b41469371
>> --- /dev/null
>> +++ b/xen/arch/x86/hvm/vmx/sgx.c
> 
> This file looks like it should be arch/x86/sgx.c, given its current content.
> 
>> @@ -0,0 +1,208 @@
>> +/*
>> + * Intel Software Guard Extensions support
> 
> Please include a GPLv2 header.
> 
>> + *
>> + * Author: Kai Huang <kai.huang@linux.intel.com>
>> + */
>> +
>> +#include <asm/cpufeature.h>
>> +#include <asm/msr-index.h>
>> +#include <asm/msr.h>
>> +#include <asm/hvm/vmx/sgx.h>
>> +#include <asm/hvm/vmx/vmcs.h>
>> +
>> +static struct sgx_cpuinfo __read_mostly sgx_cpudata[NR_CPUS];
>> +static struct sgx_cpuinfo __read_mostly boot_sgx_cpudata;
> 
> I don't think any of this is necessary.  The description says that all
> EPCs across the server will be reported in CPUID subleaves, and our
> implementation gives up if the data are non-identical across CPUs.
> 
> Therefore, we only need to keep one copy of the data, and check
> APs against the master copy.

Right. boot_sgx_cpudata is what we need. Currently detect_sgx is called 
from vmx_cpu_up. How about calling it from identify_cpu instead, with 
something like below?

	if ( c == &boot_cpu_data )
		detect_sgx(&boot_sgx_cpudata);
	else
	{
		struct sgx_cpuinfo tmp;

		detect_sgx(&tmp);
		if ( memcmp(&boot_sgx_cpudata, &tmp, sizeof(tmp)) )
			/* disable SGX */;
	}
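
In other words (modeled below as standalone code, with a simplified
stand-in for the real struct sgx_cpuinfo), the boot CPU records the
master copy and each AP is memcmp'd against it:

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for the real struct sgx_cpuinfo. */
struct sgx_cpuinfo {
    unsigned long epc_base;
    unsigned long epc_size;
    int lc;
};

static struct sgx_cpuinfo boot_sgx_cpudata;
static int sgx_disabled;

/* The boot CPU records the master copy; any AP that reports different
 * SGX info via CPUID disables SGX for the whole system. */
static void sgx_check_cpu(const struct sgx_cpuinfo *c, int is_boot_cpu)
{
    if ( is_boot_cpu )
        boot_sgx_cpudata = *c;
    else if ( memcmp(&boot_sgx_cpudata, c, sizeof(*c)) )
        sgx_disabled = 1;
}
```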

Thanks,
-Kai
> 
> 
> Let me see about splitting up a few bits of the existing CPUID
> infrastructure, so we can use the host cpuid policy more effectively for
> Xen related things.
> 
> ~Andrew
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 09/15] xen: vmx: handle SGX related MSRs
  2017-07-19 17:27   ` Andrew Cooper
@ 2017-07-21  9:42     ` Huang, Kai
  2017-07-22  1:37       ` Huang, Kai
  0 siblings, 1 reply; 58+ messages in thread
From: Huang, Kai @ 2017-07-21  9:42 UTC (permalink / raw)
  To: Andrew Cooper, Kai Huang, xen-devel; +Cc: kevin.tian, jbeulich



On 7/20/2017 5:27 AM, Andrew Cooper wrote:
> On 09/07/17 09:09, Kai Huang wrote:
>> This patch handles IA32_FEATURE_CONTROL and IA32_SGXLEPUBKEYHASHn MSRs.
>>
>> For IA32_FEATURE_CONTROL, if SGX is exposed to domain, then SGX_ENABLE bit
>> is always set. If SGX launch control is also exposed to domain, and physical
>> IA32_SGXLEPUBKEYHASHn are writable, then SGX_LAUNCH_CONTROL_ENABLE bit is
>> also always set. Writes to IA32_FEATURE_CONTROL are ignored.
>>
>> For IA32_SGXLEPUBKEYHASHn, a new 'struct sgx_vcpu' is added for per-vcpu SGX
>> state, and currently it holds the vcpu's virtual ia32_sgxlepubkeyhash[0-3]. Two
>> boolean 'readable' and 'writable' are also added to indicate whether virtual
>> IA32_SGXLEPUBKEYHASHn are readable and writable.
>>
>> When a vcpu is initialized, the virtual ia32_sgxlepubkeyhash values are also
>> initialized. If the physical IA32_SGXLEPUBKEYHASHn are writable, then
>> ia32_sgxlepubkeyhash are set to Intel's default value, as on a physical machine
>> those MSRs will have Intel's default value. If the physical MSRs are not
>> writable (they are *locked* by the BIOS before handing over to Xen), then we
>> try to read those MSRs and use the physical values as the default values for
>> the virtual MSRs. Note that rdmsr_safe is used: although the SDM says that if
>> SGX is present, IA32_SGXLEPUBKEYHASHn are available for read, in reality
>> Skylake clients (at least some, depending on the BIOS) don't have those MSRs
>> available, so we use rdmsr_safe and set 'readable' to false if it errors.
>>
>> For IA32_SGXLEPUBKEYHASHn MSR read from guest, if physical MSRs are not
>> readable, guest is not allowed to read either, otherwise vcpu's virtual MSR
>> value is returned.
>>
>> For IA32_SGXLEPUBKEYHASHn MSR write from guest, we allow guest to write if both
>> physical MSRs are writable and SGX launch control is exposed to domain,
>> otherwise error is injected.
>>
>> To make EINIT run successfully in the guest, the vcpu's virtual
>> IA32_SGXLEPUBKEYHASHn will be written to the physical MSRs when the vcpu is
>> scheduled in.
>>
>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>> ---
>>   xen/arch/x86/hvm/vmx/sgx.c         | 194 +++++++++++++++++++++++++++++++++++++
>>   xen/arch/x86/hvm/vmx/vmx.c         |  24 +++++
>>   xen/include/asm-x86/cpufeature.h   |   3 +
>>   xen/include/asm-x86/hvm/vmx/sgx.h  |  22 +++++
>>   xen/include/asm-x86/hvm/vmx/vmcs.h |   2 +
>>   xen/include/asm-x86/msr-index.h    |   6 ++
>>   6 files changed, 251 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/sgx.c b/xen/arch/x86/hvm/vmx/sgx.c
>> index 14379151e8..4944e57aef 100644
>> --- a/xen/arch/x86/hvm/vmx/sgx.c
>> +++ b/xen/arch/x86/hvm/vmx/sgx.c
>> @@ -405,6 +405,200 @@ void hvm_destroy_epc(struct domain *d)
>>       hvm_reset_epc(d, true);
>>   }
>>   
>> +/* Whether IA32_SGXLEPUBKEYHASHn are physically *unlocked* by BIOS */
>> +bool_t sgx_ia32_sgxlepubkeyhash_writable(void)
>> +{
>> +    uint64_t sgx_lc_enabled = IA32_FEATURE_CONTROL_SGX_ENABLE |
>> +                              IA32_FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE |
>> +                              IA32_FEATURE_CONTROL_LOCK;
>> +    uint64_t val;
>> +
>> +    rdmsrl(MSR_IA32_FEATURE_CONTROL, val);
>> +
>> +    return (val & sgx_lc_enabled) == sgx_lc_enabled;
>> +}
>> +
>> +bool_t domain_has_sgx(struct domain *d)
>> +{
>> +    /* hvm_epc_populated(d) implies CPUID has SGX */
>> +    return hvm_epc_populated(d);
>> +}
>> +
>> +bool_t domain_has_sgx_launch_control(struct domain *d)
>> +{
>> +    struct cpuid_policy *p = d->arch.cpuid;
>> +
>> +    if ( !domain_has_sgx(d) )
>> +        return false;
>> +
>> +    /* Unnecessary but check anyway */
>> +    if ( !cpu_has_sgx_launch_control )
>> +        return false;
>> +
>> +    return !!p->feat.sgx_launch_control;
>> +}
> 
> Both of these should be d->arch.cpuid->feat.{sgx,sgx_lc} only, and not
> from having individual helpers.
> 
> The CPUID setup during host boot and domain construction should take
> care of setting everything up properly, or hiding the features from the
> guest.  The point of the work I've been doing is to prevent situations
> where the guest can see SGX but something doesn't work because of Xen
> using nested checks like this.

Thanks for the comments. Will change this to a simple check against 
d->arch.cpuid->feat.{sgx,sgx_lc}.

> 
>> +
>> +/* Digest of Intel signing key. MSR's default value after reset. */
>> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH0 0xa6053e051270b7ac
>> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH1 0x6cfbe8ba8b3b413d
>> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH2 0xc4916d99f2b3735d
>> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH3 0xd4f8c05909f9bb3b
>> +
>> +void sgx_vcpu_init(struct vcpu *v)
>> +{
>> +    struct sgx_vcpu *sgxv = to_sgx_vcpu(v);
>> +
>> +    memset(sgxv, 0, sizeof (*sgxv));
>> +
>> +    if ( sgx_ia32_sgxlepubkeyhash_writable() )
>> +    {
>> +        /*
>> +         * If physical MSRs are writable, set vcpu's default value to Intel's
>> +         * default value. For real machine, after reset, MSRs contain Intel's
>> +         * default value.
>> +         */
>> +        sgxv->ia32_sgxlepubkeyhash[0] = SGX_INTEL_DEFAULT_LEPUBKEYHASH0;
>> +        sgxv->ia32_sgxlepubkeyhash[1] = SGX_INTEL_DEFAULT_LEPUBKEYHASH1;
>> +        sgxv->ia32_sgxlepubkeyhash[2] = SGX_INTEL_DEFAULT_LEPUBKEYHASH2;
>> +        sgxv->ia32_sgxlepubkeyhash[3] = SGX_INTEL_DEFAULT_LEPUBKEYHASH3;
>> +
>> +        sgxv->readable = 1;
>> +        sgxv->writable = domain_has_sgx_launch_control(v->domain);
>> +    }
>> +    else
>> +    {
>> +        uint64_t v;
>> +        /*
>> +         * Although SDM says if SGX is present, then IA32_SGXLEPUBKEYHASHn are
>> +         * available for read, but in reality for SKYLAKE client machines,
>> +         * those MSRs are not available if SGX is present, so we cannot rely on
>> +         * cpu_has_sgx to determine whether we are able to read MSRs,
>> +         * instead, we always use rdmsr_safe.
> 
> Talking with Jun at XenSummit, I got the impression that the
> availability of these hash MSRs is based on SGX_LC, not SGX.
> 
> Furthermore, that is my reading of 41.2.2 "Intel SGX Launch Control
> Configuration", although the logic is expressed in terms of checking SGX
> before SGX_LC.

Yes, you are correct indeed. When I was writing the code I was reading 
the old SDM, which has a bug and doesn't mention SGX_LC in CPUID as a 
condition. Please see my reply to your question on whether this is an 
erratum.

We should add cpu_has_sgx_lc as an additional check, and I think we can 
use rdmsr if both SGX and SGX_LC are present (though using rdmsr_safe 
is probably still safer?).
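
The combined condition could be modeled like this (a sketch only,
treating the CPUID and FEATURE_CONTROL bits as plain booleans; the
rdmsr_safe result is kept as the final arbiter because of the Skylake
behavior noted above):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Per SDM table 35-2: reading IA32_SGXLEPUBKEYHASHn requires SGX
 * (CPUID.0x12.0:EAX[0]) and, per the corrected reading, SGX_LC
 * (CPUID.0x7.0:ECX[30]); writing additionally requires
 * FEATURE_CONTROL[17] and FEATURE_CONTROL[0] (lock) to be set.
 */
static bool hash_msrs_readable(bool sgx, bool sgx_lc, bool rdmsr_ok)
{
    /* Some Skylake clients fault despite SGX being present, so the
     * result of a safe read still decides in the end. */
    return sgx && sgx_lc && rdmsr_ok;
}

static bool hash_msrs_writable(bool readable, bool fc_locked,
                               bool fc_sgx_lc_enable)
{
    return readable && fc_locked && fc_sgx_lc_enable;
}
```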

> 
>> +         */
>> +        sgxv->readable = rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH0, v) ? 0 : 1;
>> +
>> +        if ( !sgxv->readable )
>> +            return;
>> +
>> +        rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH0, sgxv->ia32_sgxlepubkeyhash[0]);
>> +        rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH1, sgxv->ia32_sgxlepubkeyhash[1]);
>> +        rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH2, sgxv->ia32_sgxlepubkeyhash[2]);
>> +        rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH3, sgxv->ia32_sgxlepubkeyhash[3]);
>> +    }
>> +}
>> +
>> +void sgx_ctxt_switch_to(struct vcpu *v)
>> +{
>> +    struct sgx_vcpu *sgxv = to_sgx_vcpu(v);
>> +
>> +    if ( sgxv->writable && sgx_ia32_sgxlepubkeyhash_writable() )
> 
> This causes a read of FEATURE_CONTROL on every context switch path,
> which is inefficient.
> 
> Just like with CPUID policy, we will (eventually) have a generic MSR
> policy for the guest to use.  In particular, I can forsee a usecase
> where hardware has LC unlocked, but the host administrator wishes LC to
> be locked from the guests point of view.

We can remove sgx_ia32_sgxlepubkeyhash_writable, as if sgxv->writable is 
true, then sgx_ia32_sgxlepubkeyhash_writable is always true. I am not 
sure whether we should leave the guest in locked mode in most cases, but 
I think we can add a 'lewr' XL parameter to explicitly set the guest to 
unlocked mode (otherwise the guest is locked). Please see my latest 
reply to the design of Launch Control.

> 
>> +    {
>> +        wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0, sgxv->ia32_sgxlepubkeyhash[0]);
>> +        wrmsrl(MSR_IA32_SGXLEPUBKEYHASH1, sgxv->ia32_sgxlepubkeyhash[1]);
>> +        wrmsrl(MSR_IA32_SGXLEPUBKEYHASH2, sgxv->ia32_sgxlepubkeyhash[2]);
>> +        wrmsrl(MSR_IA32_SGXLEPUBKEYHASH3, sgxv->ia32_sgxlepubkeyhash[3]);
>> +    }
>> +}
>> +
>> +int sgx_msr_read_intercept(struct vcpu *v, unsigned int msr, u64 *msr_content)
>> +{
>> +    struct sgx_vcpu *sgxv = to_sgx_vcpu(v);
>> +    u64 data;
>> +    int r = 1;
>> +
>> +    if ( !domain_has_sgx(v->domain) )
>> +        return 0;
>> +
>> +    switch ( msr )
>> +    {
>> +    case MSR_IA32_FEATURE_CONTROL:
>> +        data = (IA32_FEATURE_CONTROL_LOCK |
>> +                IA32_FEATURE_CONTROL_SGX_ENABLE);
>> +        /*
>> +         * If physical IA32_SGXLEPUBKEYHASHn are writable, then we always
>> +         * allow guest to be able to change IA32_SGXLEPUBKEYHASHn at runtime.
>> +         */
>> +        if ( sgx_ia32_sgxlepubkeyhash_writable() &&
>> +                domain_has_sgx_launch_control(v->domain) )
>> +            data |= IA32_FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE;
>> +
>> +        *msr_content = data;
>> +
>> +        break;
> 
> Newline here please.

Sure.

> 
>> +    case MSR_IA32_SGXLEPUBKEYHASH0...MSR_IA32_SGXLEPUBKEYHASH3:
> 
> Spaces around ... please.  (it is only because of the #defines that this
> isn't a syntax error).

Will do.

> 
>> +        /*
>> +         * SDM 35.1 Model-Specific Registers, table 35-2.
>> +         *
>> +         * IA32_SGXLEPUBKEYHASH[0..3]:
>> +         *
>> +         * Read permitted if CPUID.0x12.0:EAX[0] = 1.
>> +         *
>> +         * In reality, MSRs may not be readable even if SGX is present, in which
>> +         * case guest is not allowed to read either.
>> +         */
>> +        if ( !sgxv->readable )
>> +        {
>> +            r = 0;
>> +            break;
>> +        }
>> +
>> +        data = sgxv->ia32_sgxlepubkeyhash[msr - MSR_IA32_SGXLEPUBKEYHASH0];
>> +
>> +        *msr_content = data;
>> +
>> +        break;
>> +    default:
>> +        r = 0;
>> +        break;
>> +    }
>> +
>> +    return r;
>> +}
>> +
>> +int sgx_msr_write_intercept(struct vcpu *v, unsigned int msr, u64 msr_content)
>> +{
>> +    struct sgx_vcpu *sgxv = to_sgx_vcpu(v);
>> +    int r = 1;
>> +
>> +    if ( !domain_has_sgx(v->domain) )
>> +        return 0;
>> +
>> +    switch ( msr )
>> +    {
>> +    case MSR_IA32_FEATURE_CONTROL:
>> +        /* silently drop */
> 
> Silently dropping is not ok.  This change needs rebasing over c/s
> 46c3acb308 where I have fixed up the writeability of FEATURE_CONTROL.

Thanks. I'll take a look and change accordingly.

> 
>> +        break;
>> +    case MSR_IA32_SGXLEPUBKEYHASH0...MSR_IA32_SGXLEPUBKEYHASH3:
>> +        /*
>> +         * SDM 35.1 Model-Specific Registers, table 35-2.
>> +         *
>> +         * IA32_SGXLEPUBKEYHASH[0..3]:
>> +         *
>> +         * - If CPUID.0x7.0:ECX[30] = 1, FEATURE_CONTROL[17] is available.
>> +         * - Write permitted if CPUID.0x12.0:EAX[0] = 1 &&
>> +         *      FEATURE_CONTROL[17] = 1 && FEATURE_CONTROL[0] = 1.
>> +         *
>> +         * sgxv->writable == 1 means sgx_ia32_sgxlepubkeyhash_writable() and
>> +         * domain_has_sgx_launch_control(d) both are true.
>> +         */
>> +        if ( !sgxv->writable )
>> +        {
>> +            r = 0;
>> +            break;
>> +        }
>> +
>> +        sgxv->ia32_sgxlepubkeyhash[msr - MSR_IA32_SGXLEPUBKEYHASH0] =
>> +            msr_content;
>> +
>> +        break;
>> +    default:
>> +        r = 0;
>> +        break;
>> +    }
>> +
>> +    return r;
>> +}
>> +
>>   static bool_t sgx_enabled_in_bios(void)
>>   {
>>       uint64_t val, sgx_enabled = IA32_FEATURE_CONTROL_SGX_ENABLE |
>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>> index 243643111d..7ee5515bdc 100644
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -470,6 +470,8 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>>       if ( v->vcpu_id == 0 )
>>           v->arch.user_regs.rax = 1;
>>   
>> +    sgx_vcpu_init(v);
>> +
>>       return 0;
>>   }
>>   
>> @@ -1048,6 +1050,9 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
>>   
>>       if ( v->domain->arch.hvm_domain.pi_ops.switch_to )
>>           v->domain->arch.hvm_domain.pi_ops.switch_to(v);
>> +
>> +    if ( domain_has_sgx(v->domain) )
>> +        sgx_ctxt_switch_to(v);
>>   }
>>   
>>   
>> @@ -2876,10 +2881,20 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>>           __vmread(GUEST_IA32_DEBUGCTL, msr_content);
>>           break;
>>       case MSR_IA32_FEATURE_CONTROL:
>> +        /* If neither SGX nor nested is supported, this MSR should not be
>> +         * touched */
>> +        if ( !sgx_msr_read_intercept(current, msr, msr_content) &&
>> +                !nvmx_msr_read_intercept(msr, msr_content) )
>> +            goto gp_fault;
> 
> Unfortunately, this logic is broken.  In the case that both SGX and VMX
> are configured, the VMX handler will clobber the values set up by the
> SGX handler.  Sergey has a VMX-policy series (v1 posted, v2 in the
> works) to start addressing some of the issues on the VMX side, but
> fundamentally, all reads like this need serving out of a single policy,
> rather than having different subsystems fighting for control of the
> values.  (The Xen MSR code is terrible for this at the moment.)

Thanks for pointing this out. I have located the vmx-policy series. I'll 
look into it to see how I should change this logic.

> 
>> +        break;
>>       case MSR_IA32_VMX_BASIC...MSR_IA32_VMX_VMFUNC:
>>           if ( !nvmx_msr_read_intercept(msr, msr_content) )
>>               goto gp_fault;
>>           break;
>> +    case MSR_IA32_SGXLEPUBKEYHASH0...MSR_IA32_SGXLEPUBKEYHASH3:
>> +        if ( !sgx_msr_read_intercept(current, msr, msr_content) )
>> +            goto gp_fault;
>> +        break;
>>       case MSR_IA32_MISC_ENABLE:
>>           rdmsrl(MSR_IA32_MISC_ENABLE, *msr_content);
>>           /* Debug Trace Store is not supported. */
>> @@ -3119,10 +3134,19 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>>           break;
>>       }
>>       case MSR_IA32_FEATURE_CONTROL:
>> +        /* See vmx_msr_read_intercept */
>> +        if ( !sgx_msr_write_intercept(current, msr, msr_content) &&
>> +                !nvmx_msr_write_intercept(msr, msr_content) )
> 
> Definitely needs a rebase.  nvmx_msr_write_intercept() has been removed.

Yeah. The code base of this series is unfortunately about 3-4 weeks old. 
Will do.

Thanks,
-Kai

> 
> ~Andrew
> 
>> +            goto gp_fault;
>> +        break;
>>       case MSR_IA32_VMX_BASIC...MSR_IA32_VMX_TRUE_ENTRY_CTLS:
>>           if ( !nvmx_msr_write_intercept(msr, msr_content) )
>>               goto gp_fault;
>>           break;
>> +    case MSR_IA32_SGXLEPUBKEYHASH0...MSR_IA32_SGXLEPUBKEYHASH3:
>> +        if ( !sgx_msr_write_intercept(current, msr, msr_content) )
>> +            goto gp_fault;
>> +        break;
>>       case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
>>       case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(7):
>>       case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
>> diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
>> index 9793f8c1c5..dfb17c4bd8 100644
>> --- a/xen/include/asm-x86/cpufeature.h
>> +++ b/xen/include/asm-x86/cpufeature.h
>> @@ -98,6 +98,9 @@
>>   #define cpu_has_smap            boot_cpu_has(X86_FEATURE_SMAP)
>>   #define cpu_has_sha             boot_cpu_has(X86_FEATURE_SHA)
>>   
>> +/* CPUID level 0x00000007:0.ecx */
>> +#define cpu_has_sgx_launch_control  boot_cpu_has(X86_FEATURE_SGX_LAUNCH_CONTROL)
>> +
>>   /* CPUID level 0x80000007.edx */
>>   #define cpu_has_itsc            boot_cpu_has(X86_FEATURE_ITSC)
>>   
>> diff --git a/xen/include/asm-x86/hvm/vmx/sgx.h b/xen/include/asm-x86/hvm/vmx/sgx.h
>> index 40f860662a..c460f61e5e 100644
>> --- a/xen/include/asm-x86/hvm/vmx/sgx.h
>> +++ b/xen/include/asm-x86/hvm/vmx/sgx.h
>> @@ -75,4 +75,26 @@ int hvm_populate_epc(struct domain *d, unsigned long epc_base_pfn,
>>   int hvm_reset_epc(struct domain *d, bool_t free_epc);
>>   void hvm_destroy_epc(struct domain *d);
>>   
>> +/* Per-vcpu SGX structure */
>> +struct sgx_vcpu {
>> +    uint64_t ia32_sgxlepubkeyhash[4];
>> +    /*
>> +     * Although SDM says if SGX is present, then IA32_SGXLEPUBKEYHASHn are
>> +     * available for read, but in reality for SKYLAKE client machines, those
>> +     * MSRs are not available if SGX is present.
>> +     */
>> +    bool_t readable;
>> +    bool_t writable;
>> +};
>> +#define to_sgx_vcpu(v)  (&(v->arch.hvm_vmx.sgx))
>> +
>> +bool_t sgx_ia32_sgxlepubkeyhash_writable(void);
>> +bool_t domain_has_sgx(struct domain *d);
>> +bool_t domain_has_sgx_launch_control(struct domain *d);
>> +
>> +void sgx_vcpu_init(struct vcpu *v);
>> +void sgx_ctxt_switch_to(struct vcpu *v);
>> +int sgx_msr_read_intercept(struct vcpu *v, unsigned int msr, u64 *msr_content);
>> +int sgx_msr_write_intercept(struct vcpu *v, unsigned int msr, u64 msr_content);
>> +
>>   #endif  /* __ASM_X86_HVM_VMX_SGX_H__ */
>> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> index 6cfa5c3310..fc0b9d85fd 100644
>> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> @@ -160,6 +160,8 @@ struct arch_vmx_struct {
>>        * pCPU and wakeup the related vCPU.
>>        */
>>       struct pi_blocking_vcpu pi_blocking;
>> +
>> +    struct sgx_vcpu sgx;
>>   };
>>   
>>   int vmx_create_vmcs(struct vcpu *v);
>> diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
>> index 771e7500af..16206a11b7 100644
>> --- a/xen/include/asm-x86/msr-index.h
>> +++ b/xen/include/asm-x86/msr-index.h
>> @@ -296,6 +296,12 @@
>>   #define IA32_FEATURE_CONTROL_SENTER_PARAM_CTL         0x7f00
>>   #define IA32_FEATURE_CONTROL_ENABLE_SENTER            0x8000
>>   #define IA32_FEATURE_CONTROL_SGX_ENABLE               0x40000
>> +#define IA32_FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE  0x20000
>> +
>> +#define MSR_IA32_SGXLEPUBKEYHASH0   0x0000008c
>> +#define MSR_IA32_SGXLEPUBKEYHASH1   0x0000008d
>> +#define MSR_IA32_SGXLEPUBKEYHASH2   0x0000008e
>> +#define MSR_IA32_SGXLEPUBKEYHASH3   0x0000008f
>>   
>>   #define MSR_IA32_TSC_ADJUST		0x0000003b
>>   
> 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 03/15] xen: x86: add early stage SGX feature detection
  2017-07-21  9:17     ` Huang, Kai
@ 2017-07-22  1:06       ` Huang, Kai
  0 siblings, 0 replies; 58+ messages in thread
From: Huang, Kai @ 2017-07-22  1:06 UTC (permalink / raw)
  To: Andrew Cooper, Kai Huang, xen-devel; +Cc: kevin.tian, jbeulich



On 7/21/2017 9:17 PM, Huang, Kai wrote:
> 
> 
> On 7/20/2017 2:23 AM, Andrew Cooper wrote:
>> On 09/07/17 09:09, Kai Huang wrote:
>>> This patch adds early stage SGX feature detection via SGX CPUID 0x12. 
>>> Function
>>> detect_sgx is added to detect SGX info on each CPU (called from 
>>> vmx_cpu_up).
>>> SDM says SGX info returned by CPUID is per-thread, and we cannot 
>>> assume all
>>> threads will return the same SGX info, so we have to detect SGX for 
>>> each CPU.
>>> For simplicity, currently SGX is only supported when all CPUs report 
>>> the same
>>> SGX info.
>>>
>>> SDM also says it's possible to have multiple EPC sections but this is 
>>> only for
>>> multiple-socket server, which we don't support now (there are other 
>>> things
>>> need to be done, ex, NUMA EPC, scheduling, etc, as well), so 
>>> currently only
>>> one EPC is supported.
>>>
>>> Dedicated files sgx.c and sgx.h are added (under vmx directory as SGX 
>>> is Intel
>>> specific) for the bulk of the above SGX detection code, and 
>>> for further
>>> SGX code as well.
>>>
>>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>>
>> I am not sure putting this under hvm/ is a sensible move.  Almost
>> everything in this patch is currently common, and I can forsee us
>> wanting to introduce PV support, so it would be good to introduce this
>> in a guest-neutral location to begin with.

Sorry, I forgot to respond to this in my last reply. I looked at the code 
again and yes, I think we can move the code to a common place. I will 
move the current sgx.c to arch/x86/sgx.c. Thanks for the comments.

>>
>>> ---
>>>   xen/arch/x86/hvm/vmx/Makefile     |   1 +
>>>   xen/arch/x86/hvm/vmx/sgx.c        | 208 
>>> ++++++++++++++++++++++++++++++++++++++
>>>   xen/arch/x86/hvm/vmx/vmcs.c       |   4 +
>>>   xen/include/asm-x86/cpufeature.h  |   1 +
>>>   xen/include/asm-x86/hvm/vmx/sgx.h |  45 +++++++++
>>>   5 files changed, 259 insertions(+)
>>>   create mode 100644 xen/arch/x86/hvm/vmx/sgx.c
>>>   create mode 100644 xen/include/asm-x86/hvm/vmx/sgx.h
>>>
>>> diff --git a/xen/arch/x86/hvm/vmx/Makefile 
>>> b/xen/arch/x86/hvm/vmx/Makefile
>>> index 04a29ce59d..f6bcf0d143 100644
>>> --- a/xen/arch/x86/hvm/vmx/Makefile
>>> +++ b/xen/arch/x86/hvm/vmx/Makefile
>>> @@ -4,3 +4,4 @@ obj-y += realmode.o
>>>   obj-y += vmcs.o
>>>   obj-y += vmx.o
>>>   obj-y += vvmx.o
>>> +obj-y += sgx.o
>>> diff --git a/xen/arch/x86/hvm/vmx/sgx.c b/xen/arch/x86/hvm/vmx/sgx.c
>>> new file mode 100644
>>> index 0000000000..6b41469371
>>> --- /dev/null
>>> +++ b/xen/arch/x86/hvm/vmx/sgx.c
>>
>> This file looks like it should be arch/x86/sgx.c, given its current 
>> content.

Will do.

>>
>>> @@ -0,0 +1,208 @@
>>> +/*
>>> + * Intel Software Guard Extensions support
>>
>> Please include a GPLv2 header.

Yes will do.

Thanks,
-Kai
>>
>>> + *
>>> + * Author: Kai Huang <kai.huang@linux.intel.com>
>>> + */
>>> +
>>> +#include <asm/cpufeature.h>
>>> +#include <asm/msr-index.h>
>>> +#include <asm/msr.h>
>>> +#include <asm/hvm/vmx/sgx.h>
>>> +#include <asm/hvm/vmx/vmcs.h>
>>> +
>>> +static struct sgx_cpuinfo __read_mostly sgx_cpudata[NR_CPUS];
>>> +static struct sgx_cpuinfo __read_mostly boot_sgx_cpudata;
>>
>> I don't think any of this is necessary.  The description says that all
>> EPCs across the server will be reported in CPUID subleaves, and our
>> implementation gives up if the data are non-identical across CPUs.
>>
>> Therefore, we only need to keep one copy of the data, and check check
>> APs against the master copy.
> 
> Right. boot_sgx_cpudata is what we need. Currently detect_sgx is called 
> from vmx_cpu_up. How about changing it to be called from identify_cpu, 
> with something like below?
> 
>      if ( c == &boot_cpu_data )
>          detect_sgx(&boot_sgx_cpudata);
>      else
>      {
>          struct sgx_cpuinfo tmp;
>
>          detect_sgx(&tmp);
>          if ( memcmp(&boot_sgx_cpudata, &tmp, sizeof(tmp)) )
>              /* disable SGX */
>      }
> 
> Thanks,
> -Kai
>>
>>
>> Let me see about splitting up a few bits of the existing CPUID
>> infrastructure, so we can use the host cpuid policy more effectively for
>> Xen related things.
>>
>> ~Andrew
>>
>>
> 



* Re: [PATCH 09/15] xen: vmx: handle SGX related MSRs
  2017-07-21  9:42     ` Huang, Kai
@ 2017-07-22  1:37       ` Huang, Kai
  0 siblings, 0 replies; 58+ messages in thread
From: Huang, Kai @ 2017-07-22  1:37 UTC (permalink / raw)
  To: Andrew Cooper, Kai Huang, xen-devel; +Cc: kevin.tian, jbeulich



On 7/21/2017 9:42 PM, Huang, Kai wrote:
> 
> 
> On 7/20/2017 5:27 AM, Andrew Cooper wrote:
>> On 09/07/17 09:09, Kai Huang wrote:
>>> This patch handles IA32_FEATURE_CONTROL and IA32_SGXLEPUBKEYHASHn MSRs.
>>>
>>> For IA32_FEATURE_CONTROL, if SGX is exposed to domain, then 
>>> SGX_ENABLE bit
>>> is always set. If SGX launch control is also exposed to domain, and 
>>> physical
>>> IA32_SGXLEPUBKEYHASHn are writable, then SGX_LAUNCH_CONTROL_ENABLE 
>>> bit is
>>> also always set. Write to IA32_FEATURE_CONTROL is ignored.
>>>
>>> For IA32_SGXLEPUBKEYHASHn, a new 'struct sgx_vcpu' is added for per-vcpu
>>> SGX state, and currently it holds the vcpu's virtual
>>> ia32_sgxlepubkeyhash[0-3]. Two booleans, 'readable' and 'writable', are
>>> also added to indicate whether the virtual IA32_SGXLEPUBKEYHASHn are
>>> readable and writable.
>>>
>>> When a vcpu is initialized, its virtual ia32_sgxlepubkeyhash values are
>>> also initialized. If the physical IA32_SGXLEPUBKEYHASHn are writable,
>>> then ia32_sgxlepubkeyhash is set to Intel's default value, as on a
>>> physical machine those MSRs hold Intel's default value after reset. If
>>> the physical MSRs are not writable (i.e. *locked* by the BIOS before
>>> handing over to Xen), then we try to read those MSRs and use the
>>> physical values as the default values for the virtual MSRs. Note that
>>> rdmsr_safe is used: although the SDM says IA32_SGXLEPUBKEYHASHn are
>>> available for read if SGX is present, in reality Skylake client
>>> machines (at least some, depending on BIOS) don't have those MSRs
>>> available, so we use rdmsr_safe and set 'readable' to false if it
>>> returns an error code.
>>> readable, guest is not allowed to read either, otherwise vcpu's 
>>> virtual MSR
>>> value is returned.
>>>
>>> For IA32_SGXLEPUBKEYHASHn MSR write from guest, we allow guest to 
>>> write if both
>>> physical MSRs are writable and SGX launch control is exposed to domain,
>>> otherwise error is injected.
>>>
>>> To make EINIT run successfully in guest, vcpu's virtual 
>>> IA32_SGXLEPUBKEYHASHn
>>> will be update to physical MSRs when vcpu is scheduled in.
>>>
>>> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
>>> ---
>>>   xen/arch/x86/hvm/vmx/sgx.c         | 194 
>>> +++++++++++++++++++++++++++++++++++++
>>>   xen/arch/x86/hvm/vmx/vmx.c         |  24 +++++
>>>   xen/include/asm-x86/cpufeature.h   |   3 +
>>>   xen/include/asm-x86/hvm/vmx/sgx.h  |  22 +++++
>>>   xen/include/asm-x86/hvm/vmx/vmcs.h |   2 +
>>>   xen/include/asm-x86/msr-index.h    |   6 ++
>>>   6 files changed, 251 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/hvm/vmx/sgx.c b/xen/arch/x86/hvm/vmx/sgx.c
>>> index 14379151e8..4944e57aef 100644
>>> --- a/xen/arch/x86/hvm/vmx/sgx.c
>>> +++ b/xen/arch/x86/hvm/vmx/sgx.c
>>> @@ -405,6 +405,200 @@ void hvm_destroy_epc(struct domain *d)
>>>       hvm_reset_epc(d, true);
>>>   }
>>> +/* Whether IA32_SGXLEPUBKEYHASHn are physically *unlocked* by BIOS */
>>> +bool_t sgx_ia32_sgxlepubkeyhash_writable(void)
>>> +{
>>> +    uint64_t sgx_lc_enabled = IA32_FEATURE_CONTROL_SGX_ENABLE |
>>> +                              
>>> IA32_FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE |
>>> +                              IA32_FEATURE_CONTROL_LOCK;
>>> +    uint64_t val;
>>> +
>>> +    rdmsrl(MSR_IA32_FEATURE_CONTROL, val);
>>> +
>>> +    return (val & sgx_lc_enabled) == sgx_lc_enabled;
>>> +}
>>> +
>>> +bool_t domain_has_sgx(struct domain *d)
>>> +{
>>> +    /* hvm_epc_populated(d) implies CPUID has SGX */
>>> +    return hvm_epc_populated(d);
>>> +}
>>> +
>>> +bool_t domain_has_sgx_launch_control(struct domain *d)
>>> +{
>>> +    struct cpuid_policy *p = d->arch.cpuid;
>>> +
>>> +    if ( !domain_has_sgx(d) )
>>> +        return false;
>>> +
>>> +    /* Unnecessary but check anyway */
>>> +    if ( !cpu_has_sgx_launch_control )
>>> +        return false;
>>> +
>>> +    return !!p->feat.sgx_launch_control;
>>> +}
>>
>> Both of these should be d->arch.cpuid->feat.{sgx,sgx_lc} only, and not
>> from having individual helpers.
>>
>> The CPUID setup during host boot and domain construction should take
>> care of setting everything up properly, or hiding the features from the
>> guest.  The point of the work I've been doing is to prevent situations
>> where the guest can see SGX but something doesn't work because of Xen
>> using nested checks like this.
> 
> Thanks for the comments. Will change to a simple check against 
> d->arch.cpuid->feat.{sgx,sgx_lc}.
> 
>>
>>> +
>>> +/* Digest of Intel signing key. MSR's default value after reset. */
>>> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH0 0xa6053e051270b7ac
>>> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH1 0x6cfbe8ba8b3b413d
>>> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH2 0xc4916d99f2b3735d
>>> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH3 0xd4f8c05909f9bb3b
>>> +
>>> +void sgx_vcpu_init(struct vcpu *v)
>>> +{
>>> +    struct sgx_vcpu *sgxv = to_sgx_vcpu(v);
>>> +
>>> +    memset(sgxv, 0, sizeof (*sgxv));
>>> +
>>> +    if ( sgx_ia32_sgxlepubkeyhash_writable() )
>>> +    {
>>> +        /*
>>> +         * If physical MSRs are writable, set vcpu's default value 
>>> to Intel's
>>> +         * default value. For real machine, after reset, MSRs 
>>> contain Intel's
>>> +         * default value.
>>> +         */
>>> +        sgxv->ia32_sgxlepubkeyhash[0] = 
>>> SGX_INTEL_DEFAULT_LEPUBKEYHASH0;
>>> +        sgxv->ia32_sgxlepubkeyhash[1] = 
>>> SGX_INTEL_DEFAULT_LEPUBKEYHASH1;
>>> +        sgxv->ia32_sgxlepubkeyhash[2] = 
>>> SGX_INTEL_DEFAULT_LEPUBKEYHASH2;
>>> +        sgxv->ia32_sgxlepubkeyhash[3] = 
>>> SGX_INTEL_DEFAULT_LEPUBKEYHASH3;
>>> +
>>> +        sgxv->readable = 1;
>>> +        sgxv->writable = domain_has_sgx_launch_control(v->domain);
>>> +    }
>>> +    else
>>> +    {
>>> +        uint64_t v;
>>> +        /*
>>> +         * Although SDM says if SGX is present, then 
>>> IA32_SGXLEPUBKEYHASHn are
>>> +         * available for read, but in reality for SKYLAKE client 
>>> machines,
>>> +         * those MSRs are not available if SGX is present, so we 
>>> cannot rely on
>>> +         * cpu_has_sgx to determine whether to we are able to read 
>>> MSRs,
>>> +         * instead, we always use rdmsr_safe.
>>
>> Talking with Jun at XenSummit, I got the impression that the
>> availability of these has MSRs is based on SGX_LC, not SGX.
>>
>> Furthermore, that is my reading of 41.2.2 "Intel SGX Launch Control
>> Configuration", although the logic is expressed in terms of checking SGX
>> before SGX_LC.
> 
> Yes, you are correct indeed. When I was writing the code I was reading 
> the old SDM, which has a bug and doesn't mention SGX_LC in CPUID as a 
> condition. Please see my reply to your question about whether this is an 
> erratum.
> 
> We should add cpu_has_sgx_lc as an additional check, and I think we can 
> use rdmsr if both SGX and SGX_LC are present (though rdmsr_safe is 
> probably still safer?).
> 
>>
>>> +         */
>>> +        sgxv->readable = rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH0, v) ? 
>>> 0 : 1;
>>> +
>>> +        if ( !sgxv->readable )
>>> +            return;
>>> +
>>> +        rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH0, 
>>> sgxv->ia32_sgxlepubkeyhash[0]);
>>> +        rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH1, 
>>> sgxv->ia32_sgxlepubkeyhash[1]);
>>> +        rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH2, 
>>> sgxv->ia32_sgxlepubkeyhash[2]);
>>> +        rdmsr_safe(MSR_IA32_SGXLEPUBKEYHASH3, 
>>> sgxv->ia32_sgxlepubkeyhash[3]);
>>> +    }
>>> +}
>>> +
>>> +void sgx_ctxt_switch_to(struct vcpu *v)
>>> +{
>>> +    struct sgx_vcpu *sgxv = to_sgx_vcpu(v);
>>> +
>>> +    if ( sgxv->writable && sgx_ia32_sgxlepubkeyhash_writable() )
>>
>> This causes a read of FEATURE_CONTROL on every context switch path,
>> which is inefficient.
>>
>> Just like with CPUID policy, we will (eventually) have a generic MSR
>> policy for the guest to use.  In particular, I can forsee a usecase
>> where hardware has LC unlocked, but the host administrator wishes LC to
>> be locked from the guests point of view.
> 
> We can remove the sgx_ia32_sgxlepubkeyhash_writable() check, as when 
> sgxv->writable is true, sgx_ia32_sgxlepubkeyhash_writable() is always 
> true as well. I am not sure whether we should leave the guest in locked 
> mode in most cases, but I think we can add a 'lewr' XL parameter to 
> explicitly put the guest in unlocked mode (otherwise the guest is 
> locked). Please see my latest reply on the design of Launch Control.

Hi Andrew,

I'd like to add something regarding the performance optimization of 
updating the MSRs. There are two optimizations that we can do:

- We can add a per_cpu variable for the MSRs and keep it always equal to 
the physical MSRs; then we only need to update the MSRs when the value 
we want to write differs from the per_cpu copy. This can avoid some 
physical MSR writes.

- Thanks to the current SGX implementation, the IA32_SGXLEPUBKEYHASHn 
MSRs are only consumed by EINIT; once EINIT is done, EGETKEY doesn't 
depend on IA32_SGXLEPUBKEYHASHn. So we can also trap the guest's EINIT 
and update the MSRs in the EINIT VMEXIT handler. However, if we trap 
EINIT, Xen needs to run EINIT on behalf of the guest, meaning Xen needs 
to remap the guest's EINIT parameters (we probably don't need to 
reconstruct the EINIT parameters in Xen, as neither SIGSTRUCT nor 
EINITTOKEN internally contains a guest virtual address that would need 
to be remapped by Xen), run EINIT, and emulate the EINIT return value 
to the guest.
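As a rough illustration of the first optimization, here is a minimal, 
self-contained C sketch. wrmsrl() is stubbed out, and all names are 
hypothetical stand-ins for the real Xen per_cpu() plumbing; in Xen the 
cache would also need to be initialised from the actual MSR values at 
boot rather than from zero:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: cache the last value written to each
 * IA32_SGXLEPUBKEYHASHn on this pCPU and skip the WRMSR when the
 * incoming vcpu's value already matches the cached copy. */

#define MSR_IA32_SGXLEPUBKEYHASH0 0x8c

static uint64_t pcpu_lepubkeyhash[4];   /* per-pCPU cached MSR values */
static unsigned int nr_msr_writes;      /* instrumentation for the sketch */

static void wrmsrl_stub(unsigned int msr, uint64_t val)
{
    (void)msr; (void)val;               /* real code: wrmsrl(msr, val) */
    nr_msr_writes++;
}

/* Called on context switch with the incoming vcpu's virtual hash. */
static void sgx_ctxt_switch_to(const uint64_t vcpu_hash[4])
{
    unsigned int i;

    for ( i = 0; i < 4; i++ )
    {
        if ( pcpu_lepubkeyhash[i] == vcpu_hash[i] )
            continue;                   /* cache hit: skip the WRMSR */
        wrmsrl_stub(MSR_IA32_SGXLEPUBKEYHASH0 + i, vcpu_hash[i]);
        pcpu_lepubkeyhash[i] = vcpu_hash[i];
    }
}
```

With this shape, switching repeatedly between vcpus that carry identical 
hash values costs no MSR writes at all.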

The first optimization is pretty straightforward, and I think we should 
do it (I'll do so in the next version). The second optimization requires 
trapping EINIT from the guest (thus a more complicated implementation) 
but can eliminate unnecessary MSR updates during context switch. Do you 
think we should do both optimizations?
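To make the trade-off of the second optimization concrete, it might be 
shaped roughly like this. This is a heavily stubbed, self-contained 
sketch: none of these names exist in Xen, and the actual guest-memory 
remapping and ENCLS[EINIT] execution are elided:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: defer the IA32_SGXLEPUBKEYHASHn updates from
 * context switch to the EINIT VMEXIT handler, so the physical MSRs
 * are only written when a guest actually runs EINIT. */

static uint64_t phys_hash[4];           /* stands in for the physical MSRs */

struct sgx_vcpu_hash { uint64_t hash[4]; };

/* With EINIT trapped, context switch no longer touches the MSRs. */
static void sgx_ctxt_switch_to(const struct sgx_vcpu_hash *v)
{
    (void)v;                            /* intentionally empty */
}

/* EINIT VMEXIT: sync the vcpu's virtual hash into the physical MSRs,
 * then run EINIT on the guest's behalf and return its error code. */
static int handle_einit_vmexit(const struct sgx_vcpu_hash *v)
{
    unsigned int i;

    for ( i = 0; i < 4; i++ )
        phys_hash[i] = v->hash[i];      /* wrmsrl() in real code */

    /* ... here the real handler would remap the guest's SIGSTRUCT and
     * EINITTOKEN, execute ENCLS[EINIT], and emulate the result back
     * into the guest's RAX; 0 stands in for success in this sketch. */
    return 0;
}
```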

Thanks,
-Kai
> 
>>
>>> +    {
>>> +        wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0, 
>>> sgxv->ia32_sgxlepubkeyhash[0]);
>>> +        wrmsrl(MSR_IA32_SGXLEPUBKEYHASH1, 
>>> sgxv->ia32_sgxlepubkeyhash[1]);
>>> +        wrmsrl(MSR_IA32_SGXLEPUBKEYHASH2, 
>>> sgxv->ia32_sgxlepubkeyhash[2]);
>>> +        wrmsrl(MSR_IA32_SGXLEPUBKEYHASH3, 
>>> sgxv->ia32_sgxlepubkeyhash[3]);
>>> +    }
>>> +}
>>> +
>>> +int sgx_msr_read_intercept(struct vcpu *v, unsigned int msr, u64 
>>> *msr_content)
>>> +{
>>> +    struct sgx_vcpu *sgxv = to_sgx_vcpu(v);
>>> +    u64 data;
>>> +    int r = 1;
>>> +
>>> +    if ( !domain_has_sgx(v->domain) )
>>> +        return 0;
>>> +
>>> +    switch ( msr )
>>> +    {
>>> +    case MSR_IA32_FEATURE_CONTROL:
>>> +        data = (IA32_FEATURE_CONTROL_LOCK |
>>> +                IA32_FEATURE_CONTROL_SGX_ENABLE);
>>> +        /*
>>> +         * If physical IA32_SGXLEPUBKEYHASHn are writable, then we 
>>> always
>>> +         * allow guest to be able to change IA32_SGXLEPUBKEYHASHn at 
>>> runtime.
>>> +         */
>>> +        if ( sgx_ia32_sgxlepubkeyhash_writable() &&
>>> +                domain_has_sgx_launch_control(v->domain) )
>>> +            data |= IA32_FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE;
>>> +
>>> +        *msr_content = data;
>>> +
>>> +        break;
>>
>> Newline here please.
> 
> Sure.
> 
>>
>>> +    case MSR_IA32_SGXLEPUBKEYHASH0...MSR_IA32_SGXLEPUBKEYHASH3:
>>
>> Spaces around ... please.  (it is only because of the #defines that this
>> isn't a syntax error).
> 
> Will do.
> 
>>
>>> +        /*
>>> +         * SDM 35.1 Model-Specific Registers, table 35-2.
>>> +         *
>>> +         * IA32_SGXLEPUBKEYHASH[0..3]:
>>> +         *
>>> +         * Read permitted if CPUID.0x12.0:EAX[0] = 1.
>>> +         *
>>> +         * In reality, MSRs may not be readable even SGX is present, 
>>> in which
>>> +         * case guest is not allowed to read either.
>>> +         */
>>> +        if ( !sgxv->readable )
>>> +        {
>>> +            r = 0;
>>> +            break;
>>> +        }
>>> +
>>> +        data = sgxv->ia32_sgxlepubkeyhash[msr - 
>>> MSR_IA32_SGXLEPUBKEYHASH0];
>>> +
>>> +        *msr_content = data;
>>> +
>>> +        break;
>>> +    default:
>>> +        r = 0;
>>> +        break;
>>> +    }
>>> +
>>> +    return r;
>>> +}
>>> +
>>> +int sgx_msr_write_intercept(struct vcpu *v, unsigned int msr, u64 
>>> msr_content)
>>> +{
>>> +    struct sgx_vcpu *sgxv = to_sgx_vcpu(v);
>>> +    int r = 1;
>>> +
>>> +    if ( !domain_has_sgx(v->domain) )
>>> +        return 0;
>>> +
>>> +    switch ( msr )
>>> +    {
>>> +    case MSR_IA32_FEATURE_CONTROL:
>>> +        /* sliently drop */
>>
>> Silently dropping is not ok.  This change needs rebasing over c/s
>> 46c3acb308 where I have fixed up the writeability of FEATURE_CONTROL.
> 
> Thanks. I'll take a look and change accordingly.
> 
>>
>>> +        break;
>>> +    case MSR_IA32_SGXLEPUBKEYHASH0...MSR_IA32_SGXLEPUBKEYHASH3:
>>> +        /*
>>> +         * SDM 35.1 Model-Specific Registers, table 35-2.
>>> +         *
>>> +         * IA32_SGXLEPUBKEYHASH[0..3]:
>>> +         *
>>> +         * - If CPUID.0x7.0:ECX[30] = 1, FEATURE_CONTROL[17] is 
>>> available.
>>> +         * - Write permitted if CPUID.0x12.0:EAX[0] = 1 &&
>>> +         *      FEATURE_CONTROL[17] = 1 && FEATURE_CONTROL[0] = 1.
>>> +         *
>>> +         * sgxv->writable == 1 means 
>>> sgx_ia32_sgxlepubkeyhash_writable() and
>>> +         * domain_has_sgx_launch_control(d) both are true.
>>> +         */
>>> +        if ( !sgxv->writable )
>>> +        {
>>> +            r = 0;
>>> +            break;
>>> +        }
>>> +
>>> +        sgxv->ia32_sgxlepubkeyhash[msr - MSR_IA32_SGXLEPUBKEYHASH0] =
>>> +            msr_content;
>>> +
>>> +        break;
>>> +    default:
>>> +        r = 0;
>>> +        break;
>>> +    }
>>> +
>>> +    return r;
>>> +}
>>> +
>>>   static bool_t sgx_enabled_in_bios(void)
>>>   {
>>>       uint64_t val, sgx_enabled = IA32_FEATURE_CONTROL_SGX_ENABLE |
>>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>>> index 243643111d..7ee5515bdc 100644
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -470,6 +470,8 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>>>       if ( v->vcpu_id == 0 )
>>>           v->arch.user_regs.rax = 1;
>>> +    sgx_vcpu_init(v);
>>> +
>>>       return 0;
>>>   }
>>> @@ -1048,6 +1050,9 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
>>>       if ( v->domain->arch.hvm_domain.pi_ops.switch_to )
>>>           v->domain->arch.hvm_domain.pi_ops.switch_to(v);
>>> +
>>> +    if ( domain_has_sgx(v->domain) )
>>> +        sgx_ctxt_switch_to(v);
>>>   }
>>> @@ -2876,10 +2881,20 @@ static int vmx_msr_read_intercept(unsigned 
>>> int msr, uint64_t *msr_content)
>>>           __vmread(GUEST_IA32_DEBUGCTL, msr_content);
>>>           break;
>>>       case MSR_IA32_FEATURE_CONTROL:
>>> +        /* If neither SGX nor nested is supported, this MSR should 
>>> not be
>>> +         * touched */
>>> +        if ( !sgx_msr_read_intercept(current, msr, msr_content) &&
>>> +                !nvmx_msr_read_intercept(msr, msr_content) )
>>> +            goto gp_fault;
>>
>> Unfortunately, this logic is broken.  In the case that both SMX and VMX
>> are configured, the VMX handler will clobber the values set up by the
>> SGX handler.  Sergey has a VMX-policy series (v1 posted, v2 in the
>> works) to start addressing some of the issues on the VMX side, but
>> fundamentally, all reads like this need serving out of a single policy,
>> rather than having different subsystems fighting for control of the
>> values.  (The Xen MSR code is terrible for this at the moment.)
> 
> Thanks for pointing this out. I have located the vmx-policy series. I'll 
> look into it to see how I should change this logic.
> 
>>
>>> +        break;
>>>       case MSR_IA32_VMX_BASIC...MSR_IA32_VMX_VMFUNC:
>>>           if ( !nvmx_msr_read_intercept(msr, msr_content) )
>>>               goto gp_fault;
>>>           break;
>>> +    case MSR_IA32_SGXLEPUBKEYHASH0...MSR_IA32_SGXLEPUBKEYHASH3:
>>> +        if ( !sgx_msr_read_intercept(current, msr, msr_content) )
>>> +            goto gp_fault;
>>> +        break;
>>>       case MSR_IA32_MISC_ENABLE:
>>>           rdmsrl(MSR_IA32_MISC_ENABLE, *msr_content);
>>>           /* Debug Trace Store is not supported. */
>>> @@ -3119,10 +3134,19 @@ static int vmx_msr_write_intercept(unsigned 
>>> int msr, uint64_t msr_content)
>>>           break;
>>>       }
>>>       case MSR_IA32_FEATURE_CONTROL:
>>> +        /* See vmx_msr_read_intercept */
>>> +        if ( !sgx_msr_write_intercept(current, msr, msr_content) &&
>>> +                !nvmx_msr_write_intercept(msr, msr_content) )
>>
>> Definitely needs a rebase.  nvmx_msr_write_intercept() has been removed.
> 
> Yeah. The code base of this series is about 3-4 weeks old, unfortunately. 
> Will do.
> 
> Thanks,
> -Kai
> 
>>
>> ~Andrew
>>
>>> +            goto gp_fault;
>>> +        break;
>>>       case MSR_IA32_VMX_BASIC...MSR_IA32_VMX_TRUE_ENTRY_CTLS:
>>>           if ( !nvmx_msr_write_intercept(msr, msr_content) )
>>>               goto gp_fault;
>>>           break;
>>> +    case MSR_IA32_SGXLEPUBKEYHASH0...MSR_IA32_SGXLEPUBKEYHASH3:
>>> +        if ( !sgx_msr_write_intercept(current, msr, msr_content) )
>>> +            goto gp_fault;
>>> +        break;
>>>       case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
>>>       case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(7):
>>>       case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
>>> diff --git a/xen/include/asm-x86/cpufeature.h 
>>> b/xen/include/asm-x86/cpufeature.h
>>> index 9793f8c1c5..dfb17c4bd8 100644
>>> --- a/xen/include/asm-x86/cpufeature.h
>>> +++ b/xen/include/asm-x86/cpufeature.h
>>> @@ -98,6 +98,9 @@
>>>   #define cpu_has_smap            boot_cpu_has(X86_FEATURE_SMAP)
>>>   #define cpu_has_sha             boot_cpu_has(X86_FEATURE_SHA)
>>> +/* CPUID level 0x00000007:0.ecx */
>>> +#define cpu_has_sgx_launch_control  
>>> boot_cpu_has(X86_FEATURE_SGX_LAUNCH_CONTROL)
>>> +
>>>   /* CPUID level 0x80000007.edx */
>>>   #define cpu_has_itsc            boot_cpu_has(X86_FEATURE_ITSC)
>>> diff --git a/xen/include/asm-x86/hvm/vmx/sgx.h 
>>> b/xen/include/asm-x86/hvm/vmx/sgx.h
>>> index 40f860662a..c460f61e5e 100644
>>> --- a/xen/include/asm-x86/hvm/vmx/sgx.h
>>> +++ b/xen/include/asm-x86/hvm/vmx/sgx.h
>>> @@ -75,4 +75,26 @@ int hvm_populate_epc(struct domain *d, unsigned 
>>> long epc_base_pfn,
>>>   int hvm_reset_epc(struct domain *d, bool_t free_epc);
>>>   void hvm_destroy_epc(struct domain *d);
>>> +/* Per-vcpu SGX structure */
>>> +struct sgx_vcpu {
>>> +    uint64_t ia32_sgxlepubkeyhash[4];
>>> +    /*
>>> +     * Although SDM says if SGX is present, then 
>>> IA32_SGXLEPUBKEYHASHn are
>>> +     * available for read, but in reality for SKYLAKE client 
>>> machines, those
>>> +     * those MSRs are not available if SGX is present.
>>> +     */
>>> +    bool_t readable;
>>> +    bool_t writable;
>>> +};
>>> +#define to_sgx_vcpu(v)  (&(v->arch.hvm_vmx.sgx))
>>> +
>>> +bool_t sgx_ia32_sgxlepubkeyhash_writable(void);
>>> +bool_t domain_has_sgx(struct domain *d);
>>> +bool_t domain_has_sgx_launch_control(struct domain *d);
>>> +
>>> +void sgx_vcpu_init(struct vcpu *v);
>>> +void sgx_ctxt_switch_to(struct vcpu *v);
>>> +int sgx_msr_read_intercept(struct vcpu *v, unsigned int msr, u64 
>>> *msr_content);
>>> +int sgx_msr_write_intercept(struct vcpu *v, unsigned int msr, u64 
>>> msr_content);
>>> +
>>>   #endif  /* __ASM_X86_HVM_VMX_SGX_H__ */
>>> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h 
>>> b/xen/include/asm-x86/hvm/vmx/vmcs.h
>>> index 6cfa5c3310..fc0b9d85fd 100644
>>> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>>> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>>> @@ -160,6 +160,8 @@ struct arch_vmx_struct {
>>>        * pCPU and wakeup the related vCPU.
>>>        */
>>>       struct pi_blocking_vcpu pi_blocking;
>>> +
>>> +    struct sgx_vcpu sgx;
>>>   };
>>>   int vmx_create_vmcs(struct vcpu *v);
>>> diff --git a/xen/include/asm-x86/msr-index.h 
>>> b/xen/include/asm-x86/msr-index.h
>>> index 771e7500af..16206a11b7 100644
>>> --- a/xen/include/asm-x86/msr-index.h
>>> +++ b/xen/include/asm-x86/msr-index.h
>>> @@ -296,6 +296,12 @@
>>>   #define IA32_FEATURE_CONTROL_SENTER_PARAM_CTL         0x7f00
>>>   #define IA32_FEATURE_CONTROL_ENABLE_SENTER            0x8000
>>>   #define IA32_FEATURE_CONTROL_SGX_ENABLE               0x40000
>>> +#define IA32_FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE  0x20000
>>> +
>>> +#define MSR_IA32_SGXLEPUBKEYHASH0   0x0000008c
>>> +#define MSR_IA32_SGXLEPUBKEYHASH1   0x0000008d
>>> +#define MSR_IA32_SGXLEPUBKEYHASH2   0x0000008e
>>> +#define MSR_IA32_SGXLEPUBKEYHASH3   0x0000008f
>>>   #define MSR_IA32_TSC_ADJUST        0x0000003b
>>
>>
>>
> 



* Re: [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches
  2017-07-18  8:22   ` Huang, Kai
@ 2017-07-28 13:40     ` Wei Liu
  2017-07-31  8:37       ` Huang, Kai
  0 siblings, 1 reply; 58+ messages in thread
From: Wei Liu @ 2017-07-28 13:40 UTC (permalink / raw)
  To: Huang, Kai
  Cc: kevin.tian, sstabellini, Wei Liu, George.Dunlap, andrew.cooper3,
	tim, xen-devel, jbeulich, Kai Huang, ian.jackson

On Tue, Jul 18, 2017 at 08:22:55PM +1200, Huang, Kai wrote:
> Hi Wei,
> 
> Thank you very much for comments. Please see my reply below.
> 
> On 7/17/2017 9:16 PM, Wei Liu wrote:
> > Hi Kai
> > 
> > Thanks for this nice write-up.
> > 
> > Some comments and questions below.
> > 
> > On Sun, Jul 09, 2017 at 08:03:10PM +1200, Kai Huang wrote:
> > > Hi all,
> > > 
> > [...]
> > > 2. SGX Virtualization Design
> > > 
> > > 2.1 High Level Toolstack Changes:
> > > 
> > > 2.1.1 New 'epc' parameter
> > > 
> > > EPC is a limited resource. In order to use EPC efficiently among all
> > > domains, the administrator should be able to specify a domain's virtual
> > > EPC size when creating the guest. And the admin also should be able to
> > > get every domain's virtual EPC size.
> > > 
> > > For this purpose, a new 'epc = <size>' parameter is added to XL configuration
> > > file. This parameter specifies guest's virtual EPC size. The EPC base address
> > > will be calculated by toolstack internally, according to guest's memory size,
> > > MMIO size, etc. 'epc' is MB in unit and any 1MB aligned value will be accepted.
> > > 
> > > 2.1.2 New XL commands (?)
> > > 
> > > Administrator should be able to get physical EPC size, and all domain's virtual
> > > EPC size. For this purpose, we can introduce 2 additional commands:
> > > 
> > >      # xl sgxinfo
> > > 
> > > Which will print out physical EPC size, and other SGX info (such as SGX1, SGX2,
> > > etc) if necessary.
> > > 
> > >      # xl sgxlist <did>
> > > 
> > > Which will print out particular domain's virtual EPC size, or list all virtual
> > > EPC sizes for all supported domains.
> > > 
> > > Alternatively, we can also extend existing XL commands by adding new option
> > > 
> > >      # xl info -sgx
> > > 
> > > Which will print out physical EPC size along with other physinfo. And
> > > 
> > >      # xl list <did> -sgx
> > > 
> > > Which will print out domain's virtual EPC size.
> > > 
> > > Comments?
> > > 
> > 
> > Can a guest have multiple EPC? If so, the proposed parameter is not good
> > enough.
> 
> According to the SDM a machine may have multiple EPC sections, but "may
> have" doesn't mean it must have them. EPC is typically reserved by the
> BIOS as Processor Reserved Memory (PRM), and in my understanding, a client
> machine doesn't need to have multiple EPC sections. Currently, I don't see
> why we need to expose multiple EPC sections to the guest. Even if the
> physical machine reports multiple EPC sections, exposing one EPC to the
> guest is enough. Currently SGX should not be supported simultaneously with
> virtual NUMA for a single domain.
> 

When you say "is enough", do you mean Intel doesn't recommend users to
use more than one? From reading this doc, I don't think it technically
precludes using more than one.

> > 
> > Can a guest with EPC enabled be migrated? The answer to this question
> > can lead to multiple other questions.
> 
> See the last section of my design. I saw you've already seen it. :)
> 
> > 
> > Another question, is EPC going to be backed by normal memory? This is
> > related to memory accounting of the guest.
> 
> Although the SDM says EPC is typically allocated by the BIOS as PRM, I
> think we can just treat EPC as PRM, so I believe yes, EPC is physically
> backed by normal memory. But EPC is reported as reserved memory in the
> e820 table.
> 
> > 
> > Is EPC going to be modeled as a device or another type of memory? This
> > is related to how we manage it in the toolstack.
> 
> I think we'd better treat EPC as another type of memory. I am not sure
> whether it should be modeled as a device; on a real machine, EPC is also
> exposed in the ACPI table via an "INT0E0C" device under \_SB (however it
> is certainly not modeled as a PCIe device).
> 
> > 
> > Finally why do you not allow the users to specify the base address?
> 
> I don't see any reason why the user needs to specify the base address. If
> we do, what address would they specify? On a real machine, the BIOS sets
> the base address, and for a VM, I think the toolstack/Xen should do this.

We can expose an option for users to control that if they want, and at
the same time provide the logic to calculate the base address
internally. I'm not sure if that's going to be very useful, but I'm not
convinced it is entirely useless either.

Thinking about it a bit more, we can always extend the syntax and API to
support that if need be, so I'm fine with not providing such a mechanism
at this early stage.



* Re: [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches
  2017-07-28 13:40     ` Wei Liu
@ 2017-07-31  8:37       ` Huang, Kai
  0 siblings, 0 replies; 58+ messages in thread
From: Huang, Kai @ 2017-07-31  8:37 UTC (permalink / raw)
  To: Wei Liu
  Cc: kevin.tian, sstabellini, George.Dunlap, andrew.cooper3, tim,
	xen-devel, jbeulich, Kai Huang, ian.jackson

Hi Wei,

Thanks for your comments. Please see my reply below.

On 7/29/2017 1:40 AM, Wei Liu wrote:
> On Tue, Jul 18, 2017 at 08:22:55PM +1200, Huang, Kai wrote:
>> Hi Wei,
>>
>> Thank you very much for comments. Please see my reply below.
>>
>> On 7/17/2017 9:16 PM, Wei Liu wrote:
>>> Hi Kai
>>>
>>> Thanks for this nice write-up.
>>>
>>> Some comments and questions below.
>>>
>>> On Sun, Jul 09, 2017 at 08:03:10PM +1200, Kai Huang wrote:
>>>> Hi all,
>>>>
>>> [...]
>>>> 2. SGX Virtualization Design
>>>>
>>>> 2.1 High Level Toolstack Changes:
>>>>
>>>> 2.1.1 New 'epc' parameter
>>>>
>>>> EPC is a limited resource. In order to use EPC efficiently among all
>>>> domains, the administrator should be able to specify a domain's virtual
>>>> EPC size when creating the guest. And the admin also should be able to
>>>> get every domain's virtual EPC size.
>>>>
>>>> For this purpose, a new 'epc = <size>' parameter is added to XL configuration
>>>> file. This parameter specifies guest's virtual EPC size. The EPC base address
>>>> will be calculated by toolstack internally, according to guest's memory size,
>>>> MMIO size, etc. 'epc' is MB in unit and any 1MB aligned value will be accepted.
>>>>
>>>> 2.1.2 New XL commands (?)
>>>>
>>>> Administrator should be able to get physical EPC size, and all domain's virtual
>>>> EPC size. For this purpose, we can introduce 2 additional commands:
>>>>
>>>>       # xl sgxinfo
>>>>
>>>> Which will print out physical EPC size, and other SGX info (such as SGX1, SGX2,
>>>> etc) if necessary.
>>>>
>>>>       # xl sgxlist <did>
>>>>
>>>> Which will print out particular domain's virtual EPC size, or list all virtual
>>>> EPC sizes for all supported domains.
>>>>
>>>> Alternatively, we can also extend existing XL commands by adding new option
>>>>
>>>>       # xl info -sgx
>>>>
>>>> Which will print out physical EPC size along with other physinfo. And
>>>>
>>>>       # xl list <did> -sgx
>>>>
>>>> Which will print out domain's virtual EPC size.
>>>>
>>>> Comments?
>>>>
>>>
>>> Can a guest have multiple EPC? If so, the proposed parameter is not good
>>> enough.
>>
>> According to the SDM a machine may have multiple EPC sections, but "may
>> have" doesn't mean "must have". EPC is typically reserved by the BIOS as
>> Processor Reserved Memory (PRM), and in my understanding a client machine
>> doesn't need multiple EPC sections. Currently I don't see why we need to
>> expose multiple EPC sections to the guest; even if the physical machine
>> reports multiple EPC sections, exposing one to the guest is enough.
>> Currently SGX should not be supported simultaneously with virtual NUMA
>> for a single domain.
>>
> 
> When you say "is enough", do you mean Intel doesn't recommend users to
> use more than one? I don't think this doc technically precludes using
> more than one.

No, I don't think Intel would make such a recommendation. On real hardware 
it's possible there are multiple EPC sections, but on a client or 
single-socket server machine there will typically be only one EPC. For a 
VM, I don't see any benefit in exposing multiple EPCs to the guest, except 
for the vNUMA case. My thinking is that although the SDM doesn't preclude 
using more than one EPC, for a VM there's no need to use more than one.

> 
>>>
>>> Can a guest with EPC enabled be migrated? The answer to this question
>>> can lead to multiple other questions.
>>
>> See the last section of my design. I saw you've already seen it. :)
>>
>>>
>>> Another question, is EPC going to be backed by normal memory? This is
>>> related to memory accounting of the guest.
>>
>> The SDM says EPC is typically allocated by the BIOS as PRM, so I believe
>> yes, physically EPC is backed by normal memory. But EPC is reported as
>> reserved memory in the e820 table.
>>
>>>
>>> Is EPC going to be modeled as a device or another type of memory? This
>>> is related to how we manage it in the toolstack.
>>
>> I think we'd better treat EPC as another type of memory. I am not sure
>> whether it should be modeled as a device; on a real machine, EPC is also
>> exposed in the ACPI table via an "INT0E0C" device under \_SB (however, it
>> is certainly not modeled as a PCIe device).
>>
>>>
>>> Finally why do you not allow the users to specify the base address?
>>
>> I don't see any reason why the user needs to specify the base address. If
>> we do, then what address should be specified? On a real machine, the BIOS
>> sets the base address, and for a VM, I think the toolstack/Xen should do
>> this.
> 
> We can expose an option for the user to control that if they want to, and
> at the same time provide the logic to calculate the base address
> internally. I'm not sure that's going to be very useful, but I'm not
> convinced it is entirely useless either.
> 
> Thinking a bit more we can always extend the syntax and API to support
> that if need be, so I'm fine with not providing such mechanism at early
> stage.

Yeah, I think we can extend it if needed in the future. Thanks Wei.

Thanks,
-Kai

> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


end of thread, other threads:[~2017-07-31  8:37 UTC | newest]

Thread overview: 58+ messages
2017-07-09  8:03 [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Kai Huang
2017-07-09  8:04 ` [PATCH 01/15] xen: x86: expose SGX to HVM domain in CPU featureset Kai Huang
2017-07-12 11:09   ` Andrew Cooper
2017-07-17  6:20     ` Huang, Kai
2017-07-18 10:12   ` Andrew Cooper
2017-07-18 22:41     ` Huang, Kai
2017-07-09  8:09 ` [PATCH 02/15] xen: vmx: detect ENCLS VMEXIT Kai Huang
2017-07-12 11:11   ` Andrew Cooper
2017-07-12 18:54     ` Jan Beulich
2017-07-13  4:57       ` Huang, Kai
2017-07-09  8:09 ` [PATCH 03/15] xen: x86: add early stage SGX feature detection Kai Huang
2017-07-19 14:23   ` Andrew Cooper
2017-07-21  9:17     ` Huang, Kai
2017-07-22  1:06       ` Huang, Kai
2017-07-09  8:09 ` [PATCH 06/15] xen: x86: add SGX basic EPC management Kai Huang
2017-07-09  8:09 ` [PATCH 07/15] xen: x86: add functions to populate and destroy EPC for domain Kai Huang
2017-07-09  8:09 ` [PATCH 09/15] xen: vmx: handle SGX related MSRs Kai Huang
2017-07-19 17:27   ` Andrew Cooper
2017-07-21  9:42     ` Huang, Kai
2017-07-22  1:37       ` Huang, Kai
2017-07-09  8:09 ` [PATCH 10/15] xen: vmx: handle ENCLS VMEXIT Kai Huang
2017-07-09  8:09 ` [PATCH 11/15] xen: vmx: handle VMEXIT from SGX enclave Kai Huang
2017-07-09  8:09 ` [PATCH 12/15] xen: x86: reset EPC when guest got suspended Kai Huang
2017-07-09  8:10 ` [PATCH 04/15] xen: mm: add ioremap_cache Kai Huang
2017-07-11 20:14   ` Julien Grall
2017-07-12  1:52     ` Huang, Kai
2017-07-12  7:13       ` Julien Grall
2017-07-13  5:01         ` Huang, Kai
2017-07-12  6:17     ` Jan Beulich
2017-07-13  4:59       ` Huang, Kai
2017-07-09  8:10 ` [PATCH 08/15] xen: x86: add SGX cpuid handling support Kai Huang
2017-07-12 10:56   ` Andrew Cooper
2017-07-13  5:42     ` Huang, Kai
2017-07-14  7:37       ` Andrew Cooper
2017-07-14 11:08         ` Jan Beulich
2017-07-17  6:16         ` Huang, Kai
2017-07-09  8:12 ` [PATCH 05/15] xen: p2m: new 'p2m_epc' type for EPC mapping Kai Huang
2017-07-12 11:01   ` Andrew Cooper
2017-07-12 12:21     ` George Dunlap
2017-07-13  5:56       ` Huang, Kai
2017-07-09  8:14 ` [PATCH 13/15] xen: tools: add new 'epc' parameter support Kai Huang
2017-07-09  8:15 ` [PATCH 14/15] xen: tools: add SGX to applying CPUID policy Kai Huang
2017-07-09  8:16 ` [PATCH 15/15] xen: tools: expose EPC in ACPI table Kai Huang
2017-07-12 11:05   ` Andrew Cooper
2017-07-13  8:23     ` Huang, Kai
2017-07-14 11:31   ` Jan Beulich
2017-07-17  6:11     ` Huang, Kai
2017-07-17 10:54   ` Roger Pau Monné
2017-07-18  8:36     ` Huang, Kai
2017-07-18 10:21       ` Roger Pau Monné
2017-07-18 22:44         ` Huang, Kai
2017-07-11 14:13 ` [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches Andrew Cooper
2017-07-17  6:08   ` Huang, Kai
2017-07-21  9:04     ` Huang, Kai
2017-07-17  9:16 ` Wei Liu
2017-07-18  8:22   ` Huang, Kai
2017-07-28 13:40     ` Wei Liu
2017-07-31  8:37       ` Huang, Kai
