From: Christopher Clark <christopher.w.clark@gmail.com>
To: xen-devel@lists.xenproject.org
Cc: "Daniel P. Smith" <dpsmith@apertussolutions.com>,
	andrew.cooper3@citrix.com, stefano.stabellini@xilinx.com,
	jgrall@amazon.com, Julien.grall.oss@gmail.com,
	iwj@xenproject.org, wl@xen.org, george.dunlap@citrix.com,
	jbeulich@suse.com, persaur@gmail.com, Bertrand.Marquis@arm.com,
	roger.pau@citrix.com, luca.fancellu@arm.com, paul@xen.org,
	adam.schwalm@starlab.io, scott.davis@starlab.io,
	Christopher Clark <christopher.clark@starlab.io>
Subject: [PATCH v4 1/2] docs/designs/launch: Hyperlaunch design document
Date: Thu, 13 May 2021 20:41:00 -0700
Message-ID: <20210514034101.3683-2-christopher.w.clark@gmail.com>
In-Reply-To: <20210514034101.3683-1-christopher.w.clark@gmail.com>

From: "Daniel P. Smith" <dpsmith@apertussolutions.com>

Adds a design document for Hyperlaunch, formerly DomB mode of dom0less.

Signed-off-by: Christopher Clark <christopher.clark@starlab.io>
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Reviewed-by: Rich Persaud <rp@stacktrust.org>

Changes since v3:
* Rename the Landscape table
* Changed Crash Domain to Recovery Domain
  * amended text to indicate that this will be new rather than existing Xen
  * including update to the configuration, permission, function table
* Add definitions for “recovery domain” and “crash environment”, describing
  the different functionalities
  * some design issues deferred
* Added section to explain the motivations for the separation between VM
  creation (by the hypervisor) and VM configuration (by the boot domain)
* Adjusted the description of the current process for creating a domain
* Added recommendation for UEFI boot to use GRUB.efi to load via multiboot2
* Added Document Structure section
* Added section on Communication of Domain Configuration

 docs/designs/launch/hyperlaunch.rst | 1004 +++++++++++++++++++++++++++
 1 file changed, 1004 insertions(+)
 create mode 100644 docs/designs/launch/hyperlaunch.rst

diff --git a/docs/designs/launch/hyperlaunch.rst b/docs/designs/launch/hyperlaunch.rst
new file mode 100644
index 0000000000..30fce8c9c3
--- /dev/null
+++ b/docs/designs/launch/hyperlaunch.rst
@@ -0,0 +1,1004 @@
+Hyperlaunch Design Document
+.. sectnum:: :depth: 4
+This post is a Request for Comment on the included v4 of a design document that
+describes Hyperlaunch: a new method of launching the Xen hypervisor, relating
+to dom0less and work from the Hyperlaunch project. We invite discussion of this
+on this list, at the monthly Xen Community Calls, and at dedicated meetings on
+this topic in the Xen Working Group which will be announced in advance on the
+Xen Development mailing list.
+.. contents:: :depth: 3
+This document describes the design and motivation for the funded development of
+a new, flexible system for launching the Xen hypervisor and virtual machines
+named: "Hyperlaunch".
+The design enables seamless transition for existing systems that require a
+dom0, and provides a new general capability to build and launch alternative
+configurations of virtual machines, including support for static partitioning
+and accelerated start of VMs during host boot, while adhering to the principles
+of least privilege. It incorporates the existing dom0less functionality,
+extended to fold in the new developments from the Hyperlaunch project, with
+support for both x86 and Arm platform architectures, building upon and
+replacing the earlier 'late hardware domain' feature for disaggregation of
+dom0.
+Hyperlaunch is designed to be flexible and reusable across multiple use cases,
+and our aim is to ensure that it is capable, widely exercised, comprehensively
+tested, and well understood by the Xen community.
+Document Structure
+This is the primary design document for Hyperlaunch, to provide an overview of
+the feature. Separate additional documents will cover specific aspects of
+Hyperlaunch in further detail, including:
+  - The Device Tree specification for Hyperlaunch metadata
+  - New Domain Roles for Xen and the Xen Security Modules (XSM) policy
+  - Passthrough of PCI devices with Hyperlaunch
+Born out of improving support for Dynamic Root of Trust for Measurement (DRTM),
+the Hyperlaunch project is focused on restructuring the system launch of Xen.
+The Hyperlaunch design provides a security architecture that builds on the
+principles of Least Privilege and Strong Isolation, achieving this through the
+disaggregation of system functions. It enables this with the introduction of a
+boot domain that works in conjunction with the hypervisor to provide the
+ability to launch multiple domains as part of host boot while maintaining a
+least privilege implementation.
+While the Hyperlaunch project inception was and continues to be driven by a
+focus on security through disaggregation, there are multiple use cases with a
+non-security focus that require or benefit from the ability to launch multiple
+domains at host boot. This was proven by the need that drove the implementation
+of the dom0less capability in the Arm branch of Xen.
+Hyperlaunch is designed to be flexible and reusable across multiple use cases,
+and our aim is to ensure that it is capable, widely exercised, comprehensively
+tested, and provides a robust foundation for current and emerging system launch
+requirements of the Xen community.
+* In general strive to maintain compatibility with existing Xen behavior
+* A default build of the hypervisor should be capable of booting both legacy-compatible and new styles of launch:
+        * classic Xen boot: starting a single, privileged Dom0
+        * classic Xen boot with late hardware domain: starting a Dom0 that transitions hardware access/control to another domain
+        * a dom0less boot: starting multiple domains without privilege assignment controls
+        * Hyperlaunch: starting one or more VMs, with flexible configuration
+* Preferred that it be managed via KCONFIG options to govern inclusion of support for each style
+* The selection between classic boot and Hyperlaunch boot should be automatic
+        * Preferred that it not require a kernel command line parameter for selection
+* It should not require modification to boot loaders
+* It should provide a user friendly interface for its configuration and management
+* It must provide a method for building systems that fall back to console access in the event of misconfiguration
+* It should be able to boot an x86 Xen environment without the need for a Dom0 domain
+Requirements and Design
+Hyperlaunch is defined as the ability of a hypervisor to construct and start
+one or more virtual machines at system launch in a specific way. A hypervisor
+can support one or both modes of configuration, Hyperlaunch Static and
+Hyperlaunch Dynamic. The Hyperlaunch Static mode functions as a static
+partitioning hypervisor ensuring only the virtual machines started at system
+launch are running on the system. The Hyperlaunch Dynamic mode functions as a
+dynamic hypervisor allowing for additional virtual machines to be started after
+the initial virtual machines have started. The Xen hypervisor is capable of
+both modes of configuration from the same binary and when paired with its XSM
+flask, provides strong controls that enable fine grained system partitioning.
+Hypervisor Launch Landscape
+This comparison table presents the distinctive capabilities of Hyperlaunch with
+reference to existing launch configurations currently available in Xen and
+other hypervisors.
+ +---------------+-----------+------------+-----------+-------------+---------------------+
+ | **Xen Dom0**  | **Linux** | **Late**   | **Jail**  | **Xen**     | **Xen Hyperlaunch** |
+ | **(Classic)** | **KVM**   | **HW Dom** | **house** | **dom0less**+---------+-----------+
+ |               |           |            |           |             | Static  | Dynamic   |
+ +===============+===========+============+===========+=============+=========+===========+
+ | Hypervisor able to launch multiple VMs during host boot                                |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |     Y     |       Y     |    Y    |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Hypervisor supports Static Partitioning                                                |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |     Y     |       Y     |    Y    |           |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Able to launch VMs dynamically after host boot                                         |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |       Y       |     Y     |      Y*    |     Y     |       Y*    |         |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Supports strong isolation between all VMs started at host boot                         |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |     Y     |       Y     |    Y    |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Enables flexible sequencing of VM start during host boot                               |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |           |             |    Y    |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Prevent all-powerful static root domain being launched at boot                         |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |           |       Y*    |    Y    |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Operates without a Highly-privileged management VM (eg. Dom0)                          |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |      Y*    |           |       Y*    |    Y    |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Operates without a privileged toolstack VM (Control Domain)                            |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |           |       Y*    |    Y    |           |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Extensible VM configuration applied before launch of VMs at host boot                  |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |           |             |    Y    |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Flexible granular assignment of permissions and functions to VMs                       |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |           |             |    Y    |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | Supports extensible VM measurement architecture for DRTM and attestation               |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |           |             |    Y    |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ | PCI passthrough configured at host boot                                                |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+ |               |           |            |           |             |    Y    |     Y     |
+ +---------------+-----------+------------+-----------+-------------+---------+-----------+
+Domain Construction
+An important aspect of the Hyperlaunch architecture is that the hypervisor
+performs domain construction for all the Initial Domains, i.e. it builds each
+domain that is described in the Launch Control Module. More specifically, the
+hypervisor will perform the function of *domain creation* for each Initial
+Domain: it allocates the unique domain identifier assigned to the virtual
+machine and records essential metadata about it in the internal data structure
+that enables scheduling the domain to run. It will also perform *basic domain
+construction*: build the initial page tables with data from the kernel and
+initial ramdisk supplied, and as appropriate for the domain type, populate the
+p2m table and ACPI tables.
+Subsequent to this, the boot domain can apply additional configuration to the
+initial domains from the data in the LCM, in *extended domain construction*.
+The benefits of this structure include:
+* Security: Constrains the permissions required by the boot domain: it does not
+  require the capability to create domains in this structure. This aligns with
+  the principles of least privilege.
+* Flexibility: Enables policy-based dynamic assignment of hardware by the boot
+  domain, customizable according to use-case and able to adapt to hardware
+  discovery.
+* Compatibility: Supports reuse of familiar tools with use-case customized boot
+  domains.
+* Commonality: Reuses the same logic for initial basic domain building across
+  diverse Xen deployments.
+	* It aligns the x86 initial domain construction with the existing Arm
+	  dom0less feature for construction of multiple domains at boot.
+	* The boot domain implementation may vary significantly with different
+	  deployment use cases, whereas the hypervisor implementation is
+	  common.
+* Correctness: Increases confidence in the implementation of domain
+  construction, since it is performed by the hypervisor in well maintained and
+  centrally tested logic.
+* Performance: Enables launch for configurations where a fast start of
+  multiple domains at boot is a requirement.
+* Capability: Supports launch of advanced configurations where a sequenced
+  start of multiple domains is required, or multiple domains are involved in
+  startup of the running system configuration.
+	* e.g. for PCI passthrough on systems where the toolstack runs in a
+	  separate domain to the hardware management.
+Please see the ‘Hyperlaunch Device Tree’ design document, which describes the
+configuration module that is provided to the hypervisor by the bootloader.
+The hypervisor determines how these domains are started as host boot completes:
+in some systems the Boot Domain acts upon the extended boot configuration
+supplied as part of launch, performing configuration tasks for preparing the
+other domains for the hypervisor to commence running them.
+Common Boot Configurations
+When looking across those that have expressed interest or discussed a need for
+launching multiple domains at host boot, the Hyperlaunch approach is to provide
+the means to start nearly any combination of domains. Below is an enumerated
+selection of common boot configurations for reference in the following section. 
+Dynamic Launch with a Highly-Privileged Domain 0
+Hyperlaunch Classic: Dom0
+        This configuration mimics the classic Xen start and domain construction
+        where a single domain is constructed with all privileges and functions for
+        managing hardware and running virtualization toolstack software.
+Hyperlaunch Classic: Extended Launch Dom0
+        This configuration is where a Dom0 is started via a Boot Domain that runs
+        first. This is for cases where some preprocessing in a less privileged domain
+        is required before starting the all-privileged Domain 0.
+Hyperlaunch Classic: Basic Cloud
+        This configuration constructs a Dom0 that is started in parallel with some
+        number of workload domains.
+Hyperlaunch Classic: Cloud
+        This configuration builds a Dom0 and some number of workload domains, launched
+        via a Boot Domain that runs first.
+Static Launch Configurations: without a Domain 0 or a Control Domain
+Hyperlaunch Static: Basic
+        Simple static partitioning where all domains that can be run on this system are
+        built and started during host boot and where no domain is started with the
+        Control Domain permissions, thus making it not possible to create/start any
+        further new domains.
+Hyperlaunch Static: Standard
+        This is a variation of the “Hyperlaunch Static: Basic” static partitioning
+        configuration with the introduction of a Boot Domain. This configuration allows
+        for use of a Boot Domain to be able to apply extended configuration
+        to the Initial Domains before they are started and
+        sequence the order in which they start.
+Hyperlaunch Static: Disaggregated
+        This is a variation of the “Hyperlaunch Static: Standard” configuration with
+        the introduction of a Boot Domain and an illustration that some functions can
+        be disaggregated to dedicated domains.
+Dynamic Launch of Disaggregated System Configurations
+Hyperlaunch Dynamic: Hardware Domain
+        This configuration mimics the existing Xen feature late hardware domain with
+        the one difference being that the hardware domain is constructed by the
+        hypervisor at startup instead of later by Dom0.
+Hyperlaunch Dynamic: Flexible Disaggregation
+        This configuration is similar to the “Hyperlaunch Classic: Dom0” configuration
+        except that it includes starting a separate hardware domain during Xen startup.
+        It is also similar to “Hyperlaunch Dynamic: Hardware Domain” configuration, but
+        it launches via a Boot Domain that runs first.
+Hyperlaunch Dynamic: Full Disaggregation
+        This configuration demonstrates that it is possible to start a fully
+        disaggregated system: the virtualization toolstack runs in a Control Domain,
+        separate from the domains responsible for managing hardware, XenStore, the Xen
+        Console and Crash functions, each launched via a Boot Domain.
+Example Use Cases and Configurations
+The following example use cases can be matched to configurations listed in the
+previous section.
+Use case: Modern cloud hypervisor
+**Option:** Hyperlaunch Classic: Cloud
+This configuration will support strong isolation for virtual TPM domains and
+measured launch in support of attestation to infrastructure management, while
+allowing the use of existing Dom0 virtualization toolstack software.
+Use case: Edge device with security or safety requirements
+**Option:** Hyperlaunch Static: Standard
+This configuration runs without requiring a highly-privileged Dom0, and enables
+extended VM configuration to be applied to the Initial VMs prior to launching
+them, optionally in a sequenced start.
+Use case: Client hypervisor
+**Option:** Hyperlaunch Dynamic: Flexible Disaggregation
+**Option:** Hyperlaunch Dynamic: Full Disaggregation
+These configurations enable dynamic client workloads, strong isolation for the
+domain running the virtualization toolstack software and each domain managing
+hardware, with PCI passthrough performed during host boot and support for
+measured launch.
+Hyperlaunch Disaggregated Launch
+Existing in Xen today are two primary permissions, *control domain* and
+*hardware domain*, and two functions, *console domain* and *xenstore domain*,
+that can be assigned to a domain. Traditionally all of these permissions and
+functions are assigned to Dom0 at start and can then be delegated to other
+domains created by the toolstack in Dom0. With Hyperlaunch it becomes possible
+to assign these permissions and functions to any domain for which there is a
+definition provided at startup.
+Additionally, two further functions are introduced: the *recovery domain*,
+intended to assist with recovery from failures encountered starting VMs during
+host boot, and the *boot domain*, for performing aspects of domain construction
+during startup.
+Supporting the booting of each of the above common boot configurations is
+accomplished by considering the set of initial domains and the assignment of
+Xen’s permissions and functions, including the ones introduced by Hyperlaunch,
+to these domains. A discussion of these will be covered later but for now they
+are laid out in a table with a mapping to the common boot configurations. This
+table is not intended to be an exhaustive list of configurations and does not
+account for flask policy specified functions that are use case specific.
+In the table each number represents a separate domain being
+constructed by the Hyperlaunch construction path as Xen starts, and the
+designator, ``{n}`` signifies that there may be “n” additional domains that may
+be constructed that do not have any special role for a general Xen system.
+ +-------------------+------------------+-----------------------------------+
+ | Configuration     |    Permission    |            Function               |
+ |                   +------+------+----+------+--------+--------+----------+
+ |                   | None | Ctrl | HW | Boot |Recovery| Console| Xenstore |
+ +===================+======+======+====+======+========+========+==========+
+ | Classic: Dom0     |      |  0   | 0  |      |   0    |   0    |    0     |
+ +-------------------+------+------+----+------+--------+--------+----------+
+ | Classic: Extended |      |  1   | 1  |  0   |   1    |   1    |    1     |
+ | Launch Dom0       |      |      |    |      |        |        |          |
+ +-------------------+------+------+----+------+--------+--------+----------+
+ | Classic:          | {n}  |  0   | 0  |      |   0    |   0    |    0     |
+ | Basic Cloud       |      |      |    |      |        |        |          |
+ +-------------------+------+------+----+------+--------+--------+----------+
+ | Classic: Cloud    | {n}  |  1   | 1  |  0   |   1    |   1    |    1     |
+ +-------------------+------+------+----+------+--------+--------+----------+
+ | Static: Basic     | {n}  |      | 0  |      |   0    |   0    |    0     |
+ +-------------------+------+------+----+------+--------+--------+----------+
+ | Static: Standard  | {n}  |      | 1  |  0   |   1    |   1    |    1     |
+ +-------------------+------+------+----+------+--------+--------+----------+
+ | Static:           | {n}  |      | 2  |  0   |   3    |   4    |    1     |
+ | Disaggregated     |      |      |    |      |        |        |          |
+ +-------------------+------+------+----+------+--------+--------+----------+
+ | Dynamic:          |      |  0   | 1  |      |   0    |   0    |    0     |
+ | Hardware Domain   |      |      |    |      |        |        |          |
+ +-------------------+------+------+----+------+--------+--------+----------+
+ | Dynamic: Flexible | {n}  |  1   | 2  |  0   |   1    |   1    |    1     |
+ | Disaggregation    |      |      |    |      |        |        |          |
+ +-------------------+------+------+----+------+--------+--------+----------+
+ | Dynamic: Full     | {n}  |  2   | 3  |  0   |   4    |   5    |    1     |
+ | Disaggregation    |      |      |    |      |        |        |          |
+ +-------------------+------+------+----+------+--------+--------+----------+
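+As an illustration only, the table above can be read as a mapping from domain
+number to a set of roles. The minimal Python sketch below transcribes three of
+its rows; the names ``Role``, ``CONFIGS``, ``is_static`` and
+``has_boot_domain`` are hypothetical and are not part of Xen or of the
+Hyperlaunch Device Tree specification. The unprivileged ``{n}`` domains are
+omitted.

```python
from enum import Flag, auto


class Role(Flag):
    """Permissions and functions from the table (illustrative names only)."""
    NONE = 0
    CTRL = auto()      # control domain permission
    HW = auto()        # hardware domain permission
    BOOT = auto()      # boot domain function
    RECOVERY = auto()  # recovery domain function
    CONSOLE = auto()   # console domain function
    XENSTORE = auto()  # xenstore domain function


# Each configuration maps a domain number to its roles, transcribed from
# three rows of the table. Unprivileged {n} workload domains are omitted.
CONFIGS = {
    "Classic: Dom0": {
        0: Role.CTRL | Role.HW | Role.RECOVERY | Role.CONSOLE | Role.XENSTORE,
    },
    "Static: Standard": {
        0: Role.BOOT,
        1: Role.HW | Role.RECOVERY | Role.CONSOLE | Role.XENSTORE,
    },
    "Dynamic: Full Disaggregation": {
        0: Role.BOOT,
        1: Role.XENSTORE,
        2: Role.CTRL,
        3: Role.HW,
        4: Role.RECOVERY,
        5: Role.CONSOLE,
    },
}


def is_static(config):
    """Static launch: no domain is started with the control permission."""
    return not any(Role.CTRL in roles for roles in config.values())


def has_boot_domain(config):
    """True if some domain in the configuration has the boot function."""
    return any(Role.BOOT in roles for roles in config.values())
```

+Under this reading, a configuration is static exactly when its Ctrl column is
+empty, which matches the description of the Static configurations as unable to
+create or start further domains.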
+Overview of Hyperlaunch Flow
+Before delving into Hyperlaunch, a good basis to start with is an understanding
+of the current process to create a domain. A way to view this process starts
+with the core configuration which is the information the hypervisor requires to
+make the call to `domain_create`, followed by basic construction to provide the
+memory image to run, including the kernel and ramdisk. A subsequent step
+applies the extended configuration used by the toolstack to provide a domain
+with any additional configuration information. Until the extended configuration
+is completed, a domain has access to no resources except its allocated vcpus
+and memory. The exception to this is Dom0, which the hypervisor explicitly
+grants control and access to all system resources, except for those that only
+the hypervisor should have control over.  This exception for Dom0 is driven by
+the system structure with a monolithic Dom0 domain predating introduction of
+support for disaggregation into Xen, and the corresponding default assignment
+of multiple roles within the Xen system to Dom0.
+While not a different domain creation path, there does exist the Hardware
+Domain (hwdom), sometimes also referred to as late-Dom0. It is an early effort
+to disaggregate Dom0’s roles into a separate control domain and hardware
+domain. This capability is activated by the passing of a domain id to the
+`hardware_dom` kernel command line parameter, and the Xen hypervisor will then
+flag that domain id as the hardware domain. Later when the toolstack constructs
+a domain with that domain id as the requested domid, the hypervisor will
+transfer all device I/O from Dom0 to this domain. In addition it will also
+transfer the “host shutdown on domain shutdown” flag from Dom0 to the hardware
+domain. It is worth mentioning that this approach for disaggregation was
+created in this manner due to the inability of Xen to launch more than one
+domain at startup.
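+The late hardware domain hand-over described above can be modelled roughly as
+follows. This is a simplified sketch and not the Xen implementation: the
+``Hypervisor`` class and its attribute names are hypothetical, and only the
+``hardware_dom`` parameter name is taken from the text.

```python
class Hypervisor:
    """Toy model of the late hardware domain hand-over (not Xen code)."""

    def __init__(self, hardware_dom=None):
        # Domain id flagged via the hardware_dom command line parameter.
        self.hardware_dom = hardware_dom
        # Dom0 initially owns device I/O and the
        # "host shutdown on domain shutdown" flag.
        self.device_io_owner = 0
        self.host_shutdown_owner = 0

    def create_domain(self, domid):
        """Called when the toolstack constructs a domain with this domid."""
        if self.hardware_dom is not None and domid == self.hardware_dom:
            # Transfer device I/O and the shutdown flag from Dom0.
            self.device_io_owner = domid
            self.host_shutdown_owner = domid
        return domid
```

+In this model nothing changes until the toolstack in Dom0 constructs the
+flagged domain, which is why the feature is described as a *late* hardware
+domain.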
+Hyperlaunch Xen startup
+The Hyperlaunch approach’s primary focus is on how to assign the roles
+traditionally granted to Dom0 to one or more domains at host boot. While the
+statement is simple to make, the implications are not trivial by any means.
+This also explains why the Hyperlaunch approach is orthogonal to the existing
+dom0less capability. The dom0less capability focuses on enabling the launch of
+multiple domains in parallel with Dom0 at host boot. A corollary for dom0less
+is that systems that don’t require Dom0 after all guest domains have started
+are able to do the host boot without a Dom0, though it may be possible to
+start Dom0 at a later point. Hyperlaunch, by contrast, separates Dom0’s roles,
+which requires the ability to launch multiple domains at host boot. The direct
+consequences of this approach are profound and provide a myriad of possible
+configurations, of which a sample of common boot configurations was already
+presented.
+To enable the Hyperlaunch approach a new alternative path for host boot within
+the hypervisor must be introduced. This alternative path effectively branches
+just before the current point of Dom0 construction and begins an alternate
+means of system construction. The determination if this alternate path should
+be taken is through the inspection of the boot chain. If the bootloader has
+loaded a specific configuration, as described later, it will enable Xen to
+detect that a Hyperlaunch configuration has been provided. Once a Hyperlaunch
+configuration is detected, this alternate path can be thought of as occurring
+in phases: domain creation, domain preparation, and launch finalization.
+Domain Creation
+The domain creation phase begins with Xen parsing the bootloader provided
+material, to understand the content of the modules provided. It will then load
+any microcode or XSM policy it discovers. For each domain configuration Xen
+finds, it parses the configuration to construct the necessary domain definition
+to instantiate an instance of the domain and leave it in a paused state. When
+all domain configurations have been instantiated as domains, if one of them is
+flagged as the Boot Domain, that domain will be unpaused starting the domain
+preparation phase. If there is no Boot Domain defined, then the domain
+preparation phase will be skipped and Xen will trigger the launch finalization
+phase.
+Domain Preparation Phase
+The domain preparation phase is an optional check point for the execution of a
+workload specific domain, the Boot Domain. While the Boot Domain is the first
+domain to run and has some degree of control over the system, it is extremely
+restricted in both system resource access and hypervisor operations. Its
+purpose is to:
+* Access the configuration provided by the bootloader
+* Finalize the configuration of the domains
+* Conduct any setup and launch related operations
+* Do an ordered unpause of domains that require an ordered start
+When the Boot Domain has completed, it will notify the hypervisor that it is
+done, triggering the launch finalization phase.
+Launch Finalization
+The hypervisor handles the launch finalization phase, which is equivalent to
+the clean up phase. As such, the steps taken by the hypervisor, not necessarily
+in implementation order, are as follows:
+* Free the boot module chain
+* If a Boot Domain was used, reclaim Boot Domain resources
+* Unpause any domains still in a paused state
+* The Boot Domain uses a reserved function and thus can never be respawned
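+The three phases can be sketched as a small simulation. This is a hedged
+illustration under simplifying assumptions and not the hypervisor code: the
+``Domain`` and ``Host`` types and all function names are hypothetical, freeing
+of the boot module chain is not modelled, and the Boot Domain's work is reduced
+to an ordered unpause.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    name: str
    is_boot_domain: bool = False
    paused: bool = True  # every domain is instantiated in a paused state


@dataclass
class Host:
    domains: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def unpause(self, dom):
        if dom.paused:
            dom.paused = False
            self.log.append(f"unpause {dom.name}")


def domain_creation(host, configs):
    """Phase 1: instantiate each configured domain, left paused."""
    for name, is_boot in configs:
        host.domains.append(Domain(name, is_boot_domain=is_boot))
    return next((d for d in host.domains if d.is_boot_domain), None)


def domain_preparation(host, boot, ordered_start):
    """Phase 2 (optional): the Boot Domain runs and does an ordered unpause."""
    host.unpause(boot)
    by_name = {d.name: d for d in host.domains}
    for name in ordered_start:
        host.unpause(by_name[name])


def launch_finalization(host, boot):
    """Phase 3: reclaim the Boot Domain, then unpause what is still paused."""
    if boot is not None:
        host.domains.remove(boot)
        host.log.append(f"reclaim {boot.name}")
    for dom in host.domains:
        host.unpause(dom)


def hyperlaunch(configs, ordered_start=()):
    """Run the phases in order; phase 2 is skipped without a Boot Domain."""
    host = Host()
    boot = domain_creation(host, configs)
    if boot is not None:
        domain_preparation(host, boot, ordered_start)
    launch_finalization(host, boot)
    return host.log
```

+For example, ``hyperlaunch([("boot", True), ("hwdom", False), ("guest",
+False)], ordered_start=["hwdom"])`` starts the Boot Domain first, lets it
+unpause ``hwdom``, and leaves ``guest`` to be unpaused during finalization.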
+While the focus thus far has been on how the Hyperlaunch capability will work,
+it is worth mentioning what it does not do or prevent. It does not stop or
+inhibit the assigning of the control domain role, which gives the domain the
+ability to create, start, stop, restart, and destroy domains, or the hardware
+domain role, which gives access to all I/O devices except those that the
+hypervisor has reserved for itself. In particular it is still possible to
+construct a domain with all the privileged roles, i.e. a Dom0, with or without
+the domain id being zero. In fact, the limitations that are imposed now become
+fully configurable without the risk of circumvention by an all-privileged
+domain.
+Structuring of Hyperlaunch
+The structure of Hyperlaunch is built around the existing capabilities of the
+host boot protocol. This approach was driven by the objective not to require
+modifications to the boot loader. The only requirement is that the boot loader
+supports the Multiboot2 (MB2) protocol. For UEFI boot, our recommendation is to
+use GRUB.efi to load Xen and the initial domain materials via the multiboot2
+method. On Arm platforms, Hyperlaunch is compatible with the existing interface
+for boot into the hypervisor.
+x86 Multiboot2
+The MB2 protocol has no concept of a manifest to tell the initial kernel what
+is contained in the chain, leaving it to the kernel to impose a loading
+convention, use magic number identification, or both. When considering the
+passing of multiple kernels, ramdisks, and domain configuration along with any
+existing modules already passed, there is no sane convention that could be
+imposed and magic number identification is nearly impossible when considering
+the objective not to impose unnecessary complication to the hypervisor.
+As alluded to previously, a manifest describing the contents in the MB2
+chain and how they relate within a Xen context is needed. To address this need
+the Launch Control Module (LCM) was designed to provide such a manifest. The
+LCM was designed to have a specific set of properties:
+* minimize the complexity of the parsing logic required by the hypervisor
+* allow for expanding and optional configuration fragments without breaking
+  backwards compatibility
+To enable automatic detection of a Hyperlaunch configuration, the LCM must be
+the first MB2 module in the MB2 module chain. The LCM is implemented using the
+Device Tree as defined in the Hyperlaunch Device Tree design document. With the
+LCM implemented in Device Tree, it has a magic number that enables the
+hypervisor to detect its presence when used in a Multiboot2 module chain. The
+hypervisor can confirm that it is a proper LCM Device Tree by checking for a
+compliant Hyperlaunch Device Tree. The Hyperlaunch Device Tree nodes are
+designed to allow:
+* for the hypervisor to parse only those entries it understands,
+* for packing custom information for a custom boot domain,
+* the ability to use a new LCM with an older hypervisor,
+* and the ability to use an older LCM with a new hypervisor.
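The detection step described above can be sketched as follows. This is a minimal illustration, not the Xen implementation: it relies only on the fact that a flattened Device Tree blob starts with the big-endian magic value `0xd00dfeed`. The helper names `read_be32` and `module_may_be_lcm` are hypothetical, and a real check would go on to confirm that the blob is a compliant Hyperlaunch Device Tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Magic number at the start of every flattened Device Tree blob. */
#define FDT_MAGIC 0xd00dfeedu

/* Read a 32-bit big-endian value without assuming alignment or host endianness. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Hypothetical helper: returns true if the first boot module starts with the
 * Device Tree magic and is therefore a candidate LCM. It does not validate
 * the Hyperlaunch nodes inside the blob.
 */
static bool module_may_be_lcm(const void *start, size_t size)
{
    assert(start != NULL || size == 0);

    if ( size < sizeof(uint32_t) )
        return false;

    return read_be32(start) == FDT_MAGIC;
}
```

Because only the magic number is examined here, a module that passes this test is merely a candidate; the second check for a compliant Hyperlaunch Device Tree is what separates an LCM from any other Device Tree blob.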
+Arm Device Tree
+As discussed the LCM is in Device Tree format and was designed to co-exist in
+the Device Tree ecosystem, and in particular in parallel with dom0less Device
+Tree entries. On Arm, Xen is already designed to boot from a host Device Tree
+description (dtb) file and the LCM entries can be embedded into this host dtb
+file. This makes detecting the LCM entries and supporting Hyperlaunch on Arm
+relatively straightforward. Where on x86 Xen inspects the first MB2 module, on
+Arm Xen will check if the top level LCM node exists in the host dtb file. If the
+LCM node does exist, then at that point it will enter the same code path as the
+x86 entry.
+Xen hypervisor
+The new host boot flow was discussed previously at a higher level. Within this
+new flow is the configuration parsing and domain creation phase, which is
+expanded upon here. The hypervisor will inspect the LCM for a config node and,
+if found, will iterate through all module nodes. The module nodes are used to
+identify if any modules contain microcode or an XSM policy. As it processes
+domain nodes, it will construct the domain using the node properties and the
+module nodes. Once it has completed
+iterating through all the entries in the LCM, if a constructed domain has the
+Boot Domain attribute, it will then be unpaused. Otherwise the hypervisor will
+start the launch finalization phase.
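The domain creation phase and the decision that follows it can be sketched roughly as below. The `struct lcm_domain` type, the `next_phase` enum and the `construct_domains` function are hypothetical stand-ins invented for this illustration; actual domain construction is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one domain node parsed from the LCM. */
struct lcm_domain {
    bool is_boot_domain; /* node carries the Boot Domain attribute */
    bool constructed;    /* set once the hypervisor has built the domain */
};

enum next_phase { PHASE_RUN_BOOT_DOMAIN, PHASE_LAUNCH_FINALIZATION };

/*
 * Construct every domain described in the LCM, then report which phase
 * follows: running the Boot Domain if one was built, otherwise launch
 * finalization.
 */
static enum next_phase construct_domains(struct lcm_domain *doms, size_t n)
{
    bool boot_domain_built = false;

    assert(doms != NULL || n == 0);

    for ( size_t i = 0; i < n; i++ )
    {
        doms[i].constructed = true; /* stand-in for real domain construction */
        if ( doms[i].is_boot_domain )
            boot_domain_built = true;
    }

    return boot_domain_built ? PHASE_RUN_BOOT_DOMAIN
                             : PHASE_LAUNCH_FINALIZATION;
}
```

Note that the decision is taken only after all entries have been processed, matching the text: every domain is constructed before any of them runs.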
+Boot Domain
+Traditionally domain creation was controlled by the user within the Dom0
+environment whereby custom toolstacks could be implemented to impose
+requirements on the process. The Boot Domain is a means to enable the user to
+continue to maintain a degree of that control over domain creation but within a
+limited privilege environment. The Boot Domain will have access to the LCM and
+the boot chain along with access to a subset of the hypercall operations. When
+the Boot Domain is finished it will notify the hypervisor through a hypercall.
+Recovery Domain
+With the existing Dom0 host boot path, when a failure occurs there are several
+assumptions that can safely be made to get the user to a console for
+troubleshooting. With the Hyperlaunch host boot path those assumptions can no
+longer be made, thus a means is needed to get the user to a console in the case
+of a recoverable failure. The recovery domain is configured by a domain
+configuration entry in the LCM, in the same manner as the other initial
+domains, and it will not be unpaused at launch finalization unless a failure is
+encountered starting the initial domains.
+Xen has existing support for a Crash Environment where memory can be reserved
+at host boot and a kernel loaded into it, to be jumped into at any point while
+the system is running when a crash is detected. The Recovery Domain
+functionality is a separate, complementary capability. The Crash Environment
+replaces the previously active hypervisor and running guests, and enables a
+process for mounting disks to write out log information prior to rebooting the
+system. In contrast, the Recovery Domain is able to use the functionality of
+the Xen hypervisor, that is still present and running, to perform recovery
+handling for errors encountered with starting the initial domains.
+Deferred Design
+To be determined:
+* Define what is detected as a crash
+* Explain how crash detection is performed and which components are involved
+* Explain how the recovery domain is unpaused
+* Explain how and when the resources assigned to the recovery domain are reclaimed
+* Define what the recovery domain is able to do
+* Determine what permissions the recovery domain requires to perform its job
+Control Domain
+The concept of the Control Domain already exists within Xen as a boolean,
+`is_privileged`, that governs access to many of the privileged interfaces of
+the hypervisor that support a domain running a virtualization system toolstack.
+Hyperlaunch will allow the `is_privileged` flag to be set on any domain that is
+created at launch, rather than only a Dom0. It may potentially be set on
+multiple domains.
+Hardware Domain
+The Hardware Domain is also an existing concept for Xen that is enabled through
+the `is_hardware_domain` check. With Hyperlaunch the previous process of I/O
+accesses being assigned to Dom0 for later transfer to the hardware domain would
+no longer be required. Instead during the configuration phase the Xen
+hypervisor would directly assign the I/O accesses to the domain with the
+hardware domain permission bit enabled.
+Console Domain
+Traditionally the Xen console is assigned to the control domain and then
+reassignable by the toolstack to another domain. With Hyperlaunch it becomes
+possible to construct a boot configuration where there is no control domain or
+have a use case where the Xen console needs to be isolated. As such it becomes
+necessary to be able to designate which of the initial domains should be
+assigned the Xen console. Therefore Hyperlaunch introduces the ability to
+specify an initial domain to which the console is assigned, along with a
+convention of ordered assignment for when there is no explicit assignment.
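One way to picture explicit assignment with an ordered fallback is the sketch below. The text here does not spell out the fallback order, so the rule used (the first initial domain in LCM order receives the console) is purely an assumption made for illustration, as are the `struct init_domain` type and the `pick_console_domain` function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified description of an initial domain. */
struct init_domain {
    bool wants_console; /* console explicitly assigned in the configuration */
};

/*
 * Returns the index of the domain that should receive the Xen console, or -1
 * if there are no initial domains. An explicit assignment always wins.
 * ASSUMPTION: with no explicit assignment, the first domain in LCM order is
 * chosen; the real ordering convention may differ.
 */
static int pick_console_domain(const struct init_domain *doms, size_t n)
{
    assert(doms != NULL || n == 0);

    for ( size_t i = 0; i < n; i++ )
        if ( doms[i].wants_console )
            return (int)i;

    return n > 0 ? 0 : -1;
}
```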
+Communication of Domain Configurations
+There are several standard methods for an Operating System to access machine
+configuration and environment information: ACPI is common on x86 systems,
+whereas Device Tree is more typical on Arm platforms. There are currently
+implementations of both in Xen.
+* For dom0less, guest Device Trees are dynamically constructed by the
+  hypervisor to convey domain configuration data
+* For PVH dom0 on x86, ACPI tables are built by the hypervisor before the
+  domain is started
+Note that both of these mechanisms convey static data that is fixed prior to
+the point of domain construction. Hyperlaunch will retain both the existing
+ACPI and Device Tree methods.
+Communication of data between a Boot Domain and a Control Domain is of note
+since they may not be running concurrently: the method used will depend on
+their specific implementations, but one option available is to use Xen’s hypfs
+for transfer of basic data to support system bootstrap.
+Appendix 1: Flow Sequence of Steps of a Hyperlaunch Boot
+Provided here is an ordered flow of a Hyperlaunch boot with the logic decision
+points highlighted. Not all branch points are recorded, specifically for the
+variety of error conditions that may occur. ::
+  1. Hypervisor Startup:
+  2a. (x86) Inspect first module provided by the bootloader
+      a. Is the module an LCM
+          i. YES: proceed with the Hyperlaunch host boot path
+          ii. NO: proceed with a Dom0 host boot path
+  2b. (Arm) Inspect host dtb for `/chosen/hypervisor` node
+      a. Is the LCM present
+          i. YES: proceed with the Hyperlaunch host boot path
+          ii. NO: proceed with a Dom0/dom0less host boot path
+  3. Iterate through the LCM entries looking for the module description
+     entry
+      a. Check if any of the modules are microcode or policy and if so,
+         load
+  4. Iterate through the LCM entries processing all domain description
+     entries
+      a. Use the details from the Basic Configuration to call
+         `domain_create`
+      b. Record if a domain is flagged as the Boot Domain
+      c. Record if a domain is flagged as the Recovery Domain
+  5. Was a Boot Domain created
+      a. YES:
+          i. Attach console to Boot Domain
+          ii. Unpause Boot Domain
+          iii. Goto Boot Domain (step 6)
+      b. NO: Goto Launch Finalization (step 10)
+  6. Boot Domain:
+  7. Boot Domain comes online and may do any of the following actions
+      a. Process the LCM
+      b. Validate the MB2 chain
+      c. Make additional configuration settings for staged domains
+      d. Unpause any precursor domains
+      e. Set any runtime configurations
+  8. Boot Domain does any necessary cleanup
+  9. Boot Domain makes hypercall op call to signal it is finished
+      i. Hypervisor reclaims all Boot Domain resources
+      ii. Hypervisor records that the Boot Domain ran
+      iii. Goto Launch Finalization (step 10)
+  10. Launch Finalization
+  11. If a configured domain was flagged to have the console, the
+      hypervisor assigns it
+  12. The hypervisor clears the LCM and bootloader loaded module,
+      reclaiming the memory
+  13. The hypervisor iterates through domains unpausing any domain not
+      flagged as the recovery domain
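The final unpausing step of Launch Finalization can be sketched as below. The `struct staged_domain` type and the `launch_finalize` function are hypothetical names used only for this illustration; console assignment and memory reclamation are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified state of a domain staged by the hypervisor. */
struct staged_domain {
    bool is_recovery; /* flagged as the Recovery Domain in the LCM */
    bool paused;      /* domains are constructed in the paused state */
};

/*
 * Unpause every staged domain except the Recovery Domain, which stays paused
 * unless a failure is encountered. Returns the number of domains unpaused.
 */
static size_t launch_finalize(struct staged_domain *doms, size_t n)
{
    size_t unpaused = 0;

    assert(doms != NULL || n == 0);

    for ( size_t i = 0; i < n; i++ )
    {
        if ( doms[i].is_recovery || !doms[i].paused )
            continue;

        doms[i].paused = false;
        unpaused++;
    }

    return unpaused;
}
```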
+Appendix 2: Considerations in Naming the Hyperlaunch Feature
+* The term “Launch” is preferred over “Boot”
+        * Multiple individual component boots can occur in the new system start
+          process; Launch is preferable for describing the whole process
+        * Fortunately there is consensus in the current group of stakeholders
+          that the term “Launch” is good and appropriate
+* The names we define must support becoming meaningful and simple to use
+  outside the Xen community
+        * They must be able to be resolved quickly via search engine to a clear
+          explanation (eg. Xen marketing material, documentation or wiki)
+        * We prefer that the terms be helpful for marketing communications
+        * Consequence: avoid the term “domain” which is Xen-specific and
+          requires a definition to be provided each time when used elsewhere
+* There is a need to communicate that Xen is capable of being used as a Static
+  Partitioning hypervisor
+        * The community members using and maintaining dom0less are the current
+          primary stakeholders for this
+* There is a need to communicate that the new launch functionality provides new
+  capabilities not available elsewhere, and is more than just supporting Static
+  Partitioning
+        * No other hypervisor known to the authors of this document is capable
+          of providing what Hyperlaunch will be able to do. The launch sequence is
+          designed to:
+                * Remove dependency on a single, highly-privileged initial domain
+                * Allow the initial domains started to be independent and fully
+                  isolated from each other
+                * Support configurations where no further VMs can be launched
+                  once the initial domains have started
+                * Use a standard, extensible format for conveying VM
+                  configuration data
+                * Ensure that domain building of all initial domains is
+                  performed by the hypervisor from materials supplied by the
+                  bootloader
+                * Enable flexible configuration to be applied to all initial
+                  domains by an optional Boot Domain, that runs with limited
+                  privilege, before any other domain starts and obtains the VM
+                  configuration data from the bootloader materials via the
+                  hypervisor
+                * Enable measurements of all of the boot materials prior to
+                  their use, in a sequence with minimized privilege
+                * Support use-case-specific customized Boot Domains
+                * Complement the hypervisor’s existing ability to enforce
+                  policy-based Mandatory Access Control
+* “Static” and “Dynamic” have different and important meanings in different
+  communities
+        * Static and Dynamic Partitioning describe the ability to create new
+          virtual machines, or not, after the initial host boot process
+          completes
+        * Static and Dynamic Root of Trust describe the nature of the trust
+          chain for a measured launch. In this case Static is referring to the
+          fact that the trust chain is fixed and non-repeatable until the next
+          host reboot or shutdown. Whereas Dynamic in this case refers to the
+          ability to conduct the measured launch at any time and potentially
+          multiple times before the next host reboot or shutdown. 
+                * We will be using Hyperlaunch with both Static and Dynamic
+                  Roots of Trust, to launch both Static and Dynamically
+                  Partitioned Systems, and being clear about exactly which
+                  combination is being started will be very important (eg. for
+                  certification processes)
+        * Consequence: uses of “Static” and “Dynamic” need to be qualified if
+          they are incorporated into the naming of this functionality
+                * This can be done by adding the preceding, stronger branded
+                  term: “Hyperlaunch”, before “Static” or “Dynamic”
+                * ie. “Hyperlaunch Static” describes launch of a
+                  Statically Partitioned system
+                * and “Hyperlaunch Dynamic” describes launch of a
+                  Dynamically Partitioned system.
+                * In practice, this means that “Hyperlaunch Static” describes
+                  starting a Static Partitioned system where no new domains can
+                  be started later (ie. no VM has the Control Domain
+                  permission), whereas “Hyperlaunch Dynamic” will launch some
+                  VM with the Control Domain permission, able to create VMs
+                  dynamically at a later point.
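The distinction between the two terms comes down to a single predicate, sketched below: a launch is "Hyperlaunch Dynamic" if at least one initial domain holds the Control Domain permission, and "Hyperlaunch Static" otherwise. The `launch_style` enum and the `classify_launch` function are hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum launch_style { HYPERLAUNCH_STATIC, HYPERLAUNCH_DYNAMIC };

/*
 * has_control[i] is true if initial domain i was granted the Control Domain
 * permission. With no such domain, no new VMs can be created after launch,
 * so the system is statically partitioned.
 */
static enum launch_style classify_launch(const bool *has_control, size_t n)
{
    assert(has_control != NULL || n == 0);

    for ( size_t i = 0; i < n; i++ )
        if ( has_control[i] )
            return HYPERLAUNCH_DYNAMIC;

    return HYPERLAUNCH_STATIC;
}
```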
+**Naming Proposal:**
+* New Term: “Hyperlaunch” : the ability of a hypervisor to construct and start
+  one or more virtual machines at system launch, in the following manner:
+        * The hypervisor must build all of the domains that it starts at host
+          boot
+                * Similar to the way the dom0 domain is built by the hypervisor
+                  today, and how dom0less works: it will run a loop to build
+                  them all, driven from the configuration provided
+                * This is a requirement for ensuring that there is Strong
+                  Isolation between each of the initial VMs
+        * A single file containing the VM configs (“Launch Control Module”: LCM,
+          in Device Tree binary format) is provided to the hypervisor
+                * The hypervisor parses it and builds domains
+                * If the LCM config says that a Boot Domain should run first,
+                  then the LCM file itself is made available to the Boot Domain
+                  for it to parse and act on, to invoke operations via the
+                  hypervisor to apply additional configuration to the other VMs
+                  (ie. executing a privilege-constrained toolstack)
+* New Term: “Hyperlaunch Static”: starts a Static Partitioned system, where
+  only the virtual machines started at system launch are running on the system
+* New Term: “Hyperlaunch Dynamic”: starts a system where virtual machines may
+  be dynamically added after the initial virtual machines have started.
+In the default configuration, Xen will be capable of both styles of Hyperlaunch
+from the same hypervisor binary which, when paired with its XSM flask, provides
+strong controls that enable fine grained system partitioning.
+* Retiring Term: “DomB”: will no longer be used to describe the optional first
+  domain that is started. It is replaced with the more general term: “Boot
+  Domain”.
+* Retiring Term: “Dom0less”: it is to be replaced with “Hyperlaunch Static”
+Appendix 3: Terminology
+To help ensure clarity in reading this document, the following is the
+definition of terminology used within this document.
+Basic Configuration
+    the minimal information the hypervisor requires to instantiate a domain instance
+Boot Domain
+    a domain with limited privileges launched by the hypervisor during a
+    Multiple Domain Boot that runs as the first domain started. In the Hyperlaunch
+    architecture, it is responsible for assisting with higher level operations of
+    the domain setup process.
+Classic Launch
+    a backwards-compatible host boot that ends with the launch of a single domain (Dom0)
+Console Domain
+    a domain that has the Xen console assigned to it
+Control Domain
+    a privileged domain that has been granted Control Domain permissions which
+    are those that are required by the Xen toolstack for managing other domains.
+    These permissions are a subset of those that are granted to Dom0.
+Device Tree
+    a standardized data structure, with defined file formats, for describing
+    initial system configuration
+Disaggregation
+    the separation of system roles and responsibilities across multiple
+    connected components that work together to provide functionality
+Dom0
+    the highly-privileged, first and only domain started at host boot on a
+    conventional Xen system
+Dom0less
+    an existing feature of Xen on Arm that provides Multiple Domain Boot
+Domain
+    a running instance of a virtual machine (as the term is commonly used in
+    the Xen Community)
+DomB
+    the former name for Hyperlaunch
+Extended Configuration
+    any configuration options for a domain beyond its Basic Configuration
+Hardware Domain
+    a privileged domain that has been granted permissions to access and manage
+    host hardware. These permissions are a subset of those that are granted to
+    Dom0.
+Host Boot
+    the system startup of Xen using the configuration provided by the bootloader
+Hyperlaunch
+    a flexible host boot that ends with the launch of one or more domains
+Initial Domain
+    a domain that is described in the LCM that is run as part of a multiple
+    domain boot. This includes the Boot Domain, Recovery Domain and all Launched
+    Domains.
+Late Hardware Domain
+    a Hardware Domain that is launched after host boot has already completed
+    with a running Dom0. When the Late Hardware Domain is started, Dom0
+    relinquishes and transfers the permissions to access and manage host hardware
+    to it.
+Launch Control Module (LCM)
+    a file supplied to the hypervisor by the bootloader that contains
+    configuration data for the hypervisor and the initial set of virtual machines
+    to be run at boot
+Launched Domain
+    a domain, aside from the boot domain and recovery domain, that is started as
+    part of a multiple domain boot and remains running once the boot process is
+    complete
+Multiple Domain Boot
+    a system configuration where the hypervisor and multiple virtual machines
+    are all launched when the host system hardware boots
+Recovery Domain
+    an optional fallback domain that the hypervisor may start in the event of a
+    detectable error encountered during the multiple domain boot process
+System Device Tree
+    this is the product of an Arm community project to extend Device Tree to
+    cover more aspects of initial system configuration
+Appendix 4: Copyright License
+This work is licensed under a Creative Commons Attribution 4.0 International
+License. A copy of this license may be obtained from the Creative Commons
+website (https://creativecommons.org/licenses/by/4.0/legalcode).
+| Contributions by:
+| Christopher Clark are Copyright © 2021 Star Lab Corporation
+| Daniel P. Smith are Copyright © 2021 Apertus Solutions, LLC
