Re: [PATCH v1 00/15] Add support for Nitro Enclaves

From: Alexander Graf <graf@amazon.de>
To: Liran Alon <liran.alon@oracle.com>,
	"Paraschiv, Andra-Irina" <andraprs@amazon.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	<linux-kernel@vger.kernel.org>
Cc: Anthony Liguori <aliguori@amazon.com>,
	Benjamin Herrenschmidt <benh@amazon.com>,
	Colm MacCarthaigh <colmmacc@amazon.com>,
	Bjoern Doebel <doebel@amazon.de>,
	David Woodhouse <dwmw@amazon.co.uk>,
	Frank van der Linden <fllinden@amazon.com>,
	Martin Pohlack <mpohlack@amazon.de>, Matt Wilson <msw@amazon.com>,
	Balbir Singh <sblbir@amazon.com>,
	Stewart Smith <trawets@amazon.com>,
	Uwe Dannowski <uwed@amazon.de>, <kvm@vger.kernel.org>,
	<ne-devel-upstream@amazon.com>
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves
Date: Tue, 28 Apr 2020 17:25:29 +0200
Message-ID: <50f58a36-76ee-5e97-f5e6-1f08bee0c596@amazon.de>
In-Reply-To: <26111e31-8ff5-8358-1e05-6d7df0441ab1@oracle.com>

On 27.04.20 13:44, Liran Alon wrote:
> 
> On 27/04/2020 10:56, Paraschiv, Andra-Irina wrote:
>>
>> On 25/04/2020 18:25, Liran Alon wrote:
>>>
>>> On 23/04/2020 16:19, Paraschiv, Andra-Irina wrote:
>>>>
>>>> The memory and CPUs are carved out of the primary VM, they are
>>>> dedicated for the enclave. The Nitro hypervisor running on the host
>>>> ensures memory and CPU isolation between the primary VM and the
>>>> enclave VM.
>>> I hope you properly take into consideration Hyper-Threading
>>> speculative side-channel vulnerabilities here.
>>> i.e. Usually cloud providers designate each CPU core to be assigned
>>> to run only vCPUs of a specific guest, to avoid sharing a single CPU
>>> core between multiple guests.
>>> To handle this properly, you need to use some kind of core-scheduling
>>> mechanism (Such that each CPU core either runs only vCPUs of enclave
>>> or only vCPUs of primary VM at any given point in time).
>>>
>>> In addition, can you elaborate more on how the enclave memory is
>>> carved out of the primary VM?
>>> Does this involve performing a memory hot-unplug operation from
>>> primary VM or just unmap enclave-assigned guest physical pages from
>>> primary VM's SLAT (EPT/NPT) and map them now only in enclave's SLAT?
>>
>> Correct, we take into consideration the HT setup. The enclave gets
>> dedicated physical cores. The primary VM and the enclave VM don't run
>> on CPU siblings of a physical core.
> The way I would imagine this to work is that Primary-VM just specifies
> how many vCPUs the Enclave-VM will have and those vCPUs will be set with
> affinity to run on the same physical CPU cores as the Primary-VM.
> But with the exception that the scheduler is modified to not run vCPUs of
> Primary-VM and Enclave-VM as siblings on the same physical CPU core
> (core-scheduling). i.e. This is different than primary-VM losing
> physical CPU cores permanently as long as the Enclave-VM is running.
> Or maybe this should even be controlled by a knob in virtual PCI device
> interface to allow flexibility to customer to decide if Enclave-VM needs
> dedicated CPU cores or is it ok to share them with Primary-VM
> as long as core-scheduling is used to guarantee proper isolation.

Running both parent and enclave on the same core can *potentially* lead 
to L2 cache leakage, so we decided not to go with it :).

>>
>> Regarding the memory carve out, the logic includes page table entries
>> handling.
> As I thought. Thanks for confirmation.
>>
>> IIRC, memory hot-unplug can be used for the memory blocks that were
>> previously hot-plugged.
>>
>> https://www.kernel.org/doc/html/latest/admin-guide/mm/memory-hotplug.html
>>
>>
>>>
>>> I don't quite understand why Enclave VM needs to be
>>> provisioned/torn down during primary VM's runtime.
>>>
>>> For example, an alternative could have been to just provision both
>>> primary VM and Enclave VM on primary VM startup.
>>> Then, wait for primary VM to setup a communication channel with
>>> Enclave VM (E.g. via virtio-vsock).
>>> Then, primary VM is free to request Enclave VM to perform various
>>> tasks when required on the isolated environment.
>>>
>>> Such setup will mimic a common Enclave setup. Such as Microsoft
>>> Windows VBS EPT-based Enclaves (That all runs on VTL1). It is also
>>> similar to TEEs running on ARM TrustZone.
>>> i.e. In my alternative proposed solution, the Enclave VM is similar
>>> to VTL1/TrustZone.
>>> It would also avoid the need to introduce a new PCI device and driver.
>>
>> True, this can be another option, to provision the primary VM and the
>> enclave VM at launch time.
>>
>> In the proposed setup, the primary VM starts with the initial
>> allocated resources (memory, CPUs). The launch path of the enclave VM,
>> as it's spawned on the same host, is done via the ioctl interface -
>> PCI device - host hypervisor path. Short-running or long-running
>> enclave can be bootstrapped during primary VM lifetime. Depending on
>> the use case, a custom set of resources (memory and CPUs) is set for
>> an enclave and then given back when the enclave is terminated; these
>> resources can be used for another enclave spawned later on or the
>> primary VM tasks.
>>
> Yes, I already understood this is how the mechanism works. I'm
> questioning whether this is indeed a good approach that should also be
> taken by upstream.

I thought the point of Linux was to support devices that exist, rather 
than change the way the world works around it? ;)

> The use-case of using Nitro Enclaves is for a Confidential-Computing
> service. i.e. The ability to provision a compute instance that can be
> trusted to perform a bunch of computation on sensitive
> information with high confidence that it cannot be compromised as it's
> highly isolated. Some technologies such as Intel SGX and AMD SEV
> attempted to achieve this even with guarantees that
> the computation is isolated from the hardware and hypervisor itself.

Yeah, that worked really well, didn't it? ;)

> I would have expected that for the vast majority of real customer
> use-cases, the customer will provision a compute instance that runs some
> confidential-computing task in an enclave which it
> keeps running for the entire life-time of the compute instance. As the
> sole purpose of the compute instance is to just expose a service that
> performs some confidential-computing task.
> For those cases, it should have been sufficient to just pre-provision a
> single Enclave-VM that performs this task, together with the compute
> instance and connect them via virtio-vsock.
> Without introducing any new virtual PCI device, guest PCI driver and
> unique semantics of stealing resources (CPUs and Memory) from primary-VM
> at runtime.

You would also need to preprovision the image that runs in the enclave, 
which is usually only determined at runtime. For that you need the PCI 
driver anyway, so why not make the creation dynamic too?

> In this Nitro Enclave architecture, we de-facto put Compute
> control-plane abilities in the hands of the guest VM. Instead of
> introducing new control-plane primitives that allow building
> the data-plane architecture desired by the customer in a flexible manner.
> * What if the customer prefers to have its Enclave VM polling an S3 bucket
> for new tasks and producing results to S3 as well? Without having any
> "Primary-VM" or virtio-vsock connection of any kind?
> * What if for some use-cases customer wants Enclave-VM to have dedicated
> compute power (i.e. Not share physical CPU cores with primary-VM. Not
> even with core-scheduling) but for other
> use-cases, customer prefers to share physical CPU cores with Primary-VM
> (Together with core-scheduling guarantees)? (Although this could be
> addressed by extending the virtual PCI device
> interface with a knob to control this)
> 
> An alternative would have been to have the following new control-plane
> primitives:
> * Ability to provision a VM without a boot volume, but instead from an
> Image that is used to boot from memory, allowing disk-less VMs to be
> provisioned.
>    (E.g. Can be useful for other use-cases such as VMs not requiring EBS
> at all which could allow cheaper compute instance)
> * Ability to provision a group of VMs together as a group such that they
> are guaranteed to launch as sibling VMs on the same host.
> * Ability to create a fast-path connection between sibling VMs on the
> same host with virtio-vsock. Or even also other shared-memory mechanism.
> * Extend AWS Fargate with ability to run multiple microVMs as a group
> (Similar to above) connected with virtio-vsock. To allow on-demand scale
> of confidential-computing task.

Yes, there are a *lot* of different ways to implement enclaves in a 
cloud environment. This is the one that we focused on, but I'm sure 
others in the space will have more ideas. It's definitely an interesting 
space and I'm eager to see more innovation happening :).

> Having said that, I do see a similar architecture to Nitro Enclaves
> virtual PCI device used for a different purpose: For hypervisor-based
> security isolation (Such as Windows VBS).
> E.g. Linux boot-loader can detect the presence of this virtual PCI
> device and use it to provision multiple VM security domains. Such that
> when a security domain is created,
> it is specified what hardware resources it has access to (Guest
> memory pages, IOPorts, MSRs, etc.) and the blob it should run to
> bootstrap. Similar to, but superior to,
> Hyper-V VSM. In addition, some security domains will be given special
> abilities to control other security domains (For example, to control the
> +XS,+XU EPT bits of other security
> domains to enforce code-integrity. Similar to Windows VBS HVCI). Just an
> idea... :)

Yes, absolutely! So much fun to be had :D

Alex

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879