linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 00/18] Add support for Nitro Enclaves
@ 2020-06-22 20:03 Andra Paraschiv
  2020-06-22 20:03 ` [PATCH v4 01/18] nitro_enclaves: Add ioctl interface definition Andra Paraschiv
                   ` (17 more replies)
  0 siblings, 18 replies; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv

Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
that allows customers to carve out isolated compute environments within EC2
instances [1].

For example, an application that processes sensitive data and runs in a VM can
be separated from other applications running in the same VM. This application
then runs in a VM separate from the primary VM, namely an enclave.

An enclave runs alongside the VM that spawned it. This setup suits the needs of
low-latency applications. The resources that are allocated for the enclave, such
as memory and CPU, are carved out of the primary VM. Each enclave is mapped to a
process running in the primary VM, which communicates with the NE driver via an
ioctl interface.

In this sense, there are two components:

1. An enclave abstraction process - a user space process running in the primary
VM guest that uses the provided ioctl interface of the NE driver to spawn an
enclave VM (that's 2 below).

An NE emulated PCI device is exposed to the primary VM. The driver for this
new PCI device is included in the NE driver.

The ioctl logic is mapped to PCI device commands, e.g. the NE_START_ENCLAVE
ioctl maps to an enclave start PCI command. The PCI device commands are then
translated into actions taken on the hypervisor side; that's the Nitro
hypervisor running on the host where the primary VM is running. The Nitro
hypervisor is based on core KVM technology.

2. The enclave itself - a VM running on the same host as the primary VM that
spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
for the enclave VM. An enclave does not have persistent storage attached.

The memory regions carved out of the primary VM and given to an enclave need to
be 2 MiB / 1 GiB aligned, physically contiguous memory regions (or a multiple
of this size, e.g. 8 MiB). The memory can be allocated e.g. by using hugetlbfs
from user space [2][3]. The memory size for an enclave needs to be at least
64 MiB. The enclave memory and CPUs need to be from the same NUMA node.
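
The size and alignment constraints above can be pre-checked in user space
before any region is handed to the driver. The following is a minimal sketch,
not code from this series: all ne_sketch_* names are hypothetical, the check
uses the 2 MiB granularity only, and it looks at the addresses it is given
(user space virtual addresses in practice), so it cannot prove physical
contiguity or NUMA placement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NE_SKETCH_MIB (1024ULL * 1024ULL)
/* Smallest region granularity named in the text above (2 MiB). */
#define NE_SKETCH_REGION_ALIGN (2 * NE_SKETCH_MIB)
/* Minimum total enclave memory named in the text above (64 MiB). */
#define NE_SKETCH_MIN_ENCLAVE_MEM (64 * NE_SKETCH_MIB)

/* One region: non-empty, starts on a 2 MiB boundary, size multiple of 2 MiB. */
bool ne_sketch_region_ok(uint64_t addr, uint64_t size)
{
	return size != 0 &&
	       addr % NE_SKETCH_REGION_ALIGN == 0 &&
	       size % NE_SKETCH_REGION_ALIGN == 0;
}

/* All regions are well-formed and together reach the 64 MiB minimum. */
bool ne_sketch_enclave_mem_ok(const uint64_t *addrs, const uint64_t *sizes,
			      size_t n)
{
	uint64_t total = 0;

	assert(n == 0 || (addrs != NULL && sizes != NULL));

	for (size_t i = 0; i < n; i++) {
		if (!ne_sketch_region_ok(addrs[i], sizes[i]))
			return false;
		total += sizes[i];
	}

	return total >= NE_SKETCH_MIN_ENCLAVE_MEM;
}
```

A check like this only catches obvious mistakes early; the driver and the
hypervisor remain the authority on whether a region is accepted.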

An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain
available for the primary VM. A CPU pool has to be set for NE purposes by a
user with admin capability. See the cpu list section from the kernel
documentation [4] for the CPU pool format.
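
As an illustration of the cpu list format, here is a small user space sketch
that parses a list such as "2-3,6" into a bitmask and rejects CPU 0. It is an
assumption-laden example, not the driver's parser: ne_sketch_parse_cpu_list is
a hypothetical name, only plain numbers and ranges are handled, CPUs are
limited to 0..63, and the siblings of CPU 0 are not checked because that needs
topology information.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a CPU list such as "2-3,6" into a bitmask covering CPUs 0..63.
 * Returns false on malformed input, out-of-range CPUs, or if CPU 0 is
 * listed (CPU 0 has to stay with the primary VM).
 */
bool ne_sketch_parse_cpu_list(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	assert(s != NULL && mask != NULL);

	if (*s == '\0')
		return false;

	while (*s != '\0') {
		char *end;
		unsigned long lo, hi;

		if (*s < '0' || *s > '9')
			return false;
		lo = strtoul(s, &end, 10);
		hi = lo;
		s = end;

		if (*s == '-') {
			s++;
			if (*s < '0' || *s > '9')
				return false;
			hi = strtoul(s, &end, 10);
			s = end;
		}

		if (lo > hi || hi > 63)
			return false;

		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			m |= UINT64_C(1) << cpu;

		if (*s == ',') {
			s++;
			if (*s == '\0')
				return false; /* trailing comma */
		} else if (*s != '\0') {
			return false; /* unexpected character */
		}
	}

	if (m & 1)
		return false; /* CPU 0 is reserved for the primary VM */

	*mask = m;
	return true;
}
```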

An enclave communicates with the primary VM via a local communication channel,
using virtio-vsock [5]. The primary VM has a virtio-pci vsock emulated device,
while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
uses eventfd for signaling. The enclave VM sees the usual interfaces - local
APIC and IOAPIC - to get interrupts from the virtio-vsock device. The
virtio-mmio device is placed in memory below the typical 4 GiB.

The application that runs in the enclave needs to be packaged in an enclave
image together with the OS (e.g. kernel, ramdisk, init) that will run in the
enclave VM. The enclave VM has its own kernel and follows the standard Linux
boot protocol.

The kernel bzImage, the kernel command line and the ramdisk(s) are part of the
Enclave Image Format (EIF), along with an EIF header that includes metadata such
as magic number, EIF version, image size and CRC.

Hash values are computed for the entire enclave image (EIF), the kernel and
ramdisk(s). They are used, for example, to check that the enclave image that is
loaded in the enclave VM is the one that was intended to be run.

These crypto measurements are included in a signed attestation document
generated by the Nitro Hypervisor and further used to prove the identity of the
enclave; KMS is an example of a service that NE is integrated with and that
checks the attestation doc.

The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
init process in the enclave connects to the vsock CID of the primary VM and a
predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
used to check in the primary VM that the enclave has booted.
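
The primary VM side of this heartbeat check could look roughly like the sketch
below. It is a sketch under stated assumptions, not the sample from this
series: the ne_sketch_* names are hypothetical, and listening on
VMADDR_CID_ANY with a single accepted connection is an assumption about how
the primary VM would receive the connection. Only the port (9000) and the
heartbeat value (0xb7) come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

#define NE_SKETCH_HEARTBEAT_PORT (9000)
#define NE_SKETCH_HEARTBEAT_VALUE (0xb7)

/* Fill in the vsock address the primary VM listens on for the heartbeat. */
void ne_sketch_heartbeat_addr(struct sockaddr_vm *addr)
{
	assert(addr != NULL);

	memset(addr, 0, sizeof(*addr));
	addr->svm_family = AF_VSOCK;
	addr->svm_cid = VMADDR_CID_ANY;
	addr->svm_port = NE_SKETCH_HEARTBEAT_PORT;
}

/* Create the listening vsock socket. Returns the fd, or -1 on failure. */
int ne_sketch_heartbeat_listen(void)
{
	struct sockaddr_vm addr;
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	ne_sketch_heartbeat_addr(&addr);

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 1) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}

/* Accept one connection and check that the first byte is the heartbeat. */
bool ne_sketch_heartbeat_wait(int listen_fd)
{
	unsigned char value = 0;
	bool ok;
	int conn = accept(listen_fd, NULL, NULL);

	if (conn < 0)
		return false;

	ok = read(conn, &value, 1) == 1 && value == NE_SKETCH_HEARTBEAT_VALUE;
	close(conn);

	return ok;
}
```

The listening socket would have to exist before the enclave is started,
otherwise the connection attempt from the enclave init process could be
missed.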

If the enclave VM crashes or gracefully exits, an interrupt event is received by
the NE driver. This event is sent further to the user space enclave process
running in the primary VM via a poll notification mechanism. Then the user space
enclave process can exit.
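
From the user space enclave process, waiting for such a notification comes
down to a poll() call on a file descriptor. The helper below is a generic
sketch: ne_sketch_wait_event is a hypothetical name, and which event bits the
NE driver reports for an enclave exit is not stated in this cover letter, so
the caller has to interpret the returned bits.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for an event on fd.
 * Returns the reported revents bits, 0 on timeout, -1 on poll() failure.
 * POLLHUP and POLLERR are reported by poll() even when not requested.
 */
int ne_sketch_wait_event(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int rc;

	assert(fd >= 0);

	rc = poll(&pfd, 1, timeout_ms);
	if (rc < 0)
		return -1;
	if (rc == 0)
		return 0;

	return pfd.revents;
}
```

A negative timeout_ms blocks until an event arrives, which matches a process
that has nothing else to do until the enclave exits.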

Thank you.

Andra

[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
[2] https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
[3] https://lwn.net/Articles/807108/
[4] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
[5] https://man7.org/linux/man-pages/man7/vsock.7.html

---

Patch Series Changelog

The patch series is built on top of v5.8-rc2.

v3 -> v4

* Rebase on top of v5.8-rc2.
* Add NE API version and the corresponding ioctl call.
* Add enclave / image load flags options.
* Decouple NE ioctl interface from KVM API.
* Remove the "packed" attribute and include padding in the NE data structures.
* Update documentation based on the changes from v4.
* Update sample to match the updates in v4.
* Remove the NE CPU pool init during NE kernel module loading.
* Setup the NE CPU pool at runtime via a sysfs file for the kernel parameter.
* Check if the enclave memory and CPUs are from the same NUMA node.
* Add minimum enclave memory size definition.
* v3: https://lore.kernel.org/lkml/20200525221334.62966-1-andraprs@amazon.com/ 

v2 -> v3

* Rebase on top of v5.7-rc7.
* Add changelog to each patch in the series.
* Remove "ratelimited" from the logs that are not in the ioctl call paths.
* Update static calls sanity checks.
* Remove file ops that do nothing for now.
* Remove GPL additional wording as SPDX-License-Identifier is already in place.
* v2: https://lore.kernel.org/lkml/20200522062946.28973-1-andraprs@amazon.com/

v1 -> v2

* Rebase on top of v5.7-rc6.
* Adapt codebase based on feedback from v1.
* Update ioctl number definition - major and minor.
* Add sample / documentation for the ioctl interface basic flow usage.
* Update cover letter to include more context on the NE overall.
* Add fix for the enclave / vcpu fd creation error cleanup path.
* Add fix reported by kbuild test robot <lkp@intel.com>.
* v1: https://lore.kernel.org/lkml/20200421184150.68011-1-andraprs@amazon.com/

---

Andra Paraschiv (18):
  nitro_enclaves: Add ioctl interface definition
  nitro_enclaves: Define the PCI device interface
  nitro_enclaves: Define enclave info for internal bookkeeping
  nitro_enclaves: Init PCI device driver
  nitro_enclaves: Handle PCI device command requests
  nitro_enclaves: Handle out-of-band PCI device events
  nitro_enclaves: Init misc device providing the ioctl interface
  nitro_enclaves: Add logic for enclave vm creation
  nitro_enclaves: Add logic for enclave vcpu creation
  nitro_enclaves: Add logic for enclave image load info
  nitro_enclaves: Add logic for enclave memory region set
  nitro_enclaves: Add logic for enclave start
  nitro_enclaves: Add logic for enclave termination
  nitro_enclaves: Add Kconfig for the Nitro Enclaves driver
  nitro_enclaves: Add Makefile for the Nitro Enclaves driver
  nitro_enclaves: Add sample for ioctl interface usage
  nitro_enclaves: Add overview documentation
  MAINTAINERS: Add entry for the Nitro Enclaves driver

 Documentation/nitro_enclaves/ne_overview.rst  |   87 ++
 .../userspace-api/ioctl/ioctl-number.rst      |    5 +-
 MAINTAINERS                                   |   13 +
 drivers/virt/Kconfig                          |    2 +
 drivers/virt/Makefile                         |    2 +
 drivers/virt/nitro_enclaves/Kconfig           |   16 +
 drivers/virt/nitro_enclaves/Makefile          |   11 +
 drivers/virt/nitro_enclaves/ne_misc_dev.c     | 1364 +++++++++++++++++
 drivers/virt/nitro_enclaves/ne_misc_dev.h     |  115 ++
 drivers/virt/nitro_enclaves/ne_pci_dev.c      |  626 ++++++++
 drivers/virt/nitro_enclaves/ne_pci_dev.h      |  264 ++++
 include/linux/nitro_enclaves.h                |   11 +
 include/uapi/linux/nitro_enclaves.h           |  137 ++
 samples/nitro_enclaves/.gitignore             |    2 +
 samples/nitro_enclaves/Makefile               |   16 +
 samples/nitro_enclaves/ne_ioctl_sample.c      |  520 +++++++
 16 files changed, 3190 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/nitro_enclaves/ne_overview.rst
 create mode 100644 drivers/virt/nitro_enclaves/Kconfig
 create mode 100644 drivers/virt/nitro_enclaves/Makefile
 create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.c
 create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.h
 create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.c
 create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.h
 create mode 100644 include/linux/nitro_enclaves.h
 create mode 100644 include/uapi/linux/nitro_enclaves.h
 create mode 100644 samples/nitro_enclaves/.gitignore
 create mode 100644 samples/nitro_enclaves/Makefile
 create mode 100644 samples/nitro_enclaves/ne_ioctl_sample.c

-- 
2.20.1 (Apple Git-117)




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in Romania. Registration number J22/2621/2005.



* [PATCH v4 01/18] nitro_enclaves: Add ioctl interface definition
  2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
@ 2020-06-22 20:03 ` Andra Paraschiv
  2020-06-23  8:56   ` Stefan Hajnoczi
  2020-07-02 15:24   ` Alexander Graf
  2020-06-22 20:03 ` [PATCH v4 02/18] nitro_enclaves: Define the PCI device interface Andra Paraschiv
                   ` (16 subsequent siblings)
  17 siblings, 2 replies; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv

The Nitro Enclaves driver handles the enclave lifetime management. This
includes enclave creation, termination and setting up its resources such
as memory and CPU.

An enclave runs alongside the VM that spawned it. It is abstracted as a
process running in the VM that launched it. The process interacts with
the NE driver, which exposes an ioctl interface for creating an enclave
and setting up its resources.

Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
---
Changelog

v3 -> v4

* Decouple NE ioctl interface from KVM API.
* Add NE API version and the corresponding ioctl call.
* Add enclave / image load flags options.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Add ioctl for getting enclave image load metadata.
* Update NE_ENCLAVE_START ioctl name to NE_START_ENCLAVE.
* Add entry in Documentation/userspace-api/ioctl/ioctl-number.rst for NE
  ioctls.
* Update NE ioctls definition based on the updated ioctl range for major
  and minor.
---
 .../userspace-api/ioctl/ioctl-number.rst      |   5 +-
 include/linux/nitro_enclaves.h                |  11 ++
 include/uapi/linux/nitro_enclaves.h           | 137 ++++++++++++++++++
 3 files changed, 152 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/nitro_enclaves.h
 create mode 100644 include/uapi/linux/nitro_enclaves.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 59472cd6a11d..783440c6719b 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -328,8 +328,11 @@ Code  Seq#    Include File                                           Comments
 0xAC  00-1F  linux/raw.h
 0xAD  00                                                             Netfilter device in development:
                                                                      <mailto:rusty@rustcorp.com.au>
-0xAE  all    linux/kvm.h                                             Kernel-based Virtual Machine
+0xAE  00-1F  linux/kvm.h                                             Kernel-based Virtual Machine
                                                                      <mailto:kvm@vger.kernel.org>
+0xAE  40-FF  linux/kvm.h                                             Kernel-based Virtual Machine
+                                                                     <mailto:kvm@vger.kernel.org>
+0xAE  20-3F  linux/nitro_enclaves.h                                  Nitro Enclaves
 0xAF  00-1F  linux/fsl_hypervisor.h                                  Freescale hypervisor
 0xB0  all                                                            RATIO devices in development:
                                                                      <mailto:vgo@ratio.de>
diff --git a/include/linux/nitro_enclaves.h b/include/linux/nitro_enclaves.h
new file mode 100644
index 000000000000..d91ef2bfdf47
--- /dev/null
+++ b/include/linux/nitro_enclaves.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _LINUX_NITRO_ENCLAVES_H_
+#define _LINUX_NITRO_ENCLAVES_H_
+
+#include <uapi/linux/nitro_enclaves.h>
+
+#endif /* _LINUX_NITRO_ENCLAVES_H_ */
diff --git a/include/uapi/linux/nitro_enclaves.h b/include/uapi/linux/nitro_enclaves.h
new file mode 100644
index 000000000000..3270eb939a97
--- /dev/null
+++ b/include/uapi/linux/nitro_enclaves.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _UAPI_LINUX_NITRO_ENCLAVES_H_
+#define _UAPI_LINUX_NITRO_ENCLAVES_H_
+
+#include <linux/types.h>
+
+/* Nitro Enclaves (NE) Kernel Driver Interface */
+
+#define NE_API_VERSION (1)
+
+/**
+ * The command is used to get the version of the NE API. This way the user space
+ * processes can be aware of the feature sets provided by the NE kernel driver.
+ *
+ * The NE API version is returned as result of this ioctl call.
+ */
+#define NE_GET_API_VERSION _IO(0xAE, 0x20)
+
+/**
+ * The command is used to create a slot that is associated with an enclave VM.
+ *
+ * The generated unique slot id is a read parameter of this command. An enclave
+ * file descriptor is returned as result of this ioctl call. The enclave fd can
+ * be further used with ioctl calls to set vCPUs and memory regions, then start
+ * the enclave.
+ */
+#define NE_CREATE_VM _IOR(0xAE, 0x21, __u64)
+
+/**
+ * The command is used to set a vCPU for an enclave. A CPU pool needs to be set
+ * for enclave usage, before calling this function. CPU 0 and its siblings need
+ * to remain available for the primary / parent VM, so they cannot be set for
+ * an enclave.
+ *
+ * The vCPU id is a write / read parameter. If its value is 0, then a CPU is
+ * chosen from the enclave CPU pool and returned via this parameter. A vCPU file
+ * descriptor is returned as result of this ioctl call.
+ */
+#define NE_CREATE_VCPU _IOWR(0xAE, 0x22, __u32)
+
+/**
+ * The command is used to get information needed for in-memory enclave image
+ * loading e.g. offset in enclave memory to start placing the enclave image.
+ *
+ * The image load info is a write / read parameter. It includes info provided
+ * by the caller - flags - and returns the offset in enclave memory where to
+ * start placing the enclave image.
+ */
+#define NE_GET_IMAGE_LOAD_INFO _IOWR(0xAE, 0x23, struct ne_image_load_info)
+
+/**
+ * The command is used to set a memory region for an enclave, given the
+ * allocated memory from the userspace.
+ *
+ * The user memory region is a write parameter. It includes info provided
+ * by the caller - flags, memory size and userspace address.
+ */
+#define NE_SET_USER_MEMORY_REGION _IOW(0xAE, 0x24, struct ne_user_memory_region)
+
+/**
+ * The command is used to trigger enclave start after the enclave resources,
+ * such as memory and CPU, have been set.
+ *
+ * The enclave start info is a write / read parameter. It includes info provided
+ * by the caller - enclave cid and flags - and returns the cid (if input cid is
+ * 0).
+ */
+#define NE_START_ENCLAVE _IOWR(0xAE, 0x25, struct ne_enclave_start_info)
+
+/* Image load info flags */
+
+/* Enclave Image Format (EIF) */
+#define NE_EIF_IMAGE (0x01)
+
+/* Info necessary for in-memory enclave image loading (write / read). */
+struct ne_image_load_info {
+	/**
+	 * Flags to determine the enclave image type (e.g. Enclave Image Format
+	 * - EIF) (write).
+	 */
+	__u64 flags;
+
+	/**
+	 * Offset in enclave memory where to start placing the enclave image
+	 * (read).
+	 */
+	__u64 memory_offset;
+};
+
+/* User memory region flags */
+
+/* Memory region for enclave general usage. */
+#define NE_DEFAULT_MEMORY_REGION (0x00)
+
+/* Memory region to be set for an enclave (write). */
+struct ne_user_memory_region {
+	/**
+	 * Flags to determine the usage for the memory region (write).
+	 */
+	__u64 flags;
+
+	/**
+	 * The size, in bytes, of the memory region to be set for an enclave
+	 * (write).
+	 */
+	__u64 memory_size;
+
+	/**
+	 * The start of the userspace allocated memory of the memory region to
+	 * set for an enclave (write).
+	 */
+	__u64 userspace_addr;
+};
+
+/* Enclave start info flags */
+
+/* Start enclave in debug mode. */
+#define NE_ENCLAVE_DEBUG_MODE (0x01)
+
+/* Setup info necessary for enclave start (write / read). */
+struct ne_enclave_start_info {
+	/* Flags for the enclave to start with (e.g. debug mode) (write). */
+	__u64 flags;
+
+	/**
+	 * Context ID (CID) for the enclave vsock device. If 0 as input, the
+	 * CID is autogenerated by the hypervisor and returned back as output
+	 * by the driver (write / read).
+	 */
+	__u64 enclave_cid;
+};
+
+#endif /* _UAPI_LINUX_NITRO_ENCLAVES_H_ */
-- 
2.20.1 (Apple Git-117)
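
To see how the ioctl numbers defined in this patch fit the 0xAE 0x20-0x3F range
that the ioctl-number.rst hunk assigns to NE, the numbers can be decoded with
the standard _IOC_* macros. This is an illustrative sketch only: the
NE_SKETCH_* macros are local copies of three definitions from the patch and
ne_sketch_is_ne_ioctl is a hypothetical helper, not part of the proposed UAPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Local copies of three definitions from the patch above. */
#define NE_SKETCH_GET_API_VERSION _IO(0xAE, 0x20)
#define NE_SKETCH_CREATE_VM _IOR(0xAE, 0x21, __u64)
#define NE_SKETCH_CREATE_VCPU _IOWR(0xAE, 0x22, __u32)

/* True if cmd uses type 0xAE and a sequence number in the NE range. */
bool ne_sketch_is_ne_ioctl(unsigned int cmd)
{
	return _IOC_TYPE(cmd) == 0xAE &&
	       _IOC_NR(cmd) >= 0x20 && _IOC_NR(cmd) <= 0x3F;
}

/* Direction and size are encoded in the number alongside type and nr. */
void ne_sketch_ioctl_example(void)
{
	/* No argument: version is the ioctl return value. */
	assert(_IOC_DIR(NE_SKETCH_GET_API_VERSION) == _IOC_NONE);

	/* Slot uid is read back by user space as a __u64. */
	assert(_IOC_DIR(NE_SKETCH_CREATE_VM) == _IOC_READ);
	assert(_IOC_SIZE(NE_SKETCH_CREATE_VM) == sizeof(__u64));

	/* vCPU id goes in and comes back as a __u32. */
	assert(_IOC_DIR(NE_SKETCH_CREATE_VCPU) == (_IOC_READ | _IOC_WRITE));
	assert(_IOC_SIZE(NE_SKETCH_CREATE_VCPU) == sizeof(__u32));
}
```

Sharing the 0xAE type with KVM is what makes the range split in
ioctl-number.rst necessary; a number such as _IO(0xAE, 0x00) stays on the KVM
side of that split.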





* [PATCH v4 02/18] nitro_enclaves: Define the PCI device interface
  2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
  2020-06-22 20:03 ` [PATCH v4 01/18] nitro_enclaves: Add ioctl interface definition Andra Paraschiv
@ 2020-06-22 20:03 ` Andra Paraschiv
  2020-07-02 15:24   ` Alexander Graf
  2020-06-22 20:03 ` [PATCH v4 03/18] nitro_enclaves: Define enclave info for internal bookkeeping Andra Paraschiv
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv

The Nitro Enclaves (NE) driver communicates with a new PCI device that
is exposed to a virtual machine (VM) and handles commands for managing
the enclave lifetime, e.g. creation, termination, setting memory
regions. The communication with the PCI device is handled using an MMIO
space and MSI-X interrupts.

This device communicates with the hypervisor on the host where the VM
that spawned the enclave runs, e.g. to launch a VM that is used
for the enclave.

Define the MMIO space of the PCI device and the commands that are
provided by this device. Add an internal data structure used as private
data for the PCI device driver and the functions for the PCI device init
/ uninit and command requests handling.

Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
Signed-off-by: Alexandru Ciobotaru <alcioa@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
---
Changelog

v3 -> v4

* Remove the "packed" attribute and include padding in the NE data
  structures.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Update path naming to drivers/virt/nitro_enclaves.
* Update NE_ENABLE_OFF / NE_ENABLE_ON defines.
---
 drivers/virt/nitro_enclaves/ne_pci_dev.h | 264 +++++++++++++++++++++++
 1 file changed, 264 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.h

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.h b/drivers/virt/nitro_enclaves/ne_pci_dev.h
new file mode 100644
index 000000000000..0582584828b7
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.h
@@ -0,0 +1,264 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _NE_PCI_DEV_H_
+#define _NE_PCI_DEV_H_
+
+#include <linux/atomic.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/pci_ids.h>
+#include <linux/wait.h>
+
+/* Nitro Enclaves (NE) PCI device identifier */
+
+#define PCI_DEVICE_ID_NE (0xe4c1)
+#define PCI_BAR_NE (0x03)
+
+/* Device registers */
+
+/**
+ * (1 byte) Register to notify the device that the driver is using it
+ * (Read/Write).
+ */
+#define NE_ENABLE (0x0000)
+#define NE_ENABLE_OFF (0x00)
+#define NE_ENABLE_ON (0x01)
+
+/* (2 bytes) Register to select the device run-time version (Read/Write). */
+#define NE_VERSION (0x0002)
+#define NE_VERSION_MAX (0x0001)
+
+/**
+ * (4 bytes) Register to notify the device what command was requested
+ * (Write-Only).
+ */
+#define NE_COMMAND (0x0004)
+
+/**
+ * (4 bytes) Register to notify the driver that a reply or a device event
+ * is available (Read-Only):
+ * - Lower half  - command reply counter
+ * - Higher half - out-of-band device event counter
+ */
+#define NE_EVTCNT (0x000c)
+#define NE_EVTCNT_REPLY_SHIFT (0)
+#define NE_EVTCNT_REPLY_MASK (0x0000ffff)
+#define NE_EVTCNT_REPLY(cnt) (((cnt) & NE_EVTCNT_REPLY_MASK) >> \
+			      NE_EVTCNT_REPLY_SHIFT)
+#define NE_EVTCNT_EVENT_SHIFT (16)
+#define NE_EVTCNT_EVENT_MASK (0xffff0000)
+#define NE_EVTCNT_EVENT(cnt) (((cnt) & NE_EVTCNT_EVENT_MASK) >> \
+			      NE_EVTCNT_EVENT_SHIFT)
+
+/* (240 bytes) Buffer for sending the command request payload (Read/Write). */
+#define NE_SEND_DATA (0x0010)
+
+/* (240 bytes) Buffer for receiving the command reply payload (Read-Only). */
+#define NE_RECV_DATA (0x0100)
+
+/* Device MMIO buffer sizes */
+
+/* 240 bytes for send / recv buffer. */
+#define NE_SEND_DATA_SIZE (240)
+#define NE_RECV_DATA_SIZE (240)
+
+/* MSI-X interrupt vectors */
+
+/* MSI-X vector used for command reply notification. */
+#define NE_VEC_REPLY (0)
+
+/* MSI-X vector used for out-of-band events e.g. enclave crash. */
+#define NE_VEC_EVENT (1)
+
+/* Device command types. */
+enum ne_pci_dev_cmd_type {
+	INVALID_CMD = 0,
+	ENCLAVE_START = 1,
+	ENCLAVE_GET_SLOT = 2,
+	ENCLAVE_STOP = 3,
+	SLOT_ALLOC = 4,
+	SLOT_FREE = 5,
+	SLOT_ADD_MEM = 6,
+	SLOT_ADD_VCPU = 7,
+	SLOT_COUNT = 8,
+	NEXT_SLOT = 9,
+	SLOT_INFO = 10,
+	SLOT_ADD_BULK_VCPUS = 11,
+	MAX_CMD,
+};
+
+/* Device commands - payload structure for requests and replies. */
+
+struct enclave_start_req {
+	/* Slot unique id mapped to the enclave to start. */
+	u64 slot_uid;
+
+	/**
+	 * Context ID (CID) for the enclave vsock device.
+	 * If 0, CID is autogenerated.
+	 */
+	u64 enclave_cid;
+
+	/* Flags for the enclave to start with (e.g. debug mode). */
+	u64 flags;
+};
+
+struct enclave_get_slot_req {
+	/* Context ID (CID) for the enclave vsock device. */
+	u64 enclave_cid;
+};
+
+struct enclave_stop_req {
+	/* Slot unique id mapped to the enclave to stop. */
+	u64 slot_uid;
+};
+
+struct slot_alloc_req {
+	/* In order to avoid weird sizeof edge cases. */
+	u8 unused;
+};
+
+struct slot_free_req {
+	/* Slot unique id mapped to the slot to free. */
+	u64 slot_uid;
+};
+
+struct slot_add_mem_req {
+	/* Slot unique id mapped to the slot to add the memory region to. */
+	u64 slot_uid;
+
+	/* Physical address of the memory region to add to the slot. */
+	u64 paddr;
+
+	/* Memory size, in bytes, of the memory region to add to the slot. */
+	u64 size;
+};
+
+struct slot_add_vcpu_req {
+	/* Slot unique id mapped to the slot to add the vCPU to. */
+	u64 slot_uid;
+
+	/* vCPU ID of the CPU to add to the enclave. */
+	u32 vcpu_id;
+
+	u8 padding[4];
+};
+
+struct slot_count_req {
+	/* In order to avoid weird sizeof edge cases. */
+	u8 unused;
+};
+
+struct next_slot_req {
+	/* Slot unique id of the next slot in the iteration. */
+	u64 slot_uid;
+};
+
+struct slot_info_req {
+	/* Slot unique id mapped to the slot to get information about. */
+	u64 slot_uid;
+};
+
+struct slot_add_bulk_vcpus_req {
+	/* Slot unique id mapped to the slot to add vCPUs to. */
+	u64 slot_uid;
+
+	/* Number of vCPUs to add to the slot. */
+	u64 nr_vcpus;
+};
+
+struct ne_pci_dev_cmd_reply {
+	/* Return code of the functionality that processed the request. */
+	s32 rc;
+
+	u8 padding0[4];
+
+	/* Valid for all commands except SLOT_COUNT. */
+	u64 slot_uid;
+
+	/* Valid for ENCLAVE_START command. */
+	u64 enclave_cid;
+
+	/* Valid for SLOT_COUNT command. */
+	u64 slot_count;
+
+	/* Valid for SLOT_ALLOC and SLOT_INFO commands. */
+	u64 mem_regions;
+
+	/* Valid for SLOT_INFO command. */
+	u64 mem_size;
+
+	/* Valid for SLOT_INFO command. */
+	u64 nr_vcpus;
+
+	/* Valid for SLOT_INFO command. */
+	u64 flags;
+
+	/* Valid for SLOT_INFO command. */
+	u16 state;
+
+	u8 padding1[6];
+};
+
+/* Nitro Enclaves (NE) PCI device. */
+struct ne_pci_dev {
+	/* Variable set if a reply has been sent by the PCI device. */
+	atomic_t cmd_reply_avail;
+
+	/* Wait queue for handling command reply from the PCI device. */
+	wait_queue_head_t cmd_reply_wait_q;
+
+	/* List of the enclaves managed by the PCI device. */
+	struct list_head enclaves_list;
+
+	/* Mutex for accessing the list of enclaves. */
+	struct mutex enclaves_list_mutex;
+
+	/**
+	 * Work queue for handling out-of-band events triggered by the Nitro
+	 * Hypervisor which require enclave state scanning and propagation to
+	 * the enclave process.
+	 */
+	struct workqueue_struct *event_wq;
+
+	/* MMIO region of the PCI device. */
+	void __iomem *iomem_base;
+
+	/* Work item for every received out-of-band event. */
+	struct work_struct notify_work;
+
+	/* Mutex for accessing the PCI device MMIO space. */
+	struct mutex pci_dev_mutex;
+
+	/* PCI device data structure. */
+	struct pci_dev *pdev;
+};
+
+/**
+ * ne_do_request - Submit command request to the PCI device based on the command
+ * type and retrieve the associated reply.
+ *
+ * This function uses the ne_pci_dev mutex to handle one command at a time.
+ *
+ * @pdev: PCI device to send the command to and receive the reply from.
+ * @cmd_type: command type of the request sent to the PCI device.
+ * @cmd_request: command request payload.
+ * @cmd_request_size: size of the command request payload.
+ * @cmd_reply: command reply payload.
+ * @cmd_reply_size: size of the command reply payload.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+int ne_do_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type,
+		  void *cmd_request, size_t cmd_request_size,
+		  struct ne_pci_dev_cmd_reply *cmd_reply,
+		  size_t cmd_reply_size);
+
+/* Nitro Enclaves (NE) PCI device driver */
+extern struct pci_driver ne_pci_driver;
+
+#endif /* _NE_PCI_DEV_H_ */
-- 
2.20.1 (Apple Git-117)
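
The NE_EVTCNT register packs two 16-bit counters into one 32-bit value. The
sketch below copies the relevant macros from ne_pci_dev.h into a user space
program to show the decoding and to check that the send buffer ends exactly
where the receive buffer begins. The ne_sketch_* names are hypothetical, and
comparing counters with wrapping 16-bit subtraction is an assumption about how
a consumer might detect new replies or events, not a description of the
driver.

```c
#include <assert.h>
#include <stdint.h>

/* Copies of the register macros from ne_pci_dev.h in the patch above. */
#define NE_SEND_DATA (0x0010)
#define NE_RECV_DATA (0x0100)
#define NE_SEND_DATA_SIZE (240)
#define NE_EVTCNT_REPLY_SHIFT (0)
#define NE_EVTCNT_REPLY_MASK (0x0000ffff)
#define NE_EVTCNT_REPLY(cnt) (((cnt) & NE_EVTCNT_REPLY_MASK) >> \
			      NE_EVTCNT_REPLY_SHIFT)
#define NE_EVTCNT_EVENT_SHIFT (16)
#define NE_EVTCNT_EVENT_MASK (0xffff0000)
#define NE_EVTCNT_EVENT(cnt) (((cnt) & NE_EVTCNT_EVENT_MASK) >> \
			      NE_EVTCNT_EVENT_SHIFT)

/* 0x0010 + 240 == 0x0100: the send buffer abuts the receive buffer. */
_Static_assert(NE_SEND_DATA + NE_SEND_DATA_SIZE == NE_RECV_DATA,
	       "send buffer must end where the receive buffer starts");

struct ne_sketch_evtcnt {
	uint16_t reply; /* lower half: command reply counter */
	uint16_t event; /* higher half: out-of-band event counter */
};

/* Split a raw NE_EVTCNT value into its two counters. */
struct ne_sketch_evtcnt ne_sketch_decode_evtcnt(uint32_t raw)
{
	struct ne_sketch_evtcnt c = {
		.reply = (uint16_t)NE_EVTCNT_REPLY(raw),
		.event = (uint16_t)NE_EVTCNT_EVENT(raw),
	};

	return c;
}

/* Number of increments between two counter reads, tolerant of wrap-around. */
uint16_t ne_sketch_counter_delta(uint16_t prev, uint16_t cur)
{
	return (uint16_t)(cur - prev);
}

void ne_sketch_evtcnt_example(void)
{
	struct ne_sketch_evtcnt c = ne_sketch_decode_evtcnt(0x00030005);

	assert(c.reply == 5);
	assert(c.event == 3);
	/* 0xfffe -> 0xffff -> 0x0000 -> 0x0001 is three increments. */
	assert(ne_sketch_counter_delta(0xfffe, 0x0001) == 3);
}
```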





* [PATCH v4 03/18] nitro_enclaves: Define enclave info for internal bookkeeping
  2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
  2020-06-22 20:03 ` [PATCH v4 01/18] nitro_enclaves: Add ioctl interface definition Andra Paraschiv
  2020-06-22 20:03 ` [PATCH v4 02/18] nitro_enclaves: Define the PCI device interface Andra Paraschiv
@ 2020-06-22 20:03 ` Andra Paraschiv
  2020-07-02 15:24   ` Alexander Graf
  2020-06-22 20:03 ` [PATCH v4 04/18] nitro_enclaves: Init PCI device driver Andra Paraschiv
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv

The Nitro Enclaves driver keeps internal info for each enclave.

This is needed to be able to manage enclave resources state, enclave
notifications and have a reference of the PCI device that handles
command requests for enclave lifetime management.

Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
---
Changelog

v3 -> v4

* Add NUMA node field for an enclave metadata as the enclave memory and
  CPUs need to be from the same NUMA node.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Add enclave memory regions and vcpus count for enclave bookkeeping.
* Update ne_state comments to reflect NE_START_ENCLAVE ioctl naming
  update.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.h | 115 ++++++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.h

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.h b/drivers/virt/nitro_enclaves/ne_misc_dev.h
new file mode 100644
index 000000000000..58eb9884379f
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.h
@@ -0,0 +1,115 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#ifndef _NE_MISC_DEV_H_
+#define _NE_MISC_DEV_H_
+
+#include <linux/cpumask.h>
+#include <linux/list.h>
+#include <linux/miscdevice.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/wait.h>
+
+/* Entry in vCPU IDs list. */
+struct ne_vcpu_id {
+	/* CPU id associated with a given slot, apic id on x86. */
+	u32 vcpu_id;
+
+	struct list_head vcpu_id_list_entry;
+};
+
+/* Entry in memory regions list. */
+struct ne_mem_region {
+	struct list_head mem_region_list_entry;
+
+	/* Number of pages that make up the memory region. */
+	unsigned long nr_pages;
+
+	/* Pages that make up the user space memory region. */
+	struct page **pages;
+};
+
+/* Per-enclave data used for enclave lifetime management. */
+struct ne_enclave {
+	/**
+	 * CPU pool with siblings of already allocated CPUs to an enclave.
+	 * This is used when a CPU pool is set, to be able to know the CPU
+	 * siblings for the hyperthreading (HT) setup.
+	 */
+	cpumask_var_t cpu_siblings;
+
+	struct list_head enclave_list_entry;
+
+	/* Mutex for accessing this internal state. */
+	struct mutex enclave_info_mutex;
+
+	/**
+	 * Wait queue used for out-of-band event notifications
+	 * triggered from the PCI device event handler to the enclave
+	 * process via the poll function.
+	 */
+	wait_queue_head_t eventq;
+
+	/* Variable used to determine if the out-of-band event was triggered. */
+	bool has_event;
+
+	/**
+	 * The maximum number of memory regions that can be handled by the
+	 * lower levels.
+	 */
+	u64 max_mem_regions;
+
+	/* Enclave memory regions list. */
+	struct list_head mem_regions_list;
+
+	/* Enclave memory size. */
+	u64 mem_size;
+
+	/* Enclave process abstraction mm data struct. */
+	struct mm_struct *mm;
+
+	/* Number of memory regions associated with the enclave. */
+	u64 nr_mem_regions;
+
+	/* Number of vcpus associated with the enclave. */
+	u64 nr_vcpus;
+
+	/* NUMA node of the enclave memory and CPUs. */
+	u32 numa_node;
+
+	/* PCI device used for enclave lifetime management. */
+	struct pci_dev *pdev;
+
+	/* Slot unique id mapped to the enclave. */
+	u64 slot_uid;
+
+	/* Enclave state, updated during enclave lifetime. */
+	u16 state;
+
+	/* Enclave vCPUs list. */
+	struct list_head vcpu_ids_list;
+};
+
+/* States available for an enclave. */
+enum ne_state {
+	/* NE_START_ENCLAVE ioctl was never issued for the enclave. */
+	NE_STATE_INIT = 0,
+
+	/**
+	 * NE_START_ENCLAVE ioctl was issued and the enclave is running
+	 * as expected.
+	 */
+	NE_STATE_RUNNING = 2,
+
+	/* Enclave exited without userspace interaction. */
+	NE_STATE_STOPPED = U16_MAX,
+};
+
+/* Nitro Enclaves (NE) misc device */
+extern struct miscdevice ne_misc_dev;
+
+#endif /* _NE_MISC_DEV_H_ */
-- 
2.20.1 (Apple Git-117)




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in Romania. Registration number J22/2621/2005.



* [PATCH v4 04/18] nitro_enclaves: Init PCI device driver
  2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
                   ` (2 preceding siblings ...)
  2020-06-22 20:03 ` [PATCH v4 03/18] nitro_enclaves: Define enclave info for internal bookkeeping Andra Paraschiv
@ 2020-06-22 20:03 ` Andra Paraschiv
  2020-07-02 15:09   ` Alexander Graf
  2020-06-22 20:03 ` [PATCH v4 05/18] nitro_enclaves: Handle PCI device command requests Andra Paraschiv
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv

The Nitro Enclaves PCI device is used by the kernel driver as a means of
communication with the hypervisor on the host where the primary VM and
the enclaves run. It handles requests with regard to enclave lifetime.

Set up the PCI device driver and add support for MSI-X interrupts.
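
For readers skimming the diff: ne_pci_dev_disable() below boils down to a
register write followed by a bounded read-back poll. A minimal userspace
sketch of that pattern, assuming a simulated register (struct fake_dev and
its helpers are hypothetical and not part of the driver; the real code
sleeps 10 ms between reads, the sketch only counts reads):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the NE_ENABLE register; not the real device. */
struct fake_dev {
	uint8_t enable;		/* value the device currently reports */
	uint8_t pending;	/* value most recently written by the driver */
	unsigned int latency;	/* reads left before a write takes effect */
};

void fake_write_enable(struct fake_dev *dev, uint8_t val)
{
	dev->pending = val;
}

uint8_t fake_read_enable(struct fake_dev *dev)
{
	/* The simulated device applies the write only after a few reads. */
	if (dev->latency > 0)
		dev->latency--;
	else
		dev->enable = dev->pending;

	return dev->enable;
}

/*
 * Write val, then read back at most max_polls times until the device
 * reports it. Returns 0 once confirmed, -1 if the poll budget runs out.
 */
int set_enable_and_confirm(struct fake_dev *dev, uint8_t val,
			   unsigned int max_polls)
{
	unsigned int i;

	assert(dev);

	fake_write_enable(dev, val);

	for (i = 0; i < max_polls; i++) {
		if (fake_read_enable(dev) == val)
			return 0;
	}

	return -1;
}
```

With latency = 3 the simulated device confirms the new value on the fourth
read, so a budget of two reads fails while a budget of ten succeeds.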

Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
Signed-off-by: Alexandru Ciobotaru <alcioa@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
---
Changelog

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Update NE PCI driver name to "nitro_enclaves".

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.
* Remove the WARN_ON calls.
* Remove linux/bug include that is not needed.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Update PCI device setup functions to receive PCI device data structure and
  then get private data from it inside the functions logic.
* Remove the BUG_ON calls.
* Add teardown function for MSI-X setup.
* Update goto labels to match their purpose.
* Implement TODO for NE PCI device disable state check.
* Update function name for NE PCI device probe / remove.
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 261 +++++++++++++++++++++++
 1 file changed, 261 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.c

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
new file mode 100644
index 000000000000..235fa3ecbee2
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -0,0 +1,261 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+/* Nitro Enclaves (NE) PCI device driver. */
+
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/module.h>
+#include <linux/nitro_enclaves.h>
+#include <linux/pci.h>
+#include <linux/types.h>
+#include <linux/wait.h>
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+#define NE_DEFAULT_TIMEOUT_MSECS (120000) /* 120 sec */
+
+static const struct pci_device_id ne_pci_ids[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_NE) },
+	{ 0, }
+};
+
+MODULE_DEVICE_TABLE(pci, ne_pci_ids);
+
+/**
+ * ne_setup_msix - Setup MSI-X vectors for the PCI device.
+ *
+ * @pdev: PCI device to setup the MSI-X for.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_setup_msix(struct pci_dev *pdev)
+{
+	struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+	int nr_vecs = 0;
+	int rc = -EINVAL;
+
+	if (!ne_pci_dev)
+		return -EINVAL;
+
+	nr_vecs = pci_msix_vec_count(pdev);
+	if (nr_vecs < 0) {
+		rc = nr_vecs;
+
+		dev_err(&pdev->dev, "Error in getting vec count [rc=%d]\n", rc);
+
+		return rc;
+	}
+
+	rc = pci_alloc_irq_vectors(pdev, nr_vecs, nr_vecs, PCI_IRQ_MSIX);
+	if (rc < 0) {
+		dev_err(&pdev->dev, "Error in alloc MSI-X vecs [rc=%d]\n", rc);
+
+		return rc;
+	}
+
+	return 0;
+}
+
+/**
+ * ne_teardown_msix - Teardown MSI-X vectors for the PCI device.
+ *
+ * @pdev: PCI device to teardown the MSI-X for.
+ */
+static void ne_teardown_msix(struct pci_dev *pdev)
+{
+	struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+	if (!ne_pci_dev)
+		return;
+
+	pci_free_irq_vectors(pdev);
+}
+
+/**
+ * ne_pci_dev_enable - Select PCI device version and enable it.
+ *
+ * @pdev: PCI device to select version for and then enable.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_pci_dev_enable(struct pci_dev *pdev)
+{
+	u8 dev_enable_reply = 0;
+	u16 dev_version_reply = 0;
+	struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+	if (!ne_pci_dev || !ne_pci_dev->iomem_base)
+		return -EINVAL;
+
+	iowrite16(NE_VERSION_MAX, ne_pci_dev->iomem_base + NE_VERSION);
+
+	dev_version_reply = ioread16(ne_pci_dev->iomem_base + NE_VERSION);
+	if (dev_version_reply != NE_VERSION_MAX) {
+		dev_err(&pdev->dev, "Error in pci dev version cmd\n");
+
+		return -EIO;
+	}
+
+	iowrite8(NE_ENABLE_ON, ne_pci_dev->iomem_base + NE_ENABLE);
+
+	dev_enable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE);
+	if (dev_enable_reply != NE_ENABLE_ON) {
+		dev_err(&pdev->dev, "Error in pci dev enable cmd\n");
+
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/**
+ * ne_pci_dev_disable - Disable PCI device.
+ *
+ * @pdev: PCI device to disable.
+ */
+static void ne_pci_dev_disable(struct pci_dev *pdev)
+{
+	u8 dev_disable_reply = 0;
+	struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+	const unsigned int sleep_time = 10; /* 10 ms */
+	unsigned int sleep_time_count = 0;
+
+	if (!ne_pci_dev || !ne_pci_dev->iomem_base)
+		return;
+
+	iowrite8(NE_ENABLE_OFF, ne_pci_dev->iomem_base + NE_ENABLE);
+
+	/*
+	 * Check for NE_ENABLE_OFF in a loop, to handle cases when the device
+	 * state is not immediately set to disabled and goes through a
+	 * transitory disabling state.
+	 */
+	while (sleep_time_count < NE_DEFAULT_TIMEOUT_MSECS) {
+		dev_disable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE);
+		if (dev_disable_reply == NE_ENABLE_OFF)
+			return;
+
+		msleep_interruptible(sleep_time);
+		sleep_time_count += sleep_time;
+	}
+
+	dev_disable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE);
+	if (dev_disable_reply != NE_ENABLE_OFF)
+		dev_err(&pdev->dev, "Error in pci dev disable cmd\n");
+}
+
+static int ne_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	struct ne_pci_dev *ne_pci_dev = NULL;
+	int rc = -EINVAL;
+
+	ne_pci_dev = kzalloc(sizeof(*ne_pci_dev), GFP_KERNEL);
+	if (!ne_pci_dev)
+		return -ENOMEM;
+
+	rc = pci_enable_device(pdev);
+	if (rc < 0) {
+		dev_err(&pdev->dev, "Error in pci dev enable [rc=%d]\n", rc);
+
+		goto free_ne_pci_dev;
+	}
+
+	rc = pci_request_regions_exclusive(pdev, "ne_pci_dev");
+	if (rc < 0) {
+		dev_err(&pdev->dev, "Error in pci request regions [rc=%d]\n",
+			rc);
+
+		goto disable_pci_dev;
+	}
+
+	ne_pci_dev->iomem_base = pci_iomap(pdev, PCI_BAR_NE, 0);
+	if (!ne_pci_dev->iomem_base) {
+		rc = -ENOMEM;
+
+		dev_err(&pdev->dev, "Error in pci iomap [rc=%d]\n", rc);
+
+		goto release_pci_regions;
+	}
+
+	pci_set_drvdata(pdev, ne_pci_dev);
+
+	rc = ne_setup_msix(pdev);
+	if (rc < 0) {
+		dev_err(&pdev->dev, "Error in pci dev msix setup [rc=%d]\n",
+			rc);
+
+		goto iounmap_pci_bar;
+	}
+
+	ne_pci_dev_disable(pdev);
+
+	rc = ne_pci_dev_enable(pdev);
+	if (rc < 0) {
+		dev_err(&pdev->dev, "Error in ne_pci_dev enable [rc=%d]\n",
+			rc);
+
+		goto teardown_msix;
+	}
+
+	atomic_set(&ne_pci_dev->cmd_reply_avail, 0);
+	init_waitqueue_head(&ne_pci_dev->cmd_reply_wait_q);
+	INIT_LIST_HEAD(&ne_pci_dev->enclaves_list);
+	mutex_init(&ne_pci_dev->enclaves_list_mutex);
+	mutex_init(&ne_pci_dev->pci_dev_mutex);
+	ne_pci_dev->pdev = pdev;
+
+	return 0;
+
+teardown_msix:
+	ne_teardown_msix(pdev);
+iounmap_pci_bar:
+	pci_set_drvdata(pdev, NULL);
+	pci_iounmap(pdev, ne_pci_dev->iomem_base);
+release_pci_regions:
+	pci_release_regions(pdev);
+disable_pci_dev:
+	pci_disable_device(pdev);
+free_ne_pci_dev:
+	kfree(ne_pci_dev);
+
+	return rc;
+}
+
+static void ne_pci_remove(struct pci_dev *pdev)
+{
+	struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+	if (!ne_pci_dev || !ne_pci_dev->iomem_base)
+		return;
+
+	ne_pci_dev_disable(pdev);
+
+	ne_teardown_msix(pdev);
+
+	pci_set_drvdata(pdev, NULL);
+
+	pci_iounmap(pdev, ne_pci_dev->iomem_base);
+
+	pci_release_regions(pdev);
+
+	pci_disable_device(pdev);
+
+	kfree(ne_pci_dev);
+}
+
+/*
+ * TODO: Add suspend / resume functions for power management w/ CONFIG_PM, if
+ * needed.
+ */
+struct pci_driver ne_pci_driver = {
+	.name		= "nitro_enclaves",
+	.id_table	= ne_pci_ids,
+	.probe		= ne_pci_probe,
+	.remove		= ne_pci_remove,
+};
-- 
2.20.1 (Apple Git-117)







* [PATCH v4 05/18] nitro_enclaves: Handle PCI device command requests
  2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
                   ` (3 preceding siblings ...)
  2020-06-22 20:03 ` [PATCH v4 04/18] nitro_enclaves: Init PCI device driver Andra Paraschiv
@ 2020-06-22 20:03 ` Andra Paraschiv
  2020-07-02 15:19   ` Alexander Graf
  2020-06-22 20:03 ` [PATCH v4 06/18] nitro_enclaves: Handle out-of-band PCI device events Andra Paraschiv
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv, kbuild test robot

The Nitro Enclaves PCI device exposes an MMIO space that this driver
uses to submit command requests and to receive command replies, e.g. for
enclave creation / termination or setting enclave resources.

Add logic for handling PCI device command requests based on the given
command type.

Register an MSI-X interrupt vector for command reply notifications to
handle this type of communication event.
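
The request / reply flow described above can be sketched in userspace
roughly as follows. This is an illustration under stated assumptions, not
the driver code: the buffer sizes, the command IDs, struct demo_reply and
the simulated device (demo_device_process(), which stands in for the
device plus the reply interrupt) are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes and command IDs; the real ones are in ne_pci_dev.h. */
#define DEMO_SEND_DATA_SIZE 240
#define DEMO_RECV_DATA_SIZE 240

enum demo_cmd_type { DEMO_INVALID_CMD = 0, DEMO_ECHO_CMD, DEMO_MAX_CMD };

struct demo_reply {
	int32_t rc;
	uint64_t value;
};

/* Simulated MMIO window: send area, receive area, command register. */
struct demo_mmio {
	uint8_t send[DEMO_SEND_DATA_SIZE];
	uint8_t recv[DEMO_RECV_DATA_SIZE];
	uint32_t command;
	int reply_avail;
};

/* Simulated device: echoes the first 8 request bytes back as the value. */
void demo_device_process(struct demo_mmio *mmio)
{
	struct demo_reply reply;

	assert(mmio);

	memset(&reply, 0, sizeof(reply));
	memcpy(&reply.value, mmio->send, sizeof(reply.value));
	memcpy(mmio->recv, &reply, sizeof(reply));
	mmio->reply_avail = 1;
}

/* Validate, submit, wait for the reply flag, then copy the reply out. */
int demo_do_request(struct demo_mmio *mmio, enum demo_cmd_type cmd_type,
		    const void *req, size_t req_size,
		    struct demo_reply *reply, size_t reply_size)
{
	if (!mmio)
		return -ENODEV;
	if (cmd_type <= DEMO_INVALID_CMD || cmd_type >= DEMO_MAX_CMD)
		return -EINVAL;
	if (!req || req_size > DEMO_SEND_DATA_SIZE)
		return -EINVAL;
	if (!reply || reply_size > DEMO_RECV_DATA_SIZE ||
	    reply_size > sizeof(*reply))
		return -EINVAL;

	mmio->reply_avail = 0;
	memcpy(mmio->send, req, req_size);
	mmio->command = cmd_type;

	demo_device_process(mmio);

	if (!mmio->reply_avail)
		return -ETIMEDOUT;

	memcpy(reply, mmio->recv, reply_size);
	mmio->reply_avail = 0;

	return reply->rc < 0 ? reply->rc : 0;
}
```

All argument checks run before anything is written to the send area, so a
rejected request leaves the simulated device state untouched.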

Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>

Fix issue reported in:
https://lore.kernel.org/lkml/202004231644.xTmN4Z1z%25lkp@intel.com/

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
---
Changelog

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Return IRQ_NONE when interrupts are not handled.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.

v1 -> v2

* Add log pattern for NE.
* Remove the BUG_ON calls.
* Update goto labels to match their purpose.
* Add fix for kbuild report.
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 232 +++++++++++++++++++++++
 1 file changed, 232 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
index 235fa3ecbee2..c24230cfe7c0 100644
--- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -27,6 +27,218 @@ static const struct pci_device_id ne_pci_ids[] = {
 
 MODULE_DEVICE_TABLE(pci, ne_pci_ids);
 
+/**
+ * ne_submit_request - Submit command request to the PCI device based on the
+ * command type.
+ *
+ * This function gets called with the ne_pci_dev mutex held.
+ *
+ * @pdev: PCI device to send the command to.
+ * @cmd_type: command type of the request sent to the PCI device.
+ * @cmd_request: command request payload.
+ * @cmd_request_size: size of the command request payload.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_submit_request(struct pci_dev *pdev,
+			     enum ne_pci_dev_cmd_type cmd_type,
+			     void *cmd_request, size_t cmd_request_size)
+{
+	struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+	if (!ne_pci_dev || !ne_pci_dev->iomem_base)
+		return -EINVAL;
+
+	memcpy_toio(ne_pci_dev->iomem_base + NE_SEND_DATA, cmd_request,
+		    cmd_request_size);
+
+	iowrite32(cmd_type, ne_pci_dev->iomem_base + NE_COMMAND);
+
+	return 0;
+}
+
+/**
+ * ne_retrieve_reply - Retrieve reply from the PCI device.
+ *
+ * This function gets called with the ne_pci_dev mutex held.
+ *
+ * @pdev: PCI device to receive the reply from.
+ * @cmd_reply: command reply payload.
+ * @cmd_reply_size: size of the command reply payload.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_retrieve_reply(struct pci_dev *pdev,
+			     struct ne_pci_dev_cmd_reply *cmd_reply,
+			     size_t cmd_reply_size)
+{
+	struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+	if (!ne_pci_dev || !ne_pci_dev->iomem_base)
+		return -EINVAL;
+
+	memcpy_fromio(cmd_reply, ne_pci_dev->iomem_base + NE_RECV_DATA,
+		      cmd_reply_size);
+
+	return 0;
+}
+
+/**
+ * ne_wait_for_reply - Wait for a reply of a PCI command.
+ *
+ * This function gets called with the ne_pci_dev mutex held.
+ *
+ * @pdev: PCI device for which a reply is waited.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_wait_for_reply(struct pci_dev *pdev)
+{
+	struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+	int rc = -EINVAL;
+
+	if (!ne_pci_dev)
+		return -EINVAL;
+
+	/*
+	 * TODO: Update to _interruptible and handle interrupted wait event
+	 * e.g. -ERESTARTSYS, incoming signals + add / update timeout.
+	 */
+	rc = wait_event_timeout(ne_pci_dev->cmd_reply_wait_q,
+				atomic_read(&ne_pci_dev->cmd_reply_avail) != 0,
+				msecs_to_jiffies(NE_DEFAULT_TIMEOUT_MSECS));
+	if (!rc)
+		return -ETIMEDOUT;
+
+	return 0;
+}
+
+int ne_do_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type,
+		  void *cmd_request, size_t cmd_request_size,
+		  struct ne_pci_dev_cmd_reply *cmd_reply, size_t cmd_reply_size)
+{
+	struct ne_pci_dev *ne_pci_dev = NULL;
+	int rc = -EINVAL;
+
+	if (!pdev)
+		return -ENODEV;
+
+	ne_pci_dev = pci_get_drvdata(pdev);
+	if (!ne_pci_dev || !ne_pci_dev->iomem_base)
+		return -EINVAL;
+
+	if (cmd_type <= INVALID_CMD || cmd_type >= MAX_CMD) {
+		dev_err_ratelimited(&pdev->dev, "Invalid cmd type=%u\n",
+				    cmd_type);
+
+		return -EINVAL;
+	}
+
+	if (!cmd_request) {
+		dev_err_ratelimited(&pdev->dev, "Null cmd request\n");
+
+		return -EINVAL;
+	}
+
+	if (cmd_request_size > NE_SEND_DATA_SIZE) {
+		dev_err_ratelimited(&pdev->dev,
+				    "Invalid req size=%zu for cmd type=%u\n",
+				    cmd_request_size, cmd_type);
+
+		return -EINVAL;
+	}
+
+	if (!cmd_reply) {
+		dev_err_ratelimited(&pdev->dev, "Null cmd reply\n");
+
+		return -EINVAL;
+	}
+
+	if (cmd_reply_size > NE_RECV_DATA_SIZE) {
+		dev_err_ratelimited(&pdev->dev, "Invalid reply size=%zu\n",
+				    cmd_reply_size);
+
+		return -EINVAL;
+	}
+
+	/*
+	 * Use this mutex so that the PCI device handles one command request at
+	 * a time.
+	 */
+	mutex_lock(&ne_pci_dev->pci_dev_mutex);
+
+	atomic_set(&ne_pci_dev->cmd_reply_avail, 0);
+
+	rc = ne_submit_request(pdev, cmd_type, cmd_request, cmd_request_size);
+	if (rc < 0) {
+		dev_err_ratelimited(&pdev->dev,
+				    "Error in submit request [rc=%d]\n", rc);
+
+		goto unlock_mutex;
+	}
+
+	rc = ne_wait_for_reply(pdev);
+	if (rc < 0) {
+		dev_err_ratelimited(&pdev->dev,
+				    "Error in wait for reply [rc=%d]\n", rc);
+
+		goto unlock_mutex;
+	}
+
+	rc = ne_retrieve_reply(pdev, cmd_reply, cmd_reply_size);
+	if (rc < 0) {
+		dev_err_ratelimited(&pdev->dev,
+				    "Error in retrieve reply [rc=%d]\n", rc);
+
+		goto unlock_mutex;
+	}
+
+	atomic_set(&ne_pci_dev->cmd_reply_avail, 0);
+
+	if (cmd_reply->rc < 0) {
+		dev_err_ratelimited(&pdev->dev,
+				    "Error in cmd process logic [rc=%d]\n",
+				    cmd_reply->rc);
+
+		rc = cmd_reply->rc;
+
+		goto unlock_mutex;
+	}
+
+	mutex_unlock(&ne_pci_dev->pci_dev_mutex);
+
+	return 0;
+
+unlock_mutex:
+	mutex_unlock(&ne_pci_dev->pci_dev_mutex);
+
+	return rc;
+}
+
+/**
+ * ne_reply_handler - Interrupt handler for retrieving a reply matching
+ * a request sent to the PCI device for enclave lifetime management.
+ *
+ * @irq: received interrupt for a reply sent by the PCI device.
+ * @args: PCI device private data structure.
+ *
+ * @returns: IRQ_HANDLED on handled interrupt, IRQ_NONE otherwise.
+ */
+static irqreturn_t ne_reply_handler(int irq, void *args)
+{
+	struct ne_pci_dev *ne_pci_dev = (struct ne_pci_dev *)args;
+
+	if (!ne_pci_dev)
+		return IRQ_NONE;
+
+	atomic_set(&ne_pci_dev->cmd_reply_avail, 1);
+
+	/* TODO: Update to _interruptible. */
+	wake_up(&ne_pci_dev->cmd_reply_wait_q);
+
+	return IRQ_HANDLED;
+}
+
 /**
  * ne_setup_msix - Setup MSI-X vectors for the PCI device.
  *
@@ -59,7 +271,25 @@ static int ne_setup_msix(struct pci_dev *pdev)
 		return rc;
 	}
 
+	/*
+	 * This IRQ gets triggered every time the PCI device responds to a
+	 * command request. The reply is then retrieved, reading from the MMIO
+	 * space of the PCI device.
+	 */
+	rc = request_irq(pci_irq_vector(pdev, NE_VEC_REPLY),
+			 ne_reply_handler, 0, "enclave_cmd", ne_pci_dev);
+	if (rc < 0) {
+		dev_err(&pdev->dev, "Error in request irq reply [rc=%d]\n", rc);
+
+		goto free_irq_vectors;
+	}
+
 	return 0;
+
+free_irq_vectors:
+	pci_free_irq_vectors(pdev);
+
+	return rc;
 }
 
 /**
@@ -74,6 +304,8 @@ static void ne_teardown_msix(struct pci_dev *pdev)
 	if (!ne_pci_dev)
 		return;
 
+	free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);
+
 	pci_free_irq_vectors(pdev);
 }
 
-- 
2.20.1 (Apple Git-117)







* [PATCH v4 06/18] nitro_enclaves: Handle out-of-band PCI device events
  2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
                   ` (4 preceding siblings ...)
  2020-06-22 20:03 ` [PATCH v4 05/18] nitro_enclaves: Handle PCI device command requests Andra Paraschiv
@ 2020-06-22 20:03 ` Andra Paraschiv
  2020-07-02 15:24   ` Alexander Graf
  2020-06-22 20:03 ` [PATCH v4 07/18] nitro_enclaves: Init misc device providing the ioctl interface Andra Paraschiv
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv

In addition to the replies sent by the Nitro Enclaves PCI device in
response to command requests, out-of-band enclave events can happen,
e.g. an enclave crashes. In this case, the Nitro Enclaves driver needs
to be aware of the event and notify the corresponding user space process
that abstracts the enclave.

Register an MSI-X interrupt vector to be used for this kind of
out-of-band event. The interrupt signals that the state of an enclave
changed; the driver logic then scans the state of each running enclave
to identify the one(s) the notification is intended for.

Create a workqueue to handle the out-of-band events. Notify the user
space enclave process, which uses a polling mechanism on the enclave fd.
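
The scan step can be illustrated with a small userspace sketch. It uses an
array instead of the driver's list and a callback instead of the SLOT_INFO
command; every demo_* name is hypothetical. One deliberate difference,
which is an assumption of the sketch and not what the patch below does:
when the state query fails, the sketch skips the enclave instead of
comparing against the reply buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror enum ne_state from ne_misc_dev.h. */
enum demo_state {
	DEMO_STATE_INIT = 0,
	DEMO_STATE_RUNNING = 2,
	DEMO_STATE_STOPPED = UINT16_MAX,
};

struct demo_enclave {
	uint64_t slot_uid;
	uint16_t state;
	bool has_event;		/* set when user space should be woken up */
};

/* Callback standing in for the SLOT_INFO device command. */
typedef int (*demo_slot_info_fn)(uint64_t slot_uid, uint16_t *state_out);

/* Hypothetical device model: odd slots stopped, slot 0 cannot be queried. */
int demo_slot_info(uint64_t slot_uid, uint16_t *state_out)
{
	if (slot_uid == 0)
		return -1;

	*state_out = (slot_uid % 2) ? DEMO_STATE_STOPPED : DEMO_STATE_RUNNING;

	return 0;
}

/* Returns the number of enclaves whose state change was flagged. */
size_t demo_scan_enclaves(struct demo_enclave *enclaves, size_t count,
			  demo_slot_info_fn query)
{
	size_t notified = 0;
	size_t i;

	assert(query);

	for (i = 0; i < count; i++) {
		uint16_t reported = 0;

		/* Enclaves that are not running cannot receive events. */
		if (enclaves[i].state != DEMO_STATE_RUNNING)
			continue;

		/* Sketch-only choice: ignore enclaves we cannot query. */
		if (query(enclaves[i].slot_uid, &reported) < 0)
			continue;

		if (reported != enclaves[i].state) {
			enclaves[i].state = reported;
			enclaves[i].has_event = true;
			notified++;
		}
	}

	return notified;
}
```

In the driver the flagged enclave is then woken through its wait queue, and
the poll function of the enclave fd reports the event to user space.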

Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
---
Changelog

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Return IRQ_NONE when interrupts are not handled.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 122 +++++++++++++++++++++++
 1 file changed, 122 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
index c24230cfe7c0..9a137862cade 100644
--- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -239,6 +239,93 @@ static irqreturn_t ne_reply_handler(int irq, void *args)
 	return IRQ_HANDLED;
 }
 
+/**
+ * ne_event_work_handler - Work queue handler for notifying enclaves on
+ * a state change received by the event interrupt handler.
+ *
+ * An out-of-band event is issued by the Nitro Hypervisor when at least
+ * one enclave changes state without client interaction.
+ *
+ * @work: item containing the Nitro Enclaves PCI device for which an
+ *	  out-of-band event was issued.
+ */
+static void ne_event_work_handler(struct work_struct *work)
+{
+	struct ne_pci_dev_cmd_reply cmd_reply = {};
+	struct ne_enclave *ne_enclave = NULL;
+	struct ne_pci_dev *ne_pci_dev =
+		container_of(work, struct ne_pci_dev, notify_work);
+	int rc = -EINVAL;
+	struct slot_info_req slot_info_req = {};
+
+	if (!ne_pci_dev)
+		return;
+
+	mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+	/*
+	 * Iterate over all enclaves registered for the Nitro Enclaves
+	 * PCI device and determine which enclave(s) the out-of-band event
+	 * corresponds to.
+	 */
+	list_for_each_entry(ne_enclave, &ne_pci_dev->enclaves_list,
+			    enclave_list_entry) {
+		mutex_lock(&ne_enclave->enclave_info_mutex);
+
+		/*
+		 * Enclaves that were never started cannot receive out-of-band
+		 * events.
+		 */
+		if (ne_enclave->state != NE_STATE_RUNNING)
+			goto unlock;
+
+		slot_info_req.slot_uid = ne_enclave->slot_uid;
+
+		rc = ne_do_request(ne_enclave->pdev, SLOT_INFO, &slot_info_req,
+				   sizeof(slot_info_req), &cmd_reply,
+				   sizeof(cmd_reply));
+		if (rc < 0)
+			dev_err(&ne_enclave->pdev->dev,
+				"Error in slot info [rc=%d]\n", rc);
+
+		/* Notify enclave process that the enclave state changed. */
+		if (ne_enclave->state != cmd_reply.state) {
+			ne_enclave->state = cmd_reply.state;
+
+			ne_enclave->has_event = true;
+
+			wake_up_interruptible(&ne_enclave->eventq);
+		}
+
+unlock:
+		mutex_unlock(&ne_enclave->enclave_info_mutex);
+	}
+
+	mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+}
+
+/**
+ * ne_event_handler - Interrupt handler for PCI device out-of-band
+ * events. This interrupt does not supply any data in the MMIO region.
+ * It notifies a change in the state of any of the launched enclaves.
+ *
+ * @irq: received interrupt for an out-of-band event.
+ * @args: PCI device private data structure.
+ *
+ * @returns: IRQ_HANDLED on handled interrupt, IRQ_NONE otherwise.
+ */
+static irqreturn_t ne_event_handler(int irq, void *args)
+{
+	struct ne_pci_dev *ne_pci_dev = (struct ne_pci_dev *)args;
+
+	if (!ne_pci_dev)
+		return IRQ_NONE;
+
+	queue_work(ne_pci_dev->event_wq, &ne_pci_dev->notify_work);
+
+	return IRQ_HANDLED;
+}
+
 /**
  * ne_setup_msix - Setup MSI-X vectors for the PCI device.
  *
@@ -284,8 +371,37 @@ static int ne_setup_msix(struct pci_dev *pdev)
 		goto free_irq_vectors;
 	}
 
+	ne_pci_dev->event_wq = create_singlethread_workqueue("ne_pci_dev_wq");
+	if (!ne_pci_dev->event_wq) {
+		rc = -ENOMEM;
+
+		dev_err(&pdev->dev, "Cannot get wq for dev events [rc=%d]\n",
+			rc);
+
+		goto free_reply_irq_vec;
+	}
+
+	INIT_WORK(&ne_pci_dev->notify_work, ne_event_work_handler);
+
+	/*
+	 * This IRQ gets triggered every time any enclave's state changes. Its
+	 * handler then scans for the changes and propagates them to the user
+	 * space.
+	 */
+	rc = request_irq(pci_irq_vector(pdev, NE_VEC_EVENT),
+			 ne_event_handler, 0, "enclave_evt", ne_pci_dev);
+	if (rc < 0) {
+		dev_err(&pdev->dev, "Error in request irq event [rc=%d]\n", rc);
+
+		goto destroy_wq;
+	}
+
 	return 0;
 
+destroy_wq:
+	destroy_workqueue(ne_pci_dev->event_wq);
+free_reply_irq_vec:
+	free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);
 free_irq_vectors:
 	pci_free_irq_vectors(pdev);
 
@@ -304,6 +420,12 @@ static void ne_teardown_msix(struct pci_dev *pdev)
 	if (!ne_pci_dev)
 		return;
 
+	free_irq(pci_irq_vector(pdev, NE_VEC_EVENT), ne_pci_dev);
+
+	flush_work(&ne_pci_dev->notify_work);
+	flush_workqueue(ne_pci_dev->event_wq);
+	destroy_workqueue(ne_pci_dev->event_wq);
+
 	free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);
 
 	pci_free_irq_vectors(pdev);
-- 
2.20.1 (Apple Git-117)







* [PATCH v4 07/18] nitro_enclaves: Init misc device providing the ioctl interface
  2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
                   ` (5 preceding siblings ...)
  2020-06-22 20:03 ` [PATCH v4 06/18] nitro_enclaves: Handle out-of-band PCI device events Andra Paraschiv
@ 2020-06-22 20:03 ` Andra Paraschiv
  2020-06-29 16:20   ` Greg KH
  2020-07-06  7:13   ` Alexander Graf
  2020-06-22 20:03 ` [PATCH v4 08/18] nitro_enclaves: Add logic for enclave vm creation Andra Paraschiv
                   ` (10 subsequent siblings)
  17 siblings, 2 replies; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv

The Nitro Enclaves driver provides an ioctl interface to user space for
enclave lifetime management, e.g. enclave creation / termination and
setting enclave resources such as memory and CPU.

This ioctl interface is mapped to a Nitro Enclaves misc device.
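
At this point in the series the misc device answers a single command. The
dispatch shape can be sketched in userspace as below; DEMO_API_VERSION and
DEMO_GET_API_VERSION are placeholder values chosen for the sketch, not the
numbers defined in the NE ioctl header.

```c
#include <assert.h>
#include <errno.h>

/* Placeholder values for illustration only. */
#define DEMO_API_VERSION	1
#define DEMO_GET_API_VERSION	0x20

/* Known commands return their result, anything else gets -ENOTTY. */
long demo_ioctl(unsigned int cmd, unsigned long arg)
{
	(void)arg;	/* the version query takes no argument */

	switch (cmd) {
	case DEMO_GET_API_VERSION:
		return DEMO_API_VERSION;
	default:
		return -ENOTTY;
	}
}
```

Returning -ENOTTY for unknown commands is the usual convention for ioctl
handlers, and it lets a caller distinguish an unsupported command from a
command that was recognized but failed.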

Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
---
Changelog

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Remove the NE CPU pool init during kernel module loading, as the CPU
  pool is now set up at runtime, via a sysfs file for the kernel
  parameter.
* Add minimum enclave memory size definition.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.
* Remove the WARN_ON calls.
* Remove linux/bug and linux/kvm_host includes that are not needed.
* Remove "ratelimited" from the logs that are not in the ioctl call
  paths.
* Remove file ops that do nothing for now - open and release.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Update ne_cpu_pool data structure to include the global mutex.
* Update NE misc device mode to 0660.
* Check if the CPU siblings are included in the NE CPU pool, as full CPU
  cores are given for the enclave(s).
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 133 ++++++++++++++++++++++
 drivers/virt/nitro_enclaves/ne_pci_dev.c  |  11 ++
 2 files changed, 144 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.c

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
new file mode 100644
index 000000000000..628fb10c2b36
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+/**
+ * Enclave lifetime management driver for Nitro Enclaves (NE).
+ * Nitro is a hypervisor that has been developed by Amazon.
+ */
+
+#include <linux/anon_inodes.h>
+#include <linux/capability.h>
+#include <linux/cpu.h>
+#include <linux/device.h>
+#include <linux/file.h>
+#include <linux/hugetlb.h>
+#include <linux/list.h>
+#include <linux/miscdevice.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/nitro_enclaves.h>
+#include <linux/pci.h>
+#include <linux/poll.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+#define NE_EIF_LOAD_OFFSET (8 * 1024UL * 1024UL)
+
+#define NE_MIN_ENCLAVE_MEM_SIZE (64 * 1024UL * 1024UL)
+
+#define NE_MIN_MEM_REGION_SIZE (2 * 1024UL * 1024UL)
+
+/*
+ * TODO: Update logic to create new sysfs entries instead of using
+ * a kernel parameter e.g. if multiple sysfs files needed.
+ */
+static const struct kernel_param_ops ne_cpu_pool_ops = {
+};
+
+static char ne_cpus[PAGE_SIZE];
+static struct kparam_string ne_cpus_arg = {
+	.maxlen = sizeof(ne_cpus),
+	.string = ne_cpus,
+};
+
+module_param_cb(ne_cpus, &ne_cpu_pool_ops, &ne_cpus_arg, 0644);
+MODULE_PARM_DESC(ne_cpus, "<cpu-list> - CPU pool used for Nitro Enclaves");
+
+/* CPU pool used for Nitro Enclaves. */
+struct ne_cpu_pool {
+	/* Available CPUs in the pool. */
+	cpumask_var_t avail;
+	struct mutex mutex;
+};
+
+static struct ne_cpu_pool ne_cpu_pool;
+
+static long ne_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	switch (cmd) {
+	case NE_GET_API_VERSION:
+		return NE_API_VERSION;
+
+	default:
+		return -ENOTTY;
+	}
+
+	return 0;
+}
+
+static const struct file_operations ne_fops = {
+	.owner		= THIS_MODULE,
+	.llseek		= noop_llseek,
+	.unlocked_ioctl	= ne_ioctl,
+};
+
+struct miscdevice ne_misc_dev = {
+	.minor	= MISC_DYNAMIC_MINOR,
+	.name	= "nitro_enclaves",
+	.fops	= &ne_fops,
+	.mode	= 0660,
+};
+
+static int __init ne_init(void)
+{
+	struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_AMAZON,
+					      PCI_DEVICE_ID_NE, NULL);
+	int rc = -EINVAL;
+
+	if (!pdev)
+		return -ENODEV;
+
+	if (!zalloc_cpumask_var(&ne_cpu_pool.avail, GFP_KERNEL))
+		return -ENOMEM;
+
+	mutex_init(&ne_cpu_pool.mutex);
+
+	rc = pci_register_driver(&ne_pci_driver);
+	if (rc < 0) {
+		dev_err(&pdev->dev,
+			"Error in pci register driver [rc=%d]\n", rc);
+
+		goto free_cpumask;
+	}
+
+	return 0;
+
+free_cpumask:
+	free_cpumask_var(ne_cpu_pool.avail);
+
+	return rc;
+}
+
+static void __exit ne_exit(void)
+{
+	pci_unregister_driver(&ne_pci_driver);
+
+	free_cpumask_var(ne_cpu_pool.avail);
+}
+
+/* TODO: Handle actions such as reboot, kexec. */
+
+module_init(ne_init);
+module_exit(ne_exit);
+
+MODULE_AUTHOR("Amazon.com, Inc. or its affiliates");
+MODULE_DESCRIPTION("Nitro Enclaves Driver");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
index 9a137862cade..c781cd0a50bf 100644
--- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -557,6 +557,13 @@ static int ne_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto teardown_msix;
 	}
 
+	rc = misc_register(&ne_misc_dev);
+	if (rc < 0) {
+		dev_err(&pdev->dev, "Error in misc dev register [rc=%d]\n", rc);
+
+		goto disable_ne_pci_dev;
+	}
+
 	atomic_set(&ne_pci_dev->cmd_reply_avail, 0);
 	init_waitqueue_head(&ne_pci_dev->cmd_reply_wait_q);
 	INIT_LIST_HEAD(&ne_pci_dev->enclaves_list);
@@ -566,6 +573,8 @@ static int ne_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	return 0;
 
+disable_ne_pci_dev:
+	ne_pci_dev_disable(pdev);
 teardown_msix:
 	ne_teardown_msix(pdev);
 iounmap_pci_bar:
@@ -588,6 +597,8 @@ static void ne_pci_remove(struct pci_dev *pdev)
 	if (!ne_pci_dev || !ne_pci_dev->iomem_base)
 		return;
 
+	misc_deregister(&ne_misc_dev);
+
 	ne_pci_dev_disable(pdev);
 
 	ne_teardown_msix(pdev);
-- 
2.20.1 (Apple Git-117)




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in Romania. Registration number J22/2621/2005.


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 08/18] nitro_enclaves: Add logic for enclave vm creation
  2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
                   ` (6 preceding siblings ...)
  2020-06-22 20:03 ` [PATCH v4 07/18] nitro_enclaves: Init misc device providing the ioctl interface Andra Paraschiv
@ 2020-06-22 20:03 ` Andra Paraschiv
  2020-07-06  7:53   ` Alexander Graf
  2020-06-22 20:03 ` [PATCH v4 09/18] nitro_enclaves: Add logic for enclave vcpu creation Andra Paraschiv
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv

Add ioctl command logic for enclave VM creation. It triggers a slot
allocation. The enclave resources will be associated with this slot and
it will be used as an identifier for triggering enclave run.

Return a file descriptor, namely enclave fd. This is further used by the
associated user space enclave process to set enclave resources and
trigger enclave termination.

Implement the poll function to notify the enclave process when an
enclave exits without an explicit enclave termination command, e.g.
when an enclave crashes.
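
As an illustration of the user space side, a minimal sketch of waiting
for that notification could look like the code below. The helper name
ne_wait_for_exit() is hypothetical and not part of this series; it only
assumes that the enclave fd reports POLLHUP on enclave exit, as the poll
implementation in this patch does.

```c
#include <poll.h>

/*
 * Hypothetical helper: wait until the enclave fd reports a hang-up,
 * i.e. the enclave exited on its own.
 *
 * Returns 1 on hang-up, 0 on timeout, -1 on poll() error or on any
 * other unexpected event.
 */
int ne_wait_for_exit(int enclave_fd, int timeout_ms)
{
	/* POLLHUP is always reported, so no events need to be requested. */
	struct pollfd pfd = { .fd = enclave_fd, .events = 0 };
	int rc = poll(&pfd, 1, timeout_ms);

	if (rc < 0)
		return -1;

	if (rc == 0)
		return 0;

	return (pfd.revents & POLLHUP) ? 1 : -1;
}
```

A caller would pass the fd returned by the enclave VM creation ioctl
and a timeout in milliseconds, or -1 to block until the enclave exits.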

Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
---
Changelog

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.
* Add metadata for the NUMA node for the enclave memory and CPUs.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
* Remove file ops that do nothing for now - open.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 168 ++++++++++++++++++++++
 1 file changed, 168 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 628fb10c2b36..f70496813033 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -60,12 +60,180 @@ struct ne_cpu_pool {
 
 static struct ne_cpu_pool ne_cpu_pool;
 
+static __poll_t ne_enclave_poll(struct file *file, poll_table *wait)
+{
+	__poll_t mask = 0;
+	struct ne_enclave *ne_enclave = file->private_data;
+
+	poll_wait(file, &ne_enclave->eventq, wait);
+
+	if (!ne_enclave->has_event)
+		return mask;
+
+	mask = POLLHUP;
+
+	return mask;
+}
+
+static const struct file_operations ne_enclave_fops = {
+	.owner		= THIS_MODULE,
+	.llseek		= noop_llseek,
+	.poll		= ne_enclave_poll,
+};
+
+/**
+ * ne_create_vm_ioctl - Alloc slot to be associated with an enclave. Create
+ * enclave file descriptor to be further used for enclave resources handling
+ * e.g. memory regions and CPUs.
+ *
+ * This function gets called with the ne_pci_dev enclave mutex held.
+ *
+ * @pdev: PCI device used for enclave lifetime management.
+ * @ne_pci_dev: private data associated with the PCI device.
+ * @slot_uid: generated unique slot id associated with an enclave.
+ *
+ * @returns: enclave fd on success, negative return value on failure.
+ */
+static int ne_create_vm_ioctl(struct pci_dev *pdev,
+			      struct ne_pci_dev *ne_pci_dev, u64 *slot_uid)
+{
+	struct ne_pci_dev_cmd_reply cmd_reply = {};
+	unsigned int cpu = 0;
+	int fd = 0;
+	struct file *file = NULL;
+	struct ne_enclave *ne_enclave = NULL;
+	int numa_node = -1;
+	int rc = -EINVAL;
+	struct slot_alloc_req slot_alloc_req = {};
+
+	ne_enclave = kzalloc(sizeof(*ne_enclave), GFP_KERNEL);
+	if (!ne_enclave)
+		return -ENOMEM;
+
+	if (!zalloc_cpumask_var(&ne_enclave->cpu_siblings, GFP_KERNEL)) {
+		kfree(ne_enclave);
+
+		return -ENOMEM;
+	}
+
+	cpu = cpumask_any(ne_cpu_pool.avail);
+	if (cpu >= nr_cpu_ids) {
+		dev_err_ratelimited(ne_misc_dev.this_device,
+				    "No CPUs available in CPU pool\n");
+
+		goto free_cpumask;
+	}
+
+	numa_node = cpu_to_node(cpu);
+	if (numa_node < 0) {
+		dev_err_ratelimited(ne_misc_dev.this_device,
+				    "Invalid NUMA node %d\n", numa_node);
+
+		goto free_cpumask;
+	}
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0) {
+		rc = fd;
+
+		dev_err_ratelimited(ne_misc_dev.this_device,
+				    "Error in getting unused fd [rc=%d]\n",
+				    rc);
+
+		goto free_cpumask;
+	}
+
+	file = anon_inode_getfile("ne-vm", &ne_enclave_fops, ne_enclave,
+				  O_RDWR);
+	if (IS_ERR(file)) {
+		rc = PTR_ERR(file);
+
+		dev_err_ratelimited(ne_misc_dev.this_device,
+				    "Error in anon inode get file [rc=%d]\n",
+				    rc);
+
+		goto put_fd;
+	}
+
+	ne_enclave->pdev = pdev;
+
+	rc = ne_do_request(ne_enclave->pdev, SLOT_ALLOC, &slot_alloc_req,
+			   sizeof(slot_alloc_req), &cmd_reply,
+			   sizeof(cmd_reply));
+	if (rc < 0) {
+		dev_err_ratelimited(ne_misc_dev.this_device,
+				    "Error in slot alloc [rc=%d]\n", rc);
+
+		goto put_file;
+	}
+
+	init_waitqueue_head(&ne_enclave->eventq);
+	ne_enclave->has_event = false;
+	mutex_init(&ne_enclave->enclave_info_mutex);
+	ne_enclave->max_mem_regions = cmd_reply.mem_regions;
+	INIT_LIST_HEAD(&ne_enclave->mem_regions_list);
+	ne_enclave->mm = current->mm;
+	ne_enclave->numa_node = numa_node;
+	ne_enclave->slot_uid = cmd_reply.slot_uid;
+	ne_enclave->state = NE_STATE_INIT;
+	INIT_LIST_HEAD(&ne_enclave->vcpu_ids_list);
+
+	list_add(&ne_enclave->enclave_list_entry, &ne_pci_dev->enclaves_list);
+
+	*slot_uid = ne_enclave->slot_uid;
+
+	fd_install(fd, file);
+
+	return fd;
+
+put_file:
+	fput(file);
+put_fd:
+	put_unused_fd(fd);
+free_cpumask:
+	free_cpumask_var(ne_enclave->cpu_siblings);
+	kfree(ne_enclave);
+
+	return rc;
+}
+
 static long ne_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
+	struct ne_pci_dev *ne_pci_dev = NULL;
+	struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_AMAZON,
+					      PCI_DEVICE_ID_NE, NULL);
+
+	if (!pdev)
+		return -ENODEV;
+
+	ne_pci_dev = pci_get_drvdata(pdev);
+	if (!ne_pci_dev)
+		return -EINVAL;
+
 	switch (cmd) {
 	case NE_GET_API_VERSION:
 		return NE_API_VERSION;
 
+	case NE_CREATE_VM: {
+		u64 slot_uid = 0;
+		int rc = -EINVAL;
+
+		mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+		rc = ne_create_vm_ioctl(pdev, ne_pci_dev, &slot_uid);
+
+		mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+
+		if (copy_to_user((void *)arg, &slot_uid, sizeof(slot_uid))) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Error in copy to user\n");
+
+			return -EFAULT;
+		}
+
+		return rc;
+	}
+
 	default:
 		return -ENOTTY;
 	}
-- 
2.20.1 (Apple Git-117)






^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 09/18] nitro_enclaves: Add logic for enclave vcpu creation
  2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
                   ` (7 preceding siblings ...)
  2020-06-22 20:03 ` [PATCH v4 08/18] nitro_enclaves: Add logic for enclave vm creation Andra Paraschiv
@ 2020-06-22 20:03 ` Andra Paraschiv
  2020-07-06 10:12   ` Alexander Graf
  2020-06-22 20:03 ` [PATCH v4 10/18] nitro_enclaves: Add logic for enclave image load info Andra Paraschiv
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv

An enclave, before being started, has its resources set. One of its
resources is CPU.

Set up the NE CPU pool, from which the CPUs for enclaves are chosen.
Offline the CPUs from the NE CPU pool during the pool setup and online
them back during the pool teardown.
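
The pool is given as a CPU list string, which the driver hands to
cpulist_parse(). To make the accepted shape concrete, here is a rough
user space sketch that turns such a list into a bitmask. It is not the
kernel parser: the function name cpu_list_to_mask() is made up for this
example, and only plain numbers and "first-last" ranges for CPUs 0..63
are handled.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Illustrative only: parse a CPU list such as "2-3,6" into a bitmask
 * (bit n set means CPU n is in the list).
 *
 * Returns 0 on success, -1 on malformed input.
 */
int cpu_list_to_mask(const char *list, uint64_t *mask)
{
	const char *p = list;

	*mask = 0;

	while (*p != '\0') {
		char *end = NULL;
		unsigned long first = 0;
		unsigned long last = 0;
		unsigned long cpu = 0;

		if (*p < '0' || *p > '9')
			return -1;

		first = strtoul(p, &end, 10);
		last = first;
		p = end;

		/* Optional "-last" part of a range. */
		if (*p == '-') {
			p++;
			if (*p < '0' || *p > '9')
				return -1;

			last = strtoul(p, &end, 10);
			p = end;
		}

		if (first > last || last > 63)
			return -1;

		for (cpu = first; cpu <= last; cpu++)
			*mask |= UINT64_C(1) << cpu;

		/* Entries are separated by a single comma. */
		if (*p == ',') {
			p++;
			if (*p == '\0')
				return -1;
		} else if (*p != '\0') {
			return -1;
		}
	}

	return 0;
}
```

With this sketch, "2-3,6" yields the mask 0x4c, while "5-2" or a
trailing comma is rejected.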

The enclave CPUs need to be full cores and from the same NUMA node. CPU
0 and its siblings have to remain available to the primary / parent VM.
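
These three constraints can be modelled outside the kernel roughly as
follows. This is a simplified sketch, not the driver code: the struct
cpu_topo table stands in for topology_sibling_cpumask() and
cpu_to_node(), and the names NR_CPUS_MODEL and ne_pool_is_valid() are
assumptions made for the example.

```c
#include <stdint.h>

#define NR_CPUS_MODEL 8

/* Toy topology, assumed for illustration only. */
struct cpu_topo {
	/* Sibling mask per CPU, including the CPU itself. */
	uint64_t siblings[NR_CPUS_MODEL];
	/* NUMA node per CPU. */
	int node[NR_CPUS_MODEL];
};

/* Returns 0 if the pool satisfies the constraints, -1 otherwise. */
int ne_pool_is_valid(const struct cpu_topo *topo, uint64_t pool)
{
	int node = -1;
	int cpu = 0;

	/* Reject CPUs that the toy topology does not describe. */
	if (pool >> NR_CPUS_MODEL)
		return -1;

	/* CPU 0 and its siblings stay with the primary / parent VM. */
	if (pool & (topo->siblings[0] | UINT64_C(1)))
		return -1;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if (!(pool & (UINT64_C(1) << cpu)))
			continue;

		/* Full cores only: every sibling has to be in the pool. */
		if ((topo->siblings[cpu] & pool) != topo->siblings[cpu])
			return -1;

		/* All CPUs have to come from a single NUMA node. */
		if (node < 0)
			node = topo->node[cpu];
		else if (node != topo->node[cpu])
			return -1;
	}

	return 0;
}
```

For a topology with sibling pairs (0,4), (1,5), (2,6), (3,7), a pool of
CPUs 1 and 5 passes, while a pool containing only CPU 1 fails the full
core check and a pool containing CPU 4 fails the CPU 0 sibling check.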

Add ioctl command logic for enclave vCPU creation. Return a file
descriptor that is associated with the enclave vCPU.
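
When no specific vCPU id is requested, the CPU is picked from the pool
and its siblings are set aside for the same enclave, so that later
vCPUs fill up the core first. The sketch below models that bookkeeping
with plain bitmasks. It is a simplified illustration of the idea in
ne_get_cpu_from_cpu_pool(): locking is left out, the lowest CPU is
chosen instead of any CPU, and struct ne_cpu_model, first_cpu() and
ne_pick_cpu() are names invented for the example.

```c
#include <stdint.h>

/* Simplified state, assumed for illustration only. */
struct ne_cpu_model {
	uint64_t pool_avail;       /* CPUs still free in the NE CPU pool */
	uint64_t enclave_siblings; /* siblings reserved for this enclave */
};

/* Index of the lowest set bit, or -1 if the mask is empty. */
static int first_cpu(uint64_t mask)
{
	int cpu = 0;

	for (cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;

	return -1;
}

/*
 * Pick a CPU for the enclave. siblings[cpu] is the sibling mask of
 * each CPU, including the CPU itself.
 *
 * Returns the chosen CPU, or -1 if no CPU is available.
 */
int ne_pick_cpu(struct ne_cpu_model *m, const uint64_t *siblings)
{
	/* Prefer a sibling of a core this enclave already owns. */
	int cpu = first_cpu(m->enclave_siblings);

	if (cpu >= 0) {
		m->enclave_siblings &= ~(UINT64_C(1) << cpu);

		return cpu;
	}

	cpu = first_cpu(m->pool_avail);
	if (cpu < 0)
		return -1;

	/* Take the whole core out of the pool, keep the siblings aside. */
	m->pool_avail &= ~siblings[cpu];
	m->enclave_siblings |= siblings[cpu] & ~(UINT64_C(1) << cpu);

	return cpu;
}
```

Starting from a pool of CPUs 1, 2, 5, 6 with sibling pairs (1,5) and
(2,6), successive calls return 1, 5, 2, 6 and then -1.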

Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
---
Changelog

v3 -> v4

* Set up the NE CPU pool at runtime via a sysfs file for the kernel
  parameter.
* Check enclave CPUs to be from the same NUMA node.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
* Remove file ops that do nothing for now - open, ioctl and release.

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave state is init when setting enclave vcpu.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 491 ++++++++++++++++++++++
 1 file changed, 491 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index f70496813033..d6777008f685 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -39,7 +39,11 @@
  * TODO: Update logic to create new sysfs entries instead of using
  * a kernel parameter e.g. if multiple sysfs files needed.
  */
+static int ne_set_kernel_param(const char *val, const struct kernel_param *kp);
+
 static const struct kernel_param_ops ne_cpu_pool_ops = {
+	.get = param_get_string,
+	.set = ne_set_kernel_param,
 };
 
 static char ne_cpus[PAGE_SIZE];
@@ -60,6 +64,485 @@ struct ne_cpu_pool {
 
 static struct ne_cpu_pool ne_cpu_pool;
 
+static const struct file_operations ne_enclave_vcpu_fops = {
+	.owner		= THIS_MODULE,
+	.llseek		= noop_llseek,
+};
+
+/**
+ * ne_check_enclaves_created - Verify if at least one enclave has been created.
+ *
+ * @pdev: PCI device used for enclave lifetime management.
+ *
+ * @returns: true if at least one enclave is created, false otherwise.
+ */
+static bool ne_check_enclaves_created(struct pci_dev *pdev)
+{
+	struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
+
+	if (!ne_pci_dev)
+		return false;
+
+	mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+	if (list_empty(&ne_pci_dev->enclaves_list)) {
+		mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+
+		return false;
+	}
+
+	mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+
+	return true;
+}
+
+/**
+ * ne_setup_cpu_pool - Set the NE CPU pool after handling sanity checks such as
+ * not sharing CPU cores with the primary / parent VM or not using CPU 0, which
+ * should remain available for the primary / parent VM. Offline the CPUs from
+ * the pool after the checks passed.
+ *
+ * @pdev: PCI device used for enclave lifetime management.
+ * @ne_cpu_list: the CPU list used for setting NE CPU pool.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_setup_cpu_pool(struct pci_dev *pdev, const char *ne_cpu_list)
+{
+	unsigned int cpu = 0;
+	unsigned int cpu_sibling = 0;
+	int numa_node = -1;
+	int rc = -EINVAL;
+
+	if (!capable(CAP_SYS_ADMIN)) {
+		dev_err(&pdev->dev, "No admin capability for CPU pool setup\n");
+
+		return -EPERM;
+	}
+
+	if (!ne_cpu_list)
+		return 0;
+
+	if (ne_check_enclaves_created(pdev)) {
+		dev_err(&pdev->dev, "The CPU pool is used, enclaves created\n");
+
+		return -EINVAL;
+	}
+
+	mutex_lock(&ne_cpu_pool.mutex);
+
+	rc = cpulist_parse(ne_cpu_list, ne_cpu_pool.avail);
+	if (rc < 0) {
+		dev_err(&pdev->dev,
+			"Error in cpulist parse [rc=%d]\n", rc);
+
+		goto unlock_mutex;
+	}
+
+	/*
+	 * Check if CPU 0 and its siblings are included in the provided CPU
+	 * pool. They should remain available for the primary / parent VM.
+	 */
+	if (cpumask_test_cpu(0, ne_cpu_pool.avail)) {
+
+		dev_err(&pdev->dev,
+			"CPU 0 has to remain available for the primary VM\n");
+
+		rc = -EINVAL;
+
+		goto unlock_mutex;
+	}
+
+	for_each_cpu(cpu_sibling, topology_sibling_cpumask(0)) {
+		if (cpumask_test_cpu(cpu_sibling, ne_cpu_pool.avail)) {
+			dev_err(&pdev->dev,
+				"CPU sibling %d of CPU 0 is in the CPU pool\n",
+				cpu_sibling);
+
+			rc = -EINVAL;
+
+			goto unlock_mutex;
+		}
+	}
+
+	/*
+	 * Check if CPU siblings are included in the provided CPU pool. The
+	 * expectation is that CPU cores are made available in the CPU pool for
+	 * enclaves.
+	 */
+	for_each_cpu(cpu, ne_cpu_pool.avail) {
+		for_each_cpu(cpu_sibling, topology_sibling_cpumask(cpu)) {
+			if (!cpumask_test_cpu(cpu_sibling, ne_cpu_pool.avail)) {
+				dev_err(&pdev->dev,
+					"CPU %d is not in the CPU pool\n",
+					cpu_sibling);
+
+				rc = -EINVAL;
+
+				goto unlock_mutex;
+			}
+		}
+	}
+
+	/*
+	 * Check if the CPUs from the NE CPU pool are from the same NUMA node.
+	 */
+	for_each_cpu(cpu, ne_cpu_pool.avail) {
+		if (numa_node < 0) {
+			numa_node = cpu_to_node(cpu);
+
+			if (numa_node < 0) {
+				dev_err(&pdev->dev,
+					"Invalid NUMA node %d\n", numa_node);
+
+				rc = -EINVAL;
+
+				goto unlock_mutex;
+			}
+		} else {
+			if (numa_node != cpu_to_node(cpu)) {
+				dev_err(&pdev->dev,
+					"CPUs are from different NUMA nodes\n");
+
+				rc = -EINVAL;
+
+				goto unlock_mutex;
+			}
+		}
+	}
+
+	for_each_cpu(cpu, ne_cpu_pool.avail) {
+		rc = remove_cpu(cpu);
+		if (rc != 0) {
+			dev_err(&pdev->dev,
+				"CPU %d is not offlined [rc=%d]\n", cpu, rc);
+
+			goto online_cpus;
+		}
+	}
+
+	mutex_unlock(&ne_cpu_pool.mutex);
+
+	return 0;
+
+online_cpus:
+	for_each_cpu(cpu, ne_cpu_pool.avail)
+		add_cpu(cpu);
+unlock_mutex:
+	mutex_unlock(&ne_cpu_pool.mutex);
+
+	return rc;
+}
+
+/**
+ * ne_teardown_cpu_pool - Online the CPUs from the NE CPU pool and cleanup the
+ * CPU pool.
+ *
+ * @pdev: PCI device used for enclave lifetime management.
+ */
+static void ne_teardown_cpu_pool(struct pci_dev *pdev)
+{
+	unsigned int cpu = 0;
+	int rc = -EINVAL;
+
+	if (!capable(CAP_SYS_ADMIN)) {
+		dev_err(&pdev->dev, "No admin capability for CPU pool teardown\n");
+
+		return;
+	}
+
+	if (!ne_cpu_pool.avail)
+		return;
+
+	if (ne_check_enclaves_created(pdev)) {
+		dev_err(&pdev->dev, "The CPU pool is used, enclaves created\n");
+
+		return;
+	}
+
+	mutex_lock(&ne_cpu_pool.mutex);
+
+	for_each_cpu(cpu, ne_cpu_pool.avail) {
+		rc = add_cpu(cpu);
+		if (rc != 0)
+			dev_err(&pdev->dev,
+				"CPU %d is not onlined [rc=%d]\n", cpu, rc);
+	}
+
+	cpumask_clear(ne_cpu_pool.avail);
+
+	mutex_unlock(&ne_cpu_pool.mutex);
+}
+
+static int ne_set_kernel_param(const char *val, const struct kernel_param *kp)
+{
+	const char *ne_cpu_list = val;
+	struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_AMAZON,
+					      PCI_DEVICE_ID_NE, NULL);
+	int rc = -EINVAL;
+
+	if (!pdev)
+		return -ENODEV;
+
+	ne_teardown_cpu_pool(pdev);
+
+	rc = ne_setup_cpu_pool(pdev, ne_cpu_list);
+	if (rc < 0) {
+		dev_err(&pdev->dev, "Error in setup CPU pool [rc=%d]\n", rc);
+
+		return rc;
+	}
+
+	return param_set_copystring(val, kp);
+}
+
+/**
+ * ne_get_cpu_from_cpu_pool - Get a CPU from the CPU pool. If the vCPU id is 0,
+ * the CPU is autogenerated and chosen from the NE CPU pool.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @vcpu_id: id of the CPU to be associated with the given slot, apic id on x86.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_get_cpu_from_cpu_pool(struct ne_enclave *ne_enclave, u32 *vcpu_id)
+{
+	unsigned int cpu = 0;
+	unsigned int cpu_sibling = 0;
+
+	if (*vcpu_id != 0) {
+		if (cpumask_test_cpu(*vcpu_id, ne_enclave->cpu_siblings)) {
+			cpumask_clear_cpu(*vcpu_id, ne_enclave->cpu_siblings);
+
+			return 0;
+		}
+
+		mutex_lock(&ne_cpu_pool.mutex);
+
+		if (!cpumask_test_cpu(*vcpu_id, ne_cpu_pool.avail)) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "CPU %d is not in NE CPU pool\n",
+					    *vcpu_id);
+
+			mutex_unlock(&ne_cpu_pool.mutex);
+
+			return -EINVAL;
+		}
+
+		cpumask_clear_cpu(*vcpu_id, ne_cpu_pool.avail);
+
+		/*
+		 * Make sure the CPU siblings are not marked as available
+		 * anymore.
+		 */
+		for_each_cpu(cpu_sibling, topology_sibling_cpumask(*vcpu_id)) {
+			if (cpu_sibling != *vcpu_id) {
+				cpumask_clear_cpu(cpu_sibling,
+						  ne_cpu_pool.avail);
+
+				cpumask_set_cpu(cpu_sibling,
+						ne_enclave->cpu_siblings);
+			}
+		}
+
+		mutex_unlock(&ne_cpu_pool.mutex);
+
+		return 0;
+	}
+
+	/* There are CPU siblings available to choose from. */
+	cpu = cpumask_any(ne_enclave->cpu_siblings);
+	if (cpu < nr_cpu_ids) {
+		cpumask_clear_cpu(cpu, ne_enclave->cpu_siblings);
+
+		*vcpu_id = cpu;
+
+		return 0;
+	}
+
+	mutex_lock(&ne_cpu_pool.mutex);
+
+	/* Choose any CPU from the available CPU pool. */
+	cpu = cpumask_any(ne_cpu_pool.avail);
+	if (cpu >= nr_cpu_ids) {
+		dev_err_ratelimited(ne_misc_dev.this_device,
+				    "No CPUs available in CPU pool\n");
+
+		mutex_unlock(&ne_cpu_pool.mutex);
+
+		return -EINVAL;
+	}
+
+	cpumask_clear_cpu(cpu, ne_cpu_pool.avail);
+
+	/* Make sure the CPU siblings are not marked as available anymore. */
+	for_each_cpu(cpu_sibling, topology_sibling_cpumask(cpu)) {
+		if (cpu_sibling != cpu) {
+			cpumask_clear_cpu(cpu_sibling, ne_cpu_pool.avail);
+
+			cpumask_set_cpu(cpu_sibling, ne_enclave->cpu_siblings);
+		}
+	}
+
+	mutex_unlock(&ne_cpu_pool.mutex);
+
+	*vcpu_id = cpu;
+
+	return 0;
+}
+
+/**
+ * ne_create_vcpu_ioctl - Add vCPU to the slot associated with the current
+ * enclave. Create vCPU file descriptor to be further used for CPU handling.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @vcpu_id: id of the CPU to be associated with the given slot, apic id on x86.
+ *
+ * @returns: vCPU fd on success, negative return value on failure.
+ */
+static int ne_create_vcpu_ioctl(struct ne_enclave *ne_enclave, u32 vcpu_id)
+{
+	struct ne_pci_dev_cmd_reply cmd_reply = {};
+	int fd = 0;
+	struct file *file = NULL;
+	struct ne_vcpu_id *ne_vcpu_id = NULL;
+	int rc = -EINVAL;
+	struct slot_add_vcpu_req slot_add_vcpu_req = {};
+
+	if (ne_enclave->mm != current->mm)
+		return -EIO;
+
+	ne_vcpu_id = kzalloc(sizeof(*ne_vcpu_id), GFP_KERNEL);
+	if (!ne_vcpu_id)
+		return -ENOMEM;
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0) {
+		rc = fd;
+
+		dev_err_ratelimited(ne_misc_dev.this_device,
+				    "Error in getting unused fd [rc=%d]\n", rc);
+
+		goto free_ne_vcpu_id;
+	}
+
+	/* TODO: Include (vcpu) id in the ne-vm-vcpu naming. */
+	file = anon_inode_getfile("ne-vm-vcpu", &ne_enclave_vcpu_fops,
+				  ne_enclave, O_RDWR);
+	if (IS_ERR(file)) {
+		rc = PTR_ERR(file);
+
+		dev_err_ratelimited(ne_misc_dev.this_device,
+				    "Error in anon inode get file [rc=%d]\n",
+				    rc);
+
+		goto put_fd;
+	}
+
+	slot_add_vcpu_req.slot_uid = ne_enclave->slot_uid;
+	slot_add_vcpu_req.vcpu_id = vcpu_id;
+
+	rc = ne_do_request(ne_enclave->pdev, SLOT_ADD_VCPU, &slot_add_vcpu_req,
+			   sizeof(slot_add_vcpu_req), &cmd_reply,
+			   sizeof(cmd_reply));
+	if (rc < 0) {
+		dev_err_ratelimited(ne_misc_dev.this_device,
+				    "Error in slot add vcpu [rc=%d]\n", rc);
+
+		goto put_file;
+	}
+
+	ne_vcpu_id->vcpu_id = vcpu_id;
+
+	list_add(&ne_vcpu_id->vcpu_id_list_entry, &ne_enclave->vcpu_ids_list);
+
+	ne_enclave->nr_vcpus++;
+
+	fd_install(fd, file);
+
+	return fd;
+
+put_file:
+	fput(file);
+put_fd:
+	put_unused_fd(fd);
+free_ne_vcpu_id:
+	kfree(ne_vcpu_id);
+
+	return rc;
+}
+
+static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
+			     unsigned long arg)
+{
+	struct ne_enclave *ne_enclave = file->private_data;
+
+	if (!ne_enclave || !ne_enclave->pdev)
+		return -EINVAL;
+
+	switch (cmd) {
+	case NE_CREATE_VCPU: {
+		int rc = -EINVAL;
+		u32 vcpu_id = 0;
+
+		if (copy_from_user(&vcpu_id, (void *)arg, sizeof(vcpu_id))) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Error in copy from user\n");
+
+			return -EFAULT;
+		}
+
+		mutex_lock(&ne_enclave->enclave_info_mutex);
+
+		if (ne_enclave->state != NE_STATE_INIT) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Enclave isn't in init state\n");
+
+			mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+			return -EINVAL;
+		}
+
+		/* Use the CPU pool for choosing a CPU for the enclave. */
+		rc = ne_get_cpu_from_cpu_pool(ne_enclave, &vcpu_id);
+		if (rc < 0) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Error in get CPU from pool\n");
+
+			mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+			return -EINVAL;
+		}
+
+		rc = ne_create_vcpu_ioctl(ne_enclave, vcpu_id);
+
+		/* Put back the CPU in enclave cpu pool, if add vcpu error. */
+		if (rc < 0)
+			cpumask_set_cpu(vcpu_id, ne_enclave->cpu_siblings);
+
+		mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+		if (copy_to_user((void *)arg, &vcpu_id, sizeof(vcpu_id))) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Error in copy to user\n");
+
+			return -EFAULT;
+		}
+
+		return rc;
+	}
+
+	default:
+		return -ENOTTY;
+	}
+
+	return 0;
+}
+
 static __poll_t ne_enclave_poll(struct file *file, poll_table *wait)
 {
 	__poll_t mask = 0;
@@ -79,6 +562,7 @@ static const struct file_operations ne_enclave_fops = {
 	.owner		= THIS_MODULE,
 	.llseek		= noop_llseek,
 	.poll		= ne_enclave_poll,
+	.unlocked_ioctl	= ne_enclave_ioctl,
 };
 
 /**
@@ -286,8 +770,15 @@ static int __init ne_init(void)
 
 static void __exit ne_exit(void)
 {
+	struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_AMAZON,
+					      PCI_DEVICE_ID_NE, NULL);
+	if (!pdev)
+		return;
+
 	pci_unregister_driver(&ne_pci_driver);
 
+	ne_teardown_cpu_pool(pdev);
+
 	free_cpumask_var(ne_cpu_pool.avail);
 }
 
-- 
2.20.1 (Apple Git-117)






^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 10/18] nitro_enclaves: Add logic for enclave image load info
  2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
                   ` (8 preceding siblings ...)
  2020-06-22 20:03 ` [PATCH v4 09/18] nitro_enclaves: Add logic for enclave vcpu creation Andra Paraschiv
@ 2020-06-22 20:03 ` Andra Paraschiv
  2020-07-06 10:16   ` Alexander Graf
  2020-06-22 20:03 ` [PATCH v4 11/18] nitro_enclaves: Add logic for enclave memory region set Andra Paraschiv
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv

Before setting the memory regions for the enclave, the enclave image
needs to be placed in memory. Once the memory regions are set, this
memory is carved out and can no longer be used by the primary VM.

Add ioctl command logic to get the offset in enclave memory at which
the enclave image has to be placed. The user space tooling then copies
the enclave image into memory at the given offset.
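
The copy step on the user space side might look roughly like the sketch
below. The helper ne_place_image() is hypothetical; it assumes the
caller already obtained the offset from the driver and owns a buffer
that will later be handed over as enclave memory. EIF_LOAD_OFFSET only
mirrors the 8 MiB value of NE_EIF_LOAD_OFFSET in this series for
reference.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors NE_EIF_LOAD_OFFSET from the driver: 8 MiB. */
#define EIF_LOAD_OFFSET (8UL * 1024UL * 1024UL)

/*
 * Hypothetical helper: copy the enclave image into the buffer backing
 * the enclave memory, starting at the given offset.
 *
 * Returns 0 on success, -1 if the image does not fit.
 */
int ne_place_image(uint8_t *enclave_mem, size_t mem_size, size_t offset,
		   const uint8_t *image, size_t image_size)
{
	/* Written this way to avoid overflow in offset + image_size. */
	if (offset > mem_size || image_size > mem_size - offset)
		return -1;

	memcpy(enclave_mem + offset, image, image_size);

	return 0;
}
```

The bounds check matters because the copy has to happen before the
memory regions are set; afterwards the buffer is no longer usable from
the primary VM.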

Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
---
Changelog

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Set enclave image load offset based on flags.
* Update the naming for the ioctl command from metadata to info.

v2 -> v3

* No changes.

v1 -> v2

* New in v2.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 25 +++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index d6777008f685..cfdefa52ed2a 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -536,6 +536,31 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
 		return rc;
 	}
 
+	case NE_GET_IMAGE_LOAD_INFO: {
+		struct ne_image_load_info image_load_info = {};
+
+		if (copy_from_user(&image_load_info, (void *)arg,
+				   sizeof(image_load_info))) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Error in copy from user\n");
+
+			return -EFAULT;
+		}
+
+		if (image_load_info.flags == NE_EIF_IMAGE)
+			image_load_info.memory_offset = NE_EIF_LOAD_OFFSET;
+
+		if (copy_to_user((void *)arg, &image_load_info,
+				 sizeof(image_load_info))) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Error in copy to user\n");
+
+			return -EFAULT;
+		}
+
+		return 0;
+	}
+
 	default:
 		return -ENOTTY;
 	}
-- 
2.20.1 (Apple Git-117)






^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 11/18] nitro_enclaves: Add logic for enclave memory region set
  2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
                   ` (9 preceding siblings ...)
  2020-06-22 20:03 ` [PATCH v4 10/18] nitro_enclaves: Add logic for enclave image load info Andra Paraschiv
@ 2020-06-22 20:03 ` Andra Paraschiv
  2020-07-06 10:46   ` Alexander Graf
  2020-06-22 20:03 ` [PATCH v4 12/18] nitro_enclaves: Add logic for enclave start Andra Paraschiv
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv

Another resource that is set for an enclave is memory. User space
memory regions, which need to be backed by physically contiguous memory,
are associated with the enclave.

One solution for allocating / reserving contiguous memory regions, used
for integration, is hugetlbfs. The user space process that is associated
with the enclave passes these memory regions to the driver.

The enclave memory regions need to be from the same NUMA node as the
enclave CPUs.

Add ioctl command logic for setting user space memory region for an
enclave.
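
User space can pre-check a region before issuing the ioctl. The sketch
below mirrors the size and alignment part of the driver's sanity check
(multiples of 2 MiB, 2 MiB aligned start). The function name
ne_mem_region_is_valid() is hypothetical, and rejecting empty or
wrapping ranges is an extra assumption of this example rather than
something the patch does in this form.

```c
#include <stdint.h>

/* Mirrors NE_MIN_MEM_REGION_SIZE from the driver: 2 MiB. */
#define MIN_MEM_REGION_SIZE (2ULL * 1024ULL * 1024ULL)

/* Returns 1 if the region looks acceptable, 0 otherwise. */
int ne_mem_region_is_valid(uint64_t userspace_addr, uint64_t memory_size)
{
	/* Size has to be a non-zero multiple of 2 MiB. */
	if (memory_size == 0 || memory_size % MIN_MEM_REGION_SIZE != 0)
		return 0;

	/* Start address has to be 2 MiB aligned. */
	if (userspace_addr & (MIN_MEM_REGION_SIZE - 1))
		return 0;

	/* The range must not wrap around the address space. */
	if (userspace_addr + memory_size < userspace_addr)
		return 0;

	return 1;
}
```

The NUMA node and physical contiguity requirements cannot be verified
this way; those are left to the driver.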

Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
---
Changelog

v3 -> v4

* Check enclave memory regions are from the same NUMA node as the
  enclave CPUs.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave max memory regions is reached when setting an enclave
  memory region.
* Check if enclave state is init when setting an enclave memory region.
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 257 ++++++++++++++++++++++
 1 file changed, 257 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index cfdefa52ed2a..17ccb6cdbd75 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -476,6 +476,233 @@ static int ne_create_vcpu_ioctl(struct ne_enclave *ne_enclave, u32 vcpu_id)
 	return rc;
 }
 
+/**
+ * ne_sanity_check_user_mem_region - Sanity check the userspace memory
+ * region received during the set user memory region ioctl call.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @mem_region: user space memory region to be sanity checked.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_sanity_check_user_mem_region(struct ne_enclave *ne_enclave,
+	struct ne_user_memory_region *mem_region)
+{
+	if (ne_enclave->mm != current->mm)
+		return -EIO;
+
+	if ((mem_region->memory_size % NE_MIN_MEM_REGION_SIZE) != 0) {
+		dev_err_ratelimited(ne_misc_dev.this_device,
+				    "Mem size not multiple of 2 MiB\n");
+
+		return -EINVAL;
+	}
+
+	if ((mem_region->userspace_addr & (NE_MIN_MEM_REGION_SIZE - 1)) ||
+	    !access_ok((void __user *)(unsigned long)mem_region->userspace_addr,
+		       mem_region->memory_size)) {
+		dev_err_ratelimited(ne_misc_dev.this_device,
+				    "Invalid user space addr range\n");
+
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/**
+ * ne_set_user_memory_region_ioctl - Add user space memory region to the slot
+ * associated with the current enclave.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @mem_region: user space memory region to be associated with the given slot.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
+	struct ne_user_memory_region *mem_region)
+{
+	struct ne_pci_dev_cmd_reply cmd_reply = {};
+	long gup_rc = 0;
+	unsigned long i = 0;
+	struct ne_mem_region *ne_mem_region = NULL;
+	unsigned long nr_phys_contig_mem_regions = 0;
+	unsigned long nr_pinned_pages = 0;
+	struct page **phys_contig_mem_regions = NULL;
+	int rc = -EINVAL;
+	struct slot_add_mem_req slot_add_mem_req = {};
+
+	rc = ne_sanity_check_user_mem_region(ne_enclave, mem_region);
+	if (rc < 0)
+		return rc;
+
+	ne_mem_region = kzalloc(sizeof(*ne_mem_region), GFP_KERNEL);
+	if (!ne_mem_region)
+		return -ENOMEM;
+
+	/*
+	 * TODO: Update nr_pages value to handle contiguous virtual address
+	 * ranges mapped to non-contiguous physical regions. Hugetlbfs can give
+	 * 2 MiB / 1 GiB contiguous physical regions.
+	 */
+	ne_mem_region->nr_pages = mem_region->memory_size /
+		NE_MIN_MEM_REGION_SIZE;
+
+	ne_mem_region->pages = kcalloc(ne_mem_region->nr_pages,
+				       sizeof(*ne_mem_region->pages),
+				       GFP_KERNEL);
+	if (!ne_mem_region->pages) {
+		kfree(ne_mem_region);
+
+		return -ENOMEM;
+	}
+
+	phys_contig_mem_regions = kcalloc(ne_mem_region->nr_pages,
+					  sizeof(*phys_contig_mem_regions),
+					  GFP_KERNEL);
+	if (!phys_contig_mem_regions) {
+		kfree(ne_mem_region->pages);
+		kfree(ne_mem_region);
+
+		return -ENOMEM;
+	}
+
+	/*
+	 * TODO: Handle non-contiguous memory regions received from user space.
+	 * Hugetlbfs can give 2 MiB / 1 GiB contiguous physical regions. The
+	 * virtual address space can be seen as contiguous, although it is
+	 * mapped underneath to 2 MiB / 1 GiB physical regions e.g. 8 MiB
+	 * virtual address space mapped to 4 physically contiguous regions of 2
+	 * MiB.
+	 */
+	do {
+		unsigned long tmp_nr_pages = ne_mem_region->nr_pages -
+			nr_pinned_pages;
+		struct page **tmp_pages = ne_mem_region->pages +
+			nr_pinned_pages;
+		u64 tmp_userspace_addr = mem_region->userspace_addr +
+			nr_pinned_pages * NE_MIN_MEM_REGION_SIZE;
+
+		gup_rc = pin_user_pages(tmp_userspace_addr, tmp_nr_pages,
+					FOLL_LONGTERM, tmp_pages, NULL);
+		if (gup_rc < 0) {
+			rc = gup_rc;
+
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Error in gup [rc=%d]\n", rc);
+
+			unpin_user_pages(ne_mem_region->pages, nr_pinned_pages);
+
+			goto free_mem_region;
+		}
+
+		nr_pinned_pages += gup_rc;
+	} while (nr_pinned_pages < ne_mem_region->nr_pages);
+
+	rc = -EINVAL;
+	/*
+	 * TODO: Update checks once physically contiguous regions are collected
+	 * based on the user space address and get_user_pages() results.
+	 */
+	for (i = 0; i < ne_mem_region->nr_pages; i++) {
+		if (!PageHuge(ne_mem_region->pages[i])) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Not a hugetlbfs page\n");
+
+			goto unpin_pages;
+		}
+
+		if (huge_page_size(page_hstate(ne_mem_region->pages[i])) !=
+		    NE_MIN_MEM_REGION_SIZE) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Page size isn't 2 MiB\n");
+
+			goto unpin_pages;
+		}
+
+		if (ne_enclave->numa_node !=
+		    page_to_nid(ne_mem_region->pages[i])) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Page isn't from NUMA node %d\n",
+					    ne_enclave->numa_node);
+
+			goto unpin_pages;
+		}
+
+		/*
+		 * TODO: Update once handled non-contiguous memory regions
+		 * received from user space.
+		 */
+		phys_contig_mem_regions[i] = ne_mem_region->pages[i];
+	}
+
+	/*
+	 * TODO: Update once handled non-contiguous memory regions received
+	 * from user space.
+	 */
+	nr_phys_contig_mem_regions = ne_mem_region->nr_pages;
+
+	if ((ne_enclave->nr_mem_regions + nr_phys_contig_mem_regions) >
+	    ne_enclave->max_mem_regions) {
+		dev_err_ratelimited(ne_misc_dev.this_device,
+				    "Reached max memory regions %lld\n",
+				    ne_enclave->max_mem_regions);
+
+		goto unpin_pages;
+	}
+
+	for (i = 0; i < nr_phys_contig_mem_regions; i++) {
+		u64 phys_addr = page_to_phys(phys_contig_mem_regions[i]);
+
+		slot_add_mem_req.slot_uid = ne_enclave->slot_uid;
+		slot_add_mem_req.paddr = phys_addr;
+		/*
+		 * TODO: Update memory size of physical contiguous memory
+		 * region, in case of non-contiguous memory regions received
+		 * from user space.
+		 */
+		slot_add_mem_req.size = NE_MIN_MEM_REGION_SIZE;
+
+		rc = ne_do_request(ne_enclave->pdev, SLOT_ADD_MEM,
+				   &slot_add_mem_req, sizeof(slot_add_mem_req),
+				   &cmd_reply, sizeof(cmd_reply));
+		if (rc < 0) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Error in slot add mem [rc=%d]\n",
+					    rc);
+
+			/* TODO: Only unpin memory regions not added. */
+			goto unpin_pages;
+		}
+
+		ne_enclave->mem_size += slot_add_mem_req.size;
+		ne_enclave->nr_mem_regions++;
+
+		memset(&slot_add_mem_req, 0, sizeof(slot_add_mem_req));
+		memset(&cmd_reply, 0, sizeof(cmd_reply));
+	}
+
+	list_add(&ne_mem_region->mem_region_list_entry,
+		 &ne_enclave->mem_regions_list);
+
+	kfree(phys_contig_mem_regions);
+
+	return 0;
+
+unpin_pages:
+	unpin_user_pages(ne_mem_region->pages, ne_mem_region->nr_pages);
+free_mem_region:
+	kfree(phys_contig_mem_regions);
+	kfree(ne_mem_region->pages);
+	kfree(ne_mem_region);
+
+	return rc;
+}
+
 static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
 			     unsigned long arg)
 {
@@ -561,6 +788,36 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
 		return 0;
 	}
 
+	case NE_SET_USER_MEMORY_REGION: {
+		struct ne_user_memory_region mem_region = {};
+		int rc = -EINVAL;
+
+		if (copy_from_user(&mem_region, (void __user *)arg,
+				   sizeof(mem_region))) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Error in copy from user\n");
+
+			return -EFAULT;
+		}
+
+		mutex_lock(&ne_enclave->enclave_info_mutex);
+
+		if (ne_enclave->state != NE_STATE_INIT) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Enclave isn't in init state\n");
+
+			mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+			return -EINVAL;
+		}
+
+		rc = ne_set_user_memory_region_ioctl(ne_enclave, &mem_region);
+
+		mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+		return rc;
+	}
+
 	default:
 		return -ENOTTY;
 	}
-- 
2.20.1 (Apple Git-117)
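
The size and alignment checks in ne_sanity_check_user_mem_region() above can
be mirrored in user space before issuing NE_SET_USER_MEMORY_REGION. The
following is a minimal sketch, not part of the series:
ne_region_looks_valid() is a hypothetical helper name, and the access_ok()
and mm ownership checks from the driver are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors NE_MIN_MEM_REGION_SIZE from the patch: 2 MiB. */
#define NE_MIN_MEM_REGION_SIZE (2ULL * 1024 * 1024)

/*
 * Hypothetical user-space helper (not part of the driver or its uapi).
 * Returns true when a region would pass the size and alignment checks
 * done by the driver; the remaining driver checks are not covered.
 */
static bool ne_region_looks_valid(uint64_t userspace_addr, uint64_t memory_size)
{
	/* The size has to be a multiple of 2 MiB. */
	if (memory_size % NE_MIN_MEM_REGION_SIZE != 0)
		return false;

	/* The start address has to be 2 MiB aligned. */
	if (userspace_addr & (NE_MIN_MEM_REGION_SIZE - 1))
		return false;

	return true;
}
```

A region at 0x200000 of 4 MiB passes, while the same region shifted by 4 KiB
or shrunk to 3 MiB would be rejected by the driver with -EINVAL.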




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in Romania. Registration number J22/2621/2005.



* [PATCH v4 12/18] nitro_enclaves: Add logic for enclave start
  2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
                   ` (10 preceding siblings ...)
  2020-06-22 20:03 ` [PATCH v4 11/18] nitro_enclaves: Add logic for enclave memory region set Andra Paraschiv
@ 2020-06-22 20:03 ` Andra Paraschiv
  2020-07-06 11:21   ` Alexander Graf
  2020-06-22 20:03 ` [PATCH v4 13/18] nitro_enclaves: Add logic for enclave termination Andra Paraschiv
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv

After all the enclave resources are set, the enclave is ready to start
running.

Add ioctl command logic for starting an enclave after all its resources,
memory regions and CPUs, have been set.

The enclave start information includes the local channel addressing -
vsock CID - and the flags associated with the enclave.

Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
---
Changelog

v3 -> v4

* Use dev_err instead of custom NE log pattern.
* Update the naming for the ioctl command from metadata to info.
* Check for minimum enclave memory size.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.

v1 -> v2

* Add log pattern for NE.
* Check if enclave state is init when starting an enclave.
* Remove the BUG_ON calls.
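
The order of the precondition checks done for NE_START_ENCLAVE in the diff
below can be summarised with a small user-space model. This is only a sketch
under stated assumptions: all sketch_* names are hypothetical, and
SKETCH_MIN_ENCLAVE_MEM_SIZE is a placeholder, since the real
NE_MIN_ENCLAVE_MEM_SIZE value is not part of this patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: placeholder value. The real NE_MIN_ENCLAVE_MEM_SIZE is
 * defined elsewhere in the series and is not shown in this patch.
 */
#define SKETCH_MIN_ENCLAVE_MEM_SIZE (64ULL * 1024 * 1024)

/* Hypothetical stand-ins for the driver's enclave states. */
enum sketch_state {
	SKETCH_STATE_INIT,
	SKETCH_STATE_RUNNING,
	SKETCH_STATE_STOPPED,
};

/* Hypothetical stand-in for the struct ne_enclave fields used here. */
struct sketch_enclave {
	enum sketch_state state;
	unsigned int nr_mem_regions;
	uint64_t mem_size;
	unsigned int nr_vcpus;
	bool has_unused_cpu_siblings;
};

/* Returns 0 if the start request would be forwarded, else a negative errno. */
static int sketch_start_precheck(const struct sketch_enclave *e)
{
	/* The enclave can only be started once, from the init state. */
	if (e->state != SKETCH_STATE_INIT)
		return -EINVAL;

	/* At least one memory region has to be set. */
	if (!e->nr_mem_regions)
		return -ENOMEM;

	/* The total enclave memory has to reach the minimum size. */
	if (e->mem_size < SKETCH_MIN_ENCLAVE_MEM_SIZE)
		return -ENOMEM;

	/* At least one vCPU has to be set. */
	if (!e->nr_vcpus)
		return -EINVAL;

	/* Full cores are required: no sibling may be left unused. */
	if (e->has_unused_cpu_siblings)
		return -EINVAL;

	return 0;
}
```

With this ordering, a start request issued before any memory is added fails
with ENOMEM, and a second start request on an already running enclave fails
with EINVAL.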
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 114 ++++++++++++++++++++++
 1 file changed, 114 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 17ccb6cdbd75..d9794f327169 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -703,6 +703,45 @@ static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
 	return rc;
 }
 
+/**
+ * ne_start_enclave_ioctl - Trigger enclave start after the enclave resources,
+ * such as memory and CPU, have been set.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @enclave_start_info: enclave info that includes enclave cid and flags.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_start_enclave_ioctl(struct ne_enclave *ne_enclave,
+	struct ne_enclave_start_info *enclave_start_info)
+{
+	struct ne_pci_dev_cmd_reply cmd_reply = {};
+	struct enclave_start_req enclave_start_req = {};
+	int rc = -EINVAL;
+
+	enclave_start_req.enclave_cid = enclave_start_info->enclave_cid;
+	enclave_start_req.flags = enclave_start_info->flags;
+	enclave_start_req.slot_uid = ne_enclave->slot_uid;
+
+	rc = ne_do_request(ne_enclave->pdev, ENCLAVE_START, &enclave_start_req,
+			   sizeof(enclave_start_req), &cmd_reply,
+			   sizeof(cmd_reply));
+	if (rc < 0) {
+		dev_err_ratelimited(ne_misc_dev.this_device,
+				    "Error in enclave start [rc=%d]\n", rc);
+
+		return rc;
+	}
+
+	ne_enclave->state = NE_STATE_RUNNING;
+
+	enclave_start_info->enclave_cid = cmd_reply.enclave_cid;
+
+	return 0;
+}
+
 static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
 			     unsigned long arg)
 {
@@ -818,6 +857,81 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
 		return rc;
 	}
 
+	case NE_START_ENCLAVE: {
+		struct ne_enclave_start_info enclave_start_info = {};
+		int rc = -EINVAL;
+
+		if (copy_from_user(&enclave_start_info, (void __user *)arg,
+				   sizeof(enclave_start_info))) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Error in copy from user\n");
+
+			return -EFAULT;
+		}
+
+		mutex_lock(&ne_enclave->enclave_info_mutex);
+
+		if (ne_enclave->state != NE_STATE_INIT) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Enclave isn't in init state\n");
+
+			mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+			return -EINVAL;
+		}
+
+		if (!ne_enclave->nr_mem_regions) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Enclave has no mem regions\n");
+
+			mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+			return -ENOMEM;
+		}
+
+		if (ne_enclave->mem_size < NE_MIN_ENCLAVE_MEM_SIZE) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Enclave memory is less than %ld\n",
+					    NE_MIN_ENCLAVE_MEM_SIZE);
+
+			mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+			return -ENOMEM;
+		}
+
+		if (!ne_enclave->nr_vcpus) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Enclave has no vcpus\n");
+
+			mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+			return -EINVAL;
+		}
+
+		if (!cpumask_empty(ne_enclave->cpu_siblings)) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "CPU siblings not used\n");
+
+			mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+			return -EINVAL;
+		}
+
+		rc = ne_start_enclave_ioctl(ne_enclave, &enclave_start_info);
+
+		mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+		if (copy_to_user((void __user *)arg, &enclave_start_info,
+				 sizeof(enclave_start_info))) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Error in copy to user\n");
+
+			return -EFAULT;
+		}
+
+		return rc;
+	}
+
 	default:
 		return -ENOTTY;
 	}
-- 
2.20.1 (Apple Git-117)





* [PATCH v4 13/18] nitro_enclaves: Add logic for enclave termination
  2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
                   ` (11 preceding siblings ...)
  2020-06-22 20:03 ` [PATCH v4 12/18] nitro_enclaves: Add logic for enclave start Andra Paraschiv
@ 2020-06-22 20:03 ` Andra Paraschiv
  2020-07-06 11:26   ` Alexander Graf
  2020-06-22 20:03 ` [PATCH v4 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver Andra Paraschiv
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv

An enclave is associated with an fd that is returned after the enclave
creation logic is completed. This enclave fd is further used to set up
enclave resources. Once the enclave needs to be terminated, the enclave
fd is closed.

Add logic for enclave termination, which is mapped to the enclave fd
release callback. Free the internal enclave info used for bookkeeping.

Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
---
Changelog

v3 -> v4

* Use dev_err instead of custom NE log pattern.

v2 -> v3

* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().

v1 -> v2

* Add log pattern for NE.
* Remove the BUG_ON calls.
* Update goto labels to match their purpose.
* Add early exit in release() if there was a slot alloc error in the fd
  creation path.
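
Which PCI device commands the release callback in the diff below issues
depends on the slot and on the enclave state. The following user-space model
sketches that decision only. All sketch_* names are hypothetical, and the
locking and the bookkeeping cleanup are left out.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the driver's enclave states. */
enum sketch_state {
	SKETCH_STATE_INIT,
	SKETCH_STATE_RUNNING,
	SKETCH_STATE_STOPPED,
};

/* Hypothetical bit flags, one per PCI device command sent on release. */
#define SKETCH_CMD_ENCLAVE_STOP	(1u << 0)
#define SKETCH_CMD_SLOT_FREE	(1u << 1)

/*
 * Returns the set of commands the release path would try to send.
 * In the driver, a failed ENCLAVE_STOP aborts the release before
 * SLOT_FREE is sent; that error path is not modelled here.
 */
static unsigned int sketch_release_cmds(uint64_t slot_uid,
					enum sketch_state state)
{
	unsigned int cmds = 0;

	/* No slot was allocated: early exit, nothing to tear down. */
	if (!slot_uid)
		return 0;

	/* Only an enclave that is neither init nor stopped needs a stop. */
	if (state != SKETCH_STATE_INIT && state != SKETCH_STATE_STOPPED)
		cmds |= SKETCH_CMD_ENCLAVE_STOP;

	/* The slot is always freed once it exists. */
	cmds |= SKETCH_CMD_SLOT_FREE;

	return cmds;
}
```

For example, closing the fd of a running enclave maps to ENCLAVE_STOP
followed by SLOT_FREE, while closing the fd of an enclave that was never
started maps to SLOT_FREE only.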
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 176 ++++++++++++++++++++++
 1 file changed, 176 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index d9794f327169..7c998d6b0173 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -939,6 +939,181 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
 	return 0;
 }
 
+/**
+ * ne_enclave_remove_all_mem_region_entries - Remove all memory region
+ * entries from the enclave data structure.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ */
+static void ne_enclave_remove_all_mem_region_entries(
+	struct ne_enclave *ne_enclave)
+{
+	struct ne_mem_region *ne_mem_region = NULL;
+	struct ne_mem_region *ne_mem_region_tmp = NULL;
+
+	list_for_each_entry_safe(ne_mem_region, ne_mem_region_tmp,
+				 &ne_enclave->mem_regions_list,
+				 mem_region_list_entry) {
+		list_del(&ne_mem_region->mem_region_list_entry);
+
+		unpin_user_pages(ne_mem_region->pages,
+				 ne_mem_region->nr_pages);
+
+		kfree(ne_mem_region->pages);
+
+		kfree(ne_mem_region);
+	}
+}
+
+/**
+ * ne_enclave_remove_all_vcpu_id_entries - Remove all vCPU id entries
+ * from the enclave data structure.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ */
+static void ne_enclave_remove_all_vcpu_id_entries(struct ne_enclave *ne_enclave)
+{
+	unsigned int cpu = 0;
+	struct ne_vcpu_id *ne_vcpu_id = NULL;
+	struct ne_vcpu_id *ne_vcpu_id_tmp = NULL;
+
+	mutex_lock(&ne_cpu_pool.mutex);
+
+	list_for_each_entry_safe(ne_vcpu_id, ne_vcpu_id_tmp,
+				 &ne_enclave->vcpu_ids_list,
+				 vcpu_id_list_entry) {
+		list_del(&ne_vcpu_id->vcpu_id_list_entry);
+
+		/* Update the available CPU pool. */
+		cpumask_set_cpu(ne_vcpu_id->vcpu_id, ne_cpu_pool.avail);
+
+		kfree(ne_vcpu_id);
+	}
+
+	/* If any siblings left in the enclave CPU pool, move to available. */
+	for_each_cpu(cpu, ne_enclave->cpu_siblings) {
+		cpumask_clear_cpu(cpu, ne_enclave->cpu_siblings);
+
+		cpumask_set_cpu(cpu, ne_cpu_pool.avail);
+	}
+
+	free_cpumask_var(ne_enclave->cpu_siblings);
+
+	mutex_unlock(&ne_cpu_pool.mutex);
+}
+
+/**
+ * ne_pci_dev_remove_enclave_entry - Remove enclave entry from the data
+ * structure that is part of the PCI device private data.
+ *
+ * This function gets called with the ne_pci_dev enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @ne_pci_dev: private data associated with the PCI device.
+ */
+static void ne_pci_dev_remove_enclave_entry(struct ne_enclave *ne_enclave,
+					    struct ne_pci_dev *ne_pci_dev)
+{
+	struct ne_enclave *ne_enclave_entry = NULL;
+	struct ne_enclave *ne_enclave_entry_tmp = NULL;
+
+	list_for_each_entry_safe(ne_enclave_entry, ne_enclave_entry_tmp,
+				 &ne_pci_dev->enclaves_list,
+				 enclave_list_entry) {
+		if (ne_enclave_entry->slot_uid == ne_enclave->slot_uid) {
+			list_del(&ne_enclave_entry->enclave_list_entry);
+
+			break;
+		}
+	}
+}
+
+static int ne_enclave_release(struct inode *inode, struct file *file)
+{
+	struct ne_pci_dev_cmd_reply cmd_reply = {};
+	struct enclave_stop_req enclave_stop_request = {};
+	struct ne_enclave *ne_enclave = file->private_data;
+	struct ne_pci_dev *ne_pci_dev = NULL;
+	int rc = -EINVAL;
+	struct slot_free_req slot_free_req = {};
+
+	if (!ne_enclave)
+		return 0;
+
+	/*
+	 * Early exit in case there is an error in the enclave creation logic
+	 * and fput() is called on the cleanup path.
+	 */
+	if (!ne_enclave->slot_uid)
+		return 0;
+
+	if (!ne_enclave->pdev)
+		return -EINVAL;
+
+	ne_pci_dev = pci_get_drvdata(ne_enclave->pdev);
+	if (!ne_pci_dev)
+		return -EINVAL;
+
+	/*
+	 * Acquire the enclave list mutex before the enclave mutex
+	 * in order to avoid deadlocks with @ref ne_event_work_handler.
+	 */
+	mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+	mutex_lock(&ne_enclave->enclave_info_mutex);
+
+	if (ne_enclave->state != NE_STATE_INIT &&
+	    ne_enclave->state != NE_STATE_STOPPED) {
+		enclave_stop_request.slot_uid = ne_enclave->slot_uid;
+
+		rc = ne_do_request(ne_enclave->pdev, ENCLAVE_STOP,
+				   &enclave_stop_request,
+				   sizeof(enclave_stop_request), &cmd_reply,
+				   sizeof(cmd_reply));
+		if (rc < 0) {
+			dev_err_ratelimited(ne_misc_dev.this_device,
+					    "Error in enclave stop [rc=%d]\n",
+					    rc);
+
+			goto unlock_mutex;
+		}
+
+		memset(&cmd_reply, 0, sizeof(cmd_reply));
+	}
+
+	slot_free_req.slot_uid = ne_enclave->slot_uid;
+
+	rc = ne_do_request(ne_enclave->pdev, SLOT_FREE, &slot_free_req,
+			   sizeof(slot_free_req), &cmd_reply,
+			   sizeof(cmd_reply));
+	if (rc < 0) {
+		dev_err_ratelimited(ne_misc_dev.this_device,
+				    "Error in slot free [rc=%d]\n", rc);
+
+		goto unlock_mutex;
+	}
+
+	ne_pci_dev_remove_enclave_entry(ne_enclave, ne_pci_dev);
+	ne_enclave_remove_all_mem_region_entries(ne_enclave);
+	ne_enclave_remove_all_vcpu_id_entries(ne_enclave);
+
+	mutex_unlock(&ne_enclave->enclave_info_mutex);
+	mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+
+	kfree(ne_enclave);
+
+	return 0;
+
+unlock_mutex:
+	mutex_unlock(&ne_enclave->enclave_info_mutex);
+	mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+
+	return rc;
+}
+
 static __poll_t ne_enclave_poll(struct file *file, poll_table *wait)
 {
 	__poll_t mask = 0;
@@ -959,6 +1134,7 @@ static const struct file_operations ne_enclave_fops = {
 	.llseek		= noop_llseek,
 	.poll		= ne_enclave_poll,
 	.unlocked_ioctl	= ne_enclave_ioctl,
+	.release	= ne_enclave_release,
 };
 
 /**
-- 
2.20.1 (Apple Git-117)





* [PATCH v4 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver
  2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
                   ` (12 preceding siblings ...)
  2020-06-22 20:03 ` [PATCH v4 13/18] nitro_enclaves: Add logic for enclave termination Andra Paraschiv
@ 2020-06-22 20:03 ` Andra Paraschiv
  2020-07-06 11:28   ` Alexander Graf
  2020-06-22 20:03 ` [PATCH v4 15/18] nitro_enclaves: Add Makefile " Andra Paraschiv
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv

Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
---
Changelog

v3 -> v4

* Add PCI and SMP dependencies.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Update path to Kconfig to match the drivers/virt/nitro_enclaves
  directory.
* Update help in Kconfig.
---
 drivers/virt/Kconfig                |  2 ++
 drivers/virt/nitro_enclaves/Kconfig | 16 ++++++++++++++++
 2 files changed, 18 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/Kconfig

diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index cbc1f25c79ab..80c5f9c16ec1 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -32,4 +32,6 @@ config FSL_HV_MANAGER
 	     partition shuts down.
 
 source "drivers/virt/vboxguest/Kconfig"
+
+source "drivers/virt/nitro_enclaves/Kconfig"
 endif
diff --git a/drivers/virt/nitro_enclaves/Kconfig b/drivers/virt/nitro_enclaves/Kconfig
new file mode 100644
index 000000000000..69e41aa2222d
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/Kconfig
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+# Amazon Nitro Enclaves (NE) support.
+# Nitro is a hypervisor that has been developed by Amazon.
+
+config NITRO_ENCLAVES
+	tristate "Nitro Enclaves Support"
+	depends on HOTPLUG_CPU && PCI && SMP
+	help
+	  This driver consists of support for enclave lifetime management
+	  for Nitro Enclaves (NE).
+
+	  To compile this driver as a module, choose M here.
+	  The module will be called nitro_enclaves.
-- 
2.20.1 (Apple Git-117)





* [PATCH v4 15/18] nitro_enclaves: Add Makefile for the Nitro Enclaves driver
  2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
                   ` (13 preceding siblings ...)
  2020-06-22 20:03 ` [PATCH v4 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver Andra Paraschiv
@ 2020-06-22 20:03 ` Andra Paraschiv
  2020-07-06 11:30   ` Alexander Graf
  2020-06-22 20:03 ` [PATCH v4 16/18] nitro_enclaves: Add sample for ioctl interface usage Andra Paraschiv
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv

Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
---
Changelog

v3 -> v4

* No changes.

v2 -> v3

* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* Update path to Makefile to match the drivers/virt/nitro_enclaves
  directory.
---
 drivers/virt/Makefile                |  2 ++
 drivers/virt/nitro_enclaves/Makefile | 11 +++++++++++
 2 files changed, 13 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/Makefile

diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index fd331247c27a..f28425ce4b39 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -5,3 +5,5 @@
 
 obj-$(CONFIG_FSL_HV_MANAGER)	+= fsl_hypervisor.o
 obj-y				+= vboxguest/
+
+obj-$(CONFIG_NITRO_ENCLAVES)	+= nitro_enclaves/
diff --git a/drivers/virt/nitro_enclaves/Makefile b/drivers/virt/nitro_enclaves/Makefile
new file mode 100644
index 000000000000..e9f4fcd1591e
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+# Enclave lifetime management support for Nitro Enclaves (NE).
+
+obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves.o
+
+nitro_enclaves-y := ne_pci_dev.o ne_misc_dev.o
+
+ccflags-y += -Wall
-- 
2.20.1 (Apple Git-117)





* [PATCH v4 16/18] nitro_enclaves: Add sample for ioctl interface usage
  2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
                   ` (14 preceding siblings ...)
  2020-06-22 20:03 ` [PATCH v4 15/18] nitro_enclaves: Add Makefile " Andra Paraschiv
@ 2020-06-22 20:03 ` Andra Paraschiv
  2020-07-06 11:39   ` Alexander Graf
  2020-06-22 20:03 ` [PATCH v4 17/18] nitro_enclaves: Add overview documentation Andra Paraschiv
  2020-06-22 20:03 ` [PATCH v4 18/18] MAINTAINERS: Add entry for the Nitro Enclaves driver Andra Paraschiv
  17 siblings, 1 reply; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv

Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
---
Changelog

v3 -> v4

* Update usage details to match the updates in v4.
* Update NE ioctl interface usage.

v2 -> v3

* Remove the include directory to use the uapi from the kernel.
* Remove the GPL additional wording as SPDX-License-Identifier is
  already in place.

v1 -> v2

* New in v2.
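
The usage notes in the sample below ask for enough 2 MiB hugepages to back
the enclave memory, and the sample itself uses 256 of them (512 MiB). As a
rough sketch, the value to write to nr_hugepages for another memory size
could be computed as follows; sketch_nr_2mib_hugepages() is a hypothetical
helper and not part of the sample.

```c
#include <stdint.h>

/* Mirrors NE_MIN_MEM_REGION_SIZE from the sample: 2 MiB. */
#define NE_MIN_MEM_REGION_SIZE (2ULL * 1024 * 1024)

/*
 * Hypothetical helper: number of 2 MiB hugepages needed to back the given
 * amount of enclave memory, rounded up to whole pages.
 */
static uint64_t sketch_nr_2mib_hugepages(uint64_t enclave_mem_bytes)
{
	return (enclave_mem_bytes + NE_MIN_MEM_REGION_SIZE - 1) /
		NE_MIN_MEM_REGION_SIZE;
}
```

The hugepages also have to come from the same NUMA node as the enclave CPUs,
which this calculation does not address.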
---
 samples/nitro_enclaves/.gitignore        |   2 +
 samples/nitro_enclaves/Makefile          |  16 +
 samples/nitro_enclaves/ne_ioctl_sample.c | 520 +++++++++++++++++++++++
 3 files changed, 538 insertions(+)
 create mode 100644 samples/nitro_enclaves/.gitignore
 create mode 100644 samples/nitro_enclaves/Makefile
 create mode 100644 samples/nitro_enclaves/ne_ioctl_sample.c

diff --git a/samples/nitro_enclaves/.gitignore b/samples/nitro_enclaves/.gitignore
new file mode 100644
index 000000000000..827934129c90
--- /dev/null
+++ b/samples/nitro_enclaves/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+ne_ioctl_sample
diff --git a/samples/nitro_enclaves/Makefile b/samples/nitro_enclaves/Makefile
new file mode 100644
index 000000000000..a3ec78fefb52
--- /dev/null
+++ b/samples/nitro_enclaves/Makefile
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+# Enclave lifetime management support for Nitro Enclaves (NE) - ioctl sample
+# usage.
+
+.PHONY: all clean
+
+CFLAGS += -Wall
+
+all:
+	$(CC) $(CFLAGS) -o ne_ioctl_sample ne_ioctl_sample.c -lpthread
+
+clean:
+	rm -f ne_ioctl_sample
diff --git a/samples/nitro_enclaves/ne_ioctl_sample.c b/samples/nitro_enclaves/ne_ioctl_sample.c
new file mode 100644
index 000000000000..572143d55d77
--- /dev/null
+++ b/samples/nitro_enclaves/ne_ioctl_sample.c
@@ -0,0 +1,520 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+/**
+ * Sample flow of using the ioctl interface provided by the Nitro Enclaves (NE)
+ * kernel driver.
+ *
+ * Usage
+ * -----
+ *
+ * Load the nitro_enclaves module, setting also the enclave CPU pool. The
+ * enclave CPUs need to be full cores from the same NUMA node. CPU 0 and its
+ * siblings have to remain available for the primary / parent VM, so they
+ * cannot be included in the enclave CPU pool.
+ *
+ * See the cpu list section from the kernel documentation.
+ * https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
+ *
+ *	insmod drivers/virt/nitro_enclaves/nitro_enclaves.ko
+ *	lsmod
+ *
+ *	The CPU pool can be set at runtime, after the kernel module is loaded.
+ *
+ *	echo <cpu-list> > /sys/module/nitro_enclaves/parameters/ne_cpus
+ *
+ *	NUMA and CPU siblings information can be found using
+ *
+ *	lscpu
+ *	/proc/cpuinfo
+ *
+ * Check the online / offline CPU list. The CPUs from the pool should be
+ * offlined.
+ *
+ *	lscpu
+ *
+ * Check dmesg for any warnings / errors through the NE driver lifetime / usage.
+ * The NE logs contain the "nitro_enclaves" or "pci 0000:00:02.0" pattern.
+ *
+ *	dmesg
+ *
+ * Set up hugetlbfs huge pages. The memory needs to be from the same NUMA node
+ * as the enclave CPUs.
+ * https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
+ *
+ *	echo <nr_hugepages> > /proc/sys/vm/nr_hugepages
+ *
+ *	or set the number of 2 MiB / 1 GiB hugepages using
+ *
+ *	/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
+ *	/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
+ *
+ *	In this example 256 hugepages of 2 MiB are used.
+ *
+ * Build and run the NE sample.
+ *
+ *	make -C samples/nitro_enclaves clean
+ *	make -C samples/nitro_enclaves
+ *	./samples/nitro_enclaves/ne_ioctl_sample <path_to_enclave_image>
+ *
+ * Unload the nitro_enclaves module.
+ *
+ *	rmmod nitro_enclaves
+ *	lsmod
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <poll.h>
+#include <pthread.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/eventfd.h>
+#include <sys/mman.h>
+#include <sys/socket.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <linux/nitro_enclaves.h>
+#include <linux/vm_sockets.h>
+
+/* Nitro Enclaves (NE) misc device that provides the ioctl interface. */
+#define NE_DEV_NAME "/dev/nitro_enclaves"
+#define NE_EXPECTED_API_VERSION (1)
+
+/* Timeout in seconds / milliseconds for each poll event. */
+#define NE_POLL_WAIT_TIME (60)
+#define NE_POLL_WAIT_TIME_MS (NE_POLL_WAIT_TIME * 1000)
+
+/* Amount of time in seconds for the process to keep the enclave alive. */
+#define NE_SLEEP_TIME (300)
+
+/* Enclave vCPUs metadata. */
+#define NE_DEFAULT_NR_VCPUS (2)
+
+/* Enclave memory metadata */
+
+/* Min memory size - 2 MiB */
+#define NE_MIN_MEM_REGION_SIZE (2 * 1024 * 1024)
+
+/* 256 memory regions of 2 MiB */
+#define NE_DEFAULT_NR_MEM_REGIONS (256)
+
+/* Vsock addressing for enclave image loading heartbeat. */
+#define NE_IMAGE_LOAD_VSOCK_CID (3)
+#define NE_IMAGE_LOAD_VSOCK_PORT (9000)
+#define NE_IMAGE_LOAD_HEARTBEAT_VALUE (0xb7)
+
+struct ne_mem_region {
+	void *mem_addr;
+	size_t mem_size;
+};
+
+struct ne_vcpu {
+	int vcpu_fd;
+	unsigned int vcpu_id;
+};
+
+/* Thread function for polling the enclave fd. */
+void *ne_poll_enclave_fd(void *data)
+{
+	int enclave_fd = *(int *)data;
+	struct pollfd fds[1] = {};
+	int i = 0;
+	int rc = 0;
+
+	printf("Running from poll thread, enclave fd %d\n", enclave_fd);
+
+	fds[0].fd = enclave_fd;
+	fds[0].events = POLLIN | POLLERR | POLLHUP;
+
+	/* Keep on polling until the current process is terminated. */
+	while (1) {
+		printf("[iter %d] Polling ...\n", i);
+
+		rc = poll(fds, 1, NE_POLL_WAIT_TIME_MS);
+		if (rc < 0) {
+			printf("Error in poll [%m]\n");
+
+			return NULL;
+		}
+
+		i++;
+
+		if (!rc) {
+			printf("Poll: %d seconds elapsed\n",
+			       i * NE_POLL_WAIT_TIME);
+
+			continue;
+		}
+
+		printf("Poll received value %d\n", fds[0].revents);
+	}
+
+	return NULL;
+}
+
+/* Allocate memory region that will be used for the enclave. */
+static int ne_alloc_mem_region(struct ne_mem_region *ne_mem_region)
+{
+	if (!ne_mem_region)
+		return -EINVAL;
+
+	if (!ne_mem_region->mem_size)
+		return -EINVAL;
+
+	ne_mem_region->mem_addr = mmap(NULL, ne_mem_region->mem_size,
+				       PROT_READ | PROT_WRITE,
+				       MAP_PRIVATE | MAP_ANONYMOUS |
+				       MAP_HUGETLB, -1, 0);
+	if (ne_mem_region->mem_addr == MAP_FAILED) {
+		printf("Error in mmap memory [%m]\n");
+
+		return -1;
+	}
+
+	return 0;
+}
+
+/* Place enclave image in enclave memory. */
+static int ne_load_enclave_image(int enclave_fd,
+	struct ne_mem_region ne_mem_regions[], char enclave_image_path[])
+{
+	struct ne_image_load_info image_load_info = {};
+	int rc = 0;
+
+	if (enclave_fd < 0)
+		return -EINVAL;
+
+	image_load_info.flags = NE_EIF_IMAGE;
+
+	rc = ioctl(enclave_fd, NE_GET_IMAGE_LOAD_INFO, &image_load_info);
+	if (rc < 0) {
+		printf("Error in get image load info [rc=%d]\n", rc);
+
+		return rc;
+	}
+
+	printf("Enclave image offset in enclave memory is %llu\n",
+	       image_load_info.memory_offset);
+
+	/*
+	 * TODO: Copy enclave image in enclave memory starting from the given
+	 * offset.
+	 */
+
+	return 0;
+}
+
+/* Wait for a heartbeat from the enclave to check it has booted. */
+static int ne_check_enclave_booted(void)
+{
+	struct sockaddr_vm client_vsock_addr = {};
+	socklen_t client_vsock_len = sizeof(client_vsock_addr);
+	struct pollfd fds[1] = {};
+	int rc = 0;
+	unsigned char recv_buf = 0;
+	struct sockaddr_vm server_vsock_addr = {
+		.svm_family = AF_VSOCK,
+		.svm_cid = NE_IMAGE_LOAD_VSOCK_CID,
+		.svm_port = NE_IMAGE_LOAD_VSOCK_PORT,
+	};
+	int client_vsock_fd = -1, server_vsock_fd = -1;
+
+	server_vsock_fd = socket(AF_VSOCK, SOCK_STREAM, 0);
+	if (server_vsock_fd < 0) {
+		rc = server_vsock_fd;
+
+		printf("Error in socket [rc=%d]\n", rc);
+
+		return rc;
+	}
+
+	rc = bind(server_vsock_fd, (struct sockaddr *)&server_vsock_addr,
+		  sizeof(server_vsock_addr));
+	if (rc < 0) {
+		printf("Error in bind [rc=%d]\n", rc);
+
+		goto out;
+	}
+
+	rc = listen(server_vsock_fd, 1);
+	if (rc < 0) {
+		printf("Error in listen [rc=%d]\n", rc);
+
+		goto out;
+	}
+
+	fds[0].fd = server_vsock_fd;
+	fds[0].events = POLLIN;
+
+	rc = poll(fds, 1, NE_POLL_WAIT_TIME_MS);
+	if (rc < 0) {
+		printf("Error in poll [%m]\n");
+
+		goto out;
+	}
+
+	if (!rc) {
+		printf("Poll timeout, %d seconds elapsed\n", NE_POLL_WAIT_TIME);
+
+		rc = -ETIMEDOUT;
+
+		goto out;
+	}
+
+	if ((fds[0].revents & POLLIN) == 0) {
+		printf("Poll received value %d\n", fds[0].revents);
+
+		rc = -EINVAL;
+
+		goto out;
+	}
+
+	client_vsock_fd = accept(server_vsock_fd,
+		(struct sockaddr *)&client_vsock_addr, &client_vsock_len);
+	if (client_vsock_fd < 0) {
+		rc = client_vsock_fd;
+		printf("Error in accept [rc=%d]\n", rc);
+		goto out;
+	}
+
+	/*
+	 * Read the heartbeat value that the init process in the enclave sends
+	 * after vsock connect. It arrives on the accepted (connected) socket.
+	 */
+	rc = read(client_vsock_fd, &recv_buf, sizeof(recv_buf));
+	if (rc < 0) {
+		printf("Error in read [rc=%d]\n", rc);
+
+		goto out;
+	}
+
+	if (rc != sizeof(recv_buf) ||
+	    recv_buf != NE_IMAGE_LOAD_HEARTBEAT_VALUE) {
+		printf("Read %d instead of %d\n", recv_buf,
+		       NE_IMAGE_LOAD_HEARTBEAT_VALUE);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	rc = 0;
+
+out:
+	if (client_vsock_fd >= 0)
+		close(client_vsock_fd);
+	close(server_vsock_fd);
+
+	return rc;
+}
+
+/* Set memory region for the given enclave. */
+static int ne_set_mem_region(int enclave_fd, struct ne_mem_region ne_mem_region)
+{
+	struct ne_user_memory_region mem_region = {};
+	int rc = 0;
+
+	if (enclave_fd < 0)
+		return -EINVAL;
+
+	mem_region.memory_size = ne_mem_region.mem_size;
+	mem_region.userspace_addr = (__u64)ne_mem_region.mem_addr;
+
+	rc = ioctl(enclave_fd, NE_SET_USER_MEMORY_REGION, &mem_region);
+	if (rc < 0) {
+		printf("Error in set user memory region [rc=%d]\n", rc);
+
+		return rc;
+	}
+
+	return 0;
+}
+
+/* Unmap all the memory regions that were set aside for the enclave. */
+static void ne_free_mem_regions(struct ne_mem_region ne_mem_regions[])
+{
+	unsigned int i = 0;
+
+	for (i = 0; i < NE_DEFAULT_NR_MEM_REGIONS; i++)
+		munmap(ne_mem_regions[i].mem_addr, ne_mem_regions[i].mem_size);
+}
+
+/* Create enclave vCPU. */
+static int ne_create_vcpu(int enclave_fd, struct ne_vcpu *ne_vcpu)
+{
+	if (enclave_fd < 0)
+		return -EINVAL;
+
+	if (!ne_vcpu)
+		return -EINVAL;
+
+	ne_vcpu->vcpu_fd = ioctl(enclave_fd, NE_CREATE_VCPU, &ne_vcpu->vcpu_id);
+	if (ne_vcpu->vcpu_fd < 0) {
+		printf("Error in create vcpu [rc=%d]\n", ne_vcpu->vcpu_fd);
+
+		return ne_vcpu->vcpu_fd;
+	}
+
+	return 0;
+}
+
+/* Release enclave vCPU fd(s). */
+static void ne_release_vcpus(struct ne_vcpu ne_vcpus[])
+{
+	unsigned int i = 0;
+
+	for (i = 0; i < NE_DEFAULT_NR_VCPUS; i++)
+		if (ne_vcpus[i].vcpu_fd > 0)
+			close(ne_vcpus[i].vcpu_fd);
+}
+
+int main(int argc, char *argv[])
+{
+	int enclave_fd = 0;
+	char enclave_image_path[PATH_MAX] = {};
+	struct ne_enclave_start_info enclave_start_info = {};
+	unsigned int i = 0;
+	int ne_api_version = 0;
+	int ne_dev_fd = 0;
+	struct ne_mem_region ne_mem_regions[NE_DEFAULT_NR_MEM_REGIONS] = {};
+	struct ne_vcpu ne_vcpus[NE_DEFAULT_NR_VCPUS] = {};
+	int rc = 0;
+	unsigned long slot_uid = 0;
+	pthread_t thread_id = 0;
+
+	if (argc != 2) {
+		printf("Usage: %s <path_to_enclave_image>\n", argv[0]);
+
+		exit(EXIT_FAILURE);
+	}
+
+	strncpy(enclave_image_path, argv[1], sizeof(enclave_image_path) - 1);
+
+	ne_dev_fd = open(NE_DEV_NAME, O_RDWR | O_CLOEXEC);
+	if (ne_dev_fd < 0) {
+		printf("Error in open NE device [rc=%d]\n", ne_dev_fd);
+
+		exit(EXIT_FAILURE);
+	}
+
+	ne_api_version = ioctl(ne_dev_fd, NE_GET_API_VERSION);
+	if (ne_api_version != NE_EXPECTED_API_VERSION) {
+		printf("Expected API version %d, provided API version %d\n",
+		       NE_EXPECTED_API_VERSION, ne_api_version);
+
+		close(ne_dev_fd);
+
+		exit(EXIT_FAILURE);
+	}
+
+	printf("Creating enclave slot ...\n");
+
+	enclave_fd = ioctl(ne_dev_fd, NE_CREATE_VM, &slot_uid);
+
+	close(ne_dev_fd);
+
+	if (enclave_fd < 0) {
+		printf("Error in create enclave slot [rc=%d]\n", enclave_fd);
+
+		exit(EXIT_FAILURE);
+	}
+
+	printf("Enclave fd %d\n", enclave_fd);
+
+	rc = pthread_create(&thread_id, NULL, ne_poll_enclave_fd,
+			    (void *)&enclave_fd);
+	if (rc != 0) {
+		printf("Error in thread create [rc=%d]\n", rc);
+
+		close(enclave_fd);
+
+		exit(EXIT_FAILURE);
+	}
+
+	for (i = 0; i < NE_DEFAULT_NR_MEM_REGIONS; i++) {
+		ne_mem_regions[i].mem_size = NE_MIN_MEM_REGION_SIZE;
+		rc = ne_alloc_mem_region(&ne_mem_regions[i]);
+		if (rc < 0) {
+			printf("Error in alloc mem region, iter %d [rc=%d]\n",
+			       i, rc);
+
+			goto release_enclave_fd;
+		}
+	}
+
+	rc = ne_load_enclave_image(enclave_fd, ne_mem_regions,
+				   enclave_image_path);
+	if (rc < 0) {
+		printf("Error in load enclave image [rc=%d]\n", rc);
+
+		goto release_enclave_fd;
+	}
+
+	for (i = 0; i < NE_DEFAULT_NR_MEM_REGIONS; i++) {
+		rc = ne_set_mem_region(enclave_fd, ne_mem_regions[i]);
+		if (rc < 0) {
+			printf("Error in set mem region, iter %d [rc=%d]\n",
+			       i, rc);
+
+			goto release_enclave_fd;
+		}
+	}
+
+	printf("Enclave memory regions were added\n");
+
+	for (i = 0; i < NE_DEFAULT_NR_VCPUS; i++) {
+		/*
+		 * The vCPU is chosen from the enclave vCPU pool, if the value
+		 * of the vcpu_id is 0.
+		 */
+		ne_vcpus[i].vcpu_id = 0;
+		rc = ne_create_vcpu(enclave_fd, &ne_vcpus[i]);
+		if (rc < 0) {
+			printf("Error in create vcpu, iter %d [rc=%d]\n",
+			       i, rc);
+
+			goto release_enclave_vcpu_fds;
+		}
+	}
+
+	printf("Enclave vCPUs were created\n");
+
+	rc = ioctl(enclave_fd, NE_START_ENCLAVE, &enclave_start_info);
+	if (rc < 0) {
+		printf("Error in start enclave [rc=%d]\n", rc);
+
+		goto release_enclave_vcpu_fds;
+	}
+
+	printf("Enclave started, CID %llu\n", enclave_start_info.enclave_cid);
+
+	/*
+	 * TODO: Check for enclave heartbeat after it has started to see if it
+	 * has booted.
+	 */
+
+	printf("Entering sleep for %d seconds ...\n", NE_SLEEP_TIME);
+
+	sleep(NE_SLEEP_TIME);
+
+	ne_release_vcpus(ne_vcpus);
+
+	close(enclave_fd);
+
+	ne_free_mem_regions(ne_mem_regions);
+
+	exit(EXIT_SUCCESS);
+
+release_enclave_vcpu_fds:
+	ne_release_vcpus(ne_vcpus);
+release_enclave_fd:
+	close(enclave_fd);
+	ne_free_mem_regions(ne_mem_regions);
+
+	exit(EXIT_FAILURE);
+}
-- 
2.20.1 (Apple Git-117)





* [PATCH v4 17/18] nitro_enclaves: Add overview documentation
  2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
                   ` (15 preceding siblings ...)
  2020-06-22 20:03 ` [PATCH v4 16/18] nitro_enclaves: Add sample for ioctl interface usage Andra Paraschiv
@ 2020-06-22 20:03 ` Andra Paraschiv
  2020-06-23  8:59   ` Stefan Hajnoczi
  2020-06-22 20:03 ` [PATCH v4 18/18] MAINTAINERS: Add entry for the Nitro Enclaves driver Andra Paraschiv
  17 siblings, 1 reply; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv

Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
---
Changelog

v3 -> v4

* Update doc type from .txt to .rst.
* Update documentation based on the changes from v4.

v2 -> v3

* No changes.

v1 -> v2

* New in v2.
---
 Documentation/nitro_enclaves/ne_overview.rst | 87 ++++++++++++++++++++
 1 file changed, 87 insertions(+)
 create mode 100644 Documentation/nitro_enclaves/ne_overview.rst

diff --git a/Documentation/nitro_enclaves/ne_overview.rst b/Documentation/nitro_enclaves/ne_overview.rst
new file mode 100644
index 000000000000..21b4ba932000
--- /dev/null
+++ b/Documentation/nitro_enclaves/ne_overview.rst
@@ -0,0 +1,87 @@
+Nitro Enclaves
+==============
+
+Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
+that allows customers to carve out isolated compute environments within EC2
+instances [1].
+
+For example, an application that processes sensitive data and runs in a VM
+can be separated from other applications running in the same VM. This
+application then runs in a VM separate from the primary VM, namely an enclave.
+
+An enclave runs alongside the VM that spawned it. This setup suits the needs of
+low-latency applications. The resources allocated for the enclave, such as
+memory and CPUs, are carved out of the primary VM. Each enclave is mapped to a
+process running in the primary VM that communicates with the NE driver via an
+ioctl interface.
+
+In this sense, there are two components:
+
+1. An enclave abstraction process - a user space process running in the primary
+VM guest that uses the provided ioctl interface of the NE driver to spawn an
+enclave VM (that's 2 below).
+
+An NE emulated PCI device is exposed to the primary VM. The driver for this
+new PCI device is included in the NE driver.
+
+The ioctl logic is mapped to PCI device commands, e.g. the NE_START_ENCLAVE
+ioctl maps to an enclave start PCI command. The PCI device commands are then
+translated into actions taken on the hypervisor side; that's the Nitro
+hypervisor running on the host where the primary VM is running. The Nitro
+hypervisor is based on core KVM technology.
+
+2. The enclave itself - a VM running on the same host as the primary VM that
+spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
+to the enclave VM. An enclave does not have persistent storage attached.
+
+The memory regions carved out of the primary VM and given to an enclave need to
+be aligned, physically contiguous regions of 2 MiB / 1 GiB (or a multiple of
+this size e.g. 8 MiB). The memory can be allocated e.g. by using hugetlbfs from
+user space [2][3]. The memory size for an enclave needs to be at least 64 MiB.
+The enclave memory and CPUs need to be from the same NUMA node.
+
+An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain
+available for the primary VM. A CPU pool has to be set for NE purposes by a
+user with admin capability. See the cpu list section from the kernel
+documentation [4] for the format of a CPU pool.
+
+An enclave communicates with the primary VM via a local communication channel,
+using virtio-vsock [5]. The primary VM has a virtio-pci vsock emulated device,
+while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
+uses eventfd for signaling. The enclave VM sees the usual interfaces - local
+APIC and IOAPIC - to get interrupts from the virtio-vsock device. The
+virtio-mmio device is placed in memory below the typical 4 GiB.
+
+The application that runs in the enclave needs to be packaged in an enclave
+image together with the OS (e.g. kernel, ramdisk, init) that will run in the
+enclave VM. The enclave VM has its own kernel and follows the standard Linux
+boot protocol.
+
+The kernel bzImage, the kernel command line and the ramdisk(s) are part of the
+Enclave Image Format (EIF), plus an EIF header including metadata such as magic
+number, EIF version, image size and CRC.
+
+Hash values are computed for the entire enclave image (EIF), the kernel and
+ramdisk(s). They are used, for example, to check that the enclave image that is
+loaded in the enclave VM is the one that was intended to be run.
+
+These crypto measurements are included in a signed attestation document
+generated by the Nitro Hypervisor and further used to prove the identity of the
+enclave; KMS is an example of a service that NE is integrated with and that
+checks the attestation doc.
+
+The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
+init process in the enclave connects to the vsock CID of the primary VM and a
+predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
+used to check in the primary VM that the enclave has booted.
+
+If the enclave VM crashes or gracefully exits, an interrupt event is received by
+the NE driver. This event is sent further to the user space enclave process
+running in the primary VM via a poll notification mechanism. Then the user space
+enclave process can exit.
+
+[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
+[2] https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
+[3] https://lwn.net/Articles/807108/
+[4] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
+[5] https://man7.org/linux/man-pages/man7/vsock.7.html
-- 
2.20.1 (Apple Git-117)





* [PATCH v4 18/18] MAINTAINERS: Add entry for the Nitro Enclaves driver
  2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
                   ` (16 preceding siblings ...)
  2020-06-22 20:03 ` [PATCH v4 17/18] nitro_enclaves: Add overview documentation Andra Paraschiv
@ 2020-06-22 20:03 ` Andra Paraschiv
  17 siblings, 0 replies; 67+ messages in thread
From: Andra Paraschiv @ 2020-06-22 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden,
	Alexander Graf, Greg KH, Martin Pohlack, Matt Wilson,
	Paolo Bonzini, Balbir Singh, Stefano Garzarella, Stefan Hajnoczi,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream,
	Andra Paraschiv

Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
---
Changelog

v3 -> v4

* No changes.

v2 -> v3

* Update file entries to be in alphabetical order.

v1 -> v2

* No changes.
---
 MAINTAINERS | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7b5ffd646c6b..66f35c4de16f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12115,6 +12115,19 @@ S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2.git
 F:	arch/nios2/
 
+NITRO ENCLAVES (NE)
+M:	Andra Paraschiv <andraprs@amazon.com>
+M:	Alexandru Vasile <lexnv@amazon.com>
+M:	Alexandru Ciobotaru <alcioa@amazon.com>
+L:	linux-kernel@vger.kernel.org
+S:	Supported
+W:	https://aws.amazon.com/ec2/nitro/nitro-enclaves/
+F:	Documentation/nitro_enclaves/
+F:	drivers/virt/nitro_enclaves/
+F:	include/linux/nitro_enclaves.h
+F:	include/uapi/linux/nitro_enclaves.h
+F:	samples/nitro_enclaves/
+
 NOHZ, DYNTICKS SUPPORT
 M:	Frederic Weisbecker <fweisbec@gmail.com>
 M:	Thomas Gleixner <tglx@linutronix.de>
-- 
2.20.1 (Apple Git-117)





* Re: [PATCH v4 01/18] nitro_enclaves: Add ioctl interface definition
  2020-06-22 20:03 ` [PATCH v4 01/18] nitro_enclaves: Add ioctl interface definition Andra Paraschiv
@ 2020-06-23  8:56   ` Stefan Hajnoczi
  2020-06-24 14:02     ` Paraschiv, Andra-Irina
  2020-07-02 15:24   ` Alexander Graf
  1 sibling, 1 reply; 67+ messages in thread
From: Stefan Hajnoczi @ 2020-06-23  8:56 UTC (permalink / raw)
  To: Andra Paraschiv
  Cc: linux-kernel, Anthony Liguori, Benjamin Herrenschmidt,
	Colm MacCarthaigh, Bjoern Doebel, David Woodhouse,
	Frank van der Linden, Alexander Graf, Greg KH, Martin Pohlack,
	Matt Wilson, Paolo Bonzini, Balbir Singh, Stefano Garzarella,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream

On Mon, Jun 22, 2020 at 11:03:12PM +0300, Andra Paraschiv wrote:
> diff --git a/include/uapi/linux/nitro_enclaves.h b/include/uapi/linux/nitro_enclaves.h
> new file mode 100644
> index 000000000000..3270eb939a97
> --- /dev/null
> +++ b/include/uapi/linux/nitro_enclaves.h
> @@ -0,0 +1,137 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + */
> +
> +#ifndef _UAPI_LINUX_NITRO_ENCLAVES_H_
> +#define _UAPI_LINUX_NITRO_ENCLAVES_H_
> +
> +#include <linux/types.h>
> +
> +/* Nitro Enclaves (NE) Kernel Driver Interface */
> +
> +#define NE_API_VERSION (1)
> +
> +/**
> + * The command is used to get the version of the NE API. This way the user space
> + * processes can be aware of the feature sets provided by the NE kernel driver.
> + *
> + * The NE API version is returned as result of this ioctl call.
> + */
> +#define NE_GET_API_VERSION _IO(0xAE, 0x20)
> +
> +/**
> + * The command is used to create a slot that is associated with an enclave VM.
> + *
> + * The generated unique slot id is a read parameter of this command. An enclave
> + * file descriptor is returned as result of this ioctl call. The enclave fd can
> + * be further used with ioctl calls to set vCPUs and memory regions, then start
> + * the enclave.
> + */
> +#define NE_CREATE_VM _IOR(0xAE, 0x21, __u64)

Information that would be useful for the ioctls:

1. Which fd the ioctl must be invoked on (/dev/nitro_enclaves, enclave fd, vCPU fd)

2. Errnos and their meanings

3. Which state(s) the ioctls may be invoked in (e.g. enclave created/started/etc)
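
As an illustration of point 1, here is a minimal sketch of how the
"which fd" information could be tabulated. The fd assignments are taken
from how the sample in patch 16 invokes each ioctl; the struct, enum and
function names (doc_ioctl, doc_target, doc_ioctl_find) are hypothetical
and not part of the NE interface. Errnos and valid states are left out
because this version of the patch does not specify them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: the fd an ioctl is issued on, as done by the sample. */
enum doc_target {
	DOC_TARGET_NE_DEV,     /* fd from open("/dev/nitro_enclaves") */
	DOC_TARGET_ENCLAVE_FD, /* fd returned by NE_CREATE_VM */
};

/* Hypothetical documentation record, one per ioctl. */
struct doc_ioctl {
	const char *name;
	enum doc_target target;
};

static const struct doc_ioctl doc_ioctls[] = {
	{ "NE_GET_API_VERSION",        DOC_TARGET_NE_DEV },
	{ "NE_CREATE_VM",              DOC_TARGET_NE_DEV },
	{ "NE_CREATE_VCPU",            DOC_TARGET_ENCLAVE_FD },
	{ "NE_GET_IMAGE_LOAD_INFO",    DOC_TARGET_ENCLAVE_FD },
	{ "NE_SET_USER_MEMORY_REGION", DOC_TARGET_ENCLAVE_FD },
	{ "NE_START_ENCLAVE",          DOC_TARGET_ENCLAVE_FD },
};

/* Look up an ioctl by name; returns NULL if it is not in the table. */
static const struct doc_ioctl *doc_ioctl_find(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(doc_ioctls) / sizeof(doc_ioctls[0]); i++)
		if (!strcmp(doc_ioctls[i].name, name))
			return &doc_ioctls[i];

	return NULL;
}
```

A table like this could sit next to the ioctl definitions as comments;
the lookup helper is only there to make the sketch checkable.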

> +/* User memory region flags */
> +
> +/* Memory region for enclave general usage. */
> +#define NE_DEFAULT_MEMORY_REGION (0x00)
> +
> +/* Memory region to be set for an enclave (write). */
> +struct ne_user_memory_region {
> +	/**
> +	 * Flags to determine the usage for the memory region (write).
> +	 */
> +	__u64 flags;

Where is the write flag defined?

I guess it's supposed to be:

  #define NE_USER_MEMORY_REGION_FLAG_WRITE (0x01)


* Re: [PATCH v4 17/18] nitro_enclaves: Add overview documentation
  2020-06-22 20:03 ` [PATCH v4 17/18] nitro_enclaves: Add overview documentation Andra Paraschiv
@ 2020-06-23  8:59   ` Stefan Hajnoczi
  2020-06-24 14:39     ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Stefan Hajnoczi @ 2020-06-23  8:59 UTC (permalink / raw)
  To: Andra Paraschiv
  Cc: linux-kernel, Anthony Liguori, Benjamin Herrenschmidt,
	Colm MacCarthaigh, Bjoern Doebel, David Woodhouse,
	Frank van der Linden, Alexander Graf, Greg KH, Martin Pohlack,
	Matt Wilson, Paolo Bonzini, Balbir Singh, Stefano Garzarella,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream

On Mon, Jun 22, 2020 at 11:03:28PM +0300, Andra Paraschiv wrote:
> +The kernel bzImage, the kernel command line, the ramdisk(s) are part of the
> +Enclave Image Format (EIF); plus an EIF header including metadata such as magic
> +number, eif version, image size and CRC.
> +
> +Hash values are computed for the entire enclave image (EIF), the kernel and
> +ramdisk(s). That's used, for example, to check that the enclave image that is
> +loaded in the enclave VM is the one that was intended to be run.
> +
> +These crypto measurements are included in a signed attestation document
> +generated by the Nitro Hypervisor and further used to prove the identity of the
> +enclave; KMS is an example of service that NE is integrated with and that checks
> +the attestation doc.
> +
> +The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
> +init process in the enclave connects to the vsock CID of the primary VM and a
> +predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
> +used to check in the primary VM that the enclave has booted.
> +
> +If the enclave VM crashes or gracefully exits, an interrupt event is received by
> +the NE driver. This event is sent further to the user space enclave process
> +running in the primary VM via a poll notification mechanism. Then the user space
> +enclave process can exit.
> +
> +[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
> +[2] https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
> +[3] https://lwn.net/Articles/807108/
> +[4] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
> +[5] https://man7.org/linux/man-pages/man7/vsock.7.html

Is the EIF specification and the attestation protocol available?

Stefan


* Re: [PATCH v4 01/18] nitro_enclaves: Add ioctl interface definition
  2020-06-23  8:56   ` Stefan Hajnoczi
@ 2020-06-24 14:02     ` Paraschiv, Andra-Irina
  2020-06-25 13:29       ` Stefan Hajnoczi
  0 siblings, 1 reply; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-06-24 14:02 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-kernel, Anthony Liguori, Benjamin Herrenschmidt,
	Colm MacCarthaigh, Bjoern Doebel, David Woodhouse,
	Frank van der Linden, Alexander Graf, Greg KH, Martin Pohlack,
	Matt Wilson, Paolo Bonzini, Balbir Singh, Stefano Garzarella,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream



On 23/06/2020 11:56, Stefan Hajnoczi wrote:
> On Mon, Jun 22, 2020 at 11:03:12PM +0300, Andra Paraschiv wrote:
>> diff --git a/include/uapi/linux/nitro_enclaves.h b/include/uapi/linux/nitro_enclaves.h
>> new file mode 100644
>> index 000000000000..3270eb939a97
>> --- /dev/null
>> +++ b/include/uapi/linux/nitro_enclaves.h
>> @@ -0,0 +1,137 @@
>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>> +/*
>> + * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
>> + */
>> +
>> +#ifndef _UAPI_LINUX_NITRO_ENCLAVES_H_
>> +#define _UAPI_LINUX_NITRO_ENCLAVES_H_
>> +
>> +#include <linux/types.h>
>> +
>> +/* Nitro Enclaves (NE) Kernel Driver Interface */
>> +
>> +#define NE_API_VERSION (1)
>> +
>> +/**
>> + * The command is used to get the version of the NE API. This way the user space
>> + * processes can be aware of the feature sets provided by the NE kernel driver.
>> + *
>> + * The NE API version is returned as result of this ioctl call.
>> + */
>> +#define NE_GET_API_VERSION _IO(0xAE, 0x20)
>> +
>> +/**
>> + * The command is used to create a slot that is associated with an enclave VM.
>> + *
>> + * The generated unique slot id is a read parameter of this command. An enclave
>> + * file descriptor is returned as result of this ioctl call. The enclave fd can
>> + * be further used with ioctl calls to set vCPUs and memory regions, then start
>> + * the enclave.
>> + */
>> +#define NE_CREATE_VM _IOR(0xAE, 0x21, __u64)
> Information that would be useful for the ioctls:
>
> 1. Which fd the ioctl must be invoked on (/dev/nitro_enclaves, enclave fd, vCPU fd)
>
> 2. Errnos and their meanings
>
> 3. Which state(s) the ioctls may be invoked in (e.g. enclave created/started/etc)

I'll include this info in v5. Indeed, that's useful for the user space 
tooling that interacts with the kernel driver, in addition to the code 
review itself and future refs, to understand how it works.

>
>> +/* User memory region flags */
>> +
>> +/* Memory region for enclave general usage. */
>> +#define NE_DEFAULT_MEMORY_REGION (0x00)
>> +
>> +/* Memory region to be set for an enclave (write). */
>> +struct ne_user_memory_region {
>> +	/**
>> +	 * Flags to determine the usage for the memory region (write).
>> +	 */
>> +	__u64 flags;
> Where is the write flag defined?
>
> I guess it's supposed to be:
>
>    #define NE_USER_MEMORY_REGION_FLAG_WRITE (0x01)

For now, the flags field is included in the NE ioctl interface for 
future extensions; it is not part of the NE PCI device interface yet.

The enclave image is copied into enclave memory before the enclave 
memory is carved out of the primary / parent VM. After carving it out 
(when the command request to add memory is sent to the PCI device and it 
is successfully completed), there will be faults if the enclave memory 
is written from the primary / parent VM.
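
To make the ordering concrete, below is a minimal sketch of the copy
step that has to happen before the regions are handed over with
NE_SET_USER_MEMORY_REGION. It is not the code from the sample (which
still has a TODO there). It assumes that the memory offset reported by
NE_GET_IMAGE_LOAD_INFO is counted across the user regions in the order
they will be set; struct mem_region and copy_image_to_regions() are
hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the sample's struct ne_mem_region. */
struct mem_region {
	void *addr;
	size_t size;
};

/*
 * Copy image_size bytes of image into the regions, treating them as one
 * logical range that starts at byte 0 of regions[0]. The copy starts at
 * offset bytes into that range and may span several regions.
 *
 * Returns 0 on success, -1 if the image does not fit (in which case a
 * partial copy may already have been written).
 */
static int copy_image_to_regions(struct mem_region *regions, size_t nr_regions,
				 size_t offset, const unsigned char *image,
				 size_t image_size)
{
	size_t copied = 0;
	size_t i;

	for (i = 0; i < nr_regions && copied < image_size; i++) {
		size_t chunk;

		/* Skip regions that lie entirely before the load offset. */
		if (offset >= regions[i].size) {
			offset -= regions[i].size;
			continue;
		}

		/* Copy as much as fits in the remainder of this region. */
		chunk = regions[i].size - offset;
		if (chunk > image_size - copied)
			chunk = image_size - copied;

		memcpy((unsigned char *)regions[i].addr + offset,
		       image + copied, chunk);

		copied += chunk;
		offset = 0;
	}

	return copied == image_size ? 0 : -1;
}
```

In the sample flow this would run after the mmap() of all regions and
before the loop that calls ne_set_mem_region(), since writes from the
primary VM fault once the memory has been carved out.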

Ah, and just as a note, that "read" / "write" in parentheses means that 
a certain data structure / field is read / written by user space. I 
updated to use "in" / "out" instead of "read" / "write" in v5.

Thank you.

Andra





* Re: [PATCH v4 17/18] nitro_enclaves: Add overview documentation
  2020-06-23  8:59   ` Stefan Hajnoczi
@ 2020-06-24 14:39     ` Paraschiv, Andra-Irina
  2020-06-25 13:10       ` Stefan Hajnoczi
  0 siblings, 1 reply; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-06-24 14:39 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-kernel, Anthony Liguori, Benjamin Herrenschmidt,
	Colm MacCarthaigh, Bjoern Doebel, David Woodhouse,
	Frank van der Linden, Alexander Graf, Greg KH, Martin Pohlack,
	Matt Wilson, Paolo Bonzini, Balbir Singh, Stefano Garzarella,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream



On 23/06/2020 11:59, Stefan Hajnoczi wrote:
> On Mon, Jun 22, 2020 at 11:03:28PM +0300, Andra Paraschiv wrote:
>> +The kernel bzImage, the kernel command line, the ramdisk(s) are part of the
>> +Enclave Image Format (EIF); plus an EIF header including metadata such as magic
>> +number, eif version, image size and CRC.
>> +
>> +Hash values are computed for the entire enclave image (EIF), the kernel and
>> +ramdisk(s). That's used, for example, to check that the enclave image that is
>> +loaded in the enclave VM is the one that was intended to be run.
>> +
>> +These crypto measurements are included in a signed attestation document
>> +generated by the Nitro Hypervisor and further used to prove the identity of the
>> +enclave; KMS is an example of service that NE is integrated with and that checks
>> +the attestation doc.
>> +
>> +The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
>> +init process in the enclave connects to the vsock CID of the primary VM and a
>> +predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
>> +used to check in the primary VM that the enclave has booted.
>> +
>> +If the enclave VM crashes or gracefully exits, an interrupt event is received by
>> +the NE driver. This event is sent further to the user space enclave process
>> +running in the primary VM via a poll notification mechanism. Then the user space
>> +enclave process can exit.
>> +
>> +[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
>> +[2] https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
>> +[3] https://lwn.net/Articles/807108/
>> +[4] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
>> +[5] https://man7.org/linux/man-pages/man7/vsock.7.html
> Is the EIF specification and the attestation protocol available?

For now, they are not publicly available. Once the refs are available 
(e.g. AWS documentation, GitHub documentation), I'll include them in the 
kernel documentation as well.

As a note here, the NE project is currently in preview 
(https://aws.amazon.com/ec2/nitro/nitro-enclaves/) and part of the 
documentation / codebase will be publicly available when NE is generally 
available (GA). This will be in addition to the ones already publicly 
available, like the NE kernel driver.

Let me know if I can help with any particular questions / clarifications.

Thanks,
Andra




* Re: [PATCH v4 17/18] nitro_enclaves: Add overview documentation
  2020-06-24 14:39     ` Paraschiv, Andra-Irina
@ 2020-06-25 13:10       ` Stefan Hajnoczi
  2020-06-25 17:36         ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Stefan Hajnoczi @ 2020-06-25 13:10 UTC (permalink / raw)
  To: Paraschiv, Andra-Irina
  Cc: linux-kernel, Anthony Liguori, Benjamin Herrenschmidt,
	Colm MacCarthaigh, Bjoern Doebel, David Woodhouse,
	Frank van der Linden, Alexander Graf, Greg KH, Martin Pohlack,
	Matt Wilson, Paolo Bonzini, Balbir Singh, Stefano Garzarella,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream

On Wed, Jun 24, 2020 at 05:39:39PM +0300, Paraschiv, Andra-Irina wrote:
> 
> 
> On 23/06/2020 11:59, Stefan Hajnoczi wrote:
> > On Mon, Jun 22, 2020 at 11:03:28PM +0300, Andra Paraschiv wrote:
> > > +The kernel bzImage, the kernel command line, the ramdisk(s) are part of the
> > > +Enclave Image Format (EIF); plus an EIF header including metadata such as magic
> > > +number, eif version, image size and CRC.
> > > +
> > > +Hash values are computed for the entire enclave image (EIF), the kernel and
> > > +ramdisk(s). That's used, for example, to check that the enclave image that is
> > > +loaded in the enclave VM is the one that was intended to be run.
> > > +
> > > +These crypto measurements are included in a signed attestation document
> > > +generated by the Nitro Hypervisor and further used to prove the identity of the
> > > +enclave; KMS is an example of service that NE is integrated with and that checks
> > > +the attestation doc.
> > > +
> > > +The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
> > > +init process in the enclave connects to the vsock CID of the primary VM and a
> > > +predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
> > > +used to check in the primary VM that the enclave has booted.
> > > +
> > > +If the enclave VM crashes or gracefully exits, an interrupt event is received by
> > > +the NE driver. This event is sent further to the user space enclave process
> > > +running in the primary VM via a poll notification mechanism. Then the user space
> > > +enclave process can exit.
> > > +
> > > +[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
> > > +[2] https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
> > > +[3] https://lwn.net/Articles/807108/
> > > +[4] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
> > > +[5] https://man7.org/linux/man-pages/man7/vsock.7.html
> > Is the EIF specification and the attestation protocol available?
> 
> For now, they are not publicly available. Once the refs are available (e.g.
> AWS documentation, GitHub documentation), I'll include them in the kernel
> documentation as well.
> 
> As a note here, the NE project is currently in preview
> (https://aws.amazon.com/ec2/nitro/nitro-enclaves/) and part of the
> documentation / codebase will be publicly available when NE is generally
> available (GA). This will be in addition to the ones already publicly
> available, like the NE kernel driver.
> 
> Let me know if I can help with any particular questions / clarifications.

Thanks!

Stefan


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 01/18] nitro_enclaves: Add ioctl interface definition
  2020-06-24 14:02     ` Paraschiv, Andra-Irina
@ 2020-06-25 13:29       ` Stefan Hajnoczi
  2020-06-25 17:42         ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Stefan Hajnoczi @ 2020-06-25 13:29 UTC (permalink / raw)
  To: Paraschiv, Andra-Irina
  Cc: linux-kernel, Anthony Liguori, Benjamin Herrenschmidt,
	Colm MacCarthaigh, Bjoern Doebel, David Woodhouse,
	Frank van der Linden, Alexander Graf, Greg KH, Martin Pohlack,
	Matt Wilson, Paolo Bonzini, Balbir Singh, Stefano Garzarella,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream


On Wed, Jun 24, 2020 at 05:02:54PM +0300, Paraschiv, Andra-Irina wrote:
> On 23/06/2020 11:56, Stefan Hajnoczi wrote:
> > On Mon, Jun 22, 2020 at 11:03:12PM +0300, Andra Paraschiv wrote:
> > > +/* User memory region flags */
> > > +
> > > +/* Memory region for enclave general usage. */
> > > +#define NE_DEFAULT_MEMORY_REGION (0x00)
> > > +
> > > +/* Memory region to be set for an enclave (write). */
> > > +struct ne_user_memory_region {
> > > +	/**
> > > +	 * Flags to determine the usage for the memory region (write).
> > > +	 */
> > > +	__u64 flags;
> > Where is the write flag defined?
> > 
> > I guess it's supposed to be:
> > 
> >    #define NE_USER_MEMORY_REGION_FLAG_WRITE (0x01)
> 
> For now, the flags field is included in the NE ioctl interface for
> extensions, it is not part of the NE PCI device interface yet.
...
> Ah, and just as a note, that "read" / "write" in parentheses means that a
> certain data structure / field is read / written by user space. I updated to
> use "in" / "out" instead of "read" / "write" in v5.

Oops, I got confused. I thought "(write)" was an example of a flag that
can be set on the memory region. Now I realize "write" means this field
is an input to the ioctl. :)
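
To make the "in" / "out" reading concrete, here is a minimal user-space
sketch, not the NE driver code. Every ex_* name is hypothetical; only the
flags field and the 0x00 default value come from the quoted patch, and the
size and slot fields are invented for illustration. It also shows one common
convention for a flags field kept for later extensions: reject any bit that
has no defined meaning yet.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the value of NE_DEFAULT_MEMORY_REGION in the quoted patch. */
#define EX_DEFAULT_MEMORY_REGION 0x00ULL

/* Hypothetical: the flag bits this sketch understands (none so far). */
#define EX_KNOWN_FLAGS EX_DEFAULT_MEMORY_REGION

/* Hypothetical argument structure; only "flags" appears in the patch. */
struct ex_memory_region {
	uint64_t flags; /* in:  written by user space, read by the handler */
	uint64_t size;  /* in:  invented field for the sketch */
	uint64_t slot;  /* out: invented field, written by the handler */
};

/* Stand-in for an ioctl handler: reads the "in" fields, fills the "out" one. */
static int ex_set_memory_region(struct ex_memory_region *region)
{
	assert(region != NULL);

	/* Unknown bits are refused so they stay free for future use. */
	if (region->flags & ~EX_KNOWN_FLAGS)
		return -EINVAL;

	if (region->size == 0)
		return -EINVAL;

	region->slot = 1;

	return 0;
}
```

With this reading, "(write)" or "in" on flags only says who fills the field
in; it does not name a flag value.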

Thanks for updating the docs.

Stefan


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 17/18] nitro_enclaves: Add overview documentation
  2020-06-25 13:10       ` Stefan Hajnoczi
@ 2020-06-25 17:36         ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-06-25 17:36 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-kernel, Anthony Liguori, Benjamin Herrenschmidt,
	Colm MacCarthaigh, Bjoern Doebel, David Woodhouse,
	Frank van der Linden, Alexander Graf, Greg KH, Martin Pohlack,
	Matt Wilson, Paolo Bonzini, Balbir Singh, Stefano Garzarella,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream



On 25/06/2020 16:10, Stefan Hajnoczi wrote:
> On Wed, Jun 24, 2020 at 05:39:39PM +0300, Paraschiv, Andra-Irina wrote:
>>
>> On 23/06/2020 11:59, Stefan Hajnoczi wrote:
>>> On Mon, Jun 22, 2020 at 11:03:28PM +0300, Andra Paraschiv wrote:
>>>> +The kernel bzImage, the kernel command line, the ramdisk(s) are part of the
>>>> +Enclave Image Format (EIF); plus an EIF header including metadata such as magic
>>>> +number, eif version, image size and CRC.
>>>> +
>>>> +Hash values are computed for the entire enclave image (EIF), the kernel and
>>>> +ramdisk(s). That's used, for example, to check that the enclave image that is
>>>> +loaded in the enclave VM is the one that was intended to be run.
>>>> +
>>>> +These crypto measurements are included in a signed attestation document
>>>> +generated by the Nitro Hypervisor and further used to prove the identity of the
>>>> +enclave; KMS is an example of service that NE is integrated with and that checks
>>>> +the attestation doc.
>>>> +
>>>> +The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
>>>> +init process in the enclave connects to the vsock CID of the primary VM and a
>>>> +predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
>>>> +used to check in the primary VM that the enclave has booted.
>>>> +
>>>> +If the enclave VM crashes or gracefully exits, an interrupt event is received by
>>>> +the NE driver. This event is sent further to the user space enclave process
>>>> +running in the primary VM via a poll notification mechanism. Then the user space
>>>> +enclave process can exit.
>>>> +
>>>> +[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
>>>> +[2] https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
>>>> +[3] https://lwn.net/Articles/807108/
>>>> +[4] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
>>>> +[5] https://man7.org/linux/man-pages/man7/vsock.7.html
>>> Is the EIF specification and the attestation protocol available?
>> For now, they are not publicly available. Once the refs are available (e.g.
>> AWS documentation, GitHub documentation), I'll include them in the kernel
>> documentation as well.
>>
>> As a note here, the NE project is currently in preview
>> (https://aws.amazon.com/ec2/nitro/nitro-enclaves/) and part of the
>> documentation / codebase will be publicly available when NE is generally
>> available (GA). This will be in addition to the ones already publicly
>> available, like the NE kernel driver.
>>
>> Let me know if I can help with any particular questions / clarifications.
> Thanks!

You are welcome.

Andra





^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 01/18] nitro_enclaves: Add ioctl interface definition
  2020-06-25 13:29       ` Stefan Hajnoczi
@ 2020-06-25 17:42         ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-06-25 17:42 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-kernel, Anthony Liguori, Benjamin Herrenschmidt,
	Colm MacCarthaigh, Bjoern Doebel, David Woodhouse,
	Frank van der Linden, Alexander Graf, Greg KH, Martin Pohlack,
	Matt Wilson, Paolo Bonzini, Balbir Singh, Stefano Garzarella,
	Stewart Smith, Uwe Dannowski, kvm, ne-devel-upstream



On 25/06/2020 16:29, Stefan Hajnoczi wrote:
> On Wed, Jun 24, 2020 at 05:02:54PM +0300, Paraschiv, Andra-Irina wrote:
>> On 23/06/2020 11:56, Stefan Hajnoczi wrote:
>>> On Mon, Jun 22, 2020 at 11:03:12PM +0300, Andra Paraschiv wrote:
>>>> +/* User memory region flags */
>>>> +
>>>> +/* Memory region for enclave general usage. */
>>>> +#define NE_DEFAULT_MEMORY_REGION (0x00)
>>>> +
>>>> +/* Memory region to be set for an enclave (write). */
>>>> +struct ne_user_memory_region {
>>>> +	/**
>>>> +	 * Flags to determine the usage for the memory region (write).
>>>> +	 */
>>>> +	__u64 flags;
>>> Where is the write flag defined?
>>>
>>> I guess it's supposed to be:
>>>
>>>     #define NE_USER_MEMORY_REGION_FLAG_WRITE (0x01)
>> For now, the flags field is included in the NE ioctl interface for
>> extensions, it is not part of the NE PCI device interface yet.
> ...
>> Ah, and just as a note, that "read" / "write" in parentheses means that a
>> certain data structure / field is read / written by user space. I updated to
>> use "in" / "out" instead of "read" / "write" in v5.
> Oops, I got confused. I thought "(write)" was an example of a flag that
> can be set on the memory region. Now I realize "write" means this field
> is an input to the ioctl. :)
>
> Thanks for updating the docs.

I was thinking this may be the case. :) Should be less confusing now, 
with the "in / out" updates.

Thanks also for the feedback.

Andra





^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 07/18] nitro_enclaves: Init misc device providing the ioctl interface
  2020-06-22 20:03 ` [PATCH v4 07/18] nitro_enclaves: Init misc device providing the ioctl interface Andra Paraschiv
@ 2020-06-29 16:20   ` Greg KH
  2020-06-29 17:45     ` Paraschiv, Andra-Irina
  2020-07-06  7:13   ` Alexander Graf
  1 sibling, 1 reply; 67+ messages in thread
From: Greg KH @ 2020-06-29 16:20 UTC (permalink / raw)
  To: Andra Paraschiv
  Cc: linux-kernel, Anthony Liguori, Benjamin Herrenschmidt,
	Colm MacCarthaigh, Bjoern Doebel, David Woodhouse,
	Frank van der Linden, Alexander Graf, Martin Pohlack,
	Matt Wilson, Paolo Bonzini, Balbir Singh, Stefano Garzarella,
	Stefan Hajnoczi, Stewart Smith, Uwe Dannowski, kvm,
	ne-devel-upstream

On Mon, Jun 22, 2020 at 11:03:18PM +0300, Andra Paraschiv wrote:
> +static int __init ne_init(void)
> +{
> +	struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_AMAZON,
> +					      PCI_DEVICE_ID_NE, NULL);
> +	int rc = -EINVAL;
> +
> +	if (!pdev)
> +		return -ENODEV;

Ick, that's a _very_ old-school way of binding to a pci device.  Please
just be a "real" pci driver and your probe function will be called if
your hardware is present (or when it shows up.)  To do it this way
prevents your driver from being auto-loaded for when your hardware is
seen in the system, as well as lots of other things.
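
As a rough illustration of why no device lookup is needed in module init,
here is a small user-space model of driver/device matching. It is not the
kernel's PCI core: all ex_* names and the ID values are made up for the
sketch. The point it models is that registering a driver succeeds whether or
not a matching device exists, and that probe runs for a device that is
already present or that shows up later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device and driver records, matched by vendor/device ID. */
struct ex_device {
	unsigned int vendor;
	unsigned int device;
	int bound;
};

struct ex_driver {
	unsigned int vendor;
	unsigned int device;
	int (*probe)(struct ex_device *dev);
};

#define EX_MAX_DEVS 4

static struct ex_device *ex_devs[EX_MAX_DEVS];
static size_t ex_ndevs;
static struct ex_driver *ex_drv;
static int ex_probe_calls;

/* Call probe only when the registered driver's IDs match the device. */
static void ex_try_bind(struct ex_device *dev)
{
	if (!ex_drv || dev->bound)
		return;
	if (dev->vendor != ex_drv->vendor || dev->device != ex_drv->device)
		return;
	if (ex_drv->probe(dev) == 0)
		dev->bound = 1;
}

/* Registration succeeds even when no matching device is present yet. */
static int ex_register_driver(struct ex_driver *drv)
{
	size_t i;

	ex_drv = drv;
	for (i = 0; i < ex_ndevs; i++)
		ex_try_bind(ex_devs[i]);

	return 0;
}

/* A device appearing later is offered to the already registered driver. */
static int ex_add_device(struct ex_device *dev)
{
	if (ex_ndevs == EX_MAX_DEVS)
		return -1;
	ex_devs[ex_ndevs++] = dev;
	ex_try_bind(dev);

	return 0;
}

/* All per-device setup lives here, not in the module init path. */
static int ex_probe(struct ex_device *dev)
{
	(void)dev;
	ex_probe_calls++;

	return 0;
}
```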

> +
> +	if (!zalloc_cpumask_var(&ne_cpu_pool.avail, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	mutex_init(&ne_cpu_pool.mutex);
> +
> +	rc = pci_register_driver(&ne_pci_driver);

Nice, you did it right here, but why the above crazy test?

> +	if (rc < 0) {
> +		dev_err(&pdev->dev,
> +			"Error in pci register driver [rc=%d]\n", rc);
> +
> +		goto free_cpumask;
> +	}
> +
> +	return 0;

You leaked a reference on that pci device, didn't you?  Not good :(

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 07/18] nitro_enclaves: Init misc device providing the ioctl interface
  2020-06-29 16:20   ` Greg KH
@ 2020-06-29 17:45     ` Paraschiv, Andra-Irina
  2020-06-30  8:05       ` Greg KH
  0 siblings, 1 reply; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-06-29 17:45 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, Anthony Liguori, Benjamin Herrenschmidt,
	Colm MacCarthaigh, Bjoern Doebel, David Woodhouse,
	Frank van der Linden, Alexander Graf, Martin Pohlack,
	Matt Wilson, Paolo Bonzini, Balbir Singh, Stefano Garzarella,
	Stefan Hajnoczi, Stewart Smith, Uwe Dannowski, kvm,
	ne-devel-upstream



On 29/06/2020 19:20, Greg KH wrote:
>
> On Mon, Jun 22, 2020 at 11:03:18PM +0300, Andra Paraschiv wrote:
>> +static int __init ne_init(void)
>> +{
>> +     struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_AMAZON,
>> +                                           PCI_DEVICE_ID_NE, NULL);
>> +     int rc = -EINVAL;
>> +
>> +     if (!pdev)
>> +             return -ENODEV;
> Ick, that's a _very_ old-school way of binding to a pci device.  Please
> just be a "real" pci driver and your probe function will be called if
> your hardware is present (or when it shows up.)  To do it this way
> prevents your driver from being auto-loaded for when your hardware is
> seen in the system, as well as lots of other things.

This check is mainly here in the case any codebase is added before the 
pci driver register call below.

And if we log any error with dev_err() instead of pr_err() before the 
driver register.

That check was only for logging purposes, if done with dev_err(). I 
removed the check in v5.

>> +
>> +     if (!zalloc_cpumask_var(&ne_cpu_pool.avail, GFP_KERNEL))
>> +             return -ENOMEM;
>> +
>> +     mutex_init(&ne_cpu_pool.mutex);
>> +
>> +     rc = pci_register_driver(&ne_pci_driver);
> Nice, you did it right here, but why the above crazy test?
>
>> +     if (rc < 0) {
>> +             dev_err(&pdev->dev,
>> +                     "Error in pci register driver [rc=%d]\n", rc);
>> +
>> +             goto free_cpumask;
>> +     }
>> +
>> +     return 0;
> You leaked a reference on that pci device, didn't you?  Not good :(

Yes, the pci get device call needs its pair - pci_dev_put(). I added it 
here and for the other occurrences where it was missing.

Thanks for the review.

Andra




^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 07/18] nitro_enclaves: Init misc device providing the ioctl interface
  2020-06-29 17:45     ` Paraschiv, Andra-Irina
@ 2020-06-30  8:05       ` Greg KH
  2020-06-30  9:08         ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Greg KH @ 2020-06-30  8:05 UTC (permalink / raw)
  To: Paraschiv, Andra-Irina
  Cc: linux-kernel, Anthony Liguori, Benjamin Herrenschmidt,
	Colm MacCarthaigh, Bjoern Doebel, David Woodhouse,
	Frank van der Linden, Alexander Graf, Martin Pohlack,
	Matt Wilson, Paolo Bonzini, Balbir Singh, Stefano Garzarella,
	Stefan Hajnoczi, Stewart Smith, Uwe Dannowski, kvm,
	ne-devel-upstream

On Mon, Jun 29, 2020 at 08:45:25PM +0300, Paraschiv, Andra-Irina wrote:
> 
> 
> On 29/06/2020 19:20, Greg KH wrote:
> > 
> > On Mon, Jun 22, 2020 at 11:03:18PM +0300, Andra Paraschiv wrote:
> > > +static int __init ne_init(void)
> > > +{
> > > +     struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_AMAZON,
> > > +                                           PCI_DEVICE_ID_NE, NULL);
> > > +     int rc = -EINVAL;
> > > +
> > > +     if (!pdev)
> > > +             return -ENODEV;
> > Ick, that's a _very_ old-school way of binding to a pci device.  Please
> > just be a "real" pci driver and your probe function will be called if
> > your hardware is present (or when it shows up.)  To do it this way
> > prevents your driver from being auto-loaded for when your hardware is
> > seen in the system, as well as lots of other things.
> 
> This check is mainly here in the case any codebase is added before the pci
> driver register call below.

What do you mean by "codebase"?  You control this driver, just do all of
the logic in the probe() function, no need to do this in the module init
call.

> And if we log any error with dev_err() instead of pr_err() before the driver
> register.

Don't do that.

> That check was only for logging purposes, if done with dev_err(). I removed
> the check in v5.

Again, don't do it :)

> 
> > > +
> > > +     if (!zalloc_cpumask_var(&ne_cpu_pool.avail, GFP_KERNEL))
> > > +             return -ENOMEM;
> > > +
> > > +     mutex_init(&ne_cpu_pool.mutex);
> > > +
> > > +     rc = pci_register_driver(&ne_pci_driver);
> > Nice, you did it right here, but why the above crazy test?
> > 
> > > +     if (rc < 0) {
> > > +             dev_err(&pdev->dev,
> > > +                     "Error in pci register driver [rc=%d]\n", rc);
> > > +
> > > +             goto free_cpumask;
> > > +     }
> > > +
> > > +     return 0;
> > You leaked a reference on that pci device, didn't you?  Not good :(
> 
> Yes, the pci get device call needs its pair - pci_dev_put(). I added it here
> and for the other occurrences where it was missing.

Again, just don't do it and then you don't have to worry about any of
this.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 07/18] nitro_enclaves: Init misc device providing the ioctl interface
  2020-06-30  8:05       ` Greg KH
@ 2020-06-30  9:08         ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-06-30  9:08 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, Anthony Liguori, Benjamin Herrenschmidt,
	Colm MacCarthaigh, Bjoern Doebel, David Woodhouse,
	Frank van der Linden, Alexander Graf, Martin Pohlack,
	Matt Wilson, Paolo Bonzini, Balbir Singh, Stefano Garzarella,
	Stefan Hajnoczi, Stewart Smith, Uwe Dannowski, kvm,
	ne-devel-upstream



On 30/06/2020 11:05, Greg KH wrote:
>
> On Mon, Jun 29, 2020 at 08:45:25PM +0300, Paraschiv, Andra-Irina wrote:
>>
>> On 29/06/2020 19:20, Greg KH wrote:
>>> On Mon, Jun 22, 2020 at 11:03:18PM +0300, Andra Paraschiv wrote:
>>>> +static int __init ne_init(void)
>>>> +{
>>>> +     struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_AMAZON,
>>>> +                                           PCI_DEVICE_ID_NE, NULL);
>>>> +     int rc = -EINVAL;
>>>> +
>>>> +     if (!pdev)
>>>> +             return -ENODEV;
>>> Ick, that's a _very_ old-school way of binding to a pci device.  Please
>>> just be a "real" pci driver and your probe function will be called if
>>> your hardware is present (or when it shows up.)  To do it this way
>>> prevents your driver from being auto-loaded for when your hardware is
>>> seen in the system, as well as lots of other things.
>> This check is mainly here in the case any codebase is added before the pci
>> driver register call below.
> What do you mean by "codebase"?  You control this driver, just do all of
> the logic in the probe() function, no need to do this in the module init
> call.
>
>> And if we log any error with dev_err() instead of pr_err() before the driver
>> register.
> Don't do that.
>
>> That check was only for logging purposes, if done with dev_err(). I removed
>> the check in v5.
> Again, don't do it :)
>
>>>> +
>>>> +     if (!zalloc_cpumask_var(&ne_cpu_pool.avail, GFP_KERNEL))
>>>> +             return -ENOMEM;
>>>> +
>>>> +     mutex_init(&ne_cpu_pool.mutex);
>>>> +
>>>> +     rc = pci_register_driver(&ne_pci_driver);
>>> Nice, you did it right here, but why the above crazy test?
>>>
>>>> +     if (rc < 0) {
>>>> +             dev_err(&pdev->dev,
>>>> +                     "Error in pci register driver [rc=%d]\n", rc);
>>>> +
>>>> +             goto free_cpumask;
>>>> +     }
>>>> +
>>>> +     return 0;
>>> You leaked a reference on that pci device, didn't you?  Not good :(
>> Yes, the pci get device call needs its pair - pci_dev_put(). I added it here
>> and for the other occurrences where it was missing.
> Again, just don't do it and then you don't have to worry about any of
> this.

Yup, I already started this morning to check and update the places where we
can do without this call to get a PCI device reference, as a follow-up to
what we discussed yesterday.

Thanks,
Andra




^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 04/18] nitro_enclaves: Init PCI device driver
  2020-06-22 20:03 ` [PATCH v4 04/18] nitro_enclaves: Init PCI device driver Andra Paraschiv
@ 2020-07-02 15:09   ` Alexander Graf
  2020-07-04 10:00     ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Graf @ 2020-07-02 15:09 UTC (permalink / raw)
  To: Andra Paraschiv, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 22.06.20 22:03, Andra Paraschiv wrote:
> The Nitro Enclaves PCI device is used by the kernel driver as a means of
> communication with the hypervisor on the host where the primary VM and
> the enclaves run. It handles requests with regard to enclave lifetime.
> 
> Setup the PCI device driver and add support for MSI-X interrupts.
> 
> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
> Signed-off-by: Alexandru Ciobotaru <alcioa@amazon.com>
> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
> ---
> Changelog
> 
> v3 -> v4
> 
> * Use dev_err instead of custom NE log pattern.
> * Update NE PCI driver name to "nitro_enclaves".
> 
> v2 -> v3
> 
> * Remove the GPL additional wording as SPDX-License-Identifier is
>    already in place.
> * Remove the WARN_ON calls.
> * Remove linux/bug include that is not needed.
> * Update static calls sanity checks.
> * Remove "ratelimited" from the logs that are not in the ioctl call
>    paths.
> * Update kzfree() calls to kfree().
> 
> v1 -> v2
> 
> * Add log pattern for NE.
> * Update PCI device setup functions to receive PCI device data structure and
>    then get private data from it inside the functions logic.
> * Remove the BUG_ON calls.
> * Add teardown function for MSI-X setup.
> * Update goto labels to match their purpose.
> * Implement TODO for NE PCI device disable state check.
> * Update function name for NE PCI device probe / remove.
> ---
>   drivers/virt/nitro_enclaves/ne_pci_dev.c | 261 +++++++++++++++++++++++
>   1 file changed, 261 insertions(+)
>   create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.c
> 
> diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
> new file mode 100644
> index 000000000000..235fa3ecbee2
> --- /dev/null
> +++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
> @@ -0,0 +1,261 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + */
> +
> +/* Nitro Enclaves (NE) PCI device driver. */
> +
> +#include <linux/delay.h>
> +#include <linux/device.h>
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <linux/module.h>
> +#include <linux/nitro_enclaves.h>
> +#include <linux/pci.h>
> +#include <linux/types.h>
> +#include <linux/wait.h>
> +
> +#include "ne_misc_dev.h"
> +#include "ne_pci_dev.h"
> +
> +#define NE_DEFAULT_TIMEOUT_MSECS (120000) /* 120 sec */
> +
> +static const struct pci_device_id ne_pci_ids[] = {
> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_NE) },
> +	{ 0, }
> +};
> +
> +MODULE_DEVICE_TABLE(pci, ne_pci_ids);
> +
> +/**
> + * ne_setup_msix - Setup MSI-X vectors for the PCI device.
> + *
> + * @pdev: PCI device to setup the MSI-X for.
> + *
> + * @returns: 0 on success, negative return value on failure.
> + */
> +static int ne_setup_msix(struct pci_dev *pdev)
> +{
> +	struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
> +	int nr_vecs = 0;
> +	int rc = -EINVAL;
> +
> +	if (!ne_pci_dev)
> +		return -EINVAL;
> +
> +	nr_vecs = pci_msix_vec_count(pdev);
> +	if (nr_vecs < 0) {
> +		rc = nr_vecs;
> +
> +		dev_err(&pdev->dev, "Error in getting vec count [rc=%d]\n", rc);
> +
> +		return rc;
> +	}
> +
> +	rc = pci_alloc_irq_vectors(pdev, nr_vecs, nr_vecs, PCI_IRQ_MSIX);
> +	if (rc < 0) {
> +		dev_err(&pdev->dev, "Error in alloc MSI-X vecs [rc=%d]\n", rc);
> +
> +		return rc;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * ne_teardown_msix - Teardown MSI-X vectors for the PCI device.
> + *
> + * @pdev: PCI device to teardown the MSI-X for.
> + */
> +static void ne_teardown_msix(struct pci_dev *pdev)
> +{
> +	struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
> +
> +	if (!ne_pci_dev)
> +		return;
> +
> +	pci_free_irq_vectors(pdev);
> +}
> +
> +/**
> + * ne_pci_dev_enable - Select PCI device version and enable it.
> + *
> + * @pdev: PCI device to select version for and then enable.
> + *
> + * @returns: 0 on success, negative return value on failure.
> + */
> +static int ne_pci_dev_enable(struct pci_dev *pdev)
> +{
> +	u8 dev_enable_reply = 0;
> +	u16 dev_version_reply = 0;
> +	struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
> +
> +	if (!ne_pci_dev || !ne_pci_dev->iomem_base)
> +		return -EINVAL;

How can this ever happen?

> +
> +	iowrite16(NE_VERSION_MAX, ne_pci_dev->iomem_base + NE_VERSION);
> +
> +	dev_version_reply = ioread16(ne_pci_dev->iomem_base + NE_VERSION);
> +	if (dev_version_reply != NE_VERSION_MAX) {
> +		dev_err(&pdev->dev, "Error in pci dev version cmd\n");
> +
> +		return -EIO;
> +	}
> +
> +	iowrite8(NE_ENABLE_ON, ne_pci_dev->iomem_base + NE_ENABLE);
> +
> +	dev_enable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE);
> +	if (dev_enable_reply != NE_ENABLE_ON) {
> +		dev_err(&pdev->dev, "Error in pci dev enable cmd\n");
> +
> +		return -EIO;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * ne_pci_dev_disable - Disable PCI device.
> + *
> + * @pdev: PCI device to disable.
> + */
> +static void ne_pci_dev_disable(struct pci_dev *pdev)
> +{
> +	u8 dev_disable_reply = 0;
> +	struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
> +	const unsigned int sleep_time = 10; /* 10 ms */
> +	unsigned int sleep_time_count = 0;
> +
> +	if (!ne_pci_dev || !ne_pci_dev->iomem_base)
> +		return;

How can this ever happen?


Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 05/18] nitro_enclaves: Handle PCI device command requests
  2020-06-22 20:03 ` [PATCH v4 05/18] nitro_enclaves: Handle PCI device command requests Andra Paraschiv
@ 2020-07-02 15:19   ` Alexander Graf
  2020-07-04 15:05     ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Graf @ 2020-07-02 15:19 UTC (permalink / raw)
  To: Andra Paraschiv, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream, kbuild test robot



On 22.06.20 22:03, Andra Paraschiv wrote:
> The Nitro Enclaves PCI device exposes a MMIO space that this driver
> uses to submit command requests and to receive command replies e.g. for
> enclave creation / termination or setting enclave resources.
> 
> Add logic for handling PCI device command requests based on the given
> command type.
> 
> Register an MSI-X interrupt vector for command reply notifications to
> handle this type of communication events.
> 
> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
> 
> Fix issue reported in:
> https://lore.kernel.org/lkml/202004231644.xTmN4Z1z%25lkp@intel.com/
> 
> Reported-by: kbuild test robot <lkp@intel.com>
> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
> ---
> Changelog
> 
> v3 -> v4
> 
> * Use dev_err instead of custom NE log pattern.
> * Return IRQ_NONE when interrupts are not handled.
> 
> v2 -> v3
> 
> * Remove the WARN_ON calls.
> * Update static calls sanity checks.
> * Remove "ratelimited" from the logs that are not in the ioctl call
>    paths.
> 
> v1 -> v2
> 
> * Add log pattern for NE.
> * Remove the BUG_ON calls.
> * Update goto labels to match their purpose.
> * Add fix for kbuild report.
> ---
>   drivers/virt/nitro_enclaves/ne_pci_dev.c | 232 +++++++++++++++++++++++
>   1 file changed, 232 insertions(+)
> 
> diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
> index 235fa3ecbee2..c24230cfe7c0 100644
> --- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
> +++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
> @@ -27,6 +27,218 @@ static const struct pci_device_id ne_pci_ids[] = {
>   
>   MODULE_DEVICE_TABLE(pci, ne_pci_ids);
>   
> +/**
> + * ne_submit_request - Submit command request to the PCI device based on the
> + * command type.
> + *
> + * This function gets called with the ne_pci_dev mutex held.
> + *
> + * @pdev: PCI device to send the command to.
> + * @cmd_type: command type of the request sent to the PCI device.
> + * @cmd_request: command request payload.
> + * @cmd_request_size: size of the command request payload.
> + *
> + * @returns: 0 on success, negative return value on failure.
> + */
> +static int ne_submit_request(struct pci_dev *pdev,
> +			     enum ne_pci_dev_cmd_type cmd_type,
> +			     void *cmd_request, size_t cmd_request_size)
> +{
> +	struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
> +
> +	if (!ne_pci_dev || !ne_pci_dev->iomem_base)
> +		return -EINVAL;

How can this ever happen?

> +
> +	memcpy_toio(ne_pci_dev->iomem_base + NE_SEND_DATA, cmd_request,
> +		    cmd_request_size);
> +
> +	iowrite32(cmd_type, ne_pci_dev->iomem_base + NE_COMMAND);
> +
> +	return 0;
> +}
> +
> +/**
> + * ne_retrieve_reply - Retrieve reply from the PCI device.
> + *
> + * This function gets called with the ne_pci_dev mutex held.
> + *
> + * @pdev: PCI device to receive the reply from.
> + * @cmd_reply: command reply payload.
> + * @cmd_reply_size: size of the command reply payload.
> + *
> + * @returns: 0 on success, negative return value on failure.
> + */
> +static int ne_retrieve_reply(struct pci_dev *pdev,
> +			     struct ne_pci_dev_cmd_reply *cmd_reply,
> +			     size_t cmd_reply_size)
> +{
> +	struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
> +
> +	if (!ne_pci_dev || !ne_pci_dev->iomem_base)
> +		return -EINVAL;

Same.

> +
> +	memcpy_fromio(cmd_reply, ne_pci_dev->iomem_base + NE_RECV_DATA,
> +		      cmd_reply_size);
> +
> +	return 0;
> +}
> +
> +/**
> + * ne_wait_for_reply - Wait for a reply of a PCI command.
> + *
> + * This function gets called with the ne_pci_dev mutex held.
> + *
> + * @pdev: PCI device for which a reply is waited.
> + *
> + * @returns: 0 on success, negative return value on failure.
> + */
> +static int ne_wait_for_reply(struct pci_dev *pdev)
> +{
> +	struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
> +	int rc = -EINVAL;

Unused assignment?

> +
> +	if (!ne_pci_dev)
> +		return -EINVAL;

Same.

> +
> +	/*
> +	 * TODO: Update to _interruptible and handle interrupted wait event
> +	 * e.g. -ERESTARTSYS, incoming signals + add / update timeout.
> +	 */
> +	rc = wait_event_timeout(ne_pci_dev->cmd_reply_wait_q,
> +				atomic_read(&ne_pci_dev->cmd_reply_avail) != 0,
> +				msecs_to_jiffies(NE_DEFAULT_TIMEOUT_MSECS));
> +	if (!rc)
> +		return -ETIMEDOUT;
> +
> +	return 0;
> +}
> +
> +int ne_do_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type,
> +		  void *cmd_request, size_t cmd_request_size,
> +		  struct ne_pci_dev_cmd_reply *cmd_reply, size_t cmd_reply_size)
> +{
> +	struct ne_pci_dev *ne_pci_dev = NULL;
> +	int rc = -EINVAL;
> +
> +	if (!pdev)
> +		return -ENODEV;

When can this happen?

> +
> +	ne_pci_dev = pci_get_drvdata(pdev);
> +	if (!ne_pci_dev || !ne_pci_dev->iomem_base)
> +		return -EINVAL;

Same

> +
> +	if (cmd_type <= INVALID_CMD || cmd_type >= MAX_CMD) {
> +		dev_err_ratelimited(&pdev->dev, "Invalid cmd type=%u\n",
> +				    cmd_type);
> +
> +		return -EINVAL;
> +	}
> +
> +	if (!cmd_request) {
> +		dev_err_ratelimited(&pdev->dev, "Null cmd request\n");
> +
> +		return -EINVAL;
> +	}
> +
> +	if (cmd_request_size > NE_SEND_DATA_SIZE) {
> +		dev_err_ratelimited(&pdev->dev,
> +				    "Invalid req size=%zu for cmd type=%u\n",
> +				    cmd_request_size, cmd_type);
> +
> +		return -EINVAL;
> +	}
> +
> +	if (!cmd_reply) {
> +		dev_err_ratelimited(&pdev->dev, "Null cmd reply\n");
> +
> +		return -EINVAL;
> +	}
> +
> +	if (cmd_reply_size > NE_RECV_DATA_SIZE) {
> +		dev_err_ratelimited(&pdev->dev, "Invalid reply size=%zu\n",
> +				    cmd_reply_size);
> +
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Use this mutex so that the PCI device handles one command request at
> +	 * a time.
> +	 */
> +	mutex_lock(&ne_pci_dev->pci_dev_mutex);
> +
> +	atomic_set(&ne_pci_dev->cmd_reply_avail, 0);
> +
> +	rc = ne_submit_request(pdev, cmd_type, cmd_request, cmd_request_size);
> +	if (rc < 0) {
> +		dev_err_ratelimited(&pdev->dev,
> +				    "Error in submit request [rc=%d]\n", rc);
> +
> +		goto unlock_mutex;
> +	}
> +
> +	rc = ne_wait_for_reply(pdev);
> +	if (rc < 0) {
> +		dev_err_ratelimited(&pdev->dev,
> +				    "Error in wait for reply [rc=%d]\n", rc);
> +
> +		goto unlock_mutex;
> +	}
> +
> +	rc = ne_retrieve_reply(pdev, cmd_reply, cmd_reply_size);
> +	if (rc < 0) {
> +		dev_err_ratelimited(&pdev->dev,
> +				    "Error in retrieve reply [rc=%d]\n", rc);
> +
> +		goto unlock_mutex;
> +	}
> +
> +	atomic_set(&ne_pci_dev->cmd_reply_avail, 0);
> +
> +	if (cmd_reply->rc < 0) {
> +		dev_err_ratelimited(&pdev->dev,
> +				    "Error in cmd process logic [rc=%d]\n",
> +				    cmd_reply->rc);
> +
> +		rc = cmd_reply->rc;
> +
> +		goto unlock_mutex;
> +	}
> +
> +	mutex_unlock(&ne_pci_dev->pci_dev_mutex);
> +
> +	return 0;

Can you just set rc to 0 and fall through?
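
A minimal userspace sketch of the fall-through shape, under stated assumptions: a toy lock counter stands in for pci_dev_mutex, and `step_submit` / `step_wait` are hypothetical placeholders for the ne_* helpers. Every path, success included, leaves through the single unlock label.

```c
#include <assert.h>
#include <errno.h>

/* Toy lock standing in for pci_dev_mutex; tracks lock/unlock balance. */
static int lock_depth;

static void sketch_lock(void) { lock_depth++; }
static void sketch_unlock(void) { lock_depth--; }

/* Hypothetical steps: 0 on success, negative errno on failure. */
static int step_submit(int fail) { return fail ? -EIO : 0; }
static int step_wait(int fail) { return fail ? -ETIMEDOUT : 0; }

static int do_request_sketch(int fail_submit, int fail_wait)
{
	int rc;

	sketch_lock();

	rc = step_submit(fail_submit);
	if (rc < 0)
		goto unlock;

	rc = step_wait(fail_wait);
	if (rc < 0)
		goto unlock;

	rc = 0; /* success falls through to the same label */

unlock:
	sketch_unlock();

	/* Callers only ever see 0 or a negative errno. */
	assert(rc <= 0);

	return rc;
}
```

With one exit path there is a single unlock call to keep correct when steps are added later.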

> +
> +unlock_mutex:
> +	mutex_unlock(&ne_pci_dev->pci_dev_mutex);
> +
> +	return rc;
> +}
> +
> +/**
> + * ne_reply_handler - Interrupt handler for retrieving a reply matching
> + * a request sent to the PCI device for enclave lifetime management.
> + *
> + * @irq: received interrupt for a reply sent by the PCI device.
> + * @args: PCI device private data structure.
> + *
> + * @returns: IRQ_HANDLED on handled interrupt, IRQ_NONE otherwise.
> + */
> +static irqreturn_t ne_reply_handler(int irq, void *args)
> +{
> +	struct ne_pci_dev *ne_pci_dev = (struct ne_pci_dev *)args;
> +
> +	if (!ne_pci_dev)
> +		return IRQ_NONE;

How can this ever happen?
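
To spell out the reasoning with a rough userspace analogy (not kernel code): the handler receives exactly the cookie that was passed at registration time, the way request_irq() hands its dev_id back to the handler. All names below (`register_handler_sketch`, `fire_irq_sketch`, `struct irq_dev_sketch`) are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SK_IRQ_NONE	0
#define SK_IRQ_HANDLED	1

typedef int (*handler_fn)(int irq, void *args);

/* One registered handler plus the cookie given at registration. */
struct irq_slot {
	handler_fn handler;
	void *dev_id;
};

static struct irq_slot slot;

/* Store the handler and its cookie, like request_irq() does with dev_id. */
static int register_handler_sketch(handler_fn handler, void *dev_id)
{
	if (!handler)
		return -1;

	slot.handler = handler;
	slot.dev_id = dev_id;

	return 0;
}

/* Deliver an "interrupt": the handler gets back the stored cookie. */
static int fire_irq_sketch(int irq)
{
	if (!slot.handler)
		return SK_IRQ_NONE;

	return slot.handler(irq, slot.dev_id);
}

struct irq_dev_sketch {
	int reply_avail;
};

/* No NULL check: args is whatever the registration passed in. */
static int reply_handler_sketch(int irq, void *args)
{
	struct irq_dev_sketch *dev = args;

	(void)irq;
	dev->reply_avail = 1;

	return SK_IRQ_HANDLED;
}
```

Because ne_setup_msix() registers the handler with a valid ne_pci_dev, the handler cannot observe NULL there.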


Alex

> +
> +	atomic_set(&ne_pci_dev->cmd_reply_avail, 1);
> +
> +	/* TODO: Update to _interruptible. */
> +	wake_up(&ne_pci_dev->cmd_reply_wait_q);
> +
> +	return IRQ_HANDLED;
> +}
> +
>   /**
>    * ne_setup_msix - Setup MSI-X vectors for the PCI device.
>    *
> @@ -59,7 +271,25 @@ static int ne_setup_msix(struct pci_dev *pdev)
>   		return rc;
>   	}
>   
> +	/*
> +	 * This IRQ gets triggered every time the PCI device responds to a
> +	 * command request. The reply is then retrieved, reading from the MMIO
> +	 * space of the PCI device.
> +	 */
> +	rc = request_irq(pci_irq_vector(pdev, NE_VEC_REPLY),
> +			 ne_reply_handler, 0, "enclave_cmd", ne_pci_dev);
> +	if (rc < 0) {
> +		dev_err(&pdev->dev, "Error in request irq reply [rc=%d]\n", rc);
> +
> +		goto free_irq_vectors;
> +	}
> +
>   	return 0;
> +
> +free_irq_vectors:
> +	pci_free_irq_vectors(pdev);
> +
> +	return rc;
>   }
>   
>   /**
> @@ -74,6 +304,8 @@ static void ne_teardown_msix(struct pci_dev *pdev)
>   	if (!ne_pci_dev)
>   		return;
>   
> +	free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);
> +
>   	pci_free_irq_vectors(pdev);
>   }
>   
> 



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879





* Re: [PATCH v4 06/18] nitro_enclaves: Handle out-of-band PCI device events
  2020-06-22 20:03 ` [PATCH v4 06/18] nitro_enclaves: Handle out-of-band PCI device events Andra Paraschiv
@ 2020-07-02 15:24   ` Alexander Graf
  2020-07-04 15:43     ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Graf @ 2020-07-02 15:24 UTC (permalink / raw)
  To: Andra Paraschiv, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 22.06.20 22:03, Andra Paraschiv wrote:
> In addition to the replies sent by the Nitro Enclaves PCI device in
> response to command requests, out-of-band enclave events can happen e.g.
> an enclave crashes. In this case, the Nitro Enclaves driver needs to be
> aware of the event and notify the corresponding user space process that
> abstracts the enclave.
> 
> Register an MSI-X interrupt vector to be used for this kind of
> out-of-band event. The interrupt notifies that the state of an enclave
> changed and the driver logic scans the state of each running enclave to
> identify the one for which this notification is intended.
> 
> Create a workqueue to handle the out-of-band events. Notify the user space
> enclave process that is using a polling mechanism on the enclave fd.
> 
> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
> ---
> Changelog
> 
> v3 -> v4
> 
> * Use dev_err instead of custom NE log pattern.
> * Return IRQ_NONE when interrupts are not handled.
> 
> v2 -> v3
> 
> * Remove the WARN_ON calls.
> * Update static calls sanity checks.
> * Remove "ratelimited" from the logs that are not in the ioctl call
>    paths.
> 
> v1 -> v2
> 
> * Add log pattern for NE.
> * Update goto labels to match their purpose.
> ---
>   drivers/virt/nitro_enclaves/ne_pci_dev.c | 122 +++++++++++++++++++++++
>   1 file changed, 122 insertions(+)
> 
> diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
> index c24230cfe7c0..9a137862cade 100644
> --- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
> +++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
> @@ -239,6 +239,93 @@ static irqreturn_t ne_reply_handler(int irq, void *args)
>   	return IRQ_HANDLED;
>   }
>   
> +/**
> + * ne_event_work_handler - Work queue handler for notifying enclaves on
> + * a state change received by the event interrupt handler.
> + *
> + * An out-of-band event is being issued by the Nitro Hypervisor when at least
> + * one enclave is changing state without client interaction.
> + *
> + * @work: item containing the Nitro Enclaves PCI device for which an
> + *	  out-of-band event was issued.
> + */
> +static void ne_event_work_handler(struct work_struct *work)
> +{
> +	struct ne_pci_dev_cmd_reply cmd_reply = {};
> +	struct ne_enclave *ne_enclave = NULL;
> +	struct ne_pci_dev *ne_pci_dev =
> +		container_of(work, struct ne_pci_dev, notify_work);
> +	int rc = -EINVAL;
> +	struct slot_info_req slot_info_req = {};
> +
> +	if (!ne_pci_dev)
> +		return;

How?
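
A small userspace sketch of why the check is dead code: container_of() is plain pointer arithmetic on the member address, so it yields the enclosing object whenever the work pointer is valid. The macro below is a simplified re-creation without the kernel's type checking, and the struct names are placeholders, not the driver's types.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace version of the kernel's container_of(). */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Placeholder for struct work_struct. */
struct work_sketch {
	int pending;
};

/* Placeholder for struct ne_pci_dev with an embedded work item. */
struct pci_dev_sketch {
	int id;
	struct work_sketch notify_work;
};

/* Recover the enclosing device from a pointer to its embedded work item. */
static struct pci_dev_sketch *dev_from_work(struct work_sketch *work)
{
	struct pci_dev_sketch *dev =
		container_of_sketch(work, struct pci_dev_sketch, notify_work);

	/* Holds whenever work points into a real pci_dev_sketch. */
	assert(dev != NULL);

	return dev;
}
```

The work item is embedded in the device structure, so the handler is always invoked with a pointer inside a live object.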

> +
> +	mutex_lock(&ne_pci_dev->enclaves_list_mutex);
> +
> +	/*
> +	 * Iterate over all enclaves registered for the Nitro Enclaves
> +	 * PCI device and determine which enclave(s) the out-of-band event
> +	 * corresponds to.
> +	 */
> +	list_for_each_entry(ne_enclave, &ne_pci_dev->enclaves_list,
> +			    enclave_list_entry) {
> +		mutex_lock(&ne_enclave->enclave_info_mutex);
> +
> +		/*
> +		 * Enclaves that were never started cannot receive out-of-band
> +		 * events.
> +		 */
> +		if (ne_enclave->state != NE_STATE_RUNNING)
> +			goto unlock;
> +
> +		slot_info_req.slot_uid = ne_enclave->slot_uid;
> +
> +		rc = ne_do_request(ne_enclave->pdev, SLOT_INFO, &slot_info_req,
> +				   sizeof(slot_info_req), &cmd_reply,
> +				   sizeof(cmd_reply));
> +		if (rc < 0)
> +			dev_err(&ne_enclave->pdev->dev,
> +				"Error in slot info [rc=%d]\n", rc);
> +
> +		/* Notify enclave process that the enclave state changed. */
> +		if (ne_enclave->state != cmd_reply.state) {
> +			ne_enclave->state = cmd_reply.state;
> +
> +			ne_enclave->has_event = true;
> +
> +			wake_up_interruptible(&ne_enclave->eventq);
> +		}
> +
> +unlock:
> +		mutex_unlock(&ne_enclave->enclave_info_mutex);
> +	}
> +
> +	mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
> +}
> +
> +/**
> + * ne_event_handler - Interrupt handler for PCI device out-of-band
> + * events. This interrupt does not supply any data in the MMIO region.
> + * It notifies a change in the state of any of the launched enclaves.
> + *
> + * @irq: received interrupt for an out-of-band event.
> + * @args: PCI device private data structure.
> + *
> + * @returns: IRQ_HANDLED on handled interrupt, IRQ_NONE otherwise.
> + */
> +static irqreturn_t ne_event_handler(int irq, void *args)
> +{
> +	struct ne_pci_dev *ne_pci_dev = (struct ne_pci_dev *)args;
> +
> +	if (!ne_pci_dev)
> +		return IRQ_NONE;

How can this happen?


Alex

> +
> +	queue_work(ne_pci_dev->event_wq, &ne_pci_dev->notify_work);
> +
> +	return IRQ_HANDLED;
> +}
> +
>   /**
>    * ne_setup_msix - Setup MSI-X vectors for the PCI device.
>    *
> @@ -284,8 +371,37 @@ static int ne_setup_msix(struct pci_dev *pdev)
>   		goto free_irq_vectors;
>   	}
>   
> +	ne_pci_dev->event_wq = create_singlethread_workqueue("ne_pci_dev_wq");
> +	if (!ne_pci_dev->event_wq) {
> +		rc = -ENOMEM;
> +
> +		dev_err(&pdev->dev, "Cannot get wq for dev events [rc=%d]\n",
> +			rc);
> +
> +		goto free_reply_irq_vec;
> +	}
> +
> +	INIT_WORK(&ne_pci_dev->notify_work, ne_event_work_handler);
> +
> +	/*
> +	 * This IRQ gets triggered every time any enclave's state changes. Its
> +	 * handler then scans for the changes and propagates them to the user
> +	 * space.
> +	 */
> +	rc = request_irq(pci_irq_vector(pdev, NE_VEC_EVENT),
> +			 ne_event_handler, 0, "enclave_evt", ne_pci_dev);
> +	if (rc < 0) {
> +		dev_err(&pdev->dev, "Error in request irq event [rc=%d]\n", rc);
> +
> +		goto destroy_wq;
> +	}
> +
>   	return 0;
>   
> +destroy_wq:
> +	destroy_workqueue(ne_pci_dev->event_wq);
> +free_reply_irq_vec:
> +	free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);
>   free_irq_vectors:
>   	pci_free_irq_vectors(pdev);
>   
> @@ -304,6 +420,12 @@ static void ne_teardown_msix(struct pci_dev *pdev)
>   	if (!ne_pci_dev)
>   		return;
>   
> +	free_irq(pci_irq_vector(pdev, NE_VEC_EVENT), ne_pci_dev);
> +
> +	flush_work(&ne_pci_dev->notify_work);
> +	flush_workqueue(ne_pci_dev->event_wq);
> +	destroy_workqueue(ne_pci_dev->event_wq);
> +
>   	free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);
>   
>   	pci_free_irq_vectors(pdev);
> 




* Re: [PATCH v4 02/18] nitro_enclaves: Define the PCI device interface
  2020-06-22 20:03 ` [PATCH v4 02/18] nitro_enclaves: Define the PCI device interface Andra Paraschiv
@ 2020-07-02 15:24   ` Alexander Graf
  2020-07-04  8:20     ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Graf @ 2020-07-02 15:24 UTC (permalink / raw)
  To: Andra Paraschiv, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 22.06.20 22:03, Andra Paraschiv wrote:
> The Nitro Enclaves (NE) driver communicates with a new PCI device, that
> is exposed to a virtual machine (VM) and handles commands meant for
> handling enclaves lifetime e.g. creation, termination, setting memory
> regions. The communication with the PCI device is handled using a MMIO
> space and MSI-X interrupts.
> 
> This device communicates with the hypervisor on the host, where the VM
> that spawned the enclave itself runs, e.g. to launch a VM that is used
> for the enclave.
> 
> Define the MMIO space of the PCI device, the commands that are
> provided by this device. Add an internal data structure used as private
> data for the PCI device driver and the functions for the PCI device init
> / uninit and command requests handling.
> 
> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
> Signed-off-by: Alexandru Ciobotaru <alcioa@amazon.com>
> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>

Reviewed-by: Alexander Graf <graf@amazon.com>


Alex




* Re: [PATCH v4 01/18] nitro_enclaves: Add ioctl interface definition
  2020-06-22 20:03 ` [PATCH v4 01/18] nitro_enclaves: Add ioctl interface definition Andra Paraschiv
  2020-06-23  8:56   ` Stefan Hajnoczi
@ 2020-07-02 15:24   ` Alexander Graf
  2020-07-04  8:09     ` Paraschiv, Andra-Irina
  1 sibling, 1 reply; 67+ messages in thread
From: Alexander Graf @ 2020-07-02 15:24 UTC (permalink / raw)
  To: Andra Paraschiv, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 22.06.20 22:03, Andra Paraschiv wrote:
> The Nitro Enclaves driver handles the enclave lifetime management. This
> includes enclave creation, termination and setting up its resources such
> as memory and CPU.
> 
> An enclave runs alongside the VM that spawned it. It is abstracted as a
> process running in the VM that launched it. The process interacts with
> the NE driver, that exposes an ioctl interface for creating an enclave
> and setting up its resources.
> 
> Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>

Reviewed-by: Alexander Graf <graf@amazon.com>


Alex




* Re: [PATCH v4 03/18] nitro_enclaves: Define enclave info for internal bookkeeping
  2020-06-22 20:03 ` [PATCH v4 03/18] nitro_enclaves: Define enclave info for internal bookkeeping Andra Paraschiv
@ 2020-07-02 15:24   ` Alexander Graf
  2020-07-04  8:23     ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Graf @ 2020-07-02 15:24 UTC (permalink / raw)
  To: Andra Paraschiv, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 22.06.20 22:03, Andra Paraschiv wrote:
> The Nitro Enclaves driver keeps internal info for each enclave.
> 
> This is needed to be able to manage enclave resources state, enclave
> notifications and have a reference of the PCI device that handles
> command requests for enclave lifetime management.
> 
> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>

Reviewed-by: Alexander Graf <graf@amazon.com>


Alex




* Re: [PATCH v4 01/18] nitro_enclaves: Add ioctl interface definition
  2020-07-02 15:24   ` Alexander Graf
@ 2020-07-04  8:09     ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-07-04  8:09 UTC (permalink / raw)
  To: Alexander Graf, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 02/07/2020 18:24, Alexander Graf wrote:
>
>
> On 22.06.20 22:03, Andra Paraschiv wrote:
>> The Nitro Enclaves driver handles the enclave lifetime management. This
>> includes enclave creation, termination and setting up its resources such
>> as memory and CPU.
>>
>> An enclave runs alongside the VM that spawned it. It is abstracted as a
>> process running in the VM that launched it. The process interacts with
>> the NE driver, that exposes an ioctl interface for creating an enclave
>> and setting up its resources.
>>
>> Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>
> Reviewed-by: Alexander Graf <graf@amazon.com>

Added. Thanks for reviewing the group of patches so far.

Andra



Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in Romania. Registration number J22/2621/2005.



* Re: [PATCH v4 02/18] nitro_enclaves: Define the PCI device interface
  2020-07-02 15:24   ` Alexander Graf
@ 2020-07-04  8:20     ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-07-04  8:20 UTC (permalink / raw)
  To: Alexander Graf, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 02/07/2020 18:24, Alexander Graf wrote:
>
>
> On 22.06.20 22:03, Andra Paraschiv wrote:
>> The Nitro Enclaves (NE) driver communicates with a new PCI device, that
>> is exposed to a virtual machine (VM) and handles commands meant for
>> handling enclaves lifetime e.g. creation, termination, setting memory
>> regions. The communication with the PCI device is handled using a MMIO
>> space and MSI-X interrupts.
>>
>> This device communicates with the hypervisor on the host, where the VM
>> that spawned the enclave itself runs, e.g. to launch a VM that is used
>> for the enclave.
>>
>> Define the MMIO space of the PCI device, the commands that are
>> provided by this device. Add an internal data structure used as private
>> data for the PCI device driver and the functions for the PCI device init
>> / uninit and command requests handling.
>>
>> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
>> Signed-off-by: Alexandru Ciobotaru <alcioa@amazon.com>
>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>
> Reviewed-by: Alexander Graf <graf@amazon.com>

Added. Thank you.

Andra




* Re: [PATCH v4 03/18] nitro_enclaves: Define enclave info for internal bookkeeping
  2020-07-02 15:24   ` Alexander Graf
@ 2020-07-04  8:23     ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-07-04  8:23 UTC (permalink / raw)
  To: Alexander Graf, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 02/07/2020 18:24, Alexander Graf wrote:
>
>
> On 22.06.20 22:03, Andra Paraschiv wrote:
>> The Nitro Enclaves driver keeps internal info for each enclave.
>>
>> This is needed to be able to manage enclave resources state, enclave
>> notifications and have a reference of the PCI device that handles
>> command requests for enclave lifetime management.
>>
>> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>
> Reviewed-by: Alexander Graf <graf@amazon.com>

Added. Thank you.

Andra




* Re: [PATCH v4 04/18] nitro_enclaves: Init PCI device driver
  2020-07-02 15:09   ` Alexander Graf
@ 2020-07-04 10:00     ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-07-04 10:00 UTC (permalink / raw)
  To: Alexander Graf, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 02/07/2020 18:09, Alexander Graf wrote:
>
>
> On 22.06.20 22:03, Andra Paraschiv wrote:
>> The Nitro Enclaves PCI device is used by the kernel driver as a means of
>> communication with the hypervisor on the host where the primary VM and
>> the enclaves run. It handles requests with regard to enclave lifetime.
>>
>> Setup the PCI device driver and add support for MSI-X interrupts.
>>
>> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
>> Signed-off-by: Alexandru Ciobotaru <alcioa@amazon.com>
>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>> ---
>> Changelog
>>
>> v3 -> v4
>>
>> * Use dev_err instead of custom NE log pattern.
>> * Update NE PCI driver name to "nitro_enclaves".
>>
>> v2 -> v3
>>
>> * Remove the GPL additional wording as SPDX-License-Identifier is
>>    already in place.
>> * Remove the WARN_ON calls.
>> * Remove linux/bug include that is not needed.
>> * Update static calls sanity checks.
>> * Remove "ratelimited" from the logs that are not in the ioctl call
>>    paths.
>> * Update kzfree() calls to kfree().
>>
>> v1 -> v2
>>
>> * Add log pattern for NE.
>> * Update PCI device setup functions to receive PCI device data structure and
>>    then get private data from it inside the functions logic.
>> * Remove the BUG_ON calls.
>> * Add teardown function for MSI-X setup.
>> * Update goto labels to match their purpose.
>> * Implement TODO for NE PCI device disable state check.
>> * Update function name for NE PCI device probe / remove.
>> ---
>>   drivers/virt/nitro_enclaves/ne_pci_dev.c | 261 +++++++++++++++++++++++
>>   1 file changed, 261 insertions(+)
>>   create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.c
>>
>> diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
>> new file mode 100644
>> index 000000000000..235fa3ecbee2
>> --- /dev/null
>> +++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
>> @@ -0,0 +1,261 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
>> + */
>> +
>> +/* Nitro Enclaves (NE) PCI device driver. */
>> +
>> +#include <linux/delay.h>
>> +#include <linux/device.h>
>> +#include <linux/list.h>
>> +#include <linux/mutex.h>
>> +#include <linux/module.h>
>> +#include <linux/nitro_enclaves.h>
>> +#include <linux/pci.h>
>> +#include <linux/types.h>
>> +#include <linux/wait.h>
>> +
>> +#include "ne_misc_dev.h"
>> +#include "ne_pci_dev.h"
>> +
>> +#define NE_DEFAULT_TIMEOUT_MSECS (120000) /* 120 sec */
>> +
>> +static const struct pci_device_id ne_pci_ids[] = {
>> +    { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_NE) },
>> +    { 0, }
>> +};
>> +
>> +MODULE_DEVICE_TABLE(pci, ne_pci_ids);
>> +
>> +/**
>> + * ne_setup_msix - Setup MSI-X vectors for the PCI device.
>> + *
>> + * @pdev: PCI device to setup the MSI-X for.
>> + *
>> + * @returns: 0 on success, negative return value on failure.
>> + */
>> +static int ne_setup_msix(struct pci_dev *pdev)
>> +{
>> +    struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
>> +    int nr_vecs = 0;
>> +    int rc = -EINVAL;
>> +
>> +    if (!ne_pci_dev)
>> +        return -EINVAL;
>> +
>> +    nr_vecs = pci_msix_vec_count(pdev);
>> +    if (nr_vecs < 0) {
>> +        rc = nr_vecs;
>> +
>> +        dev_err(&pdev->dev, "Error in getting vec count [rc=%d]\n", rc);
>> +
>> +        return rc;
>> +    }
>> +
>> +    rc = pci_alloc_irq_vectors(pdev, nr_vecs, nr_vecs, PCI_IRQ_MSIX);
>> +    if (rc < 0) {
>> +        dev_err(&pdev->dev, "Error in alloc MSI-X vecs [rc=%d]\n", rc);
>> +
>> +        return rc;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + * ne_teardown_msix - Teardown MSI-X vectors for the PCI device.
>> + *
>> + * @pdev: PCI device to teardown the MSI-X for.
>> + */
>> +static void ne_teardown_msix(struct pci_dev *pdev)
>> +{
>> +    struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
>> +
>> +    if (!ne_pci_dev)
>> +        return;
>> +
>> +    pci_free_irq_vectors(pdev);
>> +}
>> +
>> +/**
>> + * ne_pci_dev_enable - Select PCI device version and enable it.
>> + *
>> + * @pdev: PCI device to select version for and then enable.
>> + *
>> + * @returns: 0 on success, negative return value on failure.
>> + */
>> +static int ne_pci_dev_enable(struct pci_dev *pdev)
>> +{
>> +    u8 dev_enable_reply = 0;
>> +    u16 dev_version_reply = 0;
>> +    struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
>> +
>> +    if (!ne_pci_dev || !ne_pci_dev->iomem_base)
>> +        return -EINVAL;
>
> How can this ever happen?

This check and the following one belong to the sanity checks added earlier
for situations that should not happen unless the system is buggy or the
logic is broken. Removed the checks.

Thanks,
Andra

>
>> +
>> +    iowrite16(NE_VERSION_MAX, ne_pci_dev->iomem_base + NE_VERSION);
>> +
>> +    dev_version_reply = ioread16(ne_pci_dev->iomem_base + NE_VERSION);
>> +    if (dev_version_reply != NE_VERSION_MAX) {
>> +        dev_err(&pdev->dev, "Error in pci dev version cmd\n");
>> +
>> +        return -EIO;
>> +    }
>> +
>> +    iowrite8(NE_ENABLE_ON, ne_pci_dev->iomem_base + NE_ENABLE);
>> +
>> +    dev_enable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE);
>> +    if (dev_enable_reply != NE_ENABLE_ON) {
>> +        dev_err(&pdev->dev, "Error in pci dev enable cmd\n");
>> +
>> +        return -EIO;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + * ne_pci_dev_disable - Disable PCI device.
>> + *
>> + * @pdev: PCI device to disable.
>> + */
>> +static void ne_pci_dev_disable(struct pci_dev *pdev)
>> +{
>> +    u8 dev_disable_reply = 0;
>> +    struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
>> +    const unsigned int sleep_time = 10; /* 10 ms */
>> +    unsigned int sleep_time_count = 0;
>> +
>> +    if (!ne_pci_dev || !ne_pci_dev->iomem_base)
>> +        return;
>
> How can this ever happen?
>
>
> Alex





* Re: [PATCH v4 05/18] nitro_enclaves: Handle PCI device command requests
  2020-07-02 15:19   ` Alexander Graf
@ 2020-07-04 15:05     ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-07-04 15:05 UTC (permalink / raw)
  To: Alexander Graf, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream, kbuild test robot



On 02/07/2020 18:19, Alexander Graf wrote:
>
>
> On 22.06.20 22:03, Andra Paraschiv wrote:
>> The Nitro Enclaves PCI device exposes a MMIO space that this driver
>> uses to submit command requests and to receive command replies e.g. for
>> enclave creation / termination or setting enclave resources.
>>
>> Add logic for handling PCI device command requests based on the given
>> command type.
>>
>> Register an MSI-X interrupt vector for command reply notifications to
>> handle this type of communication events.
>>
>> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>>
>> Fix issue reported in:
>> https://lore.kernel.org/lkml/202004231644.xTmN4Z1z%25lkp@intel.com/
>>
>> Reported-by: kbuild test robot <lkp@intel.com>
>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>> ---
>> Changelog
>>
>> v3 -> v4
>>
>> * Use dev_err instead of custom NE log pattern.
>> * Return IRQ_NONE when interrupts are not handled.
>>
>> v2 -> v3
>>
>> * Remove the WARN_ON calls.
>> * Update static calls sanity checks.
>> * Remove "ratelimited" from the logs that are not in the ioctl call
>>    paths.
>>
>> v1 -> v2
>>
>> * Add log pattern for NE.
>> * Remove the BUG_ON calls.
>> * Update goto labels to match their purpose.
>> * Add fix for kbuild report.
>> ---
>>   drivers/virt/nitro_enclaves/ne_pci_dev.c | 232 +++++++++++++++++++++++
>>   1 file changed, 232 insertions(+)
>>
>> diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
>> index 235fa3ecbee2..c24230cfe7c0 100644
>> --- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
>> +++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
>> @@ -27,6 +27,218 @@ static const struct pci_device_id ne_pci_ids[] = {
>>   MODULE_DEVICE_TABLE(pci, ne_pci_ids);
>>   
>> +/**
>> + * ne_submit_request - Submit command request to the PCI device based on the
>> + * command type.
>> + *
>> + * This function gets called with the ne_pci_dev mutex held.
>> + *
>> + * @pdev: PCI device to send the command to.
>> + * @cmd_type: command type of the request sent to the PCI device.
>> + * @cmd_request: command request payload.
>> + * @cmd_request_size: size of the command request payload.
>> + *
>> + * @returns: 0 on success, negative return value on failure.
>> + */
>> +static int ne_submit_request(struct pci_dev *pdev,
>> +                 enum ne_pci_dev_cmd_type cmd_type,
>> +                 void *cmd_request, size_t cmd_request_size)
>> +{
>> +    struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
>> +
>> +    if (!ne_pci_dev || !ne_pci_dev->iomem_base)
>> +        return -EINVAL;
>
> How can this ever happen?

Removed this one and the next checks in v5 of the patch series.

Thanks,
Andra

>
>> +
>> +    memcpy_toio(ne_pci_dev->iomem_base + NE_SEND_DATA, cmd_request,
>> +            cmd_request_size);
>> +
>> +    iowrite32(cmd_type, ne_pci_dev->iomem_base + NE_COMMAND);
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + * ne_retrieve_reply - Retrieve reply from the PCI device.
>> + *
>> + * This function gets called with the ne_pci_dev mutex held.
>> + *
>> + * @pdev: PCI device to receive the reply from.
>> + * @cmd_reply: command reply payload.
>> + * @cmd_reply_size: size of the command reply payload.
>> + *
>> + * @returns: 0 on success, negative return value on failure.
>> + */
>> +static int ne_retrieve_reply(struct pci_dev *pdev,
>> +                 struct ne_pci_dev_cmd_reply *cmd_reply,
>> +                 size_t cmd_reply_size)
>> +{
>> +    struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
>> +
>> +    if (!ne_pci_dev || !ne_pci_dev->iomem_base)
>> +        return -EINVAL;
>
> Same.
>
>> +
>> +    memcpy_fromio(cmd_reply, ne_pci_dev->iomem_base + NE_RECV_DATA,
>> +              cmd_reply_size);
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + * ne_wait_for_reply - Wait for a reply of a PCI command.
>> + *
>> + * This function gets called with the ne_pci_dev mutex held.
>> + *
>> + * @pdev: PCI device for which a reply is waited.
>> + *
>> + * @returns: 0 on success, negative return value on failure.
>> + */
>> +static int ne_wait_for_reply(struct pci_dev *pdev)
>> +{
>> +    struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
>> +    int rc = -EINVAL;
>
> Unused assignment?
>
>> +
>> +    if (!ne_pci_dev)
>> +        return -EINVAL;
>
> Same.
>
>> +
>> +    /*
>> +     * TODO: Update to _interruptible and handle interrupted wait event
>> +     * e.g. -ERESTARTSYS, incoming signals + add / update timeout.
>> +     */
>> +    rc = wait_event_timeout(ne_pci_dev->cmd_reply_wait_q,
>> +                atomic_read(&ne_pci_dev->cmd_reply_avail) != 0,
>> +                msecs_to_jiffies(NE_DEFAULT_TIMEOUT_MSECS));
>> +    if (!rc)
>> +        return -ETIMEDOUT;
>> +
>> +    return 0;
>> +}
>> +
>> +int ne_do_request(struct pci_dev *pdev, enum ne_pci_dev_cmd_type cmd_type,
>> +          void *cmd_request, size_t cmd_request_size,
>> +          struct ne_pci_dev_cmd_reply *cmd_reply, size_t cmd_reply_size)
>> +{
>> +    struct ne_pci_dev *ne_pci_dev = NULL;
>> +    int rc = -EINVAL;
>> +
>> +    if (!pdev)
>> +        return -ENODEV;
>
> When can this happen?
>
>> +
>> +    ne_pci_dev = pci_get_drvdata(pdev);
>> +    if (!ne_pci_dev || !ne_pci_dev->iomem_base)
>> +        return -EINVAL;
>
> Same
>
>> +
>> +    if (cmd_type <= INVALID_CMD || cmd_type >= MAX_CMD) {
>> +        dev_err_ratelimited(&pdev->dev, "Invalid cmd type=%u\n",
>> +                    cmd_type);
>> +
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (!cmd_request) {
>> +        dev_err_ratelimited(&pdev->dev, "Null cmd request\n");
>> +
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (cmd_request_size > NE_SEND_DATA_SIZE) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Invalid req size=%zu for cmd type=%u\n",
>> +                    cmd_request_size, cmd_type);
>> +
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (!cmd_reply) {
>> +        dev_err_ratelimited(&pdev->dev, "Null cmd reply\n");
>> +
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (cmd_reply_size > NE_RECV_DATA_SIZE) {
>> +        dev_err_ratelimited(&pdev->dev, "Invalid reply size=%zu\n",
>> +                    cmd_reply_size);
>> +
>> +        return -EINVAL;
>> +    }
>> +
>> +    /*
>> +     * Use this mutex so that the PCI device handles one command request at
>> +     * a time.
>> +     */
>> +    mutex_lock(&ne_pci_dev->pci_dev_mutex);
>> +
>> +    atomic_set(&ne_pci_dev->cmd_reply_avail, 0);
>> +
>> +    rc = ne_submit_request(pdev, cmd_type, cmd_request, cmd_request_size);
>> +    if (rc < 0) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Error in submit request [rc=%d]\n", rc);
>> +
>> +        goto unlock_mutex;
>> +    }
>> +
>> +    rc = ne_wait_for_reply(pdev);
>> +    if (rc < 0) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Error in wait for reply [rc=%d]\n", rc);
>> +
>> +        goto unlock_mutex;
>> +    }
>> +
>> +    rc = ne_retrieve_reply(pdev, cmd_reply, cmd_reply_size);
>> +    if (rc < 0) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Error in retrieve reply [rc=%d]\n", rc);
>> +
>> +        goto unlock_mutex;
>> +    }
>> +
>> +    atomic_set(&ne_pci_dev->cmd_reply_avail, 0);
>> +
>> +    if (cmd_reply->rc < 0) {
>> +        dev_err_ratelimited(&pdev->dev,
>> +                    "Error in cmd process logic [rc=%d]\n",
>> +                    cmd_reply->rc);
>> +
>> +        rc = cmd_reply->rc;
>> +
>> +        goto unlock_mutex;
>> +    }
>> +
>> +    mutex_unlock(&ne_pci_dev->pci_dev_mutex);
>> +
>> +    return 0;
>
> Can you just set rc to 0 and fall through?

Done.
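
The single-exit shape requested above can be sketched as follows. This is a userspace illustration only, not the v5 code: `lock_sketch`, `sketch_lock()`, `sketch_unlock()` and `do_request_sketch()` are hypothetical stand-ins for the driver's mutex and for ne_do_request(), and the result of each step is passed in as a parameter so that only the control flow is shown.

```c
#include <errno.h>

/* Hypothetical stand-in for a kernel mutex; it only tracks the held state. */
struct lock_sketch {
	int held;
};

static void sketch_lock(struct lock_sketch *lock)
{
	lock->held = 1;
}

static void sketch_unlock(struct lock_sketch *lock)
{
	lock->held = 0;
}

/*
 * Each *_rc argument is the result of one step (submit, wait, device reply):
 * 0 on success or a negative errno value on failure.
 */
static int do_request_sketch(struct lock_sketch *lock, int submit_rc,
			     int wait_rc, int reply_rc)
{
	int rc;

	sketch_lock(lock);

	rc = submit_rc;
	if (rc < 0)
		goto unlock;

	rc = wait_rc;
	if (rc < 0)
		goto unlock;

	rc = reply_rc;
	if (rc < 0)
		goto unlock;

	/* Success: set rc and fall through to the shared unlock path. */
	rc = 0;

unlock:
	sketch_unlock(lock);

	return rc;
}
```

With this layout a call such as do_request_sketch(&lock, 0, -ETIMEDOUT, 0) returns -ETIMEDOUT through the same unlock path as the success case, so there is only one place where the lock is released.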

>
>> +
>> +unlock_mutex:
>> +    mutex_unlock(&ne_pci_dev->pci_dev_mutex);
>> +
>> +    return rc;
>> +}
>> +
>> +/**
>> + * ne_reply_handler - Interrupt handler for retrieving a reply matching
>> + * a request sent to the PCI device for enclave lifetime management.
>> + *
>> + * @irq: received interrupt for a reply sent by the PCI device.
>> + * @args: PCI device private data structure.
>> + *
>> + * @returns: IRQ_HANDLED on handled interrupt, IRQ_NONE otherwise.
>> + */
>> +static irqreturn_t ne_reply_handler(int irq, void *args)
>> +{
>> +    struct ne_pci_dev *ne_pci_dev = (struct ne_pci_dev *)args;
>> +
>> +    if (!ne_pci_dev)
>> +        return IRQ_NONE;
>
> How can this ever happen?
>
>
> Alex
>
>> +
>> +    atomic_set(&ne_pci_dev->cmd_reply_avail, 1);
>> +
>> +    /* TODO: Update to _interruptible. */
>> +    wake_up(&ne_pci_dev->cmd_reply_wait_q);
>> +
>> +    return IRQ_HANDLED;
>> +}
>> +
>>   /**
>>    * ne_setup_msix - Setup MSI-X vectors for the PCI device.
>>    *
>> @@ -59,7 +271,25 @@ static int ne_setup_msix(struct pci_dev *pdev)
>>           return rc;
>>       }
>>   +    /*
>> +     * This IRQ gets triggered every time the PCI device responds to a
>> +     * command request. The reply is then retrieved, reading from the MMIO
>> +     * space of the PCI device.
>> +     */
>> +    rc = request_irq(pci_irq_vector(pdev, NE_VEC_REPLY),
>> +             ne_reply_handler, 0, "enclave_cmd", ne_pci_dev);
>> +    if (rc < 0) {
>> +        dev_err(&pdev->dev, "Error in request irq reply [rc=%d]\n", rc);
>> +
>> +        goto free_irq_vectors;
>> +    }
>> +
>>       return 0;
>> +
>> +free_irq_vectors:
>> +    pci_free_irq_vectors(pdev);
>> +
>> +    return rc;
>>   }
>>     /**
>> @@ -74,6 +304,8 @@ static void ne_teardown_msix(struct pci_dev *pdev)
>>       if (!ne_pci_dev)
>>           return;
>>   +    free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);
>> +
>>       pci_free_irq_vectors(pdev);
>>   }
>>




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in Romania. Registration number J22/2621/2005.



* Re: [PATCH v4 06/18] nitro_enclaves: Handle out-of-band PCI device events
  2020-07-02 15:24   ` Alexander Graf
@ 2020-07-04 15:43     ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-07-04 15:43 UTC (permalink / raw)
  To: Alexander Graf, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 02/07/2020 18:24, Alexander Graf wrote:
>
>
> On 22.06.20 22:03, Andra Paraschiv wrote:
>> In addition to the replies sent by the Nitro Enclaves PCI device in
>> response to command requests, out-of-band enclave events can happen e.g.
>> an enclave crashes. In this case, the Nitro Enclaves driver needs to be
>> aware of the event and notify the corresponding user space process that
>> abstracts the enclave.
>>
>> Register an MSI-X interrupt vector to be used for this kind of
>> out-of-band events. The interrupt notifies that the state of an enclave
>> changed and the driver logic scans the state of each running enclave to
>> identify for which this notification is intended.
>>
>> Create a workqueue to handle the out-of-band events. Notify the user space
>> enclave process that is using a polling mechanism on the enclave fd.
>>
>> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com>
>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>> ---
>> Changelog
>>
>> v3 -> v4
>>
>> * Use dev_err instead of custom NE log pattern.
>> * Return IRQ_NONE when interrupts are not handled.
>>
>> v2 -> v3
>>
>> * Remove the WARN_ON calls.
>> * Update static calls sanity checks.
>> * Remove "ratelimited" from the logs that are not in the ioctl call
>>    paths.
>>
>> v1 -> v2
>>
>> * Add log pattern for NE.
>> * Update goto labels to match their purpose.
>> ---
>>   drivers/virt/nitro_enclaves/ne_pci_dev.c | 122 +++++++++++++++++++++++
>>   1 file changed, 122 insertions(+)
>>
>> diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c b/drivers/virt/nitro_enclaves/ne_pci_dev.c
>> index c24230cfe7c0..9a137862cade 100644
>> --- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
>> +++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
>> @@ -239,6 +239,93 @@ static irqreturn_t ne_reply_handler(int irq, void *args)
>>       return IRQ_HANDLED;
>>   }
>>   +/**
>> + * ne_event_work_handler - Work queue handler for notifying enclaves on
>> + * a state change received by the event interrupt handler.
>> + *
>> + * An out-of-band event is being issued by the Nitro Hypervisor when at least
>> + * one enclave is changing state without client interaction.
>> + *
>> + * @work: item containing the Nitro Enclaves PCI device for which an
>> + *      out-of-band event was issued.
>> + */
>> +static void ne_event_work_handler(struct work_struct *work)
>> +{
>> +    struct ne_pci_dev_cmd_reply cmd_reply = {};
>> +    struct ne_enclave *ne_enclave = NULL;
>> +    struct ne_pci_dev *ne_pci_dev =
>> +        container_of(work, struct ne_pci_dev, notify_work);
>> +    int rc = -EINVAL;
>> +    struct slot_info_req slot_info_req = {};
>> +
>> +    if (!ne_pci_dev)
>> +        return;
>
> How?

Removed this check and the one below. Thank you.

Andra
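
On the "How?" above: container_of() only subtracts a compile-time member offset from the pointer it is given, so for a valid embedded work item the result cannot be NULL. A minimal userspace sketch, assuming simplified stand-ins (the `container_of` macro below omits the kernel's type checking, and `work_struct_sketch`, `ne_pci_dev_sketch` and `work_to_dev()` are hypothetical names, not the driver's definitions):

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(), without type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct work_struct. */
struct work_struct_sketch {
	int pending;
};

/* Hypothetical stand-in for struct ne_pci_dev with an embedded work item. */
struct ne_pci_dev_sketch {
	int id;
	struct work_struct_sketch notify_work;
};

/* Recover the enclosing device from a pointer to its embedded work item. */
static struct ne_pci_dev_sketch *work_to_dev(struct work_struct_sketch *work)
{
	return container_of(work, struct ne_pci_dev_sketch, notify_work);
}
```

Because the work item lives inside the device structure, the pointer handed to the work handler always maps back to that same structure, which is why a NULL check on the result adds nothing.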

>
>> +
>> +    mutex_lock(&ne_pci_dev->enclaves_list_mutex);
>> +
>> +    /*
>> +     * Iterate over all enclaves registered for the Nitro Enclaves
>> +     * PCI device and determine for which enclave(s) the out-of-band event
>> +     * is corresponding to.
>> +     */
>> +    list_for_each_entry(ne_enclave, &ne_pci_dev->enclaves_list,
>> +                enclave_list_entry) {
>> +        mutex_lock(&ne_enclave->enclave_info_mutex);
>> +
>> +        /*
>> +         * Enclaves that were never started cannot receive out-of-band
>> +         * events.
>> +         */
>> +        if (ne_enclave->state != NE_STATE_RUNNING)
>> +            goto unlock;
>> +
>> +        slot_info_req.slot_uid = ne_enclave->slot_uid;
>> +
>> +        rc = ne_do_request(ne_enclave->pdev, SLOT_INFO, &slot_info_req,
>> +                   sizeof(slot_info_req), &cmd_reply,
>> +                   sizeof(cmd_reply));
>> +        if (rc < 0)
>> +            dev_err(&ne_enclave->pdev->dev,
>> +                "Error in slot info [rc=%d]\n", rc);
>> +
>> +        /* Notify enclave process that the enclave state changed. */
>> +        if (ne_enclave->state != cmd_reply.state) {
>> +            ne_enclave->state = cmd_reply.state;
>> +
>> +            ne_enclave->has_event = true;
>> +
>> +            wake_up_interruptible(&ne_enclave->eventq);
>> +        }
>> +
>> +unlock:
>> +         mutex_unlock(&ne_enclave->enclave_info_mutex);
>> +    }
>> +
>> +    mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
>> +}
>> +
>> +/**
>> + * ne_event_handler - Interrupt handler for PCI device out-of-band
>> + * events. This interrupt does not supply any data in the MMIO region.
>> + * It notifies a change in the state of any of the launched enclaves.
>> + *
>> + * @irq: received interrupt for an out-of-band event.
>> + * @args: PCI device private data structure.
>> + *
>> + * @returns: IRQ_HANDLED on handled interrupt, IRQ_NONE otherwise.
>> + */
>> +static irqreturn_t ne_event_handler(int irq, void *args)
>> +{
>> +    struct ne_pci_dev *ne_pci_dev = (struct ne_pci_dev *)args;
>> +
>> +    if (!ne_pci_dev)
>> +        return IRQ_NONE;
>
> How can this happen?
>
>
> Alex
>
>> +
>> +    queue_work(ne_pci_dev->event_wq, &ne_pci_dev->notify_work);
>> +
>> +    return IRQ_HANDLED;
>> +}
>> +
>>   /**
>>    * ne_setup_msix - Setup MSI-X vectors for the PCI device.
>>    *
>> @@ -284,8 +371,37 @@ static int ne_setup_msix(struct pci_dev *pdev)
>>           goto free_irq_vectors;
>>       }
>>   +    ne_pci_dev->event_wq = create_singlethread_workqueue("ne_pci_dev_wq");
>> +    if (!ne_pci_dev->event_wq) {
>> +        rc = -ENOMEM;
>> +
>> +        dev_err(&pdev->dev, "Cannot get wq for dev events [rc=%d]\n",
>> +            rc);
>> +
>> +        goto free_reply_irq_vec;
>> +    }
>> +
>> +    INIT_WORK(&ne_pci_dev->notify_work, ne_event_work_handler);
>> +
>> +    /*
>> +     * This IRQ gets triggered every time any enclave's state changes. Its
>> +     * handler then scans for the changes and propagates them to the user
>> +     * space.
>> +     */
>> +    rc = request_irq(pci_irq_vector(pdev, NE_VEC_EVENT),
>> +             ne_event_handler, 0, "enclave_evt", ne_pci_dev);
>> +    if (rc < 0) {
>> +        dev_err(&pdev->dev, "Error in request irq event [rc=%d]\n", rc);
>> +
>> +        goto destroy_wq;
>> +    }
>> +
>>       return 0;
>>   +destroy_wq:
>> +    destroy_workqueue(ne_pci_dev->event_wq);
>> +free_reply_irq_vec:
>> +    free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);
>>   free_irq_vectors:
>>       pci_free_irq_vectors(pdev);
>>   @@ -304,6 +420,12 @@ static void ne_teardown_msix(struct pci_dev *pdev)
>>       if (!ne_pci_dev)
>>           return;
>>   +    free_irq(pci_irq_vector(pdev, NE_VEC_EVENT), ne_pci_dev);
>> +
>> +    flush_work(&ne_pci_dev->notify_work);
>> +    flush_workqueue(ne_pci_dev->event_wq);
>> +    destroy_workqueue(ne_pci_dev->event_wq);
>> +
>>       free_irq(pci_irq_vector(pdev, NE_VEC_REPLY), ne_pci_dev);
>>         pci_free_irq_vectors(pdev);
>>







* Re: [PATCH v4 07/18] nitro_enclaves: Init misc device providing the ioctl interface
  2020-06-22 20:03 ` [PATCH v4 07/18] nitro_enclaves: Init misc device providing the ioctl interface Andra Paraschiv
  2020-06-29 16:20   ` Greg KH
@ 2020-07-06  7:13   ` Alexander Graf
  2020-07-06  7:49     ` Paraschiv, Andra-Irina
  1 sibling, 1 reply; 67+ messages in thread
From: Alexander Graf @ 2020-07-06  7:13 UTC (permalink / raw)
  To: Andra Paraschiv, linux-kernel
  Cc: Benjamin Herrenschmidt, Colm MacCarthaigh, Bjoern Doebel,
	David Woodhouse, Frank van der Linden, Greg KH, Martin Pohlack,
	Matt Wilson, Paolo Bonzini, Balbir Singh, Stefano Garzarella,
	Stefan Hajnoczi, Stewart Smith, Uwe Dannowski, kvm,
	ne-devel-upstream



On 22.06.20 22:03, Andra Paraschiv wrote:
> The Nitro Enclaves driver provides an ioctl interface to the user space
> for enclave lifetime management e.g. enclave creation / termination and
> setting enclave resources such as memory and CPU.
> 
> This ioctl interface is mapped to a Nitro Enclaves misc device.
> 
> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
> ---
> Changelog
> 
> v3 -> v4
> 
> * Use dev_err instead of custom NE log pattern.
> * Remove the NE CPU pool init during kernel module loading, as the CPU
>    pool is now setup at runtime, via a sysfs file for the kernel
>    parameter.
> * Add minimum enclave memory size definition.
> 
> v2 -> v3
> 
> * Remove the GPL additional wording as SPDX-License-Identifier is
>    already in place.
> * Remove the WARN_ON calls.
> * Remove linux/bug and linux/kvm_host includes that are not needed.
> * Remove "ratelimited" from the logs that are not in the ioctl call
>    paths.
> * Remove file ops that do nothing for now - open and release.
> 
> v1 -> v2
> 
> * Add log pattern for NE.
> * Update goto labels to match their purpose.
> * Update ne_cpu_pool data structure to include the global mutex.
> * Update NE misc device mode to 0660.
> * Check if the CPU siblings are included in the NE CPU pool, as full CPU
>    cores are given for the enclave(s).
> ---
>   drivers/virt/nitro_enclaves/ne_misc_dev.c | 133 ++++++++++++++++++++++
>   drivers/virt/nitro_enclaves/ne_pci_dev.c  |  11 ++
>   2 files changed, 144 insertions(+)
>   create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.c
> 
> diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
> new file mode 100644
> index 000000000000..628fb10c2b36
> --- /dev/null
> +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
> @@ -0,0 +1,133 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + */
> +
> +/**
> + * Enclave lifetime management driver for Nitro Enclaves (NE).
> + * Nitro is a hypervisor that has been developed by Amazon.
> + */
> +
> +#include <linux/anon_inodes.h>
> +#include <linux/capability.h>
> +#include <linux/cpu.h>
> +#include <linux/device.h>
> +#include <linux/file.h>
> +#include <linux/hugetlb.h>
> +#include <linux/list.h>
> +#include <linux/miscdevice.h>
> +#include <linux/mm.h>
> +#include <linux/mman.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/nitro_enclaves.h>
> +#include <linux/pci.h>
> +#include <linux/poll.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +
> +#include "ne_misc_dev.h"
> +#include "ne_pci_dev.h"
> +
> +#define NE_EIF_LOAD_OFFSET (8 * 1024UL * 1024UL)
> +
> +#define NE_MIN_ENCLAVE_MEM_SIZE (64 * 1024UL * 1024UL)
> +
> +#define NE_MIN_MEM_REGION_SIZE (2 * 1024UL * 1024UL)
> +
> +/*
> + * TODO: Update logic to create new sysfs entries instead of using
> + * a kernel parameter e.g. if multiple sysfs files needed.
> + */
> +static const struct kernel_param_ops ne_cpu_pool_ops = {

Adding an empty ops struct looks very odd. If you fill it in a later 
patch, please indicate so in a comment here.

> +};
> +
> +static char ne_cpus[PAGE_SIZE];

PAGE_SIZE is a bit excessive, no? Even if you list every single CPU of a 
256 CPU system you are <1024.


Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879





* Re: [PATCH v4 07/18] nitro_enclaves: Init misc device providing the ioctl interface
  2020-07-06  7:13   ` Alexander Graf
@ 2020-07-06  7:49     ` Paraschiv, Andra-Irina
  2020-07-06  8:01       ` Alexander Graf
  0 siblings, 1 reply; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-07-06  7:49 UTC (permalink / raw)
  To: Alexander Graf, linux-kernel
  Cc: Benjamin Herrenschmidt, Colm MacCarthaigh, Bjoern Doebel,
	David Woodhouse, Frank van der Linden, Greg KH, Martin Pohlack,
	Matt Wilson, Paolo Bonzini, Balbir Singh, Stefano Garzarella,
	Stefan Hajnoczi, Stewart Smith, Uwe Dannowski, kvm,
	ne-devel-upstream



On 06/07/2020 10:13, Alexander Graf wrote:
>
>
> On 22.06.20 22:03, Andra Paraschiv wrote:
>> The Nitro Enclaves driver provides an ioctl interface to the user space
>> for enclave lifetime management e.g. enclave creation / termination and
>> setting enclave resources such as memory and CPU.
>>
>> This ioctl interface is mapped to a Nitro Enclaves misc device.
>>
>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>> ---
>> Changelog
>>
>> v3 -> v4
>>
>> * Use dev_err instead of custom NE log pattern.
>> * Remove the NE CPU pool init during kernel module loading, as the CPU
>>    pool is now setup at runtime, via a sysfs file for the kernel
>>    parameter.
>> * Add minimum enclave memory size definition.
>>
>> v2 -> v3
>>
>> * Remove the GPL additional wording as SPDX-License-Identifier is
>>    already in place.
>> * Remove the WARN_ON calls.
>> * Remove linux/bug and linux/kvm_host includes that are not needed.
>> * Remove "ratelimited" from the logs that are not in the ioctl call
>>    paths.
>> * Remove file ops that do nothing for now - open and release.
>>
>> v1 -> v2
>>
>> * Add log pattern for NE.
>> * Update goto labels to match their purpose.
>> * Update ne_cpu_pool data structure to include the global mutex.
>> * Update NE misc device mode to 0660.
>> * Check if the CPU siblings are included in the NE CPU pool, as full CPU
>>    cores are given for the enclave(s).
>> ---
>>   drivers/virt/nitro_enclaves/ne_misc_dev.c | 133 ++++++++++++++++++++++
>>   drivers/virt/nitro_enclaves/ne_pci_dev.c  |  11 ++
>>   2 files changed, 144 insertions(+)
>>   create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.c
>>
>> diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
>> new file mode 100644
>> index 000000000000..628fb10c2b36
>> --- /dev/null
>> +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
>> @@ -0,0 +1,133 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
>> + */
>> +
>> +/**
>> + * Enclave lifetime management driver for Nitro Enclaves (NE).
>> + * Nitro is a hypervisor that has been developed by Amazon.
>> + */
>> +
>> +#include <linux/anon_inodes.h>
>> +#include <linux/capability.h>
>> +#include <linux/cpu.h>
>> +#include <linux/device.h>
>> +#include <linux/file.h>
>> +#include <linux/hugetlb.h>
>> +#include <linux/list.h>
>> +#include <linux/miscdevice.h>
>> +#include <linux/mm.h>
>> +#include <linux/mman.h>
>> +#include <linux/module.h>
>> +#include <linux/mutex.h>
>> +#include <linux/nitro_enclaves.h>
>> +#include <linux/pci.h>
>> +#include <linux/poll.h>
>> +#include <linux/slab.h>
>> +#include <linux/types.h>
>> +
>> +#include "ne_misc_dev.h"
>> +#include "ne_pci_dev.h"
>> +
>> +#define NE_EIF_LOAD_OFFSET (8 * 1024UL * 1024UL)
>> +
>> +#define NE_MIN_ENCLAVE_MEM_SIZE (64 * 1024UL * 1024UL)
>> +
>> +#define NE_MIN_MEM_REGION_SIZE (2 * 1024UL * 1024UL)
>> +
>> +/*
>> + * TODO: Update logic to create new sysfs entries instead of using
>> + * a kernel parameter e.g. if multiple sysfs files needed.
>> + */
>> +static const struct kernel_param_ops ne_cpu_pool_ops = {
>
> Adding an empty ops struct looks very odd. If you fill it in a later 
> patch, please indicate so in a comment here.

True, I already updated this in v5, to have the .get function here and 
the .set one in a later patch.

>
>> +};
>> +
>> +static char ne_cpus[PAGE_SIZE];
>
> PAGE_SIZE is a bit excessive, no? Even if you list every single CPU of 
> a 256 CPU system you are <1024.

It is a bit too much, I was thinking of it while declaring this. I can 
update to 1024 in v5.

Thank you.

Andra






* Re: [PATCH v4 08/18] nitro_enclaves: Add logic for enclave vm creation
  2020-06-22 20:03 ` [PATCH v4 08/18] nitro_enclaves: Add logic for enclave vm creation Andra Paraschiv
@ 2020-07-06  7:53   ` Alexander Graf
  2020-07-06 13:12     ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Graf @ 2020-07-06  7:53 UTC (permalink / raw)
  To: Andra Paraschiv, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 22.06.20 22:03, Andra Paraschiv wrote:
> Add ioctl command logic for enclave VM creation. It triggers a slot
> allocation. The enclave resources will be associated with this slot and
> it will be used as an identifier for triggering enclave run.
> 
> Return a file descriptor, namely enclave fd. This is further used by the
> associated user space enclave process to set enclave resources and
> trigger enclave termination.
> 
> The poll function is implemented in order to notify the enclave process
> when an enclave exits without a specific enclave termination command
> trigger e.g. when an enclave crashes.
> 
> Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>

Reviewed-by: Alexander Graf <graf@amazon.com>


Alex









* Re: [PATCH v4 07/18] nitro_enclaves: Init misc device providing the ioctl interface
  2020-07-06  7:49     ` Paraschiv, Andra-Irina
@ 2020-07-06  8:01       ` Alexander Graf
  2020-07-06 13:09         ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Graf @ 2020-07-06  8:01 UTC (permalink / raw)
  To: Paraschiv, Andra-Irina, linux-kernel
  Cc: Benjamin Herrenschmidt, Colm MacCarthaigh, Bjoern Doebel,
	David Woodhouse, Frank van der Linden, Greg KH, Martin Pohlack,
	Matt Wilson, Paolo Bonzini, Balbir Singh, Stefano Garzarella,
	Stefan Hajnoczi, Stewart Smith, Uwe Dannowski, kvm,
	ne-devel-upstream



On 06.07.20 09:49, Paraschiv, Andra-Irina wrote:
> 
> 
> On 06/07/2020 10:13, Alexander Graf wrote:
>>
>>
>> On 22.06.20 22:03, Andra Paraschiv wrote:
>>> The Nitro Enclaves driver provides an ioctl interface to the user space
>>> for enclave lifetime management e.g. enclave creation / termination and
>>> setting enclave resources such as memory and CPU.
>>>
>>> This ioctl interface is mapped to a Nitro Enclaves misc device.
>>>
>>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>>> ---
>>> Changelog
>>>
>>> v3 -> v4
>>>
>>> * Use dev_err instead of custom NE log pattern.
>>> * Remove the NE CPU pool init during kernel module loading, as the CPU
>>>    pool is now setup at runtime, via a sysfs file for the kernel
>>>    parameter.
>>> * Add minimum enclave memory size definition.
>>>
>>> v2 -> v3
>>>
>>> * Remove the GPL additional wording as SPDX-License-Identifier is
>>>    already in place.
>>> * Remove the WARN_ON calls.
>>> * Remove linux/bug and linux/kvm_host includes that are not needed.
>>> * Remove "ratelimited" from the logs that are not in the ioctl call
>>>    paths.
>>> * Remove file ops that do nothing for now - open and release.
>>>
>>> v1 -> v2
>>>
>>> * Add log pattern for NE.
>>> * Update goto labels to match their purpose.
>>> * Update ne_cpu_pool data structure to include the global mutex.
>>> * Update NE misc device mode to 0660.
>>> * Check if the CPU siblings are included in the NE CPU pool, as full CPU
>>>    cores are given for the enclave(s).
>>> ---
>>>   drivers/virt/nitro_enclaves/ne_misc_dev.c | 133 ++++++++++++++++++++++
>>>   drivers/virt/nitro_enclaves/ne_pci_dev.c  |  11 ++
>>>   2 files changed, 144 insertions(+)
>>>   create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.c
>>>
>>> diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
>>> new file mode 100644
>>> index 000000000000..628fb10c2b36
>>> --- /dev/null
>>> +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
>>> @@ -0,0 +1,133 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
>>> + */
>>> +
>>> +/**
>>> + * Enclave lifetime management driver for Nitro Enclaves (NE).
>>> + * Nitro is a hypervisor that has been developed by Amazon.
>>> + */
>>> +
>>> +#include <linux/anon_inodes.h>
>>> +#include <linux/capability.h>
>>> +#include <linux/cpu.h>
>>> +#include <linux/device.h>
>>> +#include <linux/file.h>
>>> +#include <linux/hugetlb.h>
>>> +#include <linux/list.h>
>>> +#include <linux/miscdevice.h>
>>> +#include <linux/mm.h>
>>> +#include <linux/mman.h>
>>> +#include <linux/module.h>
>>> +#include <linux/mutex.h>
>>> +#include <linux/nitro_enclaves.h>
>>> +#include <linux/pci.h>
>>> +#include <linux/poll.h>
>>> +#include <linux/slab.h>
>>> +#include <linux/types.h>
>>> +
>>> +#include "ne_misc_dev.h"
>>> +#include "ne_pci_dev.h"
>>> +
>>> +#define NE_EIF_LOAD_OFFSET (8 * 1024UL * 1024UL)
>>> +
>>> +#define NE_MIN_ENCLAVE_MEM_SIZE (64 * 1024UL * 1024UL)
>>> +
>>> +#define NE_MIN_MEM_REGION_SIZE (2 * 1024UL * 1024UL)
>>> +
>>> +/*
>>> + * TODO: Update logic to create new sysfs entries instead of using
>>> + * a kernel parameter e.g. if multiple sysfs files needed.
>>> + */
>>> +static const struct kernel_param_ops ne_cpu_pool_ops = {
>>
>> Adding an empty ops struct looks very odd. If you fill it in a later 
>> patch, please indicate so in a comment here.
> 
> True, I already updated this in v5, to have the .get function here and 
> the .set one in a later patch.
> 
>>
>>> +};
>>> +
>>> +static char ne_cpus[PAGE_SIZE];
>>
>> PAGE_SIZE is a bit excessive, no? Even if you list every single CPU of 
>> a 256 CPU system you are <1024.
> 
> It is a bit too much, I was thinking of it while declaring this. I can 
> update to 1024 in v5.

The largest NUMA node CPU count I'm aware of today is 64. Since we limit 
the pool to a single node, we can't go beyond that. Let's be a bit 
future proof and double that number: 128. Then we get to 401 characters 
if you pass in every single CPU as comma separated. I would seriously 
hope most people would just pass ranges though.

So how about we make it 512 for now?
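
The character counts quoted in this thread can be reproduced with a small userspace helper. A minimal sketch, assuming the worst case is every CPU id from 0 to ncpus - 1 written out in decimal and separated by commas; `cpu_list_strlen()` is a hypothetical name, not something from the driver:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Length, excluding the terminating NUL, of the string "0,1,...,ncpus-1".
 * Illustrative helper only; it is not part of the NE driver.
 */
static size_t cpu_list_strlen(unsigned int ncpus)
{
	size_t len = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < ncpus; cpu++) {
		/* snprintf(NULL, 0, ...) returns the number of chars needed. */
		len += (size_t)snprintf(NULL, 0, "%u", cpu);

		/* One comma between consecutive entries, none after the last. */
		if (cpu + 1 < ncpus)
			len++;
	}

	return len;
}
```

Under that assumption, cpu_list_strlen(128) gives 401 and cpu_list_strlen(256) gives 913, which matches the "401 characters" and "<1024" figures above; a 512-byte buffer therefore covers the 128-CPU case with room for the terminating NUL.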


Thanks,

Alex








* Re: [PATCH v4 09/18] nitro_enclaves: Add logic for enclave vcpu creation
  2020-06-22 20:03 ` [PATCH v4 09/18] nitro_enclaves: Add logic for enclave vcpu creation Andra Paraschiv
@ 2020-07-06 10:12   ` Alexander Graf
  2020-07-08 12:46     ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Graf @ 2020-07-06 10:12 UTC (permalink / raw)
  To: Andra Paraschiv, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 22.06.20 22:03, Andra Paraschiv wrote:
> An enclave, before being started, has its resources set. One of its
> resources is CPU.
> 
> The NE CPU pool is set for choosing CPUs for enclaves from it. Offline
> the CPUs from the NE CPU pool during the pool setup and online them back
> during the NE CPU pool teardown.
> 
> The enclave CPUs need to be full cores and from the same NUMA node. CPU
> 0 and its siblings have to remain available to the primary / parent VM.
> 
> Add ioctl command logic for enclave vCPU creation. Return as result a
> file descriptor that is associated with the enclave vCPU.
> 
> Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
> ---
> Changelog
> 
> v3 -> v4
> 
> * Setup the NE CPU pool at runtime via a sysfs file for the kernel
>    parameter.
> * Check enclave CPUs to be from the same NUMA node.
> * Use dev_err instead of custom NE log pattern.
> * Update the NE ioctl call to match the decoupling from the KVM API.
> 
> v2 -> v3
> 
> * Remove the WARN_ON calls.
> * Update static calls sanity checks.
> * Update kzfree() calls to kfree().
> * Remove file ops that do nothing for now - open, ioctl and release.
> 
> v1 -> v2
> 
> * Add log pattern for NE.
> * Update goto labels to match their purpose.
> * Remove the BUG_ON calls.
> * Check if enclave state is init when setting enclave vcpu.
> ---
>   drivers/virt/nitro_enclaves/ne_misc_dev.c | 491 ++++++++++++++++++++++
>   1 file changed, 491 insertions(+)
> 
> diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
> index f70496813033..d6777008f685 100644
> --- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
> +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
> @@ -39,7 +39,11 @@
>    * TODO: Update logic to create new sysfs entries instead of using
>    * a kernel parameter e.g. if multiple sysfs files needed.
>    */
> +static int ne_set_kernel_param(const char *val, const struct kernel_param *kp);
> +
>   static const struct kernel_param_ops ne_cpu_pool_ops = {
> +	.get = param_get_string,
> +	.set = ne_set_kernel_param,
>   };
>   
>   static char ne_cpus[PAGE_SIZE];
> @@ -60,6 +64,485 @@ struct ne_cpu_pool {
>   
>   static struct ne_cpu_pool ne_cpu_pool;
>   
> +static const struct file_operations ne_enclave_vcpu_fops = {
> +	.owner		= THIS_MODULE,
> +	.llseek		= noop_llseek,
> +};

Do we really need an fd for an object without operations? I think the 
general flow to add CPUs from the pool to the VM is very sensible. But I 
don't think we really need an fd as return value from that operation.

> +
> +/**
> + * ne_check_enclaves_created - Verify if at least one enclave has been created.
> + *
> + * @pdev: PCI device used for enclave lifetime management.
> + *
> + * @returns: true if at least one enclave is created, false otherwise.
> + */
> +static bool ne_check_enclaves_created(struct pci_dev *pdev)
> +{
> +	struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
> +
> +	if (!ne_pci_dev)
> +		return false;

Please pass in the ne_pci_dev into this function directly.

> +
> +	mutex_lock(&ne_pci_dev->enclaves_list_mutex);
> +
> +	if (list_empty(&ne_pci_dev->enclaves_list)) {
> +		mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
> +
> +		return false;

If you make this a return variable, you save on the unlock duplication.
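
As a rough userspace sketch of that shape (the struct, the fake lock and
all names below are hypothetical stand-ins, not the driver's types), the
result is computed once under the lock and there is a single unlock:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a mutex; only tracks lock depth. */
struct fake_mutex {
	int depth;
};

static void fake_lock(struct fake_mutex *m)
{
	m->depth++;
}

static void fake_unlock(struct fake_mutex *m)
{
	m->depth--;
}

/* Hypothetical stand-in for the per-device enclave bookkeeping. */
struct enclave_registry {
	struct fake_mutex lock;
	size_t nr_enclaves; /* models !list_empty(&enclaves_list) */
};

/* Returns true if at least one enclave exists; one lock, one unlock. */
static bool enclaves_created(struct enclave_registry *reg)
{
	bool created;

	fake_lock(&reg->lock);
	created = reg->nr_enclaves > 0;
	fake_unlock(&reg->lock);

	return created;
}
```

With a single exit path, the lock and unlock calls are trivially
balanced, which is the point of the suggestion.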

> +	}
> +
> +	mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
> +
> +	return true;
> +}
> +
> +/**
> + * ne_setup_cpu_pool - Set the NE CPU pool after handling sanity checks such as
> + * not sharing CPU cores with the primary / parent VM or not using CPU 0, which
> + * should remain available for the primary / parent VM. Offline the CPUs from
> + * the pool after the checks passed.
> + *
> + * @pdev: PCI device used for enclave lifetime management.
> + * @ne_cpu_list: the CPU list used for setting NE CPU pool.
> + *
> + * @returns: 0 on success, negative return value on failure.
> + */
> +static int ne_setup_cpu_pool(struct pci_dev *pdev, const char *ne_cpu_list)
> +{
> +	unsigned int cpu = 0;
> +	unsigned int cpu_sibling = 0;
> +	int numa_node = -1;
> +	int rc = -EINVAL;
> +
> +	if (!capable(CAP_SYS_ADMIN)) {
> +		dev_err(&pdev->dev, "No admin capability for CPU pool setup\n");

No need to print anything here. It only gives non-admin users a chance 
to spam the kernel log, if they can write to this file at all. Can they?

Also, isn't this at the wrong abstraction level? I would expect such a 
check to happen on the file write function, not here.

> +
> +		return -EPERM;
> +	}
> +
> +	if (!ne_cpu_list)
> +		return 0;
> +
> +	if (ne_check_enclaves_created(pdev)) {
> +		dev_err(&pdev->dev, "The CPU pool is used, enclaves created\n");
> +
> +		return -EINVAL;
> +	}
> +
> +	mutex_lock(&ne_cpu_pool.mutex);
> +
> +	rc = cpulist_parse(ne_cpu_list, ne_cpu_pool.avail);
> +	if (rc < 0) {
> +		dev_err(&pdev->dev,
> +			"Error in cpulist parse [rc=%d]\n", rc);
> +
> +		goto unlock_mutex;
> +	}
> +
> +	/*
> +	 * Check if CPU 0 and its siblings are included in the provided CPU pool
> +	 * They should remain available for the primary / parent VM.
> +	 */
> +	if (cpumask_test_cpu(0, ne_cpu_pool.avail)) {
> +
> +		dev_err(&pdev->dev,
> +			"CPU 0 has to remain available for the primary VM\n");

Shouldn't this also change the read value of the sysfs file?

> +
> +		rc = -EINVAL;
> +
> +		goto unlock_mutex;
> +	}
> +
> +	for_each_cpu(cpu_sibling, topology_sibling_cpumask(0)) {
> +		if (cpumask_test_cpu(cpu_sibling, ne_cpu_pool.avail)) {
> +			dev_err(&pdev->dev,
> +				"CPU sibling %d of CPU 0 is in the CPU pool\n",
> +				cpu_sibling);

Same here. I would expect the sysfs file to reflect either the previous 
state or <empty> because failures mean no CPUs are donated anymore.

Can we somehow implement the get function of the param as something that 
gets generated dynamically?
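
To illustrate what "generated dynamically" could look like, here is a
minimal userspace sketch that formats a 64-bit mask as a CPU list on
every read instead of echoing the last string written. The function
name and the 64-CPU limit are assumptions made for the example; in the
driver the source of truth would be the pool's cpumask.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the set bits of @mask as a CPU list such as "1-3,5" into @buf.
 * Returns the number of characters written (excluding the NUL), or -1
 * if @buf is too small.
 */
int format_cpu_list(uint64_t mask, char *buf, size_t len)
{
	size_t off = 0;
	int cpu = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	while (cpu < 64) {
		int start, end, n;

		if (!(mask & (UINT64_C(1) << cpu))) {
			cpu++;
			continue;
		}

		/* Extend the run [start, end] over consecutive set bits. */
		start = cpu;
		while (cpu < 64 && (mask & (UINT64_C(1) << cpu)))
			cpu++;
		end = cpu - 1;

		if (start == end)
			n = snprintf(buf + off, len - off, "%s%d",
				     off ? "," : "", start);
		else
			n = snprintf(buf + off, len - off, "%s%d-%d",
				     off ? "," : "", start, end);
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}

	return (int)off;
}
```

An empty mask yields an empty string, so a failed setup that leaves the
pool empty would read back as empty rather than as the rejected input.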

> +
> +			rc = -EINVAL;
> +
> +			goto unlock_mutex;
> +		}
> +	}
> +
> +	/*
> +	 * Check if CPU siblings are included in the provided CPU pool. The
> +	 * expectation is that CPU cores are made available in the CPU pool for
> +	 * enclaves.
> +	 */
> +	for_each_cpu(cpu, ne_cpu_pool.avail) {
> +		for_each_cpu(cpu_sibling, topology_sibling_cpumask(cpu)) {
> +			if (!cpumask_test_cpu(cpu_sibling, ne_cpu_pool.avail)) {
> +				dev_err(&pdev->dev,
> +					"CPU %d is not in the CPU pool\n",
> +					cpu_sibling);
> +
> +				rc = -EINVAL;
> +
> +				goto unlock_mutex;
> +			}
> +		}
> +	}
> +
> +	/*
> +	 * Check if the CPUs from the NE CPU pool are from the same NUMA node.
> +	 */
> +	for_each_cpu(cpu, ne_cpu_pool.avail) {
> +		if (numa_node < 0) {
> +			numa_node = cpu_to_node(cpu);
> +
> +			if (numa_node < 0) {
> +				dev_err(&pdev->dev,
> +					"Invalid NUMA node %d\n", numa_node);
> +
> +				rc = -EINVAL;
> +
> +				goto unlock_mutex;
> +			}
> +		} else {
> +			if (numa_node != cpu_to_node(cpu)) {
> +				dev_err(&pdev->dev,
> +					"CPUs are from different NUMA nodes\n");
> +
> +				rc = -EINVAL;
> +
> +				goto unlock_mutex;
> +			}
> +		}
> +	}
> +

There should be a comment here that describes the why:

/*
  * CPUs that are donated to enclaves should not be considered online
  * by Linux anymore, as the hypervisor will degrade them to floating.
  *
  * We offline them here, to not degrade performance and expose correct
  * topology to Linux and user space.
  */

> +	for_each_cpu(cpu, ne_cpu_pool.avail) {
> +		rc = remove_cpu(cpu);
> +		if (rc != 0) {
> +			dev_err(&pdev->dev,
> +				"CPU %d is not offlined [rc=%d]\n", cpu, rc);
> +
> +			goto online_cpus;
> +		}
> +	}
> +
> +	mutex_unlock(&ne_cpu_pool.mutex);
> +
> +	return 0;
> +
> +online_cpus:
> +	for_each_cpu(cpu, ne_cpu_pool.avail)
> +		add_cpu(cpu);
> +unlock_mutex:
> +	mutex_unlock(&ne_cpu_pool.mutex);
> +
> +	return rc;
> +}
> +
> +/**
> + * ne_teardown_cpu_pool - Online the CPUs from the NE CPU pool and cleanup the
> + * CPU pool.
> + *
> + * @pdev: PCI device used for enclave lifetime management.
> + */
> +static void ne_teardown_cpu_pool(struct pci_dev *pdev)
> +{
> +	unsigned int cpu = 0;
> +	int rc = -EINVAL;
> +
> +	if (!capable(CAP_SYS_ADMIN)) {
> +		dev_err(&pdev->dev, "No admin capability for CPU pool setup\n");
> +
> +		return;
> +	}
> +
> +	if (!ne_cpu_pool.avail)
> +		return;
> +
> +	if (ne_check_enclaves_created(pdev)) {
> +		dev_err(&pdev->dev, "The CPU pool is used, enclaves created\n");
> +
> +		return;
> +	}
> +
> +	mutex_lock(&ne_cpu_pool.mutex);
> +
> +	for_each_cpu(cpu, ne_cpu_pool.avail) {
> +		rc = add_cpu(cpu);
> +		if (rc != 0)
> +			dev_err(&pdev->dev,
> +				"CPU %d is not onlined [rc=%d]\n", cpu, rc);
> +	}
> +
> +	cpumask_clear(ne_cpu_pool.avail);
> +
> +	mutex_unlock(&ne_cpu_pool.mutex);
> +}
> +
> +static int ne_set_kernel_param(const char *val, const struct kernel_param *kp)
> +{
> +	const char *ne_cpu_list = val;
> +	struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_AMAZON,
> +					      PCI_DEVICE_ID_NE, NULL);

Isn't there a better way to get at the device than looking it up by 
vendor / device ID on every write?

> +	int rc = -EINVAL;
> +
> +	if (!pdev)
> +		return -ENODEV;
> +
> +	ne_teardown_cpu_pool(pdev);
> +
> +	rc = ne_setup_cpu_pool(pdev, ne_cpu_list);
> +	if (rc < 0) {
> +		dev_err(&pdev->dev, "Error in setup CPU pool [rc=%d]\n", rc);
> +
> +		return rc;
> +	}
> +
> +	return param_set_copystring(val, kp);
> +}
> +
> +/**
> + * ne_get_cpu_from_cpu_pool - Get a CPU from the CPU pool. If the vCPU id is 0,
> + * the CPU is autogenerated and chosen from the NE CPU pool.
> + *
> + * This function gets called with the ne_enclave mutex held.
> + *
> + * @ne_enclave: private data associated with the current enclave.
> + * @vcpu_id: id of the CPU to be associated with the given slot, apic id on x86.
> + *
> + * @returns: 0 on success, negative return value on failure.
> + */
> +static int ne_get_cpu_from_cpu_pool(struct ne_enclave *ne_enclave, u32 *vcpu_id)

That's a very awkward API. Can you instead just pass by-value and return 
the resulting CPU ID?

> +{
> +	unsigned int cpu = 0;
> +	unsigned int cpu_sibling = 0;
> +
> +	if (*vcpu_id != 0) {
> +		if (cpumask_test_cpu(*vcpu_id, ne_enclave->cpu_siblings)) {
> +			cpumask_clear_cpu(*vcpu_id, ne_enclave->cpu_siblings);
> +
> +			return 0;
> +		}
> +
> +		mutex_lock(&ne_cpu_pool.mutex);
> +
> +		if (!cpumask_test_cpu(*vcpu_id, ne_cpu_pool.avail)) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "CPU %d is not in NE CPU pool\n",
> +					    *vcpu_id);
> +
> +			mutex_unlock(&ne_cpu_pool.mutex);
> +
> +			return -EINVAL;

I think you're better off making the return value explicit for the 
error, so that user space can print the error message rather than us.

> +		}
> +
> +		cpumask_clear_cpu(*vcpu_id, ne_cpu_pool.avail);
> +
> +		/*
> +		 * Make sure the CPU siblings are not marked as available
> +		 * anymore.
> +		 */
> +		for_each_cpu(cpu_sibling, topology_sibling_cpumask(*vcpu_id)) {
> +			if (cpu_sibling != *vcpu_id) {
> +				cpumask_clear_cpu(cpu_sibling,
> +						  ne_cpu_pool.avail);
> +
> +				cpumask_set_cpu(cpu_sibling,
> +						ne_enclave->cpu_siblings);
> +			}
> +		}
> +
> +		mutex_unlock(&ne_cpu_pool.mutex);
> +
> +		return 0;
> +	}
> +
> +	/* There are CPU siblings available to choose from. */
> +	cpu = cpumask_any(ne_enclave->cpu_siblings);
> +	if (cpu < nr_cpu_ids) {
> +		cpumask_clear_cpu(cpu, ne_enclave->cpu_siblings);
> +
> +		*vcpu_id = cpu;
> +
> +		return 0;
> +	}
> +
> +	mutex_lock(&ne_cpu_pool.mutex);
> +
> +	/* Choose any CPU from the available CPU pool. */
> +	cpu = cpumask_any(ne_cpu_pool.avail);
> +	if (cpu >= nr_cpu_ids) {
> +		dev_err_ratelimited(ne_misc_dev.this_device,
> +				    "No CPUs available in CPU pool\n");
> +
> +		mutex_unlock(&ne_cpu_pool.mutex);
> +
> +		return -EINVAL;

I think you're better off making the return value explicit for the 
error, so that user space can print the error message rather than us.

> +	}
> +
> +	cpumask_clear_cpu(cpu, ne_cpu_pool.avail);
> +
> +	/* Make sure the CPU siblings are not marked as available anymore. */
> +	for_each_cpu(cpu_sibling, topology_sibling_cpumask(cpu)) {
> +		if (cpu_sibling != cpu) {
> +			cpumask_clear_cpu(cpu_sibling, ne_cpu_pool.avail);
> +
> +			cpumask_set_cpu(cpu_sibling, ne_enclave->cpu_siblings);
> +		}
> +	}
> +
> +	mutex_unlock(&ne_cpu_pool.mutex);

I find the function slightly confusingly structured. Why can't we do 
something like


   if (!vcpu_id) {
     vcpu_id = find_next_free_vcpu_id();
     if (vcpu_id < 0)
         return -ENOSPC;
   }

   [logic to handle an explicit vcpu id]

I think that would be much more readable.
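
A slightly fuller userspace sketch of that structure, under stated
assumptions: the pool is modelled as a plain 64-bit mask, locking and
sibling handling are left out, and claim_cpu() / find_next_free_cpu()
are hypothetical names. It also passes the id by value and returns the
resulting CPU id, with a distinct error per failure:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical pool: bit N set means CPU N is available for enclaves. */
struct cpu_pool {
	uint64_t avail;
};

/* Lowest available CPU, or -ENOSPC when the pool is empty. */
static int find_next_free_cpu(const struct cpu_pool *pool)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (pool->avail & (UINT64_C(1) << cpu))
			return cpu;

	return -ENOSPC;
}

/*
 * Claim a CPU from the pool. @vcpu_id == 0 asks for any free CPU (CPU 0
 * itself is assumed to never be in the pool).
 * Returns the claimed CPU id, or a negative errno.
 */
int claim_cpu(struct cpu_pool *pool, uint32_t vcpu_id)
{
	int cpu;

	if (vcpu_id >= 64)
		return -EINVAL;
	cpu = (int)vcpu_id;

	if (cpu == 0) {
		cpu = find_next_free_cpu(pool);
		if (cpu < 0)
			return cpu;
	}

	/* The auto-picked id and the explicit id share one path from here. */
	if (!(pool->avail & (UINT64_C(1) << cpu)))
		return -EINVAL;

	pool->avail &= ~(UINT64_C(1) << cpu);

	return cpu;
}
```

The auto-pick case only resolves an id up front; everything after that
is the explicit-id logic, so the claim and sibling bookkeeping would
exist once rather than twice.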

> +
> +	*vcpu_id = cpu;
> +
> +	return 0;
> +}
> +
> +/**
> + * ne_create_vcpu_ioctl - Add vCPU to the slot associated with the current
> + * enclave. Create vCPU file descriptor to be further used for CPU handling.
> + *
> + * This function gets called with the ne_enclave mutex held.
> + *
> + * @ne_enclave: private data associated with the current enclave.
> + * @vcpu_id: id of the CPU to be associated with the given slot, apic id on x86.
> + *
> + * @returns: vCPU fd on success, negative return value on failure.
> + */
> +static int ne_create_vcpu_ioctl(struct ne_enclave *ne_enclave, u32 vcpu_id)
> +{
> +	struct ne_pci_dev_cmd_reply cmd_reply = {};
> +	int fd = 0;
> +	struct file *file = NULL;
> +	struct ne_vcpu_id *ne_vcpu_id = NULL;
> +	int rc = -EINVAL;
> +	struct slot_add_vcpu_req slot_add_vcpu_req = {};
> +
> +	if (ne_enclave->mm != current->mm)
> +		return -EIO;
> +
> +	ne_vcpu_id = kzalloc(sizeof(*ne_vcpu_id), GFP_KERNEL);
> +	if (!ne_vcpu_id)
> +		return -ENOMEM;
> +
> +	fd = get_unused_fd_flags(O_CLOEXEC);
> +	if (fd < 0) {
> +		rc = fd;
> +
> +		dev_err_ratelimited(ne_misc_dev.this_device,
> +				    "Error in getting unused fd [rc=%d]\n", rc);
> +
> +		goto free_ne_vcpu_id;
> +	}
> +
> +	/* TODO: Include (vcpu) id in the ne-vm-vcpu naming. */
> +	file = anon_inode_getfile("ne-vm-vcpu", &ne_enclave_vcpu_fops,
> +				  ne_enclave, O_RDWR);
> +	if (IS_ERR(file)) {
> +		rc = PTR_ERR(file);
> +
> +		dev_err_ratelimited(ne_misc_dev.this_device,
> +				    "Error in anon inode get file [rc=%d]\n",
> +				    rc);
> +
> +		goto put_fd;
> +	}
> +
> +	slot_add_vcpu_req.slot_uid = ne_enclave->slot_uid;
> +	slot_add_vcpu_req.vcpu_id = vcpu_id;
> +
> +	rc = ne_do_request(ne_enclave->pdev, SLOT_ADD_VCPU, &slot_add_vcpu_req,
> +			   sizeof(slot_add_vcpu_req), &cmd_reply,
> +			   sizeof(cmd_reply));
> +	if (rc < 0) {
> +		dev_err_ratelimited(ne_misc_dev.this_device,
> +				    "Error in slot add vcpu [rc=%d]\n", rc);
> +
> +		goto put_file;
> +	}
> +
> +	ne_vcpu_id->vcpu_id = vcpu_id;
> +
> +	list_add(&ne_vcpu_id->vcpu_id_list_entry, &ne_enclave->vcpu_ids_list);
> +
> +	ne_enclave->nr_vcpus++;
> +
> +	fd_install(fd, file);
> +
> +	return fd;
> +
> +put_file:
> +	fput(file);
> +put_fd:
> +	put_unused_fd(fd);
> +free_ne_vcpu_id:
> +	kfree(ne_vcpu_id);
> +
> +	return rc;
> +}
> +
> +static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
> +			     unsigned long arg)
> +{
> +	struct ne_enclave *ne_enclave = file->private_data;
> +
> +	if (!ne_enclave || !ne_enclave->pdev)
> +		return -EINVAL;
> +
> +	switch (cmd) {
> +	case NE_CREATE_VCPU: {

Can this be an ADD_VCPU rather than CREATE? We don't really need a vcpu 
fd after all ...


Alex

> +		int rc = -EINVAL;
> +		u32 vcpu_id = 0;
> +
> +		if (copy_from_user(&vcpu_id, (void *)arg, sizeof(vcpu_id))) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Error in copy from user\n");
> +
> +			return -EFAULT;
> +		}
> +
> +		mutex_lock(&ne_enclave->enclave_info_mutex);
> +
> +		if (ne_enclave->state != NE_STATE_INIT) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Enclave isn't in init state\n");
> +
> +			mutex_unlock(&ne_enclave->enclave_info_mutex);
> +
> +			return -EINVAL;
> +		}
> +
> +		/* Use the CPU pool for choosing a CPU for the enclave. */
> +		rc = ne_get_cpu_from_cpu_pool(ne_enclave, &vcpu_id);
> +		if (rc < 0) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Error in get CPU from pool\n");
> +
> +			mutex_unlock(&ne_enclave->enclave_info_mutex);
> +
> +			return -EINVAL;
> +		}
> +
> +		rc = ne_create_vcpu_ioctl(ne_enclave, vcpu_id);
> +
> +		/* Put back the CPU in enclave cpu pool, if add vcpu error. */
> +		if (rc < 0)
> +			cpumask_set_cpu(vcpu_id, ne_enclave->cpu_siblings);
> +
> +		mutex_unlock(&ne_enclave->enclave_info_mutex);
> +
> +		if (copy_to_user((void *)arg, &vcpu_id, sizeof(vcpu_id))) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Error in copy to user\n");
> +
> +			return -EFAULT;
> +		}
> +
> +		return rc;
> +	}
> +
> +	default:
> +		return -ENOTTY;
> +	}
> +
> +	return 0;
> +}
> +
>   static __poll_t ne_enclave_poll(struct file *file, poll_table *wait)
>   {
>   	__poll_t mask = 0;
> @@ -79,6 +562,7 @@ static const struct file_operations ne_enclave_fops = {
>   	.owner		= THIS_MODULE,
>   	.llseek		= noop_llseek,
>   	.poll		= ne_enclave_poll,
> +	.unlocked_ioctl	= ne_enclave_ioctl,
>   };
>   
>   /**
> @@ -286,8 +770,15 @@ static int __init ne_init(void)
>   
>   static void __exit ne_exit(void)
>   {
> +	struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_AMAZON,
> +					      PCI_DEVICE_ID_NE, NULL);
> +	if (!pdev)
> +		return;
> +
>   	pci_unregister_driver(&ne_pci_driver);
>   
> +	ne_teardown_cpu_pool(pdev);
> +
>   	free_cpumask_var(ne_cpu_pool.avail);
>   }
>   
> 



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 10/18] nitro_enclaves: Add logic for enclave image load info
  2020-06-22 20:03 ` [PATCH v4 10/18] nitro_enclaves: Add logic for enclave image load info Andra Paraschiv
@ 2020-07-06 10:16   ` Alexander Graf
  2020-07-06 13:35     ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Graf @ 2020-07-06 10:16 UTC (permalink / raw)
  To: Andra Paraschiv, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 22.06.20 22:03, Andra Paraschiv wrote:
> Before setting the memory regions for the enclave, the enclave image
> needs to be placed in memory. After the memory regions are set, this
> memory cannot be used anymore by the VM, being carved out.
> 
> Add ioctl command logic to get the offset in enclave memory where to
> place the enclave image. Then the user space tooling copies the enclave
> image in the memory using the given memory offset.
> 
> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
> ---
> Changelog
> 
> v3 -> v4
> 
> * Use dev_err instead of custom NE log pattern.
> * Set enclave image load offset based on flags.
> * Update the naming for the ioctl command from metadata to info.
> 
> v2 -> v3
> 
> * No changes.
> 
> v1 -> v2
> 
> * New in v2.
> ---
>   drivers/virt/nitro_enclaves/ne_misc_dev.c | 25 +++++++++++++++++++++++
>   1 file changed, 25 insertions(+)
> 
> diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
> index d6777008f685..cfdefa52ed2a 100644
> --- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
> +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
> @@ -536,6 +536,31 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
>   		return rc;
>   	}
>   
> +	case NE_GET_IMAGE_LOAD_INFO: {
> +		struct ne_image_load_info image_load_info = {};
> +
> +		if (copy_from_user(&image_load_info, (void *)arg,
> +				   sizeof(image_load_info))) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Error in copy from user\n");

The -EFAULT tells you all you need. Just remove this print.

> +
> +			return -EFAULT;
> +		}
> +
> +		if (image_load_info.flags == NE_EIF_IMAGE)
> +			image_load_info.memory_offset = NE_EIF_LOAD_OFFSET;
> +
> +		if (copy_to_user((void *)arg, &image_load_info,
> +				 sizeof(image_load_info))) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Error in copy to user\n");

Same here.


Alex

> +
> +			return -EFAULT;
> +		}
> +
> +		return 0;
> +	}
> +
>   	default:
>   		return -ENOTTY;
>   	}
> 







^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 11/18] nitro_enclaves: Add logic for enclave memory region set
  2020-06-22 20:03 ` [PATCH v4 11/18] nitro_enclaves: Add logic for enclave memory region set Andra Paraschiv
@ 2020-07-06 10:46   ` Alexander Graf
  2020-07-09  7:36     ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Graf @ 2020-07-06 10:46 UTC (permalink / raw)
  To: Andra Paraschiv, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 22.06.20 22:03, Andra Paraschiv wrote:
> Another resource that is being set for an enclave is memory. User space
> memory regions, that need to be backed by contiguous memory regions,
> are associated with the enclave.
> 
> One solution for allocating / reserving contiguous memory regions, that
> is used for integration, is hugetlbfs. The user space process that is
> associated with the enclave passes to the driver these memory regions.
> 
> The enclave memory regions need to be from the same NUMA node as the
> enclave CPUs.
> 
> Add ioctl command logic for setting user space memory region for an
> enclave.
> 
> Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
> ---
> Changelog
> 
> v3 -> v4
> 
> * Check enclave memory regions are from the same NUMA node as the
>    enclave CPUs.
> * Use dev_err instead of custom NE log pattern.
> * Update the NE ioctl call to match the decoupling from the KVM API.
> 
> v2 -> v3
> 
> * Remove the WARN_ON calls.
> * Update static calls sanity checks.
> * Update kzfree() calls to kfree().
> 
> v1 -> v2
> 
> * Add log pattern for NE.
> * Update goto labels to match their purpose.
> * Remove the BUG_ON calls.
> * Check if enclave max memory regions is reached when setting an enclave
>    memory region.
> * Check if enclave state is init when setting an enclave memory region.
> ---
>   drivers/virt/nitro_enclaves/ne_misc_dev.c | 257 ++++++++++++++++++++++
>   1 file changed, 257 insertions(+)
> 
> diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
> index cfdefa52ed2a..17ccb6cdbd75 100644
> --- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
> +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
> @@ -476,6 +476,233 @@ static int ne_create_vcpu_ioctl(struct ne_enclave *ne_enclave, u32 vcpu_id)
>   	return rc;
>   }
>   
> +/**
> + * ne_sanity_check_user_mem_region - Sanity check the userspace memory
> + * region received during the set user memory region ioctl call.
> + *
> + * This function gets called with the ne_enclave mutex held.
> + *
> + * @ne_enclave: private data associated with the current enclave.
> + * @mem_region: user space memory region to be sanity checked.
> + *
> + * @returns: 0 on success, negative return value on failure.
> + */
> +static int ne_sanity_check_user_mem_region(struct ne_enclave *ne_enclave,
> +	struct ne_user_memory_region *mem_region)
> +{
> +	if (ne_enclave->mm != current->mm)
> +		return -EIO;
> +
> +	if ((mem_region->memory_size % NE_MIN_MEM_REGION_SIZE) != 0) {
> +		dev_err_ratelimited(ne_misc_dev.this_device,
> +				    "Mem size not multiple of 2 MiB\n");
> +
> +		return -EINVAL;

Can we make this an error that gets propagated to user space explicitly? 
I'd rather have a clear error return value of this function than a 
random message in dmesg.

> +	}
> +
> +	if ((mem_region->userspace_addr & (NE_MIN_MEM_REGION_SIZE - 1)) ||

This logic already relies on the fact that NE_MIN_MEM_REGION_SIZE is a 
power of two. Can you do the same above on the memory_size check?
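
For reference, the mask form of the size check could look like the
sketch below. MIN_MEM_REGION_SIZE and is_region_aligned() are names made
up for this example; in the kernel the existing IS_ALIGNED() macro
expresses the same test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_MEM_REGION_SIZE (UINT64_C(2) * 1024 * 1024) /* 2 MiB */

/* The mask trick below is only valid for power-of-two sizes. */
_Static_assert((MIN_MEM_REGION_SIZE & (MIN_MEM_REGION_SIZE - 1)) == 0,
	       "MIN_MEM_REGION_SIZE must be a power of two");

/* True if @value is a multiple of MIN_MEM_REGION_SIZE. */
static inline bool is_region_aligned(uint64_t value)
{
	return (value & (MIN_MEM_REGION_SIZE - 1)) == 0;
}
```

Note that zero passes this test, exactly as it passes the modulo
version, so a zero-sized region still needs its own check.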

> +	    !access_ok((void __user *)(unsigned long)mem_region->userspace_addr,
> +		       mem_region->memory_size)) {
> +		dev_err_ratelimited(ne_misc_dev.this_device,
> +				    "Invalid user space addr range\n");
> +
> +		return -EINVAL;

Same comment again. Return different errors for different conditions, so 
that user space has a chance to print proper errors to its users.

Also, don't we have to check alignment of userspace_addr as well?
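
A minimal sketch of "different errors for different conditions",
assuming a set of made-up error codes (REGION_ERR_* and both function
names are hypothetical, not part of the NE ioctl interface). User space
can then map each code to a message of its own instead of relying on
dmesg:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MIN_MEM_REGION_SIZE (UINT64_C(2) * 1024 * 1024) /* 2 MiB */

/* Hypothetical error codes, one per failed condition. */
enum region_err {
	REGION_ERR_SIZE = 1,       /* size is zero or not a 2 MiB multiple */
	REGION_ERR_ADDR_ALIGN = 2, /* start address is not 2 MiB aligned */
	REGION_ERR_ADDR_RANGE = 3, /* addr + size wraps around */
};

/* Returns 0 if the region looks sane, or a negative REGION_ERR_* code. */
int check_user_mem_region(uint64_t addr, uint64_t size)
{
	if (size == 0 || (size & (MIN_MEM_REGION_SIZE - 1)))
		return -REGION_ERR_SIZE;

	if (addr & (MIN_MEM_REGION_SIZE - 1))
		return -REGION_ERR_ADDR_ALIGN;

	if (addr + size < addr)
		return -REGION_ERR_ADDR_RANGE;

	return 0;
}

/* What a user space tool could print for each return value. */
const char *region_strerror(int rc)
{
	switch (-rc) {
	case 0:
		return "ok";
	case REGION_ERR_SIZE:
		return "memory size is not a multiple of 2 MiB";
	case REGION_ERR_ADDR_ALIGN:
		return "user space address is not 2 MiB aligned";
	case REGION_ERR_ADDR_RANGE:
		return "user space address range is invalid";
	default:
		return "unknown error";
	}
}
```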

> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * ne_set_user_memory_region_ioctl - Add user space memory region to the slot
> + * associated with the current enclave.
> + *
> + * This function gets called with the ne_enclave mutex held.
> + *
> + * @ne_enclave: private data associated with the current enclave.
> + * @mem_region: user space memory region to be associated with the given slot.
> + *
> + * @returns: 0 on success, negative return value on failure.
> + */
> +static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
> +	struct ne_user_memory_region *mem_region)
> +{
> +	struct ne_pci_dev_cmd_reply cmd_reply = {};
> +	long gup_rc = 0;
> +	unsigned long i = 0;
> +	struct ne_mem_region *ne_mem_region = NULL;
> +	unsigned long nr_phys_contig_mem_regions = 0;
> +	unsigned long nr_pinned_pages = 0;
> +	struct page **phys_contig_mem_regions = NULL;
> +	int rc = -EINVAL;
> +	struct slot_add_mem_req slot_add_mem_req = {};
> +
> +	rc = ne_sanity_check_user_mem_region(ne_enclave, mem_region);
> +	if (rc < 0)
> +		return rc;
> +
> +	ne_mem_region = kzalloc(sizeof(*ne_mem_region), GFP_KERNEL);
> +	if (!ne_mem_region)
> +		return -ENOMEM;
> +
> +	/*
> +	 * TODO: Update nr_pages value to handle contiguous virtual address
> +	 * ranges mapped to non-contiguous physical regions. Hugetlbfs can give
> +	 * 2 MiB / 1 GiB contiguous physical regions.
> +	 */
> +	ne_mem_region->nr_pages = mem_region->memory_size /
> +		NE_MIN_MEM_REGION_SIZE;
> +
> +	ne_mem_region->pages = kcalloc(ne_mem_region->nr_pages,
> +				       sizeof(*ne_mem_region->pages),
> +				       GFP_KERNEL);
> +	if (!ne_mem_region->pages) {
> +		kfree(ne_mem_region);
> +
> +		return -ENOMEM;

kfree(NULL) is a nop, so you can just set rc and goto free_mem_region 
here and below.
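
The same pattern in a userspace sketch, where free(NULL) is likewise a
no-op. The struct and function names are hypothetical; "scratch" stands
in for the temporary phys_contig_mem_regions array:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's memory region bookkeeping. */
struct mem_region {
	void **pages;
	size_t nr_pages;
};

void region_destroy(struct mem_region *region)
{
	if (!region)
		return;
	free(region->pages);
	free(region);
}

/* Allocate a region; every failure after the first unwinds via one label. */
int region_create(size_t nr_pages, struct mem_region **out)
{
	struct mem_region *region;
	void **scratch = NULL;
	int rc = -ENOMEM;

	if (nr_pages == 0)
		return -EINVAL;

	region = calloc(1, sizeof(*region));
	if (!region)
		return -ENOMEM;

	region->pages = calloc(nr_pages, sizeof(*region->pages));
	if (!region->pages)
		goto free_region;

	scratch = calloc(nr_pages, sizeof(*scratch));
	if (!scratch)
		goto free_region;

	/* ... the real code would pin and validate pages here ... */

	free(scratch);
	region->nr_pages = nr_pages;
	*out = region;

	return 0;

free_region:
	/* Either pointer may still be NULL here; free(NULL) does nothing. */
	free(scratch);
	free(region->pages);
	free(region);

	return rc;
}
```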

> +	}
> +
> +	phys_contig_mem_regions = kcalloc(ne_mem_region->nr_pages,
> +					  sizeof(*phys_contig_mem_regions),
> +					  GFP_KERNEL);
> +	if (!phys_contig_mem_regions) {
> +		kfree(ne_mem_region->pages);
> +		kfree(ne_mem_region);
> +
> +		return -ENOMEM;
> +	}
> +
> +	/*
> +	 * TODO: Handle non-contiguous memory regions received from user space.
> +	 * Hugetlbfs can give 2 MiB / 1 GiB contiguous physical regions. The
> +	 * virtual address space can be seen as contiguous, although it is
> +	 * mapped underneath to 2 MiB / 1 GiB physical regions e.g. 8 MiB
> +	 * virtual address space mapped to 4 physically contiguous regions of 2
> +	 * MiB.
> +	 */
> +	do {
> +		unsigned long tmp_nr_pages = ne_mem_region->nr_pages -
> +			nr_pinned_pages;
> +		struct page **tmp_pages = ne_mem_region->pages +
> +			nr_pinned_pages;
> +		u64 tmp_userspace_addr = mem_region->userspace_addr +
> +			nr_pinned_pages * NE_MIN_MEM_REGION_SIZE;
> +
> +		gup_rc = get_user_pages(tmp_userspace_addr, tmp_nr_pages,
> +					FOLL_GET, tmp_pages, NULL);
> +		if (gup_rc < 0) {
> +			rc = gup_rc;
> +
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Error in gup [rc=%d]\n", rc);
> +
> +			unpin_user_pages(ne_mem_region->pages, nr_pinned_pages);
> +
> +			goto free_mem_region;
> +		}
> +
> +		nr_pinned_pages += gup_rc;
> +
> +	} while (nr_pinned_pages < ne_mem_region->nr_pages);

Can this deadlock the kernel? Shouldn't we rather return an error when 
we can't pin all pages?
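
To make the concern concrete: if the pinning call keeps returning 0, the
loop above never terminates. A hedged userspace sketch of a loop that
bails out when no progress is made follows; pin_fn and the two demo
backends are hypothetical stand-ins for get_user_pages(), and -EFAULT is
just one plausible choice of error.

```c
#include <assert.h>
#include <errno.h>

/* Pins up to @nr pages starting at index @first; returns count or -errno. */
typedef long (*pin_fn)(unsigned long first, unsigned long nr);

/*
 * Pin @nr_pages pages. Returns 0 on success or a negative errno.
 * @nr_pinned always holds how many pages the caller has to unpin.
 */
int pin_all(pin_fn pin, unsigned long nr_pages, unsigned long *nr_pinned)
{
	*nr_pinned = 0;

	while (*nr_pinned < nr_pages) {
		long rc = pin(*nr_pinned, nr_pages - *nr_pinned);

		if (rc < 0)
			return (int)rc;

		/* No progress: give up instead of spinning forever. */
		if (rc == 0)
			return -EFAULT;

		*nr_pinned += (unsigned long)rc;
	}

	return 0;
}

/* Demo backend: makes partial progress, at most two pages per call. */
long pin_two_at_a_time(unsigned long first, unsigned long nr)
{
	(void)first;
	return nr < 2 ? (long)nr : 2;
}

/* Demo backend: pins pages 0..2 one by one, then stops making progress. */
long pin_stalls_at_three(unsigned long first, unsigned long nr)
{
	(void)nr;
	return first < 3 ? 1 : 0;
}
```

Partial progress is still retried, so short returns keep working; only
a call that pins nothing ends the loop with an error.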

> +
> +	/*
> +	 * TODO: Update checks once physically contiguous regions are collected
> +	 * based on the user space address and get_user_pages() results.
> +	 */
> +	for (i = 0; i < ne_mem_region->nr_pages; i++) {
> +		if (!PageHuge(ne_mem_region->pages[i])) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Not a hugetlbfs page\n");
> +
> +			goto unpin_pages;
> +		}
> +
> +		if (huge_page_size(page_hstate(ne_mem_region->pages[i])) !=
> +		    NE_MIN_MEM_REGION_SIZE) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Page size isn't 2 MiB\n");

Why is a huge page size of >2MB a problem? Can't we just make 
huge_page_size() the ne mem slot size?

> +
> +			goto unpin_pages;
> +		}
> +
> +		if (ne_enclave->numa_node !=
> +		    page_to_nid(ne_mem_region->pages[i])) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Page isn't from NUMA node %d\n",
> +					    ne_enclave->numa_node);
> +
> +			goto unpin_pages;

Is there a way to give user space hints on *why* things are going wrong?

> +		}
> +
> +		/*
> +		 * TODO: Update once handled non-contiguous memory regions
> +		 * received from user space.
> +		 */
> +		phys_contig_mem_regions[i] = ne_mem_region->pages[i];
> +	}
> +
> +	/*
> +	 * TODO: Update once handled non-contiguous memory regions received
> +	 * from user space.
> +	 */
> +	nr_phys_contig_mem_regions = ne_mem_region->nr_pages;
> +
> +	if ((ne_enclave->nr_mem_regions + nr_phys_contig_mem_regions) >
> +	    ne_enclave->max_mem_regions) {
> +		dev_err_ratelimited(ne_misc_dev.this_device,
> +				    "Reached max memory regions %lld\n",
> +				    ne_enclave->max_mem_regions);
> +
> +		goto unpin_pages;
> +	}
> +
> +	for (i = 0; i < nr_phys_contig_mem_regions; i++) {
> +		u64 phys_addr = page_to_phys(phys_contig_mem_regions[i]);
> +
> +		slot_add_mem_req.slot_uid = ne_enclave->slot_uid;
> +		slot_add_mem_req.paddr = phys_addr;
> +		/*
> +		 * TODO: Update memory size of physical contiguous memory
> +		 * region, in case of non-contiguous memory regions received
> +		 * from user space.
> +		 */
> +		slot_add_mem_req.size = NE_MIN_MEM_REGION_SIZE;

Yeah, for now, just make it huge_page_size()! :)

> +
> +		rc = ne_do_request(ne_enclave->pdev, SLOT_ADD_MEM,
> +				   &slot_add_mem_req, sizeof(slot_add_mem_req),
> +				   &cmd_reply, sizeof(cmd_reply));
> +		if (rc < 0) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Error in slot add mem [rc=%d]\n",
> +					    rc);
> +
> +			/* TODO: Only unpin memory regions not added. */

Are we sure we're not creating an unusable system here?

> +			goto unpin_pages;
> +		}
> +
> +		ne_enclave->mem_size += slot_add_mem_req.size;
> +		ne_enclave->nr_mem_regions++;
> +
> +		memset(&slot_add_mem_req, 0, sizeof(slot_add_mem_req));
> +		memset(&cmd_reply, 0, sizeof(cmd_reply));

If you define the variables in the for loop scope, you don't need to 
manually zero them again.
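
A small sketch of the loop-scope variant, with a made-up request struct
and flag (add_mem_req, REQ_FLAG_FIRST and build_requests() are
hypothetical). The initializer runs on every iteration, so nothing set
in one pass leaks into the next:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REQ_FLAG_FIRST UINT64_C(1)

/* Hypothetical request layout; three u64 members, so no padding. */
struct add_mem_req {
	uint64_t slot_uid;
	uint64_t paddr;
	uint64_t flags;
};

/* Build one request per physical address into @out. */
void build_requests(const uint64_t *paddrs, size_t n, uint64_t slot_uid,
		    struct add_mem_req *out)
{
	for (size_t i = 0; i < n; i++) {
		/* Zero-initialized afresh on every iteration. */
		struct add_mem_req req = {0};

		req.slot_uid = slot_uid;
		req.paddr = paddrs[i];
		if (i == 0)
			req.flags = REQ_FLAG_FIRST;

		out[i] = req;
	}
}
```

One caveat worth keeping in mind: C only guarantees that the members are
zeroed by such an initializer, not any padding bytes, so a struct with
padding that is copied out byte-wise may still want an explicit memset.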


Alex

> +	}
> +
> +	list_add(&ne_mem_region->mem_region_list_entry,
> +		 &ne_enclave->mem_regions_list);
> +
> +	kfree(phys_contig_mem_regions);
> +
> +	return 0;
> +
> +unpin_pages:
> +	unpin_user_pages(ne_mem_region->pages, ne_mem_region->nr_pages);
> +free_mem_region:
> +	kfree(phys_contig_mem_regions);
> +	kfree(ne_mem_region->pages);
> +	kfree(ne_mem_region);
> +
> +	return rc;
> +}
> +
>   static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
>   			     unsigned long arg)
>   {
> @@ -561,6 +788,36 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
>   		return 0;
>   	}
>   
> +	case NE_SET_USER_MEMORY_REGION: {
> +		struct ne_user_memory_region mem_region = {};
> +		int rc = -EINVAL;
> +
> +		if (copy_from_user(&mem_region, (void *)arg,
> +				   sizeof(mem_region))) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Error in copy from user\n");
> +
> +			return -EFAULT;
> +		}
> +
> +		mutex_lock(&ne_enclave->enclave_info_mutex);
> +
> +		if (ne_enclave->state != NE_STATE_INIT) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Enclave isn't in init state\n");
> +
> +			mutex_unlock(&ne_enclave->enclave_info_mutex);
> +
> +			return -EINVAL;
> +		}
> +
> +		rc = ne_set_user_memory_region_ioctl(ne_enclave, &mem_region);
> +
> +		mutex_unlock(&ne_enclave->enclave_info_mutex);
> +
> +		return rc;
> +	}
> +
>   	default:
>   		return -ENOTTY;
>   	}
> 







^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 12/18] nitro_enclaves: Add logic for enclave start
  2020-06-22 20:03 ` [PATCH v4 12/18] nitro_enclaves: Add logic for enclave start Andra Paraschiv
@ 2020-07-06 11:21   ` Alexander Graf
  2020-07-07 18:27     ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Graf @ 2020-07-06 11:21 UTC (permalink / raw)
  To: Andra Paraschiv, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 22.06.20 22:03, Andra Paraschiv wrote:
> After all the enclave resources are set, the enclave is ready for
> beginning to run.
> 
> Add ioctl command logic for starting an enclave after all its resources,
> memory regions and CPUs, have been set.
> 
> The enclave start information includes the local channel addressing -
> vsock CID - and the flags associated with the enclave.
> 
> Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
> ---
> Changelog
> 
> v3 -> v4
> 
> * Use dev_err instead of custom NE log pattern.
> * Update the naming for the ioctl command from metadata to info.
> * Check for minimum enclave memory size.
> 
> v2 -> v3
> 
> * Remove the WARN_ON calls.
> * Update static calls sanity checks.
> 
> v1 -> v2
> 
> * Add log pattern for NE.
> * Check if enclave state is init when starting an enclave.
> * Remove the BUG_ON calls.
> ---
>   drivers/virt/nitro_enclaves/ne_misc_dev.c | 114 ++++++++++++++++++++++
>   1 file changed, 114 insertions(+)
> 
> diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
> index 17ccb6cdbd75..d9794f327169 100644
> --- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
> +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
> @@ -703,6 +703,45 @@ static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
>   	return rc;
>   }
>   
> +/**
> + * ne_start_enclave_ioctl - Trigger enclave start after the enclave resources,
> + * such as memory and CPU, have been set.
> + *
> + * This function gets called with the ne_enclave mutex held.
> + *
> + * @ne_enclave: private data associated with the current enclave.
> + * @enclave_start_info: enclave info that includes enclave cid and flags.
> + *
> + * @returns: 0 on success, negative return value on failure.
> + */
> +static int ne_start_enclave_ioctl(struct ne_enclave *ne_enclave,
> +	struct ne_enclave_start_info *enclave_start_info)
> +{
> +	struct ne_pci_dev_cmd_reply cmd_reply = {};
> +	struct enclave_start_req enclave_start_req = {};
> +	int rc = -EINVAL;
> +
> +	enclave_start_req.enclave_cid = enclave_start_info->enclave_cid;
> +	enclave_start_req.flags = enclave_start_info->flags;
> +	enclave_start_req.slot_uid = ne_enclave->slot_uid;

I think it's easier to read if you do the initialization straight in the
variable declaration:

   struct enclave_start_req enclave_start_req = {
     .enclave_cid = enclave_start_info->enclave_cid,
     .flags = enclave_start_info->flags,
     .slot_uid = ne_enclave->slot_uid,
   };

> +
> +	rc = ne_do_request(ne_enclave->pdev, ENCLAVE_START, &enclave_start_req,
> +			   sizeof(enclave_start_req), &cmd_reply,
> +			   sizeof(cmd_reply));
> +	if (rc < 0) {
> +		dev_err_ratelimited(ne_misc_dev.this_device,
> +				    "Error in enclave start [rc=%d]\n", rc);
> +
> +		return rc;
> +	}
> +
> +	ne_enclave->state = NE_STATE_RUNNING;
> +
> +	enclave_start_info->enclave_cid = cmd_reply.enclave_cid;
> +
> +	return 0;
> +}
> +
>   static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
>   			     unsigned long arg)
>   {
> @@ -818,6 +857,81 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
>   		return rc;
>   	}
>   
> +	case NE_START_ENCLAVE: {
> +		struct ne_enclave_start_info enclave_start_info = {};
> +		int rc = -EINVAL;
> +
> +		if (copy_from_user(&enclave_start_info, (void *)arg,
> +				   sizeof(enclave_start_info))) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Error in copy from user\n");

No need to print anything here

> +
> +			return -EFAULT;
> +		}
> +
> +		mutex_lock(&ne_enclave->enclave_info_mutex);
> +
> +		if (ne_enclave->state != NE_STATE_INIT) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Enclave isn't in init state\n");
> +
> +			mutex_unlock(&ne_enclave->enclave_info_mutex);
> +
> +			return -EINVAL;

Can this be its own return value instead?
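
To illustrate what I mean, here is a minimal userspace sketch where every failed precondition maps to its own error code. The `struct enclave_state` type, the `check_start_preconditions()` helper and the particular errno picks are assumptions for illustration only, not what the driver defines; only the order of the checks mirrors the quoted code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the driver's per-enclave state. */
enum enclave_run_state { STATE_INIT, STATE_RUNNING };

struct enclave_state {
	enum enclave_run_state state;
	unsigned int nr_mem_regions;
	unsigned long long mem_size;
	unsigned int nr_vcpus;
	int unused_siblings;	/* non-zero if a core was only partly assigned */
};

/* Same 64 MiB minimum as NE_MIN_ENCLAVE_MEM_SIZE in the quoted patch. */
#define MIN_ENCLAVE_MEM_SIZE (64ULL * 1024 * 1024)

/*
 * Return 0 if the enclave can be started, or a negative errno that is
 * unique per failed check. The errno values are illustrative picks.
 */
static int check_start_preconditions(const struct enclave_state *e)
{
	assert(e != NULL);

	if (e->state != STATE_INIT)
		return -EBUSY;
	if (!e->nr_mem_regions)
		return -ENOMEM;
	if (e->mem_size < MIN_ENCLAVE_MEM_SIZE)
		return -ENOSPC;
	if (!e->nr_vcpus)
		return -ENODEV;
	if (e->unused_siblings)
		return -EDOM;

	return 0;
}
```

With one code per check, user space can tell "already started" apart from "no vCPUs added" without parsing dmesg.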

> +		}
> +
> +		if (!ne_enclave->nr_mem_regions) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Enclave has no mem regions\n");
> +
> +			mutex_unlock(&ne_enclave->enclave_info_mutex);
> +
> +			return -ENOMEM;
> +		}
> +
> +		if (ne_enclave->mem_size < NE_MIN_ENCLAVE_MEM_SIZE) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Enclave memory is less than %ld\n",
> +					    NE_MIN_ENCLAVE_MEM_SIZE);
> +
> +			mutex_unlock(&ne_enclave->enclave_info_mutex);
> +
> +			return -ENOMEM;
> +		}
> +
> +		if (!ne_enclave->nr_vcpus) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Enclave has no vcpus\n");
> +
> +			mutex_unlock(&ne_enclave->enclave_info_mutex);
> +
> +			return -EINVAL;

Same here.

> +		}
> +
> +		if (!cpumask_empty(ne_enclave->cpu_siblings)) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "CPU siblings not used\n");
> +
> +			mutex_unlock(&ne_enclave->enclave_info_mutex);
> +
> +			return -EINVAL;

Same here.

> +		}
> +
> +		rc = ne_start_enclave_ioctl(ne_enclave, &enclave_start_info);
> +
> +		mutex_unlock(&ne_enclave->enclave_info_mutex);
> +
> +		if (copy_to_user((void *)arg, &enclave_start_info,

This needs to be __user void *, no?


Alex

> +				 sizeof(enclave_start_info))) {
> +			dev_err_ratelimited(ne_misc_dev.this_device,
> +					    "Error in copy to user\n");
> +
> +			return -EFAULT;
> +		}
> +
> +		return rc;
> +	}
> +
>   	default:
>   		return -ENOTTY;
>   	}
> 




* Re: [PATCH v4 13/18] nitro_enclaves: Add logic for enclave termination
  2020-06-22 20:03 ` [PATCH v4 13/18] nitro_enclaves: Add logic for enclave termination Andra Paraschiv
@ 2020-07-06 11:26   ` Alexander Graf
  2020-07-06 14:15     ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Graf @ 2020-07-06 11:26 UTC (permalink / raw)
  To: Andra Paraschiv, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 22.06.20 22:03, Andra Paraschiv wrote:
> An enclave is associated with an fd that is returned after the enclave
> creation logic is completed. This enclave fd is further used to setup
> enclave resources. Once the enclave needs to be terminated, the enclave
> fd is closed.
> 
> Add logic for enclave termination, that is mapped to the enclave fd
> release callback. Free the internal enclave info used for bookkeeping.
> 
> Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>

Reviewed-by: Alexander Graf <graf@amazon.com>

Alex




* Re: [PATCH v4 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver
  2020-06-22 20:03 ` [PATCH v4 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver Andra Paraschiv
@ 2020-07-06 11:28   ` Alexander Graf
  2020-07-06 13:50     ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Graf @ 2020-07-06 11:28 UTC (permalink / raw)
  To: Andra Paraschiv, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 22.06.20 22:03, Andra Paraschiv wrote:
> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
> ---
> Changelog
> 
> v3 -> v4
> 
> * Add PCI and SMP dependencies.
> 
> v2 -> v3
> 
> * Remove the GPL additional wording as SPDX-License-Identifier is
>    already in place.
> 
> v1 -> v2
> 
> * Update path to Kconfig to match the drivers/virt/nitro_enclaves
>    directory.
> * Update help in Kconfig.
> ---
>   drivers/virt/Kconfig                |  2 ++
>   drivers/virt/nitro_enclaves/Kconfig | 16 ++++++++++++++++
>   2 files changed, 18 insertions(+)
>   create mode 100644 drivers/virt/nitro_enclaves/Kconfig
> 
> diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
> index cbc1f25c79ab..80c5f9c16ec1 100644
> --- a/drivers/virt/Kconfig
> +++ b/drivers/virt/Kconfig
> @@ -32,4 +32,6 @@ config FSL_HV_MANAGER
>   	     partition shuts down.
>   
>   source "drivers/virt/vboxguest/Kconfig"
> +
> +source "drivers/virt/nitro_enclaves/Kconfig"
>   endif
> diff --git a/drivers/virt/nitro_enclaves/Kconfig b/drivers/virt/nitro_enclaves/Kconfig
> new file mode 100644
> index 000000000000..69e41aa2222d
> --- /dev/null
> +++ b/drivers/virt/nitro_enclaves/Kconfig
> @@ -0,0 +1,16 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> +
> +# Amazon Nitro Enclaves (NE) support.
> +# Nitro is a hypervisor that has been developed by Amazon.
> +
> +config NITRO_ENCLAVES
> +	tristate "Nitro Enclaves Support"
> +	depends on HOTPLUG_CPU && PCI && SMP

Let's also depend on ARM64 || X86, so that we don't burden all of the 
other archs that are not available in EC2 today with an additional 
config option to think about.

Alex




* Re: [PATCH v4 15/18] nitro_enclaves: Add Makefile for the Nitro Enclaves driver
  2020-06-22 20:03 ` [PATCH v4 15/18] nitro_enclaves: Add Makefile " Andra Paraschiv
@ 2020-07-06 11:30   ` Alexander Graf
  2020-07-06 14:00     ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Graf @ 2020-07-06 11:30 UTC (permalink / raw)
  To: Andra Paraschiv, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 22.06.20 22:03, Andra Paraschiv wrote:
> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>

Reviewed-by: Alexander Graf <graf@amazon.com>


Alex




* Re: [PATCH v4 16/18] nitro_enclaves: Add sample for ioctl interface usage
  2020-06-22 20:03 ` [PATCH v4 16/18] nitro_enclaves: Add sample for ioctl interface usage Andra Paraschiv
@ 2020-07-06 11:39   ` Alexander Graf
  2020-07-07 19:03     ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Graf @ 2020-07-06 11:39 UTC (permalink / raw)
  To: Andra Paraschiv, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 22.06.20 22:03, Andra Paraschiv wrote:
> Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
> ---
> Changelog
> 
> v3 -> v4
> 
> * Update usage details to match the updates in v4.
> * Update NE ioctl interface usage.
> 
> v2 -> v3
> 
> * Remove the include directory to use the uapi from the kernel.
> * Remove the GPL additional wording as SPDX-License-Identifier is
>    already in place.
> 
> v1 -> v2
> 
> * New in v2.
> ---
>   samples/nitro_enclaves/.gitignore        |   2 +
>   samples/nitro_enclaves/Makefile          |  16 +
>   samples/nitro_enclaves/ne_ioctl_sample.c | 520 +++++++++++++++++++++++
>   3 files changed, 538 insertions(+)
>   create mode 100644 samples/nitro_enclaves/.gitignore
>   create mode 100644 samples/nitro_enclaves/Makefile
>   create mode 100644 samples/nitro_enclaves/ne_ioctl_sample.c
> 
> diff --git a/samples/nitro_enclaves/.gitignore b/samples/nitro_enclaves/.gitignore
> new file mode 100644
> index 000000000000..827934129c90
> --- /dev/null
> +++ b/samples/nitro_enclaves/.gitignore
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ne_ioctl_sample
> diff --git a/samples/nitro_enclaves/Makefile b/samples/nitro_enclaves/Makefile
> new file mode 100644
> index 000000000000..a3ec78fefb52
> --- /dev/null
> +++ b/samples/nitro_enclaves/Makefile
> @@ -0,0 +1,16 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> +
> +# Enclave lifetime management support for Nitro Enclaves (NE) - ioctl sample
> +# usage.
> +
> +.PHONY: all clean
> +
> +CFLAGS += -Wall
> +
> +all:
> +	$(CC) $(CFLAGS) -o ne_ioctl_sample ne_ioctl_sample.c -lpthread
> +
> +clean:
> +	rm -f ne_ioctl_sample
> diff --git a/samples/nitro_enclaves/ne_ioctl_sample.c b/samples/nitro_enclaves/ne_ioctl_sample.c
> new file mode 100644
> index 000000000000..572143d55d77
> --- /dev/null
> +++ b/samples/nitro_enclaves/ne_ioctl_sample.c
> @@ -0,0 +1,520 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + */
> +
> +/**
> + * Sample flow of using the ioctl interface provided by the Nitro Enclaves (NE)
> + * kernel driver.
> + *
> + * Usage
> + * -----
> + *
> + * Load the nitro_enclaves module, setting also the enclave CPU pool. The
> + * enclave CPUs need to be full cores from the same NUMA node. CPU 0 and its
> + * siblings have to remain available for the primary / parent VM, so they
> + * cannot be included in the enclave CPU pool.
> + *
> + * See the cpu list section from the kernel documentation.
> + * https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
> + *
> + *	insmod drivers/virt/nitro_enclaves/nitro_enclaves.ko
> + *	lsmod
> + *
> + *	The CPU pool can be set at runtime, after the kernel module is loaded.
> + *
> + *	echo <cpu-list> > /sys/module/nitro_enclaves/parameters/ne_cpus
> + *
> + *	NUMA and CPU siblings information can be found using
> + *
> + *	lscpu
> + *	/proc/cpuinfo
> + *
> + * Check the online / offline CPU list. The CPUs from the pool should be
> + * offlined.
> + *
> + *	lscpu
> + *
> + * Check dmesg for any warnings / errors through the NE driver lifetime / usage.
> + * The NE logs contain the "nitro_enclaves" or "pci 0000:00:02.0" pattern.
> + *
> + *	dmesg
> + *
> + * Setup hugetlbfs huge pages. The memory needs to be from the same NUMA node as
> + * the enclave CPUs.
> + * https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
> + *
> + *	echo <nr_hugepages> > /proc/sys/vm/nr_hugepages
> + *
> + *	or set the number of 2 MiB / 1 GiB hugepages using
> + *
> + *	/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> + *	/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> + *
> + *	In this example 256 hugepages of 2 MiB are used.
> + *
> + * Build and run the NE sample.
> + *
> + *	make -C samples/nitro_enclaves clean
> + *	make -C samples/nitro_enclaves
> + *	./samples/nitro_enclaves/ne_ioctl_sample <path_to_enclave_image>
> + *
> + * Unload the nitro_enclaves module.
> + *
> + *	rmmod nitro_enclaves
> + *	lsmod
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <limits.h>
> +#include <poll.h>
> +#include <pthread.h>
> +#include <string.h>
> +#include <sys/ioctl.h>
> +#include <sys/eventfd.h>
> +#include <sys/mman.h>
> +#include <sys/socket.h>
> +#include <sys/types.h>
> +#include <unistd.h>
> +
> +#include <linux/nitro_enclaves.h>
> +#include <linux/vm_sockets.h>
> +
> +/* Nitro Enclaves (NE) misc device that provides the ioctl interface. */
> +#define NE_DEV_NAME "/dev/nitro_enclaves"
> +#define NE_EXPECTED_API_VERSION (1)
> +
> +/* Timeout in seconds / milliseconds for each poll event. */
> +#define NE_POLL_WAIT_TIME (60)
> +#define NE_POLL_WAIT_TIME_MS (NE_POLL_WAIT_TIME * 1000)
> +
> +/* Amount of time in seconds for the process to keep the enclave alive. */
> +#define NE_SLEEP_TIME (300)
> +
> +/* Enclave vCPUs metadata. */
> +#define NE_DEFAULT_NR_VCPUS (2)
> +
> +/* Enclave memory metadata */
> +
> +/* Min memory size - 2 MiB */
> +#define NE_MIN_MEM_REGION_SIZE (2 * 1024 * 1024)
> +
> +/* 256 memory regions of 2 MiB */
> +#define NE_DEFAULT_NR_MEM_REGIONS (256)
> +
> +/* Vsock addressing for enclave image loading heartbeat. */
> +#define NE_IMAGE_LOAD_VSOCK_CID (3)
> +#define NE_IMAGE_LOAD_VSOCK_PORT (9000)
> +#define NE_IMAGE_LOAD_HEARTBEAT_VALUE (0xb7)
> +
> +struct ne_mem_region {
> +	void *mem_addr;
> +	size_t mem_size;
> +};
> +
> +struct ne_vcpu {
> +	int vcpu_fd;
> +	unsigned int vcpu_id;
> +};
> +
> +/* Thread function for polling the enclave fd. */
> +void *ne_poll_enclave_fd(void *data)
> +{
> +	int enclave_fd = *(int *)data;
> +	struct pollfd fds[1] = {};
> +	int i = 0;
> +	int rc = 0;
> +
> +	printf("Running from poll thread, enclave fd %d\n", enclave_fd);
> +
> +	fds[0].fd = enclave_fd;
> +	fds[0].events = POLLIN | POLLERR | POLLHUP;
> +
> +	/* Keep on polling until the current process is terminated. */
> +	while (1) {
> +		printf("[iter %d] Polling ...\n", i);
> +
> +		rc = poll(fds, 1, NE_POLL_WAIT_TIME_MS);
> +		if (rc < 0) {
> +			printf("Error in poll [%m]\n");
> +
> +			return NULL;
> +		}
> +
> +		i++;
> +
> +		if (!rc) {
> +			printf("Poll: %d seconds elapsed\n",
> +			       i * NE_POLL_WAIT_TIME);
> +
> +			continue;
> +		}
> +
> +		printf("Poll received value %d\n", fds[0].revents);
> +	}
> +
> +	return NULL;
> +}
> +
> +/* Allocate memory region that will be used for the enclave. */
> +static int ne_alloc_mem_region(struct ne_mem_region *ne_mem_region)
> +{
> +	if (!ne_mem_region)
> +		return -EINVAL;
> +
> +	if (!ne_mem_region->mem_size)
> +		return -EINVAL;
> +
> +	ne_mem_region->mem_addr = mmap(NULL, ne_mem_region->mem_size,
> +				       PROT_READ | PROT_WRITE,
> +				       MAP_PRIVATE | MAP_ANONYMOUS |
> +				       MAP_HUGETLB, -1, 0);
> +	if (ne_mem_region->mem_addr == MAP_FAILED) {
> +		printf("Error in mmap memory [%m]\n");
> +
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Place enclave image in enclave memory. */
> +static int ne_load_enclave_image(int enclave_fd,
> +	struct ne_mem_region ne_mem_regions[], char enclave_image_path[])
> +{
> +	struct ne_image_load_info image_load_info = {};
> +	int rc = 0;
> +
> +	if (enclave_fd < 0)
> +		return -EINVAL;
> +
> +	image_load_info.flags = NE_EIF_IMAGE;
> +
> +	rc = ioctl(enclave_fd, NE_GET_IMAGE_LOAD_INFO, &image_load_info);
> +	if (rc < 0) {
> +		printf("Error in get image load info [rc=%d]\n", rc);
> +
> +		return rc;
> +	}
> +
> +	printf("Enclave image offset in enclave memory is %lld\n",
> +	       image_load_info.memory_offset);
> +
> +	/*
> +	 * TODO: Copy enclave image in enclave memory starting from the given
> +	 * offset.
> +	 */

Just open and read into the buffer at the given offset? :)
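
Something along these lines would do, as a rough sketch rather than a drop-in replacement. The `struct mem_region` type mirrors the sample's `ne_mem_region`, while the `load_image_at_offset()` name and the assumption that the regions form one contiguous range in array order are mine, for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Mirrors struct ne_mem_region from the sample. */
struct mem_region {
	void *addr;
	size_t size;
};

/*
 * Read the file at path into the regions, treating them as one contiguous
 * range and starting at byte offset of that range.
 * Returns 0 on success or a negative errno (-ENOMEM if the file does not fit).
 */
static int load_image_at_offset(const char *path, struct mem_region regions[],
				size_t nr_regions, size_t offset)
{
	size_t i = 0;
	char probe;
	ssize_t n;
	int rc = 0;
	int fd;

	assert(path != NULL && regions != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* Skip the regions that lie entirely before the load offset. */
	while (i < nr_regions && offset >= regions[i].size) {
		offset -= regions[i].size;
		i++;
	}

	while (i < nr_regions) {
		n = read(fd, (char *)regions[i].addr + offset,
			 regions[i].size - offset);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			rc = -errno;
			goto out;
		}
		if (n == 0)
			goto out;	/* End of file: whole image copied. */

		/* Advance inside the region, or move on once it is full. */
		offset += (size_t)n;
		if (offset == regions[i].size) {
			offset = 0;
			i++;
		}
	}

	/* Regions exhausted: succeed only if the file has no bytes left. */
	n = read(fd, &probe, 1);
	if (n < 0)
		rc = -errno;
	else if (n > 0)
		rc = -ENOMEM;

out:
	close(fd);

	return rc;
}
```

In the sample this would be called with `image_load_info.memory_offset` as the offset, before the memory regions are handed to the driver.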

> +
> +	return 0;
> +}
> +
> +/* Wait for a heartbeat from the enclave to check it has booted. */
> +static int ne_check_enclave_booted(void)
> +{
> +	struct sockaddr_vm client_vsock_addr = {};
> +	socklen_t client_vsock_len = sizeof(client_vsock_addr);
> +	struct pollfd fds[1] = {};
> +	int rc = 0;
> +	unsigned char recv_buf = 0;
> +	struct sockaddr_vm server_vsock_addr = {
> +		.svm_family = AF_VSOCK,
> +		.svm_cid = NE_IMAGE_LOAD_VSOCK_CID,
> +		.svm_port = NE_IMAGE_LOAD_VSOCK_PORT,
> +	};
> +	int server_vsock_fd = 0;
> +
> +	server_vsock_fd = socket(AF_VSOCK, SOCK_STREAM, 0);
> +	if (server_vsock_fd < 0) {
> +		rc = server_vsock_fd;
> +
> +		printf("Error in socket [rc=%d]\n", rc);
> +
> +		return rc;
> +	}
> +
> +	rc = bind(server_vsock_fd, (struct sockaddr *)&server_vsock_addr,
> +		  sizeof(server_vsock_addr));
> +	if (rc < 0) {
> +		printf("Error in bind [rc=%d]\n", rc);
> +
> +		goto out;
> +	}
> +
> +	rc = listen(server_vsock_fd, 1);
> +	if (rc < 0) {
> +		printf("Error in listen [rc=%d]\n", rc);
> +
> +		goto out;
> +	}
> +
> +	fds[0].fd = server_vsock_fd;
> +	fds[0].events = POLLIN;
> +
> +	rc = poll(fds, 1, NE_POLL_WAIT_TIME_MS);
> +	if (rc < 0) {
> +		printf("Error in poll [%m]\n");
> +
> +		goto out;
> +	}
> +
> +	if (!rc) {
> +		printf("Poll timeout, %d seconds elapsed\n", NE_POLL_WAIT_TIME);
> +
> +		rc = -ETIMEDOUT;
> +
> +		goto out;
> +	}
> +
> +	if ((fds[0].revents & POLLIN) == 0) {
> +		printf("Poll received value %d\n", fds[0].revents);
> +
> +		rc = -EINVAL;
> +
> +		goto out;
> +	}
> +
> +	rc = accept(server_vsock_fd, (struct sockaddr *)&client_vsock_addr,
> +		    &client_vsock_len);
> +	if (rc < 0) {
> +		printf("Error in accept [rc=%d]\n", rc);
> +
> +		goto out;
> +	}
> +
> +	/*
> +	 * Read the heartbeat value that the init process in the enclave sends
> +	 * after vsock connect.
> +	 */
> +	rc = read(server_vsock_fd, &recv_buf, sizeof(recv_buf));
> +	if (rc < 0) {
> +		printf("Error in read [rc=%d]\n", rc);
> +
> +		goto out;
> +	}
> +
> +	if (rc != sizeof(recv_buf) ||
> +	    recv_buf != NE_IMAGE_LOAD_HEARTBEAT_VALUE) {
> +		printf("Read %d instead of %d\n", recv_buf,
> +		       NE_IMAGE_LOAD_HEARTBEAT_VALUE);
> +
> +		goto out;
> +	}
> +
> +	close(server_vsock_fd);
> +
> +	return 0;
> +
> +out:
> +	close(server_vsock_fd);
> +
> +	return rc;
> +}
> +
> +/* Set memory region for the given enclave. */
> +static int ne_set_mem_region(int enclave_fd, struct ne_mem_region ne_mem_region)
> +{
> +	struct ne_user_memory_region mem_region = {};
> +	int rc = 0;
> +
> +	if (enclave_fd < 0)
> +		return -EINVAL;
> +
> +	mem_region.memory_size = ne_mem_region.mem_size;
> +	mem_region.userspace_addr = (__u64)ne_mem_region.mem_addr;
> +
> +	rc = ioctl(enclave_fd, NE_SET_USER_MEMORY_REGION, &mem_region);
> +	if (rc < 0) {
> +		printf("Error in set user memory region [rc=%d]\n", rc);
> +
> +		return rc;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Unmap all the memory regions that were set aside for the enclave. */
> +static void ne_free_mem_regions(struct ne_mem_region ne_mem_regions[])
> +{
> +	unsigned int i = 0;
> +
> +	for (i = 0; i < NE_DEFAULT_NR_MEM_REGIONS; i++)
> +		munmap(ne_mem_regions[i].mem_addr, ne_mem_regions[i].mem_size);
> +}
> +
> +/* Create enclave vCPU. */
> +static int ne_create_vcpu(int enclave_fd, struct ne_vcpu *ne_vcpu)
> +{
> +	if (enclave_fd < 0)
> +		return -EINVAL;
> +
> +	if (!ne_vcpu)
> +		return -EINVAL;
> +
> +	ne_vcpu->vcpu_fd = ioctl(enclave_fd, NE_CREATE_VCPU, &ne_vcpu->vcpu_id);
> +	if (ne_vcpu->vcpu_fd < 0) {
> +		printf("Error in create vcpu [rc=%d]\n", ne_vcpu->vcpu_fd);
> +
> +		return ne_vcpu->vcpu_fd;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Release enclave vCPU fd(s). */
> +static void ne_release_vcpus(struct ne_vcpu ne_vcpus[])
> +{
> +	unsigned int i = 0;
> +
> +	for (i = 0; i < NE_DEFAULT_NR_VCPUS; i++)
> +		if (ne_vcpus[i].vcpu_fd > 0)
> +			close(ne_vcpus[i].vcpu_fd);
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	int enclave_fd = 0;
> +	char enclave_image_path[PATH_MAX] = {};
> +	struct ne_enclave_start_info enclave_start_info = {};
> +	unsigned int i = 0;
> +	int ne_api_version = 0;
> +	int ne_dev_fd = 0;
> +	struct ne_mem_region ne_mem_regions[NE_DEFAULT_NR_MEM_REGIONS] = {};
> +	struct ne_vcpu ne_vcpus[NE_DEFAULT_NR_VCPUS] = {};
> +	int rc = 0;
> +	unsigned long slot_uid = 0;
> +	pthread_t thread_id = 0;
> +
> +	if (argc != 2) {
> +		printf("Usage: %s <path_to_enclave_image>\n", argv[0]);
> +
> +		exit(EXIT_FAILURE);
> +	}
> +
> +	strncpy(enclave_image_path, argv[1], sizeof(enclave_image_path) - 1);

Why can you not just pass argv[1] as path?

> +
> +	ne_dev_fd = open(NE_DEV_NAME, O_RDWR | O_CLOEXEC);
> +	if (ne_dev_fd < 0) {
> +		printf("Error in open NE device [rc=%d]\n", ne_dev_fd);
> +
> +		exit(EXIT_FAILURE);
> +	}
> +
> +	ne_api_version = ioctl(ne_dev_fd, NE_GET_API_VERSION);
> +	if (ne_api_version != NE_EXPECTED_API_VERSION) {
> +		printf("Expected API version %d, provided API version %d\n",
> +		       NE_EXPECTED_API_VERSION, ne_api_version);
> +
> +		close(ne_dev_fd);
> +
> +		exit(EXIT_FAILURE);
> +	}
> +
> +	printf("Creating enclave slot ...\n");
> +
> +	enclave_fd = ioctl(ne_dev_fd, NE_CREATE_VM, &slot_uid);
> +
> +	close(ne_dev_fd);
> +
> +	if (enclave_fd < 0) {
> +		printf("Error in create enclave slot [rc=%d]\n", enclave_fd);
> +
> +		exit(EXIT_FAILURE);
> +	}
> +
> +	printf("Enclave fd %d\n", enclave_fd);
> +
> +	rc = pthread_create(&thread_id, NULL, ne_poll_enclave_fd,
> +			    (void *)&enclave_fd);
> +	if (rc < 0) {
> +		printf("Error in thread create [rc=%d]\n", rc);
> +
> +		close(enclave_fd);
> +
> +		exit(EXIT_FAILURE);
> +	}
> +
> +	for (i = 0; i < NE_DEFAULT_NR_MEM_REGIONS; i++) {
> +		ne_mem_regions[i].mem_size = NE_MIN_MEM_REGION_SIZE;
> +		rc = ne_alloc_mem_region(&ne_mem_regions[i]);
> +		if (rc < 0) {
> +			printf("Error in alloc mem region, iter %d [rc=%d]\n",
> +			       i, rc);
> +
> +			goto release_enclave_fd;
> +		}
> +	}
> +
> +	rc = ne_load_enclave_image(enclave_fd, ne_mem_regions,
> +				   enclave_image_path);
> +	if (rc < 0) {
> +		printf("Error in load enclave image [rc=%d]\n", rc);
> +
> +		goto release_enclave_fd;
> +	}
> +
> +	for (i = 0; i < NE_DEFAULT_NR_MEM_REGIONS; i++) {
> +		rc = ne_set_mem_region(enclave_fd, ne_mem_regions[i]);
> +		if (rc < 0) {
> +			printf("Error in set mem region, iter %d [rc=%d]\n",
> +			       i, rc);
> +
> +			goto release_enclave_fd;
> +		}
> +	}
> +
> +	printf("Enclave memory regions were added\n");
> +
> +	for (i = 0; i < NE_DEFAULT_NR_VCPUS; i++) {
> +		/*
> +		 * The vCPU is chosen from the enclave vCPU pool, if the value
> +		 * of the vcpu_id is 0.
> +		 */
> +		ne_vcpus[i].vcpu_id = 0;
> +		rc = ne_create_vcpu(enclave_fd, &ne_vcpus[i]);
> +		if (rc < 0) {
> +			printf("Error in create vcpu, iter %d [rc=%d]\n",
> +			       i, rc);
> +
> +			goto release_enclave_vcpu_fds;
> +		}
> +	}
> +
> +	printf("Enclave vCPUs were created\n");
> +
> +	rc = ioctl(enclave_fd, NE_START_ENCLAVE, &enclave_start_info);
> +	if (rc < 0) {
> +		printf("Error in start enclave [rc=%d]\n", rc);
> +
> +		goto release_enclave_vcpu_fds;
> +	}
> +
> +	printf("Enclave started, CID %llu\n", enclave_start_info.enclave_cid);
> +
> +	/*
> +	 * TODO: Check for enclave heartbeat after it has started to see if it
> +	 * has booted.
> +	 */

So you wrote the function to check for the heartbeat, but don't call it? 
Why?


Alex

> +
> +	printf("Entering sleep for %d seconds ...\n", NE_SLEEP_TIME);
> +
> +	sleep(NE_SLEEP_TIME);
> +
> +	ne_release_vcpus(ne_vcpus);
> +
> +	close(enclave_fd);
> +
> +	ne_free_mem_regions(ne_mem_regions);
> +
> +	exit(EXIT_SUCCESS);
> +
> +release_enclave_vcpu_fds:
> +	ne_release_vcpus(ne_vcpus);
> +release_enclave_fd:
> +	close(enclave_fd);
> +	ne_free_mem_regions(ne_mem_regions);
> +
> +	exit(EXIT_FAILURE);
> +}
> 




* Re: [PATCH v4 07/18] nitro_enclaves: Init misc device providing the ioctl interface
  2020-07-06  8:01       ` Alexander Graf
@ 2020-07-06 13:09         ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-07-06 13:09 UTC (permalink / raw)
  To: Alexander Graf, linux-kernel
  Cc: Benjamin Herrenschmidt, Colm MacCarthaigh, Bjoern Doebel,
	David Woodhouse, Frank van der Linden, Greg KH, Martin Pohlack,
	Matt Wilson, Paolo Bonzini, Balbir Singh, Stefano Garzarella,
	Stefan Hajnoczi, Stewart Smith, Uwe Dannowski, kvm,
	ne-devel-upstream



On 06/07/2020 11:01, Alexander Graf wrote:
>
>
> On 06.07.20 09:49, Paraschiv, Andra-Irina wrote:
>>
>>
>> On 06/07/2020 10:13, Alexander Graf wrote:
>>>
>>>
>>> On 22.06.20 22:03, Andra Paraschiv wrote:
>>>> The Nitro Enclaves driver provides an ioctl interface to the user space
>>>> for enclave lifetime management e.g. enclave creation / termination and
>>>> setting enclave resources such as memory and CPU.
>>>>
>>>> This ioctl interface is mapped to a Nitro Enclaves misc device.
>>>>
>>>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>>>> ---
>>>> Changelog
>>>>
>>>> v3 -> v4
>>>>
>>>> * Use dev_err instead of custom NE log pattern.
>>>> * Remove the NE CPU pool init during kernel module loading, as the CPU
>>>>    pool is now setup at runtime, via a sysfs file for the kernel
>>>>    parameter.
>>>> * Add minimum enclave memory size definition.
>>>>
>>>> v2 -> v3
>>>>
>>>> * Remove the GPL additional wording as SPDX-License-Identifier is
>>>>    already in place.
>>>> * Remove the WARN_ON calls.
>>>> * Remove linux/bug and linux/kvm_host includes that are not needed.
>>>> * Remove "ratelimited" from the logs that are not in the ioctl call
>>>>    paths.
>>>> * Remove file ops that do nothing for now - open and release.
>>>>
>>>> v1 -> v2
>>>>
>>>> * Add log pattern for NE.
>>>> * Update goto labels to match their purpose.
>>>> * Update ne_cpu_pool data structure to include the global mutex.
>>>> * Update NE misc device mode to 0660.
>>>> * Check if the CPU siblings are included in the NE CPU pool, as full CPU
>>>>    cores are given for the enclave(s).
>>>> ---
>>>>   drivers/virt/nitro_enclaves/ne_misc_dev.c | 133 ++++++++++++++++++++++
>>>>   drivers/virt/nitro_enclaves/ne_pci_dev.c  |  11 ++
>>>>   2 files changed, 144 insertions(+)
>>>>   create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.c
>>>>
>>>> diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
>>>> new file mode 100644
>>>> index 000000000000..628fb10c2b36
>>>> --- /dev/null
>>>> +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
>>>> @@ -0,0 +1,133 @@
>>>> +// SPDX-License-Identifier: GPL-2.0
>>>> +/*
>>>> + * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
>>>> + */
>>>> +
>>>> +/**
>>>> + * Enclave lifetime management driver for Nitro Enclaves (NE).
>>>> + * Nitro is a hypervisor that has been developed by Amazon.
>>>> + */
>>>> +
>>>> +#include <linux/anon_inodes.h>
>>>> +#include <linux/capability.h>
>>>> +#include <linux/cpu.h>
>>>> +#include <linux/device.h>
>>>> +#include <linux/file.h>
>>>> +#include <linux/hugetlb.h>
>>>> +#include <linux/list.h>
>>>> +#include <linux/miscdevice.h>
>>>> +#include <linux/mm.h>
>>>> +#include <linux/mman.h>
>>>> +#include <linux/module.h>
>>>> +#include <linux/mutex.h>
>>>> +#include <linux/nitro_enclaves.h>
>>>> +#include <linux/pci.h>
>>>> +#include <linux/poll.h>
>>>> +#include <linux/slab.h>
>>>> +#include <linux/types.h>
>>>> +
>>>> +#include "ne_misc_dev.h"
>>>> +#include "ne_pci_dev.h"
>>>> +
>>>> +#define NE_EIF_LOAD_OFFSET (8 * 1024UL * 1024UL)
>>>> +
>>>> +#define NE_MIN_ENCLAVE_MEM_SIZE (64 * 1024UL * 1024UL)
>>>> +
>>>> +#define NE_MIN_MEM_REGION_SIZE (2 * 1024UL * 1024UL)
>>>> +
>>>> +/*
>>>> + * TODO: Update logic to create new sysfs entries instead of using
>>>> + * a kernel parameter e.g. if multiple sysfs files needed.
>>>> + */
>>>> +static const struct kernel_param_ops ne_cpu_pool_ops = {
>>>
>>> Adding an empty ops struct looks very odd. If you fill it in a later 
>>> patch, please indicate so in a comment here.
>>
>> True, I already updated this in v5, to have the .get function here 
>> and the .set one in a later patch.
>>
>>>
>>>> +};
>>>> +
>>>> +static char ne_cpus[PAGE_SIZE];
>>>
>>> PAGE_SIZE is a bit excessive, no? Even if you list every single CPU 
>>> of a 256 CPU system you are <1024.
>>
>> It is a bit too much, I was thinking of it while declaring this. I 
>> can update to 1024 in v5.
>
> The largest NUMA node CPU count I'm aware of today is 64. Since we 
> limit the pool to a single node, we can't go beyond that. Let's be a 
> bit future proof and double that number: 128. Then we get to 401 
> characters if you pass in every single CPU as comma separated. I would 
> seriously hope most people would just pass ranges though.
>
> So how about we make it 512 for now?

We can set it like this, I changed to 512 and updated the comment as well.
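
For reference, the 401-character figure quoted above can be re-derived with a small user-space sketch. It is not part of the patch series, and cpu_list_len() is a made-up helper name used only for illustration:

```c
/* Number of decimal digits needed to print n. */
unsigned int nr_digits(unsigned int n)
{
	unsigned int digits = 1;

	while (n >= 10) {
		n /= 10;
		digits++;
	}

	return digits;
}

/*
 * Length, without the trailing NUL, of the string "0,1,...,<nr_cpus - 1>",
 * i.e. every CPU listed individually instead of as a range.
 * Hypothetical helper, not part of the NE driver.
 */
unsigned int cpu_list_len(unsigned int nr_cpus)
{
	unsigned int len = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++)
		len += nr_digits(cpu);

	/* One comma between each pair of neighbouring entries. */
	if (nr_cpus > 1)
		len += nr_cpus - 1;

	return len;
}
```

cpu_list_len(128) evaluates to 401, so the worst case for 128 CPUs needs 402 bytes including the NUL terminator and fits in a 512-byte buffer. For 256 CPUs the same calculation gives 913, which matches the "<1024" estimate.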

Thanks,
Andra



Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in Romania. Registration number J22/2621/2005.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 08/18] nitro_enclaves: Add logic for enclave vm creation
  2020-07-06  7:53   ` Alexander Graf
@ 2020-07-06 13:12     ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-07-06 13:12 UTC (permalink / raw)
  To: Alexander Graf, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 06/07/2020 10:53, Alexander Graf wrote:
>
>
> On 22.06.20 22:03, Andra Paraschiv wrote:
>> Add ioctl command logic for enclave VM creation. It triggers a slot
>> allocation. The enclave resources will be associated with this slot and
>> it will be used as an identifier for triggering enclave run.
>>
>> Return a file descriptor, namely enclave fd. This is further used by the
>> associated user space enclave process to set enclave resources and
>> trigger enclave termination.
>>
>> The poll function is implemented in order to notify the enclave process
>> when an enclave exits without a specific enclave termination command
>> trigger e.g. when an enclave crashes.
>>
>> Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>
> Reviewed-by: Alexander Graf <graf@amazon.com>

Added. Thank you.

Andra





^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 10/18] nitro_enclaves: Add logic for enclave image load info
  2020-07-06 10:16   ` Alexander Graf
@ 2020-07-06 13:35     ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-07-06 13:35 UTC (permalink / raw)
  To: Alexander Graf, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 06/07/2020 13:16, Alexander Graf wrote:
>
>
> On 22.06.20 22:03, Andra Paraschiv wrote:
>> Before setting the memory regions for the enclave, the enclave image
>> needs to be placed in memory. After the memory regions are set, this
>> memory cannot be used anymore by the VM, being carved out.
>>
>> Add ioctl command logic to get the offset in enclave memory where to
>> place the enclave image. Then the user space tooling copies the enclave
>> image in the memory using the given memory offset.
>>
>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>> ---
>> Changelog
>>
>> v3 -> v4
>>
>> * Use dev_err instead of custom NE log pattern.
>> * Set enclave image load offset based on flags.
>> * Update the naming for the ioctl command from metadata to info.
>>
>> v2 -> v3
>>
>> * No changes.
>>
>> v1 -> v2
>>
>> * New in v2.
>> ---
>>   drivers/virt/nitro_enclaves/ne_misc_dev.c | 25 +++++++++++++++++++++++
>>   1 file changed, 25 insertions(+)
>>
>> diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
>> b/drivers/virt/nitro_enclaves/ne_misc_dev.c
>> index d6777008f685..cfdefa52ed2a 100644
>> --- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
>> +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
>> @@ -536,6 +536,31 @@ static long ne_enclave_ioctl(struct file *file, 
>> unsigned int cmd,
>>           return rc;
>>       }
>>   +    case NE_GET_IMAGE_LOAD_INFO: {
>> +        struct ne_image_load_info image_load_info = {};
>> +
>> +        if (copy_from_user(&image_load_info, (void *)arg,
>> +                   sizeof(image_load_info))) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Error in copy from user\n");
>
> The -EFAULT tells you all you need. Just remove this print.

Removed the log from here and the other occurrences in the patch series.
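
To illustrate why the error code alone is enough, here is a user-space sketch of what a caller sees when the kernel cannot copy from the buffer it was given. A pipe write stands in for the NE ioctl and the function name is hypothetical; the sketch assumes a Linux system:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hand the kernel a buffer it cannot read from and return the errno that
 * write() reports (0 if the write unexpectedly succeeds). A pipe stands in
 * for the NE ioctl; this is only an illustration, not the NE interface.
 */
int errno_for_unreadable_buffer(void)
{
	int pipe_fds[2];
	int saved_errno = 0;
	void *buf;

	if (pipe(pipe_fds) < 0)
		return errno;

	/* PROT_NONE: any kernel copy from this mapping faults. */
	buf = mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		saved_errno = errno;
		goto out;
	}

	if (write(pipe_fds[1], buf, 1) < 0)
		saved_errno = errno;

	munmap(buf, 4096);

out:
	close(pipe_fds[0]);
	close(pipe_fds[1]);

	return saved_errno;
}
```

The caller gets EFAULT back and can print it with strerror(), so an extra line in the kernel log would not add information.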

Thanks,
Andra

>
>> +
>> +            return -EFAULT;
>> +        }
>> +
>> +        if (image_load_info.flags == NE_EIF_IMAGE)
>> +            image_load_info.memory_offset = NE_EIF_LOAD_OFFSET;
>> +
>> +        if (copy_to_user((void *)arg, &image_load_info,
>> +                 sizeof(image_load_info))) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Error in copy to user\n");
>
> Same here.
>
>
> Alex
>
>> +
>> +            return -EFAULT;
>> +        }
>> +
>> +        return 0;
>> +    }
>> +
>>       default:
>>           return -ENOTTY;
>>       }
>>






^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver
  2020-07-06 11:28   ` Alexander Graf
@ 2020-07-06 13:50     ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-07-06 13:50 UTC (permalink / raw)
  To: Alexander Graf, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 06/07/2020 14:28, Alexander Graf wrote:
>
>
> On 22.06.20 22:03, Andra Paraschiv wrote:
>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>> ---
>> Changelog
>>
>> v3 -> v4
>>
>> * Add PCI and SMP dependencies.
>>
>> v2 -> v3
>>
>> * Remove the GPL additional wording as SPDX-License-Identifier is
>>    already in place.
>>
>> v1 -> v2
>>
>> * Update path to Kconfig to match the drivers/virt/nitro_enclaves
>>    directory.
>> * Update help in Kconfig.
>> ---
>>   drivers/virt/Kconfig                |  2 ++
>>   drivers/virt/nitro_enclaves/Kconfig | 16 ++++++++++++++++
>>   2 files changed, 18 insertions(+)
>>   create mode 100644 drivers/virt/nitro_enclaves/Kconfig
>>
>> diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
>> index cbc1f25c79ab..80c5f9c16ec1 100644
>> --- a/drivers/virt/Kconfig
>> +++ b/drivers/virt/Kconfig
>> @@ -32,4 +32,6 @@ config FSL_HV_MANAGER
>>            partition shuts down.
>>     source "drivers/virt/vboxguest/Kconfig"
>> +
>> +source "drivers/virt/nitro_enclaves/Kconfig"
>>   endif
>> diff --git a/drivers/virt/nitro_enclaves/Kconfig 
>> b/drivers/virt/nitro_enclaves/Kconfig
>> new file mode 100644
>> index 000000000000..69e41aa2222d
>> --- /dev/null
>> +++ b/drivers/virt/nitro_enclaves/Kconfig
>> @@ -0,0 +1,16 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +#
>> +# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights 
>> Reserved.
>> +
>> +# Amazon Nitro Enclaves (NE) support.
>> +# Nitro is a hypervisor that has been developed by Amazon.
>> +
>> +config NITRO_ENCLAVES
>> +    tristate "Nitro Enclaves Support"
>> +    depends on HOTPLUG_CPU && PCI && SMP
>
> Let's also depend on ARM64 || X86, so that we don't burden all of the 
> other archs that are not available in EC2 today with an additional 
> config option to think about.

Included the arch specs.
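
As a sketch of the suggestion above, the dependency line could look like the fragment below. The exact form used in v5 is not shown in this thread, so treat the ordering and grouping as an assumption:

```kconfig
config NITRO_ENCLAVES
	tristate "Nitro Enclaves Support"
	depends on (ARM64 || X86) && HOTPLUG_CPU && PCI && SMP
```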

Thanks,
Andra





^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 15/18] nitro_enclaves: Add Makefile for the Nitro Enclaves driver
  2020-07-06 11:30   ` Alexander Graf
@ 2020-07-06 14:00     ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-07-06 14:00 UTC (permalink / raw)
  To: Alexander Graf, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 06/07/2020 14:30, Alexander Graf wrote:
>
>
> On 22.06.20 22:03, Andra Paraschiv wrote:
>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>
> Reviewed-by: Alexander Graf <graf@amazon.com>

Added. Thank you.

Andra





^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 13/18] nitro_enclaves: Add logic for enclave termination
  2020-07-06 11:26   ` Alexander Graf
@ 2020-07-06 14:15     ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-07-06 14:15 UTC (permalink / raw)
  To: Alexander Graf, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 06/07/2020 14:26, Alexander Graf wrote:
>
>
> On 22.06.20 22:03, Andra Paraschiv wrote:
>> An enclave is associated with an fd that is returned after the enclave
>> creation logic is completed. This enclave fd is further used to setup
>> enclave resources. Once the enclave needs to be terminated, the enclave
>> fd is closed.
>>
>> Add logic for enclave termination, that is mapped to the enclave fd
>> release callback. Free the internal enclave info used for bookkeeping.
>>
>> Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>
> Reviewed-by: Alexander Graf <graf@amazon.com>

Added. Thanks for review.

Andra





^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 12/18] nitro_enclaves: Add logic for enclave start
  2020-07-06 11:21   ` Alexander Graf
@ 2020-07-07 18:27     ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-07-07 18:27 UTC (permalink / raw)
  To: Alexander Graf, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 06/07/2020 14:21, Alexander Graf wrote:
>
>
> On 22.06.20 22:03, Andra Paraschiv wrote:
>> After all the enclave resources are set, the enclave is ready for
>> beginning to run.
>>
>> Add ioctl command logic for starting an enclave after all its resources,
>> memory regions and CPUs, have been set.
>>
>> The enclave start information includes the local channel addressing -
>> vsock CID - and the flags associated with the enclave.
>>
>> Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>> ---
>> Changelog
>>
>> v3 -> v4
>>
>> * Use dev_err instead of custom NE log pattern.
>> * Update the naming for the ioctl command from metadata to info.
>> * Check for minimum enclave memory size.
>>
>> v2 -> v3
>>
>> * Remove the WARN_ON calls.
>> * Update static calls sanity checks.
>>
>> v1 -> v2
>>
>> * Add log pattern for NE.
>> * Check if enclave state is init when starting an enclave.
>> * Remove the BUG_ON calls.
>> ---
>>   drivers/virt/nitro_enclaves/ne_misc_dev.c | 114 ++++++++++++++++++++++
>>   1 file changed, 114 insertions(+)
>>
>> diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
>> b/drivers/virt/nitro_enclaves/ne_misc_dev.c
>> index 17ccb6cdbd75..d9794f327169 100644
>> --- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
>> +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
>> @@ -703,6 +703,45 @@ static int 
>> ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
>>       return rc;
>>   }
>>   +/**
>> + * ne_start_enclave_ioctl - Trigger enclave start after the enclave 
>> resources,
>> + * such as memory and CPU, have been set.
>> + *
>> + * This function gets called with the ne_enclave mutex held.
>> + *
>> + * @ne_enclave: private data associated with the current enclave.
>> + * @enclave_start_info: enclave info that includes enclave cid and 
>> flags.
>> + *
>> + * @returns: 0 on success, negative return value on failure.
>> + */
>> +static int ne_start_enclave_ioctl(struct ne_enclave *ne_enclave,
>> +    struct ne_enclave_start_info *enclave_start_info)
>> +{
>> +    struct ne_pci_dev_cmd_reply cmd_reply = {};
>> +    struct enclave_start_req enclave_start_req = {};
>> +    int rc = -EINVAL;
>> +
>> +    enclave_start_req.enclave_cid = enclave_start_info->enclave_cid;
>> +    enclave_start_req.flags = enclave_start_info->flags;
>> +    enclave_start_req.slot_uid = ne_enclave->slot_uid;
>
> I think it's easier to read if you do the initialization straight in 
> the variable declaration:
>
>   struct enclave_start_req enclave_start_req = {
>     .enclave_cid = enclave_start_info->enclave_cid,
>     .flags = enclave_start_info->flags,
>     .slot_uid = ne_enclave->slot_uid,
>   };

Good point. In v5, I moved a couple of sanity checks from the ioctl 
switch-case block into this function, so the suggestion no longer applies 
to the updated codebase. I'll keep it as a reference for other cases, 
though.
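
For reference, the designated-initializer pattern suggested above looks like this in a stand-alone user-space sketch. The struct is a stand-in whose first three field names follow the quoted patch; the "reserved" member and the function name are hypothetical and only there to show the zero-initialization:

```c
/* Stand-in for the request struct; "reserved" is a hypothetical member. */
struct start_req_sketch {
	unsigned long long slot_uid;
	unsigned long long enclave_cid;
	unsigned long long flags;
	unsigned long long reserved;
};

struct start_req_sketch build_start_req(unsigned long long slot_uid,
					unsigned long long enclave_cid,
					unsigned long long flags)
{
	/*
	 * All values are known at declaration time, so they can be set in
	 * the initializer. Members that are not named (here: reserved) are
	 * zero-initialized.
	 */
	struct start_req_sketch req = {
		.enclave_cid = enclave_cid,
		.flags = flags,
		.slot_uid = slot_uid,
	};

	return req;
}
```

This only works when nothing has to run between the declaration and the first use of the fields, which is why it stops fitting once sanity checks are moved in front of the assignments.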

>
>> +
>> +    rc = ne_do_request(ne_enclave->pdev, ENCLAVE_START, 
>> &enclave_start_req,
>> +               sizeof(enclave_start_req), &cmd_reply,
>> +               sizeof(cmd_reply));
>> +    if (rc < 0) {
>> +        dev_err_ratelimited(ne_misc_dev.this_device,
>> +                    "Error in enclave start [rc=%d]\n", rc);
>> +
>> +        return rc;
>> +    }
>> +
>> +    ne_enclave->state = NE_STATE_RUNNING;
>> +
>> +    enclave_start_info->enclave_cid = cmd_reply.enclave_cid;
>> +
>> +    return 0;
>> +}
>> +
>>   static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
>>                    unsigned long arg)
>>   {
>> @@ -818,6 +857,81 @@ static long ne_enclave_ioctl(struct file *file, 
>> unsigned int cmd,
>>           return rc;
>>       }
>>   +    case NE_START_ENCLAVE: {
>> +        struct ne_enclave_start_info enclave_start_info = {};
>> +        int rc = -EINVAL;
>> +
>> +        if (copy_from_user(&enclave_start_info, (void *)arg,
>> +                   sizeof(enclave_start_info))) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Error in copy from user\n");
>
> No need to print anything here

Done.

>
>> +
>> +            return -EFAULT;
>> +        }
>> +
>> +        mutex_lock(&ne_enclave->enclave_info_mutex);
>> +
>> +        if (ne_enclave->state != NE_STATE_INIT) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Enclave isn't in init state\n");
>> +
>> +            mutex_unlock(&ne_enclave->enclave_info_mutex);
>> +
>> +            return -EINVAL;
>
> Can this be its own return value instead?

Yes, it should be; this would help bubble up the reason for the failure 
to user space in more detail.

I started to define a set of NE error codes and update the failure paths 
(e.g. this one and the others mentioned below) to use those error codes.
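
A minimal user-space sketch of the idea, one distinct code per failed precondition. The error names, their values, and the struct are hypothetical and chosen only for illustration; the codes defined in the series may differ:

```c
/* Hypothetical error codes and values, not the ones from the series. */
enum ne_err_sketch {
	NE_ERR_SKETCH_NOT_IN_INIT_STATE = 256,
	NE_ERR_SKETCH_NO_MEM_REGIONS,
	NE_ERR_SKETCH_MEM_BELOW_MIN,
	NE_ERR_SKETCH_NO_VCPUS,
	NE_ERR_SKETCH_CPU_SIBLINGS_UNUSED,
};

/* Same value as NE_MIN_ENCLAVE_MEM_SIZE in the quoted patch (64 MiB). */
#define MIN_ENCLAVE_MEM_SIZE_SKETCH (64UL * 1024UL * 1024UL)

/* Simplified stand-in for the enclave bookkeeping. */
struct enclave_sketch {
	int in_init_state;
	unsigned int nr_mem_regions;
	unsigned long mem_size;
	unsigned int nr_vcpus;
	unsigned int nr_unused_siblings;
};

/* Same order of checks as the NE_START_ENCLAVE case in the quoted patch. */
int check_start_preconditions(const struct enclave_sketch *e)
{
	if (!e->in_init_state)
		return -NE_ERR_SKETCH_NOT_IN_INIT_STATE;

	if (!e->nr_mem_regions)
		return -NE_ERR_SKETCH_NO_MEM_REGIONS;

	if (e->mem_size < MIN_ENCLAVE_MEM_SIZE_SKETCH)
		return -NE_ERR_SKETCH_MEM_BELOW_MIN;

	if (!e->nr_vcpus)
		return -NE_ERR_SKETCH_NO_VCPUS;

	if (e->nr_unused_siblings)
		return -NE_ERR_SKETCH_CPU_SIBLINGS_UNUSED;

	return 0;
}
```

With distinct values, user space can tell "no vCPUs" apart from "not in init state" without reading dmesg.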

>
>> +        }
>> +
>> +        if (!ne_enclave->nr_mem_regions) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Enclave has no mem regions\n");
>> +
>> +            mutex_unlock(&ne_enclave->enclave_info_mutex);
>> +
>> +            return -ENOMEM;
>> +        }
>> +
>> +        if (ne_enclave->mem_size < NE_MIN_ENCLAVE_MEM_SIZE) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Enclave memory is less than %ld\n",
>> +                        NE_MIN_ENCLAVE_MEM_SIZE);
>> +
>> +            mutex_unlock(&ne_enclave->enclave_info_mutex);
>> +
>> +            return -ENOMEM;
>> +        }
>> +
>> +        if (!ne_enclave->nr_vcpus) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Enclave has no vcpus\n");
>> +
>> +            mutex_unlock(&ne_enclave->enclave_info_mutex);
>> +
>> +            return -EINVAL;
>
> Same here.
>
>> +        }
>> +
>> +        if (!cpumask_empty(ne_enclave->cpu_siblings)) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "CPU siblings not used\n");
>> +
>> +            mutex_unlock(&ne_enclave->enclave_info_mutex);
>> +
>> +            return -EINVAL;
>
> Same here.
>
>> +        }
>> +
>> +        rc = ne_start_enclave_ioctl(ne_enclave, &enclave_start_info);
>> +
>> +        mutex_unlock(&ne_enclave->enclave_info_mutex);
>> +
>> +        if (copy_to_user((void *)arg, &enclave_start_info,
>
> This needs to be __user void *, no?
>
>

Included "__user" in all the copy_from_user() / copy_to_user() calls.
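
For context, a user-space sketch of what the annotated cast looks like. In a normal (non-sparse) kernel build __user typically expands to nothing, so it is mirrored here as an empty macro; mock_copy_from_user() and the struct are stand-ins made up for this sketch, not kernel code:

```c
#include <errno.h>
#include <string.h>

/* Mirrors the usual non-sparse kernel definition: an empty annotation. */
#ifndef __user
#define __user
#endif

/* Stand-in for the ioctl argument struct. */
struct start_info_sketch {
	unsigned long long flags;
	unsigned long long enclave_cid;
};

/*
 * Mock with the same shape as copy_from_user(): returns the number of
 * bytes that could NOT be copied. In user space the copy cannot fail.
 */
unsigned long mock_copy_from_user(void *to, const void __user *from,
				  unsigned long n)
{
	memcpy(to, from, n);

	return 0;
}

/* The ioctl argument arrives as an unsigned long and is cast once. */
long handle_start_sketch(unsigned long arg, struct start_info_sketch *out)
{
	if (mock_copy_from_user(out, (void __user *)arg, sizeof(*out)))
		return -EFAULT;

	return 0;
}
```

The annotation does not change the generated code here; its value is that a checker such as sparse can flag direct dereferences of user pointers.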

Thank you.

Andra

>
>> +                 sizeof(enclave_start_info))) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Error in copy to user\n");
>> +
>> +            return -EFAULT;
>> +        }
>> +
>> +        return rc;
>> +    }
>> +
>>       default:
>>           return -ENOTTY;
>>       }
>>






^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 16/18] nitro_enclaves: Add sample for ioctl interface usage
  2020-07-06 11:39   ` Alexander Graf
@ 2020-07-07 19:03     ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-07-07 19:03 UTC (permalink / raw)
  To: Alexander Graf, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 06/07/2020 14:39, Alexander Graf wrote:
>
>
> On 22.06.20 22:03, Andra Paraschiv wrote:
>> Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>> ---
>> Changelog
>>
>> v3 -> v4
>>
>> * Update usage details to match the updates in v4.
>> * Update NE ioctl interface usage.
>>
>> v2 -> v3
>>
>> * Remove the include directory to use the uapi from the kernel.
>> * Remove the GPL additional wording as SPDX-License-Identifier is
>>    already in place.
>>
>> v1 -> v2
>>
>> * New in v2.
>> ---
>>   samples/nitro_enclaves/.gitignore        |   2 +
>>   samples/nitro_enclaves/Makefile          |  16 +
>>   samples/nitro_enclaves/ne_ioctl_sample.c | 520 +++++++++++++++++++++++
>>   3 files changed, 538 insertions(+)
>>   create mode 100644 samples/nitro_enclaves/.gitignore
>>   create mode 100644 samples/nitro_enclaves/Makefile
>>   create mode 100644 samples/nitro_enclaves/ne_ioctl_sample.c
>>
>> diff --git a/samples/nitro_enclaves/.gitignore 
>> b/samples/nitro_enclaves/.gitignore
>> new file mode 100644
>> index 000000000000..827934129c90
>> --- /dev/null
>> +++ b/samples/nitro_enclaves/.gitignore
>> @@ -0,0 +1,2 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +ne_ioctl_sample
>> diff --git a/samples/nitro_enclaves/Makefile 
>> b/samples/nitro_enclaves/Makefile
>> new file mode 100644
>> index 000000000000..a3ec78fefb52
>> --- /dev/null
>> +++ b/samples/nitro_enclaves/Makefile
>> @@ -0,0 +1,16 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +#
>> +# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights 
>> Reserved.
>> +
>> +# Enclave lifetime management support for Nitro Enclaves (NE) - 
>> ioctl sample
>> +# usage.
>> +
>> +.PHONY: all clean
>> +
>> +CFLAGS += -Wall
>> +
>> +all:
>> +    $(CC) $(CFLAGS) -o ne_ioctl_sample ne_ioctl_sample.c -lpthread
>> +
>> +clean:
>> +    rm -f ne_ioctl_sample
>> diff --git a/samples/nitro_enclaves/ne_ioctl_sample.c 
>> b/samples/nitro_enclaves/ne_ioctl_sample.c
>> new file mode 100644
>> index 000000000000..572143d55d77
>> --- /dev/null
>> +++ b/samples/nitro_enclaves/ne_ioctl_sample.c
>> @@ -0,0 +1,520 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights 
>> Reserved.
>> + */
>> +
>> +/**
>> + * Sample flow of using the ioctl interface provided by the Nitro 
>> Enclaves (NE)
>> + * kernel driver.
>> + *
>> + * Usage
>> + * -----
>> + *
>> + * Load the nitro_enclaves module, setting also the enclave CPU 
>> pool. The
>> + * enclave CPUs need to be full cores from the same NUMA node. CPU 0 
>> and its
>> + * siblings have to remain available for the primary / parent VM, so 
>> they
>> + * cannot be included in the enclave CPU pool.
>> + *
>> + * See the cpu list section from the kernel documentation.
>> + * 
>> https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
>> + *
>> + *    insmod drivers/virt/nitro_enclaves/nitro_enclaves.ko
>> + *    lsmod
>> + *
>> + *    The CPU pool can be set at runtime, after the kernel module is 
>> loaded.
>> + *
>> + *    echo <cpu-list> > /sys/module/nitro_enclaves/parameters/ne_cpus
>> + *
>> + *    NUMA and CPU siblings information can be found using
>> + *
>> + *    lscpu
>> + *    /proc/cpuinfo
>> + *
>> + * Check the online / offline CPU list. The CPUs from the pool 
>> should be
>> + * offlined.
>> + *
>> + *    lscpu
>> + *
>> + * Check dmesg for any warnings / errors through the NE driver 
>> lifetime / usage.
>> + * The NE logs contain the "nitro_enclaves" or "pci 0000:00:02.0" 
>> pattern.
>> + *
>> + *    dmesg
>> + *
>> + * Setup hugetlbfs huge pages. The memory needs to be from the same 
>> NUMA node as
>> + * the enclave CPUs.
>> + * https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
>> + *
>> + *    echo <nr_hugepages> > /proc/sys/vm/nr_hugepages
>> + *
>> + *    or set the number of 2 MiB / 1 GiB hugepages using
>> + *
>> + *    /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
>> + *    /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>> + *
>> + *    In this example 256 hugepages of 2 MiB are used.
>> + *
>> + * Build and run the NE sample.
>> + *
>> + *    make -C samples/nitro_enclaves clean
>> + *    make -C samples/nitro_enclaves
>> + *    ./samples/nitro_enclaves/ne_ioctl_sample <path_to_enclave_image>
>> + *
>> + * Unload the nitro_enclaves module.
>> + *
>> + *    rmmod nitro_enclaves
>> + *    lsmod
>> + */
>> +
>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +#include <errno.h>
>> +#include <fcntl.h>
>> +#include <limits.h>
>> +#include <poll.h>
>> +#include <pthread.h>
>> +#include <string.h>
>> +#include <sys/ioctl.h>
>> +#include <sys/eventfd.h>
>> +#include <sys/mman.h>
>> +#include <sys/socket.h>
>> +#include <sys/types.h>
>> +#include <unistd.h>
>> +
>> +#include <linux/nitro_enclaves.h>
>> +#include <linux/vm_sockets.h>
>> +
>> +/* Nitro Enclaves (NE) misc device that provides the ioctl 
>> interface. */
>> +#define NE_DEV_NAME "/dev/nitro_enclaves"
>> +#define NE_EXPECTED_API_VERSION (1)
>> +
>> +/* Timeout in seconds / milliseconds for each poll event. */
>> +#define NE_POLL_WAIT_TIME (60)
>> +#define NE_POLL_WAIT_TIME_MS (NE_POLL_WAIT_TIME * 1000)
>> +
>> +/* Amount of time in seconds for the process to keep the enclave 
>> alive. */
>> +#define NE_SLEEP_TIME (300)
>> +
>> +/* Enclave vCPUs metadata. */
>> +#define NE_DEFAULT_NR_VCPUS (2)
>> +
>> +/* Enclave memory metadata */
>> +
>> +/* Min memory size - 2 MiB */
>> +#define NE_MIN_MEM_REGION_SIZE (2 * 1024 * 1024)
>> +
>> +/* 256 memory regions of 2 MiB */
>> +#define NE_DEFAULT_NR_MEM_REGIONS (256)
>> +
>> +/* Vsock addressing for enclave image loading heartbeat. */
>> +#define NE_IMAGE_LOAD_VSOCK_CID (3)
>> +#define NE_IMAGE_LOAD_VSOCK_PORT (9000)
>> +#define NE_IMAGE_LOAD_HEARTBEAT_VALUE (0xb7)
>> +
>> +struct ne_mem_region {
>> +    void *mem_addr;
>> +    size_t mem_size;
>> +};
>> +
>> +struct ne_vcpu {
>> +    int vcpu_fd;
>> +    unsigned int vcpu_id;
>> +};
>> +
>> +/* Thread function for polling the enclave fd. */
>> +void *ne_poll_enclave_fd(void *data)
>> +{
>> +    int enclave_fd = *(int *)data;
>> +    struct pollfd fds[1] = {};
>> +    int i = 0;
>> +    int rc = 0;
>> +
>> +    printf("Running from poll thread, enclave fd %d\n", enclave_fd);
>> +
>> +    fds[0].fd = enclave_fd;
>> +    fds[0].events = POLLIN | POLLERR | POLLHUP;
>> +
>> +    /* Keep on polling until the current process is terminated. */
>> +    while (1) {
>> +        printf("[iter %d] Polling ...\n", i);
>> +
>> +        rc = poll(fds, 1, NE_POLL_WAIT_TIME_MS);
>> +        if (rc < 0) {
>> +            printf("Error in poll [%m]\n");
>> +
>> +            return NULL;
>> +        }
>> +
>> +        i++;
>> +
>> +        if (!rc) {
>> +            printf("Poll: %d seconds elapsed\n",
>> +                   i * NE_POLL_WAIT_TIME);
>> +
>> +            continue;
>> +        }
>> +
>> +        printf("Poll received value %d\n", fds[0].revents);
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>> +/* Allocate memory region that will be used for the enclave. */
>> +static int ne_alloc_mem_region(struct ne_mem_region *ne_mem_region)
>> +{
>> +    if (!ne_mem_region)
>> +        return -EINVAL;
>> +
>> +    if (!ne_mem_region->mem_size)
>> +        return -EINVAL;
>> +
>> +    ne_mem_region->mem_addr = mmap(NULL, ne_mem_region->mem_size,
>> +                       PROT_READ | PROT_WRITE,
>> +                       MAP_PRIVATE | MAP_ANONYMOUS |
>> +                       MAP_HUGETLB, -1, 0);
>> +    if (ne_mem_region->mem_addr == MAP_FAILED) {
>> +        printf("Error in mmap memory [%m]\n");
>> +
>> +        return -1;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/* Place enclave image in enclave memory. */
>> +static int ne_load_enclave_image(int enclave_fd,
>> +    struct ne_mem_region ne_mem_regions[], char enclave_image_path[])
>> +{
>> +    struct ne_image_load_info image_load_info = {};
>> +    int rc = 0;
>> +
>> +    if (enclave_fd < 0)
>> +        return -EINVAL;
>> +
>> +    image_load_info.flags = NE_EIF_IMAGE;
>> +
>> +    rc = ioctl(enclave_fd, NE_GET_IMAGE_LOAD_INFO, &image_load_info);
>> +    if (rc < 0) {
>> +        printf("Error in get image load info [rc=%d]\n", rc);
>> +
>> +        return rc;
>> +    }
>> +
>> +    printf("Enclave image offset in enclave memory is %lld\n",
>> +           image_load_info.memory_offset);
>> +
>> +    /*
>> +     * TODO: Copy enclave image in enclave memory starting from the 
>> given
>> +     * offset.
>> +     */
>
> Just open and read into the buffer at the given offset? :)

Aham, there is no big complexity in this. :) I just wanted to have it 
together with the updated functionality on the heartbeat logic below.
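
A minimal sketch of the "open and read at the given offset" step. It assumes the enclave memory is reachable as one contiguous buffer, which the sample's array of 2 MiB regions does not guarantee; the function name is hypothetical and this is not the code that went into the sample:

```c
#include <errno.h>
#include <stdio.h>

/*
 * Read the whole file at image_path into mem, starting at offset.
 * Returns the number of bytes copied, or a negative errno value.
 * Assumes mem is a single contiguous buffer of mem_size bytes.
 */
long load_image_at_offset(const char *image_path, void *mem, size_t mem_size,
			  size_t offset)
{
	FILE *image;
	size_t room;
	size_t nr_read;
	long rc;

	if (!image_path || !mem || offset >= mem_size)
		return -EINVAL;

	image = fopen(image_path, "rb");
	if (!image)
		return -errno;

	room = mem_size - offset;
	nr_read = fread((char *)mem + offset, 1, room, image);

	if (ferror(image))
		rc = -EIO;
	else if (nr_read == room && fgetc(image) != EOF)
		rc = -E2BIG; /* Image is larger than the remaining memory. */
	else
		rc = (long)nr_read;

	fclose(image);

	return rc;
}
```

In the sample, the offset would be the memory_offset value returned by NE_GET_IMAGE_LOAD_INFO, and the copy would have to happen before the memory regions are handed over to the enclave.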

>
>> +
>> +    return 0;
>> +}
>> +
>> +/* Wait for a heartbeat from the enclave to check it has booted. */
>> +static int ne_check_enclave_booted(void)
>> +{
>> +    struct sockaddr_vm client_vsock_addr = {};
>> +    socklen_t client_vsock_len = sizeof(client_vsock_addr);
>> +    struct pollfd fds[1] = {};
>> +    int rc = 0;
>> +    unsigned char recv_buf = 0;
>> +    struct sockaddr_vm server_vsock_addr = {
>> +        .svm_family = AF_VSOCK,
>> +        .svm_cid = NE_IMAGE_LOAD_VSOCK_CID,
>> +        .svm_port = NE_IMAGE_LOAD_VSOCK_PORT,
>> +    };
>> +    int server_vsock_fd = 0;
>> +
>> +    server_vsock_fd = socket(AF_VSOCK, SOCK_STREAM, 0);
>> +    if (server_vsock_fd < 0) {
>> +        rc = server_vsock_fd;
>> +
>> +        printf("Error in socket [rc=%d]\n", rc);
>> +
>> +        return rc;
>> +    }
>> +
>> +    rc = bind(server_vsock_fd, (struct sockaddr *)&server_vsock_addr,
>> +          sizeof(server_vsock_addr));
>> +    if (rc < 0) {
>> +        printf("Error in bind [rc=%d]\n", rc);
>> +
>> +        goto out;
>> +    }
>> +
>> +    rc = listen(server_vsock_fd, 1);
>> +    if (rc < 0) {
>> +        printf("Error in listen [rc=%d]\n", rc);
>> +
>> +        goto out;
>> +    }
>> +
>> +    fds[0].fd = server_vsock_fd;
>> +    fds[0].events = POLLIN;
>> +
>> +    rc = poll(fds, 1, NE_POLL_WAIT_TIME_MS);
>> +    if (rc < 0) {
>> +        printf("Error in poll [%m]\n");
>> +
>> +        goto out;
>> +    }
>> +
>> +    if (!rc) {
>> +        printf("Poll timeout, %d seconds elapsed\n", 
>> NE_POLL_WAIT_TIME);
>> +
>> +        rc = -ETIMEDOUT;
>> +
>> +        goto out;
>> +    }
>> +
>> +    if ((fds[0].revents & POLLIN) == 0) {
>> +        printf("Poll received value %d\n", fds[0].revents);
>> +
>> +        rc = -EINVAL;
>> +
>> +        goto out;
>> +    }
>> +
>> +    rc = accept(server_vsock_fd, (struct sockaddr *)&client_vsock_addr,
>> +            &client_vsock_len);
>> +    if (rc < 0) {
>> +        printf("Error in accept [rc=%d]\n", rc);
>> +
>> +        goto out;
>> +    }
>> +
>> +    /*
>> +     * Read the heartbeat value that the init process in the enclave 
>> sends
>> +     * after vsock connect.
>> +     */
>> +    rc = read(server_vsock_fd, &recv_buf, sizeof(recv_buf));
>> +    if (rc < 0) {
>> +        printf("Error in read [rc=%d]\n", rc);
>> +
>> +        goto out;
>> +    }
>> +
>> +    if (rc != sizeof(recv_buf) ||
>> +        recv_buf != NE_IMAGE_LOAD_HEARTBEAT_VALUE) {
>> +        printf("Read %d instead of %d\n", recv_buf,
>> +               NE_IMAGE_LOAD_HEARTBEAT_VALUE);
>> +
>> +        goto out;
>> +    }
>> +
>> +    close(server_vsock_fd);
>> +
>> +    return 0;
>> +
>> +out:
>> +    close(server_vsock_fd);
>> +
>> +    return rc;
>> +}
>> +
>> +/* Set memory region for the given enclave. */
>> +static int ne_set_mem_region(int enclave_fd, struct ne_mem_region 
>> ne_mem_region)
>> +{
>> +    struct ne_user_memory_region mem_region = {};
>> +    int rc = 0;
>> +
>> +    if (enclave_fd < 0)
>> +        return -EINVAL;
>> +
>> +    mem_region.memory_size = ne_mem_region.mem_size;
>> +    mem_region.userspace_addr = (__u64)ne_mem_region.mem_addr;
>> +
>> +    rc = ioctl(enclave_fd, NE_SET_USER_MEMORY_REGION, &mem_region);
>> +    if (rc < 0) {
>> +        printf("Error in set user memory region [rc=%d]\n", rc);
>> +
>> +        return rc;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/* Unmap all the memory regions that were set aside for the enclave. */
>> +static void ne_free_mem_regions(struct ne_mem_region ne_mem_regions[])
>> +{
>> +    unsigned int i = 0;
>> +
>> +    for (i = 0; i < NE_DEFAULT_NR_MEM_REGIONS; i++)
>> +        munmap(ne_mem_regions[i].mem_addr, ne_mem_regions[i].mem_size);
>> +}
>> +
>> +/* Create enclave vCPU. */
>> +static int ne_create_vcpu(int enclave_fd, struct ne_vcpu *ne_vcpu)
>> +{
>> +    if (enclave_fd < 0)
>> +        return -EINVAL;
>> +
>> +    if (!ne_vcpu)
>> +        return -EINVAL;
>> +
>> +    ne_vcpu->vcpu_fd = ioctl(enclave_fd, NE_CREATE_VCPU, &ne_vcpu->vcpu_id);
>> +    if (ne_vcpu->vcpu_fd < 0) {
>> +        printf("Error in create vcpu [rc=%d]\n", ne_vcpu->vcpu_fd);
>> +
>> +        return ne_vcpu->vcpu_fd;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/* Release enclave vCPU fd(s). */
>> +static void ne_release_vcpus(struct ne_vcpu ne_vcpus[])
>> +{
>> +    unsigned int i = 0;
>> +
>> +    for (i = 0; i < NE_DEFAULT_NR_VCPUS; i++)
>> +        if (ne_vcpus[i].vcpu_fd > 0)
>> +            close(ne_vcpus[i].vcpu_fd);
>> +}
>> +
>> +int main(int argc, char *argv[])
>> +{
>> +    int enclave_fd = 0;
>> +    char enclave_image_path[PATH_MAX] = {};
>> +    struct ne_enclave_start_info enclave_start_info = {};
>> +    unsigned int i = 0;
>> +    int ne_api_version = 0;
>> +    int ne_dev_fd = 0;
>> +    struct ne_mem_region ne_mem_regions[NE_DEFAULT_NR_MEM_REGIONS] = {};
>> +    struct ne_vcpu ne_vcpus[NE_DEFAULT_NR_VCPUS] = {};
>> +    int rc = 0;
>> +    unsigned long slot_uid = 0;
>> +    pthread_t thread_id = 0;
>> +
>> +    if (argc != 2) {
>> +        printf("Usage: %s <path_to_enclave_image>\n", argv[0]);
>> +
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    strncpy(enclave_image_path, argv[1], sizeof(enclave_image_path) - 1);
>
> Why can you not just pass argv[1] as path?

I just wanted to limit the path to PATH_MAX bytes, but I can do that length
check on argv[1] and then pass it directly as the path.

>
>> +
>> +    ne_dev_fd = open(NE_DEV_NAME, O_RDWR | O_CLOEXEC);
>> +    if (ne_dev_fd < 0) {
>> +        printf("Error in open NE device [rc=%d]\n", ne_dev_fd);
>> +
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    ne_api_version = ioctl(ne_dev_fd, NE_GET_API_VERSION);
>> +    if (ne_api_version != NE_EXPECTED_API_VERSION) {
>> +        printf("Expected API version %d, provided API version %d\n",
>> +               NE_EXPECTED_API_VERSION, ne_api_version);
>> +
>> +        close(ne_dev_fd);
>> +
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    printf("Creating enclave slot ...\n");
>> +
>> +    enclave_fd = ioctl(ne_dev_fd, NE_CREATE_VM, &slot_uid);
>> +
>> +    close(ne_dev_fd);
>> +
>> +    if (enclave_fd < 0) {
>> +        printf("Error in create enclave slot [rc=%d]\n", enclave_fd);
>> +
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    printf("Enclave fd %d\n", enclave_fd);
>> +
>> +    rc = pthread_create(&thread_id, NULL, ne_poll_enclave_fd,
>> +                (void *)&enclave_fd);
>> +    if (rc < 0) {
>> +        printf("Error in thread create [rc=%d]\n", rc);
>> +
>> +        close(enclave_fd);
>> +
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    for (i = 0; i < NE_DEFAULT_NR_MEM_REGIONS; i++) {
>> +        ne_mem_regions[i].mem_size = NE_MIN_MEM_REGION_SIZE;
>> +        rc = ne_alloc_mem_region(&ne_mem_regions[i]);
>> +        if (rc < 0) {
>> +            printf("Error in alloc mem region, iter %d [rc=%d]\n",
>> +                   i, rc);
>> +
>> +            goto release_enclave_fd;
>> +        }
>> +    }
>> +
>> +    rc = ne_load_enclave_image(enclave_fd, ne_mem_regions,
>> +                   enclave_image_path);
>> +    if (rc < 0) {
>> +        printf("Error in load enclave image [rc=%d]\n", rc);
>> +
>> +        goto release_enclave_fd;
>> +    }
>> +
>> +    for (i = 0; i < NE_DEFAULT_NR_MEM_REGIONS; i++) {
>> +        rc = ne_set_mem_region(enclave_fd, ne_mem_regions[i]);
>> +        if (rc < 0) {
>> +            printf("Error in set mem region, iter %d [rc=%d]\n",
>> +                   i, rc);
>> +
>> +            goto release_enclave_fd;
>> +        }
>> +    }
>> +
>> +    printf("Enclave memory regions were added\n");
>> +
>> +    for (i = 0; i < NE_DEFAULT_NR_VCPUS; i++) {
>> +        /*
>> +         * The vCPU is chosen from the enclave vCPU pool, if the value
>> +         * of the vcpu_id is 0.
>> +         */
>> +        ne_vcpus[i].vcpu_id = 0;
>> +        rc = ne_create_vcpu(enclave_fd, &ne_vcpus[i]);
>> +        if (rc < 0) {
>> +            printf("Error in create vcpu, iter %d [rc=%d]\n",
>> +                   i, rc);
>> +
>> +            goto release_enclave_vcpu_fds;
>> +        }
>> +    }
>> +
>> +    printf("Enclave vCPUs were created\n");
>> +
>> +    rc = ioctl(enclave_fd, NE_START_ENCLAVE, &enclave_start_info);
>> +    if (rc < 0) {
>> +        printf("Error in start enclave [rc=%d]\n", rc);
>> +
>> +        goto release_enclave_vcpu_fds;
>> +    }
>> +
>> +    printf("Enclave started, CID %llu\n", enclave_start_info.enclave_cid);
>> +
>> +    /*
>> +     * TODO: Check for enclave heartbeat after it has started to see if it
>> +     * has booted.
>> +     */
>
> So you wrote the function to check for the heartbeat, but don't call 
> it? Why?
>

The logic flow (in the NE user space tooling, not in this sample) was still
in review when I added the function here, and it has recently been updated.
Now that the reviews are complete, I will update this logic in the sample and
also include the code for loading the enclave image in memory, as mentioned
above.

Thanks,
Andra

>
>> +
>> +    printf("Entering sleep for %d seconds ...\n", NE_SLEEP_TIME);
>> +
>> +    sleep(NE_SLEEP_TIME);
>> +
>> +    ne_release_vcpus(ne_vcpus);
>> +
>> +    close(enclave_fd);
>> +
>> +    ne_free_mem_regions(ne_mem_regions);
>> +
>> +    exit(EXIT_SUCCESS);
>> +
>> +release_enclave_vcpu_fds:
>> +    ne_release_vcpus(ne_vcpus);
>> +release_enclave_fd:
>> +    close(enclave_fd);
>> +    ne_free_mem_regions(ne_mem_regions);
>> +
>> +    exit(EXIT_FAILURE);
>> +}
>>




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in Romania. Registration number J22/2621/2005.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 09/18] nitro_enclaves: Add logic for enclave vcpu creation
  2020-07-06 10:12   ` Alexander Graf
@ 2020-07-08 12:46     ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-07-08 12:46 UTC (permalink / raw)
  To: Alexander Graf, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 06/07/2020 13:12, Alexander Graf wrote:
>
>
> On 22.06.20 22:03, Andra Paraschiv wrote:
>> An enclave, before being started, has its resources set. One of its
>> resources is CPU.
>>
>> The NE CPU pool is set for choosing CPUs for enclaves from it. Offline
>> the CPUs from the NE CPU pool during the pool setup and online them back
>> during the NE CPU pool teardown.
>>
>> The enclave CPUs need to be full cores and from the same NUMA node. CPU
>> 0 and its siblings have to remain available to the primary / parent VM.
>>
>> Add ioctl command logic for enclave vCPU creation. Return as result a
>> file descriptor that is associated with the enclave vCPU.
>>
>> Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>> ---
>> Changelog
>>
>> v3 -> v4
>>
>> * Setup the NE CPU pool at runtime via a sysfs file for the kernel
>>    parameter.
>> * Check enclave CPUs to be from the same NUMA node.
>> * Use dev_err instead of custom NE log pattern.
>> * Update the NE ioctl call to match the decoupling from the KVM API.
>>
>> v2 -> v3
>>
>> * Remove the WARN_ON calls.
>> * Update static calls sanity checks.
>> * Update kzfree() calls to kfree().
>> * Remove file ops that do nothing for now - open, ioctl and release.
>>
>> v1 -> v2
>>
>> * Add log pattern for NE.
>> * Update goto labels to match their purpose.
>> * Remove the BUG_ON calls.
>> * Check if enclave state is init when setting enclave vcpu.
>> ---
>>   drivers/virt/nitro_enclaves/ne_misc_dev.c | 491 ++++++++++++++++++++++
>>   1 file changed, 491 insertions(+)
>>
>> diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
>> index f70496813033..d6777008f685 100644
>> --- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
>> +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
>> @@ -39,7 +39,11 @@
>>    * TODO: Update logic to create new sysfs entries instead of using
>>    * a kernel parameter e.g. if multiple sysfs files needed.
>>    */
>> +static int ne_set_kernel_param(const char *val, const struct kernel_param *kp);
>> +
>>   static const struct kernel_param_ops ne_cpu_pool_ops = {
>> +    .get = param_get_string,
>> +    .set = ne_set_kernel_param,
>>   };
>>     static char ne_cpus[PAGE_SIZE];
>> @@ -60,6 +64,485 @@ struct ne_cpu_pool {
>>     static struct ne_cpu_pool ne_cpu_pool;
>>
>> +static const struct file_operations ne_enclave_vcpu_fops = {
>> +    .owner        = THIS_MODULE,
>> +    .llseek        = noop_llseek,
>> +};
>
> Do we really need an fd for an object without operations? I think the 
> general flow to add CPUs from the pool to the VM is very sensible. But 
> I don't think we really need an fd as return value from that operation.

Not particularly right now. I kept it for any potential future use cases
that would need one, and to make sure we end up with a stable interface that
leaves room for extensions.

As we've discussed, one option for future extensions is to add another ioctl
that returns an fd, so I will update the current ioctl to keep the logic of
adding a vCPU without generating an fd.

>
>> +
>> +/**
>> + * ne_check_enclaves_created - Verify if at least one enclave has been created.
>> + *
>> + * @pdev: PCI device used for enclave lifetime management.
>> + *
>> + * @returns: true if at least one enclave is created, false otherwise.
>> + */
>> +static bool ne_check_enclaves_created(struct pci_dev *pdev)
>> +{
>> +    struct ne_pci_dev *ne_pci_dev = pci_get_drvdata(pdev);
>> +
>> +    if (!ne_pci_dev)
>> +        return false;
>
> Please pass in the ne_pci_dev into this function directly.

Updated the function signature.

>
>
>> +
>> +    mutex_lock(&ne_pci_dev->enclaves_list_mutex);
>> +
>> +    if (list_empty(&ne_pci_dev->enclaves_list)) {
>> +        mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
>> +
>> +        return false;
>
> If you make this a return variable, you save on the unlock duplication.

Updated the logic to use a ret var.
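
A rough user-space model of the reworked shape, for illustration only. It
uses a pthread mutex and a counter in place of the kernel mutex and the
enclaves list; struct ne_dev_model and its fields are assumptions, not the
driver's data structures.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct ne_pci_dev (assumption). */
struct ne_dev_model {
    pthread_mutex_t enclaves_list_mutex;
    size_t nr_enclaves; /* models !list_empty(&enclaves_list) */
};

/*
 * The device state is passed in directly and the result is carried in
 * 'ret', so there is a single unlock and a single return after the lock.
 */
bool ne_model_check_enclaves_created(struct ne_dev_model *dev)
{
    bool ret = false;

    if (!dev)
        return ret;

    pthread_mutex_lock(&dev->enclaves_list_mutex);

    if (dev->nr_enclaves > 0)
        ret = true;

    pthread_mutex_unlock(&dev->enclaves_list_mutex);

    return ret;
}
```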

>
>> +    }
>> +
>> +    mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
>> +
>> +    return true;
>> +}
>> +
>> +/**
>> + * ne_setup_cpu_pool - Set the NE CPU pool after handling sanity checks such as
>> + * not sharing CPU cores with the primary / parent VM or not using CPU 0, which
>> + * should remain available for the primary / parent VM. Offline the CPUs from
>> + * the pool after the checks passed.
>> + *
>> + * @pdev: PCI device used for enclave lifetime management.
>> + * @ne_cpu_list: the CPU list used for setting NE CPU pool.
>> + *
>> + * @returns: 0 on success, negative return value on failure.
>> + */
>> +static int ne_setup_cpu_pool(struct pci_dev *pdev, const char *ne_cpu_list)
>> +{
>> +    unsigned int cpu = 0;
>> +    unsigned int cpu_sibling = 0;
>> +    int numa_node = -1;
>> +    int rc = -EINVAL;
>> +
>> +    if (!capable(CAP_SYS_ADMIN)) {
>> +        dev_err(&pdev->dev, "No admin capability for CPU pool setup\n");
>
> No need to print anything here. It only gives non-admin users a chance 
> to spill the kernel log. If non-admin users can write at all? Can they?
>
> Also, isn't this at the wrong abstraction level? I would expect such a 
> check to happen on the file write function, not here.

Removed the log. Non-admin users don't have permission to write; that's the
default file permission set. I wanted to guard the offlining / onlining of the
CPUs with this check anyway.

True, I already moved the check to the point where the CPU list is written
(set) in the file when I started to work on v5.

>
>> +
>> +        return -EPERM;
>> +    }
>> +
>> +    if (!ne_cpu_list)
>> +        return 0;
>> +
>> +    if (ne_check_enclaves_created(pdev)) {
>> +        dev_err(&pdev->dev, "The CPU pool is used, enclaves created\n");
>> +
>> +        return -EINVAL;
>> +    }
>> +
>> +    mutex_lock(&ne_cpu_pool.mutex);
>> +
>> +    rc = cpulist_parse(ne_cpu_list, ne_cpu_pool.avail);
>> +    if (rc < 0) {
>> +        dev_err(&pdev->dev,
>> +            "Error in cpulist parse [rc=%d]\n", rc);
>> +
>> +        goto unlock_mutex;
>> +    }
>> +
>> +    /*
>> +     * Check if CPU 0 and its siblings are included in the provided CPU pool.
>> +     * They should remain available for the primary / parent VM.
>> +     */
>> +    if (cpumask_test_cpu(0, ne_cpu_pool.avail)) {
>> +
>> +        dev_err(&pdev->dev,
>> +            "CPU 0 has to remain available for the primary VM\n");
>
> Shouldn't this also change the read value of the sysfs file?

Yes, I already updated the logic in v5 to set an empty string for sysfs 
file value when there are failures in setting the CPU pool.

>
>> +
>> +        rc = -EINVAL;
>> +
>> +        goto unlock_mutex;
>> +    }
>> +
>> +    for_each_cpu(cpu_sibling, topology_sibling_cpumask(0)) {
>> +        if (cpumask_test_cpu(cpu_sibling, ne_cpu_pool.avail)) {
>> +            dev_err(&pdev->dev,
>> +                "CPU sibling %d of CPU 0 is in the CPU pool\n",
>> +                cpu_sibling);
>
> Same here. I would expect the sysfs file to reflect either the 
> previous state or <empty> because failures mean no CPUs are donated 
> anymore.
>
> Can we somehow implement the get function of the param as something 
> that gets generated dynamically?

I already updated the logic to set the string value of the CPU pool to 
an empty string and clear our internal data structure, the cpumask. This 
way, an empty sysfs file means no CPUs are set and all the CPUs are 
onlined back.
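
A small user-space model of this "reset on failure" behavior, as a sketch
only. The pool is modeled with a 64-bit mask and a short string, and the
caller passes an already parsed mask next to the list string; the struct and
function names are hypothetical and this is not the v5 code.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Simplified model: a 64-bit mask instead of a cpumask (assumption). */
struct ne_cpu_pool_model {
    uint64_t avail;    /* bit n set => CPU n is in the pool */
    char cpu_list[64]; /* string that would be read back via sysfs */
};

/* Hypothetical setter: on any failure the pool is left empty. */
int ne_model_set_cpu_pool(struct ne_cpu_pool_model *pool, uint64_t mask,
                          const char *list)
{
    if (!pool)
        return -EINVAL;

    /* CPU 0 has to stay with the primary VM; oversized lists are rejected. */
    if (!list || (mask & 1ULL) || strlen(list) >= sizeof(pool->cpu_list)) {
        pool->avail = 0;
        pool->cpu_list[0] = '\0';

        return -EINVAL;
    }

    pool->avail = mask;
    strcpy(pool->cpu_list, list);

    return 0;
}
```

After a failed write, both the mask and the string are empty, so a reader
of the file cannot see a CPU list that was rejected.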

The CPU pool sysfs file value setup in v5 includes an early exit check - 
if enclaves are available, the CPU pool cannot be changed anymore.

Sure, we could have a custom get function, but so far I haven't seen a need
to replace the current (default) implementation provided by the kernel.

>
>> +
>> +            rc = -EINVAL;
>> +
>> +            goto unlock_mutex;
>> +        }
>> +    }
>> +
>> +    /*
>> +     * Check if CPU siblings are included in the provided CPU pool. The
>> +     * expectation is that CPU cores are made available in the CPU pool for
>> +     * enclaves.
>> +     */
>> +    for_each_cpu(cpu, ne_cpu_pool.avail) {
>> +        for_each_cpu(cpu_sibling, topology_sibling_cpumask(cpu)) {
>> +            if (!cpumask_test_cpu(cpu_sibling, ne_cpu_pool.avail)) {
>> +                dev_err(&pdev->dev,
>> +                    "CPU %d is not in the CPU pool\n",
>> +                    cpu_sibling);
>> +
>> +                rc = -EINVAL;
>> +
>> +                goto unlock_mutex;
>> +            }
>> +        }
>> +    }
>> +
>> +    /*
>> +     * Check if the CPUs from the NE CPU pool are from the same NUMA node.
>> +     */
>> +    for_each_cpu(cpu, ne_cpu_pool.avail) {
>> +        if (numa_node < 0) {
>> +            numa_node = cpu_to_node(cpu);
>> +
>> +            if (numa_node < 0) {
>> +                dev_err(&pdev->dev,
>> +                    "Invalid NUMA node %d\n", numa_node);
>> +
>> +                rc = -EINVAL;
>> +
>> +                goto unlock_mutex;
>> +            }
>> +        } else {
>> +            if (numa_node != cpu_to_node(cpu)) {
>> +                dev_err(&pdev->dev,
>> +                    "CPUs are from different NUMA nodes\n");
>> +
>> +                rc = -EINVAL;
>> +
>> +                goto unlock_mutex;
>> +            }
>> +        }
>> +    }
>> +
>
> There should be a comment here that describes the why:
>
> /*
>  * CPUs that are donated to enclaves should not be considered online
>  * by Linux anymore, as the hypervisor will degrade them to floating.
>  *
>  * We offline them here, to not degrade performance and expose correct
>  * topology to Linux and user space.
>  */

Good point. Added the comment here, and also included the motivation for
offlining / onlining the CPUs from the pool in the commit message.

>
>> +    for_each_cpu(cpu, ne_cpu_pool.avail) {
>> +        rc = remove_cpu(cpu);
>> +        if (rc != 0) {
>> +            dev_err(&pdev->dev,
>> +                "CPU %d is not offlined [rc=%d]\n", cpu, rc);
>> +
>> +            goto online_cpus;
>> +        }
>> +    }
>> +
>> +    mutex_unlock(&ne_cpu_pool.mutex);
>> +
>> +    return 0;
>> +
>> +online_cpus:
>> +    for_each_cpu(cpu, ne_cpu_pool.avail)
>> +        add_cpu(cpu);
>> +unlock_mutex:
>> +    mutex_unlock(&ne_cpu_pool.mutex);
>> +
>> +    return rc;
>> +}
>> +
>> +/**
>> + * ne_teardown_cpu_pool - Online the CPUs from the NE CPU pool and cleanup the
>> + * CPU pool.
>> + *
>> + * @pdev: PCI device used for enclave lifetime management.
>> + */
>> +static void ne_teardown_cpu_pool(struct pci_dev *pdev)
>> +{
>> +    unsigned int cpu = 0;
>> +    int rc = -EINVAL;
>> +
>> +    if (!capable(CAP_SYS_ADMIN)) {
>> +        dev_err(&pdev->dev, "No admin capability for CPU pool setup\n");
>> +
>> +        return;
>> +    }
>> +
>> +    if (!ne_cpu_pool.avail)
>> +        return;
>> +
>> +    if (ne_check_enclaves_created(pdev)) {
>> +        dev_err(&pdev->dev, "The CPU pool is used, enclaves created\n");
>> +
>> +        return;
>> +    }
>> +
>> +    mutex_lock(&ne_cpu_pool.mutex);
>> +
>> +    for_each_cpu(cpu, ne_cpu_pool.avail) {
>> +        rc = add_cpu(cpu);
>> +        if (rc != 0)
>> +            dev_err(&pdev->dev,
>> +                "CPU %d is not onlined [rc=%d]\n", cpu, rc);
>> +    }
>> +
>> +    cpumask_clear(ne_cpu_pool.avail);
>> +
>> +    mutex_unlock(&ne_cpu_pool.mutex);
>> +}
>> +
>> +static int ne_set_kernel_param(const char *val, const struct kernel_param *kp)
>> +{
>> +    const char *ne_cpu_list = val;
>> +    struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_AMAZON,
>> +                          PCI_DEVICE_ID_NE, NULL);
>
> Isn't there a better way?

Yeah, I'm looking for options to update the logic so that it does not use the
pci_get_device() call where it appears in the patch series. This was also
pointed out in the discussion I had with Greg on another patch from the
current version.

>
>> +    int rc = -EINVAL;
>> +
>> +    if (!pdev)
>> +        return -ENODEV;
>> +
>> +    ne_teardown_cpu_pool(pdev);
>> +
>> +    rc = ne_setup_cpu_pool(pdev, ne_cpu_list);
>> +    if (rc < 0) {
>> +        dev_err(&pdev->dev, "Error in setup CPU pool [rc=%d]\n", rc);
>> +
>> +        return rc;
>> +    }
>> +
>> +    return param_set_copystring(val, kp);
>> +}
>> +
>> +/**
>> + * ne_get_cpu_from_cpu_pool - Get a CPU from the CPU pool. If the vCPU id is 0,
>> + * the CPU is autogenerated and chosen from the NE CPU pool.
>> + *
>> + * This function gets called with the ne_enclave mutex held.
>> + *
>> + * @ne_enclave: private data associated with the current enclave.
>> + * @vcpu_id: id of the CPU to be associated with the given slot, apic id on x86.
>> + *
>> + * @returns: 0 on success, negative return value on failure.
>> + */
>> +static int ne_get_cpu_from_cpu_pool(struct ne_enclave *ne_enclave, u32 *vcpu_id)
>
> That's a very awkward API. Can you instead just pass by-value and 
> return the resulting CPU ID?

I separated the whole logic into two functions: one for getting a CPU from
the pool and one for checking that a given CPU is in the pool.
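
As an illustration of that split, a simplified user-space model with the
pool as a 64-bit mask. The function names are hypothetical, sibling handling
and locking are left out, and -ENOSPC for an empty pool follows the
suggestion quoted further down rather than the final code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Model (assumption): bit n of 'avail' set => CPU n is free in the pool. */

/* Check that an explicitly requested CPU is still available in the pool. */
bool ne_model_cpu_in_pool(uint64_t avail, unsigned int cpu)
{
    return cpu < 64 && (avail & (1ULL << cpu));
}

/* Take the lowest-numbered free CPU out of the pool and return its id. */
int ne_model_get_cpu_from_pool(uint64_t *avail)
{
    unsigned int cpu;

    for (cpu = 0; cpu < 64; cpu++) {
        if (*avail & (1ULL << cpu)) {
            *avail &= ~(1ULL << cpu);

            return (int)cpu;
        }
    }

    return -ENOSPC;
}
```

Returning the chosen CPU id by value avoids the in/out pointer parameter
that the review comment called awkward.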

>
>> +{
>> +    unsigned int cpu = 0;
>> +    unsigned int cpu_sibling = 0;
>> +
>> +    if (*vcpu_id != 0) {
>> +        if (cpumask_test_cpu(*vcpu_id, ne_enclave->cpu_siblings)) {
>> +            cpumask_clear_cpu(*vcpu_id, ne_enclave->cpu_siblings);
>> +
>> +            return 0;
>> +        }
>> +
>> +        mutex_lock(&ne_cpu_pool.mutex);
>> +
>> +        if (!cpumask_test_cpu(*vcpu_id, ne_cpu_pool.avail)) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "CPU %d is not in NE CPU pool\n",
>> +                        *vcpu_id);
>> +
>> +            mutex_unlock(&ne_cpu_pool.mutex);
>> +
>> +            return -EINVAL;
>
> I think you're better off making the return value explicit for the 
> error, so that user space can print the error message rather than us.

Yup, I will update the patch series to use NE-specific error codes where
necessary, as in this case.
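
A sketch of how user space could turn such codes into messages. The enum
names and values below are placeholders made up for this example, not the
codes defined by the series; the only point is that each failure gets its
own value, so the tooling prints the message instead of the kernel log.

```c
#include <string.h>

/* Hypothetical codes, chosen above the usual errno values (assumption). */
enum ne_model_err {
    NE_MODEL_ERR_VCPU_NOT_IN_POOL = 256,
    NE_MODEL_ERR_NO_CPUS_AVAIL,
};

/* Map an error value to a message; fall back to strerror() for errnos. */
const char *ne_model_strerror(int err)
{
    switch (err) {
    case NE_MODEL_ERR_VCPU_NOT_IN_POOL:
        return "requested vCPU is not in the NE CPU pool";
    case NE_MODEL_ERR_NO_CPUS_AVAIL:
        return "no CPUs available in the NE CPU pool";
    default:
        return strerror(err);
    }
}
```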

>
>> +        }
>> +
>> +        cpumask_clear_cpu(*vcpu_id, ne_cpu_pool.avail);
>> +
>> +        /*
>> +         * Make sure the CPU siblings are not marked as available
>> +         * anymore.
>> +         */
>> +        for_each_cpu(cpu_sibling, topology_sibling_cpumask(*vcpu_id)) {
>> +            if (cpu_sibling != *vcpu_id) {
>> +                cpumask_clear_cpu(cpu_sibling,
>> +                          ne_cpu_pool.avail);
>> +
>> +                cpumask_set_cpu(cpu_sibling,
>> +                        ne_enclave->cpu_siblings);
>> +            }
>> +        }
>> +
>> +        mutex_unlock(&ne_cpu_pool.mutex);
>> +
>> +        return 0;
>> +    }
>> +
>> +    /* There are CPU siblings available to choose from. */
>> +    cpu = cpumask_any(ne_enclave->cpu_siblings);
>> +    if (cpu < nr_cpu_ids) {
>> +        cpumask_clear_cpu(cpu, ne_enclave->cpu_siblings);
>> +
>> +        *vcpu_id = cpu;
>> +
>> +        return 0;
>> +    }
>> +
>> +    mutex_lock(&ne_cpu_pool.mutex);
>> +
>> +    /* Choose any CPU from the available CPU pool. */
>> +    cpu = cpumask_any(ne_cpu_pool.avail);
>> +    if (cpu >= nr_cpu_ids) {
>> +        dev_err_ratelimited(ne_misc_dev.this_device,
>> +                    "No CPUs available in CPU pool\n");
>> +
>> +        mutex_unlock(&ne_cpu_pool.mutex);
>> +
>> +        return -EINVAL;
>
> I think you're better off making the return value explicit for the 
> error, so that user space can print the error message rather than us.
>
>> +    }
>> +
>> +    cpumask_clear_cpu(cpu, ne_cpu_pool.avail);
>> +
>> +    /* Make sure the CPU siblings are not marked as available anymore. */
>> +    for_each_cpu(cpu_sibling, topology_sibling_cpumask(cpu)) {
>> +        if (cpu_sibling != cpu) {
>> +            cpumask_clear_cpu(cpu_sibling, ne_cpu_pool.avail);
>> +
>> +            cpumask_set_cpu(cpu_sibling, ne_enclave->cpu_siblings);
>> +        }
>> +    }
>> +
>> +    mutex_unlock(&ne_cpu_pool.mutex);
>
> I find the function slightly confusingly structured. Why can't we do 
> something like
>
>
>   if (!vcpu_id) {
>     vcpu_id = find_next_free_vcpu_id();
>     if (vcpu_id < 0)
>         return -ENOSPC;
>   }
>
>   [logic to handle an explicit vcpu id]
>
> I think that would be much more readable.

The logic is now separated into two functions: one for checking that the CPU
is in the pool and one for getting a CPU from the pool.

>
>> +
>> +    *vcpu_id = cpu;
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + * ne_create_vcpu_ioctl - Add vCPU to the slot associated with the current
>> + * enclave. Create vCPU file descriptor to be further used for CPU handling.
>> + *
>> + * This function gets called with the ne_enclave mutex held.
>> + *
>> + * @ne_enclave: private data associated with the current enclave.
>> + * @vcpu_id: id of the CPU to be associated with the given slot, apic id on x86.
>> + *
>> + * @returns: vCPU fd on success, negative return value on failure.
>> + */
>> +static int ne_create_vcpu_ioctl(struct ne_enclave *ne_enclave, u32 vcpu_id)
>> +{
>> +    struct ne_pci_dev_cmd_reply cmd_reply = {};
>> +    int fd = 0;
>> +    struct file *file = NULL;
>> +    struct ne_vcpu_id *ne_vcpu_id = NULL;
>> +    int rc = -EINVAL;
>> +    struct slot_add_vcpu_req slot_add_vcpu_req = {};
>> +
>> +    if (ne_enclave->mm != current->mm)
>> +        return -EIO;
>> +
>> +    ne_vcpu_id = kzalloc(sizeof(*ne_vcpu_id), GFP_KERNEL);
>> +    if (!ne_vcpu_id)
>> +        return -ENOMEM;
>> +
>> +    fd = get_unused_fd_flags(O_CLOEXEC);
>> +    if (fd < 0) {
>> +        rc = fd;
>> +
>> +        dev_err_ratelimited(ne_misc_dev.this_device,
>> +                    "Error in getting unused fd [rc=%d]\n", rc);
>> +
>> +        goto free_ne_vcpu_id;
>> +    }
>> +
>> +    /* TODO: Include (vcpu) id in the ne-vm-vcpu naming. */
>> +    file = anon_inode_getfile("ne-vm-vcpu", &ne_enclave_vcpu_fops,
>> +                  ne_enclave, O_RDWR);
>> +    if (IS_ERR(file)) {
>> +        rc = PTR_ERR(file);
>> +
>> +        dev_err_ratelimited(ne_misc_dev.this_device,
>> +                    "Error in anon inode get file [rc=%d]\n",
>> +                    rc);
>> +
>> +        goto put_fd;
>> +    }
>> +
>> +    slot_add_vcpu_req.slot_uid = ne_enclave->slot_uid;
>> +    slot_add_vcpu_req.vcpu_id = vcpu_id;
>> +
>> +    rc = ne_do_request(ne_enclave->pdev, SLOT_ADD_VCPU, &slot_add_vcpu_req,
>> +               sizeof(slot_add_vcpu_req), &cmd_reply,
>> +               sizeof(cmd_reply));
>> +    if (rc < 0) {
>> +        dev_err_ratelimited(ne_misc_dev.this_device,
>> +                    "Error in slot add vcpu [rc=%d]\n", rc);
>> +
>> +        goto put_file;
>> +    }
>> +
>> +    ne_vcpu_id->vcpu_id = vcpu_id;
>> +
>> +    list_add(&ne_vcpu_id->vcpu_id_list_entry, &ne_enclave->vcpu_ids_list);
>> +
>> +    ne_enclave->nr_vcpus++;
>> +
>> +    fd_install(fd, file);
>> +
>> +    return fd;
>> +
>> +put_file:
>> +    fput(file);
>> +put_fd:
>> +    put_unused_fd(fd);
>> +free_ne_vcpu_id:
>> +    kfree(ne_vcpu_id);
>> +
>> +    return rc;
>> +}
>> +
>> +static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
>> +                 unsigned long arg)
>> +{
>> +    struct ne_enclave *ne_enclave = file->private_data;
>> +
>> +    if (!ne_enclave || !ne_enclave->pdev)
>> +        return -EINVAL;
>> +
>> +    switch (cmd) {
>> +    case NE_CREATE_VCPU: {
>
> Can this be an ADD_VCPU rather than CREATE? We don't really need a 
> vcpu fd after all ...

I updated the ioctl call.

Thanks for the review.

Andra

>
>> +        int rc = -EINVAL;
>> +        u32 vcpu_id = 0;
>> +
>> +        if (copy_from_user(&vcpu_id, (void *)arg, sizeof(vcpu_id))) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Error in copy from user\n");
>> +
>> +            return -EFAULT;
>> +        }
>> +
>> +        mutex_lock(&ne_enclave->enclave_info_mutex);
>> +
>> +        if (ne_enclave->state != NE_STATE_INIT) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Enclave isn't in init state\n");
>> +
>> +            mutex_unlock(&ne_enclave->enclave_info_mutex);
>> +
>> +            return -EINVAL;
>> +        }
>> +
>> +        /* Use the CPU pool for choosing a CPU for the enclave. */
>> +        rc = ne_get_cpu_from_cpu_pool(ne_enclave, &vcpu_id);
>> +        if (rc < 0) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Error in get CPU from pool\n");
>> +
>> +            mutex_unlock(&ne_enclave->enclave_info_mutex);
>> +
>> +            return -EINVAL;
>> +        }
>> +
>> +        rc = ne_create_vcpu_ioctl(ne_enclave, vcpu_id);
>> +
>> +        /* Put back the CPU in enclave cpu pool, if add vcpu error. */
>> +        if (rc < 0)
>> +            cpumask_set_cpu(vcpu_id, ne_enclave->cpu_siblings);
>> +
>> +        mutex_unlock(&ne_enclave->enclave_info_mutex);
>> +
>> +        if (copy_to_user((void *)arg, &vcpu_id, sizeof(vcpu_id))) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Error in copy to user\n");
>> +
>> +            return -EFAULT;
>> +        }
>> +
>> +        return rc;
>> +    }
>> +
>> +    default:
>> +        return -ENOTTY;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   static __poll_t ne_enclave_poll(struct file *file, poll_table *wait)
>>   {
>>       __poll_t mask = 0;
>> @@ -79,6 +562,7 @@ static const struct file_operations ne_enclave_fops = {
>>       .owner        = THIS_MODULE,
>>       .llseek        = noop_llseek,
>>       .poll        = ne_enclave_poll,
>> +    .unlocked_ioctl    = ne_enclave_ioctl,
>>   };
>>     /**
>> @@ -286,8 +770,15 @@ static int __init ne_init(void)
>>     static void __exit ne_exit(void)
>>   {
>> +    struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_AMAZON,
>> +                          PCI_DEVICE_ID_NE, NULL);
>> +    if (!pdev)
>> +        return;
>> +
>>       pci_unregister_driver(&ne_pci_driver);
>>
>> +    ne_teardown_cpu_pool(pdev);
>> +
>>       free_cpumask_var(ne_cpu_pool.avail);
>>   }
>>




^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 11/18] nitro_enclaves: Add logic for enclave memory region set
  2020-07-06 10:46   ` Alexander Graf
@ 2020-07-09  7:36     ` Paraschiv, Andra-Irina
  2020-07-09  8:40       ` Alexander Graf
  0 siblings, 1 reply; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-07-09  7:36 UTC (permalink / raw)
  To: Alexander Graf, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 06/07/2020 13:46, Alexander Graf wrote:
>
>
> On 22.06.20 22:03, Andra Paraschiv wrote:
>> Another resource that is being set for an enclave is memory. User space
>> memory regions, that need to be backed by contiguous memory regions,
>> are associated with the enclave.
>>
>> One solution for allocating / reserving contiguous memory regions, that
>> is used for integration, is hugetlbfs. The user space process that is
>> associated with the enclave passes to the driver these memory regions.
>>
>> The enclave memory regions need to be from the same NUMA node as the
>> enclave CPUs.
>>
>> Add ioctl command logic for setting user space memory region for an
>> enclave.
>>
>> Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>> ---
>> Changelog
>>
>> v3 -> v4
>>
>> * Check enclave memory regions are from the same NUMA node as the
>>    enclave CPUs.
>> * Use dev_err instead of custom NE log pattern.
>> * Update the NE ioctl call to match the decoupling from the KVM API.
>>
>> v2 -> v3
>>
>> * Remove the WARN_ON calls.
>> * Update static calls sanity checks.
>> * Update kzfree() calls to kfree().
>>
>> v1 -> v2
>>
>> * Add log pattern for NE.
>> * Update goto labels to match their purpose.
>> * Remove the BUG_ON calls.
>> * Check if enclave max memory regions is reached when setting an enclave
>>    memory region.
>> * Check if enclave state is init when setting an enclave memory region.
>> ---
>>   drivers/virt/nitro_enclaves/ne_misc_dev.c | 257 ++++++++++++++++++++++
>>   1 file changed, 257 insertions(+)
>>
>> diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
>> index cfdefa52ed2a..17ccb6cdbd75 100644
>> --- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
>> +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
>> @@ -476,6 +476,233 @@ static int ne_create_vcpu_ioctl(struct ne_enclave *ne_enclave, u32 vcpu_id)
>>       return rc;
>>   }
>>
>> +/**
>> + * ne_sanity_check_user_mem_region - Sanity check the userspace memory
>> + * region received during the set user memory region ioctl call.
>> + *
>> + * This function gets called with the ne_enclave mutex held.
>> + *
>> + * @ne_enclave: private data associated with the current enclave.
>> + * @mem_region: user space memory region to be sanity checked.
>> + *
>> + * @returns: 0 on success, negative return value on failure.
>> + */
>> +static int ne_sanity_check_user_mem_region(struct ne_enclave *ne_enclave,
>> +    struct ne_user_memory_region *mem_region)
>> +{
>> +    if (ne_enclave->mm != current->mm)
>> +        return -EIO;
>> +
>> +    if ((mem_region->memory_size % NE_MIN_MEM_REGION_SIZE) != 0) {
>> +        dev_err_ratelimited(ne_misc_dev.this_device,
>> +                    "Mem size not multiple of 2 MiB\n");
>> +
>> +        return -EINVAL;
>
> Can we make this an error that gets propagated to user space 
> explicitly? I'd rather have a clear error return value of this 
> function than a random message in dmesg.

Yes, we can. I'll add NE-specific error codes for the memory checks, as for 
the other call paths in the series, e.g. enclave CPU(s) setup.

>
>> +    }
>> +
>> +    if ((mem_region->userspace_addr & (NE_MIN_MEM_REGION_SIZE - 1)) ||
>
> This logic already relies on the fact that NE_MIN_MEM_REGION_SIZE is a 
> power of two. Can you do the same above on the memory_size check?

Done.
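
For reference, a minimal user space sketch of the mask-based check, assuming NE_MIN_MEM_REGION_SIZE is 2 MiB (as the log messages in the patch suggest). The helper name ne_is_aligned() is made up for illustration and is not part of the series; in the kernel the existing IS_ALIGNED() macro would be the natural fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: 2 MiB, matching the "2 MiB" messages in the patch. */
#define NE_MIN_MEM_REGION_SIZE (2ULL * 1024ULL * 1024ULL)

/* Hypothetical helper: true when value is a multiple of align. */
static bool ne_is_aligned(uint64_t value, uint64_t align)
{
	/* The mask trick is only valid for non-zero powers of two. */
	assert(align != 0 && (align & (align - 1)) == 0);

	return (value & (align - 1)) == 0;
}
```

With this, the memory_size and userspace_addr checks can share one helper instead of mixing the modulo and mask forms.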

>
>> +        !access_ok((void __user *)(unsigned long)mem_region->userspace_addr,
>> +               mem_region->memory_size)) {
>> +        dev_err_ratelimited(ne_misc_dev.this_device,
>> +                    "Invalid user space addr range\n");
>> +
>> +        return -EINVAL;
>
> Same comment again. Return different errors for different conditions, 
> so that user space has a chance to print proper errors to its users.
>
> Also, don't we have to check alignment of userspace_addr as well?
>

Yes, it would need an alignment check for at least 2 MiB.

>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + * ne_set_user_memory_region_ioctl - Add user space memory region to the slot
>> + * associated with the current enclave.
>> + *
>> + * This function gets called with the ne_enclave mutex held.
>> + *
>> + * @ne_enclave: private data associated with the current enclave.
>> + * @mem_region: user space memory region to be associated with the given slot.
>> + *
>> + * @returns: 0 on success, negative return value on failure.
>> + */
>> +static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
>> +    struct ne_user_memory_region *mem_region)
>> +{
>> +    struct ne_pci_dev_cmd_reply cmd_reply = {};
>> +    long gup_rc = 0;
>> +    unsigned long i = 0;
>> +    struct ne_mem_region *ne_mem_region = NULL;
>> +    unsigned long nr_phys_contig_mem_regions = 0;
>> +    unsigned long nr_pinned_pages = 0;
>> +    struct page **phys_contig_mem_regions = NULL;
>> +    int rc = -EINVAL;
>> +    struct slot_add_mem_req slot_add_mem_req = {};
>> +
>> +    rc = ne_sanity_check_user_mem_region(ne_enclave, mem_region);
>> +    if (rc < 0)
>> +        return rc;
>> +
>> +    ne_mem_region = kzalloc(sizeof(*ne_mem_region), GFP_KERNEL);
>> +    if (!ne_mem_region)
>> +        return -ENOMEM;
>> +
>> +    /*
>> +     * TODO: Update nr_pages value to handle contiguous virtual address
>> +     * ranges mapped to non-contiguous physical regions. Hugetlbfs can give
>> +     * 2 MiB / 1 GiB contiguous physical regions.
>> +     */
>> +    ne_mem_region->nr_pages = mem_region->memory_size /
>> +        NE_MIN_MEM_REGION_SIZE;
>> +
>> +    ne_mem_region->pages = kcalloc(ne_mem_region->nr_pages,
>> +                       sizeof(*ne_mem_region->pages),
>> +                       GFP_KERNEL);
>> +    if (!ne_mem_region->pages) {
>> +        kfree(ne_mem_region);
>> +
>> +        return -ENOMEM;
>
> kfree(NULL) is a nop, so you can just set rc and goto free_mem_region 
> here and below.

Updated both return paths.
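
A rough user space sketch of the single cleanup path, with calloc()/free() standing in for kzalloc()/kcalloc()/kfree(). The struct, the function name region_setup() and the fail_at knob are invented for illustration only; free(NULL), like kfree(NULL), does nothing.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Simplified stand-in for struct ne_mem_region. */
struct region {
	void **pages;
	size_t nr_pages;
};

/*
 * fail_at is a hypothetical test knob: 1, 2 or 3 forces that
 * allocation to fail, 0 forces none.
 */
static int region_setup(size_t nr_pages, int fail_at, struct region **out)
{
	struct region *region = NULL;
	void **contig = NULL;
	int rc = -ENOMEM;

	region = fail_at == 1 ? NULL : calloc(1, sizeof(*region));
	if (!region)
		return rc;

	region->pages = fail_at == 2 ? NULL :
		calloc(nr_pages, sizeof(*region->pages));
	if (!region->pages)
		goto free_mem_region;

	contig = fail_at == 3 ? NULL : calloc(nr_pages, sizeof(*contig));
	if (!contig)
		goto free_mem_region;

	region->nr_pages = nr_pages;
	free(contig); /* scratch array, like phys_contig_mem_regions */
	*out = region;

	return 0;

free_mem_region:
	free(contig);        /* may be NULL, which is a no-op */
	free(region->pages); /* may be NULL as well */
	free(region);

	return rc;
}
```

Both allocation failures then leave through the same label, so a later allocation only needs one more free() there.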

>
>> +    }
>> +
>> +    phys_contig_mem_regions = kcalloc(ne_mem_region->nr_pages,
>> +                      sizeof(*phys_contig_mem_regions),
>> +                      GFP_KERNEL);
>> +    if (!phys_contig_mem_regions) {
>> +        kfree(ne_mem_region->pages);
>> +        kfree(ne_mem_region);
>> +
>> +        return -ENOMEM;
>> +    }
>> +
>> +    /*
>> +     * TODO: Handle non-contiguous memory regions received from user space.
>> +     * Hugetlbfs can give 2 MiB / 1 GiB contiguous physical regions. The
>> +     * virtual address space can be seen as contiguous, although it is
>> +     * mapped underneath to 2 MiB / 1 GiB physical regions e.g. 8 MiB
>> +     * virtual address space mapped to 4 physically contiguous regions of 2
>> +     * MiB.
>> +     */
>> +    do {
>> +        unsigned long tmp_nr_pages = ne_mem_region->nr_pages -
>> +            nr_pinned_pages;
>> +        struct page **tmp_pages = ne_mem_region->pages +
>> +            nr_pinned_pages;
>> +        u64 tmp_userspace_addr = mem_region->userspace_addr +
>> +            nr_pinned_pages * NE_MIN_MEM_REGION_SIZE;
>> +
>> +        gup_rc = get_user_pages(tmp_userspace_addr, tmp_nr_pages,
>> +                    FOLL_GET, tmp_pages, NULL);
>> +        if (gup_rc < 0) {
>> +            rc = gup_rc;
>> +
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Error in gup [rc=%d]\n", rc);
>> +
>> +            unpin_user_pages(ne_mem_region->pages, nr_pinned_pages);
>> +
>> +            goto free_mem_region;
>> +        }
>> +
>> +        nr_pinned_pages += gup_rc;
>> +
>> +    } while (nr_pinned_pages < ne_mem_region->nr_pages);
>
> Can this deadlock the kernel? Shouldn't we rather return an error when 
> we can't pin all pages?

It shouldn't cause a deadlock, based on the return values:

 > Returns either number of pages pinned (which may be less than the
 > number requested), or an error. Details about the return value:
 >
 > -- If nr_pages is 0, returns 0.
 > -- If nr_pages is >0, but no pages were pinned, returns -errno.
 > -- If nr_pages is >0, and some pages were pinned, returns the number of
 > pages pinned. Again, this may be less than nr_pages.


But I can update the logic to have all or nothing.
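
As a sketch of what all-or-nothing could look like, here is a user space model where mock_gup() is a made-up stand-in for get_user_pages() that follows the return value rules quoted above. The names, the budget parameter and the choice of -ENOMEM for a short pin are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for get_user_pages(): pins at most budget
 * pages and returns the number pinned, or -EFAULT if none could be.
 */
static long mock_gup(unsigned long nr_pages, unsigned long budget)
{
	if (nr_pages == 0)
		return 0;
	if (budget == 0)
		return -EFAULT;

	return (long)(nr_pages < budget ? nr_pages : budget);
}

/*
 * All-or-nothing pinning: a single call, and a short count is treated
 * as an error. On failure, *nr_to_unpin reports how many pages the
 * caller has to release; on success it is 0 and all pages stay pinned.
 */
static int pin_all_or_nothing(unsigned long nr_pages, unsigned long budget,
			      unsigned long *nr_to_unpin)
{
	long gup_rc = mock_gup(nr_pages, budget);

	*nr_to_unpin = 0;

	if (gup_rc < 0)
		return (int)gup_rc;

	if ((unsigned long)gup_rc < nr_pages) {
		/* Partial pin: hand back what was pinned and fail. */
		*nr_to_unpin = (unsigned long)gup_rc;

		return -ENOMEM;
	}

	return 0;
}
```

This drops the retry loop entirely, so there is no question of it spinning when only part of the range can be pinned.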

>
>> +
>> +    /*
>> +     * TODO: Update checks once physically contiguous regions are collected
>> +     * based on the user space address and get_user_pages() results.
>> +     */
>> +    for (i = 0; i < ne_mem_region->nr_pages; i++) {
>> +        if (!PageHuge(ne_mem_region->pages[i])) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Not a hugetlbfs page\n");
>> +
>> +            goto unpin_pages;
>> +        }
>> +
>> +        if (huge_page_size(page_hstate(ne_mem_region->pages[i])) !=
>> +            NE_MIN_MEM_REGION_SIZE) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Page size isn't 2 MiB\n");
>
> Why is a huge page size of >2MB a problem? Can't we just make 
> huge_page_size() the ne mem slot size?

It's not a problem; this is actually part of the TODO(s) in the 
current function, to support contiguous regions larger than 2 MiB. We 
just started with 2 MiB. :)

>
>> +
>> +            goto unpin_pages;
>> +        }
>> +
>> +        if (ne_enclave->numa_node !=
>> +            page_to_nid(ne_mem_region->pages[i])) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Page isn't from NUMA node %d\n",
>> +                        ne_enclave->numa_node);
>> +
>> +            goto unpin_pages;
>
> Is there a way to give user space hints on *why* things are going wrong?

Yes, one option for giving user space more insight is the specific NE 
error codes you mentioned, which would improve the experience further.
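
To make the idea concrete, a small user space sketch with one code per failed check and a helper that user space could use to print a message. The enum values, names and strings below are placeholders and are not taken from the series; NE_MIN_MEM_REGION_SIZE is assumed to be 2 MiB.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NE_MIN_MEM_REGION_SIZE (2ULL * 1024ULL * 1024ULL) /* assumed 2 MiB */

/* Placeholder error codes, for illustration only. */
enum ne_mem_err {
	NE_MEM_ERR_SIZE_NOT_MULTIPLE = 1,
	NE_MEM_ERR_ADDR_UNALIGNED = 2,
};

/* Returns 0 on success, a distinct negative code per failed check. */
static int ne_check_mem_region(uint64_t userspace_addr, uint64_t memory_size)
{
	if (memory_size & (NE_MIN_MEM_REGION_SIZE - 1))
		return -NE_MEM_ERR_SIZE_NOT_MULTIPLE;

	if (userspace_addr & (NE_MIN_MEM_REGION_SIZE - 1))
		return -NE_MEM_ERR_ADDR_UNALIGNED;

	return 0;
}

/* User space side: map a return code to a printable reason. */
static const char *ne_mem_err_str(int rc)
{
	switch (-rc) {
	case 0:
		return "ok";
	case NE_MEM_ERR_SIZE_NOT_MULTIPLE:
		return "memory size is not a multiple of 2 MiB";
	case NE_MEM_ERR_ADDR_UNALIGNED:
		return "user space address is not 2 MiB aligned";
	default:
		return "unknown error";
	}
}
```

The same pattern would extend to the per-page checks (not a hugetlbfs page, wrong page size, wrong NUMA node) with one code each.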

>
>> +        }
>> +
>> +        /*
>> +         * TODO: Update once handled non-contiguous memory regions
>> +         * received from user space.
>> +         */
>> +        phys_contig_mem_regions[i] = ne_mem_region->pages[i];
>> +    }
>> +
>> +    /*
>> +     * TODO: Update once handled non-contiguous memory regions received
>> +     * from user space.
>> +     */
>> +    nr_phys_contig_mem_regions = ne_mem_region->nr_pages;
>> +
>> +    if ((ne_enclave->nr_mem_regions + nr_phys_contig_mem_regions) >
>> +        ne_enclave->max_mem_regions) {
>> +        dev_err_ratelimited(ne_misc_dev.this_device,
>> +                    "Reached max memory regions %lld\n",
>> +                    ne_enclave->max_mem_regions);
>> +
>> +        goto unpin_pages;
>> +    }
>> +
>> +    for (i = 0; i < nr_phys_contig_mem_regions; i++) {
>> +        u64 phys_addr = page_to_phys(phys_contig_mem_regions[i]);
>> +
>> +        slot_add_mem_req.slot_uid = ne_enclave->slot_uid;
>> +        slot_add_mem_req.paddr = phys_addr;
>> +        /*
>> +         * TODO: Update memory size of physical contiguous memory
>> +         * region, in case of non-contiguous memory regions received
>> +         * from user space.
>> +         */
>> +        slot_add_mem_req.size = NE_MIN_MEM_REGION_SIZE;
>
> Yeah, for now, just make it huge_page_size()! :)

Yup, I'll handle this so that other sizes are possible in addition to 
2 MiB, e.g. 1 GiB for hugetlbfs.

>
>> +
>> +        rc = ne_do_request(ne_enclave->pdev, SLOT_ADD_MEM,
>> +                   &slot_add_mem_req, sizeof(slot_add_mem_req),
>> +                   &cmd_reply, sizeof(cmd_reply));
>> +        if (rc < 0) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Error in slot add mem [rc=%d]\n",
>> +                        rc);
>> +
>> +            /* TODO: Only unpin memory regions not added. */
>
> Are we sure we're not creating an unusable system here?

The requests to the PCI device are structured such that, once added, a 
memory region / CPU cannot be taken back until the enclave is 
terminated. If there is an error in the remaining ioctl logic after a 
region has been added successfully, the memory region can be given back 
to the primary / parent VM only once the enclave termination (including 
slot free) is done.

We can either have the logic handle one contiguous region per ioctl call 
(user space gives a memory region that is backed by a single contiguous 
physical memory region) or have a for loop to go through all contiguous 
regions (user space gives a memory region that is backed by a set of 
(smaller) contiguous physical memory regions). In the second case, if a 
request to the NE PCI device fails, already added memory regions can be 
given back only on slot free, triggered by the enclave termination, when 
closing the enclave fd.

>
>> +            goto unpin_pages;
>> +        }
>> +
>> +        ne_enclave->mem_size += slot_add_mem_req.size;
>> +        ne_enclave->nr_mem_regions++;
>> +
>> +        memset(&slot_add_mem_req, 0, sizeof(slot_add_mem_req));
>> +        memset(&cmd_reply, 0, sizeof(cmd_reply));
>
> If you define the variables in the for loop scope, you don't need to 
> manually zero them again.

Updated to have the variables in the loop instead.
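
A tiny user space sketch of the loop-scoped variant. The struct and function names are made up for illustration; the point is that an automatic variable with an initializer inside the loop body is initialized again on every iteration, so no memset() is needed. Standard C spells the initializer as {0}, while the kernel commonly uses the {} form.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical request layout for illustration. */
struct ex_add_mem_req {
	uint64_t paddr;
	uint64_t size;
	uint32_t flags;
};

/* Builds one request per physical address into out[0..nr-1]. */
static void ex_build_requests(const uint64_t *paddrs, unsigned int nr,
			      struct ex_add_mem_req *out)
{
	for (unsigned int i = 0; i < nr; i++) {
		/* Zeroed again at the start of each iteration. */
		struct ex_add_mem_req req = {0};

		req.paddr = paddrs[i];
		req.size = 2ULL * 1024ULL * 1024ULL;

		/* Only the first request sets a flag; it must not leak. */
		if (i == 0)
			req.flags = 1;

		out[i] = req;
	}
}
```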

Thank you.

Andra

>
>> +    }
>> +
>> +    list_add(&ne_mem_region->mem_region_list_entry,
>> +         &ne_enclave->mem_regions_list);
>> +
>> +    kfree(phys_contig_mem_regions);
>> +
>> +    return 0;
>> +
>> +unpin_pages:
>> +    unpin_user_pages(ne_mem_region->pages, ne_mem_region->nr_pages);
>> +free_mem_region:
>> +    kfree(phys_contig_mem_regions);
>> +    kfree(ne_mem_region->pages);
>> +    kfree(ne_mem_region);
>> +
>> +    return rc;
>> +}
>> +
>>   static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
>>                    unsigned long arg)
>>   {
>> @@ -561,6 +788,36 @@ static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
>>           return 0;
>>       }
>>   +    case NE_SET_USER_MEMORY_REGION: {
>> +        struct ne_user_memory_region mem_region = {};
>> +        int rc = -EINVAL;
>> +
>> +        if (copy_from_user(&mem_region, (void *)arg,
>> +                   sizeof(mem_region))) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Error in copy from user\n");
>> +
>> +            return -EFAULT;
>> +        }
>> +
>> +        mutex_lock(&ne_enclave->enclave_info_mutex);
>> +
>> +        if (ne_enclave->state != NE_STATE_INIT) {
>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>> +                        "Enclave isn't in init state\n");
>> +
>> +            mutex_unlock(&ne_enclave->enclave_info_mutex);
>> +
>> +            return -EINVAL;
>> +        }
>> +
>> +        rc = ne_set_user_memory_region_ioctl(ne_enclave, &mem_region);
>> +
>> +        mutex_unlock(&ne_enclave->enclave_info_mutex);
>> +
>> +        return rc;
>> +    }
>> +
>>       default:
>>           return -ENOTTY;
>>       }
>>






^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 11/18] nitro_enclaves: Add logic for enclave memory region set
  2020-07-09  7:36     ` Paraschiv, Andra-Irina
@ 2020-07-09  8:40       ` Alexander Graf
  2020-07-09  9:41         ` Paraschiv, Andra-Irina
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Graf @ 2020-07-09  8:40 UTC (permalink / raw)
  To: Paraschiv, Andra-Irina, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 09.07.20 09:36, Paraschiv, Andra-Irina wrote:
> 
> 
> On 06/07/2020 13:46, Alexander Graf wrote:
>>
>>
>> On 22.06.20 22:03, Andra Paraschiv wrote:
>>> Another resource that is being set for an enclave is memory. User space
>>> memory regions, that need to be backed by contiguous memory regions,
>>> are associated with the enclave.
>>>
>>> One solution for allocating / reserving contiguous memory regions, that
>>> is used for integration, is hugetlbfs. The user space process that is
>>> associated with the enclave passes to the driver these memory regions.
>>>
>>> The enclave memory regions need to be from the same NUMA node as the
>>> enclave CPUs.
>>>
>>> Add ioctl command logic for setting user space memory region for an
>>> enclave.
>>>
>>> Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
>>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>>> ---
>>> Changelog
>>>
>>> v3 -> v4
>>>
>>> * Check enclave memory regions are from the same NUMA node as the
>>>    enclave CPUs.
>>> * Use dev_err instead of custom NE log pattern.
>>> * Update the NE ioctl call to match the decoupling from the KVM API.
>>>
>>> v2 -> v3
>>>
>>> * Remove the WARN_ON calls.
>>> * Update static calls sanity checks.
>>> * Update kzfree() calls to kfree().
>>>
>>> v1 -> v2
>>>
>>> * Add log pattern for NE.
>>> * Update goto labels to match their purpose.
>>> * Remove the BUG_ON calls.
>>> * Check if enclave max memory regions is reached when setting an enclave
>>>    memory region.
>>> * Check if enclave state is init when setting an enclave memory region.
>>> ---
>>>   drivers/virt/nitro_enclaves/ne_misc_dev.c | 257 ++++++++++++++++++++++
>>>   1 file changed, 257 insertions(+)
>>>
>>> diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
>>> index cfdefa52ed2a..17ccb6cdbd75 100644
>>> --- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
>>> +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
>>> @@ -476,6 +476,233 @@ static int ne_create_vcpu_ioctl(struct ne_enclave *ne_enclave, u32 vcpu_id)
>>>       return rc;
>>>   }
>>>   +/**
>>> + * ne_sanity_check_user_mem_region - Sanity check the userspace memory
>>> + * region received during the set user memory region ioctl call.
>>> + *
>>> + * This function gets called with the ne_enclave mutex held.
>>> + *
>>> + * @ne_enclave: private data associated with the current enclave.
>>> + * @mem_region: user space memory region to be sanity checked.
>>> + *
>>> + * @returns: 0 on success, negative return value on failure.
>>> + */
>>> +static int ne_sanity_check_user_mem_region(struct ne_enclave *ne_enclave,
>>> +    struct ne_user_memory_region *mem_region)
>>> +{
>>> +    if (ne_enclave->mm != current->mm)
>>> +        return -EIO;
>>> +
>>> +    if ((mem_region->memory_size % NE_MIN_MEM_REGION_SIZE) != 0) {
>>> +        dev_err_ratelimited(ne_misc_dev.this_device,
>>> +                    "Mem size not multiple of 2 MiB\n");
>>> +
>>> +        return -EINVAL;
>>
>> Can we make this an error that gets propagated to user space 
>> explicitly? I'd rather have a clear error return value of this 
>> function than a random message in dmesg.
> 
> We can make this, will add memory checks specific NE error codes, as for 
> the other call paths in the series e.g. enclave CPU(s) setup.
> 
>>
>>> +    }
>>> +
>>> +    if ((mem_region->userspace_addr & (NE_MIN_MEM_REGION_SIZE - 1)) ||
>>
>> This logic already relies on the fact that NE_MIN_MEM_REGION_SIZE is a 
>> power of two. Can you do the same above on the memory_size check?
> 
> Done.
> 
>>
>>> +        !access_ok((void __user *)(unsigned long)mem_region->userspace_addr,
>>> +               mem_region->memory_size)) {
>>> +        dev_err_ratelimited(ne_misc_dev.this_device,
>>> +                    "Invalid user space addr range\n");
>>> +
>>> +        return -EINVAL;
>>
>> Same comment again. Return different errors for different conditions, 
>> so that user space has a chance to print proper errors to its users.
>>
>> Also, don't we have to check alignment of userspace_addr as well?
>>
> 
> Would need an alignment check for 2 MiB at least, yes.
> 
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +/**
>>> + * ne_set_user_memory_region_ioctl - Add user space memory region to the slot
>>> + * associated with the current enclave.
>>> + *
>>> + * This function gets called with the ne_enclave mutex held.
>>> + *
>>> + * @ne_enclave: private data associated with the current enclave.
>>> + * @mem_region: user space memory region to be associated with the given slot.
>>> + *
>>> + * @returns: 0 on success, negative return value on failure.
>>> + */
>>> +static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
>>> +    struct ne_user_memory_region *mem_region)
>>> +{
>>> +    struct ne_pci_dev_cmd_reply cmd_reply = {};
>>> +    long gup_rc = 0;
>>> +    unsigned long i = 0;
>>> +    struct ne_mem_region *ne_mem_region = NULL;
>>> +    unsigned long nr_phys_contig_mem_regions = 0;
>>> +    unsigned long nr_pinned_pages = 0;
>>> +    struct page **phys_contig_mem_regions = NULL;
>>> +    int rc = -EINVAL;
>>> +    struct slot_add_mem_req slot_add_mem_req = {};
>>> +
>>> +    rc = ne_sanity_check_user_mem_region(ne_enclave, mem_region);
>>> +    if (rc < 0)
>>> +        return rc;
>>> +
>>> +    ne_mem_region = kzalloc(sizeof(*ne_mem_region), GFP_KERNEL);
>>> +    if (!ne_mem_region)
>>> +        return -ENOMEM;
>>> +
>>> +    /*
>>> +     * TODO: Update nr_pages value to handle contiguous virtual address
>>> +     * ranges mapped to non-contiguous physical regions. Hugetlbfs can give
>>> +     * 2 MiB / 1 GiB contiguous physical regions.
>>> +     */
>>> +    ne_mem_region->nr_pages = mem_region->memory_size /
>>> +        NE_MIN_MEM_REGION_SIZE;
>>> +
>>> +    ne_mem_region->pages = kcalloc(ne_mem_region->nr_pages,
>>> +                       sizeof(*ne_mem_region->pages),
>>> +                       GFP_KERNEL);
>>> +    if (!ne_mem_region->pages) {
>>> +        kfree(ne_mem_region);
>>> +
>>> +        return -ENOMEM;
>>
>> kfree(NULL) is a nop, so you can just set rc and goto free_mem_region 
>> here and below.
> 
> Updated both return paths.
> 
>>
>>> +    }
>>> +
>>> +    phys_contig_mem_regions = kcalloc(ne_mem_region->nr_pages,
>>> +                      sizeof(*phys_contig_mem_regions),
>>> +                      GFP_KERNEL);
>>> +    if (!phys_contig_mem_regions) {
>>> +        kfree(ne_mem_region->pages);
>>> +        kfree(ne_mem_region);
>>> +
>>> +        return -ENOMEM;
>>> +    }
>>> +
>>> +    /*
>>> +     * TODO: Handle non-contiguous memory regions received from user space.
>>> +     * Hugetlbfs can give 2 MiB / 1 GiB contiguous physical regions. The
>>> +     * virtual address space can be seen as contiguous, although it is
>>> +     * mapped underneath to 2 MiB / 1 GiB physical regions e.g. 8 MiB
>>> +     * virtual address space mapped to 4 physically contiguous regions of 2
>>> +     * MiB.
>>> +     */
>>> +    do {
>>> +        unsigned long tmp_nr_pages = ne_mem_region->nr_pages -
>>> +            nr_pinned_pages;
>>> +        struct page **tmp_pages = ne_mem_region->pages +
>>> +            nr_pinned_pages;
>>> +        u64 tmp_userspace_addr = mem_region->userspace_addr +
>>> +            nr_pinned_pages * NE_MIN_MEM_REGION_SIZE;
>>> +
>>> +        gup_rc = get_user_pages(tmp_userspace_addr, tmp_nr_pages,
>>> +                    FOLL_GET, tmp_pages, NULL);
>>> +        if (gup_rc < 0) {
>>> +            rc = gup_rc;
>>> +
>>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>>> +                        "Error in gup [rc=%d]\n", rc);
>>> +
>>> +            unpin_user_pages(ne_mem_region->pages, nr_pinned_pages);
>>> +
>>> +            goto free_mem_region;
>>> +        }
>>> +
>>> +        nr_pinned_pages += gup_rc;
>>> +
>>> +    } while (nr_pinned_pages < ne_mem_region->nr_pages);
>>
>> Can this deadlock the kernel? Shouldn't we rather return an error when 
>> we can't pin all pages?
> 
> It shouldn't cause a deadlock, based on the return values:
> 
>  > Returns either number of pages pinned (which may be less than the
>  > number requested), or an error. Details about the return value:
>  >
>  > -- If nr_pages is 0, returns 0.
>  > -- If nr_pages is >0, but no pages were pinned, returns -errno.
>  > -- If nr_pages is >0, and some pages were pinned, returns the number of
>  > pages pinned. Again, this may be less than nr_pages.
> 
> 
> But I can update the logic to have all or nothing.
> 
>>
>>> +
>>> +    /*
>>> +     * TODO: Update checks once physically contiguous regions are collected
>>> +     * based on the user space address and get_user_pages() results.
>>> +     */
>>> +    for (i = 0; i < ne_mem_region->nr_pages; i++) {
>>> +        if (!PageHuge(ne_mem_region->pages[i])) {
>>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>>> +                        "Not a hugetlbfs page\n");
>>> +
>>> +            goto unpin_pages;
>>> +        }
>>> +
>>> +        if (huge_page_size(page_hstate(ne_mem_region->pages[i])) !=
>>> +            NE_MIN_MEM_REGION_SIZE) {
>>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>>> +                        "Page size isn't 2 MiB\n");
>>
>> Why is a huge page size of >2MB a problem? Can't we just make 
>> huge_page_size() the ne mem slot size?
> 
> It's not a problem, actually this is part of the TODO(s) from the 
> current function, to support contiguous regions larger than 2 MiB. It's 
> just that we started with 2 MiB. :)
> 
>>
>>> +
>>> +            goto unpin_pages;
>>> +        }
>>> +
>>> +        if (ne_enclave->numa_node !=
>>> +            page_to_nid(ne_mem_region->pages[i])) {
>>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>>> +                        "Page isn't from NUMA node %d\n",
>>> +                        ne_enclave->numa_node);
>>> +
>>> +            goto unpin_pages;
>>
>> Is there a way to give user space hints on *why* things are going wrong?
> 
> Yes, one option for the user space to have more insights is to have the 
> specific NE error codes you mentioned, so that we can improve the 
> experience even further.
> 
>>
>>> +        }
>>> +
>>> +        /*
>>> +         * TODO: Update once handled non-contiguous memory regions
>>> +         * received from user space.
>>> +         */
>>> +        phys_contig_mem_regions[i] = ne_mem_region->pages[i];
>>> +    }
>>> +
>>> +    /*
>>> +     * TODO: Update once handled non-contiguous memory regions received
>>> +     * from user space.
>>> +     */
>>> +    nr_phys_contig_mem_regions = ne_mem_region->nr_pages;
>>> +
>>> +    if ((ne_enclave->nr_mem_regions + nr_phys_contig_mem_regions) >
>>> +        ne_enclave->max_mem_regions) {
>>> +        dev_err_ratelimited(ne_misc_dev.this_device,
>>> +                    "Reached max memory regions %lld\n",
>>> +                    ne_enclave->max_mem_regions);
>>> +
>>> +        goto unpin_pages;
>>> +    }
>>> +
>>> +    for (i = 0; i < nr_phys_contig_mem_regions; i++) {
>>> +        u64 phys_addr = page_to_phys(phys_contig_mem_regions[i]);
>>> +
>>> +        slot_add_mem_req.slot_uid = ne_enclave->slot_uid;
>>> +        slot_add_mem_req.paddr = phys_addr;
>>> +        /*
>>> +         * TODO: Update memory size of physical contiguous memory
>>> +         * region, in case of non-contiguous memory regions received
>>> +         * from user space.
>>> +         */
>>> +        slot_add_mem_req.size = NE_MIN_MEM_REGION_SIZE;
>>
>> Yeah, for now, just make it huge_page_size()! :)
> 
> Yup, I'll handle this in order to have the option for other sizes, in 
> addition to 2 MiB e.g. 1 GiB for hugetlbfs.
> 
>>
>>> +
>>> +        rc = ne_do_request(ne_enclave->pdev, SLOT_ADD_MEM,
>>> +                   &slot_add_mem_req, sizeof(slot_add_mem_req),
>>> +                   &cmd_reply, sizeof(cmd_reply));
>>> +        if (rc < 0) {
>>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>>> +                        "Error in slot add mem [rc=%d]\n",
>>> +                        rc);
>>> +
>>> +            /* TODO: Only unpin memory regions not added. */
>>
>> Are we sure we're not creating an unusable system here?
> 
> The way the requests to the PCI device are structured is that we cannot 
> get back a memory region / CPU, once added, till the enclave is 
> terminated. Let's say there is an error in the remaining logic from the 
> ioctl, after the region is successfully added, then the memory region 
> can be given back to the primary / parent VM once the enclave 
> termination (including slot free) is done.
> 
> We can either have the logic handle one contiguous region per ioctl call 
> (user space gives a memory region that is backed by a single contiguous 
> physical memory region) or have a for loop to go through all contiguous 
> regions (user space gives a memory region that is backed by a set of 
> (smaller) contiguous physical memory regions). In the second case, if a 
> request to the NE PCI device fails, already added memory regions can be 
> given back only on slot free, triggered by the enclave termination, when 
> closing the enclave fd.

I'm in full agreement with you, but the logic here aborts mid-way, 
explicitly unpins all pages (does that mean use count is now -1 for 
some?) and does not keep track of the fact that some pages may be 
donated already. Does that mean that those pages may be reserved for the 
enclave, but passed to user space again?

I think in the error case, we should not unpin for now, because we can't 
guarantee that the "enclave device" isn't using those pages.
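
One way to keep track of this, sketched as a user space model: count how many regions the device accepted, and on a mid-way failure release only the ones it never saw. mock_slot_add_mem(), the pinned[] array and the fail_at knob are invented for illustration; pinned[i] = false stands in for unpinning the pages of region i.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device stub: the request for index fail_at fails. */
static int mock_slot_add_mem(unsigned long idx, long fail_at)
{
	return (fail_at >= 0 && idx == (unsigned long)fail_at) ? -EIO : 0;
}

/*
 * pinned[i] is expected to be true for every region on entry. On a
 * mid-way failure, regions already donated to the device stay pinned
 * (they can only come back on slot free); only the rest are released.
 */
static int add_regions(bool *pinned, unsigned long nr, long fail_at,
		       unsigned long *nr_donated)
{
	unsigned long i;
	int rc = 0;

	*nr_donated = 0;

	for (i = 0; i < nr; i++) {
		rc = mock_slot_add_mem(i, fail_at);
		if (rc < 0)
			break;

		(*nr_donated)++;
	}

	if (rc < 0) {
		/* Release only what the device never accepted. */
		for (i = *nr_donated; i < nr; i++)
			pinned[i] = false;
	}

	return rc;
}
```

The donated count would also have to be stored with the enclave so that the teardown path knows which pages to unpin after slot free.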


Alex







^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 11/18] nitro_enclaves: Add logic for enclave memory region set
  2020-07-09  8:40       ` Alexander Graf
@ 2020-07-09  9:41         ` Paraschiv, Andra-Irina
  0 siblings, 0 replies; 67+ messages in thread
From: Paraschiv, Andra-Irina @ 2020-07-09  9:41 UTC (permalink / raw)
  To: Alexander Graf, linux-kernel
  Cc: Anthony Liguori, Benjamin Herrenschmidt, Colm MacCarthaigh,
	Bjoern Doebel, David Woodhouse, Frank van der Linden, Greg KH,
	Martin Pohlack, Matt Wilson, Paolo Bonzini, Balbir Singh,
	Stefano Garzarella, Stefan Hajnoczi, Stewart Smith,
	Uwe Dannowski, kvm, ne-devel-upstream



On 09/07/2020 11:40, Alexander Graf wrote:
>
>
> On 09.07.20 09:36, Paraschiv, Andra-Irina wrote:
>>
>>
>> On 06/07/2020 13:46, Alexander Graf wrote:
>>>
>>>
>>> On 22.06.20 22:03, Andra Paraschiv wrote:
>>>> Another resource that is being set for an enclave is memory. User space
>>>> memory regions, that need to be backed by contiguous memory regions,
>>>> are associated with the enclave.
>>>>
>>>> One solution for allocating / reserving contiguous memory regions, that
>>>> is used for integration, is hugetlbfs. The user space process that is
>>>> associated with the enclave passes to the driver these memory regions.
>>>>
>>>> The enclave memory regions need to be from the same NUMA node as the
>>>> enclave CPUs.
>>>>
>>>> Add ioctl command logic for setting user space memory region for an
>>>> enclave.
>>>>
>>>> Signed-off-by: Alexandru Vasile <lexnv@amazon.com>
>>>> Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
>>>> ---
>>>> Changelog
>>>>
>>>> v3 -> v4
>>>>
>>>> * Check enclave memory regions are from the same NUMA node as the
>>>>    enclave CPUs.
>>>> * Use dev_err instead of custom NE log pattern.
>>>> * Update the NE ioctl call to match the decoupling from the KVM API.
>>>>
>>>> v2 -> v3
>>>>
>>>> * Remove the WARN_ON calls.
>>>> * Update static calls sanity checks.
>>>> * Update kzfree() calls to kfree().
>>>>
>>>> v1 -> v2
>>>>
>>>> * Add log pattern for NE.
>>>> * Update goto labels to match their purpose.
>>>> * Remove the BUG_ON calls.
>>>> * Check if enclave max memory regions is reached when setting an enclave
>>>>    memory region.
>>>> * Check if enclave state is init when setting an enclave memory region.
>>>> ---
>>>>   drivers/virt/nitro_enclaves/ne_misc_dev.c | 257 ++++++++++++++++++++++
>>>>   1 file changed, 257 insertions(+)
>>>>
>>>> diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c b/drivers/virt/nitro_enclaves/ne_misc_dev.c
>>>> index cfdefa52ed2a..17ccb6cdbd75 100644
>>>> --- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
>>>> +++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
>>>> @@ -476,6 +476,233 @@ static int ne_create_vcpu_ioctl(struct ne_enclave *ne_enclave, u32 vcpu_id)
>>>>       return rc;
>>>>   }
>>>>   +/**
>>>> + * ne_sanity_check_user_mem_region - Sanity check the userspace memory
>>>> + * region received during the set user memory region ioctl call.
>>>> + *
>>>> + * This function gets called with the ne_enclave mutex held.
>>>> + *
>>>> + * @ne_enclave: private data associated with the current enclave.
>>>> + * @mem_region: user space memory region to be sanity checked.
>>>> + *
>>>> + * @returns: 0 on success, negative return value on failure.
>>>> + */
>>>> +static int ne_sanity_check_user_mem_region(struct ne_enclave *ne_enclave,
>>>> +    struct ne_user_memory_region *mem_region)
>>>> +{
>>>> +    if (ne_enclave->mm != current->mm)
>>>> +        return -EIO;
>>>> +
>>>> +    if ((mem_region->memory_size % NE_MIN_MEM_REGION_SIZE) != 0) {
>>>> +        dev_err_ratelimited(ne_misc_dev.this_device,
>>>> +                    "Mem size not multiple of 2 MiB\n");
>>>> +
>>>> +        return -EINVAL;
>>>
>>> Can we make this an error that gets propagated to user space 
>>> explicitly? I'd rather have a clear error return value of this 
>>> function than a random message in dmesg.
>>
>> We can make this, will add memory checks specific NE error codes, as 
>> for the other call paths in the series e.g. enclave CPU(s) setup.
>>
>>>
>>>> +    }
>>>> +
>>>> +    if ((mem_region->userspace_addr & (NE_MIN_MEM_REGION_SIZE - 1)) ||
>>>
>>> This logic already relies on the fact that NE_MIN_MEM_REGION_SIZE is 
>>> a power of two. Can you do the same above on the memory_size check?
>>
>> Done.
>>
>>>
>>>> +        !access_ok((void __user *)(unsigned long)mem_region->userspace_addr,
>>>> +               mem_region->memory_size)) {
>>>> +        dev_err_ratelimited(ne_misc_dev.this_device,
>>>> +                    "Invalid user space addr range\n");
>>>> +
>>>> +        return -EINVAL;
>>>
>>> Same comment again. Return different errors for different 
>>> conditions, so that user space has a chance to print proper errors 
>>> to its users.
>>>
>>> Also, don't we have to check alignment of userspace_addr as well?
>>>
>>
>> Would need an alignment check for 2 MiB at least, yes.
>>
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +/**
>>>> + * ne_set_user_memory_region_ioctl - Add user space memory region to the slot
>>>> + * associated with the current enclave.
>>>> + *
>>>> + * This function gets called with the ne_enclave mutex held.
>>>> + *
>>>> + * @ne_enclave: private data associated with the current enclave.
>>>> + * @mem_region: user space memory region to be associated with the given slot.
>>>> + *
>>>> + * @returns: 0 on success, negative return value on failure.
>>>> + */
>>>> +static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
>>>> +    struct ne_user_memory_region *mem_region)
>>>> +{
>>>> +    struct ne_pci_dev_cmd_reply cmd_reply = {};
>>>> +    long gup_rc = 0;
>>>> +    unsigned long i = 0;
>>>> +    struct ne_mem_region *ne_mem_region = NULL;
>>>> +    unsigned long nr_phys_contig_mem_regions = 0;
>>>> +    unsigned long nr_pinned_pages = 0;
>>>> +    struct page **phys_contig_mem_regions = NULL;
>>>> +    int rc = -EINVAL;
>>>> +    struct slot_add_mem_req slot_add_mem_req = {};
>>>> +
>>>> +    rc = ne_sanity_check_user_mem_region(ne_enclave, mem_region);
>>>> +    if (rc < 0)
>>>> +        return rc;
>>>> +
>>>> +    ne_mem_region = kzalloc(sizeof(*ne_mem_region), GFP_KERNEL);
>>>> +    if (!ne_mem_region)
>>>> +        return -ENOMEM;
>>>> +
>>>> +    /*
>>>> +     * TODO: Update nr_pages value to handle contiguous virtual address
>>>> +     * ranges mapped to non-contiguous physical regions. Hugetlbfs can give
>>>> +     * 2 MiB / 1 GiB contiguous physical regions.
>>>> +     */
>>>> +    ne_mem_region->nr_pages = mem_region->memory_size /
>>>> +        NE_MIN_MEM_REGION_SIZE;
>>>> +
>>>> +    ne_mem_region->pages = kcalloc(ne_mem_region->nr_pages,
>>>> +                       sizeof(*ne_mem_region->pages),
>>>> +                       GFP_KERNEL);
>>>> +    if (!ne_mem_region->pages) {
>>>> +        kfree(ne_mem_region);
>>>> +
>>>> +        return -ENOMEM;
>>>
>>> kfree(NULL) is a nop, so you can just set rc and goto 
>>> free_mem_region here and below.
>>
>> Updated both return paths.
>>
>>>
>>>> +    }
>>>> +
>>>> +    phys_contig_mem_regions = kcalloc(ne_mem_region->nr_pages,
>>>> +                      sizeof(*phys_contig_mem_regions),
>>>> +                      GFP_KERNEL);
>>>> +    if (!phys_contig_mem_regions) {
>>>> +        kfree(ne_mem_region->pages);
>>>> +        kfree(ne_mem_region);
>>>> +
>>>> +        return -ENOMEM;
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * TODO: Handle non-contiguous memory regions received from user space.
>>>> +     * Hugetlbfs can give 2 MiB / 1 GiB contiguous physical regions. The
>>>> +     * virtual address space can be seen as contiguous, although it is
>>>> +     * mapped underneath to 2 MiB / 1 GiB physical regions e.g. 8 MiB
>>>> +     * virtual address space mapped to 4 physically contiguous regions of 2
>>>> +     * MiB.
>>>> +     */
>>>> +    do {
>>>> +        unsigned long tmp_nr_pages = ne_mem_region->nr_pages -
>>>> +            nr_pinned_pages;
>>>> +        struct page **tmp_pages = ne_mem_region->pages +
>>>> +            nr_pinned_pages;
>>>> +        u64 tmp_userspace_addr = mem_region->userspace_addr +
>>>> +            nr_pinned_pages * NE_MIN_MEM_REGION_SIZE;
>>>> +
>>>> +        gup_rc = get_user_pages(tmp_userspace_addr, tmp_nr_pages,
>>>> +                    FOLL_GET, tmp_pages, NULL);
>>>> +        if (gup_rc < 0) {
>>>> +            rc = gup_rc;
>>>> +
>>>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>>>> +                        "Error in gup [rc=%d]\n", rc);
>>>> +
>>>> +            unpin_user_pages(ne_mem_region->pages, nr_pinned_pages);
>>>> +
>>>> +            goto free_mem_region;
>>>> +        }
>>>> +
>>>> +        nr_pinned_pages += gup_rc;
>>>> +
>>>> +    } while (nr_pinned_pages < ne_mem_region->nr_pages);
>>>
>>> Can this deadlock the kernel? Shouldn't we rather return an error 
>>> when we can't pin all pages?
>>
>> It shouldn't cause a deadlock, based on the return values:
>>
>>  > Returns either number of pages pinned (which may be less than the
>>  > number requested), or an error. Details about the return value:
>>  >
>>  > -- If nr_pages is 0, returns 0.
>>  > -- If nr_pages is >0, but no pages were pinned, returns -errno.
>>  > -- If nr_pages is >0, and some pages were pinned, returns the number of
>>  > pages pinned. Again, this may be less than nr_pages.
>>
>>
>> But I can update the logic to have all or nothing.
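A minimal sketch of the "all or nothing" variant mentioned above, written as
a user-space simulation rather than kernel code. struct fake_mm and the
fake_pin_pages() / fake_unpin_pages() helpers are hypothetical stand-ins that
only mimic the quoted return contract (pages pinned, possibly fewer than
requested, or -errno); the choice of -ENOMEM for a partial pin is likewise an
assumption of this example.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the process address space in this sketch. */
struct fake_mm {
	size_t pinnable;	/* pages that can be pinned right now */
	size_t pinned;		/* pages currently pinned */
};

/*
 * Mimics the quoted return contract: number of pages pinned (possibly fewer
 * than requested), or -errno when nothing could be pinned.
 */
static long fake_pin_pages(struct fake_mm *mm, size_t nr_pages)
{
	size_t n = nr_pages < mm->pinnable ? nr_pages : mm->pinnable;

	if (nr_pages == 0)
		return 0;
	if (n == 0)
		return -EFAULT;

	mm->pinnable -= n;
	mm->pinned += n;
	return (long)n;
}

static void fake_unpin_pages(struct fake_mm *mm, size_t nr_pages)
{
	mm->pinned -= nr_pages;
	mm->pinnable += nr_pages;
}

/* All or nothing: a partial pin is undone and reported as an error. */
static int pin_all_or_nothing(struct fake_mm *mm, size_t nr_pages)
{
	long rc = fake_pin_pages(mm, nr_pages);

	if (rc < 0)
		return (int)rc;

	if ((size_t)rc != nr_pages) {
		/* Release only what this call pinned, then fail. */
		fake_unpin_pages(mm, (size_t)rc);
		return -ENOMEM;
	}

	return 0;
}
```

A single attempt followed by an explicit failure avoids the retry loop
altogether, so there is no question of how long the loop may spin.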
>>
>>>
>>>> +
>>>> +    /*
>>>> +     * TODO: Update checks once physically contiguous regions are collected
>>>> +     * based on the user space address and get_user_pages() results.
>>>> +     */
>>>> +    for (i = 0; i < ne_mem_region->nr_pages; i++) {
>>>> +        if (!PageHuge(ne_mem_region->pages[i])) {
>>>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>>>> +                        "Not a hugetlbfs page\n");
>>>> +
>>>> +            goto unpin_pages;
>>>> +        }
>>>> +
>>>> +        if (huge_page_size(page_hstate(ne_mem_region->pages[i])) !=
>>>> +            NE_MIN_MEM_REGION_SIZE) {
>>>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>>>> +                        "Page size isn't 2 MiB\n");
>>>
>>> Why is a huge page size of >2MB a problem? Can't we just make 
>>> huge_page_size() the ne mem slot size?
>>
>> It's not a problem, actually this is part of the TODO(s) from the 
>> current function, to support contiguous regions larger than 2 MiB. 
>> It's just that we started with 2 MiB. :)
>>
>>>
>>>> +
>>>> +            goto unpin_pages;
>>>> +        }
>>>> +
>>>> +        if (ne_enclave->numa_node !=
>>>> +            page_to_nid(ne_mem_region->pages[i])) {
>>>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>>>> +                        "Page isn't from NUMA node %d\n",
>>>> +                        ne_enclave->numa_node);
>>>> +
>>>> +            goto unpin_pages;
>>>
>>> Is there a way to give user space hints on *why* things are going 
>>> wrong?
>>
>> Yes, one option for the user space to have more insights is to have 
>> the specific NE error codes you mentioned, so that we can improve the 
>> experience even further.
>>
>>>
>>>> +        }
>>>> +
>>>> +        /*
>>>> +         * TODO: Update once handled non-contiguous memory regions
>>>> +         * received from user space.
>>>> +         */
>>>> +        phys_contig_mem_regions[i] = ne_mem_region->pages[i];
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * TODO: Update once handled non-contiguous memory regions received
>>>> +     * from user space.
>>>> +     */
>>>> +    nr_phys_contig_mem_regions = ne_mem_region->nr_pages;
>>>> +
>>>> +    if ((ne_enclave->nr_mem_regions + nr_phys_contig_mem_regions) >
>>>> +        ne_enclave->max_mem_regions) {
>>>> +        dev_err_ratelimited(ne_misc_dev.this_device,
>>>> +                    "Reached max memory regions %lld\n",
>>>> +                    ne_enclave->max_mem_regions);
>>>> +
>>>> +        goto unpin_pages;
>>>> +    }
>>>> +
>>>> +    for (i = 0; i < nr_phys_contig_mem_regions; i++) {
>>>> +        u64 phys_addr = page_to_phys(phys_contig_mem_regions[i]);
>>>> +
>>>> +        slot_add_mem_req.slot_uid = ne_enclave->slot_uid;
>>>> +        slot_add_mem_req.paddr = phys_addr;
>>>> +        /*
>>>> +         * TODO: Update memory size of physical contiguous memory
>>>> +         * region, in case of non-contiguous memory regions received
>>>> +         * from user space.
>>>> +         */
>>>> +        slot_add_mem_req.size = NE_MIN_MEM_REGION_SIZE;
>>>
>>> Yeah, for now, just make it huge_page_size()! :)
>>
>> Yup, I'll handle this in order to have the option for other sizes, in 
>> addition to 2 MiB e.g. 1 GiB for hugetlbfs.
>>
>>>
>>>> +
>>>> +        rc = ne_do_request(ne_enclave->pdev, SLOT_ADD_MEM,
>>>> +                   &slot_add_mem_req, sizeof(slot_add_mem_req),
>>>> +                   &cmd_reply, sizeof(cmd_reply));
>>>> +        if (rc < 0) {
>>>> +            dev_err_ratelimited(ne_misc_dev.this_device,
>>>> +                        "Error in slot add mem [rc=%d]\n",
>>>> +                        rc);
>>>> +
>>>> +            /* TODO: Only unpin memory regions not added. */
>>>
>>> Are we sure we're not creating an unusable system here?
>>
>> The way the requests to the PCI device are structured is that we 
>> cannot get back a memory region / CPU, once added, till the enclave 
>> is terminated. Let's say there is an error in the remaining logic 
>> from the ioctl, after the region is successfully added, then the 
>> memory region can be given back to the primary / parent VM once the 
>> enclave termination (including slot free) is done.
>>
>> We can either have the logic handle one contiguous region per ioctl 
>> call (user space gives a memory region that is backed by a single 
>> contiguous physical memory region) or have a for loop to go through 
>> all contiguous regions (user space gives a memory region that is 
>> backed by a set of (smaller) contiguous physical memory regions). In 
>> the second case, if a request to the NE PCI device fails, already 
>> added memory regions can be given back only on slot free, triggered 
>> by the enclave termination, when closing the enclave fd.
>
> I'm in full agreement with you, but the logic here aborts mid-way, 
> explicitly unpins all pages (does that mean use count is now -1 for 
> some?) and does not keep track of the fact that some pages may be 
> donated already. Does that mean that those pages may be reserved for 
> the enclave, but passed to user space again?
>
> I think in the error case, we should not unpin for now, because we 
> can't guarantee that the "enclave device" isn't using those pages.

True, it's somewhere in the middle. It didn't seem OK to me either, 
which is why I left the TODO in that block when considering this possible 
scenario.

After writing the previous reply, I changed the logic so that, in case of 
error, the function exits without unpinning pages or removing the memory 
region state that we are keeping track of. This is similar to what you've 
suggested above.
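
The error handling described above can be sketched as a small user-space
model. This is not the driver code: struct region_state, donate_regions(),
release_on_slot_free() and the fake_add_mem() callback are hypothetical names
used only to illustrate the idea that a failed request leaves pins and
bookkeeping untouched, and that everything is released only on slot free.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-region bookkeeping for this sketch. */
struct region_state {
	int pinned;	/* pages of this region are still pinned */
	int donated;	/* region was accepted by the device */
};

/* Hypothetical stand-in for the "slot add mem" device request. */
typedef int (*add_mem_fn)(size_t idx, void *ctx);

/* Fails with -EIO at the index stored in ctx, succeeds otherwise. */
static int fake_add_mem(size_t idx, void *ctx)
{
	size_t fail_at = *(size_t *)ctx;

	return idx == fail_at ? -EIO : 0;
}

/*
 * On error, exit without unpinning and without dropping state: the device
 * may already be using regions donated earlier in the loop.
 */
static int donate_regions(struct region_state *regions, size_t nr,
			  add_mem_fn add_mem, void *ctx)
{
	for (size_t i = 0; i < nr; i++) {
		int rc = add_mem(i, ctx);

		if (rc < 0)
			return rc;

		regions[i].donated = 1;
	}

	return 0;
}

/* Only enclave termination (slot free) gives the memory back. */
static void release_on_slot_free(struct region_state *regions, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		regions[i].donated = 0;
		regions[i].pinned = 0;
	}
}
```

Keeping the bookkeeping until slot free means the release path does not need
to know at which point a request failed.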

Thanks,
Andra







Thread overview: 67+ messages
2020-06-22 20:03 [PATCH v4 00/18] Add support for Nitro Enclaves Andra Paraschiv
2020-06-22 20:03 ` [PATCH v4 01/18] nitro_enclaves: Add ioctl interface definition Andra Paraschiv
2020-06-23  8:56   ` Stefan Hajnoczi
2020-06-24 14:02     ` Paraschiv, Andra-Irina
2020-06-25 13:29       ` Stefan Hajnoczi
2020-06-25 17:42         ` Paraschiv, Andra-Irina
2020-07-02 15:24   ` Alexander Graf
2020-07-04  8:09     ` Paraschiv, Andra-Irina
2020-06-22 20:03 ` [PATCH v4 02/18] nitro_enclaves: Define the PCI device interface Andra Paraschiv
2020-07-02 15:24   ` Alexander Graf
2020-07-04  8:20     ` Paraschiv, Andra-Irina
2020-06-22 20:03 ` [PATCH v4 03/18] nitro_enclaves: Define enclave info for internal bookkeeping Andra Paraschiv
2020-07-02 15:24   ` Alexander Graf
2020-07-04  8:23     ` Paraschiv, Andra-Irina
2020-06-22 20:03 ` [PATCH v4 04/18] nitro_enclaves: Init PCI device driver Andra Paraschiv
2020-07-02 15:09   ` Alexander Graf
2020-07-04 10:00     ` Paraschiv, Andra-Irina
2020-06-22 20:03 ` [PATCH v4 05/18] nitro_enclaves: Handle PCI device command requests Andra Paraschiv
2020-07-02 15:19   ` Alexander Graf
2020-07-04 15:05     ` Paraschiv, Andra-Irina
2020-06-22 20:03 ` [PATCH v4 06/18] nitro_enclaves: Handle out-of-band PCI device events Andra Paraschiv
2020-07-02 15:24   ` Alexander Graf
2020-07-04 15:43     ` Paraschiv, Andra-Irina
2020-06-22 20:03 ` [PATCH v4 07/18] nitro_enclaves: Init misc device providing the ioctl interface Andra Paraschiv
2020-06-29 16:20   ` Greg KH
2020-06-29 17:45     ` Paraschiv, Andra-Irina
2020-06-30  8:05       ` Greg KH
2020-06-30  9:08         ` Paraschiv, Andra-Irina
2020-07-06  7:13   ` Alexander Graf
2020-07-06  7:49     ` Paraschiv, Andra-Irina
2020-07-06  8:01       ` Alexander Graf
2020-07-06 13:09         ` Paraschiv, Andra-Irina
2020-06-22 20:03 ` [PATCH v4 08/18] nitro_enclaves: Add logic for enclave vm creation Andra Paraschiv
2020-07-06  7:53   ` Alexander Graf
2020-07-06 13:12     ` Paraschiv, Andra-Irina
2020-06-22 20:03 ` [PATCH v4 09/18] nitro_enclaves: Add logic for enclave vcpu creation Andra Paraschiv
2020-07-06 10:12   ` Alexander Graf
2020-07-08 12:46     ` Paraschiv, Andra-Irina
2020-06-22 20:03 ` [PATCH v4 10/18] nitro_enclaves: Add logic for enclave image load info Andra Paraschiv
2020-07-06 10:16   ` Alexander Graf
2020-07-06 13:35     ` Paraschiv, Andra-Irina
2020-06-22 20:03 ` [PATCH v4 11/18] nitro_enclaves: Add logic for enclave memory region set Andra Paraschiv
2020-07-06 10:46   ` Alexander Graf
2020-07-09  7:36     ` Paraschiv, Andra-Irina
2020-07-09  8:40       ` Alexander Graf
2020-07-09  9:41         ` Paraschiv, Andra-Irina
2020-06-22 20:03 ` [PATCH v4 12/18] nitro_enclaves: Add logic for enclave start Andra Paraschiv
2020-07-06 11:21   ` Alexander Graf
2020-07-07 18:27     ` Paraschiv, Andra-Irina
2020-06-22 20:03 ` [PATCH v4 13/18] nitro_enclaves: Add logic for enclave termination Andra Paraschiv
2020-07-06 11:26   ` Alexander Graf
2020-07-06 14:15     ` Paraschiv, Andra-Irina
2020-06-22 20:03 ` [PATCH v4 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver Andra Paraschiv
2020-07-06 11:28   ` Alexander Graf
2020-07-06 13:50     ` Paraschiv, Andra-Irina
2020-06-22 20:03 ` [PATCH v4 15/18] nitro_enclaves: Add Makefile " Andra Paraschiv
2020-07-06 11:30   ` Alexander Graf
2020-07-06 14:00     ` Paraschiv, Andra-Irina
2020-06-22 20:03 ` [PATCH v4 16/18] nitro_enclaves: Add sample for ioctl interface usage Andra Paraschiv
2020-07-06 11:39   ` Alexander Graf
2020-07-07 19:03     ` Paraschiv, Andra-Irina
2020-06-22 20:03 ` [PATCH v4 17/18] nitro_enclaves: Add overview documentation Andra Paraschiv
2020-06-23  8:59   ` Stefan Hajnoczi
2020-06-24 14:39     ` Paraschiv, Andra-Irina
2020-06-25 13:10       ` Stefan Hajnoczi
2020-06-25 17:36         ` Paraschiv, Andra-Irina
2020-06-22 20:03 ` [PATCH v4 18/18] MAINTAINERS: Add entry for the Nitro Enclaves driver Andra Paraschiv
