All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages
@ 2021-10-12 10:57 Paolo Bonzini
  2021-10-12 10:57 ` [PATCH v2 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page Paolo Bonzini
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Paolo Bonzini @ 2021-10-12 10:57 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: dave.hansen, seanjc, x86, yang.zhong, jarkko

Add to /dev/sgx_vepc a ioctl that brings vEPC pages back to uninitialized
state with EREMOVE.  This is useful in order to match the expectations
of guests after reboot, and to match the behavior of real hardware.

The ioctl is a cleaner alternative to closing and reopening the
/dev/sgx_vepc device; reopening /dev/sgx_vepc could be problematic in
case userspace has sandboxed itself since the time it first opened the
device, and has thus lost permissions to do so.

If possible, I would like these patches to be included in 5.15 through
either the x86 or the KVM tree.

Thanks,

Paolo

Changes from RFC:
- improved commit messages, added documentation
- renamed ioctl from SGX_IOC_VEPC_REMOVE to SGX_IOC_VEPC_REMOVE_ALL

Change from v1:
- fixed documentation and code to cover SGX_ENCLAVE_ACT errors
- removed Tested-by since the code is quite different now

Paolo Bonzini (2):
  x86: sgx_vepc: extract sgx_vepc_remove_page
  x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE_ALL ioctl

 Documentation/x86/sgx.rst       | 26 +++++++++++++
 arch/x86/include/asm/sgx.h      |  3 ++
 arch/x86/include/uapi/asm/sgx.h |  2 +
 arch/x86/kernel/cpu/sgx/virt.c  | 69 ++++++++++++++++++++++++++++++---
 4 files changed, 95 insertions(+), 5 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-10-12 10:57 [PATCH v2 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages Paolo Bonzini
@ 2021-10-12 10:57 ` Paolo Bonzini
  2021-10-12 16:53   ` Jarkko Sakkinen
                     ` (2 more replies)
  2021-10-12 10:57 ` [PATCH v2 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE ioctl Paolo Bonzini
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 17+ messages in thread
From: Paolo Bonzini @ 2021-10-12 10:57 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: dave.hansen, seanjc, x86, yang.zhong, jarkko

For bare-metal SGX on real hardware, the hardware provides guarantees
SGX state at reboot.  For instance, all pages start out uninitialized.
The vepc driver provides a similar guarantee today for freshly-opened
vepc instances, but guests such as Windows expect all pages to be in
uninitialized state on startup, including after every guest reboot.

One way to do this is to simply close and reopen the /dev/sgx_vepc file
descriptor and re-mmap the virtual EPC.  However, this is problematic
because it prevents sandboxing the userspace (for example forbidding
open() after the guest starts; this is doable with heavy use of SCM_RIGHTS
file descriptor passing).

In order to implement this, we will need a ioctl that performs
EREMOVE on all pages mapped by a /dev/sgx_vepc file descriptor:
other possibilities, such as closing and reopening the device,
are racy.

Start the implementation by creating a separate function with just
the __eremove wrapper.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
	v1->v2: keep WARN in sgx_vepc_free_page

 arch/x86/kernel/cpu/sgx/virt.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 64511c4a5200..59cdf3f742ac 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -111,10 +111,8 @@ static int sgx_vepc_mmap(struct file *file, struct vm_area_struct *vma)
 	return 0;
 }
 
-static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
+static int sgx_vepc_remove_page(struct sgx_epc_page *epc_page)
 {
-	int ret;
-
 	/*
 	 * Take a previously guest-owned EPC page and return it to the
 	 * general EPC page pool.
@@ -124,7 +122,12 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
 	 * case that a guest properly EREMOVE'd this page, a superfluous
 	 * EREMOVE is harmless.
 	 */
-	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
+	return __eremove(sgx_get_epc_virt_addr(epc_page));
+}
+
+static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
+{
+	int ret = sgx_vepc_remove_page(epc_page);
 	if (ret) {
 		/*
 		 * Only SGX_CHILD_PRESENT is expected, which is because of
@@ -144,7 +147,6 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
 	}
 
 	sgx_free_epc_page(epc_page);
-
 	return 0;
 }
 
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE ioctl
  2021-10-12 10:57 [PATCH v2 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages Paolo Bonzini
  2021-10-12 10:57 ` [PATCH v2 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page Paolo Bonzini
@ 2021-10-12 10:57 ` Paolo Bonzini
  2021-10-12 16:57   ` Jarkko Sakkinen
                     ` (3 more replies)
  2021-10-13  6:54 ` [PATCH v2 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages Borislav Petkov
  2021-10-14 12:21 ` Yang Zhong
  3 siblings, 4 replies; 17+ messages in thread
From: Paolo Bonzini @ 2021-10-12 10:57 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: dave.hansen, seanjc, x86, yang.zhong, jarkko

For bare-metal SGX on real hardware, the hardware provides guarantees
SGX state at reboot.  For instance, all pages start out uninitialized.
The vepc driver provides a similar guarantee today for freshly-opened
vepc instances, but guests such as Windows expect all pages to be in
uninitialized state on startup, including after every guest reboot.

Some userspace implementations of virtual SGX would rather avoid having
to close and reopen the /dev/sgx_vepc file descriptor and re-mmap the
virtual EPC.  For example, they could sandbox themselves after the guest
starts and forbid further calls to open(), in order to mitigate exploits
from untrusted guests.

Therefore, add a ioctl that does this with EREMOVE.  Userspace can
invoke the ioctl to bring its vEPC pages back to uninitialized state.
There is a possibility that some pages fail to be removed if they are
SECS pages, and the child and SECS pages could be in separate vEPC
regions.  Therefore, the ioctl returns the number of EREMOVE failures,
telling userspace to try the ioctl again after it's done with all
vEPC regions.  A more verbose description of the correct usage and
the possible error conditions is documented in sgx.rst.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
	v1->v2: return EBUSY for SGX_ENCLAVE_ACT, adjust docs
		add cond_resched

 Documentation/x86/sgx.rst       | 26 +++++++++++++++
 arch/x86/include/asm/sgx.h      |  3 ++
 arch/x86/include/uapi/asm/sgx.h |  2 ++
 arch/x86/kernel/cpu/sgx/virt.c  | 57 +++++++++++++++++++++++++++++++++
 4 files changed, 88 insertions(+)

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index dd0ac96ff9ef..7bc9c3b297d6 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -250,3 +250,29 @@ user wants to deploy SGX applications both on the host and in guests
 on the same machine, the user should reserve enough EPC (by taking out
 total virtual EPC size of all SGX VMs from the physical EPC size) for
 host SGX applications so they can run with acceptable performance.
+
+Architectural behavior is to restore all EPC pages to an uninitialized
+state also after a guest reboot.  Because this state can be reached only
+through the privileged ``ENCLS[EREMOVE]`` instruction, ``/dev/sgx_vepc``
+provides the ``SGX_IOC_VEPC_REMOVE_ALL`` ioctl to execute the instruction
+on all pages in the virtual EPC.
+
+``EREMOVE`` can fail for two reasons, which Linux relays to userspace
+in a different manner:
+
+1. Page removal will always fail when any thread is running in the
+   enclave to which the page belongs.  In this case the ioctl will
+   return ``EBUSY`` independent of whether it has successfully removed
+   some pages; userspace can avoid these failures by preventing execution
+   of any vcpu which maps the virtual EPC.
+
+2) Page removal will also fail for SGX "SECS" metadata pages which still
+   have child pages.  Child pages can be removed by executing
+   ``SGX_IOC_VEPC_REMOVE_ALL`` on all ``/dev/sgx_vepc`` file descriptors
+   mapped into the guest.  This means that the ioctl() must be called
+   twice: an initial set of calls to remove child pages and a subsequent
+   set of calls to remove SECS pages.  The second set of calls is only
+   required for those mappings that returned a nonzero value from the
+   first call.  It indicates a bug in the kernel or the userspace client
+   if any of the second round of ``SGX_IOC_VEPC_REMOVE_ALL`` calls has
+   a return code other than 0.
diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 05f3e21f01a7..2e5d8c97655e 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -50,6 +50,8 @@ enum sgx_encls_function {
  * %SGX_NOT_TRACKED:		Previous ETRACK's shootdown sequence has not
  *				been completed yet.
  * %SGX_CHILD_PRESENT		SECS has child pages present in the EPC.
+ * %SGX_ENCLAVE_ACT		One or more logical processors are executing
+ *				inside the enclave.
  * %SGX_INVALID_EINITTOKEN:	EINITTOKEN is invalid and enclave signer's
  *				public key does not match IA32_SGXLEPUBKEYHASH.
  * %SGX_UNMASKED_EVENT:		An unmasked event, e.g. INTR, was received
@@ -57,6 +59,7 @@ enum sgx_encls_function {
 enum sgx_return_code {
 	SGX_NOT_TRACKED			= 11,
 	SGX_CHILD_PRESENT		= 13,
+	SGX_ENCLAVE_ACT			= 14,
 	SGX_INVALID_EINITTOKEN		= 16,
 	SGX_UNMASKED_EVENT		= 128,
 };
diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index 9690d6899ad9..f4b81587e90b 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -27,6 +27,8 @@ enum sgx_page_flags {
 	_IOW(SGX_MAGIC, 0x02, struct sgx_enclave_init)
 #define SGX_IOC_ENCLAVE_PROVISION \
 	_IOW(SGX_MAGIC, 0x03, struct sgx_enclave_provision)
+#define SGX_IOC_VEPC_REMOVE_ALL \
+	_IO(SGX_MAGIC, 0x04)
 
 /**
  * struct sgx_enclave_create - parameter structure for the
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 59cdf3f742ac..81a0a0f22007 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -150,6 +150,46 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
 	return 0;
 }
 
+static long sgx_vepc_remove_all(struct sgx_vepc *vepc)
+{
+	struct sgx_epc_page *entry;
+	unsigned long index;
+	long failures = 0;
+
+	xa_for_each(&vepc->page_array, index, entry) {
+		int ret = sgx_vepc_remove_page(entry);
+		switch (ret) {
+		case 0:
+			break;
+
+		case SGX_CHILD_PRESENT:
+			failures++;
+			break;
+
+		case SGX_ENCLAVE_ACT:
+			/*
+			 * Unlike in sgx_vepc_free_page, userspace could be calling
+			 * the ioctl while logical processors are running in the
+			 * enclave; do not warn.
+			 */
+			return -EBUSY;
+
+		default:
+			WARN_ONCE(1, EREMOVE_ERROR_MESSAGE, ret, ret);
+			failures++;
+			break;
+		}
+		cond_resched();
+	}
+
+	/*
+	 * Return the number of pages that failed to be removed, so
+	 * userspace knows that there are still SECS pages lying
+	 * around.
+	 */
+	return failures;
+}
+
 static int sgx_vepc_release(struct inode *inode, struct file *file)
 {
 	struct sgx_vepc *vepc = file->private_data;
@@ -235,9 +274,27 @@ static int sgx_vepc_open(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static long sgx_vepc_ioctl(struct file *file,
+			   unsigned int cmd, unsigned long arg)
+{
+	struct sgx_vepc *vepc = file->private_data;
+
+	switch (cmd) {
+	case SGX_IOC_VEPC_REMOVE_ALL:
+		if (arg)
+			return -EINVAL;
+		return sgx_vepc_remove_all(vepc);
+
+	default:
+		return -ENOTTY;
+	}
+}
+
 static const struct file_operations sgx_vepc_fops = {
 	.owner		= THIS_MODULE,
 	.open		= sgx_vepc_open,
+	.unlocked_ioctl	= sgx_vepc_ioctl,
+	.compat_ioctl	= sgx_vepc_ioctl,
 	.release	= sgx_vepc_release,
 	.mmap		= sgx_vepc_mmap,
 };
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-10-12 10:57 ` [PATCH v2 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page Paolo Bonzini
@ 2021-10-12 16:53   ` Jarkko Sakkinen
  2021-10-13 12:56   ` [tip: x86/sgx] x86/sgx/virt: Extract sgx_vepc_remove_page() tip-bot2 for Paolo Bonzini
  2021-10-14 22:10   ` [PATCH v2 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page Dave Hansen
  2 siblings, 0 replies; 17+ messages in thread
From: Jarkko Sakkinen @ 2021-10-12 16:53 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: dave.hansen, seanjc, x86, yang.zhong

On Tue, 2021-10-12 at 06:57 -0400, Paolo Bonzini wrote:
> For bare-metal SGX on real hardware, the hardware provides guarantees
> SGX state at reboot.  For instance, all pages start out uninitialized.
> The vepc driver provides a similar guarantee today for freshly-opened
> vepc instances, but guests such as Windows expect all pages to be in
> uninitialized state on startup, including after every guest reboot.
> 
> One way to do this is to simply close and reopen the /dev/sgx_vepc file
> descriptor and re-mmap the virtual EPC.  However, this is problematic
> because it prevents sandboxing the userspace (for example forbidding
> open() after the guest starts; this is doable with heavy use of SCM_RIGHTS
> file descriptor passing).
> 
> In order to implement this, we will need a ioctl that performs
> EREMOVE on all pages mapped by a /dev/sgx_vepc file descriptor:
> other possibilities, such as closing and reopening the device,
> are racy.
> 
> Start the implementation by creating a separate function with just
> the __eremove wrapper.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: keep WARN in sgx_vepc_free_page
> 
>  arch/x86/kernel/cpu/sgx/virt.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> index 64511c4a5200..59cdf3f742ac 100644
> --- a/arch/x86/kernel/cpu/sgx/virt.c
> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> @@ -111,10 +111,8 @@ static int sgx_vepc_mmap(struct file *file, struct vm_area_struct *vma)
>         return 0;
>  }
>  
> -static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
> +static int sgx_vepc_remove_page(struct sgx_epc_page *epc_page)
>  {
> -       int ret;
> -
>         /*
>          * Take a previously guest-owned EPC page and return it to the
>          * general EPC page pool.
> @@ -124,7 +122,12 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
>          * case that a guest properly EREMOVE'd this page, a superfluous
>          * EREMOVE is harmless.
>          */
> -       ret = __eremove(sgx_get_epc_virt_addr(epc_page));
> +       return __eremove(sgx_get_epc_virt_addr(epc_page));
> +}
> +
> +static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
> +{
> +       int ret = sgx_vepc_remove_page(epc_page);
>         if (ret) {
>                 /*
>                  * Only SGX_CHILD_PRESENT is expected, which is because of
> @@ -144,7 +147,6 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
>         }
>  
>         sgx_free_epc_page(epc_page);
> -
>         return 0;
>  }
>  

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

/Jarkko


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE ioctl
  2021-10-12 10:57 ` [PATCH v2 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE ioctl Paolo Bonzini
@ 2021-10-12 16:57   ` Jarkko Sakkinen
  2021-10-12 17:03     ` Paolo Bonzini
  2021-10-13 12:56   ` [tip: x86/sgx] x86/sgx/virt: Implement " tip-bot2 for Paolo Bonzini
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Jarkko Sakkinen @ 2021-10-12 16:57 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: dave.hansen, seanjc, x86, yang.zhong

On Tue, 2021-10-12 at 06:57 -0400, Paolo Bonzini wrote:
> For bare-metal SGX on real hardware, the hardware provides guarantees
> SGX state at reboot.  For instance, all pages start out uninitialized.
> The vepc driver provides a similar guarantee today for freshly-opened
> vepc instances, but guests such as Windows expect all pages to be in
> uninitialized state on startup, including after every guest reboot.
> 
> Some userspace implementations of virtual SGX would rather avoid having
> to close and reopen the /dev/sgx_vepc file descriptor and re-mmap the
> virtual EPC.  For example, they could sandbox themselves after the guest
> starts and forbid further calls to open(), in order to mitigate exploits
> from untrusted guests.
> 
> Therefore, add a ioctl that does this with EREMOVE.  Userspace can
> invoke the ioctl to bring its vEPC pages back to uninitialized state.
> There is a possibility that some pages fail to be removed if they are
> SECS pages, and the child and SECS pages could be in separate vEPC
> regions.  Therefore, the ioctl returns the number of EREMOVE failures,
> telling userspace to try the ioctl again after it's done with all
> vEPC regions.  A more verbose description of the correct usage and
> the possible error conditions is documented in sgx.rst.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: return EBUSY for SGX_ENCLAVE_ACT, adjust docs
>                 add cond_resched
> 
>  Documentation/x86/sgx.rst       | 26 +++++++++++++++
>  arch/x86/include/asm/sgx.h      |  3 ++
>  arch/x86/include/uapi/asm/sgx.h |  2 ++
>  arch/x86/kernel/cpu/sgx/virt.c  | 57 +++++++++++++++++++++++++++++++++
>  4 files changed, 88 insertions(+)
> 
> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
> index dd0ac96ff9ef..7bc9c3b297d6 100644
> --- a/Documentation/x86/sgx.rst
> +++ b/Documentation/x86/sgx.rst
> @@ -250,3 +250,29 @@ user wants to deploy SGX applications both on the host and in guests
>  on the same machine, the user should reserve enough EPC (by taking out
>  total virtual EPC size of all SGX VMs from the physical EPC size) for
>  host SGX applications so they can run with acceptable performance.
> +
> +Architectural behavior is to restore all EPC pages to an uninitialized
> +state also after a guest reboot.  Because this state can be reached only
> +through the privileged ``ENCLS[EREMOVE]`` instruction, ``/dev/sgx_vepc``
> +provides the ``SGX_IOC_VEPC_REMOVE_ALL`` ioctl to execute the instruction
> +on all pages in the virtual EPC.
> +
> +``EREMOVE`` can fail for two reasons, which Linux relays to userspace
> +in a different manner:
> +
> +1. Page removal will always fail when any thread is running in the
> +   enclave to which the page belongs.  In this case the ioctl will
> +   return ``EBUSY`` independent of whether it has successfully removed
> +   some pages; userspace can avoid these failures by preventing execution
> +   of any vcpu which maps the virtual EPC.
> +
> +2) Page removal will also fail for SGX "SECS" metadata pages which still
> +   have child pages.  Child pages can be removed by executing
> +   ``SGX_IOC_VEPC_REMOVE_ALL`` on all ``/dev/sgx_vepc`` file descriptors
> +   mapped into the guest.  This means that the ioctl() must be called
> +   twice: an initial set of calls to remove child pages and a subsequent
> +   set of calls to remove SECS pages.  The second set of calls is only
> +   required for those mappings that returned a nonzero value from the
> +   first call.  It indicates a bug in the kernel or the userspace client
> +   if any of the second round of ``SGX_IOC_VEPC_REMOVE_ALL`` calls has
> +   a return code other than 0.
> diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
> index 05f3e21f01a7..2e5d8c97655e 100644
> --- a/arch/x86/include/asm/sgx.h
> +++ b/arch/x86/include/asm/sgx.h
> @@ -50,6 +50,8 @@ enum sgx_encls_function {
>   * %SGX_NOT_TRACKED:           Previous ETRACK's shootdown sequence has not
>   *                             been completed yet.
>   * %SGX_CHILD_PRESENT          SECS has child pages present in the EPC.
> + * %SGX_ENCLAVE_ACT            One or more logical processors are executing
> + *                             inside the enclave.
>   * %SGX_INVALID_EINITTOKEN:    EINITTOKEN is invalid and enclave signer's
>   *                             public key does not match IA32_SGXLEPUBKEYHASH.
>   * %SGX_UNMASKED_EVENT:                An unmasked event, e.g. INTR, was received
> @@ -57,6 +59,7 @@ enum sgx_encls_function {
>  enum sgx_return_code {
>         SGX_NOT_TRACKED                 = 11,
>         SGX_CHILD_PRESENT               = 13,
> +       SGX_ENCLAVE_ACT                 = 14,
>         SGX_INVALID_EINITTOKEN          = 16,
>         SGX_UNMASKED_EVENT              = 128,
>  };
> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> index 9690d6899ad9..f4b81587e90b 100644
> --- a/arch/x86/include/uapi/asm/sgx.h
> +++ b/arch/x86/include/uapi/asm/sgx.h
> @@ -27,6 +27,8 @@ enum sgx_page_flags {
>         _IOW(SGX_MAGIC, 0x02, struct sgx_enclave_init)
>  #define SGX_IOC_ENCLAVE_PROVISION \
>         _IOW(SGX_MAGIC, 0x03, struct sgx_enclave_provision)
> +#define SGX_IOC_VEPC_REMOVE_ALL \
> +       _IO(SGX_MAGIC, 0x04)
>  
>  /**
>   * struct sgx_enclave_create - parameter structure for the
> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> index 59cdf3f742ac..81a0a0f22007 100644
> --- a/arch/x86/kernel/cpu/sgx/virt.c
> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> @@ -150,6 +150,46 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
>         return 0;
>  }
>  
> +static long sgx_vepc_remove_all(struct sgx_vepc *vepc)
> +{
> +       struct sgx_epc_page *entry;
> +       unsigned long index;
> +       long failures = 0;
> +
> +       xa_for_each(&vepc->page_array, index, entry) {
> +               int ret = sgx_vepc_remove_page(entry);
> +               switch (ret) {
> +               case 0:
> +                       break;
> +
> +               case SGX_CHILD_PRESENT:
> +                       failures++;
> +                       break;
> +
> +               case SGX_ENCLAVE_ACT:
> +                       /*
> +                        * Unlike in sgx_vepc_free_page, userspace could be calling
> +                        * the ioctl while logical processors are running in the
> +                        * enclave; do not warn.
> +                        */
> +                       return -EBUSY;
> +
> +               default:
> +                       WARN_ONCE(1, EREMOVE_ERROR_MESSAGE, ret, ret);
> +                       failures++;
> +                       break;
> +               }
> +               cond_resched();
> +       }
> +
> +       /*
> +        * Return the number of pages that failed to be removed, so
> +        * userspace knows that there are still SECS pages lying
> +        * around.
> +        */
> +       return failures;
> +}
> +
>  static int sgx_vepc_release(struct inode *inode, struct file *file)
>  {
>         struct sgx_vepc *vepc = file->private_data;
> @@ -235,9 +274,27 @@ static int sgx_vepc_open(struct inode *inode, struct file *file)
>         return 0;
>  }
>  
> +static long sgx_vepc_ioctl(struct file *file,
> +                          unsigned int cmd, unsigned long arg)
> +{
> +       struct sgx_vepc *vepc = file->private_data;
> +
> +       switch (cmd) {
> +       case SGX_IOC_VEPC_REMOVE_ALL:
> +               if (arg)
> +                       return -EINVAL;
> +               return sgx_vepc_remove_all(vepc);
> +
> +       default:
> +               return -ENOTTY;
> +       }
> +}
> +
>  static const struct file_operations sgx_vepc_fops = {
>         .owner          = THIS_MODULE,
>         .open           = sgx_vepc_open,
> +       .unlocked_ioctl = sgx_vepc_ioctl,
> +       .compat_ioctl   = sgx_vepc_ioctl,
>         .release        = sgx_vepc_release,
>         .mmap           = sgx_vepc_mmap,
>  };

I went through this a few times, the code change is sound and
reasoning makes sense in the commit message.

The only thing that I think that is IMHO lacking is a simple
kselftest. I think a trivial test for SGX_IOC_VEP_REMOVE_ALL
would do.

/Jarkko


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE ioctl
  2021-10-12 16:57   ` Jarkko Sakkinen
@ 2021-10-12 17:03     ` Paolo Bonzini
  2021-10-12 17:43       ` Jarkko Sakkinen
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2021-10-12 17:03 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-kernel, kvm; +Cc: dave.hansen, seanjc, x86, yang.zhong

On 12/10/21 18:57, Jarkko Sakkinen wrote:
>> +
>>   static const struct file_operations sgx_vepc_fops = {
>>          .owner          = THIS_MODULE,
>>          .open           = sgx_vepc_open,
>> +       .unlocked_ioctl = sgx_vepc_ioctl,
>> +       .compat_ioctl   = sgx_vepc_ioctl,
>>          .release        = sgx_vepc_release,
>>          .mmap           = sgx_vepc_mmap,
>>   };
> I went through this a few times, the code change is sound and
> reasoning makes sense in the commit message.
> 
> The only thing that I think that is IMHO lacking is a simple
> kselftest. I think a trivial test for SGX_IOC_VEP_REMOVE_ALL
> would do.

Yeah, a trivial test wouldn't cover a lot; it would be much better to at 
least set up a SECS, and check that the first call returns 1 and the 
second returns 0.  There is no existing test for /dev/sgx_vepc at all.

Right now I'm relying on Yang for testing this in QEMU, but I'll look 
into adding a selftest that does the full setup and runs an enclave in a 
guest.

Paolo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE ioctl
  2021-10-12 17:03     ` Paolo Bonzini
@ 2021-10-12 17:43       ` Jarkko Sakkinen
  0 siblings, 0 replies; 17+ messages in thread
From: Jarkko Sakkinen @ 2021-10-12 17:43 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: dave.hansen, seanjc, x86, yang.zhong

On Tue, 2021-10-12 at 19:03 +0200, Paolo Bonzini wrote:
> On 12/10/21 18:57, Jarkko Sakkinen wrote:
> > > +
> > >   static const struct file_operations sgx_vepc_fops = {
> > >          .owner          = THIS_MODULE,
> > >          .open           = sgx_vepc_open,
> > > +       .unlocked_ioctl = sgx_vepc_ioctl,
> > > +       .compat_ioctl   = sgx_vepc_ioctl,
> > >          .release        = sgx_vepc_release,
> > >          .mmap           = sgx_vepc_mmap,
> > >   };
> > I went through this a few times, the code change is sound and
> > reasoning makes sense in the commit message.
> > 
> > The only thing that I think that is IMHO lacking is a simple
> > kselftest. I think a trivial test for SGX_IOC_VEP_REMOVE_ALL
> > would do.
> 
> Yeah, a trivial test wouldn't cover a lot; it would be much better to at 
> least set up a SECS, and check that the first call returns 1 and the 
> second returns 0.  There is no existing test for /dev/sgx_vepc at all.
> 
> Right now I'm relying on Yang for testing this in QEMU, but I'll look 
> into adding a selftest that does the full setup and runs an enclave in a 
> guest.

This having a regression would not working would not cause that much collateral
damage, especially since it would be probably quickly noticed by someone, so I
think that this is not absolutely mandatory. So you can ignore kselftest part,
and thus

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

Thank you, this work helps me a lot, given that my SGX testing is based around
using QEMU ATM.

/Jarkko


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages
  2021-10-12 10:57 [PATCH v2 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages Paolo Bonzini
  2021-10-12 10:57 ` [PATCH v2 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page Paolo Bonzini
  2021-10-12 10:57 ` [PATCH v2 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE ioctl Paolo Bonzini
@ 2021-10-13  6:54 ` Borislav Petkov
  2021-10-13  7:15   ` Paolo Bonzini
  2021-10-14 12:21 ` Yang Zhong
  3 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2021-10-13  6:54 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, dave.hansen, seanjc, x86, yang.zhong, jarkko

On Tue, Oct 12, 2021 at 06:57:06AM -0400, Paolo Bonzini wrote:
> If possible, I would like these patches to be included in 5.15 through
> either the x86 or the KVM tree.

You mean here 5.16, right?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages
  2021-10-13  6:54 ` [PATCH v2 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages Borislav Petkov
@ 2021-10-13  7:15   ` Paolo Bonzini
  2021-10-13  7:38     ` Borislav Petkov
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2021-10-13  7:15 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, kvm, dave.hansen, seanjc, x86, yang.zhong, jarkko

On 13/10/21 08:54, Borislav Petkov wrote:
> On Tue, Oct 12, 2021 at 06:57:06AM -0400, Paolo Bonzini wrote:
>> If possible, I would like these patches to be included in 5.15 through
>> either the x86 or the KVM tree.
> 
> You mean here 5.16, right?

No, I mean 5.15 because it literally cannot break anything that was 
working previously and the functionality provided by the ioctl 
(resetting VMs) is important.  But it wouldn't be a big issue if it was 
5.16 only.

Paolo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages
  2021-10-13  7:15   ` Paolo Bonzini
@ 2021-10-13  7:38     ` Borislav Petkov
  0 siblings, 0 replies; 17+ messages in thread
From: Borislav Petkov @ 2021-10-13  7:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, dave.hansen, seanjc, x86, yang.zhong, jarkko

On Wed, Oct 13, 2021 at 09:15:27AM +0200, Paolo Bonzini wrote:
> No, I mean 5.15 because it literally cannot break anything that was working
> previously and the functionality provided by the ioctl (resetting VMs) is
> important.  But it wouldn't be a big issue if it was 5.16 only.

Ok, lemme queue it for 5.16 then.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [tip: x86/sgx] x86/sgx/virt: Implement SGX_IOC_VEPC_REMOVE ioctl
  2021-10-12 10:57 ` [PATCH v2 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE ioctl Paolo Bonzini
  2021-10-12 16:57   ` Jarkko Sakkinen
@ 2021-10-13 12:56   ` tip-bot2 for Paolo Bonzini
  2021-10-14 22:14   ` [PATCH v2 2/2] x86: sgx_vepc: implement " Dave Hansen
  2021-10-15 22:29   ` Sean Christopherson
  3 siblings, 0 replies; 17+ messages in thread
From: tip-bot2 for Paolo Bonzini @ 2021-10-13 12:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Paolo Bonzini, Borislav Petkov, Jarkko Sakkinen, x86, linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     71eba1c0939e3b1ad1b71fe0171de30e265437e3
Gitweb:        https://git.kernel.org/tip/71eba1c0939e3b1ad1b71fe0171de30e265437e3
Author:        Paolo Bonzini <pbonzini@redhat.com>
AuthorDate:    Tue, 12 Oct 2021 06:57:08 -04:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Wed, 13 Oct 2021 11:57:53 +02:00

x86/sgx/virt: Implement SGX_IOC_VEPC_REMOVE ioctl

For bare-metal SGX on real hardware, the hardware provides guaranteed
SGX state at reboot.  For instance, all pages start out uninitialized.
The vepc driver provides a similar guarantee today for freshly-opened
vepc instances, but guests such as Windows expect all pages to be in
uninitialized state on startup, including after every guest reboot.

Some userspace implementations of virtual SGX would rather avoid having
to close and reopen the /dev/sgx_vepc file descriptor and re-mmap the
virtual EPC.  For example, they could sandbox themselves after the guest
starts and forbid further calls to open(), in order to mitigate exploits
from untrusted guests.

Therefore, add a ioctl that does this with EREMOVE.  Userspace can
invoke the ioctl to bring its vEPC pages back to uninitialized state.
There is a possibility that some pages fail to be removed if they are
SECS pages, and the child and SECS pages could be in separate vEPC
regions.  Therefore, the ioctl returns the number of EREMOVE failures,
telling userspace to try the ioctl again after it's done with all
vEPC regions.  A more verbose description of the correct usage and
the possible error conditions is documented in sgx.rst.

 [ bp: Minor massaging. ]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/20211012105708.2070480-3-pbonzini@redhat.com
---
 Documentation/x86/sgx.rst       | 26 +++++++++++++++-
 arch/x86/include/asm/sgx.h      |  3 ++-
 arch/x86/include/uapi/asm/sgx.h |  2 +-
 arch/x86/kernel/cpu/sgx/virt.c  | 57 ++++++++++++++++++++++++++++++++-
 4 files changed, 88 insertions(+)

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index dd0ac96..7bc9c3b 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -250,3 +250,29 @@ user wants to deploy SGX applications both on the host and in guests
 on the same machine, the user should reserve enough EPC (by taking out
 total virtual EPC size of all SGX VMs from the physical EPC size) for
 host SGX applications so they can run with acceptable performance.
+
+Architectural behavior is to restore all EPC pages to an uninitialized
+state also after a guest reboot.  Because this state can be reached only
+through the privileged ``ENCLS[EREMOVE]`` instruction, ``/dev/sgx_vepc``
+provides the ``SGX_IOC_VEPC_REMOVE_ALL`` ioctl to execute the instruction
+on all pages in the virtual EPC.
+
+``EREMOVE`` can fail for two reasons, which Linux relays to userspace
+in a different manner:
+
+1. Page removal will always fail when any thread is running in the
+   enclave to which the page belongs.  In this case the ioctl will
+   return ``EBUSY`` independent of whether it has successfully removed
+   some pages; userspace can avoid these failures by preventing execution
+   of any vcpu which maps the virtual EPC.
+
+2) Page removal will also fail for SGX "SECS" metadata pages which still
+   have child pages.  Child pages can be removed by executing
+   ``SGX_IOC_VEPC_REMOVE_ALL`` on all ``/dev/sgx_vepc`` file descriptors
+   mapped into the guest.  This means that the ioctl() must be called
+   twice: an initial set of calls to remove child pages and a subsequent
+   set of calls to remove SECS pages.  The second set of calls is only
+   required for those mappings that returned a nonzero value from the
+   first call.  It indicates a bug in the kernel or the userspace client
+   if any of the second round of ``SGX_IOC_VEPC_REMOVE_ALL`` calls has
+   a return code other than 0.
diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 05f3e21..2e5d8c9 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -50,6 +50,8 @@ enum sgx_encls_function {
  * %SGX_NOT_TRACKED:		Previous ETRACK's shootdown sequence has not
  *				been completed yet.
  * %SGX_CHILD_PRESENT		SECS has child pages present in the EPC.
+ * %SGX_ENCLAVE_ACT		One or more logical processors are executing
+ *				inside the enclave.
  * %SGX_INVALID_EINITTOKEN:	EINITTOKEN is invalid and enclave signer's
  *				public key does not match IA32_SGXLEPUBKEYHASH.
  * %SGX_UNMASKED_EVENT:		An unmasked event, e.g. INTR, was received
@@ -57,6 +59,7 @@ enum sgx_encls_function {
 enum sgx_return_code {
 	SGX_NOT_TRACKED			= 11,
 	SGX_CHILD_PRESENT		= 13,
+	SGX_ENCLAVE_ACT			= 14,
 	SGX_INVALID_EINITTOKEN		= 16,
 	SGX_UNMASKED_EVENT		= 128,
 };
diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index 9690d68..f4b8158 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -27,6 +27,8 @@ enum sgx_page_flags {
 	_IOW(SGX_MAGIC, 0x02, struct sgx_enclave_init)
 #define SGX_IOC_ENCLAVE_PROVISION \
 	_IOW(SGX_MAGIC, 0x03, struct sgx_enclave_provision)
+#define SGX_IOC_VEPC_REMOVE_ALL \
+	_IO(SGX_MAGIC, 0x04)
 
 /**
  * struct sgx_enclave_create - parameter structure for the
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 59cdf3f..4465253 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -150,6 +150,46 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
 	return 0;
 }
 
+static long sgx_vepc_remove_all(struct sgx_vepc *vepc)
+{
+	struct sgx_epc_page *entry;
+	unsigned long index;
+	long failures = 0;
+
+	xa_for_each(&vepc->page_array, index, entry) {
+		int ret = sgx_vepc_remove_page(entry);
+		switch (ret) {
+		case 0:
+			break;
+
+		case SGX_CHILD_PRESENT:
+			failures++;
+			break;
+
+		case SGX_ENCLAVE_ACT:
+			/*
+			 * Unlike in sgx_vepc_free_page, userspace could be calling
+			 * the ioctl while logical processors are running in the
+			 * enclave; do not warn.
+			 */
+			return -EBUSY;
+
+		default:
+			WARN_ONCE(1, EREMOVE_ERROR_MESSAGE, ret, ret);
+			failures++;
+			break;
+		}
+		cond_resched();
+	}
+
+	/*
+	 * Return the number of pages that failed to be removed, so
+	 * userspace knows that there are still SECS pages lying
+	 * around.
+	 */
+	return failures;
+}
+
 static int sgx_vepc_release(struct inode *inode, struct file *file)
 {
 	struct sgx_vepc *vepc = file->private_data;
@@ -235,9 +275,26 @@ static int sgx_vepc_open(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static long sgx_vepc_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	struct sgx_vepc *vepc = file->private_data;
+
+	switch (cmd) {
+	case SGX_IOC_VEPC_REMOVE_ALL:
+		if (arg)
+			return -EINVAL;
+		return sgx_vepc_remove_all(vepc);
+
+	default:
+		return -ENOTTY;
+	}
+}
+
 static const struct file_operations sgx_vepc_fops = {
 	.owner		= THIS_MODULE,
 	.open		= sgx_vepc_open,
+	.unlocked_ioctl	= sgx_vepc_ioctl,
+	.compat_ioctl	= sgx_vepc_ioctl,
 	.release	= sgx_vepc_release,
 	.mmap		= sgx_vepc_mmap,
 };

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [tip: x86/sgx] x86/sgx/virt: Extract sgx_vepc_remove_page()
  2021-10-12 10:57 ` [PATCH v2 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page Paolo Bonzini
  2021-10-12 16:53   ` Jarkko Sakkinen
@ 2021-10-13 12:56   ` tip-bot2 for Paolo Bonzini
  2021-10-14 22:10   ` [PATCH v2 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page Dave Hansen
  2 siblings, 0 replies; 17+ messages in thread
From: tip-bot2 for Paolo Bonzini @ 2021-10-13 12:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Paolo Bonzini, Borislav Petkov, Jarkko Sakkinen, x86, linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     33633b20e0da301f9009cc9aa00282acbc282a1f
Gitweb:        https://git.kernel.org/tip/33633b20e0da301f9009cc9aa00282acbc282a1f
Author:        Paolo Bonzini <pbonzini@redhat.com>
AuthorDate:    Tue, 12 Oct 2021 06:57:07 -04:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Wed, 13 Oct 2021 11:57:44 +02:00

x86/sgx/virt: Extract sgx_vepc_remove_page()

For bare-metal SGX on real hardware, the hardware provides guaranteed
SGX state at reboot.  For instance, all pages start out uninitialized.
The vepc driver provides a similar guarantee today for freshly-opened
vepc instances, but guests such as Windows expect all pages to be in
uninitialized state on startup, including after every guest reboot.

One way to do this is to simply close and reopen the /dev/sgx_vepc file
descriptor and re-mmap the virtual EPC.  However, this is problematic
because it prevents sandboxing the userspace (for example forbidding
open() after the guest starts; this is doable with heavy use of SCM_RIGHTS
file descriptor passing).

In order to implement this, an ioctl will be needed that performs
EREMOVE on all pages mapped by a /dev/sgx_vepc file descriptor: other
possibilities, such as closing and reopening the device, are racy.

Start the implementation by creating a separate function with just
the __eremove() wrapper.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/20211012105708.2070480-2-pbonzini@redhat.com
---
 arch/x86/kernel/cpu/sgx/virt.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 64511c4..59cdf3f 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -111,10 +111,8 @@ static int sgx_vepc_mmap(struct file *file, struct vm_area_struct *vma)
 	return 0;
 }
 
-static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
+static int sgx_vepc_remove_page(struct sgx_epc_page *epc_page)
 {
-	int ret;
-
 	/*
 	 * Take a previously guest-owned EPC page and return it to the
 	 * general EPC page pool.
@@ -124,7 +122,12 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
 	 * case that a guest properly EREMOVE'd this page, a superfluous
 	 * EREMOVE is harmless.
 	 */
-	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
+	return __eremove(sgx_get_epc_virt_addr(epc_page));
+}
+
+static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
+{
+	int ret = sgx_vepc_remove_page(epc_page);
 	if (ret) {
 		/*
 		 * Only SGX_CHILD_PRESENT is expected, which is because of
@@ -144,7 +147,6 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
 	}
 
 	sgx_free_epc_page(epc_page);
-
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages
  2021-10-12 10:57 [PATCH v2 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages Paolo Bonzini
                   ` (2 preceding siblings ...)
  2021-10-13  6:54 ` [PATCH v2 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages Borislav Petkov
@ 2021-10-14 12:21 ` Yang Zhong
  3 siblings, 0 replies; 17+ messages in thread
From: Yang Zhong @ 2021-10-14 12:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, dave.hansen, seanjc, x86, yang.zhong, jarkko

On Tue, Oct 12, 2021 at 06:57:06AM -0400, Paolo Bonzini wrote:
> Add to /dev/sgx_vepc a ioctl that brings vEPC pages back to uninitialized
> state with EREMOVE.  This is useful in order to match the expectations
> of guests after reboot, and to match the behavior of real hardware.
> 
> The ioctl is a cleaner alternative to closing and reopening the
> /dev/sgx_vepc device; reopening /dev/sgx_vepc could be problematic in
> case userspace has sandboxed itself since the time it first opened the
> device, and has thus lost permissions to do so.
> 
> If possible, I would like these patches to be included in 5.15 through
> either the x86 or the KVM tree.
> 
  
  Paolo, i did the below tests to verify those two patches on ICX server 
  (1). Windows2019 and Linux guest reboot 
  (2). One 10G vepc, and started 500 enclaves(each 2G) in guest, and then reset
       the guest with 'system_reset' command in monitor.
  (3). One 100K vepc, and start one 2M enclave in guest, then reset the guest
       with 'system_reset' command in the monitor.

   All those tests are successful, and the kernel changes work well. Thanks for
   the great support!

   Yang


> Thanks,
> 
> Paolo
> 
> Changes from RFC:
> - improved commit messages, added documentation
> - renamed ioctl from SGX_IOC_VEPC_REMOVE to SGX_IOC_VEPC_REMOVE_ALL
> 
> Change from v1:
> - fixed documentation and code to cover SGX_ENCLAVE_ACT errors
> - removed Tested-by since the code is quite different now
> 
> Paolo Bonzini (2):
>   x86: sgx_vepc: extract sgx_vepc_remove_page
>   x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE_ALL ioctl
> 
>  Documentation/x86/sgx.rst       | 26 +++++++++++++
>  arch/x86/include/asm/sgx.h      |  3 ++
>  arch/x86/include/uapi/asm/sgx.h |  2 +
>  arch/x86/kernel/cpu/sgx/virt.c  | 69 ++++++++++++++++++++++++++++++---
>  4 files changed, 95 insertions(+), 5 deletions(-)
> 
> -- 
> 2.27.0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-10-12 10:57 ` [PATCH v2 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page Paolo Bonzini
  2021-10-12 16:53   ` Jarkko Sakkinen
  2021-10-13 12:56   ` [tip: x86/sgx] x86/sgx/virt: Extract sgx_vepc_remove_page() tip-bot2 for Paolo Bonzini
@ 2021-10-14 22:10   ` Dave Hansen
  2 siblings, 0 replies; 17+ messages in thread
From: Dave Hansen @ 2021-10-14 22:10 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: dave.hansen, seanjc, x86, yang.zhong, jarkko

On 10/12/21 3:57 AM, Paolo Bonzini wrote:
> For bare-metal SGX on real hardware, the hardware provides guarantees
> SGX state at reboot.  For instance, all pages start out uninitialized.
> The vepc driver provides a similar guarantee today for freshly-opened
> vepc instances, but guests such as Windows expect all pages to be in
> uninitialized state on startup, including after every guest reboot.
> 
> One way to do this is to simply close and reopen the /dev/sgx_vepc file
> descriptor and re-mmap the virtual EPC.  However, this is problematic
> because it prevents sandboxing the userspace (for example forbidding
> open() after the guest starts; this is doable with heavy use of SCM_RIGHTS
> file descriptor passing).
> 
> In order to implement this, we will need a ioctl that performs
> EREMOVE on all pages mapped by a /dev/sgx_vepc file descriptor:
> other possibilities, such as closing and reopening the device,
> are racy.
> 
> Start the implementation by creating a separate function with just
> the __eremove wrapper.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE ioctl
  2021-10-12 10:57 ` [PATCH v2 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE ioctl Paolo Bonzini
  2021-10-12 16:57   ` Jarkko Sakkinen
  2021-10-13 12:56   ` [tip: x86/sgx] x86/sgx/virt: Implement " tip-bot2 for Paolo Bonzini
@ 2021-10-14 22:14   ` Dave Hansen
  2021-10-15 22:29   ` Sean Christopherson
  3 siblings, 0 replies; 17+ messages in thread
From: Dave Hansen @ 2021-10-14 22:14 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: dave.hansen, seanjc, x86, yang.zhong, jarkko

On 10/12/21 3:57 AM, Paolo Bonzini wrote:
> For bare-metal SGX on real hardware, the hardware provides guarantees
> SGX state at reboot.  For instance, all pages start out uninitialized.
> The vepc driver provides a similar guarantee today for freshly-opened
> vepc instances, but guests such as Windows expect all pages to be in
> uninitialized state on startup, including after every guest reboot.
> 
> Some userspace implementations of virtual SGX would rather avoid having
> to close and reopen the /dev/sgx_vepc file descriptor and re-mmap the
> virtual EPC.  For example, they could sandbox themselves after the guest
> starts and forbid further calls to open(), in order to mitigate exploits
> from untrusted guests.
> 
> Therefore, add a ioctl that does this with EREMOVE.  Userspace can
> invoke the ioctl to bring its vEPC pages back to uninitialized state.
> There is a possibility that some pages fail to be removed if they are
> SECS pages, and the child and SECS pages could be in separate vEPC
> regions.  Therefore, the ioctl returns the number of EREMOVE failures,
> telling userspace to try the ioctl again after it's done with all
> vEPC regions.  A more verbose description of the correct usage and
> the possible error conditions is documented in sgx.rst.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

The new approach and revised changelogs look fine to me:

Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>

Like Jarkko mentioned, it would be _nice_ to have some self-contained
selftests around this.  Would it be a pain to rig something up in
selftests/kvm that at least trivially poked at /dev/sgx_vepc?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE ioctl
  2021-10-12 10:57 ` [PATCH v2 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE ioctl Paolo Bonzini
                     ` (2 preceding siblings ...)
  2021-10-14 22:14   ` [PATCH v2 2/2] x86: sgx_vepc: implement " Dave Hansen
@ 2021-10-15 22:29   ` Sean Christopherson
  2021-10-16  7:14     ` Paolo Bonzini
  3 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2021-10-15 22:29 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, dave.hansen, x86, yang.zhong, jarkko

On Tue, Oct 12, 2021, Paolo Bonzini wrote:
> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> index 59cdf3f742ac..81a0a0f22007 100644
> --- a/arch/x86/kernel/cpu/sgx/virt.c
> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> @@ -150,6 +150,46 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
>  	return 0;
>  }
>  
> +static long sgx_vepc_remove_all(struct sgx_vepc *vepc)
> +{
> +	struct sgx_epc_page *entry;
> +	unsigned long index;
> +	long failures = 0;
> +
> +	xa_for_each(&vepc->page_array, index, entry) {

Might be worth a comment that xa_for_each() is safe to use concurrently with
xa_load/xa_store, i.e. this doesn't need to take vepc->lock.  It does raise the
question of whether or not the kernel is responsible for providing deterministic
results if userspace/guest is accessing previously-unallocated pages.

> +		int ret = sgx_vepc_remove_page(entry);

I don't see anything that prevents userspace from doing SGX_IOC_VEPC_REMOVE_ALL
on multiple threads with the same vEPC.  That means userspace can induce a #GP
due to concurrent access.  Taking vepc->lock would solve that particular problem,
but I think that's a moot point because the EREMOVE locking rules are relative to
the SECS, not the individual page (because of refcounting).  SGX_IOC_VEPC_REMOVE_ALL
on any two arbitrary vEPCs could induce a fault if they have children belonging to
the same enclave, i.e. share an SECS.

Sadly, I think this needs to be:

		if (ret == SGX_CHILD_PRESENT)
			failures++;
		else if (ret)
			return -EBUSY;

> +		switch (ret) {
> +		case 0:
> +			break;
> +
> +		case SGX_CHILD_PRESENT:
> +			failures++;
> +			break;
> +
> +		case SGX_ENCLAVE_ACT:
> +			/*
> +			 * Unlike in sgx_vepc_free_page, userspace could be calling
> +			 * the ioctl while logical processors are running in the
> +			 * enclave; do not warn.
> +			 */
> +			return -EBUSY;
> +
> +		default:
> +			WARN_ONCE(1, EREMOVE_ERROR_MESSAGE, ret, ret);
> +			failures++;
> +			break;
> +		}
> +		cond_resched();
> +	}
> +
> +	/*
> +	 * Return the number of pages that failed to be removed, so
> +	 * userspace knows that there are still SECS pages lying
> +	 * around.
> +	 */
> +	return failures;
> +}

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE ioctl
  2021-10-15 22:29   ` Sean Christopherson
@ 2021-10-16  7:14     ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2021-10-16  7:14 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-kernel, kvm, dave.hansen, x86, yang.zhong, jarkko

On 16/10/21 00:29, Sean Christopherson wrote:
> On Tue, Oct 12, 2021, Paolo Bonzini wrote:
>> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
>> index 59cdf3f742ac..81a0a0f22007 100644
>> --- a/arch/x86/kernel/cpu/sgx/virt.c
>> +++ b/arch/x86/kernel/cpu/sgx/virt.c
>> @@ -150,6 +150,46 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
>>   	return 0;
>>   }
>>   
>> +static long sgx_vepc_remove_all(struct sgx_vepc *vepc)
>> +{
>> +	struct sgx_epc_page *entry;
>> +	unsigned long index;
>> +	long failures = 0;
>> +
>> +	xa_for_each(&vepc->page_array, index, entry) {
> 
> Might be worth a comment that xa_for_each() is safe to use concurrently with
> xa_load/xa_store, i.e. this doesn't need to take vepc->lock.

I considered that to be part of the xarray contract (xa_store uses 
rcu_assign_pointer so it has release semantics, and vepc->page_array is 
essentially "store once").

> It does raise the
> question of whether or not the kernel is responsible for providing deterministic
> results if userspace/guest is accessing previously-unallocated pages.

Garbage in, garbage out -- but you're right below that garbage in, WARN 
out is not acceptable.  I'm sending a v3 with documentation changes too.

Paolo

>> +		int ret = sgx_vepc_remove_page(entry);
> 
> I don't see anything that prevents userspace from doing SGX_IOC_VEPC_REMOVE_ALL
> on multiple threads with the same vEPC.  That means userspace can induce a #GP
> due to concurrent access.  Taking vepc->lock would solve that particular problem,
> but I think that's a moot point because the EREMOVE locking rules are relative to
> the SECS, not the individual page (because of refcounting).  SGX_IOC_VEPC_REMOVE_ALL
> on any two arbitrary vEPCs could induce a fault if they have children belonging to
> the same enclave, i.e. share an SECS.
> 
> Sadly, I think this needs to be:
> 
> 		if (ret == SGX_CHILD_PRESENT)
> 			failures++;
> 		else if (ret)
> 			return -EBUSY;
> 
>> +		switch (ret) {
>> +		case 0:
>> +			break;
>> +
>> +		case SGX_CHILD_PRESENT:
>> +			failures++;
>> +			break;
>> +
>> +		case SGX_ENCLAVE_ACT:
>> +			/*
>> +			 * Unlike in sgx_vepc_free_page, userspace could be calling
>> +			 * the ioctl while logical processors are running in the
>> +			 * enclave; do not warn.
>> +			 */
>> +			return -EBUSY;
>> +
>> +		default:
>> +			WARN_ONCE(1, EREMOVE_ERROR_MESSAGE, ret, ret);
>> +			failures++;
>> +			break;
>> +		}
>> +		cond_resched();
>> +	}
>> +
>> +	/*
>> +	 * Return the number of pages that failed to be removed, so
>> +	 * userspace knows that there are still SECS pages lying
>> +	 * around.
>> +	 */
>> +	return failures;
>> +}
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-10-16  7:15 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-12 10:57 [PATCH v2 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages Paolo Bonzini
2021-10-12 10:57 ` [PATCH v2 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page Paolo Bonzini
2021-10-12 16:53   ` Jarkko Sakkinen
2021-10-13 12:56   ` [tip: x86/sgx] x86/sgx/virt: Extract sgx_vepc_remove_page() tip-bot2 for Paolo Bonzini
2021-10-14 22:10   ` [PATCH v2 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page Dave Hansen
2021-10-12 10:57 ` [PATCH v2 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE ioctl Paolo Bonzini
2021-10-12 16:57   ` Jarkko Sakkinen
2021-10-12 17:03     ` Paolo Bonzini
2021-10-12 17:43       ` Jarkko Sakkinen
2021-10-13 12:56   ` [tip: x86/sgx] x86/sgx/virt: Implement " tip-bot2 for Paolo Bonzini
2021-10-14 22:14   ` [PATCH v2 2/2] x86: sgx_vepc: implement " Dave Hansen
2021-10-15 22:29   ` Sean Christopherson
2021-10-16  7:14     ` Paolo Bonzini
2021-10-13  6:54 ` [PATCH v2 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages Borislav Petkov
2021-10-13  7:15   ` Paolo Bonzini
2021-10-13  7:38     ` Borislav Petkov
2021-10-14 12:21 ` Yang Zhong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.