linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages
@ 2021-09-20 12:53 Paolo Bonzini
  2021-09-20 12:54 ` [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page Paolo Bonzini
  2021-09-20 12:54 ` [PATCH 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE_ALL ioctl Paolo Bonzini
  0 siblings, 2 replies; 26+ messages in thread
From: Paolo Bonzini @ 2021-09-20 12:53 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: x86, linux-sgx, jarkko, dave.hansen, yang.zhong

Add to /dev/sgx_vepc a ioctl that brings vEPC pages back to uninitialized
state with EREMOVE.  This is useful in order to match the expectations
of guests after reboot, and to match the behavior of real hardware.

The ioctl is a cleaner alternative to closing and reopening the
/dev/sgx_vepc device; reopening /dev/sgx_vepc could be problematic in
case userspace has sandboxed itself since the time it first opened the
device, and has thus lost permissions to do so.

If possible, I would like these patches to be included in 5.15 through
either the x86 or the KVM tree.

Thanks,

Paolo

Changes from RFC:
- improved commit messages, added documentation
- renamed ioctl from SGX_IOC_VEPC_REMOVE to SGX_IOC_VEPC_REMOVE_ALL

Paolo Bonzini (2):
  x86: sgx_vepc: extract sgx_vepc_remove_page
  x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE_ALL ioctl

 Documentation/x86/sgx.rst       | 14 ++++++++++
 arch/x86/include/uapi/asm/sgx.h |  2 ++
 arch/x86/kernel/cpu/sgx/virt.c  | 48 ++++++++++++++++++++++++++++++---
 3 files changed, 61 insertions(+), 3 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-20 12:53 [PATCH 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages Paolo Bonzini
@ 2021-09-20 12:54 ` Paolo Bonzini
  2021-09-21 19:44   ` Jarkko Sakkinen
  2021-09-20 12:54 ` [PATCH 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE_ALL ioctl Paolo Bonzini
  1 sibling, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2021-09-20 12:54 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: x86, linux-sgx, jarkko, dave.hansen, yang.zhong

For bare-metal SGX on real hardware, the hardware provides guarantees
SGX state at reboot.  For instance, all pages start out uninitialized.
The vepc driver provides a similar guarantee today for freshly-opened
vepc instances, but guests such as Windows expect all pages to be in
uninitialized state on startup, including after every guest reboot.

One way to do this is to simply close and reopen the /dev/sgx_vepc file
descriptor and re-mmap the virtual EPC.  However, this is problematic
because it prevents sandboxing the userspace (for example forbidding
open() after the guest starts, or running in a mount namespace that
does not have access to /dev; both are doable with pre-opened file
descriptors and/or SCM_RIGHTS file descriptor passing).

In order to implement this, we will need a ioctl that performs
EREMOVE on all pages mapped by a /dev/sgx_vepc file descriptor:
other possibilities, such as closing and reopening the device,
are racy.

Start the implementation by pulling the EREMOVE into a separate
function.

Tested-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kernel/cpu/sgx/virt.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 64511c4a5200..59b9c13121cd 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -111,7 +111,7 @@ static int sgx_vepc_mmap(struct file *file, struct vm_area_struct *vma)
 	return 0;
 }
 
-static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
+static int sgx_vepc_remove_page(struct sgx_epc_page *epc_page)
 {
 	int ret;
 
@@ -140,11 +140,17 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
 		 */
 		WARN_ONCE(ret != SGX_CHILD_PRESENT, EREMOVE_ERROR_MESSAGE,
 			  ret, ret);
-		return ret;
 	}
+	return ret;
+}
 
-	sgx_free_epc_page(epc_page);
+static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
+{
+	int ret = sgx_vepc_remove_page(epc_page);
+	if (ret)
+		return ret;
 
+	sgx_free_epc_page(epc_page);
 	return 0;
 }
 
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE_ALL ioctl
  2021-09-20 12:53 [PATCH 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages Paolo Bonzini
  2021-09-20 12:54 ` [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page Paolo Bonzini
@ 2021-09-20 12:54 ` Paolo Bonzini
  2021-09-20 22:17   ` Kai Huang
  1 sibling, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2021-09-20 12:54 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: x86, linux-sgx, jarkko, dave.hansen, yang.zhong

For bare-metal SGX on real hardware, the hardware provides guarantees
SGX state at reboot.  For instance, all pages start out uninitialized.
The vepc driver provides a similar guarantee today for freshly-opened
vepc instances, but guests such as Windows expect all pages to be in
uninitialized state on startup, including after every guest reboot.

Some userspace implementations of virtual SGX would rather avoid having
to close and reopen the /dev/sgx_vepc file descriptor and re-mmap the
virtual EPC.  For example, they could be sandboxing themselves before the
guest starts in order to mitigate exploits from untrusted guests,
forbidding further calls to open().

Therefore, add a ioctl that does this with EREMOVE.  Userspace can
invoke the ioctl to bring its vEPC pages back to uninitialized state.
There is a possibility that some pages fail to be removed if they are
SECS pages, so the ioctl returns the number of EREMOVE failures.  The
correct usage is documented in sgx.rst.

Tested-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/x86/sgx.rst       | 14 +++++++++++++
 arch/x86/include/uapi/asm/sgx.h |  2 ++
 arch/x86/kernel/cpu/sgx/virt.c  | 36 +++++++++++++++++++++++++++++++++
 3 files changed, 52 insertions(+)

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index dd0ac96ff9ef..b855db96c6c6 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -250,3 +250,17 @@ user wants to deploy SGX applications both on the host and in guests
 on the same machine, the user should reserve enough EPC (by taking out
 total virtual EPC size of all SGX VMs from the physical EPC size) for
 host SGX applications so they can run with acceptable performance.
+
+Some guests, such as Windows, require that all pages are uninitialized
+at startup or after a guest reboot.  Because this state can be reached
+only through the privileged ``ENCLS[EREMOVE]`` instruction, ``/dev/sgx_vepc``
+provides the ``SGX_IOC_VEPC_REMOVE_ALL`` ioctl to execute the instruction on
+all pages in the virtual EPC.
+
+The ioctl will often not able to remove SECS pages, in case their child
+pages have not gone through ``EREMOVE`` yet; therefore, the ioctl returns the
+number of pages that failed to be removed.  ``SGX_IOC_VEPC_REMOVE_ALL`` should
+first be invoked on all the ``/dev/sgx_vepc`` file descriptors mapped
+into the guest; a second call to the ioctl will be able to remove all
+leftover pages and will return 0.  Any other return value on the second call
+would be a symptom of a bug in either Linux or the userspace client.
diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index 9690d6899ad9..f79d84ce8033 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -27,6 +27,8 @@ enum sgx_page_flags {
 	_IOW(SGX_MAGIC, 0x02, struct sgx_enclave_init)
 #define SGX_IOC_ENCLAVE_PROVISION \
 	_IOW(SGX_MAGIC, 0x03, struct sgx_enclave_provision)
+#define SGX_IOC_VEPC_REMOVE_ALL \
+	_IO(SGX_MAGIC, 0x04)
 
 /**
  * struct sgx_enclave_create - parameter structure for the
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 59b9c13121cd..81dc186fda2e 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -154,6 +154,24 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
 	return 0;
 }
 
+static long sgx_vepc_remove_all(struct sgx_vepc *vepc)
+{
+	struct sgx_epc_page *entry;
+	unsigned long index;
+	long failures = 0;
+
+	xa_for_each(&vepc->page_array, index, entry)
+		if (sgx_vepc_remove_page(entry))
+			failures++;
+
+	/*
+	 * Return the number of pages that failed to be removed, so
+	 * userspace knows that there are still SECS pages lying
+	 * around.
+	 */
+	return failures;
+}
+
 static int sgx_vepc_release(struct inode *inode, struct file *file)
 {
 	struct sgx_vepc *vepc = file->private_data;
@@ -239,9 +257,27 @@ static int sgx_vepc_open(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static long sgx_vepc_ioctl(struct file *file,
+			   unsigned int cmd, unsigned long arg)
+{
+	struct sgx_vepc *vepc = file->private_data;
+
+	switch (cmd) {
+	case SGX_IOC_VEPC_REMOVE_ALL:
+		if (arg)
+			return -EINVAL;
+		return sgx_vepc_remove_all(vepc);
+
+	default:
+		return -ENOTTY;
+	}
+}
+
 static const struct file_operations sgx_vepc_fops = {
 	.owner		= THIS_MODULE,
 	.open		= sgx_vepc_open,
+	.unlocked_ioctl	= sgx_vepc_ioctl,
+	.compat_ioctl	= sgx_vepc_ioctl,
 	.release	= sgx_vepc_release,
 	.mmap		= sgx_vepc_mmap,
 };
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE_ALL ioctl
  2021-09-20 12:54 ` [PATCH 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE_ALL ioctl Paolo Bonzini
@ 2021-09-20 22:17   ` Kai Huang
  2021-09-20 23:09     ` Dave Hansen
  0 siblings, 1 reply; 26+ messages in thread
From: Kai Huang @ 2021-09-20 22:17 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, x86, linux-sgx, jarkko, dave.hansen, yang.zhong

On Mon, 20 Sep 2021 08:54:01 -0400 Paolo Bonzini wrote:
> For bare-metal SGX on real hardware, the hardware provides guarantees
> SGX state at reboot.  For instance, all pages start out uninitialized.
> The vepc driver provides a similar guarantee today for freshly-opened
> vepc instances, but guests such as Windows expect all pages to be in
> uninitialized state on startup, including after every guest reboot.
> 
> Some userspace implementations of virtual SGX would rather avoid having
> to close and reopen the /dev/sgx_vepc file descriptor and re-mmap the
> virtual EPC.  For example, they could be sandboxing themselves before the
> guest starts in order to mitigate exploits from untrusted guests,
> forbidding further calls to open().
> 
> Therefore, add a ioctl that does this with EREMOVE.  Userspace can
> invoke the ioctl to bring its vEPC pages back to uninitialized state.
> There is a possibility that some pages fail to be removed if they are
> SECS pages, so the ioctl returns the number of EREMOVE failures.  The
> correct usage is documented in sgx.rst.
> 
> Tested-by: Yang Zhong <yang.zhong@intel.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  Documentation/x86/sgx.rst       | 14 +++++++++++++
>  arch/x86/include/uapi/asm/sgx.h |  2 ++
>  arch/x86/kernel/cpu/sgx/virt.c  | 36 +++++++++++++++++++++++++++++++++
>  3 files changed, 52 insertions(+)
> 
> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
> index dd0ac96ff9ef..b855db96c6c6 100644
> --- a/Documentation/x86/sgx.rst
> +++ b/Documentation/x86/sgx.rst
> @@ -250,3 +250,17 @@ user wants to deploy SGX applications both on the host and in guests
>  on the same machine, the user should reserve enough EPC (by taking out
>  total virtual EPC size of all SGX VMs from the physical EPC size) for
>  host SGX applications so they can run with acceptable performance.
> +
> +Some guests, such as Windows, require that all pages are uninitialized
> +at startup or after a guest reboot.  

I would say this is not requirement of some windows guests, but requirement of
correctly emulating architectural behaviour.  On bare-metal EPC is lost on
reboot, so on VM reboot, virtual EPC should be reset too, regardless what
guest VM is.  

Because this state can be reached
> +only through the privileged ``ENCLS[EREMOVE]`` instruction, ``/dev/sgx_vepc``
> +provides the ``SGX_IOC_VEPC_REMOVE_ALL`` ioctl to execute the instruction on
> +all pages in the virtual EPC.
> +
> +The ioctl will often not able to remove SECS pages, in case their child
> +pages have not gone through ``EREMOVE`` yet; therefore, the ioctl returns the
> +number of pages that failed to be removed.  ``SGX_IOC_VEPC_REMOVE_ALL`` should
> +first be invoked on all the ``/dev/sgx_vepc`` file descriptors mapped
> +into the guest; a second call to the ioctl will be able to remove all
> +leftover pages and will return 0.  Any other return value on the second call
> +would be a symptom of a bug in either Linux or the userspace client.

Maybe also worth to mention userspace should guarantee there's no vcpu running
inside guest enclave when resetting guest's virtual EPC.

> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> index 9690d6899ad9..f79d84ce8033 100644
> --- a/arch/x86/include/uapi/asm/sgx.h
> +++ b/arch/x86/include/uapi/asm/sgx.h
> @@ -27,6 +27,8 @@ enum sgx_page_flags {
>  	_IOW(SGX_MAGIC, 0x02, struct sgx_enclave_init)
>  #define SGX_IOC_ENCLAVE_PROVISION \
>  	_IOW(SGX_MAGIC, 0x03, struct sgx_enclave_provision)
> +#define SGX_IOC_VEPC_REMOVE_ALL \
> +	_IO(SGX_MAGIC, 0x04)
>  
>  /**
>   * struct sgx_enclave_create - parameter structure for the
> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> index 59b9c13121cd..81dc186fda2e 100644
> --- a/arch/x86/kernel/cpu/sgx/virt.c
> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> @@ -154,6 +154,24 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
>  	return 0;
>  }
>  
> +static long sgx_vepc_remove_all(struct sgx_vepc *vepc)
> +{
> +	struct sgx_epc_page *entry;
> +	unsigned long index;
> +	long failures = 0;
> +
> +	xa_for_each(&vepc->page_array, index, entry)
> +		if (sgx_vepc_remove_page(entry))
> +			failures++;

I think need to hold vepc->lock?

> +
> +	/*
> +	 * Return the number of pages that failed to be removed, so
> +	 * userspace knows that there are still SECS pages lying
> +	 * around.
> +	 */
> +	return failures;
> +}
> +
>  static int sgx_vepc_release(struct inode *inode, struct file *file)
>  {
>  	struct sgx_vepc *vepc = file->private_data;
> @@ -239,9 +257,27 @@ static int sgx_vepc_open(struct inode *inode, struct file *file)
>  	return 0;
>  }
>  
> +static long sgx_vepc_ioctl(struct file *file,
> +			   unsigned int cmd, unsigned long arg)
> +{
> +	struct sgx_vepc *vepc = file->private_data;
> +
> +	switch (cmd) {
> +	case SGX_IOC_VEPC_REMOVE_ALL:
> +		if (arg)
> +			return -EINVAL;
> +		return sgx_vepc_remove_all(vepc);
> +
> +	default:
> +		return -ENOTTY;
> +	}
> +}
> +
>  static const struct file_operations sgx_vepc_fops = {
>  	.owner		= THIS_MODULE,
>  	.open		= sgx_vepc_open,
> +	.unlocked_ioctl	= sgx_vepc_ioctl,
> +	.compat_ioctl	= sgx_vepc_ioctl,
>  	.release	= sgx_vepc_release,
>  	.mmap		= sgx_vepc_mmap,
>  };
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE_ALL ioctl
  2021-09-20 22:17   ` Kai Huang
@ 2021-09-20 23:09     ` Dave Hansen
  2021-09-21 10:29       ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2021-09-20 23:09 UTC (permalink / raw)
  To: Kai Huang, Paolo Bonzini
  Cc: linux-kernel, kvm, x86, linux-sgx, jarkko, dave.hansen, yang.zhong

On 9/20/21 3:17 PM, Kai Huang wrote:
>> +The ioctl will often not able to remove SECS pages, in case their child
>> +pages have not gone through ``EREMOVE`` yet; therefore, the ioctl returns the
>> +number of pages that failed to be removed.  ``SGX_IOC_VEPC_REMOVE_ALL`` should
>> +first be invoked on all the ``/dev/sgx_vepc`` file descriptors mapped
>> +into the guest; a second call to the ioctl will be able to remove all
>> +leftover pages and will return 0.  Any other return value on the second call
>> +would be a symptom of a bug in either Linux or the userspace client.
> Maybe also worth to mention userspace should guarantee there's no vcpu running
> inside guest enclave when resetting guest's virtual EPC.

Why, specifically?

Is it because EREMOVE will also fail if there is a CPU running in the
enclave?

Maybe we should say something like:

The ioctl() returns the number of pages where removal failed.  Callers
of the ioctl() need to handle two sources of failure:

1) Page removal will always fail when any thread is running in the
   enclave to which the page belongs.  ioctl() users can avoid these
   failures by preventing execution of any vcpu which maps the virtual
   EPC.

2) Page removal will also fail for SGX "SECS" metadata pages which still
   have child pages.  SECS pages can be removed by executing
   ``SGX_IOC_VEPC_REMOVE_ALL`` on all ``/dev/sgx_vepc`` file descriptors
   mapped into the guest.  Yhis means that the ioctl() must be called
   twice: an initial set of calls to remove child pages and a subsequent
   set of calls to remove SECS pages.

   It indicates a bug in the kernel or the userspace client if any of
   the second round of ``SGX_IOC_VEPC_REMOVE_ALL`` calls fails (has a
   return code other than 0).


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE_ALL ioctl
  2021-09-20 23:09     ` Dave Hansen
@ 2021-09-21 10:29       ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2021-09-21 10:29 UTC (permalink / raw)
  To: Dave Hansen, Kai Huang
  Cc: linux-kernel, kvm, x86, linux-sgx, jarkko, dave.hansen, yang.zhong

On 21/09/21 01:09, Dave Hansen wrote:
>> Maybe also worth to mention userspace should guarantee there's no vcpu running
>> inside guest enclave when resetting guest's virtual EPC.
> Why, specifically?
> 
> Is it because EREMOVE will also fail if there is a CPU running in the
> enclave?

Yes, and SGX_ENCLAVE_ACT would cause a WARN.  Good catch, Kai, I'll fix it.

Paolo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-20 12:54 ` [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page Paolo Bonzini
@ 2021-09-21 19:44   ` Jarkko Sakkinen
  2021-09-21 19:46     ` Jarkko Sakkinen
  2021-09-23 12:08     ` Paolo Bonzini
  0 siblings, 2 replies; 26+ messages in thread
From: Jarkko Sakkinen @ 2021-09-21 19:44 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: x86, linux-sgx, dave.hansen, yang.zhong

On Mon, 2021-09-20 at 08:54 -0400, Paolo Bonzini wrote:
> For bare-metal SGX on real hardware, the hardware provides guarantees
> SGX state at reboot.  For instance, all pages start out uninitialized.
> The vepc driver provides a similar guarantee today for freshly-opened
> vepc instances, but guests such as Windows expect all pages to be in
> uninitialized state on startup, including after every guest reboot.

I would consider replacing

"For bare-metal SGX on real hardware, the hardware provides guarantees
SGX state at reboot.  For instance, all pages start out uninitialized."

something like

"On bare-metal SGX, start of a power cycle zeros all of its reserved
memory. This happens after every reboot, but in addition to that
happens after waking up from any of the sleep states."

I can speculate and imagine where this might useful, but no matter how
trivial or complex it is, this patch needs to nail a concrete usage
example. I'd presume you know well the exact changes needed for QEMU, so
from that knowledge it should be easy to write the motivational part.

For instance, point out where it is needed in QEMU and why. I.e. why
you end up in the first place having to re-use vepc buffers (or whatever
they should be called) in QEMU. When that is taken care of, then there is
a red line to eventually ack these patches.

About the motivation.

In Linux we do have a mechanism to take care of this in a guest, for which
motivation was actually first and foremost kexec. It was not done to let VMM to
give a corrupted memory state to a guest.

Even to a Linux guest, since EPC should stil be represented in the state that
matches the hardware.  It'd be essentially a corrupted state, even if there was
measures to resist this. Windows guests failing is essentially a side-effect
of an issue, not an issue in the Windows guests.

Since QEMU needs to reinitialize VEPC buffers for guests, it should be as
efficient as we ever can make it. Just fill the gap of understanding why
QEMU needs to do this for guest. This is exactly kind of stuff that you want
have documented in the commit log for future :-)

/Jarkko


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-21 19:44   ` Jarkko Sakkinen
@ 2021-09-21 19:46     ` Jarkko Sakkinen
  2021-09-23 12:08     ` Paolo Bonzini
  1 sibling, 0 replies; 26+ messages in thread
From: Jarkko Sakkinen @ 2021-09-21 19:46 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: x86, linux-sgx, dave.hansen, yang.zhong

On Tue, 2021-09-21 at 22:44 +0300, Jarkko Sakkinen wrote:
> Even to a Linux guest, since EPC should stil be represented in the state that
> matches the hardware.  It'd be essentially a corrupted state, even if there was
> measures to resist this. Windows guests failing is essentially a side-effect
> of an issue, not an issue in the Windows guests.

Ugh, typos, sorry. Even to a Linux guest it would be illegit what I was meaning
to say...

/Jarkko


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-21 19:44   ` Jarkko Sakkinen
  2021-09-21 19:46     ` Jarkko Sakkinen
@ 2021-09-23 12:08     ` Paolo Bonzini
  2021-09-23 20:33       ` Jarkko Sakkinen
  1 sibling, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2021-09-23 12:08 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-kernel, kvm
  Cc: x86, linux-sgx, dave.hansen, yang.zhong

On 21/09/21 21:44, Jarkko Sakkinen wrote:
> "On bare-metal SGX, start of a power cycle zeros all of its reserved 
> memory. This happens after every reboot, but in addition to that 
> happens after waking up from any of the sleep states."
> 
> I can speculate and imagine where this might useful, but no matter
> how trivial or complex it is, this patch needs to nail a concrete
> usage example. I'd presume you know well the exact changes needed for
> QEMU, so from that knowledge it should be easy to write the
> motivational part.

Assuming that it's obvious that QEMU knows how to reset a machine (which 
includes writes to the ACPI reset register, or wakeup from sleep 
states), the question of "why does userspace reuse vEPC" should be 
answered by this paragraph:

"One way to do this is to simply close and reopen the /dev/sgx_vepc file
descriptor and re-mmap the virtual EPC.  However, this is problematic
because it prevents sandboxing the userspace (for example forbidding
open() after the guest starts, or running in a mount namespace that
does not have access to /dev; both are doable with pre-opened file
descriptors and/or SCM_RIGHTS file descriptor passing)."

> Even to a Linux guest, since EPC should stil be represented in the
> state that matches the hardware.  It'd be essentially a corrupted
> state, even if there was measures to resist this. Windows guests
> failing is essentially a side-effect of an issue, not an issue in the
> Windows guests.

Right, Linux is more liberal than it needs to be and ksgxd does the 
EREMOVE itself at the beginning (__sgx_sanitize_pages).  Windows has 
stronger expectations of what can and cannot happen before it boots, 
which are entirely justified.

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-23 12:08     ` Paolo Bonzini
@ 2021-09-23 20:33       ` Jarkko Sakkinen
  0 siblings, 0 replies; 26+ messages in thread
From: Jarkko Sakkinen @ 2021-09-23 20:33 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: x86, linux-sgx, dave.hansen, yang.zhong

On Thu, 2021-09-23 at 14:08 +0200, Paolo Bonzini wrote:
> On 21/09/21 21:44, Jarkko Sakkinen wrote:
> > "On bare-metal SGX, start of a power cycle zeros all of its reserved 
> > memory. This happens after every reboot, but in addition to that 
> > happens after waking up from any of the sleep states."
> > 
> > I can speculate and imagine where this might useful, but no matter
> > how trivial or complex it is, this patch needs to nail a concrete
> > usage example. I'd presume you know well the exact changes needed for
> > QEMU, so from that knowledge it should be easy to write the
> > motivational part.
> 
> Assuming that it's obvious that QEMU knows how to reset a machine (which 
> includes writes to the ACPI reset register, or wakeup from sleep 
> states), the question of "why does userspace reuse vEPC" should be 
> answered by this paragraph:
> 
> "One way to do this is to simply close and reopen the /dev/sgx_vepc file
> descriptor and re-mmap the virtual EPC.  However, this is problematic
> because it prevents sandboxing the userspace (for example forbidding
> open() after the guest starts, or running in a mount namespace that
> does not have access to /dev; both are doable with pre-opened file
> descriptors and/or SCM_RIGHTS file descriptor passing)."

Right, this makes sense.

> 
> > Even to a Linux guest, since EPC should stil be represented in the
> > state that matches the hardware.  It'd be essentially a corrupted
> > state, even if there was measures to resist this. Windows guests
> > failing is essentially a side-effect of an issue, not an issue in the
> > Windows guests.
> 
> Right, Linux is more liberal than it needs to be and ksgxd does the 
> EREMOVE itself at the beginning (__sgx_sanitize_pages).  Windows has 
> stronger expectations of what can and cannot happen before it boots, 
> which are entirely justified.
> 
> Paolo

Yep. We do it for kexec(). Alternative would be to zero at the time
of kexec() but this way things are just way more simpler, e.g. the
whole behaviour is local to the driver...

/Jarkko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-14  5:36             ` Paolo Bonzini
@ 2021-09-14 16:05               ` Jarkko Sakkinen
  0 siblings, 0 replies; 26+ messages in thread
From: Jarkko Sakkinen @ 2021-09-14 16:05 UTC (permalink / raw)
  To: Paolo Bonzini, Dave Hansen, linux-kernel, kvm
  Cc: x86, linux-sgx, dave.hansen, yang.zhong

On Tue, 2021-09-14 at 07:36 +0200, Paolo Bonzini wrote:
> On 13/09/21 23:13, Jarkko Sakkinen wrote:
> > > Apart from reclaiming, /dev/sgx_vepc might disappear between the first
> > > open() and subsequent ones.
> > 
> > If /dev/sgx_vepc disappears, why is it a problem *for the software*, and
> > not a sysadmin problem?
> 
> Rather than disappearing, it could be that a program first gets all the 
> resources it needs before it gets malicious input, and then enter a 
> restrictive sandbox.  In this case open() could be completely forbidden.
> 
> I will improve the documentation and changelogs when I post the non-RFC 
> version; that could have been done better, sorry.
> 

No worries, just trying to get bottom of the actual issue.

/Jarkko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-13 21:13           ` Jarkko Sakkinen
@ 2021-09-14  5:36             ` Paolo Bonzini
  2021-09-14 16:05               ` Jarkko Sakkinen
  0 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2021-09-14  5:36 UTC (permalink / raw)
  To: Jarkko Sakkinen, Dave Hansen, linux-kernel, kvm
  Cc: x86, linux-sgx, dave.hansen, yang.zhong

On 13/09/21 23:13, Jarkko Sakkinen wrote:
>> Apart from reclaiming, /dev/sgx_vepc might disappear between the first
>> open() and subsequent ones.
>
> If /dev/sgx_vepc disappears, why is it a problem *for the software*, and
> not a sysadmin problem?

Rather than disappearing, it could be that a program first gets all the 
resources it needs before it gets malicious input, and then enter a 
restrictive sandbox.  In this case open() could be completely forbidden.

I will improve the documentation and changelogs when I post the non-RFC 
version; that could have been done better, sorry.

Paolo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-13 19:25               ` Dave Hansen
@ 2021-09-13 21:16                 ` Jarkko Sakkinen
  0 siblings, 0 replies; 26+ messages in thread
From: Jarkko Sakkinen @ 2021-09-13 21:16 UTC (permalink / raw)
  To: Dave Hansen, Paolo Bonzini, linux-kernel, kvm
  Cc: x86, linux-sgx, dave.hansen, yang.zhong

On Mon, 2021-09-13 at 12:25 -0700, Dave Hansen wrote:
> On 9/13/21 11:35 AM, Paolo Bonzini wrote:
> > > > Apart from reclaiming, /dev/sgx_vepc might disappear between the first
> > > > open() and subsequent ones.
> > > 
> > > Aside from it being rm'd, I don't think that's possible.
> > > 
> > 
> > Being rm'd would be a possibility in principle, and it would be ugly for
> > it to cause issues on running VMs.  Also I'd like for it to be able to
> > pass /dev/sgx_vepc in via a file descriptor, and run QEMU in a chroot or
> > a mount namespace.  Alternatively, with seccomp it may be possible to
> > sandbox a running QEMU process in such a way that open() is forbidden at
> > runtime (all hotplug is done via file descriptor passing); it is not yet
> > possible, but it is a goal.
> 
> OK, so maybe another way of saying this:
> 
> For bare-metal SGX on real hardware, the hardware provides guarantees
> SGX state at reboot.  For instance, all pages start out uninitialized.
> The vepc driver provides a similar guarantee today for freshly-opened
> vepc instances.
> 
> But, vepc users have a problem: they might want to run an OS that
> expects to be booted with clean, fully uninitialized SGX state, just as
> it would be on bare-metal.  Right now, the only way to get that is to
> create a new vepc instance.  That might not be possible in all
> configurations, like if the permission to open(/dev/sgx_vepc) has been
> lost since the VM was first booted.

So you maintain your systems in a way that this does not happen?

/Jarkko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-13 18:35             ` Paolo Bonzini
  2021-09-13 19:25               ` Dave Hansen
@ 2021-09-13 21:15               ` Jarkko Sakkinen
  1 sibling, 0 replies; 26+ messages in thread
From: Jarkko Sakkinen @ 2021-09-13 21:15 UTC (permalink / raw)
  To: Paolo Bonzini, Dave Hansen, linux-kernel, kvm
  Cc: x86, linux-sgx, dave.hansen, yang.zhong

On Mon, 2021-09-13 at 20:35 +0200, Paolo Bonzini wrote:
> On 13/09/21 17:29, Dave Hansen wrote:
> > On 9/13/21 8:14 AM, Paolo Bonzini wrote:
> > > On 13/09/21 16:55, Dave Hansen wrote:
> > > > > By "Windows startup" I mean even after guest reboot.  Because another
> > > > > process could sneak in and steal your EPC pages between a close() and an
> > > > > open(), I'd like to have a way to EREMOVE the pages while keeping them
> > > > > assigned to the specific vEPC instance, i.e.*without*  going through
> > > > > sgx_vepc_free_page().
> > > > Oh, so you want fresh EPC state for the guest, but you're concerned that
> > > > the previous guest might have left them in a bad state.  The current
> > > > method of getting a new vepc instance (which guarantees fresh state) has
> > > > some other downsides.
> > > > 
> > > > Can't another process steal pages via sgxd and reclaim at any time?
> > > 
> > > vEPC pages never call sgx_mark_page_reclaimable, don't they?
> > 
> > Oh, I was just looking that they were on the SGX LRU.  You might be right.
> > But, we certainly don't want the fact that they are unreclaimable today
> > to be part of the ABI.  It's more of a bug than a feature.
> 
> Sure, that's fine.
> 
> > > > What's the extra concern here about going through a close()/open()
> > > > cycle?  Performance?
> > > 
> > > Apart from reclaiming, /dev/sgx_vepc might disappear between the first
> > > open() and subsequent ones.
> > 
> > Aside from it being rm'd, I don't think that's possible.
> > 
> 
> Being rm'd would be a possibility in principle, and it would be ugly for 
> it to cause issues on running VMs.  Also I'd like for it to be able to 
> pass /dev/sgx_vepc in via a file descriptor, and run QEMU in a chroot or 
> a mount namespace.  Alternatively, with seccomp it may be possible to 
> sandbox a running QEMU process in such a way that open() is forbidden at 
> runtime (all hotplug is done via file descriptor passing); it is not yet 
> possible, but it is a goal.

AFAIK, as long you have open files for a device, they work
as expected.

/Jarkko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-13 15:14         ` Paolo Bonzini
  2021-09-13 15:29           ` Dave Hansen
@ 2021-09-13 21:13           ` Jarkko Sakkinen
  2021-09-14  5:36             ` Paolo Bonzini
  1 sibling, 1 reply; 26+ messages in thread
From: Jarkko Sakkinen @ 2021-09-13 21:13 UTC (permalink / raw)
  To: Paolo Bonzini, Dave Hansen, linux-kernel, kvm
  Cc: x86, linux-sgx, dave.hansen, yang.zhong

On Mon, 2021-09-13 at 17:14 +0200, Paolo Bonzini wrote:
> On 13/09/21 16:55, Dave Hansen wrote:
> > > By "Windows startup" I mean even after guest reboot.  Because another
> > > process could sneak in and steal your EPC pages between a close() and an
> > > open(), I'd like to have a way to EREMOVE the pages while keeping them
> > > assigned to the specific vEPC instance, i.e.*without*  going through
> > > sgx_vepc_free_page().
> > Oh, so you want fresh EPC state for the guest, but you're concerned that
> > the previous guest might have left them in a bad state.  The current
> > method of getting a new vepc instance (which guarantees fresh state) has
> > some other downsides.
> > 
> > Can't another process steal pages via sgxd and reclaim at any time?
> 
> vEPC pages never call sgx_mark_page_reclaimable, don't they?
> 
> > What's the extra concern here about going through a close()/open()
> > cycle?  Performance?
> 
> Apart from reclaiming, /dev/sgx_vepc might disappear between the first 
> open() and subsequent ones.

If /dev/sgx_vepc dissapears, why is it a problem *for the software*, and
not a sysadmin problem?

I think that this is what the whole patch is lacking, why are we talking
about a software problem...

/Jarkko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-13 14:55       ` Dave Hansen
  2021-09-13 15:14         ` Paolo Bonzini
@ 2021-09-13 21:12         ` Jarkko Sakkinen
  1 sibling, 0 replies; 26+ messages in thread
From: Jarkko Sakkinen @ 2021-09-13 21:12 UTC (permalink / raw)
  To: Dave Hansen, Paolo Bonzini, linux-kernel, kvm
  Cc: x86, linux-sgx, dave.hansen, yang.zhong

On Mon, 2021-09-13 at 07:55 -0700, Dave Hansen wrote:
> On 9/13/21 7:24 AM, Paolo Bonzini wrote:
> > > How does this end up happening in the first place?
> > > 
> > > All enclave pages should start out on 'sgx_dirty_page_list' and
> > > ksgxd sanitizes them with EREMOVE before making them available.  That
> > > should cover EREMOVE after reboots while SGX pages are initialized,
> > > including kexec().
> > 
> > By "Windows startup" I mean even after guest reboot.  Because another
> > process could sneak in and steal your EPC pages between a close() and an
> > open(), I'd like to have a way to EREMOVE the pages while keeping them
> > assigned to the specific vEPC instance, i.e. *without* going through
> > sgx_vepc_free_page().
> 
> Oh, so you want fresh EPC state for the guest, but you're concerned that
> the previous guest might have left them in a bad state.  The current
> method of getting a new vepc instance (which guarantees fresh state) has
> some other downsides.
> 
> Can't another process steal pages via sgxd and reclaim at any time?
> What's the extra concern here about going through a close()/open()
> cycle?  Performance?

sgxd does not steal anything from vepc regions.

They are not part of the reclaiming process.

/Jarkko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-13 14:24     ` Paolo Bonzini
  2021-09-13 14:55       ` Dave Hansen
@ 2021-09-13 21:00       ` Jarkko Sakkinen
  1 sibling, 0 replies; 26+ messages in thread
From: Jarkko Sakkinen @ 2021-09-13 21:00 UTC (permalink / raw)
  To: Paolo Bonzini, Dave Hansen, linux-kernel, kvm
  Cc: x86, linux-sgx, dave.hansen, yang.zhong

On Mon, 2021-09-13 at 16:24 +0200, Paolo Bonzini wrote:
> On 13/09/21 16:05, Dave Hansen wrote:
> > On 9/13/21 6:11 AM, Paolo Bonzini wrote:
> > > Windows expects all pages to be in uninitialized state on startup.
> > > In order to implement this, we will need a ioctl that performs
> > > EREMOVE on all pages mapped by a /dev/sgx_vepc file descriptor:
> > > other possibilities, such as closing and reopening the device,
> > > are racy.
> > 
> > Hi Paolo,
> > 
> > How does this end up happening in the first place?
> > 
> > All enclave pages should start out on 'sgx_dirty_page_list' and
> > ksgxd sanitizes them with EREMOVE before making them available.  That
> > should cover EREMOVE after reboots while SGX pages are initialized,
> > including kexec().
> 
> By "Windows startup" I mean even after guest reboot.  Because another 
> process could sneak in and steal your EPC pages between a close() and an 
> open(), I'd like to have a way to EREMOVE the pages while keeping them 
> assigned to the specific vEPC instance, i.e. *without* going through 
> sgx_vepc_free_page().

Isn't "other process in and steal your EPC pages" more like sysadmin
problem, rather than software?

I'm lacking of understanding what would be the collateral damage in
the end.

/Jarkko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-13 13:11 ` [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page Paolo Bonzini
  2021-09-13 14:05   ` Dave Hansen
@ 2021-09-13 20:33   ` Jarkko Sakkinen
  1 sibling, 0 replies; 26+ messages in thread
From: Jarkko Sakkinen @ 2021-09-13 20:33 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: x86, linux-sgx, dave.hansen, yang.zhong

On Mon, 2021-09-13 at 09:11 -0400, Paolo Bonzini wrote:
> Windows expects all pages to be in uninitialized state on startup.
> In order to implement this, we will need a ioctl that performs
> EREMOVE on all pages mapped by a /dev/sgx_vepc file descriptor:
> other possibilities, such as closing and reopening the device,
> are racy.

So what makes it racy, and what do mean by racy in this case?

I.e. you have to do open() and mmap(), and munmap() + close()
for removal. Depending on situation that is racy or not...

And is "Windows" more precisely a "Windows guest running in
Linux QEMU host"? It's ambiguous..

/Jarkko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-13 18:35             ` Paolo Bonzini
@ 2021-09-13 19:25               ` Dave Hansen
  2021-09-13 21:16                 ` Jarkko Sakkinen
  2021-09-13 21:15               ` Jarkko Sakkinen
  1 sibling, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2021-09-13 19:25 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: x86, linux-sgx, jarkko, dave.hansen, yang.zhong

On 9/13/21 11:35 AM, Paolo Bonzini wrote:
>>> Apart from reclaiming, /dev/sgx_vepc might disappear between the first
>>> open() and subsequent ones.
>>
>> Aside from it being rm'd, I don't think that's possible.
>>
> 
> Being rm'd would be a possibility in principle, and it would be ugly for
> it to cause issues on running VMs.  Also I'd like for it to be able to
> pass /dev/sgx_vepc in via a file descriptor, and run QEMU in a chroot or
> a mount namespace.  Alternatively, with seccomp it may be possible to
> sandbox a running QEMU process in such a way that open() is forbidden at
> runtime (all hotplug is done via file descriptor passing); it is not yet
> possible, but it is a goal.

OK, so maybe another way of saying this:

For bare-metal SGX on real hardware, the hardware provides guarantees
SGX state at reboot.  For instance, all pages start out uninitialized.
The vepc driver provides a similar guarantee today for freshly-opened
vepc instances.

But, vepc users have a problem: they might want to run an OS that
expects to be booted with clean, fully uninitialized SGX state, just as
it would be on bare-metal.  Right now, the only way to get that is to
create a new vepc instance.  That might not be possible in all
configurations, like if the permission to open(/dev/sgx_vepc) has been
lost since the VM was first booted.

Windows has these expectations about booting with fully uninitialized
state.  There are also a number of environments where QEMU is sandboxed
or drops permissions in a way that prevent free and open access to
/dev/sgx_vepc.

So good so far?


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-13 15:29           ` Dave Hansen
@ 2021-09-13 18:35             ` Paolo Bonzini
  2021-09-13 19:25               ` Dave Hansen
  2021-09-13 21:15               ` Jarkko Sakkinen
  0 siblings, 2 replies; 26+ messages in thread
From: Paolo Bonzini @ 2021-09-13 18:35 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel, kvm
  Cc: x86, linux-sgx, jarkko, dave.hansen, yang.zhong

On 13/09/21 17:29, Dave Hansen wrote:
> On 9/13/21 8:14 AM, Paolo Bonzini wrote:
>> On 13/09/21 16:55, Dave Hansen wrote:
>>>> By "Windows startup" I mean even after guest reboot.  Because another
>>>> process could sneak in and steal your EPC pages between a close() and an
>>>> open(), I'd like to have a way to EREMOVE the pages while keeping them
>>>> assigned to the specific vEPC instance, i.e.*without*  going through
>>>> sgx_vepc_free_page().
>>> Oh, so you want fresh EPC state for the guest, but you're concerned that
>>> the previous guest might have left them in a bad state.  The current
>>> method of getting a new vepc instance (which guarantees fresh state) has
>>> some other downsides.
>>>
>>> Can't another process steal pages via sgxd and reclaim at any time?
>>
>> vEPC pages never call sgx_mark_page_reclaimable, don't they?
> 
> Oh, I was just looking that they were on the SGX LRU.  You might be right.
> But, we certainly don't want the fact that they are unreclaimable today
> to be part of the ABI.  It's more of a bug than a feature.

Sure, that's fine.

>>> What's the extra concern here about going through a close()/open()
>>> cycle?  Performance?
>>
>> Apart from reclaiming, /dev/sgx_vepc might disappear between the first
>> open() and subsequent ones.
> 
> Aside from it being rm'd, I don't think that's possible.
> 

Being rm'd would be a possibility in principle, and it would be ugly for 
it to cause issues on running VMs.  Also I'd like for it to be able to 
pass /dev/sgx_vepc in via a file descriptor, and run QEMU in a chroot or 
a mount namespace.  Alternatively, with seccomp it may be possible to 
sandbox a running QEMU process in such a way that open() is forbidden at 
runtime (all hotplug is done via file descriptor passing); it is not yet 
possible, but it is a goal.

Paolo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-13 15:14         ` Paolo Bonzini
@ 2021-09-13 15:29           ` Dave Hansen
  2021-09-13 18:35             ` Paolo Bonzini
  2021-09-13 21:13           ` Jarkko Sakkinen
  1 sibling, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2021-09-13 15:29 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: x86, linux-sgx, jarkko, dave.hansen, yang.zhong

On 9/13/21 8:14 AM, Paolo Bonzini wrote:
> On 13/09/21 16:55, Dave Hansen wrote:
>>> By "Windows startup" I mean even after guest reboot.  Because another
>>> process could sneak in and steal your EPC pages between a close() and an
>>> open(), I'd like to have a way to EREMOVE the pages while keeping them
>>> assigned to the specific vEPC instance, i.e.*without*  going through
>>> sgx_vepc_free_page().
>> Oh, so you want fresh EPC state for the guest, but you're concerned that
>> the previous guest might have left them in a bad state.  The current
>> method of getting a new vepc instance (which guarantees fresh state) has
>> some other downsides.
>>
>> Can't another process steal pages via sgxd and reclaim at any time?
> 
> vEPC pages never call sgx_mark_page_reclaimable, don't they?

Oh, I was just looking that they were on the SGX LRU.  You might be right.

But, we certainly don't want the fact that they are unreclaimable today
to be part of the ABI.  It's more of a bug than a feature.

>> What's the extra concern here about going through a close()/open()
>> cycle?  Performance?
> 
> Apart from reclaiming, /dev/sgx_vepc might disappear between the first
> open() and subsequent ones.

Aside from it being rm'd, I don't think that's possible.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-13 14:55       ` Dave Hansen
@ 2021-09-13 15:14         ` Paolo Bonzini
  2021-09-13 15:29           ` Dave Hansen
  2021-09-13 21:13           ` Jarkko Sakkinen
  2021-09-13 21:12         ` Jarkko Sakkinen
  1 sibling, 2 replies; 26+ messages in thread
From: Paolo Bonzini @ 2021-09-13 15:14 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel, kvm
  Cc: x86, linux-sgx, jarkko, dave.hansen, yang.zhong

On 13/09/21 16:55, Dave Hansen wrote:
>> By "Windows startup" I mean even after guest reboot.  Because another
>> process could sneak in and steal your EPC pages between a close() and an
>> open(), I'd like to have a way to EREMOVE the pages while keeping them
>> assigned to the specific vEPC instance, i.e.*without*  going through
>> sgx_vepc_free_page().
> Oh, so you want fresh EPC state for the guest, but you're concerned that
> the previous guest might have left them in a bad state.  The current
> method of getting a new vepc instance (which guarantees fresh state) has
> some other downsides.
> 
> Can't another process steal pages via sgxd and reclaim at any time?

vEPC pages never call sgx_mark_page_reclaimable, don't they?

> What's the extra concern here about going through a close()/open()
> cycle?  Performance?

Apart from reclaiming, /dev/sgx_vepc might disappear between the first 
open() and subsequent ones.

Paolo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-13 14:24     ` Paolo Bonzini
@ 2021-09-13 14:55       ` Dave Hansen
  2021-09-13 15:14         ` Paolo Bonzini
  2021-09-13 21:12         ` Jarkko Sakkinen
  2021-09-13 21:00       ` Jarkko Sakkinen
  1 sibling, 2 replies; 26+ messages in thread
From: Dave Hansen @ 2021-09-13 14:55 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: x86, linux-sgx, jarkko, dave.hansen, yang.zhong

On 9/13/21 7:24 AM, Paolo Bonzini wrote:
>> How does this end up happening in the first place?
>>
>> All enclave pages should start out on 'sgx_dirty_page_list' and
>> ksgxd sanitizes them with EREMOVE before making them available.  That
>> should cover EREMOVE after reboots while SGX pages are initialized,
>> including kexec().
> 
> By "Windows startup" I mean even after guest reboot.  Because another
> process could sneak in and steal your EPC pages between a close() and an
> open(), I'd like to have a way to EREMOVE the pages while keeping them
> assigned to the specific vEPC instance, i.e. *without* going through
> sgx_vepc_free_page().

Oh, so you want fresh EPC state for the guest, but you're concerned that
the previous guest might have left them in a bad state.  The current
method of getting a new vepc instance (which guarantees fresh state) has
some other downsides.

Can't another process steal pages via sgxd and reclaim at any time?
What's the extra concern here about going through a close()/open()
cycle?  Performance?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-13 14:05   ` Dave Hansen
@ 2021-09-13 14:24     ` Paolo Bonzini
  2021-09-13 14:55       ` Dave Hansen
  2021-09-13 21:00       ` Jarkko Sakkinen
  0 siblings, 2 replies; 26+ messages in thread
From: Paolo Bonzini @ 2021-09-13 14:24 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel, kvm
  Cc: x86, linux-sgx, jarkko, dave.hansen, yang.zhong

On 13/09/21 16:05, Dave Hansen wrote:
> On 9/13/21 6:11 AM, Paolo Bonzini wrote:
>> Windows expects all pages to be in uninitialized state on startup.
>> In order to implement this, we will need a ioctl that performs
>> EREMOVE on all pages mapped by a /dev/sgx_vepc file descriptor:
>> other possibilities, such as closing and reopening the device,
>> are racy.
> 
> Hi Paolo,
> 
> How does this end up happening in the first place?
> 
> All enclave pages should start out on 'sgx_dirty_page_list' and
> ksgxd sanitizes them with EREMOVE before making them available.  That
> should cover EREMOVE after reboots while SGX pages are initialized,
> including kexec().

By "Windows startup" I mean even after guest reboot.  Because another 
process could sneak in and steal your EPC pages between a close() and an 
open(), I'd like to have a way to EREMOVE the pages while keeping them 
assigned to the specific vEPC instance, i.e. *without* going through 
sgx_vepc_free_page().

Thanks,

Paolo

> sgx_vepc_free_page() should do the same for pages that a guest not not
> clean up properly.
> 
> sgx_encl_free_epc_page() does an EREMOVE after a normal enclave has used
> a page.
> 
> Those are the only three cases that I can think of.  So, it sounds like
> one of those is buggy, or there's another unexpected path out there.
> Ultimately, I think it would be really handy if we could do this EREMOVE
> implicitly and without any new ABI.
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-13 13:11 ` [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page Paolo Bonzini
@ 2021-09-13 14:05   ` Dave Hansen
  2021-09-13 14:24     ` Paolo Bonzini
  2021-09-13 20:33   ` Jarkko Sakkinen
  1 sibling, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2021-09-13 14:05 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: x86, linux-sgx, jarkko, dave.hansen, yang.zhong

On 9/13/21 6:11 AM, Paolo Bonzini wrote:
> Windows expects all pages to be in uninitialized state on startup.
> In order to implement this, we will need a ioctl that performs
> EREMOVE on all pages mapped by a /dev/sgx_vepc file descriptor:
> other possibilities, such as closing and reopening the device,
> are racy.

Hi Paolo,

How does this end up happening in the first place?

All enclave pages should start out on 'sgx_dirty_page_list' and
ksgxd sanitizes them with EREMOVE before making them available.  That
should cover EREMOVE after reboots while SGX pages are initialized,
including kexec().

sgx_vepc_free_page() should do the same for pages that a guest not not
clean up properly.

sgx_encl_free_epc_page() does an EREMOVE after a normal enclave has used
a page.

Those are the only three cases that I can think of.  So, it sounds like
one of those is buggy, or there's another unexpected path out there.
Ultimately, I think it would be really handy if we could do this EREMOVE
implicitly and without any new ABI.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page
  2021-09-13 13:11 [RFC/RFT PATCH 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages Paolo Bonzini
@ 2021-09-13 13:11 ` Paolo Bonzini
  2021-09-13 14:05   ` Dave Hansen
  2021-09-13 20:33   ` Jarkko Sakkinen
  0 siblings, 2 replies; 26+ messages in thread
From: Paolo Bonzini @ 2021-09-13 13:11 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: x86, linux-sgx, jarkko, dave.hansen, yang.zhong

Windows expects all pages to be in uninitialized state on startup.
In order to implement this, we will need a ioctl that performs
EREMOVE on all pages mapped by a /dev/sgx_vepc file descriptor:
other possibilities, such as closing and reopening the device,
are racy.

Start the implementation by pulling the EREMOVE into a separate
function.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kernel/cpu/sgx/virt.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 64511c4a5200..59b9c13121cd 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -111,7 +111,7 @@ static int sgx_vepc_mmap(struct file *file, struct vm_area_struct *vma)
 	return 0;
 }
 
-static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
+static int sgx_vepc_remove_page(struct sgx_epc_page *epc_page)
 {
 	int ret;
 
@@ -140,11 +140,17 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
 		 */
 		WARN_ONCE(ret != SGX_CHILD_PRESENT, EREMOVE_ERROR_MESSAGE,
 			  ret, ret);
-		return ret;
 	}
+	return ret;
+}
 
-	sgx_free_epc_page(epc_page);
+static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
+{
+	int ret = sgx_vepc_remove_page(epc_page);
+	if (ret)
+		return ret;
 
+	sgx_free_epc_page(epc_page);
 	return 0;
 }
 
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-09-23 20:33 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-20 12:53 [PATCH 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages Paolo Bonzini
2021-09-20 12:54 ` [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page Paolo Bonzini
2021-09-21 19:44   ` Jarkko Sakkinen
2021-09-21 19:46     ` Jarkko Sakkinen
2021-09-23 12:08     ` Paolo Bonzini
2021-09-23 20:33       ` Jarkko Sakkinen
2021-09-20 12:54 ` [PATCH 2/2] x86: sgx_vepc: implement SGX_IOC_VEPC_REMOVE_ALL ioctl Paolo Bonzini
2021-09-20 22:17   ` Kai Huang
2021-09-20 23:09     ` Dave Hansen
2021-09-21 10:29       ` Paolo Bonzini
  -- strict thread matches above, loose matches on Subject: below --
2021-09-13 13:11 [RFC/RFT PATCH 0/2] x86: sgx_vepc: implement ioctl to EREMOVE all pages Paolo Bonzini
2021-09-13 13:11 ` [PATCH 1/2] x86: sgx_vepc: extract sgx_vepc_remove_page Paolo Bonzini
2021-09-13 14:05   ` Dave Hansen
2021-09-13 14:24     ` Paolo Bonzini
2021-09-13 14:55       ` Dave Hansen
2021-09-13 15:14         ` Paolo Bonzini
2021-09-13 15:29           ` Dave Hansen
2021-09-13 18:35             ` Paolo Bonzini
2021-09-13 19:25               ` Dave Hansen
2021-09-13 21:16                 ` Jarkko Sakkinen
2021-09-13 21:15               ` Jarkko Sakkinen
2021-09-13 21:13           ` Jarkko Sakkinen
2021-09-14  5:36             ` Paolo Bonzini
2021-09-14 16:05               ` Jarkko Sakkinen
2021-09-13 21:12         ` Jarkko Sakkinen
2021-09-13 21:00       ` Jarkko Sakkinen
2021-09-13 20:33   ` Jarkko Sakkinen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).