* [RFC PATCH 0/4] x86/sgx: implement support for MADV_WILLNEED
@ 2022-10-19 19:14 Haitao Huang
  2022-10-19 19:14 ` [RFC PATCH 1/4] x86/sgx: Export sgx_encl_eaug_page Haitao Huang
  0 siblings, 1 reply; 9+ messages in thread
From: Haitao Huang @ 2022-10-19 19:14 UTC (permalink / raw)
  To: linux-sgx, jarkko, dave.hansen, reinette.chatre, vijay.dhanraj

Currently, after enclave initialization, EPC pages can only be added
one page at a time on page fault. This may be too slow for some use
cases in which a large number of EPC pages are dynamically allocated.

Previously we attempted [1] to address this issue by implementing
support for the semantics of the MAP_POPULATE flag passed into mmap().
However, some mm maintainers have concerns about adding a new callback
to fops for that purpose [2].

This series adopts the MADV_WILLNEED alternative suggested by Dave in
previous discussions [3]. The SGX driver implements fops->fadvise() so
that user space can use madvise(..., MADV_WILLNEED) to instruct the
kernel to EAUG pages for a given range as soon as possible.

Compared to the MAP_POPULATE approach, this alternative requires an
additional madvise() call after mmap() from user space, but it does not
need any kernel changes outside the SGX driver. The separate madvise()
call also gives user space the flexibility to EAUG only a subrange of
an enclosing VMA.
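
For illustration, a minimal user-space sketch of the intended call
sequence could look like the following (encl_fd, base and len are
hypothetical placeholders for values the enclave runtime already knows;
this is not part of the series):

	#include <err.h>
	#include <sys/mman.h>

	static void prefault_enclave_range(int encl_fd, unsigned long base,
					   size_t len)
	{
		/* Map a page-aligned range of an initialized enclave. */
		void *area = mmap((void *)base, len, PROT_READ | PROT_WRITE,
				  MAP_SHARED | MAP_FIXED, encl_fd, 0);
		if (area == MAP_FAILED)
			err(1, "mmap");

		/*
		 * Ask the driver to EAUG the range ahead of first access.
		 * Best effort: it may stop early if EPC is under pressure.
		 */
		if (madvise(area, len, MADV_WILLNEED))
			warn("madvise(MADV_WILLNEED)");

		/* Pages still need EACCEPT from inside the enclave before use. */
	}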

The core implementation is in the second patch, while the first patch
only refactors the function handling EAUG on page fault so it can be
reused.

The last two patches add a microbenchmark to the SGX selftests to
measure the performance difference. The following speedups for various
allocation sizes were observed when I ran it on a platform with 4G of
EPC. They indicate that the change roughly halves the run time until
EPC swapping is activated, at which point EAUG for madvise stops.
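
For reference, the speedup below is computed the same way as in the
selftest added by the last patch:

	speedup = (t_fault - t_willneed) / t_fault * 100%

where t_fault is the time to EACCEPT the range with EAUG driven purely
by page faults, and t_willneed is the time of a madvise(..., MADV_WILLNEED)
call followed by EACCEPT of the same range.
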
-------------------------
  Alloc. size: Speedup
-------------------------
      1 page : 75%
      2 pages: 48%
      4 pages: 55%
      8 pages: 58%
     16 pages: 62%
     32 pages: 62%
     64 pages: 62%
    128 pages: 62%
    256 pages: 73%
    512 pages: 62%
   1024 pages: 62%
   2048 pages: 62%
   4096 pages: 61%
   8192 pages: 61%
  16384 pages: 61%
  32768 pages: 71%
  65536 pages: 61%
 131072 pages: 62%
 262144 pages: 62%
 524288 pages: 62%
1048576 pages: 55%
2097152 pages: 19%
-------------------------

Thank you very much for your attention and any comments/feedback.

Haitao

[1] https://lore.kernel.org/all/20220308112833.262805-1-jarkko@kernel.org/
[2] https://lore.kernel.org/linux-sgx/20220306021534.83553-1-jarkko@kernel.org/
[3] https://lore.kernel.org/linux-sgx/c3083144-bfc1-3260-164c-e59b2d110df8@intel.com/

Haitao Huang (4):
  x86/sgx: Export sgx_encl_eaug_page
  x86/sgx: Implement support for MADV_WILLNEED
  selftests/sgx: add len field for EACCEPT op
  selftests/sgx: Add test for madvise(..., WILLNEED)

 arch/x86/kernel/cpu/sgx/driver.c        |  81 ++++++++++++
 arch/x86/kernel/cpu/sgx/encl.c          |  46 ++++---
 arch/x86/kernel/cpu/sgx/encl.h          |   3 +-
 tools/testing/selftests/sgx/defines.h   |   1 +
 tools/testing/selftests/sgx/main.c      | 167 ++++++++++++++++++++++++
 tools/testing/selftests/sgx/test_encl.c |  20 ++-
 6 files changed, 295 insertions(+), 23 deletions(-)

-- 
2.25.1



* [RFC PATCH 1/4] x86/sgx: Export sgx_encl_eaug_page
  2022-10-19 19:14 [RFC PATCH 0/4] x86/sgx: implement support for MADV_WILLNEED Haitao Huang
@ 2022-10-19 19:14 ` Haitao Huang
  2022-10-19 19:14   ` [RFC PATCH 2/4] x86/sgx: Implement support for MADV_WILLNEED Haitao Huang
  2022-10-23 21:22   ` [RFC PATCH 1/4] x86/sgx: Export sgx_encl_eaug_page Jarkko Sakkinen
  0 siblings, 2 replies; 9+ messages in thread
From: Haitao Huang @ 2022-10-19 19:14 UTC (permalink / raw)
  To: linux-sgx, jarkko, dave.hansen, reinette.chatre, vijay.dhanraj

Change the return type so it can be reused later for fops->fadvise().

Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c | 46 ++++++++++++++++++++++------------
 arch/x86/kernel/cpu/sgx/encl.h |  3 ++-
 2 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 8bdeae2fc309..c57e60d5a0aa 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -305,11 +305,11 @@ struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
  * on a SGX2 system then the EPC can be added dynamically via the SGX2
  * ENCLS[EAUG] instruction.
  *
- * Returns: Appropriate vm_fault_t: VM_FAULT_NOPAGE when PTE was installed
- * successfully, VM_FAULT_SIGBUS or VM_FAULT_OOM as error otherwise.
+ * Returns: 0 when PTE was installed successfully, -EBUSY for waiting on
+ * reclaimer to free EPC, -ENOMEM for out of RAM, -EFAULT as error otherwise.
  */
-static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
-				     struct sgx_encl *encl, unsigned long addr)
+int sgx_encl_eaug_page(struct vm_area_struct *vma,
+		       struct sgx_encl *encl, unsigned long addr)
 {
 	vm_fault_t vmret = VM_FAULT_SIGBUS;
 	struct sgx_pageinfo pginfo = {0};
@@ -318,10 +318,10 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
 	struct sgx_va_page *va_page;
 	unsigned long phys_addr;
 	u64 secinfo_flags;
-	int ret;
+	int ret = -EFAULT;
 
 	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
-		return VM_FAULT_SIGBUS;
+		return -EFAULT;
 
 	/*
 	 * Ignore internal permission checking for dynamically added pages.
@@ -332,21 +332,21 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
 	secinfo_flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_X;
 	encl_page = sgx_encl_page_alloc(encl, addr - encl->base, secinfo_flags);
 	if (IS_ERR(encl_page))
-		return VM_FAULT_OOM;
+		return -ENOMEM;
 
 	mutex_lock(&encl->lock);
 
 	epc_page = sgx_alloc_epc_page(encl_page, false);
 	if (IS_ERR(epc_page)) {
 		if (PTR_ERR(epc_page) == -EBUSY)
-			vmret =  VM_FAULT_NOPAGE;
+			ret =  -EBUSY;
 		goto err_out_unlock;
 	}
 
 	va_page = sgx_encl_grow(encl, false);
 	if (IS_ERR(va_page)) {
 		if (PTR_ERR(va_page) == -EBUSY)
-			vmret = VM_FAULT_NOPAGE;
+			ret = -EBUSY;
 		goto err_out_epc;
 	}
 
@@ -359,16 +359,20 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
 	 * If ret == -EBUSY then page was created in another flow while
 	 * running without encl->lock
 	 */
-	if (ret)
+	if (ret) {
+		ret = -EFAULT;
 		goto err_out_shrink;
+	}
 
 	pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
 	pginfo.addr = encl_page->desc & PAGE_MASK;
 	pginfo.metadata = 0;
 
 	ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page));
-	if (ret)
+	if (ret) {
+		ret = -EFAULT;
 		goto err_out;
+	}
 
 	encl_page->encl = encl;
 	encl_page->epc_page = epc_page;
@@ -385,10 +389,10 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
 	vmret = vmf_insert_pfn(vma, addr, PFN_DOWN(phys_addr));
 	if (vmret != VM_FAULT_NOPAGE) {
 		mutex_unlock(&encl->lock);
-		return VM_FAULT_SIGBUS;
+		return -EFAULT;
 	}
 	mutex_unlock(&encl->lock);
-	return VM_FAULT_NOPAGE;
+	return 0;
 
 err_out:
 	xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc));
@@ -401,7 +405,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
 	mutex_unlock(&encl->lock);
 	kfree(encl_page);
 
-	return vmret;
+	return ret;
 }
 
 static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
@@ -431,8 +435,18 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
 	 * enclave that will be checked for right away.
 	 */
 	if (cpu_feature_enabled(X86_FEATURE_SGX2) &&
-	    (!xa_load(&encl->page_array, PFN_DOWN(addr))))
-		return sgx_encl_eaug_page(vma, encl, addr);
+	    (!xa_load(&encl->page_array, PFN_DOWN(addr)))) {
+		switch (sgx_encl_eaug_page(vma, encl, addr)) {
+		case 0:
+		case -EBUSY:
+			return VM_FAULT_NOPAGE;
+		case -ENOMEM:
+			return VM_FAULT_OOM;
+		case -EFAULT:
+		default:
+			return VM_FAULT_SIGBUS;
+		}
+	}
 
 	mutex_lock(&encl->lock);
 
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index a65a952116fd..36059d35e1bc 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -127,5 +127,6 @@ struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
 					 unsigned long addr);
 struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl, bool reclaim);
 void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page);
-
+int sgx_encl_eaug_page(struct vm_area_struct *vma,
+		       struct sgx_encl *encl, unsigned long addr);
 #endif /* _X86_ENCL_H */
-- 
2.25.1



* [RFC PATCH 2/4] x86/sgx: Implement support for MADV_WILLNEED
  2022-10-19 19:14 ` [RFC PATCH 1/4] x86/sgx: Export sgx_encl_eaug_page Haitao Huang
@ 2022-10-19 19:14   ` Haitao Huang
  2022-10-19 19:14     ` [RFC PATCH 3/4] selftests/sgx: add len field for EACCEPT op Haitao Huang
  2022-10-23 21:23     ` [RFC PATCH 2/4] x86/sgx: Implement support for MADV_WILLNEED Jarkko Sakkinen
  2022-10-23 21:22   ` [RFC PATCH 1/4] x86/sgx: Export sgx_encl_eaug_page Jarkko Sakkinen
  1 sibling, 2 replies; 9+ messages in thread
From: Haitao Huang @ 2022-10-19 19:14 UTC (permalink / raw)
  To: linux-sgx, jarkko, dave.hansen, reinette.chatre, vijay.dhanraj

Add support for madvise(..., MADV_WILLNEED) by adding
pages with EAUG. Implement the fops->fadvise() callback to
achieve this behaviour.

Note this is done only on a best-effort basis. If any
errors are encountered or the EPC is under swapping pressure,
the operation stops and returns as normal.
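
For context, madvise(MADV_WILLNEED) on an enclave mapping reaches this
hook through the existing generic plumbing, roughly as in the following
simplified sketch of current mm code (not part of this patch):

	/* mm/madvise.c: madvise_willneed(), file-backed VMA */
	offset = (loff_t)(start - vma->vm_start)
			+ ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
	vfs_fadvise(vma->vm_file, offset, end - start, POSIX_FADV_WILLNEED);

	/* mm/fadvise.c: vfs_fadvise() dispatches to the driver hook */
	if (file->f_op->fadvise)
		return file->f_op->fadvise(file, offset, len, advice);
	return generic_fadvise(file, offset, len, advice);

Since sgx_mmap() anchors vm_pgoff to the enclave base, the offset passed
to sgx_fadvise() is relative to the enclave base.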

Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
---
 arch/x86/kernel/cpu/sgx/driver.c | 81 ++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index aa9b8b868867..54b24897605b 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -2,6 +2,7 @@
 /*  Copyright(c) 2016-20 Intel Corporation. */
 
 #include <linux/acpi.h>
+#include <linux/fadvise.h>
 #include <linux/miscdevice.h>
 #include <linux/mman.h>
 #include <linux/security.h>
@@ -9,6 +10,7 @@
 #include <asm/traps.h>
 #include "driver.h"
 #include "encl.h"
+#include "encls.h"
 
 u64 sgx_attributes_reserved_mask;
 u64 sgx_xfrm_reserved_mask = ~0x3;
@@ -97,10 +99,88 @@ static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
 	vma->vm_ops = &sgx_vm_ops;
 	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
 	vma->vm_private_data = encl;
+	/* Anchor vm_pgoff to the enclave base.
+	 * So offset passed back to sgx_fadvise hook
+	 * is relative to the enclave base
+	 */
+	vma->vm_pgoff = (vma->vm_start - encl->base) >> PAGE_SHIFT;
 
 	return 0;
 }
 
+/*
+ * Add new pages to the enclave sequentially with ENCLS[EAUG] for the WILLNEED advice.
+ * Only do this to existing VMAs in the same enclave and reject the request.
+ * Returns:	0 if EAUG done with best effort, -EINVAL if any sub-range given
+ * is not in the enclave, or enclave is not initialized..
+ */
+static int sgx_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
+{
+	struct sgx_encl *encl = file->private_data;
+	unsigned long start, end, pos;
+	int ret = -EINVAL;
+	struct vm_area_struct *vma = NULL;
+
+	/* Only support WILLNEED */
+	if (advice != POSIX_FADV_WILLNEED)
+		return -EINVAL;
+	if (!encl)
+		return -EINVAL;
+	if (!cpu_feature_enabled(X86_FEATURE_SGX2))
+		return -EINVAL;
+
+	if (offset + len < offset)
+		return -EINVAL;
+	if (encl->base + offset < encl->base)
+		return -EINVAL;
+	start  = offset + encl->base;
+	end = start + len;
+	if (end < start)
+		return -EINVAL;
+	if (end > encl->base + encl->size)
+		return -EINVAL;
+
+	/* EAUG works only for initialized enclaves. */
+	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
+		return -EINVAL;
+
+	mmap_read_lock(current->mm);
+
+	vma = find_vma(current->mm, start);
+	if (!vma)
+		goto unlock;
+	if (vma->vm_private_data != encl)
+		goto unlock;
+
+	pos = start;
+	if (pos < vma->vm_start || end > vma->vm_end) {
+		/* Don't allow any gaps */
+		goto unlock;
+	}
+	/* Here: vm_start <= pos < end <= vm_end */
+	while (pos < end) {
+		if (xa_load(&encl->page_array, PFN_DOWN(pos)))
+			continue;
+		if (signal_pending(current)) {
+			if (pos == start)
+				ret = -ERESTARTSYS;
+			else
+				ret = -EINTR;
+			goto unlock;
+		}
+		ret = sgx_encl_eaug_page(vma, encl, pos);
+		/* It's OK to not finish */
+		if (ret)
+			break;
+		pos = pos + PAGE_SIZE;
+		cond_resched();
+	}
+	ret = 0;
+unlock:
+	mmap_read_unlock(current->mm);
+	return ret;
+}
+
 static unsigned long sgx_get_unmapped_area(struct file *file,
 					   unsigned long addr,
 					   unsigned long len,
@@ -133,6 +213,7 @@ static const struct file_operations sgx_encl_fops = {
 	.compat_ioctl		= sgx_compat_ioctl,
 #endif
 	.mmap			= sgx_mmap,
+	.fadvise		= sgx_fadvise,
 	.get_unmapped_area	= sgx_get_unmapped_area,
 };
 
-- 
2.25.1



* [RFC PATCH 3/4] selftests/sgx: add len field for EACCEPT op
  2022-10-19 19:14   ` [RFC PATCH 2/4] x86/sgx: Implement support for MADV_WILLNEED Haitao Huang
@ 2022-10-19 19:14     ` Haitao Huang
  2022-10-19 19:14       ` [RFC PATCH 4/4] selftests/sgx: Add test for madvise(..., WILLNEED) Haitao Huang
  2022-10-23 21:23     ` [RFC PATCH 2/4] x86/sgx: Implement support for MADV_WILLNEED Jarkko Sakkinen
  1 sibling, 1 reply; 9+ messages in thread
From: Haitao Huang @ 2022-10-19 19:14 UTC (permalink / raw)
  To: linux-sgx, jarkko, dave.hansen, reinette.chatre, vijay.dhanraj

Add a length field to the EACCEPT op so we can EACCEPT multiple pages
inside the enclave without an EEXIT, in preparation for testing
MADV_WILLNEED on ranges bigger than a single page.
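
With this, the host test can request EACCEPT of a whole range with a
single enclave call, e.g. (addr and size are hypothetical placeholders;
len == 0 keeps the old single-page behaviour):

	struct encl_op_eaccept op = {
		.header.type = ENCL_OP_EACCEPT,
		.epc_addr = (uint64_t)addr,
		.len = size,
		.flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_REG |
			 SGX_SECINFO_PENDING,
	};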

Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
---
 tools/testing/selftests/sgx/defines.h   |  1 +
 tools/testing/selftests/sgx/test_encl.c | 20 ++++++++++++++------
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/sgx/defines.h b/tools/testing/selftests/sgx/defines.h
index d8587c971941..8578e773d3d8 100644
--- a/tools/testing/selftests/sgx/defines.h
+++ b/tools/testing/selftests/sgx/defines.h
@@ -60,6 +60,7 @@ struct encl_op_eaccept {
 	struct encl_op_header header;
 	uint64_t epc_addr;
 	uint64_t flags;
+	uint64_t len;
 	uint64_t ret;
 };
 
diff --git a/tools/testing/selftests/sgx/test_encl.c b/tools/testing/selftests/sgx/test_encl.c
index c0d6397295e3..fc797385200b 100644
--- a/tools/testing/selftests/sgx/test_encl.c
+++ b/tools/testing/selftests/sgx/test_encl.c
@@ -35,14 +35,22 @@ static void do_encl_eaccept(void *_op)
 	struct sgx_secinfo secinfo __aligned(sizeof(struct sgx_secinfo)) = {0};
 	struct encl_op_eaccept *op = _op;
 	int rax;
+	if (op->len == 0)
+		op->len = 4096;
 
 	secinfo.flags = op->flags;
-
-	asm volatile(".byte 0x0f, 0x01, 0xd7"
-				: "=a" (rax)
-				: "a" (EACCEPT),
-				  "b" (&secinfo),
-				  "c" (op->epc_addr));
+	for (uint64_t addr = op->epc_addr;
+			addr < op->epc_addr + op->len; addr += 4096) {
+		asm volatile(".byte 0x0f, 0x01, 0xd7"
+					: "=a" (rax)
+					: "a" (EACCEPT),
+					  "b" (&secinfo),
+					  "c" (addr));
+		if (rax) {
+			op->ret = rax;
+			return;
+		}
+	}
 
 	op->ret = rax;
 }
-- 
2.25.1



* [RFC PATCH 4/4] selftests/sgx: Add test for madvise(..., WILLNEED)
  2022-10-19 19:14     ` [RFC PATCH 3/4] selftests/sgx: add len field for EACCEPT op Haitao Huang
@ 2022-10-19 19:14       ` Haitao Huang
  0 siblings, 0 replies; 9+ messages in thread
From: Haitao Huang @ 2022-10-19 19:14 UTC (permalink / raw)
  To: linux-sgx, jarkko, dave.hansen, reinette.chatre, vijay.dhanraj

Measure and compare the run time for EAUG'ing different numbers of EPC
pages with and without a madvise(..., MADV_WILLNEED) call.

Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
---
 tools/testing/selftests/sgx/main.c | 167 +++++++++++++++++++++++++++++
 1 file changed, 167 insertions(+)

diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index 48976bb7bd79..7b5f6705716d 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -10,6 +10,7 @@
 #include <stdint.h>
 #include <stdlib.h>
 #include <string.h>
+#include <time.h>
 #include <unistd.h>
 #include <sys/ioctl.h>
 #include <sys/mman.h>
@@ -1358,6 +1359,172 @@ TEST_F_TIMEOUT(enclave, augment_via_eaccept_long, TIMEOUT_DEFAULT)
 	munmap(addr, ENCL_DYNAMIC_SIZE_LONG);
 }
 
+static int eaccept_range(struct _test_data_enclave *self, void *addr,
+			 unsigned long size, uint64_t flags,
+						struct __test_metadata *_metadata)
+{
+	struct encl_op_eaccept eaccept_op;
+
+	self->run.exception_vector = 0;
+	self->run.exception_error_code = 0;
+	self->run.exception_addr = 0;
+
+	/*
+	 * Run EACCEPT on every page to trigger the #PF->EAUG->EACCEPT(again
+	 * without a #PF). All should be transparent to userspace.
+	 */
+	eaccept_op.flags = flags;
+	eaccept_op.ret = 0;
+	eaccept_op.header.type = ENCL_OP_EACCEPT;
+	eaccept_op.len = size;
+	eaccept_op.epc_addr = (uint64_t)(addr);
+
+	EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+	ASSERT_EQ(eaccept_op.ret, 0);
+	ASSERT_EQ(self->run.function, EEXIT);
+
+	return 0;
+}
+
+static int trim_remove_range(struct _test_data_enclave *self, void *addr,
+			     unsigned long size, struct __test_metadata *_metadata)
+{
+	int ret, errno_save;
+	struct sgx_enclave_remove_pages remove_ioc;
+	struct sgx_enclave_modify_types modt_ioc;
+	unsigned long offset;
+	unsigned long count;
+
+	if ((uint64_t)addr <= self->encl.encl_base)
+		return -1;
+	offset = (uint64_t)addr - self->encl.encl_base;
+
+	memset(&modt_ioc, 0, sizeof(modt_ioc));
+	modt_ioc.offset = offset;
+	modt_ioc.length = size;
+	modt_ioc.page_type = SGX_PAGE_TYPE_TRIM;
+	count = 0;
+	do {
+		ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPES, &modt_ioc);
+
+		errno_save = ret == -1 ? errno : 0;
+		if (errno_save != EAGAIN)
+			break;
+		EXPECT_EQ(modt_ioc.result, 0);
+
+		count += modt_ioc.count;
+		modt_ioc.offset += modt_ioc.count;
+		modt_ioc.length -= modt_ioc.count;
+		modt_ioc.result = 0;
+		modt_ioc.count = 0;
+	} while (modt_ioc.length != 0);
+
+	EXPECT_EQ(ret, 0);
+	EXPECT_EQ(errno_save, 0);
+	EXPECT_EQ(modt_ioc.result, 0);
+	count += modt_ioc.count;
+	EXPECT_EQ(count, size);
+
+	EXPECT_EQ(eaccept_range(self, addr, size,
+				SGX_SECINFO_TRIM | SGX_SECINFO_MODIFIED,
+				   _metadata), 0);
+
+	/* Complete page removal. */
+	memset(&remove_ioc, 0, sizeof(remove_ioc));
+	remove_ioc.offset = offset;
+	remove_ioc.length = size;
+	count = 0;
+	do {
+		ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_REMOVE_PAGES, &remove_ioc);
+
+		errno_save = ret == -1 ? errno : 0;
+		if (errno_save != EAGAIN)
+			break;
+
+		count += remove_ioc.count;
+		remove_ioc.offset += remove_ioc.count;
+		remove_ioc.length -= remove_ioc.count;
+		remove_ioc.count = 0;
+	} while (remove_ioc.length != 0);
+
+	EXPECT_EQ(ret, 0);
+	EXPECT_EQ(errno_save, 0);
+	count += remove_ioc.count;
+	EXPECT_EQ(count, size);
+
+	return 0;
+}
+
+/*
+ * Compare performance with and without madvise call before EACCEPT'ing
+ * different size of regions.
+ */
+TEST_F_TIMEOUT(enclave, augment_via_madvise, TIMEOUT_DEFAULT)
+{
+	unsigned long advise_size = PAGE_SIZE;
+	unsigned long max_advise_size = get_total_epc_mem() * 3UL;
+	int speed_up_percent;
+	clock_t start;
+	double time_used1, time_used2;
+	size_t total_size = 0;
+	unsigned long i;
+	void *addr;
+
+	if (!sgx2_supported())
+		SKIP(return, "SGX2 not supported");
+
+	ASSERT_TRUE(setup_test_encl_dynamic(ENCL_HEAP_SIZE_DEFAULT,
+					    max_advise_size, &self->encl, _metadata));
+
+	memset(&self->run, 0, sizeof(self->run));
+	self->run.tcs = self->encl.encl_base;
+
+	for (i = 0; i < self->encl.nr_segments; i++) {
+		struct encl_segment *seg = &self->encl.segment_tbl[i];
+
+		total_size += seg->size;
+	}
+
+	for (i = 1; i < 52 && advise_size < max_advise_size; i++) {
+		addr = mmap((void *)self->encl.encl_base + total_size, advise_size,
+			    PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
+					self->encl.fd, 0);
+		EXPECT_NE(addr, MAP_FAILED);
+
+		start = clock();
+		EXPECT_EQ(eaccept_range(self, addr, advise_size,
+					SGX_SECINFO_R | SGX_SECINFO_W
+								| SGX_SECINFO_REG
+								| SGX_SECINFO_PENDING,
+								_metadata), 0);
+		time_used1 = (double)clock() - start;
+
+		EXPECT_EQ(trim_remove_range(self, addr, advise_size, _metadata), 0);
+
+		start = clock();
+		EXPECT_EQ(madvise(addr, advise_size, MADV_WILLNEED), 0);
+		EXPECT_EQ(eaccept_range(self, addr, advise_size,
+					SGX_SECINFO_R | SGX_SECINFO_W
+								| SGX_SECINFO_REG
+								| SGX_SECINFO_PENDING,
+								_metadata), 0);
+		time_used2 = (double)clock() - start;
+
+		speed_up_percent = (int)((time_used1 - time_used2) / time_used1 * 100);
+		TH_LOG("madvise speed up for eaug'ing %10ld pages: %d%%",
+		       advise_size / PAGE_SIZE, speed_up_percent);
+		EXPECT_GE(speed_up_percent, 0);
+		EXPECT_EQ(trim_remove_range(self, addr, advise_size, _metadata), 0);
+	    munmap(addr, advise_size);
+		advise_size = (advise_size << 1UL);
+	}
+	encl_delete(&self->encl);
+}
+
 /*
  * SGX2 page type modification test in two phases:
  * Phase 1:
-- 
2.25.1



* Re: [RFC PATCH 1/4] x86/sgx: Export sgx_encl_eaug_page
  2022-10-19 19:14 ` [RFC PATCH 1/4] x86/sgx: Export sgx_encl_eaug_page Haitao Huang
  2022-10-19 19:14   ` [RFC PATCH 2/4] x86/sgx: Implement support for MADV_WILLNEED Haitao Huang
@ 2022-10-23 21:22   ` Jarkko Sakkinen
  1 sibling, 0 replies; 9+ messages in thread
From: Jarkko Sakkinen @ 2022-10-23 21:22 UTC (permalink / raw)
  To: Haitao Huang; +Cc: linux-sgx, dave.hansen, reinette.chatre, vijay.dhanraj

On Wed, Oct 19, 2022 at 12:14:10PM -0700, Haitao Huang wrote:
> Change the return type so it can be reused later for fops->fadvise().

Does not mention the change in return values or the reasoning for it.

> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/encl.c | 46 ++++++++++++++++++++++------------
>  arch/x86/kernel/cpu/sgx/encl.h |  3 ++-
>  2 files changed, 32 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 8bdeae2fc309..c57e60d5a0aa 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -305,11 +305,11 @@ struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
>   * on a SGX2 system then the EPC can be added dynamically via the SGX2
>   * ENCLS[EAUG] instruction.
>   *
> - * Returns: Appropriate vm_fault_t: VM_FAULT_NOPAGE when PTE was installed
> - * successfully, VM_FAULT_SIGBUS or VM_FAULT_OOM as error otherwise.
> + * Returns: 0 when PTE was installed successfully, -EBUSY for waiting on
> + * reclaimer to free EPC, -ENOMEM for out of RAM, -EFAULT as error otherwise.
>   */
> -static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
> -				     struct sgx_encl *encl, unsigned long addr)
> +int sgx_encl_eaug_page(struct vm_area_struct *vma,
> +		       struct sgx_encl *encl, unsigned long addr)

I'd prefer export changes to be separated to their own commits.

>  {
>  	vm_fault_t vmret = VM_FAULT_SIGBUS;
>  	struct sgx_pageinfo pginfo = {0};
> @@ -318,10 +318,10 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
>  	struct sgx_va_page *va_page;
>  	unsigned long phys_addr;
>  	u64 secinfo_flags;
> -	int ret;
> +	int ret = -EFAULT;
>  
>  	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
> -		return VM_FAULT_SIGBUS;
> +		return -EFAULT;
>  
>  	/*
>  	 * Ignore internal permission checking for dynamically added pages.
> @@ -332,21 +332,21 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
>  	secinfo_flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_X;
>  	encl_page = sgx_encl_page_alloc(encl, addr - encl->base, secinfo_flags);
>  	if (IS_ERR(encl_page))
> -		return VM_FAULT_OOM;
> +		return -ENOMEM;
>  
>  	mutex_lock(&encl->lock);
>  
>  	epc_page = sgx_alloc_epc_page(encl_page, false);
>  	if (IS_ERR(epc_page)) {
>  		if (PTR_ERR(epc_page) == -EBUSY)
> -			vmret =  VM_FAULT_NOPAGE;
> +			ret =  -EBUSY;
>  		goto err_out_unlock;
>  	}
>  
>  	va_page = sgx_encl_grow(encl, false);
>  	if (IS_ERR(va_page)) {
>  		if (PTR_ERR(va_page) == -EBUSY)
> -			vmret = VM_FAULT_NOPAGE;
> +			ret = -EBUSY;
>  		goto err_out_epc;
>  	}
>  
> @@ -359,16 +359,20 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
>  	 * If ret == -EBUSY then page was created in another flow while
>  	 * running without encl->lock
>  	 */
> -	if (ret)
> +	if (ret) {
> +		ret = -EFAULT;
>  		goto err_out_shrink;
> +	}
>  
>  	pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
>  	pginfo.addr = encl_page->desc & PAGE_MASK;
>  	pginfo.metadata = 0;
>  
>  	ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page));
> -	if (ret)
> +	if (ret) {
> +		ret = -EFAULT;
>  		goto err_out;
> +	}
>  
>  	encl_page->encl = encl;
>  	encl_page->epc_page = epc_page;
> @@ -385,10 +389,10 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
>  	vmret = vmf_insert_pfn(vma, addr, PFN_DOWN(phys_addr));
>  	if (vmret != VM_FAULT_NOPAGE) {
>  		mutex_unlock(&encl->lock);
> -		return VM_FAULT_SIGBUS;
> +		return -EFAULT;
>  	}
>  	mutex_unlock(&encl->lock);
> -	return VM_FAULT_NOPAGE;
> +	return 0;
>  
>  err_out:
>  	xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc));
> @@ -401,7 +405,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
>  	mutex_unlock(&encl->lock);
>  	kfree(encl_page);
>  
> -	return vmret;
> +	return ret;
>  }
>  
>  static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> @@ -431,8 +435,18 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>  	 * enclave that will be checked for right away.
>  	 */
>  	if (cpu_feature_enabled(X86_FEATURE_SGX2) &&
> -	    (!xa_load(&encl->page_array, PFN_DOWN(addr))))
> -		return sgx_encl_eaug_page(vma, encl, addr);
> +	    (!xa_load(&encl->page_array, PFN_DOWN(addr)))) {
> +		switch (sgx_encl_eaug_page(vma, encl, addr)) {
> +		case 0:
> +		case -EBUSY:
> +			return VM_FAULT_NOPAGE;
> +		case -ENOMEM:
> +			return VM_FAULT_OOM;
> +		case -EFAULT:
> +		default:
> +			return VM_FAULT_SIGBUS;
> +		}
> +	}

I.e. why it is better to change working code like this, rather than
adding such a conversion to the fadvise handler.

No need to respond here. This should be explained in the commit
message.

>  
>  	mutex_lock(&encl->lock);
>  
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index a65a952116fd..36059d35e1bc 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -127,5 +127,6 @@ struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
>  					 unsigned long addr);
>  struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl, bool reclaim);
>  void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page);
> -
> +int sgx_encl_eaug_page(struct vm_area_struct *vma,
> +		       struct sgx_encl *encl, unsigned long addr);
>  #endif /* _X86_ENCL_H */
> -- 
> 2.25.1
> 

BR, Jarkko


* Re: [RFC PATCH 2/4] x86/sgx: Implement support for MADV_WILLNEED
  2022-10-19 19:14   ` [RFC PATCH 2/4] x86/sgx: Implement support for MADV_WILLNEED Haitao Huang
  2022-10-19 19:14     ` [RFC PATCH 3/4] selftests/sgx: add len field for EACCEPT op Haitao Huang
@ 2022-10-23 21:23     ` Jarkko Sakkinen
  2022-10-24 15:14       ` Haitao Huang
  1 sibling, 1 reply; 9+ messages in thread
From: Jarkko Sakkinen @ 2022-10-23 21:23 UTC (permalink / raw)
  To: Haitao Huang; +Cc: linux-sgx, dave.hansen, reinette.chatre, vijay.dhanraj

On Wed, Oct 19, 2022 at 12:14:11PM -0700, Haitao Huang wrote:
> Add support for madvise(..., MADV_WILLNEED) by adding
> pages with EAUG. Implement the fops->fadvise() callback to
> achieve this behaviour.
> 
> Note this is done only on a best-effort basis. If any
> errors are encountered or the EPC is under swapping pressure,
> the operation stops and returns as normal.
> 
> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/driver.c | 81 ++++++++++++++++++++++++++++++++
>  1 file changed, 81 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
> index aa9b8b868867..54b24897605b 100644
> --- a/arch/x86/kernel/cpu/sgx/driver.c
> +++ b/arch/x86/kernel/cpu/sgx/driver.c
> @@ -2,6 +2,7 @@
>  /*  Copyright(c) 2016-20 Intel Corporation. */
>  
>  #include <linux/acpi.h>
> +#include <linux/fadvise.h>
>  #include <linux/miscdevice.h>
>  #include <linux/mman.h>
>  #include <linux/security.h>
> @@ -9,6 +10,7 @@
>  #include <asm/traps.h>
>  #include "driver.h"
>  #include "encl.h"
> +#include "encls.h"
>  
>  u64 sgx_attributes_reserved_mask;
>  u64 sgx_xfrm_reserved_mask = ~0x3;
> @@ -97,10 +99,88 @@ static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
>  	vma->vm_ops = &sgx_vm_ops;
>  	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
>  	vma->vm_private_data = encl;
> +	/* Anchor vm_pgoff to the enclave base.
> +	 * So offset passed back to sgx_fadvise hook
> +	 * is relative to the enclave base
> +	 */
> +	vma->vm_pgoff = (vma->vm_start - encl->base) >> PAGE_SHIFT;
>  
>  	return 0;
>  }
>  
> +/*
> + * Add new pages to the enclave sequentially with ENCLS[EAUG] for the WILLNEED advice.
> + * Only do this to existing VMAs in the same enclave and reject the request.
> + * Returns:	0 if EAUG done with best effort, -EINVAL if any sub-range given
> + * is not in the enclave, or enclave is not initialized..
> + */
> +static int sgx_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
> +{
> +	struct sgx_encl *encl = file->private_data;
> +	unsigned long start, end, pos;
> +	int ret = -EINVAL;
> +	struct vm_area_struct *vma = NULL;
> +
> +	/* Only support WILLNEED */
> +	if (advice != POSIX_FADV_WILLNEED)
> +		return -EINVAL;
> +	if (!encl)
> +		return -EINVAL;
> +	if (!cpu_feature_enabled(X86_FEATURE_SGX2))
> +		return -EINVAL;
> +
> +	if (offset + len < offset)
> +		return -EINVAL;
> +	if (encl->base + offset < encl->base)
> +		return -EINVAL;
> +	start  = offset + encl->base;
> +	end = start + len;
> +	if (end < start)
> +		return -EINVAL;
> +	if (end > encl->base + encl->size)
> +		return -EINVAL;
> +
> +	/* EAUG works only for initialized enclaves. */
> +	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
> +		return -EINVAL;
> +
> +	mmap_read_lock(current->mm);
> +
> +	vma = find_vma(current->mm, start);
> +	if (!vma)
> +		goto unlock;
> +	if (vma->vm_private_data != encl)
> +		goto unlock;
> +
> +	pos = start;
> +	if (pos < vma->vm_start || end > vma->vm_end) {
> +		/* Don't allow any gaps */
> +		goto unlock;
> +	}
> +	/* Here: vm_start <= pos < end <= vm_end */
> +	while (pos < end) {
> +		if (xa_load(&encl->page_array, PFN_DOWN(pos)))
> +			continue;
> +		if (signal_pending(current)) {
> +			if (pos == start)
> +				ret = -ERESTARTSYS;
> +			else
> +				ret = -EINTR;
> +			goto unlock;
> +		}
> +		ret = sgx_encl_eaug_page(vma, encl, pos);

You should instead just do the export in the previous commit,
and convert the return values here. Less pollution to the
existing code base.

> +		/* It's OK to not finish */
> +		if (ret)
> +			break;
> +		pos = pos + PAGE_SIZE;
> +		cond_resched();
> +	}
> +	ret = 0;
> +unlock:
> +	mmap_read_unlock(current->mm);
> +	return ret;
> +}
> +
>  static unsigned long sgx_get_unmapped_area(struct file *file,
>  					   unsigned long addr,
>  					   unsigned long len,
> @@ -133,6 +213,7 @@ static const struct file_operations sgx_encl_fops = {
>  	.compat_ioctl		= sgx_compat_ioctl,
>  #endif
>  	.mmap			= sgx_mmap,
> +	.fadvise		= sgx_fadvise,
>  	.get_unmapped_area	= sgx_get_unmapped_area,
>  };
>  
> -- 
> 2.25.1
> 


* Re: [RFC PATCH 2/4] x86/sgx: Implement support for MADV_WILLNEED
  2022-10-23 21:23     ` [RFC PATCH 2/4] x86/sgx: Implement support for MADV_WILLNEED Jarkko Sakkinen
@ 2022-10-24 15:14       ` Haitao Huang
  2022-11-01  0:49         ` Jarkko Sakkinen
  0 siblings, 1 reply; 9+ messages in thread
From: Haitao Huang @ 2022-10-24 15:14 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx, dave.hansen, reinette.chatre, vijay.dhanraj

Hi Jarkko,

On Sun, 23 Oct 2022 16:23:30 -0500, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:
...
>> +	/* Here: vm_start <= pos < end <= vm_end */
>> +	while (pos < end) {
>> +		if (xa_load(&encl->page_array, PFN_DOWN(pos)))
>> +			continue;
>> +		if (signal_pending(current)) {
>> +			if (pos == start)
>> +				ret = -ERESTARTSYS;
>> +			else
>> +				ret = -EINTR;
>> +			goto unlock;
>> +		}
>> +		ret = sgx_encl_eaug_page(vma, encl, pos);
>
> You should instead just do the export in the previous commit,
> and convert the return values here. Less pollution to the
> existing code base.
>

Thanks for the feedback. Just to clarify, you prefer the following:

1. do the export only in a separate patch, with no return type or value change
2. change the return type and values in this patch.

Note the sgx_fadvise function needs to stop EAUGing and return in case of
-EBUSY. So sgx_encl_eaug_page has to return the error code -EBUSY
separately, instead of lumping -EBUSY together with VM_FAULT_NOPAGE, which
was fine for the page fault handler.

Thanks
Haitao


* Re: [RFC PATCH 2/4] x86/sgx: Implement support for MADV_WILLNEED
  2022-10-24 15:14       ` Haitao Huang
@ 2022-11-01  0:49         ` Jarkko Sakkinen
  0 siblings, 0 replies; 9+ messages in thread
From: Jarkko Sakkinen @ 2022-11-01  0:49 UTC (permalink / raw)
  To: Haitao Huang; +Cc: linux-sgx, dave.hansen, reinette.chatre, vijay.dhanraj

On Mon, Oct 24, 2022 at 10:14:03AM -0500, Haitao Huang wrote:
> Hi Jarkko,
> 
> On Sun, 23 Oct 2022 16:23:30 -0500, Jarkko Sakkinen <jarkko@kernel.org>
> wrote:
> ...
> > > +	/* Here: vm_start <= pos < end <= vm_end */
> > > +	while (pos < end) {
> > > +		if (xa_load(&encl->page_array, PFN_DOWN(pos)))
> > > +			continue;
> > > +		if (signal_pending(current)) {
> > > +			if (pos == start)
> > > +				ret = -ERESTARTSYS;
> > > +			else
> > > +				ret = -EINTR;
> > > +			goto unlock;
> > > +		}
> > > +		ret = sgx_encl_eaug_page(vma, encl, pos);
> > 
> > You should instead just do the export in the previous commit,
> > and convert the return values here. Less pollution to the
> > existing code base.
> > 
> 
> Thanks for the feedback. Just to clarify, you prefer the following:
> 
> 1. do the export only in a separate patch, with no return type or value change
> 2. change the return type and values in this patch.
> 
> Note the sgx_fadvise function needs to stop EAUGing and return in case of
> -EBUSY. So sgx_encl_eaug_page has to return the error code -EBUSY
> separately, instead of lumping -EBUSY together with VM_FAULT_NOPAGE, which
> was fine for the page fault handler.

I guess the problem is that none of this is explained in the commit
message, which should already answer this question. My preference is a
well-argued solution.

BR, Jarkko
