linux-mm.kvack.org archive mirror
* [PATCH v34 10/24] mm: Add vm_ops->mprotect()
       [not found] <20200707030204.126021-1-jarkko.sakkinen@linux.intel.com>
@ 2020-07-07  3:01 ` Jarkko Sakkinen
  2020-07-07  3:14   ` Matthew Wilcox
  2020-07-07  3:01 ` [PATCH v34 11/24] x86/sgx: Add SGX enclave driver Jarkko Sakkinen
  2020-07-07  3:01 ` [PATCH v34 16/24] x86/sgx: Add a page reclaimer Jarkko Sakkinen
  2 siblings, 1 reply; 18+ messages in thread
From: Jarkko Sakkinen @ 2020-07-07  3:01 UTC
  To: x86, linux-sgx
  Cc: linux-kernel, Sean Christopherson, linux-mm, Andrew Morton,
	Matthew Wilcox, Jethro Beekman, Jarkko Sakkinen,
	andriy.shevchenko, asapek, bp, cedric.xing, chenalexchen,
	conradparker, cyhanish, dave.hansen, haitao.huang, josh,
	kai.huang, kai.svahn, kmoy, ludloff, luto, nhorman, npmccallum,
	puiterwijk, rientjes, tglx, yaozhangx

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add vm_ops->mprotect() so that a driver can impose additional constraints
on a VMA during mprotect().

Intel Software Guard eXtensions (SGX) will use this callback to add two
constraints:

1. Verify that the address range does not have holes: each page address
   must be backed by an enclave page.
2. Verify that the VMA permissions do not surpass the permissions of any
   enclave page within the address range. The enclave's cryptographically
   sealed permissions for each page set the upper limit for the possible
   VMA permissions. Not respecting this limit can cause #GPs to be emitted.
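
As an illustration only (not part of this patch), a driver could wire up
the new hook roughly as follows; foo_max_prot_bits() is a hypothetical
helper that returns the maximum protection bits the backing object allows:

	static int foo_mprotect(struct vm_area_struct *vma, unsigned long start,
				unsigned long end, unsigned long prot)
	{
		/* Reject protections beyond what the backing object allows. */
		if (calc_vm_prot_bits(prot, 0) & ~foo_max_prot_bits(vma))
			return -EACCES;

		return 0;
	}

	static const struct vm_operations_struct foo_vm_ops = {
		.mprotect	= foo_mprotect,
	};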

Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 include/linux/mm.h |  2 ++
 mm/mprotect.c      | 13 ++++++++++---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index dc7b87310c10..fc0e3ef28873 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -542,6 +542,8 @@ struct vm_operations_struct {
 	void (*close)(struct vm_area_struct * area);
 	int (*split)(struct vm_area_struct * area, unsigned long addr);
 	int (*mremap)(struct vm_area_struct * area);
+	int (*mprotect)(struct vm_area_struct *vma, unsigned long start,
+			unsigned long end, unsigned long prot);
 	vm_fault_t (*fault)(struct vm_fault *vmf);
 	vm_fault_t (*huge_fault)(struct vm_fault *vmf,
 			enum page_entry_size pe_size);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index ce8b8a5eacbb..e23dfd8d18bc 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -603,13 +603,20 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 			goto out;
 		}
 
+		tmp = vma->vm_end;
+		if (tmp > end)
+			tmp = end;
+
 		error = security_file_mprotect(vma, reqprot, prot);
 		if (error)
 			goto out;
 
-		tmp = vma->vm_end;
-		if (tmp > end)
-			tmp = end;
+		if (vma->vm_ops && vma->vm_ops->mprotect) {
+			error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
+			if (error)
+				goto out;
+		}
+
 		error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
 		if (error)
 			goto out;
-- 
2.25.1




* [PATCH v34 11/24] x86/sgx: Add SGX enclave driver
       [not found] <20200707030204.126021-1-jarkko.sakkinen@linux.intel.com>
  2020-07-07  3:01 ` [PATCH v34 10/24] mm: Add vm_ops->mprotect() Jarkko Sakkinen
@ 2020-07-07  3:01 ` Jarkko Sakkinen
  2020-07-07  3:36   ` Matthew Wilcox
  2020-07-07  3:01 ` [PATCH v34 16/24] x86/sgx: Add a page reclaimer Jarkko Sakkinen
  2 siblings, 1 reply; 18+ messages in thread
From: Jarkko Sakkinen @ 2020-07-07  3:01 UTC
  To: x86, linux-sgx
  Cc: linux-kernel, Jarkko Sakkinen, linux-security-module, linux-mm,
	Andrew Morton, Matthew Wilcox, Jethro Beekman, Haitao Huang,
	Chunyang Hui, Jordan Hand, Nathaniel McCallum, Seth Moore,
	Sean Christopherson, Suresh Siddha, andriy.shevchenko, asapek,
	bp, cedric.xing, chenalexchen, conradparker, cyhanish,
	dave.hansen, haitao.huang, josh, kai.huang, kai.svahn, kmoy,
	ludloff, luto, nhorman, puiterwijk, rientjes, tglx, yaozhangx

Intel Software Guard eXtensions (SGX) is a set of CPU instructions that
can be used by applications to set aside private regions of code and
data. These regions are called enclaves. Their isolation is enforced by
the CPU: code outside an enclave is not allowed to access the memory
inside it.

Add a driver that provides an ioctl API to construct and run enclaves.
Enclaves are constructed from pages residing in reserved physical memory
areas. The contents of these pages can only be accessed when they are
mapped as part of an enclave, by a hardware thread running inside the
enclave.

The starting state of an enclave consists of a fixed, measured set of
pages that are copied to the EPC during the construction process using
ENCLS leaf functions, together with a Software Enclave Control Structure
(SECS) that defines the enclave's properties.

Enclaves are constructed by using the ENCLS leaf functions ECREATE, EADD
and EINIT. ECREATE initializes the SECS, EADD copies pages from system
memory to the EPC, and EINIT checks a given signed measurement and moves
the enclave into a state ready for execution.
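
For orientation, the resulting user space construction flow could look
roughly like the sketch below. The ioctl names are defined elsewhere in
this patch set and are shown here only as an illustration:

	int fd = open("/dev/sgx/enclave", O_RDWR);

	ioctl(fd, SGX_IOC_ENCLAVE_CREATE, &create_arg);		/* ECREATE */
	for (i = 0; i < nr_chunks; i++)
		ioctl(fd, SGX_IOC_ENCLAVE_ADD_PAGES, &add_arg);	/* EADD */
	ioctl(fd, SGX_IOC_ENCLAVE_INIT, &init_arg);		/* EINIT */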

An initialized enclave can only be accessed through special Thread Control
Structure (TCS) pages by using the ENCLU (ring-3 only) leaf function
EENTER. This leaf function switches the thread into enclave mode and
continues execution at the offset defined by the TCS provided to EENTER.
An enclave is exited on a syscall, an exception or an interrupt, or by
explicitly calling another ENCLU leaf function, EEXIT.
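
As a minimal sketch (assuming the architectural EENTER leaf number 2, with
RBX holding the TCS address and RCX the AEP handler; a real entry path
saves and restores much more state), entering an enclave boils down to:

	/* Illustration only, not part of this patch. */
	asm volatile("enclu"
		     :
		     : "a" (2 /* EENTER */), "b" (tcs), "c" (aep)
		     : "memory", "cc");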

The mmap() permissions are capped by the permissions of the enclave pages
contained in the mapping. The mapped areas must also not have holes, i.e.
each page address in the range must be backed by an enclave page. This
logic is implemented in sgx_encl_may_map().
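
For example, with this logic in place a call such as

	ptr = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);

fails with EACCES if any enclave page in the range was added with at most
read permissions, or if any page address in the range lacks an enclave
page.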

Cc: linux-security-module@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Tested-by: Jethro Beekman <jethro@fortanix.com>
Tested-by: Haitao Huang <haitao.huang@linux.intel.com>
Tested-by: Chunyang Hui <sanqian.hcy@antfin.com>
Tested-by: Jordan Hand <jorhand@linux.microsoft.com>
Tested-by: Nathaniel McCallum <npmccallum@redhat.com>
Tested-by: Seth Moore <sethmo@google.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 arch/x86/kernel/cpu/sgx/Makefile |   2 +
 arch/x86/kernel/cpu/sgx/driver.c | 178 ++++++++++++++++
 arch/x86/kernel/cpu/sgx/driver.h |  29 +++
 arch/x86/kernel/cpu/sgx/encl.c   | 335 +++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.h   |  87 ++++++++
 arch/x86/kernel/cpu/sgx/main.c   |  11 +
 6 files changed, 642 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/sgx/driver.c
 create mode 100644 arch/x86/kernel/cpu/sgx/driver.h
 create mode 100644 arch/x86/kernel/cpu/sgx/encl.c
 create mode 100644 arch/x86/kernel/cpu/sgx/encl.h

diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index 79510ce01b3b..3fc451120735 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -1,2 +1,4 @@
 obj-y += \
+	driver.o \
+	encl.o \
 	main.o
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
new file mode 100644
index 000000000000..682ec78230ac
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -0,0 +1,178 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) 2016-18 Intel Corporation.
+
+#include <linux/acpi.h>
+#include <linux/miscdevice.h>
+#include <linux/mman.h>
+#include <linux/security.h>
+#include <linux/suspend.h>
+#include <asm/traps.h>
+#include "driver.h"
+#include "encl.h"
+
+MODULE_DESCRIPTION("Intel SGX Enclave Driver");
+MODULE_AUTHOR("Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>");
+MODULE_LICENSE("Dual BSD/GPL");
+
+u64 sgx_encl_size_max_32;
+u64 sgx_encl_size_max_64;
+u32 sgx_misc_reserved_mask;
+u64 sgx_attributes_reserved_mask;
+u64 sgx_xfrm_reserved_mask = ~0x3;
+u32 sgx_xsave_size_tbl[64];
+
+static int sgx_open(struct inode *inode, struct file *file)
+{
+	struct sgx_encl *encl;
+	int ret;
+
+	encl = kzalloc(sizeof(*encl), GFP_KERNEL);
+	if (!encl)
+		return -ENOMEM;
+
+	atomic_set(&encl->flags, 0);
+	kref_init(&encl->refcount);
+	INIT_RADIX_TREE(&encl->page_tree, GFP_KERNEL);
+	mutex_init(&encl->lock);
+	INIT_LIST_HEAD(&encl->mm_list);
+	spin_lock_init(&encl->mm_lock);
+
+	ret = init_srcu_struct(&encl->srcu);
+	if (ret) {
+		kfree(encl);
+		return ret;
+	}
+
+	file->private_data = encl;
+
+	return 0;
+}
+
+static int sgx_release(struct inode *inode, struct file *file)
+{
+	struct sgx_encl *encl = file->private_data;
+	struct sgx_encl_mm *encl_mm;
+
+	for ( ; ; )  {
+		spin_lock(&encl->mm_lock);
+
+		if (list_empty(&encl->mm_list)) {
+			encl_mm = NULL;
+		} else {
+			encl_mm = list_first_entry(&encl->mm_list,
+						   struct sgx_encl_mm, list);
+			list_del_rcu(&encl_mm->list);
+		}
+
+		spin_unlock(&encl->mm_lock);
+
+		/* The list is empty, ready to go. */
+		if (!encl_mm)
+			break;
+
+		synchronize_srcu(&encl->srcu);
+		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
+		kfree(encl_mm);
+	}
+
+	mutex_lock(&encl->lock);
+	atomic_or(SGX_ENCL_DEAD, &encl->flags);
+	mutex_unlock(&encl->lock);
+
+	kref_put(&encl->refcount, sgx_encl_release);
+	return 0;
+}
+
+static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct sgx_encl *encl = file->private_data;
+	int ret;
+
+	ret = sgx_encl_may_map(encl, vma->vm_start, vma->vm_end,
+			       vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC));
+	if (ret)
+		return ret;
+
+	ret = sgx_encl_mm_add(encl, vma->vm_mm);
+	if (ret)
+		return ret;
+
+	vma->vm_ops = &sgx_vm_ops;
+	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
+	vma->vm_private_data = encl;
+
+	return 0;
+}
+
+static unsigned long sgx_get_unmapped_area(struct file *file,
+					   unsigned long addr,
+					   unsigned long len,
+					   unsigned long pgoff,
+					   unsigned long flags)
+{
+	if (flags & MAP_PRIVATE)
+		return -EINVAL;
+
+	if (flags & MAP_FIXED)
+		return addr;
+
+	return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+}
+
+static const struct file_operations sgx_encl_fops = {
+	.owner			= THIS_MODULE,
+	.open			= sgx_open,
+	.release		= sgx_release,
+	.mmap			= sgx_mmap,
+	.get_unmapped_area	= sgx_get_unmapped_area,
+};
+
+static struct miscdevice sgx_dev_enclave = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "enclave",
+	.nodename = "sgx/enclave",
+	.fops = &sgx_encl_fops,
+};
+
+int __init sgx_drv_init(void)
+{
+	unsigned int eax, ebx, ecx, edx;
+	u64 attr_mask, xfrm_mask;
+	int ret;
+	int i;
+
+	if (!boot_cpu_has(X86_FEATURE_SGX_LC)) {
+		pr_info("The public key MSRs are not writable.\n");
+		return -ENODEV;
+	}
+
+	cpuid_count(SGX_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	sgx_misc_reserved_mask = ~ebx | SGX_MISC_RESERVED_MASK;
+	sgx_encl_size_max_64 = 1ULL << ((edx >> 8) & 0xFF);
+	sgx_encl_size_max_32 = 1ULL << (edx & 0xFF);
+
+	cpuid_count(SGX_CPUID, 1, &eax, &ebx, &ecx, &edx);
+
+	attr_mask = (((u64)ebx) << 32) + (u64)eax;
+	sgx_attributes_reserved_mask = ~attr_mask | SGX_ATTR_RESERVED_MASK;
+
+	if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+		xfrm_mask = (((u64)edx) << 32) + (u64)ecx;
+
+		for (i = 2; i < 64; i++) {
+			cpuid_count(0x0D, i, &eax, &ebx, &ecx, &edx);
+			if ((1 << i) & xfrm_mask)
+				sgx_xsave_size_tbl[i] = eax + ebx;
+		}
+
+		sgx_xfrm_reserved_mask = ~xfrm_mask;
+	}
+
+	ret = misc_register(&sgx_dev_enclave);
+	if (ret) {
+		pr_err("Creating /dev/sgx/enclave failed with %d.\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
diff --git a/arch/x86/kernel/cpu/sgx/driver.h b/arch/x86/kernel/cpu/sgx/driver.h
new file mode 100644
index 000000000000..f7ce40dedc91
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/driver.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+#ifndef __ARCH_SGX_DRIVER_H__
+#define __ARCH_SGX_DRIVER_H__
+
+#include <crypto/hash.h>
+#include <linux/kref.h>
+#include <linux/mmu_notifier.h>
+#include <linux/radix-tree.h>
+#include <linux/rwsem.h>
+#include <linux/sched.h>
+#include <linux/workqueue.h>
+#include "sgx.h"
+
+#define SGX_EINIT_SPIN_COUNT	20
+#define SGX_EINIT_SLEEP_COUNT	50
+#define SGX_EINIT_SLEEP_TIME	20
+
+extern u64 sgx_encl_size_max_32;
+extern u64 sgx_encl_size_max_64;
+extern u32 sgx_misc_reserved_mask;
+extern u64 sgx_attributes_reserved_mask;
+extern u64 sgx_xfrm_reserved_mask;
+extern u32 sgx_xsave_size_tbl[64];
+
+long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
+
+int sgx_drv_init(void);
+
+#endif /* __ARCH_SGX_DRIVER_H__ */
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
new file mode 100644
index 000000000000..c3755f8bbcba
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -0,0 +1,335 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) 2016-18 Intel Corporation.
+
+#include <linux/lockdep.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/shmem_fs.h>
+#include <linux/suspend.h>
+#include <linux/sched/mm.h>
+#include "arch.h"
+#include "encl.h"
+#include "encls.h"
+#include "sgx.h"
+
+static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
+						unsigned long addr)
+{
+	struct sgx_encl_page *entry;
+	unsigned int flags;
+
+	/* If the process was forked, the VMA is still there but
+	 * vm_private_data is set to NULL.
+	 */
+	if (!encl)
+		return ERR_PTR(-EFAULT);
+
+	flags = atomic_read(&encl->flags);
+
+	if ((flags & SGX_ENCL_DEAD) || !(flags & SGX_ENCL_INITIALIZED))
+		return ERR_PTR(-EFAULT);
+
+	entry = radix_tree_lookup(&encl->page_tree, addr >> PAGE_SHIFT);
+	if (!entry)
+		return ERR_PTR(-EFAULT);
+
+	/* Page is already resident in the EPC. */
+	if (entry->epc_page)
+		return entry;
+
+	return ERR_PTR(-EFAULT);
+}
+
+static void sgx_mmu_notifier_release(struct mmu_notifier *mn,
+				     struct mm_struct *mm)
+{
+	struct sgx_encl_mm *encl_mm =
+		container_of(mn, struct sgx_encl_mm, mmu_notifier);
+	struct sgx_encl_mm *tmp = NULL;
+
+	/*
+	 * The enclave itself can remove encl_mm.  Note, objects can't be moved
+	 * off an RCU protected list, but deletion is ok.
+	 */
+	spin_lock(&encl_mm->encl->mm_lock);
+	list_for_each_entry(tmp, &encl_mm->encl->mm_list, list) {
+		if (tmp == encl_mm) {
+			list_del_rcu(&encl_mm->list);
+			break;
+		}
+	}
+	spin_unlock(&encl_mm->encl->mm_lock);
+
+	if (tmp == encl_mm) {
+		synchronize_srcu(&encl_mm->encl->srcu);
+		mmu_notifier_put(mn);
+	}
+}
+
+static void sgx_mmu_notifier_free(struct mmu_notifier *mn)
+{
+	struct sgx_encl_mm *encl_mm =
+		container_of(mn, struct sgx_encl_mm, mmu_notifier);
+
+	kfree(encl_mm);
+}
+
+static const struct mmu_notifier_ops sgx_mmu_notifier_ops = {
+	.release		= sgx_mmu_notifier_release,
+	.free_notifier		= sgx_mmu_notifier_free,
+};
+
+static struct sgx_encl_mm *sgx_encl_find_mm(struct sgx_encl *encl,
+					    struct mm_struct *mm)
+{
+	struct sgx_encl_mm *encl_mm = NULL;
+	struct sgx_encl_mm *tmp;
+	int idx;
+
+	idx = srcu_read_lock(&encl->srcu);
+
+	list_for_each_entry_rcu(tmp, &encl->mm_list, list) {
+		if (tmp->mm == mm) {
+			encl_mm = tmp;
+			break;
+		}
+	}
+
+	srcu_read_unlock(&encl->srcu, idx);
+
+	return encl_mm;
+}
+
+int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
+{
+	struct sgx_encl_mm *encl_mm;
+	int ret;
+
+	/* mm_list can be accessed only by a single thread at a time. */
+	mmap_assert_write_locked(mm);
+
+	if (atomic_read(&encl->flags) & SGX_ENCL_DEAD)
+		return -EINVAL;
+
+	/*
+	 * mm_structs are kept on mm_list until the mm or the enclave dies,
+	 * i.e. once an mm is off the list, it's gone for good, therefore it's
+	 * impossible to get a false positive on @mm due to a stale mm_list.
+	 */
+	if (sgx_encl_find_mm(encl, mm))
+		return 0;
+
+	encl_mm = kzalloc(sizeof(*encl_mm), GFP_KERNEL);
+	if (!encl_mm)
+		return -ENOMEM;
+
+	encl_mm->encl = encl;
+	encl_mm->mm = mm;
+	encl_mm->mmu_notifier.ops = &sgx_mmu_notifier_ops;
+
+	ret = __mmu_notifier_register(&encl_mm->mmu_notifier, mm);
+	if (ret) {
+		kfree(encl_mm);
+		return ret;
+	}
+
+	spin_lock(&encl->mm_lock);
+	list_add_rcu(&encl_mm->list, &encl->mm_list);
+	spin_unlock(&encl->mm_lock);
+
+	return 0;
+}
+
+static void sgx_vma_open(struct vm_area_struct *vma)
+{
+	struct sgx_encl *encl = vma->vm_private_data;
+
+	if (!encl)
+		return;
+
+	if (sgx_encl_mm_add(encl, vma->vm_mm))
+		vma->vm_private_data = NULL;
+}
+
+static unsigned int sgx_vma_fault(struct vm_fault *vmf)
+{
+	unsigned long addr = (unsigned long)vmf->address;
+	struct vm_area_struct *vma = vmf->vma;
+	struct sgx_encl *encl = vma->vm_private_data;
+	struct sgx_encl_page *entry;
+	int ret = VM_FAULT_NOPAGE;
+	unsigned long pfn;
+
+	if (!encl)
+		return VM_FAULT_SIGBUS;
+
+	mutex_lock(&encl->lock);
+
+	entry = sgx_encl_load_page(encl, addr);
+	if (IS_ERR(entry)) {
+		if (unlikely(PTR_ERR(entry) != -EBUSY))
+			ret = VM_FAULT_SIGBUS;
+
+		goto out;
+	}
+
+	if (!follow_pfn(vma, addr, &pfn))
+		goto out;
+
+	ret = vmf_insert_pfn(vma, addr, PFN_DOWN(entry->epc_page->desc));
+	if (ret != VM_FAULT_NOPAGE) {
+		ret = VM_FAULT_SIGBUS;
+		goto out;
+	}
+
+out:
+	mutex_unlock(&encl->lock);
+	return ret;
+}
+
+/**
+ * sgx_encl_may_map() - Check if a requested VMA mapping is allowed
+ * @encl:		an enclave
+ * @start:		lower bound of the address range, inclusive
+ * @end:		upper bound of the address range, exclusive
+ * @vm_prot_bits:	requested protections of the address range
+ *
+ * Iterate through the enclave pages contained within [@start, @end) to verify
+ * the permissions requested by @vm_prot_bits do not exceed those of any enclave
+ * page to be mapped.
+ *
+ * Return:
+ *   0 on success,
+ *   -EACCES if VMA permissions exceed enclave page permissions
+ */
+int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
+		     unsigned long end, unsigned long vm_prot_bits)
+{
+	unsigned long idx, idx_start, idx_end;
+	struct sgx_encl_page *page;
+
+	/*
+	 * Disallow RIE tasks as their VMA permissions might conflict with the
+	 * enclave page permissions.
+	 */
+	if (!!(current->personality & READ_IMPLIES_EXEC))
+		return -EACCES;
+
+	idx_start = PFN_DOWN(start);
+	idx_end = PFN_DOWN(end - 1);
+
+	for (idx = idx_start; idx <= idx_end; ++idx) {
+		mutex_lock(&encl->lock);
+		page = radix_tree_lookup(&encl->page_tree, idx);
+		mutex_unlock(&encl->lock);
+
+		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
+			return -EACCES;
+	}
+
+	return 0;
+}
+
+static int sgx_vma_mprotect(struct vm_area_struct *vma, unsigned long start,
+			    unsigned long end, unsigned long prot)
+{
+	return sgx_encl_may_map(vma->vm_private_data, start, end,
+				calc_vm_prot_bits(prot, 0));
+}
+
+const struct vm_operations_struct sgx_vm_ops = {
+	.open = sgx_vma_open,
+	.fault = sgx_vma_fault,
+	.mprotect = sgx_vma_mprotect,
+};
+
+/**
+ * sgx_encl_find() - Find an enclave
+ * @mm:		mm struct of the current process
+ * @addr:	address in the ELRANGE
+ * @vma:	the resulting VMA
+ *
+ * Find an enclave identified by the given address. Give back a VMA that is
+ * part of the enclave and contains the address. The VMA is given back even if
+ * an &sgx_encl instance does not exist yet (enclave creation has not been
+ * performed), as long as it is a proper enclave VMA.
+ *
+ * Return:
+ *   0 on success,
+ *   -EINVAL if an enclave was not found,
+ *   -ENOENT if the enclave has not been created yet
+ */
+int sgx_encl_find(struct mm_struct *mm, unsigned long addr,
+		  struct vm_area_struct **vma)
+{
+	struct vm_area_struct *result;
+	struct sgx_encl *encl;
+
+	result = find_vma(mm, addr);
+	if (!result || result->vm_ops != &sgx_vm_ops || addr < result->vm_start)
+		return -EINVAL;
+
+	encl = result->vm_private_data;
+	*vma = result;
+
+	return encl ? 0 : -ENOENT;
+}
+
+/**
+ * sgx_encl_destroy() - destroy enclave resources
+ * @encl:	an &sgx_encl instance
+ */
+void sgx_encl_destroy(struct sgx_encl *encl)
+{
+	struct sgx_encl_page *entry;
+	struct radix_tree_iter iter;
+	void **slot;
+
+	atomic_or(SGX_ENCL_DEAD, &encl->flags);
+
+	radix_tree_for_each_slot(slot, &encl->page_tree, &iter, 0) {
+		entry = *slot;
+
+		if (entry->epc_page) {
+			sgx_free_epc_page(entry->epc_page);
+			encl->secs_child_cnt--;
+			entry->epc_page = NULL;
+		}
+
+		radix_tree_delete(&entry->encl->page_tree,
+				  PFN_DOWN(entry->desc));
+		kfree(entry);
+	}
+
+	if (!encl->secs_child_cnt && encl->secs.epc_page) {
+		sgx_free_epc_page(encl->secs.epc_page);
+		encl->secs.epc_page = NULL;
+	}
+}
+
+/**
+ * sgx_encl_release() - Destroy an enclave instance
+ * @ref:	address of a kref inside &sgx_encl
+ *
+ * Used together with kref_put(). Frees all the resources associated with the
+ * enclave and the instance itself.
+ */
+void sgx_encl_release(struct kref *ref)
+{
+	struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount);
+
+	sgx_encl_destroy(encl);
+
+	if (encl->backing)
+		fput(encl->backing);
+
+	cleanup_srcu_struct(&encl->srcu);
+
+	WARN_ON_ONCE(!list_empty(&encl->mm_list));
+
+	/* Detect EPC page leaks. */
+	WARN_ON_ONCE(encl->secs_child_cnt);
+	WARN_ON_ONCE(encl->secs.epc_page);
+
+	kfree(encl);
+}
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
new file mode 100644
index 000000000000..1d1bc5d590ee
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+/**
+ * Copyright(c) 2016-19 Intel Corporation.
+ */
+#ifndef _X86_ENCL_H
+#define _X86_ENCL_H
+
+#include <linux/cpumask.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/mm_types.h>
+#include <linux/mmu_notifier.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/radix-tree.h>
+#include <linux/srcu.h>
+#include <linux/workqueue.h>
+#include "sgx.h"
+
+/**
+ * enum sgx_encl_page_desc - defines bits for an enclave page's descriptor
+ * %SGX_ENCL_PAGE_ADDR_MASK:		Holds the virtual address of the page.
+ *
+ * The page address for SECS is zero and is used by the subsystem to recognize
+ * the SECS page.
+ */
+enum sgx_encl_page_desc {
+	/* Bits 11:3 are available when the page is not swapped. */
+	SGX_ENCL_PAGE_ADDR_MASK		= PAGE_MASK,
+};
+
+#define SGX_ENCL_PAGE_ADDR(page) \
+	((page)->desc & SGX_ENCL_PAGE_ADDR_MASK)
+
+struct sgx_encl_page {
+	unsigned long desc;
+	unsigned long vm_max_prot_bits;
+	struct sgx_epc_page *epc_page;
+	struct sgx_encl *encl;
+};
+
+enum sgx_encl_flags {
+	SGX_ENCL_CREATED	= BIT(0),
+	SGX_ENCL_INITIALIZED	= BIT(1),
+	SGX_ENCL_DEBUG		= BIT(2),
+	SGX_ENCL_DEAD		= BIT(3),
+	SGX_ENCL_IOCTL		= BIT(4),
+};
+
+struct sgx_encl_mm {
+	struct sgx_encl *encl;
+	struct mm_struct *mm;
+	struct list_head list;
+	struct mmu_notifier mmu_notifier;
+};
+
+struct sgx_encl {
+	atomic_t flags;
+	u64 secs_attributes;
+	u64 allowed_attributes;
+	unsigned int page_cnt;
+	unsigned int secs_child_cnt;
+	struct mutex lock;
+	struct list_head mm_list;
+	spinlock_t mm_lock;
+	struct file *backing;
+	struct kref refcount;
+	struct srcu_struct srcu;
+	unsigned long base;
+	unsigned long size;
+	unsigned long ssaframesize;
+	struct radix_tree_root page_tree;
+	struct sgx_encl_page secs;
+	cpumask_t cpumask;
+};
+
+extern const struct vm_operations_struct sgx_vm_ops;
+
+int sgx_encl_find(struct mm_struct *mm, unsigned long addr,
+		  struct vm_area_struct **vma);
+void sgx_encl_destroy(struct sgx_encl *encl);
+void sgx_encl_release(struct kref *ref);
+int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
+int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
+		     unsigned long end, unsigned long vm_prot_bits);
+
+#endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 97c6895fb6c9..4137254fb29e 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -9,6 +9,8 @@
 #include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 #include <linux/slab.h>
+#include "driver.h"
+#include "encl.h"
 #include "encls.h"
 
 struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
@@ -260,6 +262,8 @@ static bool __init sgx_page_cache_init(void)
 
 static void __init sgx_init(void)
 {
+	int ret;
+
 	if (!boot_cpu_has(X86_FEATURE_SGX))
 		return;
 
@@ -269,8 +273,15 @@ static void __init sgx_init(void)
 	if (!sgx_page_reclaimer_init())
 		goto err_page_cache;
 
+	ret = sgx_drv_init();
+	if (ret)
+		goto err_kthread;
+
 	return;
 
+err_kthread:
+	kthread_stop(ksgxswapd_tsk);
+
 err_page_cache:
 	sgx_page_cache_teardown();
 }
-- 
2.25.1




* [PATCH v34 16/24] x86/sgx: Add a page reclaimer
       [not found] <20200707030204.126021-1-jarkko.sakkinen@linux.intel.com>
  2020-07-07  3:01 ` [PATCH v34 10/24] mm: Add vm_ops->mprotect() Jarkko Sakkinen
  2020-07-07  3:01 ` [PATCH v34 11/24] x86/sgx: Add SGX enclave driver Jarkko Sakkinen
@ 2020-07-07  3:01 ` Jarkko Sakkinen
  2 siblings, 0 replies; 18+ messages in thread
From: Jarkko Sakkinen @ 2020-07-07  3:01 UTC
  To: x86, linux-sgx
  Cc: linux-kernel, Jarkko Sakkinen, linux-mm, Jethro Beekman,
	Jordan Hand, Nathaniel McCallum, Chunyang Hui, Seth Moore,
	Sean Christopherson, akpm, andriy.shevchenko, asapek, bp,
	cedric.xing, chenalexchen, conradparker, cyhanish, dave.hansen,
	haitao.huang, josh, kai.huang, kai.svahn, kmoy, ludloff, luto,
	nhorman, puiterwijk, rientjes, tglx, yaozhangx

There is a limited amount of EPC available. Therefore, some of it must be
copied to regular memory, and only a subset kept in the SGX reserved
memory. While the kernel cannot directly access enclave memory, SGX
provides a set of ENCLS leaf functions to perform the reclaiming.

This commit implements a page reclaimer using these leaf functions. It
picks the victim pages in LRU fashion from all the enclaves running in the
system. The ksgxswapd thread starts reclaiming pages when the number of
free EPC pages goes below SGX_NR_LOW_PAGES and continues until it reaches
SGX_NR_HIGH_PAGES.

sgx_alloc_epc_page() can optionally reclaim pages directly when @reclaim
is set true. A caller must also supply an owner for each page so that the
reclaimer can access the associated enclave. This is needed for locking,
as most of the ENCLS leaf functions cannot be executed concurrently for an
enclave. The owner is also needed for accessing the SECS, which must be
resident when its child pages are being reclaimed.
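
For example, the EADD path below allocates each EPC page with the enclave
page as its owner and allows direct reclaim:

	epc_page = sgx_alloc_epc_page(encl_page, true);
	if (IS_ERR(epc_page)) {
		kfree(encl_page);
		return PTR_ERR(epc_page);
	}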

Cc: linux-mm@kvack.org
Acked-by: Jethro Beekman <jethro@fortanix.com>
Tested-by: Jethro Beekman <jethro@fortanix.com>
Tested-by: Jordan Hand <jorhand@linux.microsoft.com>
Tested-by: Nathaniel McCallum <npmccallum@redhat.com>
Tested-by: Chunyang Hui <sanqian.hcy@antfin.com>
Tested-by: Seth Moore <sethmo@google.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 arch/x86/kernel/cpu/sgx/driver.c |   1 +
 arch/x86/kernel/cpu/sgx/encl.c   | 340 +++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/encl.h   |  41 +++
 arch/x86/kernel/cpu/sgx/ioctl.c  |  77 ++++-
 arch/x86/kernel/cpu/sgx/main.c   | 464 +++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h    |   9 +
 6 files changed, 925 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index 1cebb6e9c9b7..83e186495e67 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -32,6 +32,7 @@ static int sgx_open(struct inode *inode, struct file *file)
 
 	atomic_set(&encl->flags, 0);
 	kref_init(&encl->refcount);
+	INIT_LIST_HEAD(&encl->va_pages);
 	INIT_RADIX_TREE(&encl->page_tree, GFP_KERNEL);
 	mutex_init(&encl->lock);
 	INIT_LIST_HEAD(&encl->mm_list);
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index c3755f8bbcba..b22e7187142d 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -12,9 +12,84 @@
 #include "encls.h"
 #include "sgx.h"
 
+static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
+			   struct sgx_epc_page *epc_page,
+			   struct sgx_epc_page *secs_page)
+{
+	unsigned long va_offset = SGX_ENCL_PAGE_VA_OFFSET(encl_page);
+	struct sgx_encl *encl = encl_page->encl;
+	struct sgx_pageinfo pginfo;
+	struct sgx_backing b;
+	pgoff_t page_index;
+	int ret;
+
+	if (secs_page)
+		page_index = SGX_ENCL_PAGE_INDEX(encl_page);
+	else
+		page_index = PFN_DOWN(encl->size);
+
+	ret = sgx_encl_get_backing(encl, page_index, &b);
+	if (ret)
+		return ret;
+
+	pginfo.addr = SGX_ENCL_PAGE_ADDR(encl_page);
+	pginfo.contents = (unsigned long)kmap_atomic(b.contents);
+	pginfo.metadata = (unsigned long)kmap_atomic(b.pcmd) +
+			  b.pcmd_offset;
+
+	if (secs_page)
+		pginfo.secs = (u64)sgx_get_epc_addr(secs_page);
+	else
+		pginfo.secs = 0;
+
+	ret = __eldu(&pginfo, sgx_get_epc_addr(epc_page),
+		     sgx_get_epc_addr(encl_page->va_page->epc_page) +
+				      va_offset);
+	if (ret) {
+		if (encls_failed(ret))
+			ENCLS_WARN(ret, "ELDU");
+
+		ret = -EFAULT;
+	}
+
+	kunmap_atomic((void *)(unsigned long)(pginfo.metadata - b.pcmd_offset));
+	kunmap_atomic((void *)(unsigned long)pginfo.contents);
+
+	sgx_encl_put_backing(&b, false);
+
+	return ret;
+}
+
+static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
+					  struct sgx_epc_page *secs_page)
+{
+	unsigned long va_offset = SGX_ENCL_PAGE_VA_OFFSET(encl_page);
+	struct sgx_encl *encl = encl_page->encl;
+	struct sgx_epc_page *epc_page;
+	int ret;
+
+	epc_page = sgx_alloc_epc_page(encl_page, false);
+	if (IS_ERR(epc_page))
+		return epc_page;
+
+	ret = __sgx_encl_eldu(encl_page, epc_page, secs_page);
+	if (ret) {
+		sgx_free_epc_page(epc_page);
+		return ERR_PTR(ret);
+	}
+
+	sgx_free_va_slot(encl_page->va_page, va_offset);
+	list_move(&encl_page->va_page->list, &encl->va_pages);
+	encl_page->desc &= ~SGX_ENCL_PAGE_VA_OFFSET_MASK;
+	encl_page->epc_page = epc_page;
+
+	return epc_page;
+}
+
 static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
 						unsigned long addr)
 {
+	struct sgx_epc_page *epc_page;
 	struct sgx_encl_page *entry;
 	unsigned int flags;
 
@@ -34,10 +109,27 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
 		return ERR_PTR(-EFAULT);
 
 	/* Page is already resident in the EPC. */
-	if (entry->epc_page)
+	if (entry->epc_page) {
+		if (entry->desc & SGX_ENCL_PAGE_RECLAIMED)
+			return ERR_PTR(-EBUSY);
+
 		return entry;
+	}
+
+	if (!(encl->secs.epc_page)) {
+		epc_page = sgx_encl_eldu(&encl->secs, NULL);
+		if (IS_ERR(epc_page))
+			return ERR_CAST(epc_page);
+	}
+
+	epc_page = sgx_encl_eldu(entry, encl->secs.epc_page);
+	if (IS_ERR(epc_page))
+		return ERR_CAST(epc_page);
 
-	return ERR_PTR(-EFAULT);
+	encl->secs_child_cnt++;
+	sgx_mark_page_reclaimable(entry->epc_page);
+
+	return entry;
 }
 
 static void sgx_mmu_notifier_release(struct mmu_notifier *mn,
@@ -135,6 +227,9 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
 
 	spin_lock(&encl->mm_lock);
 	list_add_rcu(&encl_mm->list, &encl->mm_list);
+	/* Pairs with smp_rmb() in sgx_reclaimer_block(). */
+	smp_wmb();
+	encl->mm_list_version++;
 	spin_unlock(&encl->mm_lock);
 
 	return 0;
@@ -182,6 +277,8 @@ static unsigned int sgx_vma_fault(struct vm_fault *vmf)
 		goto out;
 	}
 
+	sgx_encl_test_and_clear_young(vma->vm_mm, entry);
+
 out:
 	mutex_unlock(&encl->lock);
 	return ret;
@@ -281,6 +378,7 @@ int sgx_encl_find(struct mm_struct *mm, unsigned long addr,
  */
 void sgx_encl_destroy(struct sgx_encl *encl)
 {
+	struct sgx_va_page *va_page;
 	struct sgx_encl_page *entry;
 	struct radix_tree_iter iter;
 	void **slot;
@@ -291,6 +389,13 @@ void sgx_encl_destroy(struct sgx_encl *encl)
 		entry = *slot;
 
 		if (entry->epc_page) {
+			/*
+			 * The page and its radix tree entry cannot be freed
+			 * if the page is being held by the reclaimer.
+			 */
+			if (sgx_unmark_page_reclaimable(entry->epc_page))
+				continue;
+
 			sgx_free_epc_page(entry->epc_page);
 			encl->secs_child_cnt--;
 			entry->epc_page = NULL;
@@ -305,6 +410,19 @@ void sgx_encl_destroy(struct sgx_encl *encl)
 		sgx_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 	}
+
+	/*
+	 * The reclaimer is responsible for checking SGX_ENCL_DEAD before doing
+	 * EWB, thus it's safe to free VA pages even if the reclaimer holds a
+	 * reference to the enclave.
+	 */
+	while (!list_empty(&encl->va_pages)) {
+		va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
+					   list);
+		list_del(&va_page->list);
+		sgx_free_epc_page(va_page->epc_page);
+		kfree(va_page);
+	}
 }
 
 /**
@@ -333,3 +451,221 @@ void sgx_encl_release(struct kref *ref)
 
 	kfree(encl);
 }
+
+static struct page *sgx_encl_get_backing_page(struct sgx_encl *encl,
+					      pgoff_t index)
+{
+	struct inode *inode = encl->backing->f_path.dentry->d_inode;
+	struct address_space *mapping = inode->i_mapping;
+	gfp_t gfpmask = mapping_gfp_mask(mapping);
+
+	return shmem_read_mapping_page_gfp(mapping, index, gfpmask);
+}
+
+/**
+ * sgx_encl_get_backing() - Pin the backing storage
+ * @encl:	an enclave
+ * @page_index:	enclave page index
+ * @backing:	data for accessing backing storage for the page
+ *
+ * Pin the backing storage pages for storing the encrypted contents and Paging
+ * Crypto MetaData (PCMD) of an enclave page.
+ *
+ * Return:
+ *   0 on success,
+ *   -errno otherwise.
+ */
+int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
+			 struct sgx_backing *backing)
+{
+	pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >> 5);
+	struct page *contents;
+	struct page *pcmd;
+
+	contents = sgx_encl_get_backing_page(encl, page_index);
+	if (IS_ERR(contents))
+		return PTR_ERR(contents);
+
+	pcmd = sgx_encl_get_backing_page(encl, pcmd_index);
+	if (IS_ERR(pcmd)) {
+		put_page(contents);
+		return PTR_ERR(pcmd);
+	}
+
+	backing->page_index = page_index;
+	backing->contents = contents;
+	backing->pcmd = pcmd;
+	backing->pcmd_offset =
+		(page_index & (PAGE_SIZE / sizeof(struct sgx_pcmd) - 1)) *
+		sizeof(struct sgx_pcmd);
+
+	return 0;
+}
+
+/**
+ * sgx_encl_put_backing() - Unpin the backing storage
+ * @backing:	data for accessing backing storage for the page
+ * @do_write:	mark pages dirty
+ */
+void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write)
+{
+	if (do_write) {
+		set_page_dirty(backing->pcmd);
+		set_page_dirty(backing->contents);
+	}
+
+	put_page(backing->pcmd);
+	put_page(backing->contents);
+}
+
+static int sgx_encl_test_and_clear_young_cb(pte_t *ptep, unsigned long addr,
+					    void *data)
+{
+	pte_t pte;
+	int ret;
+
+	ret = pte_young(*ptep);
+	if (ret) {
+		pte = pte_mkold(*ptep);
+		set_pte_at((struct mm_struct *)data, addr, ptep, pte);
+	}
+
+	return ret;
+}
+
+/**
+ * sgx_encl_test_and_clear_young() - Test and reset the accessed bit
+ * @mm:		mm_struct that is checked
+ * @page:	enclave page to be tested for recent access
+ *
+ * Checks the Access (A) bit from the PTE corresponding to the enclave page and
+ * clears it.
+ *
+ * Return: 1 if the page has been recently accessed and 0 if not.
+ */
+int sgx_encl_test_and_clear_young(struct mm_struct *mm,
+				  struct sgx_encl_page *page)
+{
+	unsigned long addr = SGX_ENCL_PAGE_ADDR(page);
+	struct sgx_encl *encl = page->encl;
+	struct vm_area_struct *vma;
+	int ret;
+
+	ret = sgx_encl_find(mm, addr, &vma);
+	if (ret)
+		return 0;
+
+	if (encl != vma->vm_private_data)
+		return 0;
+
+	ret = apply_to_page_range(vma->vm_mm, addr, PAGE_SIZE,
+				  sgx_encl_test_and_clear_young_cb, vma->vm_mm);
+	if (ret < 0)
+		return 0;
+
+	return ret;
+}
+
+/**
+ * sgx_encl_reserve_page() - Reserve an enclave page
+ * @encl:	an enclave
+ * @addr:	a page address
+ *
+ * Load an enclave page and lock the enclave so that the page can be used by
+ * EDBG* and EMOD*.
+ *
+ * Return:
+ *   an enclave page on success
+ *   -EFAULT	if the load fails
+ */
+struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,
+					    unsigned long addr)
+{
+	struct sgx_encl_page *entry;
+
+	for ( ; ; ) {
+		mutex_lock(&encl->lock);
+
+		entry = sgx_encl_load_page(encl, addr);
+		if (PTR_ERR(entry) != -EBUSY)
+			break;
+
+		mutex_unlock(&encl->lock);
+	}
+
+	if (IS_ERR(entry))
+		mutex_unlock(&encl->lock);
+
+	return entry;
+}
+
+/**
+ * sgx_alloc_va_page() - Allocate a Version Array (VA) page
+ *
+ * Allocate a free EPC page and convert it to a Version Array (VA) page.
+ *
+ * Return:
+ *   a VA page,
+ *   -errno otherwise
+ */
+struct sgx_epc_page *sgx_alloc_va_page(void)
+{
+	struct sgx_epc_page *epc_page;
+	int ret;
+
+	epc_page = sgx_alloc_epc_page(NULL, true);
+	if (IS_ERR(epc_page))
+		return ERR_CAST(epc_page);
+
+	ret = __epa(sgx_get_epc_addr(epc_page));
+	if (ret) {
+		WARN_ONCE(1, "EPA returned %d (0x%x)", ret, ret);
+		sgx_free_epc_page(epc_page);
+		return ERR_PTR(-EFAULT);
+	}
+
+	return epc_page;
+}
+
+/**
+ * sgx_alloc_va_slot() - Allocate a VA slot
+ * @va_page:	a &struct sgx_va_page instance
+ *
+ * Allocates a slot from a &struct sgx_va_page instance.
+ *
+ * Return: offset of the slot inside the VA page
+ */
+unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page)
+{
+	int slot = find_first_zero_bit(va_page->slots, SGX_VA_SLOT_COUNT);
+
+	if (slot < SGX_VA_SLOT_COUNT)
+		set_bit(slot, va_page->slots);
+
+	return slot << 3;
+}
+
+/**
+ * sgx_free_va_slot() - Free a VA slot
+ * @va_page:	a &struct sgx_va_page instance
+ * @offset:	offset of the slot inside the VA page
+ *
+ * Frees a slot from a &struct sgx_va_page instance.
+ */
+void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset)
+{
+	clear_bit(offset >> 3, va_page->slots);
+}
+
+/**
+ * sgx_va_page_full() - Is the VA page full?
+ * @va_page:	a &struct sgx_va_page instance
+ *
+ * Return: true if all slots have been taken
+ */
+bool sgx_va_page_full(struct sgx_va_page *va_page)
+{
+	int slot = find_first_zero_bit(va_page->slots, SGX_VA_SLOT_COUNT);
+
+	return slot == SGX_VA_SLOT_COUNT;
+}
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 1d1bc5d590ee..f0f72e591244 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -19,6 +19,10 @@
 
 /**
  * enum sgx_encl_page_desc - defines bits for an enclave page's descriptor
+ * %SGX_ENCL_PAGE_RECLAIMED:		The page is in the process of being
+ *					reclaimed.
+ * %SGX_ENCL_PAGE_VA_OFFSET_MASK:	Holds the offset in the Version Array
+ *					(VA) page for a swapped page.
  * %SGX_ENCL_PAGE_ADDR_MASK:		Holds the virtual address of the page.
  *
  * The page address for SECS is zero and is used by the subsystem to recognize
@@ -26,16 +30,23 @@
  */
 enum sgx_encl_page_desc {
 	/* Bits 11:3 are available when the page is not swapped. */
+	SGX_ENCL_PAGE_RECLAIMED		= BIT(3),
+	SGX_ENCL_PAGE_VA_OFFSET_MASK	= GENMASK_ULL(11, 3),
 	SGX_ENCL_PAGE_ADDR_MASK		= PAGE_MASK,
 };
 
 #define SGX_ENCL_PAGE_ADDR(page) \
 	((page)->desc & SGX_ENCL_PAGE_ADDR_MASK)
+#define SGX_ENCL_PAGE_VA_OFFSET(page) \
+	((page)->desc & SGX_ENCL_PAGE_VA_OFFSET_MASK)
+#define SGX_ENCL_PAGE_INDEX(page) \
+	PFN_DOWN((page)->desc - (page)->encl->base)
 
 struct sgx_encl_page {
 	unsigned long desc;
 	unsigned long vm_max_prot_bits;
 	struct sgx_epc_page *epc_page;
+	struct sgx_va_page *va_page;
 	struct sgx_encl *encl;
 };
 
@@ -63,17 +74,27 @@ struct sgx_encl {
 	struct mutex lock;
 	struct list_head mm_list;
 	spinlock_t mm_lock;
+	unsigned long mm_list_version;
 	struct file *backing;
 	struct kref refcount;
 	struct srcu_struct srcu;
 	unsigned long base;
 	unsigned long size;
 	unsigned long ssaframesize;
+	struct list_head va_pages;
 	struct radix_tree_root page_tree;
 	struct sgx_encl_page secs;
 	cpumask_t cpumask;
 };
 
+#define SGX_VA_SLOT_COUNT 512
+
+struct sgx_va_page {
+	struct sgx_epc_page *epc_page;
+	DECLARE_BITMAP(slots, SGX_VA_SLOT_COUNT);
+	struct list_head list;
+};
+
 extern const struct vm_operations_struct sgx_vm_ops;
 
 int sgx_encl_find(struct mm_struct *mm, unsigned long addr,
@@ -84,4 +105,24 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
 int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
 		     unsigned long end, unsigned long vm_prot_bits);
 
+struct sgx_backing {
+	pgoff_t page_index;
+	struct page *contents;
+	struct page *pcmd;
+	unsigned long pcmd_offset;
+};
+
+int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
+			 struct sgx_backing *backing);
+void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
+int sgx_encl_test_and_clear_young(struct mm_struct *mm,
+				  struct sgx_encl_page *page);
+struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,
+					    unsigned long addr);
+
+struct sgx_epc_page *sgx_alloc_va_page(void);
+unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
+void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
+bool sgx_va_page_full(struct sgx_va_page *va_page);
+
 #endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index e4ec07a56d39..d5e0995931cd 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -19,6 +19,43 @@
 /* A per-cpu cache for the last known values of IA32_SGXLEPUBKEYHASHx MSRs. */
 static DEFINE_PER_CPU(u64 [4], sgx_lepubkeyhash_cache);
 
+static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
+{
+	struct sgx_va_page *va_page = NULL;
+	void *err;
+
+	BUILD_BUG_ON(SGX_VA_SLOT_COUNT !=
+		(SGX_ENCL_PAGE_VA_OFFSET_MASK >> 3) + 1);
+
+	if (!(encl->page_cnt % SGX_VA_SLOT_COUNT)) {
+		va_page = kzalloc(sizeof(*va_page), GFP_KERNEL);
+		if (!va_page)
+			return ERR_PTR(-ENOMEM);
+
+		va_page->epc_page = sgx_alloc_va_page();
+		if (IS_ERR(va_page->epc_page)) {
+			err = ERR_CAST(va_page->epc_page);
+			kfree(va_page);
+			return err;
+		}
+
+		WARN_ON_ONCE(encl->page_cnt % SGX_VA_SLOT_COUNT);
+	}
+	encl->page_cnt++;
+	return va_page;
+}
+
+static void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
+{
+	encl->page_cnt--;
+
+	if (va_page) {
+		sgx_free_epc_page(va_page->epc_page);
+		list_del(&va_page->list);
+		kfree(va_page);
+	}
+}
+
 static u32 sgx_calc_ssa_frame_size(u32 miscselect, u64 xfrm)
 {
 	u32 size_max = PAGE_SIZE;
@@ -86,24 +123,34 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 {
 	unsigned long encl_size = secs->size + PAGE_SIZE;
 	struct sgx_epc_page *secs_epc;
+	struct sgx_va_page *va_page;
 	struct sgx_pageinfo pginfo;
 	struct sgx_secinfo secinfo;
 	struct file *backing;
 	long ret;
 
+	va_page = sgx_encl_grow(encl);
+	if (IS_ERR(va_page))
+		return PTR_ERR(va_page);
+	else if (va_page)
+		list_add(&va_page->list, &encl->va_pages);
+
 	if (sgx_validate_secs(secs)) {
 		pr_debug("invalid SECS\n");
-		return -EINVAL;
+		ret = -EINVAL;
+		goto err_out_shrink;
 	}
 
 	backing = shmem_file_setup("SGX backing", encl_size + (encl_size >> 5),
 				   VM_NORESERVE);
-	if (IS_ERR(backing))
-		return PTR_ERR(backing);
+	if (IS_ERR(backing)) {
+		ret = PTR_ERR(backing);
+		goto err_out_shrink;
+	}
 
 	encl->backing = backing;
 
-	secs_epc = __sgx_alloc_epc_page();
+	secs_epc = sgx_alloc_epc_page(&encl->secs, true);
 	if (IS_ERR(secs_epc)) {
 		ret = PTR_ERR(secs_epc);
 		goto err_out_backing;
@@ -150,6 +197,9 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 	fput(encl->backing);
 	encl->backing = NULL;
 
+err_out_shrink:
+	sgx_encl_shrink(encl, va_page);
+
 	return ret;
 }
 
@@ -326,21 +376,35 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
 {
 	struct sgx_encl_page *encl_page;
 	struct sgx_epc_page *epc_page;
+	struct sgx_va_page *va_page;
 	int ret;
 
 	encl_page = sgx_encl_page_alloc(encl, offset, secinfo->flags);
 	if (IS_ERR(encl_page))
 		return PTR_ERR(encl_page);
 
-	epc_page = __sgx_alloc_epc_page();
+	epc_page = sgx_alloc_epc_page(encl_page, true);
 	if (IS_ERR(epc_page)) {
 		kfree(encl_page);
 		return PTR_ERR(epc_page);
 	}
 
+	va_page = sgx_encl_grow(encl);
+	if (IS_ERR(va_page)) {
+		ret = PTR_ERR(va_page);
+		goto err_out_free;
+	}
+
 	mmap_read_lock(current->mm);
 	mutex_lock(&encl->lock);
 
+	/*
+	 * Adding to encl->va_pages must be done under encl->lock.  Ditto for
+	 * deleting (via sgx_encl_shrink()) in the error path.
+	 */
+	if (va_page)
+		list_add(&va_page->list, &encl->va_pages);
+
 	/*
 	 * Insert prior to EADD in case of OOM.  EADD modifies MRENCLAVE, i.e.
 	 * can't be gracefully unwound, while failure on EADD/EXTEND is limited
@@ -371,6 +435,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
 			goto err_out;
 	}
 
+	sgx_mark_page_reclaimable(encl_page->epc_page);
 	mutex_unlock(&encl->lock);
 	mmap_read_unlock(current->mm);
 	return ret;
@@ -380,9 +445,11 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
 			  PFN_DOWN(encl_page->desc));
 
 err_out_unlock:
+	sgx_encl_shrink(encl, va_page);
 	mutex_unlock(&encl->lock);
 	mmap_read_unlock(current->mm);
 
+err_out_free:
 	sgx_free_epc_page(epc_page);
 	kfree(encl_page);
 
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 4137254fb29e..3594d37d545f 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -16,6 +16,378 @@
 struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxswapd_tsk;
+static DECLARE_WAIT_QUEUE_HEAD(ksgxswapd_waitq);
+static LIST_HEAD(sgx_active_page_list);
+static DEFINE_SPINLOCK(sgx_active_page_list_lock);
+
+/**
+ * sgx_mark_page_reclaimable() - Mark a page as reclaimable
+ * @page:	EPC page
+ *
+ * Mark a page as reclaimable and add it to the active page list. Pages
+ * are automatically removed from the active list when freed.
+ */
+void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
+{
+	spin_lock(&sgx_active_page_list_lock);
+	page->desc |= SGX_EPC_PAGE_RECLAIMABLE;
+	list_add_tail(&page->list, &sgx_active_page_list);
+	spin_unlock(&sgx_active_page_list_lock);
+}
+
+/**
+ * sgx_unmark_page_reclaimable() - Remove a page from the reclaim list
+ * @page:	EPC page
+ *
+ * Clear the reclaimable flag and remove the page from the active page list.
+ *
+ * Return:
+ *   0 on success,
+ *   -EBUSY if the page is in the process of being reclaimed
+ */
+int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
+{
+	/*
+	 * Remove the page from the active list if necessary.  If the page
+	 * is actively being reclaimed, i.e. RECLAIMABLE is set but the
+	 * page isn't on the active list, return -EBUSY as we can't free
+	 * the page at this time since it is "owned" by the reclaimer.
+	 */
+	spin_lock(&sgx_active_page_list_lock);
+	if (page->desc & SGX_EPC_PAGE_RECLAIMABLE) {
+		if (list_empty(&page->list)) {
+			spin_unlock(&sgx_active_page_list_lock);
+			return -EBUSY;
+		}
+		list_del(&page->list);
+		page->desc &= ~SGX_EPC_PAGE_RECLAIMABLE;
+	}
+	spin_unlock(&sgx_active_page_list_lock);
+
+	return 0;
+}
+
+static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
+{
+	struct sgx_encl_page *page = epc_page->owner;
+	struct sgx_encl *encl = page->encl;
+	struct sgx_encl_mm *encl_mm;
+	bool ret = true;
+	int idx;
+
+	idx = srcu_read_lock(&encl->srcu);
+
+	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
+		if (!mmget_not_zero(encl_mm->mm))
+			continue;
+
+		mmap_read_lock(encl_mm->mm);
+		ret = !sgx_encl_test_and_clear_young(encl_mm->mm, page);
+		mmap_read_unlock(encl_mm->mm);
+
+		mmput_async(encl_mm->mm);
+
+		if (!ret || (atomic_read(&encl->flags) & SGX_ENCL_DEAD))
+			break;
+	}
+
+	srcu_read_unlock(&encl->srcu, idx);
+
+	if (!ret && !(atomic_read(&encl->flags) & SGX_ENCL_DEAD))
+		return false;
+
+	return true;
+}
+
+static void sgx_reclaimer_block(struct sgx_epc_page *epc_page)
+{
+	struct sgx_encl_page *page = epc_page->owner;
+	unsigned long addr = SGX_ENCL_PAGE_ADDR(page);
+	struct sgx_encl *encl = page->encl;
+	unsigned long mm_list_version;
+	struct sgx_encl_mm *encl_mm;
+	struct vm_area_struct *vma;
+	int idx, ret;
+
+	do {
+		mm_list_version = encl->mm_list_version;
+
+		/* Pairs with smp_rmb() in sgx_encl_mm_add(). */
+		smp_rmb();
+
+		idx = srcu_read_lock(&encl->srcu);
+
+		list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
+			if (!mmget_not_zero(encl_mm->mm))
+				continue;
+
+			mmap_read_lock(encl_mm->mm);
+
+			ret = sgx_encl_find(encl_mm->mm, addr, &vma);
+			if (!ret && encl == vma->vm_private_data)
+				zap_vma_ptes(vma, addr, PAGE_SIZE);
+
+			mmap_read_unlock(encl_mm->mm);
+
+			mmput_async(encl_mm->mm);
+		}
+
+		srcu_read_unlock(&encl->srcu, idx);
+	} while (unlikely(encl->mm_list_version != mm_list_version));
+
+	mutex_lock(&encl->lock);
+
+	if (!(atomic_read(&encl->flags) & SGX_ENCL_DEAD)) {
+		ret = __eblock(sgx_get_epc_addr(epc_page));
+		if (encls_failed(ret))
+			ENCLS_WARN(ret, "EBLOCK");
+	}
+
+	mutex_unlock(&encl->lock);
+}
+
+static int __sgx_encl_ewb(struct sgx_epc_page *epc_page, void *va_slot,
+			  struct sgx_backing *backing)
+{
+	struct sgx_pageinfo pginfo;
+	int ret;
+
+	pginfo.addr = 0;
+	pginfo.secs = 0;
+
+	pginfo.contents = (unsigned long)kmap_atomic(backing->contents);
+	pginfo.metadata = (unsigned long)kmap_atomic(backing->pcmd) +
+			  backing->pcmd_offset;
+
+	ret = __ewb(&pginfo, sgx_get_epc_addr(epc_page), va_slot);
+
+	kunmap_atomic((void *)(unsigned long)(pginfo.metadata -
+					      backing->pcmd_offset));
+	kunmap_atomic((void *)(unsigned long)pginfo.contents);
+
+	return ret;
+}
+
+static void sgx_ipi_cb(void *info)
+{
+}
+
+static const cpumask_t *sgx_encl_ewb_cpumask(struct sgx_encl *encl)
+{
+	cpumask_t *cpumask = &encl->cpumask;
+	struct sgx_encl_mm *encl_mm;
+	int idx;
+
+	/*
+	 * Can race with sgx_encl_mm_add(), but ETRACK has already been
+	 * executed, which means that the CPUs running in the new mm will enter
+	 * into the enclave with a fresh epoch.
+	 */
+	cpumask_clear(cpumask);
+
+	idx = srcu_read_lock(&encl->srcu);
+
+	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
+		if (!mmget_not_zero(encl_mm->mm))
+			continue;
+
+		cpumask_or(cpumask, cpumask, mm_cpumask(encl_mm->mm));
+
+		mmput_async(encl_mm->mm);
+	}
+
+	srcu_read_unlock(&encl->srcu, idx);
+
+	return cpumask;
+}
+
+static void sgx_encl_ewb(struct sgx_epc_page *epc_page,
+			 struct sgx_backing *backing)
+{
+	struct sgx_encl_page *encl_page = epc_page->owner;
+	struct sgx_encl *encl = encl_page->encl;
+	struct sgx_va_page *va_page;
+	unsigned int va_offset;
+	void *va_slot;
+	int ret;
+
+	encl_page->desc &= ~SGX_ENCL_PAGE_RECLAIMED;
+
+	va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
+				   list);
+	va_offset = sgx_alloc_va_slot(va_page);
+	va_slot = sgx_get_epc_addr(va_page->epc_page) + va_offset;
+	if (sgx_va_page_full(va_page))
+		list_move_tail(&va_page->list, &encl->va_pages);
+
+	ret = __sgx_encl_ewb(epc_page, va_slot, backing);
+	if (ret == SGX_NOT_TRACKED) {
+		ret = __etrack(sgx_get_epc_addr(encl->secs.epc_page));
+		if (ret) {
+			if (encls_failed(ret))
+				ENCLS_WARN(ret, "ETRACK");
+		}
+
+		ret = __sgx_encl_ewb(epc_page, va_slot, backing);
+		if (ret == SGX_NOT_TRACKED) {
+			/*
+			 * Slow path, send IPIs to kick cpus out of the
+			 * enclave.  Note, it's imperative that the cpu
+			 * mask is generated *after* ETRACK, else we'll
+			 * miss cpus that entered the enclave between
+			 * generating the mask and incrementing epoch.
+			 */
+			on_each_cpu_mask(sgx_encl_ewb_cpumask(encl),
+					 sgx_ipi_cb, NULL, 1);
+			ret = __sgx_encl_ewb(epc_page, va_slot, backing);
+		}
+	}
+
+	if (ret) {
+		if (encls_failed(ret))
+			ENCLS_WARN(ret, "EWB");
+
+		sgx_free_va_slot(va_page, va_offset);
+	} else {
+		encl_page->desc |= va_offset;
+		encl_page->va_page = va_page;
+	}
+}
+
+static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
+				struct sgx_backing *backing)
+{
+	struct sgx_encl_page *encl_page = epc_page->owner;
+	struct sgx_encl *encl = encl_page->encl;
+	struct sgx_backing secs_backing;
+	int ret;
+
+	mutex_lock(&encl->lock);
+
+	if (atomic_read(&encl->flags) & SGX_ENCL_DEAD) {
+		ret = __eremove(sgx_get_epc_addr(epc_page));
+		ENCLS_WARN(ret, "EREMOVE");
+	} else {
+		sgx_encl_ewb(epc_page, backing);
+	}
+
+	encl_page->epc_page = NULL;
+	encl->secs_child_cnt--;
+
+	if (!encl->secs_child_cnt) {
+		if (atomic_read(&encl->flags) & SGX_ENCL_DEAD) {
+			sgx_free_epc_page(encl->secs.epc_page);
+			encl->secs.epc_page = NULL;
+		} else if (atomic_read(&encl->flags) & SGX_ENCL_INITIALIZED) {
+			ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
+						   &secs_backing);
+			if (ret)
+				goto out;
+
+			sgx_encl_ewb(encl->secs.epc_page, &secs_backing);
+
+			sgx_free_epc_page(encl->secs.epc_page);
+			encl->secs.epc_page = NULL;
+
+			sgx_encl_put_backing(&secs_backing, true);
+		}
+	}
+
+out:
+	mutex_unlock(&encl->lock);
+}
+
+/*
+ * Take a fixed number of pages from the head of the active page pool and
+ * reclaim them to the enclave's private shmem files. Skip the pages that have
+ * been accessed since the last scan. Move those pages to the tail of the
+ * active page pool so that they get scanned in LRU-like fashion.
+ */
+static void sgx_reclaim_pages(void)
+{
+	struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
+	struct sgx_backing backing[SGX_NR_TO_SCAN];
+	struct sgx_epc_section *section;
+	struct sgx_encl_page *encl_page;
+	struct sgx_epc_page *epc_page;
+	int cnt = 0;
+	int ret;
+	int i;
+
+	spin_lock(&sgx_active_page_list_lock);
+	for (i = 0; i < SGX_NR_TO_SCAN; i++) {
+		if (list_empty(&sgx_active_page_list))
+			break;
+
+		epc_page = list_first_entry(&sgx_active_page_list,
+					    struct sgx_epc_page, list);
+		list_del_init(&epc_page->list);
+		encl_page = epc_page->owner;
+
+		if (kref_get_unless_zero(&encl_page->encl->refcount) != 0)
+			chunk[cnt++] = epc_page;
+		else
+			/* The owner is freeing the page. No need to add the
+			 * page back to the list of reclaimable pages.
+			 */
+			epc_page->desc &= ~SGX_EPC_PAGE_RECLAIMABLE;
+	}
+	spin_unlock(&sgx_active_page_list_lock);
+
+	for (i = 0; i < cnt; i++) {
+		epc_page = chunk[i];
+		encl_page = epc_page->owner;
+
+		if (!sgx_reclaimer_age(epc_page))
+			goto skip;
+
+		ret = sgx_encl_get_backing(encl_page->encl,
+					   SGX_ENCL_PAGE_INDEX(encl_page),
+					   &backing[i]);
+		if (ret)
+			goto skip;
+
+		mutex_lock(&encl_page->encl->lock);
+		encl_page->desc |= SGX_ENCL_PAGE_RECLAIMED;
+		mutex_unlock(&encl_page->encl->lock);
+		continue;
+
+skip:
+		spin_lock(&sgx_active_page_list_lock);
+		list_add_tail(&epc_page->list, &sgx_active_page_list);
+		spin_unlock(&sgx_active_page_list_lock);
+
+		kref_put(&encl_page->encl->refcount, sgx_encl_release);
+
+		chunk[i] = NULL;
+	}
+
+	for (i = 0; i < cnt; i++) {
+		epc_page = chunk[i];
+		if (epc_page)
+			sgx_reclaimer_block(epc_page);
+	}
+
+	for (i = 0; i < cnt; i++) {
+		epc_page = chunk[i];
+		if (!epc_page)
+			continue;
+
+		encl_page = epc_page->owner;
+		sgx_reclaimer_write(epc_page, &backing[i]);
+		sgx_encl_put_backing(&backing[i], true);
+
+		kref_put(&encl_page->encl->refcount, sgx_encl_release);
+		epc_page->desc &= ~SGX_EPC_PAGE_RECLAIMABLE;
+
+		section = sgx_get_epc_section(epc_page);
+		spin_lock(&section->lock);
+		list_add_tail(&epc_page->list, &section->page_list);
+		section->free_cnt++;
+		spin_unlock(&section->lock);
+	}
+}
+
 
 static void sgx_sanitize_section(struct sgx_epc_section *section)
 {
@@ -44,6 +416,23 @@ static void sgx_sanitize_section(struct sgx_epc_section *section)
 	}
 }
 
+static unsigned long sgx_nr_free_pages(void)
+{
+	unsigned long cnt = 0;
+	int i;
+
+	for (i = 0; i < sgx_nr_epc_sections; i++)
+		cnt += sgx_epc_sections[i].free_cnt;
+
+	return cnt;
+}
+
+static bool sgx_should_reclaim(unsigned long watermark)
+{
+	return sgx_nr_free_pages() < watermark &&
+	       !list_empty(&sgx_active_page_list);
+}
+
 static int ksgxswapd(void *p)
 {
 	int i;
@@ -69,6 +458,20 @@ static int ksgxswapd(void *p)
 			WARN(1, "EPC section %d has unsanitized pages.\n", i);
 	}
 
+	while (!kthread_should_stop()) {
+		if (try_to_freeze())
+			continue;
+
+		wait_event_freezable(ksgxswapd_waitq,
+				     kthread_should_stop() ||
+				     sgx_should_reclaim(SGX_NR_HIGH_PAGES));
+
+		if (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
+			sgx_reclaim_pages();
+
+		cond_resched();
+	}
+
 	return 0;
 }
 
@@ -94,6 +497,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec
 
 	page = list_first_entry(&section->page_list, struct sgx_epc_page, list);
 	list_del_init(&page->list);
+	section->free_cnt--;
 
 	return page;
 }
@@ -127,6 +531,57 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
 	return ERR_PTR(-ENOMEM);
 }
 
+/**
+ * sgx_alloc_epc_page() - Allocate an EPC page
+ * @owner:	the owner of the EPC page
+ * @reclaim:	reclaim pages if necessary
+ *
+ * Iterate through the EPC sections and borrow a free EPC page for the caller.
+ * When a page is no longer needed, it must be released with
+ * sgx_free_epc_page(). If @reclaim is set to true, pages are reclaimed
+ * directly when none are free. No mm locks may be held when @reclaim is true.
+ *
+ * Finally, wake up ksgxswapd when the number of free pages goes below the
+ * watermark before returning to the caller.
+ *
+ * Return:
+ *   an EPC page,
+ *   -errno on error
+ */
+struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
+{
+	struct sgx_epc_page *entry;
+
+	for ( ; ; ) {
+		entry = __sgx_alloc_epc_page();
+		if (!IS_ERR(entry)) {
+			entry->owner = owner;
+			break;
+		}
+
+		if (list_empty(&sgx_active_page_list))
+			return ERR_PTR(-ENOMEM);
+
+		if (!reclaim) {
+			entry = ERR_PTR(-EBUSY);
+			break;
+		}
+
+		if (signal_pending(current)) {
+			entry = ERR_PTR(-ERESTARTSYS);
+			break;
+		}
+
+		sgx_reclaim_pages();
+		schedule();
+	}
+
+	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
+		wake_up(&ksgxswapd_waitq);
+
+	return entry;
+}
+
 /**
  * sgx_free_epc_page() - Free an EPC page
  * @page:	an EPC page
@@ -138,12 +593,20 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 	struct sgx_epc_section *section = sgx_get_epc_section(page);
 	int ret;
 
+	/*
+	 * Don't take sgx_active_page_list_lock when asserting the page isn't
+	 * reclaimable; missing a WARN in the very rare case is preferable to
+	 * unnecessarily taking a global lock in the common case.
+	 */
+	WARN_ON_ONCE(page->desc & SGX_EPC_PAGE_RECLAIMABLE);
+
 	ret = __eremove(sgx_get_epc_addr(page));
 	if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
 		return;
 
 	spin_lock(&section->lock);
 	list_add_tail(&page->list, &section->page_list);
+	section->free_cnt++;
 	spin_unlock(&section->lock);
 }
 
@@ -194,6 +657,7 @@ static bool __init sgx_setup_epc_section(u64 addr, u64 size,
 		list_add_tail(&page->list, &section->unsanitized_page_list);
 	}
 
+	section->free_cnt = nr_pages;
 	return true;
 
 err_out:
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index fce756c3434b..1a6ca5f734e5 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -15,6 +15,7 @@
 
 struct sgx_epc_page {
 	unsigned long desc;
+	struct sgx_encl_page *owner;
 	struct list_head list;
 };
 
@@ -27,6 +28,7 @@ struct sgx_epc_page {
 struct sgx_epc_section {
 	unsigned long pa;
 	void *va;
+	unsigned long free_cnt;
 	struct list_head page_list;
 	struct list_head unsanitized_page_list;
 	spinlock_t lock;
@@ -34,6 +36,10 @@ struct sgx_epc_section {
 
 #define SGX_EPC_SECTION_MASK		GENMASK(7, 0)
 #define SGX_MAX_EPC_SECTIONS		(SGX_EPC_SECTION_MASK + 1)
+#define SGX_EPC_PAGE_RECLAIMABLE	BIT(8)
+#define SGX_NR_TO_SCAN			16
+#define SGX_NR_LOW_PAGES		32
+#define SGX_NR_HIGH_PAGES		64
 
 extern struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 
@@ -49,7 +55,10 @@ static inline void *sgx_get_epc_addr(struct sgx_epc_page *page)
 	return section->va + (page->desc & PAGE_MASK) - section->pa;
 }
 
+void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
+int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
 struct sgx_epc_page *__sgx_alloc_epc_page(void);
+struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
 void sgx_free_epc_page(struct sgx_epc_page *page);
 
 #endif /* _X86_SGX_H */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v34 10/24] mm: Add vm_ops->mprotect()
  2020-07-07  3:01 ` [PATCH v34 10/24] mm: Add vm_ops->mprotect() Jarkko Sakkinen
@ 2020-07-07  3:14   ` Matthew Wilcox
  2020-07-07  3:22     ` Sean Christopherson
  2020-07-07  4:03     ` Jarkko Sakkinen
  0 siblings, 2 replies; 18+ messages in thread
From: Matthew Wilcox @ 2020-07-07  3:14 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: x86, linux-sgx, linux-kernel, Sean Christopherson, linux-mm,
	Andrew Morton, Jethro Beekman, andriy.shevchenko, asapek, bp,
	cedric.xing, chenalexchen, conradparker, cyhanish, dave.hansen,
	haitao.huang, josh, kai.huang, kai.svahn, kmoy, ludloff, luto,
	nhorman, npmccallum, puiterwijk, rientjes, tglx, yaozhangx

On Tue, Jul 07, 2020 at 06:01:50AM +0300, Jarkko Sakkinen wrote:
> +++ b/mm/mprotect.c
> @@ -603,13 +603,20 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
>  			goto out;
>  		}
>  
> +		tmp = vma->vm_end;
> +		if (tmp > end)
> +			tmp = end;
> +
>  		error = security_file_mprotect(vma, reqprot, prot);
>  		if (error)
>  			goto out;
>  
> -		tmp = vma->vm_end;
> -		if (tmp > end)
> -			tmp = end;

You don't need to move this any more, right?

> +		if (vma->vm_ops && vma->vm_ops->mprotect) {
> +			error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> +			if (error)
> +				goto out;
> +		}
> +
>  		error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
>  		if (error)
>  			goto out;
> -- 
> 2.25.1
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v34 10/24] mm: Add vm_ops->mprotect()
  2020-07-07  3:14   ` Matthew Wilcox
@ 2020-07-07  3:22     ` Sean Christopherson
  2020-07-07  3:24       ` Matthew Wilcox
  2020-07-07  4:03     ` Jarkko Sakkinen
  1 sibling, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2020-07-07  3:22 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jarkko Sakkinen, x86, linux-sgx, linux-kernel, linux-mm,
	Andrew Morton, Jethro Beekman, andriy.shevchenko, asapek, bp,
	cedric.xing, chenalexchen, conradparker, cyhanish, dave.hansen,
	haitao.huang, josh, kai.huang, kai.svahn, kmoy, ludloff, luto,
	nhorman, npmccallum, puiterwijk, rientjes, tglx, yaozhangx

On Tue, Jul 07, 2020 at 04:14:24AM +0100, Matthew Wilcox wrote:
> On Tue, Jul 07, 2020 at 06:01:50AM +0300, Jarkko Sakkinen wrote:
> > +++ b/mm/mprotect.c
> > @@ -603,13 +603,20 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
> >  			goto out;
> >  		}
> >  
> > +		tmp = vma->vm_end;
> > +		if (tmp > end)
> > +			tmp = end;
> > +
> >  		error = security_file_mprotect(vma, reqprot, prot);
> >  		if (error)
> >  			goto out;
> >  
> > -		tmp = vma->vm_end;
> > -		if (tmp > end)
> > -			tmp = end;
> 
> You don't need to move this any more, right?

Ya, I was typing up a longer version, you beat me to the punch...

The calculation of 'tmp' doesn't need to be moved.  The previous incarnation
only moved it so that 'tmp' would be available for .may_mprotect().

The only reason .may_mprotect() was placed before security_file_mprotect()
was to avoid triggering LSM errors that would have been blocked by
.may_mprotect(), but that justification is no longer relevant as the
proposed SGX LSM hooks obviously didn't worm their way into this series.

> > +		if (vma->vm_ops && vma->vm_ops->mprotect) {
> > +			error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> > +			if (error)
> > +				goto out;
> > +		}

Based on "... and then the vma owner can do whatever it needs to before
calling mprotect_fixup(), which is already not static", my interpretation
is that Matthew's intent was to do:

		if (vma->vm_ops && vma->vm_ops->mprotect)
			error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
		else
			error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
		if (error)
			goto out;

i.e. make .mprotect() a full replacement as opposed to a prereq hook.
		
> > +
> >  		error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
> >  		if (error)
> >  			goto out;
> > -- 
> > 2.25.1
> > 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v34 10/24] mm: Add vm_ops->mprotect()
  2020-07-07  3:22     ` Sean Christopherson
@ 2020-07-07  3:24       ` Matthew Wilcox
  2020-07-07  4:01         ` Jarkko Sakkinen
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2020-07-07  3:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jarkko Sakkinen, x86, linux-sgx, linux-kernel, linux-mm,
	Andrew Morton, Jethro Beekman, andriy.shevchenko, asapek, bp,
	cedric.xing, chenalexchen, conradparker, cyhanish, dave.hansen,
	haitao.huang, josh, kai.huang, kai.svahn, kmoy, ludloff, luto,
	nhorman, npmccallum, puiterwijk, rientjes, tglx, yaozhangx

On Mon, Jul 06, 2020 at 08:22:54PM -0700, Sean Christopherson wrote:
> On Tue, Jul 07, 2020 at 04:14:24AM +0100, Matthew Wilcox wrote:
> > > +		if (vma->vm_ops && vma->vm_ops->mprotect) {
> > > +			error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> > > +			if (error)
> > > +				goto out;
> > > +		}
> 
> Based on "... and then the vma owner can do whatever it needs to before
> calling mprotect_fixup(), which is already not static", my interpretation
> is that Matthew's intent was to do:
> 
> 		if (vma->vm_ops && vma->vm_ops->mprotect)
> 			error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> 		else
> 			error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
> 		if (error)
> 			goto out;
> 
> i.e. make .mprotect() a full replacement as opposed to a prereq hook.

Yes, it was.  I was just looking at the next patch to be sure this was
how I'd been misunderstood.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v34 11/24] x86/sgx: Add SGX enclave driver
  2020-07-07  3:01 ` [PATCH v34 11/24] x86/sgx: Add SGX enclave driver Jarkko Sakkinen
@ 2020-07-07  3:36   ` Matthew Wilcox
  2020-07-07  4:11     ` Jarkko Sakkinen
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2020-07-07  3:36 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: x86, linux-sgx, linux-kernel, linux-security-module, linux-mm,
	Andrew Morton, Jethro Beekman, Haitao Huang, Chunyang Hui,
	Jordan Hand, Nathaniel McCallum, Seth Moore, Sean Christopherson,
	Suresh Siddha, andriy.shevchenko, asapek, bp, cedric.xing,
	chenalexchen, conradparker, cyhanish, dave.hansen, haitao.huang,
	josh, kai.huang, kai.svahn, kmoy, ludloff, luto, nhorman,
	puiterwijk, rientjes, tglx, yaozhangx

On Tue, Jul 07, 2020 at 06:01:51AM +0300, Jarkko Sakkinen wrote:
> Intel Software Guard eXtensions (SGX) is a set of CPU instructions that
> can be used by applications to set aside private regions of code and
> data. The code outside the SGX hosted software entity is disallowed to

s/disallowed to/prevented from/

> access the memory inside the enclave enforced by the CPU. We call these

s/enforced//

> entities enclaves.
> 
> Add a driver that provides an ioctl API to construct and run enclaves.
> Enclaves are constructed from pages residing in reserved physical memory
> areas. The contents of these pages can only be accessed when they are
> mapped as part of an enclave, by a hardware thread running inside the
> enclave.
> 
> The starting state of an enclave consists of a fixed measured set of
> pages that are copied to the EPC during the construction process by
> using ENCLS leaf functions and Software Enclave Control Structure (SECS)
> that defines the enclave properties.
> 
> Enclaves are constructed by using ENCLS leaf functions ECREATE, EADD and
> EINIT. ECREATE initializes SECS, EADD copies pages from system memory to
> the EPC and EINIT checks a given signed measurement and moves the enclave
> into a state ready for execution.

What's a leaf function?  Is it like a CPU instruction?

> The mmap() permissions are capped by the contained enclave page
> permissions. The mapped areas must also be opaque, i.e. each page address
> must contain a page. This logic is implemented in sgx_encl_may_map().

do you mean "populated" instead of "opaque"?

> +	atomic_set(&encl->flags, 0);
> +	kref_init(&encl->refcount);
> +	INIT_RADIX_TREE(&encl->page_tree, GFP_KERNEL);

Why are you using a radix tree instead of an xarray?

> +int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
> +		     unsigned long end, unsigned long vm_prot_bits)
> +{
> +	unsigned long idx, idx_start, idx_end;
> +	struct sgx_encl_page *page;
> +
> +	/*
> +	 * Disallow RIE tasks as their VMA permissions might conflict with the
> +	 * enclave page permissions.
> +	 */
> +	if (!!(current->personality & READ_IMPLIES_EXEC))
> +		return -EACCES;
> +
> +	idx_start = PFN_DOWN(start);
> +	idx_end = PFN_DOWN(end - 1);
> +
> +	for (idx = idx_start; idx <= idx_end; ++idx) {
> +		mutex_lock(&encl->lock);
> +		page = radix_tree_lookup(&encl->page_tree, idx);
> +		mutex_unlock(&encl->lock);
> +
> +		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
> +			return -EACCES;

You should really use an iterator here instead of repeated lookups.
xas_for_each() will probably be what you want.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v34 10/24] mm: Add vm_ops->mprotect()
  2020-07-07  3:24       ` Matthew Wilcox
@ 2020-07-07  4:01         ` Jarkko Sakkinen
  2020-07-07  4:10           ` Matthew Wilcox
  0 siblings, 1 reply; 18+ messages in thread
From: Jarkko Sakkinen @ 2020-07-07  4:01 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Sean Christopherson, x86, linux-sgx, linux-kernel, linux-mm,
	Andrew Morton, Jethro Beekman, andriy.shevchenko, asapek, bp,
	cedric.xing, chenalexchen, conradparker, cyhanish, dave.hansen,
	haitao.huang, josh, kai.huang, kai.svahn, kmoy, ludloff, luto,
	nhorman, npmccallum, puiterwijk, rientjes, tglx, yaozhangx

On Tue, Jul 07, 2020 at 04:24:08AM +0100, Matthew Wilcox wrote:
> On Mon, Jul 06, 2020 at 08:22:54PM -0700, Sean Christopherson wrote:
> > On Tue, Jul 07, 2020 at 04:14:24AM +0100, Matthew Wilcox wrote:
> > > > +		if (vma->vm_ops && vma->vm_ops->mprotect) {
> > > > +			error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> > > > +			if (error)
> > > > +				goto out;
> > > > +		}
> > 
> > Based on "... and then the vma owner can do whatever it needs to before
> > calling mprotect_fixup(), which is already not static", my interpretation
> > is that Matthew's intent was to do:
> > 
> > 		if (vma->vm_ops && vma->vm_ops->mprotect)
> > 			error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> > 		else
> > 			error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
> > 		if (error)
> > 			goto out;
> > 
> > i.e. make .mprotect() a full replacement as opposed to a prereq hook.
> 
> Yes, it was.  I was just looking at the next patch to be sure this was
> how I'd been misunderstood.

I don't get this part. If mprotect_fixup() is called at the tail of the
callback, why does it have to be called inside the callback instead of
after the callback returns?

The reason I did only part of what you requested was to implement only the
part of the change that I understood, not to oppose it.

/Jarkko


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v34 10/24] mm: Add vm_ops->mprotect()
  2020-07-07  3:14   ` Matthew Wilcox
  2020-07-07  3:22     ` Sean Christopherson
@ 2020-07-07  4:03     ` Jarkko Sakkinen
  1 sibling, 0 replies; 18+ messages in thread
From: Jarkko Sakkinen @ 2020-07-07  4:03 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: x86, linux-sgx, linux-kernel, Sean Christopherson, linux-mm,
	Andrew Morton, Jethro Beekman, andriy.shevchenko, asapek, bp,
	cedric.xing, chenalexchen, conradparker, cyhanish, dave.hansen,
	haitao.huang, josh, kai.huang, kai.svahn, kmoy, ludloff, luto,
	nhorman, npmccallum, puiterwijk, rientjes, tglx, yaozhangx

On Tue, Jul 07, 2020 at 04:14:24AM +0100, Matthew Wilcox wrote:
> On Tue, Jul 07, 2020 at 06:01:50AM +0300, Jarkko Sakkinen wrote:
> > +++ b/mm/mprotect.c
> > @@ -603,13 +603,20 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
> >  			goto out;
> >  		}
> >  
> > +		tmp = vma->vm_end;
> > +		if (tmp > end)
> > +			tmp = end;
> > +
> >  		error = security_file_mprotect(vma, reqprot, prot);
> >  		if (error)
> >  			goto out;
> >  
> > -		tmp = vma->vm_end;
> > -		if (tmp > end)
> > -			tmp = end;
> 
> You don't need to move this any more, right?

My bad.

/Jarkko


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v34 10/24] mm: Add vm_ops->mprotect()
  2020-07-07  4:01         ` Jarkko Sakkinen
@ 2020-07-07  4:10           ` Matthew Wilcox
  2020-07-08 14:33             ` Jarkko Sakkinen
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2020-07-07  4:10 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Sean Christopherson, x86, linux-sgx, linux-kernel, linux-mm,
	Andrew Morton, Jethro Beekman, andriy.shevchenko, asapek, bp,
	cedric.xing, chenalexchen, conradparker, cyhanish, dave.hansen,
	haitao.huang, josh, kai.huang, kai.svahn, kmoy, ludloff, luto,
	nhorman, npmccallum, puiterwijk, rientjes, tglx, yaozhangx

On Tue, Jul 07, 2020 at 07:01:51AM +0300, Jarkko Sakkinen wrote:
> On Tue, Jul 07, 2020 at 04:24:08AM +0100, Matthew Wilcox wrote:
> > On Mon, Jul 06, 2020 at 08:22:54PM -0700, Sean Christopherson wrote:
> > > On Tue, Jul 07, 2020 at 04:14:24AM +0100, Matthew Wilcox wrote:
> > > > > +		if (vma->vm_ops && vma->vm_ops->mprotect) {
> > > > > +			error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> > > > > +			if (error)
> > > > > +				goto out;
> > > > > +		}
> > > 
> > > Based on "... and then the vma owner can do whatever it needs to before
> > > calling mprotect_fixup(), which is already not static", my interpretation
> > > is that Matthew's intent was to do:
> > > 
> > > 		if (vma->vm_ops && vma->vm_ops->mprotect)
> > > 			error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> > > 		else
> > > 			error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
> > > 		if (error)
> > > 			goto out;
> > > 
> > > i.e. make .mprotect() a full replacement as opposed to a prereq hook.
> > 
> > Yes, it was.  I was just looking at the next patch to be sure this was
> > how I'd been misunderstood.
> 
> I don't get this part. If mprotect_fixup() is called at the tail of the
> callback, why does it have to be called inside the callback instead of
> after the callback returns?

Because that's how every other VM operation works.  Look at your
implementation of get_unmapped_area() for example.
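
For reference, the rough shape of that pattern (a sketch from memory with
assumed details, not necessarily the driver's exact code): the hook owns the
whole operation and calls back into the generic helper itself:

	static unsigned long sgx_get_unmapped_area(struct file *file,
						   unsigned long addr,
						   unsigned long len,
						   unsigned long pgoff,
						   unsigned long flags)
	{
		/* Honour an explicit placement request as-is. */
		if (flags & MAP_FIXED)
			return addr;

		/* Delegate the search for a free range to the generic code. */
		return current->mm->get_unmapped_area(file, addr, len,
						      pgoff, flags);
	}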



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v34 11/24] x86/sgx: Add SGX enclave driver
  2020-07-07  3:36   ` Matthew Wilcox
@ 2020-07-07  4:11     ` Jarkko Sakkinen
  2020-07-07  4:29       ` Sean Christopherson
  0 siblings, 1 reply; 18+ messages in thread
From: Jarkko Sakkinen @ 2020-07-07  4:11 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: x86, linux-sgx, linux-kernel, linux-security-module, linux-mm,
	Andrew Morton, Jethro Beekman, Haitao Huang, Chunyang Hui,
	Jordan Hand, Nathaniel McCallum, Seth Moore, Sean Christopherson,
	Suresh Siddha, andriy.shevchenko, asapek, bp, cedric.xing,
	chenalexchen, conradparker, cyhanish, dave.hansen, haitao.huang,
	josh, kai.huang, kai.svahn, kmoy, ludloff, luto, nhorman,
	puiterwijk, rientjes, tglx, yaozhangx

On Tue, Jul 07, 2020 at 04:36:17AM +0100, Matthew Wilcox wrote:
> On Tue, Jul 07, 2020 at 06:01:51AM +0300, Jarkko Sakkinen wrote:
> > Intel Software Guard eXtensions (SGX) is a set of CPU instructions that
> > can be used by applications to set aside private regions of code and
> > data. The code outside the SGX hosted software entity is disallowed to
> 
> s/disallowed to/prevented from/
> 
> > access the memory inside the enclave enforced by the CPU. We call these
> 
> s/enforced//
> 
> > entities enclaves.
> > 
> > Add a driver that provides an ioctl API to construct and run enclaves.
> > Enclaves are constructed from pages residing in reserved physical memory
> > areas. The contents of these pages can only be accessed when they are
> > mapped as part of an enclave, by a hardware thread running inside the
> > enclave.
> > 
> > The starting state of an enclave consists of a fixed measured set of
> > pages that are copied to the EPC during the construction process by
> > using ENCLS leaf functions and Software Enclave Control Structure (SECS)
> > that defines the enclave properties.
> > 
> > Enclaves are constructed by using ENCLS leaf functions ECREATE, EADD and
> > EINIT. ECREATE initializes SECS, EADD copies pages from system memory to
> > the EPC and EINIT checks a given signed measurement and moves the enclave
> > into a state ready for execution.
> 
> What's a leaf function?  Is it like a CPU instruction?

Yeah, the opcode is ENCLS for ring-0 (enclave management and
construction) and ENCLU for ring-3 (entering the enclave, etc.).
The leaf function number goes in EAX.

> 
> > The mmap() permissions are capped by the contained enclave page
> > permissions. The mapped areas must also be opaque, i.e. each page address
> > must contain a page. This logic is implemented in sgx_encl_may_map().
> 
> do you mean "populated" instead of "opaque"?

Yes, that would be a better word to use. I'll change this.

> 
> > +	atomic_set(&encl->flags, 0);
> > +	kref_init(&encl->refcount);
> > +	INIT_RADIX_TREE(&encl->page_tree, GFP_KERNEL);
> 
> Why are you using a radix tree instead of an xarray?

Because the xarray did not exist in 2017, and nobody had pointed out that
I should use it. Now I know it exists (but do not yet know what it is).

> 
> > +int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
> > +		     unsigned long end, unsigned long vm_prot_bits)
> > +{
> > +	unsigned long idx, idx_start, idx_end;
> > +	struct sgx_encl_page *page;
> > +
> > +	/*
> > +	 * Disallow RIE tasks as their VMA permissions might conflict with the
> > +	 * enclave page permissions.
> > +	 */
> > +	if (!!(current->personality & READ_IMPLIES_EXEC))
> > +		return -EACCES;
> > +
> > +	idx_start = PFN_DOWN(start);
> > +	idx_end = PFN_DOWN(end - 1);
> > +
> > +	for (idx = idx_start; idx <= idx_end; ++idx) {
> > +		mutex_lock(&encl->lock);
> > +		page = radix_tree_lookup(&encl->page_tree, idx);
> > +		mutex_unlock(&encl->lock);
> > +
> > +		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
> > +			return -EACCES;
> 
> You should really use an iterator here instead of repeated lookups.
> xas_for_each() will probably be what you want.

Thank you for your remarks. I'll look into using xarray for this.

/Jarkko


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v34 11/24] x86/sgx: Add SGX enclave driver
  2020-07-07  4:11     ` Jarkko Sakkinen
@ 2020-07-07  4:29       ` Sean Christopherson
  2020-07-07  4:39         ` Matthew Wilcox
  0 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2020-07-07  4:29 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Matthew Wilcox, x86, linux-sgx, linux-kernel,
	linux-security-module, linux-mm, Andrew Morton, Jethro Beekman,
	Haitao Huang, Chunyang Hui, Jordan Hand, Nathaniel McCallum,
	Seth Moore, Suresh Siddha, andriy.shevchenko, asapek, bp,
	cedric.xing, chenalexchen, conradparker, cyhanish, dave.hansen,
	haitao.huang, josh, kai.huang, kai.svahn, kmoy, ludloff, luto,
	nhorman, puiterwijk, rientjes, tglx, yaozhangx

Man, I really need to type faster.

On Tue, Jul 07, 2020 at 07:11:51AM +0300, Jarkko Sakkinen wrote:
> On Tue, Jul 07, 2020 at 04:36:17AM +0100, Matthew Wilcox wrote:
> > What's a leaf function?  Is it like a CPU instruction?
> 
> Yeah, the opcode is ENCLS for ring-0 (enclave management and
> construction) and ENCLU for ring-3 (entering the enclave, etc.).
> The leaf function number goes in EAX.

To add to Jarkko's comments, for all intents and purposes they are individual
instructions, e.g. they each have their own entry in the SDM, but they are
buried behind a single opcode that switches on EAX, e.g. ECREATE is EAX=0,
EADD is EAX=1, EINIT is EAX=2.  It's purely a way to save opcode space when
the extra overhead is a non-issue, e.g. SMX/TXT's GETSEC does the same
shenanigans.
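
As a concrete illustration of the dispatch, here is a hand-rolled EINIT
invocation. It is a deliberately simplified sketch that ignores the kernel's
fault handling and exception tables, so treat it as illustrative rather than
as the driver's real wrapper:

	static inline int __einit_sketch(void *sigstruct, void *token,
					 void *secs)
	{
		int ret;

		/* ENCLS is opcode 0f 01 cf; the leaf number goes in EAX. */
		asm volatile("1: .byte 0x0f, 0x01, 0xcf;\n\t"
			     "2:\n"
			     : "=a" (ret)
			     : "a" (2 /* EINIT */), "b" (sigstruct),
			       "c" (secs), "d" (token)
			     : "memory", "cc");

		/* EINIT reports its error code back in EAX. */
		return ret;
	}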

> > > +	atomic_set(&encl->flags, 0);
> > > +	kref_init(&encl->refcount);
> > > +	INIT_RADIX_TREE(&encl->page_tree, GFP_KERNEL);
> > 
> > Why are you using a radix tree instead of an xarray?
> 
> Because the xarray did not exist in 2017, and nobody had pointed out that
> I should use it. Now I know it exists (but do not yet know what it is).

I've followed xarrays a little, but obviously not closely enough to
understand their advantages over radix trees.  At a glance, range-based
iteration alone is probably justification enough to switch.
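
FWIW, a rough idea of what the lookup loop could become with an xarray.
This is hypothetical (it assumes the radix tree turns into an xarray named
page_array), and iteration alone skips holes, so the "every page present"
rule still needs separate handling:

	struct sgx_encl_page *page;
	unsigned long idx;

	xa_for_each_range(&encl->page_array, idx, page, idx_start, idx_end) {
		if (~page->vm_max_prot_bits & vm_prot_bits)
			return -EACCES;
	}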

> > > +int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
> > > +		     unsigned long end, unsigned long vm_prot_bits)
> > > +{
> > > +	unsigned long idx, idx_start, idx_end;
> > > +	struct sgx_encl_page *page;
> > > +
> > > +	/*
> > > +	 * Disallow RIE tasks as their VMA permissions might conflict with the
> > > +	 * enclave page permissions.
> > > +	 */
> > > +	if (!!(current->personality & READ_IMPLIES_EXEC))
> > > +		return -EACCES;
> > > +
> > > +	idx_start = PFN_DOWN(start);
> > > +	idx_end = PFN_DOWN(end - 1);
> > > +
> > > +	for (idx = idx_start; idx <= idx_end; ++idx) {
> > > +		mutex_lock(&encl->lock);
> > > +		page = radix_tree_lookup(&encl->page_tree, idx);
> > > +		mutex_unlock(&encl->lock);
> > > +
> > > +		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
> > > +			return -EACCES;
> > 
> > You should really use an iterator here instead of repeated lookups.
> > xas_for_each() will probably be what you want.
> 
> Thank you for your remarks. I'll look into using xarray for this.

Question for Matthew:

To enforce the "page must be populated" rule, is there a clean way to retrieve
the index of the current entry?  Our entries/pages don't have information
about their index.  Or should we just count the number of entries and check
'em at the end? E.g.

        xas_for_each(...) {
                if (~page->vm_max_prot_bits & vm_prot_bits)
                        return -EACCES;
                nr_entries++;
        }

        if (nr_entries != (end_index - start_index))
                return -EACCES;


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v34 11/24] x86/sgx: Add SGX enclave driver
  2020-07-07  4:29       ` Sean Christopherson
@ 2020-07-07  4:39         ` Matthew Wilcox
  2020-07-07  4:46           ` Sean Christopherson
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2020-07-07  4:39 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jarkko Sakkinen, x86, linux-sgx, linux-kernel,
	linux-security-module, linux-mm, Andrew Morton, Jethro Beekman,
	Haitao Huang, Chunyang Hui, Jordan Hand, Nathaniel McCallum,
	Seth Moore, Suresh Siddha, andriy.shevchenko, asapek, bp,
	cedric.xing, chenalexchen, conradparker, cyhanish, dave.hansen,
	haitao.huang, josh, kai.huang, kai.svahn, kmoy, ludloff, luto,
	nhorman, puiterwijk, rientjes, tglx, yaozhangx

On Mon, Jul 06, 2020 at 09:29:04PM -0700, Sean Christopherson wrote:
> > > > +	idx_start = PFN_DOWN(start);
> > > > +	idx_end = PFN_DOWN(end - 1);
> > > > +
> > > > +	for (idx = idx_start; idx <= idx_end; ++idx) {
> > > > +		mutex_lock(&encl->lock);
> > > > +		page = radix_tree_lookup(&encl->page_tree, idx);
> > > > +		mutex_unlock(&encl->lock);
> > > > +
> > > > +		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
> > > > +			return -EACCES;
> > > 
> > > You should really use an iterator here instead of repeated lookups.
> > > xas_for_each() will probably be what you want.
> > 
> > Thank you for your remarks. I'll look into using xarray for this.
> 
> Question for Matthew:
> 
> To enforce the "page must be populated" rule, is there a clean way to retrieve
> the index of the current entry?  Our entries/pages don't have information
> about their index.  Or should we just count the number of entries and check
> 'em at the end? E.g.
> 
>         xas_for_each(...) {
>                 if (~page->vm_max_prot_bits & vm_prot_bits)
>                         return -EACCES;
>                 nr_entries++;
>         }
> 
>         if (nr_entries != (end_index - start_index))
>                 return -EACCES;

Probably best just to steal the implementation from here:

pgoff_t page_cache_next_miss(struct address_space *mapping,
                             pgoff_t index, unsigned long max_scan)
{
        XA_STATE(xas, &mapping->i_pages, index);

        while (max_scan--) {
                void *entry = xas_next(&xas);
                if (!entry || xa_is_value(entry))
                        break;
                if (xas.xa_index == 0)
                        break;
        }

        return xas.xa_index;
}

although I think you have a simpler task.

	XA_STATE(xas, ..., start_index);
	unsigned long idx;

	for (idx = start_index; idx <= end_index; idx++) {
		struct sgx_encl_page *page = xas_next(&xas);

		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
			return -EACCES;
	}

	return 0;

should do the trick, I think.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v34 11/24] x86/sgx: Add SGX enclave driver
  2020-07-07  4:39         ` Matthew Wilcox
@ 2020-07-07  4:46           ` Sean Christopherson
  0 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2020-07-07  4:46 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jarkko Sakkinen, x86, linux-sgx, linux-kernel,
	linux-security-module, linux-mm, Andrew Morton, Jethro Beekman,
	Haitao Huang, Chunyang Hui, Jordan Hand, Nathaniel McCallum,
	Seth Moore, Suresh Siddha, andriy.shevchenko, asapek, bp,
	cedric.xing, chenalexchen, conradparker, cyhanish, dave.hansen,
	haitao.huang, josh, kai.huang, kai.svahn, kmoy, ludloff, luto,
	nhorman, puiterwijk, rientjes, tglx, yaozhangx

On Tue, Jul 07, 2020 at 05:39:04AM +0100, Matthew Wilcox wrote:
> although I think you have a simpler task.
> 
> 	XA_STATE(xas, ..., start_index);
> 	unsigned long idx;
> 
> 	for (idx = start_index; idx <= end_index; idx++) {
> 		struct sgx_encl_page *page = xas_next(&xas);
> 
> 		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
> 			return -EACCES;
> 	}
> 
> 	return 0;
> 
> should do the trick, I think.

Ah, neato.  Thanks!


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v34 10/24] mm: Add vm_ops->mprotect()
  2020-07-07  4:10           ` Matthew Wilcox
@ 2020-07-08 14:33             ` Jarkko Sakkinen
  2020-07-08 14:37               ` Matthew Wilcox
  0 siblings, 1 reply; 18+ messages in thread
From: Jarkko Sakkinen @ 2020-07-08 14:33 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Sean Christopherson, x86, linux-sgx, linux-kernel, linux-mm,
	Andrew Morton, Jethro Beekman, andriy.shevchenko, asapek, bp,
	cedric.xing, chenalexchen, conradparker, cyhanish, dave.hansen,
	haitao.huang, josh, kai.huang, kai.svahn, kmoy, ludloff, luto,
	nhorman, npmccallum, puiterwijk, rientjes, tglx, yaozhangx

On Tue, Jul 07, 2020 at 05:10:46AM +0100, Matthew Wilcox wrote:
> On Tue, Jul 07, 2020 at 07:01:51AM +0300, Jarkko Sakkinen wrote:
> > On Tue, Jul 07, 2020 at 04:24:08AM +0100, Matthew Wilcox wrote:
> > > On Mon, Jul 06, 2020 at 08:22:54PM -0700, Sean Christopherson wrote:
> > > > On Tue, Jul 07, 2020 at 04:14:24AM +0100, Matthew Wilcox wrote:
> > > > > > +		if (vma->vm_ops && vma->vm_ops->mprotect) {
> > > > > > +			error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> > > > > > +			if (error)
> > > > > > +				goto out;
> > > > > > +		}
> > > > 
> > > > Based on "... and then the vma owner can do whatever it needs to before
> > > > calling mprotect_fixup(), which is already not static", my interpretation
> > > > is that Matthew's intent was to do:
> > > > 
> > > > 		if (vma->vm_ops && vma->vm_ops->mprotect)
> > > > 			error = vma->vm_ops->mprotect(vma, nstart, tmp, prot);
> > > > 		else
> > > > 			error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
> > > > 		if (error)
> > > > 			goto out;
> > > > 
> > > > i.e. make .mprotect() a full replacement as opposed to a prereq hook.
> > > 
> > > Yes, it was.  I was just looking at the next patch to be sure this was
> > > how I'd been misunderstood.
> > 
> > I don't get this part. If mprotect_fixup() is called at the tail of the
> > callback, why does it have to be called inside the callback instead of
> > after the callback returns?
> 
> Because that's how every other VM operation works.  Look at your
> implementation of get_unmapped_area() for example.

I get the point, but I don't think that your proposal would work given
that the mprotect callback takes neither 'prev' nor 'newflags' as its
parameters. The current callback has no means to call mprotect_fixup()
properly.

It would have to be extended

	int (*mprotect)(struct vm_area_struct *vma,
			struct vm_area_struct **pprev, unsigned long start,
			unsigned long end, unsigned long prot,
			unsigned long newflags);

Is this what you want?

/Jarkko


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v34 10/24] mm: Add vm_ops->mprotect()
  2020-07-08 14:33             ` Jarkko Sakkinen
@ 2020-07-08 14:37               ` Matthew Wilcox
  2020-07-08 16:10                 ` Jarkko Sakkinen
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2020-07-08 14:37 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Sean Christopherson, x86, linux-sgx, linux-kernel, linux-mm,
	Andrew Morton, Jethro Beekman, andriy.shevchenko, asapek, bp,
	cedric.xing, chenalexchen, conradparker, cyhanish, dave.hansen,
	haitao.huang, josh, kai.huang, kai.svahn, kmoy, ludloff, luto,
	nhorman, npmccallum, puiterwijk, rientjes, tglx, yaozhangx

On Wed, Jul 08, 2020 at 05:33:20PM +0300, Jarkko Sakkinen wrote:
> I get the point, but I don't think that your proposal would work given
> that the mprotect callback takes neither 'prev' nor 'newflags' as its
> parameters. The current callback has no means to call mprotect_fixup()
> properly.
> 
> It would have to be extended
> 
> 	int (*mprotect)(struct vm_area_struct *vma,
> 			struct vm_area_struct **pprev, unsigned long start,
> 			unsigned long end, unsigned long prot,
> 			unsigned long newflags);
> 
> Is this what you want?

https://lore.kernel.org/linux-mm/20200625173050.GF7703@casper.infradead.org/


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v34 10/24] mm: Add vm_ops->mprotect()
  2020-07-08 14:37               ` Matthew Wilcox
@ 2020-07-08 16:10                 ` Jarkko Sakkinen
  2020-07-08 22:56                   ` Jarkko Sakkinen
  0 siblings, 1 reply; 18+ messages in thread
From: Jarkko Sakkinen @ 2020-07-08 16:10 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Sean Christopherson, x86, linux-sgx, linux-kernel, linux-mm,
	Andrew Morton, Jethro Beekman, andriy.shevchenko, asapek, bp,
	cedric.xing, chenalexchen, conradparker, cyhanish, dave.hansen,
	haitao.huang, josh, kai.huang, kai.svahn, kmoy, ludloff, luto,
	nhorman, npmccallum, puiterwijk, rientjes, tglx, yaozhangx

On Wed, Jul 08, 2020 at 03:37:08PM +0100, Matthew Wilcox wrote:
> On Wed, Jul 08, 2020 at 05:33:20PM +0300, Jarkko Sakkinen wrote:
> > I get the point, but I don't think that your proposal would work given
> > that the mprotect callback takes neither 'prev' nor 'newflags' as its
> > parameters. The current callback has no means to call mprotect_fixup()
> > properly.
> > 
> > It would have to be extended
> > 
> > 	int (*mprotect)(struct vm_area_struct *vma,
> > 			struct vm_area_struct **pprev, unsigned long start,
> > 			unsigned long end, unsigned long prot,
> > 			unsigned long newflags);
> > 
> > Is this what you want?
> 
> https://lore.kernel.org/linux-mm/20200625173050.GF7703@casper.infradead.org/

Ugh, it's there as it should be. I'm sorry - I just misread the code.

I think that should work, and we do not have to do extra conversion
inside the callback.

There is still one thing that I'm wondering about. 'page->vm_max_prot_bits'
contains some subset of {VM_READ, VM_WRITE, VM_EXEC}, whereas 'newflags'
can have other bits set too.

The old implementation of sgx_vma_mprotect() is like this:

static int sgx_vma_mprotect(struct vm_area_struct *vma, unsigned long start,
			    unsigned long end, unsigned long prot)
{
	return sgx_encl_may_map(vma->vm_private_data, start, end,
				calc_vm_prot_bits(prot, 0));
}

The new one should probably be implemented along the lines of

static int sgx_vma_mprotect(struct vm_area_struct *vma,
			    struct vm_area_struct **pprev, unsigned long start,
			    unsigned long end, unsigned long newflags)
{
	unsigned long masked_newflags = newflags &
					(VM_READ | VM_WRITE | VM_EXEC);
	int ret;

	ret = sgx_encl_may_map(vma->vm_private_data, start, end,
				   masked_newflags);
	if (ret)
		return ret;

	return mprotect_fixup(vma, pprev, start, end, newflags);
}

Alternatively, the filtering can be done inside sgx_encl_may_map(). Perhaps
that is a better place for it; this was just the easier function in which to
present the idea.

/Jarkko


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v34 10/24] mm: Add vm_ops->mprotect()
  2020-07-08 16:10                 ` Jarkko Sakkinen
@ 2020-07-08 22:56                   ` Jarkko Sakkinen
  0 siblings, 0 replies; 18+ messages in thread
From: Jarkko Sakkinen @ 2020-07-08 22:56 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Sean Christopherson, x86, linux-sgx, linux-kernel, linux-mm,
	Andrew Morton, Jethro Beekman, andriy.shevchenko, asapek, bp,
	cedric.xing, chenalexchen, conradparker, cyhanish, dave.hansen,
	haitao.huang, josh, kai.huang, kai.svahn, kmoy, ludloff, luto,
	nhorman, npmccallum, puiterwijk, rientjes, tglx, yaozhangx

On Wed, Jul 08, 2020 at 07:10:27PM +0300, Jarkko Sakkinen wrote:
> On Wed, Jul 08, 2020 at 03:37:08PM +0100, Matthew Wilcox wrote:
> > On Wed, Jul 08, 2020 at 05:33:20PM +0300, Jarkko Sakkinen wrote:
> > > I get the point, but I don't think that your proposal would work given
> > > that the mprotect callback takes neither 'prev' nor 'newflags' as its
> > > parameters. The current callback has no means to call mprotect_fixup()
> > > properly.
> > > 
> > > It would have to be extended
> > > 
> > > 	int (*mprotect)(struct vm_area_struct *vma,
> > > 			struct vm_area_struct **pprev, unsigned long start,
> > > 			unsigned long end, unsigned long prot,
> > > 			unsigned long newflags);
> > > 
> > > Is this what you want?
> > 
> > https://lore.kernel.org/linux-mm/20200625173050.GF7703@casper.infradead.org/
> 
> Ugh, it's there as it should be. I'm sorry - I just misread the code.
> 
> I think that should work, and we do not have to do extra conversion
> inside the callback.
> 
> There is still one thing that I'm wondering about. 'page->vm_max_prot_bits'
> contains some subset of {VM_READ, VM_WRITE, VM_EXEC}, whereas 'newflags'
> can have other bits set too.
> 
> The old implementation of sgx_vma_mprotect() is like this:
> 
> static int sgx_vma_mprotect(struct vm_area_struct *vma, unsigned long start,
> 			    unsigned long end, unsigned long prot)
> {
> 	return sgx_encl_may_map(vma->vm_private_data, start, end,
> 				calc_vm_prot_bits(prot, 0));
> }
> 
> The new one should probably be implemented along the lines of
> 
> static int sgx_vma_mprotect(struct vm_area_struct *vma,
> 			    struct vm_area_struct **pprev, unsigned long start,
> 			    unsigned long end, unsigned long newflags)
> {
> 	unsigned long masked_newflags = newflags &
> 					(VM_READ | VM_WRITE | VM_EXEC);
> 	int ret;
> 
> 	ret = sgx_encl_may_map(vma->vm_private_data, start, end,
> 				   masked_newflags);
> 	if (ret)
> 		return ret;
> 
> 	return mprotect_fixup(vma, pprev, start, end, newflags);
> }
> 
> Alternatively, the filtering can be done inside sgx_encl_may_map(). Perhaps
> that is a better place for it; this was just the easier function in which
> to present the idea.

It turned out that the mmap() handler was already masking with RWX. So I
removed the masking from there and do it as the first step in
sgx_encl_may_map(), which is called by both handlers:

int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
		     unsigned long end, unsigned long vm_flags)
{
	unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
	unsigned long idx, idx_start, idx_end;
	struct sgx_encl_page *page;

Also renamed the last function parameter from vm_prot_bits to vm_flags.
Kind of makes the flow more understandable (i.e. vm_prot_bits is purely an
internal representation, not something the caller needs to be concerned with).

/Jarkko


^ permalink raw reply	[flat|nested] 18+ messages in thread

