linux-sgx.vger.kernel.org archive mirror
* [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory
@ 2022-09-22 17:10 Kristen Carlson Accardi
  2022-09-22 17:10 ` [RFC PATCH 01/20] x86/sgx: Call cond_resched() at the end of sgx_reclaim_pages() Kristen Carlson Accardi
                   ` (21 more replies)
  0 siblings, 22 replies; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups

Add a new cgroup controller to regulate the distribution of SGX EPC memory,
which is a subset of system RAM that is used to provide SGX-enabled
applications with protected memory, and is otherwise inaccessible.

SGX EPC memory is allocated separately from normal RAM and is managed
solely by the SGX subsystem. The existing cgroup memory controller
cannot be used to limit or account for SGX EPC memory.

This patchset implements the sgx_epc cgroup controller, which will provide
support for stats, events, and the following interface files:

sgx_epc.current
	A read-only value which represents the total amount of EPC
	memory currently being used by the cgroup and its descendants.

sgx_epc.low
	A read-write value which is used to set best-effort protection
	of EPC usage. If a cgroup's EPC usage drops below this value,
	the cgroup's EPC memory will not be reclaimed, if that can be
	avoided.

sgx_epc.high
	A read-write value which is used to set a best-effort limit
	on a cgroup's EPC usage. If a cgroup's usage exceeds the high
	value, EPC memory will be reclaimed from that cgroup until its
	usage is back under the high limit.

sgx_epc.max
	A read-write value which is used to set a hard limit for
	cgroup EPC usage. If a cgroup's EPC usage reaches this limit,
	allocations are blocked until EPC memory can be reclaimed from
	the cgroup.
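
As a concrete illustration, here is a minimal C sketch of how a
container manager could cap a group's EPC usage through these files.
The mount point and cgroup name below are hypothetical; from a shell,
the equivalent is simply echoing the value into sgx_epc.max.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical path; assumes cgroup2 is mounted at /sys/fs/cgroup. */
	const char *path = "/sys/fs/cgroup/enclave-job/sgx_epc.max";
	const char *limit = "67108864\n";	/* 64 MiB hard limit */
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, limit, strlen(limit)) < 0)
		perror("write");
	close(fd);
	return 0;
}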

This work was originally authored by Sean Christopherson a few years ago,
and was modified to work with more recent kernels.

The patchset adds support for multiple LRUs to track both reclaimable
EPC pages (i.e. pages the reclaimer knows about) and unreclaimable
EPC pages (i.e. pages the reclaimer isn't aware of, such as VA pages).
These pages are assigned to an LRU, as well as to an enclave, so that an
enclave's full EPC usage can be tracked. During OOM events, an enclave
can have its memory zapped, and all the EPC pages not tracked by the
reclaimer can be freed.
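
For reference, this is the per-LRU structure the series converges on
(taken from the diffs in patches 4 and 5; comments added here):

struct sgx_epc_lru {
	spinlock_t lock;
	struct list_head reclaimable;	/* pages the reclaimer scans */
	struct list_head unreclaimable;	/* e.g. VA and SECS pages */
};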

I appreciate your comments and feedback.

Sean Christopherson (20):
  x86/sgx: Call cond_resched() at the end of sgx_reclaim_pages()
  x86/sgx: Store EPC page owner as a 'void *' to handle multiple users
  x86/sgx: Track owning enclave in VA EPC pages
  x86/sgx: Add 'struct sgx_epc_lru' to encapsulate lru list(s)
  x86/sgx: Introduce unreclaimable EPC page lists
  x86/sgx: Introduce RECLAIM_IN_PROGRESS flag for EPC pages
  x86/sgx: Use a list to track to-be-reclaimed pages during reclaim
  x86/sgx: Add EPC page flags to identify type of page
  x86/sgx: Allow reclaiming up to 32 pages, but scan 16 by default
  x86/sgx: Return the number of EPC pages that were successfully
    reclaimed
  x86/sgx: Add option to ignore age of page during EPC reclaim
  x86/sgx: Add helper to retrieve SGX EPC LRU given an EPC page
  x86/sgx: Prepare for multiple LRUs
  x86/sgx: Expose sgx_reclaim_pages() for use by EPC cgroup
  x86/sgx: Add helper to grab pages from an arbitrary EPC LRU
  x86/sgx: Add EPC OOM path to forcefully reclaim EPC
  cgroup, x86/sgx: Add SGX EPC cgroup controller
  x86/sgx: Enable EPC cgroup controller in SGX core
  x86/sgx: Add stats and events interfaces to EPC cgroup controller
  docs, cgroup, x86/sgx: Add SGX EPC cgroup controller documentation

 Documentation/admin-guide/cgroup-v2.rst | 201 +++++
 arch/x86/kernel/cpu/sgx/Makefile        |   1 +
 arch/x86/kernel/cpu/sgx/encl.c          |  89 ++-
 arch/x86/kernel/cpu/sgx/encl.h          |   4 +-
 arch/x86/kernel/cpu/sgx/epc_cgroup.c    | 950 ++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/epc_cgroup.h    |  51 ++
 arch/x86/kernel/cpu/sgx/ioctl.c         |  13 +-
 arch/x86/kernel/cpu/sgx/main.c          | 389 ++++++++--
 arch/x86/kernel/cpu/sgx/sgx.h           |  40 +-
 arch/x86/kernel/cpu/sgx/virt.c          |  28 +-
 include/linux/cgroup_subsys.h           |   4 +
 init/Kconfig                            |  12 +
 12 files changed, 1669 insertions(+), 113 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.c
 create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.h

-- 
2.37.3


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC PATCH 01/20] x86/sgx: Call cond_resched() at the end of sgx_reclaim_pages()
  2022-09-22 17:10 [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Kristen Carlson Accardi
@ 2022-09-22 17:10 ` Kristen Carlson Accardi
  2022-09-23 12:32   ` Jarkko Sakkinen
  2022-09-22 17:10 ` [RFC PATCH 02/20] x86/sgx: Store EPC page owner as a 'void *' to handle multiple users Kristen Carlson Accardi
                   ` (20 subsequent siblings)
  21 siblings, 1 reply; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Move the invocation of post-reclaim cond_resched() from the callers of
sgx_reclaim_pages() into the reclaim path itself.  sgx_reclaim_pages()
is always called in a loop and is always followed by a call to
cond_resched().  This will hold true for the EPC cgroup as well, which
adds even more calls to sgx_reclaim_pages() and thus cond_resched().
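
For reference, an abridged sketch of ksgxd's loop before this patch
(the sanitization and wait logic is omitted), showing the pattern
being consolidated:

static int ksgxd(void *p)
{
	while (!kthread_should_stop()) {
		if (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
			sgx_reclaim_pages();

		/* This call moves to the end of sgx_reclaim_pages(). */
		cond_resched();
	}

	return 0;
}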

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 515e2a5f25bb..4cdeb915dc86 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -367,6 +367,8 @@ static void sgx_reclaim_pages(void)
 
 		sgx_free_epc_page(epc_page);
 	}
+
+	cond_resched();
 }
 
 static bool sgx_should_reclaim(unsigned long watermark)
@@ -410,8 +412,6 @@ static int ksgxd(void *p)
 
 		if (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
 			sgx_reclaim_pages();
-
-		cond_resched();
 	}
 
 	return 0;
@@ -578,7 +578,6 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 		}
 
 		sgx_reclaim_pages();
-		cond_resched();
 	}
 
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC PATCH 02/20] x86/sgx: Store EPC page owner as a 'void *' to handle multiple users
  2022-09-22 17:10 [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Kristen Carlson Accardi
  2022-09-22 17:10 ` [RFC PATCH 01/20] x86/sgx: Call cond_resched() at the end of sgx_reclaim_pages() Kristen Carlson Accardi
@ 2022-09-22 17:10 ` Kristen Carlson Accardi
  2022-09-22 18:54   ` Dave Hansen
  2022-09-23 12:49   ` Jarkko Sakkinen
  2022-09-22 17:10 ` [RFC PATCH 03/20] x86/sgx: Track owning enclave in VA EPC pages Kristen Carlson Accardi
                   ` (19 subsequent siblings)
  21 siblings, 2 replies; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

A future patch will use the owner field to store either a pointer to
a struct sgx_encl or a pointer to a struct sgx_encl_page.
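
For illustration, a sketch of the two assignments the 'void *' makes
room for (the VA-page user is introduced by the next patch):

	/* Regular enclave page: owner is the page tracking structure. */
	epc_page->owner = encl_page;	/* struct sgx_encl_page * */

	/* VA page: owner points directly at the owning enclave. */
	epc_page->owner = encl;		/* struct sgx_encl * */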

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/sgx.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 0f2020653fba..5a7e858a8f98 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -33,7 +33,7 @@ struct sgx_epc_page {
 	unsigned int section;
 	u16 flags;
 	u16 poison;
-	struct sgx_encl_page *owner;
+	void *owner;
 	struct list_head list;
 };
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC PATCH 03/20] x86/sgx: Track owning enclave in VA EPC pages
  2022-09-22 17:10 [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Kristen Carlson Accardi
  2022-09-22 17:10 ` [RFC PATCH 01/20] x86/sgx: Call cond_resched() at the end of sgx_reclaim_pages() Kristen Carlson Accardi
  2022-09-22 17:10 ` [RFC PATCH 02/20] x86/sgx: Store EPC page owner as a 'void *' to handle multiple users Kristen Carlson Accardi
@ 2022-09-22 17:10 ` Kristen Carlson Accardi
  2022-09-22 18:55   ` Dave Hansen
  2022-09-23 12:52   ` Jarkko Sakkinen
  2022-09-22 17:10 ` [RFC PATCH 04/20] x86/sgx: Add 'struct sgx_epc_lru' to encapsulate lru list(s) Kristen Carlson Accardi
                   ` (18 subsequent siblings)
  21 siblings, 2 replies; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

In order to fully account for an enclave's EPC page usage, store
the owning enclave of a VA EPC page.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/encl.c  | 5 ++++-
 arch/x86/kernel/cpu/sgx/encl.h  | 2 +-
 arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index f40d64206ded..a18f1311b57d 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -1193,6 +1193,7 @@ void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr)
 
 /**
  * sgx_alloc_va_page() - Allocate a Version Array (VA) page
+ * @encl:    The enclave that this page is allocated to.
  * @reclaim: Reclaim EPC pages directly if none available. Enclave
  *           mutex should not be held if this is set.
  *
@@ -1202,7 +1203,7 @@ void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr)
  *   a VA page,
  *   -errno otherwise
  */
-struct sgx_epc_page *sgx_alloc_va_page(bool reclaim)
+struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim)
 {
 	struct sgx_epc_page *epc_page;
 	int ret;
@@ -1218,6 +1219,8 @@ struct sgx_epc_page *sgx_alloc_va_page(bool reclaim)
 		return ERR_PTR(-EFAULT);
 	}
 
+	epc_page->owner = encl;
+
 	return epc_page;
 }
 
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index f94ff14c9486..831d63f80f5a 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -116,7 +116,7 @@ struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
 					  unsigned long offset,
 					  u64 secinfo_flags);
 void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr);
-struct sgx_epc_page *sgx_alloc_va_page(bool reclaim);
+struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
 void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index ebe79d60619f..9a1bb3c3211a 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -30,7 +30,7 @@ struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl, bool reclaim)
 		if (!va_page)
 			return ERR_PTR(-ENOMEM);
 
-		va_page->epc_page = sgx_alloc_va_page(reclaim);
+		va_page->epc_page = sgx_alloc_va_page(encl, reclaim);
 		if (IS_ERR(va_page->epc_page)) {
 			err = ERR_CAST(va_page->epc_page);
 			kfree(va_page);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC PATCH 04/20] x86/sgx: Add 'struct sgx_epc_lru' to encapsulate lru list(s)
  2022-09-22 17:10 [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Kristen Carlson Accardi
                   ` (2 preceding siblings ...)
  2022-09-22 17:10 ` [RFC PATCH 03/20] x86/sgx: Track owning enclave in VA EPC pages Kristen Carlson Accardi
@ 2022-09-22 17:10 ` Kristen Carlson Accardi
  2022-09-23 13:20   ` Jarkko Sakkinen
  2022-09-22 17:10 ` [RFC PATCH 05/20] x86/sgx: Introduce unreclaimable EPC page lists Kristen Carlson Accardi
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Wrap the existing reclaimable list and its spinlock in a struct to
minimize the code changes needed to handle multiple LRUs as well as
reclaimable and non-reclaimable lists, both of which will be introduced
and used by SGX EPC cgroups.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 37 +++++++++++++++++-----------------
 arch/x86/kernel/cpu/sgx/sgx.h  | 11 ++++++++++
 2 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 4cdeb915dc86..af68dc1c677b 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -26,10 +26,9 @@ static DEFINE_XARRAY(sgx_epc_address_space);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
- * with sgx_reclaimer_lock acquired.
+ * with sgx_global_lru.lock acquired.
  */
-static LIST_HEAD(sgx_active_page_list);
-static DEFINE_SPINLOCK(sgx_reclaimer_lock);
+static struct sgx_epc_lru sgx_global_lru;
 
 static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);
 
@@ -298,12 +297,12 @@ static void sgx_reclaim_pages(void)
 	int ret;
 	int i;
 
-	spin_lock(&sgx_reclaimer_lock);
+	spin_lock(&sgx_global_lru.lock);
 	for (i = 0; i < SGX_NR_TO_SCAN; i++) {
-		if (list_empty(&sgx_active_page_list))
+		if (list_empty(&sgx_global_lru.reclaimable))
 			break;
 
-		epc_page = list_first_entry(&sgx_active_page_list,
+		epc_page = list_first_entry(&sgx_global_lru.reclaimable,
 					    struct sgx_epc_page, list);
 		list_del_init(&epc_page->list);
 		encl_page = epc_page->owner;
@@ -316,7 +315,7 @@ static void sgx_reclaim_pages(void)
 			 */
 			epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
 	}
-	spin_unlock(&sgx_reclaimer_lock);
+	spin_unlock(&sgx_global_lru.lock);
 
 	for (i = 0; i < cnt; i++) {
 		epc_page = chunk[i];
@@ -339,9 +338,9 @@ static void sgx_reclaim_pages(void)
 		continue;
 
 skip:
-		spin_lock(&sgx_reclaimer_lock);
-		list_add_tail(&epc_page->list, &sgx_active_page_list);
-		spin_unlock(&sgx_reclaimer_lock);
+		spin_lock(&sgx_global_lru.lock);
+		list_add_tail(&epc_page->list, &sgx_global_lru.reclaimable);
+		spin_unlock(&sgx_global_lru.lock);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 
@@ -374,7 +373,7 @@ static void sgx_reclaim_pages(void)
 static bool sgx_should_reclaim(unsigned long watermark)
 {
 	return atomic_long_read(&sgx_nr_free_pages) < watermark &&
-	       !list_empty(&sgx_active_page_list);
+	       !list_empty(&sgx_global_lru.reclaimable);
 }
 
 /*
@@ -427,6 +426,8 @@ static bool __init sgx_page_reclaimer_init(void)
 
 	ksgxd_tsk = tsk;
 
+	sgx_lru_init(&sgx_global_lru);
+
 	return true;
 }
 
@@ -502,10 +503,10 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
  */
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
 {
-	spin_lock(&sgx_reclaimer_lock);
+	spin_lock(&sgx_global_lru.lock);
 	page->flags |= SGX_EPC_PAGE_RECLAIMER_TRACKED;
-	list_add_tail(&page->list, &sgx_active_page_list);
-	spin_unlock(&sgx_reclaimer_lock);
+	list_add_tail(&page->list, &sgx_global_lru.reclaimable);
+	spin_unlock(&sgx_global_lru.lock);
 }
 
 /**
@@ -520,18 +521,18 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
  */
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
 {
-	spin_lock(&sgx_reclaimer_lock);
+	spin_lock(&sgx_global_lru.lock);
 	if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
 		/* The page is being reclaimed. */
 		if (list_empty(&page->list)) {
-			spin_unlock(&sgx_reclaimer_lock);
+			spin_unlock(&sgx_global_lru.lock);
 			return -EBUSY;
 		}
 
 		list_del(&page->list);
 		page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
 	}
-	spin_unlock(&sgx_reclaimer_lock);
+	spin_unlock(&sgx_global_lru.lock);
 
 	return 0;
 }
@@ -564,7 +565,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 			break;
 		}
 
-		if (list_empty(&sgx_active_page_list))
+		if (list_empty(&sgx_global_lru.reclaimable))
 			return ERR_PTR(-ENOMEM);
 
 		if (!reclaim) {
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 5a7e858a8f98..7b208ee8eb45 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -83,6 +83,17 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
 	return section->virt_addr + index * PAGE_SIZE;
 }
 
+struct sgx_epc_lru {
+	spinlock_t lock;
+	struct list_head reclaimable;
+};
+
+static inline void sgx_lru_init(struct sgx_epc_lru *lru)
+{
+	spin_lock_init(&lru->lock);
+	INIT_LIST_HEAD(&lru->reclaimable);
+}
+
 struct sgx_epc_page *__sgx_alloc_epc_page(void);
 void sgx_free_epc_page(struct sgx_epc_page *page);
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC PATCH 05/20] x86/sgx: Introduce unreclaimable EPC page lists
  2022-09-22 17:10 [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Kristen Carlson Accardi
                   ` (3 preceding siblings ...)
  2022-09-22 17:10 ` [RFC PATCH 04/20] x86/sgx: Add 'struct sgx_epc_lru' to encapsulate lru list(s) Kristen Carlson Accardi
@ 2022-09-22 17:10 ` Kristen Carlson Accardi
  2022-09-23 13:29   ` Jarkko Sakkinen
  2022-09-22 17:10 ` [RFC PATCH 06/20] x86/sgx: Introduce RECLAIM_IN_PROGRESS flag for EPC pages Kristen Carlson Accardi
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add code to store pages that are not tracked by the reclaimer on the
LRU's "unreclaimable" list. When there is an OOM event and an enclave
must be OOM killed, the EPC pages which are not tracked by the
reclaimer can still be freed.
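
A hypothetical sketch, not part of this patch, of how an OOM path
could drain the new list; the real implementation arrives with the
"Add EPC OOM path" patch later in the series, and the function name
here is illustrative:

static void sgx_oom_drain_unreclaimable(struct sgx_epc_lru *lru)
{
	struct sgx_epc_page *page, *tmp;

	spin_lock(&lru->lock);
	list_for_each_entry_safe(page, tmp, &lru->unreclaimable, list) {
		list_del_init(&page->list);
		/* The zapped enclave's pages can then be freed. */
	}
	spin_unlock(&lru->lock);
}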

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/encl.c  | 10 +++++++---
 arch/x86/kernel/cpu/sgx/ioctl.c | 11 +++++++----
 arch/x86/kernel/cpu/sgx/main.c  | 26 +++++++++++++++-----------
 arch/x86/kernel/cpu/sgx/sgx.h   |  7 ++++---
 arch/x86/kernel/cpu/sgx/virt.c  | 28 ++++++++++++++++++++--------
 5 files changed, 53 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index a18f1311b57d..ad611c06798f 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -252,6 +252,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
 		epc_page = sgx_encl_eldu(&encl->secs, NULL);
 		if (IS_ERR(epc_page))
 			return ERR_CAST(epc_page);
+		sgx_record_epc_page(epc_page, 0);
 	}
 
 	epc_page = sgx_encl_eldu(entry, encl->secs.epc_page);
@@ -259,7 +260,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
 		return ERR_CAST(epc_page);
 
 	encl->secs_child_cnt++;
-	sgx_mark_page_reclaimable(entry->epc_page);
+	sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
 
 	return entry;
 }
@@ -375,7 +376,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
 	encl_page->type = SGX_PAGE_TYPE_REG;
 	encl->secs_child_cnt++;
 
-	sgx_mark_page_reclaimable(encl_page->epc_page);
+	sgx_record_epc_page(encl_page->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
 
 	phys_addr = sgx_get_epc_phys_addr(epc_page);
 	/*
@@ -687,7 +688,7 @@ void sgx_encl_release(struct kref *ref)
 			 * The page and its radix tree entry cannot be freed
 			 * if the page is being held by the reclaimer.
 			 */
-			if (sgx_unmark_page_reclaimable(entry->epc_page))
+			if (sgx_drop_epc_page(entry->epc_page))
 				continue;
 
 			sgx_encl_free_epc_page(entry->epc_page);
@@ -703,6 +704,7 @@ void sgx_encl_release(struct kref *ref)
 	xa_destroy(&encl->page_array);
 
 	if (!encl->secs_child_cnt && encl->secs.epc_page) {
+		sgx_drop_epc_page(encl->secs.epc_page);
 		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 	}
@@ -711,6 +713,7 @@ void sgx_encl_release(struct kref *ref)
 		va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
 					   list);
 		list_del(&va_page->list);
+		sgx_drop_epc_page(va_page->epc_page);
 		sgx_encl_free_epc_page(va_page->epc_page);
 		kfree(va_page);
 	}
@@ -1218,6 +1221,7 @@ struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim)
 		sgx_encl_free_epc_page(epc_page);
 		return ERR_PTR(-EFAULT);
 	}
+	sgx_record_epc_page(epc_page, 0);
 
 	epc_page->owner = encl;
 
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 9a1bb3c3211a..aca80a3f38a1 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -48,6 +48,7 @@ void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
 	encl->page_cnt--;
 
 	if (va_page) {
+		sgx_drop_epc_page(va_page->epc_page);
 		sgx_encl_free_epc_page(va_page->epc_page);
 		list_del(&va_page->list);
 		kfree(va_page);
@@ -113,6 +114,8 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 	encl->attributes = secs->attributes;
 	encl->attributes_mask = SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT | SGX_ATTR_KSS;
 
+	sgx_record_epc_page(encl->secs.epc_page, 0);
+
 	/* Set only after completion, as encl->lock has not been taken. */
 	set_bit(SGX_ENCL_CREATED, &encl->flags);
 
@@ -322,7 +325,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
 			goto err_out;
 	}
 
-	sgx_mark_page_reclaimable(encl_page->epc_page);
+	sgx_record_epc_page(encl_page->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
 	mutex_unlock(&encl->lock);
 	mmap_read_unlock(current->mm);
 	return ret;
@@ -958,7 +961,7 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl,
 			 * Prevent page from being reclaimed while mutex
 			 * is released.
 			 */
-			if (sgx_unmark_page_reclaimable(entry->epc_page)) {
+			if (sgx_drop_epc_page(entry->epc_page)) {
 				ret = -EAGAIN;
 				goto out_entry_changed;
 			}
@@ -973,7 +976,7 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl,
 
 			mutex_lock(&encl->lock);
 
-			sgx_mark_page_reclaimable(entry->epc_page);
+			sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
 		}
 
 		/* Change EPC type */
@@ -1130,7 +1133,7 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
 			goto out_unlock;
 		}
 
-		if (sgx_unmark_page_reclaimable(entry->epc_page)) {
+		if (sgx_drop_epc_page(entry->epc_page)) {
 			ret = -EBUSY;
 			goto out_unlock;
 		}
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index af68dc1c677b..543bc5b20508 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -262,7 +262,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 			goto out;
 
 		sgx_encl_ewb(encl->secs.epc_page, &secs_backing);
-
+		sgx_drop_epc_page(encl->secs.epc_page);
 		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 
@@ -495,31 +495,35 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
 }
 
 /**
- * sgx_mark_page_reclaimable() - Mark a page as reclaimable
+ * sgx_record_epc_page() - Add a page to the LRU tracking
  * @page:	EPC page
  *
- * Mark a page as reclaimable and add it to the active page list. Pages
- * are automatically removed from the active list when freed.
+ * Mark a page with the specified flags and add it to the appropriate
+ * (un)reclaimable list.
  */
-void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
+void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags)
 {
 	spin_lock(&sgx_global_lru.lock);
-	page->flags |= SGX_EPC_PAGE_RECLAIMER_TRACKED;
-	list_add_tail(&page->list, &sgx_global_lru.reclaimable);
+	WARN_ON(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
+	page->flags |= flags;
+	if (flags & SGX_EPC_PAGE_RECLAIMER_TRACKED)
+		list_add_tail(&page->list, &sgx_global_lru.reclaimable);
+	else
+		list_add_tail(&page->list, &sgx_global_lru.unreclaimable);
 	spin_unlock(&sgx_global_lru.lock);
 }
 
 /**
- * sgx_unmark_page_reclaimable() - Remove a page from the reclaim list
+ * sgx_drop_epc_page() - Remove a page from a LRU list
  * @page:	EPC page
  *
- * Clear the reclaimable flag and remove the page from the active page list.
+ * Clear the reclaimable flag if set and remove the page from its LRU.
  *
  * Return:
  *   0 on success,
  *   -EBUSY if the page is in the process of being reclaimed
  */
-int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
+int sgx_drop_epc_page(struct sgx_epc_page *page)
 {
 	spin_lock(&sgx_global_lru.lock);
 	if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
@@ -529,9 +533,9 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
 			return -EBUSY;
 		}
 
-		list_del(&page->list);
 		page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
 	}
+	list_del(&page->list);
 	spin_unlock(&sgx_global_lru.lock);
 
 	return 0;
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 7b208ee8eb45..65625ea8fd6e 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -86,20 +86,21 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
 struct sgx_epc_lru {
 	spinlock_t lock;
 	struct list_head reclaimable;
+	struct list_head unreclaimable;
 };
 
 static inline void sgx_lru_init(struct sgx_epc_lru *lru)
 {
 	spin_lock_init(&lru->lock);
 	INIT_LIST_HEAD(&lru->reclaimable);
+	INIT_LIST_HEAD(&lru->unreclaimable);
 }
 
 struct sgx_epc_page *__sgx_alloc_epc_page(void);
 void sgx_free_epc_page(struct sgx_epc_page *page);
-
 void sgx_reclaim_direct(void);
-void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
-int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
+void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags);
+int sgx_drop_epc_page(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
 
 void sgx_ipi_cb(void *info);
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 6a77a14eee38..287e235bc3c1 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -62,6 +62,8 @@ static int __sgx_vepc_fault(struct sgx_vepc *vepc,
 		goto err_delete;
 	}
 
+	sgx_record_epc_page(epc_page, 0);
+
 	return 0;
 
 err_delete:
@@ -146,6 +148,7 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
 		return ret;
 	}
 
+	sgx_drop_epc_page(epc_page);
 	sgx_free_epc_page(epc_page);
 	return 0;
 }
@@ -218,8 +221,15 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
 		 * have been removed, the SECS page must have a child on
 		 * another instance.
 		 */
-		if (sgx_vepc_free_page(epc_page))
+		if (sgx_vepc_free_page(epc_page)) {
+			/*
+			 * Drop the page before adding it to the list of SECS
+			 * pages.  Moving the page off the unreclaimable list
+			 * needs to be done under the LRU's spinlock.
+			 */
+			sgx_drop_epc_page(epc_page);
 			list_add_tail(&epc_page->list, &secs_pages);
+		}
 
 		xa_erase(&vepc->page_array, index);
 	}
@@ -234,15 +244,17 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
 	mutex_lock(&zombie_secs_pages_lock);
 	list_for_each_entry_safe(epc_page, tmp, &zombie_secs_pages, list) {
 		/*
-		 * Speculatively remove the page from the list of zombies,
-		 * if the page is successfully EREMOVE'd it will be added to
-		 * the list of free pages.  If EREMOVE fails, throw the page
-		 * on the local list, which will be spliced on at the end.
+		 * If EREMOVE fails, throw the page on the local list, which
+		 * will be spliced on at the end.
+		 *
+		 * Note, this abuses sgx_drop_epc_page() to delete the page off
+		 * the list of zombies, but this is a very rare path (probably
+		 * never hit in production).  It's not worth special casing the
+		 * free path for this super rare case just to avoid taking the
+		 * LRU's spinlock.
 		 */
-		list_del(&epc_page->list);
-
 		if (sgx_vepc_free_page(epc_page))
-			list_add_tail(&epc_page->list, &secs_pages);
+			list_move_tail(&epc_page->list, &secs_pages);
 	}
 
 	if (!list_empty(&secs_pages))
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC PATCH 06/20] x86/sgx: Introduce RECLAIM_IN_PROGRESS flag for EPC pages
  2022-09-22 17:10 [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Kristen Carlson Accardi
                   ` (4 preceding siblings ...)
  2022-09-22 17:10 ` [RFC PATCH 05/20] x86/sgx: Introduce unreclaimable EPC page lists Kristen Carlson Accardi
@ 2022-09-22 17:10 ` Kristen Carlson Accardi
  2022-09-22 17:10 ` [RFC PATCH 07/20] x86/sgx: Use a list to track to-be-reclaimed pages during reclaim Kristen Carlson Accardi
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Keep track of whether an EPC page is in the middle of being reclaimed
and do not delete the page from its LRU if it has not yet finished
being reclaimed.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 14 +++++++++-----
 arch/x86/kernel/cpu/sgx/sgx.h  |  5 +++++
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 543bc5b20508..93aa9e09c26d 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -307,13 +307,15 @@ static void sgx_reclaim_pages(void)
 		list_del_init(&epc_page->list);
 		encl_page = epc_page->owner;
 
-		if (kref_get_unless_zero(&encl_page->encl->refcount) != 0)
+		if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
+			epc_page->flags |= SGX_EPC_PAGE_RECLAIM_IN_PROGRESS;
 			chunk[cnt++] = epc_page;
-		else
+		} else {
 			/* The owner is freeing the page. No need to add the
 			 * page back to the list of reclaimable pages.
 			 */
 			epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
+		}
 	}
 	spin_unlock(&sgx_global_lru.lock);
 
@@ -339,6 +341,7 @@ static void sgx_reclaim_pages(void)
 
 skip:
 		spin_lock(&sgx_global_lru.lock);
+		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIM_IN_PROGRESS;
 		list_add_tail(&epc_page->list, &sgx_global_lru.reclaimable);
 		spin_unlock(&sgx_global_lru.lock);
 
@@ -362,7 +365,8 @@ static void sgx_reclaim_pages(void)
 		sgx_reclaimer_write(epc_page, &backing[i]);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
-		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
+		epc_page->flags &= ~(SGX_EPC_PAGE_RECLAIMER_TRACKED |
+				     SGX_EPC_PAGE_RECLAIM_IN_PROGRESS);
 
 		sgx_free_epc_page(epc_page);
 	}
@@ -504,7 +508,7 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
 void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags)
 {
 	spin_lock(&sgx_global_lru.lock);
-	WARN_ON(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
+	WARN_ON(page->flags & SGX_EPC_PAGE_RECLAIM_FLAGS);
 	page->flags |= flags;
 	if (flags & SGX_EPC_PAGE_RECLAIMER_TRACKED)
 		list_add_tail(&page->list, &sgx_global_lru.reclaimable);
@@ -528,7 +532,7 @@ int sgx_drop_epc_page(struct sgx_epc_page *page)
 	spin_lock(&sgx_global_lru.lock);
 	if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
 		/* The page is being reclaimed. */
-		if (list_empty(&page->list)) {
+		if (page->flags & SGX_EPC_PAGE_RECLAIM_IN_PROGRESS) {
 			spin_unlock(&sgx_global_lru.lock);
 			return -EBUSY;
 		}
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 65625ea8fd6e..284d0cda9e36 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -29,6 +29,11 @@
 /* Pages on free list */
 #define SGX_EPC_PAGE_IS_FREE		BIT(1)
 
+/* page flag to indicate reclaim is in progress */
+#define SGX_EPC_PAGE_RECLAIM_IN_PROGRESS BIT(2)
+#define SGX_EPC_PAGE_RECLAIM_FLAGS	(SGX_EPC_PAGE_RECLAIMER_TRACKED | \
+					 SGX_EPC_PAGE_RECLAIM_IN_PROGRESS)
+
 struct sgx_epc_page {
 	unsigned int section;
 	u16 flags;
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC PATCH 07/20] x86/sgx: Use a list to track to-be-reclaimed pages during reclaim
  2022-09-22 17:10 [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Kristen Carlson Accardi
                   ` (5 preceding siblings ...)
  2022-09-22 17:10 ` [RFC PATCH 06/20] x86/sgx: Introduce RECLAIM_IN_PROGRESS flag for EPC pages Kristen Carlson Accardi
@ 2022-09-22 17:10 ` Kristen Carlson Accardi
  2022-09-22 17:10 ` [RFC PATCH 08/20] x86/sgx: Add EPC page flags to identify type of page Kristen Carlson Accardi
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Change sgx_reclaim_pages() to use a list rather than an array for
storing the epc_pages which will be reclaimed. This change is needed
to transition to the LRU implementation for EPC cgroup support.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 43 +++++++++++++++-------------------
 1 file changed, 19 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 93aa9e09c26d..085c06fdc359 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -288,12 +288,11 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
  */
 static void sgx_reclaim_pages(void)
 {
-	struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
 	struct sgx_backing backing[SGX_NR_TO_SCAN];
 	struct sgx_encl_page *encl_page;
-	struct sgx_epc_page *epc_page;
+	struct sgx_epc_page *epc_page, *tmp;
 	pgoff_t page_index;
-	int cnt = 0;
+	LIST_HEAD(iso);
 	int ret;
 	int i;
 
@@ -304,23 +303,26 @@ static void sgx_reclaim_pages(void)
 
 		epc_page = list_first_entry(&sgx_global_lru.reclaimable,
 					    struct sgx_epc_page, list);
-		list_del_init(&epc_page->list);
 		encl_page = epc_page->owner;
 
 		if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
 			epc_page->flags |= SGX_EPC_PAGE_RECLAIM_IN_PROGRESS;
-			chunk[cnt++] = epc_page;
+			list_move_tail(&epc_page->list, &iso);
 		} else {
-			/* The owner is freeing the page. No need to add the
-			 * page back to the list of reclaimable pages.
+			/* The owner is freeing the page, remove it from the
+			 * LRU list
 			 */
 			epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
+			list_del_init(&epc_page->list);
 		}
 	}
 	spin_unlock(&sgx_global_lru.lock);
 
-	for (i = 0; i < cnt; i++) {
-		epc_page = chunk[i];
+	if (list_empty(&iso))
+		goto out;
+
+	i = 0;
+	list_for_each_entry_safe(epc_page, tmp, &iso, list) {
 		encl_page = epc_page->owner;
 
 		if (!sgx_reclaimer_age(epc_page))
@@ -335,6 +337,7 @@ static void sgx_reclaim_pages(void)
 			goto skip;
 		}
 
+		i++;
 		encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
 		mutex_unlock(&encl_page->encl->lock);
 		continue;
@@ -342,27 +345,19 @@ static void sgx_reclaim_pages(void)
 skip:
 		spin_lock(&sgx_global_lru.lock);
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIM_IN_PROGRESS;
-		list_add_tail(&epc_page->list, &sgx_global_lru.reclaimable);
+		list_move_tail(&epc_page->list, &sgx_global_lru.reclaimable);
 		spin_unlock(&sgx_global_lru.lock);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
-
-		chunk[i] = NULL;
-	}
-
-	for (i = 0; i < cnt; i++) {
-		epc_page = chunk[i];
-		if (epc_page)
-			sgx_reclaimer_block(epc_page);
 	}
 
-	for (i = 0; i < cnt; i++) {
-		epc_page = chunk[i];
-		if (!epc_page)
-			continue;
+	list_for_each_entry(epc_page, &iso, list)
+		sgx_reclaimer_block(epc_page);
 
+	i = 0;
+	list_for_each_entry_safe(epc_page, tmp, &iso, list) {
 		encl_page = epc_page->owner;
-		sgx_reclaimer_write(epc_page, &backing[i]);
+		sgx_reclaimer_write(epc_page, &backing[i++]);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 		epc_page->flags &= ~(SGX_EPC_PAGE_RECLAIMER_TRACKED |
@@ -370,7 +365,7 @@ static void sgx_reclaim_pages(void)
 
 		sgx_free_epc_page(epc_page);
 	}
-
+out:
 	cond_resched();
 }
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC PATCH 08/20] x86/sgx: Add EPC page flags to identify type of page
  2022-09-22 17:10 [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Kristen Carlson Accardi
                   ` (6 preceding siblings ...)
  2022-09-22 17:10 ` [RFC PATCH 07/20] x86/sgx: Use a list to track to-be-reclaimed pages during reclaim Kristen Carlson Accardi
@ 2022-09-22 17:10 ` Kristen Carlson Accardi
  2022-09-22 17:10 ` [RFC PATCH 09/20] x86/sgx: Allow reclaiming up to 32 pages, but scan 16 by default Kristen Carlson Accardi
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Create new flags to help identify whether a page is an enclave page
or a VA page and save the page type when the page is recorded.
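
A hypothetical helper, not part of this patch, showing what the new
flags enable: recovering the owning enclave for any tracked page,
regardless of its type:

static inline struct sgx_encl *sgx_epc_page_to_encl(struct sgx_epc_page *page)
{
	if (page->flags & SGX_EPC_PAGE_VERSION_ARRAY)
		return page->owner;		/* owner is the enclave */

	if (page->flags & SGX_EPC_PAGE_ENCLAVE)
		return ((struct sgx_encl_page *)page->owner)->encl;

	return NULL;				/* e.g. virt EPC pages */
}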

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/encl.c  |  6 +++---
 arch/x86/kernel/cpu/sgx/ioctl.c |  4 ++--
 arch/x86/kernel/cpu/sgx/main.c  | 20 ++++++++++----------
 arch/x86/kernel/cpu/sgx/sgx.h   |  8 +++++++-
 4 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index ad611c06798f..672b302f3688 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -252,7 +252,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
 		epc_page = sgx_encl_eldu(&encl->secs, NULL);
 		if (IS_ERR(epc_page))
 			return ERR_CAST(epc_page);
-		sgx_record_epc_page(epc_page, 0);
+		sgx_record_epc_page(epc_page, SGX_EPC_PAGE_ENCLAVE);
 	}
 
 	epc_page = sgx_encl_eldu(entry, encl->secs.epc_page);
@@ -260,7 +260,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
 		return ERR_CAST(epc_page);
 
 	encl->secs_child_cnt++;
-	sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
+	sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_ENCLAVE_RECLAIMABLE);
 
 	return entry;
 }
@@ -1221,7 +1221,7 @@ struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim)
 		sgx_encl_free_epc_page(epc_page);
 		return ERR_PTR(-EFAULT);
 	}
-	sgx_record_epc_page(epc_page, 0);
+	sgx_record_epc_page(epc_page, SGX_EPC_PAGE_VERSION_ARRAY);
 
 	epc_page->owner = encl;
 
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index aca80a3f38a1..c91cc6a01232 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -114,7 +114,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 	encl->attributes = secs->attributes;
 	encl->attributes_mask = SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT | SGX_ATTR_KSS;
 
-	sgx_record_epc_page(encl->secs.epc_page, 0);
+	sgx_record_epc_page(encl->secs.epc_page, SGX_EPC_PAGE_ENCLAVE);
 
 	/* Set only after completion, as encl->lock has not been taken. */
 	set_bit(SGX_ENCL_CREATED, &encl->flags);
@@ -325,7 +325,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
 			goto err_out;
 	}
 
-	sgx_record_epc_page(encl_page->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
+	sgx_record_epc_page(encl_page->epc_page, SGX_EPC_PAGE_ENCLAVE_RECLAIMABLE);
 	mutex_unlock(&encl->lock);
 	mmap_read_unlock(current->mm);
 	return ret;
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 085c06fdc359..3c0d33b72896 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -304,6 +304,8 @@ static void sgx_reclaim_pages(void)
 		epc_page = list_first_entry(&sgx_global_lru.reclaimable,
 					    struct sgx_epc_page, list);
 		encl_page = epc_page->owner;
+		if (WARN_ON_ONCE(!(epc_page->flags & SGX_EPC_PAGE_ENCLAVE)))
+			continue;
 
 		if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
 			epc_page->flags |= SGX_EPC_PAGE_RECLAIM_IN_PROGRESS;
@@ -360,8 +362,7 @@ static void sgx_reclaim_pages(void)
 		sgx_reclaimer_write(epc_page, &backing[i++]);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
-		epc_page->flags &= ~(SGX_EPC_PAGE_RECLAIMER_TRACKED |
-				     SGX_EPC_PAGE_RECLAIM_IN_PROGRESS);
+		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIM_FLAGS;
 
 		sgx_free_epc_page(epc_page);
 	}
@@ -496,6 +497,7 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
 /**
  * sgx_record_epc_page() - Add a page to the LRU tracking
  * @page:	EPC page
+ * @flags:	Reclaim flags for the page.
  *
  * Mark a page with the specified flags and add it to the appropriate
  * (un)reclaimable list.
@@ -525,18 +527,16 @@ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags)
 int sgx_drop_epc_page(struct sgx_epc_page *page)
 {
 	spin_lock(&sgx_global_lru.lock);
-	if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
-		/* The page is being reclaimed. */
-		if (page->flags & SGX_EPC_PAGE_RECLAIM_IN_PROGRESS) {
-			spin_unlock(&sgx_global_lru.lock);
-			return -EBUSY;
-		}
-
-		page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
+	if ((page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) &&
+	    (page->flags & SGX_EPC_PAGE_RECLAIM_IN_PROGRESS)) {
+		spin_unlock(&sgx_global_lru.lock);
+		return -EBUSY;
 	}
 	list_del(&page->list);
 	spin_unlock(&sgx_global_lru.lock);
 
+	page->flags &= ~SGX_EPC_PAGE_RECLAIM_FLAGS;
+
 	return 0;
 }
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 284d0cda9e36..76eae4ecbf87 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -31,8 +31,14 @@
 
 /* page flag to indicate reclaim is in progress */
 #define SGX_EPC_PAGE_RECLAIM_IN_PROGRESS BIT(2)
+#define SGX_EPC_PAGE_ENCLAVE		BIT(3)
+#define SGX_EPC_PAGE_VERSION_ARRAY	BIT(4)
+#define SGX_EPC_PAGE_ENCLAVE_RECLAIMABLE (SGX_EPC_PAGE_ENCLAVE | \
+					  SGX_EPC_PAGE_RECLAIMER_TRACKED)
 #define SGX_EPC_PAGE_RECLAIM_FLAGS	(SGX_EPC_PAGE_RECLAIMER_TRACKED | \
-					 SGX_EPC_PAGE_RECLAIM_IN_PROGRESS)
+					 SGX_EPC_PAGE_RECLAIM_IN_PROGRESS | \
+					 SGX_EPC_PAGE_ENCLAVE | \
+					 SGX_EPC_PAGE_VERSION_ARRAY)
 
 struct sgx_epc_page {
 	unsigned int section;
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC PATCH 09/20] x86/sgx: Allow reclaiming up to 32 pages, but scan 16 by default
  2022-09-22 17:10 [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Kristen Carlson Accardi
                   ` (7 preceding siblings ...)
  2022-09-22 17:10 ` [RFC PATCH 08/20] x86/sgx: Add EPC page flags to identify type of page Kristen Carlson Accardi
@ 2022-09-22 17:10 ` Kristen Carlson Accardi
  2022-09-22 17:10 ` [RFC PATCH 10/20] x86/sgx: Return the number of EPC pages that were successfully reclaimed Kristen Carlson Accardi
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Modify sgx_reclaim_pages() to take a parameter that specifies the
number of pages to scan for reclaiming. Specify a max value of
32, but scan 16 in the usual case. This allows the number of pages
sgx_reclaim_pages() scans to be specified by the caller, and adjusted
in future patches.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 3c0d33b72896..0010ed1b2e98 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -18,6 +18,8 @@
 #include "encl.h"
 #include "encls.h"
 
+#define SGX_MAX_NR_TO_RECLAIM	32
+
 struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
@@ -273,7 +275,10 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 	mutex_unlock(&encl->lock);
 }
 
-/*
+/**
+ * sgx_reclaim_pages() - Reclaim EPC pages from the consumers
+ * @nr_to_scan:		 Number of EPC pages to scan for reclaim
+ *
  * Take a fixed number of pages from the head of the active page pool and
  * reclaim them to the enclave's private shmem files. Skip the pages, which have
  * been accessed since the last scan. Move those pages to the tail of active
@@ -286,9 +291,9 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
  * problematic as it would increase the lock contention too much, which would
  * halt forward progress.
  */
-static void sgx_reclaim_pages(void)
+static void sgx_reclaim_pages(int nr_to_scan)
 {
-	struct sgx_backing backing[SGX_NR_TO_SCAN];
+	struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM];
 	struct sgx_encl_page *encl_page;
 	struct sgx_epc_page *epc_page, *tmp;
 	pgoff_t page_index;
@@ -297,7 +302,7 @@ static void sgx_reclaim_pages(void)
 	int i;
 
 	spin_lock(&sgx_global_lru.lock);
-	for (i = 0; i < SGX_NR_TO_SCAN; i++) {
+	for (i = 0; i < nr_to_scan; i++) {
 		if (list_empty(&sgx_global_lru.reclaimable))
 			break;
 
@@ -327,7 +332,7 @@ static void sgx_reclaim_pages(void)
 	list_for_each_entry_safe(epc_page, tmp, &iso, list) {
 		encl_page = epc_page->owner;
 
-		if (!sgx_reclaimer_age(epc_page))
+		if (i == SGX_MAX_NR_TO_RECLAIM || !sgx_reclaimer_age(epc_page))
 			goto skip;
 
 		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
@@ -384,7 +389,7 @@ static bool sgx_should_reclaim(unsigned long watermark)
 void sgx_reclaim_direct(void)
 {
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
-		sgx_reclaim_pages();
+		sgx_reclaim_pages(SGX_NR_TO_SCAN);
 }
 
 static int ksgxd(void *p)
@@ -410,7 +415,7 @@ static int ksgxd(void *p)
 				     sgx_should_reclaim(SGX_NR_HIGH_PAGES));
 
 		if (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
-			sgx_reclaim_pages();
+			sgx_reclaim_pages(SGX_NR_TO_SCAN);
 	}
 
 	return 0;
@@ -581,7 +586,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 			break;
 		}
 
-		sgx_reclaim_pages();
+		sgx_reclaim_pages(SGX_NR_TO_SCAN);
 	}
 
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC PATCH 10/20] x86/sgx: Return the number of EPC pages that were successfully reclaimed
  2022-09-22 17:10 [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Kristen Carlson Accardi
                   ` (8 preceding siblings ...)
  2022-09-22 17:10 ` [RFC PATCH 09/20] x86/sgx: Allow reclaiming up to 32 pages, but scan 16 by default Kristen Carlson Accardi
@ 2022-09-22 17:10 ` Kristen Carlson Accardi
  2022-09-22 17:10 ` [RFC PATCH 11/20] x86/sgx: Add option to ignore age of page during EPC reclaim Kristen Carlson Accardi
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Return the number of reclaimed pages from sgx_reclaim_pages(); the EPC
cgroup will use the result to track the success rate of its reclaim
calls, e.g. to escalate to a more forceful reclaiming mode if necessary.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 0010ed1b2e98..fc5aed813834 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -290,8 +290,10 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
  * + EWB) but not sufficiently. Reclaiming one page at a time would also be
  * problematic as it would increase the lock contention too much, which would
  * halt forward progress.
+ *
+ * Return: number of EPC pages reclaimed
  */
-static void sgx_reclaim_pages(int nr_to_scan)
+static int sgx_reclaim_pages(int nr_to_scan)
 {
 	struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM];
 	struct sgx_encl_page *encl_page;
@@ -373,6 +375,7 @@ static void sgx_reclaim_pages(int nr_to_scan)
 	}
 out:
 	cond_resched();
+	return i;
 }
 
 static bool sgx_should_reclaim(unsigned long watermark)
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC PATCH 11/20] x86/sgx: Add option to ignore age of page during EPC reclaim
  2022-09-22 17:10 [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Kristen Carlson Accardi
                   ` (9 preceding siblings ...)
  2022-09-22 17:10 ` [RFC PATCH 10/20] x86/sgx: Return the number of EPC pages that were successfully reclaimed Kristen Carlson Accardi
@ 2022-09-22 17:10 ` Kristen Carlson Accardi
  2022-09-22 17:10 ` [RFC PATCH 12/20] x86/sgx: Add helper to retrieve SGX EPC LRU given an EPC page Kristen Carlson Accardi
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add a flag to sgx_reclaim_pages() to instruct it to ignore the age of
a page, i.e. reclaim the page even if it's young.  The EPC cgroup will
use the flag to enforce its limits by draining the reclaimable lists
before resorting to other measures, e.g. forcefully reclaiming
"unreclaimable" pages by killing enclaves.
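
A sketch of the intended two-pass escalation (the cgroup caller that
does this is added later in the series):

	/* First pass: respect page age. */
	if (!sgx_reclaim_pages(SGX_NR_TO_SCAN, false))
		/* Nothing aged out; reclaim even young pages. */
		sgx_reclaim_pages(SGX_NR_TO_SCAN, true);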

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index fc5aed813834..98531f6fb448 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -278,6 +278,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 /**
  * sgx_reclaim_pages() - Reclaim EPC pages from the consumers
  * @nr_to_scan:		 Number of EPC pages to scan for reclaim
+ * @ignore_age:		 Reclaim a page even if it is young
  *
  * Take a fixed number of pages from the head of the active page pool and
  * reclaim them to the enclave's private shmem files. Skip the pages, which have
@@ -293,7 +294,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
  *
  * Return: number of EPC pages reclaimed
  */
-static int sgx_reclaim_pages(int nr_to_scan)
+static int sgx_reclaim_pages(int nr_to_scan, bool ignore_age)
 {
 	struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM];
 	struct sgx_encl_page *encl_page;
@@ -334,7 +335,8 @@ static int sgx_reclaim_pages(int nr_to_scan)
 	list_for_each_entry_safe(epc_page, tmp, &iso, list) {
 		encl_page = epc_page->owner;
 
-		if (i == SGX_MAX_NR_TO_RECLAIM || !sgx_reclaimer_age(epc_page))
+		if (i == SGX_MAX_NR_TO_RECLAIM ||
+		    (!ignore_age && !sgx_reclaimer_age(epc_page)))
 			goto skip;
 
 		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
@@ -392,7 +394,7 @@ static bool sgx_should_reclaim(unsigned long watermark)
 void sgx_reclaim_direct(void)
 {
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
-		sgx_reclaim_pages(SGX_NR_TO_SCAN);
+		sgx_reclaim_pages(SGX_NR_TO_SCAN, false);
 }
 
 static int ksgxd(void *p)
@@ -418,7 +420,7 @@ static int ksgxd(void *p)
 				     sgx_should_reclaim(SGX_NR_HIGH_PAGES));
 
 		if (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
-			sgx_reclaim_pages(SGX_NR_TO_SCAN);
+			sgx_reclaim_pages(SGX_NR_TO_SCAN, false);
 	}
 
 	return 0;
@@ -589,7 +591,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 			break;
 		}
 
-		sgx_reclaim_pages(SGX_NR_TO_SCAN);
+		sgx_reclaim_pages(SGX_NR_TO_SCAN, false);
 	}
 
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC PATCH 12/20] x86/sgx: Add helper to retrieve SGX EPC LRU given an EPC page
  2022-09-22 17:10 [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Kristen Carlson Accardi
                   ` (10 preceding siblings ...)
  2022-09-22 17:10 ` [RFC PATCH 11/20] x86/sgx: Add option to ignore age of page during EPC reclaim Kristen Carlson Accardi
@ 2022-09-22 17:10 ` Kristen Carlson Accardi
  2022-09-22 17:10 ` [RFC PATCH 13/20] x86/sgx: Prepare for multiple LRUs Kristen Carlson Accardi
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Introduce a helper that will be used to retrieve the LRU associated
with a given EPC page.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 98531f6fb448..9f2cb264a347 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -31,6 +31,10 @@ static DEFINE_XARRAY(sgx_epc_address_space);
  * with sgx_global_lru.lock acquired.
  */
 static struct sgx_epc_lru sgx_global_lru;
+static inline struct sgx_epc_lru *sgx_lru(struct sgx_epc_page *epc_page)
+{
+	return &sgx_global_lru;
+}
 
 static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);
 
@@ -299,6 +303,7 @@ static int sgx_reclaim_pages(int nr_to_scan, bool ignore_age)
 	struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM];
 	struct sgx_encl_page *encl_page;
 	struct sgx_epc_page *epc_page, *tmp;
+	struct sgx_epc_lru *lru;
 	pgoff_t page_index;
 	LIST_HEAD(iso);
 	int ret;
@@ -354,10 +359,11 @@ static int sgx_reclaim_pages(int nr_to_scan, bool ignore_age)
 		continue;
 
 skip:
-		spin_lock(&sgx_global_lru.lock);
+		lru = sgx_lru(epc_page);
+		spin_lock(&lru->lock);
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIM_IN_PROGRESS;
-		list_move_tail(&epc_page->list, &sgx_global_lru.reclaimable);
-		spin_unlock(&sgx_global_lru.lock);
+		list_move_tail(&epc_page->list, &lru->reclaimable);
+		spin_unlock(&lru->lock);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 	}
@@ -514,14 +520,16 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
  */
 void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags)
 {
-	spin_lock(&sgx_global_lru.lock);
+	struct sgx_epc_lru *lru = sgx_lru(page);
+
+	spin_lock(&lru->lock);
 	WARN_ON(page->flags & SGX_EPC_PAGE_RECLAIM_FLAGS);
 	page->flags |= flags;
 	if (flags & SGX_EPC_PAGE_RECLAIMER_TRACKED)
-		list_add_tail(&page->list, &sgx_global_lru.reclaimable);
+		list_add_tail(&page->list, &lru->reclaimable);
 	else
-		list_add_tail(&page->list, &sgx_global_lru.unreclaimable);
-	spin_unlock(&sgx_global_lru.lock);
+		list_add_tail(&page->list, &lru->unreclaimable);
+	spin_unlock(&lru->lock);
 }
 
 /**
@@ -536,14 +544,16 @@ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags)
  */
 int sgx_drop_epc_page(struct sgx_epc_page *page)
 {
-	spin_lock(&sgx_global_lru.lock);
+	struct sgx_epc_lru *lru = sgx_lru(page);
+
+	spin_lock(&lru->lock);
 	if ((page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) &&
 	    (page->flags & SGX_EPC_PAGE_RECLAIM_IN_PROGRESS)) {
-		spin_unlock(&sgx_global_lru.lock);
+		spin_unlock(&lru->lock);
 		return -EBUSY;
 	}
 	list_del(&page->list);
-	spin_unlock(&sgx_global_lru.lock);
+	spin_unlock(&lru->lock);
 
 	page->flags &= ~SGX_EPC_PAGE_RECLAIM_FLAGS;
 
-- 
2.37.3



* [RFC PATCH 13/20] x86/sgx: Prepare for multiple LRUs
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add a sgx_can_reclaim() wrapper around the check for reclaimable pages
so that a subsequent patch can cleanly extend the check to cover
multiple LRUs.
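
As a sketch of where this is headed, the wrapper lets the check grow
without touching its callers; patch 18 of this series extends it
roughly like so:

	static bool sgx_can_reclaim(void)
	{
	#ifdef CONFIG_CGROUP_SGX_EPC
		/* any LRU in the cgroup hierarchy with reclaimable pages? */
		return !sgx_epc_cgroup_lru_empty(NULL);
	#else
		return !list_empty(&sgx_global_lru.reclaimable);
	#endif
	}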

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 9f2cb264a347..ac49346302ed 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -386,10 +386,15 @@ static int sgx_reclaim_pages(int nr_to_scan, bool ignore_age)
 	return i;
 }
 
+static bool sgx_can_reclaim(void)
+{
+	return !list_empty(&sgx_global_lru.reclaimable);
+}
+
 static bool sgx_should_reclaim(unsigned long watermark)
 {
 	return atomic_long_read(&sgx_nr_free_pages) < watermark &&
-	       !list_empty(&sgx_global_lru.reclaimable);
+		sgx_can_reclaim();
 }
 
 /*
@@ -588,7 +593,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 			break;
 		}
 
-		if (list_empty(&sgx_global_lru.reclaimable))
+		if (!sgx_can_reclaim())
 			return ERR_PTR(-ENOMEM);
 
 		if (!reclaim) {
-- 
2.37.3



* [RFC PATCH 14/20] x86/sgx: Expose sgx_reclaim_pages() for use by EPC cgroup
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Expose the top-level reclaim function as sgx_reclaim_epc_pages() for use
by the upcoming EPC cgroup, which will initiate reclaim to enforce
changes to high/max limits.
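
A rough sketch of the intended use (the actual enforcement loop lands
in the EPC cgroup patch; 'usage', 'limit' and read_usage() here are
placeholders, not real kernel symbols):

	/* Reclaim until usage drops back under the limit, giving up
	 * once a pass reclaims nothing and no progress can be made.
	 */
	while (usage > limit) {
		if (!sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false))
			break;
		usage = read_usage();
	}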

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 9 +++++----
 arch/x86/kernel/cpu/sgx/sgx.h  | 1 +
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index ac49346302ed..1791881aa1b1 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -281,6 +281,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 
 /**
  * sgx_reclaim_pages() - Reclaim EPC pages from the consumers
+ * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers
  * @nr_to_scan:		 Number of EPC pages to scan for reclaim
  * @ignore_age:		 Reclaim a page even if it is young
  *
@@ -298,7 +299,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
  *
  * Return: number of EPC pages reclaimed
  */
-static int sgx_reclaim_pages(int nr_to_scan, bool ignore_age)
+int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age)
 {
 	struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM];
 	struct sgx_encl_page *encl_page;
@@ -405,7 +406,7 @@ static bool sgx_should_reclaim(unsigned long watermark)
 void sgx_reclaim_direct(void)
 {
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
-		sgx_reclaim_pages(SGX_NR_TO_SCAN, false);
+		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
 }
 
 static int ksgxd(void *p)
@@ -431,7 +432,7 @@ static int ksgxd(void *p)
 				     sgx_should_reclaim(SGX_NR_HIGH_PAGES));
 
 		if (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
-			sgx_reclaim_pages(SGX_NR_TO_SCAN, false);
+			sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
 	}
 
 	return 0;
@@ -606,7 +607,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 			break;
 		}
 
-		sgx_reclaim_pages(SGX_NR_TO_SCAN, false);
+		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
 	}
 
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 76eae4ecbf87..a2042303a666 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -113,6 +113,7 @@ void sgx_reclaim_direct(void);
 void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags);
 int sgx_drop_epc_page(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
+int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age);
 
 void sgx_ipi_cb(void *info);
 
-- 
2.37.3



* [RFC PATCH 15/20] x86/sgx: Add helper to grab pages from an arbitrary EPC LRU
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Move the isolation loop into a standalone helper, sgx_isolate_epc_pages(),
in preparation for the existence of multiple LRUs.  Expose the helper to
other SGX code so that it can be called from the EPC cgroup code, e.g.
to isolate pages from a single cgroup LRU.  Exposing the isolation loop
allows the cgroup iteration logic to be wholly encapsulated within the
cgroup code.
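
For illustration, the cgroup side can then gather one batch across
several LRUs (a sketch; the real walk in the cgroup patch also handles
low protection and shared iteration, and for_each_epc_cgroup() is a
hypothetical iterator):

	LIST_HEAD(iso);
	int nr_to_scan = SGX_NR_TO_SCAN;

	/* nr_to_scan is decremented in place, so a single scan budget
	 * is shared across all of the LRUs that are visited.
	 */
	for_each_epc_cgroup(epc_cg) {
		sgx_isolate_epc_pages(&epc_cg->lru, &nr_to_scan, &iso);
		if (!nr_to_scan)
			break;
	}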

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 72 ++++++++++++++++++++--------------
 arch/x86/kernel/cpu/sgx/sgx.h  |  2 +
 2 files changed, 45 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 1791881aa1b1..151ad720a4ec 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -280,10 +280,47 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 }
 
 /**
- * sgx_reclaim_pages() - Reclaim EPC pages from the consumers
+ * sgx_isolate_epc_pages - Isolate pages from an LRU for reclaim
+ * @lru		LRU from which to reclaim
+ * @nr_to_scan	Number of pages to scan for reclaim
+ * @dst		Destination list to hold the isolated pages
+ */
+void sgx_isolate_epc_pages(struct sgx_epc_lru *lru, int *nr_to_scan,
+			   struct list_head *dst)
+{
+	struct sgx_encl_page *encl_page;
+	struct sgx_epc_page *epc_page;
+
+	spin_lock(&lru->lock);
+	for (; *nr_to_scan > 0; --(*nr_to_scan)) {
+		if (list_empty(&lru->reclaimable))
+			break;
+
+		epc_page = list_first_entry(&lru->reclaimable,
+					    struct sgx_epc_page, list);
+
+		encl_page = epc_page->owner;
+		if (WARN_ON_ONCE(!(epc_page->flags & SGX_EPC_PAGE_ENCLAVE)))
+			continue;
+
+		if (kref_get_unless_zero(&encl_page->encl->refcount)) {
+			epc_page->flags |= SGX_EPC_PAGE_RECLAIM_IN_PROGRESS;
+			list_move_tail(&epc_page->list, dst);
+		} else {
+			/* The owner is freeing the page, remove it from the
+			 * LRU list
+			 */
+			epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
+			list_del_init(&epc_page->list);
+		}
+	}
+	spin_unlock(&lru->lock);
+}
+
+/**
  * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers
- * @nr_to_scan:		 Number of EPC pages to scan for reclaim
- * @ignore_age:		 Reclaim a page even if it is young
+ * @nr_to_scan:		Number of EPC pages to scan for reclaim
+ * @ignore_age:		Reclaim a page even if it is young
  *
  * Take a fixed number of pages from the head of the active page pool and
  * reclaim them to the enclave's private shmem files. Skip the pages, which have
@@ -302,42 +339,19 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age)
 {
 	struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM];
-	struct sgx_encl_page *encl_page;
 	struct sgx_epc_page *epc_page, *tmp;
+	struct sgx_encl_page *encl_page;
 	struct sgx_epc_lru *lru;
 	pgoff_t page_index;
 	LIST_HEAD(iso);
+	int i = 0;
 	int ret;
-	int i;
-
-	spin_lock(&sgx_global_lru.lock);
-	for (i = 0; i < nr_to_scan; i++) {
-		if (list_empty(&sgx_global_lru.reclaimable))
-			break;
-
-		epc_page = list_first_entry(&sgx_global_lru.reclaimable,
-					    struct sgx_epc_page, list);
-		encl_page = epc_page->owner;
-		if (WARN_ON_ONCE(!(epc_page->flags & SGX_EPC_PAGE_ENCLAVE)))
-			continue;
 
-		if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
-			epc_page->flags |= SGX_EPC_PAGE_RECLAIM_IN_PROGRESS;
-			list_move_tail(&epc_page->list, &iso);
-		} else {
-			/* The owner is freeing the page, remove it from the
-			 * LRU list
-			 */
-			epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
-			list_del_init(&epc_page->list);
-		}
-	}
-	spin_unlock(&sgx_global_lru.lock);
+	sgx_isolate_epc_pages(&sgx_global_lru, &nr_to_scan, &iso);
 
 	if (list_empty(&iso))
 		goto out;
 
-	i = 0;
 	list_for_each_entry_safe(epc_page, tmp, &iso, list) {
 		encl_page = epc_page->owner;
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index a2042303a666..0598d534371b 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -114,6 +114,8 @@ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags);
 int sgx_drop_epc_page(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
 int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age);
+void sgx_isolate_epc_pages(struct sgx_epc_lru *lru, int *nr_to_scan,
+			   struct list_head *dst);
 
 void sgx_ipi_cb(void *info);
 
-- 
2.37.3



* [RFC PATCH 16/20] x86/sgx: Add EPC OOM path to forcefully reclaim EPC
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Introduce the OOM path for killing an enclave when the reclaimer is
no longer able to reclaim enough EPC pages. Find a victim enclave,
which will be an enclave with EPC pages remaining that are not
accessible to the reclaimer ("unreclaimable"). Once a victim is
identified, mark the enclave as OOM and zap the enclave's entire
page range. Release all the enclave's resources except for the
struct sgx_encl memory itself.
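
Condensed, the flow added below is roughly (a sketch; reference
counting, SRCU and locking details are omitted, and victim_enclave()
stands in for the owner lookup, it is not a real function):

	victim = sgx_oom_get_victim(lru);
	if (victim) {
		encl = victim_enclave(victim);
		set_bit(SGX_ENCL_OOM, &encl->flags);	/* fail new mappings */
		/* drop all PTEs for the enclave's address range */
		sgx_epc_oom_zap(encl, mm, encl->base,
				encl->base + encl->size, &sgx_vm_ops);
		sgx_encl_destroy(encl);	/* free EPC, keep struct sgx_encl */
		kref_put(&encl->refcount, sgx_encl_release);
	}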

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/encl.c |  74 +++++++++++++++---
 arch/x86/kernel/cpu/sgx/encl.h |   2 +
 arch/x86/kernel/cpu/sgx/main.c | 135 +++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h  |   1 +
 4 files changed, 201 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 672b302f3688..fe6f0a62c4f1 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -622,7 +622,8 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
 	if (!encl)
 		return -EFAULT;
 
-	if (!test_bit(SGX_ENCL_DEBUG, &encl->flags))
+	if (!test_bit(SGX_ENCL_DEBUG, &encl->flags) ||
+	    test_bit(SGX_ENCL_OOM, &encl->flags))
 		return -EFAULT;
 
 	for (i = 0; i < len; i += cnt) {
@@ -668,16 +669,8 @@ const struct vm_operations_struct sgx_vm_ops = {
 	.access = sgx_vma_access,
 };
 
-/**
- * sgx_encl_release - Destroy an enclave instance
- * @ref:	address of a kref inside &sgx_encl
- *
- * Used together with kref_put(). Frees all the resources associated with the
- * enclave and the instance itself.
- */
-void sgx_encl_release(struct kref *ref)
+static void __sgx_encl_release(struct sgx_encl *encl)
 {
-	struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount);
 	struct sgx_va_page *va_page;
 	struct sgx_encl_page *entry;
 	unsigned long index;
@@ -712,7 +705,7 @@ void sgx_encl_release(struct kref *ref)
 	while (!list_empty(&encl->va_pages)) {
 		va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
 					   list);
-		list_del(&va_page->list);
+		list_del_init(&va_page->list);
 		sgx_drop_epc_page(va_page->epc_page);
 		sgx_encl_free_epc_page(va_page->epc_page);
 		kfree(va_page);
@@ -728,10 +721,66 @@ void sgx_encl_release(struct kref *ref)
 	/* Detect EPC page leak's. */
 	WARN_ON_ONCE(encl->secs_child_cnt);
 	WARN_ON_ONCE(encl->secs.epc_page);
+}
+
+/**
+ * sgx_encl_release - Destroy an enclave instance
+ * @ref:	address of a kref inside &sgx_encl
+ *
+ * Used together with kref_put(). Frees all the resources associated with the
+ * enclave and the instance itself.
+ */
+void sgx_encl_release(struct kref *ref)
+{
+	struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount);
+
+	/* if the enclave was OOM killed previously, it just needs to be freed */
+	if (!test_bit(SGX_ENCL_OOM, &encl->flags))
+		__sgx_encl_release(encl);
 
 	kfree(encl);
 }
 
+/**
+ * sgx_encl_destroy - prepare the enclave for release
+ * @encl:	address of the sgx_encl to drain
+ *
+ * Used during oom kill to empty the mm_list entries after they have
+ * been zapped. Release the remaining enclave resources without freeing
+ * struct sgx_encl.
+ */
+void sgx_encl_destroy(struct sgx_encl *encl)
+{
+	struct sgx_encl_mm *encl_mm;
+
+	for ( ; ; )  {
+		spin_lock(&encl->mm_lock);
+
+		if (list_empty(&encl->mm_list)) {
+			encl_mm = NULL;
+		} else {
+			encl_mm = list_first_entry(&encl->mm_list,
+						   struct sgx_encl_mm, list);
+			list_del_rcu(&encl_mm->list);
+		}
+
+		spin_unlock(&encl->mm_lock);
+
+		/* The enclave is no longer mapped by any mm. */
+		if (!encl_mm)
+			break;
+
+		synchronize_srcu(&encl->srcu);
+		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
+		kfree(encl_mm);
+
+		/* 'encl_mm' is gone, put encl_mm->encl reference: */
+		kref_put(&encl->refcount, sgx_encl_release);
+	}
+
+	__sgx_encl_release(encl);
+}
+
 /*
  * 'mm' is exiting and no longer needs mmu notifications.
  */
@@ -801,6 +850,9 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
 	struct sgx_encl_mm *encl_mm;
 	int ret;
 
+	if (test_bit(SGX_ENCL_OOM, &encl->flags))
+		return -ENOMEM;
+
 	/*
 	 * Even though a single enclave may be mapped into an mm more than once,
 	 * each 'mm' only appears once on encl->mm_list. This is guaranteed by
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 831d63f80f5a..f4935632e53a 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -39,6 +39,7 @@ enum sgx_encl_flags {
 	SGX_ENCL_DEBUG		= BIT(1),
 	SGX_ENCL_CREATED	= BIT(2),
 	SGX_ENCL_INITIALIZED	= BIT(3),
+	SGX_ENCL_OOM		= BIT(4),
 };
 
 struct sgx_encl_mm {
@@ -125,5 +126,6 @@ struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
 					 unsigned long addr);
 struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl, bool reclaim);
 void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page);
+void sgx_encl_destroy(struct sgx_encl *encl);
 
 #endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 151ad720a4ec..082c08228840 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -657,6 +657,141 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 	atomic_long_inc(&sgx_nr_free_pages);
 }
 
+static bool sgx_oom_get_ref(struct sgx_epc_page *epc_page)
+{
+	struct sgx_encl *encl;
+
+	if (epc_page->flags & SGX_EPC_PAGE_ENCLAVE)
+		encl = ((struct sgx_encl_page *)epc_page->owner)->encl;
+	else if (epc_page->flags & SGX_EPC_PAGE_VERSION_ARRAY)
+		encl = epc_page->owner;
+	else
+		return false;
+
+	return kref_get_unless_zero(&encl->refcount);
+}
+
+static struct sgx_epc_page *sgx_oom_get_victim(struct sgx_epc_lru *lru)
+{
+	struct sgx_epc_page *epc_page, *tmp;
+
+	if (list_empty(&lru->unreclaimable))
+		return NULL;
+
+	list_for_each_entry_safe(epc_page, tmp, &lru->unreclaimable, list) {
+		list_del_init(&epc_page->list);
+
+		if (sgx_oom_get_ref(epc_page))
+			return epc_page;
+	}
+	return NULL;
+}
+
+static void sgx_epc_oom_zap(void *owner, struct mm_struct *mm, unsigned long start,
+			    unsigned long end, const struct vm_operations_struct *ops)
+{
+	struct vm_area_struct *vma, *tmp;
+	unsigned long vm_end;
+
+	vma = find_vma(mm, start);
+	if (!vma || vma->vm_ops != ops || vma->vm_private_data != owner ||
+	    vma->vm_start >= end)
+		return;
+
+	for (tmp = vma; tmp->vm_start < end; tmp = tmp->vm_next) {
+		do {
+			vm_end = tmp->vm_end;
+			tmp = tmp->vm_next;
+		} while (tmp && tmp->vm_ops == ops &&
+			 tmp->vm_private_data == owner && tmp->vm_start < end);
+
+		zap_page_range(vma, vma->vm_start, vm_end - vma->vm_start);
+
+		if (!tmp)
+			break;
+	}
+}
+
+static void sgx_oom_encl(struct sgx_encl *encl)
+{
+	unsigned long mm_list_version;
+	struct sgx_encl_mm *encl_mm;
+	int idx;
+
+	set_bit(SGX_ENCL_OOM, &encl->flags);
+
+	if (!test_bit(SGX_ENCL_CREATED, &encl->flags))
+		goto out;
+
+	do {
+		mm_list_version = encl->mm_list_version;
+
+		/* Pairs with smp_rmb() in sgx_encl_mm_add(). */
+		smp_rmb();
+
+		idx = srcu_read_lock(&encl->srcu);
+
+		list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
+			if (!mmget_not_zero(encl_mm->mm))
+				continue;
+
+			mmap_read_lock(encl_mm->mm);
+
+			sgx_epc_oom_zap(encl, encl_mm->mm, encl->base,
+					encl->base + encl->size, &sgx_vm_ops);
+
+			mmap_read_unlock(encl_mm->mm);
+
+			mmput_async(encl_mm->mm);
+		}
+
+		srcu_read_unlock(&encl->srcu, idx);
+	} while (WARN_ON_ONCE(encl->mm_list_version != mm_list_version));
+
+	mutex_lock(&encl->lock);
+	sgx_encl_destroy(encl);
+	mutex_unlock(&encl->lock);
+
+out:
+	/*
+	 * This puts the refcount we took when we identified this enclave as
+	 * an OOM victim.
+	 */
+	kref_put(&encl->refcount, sgx_encl_release);
+}
+
+static inline void sgx_oom_encl_page(struct sgx_encl_page *encl_page)
+{
+	return sgx_oom_encl(encl_page->encl);
+}
+
+/**
+ * sgx_epc_oom() - invoke EPC out-of-memory handling on target LRU
+ * @lru:	LRU that is low
+ *
+ * Return:	%true if a victim was found and kicked.
+ */
+bool sgx_epc_oom(struct sgx_epc_lru *lru)
+{
+	struct sgx_epc_page *victim;
+
+	spin_lock(&lru->lock);
+	victim = sgx_oom_get_victim(lru);
+	spin_unlock(&lru->lock);
+
+	if (!victim)
+		return false;
+
+	if (victim->flags & SGX_EPC_PAGE_ENCLAVE)
+		sgx_oom_encl_page(victim->owner);
+	else if (victim->flags & SGX_EPC_PAGE_VERSION_ARRAY)
+		sgx_oom_encl(victim->owner);
+	else
+		WARN_ON_ONCE(1);
+
+	return true;
+}
+
 static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 					 unsigned long index,
 					 struct sgx_epc_section *section)
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 0598d534371b..a4c7ee0a4958 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -116,6 +116,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
 int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age);
 void sgx_isolate_epc_pages(struct sgx_epc_lru *lru, int *nr_to_scan,
 			   struct list_head *dst);
+bool sgx_epc_oom(struct sgx_epc_lru *lru);
 
 void sgx_ipi_cb(void *info);
 
-- 
2.37.3



* [RFC PATCH 17/20] cgroup, x86/sgx: Add SGX EPC cgroup controller
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Tejun Heo, Zefan Li, Johannes Weiner
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Implement a cgroup controller, sgx_epc, which regulates distribution of
SGX Enclave Page Cache (EPC) memory.  EPC memory is independent from
normal system memory, e.g. must be reserved at boot from RAM and cannot
be converted between EPC and normal memory while the system is running.
EPC is managed by the SGX subsystem and is not accounted by the memory
controller.

Much like normal system memory, EPC memory can be overcommitted via
virtual memory techniques and pages can be swapped out of the EPC to
their backing store (normal system memory, e.g. shmem).  The SGX EPC
subsystem is analogous to the memory subsystem and the SGX EPC controller
is in turn analogous to the memory controller; it implements limit and
protection models for EPC memory.

"sgx_epc.high" and "sgx_epc.low" are the main mechanisms to control
EPC usage, while "sgx_epc.max" is a last line of defense mechanism.
"sgx_epc.high" is a best-effort limit of EPC usage.  "sgx_epc.low"
is a best-effort protection of EPC usage.  "sgx_epc.max" is a hard
limit of EPC usage.
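
Roughly, for a charge of one page the three knobs interact as follows
(a simplified sketch where usage/high/max stand for the page counter
values; this is not the literal code below):

	if (usage + 1 > max) {
		/* hard limit: reclaim synchronously or fail the charge */
	} else if (usage + 1 > high) {
		/* soft limit: the charge succeeds, but reclaim is kicked
		 * off to push usage back under high
		 */
	}
	/* low is consulted on the reclaim side: cgroups below their low
	 * protection are skipped unless every eligible cgroup is low
	 */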

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/Makefile     |   1 +
 arch/x86/kernel/cpu/sgx/epc_cgroup.c | 830 +++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/epc_cgroup.h |  37 ++
 include/linux/cgroup_subsys.h        |   4 +
 init/Kconfig                         |  12 +
 5 files changed, 884 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.c
 create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.h

diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index 9c1656779b2a..12901a488da7 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -4,3 +4,4 @@ obj-y += \
 	ioctl.o \
 	main.o
 obj-$(CONFIG_X86_SGX_KVM)	+= virt.o
+obj-$(CONFIG_CGROUP_SGX_EPC)	       += epc_cgroup.o
diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
new file mode 100644
index 000000000000..0a61bb8548ff
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
@@ -0,0 +1,830 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright(c) 2022 Intel Corporation.
+
+#include <linux/atomic.h>
+#include <linux/kernel.h>
+#include <linux/ratelimit.h>
+#include <linux/sched/signal.h>
+#include <linux/slab.h>
+#include <linux/threads.h>
+
+#include "epc_cgroup.h"
+
+#define SGX_EPC_RECLAIM_MIN_PAGES		16UL
+#define SGX_EPC_RECLAIM_MAX_PAGES		64UL
+#define SGX_EPC_RECLAIM_IGNORE_AGE_THRESHOLD	5
+#define SGX_EPC_RECLAIM_OOM_THRESHOLD		5
+
+struct sgx_epc_reclaim_control {
+	struct sgx_epc_cgroup *epc_cg;
+	int nr_fails;
+	bool ignore_age;
+};
+
+static struct sgx_epc_cgroup *root_epc_cgroup __read_mostly;
+static struct workqueue_struct *sgx_epc_cg_wq;
+
+static int __init sgx_epc_cgroup_init(void)
+{
+	if (!boot_cpu_has(X86_FEATURE_SGX))
+		return 0;
+
+	sgx_epc_cg_wq = alloc_workqueue("sgx_epc_cg_wq",
+					WQ_UNBOUND | WQ_FREEZABLE,
+					WQ_UNBOUND_MAX_ACTIVE);
+	BUG_ON(!sgx_epc_cg_wq);
+
+	return 0;
+}
+subsys_initcall(sgx_epc_cgroup_init);
+
+static inline bool sgx_epc_cgroup_disabled(void)
+{
+	return !cgroup_subsys_enabled(sgx_epc_cgrp_subsys);
+}
+
+static
+struct sgx_epc_cgroup *sgx_epc_cgroup_from_css(struct cgroup_subsys_state *css)
+{
+	return container_of(css, struct sgx_epc_cgroup, css);
+}
+
+static
+struct sgx_epc_cgroup *sgx_epc_cgroup_from_task(struct task_struct *task)
+{
+	if (unlikely(!task))
+		return NULL;
+	return sgx_epc_cgroup_from_css(task_css(task, sgx_epc_cgrp_id));
+}
+
+static struct sgx_epc_cgroup *sgx_epc_cgroup_from_mm(struct mm_struct *mm)
+{
+	struct sgx_epc_cgroup *epc_cg;
+
+	rcu_read_lock();
+	do {
+		epc_cg = sgx_epc_cgroup_from_task(rcu_dereference(mm->owner));
+		if (unlikely(!epc_cg))
+			epc_cg = root_epc_cgroup;
+	} while (!css_tryget_online(&epc_cg->css));
+	rcu_read_unlock();
+
+	return epc_cg;
+}
+
+static struct sgx_epc_cgroup *parent_epc_cgroup(struct sgx_epc_cgroup *epc_cg)
+{
+	return sgx_epc_cgroup_from_css(epc_cg->css.parent);
+}
+
+/**
+ * sgx_epc_cgroup_iter - iterate over the EPC cgroup hierarchy
+ * @prev:		previously returned epc_cg, NULL on first invocation
+ * @root:		hierarchy root
+ * @reclaim_epoch:	epoch for shared reclaim walks, NULL for full walks
+ *
+ * Return: references to children of the hierarchy below @root, or
+ * @root itself, or %NULL after a full round-trip.
+ *
+ * Caller must pass the return value in @prev on subsequent invocations
+ * for reference counting, or use sgx_epc_cgroup_iter_break() to cancel
+ * a hierarchy walk before the round-trip is complete.
+ */
+static struct sgx_epc_cgroup *sgx_epc_cgroup_iter(struct sgx_epc_cgroup *prev,
+						  struct sgx_epc_cgroup *root,
+						  unsigned long *reclaim_epoch)
+{
+	struct cgroup_subsys_state *css = NULL;
+	struct sgx_epc_cgroup *epc_cg = NULL;
+	struct sgx_epc_cgroup *pos = NULL;
+	bool inc_epoch = false;
+
+	if (sgx_epc_cgroup_disabled())
+		return NULL;
+
+	if (!root)
+		root = root_epc_cgroup;
+
+	if (prev && !reclaim_epoch)
+		pos = prev;
+
+	rcu_read_lock();
+
+start:
+	if (reclaim_epoch) {
+		/*
+		 * Abort the walk if a reclaimer working from the same root has
+		 * started a new walk after this reclaimer has already scanned
+		 * at least one cgroup.
+		 */
+		if (prev && *reclaim_epoch != root->epoch)
+			goto out;
+
+		while (1) {
+			pos = READ_ONCE(root->reclaim_iter);
+			if (!pos || css_tryget(&pos->css))
+				break;
+
+			/*
+			 * The css is dying, clear the reclaim_iter immediately
+			 * instead of waiting for ->css_released to be called.
+			 * Busy waiting serves no purpose and attempting to wait
+			 * for ->css_released may actually block it from being
+			 * called.
+			 */
+			(void)cmpxchg(&root->reclaim_iter, pos, NULL);
+		}
+	}
+
+	if (pos)
+		css = &pos->css;
+
+	while (!epc_cg) {
+		css = css_next_descendant_pre(css, &root->css);
+		if (!css) {
+			/*
+			 * Increment the epoch as we've reached the end of the
+			 * tree and the next call to css_next_descendant_pre
+			 * will restart at root.  Do not update root->epoch
+			 * directly as we should only do so if we update the
+			 * reclaim_iter, i.e. a different thread may win the
+			 * race and update the epoch for us.
+			 */
+			inc_epoch = true;
+
+			/*
+			 * Reclaimers share the hierarchy walk, and a new one
+			 * might jump in at the end of the hierarchy.  Restart
+		 * at root so that we don't return NULL on a thread's
+			 * initial call.
+			 */
+			if (!prev)
+				continue;
+			break;
+		}
+
+		/*
+		 * Verify the css and acquire a reference.  Don't take an
+		 * extra reference to root as it's either the global root
+		 * or is provided by the caller and so is guaranteed to be
+		 * alive.  Keep walking if this css is dying.
+		 */
+		if (css != &root->css && !css_tryget(css))
+			continue;
+
+		epc_cg = sgx_epc_cgroup_from_css(css);
+	}
+
+	if (reclaim_epoch) {
+		/*
+		 * reclaim_iter could have already been updated by a competing
+		 * thread; check that the value hasn't changed since we read
+		 * it to avoid reclaiming from the same cgroup twice.  If the
+		 * value did change, put all of our references and restart the
+		 * entire process, for all intents and purposes we're making a
+		 * new call.
+		 */
+		if (cmpxchg(&root->reclaim_iter, pos, epc_cg) != pos) {
+			if (epc_cg && epc_cg != root)
+				css_put(&epc_cg->css);
+			if (pos)
+				css_put(&pos->css);
+			css = NULL;
+			epc_cg = NULL;
+			inc_epoch = false;
+			goto start;
+		}
+
+		if (inc_epoch)
+			root->epoch++;
+		if (!prev)
+			*reclaim_epoch = root->epoch;
+
+		if (pos)
+			css_put(&pos->css);
+	}
+
+out:
+	rcu_read_unlock();
+	if (prev && prev != root)
+		css_put(&prev->css);
+
+	return epc_cg;
+}
+
+/**
+ * sgx_epc_cgroup_iter_break - abort a hierarchy walk prematurely
+ * @prev:	last visited cgroup as returned by sgx_epc_cgroup_iter()
+ * @root:	hierarchy root
+ */
+static void sgx_epc_cgroup_iter_break(struct sgx_epc_cgroup *prev,
+				      struct sgx_epc_cgroup *root)
+{
+	if (!root)
+		root = root_epc_cgroup;
+	if (prev && prev != root)
+		css_put(&prev->css);
+}
+
+/**
+ * sgx_epc_cgroup_lru_empty - check if a cgroup tree has no pages on its lrus
+ * @root:	root of the tree to check
+ *
+ * Return: %true if all cgroups under the specified root have empty LRU lists.
+ * Used to avoid livelocks due to a cgroup having a non-zero charge count but
+ * no pages on its LRUs, e.g. due to a dead enclave waiting to be released or
+ * because all pages in the cgroup are unreclaimable.
+ */
+bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root)
+{
+	struct sgx_epc_cgroup *epc_cg;
+
+	for (epc_cg = sgx_epc_cgroup_iter(NULL, root, NULL);
+	     epc_cg;
+	     epc_cg = sgx_epc_cgroup_iter(epc_cg, root, NULL)) {
+		if (!list_empty(&epc_cg->lru.reclaimable)) {
+			sgx_epc_cgroup_iter_break(epc_cg, root);
+			return false;
+		}
+	}
+	return true;
+}
+
+static inline bool __sgx_epc_cgroup_is_low(struct sgx_epc_cgroup *epc_cg)
+{
+	unsigned long cur = page_counter_read(&epc_cg->pc);
+
+	return cur < epc_cg->pc.low &&
+	       cur < epc_cg->high &&
+	       cur < epc_cg->pc.max;
+}
+
+/**
+ * sgx_epc_cgroup_is_low - check if EPC consumption is below the normal range
+ * @epc_cg:	the EPC cgroup to check
+ * @root:	the top ancestor of the sub-tree being checked
+ *
+ * Returns %true if EPC consumption of @epc_cg, and that of all
+ * ancestors up to (but not including) @root, is below the normal range.
+ *
+ * @root is exclusive; it is never low when looked at directly and isn't
+ * checked when traversing the hierarchy.
+ *
+ * Excluding @root enables using sgx_epc.low to prioritize EPC usage
+ * between cgroups within a subtree of the hierarchy that is limited
+ * by sgx_epc.high or sgx_epc.max.
+ *
+ * For example, given cgroup A with children B and C:
+ *
+ *    A
+ *   / \
+ *  B   C
+ *
+ * and
+ *
+ *  1. A/sgx_epc.current > A/sgx_epc.high
+ *  2. A/B/sgx_epc.current < A/B/sgx_epc.low
+ *  3. A/C/sgx_epc.current >= A/C/sgx_epc.low
+ *
+ * As 'A' is high, i.e. triggers reclaim from 'A', and 'B' is low, we
+ * should reclaim from 'C' until 'A' is no longer high or until we can
+ * no longer reclaim from 'C'.  If 'A', i.e. @root, isn't excluded
+ * when reclaiming from 'A', then 'B' will not be considered low and we
+ * will reclaim indiscriminately from both 'B' and 'C'.
+ */
+static bool sgx_epc_cgroup_is_low(struct sgx_epc_cgroup *epc_cg,
+				  struct sgx_epc_cgroup *root)
+{
+	if (sgx_epc_cgroup_disabled())
+		return false;
+
+	if (!root)
+		root = root_epc_cgroup;
+	if (epc_cg == root)
+		return false;
+
+	for (; epc_cg != root; epc_cg = parent_epc_cgroup(epc_cg)) {
+		if (!__sgx_epc_cgroup_is_low(epc_cg))
+			return false;
+	}
+
+	return true;
+}
+
+/**
+ * sgx_epc_cgroup_all_in_use_are_low - check if all cgroups in a tree are low
+ * @root:	the root EPC cgroup of the hierarchy to check
+ *
+ * Returns true if all cgroups in a hierarchy are either low or
+ * do not have any pages on their LRU.
+ */
+static bool sgx_epc_cgroup_all_in_use_are_low(struct sgx_epc_cgroup *root)
+{
+	struct sgx_epc_cgroup *epc_cg;
+
+	if (sgx_epc_cgroup_disabled())
+		return false;
+
+	for (epc_cg = sgx_epc_cgroup_iter(NULL, root, NULL);
+	     epc_cg;
+	     epc_cg = sgx_epc_cgroup_iter(epc_cg, root, NULL)) {
+		if (!list_empty(&epc_cg->lru.reclaimable) &&
+		    !__sgx_epc_cgroup_is_low(epc_cg)) {
+			sgx_epc_cgroup_iter_break(epc_cg, root);
+			return false;
+		}
+	}
+
+	return true;
+}
+
+void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root,
+				  int *nr_to_scan, struct list_head *dst)
+{
+	struct sgx_epc_cgroup *epc_cg;
+	unsigned long epoch;
+	bool do_high;
+
+	if (!*nr_to_scan)
+		return;
+
+	/*
+	 * If we're not targeting a specific cgroup, try to reclaim only from
+	 * cgroups that are above their high limit.  If there are none, then go
+	 * ahead and grab anything available.
+	 */
+	do_high = !root;
+retry:
+	for (epc_cg = sgx_epc_cgroup_iter(NULL, root, &epoch);
+	     epc_cg;
+	     epc_cg = sgx_epc_cgroup_iter(epc_cg, root, &epoch)) {
+		if (do_high && page_counter_read(&epc_cg->pc) < epc_cg->high)
+			continue;
+
+		if (sgx_epc_cgroup_is_low(epc_cg, root)) {
+			/*
+			 * Ignore low if all cgroups below @root are low,
+			 * in which case low is "normal".
+			 */
+			if (!sgx_epc_cgroup_all_in_use_are_low(root))
+				continue;
+		}
+
+		sgx_isolate_epc_pages(&epc_cg->lru, nr_to_scan, dst);
+		if (!*nr_to_scan) {
+			sgx_epc_cgroup_iter_break(epc_cg, root);
+			break;
+		}
+	}
+	if (*nr_to_scan && do_high) {
+		do_high = false;
+		goto retry;
+	}
+}
+
+static int sgx_epc_cgroup_reclaim_pages(unsigned long nr_pages,
+					struct sgx_epc_reclaim_control *rc)
+{
+	/*
+	 * Ensure sgx_reclaim_pages is called with a minimum and maximum
+	 * number of pages.  Attempting to reclaim only a few pages will
+	 * often fail and is inefficient, while reclaiming a huge number
+	 * of pages can result in soft lockups due to holding various
+	 * locks for an extended duration.  This also bounds nr_pages so
+ * that it's guaranteed not to overflow 'int nr_to_scan'.
+	 */
+	nr_pages = max(nr_pages, SGX_EPC_RECLAIM_MIN_PAGES);
+	nr_pages = min(nr_pages, SGX_EPC_RECLAIM_MAX_PAGES);
+
+	return sgx_reclaim_epc_pages(nr_pages, rc->ignore_age);
+}
+
+static int sgx_epc_cgroup_reclaim_failed(struct sgx_epc_reclaim_control *rc)
+{
+	if (sgx_epc_cgroup_lru_empty(rc->epc_cg))
+		return -ENOMEM;
+
+	++rc->nr_fails;
+	if (rc->nr_fails > SGX_EPC_RECLAIM_IGNORE_AGE_THRESHOLD)
+		rc->ignore_age = true;
+
+	return 0;
+}
+
+static inline
+void sgx_epc_reclaim_control_init(struct sgx_epc_reclaim_control *rc,
+				  struct sgx_epc_cgroup *epc_cg)
+{
+	rc->epc_cg = epc_cg;
+	rc->nr_fails = 0;
+	rc->ignore_age = false;
+}
+
+static inline void __sgx_epc_cgroup_reclaim_high(struct sgx_epc_cgroup *epc_cg)
+{
+	struct sgx_epc_reclaim_control rc;
+	unsigned long cur, high;
+
+	sgx_epc_reclaim_control_init(&rc, epc_cg);
+
+	for (;;) {
+		high = READ_ONCE(epc_cg->high);
+
+		cur = page_counter_read(&epc_cg->pc);
+		if (cur <= high)
+			break;
+
+		if (!sgx_epc_cgroup_reclaim_pages(cur - high, &rc)) {
+			if (sgx_epc_cgroup_reclaim_failed(&rc))
+				break;
+		}
+	}
+}
+
+static void sgx_epc_cgroup_reclaim_high(struct sgx_epc_cgroup *epc_cg)
+{
+	for (; epc_cg; epc_cg = parent_epc_cgroup(epc_cg))
+		__sgx_epc_cgroup_reclaim_high(epc_cg);
+}
+
+/*
+ * Scheduled by sgx_epc_cgroup_try_charge() to reclaim pages from the
+ * cgroup, either when the cgroup is at/near its maximum capacity or
+ * when the cgroup is above its high threshold.
+ */
+static void sgx_epc_cgroup_reclaim_work_func(struct work_struct *work)
+{
+	struct sgx_epc_reclaim_control rc;
+	struct sgx_epc_cgroup *epc_cg;
+	unsigned long cur, max;
+
+	epc_cg = container_of(work, struct sgx_epc_cgroup, reclaim_work);
+
+	sgx_epc_reclaim_control_init(&rc, epc_cg);
+
+	for (;;) {
+		max = READ_ONCE(epc_cg->pc.max);
+
+		/*
+		 * Adjust the limit down by one page, the goal is to free up
+		 * pages for fault allocations, not to simply obey the limit.
+		 * Conditionally decrementing max also means the cur vs. max
+		 * check will correctly handle the case where both are zero.
+		 */
+		if (max)
+			max--;
+
+		/*
+		 * Unless the limit is extremely low, in which case forcing
+		 * reclaim will likely cause thrashing, force the cgroup to
+		 * reclaim at least once if it's operating *near* its maximum
+		 * limit by adjusting @max down by half the min reclaim size.
+		 * This work func is scheduled by sgx_epc_cgroup_try_charge
+		 * when it cannot directly reclaim due to being in an atomic
+		 * context, e.g. EPC allocation in a fault handler.  Waiting
+		 * to reclaim until the cgroup is actually at its limit is less
+		 * performant as it means the faulting task is effectively
+		 * blocked until a worker makes its way through the global work
+		 * queue.
+		 */
+		if (max > SGX_EPC_RECLAIM_MAX_PAGES)
+			max -= (SGX_EPC_RECLAIM_MIN_PAGES/2);
+
+		cur = page_counter_read(&epc_cg->pc);
+		if (cur <= max)
+			break;
+
+		if (!sgx_epc_cgroup_reclaim_pages(cur - max, &rc)) {
+			if (sgx_epc_cgroup_reclaim_failed(&rc))
+				break;
+		}
+	}
+
+	sgx_epc_cgroup_reclaim_high(epc_cg);
+}
+
+static int __sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg,
+				       unsigned long nr_pages, bool reclaim)
+{
+	struct sgx_epc_reclaim_control rc;
+	unsigned long cur, max, over;
+	unsigned int nr_empty = 0;
+	struct page_counter *fail;
+
+	if (epc_cg == root_epc_cgroup) {
+		page_counter_charge(&epc_cg->pc, nr_pages);
+		return 0;
+	}
+
+	sgx_epc_reclaim_control_init(&rc, NULL);
+
+	for (;;) {
+		if (page_counter_try_charge(&epc_cg->pc, nr_pages, &fail))
+			break;
+
+		rc.epc_cg = container_of(fail, struct sgx_epc_cgroup, pc);
+		max = READ_ONCE(rc.epc_cg->pc.max);
+		if (nr_pages > max)
+			return -ENOMEM;
+
+		if (signal_pending(current))
+			return -ERESTARTSYS;
+
+		if (!reclaim) {
+			queue_work(sgx_epc_cg_wq, &rc.epc_cg->reclaim_work);
+			return -EBUSY;
+		}
+
+		cur = page_counter_read(&rc.epc_cg->pc);
+		over = ((cur + nr_pages) > max) ?
+			(cur + nr_pages) - max : SGX_EPC_RECLAIM_MIN_PAGES;
+
+		if (!sgx_epc_cgroup_reclaim_pages(over, &rc)) {
+			if (sgx_epc_cgroup_reclaim_failed(&rc)) {
+				if (++nr_empty > SGX_EPC_RECLAIM_OOM_THRESHOLD)
+					return -ENOMEM;
+				schedule();
+			}
+		}
+	}
+
+	css_get_many(&epc_cg->css, nr_pages);
+
+	for (; epc_cg; epc_cg = parent_epc_cgroup(epc_cg)) {
+		if (page_counter_read(&epc_cg->pc) >= epc_cg->high) {
+			if (!reclaim)
+				queue_work(sgx_epc_cg_wq, &epc_cg->reclaim_work);
+			else
+				sgx_epc_cgroup_reclaim_high(epc_cg);
+			break;
+		}
+	}
+	return 0;
+}
+
+/**
+ * sgx_epc_cgroup_try_charge - hierarchically try to charge a single EPC page
+ * @mm:			the mm_struct of the process to charge
+ * @reclaim:		whether or not synchronous reclaim is allowed
+ *
+ * Return: the charged EPC cgroup on success, NULL if the EPC cgroup
+ * controller is disabled, or an ERR_PTR encoding the error on failure.
+ */
+struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(struct mm_struct *mm,
+						 bool reclaim)
+{
+	struct sgx_epc_cgroup *epc_cg;
+	int ret;
+
+	if (sgx_epc_cgroup_disabled())
+		return NULL;
+
+	epc_cg = sgx_epc_cgroup_from_mm(mm);
+	ret = __sgx_epc_cgroup_try_charge(epc_cg, 1, reclaim);
+	css_put(&epc_cg->css);
+
+	if (ret)
+		return ERR_PTR(ret);
+	return epc_cg;
+}
+
+/**
+ * sgx_epc_cgroup_uncharge - hierarchically uncharge a single EPC page
+ * @epc_cg:	the charged epc cgroup
+ */
+void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg)
+{
+	if (sgx_epc_cgroup_disabled())
+		return;
+
+	page_counter_uncharge(&epc_cg->pc, 1);
+
+	if (epc_cg != root_epc_cgroup)
+		css_put_many(&epc_cg->css, 1);
+}
+
+static void sgx_epc_cgroup_oom(struct sgx_epc_cgroup *root)
+{
+	struct sgx_epc_cgroup *epc_cg;
+
+	for (epc_cg = sgx_epc_cgroup_iter(NULL, root, NULL);
+	     epc_cg;
+	     epc_cg = sgx_epc_cgroup_iter(epc_cg, root, NULL)) {
+		if (sgx_epc_oom(&epc_cg->lru)) {
+			sgx_epc_cgroup_iter_break(epc_cg, root);
+			return;
+		}
+	}
+}
+
+static struct cgroup_subsys_state *
+sgx_epc_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+	struct sgx_epc_cgroup *parent = sgx_epc_cgroup_from_css(parent_css);
+	struct sgx_epc_cgroup *epc_cg;
+
+	epc_cg = kzalloc(sizeof(struct sgx_epc_cgroup), GFP_KERNEL);
+	if (!epc_cg)
+		return ERR_PTR(-ENOMEM);
+
+	if (!parent)
+		root_epc_cgroup = epc_cg;
+
+	epc_cg->high = PAGE_COUNTER_MAX;
+	sgx_lru_init(&epc_cg->lru);
+	page_counter_init(&epc_cg->pc, parent ? &parent->pc : NULL);
+	INIT_WORK(&epc_cg->reclaim_work, sgx_epc_cgroup_reclaim_work_func);
+
+	return &epc_cg->css;
+}
+
+static void sgx_epc_cgroup_css_released(struct cgroup_subsys_state *css)
+{
+	struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(css);
+	struct sgx_epc_cgroup *dead_cg = epc_cg;
+
+	while ((epc_cg = parent_epc_cgroup(epc_cg)))
+		cmpxchg(&epc_cg->reclaim_iter, dead_cg, NULL);
+}
+
+static void sgx_epc_cgroup_css_free(struct cgroup_subsys_state *css)
+{
+	struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(css);
+
+	cancel_work_sync(&epc_cg->reclaim_work);
+	kfree(epc_cg);
+}
+
+static u64 sgx_epc_current_read(struct cgroup_subsys_state *css,
+				struct cftype *cft)
+{
+	struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(css);
+
+	return (u64)page_counter_read(&epc_cg->pc) * PAGE_SIZE;
+}
+
+static int sgx_epc_low_show(struct seq_file *m, void *v)
+{
+	struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(seq_css(m));
+	unsigned long low = READ_ONCE(epc_cg->pc.low);
+
+	if (low == PAGE_COUNTER_MAX)
+		seq_puts(m, "max\n");
+	else
+		seq_printf(m, "%llu\n", (u64)low * PAGE_SIZE);
+
+	return 0;
+}
+
+static ssize_t sgx_epc_low_write(struct kernfs_open_file *of,
+				 char *buf, size_t nbytes, loff_t off)
+{
+	struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(of_css(of));
+	unsigned long low;
+	int err;
+
+	buf = strstrip(buf);
+	err = page_counter_memparse(buf, "max", &low);
+	if (err)
+		return err;
+
+	page_counter_set_low(&epc_cg->pc, low);
+
+	return nbytes;
+}
+
+static int sgx_epc_high_show(struct seq_file *m, void *v)
+{
+	struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(seq_css(m));
+	unsigned long high = READ_ONCE(epc_cg->high);
+
+	if (high == PAGE_COUNTER_MAX)
+		seq_puts(m, "max\n");
+	else
+		seq_printf(m, "%llu\n", (u64)high * PAGE_SIZE);
+
+	return 0;
+}
+
+static ssize_t sgx_epc_high_write(struct kernfs_open_file *of,
+				  char *buf, size_t nbytes, loff_t off)
+{
+	struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(of_css(of));
+	struct sgx_epc_reclaim_control rc;
+	unsigned long cur, high;
+	int err;
+
+	buf = strstrip(buf);
+	err = page_counter_memparse(buf, "max", &high);
+	if (err)
+		return err;
+
+	epc_cg->high = high;
+
+	sgx_epc_reclaim_control_init(&rc, epc_cg);
+
+	for (;;) {
+		cur = page_counter_read(&epc_cg->pc);
+		if (cur <= high)
+			break;
+
+		if (signal_pending(current))
+			break;
+
+		if (!sgx_epc_cgroup_reclaim_pages(cur - high, &rc)) {
+			if (sgx_epc_cgroup_reclaim_failed(&rc))
+				break;
+		}
+	}
+
+	return nbytes;
+}
+
+static int sgx_epc_max_show(struct seq_file *m, void *v)
+{
+	struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(seq_css(m));
+	unsigned long max = READ_ONCE(epc_cg->pc.max);
+
+	if (max == PAGE_COUNTER_MAX)
+		seq_puts(m, "max\n");
+	else
+		seq_printf(m, "%llu\n", (u64)max * PAGE_SIZE);
+
+	return 0;
+}
+
+static ssize_t sgx_epc_max_write(struct kernfs_open_file *of, char *buf,
+				 size_t nbytes, loff_t off)
+{
+	struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(of_css(of));
+	struct sgx_epc_reclaim_control rc;
+	unsigned int nr_empty = 0;
+	unsigned long cur, max;
+	int err;
+
+	buf = strstrip(buf);
+	err = page_counter_memparse(buf, "max", &max);
+	if (err)
+		return err;
+
+	xchg(&epc_cg->pc.max, max);
+
+	sgx_epc_reclaim_control_init(&rc, epc_cg);
+
+	for (;;) {
+		cur = page_counter_read(&epc_cg->pc);
+		if (cur <= max)
+			break;
+
+		if (signal_pending(current))
+			break;
+
+		if (!sgx_epc_cgroup_reclaim_pages(cur - max, &rc)) {
+			if (sgx_epc_cgroup_reclaim_failed(&rc)) {
+				if (++nr_empty > SGX_EPC_RECLAIM_OOM_THRESHOLD)
+					sgx_epc_cgroup_oom(epc_cg);
+				schedule();
+			}
+		}
+	}
+
+	return nbytes;
+}
+
+static struct cftype sgx_epc_cgroup_files[] = {
+	{
+		.name = "current",
+		.read_u64 = sgx_epc_current_read,
+	},
+	{
+		.name = "low",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = sgx_epc_low_show,
+		.write = sgx_epc_low_write,
+	},
+	{
+		.name = "high",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = sgx_epc_high_show,
+		.write = sgx_epc_high_write,
+	},
+	{
+		.name = "max",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = sgx_epc_max_show,
+		.write = sgx_epc_max_write,
+	},
+	{ }	/* terminate */
+};
+
+struct cgroup_subsys sgx_epc_cgrp_subsys = {
+	.css_alloc	= sgx_epc_cgroup_css_alloc,
+	.css_free	= sgx_epc_cgroup_css_free,
+	.css_released	= sgx_epc_cgroup_css_released,
+
+	.legacy_cftypes	= sgx_epc_cgroup_files,
+	.dfl_cftypes	= sgx_epc_cgroup_files,
+};
diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.h b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
new file mode 100644
index 000000000000..226304a3d523
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2022 Intel Corporation. */
+#ifndef _INTEL_SGX_EPC_CGROUP_H_
+#define _INTEL_SGX_EPC_CGROUP_H_
+
+#include <asm/sgx.h>
+#include <linux/cgroup.h>
+#include <linux/list.h>
+#include <linux/page_counter.h>
+#include <linux/workqueue.h>
+
+#include "sgx.h"
+
+#ifndef CONFIG_CGROUP_SGX_EPC
+struct sgx_epc_cgroup;
+#else
+struct sgx_epc_cgroup {
+	struct cgroup_subsys_state	css;
+
+	struct page_counter	pc;
+	unsigned long		high;
+
+	struct sgx_epc_lru	lru;
+	struct sgx_epc_cgroup	*reclaim_iter;
+	struct work_struct	reclaim_work;
+	unsigned int		epoch;
+};
+
+struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(struct mm_struct *mm,
+						 bool reclaim);
+void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg);
+bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root);
+void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root,
+				  int *nr_to_scan, struct list_head *dst);
+#endif
+
+#endif /* _INTEL_SGX_EPC_CGROUP_H_ */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 445235487230..ff7fbb3e057a 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -65,6 +65,10 @@ SUBSYS(rdma)
 SUBSYS(misc)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_SGX_EPC)
+SUBSYS(sgx_epc)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/init/Kconfig b/init/Kconfig
index 80fe60fa77fb..aba7502b40b0 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1178,6 +1178,18 @@ config CGROUP_MISC
 	  For more information, please check misc cgroup section in
 	  /Documentation/admin-guide/cgroup-v2.rst.
 
+config CGROUP_SGX_EPC
+	bool "Enclave Page Cache (EPC) controller for Intel SGX"
+	depends on X86_SGX && MEMCG
+	select PAGE_COUNTER
+	help
+	  Provides control over the EPC footprint of tasks in a cgroup.
+	  EPC is a subset of regular memory that is usable only by SGX
+	  enclaves and is very limited in quantity, e.g. less than 1%
+	  of total DRAM.
+
+	  Say N if unsure.
+
 config CGROUP_DEBUG
 	bool "Debug controller"
 	default n
-- 
2.37.3



* [RFC PATCH 18/20] x86/sgx: Enable EPC cgroup controller in SGX core
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add the appropriate calls to (un)charge a cgroup during EPC page
allocation and free, and to isolate pages for reclaim based on the
provided cgroup.
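
The resulting page lifecycle, condensed (a sketch; the reclaim and
error paths below follow the same pairing):

	/* allocation: charge first, then stash the cgroup in the page */
	epc_cg = sgx_epc_cgroup_try_charge(current->mm, reclaim);
	page = __sgx_alloc_epc_page();
	page->epc_cg = epc_cg;

	/* free or reclaim: uncharge and sever the association */
	if (page->epc_cg) {
		sgx_epc_cgroup_uncharge(page->epc_cg);
		page->epc_cg = NULL;
	}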

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/epc_cgroup.c |  2 +-
 arch/x86/kernel/cpu/sgx/main.c       | 65 +++++++++++++++++++++++++---
 arch/x86/kernel/cpu/sgx/sgx.h        |  7 ++-
 3 files changed, 65 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
index 0a61bb8548ff..71da3b499950 100644
--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
@@ -396,7 +396,7 @@ static int sgx_epc_cgroup_reclaim_pages(unsigned long nr_pages,
 	nr_pages = max(nr_pages, SGX_EPC_RECLAIM_MIN_PAGES);
 	nr_pages = min(nr_pages, SGX_EPC_RECLAIM_MAX_PAGES);
 
-	return sgx_reclaim_epc_pages(nr_pages, rc->ignore_age);
+	return sgx_reclaim_epc_pages(nr_pages, rc->ignore_age, rc->epc_cg);
 }
 
 static int sgx_epc_cgroup_reclaim_failed(struct sgx_epc_reclaim_control *rc)
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 082c08228840..29653a0d4670 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -17,6 +17,7 @@
 #include "driver.h"
 #include "encl.h"
 #include "encls.h"
+#include "epc_cgroup.h"
 
 #define SGX_MAX_NR_TO_RECLAIM	32
 
@@ -33,6 +34,10 @@ static DEFINE_XARRAY(sgx_epc_address_space);
 static struct sgx_epc_lru sgx_global_lru;
 static inline struct sgx_epc_lru *sgx_lru(struct sgx_epc_page *epc_page)
 {
+#ifdef CONFIG_CGROUP_SGX_EPC
+	if (epc_page->epc_cg)
+		return &epc_page->epc_cg->lru;
+#endif
 	return &sgx_global_lru;
 }
 
@@ -321,6 +326,7 @@ void sgx_isolate_epc_pages(struct sgx_epc_lru *lru, int *nr_to_scan,
  * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers
  * @nr_to_scan:		Number of EPC pages to scan for reclaim
  * @ignore_age:		Reclaim a page even if it is young
+ * @epc_cg:		EPC cgroup from which to reclaim
  *
  * Take a fixed number of pages from the head of the active page pool and
  * reclaim them to the enclave's private shmem files. Skip the pages, which have
@@ -336,7 +342,8 @@ void sgx_isolate_epc_pages(struct sgx_epc_lru *lru, int *nr_to_scan,
  *
  * Return: number of EPC pages reclaimed
  */
-int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age)
+int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age,
+			  struct sgx_epc_cgroup *epc_cg)
 {
 	struct sgx_backing backing[SGX_MAX_NR_TO_RECLAIM];
 	struct sgx_epc_page *epc_page, *tmp;
@@ -347,8 +354,17 @@ int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age)
 	int i = 0;
 	int ret;
 
-	sgx_isolate_epc_pages(&sgx_global_lru, &nr_to_scan, &iso);
+	/*
+	 * If a specific cgroup is not being targeted, take from the global
+	 * list first, even when cgroups are enabled.  If there are
+	 * pages on the global LRU then they should get reclaimed asap.
+	 */
+	if (!IS_ENABLED(CONFIG_CGROUP_SGX_EPC) || !epc_cg)
+		sgx_isolate_epc_pages(&sgx_global_lru, &nr_to_scan, &iso);
 
+#ifdef CONFIG_CGROUP_SGX_EPC
+	sgx_epc_cgroup_isolate_pages(epc_cg, &nr_to_scan, &iso);
+#endif
 	if (list_empty(&iso))
 		goto out;
 
@@ -394,6 +410,12 @@ int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age)
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIM_FLAGS;
 
+#ifdef CONFIG_CGROUP_SGX_EPC
+		if (epc_page->epc_cg) {
+			sgx_epc_cgroup_uncharge(epc_page->epc_cg);
+			epc_page->epc_cg = NULL;
+		}
+#endif
 		sgx_free_epc_page(epc_page);
 	}
 out:
@@ -403,7 +425,11 @@ int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age)
 
 static bool sgx_can_reclaim(void)
 {
+#ifdef CONFIG_CGROUP_SGX_EPC
+	return !sgx_epc_cgroup_lru_empty(NULL);
+#else
 	return !list_empty(&sgx_global_lru.reclaimable);
+#endif
 }
 
 static bool sgx_should_reclaim(unsigned long watermark)
@@ -420,7 +446,7 @@ static bool sgx_should_reclaim(unsigned long watermark)
 void sgx_reclaim_direct(void)
 {
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
-		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
+		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL);
 }
 
 static int ksgxd(void *p)
@@ -446,7 +472,7 @@ static int ksgxd(void *p)
 				     sgx_should_reclaim(SGX_NR_HIGH_PAGES));
 
 		if (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
-			sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
+			sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL);
 	}
 
 	return 0;
@@ -600,7 +626,13 @@ int sgx_drop_epc_page(struct sgx_epc_page *page)
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 {
 	struct sgx_epc_page *page;
+#ifdef CONFIG_CGROUP_SGX_EPC
+	struct sgx_epc_cgroup *epc_cg;
 
+	epc_cg = sgx_epc_cgroup_try_charge(current->mm, reclaim);
+	if (IS_ERR(epc_cg))
+		return ERR_CAST(epc_cg);
+#endif
 	for ( ; ; ) {
 		page = __sgx_alloc_epc_page();
 		if (!IS_ERR(page)) {
@@ -608,8 +640,10 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 			break;
 		}
 
-		if (!sgx_can_reclaim())
-			return ERR_PTR(-ENOMEM);
+		if (!sgx_can_reclaim()) {
+			page = ERR_PTR(-ENOMEM);
+			break;
+		}
 
 		if (!reclaim) {
 			page = ERR_PTR(-EBUSY);
@@ -621,9 +655,17 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 			break;
 		}
 
-		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
+		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL);
 	}
 
+#ifdef CONFIG_CGROUP_SGX_EPC
+	if (!IS_ERR(page)) {
+		WARN_ON(page->epc_cg);
+		page->epc_cg = epc_cg;
+	} else {
+		sgx_epc_cgroup_uncharge(epc_cg);
+	}
+#endif
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
 		wake_up(&ksgxd_waitq);
 
@@ -654,6 +696,12 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 	page->flags = SGX_EPC_PAGE_IS_FREE;
 
 	spin_unlock(&node->lock);
+#ifdef CONFIG_CGROUP_SGX_EPC
+	if (page->epc_cg) {
+		sgx_epc_cgroup_uncharge(page->epc_cg);
+		page->epc_cg = NULL;
+	}
+#endif
 	atomic_long_inc(&sgx_nr_free_pages);
 }
 
@@ -818,6 +866,9 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 		section->pages[i].flags = 0;
 		section->pages[i].owner = NULL;
 		section->pages[i].poison = 0;
+#ifdef CONFIG_CGROUP_SGX_EPC
+		section->pages[i].epc_cg = NULL;
+#endif
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index a4c7ee0a4958..3ea96779dd28 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -39,6 +39,7 @@
 					 SGX_EPC_PAGE_RECLAIM_IN_PROGRESS | \
 					 SGX_EPC_PAGE_ENCLAVE | \
 					 SGX_EPC_PAGE_VERSION_ARRAY)
+struct sgx_epc_cgroup;
 
 struct sgx_epc_page {
 	unsigned int section;
@@ -46,6 +47,9 @@ struct sgx_epc_page {
 	u16 poison;
 	void *owner;
 	struct list_head list;
+#ifdef CONFIG_CGROUP_SGX_EPC
+	struct sgx_epc_cgroup *epc_cg;
+#endif
 };
 
 /*
@@ -113,7 +117,8 @@ void sgx_reclaim_direct(void);
 void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags);
 int sgx_drop_epc_page(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
-int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age);
+int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age,
+			  struct sgx_epc_cgroup *epc_cg);
 void sgx_isolate_epc_pages(struct sgx_epc_lru *lru, int *nr_to_scan,
 			   struct list_head *dst);
 bool sgx_epc_oom(struct sgx_epc_lru *lru);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 43+ messages in thread
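
Condensed, the charge lifecycle the hunks above establish looks roughly
like this (a simplified sketch reusing the names from the diff; the
allocation retry loop and most error handling are trimmed):

/* Sketch only: charge before allocating, pin the charge to the page. */
struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
{
	struct sgx_epc_cgroup *epc_cg;
	struct sgx_epc_page *page;

	/* Charge the calling task's EPC cgroup up front. */
	epc_cg = sgx_epc_cgroup_try_charge(current->mm, reclaim);
	if (IS_ERR(epc_cg))
		return ERR_CAST(epc_cg);

	page = __sgx_alloc_epc_page();	/* the real code may loop and reclaim */
	if (IS_ERR(page)) {
		/* Allocation failed: give the charge back. */
		sgx_epc_cgroup_uncharge(epc_cg);
		return page;
	}

	page->owner = owner;
	page->epc_cg = epc_cg;	/* the page holds the charge from here on */
	return page;
}

void sgx_free_epc_page(struct sgx_epc_page *page)
{
	if (page->epc_cg) {
		sgx_epc_cgroup_uncharge(page->epc_cg);	/* drop the charge */
		page->epc_cg = NULL;
	}
	/* ...then return the page to its section's free list, as before. */
}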

* [RFC PATCH 19/20] x86/sgx: Add stats and events interfaces to EPC cgroup controller
  2022-09-22 17:10 [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Kristen Carlson Accardi
                   ` (17 preceding siblings ...)
  2022-09-22 17:10 ` [RFC PATCH 18/20] x86/sgx: Enable EPC cgroup controller in SGX core Kristen Carlson Accardi
@ 2022-09-22 17:10 ` Kristen Carlson Accardi
  2022-09-22 17:10 ` [RFC PATCH 20/20] docs, cgroup, x86/sgx: Add SGX EPC cgroup controller documentation Kristen Carlson Accardi
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Kristen Carlson Accardi, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Enable the cgroup sgx_epc.stats and sgx_epc.events files and
associated counters.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/epc_cgroup.c | 134 +++++++++++++++++++++++++--
 arch/x86/kernel/cpu/sgx/epc_cgroup.h |  16 +++-
 arch/x86/kernel/cpu/sgx/main.c       |   6 +-
 3 files changed, 145 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
index 71da3b499950..8541029b86be 100644
--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
@@ -77,6 +77,43 @@ static struct sgx_epc_cgroup *parent_epc_cgroup(struct sgx_epc_cgroup *epc_cg)
 	return sgx_epc_cgroup_from_css(epc_cg->css.parent);
 }
 
+static inline unsigned long sgx_epc_cgroup_cnt_read(struct sgx_epc_cgroup *epc_cg,
+						    enum sgx_epc_cgroup_counter i)
+{
+	return atomic_long_read(&epc_cg->cnt[i]);
+}
+
+static inline void sgx_epc_cgroup_cnt_reset(struct sgx_epc_cgroup *epc_cg,
+					    enum sgx_epc_cgroup_counter i)
+{
+	atomic_long_set(&epc_cg->cnt[i], 0);
+}
+
+static inline void sgx_epc_cgroup_cnt_add(struct sgx_epc_cgroup *epc_cg,
+					  enum sgx_epc_cgroup_counter i,
+					  unsigned long cnt)
+{
+	atomic_long_add(cnt, &epc_cg->cnt[i]);
+}
+
+static inline void sgx_epc_cgroup_event(struct sgx_epc_cgroup *epc_cg,
+					enum sgx_epc_cgroup_counter i,
+					unsigned long cnt)
+{
+	sgx_epc_cgroup_cnt_add(epc_cg, i, cnt);
+
+	if (i == SGX_EPC_CGROUP_LOW || i == SGX_EPC_CGROUP_HIGH ||
+	    i == SGX_EPC_CGROUP_MAX)
+		cgroup_file_notify(&epc_cg->events_file);
+}
+
+static inline void sgx_epc_cgroup_cnt_sub(struct sgx_epc_cgroup *epc_cg,
+					  enum sgx_epc_cgroup_counter i,
+					  unsigned long cnt)
+{
+	atomic_long_sub(cnt, &epc_cg->cnt[i]);
+}
+
 /**
  * sgx_epc_cgroup_iter - iterate over the EPC cgroup hierarchy
  * @root:		hierarchy root
@@ -368,7 +405,9 @@ void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root,
                          */
                         if (!sgx_epc_cgroup_all_in_use_are_low(root))
                                 continue;
+			sgx_epc_cgroup_event(epc_cg, SGX_EPC_CGROUP_LOW, 1);
                 }
+		sgx_epc_cgroup_event(epc_cg, SGX_EPC_CGROUP_RECLAMATIONS, 1);
 
                 sgx_isolate_epc_pages(&epc_cg->lru, nr_to_scan, dst);
                 if (!*nr_to_scan) {
@@ -383,8 +422,11 @@ void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root,
 }
 
 static int sgx_epc_cgroup_reclaim_pages(unsigned long nr_pages,
-					struct sgx_epc_reclaim_control *rc)
+					struct sgx_epc_reclaim_control *rc,
+					enum sgx_epc_cgroup_counter c)
 {
+	sgx_epc_cgroup_event(rc->epc_cg, c, 1);
+
 	/*
 	 * Ensure sgx_reclaim_pages is called with a minimum and maximum
 	 * number of pages.  Attempting to reclaim only a few pages will
@@ -434,7 +476,8 @@ static inline void __sgx_epc_cgroup_reclaim_high(struct sgx_epc_cgroup *epc_cg)
 		if (cur <= high)
 			break;
 
-		if (!sgx_epc_cgroup_reclaim_pages(cur - high, &rc)) {
+		if (!sgx_epc_cgroup_reclaim_pages(cur - high, &rc,
+						  SGX_EPC_CGROUP_HIGH)) {
 			if (sgx_epc_cgroup_reclaim_failed(&rc))
 				break;
 		}
@@ -494,7 +537,8 @@ static void sgx_epc_cgroup_reclaim_work_func(struct work_struct *work)
 		if (cur <= max)
 			break;
 
-		if (!sgx_epc_cgroup_reclaim_pages(cur - max, &rc)) {
+		if (!sgx_epc_cgroup_reclaim_pages(cur - max, &rc,
+						  SGX_EPC_CGROUP_MAX)) {
 			if (sgx_epc_cgroup_reclaim_failed(&rc))
 				break;
 		}
@@ -539,7 +583,8 @@ static int __sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg,
 		over = ((cur + nr_pages) > max) ?
 			(cur + nr_pages) - max : SGX_EPC_RECLAIM_MIN_PAGES;
 
-		if (!sgx_epc_cgroup_reclaim_pages(over, &rc)) {
+		if (!sgx_epc_cgroup_reclaim_pages(over, &rc,
+						  SGX_EPC_CGROUP_MAX)) {
 			if (sgx_epc_cgroup_reclaim_failed(&rc)) {
 				if (++nr_empty > SGX_EPC_RECLAIM_OOM_THRESHOLD)
 					return -ENOMEM;
@@ -586,6 +631,8 @@ struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(struct mm_struct *mm,
 
 	if (ret)
 		return ERR_PTR(ret);
+
+	sgx_epc_cgroup_cnt_add(epc_cg, SGX_EPC_CGROUP_PAGES, 1);
 	return epc_cg;
 }
 
@@ -593,13 +640,17 @@ struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(struct mm_struct *mm,
  * sgx_epc_cgroup_uncharge - hierarchically uncharge EPC pages
  * @epc_cg:	the charged epc cgroup
  * @nr_pages:	the number of pages to uncharge
+ * @reclaimed:	whether the pages were reclaimed (vs. freed)
  */
-void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg)
+void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg, bool reclaimed)
 {
 	if (sgx_epc_cgroup_disabled())
 		return;
 
 	page_counter_uncharge(&epc_cg->pc, 1);
+	sgx_epc_cgroup_cnt_sub(epc_cg, SGX_EPC_CGROUP_PAGES, 1);
+	if (reclaimed)
+		sgx_epc_cgroup_event(epc_cg, SGX_EPC_CGROUP_RECLAIMED, 1);
 
 	if (epc_cg != root_epc_cgroup)
 		css_put_many(&epc_cg->css, 1);
@@ -665,6 +716,61 @@ static u64 sgx_epc_current_read(struct cgroup_subsys_state *css,
 	return (u64)page_counter_read(&epc_cg->pc) * PAGE_SIZE;
 }
 
+static int sgx_epc_stats_show(struct seq_file *m, void *v)
+{
+	struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(seq_css(m));
+	unsigned long cur, dir, rec, recs;
+
+	cur = page_counter_read(&epc_cg->pc);
+	dir = sgx_epc_cgroup_cnt_read(epc_cg, SGX_EPC_CGROUP_PAGES);
+	rec = sgx_epc_cgroup_cnt_read(epc_cg, SGX_EPC_CGROUP_RECLAIMED);
+	recs = sgx_epc_cgroup_cnt_read(epc_cg, SGX_EPC_CGROUP_RECLAMATIONS);
+
+	seq_printf(m, "pages            %lu\n", cur);
+	seq_printf(m, "direct           %lu\n", dir);
+	seq_printf(m, "indirect         %lu\n", (cur - dir));
+	seq_printf(m, "reclaimed        %lu\n", rec);
+	seq_printf(m, "reclamations     %lu\n", recs);
+
+	return 0;
+}
+
+static ssize_t sgx_epc_stats_reset(struct kernfs_open_file *of,
+				   char *buf, size_t nbytes, loff_t off)
+{
+	struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(of_css(of));
+	sgx_epc_cgroup_cnt_reset(epc_cg, SGX_EPC_CGROUP_RECLAIMED);
+	sgx_epc_cgroup_cnt_reset(epc_cg, SGX_EPC_CGROUP_RECLAMATIONS);
+	return nbytes;
+}
+
+static int sgx_epc_events_show(struct seq_file *m, void *v)
+{
+	struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(seq_css(m));
+	unsigned long low, high, max;
+
+	low  = sgx_epc_cgroup_cnt_read(epc_cg, SGX_EPC_CGROUP_LOW);
+	high = sgx_epc_cgroup_cnt_read(epc_cg, SGX_EPC_CGROUP_HIGH);
+	max  = sgx_epc_cgroup_cnt_read(epc_cg, SGX_EPC_CGROUP_MAX);
+
+	seq_printf(m, "low      %lu\n", low);
+	seq_printf(m, "high     %lu\n", high);
+	seq_printf(m, "max      %lu\n", max);
+
+	return 0;
+}
+
+static ssize_t sgx_epc_events_reset(struct kernfs_open_file *of,
+				    char *buf, size_t nbytes, loff_t off)
+{
+	struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(of_css(of));
+	sgx_epc_cgroup_cnt_reset(epc_cg, SGX_EPC_CGROUP_LOW);
+	sgx_epc_cgroup_cnt_reset(epc_cg, SGX_EPC_CGROUP_HIGH);
+	sgx_epc_cgroup_cnt_reset(epc_cg, SGX_EPC_CGROUP_MAX);
+	return nbytes;
+}
+
 static int sgx_epc_low_show(struct seq_file *m, void *v)
 {
 	struct sgx_epc_cgroup *epc_cg = sgx_epc_cgroup_from_css(seq_css(m));
@@ -733,7 +839,8 @@ static ssize_t sgx_epc_high_write(struct kernfs_open_file *of,
 		if (signal_pending(current))
 			break;
 
-		if (!sgx_epc_cgroup_reclaim_pages(cur - high, &rc)) {
+		if (!sgx_epc_cgroup_reclaim_pages(cur - high, &rc,
+						  SGX_EPC_CGROUP_HIGH)) {
 			if (sgx_epc_cgroup_reclaim_failed(&rc))
 				break;
 		}
@@ -782,7 +889,8 @@ static ssize_t sgx_epc_max_write(struct kernfs_open_file *of, char *buf,
 		if (signal_pending(current))
 			break;
 
-		if (!sgx_epc_cgroup_reclaim_pages(cur - max, &rc)) {
+		if (!sgx_epc_cgroup_reclaim_pages(cur - max, &rc,
+						  SGX_EPC_CGROUP_MAX)) {
 			if (sgx_epc_cgroup_reclaim_failed(&rc)) {
 				if (++nr_empty > SGX_EPC_RECLAIM_OOM_THRESHOLD)
 					sgx_epc_cgroup_oom(epc_cg);
@@ -799,6 +907,18 @@ static struct cftype sgx_epc_cgroup_files[] = {
 		.name = "current",
 		.read_u64 = sgx_epc_current_read,
 	},
+	{
+		.name = "stats",
+		.seq_show = sgx_epc_stats_show,
+		.write = sgx_epc_stats_reset,
+	},
+	{
+		.name = "events",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.file_offset = offsetof(struct sgx_epc_cgroup, events_file),
+		.seq_show = sgx_epc_events_show,
+		.write = sgx_epc_events_reset,
+	},
 	{
 		.name = "low",
 		.flags = CFTYPE_NOT_ON_ROOT,
diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.h b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
index 226304a3d523..656c9f386b48 100644
--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.h
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
@@ -14,6 +14,16 @@
 #ifndef CONFIG_CGROUP_SGX_EPC
 struct sgx_epc_cgroup;
 #else
+enum sgx_epc_cgroup_counter {
+	SGX_EPC_CGROUP_PAGES,
+	SGX_EPC_CGROUP_RECLAIMED,
+	SGX_EPC_CGROUP_RECLAMATIONS,
+	SGX_EPC_CGROUP_LOW,
+	SGX_EPC_CGROUP_HIGH,
+	SGX_EPC_CGROUP_MAX,
+	SGX_EPC_CGROUP_NR_COUNTERS,
+};
+
 struct sgx_epc_cgroup {
 	struct cgroup_subsys_state	css;
 
@@ -24,11 +34,15 @@ struct sgx_epc_cgroup {
 	struct sgx_epc_cgroup	*reclaim_iter;
 	struct work_struct	reclaim_work;
 	unsigned int		epoch;
+
+	atomic_long_t           cnt[SGX_EPC_CGROUP_NR_COUNTERS];
+
+	struct cgroup_file      events_file;
 };
 
 struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(struct mm_struct *mm,
 						 bool reclaim);
-void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg);
+void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg, bool reclaimed);
 bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root);
 void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root,
 				  int *nr_to_scan, struct list_head *dst);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 29653a0d4670..3330ed4d0d43 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -412,7 +412,7 @@ int sgx_reclaim_epc_pages(int nr_to_scan, bool ignore_age,
 
 #ifdef CONFIG_CGROUP_SGX_EPC
 		if (epc_page->epc_cg) {
-			sgx_epc_cgroup_uncharge(epc_page->epc_cg);
+			sgx_epc_cgroup_uncharge(epc_page->epc_cg, true);
 			epc_page->epc_cg = NULL;
 		}
 #endif
@@ -663,7 +663,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 		WARN_ON(page->epc_cg);
 		page->epc_cg = epc_cg;
 	} else {
-		sgx_epc_cgroup_uncharge(epc_cg);
+		sgx_epc_cgroup_uncharge(epc_cg, false);
 	}
 #endif
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
@@ -698,7 +698,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 	spin_unlock(&node->lock);
 #ifdef CONFIG_CGROUP_SGX_EPC
 	if (page->epc_cg) {
-		sgx_epc_cgroup_uncharge(page->epc_cg);
+		sgx_epc_cgroup_uncharge(page->epc_cg, false);
 		page->epc_cg = NULL;
 	}
 #endif
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 43+ messages in thread
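
Because a write to sgx_epc.stats resets the resettable counters, a
monitor can read-then-reset to collect per-interval deltas. A
hypothetical userspace sampler (the cgroup path and helper name are
illustrative, not part of the patch):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical cgroup path; adjust for the cgroup being sampled. */
#define STATS_PATH "/sys/fs/cgroup/myjob/sgx_epc.stats"

/* Read "reclaimed" since the last reset, then reset the counters. */
static int sample_reclaimed(unsigned long *reclaimed)
{
	char buf[256];
	char *line;
	ssize_t n;
	int fd;

	fd = open(STATS_PATH, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (n < 0)
		return -1;
	buf[n] = '\0';

	/* Flat-keyed file: look values up by key, never by position. */
	line = strstr(buf, "reclaimed");
	if (!line || sscanf(line, "reclaimed %lu", reclaimed) != 1)
		return -1;

	/* Any write resets the resettable counters. */
	fd = open(STATS_PATH, O_WRONLY);
	if (fd < 0)
		return -1;
	n = write(fd, "0\n", 2);
	close(fd);
	return n < 0 ? -1 : 0;
}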

* [RFC PATCH 20/20] docs, cgroup, x86/sgx: Add SGX EPC cgroup controller documentation
  2022-09-22 17:10 [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Kristen Carlson Accardi
                   ` (18 preceding siblings ...)
  2022-09-22 17:10 ` [RFC PATCH 19/20] x86/sgx: Add stats and events interfaces to EPC cgroup controller Kristen Carlson Accardi
@ 2022-09-22 17:10 ` Kristen Carlson Accardi
  2022-09-22 17:41 ` [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Tejun Heo
  2022-09-23 12:24 ` Jarkko Sakkinen
  21 siblings, 0 replies; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-22 17:10 UTC (permalink / raw)
  To: linux-kernel, linux-sgx, cgroups, Tejun Heo, Zefan Li,
	Johannes Weiner, Jonathan Corbet
  Cc: Kristen Carlson Accardi, Sean Christopherson, linux-doc

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add initial documentation for the SGX EPC cgroup controller,
which regulates distribution of SGX Enclave Page Cache (EPC) memory.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 201 ++++++++++++++++++++++++
 1 file changed, 201 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index be4a77baf784..c355cb08fc18 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -71,6 +71,10 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou
        5.9-2 Migration and Ownership
      5-10. Others
        5-10-1. perf_event
+     5-11. SGX EPC
+       5-11-1. SGX EPC Interface Files
+       5-11-2. Usage Guidelines
+       5-11-3. Migration
      5-N. Non-normative information
        5-N-1. CPU controller root cgroup process behaviour
        5-N-2. IO controller root cgroup process behaviour
@@ -2440,6 +2444,203 @@ always be filtered by cgroup v2 path.  The controller can still be
 moved to a legacy hierarchy after v2 hierarchy is populated.
 
 
+SGX EPC
+-------
+
+The "sgx_epc" controller regulates distribution of SGX EPC memory,
+which is a subset of system RAM that is used to provide SGX-enabled
+applications with protected memory, and is otherwise inaccessible,
+i.e. shows up as reserved in /proc/iomem and cannot be read/written
+outside of an SGX enclave.
+
+Although current systems implement EPC by stealing memory from RAM,
+for all intents and purposes the EPC is independent from normal system
+memory, e.g. must be reserved at boot from RAM and cannot be converted
+between EPC and normal memory while the system is running.  The EPC is
+managed by the SGX subsystem and is not accounted by the memory
+controller.  Note that this is true only for EPC memory itself, i.e.
+normal memory allocations related to SGX and EPC memory, e.g. the
+backing memory for evicted EPC pages, are accounted, limited and
+protected by the memory controller.
+
+Much like normal system memory, EPC memory can be overcommitted via
+virtual memory techniques and pages can be swapped out of the EPC
+to their backing store (normal system memory allocated via shmem).
+The SGX EPC subsystem is analogous to the memory subsystem, and the
+SGX EPC controller is in turn analogous to the memory controller;
+it implements limit and protection models for EPC memory.
+
+See Documentation/x86/sgx.rst for more info on SGX and EPC.
+
+SGX EPC Interface Files
+~~~~~~~~~~~~~~~~~~~~~~~
+
+All SGX EPC memory amounts are in bytes unless explicitly stated
+otherwise.  If a value which is not PAGE_SIZE aligned is written,
+the actual value used by the controller will be rounded down to
+the closest PAGE_SIZE multiple.
+
+  sgx_epc.current
+
+	A read-only single value file which exists on all cgroups.
+
+	The total amount of EPC memory currently being used by the
+	cgroup and its descendants.
+
+  sgx_epc.low
+
+	A read-write single value file which exists on non-root
+	cgroups.  The default is "0".
+
+	Best-effort protection of EPC usage.  If the EPC usage of a
+	cgroup is below its limits, and all its ancestors are below
+	cgroup is below its low limit, and all its ancestors are below
+	unless EPC cannot be reclaimed from unprotected cgroups,
+	e.g. all sibling cgroups are also below their low limit.
+
+	Setting low to a value more than the amount of EPC available
+	is discouraged.  The low limit is effectively ignored if the
+	cgroup's high or max limit is less than its low limit.
+
+  sgx_epc.high
+
+	A read-write single value file which exists on non-root
+	cgroups.  The default is "max".
+
+	EPC usage best-effort limit.  This is the main mechanism to
+	control EPC usage of a cgroup.  If a cgroup's usage goes
+	over the high boundary, EPC pages will be reclaimed from
+	the cgroup until it is back under the high limit.
+
+	Going over the high limit does not prevent allocation of
+	additional EPC pages, e.g. EPC usage will often spike above
+	the high limit during enclave creation, when a large number
+	of EPC pages are EADDed in a short period.
+
+  sgx_epc.max
+
+	A read-write single value file which exists on non-root
+	cgroups.  The default is "max".
+
+	EPC usage hard limit.  If a cgroup's EPC usage reaches this
+	limit, EPC allocations, e.g. for page fault handling, will
+	be blocked until EPC can be reclaimed from the cgroup.  If
+	EPC cannot be reclaimed in a timely manner, reclaim will be
+	forced, e.g. by ignoring LRU.
+
+	The max limit is intended to be a last line of defense; it
+	should rarely come into play on a properly configured and
+	monitored system.
+
+  sgx_epc.stats
+
+	A read-write flat-keyed file which exists on all cgroups.
+	Reads from the file display the cgroup's statistics, while
+	writes reset the underlying counters (if applicable).
+
+	The entries are ordered to be human readable, and new entries
+	can show up in the middle.  Don't rely on items remaining in a
+	fixed position; use the keys to look up specific values!
+
+	The following entries are defined.
+
+	  pages
+
+		The total number of pages currently being used by the
+		cgroup and its descendants, i.e. sgx_epc.current / 4096.
+
+	  direct
+
+		The number of pages currently being used by the cgroup
+		itself, excluding its descendants.
+
+	  indirect
+
+		The number of pages currently being used by the cgroup's
+		descendants, excluding its own pages.
+
+	  reclaimed
+
+		The number of pages that have been reclaimed from the
+		cgroup (since sgx_epc.stats was last reset).
+
+	  reclamations
+
+		The number of times this cgroup's LRU lists have been
+		scanned for reclaim, i.e. the number of times the cgroup
+		has been selected for reclaim via any code path.
+
+  sgx_epc.events
+
+	A read-write flat-keyed file which exists on non-root cgroups.
+	Writes to the file reset the event counters to zero.  A value
+	change in this file generates a file modified event.
+
+	The following entries are defined.
+
+	  low
+
+		The number of times the cgroup has been reclaimed even
+		though its usage is under the low boundary, e.g. due to
+		all sibling cgroups also being low.  This event usually
+		indicates that the low boundary is over-committed.
+
+	  high
+
+		The number of times the cgroup has triggered a reclaim
+		due to its EPC usage exceeding its high EPC boundary.
+		This event is expected for cgroups whose EPC usage is
+		capped by its high limit rather than global pressure.
+
+	  max
+
+		The number of times the cgroup has triggered a reclaim
+		due to its EPC usage approaching (or exceeding) its max
+		EPC boundary.
+
+Usage Guidelines
+~~~~~~~~~~~~~~~~
+
+"sgx_epc.high" and "sgx_epc.low" are the main mechanisms to control
+EPC usage; using "sgx_epc.max" as anything other than a safety net
+is inadvisable, as SGX application performance will suffer greatly if
+a process encounters its max limit.  Because a cgroup is allowed to
+breach its high limit, e.g. to fault in a page, performance is not
+artificially limited, whereas the max limit will effectively block
+a faulting application until the kernel can reclaim EPC memory from
+the cgroup.
+
+Exactly how "sgx_epc.high" is utilized will vary case by case, i.e.
+there is no one "correct" strategy.  Deferring to global EPC memory
+pressure, e.g. by overcommitting on the high limit, may be the most
+effective approach for a particular situation, whereas a different
+scenario might warrant a more draconian usage of the high limit.
+Regardless of the strategy used, because breach of the high limit
+does not cause processes to block or be killed, a management agent
+has ample opportunity to monitor and react as needed, e.g. it can
+raise the offending cgroup's high limit or terminate the workload.
+
+Similarly, "sgx_epc.low" can play different roles depending on the
+situation, e.g. it can be set to a relatively high value to protect
+a mission critical workload, or it may be used to reserve a minimal
+amount of EPC memory simply to ensure forward progress.  Employing
+"sgx_epc.low" in some capacity is generally recommended, especially
+when overcommitting "sgx_epc.high", as it is relatively common for
+a system to be under heavy EPC pressure; this holds true even on a
+carefully tuned system, as initializing an enclave requires all of
+the enclave's pages be brought into the EPC at some point prior to
+initialization, if only temporarily.
+
+Migration
+~~~~~~~~~
+
+Once an EPC page is charged to a cgroup (during allocation), it
+remains charged to the original cgroup until the page is released
+or reclaimed.  Migrating a process to a different cgroup doesn't
+move the EPC charges that it incurred while in the previous cgroup
+to its new cgroup.
+
+
 Non-normative information
 -------------------------
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 43+ messages in thread
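
The PAGE_SIZE round-down described in the interface section maps to the
usual page-granular limit parsing. In kernel terms it could look roughly
like this (a sketch, not code from the series; page_counter_set_max(),
kstrtoul() and strstrip() are existing kernel helpers, while the
function name is hypothetical):

/* Sketch: parse a byte limit written by userspace into whole EPC pages. */
static int sgx_epc_parse_limit(struct sgx_epc_cgroup *epc_cg, char *buf)
{
	unsigned long bytes, pages;

	buf = strstrip(buf);
	if (!strcmp(buf, "max")) {
		pages = PAGE_COUNTER_MAX;
	} else {
		if (kstrtoul(buf, 10, &bytes))
			return -EINVAL;
		pages = bytes / PAGE_SIZE;	/* round down to a page multiple */
	}

	page_counter_set_max(&epc_cg->pc, pages);
	return 0;
}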

* Re: [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory
  2022-09-22 17:10 [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Kristen Carlson Accardi
                   ` (19 preceding siblings ...)
  2022-09-22 17:10 ` [RFC PATCH 20/20] docs, cgroup, x86/sgx: Add SGX EPC cgroup controller documentation Kristen Carlson Accardi
@ 2022-09-22 17:41 ` Tejun Heo
  2022-09-22 18:59   ` Kristen Carlson Accardi
  2022-09-23 12:24 ` Jarkko Sakkinen
  21 siblings, 1 reply; 43+ messages in thread
From: Tejun Heo @ 2022-09-22 17:41 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-kernel, linux-sgx, cgroups, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song

Hello,

(cc'ing memcg folks)

On Thu, Sep 22, 2022 at 10:10:37AM -0700, Kristen Carlson Accardi wrote:
> Add a new cgroup controller to regulate the distribution of SGX EPC memory,
> which is a subset of system RAM that is used to provide SGX-enabled
> applications with protected memory, and is otherwise inaccessible.
> 
> SGX EPC memory allocations are separate from normal RAM allocations,
> and is managed solely by the SGX subsystem. The existing cgroup memory
> controller cannot be used to limit or account for SGX EPC memory.
> 
> This patchset implements the sgx_epc cgroup controller, which will provide
> support for stats, events, and the following interface files:
> 
> sgx_epc.current
> 	A read-only value which represents the total amount of EPC
> 	memory currently being used on by the cgroup and its descendents.
> 
> sgx_epc.low
> 	A read-write value which is used to set best-effort protection
> 	of EPC usage. If the EPC usage of a cgroup drops below this value,
> 	then the cgroup's EPC memory will not be reclaimed if possible.
> 
> sgx_epc.high
> 	A read-write value which is used to set a best-effort limit
> 	on the amount of EPC usage a cgroup has. If a cgroup's usage
> 	goes past the high value, the EPC memory of that cgroup will
> 	get reclaimed back under the high limit.
> 
> sgx_epc.max
> 	A read-write value which is used to set a hard limit for
> 	cgroup EPC usage. If a cgroup's EPC usage reaches this limit,
> 	allocations are blocked until EPC memory can be reclaimed from
> 	the cgroup.

I don't know how SGX uses its memory but you said in the other message that
it's usually a really small portion of the memory and glancing the code it
looks like its own page aging and all. Can you give some concrete examples
on how it's used and why we need cgroup support for it? Also, do you really
need all three control knobs here? e.g. given that .high is only really
useful in conjunction with memory pressure and oom handling from userspace,
I don't see how this would actually be useful for something like this.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 02/20] x86/sgx: Store EPC page owner as a 'void *' to handle multiple users
  2022-09-22 17:10 ` [RFC PATCH 02/20] x86/sgx: Store EPC page owner as a 'void *' to handle multiple users Kristen Carlson Accardi
@ 2022-09-22 18:54   ` Dave Hansen
  2022-09-23 12:49   ` Jarkko Sakkinen
  1 sibling, 0 replies; 43+ messages in thread
From: Dave Hansen @ 2022-09-22 18:54 UTC (permalink / raw)
  To: Kristen Carlson Accardi, linux-kernel, linux-sgx, cgroups,
	Jarkko Sakkinen, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin
  Cc: Sean Christopherson

On 9/22/22 10:10, Kristen Carlson Accardi wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> A future patch will use the owner field for either a pointer to
> a struct sgx_encl, or a struct sgx_encl_page.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> Cc: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kernel/cpu/sgx/sgx.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index 0f2020653fba..5a7e858a8f98 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -33,7 +33,7 @@ struct sgx_epc_page {
>  	unsigned int section;
>  	u16 flags;
>  	u16 poison;
> -	struct sgx_encl_page *owner;
> +	void *owner;
>  	struct list_head list;
>  };
>  

We normally handle these with a union.  I'd probably do something like
this instead:

-	struct sgx_encl_page *owner;
+	union owner {
+		struct sgx_encl	     *o_encl;
+		struct sgx_encl_page *o_page;
+	} owner;

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 03/20] x86/sgx: Track owning enclave in VA EPC pages
  2022-09-22 17:10 ` [RFC PATCH 03/20] x86/sgx: Track owning enclave in VA EPC pages Kristen Carlson Accardi
@ 2022-09-22 18:55   ` Dave Hansen
  2022-09-22 20:04     ` Kristen Carlson Accardi
  2022-09-23 12:52   ` Jarkko Sakkinen
  1 sibling, 1 reply; 43+ messages in thread
From: Dave Hansen @ 2022-09-22 18:55 UTC (permalink / raw)
  To: Kristen Carlson Accardi, linux-kernel, linux-sgx, cgroups,
	Jarkko Sakkinen, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin
  Cc: Sean Christopherson

On 9/22/22 10:10, Kristen Carlson Accardi wrote:
> -struct sgx_epc_page *sgx_alloc_va_page(bool reclaim)
> +struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim)
>  {
>  	struct sgx_epc_page *epc_page;
>  	int ret;
> @@ -1218,6 +1219,8 @@ struct sgx_epc_page *sgx_alloc_va_page(bool reclaim)
>  		return ERR_PTR(-EFAULT);
>  	}
>  
> +	epc_page->owner = encl;
> +
>  	return epc_page;
>  }

BTW, is there a flag or any other way to tell to what kind of object
->owner points?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory
  2022-09-22 17:41 ` [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Tejun Heo
@ 2022-09-22 18:59   ` Kristen Carlson Accardi
  2022-09-22 19:08     ` Tejun Heo
  0 siblings, 1 reply; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-22 18:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, linux-sgx, cgroups, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song

On Thu, 2022-09-22 at 07:41 -1000, Tejun Heo wrote:
> Hello,
> 
> (cc'ing memcg folks)
> 
> On Thu, Sep 22, 2022 at 10:10:37AM -0700, Kristen Carlson Accardi
> wrote:
> > Add a new cgroup controller to regulate the distribution of SGX EPC
> > memory,
> > which is a subset of system RAM that is used to provide SGX-enabled
> > applications with protected memory, and is otherwise inaccessible.
> > 
> > SGX EPC memory allocations are separate from normal RAM
> > allocations,
> > and is managed solely by the SGX subsystem. The existing cgroup
> > memory
> > controller cannot be used to limit or account for SGX EPC memory.
> > 
> > This patchset implements the sgx_epc cgroup controller, which will
> > provide
> > support for stats, events, and the following interface files:
> > 
> > sgx_epc.current
> >         A read-only value which represents the total amount of EPC
> >         memory currently being used on by the cgroup and its
> > descendents.
> > 
> > sgx_epc.low
> >         A read-write value which is used to set best-effort
> > protection
> >         of EPC usage. If the EPC usage of a cgroup drops below this
> > value,
> >         then the cgroup's EPC memory will not be reclaimed if
> > possible.
> > 
> > sgx_epc.high
> >         A read-write value which is used to set a best-effort limit
> >         on the amount of EPC usage a cgroup has. If a cgroup's
> > usage
> >         goes past the high value, the EPC memory of that cgroup
> > will
> >         get reclaimed back under the high limit.
> > 
> > sgx_epc.max
> >         A read-write value which is used to set a hard limit for
> >         cgroup EPC usage. If a cgroup's EPC usage reaches this
> > limit,
> >         allocations are blocked until EPC memory can be reclaimed
> > from
> >         the cgroup.
> 
> I don't know how SGX uses its memory but you said in the other
> message that
> it's usually a really small portion of the memory and glancing the
> code it
> looks like its own page aging and all. Can you give some concrete
> examples
> on how it's used and why we need cgroup support for it? Also, do you
> really
> need all three control knobs here? e.g. given that .high is only
> really
> useful in conjunction with memory pressure and oom handling from
> userspace,
> I don't see how this would actually be useful for something like
> this.
> 
> Thanks.
> 

Thanks for your question. The SGX EPC memory is a global shared
resource that can be over committed. The SGX EPC controller should be
used similarly to the normal memory controller. Normally when there is
pressure on EPC memory, the reclaimer thread will write out pages from
EPC memory to a backing RAM that is allocated per enclave. It is
possible currently for even a single enclave to force all the other
enclaves to have their epc pages written to backing RAM by allocating
all the available system EPC memory. This can cause performance issues
for the enclaves when they have to fault their pages back in.

The sgx_epc.high value will help control the EPC usage of the cgroup. The
SGX reclaimer will use it to prevent the total EPC usage of a
cgroup from exceeding this value (best effort). This way, if a system
administrator would like to try to prevent single enclaves, or groups
of enclaves from allocating all of the EPC memory and causing
performance issues for the other enclaves on the system, they can set
this limit. sgx_epc.max can be used to set a hard limit, which will
cause an enclave to get all its used pages zapped, and it will
effectively be killed until it is rebuilt by the owning SGX
application. sgx_epc.low can be used to (best effort) try to ensure
that some minimum amount of EPC pages are protected for enclaves in a
particular cgroup. This can be useful for preventing evictions and thus
performance issues due to faults.

I hope this answers your question.

Thanks,
Kristen


^ permalink raw reply	[flat|nested] 43+ messages in thread
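
The best-effort behavior described here shows up in the series as a
simple loop: reclaim the overage in batches until usage drops back under
the limit or reclaim stops making progress. Roughly (condensed from the
series' reclaim-high path; the reclaim_control setup and the location of
the high value are simplifying assumptions):

static void sgx_epc_cgroup_reclaim_high(struct sgx_epc_cgroup *epc_cg)
{
	struct sgx_epc_reclaim_control rc = { .epc_cg = epc_cg };
	unsigned long cur, high;

	for (;;) {
		cur  = page_counter_read(&epc_cg->pc);
		high = READ_ONCE(epc_cg->pc.high);	/* the sgx_epc.high setting */
		if (cur <= high)
			break;	/* back under the best-effort limit */

		/* Reclaim the overage; bail out if reclaim keeps failing. */
		if (!sgx_epc_cgroup_reclaim_pages(cur - high, &rc,
						  SGX_EPC_CGROUP_HIGH) &&
		    sgx_epc_cgroup_reclaim_failed(&rc))
			break;
	}
}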

* Re: [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory
  2022-09-22 18:59   ` Kristen Carlson Accardi
@ 2022-09-22 19:08     ` Tejun Heo
  2022-09-22 21:03       ` Dave Hansen
  0 siblings, 1 reply; 43+ messages in thread
From: Tejun Heo @ 2022-09-22 19:08 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-kernel, linux-sgx, cgroups, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song

Hello,

On Thu, Sep 22, 2022 at 11:59:14AM -0700, Kristen Carlson Accardi wrote:
> Thanks for your question. The SGX EPC memory is a global shared
> resource that can be over committed. The SGX EPC controller should be
> used similarly to the normal memory controller. Normally when there is
> pressure on EPC memory, the reclaimer thread will write out pages from
> EPC memory to a backing RAM that is allocated per enclave. It is
> possible currently for even a single enclave to force all the other
> enclaves to have their epc pages written to backing RAM by allocating
> all the available system EPC memory. This can cause performance issues
> for the enclaves when they have to fault to load pages page in.

Can you please give more concrete examples? I'd love to hear how the SGX EPC
memory is typically used in what amounts and what's the performance
implications when they get reclaimed and so on. ie. Please describe a
realistic usage scenario of contention with sufficient details on how the
system is set up, what the applications are using the SGX EPC memory for and
how much, how the contention on memory affects the users and so on.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 03/20] x86/sgx: Track owning enclave in VA EPC pages
  2022-09-22 18:55   ` Dave Hansen
@ 2022-09-22 20:04     ` Kristen Carlson Accardi
  2022-09-22 21:39       ` Dave Hansen
  0 siblings, 1 reply; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-22 20:04 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel, linux-sgx, cgroups, Jarkko Sakkinen,
	Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
  Cc: Sean Christopherson

On Thu, 2022-09-22 at 11:55 -0700, Dave Hansen wrote:
> On 9/22/22 10:10, Kristen Carlson Accardi wrote:
> > -struct sgx_epc_page *sgx_alloc_va_page(bool reclaim)
> > +struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool
> > reclaim)
> >  {
> >         struct sgx_epc_page *epc_page;
> >         int ret;
> > @@ -1218,6 +1219,8 @@ struct sgx_epc_page *sgx_alloc_va_page(bool
> > reclaim)
> >                 return ERR_PTR(-EFAULT);
> >         }
> >  
> > +       epc_page->owner = encl;
> > +
> >         return epc_page;
> >  }
> 
> BTW, is there a flag or any other way to tell to what kind of object
> ->owner points?

The owner will only be an sgx_encl type if it is a va page, so to tell
what kind of object owner is, you look at the epc page flags - like
this:
        if (epc_page->flags & SGX_EPC_PAGE_ENCLAVE)
                encl = ((struct sgx_encl_page *)epc_page->owner)->encl;
        else if (epc_page->flags & SGX_EPC_PAGE_VERSION_ARRAY)
                encl = epc_page->owner;
...



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory
  2022-09-22 19:08     ` Tejun Heo
@ 2022-09-22 21:03       ` Dave Hansen
  2022-09-24  0:09         ` Tejun Heo
  0 siblings, 1 reply; 43+ messages in thread
From: Dave Hansen @ 2022-09-22 21:03 UTC (permalink / raw)
  To: Tejun Heo, Kristen Carlson Accardi
  Cc: linux-kernel, linux-sgx, cgroups, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song

On 9/22/22 12:08, Tejun Heo wrote:
> Can you please give more concrete examples? I'd love to hear how the SGX EPC
> memory is typically used in what amounts and what's the performance
> implications when they get reclaimed and so on. ie. Please describe a
> realistic usage scenario of contention with sufficient details on how the
> system is set up, what the applications are using the SGX EPC memory for and
> how much, how the contention on memory affects the users and so on.

One wrinkle is that the apps that use SGX EPC memory are *normal* apps.
 There are frameworks that some folks are very excited about that allow
you to run mostly unmodified app stacks inside SGX.  For example:

	https://github.com/gramineproject/graphene

In fact, Gramine users are the troublesome ones for overcommit.  Most
explicitly-written SGX applications are quite austere in their SGX
memory use; they're probably never going to see overcommit.  These
Gramine-wrapped apps are (relative) pigs.  They've been the ones finding
bugs in the existing SGX overcommit code.

So, where does all the SGX memory go?  It's the usual suspects:
memcached and redis. ;)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 03/20] x86/sgx: Track owning enclave in VA EPC pages
  2022-09-22 20:04     ` Kristen Carlson Accardi
@ 2022-09-22 21:39       ` Dave Hansen
  0 siblings, 0 replies; 43+ messages in thread
From: Dave Hansen @ 2022-09-22 21:39 UTC (permalink / raw)
  To: Kristen Carlson Accardi, linux-kernel, linux-sgx, cgroups,
	Jarkko Sakkinen, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin
  Cc: Sean Christopherson

On 9/22/22 13:04, Kristen Carlson Accardi wrote:
>> BTW, is there a flag or any other way to tell to what kind of object
>> ->owner points?
> The owner will only be an sgx_encl type if it is a va page, so to tell
> what kind of object owner is, you look at the epc page flags - like
> this:
>         if (epc_page->flags & SGX_EPC_PAGE_ENCLAVE)
>                 encl = ((struct sgx_encl_page *)epc_page->owner)->encl;
>         else if (epc_page->flags & SGX_EPC_PAGE_VERSION_ARRAY)
>                 encl = epc_page->owner;
> ...

I don't know how much refactoring it would take, but it would be nice if
that was a bit more obvious.  Basically, can we get the code that checks
for or sets SGX_EPC_PAGE_VERSION_ARRAY close to the code that assigns or
reads ->owner?

^ permalink raw reply	[flat|nested] 43+ messages in thread
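
One shape that refactoring could take, sketched here rather than taken
from the series, is a single accessor that keeps the flag check next to
every read of ->owner (the helper name is hypothetical; the flags are
the series' own):

static inline struct sgx_encl *sgx_epc_page_encl(struct sgx_epc_page *epc_page)
{
	/* Enclave pages store a struct sgx_encl_page in ->owner... */
	if (epc_page->flags & SGX_EPC_PAGE_ENCLAVE)
		return ((struct sgx_encl_page *)epc_page->owner)->encl;

	/* ...while VA pages point directly at the owning enclave. */
	if (epc_page->flags & SGX_EPC_PAGE_VERSION_ARRAY)
		return epc_page->owner;

	return NULL;	/* owner not set, or of an unknown type */
}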

* Re: [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory
  2022-09-22 17:10 [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Kristen Carlson Accardi
                   ` (20 preceding siblings ...)
  2022-09-22 17:41 ` [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Tejun Heo
@ 2022-09-23 12:24 ` Jarkko Sakkinen
  21 siblings, 0 replies; 43+ messages in thread
From: Jarkko Sakkinen @ 2022-09-23 12:24 UTC (permalink / raw)
  To: Kristen Carlson Accardi; +Cc: linux-kernel, linux-sgx, cgroups

On Thu, Sep 22, 2022 at 10:10:37AM -0700, Kristen Carlson Accardi wrote:
> Add a new cgroup controller to regulate the distribution of SGX EPC memory,
> which is a subset of system RAM that is used to provide SGX-enabled
> applications with protected memory, and is otherwise inaccessible.
> 
> SGX EPC memory allocations are separate from normal RAM allocations,
> and is managed solely by the SGX subsystem. The existing cgroup memory
> controller cannot be used to limit or account for SGX EPC memory.
> 
> This patchset implements the sgx_epc cgroup controller, which will provide
> support for stats, events, and the following interface files:
> 
> sgx_epc.current
> 	A read-only value which represents the total amount of EPC
> 	memory currently being used on by the cgroup and its descendents.
> 
> sgx_epc.low
> 	A read-write value which is used to set best-effort protection
> 	of EPC usage. If the EPC usage of a cgroup drops below this value,
> 	then the cgroup's EPC memory will not be reclaimed if possible.
> 
> sgx_epc.high
> 	A read-write value which is used to set a best-effort limit
> 	on the amount of EPC usage a cgroup has. If a cgroup's usage
> 	goes past the high value, the EPC memory of that cgroup will
> 	get reclaimed back under the high limit.
> 
> sgx_epc.max
> 	A read-write value which is used to set a hard limit for
> 	cgroup EPC usage. If a cgroup's EPC usage reaches this limit,
> 	allocations are blocked until EPC memory can be reclaimed from
> 	the cgroup.

It would be worth mentioning for clarity that shmem is accounted by
memcg.

BR, Jarkko

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 01/20] x86/sgx: Call cond_resched() at the end of sgx_reclaim_pages()
  2022-09-22 17:10 ` [RFC PATCH 01/20] x86/sgx: Call cond_resched() at the end of sgx_reclaim_pages() Kristen Carlson Accardi
@ 2022-09-23 12:32   ` Jarkko Sakkinen
  2022-09-23 12:35     ` Jarkko Sakkinen
  0 siblings, 1 reply; 43+ messages in thread
From: Jarkko Sakkinen @ 2022-09-23 12:32 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-kernel, linux-sgx, cgroups, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
	Sean Christopherson

On Thu, Sep 22, 2022 at 10:10:38AM -0700, Kristen Carlson Accardi wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Move the invocation of post-reclaim cond_resched() from the callers of
> sgx_reclaim_pages() into the reclaim path itself.   sgx_reclaim_pages()
> is always called in a loop and is always followed by a call to
> cond_resched().  This will hold true for the EPC cgroup as well, which
> adds even more calls to sgx_reclaim_pages() and thus cond_resched().

This would be in my opinion better:

"
In order to avoid repetition of cond_resched() in ksgxd() and
sgx_alloc_epc_page(), move the call inside sgx_reclaim_pages().

This will hold true for the EPC cgroup as well, which adds more
call sites of sgx_reclaim_pages().
"

This way it is dead obvious and is a better description because
it enumerates the consequences (i.e. call sites).

BR, Jarkko


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 01/20] x86/sgx: Call cond_resched() at the end of sgx_reclaim_pages()
  2022-09-23 12:32   ` Jarkko Sakkinen
@ 2022-09-23 12:35     ` Jarkko Sakkinen
  2022-09-23 12:37       ` Jarkko Sakkinen
  0 siblings, 1 reply; 43+ messages in thread
From: Jarkko Sakkinen @ 2022-09-23 12:35 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-kernel, linux-sgx, cgroups, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
	Sean Christopherson

On Fri, Sep 23, 2022 at 03:32:43PM +0300, Jarkko Sakkinen wrote:
> On Thu, Sep 22, 2022 at 10:10:38AM -0700, Kristen Carlson Accardi wrote:
> > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > 
> > Move the invocation of post-reclaim cond_resched() from the callers of
> > sgx_reclaim_pages() into the reclaim path itself.   sgx_reclaim_pages()
> > is always called in a loop and is always followed by a call to
> > cond_resched().  This will hold true for the EPC cgroup as well, which
> > adds even more calls to sgx_reclaim_pages() and thus cond_resched().
> 
> This would be in my opinion better:
> 
> "
> In order to avoid repetition of cond_resched() in ksgxd() and
> sgx_alloc_epc_page(), move the call inside sgx_reclaim_pages().
> 
> This will hold true for the EPC cgroup as well, which adds more
> call sites of sgx_reclaim_pages().
> "
> 
> This way it is dead obvious and is a better description because
> it enumerates the consequences (i.e. call sites).

Forgot 3rd call site: sgx_reclaim_direct(), which is used by
SGX2 ioctls. The call sites of sgx_reclaim_direct() do not
call cond_resched(). You should address why adding this call
to those flows is fine.

BR, Jarkko


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 01/20] x86/sgx: Call cond_resched() at the end of sgx_reclaim_pages()
  2022-09-23 12:35     ` Jarkko Sakkinen
@ 2022-09-23 12:37       ` Jarkko Sakkinen
  0 siblings, 0 replies; 43+ messages in thread
From: Jarkko Sakkinen @ 2022-09-23 12:37 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-kernel, linux-sgx, cgroups, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
	Sean Christopherson

On Fri, Sep 23, 2022 at 03:35:25PM +0300, Jarkko Sakkinen wrote:
> On Fri, Sep 23, 2022 at 03:32:43PM +0300, Jarkko Sakkinen wrote:
> > On Thu, Sep 22, 2022 at 10:10:38AM -0700, Kristen Carlson Accardi wrote:
> > > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > > 
> > > Move the invocation of post-reclaim cond_resched() from the callers of
> > > sgx_reclaim_pages() into the reclaim path itself.   sgx_reclaim_pages()
> > > is always called in a loop and is always followed by a call to
> > > cond_resched().  This will hold true for the EPC cgroup as well, which
> > > adds even more calls to sgx_reclaim_pages() and thus cond_resched().
> > 
> > This would be in my opinion better:
> > 
> > "
> > In order to avoid repetition of cond_resched() in ksgxd() and
> > sgx_alloc_epc_page(), move the call inside sgx_reclaim_pages().
> > 
> > This will hold true for the EPC cgroup as well, which adds more
> > call sites of sgx_reclaim_pages().
> > "
> > 
> > This way it is dead obvious and is a better description because
> > it enumerates the consequences (i.e. call sites).
> 
> Forgot 3rd call site: sgx_reclaim_direct(), which is used by
> SGX2 ioctls. The call sites of sgx_reclaim_direct() do not
> call cond_resched(). You should address why adding this call
> to those flows is fine.

Ofc adding a parameter to sgx_reclaim_pages() for the cond_resched() call
is another option (not emphasising either one).

BR, Jarkko

^ permalink raw reply	[flat|nested] 43+ messages in thread
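
That parameter variant could look roughly like this (a sketch of the
suggestion, not code from the series):

/*
 * Hypothetical signature: callers that must not reschedule, such as the
 * SGX2 ioctl paths behind sgx_reclaim_direct(), would pass false.
 */
static void sgx_reclaim_pages(bool can_resched)
{
	/* ...existing isolation, eviction and write-back of EPC pages... */

	if (can_resched)
		cond_resched();
}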

* Re: [RFC PATCH 02/20] x86/sgx: Store EPC page owner as a 'void *' to handle multiple users
  2022-09-22 17:10 ` [RFC PATCH 02/20] x86/sgx: Store EPC page owner as a 'void *' to handle multiple users Kristen Carlson Accardi
  2022-09-22 18:54   ` Dave Hansen
@ 2022-09-23 12:49   ` Jarkko Sakkinen
  1 sibling, 0 replies; 43+ messages in thread
From: Jarkko Sakkinen @ 2022-09-23 12:49 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-kernel, linux-sgx, cgroups, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
	Sean Christopherson

On Thu, Sep 22, 2022 at 10:10:39AM -0700, Kristen Carlson Accardi wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> A future patch will use the owner field for either a pointer to
> a struct sgx_encl, or a struct sgx_encl_page.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> Cc: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kernel/cpu/sgx/sgx.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index 0f2020653fba..5a7e858a8f98 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -33,7 +33,7 @@ struct sgx_epc_page {
>  	unsigned int section;
>  	u16 flags;
>  	u16 poison;
> -	struct sgx_encl_page *owner;
> +	void *owner;
>  	struct list_head list;
>  };
>  
> -- 
> 2.37.3
> 

Conflicts with https://lore.kernel.org/linux-sgx/20220920063948.3556917-1-zhiquan1.li@intel.com/T/#m5c911085eb350df564db2c18e344ce036e269749

BR, Jarkko

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 03/20] x86/sgx: Track owning enclave in VA EPC pages
  2022-09-22 17:10 ` [RFC PATCH 03/20] x86/sgx: Track owning enclave in VA EPC pages Kristen Carlson Accardi
  2022-09-22 18:55   ` Dave Hansen
@ 2022-09-23 12:52   ` Jarkko Sakkinen
  1 sibling, 0 replies; 43+ messages in thread
From: Jarkko Sakkinen @ 2022-09-23 12:52 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-kernel, linux-sgx, cgroups, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
	Sean Christopherson

On Thu, Sep 22, 2022 at 10:10:40AM -0700, Kristen Carlson Accardi wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> In order to fully account for an enclave's EPC page usage, store
> the owning enclave of a VA EPC page.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> Cc: Sean Christopherson <seanjc@google.com>

Why does this change fully account for an enclave's EPC page usage?

BR, Jarkko

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 04/20] x86/sgx: Add 'struct sgx_epc_lru' to encapsulate lru list(s)
  2022-09-22 17:10 ` [RFC PATCH 04/20] x86/sgx: Add 'struct sgx_epc_lru' to encapsulate lru list(s) Kristen Carlson Accardi
@ 2022-09-23 13:20   ` Jarkko Sakkinen
  2022-09-29 23:04     ` Kristen Carlson Accardi
  0 siblings, 1 reply; 43+ messages in thread
From: Jarkko Sakkinen @ 2022-09-23 13:20 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-kernel, linux-sgx, cgroups, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
	Sean Christopherson

On Thu, Sep 22, 2022 at 10:10:41AM -0700, Kristen Carlson Accardi wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Wrap the existing reclaimable list and its spinlock in a struct to
> minimize the code changes needed to handle multiple LRUs as well as
> reclaimable and non-reclaimable lists, both of which will be introduced
> and used by SGX EPC cgroups.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> Cc: Sean Christopherson <seanjc@google.com>

The commit message could explicitly state the added data type.

The data type is not an LRU: together with the FIFO list, i.e.
a queue, the code implements an LRU-like policy.

I would name the data type as sgx_epc_queue because it is a 
less confusing name.

> ---
>  arch/x86/kernel/cpu/sgx/main.c | 37 +++++++++++++++++-----------------
>  arch/x86/kernel/cpu/sgx/sgx.h  | 11 ++++++++++
>  2 files changed, 30 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 4cdeb915dc86..af68dc1c677b 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -26,10 +26,9 @@ static DEFINE_XARRAY(sgx_epc_address_space);
>  
>  /*
>   * These variables are part of the state of the reclaimer, and must be accessed
> - * with sgx_reclaimer_lock acquired.
> + * with sgx_global_lru.lock acquired.
>   */
> -static LIST_HEAD(sgx_active_page_list);
> -static DEFINE_SPINLOCK(sgx_reclaimer_lock);
> +static struct sgx_epc_lru sgx_global_lru;
>  
>  static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);
>  
> @@ -298,12 +297,12 @@ static void sgx_reclaim_pages(void)
>  	int ret;
>  	int i;
>  
> -	spin_lock(&sgx_reclaimer_lock);
> +	spin_lock(&sgx_global_lru.lock);
>  	for (i = 0; i < SGX_NR_TO_SCAN; i++) {
> -		if (list_empty(&sgx_active_page_list))
> +		if (list_empty(&sgx_global_lru.reclaimable))
>  			break;
>  
> -		epc_page = list_first_entry(&sgx_active_page_list,
> +		epc_page = list_first_entry(&sgx_global_lru.reclaimable,
>  					    struct sgx_epc_page, list);
>  		list_del_init(&epc_page->list);
>  		encl_page = epc_page->owner;
> @@ -316,7 +315,7 @@ static void sgx_reclaim_pages(void)
>  			 */
>  			epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
>  	}
> -	spin_unlock(&sgx_reclaimer_lock);
> +	spin_unlock(&sgx_global_lru.lock);
>  
>  	for (i = 0; i < cnt; i++) {
>  		epc_page = chunk[i];
> @@ -339,9 +338,9 @@ static void sgx_reclaim_pages(void)
>  		continue;
>  
>  skip:
> -		spin_lock(&sgx_reclaimer_lock);
> -		list_add_tail(&epc_page->list, &sgx_active_page_list);
> -		spin_unlock(&sgx_reclaimer_lock);
> +		spin_lock(&sgx_global_lru.lock);
> +		list_add_tail(&epc_page->list, &sgx_global_lru.reclaimable);
> +		spin_unlock(&sgx_global_lru.lock);
>  
>  		kref_put(&encl_page->encl->refcount, sgx_encl_release);
>  
> @@ -374,7 +373,7 @@ static void sgx_reclaim_pages(void)
>  static bool sgx_should_reclaim(unsigned long watermark)
>  {
>  	return atomic_long_read(&sgx_nr_free_pages) < watermark &&
> -	       !list_empty(&sgx_active_page_list);
> +	       !list_empty(&sgx_global_lru.reclaimable);
>  }
>  
>  /*
> @@ -427,6 +426,8 @@ static bool __init sgx_page_reclaimer_init(void)
>  
>  	ksgxd_tsk = tsk;
>  
> +	sgx_lru_init(&sgx_global_lru);
> +
>  	return true;
>  }
>  
> @@ -502,10 +503,10 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
>   */
>  void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
>  {
> -	spin_lock(&sgx_reclaimer_lock);
> +	spin_lock(&sgx_global_lru.lock);
>  	page->flags |= SGX_EPC_PAGE_RECLAIMER_TRACKED;
> -	list_add_tail(&page->list, &sgx_active_page_list);
> -	spin_unlock(&sgx_reclaimer_lock);
> +	list_add_tail(&page->list, &sgx_global_lru.reclaimable);
> +	spin_unlock(&sgx_global_lru.lock);
>  }
>  
>  /**
> @@ -520,18 +521,18 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
>   */
>  int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
>  {
> -	spin_lock(&sgx_reclaimer_lock);
> +	spin_lock(&sgx_global_lru.lock);
>  	if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
>  		/* The page is being reclaimed. */
>  		if (list_empty(&page->list)) {
> -			spin_unlock(&sgx_reclaimer_lock);
> +			spin_unlock(&sgx_global_lru.lock);
>  			return -EBUSY;
>  		}
>  
>  		list_del(&page->list);
>  		page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
>  	}
> -	spin_unlock(&sgx_reclaimer_lock);
> +	spin_unlock(&sgx_global_lru.lock);
>  
>  	return 0;
>  }
> @@ -564,7 +565,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
>  			break;
>  		}
>  
> -		if (list_empty(&sgx_active_page_list))
> +		if (list_empty(&sgx_global_lru.reclaimable))
>  			return ERR_PTR(-ENOMEM);
>  
>  		if (!reclaim) {
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index 5a7e858a8f98..7b208ee8eb45 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -83,6 +83,17 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
>  	return section->virt_addr + index * PAGE_SIZE;
>  }
>  
> +struct sgx_epc_lru {
> +	spinlock_t lock;
> +	struct list_head reclaimable;

s/reclaimable/list/

> +};
> +
> +static inline void sgx_lru_init(struct sgx_epc_lru *lru)
> +{
> +	spin_lock_init(&lru->lock);
> +	INIT_LIST_HEAD(&lru->reclaimable);
> +}
> +
>  struct sgx_epc_page *__sgx_alloc_epc_page(void);
>  void sgx_free_epc_page(struct sgx_epc_page *page);
>  
> -- 
> 2.37.3
> 

Please also add these:

/*
 * Must be called with queue->lock acquired.
 */
static inline void __sgx_epc_queue_push(struct sgx_epc_queue *queue,
                                        struct sgx_epc_page *page)
{
        list_add_tail(&page->list, &queue->list);
}

/*
 * Must be called with queue->lock acquired.
 */
static inline struct sgx_epc_page *__sgx_epc_queue_pop(struct sgx_epc_queue *queue)
{
        struct sgx_epc_page *page;

        if (list_empty(&queue->list))
                return NULL;

        page = list_first_entry(&queue->list, struct sgx_epc_page, list);
        list_del_init(&page->list);

        return page;
}

And use them in the existing sites. This ensures coherent behavior. You should
be able to replace all uses with one of them, or a combination of the two
(list_move).
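
For example, sgx_mark_page_reclaimable() could then reduce to something
like this (just a sketch, assuming the sgx_epc_queue rename and a matching
sgx_global_queue instance - not code from the patch):

void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
{
        spin_lock(&sgx_global_queue.lock);
        page->flags |= SGX_EPC_PAGE_RECLAIMER_TRACKED;
        __sgx_epc_queue_push(&sgx_global_queue, page);
        spin_unlock(&sgx_global_queue.lock);
}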

BR, Jarkko

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 05/20] x86/sgx: Introduce unreclaimable EPC page lists
  2022-09-22 17:10 ` [RFC PATCH 05/20] x86/sgx: Introduce unreclaimable EPC page lists Kristen Carlson Accardi
@ 2022-09-23 13:29   ` Jarkko Sakkinen
  0 siblings, 0 replies; 43+ messages in thread
From: Jarkko Sakkinen @ 2022-09-23 13:29 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-kernel, linux-sgx, cgroups, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
	Sean Christopherson

On Thu, Sep 22, 2022 at 10:10:42AM -0700, Kristen Carlson Accardi wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Add code to keep track of pages that are not tracked by the reclaimer
> in the LRU's "unreclaimable" list. When there is an OOM event and an
> enclave must be OOM killed, the EPC pages which are not tracked by
> the reclaimer can still be freed.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> Cc: Sean Christopherson <seanjc@google.com>

This could have some description of what is actually happening in
this patch.

BR, Jarkko

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory
  2022-09-22 21:03       ` Dave Hansen
@ 2022-09-24  0:09         ` Tejun Heo
  2022-09-26 18:30           ` Kristen Carlson Accardi
  2022-10-07 16:39           ` Kristen Carlson Accardi
  0 siblings, 2 replies; 43+ messages in thread
From: Tejun Heo @ 2022-09-24  0:09 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kristen Carlson Accardi, linux-kernel, linux-sgx, cgroups,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song

Hello,

On Thu, Sep 22, 2022 at 02:03:52PM -0700, Dave Hansen wrote:
> On 9/22/22 12:08, Tejun Heo wrote:
> > Can you please give more concrete examples? I'd love to hear how the SGX EPC
> > memory is typically used in what amounts and what's the performance
> > implications when they get reclaimed and so on. ie. Please describe a
> > realistic usage scenario of contention with sufficient details on how the
> > system is set up, what the applications are using the SGX EPC memory for and
> > how much, how the contention on memory affects the users and so on.
> 
> One wrinkle is that the apps that use SGX EPC memory are *normal* apps.
>  There are frameworks that some folks are very excited about that allow
> you to run mostly unmodified app stacks inside SGX.  For example:
> 
> 	https://github.com/gramineproject/graphene
> 
> In fact, Gramine users are the troublesome ones for overcommit.  Most
> explicitly-written SGX applications are quite austere in their SGX
> memory use; they're probably never going to see overcommit.  These
> Gramine-wrapped apps are (relative) pigs.  They've been the ones finding
> bugs in the existing SGX overcommit code.
> 
> So, where does all the SGX memory go?  It's the usual suspects:
> memcached and redis. ;)

Hey, so, I'm a bit wary that this doesn't seem to have strong demand at
this point. When there's clear shared demand, I usually hear from multiple
parties about their use cases and the practical problems they're trying to
solve and so on. This, at least to me, seems driven primarily by producers
rather than consumers.

There's nothing wrong with projecting future usage and jumping ahead of the
curve, but there's a balance to strike, and going full-on with a memcg-style
controller and three control knobs seems to be jumping the gun; it may create
commitments we end up looking back on with regret.

Given that, how about this? We can easily add the functionality of .max
through the misc controller: add a new key there, try to charge when
allocating new memory, try reclaim if the charge fails, and then fail the
allocation if reclaim fails hard enough. I believe that should give at least
a reasonable place to start, especially given that memcg had only limits
with similar semantics for quite a while at the beginning.
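
Roughly, in pseudocode (a sketch against the misc controller's existing
misc_cg_try_charge()/put_misc_cg() API; the MISC_CG_RES_SGX_EPC key, the
reclaim helper and the page->cg backref are hypothetical):

static int sgx_epc_try_charge(struct sgx_epc_page *page)
{
        struct misc_cg *cg = get_current_misc_cg();

        /* Try to charge one EPC page; on failure reclaim and retry,
         * and give up only when reclaim itself makes no progress.
         */
        while (misc_cg_try_charge(MISC_CG_RES_SGX_EPC, cg, PAGE_SIZE)) {
                if (!sgx_reclaim_epc_pages()) {
                        put_misc_cg(cg);
                        return -ENOMEM;
                }
        }

        page->cg = cg;  /* hypothetical backref, uncharged at free time */
        return 0;
}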

That way, we avoid creating big interface commitments while providing a
feature which should be able to serve and test out the immediate use cases.
If, for some reason, many of us end up running hefty applications in SGX, we
can revisit the issue and build up something more complete with provisions
for backward compatibility.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory
  2022-09-24  0:09         ` Tejun Heo
@ 2022-09-26 18:30           ` Kristen Carlson Accardi
  2022-10-07 16:39           ` Kristen Carlson Accardi
  1 sibling, 0 replies; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-26 18:30 UTC (permalink / raw)
  To: Tejun Heo, Dave Hansen
  Cc: linux-kernel, linux-sgx, cgroups, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song

On Fri, 2022-09-23 at 14:09 -1000, Tejun Heo wrote:
> Hello,
> 
> <snip>
> 
> Hey, so, I'm a bit wary that this doesn't seem to have strong demand at
> this point. When there's clear shared demand, I usually hear from
> multiple parties about their use cases and the practical problems
> they're trying to solve and so on. This, at least to me, seems driven
> primarily by producers rather than consumers.
> 
> There's nothing wrong with projecting future usage and jumping ahead of
> the curve, but there's a balance to strike, and going full-on with a
> memcg-style controller and three control knobs seems to be jumping the
> gun; it may create commitments we end up looking back on with regret.
> 
> Given that, how about this? We can easily add the functionality of .max
> through the misc controller: add a new key there, try to charge when
> allocating new memory, try reclaim if the charge fails, and then fail
> the allocation if reclaim fails hard enough. I believe that should give
> at least a reasonable place to start, especially given that memcg had
> only limits with similar semantics for quite a while at the beginning.
> 
> That way, we avoid creating big interface commitments while providing a
> feature which should be able to serve and test out the immediate use
> cases. If, for some reason, many of us end up running hefty
> applications in SGX, we can revisit the issue and build up something
> more complete with provisions for backward compatibility.
> 
> Thanks.
> 

Hi Tejun,

thanks for your suggestion. Let me discuss this with the customers who
requested this feature (not all customers like to respond publicly) and see
if it will meet their needs. If there is an issue, I'll follow up with my
concerns.

Thanks,
Kristen


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 04/20] x86/sgx: Add 'struct sgx_epc_lru' to encapsulate lru list(s)
  2022-09-23 13:20   ` Jarkko Sakkinen
@ 2022-09-29 23:04     ` Kristen Carlson Accardi
  0 siblings, 0 replies; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-09-29 23:04 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-kernel, linux-sgx, cgroups, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
	Sean Christopherson

On Fri, 2022-09-23 at 16:20 +0300, Jarkko Sakkinen wrote:
> On Thu, Sep 22, 2022 at 10:10:41AM -0700, Kristen Carlson Accardi wrote:
> > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > 
> > Wrap the existing reclaimable list and its spinlock in a struct to
> > minimize the code changes needed to handle multiple LRUs as well as
> > reclaimable and non-reclaimable lists, both of which will be introduced
> > and used by SGX EPC cgroups.
> > 
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> > Cc: Sean Christopherson <seanjc@google.com>
> 
> The commit message could explicitly name the added data type.
> 
> The data type is not an LRU: together with the FIFO list, i.e. a queue,
> the code implements an LRU-like policy.
> 
> I would name the data type sgx_epc_queue because it is a less
> confusing name.

I think when you look at patch 05/20 which adds the unreclaimable field
this becomes less like a straight up queue data type.

> 
> <snip>
> 
> > +struct sgx_epc_lru {
> > +       spinlock_t lock;
> > +       struct list_head reclaimable;
> 
> s/reclaimable/list/

It feels to me that once you add the "unreclaimable" struct list_head
field to this struct in the next patch, it would be a bit confusing to
rename this to just "list". The final struct is not really a nice clean
simple queue, but two lists - one for EPC pages which are being tracked
by the reclaimer, and one for EPC pages which are not (such as va pages).
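
Roughly, with patch 05/20 applied the struct ends up as (paraphrased, not
a quote from the later patch):

struct sgx_epc_lru {
        spinlock_t lock;
        struct list_head reclaimable;   /* tracked by the reclaimer */
        struct list_head unreclaimable; /* e.g. va pages, freed on OOM */
};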

> 
> > +};
> > +
> > +static inline void sgx_lru_init(struct sgx_epc_lru *lru)
> > +{
> > +       spin_lock_init(&lru->lock);
> > +       INIT_LIST_HEAD(&lru->reclaimable);
> > +}
> > +
> >  struct sgx_epc_page *__sgx_alloc_epc_page(void);
> >  void sgx_free_epc_page(struct sgx_epc_page *page);
> >  
> > -- 
> > 2.37.3
> > 
> 
> Please also add these:
> 
> /*
>  * Must be called with queue->lock acquired.
>  */
> static inline struct sgx_epc_page *__sgx_epc_queue_push(struct
> sgx_epc_queue *queue,
>                                                         struct
> sgx_page *page)
> {
>         list_add_tail(&page->list, &queue->list);
> }
> 
> /*
>  * Must be called with queue->lock acquired.
>  */
> static inline struct sgx_epc_page *__sgx_epc_queue_pop(struct
> sgx_epc_queue *queue)
> {
>         struct sgx_epc_page *page;
> 
>         if (list_empty(&queue->list)
>                 return NULL;
> 
>         page = list_first_entry(&queue->list, struct sgx_epc_page,
> list);
>         list_del_init(&page->list);
> 
>         return page;
> }
> 
> And use them in the existing sites. This ensures coherent behavior. You
> should be able to replace all uses with one of them, or a combination of
> the two (list_move).
> 
> BR, Jarkko


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory
  2022-09-24  0:09         ` Tejun Heo
  2022-09-26 18:30           ` Kristen Carlson Accardi
@ 2022-10-07 16:39           ` Kristen Carlson Accardi
  2022-10-07 16:42             ` Tejun Heo
  1 sibling, 1 reply; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-10-07 16:39 UTC (permalink / raw)
  To: Tejun Heo, Dave Hansen
  Cc: linux-kernel, linux-sgx, cgroups, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song

On Fri, 2022-09-23 at 14:09 -1000, Tejun Heo wrote:
<snip>

> 
> Given that, how about this? We can easily add the functionality of
> .max
> through the misc controller. Add a new key there, trycharge when
> allocating
> new memory, if fails, try reclaim and then fail allocation if reclaim
> fails
> hard enough. I belive that should give at least a reasonable place to
> start
> especially given that memcg only had limits with similar semantics
> for quite
> a while at the beginning.
> 

Hi Tejun,
I'm playing with the misc controller to see if I can make it do what I
need to do, and I had a question for you. Is there a way to easily get
notified when there are writes to the "max" file? For example, in my
full controller implementation, if a max value is written, the current
EPC usage for that cgroup is immediately examined. If that usage is
over the new value of max, then the reclaimer will reclaim from that
particular cgroup to get it under the max. If it is not possible to
reclaim enough to get under the max, enclaves will be killed so that
all of their EPC pages can be released, bringing usage back under the
max value. With the misc controller, I haven't been able to find a way
to easily react to a change in the max value. Am I missing something?
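
Sketched with made-up names, the behavior I describe above is roughly:

static void sgx_epc_cgroup_max_write(struct sgx_epc_cgroup *epc_cg, u64 max)
{
        epc_cg->max = max;

        /* Reclaim down to the new limit; if reclaim alone can't get
         * there, OOM-kill enclaves so their EPC pages are released.
         */
        while (sgx_epc_cgroup_usage(epc_cg) > max) {
                if (!sgx_epc_cgroup_reclaim(epc_cg))
                        sgx_epc_cgroup_oom(epc_cg);
        }
}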

Thanks,
Kristen


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory
  2022-10-07 16:39           ` Kristen Carlson Accardi
@ 2022-10-07 16:42             ` Tejun Heo
  2022-10-07 16:46               ` Kristen Carlson Accardi
  0 siblings, 1 reply; 43+ messages in thread
From: Tejun Heo @ 2022-10-07 16:42 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: Dave Hansen, linux-kernel, linux-sgx, cgroups, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song

Hello, Kristen.

On Fri, Oct 07, 2022 at 09:39:40AM -0700, Kristen Carlson Accardi wrote:
...
> With the misc controller, I haven't been able to find a way to easily
> react to a change in the max value. Am I missing something?

There isn't currently, but it should be possible to add per-key
notifiers, right?
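
Something like this, perhaps (purely a sketch, not an existing
interface):

/* Per-key ops a client controller could register; ->max_write() would
 * run whenever misc.max is written for that resource.
 */
struct misc_res_ops {
        int (*max_write)(struct misc_cg *cg, u64 new_max);
};

int misc_cg_set_ops(enum misc_res_type type, const struct misc_res_ops *ops);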

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory
  2022-10-07 16:42             ` Tejun Heo
@ 2022-10-07 16:46               ` Kristen Carlson Accardi
  0 siblings, 0 replies; 43+ messages in thread
From: Kristen Carlson Accardi @ 2022-10-07 16:46 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Dave Hansen, linux-kernel, linux-sgx, cgroups, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song

On Fri, 2022-10-07 at 06:42 -1000, Tejun Heo wrote:
> Hello, Kristen.
> 
> On Fri, Oct 07, 2022 at 09:39:40AM -0700, Kristen Carlson Accardi wrote:
> ...
> > With the misc controller, I haven't been able to find a way to easily
> > react to a change in the max value. Am I missing something?
> 
> There isn't currently, but it should be possible to add per-key
> notifiers, right?
> 
> Thanks.
> 

OK - yes, I will include a modification to the misc controller for the
functionality I need in my patchset.


^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2022-10-07 16:47 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-22 17:10 [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Kristen Carlson Accardi
2022-09-22 17:10 ` [RFC PATCH 01/20] x86/sgx: Call cond_resched() at the end of sgx_reclaim_pages() Kristen Carlson Accardi
2022-09-23 12:32   ` Jarkko Sakkinen
2022-09-23 12:35     ` Jarkko Sakkinen
2022-09-23 12:37       ` Jarkko Sakkinen
2022-09-22 17:10 ` [RFC PATCH 02/20] x86/sgx: Store EPC page owner as a 'void *' to handle multiple users Kristen Carlson Accardi
2022-09-22 18:54   ` Dave Hansen
2022-09-23 12:49   ` Jarkko Sakkinen
2022-09-22 17:10 ` [RFC PATCH 03/20] x86/sgx: Track owning enclave in VA EPC pages Kristen Carlson Accardi
2022-09-22 18:55   ` Dave Hansen
2022-09-22 20:04     ` Kristen Carlson Accardi
2022-09-22 21:39       ` Dave Hansen
2022-09-23 12:52   ` Jarkko Sakkinen
2022-09-22 17:10 ` [RFC PATCH 04/20] x86/sgx: Add 'struct sgx_epc_lru' to encapsulate lru list(s) Kristen Carlson Accardi
2022-09-23 13:20   ` Jarkko Sakkinen
2022-09-29 23:04     ` Kristen Carlson Accardi
2022-09-22 17:10 ` [RFC PATCH 05/20] x86/sgx: Introduce unreclaimable EPC page lists Kristen Carlson Accardi
2022-09-23 13:29   ` Jarkko Sakkinen
2022-09-22 17:10 ` [RFC PATCH 06/20] x86/sgx: Introduce RECLAIM_IN_PROGRESS flag for EPC pages Kristen Carlson Accardi
2022-09-22 17:10 ` [RFC PATCH 07/20] x86/sgx: Use a list to track to-be-reclaimed pages during reclaim Kristen Carlson Accardi
2022-09-22 17:10 ` [RFC PATCH 08/20] x86/sgx: Add EPC page flags to identify type of page Kristen Carlson Accardi
2022-09-22 17:10 ` [RFC PATCH 09/20] x86/sgx: Allow reclaiming up to 32 pages, but scan 16 by default Kristen Carlson Accardi
2022-09-22 17:10 ` [RFC PATCH 10/20] x86/sgx: Return the number of EPC pages that were successfully reclaimed Kristen Carlson Accardi
2022-09-22 17:10 ` [RFC PATCH 11/20] x86/sgx: Add option to ignore age of page during EPC reclaim Kristen Carlson Accardi
2022-09-22 17:10 ` [RFC PATCH 12/20] x86/sgx: Add helper to retrieve SGX EPC LRU given an EPC page Kristen Carlson Accardi
2022-09-22 17:10 ` [RFC PATCH 13/20] x86/sgx: Prepare for multiple LRUs Kristen Carlson Accardi
2022-09-22 17:10 ` [RFC PATCH 14/20] x86/sgx: Expose sgx_reclaim_pages() for use by EPC cgroup Kristen Carlson Accardi
2022-09-22 17:10 ` [RFC PATCH 15/20] x86/sgx: Add helper to grab pages from an arbitrary EPC LRU Kristen Carlson Accardi
2022-09-22 17:10 ` [RFC PATCH 16/20] x86/sgx: Add EPC OOM path to forcefully reclaim EPC Kristen Carlson Accardi
2022-09-22 17:10 ` [RFC PATCH 17/20] cgroup, x86/sgx: Add SGX EPC cgroup controller Kristen Carlson Accardi
2022-09-22 17:10 ` [RFC PATCH 18/20] x86/sgx: Enable EPC cgroup controller in SGX core Kristen Carlson Accardi
2022-09-22 17:10 ` [RFC PATCH 19/20] x86/sgx: Add stats and events interfaces to EPC cgroup controller Kristen Carlson Accardi
2022-09-22 17:10 ` [RFC PATCH 20/20] docs, cgroup, x86/sgx: Add SGX EPC cgroup controller documentation Kristen Carlson Accardi
2022-09-22 17:41 ` [RFC PATCH 00/20] Add Cgroup support for SGX EPC memory Tejun Heo
2022-09-22 18:59   ` Kristen Carlson Accardi
2022-09-22 19:08     ` Tejun Heo
2022-09-22 21:03       ` Dave Hansen
2022-09-24  0:09         ` Tejun Heo
2022-09-26 18:30           ` Kristen Carlson Accardi
2022-10-07 16:39           ` Kristen Carlson Accardi
2022-10-07 16:42             ` Tejun Heo
2022-10-07 16:46               ` Kristen Carlson Accardi
2022-09-23 12:24 ` Jarkko Sakkinen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).