linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 1/5] x86/sgx: Fix a resource leak in sgx_init()
       [not found] <20210303150323.433207-1-jarkko@kernel.org>
@ 2021-03-03 15:03 ` Jarkko Sakkinen
  2021-03-03 16:56   ` Dave Hansen
  2021-03-03 15:03 ` [PATCH v3 2/5] x86/sgx: Use sgx_free_epc_page() in sgx_reclaim_pages() Jarkko Sakkinen
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Jarkko Sakkinen @ 2021-03-03 15:03 UTC (permalink / raw)
  To: linux-sgx
  Cc: Jarkko Sakkinen, stable, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
	Jethro Beekman, Sean Christopherson, Serge Ayoun, linux-kernel

If sgx_page_cache_init() fails in the middle, a trivial return statement
leaves the memory and virtual address space reserved for the already
initialized EPC sections unfreed. Fix this by using the same rollback
path as when sgx_page_reclaimer_init() fails.
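
The rollback pattern the fix adopts can be sketched in plain userspace C.
All names below are hypothetical stand-ins, not the kernel API; the real
cleanup unmaps and frees EPC sections rather than a malloc'd buffer:

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of goto-based rollback: any failure after a resource has been
 * acquired jumps to a label that releases everything acquired so far,
 * instead of returning directly and leaking it. */
static int *cache;

static int cache_init(void) { cache = malloc(64); return cache != NULL; }
static void cache_teardown(void) { free(cache); cache = NULL; }
static int reclaimer_init(int fail) { return !fail; }

static int subsys_init(int fail_reclaimer)
{
	int ret;

	if (!cache_init())
		return -1;      /* nothing acquired yet: a plain return is fine */

	if (!reclaimer_init(fail_reclaimer)) {
		ret = -2;
		goto err_cache; /* roll back the cache, too */
	}

	return 0;

err_cache:
	cache_teardown();
	return ret;
}
```

The point of the patch is exactly this shape: once the first init call can
leave partially acquired state behind, its error path must reach the shared
cleanup label rather than return directly.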

Cc: stable@vger.kernel.org # 5.11
Fixes: e7e0545299d8 ("x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections")
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
 arch/x86/kernel/cpu/sgx/main.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 8df81a3ed945..52d070fb4c9a 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -708,8 +708,10 @@ static int __init sgx_init(void)
 	if (!cpu_feature_enabled(X86_FEATURE_SGX))
 		return -ENODEV;
 
-	if (!sgx_page_cache_init())
-		return -ENOMEM;
+	if (!sgx_page_cache_init()) {
+		ret = -ENOMEM;
+		goto err_page_cache;
+	}
 
 	if (!sgx_page_reclaimer_init()) {
 		ret = -ENOMEM;
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 2/5] x86/sgx: Use sgx_free_epc_page() in sgx_reclaim_pages()
       [not found] <20210303150323.433207-1-jarkko@kernel.org>
  2021-03-03 15:03 ` [PATCH v3 1/5] x86/sgx: Fix a resource leak in sgx_init() Jarkko Sakkinen
@ 2021-03-03 15:03 ` Jarkko Sakkinen
  2021-03-03 16:59   ` Dave Hansen
  2021-03-03 15:03 ` [PATCH v3 3/5] x86/sgx: Replace section->init_laundry_list with a temp list Jarkko Sakkinen
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Jarkko Sakkinen @ 2021-03-03 15:03 UTC (permalink / raw)
  To: linux-sgx
  Cc: Jarkko Sakkinen, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, linux-kernel

Replace the ad-hoc code with a call to sgx_free_epc_page(), in order to
make sure that all the relevant checks and bookkeeping are done while
freeing a borrowed EPC page.
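
The rationale can be illustrated with a minimal userspace sketch (names
and the PAGE_TRACKED flag are illustrative, not the kernel's; locking is
elided, where the kernel takes a spinlock): when one helper owns both the
sanity checks and the bookkeeping, open-coded call sites cannot drift out
of sync with it.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_TRACKED 0x1u

struct page { unsigned flags; struct page *next; };

static struct page *free_list;
static unsigned long free_cnt;

/* The single free helper: every caller gets the same check and the same
 * counter/list bookkeeping for free. */
static int free_page(struct page *p)
{
	if (p->flags & PAGE_TRACKED)  /* still tracked by the reclaimer */
		return -1;
	p->next = free_list;
	free_list = p;
	free_cnt++;
	return 0;
}
```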

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
 arch/x86/kernel/cpu/sgx/main.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 52d070fb4c9a..ed99c60024dc 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -305,7 +305,6 @@ static void sgx_reclaim_pages(void)
 {
 	struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
 	struct sgx_backing backing[SGX_NR_TO_SCAN];
-	struct sgx_epc_section *section;
 	struct sgx_encl_page *encl_page;
 	struct sgx_epc_page *epc_page;
 	pgoff_t page_index;
@@ -378,11 +377,7 @@ static void sgx_reclaim_pages(void)
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
 
-		section = &sgx_epc_sections[epc_page->section];
-		spin_lock(&section->lock);
-		list_add_tail(&epc_page->list, &section->page_list);
-		section->free_cnt++;
-		spin_unlock(&section->lock);
+		sgx_free_epc_page(epc_page);
 	}
 }
 
-- 
2.30.1



* [PATCH v3 3/5] x86/sgx: Replace section->init_laundry_list with a temp list
       [not found] <20210303150323.433207-1-jarkko@kernel.org>
  2021-03-03 15:03 ` [PATCH v3 1/5] x86/sgx: Fix a resource leak in sgx_init() Jarkko Sakkinen
  2021-03-03 15:03 ` [PATCH v3 2/5] x86/sgx: Use sgx_free_epc_page() in sgx_reclaim_pages() Jarkko Sakkinen
@ 2021-03-03 15:03 ` Jarkko Sakkinen
  2021-03-03 18:02   ` Dave Hansen
  2021-03-03 15:03 ` [PATCH v3 4/5] x86/sgx: Replace section->page_list with a global free page list Jarkko Sakkinen
  2021-03-03 15:03 ` [PATCH v3 5/5] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page() Jarkko Sakkinen
  4 siblings, 1 reply; 27+ messages in thread
From: Jarkko Sakkinen @ 2021-03-03 15:03 UTC (permalink / raw)
  To: linux-sgx
  Cc: Jarkko Sakkinen, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, linux-kernel

Build a local laundry list in sgx_init(), and transfer its ownership to
ksgxd for sanitization, thus getting rid of the now-unneeded
init_laundry_list member in struct sgx_epc_section.
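
The ownership handoff can be sketched in userspace C (all names are
hypothetical; in the patch the producer is sgx_init(), the consumer is
the ksgxd thread, and the nodes are EPC pages): the producer allocates
the list container, the consumer drains it and frees the container it
now owns.

```c
#include <assert.h>
#include <stdlib.h>

struct node { struct node *next; };
struct list { struct node *head; };

/* Producer: builds a heap-allocated list and hands back the pointer. */
static struct list *build_laundry(int n)
{
	struct list *l = calloc(1, sizeof(*l));
	if (!l)
		return NULL;
	while (n--) {
		struct node *p = calloc(1, sizeof(*p));
		p->next = l->head;
		l->head = p;
	}
	return l;
}

/* Consumer: drains the list, then frees the container it now owns. */
static int consume_laundry(struct list *l)
{
	int drained = 0;
	while (l->head) {
		struct node *p = l->head;
		l->head = p->next;
		free(p);
		drained++;
	}
	free(l);          /* ownership ends here, as with ksgxd */
	return drained;
}
```

As Dave Hansen's review of this patch notes, this kind of cross-thread
lifetime is easy to leak, so the free on the consumer side has to be
unconditional on every exit path.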

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
 arch/x86/kernel/cpu/sgx/main.c | 64 ++++++++++++++++++----------------
 arch/x86/kernel/cpu/sgx/sgx.h  |  7 ----
 2 files changed, 34 insertions(+), 37 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index ed99c60024dc..a649010949c2 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -30,35 +30,33 @@ static DEFINE_SPINLOCK(sgx_reclaimer_lock);
  * Reset dirty EPC pages to uninitialized state. Laundry can be left with SECS
  * pages whose child pages blocked EREMOVE.
  */
-static void sgx_sanitize_section(struct sgx_epc_section *section)
+static void sgx_sanitize_section(struct list_head *laundry)
 {
 	struct sgx_epc_page *page;
 	LIST_HEAD(dirty);
 	int ret;
 
 	/* init_laundry_list is thread-local, no need for a lock: */
-	while (!list_empty(&section->init_laundry_list)) {
+	while (!list_empty(laundry)) {
 		if (kthread_should_stop())
 			return;
 
-		/* needed for access to ->page_list: */
-		spin_lock(&section->lock);
-
-		page = list_first_entry(&section->init_laundry_list,
-					struct sgx_epc_page, list);
+		page = list_first_entry(laundry, struct sgx_epc_page, list);
 
 		ret = __eremove(sgx_get_epc_virt_addr(page));
-		if (!ret)
-			list_move(&page->list, &section->page_list);
-		else
+		if (!ret) {
+			/* The page is clean - move to the free list. */
+			list_del(&page->list);
+			sgx_free_epc_page(page);
+		} else {
+			/* The page is not yet clean - move to the dirty list. */
 			list_move_tail(&page->list, &dirty);
-
-		spin_unlock(&section->lock);
+		}
 
 		cond_resched();
 	}
 
-	list_splice(&dirty, &section->init_laundry_list);
+	list_splice(&dirty, laundry);
 }
 
 static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
@@ -400,6 +398,7 @@ static bool sgx_should_reclaim(unsigned long watermark)
 
 static int ksgxd(void *p)
 {
+	struct list_head *laundry = p;
 	int i;
 
 	set_freezable();
@@ -408,16 +407,13 @@ static int ksgxd(void *p)
 	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
 	 * required for SECS pages, whose child pages blocked EREMOVE.
 	 */
-	for (i = 0; i < sgx_nr_epc_sections; i++)
-		sgx_sanitize_section(&sgx_epc_sections[i]);
+	sgx_sanitize_section(laundry);
+	sgx_sanitize_section(laundry);
 
-	for (i = 0; i < sgx_nr_epc_sections; i++) {
-		sgx_sanitize_section(&sgx_epc_sections[i]);
+	if (!list_empty(laundry))
+		WARN(1, "EPC section %d has unsanitized pages.\n", i);
 
-		/* Should never happen. */
-		if (!list_empty(&sgx_epc_sections[i].init_laundry_list))
-			WARN(1, "EPC section %d has unsanitized pages.\n", i);
-	}
+	kfree(laundry);
 
 	while (!kthread_should_stop()) {
 		if (try_to_freeze())
@@ -436,11 +432,11 @@ static int ksgxd(void *p)
 	return 0;
 }
 
-static bool __init sgx_page_reclaimer_init(void)
+static bool __init sgx_page_reclaimer_init(struct list_head *laundry)
 {
 	struct task_struct *tsk;
 
-	tsk = kthread_run(ksgxd, NULL, "ksgxd");
+	tsk = kthread_run(ksgxd, laundry, "ksgxd");
 	if (IS_ERR(tsk))
 		return false;
 
@@ -614,7 +610,8 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 					 unsigned long index,
-					 struct sgx_epc_section *section)
+					 struct sgx_epc_section *section,
+					 struct list_head *laundry)
 {
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	unsigned long i;
@@ -632,13 +629,12 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	section->phys_addr = phys_addr;
 	spin_lock_init(&section->lock);
 	INIT_LIST_HEAD(&section->page_list);
-	INIT_LIST_HEAD(&section->init_laundry_list);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
 		section->pages[i].flags = 0;
 		section->pages[i].owner = NULL;
-		list_add_tail(&section->pages[i].list, &section->init_laundry_list);
+		list_add_tail(&section->pages[i].list, laundry);
 	}
 
 	section->free_cnt = nr_pages;
@@ -656,7 +652,7 @@ static inline u64 __init sgx_calc_section_metric(u64 low, u64 high)
 	       ((high & GENMASK_ULL(19, 0)) << 32);
 }
 
-static bool __init sgx_page_cache_init(void)
+static bool __init sgx_page_cache_init(struct list_head *laundry)
 {
 	u32 eax, ebx, ecx, edx, type;
 	u64 pa, size;
@@ -679,7 +675,7 @@ static bool __init sgx_page_cache_init(void)
 
 		pr_info("EPC section 0x%llx-0x%llx\n", pa, pa + size - 1);
 
-		if (!sgx_setup_epc_section(pa, size, i, &sgx_epc_sections[i])) {
+		if (!sgx_setup_epc_section(pa, size, i, &sgx_epc_sections[i], laundry)) {
 			pr_err("No free memory for an EPC section\n");
 			break;
 		}
@@ -697,18 +693,25 @@ static bool __init sgx_page_cache_init(void)
 
 static int __init sgx_init(void)
 {
+	struct list_head *laundry;
 	int ret;
 	int i;
 
 	if (!cpu_feature_enabled(X86_FEATURE_SGX))
 		return -ENODEV;
 
-	if (!sgx_page_cache_init()) {
+	laundry = kzalloc(sizeof(*laundry), GFP_KERNEL);
+	if (!laundry)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(laundry);
+
+	if (!sgx_page_cache_init(laundry)) {
 		ret = -ENOMEM;
 		goto err_page_cache;
 	}
 
-	if (!sgx_page_reclaimer_init()) {
+	if (!sgx_page_reclaimer_init(laundry)) {
 		ret = -ENOMEM;
 		goto err_page_cache;
 	}
@@ -728,6 +731,7 @@ static int __init sgx_init(void)
 		memunmap(sgx_epc_sections[i].virt_addr);
 	}
 
+	kfree(laundry);
 	return ret;
 }
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 5fa42d143feb..bc8af0428640 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -45,13 +45,6 @@ struct sgx_epc_section {
 	spinlock_t lock;
 	struct list_head page_list;
 	unsigned long free_cnt;
-
-	/*
-	 * Pages which need EREMOVE run on them before they can be
-	 * used.  Only safe to be accessed in ksgxd and init code.
-	 * Not protected by locks.
-	 */
-	struct list_head init_laundry_list;
 };
 
 extern struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
-- 
2.30.1



* [PATCH v3 4/5] x86/sgx: Replace section->page_list with a global free page list
       [not found] <20210303150323.433207-1-jarkko@kernel.org>
                   ` (2 preceding siblings ...)
  2021-03-03 15:03 ` [PATCH v3 3/5] x86/sgx: Replace section->init_laundry_list with a temp list Jarkko Sakkinen
@ 2021-03-03 15:03 ` Jarkko Sakkinen
  2021-03-03 23:48   ` Dave Hansen
  2021-03-03 15:03 ` [PATCH v3 5/5] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page() Jarkko Sakkinen
  4 siblings, 1 reply; 27+ messages in thread
From: Jarkko Sakkinen @ 2021-03-03 15:03 UTC (permalink / raw)
  To: linux-sgx
  Cc: Jarkko Sakkinen, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, linux-kernel

Background
==========

An EPC section is covered by one or more SRAT entries that are associated
with one and only one PXM (NUMA node). The current implementation overheats a
single NUMA node, because sgx_alloc_epc_page() always starts looking for
pages from the same EPC section every time.

Only within a section does it pick pages in FIFO fashion, i.e. the oldest
page freed in that section is the EPC page given back to the caller.  That
does not do any good, as the pages in the same node are performance-wise
equal.

Solution
========

Replace local lists with a single global free page list, from which pages
are borrowed in FIFO fashion.
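
The FIFO discipline the patch keeps, now across a single global list, can
be sketched as a tail-insert/head-remove queue (userspace stand-in with
hypothetical names; the kernel version holds sgx_free_page_list_lock
around both operations):

```c
#include <assert.h>
#include <stddef.h>

struct page { int id; struct page *next; };

static struct page *fifo_head, *fifo_tail;
static unsigned long nr_free;

/* Freeing appends at the tail... */
static void fifo_free(struct page *p)
{
	p->next = NULL;
	if (fifo_tail)
		fifo_tail->next = p;
	else
		fifo_head = p;
	fifo_tail = p;
	nr_free++;
}

/* ...and allocation takes the head, so the oldest freed page is handed
 * out first. */
static struct page *fifo_alloc(void)
{
	struct page *p = fifo_head;
	if (!p)
		return NULL;
	fifo_head = p->next;
	if (!fifo_head)
		fifo_tail = NULL;
	nr_free--;
	return p;
}
```

Keeping a plain counter (nr_free above, sgx_nr_free_pages in the patch)
beside the list also lets sgx_should_reclaim() avoid walking anything.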

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
 arch/x86/kernel/cpu/sgx/main.c | 84 +++++++++++-----------------------
 arch/x86/kernel/cpu/sgx/sgx.h  |  6 ---
 2 files changed, 27 insertions(+), 63 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index a649010949c2..58474480f5be 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -18,14 +18,15 @@ static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
 static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
 
-/*
- * These variables are part of the state of the reclaimer, and must be accessed
- * with sgx_reclaimer_lock acquired.
- */
+/* The reclaimer lock protected variables prepend the lock. */
 static LIST_HEAD(sgx_active_page_list);
-
 static DEFINE_SPINLOCK(sgx_reclaimer_lock);
 
+/* The free page list lock protected variables prepend the lock. */
+static unsigned long sgx_nr_free_pages;
+static LIST_HEAD(sgx_free_page_list);
+static DEFINE_SPINLOCK(sgx_free_page_list_lock);
+
 /*
  * Reset dirty EPC pages to uninitialized state. Laundry can be left with SECS
  * pages whose child pages blocked EREMOVE.
@@ -379,21 +380,9 @@ static void sgx_reclaim_pages(void)
 	}
 }
 
-static unsigned long sgx_nr_free_pages(void)
-{
-	unsigned long cnt = 0;
-	int i;
-
-	for (i = 0; i < sgx_nr_epc_sections; i++)
-		cnt += sgx_epc_sections[i].free_cnt;
-
-	return cnt;
-}
-
 static bool sgx_should_reclaim(unsigned long watermark)
 {
-	return sgx_nr_free_pages() < watermark &&
-	       !list_empty(&sgx_active_page_list);
+	return sgx_nr_free_pages < watermark && !list_empty(&sgx_active_page_list);
 }
 
 static int ksgxd(void *p)
@@ -445,50 +434,34 @@ static bool __init sgx_page_reclaimer_init(struct list_head *laundry)
 	return true;
 }
 
-static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_section *section)
-{
-	struct sgx_epc_page *page;
-
-	spin_lock(&section->lock);
-
-	if (list_empty(&section->page_list)) {
-		spin_unlock(&section->lock);
-		return NULL;
-	}
-
-	page = list_first_entry(&section->page_list, struct sgx_epc_page, list);
-	list_del_init(&page->list);
-	section->free_cnt--;
-
-	spin_unlock(&section->lock);
-	return page;
-}
-
 /**
  * __sgx_alloc_epc_page() - Allocate an EPC page
  *
- * Iterate through EPC sections and borrow a free EPC page to the caller. When a
- * page is no longer needed it must be released with sgx_free_epc_page().
+ * Borrow a free EPC page to the caller in FIFO fashion: the callers gets oldest
+ * freed page.
  *
  * Return:
- *   an EPC page,
- *   -errno on error
+ * - an EPC page:	Free EPC pages were available.
+ * - ERR_PTR(-ENOMEM):	Run out of EPC pages.
  */
 struct sgx_epc_page *__sgx_alloc_epc_page(void)
 {
-	struct sgx_epc_section *section;
 	struct sgx_epc_page *page;
-	int i;
 
-	for (i = 0; i < sgx_nr_epc_sections; i++) {
-		section = &sgx_epc_sections[i];
+	spin_lock(&sgx_free_page_list_lock);
 
-		page = __sgx_alloc_epc_page_from_section(section);
-		if (page)
-			return page;
+	if (list_empty(&sgx_free_page_list)) {
+		spin_unlock(&sgx_free_page_list_lock);
+		return NULL;
 	}
 
-	return ERR_PTR(-ENOMEM);
+	page = list_first_entry(&sgx_free_page_list, struct sgx_epc_page, list);
+	list_del_init(&page->list);
+	sgx_nr_free_pages--;
+
+	spin_unlock(&sgx_free_page_list_lock);
+
+	return page;
 }
 
 /**
@@ -593,7 +566,6 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
  */
 void sgx_free_epc_page(struct sgx_epc_page *page)
 {
-	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
 	int ret;
 
 	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
@@ -602,10 +574,10 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 	if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
 		return;
 
-	spin_lock(&section->lock);
-	list_add_tail(&page->list, &section->page_list);
-	section->free_cnt++;
-	spin_unlock(&section->lock);
+	spin_lock(&sgx_free_page_list_lock);
+	list_add_tail(&page->list, &sgx_free_page_list);
+	sgx_nr_free_pages++;
+	spin_unlock(&sgx_free_page_list_lock);
 }
 
 static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
@@ -627,8 +599,6 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
-	spin_lock_init(&section->lock);
-	INIT_LIST_HEAD(&section->page_list);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -637,7 +607,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 		list_add_tail(&section->pages[i].list, laundry);
 	}
 
-	section->free_cnt = nr_pages;
+	sgx_nr_free_pages += nr_pages;
 	return true;
 }
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index bc8af0428640..41ca045a574a 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -34,17 +34,11 @@ struct sgx_epc_page {
  * physical memory e.g. for memory areas of the each node. This structure is
  * used to store EPC pages for one EPC section and virtual memory area where
  * the pages have been mapped.
- *
- * 'lock' must be held before accessing 'page_list' or 'free_cnt'.
  */
 struct sgx_epc_section {
 	unsigned long phys_addr;
 	void *virt_addr;
 	struct sgx_epc_page *pages;
-
-	spinlock_t lock;
-	struct list_head page_list;
-	unsigned long free_cnt;
 };
 
 extern struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
-- 
2.30.1



* [PATCH v3 5/5] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()
       [not found] <20210303150323.433207-1-jarkko@kernel.org>
                   ` (3 preceding siblings ...)
  2021-03-03 15:03 ` [PATCH v3 4/5] x86/sgx: Replace section->page_list with a global free page list Jarkko Sakkinen
@ 2021-03-03 15:03 ` Jarkko Sakkinen
  2021-03-04  0:20   ` Dave Hansen
  4 siblings, 1 reply; 27+ messages in thread
From: Jarkko Sakkinen @ 2021-03-03 15:03 UTC (permalink / raw)
  To: linux-sgx
  Cc: Jarkko Sakkinen, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, H. Peter Anvin, Dave Hansen, linux-kernel

Background
==========

An EPC section is covered by one or more SRAT entries that are associated
with one and only one PXM (NUMA node). The motivation behind this patch is
to provide the basic elements for building an allocation scheme based on
this premise.

Solution
========

Use phys_to_target_node() to associate each NUMA node with the EPC
sections contained within its range. In sgx_alloc_epc_page(), first try
to allocate from the NUMA node where the CPU is executing. If that
fails, fall back to the legacy allocation.
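
The local-first-then-fallback shape can be sketched with per-node free
counts standing in for the per-node free lists (hypothetical names,
locking elided; the real fallback scans the global FIFO list rather than
the nodes in order):

```c
#include <assert.h>

#define NR_NODES 4

/* One free count per node stands in for the per-node free lists. */
static int node_free[NR_NODES];

static int alloc_from_node(int nid)
{
	if (nid < 0 || nid >= NR_NODES || node_free[nid] == 0)
		return -1;
	node_free[nid]--;
	return nid;               /* report which node served the page */
}

/* NUMA-aware allocation: the local node first, then anything free. */
static int alloc_page(int current_nid)
{
	int ret = alloc_from_node(current_nid);
	if (ret >= 0)
		return ret;
	for (int nid = 0; nid < NR_NODES; nid++) {
		ret = alloc_from_node(nid);
		if (ret >= 0)
			return ret;
	}
	return -1;                /* out of pages everywhere */
}
```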

Link: https://lore.kernel.org/lkml/158188326978.894464.217282995221175417.stgit@dwillia2-desk3.amr.corp.intel.com/
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
 arch/x86/Kconfig               |  1 +
 arch/x86/kernel/cpu/sgx/main.c | 84 ++++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h  |  9 ++++
 3 files changed, 94 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a5f6a3013138..7eb1e96cfe8a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1940,6 +1940,7 @@ config X86_SGX
 	depends on CRYPTO_SHA256=y
 	select SRCU
 	select MMU_NOTIFIER
+	select NUMA_KEEP_MEMINFO if NUMA
 	help
 	  Intel(R) Software Guard eXtensions (SGX) is a set of CPU instructions
 	  that can be used by applications to set aside private regions of code
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 58474480f5be..62cc0e1f0728 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -25,6 +25,23 @@ static DEFINE_SPINLOCK(sgx_reclaimer_lock);
 /* The free page list lock protected variables prepend the lock. */
 static unsigned long sgx_nr_free_pages;
 static LIST_HEAD(sgx_free_page_list);
+
+/* Nodes with one or more EPC sections. */
+static nodemask_t sgx_numa_mask;
+
+/*
+ * Array with one list_head for each possible NUMA node.  Each
+ * list contains all the sgx_epc_section's which are on that
+ * node.
+ */
+static struct sgx_numa_node *sgx_numa_nodes;
+
+/*
+ * sgx_free_epc_page() uses this to find out the correct struct sgx_numa_node,
+ * to put the page in.
+ */
+static int sgx_section_to_numa_node_id[SGX_MAX_EPC_SECTIONS];
+
 static DEFINE_SPINLOCK(sgx_free_page_list_lock);
 
 /*
@@ -434,6 +451,36 @@ static bool __init sgx_page_reclaimer_init(struct list_head *laundry)
 	return true;
 }
 
+static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
+{
+	struct sgx_epc_page *page = NULL;
+	struct sgx_numa_node *sgx_node;
+
+	if (WARN_ON_ONCE(nid < 0 || nid >= num_possible_nodes()))
+		return NULL;
+
+	if (!node_isset(nid, sgx_numa_mask))
+		return NULL;
+
+	sgx_node = &sgx_numa_nodes[nid];
+
+	spin_lock(&sgx_free_page_list_lock);
+
+	if (list_empty(&sgx_node->free_page_list)) {
+		spin_unlock(&sgx_free_page_list_lock);
+		return NULL;
+	}
+
+	page = list_first_entry(&sgx_node->free_page_list, struct sgx_epc_page, numa_list);
+	list_del_init(&page->numa_list);
+	list_del_init(&page->list);
+	sgx_nr_free_pages--;
+
+	spin_unlock(&sgx_free_page_list_lock);
+
+	return page;
+}
+
 /**
  * __sgx_alloc_epc_page() - Allocate an EPC page
  *
@@ -446,8 +493,14 @@ static bool __init sgx_page_reclaimer_init(struct list_head *laundry)
  */
 struct sgx_epc_page *__sgx_alloc_epc_page(void)
 {
+	int current_nid = numa_node_id();
 	struct sgx_epc_page *page;
 
+	/* Try to allocate EPC from the current node, first: */
+	page = __sgx_alloc_epc_page_from_node(current_nid);
+	if (page)
+		return page;
+
 	spin_lock(&sgx_free_page_list_lock);
 
 	if (list_empty(&sgx_free_page_list)) {
@@ -456,6 +509,7 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
 	}
 
 	page = list_first_entry(&sgx_free_page_list, struct sgx_epc_page, list);
+	list_del_init(&page->numa_list);
 	list_del_init(&page->list);
 	sgx_nr_free_pages--;
 
@@ -566,6 +620,8 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
  */
 void sgx_free_epc_page(struct sgx_epc_page *page)
 {
+	int nid = sgx_section_to_numa_node_id[page->section];
+	struct sgx_numa_node *sgx_node = &sgx_numa_nodes[nid];
 	int ret;
 
 	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
@@ -575,7 +631,15 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 		return;
 
 	spin_lock(&sgx_free_page_list_lock);
+
+	/*   Enable NUMA local allocation in sgx_alloc_epc_page(). */
+	if (!node_isset(nid, sgx_numa_mask)) {
+		INIT_LIST_HEAD(&sgx_node->free_page_list);
+		node_set(nid, sgx_numa_mask);
+	}
+
 	list_add_tail(&page->list, &sgx_free_page_list);
+	list_add_tail(&page->numa_list, &sgx_node->free_page_list);
 	sgx_nr_free_pages++;
 	spin_unlock(&sgx_free_page_list_lock);
 }
@@ -626,8 +690,28 @@ static bool __init sgx_page_cache_init(struct list_head *laundry)
 {
 	u32 eax, ebx, ecx, edx, type;
 	u64 pa, size;
+	int nid;
 	int i;
 
+	nodes_clear(sgx_numa_mask);
+	sgx_numa_nodes = kmalloc_array(num_possible_nodes(), sizeof(*sgx_numa_nodes), GFP_KERNEL);
+
+	/*
+	 * Create NUMA node lookup table for sgx_free_epc_page() as the very
+	 * first step, as it is used to populate the free list's during the
+	 * initialization.
+	 */
+	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
+		nid = numa_map_to_online_node(phys_to_target_node(pa));
+		if (nid == NUMA_NO_NODE) {
+			/* The physical address is already printed above. */
+			pr_warn(FW_BUG "Unable to map EPC section to online node. Fallback to the NUMA node 0.\n");
+			nid = 0;
+		}
+
+		sgx_section_to_numa_node_id[i] = nid;
+	}
+
 	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
 		cpuid_count(SGX_CPUID, i + SGX_CPUID_EPC, &eax, &ebx, &ecx, &edx);
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 41ca045a574a..3a3c07fc0c8e 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -27,6 +27,7 @@ struct sgx_epc_page {
 	unsigned int flags;
 	struct sgx_encl_page *owner;
 	struct list_head list;
+	struct list_head numa_list;
 };
 
 /*
@@ -43,6 +44,14 @@ struct sgx_epc_section {
 
 extern struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 
+/*
+ * Contains the tracking data for NUMA nodes having EPC pages. Most importantly,
+ * the free page list local to the node is stored here.
+ */
+struct sgx_numa_node {
+	struct list_head free_page_list;
+};
+
 static inline unsigned long sgx_get_epc_phys_addr(struct sgx_epc_page *page)
 {
 	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
-- 
2.30.1



* Re: [PATCH v3 1/5] x86/sgx: Fix a resource leak in sgx_init()
  2021-03-03 15:03 ` [PATCH v3 1/5] x86/sgx: Fix a resource leak in sgx_init() Jarkko Sakkinen
@ 2021-03-03 16:56   ` Dave Hansen
  2021-03-10 15:00     ` Jarkko Sakkinen
  0 siblings, 1 reply; 27+ messages in thread
From: Dave Hansen @ 2021-03-03 16:56 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-sgx
  Cc: stable, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Jethro Beekman,
	Sean Christopherson, Serge Ayoun, linux-kernel

On 3/3/21 7:03 AM, Jarkko Sakkinen wrote:
> If sgx_page_cache_init() fails in the middle, a trivial return
> statement causes unused memory and virtual address space reserved for
> the EPC section, not freed. Fix this by using the same rollback, as
> when sgx_page_reclaimer_init() fails.
...
> @@ -708,8 +708,10 @@ static int __init sgx_init(void)
>  	if (!cpu_feature_enabled(X86_FEATURE_SGX))
>  		return -ENODEV;
>  
> -	if (!sgx_page_cache_init())
> -		return -ENOMEM;
> +	if (!sgx_page_cache_init()) {
> +		ret = -ENOMEM;
> +		goto err_page_cache;
> +	}


Currently, the only way sgx_page_cache_init() can fail is in the case
that there are no sections:

        if (!sgx_nr_epc_sections) {
                pr_err("There are zero EPC sections.\n");
                return false;
        }

That only happens if all of the sgx_setup_epc_section() calls failed.
sgx_setup_epc_section() never both allocates memory with vmalloc() for
section->pages *and* fails.  If sgx_setup_epc_section() has a successful
memremap() but a failed vmalloc(), it cleans up with memunmap().

In other words, I see how this _looks_ like a memory leak from
sgx_init(), but I don't see an actual leak in practice.

Am I missing something?


* Re: [PATCH v3 2/5] x86/sgx: Use sgx_free_epc_page() in sgx_reclaim_pages()
  2021-03-03 15:03 ` [PATCH v3 2/5] x86/sgx: Use sgx_free_epc_page() in sgx_reclaim_pages() Jarkko Sakkinen
@ 2021-03-03 16:59   ` Dave Hansen
  2021-03-10 15:11     ` Jarkko Sakkinen
  0 siblings, 1 reply; 27+ messages in thread
From: Dave Hansen @ 2021-03-03 16:59 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-sgx
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel

On 3/3/21 7:03 AM, Jarkko Sakkinen wrote:
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 52d070fb4c9a..ed99c60024dc 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -305,7 +305,6 @@ static void sgx_reclaim_pages(void)
>  {
>  	struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
>  	struct sgx_backing backing[SGX_NR_TO_SCAN];
> -	struct sgx_epc_section *section;
>  	struct sgx_encl_page *encl_page;
>  	struct sgx_epc_page *epc_page;
>  	pgoff_t page_index;
> @@ -378,11 +377,7 @@ static void sgx_reclaim_pages(void)
>  		kref_put(&encl_page->encl->refcount, sgx_encl_release);
>  		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
>  
> -		section = &sgx_epc_sections[epc_page->section];
> -		spin_lock(&section->lock);
> -		list_add_tail(&epc_page->list, &section->page_list);
> -		section->free_cnt++;
> -		spin_unlock(&section->lock);
> +		sgx_free_epc_page(epc_page);
>  	}
>  }

In current upstream (3fb6d0e00e), sgx_free_epc_page() calls __eremove().
This code does not call __eremove().  That seems to be changing
behavior where none was intended.

Was this, perhaps, based on top of Kai's series that changes the
behavior of sgx_free_epc_page()?


* Re: [PATCH v3 3/5] x86/sgx: Replace section->init_laundry_list with a temp list
  2021-03-03 15:03 ` [PATCH v3 3/5] x86/sgx: Replace section->init_laundry_list with a temp list Jarkko Sakkinen
@ 2021-03-03 18:02   ` Dave Hansen
  2021-03-10 14:50     ` Jarkko Sakkinen
  0 siblings, 1 reply; 27+ messages in thread
From: Dave Hansen @ 2021-03-03 18:02 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-sgx
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel

...
> -static void sgx_sanitize_section(struct sgx_epc_section *section)
> +static void sgx_sanitize_section(struct list_head *laundry)
>  {

Does this need a better function name now that it's not literally
dealing with sections at *all*?

	sgx_sanitize_pages()

perhaps.

>  	struct sgx_epc_page *page;
>  	LIST_HEAD(dirty);
>  	int ret;
>  
>  	/* init_laundry_list is thread-local, no need for a lock: */
> -	while (!list_empty(&section->init_laundry_list)) {
> +	while (!list_empty(laundry)) {
>  		if (kthread_should_stop())
>  			return;
>  
> -		/* needed for access to ->page_list: */
> -		spin_lock(&section->lock);
> -
> -		page = list_first_entry(&section->init_laundry_list,
> -					struct sgx_epc_page, list);
> +		page = list_first_entry(laundry, struct sgx_epc_page, list);
>  
>  		ret = __eremove(sgx_get_epc_virt_addr(page));
> -		if (!ret)
> -			list_move(&page->list, &section->page_list);
> -		else
> +		if (!ret) {
> +			/* The page is clean - move to the free list. */
> +			list_del(&page->list);
> +			sgx_free_epc_page(page);
> +		} else {
> +			/* The page is not yet clean - move to the dirty list. */
>  			list_move_tail(&page->list, &dirty);
> -
> -		spin_unlock(&section->lock);
> +		}
>  
>  		cond_resched();
>  	}
>  
> -	list_splice(&dirty, &section->init_laundry_list);
> +	list_splice(&dirty, laundry);
>  }
>  
>  static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
> @@ -400,6 +398,7 @@ static bool sgx_should_reclaim(unsigned long watermark)
>  
>  static int ksgxd(void *p)
>  {
> +	struct list_head *laundry = p;
>  	int i;
>  
>  	set_freezable();
> @@ -408,16 +407,13 @@ static int ksgxd(void *p)
>  	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
>  	 * required for SECS pages, whose child pages blocked EREMOVE.
>  	 */
> -	for (i = 0; i < sgx_nr_epc_sections; i++)
> -		sgx_sanitize_section(&sgx_epc_sections[i]);
> +	sgx_sanitize_section(laundry);
> +	sgx_sanitize_section(laundry);

Did you intend to call this twice?

> -	for (i = 0; i < sgx_nr_epc_sections; i++) {
> -		sgx_sanitize_section(&sgx_epc_sections[i]);
> +	if (!list_empty(laundry))
> +		WARN(1, "EPC section %d has unsanitized pages.\n", i);
>  
> -		/* Should never happen. */
> -		if (!list_empty(&sgx_epc_sections[i].init_laundry_list))
> -			WARN(1, "EPC section %d has unsanitized pages.\n", i);
> -	}
> +	kfree(laundry);

This is a bit unfortunate.  'laundry' is allocated up in another thread
and the lifetime isn't obvious.  It's just 32 bytes, but this is just
asking to be leaked.

>  	while (!kthread_should_stop()) {
>  		if (try_to_freeze())
> @@ -436,11 +432,11 @@ static int ksgxd(void *p)
>  	return 0;
>  }
>  
> -static bool __init sgx_page_reclaimer_init(void)
> +static bool __init sgx_page_reclaimer_init(struct list_head *laundry)
>  {
>  	struct task_struct *tsk;
>  
> -	tsk = kthread_run(ksgxd, NULL, "ksgxd");
> +	tsk = kthread_run(ksgxd, laundry, "ksgxd");
>  	if (IS_ERR(tsk))
>  		return false;
>  
> @@ -614,7 +610,8 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
>  
>  static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
>  					 unsigned long index,
> -					 struct sgx_epc_section *section)
> +					 struct sgx_epc_section *section,
> +					 struct list_head *laundry)
>  {

I think this at least need a comment somewhere about what this function
is doing with 'laundry'.

>  	unsigned long nr_pages = size >> PAGE_SHIFT;
>  	unsigned long i;
> @@ -632,13 +629,12 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
>  	section->phys_addr = phys_addr;
>  	spin_lock_init(&section->lock);
>  	INIT_LIST_HEAD(&section->page_list);
> -	INIT_LIST_HEAD(&section->init_laundry_list);
>  
>  	for (i = 0; i < nr_pages; i++) {
>  		section->pages[i].section = index;
>  		section->pages[i].flags = 0;
>  		section->pages[i].owner = NULL;
> -		list_add_tail(&section->pages[i].list, &section->init_laundry_list);
> +		list_add_tail(&section->pages[i].list, laundry);
>  	}
>  
>  	section->free_cnt = nr_pages;
> @@ -656,7 +652,7 @@ static inline u64 __init sgx_calc_section_metric(u64 low, u64 high)
>  	       ((high & GENMASK_ULL(19, 0)) << 32);
>  }
>  
> -static bool __init sgx_page_cache_init(void)
> +static bool __init sgx_page_cache_init(struct list_head *laundry)
>  {
>  	u32 eax, ebx, ecx, edx, type;
>  	u64 pa, size;
> @@ -679,7 +675,7 @@ static bool __init sgx_page_cache_init(void)
>  
>  		pr_info("EPC section 0x%llx-0x%llx\n", pa, pa + size - 1);
>  
> -		if (!sgx_setup_epc_section(pa, size, i, &sgx_epc_sections[i])) {
> +		if (!sgx_setup_epc_section(pa, size, i, &sgx_epc_sections[i], laundry)) {
>  			pr_err("No free memory for an EPC section\n");
>  			break;
>  		}

This is a great place for a comment about what is coming back on 'laundry'.

> @@ -697,18 +693,25 @@ static bool __init sgx_page_cache_init(void)
>  
>  static int __init sgx_init(void)
>  {
> +	struct list_head *laundry;
>  	int ret;
>  	int i;
>  
>  	if (!cpu_feature_enabled(X86_FEATURE_SGX))
>  		return -ENODEV;
>  
> -	if (!sgx_page_cache_init()) {
> +	laundry = kzalloc(sizeof(*laundry), GFP_KERNEL);
> +	if (!laundry)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(laundry);
> +
> +	if (!sgx_page_cache_init(laundry)) {
>  		ret = -ENOMEM;
>  		goto err_page_cache;
>  	}
>  
> -	if (!sgx_page_reclaimer_init()) {
> +	if (!sgx_page_reclaimer_init(laundry)) {
>  		ret = -ENOMEM;
>  		goto err_page_cache;
>  	}

I really don't like this being dynamically allocated, especially since
it's freed in another task in a non-obvious place.

Wouldn't this all just be a lot simpler if we had a global list_head?
That will eat a whopping 16 bytes of space.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 4/5] x86/sgx: Replace section->page_list with a global free page list
  2021-03-03 15:03 ` [PATCH v3 4/5] x86/sgx: Replace section->page_list with a global free page list Jarkko Sakkinen
@ 2021-03-03 23:48   ` Dave Hansen
  2021-03-10 10:54     ` Jarkko Sakkinen
  0 siblings, 1 reply; 27+ messages in thread
From: Dave Hansen @ 2021-03-03 23:48 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-sgx
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel

On 3/3/21 7:03 AM, Jarkko Sakkinen wrote:
> Background
> ==========
> 
> EPC section is covered by one or more SRAT entries that are associated with
> one and only one PXM (NUMA node). The current implementation overheats a

Overheats?

> single NUMA node, because sgx_alloc_epc_page() always starts looking for
> pages from the same EPC section everytime.

"every time"

> Only within a section it does pick pages in FIFO fashion, i.e. the oldest
> freed in that section is the EPC page given back to the caller.  That does
> not do any good, as the pages in the same node are performance-wise equal.

I'm not sure why all of this is relevant and it doesn't really tell me
anything about this patch's place in the *series*.

Why are we destroying all of the per-node structures just before adding
NUMA support?

> Replace local lists with a single global free page list, from which pages
> are borrowed in FIFO fashion.


* Re: [PATCH v3 5/5] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()
  2021-03-03 15:03 ` [PATCH v3 5/5] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page() Jarkko Sakkinen
@ 2021-03-04  0:20   ` Dave Hansen
  2021-03-10 11:30     ` Jarkko Sakkinen
  0 siblings, 1 reply; 27+ messages in thread
From: Dave Hansen @ 2021-03-04  0:20 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-sgx
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Dave Hansen, linux-kernel

What changed from the last patch?

On 3/3/21 7:03 AM, Jarkko Sakkinen wrote:
> Background
> ==========
> 
> EPC section is covered by one or more SRAT entries that are associated with
> one and only one PXM (NUMA node). The motivation behind this patch is to
> provide basic elements of building allocation scheme based on this premise.

Just like normal RAM, enclave memory (EPC) should be covered by entries
in the ACPI SRAT table.  These entries allow each EPC section to be
associated with a NUMA node.

Use this information to implement a simple NUMA-aware allocator for
enclave memory.

> Use phys_to_target_node() to associate each NUMA node with the EPC
> sections contained within its range. In sgx_alloc_epc_page(), first try
> to allocate from the NUMA node, where the CPU is executing. If that
> fails, fallback to the legacy allocation.

By "legacy", you mean the one from the last patch? :)

> Link: https://lore.kernel.org/lkml/158188326978.894464.217282995221175417.stgit@dwillia2-desk3.amr.corp.intel.com/
> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> ---
>  arch/x86/Kconfig               |  1 +
>  arch/x86/kernel/cpu/sgx/main.c | 84 ++++++++++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/sgx/sgx.h  |  9 ++++
>  3 files changed, 94 insertions(+)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index a5f6a3013138..7eb1e96cfe8a 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1940,6 +1940,7 @@ config X86_SGX
>  	depends on CRYPTO_SHA256=y
>  	select SRCU
>  	select MMU_NOTIFIER
> +	select NUMA_KEEP_MEMINFO if NUMA

This dependency is worth mentioning somewhere.  Why do we suddenly need
NUMA_KEEP_MEMINFO?

> +/* Nodes with one or more EPC sections. */
> +static nodemask_t sgx_numa_mask;
> +
> +/*
> + * Array with one list_head for each possible NUMA node.  Each
> + * list contains all the sgx_epc_section's which are on that

					   ^ no "'", please

> + * node.
> + */
> +static struct sgx_numa_node *sgx_numa_nodes;
> +
> +/*
> + * sgx_free_epc_page() uses this to find out the correct struct sgx_numa_node,
> + * to put the page in.
> + */
> +static int sgx_section_to_numa_node_id[SGX_MAX_EPC_SECTIONS];

If this is per-section, why not put it in struct sgx_epc_section?

>  /*
> @@ -434,6 +451,36 @@ static bool __init sgx_page_reclaimer_init(struct list_head *laundry)
>  	return true;
>  }
>  
> +static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
> +{
> +	struct sgx_epc_page *page = NULL;
> +	struct sgx_numa_node *sgx_node;
> +
> +	if (WARN_ON_ONCE(nid < 0 || nid >= num_possible_nodes()))
> +		return NULL;

This has exactly one call-site which plumbs numa_node_id() in here
pretty directly.  Is this check worthwhile?

> +	if (!node_isset(nid, sgx_numa_mask))
> +		return NULL;
> +
> +	sgx_node = &sgx_numa_nodes[nid];
> +
> +	spin_lock(&sgx_free_page_list_lock);

The global lock protecting a per-node structure is a bit unsightly.

> +	if (list_empty(&sgx_node->free_page_list)) {
> +		spin_unlock(&sgx_free_page_list_lock);
> +		return NULL;
> +	}
> +
> +	page = list_first_entry(&sgx_node->free_page_list, struct sgx_epc_page, numa_list);
> +	list_del_init(&page->numa_list);
> +	list_del_init(&page->list);
> +	sgx_nr_free_pages--;
> +
> +	spin_unlock(&sgx_free_page_list_lock);
> +
> +	return page;
> +}
> +
>  /**
>   * __sgx_alloc_epc_page() - Allocate an EPC page
>   *
> @@ -446,8 +493,14 @@ static bool __init sgx_page_reclaimer_init(struct list_head *laundry)
>   */
>  struct sgx_epc_page *__sgx_alloc_epc_page(void)
>  {
> +	int current_nid = numa_node_id();
>  	struct sgx_epc_page *page;
>  
> +	/* Try to allocate EPC from the current node, first: */
> +	page = __sgx_alloc_epc_page_from_node(current_nid);
> +	if (page)
> +		return page;
> +
>  	spin_lock(&sgx_free_page_list_lock);
>  
>  	if (list_empty(&sgx_free_page_list)) {
> @@ -456,6 +509,7 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
>  	}
>  
>  	page = list_first_entry(&sgx_free_page_list, struct sgx_epc_page, list);
> +	list_del_init(&page->numa_list);
>  	list_del_init(&page->list);
>  	sgx_nr_free_pages--;

I would much rather prefer that this does what the real page allocator
does: keep the page on a single list.  That list is maintained
per-NUMA-node.  Allocations try local NUMA node structures, then fall
back to other structures (hopefully in a locality-aware fashion).

I wrote you the loop that I want to see this implement in an earlier
review.  This, basically:

	page = NULL;
	nid = numa_node_id();
	while (true) {
		page = __sgx_alloc_epc_page_from_node(nid);	
		if (page)
			break;

		nid = // ... some search here, next_node_in()...
		// check if we wrapped around:
		if (nid == numa_node_id())
			break;
	}

There's no global list.  You just walk around nodes trying to find one
with space.  If you wrap around, you stop.

Please implement this.  If you think it's a bad idea, or can't, let's
talk about it in advance.  Right now, it appears that my review comments
aren't being incorporated into newer versions.

>  void sgx_free_epc_page(struct sgx_epc_page *page)
>  {
> +	int nid = sgx_section_to_numa_node_id[page->section];
> +	struct sgx_numa_node *sgx_node = &sgx_numa_nodes[nid];
>  	int ret;
>  
>  	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
> @@ -575,7 +631,15 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
>  		return;
>  
>  	spin_lock(&sgx_free_page_list_lock);
> +
> +	/*   Enable NUMA local allocation in sgx_alloc_epc_page(). */
> +	if (!node_isset(nid, sgx_numa_mask)) {
> +		INIT_LIST_HEAD(&sgx_node->free_page_list);
> +		node_set(nid, sgx_numa_mask);
> +	}
> +
>  	list_add_tail(&page->list, &sgx_free_page_list);
> +	list_add_tail(&page->numa_list, &sgx_node->free_page_list);
>  	sgx_nr_free_pages++;
>  	spin_unlock(&sgx_free_page_list_lock);
>  }
> @@ -626,8 +690,28 @@ static bool __init sgx_page_cache_init(struct list_head *laundry)
>  {
>  	u32 eax, ebx, ecx, edx, type;
>  	u64 pa, size;
> +	int nid;
>  	int i;
>  
> +	nodes_clear(sgx_numa_mask);

Is this really required for a variable allocated in .bss?

> +	sgx_numa_nodes = kmalloc_array(num_possible_nodes(), sizeof(*sgx_numa_nodes), GFP_KERNEL);

This is what I was looking for here, thanks!

> +	/*
> +	 * Create NUMA node lookup table for sgx_free_epc_page() as the very
> +	 * first step, as it is used to populate the free list's during the
> +	 * initialization.
> +	 */
> +	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
> +		nid = numa_map_to_online_node(phys_to_target_node(pa));
> +		if (nid == NUMA_NO_NODE) {
> +			/* The physical address is already printed above. */
> +			pr_warn(FW_BUG "Unable to map EPC section to online node. Fallback to the NUMA node 0.\n");
> +			nid = 0;
> +		}
> +
> +		sgx_section_to_numa_node_id[i] = nid;
> +	}
> +
>  	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
>  		cpuid_count(SGX_CPUID, i + SGX_CPUID_EPC, &eax, &ebx, &ecx, &edx);
>  
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index 41ca045a574a..3a3c07fc0c8e 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -27,6 +27,7 @@ struct sgx_epc_page {
>  	unsigned int flags;
>  	struct sgx_encl_page *owner;
>  	struct list_head list;
> +	struct list_head numa_list;
>  };

I'll say it again, explicitly: Each sgx_epc_page should be on one and
only one free list: a per-NUMA-node list.

>  /*
> @@ -43,6 +44,14 @@ struct sgx_epc_section {
>  
>  extern struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
>  
> +/*
> + * Contains the tracking data for NUMA nodes having EPC pages. Most importantly,
> + * the free page list local to the node is stored here.
> + */
> +struct sgx_numa_node {
> +	struct list_head free_page_list;
> +};

I think it's unconscionable to leave this protected by a global lock.
Please at least give us a per-node spinlock protecting this list.




* Re: [PATCH v3 4/5] x86/sgx: Replace section->page_list with a global free page list
  2021-03-03 23:48   ` Dave Hansen
@ 2021-03-10 10:54     ` Jarkko Sakkinen
  0 siblings, 0 replies; 27+ messages in thread
From: Jarkko Sakkinen @ 2021-03-10 10:54 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, linux-kernel

On Wed, Mar 03, 2021 at 03:48:46PM -0800, Dave Hansen wrote:
> On 3/3/21 7:03 AM, Jarkko Sakkinen wrote:
> > Background
> > ==========
> > 
> > EPC section is covered by one or more SRAT entries that are associated with
> > one and only one PXM (NUMA node). The current implementation overheats a
> 
> Overheats?
> 
> > single NUMA node, because sgx_alloc_epc_page() always starts looking for
> > pages from the same EPC section everytime.
> 
> "every time"
> 
> > Only within a section it does pick pages in FIFO fashion, i.e. the oldest
> > freed in that section is the EPC page given back to the caller.  That does
> > not do any good, as the pages in the same node are performance-wise equal.
> 
> I'm not sure why all of this is relevant and it doesn't really tell me
> anything about this patch's place in the *series*.
> 
> Why are we destroying all of the per-node structures just before adding
> NUMA support?

These are per-section structures, not per-node structures. Probably most
of the time, if not always, they are equal, but conceptually I'm not
destroying per-node structures :-) I'm introducing them in 5/5.

/Jarkko


* Re: [PATCH v3 5/5] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()
  2021-03-04  0:20   ` Dave Hansen
@ 2021-03-10 11:30     ` Jarkko Sakkinen
  2021-03-10 15:44       ` Dave Hansen
  0 siblings, 1 reply; 27+ messages in thread
From: Jarkko Sakkinen @ 2021-03-10 11:30 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Dave Hansen, linux-kernel

Weird. I did check my kernel.org mail on Thursday night but did not get
this. I was actually wondering about the lack of feedback.

Then on Monday I suddenly had a huge pile of email waiting for me, with a
bunch of emails from around the time you sent this one.

On Wed, Mar 03, 2021 at 04:20:03PM -0800, Dave Hansen wrote:
> What changed from the last patch?
> 
> On 3/3/21 7:03 AM, Jarkko Sakkinen wrote:
> > Background
> > ==========
> > 
> > EPC section is covered by one or more SRAT entries that are associated with
> > one and only one PXM (NUMA node). The motivation behind this patch is to
> > provide basic elements of building allocation scheme based on this premise.
> 
> Just like normal RAM, enclave memory (EPC) should be covered by entries
> in the ACPI SRAT table.  These entries allow each EPC section to be
> associated with a NUMA node.
> 
> Use this information to implement a simple NUMA-aware allocator for
> enclave memory.
> 
> > Use phys_to_target_node() to associate each NUMA node with the EPC
> > sections contained within its range. In sgx_alloc_epc_page(), first try
> > to allocate from the NUMA node, where the CPU is executing. If that
> > fails, fallback to the legacy allocation.
> 
> By "legacy", you mean the one from the last patch? :)
> 
> > Link: https://lore.kernel.org/lkml/158188326978.894464.217282995221175417.stgit@dwillia2-desk3.amr.corp.intel.com/
> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > ---
> >  arch/x86/Kconfig               |  1 +
> >  arch/x86/kernel/cpu/sgx/main.c | 84 ++++++++++++++++++++++++++++++++++
> >  arch/x86/kernel/cpu/sgx/sgx.h  |  9 ++++
> >  3 files changed, 94 insertions(+)
> > 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index a5f6a3013138..7eb1e96cfe8a 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1940,6 +1940,7 @@ config X86_SGX
> >  	depends on CRYPTO_SHA256=y
> >  	select SRCU
> >  	select MMU_NOTIFIER
> > +	select NUMA_KEEP_MEMINFO if NUMA
> 
> This dependency is worth mentioning somewhere.  Why do we suddenly need
> NUMA_KEEP_MEMINFO?
> 
> > +/* Nodes with one or more EPC sections. */
> > +static nodemask_t sgx_numa_mask;
> > +
> > +/*
> > + * Array with one list_head for each possible NUMA node.  Each
> > + * list contains all the sgx_epc_section's which are on that
> 
> 					   ^ no "'", please
> 
> > + * node.
> > + */
> > +static struct sgx_numa_node *sgx_numa_nodes;
> > +
> > +/*
> > + * sgx_free_epc_page() uses this to find out the correct struct sgx_numa_node,
> > + * to put the page in.
> > + */
> > +static int sgx_section_to_numa_node_id[SGX_MAX_EPC_SECTIONS];
> 
> If this is per-section, why not put it in struct sgx_epc_section?

Because struct sgx_epc_page does not contain a pointer to
struct sgx_epc_section.

> 
> >  /*
> > @@ -434,6 +451,36 @@ static bool __init sgx_page_reclaimer_init(struct list_head *laundry)
> >  	return true;
> >  }
> >  
> > +static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
> > +{
> > +	struct sgx_epc_page *page = NULL;
> > +	struct sgx_numa_node *sgx_node;
> > +
> > +	if (WARN_ON_ONCE(nid < 0 || nid >= num_possible_nodes()))
> > +		return NULL;
> 
> This has exactly one call-site which plumbs numa_node_id() in here
> pretty directly.  Is this check worthwhile?

Probably not.


> > +	if (!node_isset(nid, sgx_numa_mask))
> > +		return NULL;
> > +
> > +	sgx_node = &sgx_numa_nodes[nid];
> > +
> > +	spin_lock(&sgx_free_page_list_lock);
> 
> The global lock protecting a per-node structure is a bit unsightly.

The patch set could introduce an additional patch for changing the
locking scheme. It's logically a separate change.

> > +	if (list_empty(&sgx_node->free_page_list)) {
> > +		spin_unlock(&sgx_free_page_list_lock);
> > +		return NULL;
> > +	}
> > +
> > +	page = list_first_entry(&sgx_node->free_page_list, struct sgx_epc_page, numa_list);
> > +	list_del_init(&page->numa_list);
> > +	list_del_init(&page->list);
> > +	sgx_nr_free_pages--;
> > +
> > +	spin_unlock(&sgx_free_page_list_lock);
> > +
> > +	return page;
> > +}
> > +
> >  /**
> >   * __sgx_alloc_epc_page() - Allocate an EPC page
> >   *
> > @@ -446,8 +493,14 @@ static bool __init sgx_page_reclaimer_init(struct list_head *laundry)
> >   */
> >  struct sgx_epc_page *__sgx_alloc_epc_page(void)
> >  {
> > +	int current_nid = numa_node_id();
> >  	struct sgx_epc_page *page;
> >  
> > +	/* Try to allocate EPC from the current node, first: */
> > +	page = __sgx_alloc_epc_page_from_node(current_nid);
> > +	if (page)
> > +		return page;
> > +
> >  	spin_lock(&sgx_free_page_list_lock);
> >  
> >  	if (list_empty(&sgx_free_page_list)) {
> > @@ -456,6 +509,7 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
> >  	}
> >  
> >  	page = list_first_entry(&sgx_free_page_list, struct sgx_epc_page, list);
> > +	list_del_init(&page->numa_list);
> >  	list_del_init(&page->list);
> >  	sgx_nr_free_pages--;
> 
> I would much rather prefer that this does what the real page allocator
> does: keep the page on a single list.  That list is maintained
> per-NUMA-node.  Allocations try local NUMA node structures, then fall
> back to other structures (hopefully in a locality-aware fashion).
> 
> I wrote you the loop that I want to see this implement in an earlier
> review.  This, basically:
> 
> 	page = NULL;
> 	nid = numa_node_id();
> 	while (true) {
> 		page = __sgx_alloc_epc_page_from_node(nid);	
> 		if (page)
> 			break;
> 
> 		nid = // ... some search here, next_node_in()...
> 		// check if we wrapped around:
> 		if (nid == numa_node_id())
> 			break;
> 	}
> 
> There's no global list.  You just walk around nodes trying to find one
> with space.  If you wrap around, you stop.
> 
> Please implement this.  If you think it's a bad idea, or can't, let's
> talk about it in advance.  Right now, it appears that my review comments
> aren't being incorporated into newer versions.

How I interpreted your earlier comments is that the fallback was unfair,
and this version of the patch set does fix that.

I can buy the above allocation scheme, but I don't think this version of
the patch set is a step backwards. The things done to struct
sgx_epc_section are exactly what should be done to it.

Implementation-wise, you are asking me to squash 4/5 and 5/5 into a
single patch and remove the global list. It's a tiny iteration from this
patch version and I can do it.

> >  void sgx_free_epc_page(struct sgx_epc_page *page)
> >  {
> > +	int nid = sgx_section_to_numa_node_id[page->section];
> > +	struct sgx_numa_node *sgx_node = &sgx_numa_nodes[nid];
> >  	int ret;
> >  
> >  	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
> > @@ -575,7 +631,15 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
> >  		return;
> >  
> >  	spin_lock(&sgx_free_page_list_lock);
> > +
> > +	/*   Enable NUMA local allocation in sgx_alloc_epc_page(). */
> > +	if (!node_isset(nid, sgx_numa_mask)) {
> > +		INIT_LIST_HEAD(&sgx_node->free_page_list);
> > +		node_set(nid, sgx_numa_mask);
> > +	}
> > +
> >  	list_add_tail(&page->list, &sgx_free_page_list);
> > +	list_add_tail(&page->numa_list, &sgx_node->free_page_list);
> >  	sgx_nr_free_pages++;
> >  	spin_unlock(&sgx_free_page_list_lock);
> >  }
> > @@ -626,8 +690,28 @@ static bool __init sgx_page_cache_init(struct list_head *laundry)
> >  {
> >  	u32 eax, ebx, ecx, edx, type;
> >  	u64 pa, size;
> > +	int nid;
> >  	int i;
> >  
> > +	nodes_clear(sgx_numa_mask);
> 
> Is this really required for a variable allocated in .bss?

Probably not, I'll check what nodes_clear() does.

> > +	sgx_numa_nodes = kmalloc_array(num_possible_nodes(), sizeof(*sgx_numa_nodes), GFP_KERNEL);
> 
> This is what I was looking for here, thanks!
> 
> > +	/*
> > +	 * Create NUMA node lookup table for sgx_free_epc_page() as the very
> > +	 * first step, as it is used to populate the free list's during the
> > +	 * initialization.
> > +	 */
> > +	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
> > +		nid = numa_map_to_online_node(phys_to_target_node(pa));
> > +		if (nid == NUMA_NO_NODE) {
> > +			/* The physical address is already printed above. */
> > +			pr_warn(FW_BUG "Unable to map EPC section to online node. Fallback to the NUMA node 0.\n");
> > +			nid = 0;
> > +		}
> > +
> > +		sgx_section_to_numa_node_id[i] = nid;
> > +	}
> > +
> >  	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
> >  		cpuid_count(SGX_CPUID, i + SGX_CPUID_EPC, &eax, &ebx, &ecx, &edx);
> >  
> > diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> > index 41ca045a574a..3a3c07fc0c8e 100644
> > --- a/arch/x86/kernel/cpu/sgx/sgx.h
> > +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> > @@ -27,6 +27,7 @@ struct sgx_epc_page {
> >  	unsigned int flags;
> >  	struct sgx_encl_page *owner;
> >  	struct list_head list;
> > +	struct list_head numa_list;
> >  };
> 
> I'll say it again, explicitly: Each sgx_epc_page should be on one and
> only one free list: a per-NUMA-node list.
> 
> >  /*
> > @@ -43,6 +44,14 @@ struct sgx_epc_section {
> >  
> >  extern struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
> >  
> > +/*
> > + * Contains the tracking data for NUMA nodes having EPC pages. Most importantly,
> > + * the free page list local to the node is stored here.
> > + */
> > +struct sgx_numa_node {
> > +	struct list_head free_page_list;
> > +};
> 
> I think it's unconscionable to leave this protected by a global lock.
> Please at least give us a per-node spinlock protecting this list.

I can do it but I'll add a separate commit for it. It's better to make
locking scheme changes that way (IMHO). Helps with bisection later on...

/Jarkko


* Re: [PATCH v3 3/5] x86/sgx: Replace section->init_laundry_list with a temp list
  2021-03-03 18:02   ` Dave Hansen
@ 2021-03-10 14:50     ` Jarkko Sakkinen
  0 siblings, 0 replies; 27+ messages in thread
From: Jarkko Sakkinen @ 2021-03-10 14:50 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, linux-kernel

On Wed, Mar 03, 2021 at 10:02:27AM -0800, Dave Hansen wrote:
> ...
> > -static void sgx_sanitize_section(struct sgx_epc_section *section)
> > +static void sgx_sanitize_section(struct list_head *laundry)
> >  {
> 
> Does this need a better function name now that it's not literally
> dealing with sections at *all*?
> 
> 	sgx_sanitize_pages()
> 
> perhaps.

Makes sense to me.

> >  	struct sgx_epc_page *page;
> >  	LIST_HEAD(dirty);
> >  	int ret;
> >  
> >  	/* init_laundry_list is thread-local, no need for a lock: */
> > -	while (!list_empty(&section->init_laundry_list)) {
> > +	while (!list_empty(laundry)) {
> >  		if (kthread_should_stop())
> >  			return;
> >  
> > -		/* needed for access to ->page_list: */
> > -		spin_lock(&section->lock);
> > -
> > -		page = list_first_entry(&section->init_laundry_list,
> > -					struct sgx_epc_page, list);
> > +		page = list_first_entry(laundry, struct sgx_epc_page, list);
> >  
> >  		ret = __eremove(sgx_get_epc_virt_addr(page));
> > -		if (!ret)
> > -			list_move(&page->list, &section->page_list);
> > -		else
> > +		if (!ret) {
> > +			/* The page is clean - move to the free list. */
> > +			list_del(&page->list);
> > +			sgx_free_epc_page(page);
> > +		} else {
> > +			/* The page is not yet clean - move to the dirty list. */
> >  			list_move_tail(&page->list, &dirty);
> > -
> > -		spin_unlock(&section->lock);
> > +		}
> >  
> >  		cond_resched();
> >  	}
> >  
> > -	list_splice(&dirty, &section->init_laundry_list);
> > +	list_splice(&dirty, laundry);
> >  }
> >  
> >  static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
> > @@ -400,6 +398,7 @@ static bool sgx_should_reclaim(unsigned long watermark)
> >  
> >  static int ksgxd(void *p)
> >  {
> > +	struct list_head *laundry = p;
> >  	int i;
> >  
> >  	set_freezable();
> > @@ -408,16 +407,13 @@ static int ksgxd(void *p)
> >  	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
> >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> >  	 */
> > -	for (i = 0; i < sgx_nr_epc_sections; i++)
> > -		sgx_sanitize_section(&sgx_epc_sections[i]);
> > +	sgx_sanitize_section(laundry);
> > +	sgx_sanitize_section(laundry);
> 
> Did you intend to call this twice?

Yes, see the inline comment above.

> > -	for (i = 0; i < sgx_nr_epc_sections; i++) {
> > -		sgx_sanitize_section(&sgx_epc_sections[i]);
> > +	if (!list_empty(laundry))
> > +		WARN(1, "EPC section %d has unsanitized pages.\n", i);
> >  
> > -		/* Should never happen. */
> > -		if (!list_empty(&sgx_epc_sections[i].init_laundry_list))
> > -			WARN(1, "EPC section %d has unsanitized pages.\n", i);
> > -	}
> > +	kfree(laundry);
> 
> This is a bit unfortunate.  'laundry' is allocated up in another thread
> and the lifetime isn't obvious.  It's just 32 bytes, but this is just
> asking to be leaked.
> >  	while (!kthread_should_stop()) {
> >  		if (try_to_freeze())
> > @@ -436,11 +432,11 @@ static int ksgxd(void *p)
> >  	return 0;
> >  }
> >  
> > -static bool __init sgx_page_reclaimer_init(void)
> > +static bool __init sgx_page_reclaimer_init(struct list_head *laundry)
> >  {
> >  	struct task_struct *tsk;
> >  
> > -	tsk = kthread_run(ksgxd, NULL, "ksgxd");
> > +	tsk = kthread_run(ksgxd, laundry, "ksgxd");
> >  	if (IS_ERR(tsk))
> >  		return false;
> >  
> > @@ -614,7 +610,8 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
> >  
> >  static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
> >  					 unsigned long index,
> > -					 struct sgx_epc_section *section)
> > +					 struct sgx_epc_section *section,
> > +					 struct list_head *laundry)
> >  {
> 
> I think this at least need a comment somewhere about what this function
> is doing with 'laundry'.

Ok.

> >  	unsigned long nr_pages = size >> PAGE_SHIFT;
> >  	unsigned long i;
> > @@ -632,13 +629,12 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
> >  	section->phys_addr = phys_addr;
> >  	spin_lock_init(&section->lock);
> >  	INIT_LIST_HEAD(&section->page_list);
> > -	INIT_LIST_HEAD(&section->init_laundry_list);
> >  
> >  	for (i = 0; i < nr_pages; i++) {
> >  		section->pages[i].section = index;
> >  		section->pages[i].flags = 0;
> >  		section->pages[i].owner = NULL;
> > -		list_add_tail(&section->pages[i].list, &section->init_laundry_list);
> > +		list_add_tail(&section->pages[i].list, laundry);
> >  	}
> >  
> >  	section->free_cnt = nr_pages;
> > @@ -656,7 +652,7 @@ static inline u64 __init sgx_calc_section_metric(u64 low, u64 high)
> >  	       ((high & GENMASK_ULL(19, 0)) << 32);
> >  }
> >  
> > -static bool __init sgx_page_cache_init(void)
> > +static bool __init sgx_page_cache_init(struct list_head *laundry)
> >  {
> >  	u32 eax, ebx, ecx, edx, type;
> >  	u64 pa, size;
> > @@ -679,7 +675,7 @@ static bool __init sgx_page_cache_init(void)
> >  
> >  		pr_info("EPC section 0x%llx-0x%llx\n", pa, pa + size - 1);
> >  
> > -		if (!sgx_setup_epc_section(pa, size, i, &sgx_epc_sections[i])) {
> > +		if (!sgx_setup_epc_section(pa, size, i, &sgx_epc_sections[i], laundry)) {
> >  			pr_err("No free memory for an EPC section\n");
> >  			break;
> >  		}
> 
> This is a great place for a comment about what is coming back on 'laundry'.
> 
> > @@ -697,18 +693,25 @@ static bool __init sgx_page_cache_init(void)
> >  
> >  static int __init sgx_init(void)
> >  {
> > +	struct list_head *laundry;
> >  	int ret;
> >  	int i;
> >  
> >  	if (!cpu_feature_enabled(X86_FEATURE_SGX))
> >  		return -ENODEV;
> >  
> > -	if (!sgx_page_cache_init()) {
> > +	laundry = kzalloc(sizeof(*laundry), GFP_KERNEL);
> > +	if (!laundry)
> > +		return -ENOMEM;
> > +
> > +	INIT_LIST_HEAD(laundry);
> > +
> > +	if (!sgx_page_cache_init(laundry)) {
> >  		ret = -ENOMEM;
> >  		goto err_page_cache;
> >  	}
> >  
> > -	if (!sgx_page_reclaimer_init()) {
> > +	if (!sgx_page_reclaimer_init(laundry)) {
> >  		ret = -ENOMEM;
> >  		goto err_page_cache;
> >  	}
> 
> I really don't like this being dynamically allocated, especially since
> it's freed in another task in a non-obvious place.
> 
> Wouldn't this all just be a lot simpler if we had a global list_head?
> That will eat a whopping 16 bytes of space.

Yeah, why not. It's then just one global instead of a per-struct field,
which is quite ugly.

/Jarkko


* Re: [PATCH v3 1/5] x86/sgx: Fix a resource leak in sgx_init()
  2021-03-03 16:56   ` Dave Hansen
@ 2021-03-10 15:00     ` Jarkko Sakkinen
  2021-03-10 15:49       ` Sean Christopherson
  0 siblings, 1 reply; 27+ messages in thread
From: Jarkko Sakkinen @ 2021-03-10 15:00 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, stable, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Jethro Beekman,
	Sean Christopherson, Serge Ayoun, linux-kernel

On Wed, Mar 03, 2021 at 08:56:52AM -0800, Dave Hansen wrote:
> On 3/3/21 7:03 AM, Jarkko Sakkinen wrote:
> > If sgx_page_cache_init() fails in the middle, a trivial return
> > statement causes unused memory and virtual address space reserved for
> > the EPC section, not freed. Fix this by using the same rollback, as
> > when sgx_page_reclaimer_init() fails.
> ...
> > @@ -708,8 +708,10 @@ static int __init sgx_init(void)
> >  	if (!cpu_feature_enabled(X86_FEATURE_SGX))
> >  		return -ENODEV;
> >  
> > -	if (!sgx_page_cache_init())
> > -		return -ENOMEM;
> > +	if (!sgx_page_cache_init()) {
> > +		ret = -ENOMEM;
> > +		goto err_page_cache;
> > +	}
> 
> 
> Currently, the only way sgx_page_cache_init() can fail is in the case
> that there are no sections:
> 
>         if (!sgx_nr_epc_sections) {
>                 pr_err("There are zero EPC sections.\n");
>                 return false;
>         }
> 
> That only happened if all sgx_setup_epc_section() calls failed.
> sgx_setup_epc_section() never both allocates memory with vmalloc for
> section->pages *and* fails.  If sgx_setup_epc_section() has a successful
> memremap() but a failed vmalloc(), it cleans up with memunmap().
> 
> In other words, I see how this _looks_ like a memory leak from
> sgx_init(), but I don't see an actual leak in practice.
> 
> Am I missing something?

In sgx_setup_epc_section():


	section->pages = vmalloc(nr_pages * sizeof(struct sgx_epc_page));
	if (!section->pages) {
		memunmap(section->virt_addr);
		return false;
	}

I.e. this rollback does not happen without this fix applied:

	for (i = 0; i < sgx_nr_epc_sections; i++) {
		vfree(sgx_epc_sections[i].pages);
		memunmap(sgx_epc_sections[i].virt_addr);
	}

/Jarkko

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/5] x86/sgx: Use sgx_free_epc_page() in sgx_reclaim_pages()
  2021-03-03 16:59   ` Dave Hansen
@ 2021-03-10 15:11     ` Jarkko Sakkinen
  2021-03-10 15:55       ` Dave Hansen
  2021-03-10 20:36       ` Kai Huang
  0 siblings, 2 replies; 27+ messages in thread
From: Jarkko Sakkinen @ 2021-03-10 15:11 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, linux-kernel

On Wed, Mar 03, 2021 at 08:59:17AM -0800, Dave Hansen wrote:
> On 3/3/21 7:03 AM, Jarkko Sakkinen wrote:
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > index 52d070fb4c9a..ed99c60024dc 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -305,7 +305,6 @@ static void sgx_reclaim_pages(void)
> >  {
> >  	struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
> >  	struct sgx_backing backing[SGX_NR_TO_SCAN];
> > -	struct sgx_epc_section *section;
> >  	struct sgx_encl_page *encl_page;
> >  	struct sgx_epc_page *epc_page;
> >  	pgoff_t page_index;
> > @@ -378,11 +377,7 @@ static void sgx_reclaim_pages(void)
> >  		kref_put(&encl_page->encl->refcount, sgx_encl_release);
> >  		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
> >  
> > -		section = &sgx_epc_sections[epc_page->section];
> > -		spin_lock(&section->lock);
> > -		list_add_tail(&epc_page->list, &section->page_list);
> > -		section->free_cnt++;
> > -		spin_unlock(&section->lock);
> > +		sgx_free_epc_page(epc_page);
> >  	}
> >  }
> 
> In current upstream (3fb6d0e00e), sgx_free_epc_page() calls __eremove().
>  This code does not call __eremove().  That seems to be changing
> behavior where none was intended.

EREMOVE does not matter here, as it doesn't in almost all of the sites
where sgx_free_epc_page() is used in the driver. It does nothing to an
uninitialized page.

The two patches that I posted originally for Kai's series took EREMOVE out
of sgx_free_epc_page() and put an explicit EREMOVE where it is actually
needed, but for reasons unknown to me, that change is gone.

Replacing the ad-hoc code with sgx_free_epc_page() is absolutely the right
action to take because it follows the pattern of how sgx_free_epc_page() is
used in the driver.

For reference:

https://lore.kernel.org/linux-sgx/20210113233541.17669-1-jarkko@kernel.org/

> Was this, perhaps, based on top of Kai's series that changes the
> behavior of sgx_free_epc_page()?

I did not refer to that patch series.

/Jarkko

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 5/5] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()
  2021-03-10 11:30     ` Jarkko Sakkinen
@ 2021-03-10 15:44       ` Dave Hansen
  2021-03-10 21:48         ` Jarkko Sakkinen
  0 siblings, 1 reply; 27+ messages in thread
From: Dave Hansen @ 2021-03-10 15:44 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Dave Hansen, linux-kernel

>>> + * node.
>>> + */
>>> +static struct sgx_numa_node *sgx_numa_nodes;
>>> +
>>> +/*
>>> + * sgx_free_epc_page() uses this to find out the correct struct sgx_numa_node,
>>> + * to put the page in.
>>> + */
>>> +static int sgx_section_to_numa_node_id[SGX_MAX_EPC_SECTIONS];
>>
>> If this is per-section, why not put it in struct sgx_epc_section?
> 
> Because struct sgx_epc_page does not contain a pointer to
> struct sgx_epc_section.

Currently, we have epc_page->section.  That's not changing.  But, with
the free list moving from sgx_epc_section to sgx_numa_node, we need a
way to get from page->node, not just page->section.

We can either add that to:

	struct sgx_epc_section {
		...
+		struct sgx_numa_node *node;
	}

so we can do epc_page->section->node to find the epc_page's free list,
or we could just do:

 struct sgx_epc_page {
- 	unsigned int section;
+ 	unsigned int node;
 	unsigned int flags;
 	struct sgx_encl_page *owner;
 	struct list_head list;
	struct list_head numa_list;
 };

and go from page->node directly.

>>>  	page = list_first_entry(&sgx_free_page_list, struct sgx_epc_page, list);
>>> +	list_del_init(&page->numa_list);
>>>  	list_del_init(&page->list);
>>>  	sgx_nr_free_pages--;
>>
>> I would much rather prefer that this does what the real page allocator
>> does: kep the page on a single list.  That list is maintained
>> per-NUMA-node.  Allocations try local NUMA node structures, then fall
>> back to other structures (hopefully in a locality-aware fashion).
>>
>> I wrote you the loop that I want to see this implement in an earlier
>> review.  This, basically:
>>
>> 	page = NULL;
>> 	nid = numa_node_id();
>> 	while (true) {
>> 		page = __sgx_alloc_epc_page_from_node(nid);	
>> 		if (page)
>> 			break;
>>
>> 		nid = // ... some search here, next_node_in()...
>> 		// check if we wrapped around:
>> 		if (nid == numa_node_id())
>> 			break;
>> 	}
>>
>> There's no global list.  You just walk around nodes trying to find one
>> with space.  If you wrap around, you stop.
>>
>> Please implement this.  If you think it's a bad idea, or can't, let's
>> talk about it in advance.  Right now, it appears that my review comments
>> aren't being incorporated into newer versions.
> 
> How I interpreted your earlier comments is that the fallback is unfair and
> this patch set version does fix that. 
> 
> I can buy the above allocation scheme, but I don't think this patch set
> version is a step backwards. The things done to struct sgx_epc_section
> are exactly what should be done to it.

To me, it's a step backwards.  It regresses in that it falls back to an
entirely non-NUMA aware allocation mechanism.  The global list is
actually likely to be even worse than the per-section searches because
it has a global lock as opposed to the per-section locks.  It also
has the overhead of managing two lists instead of one.

So, yes, it is *fair* in terms of NUMA node pressure.  But being fair in
a NUMA-aware allocator by simply not doing NUMA at all is a regression.

> Implementation-wise you are asking me to squash 4/5 and 5/5 into a single
> patch, and remove global list. It's a tiny iteration from this patch
> version and I can do it.

Sounds good.
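As a rough userspace C sketch of the per-node walk described above (the array-backed free lists, the modulo wrap-around, and every name here are hypothetical stand-ins for the kernel's sgx_numa_node structures and next_node_in(), not real API):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4

/* Hypothetical stand-in for struct sgx_epc_page on a per-node free list. */
struct epc_page {
	struct epc_page *next;
};

static struct epc_page *node_free_list[MAX_NODES];

/* Pop the head of one node's free list, if any. */
static struct epc_page *alloc_from_node(int nid)
{
	struct epc_page *page = node_free_list[nid];

	if (page)
		node_free_list[nid] = page->next;
	return page;
}

/*
 * The loop under discussion: try the local node first, then walk the
 * remaining nodes in order, and give up once the walk wraps around.
 * There is no global list to fall back to.
 */
static struct epc_page *alloc_epc_page(int local_nid)
{
	int nid = local_nid;
	struct epc_page *page;

	do {
		page = alloc_from_node(nid);
		if (page)
			return page;
		nid = (nid + 1) % MAX_NODES;	/* next_node_in() in the kernel */
	} while (nid != local_nid);

	return NULL;
}
```

A real implementation would also take per-node locks and pick the walk order in a locality-aware fashion; the sketch only shows the termination condition.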



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/5] x86/sgx: Fix a resource leak in sgx_init()
  2021-03-10 15:00     ` Jarkko Sakkinen
@ 2021-03-10 15:49       ` Sean Christopherson
  2021-03-10 21:52         ` Jarkko Sakkinen
  0 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2021-03-10 15:49 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Dave Hansen, linux-sgx, stable, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
	Jethro Beekman, Serge Ayoun, linux-kernel

On Wed, Mar 10, 2021, Jarkko Sakkinen wrote:
> On Wed, Mar 03, 2021 at 08:56:52AM -0800, Dave Hansen wrote:
> > On 3/3/21 7:03 AM, Jarkko Sakkinen wrote:
> > > If sgx_page_cache_init() fails in the middle, a trivial return
> > > statement causes unused memory and virtual address space reserved for
> > > the EPC section, not freed. Fix this by using the same rollback, as
> > > when sgx_page_reclaimer_init() fails.
> > ...
> > > @@ -708,8 +708,10 @@ static int __init sgx_init(void)
> > >  	if (!cpu_feature_enabled(X86_FEATURE_SGX))
> > >  		return -ENODEV;
> > >  
> > > -	if (!sgx_page_cache_init())
> > > -		return -ENOMEM;
> > > +	if (!sgx_page_cache_init()) {
> > > +		ret = -ENOMEM;
> > > +		goto err_page_cache;
> > > +	}
> > 
> > 
> > Currently, the only way sgx_page_cache_init() can fail is in the case
> > that there are no sections:
> > 
> >         if (!sgx_nr_epc_sections) {
> >                 pr_err("There are zero EPC sections.\n");
> >                 return false;
> >         }
> > 
> > That only happened if all sgx_setup_epc_section() calls failed.
> > sgx_setup_epc_section() never both allocates memory with vmalloc for
> > section->pages *and* fails.  If sgx_setup_epc_section() has a successful
> > memremap() but a failed vmalloc(), it cleans up with memunmap().
> > 
> > In other words, I see how this _looks_ like a memory leak from
> > sgx_init(), but I don't see an actual leak in practice.
> > 
> > Am I missing something?
> 
> In sgx_setup_epc_section():
> 
> 
> 	section->pages = vmalloc(nr_pages * sizeof(struct sgx_epc_page));
> 	if (!section->pages) {
> 		memunmap(section->virt_addr);
> 		return false;
> 	}
> 
> I.e. this rollback does not happen without this fix applied:
> 
> 	for (i = 0; i < sgx_nr_epc_sections; i++) {
> 		vfree(sgx_epc_sections[i].pages);
> 		memunmap(sgx_epc_sections[i].virt_addr);
> 	}

Dave is pointing out that sgx_page_cache_init() fails if and only if _all_
sections fail sgx_setup_epc_section(), and if all sections fail then
sgx_nr_epc_sections is '0' and the above is a nop.

That behavior is by design, as we didn't want to kill SGX if a single section
failed to initialize for whatever reason.
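The reasoning above can be illustrated with a small userspace C sketch (malloc()/free() stand in for memremap()/vmalloc(), and all names are hypothetical, not the kernel code): a section whose setup fails unwinds itself, so the rollback loop only ever sees fully set-up sections and is a no-op when all of them failed.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_SECTIONS 8

/* Hypothetical stand-in for struct sgx_epc_section. */
struct epc_section {
	void *virt_addr;	/* memremap() result in the kernel */
	void *pages;		/* vmalloc()'d metadata array in the kernel */
};

static struct epc_section sections[MAX_SECTIONS];
static int nr_sections;

/*
 * Mirrors the sgx_setup_epc_section() pattern: if the second allocation
 * fails, the first is released before reporting failure, so a failed
 * section leaks nothing by itself.
 */
static int setup_section(int fail_pages)
{
	struct epc_section *s = &sections[nr_sections];

	s->virt_addr = malloc(64);
	if (!s->virt_addr)
		return 0;

	s->pages = fail_pages ? NULL : malloc(64);
	if (!s->pages) {
		free(s->virt_addr);	/* memunmap() in the kernel */
		return 0;
	}

	nr_sections++;
	return 1;
}

/* The rollback loop from the thread: a no-op when nr_sections == 0. */
static void teardown_sections(void)
{
	for (int i = 0; i < nr_sections; i++) {
		free(sections[i].pages);	/* vfree() */
		free(sections[i].virt_addr);	/* memunmap() */
	}
	nr_sections = 0;
}
```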

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/5] x86/sgx: Use sgx_free_epc_page() in sgx_reclaim_pages()
  2021-03-10 15:11     ` Jarkko Sakkinen
@ 2021-03-10 15:55       ` Dave Hansen
  2021-03-10 21:56         ` Jarkko Sakkinen
  2021-03-10 20:36       ` Kai Huang
  1 sibling, 1 reply; 27+ messages in thread
From: Dave Hansen @ 2021-03-10 15:55 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, linux-kernel

On 3/10/21 7:11 AM, Jarkko Sakkinen wrote:
>>> -		section = &sgx_epc_sections[epc_page->section];
>>> -		spin_lock(&section->lock);
>>> -		list_add_tail(&epc_page->list, &section->page_list);
>>> -		section->free_cnt++;
>>> -		spin_unlock(&section->lock);
>>> +		sgx_free_epc_page(epc_page);
>>>  	}
>>>  }
>> In current upstream (3fb6d0e00e), sgx_free_epc_page() calls __eremove().
>>  This code does not call __eremove().  That seems to be changing
>> behavior where none was intended.
> EREMOVE does not matter here, as it doesn't in almost all most of the sites
> where sgx_free_epc_page() is used in the driver. It does nothing to an
> uninitialized pages.
> 
> The two patches that I posted originally for Kai's series took EREMOVE out
> of sgx_free_epc_page() and put an explicit EREMOVE where it is actually
> needed, but for reasons unknown to me, that change is gone.
> 
> Replacing the ad-hoc code with sgx_free_epc_page() is absolutely the right
> action to take because it follows the pattern how sgx_free_epc_page() is
> used in the driver.

That sounds generally fine.  But, this is a functional change.  Where
there are functional changes, I always hope to see some mention of the
change in the changelog.

Could you add some of this to the next changelog, please?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/5] x86/sgx: Use sgx_free_epc_page() in sgx_reclaim_pages()
  2021-03-10 15:11     ` Jarkko Sakkinen
  2021-03-10 15:55       ` Dave Hansen
@ 2021-03-10 20:36       ` Kai Huang
  2021-03-10 22:10         ` Jarkko Sakkinen
  1 sibling, 1 reply; 27+ messages in thread
From: Kai Huang @ 2021-03-10 20:36 UTC (permalink / raw)
  To: Jarkko Sakkinen, Dave Hansen
  Cc: linux-sgx, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, linux-kernel

On Wed, 2021-03-10 at 17:11 +0200, Jarkko Sakkinen wrote:
> On Wed, Mar 03, 2021 at 08:59:17AM -0800, Dave Hansen wrote:
> > On 3/3/21 7:03 AM, Jarkko Sakkinen wrote:
> > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > > index 52d070fb4c9a..ed99c60024dc 100644
> > > --- a/arch/x86/kernel/cpu/sgx/main.c
> > > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > > @@ -305,7 +305,6 @@ static void sgx_reclaim_pages(void)
> > >  {
> > >  	struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
> > >  	struct sgx_backing backing[SGX_NR_TO_SCAN];
> > > -	struct sgx_epc_section *section;
> > >  	struct sgx_encl_page *encl_page;
> > >  	struct sgx_epc_page *epc_page;
> > >  	pgoff_t page_index;
> > > @@ -378,11 +377,7 @@ static void sgx_reclaim_pages(void)
> > >  		kref_put(&encl_page->encl->refcount, sgx_encl_release);
> > >  		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
> > >  
> > > -		section = &sgx_epc_sections[epc_page->section];
> > > -		spin_lock(&section->lock);
> > > -		list_add_tail(&epc_page->list, &section->page_list);
> > > -		section->free_cnt++;
> > > -		spin_unlock(&section->lock);
> > > +		sgx_free_epc_page(epc_page);
> > >  	}
> > >  }
> > 
> > In current upstream (3fb6d0e00e), sgx_free_epc_page() calls __eremove().
> >  This code does not call __eremove().  That seems to be changing
> > behavior where none was intended.
> 
> EREMOVE does not matter here, as it doesn't in almost all most of the sites
> where sgx_free_epc_page() is used in the driver. It does nothing to an
> uninitialized pages.

Right. EREMOVE on uninitialized pages does nothing, so a more reasonable way is to
just NOT call EREMOVE (your original code), since it is absolutely unnecessary.

I don't see ANY reason we should call EREMOVE here. 

Actually w/o my patch to split EREMOVE out of sgx_free_epc_page(), it then makes
perfect sense to have new sgx_free_epc_page() here.

> 
> The two patches that I posted originally for Kai's series took EREMOVE out
> of sgx_free_epc_page() and put an explicit EREMOVE where it is actually
> needed, but for reasons unknown to me, that change is gone.
> 

It's not gone. It goes into a new sgx_encl_free_epc_page(), which is exactly the same
as the current sgx_free_epc_page(), which has EREMOVE, instead of putting EREMOVE into a
dedicated sgx_reset_epc_page(), as you did in your series:

https://lore.kernel.org/linux-sgx/20210113233541.17669-1-jarkko@kernel.org/

However, your change has a side effect: it always puts the page back into the free pool, even
if EREMOVE fails. To make your change w/o any functional change, it has to be:

	if (!sgx_reset_epc_page())
		sgx_free_epc_page();

And for this, Dave raised a concern that we should add a WARN() to let the user know an EPC
page is leaked, and a reboot is required to get it back.

However, with sgx_reset_epc_page() there's no place to add such a WARN(), and
implementing the original sgx_free_epc_page() as sgx_encl_free_epc_page() looks very
reasonable to me:

https://www.spinics.net/lists/linux-sgx/msg04631.html
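As a minimal userspace C sketch of that shape (the integer return code and every name are hypothetical stand-ins; the kernel version would use __eremove() and a WARN_ONCE() or similar):

```c
#include <assert.h>

static int freed_pages;
static int leaked_pages;

/* Stand-in for __eremove(): 0 on success, non-zero on failure. */
static int eremove(int fail)
{
	return fail ? -1 : 0;
}

/*
 * The shape under discussion: only a page whose EREMOVE succeeded goes
 * back to the free pool; on failure the page is deliberately leaked and
 * the user is warned that a reboot is needed to reclaim it.
 */
static void encl_free_epc_page(int eremove_fails)
{
	if (eremove(eremove_fails)) {
		leaked_pages++;		/* WARN() in the kernel */
		return;
	}
	freed_pages++;			/* list_add_tail() to the free list */
}
```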


> Replacing the ad-hoc code with sgx_free_epc_page() is absolutely the right
> action to take because it follows the pattern how sgx_free_epc_page() is
> used in the driver.
> 
> For reference:
> 
> https://lore.kernel.org/linux-sgx/20210113233541.17669-1-jarkko@kernel.org/
> 
> > Was this, perhaps, based on top of Kai's series that changes the
> > behavior of sgx_free_epc_page()?
> 
> I did not refer to that patch series.
> 
> /Jarkko



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 5/5] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()
  2021-03-10 15:44       ` Dave Hansen
@ 2021-03-10 21:48         ` Jarkko Sakkinen
  0 siblings, 0 replies; 27+ messages in thread
From: Jarkko Sakkinen @ 2021-03-10 21:48 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Dave Hansen, linux-kernel

On Wed, Mar 10, 2021 at 07:44:39AM -0800, Dave Hansen wrote:
> >>> + * node.
> >>> + */
> >>> +static struct sgx_numa_node *sgx_numa_nodes;
> >>> +
> >>> +/*
> >>> + * sgx_free_epc_page() uses this to find out the correct struct sgx_numa_node,
> >>> + * to put the page in.
> >>> + */
> >>> +static int sgx_section_to_numa_node_id[SGX_MAX_EPC_SECTIONS];
> >>
> >> If this is per-section, why not put it in struct sgx_epc_section?
> > 
> > Because struct sgx_epc_page does not contain a pointer to
> > struct sgx_epc_section.
> 
> Currently, we have epc_page->section.  That's not changing.  But, with
> the free list moving from sgx_epc_section to sgx_numa_node, we need a
> way to get from page->node, not just page->section.
> 
> We can either add that to:
> 
> 	struct sgx_epc_section {
> 		...
> +		struct sgx_numa_node *node;
> 	}
> 
> so we can do epc_page->section->node to find the epc_page's free list,
> or we could just do:
> 
>  struct sgx_epc_page {
> - 	unsigned int section;
> + 	unsigned int node;
>  	unsigned int flags;
>  	struct sgx_encl_page *owner;
>  	struct list_head list;
> 	struct list_head numa_list;
>  };
> 
> and go from page->node directly.

OK, I buy this, thanks.

> >>>  	page = list_first_entry(&sgx_free_page_list, struct sgx_epc_page, list);
> >>> +	list_del_init(&page->numa_list);
> >>>  	list_del_init(&page->list);
> >>>  	sgx_nr_free_pages--;
> >>
> >> I would much rather prefer that this does what the real page allocator
> >> does: kep the page on a single list.  That list is maintained
> >> per-NUMA-node.  Allocations try local NUMA node structures, then fall
> >> back to other structures (hopefully in a locality-aware fashion).
> >>
> >> I wrote you the loop that I want to see this implement in an earlier
> >> review.  This, basically:
> >>
> >> 	page = NULL;
> >> 	nid = numa_node_id();
> >> 	while (true) {
> >> 		page = __sgx_alloc_epc_page_from_node(nid);	
> >> 		if (page)
> >> 			break;
> >>
> >> 		nid = // ... some search here, next_node_in()...
> >> 		// check if we wrapped around:
> >> 		if (nid == numa_node_id())
> >> 			break;
> >> 	}
> >>
> >> There's no global list.  You just walk around nodes trying to find one
> >> with space.  If you wrap around, you stop.
> >>
> >> Please implement this.  If you think it's a bad idea, or can't, let's
> >> talk about it in advance.  Right now, it appears that my review comments
> >> aren't being incorporated into newer versions.
> > 
> > How I interpreted your earlier comments is that the fallback is unfair and
> > this patch set version does fix that. 
> > 
> > I can buy the above allocation scheme, but I don't think this patch set
> > version is a step backwards. The things done to struct sgx_epc_section
> > are exactly what should be done to it.
> 
> To me, it's a step backwards.  It regresses in that it falls back to an
> entirely non-NUMA aware allocation mechanism.  The global list is
> actually likely to be even worse than the per-section searches because
> it has a global lock as opposed to the at per-section locks.  It also
> has the overhead of managing two lists instead of one.
> 
> So, yes, it is *fair* in terms of NUMA node pressure.  But being fair in
> a NUMA-aware allocator by simply not doing NUMA at all is a regression.

The code is now structured in a way that makes it trivial to remove the global
list and move on to just node lists, i.e. the nasty section lists have been
wiped away. Refactoring the global list out is a trivial step.

That way this is a step forwards, even if having a global list would
be a step backwards :-)

> > Implementation-wise you are asking me to squash 4/5 and 5/5 into a single
> > patch, and remove global list. It's a tiny iteration from this patch
> > version and I can do it.
> 
> Sounds good.

I'll digest your feedback and come up with the new version, which I'll
put out tomorrow.

PS. If you don't hear from me after you have given feedback on the next
version, please ping me privately. It looks like things are getting through
fast again, but better safe than sorry...

/Jarkko

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/5] x86/sgx: Fix a resource leak in sgx_init()
  2021-03-10 15:49       ` Sean Christopherson
@ 2021-03-10 21:52         ` Jarkko Sakkinen
  0 siblings, 0 replies; 27+ messages in thread
From: Jarkko Sakkinen @ 2021-03-10 21:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Dave Hansen, linux-sgx, stable, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
	Jethro Beekman, Serge Ayoun, linux-kernel

On Wed, Mar 10, 2021 at 07:49:29AM -0800, Sean Christopherson wrote:
> On Wed, Mar 10, 2021, Jarkko Sakkinen wrote:
> > On Wed, Mar 03, 2021 at 08:56:52AM -0800, Dave Hansen wrote:
> > > On 3/3/21 7:03 AM, Jarkko Sakkinen wrote:
> > > > If sgx_page_cache_init() fails in the middle, a trivial return
> > > > statement causes unused memory and virtual address space reserved for
> > > > the EPC section, not freed. Fix this by using the same rollback, as
> > > > when sgx_page_reclaimer_init() fails.
> > > ...
> > > > @@ -708,8 +708,10 @@ static int __init sgx_init(void)
> > > >  	if (!cpu_feature_enabled(X86_FEATURE_SGX))
> > > >  		return -ENODEV;
> > > >  
> > > > -	if (!sgx_page_cache_init())
> > > > -		return -ENOMEM;
> > > > +	if (!sgx_page_cache_init()) {
> > > > +		ret = -ENOMEM;
> > > > +		goto err_page_cache;
> > > > +	}
> > > 
> > > 
> > > Currently, the only way sgx_page_cache_init() can fail is in the case
> > > that there are no sections:
> > > 
> > >         if (!sgx_nr_epc_sections) {
> > >                 pr_err("There are zero EPC sections.\n");
> > >                 return false;
> > >         }
> > > 
> > > That only happened if all sgx_setup_epc_section() calls failed.
> > > sgx_setup_epc_section() never both allocates memory with vmalloc for
> > > section->pages *and* fails.  If sgx_setup_epc_section() has a successful
> > > memremap() but a failed vmalloc(), it cleans up with memunmap().
> > > 
> > > In other words, I see how this _looks_ like a memory leak from
> > > sgx_init(), but I don't see an actual leak in practice.
> > > 
> > > Am I missing something?
> > 
> > In sgx_setup_epc_section():
> > 
> > 
> > 	section->pages = vmalloc(nr_pages * sizeof(struct sgx_epc_page));
> > 	if (!section->pages) {
> > 		memunmap(section->virt_addr);
> > 		return false;
> > 	}
> > 
> > I.e. this rollback does not happen without this fix applied:
> > 
> > 	for (i = 0; i < sgx_nr_epc_sections; i++) {
> > 		vfree(sgx_epc_sections[i].pages);
> > 		memunmap(sgx_epc_sections[i].virt_addr);
> > 	}
> 
> Dave is pointing out that sgx_page_cache_init() fails if and only if _all_
> sections fail sgx_setup_epc_section(), and if all sections fail then
> sgx_nr_epc_sections is '0' and the above is a nop.
> 
> That behavior is by design, as we didn't want to kill SGX if a single section
> failed to initialize for whatever reason.

My bad. You're correct. I got mixed up by the rollback :-) Thanks!

I'll just drop the whole patch.

/Jarkko

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/5] x86/sgx: Use sgx_free_epc_page() in sgx_reclaim_pages()
  2021-03-10 15:55       ` Dave Hansen
@ 2021-03-10 21:56         ` Jarkko Sakkinen
  0 siblings, 0 replies; 27+ messages in thread
From: Jarkko Sakkinen @ 2021-03-10 21:56 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, linux-kernel

On Wed, Mar 10, 2021 at 07:55:35AM -0800, Dave Hansen wrote:
> On 3/10/21 7:11 AM, Jarkko Sakkinen wrote:
> >>> -		section = &sgx_epc_sections[epc_page->section];
> >>> -		spin_lock(&section->lock);
> >>> -		list_add_tail(&epc_page->list, &section->page_list);
> >>> -		section->free_cnt++;
> >>> -		spin_unlock(&section->lock);
> >>> +		sgx_free_epc_page(epc_page);
> >>>  	}
> >>>  }
> >> In current upstream (3fb6d0e00e), sgx_free_epc_page() calls __eremove().
> >>  This code does not call __eremove().  That seems to be changing
> >> behavior where none was intended.
> > EREMOVE does not matter here, as it doesn't in almost all most of the sites
> > where sgx_free_epc_page() is used in the driver. It does nothing to an
> > uninitialized pages.
> > 
> > The two patches that I posted originally for Kai's series took EREMOVE out
> > of sgx_free_epc_page() and put an explicit EREMOVE where it is actually
> > needed, but for reasons unknown to me, that change is gone.
> > 
> > Replacing the ad-hoc code with sgx_free_epc_page() is absolutely the right
> > action to take because it follows the pattern how sgx_free_epc_page() is
> > used in the driver.
> 
> That sounds generally fine.  But, this is a functional change.  Where
> there are functional changes, I always hope to see some mention of the
> change in the changelog.
> 
> Could you add some of this to the next changelog, please?

This appears for the first time in this patch set version, which means that
there is no patch changelog for this.

Maybe a better idea would be to explain the functional change in the commit
message (which of course also implies an entry in the patch changelog)?

/Jarkko

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/5] x86/sgx: Use sgx_free_epc_page() in sgx_reclaim_pages()
  2021-03-10 20:36       ` Kai Huang
@ 2021-03-10 22:10         ` Jarkko Sakkinen
  2021-03-10 22:12           ` Jarkko Sakkinen
  0 siblings, 1 reply; 27+ messages in thread
From: Jarkko Sakkinen @ 2021-03-10 22:10 UTC (permalink / raw)
  To: Kai Huang
  Cc: Dave Hansen, linux-sgx, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, linux-kernel

On Thu, Mar 11, 2021 at 09:36:15AM +1300, Kai Huang wrote:
> On Wed, 2021-03-10 at 17:11 +0200, Jarkko Sakkinen wrote:
> > On Wed, Mar 03, 2021 at 08:59:17AM -0800, Dave Hansen wrote:
> > > On 3/3/21 7:03 AM, Jarkko Sakkinen wrote:
> > > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > > > index 52d070fb4c9a..ed99c60024dc 100644
> > > > --- a/arch/x86/kernel/cpu/sgx/main.c
> > > > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > > > @@ -305,7 +305,6 @@ static void sgx_reclaim_pages(void)
> > > >  {
> > > >  	struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
> > > >  	struct sgx_backing backing[SGX_NR_TO_SCAN];
> > > > -	struct sgx_epc_section *section;
> > > >  	struct sgx_encl_page *encl_page;
> > > >  	struct sgx_epc_page *epc_page;
> > > >  	pgoff_t page_index;
> > > > @@ -378,11 +377,7 @@ static void sgx_reclaim_pages(void)
> > > >  		kref_put(&encl_page->encl->refcount, sgx_encl_release);
> > > >  		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
> > > >  
> > > > -		section = &sgx_epc_sections[epc_page->section];
> > > > -		spin_lock(&section->lock);
> > > > -		list_add_tail(&epc_page->list, &section->page_list);
> > > > -		section->free_cnt++;
> > > > -		spin_unlock(&section->lock);
> > > > +		sgx_free_epc_page(epc_page);
> > > >  	}
> > > >  }
> > > 
> > > In current upstream (3fb6d0e00e), sgx_free_epc_page() calls __eremove().
> > >  This code does not call __eremove().  That seems to be changing
> > > behavior where none was intended.
> > 
> > EREMOVE does not matter here, as it doesn't in almost all most of the sites
> > where sgx_free_epc_page() is used in the driver. It does nothing to an
> > uninitialized pages.
> 
> Right. EREMOVE on uninitialized pages does nothing, so a more reasonable way is to
> just NOT call EREMOVE (your original code), since it is absolutely unnecessary.
> 
> I don't see ANY reason we should call EREMOVE here. 
> 
> Actually w/o my patch to split EREMOVE out of sgx_free_epc_page(), it then makes
> perfect sense to have new sgx_free_epc_page() here.
> 
> > 
> > The two patches that I posted originally for Kai's series took EREMOVE out
> > of sgx_free_epc_page() and put an explicit EREMOVE where it is actually
> > needed, but for reasons unknown to me, that change is gone.
> > 
> 
> It's not gone. It goes into a new sgx_encl_free_epc_page(), which is exactly the same
> as current sgx_free_epc_page() which as EREMOVE, instead of putting EREMOVE into a
> dedicated sgx_reset_epc_page(), as you did in your series:
> 
> https://lore.kernel.org/linux-sgx/20210113233541.17669-1-jarkko@kernel.org/
> 
> However, your change has side effort: it always put page back into free pool, even
> EREMOVE fails. To make your change w/o having any functional change, it has to be:
> 
> 	if(!sgx_reset_epc_page())
> 		sgx_free_epc_page();

OK, great, your patch set uses the wrapper only in the necessary call
sites. Sorry, I overlooked this part.

Anyway, it knowingly does that. I considered either as an equally harmful
side-effect when I implemented it. Either can only trigger when there is a
bug in the kernel code.

It *could* do what that snippet suggests, but it's an "out of the frying pan,
into the fire" kind of change.

Since the NUMA patch set anyway requires a global dirty list, I think
the better way to deal with this would be to declare a new global in the
patch under discussion:

static struct list_head sgx_dirty_list;

And sgx_encl_free_epc_page() could simply put the pages in this list. In
some cases you could possibly even reset the system state using kexec for
debugging purposes, so it could potentially bring a tiny bit of value.
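A rough userspace C sketch of such a dirty list (hypothetical names; the kernel version would of course use struct list_head under a spinlock):

```c
#include <assert.h>
#include <stddef.h>

struct epc_page {
	struct epc_page *next;
};

/* Pages whose EREMOVE failed are parked here instead of being reused. */
static struct epc_page *dirty_list;
static struct epc_page *free_list;

/* Push a page onto the free pool, or onto the dirty list on failure. */
static void free_epc_page(struct epc_page *page, int eremove_failed)
{
	struct epc_page **list = eremove_failed ? &dirty_list : &free_list;

	page->next = *list;
	*list = page;
}
```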

I can then rebase my NUMA patches on top of the SGX-specific KVM patches, once
Boris has applied them.

> And for this, Dave raised one concern we should add a WARN() to let user know EPC
> page is leaked, and reboot is requied to get them back.
> 
> However with sgx_reset_epc_page(), there's no place to add such WARN(), and
> implementing original sgx_free_epc_page() as sgx_encl_free_epc_page() looks very
> reasonable to me:
> 
> https://www.spinics.net/lists/linux-sgx/msg04631.html

/Jarkko

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/5] x86/sgx: Use sgx_free_epc_page() in sgx_reclaim_pages()
  2021-03-10 22:10         ` Jarkko Sakkinen
@ 2021-03-10 22:12           ` Jarkko Sakkinen
  2021-03-10 22:35             ` Jarkko Sakkinen
  0 siblings, 1 reply; 27+ messages in thread
From: Jarkko Sakkinen @ 2021-03-10 22:12 UTC (permalink / raw)
  To: Kai Huang
  Cc: Dave Hansen, linux-sgx, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, linux-kernel

On Thu, Mar 11, 2021 at 12:10:56AM +0200, Jarkko Sakkinen wrote:
> On Thu, Mar 11, 2021 at 09:36:15AM +1300, Kai Huang wrote:
> > On Wed, 2021-03-10 at 17:11 +0200, Jarkko Sakkinen wrote:
> > > On Wed, Mar 03, 2021 at 08:59:17AM -0800, Dave Hansen wrote:
> > > > On 3/3/21 7:03 AM, Jarkko Sakkinen wrote:
> > > > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > > > > index 52d070fb4c9a..ed99c60024dc 100644
> > > > > --- a/arch/x86/kernel/cpu/sgx/main.c
> > > > > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > > > > @@ -305,7 +305,6 @@ static void sgx_reclaim_pages(void)
> > > > >  {
> > > > >  	struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
> > > > >  	struct sgx_backing backing[SGX_NR_TO_SCAN];
> > > > > -	struct sgx_epc_section *section;
> > > > >  	struct sgx_encl_page *encl_page;
> > > > >  	struct sgx_epc_page *epc_page;
> > > > >  	pgoff_t page_index;
> > > > > @@ -378,11 +377,7 @@ static void sgx_reclaim_pages(void)
> > > > >  		kref_put(&encl_page->encl->refcount, sgx_encl_release);
> > > > >  		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
> > > > >  
> > > > > -		section = &sgx_epc_sections[epc_page->section];
> > > > > -		spin_lock(&section->lock);
> > > > > -		list_add_tail(&epc_page->list, &section->page_list);
> > > > > -		section->free_cnt++;
> > > > > -		spin_unlock(&section->lock);
> > > > > +		sgx_free_epc_page(epc_page);
> > > > >  	}
> > > > >  }
> > > > 
> > > > In current upstream (3fb6d0e00e), sgx_free_epc_page() calls __eremove().
> > > >  This code does not call __eremove().  That seems to be changing
> > > > behavior where none was intended.
> > > 
> > > EREMOVE does not matter here, as it doesn't in almost all of the sites
> > > where sgx_free_epc_page() is used in the driver. It does nothing to
> > > uninitialized pages.
> > 
> > Right. EREMOVE on uninitialized pages does nothing, so a more reasonable way is to
> > just NOT call EREMOVE (your original code), since it is absolutely unnecessary.
> > 
> > I don't see ANY reason we should call EREMOVE here. 
> > 
> > Actually w/o my patch to split EREMOVE out of sgx_free_epc_page(), it then makes
> > perfect sense to have the new sgx_free_epc_page() here.
> > 
> > > 
> > > The two patches that I posted originally for Kai's series took EREMOVE out
> > > of sgx_free_epc_page() and put an explicit EREMOVE where it is actually
> > > needed, but for reasons unknown to me, that change is gone.
> > > 
> > 
> > It's not gone. It goes into a new sgx_encl_free_epc_page(), which is exactly the
> > same as the current sgx_free_epc_page(), which has EREMOVE, instead of putting
> > EREMOVE into a dedicated sgx_reset_epc_page(), as you did in your series:
> > 
> > https://lore.kernel.org/linux-sgx/20210113233541.17669-1-jarkko@kernel.org/
> > 
> > However, your change has a side effect: it always puts the page back into the free
> > pool, even if EREMOVE fails. To make your change w/o any functional change, it has to be:
> > 
> > 	if(!sgx_reset_epc_page())
> > 		sgx_free_epc_page();
> 
> OK, great, your patch set uses the wrapper only in the necessary call
> sites. Sorry, I overlooked this part.
> 
> Anyway, it knowingly does that. I considered either as an equally harmful
> side effect when I implemented it. Either can only trigger when there is a
> bug in the kernel code.
> 
> It *could* do what that snippet suggests, but it's an "out of the frying pan,
> into the fire" kind of change.
> 
> Since the NUMA patch set anyway requires a global dirty list, I think
> the better way to deal with this would be to declare a new global in the
> patch under discussion:
> 
> static struct list_head sgx_dirty_list;

sgx_dirty_page_list

> 
> And sgx_encl_free_epc_page() could simply put the pages in this list. In
> some cases you could possibly even reset the system state using kexec for
> debugging purposes, so it could potentially bring a tiny bit of value.
> 
> I can then rebase my NUMA patches on top of the SGX-specific KVM patches, once
> Boris has applied them.
> 
> > And for this, Dave raised one concern: we should add a WARN() to let the user know an EPC
> > page has been leaked, and that a reboot is required to get it back.
> > 
> > However with sgx_reset_epc_page(), there's no place to add such WARN(), and
> > implementing original sgx_free_epc_page() as sgx_encl_free_epc_page() looks very
> > reasonable to me:
> > 
> > https://www.spinics.net/lists/linux-sgx/msg04631.html
> 
> /Jarkko


/Jarkko

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/5] x86/sgx: Use sgx_free_epc_page() in sgx_reclaim_pages()
  2021-03-10 22:12           ` Jarkko Sakkinen
@ 2021-03-10 22:35             ` Jarkko Sakkinen
  2021-03-10 22:43               ` Kai Huang
  0 siblings, 1 reply; 27+ messages in thread
From: Jarkko Sakkinen @ 2021-03-10 22:35 UTC (permalink / raw)
  To: Kai Huang
  Cc: Dave Hansen, linux-sgx, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, linux-kernel

On Thu, Mar 11, 2021 at 12:12:17AM +0200, Jarkko Sakkinen wrote:
> On Thu, Mar 11, 2021 at 12:10:56AM +0200, Jarkko Sakkinen wrote:
> > On Thu, Mar 11, 2021 at 09:36:15AM +1300, Kai Huang wrote:
> > > On Wed, 2021-03-10 at 17:11 +0200, Jarkko Sakkinen wrote:
> > > > On Wed, Mar 03, 2021 at 08:59:17AM -0800, Dave Hansen wrote:
> > > > > On 3/3/21 7:03 AM, Jarkko Sakkinen wrote:
> > > > > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > > > > > index 52d070fb4c9a..ed99c60024dc 100644
> > > > > > --- a/arch/x86/kernel/cpu/sgx/main.c
> > > > > > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > > > > > @@ -305,7 +305,6 @@ static void sgx_reclaim_pages(void)
> > > > > >  {
> > > > > >  	struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
> > > > > >  	struct sgx_backing backing[SGX_NR_TO_SCAN];
> > > > > > -	struct sgx_epc_section *section;
> > > > > >  	struct sgx_encl_page *encl_page;
> > > > > >  	struct sgx_epc_page *epc_page;
> > > > > >  	pgoff_t page_index;
> > > > > > @@ -378,11 +377,7 @@ static void sgx_reclaim_pages(void)
> > > > > >  		kref_put(&encl_page->encl->refcount, sgx_encl_release);
> > > > > >  		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
> > > > > >  
> > > > > > -		section = &sgx_epc_sections[epc_page->section];
> > > > > > -		spin_lock(&section->lock);
> > > > > > -		list_add_tail(&epc_page->list, &section->page_list);
> > > > > > -		section->free_cnt++;
> > > > > > -		spin_unlock(&section->lock);
> > > > > > +		sgx_free_epc_page(epc_page);
> > > > > >  	}
> > > > > >  }
> > > > > 
> > > > > In current upstream (3fb6d0e00e), sgx_free_epc_page() calls __eremove().
> > > > >  This code does not call __eremove().  That seems to be changing
> > > > > behavior where none was intended.
> > > > 
> > > > EREMOVE does not matter here, as it doesn't in almost all of the sites
> > > > where sgx_free_epc_page() is used in the driver. It does nothing to
> > > > uninitialized pages.
> > > 
> > > Right. EREMOVE on uninitialized pages does nothing, so a more reasonable way is to
> > > just NOT call EREMOVE (your original code), since it is absolutely unnecessary.
> > > 
> > > I don't see ANY reason we should call EREMOVE here. 
> > > 
> > > Actually w/o my patch to split EREMOVE out of sgx_free_epc_page(), it then makes
> > > perfect sense to have the new sgx_free_epc_page() here.
> > > 
> > > > 
> > > > The two patches that I posted originally for Kai's series took EREMOVE out
> > > > of sgx_free_epc_page() and put an explicit EREMOVE where it is actually
> > > > needed, but for reasons unknown to me, that change is gone.
> > > > 
> > > 
> > > It's not gone. It goes into a new sgx_encl_free_epc_page(), which is exactly the
> > > same as the current sgx_free_epc_page(), which has EREMOVE, instead of putting
> > > EREMOVE into a dedicated sgx_reset_epc_page(), as you did in your series:
> > > 
> > > https://lore.kernel.org/linux-sgx/20210113233541.17669-1-jarkko@kernel.org/
> > > 
> > > However, your change has a side effect: it always puts the page back into the free
> > > pool, even if EREMOVE fails. To make your change w/o any functional change, it has to be:
> > > 
> > > 	if(!sgx_reset_epc_page())
> > > 		sgx_free_epc_page();
> > 
> > OK, great, your patch set uses the wrapper only in the necessary call
> > sites. Sorry, I overlooked this part.
> > 
> > Anyway, it knowingly does that. I considered either as an equally harmful
> > side effect when I implemented it. Either can only trigger when there is a
> > bug in the kernel code.
> > 
> > It *could* do what that snippet suggests, but it's an "out of the frying pan,
> > into the fire" kind of change.
> > 
> > Since the NUMA patch set anyway requires a global dirty list, I think
> > the better way to deal with this would be to declare a new global in the
> > patch under discussion:
> > 
> > static struct list_head sgx_dirty_list;
> 
> sgx_dirty_page_list

Actually, I think it is good as it is now. Please do nothing :-)

Acked-by: Jarkko Sakkinen <jarkko@kernel.org>

I can continue from that and improve the fallback further. Not perfect, but
good enough.

/Jarkko

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/5] x86/sgx: Use sgx_free_epc_page() in sgx_reclaim_pages()
  2021-03-10 22:35             ` Jarkko Sakkinen
@ 2021-03-10 22:43               ` Kai Huang
  2021-03-10 22:52                 ` Kai Huang
  0 siblings, 1 reply; 27+ messages in thread
From: Kai Huang @ 2021-03-10 22:43 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Dave Hansen, linux-sgx, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, linux-kernel

On Thu, 2021-03-11 at 00:35 +0200, Jarkko Sakkinen wrote:
> On Thu, Mar 11, 2021 at 12:12:17AM +0200, Jarkko Sakkinen wrote:
> > On Thu, Mar 11, 2021 at 12:10:56AM +0200, Jarkko Sakkinen wrote:
> > > On Thu, Mar 11, 2021 at 09:36:15AM +1300, Kai Huang wrote:
> > > > On Wed, 2021-03-10 at 17:11 +0200, Jarkko Sakkinen wrote:
> > > > > On Wed, Mar 03, 2021 at 08:59:17AM -0800, Dave Hansen wrote:
> > > > > > On 3/3/21 7:03 AM, Jarkko Sakkinen wrote:
> > > > > > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > > > > > > index 52d070fb4c9a..ed99c60024dc 100644
> > > > > > > --- a/arch/x86/kernel/cpu/sgx/main.c
> > > > > > > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > > > > > > @@ -305,7 +305,6 @@ static void sgx_reclaim_pages(void)
> > > > > > >  {
> > > > > > >  	struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
> > > > > > >  	struct sgx_backing backing[SGX_NR_TO_SCAN];
> > > > > > > -	struct sgx_epc_section *section;
> > > > > > >  	struct sgx_encl_page *encl_page;
> > > > > > >  	struct sgx_epc_page *epc_page;
> > > > > > >  	pgoff_t page_index;
> > > > > > > @@ -378,11 +377,7 @@ static void sgx_reclaim_pages(void)
> > > > > > >  		kref_put(&encl_page->encl->refcount, sgx_encl_release);
> > > > > > >  		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
> > > > > > >  
> > > > > > > -		section = &sgx_epc_sections[epc_page->section];
> > > > > > > -		spin_lock(&section->lock);
> > > > > > > -		list_add_tail(&epc_page->list, &section->page_list);
> > > > > > > -		section->free_cnt++;
> > > > > > > -		spin_unlock(&section->lock);
> > > > > > > +		sgx_free_epc_page(epc_page);
> > > > > > >  	}
> > > > > > >  }
> > > > > > 
> > > > > > In current upstream (3fb6d0e00e), sgx_free_epc_page() calls __eremove().
> > > > > >  This code does not call __eremove().  That seems to be changing
> > > > > > behavior where none was intended.
> > > > > 
> > > > > EREMOVE does not matter here, as it doesn't in almost all of the sites
> > > > > where sgx_free_epc_page() is used in the driver. It does nothing to
> > > > > uninitialized pages.
> > > > 
> > > > Right. EREMOVE on uninitialized pages does nothing, so a more reasonable way is to
> > > > just NOT call EREMOVE (your original code), since it is absolutely unnecessary.
> > > > 
> > > > I don't see ANY reason we should call EREMOVE here. 
> > > > 
> > > > Actually w/o my patch to split EREMOVE out of sgx_free_epc_page(), it then makes
> > > > perfect sense to have the new sgx_free_epc_page() here.
> > > > 
> > > > > 
> > > > > The two patches that I posted originally for Kai's series took EREMOVE out
> > > > > of sgx_free_epc_page() and put an explicit EREMOVE where it is actually
> > > > > needed, but for reasons unknown to me, that change is gone.
> > > > > 
> > > > 
> > > > It's not gone. It goes into a new sgx_encl_free_epc_page(), which is exactly the
> > > > same as the current sgx_free_epc_page(), which has EREMOVE, instead of putting
> > > > EREMOVE into a dedicated sgx_reset_epc_page(), as you did in your series:
> > > > 
> > > > https://lore.kernel.org/linux-sgx/20210113233541.17669-1-jarkko@kernel.org/
> > > > 
> > > > However, your change has a side effect: it always puts the page back into the free
> > > > pool, even if EREMOVE fails. To make your change w/o any functional change, it has to be:
> > > > 
> > > > 	if(!sgx_reset_epc_page())
> > > > 		sgx_free_epc_page();
> > > 
> > > OK, great, your patch set uses the wrapper only in the necessary call
> > > sites. Sorry, I overlooked this part.
> > > 
> > > Anyway, it knowingly does that. I considered either as an equally harmful
> > > side effect when I implemented it. Either can only trigger when there is a
> > > bug in the kernel code.
> > > 
> > > It *could* do what that snippet suggests, but it's an "out of the frying pan,
> > > into the fire" kind of change.
> > > 
> > > Since the NUMA patch set anyway requires a global dirty list, I think
> > > the better way to deal with this would be to declare a new global in the
> > > patch under discussion:
> > > 
> > > static struct list_head sgx_dirty_list;
> > 
> > sgx_dirty_page_list
> 
> Actually, I think it is good as it is now. Please do nothing :-)
> 
> Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
> 
> I can continue from that and improve the fallback further. Not perfect, but
> good enough.

Great. Thank you Jarkko.

I'll add your Acked-by and repost it, since I also made a copy-paste mistake :)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/5] x86/sgx: Use sgx_free_epc_page() in sgx_reclaim_pages()
  2021-03-10 22:43               ` Kai Huang
@ 2021-03-10 22:52                 ` Kai Huang
  0 siblings, 0 replies; 27+ messages in thread
From: Kai Huang @ 2021-03-10 22:52 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Dave Hansen, linux-sgx, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, linux-kernel

On Thu, 2021-03-11 at 11:43 +1300, Kai Huang wrote:
> On Thu, 2021-03-11 at 00:35 +0200, Jarkko Sakkinen wrote:
> > On Thu, Mar 11, 2021 at 12:12:17AM +0200, Jarkko Sakkinen wrote:
> > > On Thu, Mar 11, 2021 at 12:10:56AM +0200, Jarkko Sakkinen wrote:
> > > > On Thu, Mar 11, 2021 at 09:36:15AM +1300, Kai Huang wrote:
> > > > > On Wed, 2021-03-10 at 17:11 +0200, Jarkko Sakkinen wrote:
> > > > > > On Wed, Mar 03, 2021 at 08:59:17AM -0800, Dave Hansen wrote:
> > > > > > > On 3/3/21 7:03 AM, Jarkko Sakkinen wrote:
> > > > > > > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > > > > > > > index 52d070fb4c9a..ed99c60024dc 100644
> > > > > > > > --- a/arch/x86/kernel/cpu/sgx/main.c
> > > > > > > > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > > > > > > > @@ -305,7 +305,6 @@ static void sgx_reclaim_pages(void)
> > > > > > > >  {
> > > > > > > >  	struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
> > > > > > > >  	struct sgx_backing backing[SGX_NR_TO_SCAN];
> > > > > > > > -	struct sgx_epc_section *section;
> > > > > > > >  	struct sgx_encl_page *encl_page;
> > > > > > > >  	struct sgx_epc_page *epc_page;
> > > > > > > >  	pgoff_t page_index;
> > > > > > > > @@ -378,11 +377,7 @@ static void sgx_reclaim_pages(void)
> > > > > > > >  		kref_put(&encl_page->encl->refcount, sgx_encl_release);
> > > > > > > >  		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
> > > > > > > >  
> > > > > > > > -		section = &sgx_epc_sections[epc_page->section];
> > > > > > > > -		spin_lock(&section->lock);
> > > > > > > > -		list_add_tail(&epc_page->list, &section->page_list);
> > > > > > > > -		section->free_cnt++;
> > > > > > > > -		spin_unlock(&section->lock);
> > > > > > > > +		sgx_free_epc_page(epc_page);
> > > > > > > >  	}
> > > > > > > >  }
> > > > > > > 
> > > > > > > In current upstream (3fb6d0e00e), sgx_free_epc_page() calls __eremove().
> > > > > > >  This code does not call __eremove().  That seems to be changing
> > > > > > > behavior where none was intended.
> > > > > > 
> > > > > > EREMOVE does not matter here, as it doesn't in almost all of the sites
> > > > > > where sgx_free_epc_page() is used in the driver. It does nothing to
> > > > > > uninitialized pages.
> > > > > 
> > > > > Right. EREMOVE on uninitialized pages does nothing, so a more reasonable way is to
> > > > > just NOT call EREMOVE (your original code), since it is absolutely unnecessary.
> > > > > 
> > > > > I don't see ANY reason we should call EREMOVE here. 
> > > > > 
> > > > > Actually w/o my patch to split EREMOVE out of sgx_free_epc_page(), it then makes
> > > > > perfect sense to have the new sgx_free_epc_page() here.
> > > > > 
> > > > > > 
> > > > > > The two patches that I posted originally for Kai's series took EREMOVE out
> > > > > > of sgx_free_epc_page() and put an explicit EREMOVE where it is actually
> > > > > > needed, but for reasons unknown to me, that change is gone.
> > > > > > 
> > > > > 
> > > > > It's not gone. It goes into a new sgx_encl_free_epc_page(), which is exactly the
> > > > > same as the current sgx_free_epc_page(), which has EREMOVE, instead of putting
> > > > > EREMOVE into a dedicated sgx_reset_epc_page(), as you did in your series:
> > > > > 
> > > > > https://lore.kernel.org/linux-sgx/20210113233541.17669-1-jarkko@kernel.org/
> > > > > 
> > > > > However, your change has a side effect: it always puts the page back into the free
> > > > > pool, even if EREMOVE fails. To make your change w/o any functional change, it has to be:
> > > > > 
> > > > > 	if(!sgx_reset_epc_page())
> > > > > 		sgx_free_epc_page();
> > > > 
> > > > OK, great, your patch set uses the wrapper only in the necessary call
> > > > sites. Sorry, I overlooked this part.
> > > > 
> > > > Anyway, it knowingly does that. I considered either as an equally harmful
> > > > side effect when I implemented it. Either can only trigger when there is a
> > > > bug in the kernel code.
> > > > 
> > > > It *could* do what that snippet suggests, but it's an "out of the frying pan,
> > > > into the fire" kind of change.
> > > > 
> > > > Since the NUMA patch set anyway requires a global dirty list, I think
> > > > the better way to deal with this would be to declare a new global in the
> > > > patch under discussion:
> > > > 
> > > > static struct list_head sgx_dirty_list;
> > > 
> > > sgx_dirty_page_list
> > 
> > Actually, I think it is good as it is now. Please do nothing :-)
> > 
> > Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
> > 
> > I can continue from that and improve the fallback further. Not perfect, but
> > good enough.
> 
> Great. Thank you Jarkko.
> 
> I'll add your Acked-by and repost it since I also made a mistake in copy-paste:)
> 

Hmm.. This patch was originally from you, so it has your From: and your SoB. It
also has a Co-developed-by from me, but does it still require an Acked-by from you?

Anyway, I have added it to my local branch. Let me know if I should remove it.


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2021-03-10 22:53 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210303150323.433207-1-jarkko@kernel.org>
2021-03-03 15:03 ` [PATCH v3 1/5] x86/sgx: Fix a resource leak in sgx_init() Jarkko Sakkinen
2021-03-03 16:56   ` Dave Hansen
2021-03-10 15:00     ` Jarkko Sakkinen
2021-03-10 15:49       ` Sean Christopherson
2021-03-10 21:52         ` Jarkko Sakkinen
2021-03-03 15:03 ` [PATCH v3 2/5] x86/sgx: Use sgx_free_epc_page() in sgx_reclaim_pages() Jarkko Sakkinen
2021-03-03 16:59   ` Dave Hansen
2021-03-10 15:11     ` Jarkko Sakkinen
2021-03-10 15:55       ` Dave Hansen
2021-03-10 21:56         ` Jarkko Sakkinen
2021-03-10 20:36       ` Kai Huang
2021-03-10 22:10         ` Jarkko Sakkinen
2021-03-10 22:12           ` Jarkko Sakkinen
2021-03-10 22:35             ` Jarkko Sakkinen
2021-03-10 22:43               ` Kai Huang
2021-03-10 22:52                 ` Kai Huang
2021-03-03 15:03 ` [PATCH v3 3/5] x86/sgx: Replace section->init_laundry_list with a temp list Jarkko Sakkinen
2021-03-03 18:02   ` Dave Hansen
2021-03-10 14:50     ` Jarkko Sakkinen
2021-03-03 15:03 ` [PATCH v3 4/5] x86/sgx: Replace section->page_list with a global free page list Jarkko Sakkinen
2021-03-03 23:48   ` Dave Hansen
2021-03-10 10:54     ` Jarkko Sakkinen
2021-03-03 15:03 ` [PATCH v3 5/5] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page() Jarkko Sakkinen
2021-03-04  0:20   ` Dave Hansen
2021-03-10 11:30     ` Jarkko Sakkinen
2021-03-10 15:44       ` Dave Hansen
2021-03-10 21:48         ` Jarkko Sakkinen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).