linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/sgx: Avoid returning NULL in __sgx_alloc_epc_page()
@ 2021-03-19  4:06 Kai Huang
  2021-03-19  5:42 ` Jarkko Sakkinen
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Kai Huang @ 2021-03-19  4:06 UTC (permalink / raw)
  To: x86, linux-sgx
  Cc: bp, jarkko, dave.hansen, dave.hansen, linux-kernel, Kai Huang

Below kernel bug happened when running simple SGX application when EPC
is under pressure.  The root cause is with commit 5b8719504e3a
("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()"),
__sgx_alloc_epc_page() returns NULL when there's no free EPC page can be
allocated, while old behavior was it always returned ERR_PTR(-ENOMEM) in
such case.

Fix by directly returning the page if __sgx_alloc_epc_page_from_node()
allocates a valid page in fallback to non-local allocation, and always
returning ERR_PTR(-ENOMEM) if no EPC page can be allocated.

[  253.474764] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  253.500101] #PF: supervisor write access in kernel mode
[  253.525462] #PF: error_code(0x0002) - not-present page
...
[  254.102041] Call Trace:
[  254.126699]  sgx_ioc_enclave_add_pages+0x241/0x770
[  254.151305]  sgx_ioctl+0x194/0x4b0
[  254.174976]  ? handle_mm_fault+0xd0/0x260
[  254.198470]  ? do_user_addr_fault+0x1ef/0x570
[  254.221827]  __x64_sys_ioctl+0x91/0xc0
[  254.244546]  do_syscall_64+0x38/0x90
[  254.266728]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  254.289232] RIP: 0033:0x7fdc4cf4031b
...
[  254.711480] CR2: 0000000000000008
[  254.735494] ---[ end trace 970dce6d4cdf7f64 ]---
[  254.759915] RIP: 0010:sgx_alloc_epc_page+0x46/0x152
...

Fixes: 5b8719504e3a("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()")
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index fe26e7e91c25..7105e34da530 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -508,10 +508,10 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
 
 		page = __sgx_alloc_epc_page_from_node(nid);
 		if (page)
-			break;
+			return page;
 	}
 
-	return page;
+	return ERR_PTR(-ENOMEM);
 }
 
 /**
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/sgx: Avoid returning NULL in __sgx_alloc_epc_page()
  2021-03-19  4:06 [PATCH] x86/sgx: Avoid returning NULL in __sgx_alloc_epc_page() Kai Huang
@ 2021-03-19  5:42 ` Jarkko Sakkinen
  2021-03-19  8:45 ` Borislav Petkov
  2021-03-19 18:45 ` [tip: x86/sgx] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page() tip-bot2 for Jarkko Sakkinen
  2 siblings, 0 replies; 13+ messages in thread
From: Jarkko Sakkinen @ 2021-03-19  5:42 UTC (permalink / raw)
  To: Kai Huang; +Cc: x86, linux-sgx, bp, dave.hansen, dave.hansen, linux-kernel

On Fri, Mar 19, 2021 at 05:06:02PM +1300, Kai Huang wrote:
> Below kernel bug happened when running simple SGX application when EPC
> is under pressure.  The root cause is with commit 5b8719504e3a
> ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()"),
> __sgx_alloc_epc_page() returns NULL when there's no free EPC page can be
> allocated, while old behavior was it always returned ERR_PTR(-ENOMEM) in
> such case.
> 
> Fix by directly returning the page if __sgx_alloc_epc_page_from_node()
> allocates a valid page in fallback to non-local allocation, and always
> returning ERR_PTR(-ENOMEM) if no EPC page can be allocated.
> 
> [  253.474764] BUG: kernel NULL pointer dereference, address: 0000000000000008
> [  253.500101] #PF: supervisor write access in kernel mode
> [  253.525462] #PF: error_code(0x0002) - not-present page
> ...
> [  254.102041] Call Trace:
> [  254.126699]  sgx_ioc_enclave_add_pages+0x241/0x770
> [  254.151305]  sgx_ioctl+0x194/0x4b0
> [  254.174976]  ? handle_mm_fault+0xd0/0x260
> [  254.198470]  ? do_user_addr_fault+0x1ef/0x570
> [  254.221827]  __x64_sys_ioctl+0x91/0xc0
> [  254.244546]  do_syscall_64+0x38/0x90
> [  254.266728]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  254.289232] RIP: 0033:0x7fdc4cf4031b
> ...
> [  254.711480] CR2: 0000000000000008
> [  254.735494] ---[ end trace 970dce6d4cdf7f64 ]---
> [  254.759915] RIP: 0010:sgx_alloc_epc_page+0x46/0x152
> ...
> 
> Fixes: 5b8719504e3a("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()")
> Signed-off-by: Kai Huang <kai.huang@intel.com>


Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

> ---
>  arch/x86/kernel/cpu/sgx/main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index fe26e7e91c25..7105e34da530 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -508,10 +508,10 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
>  
>  		page = __sgx_alloc_epc_page_from_node(nid);
>  		if (page)
> -			break;
> +			return page;
>  	}
>  
> -	return page;
> +	return ERR_PTR(-ENOMEM);
>  }
>  
>  /**
> -- 
> 2.30.2
> 
> 

/Jarkko

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/sgx: Avoid returning NULL in __sgx_alloc_epc_page()
  2021-03-19  4:06 [PATCH] x86/sgx: Avoid returning NULL in __sgx_alloc_epc_page() Kai Huang
  2021-03-19  5:42 ` Jarkko Sakkinen
@ 2021-03-19  8:45 ` Borislav Petkov
  2021-03-19  9:01   ` Kai Huang
  2021-03-19 14:49   ` Jarkko Sakkinen
  2021-03-19 18:45 ` [tip: x86/sgx] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page() tip-bot2 for Jarkko Sakkinen
  2 siblings, 2 replies; 13+ messages in thread
From: Borislav Petkov @ 2021-03-19  8:45 UTC (permalink / raw)
  To: Kai Huang; +Cc: x86, linux-sgx, jarkko, dave.hansen, dave.hansen, linux-kernel

On Fri, Mar 19, 2021 at 05:06:02PM +1300, Kai Huang wrote:
> Below kernel bug happened when running simple SGX application when EPC
> is under pressure.  The root cause is with commit 5b8719504e3a
> ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()"),
> __sgx_alloc_epc_page() returns NULL when there's no free EPC page can be
> allocated, while old behavior was it always returned ERR_PTR(-ENOMEM) in
> such case.
> 
> Fix by directly returning the page if __sgx_alloc_epc_page_from_node()
> allocates a valid page in fallback to non-local allocation, and always
> returning ERR_PTR(-ENOMEM) if no EPC page can be allocated.
> 
> [  253.474764] BUG: kernel NULL pointer dereference, address: 0000000000000008
> [  253.500101] #PF: supervisor write access in kernel mode
> [  253.525462] #PF: error_code(0x0002) - not-present page
> ...
> [  254.102041] Call Trace:
> [  254.126699]  sgx_ioc_enclave_add_pages+0x241/0x770
> [  254.151305]  sgx_ioctl+0x194/0x4b0
> [  254.174976]  ? handle_mm_fault+0xd0/0x260
> [  254.198470]  ? do_user_addr_fault+0x1ef/0x570
> [  254.221827]  __x64_sys_ioctl+0x91/0xc0
> [  254.244546]  do_syscall_64+0x38/0x90
> [  254.266728]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  254.289232] RIP: 0033:0x7fdc4cf4031b
> ...
> [  254.711480] CR2: 0000000000000008
> [  254.735494] ---[ end trace 970dce6d4cdf7f64 ]---
> [  254.759915] RIP: 0010:sgx_alloc_epc_page+0x46/0x152
> ...
> 
> Fixes: 5b8719504e3a("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()")
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

I was on the verge whether to merge that into the original patch since
it is the top patch on the branch or create a new one but opted for
former because this way it won't break bisection and people won't have
to pay attention whether there's a fix patch to the NUMA patch too, in
case they wanna backport and whatnot.

Here's the new version:

Thx.

---
From 323950a8a98b492ac2fa168e8e4c0becfb4554dd Mon Sep 17 00:00:00 2001
From: Jarkko Sakkinen <jarkko@kernel.org>
Date: Thu, 18 Mar 2021 01:53:31 +0200
Subject: [PATCH] x86/sgx: Add a basic NUMA allocation scheme to
 sgx_alloc_epc_page()

Background
==========

SGX enclave memory is enumerated by the processor in contiguous physical
ranges called Enclave Page Cache (EPC) sections.  Currently, there is a
free list per section, but allocations simply target the lowest-numbered
sections.  This is functional, but has no NUMA awareness.

Fortunately, EPC sections are covered by entries in the ACPI SRAT table.
These entries allow each EPC section to be associated with a NUMA node,
just like normal RAM.

Solution
========

Implement a NUMA-aware enclave page allocator.  Mirror the buddy allocator
and maintain a list of enclave pages for each NUMA node.  Attempt to
allocate enclave memory first from local nodes, then fall back to other
nodes.

Note that the fallback is not as sophisticated as the buddy allocator
and is itself not aware of NUMA distances.  When a node's free list is
empty, it searches for the next-highest node with enclave pages (and
will wrap if necessary).  This could be improved in the future.

Other
=====

NUMA_KEEP_MEMINFO dependency is required for phys_to_target_node().

 [ Kai Huang: Do not return NULL from __sgx_alloc_epc_page() because
   callers do not expect that and that leads to a NULL ptr deref. ]

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/lkml/158188326978.894464.217282995221175417.stgit@dwillia2-desk3.amr.corp.intel.com/
Link: https://lkml.kernel.org/r/20210317235332.362001-2-jarkko.sakkinen@intel.com
---
 arch/x86/Kconfig               |   1 +
 arch/x86/kernel/cpu/sgx/main.c | 119 +++++++++++++++++++++------------
 arch/x86/kernel/cpu/sgx/sgx.h  |  16 +++--
 3 files changed, 88 insertions(+), 48 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2792879d398e..35391e94bd22 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1931,6 +1931,7 @@ config X86_SGX
 	depends on CRYPTO_SHA256=y
 	select SRCU
 	select MMU_NOTIFIER
+	select NUMA_KEEP_MEMINFO if NUMA
 	help
 	  Intel(R) Software Guard eXtensions (SGX) is a set of CPU instructions
 	  that can be used by applications to set aside private regions of code
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index f3a5cd2d27ef..5c9c5e5fb1fb 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -23,9 +23,21 @@ static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
  * with sgx_reclaimer_lock acquired.
  */
 static LIST_HEAD(sgx_active_page_list);
-
 static DEFINE_SPINLOCK(sgx_reclaimer_lock);
 
+/* The free page list lock protected variables prepend the lock. */
+static unsigned long sgx_nr_free_pages;
+
+/* Nodes with one or more EPC sections. */
+static nodemask_t sgx_numa_mask;
+
+/*
+ * Array with one list_head for each possible NUMA node.  Each
+ * list contains all the sgx_epc_section's which are on that
+ * node.
+ */
+static struct sgx_numa_node *sgx_numa_nodes;
+
 static LIST_HEAD(sgx_dirty_page_list);
 
 /*
@@ -312,6 +324,7 @@ static void sgx_reclaim_pages(void)
 	struct sgx_epc_section *section;
 	struct sgx_encl_page *encl_page;
 	struct sgx_epc_page *epc_page;
+	struct sgx_numa_node *node;
 	pgoff_t page_index;
 	int cnt = 0;
 	int ret;
@@ -383,28 +396,18 @@ static void sgx_reclaim_pages(void)
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
 
 		section = &sgx_epc_sections[epc_page->section];
-		spin_lock(&section->lock);
-		list_add_tail(&epc_page->list, &section->page_list);
-		section->free_cnt++;
-		spin_unlock(&section->lock);
-	}
-}
-
-static unsigned long sgx_nr_free_pages(void)
-{
-	unsigned long cnt = 0;
-	int i;
-
-	for (i = 0; i < sgx_nr_epc_sections; i++)
-		cnt += sgx_epc_sections[i].free_cnt;
+		node = section->node;
 
-	return cnt;
+		spin_lock(&node->lock);
+		list_add_tail(&epc_page->list, &node->free_page_list);
+		sgx_nr_free_pages++;
+		spin_unlock(&node->lock);
+	}
 }
 
 static bool sgx_should_reclaim(unsigned long watermark)
 {
-	return sgx_nr_free_pages() < watermark &&
-	       !list_empty(&sgx_active_page_list);
+	return sgx_nr_free_pages < watermark && !list_empty(&sgx_active_page_list);
 }
 
 static int ksgxd(void *p)
@@ -451,45 +454,56 @@ static bool __init sgx_page_reclaimer_init(void)
 	return true;
 }
 
-static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_section *section)
+static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 {
-	struct sgx_epc_page *page;
+	struct sgx_numa_node *node = &sgx_numa_nodes[nid];
+	struct sgx_epc_page *page = NULL;
 
-	spin_lock(&section->lock);
+	spin_lock(&node->lock);
 
-	if (list_empty(&section->page_list)) {
-		spin_unlock(&section->lock);
+	if (list_empty(&node->free_page_list)) {
+		spin_unlock(&node->lock);
 		return NULL;
 	}
 
-	page = list_first_entry(&section->page_list, struct sgx_epc_page, list);
+	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
 	list_del_init(&page->list);
-	section->free_cnt--;
+	sgx_nr_free_pages--;
+
+	spin_unlock(&node->lock);
 
-	spin_unlock(&section->lock);
 	return page;
 }
 
 /**
  * __sgx_alloc_epc_page() - Allocate an EPC page
  *
- * Iterate through EPC sections and borrow a free EPC page to the caller. When a
- * page is no longer needed it must be released with sgx_free_epc_page().
+ * Iterate through NUMA nodes and reserve ia free EPC page to the caller. Start
+ * from the NUMA node, where the caller is executing.
  *
  * Return:
- *   an EPC page,
- *   -errno on error
+ * - an EPC page:	A borrowed EPC pages were available.
+ * - NULL:		Out of EPC pages.
  */
 struct sgx_epc_page *__sgx_alloc_epc_page(void)
 {
-	struct sgx_epc_section *section;
 	struct sgx_epc_page *page;
-	int i;
+	int nid_of_current = numa_node_id();
+	int nid;
 
-	for (i = 0; i < sgx_nr_epc_sections; i++) {
-		section = &sgx_epc_sections[i];
+	if (node_isset(nid_of_current, sgx_numa_mask)) {
+		page = __sgx_alloc_epc_page_from_node(nid_of_current);
+		if (page)
+			return page;
+	}
+
+	/* Fall back to the non-local NUMA nodes: */
+	while (true) {
+		nid = next_node_in(nid, sgx_numa_mask);
+		if (nid == nid_of_current)
+			break;
 
-		page = __sgx_alloc_epc_page_from_section(section);
+		page = __sgx_alloc_epc_page_from_node(nid);
 		if (page)
 			return page;
 	}
@@ -600,6 +614,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 void sgx_free_epc_page(struct sgx_epc_page *page)
 {
 	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
+	struct sgx_numa_node *node = section->node;
 	int ret;
 
 	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
@@ -608,10 +623,12 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 	if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
 		return;
 
-	spin_lock(&section->lock);
-	list_add_tail(&page->list, &section->page_list);
-	section->free_cnt++;
-	spin_unlock(&section->lock);
+	spin_lock(&node->lock);
+
+	list_add_tail(&page->list, &node->free_page_list);
+	sgx_nr_free_pages++;
+
+	spin_unlock(&node->lock);
 }
 
 static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
@@ -632,8 +649,6 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
-	spin_lock_init(&section->lock);
-	INIT_LIST_HEAD(&section->page_list);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -642,7 +657,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
-	section->free_cnt = nr_pages;
+	sgx_nr_free_pages += nr_pages;
 	return true;
 }
 
@@ -661,8 +676,13 @@ static bool __init sgx_page_cache_init(void)
 {
 	u32 eax, ebx, ecx, edx, type;
 	u64 pa, size;
+	int nid;
 	int i;
 
+	sgx_numa_nodes = kmalloc_array(num_possible_nodes(), sizeof(*sgx_numa_nodes), GFP_KERNEL);
+	if (!sgx_numa_nodes)
+		return false;
+
 	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
 		cpuid_count(SGX_CPUID, i + SGX_CPUID_EPC, &eax, &ebx, &ecx, &edx);
 
@@ -685,6 +705,21 @@ static bool __init sgx_page_cache_init(void)
 			break;
 		}
 
+		nid = numa_map_to_online_node(phys_to_target_node(pa));
+		if (nid == NUMA_NO_NODE) {
+			/* The physical address is already printed above. */
+			pr_warn(FW_BUG "Unable to map EPC section to online node. Fallback to the NUMA node 0.\n");
+			nid = 0;
+		}
+
+		if (!node_isset(nid, sgx_numa_mask)) {
+			spin_lock_init(&sgx_numa_nodes[nid].lock);
+			INIT_LIST_HEAD(&sgx_numa_nodes[nid].free_page_list);
+			node_set(nid, sgx_numa_mask);
+		}
+
+		sgx_epc_sections[i].node =  &sgx_numa_nodes[nid];
+
 		sgx_nr_epc_sections++;
 	}
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index bc8af0428640..653af8ca1a25 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -29,22 +29,26 @@ struct sgx_epc_page {
 	struct list_head list;
 };
 
+/*
+ * Contains the tracking data for NUMA nodes having EPC pages. Most importantly,
+ * the free page list local to the node is stored here.
+ */
+struct sgx_numa_node {
+	struct list_head free_page_list;
+	spinlock_t lock;
+};
+
 /*
  * The firmware can define multiple chunks of EPC to the different areas of the
  * physical memory e.g. for memory areas of the each node. This structure is
  * used to store EPC pages for one EPC section and virtual memory area where
  * the pages have been mapped.
- *
- * 'lock' must be held before accessing 'page_list' or 'free_cnt'.
  */
 struct sgx_epc_section {
 	unsigned long phys_addr;
 	void *virt_addr;
 	struct sgx_epc_page *pages;
-
-	spinlock_t lock;
-	struct list_head page_list;
-	unsigned long free_cnt;
+	struct sgx_numa_node *node;
 };
 
 extern struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
-- 
2.29.2

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/sgx: Avoid returning NULL in __sgx_alloc_epc_page()
  2021-03-19  8:45 ` Borislav Petkov
@ 2021-03-19  9:01   ` Kai Huang
  2021-03-19  9:12     ` Kai Huang
  2021-03-19 14:50     ` Jarkko Sakkinen
  2021-03-19 14:49   ` Jarkko Sakkinen
  1 sibling, 2 replies; 13+ messages in thread
From: Kai Huang @ 2021-03-19  9:01 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, linux-sgx, jarkko, dave.hansen, dave.hansen, linux-kernel

On Fri, 19 Mar 2021 09:45:23 +0100 Borislav Petkov wrote:
> On Fri, Mar 19, 2021 at 05:06:02PM +1300, Kai Huang wrote:
> > Below kernel bug happened when running simple SGX application when EPC
> > is under pressure.  The root cause is with commit 5b8719504e3a
> > ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()"),
> > __sgx_alloc_epc_page() returns NULL when there's no free EPC page can be
> > allocated, while old behavior was it always returned ERR_PTR(-ENOMEM) in
> > such case.
> > 
> > Fix by directly returning the page if __sgx_alloc_epc_page_from_node()
> > allocates a valid page in fallback to non-local allocation, and always
> > returning ERR_PTR(-ENOMEM) if no EPC page can be allocated.
> > 
> > [  253.474764] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > [  253.500101] #PF: supervisor write access in kernel mode
> > [  253.525462] #PF: error_code(0x0002) - not-present page
> > ...
> > [  254.102041] Call Trace:
> > [  254.126699]  sgx_ioc_enclave_add_pages+0x241/0x770
> > [  254.151305]  sgx_ioctl+0x194/0x4b0
> > [  254.174976]  ? handle_mm_fault+0xd0/0x260
> > [  254.198470]  ? do_user_addr_fault+0x1ef/0x570
> > [  254.221827]  __x64_sys_ioctl+0x91/0xc0
> > [  254.244546]  do_syscall_64+0x38/0x90
> > [  254.266728]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [  254.289232] RIP: 0033:0x7fdc4cf4031b
> > ...
> > [  254.711480] CR2: 0000000000000008
> > [  254.735494] ---[ end trace 970dce6d4cdf7f64 ]---
> > [  254.759915] RIP: 0010:sgx_alloc_epc_page+0x46/0x152
> > ...
> > 
> > Fixes: 5b8719504e3a("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()")
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  arch/x86/kernel/cpu/sgx/main.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> I was on the verge whether to merge that into the original patch since
> it is the top patch on the branch or create a new one but opted for
> former because this way it won't break bisection and people won't have
> to pay attention whether there's a fix patch to the NUMA patch too, in
> case they wanna backport and whatnot.

Sure.

[...]

> +
> +	/* Fall back to the non-local NUMA nodes: */
> +	while (true) {
> +		nid = next_node_in(nid, sgx_numa_mask);
> +		if (nid == nid_of_current)
> +			break;
>  
> -		page = __sgx_alloc_epc_page_from_section(section);
> +		page = __sgx_alloc_epc_page_from_node(nid);
>  		if (page)
>  			return page;
>  	}

It seems "return ERR_PTR(-ENOMEM)" is missing at the bottom of this function?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/sgx: Avoid returning NULL in __sgx_alloc_epc_page()
  2021-03-19  9:01   ` Kai Huang
@ 2021-03-19  9:12     ` Kai Huang
  2021-03-19 14:50     ` Jarkko Sakkinen
  1 sibling, 0 replies; 13+ messages in thread
From: Kai Huang @ 2021-03-19  9:12 UTC (permalink / raw)
  To: Kai Huang
  Cc: Borislav Petkov, x86, linux-sgx, jarkko, dave.hansen,
	dave.hansen, linux-kernel

On Fri, 19 Mar 2021 22:01:41 +1300 Kai Huang wrote:
> On Fri, 19 Mar 2021 09:45:23 +0100 Borislav Petkov wrote:
> > On Fri, Mar 19, 2021 at 05:06:02PM +1300, Kai Huang wrote:
> > > Below kernel bug happened when running simple SGX application when EPC
> > > is under pressure.  The root cause is with commit 5b8719504e3a
> > > ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()"),
> > > __sgx_alloc_epc_page() returns NULL when there's no free EPC page can be
> > > allocated, while old behavior was it always returned ERR_PTR(-ENOMEM) in
> > > such case.
> > > 
> > > Fix by directly returning the page if __sgx_alloc_epc_page_from_node()
> > > allocates a valid page in fallback to non-local allocation, and always
> > > returning ERR_PTR(-ENOMEM) if no EPC page can be allocated.
> > > 
> > > [  253.474764] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > > [  253.500101] #PF: supervisor write access in kernel mode
> > > [  253.525462] #PF: error_code(0x0002) - not-present page
> > > ...
> > > [  254.102041] Call Trace:
> > > [  254.126699]  sgx_ioc_enclave_add_pages+0x241/0x770
> > > [  254.151305]  sgx_ioctl+0x194/0x4b0
> > > [  254.174976]  ? handle_mm_fault+0xd0/0x260
> > > [  254.198470]  ? do_user_addr_fault+0x1ef/0x570
> > > [  254.221827]  __x64_sys_ioctl+0x91/0xc0
> > > [  254.244546]  do_syscall_64+0x38/0x90
> > > [  254.266728]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > [  254.289232] RIP: 0033:0x7fdc4cf4031b
> > > ...
> > > [  254.711480] CR2: 0000000000000008
> > > [  254.735494] ---[ end trace 970dce6d4cdf7f64 ]---
> > > [  254.759915] RIP: 0010:sgx_alloc_epc_page+0x46/0x152
> > > ...
> > > 
> > > Fixes: 5b8719504e3a("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()")
> > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > ---
> > >  arch/x86/kernel/cpu/sgx/main.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > I was on the verge whether to merge that into the original patch since
> > it is the top patch on the branch or create a new one but opted for
> > former because this way it won't break bisection and people won't have
> > to pay attention whether there's a fix patch to the NUMA patch too, in
> > case they wanna backport and whatnot.
> 
> Sure.
> 
> [...]
> 
> > +
> > +	/* Fall back to the non-local NUMA nodes: */
> > +	while (true) {
> > +		nid = next_node_in(nid, sgx_numa_mask);
> > +		if (nid == nid_of_current)
> > +			break;
> >  
> > -		page = __sgx_alloc_epc_page_from_section(section);
> > +		page = __sgx_alloc_epc_page_from_node(nid);
> >  		if (page)
> >  			return page;
> >  	}
> 
> It seems "return ERR_PTR(-ENOMEM)" is missing at the bottom of this function?

Oh my fault. This line is not shown in patch probably due to it is not changed
by this patch. Please ignore this.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/sgx: Avoid returning NULL in __sgx_alloc_epc_page()
  2021-03-19  8:45 ` Borislav Petkov
  2021-03-19  9:01   ` Kai Huang
@ 2021-03-19 14:49   ` Jarkko Sakkinen
  1 sibling, 0 replies; 13+ messages in thread
From: Jarkko Sakkinen @ 2021-03-19 14:49 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kai Huang, x86, linux-sgx, dave.hansen, dave.hansen, linux-kernel

On Fri, Mar 19, 2021 at 09:45:23AM +0100, Borislav Petkov wrote:
> On Fri, Mar 19, 2021 at 05:06:02PM +1300, Kai Huang wrote:
> > Below kernel bug happened when running simple SGX application when EPC
> > is under pressure.  The root cause is with commit 5b8719504e3a
> > ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()"),
> > __sgx_alloc_epc_page() returns NULL when there's no free EPC page can be
> > allocated, while old behavior was it always returned ERR_PTR(-ENOMEM) in
> > such case.
> > 
> > Fix by directly returning the page if __sgx_alloc_epc_page_from_node()
> > allocates a valid page in fallback to non-local allocation, and always
> > returning ERR_PTR(-ENOMEM) if no EPC page can be allocated.
> > 
> > [  253.474764] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > [  253.500101] #PF: supervisor write access in kernel mode
> > [  253.525462] #PF: error_code(0x0002) - not-present page
> > ...
> > [  254.102041] Call Trace:
> > [  254.126699]  sgx_ioc_enclave_add_pages+0x241/0x770
> > [  254.151305]  sgx_ioctl+0x194/0x4b0
> > [  254.174976]  ? handle_mm_fault+0xd0/0x260
> > [  254.198470]  ? do_user_addr_fault+0x1ef/0x570
> > [  254.221827]  __x64_sys_ioctl+0x91/0xc0
> > [  254.244546]  do_syscall_64+0x38/0x90
> > [  254.266728]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [  254.289232] RIP: 0033:0x7fdc4cf4031b
> > ...
> > [  254.711480] CR2: 0000000000000008
> > [  254.735494] ---[ end trace 970dce6d4cdf7f64 ]---
> > [  254.759915] RIP: 0010:sgx_alloc_epc_page+0x46/0x152
> > ...
> > 
> > Fixes: 5b8719504e3a("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()")
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  arch/x86/kernel/cpu/sgx/main.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> I was on the verge whether to merge that into the original patch since
> it is the top patch on the branch or create a new one but opted for
> former because this way it won't break bisection and people won't have
> to pay attention whether there's a fix patch to the NUMA patch too, in
> case they wanna backport and whatnot.
> 
> Here's the new version:
> 
> Thx.
> 

Yeah, I'd also prefer to keep Dave's and Kai's patches as separate entitied.

Just in case, this was the Dave's fix:

https://patchwork.kernel.org/project/intel-sgx/patch/20210318214933.29341-1-dave.hansen@intel.com/

/Jarkko

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/sgx: Avoid returning NULL in __sgx_alloc_epc_page()
  2021-03-19  9:01   ` Kai Huang
  2021-03-19  9:12     ` Kai Huang
@ 2021-03-19 14:50     ` Jarkko Sakkinen
  2021-03-19 14:59       ` Borislav Petkov
  1 sibling, 1 reply; 13+ messages in thread
From: Jarkko Sakkinen @ 2021-03-19 14:50 UTC (permalink / raw)
  To: Kai Huang
  Cc: Borislav Petkov, x86, linux-sgx, dave.hansen, dave.hansen, linux-kernel

On Fri, Mar 19, 2021 at 10:01:41PM +1300, Kai Huang wrote:
> On Fri, 19 Mar 2021 09:45:23 +0100 Borislav Petkov wrote:
> > On Fri, Mar 19, 2021 at 05:06:02PM +1300, Kai Huang wrote:
> > > Below kernel bug happened when running simple SGX application when EPC
> > > is under pressure.  The root cause is with commit 5b8719504e3a
> > > ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()"),
> > > __sgx_alloc_epc_page() returns NULL when there's no free EPC page can be
> > > allocated, while old behavior was it always returned ERR_PTR(-ENOMEM) in
> > > such case.
> > > 
> > > Fix by directly returning the page if __sgx_alloc_epc_page_from_node()
> > > allocates a valid page in fallback to non-local allocation, and always
> > > returning ERR_PTR(-ENOMEM) if no EPC page can be allocated.
> > > 
> > > [  253.474764] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > > [  253.500101] #PF: supervisor write access in kernel mode
> > > [  253.525462] #PF: error_code(0x0002) - not-present page
> > > ...
> > > [  254.102041] Call Trace:
> > > [  254.126699]  sgx_ioc_enclave_add_pages+0x241/0x770
> > > [  254.151305]  sgx_ioctl+0x194/0x4b0
> > > [  254.174976]  ? handle_mm_fault+0xd0/0x260
> > > [  254.198470]  ? do_user_addr_fault+0x1ef/0x570
> > > [  254.221827]  __x64_sys_ioctl+0x91/0xc0
> > > [  254.244546]  do_syscall_64+0x38/0x90
> > > [  254.266728]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > [  254.289232] RIP: 0033:0x7fdc4cf4031b
> > > ...
> > > [  254.711480] CR2: 0000000000000008
> > > [  254.735494] ---[ end trace 970dce6d4cdf7f64 ]---
> > > [  254.759915] RIP: 0010:sgx_alloc_epc_page+0x46/0x152
> > > ...
> > > 
> > > Fixes: 5b8719504e3a("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()")
> > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > ---
> > >  arch/x86/kernel/cpu/sgx/main.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > I was on the verge whether to merge that into the original patch since
> > it is the top patch on the branch or create a new one but opted for
> > former because this way it won't break bisection and people won't have
> > to pay attention whether there's a fix patch to the NUMA patch too, in
> > case they wanna backport and whatnot.
> 
> Sure.
> 
> [...]
> 
> > +
> > +	/* Fall back to the non-local NUMA nodes: */
> > +	while (true) {
> > +		nid = next_node_in(nid, sgx_numa_mask);
> > +		if (nid == nid_of_current)
> > +			break;
> >  
> > -		page = __sgx_alloc_epc_page_from_section(section);
> > +		page = __sgx_alloc_epc_page_from_node(nid);
> >  		if (page)
> >  			return page;
> >  	}
> 
> It seems "return ERR_PTR(-ENOMEM)" is missing at the bottom of this function?

I understood Boris' comment that the fixes would not be squashed.

/Jarkko

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/sgx: Avoid returning NULL in __sgx_alloc_epc_page()
  2021-03-19 14:50     ` Jarkko Sakkinen
@ 2021-03-19 14:59       ` Borislav Petkov
  2021-03-19 15:22         ` Jarkko Sakkinen
  0 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2021-03-19 14:59 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Kai Huang, x86, linux-sgx, dave.hansen, dave.hansen, linux-kernel

On Fri, Mar 19, 2021 at 04:50:33PM +0200, Jarkko Sakkinen wrote:
> > > I was on the verge whether to merge that into the original patch since
> > > it is the top patch on the branch or create a new one but opted for
> > > former because this way it won't break bisection and people won't have
> > > to pay attention whether there's a fix patch to the NUMA patch too, in
> > > case they wanna backport and whatnot.

...

> I understood Boris' comment that the fixes would not be squashed.

Please read it again and let me know what about it is not clear. It is
at the beginning of this mail.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/sgx: Avoid returning NULL in __sgx_alloc_epc_page()
  2021-03-19 14:59       ` Borislav Petkov
@ 2021-03-19 15:22         ` Jarkko Sakkinen
  2021-03-19 15:52           ` Borislav Petkov
  0 siblings, 1 reply; 13+ messages in thread
From: Jarkko Sakkinen @ 2021-03-19 15:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kai Huang, x86, linux-sgx, dave.hansen, dave.hansen, linux-kernel

On Fri, Mar 19, 2021 at 03:59:31PM +0100, Borislav Petkov wrote:
> On Fri, Mar 19, 2021 at 04:50:33PM +0200, Jarkko Sakkinen wrote:
> > > > I was on the verge whether to merge that into the original patch since
> > > > it is the top patch on the branch or create a new one but opted for
> > > > former because this way it won't break bisection and people won't have
> > > > to pay attention whether there's a fix patch to the NUMA patch too, in
> > > > case they wanna backport and whatnot.
> 
> ...
> 
> > I understood Boris' comment that the fixes would not be squashed.
> 
> Please read it again and let me know what about it is not clear. It is
> at the beginning of this mail.

I did misread it for the first time.

So let's sanity: you *are* going to squash the patches together because
that way it's factors easier to backport the whole thing?

Is this the correct understanding?

Fair point, I can cope with that.

> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
> 

/Jarkko

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/sgx: Avoid returning NULL in __sgx_alloc_epc_page()
  2021-03-19 15:22         ` Jarkko Sakkinen
@ 2021-03-19 15:52           ` Borislav Petkov
  2021-03-19 15:59             ` Dave Hansen
  0 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2021-03-19 15:52 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Kai Huang, x86, linux-sgx, dave.hansen, dave.hansen, linux-kernel

On Fri, Mar 19, 2021 at 05:22:56PM +0200, Jarkko Sakkinen wrote:
> I did misread it for the first time.
> 
> So let's sanity: you *are* going to squash the patches together because
> that way it's factors easier to backport the whole thing?
> 
> Is this the correct understanding?

I squashed Kai's fix because I don't want to break people's bisection if
they land between your patch and his fix. They're already troubled enough
chasing an issue, don't want to have them get a NULL ptr in sgx land.

Now, looking at dhansen's fix: what can happen if nid is uninitialized?
AFAICT, we'll end up in

static inline int __next_node(int n, const nodemask_t *srcp)
{
        return min_t(int,MAX_NUMNODES,find_next_bit(srcp->bits, MAX_NUMNODES, n+1));
}

with n uninitialized and depending on its value it'll either return
MAX_NUMNODES so we'll try to allocate on the first node or try to
allocate on some other node.

Now, if you think that that is still problematic enough for enclave
creation, then I'll fold his patch too.

So yes, the main reason is usability and not breaking bisection.

So, what would you prefer?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/sgx: Avoid returning NULL in __sgx_alloc_epc_page()
  2021-03-19 15:52           ` Borislav Petkov
@ 2021-03-19 15:59             ` Dave Hansen
  2021-03-19 16:09               ` Borislav Petkov
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2021-03-19 15:59 UTC (permalink / raw)
  To: Borislav Petkov, Jarkko Sakkinen
  Cc: Kai Huang, x86, linux-sgx, dave.hansen, linux-kernel

On 3/19/21 8:52 AM, Borislav Petkov wrote:
> On Fri, Mar 19, 2021 at 05:22:56PM +0200, Jarkko Sakkinen wrote:
>> I did misread it for the first time.
>>
>> So let's sanity: you *are* going to squash the patches together because
>> that way it's factors easier to backport the whole thing?
>>
>> Is this the correct understanding?
> I squashed Kai's fix because I don't want to break people's bisection if
> they land between your patch and his fix. They're already troubled enough
> chasing an issue, don't want to have them get a NULL ptr in sgx land.
> 
> Now, looking at dhansen's fix: what can happen if nid is uninitialized?
> AFAICT, we'll end up in
> 
> static inline int __next_node(int n, const nodemask_t *srcp)
> {
>         return min_t(int,MAX_NUMNODES,find_next_bit(srcp->bits, MAX_NUMNODES, n+1));
> }
> 
> with n uninitialized and depending on its value it'll either return
> MAX_NUMNODES so we'll try to allocate on the first node or try to
> allocate on some other node.
> 
> Now, if you think that that is still problematic enough for enclave
> creation, then I'll fold his patch too.
> 
> So yes, the main reason is usability and not breaking bisection.
> 
> So, what would you prefer?

It's probably best to squash my patch in.  The uninitialized value could
*theoretically* cause the search to start at the wrong node and then end
before every node has been visited.  That could cause premature
allocation failures.

But, I seriously doubt anyone will notice either way.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/sgx: Avoid returning NULL in __sgx_alloc_epc_page()
  2021-03-19 15:59             ` Dave Hansen
@ 2021-03-19 16:09               ` Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2021-03-19 16:09 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Jarkko Sakkinen, Kai Huang, x86, linux-sgx, dave.hansen, linux-kernel

On Fri, Mar 19, 2021 at 08:59:46AM -0700, Dave Hansen wrote:
> It's probably best to squash my patch in.  The uninitialized value could
> *theoretically* cause the search to start at the wrong node and then end
> before every node has been visited.  That could cause premature
> allocation failures.
> 
> But, I seriously doubt anyone will notice either way.

Yeah, I think too that we should be on the safe side, just in case.
Lemme redo the branch with the new changes and test.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [tip: x86/sgx] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()
  2021-03-19  4:06 [PATCH] x86/sgx: Avoid returning NULL in __sgx_alloc_epc_page() Kai Huang
  2021-03-19  5:42 ` Jarkko Sakkinen
  2021-03-19  8:45 ` Borislav Petkov
@ 2021-03-19 18:45 ` tip-bot2 for Jarkko Sakkinen
  2 siblings, 0 replies; 13+ messages in thread
From: tip-bot2 for Jarkko Sakkinen @ 2021-03-19 18:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: kernel test robot, Jarkko Sakkinen, Kai Huang, Dave Hansen,
	Borislav Petkov, Dave Hansen, x86, linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     901ddbb9ecf5425183ea0c09d10c2fd7868dce54
Gitweb:        https://git.kernel.org/tip/901ddbb9ecf5425183ea0c09d10c2fd7868dce54
Author:        Jarkko Sakkinen <jarkko@kernel.org>
AuthorDate:    Thu, 18 Mar 2021 01:53:31 +02:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Fri, 19 Mar 2021 19:16:51 +01:00

x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()

Background
==========

SGX enclave memory is enumerated by the processor in contiguous physical
ranges called Enclave Page Cache (EPC) sections.  Currently, there is a
free list per section, but allocations simply target the lowest-numbered
sections.  This is functional, but has no NUMA awareness.

Fortunately, EPC sections are covered by entries in the ACPI SRAT table.
These entries allow each EPC section to be associated with a NUMA node,
just like normal RAM.

Solution
========

Implement a NUMA-aware enclave page allocator.  Mirror the buddy allocator
and maintain a list of enclave pages for each NUMA node.  Attempt to
allocate enclave memory first from local nodes, then fall back to other
nodes.

Note that the fallback is not as sophisticated as the buddy allocator
and is itself not aware of NUMA distances.  When a node's free list is
empty, it searches for the next-highest node with enclave pages (and
will wrap if necessary).  This could be improved in the future.

Other
=====

NUMA_KEEP_MEMINFO dependency is required for phys_to_target_node().

 [ Kai Huang: Do not return NULL from __sgx_alloc_epc_page() because
   callers do not expect that and that leads to a NULL ptr deref. ]

 [ dhansen: Fix an uninitialized 'nid' variable in
   __sgx_alloc_epc_page() as

   Reported-by: kernel test robot <lkp@intel.com>

   to avoid any potential allocations from the wrong NUMA node or even
   premature allocation failures. ]

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/lkml/158188326978.894464.217282995221175417.stgit@dwillia2-desk3.amr.corp.intel.com/
Link: https://lkml.kernel.org/r/20210319040602.178558-1-kai.huang@intel.com
Link: https://lkml.kernel.org/r/20210318214933.29341-1-dave.hansen@intel.com
Link: https://lkml.kernel.org/r/20210317235332.362001-2-jarkko.sakkinen@intel.com
---
 arch/x86/Kconfig               |   1 +-
 arch/x86/kernel/cpu/sgx/main.c | 119 ++++++++++++++++++++------------
 arch/x86/kernel/cpu/sgx/sgx.h  |  16 ++--
 3 files changed, 88 insertions(+), 48 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2792879..35391e9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1931,6 +1931,7 @@ config X86_SGX
 	depends on CRYPTO_SHA256=y
 	select SRCU
 	select MMU_NOTIFIER
+	select NUMA_KEEP_MEMINFO if NUMA
 	help
 	  Intel(R) Software Guard eXtensions (SGX) is a set of CPU instructions
 	  that can be used by applications to set aside private regions of code
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index f3a5cd2..13a7599 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -23,9 +23,21 @@ static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
  * with sgx_reclaimer_lock acquired.
  */
 static LIST_HEAD(sgx_active_page_list);
-
 static DEFINE_SPINLOCK(sgx_reclaimer_lock);
 
+/* The free page list lock protected variables prepend the lock. */
+static unsigned long sgx_nr_free_pages;
+
+/* Nodes with one or more EPC sections. */
+static nodemask_t sgx_numa_mask;
+
+/*
+ * Array with one list_head for each possible NUMA node.  Each
+ * list contains all the sgx_epc_section's which are on that
+ * node.
+ */
+static struct sgx_numa_node *sgx_numa_nodes;
+
 static LIST_HEAD(sgx_dirty_page_list);
 
 /*
@@ -312,6 +324,7 @@ static void sgx_reclaim_pages(void)
 	struct sgx_epc_section *section;
 	struct sgx_encl_page *encl_page;
 	struct sgx_epc_page *epc_page;
+	struct sgx_numa_node *node;
 	pgoff_t page_index;
 	int cnt = 0;
 	int ret;
@@ -383,28 +396,18 @@ skip:
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
 
 		section = &sgx_epc_sections[epc_page->section];
-		spin_lock(&section->lock);
-		list_add_tail(&epc_page->list, &section->page_list);
-		section->free_cnt++;
-		spin_unlock(&section->lock);
-	}
-}
-
-static unsigned long sgx_nr_free_pages(void)
-{
-	unsigned long cnt = 0;
-	int i;
-
-	for (i = 0; i < sgx_nr_epc_sections; i++)
-		cnt += sgx_epc_sections[i].free_cnt;
+		node = section->node;
 
-	return cnt;
+		spin_lock(&node->lock);
+		list_add_tail(&epc_page->list, &node->free_page_list);
+		sgx_nr_free_pages++;
+		spin_unlock(&node->lock);
+	}
 }
 
 static bool sgx_should_reclaim(unsigned long watermark)
 {
-	return sgx_nr_free_pages() < watermark &&
-	       !list_empty(&sgx_active_page_list);
+	return sgx_nr_free_pages < watermark && !list_empty(&sgx_active_page_list);
 }
 
 static int ksgxd(void *p)
@@ -451,45 +454,56 @@ static bool __init sgx_page_reclaimer_init(void)
 	return true;
 }
 
-static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_section *section)
+static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 {
-	struct sgx_epc_page *page;
+	struct sgx_numa_node *node = &sgx_numa_nodes[nid];
+	struct sgx_epc_page *page = NULL;
 
-	spin_lock(&section->lock);
+	spin_lock(&node->lock);
 
-	if (list_empty(&section->page_list)) {
-		spin_unlock(&section->lock);
+	if (list_empty(&node->free_page_list)) {
+		spin_unlock(&node->lock);
 		return NULL;
 	}
 
-	page = list_first_entry(&section->page_list, struct sgx_epc_page, list);
+	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
 	list_del_init(&page->list);
-	section->free_cnt--;
+	sgx_nr_free_pages--;
+
+	spin_unlock(&node->lock);
 
-	spin_unlock(&section->lock);
 	return page;
 }
 
 /**
  * __sgx_alloc_epc_page() - Allocate an EPC page
  *
- * Iterate through EPC sections and borrow a free EPC page to the caller. When a
- * page is no longer needed it must be released with sgx_free_epc_page().
+ * Iterate through NUMA nodes and reserve ia free EPC page to the caller. Start
+ * from the NUMA node, where the caller is executing.
  *
  * Return:
- *   an EPC page,
- *   -errno on error
+ * - an EPC page:	A borrowed EPC pages were available.
+ * - NULL:		Out of EPC pages.
  */
 struct sgx_epc_page *__sgx_alloc_epc_page(void)
 {
-	struct sgx_epc_section *section;
 	struct sgx_epc_page *page;
-	int i;
+	int nid_of_current = numa_node_id();
+	int nid = nid_of_current;
 
-	for (i = 0; i < sgx_nr_epc_sections; i++) {
-		section = &sgx_epc_sections[i];
+	if (node_isset(nid_of_current, sgx_numa_mask)) {
+		page = __sgx_alloc_epc_page_from_node(nid_of_current);
+		if (page)
+			return page;
+	}
+
+	/* Fall back to the non-local NUMA nodes: */
+	while (true) {
+		nid = next_node_in(nid, sgx_numa_mask);
+		if (nid == nid_of_current)
+			break;
 
-		page = __sgx_alloc_epc_page_from_section(section);
+		page = __sgx_alloc_epc_page_from_node(nid);
 		if (page)
 			return page;
 	}
@@ -600,6 +614,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 void sgx_free_epc_page(struct sgx_epc_page *page)
 {
 	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
+	struct sgx_numa_node *node = section->node;
 	int ret;
 
 	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
@@ -608,10 +623,12 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 	if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
 		return;
 
-	spin_lock(&section->lock);
-	list_add_tail(&page->list, &section->page_list);
-	section->free_cnt++;
-	spin_unlock(&section->lock);
+	spin_lock(&node->lock);
+
+	list_add_tail(&page->list, &node->free_page_list);
+	sgx_nr_free_pages++;
+
+	spin_unlock(&node->lock);
 }
 
 static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
@@ -632,8 +649,6 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
-	spin_lock_init(&section->lock);
-	INIT_LIST_HEAD(&section->page_list);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -642,7 +657,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
-	section->free_cnt = nr_pages;
+	sgx_nr_free_pages += nr_pages;
 	return true;
 }
 
@@ -661,8 +676,13 @@ static bool __init sgx_page_cache_init(void)
 {
 	u32 eax, ebx, ecx, edx, type;
 	u64 pa, size;
+	int nid;
 	int i;
 
+	sgx_numa_nodes = kmalloc_array(num_possible_nodes(), sizeof(*sgx_numa_nodes), GFP_KERNEL);
+	if (!sgx_numa_nodes)
+		return false;
+
 	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
 		cpuid_count(SGX_CPUID, i + SGX_CPUID_EPC, &eax, &ebx, &ecx, &edx);
 
@@ -685,6 +705,21 @@ static bool __init sgx_page_cache_init(void)
 			break;
 		}
 
+		nid = numa_map_to_online_node(phys_to_target_node(pa));
+		if (nid == NUMA_NO_NODE) {
+			/* The physical address is already printed above. */
+			pr_warn(FW_BUG "Unable to map EPC section to online node. Fallback to the NUMA node 0.\n");
+			nid = 0;
+		}
+
+		if (!node_isset(nid, sgx_numa_mask)) {
+			spin_lock_init(&sgx_numa_nodes[nid].lock);
+			INIT_LIST_HEAD(&sgx_numa_nodes[nid].free_page_list);
+			node_set(nid, sgx_numa_mask);
+		}
+
+		sgx_epc_sections[i].node =  &sgx_numa_nodes[nid];
+
 		sgx_nr_epc_sections++;
 	}
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index bc8af04..653af8c 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -30,21 +30,25 @@ struct sgx_epc_page {
 };
 
 /*
+ * Contains the tracking data for NUMA nodes having EPC pages. Most importantly,
+ * the free page list local to the node is stored here.
+ */
+struct sgx_numa_node {
+	struct list_head free_page_list;
+	spinlock_t lock;
+};
+
+/*
  * The firmware can define multiple chunks of EPC to the different areas of the
  * physical memory e.g. for memory areas of the each node. This structure is
  * used to store EPC pages for one EPC section and virtual memory area where
  * the pages have been mapped.
- *
- * 'lock' must be held before accessing 'page_list' or 'free_cnt'.
  */
 struct sgx_epc_section {
 	unsigned long phys_addr;
 	void *virt_addr;
 	struct sgx_epc_page *pages;
-
-	spinlock_t lock;
-	struct list_head page_list;
-	unsigned long free_cnt;
+	struct sgx_numa_node *node;
 };
 
 extern struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];

^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-03-19 18:46 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-19  4:06 [PATCH] x86/sgx: Avoid returning NULL in __sgx_alloc_epc_page() Kai Huang
2021-03-19  5:42 ` Jarkko Sakkinen
2021-03-19  8:45 ` Borislav Petkov
2021-03-19  9:01   ` Kai Huang
2021-03-19  9:12     ` Kai Huang
2021-03-19 14:50     ` Jarkko Sakkinen
2021-03-19 14:59       ` Borislav Petkov
2021-03-19 15:22         ` Jarkko Sakkinen
2021-03-19 15:52           ` Borislav Petkov
2021-03-19 15:59             ` Dave Hansen
2021-03-19 16:09               ` Borislav Petkov
2021-03-19 14:49   ` Jarkko Sakkinen
2021-03-19 18:45 ` [tip: x86/sgx] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page() tip-bot2 for Jarkko Sakkinen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).