linux-sgx.vger.kernel.org archive mirror
* [PATCH] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()
@ 2021-02-21  2:06 Jarkko Sakkinen
  2021-02-22  0:54 ` Dave Hansen
  2021-02-23 21:42 ` Dave Hansen
  0 siblings, 2 replies; 9+ messages in thread
From: Jarkko Sakkinen @ 2021-02-21  2:06 UTC
  To: linux-sgx
  Cc: haitao.huang, dan.j.williams, Jarkko Sakkinen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Dave Hansen,
	linux-kernel

Background
==========

Each EPC section is covered by one or more SRAT entries that are associated
with one and only one PXM (NUMA node). The motivation behind this patch is
to provide the basic elements for building an allocation scheme based on
this premise.

It does not try to fully address NUMA. For instance, it does not provide
integration with the mempolicy API, but neither does it introduce any
bottlenecks that would block addressing this later on. Memory allocation is
a complex topic, and thus it's better to start with baby steps.

Solution
========

Use phys_to_target_node() to associate each NUMA node with the EPC sections
contained within its range.

In sgx_alloc_epc_page(), first try to allocate from the NUMA node on which
the CPU is executing. If that fails, fall back to the legacy allocation.

Link: https://lore.kernel.org/lkml/158188326978.894464.217282995221175417.stgit@dwillia2-desk3.amr.corp.intel.com/
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
 arch/x86/Kconfig               |  1 +
 arch/x86/kernel/cpu/sgx/main.c | 52 +++++++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h  |  1 +
 3 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff0..dcb73a5edf63 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1941,6 +1941,7 @@ config X86_SGX
 	depends on CRYPTO_SHA256=y
 	select SRCU
 	select MMU_NOTIFIER
+	select NUMA_KEEP_MEMINFO if NUMA
 	help
 	  Intel(R) Software Guard eXtensions (SGX) is a set of CPU instructions
 	  that can be used by applications to set aside private regions of code
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 8df81a3ed945..21addedc5240 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -18,6 +18,12 @@ static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
 static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
 
+/* Nodes with one or more EPC sections. */
+static nodemask_t sgx_numa_mask;
+
+/* Array of lists of EPC sections for each NUMA node. */
+struct list_head *sgx_numa_nodes;
+
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
  * with sgx_reclaimer_lock acquired.
@@ -473,6 +479,26 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec
 	return page;
 }
 
+static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
+{
+	struct sgx_epc_section *section;
+	struct sgx_epc_page *page;
+
+	if (WARN_ON_ONCE(nid < 0 || nid >= MAX_NUMNODES))
+		return NULL;
+
+	if (!node_isset(nid, sgx_numa_mask))
+		return NULL;
+
+	list_for_each_entry(section, &sgx_numa_nodes[nid], section_list) {
+		page = __sgx_alloc_epc_page_from_section(section);
+		if (page)
+			return page;
+	}
+
+	return NULL;
+}
+
 /**
  * __sgx_alloc_epc_page() - Allocate an EPC page
  *
@@ -485,13 +511,17 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec
  */
 struct sgx_epc_page *__sgx_alloc_epc_page(void)
 {
+	int current_nid = numa_node_id();
 	struct sgx_epc_section *section;
 	struct sgx_epc_page *page;
 	int i;
 
+	page = __sgx_alloc_epc_page_from_node(current_nid);
+	if (page)
+		return page;
+
 	for (i = 0; i < sgx_nr_epc_sections; i++) {
 		section = &sgx_epc_sections[i];
-
 		page = __sgx_alloc_epc_page_from_section(section);
 		if (page)
 			return page;
@@ -665,8 +695,12 @@ static bool __init sgx_page_cache_init(void)
 {
 	u32 eax, ebx, ecx, edx, type;
 	u64 pa, size;
+	int nid;
 	int i;
 
+	nodes_clear(sgx_numa_mask);
+	sgx_numa_nodes = kmalloc_array(MAX_NUMNODES, sizeof(*sgx_numa_nodes), GFP_KERNEL);
+
 	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
 		cpuid_count(SGX_CPUID, i + SGX_CPUID_EPC, &eax, &ebx, &ecx, &edx);
 
@@ -690,6 +724,22 @@ static bool __init sgx_page_cache_init(void)
 		}
 
 		sgx_nr_epc_sections++;
+
+		nid = numa_map_to_online_node(phys_to_target_node(pa));
+
+		if (nid == NUMA_NO_NODE) {
+			pr_err(FW_BUG "unable to map EPC section %d to online node.\n", nid);
+			nid = 0;
+		} else if (WARN_ON_ONCE(nid < 0 || nid >= MAX_NUMNODES)) {
+			nid = 0;
+		}
+
+		if (!node_isset(nid, sgx_numa_mask)) {
+			INIT_LIST_HEAD(&sgx_numa_nodes[nid]);
+			node_set(nid, sgx_numa_mask);
+		}
+
+		list_add_tail(&sgx_epc_sections[i].section_list, &sgx_numa_nodes[nid]);
 	}
 
 	if (!sgx_nr_epc_sections) {
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 5fa42d143feb..4bc31bc4bacf 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -45,6 +45,7 @@ struct sgx_epc_section {
 	spinlock_t lock;
 	struct list_head page_list;
 	unsigned long free_cnt;
+	struct list_head section_list;
 
 	/*
 	 * Pages which need EREMOVE run on them before they can be
-- 
2.30.1



* Re: [PATCH] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()
  2021-02-21  2:06 [PATCH] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page() Jarkko Sakkinen
@ 2021-02-22  0:54 ` Dave Hansen
  2021-02-23 19:14   ` Dave Hansen
  2021-02-23 19:17   ` Jarkko Sakkinen
  2021-02-23 21:42 ` Dave Hansen
  1 sibling, 2 replies; 9+ messages in thread
From: Dave Hansen @ 2021-02-22  0:54 UTC
  To: Jarkko Sakkinen, linux-sgx
  Cc: haitao.huang, dan.j.williams, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Dave Hansen, linux-kernel

> +/* Nodes with one or more EPC sections. */
> +static nodemask_t sgx_numa_mask;

I'd also add that this is for optimization only.
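
Perhaps something along these lines (just a suggestion for the wording):

	/*
	 * Nodes with one or more EPC sections.  Used to optimize the
	 * search for free EPC pages.
	 */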

> +/* Array of lists of EPC sections for each NUMA node. */
> +struct list_head *sgx_numa_nodes;

I'd much prefer:

/*
 * Array with one list_head for each possible NUMA node.  Each
 * list contains all the sgx_epc_section's which are on that
 * node.
 */

Otherwise, it's hard to imagine what this structure looks like.

>  /*
>   * These variables are part of the state of the reclaimer, and must be accessed
>   * with sgx_reclaimer_lock acquired.
> @@ -473,6 +479,26 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec
>  	return page;
>  }
>  
> +static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
> +{
> +	struct sgx_epc_section *section;
> +	struct sgx_epc_page *page;
> +
> +	if (WARN_ON_ONCE(nid < 0 || nid >= MAX_NUMNODES))
> +		return NULL;
> +
> +	if (!node_isset(nid, sgx_numa_mask))
> +		return NULL;
> +
> +	list_for_each_entry(section, &sgx_numa_nodes[nid], section_list) {
> +		page = __sgx_alloc_epc_page_from_section(section);
> +		if (page)
> +			return page;
> +	}
> +
> +	return NULL;
> +}
> +
>  /**
>   * __sgx_alloc_epc_page() - Allocate an EPC page
>   *
> @@ -485,13 +511,17 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec
>   */
>  struct sgx_epc_page *__sgx_alloc_epc_page(void)
>  {
> +	int current_nid = numa_node_id();
>  	struct sgx_epc_section *section;
>  	struct sgx_epc_page *page;
>  	int i;
>  
> +	page = __sgx_alloc_epc_page_from_node(current_nid);
> +	if (page)
> +		return page;

Comments, please.

	/* Try to allocate EPC from the current node, first: */

then:

	/* Search all EPC sections, ignoring locality: */

>  	for (i = 0; i < sgx_nr_epc_sections; i++) {
>  		section = &sgx_epc_sections[i];
> -
>  		page = __sgx_alloc_epc_page_from_section(section);
>  		if (page)
>  			return page;

This still has the problem that it exerts too much pressure on the
low-numbered sgx_epc_sections[].  If a node's sections are full, it
always tries to go after sgx_epc_sections[0].

It can be in another patch, but I think the *minimal* thing we can do
here for a NUMA allocator is to try to at least balance the allocations.

Instead of having a for-each-section loop, I'd make it for-each-node ->
for-each-section.  Something like:

	for (i = 0; i < num_possible_nodes(); i++) {
		node = (numa_node_id() + i) % num_possible_nodes();

		if (!node_isset(node, sgx_numa_mask))
			continue;

		list_for_each_entry(section, &sgx_numa_nodes[node],
				    section_list) {
			__sgx_alloc_epc_page_from_section(section);
		}
	}
	
Then you have a single loop instead of a "try local, then fall back".

Also, that "node++" thing might be able to use next_online_node().

> @@ -665,8 +695,12 @@ static bool __init sgx_page_cache_init(void)
>  {
>  	u32 eax, ebx, ecx, edx, type;
>  	u64 pa, size;
> +	int nid;
>  	int i;
>  
> +	nodes_clear(sgx_numa_mask);
> +	sgx_numa_nodes = kmalloc_array(MAX_NUMNODES, sizeof(*sgx_numa_nodes), GFP_KERNEL);

MAX_NUMNODES will always be the largest compile-time constant.  That's
4k, IIRC.  num_possible_nodes() might be as small as 1 if NUMA is off.
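
Untested, but I'd expect something along these lines to be enough:

	sgx_numa_nodes = kmalloc_array(num_possible_nodes(),
				       sizeof(*sgx_numa_nodes), GFP_KERNEL);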

>  	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
>  		cpuid_count(SGX_CPUID, i + SGX_CPUID_EPC, &eax, &ebx, &ecx, &edx);
>  
> @@ -690,6 +724,22 @@ static bool __init sgx_page_cache_init(void)
>  		}
>  
>  		sgx_nr_epc_sections++;
> +
> +		nid = numa_map_to_online_node(phys_to_target_node(pa));
> +
> +		if (nid == NUMA_NO_NODE) {
> +			pr_err(FW_BUG "unable to map EPC section %d to online node.\n", nid);
> +			nid = 0;

Could we dump out the physical address there?  I think that's even more
informative than a section number.
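
Also, the %d in that pr_err() prints 'nid', which is guaranteed to be
NUMA_NO_NODE in that branch, not a section number.  Maybe something
like (untested):

	pr_err(FW_BUG "Unable to map EPC section %d (addr %#llx) to an online node.\n",
	       i, pa);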

> +		} else if (WARN_ON_ONCE(nid < 0 || nid >= MAX_NUMNODES)) {
> +			nid = 0;
> +		}

I'm not sure we really need to check for these.  If we're worried about
the firmware returning these, I'd expect numa_map_to_online_node() to
sanity check them for us.

> +		if (!node_isset(nid, sgx_numa_mask)) {
> +			INIT_LIST_HEAD(&sgx_numa_nodes[nid]);
> +			node_set(nid, sgx_numa_mask);
> +		}
> +
> +		list_add_tail(&sgx_epc_sections[i].section_list, &sgx_numa_nodes[nid]);
>  	}
>  
>  	if (!sgx_nr_epc_sections) {
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index 5fa42d143feb..4bc31bc4bacf 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -45,6 +45,7 @@ struct sgx_epc_section {
>  	spinlock_t lock;
>  	struct list_head page_list;
>  	unsigned long free_cnt;
> +	struct list_head section_list;

Maybe name this numa_section_list.



* Re: [PATCH] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()
  2021-02-22  0:54 ` Dave Hansen
@ 2021-02-23 19:14   ` Dave Hansen
  2021-02-24 17:29     ` Jarkko Sakkinen
  2021-02-23 19:17   ` Jarkko Sakkinen
  1 sibling, 1 reply; 9+ messages in thread
From: Dave Hansen @ 2021-02-23 19:14 UTC
  To: Jarkko Sakkinen, linux-sgx
  Cc: haitao.huang, dan.j.williams, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Dave Hansen, linux-kernel

On 2/21/21 4:54 PM, Dave Hansen wrote:
> Instead of having a for-each-section loop, I'd make it for-each-node ->
> for-each-section.  Something like:
> 
> 	for (i = 0; i < num_possible_nodes(); i++) {
> 		node = (numa_node_id() + i) % num_possible_nodes();
> 
> 		if (!node_isset(node, sgx_numa_mask))
> 			continue;
> 
> 		list_for_each_entry(section, &sgx_numa_nodes[node],
> 				    section_list) {
> 			__sgx_alloc_epc_page_from_section(section);
> 		}
> 	}

OK, here's an almost completely fleshed-out loop:

	page = NULL;
	node = numa_node_id();
	start_node = node;
	while (1) {
		list_for_each_entry(section, &sgx_numa_nodes[node],
				    section_list) {
			page = __sgx_alloc_epc(section);
			if (page)
				break;
		}
		if (page)
			break;
		
		/*
		 * EPC allocation failed on 'node'.  Fall
		 * back with round-robin to other nodes with
		 * EPC:
		 */
		node = next_node_in(node, sgx_numa_mask);

		/* Give up if allocation wraps back to the start: */
		if (node == start_node)
			break;
	}

This will:
1. Always start close to the CPU that started the allocation
2. Always spread the allocations out among nodes evenly, never
   concentrating allocations on node 0, for instance.  (This could also
   be node_random() and get a similar effect, but this probably has
   slightly better default NUMA behavior).
3. Efficiently look among all nodes because of 'sgx_numa_mask'
4. Have no special case for the first allocation.  All allocations will
   be satisfied from this unified loop.
5. Compile down to no loop on CONFIG_NUMA=n systems.
6. Be guaranteed to make forward progress even if preempted and
   numa_node_id() changes in the loop.

BTW, I think the name of __sgx_alloc_epc_page_from_section() can be
shortened down.  It's passed a section and returns a page, so both of
those could be removed from the name.


* Re: [PATCH] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()
  2021-02-22  0:54 ` Dave Hansen
  2021-02-23 19:14   ` Dave Hansen
@ 2021-02-23 19:17   ` Jarkko Sakkinen
  2021-02-23 19:20     ` Dave Hansen
  1 sibling, 1 reply; 9+ messages in thread
From: Jarkko Sakkinen @ 2021-02-23 19:17 UTC
  To: Dave Hansen
  Cc: linux-sgx, haitao.huang, dan.j.williams, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Dave Hansen,
	linux-kernel

On Sun, Feb 21, 2021 at 04:54:33PM -0800, Dave Hansen wrote:
> > +/* Nodes with one or more EPC sections. */
> > +static nodemask_t sgx_numa_mask;
> 
> I'd also add that this is for optimization only.
> 
> > +/* Array of lists of EPC sections for each NUMA node. */
> > +struct list_head *sgx_numa_nodes;
> 
> I'd much prefer:
> 
> /*
>  * Array with one list_head for each possible NUMA node.  Each
>  * list contains all the sgx_epc_section's which are on that
>  * node.
>  */
> 
> Otherwise, it's hard to imagine what this structure looks like.

OK.

> >  /*
> >   * These variables are part of the state of the reclaimer, and must be accessed
> >   * with sgx_reclaimer_lock acquired.
> > @@ -473,6 +479,26 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec
> >  	return page;
> >  }
> >  
> > +static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
> > +{
> > +	struct sgx_epc_section *section;
> > +	struct sgx_epc_page *page;
> > +
> > +	if (WARN_ON_ONCE(nid < 0 || nid >= MAX_NUMNODES))
> > +		return NULL;
> > +
> > +	if (!node_isset(nid, sgx_numa_mask))
> > +		return NULL;
> > +
> > +	list_for_each_entry(section, &sgx_numa_nodes[nid], section_list) {
> > +		page = __sgx_alloc_epc_page_from_section(section);
> > +		if (page)
> > +			return page;
> > +	}
> > +
> > +	return NULL;
> > +}
> > +
> >  /**
> >   * __sgx_alloc_epc_page() - Allocate an EPC page
> >   *
> > @@ -485,13 +511,17 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec
> >   */
> >  struct sgx_epc_page *__sgx_alloc_epc_page(void)
> >  {
> > +	int current_nid = numa_node_id();
> >  	struct sgx_epc_section *section;
> >  	struct sgx_epc_page *page;
> >  	int i;
> >  
> > +	page = __sgx_alloc_epc_page_from_node(current_nid);
> > +	if (page)
> > +		return page;
> 
> Comments, please.
> 
> 	/* Try to allocate EPC from the current node, first: */
> 
> then:
> 
> 	/* Search all EPC sections, ignoring locality: */
> 
> >  	for (i = 0; i < sgx_nr_epc_sections; i++) {
> >  		section = &sgx_epc_sections[i];
> > -
> >  		page = __sgx_alloc_epc_page_from_section(section);
> >  		if (page)
> >  			return page;
> 
> This still has the problem that it exerts too much pressure on the
> low-numbered sgx_epc_sections[].  If a node's sections are full, it
> always tries to go after sgx_epc_sections[0].

I have a better idea. See below.

> It can be in another patch, but I think the *minimal* thing we can do
> here for a NUMA allocator is to try to at least balance the allocations.
> 
> Instead of having a for-each-section loop, I'd make it for-each-node ->
> for-each-section.  Something like:
> 
> 	for (i = 0; i < num_possible_nodes(); i++) {
> 		node = (numa_node_id() + i) % num_possible_nodes();
> 
> 		if (!node_isset(node, sgx_numa_mask))
> 			continue;
> 
> 		list_for_each_entry(section, &sgx_numa_nodes[node],
> 				    section_list) {
> 			__sgx_alloc_epc_page_from_section(section);
> 		}
> 	}
> 	
> Then you have a single loop instead of a "try local, then fall back".
> 
> Also, that "node++" thing might be able to use next_online_node().
> 
> > @@ -665,8 +695,12 @@ static bool __init sgx_page_cache_init(void)
> >  {
> >  	u32 eax, ebx, ecx, edx, type;
> >  	u64 pa, size;
> > +	int nid;
> >  	int i;
> >  
> > +	nodes_clear(sgx_numa_mask);
> > +	sgx_numa_nodes = kmalloc_array(MAX_NUMNODES, sizeof(*sgx_numa_nodes), GFP_KERNEL);
> 
> MAX_NUMNODES will always be the largest compile-time constant.  That's
> 4k, IIRC.  num_possible_nodes() might be as small as 1 if NUMA is off.

Right.

> >  	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
> >  		cpuid_count(SGX_CPUID, i + SGX_CPUID_EPC, &eax, &ebx, &ecx, &edx);
> >  
> > @@ -690,6 +724,22 @@ static bool __init sgx_page_cache_init(void)
> >  		}
> >  
> >  		sgx_nr_epc_sections++;
> > +
> > +		nid = numa_map_to_online_node(phys_to_target_node(pa));
> > +
> > +		if (nid == NUMA_NO_NODE) {
> > +			pr_err(FW_BUG "unable to map EPC section %d to online node.\n", nid);
> > +			nid = 0;
> 
> Could we dump out the physical address there?  I think that's even more
> informative than a section number.

Yes.

> > +		} else if (WARN_ON_ONCE(nid < 0 || nid >= MAX_NUMNODES)) {
> > +			nid = 0;
> > +		}
> 
> I'm not sure we really need to check for these.  If we're worried about
> the firmware returning these, I'd expect numa_map_to_online_node() to
> sanity check them for us.

Yes, let's remove it.

> > +		if (!node_isset(nid, sgx_numa_mask)) {
> > +			INIT_LIST_HEAD(&sgx_numa_nodes[nid]);
> > +			node_set(nid, sgx_numa_mask);
> > +		}
> > +
> > +		list_add_tail(&sgx_epc_sections[i].section_list, &sgx_numa_nodes[nid]);
> >  	}
> >  
> >  	if (!sgx_nr_epc_sections) {
> > diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> > index 5fa42d143feb..4bc31bc4bacf 100644
> > --- a/arch/x86/kernel/cpu/sgx/sgx.h
> > +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> > @@ -45,6 +45,7 @@ struct sgx_epc_section {
> >  	spinlock_t lock;
> >  	struct list_head page_list;
> >  	unsigned long free_cnt;
> > +	struct list_head section_list;
> 
> Maybe name this numa_section_list.

Instead, let's just:

1. Have a global sgx_free_epc_list and remove sgx_epc_section.
   Pages are allocated from this list in LIFO fashion.
2. Add struct list_head node_list and use that for
   node-associated pages.
3. Replace 'int section' with 'int node'.

This will remove one layer of abstraction and provide a better
fallback scheme.

E.g. allocation:

1. Check the node_list of the current node.
2. As a fallback, check sgx_free_epc_list.
3. list_del() the page from both lists (rough sketch below).
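
Something like this, maybe (an untested sketch; sgx_free_epc_list,
sgx_free_list_lock, sgx_node_page_lists and the list_head members are
made-up names, assuming each free page is linked into both the global
list and its node's list):

	struct sgx_epc_page *__sgx_alloc_epc_page(void)
	{
		int nid = numa_node_id();
		struct sgx_epc_page *page = NULL;

		spin_lock(&sgx_free_list_lock);

		if (!list_empty(&sgx_node_page_lists[nid])) {
			/* 1. Prefer a free page local to the caller. */
			page = list_first_entry(&sgx_node_page_lists[nid],
						struct sgx_epc_page, node_list);
		} else if (!list_empty(&sgx_free_epc_list)) {
			/* 2. Fall back to any free EPC page. */
			page = list_first_entry(&sgx_free_epc_list,
						struct sgx_epc_page, list);
		}

		/* 3. A free page sits on both lists, so unlink it from both. */
		if (page) {
			list_del(&page->list);
			list_del(&page->node_list);
		}

		spin_unlock(&sgx_free_list_lock);

		return page;
	}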

/Jarkko


* Re: [PATCH] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()
  2021-02-23 19:17   ` Jarkko Sakkinen
@ 2021-02-23 19:20     ` Dave Hansen
  2021-02-23 19:33       ` Jarkko Sakkinen
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Hansen @ 2021-02-23 19:20 UTC
  To: Jarkko Sakkinen
  Cc: linux-sgx, haitao.huang, dan.j.williams, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Dave Hansen,
	linux-kernel

On 2/23/21 11:17 AM, Jarkko Sakkinen wrote:
> Instead, let's just:
> 
> 1. Have a global sgx_free_epc_list and remove sgx_epc_section.
>    Pages are allocated from this list in LIFO fashion.
> 2. Add struct list_head node_list and use that for
>    node-associated pages.
> 3. Replace 'int section' with 'int node'.

I was thinking of something similar.

I'm fine with this approach.


* Re: [PATCH] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()
  2021-02-23 19:20     ` Dave Hansen
@ 2021-02-23 19:33       ` Jarkko Sakkinen
  0 siblings, 0 replies; 9+ messages in thread
From: Jarkko Sakkinen @ 2021-02-23 19:33 UTC
  To: Dave Hansen
  Cc: linux-sgx, haitao.huang, dan.j.williams, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Dave Hansen,
	linux-kernel

On Tue, Feb 23, 2021 at 11:20:55AM -0800, Dave Hansen wrote:
> On 2/23/21 11:17 AM, Jarkko Sakkinen wrote:
> > Instead, let's just:
> > 
> > 1. Have a global sgx_free_epc_list and remove sgx_epc_section.
> >    Pages are allocated from this list in LIFO fashion.
> > 2. Add struct list_head node_list and use that for
> >    node-associated pages.
> > 3. Replace 'int section' with 'int node'.
> 
> I was thinking of something similar.
> 
> I'm fine with this approach.

Here's my two-step plan.

1. Let's ack this with the cosmetic changes. It's a good first baby
   step to take.
2. I'll do another patch that wipes sgx_epc_section.

Both are functionally self-contained patches that improve things.
That's why I don't see the point in iterating them as a two-patch
set. One patch at a time is less overhead.

/Jarkko


* Re: [PATCH] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()
  2021-02-21  2:06 [PATCH] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page() Jarkko Sakkinen
  2021-02-22  0:54 ` Dave Hansen
@ 2021-02-23 21:42 ` Dave Hansen
  2021-02-24 17:31   ` Jarkko Sakkinen
  1 sibling, 1 reply; 9+ messages in thread
From: Dave Hansen @ 2021-02-23 21:42 UTC
  To: Jarkko Sakkinen, linux-sgx
  Cc: haitao.huang, dan.j.williams, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Dave Hansen, linux-kernel

This doesn't look like it addresses all of the suggestions that I made
two days ago.  Is that coming in v3?


* Re: [PATCH] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()
  2021-02-23 19:14   ` Dave Hansen
@ 2021-02-24 17:29     ` Jarkko Sakkinen
  0 siblings, 0 replies; 9+ messages in thread
From: Jarkko Sakkinen @ 2021-02-24 17:29 UTC
  To: Dave Hansen
  Cc: linux-sgx, haitao.huang, dan.j.williams, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Dave Hansen,
	linux-kernel

On Tue, Feb 23, 2021 at 11:14:10AM -0800, Dave Hansen wrote:
> On 2/21/21 4:54 PM, Dave Hansen wrote:
> > Instead of having a for-each-section loop, I'd make it for-each-node ->
> > for-each-section.  Something like:
> > 
> > 	for (i = 0; i < num_possible_nodes(); i++) {
> > 		node = (numa_node_id() + i) % num_possible_nodes();
> > 
> > 		if (!node_isset(node, sgx_numa_mask))
> > 			continue;
> > 
> > 		list_for_each_entry(section, &sgx_numa_nodes[node],
> > 				    section_list) {
> > 			__sgx_alloc_epc_page_from_section(section);
> > 		}
> > 	}
> 
> OK, here's an almost completely fleshed-out loop:
> 
> 	page = NULL;
> 	node = numa_node_id();
> 	start_node = node;
> 	while (1) {
> 		list_for_each_entry(section, &sgx_numa_nodes[node],
> 				    section_list) {
> 			page = __sgx_alloc_epc(section);
> 			if (page)
> 				break;
> 		}
> 		if (page)
> 			break;
> 		
> 		/*
> 		 * EPC allocation failed on 'node'.  Fall
> 		 * back with round-robin to other nodes with
> 		 * EPC:
> 		 */
> 		node = next_node_in(node, sgx_numa_mask);
> 
> 		/* Give up if allocation wraps back to the start: */
> 		if (node == start_node)
> 			break;
> 	}
> 
> This will:
> 1. Always start close to the CPU that started the allocation
> 2. Always spread the allocations out among nodes evenly, never
>    concentrating allocations on node 0, for instance.  (This could also
>    be node_random() and get a similar effect, but this probably has
>    slightly better default NUMA behavior).
> 3. Efficiently look among all nodes because of 'sgx_numa_mask'
> 4. Have no special case for the first allocation.  All allocations will
>    be satisfied from this unified loop.
> 5. Compile down to no loop on CONFIG_NUMA=n systems.
> 6. Be guaranteed to make forward progress even if preempted and
>    numa_node_id() changes in the loop.
> 
> BTW, I think the name of __sgx_alloc_epc_page_from_section() can be
> shortened down.  It's passed a section and returns a page, so both of
> those could be removed from the name.

I would start with what I have with minimal changes, and then continue
with a patch that completely wipes the section list.

Then the fallback can just take a page from a flat FIFO of EPC pages.

/Jarkko


* Re: [PATCH] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()
  2021-02-23 21:42 ` Dave Hansen
@ 2021-02-24 17:31   ` Jarkko Sakkinen
  0 siblings, 0 replies; 9+ messages in thread
From: Jarkko Sakkinen @ 2021-02-24 17:31 UTC
  To: Dave Hansen
  Cc: linux-sgx, haitao.huang, dan.j.williams, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Dave Hansen,
	linux-kernel

On Tue, Feb 23, 2021 at 01:42:04PM -0800, Dave Hansen wrote:
> This doesn't look like it addresses all of the suggestions that I made
> two days ago.  Is that coming in v3?

You mean v2 does not address them?

/Jarkko
