From: Dave Hansen <dave.hansen@intel.com>
To: Jarkko Sakkinen <jarkko@kernel.org>, x86@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org,
Sean Christopherson <seanjc@google.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Jiri Kosina <trivial@kernel.org>
Subject: Re: [PATCH RFC] x86/sgx: Add trivial NUMA allocation
Date: Tue, 12 Jan 2021 16:24:01 -0800
Message-ID: <34b1acd1-e769-0dc2-a225-8ce3d2b6a085@intel.com>
In-Reply-To: <20201216135031.21518-1-jarkko@kernel.org>
On 12/16/20 5:50 AM, Jarkko Sakkinen wrote:
> Create a pointer array for each NUMA node with references to the
> contained EPC sections. Use this in __sgx_alloc_epc_page() to try the
> current NUMA node before the others.
It makes it harder to comment when I'm not on cc.
Hint, hint... ;)
We need a bit more information here as well. What's the relationship
between NUMA nodes and sections? How does the BIOS tell us which NUMA
nodes a section is in? Is it the same or different from normal RAM and
PMEM?
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index c519fc5f6948..0da510763c47 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -13,6 +13,13 @@
> #include "encl.h"
> #include "encls.h"
>
> +struct sgx_numa_node {
> + struct sgx_epc_section *sections[SGX_MAX_EPC_SECTIONS];
> + int nr_sections;
> +};
So, we have a 'NUMA node' structure already: pg_data_t. Why don't we
just hang the epc sections off there?
> +static struct sgx_numa_node sgx_numa_nodes[MAX_NUMNODES];
Hmm... Time to see if I can still do math.
#define SGX_MAX_EPC_SECTIONS 8
sizeof(struct sgx_numa_node) = 8 * sizeof(struct sgx_epc_section *)
                             + sizeof(int), padded out to 72 bytes
CONFIG_NODES_SHIFT=10 (on Fedora)
#define MAX_NUMNODES (1 << NODES_SHIFT)
72*1024 = ~72k. Yikes. For *EVERY* system that enables SGX,
regardless of whether it is NUMA or not.
Trivial is great, but this may be too trivial.
Adding a list_head to pg_data_t and sgx_epc_section would add
SGX_MAX_EPC_SECTIONS * sizeof(struct list_head) = 8*16 = 128 bytes for
the sections, plus 16 bytes per *present* NUMA node.
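Roughly, and just as an untested sketch (the 'numa_list' and
'epc_sections' names here are made up, not anything in the tree today):

	/* in struct sgx_epc_section: */
	struct list_head numa_list;	/* entry in the node's EPC list */

	/* in pg_data_t: */
	struct list_head epc_sections;	/* EPC sections on this node */

	/* at section init, once the node is known: */
	list_add_tail(&section->numa_list,
		      &NODE_DATA(nid)->epc_sections);

That only costs space for nodes which are actually present.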
> /**
> * __sgx_alloc_epc_page() - Allocate an EPC page
> *
> @@ -485,14 +511,19 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec
> */
> struct sgx_epc_page *__sgx_alloc_epc_page(void)
> {
> - struct sgx_epc_section *section;
> struct sgx_epc_page *page;
> + int nid = numa_node_id();
> int i;
>
> - for (i = 0; i < sgx_nr_epc_sections; i++) {
> - section = &sgx_epc_sections[i];
> + page = __sgx_alloc_epc_page_from_node(nid);
> + if (page)
> + return page;
>
> - page = __sgx_alloc_epc_page_from_section(section);
> + for (i = 0; i < sgx_nr_numa_nodes; i++) {
> + if (i == nid)
> + continue;
Yikes. That's a horribly inefficient loop. Consider if nodes 0 and
1023 were the only ones with EPC. What would this loop do? I think
it's a much better idea to keep a nodemask_t of nodes that have EPC.
Then, just do bitmap searches.
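Something like this, as an untested sketch ('sgx_numa_mask' is a name
I'm making up):

	/* nodes that have at least one EPC section: */
	static nodemask_t sgx_numa_mask;

	struct sgx_epc_page *__sgx_alloc_epc_page(void)
	{
		struct sgx_epc_page *page;
		int nid = numa_node_id();
		int i;

		page = __sgx_alloc_epc_page_from_node(nid);
		if (page)
			return page;

		/* only visit nodes that actually have EPC: */
		for_each_node_mask(i, sgx_numa_mask) {
			if (i == nid)
				continue;

			page = __sgx_alloc_epc_page_from_node(i);
			if (page)
				return page;
		}

		return ERR_PTR(-ENOMEM);
	}

If only nodes 0 and 1023 have EPC, that loop now makes two trips
instead of a thousand.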
> + page = __sgx_alloc_epc_page_from_node(i);
> if (page)
> return page;
> }
> @@ -661,11 +692,28 @@ static inline u64 __init sgx_calc_section_metric(u64 low, u64 high)
> ((high & GENMASK_ULL(19, 0)) << 32);
> }
>
> +static int __init sgx_pfn_to_nid(unsigned long pfn)
> +{
> + pg_data_t *pgdat;
> + int nid;
> +
> + for (nid = 0; nid < nr_node_ids; nid++) {
> + pgdat = NODE_DATA(nid);
> +
> + if (pfn >= pgdat->node_start_pfn &&
> + pfn < (pgdat->node_start_pfn + pgdat->node_spanned_pages))
> + return nid;
> + }
> +
> + return 0;
> +}
I'm not positive this works. I *thought* ->node_start_pfn and
->node_spanned_pages were only guaranteed to cover memory which is
managed by the kernel and has a 'struct page'.

EPC doesn't have a 'struct page', so it won't necessarily be covered by
the pgdat-> and zone-> ranges. I *think* you may have to go all the way
back to the ACPI SRAT for this.
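x86 already has phys_to_target_node(), which consults the SRAT-derived
numa_meminfo (including reserved ranges) rather than pgdat spans.
Something like this might work (untested sketch):

	static int __init sgx_pfn_to_nid(unsigned long pfn)
	{
		int nid = phys_to_target_node(PFN_PHYS(pfn));

		/* keep the patch's fallback behavior: */
		if (nid == NUMA_NO_NODE)
			nid = 0;

		return nid;
	}

Possibly with numa_map_to_online_node() on top, if the resulting node
can be memoryless or offline.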
It would also be *possible* to have an SRAT constructed like this:

	0->1GB  System RAM  - Node 0
	1->2GB  Reserved    - Node 1
	2->3GB  System RAM  - Node 0

where the 1->2GB range is EPC. The Node 0 pg_data_t would be:

	pgdat->node_start_pfn     = 0
	pgdat->node_spanned_pages = 3GB

so the pfn loop above would find EPC pfns inside Node 0's span and
return 0, even though the SRAT says that EPC belongs to Node 1.