Date: Tue, 28 Aug 2018 13:30:40 +0300
From: Jarkko Sakkinen
To: Dave Hansen
Cc: x86@kernel.org, platform-driver-x86@vger.kernel.org,
 sean.j.christopherson@intel.com, nhorman@redhat.com, npmccallum@redhat.com,
 linux-sgx@vger.kernel.org, Serge Ayoun, Thomas Gleixner, Ingo Molnar,
 "H. Peter Anvin", Suresh Siddha,
 "open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)"
Subject: Re: [PATCH v13 07/13] x86/sgx: Add data structures for tracking the EPC pages
Message-ID: <20180828103040.GA21326@linux.intel.com>
References: <20180827185507.17087-1-jarkko.sakkinen@linux.intel.com>
 <20180827185507.17087-8-jarkko.sakkinen@linux.intel.com>
 <4666cae8-c711-8dd5-cbce-3d97cc19a9e5@intel.com>
In-Reply-To: <4666cae8-c711-8dd5-cbce-3d97cc19a9e5@intel.com>
Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
User-Agent: Mutt/1.9.4 (2018-02-28)
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Aug 27, 2018 at 02:07:53PM -0700, Dave Hansen wrote:
> On 08/27/2018 11:53 AM, Jarkko Sakkinen wrote:
> > Add data structures to track Enclave Page Cache (EPC) pages. EPC is
> > divided into multiple banks (1-N) of which addresses and sizes can be
> > enumerated with CPUID by the OS.
> >
> > On NUMA systems a node can have at most bank. A bank can be at most part of
> > two nodes. SGX supports both nodes with a single memory controller and also
> > sub-cluster nodes with severals memory controllers on a single die.
> >
> > Signed-off-by: Jarkko Sakkinen
> > Co-developed-by: Serge Ayoun
> > Co-developed-by: Sean Christopherson
> > Signed-off-by: Serge Ayoun
> > Signed-off-by: Sean Christopherson
> > ---
> >  arch/x86/include/asm/sgx.h      |  60 ++++++++++++++++++
> >  arch/x86/kernel/cpu/intel_sgx.c | 106 +++++++++++++++++++++++++++++++-
> >  2 files changed, 164 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
> > index 2130e639ab49..17b7b3aa66bf 100644
> > --- a/arch/x86/include/asm/sgx.h
> > +++ b/arch/x86/include/asm/sgx.h
> > @@ -4,9 +4,69 @@
> >  #ifndef _ASM_X86_SGX_H
> >  #define _ASM_X86_SGX_H
> >
> > +#include
> > +#include
> > +#include
> >  #include
> > +#include
> > +#include
> > +
> > +#define SGX_MAX_EPC_BANKS 8
>
> This is _still_ missing a meaningful description of what a bank is and
> whether it is a hardware or software structure.
>
> It would also help us to determine whether your bit packing below is
> really required.

I think a better name would be EPC section, as this is what the SDM uses
in Table 3-8 when describing the subleaves of EAX=0x12 (the SGX-specific
leaf) starting from ECX=0x02. It is a software structure that contains
the information given by these subleaves. Together, these sections
constitute the physical pages that are part of the EPC.

> > +struct sgx_epc_page {
> > +	unsigned long desc;
> > +	struct list_head list;
> > +};
> > +
> > +struct sgx_epc_bank {
> > +	unsigned long pa;
> > +	void *va;
> > +	unsigned long size;
>
> Please add units.  size could be bytes or pages, or who knows what.  I
> can't tell you how many bugs I've tripped over in the past from simple
> unit conversions

Will do.

> > +	struct sgx_epc_page *pages_data;
> > +	struct sgx_epc_page **pages;
> > +	unsigned long free_cnt;
> > +	spinlock_t lock;
> > +};
> >
> >  extern bool sgx_enabled;
> >  extern bool sgx_lc_enabled;
> > +extern struct sgx_epc_bank sgx_epc_banks[SGX_MAX_EPC_BANKS];
> > +
> > +/*
> > + * enum sgx_epc_page_desc - defines bits and masks for an EPC page's desc
>
> Why are you bothering packing these bits?  This seems a rather
> convoluted way to store two integers.

To keep struct sgx_epc_page 64 bytes.

> > +static __init int sgx_init_epc_bank(u64 addr, u64 size, unsigned long index,
> > +				    struct sgx_epc_bank *bank)
> > +{
> > +	unsigned long nr_pages = size >> PAGE_SHIFT;
> > +	struct sgx_epc_page *pages_data;
> > +	unsigned long i;
> > +	void *va;
> > +
> > +	va = ioremap_cache(addr, size);
> > +	if (!va)
> > +		return -ENOMEM;
> > +
> > +	pages_data = kcalloc(nr_pages, sizeof(struct sgx_epc_page), GFP_KERNEL);
> > +	if (!pages_data)
> > +		goto out_iomap;
>
> This looks like you're roughly limited by the page allocator to a bank
> size of ~1.4GB which seems kinda small.  Is this really OK?

Where does this limitation come from?
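
For anyone reading along, the bit packing questioned above could be
sketched in user space roughly as follows. This is only an illustration
under stated assumptions: 4 KiB pages, and helper names (epc_desc_pack()
and friends) that are made up here and do not exist in the patch. It
relies on a page-aligned physical address having its low 12 bits clear,
which is what lets the bank index share the same word, as in the quoted
"(addr + (i << PAGE_SHIFT)) | index".

```c
#include <stdint.h>

/* Assumption: 4 KiB EPC pages, so the low 12 bits of an address are free. */
#define EPC_PAGE_SHIFT		12
/* Hypothetical mask names, not taken from the patch. */
#define EPC_DESC_BANK_MASK	((UINT64_C(1) << EPC_PAGE_SHIFT) - 1)
#define EPC_DESC_ADDR_MASK	(~EPC_DESC_BANK_MASK)

/* Pack a page-aligned physical address and a bank index into one word. */
static uint64_t epc_desc_pack(uint64_t pa, unsigned int bank)
{
	return (pa & EPC_DESC_ADDR_MASK) | (bank & EPC_DESC_BANK_MASK);
}

/* Recover the physical address of the page from the descriptor. */
static uint64_t epc_desc_pa(uint64_t desc)
{
	return desc & EPC_DESC_ADDR_MASK;
}

/* Recover the index of the bank that the page belongs to. */
static unsigned int epc_desc_bank(uint64_t desc)
{
	return (unsigned int)(desc & EPC_DESC_BANK_MASK);
}
```

The trade-off is the one raised in the review: one word is saved per
page, at the cost of every reader having to go through accessors like
these instead of reading two plain fields.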

> > +	bank->pages = kcalloc(nr_pages, sizeof(struct sgx_epc_page *),
> > +			      GFP_KERNEL);
> > +	if (!bank->pages)
> > +		goto out_pdata;
> > +
> > +	for (i = 0; i < nr_pages; i++) {
> > +		bank->pages[i] = &pages_data[i];
> > +		bank->pages[i]->desc = (addr + (i << PAGE_SHIFT)) | index;
> > +	}
> > +
> > +	bank->pa = addr;
> > +	bank->size = size;
> > +	bank->va = va;
> > +	bank->free_cnt = nr_pages;
> > +	bank->pages_data = pages_data;
> > +	spin_lock_init(&bank->lock);
> > +	return 0;
> > +out_pdata:
> > +	kfree(pages_data);
> > +out_iomap:
> > +	iounmap(va);
> > +	return -ENOMEM;
> > +}
> > +
> > +static __init void sgx_page_cache_teardown(void)
> > +{
> > +	struct sgx_epc_bank *bank;
> > +	int i;
> > +
> > +	for (i = 0; i < sgx_nr_epc_banks; i++) {
> > +		bank = &sgx_epc_banks[i];
> > +		iounmap((void *)bank->va);
> > +		kfree(bank->pages);
> > +		kfree(bank->pages_data);
> > +	}
> > +}
> > +
> > +static inline u64 sgx_combine_bank_regs(u64 low, u64 high)
> > +{
> > +	return (low & 0xFFFFF000) + ((high & 0xFFFFF) << 32);
> > +}
>
> -ENOCOMMENT for a rather weird looking calculation

Yea, totally agreed... I'll think about how to make this cleaner. Maybe
it would anyway be a better idea to open code this at the call sites
and explain the calculation in a comment.

> > +static __init int sgx_page_cache_init(void)
> > +{
> > +	u32 eax, ebx, ecx, edx;
> > +	u64 pa, size;
> > +	int ret;
> > +	int i;
> > +
> > +	for (i = 0; i < SGX_MAX_EPC_BANKS; i++) {
> > +		cpuid_count(SGX_CPUID, 2 + i, &eax, &ebx, &ecx, &edx);
> > +		if (!(eax & 0xF))
> > +			break;
>
> So, we have random data coming out of a random CPUID leaf being called
> 'eax' and then being tested against a random hard-coded mask.  This
> seems rather unfortunate for someone trying to understand the code.  Can
> we do better?

Should probably do something along these lines:

#define SGX_CPUID_SECTION(i) (2 + (i))

enum sgx_section {
	SGX_CPUID_SECTION_INVALID	= 0x00,
	SGX_CPUID_SECTION_VALID		= 0x1B,
	SGX_CPUID_SECTION_MASK		= 0xFF,
};

for (i = 0; i < SGX_MAX_EPC_BANKS; i++) {
	cpuid_count(SGX_CPUID, SGX_CPUID_SECTION(i), &eax, &ebx, &ecx, &edx);

	section = eax & SGX_CPUID_SECTION_MASK;
	if (section != SGX_CPUID_SECTION_VALID) {
		if (section != SGX_CPUID_SECTION_INVALID) {
			/* Maybe a warning here for any other value as
			 * they are reserved according to the SDM?
			 */
		}
		continue;
	}

	/* ... */
}

> > +		pa = sgx_combine_bank_regs(eax, ebx);
> > +		size = sgx_combine_bank_regs(ecx, edx);
> > +		pr_info("EPC bank 0x%llx-0x%llx\n", pa, pa + size - 1);
> > +		ret = sgx_init_epc_bank(pa, size, i, &sgx_epc_banks[i]);
> > +		if (ret) {
> > +			sgx_page_cache_teardown();
> > +			return ret;
> > +		}
>
> So if one bank fails, we tear down all banks, yet leave sgx_nr_epc_banks
> incremented?  That sounds troublesome.

It is. Thanks for spotting that.

> > +		sgx_nr_epc_banks++;
> > +	}
> > +
> > +	if (!sgx_nr_epc_banks) {
> > +		pr_err("There are zero EPC banks.\n");
> > +		return -ENODEV;
> > +	}
> > +
> > +	return 0;
> > +}
>
> Does this support hot-addition of a bank?  If not, why not?

This is the DSDT entry describing the EPC on my GLK NUC:

Scope (_SB)
{
    Device (EPC)
    {
        Name (_HID, EisaId ("INT0E0C"))  // _HID: Hardware ID
        Name (_STR, Unicode ("Enclave Page Cache 1.0"))  // _STR: Description String
        Name (_MLS, Package (0x01)  // _MLS: Multiple Language String
        {
            Package (0x02)
            {
                "en",
                Unicode ("Enclave Page Cache 1.0")
            }
        })
        Name (RBUF, ResourceTemplate ()
        {
            QWordMemory (ResourceConsumer, PosDecode, MinNotFixed, MaxNotFixed, NonCacheable, ReadWrite,
                0x0000000000000000, // Granularity
                0x0000000000000000, // Range Minimum
                0x0000000000000000, // Range Maximum
                0x0000000000000000, // Translation Offset
                0x0000000000000001, // Length
                ,, _Y18, AddressRangeMemory, TypeStatic)
        })
        Method (_CRS, 0, NotSerialized)  // _CRS: Current Resource Settings
        {
            CreateQWordField (RBUF, \_SB.EPC._Y18._MIN, EMIN)  // _MIN: Minimum Base Address
            CreateQWordField (RBUF, \_SB.EPC._Y18._MAX, EMAX)  // _MAX: Maximum Base Address
            CreateQWordField (RBUF, \_SB.EPC._Y18._LEN, ELEN)  // _LEN: Length
            EMIN = EMNA /* External reference */
            ELEN = ELNG /* External reference */
            EMAX = ((EMNA + ELNG) - One)
            Return (RBUF) /* \_SB_.EPC_.RBUF */
        }

        Method (_STA, 0, NotSerialized)  // _STA: Status
        {
            If ((EPCS != Zero))
            {
                Return (0x0F)
            }

            Return (Zero)
        }
    }
}

I'm not aware of any ACPI specification for SGX, so this is all I have
at the moment (it does not show any ACPI event for hotplugging).

/Jarkko
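
As a follow-up to the -ENOCOMMENT remark on sgx_combine_bank_regs(),
here is a minimal user-space sketch of the same calculation with the
comment it was missing. The masks are copied from the quoted patch; the
names epc_combine_regs() and epc_subleaf_type() are hypothetical, and
the reading of the 0xF mask as a "subleaf type" field is an assumption
based on how the quoted loop tests EAX, not something verified against
the SDM here.

```c
#include <stdint.h>

/* Masks as used in the quoted patch; the macro names are made up. */
#define EPC_REG_LOW_MASK	UINT64_C(0xFFFFF000)	/* bits 31:12 of EAX or ECX */
#define EPC_REG_HIGH_MASK	UINT64_C(0x000FFFFF)	/* bits 19:0 of EBX or EDX */
#define EPC_SUBLEAF_TYPE_MASK	0xFu			/* low bits of EAX (assumed type field) */

/*
 * Build a 52-bit, page-aligned value from two 32-bit CPUID registers:
 * bits 31:12 come from the low register, bits 51:32 from the low
 * 20 bits of the high register. Bits 11:0 are always zero, because the
 * low bits of the low register carry other fields.
 */
static uint64_t epc_combine_regs(uint32_t low, uint32_t high)
{
	return ((uint64_t)low & EPC_REG_LOW_MASK) |
	       (((uint64_t)high & EPC_REG_HIGH_MASK) << 32);
}

/* Extract the field that the quoted loop tests with "eax & 0xF". */
static unsigned int epc_subleaf_type(uint32_t eax)
{
	return eax & EPC_SUBLEAF_TYPE_MASK;
}
```

Using OR instead of the original "+" gives the same result because the
two masked halves never overlap, and it makes the intent (concatenating
two bit fields) a little more obvious.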