* [PATCH 0/2] x86/sgx: Limit EPC overcommit @ 2021-12-20 17:46 Kristen Carlson Accardi 2021-12-20 17:46 ` [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit Kristen Carlson Accardi 2021-12-20 17:46 ` [PATCH 2/2] x86/sgx: account backing pages Kristen Carlson Accardi 0 siblings, 2 replies; 19+ messages in thread From: Kristen Carlson Accardi @ 2021-12-20 17:46 UTC (permalink / raw) To: linux-sgx SGX currently allows EPC pages to be overcommitted. If the system is out of enclave memory, EPC pages are swapped to normal RAM via a per enclave shared memory area. This shared memory is not charged to the enclave or the task mapping it, making it hard to account for using normal methods. Since SGX will allow EPC pages to be overcommitted without limits, enclaves can consume system memory for these backing pages without limits. In order to prevent this, set a cap on the amount of overcommit SGX allows based on a module param which can be set at boot time. Then, whenever a backing page is requested by an enclave, keep track of the total amount of shared memory pages used across all enclaves and return an error if the overcommit limit has been reached. This will restrict the total amount of backing pages that all enclaves can consume to a maximum amount, and prevent enclaves from consuming all the system RAM for backing pages. The overcommit percentage has a default value of 100, which limits shared memory page consumption to equal to the number of EPC pages in the system. If sgx.overcommit_percent is set to a negative value, SGX will not place any limits on the amount of overcommit that might be requested, and SGX will behave as it has previously without the sgx.overcommit_percent limit. Kristen Carlson Accardi (2): x86/sgx: Add accounting for tracking overcommit x86/sgx: account backing pages .../admin-guide/kernel-parameters.txt | 7 ++ Documentation/x86/sgx.rst | 16 +++- arch/x86/kernel/cpu/sgx/Makefile | 6 +- arch/x86/kernel/cpu/sgx/encl.c | 76 ++++++++++++++++++- arch/x86/kernel/cpu/sgx/encl.h | 6 +- arch/x86/kernel/cpu/sgx/main.c | 70 ++++++++++++++++- arch/x86/kernel/cpu/sgx/sgx.h | 2 + 7 files changed, 173 insertions(+), 10 deletions(-) -- 2.20.1 ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit 2021-12-20 17:46 [PATCH 0/2] x86/sgx: Limit EPC overcommit Kristen Carlson Accardi @ 2021-12-20 17:46 ` Kristen Carlson Accardi 2021-12-20 19:30 ` Borislav Petkov 2021-12-28 23:04 ` Jarkko Sakkinen 2021-12-20 17:46 ` [PATCH 2/2] x86/sgx: account backing pages Kristen Carlson Accardi 1 sibling, 2 replies; 19+ messages in thread From: Kristen Carlson Accardi @ 2021-12-20 17:46 UTC (permalink / raw) To: linux-sgx, Jonathan Corbet, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin When the system runs out of enclave memory, SGX can reclaim EPC pages by swapping to normal RAM. This normal RAM is allocated via a per-enclave shared memory area. The shared memory area is not mapped into the enclave or the task mapping it, which makes its memory use opaque (including to the OOM killer). Having lots of hard to find memory around is problematic, especially when there is no limit. Introduce a module parameter and a global counter that can be used to limit the number of pages that enclaves are able to consume for backing storage. This parameter is a percentage value that is used in conjunction with the number of EPC pages in the system to set a cap on the amount of backing RAM that can be consumed. The default for this value is 100, which limits the total number of shared memory pages that may be consumed by all enclaves as backing pages to the number of EPC pages on the system. For example, on an SGX system that has 128MB of EPC, this default would cap the amount of normal RAM that SGX consumes for its shared memory areas at 128MB. If sgx.overcommit_percent is set to a negative value (such as -1), SGX will not place any limits on the amount of overcommit that might be requested, and SGX will behave as it has previously without the overcommit_percent limit. SGX may not be built as a module, but the module parameter interface is used in order to provide a convenient interface. The SGX overcommit_percent works differently than the core VM overcommit limit. Enclaves request backing pages one page at a time, and the number of in use backing pages that are allowed is a global resource that is limited for all enclaves. Introduce a pair of functions which can be used by callers when requesting backing RAM pages. These functions are responsible for accounting the page charges. A request may return an error if the request will cause the counter to exceed the backing page cap. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> --- .../admin-guide/kernel-parameters.txt | 7 ++ Documentation/x86/sgx.rst | 16 ++++- arch/x86/kernel/cpu/sgx/Makefile | 6 +- arch/x86/kernel/cpu/sgx/main.c | 64 +++++++++++++++++++ arch/x86/kernel/cpu/sgx/sgx.h | 2 + 5 files changed, 93 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 9725c546a0d4..9d23c05a833b 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -5165,6 +5165,13 @@ serialnumber [BUGS=X86-32] + sgx.overcommit_percent= [X86-64,SGX] + Limits the amount of normal RAM used for backing + storage that may be allocate, expressed as a + percentage of the total number of EPC pages in the + system. + See Documentation/x86/sgx.rst for more information. + shapers= [NET] Maximal number of shapers. diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst index 265568a9292c..4f9a1c68be94 100644 --- a/Documentation/x86/sgx.rst +++ b/Documentation/x86/sgx.rst @@ -147,7 +147,21 @@ Page reclaimer Similar to the core kswapd, ksgxd, is responsible for managing the overcommitment of enclave memory. If the system runs out of enclave memory, -*ksgxd* “swaps” enclave memory to normal memory. +*ksgxd* “swaps” enclave memory to normal RAM. This normal RAM is allocated +via per enclave shared memory. The shared memory area is not mapped into the +enclave or the task mapping it, which makes its memory use opaque - including +to the system out of memory killer (OOM). This can be problematic when there +are no limits in place on the amount an enclave can allocate. + +At boot time, the module parameter "sgx.overcommit_percent" can be used to +place a limit on the number of shared memory backing pages that may be +allocated, expressed as a percentage of the total number of EPC pages in the +system. A value of 100 is the default, and represents a limit equal to the +number of EPC pages in the system. To disable the limit, set +sgx.overcommit_percent to -1. The number of backing pages available to +enclaves is a global resource. If the system exceeds the number of allowed +backing pages in use, the reclaimer will be unable to swap EPC pages to +shared memory. Launch Control ============== diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile index 9c1656779b2a..72f9192a43fe 100644 --- a/arch/x86/kernel/cpu/sgx/Makefile +++ b/arch/x86/kernel/cpu/sgx/Makefile @@ -1,6 +1,10 @@ -obj-y += \ +# This allows sgx to have module namespace +obj-y += sgx.o + +sgx-y += \ driver.o \ encl.o \ ioctl.o \ main.o + obj-$(CONFIG_X86_SGX_KVM) += virt.o diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 2857a49f2335..c58ce9d9fd56 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright(c) 2016-20 Intel Corporation. */ +#include <linux/moduleparam.h> #include <linux/file.h> #include <linux/freezer.h> #include <linux/highmem.h> @@ -43,6 +44,54 @@ static struct sgx_numa_node *sgx_numa_nodes; static LIST_HEAD(sgx_dirty_page_list); +/* + * Limits the amount of normal RAM that SGX can consume for EPC + * overcommit to the total EPC pages * sgx_overcommit_percent / 100 + */ +static int sgx_overcommit_percent = 100; +module_param_named(overcommit_percent, sgx_overcommit_percent, int, 0440); +MODULE_PARM_DESC(overcommit_percent, "Percentage of overcommit of EPC pages."); + +/* The number of pages that can be allocated globally for backing storage. */ +static atomic_long_t sgx_nr_available_backing_pages; +static bool sgx_disable_overcommit_tracking; + +/** + * sgx_charge_mem() - charge for a page used for backing storage + * + * Backing storage usage is capped by the sgx_nr_available_backing_pages. + * If the backing storage usage is over the overcommit limit, + * return an error. + * + * Return: + * 0: The page requested does not exceed the limit + * -ENOMEM: The page requested exceeds the overcommit limit + */ +int sgx_charge_mem(void) +{ + if (sgx_disable_overcommit_tracking) + return 0; + + if (!atomic_long_add_unless(&sgx_nr_available_backing_pages, -1, 0)) + return -ENOMEM; + + return 0; +} + +/** + * sgx_uncharge_mem() - uncharge a page previously used for backing storage + * + * When backing storage is no longer in use, increment the + * sgx_nr_available_backing_pages counter. + */ +void sgx_uncharge_mem(void) +{ + if (sgx_disable_overcommit_tracking) + return; + + atomic_long_inc(&sgx_nr_available_backing_pages); +} + /* * Reset post-kexec EPC pages to the uninitialized state. The pages are removed * from the input list, and made available for the page allocator. SECS pages @@ -786,6 +835,7 @@ static bool __init sgx_page_cache_init(void) u64 pa, size; int nid; int i; + u64 total_epc_bytes = 0; sgx_numa_nodes = kmalloc_array(num_possible_nodes(), sizeof(*sgx_numa_nodes), GFP_KERNEL); if (!sgx_numa_nodes) @@ -830,6 +880,7 @@ static bool __init sgx_page_cache_init(void) sgx_epc_sections[i].node = &sgx_numa_nodes[nid]; sgx_numa_nodes[nid].size += size; + total_epc_bytes += size; sgx_nr_epc_sections++; } @@ -839,6 +890,19 @@ static bool __init sgx_page_cache_init(void) return false; } + if (sgx_overcommit_percent >= 0) { + u64 available_backing_bytes; + + available_backing_bytes = + total_epc_bytes * (sgx_overcommit_percent / 100); + + atomic_long_set(&sgx_nr_available_backing_pages, + available_backing_bytes >> PAGE_SHIFT); + } else { + pr_info("Disabling overcommit limit.\n"); + sgx_disable_overcommit_tracking = true; + } + return true; } diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index 0f17def9fe6f..3507a9983fc1 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -89,6 +89,8 @@ void sgx_free_epc_page(struct sgx_epc_page *page); void sgx_mark_page_reclaimable(struct sgx_epc_page *page); int sgx_unmark_page_reclaimable(struct sgx_epc_page *page); struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim); +int sgx_charge_mem(void); +void sgx_uncharge_mem(void); #ifdef CONFIG_X86_SGX_KVM int __init sgx_vepc_init(void); -- 2.20.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit 2021-12-20 17:46 ` [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit Kristen Carlson Accardi @ 2021-12-20 19:30 ` Borislav Petkov 2021-12-20 20:39 ` Kristen Carlson Accardi 2021-12-28 23:04 ` Jarkko Sakkinen 1 sibling, 1 reply; 19+ messages in thread From: Borislav Petkov @ 2021-12-20 19:30 UTC (permalink / raw) To: Kristen Carlson Accardi Cc: linux-sgx, Jonathan Corbet, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner, Ingo Molnar, x86, H. Peter Anvin On Mon, Dec 20, 2021 at 09:46:39AM -0800, Kristen Carlson Accardi wrote: > Similar to the core kswapd, ksgxd, is responsible for managing the > overcommitment of enclave memory. If the system runs out of enclave memory, > -*ksgxd* “swaps” enclave memory to normal memory. > +*ksgxd* “swaps” enclave memory to normal RAM. This normal RAM is allocated > +via per enclave shared memory. The shared memory area is not mapped into the > +enclave or the task mapping it, which makes its memory use opaque - including > +to the system out of memory killer (OOM). This can be problematic when there > +are no limits in place on the amount an enclave can allocate. Problematic how? The commit message above is talking about what your patch does and that is kinda clear from the diff. The *why* is really missing. Only that allusion that it might be problematic in some cases but that's not even scratching the surface. > +At boot time, the module parameter "sgx.overcommit_percent" can be used to > +place a limit on the number of shared memory backing pages that may be > +allocated, expressed as a percentage of the total number of EPC pages in the > +system. A value of 100 is the default, and represents a limit equal to the > +number of EPC pages in the system. To disable the limit, set > +sgx.overcommit_percent to -1. The number of backing pages available to > +enclaves is a global resource. If the system exceeds the number of allowed > +backing pages in use, the reclaimer will be unable to swap EPC pages to > +shared memory. So you're basically putting the burden on the user/sysadmin to *actually* *know* what percentage is "problematic" and to know what to supply. I'd bet not very many people would know how much is problematic and it probably all depends. So why don't you come up with a sane default, instead, which works in most cases and set it automatically? Dunno, maybe some scaled percentage of memory depending on how many enclaves are run but all up to a sane limit of, say, 90% of total memory so that there are 10% left for normal system operation. This way you'll avoid "problematic" and still have some memory left for other use. Or something like that. Adding yet another knob is yuck and the easy way out. And we have waaaay too many knobs so we should always try to do the automatic thing, if at all possible. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit 2021-12-20 19:30 ` Borislav Petkov @ 2021-12-20 20:39 ` Kristen Carlson Accardi 2021-12-20 21:11 ` Borislav Petkov 0 siblings, 1 reply; 19+ messages in thread From: Kristen Carlson Accardi @ 2021-12-20 20:39 UTC (permalink / raw) To: Borislav Petkov Cc: linux-sgx, Jonathan Corbet, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner, Ingo Molnar, x86, H. Peter Anvin Hi Boris, thanks for taking look. On Mon, 2021-12-20 at 20:30 +0100, Borislav Petkov wrote: > On Mon, Dec 20, 2021 at 09:46:39AM -0800, Kristen Carlson Accardi > wrote: > > Similar to the core kswapd, ksgxd, is responsible for managing the > > overcommitment of enclave memory. If the system runs out of > > enclave memory, > > -*ksgxd* “swaps” enclave memory to normal memory. > > +*ksgxd* “swaps” enclave memory to normal RAM. This normal RAM is > > allocated > > +via per enclave shared memory. The shared memory area is not > > mapped into the > > +enclave or the task mapping it, which makes its memory use opaque > > - including > > +to the system out of memory killer (OOM). This can be problematic > > when there > > +are no limits in place on the amount an enclave can allocate. > > Problematic how? If a malicious or just extra large enclave is loaded, or even just a lot of enclaves, it can eat up all the normal RAM on the system. Normal methods of finding out where all the memory on the system is being used, wouldn't be able to find this usage since it is shared memory. In addition, the OOM killer wouldn't be able to kill any enclaves. > > The commit message above is talking about what your patch does and > that > is kinda clear from the diff. The *why* is really missing. Only that > allusion that it might be problematic in some cases but that's not > even > scratching the surface. > > > +At boot time, the module parameter "sgx.overcommit_percent" can be > > used to > > +place a limit on the number of shared memory backing pages that > > may be > > +allocated, expressed as a percentage of the total number of EPC > > pages in the > > +system. A value of 100 is the default, and represents a limit > > equal to the > > +number of EPC pages in the system. To disable the limit, set > > +sgx.overcommit_percent to -1. The number of backing pages > > available to > > +enclaves is a global resource. If the system exceeds the number of > > allowed > > +backing pages in use, the reclaimer will be unable to swap EPC > > pages to > > +shared memory. > > So you're basically putting the burden on the user/sysadmin to > *actually* *know* what percentage is "problematic" and to know what > to > supply. I'd bet not very many people would know how much is > problematic > and it probably all depends. > > So why don't you come up with a sane default, instead, which works in > most cases and set it automatically? The default value is set to 100%, and this percentage, plus the number of EPC pages in the system is used to calculate a sane value for the number of backing pages to add - in this case, exactly the number of EPC pages in the system. It is set automatically if the parameter is not used to override it. > > Dunno, maybe some scaled percentage of memory depending on how many > enclaves are run but all up to a sane limit of, say, 90% of total > memory > so that there are 10% left for normal system operation. I think in this case you are still making a judgement call for the admin about how much memory you think they ought to be able to use for non-SGX operations, and it feels more natural to me to have the default based on how many backing pages you might reasonably need. I suppose one could check the total system memory available and double check that the calculated number of backing pages you'd give out would never exceed some percentage of total system RAM, but then you wind up with 2 knobs to play with instead of just one, which seems wrong to me. > > This way you'll avoid "problematic" and still have some memory left > for > other use. > > Or something like that. > > Adding yet another knob is yuck and the easy way out. And we have > waaaay > too many knobs so we should always try to do the automatic thing, if > at > all possible. I completely agree - so I'm trying to make sure I understand this comment, as the value is currently set to default that would automatically apply that is based on EPC memory present and not a fixed value. So I'd like to understand what you'd like to see done differently. are you saying you'd like to eliminated the ability to override the automatic default? Or just that you'd rather calculate the percentage based on total normal system RAM rather than EPC memory? Thanks, Kristen ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit 2021-12-20 20:39 ` Kristen Carlson Accardi @ 2021-12-20 21:11 ` Borislav Petkov 2021-12-20 21:35 ` Kristen Carlson Accardi 0 siblings, 1 reply; 19+ messages in thread From: Borislav Petkov @ 2021-12-20 21:11 UTC (permalink / raw) To: Kristen Carlson Accardi Cc: linux-sgx, Jonathan Corbet, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner, Ingo Molnar, x86, H. Peter Anvin, lkml Bah, that thread is not on lkml. Please always Cc lkml in the future. On Mon, Dec 20, 2021 at 12:39:19PM -0800, Kristen Carlson Accardi wrote: > If a malicious or just extra large enclave is loaded, or even just a > lot of enclaves, it can eat up all the normal RAM on the system. Normal > methods of finding out where all the memory on the system is being > used, wouldn't be able to find this usage since it is shared memory. In > addition, the OOM killer wouldn't be able to kill any enclaves. So you need some sort of limiting against malicious enclaves anyways, regardless of this knob. IOW, you can set a percentage limit of per-enclave memory which cannot exceed a certain amount which won't prevent the system from its normal operation. For example. > I completely agree - so I'm trying to make sure I understand this > comment, as the value is currently set to default that would > automatically apply that is based on EPC memory present and not a fixed > value. So I'd like to understand what you'd like to see done > differently. are you saying you'd like to eliminated the ability to > override the automatic default? Or just that you'd rather calculate the > percentage based on total normal system RAM rather than EPC memory? So you say that there are cases where swapping to normal RAM can eat up all RAM. So the first heuristic should be: do not allow for *all* enclaves together to use up more than X percent of normal RAM during EPC reclaim. X percent being, say, 90% of all RAM. For example. I guess 10% should be enough for normal operation but someone who's more knowledgeable in system limits could chime in here. Then, define a per-enclave limit which says, an enclave can use Y % of memory for swapping when trying to reclaim EPC memory. And that can be something like: 90 % RAM -------- total amount of enclaves currently on the system And you can obviously create scenarios where creating too many enclaves can bring the system into a situation where it doesn't do any forward progress. But you probably can cause the same with overcommitting with VMs so perhaps it would be a good idea to look how overcommitting VMs and limits there are handled. Bottom line is: the logic should be for the most common cases to function properly, out-of-the-box, without knobs. And then to keep the system operational by preventing enclaves from bringing it down to a halt just by doing EPC reclaim. Does that make more sense? Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit 2021-12-20 21:11 ` Borislav Petkov @ 2021-12-20 21:35 ` Kristen Carlson Accardi 2021-12-20 22:48 ` Borislav Petkov 2021-12-22 14:21 ` Dave Hansen 0 siblings, 2 replies; 19+ messages in thread From: Kristen Carlson Accardi @ 2021-12-20 21:35 UTC (permalink / raw) To: Borislav Petkov Cc: linux-sgx, Jonathan Corbet, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner, Ingo Molnar, x86, H. Peter Anvin, lkml On Mon, 2021-12-20 at 22:11 +0100, Borislav Petkov wrote: > Bah, that thread is not on lkml. Please always Cc lkml in the future. > > On Mon, Dec 20, 2021 at 12:39:19PM -0800, Kristen Carlson Accardi > wrote: > > If a malicious or just extra large enclave is loaded, or even just > > a > > lot of enclaves, it can eat up all the normal RAM on the system. > > Normal > > methods of finding out where all the memory on the system is being > > used, wouldn't be able to find this usage since it is shared > > memory. In > > addition, the OOM killer wouldn't be able to kill any enclaves. > > So you need some sort of limiting against malicious enclaves anyways, > regardless of this knob. IOW, you can set a percentage limit of > per-enclave memory which cannot exceed a certain amount which won't > prevent the system from its normal operation. For example. > > > I completely agree - so I'm trying to make sure I understand this > > comment, as the value is currently set to default that would > > automatically apply that is based on EPC memory present and not a > > fixed > > value. So I'd like to understand what you'd like to see done > > differently. are you saying you'd like to eliminated the ability to > > override the automatic default? Or just that you'd rather calculate > > the > > percentage based on total normal system RAM rather than EPC memory? > > So you say that there are cases where swapping to normal RAM can eat > up all RAM. > > So the first heuristic should be: do not allow for *all* enclaves > together to use up more than X percent of normal RAM during EPC > reclaim. So, in your proposal, you would first change the calculated number of maximum available backing pages to be based on total system RAM rather than EPC memory, got it. > > X percent being, say, 90% of all RAM. For example. I guess 10% should > be enough for normal operation but someone who's more knowledgeable > in > system limits could chime in here. > > Then, define a per-enclave limit which says, an enclave can use Y % > of > memory for swapping when trying to reclaim EPC memory. And that can > be > something like: > > 90 % RAM > -------- > total amount of enclaves currently on the system > This would require recalculating the max number of allowed backing pages per enclave at run time whenever a new enclave is loaded - but all the existing enclaves may have already used more than the new max number of per-enclave allowable pages. How would you handle that scenario? This would add a lot of complexity for sure - and it does make me wonder whether any additional benefit of limiting per enclave would be worth it. > And you can obviously create scenarios where creating too many > enclaves > can bring the system into a situation where it doesn't do any forward > progress. > > But you probably can cause the same with overcommitting with VMs so > perhaps it would be a good idea to look how overcommitting VMs and > limits there are handled. > > Bottom line is: the logic should be for the most common cases to > function properly, out-of-the-box, without knobs. And then to keep > the > system operational by preventing enclaves from bringing it down to a > halt just by doing EPC reclaim. > > Does that make more sense? > Thanks for your more detailed explanation - I will take a look at the VM overcommit limits. Since obviously the original implementation did have a default value set, I had still a remaining specific question about your comments. Are you suggesting that there should not be a way to override any overcommit limit at all? So drop the parameter all together? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit 2021-12-20 21:35 ` Kristen Carlson Accardi @ 2021-12-20 22:48 ` Borislav Petkov 2021-12-21 15:53 ` Dave Hansen 2021-12-22 14:21 ` Dave Hansen 1 sibling, 1 reply; 19+ messages in thread From: Borislav Petkov @ 2021-12-20 22:48 UTC (permalink / raw) To: Kristen Carlson Accardi Cc: linux-sgx, Jonathan Corbet, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner, Ingo Molnar, x86, H. Peter Anvin, lkml On Mon, Dec 20, 2021 at 01:35:03PM -0800, Kristen Carlson Accardi wrote: > So, in your proposal you would first change the calculated number of > maximum available backing pages to be based on total system RAM rather > than EPC memory, got it. This was just an example. My point is to try to make it automatic and not introduce another knob. And to decide on the limits and behavior by using common sense and addressing the common use cases first. > This would require recalculating the max number of allowed backing > pages per enclave at run time whenever a new enclave is loaded - but > all the existing enclaves may have already used more than the new max > number of per-enclave allowable pages. How would you handle that > scenario? This would add a lot of complexity for sure - and it does > make me wonder whether any additional benefit of limiting per enclave > would be worth it. See above - this was just an example. And as you've shown, an example of what *not* to do. > Thanks for your more detailed explanation - I will take a look at the > VM overcommit limits. Since obviously the original implementation did > have a default value set, I had still a remaining specific question > about your comments. Are you suggesting that there should not be a way > to override any overcommit limit at all? So drop the parameter all > together? So let me ask you a counter-question: Imagine you're a sysadmin. Or a general, common system user if there ever is one. When your system starts thrashing and you're running a bunch of enclaves, how do you find out there even is a knob which might potentially help you? And after you find it, how would you use that knob? Or would you rather prefer that the system did the right thing for you instead of having to figure out what the sensible values for that knob would be? My point is: put yourself in the user's shoes and try to think about what would be the optimal thing the system should do. Once that is cleanly and properly defined, then we can deal with knobs and policies etc. I sincerely hope that makes more sense. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit 2021-12-20 22:48 ` Borislav Petkov @ 2021-12-21 15:53 ` Dave Hansen 0 siblings, 0 replies; 19+ messages in thread From: Dave Hansen @ 2021-12-21 15:53 UTC (permalink / raw) To: Borislav Petkov, Kristen Carlson Accardi Cc: linux-sgx, Jonathan Corbet, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner, Ingo Molnar, x86, H. Peter Anvin, lkml On 12/20/21 2:48 PM, Borislav Petkov wrote: > On Mon, Dec 20, 2021 at 01:35:03PM -0800, Kristen Carlson Accardi wrote: >> So, in your proposal you would first change the calculated number of >> maximum available backing pages to be based on total system RAM rather >> than EPC memory, got it. > > This was just an example. My point is to try to make it automatic and > not introduce another knob. And to decide on the limits and behavior > by using common sense and addressing the common use cases first. The common case is clearly a few enclaves on systems where the overcommit levels are modest. The "100%" limit will work great there. I'd personally be fine with just enforcing that limit as the default and ignoring everything else. It makes me a bit nervous, though, because suddenly enforcing a limit is an ABI break. The tunable effectively gives us a way out if we screw up either the limit's quantity or someone needs the old ABI back. That said, we don't *need* it to be tunable, boot parameter or not. If you're concerned about the tunable, I think we should drop it and not look back. > Imagine you're a sysadmin. Or a general, common system user if there > ever is one. > > When your system starts thrashing and you're running a bunch of > enclaves, how do you find out there even is a knob which might > potentially help you? I'm selfish. The tunable isn't for end users. It's for me. The scenario I envisioned is that a user upgrades to a new kernel and their enclaves start crashing. They'll eventually find us, the maintainers of the SGX code, and we'll have a tool as kernel developers to help them. The tunable helps _me_ in two ways: 1. It help me easily get user back to pre-5.17 (or whatever) behavior 2. If we got the "100%" value wrong, end users can help us experiment to help find a better value. BTW, all the chat about "malicious" enclaves and so forth... I *totally* agree that this is a problem and one worth solving. It just can't be solved today. We need real cgroup support. It's coming soon. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit 2021-12-20 21:35 ` Kristen Carlson Accardi 2021-12-20 22:48 ` Borislav Petkov @ 2021-12-22 14:21 ` Dave Hansen 1 sibling, 0 replies; 19+ messages in thread From: Dave Hansen @ 2021-12-22 14:21 UTC (permalink / raw) To: Kristen Carlson Accardi, Borislav Petkov Cc: linux-sgx, Jonathan Corbet, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner, Ingo Molnar, x86, H. Peter Anvin, lkml On 12/20/21 1:35 PM, Kristen Carlson Accardi wrote: > Thanks for your more detailed explanation - I will take a look at the > VM overcommit limits. Since obviously the original implementation did > have a default value set, I had still a remaining specific question > about your comments. Are you suggesting that there should not be a way > to override any overcommit limit at all? So drop the parameter all > together? Yes, let's kill the exposed tunable. We don't have any strong, practical need for it at the moment other than paranoia. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit 2021-12-20 17:46 ` [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit Kristen Carlson Accardi 2021-12-20 19:30 ` Borislav Petkov @ 2021-12-28 23:04 ` Jarkko Sakkinen 2021-12-28 23:34 ` Dave Hansen 2022-01-06 18:26 ` Kristen Carlson Accardi 1 sibling, 2 replies; 19+ messages in thread From: Jarkko Sakkinen @ 2021-12-28 23:04 UTC (permalink / raw) To: Kristen Carlson Accardi Cc: linux-sgx, Jonathan Corbet, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin On Mon, Dec 20, 2021 at 09:46:39AM -0800, Kristen Carlson Accardi wrote: > When the system runs out of enclave memory, SGX can reclaim EPC pages > by swapping to normal RAM. This normal RAM is allocated via a > per-enclave shared memory area. The shared memory area is not mapped > into the enclave or the task mapping it, which makes its memory use > opaque (including to the OOM killer). Having lots of hard to find > memory around is problematic, especially when there is no limit. > > Introduce a module parameter and a global counter that can be used to limit > the number of pages that enclaves are able to consume for backing storage. > This parameter is a percentage value that is used in conjunction with the > number of EPC pages in the system to set a cap on the amount of backing > RAM that can be consumed. > > The default for this value is 100, which limits the total number of > shared memory pages that may be consumed by all enclaves as backing pages > to the number of EPC pages on the system. > > For example, on an SGX system that has 128MB of EPC, this default would > cap the amount of normal RAM that SGX consumes for its shared memory > areas at 128MB. > > If sgx.overcommit_percent is set to a negative value (such as -1), > SGX will not place any limits on the amount of overcommit that might > be requested, and SGX will behave as it has previously without the > overcommit_percent limit. > > SGX may not be built as a module, but the module parameter interface > is used in order to provide a convenient interface. > > The SGX overcommit_percent works differently than the core VM overcommit > limit. Enclaves request backing pages one page at a time, and the number > of in use backing pages that are allowed is a global resource that is > limited for all enclaves. > > Introduce a pair of functions which can be used by callers when requesting > backing RAM pages. These functions are responsible for accounting the > page charges. A request may return an error if the request will cause the > counter to exceed the backing page cap. > > Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> > --- > .../admin-guide/kernel-parameters.txt | 7 ++ > Documentation/x86/sgx.rst | 16 ++++- > arch/x86/kernel/cpu/sgx/Makefile | 6 +- > arch/x86/kernel/cpu/sgx/main.c | 64 +++++++++++++++++++ > arch/x86/kernel/cpu/sgx/sgx.h | 2 + > 5 files changed, 93 insertions(+), 2 deletions(-) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > index 9725c546a0d4..9d23c05a833b 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -5165,6 +5165,13 @@ > > serialnumber [BUGS=X86-32] > > + sgx.overcommit_percent= [X86-64,SGX] > + Limits the amount of normal RAM used for backing > + storage that may be allocate, expressed as a > + percentage of the total number of EPC pages in the > + system. > + See Documentation/x86/sgx.rst for more information. > + > shapers= [NET] > Maximal number of shapers. > > diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst > index 265568a9292c..4f9a1c68be94 100644 > --- a/Documentation/x86/sgx.rst > +++ b/Documentation/x86/sgx.rst > @@ -147,7 +147,21 @@ Page reclaimer > > Similar to the core kswapd, ksgxd, is responsible for managing the > overcommitment of enclave memory. If the system runs out of enclave memory, > -*ksgxd* “swaps” enclave memory to normal memory. > +*ksgxd* “swaps” enclave memory to normal RAM. This normal RAM is allocated > +via per enclave shared memory. The shared memory area is not mapped into the > +enclave or the task mapping it, which makes its memory use opaque - including > +to the system out of memory killer (OOM). This can be problematic when there > +are no limits in place on the amount an enclave can allocate. > + > +At boot time, the module parameter "sgx.overcommit_percent" can be used to > +place a limit on the number of shared memory backing pages that may be > +allocated, expressed as a percentage of the total number of EPC pages in the > +system. A value of 100 is the default, and represents a limit equal to the > +number of EPC pages in the system. To disable the limit, set > +sgx.overcommit_percent to -1. The number of backing pages available to > +enclaves is a global resource. If the system exceeds the number of allowed > +backing pages in use, the reclaimer will be unable to swap EPC pages to > +shared memory. > > Launch Control > ============== > diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile > index 9c1656779b2a..72f9192a43fe 100644 > --- a/arch/x86/kernel/cpu/sgx/Makefile > +++ b/arch/x86/kernel/cpu/sgx/Makefile > @@ -1,6 +1,10 @@ > -obj-y += \ > +# This allows sgx to have module namespace > +obj-y += sgx.o > + > +sgx-y += \ > driver.o \ > encl.o \ > ioctl.o \ > main.o > + > obj-$(CONFIG_X86_SGX_KVM) += virt.o > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c > index 2857a49f2335..c58ce9d9fd56 100644 > --- a/arch/x86/kernel/cpu/sgx/main.c > +++ b/arch/x86/kernel/cpu/sgx/main.c > @@ -1,6 +1,7 @@ > // SPDX-License-Identifier: GPL-2.0 > /* Copyright(c) 2016-20 Intel Corporation. */ > > +#include <linux/moduleparam.h> > #include <linux/file.h> > #include <linux/freezer.h> > #include <linux/highmem.h> > @@ -43,6 +44,54 @@ static struct sgx_numa_node *sgx_numa_nodes; > > static LIST_HEAD(sgx_dirty_page_list); > > +/* > + * Limits the amount of normal RAM that SGX can consume for EPC > + * overcommit to the total EPC pages * sgx_overcommit_percent / 100 > + */ > +static int sgx_overcommit_percent = 100; > +module_param_named(overcommit_percent, sgx_overcommit_percent, int, 0440); > +MODULE_PARM_DESC(overcommit_percent, "Percentage of overcommit of EPC pages."); > + > +/* The number of pages that can be allocated globally for backing storage. */ > +static atomic_long_t sgx_nr_available_backing_pages; > +static bool sgx_disable_overcommit_tracking; I don't like the use of word tracking as we already have ETRACK. I'd also shorten the first global as "sgx_nr_backing_pages". Couldn't you set "sgx_nr_backing_pages" to -1 when capping is disabled, and then you would not need that bool in the first place? > + > +/** > + * sgx_charge_mem() - charge for a page used for backing storage > + * Please remove this empty line: https://www.kernel.org/doc/Documentation/kernel-doc-nano-HOWTO.txt > + * Backing storage usage is capped by the sgx_nr_available_backing_pages. > + * If the backing storage usage is over the overcommit limit, Where does this verb "charge" come from? /Jarkko ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit 2021-12-28 23:04 ` Jarkko Sakkinen @ 2021-12-28 23:34 ` Dave Hansen 2022-01-06 18:26 ` Kristen Carlson Accardi 1 sibling, 0 replies; 19+ messages in thread From: Dave Hansen @ 2021-12-28 23:34 UTC (permalink / raw) To: Jarkko Sakkinen, Kristen Carlson Accardi Cc: linux-sgx, Jonathan Corbet, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin >> +/* >> + * Limits the amount of normal RAM that SGX can consume for EPC >> + * overcommit to the total EPC pages * sgx_overcommit_percent / 100 >> + */ >> +static int sgx_overcommit_percent = 100; >> +module_param_named(overcommit_percent, sgx_overcommit_percent, int, 0440); >> +MODULE_PARM_DESC(overcommit_percent, "Percentage of overcommit of EPC pages."); >> + >> +/* The number of pages that can be allocated globally for backing storage. */ >> +static atomic_long_t sgx_nr_available_backing_pages; >> +static bool sgx_disable_overcommit_tracking; > > I don't like the use of word tracking as we already have ETRACK. I don't think anyone is going to confuse "overcommit tracking" with ETRACK. That said, this *could* be "sgx_disable_overcommit_limits", I guess. > I'd also shorten the first global as "sgx_nr_backing_pages". That means something different from the variable, though. "sgx_nr_backing_pages" would be the name for the current number of backing pages which currently exist. > Couldn't you set "sgx_nr_backing_pages" to -1 when capping is disabled, and > then you would not need that bool in the first place? > >> + >> +/** >> + * sgx_charge_mem() - charge for a page used for backing storage >> + * > > Please remove this empty line: > > https://www.kernel.org/doc/Documentation/kernel-doc-nano-HOWTO.txt That *might* make sense when there are arguments. The arguments at least help visually separate the short function description from the full text description. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit 2021-12-28 23:04 ` Jarkko Sakkinen 2021-12-28 23:34 ` Dave Hansen @ 2022-01-06 18:26 ` Kristen Carlson Accardi 2022-01-07 12:25 ` Jarkko Sakkinen 1 sibling, 1 reply; 19+ messages in thread From: Kristen Carlson Accardi @ 2022-01-06 18:26 UTC (permalink / raw) To: Jarkko Sakkinen Cc: linux-sgx, Jonathan Corbet, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin Hi Jarkko, thanks for your review, On Wed, 2021-12-29 at 01:04 +0200, Jarkko Sakkinen wrote: > On Mon, Dec 20, 2021 at 09:46:39AM -0800, Kristen Carlson Accardi > wrote: > > > > + > > +/** > > + * sgx_charge_mem() - charge for a page used for backing storage > > + * > > Please remove this empty line: > > https://www.kernel.org/doc/Documentation/kernel-doc-nano-HOWTO.txt I read this to be that there should be an empty line after the short description/arg list but before the longer description. I think for functions without args this is the proper layout. It also is more readable. > > > + * Backing storage usage is capped by the > > sgx_nr_available_backing_pages. > > + * If the backing storage usage is over the overcommit limit, > > Where does this verb "charge" come from? Charge in this context means that some available backing pages are now not available because they are in use. It feels appropriate to me to use "charge/uncharge" verbs for this action, unless you think it's confusing somehow. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit 2022-01-06 18:26 ` Kristen Carlson Accardi @ 2022-01-07 12:25 ` Jarkko Sakkinen 2022-01-07 17:17 ` Kristen Carlson Accardi 0 siblings, 1 reply; 19+ messages in thread From: Jarkko Sakkinen @ 2022-01-07 12:25 UTC (permalink / raw) To: Kristen Carlson Accardi Cc: linux-sgx, Jonathan Corbet, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin On Thu, Jan 06, 2022 at 10:26:18AM -0800, Kristen Carlson Accardi wrote: > Hi Jarkko, thanks for your review, > > On Wed, 2021-12-29 at 01:04 +0200, Jarkko Sakkinen wrote: > > On Mon, Dec 20, 2021 at 09:46:39AM -0800, Kristen Carlson Accardi > > wrote: > > > > > > + > > > +/** > > > + * sgx_charge_mem() - charge for a page used for backing storage > > > + * > > > > Please remove this empty line: > > > > https://www.kernel.org/doc/Documentation/kernel-doc-nano-HOWTO.txt > > I read this to be that there should be an empty line after the short > description/arg list but before the longer description. I think for > functions without args this is the proper layout. It also is more > readable. > > > > > > + * Backing storage usage is capped by the > > > sgx_nr_available_backing_pages. > > > + * If the backing storage usage is over the overcommit limit, > > > > Where does this verb "charge" come from? > > Charge in this context means that some available backing pages are now > not available because they are in use. It feels appropriate to me to > use "charge/uncharge" verbs for this action, unless you think it's > confusing somehow. OK, it's cool. I'm still wondering why you need extra variable given that sgx_nr_backing_available_pages is signed. You could mark the feature being disabled by setting it to -1. It might cosmetically improve readability to have a boolean but for practical uses (e.g. eBPF tracing scripts) it only adds extra complexity. /Jarkko ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit 2022-01-07 12:25 ` Jarkko Sakkinen @ 2022-01-07 17:17 ` Kristen Carlson Accardi 2022-01-08 15:54 ` Jarkko Sakkinen 0 siblings, 1 reply; 19+ messages in thread From: Kristen Carlson Accardi @ 2022-01-07 17:17 UTC (permalink / raw) To: Jarkko Sakkinen Cc: linux-sgx, Jonathan Corbet, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin On Fri, 2022-01-07 at 14:25 +0200, Jarkko Sakkinen wrote: > On Thu, Jan 06, 2022 at 10:26:18AM -0800, Kristen Carlson Accardi > wrote: > > Hi Jarkko, thanks for your review, > > > > On Wed, 2021-12-29 at 01:04 +0200, Jarkko Sakkinen wrote: > > > On Mon, Dec 20, 2021 at 09:46:39AM -0800, Kristen Carlson Accardi > > > wrote: > > > > + > > > > +/** > > > > + * sgx_charge_mem() - charge for a page used for backing > > > > storage > > > > + * > > > > > > Please remove this empty line: > > > > > > https://www.kernel.org/doc/Documentation/kernel-doc-nano-HOWTO.txt > > > > I read this to be that there should be an empty line after the > > short > > description/arg list but before the longer description. I think for > > functions without args this is the proper layout. It also is more > > readable. > > > > > > + * Backing storage usage is capped by the > > > > sgx_nr_available_backing_pages. > > > > + * If the backing storage usage is over the overcommit limit, > > > > > > Where does this verb "charge" come from? > > > > Charge in this context means that some available backing pages are > > now > > not available because they are in use. It feels appropriate to me > > to > > use "charge/uncharge" verbs for this action, unless you think it's > > confusing somehow. > > OK, it's cool. > > I'm still wondering why you need extra variable given that > sgx_nr_backing_available_pages is signed. You could mark the > feature being disabled by setting it to -1. > > It might cosmetically improve readability to have a boolean but > for practical uses (e.g. eBPF tracing scripts) it only adds extra > complexity. > > /Jarkko I can see your point. Since Boris objected to the module param, the next version will not have a way to disable limits at all, so I am deleting that boolean all together. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit 2022-01-07 17:17 ` Kristen Carlson Accardi @ 2022-01-08 15:54 ` Jarkko Sakkinen 0 siblings, 0 replies; 19+ messages in thread From: Jarkko Sakkinen @ 2022-01-08 15:54 UTC (permalink / raw) To: Kristen Carlson Accardi Cc: linux-sgx, Jonathan Corbet, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin On Fri, Jan 07, 2022 at 09:17:03AM -0800, Kristen Carlson Accardi wrote: > On Fri, 2022-01-07 at 14:25 +0200, Jarkko Sakkinen wrote: > > On Thu, Jan 06, 2022 at 10:26:18AM -0800, Kristen Carlson Accardi > > wrote: > > > Hi Jarkko, thanks for your review, > > > > > > On Wed, 2021-12-29 at 01:04 +0200, Jarkko Sakkinen wrote: > > > > On Mon, Dec 20, 2021 at 09:46:39AM -0800, Kristen Carlson Accardi > > > > wrote: > > > > > + > > > > > +/** > > > > > + * sgx_charge_mem() - charge for a page used for backing > > > > > storage > > > > > + * > > > > > > > > Please remove this empty line: > > > > > > > > https://www.kernel.org/doc/Documentation/kernel-doc-nano-HOWTO.txt > > > > > > I read this to be that there should be an empty line after the > > > short > > > description/arg list but before the longer description. I think for > > > functions without args this is the proper layout. It also is more > > > readable. > > > > > > > > + * Backing storage usage is capped by the > > > > > sgx_nr_available_backing_pages. > > > > > + * If the backing storage usage is over the overcommit limit, > > > > > > > > Where does this verb "charge" come from? > > > > > > Charge in this context means that some available backing pages are > > > now > > > not available because they are in use. It feels appropriate to me > > > to > > > use "charge/uncharge" verbs for this action, unless you think it's > > > confusing somehow. > > > > OK, it's cool. > > > > I'm still wondering why you need extra variable given that > > sgx_nr_backing_available_pages is signed. You could mark the > > feature being disabled by setting it to -1. > > > > It might cosmetically improve readability to have a boolean but > > for practical uses (e.g. eBPF tracing scripts) it only adds extra > > complexity. > > > > /Jarkko > > I can see your point. Since Boris objected to the module param, the > next version will not have a way to disable limits at all, so I am > deleting that boolean all together. Ok, cool, I'm looking forward for the next version. /Jarkko ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 2/2] x86/sgx: account backing pages 2021-12-20 17:46 [PATCH 0/2] x86/sgx: Limit EPC overcommit Kristen Carlson Accardi 2021-12-20 17:46 ` [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit Kristen Carlson Accardi @ 2021-12-20 17:46 ` Kristen Carlson Accardi 2021-12-28 23:37 ` Jarkko Sakkinen 1 sibling, 1 reply; 19+ messages in thread From: Kristen Carlson Accardi @ 2021-12-20 17:46 UTC (permalink / raw) To: linux-sgx, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin SGX may allow EPC pages to be overcommitted. If the system is out of enclave memory, EPC pages are swapped to normal RAM via a per enclave shared memory area. This shared memory is not charged to the enclave or the task mapping it, making it hard to account for using normal methods. In order to avoid unlimited usage of normal RAM, enclaves must be charged for each new page used for backing storage, and uncharged when they are no longer using a backing page. Modify the existing flow for requesting backing pages to reduce the available backing page counter and confirm that the limit has not been exceeded. Backing page usage for loading EPC pages back out of the shared memory do not incur a charge. When a backing page is released from usage, increment the available backing page counter. When swapping EPC pages to RAM, in addition to storing the page contents, SGX must store some additional metadata to protect against a malicious kernel when the page is swapped back in. This additional metadata is called Paging Crypto MetaData. PCMD is allocated from the same shared memory area as the backing page contents and consumes RAM the same way. PCMD is 128 bytes in size, and there is one PCMD structure per page written to shared RAM. The page index for the PCMD page is calculated from the page index of the backing page, so it is possible that the PCMD structures are not packed into the minimum number of pages possible. If 32 PCMDs can fit onto a single page, then PCMD usage is 1/32 of total EPC pages. In the worst case, PCMD can consume the same amount of RAM as EPC backing pages (1:1). For simplicity, this implementation does not account for PCMD page usage. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> --- arch/x86/kernel/cpu/sgx/encl.c | 76 ++++++++++++++++++++++++++++++++-- arch/x86/kernel/cpu/sgx/encl.h | 6 ++- arch/x86/kernel/cpu/sgx/main.c | 6 +-- 3 files changed, 80 insertions(+), 8 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 001808e3901c..8be6f0592bdc 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -32,7 +32,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page, else page_index = PFN_DOWN(encl->size); - ret = sgx_encl_get_backing(encl, page_index, &b); + ret = sgx_encl_lookup_backing(encl, page_index, &b); if (ret) return ret; @@ -407,6 +407,12 @@ void sgx_encl_release(struct kref *ref) sgx_encl_free_epc_page(entry->epc_page); encl->secs_child_cnt--; entry->epc_page = NULL; + } else { + /* + * If there is no epc_page, it means it has been + * swapped out. Uncharge the backing storage. + */ + sgx_uncharge_mem(); } kfree(entry); @@ -574,8 +580,8 @@ static struct page *sgx_encl_get_backing_page(struct sgx_encl *encl, * 0 on success, * -errno otherwise. */ -int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index, - struct sgx_backing *backing) +static int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index, + struct sgx_backing *backing) { pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >> 5); struct page *contents; @@ -601,6 +607,62 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index, return 0; } +/** + * sgx_encl_alloc_backing() - allocate a new backing storage page + * @encl: an enclave pointer + * @page_index: enclave page index + * @backing: data for accessing backing storage for the page + * + * Confirm that the global overcommit limit has not been reached before + * requesting a new backing storage page for storing the encrypted contents + * and Paging Crypto MetaData (PCMD) of an enclave page. This is called when + * there is no existing backing page, just before writing to the backing + * storage with EWB. + * + * Return: + * 0 on success, + * -errno otherwise. + */ +int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index, + struct sgx_backing *backing) +{ + int ret; + + if (sgx_charge_mem()) + return -ENOMEM; + + ret = sgx_encl_get_backing(encl, page_index, backing); + if (ret) + sgx_uncharge_mem(); + + return ret; +} + +/** + * sgx_encl_lookup_backing() - retrieve an existing backing storage page + * @encl: an enclave pointer + * @page_index: enclave page index + * @backing: data for accessing backing storage for the page + * + * Retrieve a backing page for loading data back into an EPC page with ELDU. + * This call does not cause a charge to the overcommit limit because a page + * has already been allocated, but has been swapped out or is in RAM + * + * It is the caller's responsibility to ensure that it is appropriate to + * use sgx_encl_lookup_backing() rather than sgx_encl_alloc_backing(). If + * lookup is not used correctly, this will cause an allocation that is + * not accounted for. + * + * Return: + * 0 on success, + * -errno otherwise. + */ +int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index, + struct sgx_backing *backing) +{ + return sgx_encl_get_backing(encl, page_index, backing); +} + /** * sgx_encl_put_backing() - Unpin the backing storage * @backing: data for accessing backing storage for the page @@ -608,9 +670,17 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index, */ void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write) { + /* + * If the page is being written to by the reclaimer, it is + * still in use and the backing page usage should not be + * uncharged. However, if the page is not being written to, + * it is no longer in use and may be uncharged. + */ if (do_write) { set_page_dirty(backing->pcmd); set_page_dirty(backing->contents); + } else { + sgx_uncharge_mem(); } put_page(backing->pcmd); diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h index fec43ca65065..8ffb8a83263f 100644 --- a/arch/x86/kernel/cpu/sgx/encl.h +++ b/arch/x86/kernel/cpu/sgx/encl.h @@ -105,8 +105,10 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start, void sgx_encl_release(struct kref *ref); int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm); -int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index, - struct sgx_backing *backing); +int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index, + struct sgx_backing *backing); +int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index, + struct sgx_backing *backing); void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write); int sgx_encl_test_and_clear_young(struct mm_struct *mm, struct sgx_encl_page *page); diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index c58ce9d9fd56..0ef9b7398b35 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -357,8 +357,8 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, encl->secs_child_cnt--; if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) { - ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size), - &secs_backing); + ret = sgx_encl_alloc_backing(encl, PFN_DOWN(encl->size), + &secs_backing); if (ret) goto out; @@ -428,7 +428,7 @@ static void sgx_reclaim_pages(void) goto skip; page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base); - ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]); + ret = sgx_encl_alloc_backing(encl_page->encl, page_index, &backing[i]); if (ret) goto skip; -- 2.20.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] x86/sgx: account backing pages 2021-12-20 17:46 ` [PATCH 2/2] x86/sgx: account backing pages Kristen Carlson Accardi @ 2021-12-28 23:37 ` Jarkko Sakkinen 2022-01-05 0:36 ` Dave Hansen 0 siblings, 1 reply; 19+ messages in thread From: Jarkko Sakkinen @ 2021-12-28 23:37 UTC (permalink / raw) To: Kristen Carlson Accardi Cc: linux-sgx, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin On Mon, Dec 20, 2021 at 09:46:40AM -0800, Kristen Carlson Accardi wrote: > +int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index, > + struct sgx_backing *backing) > +{ > + return sgx_encl_get_backing(encl, page_index, backing); > +} Is this wrapping necessary? Also, there is ambiguous terminology: 1. Local function: "get_backing" 2. Exported function: "lookup_backing" /Jarkko ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] x86/sgx: account backing pages 2021-12-28 23:37 ` Jarkko Sakkinen @ 2022-01-05 0:36 ` Dave Hansen 2022-01-08 14:24 ` Jarkko Sakkinen 0 siblings, 1 reply; 19+ messages in thread From: Dave Hansen @ 2022-01-05 0:36 UTC (permalink / raw) To: Jarkko Sakkinen, Kristen Carlson Accardi Cc: linux-sgx, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin On 12/28/21 3:37 PM, Jarkko Sakkinen wrote: > On Mon, Dec 20, 2021 at 09:46:40AM -0800, Kristen Carlson Accardi wrote: >> +int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index, >> + struct sgx_backing *backing) >> +{ >> + return sgx_encl_get_backing(encl, page_index, backing); >> +} > Is this wrapping necessary? Yes, I think so. > Also, there is ambiguous terminology: > > 1. Local function: "get_backing" > 2. Exported function: "lookup_backing" I'm not sure what you're getting at. There are three important things that you do with backing storage: 1. Allocate it 2. Find it 3. De-allocate (free) it Right now, the code has a pattern where it does: get_backing(); // do something put_backing(); That sure as heck looks like it is allocating and freeing it. But, it's actually *maybe* doing an allocation. The "find it" path also looks *EXACTLY* the same as the actual allocation path. You might also recall that the original code didn't even *have* a (real) free path. The "wrapping" is really just naming the two different operations that use the "get" function: lookup and allocate. It's not just wrapping, it's clarify the logical behavior. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] x86/sgx: account backing pages 2022-01-05 0:36 ` Dave Hansen @ 2022-01-08 14:24 ` Jarkko Sakkinen 0 siblings, 0 replies; 19+ messages in thread From: Jarkko Sakkinen @ 2022-01-08 14:24 UTC (permalink / raw) To: Dave Hansen Cc: Kristen Carlson Accardi, linux-sgx, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin On Tue, Jan 04, 2022 at 04:36:28PM -0800, Dave Hansen wrote: > On 12/28/21 3:37 PM, Jarkko Sakkinen wrote: > > On Mon, Dec 20, 2021 at 09:46:40AM -0800, Kristen Carlson Accardi wrote: > >> +int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index, > >> + struct sgx_backing *backing) > >> +{ > >> + return sgx_encl_get_backing(encl, page_index, backing); > >> +} > > Is this wrapping necessary? > > Yes, I think so. > > > Also, there is ambiguous terminology: > > > > 1. Local function: "get_backing" > > 2. Exported function: "lookup_backing" > > I'm not sure what you're getting at. > > There are three important things that you do with backing storage: > > 1. Allocate it > 2. Find it > 3. De-allocate (free) it > > Right now, the code has a pattern where it does: > > get_backing(); > // do something > put_backing(); > > That sure as heck looks like it is allocating and freeing it. But, it's > actually *maybe* doing an allocation. The "find it" path also looks > *EXACTLY* the same as the actual allocation path. You might also recall > that the original code didn't even *have* a (real) free path. > > The "wrapping" is really just naming the two different operations that > use the "get" function: lookup and allocate. It's not just wrapping, > it's clarify the logical behavior. Why it makes sense to keep sgx_encl_get_backing(), if it has zero call sites and not open-code its implementation to sgx_encl_lookup_backing(). I'm also wondering, why here the function is not named as sgx_encl_charge_backing(), i.e. follow the naming convention? It would be easier to remember the flow, when reading the code. Since we use "not as common name", let's take advantage of it to make maintaining the code easier later on. The commit message says: "Modify the existing flow for requesting backing pages to reduce the available backing page counter and confirm that the limit has not been exceeded. Backing page usage for loading EPC pages back out of the shared memory do not incur a charge." I would add, in order to make this less abstract: " In other words, replace call sites of sgx_encl_get_backing() with either: * sgx_encl_lookup_backing() for ELDU, which does not cause sgx_charge_mem() to be invoked. * sgx_encl_alloc_backing() for EWB, which does cause sgx_charge_mem() to be invoked. " It's currently way too abstract description of the code change. /Jarkko ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2022-01-08 15:55 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-12-20 17:46 [PATCH 0/2] x86/sgx: Limit EPC overcommit Kristen Carlson Accardi 2021-12-20 17:46 ` [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit Kristen Carlson Accardi 2021-12-20 19:30 ` Borislav Petkov 2021-12-20 20:39 ` Kristen Carlson Accardi 2021-12-20 21:11 ` Borislav Petkov 2021-12-20 21:35 ` Kristen Carlson Accardi 2021-12-20 22:48 ` Borislav Petkov 2021-12-21 15:53 ` Dave Hansen 2021-12-22 14:21 ` Dave Hansen 2021-12-28 23:04 ` Jarkko Sakkinen 2021-12-28 23:34 ` Dave Hansen 2022-01-06 18:26 ` Kristen Carlson Accardi 2022-01-07 12:25 ` Jarkko Sakkinen 2022-01-07 17:17 ` Kristen Carlson Accardi 2022-01-08 15:54 ` Jarkko Sakkinen 2021-12-20 17:46 ` [PATCH 2/2] x86/sgx: account backing pages Kristen Carlson Accardi 2021-12-28 23:37 ` Jarkko Sakkinen 2022-01-05 0:36 ` Dave Hansen 2022-01-08 14:24 ` Jarkko Sakkinen
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.