All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] x86/sgx: Limit EPC overcommit
@ 2021-12-20 17:46 Kristen Carlson Accardi
  2021-12-20 17:46 ` [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit Kristen Carlson Accardi
  2021-12-20 17:46 ` [PATCH 2/2] x86/sgx: account backing pages Kristen Carlson Accardi
  0 siblings, 2 replies; 19+ messages in thread
From: Kristen Carlson Accardi @ 2021-12-20 17:46 UTC (permalink / raw)
  To: linux-sgx

SGX currently allows EPC pages to be overcommitted. If the system is
out of enclave memory, EPC pages are swapped to normal RAM via
a per enclave shared memory area. This shared memory is not charged
to the enclave or the task mapping it, making it hard to account
for using normal methods. Since SGX will allow EPC pages to be
overcommitted without limits, enclaves can consume system memory
for these backing pages without limits.

In order to prevent this, set a cap on the amount of overcommit SGX
allows based on a module param which can be set at boot time. Then,
whenever a backing page is requested by an enclave, keep track of
the total amount of shared memory pages used across all enclaves and
return an error if the overcommit limit has been reached. This will
restrict the total amount of backing pages that all enclaves can
consume to a maximum amount, and prevent enclaves from consuming
all the system RAM for backing pages.

The overcommit percentage has a default value of 100, which
limits shared memory page consumption to equal to the number of
EPC pages in the system. If sgx.overcommit_percent is set to a
negative value, SGX will not place any limits on the amount of
overcommit that might be requested, and SGX will behave as it has
previously without the sgx.overcommit_percent limit.


Kristen Carlson Accardi (2):
  x86/sgx: Add accounting for tracking overcommit
  x86/sgx: account backing pages

 .../admin-guide/kernel-parameters.txt         |  7 ++
 Documentation/x86/sgx.rst                     | 16 +++-
 arch/x86/kernel/cpu/sgx/Makefile              |  6 +-
 arch/x86/kernel/cpu/sgx/encl.c                | 76 ++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/encl.h                |  6 +-
 arch/x86/kernel/cpu/sgx/main.c                | 70 ++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |  2 +
 7 files changed, 173 insertions(+), 10 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit
  2021-12-20 17:46 [PATCH 0/2] x86/sgx: Limit EPC overcommit Kristen Carlson Accardi
@ 2021-12-20 17:46 ` Kristen Carlson Accardi
  2021-12-20 19:30   ` Borislav Petkov
  2021-12-28 23:04   ` Jarkko Sakkinen
  2021-12-20 17:46 ` [PATCH 2/2] x86/sgx: account backing pages Kristen Carlson Accardi
  1 sibling, 2 replies; 19+ messages in thread
From: Kristen Carlson Accardi @ 2021-12-20 17:46 UTC (permalink / raw)
  To: linux-sgx, Jonathan Corbet, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin

When the system runs out of enclave memory, SGX can reclaim EPC pages
by swapping to normal RAM. This normal RAM is allocated via a
per-enclave shared memory area. The shared memory area is not mapped
into the enclave or the task mapping it, which makes its memory use
opaque (including to the OOM killer). Having lots of hard to find
memory around is problematic, especially when there is no limit.

Introduce a module parameter and a global counter that can be used to limit
the number of pages that enclaves are able to consume for backing storage.
This parameter is a percentage value that is used in conjunction with the
number of EPC pages in the system to set a cap on the amount of backing
RAM that can be consumed.

The default for this value is 100, which limits the total number of
shared memory pages that may be consumed by all enclaves as backing pages
to the number of EPC pages on the system.

For example, on an SGX system that has 128MB of EPC, this default would
cap the amount of normal RAM that SGX consumes for its shared memory
areas at 128MB.

If sgx.overcommit_percent is set to a negative value (such as -1),
SGX will not place any limits on the amount of overcommit that might
be requested, and SGX will behave as it has previously without the
overcommit_percent limit.

SGX may not be built as a module, but the module parameter interface
is used in order to provide a convenient interface.

The SGX overcommit_percent works differently than the core VM overcommit
limit. Enclaves request backing pages one page at a time, and the number
of in use backing pages that are allowed is a global resource that is
limited for all enclaves.

Introduce a pair of functions which can be used by callers when requesting
backing RAM pages. These functions are responsible for accounting the
page charges. A request may return an error if the request will cause the
counter to exceed the backing page cap.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
 .../admin-guide/kernel-parameters.txt         |  7 ++
 Documentation/x86/sgx.rst                     | 16 ++++-
 arch/x86/kernel/cpu/sgx/Makefile              |  6 +-
 arch/x86/kernel/cpu/sgx/main.c                | 64 +++++++++++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h                 |  2 +
 5 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9725c546a0d4..9d23c05a833b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5165,6 +5165,13 @@
 
 	serialnumber	[BUGS=X86-32]
 
+	sgx.overcommit_percent=	[X86-64,SGX]
+			Limits the amount of normal RAM used for backing
+			storage that may be allocate, expressed as a
+			percentage of the total number of EPC pages in the
+			system.
+			See Documentation/x86/sgx.rst for more information.
+
 	shapers=	[NET]
 			Maximal number of shapers.
 
diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index 265568a9292c..4f9a1c68be94 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -147,7 +147,21 @@ Page reclaimer
 
 Similar to the core kswapd, ksgxd, is responsible for managing the
 overcommitment of enclave memory.  If the system runs out of enclave memory,
-*ksgxd* “swaps” enclave memory to normal memory.
+*ksgxd* “swaps” enclave memory to normal RAM. This normal RAM is allocated
+via per enclave shared memory. The shared memory area is not mapped into the
+enclave or the task mapping it, which makes its memory use opaque - including
+to the system out of memory killer (OOM). This can be problematic when there
+are no limits in place on the amount an enclave can allocate.
+
+At boot time, the module parameter "sgx.overcommit_percent" can be used to
+place a limit on the number of shared memory backing pages that may be
+allocated, expressed as a percentage of the total number of EPC pages in the
+system.  A value of 100 is the default, and represents a limit equal to the
+number of EPC pages in the system. To disable the limit, set
+sgx.overcommit_percent to -1. The number of backing pages available to
+enclaves is a global resource. If the system exceeds the number of allowed
+backing pages in use, the reclaimer will be unable to swap EPC pages to
+shared memory.
 
 Launch Control
 ==============
diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index 9c1656779b2a..72f9192a43fe 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -1,6 +1,10 @@
-obj-y += \
+# This allows sgx to have module namespace
+obj-y += sgx.o
+
+sgx-y += \
 	driver.o \
 	encl.o \
 	ioctl.o \
 	main.o
+
 obj-$(CONFIG_X86_SGX_KVM)	+= virt.o
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 2857a49f2335..c58ce9d9fd56 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*  Copyright(c) 2016-20 Intel Corporation. */
 
+#include <linux/moduleparam.h>
 #include <linux/file.h>
 #include <linux/freezer.h>
 #include <linux/highmem.h>
@@ -43,6 +44,54 @@ static struct sgx_numa_node *sgx_numa_nodes;
 
 static LIST_HEAD(sgx_dirty_page_list);
 
+/*
+ * Limits the amount of normal RAM that SGX can consume for EPC
+ * overcommit to the total EPC pages * sgx_overcommit_percent / 100
+ */
+static int sgx_overcommit_percent = 100;
+module_param_named(overcommit_percent, sgx_overcommit_percent, int, 0440);
+MODULE_PARM_DESC(overcommit_percent, "Percentage of overcommit of EPC pages.");
+
+/* The number of pages that can be allocated globally for backing storage. */
+static atomic_long_t sgx_nr_available_backing_pages;
+static bool sgx_disable_overcommit_tracking;
+
+/**
+ * sgx_charge_mem() - charge for a page used for backing storage
+ *
+ * Backing storage usage is capped by the sgx_nr_available_backing_pages.
+ * If the backing storage usage is over the overcommit limit,
+ * return an error.
+ *
+ * Return:
+ * 0:		The page requested does not exceed the limit
+ * -ENOMEM:	The page requested exceeds the overcommit limit
+ */
+int sgx_charge_mem(void)
+{
+	if (sgx_disable_overcommit_tracking)
+		return 0;
+
+	if (!atomic_long_add_unless(&sgx_nr_available_backing_pages, -1, 0))
+		return -ENOMEM;
+
+	return 0;
+}
+
+/**
+ * sgx_uncharge_mem() - uncharge a page previously used for backing storage
+ *
+ * When backing storage is no longer in use, increment the
+ * sgx_nr_available_backing_pages counter.
+ */
+void sgx_uncharge_mem(void)
+{
+	if (sgx_disable_overcommit_tracking)
+		return;
+
+	atomic_long_inc(&sgx_nr_available_backing_pages);
+}
+
 /*
  * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
  * from the input list, and made available for the page allocator. SECS pages
@@ -786,6 +835,7 @@ static bool __init sgx_page_cache_init(void)
 	u64 pa, size;
 	int nid;
 	int i;
+	u64 total_epc_bytes = 0;
 
 	sgx_numa_nodes = kmalloc_array(num_possible_nodes(), sizeof(*sgx_numa_nodes), GFP_KERNEL);
 	if (!sgx_numa_nodes)
@@ -830,6 +880,7 @@ static bool __init sgx_page_cache_init(void)
 
 		sgx_epc_sections[i].node =  &sgx_numa_nodes[nid];
 		sgx_numa_nodes[nid].size += size;
+		total_epc_bytes += size;
 
 		sgx_nr_epc_sections++;
 	}
@@ -839,6 +890,19 @@ static bool __init sgx_page_cache_init(void)
 		return false;
 	}
 
+	if (sgx_overcommit_percent >= 0) {
+		u64 available_backing_bytes;
+
+		available_backing_bytes =
+			total_epc_bytes * (sgx_overcommit_percent / 100);
+
+		atomic_long_set(&sgx_nr_available_backing_pages,
+				available_backing_bytes >> PAGE_SHIFT);
+	} else {
+		pr_info("Disabling overcommit limit.\n");
+		sgx_disable_overcommit_tracking = true;
+	}
+
 	return true;
 }
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 0f17def9fe6f..3507a9983fc1 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -89,6 +89,8 @@ void sgx_free_epc_page(struct sgx_epc_page *page);
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
+int sgx_charge_mem(void);
+void sgx_uncharge_mem(void);
 
 #ifdef CONFIG_X86_SGX_KVM
 int __init sgx_vepc_init(void);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/2] x86/sgx: account backing pages
  2021-12-20 17:46 [PATCH 0/2] x86/sgx: Limit EPC overcommit Kristen Carlson Accardi
  2021-12-20 17:46 ` [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit Kristen Carlson Accardi
@ 2021-12-20 17:46 ` Kristen Carlson Accardi
  2021-12-28 23:37   ` Jarkko Sakkinen
  1 sibling, 1 reply; 19+ messages in thread
From: Kristen Carlson Accardi @ 2021-12-20 17:46 UTC (permalink / raw)
  To: linux-sgx, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin

SGX may allow EPC pages to be overcommitted. If the system is
out of enclave memory, EPC pages are swapped to normal RAM via
a per enclave shared memory area. This shared memory is not
charged to the enclave or the task mapping it, making it hard
to account for using normal methods.

In order to avoid unlimited usage of normal RAM, enclaves must be
charged for each new page used for backing storage, and uncharged
when they are no longer using a backing page.

Modify the existing flow for requesting backing pages to reduce the
available backing page counter and confirm that the limit has not
been exceeded. Backing page usage for loading EPC pages back out of
the shared memory do not incur a charge.

When a backing page is released from usage, increment the available
backing page counter.

When swapping EPC pages to RAM, in addition to storing the page
contents, SGX must store some additional metadata to protect
against a malicious kernel when the page is swapped back in. This
additional metadata is called Paging Crypto MetaData. PCMD is
allocated from the same shared memory area as the backing page
contents and consumes RAM the same way.

PCMD is 128 bytes in size, and there is one PCMD structure per
page written to shared RAM. The page index for the PCMD page is
calculated from the page index of the backing page, so it is possible
that the PCMD structures are not packed into the minimum number of
pages possible. If 32 PCMDs can fit onto a single page, then PCMD
usage is 1/32 of total EPC pages. In the worst case, PCMD can
consume the same amount of RAM as EPC backing pages (1:1). For
simplicity, this implementation does not account for PCMD page usage.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c | 76 ++++++++++++++++++++++++++++++++--
 arch/x86/kernel/cpu/sgx/encl.h |  6 ++-
 arch/x86/kernel/cpu/sgx/main.c |  6 +--
 3 files changed, 80 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 001808e3901c..8be6f0592bdc 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -32,7 +32,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	else
 		page_index = PFN_DOWN(encl->size);
 
-	ret = sgx_encl_get_backing(encl, page_index, &b);
+	ret = sgx_encl_lookup_backing(encl, page_index, &b);
 	if (ret)
 		return ret;
 
@@ -407,6 +407,12 @@ void sgx_encl_release(struct kref *ref)
 			sgx_encl_free_epc_page(entry->epc_page);
 			encl->secs_child_cnt--;
 			entry->epc_page = NULL;
+		} else {
+			/*
+			 * If there is no epc_page, it means it has been
+			 * swapped out. Uncharge the backing storage.
+			 */
+			sgx_uncharge_mem();
 		}
 
 		kfree(entry);
@@ -574,8 +580,8 @@ static struct page *sgx_encl_get_backing_page(struct sgx_encl *encl,
  *   0 on success,
  *   -errno otherwise.
  */
-int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
-			 struct sgx_backing *backing)
+static int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
+				struct sgx_backing *backing)
 {
 	pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >> 5);
 	struct page *contents;
@@ -601,6 +607,62 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
 	return 0;
 }
 
+/**
+ * sgx_encl_alloc_backing() - allocate a new backing storage page
+ * @encl:	an enclave pointer
+ * @page_index:	enclave page index
+ * @backing:	data for accessing backing storage for the page
+ *
+ * Confirm that the global overcommit limit has not been reached before
+ * requesting a new backing storage page for storing the encrypted contents
+ * and Paging Crypto MetaData (PCMD) of an enclave page. This is called when
+ * there is no existing backing page, just before writing to the backing
+ * storage with EWB.
+ *
+ * Return:
+ *   0 on success,
+ *   -errno otherwise.
+ */
+int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index,
+			   struct sgx_backing *backing)
+{
+	int ret;
+
+	if (sgx_charge_mem())
+		return -ENOMEM;
+
+	ret = sgx_encl_get_backing(encl, page_index, backing);
+	if (ret)
+		sgx_uncharge_mem();
+
+	return ret;
+}
+
+/**
+ * sgx_encl_lookup_backing() - retrieve an existing backing storage page
+ * @encl:	an enclave pointer
+ * @page_index:	enclave page index
+ * @backing:	data for accessing backing storage for the page
+ *
+ * Retrieve a backing page for loading data back into an EPC page with ELDU.
+ * This call does not cause a charge to the overcommit limit because a page
+ * has already been allocated, but has been swapped out or is in RAM
+ *
+ * It is the caller's responsibility to ensure that it is appropriate to
+ * use sgx_encl_lookup_backing() rather than sgx_encl_alloc_backing(). If
+ * lookup is not used correctly, this will cause an allocation that is
+ * not accounted for.
+ *
+ * Return:
+ *   0 on success,
+ *   -errno otherwise.
+ */
+int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
+			    struct sgx_backing *backing)
+{
+	return sgx_encl_get_backing(encl, page_index, backing);
+}
+
 /**
  * sgx_encl_put_backing() - Unpin the backing storage
  * @backing:	data for accessing backing storage for the page
@@ -608,9 +670,17 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
  */
 void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write)
 {
+	/*
+	 * If the page is being written to by the reclaimer, it is
+	 * still in use and the backing page usage should not be
+	 * uncharged. However, if the page is not being written to,
+	 * it is no longer in use and may be uncharged.
+	 */
 	if (do_write) {
 		set_page_dirty(backing->pcmd);
 		set_page_dirty(backing->contents);
+	} else {
+		sgx_uncharge_mem();
 	}
 
 	put_page(backing->pcmd);
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index fec43ca65065..8ffb8a83263f 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -105,8 +105,10 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
 
 void sgx_encl_release(struct kref *ref);
 int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
-int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
-			 struct sgx_backing *backing);
+int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index,
+			   struct sgx_backing *backing);
+int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
+			    struct sgx_backing *backing);
 void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index c58ce9d9fd56..0ef9b7398b35 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -357,8 +357,8 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 	encl->secs_child_cnt--;
 
 	if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
-		ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
-					   &secs_backing);
+		ret = sgx_encl_alloc_backing(encl, PFN_DOWN(encl->size),
+					     &secs_backing);
 		if (ret)
 			goto out;
 
@@ -428,7 +428,7 @@ static void sgx_reclaim_pages(void)
 			goto skip;
 
 		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
-		ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
+		ret = sgx_encl_alloc_backing(encl_page->encl, page_index, &backing[i]);
 		if (ret)
 			goto skip;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit
  2021-12-20 17:46 ` [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit Kristen Carlson Accardi
@ 2021-12-20 19:30   ` Borislav Petkov
  2021-12-20 20:39     ` Kristen Carlson Accardi
  2021-12-28 23:04   ` Jarkko Sakkinen
  1 sibling, 1 reply; 19+ messages in thread
From: Borislav Petkov @ 2021-12-20 19:30 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-sgx, Jonathan Corbet, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, x86, H. Peter Anvin

On Mon, Dec 20, 2021 at 09:46:39AM -0800, Kristen Carlson Accardi wrote:
>  Similar to the core kswapd, ksgxd, is responsible for managing the
>  overcommitment of enclave memory.  If the system runs out of enclave memory,
> -*ksgxd* “swaps” enclave memory to normal memory.
> +*ksgxd* “swaps” enclave memory to normal RAM. This normal RAM is allocated
> +via per enclave shared memory. The shared memory area is not mapped into the
> +enclave or the task mapping it, which makes its memory use opaque - including
> +to the system out of memory killer (OOM). This can be problematic when there
> +are no limits in place on the amount an enclave can allocate.

Problematic how?

The commit message above is talking about what your patch does and that
is kinda clear from the diff. The *why* is really missing. Only that
allusion that it might be problematic in some cases but that's not even
scratching the surface.

> +At boot time, the module parameter "sgx.overcommit_percent" can be used to
> +place a limit on the number of shared memory backing pages that may be
> +allocated, expressed as a percentage of the total number of EPC pages in the
> +system.  A value of 100 is the default, and represents a limit equal to the
> +number of EPC pages in the system. To disable the limit, set
> +sgx.overcommit_percent to -1. The number of backing pages available to
> +enclaves is a global resource. If the system exceeds the number of allowed
> +backing pages in use, the reclaimer will be unable to swap EPC pages to
> +shared memory.

So you're basically putting the burden on the user/sysadmin to
*actually* *know* what percentage is "problematic" and to know what to
supply. I'd bet not very many people would know how much is problematic
and it probably all depends.

So why don't you come up with a sane default, instead, which works in
most cases and set it automatically?

Dunno, maybe some scaled percentage of memory depending on how many
enclaves are run but all up to a sane limit of, say, 90% of total memory
so that there are 10% left for normal system operation.

This way you'll avoid "problematic" and still have some memory left for
other use.

Or something like that.

Adding yet another knob is yuck and the easy way out. And we have waaaay
too many knobs so we should always try to do the automatic thing, if at
all possible.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit
  2021-12-20 19:30   ` Borislav Petkov
@ 2021-12-20 20:39     ` Kristen Carlson Accardi
  2021-12-20 21:11       ` Borislav Petkov
  0 siblings, 1 reply; 19+ messages in thread
From: Kristen Carlson Accardi @ 2021-12-20 20:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-sgx, Jonathan Corbet, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, x86, H. Peter Anvin

Hi Boris, thanks for taking look.

On Mon, 2021-12-20 at 20:30 +0100, Borislav Petkov wrote:
> On Mon, Dec 20, 2021 at 09:46:39AM -0800, Kristen Carlson Accardi
> wrote:
> >  Similar to the core kswapd, ksgxd, is responsible for managing the
> >  overcommitment of enclave memory.  If the system runs out of
> > enclave memory,
> > -*ksgxd* “swaps” enclave memory to normal memory.
> > +*ksgxd* “swaps” enclave memory to normal RAM. This normal RAM is
> > allocated
> > +via per enclave shared memory. The shared memory area is not
> > mapped into the
> > +enclave or the task mapping it, which makes its memory use opaque
> > - including
> > +to the system out of memory killer (OOM). This can be problematic
> > when there
> > +are no limits in place on the amount an enclave can allocate.
> 
> Problematic how?

If a malicious or just extra large enclave is loaded, or even just a
lot of enclaves, it can eat up all the normal RAM on the system. Normal
methods of finding out where all the memory on the system is being
used, wouldn't be able to find this usage since it is shared memory. In
addition, the OOM killer wouldn't be able to kill any enclaves.
 
> 
> The commit message above is talking about what your patch does and
> that
> is kinda clear from the diff. The *why* is really missing. Only that
> allusion that it might be problematic in some cases but that's not
> even
> scratching the surface.
> 
> > +At boot time, the module parameter "sgx.overcommit_percent" can be
> > used to
> > +place a limit on the number of shared memory backing pages that
> > may be
> > +allocated, expressed as a percentage of the total number of EPC
> > pages in the
> > +system.  A value of 100 is the default, and represents a limit
> > equal to the
> > +number of EPC pages in the system. To disable the limit, set
> > +sgx.overcommit_percent to -1. The number of backing pages
> > available to
> > +enclaves is a global resource. If the system exceeds the number of
> > allowed
> > +backing pages in use, the reclaimer will be unable to swap EPC
> > pages to
> > +shared memory.
> 
> So you're basically putting the burden on the user/sysadmin to
> *actually* *know* what percentage is "problematic" and to know what
> to
> supply. I'd bet not very many people would know how much is
> problematic
> and it probably all depends.
> 
> So why don't you come up with a sane default, instead, which works in
> most cases and set it automatically?

The default value is set to 100%, and this percentage, plus the number
of EPC pages in the system is used to calculate a sane value for the
number of backing pages to add - in this case, exactly the number of
EPC pages in the system. It is set automatically if the parameter is
not used to override it.

> 
> Dunno, maybe some scaled percentage of memory depending on how many
> enclaves are run but all up to a sane limit of, say, 90% of total
> memory
> so that there are 10% left for normal system operation.

I think in this case you are still making a judgement call for the
admin about how much memory you think they ought to be able to use for
non-SGX operations, and it feels more natural to me to have the default
based on how many backing pages you might reasonably need. I suppose
one could check the total system memory available and double check that
the calculated number of backing pages you'd give out would never
exceed some percentage of total system RAM, but then you wind up with 2
knobs to play with instead of just one, which seems wrong to me.

> 
> This way you'll avoid "problematic" and still have some memory left
> for
> other use.
> 
> Or something like that.
> 
> Adding yet another knob is yuck and the easy way out. And we have
> waaaay
> too many knobs so we should always try to do the automatic thing, if
> at
> all possible.

I completely agree - so I'm trying to make sure I understand this
comment, as the value is currently set to default that would
automatically apply that is based on EPC memory present and not a fixed
value. So I'd like to understand what you'd like to see done
differently. are you saying you'd like to eliminated the ability to
override the automatic default? Or just that you'd rather calculate the
percentage based on total normal system RAM rather than EPC memory?

Thanks,
Kristen



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit
  2021-12-20 20:39     ` Kristen Carlson Accardi
@ 2021-12-20 21:11       ` Borislav Petkov
  2021-12-20 21:35         ` Kristen Carlson Accardi
  0 siblings, 1 reply; 19+ messages in thread
From: Borislav Petkov @ 2021-12-20 21:11 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-sgx, Jonathan Corbet, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, x86, H. Peter Anvin, lkml

Bah, that thread is not on lkml. Please always Cc lkml in the future.

On Mon, Dec 20, 2021 at 12:39:19PM -0800, Kristen Carlson Accardi wrote:
> If a malicious or just extra large enclave is loaded, or even just a
> lot of enclaves, it can eat up all the normal RAM on the system. Normal
> methods of finding out where all the memory on the system is being
> used, wouldn't be able to find this usage since it is shared memory. In
> addition, the OOM killer wouldn't be able to kill any enclaves.

So you need some sort of limiting against malicious enclaves anyways,
regardless of this knob. IOW, you can set a percentage limit of
per-enclave memory which cannot exceed a certain amount which won't
prevent the system from its normal operation. For example.

> I completely agree - so I'm trying to make sure I understand this
> comment, as the value is currently set to default that would
> automatically apply that is based on EPC memory present and not a fixed
> value. So I'd like to understand what you'd like to see done
> differently. are you saying you'd like to eliminated the ability to
> override the automatic default? Or just that you'd rather calculate the
> percentage based on total normal system RAM rather than EPC memory?

So you say that there are cases where swapping to normal RAM can eat
up all RAM.

So the first heuristic should be: do not allow for *all* enclaves
together to use up more than X percent of normal RAM during EPC reclaim.

X percent being, say, 90% of all RAM. For example. I guess 10% should
be enough for normal operation but someone who's more knowledgeable in
system limits could chime in here.

Then, define a per-enclave limit which says, an enclave can use Y % of
memory for swapping when trying to reclaim EPC memory. And that can be
something like:

	90 % RAM
	--------
	total amount of enclaves currently on the system

And you can obviously create scenarios where creating too many enclaves
can bring the system into a situation where it doesn't do any forward
progress.

But you probably can cause the same with overcommitting with VMs so
perhaps it would be a good idea to look how overcommitting VMs and
limits there are handled.

Bottom line is: the logic should be for the most common cases to
function properly, out-of-the-box, without knobs. And then to keep the
system operational by preventing enclaves from bringing it down to a
halt just by doing EPC reclaim.

Does that make more sense?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit
  2021-12-20 21:11       ` Borislav Petkov
@ 2021-12-20 21:35         ` Kristen Carlson Accardi
  2021-12-20 22:48           ` Borislav Petkov
  2021-12-22 14:21           ` Dave Hansen
  0 siblings, 2 replies; 19+ messages in thread
From: Kristen Carlson Accardi @ 2021-12-20 21:35 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-sgx, Jonathan Corbet, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, x86, H. Peter Anvin, lkml

On Mon, 2021-12-20 at 22:11 +0100, Borislav Petkov wrote:
> Bah, that thread is not on lkml. Please always Cc lkml in the future.
> 
> On Mon, Dec 20, 2021 at 12:39:19PM -0800, Kristen Carlson Accardi
> wrote:
> > If a malicious or just extra large enclave is loaded, or even just
> > a
> > lot of enclaves, it can eat up all the normal RAM on the system.
> > Normal
> > methods of finding out where all the memory on the system is being
> > used, wouldn't be able to find this usage since it is shared
> > memory. In
> > addition, the OOM killer wouldn't be able to kill any enclaves.
> 
> So you need some sort of limiting against malicious enclaves anyways,
> regardless of this knob. IOW, you can set a percentage limit of
> per-enclave memory which cannot exceed a certain amount which won't
> prevent the system from its normal operation. For example.
> 
> > I completely agree - so I'm trying to make sure I understand this
> > comment, as the value is currently set to default that would
> > automatically apply that is based on EPC memory present and not a
> > fixed
> > value. So I'd like to understand what you'd like to see done
> > differently. are you saying you'd like to eliminated the ability to
> > override the automatic default? Or just that you'd rather calculate
> > the
> > percentage based on total normal system RAM rather than EPC memory?
> 
> So you say that there are cases where swapping to normal RAM can eat
> up all RAM.
> 
> So the first heuristic should be: do not allow for *all* enclaves
> together to use up more than X percent of normal RAM during EPC
> reclaim.

So, in your proposal, you would first change the calculated number of
maximum available backing pages to be based on total system RAM rather
than EPC memory, got it.

> 
> X percent being, say, 90% of all RAM. For example. I guess 10% should
> be enough for normal operation but someone who's more knowledgeable
> in
> system limits could chime in here.
> 
> Then, define a per-enclave limit which says, an enclave can use Y %
> of
> memory for swapping when trying to reclaim EPC memory. And that can
> be
> something like:
> 
> 	90 % RAM
> 	--------
> 	total amount of enclaves currently on the system
> 

This would require recalculating the max number of allowed backing
pages per enclave at run time whenever a new enclave is loaded - but
all the existing enclaves may have already used more than the new max
number of per-enclave allowable pages. How would you handle that
scenario? This would add a lot of complexity for sure - and it does
make me wonder whether any additional benefit of limiting per enclave
would be worth it.

> And you can obviously create scenarios where creating too many
> enclaves
> can bring the system into a situation where it doesn't do any forward
> progress.
> 
> But you probably can cause the same with overcommitting with VMs so
> perhaps it would be a good idea to look how overcommitting VMs and
> limits there are handled.
> 
> Bottom line is: the logic should be for the most common cases to
> function properly, out-of-the-box, without knobs. And then to keep
> the
> system operational by preventing enclaves from bringing it down to a
> halt just by doing EPC reclaim.
> 
> Does that make more sense?
> 

Thanks for your more detailed explanation - I will take a look at the
VM overcommit limits. Since obviously the original implementation did
have a default value set, I had still a remaining specific question
about your comments. Are you suggesting that there should not be a way
to override any overcommit limit at all? So drop the parameter all
together?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit
  2021-12-20 21:35         ` Kristen Carlson Accardi
@ 2021-12-20 22:48           ` Borislav Petkov
  2021-12-21 15:53             ` Dave Hansen
  2021-12-22 14:21           ` Dave Hansen
  1 sibling, 1 reply; 19+ messages in thread
From: Borislav Petkov @ 2021-12-20 22:48 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-sgx, Jonathan Corbet, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, x86, H. Peter Anvin, lkml

On Mon, Dec 20, 2021 at 01:35:03PM -0800, Kristen Carlson Accardi wrote:
> So, in your proposal you would first change the calculated number of
> maximum available backing pages to be based on total system RAM rather
> than EPC memory, got it.

This was just an example. My point is to try to make it automatic and
not introduce another knob. And to decide on the limits and behavior
by using common sense and addressing the common use cases first.

> This would require recalculating the max number of allowed backing
> pages per enclave at run time whenever a new enclave is loaded - but
> all the existing enclaves may have already used more than the new max
> number of per-enclave allowable pages. How would you handle that
> scenario? This would add a lot of complexity for sure - and it does
> make me wonder whether any additional benefit of limiting per enclave
> would be worth it.

See above - this was just an example. And as you've shown, an example of
what *not* to do.

> Thanks for your more detailed explanation - I will take a look at the
> VM overcommit limits. Since obviously the original implementation did
> have a default value set, I had still a remaining specific question
> about your comments. Are you suggesting that there should not be a way
> to override any overcommit limit at all? So drop the parameter all
> together?

So let me ask you a counter-question:

Imagine you're a sysadmin. Or a general, common system user if there
ever is one.

When your system starts thrashing and you're running a bunch of
enclaves, how do you find out there even is a knob which might
potentially help you?

And after you find it, how would you use that knob?

Or would you rather prefer that the system did the right thing for you
instead of having to figure out what the sensible values for that knob
would be?

My point is: put yourself in the user's shoes and try to think about
what would be the optimal thing the system should do.

Once that is cleanly and properly defined, then we can deal with knobs
and policies etc.

I sincerely hope that makes more sense.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit
  2021-12-20 22:48           ` Borislav Petkov
@ 2021-12-21 15:53             ` Dave Hansen
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Hansen @ 2021-12-21 15:53 UTC (permalink / raw)
  To: Borislav Petkov, Kristen Carlson Accardi
  Cc: linux-sgx, Jonathan Corbet, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, x86, H. Peter Anvin, lkml

On 12/20/21 2:48 PM, Borislav Petkov wrote:
> On Mon, Dec 20, 2021 at 01:35:03PM -0800, Kristen Carlson Accardi wrote:
>> So, in your proposal you would first change the calculated number of
>> maximum available backing pages to be based on total system RAM rather
>> than EPC memory, got it.
> 
> This was just an example. My point is to try to make it automatic and
> not introduce another knob. And to decide on the limits and behavior
> by using common sense and addressing the common use cases first.

The common case is clearly a few enclaves on systems where the
overcommit levels are modest.  The "100%" limit will work great there.

I'd personally be fine with just enforcing that limit as the default and
ignoring everything else.  It makes me a bit nervous, though, because
suddenly enforcing a limit is an ABI break.  The tunable effectively
gives us a way out if we screw up either the limit's quantity or someone
needs the old ABI back.

That said, we don't *need* it to be tunable, boot parameter or not.  If
you're concerned about the tunable, I think we should drop it and not
look back.

> Imagine you're a sysadmin. Or a general, common system user if there
> ever is one.
> 
> When your system starts thrashing and you're running a bunch of
> enclaves, how do you find out there even is a knob which might
> potentially help you?

I'm selfish.  The tunable isn't for end users.  It's for me.

The scenario I envisioned is that a user upgrades to a new kernel and
their enclaves start crashing.  They'll eventually find us, the
maintainers of the SGX code, and we'll have a tool as kernel developers
to help them.  The tunable helps _me_ in two ways:
1. It help me easily get user back to pre-5.17 (or whatever) behavior
2. If we got the "100%" value wrong, end users can help us experiment to
   help find a better value.

BTW, all the chat about "malicious" enclaves and so forth...  I
*totally* agree that this is a problem and one worth solving.  It just
can't be solved today.  We need real cgroup support.  It's coming soon.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit
  2021-12-20 21:35         ` Kristen Carlson Accardi
  2021-12-20 22:48           ` Borislav Petkov
@ 2021-12-22 14:21           ` Dave Hansen
  1 sibling, 0 replies; 19+ messages in thread
From: Dave Hansen @ 2021-12-22 14:21 UTC (permalink / raw)
  To: Kristen Carlson Accardi, Borislav Petkov
  Cc: linux-sgx, Jonathan Corbet, Jarkko Sakkinen, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, x86, H. Peter Anvin, lkml

On 12/20/21 1:35 PM, Kristen Carlson Accardi wrote:
> Thanks for your more detailed explanation - I will take a look at the
> VM overcommit limits. Since obviously the original implementation did
> have a default value set, I had still a remaining specific question
> about your comments. Are you suggesting that there should not be a way
> to override any overcommit limit at all? So drop the parameter all
> together?

Yes, let's kill the exposed tunable.

We don't have any strong, practical need for it at the moment other than
paranoia.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit
  2021-12-20 17:46 ` [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit Kristen Carlson Accardi
  2021-12-20 19:30   ` Borislav Petkov
@ 2021-12-28 23:04   ` Jarkko Sakkinen
  2021-12-28 23:34     ` Dave Hansen
  2022-01-06 18:26     ` Kristen Carlson Accardi
  1 sibling, 2 replies; 19+ messages in thread
From: Jarkko Sakkinen @ 2021-12-28 23:04 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-sgx, Jonathan Corbet, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin

On Mon, Dec 20, 2021 at 09:46:39AM -0800, Kristen Carlson Accardi wrote:
> When the system runs out of enclave memory, SGX can reclaim EPC pages
> by swapping to normal RAM. This normal RAM is allocated via a
> per-enclave shared memory area. The shared memory area is not mapped
> into the enclave or the task mapping it, which makes its memory use
> opaque (including to the OOM killer). Having lots of hard to find
> memory around is problematic, especially when there is no limit.
> 
> Introduce a module parameter and a global counter that can be used to limit
> the number of pages that enclaves are able to consume for backing storage.
> This parameter is a percentage value that is used in conjunction with the
> number of EPC pages in the system to set a cap on the amount of backing
> RAM that can be consumed.
> 
> The default for this value is 100, which limits the total number of
> shared memory pages that may be consumed by all enclaves as backing pages
> to the number of EPC pages on the system.
> 
> For example, on an SGX system that has 128MB of EPC, this default would
> cap the amount of normal RAM that SGX consumes for its shared memory
> areas at 128MB.
> 
> If sgx.overcommit_percent is set to a negative value (such as -1),
> SGX will not place any limits on the amount of overcommit that might
> be requested, and SGX will behave as it has previously without the
> overcommit_percent limit.
> 
> SGX may not be built as a module, but the module parameter interface
> is used in order to provide a convenient interface.
> 
> The SGX overcommit_percent works differently than the core VM overcommit
> limit. Enclaves request backing pages one page at a time, and the number
> of in use backing pages that are allowed is a global resource that is
> limited for all enclaves.
> 
> Introduce a pair of functions which can be used by callers when requesting
> backing RAM pages. These functions are responsible for accounting the
> page charges. A request may return an error if the request will cause the
> counter to exceed the backing page cap.
> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  7 ++
>  Documentation/x86/sgx.rst                     | 16 ++++-
>  arch/x86/kernel/cpu/sgx/Makefile              |  6 +-
>  arch/x86/kernel/cpu/sgx/main.c                | 64 +++++++++++++++++++
>  arch/x86/kernel/cpu/sgx/sgx.h                 |  2 +
>  5 files changed, 93 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 9725c546a0d4..9d23c05a833b 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -5165,6 +5165,13 @@
>  
>  	serialnumber	[BUGS=X86-32]
>  
> +	sgx.overcommit_percent=	[X86-64,SGX]
> +			Limits the amount of normal RAM used for backing
> +			storage that may be allocate, expressed as a
> +			percentage of the total number of EPC pages in the
> +			system.
> +			See Documentation/x86/sgx.rst for more information.
> +
>  	shapers=	[NET]
>  			Maximal number of shapers.
>  
> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
> index 265568a9292c..4f9a1c68be94 100644
> --- a/Documentation/x86/sgx.rst
> +++ b/Documentation/x86/sgx.rst
> @@ -147,7 +147,21 @@ Page reclaimer
>  
>  Similar to the core kswapd, ksgxd, is responsible for managing the
>  overcommitment of enclave memory.  If the system runs out of enclave memory,
> -*ksgxd* “swaps” enclave memory to normal memory.
> +*ksgxd* “swaps” enclave memory to normal RAM. This normal RAM is allocated
> +via per enclave shared memory. The shared memory area is not mapped into the
> +enclave or the task mapping it, which makes its memory use opaque - including
> +to the system out of memory killer (OOM). This can be problematic when there
> +are no limits in place on the amount an enclave can allocate.
> +
> +At boot time, the module parameter "sgx.overcommit_percent" can be used to
> +place a limit on the number of shared memory backing pages that may be
> +allocated, expressed as a percentage of the total number of EPC pages in the
> +system.  A value of 100 is the default, and represents a limit equal to the
> +number of EPC pages in the system. To disable the limit, set
> +sgx.overcommit_percent to -1. The number of backing pages available to
> +enclaves is a global resource. If the system exceeds the number of allowed
> +backing pages in use, the reclaimer will be unable to swap EPC pages to
> +shared memory.
>  
>  Launch Control
>  ==============
> diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
> index 9c1656779b2a..72f9192a43fe 100644
> --- a/arch/x86/kernel/cpu/sgx/Makefile
> +++ b/arch/x86/kernel/cpu/sgx/Makefile
> @@ -1,6 +1,10 @@
> -obj-y += \
> +# This allows sgx to have module namespace
> +obj-y += sgx.o
> +
> +sgx-y += \
>  	driver.o \
>  	encl.o \
>  	ioctl.o \
>  	main.o
> +
>  obj-$(CONFIG_X86_SGX_KVM)	+= virt.o
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 2857a49f2335..c58ce9d9fd56 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*  Copyright(c) 2016-20 Intel Corporation. */
>  
> +#include <linux/moduleparam.h>
>  #include <linux/file.h>
>  #include <linux/freezer.h>
>  #include <linux/highmem.h>
> @@ -43,6 +44,54 @@ static struct sgx_numa_node *sgx_numa_nodes;
>  
>  static LIST_HEAD(sgx_dirty_page_list);
>  
> +/*
> + * Limits the amount of normal RAM that SGX can consume for EPC
> + * overcommit to the total EPC pages * sgx_overcommit_percent / 100
> + */
> +static int sgx_overcommit_percent = 100;
> +module_param_named(overcommit_percent, sgx_overcommit_percent, int, 0440);
> +MODULE_PARM_DESC(overcommit_percent, "Percentage of overcommit of EPC pages.");
> +
> +/* The number of pages that can be allocated globally for backing storage. */
> +static atomic_long_t sgx_nr_available_backing_pages;
> +static bool sgx_disable_overcommit_tracking;

I don't like the use of word tracking as we already have ETRACK. I'd also
shorten the first global as "sgx_nr_backing_pages".

Couldn't you set "sgx_nr_backing_pages" to -1 when capping is disabled, and
then you would not need that bool in the first place?

> +
> +/**
> + * sgx_charge_mem() - charge for a page used for backing storage
> + *

Please remove this empty line:

https://www.kernel.org/doc/Documentation/kernel-doc-nano-HOWTO.txt

> + * Backing storage usage is capped by the sgx_nr_available_backing_pages.
> + * If the backing storage usage is over the overcommit limit,

Where does this verb "charge" come from?

/Jarkko

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit
  2021-12-28 23:04   ` Jarkko Sakkinen
@ 2021-12-28 23:34     ` Dave Hansen
  2022-01-06 18:26     ` Kristen Carlson Accardi
  1 sibling, 0 replies; 19+ messages in thread
From: Dave Hansen @ 2021-12-28 23:34 UTC (permalink / raw)
  To: Jarkko Sakkinen, Kristen Carlson Accardi
  Cc: linux-sgx, Jonathan Corbet, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin

>> +/*
>> + * Limits the amount of normal RAM that SGX can consume for EPC
>> + * overcommit to the total EPC pages * sgx_overcommit_percent / 100
>> + */
>> +static int sgx_overcommit_percent = 100;
>> +module_param_named(overcommit_percent, sgx_overcommit_percent, int, 0440);
>> +MODULE_PARM_DESC(overcommit_percent, "Percentage of overcommit of EPC pages.");
>> +
>> +/* The number of pages that can be allocated globally for backing storage. */
>> +static atomic_long_t sgx_nr_available_backing_pages;
>> +static bool sgx_disable_overcommit_tracking;
> 
> I don't like the use of word tracking as we already have ETRACK.

I don't think anyone is going to confuse "overcommit tracking" with ETRACK.

That said, this *could* be "sgx_disable_overcommit_limits", I guess.

> I'd also shorten the first global as "sgx_nr_backing_pages".
That means something different from the variable, though.

"sgx_nr_backing_pages" would be the name for the current number of
backing pages which currently exist.

> Couldn't you set "sgx_nr_backing_pages" to -1 when capping is disabled, and
> then you would not need that bool in the first place?
> 
>> +
>> +/**
>> + * sgx_charge_mem() - charge for a page used for backing storage
>> + *
> 
> Please remove this empty line:
> 
> https://www.kernel.org/doc/Documentation/kernel-doc-nano-HOWTO.txt

That *might* make sense when there are arguments.  The arguments at
least help visually separate the short function description from the
full text description.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] x86/sgx: account backing pages
  2021-12-20 17:46 ` [PATCH 2/2] x86/sgx: account backing pages Kristen Carlson Accardi
@ 2021-12-28 23:37   ` Jarkko Sakkinen
  2022-01-05  0:36     ` Dave Hansen
  0 siblings, 1 reply; 19+ messages in thread
From: Jarkko Sakkinen @ 2021-12-28 23:37 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-sgx, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin

On Mon, Dec 20, 2021 at 09:46:40AM -0800, Kristen Carlson Accardi wrote:
> +int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
> +			    struct sgx_backing *backing)
> +{
> +	return sgx_encl_get_backing(encl, page_index, backing);
> +}

Is this wrapping necessary?

Also, there is ambiguous terminology:

1. Local function: "get_backing"
2. Exported function: "lookup_backing"

/Jarkko

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] x86/sgx: account backing pages
  2021-12-28 23:37   ` Jarkko Sakkinen
@ 2022-01-05  0:36     ` Dave Hansen
  2022-01-08 14:24       ` Jarkko Sakkinen
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2022-01-05  0:36 UTC (permalink / raw)
  To: Jarkko Sakkinen, Kristen Carlson Accardi
  Cc: linux-sgx, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin

On 12/28/21 3:37 PM, Jarkko Sakkinen wrote:
> On Mon, Dec 20, 2021 at 09:46:40AM -0800, Kristen Carlson Accardi wrote:
>> +int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
>> +			    struct sgx_backing *backing)
>> +{
>> +	return sgx_encl_get_backing(encl, page_index, backing);
>> +}
> Is this wrapping necessary?

Yes, I think so.

> Also, there is ambiguous terminology:
> 
> 1. Local function: "get_backing"
> 2. Exported function: "lookup_backing"

I'm not sure what you're getting at.

There are three important things that you do with backing storage:

1. Allocate it
2. Find it
3. De-allocate (free) it

Right now, the code has a pattern where it does:

	get_backing();
	// do something
	put_backing();

That sure as heck looks like it is allocating and freeing it.  But, it's
actually *maybe* doing an allocation.  The "find it" path also looks
*EXACTLY* the same as the actual allocation path.  You might also recall
that the original code didn't even *have* a (real) free path.

The "wrapping" is really just naming the two different operations that
use the "get" function: lookup and allocate.  It's not just wrapping,
it's clarify the logical behavior.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit
  2021-12-28 23:04   ` Jarkko Sakkinen
  2021-12-28 23:34     ` Dave Hansen
@ 2022-01-06 18:26     ` Kristen Carlson Accardi
  2022-01-07 12:25       ` Jarkko Sakkinen
  1 sibling, 1 reply; 19+ messages in thread
From: Kristen Carlson Accardi @ 2022-01-06 18:26 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, Jonathan Corbet, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin

Hi Jarkko, thanks for your review,

On Wed, 2021-12-29 at 01:04 +0200, Jarkko Sakkinen wrote:
> On Mon, Dec 20, 2021 at 09:46:39AM -0800, Kristen Carlson Accardi
> wrote:
> > 
> > +
> > +/**
> > + * sgx_charge_mem() - charge for a page used for backing storage
> > + *
> 
> Please remove this empty line:
> 
> https://www.kernel.org/doc/Documentation/kernel-doc-nano-HOWTO.txt

I read this to be that there should be an empty line after the short
description/arg list but before the longer description. I think for
functions without args this is the proper layout. It also is more
readable.

> 
> > + * Backing storage usage is capped by the
> > sgx_nr_available_backing_pages.
> > + * If the backing storage usage is over the overcommit limit,
> 
> Where does this verb "charge" come from?

Charge in this context means that some available backing pages are now
not available because they are in use. It feels appropriate to me to
use "charge/uncharge" verbs for this action, unless you think it's
confusing somehow.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit
  2022-01-06 18:26     ` Kristen Carlson Accardi
@ 2022-01-07 12:25       ` Jarkko Sakkinen
  2022-01-07 17:17         ` Kristen Carlson Accardi
  0 siblings, 1 reply; 19+ messages in thread
From: Jarkko Sakkinen @ 2022-01-07 12:25 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-sgx, Jonathan Corbet, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin

On Thu, Jan 06, 2022 at 10:26:18AM -0800, Kristen Carlson Accardi wrote:
> Hi Jarkko, thanks for your review,
> 
> On Wed, 2021-12-29 at 01:04 +0200, Jarkko Sakkinen wrote:
> > On Mon, Dec 20, 2021 at 09:46:39AM -0800, Kristen Carlson Accardi
> > wrote:
> > > 
> > > +
> > > +/**
> > > + * sgx_charge_mem() - charge for a page used for backing storage
> > > + *
> > 
> > Please remove this empty line:
> > 
> > https://www.kernel.org/doc/Documentation/kernel-doc-nano-HOWTO.txt
> 
> I read this to be that there should be an empty line after the short
> description/arg list but before the longer description. I think for
> functions without args this is the proper layout. It also is more
> readable.
> 
> > 
> > > + * Backing storage usage is capped by the
> > > sgx_nr_available_backing_pages.
> > > + * If the backing storage usage is over the overcommit limit,
> > 
> > Where does this verb "charge" come from?
> 
> Charge in this context means that some available backing pages are now
> not available because they are in use. It feels appropriate to me to
> use "charge/uncharge" verbs for this action, unless you think it's
> confusing somehow.

OK, it's cool.

I'm still wondering why you need extra variable given that
sgx_nr_backing_available_pages is signed. You could mark the
feature being disabled by setting it to -1.

It might cosmetically improve readability to have a boolean but
for practical uses (e.g. eBPF tracing scripts) it only adds extra
complexity.

/Jarkko

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit
  2022-01-07 12:25       ` Jarkko Sakkinen
@ 2022-01-07 17:17         ` Kristen Carlson Accardi
  2022-01-08 15:54           ` Jarkko Sakkinen
  0 siblings, 1 reply; 19+ messages in thread
From: Kristen Carlson Accardi @ 2022-01-07 17:17 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, Jonathan Corbet, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin

On Fri, 2022-01-07 at 14:25 +0200, Jarkko Sakkinen wrote:
> On Thu, Jan 06, 2022 at 10:26:18AM -0800, Kristen Carlson Accardi
> wrote:
> > Hi Jarkko, thanks for your review,
> > 
> > On Wed, 2021-12-29 at 01:04 +0200, Jarkko Sakkinen wrote:
> > > On Mon, Dec 20, 2021 at 09:46:39AM -0800, Kristen Carlson Accardi
> > > wrote:
> > > > +
> > > > +/**
> > > > + * sgx_charge_mem() - charge for a page used for backing
> > > > storage
> > > > + *
> > > 
> > > Please remove this empty line:
> > > 
> > > https://www.kernel.org/doc/Documentation/kernel-doc-nano-HOWTO.txt
> > 
> > I read this to be that there should be an empty line after the
> > short
> > description/arg list but before the longer description. I think for
> > functions without args this is the proper layout. It also is more
> > readable.
> > 
> > > > + * Backing storage usage is capped by the
> > > > sgx_nr_available_backing_pages.
> > > > + * If the backing storage usage is over the overcommit limit,
> > > 
> > > Where does this verb "charge" come from?
> > 
> > Charge in this context means that some available backing pages are
> > now
> > not available because they are in use. It feels appropriate to me
> > to
> > use "charge/uncharge" verbs for this action, unless you think it's
> > confusing somehow.
> 
> OK, it's cool.
> 
> I'm still wondering why you need extra variable given that
> sgx_nr_backing_available_pages is signed. You could mark the
> feature being disabled by setting it to -1.
> 
> It might cosmetically improve readability to have a boolean but
> for practical uses (e.g. eBPF tracing scripts) it only adds extra
> complexity.
> 
> /Jarkko

I can see your point. Since Boris objected to the module param, the
next version will not have a way to disable limits at all, so I am
deleting that boolean all together.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] x86/sgx: account backing pages
  2022-01-05  0:36     ` Dave Hansen
@ 2022-01-08 14:24       ` Jarkko Sakkinen
  0 siblings, 0 replies; 19+ messages in thread
From: Jarkko Sakkinen @ 2022-01-08 14:24 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kristen Carlson Accardi, linux-sgx, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin

On Tue, Jan 04, 2022 at 04:36:28PM -0800, Dave Hansen wrote:
> On 12/28/21 3:37 PM, Jarkko Sakkinen wrote:
> > On Mon, Dec 20, 2021 at 09:46:40AM -0800, Kristen Carlson Accardi wrote:
> >> +int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
> >> +			    struct sgx_backing *backing)
> >> +{
> >> +	return sgx_encl_get_backing(encl, page_index, backing);
> >> +}
> > Is this wrapping necessary?
> 
> Yes, I think so.
> 
> > Also, there is ambiguous terminology:
> > 
> > 1. Local function: "get_backing"
> > 2. Exported function: "lookup_backing"
> 
> I'm not sure what you're getting at.
> 
> There are three important things that you do with backing storage:
> 
> 1. Allocate it
> 2. Find it
> 3. De-allocate (free) it
> 
> Right now, the code has a pattern where it does:
> 
> 	get_backing();
> 	// do something
> 	put_backing();
> 
> That sure as heck looks like it is allocating and freeing it.  But, it's
> actually *maybe* doing an allocation.  The "find it" path also looks
> *EXACTLY* the same as the actual allocation path.  You might also recall
> that the original code didn't even *have* a (real) free path.
> 
> The "wrapping" is really just naming the two different operations that
> use the "get" function: lookup and allocate.  It's not just wrapping,
> it's clarify the logical behavior.

Why it makes sense to keep sgx_encl_get_backing(), if it has zero call
sites and not open-code its implementation to sgx_encl_lookup_backing().

I'm also wondering, why here the function is not named as
sgx_encl_charge_backing(), i.e. follow the naming convention? It would be
easier to remember the flow, when reading the code. Since we use "not as
common name", let's take advantage of it to make maintaining the code
easier later on.

The commit message says:

"Modify the existing flow for requesting backing pages to reduce the
available backing page counter and confirm that the limit has not been
exceeded. Backing page usage for loading EPC pages back out of the shared
memory do not incur a charge."

I would add, in order to make this less abstract:

"
In other words, replace call sites of sgx_encl_get_backing() with either:

* sgx_encl_lookup_backing() for ELDU, which does not cause sgx_charge_mem()
  to be invoked.
* sgx_encl_alloc_backing() for EWB, which does cause sgx_charge_mem()
  to be invoked.
"

It's currently way too abstract description of the code change.

/Jarkko

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit
  2022-01-07 17:17         ` Kristen Carlson Accardi
@ 2022-01-08 15:54           ` Jarkko Sakkinen
  0 siblings, 0 replies; 19+ messages in thread
From: Jarkko Sakkinen @ 2022-01-08 15:54 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-sgx, Jonathan Corbet, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin

On Fri, Jan 07, 2022 at 09:17:03AM -0800, Kristen Carlson Accardi wrote:
> On Fri, 2022-01-07 at 14:25 +0200, Jarkko Sakkinen wrote:
> > On Thu, Jan 06, 2022 at 10:26:18AM -0800, Kristen Carlson Accardi
> > wrote:
> > > Hi Jarkko, thanks for your review,
> > > 
> > > On Wed, 2021-12-29 at 01:04 +0200, Jarkko Sakkinen wrote:
> > > > On Mon, Dec 20, 2021 at 09:46:39AM -0800, Kristen Carlson Accardi
> > > > wrote:
> > > > > +
> > > > > +/**
> > > > > + * sgx_charge_mem() - charge for a page used for backing
> > > > > storage
> > > > > + *
> > > > 
> > > > Please remove this empty line:
> > > > 
> > > > https://www.kernel.org/doc/Documentation/kernel-doc-nano-HOWTO.txt
> > > 
> > > I read this to be that there should be an empty line after the
> > > short
> > > description/arg list but before the longer description. I think for
> > > functions without args this is the proper layout. It also is more
> > > readable.
> > > 
> > > > > + * Backing storage usage is capped by the
> > > > > sgx_nr_available_backing_pages.
> > > > > + * If the backing storage usage is over the overcommit limit,
> > > > 
> > > > Where does this verb "charge" come from?
> > > 
> > > Charge in this context means that some available backing pages are
> > > now
> > > not available because they are in use. It feels appropriate to me
> > > to
> > > use "charge/uncharge" verbs for this action, unless you think it's
> > > confusing somehow.
> > 
> > OK, it's cool.
> > 
> > I'm still wondering why you need extra variable given that
> > sgx_nr_backing_available_pages is signed. You could mark the
> > feature being disabled by setting it to -1.
> > 
> > It might cosmetically improve readability to have a boolean but
> > for practical uses (e.g. eBPF tracing scripts) it only adds extra
> > complexity.
> > 
> > /Jarkko
> 
> I can see your point. Since Boris objected to the module param, the
> next version will not have a way to disable limits at all, so I am
> deleting that boolean all together.

Ok, cool, I'm looking forward for the next version.

/Jarkko

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-01-08 15:55 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-20 17:46 [PATCH 0/2] x86/sgx: Limit EPC overcommit Kristen Carlson Accardi
2021-12-20 17:46 ` [PATCH 1/2] x86/sgx: Add accounting for tracking overcommit Kristen Carlson Accardi
2021-12-20 19:30   ` Borislav Petkov
2021-12-20 20:39     ` Kristen Carlson Accardi
2021-12-20 21:11       ` Borislav Petkov
2021-12-20 21:35         ` Kristen Carlson Accardi
2021-12-20 22:48           ` Borislav Petkov
2021-12-21 15:53             ` Dave Hansen
2021-12-22 14:21           ` Dave Hansen
2021-12-28 23:04   ` Jarkko Sakkinen
2021-12-28 23:34     ` Dave Hansen
2022-01-06 18:26     ` Kristen Carlson Accardi
2022-01-07 12:25       ` Jarkko Sakkinen
2022-01-07 17:17         ` Kristen Carlson Accardi
2022-01-08 15:54           ` Jarkko Sakkinen
2021-12-20 17:46 ` [PATCH 2/2] x86/sgx: account backing pages Kristen Carlson Accardi
2021-12-28 23:37   ` Jarkko Sakkinen
2022-01-05  0:36     ` Dave Hansen
2022-01-08 14:24       ` Jarkko Sakkinen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.