All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] x86/sgx: Set active memcg prior to shmem allocation
@ 2022-05-19 21:04 Kristen Carlson Accardi
  2022-05-19 21:22 ` Shakeel Butt
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Kristen Carlson Accardi @ 2022-05-19 21:04 UTC (permalink / raw)
  To: linux-sgx, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin
  Cc: stable, linux-kernel, linux-mm, mhocko, roman.gushchin, hannes,
	shakeelb, Kristen Carlson Accardi

When the system runs out of enclave memory, SGX can reclaim EPC pages
by swapping to normal RAM. These backing pages are allocated via a
per-enclave shared memory area. Since SGX allows unlimited over
commit on EPC memory, the reclaimer thread can allocate a large
number of backing RAM pages in response to EPC memory pressure.

When the shared memory backing RAM allocation occurs during
the reclaimer thread context, the shared memory is charged to
the root memory control group, and the shmem usage of the enclave
is not properly accounted for, making cgroups ineffective at
limiting the amount of RAM an enclave can consume.

For example, when using a cgroup to launch a set of test
enclaves, the kernel does not properly account for 50% - 75% of
shmem page allocations on average. In the worst case, when
nearly all allocations occur during the reclaimer thread, the
kernel accounts less than a percent of the amount of shmem used
by the enclave's cgroup to the correct cgroup.

SGX stores a list of mm_structs that are associated with
an enclave. Pick one of them during reclaim and charge that
mm's memcg with the shmem allocation. The one that gets picked
is arbitrary, but this list almost always only has one mm. The
cases where there is more than one mm with different memcg's
are not worth considering.

Create a new function - sgx_encl_alloc_backing(). This function
is used whenever a new backing storage page needs to be
allocated. Previously the same function was used for page
allocation as well as retrieving a previously allocated page.
Prior to backing page allocation, if there is a mm_struct associated
with the enclave that is requesting the allocation, it is set
as the active memory control group.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
V1 -> V2:
 Changed sgx_encl_set_active_memcg() to simply return the correct
 memcg for the enclave and renamed to sgx_encl_get_mem_cgroup().

 Created helper function current_is_ksgxd() to improve readability.

 Use mmget_not_zero()/mmput_async() when searching mm_list.

 Move call to set_active_memcg() to sgx_encl_alloc_backing() and
 use mem_cgroup_put() to avoid leaking a memcg reference.

 Address review feedback regarding comments and commit log.

 arch/x86/kernel/cpu/sgx/encl.c | 109 ++++++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/encl.h |  11 +++-
 arch/x86/kernel/cpu/sgx/main.c |   4 +-
 3 files changed, 118 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 001808e3901c..6d10202612d6 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -32,7 +32,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	else
 		page_index = PFN_DOWN(encl->size);
 
-	ret = sgx_encl_get_backing(encl, page_index, &b);
+	ret = sgx_encl_lookup_backing(encl, page_index, &b);
 	if (ret)
 		return ret;
 
@@ -574,7 +574,7 @@ static struct page *sgx_encl_get_backing_page(struct sgx_encl *encl,
  *   0 on success,
  *   -errno otherwise.
  */
-int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
+static int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
 			 struct sgx_backing *backing)
 {
 	pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >> 5);
@@ -601,6 +601,111 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
 	return 0;
 }
 
+/*
+ * When called from ksgxd, returns the mem_cgroup of a struct mm stored
+ * in the enclave's mm_list. When not called from ksgxd, just returns
+ * the mem_cgroup of the current task.
+ */
+static struct mem_cgroup *sgx_encl_get_mem_cgroup(struct sgx_encl *encl)
+{
+	struct mem_cgroup *memcg = NULL;
+	struct sgx_encl_mm *encl_mm;
+	int idx;
+
+	/*
+	 * If called from normal task context, return the mem_cgroup
+	 * of the current task's mm. The remainder of the handling is for
+	 * ksgxd.
+	 */
+	if (!current_is_ksgxd())
+		return get_mem_cgroup_from_mm(current->mm);
+
+	/*
+	 * Search the enclave's mm_list to find an mm associated with
+	 * this enclave to charge the allocation to.
+	 */
+	idx = srcu_read_lock(&encl->srcu);
+
+	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
+		if (!mmget_not_zero(encl_mm->mm))
+			continue;
+
+		memcg = get_mem_cgroup_from_mm(encl_mm->mm);
+
+		mmput_async(encl_mm->mm);
+
+		break;
+	}
+
+	srcu_read_unlock(&encl->srcu, idx);
+
+	/*
+	 * In the rare case that there isn't an mm associated with
+	 * the enclave, set memcg to the current active mem_cgroup.
+	 * This will be the root mem_cgroup if there is no active
+	 * mem_cgroup.
+	 */
+	if (!memcg)
+		return get_mem_cgroup_from_mm(NULL);
+
+	return memcg;
+}
+
+/**
+ * sgx_encl_alloc_backing() - allocate a new backing storage page
+ * @encl:	an enclave pointer
+ * @page_index:	enclave page index
+ * @backing:	data for accessing backing storage for the page
+ *
+ * When called from ksgxd, sets the active memcg from one of the
+ * mms in the enclave's mm_list prior to any backing page allocation,
+ * in order to ensure that shmem page allocations are charged to the
+ * enclave.
+ *
+ * Return:
+ *   0 on success,
+ *   -errno otherwise.
+ */
+int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index,
+			   struct sgx_backing *backing)
+{
+	struct mem_cgroup *memcg, *old_memcg;
+	int ret;
+
+	memcg = sgx_encl_get_mem_cgroup(encl);
+
+	old_memcg = set_active_memcg(memcg);
+
+	ret = sgx_encl_get_backing(encl, page_index, backing);
+
+	set_active_memcg(old_memcg);
+
+	mem_cgroup_put(memcg);
+
+	return ret;
+}
+
+/**
+ * sgx_encl_lookup_backing() - retrieve an existing backing storage page
+ * @encl:	an enclave pointer
+ * @page_index:	enclave page index
+ * @backing:	data for accessing backing storage for the page
+ *
+ * Retrieve a backing page for loading data back into an EPC page with ELDU.
+ * It is the caller's responsibility to ensure that it is appropriate to use
+ * sgx_encl_lookup_backing() rather than sgx_encl_alloc_backing(). If lookup is
+ * not used correctly, this will cause an allocation which is not accounted for.
+ *
+ * Return:
+ *   0 on success,
+ *   -errno otherwise.
+ */
+int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
+			   struct sgx_backing *backing)
+{
+	return sgx_encl_get_backing(encl, page_index, backing);
+}
+
 /**
  * sgx_encl_put_backing() - Unpin the backing storage
  * @backing:	data for accessing backing storage for the page
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index fec43ca65065..2de3b150ab00 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -100,13 +100,20 @@ static inline int sgx_encl_find(struct mm_struct *mm, unsigned long addr,
 	return 0;
 }
 
+static inline bool current_is_ksgxd(void)
+{
+	return current->mm ? false : true;
+}
+
 int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
 		     unsigned long end, unsigned long vm_flags);
 
 void sgx_encl_release(struct kref *ref);
 int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
-int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
-			 struct sgx_backing *backing);
+int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
+			    struct sgx_backing *backing);
+int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index,
+			   struct sgx_backing *backing);
 void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 4b41efc9e367..7d41c8538795 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -310,7 +310,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 	encl->secs_child_cnt--;
 
 	if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
-		ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
+		ret = sgx_encl_alloc_backing(encl, PFN_DOWN(encl->size),
 					   &secs_backing);
 		if (ret)
 			goto out;
@@ -381,7 +381,7 @@ static void sgx_reclaim_pages(void)
 			goto skip;
 
 		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
-		ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
+		ret = sgx_encl_alloc_backing(encl_page->encl, page_index, &backing[i]);
 		if (ret)
 			goto skip;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] x86/sgx: Set active memcg prior to shmem allocation
  2022-05-19 21:04 [PATCH v2] x86/sgx: Set active memcg prior to shmem allocation Kristen Carlson Accardi
@ 2022-05-19 21:22 ` Shakeel Butt
  2022-05-20 15:44   ` Kristen Carlson Accardi
  2022-05-19 22:31 ` Jarkko Sakkinen
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Shakeel Butt @ 2022-05-19 21:22 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-sgx, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, stable, LKML,
	Linux MM, Michal Hocko, Roman Gushchin, Johannes Weiner

On Thu, May 19, 2022 at 2:05 PM Kristen Carlson Accardi
<kristen@linux.intel.com> wrote:
>
> When the system runs out of enclave memory, SGX can reclaim EPC pages
> by swapping to normal RAM. These backing pages are allocated via a
> per-enclave shared memory area. Since SGX allows unlimited over
> commit on EPC memory, the reclaimer thread can allocate a large
> number of backing RAM pages in response to EPC memory pressure.
>
> When the shared memory backing RAM allocation occurs during
> the reclaimer thread context, the shared memory is charged to
> the root memory control group, and the shmem usage of the enclave
> is not properly accounted for, making cgroups ineffective at
> limiting the amount of RAM an enclave can consume.
>
> For example, when using a cgroup to launch a set of test
> enclaves, the kernel does not properly account for 50% - 75% of
> shmem page allocations on average. In the worst case, when
> nearly all allocations occur during the reclaimer thread, the
> kernel accounts less than a percent of the amount of shmem used
> by the enclave's cgroup to the correct cgroup.
>
> SGX stores a list of mm_structs that are associated with
> an enclave. Pick one of them during reclaim and charge that
> mm's memcg with the shmem allocation. The one that gets picked
> is arbitrary, but this list almost always only has one mm. The
> cases where there is more than one mm with different memcg's
> are not worth considering.
>
> Create a new function - sgx_encl_alloc_backing(). This function
> is used whenever a new backing storage page needs to be
> allocated. Previously the same function was used for page
> allocation as well as retrieving a previously allocated page.
> Prior to backing page allocation, if there is a mm_struct associated
> with the enclave that is requesting the allocation, it is set
> as the active memory control group.
>
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>

For the memcg part:

Reviewed-by: Shakeel Butt <shakeelb@google.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] x86/sgx: Set active memcg prior to shmem allocation
  2022-05-19 21:04 [PATCH v2] x86/sgx: Set active memcg prior to shmem allocation Kristen Carlson Accardi
  2022-05-19 21:22 ` Shakeel Butt
@ 2022-05-19 22:31 ` Jarkko Sakkinen
  2022-05-20 15:43   ` Kristen Carlson Accardi
  2022-05-19 23:01 ` Greg KH
  2022-05-20  1:24 ` Roman Gushchin
  3 siblings, 1 reply; 8+ messages in thread
From: Jarkko Sakkinen @ 2022-05-19 22:31 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-sgx, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, stable, linux-kernel,
	linux-mm, mhocko, roman.gushchin, hannes, shakeelb

On Thu, May 19, 2022 at 02:04:45PM -0700, Kristen Carlson Accardi wrote:
> When the system runs out of enclave memory, SGX can reclaim EPC pages
> by swapping to normal RAM. These backing pages are allocated via a
> per-enclave shared memory area. Since SGX allows unlimited over
> commit on EPC memory, the reclaimer thread can allocate a large
> number of backing RAM pages in response to EPC memory pressure.
> 
> When the shared memory backing RAM allocation occurs during
> the reclaimer thread context, the shared memory is charged to
> the root memory control group, and the shmem usage of the enclave
> is not properly accounted for, making cgroups ineffective at
> limiting the amount of RAM an enclave can consume.
> 
> For example, when using a cgroup to launch a set of test
> enclaves, the kernel does not properly account for 50% - 75% of
> shmem page allocations on average. In the worst case, when
> nearly all allocations occur during the reclaimer thread, the
> kernel accounts less than a percent of the amount of shmem used
> by the enclave's cgroup to the correct cgroup.
> 
> SGX stores a list of mm_structs that are associated with
> an enclave. Pick one of them during reclaim and charge that
> mm's memcg with the shmem allocation. The one that gets picked
> is arbitrary, but this list almost always only has one mm. The
> cases where there is more than one mm with different memcg's
> are not worth considering.
> 
> Create a new function - sgx_encl_alloc_backing(). This function
> is used whenever a new backing storage page needs to be
> allocated. Previously the same function was used for page
> allocation as well as retrieving a previously allocated page.
> Prior to backing page allocation, if there is a mm_struct associated
> with the enclave that is requesting the allocation, it is set
> as the active memory control group.
> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> ---
> V1 -> V2:
>  Changed sgx_encl_set_active_memcg() to simply return the correct
>  memcg for the enclave and renamed to sgx_encl_get_mem_cgroup().
> 
>  Created helper function current_is_ksgxd() to improve readability.
> 
>  Use mmget_not_zero()/mmput_async() when searching mm_list.
> 
>  Move call to set_active_memcg() to sgx_encl_alloc_backing() and
>  use mem_cgroup_put() to avoid leaking a memcg reference.
> 
>  Address review feedback regarding comments and commit log.
> 
>  arch/x86/kernel/cpu/sgx/encl.c | 109 ++++++++++++++++++++++++++++++++-
>  arch/x86/kernel/cpu/sgx/encl.h |  11 +++-
>  arch/x86/kernel/cpu/sgx/main.c |   4 +-
>  3 files changed, 118 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 001808e3901c..6d10202612d6 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -32,7 +32,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  	else
>  		page_index = PFN_DOWN(encl->size);
>  
> -	ret = sgx_encl_get_backing(encl, page_index, &b);
> +	ret = sgx_encl_lookup_backing(encl, page_index, &b);
>  	if (ret)
>  		return ret;
>  
> @@ -574,7 +574,7 @@ static struct page *sgx_encl_get_backing_page(struct sgx_encl *encl,
>   *   0 on success,
>   *   -errno otherwise.
>   */
> -int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
> +static int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
>  			 struct sgx_backing *backing)
>  {
>  	pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >> 5);
> @@ -601,6 +601,111 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
>  	return 0;
>  }
>  
> +/*
> + * When called from ksgxd, returns the mem_cgroup of a struct mm stored
> + * in the enclave's mm_list. When not called from ksgxd, just returns
> + * the mem_cgroup of the current task.
> + */
> +static struct mem_cgroup *sgx_encl_get_mem_cgroup(struct sgx_encl *encl)
> +{
> +	struct mem_cgroup *memcg = NULL;
> +	struct sgx_encl_mm *encl_mm;
> +	int idx;
> +
> +	/*
> +	 * If called from normal task context, return the mem_cgroup
> +	 * of the current task's mm. The remainder of the handling is for
> +	 * ksgxd.
> +	 */
> +	if (!current_is_ksgxd())
> +		return get_mem_cgroup_from_mm(current->mm);
> +
> +	/*
> +	 * Search the enclave's mm_list to find an mm associated with
> +	 * this enclave to charge the allocation to.
> +	 */
> +	idx = srcu_read_lock(&encl->srcu);
> +
> +	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
> +		if (!mmget_not_zero(encl_mm->mm))
> +			continue;
> +
> +		memcg = get_mem_cgroup_from_mm(encl_mm->mm);
> +
> +		mmput_async(encl_mm->mm);
> +
> +		break;
> +	}
> +
> +	srcu_read_unlock(&encl->srcu, idx);
> +
> +	/*
> +	 * In the rare case that there isn't an mm associated with
> +	 * the enclave, set memcg to the current active mem_cgroup.
> +	 * This will be the root mem_cgroup if there is no active
> +	 * mem_cgroup.
> +	 */
> +	if (!memcg)
> +		return get_mem_cgroup_from_mm(NULL);
> +
> +	return memcg;
> +}
> +
> +/**
> + * sgx_encl_alloc_backing() - allocate a new backing storage page
> + * @encl:	an enclave pointer
> + * @page_index:	enclave page index
> + * @backing:	data for accessing backing storage for the page
> + *
> + * When called from ksgxd, sets the active memcg from one of the
> + * mms in the enclave's mm_list prior to any backing page allocation,
> + * in order to ensure that shmem page allocations are charged to the
> + * enclave.
> + *
> + * Return:
> + *   0 on success,
> + *   -errno otherwise.
> + */
> +int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index,
> +			   struct sgx_backing *backing)
> +{
> +	struct mem_cgroup *memcg, *old_memcg;
> +	int ret;
> +
> +	memcg = sgx_encl_get_mem_cgroup(encl);
> +
> +	old_memcg = set_active_memcg(memcg);
> +
> +	ret = sgx_encl_get_backing(encl, page_index, backing);
> +
> +	set_active_memcg(old_memcg);
> +
> +	mem_cgroup_put(memcg);

This is too sparse IMHO.

I would rewrite it as:

	struct mem_cgroup *encl_memcg = sgx_encl_get_mem_cgroup(encl);
	struct mem_cgroup *memcg = set_active_memcg(encl_memcg);
	int ret;

	ret = sgx_encl_get_backing(encl, page_index, backing);

	set_active_memcg(memcg);
	mem_cgroup_put(encl_memcg);

I think old_memcg is not very documentative, whereas this also dclearly
tells what is going on: enclave's memcg is temporarily swapped in order to
perform page allocation, so that the allocations gets accounted from them
correct cgroup.

> +
> +	return ret;
> +}
> +
> +/**
> + * sgx_encl_lookup_backing() - retrieve an existing backing storage page
> + * @encl:	an enclave pointer
> + * @page_index:	enclave page index
> + * @backing:	data for accessing backing storage for the page
> + *
> + * Retrieve a backing page for loading data back into an EPC page with ELDU.
> + * It is the caller's responsibility to ensure that it is appropriate to use
> + * sgx_encl_lookup_backing() rather than sgx_encl_alloc_backing(). If lookup is
> + * not used correctly, this will cause an allocation which is not accounted for.
> + *
> + * Return:
> + *   0 on success,
> + *   -errno otherwise.
> + */
> +int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
> +			   struct sgx_backing *backing)
> +{
> +	return sgx_encl_get_backing(encl, page_index, backing);
> +}
> +
>  /**
>   * sgx_encl_put_backing() - Unpin the backing storage
>   * @backing:	data for accessing backing storage for the page
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index fec43ca65065..2de3b150ab00 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -100,13 +100,20 @@ static inline int sgx_encl_find(struct mm_struct *mm, unsigned long addr,
>  	return 0;
>  }
>  
> +static inline bool current_is_ksgxd(void)
> +{
> +	return current->mm ? false : true;
> +}
> +
>  int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
>  		     unsigned long end, unsigned long vm_flags);
>  
>  void sgx_encl_release(struct kref *ref);
>  int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
> -int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
> -			 struct sgx_backing *backing);
> +int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
> +			    struct sgx_backing *backing);
> +int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index,
> +			   struct sgx_backing *backing);
>  void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
>  int sgx_encl_test_and_clear_young(struct mm_struct *mm,
>  				  struct sgx_encl_page *page);
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 4b41efc9e367..7d41c8538795 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -310,7 +310,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
>  	encl->secs_child_cnt--;
>  
>  	if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
> -		ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
> +		ret = sgx_encl_alloc_backing(encl, PFN_DOWN(encl->size),
>  					   &secs_backing);
>  		if (ret)
>  			goto out;
> @@ -381,7 +381,7 @@ static void sgx_reclaim_pages(void)
>  			goto skip;
>  
>  		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
> -		ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
> +		ret = sgx_encl_alloc_backing(encl_page->encl, page_index, &backing[i]);
>  		if (ret)
>  			goto skip;
>  
> -- 
> 2.20.1
> 

BR, Jarkko

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] x86/sgx: Set active memcg prior to shmem allocation
  2022-05-19 21:04 [PATCH v2] x86/sgx: Set active memcg prior to shmem allocation Kristen Carlson Accardi
  2022-05-19 21:22 ` Shakeel Butt
  2022-05-19 22:31 ` Jarkko Sakkinen
@ 2022-05-19 23:01 ` Greg KH
  2022-05-20  1:24 ` Roman Gushchin
  3 siblings, 0 replies; 8+ messages in thread
From: Greg KH @ 2022-05-19 23:01 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-sgx, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, stable,
	linux-kernel, linux-mm, mhocko, roman.gushchin, hannes, shakeelb

On Thu, May 19, 2022 at 02:04:45PM -0700, Kristen Carlson Accardi wrote:
> When the system runs out of enclave memory, SGX can reclaim EPC pages
> by swapping to normal RAM. These backing pages are allocated via a
> per-enclave shared memory area. Since SGX allows unlimited over
> commit on EPC memory, the reclaimer thread can allocate a large
> number of backing RAM pages in response to EPC memory pressure.
> 
> When the shared memory backing RAM allocation occurs during
> the reclaimer thread context, the shared memory is charged to
> the root memory control group, and the shmem usage of the enclave
> is not properly accounted for, making cgroups ineffective at
> limiting the amount of RAM an enclave can consume.
> 
> For example, when using a cgroup to launch a set of test
> enclaves, the kernel does not properly account for 50% - 75% of
> shmem page allocations on average. In the worst case, when
> nearly all allocations occur during the reclaimer thread, the
> kernel accounts less than a percent of the amount of shmem used
> by the enclave's cgroup to the correct cgroup.
> 
> SGX stores a list of mm_structs that are associated with
> an enclave. Pick one of them during reclaim and charge that
> mm's memcg with the shmem allocation. The one that gets picked
> is arbitrary, but this list almost always only has one mm. The
> cases where there is more than one mm with different memcg's
> are not worth considering.
> 
> Create a new function - sgx_encl_alloc_backing(). This function
> is used whenever a new backing storage page needs to be
> allocated. Previously the same function was used for page
> allocation as well as retrieving a previously allocated page.
> Prior to backing page allocation, if there is a mm_struct associated
> with the enclave that is requesting the allocation, it is set
> as the active memory control group.
> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> ---
> V1 -> V2:
>  Changed sgx_encl_set_active_memcg() to simply return the correct
>  memcg for the enclave and renamed to sgx_encl_get_mem_cgroup().
> 
>  Created helper function current_is_ksgxd() to improve readability.
> 
>  Use mmget_not_zero()/mmput_async() when searching mm_list.
> 
>  Move call to set_active_memcg() to sgx_encl_alloc_backing() and
>  use mem_cgroup_put() to avoid leaking a memcg reference.
> 
>  Address review feedback regarding comments and commit log.
> 

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] x86/sgx: Set active memcg prior to shmem allocation
  2022-05-19 21:04 [PATCH v2] x86/sgx: Set active memcg prior to shmem allocation Kristen Carlson Accardi
                   ` (2 preceding siblings ...)
  2022-05-19 23:01 ` Greg KH
@ 2022-05-20  1:24 ` Roman Gushchin
  2022-05-20 15:40   ` Kristen Carlson Accardi
  3 siblings, 1 reply; 8+ messages in thread
From: Roman Gushchin @ 2022-05-20  1:24 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: linux-sgx, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, stable,
	linux-kernel, linux-mm, mhocko, hannes, shakeelb

On Thu, May 19, 2022 at 02:04:45PM -0700, Kristen Carlson Accardi wrote:
> When the system runs out of enclave memory, SGX can reclaim EPC pages
> by swapping to normal RAM. These backing pages are allocated via a
> per-enclave shared memory area. Since SGX allows unlimited over
> commit on EPC memory, the reclaimer thread can allocate a large
> number of backing RAM pages in response to EPC memory pressure.
> 
> When the shared memory backing RAM allocation occurs during
> the reclaimer thread context, the shared memory is charged to
> the root memory control group, and the shmem usage of the enclave
> is not properly accounted for, making cgroups ineffective at
> limiting the amount of RAM an enclave can consume.
> 
> For example, when using a cgroup to launch a set of test
> enclaves, the kernel does not properly account for 50% - 75% of
> shmem page allocations on average. In the worst case, when
> nearly all allocations occur during the reclaimer thread, the
> kernel accounts less than a percent of the amount of shmem used
> by the enclave's cgroup to the correct cgroup.
> 
> SGX stores a list of mm_structs that are associated with
> an enclave. Pick one of them during reclaim and charge that
> mm's memcg with the shmem allocation. The one that gets picked
> is arbitrary, but this list almost always only has one mm. The
> cases where there is more than one mm with different memcg's
> are not worth considering.
> 
> Create a new function - sgx_encl_alloc_backing(). This function
> is used whenever a new backing storage page needs to be
> allocated. Previously the same function was used for page
> allocation as well as retrieving a previously allocated page.
> Prior to backing page allocation, if there is a mm_struct associated
> with the enclave that is requesting the allocation, it is set
> as the active memory control group.
> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> ---
> V1 -> V2:
>  Changed sgx_encl_set_active_memcg() to simply return the correct
>  memcg for the enclave and renamed to sgx_encl_get_mem_cgroup().
> 
>  Created helper function current_is_ksgxd() to improve readability.
> 
>  Use mmget_not_zero()/mmput_async() when searching mm_list.
> 
>  Move call to set_active_memcg() to sgx_encl_alloc_backing() and
>  use mem_cgroup_put() to avoid leaking a memcg reference.
> 
>  Address review feedback regarding comments and commit log.
> 
>  arch/x86/kernel/cpu/sgx/encl.c | 109 ++++++++++++++++++++++++++++++++-
>  arch/x86/kernel/cpu/sgx/encl.h |  11 +++-
>  arch/x86/kernel/cpu/sgx/main.c |   4 +-
>  3 files changed, 118 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 001808e3901c..6d10202612d6 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -32,7 +32,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  	else
>  		page_index = PFN_DOWN(encl->size);
>  
> -	ret = sgx_encl_get_backing(encl, page_index, &b);
> +	ret = sgx_encl_lookup_backing(encl, page_index, &b);
>  	if (ret)
>  		return ret;
>  
> @@ -574,7 +574,7 @@ static struct page *sgx_encl_get_backing_page(struct sgx_encl *encl,
>   *   0 on success,
>   *   -errno otherwise.
>   */
> -int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
> +static int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
>  			 struct sgx_backing *backing)
>  {
>  	pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >> 5);
> @@ -601,6 +601,111 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
>  	return 0;
>  }
>  
> +/*
> + * When called from ksgxd, returns the mem_cgroup of a struct mm stored
> + * in the enclave's mm_list. When not called from ksgxd, just returns
> + * the mem_cgroup of the current task.
> + */
> +static struct mem_cgroup *sgx_encl_get_mem_cgroup(struct sgx_encl *encl)
> +{
> +	struct mem_cgroup *memcg = NULL;
> +	struct sgx_encl_mm *encl_mm;
> +	int idx;
> +
> +	/*
> +	 * If called from normal task context, return the mem_cgroup
> +	 * of the current task's mm. The remainder of the handling is for
> +	 * ksgxd.
> +	 */
> +	if (!current_is_ksgxd())
> +		return get_mem_cgroup_from_mm(current->mm);
> +
> +	/*
> +	 * Search the enclave's mm_list to find an mm associated with
> +	 * this enclave to charge the allocation to.
> +	 */
> +	idx = srcu_read_lock(&encl->srcu);
> +
> +	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
> +		if (!mmget_not_zero(encl_mm->mm))
> +			continue;
> +
> +		memcg = get_mem_cgroup_from_mm(encl_mm->mm);
> +
> +		mmput_async(encl_mm->mm);
> +
> +		break;
> +	}
> +
> +	srcu_read_unlock(&encl->srcu, idx);
> +
> +	/*
> +	 * In the rare case that there isn't an mm associated with
> +	 * the enclave, set memcg to the current active mem_cgroup.
> +	 * This will be the root mem_cgroup if there is no active
> +	 * mem_cgroup.
> +	 */
> +	if (!memcg)
> +		return get_mem_cgroup_from_mm(NULL);
> +
> +	return memcg;
> +}


You can simplify the function a bit. But it's up to you, not a strong
opinion.

static struct mem_cgroup *sgx_encl_get_mem_cgroup(struct sgx_encl *encl)
{
	struct mem_cgroup *memcg = NULL;
	struct sgx_encl_mm *encl_mm;
	int idx;

	if (current_is_ksgxd()) {
		/*
		 * Search the enclave's mm_list to find an mm associated with
		 * this enclave to charge the allocation to.
		 */
		idx = srcu_read_lock(&encl->srcu);
		list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
			if (!mmget_not_zero(encl_mm->mm))
				continue;

			memcg = get_mem_cgroup_from_mm(encl_mm->mm);
			mmput_async(encl_mm->mm);
			break;
		}
		srcu_read_unlock(&encl->srcu, idx);
	}

	return memcg ? memcg : get_mem_cgroup_from_mm(current->mm);
}

--

The rest of the patch looks good to me. Please, feel free to add:

Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

Thanks!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] x86/sgx: Set active memcg prior to shmem allocation
  2022-05-20  1:24 ` Roman Gushchin
@ 2022-05-20 15:40   ` Kristen Carlson Accardi
  0 siblings, 0 replies; 8+ messages in thread
From: Kristen Carlson Accardi @ 2022-05-20 15:40 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-sgx, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, stable,
	linux-kernel, linux-mm, mhocko, hannes, shakeelb

On Thu, 2022-05-19 at 18:24 -0700, Roman Gushchin wrote:
> On Thu, May 19, 2022 at 02:04:45PM -0700, Kristen Carlson Accardi
> wrote:
> > When the system runs out of enclave memory, SGX can reclaim EPC
> > pages
> > by swapping to normal RAM. These backing pages are allocated via a
> > per-enclave shared memory area. Since SGX allows unlimited over
> > commit on EPC memory, the reclaimer thread can allocate a large
> > number of backing RAM pages in response to EPC memory pressure.
> > 
> > When the shared memory backing RAM allocation occurs during
> > the reclaimer thread context, the shared memory is charged to
> > the root memory control group, and the shmem usage of the enclave
> > is not properly accounted for, making cgroups ineffective at
> > limiting the amount of RAM an enclave can consume.
> > 
> > For example, when using a cgroup to launch a set of test
> > enclaves, the kernel does not properly account for 50% - 75% of
> > shmem page allocations on average. In the worst case, when
> > nearly all allocations occur during the reclaimer thread, the
> > kernel accounts less than a percent of the amount of shmem used
> > by the enclave's cgroup to the correct cgroup.
> > 
> > SGX stores a list of mm_structs that are associated with
> > an enclave. Pick one of them during reclaim and charge that
> > mm's memcg with the shmem allocation. The one that gets picked
> > is arbitrary, but this list almost always only has one mm. The
> > cases where there is more than one mm with different memcg's
> > are not worth considering.
> > 
> > Create a new function - sgx_encl_alloc_backing(). This function
> > is used whenever a new backing storage page needs to be
> > allocated. Previously the same function was used for page
> > allocation as well as retrieving a previously allocated page.
> > Prior to backing page allocation, if there is a mm_struct
> > associated
> > with the enclave that is requesting the allocation, it is set
> > as the active memory control group.
> > 
> > Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> > ---
> > V1 -> V2:
> >  Changed sgx_encl_set_active_memcg() to simply return the correct
> >  memcg for the enclave and renamed to sgx_encl_get_mem_cgroup().
> > 
> >  Created helper function current_is_ksgxd() to improve readability.
> > 
> >  Use mmget_not_zero()/mmput_async() when searching mm_list.
> > 
> >  Move call to set_active_memcg() to sgx_encl_alloc_backing() and
> >  use mem_cgroup_put() to avoid leaking a memcg reference.
> > 
> >  Address review feedback regarding comments and commit log.
> > 
> >  arch/x86/kernel/cpu/sgx/encl.c | 109
> > ++++++++++++++++++++++++++++++++-
> >  arch/x86/kernel/cpu/sgx/encl.h |  11 +++-
> >  arch/x86/kernel/cpu/sgx/main.c |   4 +-
> >  3 files changed, 118 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c
> > b/arch/x86/kernel/cpu/sgx/encl.c
> > index 001808e3901c..6d10202612d6 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -32,7 +32,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page
> > *encl_page,
> >  	else
> >  		page_index = PFN_DOWN(encl->size);
> >  
> > -	ret = sgx_encl_get_backing(encl, page_index, &b);
> > +	ret = sgx_encl_lookup_backing(encl, page_index, &b);
> >  	if (ret)
> >  		return ret;
> >  
> > @@ -574,7 +574,7 @@ static struct page
> > *sgx_encl_get_backing_page(struct sgx_encl *encl,
> >   *   0 on success,
> >   *   -errno otherwise.
> >   */
> > -int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long
> > page_index,
> > +static int sgx_encl_get_backing(struct sgx_encl *encl, unsigned
> > long page_index,
> >  			 struct sgx_backing *backing)
> >  {
> >  	pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >>
> > 5);
> > @@ -601,6 +601,111 @@ int sgx_encl_get_backing(struct sgx_encl
> > *encl, unsigned long page_index,
> >  	return 0;
> >  }
> >  
> > +/*
> > + * When called from ksgxd, returns the mem_cgroup of a struct mm
> > stored
> > + * in the enclave's mm_list. When not called from ksgxd, just
> > returns
> > + * the mem_cgroup of the current task.
> > + */
> > +static struct mem_cgroup *sgx_encl_get_mem_cgroup(struct sgx_encl
> > *encl)
> > +{
> > +	struct mem_cgroup *memcg = NULL;
> > +	struct sgx_encl_mm *encl_mm;
> > +	int idx;
> > +
> > +	/*
> > +	 * If called from normal task context, return the mem_cgroup
> > +	 * of the current task's mm. The remainder of the handling is
> > for
> > +	 * ksgxd.
> > +	 */
> > +	if (!current_is_ksgxd())
> > +		return get_mem_cgroup_from_mm(current->mm);
> > +
> > +	/*
> > +	 * Search the enclave's mm_list to find an mm associated with
> > +	 * this enclave to charge the allocation to.
> > +	 */
> > +	idx = srcu_read_lock(&encl->srcu);
> > +
> > +	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
> > +		if (!mmget_not_zero(encl_mm->mm))
> > +			continue;
> > +
> > +		memcg = get_mem_cgroup_from_mm(encl_mm->mm);
> > +
> > +		mmput_async(encl_mm->mm);
> > +
> > +		break;
> > +	}
> > +
> > +	srcu_read_unlock(&encl->srcu, idx);
> > +
> > +	/*
> > +	 * In the rare case that there isn't an mm associated with
> > +	 * the enclave, set memcg to the current active mem_cgroup.
> > +	 * This will be the root mem_cgroup if there is no active
> > +	 * mem_cgroup.
> > +	 */
> > +	if (!memcg)
> > +		return get_mem_cgroup_from_mm(NULL);
> > +
> > +	return memcg;
> > +}
> 
> You can simplify the function a bit. But it's up to you, not a strong
> opinion.
> 
> static struct mem_cgroup *sgx_encl_get_mem_cgroup(struct sgx_encl
> *encl)
> {
> 	struct mem_cgroup *memcg = NULL;
> 	struct sgx_encl_mm *encl_mm;
> 	int idx;
> 
> 	if (current_is_ksgxd()) {
> 		/*
> 		 * Search the enclave's mm_list to find an mm
> associated with
> 		 * this enclave to charge the allocation to.
> 		 */
> 		idx = srcu_read_lock(&encl->srcu);
> 		list_for_each_entry_rcu(encl_mm, &encl->mm_list, list)
> {
> 			if (!mmget_not_zero(encl_mm->mm))
> 				continue;
> 
> 			memcg = get_mem_cgroup_from_mm(encl_mm->mm);
> 			mmput_async(encl_mm->mm);
> 			break;
> 		}
> 		srcu_read_unlock(&encl->srcu, idx);
> 	}
> 
> 	return memcg ? memcg : get_mem_cgroup_from_mm(current->mm);
> }
> 

I don't have strong opinions on this either - I actually had it written
this way originally but then decided maybe other people would find it
more readable the other way. I definitely don't care either way.

> --
> 
> The rest of the patch looks good to me. Please, feel free to add:
> 
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
> 
> Thanks!

Thanks so much for your review, I will add this to v3.

Kristen



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] x86/sgx: Set active memcg prior to shmem allocation
  2022-05-19 22:31 ` Jarkko Sakkinen
@ 2022-05-20 15:43   ` Kristen Carlson Accardi
  0 siblings, 0 replies; 8+ messages in thread
From: Kristen Carlson Accardi @ 2022-05-20 15:43 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, stable, linux-kernel,
	linux-mm, mhocko, roman.gushchin, hannes, shakeelb

On Fri, 2022-05-20 at 01:31 +0300, Jarkko Sakkinen wrote:
> On Thu, May 19, 2022 at 02:04:45PM -0700, Kristen Carlson Accardi
> wrote:
> > When the system runs out of enclave memory, SGX can reclaim EPC
> > pages
> > by swapping to normal RAM. These backing pages are allocated via a
> > per-enclave shared memory area. Since SGX allows unlimited over
> > commit on EPC memory, the reclaimer thread can allocate a large
> > number of backing RAM pages in response to EPC memory pressure.
> > 
> > When the shared memory backing RAM allocation occurs during
> > the reclaimer thread context, the shared memory is charged to
> > the root memory control group, and the shmem usage of the enclave
> > is not properly accounted for, making cgroups ineffective at
> > limiting the amount of RAM an enclave can consume.
> > 
> > For example, when using a cgroup to launch a set of test
> > enclaves, the kernel does not properly account for 50% - 75% of
> > shmem page allocations on average. In the worst case, when
> > nearly all allocations occur during the reclaimer thread, the
> > kernel accounts less than a percent of the amount of shmem used
> > by the enclave's cgroup to the correct cgroup.
> > 
> > SGX stores a list of mm_structs that are associated with
> > an enclave. Pick one of them during reclaim and charge that
> > mm's memcg with the shmem allocation. The one that gets picked
> > is arbitrary, but this list almost always only has one mm. The
> > cases where there is more than one mm with different memcg's
> > are not worth considering.
> > 
> > Create a new function - sgx_encl_alloc_backing(). This function
> > is used whenever a new backing storage page needs to be
> > allocated. Previously the same function was used for page
> > allocation as well as retrieving a previously allocated page.
> > Prior to backing page allocation, if there is a mm_struct
> > associated
> > with the enclave that is requesting the allocation, it is set
> > as the active memory control group.
> > 
> > Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> > ---
> > V1 -> V2:
> >  Changed sgx_encl_set_active_memcg() to simply return the correct
> >  memcg for the enclave and renamed to sgx_encl_get_mem_cgroup().
> > 
> >  Created helper function current_is_ksgxd() to improve readability.
> > 
> >  Use mmget_not_zero()/mmput_async() when searching mm_list.
> > 
> >  Move call to set_active_memcg() to sgx_encl_alloc_backing() and
> >  use mem_cgroup_put() to avoid leaking a memcg reference.
> > 
> >  Address review feedback regarding comments and commit log.
> > 
> >  arch/x86/kernel/cpu/sgx/encl.c | 109
> > ++++++++++++++++++++++++++++++++-
> >  arch/x86/kernel/cpu/sgx/encl.h |  11 +++-
> >  arch/x86/kernel/cpu/sgx/main.c |   4 +-
> >  3 files changed, 118 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c
> > b/arch/x86/kernel/cpu/sgx/encl.c
> > index 001808e3901c..6d10202612d6 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -32,7 +32,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page
> > *encl_page,
> >  	else
> >  		page_index = PFN_DOWN(encl->size);
> >  
> > -	ret = sgx_encl_get_backing(encl, page_index, &b);
> > +	ret = sgx_encl_lookup_backing(encl, page_index, &b);
> >  	if (ret)
> >  		return ret;
> >  
> > @@ -574,7 +574,7 @@ static struct page
> > *sgx_encl_get_backing_page(struct sgx_encl *encl,
> >   *   0 on success,
> >   *   -errno otherwise.
> >   */
> > -int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long
> > page_index,
> > +static int sgx_encl_get_backing(struct sgx_encl *encl, unsigned
> > long page_index,
> >  			 struct sgx_backing *backing)
> >  {
> >  	pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >>
> > 5);
> > @@ -601,6 +601,111 @@ int sgx_encl_get_backing(struct sgx_encl
> > *encl, unsigned long page_index,
> >  	return 0;
> >  }
> >  
> > +/*
> > + * When called from ksgxd, returns the mem_cgroup of a struct mm
> > stored
> > + * in the enclave's mm_list. When not called from ksgxd, just
> > returns
> > + * the mem_cgroup of the current task.
> > + */
> > +static struct mem_cgroup *sgx_encl_get_mem_cgroup(struct sgx_encl
> > *encl)
> > +{
> > +	struct mem_cgroup *memcg = NULL;
> > +	struct sgx_encl_mm *encl_mm;
> > +	int idx;
> > +
> > +	/*
> > +	 * If called from normal task context, return the mem_cgroup
> > +	 * of the current task's mm. The remainder of the handling is
> > for
> > +	 * ksgxd.
> > +	 */
> > +	if (!current_is_ksgxd())
> > +		return get_mem_cgroup_from_mm(current->mm);
> > +
> > +	/*
> > +	 * Search the enclave's mm_list to find an mm associated with
> > +	 * this enclave to charge the allocation to.
> > +	 */
> > +	idx = srcu_read_lock(&encl->srcu);
> > +
> > +	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
> > +		if (!mmget_not_zero(encl_mm->mm))
> > +			continue;
> > +
> > +		memcg = get_mem_cgroup_from_mm(encl_mm->mm);
> > +
> > +		mmput_async(encl_mm->mm);
> > +
> > +		break;
> > +	}
> > +
> > +	srcu_read_unlock(&encl->srcu, idx);
> > +
> > +	/*
> > +	 * In the rare case that there isn't an mm associated with
> > +	 * the enclave, set memcg to the current active mem_cgroup.
> > +	 * This will be the root mem_cgroup if there is no active
> > +	 * mem_cgroup.
> > +	 */
> > +	if (!memcg)
> > +		return get_mem_cgroup_from_mm(NULL);
> > +
> > +	return memcg;
> > +}
> > +
> > +/**
> > + * sgx_encl_alloc_backing() - allocate a new backing storage page
> > + * @encl:	an enclave pointer
> > + * @page_index:	enclave page index
> > + * @backing:	data for accessing backing storage for the page
> > + *
> > + * When called from ksgxd, sets the active memcg from one of the
> > + * mms in the enclave's mm_list prior to any backing page
> > allocation,
> > + * in order to ensure that shmem page allocations are charged to
> > the
> > + * enclave.
> > + *
> > + * Return:
> > + *   0 on success,
> > + *   -errno otherwise.
> > + */
> > +int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long
> > page_index,
> > +			   struct sgx_backing *backing)
> > +{
> > +	struct mem_cgroup *memcg, *old_memcg;
> > +	int ret;
> > +
> > +	memcg = sgx_encl_get_mem_cgroup(encl);
> > +
> > +	old_memcg = set_active_memcg(memcg);
> > +
> > +	ret = sgx_encl_get_backing(encl, page_index, backing);
> > +
> > +	set_active_memcg(old_memcg);
> > +
> > +	mem_cgroup_put(memcg);
> 
> This is too sparse IMHO.
> 
> I would rewrite it as:
> 
> 	struct mem_cgroup *encl_memcg = sgx_encl_get_mem_cgroup(encl);
> 	struct mem_cgroup *memcg = set_active_memcg(encl_memcg);
> 	int ret;
> 
> 	ret = sgx_encl_get_backing(encl, page_index, backing);
> 
> 	set_active_memcg(memcg);
> 	mem_cgroup_put(encl_memcg);
> 
> I think old_memcg is not very documentative, whereas this also
> dclearly
> tells what is going on: enclave's memcg is temporarily swapped in
> order to
> perform page allocation, so that the allocations gets accounted from
> them
> correct cgroup.

Thanks for your feedback, I will incorporate it into v3.

--Kristen

> 
> > +
> > +	return ret;
> > +}
> > +
> > +/**
> > + * sgx_encl_lookup_backing() - retrieve an existing backing
> > storage page
> > + * @encl:	an enclave pointer
> > + * @page_index:	enclave page index
> > + * @backing:	data for accessing backing storage for the page
> > + *
> > + * Retrieve a backing page for loading data back into an EPC page
> > with ELDU.
> > + * It is the caller's responsibility to ensure that it is
> > appropriate to use
> > + * sgx_encl_lookup_backing() rather than sgx_encl_alloc_backing().
> > If lookup is
> > + * not used correctly, this will cause an allocation which is not
> > accounted for.
> > + *
> > + * Return:
> > + *   0 on success,
> > + *   -errno otherwise.
> > + */
> > +int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long
> > page_index,
> > +			   struct sgx_backing *backing)
> > +{
> > +	return sgx_encl_get_backing(encl, page_index, backing);
> > +}
> > +
> >  /**
> >   * sgx_encl_put_backing() - Unpin the backing storage
> >   * @backing:	data for accessing backing storage for the page
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.h
> > b/arch/x86/kernel/cpu/sgx/encl.h
> > index fec43ca65065..2de3b150ab00 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.h
> > +++ b/arch/x86/kernel/cpu/sgx/encl.h
> > @@ -100,13 +100,20 @@ static inline int sgx_encl_find(struct
> > mm_struct *mm, unsigned long addr,
> >  	return 0;
> >  }
> >  
> > +static inline bool current_is_ksgxd(void)
> > +{
> > +	return current->mm ? false : true;
> > +}
> > +
> >  int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
> >  		     unsigned long end, unsigned long vm_flags);
> >  
> >  void sgx_encl_release(struct kref *ref);
> >  int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
> > -int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long
> > page_index,
> > -			 struct sgx_backing *backing);
> > +int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long
> > page_index,
> > +			    struct sgx_backing *backing);
> > +int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long
> > page_index,
> > +			   struct sgx_backing *backing);
> >  void sgx_encl_put_backing(struct sgx_backing *backing, bool
> > do_write);
> >  int sgx_encl_test_and_clear_young(struct mm_struct *mm,
> >  				  struct sgx_encl_page *page);
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c
> > b/arch/x86/kernel/cpu/sgx/main.c
> > index 4b41efc9e367..7d41c8538795 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -310,7 +310,7 @@ static void sgx_reclaimer_write(struct
> > sgx_epc_page *epc_page,
> >  	encl->secs_child_cnt--;
> >  
> >  	if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED,
> > &encl->flags)) {
> > -		ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
> > +		ret = sgx_encl_alloc_backing(encl, PFN_DOWN(encl-
> > >size),
> >  					   &secs_backing);
> >  		if (ret)
> >  			goto out;
> > @@ -381,7 +381,7 @@ static void sgx_reclaim_pages(void)
> >  			goto skip;
> >  
> >  		page_index = PFN_DOWN(encl_page->desc - encl_page-
> > >encl->base);
> > -		ret = sgx_encl_get_backing(encl_page->encl, page_index,
> > &backing[i]);
> > +		ret = sgx_encl_alloc_backing(encl_page->encl,
> > page_index, &backing[i]);
> >  		if (ret)
> >  			goto skip;
> >  
> > -- 
> > 2.20.1
> > 
> 
> BR, Jarkko


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] x86/sgx: Set active memcg prior to shmem allocation
  2022-05-19 21:22 ` Shakeel Butt
@ 2022-05-20 15:44   ` Kristen Carlson Accardi
  0 siblings, 0 replies; 8+ messages in thread
From: Kristen Carlson Accardi @ 2022-05-20 15:44 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: linux-sgx, Jarkko Sakkinen, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, stable, LKML,
	Linux MM, Michal Hocko, Roman Gushchin, Johannes Weiner

On Thu, 2022-05-19 at 14:22 -0700, Shakeel Butt wrote:
> On Thu, May 19, 2022 at 2:05 PM Kristen Carlson Accardi
> <kristen@linux.intel.com> wrote:
> > When the system runs out of enclave memory, SGX can reclaim EPC
> > pages
> > by swapping to normal RAM. These backing pages are allocated via a
> > per-enclave shared memory area. Since SGX allows unlimited over
> > commit on EPC memory, the reclaimer thread can allocate a large
> > number of backing RAM pages in response to EPC memory pressure.
> > 
> > When the shared memory backing RAM allocation occurs during
> > the reclaimer thread context, the shared memory is charged to
> > the root memory control group, and the shmem usage of the enclave
> > is not properly accounted for, making cgroups ineffective at
> > limiting the amount of RAM an enclave can consume.
> > 
> > For example, when using a cgroup to launch a set of test
> > enclaves, the kernel does not properly account for 50% - 75% of
> > shmem page allocations on average. In the worst case, when
> > nearly all allocations occur during the reclaimer thread, the
> > kernel accounts less than a percent of the amount of shmem used
> > by the enclave's cgroup to the correct cgroup.
> > 
> > SGX stores a list of mm_structs that are associated with
> > an enclave. Pick one of them during reclaim and charge that
> > mm's memcg with the shmem allocation. The one that gets picked
> > is arbitrary, but this list almost always only has one mm. The
> > cases where there is more than one mm with different memcg's
> > are not worth considering.
> > 
> > Create a new function - sgx_encl_alloc_backing(). This function
> > is used whenever a new backing storage page needs to be
> > allocated. Previously the same function was used for page
> > allocation as well as retrieving a previously allocated page.
> > Prior to backing page allocation, if there is a mm_struct
> > associated
> > with the enclave that is requesting the allocation, it is set
> > as the active memory control group.
> > 
> > Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> 
> For the memcg part:
> 
> Reviewed-by: Shakeel Butt <shakeelb@google.com>

Thanks!



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-05-20 15:44 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-19 21:04 [PATCH v2] x86/sgx: Set active memcg prior to shmem allocation Kristen Carlson Accardi
2022-05-19 21:22 ` Shakeel Butt
2022-05-20 15:44   ` Kristen Carlson Accardi
2022-05-19 22:31 ` Jarkko Sakkinen
2022-05-20 15:43   ` Kristen Carlson Accardi
2022-05-19 23:01 ` Greg KH
2022-05-20  1:24 ` Roman Gushchin
2022-05-20 15:40   ` Kristen Carlson Accardi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.