From: Kristen Carlson Accardi <kristen@linux.intel.com>
To: linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org,
	cgroups@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	Zefan Li <lizefan.x@bytedance.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>
Cc: Kristen Carlson Accardi <kristen@linux.intel.com>,
	Sean Christopherson <seanjc@google.com>,
	linux-doc@vger.kernel.org
Subject: [RFC PATCH 20/20] docs, cgroup, x86/sgx: Add SGX EPC cgroup controller documentation
Date: Thu, 22 Sep 2022 10:10:57 -0700
Message-ID: <20220922171057.1236139-21-kristen@linux.intel.com>
In-Reply-To: <20220922171057.1236139-1-kristen@linux.intel.com>

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add initial documentation for the SGX EPC cgroup controller,
which regulates distribution of SGX Enclave Page Cache (EPC) memory.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 201 ++++++++++++++++++++++++
 1 file changed, 201 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index be4a77baf784..c355cb08fc18 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -71,6 +71,10 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou
        5.9-2 Migration and Ownership
      5-10. Others
        5-10-1. perf_event
+     5-11. SGX EPC
+       5-11-1. SGX EPC Interface Files
+       5-11-2. Usage Guidelines
+       5-11-3. Migration
      5-N. Non-normative information
        5-N-1. CPU controller root cgroup process behaviour
        5-N-2. IO controller root cgroup process behaviour
@@ -2440,6 +2444,203 @@ always be filtered by cgroup v2 path.  The controller can still be
 moved to a legacy hierarchy after v2 hierarchy is populated.
 
 
+SGX EPC
+-------
+
+The "sgx_epc" controller regulates distribution of SGX EPC memory,
+which is a subset of system RAM that is used to provide SGX-enabled
+applications with protected memory, and is otherwise inaccessible,
+i.e. shows up as reserved in /proc/iomem and cannot be read/written
+outside of an SGX enclave.
+
+Although current systems implement EPC by stealing memory from RAM,
+for all intents and purposes the EPC is independent from normal system
+memory, e.g. must be reserved at boot from RAM and cannot be converted
+between EPC and normal memory while the system is running.  The EPC is
+managed by the SGX subsystem and is not accounted by the memory
+controller.  Note that this is true only for EPC memory itself, i.e.
+normal memory allocations related to SGX and EPC memory, e.g. the
+backing memory for evicted EPC pages, are accounted, limited and
+protected by the memory controller.
+
+Much like normal system memory, EPC memory can be overcommitted via
+virtual memory techniques and pages can be swapped out of the EPC
+to their backing store (normal system memory allocated via shmem).
+The SGX EPC subsystem is analogous to the memory subsystem, and the
+SGX EPC controller is in turn analogous to the memory controller;
+it implements limit and protection models for EPC memory.
+
+See Documentation/x86/sgx.rst for more info on SGX and EPC.
+
+SGX EPC Interface Files
+~~~~~~~~~~~~~~~~~~~~~~~
+
+All SGX EPC memory amounts are in bytes unless explicitly stated
+otherwise.  If a value which is not PAGE_SIZE aligned is written,
+the actual value used by the controller will be rounded down to
+the closest PAGE_SIZE multiple.
+
+  sgx_epc.current
+
+	A read-only single value file which exists on all cgroups.
+
+	The total amount of EPC memory currently being used by the
+	cgroup and its descendants.
+
+  sgx_epc.low
+
+	A read-write single value file which exists on non-root
+	cgroups.  The default is "0".
+
+	Best-effort protection of EPC usage.  If the EPC usage of a
+	cgroup is below its low limit, and all its ancestors are below
+	their low limits, then the cgroup's EPC won't be reclaimed
+	unless EPC cannot be reclaimed from unprotected cgroups,
+	e.g. all sibling cgroups are also below their low limit.
+
+	Setting low to a value more than the amount of EPC available
+	is discouraged.  The low limit is effectively ignored if the
+	cgroup's high or max limit is less than its low limit.
+
+  sgx_epc.high
+
+	A read-write single value file which exists on non-root
+	cgroups.  The default is "max".
+
+	EPC usage best-effort limit.  This is the main mechanism to
+	control EPC usage of a cgroup.  If a cgroup's usage goes
+	over the high boundary, EPC pages will be reclaimed from
+	the cgroup until it is back under the high limit.
+
+	Going over the high limit does not prevent allocation of
+	additional EPC pages, e.g. EPC usage will often spike above
+	the high limit during enclave creation, when a large number
+	of EPC pages are EADDed in a short period.
+
+  sgx_epc.max
+
+	A read-write single value file which exists on non-root
+	cgroups.  The default is "max".
+
+	EPC usage hard limit.  If a cgroup's EPC usage reaches this
+	limit, EPC allocations, e.g. for page fault handling, will
+	be blocked until EPC can be reclaimed from the cgroup.  If
+	EPC cannot be reclaimed in a timely manner, reclaim will be
+	forced, e.g. by ignoring LRU.
+
+	The max limit is intended to be a last line of defense; it
+	should rarely come into play on a properly configured and
+	monitored system.
+
+  sgx_epc.stats
+
+	A read-write flat-keyed file which exists on all cgroups.
+	Reads from the file display the cgroup's statistics, while
+	writes reset the underlying counters (if applicable).
+
+	The entries are ordered to be human readable, and new entries
+	can show up in the middle.  Don't rely on items remaining in a
+	fixed position; use the keys to look up specific values!
+
+	The following entries are defined.
+
+	  pages
+
+		The total number of pages currently being used by the
+		cgroup and its descendants, i.e. sgx_epc.current / 4096.
+
+	  direct
+
+		The number of pages currently being used by the cgroup
+		itself, excluding its descendants.
+
+	  indirect
+
+		The number of pages currently being used by the cgroup's
+		descendants, excluding its own pages.
+
+	  reclaimed
+
+		The number of pages that have been reclaimed from the
+		cgroup (since sgx_epc.stats was last reset).
+
+	  reclamations
+
+		The number of times this cgroup's LRU lists have been
+		scanned for reclaim, i.e. the number of times the cgroup
+		has been selected for reclaim via any code path.
+
+  sgx_epc.events
+
+	A read-write flat-keyed file which exists on non-root cgroups.
+	Writes to the file reset the event counters to zero.  A value
+	change in this file generates a file modified event.
+
+	The following entries are defined.
+
+	  low
+
+		The number of times the cgroup has been reclaimed even
+		though its usage is under the low boundary, e.g. due to
+		all sibling cgroups also being low.  This event usually
+		indicates that the low boundary is over-committed.
+
+	  high
+
+		The number of times the cgroup has triggered a reclaim
+		due to its EPC usage exceeding its high EPC boundary.
+		This event is expected for cgroups whose EPC usage is
+		capped by their high limit rather than global pressure.
+
+	  max
+
+		The number of times the cgroup has triggered a reclaim
+		due to its EPC usage approaching (or exceeding) its max
+		EPC boundary.
+
+Usage Guidelines
+~~~~~~~~~~~~~~~~
+
+"sgx_epc.high" and "sgx_epc.low" are the main mechanisms to control
+EPC usage; using "sgx_epc.max" as anything other than a safety net
+is inadvisable, as SGX application performance will suffer greatly if
+a process encounters its max limit.  Because a cgroup is allowed to
+breach its high limit, e.g. to fault in a page, performance is not
+artificially limited, whereas the max limit will effectively block
+a faulting application until the kernel can reclaim EPC memory from
+the cgroup.
+
+Exactly how "sgx_epc.high" is utilized will vary case by case, i.e.
+there is no one "correct" strategy.  Deferring to global EPC memory
+pressure, e.g. by overcommitting on the high limit, may be the most
+effective approach for a particular situation, whereas a different
+scenario might warrant a more draconian usage of the high limit.
+Regardless of the strategy used, because breach of the high limit
+does not cause processes to block or be killed, a management agent
+has ample opportunity to monitor and react as needed, e.g. it can
+raise the offending cgroup's high limit or terminate the workload.
+
+Similarly, "sgx_epc.low" can play different roles depending on the
+situation, e.g. it can be set to a relatively high value to protect
+a mission critical workload, or it may be used to reserve a minimal
+amount of EPC memory simply to ensure forward progress.  Employing
+"sgx_epc.low" in some capacity is generally recommended, especially
+when overcommitting "sgx_epc.high", as it is relatively common for
+a system to be under heavy EPC pressure; this holds true even on a
+carefully tuned system, as initializing an enclave requires all of
+the enclave's pages be brought into the EPC at some point prior to
+initialization, if only temporarily.
+
+Migration
+~~~~~~~~~
+
+Once an EPC page is charged to a cgroup (during allocation), it
+remains charged to the original cgroup until the page is released
+or reclaimed.  Migrating a process to a different cgroup doesn't
+move the EPC charges that it incurred while in the previous cgroup
+to its new cgroup.
+
+
 Non-normative information
 -------------------------
 
-- 
2.37.3
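
A note for readers of the interface description above: the two parsing
rules it states (values are rounded down to a PAGE_SIZE multiple, and
flat-keyed files must be read by key rather than by position) can be
sketched in a few lines of Python.  This is only an illustration, not
part of the patch.  It assumes a 4096-byte PAGE_SIZE (as implied by
"sgx_epc.current / 4096"), it assumes the usual cgroup v2 flat-keyed
"KEY VALUE" layout for sgx_epc.stats, and the sample contents and
helper names are invented.

```python
from typing import Dict

# Assumption: 4 KiB pages, matching "sgx_epc.current / 4096" in the patch.
PAGE_SIZE = 4096


def round_down_to_page(value: int, page_size: int = PAGE_SIZE) -> int:
    """Round a byte count down to the closest page_size multiple."""
    return value - (value % page_size)


def parse_flat_keyed(text: str) -> Dict[str, int]:
    """Parse "KEY VALUE" lines into a dict so lookups go by key.

    New entries may appear in the middle of the file, so callers should
    never depend on an entry's line position.
    """
    entries: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, value = parts
        entries[key] = int(value)
    return entries


# Hypothetical sgx_epc.stats contents; every number here is invented.
SAMPLE_STATS = """\
pages 300
direct 100
indirect 200
reclaimed 42
reclamations 7
"""

stats = parse_flat_keyed(SAMPLE_STATS)

# "pages" is described as sgx_epc.current / 4096, so the byte usage can
# be recovered by multiplying back.
current_bytes = stats["pages"] * PAGE_SIZE

# A limit that is not page aligned is rounded down before use.
effective_limit = round_down_to_page(10000)
```

On a system with the controller enabled, the text passed to
parse_flat_keyed() would presumably come from reading the cgroup's
sgx_epc.stats file; the exact path depends on where cgroup v2 is
mounted.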

