linux-kernel.vger.kernel.org archive mirror
* [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2
@ 2022-02-08  0:45 Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 01/32] x86/sgx: Add short descriptions to ENCLS wrappers Reinette Chatre
                   ` (32 more replies)
  0 siblings, 33 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

V1: https://lore.kernel.org/linux-sgx/cover.1638381245.git.reinette.chatre@intel.com/

Changes since V1 that directly impact user space:
- SGX2 permission changes moved from a single ioctl() named
  SGX_IOC_PAGE_MODP to two new ioctl()s:
  SGX_IOC_ENCLAVE_RELAX_PERMISSIONS and
  SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS, supported by two different
  parameter structures (SGX_IOC_ENCLAVE_RELAX_PERMISSIONS does
  not support a result output parameter) (Jarkko).

  User space flow impact: After user space runs ENCLU[EMODPE] it
  needs to call SGX_IOC_ENCLAVE_RELAX_PERMISSIONS to have PTEs
  updated. Previously, running SGX_IOC_PAGE_MODP in this scenario
  resulted in EPCM.PR being set. Calling
  SGX_IOC_ENCLAVE_RELAX_PERMISSIONS no longer sets EPCM.PR, so
  an additional ENCLU[EACCEPT] is not needed.

- SGX_IOC_ENCLAVE_RELAX_PERMISSIONS and
  SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
  obtain new permissions from secinfo as parameter instead of
  the permissions directly (Jarkko).

- ioctl() supporting SGX2 page type change is renamed from
  SGX_IOC_PAGE_MODT to SGX_IOC_ENCLAVE_MODIFY_TYPE (Jarkko).

- SGX_IOC_ENCLAVE_MODIFY_TYPE obtains new page type from secinfo
  as parameter instead of the page type directly (Jarkko).

- ioctl() supporting SGX2 page removal is renamed from
  SGX_IOC_PAGE_REMOVE to SGX_IOC_ENCLAVE_REMOVE_PAGES (Jarkko).

- All ioctl() parameter structures have been renamed as a result of the
  ioctl() renaming:
  SGX_IOC_ENCLAVE_RELAX_PERMISSIONS => struct sgx_enclave_relax_perm
  SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS => struct sgx_enclave_restrict_perm
  SGX_IOC_ENCLAVE_MODIFY_TYPE => struct sgx_enclave_modt
  SGX_IOC_ENCLAVE_REMOVE_PAGES => struct sgx_enclave_remove_pages

Changes since V1 that do not directly impact user space:
- Number of patches in series increased from 25 to 32 primarily because
  of splitting the original submission:
  - Wrappers for the new SGX2 functions are introduced in three separate
    patches replacing the original "x86/sgx: Add wrappers for SGX2
    functions" (Jarkko).
  - Moving and renaming sgx_encl_ewb_cpumask() is done with two patches
    replacing the original "x86/sgx: Use more generic name for enclave
    cpumask function" (Jarkko).
  - Support for SGX2 EPCM permission changes is split into two ioctl()s,
    one for relaxing and one for restricting permissions, each introduced
    by a new patch replacing the original "x86/sgx: Support enclave page
    permission changes" (Jarkko).
  - Extracted code used by existing ioctl()s for usage by new ioctl()s
    into a new utility in new patch "x86/sgx: Create utility to validate
    user provided offset and length" (Dave did not specifically ask for
    this but it addresses his review feedback).
  - Two new Documentation patches to support the SGX2 work
    ("Documentation/x86: Introduce enclave runtime management") and
    a dedicated section on the enclave permission management
    ("Documentation/x86: Document SGX permission details") (Andy).
- Most patches were reworked to improve the language by:
  * Referring to the exact item instead of an English rephrasing (Jarkko).
  * Using ioctl() instead of ioctl throughout (Dave).
  * Using "relaxed" instead of "exceed" when referring to permissions
    (Dave).
- Improved documentation with several additions to
  Documentation/x86/sgx.rst.
- Many smaller changes, please refer to individual patches.

Hi Everybody,

The current Linux kernel support for SGX includes support for SGX1 that
requires that an enclave be created with properties that accommodate all
usages over its (the enclave's) lifetime. This includes properties such
as permissions of enclave pages, the number of enclave pages, and the
number of threads supported by the enclave.

Consequences of this requirement to have the enclave be created to
accommodate all usages include:
* pages needing to support relocated code are required to have RWX
  permissions for their entire lifetime,
* an enclave needs to be created with the maximum stack and heap
  projected to be needed during the enclave's entire lifetime which
  can be longer than the processes running within it,
* an enclave needs to be created with support for the maximum number
  of threads projected to run in the enclave.

Since SGX1 a few more functions were introduced, collectively called
SGX2, that support modifications to an initialized enclave. Hardware
supporting these functions is already available as listed on
https://github.com/ayeks/SGX-hardware

This series adds support for SGX2, also referred to as Enclave Dynamic
Memory Management (EDMM). This includes:

* Support modifying permissions of regular enclave pages belonging to an
  initialized enclave. New permissions are not allowed to exceed the
  originally vetted permissions. For example, RX isn't allowed unless
  the page was originally added with RX or RWX.
  Modifying permissions is accomplished with two new ioctl()s:
  SGX_IOC_ENCLAVE_RELAX_PERMISSIONS and
  SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS.

* Support dynamic addition of regular enclave pages to an initialized
  enclave. Pages are added with RW permissions as their "originally
  vetted permissions" (see previous bullet) and thus not allowed to
  be made executable at this time. Enabling dynamically added pages
  to obtain executable permissions requires integration with user space
  policy that is deferred until the core SGX2 enabling is complete.
  Pages are dynamically added to an initialized enclave from the SGX
  page fault handler.

* Support expanding an initialized enclave to accommodate more threads.
  More threads can be accommodated by an enclave with the addition of
  Thread Control Structure (TCS) pages, which is done by changing the
  type of regular enclave pages to TCS pages using a new ioctl()
  SGX_IOC_ENCLAVE_MODIFY_TYPE.

* Support removing regular and TCS pages from an initialized enclave.
  Removing pages is accomplished in two stages as supported by two new
  ioctl()s SGX_IOC_ENCLAVE_MODIFY_TYPE (same ioctl() as mentioned in
  previous bullet) and SGX_IOC_ENCLAVE_REMOVE_PAGES.

* Tests covering all the new flows, some edge cases, and one
  comprehensive stress scenario.
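
The "originally vetted permissions" rule from the first two bullets can be
sketched as a subset check. This is only an illustration under stated
assumptions: the PERM_* bit values and the helper name perms_within_vetted()
are hypothetical and are not taken from this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative permission bits; these are NOT the kernel's definitions. */
#define PERM_R 0x1u
#define PERM_W 0x2u
#define PERM_X 0x4u

/*
 * Hypothetical helper: return true if every bit in @requested is also set
 * in @vetted, i.e. the requested permissions do not exceed the originally
 * vetted permissions.
 */
static bool perms_within_vetted(uint32_t requested, uint32_t vetted)
{
	return (requested & ~vetted) == 0;
}
```

With this model, RX is accepted for a page vetted as RX or RWX, while a
dynamically added page (vetted as RW) can never become executable.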

No additional work is needed to support SGX2 in a virtualized
environment. The tests included in this series can also be run from
a guest and were tested with the recent QEMU release based on 6.2.0
that supports SGX.

Patches 1 to 14 prepare the existing code for SGX2 support by
introducing the SGX2 functions, making sure pages remain accessible
after their enclave permissions are changed, and tracking enclave page
types as well as runtime permissions as needed by SGX2.

Patches 15 through 32 are a mix of x86/sgx and selftests/sgx patches
that follow a pattern: first an SGX2 feature is enabled, then it is
followed by tests of the new feature and/or tests of scenarios that
combine the SGX2 features enabled up to that point.

In two cases (patches 20 and 31) code in support of SGX2 is separated
out with detailed motivation to support the review.

This series is based on v5.17-rc2 with the following fixes additionally
applied:

"selftests/sgx: Remove extra newlines in test output"
 https://lore.kernel.org/linux-sgx/16317683a1822bbd44ab3ca48b60a9a217ac24de.1643754040.git.reinette.chatre@intel.com/
"selftests/sgx: Ensure enclave data available during debug print"
 https://lore.kernel.org/linux-sgx/eaaeeb9122916d831942fc8a3043c687137314c1.1643754040.git.reinette.chatre@intel.com/
"selftests/sgx: Do not attempt enclave build without valid enclave"
 https://lore.kernel.org/linux-sgx/4e4ea6d70c286c209964bec1e8d29ac8e692748b.1643754040.git.reinette.chatre@intel.com/
"selftests/sgx: Fix NULL-pointer-dereference upon early test failure"
 https://lore.kernel.org/linux-sgx/89824888783fd8e770bfc64530c7549650a41851.1643754040.git.reinette.chatre@intel.com/
"x86/sgx: Add poison handling to reclaimer"
 https://lore.kernel.org/linux-sgx/dcc95eb2aaefb042527ac50d0a50738c7c160dac.1643830353.git.reinette.chatre@intel.com/
"x86/sgx: Silence softlockup detection when releasing large enclaves"
 https://lore.kernel.org/linux-sgx/b5e9f218064aa76e3026f778e1ad0a1d823e3db8.1643133224.git.reinette.chatre@intel.com/

Your feedback will be greatly appreciated.

Regards,

Reinette

Reinette Chatre (32):
  x86/sgx: Add short descriptions to ENCLS wrappers
  x86/sgx: Add wrapper for SGX2 EMODPR function
  x86/sgx: Add wrapper for SGX2 EMODT function
  x86/sgx: Add wrapper for SGX2 EAUG function
  Documentation/x86: Document SGX permission details
  x86/sgx: Support VMA permissions more relaxed than enclave permissions
  x86/sgx: Add pfn_mkwrite() handler for present PTEs
  x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic
    permission changes
  x86/sgx: Export sgx_encl_ewb_cpumask()
  x86/sgx: Rename sgx_encl_ewb_cpumask() as sgx_encl_cpumask()
  x86/sgx: Move PTE zap code to new sgx_zap_enclave_ptes()
  x86/sgx: Make sgx_ipi_cb() available internally
  x86/sgx: Create utility to validate user provided offset and length
  x86/sgx: Keep record of SGX page type
  x86/sgx: Support relaxing of enclave page permissions
  x86/sgx: Support restricting of enclave page permissions
  selftests/sgx: Add test for EPCM permission changes
  selftests/sgx: Add test for TCS page permission changes
  x86/sgx: Support adding of pages to an initialized enclave
  x86/sgx: Tighten accessible memory range after enclave initialization
  selftests/sgx: Test two different SGX2 EAUG flows
  x86/sgx: Support modifying SGX page type
  x86/sgx: Support complete page removal
  Documentation/x86: Introduce enclave runtime management section
  selftests/sgx: Introduce dynamic entry point
  selftests/sgx: Introduce TCS initialization enclave operation
  selftests/sgx: Test complete changing of page type flow
  selftests/sgx: Test faulty enclave behavior
  selftests/sgx: Test invalid access to removed enclave page
  selftests/sgx: Test reclaiming of untouched page
  x86/sgx: Free up EPC pages directly to support large page ranges
  selftests/sgx: Page removal stress test

 Documentation/x86/sgx.rst                     |   64 +-
 arch/x86/include/asm/sgx.h                    |    8 +
 arch/x86/include/uapi/asm/sgx.h               |   81 +
 arch/x86/kernel/cpu/sgx/encl.c                |  334 +++-
 arch/x86/kernel/cpu/sgx/encl.h                |   12 +-
 arch/x86/kernel/cpu/sgx/encls.h               |   33 +
 arch/x86/kernel/cpu/sgx/ioctl.c               |  831 ++++++++-
 arch/x86/kernel/cpu/sgx/main.c                |   70 +-
 arch/x86/kernel/cpu/sgx/sgx.h                 |    3 +
 tools/testing/selftests/sgx/defines.h         |   23 +
 tools/testing/selftests/sgx/load.c            |   41 +
 tools/testing/selftests/sgx/main.c            | 1484 +++++++++++++++++
 tools/testing/selftests/sgx/main.h            |    1 +
 tools/testing/selftests/sgx/test_encl.c       |   68 +
 .../selftests/sgx/test_encl_bootstrap.S       |    6 +
 15 files changed, 2963 insertions(+), 96 deletions(-)


base-commit: 26291c54e111ff6ba87a164d85d4a4e134b7315c
prerequisite-patch-id: 3c3908f1c3536cc04ba020fb3e81f51395b44223
prerequisite-patch-id: e860923423c3387cf6fdcceb2fa41dc5e9454ef4
prerequisite-patch-id: 986260c8bc4255eb61e2c4afa88d2b723e376423
prerequisite-patch-id: ba014a99fced2b57d5d9e2dfb9d80ddf4333c13e
prerequisite-patch-id: 65cbb72889b6353a5639b984615d12019136b270
prerequisite-patch-id: e3296a2f0345a77c8a7ca91f76697ae2e1dca21f
-- 
2.25.1



* [PATCH V2 01/32] x86/sgx: Add short descriptions to ENCLS wrappers
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 02/32] x86/sgx: Add wrapper for SGX2 EMODPR function Reinette Chatre
                   ` (31 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

The SGX ENCLS instruction uses EAX to specify an SGX function and
may require additional registers, depending on the SGX function.
ENCLS invokes the specified privileged SGX function for managing
and debugging enclaves. Macros are used to wrap the ENCLS
functionality and several wrappers are used to wrap the macros to
make the different SGX functions accessible in the code.

The wrappers of the supported SGX functions are cryptic. Add short
descriptions of each as a comment.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Fix commit message and subject to not refer to descriptions as
  "changelog descriptions" or "shortlog descriptions" (Jarkko).
- Improve all descriptions with guidance from Jarkko.

 arch/x86/kernel/cpu/sgx/encls.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
index fa04a73daf9c..0e22fa8f77c5 100644
--- a/arch/x86/kernel/cpu/sgx/encls.h
+++ b/arch/x86/kernel/cpu/sgx/encls.h
@@ -136,57 +136,71 @@ static inline bool encls_failed(int ret)
 	ret;						\
 	})
 
+/* Initialize an EPC page into an SGX Enclave Control Structure (SECS) page. */
 static inline int __ecreate(struct sgx_pageinfo *pginfo, void *secs)
 {
 	return __encls_2(ECREATE, pginfo, secs);
 }
 
+/* Hash a 256 byte region of an enclave page to SECS:MRENCLAVE. */
 static inline int __eextend(void *secs, void *addr)
 {
 	return __encls_2(EEXTEND, secs, addr);
 }
 
+/*
+ * Associate an EPC page to an enclave either as a REG or TCS page
+ * populated with the provided data.
+ */
 static inline int __eadd(struct sgx_pageinfo *pginfo, void *addr)
 {
 	return __encls_2(EADD, pginfo, addr);
 }
 
+/* Finalize enclave build, initialize enclave for user code execution. */
 static inline int __einit(void *sigstruct, void *token, void *secs)
 {
 	return __encls_ret_3(EINIT, sigstruct, secs, token);
 }
 
+/* Disassociate EPC page from its enclave and mark it as unused. */
 static inline int __eremove(void *addr)
 {
 	return __encls_ret_1(EREMOVE, addr);
 }
 
+/* Copy data to an EPC page belonging to a debug enclave. */
 static inline int __edbgwr(void *addr, unsigned long *data)
 {
 	return __encls_2(EDGBWR, *data, addr);
 }
 
+/* Copy data from an EPC page belonging to a debug enclave. */
 static inline int __edbgrd(void *addr, unsigned long *data)
 {
 	return __encls_1_1(EDGBRD, *data, addr);
 }
 
+/* Track that software has completed the required TLB address clears. */
 static inline int __etrack(void *addr)
 {
 	return __encls_ret_1(ETRACK, addr);
 }
 
+/* Load, verify, and unblock an EPC page. */
 static inline int __eldu(struct sgx_pageinfo *pginfo, void *addr,
 			 void *va)
 {
 	return __encls_ret_3(ELDU, pginfo, addr, va);
 }
 
+/* Make EPC page inaccessible to enclave, ready to be written to memory. */
 static inline int __eblock(void *addr)
 {
 	return __encls_ret_1(EBLOCK, addr);
 }
 
+/* Initialize an EPC page into a Version Array (VA) page. */
 static inline int __epa(void *addr)
 {
 	unsigned long rbx = SGX_PAGE_TYPE_VA;
@@ -194,6 +208,7 @@ static inline int __epa(void *addr)
 	return __encls_2(EPA, rbx, addr);
 }
 
+/* Invalidate an EPC page and write it out to main memory. */
 static inline int __ewb(struct sgx_pageinfo *pginfo, void *addr,
 			void *va)
 {
-- 
2.25.1



* [PATCH V2 02/32] x86/sgx: Add wrapper for SGX2 EMODPR function
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 01/32] x86/sgx: Add short descriptions to ENCLS wrappers Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 03/32] x86/sgx: Add wrapper for SGX2 EMODT function Reinette Chatre
                   ` (30 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

Add a wrapper for the EMODPR ENCLS leaf function used to
restrict enclave page permissions as maintained in the
SGX hardware's Enclave Page Cache Map (EPCM).

EMODPR:
1) Updates the EPCM permissions of an enclave page by treating
   the new permissions as a mask - supplying a value that relaxes
   EPCM permissions has no effect.
2) Sets the PR bit in the EPCM entry of the enclave page to
   indicate that permission restriction is in progress. The bit
   is reset by the enclave by invoking ENCLU leaf function
   EACCEPT or EACCEPTCOPY.

The enclave may access the page throughout the entire process
if conforming to the EPCM permissions for the enclave page.
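
The two EMODPR effects listed above can be sketched with a small software
model. This is an illustration only, not the hardware behavior in full:
struct epcm_perm_model and both function names are hypothetical, and the
EACCEPT model skips the checks the real instruction performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative permission bits; these are NOT the kernel's definitions. */
#define PERM_R 0x1u
#define PERM_W 0x2u
#define PERM_X 0x4u

/* Hypothetical, simplified stand-in for the relevant EPCM entry fields. */
struct epcm_perm_model {
	uint32_t perms;	/* current EPCM permissions */
	bool pr;	/* permission restriction in progress */
};

/* Model of EMODPR: the new permissions act as a mask, then PR is set. */
static void model_emodpr(struct epcm_perm_model *e, uint32_t new_perms)
{
	assert(e != NULL);
	e->perms &= new_perms;	/* bits not already granted cannot be added */
	e->pr = true;
}

/* Model of the enclave running EACCEPT/EACCEPTCOPY: PR is reset. */
static void model_eaccept_restrict(struct epcm_perm_model *e)
{
	assert(e != NULL);
	e->pr = false;
}
```

Requesting RX on a page that currently has RW leaves only R in this model,
which matches the statement that a relaxing value has no effect.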

After performing the permission restriction by issuing EMODPR
the kernel needs to collaborate with the hardware to ensure that
all logical processors see the new restricted permissions. This
is required for the enclave's EACCEPT/EACCEPTCOPY to succeed and
is accomplished with the ETRACK flow.

Expand enum sgx_return_code with the possible EMODPR return
values.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Split original patch ("x86/sgx: Add wrappers for SGX2 functions")
  in three to introduce the SGX2 functions separately (Jarkko).
- Rewrite commit message to include how the EPCM within the hardware
  is changed by the SGX2 function as well as the calling
  conditions (Jarkko).
- Make short description more specific to which permissions (EPCM
  permissions) the function modifies.

 arch/x86/include/asm/sgx.h      | 5 +++++
 arch/x86/kernel/cpu/sgx/encls.h | 6 ++++++
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 3f9334ef67cd..d67810b50a81 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -65,17 +65,22 @@ enum sgx_encls_function {
 
 /**
  * enum sgx_return_code - The return code type for ENCLS, ENCLU and ENCLV
+ * %SGX_EPC_PAGE_CONFLICT:	Page is being written by other ENCLS function.
  * %SGX_NOT_TRACKED:		Previous ETRACK's shootdown sequence has not
  *				been completed yet.
  * %SGX_CHILD_PRESENT		SECS has child pages present in the EPC.
  * %SGX_INVALID_EINITTOKEN:	EINITTOKEN is invalid and enclave signer's
  *				public key does not match IA32_SGXLEPUBKEYHASH.
+ * %SGX_PAGE_NOT_MODIFIABLE:	The EPC page cannot be modified because it
+ *				is in the PENDING or MODIFIED state.
  * %SGX_UNMASKED_EVENT:		An unmasked event, e.g. INTR, was received
  */
 enum sgx_return_code {
+	SGX_EPC_PAGE_CONFLICT		= 7,
 	SGX_NOT_TRACKED			= 11,
 	SGX_CHILD_PRESENT		= 13,
 	SGX_INVALID_EINITTOKEN		= 16,
+	SGX_PAGE_NOT_MODIFIABLE		= 20,
 	SGX_UNMASKED_EVENT		= 128,
 };
 
diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
index 0e22fa8f77c5..2b091912f038 100644
--- a/arch/x86/kernel/cpu/sgx/encls.h
+++ b/arch/x86/kernel/cpu/sgx/encls.h
@@ -215,4 +215,10 @@ static inline int __ewb(struct sgx_pageinfo *pginfo, void *addr,
 	return __encls_ret_3(EWB, pginfo, addr, va);
 }
 
+/* Restrict the EPCM permissions of an EPC page. */
+static inline int __emodpr(struct sgx_secinfo *secinfo, void *addr)
+{
+	return __encls_ret_2(EMODPR, secinfo, addr);
+}
+
 #endif /* _X86_ENCLS_H */
-- 
2.25.1



* [PATCH V2 03/32] x86/sgx: Add wrapper for SGX2 EMODT function
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 01/32] x86/sgx: Add short descriptions to ENCLS wrappers Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 02/32] x86/sgx: Add wrapper for SGX2 EMODPR function Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 04/32] x86/sgx: Add wrapper for SGX2 EAUG function Reinette Chatre
                   ` (29 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

Add a wrapper for the EMODT ENCLS leaf function used to
change the type of an enclave page as maintained in the
SGX hardware's Enclave Page Cache Map (EPCM).

EMODT:
1) Updates the EPCM page type of the enclave page.
2) Sets the MODIFIED bit in the EPCM entry of the enclave page.
   This bit is reset by the enclave by invoking ENCLU leaf
   function EACCEPT or EACCEPTCOPY.

Access from within the enclave to the enclave page is not possible
while the MODIFIED bit is set.

After changing the enclave page type by issuing EMODT the kernel
needs to collaborate with the hardware to ensure that no logical
processor continues to hold a reference to the changed page. This
is required to ensure no required security checks are circumvented
and is required for the enclave's EACCEPT/EACCEPTCOPY to succeed.
Ensuring that no references to the changed page remain is
accomplished with the ETRACK flow.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Split original patch ("x86/sgx: Add wrappers for SGX2 functions")
  in three to introduce the SGX2 functions separately (Jarkko).
- Rewrite commit message to include how the EPCM within the hardware
  is changed by the SGX2 function as well as the calling
  conditions (Jarkko).

 arch/x86/kernel/cpu/sgx/encls.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
index 2b091912f038..7a1ecf704ec1 100644
--- a/arch/x86/kernel/cpu/sgx/encls.h
+++ b/arch/x86/kernel/cpu/sgx/encls.h
@@ -221,4 +221,10 @@ static inline int __emodpr(struct sgx_secinfo *secinfo, void *addr)
 	return __encls_ret_2(EMODPR, secinfo, addr);
 }
 
+/* Change the type of an EPC page. */
+static inline int __emodt(struct sgx_secinfo *secinfo, void *addr)
+{
+	return __encls_ret_2(EMODT, secinfo, addr);
+}
+
 #endif /* _X86_ENCLS_H */
-- 
2.25.1



* [PATCH V2 04/32] x86/sgx: Add wrapper for SGX2 EAUG function
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (2 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 03/32] x86/sgx: Add wrapper for SGX2 EMODT function Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 05/32] Documentation/x86: Document SGX permission details Reinette Chatre
                   ` (28 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

Add a wrapper for the EAUG ENCLS leaf function used to
add a page to an initialized enclave.

EAUG:
1) Stores all properties of the new enclave page in the SGX
   hardware's Enclave Page Cache Map (EPCM).
2) Sets the PENDING bit in the EPCM entry of the enclave page.
   This bit is cleared by the enclave by invoking ENCLU leaf
   function EACCEPT or EACCEPTCOPY.

Access from within the enclave to the new enclave page is not
possible until the PENDING bit is cleared.
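
The PENDING handshake described above can be sketched as a small state
model. This is a hedged illustration: struct aug_page_model, the
MODEL_PAGE_SIZE constant, and all three function names are hypothetical,
and the real EACCEPT performs checks that are omitted here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096	/* assumed page size for the model */

/* Hypothetical, simplified stand-in for a dynamically added page. */
struct aug_page_model {
	uint8_t data[MODEL_PAGE_SIZE];
	bool valid;	/* page belongs to the enclave */
	bool pending;	/* waiting for the enclave to accept the page */
};

/* Model of EAUG: zero the page, add it to the enclave, set PENDING. */
static void model_eaug(struct aug_page_model *p)
{
	memset(p->data, 0, sizeof(p->data));
	p->valid = true;
	p->pending = true;
}

/* Model of the enclave running EACCEPT/EACCEPTCOPY: PENDING is cleared. */
static void model_eaccept_pending(struct aug_page_model *p)
{
	p->pending = false;
}

/* The enclave can only access the page once PENDING has been cleared. */
static bool model_enclave_can_access(const struct aug_page_model *p)
{
	return p->valid && !p->pending;
}
```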

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Split original patch ("x86/sgx: Add wrappers for SGX2 functions")
  in three to introduce the SGX2 functions separately (Jarkko).
- Rewrite commit message to include how the EPCM within the hardware
  is changed by the SGX2 function as well as any calling
  conditions (Jarkko).

 arch/x86/kernel/cpu/sgx/encls.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
index 7a1ecf704ec1..99004b02e2ed 100644
--- a/arch/x86/kernel/cpu/sgx/encls.h
+++ b/arch/x86/kernel/cpu/sgx/encls.h
@@ -227,4 +227,10 @@ static inline int __emodt(struct sgx_secinfo *secinfo, void *addr)
 	return __encls_ret_2(EMODT, secinfo, addr);
 }
 
+/* Zero a page of EPC memory and add it to an initialized enclave. */
+static inline int __eaug(struct sgx_pageinfo *pginfo, void *addr)
+{
+	return __encls_2(EAUG, pginfo, addr);
+}
+
 #endif /* _X86_ENCLS_H */
-- 
2.25.1



* [PATCH V2 05/32] Documentation/x86: Document SGX permission details
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (3 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 04/32] x86/sgx: Add wrapper for SGX2 EAUG function Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions Reinette Chatre
                   ` (27 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

Provide a summary of the various permissions involved in
managing access to enclave pages. This summary documents
the foundation for additions related to runtime management of
enclave page permissions that is made possible with SGX2.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- New patch.

 Documentation/x86/sgx.rst | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index 265568a9292c..89ff924b1480 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -71,16 +71,34 @@ The processor tracks EPC pages in a hardware metadata structure called the
 which describes the owning enclave, access rights and page type among the other
 things.
 
-EPCM permissions are separate from the normal page tables.  This prevents the
-kernel from, for instance, allowing writes to data which an enclave wishes to
-remain read-only.  EPCM permissions may only impose additional restrictions on
-top of normal x86 page permissions.
-
 For all intents and purposes, the SGX architecture allows the processor to
 invalidate all EPCM entries at will.  This requires that software be prepared to
 handle an EPCM fault at any time.  In practice, this can happen on events like
 power transitions when the ephemeral key that encrypts enclave memory is lost.
 
+Details about enclave page permissions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+EPCM permissions are separate from the normal page tables.  This prevents the
+kernel from, for instance, allowing writes to data which an enclave wishes
+to remain read-only.
+
+Three permission masks are relevant to SGX:
+
+* EPCM permissions.
+* Page Table Entry (PTE) permissions.
+* Virtual Memory Area (VMA) permissions.
+
+An enclave is only able to access an enclave page if all three permission
+masks enable it to do so.
+
+The relationships between the different permission masks are:
+
+* An SGX VMA can only be created if its permissions are the same or weaker
+  than the EPCM permissions.
+* PTEs are installed to match the EPCM permissions, but not be more
+  relaxed than the VMA permissions.
+
 Application interface
 =====================
 
-- 
2.25.1



* [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (4 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 05/32] Documentation/x86: Document SGX permission details Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-03-07 17:10   ` Jarkko Sakkinen
  2022-02-08  0:45 ` [PATCH V2 07/32] x86/sgx: Add pfn_mkwrite() handler for present PTEs Reinette Chatre
                   ` (26 subsequent siblings)
  32 siblings, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

=== Summary ===

An SGX VMA can only be created if its permissions are the same or
weaker than the Enclave Page Cache Map (EPCM) permissions. After VMA
creation this same rule is again enforced by the page fault handler:
faulted enclave pages are required to have equal or more relaxed
EPCM permissions than the VMA permissions.

On SGX1 systems the additional enforcement in the page fault handler
is redundant and on SGX2 systems it incorrectly prevents access.
On SGX1 systems it is unnecessary to repeat the enforcement of the
permission rule. The rule used during original VMA creation will
ensure that any access attempt will use correct permissions.
With SGX2 the EPCM permissions of a page can change after VMA
creation resulting in the VMA permissions potentially being more
relaxed than the EPCM permissions and the page fault handler
incorrectly blocking valid access attempts.

Enable the VMA's pages to remain accessible while ensuring that
the PTEs are installed to match the EPCM permissions but not be
more relaxed than the VMA permissions.

=== Full Changelog ===

An SGX enclave is an area of memory where parts of an application
can reside. First an enclave is created and loaded (from
non-enclave memory) with the code and data of an application,
then user space can map (mmap()) the enclave memory to
be able to enter the enclave at its defined entry points for
execution within it.

The hardware maintains a secure structure, the Enclave Page Cache Map
(EPCM), that tracks the contents of the enclave. Of interest here is
its tracking of the enclave page permissions. When a page is loaded
into the enclave its permissions are specified and recorded in the
EPCM. In parallel the kernel maintains permissions within the
page table entries (PTEs) and the rule is that PTE permissions
are not allowed to be more relaxed than the EPCM permissions.

A new mapping (mmap()) of enclave memory can only succeed if the
mapping has the same or weaker permissions than the permissions that
were vetted during enclave creation. This is enforced by
sgx_encl_may_map() that is called on the mmap() as well as mprotect()
paths. This rule remains.

One feature of SGX2 is to support the modification of EPCM permissions
after enclave initialization. Enclave pages may thus already be part
of a VMA at the time their EPCM permissions are changed resulting
in the VMA's permissions potentially being more relaxed than the EPCM
permissions.

Allow permissions of existing VMAs to be more relaxed than EPCM
permissions in preparation for dynamic EPCM permission changes
made possible in SGX2.  New VMAs that attempt to have more relaxed
permissions than EPCM permissions continue to be unsupported.

Reasons why permissions of existing VMAs are allowed to be more relaxed
than EPCM permissions instead of dynamically changing VMA permissions
when EPCM permissions change are:
1) Changing VMA permissions involves splitting VMAs, which is an
   operation that can fail. Additionally changing EPCM permissions of
   a range of pages could also fail on any of the pages involved.
   Handling these error cases causes problems. For example, if an
   EPCM permission change fails and the VMA has already been split
   then it is not possible to undo the VMA split nor possible to
   undo the EPCM permission changes that did succeed before the
   failure.
2) The kernel has little insight into user space, from where EPCM
   permissions are controlled. For example, a RW page may
   be made RO just before it is made RX, so splitting the VMAs
   while they may change again soon is unnecessary.

Remove the extra permission check that ensures that the VMA
permissions are not more relaxed than the EPCM permissions. The
check runs in sgx_encl_load_page(), which is called on a page fault
(vm_operations_struct->fault) or during debugging
(vm_operations_struct->access) and loads the enclave page from swap
if needed. Since a VMA could only exist if it passed the original
permission checks during mmap(), and a VMA may indeed have more
relaxed permissions than the EPCM permissions, this extra
permission check is no longer appropriate.

With the permission check removed, ensure that PTEs do
not blindly inherit the VMA permissions but instead the permissions
that the VMA and EPCM agree on. PTEs for writable pages (from VMA
and enclave perspective) are installed with the writable bit set,
limiting additional write faults to the permission mismatch
cases handled next.
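
The "permissions that the VMA and EPCM agree on" amount to a bitwise
intersection. A minimal sketch, assuming illustrative PERM_* bits and a
hypothetical helper name pte_perms() (neither is part of the patch):

```c
/* Illustrative permission bits standing in for VM_READ/VM_WRITE/VM_EXEC. */
#define PERM_R 0x1UL
#define PERM_W 0x2UL
#define PERM_X 0x4UL

/*
 * Hypothetical helper: the permissions a PTE may carry are those that
 * both the VMA and the enclave page allow.
 */
static unsigned long pte_perms(unsigned long vma_perms,
			       unsigned long page_perms)
{
	return vma_perms & page_perms;
}
```

With a RW VMA over a page that only allows reads, the result is read-only,
so the PTE never ends up more relaxed than either side.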

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Reword commit message (Jarkko).
- Use "relax" instead of "exceed" when referring to permissions (Dave).
- Add snippet to Documentation/x86/sgx.rst that highlights the
  relationship between VMA, EPCM, and PTE permissions on SGX
  systems (Andy).

 Documentation/x86/sgx.rst      | 10 +++++++++
 arch/x86/kernel/cpu/sgx/encl.c | 38 ++++++++++++++++++----------------
 2 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index 89ff924b1480..5659932728a5 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -99,6 +99,16 @@ The relationships between the different permission masks are:
 * PTEs are installed to match the EPCM permissions, but not be more
   relaxed than the VMA permissions.
 
+On systems supporting SGX2 EPCM permissions may change while the
+enclave page belongs to a VMA without impacting the VMA permissions.
+This means that a running VMA may appear to allow access to an enclave
+page that is not allowed by its EPCM permissions. An example is an
+enclave page with RW EPCM permissions that is mapped by a RW VMA but is
+subsequently changed to have read-only EPCM permissions. The kernel
+continues to maintain correct access to the enclave page through the
+PTE, which ensures that only access allowed by both the VMA
+and EPCM permissions is permitted.
+
 Application interface
 =====================
 
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 48afe96ae0f0..b6105d9e7c46 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -91,10 +91,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
 }
 
 static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
-						unsigned long addr,
-						unsigned long vm_flags)
+						unsigned long addr)
 {
-	unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
 	struct sgx_epc_page *epc_page;
 	struct sgx_encl_page *entry;
 
@@ -102,14 +100,6 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
 	if (!entry)
 		return ERR_PTR(-EFAULT);
 
-	/*
-	 * Verify that the faulted page has equal or higher build time
-	 * permissions than the VMA permissions (i.e. the subset of {VM_READ,
-	 * VM_WRITE, VM_EXECUTE} in vma->vm_flags).
-	 */
-	if ((entry->vm_max_prot_bits & vm_prot_bits) != vm_prot_bits)
-		return ERR_PTR(-EFAULT);
-
 	/* Entry successfully located. */
 	if (entry->epc_page) {
 		if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
@@ -138,7 +128,9 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
 {
 	unsigned long addr = (unsigned long)vmf->address;
 	struct vm_area_struct *vma = vmf->vma;
+	unsigned long page_prot_bits;
 	struct sgx_encl_page *entry;
+	unsigned long vm_prot_bits;
 	unsigned long phys_addr;
 	struct sgx_encl *encl;
 	vm_fault_t ret;
@@ -155,7 +147,7 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
 
 	mutex_lock(&encl->lock);
 
-	entry = sgx_encl_load_page(encl, addr, vma->vm_flags);
+	entry = sgx_encl_load_page(encl, addr);
 	if (IS_ERR(entry)) {
 		mutex_unlock(&encl->lock);
 
@@ -167,7 +159,19 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
 
 	phys_addr = sgx_get_epc_phys_addr(entry->epc_page);
 
-	ret = vmf_insert_pfn(vma, addr, PFN_DOWN(phys_addr));
+	/*
+	 * Insert PTE to match the EPCM page permissions ensured to not
+	 * exceed the VMA permissions.
+	 */
+	vm_prot_bits = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
+	page_prot_bits = entry->vm_max_prot_bits & vm_prot_bits;
+	/*
+	 * Add VM_SHARED so that PTE is made writable right away if VMA
+	 * and EPCM are writable (no COW in SGX).
+	 */
+	page_prot_bits |= (vma->vm_flags & VM_SHARED);
+	ret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
+				  vm_get_page_prot(page_prot_bits));
 	if (ret != VM_FAULT_NOPAGE) {
 		mutex_unlock(&encl->lock);
 
@@ -295,15 +299,14 @@ static int sgx_encl_debug_write(struct sgx_encl *encl, struct sgx_encl_page *pag
  * Load an enclave page to EPC if required, and take encl->lock.
  */
 static struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,
-						   unsigned long addr,
-						   unsigned long vm_flags)
+						   unsigned long addr)
 {
 	struct sgx_encl_page *entry;
 
 	for ( ; ; ) {
 		mutex_lock(&encl->lock);
 
-		entry = sgx_encl_load_page(encl, addr, vm_flags);
+		entry = sgx_encl_load_page(encl, addr);
 		if (PTR_ERR(entry) != -EBUSY)
 			break;
 
@@ -339,8 +342,7 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
 		return -EFAULT;
 
 	for (i = 0; i < len; i += cnt) {
-		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK,
-					      vma->vm_flags);
+		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK);
 		if (IS_ERR(entry)) {
 			ret = PTR_ERR(entry);
 			break;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH V2 07/32] x86/sgx: Add pfn_mkwrite() handler for present PTEs
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (5 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 08/32] x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic permission changes Reinette Chatre
                   ` (25 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

By default a write page fault on a present PTE inherits the
permissions of the VMA.

When using SGX2, enclave page permissions maintained in the
hardware's Enclave Page Cache Map (EPCM) may change after a VMA
accessing the page is created. A VMA's permissions may thus be
more relaxed than the EPCM permissions even though the VMA was
originally created not to have more relaxed permissions. Following
the default behavior during a page fault on a present PTE while
the VMA permissions are more relaxed than the EPCM permissions would
result in the PTE for an enclave page being writable even
though the page is not writable according to the EPCM permissions.

The kernel should not allow writing to a page if that page is not
writable: the PTE should accurately reflect the EPCM permissions
while not being more relaxed than the VMA permissions.

Do not blindly accept VMA permissions on a page fault due to a
write attempt to a present PTE. Install a pfn_mkwrite() handler
that ensures that the VMA permissions agree with the EPCM
permissions in this regard.

Before and after page fault flow scenarios
==========================================

Consider the following scenario that will be possible when using SGX2:
* An enclave page exists with RW EPCM permissions.
* A RW VMA maps the range spanning the enclave page.
* The enclave page's EPCM permissions are changed to read-only.
* There is no PTE for the enclave page.

Considering that the PTE is not present in the scenario,
user space will observe the following when attempting to write to the
enclave page from within the enclave:
 1) Instruction writing to enclave page is run from within the enclave.
 2) A page fault with second and third bits set (0x6) is encountered
    and handled by the SGX handler sgx_vma_fault() that installs a
    read-only page table entry following previous patch that installs
    a PTE with permissions that VMA and enclave agree on
    (read-only in this case).
 3) Instruction writing to enclave page is re-attempted.
 4) A page fault with first three bits set (0x7) is encountered and
    transparently (from SGX driver and user space perspective) handled
    by the kernel with the PTE made writable because the VMA is
    writable.
 5) Instruction writing to enclave page is re-attempted.
 6) Since the EPCM permissions prevent writing to the page a new page
    fault is encountered, this time with the SGX flag set in the error
    code (0x8007). No action is taken by the kernel for this page fault
    and execution returns to user space.
 7) Typically such a fault will be passed on to an application with a
    signal but if the enclave is entered with the vDSO function provided
    by the kernel then user space does not receive a signal but instead
    the vDSO function returns successfully with exception information
    (vector=14, error code=0x8007, and address) within the exception
    fields within the vDSO function's struct sgx_enclave_run.
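
The error codes quoted in the steps above (0x6, 0x7, and 0x8007) can be
decoded with a small helper. This is a sketch for illustration only: the
PF_* names and struct pf_info are local to the example, while the bit
positions follow the x86 page fault error code layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page fault error code bits (names are local to this sketch). */
#define PF_PROT  (1u << 0)	/* fault on a present PTE */
#define PF_WRITE (1u << 1)	/* faulting access was a write */
#define PF_USER  (1u << 2)	/* faulting access came from user mode */
#define PF_SGX   (1u << 15)	/* fault caused by SGX access control */

struct pf_info {
	bool present;
	bool write;
	bool user;
	bool sgx;
};

/* Split a page fault error code into the flags discussed above. */
static struct pf_info decode_pf(uint32_t error_code)
{
	struct pf_info info = {
		.present = !!(error_code & PF_PROT),
		.write   = !!(error_code & PF_WRITE),
		.user    = !!(error_code & PF_USER),
		.sgx     = !!(error_code & PF_SGX),
	};

	return info;
}
```

Decoded this way, 0x6 is a user write to a non-present PTE, 0x7 is a user
write to a present PTE, and 0x8007 is the same access rejected by SGX
access control.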

As can be observed it is not possible for user space to write to an
enclave page if that page's EPCM permissions do not allow it,
no matter what the VMA or PTE allows.

Even so, the kernel should not allow writing to a page if that page is
not writable. The PTE should accurately reflect the EPCM permissions.

With a pfn_mkwrite() handler that ensures that the VMA permissions
agree with the EPCM permissions user space observes the following
when attempting to write to the enclave page from within the enclave:
 1) Instruction writing to enclave page is run from within the enclave.
 2) A page fault with second and third bits set (0x6) is encountered
    and handled by the SGX handler sgx_vma_fault() that installs a
    read-only page table entry following previous patch that installs
    a PTE with permissions that VMA and enclave agree on
    (read-only in this case).
 3) Instruction writing to enclave page is re-attempted.
 4) A page fault with first three bits set (0x7) is encountered and
    passed to the pfn_mkwrite() handler for consideration. The handler
    determines that the page should not be writable and returns
    VM_FAULT_SIGBUS.
 5) Typically such a fault will be passed on to an application with a
    signal but if the enclave is entered with the vDSO function provided
    by the kernel then user space does not receive a signal but instead
    the vDSO function returns successfully with exception information
    (vector=14, error code=0x7, and address) within the exception fields
    within the vDSO function's struct sgx_enclave_run.

The accurate exception information supports the SGX runtime, which is
virtually always implemented inside a shared library, by providing
accurate information in support of its management of the SGX enclave.
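
To illustrate how a runtime might consume this exception information, the
sketch below classifies a reported fault by its vector and error code. It
is an assumption-laden model: struct run_exception is a hypothetical
stand-in for the exception fields of struct sgx_enclave_run, and
classify_fault() and enum fault_source are names invented for the example.

```c
#include <stdint.h>

#define PF_VECTOR 14		/* x86 page fault exception vector */
#define PF_SGX    (1u << 15)	/* SGX flag in the page fault error code */

/* Hypothetical stand-in for the exception fields reported by the vDSO. */
struct run_exception {
	uint16_t vector;
	uint16_t error_code;
	uint64_t addr;
};

enum fault_source {
	FAULT_OTHER,	/* not a page fault */
	FAULT_PTE,	/* access refused at the page table level */
	FAULT_EPCM,	/* access refused by SGX (EPCM) access control */
};

/* Decide which layer refused the access, based on the reported fault. */
static enum fault_source classify_fault(const struct run_exception *e)
{
	if (e->vector != PF_VECTOR)
		return FAULT_OTHER;

	return (e->error_code & PF_SGX) ? FAULT_EPCM : FAULT_PTE;
}
```

In the flows described above, error code 0x8007 would classify as
FAULT_EPCM while error code 0x7 would classify as FAULT_PTE.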

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Highlight in commit message that the behavior cannot happen in
  existing code but instead is behavior that becomes possible with SGX2
  (Jarkko).
- Reword commit message and remove the Q&A format (Jarkko).

 arch/x86/kernel/cpu/sgx/encl.c | 42 ++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index b6105d9e7c46..1ba01c75a579 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -184,6 +184,47 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
 	return VM_FAULT_NOPAGE;
 }
 
+/*
+ * A fault occurred while writing to a present enclave PTE. Since PTE is
+ * present this will not be handled by sgx_vma_fault(). VMA may allow
+ * writing to the page while enclave (as based on EPCM permissions) does
+ * not. Do not follow the default of inheriting VMA permissions in this
+ * regard, ensure enclave also allows writing to the page.
+ */
+static vm_fault_t sgx_vma_pfn_mkwrite(struct vm_fault *vmf)
+{
+	unsigned long addr = (unsigned long)vmf->address;
+	struct vm_area_struct *vma = vmf->vma;
+	struct sgx_encl_page *entry;
+	struct sgx_encl *encl;
+	vm_fault_t ret = 0;
+
+	encl = vma->vm_private_data;
+
+	/*
+	 * It's very unlikely but possible that allocating memory for the
+	 * mm_list entry of a forked process failed in sgx_vma_open(). When
+	 * this happens, vm_private_data is set to NULL.
+	 */
+	if (unlikely(!encl))
+		return VM_FAULT_SIGBUS;
+
+	mutex_lock(&encl->lock);
+
+	entry = xa_load(&encl->page_array, PFN_DOWN(addr));
+	if (!entry) {
+		ret = VM_FAULT_SIGBUS;
+		goto out;
+	}
+
+	if (!(entry->vm_max_prot_bits & VM_WRITE))
+		ret = VM_FAULT_SIGBUS;
+
+out:
+	mutex_unlock(&encl->lock);
+	return ret;
+}
+
 static void sgx_vma_open(struct vm_area_struct *vma)
 {
 	struct sgx_encl *encl = vma->vm_private_data;
@@ -381,6 +422,7 @@ const struct vm_operations_struct sgx_vm_ops = {
 	.mprotect = sgx_vma_mprotect,
 	.open = sgx_vma_open,
 	.access = sgx_vma_access,
+	.pfn_mkwrite = sgx_vma_pfn_mkwrite,
 };
 
 /**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH V2 08/32] x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic permission changes
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (6 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 07/32] x86/sgx: Add pfn_mkwrite() handler for present PTEs Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-03-04  8:55   ` Jarkko Sakkinen
  2022-02-08  0:45 ` [PATCH V2 09/32] x86/sgx: Export sgx_encl_ewb_cpumask() Reinette Chatre
                   ` (24 subsequent siblings)
  32 siblings, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

Enclave creators declare their enclave page permissions (EPCM
permissions) at the time the pages are added to the enclave. These
page permissions are the vetted permissible accesses of the enclave
pages and are stashed off (in struct sgx_encl_page->vm_max_prot_bits)
for later comparison with enclave PTEs and VMAs.

Current permission support assumes that EPCM permissions remain static
for the lifetime of the enclave. This is about to change with the
addition of support for SGX2 where the EPCM permissions of enclave
pages belonging to an initialized enclave may change during the
enclave's lifetime.

Support for changing of EPCM permissions should continue to respect
the vetted maximum protection bits maintained in
sgx_encl_page->vm_max_prot_bits. Towards this end, add
sgx_encl_page->vm_run_prot_bits in preparation for support of
enclave page permission changes. sgx_encl_page->vm_run_prot_bits
reflects the active EPCM permissions of an enclave page and is not to
exceed sgx_encl_page->vm_max_prot_bits.

Two permission fields are used: sgx_encl_page->vm_run_prot_bits
reflects the current EPCM permissions and is used to manage the page
table entries while sgx_encl_page->vm_max_prot_bits contains the vetted
maximum protection bits and is used to guide which EPCM permissions
are allowed in the upcoming SGX2 permission changing support (it guides
what values sgx_encl_page->vm_run_prot_bits may have).

Consider this example of how sgx_encl_page->vm_max_prot_bits and
sgx_encl_page->vm_run_prot_bits are used:

(1) Add an enclave page with secinfo of RW to an uninitialized enclave:
    sgx_encl_page->vm_max_prot_bits = RW
    sgx_encl_page->vm_run_prot_bits = RW

    At this point RW VMAs would be allowed to access this page and PTEs
    would allow write access as guided by
    sgx_encl_page->vm_run_prot_bits.

(2) User space invokes SGX2 to change the EPCM permissions to read-only.
    This is allowed because sgx_encl_page->vm_max_prot_bits = RW:
    sgx_encl_page->vm_max_prot_bits = RW
    sgx_encl_page->vm_run_prot_bits = R

    At this point only new read-only VMAs would be allowed to access
    this page and PTEs would not allow write access as guided
    by sgx_encl_page->vm_run_prot_bits.

(3) User space invokes SGX2 to change the EPCM permissions to RX.
    This will not be supported by the kernel because
    sgx_encl_page->vm_max_prot_bits = RW:
    sgx_encl_page->vm_max_prot_bits = RW
    sgx_encl_page->vm_run_prot_bits = R

(4) User space invokes SGX2 to change the EPCM permissions to RW.
    This will be allowed because sgx_encl_page->vm_max_prot_bits = RW:
    sgx_encl_page->vm_max_prot_bits = RW
    sgx_encl_page->vm_run_prot_bits = RW

    At this point RW VMAs would again be allowed to access this page
    and PTEs would allow write access as guided by
    sgx_encl_page->vm_run_prot_bits.
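
The example above can be modeled in a few lines of user-space C. This is
a sketch only: struct page_perms, change_perms(), and the PERM_* bits are
hypothetical names for illustration, and the actual SGX2 permission
changing support is added by later patches in this series.

```c
#include <stdbool.h>

/* Illustrative permission bits standing in for VM_READ/VM_WRITE/VM_EXEC. */
#define PERM_R 0x1UL
#define PERM_W 0x2UL
#define PERM_X 0x4UL

/* Hypothetical model of the two per-page permission fields. */
struct page_perms {
	unsigned long max_bits;	/* vetted maximum, fixed when page is added */
	unsigned long run_bits;	/* currently active permissions */
};

/*
 * Apply @want as the new active permissions if it stays within the
 * vetted maximum. On failure the active permissions are left unchanged.
 */
static bool change_perms(struct page_perms *p, unsigned long want)
{
	if (want & ~p->max_bits)
		return false;

	p->run_bits = want;
	return true;
}
```

Starting from a page added as RW, a change to R succeeds, a change to RX
is refused because X is outside the vetted maximum, and a change back to
RW succeeds.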

struct sgx_encl_page hosting this information is maintained for each
enclave page so the space consumed by the struct is important.
The existing sgx_encl_page->vm_max_prot_bits is already unsigned long
while only using three bits. Transition to a bitfield for the two
members containing protection bits.
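
The space argument can be checked with a small stand-alone comparison. The
two structs below are hypothetical and only contain the members under
discussion; bit-fields of type unsigned long are an extension that GCC and
Clang support, and the exact sizes depend on the ABI.

```c
#include <stddef.h>

/* Two full-width members, as adding a second unsigned long would need. */
struct two_longs {
	unsigned long vm_max_prot_bits;
	unsigned long vm_run_prot_bits;
};

/* Two 8-bit bit-fields sharing a single unsigned long storage unit. */
struct two_bitfields {
	unsigned long vm_max_prot_bits:8;
	unsigned long vm_run_prot_bits:8;
};

/* Bytes saved per page by using bit-fields for both members. */
static size_t bytes_saved(void)
{
	return sizeof(struct two_longs) - sizeof(struct two_bitfields);
}
```

Since only three permission bits are in use, eight bits per member leave
room to spare while the pair still fits in one storage unit.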

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Add snippet to Documentation/x86/sgx.rst that details the difference
  between vm_max_prot_bits and vm_run_prot_bits (Andy and Jarkko).
- Change subject line (Jarkko).
- Refer to actual variables instead of using English rephrasing -
  sgx_encl_page->vm_run_prot_bits instead of "runtime
  protection bits" (Jarkko).
- Add information in commit message on why two fields are needed
  (Jarkko).

 Documentation/x86/sgx.rst       | 10 ++++++++++
 arch/x86/kernel/cpu/sgx/encl.c  |  6 +++---
 arch/x86/kernel/cpu/sgx/encl.h  |  3 ++-
 arch/x86/kernel/cpu/sgx/ioctl.c |  6 ++++++
 4 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index 5659932728a5..9df620b59f83 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -99,6 +99,16 @@ The relationships between the different permission masks are:
 * PTEs are installed to match the EPCM permissions, but not be more
   relaxed than the VMA permissions.
 
+During runtime the EPCM permissions of enclave pages belonging to an
+initialized enclave can change on systems supporting SGX2. In support
+of these runtime changes the kernel maintains (for each enclave page)
+the most permissive EPCM permission mask allowed by policy as
+the ``vm_max_prot_bits`` of that page. EPCM permissions are not allowed
+to be relaxed beyond ``vm_max_prot_bits``.  The kernel also maintains
+the currently active EPCM permissions of an enclave page as its
+``vm_run_prot_bits`` to ensure PTEs and new VMAs respect the active
+EPCM permission values.
+
 On systems supporting SGX2 EPCM permissions may change while the
 enclave page belongs to a VMA without impacting the VMA permissions.
 This means that a running VMA may appear to allow access to an enclave
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 1ba01c75a579..a980d8458949 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -164,7 +164,7 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
 	 * exceed the VMA permissions.
 	 */
 	vm_prot_bits = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
-	page_prot_bits = entry->vm_max_prot_bits & vm_prot_bits;
+	page_prot_bits = entry->vm_run_prot_bits & vm_prot_bits;
 	/*
 	 * Add VM_SHARED so that PTE is made writable right away if VMA
 	 * and EPCM are writable (no COW in SGX).
@@ -217,7 +217,7 @@ static vm_fault_t sgx_vma_pfn_mkwrite(struct vm_fault *vmf)
 		goto out;
 	}
 
-	if (!(entry->vm_max_prot_bits & VM_WRITE))
+	if (!(entry->vm_run_prot_bits & VM_WRITE))
 		ret = VM_FAULT_SIGBUS;
 
 out:
@@ -280,7 +280,7 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
 	mutex_lock(&encl->lock);
 	xas_lock(&xas);
 	xas_for_each(&xas, page, PFN_DOWN(end - 1)) {
-		if (~page->vm_max_prot_bits & vm_prot_bits) {
+		if (~page->vm_run_prot_bits & vm_prot_bits) {
 			ret = -EACCES;
 			break;
 		}
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index fec43ca65065..dc262d843411 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -27,7 +27,8 @@
 
 struct sgx_encl_page {
 	unsigned long desc;
-	unsigned long vm_max_prot_bits;
+	unsigned long vm_max_prot_bits:8;
+	unsigned long vm_run_prot_bits:8;
 	struct sgx_epc_page *epc_page;
 	struct sgx_encl *encl;
 	struct sgx_va_page *va_page;
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 83df20e3e633..7e0819a89532 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -197,6 +197,12 @@ static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
 	/* Calculate maximum of the VM flags for the page. */
 	encl_page->vm_max_prot_bits = calc_vm_prot_bits(prot, 0);
 
+	/*
+	 * At time of allocation, the runtime protection bits are the same
+	 * as the maximum protection bits.
+	 */
+	encl_page->vm_run_prot_bits = encl_page->vm_max_prot_bits;
+
 	return encl_page;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH V2 09/32] x86/sgx: Export sgx_encl_ewb_cpumask()
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (7 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 08/32] x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic permission changes Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 10/32] x86/sgx: Rename sgx_encl_ewb_cpumask() as sgx_encl_cpumask() Reinette Chatre
                   ` (23 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

Using sgx_encl_ewb_cpumask() to learn which CPUs might have executed
an enclave is useful to ensure that TLBs are cleared when changes are
made to enclave pages.

sgx_encl_ewb_cpumask() is used within the reclaimer when an enclave
page is evicted. The upcoming SGX2 support enables changes to be
made to enclave pages, which requires that TLBs no longer refer to
the changed pages, and will thus also need sgx_encl_ewb_cpumask().

Relocate sgx_encl_ewb_cpumask() to be with the rest of the enclave
code in encl.c now that it is no longer unique to the reclaimer.

Take care to ensure that any future usage maintains the
current context requirement that ETRACK has been called first.
Expand the existing comments to highlight this while moving them
to a more prominent location before the function.
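
For readers unfamiliar with the function, its core is a union of the CPU
masks of all mms attached to the enclave, skipping mms that are already
going away. A simplified user-space model follows, in which struct
mm_model and enclave_cpumask() are hypothetical names and a 64-bit integer
stands in for cpumask_t:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one mm attached to the enclave. */
struct mm_model {
	bool alive;		/* false models mmget_not_zero() failing */
	uint64_t cpumask;	/* bit N set: mm may have run on CPU N */
};

/* Collect every CPU that might be accessing the enclave. */
static uint64_t enclave_cpumask(const struct mm_model *mms, size_t n)
{
	uint64_t mask = 0;

	for (size_t i = 0; i < n; i++) {
		/* An mm that is being torn down contributes no CPUs. */
		if (!mms[i].alive)
			continue;
		mask |= mms[i].cpumask;
	}

	return mask;
}
```

The model leaves out the SRCU protection of the mm list and the ordering
requirement relative to ENCLS[ETRACK], both of which matter in the kernel
implementation.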

No functional change.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- New patch split from original "x86/sgx: Use more generic name for
  enclave cpumask function" (Jarkko).
- Change subject line (Jarkko).
- Fixup kernel-doc to use brackets in function name.

 arch/x86/kernel/cpu/sgx/encl.c | 67 ++++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.h |  1 +
 arch/x86/kernel/cpu/sgx/main.c | 29 ---------------
 3 files changed, 68 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index a980d8458949..687166769ca8 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -597,6 +597,73 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
 	return 0;
 }
 
+/**
+ * sgx_encl_ewb_cpumask() - Query which CPUs might be accessing the enclave
+ * @encl: the enclave
+ *
+ * Some SGX functions require that no cached linear-to-physical address
+ * mappings are present before they can succeed. For example, ENCLS[EWB]
+ * copies a page from the enclave page cache to regular main memory but
+ * it fails if it cannot ensure that there are no cached
+ * linear-to-physical address mappings referring to the page.
+ *
+ * SGX hardware flushes all cached linear-to-physical mappings on a CPU
+ * when an enclave is exited via ENCLU[EEXIT] or an Asynchronous Enclave
+ * Exit (AEX). Exiting an enclave will thus ensure cached linear-to-physical
+ * address mappings are cleared but coordination with the tracking done within
+ * the SGX hardware is needed to support the SGX functions that depend on this
+ * cache clearing.
+ *
+ * When the ENCLS[ETRACK] function is issued on an enclave the hardware
+ * tracks threads operating inside the enclave at that time. The SGX
+ * hardware tracking requires that all the identified threads must have
+ * exited the enclave in order to flush the mappings before a function such
+ * as ENCLS[EWB] will be permitted.
+ *
+ * The following flow is used to support SGX functions that require that
+ * no cached linear-to-physical address mappings are present:
+ * 1) Execute ENCLS[ETRACK] to initiate hardware tracking.
+ * 2) Use this function (sgx_encl_ewb_cpumask()) to query which CPUs might be
+ *    accessing the enclave.
+ * 3) Send IPI to identified CPUs, kicking them out of the enclave and
+ *    thus flushing all locally cached linear-to-physical address mappings.
+ * 4) Execute SGX function.
+ *
+ * Context: It is required to call this function after ENCLS[ETRACK].
+ *          This will ensure that if any new mm appears (racing with
+ *          sgx_encl_mm_add()) then the new mm will enter into the
+ *          enclave with fresh linear-to-physical address mappings.
+ *
+ *          It is required that all IPIs are completed before a new
+ *          ENCLS[ETRACK] is issued so be sure to protect steps 1 to 3
+ *          of the above flow with the enclave's mutex.
+ *
+ * Return: cpumask of CPUs that might be accessing @encl
+ */
+const cpumask_t *sgx_encl_ewb_cpumask(struct sgx_encl *encl)
+{
+	cpumask_t *cpumask = &encl->cpumask;
+	struct sgx_encl_mm *encl_mm;
+	int idx;
+
+	cpumask_clear(cpumask);
+
+	idx = srcu_read_lock(&encl->srcu);
+
+	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
+		if (!mmget_not_zero(encl_mm->mm))
+			continue;
+
+		cpumask_or(cpumask, cpumask, mm_cpumask(encl_mm->mm));
+
+		mmput_async(encl_mm->mm);
+	}
+
+	srcu_read_unlock(&encl->srcu, idx);
+
+	return cpumask;
+}
+
 static struct page *sgx_encl_get_backing_page(struct sgx_encl *encl,
 					      pgoff_t index)
 {
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index dc262d843411..44431da21757 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -106,6 +106,7 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
 
 void sgx_encl_release(struct kref *ref);
 int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
+const cpumask_t *sgx_encl_ewb_cpumask(struct sgx_encl *encl);
 int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
 			 struct sgx_backing *backing);
 void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 8e4bc6453d26..2de85f459492 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -203,35 +203,6 @@ static void sgx_ipi_cb(void *info)
 {
 }
 
-static const cpumask_t *sgx_encl_ewb_cpumask(struct sgx_encl *encl)
-{
-	cpumask_t *cpumask = &encl->cpumask;
-	struct sgx_encl_mm *encl_mm;
-	int idx;
-
-	/*
-	 * Can race with sgx_encl_mm_add(), but ETRACK has already been
-	 * executed, which means that the CPUs running in the new mm will enter
-	 * into the enclave with a fresh epoch.
-	 */
-	cpumask_clear(cpumask);
-
-	idx = srcu_read_lock(&encl->srcu);
-
-	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
-		if (!mmget_not_zero(encl_mm->mm))
-			continue;
-
-		cpumask_or(cpumask, cpumask, mm_cpumask(encl_mm->mm));
-
-		mmput_async(encl_mm->mm);
-	}
-
-	srcu_read_unlock(&encl->srcu, idx);
-
-	return cpumask;
-}
-
 /*
  * Swap page to the regular memory transformed to the blocked state by using
  * EBLOCK, which means that it can no longer be referenced (no new TLB entries).
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH V2 10/32] x86/sgx: Rename sgx_encl_ewb_cpumask() as sgx_encl_cpumask()
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (8 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 09/32] x86/sgx: Export sgx_encl_ewb_cpumask() Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 11/32] x86/sgx: Move PTE zap code to new sgx_zap_enclave_ptes() Reinette Chatre
                   ` (22 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

sgx_encl_ewb_cpumask() is no longer unique to the reclaimer, where it
is used during the EWB ENCLS leaf function when EPC pages are written
out to main memory. There, sgx_encl_ewb_cpumask() is used to learn which
CPUs might have executed the enclave to ensure that TLBs are cleared.

Upcoming SGX2 enabling will use sgx_encl_ewb_cpumask() during the
EMODPR and EMODT ENCLS leaf functions that make changes to enclave
pages. The function is needed for the same reason it is used now: to
learn which CPUs might have executed the enclave to ensure that TLBs
no longer point to the changed pages.

Rename sgx_encl_ewb_cpumask() to sgx_encl_cpumask() to reflect the
broader usage.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- New patch split from original "x86/sgx: Use more generic name for
  enclave cpumask function" (Jarkko).

 arch/x86/kernel/cpu/sgx/encl.c | 6 +++---
 arch/x86/kernel/cpu/sgx/encl.h | 2 +-
 arch/x86/kernel/cpu/sgx/main.c | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 687166769ca8..6f5d01121766 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -598,7 +598,7 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
 }
 
 /**
- * sgx_encl_ewb_cpumask() - Query which CPUs might be accessing the enclave
+ * sgx_encl_cpumask() - Query which CPUs might be accessing the enclave
  * @encl: the enclave
  *
  * Some SGX functions require that no cached linear-to-physical address
@@ -623,7 +623,7 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
  * The following flow is used to support SGX functions that require that
  * no cached linear-to-physical address mappings are present:
  * 1) Execute ENCLS[ETRACK] to initiate hardware tracking.
- * 2) Use this function (sgx_encl_ewb_cpumask()) to query which CPUs might be
+ * 2) Use this function (sgx_encl_cpumask()) to query which CPUs might be
  *    accessing the enclave.
  * 3) Send IPI to identified CPUs, kicking them out of the enclave and
  *    thus flushing all locally cached linear-to-physical address mappings.
@@ -640,7 +640,7 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
  *
  * Return: cpumask of CPUs that might be accessing @encl
  */
-const cpumask_t *sgx_encl_ewb_cpumask(struct sgx_encl *encl)
+const cpumask_t *sgx_encl_cpumask(struct sgx_encl *encl)
 {
 	cpumask_t *cpumask = &encl->cpumask;
 	struct sgx_encl_mm *encl_mm;
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 44431da21757..becb68503baa 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -106,7 +106,7 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
 
 void sgx_encl_release(struct kref *ref);
 int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
-const cpumask_t *sgx_encl_ewb_cpumask(struct sgx_encl *encl);
+const cpumask_t *sgx_encl_cpumask(struct sgx_encl *encl);
 int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
 			 struct sgx_backing *backing);
 void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 2de85f459492..fa33922879bf 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -249,7 +249,7 @@ static void sgx_encl_ewb(struct sgx_epc_page *epc_page,
 			 * miss cpus that entered the enclave between
 			 * generating the mask and incrementing epoch.
 			 */
-			on_each_cpu_mask(sgx_encl_ewb_cpumask(encl),
+			on_each_cpu_mask(sgx_encl_cpumask(encl),
 					 sgx_ipi_cb, NULL, 1);
 			ret = __sgx_encl_ewb(epc_page, va_slot, backing);
 		}
-- 
2.25.1



* [PATCH V2 11/32] x86/sgx: Move PTE zap code to new sgx_zap_enclave_ptes()
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (9 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 10/32] x86/sgx: Rename sgx_encl_ewb_cpumask() as sgx_encl_cpumask() Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 12/32] x86/sgx: Make sgx_ipi_cb() available internally Reinette Chatre
                   ` (21 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

The SGX reclaimer removes page table entries pointing to pages that are
moved to swap.

SGX2 enables changes to pages belonging to an initialized enclave, thus
enclave pages may have their permission or type changed while the page
is being accessed by an enclave. Supporting SGX2 requires page table
entries to be removed so that any cached mappings to changed pages
are removed. For example, with the ability to change enclave page types
a regular enclave page may be changed to a Thread Control Structure
(TCS) page that may not be accessed by an enclave.

Factor out the code removing page table entries to a separate function
sgx_zap_enclave_ptes(), fixing accuracy of comments in the process,
and make it available to the upcoming SGX2 code.

Place sgx_zap_enclave_ptes() with the rest of the enclave code in
encl.c interacting with the page table since this code is no longer
unique to the reclaimer.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Elaborate why SGX2 needs this ability (Jarkko).
- More specific subject.
- Fix kernel-doc to have brackets in function name.

 arch/x86/kernel/cpu/sgx/encl.c | 45 +++++++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/encl.h |  2 +-
 arch/x86/kernel/cpu/sgx/main.c | 31 ++---------------------
 3 files changed, 47 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 6f5d01121766..8da813504249 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -589,7 +589,7 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
 
 	spin_lock(&encl->mm_lock);
 	list_add_rcu(&encl_mm->list, &encl->mm_list);
-	/* Pairs with smp_rmb() in sgx_reclaimer_block(). */
+	/* Pairs with smp_rmb() in sgx_zap_enclave_ptes(). */
 	smp_wmb();
 	encl->mm_list_version++;
 	spin_unlock(&encl->mm_lock);
@@ -778,6 +778,49 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 	return ret;
 }
 
+/**
+ * sgx_zap_enclave_ptes() - remove PTEs mapping the address from enclave
+ * @encl: the enclave
+ * @addr: page aligned pointer to single page for which PTEs will be removed
+ *
+ * Multiple VMAs may have an enclave page mapped. Remove the PTE mapping
+ * @addr from each VMA. Ensure that page fault handler is ready to handle
+ * new mappings of @addr before calling this function.
+ */
+void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr)
+{
+	unsigned long mm_list_version;
+	struct sgx_encl_mm *encl_mm;
+	struct vm_area_struct *vma;
+	int idx, ret;
+
+	do {
+		mm_list_version = encl->mm_list_version;
+
+		/* Pairs with smp_wmb() in sgx_encl_mm_add(). */
+		smp_rmb();
+
+		idx = srcu_read_lock(&encl->srcu);
+
+		list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
+			if (!mmget_not_zero(encl_mm->mm))
+				continue;
+
+			mmap_read_lock(encl_mm->mm);
+
+			ret = sgx_encl_find(encl_mm->mm, addr, &vma);
+			if (!ret && encl == vma->vm_private_data)
+				zap_vma_ptes(vma, addr, PAGE_SIZE);
+
+			mmap_read_unlock(encl_mm->mm);
+
+			mmput_async(encl_mm->mm);
+		}
+
+		srcu_read_unlock(&encl->srcu, idx);
+	} while (unlikely(encl->mm_list_version != mm_list_version));
+}
+
 /**
  * sgx_alloc_va_page() - Allocate a Version Array (VA) page
  *
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index becb68503baa..82e21088e68b 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -112,7 +112,7 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
 void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
-
+void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr);
 struct sgx_epc_page *sgx_alloc_va_page(void);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
 void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index fa33922879bf..ce9e87d5f8ec 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -137,36 +137,9 @@ static void sgx_reclaimer_block(struct sgx_epc_page *epc_page)
 	struct sgx_encl_page *page = epc_page->owner;
 	unsigned long addr = page->desc & PAGE_MASK;
 	struct sgx_encl *encl = page->encl;
-	unsigned long mm_list_version;
-	struct sgx_encl_mm *encl_mm;
-	struct vm_area_struct *vma;
-	int idx, ret;
-
-	do {
-		mm_list_version = encl->mm_list_version;
-
-		/* Pairs with smp_rmb() in sgx_encl_mm_add(). */
-		smp_rmb();
-
-		idx = srcu_read_lock(&encl->srcu);
-
-		list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
-			if (!mmget_not_zero(encl_mm->mm))
-				continue;
-
-			mmap_read_lock(encl_mm->mm);
-
-			ret = sgx_encl_find(encl_mm->mm, addr, &vma);
-			if (!ret && encl == vma->vm_private_data)
-				zap_vma_ptes(vma, addr, PAGE_SIZE);
-
-			mmap_read_unlock(encl_mm->mm);
-
-			mmput_async(encl_mm->mm);
-		}
+	int ret;
 
-		srcu_read_unlock(&encl->srcu, idx);
-	} while (unlikely(encl->mm_list_version != mm_list_version));
+	sgx_zap_enclave_ptes(encl, addr);
 
 	mutex_lock(&encl->lock);
 
-- 
2.25.1



* [PATCH V2 12/32] x86/sgx: Make sgx_ipi_cb() available internally
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (10 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 11/32] x86/sgx: Move PTE zap code to new sgx_zap_enclave_ptes() Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 13/32] x86/sgx: Create utility to validate user provided offset and length Reinette Chatre
                   ` (20 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

The ETRACK function followed by an IPI to all CPUs within an enclave
is a common pattern with more frequent use in support of SGX2.

Make the (empty) IPI callback function available internally in
preparation for usage by SGX2.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Replace "for more usages" by "for usage by SGX2" (Jarkko)

 arch/x86/kernel/cpu/sgx/main.c | 2 +-
 arch/x86/kernel/cpu/sgx/sgx.h  | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index ce9e87d5f8ec..6e2cb7564080 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -172,7 +172,7 @@ static int __sgx_encl_ewb(struct sgx_epc_page *epc_page, void *va_slot,
 	return ret;
 }
 
-static void sgx_ipi_cb(void *info)
+void sgx_ipi_cb(void *info)
 {
 }
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 0f17def9fe6f..b30cee4de903 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -90,6 +90,8 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
 
+void sgx_ipi_cb(void *info);
+
 #ifdef CONFIG_X86_SGX_KVM
 int __init sgx_vepc_init(void);
 #else
-- 
2.25.1



* [PATCH V2 13/32] x86/sgx: Create utility to validate user provided offset and length
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (11 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 12/32] x86/sgx: Make sgx_ipi_cb() available internally Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 14/32] x86/sgx: Keep record of SGX page type Reinette Chatre
                   ` (19 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

User provided offset and length are validated when parsing the parameters
of the SGX_IOC_ENCLAVE_ADD_PAGES ioctl(). Extract this validation
into a utility that can be used by the SGX2 ioctl()s that will
also provide these values.
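
The three checks being factored out can be sketched in user-space C as
shown here. This is only an illustration under stated assumptions:
PAGE_SZ, validate_offset_length() and the plain encl_size parameter are
hypothetical stand-ins for the kernel's PAGE_SIZE, the new static helper
and encl->size, and -1 stands in for -EINVAL.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SZ 4096ULL /* assumed 4 KiB pages, stand-in for PAGE_SIZE */

/* Returns 0 when offset/length describe whole pages inside the enclave. */
static int validate_offset_length(uint64_t encl_size, uint64_t offset,
				  uint64_t length)
{
	/* The offset must be page aligned. */
	if (offset & (PAGE_SZ - 1))
		return -1;

	/* The length must be a non-zero multiple of the page size. */
	if (!length || (length & (PAGE_SZ - 1)))
		return -1;

	/* The start of the last page must fall inside the enclave. */
	if (offset + length - PAGE_SZ >= encl_size)
		return -1;

	return 0;
}

/* Usage example for a hypothetical eight page enclave. */
void validate_demo(void)
{
	uint64_t size = 8 * PAGE_SZ;

	assert(validate_offset_length(size, 0, PAGE_SZ) == 0);
	assert(validate_offset_length(size, 7 * PAGE_SZ, PAGE_SZ) == 0);
	/* One page past the end of the enclave is rejected. */
	assert(validate_offset_length(size, 7 * PAGE_SZ, 2 * PAGE_SZ) == -1);
}
```

Comparing the start of the last page (rather than the end of the range)
against the enclave size mirrors the expression used in the patch.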

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- New patch

 arch/x86/kernel/cpu/sgx/ioctl.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 7e0819a89532..6e7cc441156b 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -378,6 +378,26 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
 	return ret;
 }
 
+/*
+ * Ensure user provided offset and length values are valid for
+ * an enclave.
+ */
+static int sgx_validate_offset_length(struct sgx_encl *encl,
+				      unsigned long offset,
+				      unsigned long length)
+{
+	if (!IS_ALIGNED(offset, PAGE_SIZE))
+		return -EINVAL;
+
+	if (!length || length & (PAGE_SIZE - 1))
+		return -EINVAL;
+
+	if (offset + length - PAGE_SIZE >= encl->size)
+		return -EINVAL;
+
+	return 0;
+}
+
 /**
  * sgx_ioc_enclave_add_pages() - The handler for %SGX_IOC_ENCLAVE_ADD_PAGES
  * @encl:       an enclave pointer
@@ -431,14 +451,10 @@ static long sgx_ioc_enclave_add_pages(struct sgx_encl *encl, void __user *arg)
 	if (copy_from_user(&add_arg, arg, sizeof(add_arg)))
 		return -EFAULT;
 
-	if (!IS_ALIGNED(add_arg.offset, PAGE_SIZE) ||
-	    !IS_ALIGNED(add_arg.src, PAGE_SIZE))
-		return -EINVAL;
-
-	if (!add_arg.length || add_arg.length & (PAGE_SIZE - 1))
+	if (!IS_ALIGNED(add_arg.src, PAGE_SIZE))
 		return -EINVAL;
 
-	if (add_arg.offset + add_arg.length - PAGE_SIZE >= encl->size)
+	if (sgx_validate_offset_length(encl, add_arg.offset, add_arg.length))
 		return -EINVAL;
 
 	if (copy_from_user(&secinfo, (void __user *)add_arg.secinfo,
-- 
2.25.1



* [PATCH V2 14/32] x86/sgx: Keep record of SGX page type
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (12 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 13/32] x86/sgx: Create utility to validate user provided offset and length Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 15/32] x86/sgx: Support relaxing of enclave page permissions Reinette Chatre
                   ` (18 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

SGX2 functions are not allowed on all page types. For example,
ENCLS[EMODPR] is only allowed on regular SGX enclave pages and
ENCLS[EMODT] is only allowed on TCS and regular pages. If these
functions are attempted on another type of page the hardware would
trigger a fault.

Keep a record of the SGX page type so that there is more
certainty whether an SGX2 instruction can succeed and faults
can be treated as real failures.

The page type is a property of struct sgx_encl_page
and thus does not cover the VA page type. VA pages are maintained
in separate structures and their type can be determined in
a different way. The SGX2 instructions needing the page type do not
operate on VA pages and this is thus not a scenario needing to
be covered at this time.

With the protection bits consuming 16 bits of the unsigned long
there is room available in the bitfield to include the page type
information without increasing the space consumed by the struct.
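
The space argument can be checked with a small stand-alone C sketch.
The struct names, the 0xff00 mask value and the page type value 2 for a
regular page are assumptions made for this illustration, and an
unsigned long bitfield is used where the patch uses enum sgx_page_type.

```c
#include <assert.h>
#include <stdint.h>

/* Layout before the patch: two 8-bit protection fields share one word. */
struct page_before {
	unsigned long desc;
	unsigned long vm_max_prot_bits:8;
	unsigned long vm_run_prot_bits:8;
};

/* Layout after: a 16-bit type field uses spare bits of the same word. */
struct page_after {
	unsigned long desc;
	unsigned long vm_max_prot_bits:8;
	unsigned long vm_run_prot_bits:8;
	unsigned long type:16; /* the patch declares this as an enum */
};

/* Assumed for this sketch: page type lives in bits 15:8 of SECINFO.flags. */
#define SECINFO_PAGE_TYPE_MASK 0xff00ULL

static unsigned int page_type_from_flags(uint64_t flags)
{
	return (unsigned int)((flags & SECINFO_PAGE_TYPE_MASK) >> 8);
}

void page_type_demo(void)
{
	struct page_after page = { 0 };

	/* Adding the type field does not grow the structure. */
	assert(sizeof(struct page_after) == sizeof(struct page_before));

	/* Type 2 (assumed to mean a regular page) with R and W bits set. */
	page.type = page_type_from_flags((2ULL << 8) | 0x3);
	assert(page.type == 2);
}
```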

Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Add Acked-by from Jarkko.

 arch/x86/include/asm/sgx.h      | 3 +++
 arch/x86/kernel/cpu/sgx/encl.h  | 1 +
 arch/x86/kernel/cpu/sgx/ioctl.c | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index d67810b50a81..eae20fa52b93 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -239,6 +239,9 @@ struct sgx_pageinfo {
  * %SGX_PAGE_TYPE_REG:	a regular page
  * %SGX_PAGE_TYPE_VA:	a VA page
  * %SGX_PAGE_TYPE_TRIM:	a page in trimmed state
+ *
+ * Make sure when making changes to this enum that its values can still fit
+ * in the bitfield within &struct sgx_encl_page
  */
 enum sgx_page_type {
 	SGX_PAGE_TYPE_SECS,
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 82e21088e68b..cb9f16d457ac 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -29,6 +29,7 @@ struct sgx_encl_page {
 	unsigned long desc;
 	unsigned long vm_max_prot_bits:8;
 	unsigned long vm_run_prot_bits:8;
+	enum sgx_page_type type:16;
 	struct sgx_epc_page *epc_page;
 	struct sgx_encl *encl;
 	struct sgx_va_page *va_page;
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 6e7cc441156b..b8336d5d9029 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -107,6 +107,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 		set_bit(SGX_ENCL_DEBUG, &encl->flags);
 
 	encl->secs.encl = encl;
+	encl->secs.type = SGX_PAGE_TYPE_SECS;
 	encl->base = secs->base;
 	encl->size = secs->size;
 	encl->attributes = secs->attributes;
@@ -350,6 +351,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
 	 */
 	encl_page->encl = encl;
 	encl_page->epc_page = epc_page;
+	encl_page->type = (secinfo->flags & SGX_SECINFO_PAGE_TYPE_MASK) >> 8;
 	encl->secs_child_cnt++;
 
 	if (flags & SGX_PAGE_MEASURE) {
-- 
2.25.1



* [PATCH V2 15/32] x86/sgx: Support relaxing of enclave page permissions
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (13 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 14/32] x86/sgx: Keep record of SGX page type Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-03-04  8:59   ` Jarkko Sakkinen
  2022-02-08  0:45 ` [PATCH V2 16/32] x86/sgx: Support restricting " Reinette Chatre
                   ` (17 subsequent siblings)
  32 siblings, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

In the initial (SGX1) version of SGX, pages in an enclave need to be
created with permissions that support all usages of the pages, from
the time the enclave is initialized until it is unloaded. For example,
pages used by a JIT compiler or when code needs to otherwise be
relocated need to always have RWX permissions.

With the SGX2 function ENCLU[EMODPE] an enclave is able to relax
the EPCM permissions of its pages after the enclave is initialized.
Relaxing EPCM permissions is not possible from outside the enclave,
including from the kernel. The kernel does control the PTEs though
and the enclave still depends on the kernel to install PTEs with the
new relaxed permissions before it (the enclave) can access the pages
using the new permissions.

Introduce ioctl() SGX_IOC_ENCLAVE_RELAX_PERMISSIONS to support
relaxing of EPCM permissions done from within the enclave. With
this ioctl() the user specifies a page range and the permissions to
be applied to all pages in the provided range. After checking
the new permissions (more detail below) the PTEs are reset and
it is ensured that any new PTEs will contain the new, relaxed,
permissions.

The permission change request could fail on any page within the
provided range. To support partial success the ioctl() returns
an error code based on failures encountered by the kernel and
the number of bytes (a multiple of the page size) that were
successfully changed.
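
The partial-success reporting can be modeled in user-space C as shown
here. This is a simplified sketch and not the kernel code: struct
relax_req, relax_range() and the max_prot[] array are hypothetical
stand-ins, locking and PTE zapping are left out, and -1 stands in for
the kernel's error codes. The count is kept in bytes, matching the
@count description in the uapi structure of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096ULL /* assumed 4 KiB pages */

/* Hypothetical request, modeled on struct sgx_enclave_relax_perm. */
struct relax_req {
	uint64_t offset;
	uint64_t length;
	uint64_t count; /* output: bytes handled before stopping */
};

/* max_prot[i] holds the vetted permission bits of page i. */
static int relax_range(const unsigned int *max_prot, struct relax_req *req,
		       unsigned int requested)
{
	uint64_t c;
	int ret = 0;

	for (c = 0; c < req->length; c += PAGE_SZ) {
		size_t idx = (size_t)((req->offset + c) / PAGE_SZ);

		/* Stop at the first page that cannot take the request. */
		if ((max_prot[idx] & requested) != requested) {
			ret = -1;
			break;
		}
	}

	/* Report progress whether or not an error occurred. */
	req->count = c;

	return ret;
}

void relax_range_demo(void)
{
	/* Vetted permissions of three pages: RW, RW, R (R=1, W=2). */
	const unsigned int max_prot[3] = { 0x3, 0x3, 0x1 };
	struct relax_req req = { .offset = 0, .length = 3 * PAGE_SZ };

	/* RW is refused on the third page; two pages were handled. */
	assert(relax_range(max_prot, &req, 0x3) == -1);
	assert(req.count == 2 * PAGE_SZ);
}
```

A caller that sees an error together with a non-zero count knows which
part of the range was already handled and where the failure occurred.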

Checking user provided new permissions
======================================

Enclave page permission changes need to be approached with care and
for this reason permission changes are only allowed if
the new permissions are the same or more restrictive than the
vetted permissions. Thus, even though an enclave is able to relax
the EPCM permissions of its pages beyond what was originally vetted,
the kernel will not. The kernel will only install PTEs that respect
the vetted enclave page permissions.

For example, enclave pages with vetted EPCM permissions in brackets
below are allowed to have PTE permissions as follows:
* (RWX) R => RW => RX => RWX
* (RW) R => RW
* (RX) R => RX
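
The rule behind these examples can be sketched as a single mask
comparison. The P_R, P_W and P_X macros and relax_allowed() are
hypothetical names for this illustration. The sketch also folds in the
write-requires-read check that the patch performs separately in
sgx_perm_from_user_secinfo(), and it returns a boolean where the kernel
distinguishes -EINVAL from -EPERM.

```c
#include <assert.h>

#define P_R 0x1u /* stand-ins for PROT_READ, PROT_WRITE and PROT_EXEC */
#define P_W 0x2u
#define P_X 0x4u

/* Returns 1 if the requested permissions fit within the vetted ones. */
static int relax_allowed(unsigned int vetted_max, unsigned int requested)
{
	/* Write permission without read permission is never accepted. */
	if ((requested & P_W) && !(requested & P_R))
		return 0;

	/* Every requested bit must also be present in the vetted set. */
	return (vetted_max & requested) == requested;
}

void relax_demo(void)
{
	/* (RWX) pages may take any of R, RW, RX or RWX. */
	assert(relax_allowed(P_R | P_W | P_X, P_R | P_W | P_X));
	/* (RW) pages may be R or RW but never gain X. */
	assert(relax_allowed(P_R | P_W, P_R));
	assert(!relax_allowed(P_R | P_W, P_R | P_X));
	/* (RX) pages may be R or RX but never gain W. */
	assert(!relax_allowed(P_R | P_X, P_R | P_W));
}
```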

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Change terminology to use "relax" instead of "extend" to refer to
  the case when enclave page permissions are added (Dave).
- Use ioctl() in commit message (Dave).
- Add examples on what permissions would be allowed (Dave).
- Split enclave page permission changes into two ioctl()s, one for
  permission restricting (SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS)
  and one for permission relaxing (SGX_IOC_ENCLAVE_RELAX_PERMISSIONS)
  (Jarkko).
- In support of the ioctl() name change the following names have been
  changed:
  struct sgx_page_modp -> struct sgx_enclave_relax_perm
  sgx_ioc_page_modp() -> sgx_ioc_enclave_relax_perm()
  sgx_page_modp() -> sgx_enclave_relax_perm()
- ioctl() takes entire secinfo as input instead of
  page permissions only (Jarkko).
- Fix kernel-doc to include () in function name.
- Introduce small helper to check for SGX2 readiness instead of
  duplicating the same two checks in every SGX2 supporting ioctl().
- Fixups in comments
- Move kernel-doc to function that provides documentation for
  Documentation/x86/sgx.rst.
- Remove redundant comment.
- Make explicit which member of struct sgx_enclave_relax_perm is
  for output (Dave).

 arch/x86/include/uapi/asm/sgx.h |  19 +++
 arch/x86/kernel/cpu/sgx/ioctl.c | 199 ++++++++++++++++++++++++++++++++
 2 files changed, 218 insertions(+)

diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index f4b81587e90b..5c678b27bb72 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -29,6 +29,8 @@ enum sgx_page_flags {
 	_IOW(SGX_MAGIC, 0x03, struct sgx_enclave_provision)
 #define SGX_IOC_VEPC_REMOVE_ALL \
 	_IO(SGX_MAGIC, 0x04)
+#define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
+	_IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
 
 /**
  * struct sgx_enclave_create - parameter structure for the
@@ -76,6 +78,23 @@ struct sgx_enclave_provision {
 	__u64 fd;
 };
 
+/**
+ * struct sgx_enclave_relax_perm - parameters for ioctl
+ *                                 %SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
+ * @offset:	starting page offset (page aligned relative to enclave base
+ *		address defined in SECS)
+ * @length:	length of memory (multiple of the page size)
+ * @secinfo:	address for the SECINFO data containing the new permission bits
+ *		for pages in range described by @offset and @length
+ * @count:	(output) bytes successfully changed (multiple of page size)
+ */
+struct sgx_enclave_relax_perm {
+	__u64 offset;
+	__u64 length;
+	__u64 secinfo;
+	__u64 count;
+};
+
 struct sgx_enclave_run;
 
 /**
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index b8336d5d9029..9cc6af404bf6 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -698,6 +698,202 @@ static long sgx_ioc_enclave_provision(struct sgx_encl *encl, void __user *arg)
 	return sgx_set_attribute(&encl->attributes_mask, params.fd);
 }
 
+static unsigned long vm_prot_from_secinfo(u64 secinfo_perm)
+{
+	unsigned long vm_prot;
+
+	vm_prot = _calc_vm_trans(secinfo_perm, SGX_SECINFO_R, PROT_READ)  |
+		  _calc_vm_trans(secinfo_perm, SGX_SECINFO_W, PROT_WRITE) |
+		  _calc_vm_trans(secinfo_perm, SGX_SECINFO_X, PROT_EXEC);
+	vm_prot = calc_vm_prot_bits(vm_prot, 0);
+
+	return vm_prot;
+}
+
+/**
+ * sgx_enclave_relax_perm() - Update OS after permissions relaxed by enclave
+ * @encl:	Enclave to which the pages belong.
+ * @modp:	Checked parameters from user on which pages need modifying.
+ * @secinfo_perm: New validated permission bits.
+ *
+ * Return:
+ * - 0:		Success.
+ * - -errno:	Otherwise.
+ */
+static long sgx_enclave_relax_perm(struct sgx_encl *encl,
+				   struct sgx_enclave_relax_perm *modp,
+				   u64 secinfo_perm)
+{
+	struct sgx_encl_page *entry;
+	unsigned long vm_prot;
+	unsigned long addr;
+	unsigned long c;
+	int ret;
+
+	vm_prot = vm_prot_from_secinfo(secinfo_perm);
+
+	for (c = 0 ; c < modp->length; c += PAGE_SIZE) {
+		addr = encl->base + modp->offset + c;
+
+		mutex_lock(&encl->lock);
+
+		entry = xa_load(&encl->page_array, PFN_DOWN(addr));
+		if (!entry) {
+			ret = -EFAULT;
+			goto out_unlock;
+		}
+
+		/*
+		 * Changing EPCM permissions is only supported on regular
+		 * SGX pages.
+		 */
+		if (entry->type != SGX_PAGE_TYPE_REG) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		/*
+		 * Do not accept permissions that are more relaxed
+		 * than vetted permissions.
+		 * If this check fails then EPCM permissions may be more
+		 * relaxed than what would be allowed by the kernel via
+		 * PTEs.
+		 */
+		if ((entry->vm_max_prot_bits & vm_prot) != vm_prot) {
+			ret = -EPERM;
+			goto out_unlock;
+		}
+
+		/*
+		 * Change runtime protection before zapping PTEs to ensure
+		 * any new #PF uses new permissions.
+		 */
+		entry->vm_run_prot_bits = vm_prot;
+
+		mutex_unlock(&encl->lock);
+		/*
+		 * Do not keep encl->lock because of dependency on
+		 * mmap_lock acquired in sgx_zap_enclave_ptes().
+		 */
+		sgx_zap_enclave_ptes(encl, addr);
+	}
+
+	ret = 0;
+	goto out;
+
+out_unlock:
+	mutex_unlock(&encl->lock);
+out:
+	modp->count = c;
+
+	return ret;
+}
+
+/*
+ * Ensure enclave is ready for SGX2 functions. Readiness is checked
+ * by ensuring the hardware supports SGX2 and the enclave is initialized
+ * and thus able to handle requests to modify pages within it.
+ */
+static int sgx_ioc_sgx2_ready(struct sgx_encl *encl)
+{
+	if (!(cpu_feature_enabled(X86_FEATURE_SGX2)))
+		return -ENODEV;
+
+	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
+		return -EINVAL;
+
+	return 0;
+}
+
+/*
+ * Return valid permission fields from a secinfo structure provided by
+ * user space. The secinfo structure is required to only have bits in
+ * the permission fields set.
+ */
+static int sgx_perm_from_user_secinfo(void __user *_secinfo, u64 *secinfo_perm)
+{
+	struct sgx_secinfo secinfo;
+	u64 perm;
+
+	if (copy_from_user(&secinfo, (void __user *)_secinfo,
+			   sizeof(secinfo)))
+		return -EFAULT;
+
+	if (secinfo.flags & ~SGX_SECINFO_PERMISSION_MASK)
+		return -EINVAL;
+
+	if (memchr_inv(secinfo.reserved, 0, sizeof(secinfo.reserved)))
+		return -EINVAL;
+
+	perm = secinfo.flags & SGX_SECINFO_PERMISSION_MASK;
+
+	if ((perm & SGX_SECINFO_W) && !(perm & SGX_SECINFO_R))
+		return -EINVAL;
+
+	*secinfo_perm = perm;
+
+	return 0;
+}
+
+/**
+ * sgx_ioc_enclave_relax_perm() - handler for
+ *                                %SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
+ * @encl:	an enclave pointer
+ * @arg:	userspace pointer to a &struct sgx_enclave_relax_perm instance
+ *
+ * SGX2 distinguishes between relaxing and restricting the enclave page
+ * permissions maintained by the hardware (EPCM permissions) of pages
+ * belonging to an initialized enclave (after %SGX_IOC_ENCLAVE_INIT).
+ *
+ * EPCM permissions can be relaxed anytime directly from within the enclave
+ * with no visibility from the kernel. This is accomplished with
+ * ENCLU[EMODPE] run from within the enclave. Accessing pages with
+ * the new, relaxed permissions requires the kernel to update the PTE
+ * to handle the subsequent #PF correctly.
+ *
+ * Enclave page permissions are not allowed to exceed the
+ * maximum vetted permissions maintained in
+ * &struct sgx_encl_page->vm_max_prot_bits. If the enclave
+ * exceeds these permissions by running ENCLU[EMODPE] from within the enclave
+ * the kernel will prevent access to the pages via PTE and
+ * VMA permissions.
+ *
+ * Return:
+ * - 0:		Success
+ * - -errno:	Otherwise
+ */
+static long sgx_ioc_enclave_relax_perm(struct sgx_encl *encl, void __user *arg)
+{
+	struct sgx_enclave_relax_perm params;
+	u64 secinfo_perm;
+	long ret;
+
+	ret = sgx_ioc_sgx2_ready(encl);
+	if (ret)
+		return ret;
+
+	if (copy_from_user(&params, arg, sizeof(params)))
+		return -EFAULT;
+
+	if (sgx_validate_offset_length(encl, params.offset, params.length))
+		return -EINVAL;
+
+	ret = sgx_perm_from_user_secinfo((void __user *)params.secinfo,
+					 &secinfo_perm);
+	if (ret)
+		return ret;
+
+	if (params.count)
+		return -EINVAL;
+
+	ret = sgx_enclave_relax_perm(encl, &params, secinfo_perm);
+
+	if (copy_to_user(arg, &params, sizeof(params)))
+		return -EFAULT;
+
+	return ret;
+}
+
 long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 {
 	struct sgx_encl *encl = filep->private_data;
@@ -719,6 +915,9 @@ long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 	case SGX_IOC_ENCLAVE_PROVISION:
 		ret = sgx_ioc_enclave_provision(encl, (void __user *)arg);
 		break;
+	case SGX_IOC_ENCLAVE_RELAX_PERMISSIONS:
+		ret = sgx_ioc_enclave_relax_perm(encl, (void __user *)arg);
+		break;
 	default:
 		ret = -ENOIOCTLCMD;
 		break;
-- 
2.25.1



* [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (14 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 15/32] x86/sgx: Support relaxing of enclave page permissions Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-21  0:49   ` Jarkko Sakkinen
  2022-02-08  0:45 ` [PATCH V2 17/32] selftests/sgx: Add test for EPCM permission changes Reinette Chatre
                   ` (16 subsequent siblings)
  32 siblings, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

In the initial (SGX1) version of SGX, pages in an enclave need to be
created with permissions that support all usages of the pages, from the
time the enclave is initialized until it is unloaded. For example,
pages used by a JIT compiler or when code needs to otherwise be
relocated need to always have RWX permissions.

SGX2 includes a new function ENCLS[EMODPR] that is run from the kernel
and can be used to restrict the EPCM permissions of regular enclave
pages within an initialized enclave.

Introduce ioctl() SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS to support
restricting EPCM permissions. With this ioctl() the user specifies
a page range and the permissions to be applied to all pages in
the provided range. After checking the new permissions (more detail
below) the page table entries are reset and any new page
table entries will contain the new, restricted, permissions.
ENCLS[EMODPR] is run to restrict the EPCM permissions followed by
the ENCLS[ETRACK] flow that will ensure no cached
linear-to-physical address mappings to the changed pages remain.

It is possible for the permission change request to fail on any
page within the provided range, either with an error encountered
by the kernel or by the SGX hardware while running
ENCLS[EMODPR]. To support partial success the ioctl() returns an
error code based on failures encountered by the kernel as well
as two result output parameters: one for the number of pages
that were successfully changed and one for the SGX return code.

Checking user provided new permissions
======================================

Enclave page permission changes need to be approached with care and
for this reason permission changes are only allowed if the new
permissions are the same or more restrictive than the vetted
permissions. No additional checking is done to ensure that the
permissions are actually being restricted. This is because the
enclave may have relaxed the EPCM permissions from within
the enclave without letting the kernel know. An attempt to relax
permissions using this call will be ignored by the hardware.

For example, together with the support for relaxing of EPCM permissions,
enclave pages added with the vetted permissions in brackets below
are allowed to have permissions as follows:
* (RWX) => RW => R => RX => RWX
* (RW) => R => RW
* (RX) => R => RX
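
The rule behind these examples can be sketched as a subset test on
permission bits, mirroring the vm_max_prot_bits comparison in the patch
below. The PERM_* constants and perm_request_allowed() are stand-ins
chosen for this illustration, not kernel definitions:

```c
#include <stdbool.h>

/* Illustrative permission bits, not the kernel's VM_* values. */
#define PERM_R 0x1UL
#define PERM_W 0x2UL
#define PERM_X 0x4UL

/*
 * A request is acceptable when every requested bit is also present in
 * the vetted (maximum) permissions. Whether the request restricts or
 * relaxes the current permissions is deliberately not checked.
 */
static bool perm_request_allowed(unsigned long vetted, unsigned long requested)
{
	return (vetted & requested) == requested;
}
```

With vetted permissions of RW, requests for R or RW pass the check while
a request for RX does not, which matches the second example above.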

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Change terminology to use "relax" instead of "extend" to refer to
  the case when enclave page permissions are added (Dave).
- Use ioctl() in commit message (Dave).
- Add examples on what permissions would be allowed (Dave).
- Split enclave page permission changes into two ioctl()s, one for
  permission restricting (SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS)
  and one for permission relaxing (SGX_IOC_ENCLAVE_RELAX_PERMISSIONS)
  (Jarkko).
- In support of the ioctl() name change the following names have been
  changed:
  struct sgx_page_modp -> struct sgx_enclave_restrict_perm
  sgx_ioc_page_modp() -> sgx_ioc_enclave_restrict_perm()
  sgx_page_modp() -> sgx_enclave_restrict_perm()
- ioctl() takes entire secinfo as input instead of
  page permissions only (Jarkko).
- Fix kernel-doc to include () in function name.
- Create and use utility for the ETRACK flow.
- Fixups in comments
- Move kernel-doc to function that provides documentation for
  Documentation/x86/sgx.rst.
- Remove redundant comment.
- Make explicit which members of struct sgx_enclave_restrict_perm
  are for output (Dave).

 arch/x86/include/uapi/asm/sgx.h |  21 +++
 arch/x86/kernel/cpu/sgx/encl.c  |   4 +-
 arch/x86/kernel/cpu/sgx/encl.h  |   3 +
 arch/x86/kernel/cpu/sgx/ioctl.c | 229 ++++++++++++++++++++++++++++++++
 4 files changed, 255 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index 5c678b27bb72..b0ffb80bc67f 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -31,6 +31,8 @@ enum sgx_page_flags {
 	_IO(SGX_MAGIC, 0x04)
 #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
 	_IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
+#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
+	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
 
 /**
  * struct sgx_enclave_create - parameter structure for the
@@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm {
 	__u64 count;
 };
 
+/**
+ * struct sgx_enclave_restrict_perm - parameters for ioctl
+ *                                    %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
+ * @offset:	starting page offset (page aligned relative to enclave base
+ *		address defined in SECS)
+ * @length:	length of memory (multiple of the page size)
+ * @secinfo:	address for the SECINFO data containing the new permission bits
+ *		for pages in range described by @offset and @length
+ * @result:	(output) SGX result code of ENCLS[EMODPR] function
+ * @count:	(output) bytes successfully changed (multiple of page size)
+ */
+struct sgx_enclave_restrict_perm {
+	__u64 offset;
+	__u64 length;
+	__u64 secinfo;
+	__u64 result;
+	__u64 count;
+};
+
 struct sgx_enclave_run;
 
 /**
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 8da813504249..a5d4a7efb986 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -90,8 +90,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	return epc_page;
 }
 
-static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
-						unsigned long addr)
+struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
+					 unsigned long addr)
 {
 	struct sgx_epc_page *epc_page;
 	struct sgx_encl_page *entry;
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index cb9f16d457ac..848a28d28d3d 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -120,4 +120,7 @@ void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
 void sgx_encl_free_epc_page(struct sgx_epc_page *page);
 
+struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
+					 unsigned long addr);
+
 #endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 9cc6af404bf6..23bdf558b231 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -894,6 +894,232 @@ static long sgx_ioc_enclave_relax_perm(struct sgx_encl *encl, void __user *arg)
 	return ret;
 }
 
+/*
+ * Some SGX functions require that no cached linear-to-physical address
+ * mappings are present before they can succeed. Collaborate with
+ * hardware via ENCLS[ETRACK] to ensure that all cached
+ * linear-to-physical address mappings belonging to all threads of
+ * the enclave are cleared. See sgx_encl_cpumask() for details.
+ */
+static int sgx_enclave_etrack(struct sgx_encl *encl)
+{
+	void *epc_virt;
+	int ret;
+
+	epc_virt = sgx_get_epc_virt_addr(encl->secs.epc_page);
+	ret = __etrack(epc_virt);
+	if (ret) {
+		/*
+		 * ETRACK only fails when there is an OS issue. For
+		 * example, two consecutive ETRACKs were issued without
+		 * a completed IPI in between.
+		 */
+		pr_err_once("ETRACK returned %d (0x%x)\n", ret, ret);
+		/*
+		 * Send IPIs to kick CPUs out of the enclave and
+		 * try ETRACK again.
+		 */
+		on_each_cpu_mask(sgx_encl_cpumask(encl), sgx_ipi_cb, NULL, 1);
+		ret = __etrack(epc_virt);
+		if (ret) {
+			pr_err_once("ETRACK repeat returned %d (0x%x)\n",
+				    ret, ret);
+			return -EFAULT;
+		}
+	}
+	on_each_cpu_mask(sgx_encl_cpumask(encl), sgx_ipi_cb, NULL, 1);
+
+	return 0;
+}
+
+/**
+ * sgx_enclave_restrict_perm() - Restrict EPCM permissions and align OS view
+ * @encl:	Enclave to which the pages belong.
+ * @modp:	Checked parameters from user on which pages need modifying.
+ * @secinfo_perm: New (validated) permission bits.
+ *
+ * Return:
+ * - 0:		Success.
+ * - -errno:	Otherwise.
+ */
+static long sgx_enclave_restrict_perm(struct sgx_encl *encl,
+				      struct sgx_enclave_restrict_perm *modp,
+				      u64 secinfo_perm)
+{
+	unsigned long vm_prot, run_prot_restore;
+	struct sgx_encl_page *entry;
+	struct sgx_secinfo secinfo;
+	unsigned long addr;
+	unsigned long c;
+	void *epc_virt;
+	int ret;
+
+	memset(&secinfo, 0, sizeof(secinfo));
+	secinfo.flags = secinfo_perm;
+
+	vm_prot = vm_prot_from_secinfo(secinfo_perm);
+
+	for (c = 0 ; c < modp->length; c += PAGE_SIZE) {
+		addr = encl->base + modp->offset + c;
+
+		mutex_lock(&encl->lock);
+
+		entry = sgx_encl_load_page(encl, addr);
+		if (IS_ERR(entry)) {
+			ret = PTR_ERR(entry) == -EBUSY ? -EAGAIN : -EFAULT;
+			goto out_unlock;
+		}
+
+		/*
+		 * Changing EPCM permissions is only supported on regular
+		 * SGX pages. Attempting this change on other pages will
+		 * result in #PF.
+		 */
+		if (entry->type != SGX_PAGE_TYPE_REG) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		/*
+		 * Do not verify if current runtime protection bits are what
+		 * is being requested. The enclave may have relaxed EPCM
+		 * permissions without letting the kernel know and thus
+		 * permission restriction may still be needed even if from
+		 * the kernel's perspective the permissions are unchanged.
+		 */
+
+		/* New permissions should never exceed vetted permissions. */
+		if ((entry->vm_max_prot_bits & vm_prot) != vm_prot) {
+			ret = -EPERM;
+			goto out_unlock;
+		}
+
+		/* Make sure page stays around while releasing mutex. */
+		if (sgx_unmark_page_reclaimable(entry->epc_page)) {
+			ret = -EAGAIN;
+			goto out_unlock;
+		}
+
+		/*
+		 * Change runtime protection before zapping PTEs to ensure
+		 * any new #PF uses new permissions. EPCM permissions (if
+		 * needed) not changed yet.
+		 */
+		run_prot_restore = entry->vm_run_prot_bits;
+		entry->vm_run_prot_bits = vm_prot;
+
+		mutex_unlock(&encl->lock);
+		/*
+		 * Do not keep encl->lock because of dependency on
+		 * mmap_lock acquired in sgx_zap_enclave_ptes().
+		 */
+		sgx_zap_enclave_ptes(encl, addr);
+
+		mutex_lock(&encl->lock);
+
+		/* Change EPCM permissions. */
+		epc_virt = sgx_get_epc_virt_addr(entry->epc_page);
+		ret = __emodpr(&secinfo, epc_virt);
+		if (encls_faulted(ret)) {
+			/*
+			 * All possible faults should be avoidable:
+			 * parameters have been checked, will only change
+			 * permissions of a regular page, and no concurrent
+			 * SGX1/SGX2 ENCLS instructions since these
+			 * are protected with mutex.
+			 */
+			pr_err_once("EMODPR encountered exception %d\n",
+				    ENCLS_TRAPNR(ret));
+			ret = -EFAULT;
+			goto out_prot_restore;
+		}
+		if (encls_failed(ret)) {
+			modp->result = ret;
+			ret = -EFAULT;
+			goto out_prot_restore;
+		}
+
+		ret = sgx_enclave_etrack(encl);
+		if (ret) {
+			ret = -EFAULT;
+			goto out_reclaim;
+		}
+
+		sgx_mark_page_reclaimable(entry->epc_page);
+		mutex_unlock(&encl->lock);
+	}
+
+	ret = 0;
+	goto out;
+
+out_prot_restore:
+	entry->vm_run_prot_bits = run_prot_restore;
+out_reclaim:
+	sgx_mark_page_reclaimable(entry->epc_page);
+out_unlock:
+	mutex_unlock(&encl->lock);
+out:
+	modp->count = c;
+
+	return ret;
+}
+
+/**
+ * sgx_ioc_enclave_restrict_perm() - handler for
+ *                                   %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
+ * @encl:	an enclave pointer
+ * @arg:	userspace pointer to a &struct sgx_enclave_restrict_perm
+ *		instance
+ *
+ * SGX2 distinguishes between relaxing and restricting the enclave page
+ * permissions maintained by the hardware (EPCM permissions) of pages
+ * belonging to an initialized enclave (after SGX_IOC_ENCLAVE_INIT).
+ *
+ * EPCM permissions cannot be restricted from within the enclave, the enclave
+ * requires the kernel to run the privileged level 0 instructions ENCLS[EMODPR]
+ * and ENCLS[ETRACK]. An attempt to relax EPCM permissions with this call
+ * will be ignored by the hardware.
+ *
+ * Enclave page permissions are not allowed to exceed the maximum vetted
+ * permissions maintained in &struct sgx_encl_page->vm_max_prot_bits.
+ *
+ * Return:
+ * - 0:		Success
+ * - -errno:	Otherwise
+ */
+static long sgx_ioc_enclave_restrict_perm(struct sgx_encl *encl,
+					  void __user *arg)
+{
+	struct sgx_enclave_restrict_perm params;
+	u64 secinfo_perm;
+	long ret;
+
+	ret = sgx_ioc_sgx2_ready(encl);
+	if (ret)
+		return ret;
+
+	if (copy_from_user(&params, arg, sizeof(params)))
+		return -EFAULT;
+
+	if (sgx_validate_offset_length(encl, params.offset, params.length))
+		return -EINVAL;
+
+	ret = sgx_perm_from_user_secinfo((void __user *)params.secinfo,
+					 &secinfo_perm);
+	if (ret)
+		return ret;
+
+	if (params.result || params.count)
+		return -EINVAL;
+
+	ret = sgx_enclave_restrict_perm(encl, &params, secinfo_perm);
+
+	if (copy_to_user(arg, &params, sizeof(params)))
+		return -EFAULT;
+
+	return ret;
+}
+
 long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 {
 	struct sgx_encl *encl = filep->private_data;
@@ -918,6 +1144,9 @@ long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 	case SGX_IOC_ENCLAVE_RELAX_PERMISSIONS:
 		ret = sgx_ioc_enclave_relax_perm(encl, (void __user *)arg);
 		break;
+	case SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS:
+		ret = sgx_ioc_enclave_restrict_perm(encl, (void __user *)arg);
+		break;
 	default:
 		ret = -ENOIOCTLCMD;
 		break;
-- 
2.25.1



* [PATCH V2 17/32] selftests/sgx: Add test for EPCM permission changes
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (15 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 16/32] x86/sgx: Support restricting " Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 18/32] selftests/sgx: Add test for TCS page " Reinette Chatre
                   ` (15 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

EPCM permission changes can be made from within the enclave (to relax
permissions) or from outside it (to restrict permissions). Kernel
support is needed when permissions are restricted to be able to
call the privileged ENCLS[EMODPR] instruction and ensure PTEs
allowing the restricted permissions are flushed. EPCM permissions
can be relaxed via ENCLU[EMODPE] from within the enclave but the
enclave still depends on the kernel to install PTEs with the new
permissions.

Add a test that exercises a few of the enclave page permission flows:
1) Test starts with a RW (from enclave and kernel perspective)
   enclave page that is mapped via a RW VMA.
2) Use the SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl() to restrict
   the enclave (EPCM) page permissions to read-only (kernel removes
   PTE in the process).
3) Run ENCLU[EACCEPT] from within the enclave to accept the new page
   permissions.
4) Attempt to write to the enclave page from within the enclave - this
   should fail with a page fault on the PTE since the page
   table entry accurately reflects the (read-only) EPCM permissions.
5) Restore EPCM permissions to RW by running ENCLU[EMODPE] from within
   the enclave.
6) Attempt to write to the enclave page from within the enclave - this
   should fail again with a page fault because even though the EPCM
   permissions are RW the PTE does not yet reflect that.
7) Use the SGX_IOC_ENCLAVE_RELAX_PERMISSIONS ioctl() to inform the
   kernel of new page permissions and PTEs will accurately reflect
   RW EPCM permissions.
8) Writing to enclave page from within enclave succeeds.
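
The flow above can be followed with a toy model that tracks the two
views of one page: the EPCM permissions held by hardware and the
permissions the kernel is willing to install in a PTE. This is only a
sketch under simplifying assumptions. All names are hypothetical, the
ENCLU[EACCEPT] step is left out, and a write is treated as succeeding
only when both views allow it:

```c
#include <stdbool.h>

#define PERM_R 0x1u
#define PERM_W 0x2u

/* Toy model of a single enclave page. */
struct page_model {
	unsigned int epcm;	/* permissions enforced by SGX hardware */
	unsigned int pte;	/* permissions the kernel will map */
};

/* Restrict ioctl(): kernel view is replaced, EPCM can only lose bits. */
static void model_restrict(struct page_model *p, unsigned int perm)
{
	p->pte = perm;
	p->epcm &= perm;
}

/* ENCLU[EMODPE] from within the enclave: EPCM gains bits, PTE untouched. */
static void model_emodpe(struct page_model *p, unsigned int perm)
{
	p->epcm |= perm;
}

/* Relax ioctl(): only the kernel view changes. */
static void model_relax(struct page_model *p, unsigned int perm)
{
	p->pte = perm;
}

static bool model_can_write(const struct page_model *p)
{
	return (p->epcm & PERM_W) && (p->pte & PERM_W);
}
```

Starting from RW in both views, model_can_write() is true at step 1,
false after model_restrict() with PERM_R (step 4), still false after
model_emodpe() with PERM_R | PERM_W (step 6), and true again once
model_relax() is called with PERM_R | PERM_W (step 8).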

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Adapt test to the kernel interface changes: the ioctl() name change
  and providing entire secinfo as parameter.
- Remove the ENCLU[EACCEPT] call after permissions are relaxed since
  the new flow no longer results in the EPCM PR bit being set.
- Rewrite error path to reduce line lengths.

 tools/testing/selftests/sgx/defines.h   |  15 ++
 tools/testing/selftests/sgx/main.c      | 253 ++++++++++++++++++++++++
 tools/testing/selftests/sgx/test_encl.c |  38 ++++
 3 files changed, 306 insertions(+)

diff --git a/tools/testing/selftests/sgx/defines.h b/tools/testing/selftests/sgx/defines.h
index 02d775789ea7..b638eb98c80c 100644
--- a/tools/testing/selftests/sgx/defines.h
+++ b/tools/testing/selftests/sgx/defines.h
@@ -24,6 +24,8 @@ enum encl_op_type {
 	ENCL_OP_PUT_TO_ADDRESS,
 	ENCL_OP_GET_FROM_ADDRESS,
 	ENCL_OP_NOP,
+	ENCL_OP_EACCEPT,
+	ENCL_OP_EMODPE,
 	ENCL_OP_MAX,
 };
 
@@ -53,4 +55,17 @@ struct encl_op_get_from_addr {
 	uint64_t addr;
 };
 
+struct encl_op_eaccept {
+	struct encl_op_header header;
+	uint64_t epc_addr;
+	uint64_t flags;
+	uint64_t ret;
+};
+
+struct encl_op_emodpe {
+	struct encl_op_header header;
+	uint64_t epc_addr;
+	uint64_t flags;
+};
+
 #endif /* DEFINES_H */
diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index dd74fa42302e..4f348ed1dc29 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -25,6 +25,18 @@ static const uint64_t MAGIC = 0x1122334455667788ULL;
 static const uint64_t MAGIC2 = 0x8877665544332211ULL;
 vdso_sgx_enter_enclave_t vdso_sgx_enter_enclave;
 
+/*
+ * Security Information (SECINFO) data structure needed by a few SGX
+ * instructions (eg. ENCLU[EACCEPT] and ENCLU[EMODPE]) holds meta-data
+ * about an enclave page. &enum sgx_secinfo_page_state specifies the
+ * secinfo flags used for page state.
+ */
+enum sgx_secinfo_page_state {
+	SGX_SECINFO_PENDING = (1 << 3),
+	SGX_SECINFO_MODIFIED = (1 << 4),
+	SGX_SECINFO_PR = (1 << 5),
+};
+
 struct vdso_symtab {
 	Elf64_Sym *elf_symtab;
 	const char *elf_symstrtab;
@@ -555,4 +567,245 @@ TEST_F(enclave, pte_permissions)
 	EXPECT_EQ(self->run.exception_addr, 0);
 }
 
+/*
+ * Enclave page permission test.
+ *
+ * Modify and restore enclave page's EPCM (enclave) permissions from
+ * outside enclave (ENCLS[EMODPR] via kernel) as well as from within
+ * enclave (via ENCLU[EMODPE]). Kernel should ensure PTE permissions
+ * are the same as the EPCM permissions so check for page fault if
+ * VMA allows access but EPCM and PTE do not.
+ */
+TEST_F(enclave, epcm_permissions)
+{
+	struct sgx_enclave_restrict_perm restrict_ioc;
+	struct encl_op_get_from_addr get_addr_op;
+	struct sgx_enclave_relax_perm relax_ioc;
+	struct encl_op_put_to_addr put_addr_op;
+	struct encl_op_eaccept eaccept_op;
+	struct encl_op_emodpe emodpe_op;
+	struct sgx_secinfo secinfo;
+	unsigned long data_start;
+	int ret, errno_save;
+
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+
+	memset(&self->run, 0, sizeof(self->run));
+	self->run.tcs = self->encl.encl_base;
+
+	/*
+	 * Ensure kernel supports needed ioctl() and system supports needed
+	 * commands.
+	 */
+	memset(&restrict_ioc, 0, sizeof(restrict_ioc));
+	memset(&secinfo, 0, sizeof(secinfo));
+
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS,
+		    &restrict_ioc);
+	errno_save = ret == -1 ? errno : 0;
+
+	/*
+	 * Invalid parameters were provided during sanity check,
+	 * expect command to fail.
+	 */
+	ASSERT_EQ(ret, -1);
+
+	/* ret == -1 */
+	if (errno_save == ENOTTY)
+		SKIP(return,
+		     "Kernel does not support SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl()");
+	else if (errno_save == ENODEV)
+		SKIP(return, "System does not support SGX2");
+
+	/*
+	 * Page that will have its permissions changed is the second data
+	 * page in the .data segment. This forms part of the local encl_buffer
+	 * within the enclave.
+	 *
+	 * At start of test @data_start should have EPCM as well as PTE
+	 * permissions of RW.
+	 */
+
+	data_start = self->encl.encl_base +
+		     encl_get_data_offset(&self->encl) + PAGE_SIZE;
+
+	/*
+	 * Sanity check that page at @data_start is writable before making
+	 * any changes to page permissions.
+	 *
+	 * Start by writing MAGIC to test page.
+	 */
+	put_addr_op.value = MAGIC;
+	put_addr_op.addr = data_start;
+	put_addr_op.header.type = ENCL_OP_PUT_TO_ADDRESS;
+
+	EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	/*
+	 * Read memory that was just written to, confirming that
+	 * page is writable.
+	 */
+	get_addr_op.value = 0;
+	get_addr_op.addr = data_start;
+	get_addr_op.header.type = ENCL_OP_GET_FROM_ADDRESS;
+
+	EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
+
+	EXPECT_EQ(get_addr_op.value, MAGIC);
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	/*
+	 * Change EPCM permissions to read-only, PTE entry flushed by
+	 * kernel in the process.
+	 */
+	memset(&restrict_ioc, 0, sizeof(restrict_ioc));
+	memset(&secinfo, 0, sizeof(secinfo));
+
+	secinfo.flags = PROT_READ;
+	restrict_ioc.offset = encl_get_data_offset(&self->encl) + PAGE_SIZE;
+	restrict_ioc.length = PAGE_SIZE;
+	restrict_ioc.secinfo = (unsigned long)&secinfo;
+
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS,
+		    &restrict_ioc);
+	errno_save = ret == -1 ? errno : 0;
+
+	EXPECT_EQ(ret, 0);
+	EXPECT_EQ(errno_save, 0);
+	EXPECT_EQ(restrict_ioc.result, 0);
+	EXPECT_EQ(restrict_ioc.count, 4096);
+
+	/*
+	 * EPCM permissions changed from kernel, need to EACCEPT from enclave.
+	 */
+	eaccept_op.epc_addr = data_start;
+	eaccept_op.flags = PROT_READ | SGX_SECINFO_REG | SGX_SECINFO_PR;
+	eaccept_op.ret = 0;
+	eaccept_op.header.type = ENCL_OP_EACCEPT;
+
+	EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+	EXPECT_EQ(eaccept_op.ret, 0);
+
+	/*
+	 * EPCM permissions of page are now read-only, expect #PF
+	 * on PTE (not EPCM) when attempting to write to page from
+	 * within enclave.
+	 */
+	put_addr_op.value = MAGIC2;
+
+	EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
+
+	EXPECT_EQ(self->run.function, ERESUME);
+	EXPECT_EQ(self->run.exception_vector, 14);
+	EXPECT_EQ(self->run.exception_error_code, 0x7);
+	EXPECT_EQ(self->run.exception_addr, data_start);
+
+	self->run.exception_vector = 0;
+	self->run.exception_error_code = 0;
+	self->run.exception_addr = 0;
+
+	/*
+	 * Received AEX but cannot return to enclave at same entrypoint,
+	 * need different TCS from where EPCM permission can be made writable
+	 * again.
+	 */
+	self->run.tcs = self->encl.encl_base + PAGE_SIZE;
+
+	/*
+	 * Enter enclave at new TCS to change EPCM permissions to be
+	 * writable again and thus fix the page fault that triggered the
+	 * AEX.
+	 */
+
+	emodpe_op.epc_addr = data_start;
+	emodpe_op.flags = PROT_READ | PROT_WRITE;
+	emodpe_op.header.type = ENCL_OP_EMODPE;
+
+	EXPECT_EQ(ENCL_CALL(&emodpe_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	/*
+	 * Attempt to return to main TCS to resume execution at faulting
+	 * instruction, but PTE should still prevent writing to the page.
+	 */
+	self->run.tcs = self->encl.encl_base;
+
+	EXPECT_EQ(vdso_sgx_enter_enclave((unsigned long)&put_addr_op, 0, 0,
+					 ERESUME, 0, 0,
+					 &self->run),
+		  0);
+
+	EXPECT_EQ(self->run.function, ERESUME);
+	EXPECT_EQ(self->run.exception_vector, 14);
+	EXPECT_EQ(self->run.exception_error_code, 0x7);
+	EXPECT_EQ(self->run.exception_addr, data_start);
+
+	self->run.exception_vector = 0;
+	self->run.exception_error_code = 0;
+	self->run.exception_addr = 0;
+	/*
+	 * Inform kernel about new permissions to have PTEs match EPCM.
+	 */
+	memset(&relax_ioc, 0, sizeof(relax_ioc));
+	memset(&secinfo, 0, sizeof(secinfo));
+
+	secinfo.flags = PROT_READ | PROT_WRITE;
+	relax_ioc.offset = encl_get_data_offset(&self->encl) + PAGE_SIZE;
+	relax_ioc.length = PAGE_SIZE;
+	relax_ioc.secinfo = (unsigned long)&secinfo;
+
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_RELAX_PERMISSIONS,
+		    &relax_ioc);
+	errno_save = ret == -1 ? errno : 0;
+
+	EXPECT_EQ(ret, 0);
+	EXPECT_EQ(errno_save, 0);
+	EXPECT_EQ(relax_ioc.count, 4096);
+
+	/*
+	 * Wrong page permissions that caused original fault have
+	 * now been fixed via EPCM permissions as well as PTE.
+	 * Resume execution in main TCS to re-attempt the memory access.
+	 */
+	self->run.tcs = self->encl.encl_base;
+
+	EXPECT_EQ(vdso_sgx_enter_enclave((unsigned long)&put_addr_op, 0, 0,
+					 ERESUME, 0, 0,
+					 &self->run),
+		  0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	get_addr_op.value = 0;
+
+	EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
+
+	EXPECT_EQ(get_addr_op.value, MAGIC2);
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.user_data, 0);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+}
+
 TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/sgx/test_encl.c b/tools/testing/selftests/sgx/test_encl.c
index 4fca01cfd898..5b6c65331527 100644
--- a/tools/testing/selftests/sgx/test_encl.c
+++ b/tools/testing/selftests/sgx/test_encl.c
@@ -11,6 +11,42 @@
  */
 static uint8_t encl_buffer[8192] = { 1 };
 
+enum sgx_enclu_function {
+	EACCEPT = 0x5,
+	EMODPE = 0x6,
+};
+
+static void do_encl_emodpe(void *_op)
+{
+	struct sgx_secinfo secinfo __aligned(sizeof(struct sgx_secinfo)) = {0};
+	struct encl_op_emodpe *op = _op;
+
+	secinfo.flags = op->flags;
+
+	asm volatile(".byte 0x0f, 0x01, 0xd7"
+				:
+				: "a" (EMODPE),
+				  "b" (&secinfo),
+				  "c" (op->epc_addr));
+}
+
+static void do_encl_eaccept(void *_op)
+{
+	struct sgx_secinfo secinfo __aligned(sizeof(struct sgx_secinfo)) = {0};
+	struct encl_op_eaccept *op = _op;
+	int rax;
+
+	secinfo.flags = op->flags;
+
+	asm volatile(".byte 0x0f, 0x01, 0xd7"
+				: "=a" (rax)
+				: "a" (EACCEPT),
+				  "b" (&secinfo),
+				  "c" (op->epc_addr));
+
+	op->ret = rax;
+}
+
 static void *memcpy(void *dest, const void *src, size_t n)
 {
 	size_t i;
@@ -62,6 +98,8 @@ void encl_body(void *rdi,  void *rsi)
 		do_encl_op_put_to_addr,
 		do_encl_op_get_from_addr,
 		do_encl_op_nop,
+		do_encl_eaccept,
+		do_encl_emodpe,
 	};
 
 	struct encl_op_header *op = (struct encl_op_header *)rdi;
-- 
2.25.1



* [PATCH V2 18/32] selftests/sgx: Add test for TCS page permission changes
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (16 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 17/32] selftests/sgx: Add test for EPCM permission changes Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 19/32] x86/sgx: Support adding of pages to an initialized enclave Reinette Chatre
                   ` (14 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

Kernel should not allow permission changes on TCS pages. Add test to
confirm this behavior.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Adapt test to the kernel interface changes: the ioctl() name change
  and providing entire secinfo as parameter.
- Rewrite error path to reduce line lengths.

 tools/testing/selftests/sgx/main.c | 74 ++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index 4f348ed1dc29..1398cd1b0983 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -121,6 +121,24 @@ static Elf64_Sym *vdso_symtab_get(struct vdso_symtab *symtab, const char *name)
 	return NULL;
 }
 
+/*
+ * Return the offset in the enclave where the TCS segment can be found.
+ * The first RW segment loaded is the TCS.
+ */
+static off_t encl_get_tcs_offset(struct encl *encl)
+{
+	int i;
+
+	for (i = 0; i < encl->nr_segments; i++) {
+		struct encl_segment *seg = &encl->segment_tbl[i];
+
+		if (i == 0 && seg->prot == (PROT_READ | PROT_WRITE))
+			return seg->offset;
+	}
+
+	return -1;
+}
+
 /*
  * Return the offset in the enclave where the data segment can be found.
  * The first RW segment loaded is the TCS, skip that to get info on the
@@ -567,6 +585,62 @@ TEST_F(enclave, pte_permissions)
 	EXPECT_EQ(self->run.exception_addr, 0);
 }
 
+/*
+ * Modifying permissions of TCS page should not be possible.
+ */
+TEST_F(enclave, tcs_permissions)
+{
+	struct sgx_enclave_restrict_perm ioc;
+	struct sgx_secinfo secinfo;
+	int ret, errno_save;
+
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+
+	memset(&self->run, 0, sizeof(self->run));
+	self->run.tcs = self->encl.encl_base;
+
+	memset(&ioc, 0, sizeof(ioc));
+	memset(&secinfo, 0, sizeof(secinfo));
+
+	/*
+	 * Ensure kernel supports needed ioctl() and system supports needed
+	 * commands.
+	 */
+
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS, &ioc);
+	errno_save = ret == -1 ? errno : 0;
+
+	/*
+	 * Invalid parameters were provided during sanity check,
+	 * expect command to fail.
+	 */
+	ASSERT_EQ(ret, -1);
+
+	/* ret == -1 */
+	if (errno_save == ENOTTY)
+		SKIP(return,
+		     "Kernel does not support SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl()");
+	else if (errno_save == ENODEV)
+		SKIP(return, "System does not support SGX2");
+
+	/*
+	 * Attempt to make TCS page read-only. This is not allowed and
+	 * should be prevented by the kernel.
+	 */
+	secinfo.flags = PROT_READ;
+	ioc.offset = encl_get_tcs_offset(&self->encl);
+	ioc.length = PAGE_SIZE;
+	ioc.secinfo = (unsigned long)&secinfo;
+
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS, &ioc);
+	errno_save = ret == -1 ? errno : 0;
+
+	EXPECT_EQ(ret, -1);
+	EXPECT_EQ(errno_save, EINVAL);
+	EXPECT_EQ(ioc.result, 0);
+	EXPECT_EQ(ioc.count, 0);
+}
+
 /*
  * Enclave page permission test.
  *
-- 
2.25.1



* [PATCH V2 19/32] x86/sgx: Support adding of pages to an initialized enclave
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (17 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 18/32] selftests/sgx: Add test for TCS page " Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-19 11:57   ` Jarkko Sakkinen
  2022-03-07 16:16   ` Jarkko Sakkinen
  2022-02-08  0:45 ` [PATCH V2 20/32] x86/sgx: Tighten accessible memory range after enclave initialization Reinette Chatre
                   ` (13 subsequent siblings)
  32 siblings, 2 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

With SGX1 an enclave needs to be created with its maximum memory demands
allocated. Pages cannot be added to an enclave after it is initialized.
SGX2 introduces a new function, ENCLS[EAUG], that can be used to add
pages to an initialized enclave. With SGX2 the enclave still needs to
set aside address space for its maximum memory demands during enclave
creation, but all pages need not be added before enclave initialization.
Pages can be added during enclave runtime.

Add support for dynamically adding pages to an initialized enclave,
architecturally limited to RW permission. Add pages via the page fault
handler at the time an enclave address without a backing enclave page
is accessed, potentially directly reclaiming pages if no free pages
are available.

The enclave is still required to run ENCLU[EACCEPT] on the page before
it can be used. A useful flow is for the enclave to run ENCLU[EACCEPT]
on an uninitialized address. This will trigger the page fault handler
that will add the enclave page and return execution to the enclave to
repeat the ENCLU[EACCEPT] instruction, this time successful.

If the enclave accesses an uninitialized address in another way, for
example by expanding the enclave stack to a page that has not yet been
added, then the page fault handler would add the page on the first
write. Upon returning to the enclave the instruction that triggered
the page fault would be repeated and, since ENCLU[EACCEPT] was not run
yet, it would trigger a second page fault, this time with the SGX flag
set in the page fault error code. This can only be recovered by entering
the enclave again and directly running the ENCLU[EACCEPT] instruction on
the now initialized address.
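
A user space runtime could tell the two kinds of fault apart by looking
at the SGX bit of the reported page fault error code. The sketch below
assumes the usual x86 error code layout, where bit 15 is the SGX flag.
The PF_* names and pf_is_sgx_induced() are local to this example, and
an SGX-flagged fault can have causes other than a missing
ENCLU[EACCEPT], so the result is a hint rather than a diagnosis:

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page fault error code bits used in this example. */
#define PF_PROT		(1u << 0)	/* protection violation, page present */
#define PF_WRITE	(1u << 1)	/* fault caused by a write */
#define PF_USER		(1u << 2)	/* fault taken in user mode */
#define PF_SGX		(1u << 15)	/* fault caused by SGX access control */

/*
 * True when the fault was raised by SGX access control (for example an
 * access to a page that still needs ENCLU[EACCEPT]) rather than by the
 * regular page tables.
 */
static bool pf_is_sgx_induced(uint32_t error_code)
{
	return (error_code & PF_SGX) != 0;
}
```

For instance, an error code of 0x7 describes a user mode write that
faulted on a present PTE, while 0x8007 is the same access rejected by
SGX access control.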

Accessing an uninitialized address from outside the enclave also
triggers this flow but the page will remain inaccessible (access will
result in #PF) until accepted from within the enclave via
ENCLU[EACCEPT].

The page is added with the architecturally constrained RW permissions
as both its runtime and its maximum allowed permissions. It is understood
that there are some use cases, for example code relocation, that require
RWX maximum permissions. Supporting these use cases requires guidance
from user space policy before such maximum permissions can be allowed.
Integration with user policy is deferred.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Fix subject line "to initialized" -> "to an initialized" (Jarkko).
- Move text about hardware's PENDING state to the patch that introduces
  the ENCLS[EAUG] wrapper (Jarkko).
- Ensure kernel-doc uses brackets when referring to function.

 arch/x86/kernel/cpu/sgx/encl.c  | 133 ++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.h  |   2 +
 arch/x86/kernel/cpu/sgx/ioctl.c |   4 +-
 3 files changed, 137 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index a5d4a7efb986..d1e3ea86b902 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -124,6 +124,128 @@ struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
 	return entry;
 }
 
+/**
+ * sgx_encl_eaug_page() - Dynamically add page to initialized enclave
+ * @vma:	VMA obtained from fault info from where page is accessed
+ * @encl:	enclave accessing the page
+ * @addr:	address that triggered the page fault
+ *
+ * When an initialized enclave accesses a page with no backing EPC page
+ * on an SGX2 system then an EPC page can be added dynamically via the SGX2
+ * ENCLS[EAUG] instruction.
+ *
+ * Returns: Appropriate vm_fault_t: VM_FAULT_NOPAGE when PTE was installed
+ * successfully, VM_FAULT_SIGBUS or VM_FAULT_OOM as error otherwise.
+ */
+static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
+				     struct sgx_encl *encl, unsigned long addr)
+{
+	struct sgx_pageinfo pginfo = {0};
+	struct sgx_encl_page *encl_page;
+	struct sgx_epc_page *epc_page;
+	struct sgx_va_page *va_page;
+	unsigned long phys_addr;
+	unsigned long prot;
+	vm_fault_t vmret;
+	int ret;
+
+	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
+		return VM_FAULT_SIGBUS;
+
+	encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL);
+	if (!encl_page)
+		return VM_FAULT_OOM;
+
+	encl_page->desc = addr;
+	encl_page->encl = encl;
+
+	/*
+	 * Adding a regular page that is architecturally allowed to only
+	 * be created with RW permissions.
+	 * TBD: Interface with user space policy to support max permissions
+	 * of RWX.
+	 */
+	prot = PROT_READ | PROT_WRITE;
+	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
+	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
+
+	epc_page = sgx_alloc_epc_page(encl_page, true);
+	if (IS_ERR(epc_page)) {
+		kfree(encl_page);
+		return VM_FAULT_SIGBUS;
+	}
+
+	va_page = sgx_encl_grow(encl);
+	if (IS_ERR(va_page)) {
+		ret = PTR_ERR(va_page);
+		goto err_out_free;
+	}
+
+	mutex_lock(&encl->lock);
+
+	/*
+	 * Copy comment from sgx_encl_add_page() to maintain guidance in
+	 * this similar flow:
+	 * Adding to encl->va_pages must be done under encl->lock.  Ditto for
+	 * deleting (via sgx_encl_shrink()) in the error path.
+	 */
+	if (va_page)
+		list_add(&va_page->list, &encl->va_pages);
+
+	ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc),
+			encl_page, GFP_KERNEL);
+	/*
+	 * If ret == -EBUSY then page was created in another flow while
+	 * running without encl->lock
+	 */
+	if (ret)
+		goto err_out_unlock;
+
+	pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
+	pginfo.addr = encl_page->desc & PAGE_MASK;
+	pginfo.metadata = 0;
+
+	ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page));
+	if (ret)
+		goto err_out;
+
+	encl_page->encl = encl;
+	encl_page->epc_page = epc_page;
+	encl_page->type = SGX_PAGE_TYPE_REG;
+	encl->secs_child_cnt++;
+
+	sgx_mark_page_reclaimable(encl_page->epc_page);
+
+	phys_addr = sgx_get_epc_phys_addr(epc_page);
+	/*
+	 * Do not undo everything when creating PTE entry fails - next #PF
+	 * would find page ready for a PTE.
+	 * PAGE_SHARED because protection is forced to be RW above and COW
+	 * is not supported.
+	 */
+	vmret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
+				    PAGE_SHARED);
+	if (vmret != VM_FAULT_NOPAGE) {
+		mutex_unlock(&encl->lock);
+		return VM_FAULT_SIGBUS;
+	}
+	mutex_unlock(&encl->lock);
+	return VM_FAULT_NOPAGE;
+
+err_out:
+	xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc));
+
+err_out_unlock:
+	sgx_encl_shrink(encl, va_page);
+	mutex_unlock(&encl->lock);
+
+err_out_free:
+	sgx_encl_free_epc_page(epc_page);
+	kfree(encl_page);
+
+	return VM_FAULT_SIGBUS;
+}
+
 static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
 {
 	unsigned long addr = (unsigned long)vmf->address;
@@ -145,6 +267,17 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
 	if (unlikely(!encl))
 		return VM_FAULT_SIGBUS;
 
+	/*
+	 * The page_array keeps track of all enclave pages, whether they
+	 * are swapped out or not. If there is no entry for this page and
+	 * the system supports SGX2 then it is possible to dynamically add
+	 * a new enclave page. This is only possible for an initialized
+	 * enclave that will be checked for right away.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SGX2) &&
+	    (!xa_load(&encl->page_array, PFN_DOWN(addr))))
+		return sgx_encl_eaug_page(vma, encl, addr);
+
 	mutex_lock(&encl->lock);
 
 	entry = sgx_encl_load_page(encl, addr);
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 848a28d28d3d..1b6ce1da7c92 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -123,4 +123,6 @@ void sgx_encl_free_epc_page(struct sgx_epc_page *page);
 struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
 					 unsigned long addr);
 
+struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl);
+void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page);
 #endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 23bdf558b231..58ff62a1fb00 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -17,7 +17,7 @@
 #include "encl.h"
 #include "encls.h"
 
-static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
+struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
 {
 	struct sgx_va_page *va_page = NULL;
 	void *err;
@@ -43,7 +43,7 @@ static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
 	return va_page;
 }
 
-static void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
+void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
 {
 	encl->page_cnt--;
 
-- 
2.25.1



* [PATCH V2 20/32] x86/sgx: Tighten accessible memory range after enclave initialization
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (18 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 19/32] x86/sgx: Support adding of pages to an initialized enclave Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 21/32] selftests/sgx: Test two different SGX2 EAUG flows Reinette Chatre
                   ` (12 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

Before an enclave is initialized the enclave's memory range is unknown.
The enclave's memory range is learned at the time it is created via the
SGX_IOC_ENCLAVE_CREATE ioctl() where the provided memory range is
obtained from an earlier mmap() of /dev/sgx_enclave. After an enclave
is initialized its memory can be mapped into user space (mmap()) from
where it can be entered at its defined entry points.

With the enclave's memory range known after it is initialized there is
no reason why it should be possible to map memory outside this range.

Lock down access to the initialized enclave's memory range by denying
any attempt to map memory outside its memory range.
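
The containment rule can be sketched in isolation as follows. This is a
stand-alone illustration, not the kernel code: the function name is
hypothetical, and it assumes that end is exclusive and that base + size
does not overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when [start, end) lies entirely inside the
 * enclave range [base, base + size). A mapping request for which this
 * returns false would be refused once the enclave is initialized.
 */
static inline bool map_within_enclave(uint64_t base, uint64_t size,
				      uint64_t start, uint64_t end)
{
	return start >= base && end <= base + size;
}
```

A request that starts below the enclave base or ends beyond its last
byte fails the check, even if part of it overlaps the enclave.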

Locking down the memory range also makes adding pages to an initialized
enclave more efficient. Pages are added to an initialized enclave by
accessing memory that belongs to the enclave's memory range but not yet
backed by an enclave page. If it is possible for user space to map
memory that does not form part of the enclave then an access to this
memory would eventually fail. Failures range from a prompt general
protection fault if the access was an ENCLU[EACCEPT] from within the
enclave, to a page fault via the vDSO if it was another access from
within the enclave, to a SIGBUS (also resulting from a page fault) if
the access was from outside the enclave.

Disallowing invalid memory from being mapped in the first place avoids
preventable failures.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Add comment (Jarkko).

 arch/x86/kernel/cpu/sgx/encl.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index d1e3ea86b902..c20100245411 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -403,6 +403,11 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
 
 	XA_STATE(xas, &encl->page_array, PFN_DOWN(start));
 
+	/* Disallow mapping outside enclave's address range. */
+	if (test_bit(SGX_ENCL_INITIALIZED, &encl->flags) &&
+	    (start < encl->base || end > encl->base + encl->size))
+		return -EACCES;
+
 	/*
 	 * Disallow READ_IMPLIES_EXEC tasks as their VMA permissions might
 	 * conflict with the enclave page permissions.
-- 
2.25.1



* [PATCH V2 21/32] selftests/sgx: Test two different SGX2 EAUG flows
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (19 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 20/32] x86/sgx: Tighten accessible memory range after enclave initialization Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-03-07 16:39   ` Jarkko Sakkinen
  2022-02-08  0:45 ` [PATCH V2 22/32] x86/sgx: Support modifying SGX page type Reinette Chatre
                   ` (11 subsequent siblings)
  32 siblings, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

Enclave pages can be added to an initialized enclave when an address
belonging to the enclave but without a backing page is accessed from
within the enclave.

Accessing memory without a backing enclave page from within an enclave
can be done in different ways:
1) Pre-emptively run ENCLU[EACCEPT]. The addition of a page always
   needs to be accepted by the enclave via ENCLU[EACCEPT], so this
   flow is efficient: the first execution of ENCLU[EACCEPT] triggers
   the addition of the page and, when execution returns to the same
   instruction, the second execution succeeds as an acceptance of
   the page.

2) A direct read or write. When a direct read or write triggers the
   page addition, execution cannot resume from the instruction
   (read/write) that triggered the fault. Instead the enclave needs
   to be entered at a different entry point to run the needed
   ENCLU[EACCEPT] before execution can return to the original entry
   point and the read/write instruction that faulted.
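
The difference between the two flows can be summarized with a toy
model. The sketch below is not SGX code and touches no hardware: the
page states, structure and function names are all hypothetical, chosen
only to count the faults and extra enclave entries each flow incurs.

```c
#include <assert.h>

enum page_state { PAGE_ABSENT, PAGE_PENDING, PAGE_ACCEPTED };

struct flow_stats {
	int faults;		/* page faults observed */
	int reentries;		/* extra enclave entries to run EACCEPT */
	enum page_state final;	/* page state when the flow completes */
};

/* Model of the fault handler: adding the page leaves it PENDING. */
static inline void model_fault(enum page_state *page, struct flow_stats *st)
{
	st->faults++;
	if (*page == PAGE_ABSENT)
		*page = PAGE_PENDING;
}

/* Flow 1: EACCEPT is the first access to the page. */
static inline struct flow_stats model_eaccept_first(void)
{
	struct flow_stats st = { 0, 0, PAGE_ABSENT };

	/* First EACCEPT faults and the handler adds the page. */
	model_fault(&st.final, &st);
	/* The repeated EACCEPT finds a PENDING page and accepts it. */
	if (st.final == PAGE_PENDING)
		st.final = PAGE_ACCEPTED;
	return st;
}

/* Flow 2: a plain write is the first access to the page. */
static inline struct flow_stats model_direct_write(void)
{
	struct flow_stats st = { 0, 0, PAGE_ABSENT };

	/* First write faults and the handler adds the page. */
	model_fault(&st.final, &st);
	/* The repeated write hits a PENDING page: second fault. */
	if (st.final == PAGE_PENDING)
		st.faults++;
	/* Recovery: enter at another entry point and run EACCEPT. */
	st.reentries++;
	st.final = PAGE_ACCEPTED;
	/* The original write can now be resumed and succeeds. */
	return st;
}
```

In this model the EACCEPT-first flow costs one fault and no extra
entry, while the direct access flow costs two faults and one extra
entry, which is what the two tests exercise.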

Add tests for both flows.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since v1:
- Replace __cpuid() definition and usage with __cpuid_count().
- Fix accuracy of comments.

 tools/testing/selftests/sgx/main.c | 243 +++++++++++++++++++++++++++++
 1 file changed, 243 insertions(+)

diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index 1398cd1b0983..68285603b3f0 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -86,6 +86,15 @@ static bool vdso_get_symtab(void *addr, struct vdso_symtab *symtab)
 	return true;
 }
 
+static inline int sgx2_supported(void)
+{
+	unsigned int eax, ebx, ecx, edx;
+
+	__cpuid_count(SGX_CPUID, 0x0, eax, ebx, ecx, edx);
+
+	return eax & 0x2;
+}
+
 static unsigned long elf_sym_hash(const char *name)
 {
 	unsigned long h = 0, high;
@@ -882,4 +891,238 @@ TEST_F(enclave, epcm_permissions)
 	EXPECT_EQ(self->run.exception_addr, 0);
 }
 
+/*
+ * Test the addition of pages to an initialized enclave via writing to
+ * a page that belongs to the enclave's address space but was not added
+ * during enclave creation.
+ */
+TEST_F(enclave, augment)
+{
+	struct encl_op_get_from_addr get_addr_op;
+	struct encl_op_put_to_addr put_addr_op;
+	struct encl_op_eaccept eaccept_op;
+	size_t total_size = 0;
+	void *addr;
+	int i;
+
+	if (!sgx2_supported())
+		SKIP(return, "SGX2 not supported");
+
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+
+	memset(&self->run, 0, sizeof(self->run));
+	self->run.tcs = self->encl.encl_base;
+
+	for (i = 0; i < self->encl.nr_segments; i++) {
+		struct encl_segment *seg = &self->encl.segment_tbl[i];
+
+		total_size += seg->size;
+	}
+
+	/*
+	 * Actual enclave size is expected to be larger than the loaded
+	 * test enclave since enclave size must be a power of 2 in bytes
+	 * and test_encl does not consume it all.
+	 */
+	EXPECT_LT(total_size + PAGE_SIZE, self->encl.encl_size);
+
+	/*
+	 * Create memory mapping for the page that will be added. New
+	 * memory mapping is for one page right after all existing
+	 * mappings.
+	 */
+	addr = mmap((void *)self->encl.encl_base + total_size, PAGE_SIZE,
+		    PROT_READ | PROT_WRITE | PROT_EXEC,
+		    MAP_SHARED | MAP_FIXED, self->encl.fd, 0);
+	EXPECT_NE(addr, MAP_FAILED);
+
+	self->run.exception_vector = 0;
+	self->run.exception_error_code = 0;
+	self->run.exception_addr = 0;
+
+	/*
+	 * Attempt to write to the new page from within enclave.
+	 * Expected to fail since page is not (yet) part of the enclave.
+	 * The first #PF will trigger the addition of the page to the
+	 * enclave, but since the new page needs an EACCEPT from within the
+	 * enclave before it can be used it would not be possible
+	 * to successfully return to the failing instruction. This is the
+	 * cause of the second #PF captured here having the SGX bit set,
+	 * it is from hardware preventing the page from being used.
+	 */
+	put_addr_op.value = MAGIC;
+	put_addr_op.addr = (unsigned long)addr;
+	put_addr_op.header.type = ENCL_OP_PUT_TO_ADDRESS;
+
+	EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
+
+	EXPECT_EQ(self->run.function, ERESUME);
+	EXPECT_EQ(self->run.exception_vector, 14);
+	EXPECT_EQ(self->run.exception_addr, (unsigned long)addr);
+
+	if (self->run.exception_error_code == 0x6) {
+		munmap(addr, PAGE_SIZE);
+		SKIP(return, "Kernel does not support adding pages to initialized enclave");
+	}
+
+	EXPECT_EQ(self->run.exception_error_code, 0x8007);
+
+	self->run.exception_vector = 0;
+	self->run.exception_error_code = 0;
+	self->run.exception_addr = 0;
+
+	/* Handle AEX by running EACCEPT from new entry point. */
+	self->run.tcs = self->encl.encl_base + PAGE_SIZE;
+
+	eaccept_op.epc_addr = self->encl.encl_base + total_size;
+	eaccept_op.flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_REG | SGX_SECINFO_PENDING;
+	eaccept_op.ret = 0;
+	eaccept_op.header.type = ENCL_OP_EACCEPT;
+
+	EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+	EXPECT_EQ(eaccept_op.ret, 0);
+
+	/* Can now return to main TCS to resume execution. */
+	self->run.tcs = self->encl.encl_base;
+
+	EXPECT_EQ(vdso_sgx_enter_enclave((unsigned long)&put_addr_op, 0, 0,
+					 ERESUME, 0, 0,
+					 &self->run),
+		  0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	/*
+	 * Read memory from newly added page that was just written to,
+	 * confirming that data previously written (MAGIC) is present.
+	 */
+	get_addr_op.value = 0;
+	get_addr_op.addr = (unsigned long)addr;
+	get_addr_op.header.type = ENCL_OP_GET_FROM_ADDRESS;
+
+	EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
+
+	EXPECT_EQ(get_addr_op.value, MAGIC);
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	munmap(addr, PAGE_SIZE);
+}
+
+/*
+ * Test for the addition of pages to an initialized enclave via a
+ * pre-emptive run of EACCEPT on page to be added.
+ */
+TEST_F(enclave, augment_via_eaccept)
+{
+	struct encl_op_get_from_addr get_addr_op;
+	struct encl_op_put_to_addr put_addr_op;
+	struct encl_op_eaccept eaccept_op;
+	size_t total_size = 0;
+	void *addr;
+	int i;
+
+	if (!sgx2_supported())
+		SKIP(return, "SGX2 not supported");
+
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+
+	memset(&self->run, 0, sizeof(self->run));
+	self->run.tcs = self->encl.encl_base;
+
+	for (i = 0; i < self->encl.nr_segments; i++) {
+		struct encl_segment *seg = &self->encl.segment_tbl[i];
+
+		total_size += seg->size;
+	}
+
+	/*
+	 * Actual enclave size is expected to be larger than the loaded
+	 * test enclave since enclave size must be a power of 2 in bytes while
+	 * test_encl does not consume it all.
+	 */
+	EXPECT_LT(total_size + PAGE_SIZE, self->encl.encl_size);
+
+	/*
+	 * mmap() a page at end of existing enclave to be used for dynamic
+	 * EPC page.
+	 */
+
+	addr = mmap((void *)self->encl.encl_base + total_size, PAGE_SIZE,
+		    PROT_READ | PROT_WRITE | PROT_EXEC, MAP_SHARED | MAP_FIXED,
+		    self->encl.fd, 0);
+	EXPECT_NE(addr, MAP_FAILED);
+
+	self->run.exception_vector = 0;
+	self->run.exception_error_code = 0;
+	self->run.exception_addr = 0;
+
+	/*
+	 * Run EACCEPT on new page to trigger the #PF->EAUG->EACCEPT(again
+	 * without a #PF). All should be transparent to userspace.
+	 */
+	eaccept_op.epc_addr = self->encl.encl_base + total_size;
+	eaccept_op.flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_REG | SGX_SECINFO_PENDING;
+	eaccept_op.ret = 0;
+	eaccept_op.header.type = ENCL_OP_EACCEPT;
+
+	EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+
+	if (self->run.exception_vector == 14 &&
+	    self->run.exception_error_code == 4 &&
+	    self->run.exception_addr == self->encl.encl_base + total_size) {
+		munmap(addr, PAGE_SIZE);
+		SKIP(return, "Kernel does not support adding pages to initialized enclave");
+	}
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+	EXPECT_EQ(eaccept_op.ret, 0);
+
+	/*
+	 * New page should be accessible from within enclave - attempt to
+	 * write to it.
+	 */
+	put_addr_op.value = MAGIC;
+	put_addr_op.addr = (unsigned long)addr;
+	put_addr_op.header.type = ENCL_OP_PUT_TO_ADDRESS;
+
+	EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	/*
+	 * Read memory from newly added page that was just written to,
+	 * confirming that data previously written (MAGIC) is present.
+	 */
+	get_addr_op.value = 0;
+	get_addr_op.addr = (unsigned long)addr;
+	get_addr_op.header.type = ENCL_OP_GET_FROM_ADDRESS;
+
+	EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
+
+	EXPECT_EQ(get_addr_op.value, MAGIC);
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	munmap(addr, PAGE_SIZE);
+}
+
 TEST_HARNESS_MAIN
-- 
2.25.1



* [PATCH V2 22/32] x86/sgx: Support modifying SGX page type
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (20 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 21/32] selftests/sgx: Test two different SGX2 EAUG flows Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 23/32] x86/sgx: Support complete page removal Reinette Chatre
                   ` (10 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

Every enclave contains one or more Thread Control Structures (TCS). The
TCS contains meta-data used by the hardware to save and restore thread
specific information when entering/exiting the enclave. With SGX1 an
enclave needs to be created with enough TCSs to support the largest
number of threads expected to use the enclave and enough enclave pages
to meet all its anticipated memory demands. In SGX1 all pages remain in
the enclave until the enclave is unloaded.

SGX2 introduces a new function, ENCLS[EMODT], that is used to change
the type of an enclave page from a regular (SGX_PAGE_TYPE_REG) enclave
page to a TCS (SGX_PAGE_TYPE_TCS) page or change the type from a
regular (SGX_PAGE_TYPE_REG) or TCS (SGX_PAGE_TYPE_TCS)
page to a trimmed (SGX_PAGE_TYPE_TRIM) page (setting it up for later
removal).

With the existing support of dynamically adding regular enclave pages
to an initialized enclave and changing the page type to TCS it is
possible to dynamically increase the number of threads supported by an
enclave.

Changing the enclave page type to SGX_PAGE_TYPE_TRIM is the first step
of dynamically removing pages from an initialized enclave. The complete
page removal flow is:
1) Change the type of the pages to be removed to SGX_PAGE_TYPE_TRIM
   using the SGX_IOC_ENCLAVE_MODIFY_TYPE ioctl() introduced here.
2) Approve the page removal by running ENCLU[EACCEPT] from within
   the enclave.
3) Initiate actual page removal using the ioctl() introduced in the
   following patch.

Add ioctl() SGX_IOC_ENCLAVE_MODIFY_TYPE to support changing SGX
enclave page types within an initialized enclave. With
SGX_IOC_ENCLAVE_MODIFY_TYPE the user specifies a page range and the
enclave page type to be applied to all pages in the provided range.
The ioctl() itself can return an error code based on failures
encountered by the kernel. It is also possible for SGX specific
failures to be encountered.  Add a result output parameter to
communicate the SGX return code. It is possible for the enclave page
type change request to fail on any page within the provided range.
Support partial success by returning the number of bytes (a multiple
of the page size) that were successfully changed.
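
A user space caller might prepare the ioctl() parameters roughly as
sketched below. The two structures are local mirrors of the layouts
used by this patch (a real program would include the uapi header
instead), the helper name is hypothetical, and the page type values
are the architectural encodings placed in bits 15:8 of SECINFO.FLAGS.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct sgx_enclave_modt as proposed in this patch. */
struct modt_params {
	uint64_t offset;	/* page aligned offset from enclave base */
	uint64_t length;	/* multiple of the page size */
	uint64_t secinfo;	/* user address of the SECINFO data */
	uint64_t result;	/* (output) SGX result code */
	uint64_t count;		/* (output) bytes successfully changed */
};

/* Local mirror of the 64-byte SECINFO layout. */
struct secinfo {
	uint64_t flags;
	uint8_t reserved[56];
};

/* Architectural page type encodings. */
enum { PT_TCS = 1, PT_REG = 2, PT_TRIM = 4 };

/* Hypothetical helper: fill in a request to change a range's type. */
static inline void prepare_modt(struct modt_params *p, struct secinfo *si,
				uint64_t offset, uint64_t length,
				unsigned int page_type)
{
	/* Only the page type may be set; reserved bytes must be zero. */
	memset(si, 0, sizeof(*si));
	si->flags = (uint64_t)page_type << 8;

	/* The output members must be zero on input. */
	memset(p, 0, sizeof(*p));
	p->offset = offset;
	p->length = length;
	p->secinfo = (uint64_t)(uintptr_t)si;
}
```

The prepared structure would then be passed to
SGX_IOC_ENCLAVE_MODIFY_TYPE on the enclave file descriptor, after which
result and count describe how far the request got.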

After the page type is changed the page continues to be accessible
from the kernel perspective with page table entries and internal
state. The page may be moved to swap. Any access before ENCLU[EACCEPT]
is run will encounter a page fault with the SGX flag set in the error
code.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Remove the "Earlier changes ..." paragraph (Jarkko).
- Change "new ioctl" text to "Add SGX_IOC_ENCLAVE_MOD_TYPE" (Jarkko).
- Discussion about EPCM interaction and the EPCM MODIFIED bit is moved
  to new patch that introduces the ENCLS[EMODT] wrapper while keeping
  the higher level discussion on page accessibility in
  this commit log (Jarkko).
- Rename SGX_IOC_PAGE_MODT ioctl() to SGX_IOC_ENCLAVE_MODIFY_TYPE
  (Jarkko).
- Rename struct sgx_page_modt to struct sgx_enclave_modt in support
  of ioctl() rename.
- Rename sgx_page_modt() to sgx_enclave_modt() and sgx_ioc_page_modt()
  to sgx_ioc_enclave_modt() in support of ioctl() rename.
- Provide secinfo as parameter to ioctl() instead of just
  page type (Jarkko).
- Update comments to refer to new ioctl() names.
- Use new SGX2 checking helper().
- Use ETRACK flow utility.
- Move kernel-doc to function that provides documentation for
  Documentation/x86/sgx.rst.
- Remove redundant comment.
- Use offset/length validation utility.
- Make explicit which members of struct sgx_enclave_modt are for
  output (Dave).

 arch/x86/include/uapi/asm/sgx.h |  20 +++
 arch/x86/kernel/cpu/sgx/ioctl.c | 212 ++++++++++++++++++++++++++++++++
 2 files changed, 232 insertions(+)

diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index b0ffb80bc67f..1df91517b612 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -33,6 +33,8 @@ enum sgx_page_flags {
 	_IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
 #define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
 	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
+#define SGX_IOC_ENCLAVE_MODIFY_TYPE \
+	_IOWR(SGX_MAGIC, 0x07, struct sgx_enclave_modt)
 
 /**
  * struct sgx_enclave_create - parameter structure for the
@@ -116,6 +118,24 @@ struct sgx_enclave_restrict_perm {
 	__u64 count;
 };
 
+/**
+ * struct sgx_enclave_modt - parameters for %SGX_IOC_ENCLAVE_MODIFY_TYPE
+ * @offset:	starting page offset (page aligned relative to enclave base
+ *		address defined in SECS)
+ * @length:	length of memory (multiple of the page size)
+ * @secinfo:	address for the SECINFO data containing the new type
+ *		for pages in range described by @offset and @length
+ * @result:	(output) SGX result code of ENCLS[EMODT] function
+ * @count:	(output) bytes successfully changed (multiple of page size)
+ */
+struct sgx_enclave_modt {
+	__u64 offset;
+	__u64 length;
+	__u64 secinfo;
+	__u64 result;
+	__u64 count;
+};
+
 struct sgx_enclave_run;
 
 /**
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 58ff62a1fb00..3f59920184c4 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -1120,6 +1120,215 @@ static long sgx_ioc_enclave_restrict_perm(struct sgx_encl *encl,
 	return ret;
 }
 
+/**
+ * sgx_enclave_modt() - Modify type of SGX enclave pages
+ * @encl:	Enclave to which the pages belong.
+ * @modt:	Checked parameters from user about which pages need modifying.
+ * @page_type:	New page type.
+ *
+ * Return:
+ * - 0:		Success
+ * - -errno:	Otherwise
+ */
+static long sgx_enclave_modt(struct sgx_encl *encl,
+			     struct sgx_enclave_modt *modt,
+			     enum sgx_page_type page_type)
+{
+	unsigned long max_prot_restore, run_prot_restore;
+	struct sgx_encl_page *entry;
+	struct sgx_secinfo secinfo;
+	unsigned long prot;
+	unsigned long addr;
+	unsigned long c;
+	void *epc_virt;
+	int ret;
+
+	/*
+	 * The only new page types allowed by hardware are PT_TCS and PT_TRIM.
+	 */
+	if (page_type != SGX_PAGE_TYPE_TCS && page_type != SGX_PAGE_TYPE_TRIM)
+		return -EINVAL;
+
+	memset(&secinfo, 0, sizeof(secinfo));
+
+	secinfo.flags = page_type << 8;
+
+	for (c = 0 ; c < modt->length; c += PAGE_SIZE) {
+		addr = encl->base + modt->offset + c;
+
+		mutex_lock(&encl->lock);
+
+		entry = sgx_encl_load_page(encl, addr);
+		if (IS_ERR(entry)) {
+			ret = PTR_ERR(entry) == -EBUSY ? -EAGAIN : -EFAULT;
+			goto out_unlock;
+		}
+
+		/*
+		 * Borrow the logic from the Intel SDM. Regular pages
+		 * (SGX_PAGE_TYPE_REG) can change type to SGX_PAGE_TYPE_TCS
+		 * or SGX_PAGE_TYPE_TRIM but TCS pages can only be trimmed.
+		 * CET pages not supported yet.
+		 */
+		if (!(entry->type == SGX_PAGE_TYPE_REG ||
+		      (entry->type == SGX_PAGE_TYPE_TCS &&
+		       page_type == SGX_PAGE_TYPE_TRIM))) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		max_prot_restore = entry->vm_max_prot_bits;
+		run_prot_restore = entry->vm_run_prot_bits;
+
+		/*
+		 * Once a regular page becomes a TCS page it cannot be
+		 * changed back. So the maximum allowed protection reflects
+		 * the TCS page that is always RW from kernel perspective but
+		 * will be inaccessible from within enclave. Before doing
+		 * so, do make sure that the new page type continues to
+		 * respect the originally vetted page permissions.
+		 */
+		if (entry->type == SGX_PAGE_TYPE_REG &&
+		    page_type == SGX_PAGE_TYPE_TCS) {
+			if (~entry->vm_max_prot_bits & (VM_READ | VM_WRITE)) {
+				ret = -EPERM;
+				goto out_unlock;
+			}
+			prot = PROT_READ | PROT_WRITE;
+			entry->vm_max_prot_bits = calc_vm_prot_bits(prot, 0);
+			entry->vm_run_prot_bits = entry->vm_max_prot_bits;
+
+			/*
+			 * Prevent page from being reclaimed while mutex
+			 * is released.
+			 */
+			if (sgx_unmark_page_reclaimable(entry->epc_page)) {
+				ret = -EAGAIN;
+				goto out_entry_changed;
+			}
+
+			/*
+			 * Do not keep encl->lock because of dependency on
+			 * mmap_lock acquired in sgx_zap_enclave_ptes().
+			 */
+			mutex_unlock(&encl->lock);
+
+			sgx_zap_enclave_ptes(encl, addr);
+
+			mutex_lock(&encl->lock);
+
+			sgx_mark_page_reclaimable(entry->epc_page);
+		}
+
+		/* Change EPC type */
+		epc_virt = sgx_get_epc_virt_addr(entry->epc_page);
+		ret = __emodt(&secinfo, epc_virt);
+		if (encls_faulted(ret)) {
+			/*
+			 * All possible faults should be avoidable:
+			 * parameters have been checked, will only change
+			 * valid page types, and no concurrent
+			 * SGX1/SGX2 ENCLS instructions since these are
+			 * protected with mutex.
+			 */
+			pr_err_once("EMODT encountered exception %d\n",
+				    ENCLS_TRAPNR(ret));
+			ret = -EFAULT;
+			goto out_entry_changed;
+		}
+		if (encls_failed(ret)) {
+			modt->result = ret;
+			ret = -EFAULT;
+			goto out_entry_changed;
+		}
+
+		ret = sgx_enclave_etrack(encl);
+		if (ret) {
+			ret = -EFAULT;
+			goto out_unlock;
+		}
+
+		entry->type = page_type;
+
+		mutex_unlock(&encl->lock);
+	}
+
+	ret = 0;
+	goto out;
+
+out_entry_changed:
+	entry->vm_max_prot_bits = max_prot_restore;
+	entry->vm_run_prot_bits = run_prot_restore;
+out_unlock:
+	mutex_unlock(&encl->lock);
+out:
+	modt->count = c;
+
+	return ret;
+}
+
+/**
+ * sgx_ioc_enclave_modt() - handler for %SGX_IOC_ENCLAVE_MODIFY_TYPE
+ * @encl:	an enclave pointer
+ * @arg:	userspace pointer to a &struct sgx_enclave_modt instance
+ *
+ * Ability to change the enclave page type supports the following use cases:
+ *
+ * * It is possible to add TCS pages to an enclave by changing the type of
+ *   regular pages (%SGX_PAGE_TYPE_REG) to TCS (%SGX_PAGE_TYPE_TCS) pages.
+ *   With this support the number of threads supported by an initialized
+ *   enclave can be increased dynamically.
+ *
+ * * Regular or TCS pages can dynamically be removed from an initialized
+ *   enclave by changing the page type to %SGX_PAGE_TYPE_TRIM. Changing the
+ *   page type to %SGX_PAGE_TYPE_TRIM marks the page for removal with actual
+ *   removal done by handler of %SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl() called
+ *   after ENCLU[EACCEPT] is run on %SGX_PAGE_TYPE_TRIM page from within the
+ *   enclave.
+ *
+ * Return:
+ * - 0:		Success
+ * - -errno:	Otherwise
+ */
+static long sgx_ioc_enclave_modt(struct sgx_encl *encl, void __user *arg)
+{
+	struct sgx_enclave_modt params;
+	enum sgx_page_type page_type;
+	struct sgx_secinfo secinfo;
+	long ret;
+
+	ret = sgx_ioc_sgx2_ready(encl);
+	if (ret)
+		return ret;
+
+	if (copy_from_user(&params, arg, sizeof(params)))
+		return -EFAULT;
+
+	if (sgx_validate_offset_length(encl, params.offset, params.length))
+		return -EINVAL;
+
+	if (copy_from_user(&secinfo, (void __user *)params.secinfo,
+			   sizeof(secinfo)))
+		return -EFAULT;
+
+	if (secinfo.flags & ~SGX_SECINFO_PAGE_TYPE_MASK)
+		return -EINVAL;
+
+	if (memchr_inv(secinfo.reserved, 0, sizeof(secinfo.reserved)))
+		return -EINVAL;
+
+	if (params.result || params.count)
+		return -EINVAL;
+
+	page_type = (secinfo.flags & SGX_SECINFO_PAGE_TYPE_MASK) >> 8;
+	ret = sgx_enclave_modt(encl, &params, page_type);
+
+	if (copy_to_user(arg, &params, sizeof(params)))
+		return -EFAULT;
+
+	return ret;
+}
+
 long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 {
 	struct sgx_encl *encl = filep->private_data;
@@ -1147,6 +1356,9 @@ long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 	case SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS:
 		ret = sgx_ioc_enclave_restrict_perm(encl, (void __user *)arg);
 		break;
+	case SGX_IOC_ENCLAVE_MODIFY_TYPE:
+		ret = sgx_ioc_enclave_modt(encl, (void __user *)arg);
+		break;
 	default:
 		ret = -ENOIOCTLCMD;
 		break;
-- 
2.25.1



* [PATCH V2 23/32] x86/sgx: Support complete page removal
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (21 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 22/32] x86/sgx: Support modifying SGX page type Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 24/32] Documentation/x86: Introduce enclave runtime management section Reinette Chatre
                   ` (9 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

The SGX2 page removal flow was introduced in the previous patch and is
as follows:
1) Change the type of the pages to be removed to SGX_PAGE_TYPE_TRIM
   using the ioctl() SGX_IOC_ENCLAVE_MODIFY_TYPE introduced in the
   previous patch.
2) Approve the page removal by running ENCLU[EACCEPT] from within
   the enclave.
3) Initiate actual page removal using the ioctl()
   SGX_IOC_ENCLAVE_REMOVE_PAGES introduced here.

Support the final step of the SGX2 page removal flow with ioctl()
SGX_IOC_ENCLAVE_REMOVE_PAGES. With this ioctl() the user specifies
a page range that should be removed. All pages in the provided
range must have the SGX_PAGE_TYPE_TRIM page type; the request
fails with EPERM (Operation not permitted) if a page that does
not have the correct type is encountered. Page removal can fail
on any page within the provided range. Support partial success by
returning the number of bytes (a multiple of the page size) that
were successfully removed.

Since actual page removal will succeed even if ENCLU[EACCEPT] was not
run from within the enclave, the ENCLS[EMODPR] instruction with RWX
permissions is used as a no-op mechanism to ensure ENCLU[EACCEPT] was
successfully run from within the enclave before the enclave page is
removed.

If the user omits running SGX_IOC_ENCLAVE_REMOVE_PAGES the pages will
still be removed when the enclave is unloaded.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Update comments to refer to new ioctl() names SGX_IOC_PAGE_MODT ->
  SGX_IOC_ENCLAVE_MODIFY_TYPE.
- Fix kernel-doc to have () as part of function name.
- Change name of ioctl():
  SGX_IOC_PAGE_REMOVE -> SGX_IOC_ENCLAVE_REMOVE_PAGES (Jarkko).
- With the above name change the page removal ioctl() has its name
  aligned with existing SGX_IOC_ENCLAVE_ADD_PAGES ioctl(). Also align
  naming of struct and functions:
  struct sgx_page_remove -> struct sgx_enclave_remove_pages
  sgx_page_remove() -> sgx_encl_remove_pages()
  sgx_ioc_page_remove() -> sgx_ioc_enclave_remove_pages()
- Use new SGX2 checking helper.
- When loading enclave page, make error code consistent with other
  instances to help user distinguish between permanent and temporary
  failures.
- Move kernel-doc to function that provides documentation for
  Documentation/x86/sgx.rst.
- Remove redundant comment.
- Use offset/length validation utility.
- Make explicit which member of struct sgx_enclave_remove_pages is for
  output (Dave).
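
For illustration only, a user space caller could deal with the partial
success described above roughly as follows. This is a minimal sketch and
not part of the series: struct remove_pages_args is a local stand-in that
mirrors struct sgx_enclave_remove_pages, PAGE_SIZE_BYTES assumes 4 KiB
pages, and advance_after_partial() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL	/* assumption: 4 KiB x86 pages */

/* Local stand-in mirroring struct sgx_enclave_remove_pages. */
struct remove_pages_args {
	uint64_t offset;	/* page aligned, relative to enclave base */
	uint64_t length;	/* multiple of the page size */
	uint64_t count;		/* (output) bytes removed by the last call */
};

/*
 * Hypothetical helper: after the ioctl() reported a partial count, skip
 * the bytes that were already removed and clear count again, since the
 * handler rejects a non-zero count on input with -EINVAL.
 * Returns the number of bytes that still need to be removed.
 */
static uint64_t advance_after_partial(struct remove_pages_args *args)
{
	assert(args->count <= args->length);
	assert(args->count % PAGE_SIZE_BYTES == 0);

	args->offset += args->count;
	args->length -= args->count;
	args->count = 0;

	return args->length;
}
```

A caller would presumably reissue SGX_IOC_ENCLAVE_REMOVE_PAGES with the
updated structure only while the helper returns a non-zero value and the
reported error was a temporary one such as EAGAIN.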

 arch/x86/include/uapi/asm/sgx.h |  21 +++++
 arch/x86/kernel/cpu/sgx/ioctl.c | 145 ++++++++++++++++++++++++++++++++
 2 files changed, 166 insertions(+)

diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index 1df91517b612..db969a2a1874 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -35,6 +35,8 @@ enum sgx_page_flags {
 	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
 #define SGX_IOC_ENCLAVE_MODIFY_TYPE \
 	_IOWR(SGX_MAGIC, 0x07, struct sgx_enclave_modt)
+#define SGX_IOC_ENCLAVE_REMOVE_PAGES \
+	_IOWR(SGX_MAGIC, 0x08, struct sgx_enclave_remove_pages)
 
 /**
  * struct sgx_enclave_create - parameter structure for the
@@ -136,6 +138,25 @@ struct sgx_enclave_modt {
 	__u64 count;
 };
 
+/**
+ * struct sgx_enclave_remove_pages - %SGX_IOC_ENCLAVE_REMOVE_PAGES parameters
+ * @offset:	starting page offset (page aligned relative to enclave base
+ *		address defined in SECS)
+ * @length:	length of memory (multiple of the page size)
+ * @count:	(output) bytes successfully removed (multiple of page size)
+ *
+ * Regular (PT_REG) or TCS (PT_TCS) pages can be removed from an initialized
+ * enclave if the system supports SGX2. First, the %SGX_IOC_ENCLAVE_MODIFY_TYPE
+ * ioctl() should be used to change the page type to PT_TRIM. After that
+ * succeeds ENCLU[EACCEPT] should be run from within the enclave and then
+ * %SGX_IOC_ENCLAVE_REMOVE_PAGES can be used to complete the page removal.
+ */
+struct sgx_enclave_remove_pages {
+	__u64 offset;
+	__u64 length;
+	__u64 count;
+};
+
 struct sgx_enclave_run;
 
 /**
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 3f59920184c4..0ffb07095a80 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -1329,6 +1329,148 @@ static long sgx_ioc_enclave_modt(struct sgx_encl *encl, void __user *arg)
 	return ret;
 }
 
+/**
+ * sgx_encl_remove_pages() - Remove trimmed pages from SGX enclave
+ * @encl:	Enclave to which the pages belong
+ * @params:	Checked parameters from user on which pages need to be removed
+ *
+ * Return:
+ * - 0:		Success.
+ * - -errno:	Otherwise.
+ */
+static long sgx_encl_remove_pages(struct sgx_encl *encl,
+				  struct sgx_enclave_remove_pages *params)
+{
+	struct sgx_encl_page *entry;
+	struct sgx_secinfo secinfo;
+	unsigned long addr;
+	unsigned long c;
+	void *epc_virt;
+	int ret;
+
+	memset(&secinfo, 0, sizeof(secinfo));
+	secinfo.flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_X;
+
+	for (c = 0 ; c < params->length; c += PAGE_SIZE) {
+		addr = encl->base + params->offset + c;
+
+		mutex_lock(&encl->lock);
+
+		entry = sgx_encl_load_page(encl, addr);
+		if (IS_ERR(entry)) {
+			ret = PTR_ERR(entry) == -EBUSY ? -EAGAIN : -EFAULT;
+			goto out_unlock;
+		}
+
+		if (entry->type != SGX_PAGE_TYPE_TRIM) {
+			ret = -EPERM;
+			goto out_unlock;
+		}
+
+		/*
+		 * ENCLS[EMODPR] is used here as a no-op to learn whether
+		 * ENCLU[EACCEPT] was run from within the enclave. If
+		 * ENCLS[EMODPR] is run with RWX on a trimmed page that is
+		 * not yet accepted then it will return
+		 * %SGX_PAGE_NOT_MODIFIABLE. After the trimmed page is
+		 * accepted the instruction will encounter a page fault.
+		 */
+		epc_virt = sgx_get_epc_virt_addr(entry->epc_page);
+		ret = __emodpr(&secinfo, epc_virt);
+		if (!encls_faulted(ret) || ENCLS_TRAPNR(ret) != X86_TRAP_PF) {
+			ret = -EPERM;
+			goto out_unlock;
+		}
+
+		if (sgx_unmark_page_reclaimable(entry->epc_page)) {
+			ret = -EBUSY;
+			goto out_unlock;
+		}
+
+		/*
+		 * Do not keep encl->lock because of dependency on
+		 * mmap_lock acquired in sgx_zap_enclave_ptes().
+		 */
+		mutex_unlock(&encl->lock);
+
+		sgx_zap_enclave_ptes(encl, addr);
+
+		mutex_lock(&encl->lock);
+
+		sgx_encl_free_epc_page(entry->epc_page);
+		encl->secs_child_cnt--;
+		entry->epc_page = NULL;
+		xa_erase(&encl->page_array, PFN_DOWN(entry->desc));
+		sgx_encl_shrink(encl, NULL);
+		kfree(entry);
+
+		mutex_unlock(&encl->lock);
+	}
+
+	ret = 0;
+	goto out;
+
+out_unlock:
+	mutex_unlock(&encl->lock);
+out:
+	params->count = c;
+
+	return ret;
+}
+
+/**
+ * sgx_ioc_enclave_remove_pages() - handler for %SGX_IOC_ENCLAVE_REMOVE_PAGES
+ * @encl:	an enclave pointer
+ * @arg:	userspace pointer to &struct sgx_enclave_remove_pages instance
+ *
+ * Final step of the flow removing pages from an initialized enclave. The
+ * complete flow is:
+ *
+ * 1) User changes the type of the pages to be removed to %SGX_PAGE_TYPE_TRIM
+ *    using the %SGX_IOC_ENCLAVE_MODIFY_TYPE ioctl().
+ * 2) User approves the page removal by running ENCLU[EACCEPT] from within
+ *    the enclave.
+ * 3) User initiates actual page removal using the
+ *    %SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl() that is handled here.
+ *
+ * First remove any page table entries pointing to the page and then proceed
+ * with the actual removal of the enclave page and data in support of it.
+ *
+ * VA pages are not affected by this removal. It is thus possible that the
+ * enclave may end up with more VA pages than needed to support all its
+ * pages.
+ *
+ * Return:
+ * - 0:		Success
+ * - -errno:	Otherwise
+ */
+static long sgx_ioc_enclave_remove_pages(struct sgx_encl *encl,
+					 void __user *arg)
+{
+	struct sgx_enclave_remove_pages params;
+	long ret;
+
+	ret = sgx_ioc_sgx2_ready(encl);
+	if (ret)
+		return ret;
+
+	if (copy_from_user(&params, arg, sizeof(params)))
+		return -EFAULT;
+
+	if (sgx_validate_offset_length(encl, params.offset, params.length))
+		return -EINVAL;
+
+	if (params.count)
+		return -EINVAL;
+
+	ret = sgx_encl_remove_pages(encl, &params);
+
+	if (copy_to_user(arg, &params, sizeof(params)))
+		return -EFAULT;
+
+	return ret;
+}
+
 long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 {
 	struct sgx_encl *encl = filep->private_data;
@@ -1359,6 +1501,9 @@ long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 	case SGX_IOC_ENCLAVE_MODIFY_TYPE:
 		ret = sgx_ioc_enclave_modt(encl, (void __user *)arg);
 		break;
+	case SGX_IOC_ENCLAVE_REMOVE_PAGES:
+		ret = sgx_ioc_enclave_remove_pages(encl, (void __user *)arg);
+		break;
 	default:
 		ret = -ENOIOCTLCMD;
 		break;
-- 
2.25.1



* [PATCH V2 24/32] Documentation/x86: Introduce enclave runtime management section
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (22 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 23/32] x86/sgx: Support complete page removal Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 25/32] selftests/sgx: Introduce dynamic entry point Reinette Chatre
                   ` (8 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

Enclave runtime management is introduced following the pattern
of the section describing enclave building. Provide a brief
summary of enclave runtime management, pointing to the functions
implementing the ioctl()s that will contain details within their
kernel-doc.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- New patch.

 Documentation/x86/sgx.rst | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index 9df620b59f83..4059efbb4d2e 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -138,6 +138,22 @@ pages and establish enclave page permissions.
                sgx_ioc_enclave_init
                sgx_ioc_enclave_provision
 
+Enclave runtime management
+--------------------------
+
+Systems supporting SGX2 additionally support changes to initialized
+enclaves: modifying enclave page permissions and type, and dynamically
+adding and removing enclave pages. When an enclave accesses an address
+within its address range that does not have a backing page then a new
+regular page will be dynamically added to the enclave. The enclave is
+still required to run EACCEPT on the new page before it can be used.
+
+.. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c
+   :functions: sgx_ioc_enclave_relax_perm
+               sgx_ioc_enclave_restrict_perm
+               sgx_ioc_enclave_modt
+               sgx_ioc_enclave_remove_pages
+
 Enclave vDSO
 ------------
 
-- 
2.25.1



* [PATCH V2 25/32] selftests/sgx: Introduce dynamic entry point
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (23 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 24/32] Documentation/x86: Introduce enclave runtime management section Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 26/32] selftests/sgx: Introduce TCS initialization enclave operation Reinette Chatre
                   ` (7 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

The test enclave (test_encl.elf) is built with two initialized
Thread Control Structures (TCS) included in the binary. Both TCS are
initialized with the same entry point, encl_entry, that correctly
computes the absolute address of the stack based on the stack of each
TCS that is also built into the binary.

A new TCS can be added dynamically to the enclave and must be
initialized with an entry point used to enter the enclave. Since the
existing entry point, encl_entry, assumes that the TCS and its stack
exist at particular offsets within the binary it is not able to handle
a dynamically added TCS and its stack.

Introduce a new entry point, encl_dyn_entry, that initializes the
absolute address of that thread's stack to the address immediately
preceding the TCS itself. It is now possible to dynamically add a
contiguous memory region to the enclave with the new stack preceding
the new TCS. With the new TCS initialized with encl_dyn_entry as entry
point the absolute address of the stack is computed correctly on entry.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
No changes since V1.
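
As a rough illustration of the layout encl_dyn_entry relies on, the
sketch below computes the addresses for a stack page that is followed
directly by the TCS page. It is not part of the patch: struct
dyn_thread_layout, compute_dyn_layout() and PAGE_SIZE_BYTES are
hypothetical names, and 4 KiB pages are assumed. The initial stack
pointer mirrors the "lea -1(%rbx), %rax" above, with %rbx holding the
TCS address on entry.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL	/* assumption: 4 KiB x86 pages */

/* Hypothetical description of a dynamically added thread. */
struct dyn_thread_layout {
	uint64_t stack_page;	/* page that holds the stack */
	uint64_t tcs;		/* TCS page, directly after the stack page */
	uint64_t initial_rsp;	/* what encl_dyn_entry loads: TCS - 1 */
};

/* Compute the layout for a region that starts at region_start. */
static struct dyn_thread_layout compute_dyn_layout(uint64_t region_start)
{
	struct dyn_thread_layout layout;

	assert(region_start % PAGE_SIZE_BYTES == 0);

	layout.stack_page = region_start;
	layout.tcs = region_start + PAGE_SIZE_BYTES;
	/* Last byte of the stack page, the address just before the TCS. */
	layout.initial_rsp = layout.tcs - 1;

	return layout;
}
```

The point of the design is that the entry code needs no link-time
offsets: everything is derived from the TCS address alone.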

 tools/testing/selftests/sgx/test_encl_bootstrap.S | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/sgx/test_encl_bootstrap.S b/tools/testing/selftests/sgx/test_encl_bootstrap.S
index 82fb0dfcbd23..03ae0f57e29d 100644
--- a/tools/testing/selftests/sgx/test_encl_bootstrap.S
+++ b/tools/testing/selftests/sgx/test_encl_bootstrap.S
@@ -45,6 +45,12 @@ encl_entry:
 	# TCS #2. By adding the value of encl_stack to it, we get
 	# the absolute address for the stack.
 	lea	(encl_stack)(%rbx), %rax
+	jmp encl_entry_core
+encl_dyn_entry:
+	# Entry point for dynamically created TCS page expected to follow
+	# its stack directly.
+	lea -1(%rbx), %rax
+encl_entry_core:
 	xchg	%rsp, %rax
 	push	%rax
 
-- 
2.25.1



* [PATCH V2 26/32] selftests/sgx: Introduce TCS initialization enclave operation
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (24 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 25/32] selftests/sgx: Introduce dynamic entry point Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 27/32] selftests/sgx: Test complete changing of page type flow Reinette Chatre
                   ` (6 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

The Thread Control Structure (TCS) contains meta-data used by the
hardware to save and restore thread specific information when
entering/exiting the enclave. A TCS can be added to an initialized
enclave by first adding a new regular enclave page, initializing the
content of the new page from within the enclave, and then changing that
page's type to a TCS.

Support the initialization of a TCS from within the enclave.
The information that must be provided from outside the enclave is
the address of the TCS, the address of the State Save Area (SSA),
and the entry point that the thread should use to enter the
enclave. With this information provided all needed fields of a TCS
can be initialized.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
No changes since V1.
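
The byte offsets used by do_encl_init_tcs_page() below can also be read
as a structure. The following is only a sketch written against the
offsets in this patch: struct tcs_sketch and tcs_sketch_init() are
hypothetical names, and the sketch uses the C library memset(), which
the test enclave itself cannot do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of a TCS page, following the offsets in the patch. */
struct tcs_sketch {
	uint64_t state;		/* offset 0 */
	uint64_t flags;		/* offset 8 */
	uint64_t ossa;		/* offset 16, SSA offset from enclave base */
	uint32_t cssa;		/* offset 24 */
	uint32_t nssa;		/* offset 28 */
	uint64_t oentry;	/* offset 32, entry point offset */
	uint64_t aep;		/* offset 40 */
	uint64_t ofsbase;	/* offset 48 */
	uint64_t ogsbase;	/* offset 56 */
	uint32_t fslimit;	/* offset 64 */
	uint32_t gslimit;	/* offset 68 */
	uint8_t reserved[4024];	/* offset 72, up to the end of the page */
};

_Static_assert(sizeof(struct tcs_sketch) == 4096, "TCS fills one page");

/* Fill in the same values that do_encl_init_tcs_page() writes. */
static void tcs_sketch_init(struct tcs_sketch *tcs, uint64_t ssa,
			    uint64_t entry)
{
	assert(tcs != NULL);

	memset(tcs, 0, sizeof(*tcs));
	tcs->ossa = ssa;
	tcs->nssa = 1;
	tcs->oentry = entry;
	tcs->fslimit = 0xFFFFFFFF;
	tcs->gslimit = 0xFFFFFFFF;
}
```

All members are naturally aligned, so no packing attribute should be
needed for the offsets to match.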

 tools/testing/selftests/sgx/defines.h   |  8 +++++++
 tools/testing/selftests/sgx/test_encl.c | 30 +++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/tools/testing/selftests/sgx/defines.h b/tools/testing/selftests/sgx/defines.h
index b638eb98c80c..d8587c971941 100644
--- a/tools/testing/selftests/sgx/defines.h
+++ b/tools/testing/selftests/sgx/defines.h
@@ -26,6 +26,7 @@ enum encl_op_type {
 	ENCL_OP_NOP,
 	ENCL_OP_EACCEPT,
 	ENCL_OP_EMODPE,
+	ENCL_OP_INIT_TCS_PAGE,
 	ENCL_OP_MAX,
 };
 
@@ -68,4 +69,11 @@ struct encl_op_emodpe {
 	uint64_t flags;
 };
 
+struct encl_op_init_tcs_page {
+	struct encl_op_header header;
+	uint64_t tcs_page;
+	uint64_t ssa;
+	uint64_t entry;
+};
+
 #endif /* DEFINES_H */
diff --git a/tools/testing/selftests/sgx/test_encl.c b/tools/testing/selftests/sgx/test_encl.c
index 5b6c65331527..c0d6397295e3 100644
--- a/tools/testing/selftests/sgx/test_encl.c
+++ b/tools/testing/selftests/sgx/test_encl.c
@@ -57,6 +57,35 @@ static void *memcpy(void *dest, const void *src, size_t n)
 	return dest;
 }
 
+static void *memset(void *dest, int c, size_t n)
+{
+	size_t i;
+
+	for (i = 0; i < n; i++)
+		((char *)dest)[i] = c;
+
+	return dest;
+}
+
+static void do_encl_init_tcs_page(void *_op)
+{
+	struct encl_op_init_tcs_page *op = _op;
+	void *tcs = (void *)op->tcs_page;
+	uint32_t val_32;
+
+	memset(tcs, 0, 16);			/* STATE and FLAGS */
+	memcpy(tcs + 16, &op->ssa, 8);		/* OSSA */
+	memset(tcs + 24, 0, 4);			/* CSSA */
+	val_32 = 1;
+	memcpy(tcs + 28, &val_32, 4);		/* NSSA */
+	memcpy(tcs + 32, &op->entry, 8);	/* OENTRY */
+	memset(tcs + 40, 0, 24);		/* AEP, OFSBASE, OGSBASE */
+	val_32 = 0xFFFFFFFF;
+	memcpy(tcs + 64, &val_32, 4);		/* FSLIMIT */
+	memcpy(tcs + 68, &val_32, 4);		/* GSLIMIT */
+	memset(tcs + 72, 0, 4024);		/* Reserved */
+}
+
 static void do_encl_op_put_to_buf(void *op)
 {
 	struct encl_op_put_to_buf *op2 = op;
@@ -100,6 +129,7 @@ void encl_body(void *rdi,  void *rsi)
 		do_encl_op_nop,
 		do_encl_eaccept,
 		do_encl_emodpe,
+		do_encl_init_tcs_page,
 	};
 
 	struct encl_op_header *op = (struct encl_op_header *)rdi;
-- 
2.25.1



* [PATCH V2 27/32] selftests/sgx: Test complete changing of page type flow
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (25 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 26/32] selftests/sgx: Introduce TCS initialization enclave operation Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 28/32] selftests/sgx: Test faulty enclave behavior Reinette Chatre
                   ` (5 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

Support for changing an enclave page's type enables an initialized
enclave to be expanded with support for more threads by changing the
type of a regular enclave page to that of a Thread Control Structure
(TCS).  Additionally, being able to change a TCS or regular enclave
page's type to be trimmed (SGX_PAGE_TYPE_TRIM) initiates the removal
of the page from the enclave.

Test changing page type to TCS as well as page removal flows
in two phases: In the first phase support for a new thread is
dynamically added to an initialized enclave and in the second phase
the pages associated with the new thread are removed from the enclave.
As an additional sanity check after the second phase the page used as
a TCS page during the first phase is added back as a regular page and
verified to be writable (which would not be possible if it were still
a TCS page).

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Update to support ioctl() name change (SGX_IOC_PAGE_MODT ->
  SGX_IOC_ENCLAVE_MODIFY_TYPE) and provide secinfo as parameter instead
  of just page type (Jarkko).
- Update test to reflect page removal ioctl() and struct name change:
  SGX_IOC_PAGE_REMOVE->SGX_IOC_ENCLAVE_REMOVE_PAGES,
  struct sgx_page_remove -> struct sgx_enclave_remove_pages (Jarkko).
- Use ioctl() instead of ioctl (Dave).

 tools/testing/selftests/sgx/load.c |  41 ++++
 tools/testing/selftests/sgx/main.c | 347 +++++++++++++++++++++++++++++
 tools/testing/selftests/sgx/main.h |   1 +
 3 files changed, 389 insertions(+)

diff --git a/tools/testing/selftests/sgx/load.c b/tools/testing/selftests/sgx/load.c
index 006b464c8fc9..94bdeac1cf04 100644
--- a/tools/testing/selftests/sgx/load.c
+++ b/tools/testing/selftests/sgx/load.c
@@ -130,6 +130,47 @@ static bool encl_ioc_add_pages(struct encl *encl, struct encl_segment *seg)
 	return true;
 }
 
+/*
+ * Parse the enclave code's symbol table to locate and return address of
+ * the provided symbol
+ */
+uint64_t encl_get_entry(struct encl *encl, const char *symbol)
+{
+	Elf64_Sym *symtab = NULL;
+	char *sym_names = NULL;
+	Elf64_Shdr *sections;
+	Elf64_Ehdr *ehdr;
+	int num_sym = 0;
+	int i;
+
+	ehdr = encl->bin;
+	sections = encl->bin + ehdr->e_shoff;
+
+	for (i = 0; i < ehdr->e_shnum; i++) {
+		if (sections[i].sh_type == SHT_SYMTAB) {
+			symtab = (Elf64_Sym *)((char *)encl->bin + sections[i].sh_offset);
+			num_sym = sections[i].sh_size / sections[i].sh_entsize;
+			break;
+		}
+	}
+
+	for (i = 0; i < ehdr->e_shnum; i++) {
+		if (sections[i].sh_type == SHT_STRTAB) {
+			sym_names = (char *)encl->bin + sections[i].sh_offset;
+			break;
+		}
+	}
+
+	for (i = 0; symtab && sym_names && i < num_sym; i++) {
+		Elf64_Sym *sym = &symtab[i];
+
+		if (!strcmp(symbol, sym_names + sym->st_name))
+			return (uint64_t)sym->st_value;
+	}
+
+	return 0;
+}
+
 bool encl_load(const char *path, struct encl *encl, unsigned long heap_size)
 {
 	const char device_path[] = "/dev/sgx_enclave";
diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index 68285603b3f0..53a581bd56c5 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -1125,4 +1125,351 @@ TEST_F(enclave, augment_via_eaccept)
 	munmap(addr, PAGE_SIZE);
 }
 
+/*
+ * SGX2 page type modification test in two phases:
+ * Phase 1:
+ * Create a new TCS, consisting of three new pages (stack page with regular
+ * page type, SSA page with regular page type, and TCS page with TCS page
+ * type) in an initialized enclave and run a simple workload within it.
+ * Phase 2:
+ * Remove the three pages added in phase 1, add a new regular page at the
+ * same address that previously hosted the TCS page and verify that it can
+ * be modified.
+ */
+TEST_F(enclave, tcs_create)
+{
+	struct encl_op_init_tcs_page init_tcs_page_op;
+	struct sgx_enclave_remove_pages remove_ioc;
+	struct encl_op_get_from_addr get_addr_op;
+	struct encl_op_put_to_addr put_addr_op;
+	struct encl_op_get_from_buf get_buf_op;
+	struct encl_op_put_to_buf put_buf_op;
+	void *addr, *tcs, *stack_end, *ssa;
+	struct encl_op_eaccept eaccept_op;
+	struct sgx_enclave_modt modt_ioc;
+	struct sgx_secinfo secinfo;
+	size_t total_size = 0;
+	uint64_t val_64;
+	int errno_save;
+	int ret, i;
+
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl,
+				    _metadata));
+
+	memset(&self->run, 0, sizeof(self->run));
+	self->run.tcs = self->encl.encl_base;
+
+	/*
+	 * Hardware (SGX2) and kernel support is needed for this test. Start
+	 * by checking that the test has a chance of succeeding.
+	 */
+	memset(&modt_ioc, 0, sizeof(modt_ioc));
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &modt_ioc);
+
+	if (ret == -1) {
+		if (errno == ENOTTY)
+			SKIP(return, "Kernel does not support SGX_IOC_ENCLAVE_MODIFY_TYPE ioctl()");
+		else if (errno == ENODEV)
+			SKIP(return, "System does not support SGX2");
+	}
+
+	/*
+	 * Invalid parameters were provided during sanity check,
+	 * expect command to fail.
+	 */
+	EXPECT_EQ(ret, -1);
+
+	/*
+	 * Add three regular pages via EAUG: one will be the TCS stack, one
+	 * will be the TCS SSA, and one will be the new TCS. The stack and
+	 * SSA will remain as regular pages, the TCS page will need its
+	 * type changed after being populated with the needed data.
+	 */
+	for (i = 0; i < self->encl.nr_segments; i++) {
+		struct encl_segment *seg = &self->encl.segment_tbl[i];
+
+		total_size += seg->size;
+	}
+
+	/*
+	 * Actual enclave size is expected to be larger than the loaded
+	 * test enclave since enclave size must be a power of 2 in bytes while
+	 * test_encl does not consume it all.
+	 */
+	EXPECT_LT(total_size + 3 * PAGE_SIZE, self->encl.encl_size);
+
+	/*
+	 * mmap() three pages at end of existing enclave to be used for the
+	 * three new pages.
+	 */
+	addr = mmap((void *)self->encl.encl_base + total_size, 3 * PAGE_SIZE,
+		    PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
+		    self->encl.fd, 0);
+	EXPECT_NE(addr, MAP_FAILED);
+
+	self->run.exception_vector = 0;
+	self->run.exception_error_code = 0;
+	self->run.exception_addr = 0;
+
+	stack_end = (void *)self->encl.encl_base + total_size;
+	tcs = (void *)self->encl.encl_base + total_size + PAGE_SIZE;
+	ssa = (void *)self->encl.encl_base + total_size + 2 * PAGE_SIZE;
+
+	/*
+	 * Run EACCEPT on each new page to trigger the
+	 * EACCEPT->(#PF)->EAUG->EACCEPT(again without a #PF) flow.
+	 */
+
+	eaccept_op.epc_addr = (unsigned long)stack_end;
+	eaccept_op.flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_REG | SGX_SECINFO_PENDING;
+	eaccept_op.ret = 0;
+	eaccept_op.header.type = ENCL_OP_EACCEPT;
+
+	EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+
+	if (self->run.exception_vector == 14 &&
+	    self->run.exception_error_code == 4 &&
+	    self->run.exception_addr == (unsigned long)stack_end) {
+		munmap(addr, 3 * PAGE_SIZE);
+		SKIP(return, "Kernel does not support adding pages to initialized enclave");
+	}
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+	EXPECT_EQ(eaccept_op.ret, 0);
+
+	eaccept_op.epc_addr = (unsigned long)ssa;
+
+	EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+	EXPECT_EQ(eaccept_op.ret, 0);
+
+	eaccept_op.epc_addr = (unsigned long)tcs;
+
+	EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+	EXPECT_EQ(eaccept_op.ret, 0);
+
+	/*
+	 * Three new pages added to enclave. Now populate the TCS page with
+	 * needed data. This must be done from within the enclave, so call
+	 * the enclave operation that performs the actual population of the
+	 * TCS page.
+	 */
+
+	/*
+	 * New TCS will use the "encl_dyn_entry" entrypoint that expects
+	 * stack to begin in page before TCS page.
+	 */
+	val_64 = encl_get_entry(&self->encl, "encl_dyn_entry");
+	EXPECT_NE(val_64, 0);
+
+	init_tcs_page_op.tcs_page = (unsigned long)tcs;
+	init_tcs_page_op.ssa = (unsigned long)total_size + 2 * PAGE_SIZE;
+	init_tcs_page_op.entry = val_64;
+	init_tcs_page_op.header.type = ENCL_OP_INIT_TCS_PAGE;
+
+	EXPECT_EQ(ENCL_CALL(&init_tcs_page_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	/* Change TCS page type to TCS. */
+	memset(&modt_ioc, 0, sizeof(modt_ioc));
+	memset(&secinfo, 0, sizeof(secinfo));
+
+	secinfo.flags = SGX_PAGE_TYPE_TCS << 8;
+	modt_ioc.offset = total_size + PAGE_SIZE;
+	modt_ioc.length = PAGE_SIZE;
+	modt_ioc.secinfo = (unsigned long)&secinfo;
+
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &modt_ioc);
+	errno_save = ret == -1 ? errno : 0;
+
+	EXPECT_EQ(ret, 0);
+	EXPECT_EQ(errno_save, 0);
+	EXPECT_EQ(modt_ioc.result, 0);
+	EXPECT_EQ(modt_ioc.count, 4096);
+
+	/* EACCEPT new TCS page from enclave. */
+	eaccept_op.epc_addr = (unsigned long)tcs;
+	eaccept_op.flags = SGX_SECINFO_TCS | SGX_SECINFO_MODIFIED;
+	eaccept_op.ret = 0;
+	eaccept_op.header.type = ENCL_OP_EACCEPT;
+
+	EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+	EXPECT_EQ(eaccept_op.ret, 0);
+
+	/* Run workload from new TCS. */
+	self->run.tcs = (unsigned long)tcs;
+
+	/*
+	 * Simple workload to write to data buffer and read value back.
+	 */
+	put_buf_op.header.type = ENCL_OP_PUT_TO_BUFFER;
+	put_buf_op.value = MAGIC;
+
+	EXPECT_EQ(ENCL_CALL(&put_buf_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	get_buf_op.header.type = ENCL_OP_GET_FROM_BUFFER;
+	get_buf_op.value = 0;
+
+	EXPECT_EQ(ENCL_CALL(&get_buf_op, &self->run, true), 0);
+
+	EXPECT_EQ(get_buf_op.value, MAGIC);
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	/*
+	 * Phase 2 of test:
+	 * Remove pages associated with new TCS, create a regular page
+	 * where TCS page used to be and verify it can be used as a regular
+	 * page.
+	 */
+
+	/* Start page removal by requesting change of page type to PT_TRIM. */
+	memset(&modt_ioc, 0, sizeof(modt_ioc));
+	memset(&secinfo, 0, sizeof(secinfo));
+
+	secinfo.flags = SGX_PAGE_TYPE_TRIM << 8;
+	modt_ioc.offset = total_size;
+	modt_ioc.length = 3 * PAGE_SIZE;
+	modt_ioc.secinfo = (unsigned long)&secinfo;
+
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &modt_ioc);
+	errno_save = ret == -1 ? errno : 0;
+
+	EXPECT_EQ(ret, 0);
+	EXPECT_EQ(errno_save, 0);
+	EXPECT_EQ(modt_ioc.result, 0);
+	EXPECT_EQ(modt_ioc.count, 3 * PAGE_SIZE);
+
+	/*
+	 * Enter enclave via TCS #1 and approve page removal by sending
+	 * EACCEPT for each of three removed pages.
+	 */
+	self->run.tcs = self->encl.encl_base;
+
+	eaccept_op.epc_addr = (unsigned long)stack_end;
+	eaccept_op.flags = SGX_SECINFO_TRIM | SGX_SECINFO_MODIFIED;
+	eaccept_op.ret = 0;
+	eaccept_op.header.type = ENCL_OP_EACCEPT;
+
+	EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+	EXPECT_EQ(eaccept_op.ret, 0);
+
+	eaccept_op.epc_addr = (unsigned long)tcs;
+	eaccept_op.ret = 0;
+
+	EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+	EXPECT_EQ(eaccept_op.ret, 0);
+
+	eaccept_op.epc_addr = (unsigned long)ssa;
+	eaccept_op.ret = 0;
+
+	EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+	EXPECT_EQ(eaccept_op.ret, 0);
+
+	/* Send final ioctl() to complete page removal. */
+	memset(&remove_ioc, 0, sizeof(remove_ioc));
+
+	remove_ioc.offset = total_size;
+	remove_ioc.length = 3 * PAGE_SIZE;
+
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_REMOVE_PAGES, &remove_ioc);
+	errno_save = ret == -1 ? errno : 0;
+
+	EXPECT_EQ(ret, 0);
+	EXPECT_EQ(errno_save, 0);
+	EXPECT_EQ(remove_ioc.count, 3 * PAGE_SIZE);
+
+	/*
+	 * Enter enclave via TCS #1 and access location where TCS #3 was to
+	 * trigger dynamic add of regular page at that location.
+	 */
+	eaccept_op.epc_addr = (unsigned long)tcs;
+	eaccept_op.flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_REG | SGX_SECINFO_PENDING;
+	eaccept_op.ret = 0;
+	eaccept_op.header.type = ENCL_OP_EACCEPT;
+
+	EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+	EXPECT_EQ(eaccept_op.ret, 0);
+
+	/*
+	 * New page should be accessible from within enclave - write to it.
+	 */
+	put_addr_op.value = MAGIC;
+	put_addr_op.addr = (unsigned long)tcs;
+	put_addr_op.header.type = ENCL_OP_PUT_TO_ADDRESS;
+
+	EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	/*
+	 * Read memory from newly added page that was just written to,
+	 * confirming that data previously written (MAGIC) is present.
+	 */
+	get_addr_op.value = 0;
+	get_addr_op.addr = (unsigned long)tcs;
+	get_addr_op.header.type = ENCL_OP_GET_FROM_ADDRESS;
+
+	EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
+
+	EXPECT_EQ(get_addr_op.value, MAGIC);
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	munmap(addr, 3 * PAGE_SIZE);
+}
+
 TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/sgx/main.h b/tools/testing/selftests/sgx/main.h
index b45c52ec7ab3..fc585be97e2f 100644
--- a/tools/testing/selftests/sgx/main.h
+++ b/tools/testing/selftests/sgx/main.h
@@ -38,6 +38,7 @@ void encl_delete(struct encl *ctx);
 bool encl_load(const char *path, struct encl *encl, unsigned long heap_size);
 bool encl_measure(struct encl *encl);
 bool encl_build(struct encl *encl);
+uint64_t encl_get_entry(struct encl *encl, const char *symbol);
 
 int sgx_enter_enclave(void *rdi, void *rsi, long rdx, u32 function, void *r8, void *r9,
 		      struct sgx_enclave_run *run);
-- 
2.25.1


* [PATCH V2 28/32] selftests/sgx: Test faulty enclave behavior
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (26 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 27/32] selftests/sgx: Test complete changing of page type flow Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 29/32] selftests/sgx: Test invalid access to removed enclave page Reinette Chatre
                   ` (4 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

Removing a page from an initialized enclave involves three steps:
(1) the user requests changing the page type to SGX_PAGE_TYPE_TRIM
via the SGX_IOC_ENCLAVE_MODIFY_TYPE ioctl(), (2) on success the
ENCLU[EACCEPT] instruction is run from within the enclave to accept
the page removal, (3) the user requests that the page removal be
completed via the SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl(). Only after
acceptance (ENCLU[EACCEPT]) from within the enclave can the kernel
remove the page from a running enclave.

Test the behavior when the user's request to change the page type
succeeds, but the ENCLU[EACCEPT] instruction is not run before the
ioctl() requesting page removal is run. This should not be permitted.
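
The three-step flow and the failure tested here can be sketched as a
small user-space state model. This is only an illustration: the state
names and model_*() helpers are hypothetical and are not kernel or
selftest identifiers, and the -EINVAL returns are a choice of the
model. Only the -EPERM result for removal without EACCEPT mirrors
what the test below expects.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical page states; these names are not kernel identifiers. */
enum page_state {
	PAGE_REGULAR,        /* normal enclave page */
	PAGE_TRIM_REQUESTED, /* type changed to TRIM, EACCEPT still pending */
	PAGE_TRIM_ACCEPTED,  /* enclave ran EACCEPT on the trimmed page */
	PAGE_REMOVED,        /* page has been removed from the enclave */
};

/* Step 1: models the ioctl() that changes the page type to TRIM. */
static int model_modify_type_trim(enum page_state *state)
{
	if (*state != PAGE_REGULAR)
		return -EINVAL;
	*state = PAGE_TRIM_REQUESTED;
	return 0;
}

/* Step 2: models ENCLU[EACCEPT] run from within the enclave. */
static int model_eaccept(enum page_state *state)
{
	if (*state != PAGE_TRIM_REQUESTED)
		return -EINVAL;
	*state = PAGE_TRIM_ACCEPTED;
	return 0;
}

/* Step 3: models the removal ioctl(); refused unless EACCEPT was run. */
static int model_remove_page(enum page_state *state)
{
	if (*state != PAGE_TRIM_ACCEPTED)
		return -EPERM;
	*state = PAGE_REMOVED;
	return 0;
}

/* Runs the flow, optionally skipping EACCEPT; returns step 3's result. */
static int model_removal_flow(int run_eaccept)
{
	enum page_state state = PAGE_REGULAR;

	if (model_modify_type_trim(&state))
		return -EINVAL;
	if (run_eaccept && model_eaccept(&state))
		return -EINVAL;
	return model_remove_page(&state);
}
```

Calling model_removal_flow(0) corresponds to the scenario in this
patch: the type change succeeds, EACCEPT is skipped, and the final
removal request is refused.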

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Update to support ioctl() name change (SGX_IOC_PAGE_MODT ->
  SGX_IOC_ENCLAVE_MODIFY_TYPE) and provide secinfo as parameter instead
  of just page type (Jarkko).
- Update test to reflect page removal ioctl() and struct name change:
  SGX_IOC_PAGE_REMOVE->SGX_IOC_ENCLAVE_REMOVE_PAGES,
  struct sgx_page_remove -> struct sgx_enclave_remove_pages (Jarkko).
- Use ioctl() instead of ioctl in text (Dave).

 tools/testing/selftests/sgx/main.c | 116 +++++++++++++++++++++++++++++
 1 file changed, 116 insertions(+)

diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index 53a581bd56c5..e9513ced1853 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -1472,4 +1472,120 @@ TEST_F(enclave, tcs_create)
 	munmap(addr, 3 * PAGE_SIZE);
 }
 
+/*
+ * Ensure sane behavior if user requests page removal, does not run
+ * EACCEPT from within enclave but still attempts to finalize page removal
+ * with the SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl(). The latter should fail
+ * because the removal was not EACCEPTed from within the enclave.
+ */
+TEST_F(enclave, remove_added_page_no_eaccept)
+{
+	struct sgx_enclave_remove_pages remove_ioc;
+	struct encl_op_get_from_addr get_addr_op;
+	struct encl_op_put_to_addr put_addr_op;
+	struct sgx_enclave_modt modt_ioc;
+	struct sgx_secinfo secinfo;
+	unsigned long data_start;
+	int ret, errno_save;
+
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+
+	memset(&self->run, 0, sizeof(self->run));
+	self->run.tcs = self->encl.encl_base;
+
+	/*
+	 * Hardware (SGX2) and kernel support is needed for this test. Start
+	 * with check that test has a chance of succeeding.
+	 */
+	memset(&modt_ioc, 0, sizeof(modt_ioc));
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &modt_ioc);
+
+	if (ret == -1) {
+		if (errno == ENOTTY)
+			SKIP(return, "Kernel does not support SGX_IOC_ENCLAVE_MODIFY_TYPE ioctl()");
+		else if (errno == ENODEV)
+			SKIP(return, "System does not support SGX2");
+	}
+
+	/*
+	 * Invalid parameters were provided during sanity check,
+	 * expect command to fail.
+	 */
+	EXPECT_EQ(ret, -1);
+
+	/*
+	 * Page that will be removed is the second data page in the .data
+	 * segment. This forms part of the local encl_buffer within the
+	 * enclave.
+	 */
+	data_start = self->encl.encl_base +
+		     encl_get_data_offset(&self->encl) + PAGE_SIZE;
+
+	/*
+	 * Sanity check that page at @data_start is writable before
+	 * removing it.
+	 *
+	 * Start by writing MAGIC to test page.
+	 */
+	put_addr_op.value = MAGIC;
+	put_addr_op.addr = data_start;
+	put_addr_op.header.type = ENCL_OP_PUT_TO_ADDRESS;
+
+	EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	/*
+	 * Read memory that was just written to, confirming that data
+	 * previously written (MAGIC) is present.
+	 */
+	get_addr_op.value = 0;
+	get_addr_op.addr = data_start;
+	get_addr_op.header.type = ENCL_OP_GET_FROM_ADDRESS;
+
+	EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
+
+	EXPECT_EQ(get_addr_op.value, MAGIC);
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	/* Start page removal by requesting change of page type to PT_TRIM */
+	memset(&modt_ioc, 0, sizeof(modt_ioc));
+	memset(&secinfo, 0, sizeof(secinfo));
+
+	secinfo.flags = SGX_PAGE_TYPE_TRIM << 8;
+	modt_ioc.offset = encl_get_data_offset(&self->encl) + PAGE_SIZE;
+	modt_ioc.length = PAGE_SIZE;
+	modt_ioc.secinfo = (unsigned long)&secinfo;
+
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &modt_ioc);
+	errno_save = ret == -1 ? errno : 0;
+
+	EXPECT_EQ(ret, 0);
+	EXPECT_EQ(errno_save, 0);
+	EXPECT_EQ(modt_ioc.result, 0);
+	EXPECT_EQ(modt_ioc.count, 4096);
+
+	/* Skip EACCEPT */
+
+	/* Send final ioctl() to complete page removal */
+	memset(&remove_ioc, 0, sizeof(remove_ioc));
+
+	remove_ioc.offset = encl_get_data_offset(&self->encl) + PAGE_SIZE;
+	remove_ioc.length = PAGE_SIZE;
+
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_REMOVE_PAGES, &remove_ioc);
+	errno_save = ret == -1 ? errno : 0;
+
+	/* Operation not permitted since EACCEPT was omitted. */
+	EXPECT_EQ(ret, -1);
+	EXPECT_EQ(errno_save, EPERM);
+	EXPECT_EQ(remove_ioc.count, 0);
+}
+
 TEST_HARNESS_MAIN
-- 
2.25.1


* [PATCH V2 29/32] selftests/sgx: Test invalid access to removed enclave page
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (27 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 28/32] selftests/sgx: Test faulty enclave behavior Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 30/32] selftests/sgx: Test reclaiming of untouched page Reinette Chatre
                   ` (3 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

Removing a page from an initialized enclave involves three steps:
(1) the user requests changing the page type to SGX_PAGE_TYPE_TRIM
via the SGX_IOC_ENCLAVE_MODIFY_TYPE ioctl(), (2) on success the
ENCLU[EACCEPT] instruction is run from within the enclave to accept
the page removal, (3) the user initiates the actual removal of the
page via the SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl().

Test two possible invalid accesses during the page removal flow:
* Test the behavior when a request to remove the page by changing its
  type to SGX_PAGE_TYPE_TRIM completes successfully but instead of
  executing ENCLU[EACCEPT] from within the enclave the enclave attempts
  to read from the page. Even though the page is accessible from the
  page table entries its type is SGX_PAGE_TYPE_TRIM and thus not
  accessible according to SGX. The expected behavior is a page fault
  with the SGX flag set in the error code.
* Test the behavior when the page type is changed successfully and
  ENCLU[EACCEPT] was run from within the enclave. The final ioctl(),
  SGX_IOC_ENCLAVE_REMOVE_PAGES, is omitted and replaced with an
  attempt to access the page. Even though the page is accessible
  from the page table entries its type is SGX_PAGE_TYPE_TRIM and
  thus not accessible according to SGX. The expected behavior is
  a page fault with the SGX flag set in the error code.
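
Both tests check for exception vector 14 (#PF) with error code
0x8005. A minimal sketch of how that value decomposes is shown below.
The bit positions follow the architectural x86 page-fault error code
layout; the PF_* macro names and the decode helper are hypothetical
and chosen for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits (names are local to this sketch). */
#define PF_PRESENT (1u << 0)  /* fault on a present page */
#define PF_WRITE   (1u << 1)  /* faulting access was a write */
#define PF_USER    (1u << 2)  /* faulting access came from user mode */
#define PF_SGX     (1u << 15) /* fault caused by SGX access control */

struct pf_info {
	bool present;
	bool write;
	bool user;
	bool sgx;
};

/* Split a page-fault error code into the flags of interest here. */
static struct pf_info decode_pf_error_code(uint32_t code)
{
	struct pf_info info = {
		.present = (code & PF_PRESENT) != 0,
		.write   = (code & PF_WRITE) != 0,
		.user    = (code & PF_USER) != 0,
		.sgx     = (code & PF_SGX) != 0,
	};

	return info;
}
```

Decoding 0x8005 this way yields a user-mode read of a present page
with the SGX bit set, which matches the description above: the page
table entry is valid but SGX itself denies the access.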

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Update to support ioctl() name change (SGX_IOC_PAGE_MODT ->
  SGX_IOC_ENCLAVE_MODIFY_TYPE) and provide secinfo as parameter instead
  of just page type (Jarkko).
- Use ioctl() instead of ioctl (Dave).

 tools/testing/selftests/sgx/main.c | 247 +++++++++++++++++++++++++++++
 1 file changed, 247 insertions(+)

diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index e9513ced1853..239d3c9df169 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -1588,4 +1588,251 @@ TEST_F(enclave, remove_added_page_no_eaccept)
 	EXPECT_EQ(remove_ioc.count, 0);
 }
 
+/*
+ * Request enclave page removal but, instead of correctly following with
+ * EACCEPT, attempt to read from the page from within the enclave.
+ */
+TEST_F(enclave, remove_added_page_invalid_access)
+{
+	struct encl_op_get_from_addr get_addr_op;
+	struct encl_op_put_to_addr put_addr_op;
+	struct sgx_enclave_modt ioc;
+	struct sgx_secinfo secinfo;
+	unsigned long data_start;
+	int ret, errno_save;
+
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+
+	memset(&self->run, 0, sizeof(self->run));
+	self->run.tcs = self->encl.encl_base;
+
+	/*
+	 * Hardware (SGX2) and kernel support is needed for this test. Start
+	 * with check that test has a chance of succeeding.
+	 */
+	memset(&ioc, 0, sizeof(ioc));
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &ioc);
+
+	if (ret == -1) {
+		if (errno == ENOTTY)
+			SKIP(return, "Kernel does not support SGX_IOC_ENCLAVE_MODIFY_TYPE ioctl()");
+		else if (errno == ENODEV)
+			SKIP(return, "System does not support SGX2");
+	}
+
+	/*
+	 * Invalid parameters were provided during sanity check,
+	 * expect command to fail.
+	 */
+	EXPECT_EQ(ret, -1);
+
+	/*
+	 * Page that will be removed is the second data page in the .data
+	 * segment. This forms part of the local encl_buffer within the
+	 * enclave.
+	 */
+	data_start = self->encl.encl_base +
+		     encl_get_data_offset(&self->encl) + PAGE_SIZE;
+
+	/*
+	 * Sanity check that page at @data_start is writable before
+	 * removing it.
+	 *
+	 * Start by writing MAGIC to test page.
+	 */
+	put_addr_op.value = MAGIC;
+	put_addr_op.addr = data_start;
+	put_addr_op.header.type = ENCL_OP_PUT_TO_ADDRESS;
+
+	EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	/*
+	 * Read memory that was just written to, confirming that data
+	 * previously written (MAGIC) is present.
+	 */
+	get_addr_op.value = 0;
+	get_addr_op.addr = data_start;
+	get_addr_op.header.type = ENCL_OP_GET_FROM_ADDRESS;
+
+	EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
+
+	EXPECT_EQ(get_addr_op.value, MAGIC);
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	/* Start page removal by requesting change of page type to PT_TRIM. */
+	memset(&ioc, 0, sizeof(ioc));
+	memset(&secinfo, 0, sizeof(secinfo));
+
+	secinfo.flags = SGX_PAGE_TYPE_TRIM << 8;
+	ioc.offset = encl_get_data_offset(&self->encl) + PAGE_SIZE;
+	ioc.length = PAGE_SIZE;
+	ioc.secinfo = (unsigned long)&secinfo;
+
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &ioc);
+	errno_save = ret == -1 ? errno : 0;
+
+	EXPECT_EQ(ret, 0);
+	EXPECT_EQ(errno_save, 0);
+	EXPECT_EQ(ioc.result, 0);
+	EXPECT_EQ(ioc.count, 4096);
+
+	/*
+	 * Read from page whose removal was just requested.
+	 */
+	get_addr_op.value = 0;
+
+	EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
+
+	/*
+	 * From kernel perspective the page is present but according to SGX the
+	 * page should not be accessible so a #PF with SGX bit set is
+	 * expected.
+	 */
+
+	EXPECT_EQ(self->run.function, ERESUME);
+	EXPECT_EQ(self->run.exception_vector, 14);
+	EXPECT_EQ(self->run.exception_error_code, 0x8005);
+	EXPECT_EQ(self->run.exception_addr, data_start);
+}
+
+/*
+ * Request enclave page removal and correctly follow with EACCEPT, but
+ * instead of following with the removal ioctl(), attempt to read from
+ * the page being removed from within the enclave.
+ */
+TEST_F(enclave, remove_added_page_invalid_access_after_eaccept)
+{
+	struct encl_op_get_from_addr get_addr_op;
+	struct encl_op_put_to_addr put_addr_op;
+	struct encl_op_eaccept eaccept_op;
+	struct sgx_enclave_modt ioc;
+	struct sgx_secinfo secinfo;
+	unsigned long data_start;
+	int ret, errno_save;
+
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+
+	memset(&self->run, 0, sizeof(self->run));
+	self->run.tcs = self->encl.encl_base;
+
+	/*
+	 * Hardware (SGX2) and kernel support is needed for this test. Start
+	 * with check that test has a chance of succeeding.
+	 */
+	memset(&ioc, 0, sizeof(ioc));
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &ioc);
+
+	if (ret == -1) {
+		if (errno == ENOTTY)
+			SKIP(return, "Kernel does not support SGX_IOC_ENCLAVE_MODIFY_TYPE ioctl()");
+		else if (errno == ENODEV)
+			SKIP(return, "System does not support SGX2");
+	}
+
+	/*
+	 * Invalid parameters were provided during sanity check,
+	 * expect command to fail.
+	 */
+	EXPECT_EQ(ret, -1);
+
+	/*
+	 * Page that will be removed is the second data page in the .data
+	 * segment. This forms part of the local encl_buffer within the
+	 * enclave.
+	 */
+	data_start = self->encl.encl_base +
+		     encl_get_data_offset(&self->encl) + PAGE_SIZE;
+
+	/*
+	 * Sanity check that page at @data_start is writable before
+	 * removing it.
+	 *
+	 * Start by writing MAGIC to test page.
+	 */
+	put_addr_op.value = MAGIC;
+	put_addr_op.addr = data_start;
+	put_addr_op.header.type = ENCL_OP_PUT_TO_ADDRESS;
+
+	EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	/*
+	 * Read memory that was just written to, confirming that data
+	 * previously written (MAGIC) is present.
+	 */
+	get_addr_op.value = 0;
+	get_addr_op.addr = data_start;
+	get_addr_op.header.type = ENCL_OP_GET_FROM_ADDRESS;
+
+	EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
+
+	EXPECT_EQ(get_addr_op.value, MAGIC);
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	/* Start page removal by requesting change of page type to PT_TRIM. */
+	memset(&ioc, 0, sizeof(ioc));
+	memset(&secinfo, 0, sizeof(secinfo));
+
+	secinfo.flags = SGX_PAGE_TYPE_TRIM << 8;
+	ioc.offset = encl_get_data_offset(&self->encl) + PAGE_SIZE;
+	ioc.length = PAGE_SIZE;
+	ioc.secinfo = (unsigned long)&secinfo;
+
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &ioc);
+	errno_save = ret == -1 ? errno : 0;
+
+	EXPECT_EQ(ret, 0);
+	EXPECT_EQ(errno_save, 0);
+	EXPECT_EQ(ioc.result, 0);
+	EXPECT_EQ(ioc.count, 4096);
+
+	eaccept_op.epc_addr = (unsigned long)data_start;
+	eaccept_op.ret = 0;
+	eaccept_op.flags = SGX_SECINFO_TRIM | SGX_SECINFO_MODIFIED;
+	eaccept_op.header.type = ENCL_OP_EACCEPT;
+
+	EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+	EXPECT_EQ(eaccept_op.ret, 0);
+
+	/* Skip ioctl() to remove page. */
+
+	/*
+	 * Read from page whose removal was accepted but not completed.
+	 */
+	get_addr_op.value = 0;
+
+	EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
+
+	/*
+	 * From kernel perspective the page is present but according to SGX the
+	 * page should not be accessible so a #PF with SGX bit set is
+	 * expected.
+	 */
+
+	EXPECT_EQ(self->run.function, ERESUME);
+	EXPECT_EQ(self->run.exception_vector, 14);
+	EXPECT_EQ(self->run.exception_error_code, 0x8005);
+	EXPECT_EQ(self->run.exception_addr, data_start);
+}
+
 TEST_HARNESS_MAIN
-- 
2.25.1


* [PATCH V2 30/32] selftests/sgx: Test reclaiming of untouched page
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (28 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 29/32] selftests/sgx: Test invalid access to removed enclave page Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 31/32] x86/sgx: Free up EPC pages directly to support large page ranges Reinette Chatre
                   ` (2 subsequent siblings)
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

Removing a page from an initialized enclave involves three steps:
(1) the user requests changing the page type to PT_TRIM via the
    SGX_IOC_ENCLAVE_MODIFY_TYPE ioctl()
(2) on success the ENCLU[EACCEPT] instruction is run from within
    the enclave to accept the page removal
(3) the user initiates the actual removal of the page via the
    SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl().

Remove a page that has never been accessed. This means that when the
first ioctl() requesting page removal arrives, there will be no page
table entry, yet a valid page table entry needs to exist for the
ENCLU[EACCEPT] instruction to succeed. This test verifies that a page
table entry can still be installed for a page that is in the process
of being removed.
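
Steps (1) and (2) both describe the page through SECINFO flags: the
test sets secinfo.flags to SGX_PAGE_TYPE_TRIM << 8 for the ioctl()
and passes SGX_SECINFO_TRIM | SGX_SECINFO_MODIFIED to EACCEPT. The
sketch below shows how such a value is composed. The numeric values
are assumptions taken from the SECINFO.FLAGS layout in the Intel SDM
(page type in bits 15:8, MODIFIED in bit 4, PT_TRIM equal to 4), and
the macro and function names are local to this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SECINFO.FLAGS layout; names are local to this sketch. */
#define MODEL_PAGE_TYPE_TRIM    4ull        /* assumed value of PT_TRIM */
#define MODEL_PAGE_TYPE_SHIFT   8           /* page type lives in bits 15:8 */
#define MODEL_SECINFO_MODIFIED  (1ull << 4) /* assumed MODIFIED bit */

/* Place a page type into the page-type field of the flags. */
static uint64_t model_secinfo_page_type(uint64_t type)
{
	return type << MODEL_PAGE_TYPE_SHIFT;
}

/* Flags the enclave would pass to EACCEPT for a trimmed page. */
static uint64_t model_eaccept_trim_flags(void)
{
	return model_secinfo_page_type(MODEL_PAGE_TYPE_TRIM) |
	       MODEL_SECINFO_MODIFIED;
}
```

Under these assumptions the type-change request carries 0x400 and
the EACCEPT flags come out as 0x410.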

Suggested-by: Haitao Huang <haitao.huang@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Update to support ioctl() name change (SGX_IOC_PAGE_MODT ->
  SGX_IOC_ENCLAVE_MODIFY_TYPE) and provide secinfo as parameter instead
  of just page type (Jarkko).
- Update test to reflect page removal ioctl() and struct name change:
  SGX_IOC_PAGE_REMOVE->SGX_IOC_ENCLAVE_REMOVE_PAGES,
  struct sgx_page_remove -> struct sgx_enclave_remove_pages (Jarkko).
- Ensure test is skipped when SGX2 not supported by kernel.

 tools/testing/selftests/sgx/main.c | 82 ++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index 239d3c9df169..4fe5a0324c97 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -1835,4 +1835,86 @@ TEST_F(enclave, remove_added_page_invalid_access_after_eaccept)
 	EXPECT_EQ(self->run.exception_addr, data_start);
 }
 
+TEST_F(enclave, remove_untouched_page)
+{
+	struct sgx_enclave_remove_pages remove_ioc;
+	struct encl_op_eaccept eaccept_op;
+	struct sgx_enclave_modt modt_ioc;
+	struct sgx_secinfo secinfo;
+	unsigned long data_start;
+	int ret, errno_save;
+
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+
+	/*
+	 * Hardware (SGX2) and kernel support is needed for this test. Start
+	 * with check that test has a chance of succeeding.
+	 */
+	memset(&modt_ioc, 0, sizeof(modt_ioc));
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &modt_ioc);
+
+	if (ret == -1) {
+		if (errno == ENOTTY)
+			SKIP(return, "Kernel does not support SGX_IOC_ENCLAVE_MODIFY_TYPE ioctl()");
+		else if (errno == ENODEV)
+			SKIP(return, "System does not support SGX2");
+	}
+
+	/*
+	 * Invalid parameters were provided during sanity check,
+	 * expect command to fail.
+	 */
+	EXPECT_EQ(ret, -1);
+
+	/* SGX2 is supported by kernel and hardware, test can proceed. */
+	memset(&self->run, 0, sizeof(self->run));
+	self->run.tcs = self->encl.encl_base;
+
+	data_start = self->encl.encl_base +
+			 encl_get_data_offset(&self->encl) + PAGE_SIZE;
+
+	memset(&modt_ioc, 0, sizeof(modt_ioc));
+	memset(&secinfo, 0, sizeof(secinfo));
+
+	secinfo.flags = SGX_PAGE_TYPE_TRIM << 8;
+	modt_ioc.offset = encl_get_data_offset(&self->encl) + PAGE_SIZE;
+	modt_ioc.length = PAGE_SIZE;
+	modt_ioc.secinfo = (unsigned long)&secinfo;
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &modt_ioc);
+	errno_save = ret == -1 ? errno : 0;
+
+	EXPECT_EQ(ret, 0);
+	EXPECT_EQ(errno_save, 0);
+	EXPECT_EQ(modt_ioc.result, 0);
+	EXPECT_EQ(modt_ioc.count, 4096);
+
+	/*
+	 * Enter enclave via TCS #1 and approve page removal by sending
+	 * EACCEPT for removed page.
+	 */
+
+	eaccept_op.epc_addr = data_start;
+	eaccept_op.flags = SGX_SECINFO_TRIM | SGX_SECINFO_MODIFIED;
+	eaccept_op.ret = 0;
+	eaccept_op.header.type = ENCL_OP_EACCEPT;
+
+	EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+	EXPECT_EQ(eaccept_op.ret, 0);
+
+	memset(&remove_ioc, 0, sizeof(remove_ioc));
+
+	remove_ioc.offset = encl_get_data_offset(&self->encl) + PAGE_SIZE;
+	remove_ioc.length = PAGE_SIZE;
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_REMOVE_PAGES, &remove_ioc);
+	errno_save = ret == -1 ? errno : 0;
+
+	EXPECT_EQ(ret, 0);
+	EXPECT_EQ(errno_save, 0);
+	EXPECT_EQ(remove_ioc.count, 4096);
+}
+
 TEST_HARNESS_MAIN
-- 
2.25.1


* [PATCH V2 31/32] x86/sgx: Free up EPC pages directly to support large page ranges
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (29 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 30/32] selftests/sgx: Test reclaiming of untouched page Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-08  0:45 ` [PATCH V2 32/32] selftests/sgx: Page removal stress test Reinette Chatre
  2022-02-22 20:27 ` [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Nathaniel McCallum
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

The page reclaimer ensures availability of EPC pages across all
enclaves. In support of this it runs independently from the
individual enclaves in order to take locks from the different
enclaves as it writes pages to swap.

When a page needs to be loaded from swap, an EPC page must be
available for its contents to be loaded into. Loading an existing
enclave page from swap does not reclaim EPC pages directly if
none are available. Instead, the reclaimer is woken when the
number of available EPC pages is found to be below a watermark.

When iterating over a large number of pages in an oversubscribed
environment, the page operations race with the reclaimer: once
woken, it may not reclaim EPC pages fast enough for the page
operations to proceed.

Ensure there are EPC pages available before attempting to load
a page that may potentially be pulled from swap into an available
EPC page.
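
The effect of reclaiming before each page load can be sketched with
a toy user-space model. Everything here is hypothetical: the struct,
the model_*() helpers, and the watermark and batch values are
stand-ins chosen for the example, and the asynchronous reclaimer is
deliberately left out to show the worst case where it never catches
up.

```c
#include <assert.h>

#define MODEL_LOW_WATERMARK 32 /* hypothetical low-page watermark */
#define MODEL_RECLAIM_BATCH 16 /* hypothetical pages freed per pass */

struct epc_model {
	long free_pages;   /* EPC pages available for allocation */
	long active_pages; /* EPC pages in use that could be reclaimed */
};

/* Reclaim is worthwhile when free pages are low and some are in use. */
static int model_should_reclaim(const struct epc_model *epc, long watermark)
{
	return epc->free_pages < watermark && epc->active_pages > 0;
}

/* Move up to one batch of pages from the active pool to the free pool. */
static void model_reclaim_pages(struct epc_model *epc)
{
	long n = epc->active_pages < MODEL_RECLAIM_BATCH ?
		 epc->active_pages : MODEL_RECLAIM_BATCH;

	epc->active_pages -= n;
	epc->free_pages += n;
}

/* Synchronous reclaim performed by the caller before loading a page. */
static void model_direct_reclaim(struct epc_model *epc)
{
	if (model_should_reclaim(epc, MODEL_LOW_WATERMARK))
		model_reclaim_pages(epc);
}

/*
 * Load nr_pages pages, each consuming one free EPC page. Returns the
 * number of loads that found no free EPC page.
 */
static long model_failed_loads(long free_pages, long active_pages,
			       long nr_pages, int direct)
{
	struct epc_model epc = { free_pages, active_pages };
	long failed = 0;

	for (long i = 0; i < nr_pages; i++) {
		if (direct)
			model_direct_reclaim(&epc);
		if (epc.free_pages == 0) {
			failed++;
			continue;
		}
		epc.free_pages--;
		epc.active_pages++;
	}

	return failed;
}
```

In this model, starting with 4 free and 1000 active pages, 100 loads
all succeed with direct reclaim, while 96 of them find no free page
without it.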

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Reword commit message.

 arch/x86/kernel/cpu/sgx/ioctl.c | 6 ++++++
 arch/x86/kernel/cpu/sgx/main.c  | 6 ++++++
 arch/x86/kernel/cpu/sgx/sgx.h   | 1 +
 3 files changed, 13 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 0ffb07095a80..d8c3c07badb3 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -962,6 +962,8 @@ static long sgx_enclave_restrict_perm(struct sgx_encl *encl,
 	for (c = 0 ; c < modp->length; c += PAGE_SIZE) {
 		addr = encl->base + modp->offset + c;
 
+		sgx_direct_reclaim();
+
 		mutex_lock(&encl->lock);
 
 		entry = sgx_encl_load_page(encl, addr);
@@ -1156,6 +1158,8 @@ static long sgx_enclave_modt(struct sgx_encl *encl,
 	for (c = 0 ; c < modt->length; c += PAGE_SIZE) {
 		addr = encl->base + modt->offset + c;
 
+		sgx_direct_reclaim();
+
 		mutex_lock(&encl->lock);
 
 		entry = sgx_encl_load_page(encl, addr);
@@ -1354,6 +1358,8 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
 	for (c = 0 ; c < params->length; c += PAGE_SIZE) {
 		addr = encl->base + params->offset + c;
 
+		sgx_direct_reclaim();
+
 		mutex_lock(&encl->lock);
 
 		entry = sgx_encl_load_page(encl, addr);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 6e2cb7564080..545da16bb3ea 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -370,6 +370,12 @@ static bool sgx_should_reclaim(unsigned long watermark)
 	       !list_empty(&sgx_active_page_list);
 }
 
+void sgx_direct_reclaim(void)
+{
+	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
+		sgx_reclaim_pages();
+}
+
 static int ksgxd(void *p)
 {
 	set_freezable();
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index b30cee4de903..85cbf103b0dd 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -86,6 +86,7 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
 struct sgx_epc_page *__sgx_alloc_epc_page(void);
 void sgx_free_epc_page(struct sgx_epc_page *page);
 
+void sgx_direct_reclaim(void);
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
-- 
2.25.1


* [PATCH V2 32/32] selftests/sgx: Page removal stress test
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (30 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 31/32] x86/sgx: Free up EPC pages directly to support large page ranges Reinette Chatre
@ 2022-02-08  0:45 ` Reinette Chatre
  2022-02-22 20:27 ` [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Nathaniel McCallum
  32 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-08  0:45 UTC (permalink / raw)
  To: dave.hansen, jarkko, tglx, bp, luto, mingo, linux-sgx, x86
  Cc: seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel

Create an enclave with additional heap that consumes all physical SGX
memory and then remove the heap.

Depending on the available SGX memory this test could take a
significant time to run (several minutes) as it (1) creates the
enclave, (2) changes the type of every page to be trimmed,
(3) enters the enclave once per page to run EACCEPT, before
(4) the pages are finally removed.
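
To give a feel for why step (3) dominates the run time, the number of
enclave entries can be estimated from the heap size, since the test
enters the enclave once per 4096-byte page. The helper below is a
hypothetical sketch, and the 128 MiB figure in the note after it is
an illustrative EPC size, not a value taken from this series.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ull /* page size used by the test loop */

/* Number of EACCEPT enclave entries needed to cover heap_size bytes. */
static uint64_t model_eaccept_entries(uint64_t heap_size)
{
	/* Round up so that a partial trailing page still counts. */
	return (heap_size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}
```

For a 128 MiB heap this gives 32768 separate enclave entries, each
followed by the exception and return-value checks in the loop.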

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Exit test completely on first failure of EACCEPT of a removed page. Since
  this is an oversubscribed test the number of pages on which this is
  attempted can be significant and in case of failure the per-page
  error logging would overwhelm the system.
- Update test to call renamed ioctl() (SGX_IOC_PAGE_MODT ->
  SGX_IOC_ENCLAVE_MODIFY_TYPE) and provide secinfo as parameter (Jarkko).
- Fixup definitions to be reverse xmas tree.
- Update test to reflect page removal ioctl() and struct name change:
  SGX_IOC_PAGE_REMOVE->SGX_IOC_ENCLAVE_REMOVE_PAGES,
  struct sgx_page_remove -> struct sgx_enclave_remove_pages (Jarkko).
- Ensure test is skipped when SGX2 not supported by kernel.
- Cleanup comments.

 tools/testing/selftests/sgx/main.c | 122 +++++++++++++++++++++++++++++
 1 file changed, 122 insertions(+)

diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index 4fe5a0324c97..22abda2696e2 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -378,7 +378,129 @@ TEST_F(enclave, unclobbered_vdso_oversubscribed)
 	EXPECT_EQ(get_op.value, MAGIC);
 	EXPECT_EEXIT(&self->run);
 	EXPECT_EQ(self->run.user_data, 0);
+}
+
+TEST_F_TIMEOUT(enclave, unclobbered_vdso_oversubscribed_remove, 900)
+{
+	struct sgx_enclave_remove_pages remove_ioc;
+	struct encl_op_get_from_buf get_op;
+	struct encl_op_eaccept eaccept_op;
+	struct encl_op_put_to_buf put_op;
+	struct sgx_enclave_modt modt_ioc;
+	struct sgx_secinfo secinfo;
+	struct encl_segment *heap;
+	unsigned long total_mem;
+	int ret, errno_save;
+	unsigned long addr;
+	unsigned long i;
+
+	/*
+	 * Create enclave with additional heap that is as big as all
+	 * available physical SGX memory.
+	 */
+	total_mem = get_total_epc_mem();
+	ASSERT_NE(total_mem, 0);
+	TH_LOG("Creating an enclave with %lu bytes heap may take a while ...",
+	       total_mem);
+	ASSERT_TRUE(setup_test_encl(total_mem, &self->encl, _metadata));
+
+	/*
+	 * Hardware (SGX2) and kernel support is needed for this test. Start
+	 * with check that test has a chance of succeeding.
+	 */
+	memset(&modt_ioc, 0, sizeof(modt_ioc));
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &modt_ioc);
+
+	if (ret == -1) {
+		if (errno == ENOTTY)
+			SKIP(return, "Kernel does not support SGX_IOC_ENCLAVE_MODIFY_TYPE ioctl()");
+		else if (errno == ENODEV)
+			SKIP(return, "System does not support SGX2");
+	}
+
+	/*
+	 * Invalid parameters were provided during sanity check,
+	 * expect command to fail.
+	 */
+	EXPECT_EQ(ret, -1);
+
+	/* SGX2 is supported by kernel and hardware, test can proceed. */
+	memset(&self->run, 0, sizeof(self->run));
+	self->run.tcs = self->encl.encl_base;
+
+	heap = &self->encl.segment_tbl[self->encl.nr_segments - 1];
+
+	put_op.header.type = ENCL_OP_PUT_TO_BUFFER;
+	put_op.value = MAGIC;
+
+	EXPECT_EQ(ENCL_CALL(&put_op, &self->run, false), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.user_data, 0);
+
+	get_op.header.type = ENCL_OP_GET_FROM_BUFFER;
+	get_op.value = 0;
+
+	EXPECT_EQ(ENCL_CALL(&get_op, &self->run, false), 0);
+
+	EXPECT_EQ(get_op.value, MAGIC);
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.user_data, 0);
+
+	/* Trim entire heap. */
+	memset(&modt_ioc, 0, sizeof(modt_ioc));
+	memset(&secinfo, 0, sizeof(secinfo));
+
+	secinfo.flags = SGX_PAGE_TYPE_TRIM << 8;
+	modt_ioc.offset = heap->offset;
+	modt_ioc.length = heap->size;
+	modt_ioc.secinfo = (unsigned long)&secinfo;
+
+	TH_LOG("Changing type of %zu bytes to trimmed may take a while ...",
+	       heap->size);
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &modt_ioc);
+	errno_save = ret == -1 ? errno : 0;
+
+	EXPECT_EQ(ret, 0);
+	EXPECT_EQ(errno_save, 0);
+	EXPECT_EQ(modt_ioc.result, 0);
+	EXPECT_EQ(modt_ioc.count, heap->size);
+
+	/* EACCEPT all removed pages. */
+	addr = self->encl.encl_base + heap->offset;
+
+	eaccept_op.flags = SGX_SECINFO_TRIM | SGX_SECINFO_MODIFIED;
+	eaccept_op.header.type = ENCL_OP_EACCEPT;
+
+	TH_LOG("Entering enclave to run EACCEPT for each page of %zu bytes may take a while ...",
+	       heap->size);
+	for (i = 0; i < heap->size; i += 4096) {
+		eaccept_op.epc_addr = addr + i;
+		eaccept_op.ret = 0;
 
+		EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+
+		EXPECT_EQ(self->run.exception_vector, 0);
+		EXPECT_EQ(self->run.exception_error_code, 0);
+		EXPECT_EQ(self->run.exception_addr, 0);
+		ASSERT_EQ(eaccept_op.ret, 0);
+		ASSERT_EQ(self->run.function, EEXIT);
+	}
+
+	/* Complete page removal. */
+	memset(&remove_ioc, 0, sizeof(remove_ioc));
+
+	remove_ioc.offset = heap->offset;
+	remove_ioc.length = heap->size;
+
+	TH_LOG("Removing %zu bytes from enclave may take a while ...",
+	       heap->size);
+	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_REMOVE_PAGES, &remove_ioc);
+	errno_save = ret == -1 ? errno : 0;
+
+	EXPECT_EQ(ret, 0);
+	EXPECT_EQ(errno_save, 0);
+	EXPECT_EQ(remove_ioc.count, heap->size);
 }
 
 TEST_F(enclave, clobbered_vdso)
-- 
2.25.1
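
For readers following the selftest above, the three-step removal flow it
exercises (change the page type to TRIM, run EACCEPT on every page from
inside the enclave, then remove the pages) can be sketched as a small
stand-alone model. This is only an illustration under stated assumptions:
it does not call the SGX ioctl()s, and every name in it (model_encl,
model_index(), model_trim(), model_eaccept(), model_remove()) is
hypothetical. The rule that removal stops at the first page that was not
both trimmed and accepted is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL
#define MODEL_MAX_PAGES 16

/* Hypothetical per-page state; not a kernel or hardware structure. */
enum model_page_state {
	MODEL_REG = 0,		/* regular page, in use by the enclave */
	MODEL_TRIM_PENDING,	/* type changed to TRIM, EACCEPT outstanding */
	MODEL_TRIM_ACCEPTED,	/* enclave ran EACCEPT on the trimmed page */
	MODEL_REMOVED,		/* page no longer part of the enclave */
};

struct model_encl {
	enum model_page_state pages[MODEL_MAX_PAGES];
};

/* Map a page-aligned enclave offset to an index into the model. */
size_t model_index(size_t offset)
{
	size_t idx = offset / MODEL_PAGE_SIZE;

	assert(offset % MODEL_PAGE_SIZE == 0);
	assert(idx < MODEL_MAX_PAGES);
	return idx;
}

/* Step 1: stand-in for the modify-type ioctl(). Returns bytes processed. */
size_t model_trim(struct model_encl *encl, size_t offset, size_t length)
{
	size_t done;

	for (done = 0; done < length; done += MODEL_PAGE_SIZE) {
		size_t idx = model_index(offset + done);

		if (encl->pages[idx] != MODEL_REG)
			break;
		encl->pages[idx] = MODEL_TRIM_PENDING;
	}
	return done;
}

/* Step 2: stand-in for EACCEPT on one page. Returns 0 on success. */
int model_eaccept(struct model_encl *encl, size_t offset)
{
	size_t idx = model_index(offset);

	if (encl->pages[idx] != MODEL_TRIM_PENDING)
		return -1;
	encl->pages[idx] = MODEL_TRIM_ACCEPTED;
	return 0;
}

/*
 * Step 3: stand-in for the remove-pages ioctl(). Returns bytes processed.
 * Model assumption: stop at the first page that is not trimmed and accepted.
 */
size_t model_remove(struct model_encl *encl, size_t offset, size_t length)
{
	size_t done;

	for (done = 0; done < length; done += MODEL_PAGE_SIZE) {
		size_t idx = model_index(offset + done);

		if (encl->pages[idx] != MODEL_TRIM_ACCEPTED)
			break;
		encl->pages[idx] = MODEL_REMOVED;
	}
	return done;
}
```

With a zero-initialized struct model_encl, calling model_remove() right
after model_trim() processes nothing in this model. It returns the full
length only once model_eaccept() has succeeded on every page in the range,
which mirrors the count checks made by the test.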



* Re: [PATCH V2 19/32] x86/sgx: Support adding of pages to an initialized enclave
  2022-02-08  0:45 ` [PATCH V2 19/32] x86/sgx: Support adding of pages to an initialized enclave Reinette Chatre
@ 2022-02-19 11:57   ` Jarkko Sakkinen
  2022-02-19 12:01     ` Jarkko Sakkinen
  2022-03-07 16:16   ` Jarkko Sakkinen
  1 sibling, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-02-19 11:57 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

On Mon, Feb 07, 2022 at 04:45:41PM -0800, Reinette Chatre wrote:
> With SGX1 an enclave needs to be created with its maximum memory demands
> allocated. Pages cannot be added to an enclave after it is initialized.
> SGX2 introduces a new function, ENCLS[EAUG], that can be used to add
> pages to an initialized enclave. With SGX2 the enclave still needs to
> set aside address space for its maximum memory demands during enclave
> creation, but all pages need not be added before enclave initialization.
> Pages can be added during enclave runtime.
> 
> Add support for dynamically adding pages to an initialized enclave,
> architecturally limited to RW permission. Add pages via the page fault
> handler at the time an enclave address without a backing enclave page
> is accessed, potentially directly reclaiming pages if no free pages
> are available.
> 
> The enclave is still required to run ENCLU[EACCEPT] on the page before
> it can be used. A useful flow is for the enclave to run ENCLU[EACCEPT]
> on an uninitialized address. This will trigger the page fault handler
> that will add the enclave page and return execution to the enclave to
> repeat the ENCLU[EACCEPT] instruction, this time successful.
> 
> If the enclave accesses an uninitialized address in another way, for
> example by expanding the enclave stack to a page that has not yet been
> added, then the page fault handler would add the page on the first
> write but upon returning to the enclave the instruction that triggered
> the page fault would be repeated and since ENCLU[EACCEPT] was not run
> yet it would trigger a second page fault, this time with the SGX flag
> set in the page fault error code. This can only be recovered by entering
> the enclave again and directly running the ENCLU[EACCEPT] instruction on
> the now initialized address.
> 
> Accessing an uninitialized address from outside the enclave also
> triggers this flow but the page will remain inaccessible (access will
> result in #PF) until accepted from within the enclave via
> ENCLU[EACCEPT].
> 
> The page is added with the architecturally constrained RW permissions
> as both its runtime and its maximum allowed permissions. It is understood
> that there are some use cases, for example code relocation, that require
> RWX maximum permissions. Supporting these use cases requires guidance
> from user space policy before such maximum permissions can be allowed.
> Integration with user policy is deferred.
> 
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Changes since V1:
> - Fix subject line "to initialized" -> "to an initialized" (Jarkko).
> - Move text about hardware's PENDING state to the patch that introduces
>   the ENCLS[EAUG] wrapper (Jarkko).
> - Ensure kernel-doc uses brackets when referring to function.
> 
>  arch/x86/kernel/cpu/sgx/encl.c  | 133 ++++++++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/sgx/encl.h  |   2 +
>  arch/x86/kernel/cpu/sgx/ioctl.c |   4 +-
>  3 files changed, 137 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index a5d4a7efb986..d1e3ea86b902 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -124,6 +124,128 @@ struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
>  	return entry;
>  }
>  
> +/**
> + * sgx_encl_eaug_page() - Dynamically add page to initialized enclave
> + * @vma:	VMA obtained from fault info from where page is accessed
> + * @encl:	enclave accessing the page
> + * @addr:	address that triggered the page fault
> + *
> + * When an initialized enclave accesses a page with no backing EPC page
> + * on a SGX2 system then the EPC can be added dynamically via the SGX2
> + * ENCLS[EAUG] instruction.
> + *
> + * Returns: Appropriate vm_fault_t: VM_FAULT_NOPAGE when PTE was installed
> + * successfully, VM_FAULT_SIGBUS or VM_FAULT_OOM as error otherwise.
> + */
> +static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
> +				     struct sgx_encl *encl, unsigned long addr)
> +{
> +	struct sgx_pageinfo pginfo = {0};
> +	struct sgx_encl_page *encl_page;
> +	struct sgx_epc_page *epc_page;
> +	struct sgx_va_page *va_page;
> +	unsigned long phys_addr;
> +	unsigned long prot;
> +	vm_fault_t vmret;
> +	int ret;
> +
> +	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
> +		return VM_FAULT_SIGBUS;
> +
> +	encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL);
> +	if (!encl_page)
> +		return VM_FAULT_OOM;
> +
> +	encl_page->desc = addr;
> +	encl_page->encl = encl;
> +
> +	/*
> +	 * Adding a regular page that is architecturally allowed to only
> +	 * be created with RW permissions.
> +	 * TBD: Interface with user space policy to support max permissions
> +	 * of RWX.
> +	 */
> +	prot = PROT_READ | PROT_WRITE;
> +	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> +	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
> +
> +	epc_page = sgx_alloc_epc_page(encl_page, true);
> +	if (IS_ERR(epc_page)) {
> +		kfree(encl_page);
> +		return VM_FAULT_SIGBUS;
> +	}
> +
> +	va_page = sgx_encl_grow(encl);
> +	if (IS_ERR(va_page)) {
> +		ret = PTR_ERR(va_page);
> +		goto err_out_free;
> +	}
> +
> +	mutex_lock(&encl->lock);
> +
> +	/*
> +	 * Copy comment from sgx_encl_add_page() to maintain guidance in
> +	 * this similar flow:
> +	 * Adding to encl->va_pages must be done under encl->lock.  Ditto for
> +	 * deleting (via sgx_encl_shrink()) in the error path.
> +	 */
> +	if (va_page)
> +		list_add(&va_page->list, &encl->va_pages);
> +
> +	ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc),
> +			encl_page, GFP_KERNEL);
> +	/*
> +	 * If ret == -EBUSY then page was created in another flow while
> +	 * running without encl->lock
> +	 */
> +	if (ret)
> +		goto err_out_unlock;
> +
> +	pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
> +	pginfo.addr = encl_page->desc & PAGE_MASK;
> +	pginfo.metadata = 0;
> +
> +	ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page));
> +	if (ret)
> +		goto err_out;
> +
> +	encl_page->encl = encl;
> +	encl_page->epc_page = epc_page;
> +	encl_page->type = SGX_PAGE_TYPE_REG;
> +	encl->secs_child_cnt++;
> +
> +	sgx_mark_page_reclaimable(encl_page->epc_page);
> +
> +	phys_addr = sgx_get_epc_phys_addr(epc_page);
> +	/*
> +	 * Do not undo everything when creating PTE entry fails - next #PF
> +	 * would find page ready for a PTE.
> +	 * PAGE_SHARED because protection is forced to be RW above and COW
> +	 * is not supported.
> +	 */
> +	vmret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
> +				    PAGE_SHARED);
> +	if (vmret != VM_FAULT_NOPAGE) {
> +		mutex_unlock(&encl->lock);
> +		return VM_FAULT_SIGBUS;
> +	}
> +	mutex_unlock(&encl->lock);
> +	return VM_FAULT_NOPAGE;
> +
> +err_out:
> +	xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc));
> +
> +err_out_unlock:
> +	sgx_encl_shrink(encl, va_page);
> +	mutex_unlock(&encl->lock);
> +
> +err_out_free:
> +	sgx_encl_free_epc_page(epc_page);
> +	kfree(encl_page);
> +
> +	return VM_FAULT_SIGBUS;
> +}
> +
>  static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>  {
>  	unsigned long addr = (unsigned long)vmf->address;
> @@ -145,6 +267,17 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>  	if (unlikely(!encl))
>  		return VM_FAULT_SIGBUS;
>  
> +	/*
> +	 * The page_array keeps track of all enclave pages, whether they
> +	 * are swapped out or not. If there is no entry for this page and
> +	 * the system supports SGX2 then it is possible to dynamically add
> +	 * a new enclave page. This is only possible for an initialized
> +	 * enclave that will be checked for right away.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_SGX2) &&
> +	    (!xa_load(&encl->page_array, PFN_DOWN(addr))))
> +		return sgx_encl_eaug_page(vma, encl, addr);
> +
>  	mutex_lock(&encl->lock);
>  
>  	entry = sgx_encl_load_page(encl, addr);
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index 848a28d28d3d..1b6ce1da7c92 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -123,4 +123,6 @@ void sgx_encl_free_epc_page(struct sgx_epc_page *page);
>  struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
>  					 unsigned long addr);
>  
> +struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl);
> +void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page);
>  #endif /* _X86_ENCL_H */
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index 23bdf558b231..58ff62a1fb00 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -17,7 +17,7 @@
>  #include "encl.h"
>  #include "encls.h"
>  
> -static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
> +struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
>  {
>  	struct sgx_va_page *va_page = NULL;
>  	void *err;
> @@ -43,7 +43,7 @@ static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
>  	return va_page;
>  }
>  
> -static void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
> +void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
>  {
>  	encl->page_cnt--;
>  
> -- 
> 2.25.1
> 

From a quick look through, it seems this sequence is also possible:

1. The enclave's run-time flow skips EACCEPT entirely and instead a memory
   dereference initiates the sequence.
2. This causes the #PF handler to do EAUG, and after the enclave is
   re-entered the vDSO exits because the page is not EACCEPT'd.
3. The enclave host enters the in-enclave exception handler, which does
   EACCEPT.

Can you confirm this? I'm planning to test this patch by implementing EAUG
support in Rust for Enarx. At this point I'm not yet sure whether I will
choose the EACCEPT-initiated or the memory-dereference-initiated code path,
but I think it would be good for the kernel implementation to support both.
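
The two entry paths discussed here (EACCEPT-initiated and
memory-dereference-initiated) can be sketched as a tiny user space model
of the page states. This is a rough illustration only, assuming the
behaviour described in the commit message: a missing page is added by the
#PF handler and the faulting instruction is retried, while an access to a
page that was added but not yet accepted faults again and has to be
resolved by entering the enclave to run EACCEPT. All names below
(page_state, access_kind, flow_stats, model_attempt(),
model_first_touch()) are hypothetical and do not exist in the kernel or
in Enarx.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one enclave page; not a kernel structure. */
enum page_state {
	PAGE_ABSENT,	/* no backing EPC page yet */
	PAGE_PENDING,	/* added via EAUG, not yet accepted */
	PAGE_ACCEPTED,	/* EACCEPT completed, page is usable */
};

enum access_kind {
	ACCESS_EACCEPT,	/* enclave runs ENCLU[EACCEPT] on the address */
	ACCESS_DEREF,	/* enclave reads or writes the address */
};

struct flow_stats {
	int page_faults;		/* all #PFs taken for this first touch */
	int host_handled_faults;	/* #PFs the host must resolve itself */
};

/* One attempt at the instruction: true if it completes, false on #PF. */
bool model_attempt(enum page_state *page, enum access_kind kind)
{
	if (*page == PAGE_ABSENT)
		return false;
	if (kind == ACCESS_EACCEPT) {
		*page = PAGE_ACCEPTED;
		return true;
	}
	/* A plain access only succeeds once the page has been accepted. */
	return *page == PAGE_ACCEPTED;
}

/* Replay the first touch of a page until the original access completes. */
struct flow_stats model_first_touch(enum page_state *page,
				    enum access_kind kind)
{
	struct flow_stats stats = { 0, 0 };
	bool accepted;

	while (!model_attempt(page, kind)) {
		stats.page_faults++;
		if (*page == PAGE_ABSENT) {
			/* #PF handler adds the page; instruction is retried. */
			*page = PAGE_PENDING;
			continue;
		}
		/*
		 * Page is pending: the fault is reported to the host, which
		 * enters the enclave again to run EACCEPT on the address.
		 */
		stats.host_handled_faults++;
		accepted = model_attempt(page, ACCESS_EACCEPT);
		assert(accepted);
		(void)accepted;
	}
	return stats;
}
```

In this model the EACCEPT-initiated path costs one page fault and needs
no help from the host, while the dereference-initiated path costs two
page faults, the second of which the host has to resolve by running
EACCEPT inside the enclave before the original access can be retried.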

Other than that, this looks super solid!

BR, Jarkko


* Re: [PATCH V2 19/32] x86/sgx: Support adding of pages to an initialized enclave
  2022-02-19 11:57   ` Jarkko Sakkinen
@ 2022-02-19 12:01     ` Jarkko Sakkinen
  2022-02-20 18:40       ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-02-19 12:01 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

On Sat, Feb 19, 2022 at 12:57:21PM +0100, Jarkko Sakkinen wrote:
> From a quick look through, it seems this sequence is also possible:
> 
> 1. The enclave's run-time flow skips EACCEPT entirely and instead a memory
>    dereference initiates the sequence.
> 2. This causes the #PF handler to do EAUG, and after the enclave is
>    re-entered the vDSO exits because the page is not EACCEPT'd.
> 3. The enclave host enters the in-enclave exception handler, which does
>    EACCEPT.
> 
> Can you confirm this? I'm planning to test this patch by implementing EAUG
> support in Rust for Enarx. At this point I'm not yet sure whether I will
> choose the EACCEPT-initiated or the memory-dereference-initiated code path,
> but I think it would be good for the kernel implementation to support both.
> 
> Other than that, this looks super solid!

I got my answer:

https://lore.kernel.org/linux-sgx/32c1116934a588bd3e6c174684e3e36a05c0a4d4.1644274683.git.reinette.chatre@intel.com/

I could almost give reviewed-by but I need to write the user space
implementation first to check that this works for Enarx.

BR, Jarkko


* Re: [PATCH V2 19/32] x86/sgx: Support adding of pages to an initialized enclave
  2022-02-19 12:01     ` Jarkko Sakkinen
@ 2022-02-20 18:40       ` Jarkko Sakkinen
  2022-02-22 19:19         ` Reinette Chatre
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-02-20 18:40 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

On Sat, Feb 19, 2022 at 01:01:08PM +0100, Jarkko Sakkinen wrote:
> On Sat, Feb 19, 2022 at 12:57:21PM +0100, Jarkko Sakkinen wrote:
> > On Mon, Feb 07, 2022 at 04:45:41PM -0800, Reinette Chatre wrote:
> > > With SGX1 an enclave needs to be created with its maximum memory demands
> > > allocated. Pages cannot be added to an enclave after it is initialized.
> > > SGX2 introduces a new function, ENCLS[EAUG], that can be used to add
> > > pages to an initialized enclave. With SGX2 the enclave still needs to
> > > set aside address space for its maximum memory demands during enclave
> > > creation, but all pages need not be added before enclave initialization.
> > > Pages can be added during enclave runtime.
> > > 
> > > Add support for dynamically adding pages to an initialized enclave,
> > > architecturally limited to RW permission. Add pages via the page fault
> > > handler at the time an enclave address without a backing enclave page
> > > is accessed, potentially directly reclaiming pages if no free pages
> > > are available.
> > > 
> > > The enclave is still required to run ENCLU[EACCEPT] on the page before
> > > it can be used. A useful flow is for the enclave to run ENCLU[EACCEPT]
> > > on an uninitialized address. This will trigger the page fault handler
> > > that will add the enclave page and return execution to the enclave to
> > > repeat the ENCLU[EACCEPT] instruction, this time successful.
> > > 
> > > If the enclave accesses an uninitialized address in another way, for
> > > example by expanding the enclave stack to a page that has not yet been
> > > added, then the page fault handler would add the page on the first
> > > write but upon returning to the enclave the instruction that triggered
> > > the page fault would be repeated and since ENCLU[EACCEPT] was not run
> > > yet it would trigger a second page fault, this time with the SGX flag
> > > set in the page fault error code. This can only be recovered by entering
> > > the enclave again and directly running the ENCLU[EACCEPT] instruction on
> > > the now initialized address.
> > > 
> > > Accessing an uninitialized address from outside the enclave also
> > > triggers this flow but the page will remain inaccessible (access will
> > > result in #PF) until accepted from within the enclave via
> > > ENCLU[EACCEPT].
> > > 
> > > > The page is added with the architecturally constrained RW permissions
> > > > as both its runtime and its maximum allowed permissions. It is understood
> > > > that there are some use cases, for example code relocation, that require
> > > > RWX maximum permissions. Supporting these use cases requires guidance
> > > > from user space policy before such maximum permissions can be allowed.
> > > > Integration with user policy is deferred.
> > > 
> > > Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> > > ---
> > > Changes since V1:
> > > - Fix subject line "to initialized" -> "to an initialized" (Jarkko).
> > > - Move text about hardware's PENDING state to the patch that introduces
> > >   the ENCLS[EAUG] wrapper (Jarkko).
> > > - Ensure kernel-doc uses brackets when referring to function.
> > > 
> > >  arch/x86/kernel/cpu/sgx/encl.c  | 133 ++++++++++++++++++++++++++++++++
> > >  arch/x86/kernel/cpu/sgx/encl.h  |   2 +
> > >  arch/x86/kernel/cpu/sgx/ioctl.c |   4 +-
> > >  3 files changed, 137 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > > index a5d4a7efb986..d1e3ea86b902 100644
> > > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > > @@ -124,6 +124,128 @@ struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > >  	return entry;
> > >  }
> > >  
> > > +/**
> > > + * sgx_encl_eaug_page() - Dynamically add page to initialized enclave
> > > + * @vma:	VMA obtained from fault info from where page is accessed
> > > + * @encl:	enclave accessing the page
> > > + * @addr:	address that triggered the page fault
> > > + *
> > > + * When an initialized enclave accesses a page with no backing EPC page
> > > + * on a SGX2 system then the EPC can be added dynamically via the SGX2
> > > + * ENCLS[EAUG] instruction.
> > > + *
> > > + * Returns: Appropriate vm_fault_t: VM_FAULT_NOPAGE when PTE was installed
> > > + * successfully, VM_FAULT_SIGBUS or VM_FAULT_OOM as error otherwise.
> > > + */
> > > +static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
> > > +				     struct sgx_encl *encl, unsigned long addr)
> > > +{
> > > +	struct sgx_pageinfo pginfo = {0};
> > > +	struct sgx_encl_page *encl_page;
> > > +	struct sgx_epc_page *epc_page;
> > > +	struct sgx_va_page *va_page;
> > > +	unsigned long phys_addr;
> > > +	unsigned long prot;
> > > +	vm_fault_t vmret;
> > > +	int ret;
> > > +
> > > +	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
> > > +		return VM_FAULT_SIGBUS;
> > > +
> > > +	encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL);
> > > +	if (!encl_page)
> > > +		return VM_FAULT_OOM;
> > > +
> > > +	encl_page->desc = addr;
> > > +	encl_page->encl = encl;
> > > +
> > > +	/*
> > > +	 * Adding a regular page that is architecturally allowed to only
> > > +	 * be created with RW permissions.
> > > +	 * TBD: Interface with user space policy to support max permissions
> > > +	 * of RWX.
> > > +	 */
> > > +	prot = PROT_READ | PROT_WRITE;
> > > +	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> > > +	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
> > > +
> > > +	epc_page = sgx_alloc_epc_page(encl_page, true);
> > > +	if (IS_ERR(epc_page)) {
> > > +		kfree(encl_page);
> > > +		return VM_FAULT_SIGBUS;
> > > +	}
> > > +
> > > +	va_page = sgx_encl_grow(encl);
> > > +	if (IS_ERR(va_page)) {
> > > +		ret = PTR_ERR(va_page);
> > > +		goto err_out_free;
> > > +	}
> > > +
> > > +	mutex_lock(&encl->lock);
> > > +
> > > +	/*
> > > +	 * Copy comment from sgx_encl_add_page() to maintain guidance in
> > > +	 * this similar flow:
> > > +	 * Adding to encl->va_pages must be done under encl->lock.  Ditto for
> > > +	 * deleting (via sgx_encl_shrink()) in the error path.
> > > +	 */
> > > +	if (va_page)
> > > +		list_add(&va_page->list, &encl->va_pages);
> > > +
> > > +	ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc),
> > > +			encl_page, GFP_KERNEL);
> > > +	/*
> > > +	 * If ret == -EBUSY then page was created in another flow while
> > > +	 * running without encl->lock
> > > +	 */
> > > +	if (ret)
> > > +		goto err_out_unlock;
> > > +
> > > +	pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
> > > +	pginfo.addr = encl_page->desc & PAGE_MASK;
> > > +	pginfo.metadata = 0;
> > > +
> > > +	ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page));
> > > +	if (ret)
> > > +		goto err_out;
> > > +
> > > +	encl_page->encl = encl;
> > > +	encl_page->epc_page = epc_page;
> > > +	encl_page->type = SGX_PAGE_TYPE_REG;
> > > +	encl->secs_child_cnt++;
> > > +
> > > +	sgx_mark_page_reclaimable(encl_page->epc_page);
> > > +
> > > +	phys_addr = sgx_get_epc_phys_addr(epc_page);
> > > +	/*
> > > +	 * Do not undo everything when creating PTE entry fails - next #PF
> > > +	 * would find page ready for a PTE.
> > > +	 * PAGE_SHARED because protection is forced to be RW above and COW
> > > +	 * is not supported.
> > > +	 */
> > > +	vmret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
> > > +				    PAGE_SHARED);
> > > +	if (vmret != VM_FAULT_NOPAGE) {
> > > +		mutex_unlock(&encl->lock);
> > > +		return VM_FAULT_SIGBUS;
> > > +	}
> > > +	mutex_unlock(&encl->lock);
> > > +	return VM_FAULT_NOPAGE;
> > > +
> > > +err_out:
> > > +	xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc));
> > > +
> > > +err_out_unlock:
> > > +	sgx_encl_shrink(encl, va_page);
> > > +	mutex_unlock(&encl->lock);
> > > +
> > > +err_out_free:
> > > +	sgx_encl_free_epc_page(epc_page);
> > > +	kfree(encl_page);
> > > +
> > > +	return VM_FAULT_SIGBUS;
> > > +}
> > > +
> > >  static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> > >  {
> > >  	unsigned long addr = (unsigned long)vmf->address;
> > > @@ -145,6 +267,17 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> > >  	if (unlikely(!encl))
> > >  		return VM_FAULT_SIGBUS;
> > >  
> > > +	/*
> > > +	 * The page_array keeps track of all enclave pages, whether they
> > > +	 * are swapped out or not. If there is no entry for this page and
> > > +	 * the system supports SGX2 then it is possible to dynamically add
> > > +	 * a new enclave page. This is only possible for an initialized
> > > +	 * enclave that will be checked for right away.
> > > +	 */
> > > +	if (cpu_feature_enabled(X86_FEATURE_SGX2) &&
> > > +	    (!xa_load(&encl->page_array, PFN_DOWN(addr))))
> > > +		return sgx_encl_eaug_page(vma, encl, addr);
> > > +
> > >  	mutex_lock(&encl->lock);
> > >  
> > >  	entry = sgx_encl_load_page(encl, addr);
> > > diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> > > index 848a28d28d3d..1b6ce1da7c92 100644
> > > --- a/arch/x86/kernel/cpu/sgx/encl.h
> > > +++ b/arch/x86/kernel/cpu/sgx/encl.h
> > > @@ -123,4 +123,6 @@ void sgx_encl_free_epc_page(struct sgx_epc_page *page);
> > >  struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > >  					 unsigned long addr);
> > >  
> > > +struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl);
> > > +void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page);
> > >  #endif /* _X86_ENCL_H */
> > > diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> > > index 23bdf558b231..58ff62a1fb00 100644
> > > --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> > > +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> > > @@ -17,7 +17,7 @@
> > >  #include "encl.h"
> > >  #include "encls.h"
> > >  
> > > -static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
> > > +struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
> > >  {
> > >  	struct sgx_va_page *va_page = NULL;
> > >  	void *err;
> > > @@ -43,7 +43,7 @@ static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
> > >  	return va_page;
> > >  }
> > >  
> > > -static void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
> > > +void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
> > >  {
> > >  	encl->page_cnt--;
> > >  
> > > -- 
> > > 2.25.1
> > > 
> > 
> > Looking through this quickly, the following sequence is also possible:
> > 
> > 1. Enclave's run-time flow skips the up-front EACCEPT and instead a memory
> >    dereference initiates the sequence.
> > 2. This causes the #PF handler to do EAUG, and after the enclave is
> >    re-entered the vDSO exits because the page is not EACCEPT'd.
> > 3. Enclave host enters the in-enclave exception handler, which does EACCEPT.
> > 
> > Can you confirm this? I'm planning to test this patch by implementing EAUG
> > support in Rust for Enarx. At this point I'm not yet sure whether I will
> > choose the EACCEPT-initiated or the memory-dereference-initiated code path,
> > but I think it is good if the kernel implementation supports both.
> > 
> > Other than that, this looks super solid!
> 
> I got my answer:
> 
> https://lore.kernel.org/linux-sgx/32c1116934a588bd3e6c174684e3e36a05c0a4d4.1644274683.git.reinette.chatre@intel.com/
> 
> I could almost give reviewed-by but I need to write the user space
> implementation first to check that this works for Enarx.

Do you know if it is possible to do EAUG, EMODPR and then do a single
EACCEPT for both? Just looking at pseudo-code, it looked doable but
I need to check this.

I.e. EAUG has this

EPCM(DS:RCX).BLOCKED := 0;
EPCM(DS:RCX).PENDING := 1;
EPCM(DS:RCX).MODIFIED := 0;
EPCM(DS:RCX).PR := 0;
(* associate the EPCPAGE with the SECS by storing the SECS identifier of DS:TMP_SECS *)
Update EPCM(DS:RCX) SECS identifier to reference DS:TMP_SECS identifier;
(* Set EPCM valid fields *)
EPCM(DS:RCX).VALID := 1;

And EMODPR only checks .VALID.

Doing two EACCEPT rounds is a bit rough as you have the page available in a
kind of "stalled" state.

/Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-02-08  0:45 ` [PATCH V2 16/32] x86/sgx: Support restricting " Reinette Chatre
@ 2022-02-21  0:49   ` Jarkko Sakkinen
  2022-02-22 18:35     ` Reinette Chatre
  2022-02-23 19:21     ` Dhanraj, Vijay
  0 siblings, 2 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-02-21  0:49 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

On Mon, Feb 07, 2022 at 04:45:38PM -0800, Reinette Chatre wrote:
> In the initial (SGX1) version of SGX, pages in an enclave need to be
> created with permissions that support all usages of the pages, from the
> time the enclave is initialized until it is unloaded. For example,
> pages used by a JIT compiler or when code needs to otherwise be
> relocated need to always have RWX permissions.
> 
> SGX2 includes a new function ENCLS[EMODPR] that is run from the kernel
> and can be used to restrict the EPCM permissions of regular enclave
> pages within an initialized enclave.
> 
> Introduce ioctl() SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS to support
> restricting EPCM permissions. With this ioctl() the user specifies
> a page range and the permissions to be applied to all pages in
> the provided range. After checking the new permissions (more detail
> below) the page table entries are reset and any new page
> table entries will contain the new, restricted, permissions.
> ENCLS[EMODPR] is run to restrict the EPCM permissions followed by
> the ENCLS[ETRACK] flow that will ensure no cached
> linear-to-physical address mappings to the changed pages remain.
> 
> It is possible for the permission change request to fail on any
> page within the provided range, either with an error encountered
> by the kernel or by the SGX hardware while running
> ENCLS[EMODPR]. To support partial success the ioctl() returns an
> error code based on failures encountered by the kernel as well
> as two result output parameters: one for the number of pages
> that were successfully changed and one for the SGX return code.
> 
> Checking user provided new permissions
> ======================================
> 
> Enclave page permission changes need to be approached with care and
> for this reason permission changes are only allowed if the new
> permissions are the same or more restrictive that the vetted
> permissions. No additional checking is done to ensure that the
> permissions are actually being restricted. This is because the
> enclave may have relaxed the EPCM permissions from within
> the enclave without letting the kernel know. An attempt to relax
> permissions using this call will be ignored by the hardware.
> 
> For example, together with the support for relaxing of EPCM permissions,
> enclave pages added with the vetted permissions in brackets below
> are allowed to have permissions as follows:
> * (RWX) => RW => R => RX => RWX
> * (RW) => R => RW
> * (RX) => R => RX
> 
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Changes since V1:
> - Change terminology to use "relax" instead of "extend" to refer to
>   the case when enclave page permissions are added (Dave).
> - Use ioctl() in commit message (Dave).
> - Add examples on what permissions would be allowed (Dave).
> - Split enclave page permission changes into two ioctl()s, one for
>   permission restricting (SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS)
>   and one for permission relaxing (SGX_IOC_ENCLAVE_RELAX_PERMISSIONS)
>   (Jarkko).
> - In support of the ioctl() name change the following names have been
>   changed:
>   struct sgx_page_modp -> struct sgx_enclave_restrict_perm
>   sgx_ioc_page_modp() -> sgx_ioc_enclave_restrict_perm()
>   sgx_page_modp() -> sgx_enclave_restrict_perm()
> - ioctl() takes entire secinfo as input instead of
>   page permissions only (Jarkko).
> - Fix kernel-doc to include () in function name.
> - Create and use utility for the ETRACK flow.
> - Fixups in comments
> - Move kernel-doc to function that provides documentation for
>   Documentation/x86/sgx.rst.
> - Remove redundant comment.
> - Make explicit which members of struct sgx_enclave_restrict_perm
>   are for output (Dave).
> 
>  arch/x86/include/uapi/asm/sgx.h |  21 +++
>  arch/x86/kernel/cpu/sgx/encl.c  |   4 +-
>  arch/x86/kernel/cpu/sgx/encl.h  |   3 +
>  arch/x86/kernel/cpu/sgx/ioctl.c | 229 ++++++++++++++++++++++++++++++++
>  4 files changed, 255 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> index 5c678b27bb72..b0ffb80bc67f 100644
> --- a/arch/x86/include/uapi/asm/sgx.h
> +++ b/arch/x86/include/uapi/asm/sgx.h
> @@ -31,6 +31,8 @@ enum sgx_page_flags {
>  	_IO(SGX_MAGIC, 0x04)
>  #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
>  	_IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
> +#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
> +	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
>  
>  /**
>   * struct sgx_enclave_create - parameter structure for the
> @@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm {
>  	__u64 count;
>  };
>  
> +/**
> + * struct sgx_enclave_restrict_perm - parameters for ioctl
> + *                                    %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
> + * @offset:	starting page offset (page aligned relative to enclave base
> + *		address defined in SECS)
> + * @length:	length of memory (multiple of the page size)
> + * @secinfo:	address for the SECINFO data containing the new permission bits
> + *		for pages in range described by @offset and @length
> + * @result:	(output) SGX result code of ENCLS[EMODPR] function
> + * @count:	(output) bytes successfully changed (multiple of page size)
> + */
> +struct sgx_enclave_restrict_perm {
> +	__u64 offset;
> +	__u64 length;
> +	__u64 secinfo;
> +	__u64 result;
> +	__u64 count;
> +};
> +
>  struct sgx_enclave_run;
>  
>  /**
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 8da813504249..a5d4a7efb986 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -90,8 +90,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  	return epc_page;
>  }
>  
> -static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> -						unsigned long addr)
> +struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> +					 unsigned long addr)
>  {
>  	struct sgx_epc_page *epc_page;
>  	struct sgx_encl_page *entry;
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index cb9f16d457ac..848a28d28d3d 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -120,4 +120,7 @@ void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
>  bool sgx_va_page_full(struct sgx_va_page *va_page);
>  void sgx_encl_free_epc_page(struct sgx_epc_page *page);
>  
> +struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> +					 unsigned long addr);
> +
>  #endif /* _X86_ENCL_H */
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index 9cc6af404bf6..23bdf558b231 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -894,6 +894,232 @@ static long sgx_ioc_enclave_relax_perm(struct sgx_encl *encl, void __user *arg)
>  	return ret;
>  }
>  
> +/*
> + * Some SGX functions require that no cached linear-to-physical address
> + * mappings are present before they can succeed. Collaborate with
> + * hardware via ENCLS[ETRACK] to ensure that all cached
> + * linear-to-physical address mappings belonging to all threads of
> + * the enclave are cleared. See sgx_encl_cpumask() for details.
> + */
> +static int sgx_enclave_etrack(struct sgx_encl *encl)
> +{
> +	void *epc_virt;
> +	int ret;
> +
> +	epc_virt = sgx_get_epc_virt_addr(encl->secs.epc_page);
> +	ret = __etrack(epc_virt);
> +	if (ret) {
> +		/*
> +		 * ETRACK only fails when there is an OS issue. For
> +		 * example, two consecutive ETRACK was sent without
> +		 * completed IPI between.
> +		 */
> +		pr_err_once("ETRACK returned %d (0x%x)", ret, ret);
> +		/*
> +		 * Send IPIs to kick CPUs out of the enclave and
> +		 * try ETRACK again.
> +		 */
> +		on_each_cpu_mask(sgx_encl_cpumask(encl), sgx_ipi_cb, NULL, 1);
> +		ret = __etrack(epc_virt);
> +		if (ret) {
> +			pr_err_once("ETRACK repeat returned %d (0x%x)",
> +				    ret, ret);
> +			return -EFAULT;
> +		}
> +	}
> +	on_each_cpu_mask(sgx_encl_cpumask(encl), sgx_ipi_cb, NULL, 1);
> +
> +	return 0;
> +}
> +
> +/**
> + * sgx_enclave_restrict_perm() - Restrict EPCM permissions and align OS view
> + * @encl:	Enclave to which the pages belong.
> + * @modp:	Checked parameters from user on which pages need modifying.
> + * @secinfo_perm: New (validated) permission bits.
> + *
> + * Return:
> + * - 0:		Success.
> + * - -errno:	Otherwise.
> + */
> +static long sgx_enclave_restrict_perm(struct sgx_encl *encl,
> +				      struct sgx_enclave_restrict_perm *modp,
> +				      u64 secinfo_perm)
> +{
> +	unsigned long vm_prot, run_prot_restore;
> +	struct sgx_encl_page *entry;
> +	struct sgx_secinfo secinfo;
> +	unsigned long addr;
> +	unsigned long c;
> +	void *epc_virt;
> +	int ret;
> +
> +	memset(&secinfo, 0, sizeof(secinfo));
> +	secinfo.flags = secinfo_perm;
> +
> +	vm_prot = vm_prot_from_secinfo(secinfo_perm);
> +
> +	for (c = 0 ; c < modp->length; c += PAGE_SIZE) {
> +		addr = encl->base + modp->offset + c;
> +
> +		mutex_lock(&encl->lock);
> +
> +		entry = sgx_encl_load_page(encl, addr);
> +		if (IS_ERR(entry)) {
> +			ret = PTR_ERR(entry) == -EBUSY ? -EAGAIN : -EFAULT;
> +			goto out_unlock;
> +		}
> +
> +		/*
> +		 * Changing EPCM permissions is only supported on regular
> +		 * SGX pages. Attempting this change on other pages will
> +		 * result in #PF.
> +		 */
> +		if (entry->type != SGX_PAGE_TYPE_REG) {
> +			ret = -EINVAL;
> +			goto out_unlock;
> +		}
> +
> +		/*
> +		 * Do not verify if current runtime protection bits are what
> +		 * is being requested. The enclave may have relaxed EPCM
> +		 * permissions calls without letting the kernel know and
> +		 * thus permission restriction may still be needed even if
> +		 * from the kernel's perspective the permissions are unchanged.
> +		 */
> +
> +		/* New permissions should never exceed vetted permissions. */
> +		if ((entry->vm_max_prot_bits & vm_prot) != vm_prot) {
> +			ret = -EPERM;
> +			goto out_unlock;
> +		}
> +
> +		/* Make sure page stays around while releasing mutex. */
> +		if (sgx_unmark_page_reclaimable(entry->epc_page)) {
> +			ret = -EAGAIN;
> +			goto out_unlock;
> +		}
> +
> +		/*
> +		 * Change runtime protection before zapping PTEs to ensure
> +		 * any new #PF uses new permissions. EPCM permissions (if
> +		 * needed) not changed yet.
> +		 */
> +		run_prot_restore = entry->vm_run_prot_bits;
> +		entry->vm_run_prot_bits = vm_prot;
> +
> +		mutex_unlock(&encl->lock);
> +		/*
> +		 * Do not keep encl->lock because of dependency on
> +		 * mmap_lock acquired in sgx_zap_enclave_ptes().
> +		 */
> +		sgx_zap_enclave_ptes(encl, addr);
> +
> +		mutex_lock(&encl->lock);
> +
> +		/* Change EPCM permissions. */
> +		epc_virt = sgx_get_epc_virt_addr(entry->epc_page);
> +		ret = __emodpr(&secinfo, epc_virt);
> +		if (encls_faulted(ret)) {
> +			/*
> +			 * All possible faults should be avoidable:
> +			 * parameters have been checked, will only change
> +			 * permissions of a regular page, and no concurrent
> +			 * SGX1/SGX2 ENCLS instructions since these
> +			 * are protected with mutex.
> +			 */
> +			pr_err_once("EMODPR encountered exception %d\n",
> +				    ENCLS_TRAPNR(ret));
> +			ret = -EFAULT;
> +			goto out_prot_restore;
> +		}
> +		if (encls_failed(ret)) {
> +			modp->result = ret;
> +			ret = -EFAULT;
> +			goto out_prot_restore;
> +		}
> +
> +		ret = sgx_enclave_etrack(encl);
> +		if (ret) {
> +			ret = -EFAULT;
> +			goto out_reclaim;
> +		}
> +
> +		sgx_mark_page_reclaimable(entry->epc_page);
> +		mutex_unlock(&encl->lock);
> +	}
> +
> +	ret = 0;
> +	goto out;
> +
> +out_prot_restore:
> +	entry->vm_run_prot_bits = run_prot_restore;
> +out_reclaim:
> +	sgx_mark_page_reclaimable(entry->epc_page);
> +out_unlock:
> +	mutex_unlock(&encl->lock);
> +out:
> +	modp->count = c;
> +
> +	return ret;
> +}
> +
> +/**
> + * sgx_ioc_enclave_restrict_perm() - handler for
> + *                                   %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
> + * @encl:	an enclave pointer
> + * @arg:	userspace pointer to a &struct sgx_enclave_restrict_perm
> + *		instance
> + *
> + * SGX2 distinguishes between relaxing and restricting the enclave page
> + * permissions maintained by the hardware (EPCM permissions) of pages
> + * belonging to an initialized enclave (after SGX_IOC_ENCLAVE_INIT).
> + *
> + * EPCM permissions cannot be restricted from within the enclave, the enclave
> + * requires the kernel to run the privileged level 0 instructions ENCLS[EMODPR]
> + * and ENCLS[ETRACK]. An attempt to relax EPCM permissions with this call
> + * will be ignored by the hardware.
> + *
> + * Enclave page permissions are not allowed to exceed the maximum vetted
> + * permissions maintained in &struct sgx_encl_page->vm_max_prot_bits.
> + *
> + * Return:
> + * - 0:		Success
> + * - -errno:	Otherwise
> + */
> +static long sgx_ioc_enclave_restrict_perm(struct sgx_encl *encl,
> +					  void __user *arg)
> +{
> +	struct sgx_enclave_restrict_perm params;
> +	u64 secinfo_perm;
> +	long ret;
> +
> +	ret = sgx_ioc_sgx2_ready(encl);
> +	if (ret)
> +		return ret;
> +
> +	if (copy_from_user(&params, arg, sizeof(params)))
> +		return -EFAULT;
> +
> +	if (sgx_validate_offset_length(encl, params.offset, params.length))
> +		return -EINVAL;
> +
> +	ret = sgx_perm_from_user_secinfo((void __user *)params.secinfo,
> +					 &secinfo_perm);
> +	if (ret)
> +		return ret;
> +
> +	if (params.result || params.count)
> +		return -EINVAL;
> +
> +	ret = sgx_enclave_restrict_perm(encl, &params, secinfo_perm);
> +
> +	if (copy_to_user(arg, &params, sizeof(params)))
> +		return -EFAULT;
> +
> +	return ret;
> +}
> +
>  long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>  {
>  	struct sgx_encl *encl = filep->private_data;
> @@ -918,6 +1144,9 @@ long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>  	case SGX_IOC_ENCLAVE_RELAX_PERMISSIONS:
>  		ret = sgx_ioc_enclave_relax_perm(encl, (void __user *)arg);
>  		break;
> +	case SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS:
> +		ret = sgx_ioc_enclave_restrict_perm(encl, (void __user *)arg);
> +		break;
>  	default:
>  		ret = -ENOIOCTLCMD;
>  		break;
> -- 
> 2.25.1
> 

Just a suggestion but these might be a bit less cluttered explanations of
the fields:

/// SGX_IOC_ENCLAVE_RELAX_PERMISSIONS parameter structure
#[repr(C)]
pub struct RelaxPermissions {
    /// In: starting page offset
    offset: u64,
    /// In: length of the address range (multiple of the page size)
    length: u64,
    /// In: SECINFO containing the relaxed permissions
    secinfo: u64,
    /// Out: length of the address range successfully changed
    count: u64,
}

/// SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS parameter structure
#[repr(C)]
pub struct RestrictPermissions {
    /// In: starting page offset
    offset: u64,
    /// In: length of the address range (multiple of the page size)
    length: u64,
    /// In: SECINFO containing the restricted permissions
    secinfo: u64,
    /// Out: ENCLS[EMODPR] return value
    result: u64,
    /// Out: length of the address range successfully changed
    count: u64,
}

I can live with the current ones too but I rewrote them so that I can
quickly make sense of the fields later. It's Rust code but the point is
the documentation...

Also, it should not be too much trouble to use the structs in user space
code even if the struct names are struct sgx_enclave_relax_permissions and
struct sgx_enclave_restrict_permissions, given that you most likely have
exactly one call site in the run-time.

Other than that, looks quite good.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-02-21  0:49   ` Jarkko Sakkinen
@ 2022-02-22 18:35     ` Reinette Chatre
  2022-02-23 15:46       ` Jarkko Sakkinen
  2022-02-23 19:21     ` Dhanraj, Vijay
  1 sibling, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-02-22 18:35 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

Hi Jarkko,

On 2/20/2022 4:49 PM, Jarkko Sakkinen wrote:
> On Mon, Feb 07, 2022 at 04:45:38PM -0800, Reinette Chatre wrote:

...

>> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
>> index 5c678b27bb72..b0ffb80bc67f 100644
>> --- a/arch/x86/include/uapi/asm/sgx.h
>> +++ b/arch/x86/include/uapi/asm/sgx.h
>> @@ -31,6 +31,8 @@ enum sgx_page_flags {
>>  	_IO(SGX_MAGIC, 0x04)
>>  #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
>>  	_IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
>> +#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
>> +	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
>>  
>>  /**
>>   * struct sgx_enclave_create - parameter structure for the
>> @@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm {
>>  	__u64 count;
>>  };
>>  
>> +/**
>> + * struct sgx_enclave_restrict_perm - parameters for ioctl
>> + *                                    %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
>> + * @offset:	starting page offset (page aligned relative to enclave base
>> + *		address defined in SECS)
>> + * @length:	length of memory (multiple of the page size)
>> + * @secinfo:	address for the SECINFO data containing the new permission bits
>> + *		for pages in range described by @offset and @length
>> + * @result:	(output) SGX result code of ENCLS[EMODPR] function
>> + * @count:	(output) bytes successfully changed (multiple of page size)
>> + */
>> +struct sgx_enclave_restrict_perm {
>> +	__u64 offset;
>> +	__u64 length;
>> +	__u64 secinfo;
>> +	__u64 result;
>> +	__u64 count;
>> +};
>> +
>>  struct sgx_enclave_run;
>>  
>>  /**

...

> 
> Just a suggestion but these might be a bit less cluttered explanations of
> the fields:
> 
> /// SGX_IOC_ENCLAVE_RELAX_PERMISSIONS parameter structure
> #[repr(C)]
> pub struct RelaxPermissions {
>     /// In: starting page offset
>     offset: u64,
>     /// In: length of the address range (multiple of the page size)
>     length: u64,
>     /// In: SECINFO containing the relaxed permissions
>     secinfo: u64,
>     /// Out: length of the address range successfully changed
>     count: u64,
> };
> 
> /// SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS parameter structure
> #[repr(C)]
> pub struct RestrictPermissions {
>     /// In: starting page offset
>     offset: u64,
>     /// In: length of the address range (multiple of the page size)
>     length: u64,
>     /// In: SECINFO containing the restricted permissions
>     secinfo: u64,
>     /// Out: ENCLS[EMODPR] return value
>     result: u64,
>     /// Out: length of the address range successfully changed
>     count: u64,
> };

In your proposal you shorten the descriptions from the current implementation.
I do consider the removed information valuable since I believe that it helps
users understand the kernel interface requirements without needing to be
familiar with or dig into the kernel code to understand how the provided data
is used.

For example, you shorten offset to "starting page offset", but what was removed
was the requirement that this offset has to be page aligned and what the offset
is relative to. I do believe summarizing these requirements upfront helps
a user space developer by not needing to dig through kernel code later
in order to understand why an -EINVAL was received.
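
As an illustration of those requirements, a minimal user space sketch in C
could check and fill the parameters before issuing the ioctl(). The helper
name prepare_restrict_perm(), the local struct name, the 4 KiB page size and
the enclave_size bound check are assumptions made for this example; only the
field layout and the rule that result and count must be zero on entry come
from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4 KiB enclave pages */

/* Mirrors the field layout of struct sgx_enclave_restrict_perm. */
struct restrict_perm_params {
	uint64_t offset;	/* in: page aligned, relative to enclave base */
	uint64_t length;	/* in: multiple of the page size */
	uint64_t secinfo;	/* in: address of SECINFO with new permissions */
	uint64_t result;	/* out: SGX result code of ENCLS[EMODPR] */
	uint64_t count;		/* out: bytes successfully changed */
};

/*
 * Hypothetical helper: reject input that does not meet the documented
 * requirements and zero the output fields, since the ioctl() handler in
 * the patch returns -EINVAL when result or count is non-zero on entry.
 * The range check against enclave_size is an assumption of this sketch.
 */
static int prepare_restrict_perm(struct restrict_perm_params *p,
				 uint64_t enclave_size,
				 uint64_t offset, uint64_t length,
				 const void *secinfo)
{
	assert(p);

	if (offset % SKETCH_PAGE_SIZE || length % SKETCH_PAGE_SIZE)
		return -EINVAL;
	if (!length || offset >= enclave_size ||
	    length > enclave_size - offset)
		return -EINVAL;

	memset(p, 0, sizeof(*p));	/* result and count start at zero */
	p->offset = offset;
	p->length = length;
	p->secinfo = (uint64_t)(uintptr_t)secinfo;

	return 0;
}
```

The filled structure would then be passed to
SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS on the enclave file descriptor.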

 
> I can live with the current ones too but I rewrote them so that I can
> quickly make sense of the fields later. It's Rust code but the point is
> the documentation...

Since you do seem to be ok with the current descriptions I would prefer
to keep them.

> Also, it should not be too much trouble to use the struct in user space
> code even if the struct names are struct sgx_enclave_relax_permissions and
> struct sgx_enclave_restrict_permissions, given that you most likely have
> exactly single call-site in the run-time.

Are you requesting that I make the following name changes?
struct sgx_enclave_relax_perm -> struct sgx_enclave_relax_permissions
struct sgx_enclave_restrict_perm -> struct sgx_enclave_restrict_permissions

If so, do you want the function names also written out in this way?
sgx_enclave_relax_perm()        -> sgx_enclave_relax_permissions()
sgx_ioc_enclave_relax_perm()    -> sgx_ioc_enclave_relax_permissions()
sgx_enclave_restrict_perm()     -> sgx_enclave_restrict_permissions()
sgx_ioc_enclave_restrict_perm() -> sgx_ioc_enclave_restrict_permissions()

> Other than that, looks quite good.

Thank you very much for reviewing and testing this work.

Reinette


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 19/32] x86/sgx: Support adding of pages to an initialized enclave
  2022-02-20 18:40       ` Jarkko Sakkinen
@ 2022-02-22 19:19         ` Reinette Chatre
  2022-02-23 15:46           ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-02-22 19:19 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

Hi Jarkko,

On 2/20/2022 10:40 AM, Jarkko Sakkinen wrote:
...
 
> Do you know if it is possible to do EAUG, EMODPR and the do a single
> EACCEPT for both? Just looking at pseudo-code, it looked doable but
> I need to check this.
> 
> I.e. EAUG has this
> 
> EPCM(DS:RCX).BLOCKED := 0;
> EPCM(DS:RCX).PENDING := 1;
> EPCM(DS:RCX).MODIFIED := 0;
> EPCM(DS:RCX).PR := 0;
> (* associate the EPCPAGE with the SECS by storing the SECS identifier of DS:TMP_SECS *)
> Update EPCM(DS:RCX) SECS identifier to reference DS:TMP_SECS identifier;
> (* Set EPCM valid fields *)
> EPCM(DS:RCX).VALID := 1;
> 
> And EMODPR only checks .VALID.

After that check there is also:
IF (EPCM(DS:RCX).PENDING is not 0 or (EPCM(DS:RCX).MODIFIED is not 0) )
    THEN
        RFLAGS.ZF := 1;
        RAX := SGX_PAGE_NOT_MODIFIABLE;
        GOTO DONE;
FI;

Attempting the SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl() on a recently
added page (EAUG) that has not yet been EACCEPTed is thus expected to fail
with errno of EFAULT (indicating ENCLS[EMODPR] failure) and the returned
structure's result field set to 20 (SGX_PAGE_NOT_MODIFIABLE).

I confirmed this behavior by modifying the "augment" kselftest test by adding
a SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS call between the new memory access and
the EACCEPT.
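
The ordering constraint can be sketched with a toy C model of the EPCM bits
quoted above. This is only an illustration of the pseudo-code, not hardware
or kernel code: the struct and the model_*() names are made up for this
example, the -1 return stands in for the fault taken on an invalid page, and
model_eaccept() is simplified to only clear the bits (the real ENCLU[EACCEPT]
also verifies the SECINFO against the EPCM state).

```c
#include <assert.h>
#include <stdbool.h>

#define SGX_PAGE_NOT_MODIFIABLE 20	/* value reported in the result field */

/* Toy model of the EPCM fields discussed in this thread. */
struct epcm_model {
	bool valid;
	bool blocked;
	bool pending;
	bool modified;
	bool pr;
};

/* ENCLS[EAUG]: the page becomes valid but stays PENDING until accepted. */
static void model_eaug(struct epcm_model *e)
{
	assert(e);
	e->blocked = false;
	e->pending = true;
	e->modified = false;
	e->pr = false;
	e->valid = true;
}

/*
 * ENCLS[EMODPR]: after the VALID check there is a second check that
 * refuses pages that are still PENDING or MODIFIED.
 */
static int model_emodpr(struct epcm_model *e)
{
	assert(e);
	if (!e->valid)
		return -1;	/* stand-in for the fault on an invalid page */
	if (e->pending || e->modified)
		return SGX_PAGE_NOT_MODIFIABLE;
	e->pr = true;		/* restriction now awaits EACCEPT */
	return 0;
}

/* Simplified ENCLU[EACCEPT]: clears the bits that mark a pending change. */
static void model_eaccept(struct epcm_model *e)
{
	assert(e);
	e->pending = false;
	e->modified = false;
	e->pr = false;
}
```

In this model an EAUG followed directly by EMODPR returns
SGX_PAGE_NOT_MODIFIABLE, so the sequence needs two EACCEPT rounds: one for
the EAUG and one for the EMODPR.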

Reinette

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2
  2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
                   ` (31 preceding siblings ...)
  2022-02-08  0:45 ` [PATCH V2 32/32] selftests/sgx: Page removal stress test Reinette Chatre
@ 2022-02-22 20:27 ` Nathaniel McCallum
  2022-02-22 22:39   ` Reinette Chatre
  32 siblings, 1 reply; 130+ messages in thread
From: Nathaniel McCallum @ 2022-02-22 20:27 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, Jarkko Sakkinen, tglx, bp, Andy Lutomirski, mingo,
	linux-sgx, x86, seanjc, kai.huang, cathy.zhang, cedric.xing,
	haitao.huang, mark.shanahan, hpa, linux-kernel

1. This interface looks very odd to me. mmap() is the kernel interface
for changing user space memory maps. Why are we introducing a new
interface for this? You can just simply add a new mmap flag (i.e.
MAP_SGX_TCS*) and then figure out which SGX instructions to execute
based on the desired state of the memory maps. If you do this, none of
the following ioctls are needed:

* SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
* SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
* SGX_IOC_ENCLAVE_REMOVE_PAGES
* SGX_IOC_ENCLAVE_MODIFY_TYPE

It also means that languages don't have to grow support for all these
ioctls. Instead, they can just reuse the existing mmap() bindings with
the new flag. Also, multiple operations can be combined into a single
mmap() call, amortizing the changes over a single context switch.

2. Automatically adding pages with hard-coded permissions in a fault
handler seems like a really bad idea. How do you distinguish between
accesses which should result in an updated mapping and accesses that
should result in a fault? IMHO, all unmapped page accesses should
result in a page fault. mmap() should be called first to identify the
correct permissions for these pages. Then the page fault handler should
be updated to use the permissions from the mapping when backfilling
physical pages. If I understand correctly, this should also obviate
the need for the weird userspace callback to allow for execute
permissions.

3. Implementing as I've suggested also means that we can lock down an
enclave, for example - after code has been JITed, by closing the file
descriptor. Once the file descriptor used to create the enclave is
closed, no further mmap() can be performed on the enclave. Attempting
to do EACCEPT on an unmapped page will generate a page fault.

* - I'm aware that a new flag might be frowned upon. I see a few other options:
1. reuse an existing flag which doesn't make sense in this context
2. communicate the page type in the offset argument
3. keep SGX_IOC_ENCLAVE_MODIFY_TYPE

On Mon, Feb 7, 2022 at 8:07 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> V1: https://lore.kernel.org/linux-sgx/cover.1638381245.git.reinette.chatre@intel.com/
>
> Changes since V1 that directly impact user space:
> - SGX2 permission changes changed from a single ioctl() named
>   SGX_IOC_PAGE_MODP to two new ioctl()s:
>   SGX_IOC_ENCLAVE_RELAX_PERMISSIONS and
>   SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS, supported by two different
>   parameter structures (SGX_IOC_ENCLAVE_RELAX_PERMISSIONS does
>   not support a result output parameter) (Jarkko).
>
>   User space flow impact: After user space runs ENCLU[EMODPE] it
>   needs to call SGX_IOC_ENCLAVE_RELAX_PERMISSIONS to have PTEs
>   updated. Previously running SGX_IOC_PAGE_MODP in this scenario
>   resulted in EPCM.PR being set but calling
>   SGX_IOC_ENCLAVE_RELAX_PERMISSIONS will not result in EPCM.PR
>   being set anymore and thus no need for an additional
>   ENCLU[EACCEPT].
>
> - SGX_IOC_ENCLAVE_RELAX_PERMISSIONS and
>   SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
>   obtain new permissions from secinfo as parameter instead of
>   the permissions directly (Jarkko).
>
> - ioctl() supporting SGX2 page type change is renamed from
>   SGX_IOC_PAGE_MODT to SGX_IOC_ENCLAVE_MODIFY_TYPE (Jarkko).
>
> - SGX_IOC_ENCLAVE_MODIFY_TYPE obtains new page type from secinfo
>   as parameter instead of the page type directly (Jarkko).
>
> - ioctl() supporting SGX2 page removal is renamed from
>   SGX_IOC_PAGE_REMOVE to SGX_IOC_ENCLAVE_REMOVE_PAGES (Jarkko).
>
> - All ioctl() parameter structures have been renamed as a result of the
>   ioctl() renaming:
>   SGX_IOC_ENCLAVE_RELAX_PERMISSIONS => struct sgx_enclave_relax_perm
>   SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS => struct sgx_enclave_restrict_perm
>   SGX_IOC_ENCLAVE_MODIFY_TYPE => struct sgx_enclave_modt
>   SGX_IOC_ENCLAVE_REMOVE_PAGES => struct sgx_enclave_remove_pages
>
> Changes since V1 that do not directly impact user space:
> - Number of patches in series increased from 25 to 32 primarily because
>   of splitting the original submission:
>   - Wrappers for the new SGX2 functions are introduced in three separate
>     patches replacing the original "x86/sgx: Add wrappers for SGX2
>     functions"
>     (Jarkko).
>   - Moving and renaming sgx_encl_ewb_cpumask() is done with two patches
>     replacing the original "x86/sgx: Use more generic name for enclave
>     cpumask function" (Jarkko).
>   - Support for SGX2 EPCM permission changes is split into two ioctl()s,
>     one for relaxing and one for restricting permissions, each introduced
>     by a new patch replacing the original "x86/sgx: Support enclave page
>     permission changes" (Jarkko).
>   - Extracted code used by existing ioctl()s for usage by new ioctl()s
>     into a new utility in new patch "x86/sgx: Create utility to validate
>     user provided offset and length" (Dave did not specifically ask for
>     this but it addresses his review feedback).
>   - Two new Documentation patches to support the SGX2 work
>     ("Documentation/x86: Introduce enclave runtime management") and
>     a dedicated section on the enclave permission management
>     ("Documentation/x86: Document SGX permission details") (Andy).
> - Most patches were reworked to improve the language by:
>   * aiming to refer to exact item instead of English rephrasing (Jarkko).
>   * use ioctl() instead of ioctl throughout (Dave).
>   * Use "relaxed" instead of "exceed" when referring to permissions
>     (Dave).
> - Improved documentation with several additions to
>   Documentation/x86/sgx.rst.
> - Many smaller changes, please refer to individual patches.
>
> Hi Everybody,
>
> The current Linux kernel support for SGX includes support for SGX1 that
> requires that an enclave be created with properties that accommodate all
> usages over its (the enclave's) lifetime. This includes properties such
> as permissions of enclave pages, the number of enclave pages, and the
> number of threads supported by the enclave.
>
> Consequences of this requirement to have the enclave be created to
> accommodate all usages include:
> * pages needing to support relocated code are required to have RWX
>   permissions for their entire lifetime,
> * an enclave needs to be created with the maximum stack and heap
>   projected to be needed during the enclave's entire lifetime which
>   can be longer than the processes running within it,
> * an enclave needs to be created with support for the maximum number
>   of threads projected to run in the enclave.
>
> Since SGX1 a few more functions were introduced, collectively called
> SGX2, that support modifications to an initialized enclave. Hardware
> supporting these functions is already available as listed on
> https://github.com/ayeks/SGX-hardware
>
> This series adds support for SGX2, also referred to as Enclave Dynamic
> Memory Management (EDMM). This includes:
>
> * Support modifying permissions of regular enclave pages belonging to an
>   initialized enclave. New permissions are not allowed to exceed the
>   originally vetted permissions. For example, RX isn't allowed unless
>   the page was originally added with RX or RWX.
>   Modifying permissions is accomplished with two new ioctl()s:
>   SGX_IOC_ENCLAVE_RELAX_PERMISSIONS and
>   SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS.
>
> * Support dynamic addition of regular enclave pages to an initialized
>   enclave. Pages are added with RW permissions as their "originally
>   vetted permissions" (see previous bullet) and thus not allowed to
>   be made executable at this time. Enabling dynamically added pages
>   to obtain executable permissions requires integration with user space
>   policy that is deferred until the core SGX2 enabling is complete.
>   Pages are dynamically added to an initialized enclave from the SGX
>   page fault handler.
>
> * Support expanding an initialized enclave to accommodate more threads.
>   More threads can be accommodated by an enclave with the addition of
>   Thread Control Structure (TCS) pages, which is done by changing the
>   type of regular enclave pages to TCS pages using a new ioctl()
>   SGX_IOC_ENCLAVE_MODIFY_TYPE.
>
> * Support removing regular and TCS pages from an initialized enclave.
>   Removing pages is accomplished in two stages as supported by two new
>   ioctl()s SGX_IOC_ENCLAVE_MODIFY_TYPE (same ioctl() as mentioned in
>   previous bullet) and SGX_IOC_ENCLAVE_REMOVE_PAGES.
>
> * Tests covering all the new flows, some edge cases, and one
>   comprehensive stress scenario.
>
> No additional work is needed to support SGX2 in a virtualized
> environment. The tests included in this series can also be run from
> a guest and were tested with the recent QEMU release based on 6.2.0
> that supports SGX.
>
> Patches 1 to 14 prepare the existing code for SGX2 support by
> introducing the SGX2 functions, making sure pages remain accessible
> after their enclave permissions are changed, and tracking enclave page
> types as well as runtime permissions as needed by SGX2.
>
> Patches 15 through 32 are a mix of x86/sgx and selftests/sgx patches
> that follow the format where first an SGX2 feature is
> enabled and then followed by tests of the new feature and/or
> tests of scenarios that combine SGX2 features enabled up to that point.
>
> In two cases (patches 20 and 31) code in support of SGX2 is separated
> out with detailed motivation to support the review.
>
> This series is based on v5.17-rc2 with the following fixes additionally
> applied:
>
> "selftests/sgx: Remove extra newlines in test output"
>  https://lore.kernel.org/linux-sgx/16317683a1822bbd44ab3ca48b60a9a217ac24de.1643754040.git.reinette.chatre@intel.com/
> "selftests/sgx: Ensure enclave data available during debug print"
>  https://lore.kernel.org/linux-sgx/eaaeeb9122916d831942fc8a3043c687137314c1.1643754040.git.reinette.chatre@intel.com/
> "selftests/sgx: Do not attempt enclave build without valid enclave"
>  https://lore.kernel.org/linux-sgx/4e4ea6d70c286c209964bec1e8d29ac8e692748b.1643754040.git.reinette.chatre@intel.com/
> "selftests/sgx: Fix NULL-pointer-dereference upon early test failure"
>  https://lore.kernel.org/linux-sgx/89824888783fd8e770bfc64530c7549650a41851.1643754040.git.reinette.chatre@intel.com/
> "x86/sgx: Add poison handling to reclaimer"
>  https://lore.kernel.org/linux-sgx/dcc95eb2aaefb042527ac50d0a50738c7c160dac.1643830353.git.reinette.chatre@intel.com/
> "x86/sgx: Silence softlockup detection when releasing large enclaves"
>  https://lore.kernel.org/linux-sgx/b5e9f218064aa76e3026f778e1ad0a1d823e3db8.1643133224.git.reinette.chatre@intel.com/
>
> Your feedback will be greatly appreciated.
>
> Regards,
>
> Reinette
>
> Reinette Chatre (32):
>   x86/sgx: Add short descriptions to ENCLS wrappers
>   x86/sgx: Add wrapper for SGX2 EMODPR function
>   x86/sgx: Add wrapper for SGX2 EMODT function
>   x86/sgx: Add wrapper for SGX2 EAUG function
>   Documentation/x86: Document SGX permission details
>   x86/sgx: Support VMA permissions more relaxed than enclave permissions
>   x86/sgx: Add pfn_mkwrite() handler for present PTEs
>   x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic
>     permission changes
>   x86/sgx: Export sgx_encl_ewb_cpumask()
>   x86/sgx: Rename sgx_encl_ewb_cpumask() as sgx_encl_cpumask()
>   x86/sgx: Move PTE zap code to new sgx_zap_enclave_ptes()
>   x86/sgx: Make sgx_ipi_cb() available internally
>   x86/sgx: Create utility to validate user provided offset and length
>   x86/sgx: Keep record of SGX page type
>   x86/sgx: Support relaxing of enclave page permissions
>   x86/sgx: Support restricting of enclave page permissions
>   selftests/sgx: Add test for EPCM permission changes
>   selftests/sgx: Add test for TCS page permission changes
>   x86/sgx: Support adding of pages to an initialized enclave
>   x86/sgx: Tighten accessible memory range after enclave initialization
>   selftests/sgx: Test two different SGX2 EAUG flows
>   x86/sgx: Support modifying SGX page type
>   x86/sgx: Support complete page removal
>   Documentation/x86: Introduce enclave runtime management section
>   selftests/sgx: Introduce dynamic entry point
>   selftests/sgx: Introduce TCS initialization enclave operation
>   selftests/sgx: Test complete changing of page type flow
>   selftests/sgx: Test faulty enclave behavior
>   selftests/sgx: Test invalid access to removed enclave page
>   selftests/sgx: Test reclaiming of untouched page
>   x86/sgx: Free up EPC pages directly to support large page ranges
>   selftests/sgx: Page removal stress test
>
>  Documentation/x86/sgx.rst                     |   64 +-
>  arch/x86/include/asm/sgx.h                    |    8 +
>  arch/x86/include/uapi/asm/sgx.h               |   81 +
>  arch/x86/kernel/cpu/sgx/encl.c                |  334 +++-
>  arch/x86/kernel/cpu/sgx/encl.h                |   12 +-
>  arch/x86/kernel/cpu/sgx/encls.h               |   33 +
>  arch/x86/kernel/cpu/sgx/ioctl.c               |  831 ++++++++-
>  arch/x86/kernel/cpu/sgx/main.c                |   70 +-
>  arch/x86/kernel/cpu/sgx/sgx.h                 |    3 +
>  tools/testing/selftests/sgx/defines.h         |   23 +
>  tools/testing/selftests/sgx/load.c            |   41 +
>  tools/testing/selftests/sgx/main.c            | 1484 +++++++++++++++++
>  tools/testing/selftests/sgx/main.h            |    1 +
>  tools/testing/selftests/sgx/test_encl.c       |   68 +
>  .../selftests/sgx/test_encl_bootstrap.S       |    6 +
>  15 files changed, 2963 insertions(+), 96 deletions(-)
>
>
> base-commit: 26291c54e111ff6ba87a164d85d4a4e134b7315c
> prerequisite-patch-id: 3c3908f1c3536cc04ba020fb3e81f51395b44223
> prerequisite-patch-id: e860923423c3387cf6fdcceb2fa41dc5e9454ef4
> prerequisite-patch-id: 986260c8bc4255eb61e2c4afa88d2b723e376423
> prerequisite-patch-id: ba014a99fced2b57d5d9e2dfb9d80ddf4333c13e
> prerequisite-patch-id: 65cbb72889b6353a5639b984615d12019136b270
> prerequisite-patch-id: e3296a2f0345a77c8a7ca91f76697ae2e1dca21f
> --
> 2.25.1
>

* Re: [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2
  2022-02-22 20:27 ` [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Nathaniel McCallum
@ 2022-02-22 22:39   ` Reinette Chatre
  2022-02-23 13:24     ` Nathaniel McCallum
  0 siblings, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-02-22 22:39 UTC (permalink / raw)
  To: Nathaniel McCallum
  Cc: dave.hansen, Jarkko Sakkinen, tglx, bp, Andy Lutomirski, mingo,
	linux-sgx, x86, seanjc, kai.huang, cathy.zhang, cedric.xing,
	haitao.huang, mark.shanahan, hpa, linux-kernel

Hi Nathaniel,

On 2/22/2022 12:27 PM, Nathaniel McCallum wrote:
> 1. This interface looks very odd to me. mmap() is the kernel interface
> for changing user space memory maps. Why are we introducing a new
> interface for this?

mmap() is the kernel interface used to create new mappings in the
virtual address space of the calling process. This is different from
the permissions and properties of the underlying file/memory being mapped.

A new interface is introduced because changes need to be made to the
permissions and properties of the underlying enclave. A new virtual
address space is not needed nor should existing VMAs be impacted.

This is similar to how mmap() is not used to change file permissions.

VMA permissions are separate from enclave page permissions as found in
the EPCM (Enclave Page Cache Map). The current implementation (SGX1) already
distinguishes between the VMA and EPCM permissions - for example, it is
already possible to create a read-only VMA from enclave pages that have
RW EPCM permissions. mmap() of a portion of EPC memory with a particular
permission does not imply that the underlying EPCM permissions (should) have
that permission.
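
The relationship between the two permission sets can be modelled as a
subset check. This is only a minimal sketch for illustration: the
PERM_* bits and vma_perms_allowed() are hypothetical names, not the
kernel's definitions or its actual checking code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical permission bits for illustration; not kernel definitions. */
enum { PERM_R = 1 << 0, PERM_W = 1 << 1, PERM_X = 1 << 2 };

/*
 * A mapping request is acceptable in this model when it asks for no
 * permission that the enclave page lacks, i.e. the requested bits are a
 * subset of the enclave page's bits.
 */
static bool vma_perms_allowed(unsigned int requested, unsigned int enclave)
{
	const unsigned int all = PERM_R | PERM_W | PERM_X;

	/* Reject bits outside the modelled R/W/X set. */
	assert((requested & ~all) == 0 && (enclave & ~all) == 0);

	return (requested & ~enclave) == 0;
}
```

In this model a read-only VMA over a page with RW enclave permissions
passes the check, while an RX request against the same page does not.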

> You can just simply add a new mmap flag (i.e.
> MAP_SGX_TCS*) and then figure out which SGX instructions to execute
> based on the desired state of the memory maps. If you do this, none of
> the following ioctls are needed:
> 
> * SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
> * SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
> * SGX_IOC_ENCLAVE_REMOVE_PAGES
> * SGX_IOC_ENCLAVE_MODIFY_TYPE
> 
> It also means that languages don't have to grow support for all these
> ioctls. Instead, they can just reuse the existing mmap() bindings with
> the new flag. Also, multiple operations can be combined into a single
> mmap() call, amortizing the changes over a single context switch.
> 
> 2. Automatically adding pages with hard-coded permissions in a fault
> handler seems like a really bad idea.

Could you please elaborate why this is a bad idea?

> How do you distinguish between
> accesses which should result in an updated mapping and accesses that
> should result in a fault?

Accesses that should result in an updated mapping have two requirements:
(a) address accessed belongs to the enclave based on the address
    range specified during enclave create
(b) there is no backing enclave page for the address
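
Requirements (a) and (b) can be written as a small predicate. This is a
sketch under stated assumptions: fault_should_add_page() and its
parameters are hypothetical and only model the decision, they are not
the fault handler from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * base/size:        address range specified at enclave creation (assumed
 *                   to be known to the caller)
 * addr:             faulting address
 * has_backing_page: whether an enclave page already backs addr
 */
static bool fault_should_add_page(uint64_t base, uint64_t size,
				  uint64_t addr, bool has_backing_page)
{
	/* The range must not wrap around the address space. */
	assert(size <= UINT64_MAX - base);

	/* (a) the address must belong to the enclave's address range */
	if (addr < base || addr - base >= size)
		return false;

	/* (b) there must be no backing enclave page for the address */
	return !has_backing_page;
}
```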

> IMHO, all unmapped page accesses should
> result in a page fault. mmap() should be called first to identify the
> correct permissions for these pages.
> Then the page handler should be
> updated to use the permissions from the mapping when backfilling
> physical pages. If I understand correctly, this should also obviate

Regular enclave pages can _only_ be dynamically added with RW permission.

SGX2's support for adding regular pages to an enclave via the EAUG
instruction is architecturally set at RW. The OS cannot change those permissions
via the EAUG instruction nor can the OS do so with a different/additional
instruction because:
* the OS is not able to relax permissions since that can only be done from
within the enclave with ENCLU[EMODPE], thus it is not possible for the OS to
dynamically add pages via EAUG as RW and then relax permissions to RWX. 
* the OS is not able to EAUG a page and immediately attempt an EMODPR either
as Jarkko also recently inquired about:
https://lore.kernel.org/linux-sgx/80f3d7b9-e3d5-b2c0-7707-710bf6f5081e@intel.com/

> the need for the weird userspace callback to allow for execute
> permissions.

User policy integration would always be required to allow execute
permissions on a writable page. This is not expected to be a userspace
callback but instead integration with existing user policy subsystem(s).

> 
> 3. Implementing as I've suggested also means that we can lock down an
> enclave, for example - after code has been JITed, by closing the file
> descriptor. Once the file descriptor used to create the enclave is
> closed, no further mmap() can be performed on the enclave. Attempting
> to do EACCEPT on an unmapped page will generate a page fault.

This is not clear to me. If the file descriptor is closed and no further
mmap() is allowed then how would a process be able to enter the enclave
to execute code within it?

This series does indeed lock down the address range to ensure that it is
not possible to map memory that does not belong to the enclave after the
enclave is created. Please see:
https://lore.kernel.org/linux-sgx/1b833dbce6c937f71523f4aaf4b2181b9673519f.1644274683.git.reinette.chatre@intel.com/

> 
> * - I'm aware that a new flag might be frowned upon. I see a few other options:
> 1. reuse an existing flag which doesn't make sense in this context
> 2. communicate the page type in the offset argument
> 3. keep SGX_IOC_ENCLAVE_MODIFY_TYPE
> 

Reinette

* Re: [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2
  2022-02-22 22:39   ` Reinette Chatre
@ 2022-02-23 13:24     ` Nathaniel McCallum
  2022-02-23 18:25       ` Reinette Chatre
  0 siblings, 1 reply; 130+ messages in thread
From: Nathaniel McCallum @ 2022-02-23 13:24 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, Jarkko Sakkinen, tglx, bp, Andy Lutomirski, mingo,
	linux-sgx, x86, seanjc, kai.huang, cathy.zhang, cedric.xing,
	haitao.huang, mark.shanahan, hpa, linux-kernel

On Tue, Feb 22, 2022 at 5:39 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Nathaniel,
>
> On 2/22/2022 12:27 PM, Nathaniel McCallum wrote:
> > 1. This interface looks very odd to me. mmap() is the kernel interface
> > for changing user space memory maps. Why are we introducing a new
> > interface for this?
>
> mmap() is the kernel interface used to create new mappings in the
> virtual address space of the calling process. This is different from
> the permissions and properties of the underlying file/memory being mapped.
>
> A new interface is introduced because changes need to be made to the
> permissions and properties of the underlying enclave. A new virtual
> address space is not needed nor should existing VMAs be impacted.
>
> This is similar to how mmap() is not used to change file permissions.
>
> VMA permissions are separate from enclave page permissions as found in
> the EPCM (Enclave Page Cache Map). The current implementation (SGX1) already
> distinguishes between the VMA and EPCM permissions - for example, it is
> already possible to create a read-only VMA from enclave pages that have
> RW EPCM permissions. mmap() of a portion of EPC memory with a particular
> permission does not imply that the underlying EPCM permissions (should) have
> that permission.

Yes. BUT... unlike the file permissions, this leaks an implementation detail.

The user process is governed by VMA permissions. And during enclave
creation, it had to mmap() all the enclave regions to their final VMA
permissions. So during enclave creation you have to use mmap() but
after enclave creation you use custom APIs? That's inconsistent at
best.

Forcing userspace to worry about the (mostly undocumented!)
interactions between EPC, PTE and VMA permissions makes these APIs
hard to use and difficult to reason about.

When I call SGX_IOC_ENCLAVE_RELAX_PERMISSIONS, do I also have to call
mmap() to update the VMA permissions to match? It isn't clear. Nor is
it really clear why I'm calling completely separate APIs.

> > You can just simply add a new mmap flag (i.e.
> > MAP_SGX_TCS*) and then figure out which SGX instructions to execute
> > based on the desired state of the memory maps. If you do this, none of
> > the following ioctls are needed:
> >
> > * SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
> > * SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
> > * SGX_IOC_ENCLAVE_REMOVE_PAGES
> > * SGX_IOC_ENCLAVE_MODIFY_TYPE
> >
> > It also means that languages don't have to grow support for all these
> > ioctls. Instead, they can just reuse the existing mmap() bindings with
> > the new flag. Also, multiple operations can be combined into a single
> > mmap() call, amortizing the changes over a single context switch.
> >
> > 2. Automatically adding pages with hard-coded permissions in a fault
> > handler seems like a really bad idea.
>
> Could you please elaborate why this is a bad idea?

Because implementations that miss this subtlety suddenly have pages
with magic permissions. Magic is bad. Explicit is good.

> > How do you distinguish between
> > accesses which should result in an updated mapping and accesses that
> > should result in a fault?
>
> Accesses that should result in an updated mapping have two requirements:
> (a) address accessed belongs to the enclave based on the address
>     range specified during enclave create
> (b) there is no backing enclave page for the address

What happens if the enclave is buggy? Or has been compromised. In both
of those cases, there should be a userspace visible fault and pages
should not be added.

> > IMHO, all unmapped page accesses should
> > result in a page fault. mmap() should be called first to identify the
> > correct permissions for these pages.
> > Then the page handler should be
> > updated to use the permissions from the mapping when backfilling
> > physical pages. If I understand correctly, this should also obviate
>
> Regular enclave pages can _only_ be dynamically added with RW permission.
>
> SGX2's support for adding regular pages to an enclave via the EAUG
> instruction is architecturally set at RW. The OS cannot change those permissions
> via the EAUG instruction nor can the OS do so with a different/additional
> instruction because:
> * the OS is not able to relax permissions since that can only be done from
> within the enclave with ENCLU[EMODPE], thus it is not possible for the OS to
> dynamically add pages via EAUG as RW and then relax permissions to RWX.
> * the OS is not able to EAUG a page and immediately attempt an EMODPR either
> as Jarkko also recently inquired about:
> https://lore.kernel.org/linux-sgx/80f3d7b9-e3d5-b2c0-7707-710bf6f5081e@intel.com/

This design looks... unfinished. EAUG takes a PAGEINFO in RBX, but
PAGEINFO.SECINFO must be zeroed and EAUG instead sets magic hard-coded
permissions. Why doesn't EAUG just respect the permissions in
PAGEINFO.SECINFO? We aren't told.

Further, if the enclave can do EMODPE, why does
SGX_IOC_ENCLAVE_RELAX_PERMISSIONS even exist? None of the
documentation explains what this ioctl even does. Does it update PTE
permissions? VMA permissions? Nobody knows without reading the source
code.

Userspace should not be bothered with the subtle details of the
interaction between EPC, PTE and VMA permissions. But this API does
everything it can do to expose all these details to userspace. And it
doesn't bother to document them (probably because it is hard). It
would be much better to avoid exposing these details to userspace.

IMHO, there should be a simple flow like this (if EAUG respects
PAGEINFO.SECINFO):

1. Non-enclave calls mmap()/munmap().
2. Enclave issues EACCEPT, if necessary.
3. Enclave issues EMODPE, if necessary.

Notice that in the first step above, during the mmap() call, the
kernel ensures that EPC, PTE and VMA are in sync and fails if they
cannot be made to be compatible. Also note that in the above flow EAUG
instructions can be efficiently batched.

Given the current poor state of the EAUG instruction, we might need to
do this flow instead:

1. Enclave issues EACCEPT, if necessary. (Add RW pages...)
2. Non-enclave calls mmap()/munmap().
3. Enclave issues EACCEPT, if necessary.
4. Enclave issues EMODPE, if necessary.

However, doing EAUG only via the page access handler means that there
is no way to batch EAUG instructions and this forces a context switch
for every page you want to add. This has to be terrible for
performance. Note specifically that the SDM calls out batching, which
is currently impossible under this patch set. 35.5.7 - "Page
allocation operations may be batched to improve efficiency."

As it stands today, if I want to add 256MiB of pages to an enclave,
I'll have to do 2^16 context switches. That doesn't seem scalable.
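
The arithmetic behind that number, as a minimal sketch assuming 4 KiB
enclave pages and one fault per added page (pages_for_bytes() and
PAGE_SIZE_BYTES are names made up for this example):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL /* assumption: 4 KiB enclave pages */

static_assert((PAGE_SIZE_BYTES & (PAGE_SIZE_BYTES - 1)) == 0,
	      "page size must be a power of two");

/* Number of pages needed to cover 'bytes', rounding up. */
static uint64_t pages_for_bytes(uint64_t bytes)
{
	return bytes / PAGE_SIZE_BYTES + (bytes % PAGE_SIZE_BYTES != 0);
}
```

pages_for_bytes(256ULL << 20) is 65536, i.e. 2^16 pages, each of which
would be added by its own page fault.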

> > the need for the weird userspace callback to allow for execute
> > permissions.
>
> User policy integration would always be required to allow execute
> permissions on a writable page. This is not expected to be a userspace
> callback but instead integration with existing user policy subsystem(s).

Why? This isn't documented.

> > 3. Implementing as I've suggested also means that we can lock down an
> > enclave, for example - after code has been JITed, by closing the file
> > descriptor. Once the file descriptor used to create the enclave is
> > closed, no further mmap() can be performed on the enclave. Attempting
> > to do EACCEPT on an unmapped page will generate a page fault.
>
> This is not clear to me. If the file descriptor is closed and no further
> mmap() is allowed then how would a process be able to enter the enclave
> to execute code within it?

EENTER (or the vdso function) with the address of a TCS page, like
normal. In Enarx, we don't retain the enclave fd after the final
mmap() following EINIT. Everything works just fine.

> This series does indeed lock down the address range to ensure that it is
> not possible to map memory that does not belong to the enclave after the
> enclave is created. Please see:
> https://lore.kernel.org/linux-sgx/1b833dbce6c937f71523f4aaf4b2181b9673519f.1644274683.git.reinette.chatre@intel.com/

That's not what I'm talking about. I'm talking about a workflow like this:

1. Enclave initialization: ECREATE ... EINIT
2. EENTER
3. Enclave JITs some code (changes page permissions)
4. EEXIT
5. Close enclave fd.
6. EENTER
7. If an enclave attempts page modifications, a fault occurs.

Think of this similar to seccomp(). The enclave wants to do some
dynamic page table manipulation. But then it wants to lock down page
table modification so that, if compromised, attackers have no ability
to obtain RWX permissions.

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-02-22 18:35     ` Reinette Chatre
@ 2022-02-23 15:46       ` Jarkko Sakkinen
  2022-02-23 19:55         ` Reinette Chatre
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-02-23 15:46 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

On Tue, Feb 22, 2022 at 10:35:04AM -0800, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 2/20/2022 4:49 PM, Jarkko Sakkinen wrote:
> > On Mon, Feb 07, 2022 at 04:45:38PM -0800, Reinette Chatre wrote:
> 
> ...
> 
> >> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> >> index 5c678b27bb72..b0ffb80bc67f 100644
> >> --- a/arch/x86/include/uapi/asm/sgx.h
> >> +++ b/arch/x86/include/uapi/asm/sgx.h
> >> @@ -31,6 +31,8 @@ enum sgx_page_flags {
> >>  	_IO(SGX_MAGIC, 0x04)
> >>  #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
> >>  	_IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
> >> +#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
> >> +	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
> >>  
> >>  /**
> >>   * struct sgx_enclave_create - parameter structure for the
> >> @@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm {
> >>  	__u64 count;
> >>  };
> >>  
> >> +/**
> >> + * struct sgx_enclave_restrict_perm - parameters for ioctl
> >> + *                                    %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
> >> + * @offset:	starting page offset (page aligned relative to enclave base
> >> + *		address defined in SECS)
> >> + * @length:	length of memory (multiple of the page size)
> >> + * @secinfo:	address for the SECINFO data containing the new permission bits
> >> + *		for pages in range described by @offset and @length
> >> + * @result:	(output) SGX result code of ENCLS[EMODPR] function
> >> + * @count:	(output) bytes successfully changed (multiple of page size)
> >> + */
> >> +struct sgx_enclave_restrict_perm {
> >> +	__u64 offset;
> >> +	__u64 length;
> >> +	__u64 secinfo;
> >> +	__u64 result;
> >> +	__u64 count;
> >> +};
> >> +
> >>  struct sgx_enclave_run;
> >>  
> >>  /**
> 
> ...
> 
> > 
> > Just a suggestion but these might be a bit less cluttered explanations of
> > the fields:
> > 
> > /// SGX_IOC_ENCLAVE_RELAX_PERMISSIONS parameter structure
> > #[repr(C)]
> > pub struct RelaxPermissions {
> >     /// In: starting page offset
> >     offset: u64,
> >     /// In: length of the address range (multiple of the page size)
> >     length: u64,
> >     /// In: SECINFO containing the relaxed permissions
> >     secinfo: u64,
> >     /// Out: length of the address range successfully changed
> >     count: u64,
> > }
> > 
> > /// SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS parameter structure
> > #[repr(C)]
> > pub struct RestrictPermissions {
> >     /// In: starting page offset
> >     offset: u64,
> >     /// In: length of the address range (multiple of the page size)
> >     length: u64,
> >     /// In: SECINFO containing the restricted permissions
> >     secinfo: u64,
> > >     /// Out: ENCLS[EMODPR] return value
> >     result: u64,
> >     /// Out: length of the address range successfully changed
> >     count: u64,
> > }
> 
> In your proposal you shorten the descriptions from the current implementation.
> I do consider the removed information valuable since I believe that it helps
> users understand the kernel interface requirements without needing to be
> familiar with or dig into the kernel code to understand how the provided data
> is used.
> 
> For example, you shorten offset to "starting page offset", but what was removed
> was the requirement that this offset has to be page aligned and what the offset
> is relative to. I do believe summarizing these requirements upfront helps
> a user space developer by not needing to dig through kernel code later
> in order to understand why an -EINVAL was received.
> 
>  
> > I can live with the current ones too but I rewrote them so that I can
> > quickly make sense of the fields later. It's Rust code but the point is
> > the documentation...
> 
> Since you do seem to be ok with the current descriptions I would prefer
> to keep them.

Yeah, they are fine to me.
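
For reference, the requirements on @offset and @length discussed above
could be pre-checked in user space along these lines. This is only a
sketch: the struct repeats the layout quoted from the patch so that the
example is self-contained, params_look_valid() and ENCL_PAGE_SIZE are
hypothetical, and the range check against the enclave size is an
assumption about what the kernel would also reject.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ENCL_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */

/* Same field layout as struct sgx_enclave_restrict_perm quoted above. */
struct restrict_perm_params {
	uint64_t offset;  /* page aligned, relative to enclave base address */
	uint64_t length;  /* multiple of the page size */
	uint64_t secinfo; /* address of SECINFO with the new permission bits */
	uint64_t result;  /* (output) SGX result code of ENCLS[EMODPR] */
	uint64_t count;   /* (output) bytes successfully changed */
};

static bool params_look_valid(const struct restrict_perm_params *p,
			      uint64_t enclave_size)
{
	assert(p);

	/* offset must be page aligned */
	if (p->offset % ENCL_PAGE_SIZE)
		return false;

	/* length must be a non-zero multiple of the page size */
	if (p->length == 0 || p->length % ENCL_PAGE_SIZE)
		return false;

	/* assumed: the range must fit inside the enclave (overflow safe) */
	if (p->offset >= enclave_size || p->length > enclave_size - p->offset)
		return false;

	return true;
}
```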

> > Also, it should not be too much trouble to use the struct in user space
> > code even if the struct names are struct sgx_enclave_relax_permissions and
> > struct sgx_enclave_restrict_permissions, given that you most likely have
> > exactly single call-site in the run-time.
> 
> Are you requesting that I make the following name changes?
> struct sgx_enclave_relax_perm -> struct sgx_enclave_relax_permissions
> struct sgx_enclave_restrict_perm -> struct sgx_enclave_restrict_permissions
> 
> If so, do you want the function names also written out in this way?
> sgx_enclave_relax_perm()        -> sgx_enclave_relax_permissions()
> sgx_ioc_enclave_relax_perm()    -> sgx_ioc_enclave_relax_permissions()
> sgx_enclave_restrict_perm()     -> sgx_enclave_restrict_permissions()
> sgx_ioc_enclave_restrict_perm() -> sgx_ioc_enclave_restrict_permissions()

Yes, unless you have a specific reason to shorten them :-)

> > Other than that, looks quite good.
> 
> Thank you very much for reviewing and testing this work.

NP
 
> Reinette

BR, Jarkko

* Re: [PATCH V2 19/32] x86/sgx: Support adding of pages to an initialized enclave
  2022-02-22 19:19         ` Reinette Chatre
@ 2022-02-23 15:46           ` Jarkko Sakkinen
  0 siblings, 0 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-02-23 15:46 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

On Tue, Feb 22, 2022 at 11:19:11AM -0800, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 2/20/2022 10:40 AM, Jarkko Sakkinen wrote:
> ...
>  
> > Do you know if it is possible to do EAUG, EMODPR and then do a single
> > EACCEPT for both? Just looking at pseudo-code, it looked doable but
> > I need to check this.
> > 
> > I.e. EAUG has this
> > 
> > EPCM(DS:RCX).BLOCKED := 0;
> > EPCM(DS:RCX).PENDING := 1;
> > EPCM(DS:RCX).MODIFIED := 0;
> > EPCM(DS:RCX).PR := 0;
> > (* associate the EPCPAGE with the SECS by storing the SECS identifier of DS:TMP_SECS *)
> > Update EPCM(DS:RCX) SECS identifier to reference DS:TMP_SECS identifier;
> > (* Set EPCM valid fields *)
> > EPCM(DS:RCX).VALID := 1;
> > 
> > And EMODPR only checks .VALID.
> 
> After that check there is also:
> IF (EPCM(DS:RCX).PENDING is not 0 or (EPCM(DS:RCX).MODIFIED is not 0) )
>     THEN
>         RFLAGS.ZF := 1;
>         RAX := SGX_PAGE_NOT_MODIFIABLE;
>         GOTO DONE;
> FI;
> 
> Attempting the SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl() on a recently
> added page (EAUG) that has not yet been EACCEPTed is thus expected to fail
> with errno of EFAULT (indicating ENCLS[EMODPR] failure) and the returned
> structure's result field set to 20 (SGX_PAGE_NOT_MODIFIABLE).
> 
> I confirmed this behavior by modifying the "augment" kselftest test by adding
> a SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS call between the new memory access and
> the EACCEPT.

Thank you, also Mark confirmed this.
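
The quoted pseudo-code can be condensed into a tiny state model that
shows why the order EAUG, EMODPR, EACCEPT fails. This is a simplified
sketch: struct epcm_model and the model_*() helpers are hypothetical,
only track the flags mentioned in this thread, and do not implement the
real instructions. The value 20 is the SGX_PAGE_NOT_MODIFIABLE result
quoted above.

```c
#include <assert.h>
#include <stdbool.h>

#define SGX_PAGE_NOT_MODIFIABLE 20 /* result code quoted in this thread */

/* Simplified model of the EPCM flags discussed above. */
struct epcm_model {
	bool valid;
	bool blocked;
	bool pending;
	bool modified;
	bool pr;
};

/* EAUG leaves the page valid and PENDING until the enclave accepts it. */
static struct epcm_model model_eaug(void)
{
	struct epcm_model e = {
		.valid = true,
		.blocked = false,
		.pending = true,
		.modified = false,
		.pr = false,
	};
	return e;
}

/* EACCEPT from within the enclave clears the outstanding state. */
static void model_eaccept(struct epcm_model *e)
{
	e->pending = false;
	e->modified = false;
	e->pr = false;
}

/* EMODPR refuses pages that are still PENDING or MODIFIED. */
static int model_emodpr(struct epcm_model *e)
{
	assert(e && e->valid);

	if (e->pending || e->modified)
		return SGX_PAGE_NOT_MODIFIABLE;

	e->pr = true; /* restriction now awaits its own EACCEPT */
	return 0;
}
```

In this model, calling model_emodpr() right after model_eaug() returns
SGX_PAGE_NOT_MODIFIABLE, and it only succeeds once model_eaccept() has
run, so a single EACCEPT cannot cover both steps.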

BR, Jarkko

* Re: [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2
  2022-02-23 13:24     ` Nathaniel McCallum
@ 2022-02-23 18:25       ` Reinette Chatre
  2022-03-02 16:57         ` Nathaniel McCallum
  0 siblings, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-02-23 18:25 UTC (permalink / raw)
  To: Nathaniel McCallum
  Cc: dave.hansen, Jarkko Sakkinen, tglx, bp, Andy Lutomirski, mingo,
	linux-sgx, x86, seanjc, kai.huang, cathy.zhang, cedric.xing,
	haitao.huang, mark.shanahan, hpa, linux-kernel

Hi Nathaniel,

On 2/23/2022 5:24 AM, Nathaniel McCallum wrote:
> On Tue, Feb 22, 2022 at 5:39 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Nathaniel,
>>
>> On 2/22/2022 12:27 PM, Nathaniel McCallum wrote:
>>> 1. This interface looks very odd to me. mmap() is the kernel interface
>>> for changing user space memory maps. Why are we introducing a new
>>> interface for this?
>>
>> mmap() is the kernel interface used to create new mappings in the
>> virtual address space of the calling process. This is different from
>> the permissions and properties of the underlying file/memory being mapped.
>>
>> A new interface is introduced because changes need to be made to the
>> permissions and properties of the underlying enclave. A new virtual
>> address space is not needed nor should existing VMAs be impacted.
>>
>> This is similar to how mmap() is not used to change file permissions.
>>
>> VMA permissions are separate from enclave page permissions as found in
>> the EPCM (Enclave Page Cache Map). The current implementation (SGX1) already
>> distinguishes between the VMA and EPCM permissions - for example, it is
>> already possible to create a read-only VMA from enclave pages that have
>> RW EPCM permissions. mmap() of a portion of EPC memory with a particular
>> permission does not imply that the underlying EPCM permissions (should) have
>> that permission.
> 
> Yes. BUT... unlike the file permissions, this leaks an implementation detail.

Not really - just like a RW file can be mapped read-only or RW, RW enclave
memory can be mapped read-only or RW.

> 
> The user process is governed by VMA permissions. And during enclave
> creation, it had to mmap() all the enclave regions to their final VMA
> permissions. So during enclave creation you have to use mmap() but
> after enclave creation you use custom APIs? That's inconsistent at
> best.

No. ioctl()s are consistently used to manage enclave memory.

The existing ioctl()s SGX_IOC_ENCLAVE_CREATE, SGX_IOC_ENCLAVE_ADD_PAGES,
and SGX_IOC_ENCLAVE_INIT are used to set up and initialize the enclave memory.

The new ioctl()s are used to manage enclave memory after enclave initialization.

The enclave memory is thus managed with a consistent interface.

mmap() is required before SGX_IOC_ENCLAVE_CREATE to obtain a base address
for the enclave that is required by the ioctl(). The rest of the ioctl()s,
existing and new, are consistent in interface by not requiring a memory
mapping but instead work from an offset from the base address.
 
> Forcing userspace to worry about the (mostly undocumented!)
> interactions between EPC, PTE and VMA permissions makes these APIs
> hard to use and difficult to reason about.

This is not new. The current SGX1 user space is already prevented from
creating a mapping of enclave memory that is more relaxed than the enclave
memory. For example, if the enclave memory has RW EPCM permissions then it
is not possible to mmap() that memory as RWX.
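
A minimal sketch of the rule described above, assuming the permissions are expressed as PROT_* bits. The helper name mapping_allowed() is hypothetical and is not the kernel's actual check; it only shows that a mapping may request a subset of the enclave page's permissions, never a superset.

```c
#include <stdbool.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: a mapping request is acceptable only if it asks
 * for no permission bit that the enclave page itself lacks.
 */
static bool mapping_allowed(int page_prot, int requested_prot)
{
	int mask = PROT_READ | PROT_WRITE | PROT_EXEC;

	return ((requested_prot & mask) & ~page_prot) == 0;
}
```

With RW enclave memory, mapping_allowed() accepts a read-only or RW request and rejects an RWX request.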

> 
> When I call SGX_IOC_ENCLAVE_RELAX_PERMISSIONS, do I also have to call
> mmap() to update the VMA permissions to match? It isn't clear. Nor is

mprotect() may be the better call to use.

> it really clear why I'm calling completely separate APIs.
> 
>>> You can just simply add a new mmap flag (i.e.
>>> MAP_SGX_TCS*) and then figure out which SGX instructions to execute
>>> based on the desired state of the memory maps. If you do this, none of
>>> the following ioctls are needed:
>>>
>>> * SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
>>> * SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
>>> * SGX_IOC_ENCLAVE_REMOVE_PAGES
>>> * SGX_IOC_ENCLAVE_MODIFY_TYPE
>>>
>>> It also means that languages don't have to grow support for all these
>>> ioctls. Instead, they can just reuse the existing mmap() bindings with
>>> the new flag. Also, multiple operations can be combined into a single
>>> mmap() call, amortizing the changes over a single context switch.
>>>
>>> 2. Automatically adding pages with hard-coded permissions in a fault
>>> handler seems like a really bad idea.
>>
>> Could you please elaborate why this is a bad idea?
> 
> Because implementations that miss this subtlety suddenly have pages
> with magic permissions. Magic is bad. Explicit is good.
> 

There is no magic. Any new pages have to be accepted by the enclave.
The enclave will not be able to access these pages unless explicitly
accepted, ENCLU[EACCEPT], from within the enclave.

>>> How do you distinguish between
>>> accesses which should result in an updated mapping and accesses that
>>> should result in a fault?
>>
>> Accesses that should result in an updated mapping have two requirements:
>> (a) address accessed belongs to the enclave based on the address
>>     range specified during enclave create
>> (b) there is no backing enclave page for the address
> 
> What happens if the enclave is buggy? Or has been compromised. In both
> of those cases, there should be a userspace visible fault and pages
> should not be added.

If user space accesses a memory address with a regular read/write that
results in a new page added then there is indeed a user space visible
fault. You can see this flow in action in the "augment" test case in
https://lore.kernel.org/linux-sgx/32c1116934a588bd3e6c174684e3e36a05c0a4d4.1644274683.git.reinette.chatre@intel.com/

If user space indeed wants the page after encountering such a fault then
it needs to enter the enclave again, from a different entry point, to
run ENCLU[EACCEPT], before it can return to the original entry point to
continue execution from the instruction that triggered the original read/write.

The only flow where a page is added without a user space visible fault
is when user space explicitly runs the ENCLU[EACCEPT] to do so.
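
The two requirements (a) and (b) quoted above can be written as a single predicate. This is a sketch only: struct enclave_range and should_augment() are hypothetical names, and the real fault handler performs further checks that are not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the address range given at enclave creation. */
struct enclave_range {
	uint64_t base;
	uint64_t size;
};

/*
 * A fault is a candidate for dynamically adding a page only if
 * (a) the address lies inside the enclave range and
 * (b) no enclave page backs that address yet.
 */
static bool should_augment(const struct enclave_range *r, uint64_t addr,
			   bool has_backing_page)
{
	bool in_range = addr >= r->base && addr - r->base < r->size;

	return in_range && !has_backing_page;
}
```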

> 
>>> IMHO, all unmapped page accesses should
>>> result in a page fault. mmap() should be called first to identify the
>>> correct permissions for these pages.
>>> Then the page handler should be
>>> updated to use the permissions from the mapping when backfilling
>>> physical pages. If I understand correctly, this should also obviate
>>
>> Regular enclave pages can _only_ be dynamically added with RW permission.
>>
>> SGX2's support for adding regular pages to an enclave via the EAUG
>> instruction is architecturally set at RW. The OS cannot change those permissions
>> via the EAUG instruction nor can the OS do so with a different/additional
>> instruction because:
>> * the OS is not able to relax permissions since that can only be done from
>> within the enclave with ENCLU[EMODPE], thus it is not possible for the OS to
>> dynamically add pages via EAUG as RW and then relax permissions to RWX.
>> * the OS is not able to EAUG a page and immediately attempt an EMODPR either
>> as Jarkko also recently inquired about:
>> https://lore.kernel.org/linux-sgx/80f3d7b9-e3d5-b2c0-7707-710bf6f5081e@intel.com/
> 
> This design looks... unfinished. EAUG takes a PAGEINFO in RBX, but
> PAGEINFO.SECINFO must be zeroed and EAUG instead sets magic hard-coded
> permissions. Why doesn't EAUG just respect the permissions in
> PAGEINFO.SECINFO? We aren't told.

This design is finished and respects the hardware specification. You can find
the details in the SDM's documentation of the EAUG function.

If the SECINFO field has a value then the hardware requires it to indicate
that it is a new shadow stack page being added, not a regular page. Support for
shadow stack pages is not in scope for this work. Attempting to dynamically
add a regular page with explicit permissions will result in a #GP(0).

The only way to add a regular enclave page is to make the SECINFO field empty
and doing so forces the page type to be a regular page and the permissions to
be RW.

> 
> Further, if the enclave can do EMODPE, why does
> SGX_IOC_ENCLAVE_RELAX_PERMISSIONS even exist? None of the
> documentation explains what this ioctl even does. Does it update PTE
> permissions? VMA permissions? Nobody knows without reading the source
> code.

Build the documentation (after applying this series) and it should
contain all the information you are searching for. As is the current custom
in the SGX documentation the built documentation pulls its content from
the kernel doc of the functions that implement the core of the 
user space interactions.

> 
> Userspace should not be bothered with the subtle details of the
> interaction between EPC, PTE and VMA permissions. But this API does
> everything it can do to expose all these details to userspace. And it
> doesn't bother to document them (probably because it is hard). It
> would be much better to avoid exposing these details to userspace.
> 
> IMHO, there should be a simple flow like this (if EAUG respects
> PAGEINFO.SECINFO):

EAUG does not respect PAGEINFO.SECINFO for regular pages.

> 
> 1. Non-enclave calls mmap()/munmap().
> 2. Enclave issues EACCEPT, if necessary.
> 3. Enclave issues EMODPE, if necessary.
> 
> Notice that in the first step above, during the mmap() call, the
> kernel ensures that EPC, PTE and VMA are in sync and fails if they
> cannot be made to be compatible. Also note that in the above flow EAUG
> instructions can be efficiently batched.
> 
> Given the current poor state of the EAUG instruction, we might need to
> do this flow instead:
> 
> 1. Enclave issues EACCEPT, if necessary. (Add RW pages...)
> 2. Non-enclave calls mmap()/munmap().
> 3. Enclave issues EACCEPT, if necessary.
> 4. Enclave issues EMODPE, if necessary.
> 
> However, doing EAUG only via the page access handler means that there
> is no way to batch EAUG instructions and this forces a context switch
> for every page you want to add. This has to be terrible for
> performance. Note specifically that the SDM calls out batching, which
> is currently impossible under this patch set. 35.5.7 - "Page
> allocation operations may be batched to improve efficiency."

These page functions are all per-page so it is not possible to add multiple
pages with a single instruction. It is indeed possible to pre-fault pages.
 
> As it stands today, if I want to add 256MiB of pages to an enclave,
> I'll have to do 2^16 context switches. That doesn't seem scalable.

No. Running ENCLU[EACCEPT] on each of the pages within that range should not
need any explicit context switch out of the enclave. See the "augment_via_eaccept" 
test case in:
https://lore.kernel.org/linux-sgx/32c1116934a588bd3e6c174684e3e36a05c0a4d4.1644274683.git.reinette.chatre@intel.com/
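
For reference, the 2^16 figure quoted above is the number of pages in 256MiB, assuming 4KiB pages. A minimal sketch of that arithmetic follows; the helper name pages_in() and the PAGE_SIZE_4K constant are hypothetical.

```c
#include <stdint.h>

/* Assumption: regular 4 KiB enclave pages. */
#define PAGE_SIZE_4K 4096ULL

/* Number of 4 KiB pages needed to cover the given number of bytes. */
static uint64_t pages_in(uint64_t bytes)
{
	return (bytes + PAGE_SIZE_4K - 1) / PAGE_SIZE_4K;
}
```

pages_in(256ULL << 20) evaluates to 65536, which is one ENCLU[EACCEPT] per page and says nothing by itself about how many enclave exits are needed.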


>>> the need for the weird userspace callback to allow for execute
>>> permissions.
>>
>> User policy integration would always be required to allow execute
>> permissions on a writable page. This is not expected to be a userspace
>> callback but instead integration with existing user policy subsystem(s).
> 
> Why? This isn't documented.

This is similar to the existing policies involved in managing the permissions
of mapped memory. When user space calls mprotect() to change permissions
of a mapped region then the kernel will not blindly allow the permissions but
instead ensure that it is allowed based on user policy by calling the LSM
(Linux Security Module) hooks.

You can learn more about LSM and various security modules at:
Documentation/security/lsm.rst
Documentation/admin-guide/LSM/*

You can compare what is needed here to what is currently done when user space
attempts to make some memory executable (see:
mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()). User policy needs
to help the kernel determine if this is allowed. For example, when SELinux is
the security module of choice then the process or file (depending on what type
of memory is being changed) needs to have a special permission (PROCESS__EXECHEAP,
PROCESS__EXECSTACK, or FILE__EXECMOD) assigned by user space to allow this.

Integration with user space policy is required for RWX of dynamically added pages
to be supported. In this series dynamically added pages will not be allowed to
be made executable, a follow-up series will add support for user policy
integration to support RWX permissions of dynamically added pages.
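
The mprotect() comparison can be observed from user space with regular (non-enclave) memory. The sketch below is an illustration only: try_mprotect() is a hypothetical helper, it uses an anonymous mapping rather than enclave memory, and whether a request for execute permission succeeds depends on the security policy active on the system.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one anonymous RW page, ask the kernel to change its protection to
 * new_prot, then unmap it. Returns 0 if the change was allowed, otherwise
 * the errno reported by mmap() or mprotect().
 */
static int try_mprotect(int new_prot)
{
	long page = sysconf(_SC_PAGESIZE);
	int err = 0;
	void *p;

	if (page <= 0)
		return EINVAL;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return errno;

	/* The kernel consults the LSM hooks before applying the change. */
	if (mprotect(p, (size_t)page, new_prot) != 0)
		err = errno;

	munmap(p, (size_t)page);
	return err;
}
```

Restricting the page with try_mprotect(PROT_READ) is expected to succeed, while try_mprotect(PROT_READ | PROT_WRITE | PROT_EXEC) may fail with EACCES when the active policy denies it.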

>>> 3. Implementing as I've suggested also means that we can lock down an
>>> enclave, for example - after code has been JITed, by closing the file
>>> descriptor. Once the file descriptor used to create the enclave is
>>> closed, no further mmap() can be performed on the enclave. Attempting
>>> to do EACCEPT on an unmapped page will generate a page fault.
>>
>> This is not clear to me. If the file descriptor is closed and no further
>> mmap() is allowed then how would a process be able to enter the enclave
>> to execute code within it?
> 
> EENTER (or the vdso function) with the address of a TCS page, like
> normal. In Enarx, we don't retain the enclave fd after the final
> mmap() following EINIT. Everything works just fine.

The OS fault handler is responsible for managing the PTEs that are required
for the enclave to be able to access the memory within the enclave.
The OS fault handler is attached to a VMA that is created with mmap(). 

> 
>> This series does indeed lock down the address range to ensure that it is
>> not possible to map memory that does not belong to the enclave after the
>> enclave is created. Please see:
>> https://lore.kernel.org/linux-sgx/1b833dbce6c937f71523f4aaf4b2181b9673519f.1644274683.git.reinette.chatre@intel.com/
> 
> That's not what I'm talking about. I'm talking about a workflow like this:
> 
> 1. Enclave initialization: ECREATE ... EINIT
> 2. EENTER
> 3. Enclave JITs some code (changes page permissions)
> 4. EEXIT
> 5. Close enclave fd.
> 6. EENTER
> 7. If an enclave attempts page modifications, a fault occurs.

The original fd that was created to obtain the enclave base address
may be closed at (5) but the executable and data portions of the enclave
still need to be mapped afterwards to be able to have OS support for
managing the PTEs that the enclave depends on to access those pages.

> 
> Think of this similar to seccomp(). The enclave wants to do some
> dynamic page table manipulation. But then it wants to lock down page
> table modification so that, if compromised, attackers have no ability
> to obtain RWX permissions.

Reinette


* RE: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-02-21  0:49   ` Jarkko Sakkinen
  2022-02-22 18:35     ` Reinette Chatre
@ 2022-02-23 19:21     ` Dhanraj, Vijay
  2022-02-23 22:42       ` Reinette Chatre
                         ` (2 more replies)
  1 sibling, 3 replies; 130+ messages in thread
From: Dhanraj, Vijay @ 2022-02-23 19:21 UTC (permalink / raw)
  To: Jarkko Sakkinen, Chatre, Reinette
  Cc: dave.hansen, tglx, bp, Lutomirski, Andy, mingo, linux-sgx, x86,
	Christopherson, Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric,
	Huang, Haitao, Shanahan, Mark, hpa, linux-kernel

Hi All,

Regarding the recent update of splitting the page permissions change request into two IOCTLS (RELAX and RESTRICT), can we combine them into one? That is, revert to how it was done in the v1 version?

Why? Currently in Gramine (a library OS for unmodified applications, https://gramineproject.io/) with the new proposed change, one needs to store the page permission for each page or range of pages. And for every request of `mmap` or `mprotect`, Gramine would have to do a lookup of the page permissions for the request range and then call the respective IOCTL either RESTRICT or RELAX. This seems a little overwhelming.

Request: Instead, can we do `EMODPE`, call the `RESTRICT` IOCTL, and then do an `EACCEPT` irrespective of whether the request relaxes or restricts page permissions? With this approach, we can avoid storing page permissions and simplify the implementation.

I understand the RESTRICT IOCTL would do an `EMODPR` and trigger the `ETRACK` flow to do TLB shootdowns, which might not be needed for the RELAX IOCTL, but I am not sure what the performance impact will be. Is there any data on the performance impact?
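
To make the bookkeeping concern concrete, here is a sketch of the decision a runtime would need once it knows the current permissions of a range. All names are hypothetical and permissions are assumed to be PROT_* bits; this is not Gramine's implementation.

```c
#include <sys/mman.h>

/* Which kind of request moves a range from its current to its new permissions. */
enum perm_change {
	PERM_UNCHANGED,
	PERM_RELAX,	/* only adds permission bits */
	PERM_RESTRICT,	/* only removes permission bits */
	PERM_BOTH,	/* adds some bits and removes others */
};

/* Classify a change from the stored permissions to the requested ones. */
static enum perm_change classify_change(int cur, int next)
{
	int added = next & ~cur;
	int removed = cur & ~next;

	if (!added && !removed)
		return PERM_UNCHANGED;
	if (added && removed)
		return PERM_BOTH;
	return added ? PERM_RELAX : PERM_RESTRICT;
}
```

The cur argument is exactly the per-page state that has to be stored and looked up on every `mmap` or `mprotect` request.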

Thanks,
-Vijay

> -----Original Message-----
> From: Jarkko Sakkinen <jarkko@kernel.org>
> Sent: Sunday, February 20, 2022 4:50 PM
> To: Reinette Chatre <reinette.chatre@intel.com>
> Cc: dave.hansen@linux.intel.com; tglx@linutronix.de; bp@alien8.de;
> luto@kernel.org; mingo@redhat.com; linux-sgx@vger.kernel.org;
> x86@kernel.org; seanjc@google.com; kai.huang@intel.com;
> cathy.zhang@intel.com; cedric.xing@intel.com; haitao.huang@intel.com;
> mark.shanahan@intel.com; hpa@zytor.com; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page
> permissions
> 
> On Mon, Feb 07, 2022 at 04:45:38PM -0800, Reinette Chatre wrote:
> > In the initial (SGX1) version of SGX, pages in an enclave need to be
> > created with permissions that support all usages of the pages, from
> > the time the enclave is initialized until it is unloaded. For example,
> > pages used by a JIT compiler or when code needs to otherwise be
> > relocated need to always have RWX permissions.
> >
> > SGX2 includes a new function ENCLS[EMODPR] that is run from the kernel
> > and can be used to restrict the EPCM permissions of regular enclave
> > pages within an initialized enclave.
> >
> > Introduce ioctl() SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS to support
> > restricting EPCM permissions. With this ioctl() the user specifies a
> > page range and the permissions to be applied to all pages in the
> > provided range. After checking the new permissions (more detail
> > below) the page table entries are reset and any new page table entries
> > will contain the new, restricted, permissions.
> > ENCLS[EMODPR] is run to restrict the EPCM permissions followed by the
> > ENCLS[ETRACK] flow that will ensure no cached linear-to-physical
> > address mappings to the changed pages remain.
> >
> > It is possible for the permission change request to fail on any page
> > within the provided range, either with an error encountered by the
> > kernel or by the SGX hardware while running ENCLS[EMODPR]. To support
> > partial success the ioctl() returns an error code based on failures
> > encountered by the kernel as well as two result output parameters: one
> > for the number of pages that were successfully changed and one for the
> > SGX return code.
> >
> > Checking user provided new permissions
> > ======================================
> >
> > Enclave page permission changes need to be approached with care and
> > for this reason permission changes are only allowed if the new
> > > permissions are the same or more restrictive than the vetted
> > permissions. No additional checking is done to ensure that the
> > permissions are actually being restricted. This is because the enclave
> > may have relaxed the EPCM permissions from within the enclave without
> > letting the kernel know. An attempt to relax permissions using this
> > call will be ignored by the hardware.
> >
> > For example, together with the support for relaxing of EPCM
> > permissions, enclave pages added with the vetted permissions in
> > brackets below are allowed to have permissions as follows:
> > * (RWX) => RW => R => RX => RWX
> > * (RW) => R => RW
> > * (RX) => R => RX
> >
> > Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> > ---
> > Changes since V1:
> > - Change terminology to use "relax" instead of "extend" to refer to
> >   the case when enclave page permissions are added (Dave).
> > - Use ioctl() in commit message (Dave).
> > - Add examples on what permissions would be allowed (Dave).
> > - Split enclave page permission changes into two ioctl()s, one for
> >   permission restricting (SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS)
> > >   and one for permission relaxing (SGX_IOC_ENCLAVE_RELAX_PERMISSIONS)
> > >   (Jarkko).
> > - In support of the ioctl() name change the following names have been
> >   changed:
> >   struct sgx_page_modp -> struct sgx_enclave_restrict_perm
> >   sgx_ioc_page_modp() -> sgx_ioc_enclave_restrict_perm()
> >   sgx_page_modp() -> sgx_enclave_restrict_perm()
> > - ioctl() takes entire secinfo as input instead of
> >   page permissions only (Jarkko).
> > - Fix kernel-doc to include () in function name.
> > - Create and use utility for the ETRACK flow.
> > - Fixups in comments
> > - Move kernel-doc to function that provides documentation for
> >   Documentation/x86/sgx.rst.
> > - Remove redundant comment.
> > - Make explicit which members of struct sgx_enclave_restrict_perm
> >   are for output (Dave).
> >
> >  arch/x86/include/uapi/asm/sgx.h |  21 +++
> >  arch/x86/kernel/cpu/sgx/encl.c  |   4 +-
> >  arch/x86/kernel/cpu/sgx/encl.h  |   3 +
> > >  arch/x86/kernel/cpu/sgx/ioctl.c | 229 ++++++++++++++++++++++++++++++++
> >  4 files changed, 255 insertions(+), 2 deletions(-)
> >
> > > diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> > > index 5c678b27bb72..b0ffb80bc67f 100644
> > --- a/arch/x86/include/uapi/asm/sgx.h
> > +++ b/arch/x86/include/uapi/asm/sgx.h
> > @@ -31,6 +31,8 @@ enum sgx_page_flags {
> >  	_IO(SGX_MAGIC, 0x04)
> >  #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
> >  	_IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
> > +#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
> > +	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
> >
> >  /**
> > >   * struct sgx_enclave_create - parameter structure for the
> > > @@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm {
> >  	__u64 count;
> >  };
> >
> > +/**
> > + * struct sgx_enclave_restrict_perm - parameters for ioctl
> > + *                                    %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
> > + * @offset:	starting page offset (page aligned relative to enclave base
> > + *		address defined in SECS)
> > + * @length:	length of memory (multiple of the page size)
> > > + * @secinfo:	address for the SECINFO data containing the new permission bits
> > > + *		for pages in range described by @offset and @length
> > + * @result:	(output) SGX result code of ENCLS[EMODPR] function
> > + * @count:	(output) bytes successfully changed (multiple of page size)
> > + */
> > +struct sgx_enclave_restrict_perm {
> > +	__u64 offset;
> > +	__u64 length;
> > +	__u64 secinfo;
> > +	__u64 result;
> > +	__u64 count;
> > +};
> > +
> >  struct sgx_enclave_run;
> >
> >  /**
> > > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > > index 8da813504249..a5d4a7efb986 100644
> > > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > > @@ -90,8 +90,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
> >  	return epc_page;
> >  }
> >
> > -static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > -						unsigned long addr)
> > +struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > +					 unsigned long addr)
> >  {
> >  	struct sgx_epc_page *epc_page;
> >  	struct sgx_encl_page *entry;
> > > diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> > > index cb9f16d457ac..848a28d28d3d 100644
> > > --- a/arch/x86/kernel/cpu/sgx/encl.h
> > > +++ b/arch/x86/kernel/cpu/sgx/encl.h
> > > @@ -120,4 +120,7 @@ void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
> > >  bool sgx_va_page_full(struct sgx_va_page *va_page);
> > >  void sgx_encl_free_epc_page(struct sgx_epc_page *page);
> >
> > +struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > +					 unsigned long addr);
> > +
> >  #endif /* _X86_ENCL_H */
> > > diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> > > index 9cc6af404bf6..23bdf558b231 100644
> > > --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> > > +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> > > @@ -894,6 +894,232 @@ static long sgx_ioc_enclave_relax_perm(struct sgx_encl *encl, void __user *arg)
> >  	return ret;
> >  }
> >
> > +/*
> > > + * Some SGX functions require that no cached linear-to-physical address
> > > + * mappings are present before they can succeed. Collaborate with
> > > + * hardware via ENCLS[ETRACK] to ensure that all cached
> > > + * linear-to-physical address mappings belonging to all threads of
> > > + * the enclave are cleared. See sgx_encl_cpumask() for details.
> > > + */
> > > +static int sgx_enclave_etrack(struct sgx_encl *encl)
> > > +{
> > +	void *epc_virt;
> > +	int ret;
> > +
> > +	epc_virt = sgx_get_epc_virt_addr(encl->secs.epc_page);
> > +	ret = __etrack(epc_virt);
> > +	if (ret) {
> > +		/*
> > +		 * ETRACK only fails when there is an OS issue. For
> > +		 * example, two consecutive ETRACK was sent without
> > +		 * completed IPI between.
> > +		 */
> > +		pr_err_once("ETRACK returned %d (0x%x)", ret, ret);
> > +		/*
> > +		 * Send IPIs to kick CPUs out of the enclave and
> > +		 * try ETRACK again.
> > +		 */
> > +		on_each_cpu_mask(sgx_encl_cpumask(encl), sgx_ipi_cb,
> NULL, 1);
> > +		ret = __etrack(epc_virt);
> > +		if (ret) {
> > +			pr_err_once("ETRACK repeat returned %d (0x%x)",
> > +				    ret, ret);
> > +			return -EFAULT;
> > +		}
> > +	}
> > +	on_each_cpu_mask(sgx_encl_cpumask(encl), sgx_ipi_cb, NULL, 1);
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> > > + * sgx_enclave_restrict_perm() - Restrict EPCM permissions and align OS view
> > > + * @encl:	Enclave to which the pages belong.
> > > + * @modp:	Checked parameters from user on which pages need modifying.
> > + * @secinfo_perm: New (validated) permission bits.
> > + *
> > + * Return:
> > + * - 0:		Success.
> > + * - -errno:	Otherwise.
> > + */
> > +static long sgx_enclave_restrict_perm(struct sgx_encl *encl,
> > +				      struct sgx_enclave_restrict_perm *modp,
> > +				      u64 secinfo_perm)
> > +{
> > +	unsigned long vm_prot, run_prot_restore;
> > +	struct sgx_encl_page *entry;
> > +	struct sgx_secinfo secinfo;
> > +	unsigned long addr;
> > +	unsigned long c;
> > +	void *epc_virt;
> > +	int ret;
> > +
> > +	memset(&secinfo, 0, sizeof(secinfo));
> > +	secinfo.flags = secinfo_perm;
> > +
> > +	vm_prot = vm_prot_from_secinfo(secinfo_perm);
> > +
> > +	for (c = 0 ; c < modp->length; c += PAGE_SIZE) {
> > +		addr = encl->base + modp->offset + c;
> > +
> > +		mutex_lock(&encl->lock);
> > +
> > +		entry = sgx_encl_load_page(encl, addr);
> > +		if (IS_ERR(entry)) {
> > +			ret = PTR_ERR(entry) == -EBUSY ? -EAGAIN : -
> EFAULT;
> > +			goto out_unlock;
> > +		}
> > +
> > +		/*
> > +		 * Changing EPCM permissions is only supported on regular
> > +		 * SGX pages. Attempting this change on other pages will
> > +		 * result in #PF.
> > +		 */
> > +		if (entry->type != SGX_PAGE_TYPE_REG) {
> > +			ret = -EINVAL;
> > +			goto out_unlock;
> > +		}
> > +
> > +		/*
> > +		 * Do not verify if current runtime protection bits are what
> > +		 * is being requested. The enclave may have relaxed EPCM
> > +		 * permissions calls without letting the kernel know and
> > +		 * thus permission restriction may still be needed even if
> > +		 * from the kernel's perspective the permissions are
> unchanged.
> > +		 */
> > +
> > +		/* New permissions should never exceed vetted
> permissions. */
> > +		if ((entry->vm_max_prot_bits & vm_prot) != vm_prot) {
> > +			ret = -EPERM;
> > +			goto out_unlock;
> > +		}
> > +
> > +		/* Make sure page stays around while releasing mutex. */
> > +		if (sgx_unmark_page_reclaimable(entry->epc_page)) {
> > +			ret = -EAGAIN;
> > +			goto out_unlock;
> > +		}
> > +
> > +		/*
> > +		 * Change runtime protection before zapping PTEs to ensure
> > +		 * any new #PF uses new permissions. EPCM permissions (if
> > +		 * needed) not changed yet.
> > +		 */
> > +		run_prot_restore = entry->vm_run_prot_bits;
> > +		entry->vm_run_prot_bits = vm_prot;
> > +
> > +		mutex_unlock(&encl->lock);
> > +		/*
> > +		 * Do not keep encl->lock because of dependency on
> > +		 * mmap_lock acquired in sgx_zap_enclave_ptes().
> > +		 */
> > +		sgx_zap_enclave_ptes(encl, addr);
> > +
> > +		mutex_lock(&encl->lock);
> > +
> > +		/* Change EPCM permissions. */
> > +		epc_virt = sgx_get_epc_virt_addr(entry->epc_page);
> > +		ret = __emodpr(&secinfo, epc_virt);
> > +		if (encls_faulted(ret)) {
> > +			/*
> > +			 * All possible faults should be avoidable:
> > +			 * parameters have been checked, will only change
> > +			 * permissions of a regular page, and no concurrent
> > +			 * SGX1/SGX2 ENCLS instructions since these
> > +			 * are protected with mutex.
> > +			 */
> > +			pr_err_once("EMODPR encountered exception
> %d\n",
> > +				    ENCLS_TRAPNR(ret));
> > +			ret = -EFAULT;
> > +			goto out_prot_restore;
> > +		}
> > +		if (encls_failed(ret)) {
> > +			modp->result = ret;
> > +			ret = -EFAULT;
> > +			goto out_prot_restore;
> > +		}
> > +
> > +		ret = sgx_enclave_etrack(encl);
> > +		if (ret) {
> > +			ret = -EFAULT;
> > +			goto out_reclaim;
> > +		}
> > +
> > +		sgx_mark_page_reclaimable(entry->epc_page);
> > +		mutex_unlock(&encl->lock);
> > +	}
> > +
> > +	ret = 0;
> > +	goto out;
> > +
> > +out_prot_restore:
> > +	entry->vm_run_prot_bits = run_prot_restore;
> > +out_reclaim:
> > +	sgx_mark_page_reclaimable(entry->epc_page);
> > +out_unlock:
> > +	mutex_unlock(&encl->lock);
> > +out:
> > +	modp->count = c;
> > +
> > +	return ret;
> > +}
> > +
> > +/**
> > + * sgx_ioc_enclave_restrict_perm() - handler for
> > + *                                   %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
> > + * @encl:	an enclave pointer
> > + * @arg:	userspace pointer to a &struct sgx_enclave_restrict_perm
> > + *		instance
> > + *
> > > + * SGX2 distinguishes between relaxing and restricting the enclave page
> > > + * permissions maintained by the hardware (EPCM permissions) of pages
> > > + * belonging to an initialized enclave (after SGX_IOC_ENCLAVE_INIT).
> > > + *
> > > + * EPCM permissions cannot be restricted from within the enclave, the enclave
> > > + * requires the kernel to run the privileged level 0 instructions ENCLS[EMODPR]
> > > + * and ENCLS[ETRACK]. An attempt to relax EPCM permissions with this call
> > > + * will be ignored by the hardware.
> > > + *
> > > + * Enclave page permissions are not allowed to exceed the maximum vetted
> > > + * permissions maintained in &struct sgx_encl_page->vm_max_prot_bits.
> > + *
> > + * Return:
> > + * - 0:		Success
> > + * - -errno:	Otherwise
> > + */
> > +static long sgx_ioc_enclave_restrict_perm(struct sgx_encl *encl,
> > +					  void __user *arg)
> > +{
> > +	struct sgx_enclave_restrict_perm params;
> > +	u64 secinfo_perm;
> > +	long ret;
> > +
> > +	ret = sgx_ioc_sgx2_ready(encl);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (copy_from_user(&params, arg, sizeof(params)))
> > +		return -EFAULT;
> > +
> > +	if (sgx_validate_offset_length(encl, params.offset, params.length))
> > +		return -EINVAL;
> > +
> > +	ret = sgx_perm_from_user_secinfo((void __user *)params.secinfo,
> > +					 &secinfo_perm);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (params.result || params.count)
> > +		return -EINVAL;
> > +
> > +	ret = sgx_enclave_restrict_perm(encl, &params, secinfo_perm);
> > +
> > +	if (copy_to_user(arg, &params, sizeof(params)))
> > +		return -EFAULT;
> > +
> > +	return ret;
> > +}
> > +
> > >  long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
> > >  {
> > >  	struct sgx_encl *encl = filep->private_data;
> > > @@ -918,6 +1144,9 @@ long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
> >  	case SGX_IOC_ENCLAVE_RELAX_PERMISSIONS:
> >  		ret = sgx_ioc_enclave_relax_perm(encl, (void __user *)arg);
> >  		break;
> > +	case SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS:
> > +		ret = sgx_ioc_enclave_restrict_perm(encl, (void __user
> *)arg);
> > +		break;
> >  	default:
> >  		ret = -ENOIOCTLCMD;
> >  		break;
> > --
> > 2.25.1
> >
> 
> Just a suggestion but these might be a bit less cluttered explanations of
> the fields:
> 
> /// SGX_IOC_ENCLAVE_RELAX_PERMISSIONS parameter structure
> #[repr(C)]
> pub struct RelaxPermissions {
>     /// In: starting page offset
>     offset: u64,
>     /// In: length of the address range (multiple of the page size)
>     length: u64,
>     /// In: SECINFO containing the relaxed permissions
>     secinfo: u64,
>     /// Out: length of the address range successfully changed
>     count: u64,
> };
> 
> /// SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS parameter structure
> #[repr(C)]
> pub struct RestrictPermissions {
>     /// In: starting page offset
>     offset: u64,
>     /// In: length of the address range (multiple of the page size)
>     length: u64,
>     /// In: SECINFO containing the restricted permissions
>     secinfo: u64,
>     /// In: ENCLU[EMODPR] return value
>     result: u64,
>     /// Out: length of the address range successfully changed
>     count: u64,
> };
> 
> I can live with the current ones too but I rewrote them so that I can
> quickly make sense of the fields later. It's Rust code but the point is
> the documentation...
> 
> Also, it should not be too much trouble to use the struct in user space
> code even if the struct names are struct sgx_enclave_relax_permissions and
> struct sgx_enclave_restrict_permissions, given that you most likely have
> exactly single call-site in the run-time.
> 
> Other than that, looks quite good.
> 
> BR, Jarkko


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-02-23 15:46       ` Jarkko Sakkinen
@ 2022-02-23 19:55         ` Reinette Chatre
  2022-02-28 12:27           ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-02-23 19:55 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

Hi Jarkko,

On 2/23/2022 7:46 AM, Jarkko Sakkinen wrote:
> On Tue, Feb 22, 2022 at 10:35:04AM -0800, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 2/20/2022 4:49 PM, Jarkko Sakkinen wrote:
>>> On Mon, Feb 07, 2022 at 04:45:38PM -0800, Reinette Chatre wrote:
>>
>> ...
>>
>>>> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
>>>> index 5c678b27bb72..b0ffb80bc67f 100644
>>>> --- a/arch/x86/include/uapi/asm/sgx.h
>>>> +++ b/arch/x86/include/uapi/asm/sgx.h
>>>> @@ -31,6 +31,8 @@ enum sgx_page_flags {
>>>>  	_IO(SGX_MAGIC, 0x04)
>>>>  #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
>>>>  	_IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
>>>> +#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
>>>> +	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
>>>>  
>>>>  /**
>>>>   * struct sgx_enclave_create - parameter structure for the
>>>> @@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm {
>>>>  	__u64 count;
>>>>  };
>>>>  
>>>> +/**
>>>> + * struct sgx_enclave_restrict_perm - parameters for ioctl
>>>> + *                                    %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
>>>> + * @offset:	starting page offset (page aligned relative to enclave base
>>>> + *		address defined in SECS)
>>>> + * @length:	length of memory (multiple of the page size)
>>>> + * @secinfo:	address for the SECINFO data containing the new permission bits
>>>> + *		for pages in range described by @offset and @length
>>>> + * @result:	(output) SGX result code of ENCLS[EMODPR] function
>>>> + * @count:	(output) bytes successfully changed (multiple of page size)
>>>> + */
>>>> +struct sgx_enclave_restrict_perm {
>>>> +	__u64 offset;
>>>> +	__u64 length;
>>>> +	__u64 secinfo;
>>>> +	__u64 result;
>>>> +	__u64 count;
>>>> +};
>>>> +
>>>>  struct sgx_enclave_run;
>>>>  
>>>>  /**
>>
>> ...
>>
>>>
>>> Just a suggestion but these might be a bit less cluttered explanations of
>>> the fields:
>>>
>>> /// SGX_IOC_ENCLAVE_RELAX_PERMISSIONS parameter structure
>>> #[repr(C)]
>>> pub struct RelaxPermissions {
>>>     /// In: starting page offset
>>>     offset: u64,
>>>     /// In: length of the address range (multiple of the page size)
>>>     length: u64,
>>>     /// In: SECINFO containing the relaxed permissions
>>>     secinfo: u64,
>>>     /// Out: length of the address range successfully changed
>>>     count: u64,
>>> };
>>>
>>> /// SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS parameter structure
>>> #[repr(C)]
>>> pub struct RestrictPermissions {
>>>     /// In: starting page offset
>>>     offset: u64,
>>>     /// In: length of the address range (multiple of the page size)
>>>     length: u64,
>>>     /// In: SECINFO containing the restricted permissions
>>>     secinfo: u64,
>>>     /// In: ENCLU[EMODPR] return value
>>>     result: u64,
>>>     /// Out: length of the address range successfully changed
>>>     count: u64,
>>> };
>>
>> In your proposal you shorten the descriptions from the current implementation.
>> I do consider the removed information valuable since I believe that it helps
>> users understand the kernel interface requirements without needing to be
>> familiar with or dig into the kernel code to understand how the provided data
>> is used.
>>
>> For example, you shorten offset to "starting page offset", but what was removed
>> was the requirement that this offset has to be page aligned and what the offset
>> is relative to. I do believe summarizing these requirements upfront helps
>> a user space developer by not needing to dig through kernel code later
>> in order to understand why an -EINVAL was received.
>>
>>  
>>> I can live with the current ones too but I rewrote them so that I can
>>> quickly make sense of the fields later. It's Rust code but the point is
>>> the documentation...
>>
>> Since you do seem to be ok with the current descriptions I would prefer
>> to keep them.
> 
> Yeah, they are fine to me.
> 
>>> Also, it should not be too much trouble to use the struct in user space
>>> code even if the struct names are struct sgx_enclave_relax_permissions and
>>> struct sgx_enclave_restrict_permissions, given that you most likely have
>>> exactly single call-site in the run-time.
>>
>> Are you requesting that I make the following name changes?
>> struct sgx_enclave_relax_perm -> struct sgx_enclave_relax_permissions
>> struct sgx_enclave_restrict_perm -> struct sgx_enclave_restrict_permissions
>>
>> If so, do you want the function names also written out in this way?
>> sgx_enclave_relax_perm()        -> sgx_enclave_relax_permissions()
>> sgx_ioc_enclave_relax_perm()    -> sgx_ioc_enclave_relax_permissions()
>> sgx_enclave_restrict_perm()     -> sgx_enclave_restrict_permissions()
>> sgx_ioc_enclave_restrict_perm() -> sgx_ioc_enclave_restrict_permissions()
> 
> Yes, unless you have a specific reason to shorten them :-)

Just aesthetic reasons ... having a long function name can look unbalanced
if it has many parameters and if the parameters themselves are long it
becomes hard to keep to the required line length.

Even so, it does look as though the longest ones can be made to work within 80
characters:
sgx_enclave_restrict_permissions(...
                                 struct sgx_enclave_restrict_permissions *modp,
                                 ...)

Another (aesthetic) consequence would be, for example, that the core sgx_ioctl()
would now have some branches span more lines than others, so it would not look as
neat as it does now (this is subjective, I know).

Apart from the aesthetic reasons I do not have another reason not to make the
change and I will do so in the next version.

Reinette

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-02-23 19:21     ` Dhanraj, Vijay
@ 2022-02-23 22:42       ` Reinette Chatre
  2022-02-28 12:24       ` Jarkko Sakkinen
  2022-03-10  6:10       ` Jarkko Sakkinen
  2 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-02-23 22:42 UTC (permalink / raw)
  To: Dhanraj, Vijay, Jarkko Sakkinen
  Cc: dave.hansen, tglx, bp, Lutomirski, Andy, mingo, linux-sgx, x86,
	Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

Hi Vijay,

On 2/23/2022 11:21 AM, Dhanraj, Vijay wrote:
> Hi All,
> 
> Regarding the recent update of splitting the page permissions change request
> into two IOCTLS (RELAX and RESTRICT), can we combine them into one? That is,
> revert to how it was done in the v1 version?

While V1 did have a single ioctl() to handle both relaxing and restricting
permissions it never was possible for the kernel to distinguish what the
user intended. For this reason, even though there was a single ioctl() in V1,
it implemented permission restriction while supporting permission
relaxing as a side effect since the PTEs are flushed and new PTEs will
support the new permission. A consequence was that the V1 SGX_IOC_PAGE_MODP
required ENCLU[EACCEPT] from within the enclave even if it was only intended
to be used to relax permissions. SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS in
V2 is exactly the same as SGX_IOC_PAGE_MODP of V1.
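
As a rough illustration (not part of this series), a user-space caller could
prepare the V2 restrict parameters along the lines below. The field layout
follows struct sgx_enclave_restrict_perm from the patch, and the zeroed
result/count fields reflect the check in sgx_ioc_enclave_restrict_perm(). The
names restrict_perm_params, prepare_restrict_params() and PAGE_SZ are
hypothetical, the 4096-byte page size is an assumption, and rejecting a zero
length is a conservative assumption rather than something stated in the patch.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 4096ULL	/* assumption: 4 KiB enclave pages */

/* Hypothetical local mirror of the V2 struct sgx_enclave_restrict_perm. */
struct restrict_perm_params {
	uint64_t offset;	/* in: page-aligned offset from enclave base */
	uint64_t length;	/* in: multiple of the page size */
	uint64_t secinfo;	/* in: user address of the SECINFO data */
	uint64_t result;	/* out: must be zero on entry */
	uint64_t count;		/* out: must be zero on entry */
};

/* Returns 0 on success, -1 if offset/length would be rejected up front. */
static int prepare_restrict_params(struct restrict_perm_params *p,
				   uint64_t offset, uint64_t length,
				   const void *secinfo)
{
	if (offset % PAGE_SZ || length == 0 || length % PAGE_SZ)
		return -1;

	memset(p, 0, sizeof(*p));	/* clears result and count */
	p->offset = offset;
	p->length = length;
	p->secinfo = (uint64_t)(uintptr_t)secinfo;
	return 0;
}
```

Checking alignment before issuing the ioctl() only saves a round trip; the
kernel still performs its own validation and remains the authority.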

> 
> Why? Currently in Gramine (a library OS for unmodified applications,
> https://gramineproject.io/) with the new proposed change, one needs
> to store the page permission for each page or range of pages. And for
> every request of `mmap` or `mprotect`, Gramine would have to do a lookup
> of the page permissions for the request range and then call the respective
> IOCTL either RESTRICT or RELAX. This seems a little overwhelming.

Gramine would also need to know when to enter the enclave to run EMODPE, which
goes hand in hand with running SGX_IOC_ENCLAVE_RELAX_PERMISSIONS.
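
To make the bookkeeping concrete, here is a minimal sketch of how a runtime
that tracks current page permissions could decide which flow applies. The bit
values follow the SECINFO R/W/X flag positions; the enum and the function
classify_perm_change() are hypothetical names used only for illustration.

```c
/* Permission bits in SECINFO flag order: R = bit 0, W = bit 1, X = bit 2. */
#define PERM_R 0x1u
#define PERM_W 0x2u
#define PERM_X 0x4u

enum perm_change {
	PERM_SAME,	/* nothing to do */
	PERM_RELAX,	/* only bits added: EMODPE + RELAX ioctl() */
	PERM_RESTRICT,	/* only bits removed: RESTRICT ioctl() + EACCEPT */
	PERM_MIXED,	/* bits added and removed: both flows are needed */
};

/* Compare the tracked permissions with the requested ones. */
static enum perm_change classify_perm_change(unsigned int cur, unsigned int req)
{
	unsigned int added = req & ~cur;
	unsigned int removed = cur & ~req;

	if (!added && !removed)
		return PERM_SAME;
	if (added && removed)
		return PERM_MIXED;
	return added ? PERM_RELAX : PERM_RESTRICT;
}
```

For example, going from RW to RX is PERM_MIXED: W is removed (restrict flow)
and X is added (relax flow).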

> 
> Request: Instead, can we do `MODPE`,  call `RESTRICT` IOCTL, and then do an
> `EACCEPT` irrespective of RELAX or RESTRICT page permission request? With this
> approach, we can avoid storing  page permissions and simplify the implementation.

This should be possible with the current implementation, similar to the previous
implementation, but it is not optimal if only EMODPE followed by
SGX_IOC_ENCLAVE_RELAX_PERMISSIONS is what is needed.

> 
> I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK` flows to do
> TLB shootdowns which might not be needed for RELAX IOCTL but I am not sure what
> will be the performance impact. Is there any data point to see the performance impact?

It can be worse than just that. EMODPR requires the EPC page to be present
and thus the page would need to be loaded from swap and decrypted if it
is not present. This may also mean that existing EPC pages need to be
swapped out (first blocked, then encrypted to backing storage, then the
ETRACK flow followed by IPIs to ensure there are no more references to that
page) ... before there is space available for the needed page to be loaded and
decrypted.

That only takes care of the EMODPR ... which as you state needs
to be followed by the ETRACK flow and IPIs.

The above is also just for the OS portion - after that there is the
EACCEPT that needs to be run from within the enclave for every page whether
permissions were relaxed or restricted. This would be dependent on the
implementation - whether the enclave is entered once per EACCEPT or once
for all EACCEPTs.

All of the above would be unnecessary work if permissions were only relaxed from
within the enclave while SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS was used to
perform the OS actions.

The performance impact should be easy to determine: run both ioctl()s
and compare how long they take. Since you are asking about Gramine this may be
best to do in that environment but I can attempt something on your behalf by
using the existing SGX selftest infrastructure.

As an experiment I modified the existing "unclobbered_vdso_oversubscribed_remove"
test case that currently runs the SGX_IOC_ENCLAVE_MODIFY_TYPE on a large memory
region to instead run ioctl()s SGX_IOC_ENCLAVE_RELAX_PERMISSIONS and
SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS. In my test I ran these ioctl()s on a 4GB
memory range to amplify any performance impact since I was just measuring it
by printing timestamps from user space.

My result showed that:
* Running SGX_IOC_ENCLAVE_RELAX_PERMISSIONS on the 4GB region took less than a second.
  No EACCEPT needed from user space.

* Running SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS on the 4GB region took about 20 seconds.
* Running EACCEPT on each enclave page took an additional 20 seconds. (Please note that
  this is using a suboptimal way of entering the enclave for each EACCEPT; it
  would be more efficient to enter the enclave once and run EACCEPT for each page without
  exiting the enclave.)

The performance impact seems significant to me.
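
For anyone wanting to repeat this kind of coarse user-space measurement, a
minimal timing sketch follows. It is not the selftest code used above; the
helper names elapsed_seconds() and time_call() are hypothetical, and the
callback would wrap whichever ioctl() or EACCEPT loop is being measured.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */
#endif
#include <time.h>

/* Difference between two timestamps, in seconds. */
static double elapsed_seconds(const struct timespec *start,
			      const struct timespec *end)
{
	return (double)(end->tv_sec - start->tv_sec) +
	       (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/*
 * Run fn(arg) once and return how long it took. The callback's return
 * value is stored in *ret so the caller can still check for errors.
 */
static double time_call(int (*fn)(void *), void *arg, int *ret)
{
	struct timespec start, end;

	clock_gettime(CLOCK_MONOTONIC, &start);
	*ret = fn(arg);
	clock_gettime(CLOCK_MONOTONIC, &end);

	return elapsed_seconds(&start, &end);
}
```

CLOCK_MONOTONIC is used so that wall-clock adjustments during a long run do
not skew the result.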

Reinette

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-02-23 19:21     ` Dhanraj, Vijay
  2022-02-23 22:42       ` Reinette Chatre
@ 2022-02-28 12:24       ` Jarkko Sakkinen
  2022-02-28 13:19         ` Jarkko Sakkinen
  2022-02-28 15:16         ` Dave Hansen
  2022-03-10  6:10       ` Jarkko Sakkinen
  2 siblings, 2 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-02-28 12:24 UTC (permalink / raw)
  To: Dhanraj, Vijay
  Cc: Chatre, Reinette, dave.hansen, tglx, bp, Lutomirski, Andy, mingo,
	linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
> Hi All,
> 
> Regarding the recent update of splitting the page permissions change
> request into two IOCTLS (RELAX and RESTRICT), can we combine them into
> one? That is, revert to how it was done in the v1 version?

They are logically separate complex functionalities:

1. "restrict" calls EMODPR and requires EACCEPT
2. "relax" increases permissions up to vetted ("EADD") and could be
    combined with EMODPE called inside enclave.

I don't think it is a good idea.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-02-23 19:55         ` Reinette Chatre
@ 2022-02-28 12:27           ` Jarkko Sakkinen
  0 siblings, 0 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-02-28 12:27 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

On Wed, Feb 23, 2022 at 11:55:03AM -0800, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 2/23/2022 7:46 AM, Jarkko Sakkinen wrote:
> > On Tue, Feb 22, 2022 at 10:35:04AM -0800, Reinette Chatre wrote:
> >> Hi Jarkko,
> >>
> >> On 2/20/2022 4:49 PM, Jarkko Sakkinen wrote:
> >>> On Mon, Feb 07, 2022 at 04:45:38PM -0800, Reinette Chatre wrote:
> >>
> >> ...
> >>
> >>>> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> >>>> index 5c678b27bb72..b0ffb80bc67f 100644
> >>>> --- a/arch/x86/include/uapi/asm/sgx.h
> >>>> +++ b/arch/x86/include/uapi/asm/sgx.h
> >>>> @@ -31,6 +31,8 @@ enum sgx_page_flags {
> >>>>  	_IO(SGX_MAGIC, 0x04)
> >>>>  #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
> >>>>  	_IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
> >>>> +#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
> >>>> +	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
> >>>>  
> >>>>  /**
> >>>>   * struct sgx_enclave_create - parameter structure for the
> >>>> @@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm {
> >>>>  	__u64 count;
> >>>>  };
> >>>>  
> >>>> +/**
> >>>> + * struct sgx_enclave_restrict_perm - parameters for ioctl
> >>>> + *                                    %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
> >>>> + * @offset:	starting page offset (page aligned relative to enclave base
> >>>> + *		address defined in SECS)
> >>>> + * @length:	length of memory (multiple of the page size)
> >>>> + * @secinfo:	address for the SECINFO data containing the new permission bits
> >>>> + *		for pages in range described by @offset and @length
> >>>> + * @result:	(output) SGX result code of ENCLS[EMODPR] function
> >>>> + * @count:	(output) bytes successfully changed (multiple of page size)
> >>>> + */
> >>>> +struct sgx_enclave_restrict_perm {
> >>>> +	__u64 offset;
> >>>> +	__u64 length;
> >>>> +	__u64 secinfo;
> >>>> +	__u64 result;
> >>>> +	__u64 count;
> >>>> +};
> >>>> +
> >>>>  struct sgx_enclave_run;
> >>>>  
> >>>>  /**
> >>
> >> ...
> >>
> >>>
> >>> Just a suggestion but these might be a bit less cluttered explanations of
> >>> the fields:
> >>>
> >>> /// SGX_IOC_ENCLAVE_RELAX_PERMISSIONS parameter structure
> >>> #[repr(C)]
> >>> pub struct RelaxPermissions {
> >>>     /// In: starting page offset
> >>>     offset: u64,
> >>>     /// In: length of the address range (multiple of the page size)
> >>>     length: u64,
> >>>     /// In: SECINFO containing the relaxed permissions
> >>>     secinfo: u64,
> >>>     /// Out: length of the address range successfully changed
> >>>     count: u64,
> >>> };
> >>>
> >>> /// SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS parameter structure
> >>> #[repr(C)]
> >>> pub struct RestrictPermissions {
> >>>     /// In: starting page offset
> >>>     offset: u64,
> >>>     /// In: length of the address range (multiple of the page size)
> >>>     length: u64,
> >>>     /// In: SECINFO containing the restricted permissions
> >>>     secinfo: u64,
> >>>     /// In: ENCLU[EMODPR] return value
> >>>     result: u64,
> >>>     /// Out: length of the address range successfully changed
> >>>     count: u64,
> >>> };
> >>
> >> In your proposal you shorten the descriptions from the current implementation.
> >> I do consider the removed information valuable since I believe that it helps
> >> users understand the kernel interface requirements without needing to be
> >> familiar with or dig into the kernel code to understand how the provided data
> >> is used.
> >>
> >> For example, you shorten offset to "starting page offset", but what was removed
> >> was the requirement that this offset has to be page aligned and what the offset
> >> is relative to. I do believe summarizing these requirements upfront helps
> >> a user space developer by not needing to dig through kernel code later
> >> in order to understand why an -EINVAL was received.
> >>
> >>  
> >>> I can live with the current ones too but I rewrote them so that I can
> >>> quickly make sense of the fields later. It's Rust code but the point is
> >>> the documentation...
> >>
> >> Since you do seem to be ok with the current descriptions I would prefer
> >> to keep them.
> > 
> > Yeah, they are fine to me.
> > 
> >>> Also, it should not be too much trouble to use the struct in user space
> >>> code even if the struct names are struct sgx_enclave_relax_permissions and
> >>> struct sgx_enclave_restrict_permissions, given that you most likely have
> >>> exactly single call-site in the run-time.
> >>
> >> Are you requesting that I make the following name changes?
> >> struct sgx_enclave_relax_perm -> struct sgx_enclave_relax_permissions
> >> struct sgx_enclave_restrict_perm -> struct sgx_enclave_restrict_permissions
> >>
> >> If so, do you want the function names also written out in this way?
> >> sgx_enclave_relax_perm()        -> sgx_enclave_relax_permissions()
> >> sgx_ioc_enclave_relax_perm()    -> sgx_ioc_enclave_relax_permissions()
> >> sgx_enclave_restrict_perm()     -> sgx_enclave_restrict_permissions()
> >> sgx_ioc_enclave_restrict_perm() -> sgx_ioc_enclave_restrict_permissions()
> > 
> > Yes, unless you have a specific reason to shorten them :-)
> 
> Just aesthetic reasons ... having a long function name can look unbalanced
> if it has many parameters and if the parameters themselves are long it
> becomes hard to keep to the required line length.
> 
> Even so, it does look as though the longest ones can be made to work within 80
> characters:
> sgx_enclave_restrict_permissions(...
>                                  struct sgx_enclave_restrict_permissions *modp,
>                                  ...)
> 
> Other (aesthetic) consequence would be, for example, the core sgx_ioctl() would
> now have some branches span more lines than other so it would not look as neat as
> now (this is subjective I know).
> 
> Apart from the aesthetic reasons I do not have another reason not to make the
> change and I will do so in the next version.

IMHO, for a single call site, aesthetic reasons such as alignment are less
important than a no-brainer function name.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-02-28 12:24       ` Jarkko Sakkinen
@ 2022-02-28 13:19         ` Jarkko Sakkinen
  2022-02-28 15:16         ` Dave Hansen
  1 sibling, 0 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-02-28 13:19 UTC (permalink / raw)
  To: Dhanraj, Vijay
  Cc: Chatre, Reinette, dave.hansen, tglx, bp, Lutomirski, Andy, mingo,
	linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Mon, Feb 28, 2022 at 01:25:07PM +0100, Jarkko Sakkinen wrote:
> On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
> > Hi All,
> > 
> > Regarding the recent update of splitting the page permissions change
> > request into two IOCTLS (RELAX and RESTRICT), can we combine them into
> > one? That is, revert to how it was done in the v1 version?
> 
> They are logically separate complex functionalities:
> 
> 1. "restrict" calls EMODPR and requires EACCEPT
> 2. "relax" increases permissions up to vetted ("EADD") and could be
>     combined with EMODPE called inside enclave.
> 
> I don't think it is a good idea.

I.e. in the microarchitecture there is no EMODP but two different flows,
and thus it is not sane to pretend there is one with that kind of ioctl.
This way it is as granular as the hardware is, and I think that is
common sense.

It would make as much sense as combining ECREATE/EADD/EINIT into a single
multi-function ioctl. Often user space needs to have at least some
logically distinct flows for these anyway.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-02-28 12:24       ` Jarkko Sakkinen
  2022-02-28 13:19         ` Jarkko Sakkinen
@ 2022-02-28 15:16         ` Dave Hansen
  2022-02-28 17:44           ` Dhanraj, Vijay
  2022-03-01 13:26           ` Jarkko Sakkinen
  1 sibling, 2 replies; 130+ messages in thread
From: Dave Hansen @ 2022-02-28 15:16 UTC (permalink / raw)
  To: Jarkko Sakkinen, Dhanraj, Vijay
  Cc: Chatre, Reinette, dave.hansen, tglx, bp, Lutomirski, Andy, mingo,
	linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On 2/28/22 04:24, Jarkko Sakkinen wrote:
>> Regarding the recent update of splitting the page permissions change
>> request into two IOCTLS (RELAX and RESTRICT), can we combine them into
>> one? That is, revert to how it was done in the v1 version?
> They are logically separate complex functionalities:
> 
> 1. "restrict" calls EMODPR and requires EACCEPT
> 2. "relax" increases permissions up to vetted ("EADD") and could be
>     combined with EMODPE called inside enclave.

It would be great to have a _slightly_ better justification than that.
Existing permission interfaces like chmod or mprotect() don't have this
asymmetry.

I think you're saying that the underlying hardware implementation is
asymmetric, so the interface should be too.  I don't find that argument
very convincing.  If the hardware interface is arcane and we can make it
look more sane in the ioctl() layer, we should do that, asymmetry or not.

If we can't make it any more sane, let's say why the ioctl() must or
should be asymmetric.

The SGX2 page permission mechanism is horribly counter intuitive.
*Everybody* that looks at it thinks that it's wrong.  That means that we
have a lot of work ahead of us to explain the interfaces that get
layered on top.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-02-28 15:16         ` Dave Hansen
@ 2022-02-28 17:44           ` Dhanraj, Vijay
  2022-03-01 13:26           ` Jarkko Sakkinen
  1 sibling, 0 replies; 130+ messages in thread
From: Dhanraj, Vijay @ 2022-02-28 17:44 UTC (permalink / raw)
  To: Hansen, Dave, Jarkko Sakkinen
  Cc: Chatre, Reinette, dave.hansen, tglx, bp, Lutomirski, Andy, mingo,
	linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

> On 2/28/22 04:24, Jarkko Sakkinen wrote:
> >> Regarding the recent update of splitting the page permissions change
> >> request into two IOCTLS (RELAX and RESTRICT), can we combine them
> >> into one? That is, revert to how it was done in the v1 version?
> > They are logically separate complex functionalities:
> >
> > 1. "restrict" calls EMODPR and requires EACCEPT 2. "relax" increases
> > permissions up to vetted ("EADD") and could be
> >     combined with EMODPE called inside enclave.
> 
> It would be great to have a _slightly_ better justification than that.
> Existing permission interfaces like chmod or mprotect() don't have this
> asymmetry.
> 
> I think you're saying that the underlying hardware implementation is
> asymmetric, so the interface should be too.  I don't find that argument very
> convincing.  If the hardware interface is arcane and we can make it look more
> sane in the ioctl() layer, we should that, asymmetry or not.
> 

Very nice analogy with `mprotect`, and I agree with this. It would be simpler from
the user space point of view if we could abstract this and maintain a single
interface to relax or restrict permissions. But if the committee feels that having
two IOCTLs is the way to go, then we will modify Gramine to adopt this approach.



^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-02-28 15:16         ` Dave Hansen
  2022-02-28 17:44           ` Dhanraj, Vijay
@ 2022-03-01 13:26           ` Jarkko Sakkinen
  2022-03-01 13:42             ` Jarkko Sakkinen
  1 sibling, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-01 13:26 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Dhanraj, Vijay, Chatre, Reinette, dave.hansen, tglx, bp,
	Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Mon, Feb 28, 2022 at 07:16:22AM -0800, Dave Hansen wrote:
> On 2/28/22 04:24, Jarkko Sakkinen wrote:
> >> Regarding the recent update of splitting the page permissions change
> >> request into two IOCTLS (RELAX and RESTRICT), can we combine them into
> >> one? That is, revert to how it was done in the v1 version?
> > They are logically separate complex functionalities:
> > 
> > 1. "restrict" calls EMODPR and requires EACCEPT
> > 2. "relax" increases permissions up to vetted ("EADD") and could be
> >     combined with EMODPE called inside enclave.
> 
> It would be great to have a _slightly_ better justification than that.
> Existing permission interfaces like chmod or mprotect() don't have this
> asymmetry.
> 
> I think you're saying that the underlying hardware implementation is
> asymmetric, so the interface should be too.  I don't find that argument
> very convincing.  If the hardware interface is arcane and we can make it
> look more sane in the ioctl() layer, we should that, asymmetry or not.

That is my argument, yes.

> If we can't make it any more sane, let's say why the ioctl() must or
> should be asymmetric.

Perhaps underlining this asymmetry in kdoc would be enough.

> The SGX2 page permission mechanism is horribly counter intuitive.
> *Everybody* that looks at it thinks that it's wrong.  That means that we
> have a lot of work ahead of us to explain the interfaces that get
> layered on top.

I fully agree on this :-)

With EACCEPTCOPY (kudos to Mark S. for reminding me of this version of
EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but
obviously new RX pages are now out of the picture:


	/*
	 * Adding a regular page that is architecturally allowed to only
	 * be created with RW permissions.
	 * TBD: Interface with user space policy to support max permissions
	 * of RWX.
	 */
	prot = PROT_READ | PROT_WRITE;
	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;

If that TBD is left unresolved in the final version, page augmentation risks
becoming an API bottleneck, and that risk can then also materialize in the page
permission ioctls.

I.e. any review comment is now based on not fully known territory: we have
one known unknown, and some unknown unknowns from the unpredictable effect on
future API changes.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-01 13:26           ` Jarkko Sakkinen
@ 2022-03-01 13:42             ` Jarkko Sakkinen
  2022-03-01 17:48               ` Reinette Chatre
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-01 13:42 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Dhanraj, Vijay, Chatre, Reinette, dave.hansen, tglx, bp,
	Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Tue, Mar 01, 2022 at 02:26:48PM +0100, Jarkko Sakkinen wrote:
> On Mon, Feb 28, 2022 at 07:16:22AM -0800, Dave Hansen wrote:
> > On 2/28/22 04:24, Jarkko Sakkinen wrote:
> > >> Regarding the recent update of splitting the page permissions change
> > >> request into two IOCTLS (RELAX and RESTRICT), can we combine them into
> > >> one? That is, revert to how it was done in the v1 version?
> > > They are logically separate complex functionalities:
> > > 
> > > 1. "restrict" calls EMODPR and requires EACCEPT
> > > 2. "relax" increases permissions up to vetted ("EADD") and could be
> > >     combined with EMODPE called inside enclave.
> > 
> > It would be great to have a _slightly_ better justification than that.
> > Existing permission interfaces like chmod or mprotect() don't have this
> > asymmetry.
> > 
> > I think you're saying that the underlying hardware implementation is
> > asymmetric, so the interface should be too.  I don't find that argument
> > very convincing.  If the hardware interface is arcane and we can make it
> > look more sane in the ioctl() layer, we should that, asymmetry or not.
> 
> That is my argument, yes.
> 
> > If we can't make it any more sane, let's say why the ioctl() must or
> > should be asymmetric.
> 
> Perhaps underling this asymmetry in kdoc would be enough.
> 
> > The SGX2 page permission mechanism is horribly counter intuitive.
> > *Everybody* that looks at it thinks that it's wrong.  That means that we
> > have a lot of work ahead of us to explain the interfaces that get
> > layered on top.
> 
> I fully agree on this :-)
> 
> With EACCEPTCOPY (kudos to Mark S. for reminding me of this version of
> EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but
> obviously new RX pages are now out of the picture:
> 
> 
> 	/*
> 	 * Adding a regular page that is architecturally allowed to only
> 	 * be created with RW permissions.
> 	 * TBD: Interface with user space policy to support max permissions
> 	 * of RWX.
> 	 */
> 	prot = PROT_READ | PROT_WRITE;
> 	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> 	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
> 
> If that TBD is left out to the final version the page augmentation has a
> risk of a API bottleneck, and that risk can realize then also in the page
> permission ioctls.
> 
> I.e. now any review comment is based on not fully known territory, we have
> one known unknown, and some unknown unknowns from unpredictable effect to
> future API changes.

I think the best way to move forward would be to do EAUG's explicitly with
an ioctl that could also include secinfo for permissions. Then you can
easily do the rest with EACCEPTCOPY inside the enclave.

Putting EAUG in the #PF handler and calling it implicitly is just too flaky
and hard to make deterministic for e.g. a JIT compiler in our use case (not
to mention that JIT is not possible at all because of the inability to do RX
pages).

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-01 13:42             ` Jarkko Sakkinen
@ 2022-03-01 17:48               ` Reinette Chatre
  2022-03-02  2:05                 ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-03-01 17:48 UTC (permalink / raw)
  To: Jarkko Sakkinen, Dave Hansen
  Cc: Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski, Andy, mingo,
	linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

Hi Jarkko,

On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
>> With EACCEPTCOPY (kudos to Mark S. for reminding me of this version of
>> EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but
>> obviously new RX pages are now out of the picture:
>>
>>
>> 	/*
>> 	 * Adding a regular page that is architecturally allowed to only
>> 	 * be created with RW permissions.
>> 	 * TBD: Interface with user space policy to support max permissions
>> 	 * of RWX.
>> 	 */
>> 	prot = PROT_READ | PROT_WRITE;
>> 	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>> 	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
>>
>> If that TBD is left out to the final version the page augmentation has a
>> risk of a API bottleneck, and that risk can realize then also in the page
>> permission ioctls.
>>
>> I.e. now any review comment is based on not fully known territory, we have
>> one known unknown, and some unknown unknowns from unpredictable effect to
>> future API changes.

The plan to complete the "TBD" in the above snippet was to follow this work
with user policy integration at this location. On a high level the plan was
for this to look something like:


 	/*
 	 * Adding a regular page that is architecturally allowed to only
 	 * be created with RW permissions.
 	 * Interface with user space policy to support max permissions
 	 * of RWX.
 	 */
 	prot = PROT_READ | PROT_WRITE;
 	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);

 	if (user space policy allows RWX on dynamically added pages)
 		encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE | PROT_EXEC, 0);
 	else
 		encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE, 0);

The work that follows this series aimed to do the integration with user
space policy.
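
For illustration only, the selection logic above can be modelled in plain
user-space C. This is a sketch, not the planned kernel code: struct
encl_page_model and the policy_allows_rwx flag are hypothetical stand-ins,
and the kernel's calc_vm_prot_bits() is replaced by the raw PROT_* values.

```c
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical stand-in for the two fields of the kernel's enclave page. */
struct encl_page_model {
	unsigned long vm_run_prot_bits;
	unsigned long vm_max_prot_bits;
};

/*
 * A regular page added via EAUG always starts out as RW. Only the maximum
 * permissions depend on the (assumed) policy decision passed in by the caller.
 */
static void set_augmented_page_prot(struct encl_page_model *page,
				    bool policy_allows_rwx)
{
	page->vm_run_prot_bits = PROT_READ | PROT_WRITE;
	page->vm_max_prot_bits = PROT_READ | PROT_WRITE;

	if (policy_allows_rwx)
		page->vm_max_prot_bits |= PROT_EXEC;
}
```

The point of the model is that the running permissions never change at
augmentation time; only the ceiling that a later permission relaxation may
reach does.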

> I think the best way to move forward would be to do EAUG's explicitly with
> an ioctl that could also include secinfo for permissions. Then you can
> easily do the rest with EACCEPTCOPY inside the enclave.

SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for
this purpose. It already includes SECINFO which may also be useful if
needing to later support EAUG of PT_SS* pages.

How this could work is that user space calls SGX_IOC_ENCLAVE_ADD_PAGES
after enclave initialization on any memory region within the enclave where
pages are planned to be added dynamically. This ioctl() calls EAUG to add the
new pages with RW permissions, and their vm_max_prot_bits can be set to the
permissions found in the included SECINFO. This will support later EACCEPTCOPY
as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS.

The big question is whether communicating user policy after enclave initialization
via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? I would
appreciate a confirmation on this direction considering the significant history
behind this topic.
 
> Putting EAUG to the #PF handler and implicitly call it just too flakky and
> hard to make deterministic for e.g. JIT compiler in our use case (not to
> mention that JIT is not possible at all because inability to do RX pages).

In this series this is indeed not possible because it lacks the user policy
integration. JIT will be possible after user policy integration.

Reinette
 

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-01 17:48               ` Reinette Chatre
@ 2022-03-02  2:05                 ` Jarkko Sakkinen
  2022-03-02  2:11                   ` Jarkko Sakkinen
  2022-03-02 22:57                   ` Reinette Chatre
  0 siblings, 2 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-02  2:05 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Dave Hansen, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
> >> With EACCEPTCOPY (kudos to Mark S. for reminding me of this version of
> >> EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but
> >> obviously new RX pages are now out of the picture:
> >>
> >>
> >> 	/*
> >> 	 * Adding a regular page that is architecturally allowed to only
> >> 	 * be created with RW permissions.
> >> 	 * TBD: Interface with user space policy to support max permissions
> >> 	 * of RWX.
> >> 	 */
> >> 	prot = PROT_READ | PROT_WRITE;
> >> 	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> >> 	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
> >>
> >> If that TBD is left out to the final version the page augmentation has a
> >> risk of a API bottleneck, and that risk can realize then also in the page
> >> permission ioctls.
> >>
> >> I.e. now any review comment is based on not fully known territory, we have
> >> one known unknown, and some unknown unknowns from unpredictable effect to
> >> future API changes.
> 
> The plan to complete the "TBD" in the above snippet was to follow this work
> with user policy integration at this location. On a high level the plan was
> for this to look something like:
> 
> 
>  	/*
>  	 * Adding a regular page that is architecturally allowed to only
>  	 * be created with RW permissions.
>  	 * Interface with user space policy to support max permissions
>  	 * of RWX.
>  	 */
>  	prot = PROT_READ | PROT_WRITE;
>  	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> 
>         if (user space policy allows RWX on dynamically added pages)
> 	 	encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE | PROT_EXEC, 0);
> 	else
> 		encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE, 0);
> 
> The work that follows this series aimed to do the integration with user
> space policy.

What do you mean by "user space policy" anyway exactly? I'm sorry but I
just don't fully understand this.

It's too big of a risk to accept this series without X taken care of. A patch
series should have neither TODO nor TBD comments IMHO. I don't want to ack
a series based on speculation about what might happen in the future.

> > I think the best way to move forward would be to do EAUG's explicitly with
> > an ioctl that could also include secinfo for permissions. Then you can
> > easily do the rest with EACCEPTCOPY inside the enclave.
> 
> SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for
> this purpose. It already includes SECINFO which may also be useful if
> needing to later support EAUG of PT_SS* pages.

You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it a day.

And if there is a plan to extend SGX_IOC_ENCLAVE_ADD_PAGES, what is this weird
thing added to the #PF handler? Why is it added at all then?

> How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES
> after enclave initialization on any memory region within the enclave where
> pages are planned to be added dynamically. This ioctl() calls EAUG to add the
> new pages with RW permissions and their vm_max_prot_bits can be set to the
> permissions found in the included SECINFO. This will support later EACCEPTCOPY
> as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS

I don't like this type of re-use of the existing API.

> The big question is whether communicating user policy after enclave initialization
> via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? I would
> appreciate a confirmation on this direction considering the significant history
> behind this topic.

I have no idea because I don't know what user space policy is.

> > Putting EAUG to the #PF handler and implicitly call it just too flakky and
> > hard to make deterministic for e.g. JIT compiler in our use case (not to
> > mention that JIT is not possible at all because inability to do RX pages).
> 
> In this series this is indeed not possible because it lacks the user policy
> integration. JIT will be possible after user policy integration.

Like this I don't see what this series can be used for in practice.

The majority of practical use cases for EDMM boil down to having a way to add
new executable code (not just Enarx).

> Reinette

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-02  2:05                 ` Jarkko Sakkinen
@ 2022-03-02  2:11                   ` Jarkko Sakkinen
  2022-03-02  4:03                     ` Jarkko Sakkinen
  2022-03-02 22:57                   ` Reinette Chatre
  1 sibling, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-02  2:11 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Dave Hansen, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Wed, Mar 02, 2022 at 03:05:25AM +0100, Jarkko Sakkinen wrote:
> > The work that follows this series aimed to do the integration with user
> > space policy.
> 
> What do you mean by "user space policy" anyway exactly? I'm sorry but I
> just don't fully understand this.
> 
> It's too big of a risk to accept this series without X taken care of. Patch
> series should neither have TODO nor TBD comments IMHO. I don't want to ack
> a series based on speculation what might happen in the future.

If I accept this, then I'm kind of pre-acking code when I have no idea what
it looks like, whether it can be acked, or whether I'm doing the right thing
for the kernel by acking this.

It's unfortunately a force majeure situation for me. I simply cannot ack
this, whether I want to or not.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-02  2:11                   ` Jarkko Sakkinen
@ 2022-03-02  4:03                     ` Jarkko Sakkinen
  0 siblings, 0 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-02  4:03 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Dave Hansen, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Wed, Mar 02, 2022 at 03:11:06AM +0100, Jarkko Sakkinen wrote:
> On Wed, Mar 02, 2022 at 03:05:25AM +0100, Jarkko Sakkinen wrote:
> > > The work that follows this series aimed to do the integration with user
> > > space policy.
> > 
> > What do you mean by "user space policy" anyway exactly? I'm sorry but I
> > just don't fully understand this.
> > 
> > It's too big of a risk to accept this series without X taken care of. Patch
> > series should neither have TODO nor TBD comments IMHO. I don't want to ack
> > a series based on speculation what might happen in the future.
> 
> If I accept this, then I'm kind of pre-acking code that I have no idea what
> it looks like, can it be acked, or am I doing the right thing for the
> kernel by acking this. 
> 
> It's unfortunately force majeure situation for me. I simply could not ack
> this, whether I want it or not.

I'd actually like to leave the permission change madness completely out of
this patch set, as we all know it is a crazy beast of microarchitecture. For
user space, having that is less critical than having executable pages.

Simply with EAUG/EACCEPTCOPY you can already populate an enclave with any
permissions you had in mind. Augmenting alone would be a logically consistent
patch set that is actually usable for many workloads.

Now there is half-broken augmenting (this is even written down in the TBD
comment) and complex code for EMODPR and EMODT that is usable only for
kselftests and not much else before there is fully working augmenting.

This way we get an actually sound patch set that is easy to review and apply
to the mainline. It is also far easier for you to iterate on a smaller
set of patches.

After this it is so much easier to start to look at remaining functionality,
and at the same time augmenting part can be stress tested with real-world
code and it will mature quickly.

This whole thing *really* needs a serious U-turn on how it is delivered
upstream. Sometimes it is better just to admit that this didn't start
on the right foot.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2
  2022-02-23 18:25       ` Reinette Chatre
@ 2022-03-02 16:57         ` Nathaniel McCallum
  2022-03-02 21:20           ` Reinette Chatre
  0 siblings, 1 reply; 130+ messages in thread
From: Nathaniel McCallum @ 2022-03-02 16:57 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, Jarkko Sakkinen, tglx, bp, Andy Lutomirski, mingo,
	linux-sgx, x86, seanjc, kai.huang, cathy.zhang, cedric.xing,
	haitao.huang, mark.shanahan, hpa, linux-kernel

Reinette,

Perhaps it would be better for us to have a shared understanding on
how the patches as posted are supposed to work in the most common
cases? I'm thinking here of projects such as Enarx, Gramine and
Occlum, which all have a similar process. Namely they execute an
executable (called exec in the below chart) which has things like
syscalls handled by a shim. These two components (shim and exec) are
supported by a non-enclave userspace runtime. Given this common
architectural pattern, this is how I understand adding pages via an
exec call to mmap() to work.

https://mermaid.live/edit#pako:eNp1k81qwzAQhF9F6NRCAu1Vh0BIRemhoeSHBuIettYmFpElVZZLQ8i7144sJ8aOT2bmY3d2vT7R1AikjBb4U6JO8UXC3kGeaFI9FpyXqbSgPTmg06j6uiu1lzn2jSKTA2XwD9NEB31uPBLzi-6iMpLnYB8Wn4-kOBYpKBW52iXj8WQSmzEy5Zvt01ewG5HUQN2UEc7nK77YPjdALd64GWih8NpkALGwR_JtzOGAaKXexyTKGEt2pgoMaXahgj5Qgk9nM_6xGvDDJpsmOyiVv0LB62B8un4dBDrLiLPeWciCL9fvvKVQizhSG6stFz9Df7sxUpcYitR-SodFO2A_Vw-7l4nzzduqjX9bKJxOHDDeBB3RHF0OUlS3faq1hPoMqzulrHoVGPZOE32u0NIK8MiF9MZRtgNV4IhC6c3yqFPKvCsxQs3_0VDnfzf-CPg

This only covers adding RW pages. I haven't even tackled permission
changes yet. Is that understanding correct? If not, please provide an
alternative sequence diagram to explain how you expect this to be
used.

On Wed, Feb 23, 2022 at 1:25 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Nathaniel,
>
> On 2/23/2022 5:24 AM, Nathaniel McCallum wrote:
> > On Tue, Feb 22, 2022 at 5:39 PM Reinette Chatre
> > <reinette.chatre@intel.com> wrote:
> >>
> >> Hi Nathaniel,
> >>
> >> On 2/22/2022 12:27 PM, Nathaniel McCallum wrote:
> >>> 1. This interface looks very odd to me. mmap() is the kernel interface
> >>> for changing user space memory maps. Why are we introducing a new
> >>> interface for this?
> >>
> >> mmap() is the kernel interface used to create new mappings in the
> >> virtual address space of the calling process. This is different from
> >> the permissions and properties of the underlying file/memory being mapped.
> >>
> >> A new interface is introduced because changes need to be made to the
> >> permissions and properties of the underlying enclave. A new virtual
> >> address space is not needed nor should existing VMAs be impacted.
> >>
> >> This is similar to how mmap() is not used to change file permissions.
> >>
> >> VMA permissions are separate from enclave page permissions as found in
> >> the EPCM (Enclave Page Cache Map). The current implementation (SGX1) already
> >> distinguishes between the VMA and EPCM permissions - for example, it is
> >> already possible to create a read-only VMA from enclave pages that have
> >> RW EPCM permissions. mmap() of a portion of EPC memory with a particular
> > >> permission does not imply that the underlying EPCM permissions (should) have
> >> that permission.
> >
> > Yes. BUT... unlike the file permissions, this leaks an implementation detail.
>
> Not really - just like a RW file can be mapped read-only or RW, RW enclave
> memory can be mapped read-only or RW.
>
> >
> > The user process is governed by VMA permissions. And during enclave
> > creation, it had to mmap() all the enclave regions to their final VMA
> > permissions. So during enclave creation you have to use mmap() but
> > after enclave creation you use custom APIs? That's inconsistent at
> > best.
>
> No. ioctl()s are consistently used to manage enclave memory.
>
> The existing ioctl()s SGX_IOC_ENCLAVE_CREATE, SGX_IOC_ENCLAVE_ADD_PAGES,
> and SGX_IOC_ENCLAVE_INIT are used to set up and initialize the enclave memory.
>
> The new ioctl()s are used to manage enclave memory after enclave initialization.
>
> The enclave memory is thus managed with a consistent interface.
>
> mmap() is required before SGX_IOC_ENCLAVE_CREATE to obtain a base address
> for the enclave that is required by the ioctl(). The rest of the ioctl()s,
> existing and new, are consistent in interface by not requiring a memory
> mapping but instead work from an offset from the base address.
>
> > Forcing userspace to worry about the (mostly undocumented!)
> > interactions between EPC, PTE and VMA permissions makes these APIs
> > hard to use and difficult to reason about.
>
> This is not new. The current SGX1 user space is already prevented from
> creating a mapping of enclave memory that is more relaxed than the enclave
> memory. For example, if the enclave memory has RW EPCM permissions then it
> is not possible to mmap() that memory as RWX.
>
> >
> > When I call SGX_IOC_ENCLAVE_RELAX_PERMISSIONS, do I also have to call
> > mmap() to update the VMA permissions to match? It isn't clear. Nor is
>
> mprotect() may be the better call to use.
>
> > it really clear why I'm calling completely separate APIs.
> >
> >>> You can just simply add a new mmap flag (i.e.
> >>> MAP_SGX_TCS*) and then figure out which SGX instructions to execute
> >>> based on the desired state of the memory maps. If you do this, none of
> >>> the following ioctls are needed:
> >>>
> >>> * SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
> >>> * SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
> >>> * SGX_IOC_ENCLAVE_REMOVE_PAGES
> >>> * SGX_IOC_ENCLAVE_MODIFY_TYPE
> >>>
> >>> It also means that languages don't have to grow support for all these
> >>> ioctls. Instead, they can just reuse the existing mmap() bindings with
> >>> the new flag. Also, multiple operations can be combined into a single
> >>> mmap() call, amortizing the changes over a single context switch.
> >>>
> >>> 2. Automatically adding pages with hard-coded permissions in a fault
> >>> handler seems like a really bad idea.
> >>
> >> Could you please elaborate why this is a bad idea?
> >
> > Because implementations that miss this subtlety suddenly have pages
> > with magic permissions. Magic is bad. Explicit is good.
> >
>
> There is no magic. Any new pages have to be accepted by the enclave.
> The enclave will not be able to access these pages unless explicitly
> accepted, ENCLU[EACCEPT], from within the enclave.
>
> >>> How do you distinguish between
> >>> accesses which should result in an updated mapping and accesses that
> >>> should result in a fault?
> >>
> >> Accesses that should result in an updated mapping have two requirements:
> >> (a) address accessed belongs to the enclave based on the address
> >>     range specified during enclave create
> >> (b) there is no backing enclave page for the address
> >
> > What happens if the enclave is buggy? Or has been compromised. In both
> > of those cases, there should be a userspace visible fault and pages
> > should not be added.
>
> If user space accesses a memory address with a regular read/write that
> results in a new page added then there is indeed a user space visible
> fault. You can see this flow in action in the "augment" test case in
> https://lore.kernel.org/linux-sgx/32c1116934a588bd3e6c174684e3e36a05c0a4d4.1644274683.git.reinette.chatre@intel.com/
>
> If user space indeed wants the page after encountering such a fault then
> it needs to enter the enclave again, from a different entry point, to
> run ENCLU[EACCEPT], before it can return to the original entry point to
> continue execution from the instruction that triggered the original read/write.
>
> The only flow where a page is added without a user space visible fault
> is when user space explicitly runs the ENCLU[EACCEPT] to do so.
>
> >
> >>> IMHO, all unmapped page accesses should
> >>> result in a page fault. mmap() should be called first to identify the
> >>> correct permissions for these pages.
> >>> Then the page handler should be
> >>> updated to use the permissions from the mapping when backfilling
> >>> physical pages. If I understand correctly, this should also obviate
> >>
> >> Regular enclave pages can _only_ be dynamically added with RW permission.
> >>
> >> SGX2's support for adding regular pages to an enclave via the EAUG
> >> instruction is architecturally set at RW. The OS cannot change those permissions
> >> via the EAUG instruction nor can the OS do so with a different/additional
> >> instruction because:
> >> * the OS is not able to relax permissions since that can only be done from
> >> within the enclave with ENCLU[EMODPE], thus it is not possible for the OS to
> >> dynamically add pages via EAUG as RW and then relax permissions to RWX.
> >> * the OS is not able to EAUG a page and immediately attempt an EMODPR either
> >> as Jarkko also recently inquired about:
> >> https://lore.kernel.org/linux-sgx/80f3d7b9-e3d5-b2c0-7707-710bf6f5081e@intel.com/
> >
> > This design looks... unfinished. EAUG takes a PAGEINFO in RBX, but
> > PAGEINFO.SECINFO must be zeroed and EAUG instead sets magic hard-coded
> > permissions. Why doesn't EAUG just respect the permissions in
> > PAGEINFO.SECINFO? We aren't told.
>
> This design is finished and respects the hardware specification. You can find
> the details in the SDM's documentation of the EAUG function.
>
> If the SECINFO field has a value then the hardware requires it to indicate
> that it is a new shadow stack page being added, not a regular page. Support for
> shadow stack pages is not in scope for this work. Attempting to dynamically
> add a regular page with explicit permissions will result in a #GP(0).
>
> The only way to add a regular enclave page is to make the SECINFO field empty
> and doing so forces the page type to be a regular page and the permissions to
> be RW.
>
> >
> > Further, if the enclave can do EMODPE, why does
> > SGX_IOC_ENCLAVE_RELAX_PERMISSIONS even exist? None of the
> > documentation explains what this ioctl even does. Does it update PTE
> > permissions? VMA permissions? Nobody knows without reading the source
> > code.
>
> Build the documentation (after applying this series) and it should
> contain all the information you are searching for. As is the current custom
> in the SGX documentation the built documentation pulls its content from
> the kernel doc of the functions that implement the core of the
> user space interactions.
>
> >
> > Userspace should not be bothered with the subtle details of the
> > interaction between EPC, PTE and VMA permissions. But this API does
> > everything it can do to expose all these details to userspace. And it
> > doesn't bother to document them (probably because it is hard). It
> > would be much better to avoid exposing these details to userspace.
> >
> > IMHO, there should be a simple flow like this (if EAUG respects
> > PAGEINFO.SECINFO):
>
> EAUG does not respect PAGEINFO.SECINFO for regular pages.
>
> >
> > 1. Non-enclave calls mmap()/munmap().
> > 2. Enclave issues EACCEPT, if necessary.
> > 3. Enclave issues EMODPE, if necessary.
> >
> > Notice that in the first step above, during the mmap() call, the
> > kernel ensures that EPC, PTE and VMA are in sync and fails if they
> > cannot be made to be compatible. Also note that in the above flow EAUG
> > instructions can be efficiently batched.
> >
> > Given the current poor state of the EAUG instruction, we might need to
> > do this flow instead:
> >
> > 1. Enclave issues EACCEPT, if necessary. (Add RW pages...)
> > 2. Non-enclave calls mmap()/munmap().
> > 3. Enclave issues EACCEPT, if necessary.
> > 4. Enclave issues EMODPE, if necessary.
> >
> > However, doing EAUG only via the page access handler means that there
> > is no way to batch EAUG instructions and this forces a context switch
> > for every page you want to add. This has to be terrible for
> > performance. Note specifically that the SDM calls out batching, which
> > is currently impossible under this patch set. 35.5.7 - "Page
> > allocation operations may be batched to improve efficiency."
>
> These page functions are all per-page so it is not possible to add multiple
> pages with a single instruction. It is indeed possible to pre-fault pages.
>
> > As it stands today, if I want to add 256MiB of pages to an enclave,
> > I'll have to do 2^16 context switches. That doesn't seem scalable.
>
> No. Running ENCLU[EACCEPT] on each of the pages within that range should not
> need any explicit context switch out of the enclave. See the "augment_via_eaccept"
> test case in:
> https://lore.kernel.org/linux-sgx/32c1116934a588bd3e6c174684e3e36a05c0a4d4.1644274683.git.reinette.chatre@intel.com/
>
>
> >>> the need for the weird userspace callback to allow for execute
> >>> permissions.
> >>
> >> User policy integration would always be required to allow execute
> >> permissions on a writable page. This is not expected to be a userspace
> >> callback but instead integration with existing user policy subsystem(s).
> >
> > Why? This isn't documented.
>
> This is similar to the existing policies involved in managing the permissions
> of mapped memory. When user space calls mprotect() to change permissions
> of a mapped region then the kernel will not blindly allow the permissions but
> instead ensure that it is allowed based on user policy by calling the LSM
> (Linux Security Module) hooks.
>
> You can learn more about LSM and various security modules at:
> Documentation/security/lsm.rst
> Documentation/admin-guide/LSM/*
>
> You can compare what is needed here to what is currently done when user space
> attempts to make some memory executable (see:
> mm/mprotect.c:do_mprotect_key()->security_file_mprotect()). User policy needs
> to help the kernel determine if this is allowed. For example, when SELinux is
> the security module of choice then the process or file (depending on what type
> of memory is being changed) needs to have a special permission (PROCESS__EXECHEAP,
> PROCESS__EXECSTACK, or FILE__EXECMOD) assigned by user space to allow this.
>
> Integration with user space policy is required for RWX of dynamically added pages
> to be supported. In this series dynamically added pages will not be allowed to
> be made executable, a follow-up series will add support for user policy
> integration to support RWX permissions of dynamically added pages.
>
> >>> 3. Implementing as I've suggested also means that we can lock down an
> >>> enclave, for example - after code has been JITed, by closing the file
> >>> descriptor. Once the file descriptor used to create the enclave is
> >>> closed, no further mmap() can be performed on the enclave. Attempting
> >>> to do EACCEPT on an unmapped page will generate a page fault.
> >>
> >> This is not clear to me. If the file descriptor is closed and no further
> >> mmap() is allowed then how would a process be able to enter the enclave
> >> to execute code within it?
> >
> > EENTER (or the vdso function) with the address of a TCS page, like
> > normal. In Enarx, we don't retain the enclave fd after the final
> > mmap() following EINIT. Everything works just fine.
>
> The OS fault handler is responsible for managing the PTEs that are required
> for the enclave to be able to access the memory within the enclave.
> The OS fault handler is attached to a VMA that is created with mmap().
>
> >
> >> This series does indeed lock down the address range to ensure that it is
> >> not possible to map memory that does not belong to the enclave after the
> >> enclave is created. Please see:
> >> https://lore.kernel.org/linux-sgx/1b833dbce6c937f71523f4aaf4b2181b9673519f.1644274683.git.reinette.chatre@intel.com/
> >
> > That's not what I'm talking about. I'm talking about a workflow like this:
> >
> > 1. Enclave initialization: ECREATE ... EINIT
> > 2. EENTER
> > 3. Enclave JITs some code (changes page permissions)
> > 4. EEXIT
> > 5. Close enclave fd.
> > 6. EENTER
> > 7. If an enclave attempts page modifications, a fault occurs.
>
> The original fd that was created to obtain the enclave base address
> may be closed at (5) but the executable and data portions of the enclave
> still need to be mapped afterwards to be able to have OS support for
> managing the PTEs that the enclave depends on to access those pages.
>
> >
> > Think of this similar to seccomp(). The enclave wants to do some
> > dynamic page table manipulation. But then it wants to lock down page
> > table modification so that, if compromised, attackers have no ability
> > to obtain RWX permissions.
>
> Reinette

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2
  2022-03-02 16:57         ` Nathaniel McCallum
@ 2022-03-02 21:20           ` Reinette Chatre
  2022-03-03  1:13             ` Nathaniel McCallum
  0 siblings, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-03-02 21:20 UTC (permalink / raw)
  To: Nathaniel McCallum
  Cc: dave.hansen, Jarkko Sakkinen, tglx, bp, Andy Lutomirski, mingo,
	linux-sgx, x86, seanjc, kai.huang, cathy.zhang, cedric.xing,
	haitao.huang, mark.shanahan, hpa, linux-kernel

Hi Nathaniel,

On 3/2/2022 8:57 AM, Nathaniel McCallum wrote:
> Perhaps it would be better for us to have a shared understanding on
> how the patches as posted are supposed to work in the most common
> cases? I'm thinking here of projects such as Enarx, Gramine and
> Occlum, which all have a similar process. Namely they execute an
> executable (called exec in the below chart) which has things like
> syscalls handled by a shim. These two components (shim and exec) are
> supported by a non-enclave userspace runtime. Given this common
> architectural pattern, this is how I understand adding pages via an
> exec call to mmap() to work.
> 
> https://mermaid.live/edit#pako:eNp1k81qwzAQhF9F6NRCAu1Vh0BIRemhoeSHBuIettYmFpElVZZLQ8i7144sJ8aOT2bmY3d2vT7R1AikjBb4U6JO8UXC3kGeaFI9FpyXqbSgPTmg06j6uiu1lzn2jSKTA2XwD9NEB31uPBLzi-6iMpLnYB8Wn4-kOBYpKBW52iXj8WQSmzEy5Zvt01ewG5HUQN2UEc7nK77YPjdALd64GWih8NpkALGwR_JtzOGAaKXexyTKGEt2pgoMaXahgj5Qgk9nM_6xGvDDJpsmOyiVv0LB62B8un4dBDrLiLPeWciCL9fvvKVQizhSG6stFz9Df7sxUpcYitR-SodFO2A_Vw-7l4nzzduqjX9bKJxOHDDeBB3RHF0OUlS3faq1hPoMqzulrHoVGPZOE32u0NIK8MiF9MZRtgNV4IhC6c3yqFPKvCsxQs3_0VDnfzf-CPg
> 
> This only covers adding RW pages. I haven't even tackled permission
> changes yet. Is that understanding correct? If not, please provide an
> alternative sequence diagram to explain how you expect this to be
> used.

Please find my attempt linked below:

https://mermaid.live/edit#pako:eNqFUsFqAjEQ_ZWQUwsK7XUPgthQeqiUVang9jAkoxu6m2yzWVsR_72J2WTbKnSOb97MvPeSI-VaIM1oix8dKo4PEnYG6kIRVw0YK7lsQFlSghGfYPCy845GYXWJm05ZWV8ZaEt55QB-IS9UwOfaItF7NGc0I3UNzU3-ekvaQ8uhqiLPd8l4PJnEYxmZsvXm7i20e5B4QlA5rAqMgJJfG9Ixg21X2ctVXn9GGJsvWb65729FSZXWDdlqpxx46Qzu-gB8-cHzhhim2zKdzdjLcuAAt3IPzv6Qkq84EdxGM3492UJS-cdSpLHp6nEgCPz3RjI5NPvAlRisJjspOsbWT8sUyc_MwjuynC1Wzyw9EB3RGk0NUrgvePRYQW2J7tNQd5sKDN5ooU6O2jXCiWZCWm1otoWqxRGFzurFQXGaWdNhJPXfuGedvgFejOuH

The changes include:
* Move mmap() to occur before attempting EACCEPT on the addresses. This is
  required for EACCEPT (as well as any subsequent access from within the enclave)
  to be able to access the pages.
* Remove AEX[1] to the runtime within the loop. After EAUG returns, execution
  resumes at the instruction that triggered the #PF (the EACCEPT). The EACCEPT
  is then run again, this time succeeding.

This is based on the implementation within this series. When supporting
the new ioctl() requested by Jarkko there will be an additional ioctl()
required before the loop.

Reinette

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-02  2:05                 ` Jarkko Sakkinen
  2022-03-02  2:11                   ` Jarkko Sakkinen
@ 2022-03-02 22:57                   ` Reinette Chatre
  2022-03-03 16:08                     ` Haitao Huang
  2022-03-03 23:12                     ` Jarkko Sakkinen
  1 sibling, 2 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-03-02 22:57 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Dave Hansen, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

Hi Jarkko,

On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote:
> On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
>>>> With EACCEPTCOPY (kudos to Mark S. for reminding me of this version of
>>>> EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but
>>>> obviously new RX pages are now out of the picture:
>>>>
>>>>
>>>> 	/*
>>>> 	 * Adding a regular page that is architecturally allowed to only
>>>> 	 * be created with RW permissions.
>>>> 	 * TBD: Interface with user space policy to support max permissions
>>>> 	 * of RWX.
>>>> 	 */
>>>> 	prot = PROT_READ | PROT_WRITE;
>>>> 	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>>>> 	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
>>>>
>>>> If that TBD is left out to the final version the page augmentation has a
>>>> risk of a API bottleneck, and that risk can realize then also in the page
>>>> permission ioctls.
>>>>
>>>> I.e. now any review comment is based on not fully known territory, we have
>>>> one known unknown, and some unknown unknowns from unpredictable effect to
>>>> future API changes.
>>
>> The plan to complete the "TBD" in the above snippet was to follow this work
>> with user policy integration at this location. On a high level the plan was
>> for this to look something like:
>>
>>
>>  	/*
>>  	 * Adding a regular page that is architecturally allowed to only
>>  	 * be created with RW permissions.
>>  	 * Interface with user space policy to support max permissions
>>  	 * of RWX.
>>  	 */
>>  	prot = PROT_READ | PROT_WRITE;
>>  	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>>
>>         if (user space policy allows RWX on dynamically added pages)
>> 	 	encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE | PROT_EXEC, 0);
>> 	else
>> 		encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE, 0);
>>
>> The work that follows this series aimed to do the integration with user
>> space policy.
> 
> What do you mean by "user space policy" anyway exactly? I'm sorry but I
> just don't fully understand this.

My apologies - I just assumed that you would need no reminder about this contentious
part of SGX history. Essentially it means that, yes, the kernel could theoretically
permit any kind of access to any file/page, but some accesses are known to generally
be a bad idea - like making memory executable as well as writable - and thus there
are additional checks based on what user space permits before the kernel allows
such accesses.

For example,
mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()

User policy and SGX has seen significant discussion. Some notable threads:
https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/
https://lore.kernel.org/linux-security-module/20190619222401.14942-1-sean.j.christopherson@intel.com/
 
> It's too big of a risk to accept this series without X taken care of. Patch
> series should neither have TODO nor TBD comments IMHO. I don't want to ack
> a series based on speculation what might happen in the future.

ok

> 
>>> I think the best way to move forward would be to do EAUG's explicitly with
>>> an ioctl that could also include secinfo for permissions. Then you can
>>> easily do the rest with EACCEPTCOPY inside the enclave.
>>
>> SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for
>> this purpose. It already includes SECINFO which may also be useful if
>> needing to later support EAUG of PT_SS* pages.
> 
> You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it a day.

I could, yes.

> And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is this weird
> thing added to the #PF handler? Why is it added at all then?

I was just speculating in my response, there is no plan to extend
SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of).

>> How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES
>> after enclave initialization on any memory region within the enclave where
>> pages are planned to be added dynamically. This ioctl() calls EAUG to add the
>> new pages with RW permissions and their vm_max_prot_bits can be set to the
>> permissions found in the included SECINFO. This will support later EACCEPTCOPY
>> as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
> 
> I don't like this type of re-use of the existing API.

I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is consensus after
considering the user policy question (above) and performance trade-off (more below).

> 
>> The big question is whether communicating user policy after enclave initialization
>> via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? I would
>> appreciate a confirmation on this direction considering the significant history
>> behind this topic.
> 
> I have no idea because I don't know what is user space policy.

This discussion is about some enclave usages needing RWX permissions
on dynamically added enclave pages. RWX permissions on dynamically added pages is
not something that should blindly be allowed for all SGX enclaves but instead the user
needs to explicitly allow specific enclaves to have such ability. This is equivalent
to (but not the same as) what exists in Linux today with LSM. As seen in
mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is able to make
files and memory be both writable and executable, but it would only do so for those
files and memory that the LSM (which is how user policy is communicated, like SELinux)
indicates it is allowed, not blindly do so for all files and all memory.
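
For illustration only (this is not code from the series): a minimal user-space
sketch of the policy gate described above. The names eaug_max_prot() and
prot_within_max() are hypothetical, and the PROT_* flags stand in for the
kernel's vm_max_prot_bits. It assumes the policy decision arrives as a single
boolean, which glosses over how an LSM would actually be consulted.

```c
#include <stdbool.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: maximum permissions for a dynamically added page.
 * RW is always allowed; X is added only when user policy permits RWX.
 */
static int eaug_max_prot(bool policy_allows_rwx)
{
	int prot = PROT_READ | PROT_WRITE;

	if (policy_allows_rwx)
		prot |= PROT_EXEC;
	return prot;
}

/*
 * Hypothetical helper: a later permission request is acceptable only if
 * it asks for no bit outside the maximum recorded for the page.
 */
static bool prot_within_max(int requested, int max_prot)
{
	int rwx = PROT_READ | PROT_WRITE | PROT_EXEC;

	return (requested & ~max_prot & rwx) == 0;
}
```

With this sketch, a request for PROT_READ | PROT_EXEC is refused when the
policy does not allow RWX and accepted when it does.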

>>> Putting EAUG in the #PF handler and calling it implicitly is just too flaky and
>>> hard to make deterministic for e.g. the JIT compiler in our use case (not to
>>> mention that JIT is not possible at all because of the inability to do RX pages).

I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more deterministic but from
what I understand it would have a performance impact since it would require all memory
that may be needed by the enclave be pre-allocated from outside the enclave and not
just dynamically allocated from within the enclave at the time it is needed.

Would such a performance impact be acceptable?

>> In this series this is indeed not possible because it lacks the user policy
>> integration. JIT will be possible after user policy integration.
> 
> Like this I don't see how this series can be used in practice.
> 
> Majority of practical use cases for EDMM boil down to having a way to add
> new executable code (not just Enarx).
> 

Understood.

On 3/1/2022 8:03 PM, Jarkko Sakkinen wrote:
> I'd actually like to leave permission change madness completely out of this
> patch set, as we all know it is a crazy beast of microarchitecture. For
> user space having that is less critical than having executable pages.
> 
> Simply with EAUG/EACCEPTCOPY you can already populate enclave with any
> permissions you had in mind. Augmenting alone would be logically consistent
> patch set that is actually usable for many workloads.

Support for permission changes is required in order to support dynamically added
pages (EAUG pages) to be made executable. Yes, you could make
a dynamically added page have executable EPCM permissions using EACCEPTCOPY
but the kernel is still required to make the PTE executable.
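
As a rough illustration of why both layers matter: an enclave access has to
pass the page table check as well as the EPCM check, so the usable permissions
are the intersection of the two. The sketch below is a toy model, not kernel
code. The helper names are hypothetical and the PROT_* flags are used as
stand-ins for both the EPCM and the PTE permission bits.

```c
#include <stdbool.h>
#include <sys/mman.h>

/*
 * Toy model: an access is allowed only if both the EPCM permissions
 * and the PTE permissions allow it.
 */
static int effective_prot(int epcm_prot, int pte_prot)
{
	return epcm_prot & pte_prot;
}

/* True if the page can be executed from under this toy model. */
static bool can_execute(int epcm_prot, int pte_prot)
{
	return (effective_prot(epcm_prot, pte_prot) & PROT_EXEC) != 0;
}
```

In this model a page that EACCEPTCOPY made RX in the EPCM is still not
executable while its PTE remains RW.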

> Now there is half-broken augmenting (this is even written down in the TBD
> comment) and complex code for EMODPR and EMODT that is usable only for
> kselftests and not much else before there is fully working augmenting.
> 
> This way we get actually sound patch set that is easy to review and apply
> to the mainline. It is also factors easier for you to iterate a smaller
> set of patches.
> 
> After this it is so much easier to start to look at remaining functionality,
> and at the same time augmenting part can be stress tested with real-world
> code and it will mature quickly.
> 
> This whole thing *really* needs a serious U-turn on how it is delivered to
> the upstream. Sometimes it is better just to admit that this didn't start
> with the right foot.

As mentioned above, from what I understand the support for (as you state) the
"majority of practical use cases" on dynamically added pages does require
supporting permission changes also. It thus seems to me that it would help
consumers of this feature if dynamic addition of pages and permission changes
are presented together. The SGX2 functionality that remains after that is the
changing of page type, which forms part of the page removal flow. In this
regard I also find that presenting the page addition flow at the same time
as the page removal flow would make these features easier to consume. I
think supporting the addition of pages and leaving page removal to
"future work" would be similarly frustrating to consume.

Reinette

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2
  2022-03-02 21:20           ` Reinette Chatre
@ 2022-03-03  1:13             ` Nathaniel McCallum
  2022-03-03 17:49               ` Reinette Chatre
  2022-03-04  0:57               ` Jarkko Sakkinen
  0 siblings, 2 replies; 130+ messages in thread
From: Nathaniel McCallum @ 2022-03-03  1:13 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, Jarkko Sakkinen, tglx, bp, Andy Lutomirski, mingo,
	linux-sgx, x86, seanjc, kai.huang, cathy.zhang, cedric.xing,
	haitao.huang, mark.shanahan, hpa, linux-kernel

On Wed, Mar 2, 2022 at 4:20 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Nathaniel,
>
> On 3/2/2022 8:57 AM, Nathaniel McCallum wrote:
> > Perhaps it would be better for us to have a shared understanding on
> > how the patches as posted are supposed to work in the most common
> > cases? I'm thinking here of projects such as Enarx, Gramine and
> > Occlum, which all have a similar process. Namely they execute an
> > executable (called exec in the below chart) which has things like
> > syscalls handled by a shim. These two components (shim and exec) are
> > supported by a non-enclave userspace runtime. Given this common
> > architectural pattern, this is how I understand adding pages via an
> > exec call to mmap() to work.
> >
> > https://mermaid.live/edit#pako:eNp1k81qwzAQhF9F6NRCAu1Vh0BIRemhoeSHBuIettYmFpElVZZLQ8i7144sJ8aOT2bmY3d2vT7R1AikjBb4U6JO8UXC3kGeaFI9FpyXqbSgPTmg06j6uiu1lzn2jSKTA2XwD9NEB31uPBLzi-6iMpLnYB8Wn4-kOBYpKBW52iXj8WQSmzEy5Zvt01ewG5HUQN2UEc7nK77YPjdALd64GWih8NpkALGwR_JtzOGAaKXexyTKGEt2pgoMaXahgj5Qgk9nM_6xGvDDJpsmOyiVv0LB62B8un4dBDrLiLPeWciCL9fvvKVQizhSG6stFz9Df7sxUpcYitR-SodFO2A_Vw-7l4nzzduqjX9bKJxOHDDeBB3RHF0OUlS3faq1hPoMqzulrHoVGPZOE32u0NIK8MiF9MZRtgNV4IhC6c3yqFPKvCsxQs3_0VDnfzf-CPg
> >
> > This only covers adding RW pages. I haven't even tackled permission
> > changes yet. Is that understanding correct? If not, please provide an
> > alternative sequence diagram to explain how you expect this to be
> > used.
>
> Please find my attempt linked below:
>
> https://mermaid.live/edit#pako:eNqFUsFqAjEQ_ZWQUwsK7XUPgthQeqiUVang9jAkoxu6m2yzWVsR_72J2WTbKnSOb97MvPeSI-VaIM1oix8dKo4PEnYG6kIRVw0YK7lsQFlSghGfYPCy845GYXWJm05ZWV8ZaEt55QB-IS9UwOfaItF7NGc0I3UNzU3-ekvaQ8uhqiLPd8l4PJnEYxmZsvXm7i20e5B4QlA5rAqMgJJfG9Ixg21X2ctVXn9GGJsvWb65729FSZXWDdlqpxx46Qzu-gB8-cHzhhim2zKdzdjLcuAAt3IPzv6Qkq84EdxGM3492UJS-cdSpLHp6nEgCPz3RjI5NPvAlRisJjspOsbWT8sUyc_MwjuynC1Wzyw9EB3RGk0NUrgvePRYQW2J7tNQd5sKDN5ooU6O2jXCiWZCWm1otoWqxRGFzurFQXGaWdNhJPXfuGedvgFejOuH
>
> The changes include:
> * Move mmap() to occur before attempting EACCEPT on the addresses. This is
>   required for EACCEPT (as well as any subsequent access from within the enclave)
>   to be able to access the pages.
> * Remove AEX[1] to the runtime within the loop. After EAUG returns execution
>   will return to the instruction pointer that triggered the #PF, EACCEPT,
>   this will cause the EACCEPT to be run again, this time succeeding.
>
> This is based on the implementation within this series. When supporting
> the new ioctl() requested by Jarkko there will be an additional ioctl()
> required before the loop.

https://mermaid.live/edit/#pako:eNp1U9FqgzAU_ZWQpw1a2F6FFaQLYw8ro7asUPeQmWsNNYlL4rZS-u-LRmut1ie953jvOecmR5woBjjABr5LkAk8c7rTVMQSuYeWVslSfIH23wXVlie8oNKijGr2SzUMkT1oCfmwrktpuRj5wWRcDKvwB0ksfX2hLCD1A7quBkgIWtwtP-6ROZiE5nnLq1A0nc5m7bAAhWSzffj0cFNEFaEaGiBCFiuy3D42hKp4gWZUshy6ISOUL6X2e4CCy10rQhUW8dR52QESivGUJ9RyJQ2SAAyYZ_V6ndUSsnldneVca_bJdvY7lkf6vc4haTBlbsdbDmLoaLlSBUqVy5wmWW2nw3rq26Pg-oTzOXlf9Xkt7BfTeqjjSWlP2JWTlkrC9cutlmcLlUlxoRBkE3T9Mrq7KArd0UBPqFDGTpstI2OphSv-jf1cBukPJlmSaP1GXFs8wQK0oJy523Ws-DG2GTiJOHCvDLx3HMuTo5YFc1MJ41ZpHKQ0NzDB1fWLDjLBgdUltKTmhjas0z-kWy8L

My comments below correspond to the arrow numbers in the diagram.

2. When the runtime receives the AEX, it doesn't have enough knowledge
to know whether or not to ask the kernel for an mmap(). So it has to
reenter the shim.

3. The shim has to handle the syscall instruction routing it to the
enclave's memory management subsystem.

4. The shim has to do bookkeeping and decide if additional pages are
even needed. If pages are already allocated, for example, it can skip
directly to step 13. However, if modifications are needed, it will go
to steps 5-12.

5-12. This is the part that represents new code from the kernel's
perspective for SGX2. It is also in a performance critical path and
should be evaluated with greater scrutiny. The number of context
switches is O(2N + 4) for each new allocated block, where N is the
number of pages: a context switch occurs at step 5, 6, 7, 8, 9/10 and
12. However, this can be reduced to O(4) for each new allocated block
with a simple modification:

https://mermaid.live/edit/#pako:eNqNk11rwyAUhv-KeLVBC9ttYIXQydjFymhaVmh24fSkkUbN1Gwrpf99pvlsk8G80nMez3l91SNmmgMOsIXPAhSDR0F3hspYIT9o4bQq5AeYap1T4wQTOVUOpdTwb2pgmNmDUZAN46ZQTsiRDTYVchiFH2CxquIL7QDpLzDnaICkpPnN8u0W2YNlNMsarsyi6XQ2a5oFKCSb7d17la6DqATKpgEiZLEiy-19DZTBXjalimfQNRlBPrTe7wFyoXaNCJ07JBJ_lh0gqblIBKNOaGWRAuDAK-qiVquWkM3zqpVzrblytjt-R2Va5yjR3h_K0nPrLleOaudFERKunzoIVE9Xj26VtZYbsEXmxgUOTP2_witfSTifk9fViMDzZPQuoij0V40eUK6tm9a3hqyjDq74P_zuH6V6aGRJovUL8WXxBEswkgruf8ux5GPsUvDvGQd-yiGhpS04ViePFjn3XQkXThscJDSzMMHld4oOiuHAmQIaqP5xNXX6BeBJIEk
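
To make the counts above concrete, here is a small sketch that just evaluates
the two expressions (2N + 4 versus a constant 4) for a block of N pages. The
function names are hypothetical, and reading the 2N term as two switches per
page is my interpretation of the flow.

```c
#include <stddef.h>

/*
 * Per-page flow: 2N + 4 context switches for a block of n_pages,
 * read here as two switches per page plus four fixed ones.
 */
static size_t switches_per_page_flow(size_t n_pages)
{
	return 2 * n_pages + 4;
}

/* Modified flow: a constant four switches regardless of block size. */
static size_t switches_batched_flow(size_t n_pages)
{
	(void)n_pages;	/* the count does not depend on the block size */
	return 4;
}
```

For a 16-page block this gives 36 switches in the per-page flow against 4 in
the modified one.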

The interesting thing about this pattern is that this can be done for
all page modification types except EMODT. For example, here's the same
process for changing a mapping from RW to RX:

https://mermaid.live/edit/#pako:eNqNk11rwyAUhv-KeLVBC9ttYIVCvdhFu5F0UGh24fSkkUbN1Gwrpf995jttMphXes7jOa-vesZMc8ABtvBZgGKwEvRgqIwV8oMWTqtCfoCp1zk1TjCRU-VQSg3_pgbGmSMYBdk4bgrlhJzYYFMhx1H4ARarOr7RDpD-AlNFAyQlze_C3T2yJ8tolrVcmUXz-WLRNgvQkuz2D-91ugmiEiibBoiQzZaE-8cGKIODbEoVz6BvMoF8aH08AuRCHVoROndIJP4sB0BSc5EIRp3QyiIFwIHX1FWtTi0hu-dtJ-dWc-1sf_yeyrTOUaK9P5SlVes-V45651URsn5ZvYY9BmqgbMB32jrTDdgic9MSR7b-X-ONs5U-MqGvmkxeRhQt_V2jJ5Rr6-bNtSHrqIMb_g_DhyepXxoJSfS2Jr4snmEJRlLB_Xc5l3yMXQr-QePATzkktHQFx-ri0SLnvivhwmmDg4RmFma4_E_RSTEcOFNACzVfrqEuvytQILY

My point in this thread has always been that it is an anti-feature to
presume that there is a need to treat EPC and VLA permissions
separately. This is a performance sink and it optimizes for a use case
which doesn't exist. Nobody actually wants there to be a mismatch
between EPC and VLA permissions.

So, besides EMODT, the only userspace interface we need is
mmap()/mprotect()/munmap(). The kernel should either succeed the
mmap()/mprotect()/munmap() syscall if the EPC permissions can be made
compatible or should fail otherwise.

Another interesting property arises from this flow. Since the EPC and
VLA permissions are always synchronized from the perspective of
userspace, in cases where the memory state between the kernel and the
exec layer is roughly synchronous, bookkeeping in the shim can be
implemented without any persistent memory between syscall handling
events. So, for example, the shim can implement brk() and
mmap()/munmap()/mprotect() with just two pointers: one to the break
position and one to the lowest mmap().
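
A minimal sketch of that two-pointer bookkeeping, under assumptions of my own:
brk grows upward, mmap() allocations are carved downward from the top of a
fixed region, and munmap() of anything but the lowest mapping is not modelled.
All names here (struct shim_heap, shim_brk(), shim_mmap()) are hypothetical
and not taken from any of the shims mentioned.

```c
#include <stddef.h>
#include <stdint.h>

#define SHIM_PAGE_SIZE 4096u

/* The only persistent state: the break and the lowest mmap() address. */
struct shim_heap {
	uintptr_t brk;		/* current break, grows upward */
	uintptr_t mmap_low;	/* lowest mapped address, page aligned, grows downward */
};

static uintptr_t page_align_up(uintptr_t v)
{
	return (v + SHIM_PAGE_SIZE - 1) & ~(uintptr_t)(SHIM_PAGE_SIZE - 1);
}

/*
 * Move the break. Like Linux brk(), return the resulting break, which
 * is the unchanged old value when the request would collide with the
 * mmap() area. A lower bound for shrinking is not modelled.
 */
static uintptr_t shim_brk(struct shim_heap *h, uintptr_t new_brk)
{
	if (new_brk <= h->mmap_low)
		h->brk = new_brk;
	return h->brk;
}

/* Carve len bytes (rounded up to pages) below mmap_low; 0 on failure. */
static uintptr_t shim_mmap(struct shim_heap *h, size_t len)
{
	uintptr_t floor = page_align_up(h->brk);
	uintptr_t avail;

	if (floor > h->mmap_low)
		return 0;
	avail = h->mmap_low - floor;
	if (len == 0 || len > avail)
		return 0;
	h->mmap_low -= page_align_up(len);
	return h->mmap_low;
}
```

The point of the sketch is only that no per-page table is needed: every
decision is made by comparing the request against the two pointers.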

It is true that this basically commits enclave authors to doing all
EACCEPT calls immediately after modifications. But I suspect everyone
will do this anyway since there is no efficient (read: performant) way
for shims to handle page faults. So trying to do this lazily will just
result in a huge decrease in performance.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-02 22:57                   ` Reinette Chatre
@ 2022-03-03 16:08                     ` Haitao Huang
  2022-03-03 21:23                       ` Reinette Chatre
  2022-03-03 23:18                       ` Jarkko Sakkinen
  2022-03-03 23:12                     ` Jarkko Sakkinen
  1 sibling, 2 replies; 130+ messages in thread
From: Haitao Huang @ 2022-03-03 16:08 UTC (permalink / raw)
  To: Jarkko Sakkinen, Reinette Chatre
  Cc: Dave Hansen, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

Hi all,

On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre  
<reinette.chatre@intel.com> wrote:

> Hi Jarkko,
>
> On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote:
>> On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote:
>>> Hi Jarkko,
>>>
>>> On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
>>>>> With EACCEPTCOPY (kudos to Mark S. for reminding me of this version  
>>>>> of
>>>>> EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but
>>>>> obviously new RX pages are now out of the picture:
>>>>>
>>>>>
>>>>> 	/*
>>>>> 	 * Adding a regular page that is architecturally allowed to only
>>>>> 	 * be created with RW permissions.
>>>>> 	 * TBD: Interface with user space policy to support max permissions
>>>>> 	 * of RWX.
>>>>> 	 */
>>>>> 	prot = PROT_READ | PROT_WRITE;
>>>>> 	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>>>>> 	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
>>>>>
>>>>> If that TBD is left out to the final version the page augmentation  
>>>>> has a
>>>>> risk of a API bottleneck, and that risk can realize then also in the  
>>>>> page
>>>>> permission ioctls.
>>>>>
>>>>> I.e. now any review comment is based on not fully known territory,  
>>>>> we have
>>>>> one known unknown, and some unknown unknowns from unpredictable  
>>>>> effect to
>>>>> future API changes.
>>>
>>> The plan to complete the "TBD" in the above snippet was to follow this  
>>> work
>>> with user policy integration at this location. On a high level the  
>>> plan was
>>> for this to look something like:
>>>
>>>
>>>  	/*
>>>  	 * Adding a regular page that is architecturally allowed to only
>>>  	 * be created with RW permissions.
>>>  	 * Interface with user space policy to support max permissions
>>>  	 * of RWX.
>>>  	 */
>>>  	prot = PROT_READ | PROT_WRITE;
>>>  	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>>>
>>>         if (user space policy allows RWX on dynamically added pages)
>>> 	 	encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |  
>>> PROT_WRITE | PROT_EXEC, 0);
>>> 	else
>>> 		encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |  
>>> PROT_WRITE, 0);
>>>
>>> The work that follows this series aimed to do the integration with user
>>> space policy.
>>
>> What do you mean by "user space policy" anyway exactly? I'm sorry but I
>> just don't fully understand this.
>
> My apologies - I just assumed that you would need no reminder about this  
> contentious
> part of SGX history. Essentially it means that, yes, the kernel could  
> theoretically
> permit any kind of access to any file/page, but some accesses are known  
> to generally
> be a bad idea - like making memory executable as well as writable - and  
> thus there
> are additional checks based on what user space permits before the kernel  
> allows
> such accesses.
>
> For example,
> mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()
>
> User policy and SGX has seen significant discussion. Some notable  
> threads:
> https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/
> https://lore.kernel.org/linux-security-module/20190619222401.14942-1-sean.j.christopherson@intel.com/
>
>> It's too big of a risk to accept this series without X taken care of.  
>> Patch
>> series should neither have TODO nor TBD comments IMHO. I don't want to  
>> ack
>> a series based on speculation what might happen in the future.
>
> ok
>
>>
>>>> I think the best way to move forward would be to do EAUG's explicitly  
>>>> with
>>>> an ioctl that could also include secinfo for permissions. Then you can
>>>> easily do the rest with EACCEPTCOPY inside the enclave.
>>>
>>> SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for
>>> this purpose. It already includes SECINFO which may also be useful if
>>> needing to later support EAUG of PT_SS* pages.
>>
>> You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it a  
>> day.
>
> I could, yes.
>
>> And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is this  
>> weird
>> thing added to the #PF handler? Why is it added at all then?
>
> I was just speculating in my response, there is no plan to extend
> SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of).
>
>>> How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES
>>> after enclave initialization on any memory region within the enclave  
>>> where
>>> pages are planned to be added dynamically. This ioctl() calls EAUG to  
>>> add the
>>> new pages with RW permissions and their vm_max_prot_bits can be set to  
>>> the
>>> permissions found in the included SECINFO. This will support later  
>>> EACCEPTCOPY
>>> as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
>>
>> I don't like this type of re-use of the existing API.
>
> I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is consensus  
> after
> considering the user policy question (above) and performance trade-off  
> (more below).
>
>>
>>> The big question is whether communicating user policy after enclave  
>>> initialization
>>> via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all?  
>>> I would
>>> appreciate a confirmation on this direction considering the  
>>> significant history
>>> behind this topic.
>>
>> I have no idea because I don't know what is user space policy.
>
> This discussion is about some enclave usages needing RWX permissions
> on dynamically added enclave pages. RWX permissions on dynamically added  
> pages is
> not something that should blindly be allowed for all SGX enclaves but  
> instead the user
> needs to explicitly allow specific enclaves to have such ability. This  
> is equivalent
> to (but not the same as) what exists in Linux today with LSM. As seen in
> mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is able  
> to make
> files and memory be both writable and executable, but it would only do  
> so for those
> files and memory that the LSM (which is how user policy is communicated,  
> like SELinux)
> indicates it is allowed, not blindly do so for all files and all memory.
>
>>>> Putting EAUG in the #PF handler and calling it implicitly is just too
>>>> flaky and hard to make deterministic for e.g. the JIT compiler in our
>>>> use case (not to mention that JIT is not possible at all because of the
>>>> inability to do RX pages).
>
> I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more deterministic  
> but from
> what I understand it would have a performance impact since it would  
> require all memory
> that may be needed by the enclave be pre-allocated from outside the  
> enclave and not
> just dynamically allocated from within the enclave at the time it is  
> needed.
>
> Would such a performance impact be acceptable?
>

User space won't always have enough info to decide whether the pages should
be EAUG'd immediately. In some cases (shared libraries, JVM for example) lots
of code/data pages can be mapped but never actually touched. One
enclave/process does not know if any other more important enclave/process
would need the EPC.

It should be for the kernel to make the final decision as it has the overall
picture of the system EPC usage and availability.

User space can provide a hint (similar to MAP_POPULATE) to the kernel that the
mmap'd area will soon be needed, and the kernel should EAUG as soon as it sees
fit based on current system usage. Or the kernel could implement some policy to
avoid the #PF triggered by EACCEPT, for example, if the system has a ton of free
EPC relative to the amount requested by mmap at the time.
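
Purely as a sketch of what such a policy could look like (nothing like this
exists in the series, and every name and threshold here is hypothetical): with
a populate-style hint the kernel augments eagerly whenever the request fits in
free EPC, and without a hint only when free EPC exceeds the request by some
headroom factor.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical policy: decide whether to EAUG a range up front instead
 * of waiting for the #PF triggered by EACCEPT.
 *
 * populate_hint   - user space signalled the range will be needed soon
 * free_epc_pages  - EPC pages currently free on the system
 * requested_pages - pages covered by the mmap() request
 * headroom        - required ratio of free to requested pages when no
 *                   hint was given (an assumed tunable)
 */
static bool should_eaug_eagerly(bool populate_hint, size_t free_epc_pages,
				size_t requested_pages, size_t headroom)
{
	if (requested_pages == 0 || free_epc_pages < requested_pages)
		return false;
	if (populate_hint)
		return true;
	/* Divide rather than multiply so the comparison cannot overflow. */
	return headroom > 0 && free_epc_pages / headroom >= requested_pages;
}
```

Either way the decision stays with the kernel; user space only supplies the
hint.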

BR
Haitao

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2
  2022-03-03  1:13             ` Nathaniel McCallum
@ 2022-03-03 17:49               ` Reinette Chatre
  2022-03-04  0:57               ` Jarkko Sakkinen
  1 sibling, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-03-03 17:49 UTC (permalink / raw)
  To: Nathaniel McCallum
  Cc: dave.hansen, Jarkko Sakkinen, tglx, bp, Andy Lutomirski, mingo,
	linux-sgx, x86, seanjc, kai.huang, cathy.zhang, cedric.xing,
	haitao.huang, mark.shanahan, hpa, linux-kernel

Hi Nathaniel,

On 3/2/2022 5:13 PM, Nathaniel McCallum wrote:
> On Wed, Mar 2, 2022 at 4:20 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Nathaniel,
>>
>> On 3/2/2022 8:57 AM, Nathaniel McCallum wrote:
>>> Perhaps it would be better for us to have a shared understanding on
>>> how the patches as posted are supposed to work in the most common
>>> cases? I'm thinking here of projects such as Enarx, Gramine and
>>> Occlum, which all have a similar process. Namely they execute an
>>> executable (called exec in the below chart) which has things like
>>> syscalls handled by a shim. These two components (shim and exec) are
>>> supported by a non-enclave userspace runtime. Given this common
>>> architectural pattern, this is how I understand adding pages via an
>>> exec call to mmap() to work.
>>>
>>> https://mermaid.live/edit#pako:eNp1k81qwzAQhF9F6NRCAu1Vh0BIRemhoeSHBuIettYmFpElVZZLQ8i7144sJ8aOT2bmY3d2vT7R1AikjBb4U6JO8UXC3kGeaFI9FpyXqbSgPTmg06j6uiu1lzn2jSKTA2XwD9NEB31uPBLzi-6iMpLnYB8Wn4-kOBYpKBW52iXj8WQSmzEy5Zvt01ewG5HUQN2UEc7nK77YPjdALd64GWih8NpkALGwR_JtzOGAaKXexyTKGEt2pgoMaXahgj5Qgk9nM_6xGvDDJpsmOyiVv0LB62B8un4dBDrLiLPeWciCL9fvvKVQizhSG6stFz9Df7sxUpcYitR-SodFO2A_Vw-7l4nzzduqjX9bKJxOHDDeBB3RHF0OUlS3faq1hPoMqzulrHoVGPZOE32u0NIK8MiF9MZRtgNV4IhC6c3yqFPKvCsxQs3_0VDnfzf-CPg
>>>
>>> This only covers adding RW pages. I haven't even tackled permission
>>> changes yet. Is that understanding correct? If not, please provide an
>>> alternative sequence diagram to explain how you expect this to be
>>> used.
>>
>> Please find my attempt linked below:
>>
>> https://mermaid.live/edit#pako:eNqFUsFqAjEQ_ZWQUwsK7XUPgthQeqiUVang9jAkoxu6m2yzWVsR_72J2WTbKnSOb97MvPeSI-VaIM1oix8dKo4PEnYG6kIRVw0YK7lsQFlSghGfYPCy845GYXWJm05ZWV8ZaEt55QB-IS9UwOfaItF7NGc0I3UNzU3-ekvaQ8uhqiLPd8l4PJnEYxmZsvXm7i20e5B4QlA5rAqMgJJfG9Ixg21X2ctVXn9GGJsvWb65729FSZXWDdlqpxx46Qzu-gB8-cHzhhim2zKdzdjLcuAAt3IPzv6Qkq84EdxGM3492UJS-cdSpLHp6nEgCPz3RjI5NPvAlRisJjspOsbWT8sUyc_MwjuynC1Wzyw9EB3RGk0NUrgvePRYQW2J7tNQd5sKDN5ooU6O2jXCiWZCWm1otoWqxRGFzurFQXGaWdNhJPXfuGedvgFejOuH
>>
>> The changes include:
>> * Move mmap() to occur before attempting EACCEPT on the addresses. This is
>>   required for EACCEPT (as well as any subsequent access from within the enclave)
>>   to be able to access the pages.
>> * Remove AEX[1] to the runtime within the loop. After EAUG returns execution
>>   will return to the instruction pointer that triggered the #PF, EACCEPT,
>>   this will cause the EACCEPT to be run again, this time succeeding.
>>
>> This is based on the implementation within this series. When supporting
>> the new ioctl() requested by Jarkko there will be an additional ioctl()
>> required before the loop.
> 
> https://mermaid.live/edit/#pako:eNp1U9FqgzAU_ZWQpw1a2F6FFaQLYw8ro7asUPeQmWsNNYlL4rZS-u-LRmut1ie953jvOecmR5woBjjABr5LkAk8c7rTVMQSuYeWVslSfIH23wXVlie8oNKijGr2SzUMkT1oCfmwrktpuRj5wWRcDKvwB0ksfX2hLCD1A7quBkgIWtwtP-6ROZiE5nnLq1A0nc5m7bAAhWSzffj0cFNEFaEaGiBCFiuy3D42hKp4gWZUshy6ISOUL6X2e4CCy10rQhUW8dR52QESivGUJ9RyJQ2SAAyYZ_V6ndUSsnldneVca_bJdvY7lkf6vc4haTBlbsdbDmLoaLlSBUqVy5wmWW2nw3rq26Pg-oTzOXlf9Xkt7BfTeqjjSWlP2JWTlkrC9cutlmcLlUlxoRBkE3T9Mrq7KArd0UBPqFDGTpstI2OphSv-jf1cBukPJlmSaP1GXFs8wQK0oJy523Ws-DG2GTiJOHCvDLx3HMuTo5YFc1MJ41ZpHKQ0NzDB1fWLDjLBgdUltKTmhjas0z-kWy8L
> 
> My comments below correspond to the arrow numbers in the diagram.
> 
> 2. When the runtime receives the AEX, it doesn't have enough knowledge
> to know whether or not to ask the kernel for an mmap(). So it has to
> reenter the shim.
> 
> 3. The shim has to handle the syscall instruction routing it to the
> enclave's memory management subsystem.
> 
> 4. The shim has to do bookkeeping and decide if additional pages are
> even needed. If pages are already allocated, for example, it can skip
> directly to step 13. However, if modifications are needed, it will go
> to steps 5-12.
> 
> 5-12. This is the part that represents new code from the kernel's
> perspective for SGX2. It is also in a performance critical path and
> should be evaluated with greater scrutiny. The number of context
> switches is O(2N + 4) for each new allocated block, where N is the
> number of pages: a context switch occurs at step 5, 6, 7,  8, 9/10 and
> 12. However, this can be reduced to O(4) for each new allocated block
> with a simple modification:
> 
> https://mermaid.live/edit/#pako:eNqNk11rwyAUhv-KeLVBC9ttYIXQydjFymhaVmh24fSkkUbN1Gwrpf99pvlsk8G80nMez3l91SNmmgMOsIXPAhSDR0F3hspYIT9o4bQq5AeYap1T4wQTOVUOpdTwb2pgmNmDUZAN46ZQTsiRDTYVchiFH2CxquIL7QDpLzDnaICkpPnN8u0W2YNlNMsarsyi6XQ2a5oFKCSb7d17la6DqATKpgEiZLEiy-19DZTBXjalimfQNRlBPrTe7wFyoXaNCJ07JBJ_lh0gqblIBKNOaGWRAuDAK-qiVquWkM3zqpVzrblytjt-R2Va5yjR3h_K0nPrLleOaudFERKunzoIVE9Xj26VtZYbsEXmxgUOTP2_witfSTifk9fViMDzZPQuoij0V40eUK6tm9a3hqyjDq74P_zuH6V6aGRJovUL8WXxBEswkgruf8ux5GPsUvDvGQd-yiGhpS04ViePFjn3XQkXThscJDSzMMHld4oOiuHAmQIaqP5xNXX6BeBJIEk

Your optimized proposal is possible in the current implementation as
follows:

https://mermaid.live/edit#pako:eNp1k11vgjAUhv_KSa-2RJPtlmQmxvViFzOLuMxEdlHbgzTSlrVlmzH-9xUBUWFclfc8nI-X0wPhRiCJiMOvEjXHZ8m2lqlEQ3hY6Y0u1QZt_V4w6yWXBdMeMmbFD7PYj-zQasz7ui21l2rgA5dJ1VfxF3mia31uPIL5RntSI1CKFXeLj3twe8dZnrdcFYXxeDJpi0Uwpav1w2cdbkSogKpoBJTOl3SxfmyASryIZkyLHLsiA8jGmN0OsZB62zZhCg8yDbNsEZQRMpWceWm0A40oUNTUVa5zt5SuXpbndm57rp3txu-oOnKd62ySRVfmfjhlz4YOy40pIDXBc8az0zhdbMAJOp3N6NuyY1A3o54Og-7F8TT8HHiCwjg_bnwG55nHG_4fhy5HqVeDLmj8_kpDWjIiCq1iUoT9PlR8QnyGYQNJFI4CU1bZQhJ9DGhZiFCVCumNJVHKcocjUl2AeK85ibwtsYWaO9JQxz-gBQs-

You can think of that EACCEPT instruction as similar to a current (SGX1)
enclave memory read or write when the enclave page is not currently in
the EPC, for example, when the enclave memory being accessed is swapped
out and needs to be decrypted and loaded back. Instead of ENCLS[ELDU]
being used to load the enclave page back into the EPC, ENCLS[EAUG] is
used to create a new EPC page.

You can find an example of such a flow involving EACCEPT in the
"augment_via_eaccept" test found in "[PATCH V2 21/32] selftests/sgx: Test
two different SGX2 EAUG flows".


> The interesting thing about this pattern is that this can be done for
> all page modification types except EMODT. For example, here's the same
> process for changing a mapping from RW to RX:
> 
> https://mermaid.live/edit/#pako:eNqNk11rwyAUhv-KeLVBC9ttYIVCvdhFu5F0UGh24fSkkUbN1Gwrpf995jttMphXes7jOa-vesZMc8ABtvBZgGKwEvRgqIwV8oMWTqtCfoCp1zk1TjCRU-VQSg3_pgbGmSMYBdk4bgrlhJzYYFMhx1H4ARarOr7RDpD-AlNFAyQlze_C3T2yJ8tolrVcmUXz-WLRNgvQkuz2D-91ugmiEiibBoiQzZaE-8cGKIODbEoVz6BvMoF8aH08AuRCHVoROndIJP4sB0BSc5EIRp3QyiIFwIHX1FWtTi0hu-dtJ-dWc-1sf_yeyrTOUaK9P5SlVes-V45651URsn5ZvYY9BmqgbMB32jrTDdgic9MSR7b-X-ONs5U-MqGvmkxeRhQt_V2jJ5Rr6-bNtSHrqIMb_g_DhyepXxoJSfS2Jr4snmEJRlLB_Xc5l3yMXQr-QePATzkktHQFx-ri0SLnvivhwmmDg4RmFma4_E_RSTEcOFNACzVfrqEuvytQILY
> 
> My point in this thread has always been that it is an anti-feature to
> presume that there is a need to treat EPC and VLA permissions
> separately. This is a performance sink and it optimizes for a use case
> which doesn't exist. Nobody actually wants there to be a mismatch
> between EPC and VLA permissions.

I assume you mean VMA permissions. It is hard for me to trust the statement
that nobody wants there to be a mismatch, since VMA permissions being separate
from EPC permissions is an intentional (as documented) and integral part of the
current SGX ABI. The current SGX implementation explicitly checks for and supports
VMA mappings with permissions different from EPC permissions.

This SGX2 implementation follows and respects the current ABI, and changing the ABI
cannot be taken lightly.
 
Reinette


* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-03 16:08                     ` Haitao Huang
@ 2022-03-03 21:23                       ` Reinette Chatre
  2022-03-03 21:44                         ` Dave Hansen
  2022-03-03 23:18                       ` Jarkko Sakkinen
  1 sibling, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-03-03 21:23 UTC (permalink / raw)
  To: Haitao Huang, Jarkko Sakkinen
  Cc: Dave Hansen, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

Hi Haitao,

On 3/3/2022 8:08 AM, Haitao Huang wrote:
> On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre <reinette.chatre@intel.com> wrote:
>> On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote:
>>> On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote:
>>>> On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:

...

>>>>> I think the best way to move forward would be to do EAUG's explicitly with
>>>>> an ioctl that could also include secinfo for permissions. Then you can
>>>>> easily do the rest with EACCEPTCOPY inside the enclave.
>>>>
>>>> SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for
>>>> this purpose. It already includes SECINFO which may also be useful if
>>>> needing to later support EAUG of PT_SS* pages.
>>>
>>> You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it a day.
>>
>> I could, yes.
>>
>>> And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is this weird
>>> thing added to the #PF handler? Why is it added at all then?
>>
>> I was just speculating in my response, there is no plan to extend
>> SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of).
>>
>>>> How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES
>>>> after enclave initialization on any memory region within the enclave where
>>>> pages are planned to be added dynamically. This ioctl() calls EAUG to add the
>>>> new pages with RW permissions and their vm_max_prot_bits can be set to the
>>>> permissions found in the included SECINFO. This will support later EACCEPTCOPY
>>>> as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
>>>
>>> I don't like this type of re-use of the existing API.
>>
>> I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is consensus after
>> considering the user policy question (above) and performance trade-off (more below).
>>
>>>
>>>> The big question is whether communicating user policy after enclave initialization
>>>> via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? I would
>>>> appreciate a confirmation on this direction considering the significant history
>>>> behind this topic.
>>>
>>> I have no idea because I don't know what is user space policy.
>>
>> This discussion is about some enclave usages needing RWX permissions
>> on dynamically added enclave pages. RWX permissions on dynamically added pages is
>> not something that should blindly be allowed for all SGX enclaves but instead the user
>> needs to explicitly allow specific enclaves to have such ability. This is equivalent
>> to (but not the same as) what exists in Linux today with LSM. As seen in
>> mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is able to make
>> files and memory be both writable and executable, but it would only do so for those
>> files and memory that the LSM (which is how user policy is communicated, like SELinux)
>> indicates it is allowed, not blindly do so for all files and all memory.
>>
>>>>> Putting EAUG to the #PF handler and implicitly calling it is just too flaky and
>>>>> hard to make deterministic for e.g. JIT compiler in our use case (not to
>>>>> mention that JIT is not possible at all because of the inability to do RX pages).
>>
>> I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more deterministic but from
>> what I understand it would have a performance impact since it would require all memory
>> that may be needed by the enclave be pre-allocated from outside the enclave and not
>> just dynamically allocated from within the enclave at the time it is needed.
>>
>> Would such a performance impact be acceptable?
>>
> 
> User space won't always have enough info to decide whether the pages need to be EAUG'd immediately. In some cases (shared libraries, JVM for example) lots of code/data pages can be mapped but never actually touched. One enclave/process does not know if any other more important enclave/process would need the EPC.
> 
> It should be for the kernel to make the final decision as it has the overall picture of the system EPC usage and availability.
> 
> User space can provide a hint (similar to MAP_POPULATE) to the kernel that the mmap'd area will soon be needed and the kernel should EAUG as soon as it sees fit based on current system usage. Or the kernel could implement some policy to avoid #PF triggered by EACCEPT, for example, if the system has a ton of free EPC relative to the amount requested by mmap at the time.
> 

mmap(...,...,...,MAP_POPULATE,...,...) would be most fitting and
ideal since it would enable user space to indicate that the pages would
be needed soon and the kernel can then prefault the pages. This is already
desirable in the current implementation to avoid the first page fault on
pages added via SGX_IOC_ENCLAVE_ADD_PAGES.

Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
then I believe that SGX would benefit.
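
For reference, this is roughly what the MAP_POPULATE hint looks like from
user space today. It is only a sketch on ordinary anonymous memory, not on
an enclave mapping (as noted above, SGX VMAs do not support MAP_POPULATE),
and the helper name map_prefaulted() is made up for illustration:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS and MAP_POPULATE */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map len bytes of anonymous memory and ask the
 * kernel to prefault the pages via MAP_POPULATE, so that the first
 * access does not have to take a page fault.
 * Returns the mapping, or NULL on failure.
 */
static void *map_prefaulted(size_t len)
{
	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);

	return addr == MAP_FAILED ? NULL : addr;
}
```

An enclave runtime would presumably pass the same flag when mapping the
enclave range once VM_IO|VM_PFNMAP VMAs can be populated.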

Reinette


* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-03 21:23                       ` Reinette Chatre
@ 2022-03-03 21:44                         ` Dave Hansen
  2022-03-05  3:19                           ` Jarkko Sakkinen
  2022-03-10  5:43                           ` Jarkko Sakkinen
  0 siblings, 2 replies; 130+ messages in thread
From: Dave Hansen @ 2022-03-03 21:44 UTC (permalink / raw)
  To: Reinette Chatre, Haitao Huang, Jarkko Sakkinen
  Cc: Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski, Andy, mingo,
	linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On 3/3/22 13:23, Reinette Chatre wrote:
> Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
> VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
> then I believe that SGX would benefit.

Some Intel folks asked for this quite a while ago.  I think it's
entirely doable: add a new vm_ops->populate() function that will allow
ignoring VM_IO|VM_PFNMAP if present.

Or, if nobody wants to waste all of the vm_ops space, just add an
arch_vma_populate() or something which can call over into SGX.

I'll happily review the patches if anyone can put such a beast together.


* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-02 22:57                   ` Reinette Chatre
  2022-03-03 16:08                     ` Haitao Huang
@ 2022-03-03 23:12                     ` Jarkko Sakkinen
  2022-03-04  0:48                       ` Reinette Chatre
  1 sibling, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-03 23:12 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Dave Hansen, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Wed, Mar 02, 2022 at 02:57:45PM -0800, Reinette Chatre wrote:
> > What do you mean by "user space policy" anyway exactly? I'm sorry but I
> > just don't fully understand this.
> 
> My apologies - I just assumed that you would need no reminder about this contentious
> part of SGX history. Essentially it means that, yes, the kernel could theoretically
> permit any kind of access to any file/page, but some accesses are known to generally
> be a bad idea - like making memory executable as well as writable - and thus there
> are additional checks based on what user space permits before the kernel allows
> such accesses.

The device files are limited by a GID (in systemd upstream), which is a
"user policy".

What do you want to add, and why can augmentation not be made complete
before the unknown factor is added to the access control?

> >>> I think the best way to move forward would be to do EAUG's explicitly with
> >>> an ioctl that could also include secinfo for permissions. Then you can
> >>> easily do the rest with EACCEPTCOPY inside the enclave.
> >>
> >> SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for
> >> this purpose. It already includes SECINFO which may also be useful if
> >> needing to later support EAUG of PT_SS* pages.
> > 
> > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it a day.
> 
> I could, yes.

And this enables EACCEPTCOPY pattern nicely.

E.g. you can implement mmap() with EAUG and then EACCEPTCOPY fed with
permissions and a zero page:

1. enclave calls back to host to do mmap()
2. host does eaug on given range and enters back to enclave.
3. enclave does eacceptcopy with given permissions and a zero page.
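
The three steps above can be modelled with a toy state machine. This is a
sketch only: struct toy_page and the toy_*() helpers are invented for
illustration, they track EPCM-like state in plain memory, and they do not
enforce which permission combinations the hardware actually accepts. PTE
permissions are not modelled at all.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Simplified permission bits, for illustration only. */
enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

/* Toy stand-in for one EPC page together with its EPCM entry. */
struct toy_page {
	bool valid;
	bool pending;
	unsigned int perms;
	unsigned char data[TOY_PAGE_SIZE];
};

/* Step 2 (host): EAUG creates a zeroed, pending page with RW. */
static int toy_eaug(struct toy_page *p)
{
	if (p->valid)
		return -1;
	memset(p->data, 0, sizeof(p->data));
	p->valid = true;
	p->pending = true;
	p->perms = PERM_R | PERM_W;
	return 0;
}

/*
 * Step 3 (enclave): EACCEPTCOPY copies the source page into the
 * pending page, applies the requested permissions and clears pending.
 */
static int toy_eacceptcopy(struct toy_page *p, const unsigned char *src,
			   unsigned int perms)
{
	if (!p->valid || !p->pending)
		return -1;
	memcpy(p->data, src, TOY_PAGE_SIZE);
	p->perms = perms;
	p->pending = false;
	return 0;
}
```

In this model EACCEPTCOPY fails on a page that was never EAUG'd or that
has already been accepted, which is the ordering the steps above rely on.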

> > I don't like this type of re-use of the existing API.
> 
> I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is consensus after
> considering the user policy question (above) and performance trade-off (more below).

Ok.

If adding this would be a bottleneck, it would already be present in
"add pages", so whatever limitation there might be, it already exists.

Thus, logically, that could be safely added without worrying about user
policies all that much...

> 
> > 
> >> The big question is whether communicating user policy after enclave initialization
> >> via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? I would
> >> appreciate a confirmation on this direction considering the significant history
> >> behind this topic.
> > 
> > I have no idea because I don't know what is user space policy.
> 
> This discussion is about some enclave usages needing RWX permissions
> on dynamically added enclave pages. RWX permissions on dynamically added pages is

I'm not sure if that is actually necessary if you use an EAUG-EACCEPTCOPY
type of pattern. Please correct me if I'm wrong.

> not something that should blindly be allowed for all SGX enclaves but instead the user
> needs to explicitly allow specific enclaves to have such ability. This is equivalent
> to (but not the same as) what exists in Linux today with LSM. As seen in
> mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is able to make
> files and memory be both writable and executable, but it would only do so for those
> files and memory that the LSM (which is how user policy is communicated, like SELinux)
> indicates it is allowed, not blindly do so for all files and all memory.

We could also potentially make LSM hooks to ioctls, if that is ever needed.

And as I said earlier, an EAUG ioctl does not make things any worse than
they might already be.

> >>> Putting EAUG to the #PF handler and implicitly calling it is just too flaky and
> >>> hard to make deterministic for e.g. JIT compiler in our use case (not to
> >>> mention that JIT is not possible at all because of the inability to do RX pages).
> 
> I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more deterministic but from
> what I understand it would have a performance impact since it would require all memory
> that may be needed by the enclave be pre-allocated from outside the enclave and not
> just dynamically allocated from within the enclave at the time it is needed.
> 
> Would such a performance impact be acceptable?

IMHO yes, because a badly behaving enclave can cause the same issue anyway,
and in a more nondeterministic manner.

BR, Jarkko


* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-03 16:08                     ` Haitao Huang
  2022-03-03 21:23                       ` Reinette Chatre
@ 2022-03-03 23:18                       ` Jarkko Sakkinen
  2022-03-04  4:03                         ` Haitao Huang
  1 sibling, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-03 23:18 UTC (permalink / raw)
  To: Haitao Huang
  Cc: Reinette Chatre, Dave Hansen, Dhanraj, Vijay, dave.hansen, tglx,
	bp, Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Thu, Mar 03, 2022 at 10:08:14AM -0600, Haitao Huang wrote:
> Hi all,
> 
> On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre
> <reinette.chatre@intel.com> wrote:
> 
> > Hi Jarkko,
> > 
> > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote:
> > > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote:
> > > > Hi Jarkko,
> > > > 
> > > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
> > > > > > With EACCEPTCOPY (kudos to Mark S. for reminding me of
> > > > > > this version of
> > > > > > EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but
> > > > > > obviously new RX pages are now out of the picture:
> > > > > > 
> > > > > > 
> > > > > > 	/*
> > > > > > 	 * Adding a regular page that is architecturally allowed to only
> > > > > > 	 * be created with RW permissions.
> > > > > > 	 * TBD: Interface with user space policy to support max permissions
> > > > > > 	 * of RWX.
> > > > > > 	 */
> > > > > > 	prot = PROT_READ | PROT_WRITE;
> > > > > > 	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> > > > > > 	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
> > > > > > 
> > > > > > If that TBD is left out to the final version the page
> > > > > > augmentation has a
> > > > > > risk of a API bottleneck, and that risk can realize then
> > > > > > also in the page
> > > > > > permission ioctls.
> > > > > > 
> > > > > > I.e. now any review comment is based on not fully known
> > > > > > territory, we have
> > > > > > one known unknown, and some unknown unknowns from
> > > > > > unpredictable effect to
> > > > > > future API changes.
> > > > 
> > > > The plan to complete the "TBD" in the above snippet was to
> > > > follow this work
> > > > with user policy integration at this location. On a high level
> > > > the plan was
> > > > for this to look something like:
> > > > 
> > > > 
> > > >  	/*
> > > >  	 * Adding a regular page that is architecturally allowed to only
> > > >  	 * be created with RW permissions.
> > > >  	 * Interface with user space policy to support max permissions
> > > >  	 * of RWX.
> > > >  	 */
> > > >  	prot = PROT_READ | PROT_WRITE;
> > > >  	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> > > > 
> > > >  	if (user space policy allows RWX on dynamically added pages)
> > > >  		encl_page->vm_max_prot_bits =
> > > >  			calc_vm_prot_bits(PROT_READ | PROT_WRITE | PROT_EXEC, 0);
> > > >  	else
> > > >  		encl_page->vm_max_prot_bits =
> > > >  			calc_vm_prot_bits(PROT_READ | PROT_WRITE, 0);
> > > > 
> > > > The work that follows this series aimed to do the integration with user
> > > > space policy.
> > > 
> > > What do you mean by "user space policy" anyway exactly? I'm sorry but I
> > > just don't fully understand this.
> > 
> > My apologies - I just assumed that you would need no reminder about this
> > contentious
> > part of SGX history. Essentially it means that, yes, the kernel could
> > theoretically
> > permit any kind of access to any file/page, but some accesses are known
> > to generally
> > be a bad idea - like making memory executable as well as writable - and
> > thus there
> > are additional checks based on what user space permits before the kernel
> > allows
> > such accesses.
> > 
> > For example,
> > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()
> > 
> > User policy and SGX has seen significant discussion. Some notable
> > threads:
> > https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/
> > https://lore.kernel.org/linux-security-module/20190619222401.14942-1-sean.j.christopherson@intel.com/
> > 
> > > It's too big of a risk to accept this series without X taken care
> > > of. Patch
> > > series should neither have TODO nor TBD comments IMHO. I don't want
> > > to ack
> > > a series based on speculation what might happen in the future.
> > 
> > ok
> > 
> > > 
> > > > > I think the best way to move forward would be to do EAUG's
> > > > > explicitly with
> > > > > an ioctl that could also include secinfo for permissions. Then you can
> > > > > easily do the rest with EACCEPTCOPY inside the enclave.
> > > > 
> > > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for
> > > > this purpose. It already includes SECINFO which may also be useful if
> > > > needing to later support EAUG of PT_SS* pages.
> > > 
> > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it
> > > a day.
> > 
> > I could, yes.
> > 
> > > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is
> > > this weird
> > > thing added to the #PF handler? Why is it added at all then?
> > 
> > I was just speculating in my response, there is no plan to extend
> > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of).
> > 
> > > > How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES
> > > > after enclave initialization on any memory region within the
> > > > enclave where
> > > > pages are planned to be added dynamically. This ioctl() calls
> > > > EAUG to add the
> > > > new pages with RW permissions and their vm_max_prot_bits can be
> > > > set to the
> > > > permissions found in the included SECINFO. This will support
> > > > later EACCEPTCOPY
> > > > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
> > > 
> > > I don't like this type of re-use of the existing API.
> > 
> > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is consensus
> > after
> > considering the user policy question (above) and performance trade-off
> > (more below).
> > 
> > > 
> > > > The big question is whether communicating user policy after
> > > > enclave initialization
> > > > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable
> > > > to all? I would
> > > > appreciate a confirmation on this direction considering the
> > > > significant history
> > > > behind this topic.
> > > 
> > > I have no idea because I don't know what is user space policy.
> > 
> > This discussion is about some enclave usages needing RWX permissions
> > on dynamically added enclave pages. RWX permissions on dynamically added
> > pages is
> > not something that should blindly be allowed for all SGX enclaves but
> > instead the user
> > needs to explicitly allow specific enclaves to have such ability. This
> > is equivalent
> > to (but not the same as) what exists in Linux today with LSM. As seen in
> > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is able
> > to make
> > files and memory be both writable and executable, but it would only do
> > so for those
> > files and memory that the LSM (which is how user policy is communicated,
> > like SELinux)
> > indicates it is allowed, not blindly do so for all files and all memory.
> > 
> > > > > > Putting EAUG to the #PF handler and implicitly calling it is just
> > > > > > too flaky and
> > > > > > hard to make deterministic for e.g. JIT compiler in our use
> > > > > > case (not to
> > > > > > mention that JIT is not possible at all because of the inability to
> > > > > > do RX pages).
> > 
> > I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more deterministic
> > but from
> > what I understand it would have a performance impact since it would
> > require all memory
> > that may be needed by the enclave be pre-allocated from outside the
> > enclave and not
> > just dynamically allocated from within the enclave at the time it is
> > needed.
> > 
> > Would such a performance impact be acceptable?
> > 
> 
> User space won't always have enough info to decide whether the pages need to be
> EAUG'd immediately. In some cases (shared libraries, JVM for example) lots
> of code/data pages can be mapped but never actually touched. One
> enclave/process does not know if any other more important enclave/process
> would need the EPC.
> 
> It should be for the kernel to make the final decision as it has the overall picture
> of the system EPC usage and availability.

An EAUG ioctl does not give user space better capabilities to waste
EPC given that the EADD ioctl already exists, i.e. your argument is logically
incorrect.

BR, Jarkko


* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-03 23:12                     ` Jarkko Sakkinen
@ 2022-03-04  0:48                       ` Reinette Chatre
  0 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-03-04  0:48 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Dave Hansen, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

Hi Jarkko,

On 3/3/2022 3:12 PM, Jarkko Sakkinen wrote:
> On Wed, Mar 02, 2022 at 02:57:45PM -0800, Reinette Chatre wrote:
>>> What do you mean by "user space policy" anyway exactly? I'm sorry but I
>>> just don't fully understand this.
>>
>> My apologies - I just assumed that you would need no reminder about this contentious
>> part of SGX history. Essentially it means that, yes, the kernel could theoretically
>> permit any kind of access to any file/page, but some accesses are known to generally
>> be a bad idea - like making memory executable as well as writable - and thus there
>> are additional checks based on what user space permits before the kernel allows
>> such accesses.
> 
> The device files are limited by a GID (in systemd upstream), which is a
> "user policy".
> 
> What do you want to add, and why can augmentation not be made complete
> before the unknown factor is added to the access control?

After studying this part of SGX history I learned that unfortunately none of the
existing user policy controls have been found to be a perfect fit for enclaves.
Current user policy type permissions are associated with files and processes and
enclaves have properties of both. One process can execute multiple enclaves and
only one/some of those enclaves may need to execute dirty pages. Associating
a permission to execute dirty pages with the process, and thus giving that ability
to all of its enclaves, is not ideal. Similarly, the file /dev/sgx_enclave can
represent multiple enclaves used by multiple processes and a file permission is
similarly too broad.

What I was planning to propose and discuss after the SGX2 core enabling was
an ability for user space to uniquely identify enclaves that require the
ability to execute dirty pages. This identification can be specified by using
enclave properties like MRENCLAVE and MRSIGNER. Executing dirty pages would
only be allowed for these specific enclaves identified to require this ability.
A solution like this is possible using the kernel's keys subsystem by introducing
a new "enclave_execdirty" key that contains these properties. I have this working
as a PoC.

Perhaps the SGX_IOC_ENCLAVE_AUGMENT_PAGES that you propose can also be seen as
a solution to support user space policy, except that it is more fine-grained
in that it is used to identify specific memory ranges within specific enclaves that
are allowed to execute dirty pages. What do you think?

>>>>> I think the best way to move forward would be to do EAUG's explicitly with
>>>>> an ioctl that could also include secinfo for permissions. Then you can
>>>>> easily do the rest with EACCEPTCOPY inside the enclave.
>>>>
>>>> SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for
>>>> this purpose. It already includes SECINFO which may also be useful if
>>>> needing to later support EAUG of PT_SS* pages.
>>>
>>> You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it a day.
>>
>> I could, yes.
> 
> And this enables EACCEPTCOPY pattern nicely.
> 
> E.g. you can implement mmap() with EAUG and then EACCEPTCOPY fed with
> permissions and a zero page:
> 
> 1. enclave calls back to host to do mmap()
> 2. host does eaug on given range and enters back to enclave.
> 3. enclave does eacceptcopy with given permissions and a zero page.
> 
>>> I don't like this type of re-use of the existing API.
>>
>> I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is consensus after
>> considering the user policy question (above) and performance trade-off (more below).
> 
> Ok.
> 
> If adding this would be a bottleneck, it would already be present in
> "add pages", so whatever limitation there might be, it already exists.

Currently this checking is built in as part of "add pages", for example, user
space is prevented from circumventing existing protections on the source pages
with the "vma->vm_flags & VM_MAYEXEC" check in __sgx_encl_add_page().

Further, there is trust here in that the pages added before enclave
initialization are accompanied by their secinfo with the permissions of
the pages and those values are included in the measurement (MRENCLAVE) of
the final enclave. The maximum permissions any enclave page
specified during "add pages" may have is "locked down" during this time.

Permissions of EAUG pages are not included in the MRENCLAVE of the enclave
and there is no backing memory that can be referenced to learn what is already
allowed.

It is possible that some of the code dynamically loaded into the enclave
could indeed be buggy or malicious, so effort should be made to allow
execution of dirty pages only for those enclaves specified to require the ability.

> Thus, logically, that could be safely added without worrying about user
> policies all that much...
> 
>>
>>>
>>>> The big question is whether communicating user policy after enclave initialization
>>>> via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? I would
>>>> appreciate a confirmation on this direction considering the significant history
>>>> behind this topic.
>>>
>>> I have no idea because I don't know what is user space policy.
>>
>> This discussion is about some enclave usages needing RWX permissions
>> on dynamically added enclave pages. RWX permissions on dynamically added pages is
> 
> I'm not sure if that is actually necessary if you use an EAUG-EACCEPTCOPY
> type of pattern. Please correct me if I'm wrong.

This only takes EPCM permissions into account. The issue comes in when the kernel
needs to determine whether it should allow the PTEs pointing to these pages to be 
executable.

To elaborate on your example, to use dynamically added RWX pages
EAUG->EACCEPTCOPY->SGX_IOC_ENCLAVE_RELAX_PERMISSIONS is required and
SGX_IOC_ENCLAVE_RELAX_PERMISSIONS will only allow PTEs that are allowed. In the
driver sgx_encl_page->vm_max_prot_bits dictates what permissions are allowed
and SGX_IOC_ENCLAVE_RELAX_PERMISSIONS will return EPERM if an attempt is made
to relax permissions beyond that.

When considering the user space policy integration, sgx_encl_page->vm_max_prot_bits
will be initialized to reflect allowed permissions, RWX if the enclave is so allowed.
In this way EAUG pages can be made executable using SGX_IOC_ENCLAVE_RELAX_PERMISSIONS.
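
As a rough user-space model of the above, not the driver code itself:
struct toy_encl_page and the toy_*() helpers are invented for illustration,
and PROT_* values are used directly where the driver would store the result
of calc_vm_prot_bits().

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Toy stand-in for the per-page state kept by the driver. */
struct toy_encl_page {
	unsigned long vm_max_prot_bits;
};

/*
 * An EAUG page starts with a maximum of RW. Only if user space policy
 * allows it for this enclave is the maximum raised to RWX.
 */
static void toy_init_eaug_page(struct toy_encl_page *page,
			       bool policy_allows_rwx)
{
	page->vm_max_prot_bits = PROT_READ | PROT_WRITE;
	if (policy_allows_rwx)
		page->vm_max_prot_bits |= PROT_EXEC;
}

/*
 * Model of the check done when relaxing permissions: any requested
 * permission beyond vm_max_prot_bits is refused with -EPERM.
 */
static int toy_relax_permissions(const struct toy_encl_page *page,
				 unsigned long prot)
{
	if (prot & ~page->vm_max_prot_bits)
		return -EPERM;
	return 0;
}
```

With policy_allows_rwx false, a request for PROT_EXEC on an EAUG page
is refused; with it true, the same request succeeds.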
 
>> not something that should blindly be allowed for all SGX enclaves but instead the user
>> needs to explicitly allow specific enclaves to have such ability. This is equivalent
>> to (but not the same as) what exists in Linux today with LSM. As seen in
>> mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is able to make
>> files and memory be both writable and executable, but it would only do so for those
>> files and memory that the LSM (which is how user policy is communicated, like SELinux)
>> indicates it is allowed, not blindly do so for all files and all memory.
> 
> We could also potentially make LSM hooks to ioctls, if that is ever needed.

Could you please elaborate?

> 
> And as I said earlier, an EAUG ioctl does not make things any worse than
> they might already be.

I hope my earlier comments noting the differences with adding pages shine some light here.

> 
>>>>> Putting EAUG to the #PF handler and implicitly calling it is just too flaky and
>>>>> hard to make deterministic for e.g. JIT compiler in our use case (not to
>>>>> mention that JIT is not possible at all because of the inability to do RX pages).
>>
>> I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more deterministic but from
>> what I understand it would have a performance impact since it would require all memory
>> that may be needed by the enclave be pre-allocated from outside the enclave and not
>> just dynamically allocated from within the enclave at the time it is needed.
>>
>> Would such a performance impact be acceptable?
> 
> IMHO yes, because a badly behaving enclave can cause the same issue anyway,
> and in a more nondeterministic manner.

With EAUG pages supported in the page fault handler it is possible to support
both usages, especially now that Dave provided guidance on how to
support MAP_POPULATE. As I understand it, once MAP_POPULATE is supported, a usage
needing deterministic behavior can pre-fault all the EAUG pages, while usages
that map a lot of memory that will mostly go unused are also supported.
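
To illustrate the two usages, here is a minimal sketch using plain anonymous
memory as a stand-in. It only shows the mmap() flag difference; whether and how
MAP_POPULATE applies to an enclave mapping is exactly what is being discussed
here, and map_region() is a made-up helper name, not part of this series.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes of anonymous memory (assumption: a stand-in for an
 * enclave range). With populate != 0 the kernel is asked to fault the
 * pages in during mmap(); otherwise each page is faulted in on first
 * touch. Returns MAP_FAILED on error.
 */
static void *map_region(size_t len, int populate)
{
	int flags = MAP_PRIVATE | MAP_ANONYMOUS;

	if (populate)
		flags |= MAP_POPULATE;

	return mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
}
```

A caller that needs deterministic behavior would pass populate = 1 and pay the
cost up front; a caller mapping a large, sparsely used range would pass 0.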


Reinette


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2
  2022-03-03  1:13             ` Nathaniel McCallum
  2022-03-03 17:49               ` Reinette Chatre
@ 2022-03-04  0:57               ` Jarkko Sakkinen
  1 sibling, 0 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-04  0:57 UTC (permalink / raw)
  To: Nathaniel McCallum
  Cc: Reinette Chatre, dave.hansen, tglx, bp, Andy Lutomirski, mingo,
	linux-sgx, x86, seanjc, kai.huang, cathy.zhang, cedric.xing,
	haitao.huang, mark.shanahan, hpa, linux-kernel

On Wed, Mar 02, 2022 at 08:13:55PM -0500, Nathaniel McCallum wrote:
> On Wed, Mar 2, 2022 at 4:20 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
> >
> > Hi Nathaniel,
> >
> > On 3/2/2022 8:57 AM, Nathaniel McCallum wrote:
> > > Perhaps it would be better for us to have a shared understanding on
> > > how the patches as posted are supposed to work in the most common
> > > cases? I'm thinking here of projects such as Enarx, Gramine and
> > > Occlum, which all have a similar process. Namely they execute an
> > > executable (called exec in the below chart) which has things like
> > > syscalls handled by a shim. These two components (shim and exec) are
> > > supported by a non-enclave userspace runtime. Given this common
> > > architectural pattern, this is how I understand adding pages via an
> > > exec call to mmap() to work.
> > >
> > > https://mermaid.live/edit#pako:eNp1k81qwzAQhF9F6NRCAu1Vh0BIRemhoeSHBuIettYmFpElVZZLQ8i7144sJ8aOT2bmY3d2vT7R1AikjBb4U6JO8UXC3kGeaFI9FpyXqbSgPTmg06j6uiu1lzn2jSKTA2XwD9NEB31uPBLzi-6iMpLnYB8Wn4-kOBYpKBW52iXj8WQSmzEy5Zvt01ewG5HUQN2UEc7nK77YPjdALd64GWih8NpkALGwR_JtzOGAaKXexyTKGEt2pgoMaXahgj5Qgk9nM_6xGvDDJpsmOyiVv0LB62B8un4dBDrLiLPeWciCL9fvvKVQizhSG6stFz9Df7sxUpcYitR-SodFO2A_Vw-7l4nzzduqjX9bKJxOHDDeBB3RHF0OUlS3faq1hPoMqzulrHoVGPZOE32u0NIK8MiF9MZRtgNV4IhC6c3yqFPKvCsxQs3_0VDnfzf-CPg
> > >
> > > This only covers adding RW pages. I haven't even tackled permission
> > > changes yet. Is that understanding correct? If not, please provide an
> > > alternative sequence diagram to explain how you expect this to be
> > > used.
> >
> > Please find my attempt linked below:
> >
> > https://mermaid.live/edit#pako:eNqFUsFqAjEQ_ZWQUwsK7XUPgthQeqiUVang9jAkoxu6m2yzWVsR_72J2WTbKnSOb97MvPeSI-VaIM1oix8dKo4PEnYG6kIRVw0YK7lsQFlSghGfYPCy845GYXWJm05ZWV8ZaEt55QB-IS9UwOfaItF7NGc0I3UNzU3-ekvaQ8uhqiLPd8l4PJnEYxmZsvXm7i20e5B4QlA5rAqMgJJfG9Ixg21X2ctVXn9GGJsvWb65729FSZXWDdlqpxx46Qzu-gB8-cHzhhim2zKdzdjLcuAAt3IPzv6Qkq84EdxGM3492UJS-cdSpLHp6nEgCPz3RjI5NPvAlRisJjspOsbWT8sUyc_MwjuynC1Wzyw9EB3RGk0NUrgvePRYQW2J7tNQd5sKDN5ooU6O2jXCiWZCWm1otoWqxRGFzurFQXGaWdNhJPXfuGedvgFejOuH
> >
> > The changes include:
> > * Move mmap() to occur before attempting EACCEPT on the addresses. This is
> >   required for EACCEPT (as well as any subsequent access from within the enclave)
> >   to be able to access the pages.
> > * Remove AEX[1] to the runtime within the loop. After EAUG returns execution
> >   will return to the instruction pointer that triggered the #PF, EACCEPT,
> >   this will cause the EACCEPT to be run again, this time succeeding.
> >
> > This is based on the implementation within this series. When supporting
> > the new ioctl() requested by Jarkko there will be an additional ioctl()
> > required before the loop.
> 
> https://mermaid.live/edit/#pako:eNp1U9FqgzAU_ZWQpw1a2F6FFaQLYw8ro7asUPeQmWsNNYlL4rZS-u-LRmut1ie953jvOecmR5woBjjABr5LkAk8c7rTVMQSuYeWVslSfIH23wXVlie8oNKijGr2SzUMkT1oCfmwrktpuRj5wWRcDKvwB0ksfX2hLCD1A7quBkgIWtwtP-6ROZiE5nnLq1A0nc5m7bAAhWSzffj0cFNEFaEaGiBCFiuy3D42hKp4gWZUshy6ISOUL6X2e4CCy10rQhUW8dR52QESivGUJ9RyJQ2SAAyYZ_V6ndUSsnldneVca_bJdvY7lkf6vc4haTBlbsdbDmLoaLlSBUqVy5wmWW2nw3rq26Pg-oTzOXlf9Xkt7BfTeqjjSWlP2JWTlkrC9cutlmcLlUlxoRBkE3T9Mrq7KArd0UBPqFDGTpstI2OphSv-jf1cBukPJlmSaP1GXFs8wQK0oJy523Ws-DG2GTiJOHCvDLx3HMuTo5YFc1MJ41ZpHKQ0NzDB1fWLDjLBgdUltKTmhjas0z-kWy8L
> 
> My comments below correspond to the arrow numbers in the diagram.
> 
> 2. When the runtime receives the AEX, it doesn't have enough knowledge
> to know whether or not to ask the kernel for an mmap(). So it has to
> reenter the shim.
> 
> 3. The shim has to handle the syscall instruction routing it to the
> enclave's memory management subsystem.
> 
> 4. The shim has to do bookkeeping and decide if additional pages are
> even needed. If pages are already allocated, for example, it can skip
> directly to step 13. However, if modifications are needed, it will go
> to steps 5-12.
> 
> 5-12. This is the part that represents new code from the kernel's
> perspective for SGX2. It is also in a performance critical path and
> should be evaluated with greater scrutiny. The number of context
> switches is O(2N + 4) for each new allocated block, where N is the
> number of pages: a context switch occurs at step 5, 6, 7,  8, 9/10 and
> 12. However, this can be reduced to O(4) for each new allocated block
> with a simple modification:
> 
> https://mermaid.live/edit/#pako:eNqNk11rwyAUhv-KeLVBC9ttYIXQydjFymhaVmh24fSkkUbN1Gwrpf99pvlsk8G80nMez3l91SNmmgMOsIXPAhSDR0F3hspYIT9o4bQq5AeYap1T4wQTOVUOpdTwb2pgmNmDUZAN46ZQTsiRDTYVchiFH2CxquIL7QDpLzDnaICkpPnN8u0W2YNlNMsarsyi6XQ2a5oFKCSb7d17la6DqATKpgEiZLEiy-19DZTBXjalimfQNRlBPrTe7wFyoXaNCJ07JBJ_lh0gqblIBKNOaGWRAuDAK-qiVquWkM3zqpVzrblytjt-R2Va5yjR3h_K0nPrLleOaudFERKunzoIVE9Xj26VtZYbsEXmxgUOTP2_witfSTifk9fViMDzZPQuoij0V40eUK6tm9a3hqyjDq74P_zuH6V6aGRJovUL8WXxBEswkgruf8ux5GPsUvDvGQd-yiGhpS04ViePFjn3XQkXThscJDSzMMHld4oOiuHAmQIaqP5xNXX6BeBJIEk
> 
> The interesting thing about this pattern is that this can be done for
> all page modification types except EMODT. For example, here's the same
> process for changing a mapping from RW to RX:
> 
> https://mermaid.live/edit/#pako:eNqNk11rwyAUhv-KeLVBC9ttYIVCvdhFu5F0UGh24fSkkUbN1Gwrpf995jttMphXes7jOa-vesZMc8ABtvBZgGKwEvRgqIwV8oMWTqtCfoCp1zk1TjCRU-VQSg3_pgbGmSMYBdk4bgrlhJzYYFMhx1H4ARarOr7RDpD-AlNFAyQlze_C3T2yJ8tolrVcmUXz-WLRNgvQkuz2D-91ugmiEiibBoiQzZaE-8cGKIODbEoVz6BvMoF8aH08AuRCHVoROndIJP4sB0BSc5EIRp3QyiIFwIHX1FWtTi0hu-dtJ-dWc-1sf_yeyrTOUaK9P5SlVes-V45651URsn5ZvYY9BmqgbMB32jrTDdgic9MSR7b-X-ONs5U-MqGvmkxeRhQt_V2jJ5Rr6-bNtSHrqIMb_g_DhyepXxoJSfS2Jr4snmEJRlLB_Xc5l3yMXQr-QePATzkktHQFx-ri0SLnvivhwmmDg4RmFma4_E_RSTEcOFNACzVfrqEuvytQILY
> 
> My point in this thread has always been that it is an anti-feature to
> presume that there is a need to treat EPC and VLA permissions
> separately. This is a performance sink and it optimizes for a use case
> which doesn't exist. Nobody actually wants there to be a mismatch
> between EPC and VLA permissions.

I would not touch pre-initialization, EADD'd pages. The reason is
backwards compatibility.

For post-initialization, options are still open.

> So, besides EMODT, the only userspace interface we need is
> mmap()/mprotect()/munmap(). The kernel should either succeed the
> mmap()/mprotect()/munmap() syscall if the EPC permissions can be made
> compatible or should fail otherwise.

For mmap() it is the enclave that sets the permissions, not the kernel, i.e. you
get a half-broken mmap() implementation. The kernel does EAUG, the enclave does
EACCEPTCOPY.

I think what you're asking for is too simple to be true, and even if we could
do it, it might limit possibilities to optimize user space, e.g. because
there are two ENCLU leaf functions (EMODPE, EACCEPTCOPY) and one ENCLS
leaf function (EMODPR) that can modify permissions.

A kernel syscall is essentially something that can be fully serviced
by the kernel. This is not such a situation. The work is split.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-03 23:18                       ` Jarkko Sakkinen
@ 2022-03-04  4:03                         ` Haitao Huang
  2022-03-04  8:30                           ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Haitao Huang @ 2022-03-04  4:03 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Reinette Chatre, Dave Hansen, Dhanraj, Vijay, dave.hansen, tglx,
	bp, Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel


On Thu, 03 Mar 2022 17:18:33 -0600, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Thu, Mar 03, 2022 at 10:08:14AM -0600, Haitao Huang wrote:
>> Hi all,
>>
>> On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre
>> <reinette.chatre@intel.com> wrote:
>>
>> > Hi Jarkko,
>> >
>> > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote:
>> > > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote:
>> > > > Hi Jarkko,
>> > > >
>> > > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
>> > > > > > With EACCEPTCOPY (kudos to Mark S. for reminding me of
>> > > > > > this version of
>> > > > > > EACCEPT @ chat.enarx.dev) it is possible to make R and RX  
>> pages but
>> > > > > > obviously new RX pages are now out of the picture:
>> > > > > >
>> > > > > >
>> > > > > > 	/*
>> > > > > > 	 * Adding a regular page that is architecturally allowed to  
>> only
>> > > > > > 	 * be created with RW permissions.
>> > > > > > 	 * TBD: Interface with user space policy to support max  
>> permissions
>> > > > > > 	 * of RWX.
>> > > > > > 	 */
>> > > > > > 	prot = PROT_READ | PROT_WRITE;
>> > > > > > 	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>> > > > > > 	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
>> > > > > >
>> > > > > > If that TBD is left out to the final version the page
>> > > > > > augmentation has a
>> > > > > > risk of a API bottleneck, and that risk can realize then
>> > > > > > also in the page
>> > > > > > permission ioctls.
>> > > > > >
>> > > > > > I.e. now any review comment is based on not fully known
>> > > > > > territory, we have
>> > > > > > one known unknown, and some unknown unknowns from
>> > > > > > unpredictable effect to
>> > > > > > future API changes.
>> > > >
>> > > > The plan to complete the "TBD" in the above snippet was to
>> > > > follow this work
>> > > > with user policy integration at this location. On a high level
>> > > > the plan was
>> > > > for this to look something like:
>> > > >
>> > > >
>> > > >  	/*
>> > > >  	 * Adding a regular page that is architecturally allowed to only
>> > > >  	 * be created with RW permissions.
>> > > >  	 * Interface with user space policy to support max permissions
>> > > >  	 * of RWX.
>> > > >  	 */
>> > > >  	prot = PROT_READ | PROT_WRITE;
>> > > >  	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>> > > >
>> > > >         if (user space policy allows RWX on dynamically added  
>> pages)
>> > > > 	 	encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |
>> > > > PROT_WRITE | PROT_EXEC, 0);
>> > > > 	else
>> > > > 		encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |
>> > > > PROT_WRITE, 0);
>> > > >
>> > > > The work that follows this series aimed to do the integration  
>> with user
>> > > > space policy.
>> > >
>> > > What do you mean by "user space policy" anyway exactly? I'm sorry  
>> but I
>> > > just don't fully understand this.
>> >
>> > My apologies - I just assumed that you would need no reminder about  
>> this
>> > contentious
>> > part of SGX history. Essentially it means that, yes, the kernel could
>> > theoretically
>> > permit any kind of access to any file/page, but some accesses are  
>> known
>> > to generally
>> > be a bad idea - like making memory executable as well as writable -  
>> and
>> > thus there
>> > are additional checks based on what user space permits before the  
>> kernel
>> > allows
>> > such accesses.
>> >
>> > For example,
>> > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()
>> >
>> > User policy and SGX has seen significant discussion. Some notable
>> > threads:
>> >  
>> https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/
>> >  
>> https://lore.kernel.org/linux-security-module/20190619222401.14942-1-sean.j.christopherson@intel.com/
>> >
>> > > It's too big of a risk to accept this series without X taken care
>> > > of. Patch
>> > > series should neither have TODO nor TBD comments IMHO. I don't want
>> > > to ack
>> > > a series based on speculation what might happen in the future.
>> >
>> > ok
>> >
>> > >
>> > > > > I think the best way to move forward would be to do EAUG's
>> > > > > explicitly with
>> > > > > an ioctl that could also include secinfo for permissions. Then  
>> you can
>> > > > > easily do the rest with EACCEPTCOPY inside the enclave.
>> > > >
>> > > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be  
>> used for
>> > > > this purpose. It already includes SECINFO which may also be  
>> useful if
>> > > > needing to later support EAUG of PT_SS* pages.
>> > >
>> > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it
>> > > a day.
>> >
>> > I could, yes.
>> >
>> > > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is
>> > > this weird
>> > > thing added to the #PF handler? Why is it added at all then?
>> >
>> > I was just speculating in my response, there is no plan to extend
>> > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of).
>> >
>> > > > How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES
>> > > > after enclave initialization on any memory region within the
>> > > > enclave where
>> > > > pages are planned to be added dynamically. This ioctl() calls
>> > > > EAUG to add the
>> > > > new pages with RW permissions and their vm_max_prot_bits can be
>> > > > set to the
>> > > > permissions found in the included SECINFO. This will support
>> > > > later EACCEPTCOPY
>> > > > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
>> > >
>> > > I don't like this type of re-use of the existing API.
>> >
>> > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is  
>> consensus
>> > after
>> > considering the user policy question (above) and performance trade-off
>> > (more below).
>> >
>> > >
>> > > > The big question is whether communicating user policy after
>> > > > enclave initialization
>> > > > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable
>> > > > to all? I would
>> > > > appreciate a confirmation on this direction considering the
>> > > > significant history
>> > > > behind this topic.
>> > >
>> > > I have no idea because I don't know what is user space policy.
>> >
>> > This discussion is about some enclave usages needing RWX permissions
>> > on dynamically added enclave pages. RWX permissions on dynamically  
>> added
>> > pages is
>> > not something that should blindly be allowed for all SGX enclaves but
>> > instead the user
>> > needs to explicitly allow specific enclaves to have such ability. This
>> > is equivalent
>> > to (but not the same as) what exists in Linux today with LSM. As seen  
>> in
>> > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is  
>> able
>> > to make
>> > files and memory be both writable and executable, but it would only do
>> > so for those
>> > files and memory that the LSM (which is how user policy is  
>> communicated,
>> > like SELinux)
>> > indicates it is allowed, not blindly do so for all files and all  
>> memory.
>> >
>> > > > > Putting EAUG to the #PF handler and implicitly call it just
>> > > > > too flakky and
>> > > > > hard to make deterministic for e.g. JIT compiler in our use
>> > > > > case (not to
>> > > > > mention that JIT is not possible at all because inability to
>> > > > > do RX pages).
>> >
>> > I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more  
>> deterministic
>> > but from
>> > what I understand it would have a performance impact since it would
>> > require all memory
>> > that may be needed by the enclave be pre-allocated from outside the
>> > enclave and not
>> > just dynamically allocated from within the enclave at the time it is
>> > needed.
>> >
>> > Would such a performance impact be acceptable?
>> >
>>
>> User space won't always have enough info to decide whether the pages to  
>> be
>> EAUG'd immediately. In some cases (shared libraries, JVM for example)  
>> lots
>> of code/data pages can be mapped but never actually touched. One
>> enclave/process does not know if any other more important  
>> enclave/process
>> would need the EPC.
>>
>> It should be for kernel to make the final decision as it has overall  
>> picture
>> of the system EPC usage and availability.
>
> EAUG ioctl does not give better capabilities for user space to waste
> EPC given that EADD ioctl already exists, i.e. your argument is logically
> incorrect.

The point of adding EAUG is to allow more efficient use of EPC pages.
Without EAUG, enclaves have to EADD everything upfront into EPC, consuming
a predetermined number of EPC pages, some of which may not be used at all.
With EAUG, enclaves should be able to load a minimal set of pages to get
started, with pages added on #PF as they are actually accessed.

Obviously, as you pointed out, for some usages it makes more sense to pre-EAUG
(EAUG before #PF). But your proposal of supporting only pre-EAUG here
essentially makes EAUG behave almost the same as EADD. If the current
implementation with EAUG on #PF can also use MAP_POPULATE for pre-EAUG
(seems possible based on Dave's comments), then it is flexible enough to cover
all cases and allows the kernel to optimize allocation of EPC pages.
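
A toy model of the difference, as a rough sketch only: it counts how many EPC
pages end up backed when pages are added on first touch versus when the whole
range is populated up front. The struct, the function names and the 16-page
range are all made up for illustration and do not correspond to kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGES 16

/* Toy enclave range: tracks which pages currently hold an EPC page. */
struct epc_model {
	bool backed[MODEL_PAGES];
	size_t epc_used;
};

/* Back one page if it is not backed yet (stands in for EAUG). */
static void model_eaug(struct epc_model *m, size_t idx)
{
	if (idx < MODEL_PAGES && !m->backed[idx]) {
		m->backed[idx] = true;
		m->epc_used++;
	}
}

/* Pre-EAUG: back the whole range up front. */
static void model_populate_all(struct epc_model *m)
{
	for (size_t i = 0; i < MODEL_PAGES; i++)
		model_eaug(m, i);
}

/* On demand: touching an unbacked page "faults" and backs only that page. */
static void model_touch(struct epc_model *m, size_t idx)
{
	model_eaug(m, idx);
}
```

Touching pages 3 and 7 in the on-demand model leaves epc_used at 2, while
model_populate_all() always ends at MODEL_PAGES regardless of what is used.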

Thanks
Haitao

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-04  4:03                         ` Haitao Huang
@ 2022-03-04  8:30                           ` Jarkko Sakkinen
  2022-03-04 15:51                             ` Haitao Huang
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-04  8:30 UTC (permalink / raw)
  To: Haitao Huang, Reinette Chatre
  Cc: Dave Hansen, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Thu, Mar 03, 2022 at 10:03:30PM -0600, Haitao Huang wrote:
> 
> On Thu, 03 Mar 2022 17:18:33 -0600, Jarkko Sakkinen <jarkko@kernel.org>
> wrote:
> 
> > On Thu, Mar 03, 2022 at 10:08:14AM -0600, Haitao Huang wrote:
> > > Hi all,
> > > 
> > > On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre
> > > <reinette.chatre@intel.com> wrote:
> > > 
> > > > Hi Jarkko,
> > > >
> > > > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote:
> > > > > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote:
> > > > > > Hi Jarkko,
> > > > > >
> > > > > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
> > > > > > > > With EACCEPTCOPY (kudos to Mark S. for reminding me of
> > > > > > > > this version of
> > > > > > > > EACCEPT @ chat.enarx.dev) it is possible to make R and RX
> > > pages but
> > > > > > > > obviously new RX pages are now out of the picture:
> > > > > > > >
> > > > > > > >
> > > > > > > > 	/*
> > > > > > > > 	 * Adding a regular page that is architecturally allowed
> > > to only
> > > > > > > > 	 * be created with RW permissions.
> > > > > > > > 	 * TBD: Interface with user space policy to support max
> > > permissions
> > > > > > > > 	 * of RWX.
> > > > > > > > 	 */
> > > > > > > > 	prot = PROT_READ | PROT_WRITE;
> > > > > > > > 	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> > > > > > > > 	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
> > > > > > > >
> > > > > > > > If that TBD is left out to the final version the page
> > > > > > > > augmentation has a
> > > > > > > > risk of a API bottleneck, and that risk can realize then
> > > > > > > > also in the page
> > > > > > > > permission ioctls.
> > > > > > > >
> > > > > > > > I.e. now any review comment is based on not fully known
> > > > > > > > territory, we have
> > > > > > > > one known unknown, and some unknown unknowns from
> > > > > > > > unpredictable effect to
> > > > > > > > future API changes.
> > > > > >
> > > > > > The plan to complete the "TBD" in the above snippet was to
> > > > > > follow this work
> > > > > > with user policy integration at this location. On a high level
> > > > > > the plan was
> > > > > > for this to look something like:
> > > > > >
> > > > > >
> > > > > >  	/*
> > > > > >  	 * Adding a regular page that is architecturally allowed to only
> > > > > >  	 * be created with RW permissions.
> > > > > >  	 * Interface with user space policy to support max permissions
> > > > > >  	 * of RWX.
> > > > > >  	 */
> > > > > >  	prot = PROT_READ | PROT_WRITE;
> > > > > >  	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> > > > > >
> > > > > >         if (user space policy allows RWX on dynamically added
> > > pages)
> > > > > > 	 	encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |
> > > > > > PROT_WRITE | PROT_EXEC, 0);
> > > > > > 	else
> > > > > > 		encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |
> > > > > > PROT_WRITE, 0);
> > > > > >
> > > > > > The work that follows this series aimed to do the integration
> > > with user
> > > > > > space policy.
> > > > >
> > > > > What do you mean by "user space policy" anyway exactly? I'm
> > > sorry but I
> > > > > just don't fully understand this.
> > > >
> > > > My apologies - I just assumed that you would need no reminder
> > > about this
> > > > contentious
> > > > part of SGX history. Essentially it means that, yes, the kernel could
> > > > theoretically
> > > > permit any kind of access to any file/page, but some accesses are
> > > known
> > > > to generally
> > > > be a bad idea - like making memory executable as well as writable
> > > - and
> > > > thus there
> > > > are additional checks based on what user space permits before the
> > > kernel
> > > > allows
> > > > such accesses.
> > > >
> > > > For example,
> > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()
> > > >
> > > > User policy and SGX has seen significant discussion. Some notable
> > > > threads:
> > > > https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/
> > > > https://lore.kernel.org/linux-security-module/20190619222401.14942-1-sean.j.christopherson@intel.com/
> > > >
> > > > > It's too big of a risk to accept this series without X taken care
> > > > > of. Patch
> > > > > series should neither have TODO nor TBD comments IMHO. I don't want
> > > > > to ack
> > > > > a series based on speculation what might happen in the future.
> > > >
> > > > ok
> > > >
> > > > >
> > > > > > > I think the best way to move forward would be to do EAUG's
> > > > > > > explicitly with
> > > > > > > an ioctl that could also include secinfo for permissions.
> > > Then you can
> > > > > > > easily do the rest with EACCEPTCOPY inside the enclave.
> > > > > >
> > > > > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be
> > > used for
> > > > > > this purpose. It already includes SECINFO which may also be
> > > useful if
> > > > > > needing to later support EAUG of PT_SS* pages.
> > > > >
> > > > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it
> > > > > a day.
> > > >
> > > > I could, yes.
> > > >
> > > > > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is
> > > > > this weird
> > > > > thing added to the #PF handler? Why is it added at all then?
> > > >
> > > > I was just speculating in my response, there is no plan to extend
> > > > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of).
> > > >
> > > > > > How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES
> > > > > > after enclave initialization on any memory region within the
> > > > > > enclave where
> > > > > > pages are planned to be added dynamically. This ioctl() calls
> > > > > > EAUG to add the
> > > > > > new pages with RW permissions and their vm_max_prot_bits can be
> > > > > > set to the
> > > > > > permissions found in the included SECINFO. This will support
> > > > > > later EACCEPTCOPY
> > > > > > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
> > > > >
> > > > > I don't like this type of re-use of the existing API.
> > > >
> > > > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is
> > > consensus
> > > > after
> > > > considering the user policy question (above) and performance trade-off
> > > > (more below).
> > > >
> > > > >
> > > > > > The big question is whether communicating user policy after
> > > > > > enclave initialization
> > > > > > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable
> > > > > > to all? I would
> > > > > > appreciate a confirmation on this direction considering the
> > > > > > significant history
> > > > > > behind this topic.
> > > > >
> > > > > I have no idea because I don't know what is user space policy.
> > > >
> > > > This discussion is about some enclave usages needing RWX permissions
> > > > on dynamically added enclave pages. RWX permissions on dynamically
> > > added
> > > > pages is
> > > > not something that should blindly be allowed for all SGX enclaves but
> > > > instead the user
> > > > needs to explicitly allow specific enclaves to have such ability. This
> > > > is equivalent
> > > > to (but not the same as) what exists in Linux today with LSM. As
> > > seen in
> > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux
> > > is able
> > > > to make
> > > > files and memory be both writable and executable, but it would only do
> > > > so for those
> > > > files and memory that the LSM (which is how user policy is
> > > communicated,
> > > > like SELinux)
> > > > indicates it is allowed, not blindly do so for all files and all
> > > memory.
> > > >
> > > > > > > Putting EAUG to the #PF handler and implicitly call it just
> > > > > > > too flakky and
> > > > > > > hard to make deterministic for e.g. JIT compiler in our use
> > > > > > > case (not to
> > > > > > > mention that JIT is not possible at all because inability to
> > > > > > > do RX pages).
> > > >
> > > > I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more
> > > deterministic
> > > > but from
> > > > what I understand it would have a performance impact since it would
> > > > require all memory
> > > > that may be needed by the enclave be pre-allocated from outside the
> > > > enclave and not
> > > > just dynamically allocated from within the enclave at the time it is
> > > > needed.
> > > >
> > > > Would such a performance impact be acceptable?
> > > >
> > > 
> > > User space won't always have enough info to decide whether the pages
> > > to be
> > > EAUG'd immediately. In some cases (shared libraries, JVM for
> > > example) lots
> > > of code/data pages can be mapped but never actually touched. One
> > > enclave/process does not know if any other more important
> > > enclave/process
> > > would need the EPC.
> > > 
> > > It should be for kernel to make the final decision as it has overall
> > > picture
> > > of the system EPC usage and availability.
> > 
> > EAUG ioctl does not give better capabilities for user space to waste
> > EPC given that EADD ioctl already exists, i.e. your argument is logically
> > incorrect.
> 
> The point of adding EAUG is to allow more efficient use of EPC pages.
> Without EAUG, enclaves have to EADD everything upfront into EPC, consuming
> a predetermined number of EPC pages, some of which may not be used at all.
> With EAUG, enclaves should be able to load a minimal set of pages to get
> started, with pages added on #PF as they are actually accessed.
> 
> Obviously, as you pointed out, for some usages it makes more sense to pre-EAUG
> (EAUG before #PF). But your proposal of supporting only pre-EAUG here
> essentially makes EAUG behave almost the same as EADD. If the current
> implementation with EAUG on #PF can also use MAP_POPULATE for pre-EAUG
> (seems possible based on Dave's comments), then it is flexible enough to cover
> all cases and allows the kernel to optimize allocation of EPC pages.

There is not even a working #PF-based implementation in existence, and your
argument has too many ifs for my taste.

Reinette, can you squash this fixup into your patch set and send v3 so that
we get to a working implementation that can be benchmarked against e.g.
an ioctl-based version:

https://lore.kernel.org/linux-sgx/20220304033918.361495-1-jarkko@kernel.org/T/#u

This also objectively fixes some performance issues, e.g. EMODPE can
just be used without any round-trips (v2 requires the relax ioctl).

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 08/32] x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic permission changes
  2022-02-08  0:45 ` [PATCH V2 08/32] x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic permission changes Reinette Chatre
@ 2022-03-04  8:55   ` Jarkko Sakkinen
  2022-03-04 19:19     ` Reinette Chatre
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-04  8:55 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

On Mon, Feb 07, 2022 at 04:45:30PM -0800, Reinette Chatre wrote:
> Enclave creators declare their enclave page permissions (EPCM
> permissions) at the time the pages are added to the enclave. These
> page permissions are the vetted permissible accesses of the enclave
> pages and stashed off (in struct sgx_encl_page->vm_max_prot_bits)
> for later comparison with enclave PTEs and VMAs.
> 
> Current permission support assumes that EPCM permissions remain static
> for the lifetime of the enclave. This is about to change with the
> addition of support for SGX2 where the EPCM permissions of enclave
> pages belonging to an initialized enclave may change during the
> enclave's lifetime.
> 
> Support for changing of EPCM permissions should continue to respect
> the vetted maximum protection bits maintained in
> sgx_encl_page->vm_max_prot_bits. Towards this end, add
> sgx_encl_page->vm_run_prot_bits in preparation for support of
> enclave page permission changes. sgx_encl_page->vm_run_prot_bits
> reflect the active EPCM permissions of an enclave page and are not to
> exceed sgx_encl_page->vm_max_prot_bits.
> 
> Two permission fields are used: sgx_encl_page->vm_run_prot_bits
> reflects the current EPCM permissions and is used to manage the page
> table entries while sgx_encl_page->vm_max_prot_bits contains the vetted
> maximum protection bits and is used to guide which EPCM permissions
> are allowed in the upcoming SGX2 permission changing support (it guides
> what values sgx_encl_page->vm_run_prot_bits may have).
> 
> Consider this example how sgx_encl_page->vm_max_prot_bits and
> sgx_encl_page->vm_run_prot_bits are used:
> 
> (1) Add an enclave page with secinfo of RW to an uninitialized enclave:
>     sgx_encl_page->vm_max_prot_bits = RW
>     sgx_encl_page->vm_run_prot_bits = RW
> 
>     At this point RW VMAs would be allowed to access this page and PTEs
>     would allow write access as guided by
>     sgx_encl_page->vm_run_prot_bits.
> 
> (2) User space invokes SGX2 to change the EPCM permissions to read-only.
>     This is allowed because sgx_encl_page->vm_max_prot_bits = RW:
>     sgx_encl_page->vm_max_prot_bits = RW
>     sgx_encl_page->vm_run_prot_bits = R
> 
>     At this point only new read-only VMAs would be allowed to access
>     this page and PTEs would not allow write access as guided
>     by sgx_encl_page->vm_run_prot_bits.
> 
> (3) User space invokes SGX2 to change the EPCM permissions to RX.
>     This will not be supported by the kernel because
>     sgx_encl_page->vm_max_prot_bits = RW:
>     sgx_encl_page->vm_max_prot_bits = RW
>     sgx_encl_page->vm_run_prot_bits = R
> 
> (4) User space invokes SGX2 to change the EPCM permissions to RW.
>     This will be allowed because sgx_encl_page->vm_max_prot_bits = RW:
>     sgx_encl_page->vm_max_prot_bits = RW
>     sgx_encl_page->vm_run_prot_bits = RW
> 
>     At this point RW VMAs would again be allowed to access this page
>     and PTEs would allow write access as guided by
>     sgx_encl_page->vm_run_prot_bits.
> 
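The four steps above can be replayed with a small user-space model. This
is only a sketch under stated assumptions: struct page_perms, the P_*
constants and both helpers are hypothetical stand-ins for
sgx_encl_page->vm_max_prot_bits / vm_run_prot_bits and the VM_* flags,
not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for VM_READ / VM_WRITE / VM_EXEC (assumption). */
enum { P_R = 0x1, P_W = 0x2, P_X = 0x4 };

/* Hypothetical model of the two per-page permission fields. */
struct page_perms {
	unsigned int max_prot:8;	/* vetted maximum, set when the page is added */
	unsigned int run_prot:8;	/* currently active permissions */
};

/* Step (1): at allocation time both fields start out identical. */
static void page_add(struct page_perms *p, unsigned int prot)
{
	p->max_prot = prot;
	p->run_prot = prot;
}

/* Accept a change only if the request is a subset of max_prot. */
static bool page_change_perms(struct page_perms *p, unsigned int new_prot)
{
	if ((p->max_prot & new_prot) != new_prot)
		return false;
	p->run_prot = new_prot;
	return true;
}

/* Replays steps (1) to (4) from the commit message. */
static void replay_example(void)
{
	struct page_perms p;
	bool ok;

	page_add(&p, P_R | P_W);			/* (1) add as RW */

	ok = page_change_perms(&p, P_R);		/* (2) RW -> R: allowed */
	assert(ok && p.run_prot == P_R);

	ok = page_change_perms(&p, P_R | P_X);		/* (3) R -> RX: refused */
	assert(!ok && p.run_prot == P_R);

	ok = page_change_perms(&p, P_R | P_W);		/* (4) R -> RW: allowed */
	assert(ok && p.run_prot == (P_R | P_W));
}
```

Note that max_prot never changes after page_add(); only run_prot moves,
and only within the bounds of max_prot.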
> struct sgx_encl_page hosting this information is maintained for each
> enclave page so the space consumed by the struct is important.
> The existing sgx_encl_page->vm_max_prot_bits is already unsigned long
> while only using three bits. Transition to a bitfield for the two
> members containing protection bits.
> 
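The space argument can be checked with reduced copies of the struct. A
sketch only: the three structs below are hypothetical cut-down versions
(one pointer member instead of three), and the result depends on the
ABI; the equality is expected to hold on x86-64 Linux with GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

/* Layout before the patch: one full-width permission member. */
struct page_before {
	unsigned long desc;
	unsigned long vm_max_prot_bits;
	void *epc_page;
};

/* Layout after the patch: two 8-bit bitfields sharing one unsigned long. */
struct page_after {
	unsigned long desc;
	unsigned long vm_max_prot_bits:8;
	unsigned long vm_run_prot_bits:8;
	void *epc_page;
};

/* Alternative that was avoided: a second full-width member. */
struct page_two_longs {
	unsigned long desc;
	unsigned long vm_max_prot_bits;
	unsigned long vm_run_prot_bits;
	void *epc_page;
};

/* Bytes saved per enclave page by using bitfields instead of two longs. */
static size_t bytes_saved_per_page(void)
{
	return sizeof(struct page_two_longs) - sizeof(struct page_after);
}

/* Adding the second field as a bitfield should not grow the struct. */
static void check_layout(void)
{
	assert(sizeof(struct page_after) == sizeof(struct page_before));
}
```

Because one such struct exists for every enclave page, the saving
reported by bytes_saved_per_page() is multiplied by the enclave size in
pages.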
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Changes since V1:
> - Add snippet to Documentation/x86/sgx.rst that details the difference
>   between vm_max_prot_bits and vm_run_prot_bits (Andy and Jarkko).
> - Change subject line (Jarkko).
> - Refer to actual variables instead of using English rephrasing -
>   sgx_encl_page->vm_run_prot_bits instead of "runtime
>   protection bits" (Jarkko).
> - Add information in commit message on why two fields are needed
>   (Jarkko).
> 
>  Documentation/x86/sgx.rst       | 10 ++++++++++
>  arch/x86/kernel/cpu/sgx/encl.c  |  6 +++---
>  arch/x86/kernel/cpu/sgx/encl.h  |  3 ++-
>  arch/x86/kernel/cpu/sgx/ioctl.c |  6 ++++++
>  4 files changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
> index 5659932728a5..9df620b59f83 100644
> --- a/Documentation/x86/sgx.rst
> +++ b/Documentation/x86/sgx.rst
> @@ -99,6 +99,16 @@ The relationships between the different permission masks are:
>  * PTEs are installed to match the EPCM permissions, but not be more
>    relaxed than the VMA permissions.
>  
> +During runtime the EPCM permissions of enclave pages belonging to an
> +initialized enclave can change on systems supporting SGX2. In support
> +of these runtime changes the kernel maintains (for each enclave page)
> +the most permissive EPCM permission mask allowed by policy as
> +the ``vm_max_prot_bits`` of that page. EPCM permissions are not allowed
> +to be relaxed beyond ``vm_max_prot_bits``.  The kernel also maintains
> +the currently active EPCM permissions of an enclave page as its
> +``vm_run_prot_bits`` to ensure PTEs and new VMAs respect the active
> +EPCM permission values.
> +
>  On systems supporting SGX2 EPCM permissions may change while the
>  enclave page belongs to a VMA without impacting the VMA permissions.
>  This means that a running VMA may appear to allow access to an enclave
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 1ba01c75a579..a980d8458949 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -164,7 +164,7 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>  	 * exceed the VMA permissions.
>  	 */
>  	vm_prot_bits = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
> -	page_prot_bits = entry->vm_max_prot_bits & vm_prot_bits;
> +	page_prot_bits = entry->vm_run_prot_bits & vm_prot_bits;
>  	/*
>  	 * Add VM_SHARED so that PTE is made writable right away if VMA
>  	 * and EPCM are writable (no COW in SGX).
> @@ -217,7 +217,7 @@ static vm_fault_t sgx_vma_pfn_mkwrite(struct vm_fault *vmf)
>  		goto out;
>  	}
>  
> -	if (!(entry->vm_max_prot_bits & VM_WRITE))
> +	if (!(entry->vm_run_prot_bits & VM_WRITE))
>  		ret = VM_FAULT_SIGBUS;
>  
>  out:
> @@ -280,7 +280,7 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
>  	mutex_lock(&encl->lock);
>  	xas_lock(&xas);
>  	xas_for_each(&xas, page, PFN_DOWN(end - 1)) {
> -		if (~page->vm_max_prot_bits & vm_prot_bits) {
> +		if (~page->vm_run_prot_bits & vm_prot_bits) {
>  			ret = -EACCES;
>  			break;
>  		}
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index fec43ca65065..dc262d843411 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -27,7 +27,8 @@
>  
>  struct sgx_encl_page {
>  	unsigned long desc;
> -	unsigned long vm_max_prot_bits;
> +	unsigned long vm_max_prot_bits:8;
> +	unsigned long vm_run_prot_bits:8;
>  	struct sgx_epc_page *epc_page;
>  	struct sgx_encl *encl;
>  	struct sgx_va_page *va_page;
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index 83df20e3e633..7e0819a89532 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -197,6 +197,12 @@ static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
>  	/* Calculate maximum of the VM flags for the page. */
>  	encl_page->vm_max_prot_bits = calc_vm_prot_bits(prot, 0);
>  
> +	/*
> +	 * At time of allocation, the runtime protection bits are the same
> +	 * as the maximum protection bits.
> +	 */
> +	encl_page->vm_run_prot_bits = encl_page->vm_max_prot_bits;
> +
>  	return encl_page;
>  }
>  
> -- 
> 2.25.1
> 

This patch I can NAK without 2nd thought. It adds to the round-trips of
using ENCLU[EMODPE].

A better idea is the one I explain in

https://lore.kernel.org/linux-sgx/20220304033918.361495-1-jarkko@kernel.org/T/#u

The only thing this patch is doing is adding artificial complexity when
we already have a somewhat complex microarchitecture for permissions.

Please just drop this patch from the next version, and also the ioctl for relax.

BR, Jarkko


* Re: [PATCH V2 15/32] x86/sgx: Support relaxing of enclave page permissions
  2022-02-08  0:45 ` [PATCH V2 15/32] x86/sgx: Support relaxing of enclave page permissions Reinette Chatre
@ 2022-03-04  8:59   ` Jarkko Sakkinen
  0 siblings, 0 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-04  8:59 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

On Mon, Feb 07, 2022 at 04:45:37PM -0800, Reinette Chatre wrote:
> In the initial (SGX1) version of SGX, pages in an enclave need to be
> created with permissions that support all usages of the pages, from
> the time the enclave is initialized until it is unloaded. For example,
> pages used by a JIT compiler or when code needs to otherwise be
> relocated need to always have RWX permissions.
> 
> With the SGX2 function ENCLU[EMODPE] an enclave is able to relax
> the EPCM permissions of its pages after the enclave is initialized.
> Relaxing EPCM permissions is not possible from outside the enclave,
> including from the kernel. The kernel does control the PTEs though
> and the enclave still depends on the kernel to install PTEs with the
> new relaxed permissions before it (the enclave) can access the pages
> using the new permissions.
> 
> Introduce ioctl() SGX_IOC_ENCLAVE_RELAX_PERMISSIONS to support
> relaxing of EPCM permissions done from within the enclave. With
> this ioctl() the user specifies a page range and the permissions to
> be applied to all pages in the provided range. After checking
> the new permissions (more detail below) the PTEs are reset and
> it is ensured that any new PTEs will contain the new, relaxed,
> permissions.
> 
> The permission change request could fail on any page within the
> provided range. To support partial success the ioctl() returns
> an error code based on failures encountered by the kernel and
> the number of pages that were successfully changed.
> 
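The partial-success behaviour described above can be modelled in user
space. This is a minimal sketch, not the kernel implementation: struct
sketch_page, SKETCH_PAGE_SIZE and sketch_relax_range() are hypothetical
names, and the page array stands in for the enclave's page tracking.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical per-page state; present == 0 models a missing page. */
struct sketch_page {
	unsigned int present;
	unsigned int max_prot;	/* vetted maximum permissions */
	unsigned int run_prot;	/* currently active permissions */
};

/*
 * Apply new_prot to every page in [offset, offset + length). Stops at the
 * first failing page, returns a negative errno for it, and reports through
 * *count how many bytes were changed before the failure.
 */
static int sketch_relax_range(struct sketch_page *pages, size_t npages,
			      unsigned long offset, unsigned long length,
			      unsigned int new_prot, unsigned long *count)
{
	unsigned long c;
	int ret = 0;

	for (c = 0; c < length; c += SKETCH_PAGE_SIZE) {
		size_t idx = (offset + c) / SKETCH_PAGE_SIZE;

		if (idx >= npages || !pages[idx].present) {
			ret = -EFAULT;
			break;
		}

		/* Refuse anything more relaxed than the vetted permissions. */
		if ((pages[idx].max_prot & new_prot) != new_prot) {
			ret = -EPERM;
			break;
		}

		pages[idx].run_prot = new_prot;
	}

	*count = c;
	return ret;
}
```

A caller that gets an error back can use the count to tell which pages
were already updated and which page caused the failure.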
> Checking user provided new permissions
> ======================================
> 
> Enclave page permission changes need to be approached with care and
> for this reason permission changes are only allowed if
> the new permissions are the same or more restrictive than the
> vetted permissions. Thus, even though an enclave is able to relax
> the EPCM permissions of its pages beyond what was originally vetted,
> the kernel will not. The kernel will only install PTEs that respect
> the vetted enclave page permissions.
> 
> For example, enclave pages with vetted EPCM permissions in brackets
> below are allowed to have PTE permissions as follows:
> * (RWX) R => RW => RX => RWX
> * (RW) R => RW
> * (RX) R => RX
> 
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Changes since V1:
> - Change terminology to use "relax" instead of "extend" to refer to
>   the case when enclave page permissions are added (Dave).
> - Use ioctl() in commit message (Dave).
> - Add examples on what permissions would be allowed (Dave).
> - Split enclave page permission changes into two ioctl()s, one for
>   permission restricting (SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS)
>   and one for permission relaxing (SGX_IOC_ENCLAVE_RELAX_PERMISSIONS)
>   (Jarkko).
> - In support of the ioctl() name change the following names have been
>   changed:
>   struct sgx_page_modp -> struct sgx_enclave_relax_perm
>   sgx_ioc_page_modp() -> sgx_ioc_enclave_relax_perm()
>   sgx_page_modp() -> sgx_enclave_relax_perm()
> - ioctl() takes entire secinfo as input instead of
>   page permissions only (Jarkko).
> - Fix kernel-doc to include () in function name.
> - Introduce small helper to check for SGX2 readiness instead of
>   duplicating the same two checks in every SGX2 supporting ioctl().
> - Fixups in comments
> - Move kernel-doc to function that provides documentation for
>   Documentation/x86/sgx.rst.
> - Remove redundant comment.
> - Make explicit which member of struct sgx_enclave_relax_perm is
>   for output (Dave).
> 
>  arch/x86/include/uapi/asm/sgx.h |  19 +++
>  arch/x86/kernel/cpu/sgx/ioctl.c | 199 ++++++++++++++++++++++++++++++++
>  2 files changed, 218 insertions(+)
> 
> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> index f4b81587e90b..5c678b27bb72 100644
> --- a/arch/x86/include/uapi/asm/sgx.h
> +++ b/arch/x86/include/uapi/asm/sgx.h
> @@ -29,6 +29,8 @@ enum sgx_page_flags {
>  	_IOW(SGX_MAGIC, 0x03, struct sgx_enclave_provision)
>  #define SGX_IOC_VEPC_REMOVE_ALL \
>  	_IO(SGX_MAGIC, 0x04)
> +#define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
> +	_IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
>  
>  /**
>   * struct sgx_enclave_create - parameter structure for the
> @@ -76,6 +78,23 @@ struct sgx_enclave_provision {
>  	__u64 fd;
>  };
>  
> +/**
> + * struct sgx_enclave_relax_perm - parameters for ioctl
> + *                                 %SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
> + * @offset:	starting page offset (page aligned relative to enclave base
> + *		address defined in SECS)
> + * @length:	length of memory (multiple of the page size)
> + * @secinfo:	address for the SECINFO data containing the new permission bits
> + *		for pages in range described by @offset and @length
> + * @count:	(output) bytes successfully changed (multiple of page size)
> + */
> +struct sgx_enclave_relax_perm {
> +	__u64 offset;
> +	__u64 length;
> +	__u64 secinfo;
> +	__u64 count;
> +};
> +
>  struct sgx_enclave_run;
>  
>  /**
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index b8336d5d9029..9cc6af404bf6 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -698,6 +698,202 @@ static long sgx_ioc_enclave_provision(struct sgx_encl *encl, void __user *arg)
>  	return sgx_set_attribute(&encl->attributes_mask, params.fd);
>  }
>  
> +static unsigned long vm_prot_from_secinfo(u64 secinfo_perm)
> +{
> +	unsigned long vm_prot;
> +
> +	vm_prot = _calc_vm_trans(secinfo_perm, SGX_SECINFO_R, PROT_READ)  |
> +		  _calc_vm_trans(secinfo_perm, SGX_SECINFO_W, PROT_WRITE) |
> +		  _calc_vm_trans(secinfo_perm, SGX_SECINFO_X, PROT_EXEC);
> +	vm_prot = calc_vm_prot_bits(vm_prot, 0);
> +
> +	return vm_prot;
> +}
> +
> +/**
> + * sgx_enclave_relax_perm() - Update OS after permissions relaxed by enclave
> + * @encl:	Enclave to which the pages belong.
> + * @modp:	Checked parameters from user on which pages need modifying.
> + * @secinfo_perm: New validated permission bits.
> + *
> + * Return:
> + * - 0:		Success.
> + * - -errno:	Otherwise.
> + */
> +static long sgx_enclave_relax_perm(struct sgx_encl *encl,
> +				   struct sgx_enclave_relax_perm *modp,
> +				   u64 secinfo_perm)
> +{
> +	struct sgx_encl_page *entry;
> +	unsigned long vm_prot;
> +	unsigned long addr;
> +	unsigned long c;
> +	int ret;
> +
> +	vm_prot = vm_prot_from_secinfo(secinfo_perm);
> +
> +	for (c = 0 ; c < modp->length; c += PAGE_SIZE) {
> +		addr = encl->base + modp->offset + c;
> +
> +		mutex_lock(&encl->lock);
> +
> +		entry = xa_load(&encl->page_array, PFN_DOWN(addr));
> +		if (!entry) {
> +			ret = -EFAULT;
> +			goto out_unlock;
> +		}
> +
> +		/*
> +		 * Changing EPCM permissions is only supported on regular
> +		 * SGX pages.
> +		 */
> +		if (entry->type != SGX_PAGE_TYPE_REG) {
> +			ret = -EINVAL;
> +			goto out_unlock;
> +		}
> +
> +		/*
> +		 * Do not accept permissions that are more relaxed
> +		 * than vetted permissions.
> +		 * If this check fails then EPCM permissions may be more
> +		 * relaxed than what would be allowed by the kernel via
> +		 * PTEs.
> +		 */
> +		if ((entry->vm_max_prot_bits & vm_prot) != vm_prot) {
> +			ret = -EPERM;
> +			goto out_unlock;
> +		}
> +
> +		/*
> +		 * Change runtime protection before zapping PTEs to ensure
> +		 * any new #PF uses new permissions.
> +		 */
> +		entry->vm_run_prot_bits = vm_prot;
> +
> +		mutex_unlock(&encl->lock);
> +		/*
> +		 * Do not keep encl->lock because of dependency on
> +		 * mmap_lock acquired in sgx_zap_enclave_ptes().
> +		 */
> +		sgx_zap_enclave_ptes(encl, addr);
> +	}
> +
> +	ret = 0;
> +	goto out;
> +
> +out_unlock:
> +	mutex_unlock(&encl->lock);
> +out:
> +	modp->count = c;
> +
> +	return ret;
> +}
> +
> +/*
> + * Ensure enclave is ready for SGX2 functions. Readiness is checked
> + * by ensuring the hardware supports SGX2 and the enclave is initialized
> + * and thus able to handle requests to modify pages within it.
> + */
> +static int sgx_ioc_sgx2_ready(struct sgx_encl *encl)
> +{
> +	if (!(cpu_feature_enabled(X86_FEATURE_SGX2)))
> +		return -ENODEV;
> +
> +	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +/*
> + * Return valid permission fields from a secinfo structure provided by
> + * user space. The secinfo structure is required to only have bits in
> + * the permission fields set.
> + */
> +static int sgx_perm_from_user_secinfo(void __user *_secinfo, u64 *secinfo_perm)
> +{
> +	struct sgx_secinfo secinfo;
> +	u64 perm;
> +
> +	if (copy_from_user(&secinfo, (void __user *)_secinfo,
> +			   sizeof(secinfo)))
> +		return -EFAULT;
> +
> +	if (secinfo.flags & ~SGX_SECINFO_PERMISSION_MASK)
> +		return -EINVAL;
> +
> +	if (memchr_inv(secinfo.reserved, 0, sizeof(secinfo.reserved)))
> +		return -EINVAL;
> +
> +	perm = secinfo.flags & SGX_SECINFO_PERMISSION_MASK;
> +
> +	if ((perm & SGX_SECINFO_W) && !(perm & SGX_SECINFO_R))
> +		return -EINVAL;
> +
> +	*secinfo_perm = perm;
> +
> +	return 0;
> +}
> +
> +/**
> + * sgx_ioc_enclave_relax_perm() - handler for
> + *                                %SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
> + * @encl:	an enclave pointer
> + * @arg:	userspace pointer to a &struct sgx_enclave_relax_perm instance
> + *
> + * SGX2 distinguishes between relaxing and restricting the enclave page
> + * permissions maintained by the hardware (EPCM permissions) of pages
> + * belonging to an initialized enclave (after %SGX_IOC_ENCLAVE_INIT).
> + *
> + * EPCM permissions can be relaxed anytime directly from within the enclave
> + * with no visibility from the kernel. This is accomplished with
> + * ENCLU[EMODPE] run from within the enclave. Accessing pages with
> + * the new, relaxed permissions requires the kernel to update the PTE
> + * to handle the subsequent #PF correctly.
> + *
> + * Enclave page permissions are not allowed to exceed the
> + * maximum vetted permissions maintained in
> + * &struct sgx_encl_page->vm_max_prot_bits. If the enclave
> + * exceeds these permissions by running ENCLU[EMODPE] from within the enclave
> + * the kernel will prevent access to the pages via PTE and
> + * VMA permissions.
> + *
> + * Return:
> + * - 0:		Success
> + * - -errno:	Otherwise
> + */
> +static long sgx_ioc_enclave_relax_perm(struct sgx_encl *encl, void __user *arg)
> +{
> +	struct sgx_enclave_relax_perm params;
> +	u64 secinfo_perm;
> +	long ret;
> +
> +	ret = sgx_ioc_sgx2_ready(encl);
> +	if (ret)
> +		return ret;
> +
> +	if (copy_from_user(&params, arg, sizeof(params)))
> +		return -EFAULT;
> +
> +	if (sgx_validate_offset_length(encl, params.offset, params.length))
> +		return -EINVAL;
> +
> +	ret = sgx_perm_from_user_secinfo((void __user *)params.secinfo,
> +					 &secinfo_perm);
> +	if (ret)
> +		return ret;
> +
> +	if (params.count)
> +		return -EINVAL;
> +
> +	ret = sgx_enclave_relax_perm(encl, &params, secinfo_perm);
> +
> +	if (copy_to_user(arg, &params, sizeof(params)))
> +		return -EFAULT;
> +
> +	return ret;
> +}
> +
>  long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>  {
>  	struct sgx_encl *encl = filep->private_data;
> @@ -719,6 +915,9 @@ long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>  	case SGX_IOC_ENCLAVE_PROVISION:
>  		ret = sgx_ioc_enclave_provision(encl, (void __user *)arg);
>  		break;
> +	case SGX_IOC_ENCLAVE_RELAX_PERMISSIONS:
> +		ret = sgx_ioc_enclave_relax_perm(encl, (void __user *)arg);
> +		break;
>  	default:
>  		ret = -ENOIOCTLCMD;
>  		break;
> -- 
> 2.25.1
> 
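For reference, the flag checks done by sgx_perm_from_user_secinfo() in
the quoted patch can be exercised outside the kernel. This is a sketch
under assumptions: the SK_* constants are local stand-ins assumed to
mirror the kernel's SGX_SECINFO_R/W/X bit positions, sk_perm_from_flags()
is a hypothetical name, and the copy_from_user() and reserved-field
checks are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local stand-ins; assumed to match the SGX_SECINFO_{R,W,X} bits. */
#define SK_SECINFO_R		(UINT64_C(1) << 0)
#define SK_SECINFO_W		(UINT64_C(1) << 1)
#define SK_SECINFO_X		(UINT64_C(1) << 2)
#define SK_SECINFO_PERM_MASK	(SK_SECINFO_R | SK_SECINFO_W | SK_SECINFO_X)

/*
 * Validate secinfo flags the way the quoted helper does: only permission
 * bits may be set, and write permission requires read permission.
 */
static int sk_perm_from_flags(uint64_t flags, uint64_t *perm)
{
	if (flags & ~SK_SECINFO_PERM_MASK)
		return -EINVAL;

	if ((flags & SK_SECINFO_W) && !(flags & SK_SECINFO_R))
		return -EINVAL;

	*perm = flags & SK_SECINFO_PERM_MASK;
	return 0;
}
```

With these checks a request for W alone, or one with any bit set outside
the permission bits, is rejected before any page is looked at.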

Definitive NAK.

Should be dropped from the next patch set version. We *do not* want to
artificially construct an extra round-trip in the EMODPE flow.

BR, Jarkko


* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-04  8:30                           ` Jarkko Sakkinen
@ 2022-03-04 15:51                             ` Haitao Huang
  2022-03-05  1:02                               ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Haitao Huang @ 2022-03-04 15:51 UTC (permalink / raw)
  To: Reinette Chatre, Jarkko Sakkinen
  Cc: Dave Hansen, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

Hi Jarkko

On Fri, 04 Mar 2022 02:30:22 -0600, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Thu, Mar 03, 2022 at 10:03:30PM -0600, Haitao Huang wrote:
>>
>> On Thu, 03 Mar 2022 17:18:33 -0600, Jarkko Sakkinen <jarkko@kernel.org>
>> wrote:
>>
>> > On Thu, Mar 03, 2022 at 10:08:14AM -0600, Haitao Huang wrote:
>> > > Hi all,
>> > >
>> > > On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre
>> > > <reinette.chatre@intel.com> wrote:
>> > >
>> > > > Hi Jarkko,
>> > > >
>> > > > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote:
>> > > > > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote:
>> > > > > > Hi Jarkko,
>> > > > > >
>> > > > > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
>> > > > > > > > With EACCEPTCOPY (kudos to Mark S. for reminding me of
>> > > > > > > > this version of
>> > > > > > > > EACCEPT @ chat.enarx.dev) it is possible to make R and RX
>> > > pages but
>> > > > > > > > obviously new RX pages are now out of the picture:
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > 	/*
>> > > > > > > > 	 * Adding a regular page that is architecturally allowed
>> > > to only
>> > > > > > > > 	 * be created with RW permissions.
>> > > > > > > > 	 * TBD: Interface with user space policy to support max
>> > > permissions
>> > > > > > > > 	 * of RWX.
>> > > > > > > > 	 */
>> > > > > > > > 	prot = PROT_READ | PROT_WRITE;
>> > > > > > > > 	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>> > > > > > > > 	encl_page->vm_max_prot_bits =  
>> encl_page->vm_run_prot_bits;
>> > > > > > > >
>> > > > > > > > If that TBD is left out to the final version the page
>> > > > > > > > augmentation has a
>> > > > > > > > risk of a API bottleneck, and that risk can realize then
>> > > > > > > > also in the page
>> > > > > > > > permission ioctls.
>> > > > > > > >
>> > > > > > > > I.e. now any review comment is based on not fully known
>> > > > > > > > territory, we have
>> > > > > > > > one known unknown, and some unknown unknowns from
>> > > > > > > > unpredictable effect to
>> > > > > > > > future API changes.
>> > > > > >
>> > > > > > The plan to complete the "TBD" in the above snippet was to
>> > > > > > follow this work
>> > > > > > with user policy integration at this location. On a high level
>> > > > > > the plan was
>> > > > > > for this to look something like:
>> > > > > >
>> > > > > >
>> > > > > >  	/*
>> > > > > >  	 * Adding a regular page that is architecturally allowed to  
>> only
>> > > > > >  	 * be created with RW permissions.
>> > > > > >  	 * Interface with user space policy to support max  
>> permissions
>> > > > > >  	 * of RWX.
>> > > > > >  	 */
>> > > > > >  	prot = PROT_READ | PROT_WRITE;
>> > > > > >  	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
>> > > > > >
>> > > > > >         if (user space policy allows RWX on dynamically added
>> > > pages)
>> > > > > > 	 	encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |
>> > > > > > PROT_WRITE | PROT_EXEC, 0);
>> > > > > > 	else
>> > > > > > 		encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |
>> > > > > > PROT_WRITE, 0);
>> > > > > >
>> > > > > > The work that follows this series aimed to do the integration
>> > > with user
>> > > > > > space policy.
>> > > > >
>> > > > > What do you mean by "user space policy" anyway exactly? I'm
>> > > sorry but I
>> > > > > just don't fully understand this.
>> > > >
>> > > > My apologies - I just assumed that you would need no reminder
>> > > about this
>> > > > contentious
>> > > > part of SGX history. Essentially it means that, yes, the kernel  
>> could
>> > > > theoretically
>> > > > permit any kind of access to any file/page, but some accesses are
>> > > known
>> > > > to generally
>> > > > be a bad idea - like making memory executable as well as writable
>> > > - and
>> > > > thus there
>> > > > are additional checks based on what user space permits before the
>> > > kernel
>> > > > allows
>> > > > such accesses.
>> > > >
>> > > > For example,
>> > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()
>> > > >
>> > > > User policy and SGX has seen significant discussion. Some notable
>> > > > threads:
>> > > >  
>> https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/
>> > > >  
>> https://lore.kernel.org/linux-security-module/20190619222401.14942-1-sean.j.christopherson@intel.com/
>> > > >
>> > > > > It's too big of a risk to accept this series without X taken  
>> care
>> > > > > of. Patch
>> > > > > series should neither have TODO nor TBD comments IMHO. I don't  
>> want
>> > > > > to ack
>> > > > > a series based on speculation what might happen in the future.
>> > > >
>> > > > ok
>> > > >
>> > > > >
>> > > > > > > I think the best way to move forward would be to do EAUG's
>> > > > > > > explicitly with
>> > > > > > > an ioctl that could also include secinfo for permissions.
>> > > Then you can
>> > > > > > > easily do the rest with EACCEPTCOPY inside the enclave.
>> > > > > >
>> > > > > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be
>> > > used for
>> > > > > > this purpose. It already includes SECINFO which may also be
>> > > useful if
>> > > > > > needing to later support EAUG of PT_SS* pages.
>> > > > >
>> > > > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and  
>> call it
>> > > > > a day.
>> > > >
>> > > > I could, yes.
>> > > >
>> > > > > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is
>> > > > > this weird
>> > > > > thing added to the #PF handler? Why is it added at all then?
>> > > >
>> > > > I was just speculating in my response, there is no plan to extend
>> > > > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of).
>> > > >
>> > > > > > How this could work is user space calls  
>> SGX_IOC_ENCLAVE_ADD_PAGES
>> > > > > > after enclave initialization on any memory region within the
>> > > > > > enclave where
>> > > > > > pages are planned to be added dynamically. This ioctl() calls
>> > > > > > EAUG to add the
>> > > > > > new pages with RW permissions and their vm_max_prot_bits can  
>> be
>> > > > > > set to the
>> > > > > > permissions found in the included SECINFO. This will support
>> > > > > > later EACCEPTCOPY
>> > > > > > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
>> > > > >
>> > > > > I don't like this type of re-use of the existing API.
>> > > >
>> > > > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is
>> > > consensus
>> > > > after
>> > > > considering the user policy question (above) and performance  
>> trade-off
>> > > > (more below).
>> > > >
>> > > > >
>> > > > > > The big question is whether communicating user policy after
>> > > > > > enclave initialization
>> > > > > > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable
>> > > > > > to all? I would
>> > > > > > appreciate a confirmation on this direction considering the
>> > > > > > significant history
>> > > > > > behind this topic.
>> > > > >
>> > > > > I have no idea because I don't know what is user space policy.
>> > > >
>> > > > This discussion is about some enclave usages needing RWX  
>> permissions
>> > > > on dynamically added enclave pages. RWX permissions on dynamically
>> > > added
>> > > > pages is
>> > > > not something that should blindly be allowed for all SGX enclaves  
>> but
>> > > > instead the user
>> > > > needs to explicitly allow specific enclaves to have such ability.  
>> This
>> > > > is equivalent
>> > > > to (but not the same as) what exists in Linux today with LSM. As
>> > > seen in
>> > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux
>> > > is able
>> > > > to make
>> > > > files and memory be both writable and executable, but it would  
>> only do
>> > > > so for those
>> > > > files and memory that the LSM (which is how user policy is
>> > > communicated,
>> > > > like SELinux)
>> > > > indicates it is allowed, not blindly do so for all files and all
>> > > memory.
>> > > >
>> > > > > > > Putting EAUG to the #PF handler and implicitly call it just
>> > > > > > > too flakky and
>> > > > > > > hard to make deterministic for e.g. JIT compiler in our use
>> > > > > > > case (not to
>> > > > > > > mention that JIT is not possible at all because inability to
>> > > > > > > do RX pages).
>> > > >
>> > > > I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more
>> > > deterministic
>> > > > but from
>> > > > what I understand it would have a performance impact since it  
>> would
>> > > > require all memory
>> > > > that may be needed by the enclave be pre-allocated from outside  
>> the
>> > > > enclave and not
>> > > > just dynamically allocated from within the enclave at the time it  
>> is
>> > > > needed.
>> > > >
>> > > > Would such a performance impact be acceptable?
>> > > >
>> > >
>> > > User space won't always have enough info to decide whether the pages
>> > > to be
>> > > EAUG'd immediately. In some cases (shared libraries, JVM for
>> > > example) lots
>> > > of code/data pages can be mapped but never actually touched. One
>> > > enclave/process does not know if any other more important
>> > > enclave/process
>> > > would need the EPC.
>> > >
>> > > It should be for kernel to make the final decision as it has overall
>> > > picture
>> > > of the system EPC usage and availability.
>> >
>> > EAUG ioctl does not give better capabilities for user space to waste
>> > EPC given that EADD ioctl already exists, i.e. your argument is  
>> logically
>> > incorrect.
>>
>> The point of adding EAUG is to allow more efficient use of EPC pages.
>> Without EAUG, enclaves have to EADD everything upfront into EPC,  
>> consuming
>> predetermined number of EPC pages, some of which may not be used at all.
>> With EAUG, enclaves should be able to load minimal pages to get started,
>> pages added on #PF as they are actually accessed.
>>
>> Obviously as you pointed out, some usages make more sense to pre-EAUG  
>> (EAUG
>> before #PF). But your proposal of supporting only pre-EAUG here  
>> essentially
>> makes EAUG behave almost the same as EADD.  If the current  
>> implementation
>> with EAUG on #PF can also use MAP_POPULATE for pre-EAUG (seems possible
>> based on Dave's comments), then it is flexible to cover all cases and  
>> allow
>> kernel to optimize allocation of EPC pages.
>
> There is not even a working #PF based implementation in existence, and
> your argument has too many if's for my taste.

1) If you mean that no user space implements this kind of solution, read
this section; otherwise, skip to 2) below, which is only a couple of
sentences.

If you are willing to look, there is already an implementation in our SDK
that does heap and stack expansion on demand on #PF. Enclaves may not know
their heap/stack size up front, so we implemented these features to make
EPC usage more efficient. I don't see why enclaves adding EPC on #PF is
such an unacceptable concept to you when normal processes can add RAM on
#PF. The kernel also already does this for EPC swapping when a #PF happens
on a swapped-out EPC page.

Our implementation has gone through several rounds; the latest is
here: https://github.com/intel/linux-sgx/tree/edmm_v2/sdk/emm. It was also
implemented in the original OOT-driver-based SDK. Customers are
using it and have found it useful. I think this is a critical feature that
many other runtimes will also need.

2)
It's OK for you to request additional support for your usage, and I agree
it is needed. But IMHO, totally getting rid of EAUG on #PF is bad and
unnecessary. The current implementation can be extended to support your
usage. What's the reason you think MAP_POPULATE won't work for you?
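
To make the on-demand versus pre-populated distinction concrete, here is
how it looks for ordinary anonymous memory. This is only an analogy and a
sketch: it does not touch /dev/sgx_enclave or EAUG, map_region() is a
hypothetical helper, and whether MAP_POPULATE can drive pre-EAUG for
enclave mappings is exactly the open question above.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* MAP_POPULATE is Linux-specific; fall back to a no-op flag elsewhere. */
#ifndef MAP_POPULATE
#define MAP_POPULATE 0
#endif

/*
 * Map len bytes of private anonymous memory. With populate == 0 the pages
 * are only backed by RAM when first touched (on #PF). With populate != 0
 * the kernel is asked to fault them in during mmap() itself (best effort).
 * Returns NULL on failure.
 */
static void *map_region(size_t len, int populate)
{
	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
	void *p;

	if (populate)
		flags |= MAP_POPULATE;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

Both mappings behave the same to the caller. The difference is only in
when the kernel commits pages, which is the decision I think should stay
with the kernel for EPC as well.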

BR
Haitao


* Re: [PATCH V2 08/32] x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic permission changes
  2022-03-04  8:55   ` Jarkko Sakkinen
@ 2022-03-04 19:19     ` Reinette Chatre
  0 siblings, 0 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-03-04 19:19 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

Hi Jarkko,

On 3/4/2022 12:55 AM, Jarkko Sakkinen wrote:
> On Mon, Feb 07, 2022 at 04:45:30PM -0800, Reinette Chatre wrote:
>> Enclave creators declare their enclave page permissions (EPCM
>> permissions) at the time the pages are added to the enclave. These
>> page permissions are the vetted permissible accesses of the enclave
>> pages and stashed off (in struct sgx_encl_page->vm_max_prot_bits)
>> for later comparison with enclave PTEs and VMAs.
>>
>> Current permission support assume that EPCM permissions remain static
>> for the lifetime of the enclave. This is about to change with the
>> addition of support for SGX2 where the EPCM permissions of enclave
>> pages belonging to an initialized enclave may change during the
>> enclave's lifetime.
>>
>> Support for changing of EPCM permissions should continue to respect
>> the vetted maximum protection bits maintained in
>> sgx_encl_page->vm_max_prot_bits. Towards this end, add
>> sgx_encl_page->vm_run_prot_bits in preparation for support of
>> enclave page permission changes. sgx_encl_page->vm_run_prot_bits
>> reflect the active EPCM permissions of an enclave page and are not to
>> exceed sgx_encl_page->vm_max_prot_bits.
>>
>> Two permission fields are used: sgx_encl_page->vm_run_prot_bits
>> reflects the current EPCM permissions and is used to manage the page
>> table entries while sgx_encl_page->vm_max_prot_bits contains the vetted
>> maximum protection bits and is used to guide which EPCM permissions
>> are allowed in the upcoming SGX2 permission changing support (it guides
>> what values sgx_encl_page->vm_run_prot_bits may have).
>>
>> Consider this example of how sgx_encl_page->vm_max_prot_bits and
>> sgx_encl_page->vm_run_prot_bits are used:
>>
>> (1) Add an enclave page with secinfo of RW to an uninitialized enclave:
>>     sgx_encl_page->vm_max_prot_bits = RW
>>     sgx_encl_page->vm_run_prot_bits = RW
>>
>>     At this point RW VMAs would be allowed to access this page and PTEs
>>     would allow write access as guided by
>>     sgx_encl_page->vm_run_prot_bits.
>>
>> (2) User space invokes SGX2 to change the EPCM permissions to read-only.
>>     This is allowed because sgx_encl_page->vm_max_prot_bits = RW:
>>     sgx_encl_page->vm_max_prot_bits = RW
>>     sgx_encl_page->vm_run_prot_bits = R
>>
>>     At this point only new read-only VMAs would be allowed to access
>>     this page and PTEs would not allow write access as guided
>>     by sgx_encl_page->vm_run_prot_bits.
>>
>> (3) User space invokes SGX2 to change the EPCM permissions to RX.
>>     This will not be supported by the kernel because
>>     sgx_encl_page->vm_max_prot_bits = RW:
>>     sgx_encl_page->vm_max_prot_bits = RW
>>     sgx_encl_page->vm_run_prot_bits = R
>>
>> (4) User space invokes SGX2 to change the EPCM permissions to RW.
>>     This will be allowed because sgx_encl_page->vm_max_prot_bits = RW:
>>     sgx_encl_page->vm_max_prot_bits = RW
>>     sgx_encl_page->vm_run_prot_bits = RW
>>
>>     At this point RW VMAs would again be allowed to access this page
>>     and PTEs would allow write access as guided by
>>     sgx_encl_page->vm_run_prot_bits.
>>
>> struct sgx_encl_page hosting this information is maintained for each
>> enclave page so the space consumed by the struct is important.
>> The existing sgx_encl_page->vm_max_prot_bits is already unsigned long
>> while only using three bits. Transition to a bitfield for the two
>> members containing protection bits.
>>
>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>> ---
>> Changes since V1:
>> - Add snippet to Documentation/x86/sgx.rst that details the difference
>>   between vm_max_prot_bits and vm_run_prot_bits (Andy and Jarkko).
>> - Change subject line (Jarkko).
>> - Refer to actual variables instead of using English rephrasing -
>>   sgx_encl_page->vm_run_prot_bits instead of "runtime
>>   protection bits" (Jarkko).
>> - Add information in commit message on why two fields are needed
>>   (Jarkko).
>>
>>  Documentation/x86/sgx.rst       | 10 ++++++++++
>>  arch/x86/kernel/cpu/sgx/encl.c  |  6 +++---
>>  arch/x86/kernel/cpu/sgx/encl.h  |  3 ++-
>>  arch/x86/kernel/cpu/sgx/ioctl.c |  6 ++++++
>>  4 files changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
>> index 5659932728a5..9df620b59f83 100644
>> --- a/Documentation/x86/sgx.rst
>> +++ b/Documentation/x86/sgx.rst
>> @@ -99,6 +99,16 @@ The relationships between the different permission masks are:
>>  * PTEs are installed to match the EPCM permissions, but not be more
>>    relaxed than the VMA permissions.
>>  
>> +During runtime the EPCM permissions of enclave pages belonging to an
>> +initialized enclave can change on systems supporting SGX2. In support
>> +of these runtime changes the kernel maintains (for each enclave page)
>> +the most permissive EPCM permission mask allowed by policy as
>> +the ``vm_max_prot_bits`` of that page. EPCM permissions are not allowed
>> +to be relaxed beyond ``vm_max_prot_bits``.  The kernel also maintains
>> +the currently active EPCM permissions of an enclave page as its
>> +``vm_run_prot_bits`` to ensure PTEs and new VMAs respect the active
>> +EPCM permission values.
>> +
>>  On systems supporting SGX2 EPCM permissions may change while the
>>  enclave page belongs to a VMA without impacting the VMA permissions.
>>  This means that a running VMA may appear to allow access to an enclave
>> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
>> index 1ba01c75a579..a980d8458949 100644
>> --- a/arch/x86/kernel/cpu/sgx/encl.c
>> +++ b/arch/x86/kernel/cpu/sgx/encl.c
>> @@ -164,7 +164,7 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>>  	 * exceed the VMA permissions.
>>  	 */
>>  	vm_prot_bits = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
>> -	page_prot_bits = entry->vm_max_prot_bits & vm_prot_bits;
>> +	page_prot_bits = entry->vm_run_prot_bits & vm_prot_bits;
>>  	/*
>>  	 * Add VM_SHARED so that PTE is made writable right away if VMA
>>  	 * and EPCM are writable (no COW in SGX).
>> @@ -217,7 +217,7 @@ static vm_fault_t sgx_vma_pfn_mkwrite(struct vm_fault *vmf)
>>  		goto out;
>>  	}
>>  
>> -	if (!(entry->vm_max_prot_bits & VM_WRITE))
>> +	if (!(entry->vm_run_prot_bits & VM_WRITE))
>>  		ret = VM_FAULT_SIGBUS;
>>  
>>  out:
>> @@ -280,7 +280,7 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
>>  	mutex_lock(&encl->lock);
>>  	xas_lock(&xas);
>>  	xas_for_each(&xas, page, PFN_DOWN(end - 1)) {
>> -		if (~page->vm_max_prot_bits & vm_prot_bits) {
>> +		if (~page->vm_run_prot_bits & vm_prot_bits) {
>>  			ret = -EACCES;
>>  			break;
>>  		}
>> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
>> index fec43ca65065..dc262d843411 100644
>> --- a/arch/x86/kernel/cpu/sgx/encl.h
>> +++ b/arch/x86/kernel/cpu/sgx/encl.h
>> @@ -27,7 +27,8 @@
>>  
>>  struct sgx_encl_page {
>>  	unsigned long desc;
>> -	unsigned long vm_max_prot_bits;
>> +	unsigned long vm_max_prot_bits:8;
>> +	unsigned long vm_run_prot_bits:8;
>>  	struct sgx_epc_page *epc_page;
>>  	struct sgx_encl *encl;
>>  	struct sgx_va_page *va_page;
>> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
>> index 83df20e3e633..7e0819a89532 100644
>> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
>> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
>> @@ -197,6 +197,12 @@ static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
>>  	/* Calculate maximum of the VM flags for the page. */
>>  	encl_page->vm_max_prot_bits = calc_vm_prot_bits(prot, 0);
>>  
>> +	/*
>> +	 * At time of allocation, the runtime protection bits are the same
>> +	 * as the maximum protection bits.
>> +	 */
>> +	encl_page->vm_run_prot_bits = encl_page->vm_max_prot_bits;
>> +
>>  	return encl_page;
>>  }
>>  
>> -- 
>> 2.25.1
>>
> 
> This patch I can NAK without 2nd thought. It adds to the round-trips of
> using ENCLU[EMODPE].
> 
> A better idea is the one I explain in
> 
> https://lore.kernel.org/linux-sgx/20220304033918.361495-1-jarkko@kernel.org/T/#u
> 
> The only thing this patch is doing is adding artificial complexity when
> we already have somewhat complex microarchitecture for permissions.

The motivation for this change is to ensure that the kernel does not allow
an access to a page that the page's EPCM permissions do not allow. For
example, this change ensures that the kernel will not allow execution on an
enclave page that is not executable.

In your change this is removed and the kernel will, for example, allow
execution of enclave pages that are not executable.

It could be seen as complex, but it is done out of respect for security.
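
For illustration only (not part of the patch): a minimal user-space sketch
of the relationship between the two fields, following the numbered example
in the commit message. The sim_* names and SIM_* flags are made up for this
sketch and merely stand in for struct sgx_encl_page and VM_READ/VM_WRITE/
VM_EXEC; the real enforcement is expected to live in the SGX2 ioctl()s added
later in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the kernel's VM_READ/VM_WRITE/VM_EXEC flags. */
#define SIM_READ  0x1u
#define SIM_WRITE 0x2u
#define SIM_EXEC  0x4u

/* Hypothetical model of the two bitfields in struct sgx_encl_page. */
struct sim_encl_page {
	unsigned int vm_max_prot_bits : 8; /* vetted maximum, set when page is added */
	unsigned int vm_run_prot_bits : 8; /* currently active EPCM permissions */
};

/* At allocation time both fields start out identical, as in the patch. */
static struct sim_encl_page sim_page_alloc(unsigned int prot)
{
	struct sim_encl_page page = {
		.vm_max_prot_bits = prot,
		.vm_run_prot_bits = prot,
	};
	return page;
}

/* Refuse any request that exceeds the vetted maximum, else update run bits. */
static bool sim_change_perms(struct sim_encl_page *page, unsigned int prot)
{
	assert(page != NULL);
	if (prot & ~(unsigned int)page->vm_max_prot_bits)
		return false;
	page->vm_run_prot_bits = prot;
	return true;
}

/* Same shape as the "~page->vm_run_prot_bits & vm_prot_bits" check in sgx_encl_may_map(). */
static bool sim_may_map(const struct sim_encl_page *page, unsigned int vma_prot)
{
	return (~(unsigned int)page->vm_run_prot_bits & vma_prot) == 0;
}
```

With a page added as RW, sim_change_perms() to R succeeds and a new RW
mapping is then refused, a request for RX is refused because X is outside
vm_max_prot_bits, and a later request for RW succeeds again.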

> 
> Please just drop this patch from the next version and also ioctl for relax.
> 

I responded with more detail to your proposal in:
https://lore.kernel.org/linux-sgx/684930a2-a247-7d5e-90e8-6e80db618c4c@intel.com/

Reinette


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-04 15:51                             ` Haitao Huang
@ 2022-03-05  1:02                               ` Jarkko Sakkinen
  2022-03-06 14:24                                 ` Haitao Huang
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-05  1:02 UTC (permalink / raw)
  To: Haitao Huang
  Cc: Reinette Chatre, Dave Hansen, Dhanraj, Vijay, dave.hansen, tglx,
	bp, Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Fri, Mar 04, 2022 at 09:51:22AM -0600, Haitao Huang wrote:
> Hi Jarkko
> 
> On Fri, 04 Mar 2022 02:30:22 -0600, Jarkko Sakkinen <jarkko@kernel.org>
> wrote:
> 
> > On Thu, Mar 03, 2022 at 10:03:30PM -0600, Haitao Huang wrote:
> > > 
> > > On Thu, 03 Mar 2022 17:18:33 -0600, Jarkko Sakkinen <jarkko@kernel.org>
> > > wrote:
> > > 
> > > > On Thu, Mar 03, 2022 at 10:08:14AM -0600, Haitao Huang wrote:
> > > > > Hi all,
> > > > >
> > > > > On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre
> > > > > <reinette.chatre@intel.com> wrote:
> > > > >
> > > > > > Hi Jarkko,
> > > > > >
> > > > > > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote:
> > > > > > > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote:
> > > > > > > > Hi Jarkko,
> > > > > > > >
> > > > > > > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
> > > > > > > > > > With EACCEPTCOPY (kudos to Mark S. for reminding me of
> > > > > > > > > > this version of
> > > > > > > > > > EACCEPT @ chat.enarx.dev) it is possible to make R and RX
> > > > > pages but
> > > > > > > > > > obviously new RX pages are now out of the picture:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 	/*
> > > > > > > > > > 	 * Adding a regular page that is architecturally allowed
> > > > > to only
> > > > > > > > > > 	 * be created with RW permissions.
> > > > > > > > > > 	 * TBD: Interface with user space policy to support max
> > > > > permissions
> > > > > > > > > > 	 * of RWX.
> > > > > > > > > > 	 */
> > > > > > > > > > 	prot = PROT_READ | PROT_WRITE;
> > > > > > > > > > 	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> > > > > > > > > > 	encl_page->vm_max_prot_bits =
> > > encl_page->vm_run_prot_bits;
> > > > > > > > > >
> > > > > > > > > > If that TBD is left out to the final version the page
> > > > > > > > > > augmentation has a
> > > > > > > > > > risk of a API bottleneck, and that risk can realize then
> > > > > > > > > > also in the page
> > > > > > > > > > permission ioctls.
> > > > > > > > > >
> > > > > > > > > > I.e. now any review comment is based on not fully known
> > > > > > > > > > territory, we have
> > > > > > > > > > one known unknown, and some unknown unknowns from
> > > > > > > > > > unpredictable effect to
> > > > > > > > > > future API changes.
> > > > > > > >
> > > > > > > > The plan to complete the "TBD" in the above snippet was to
> > > > > > > > follow this work
> > > > > > > > with user policy integration at this location. On a high level
> > > > > > > > the plan was
> > > > > > > > for this to look something like:
> > > > > > > >
> > > > > > > >
> > > > > > > >  	/*
> > > > > > > >  	 * Adding a regular page that is architecturally allowed
> > > to only
> > > > > > > >  	 * be created with RW permissions.
> > > > > > > >  	 * Interface with user space policy to support max
> > > permissions
> > > > > > > >  	 * of RWX.
> > > > > > > >  	 */
> > > > > > > >  	prot = PROT_READ | PROT_WRITE;
> > > > > > > >  	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> > > > > > > >
> > > > > > > >         if (user space policy allows RWX on dynamically added
> > > > > pages)
> > > > > > > > 	 	encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |
> > > > > > > > PROT_WRITE | PROT_EXEC, 0);
> > > > > > > > 	else
> > > > > > > > 		encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ |
> > > > > > > > PROT_WRITE, 0);
> > > > > > > >
> > > > > > > > The work that follows this series aimed to do the integration
> > > > > with user
> > > > > > > > space policy.
> > > > > > >
> > > > > > > What do you mean by "user space policy" anyway exactly? I'm
> > > > > sorry but I
> > > > > > > just don't fully understand this.
> > > > > >
> > > > > > My apologies - I just assumed that you would need no reminder
> > > > > about this
> > > > > > contentious
> > > > > > part of SGX history. Essentially it means that, yes, the
> > > kernel could
> > > > > > theoretically
> > > > > > permit any kind of access to any file/page, but some accesses are
> > > > > known
> > > > > > to generally
> > > > > > be a bad idea - like making memory executable as well as writable
> > > > > - and
> > > > > > thus there
> > > > > > are additional checks based on what user space permits before the
> > > > > kernel
> > > > > > allows
> > > > > > such accesses.
> > > > > >
> > > > > > For example,
> > > > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()
> > > > > >
> > > > > > User policy and SGX has seen significant discussion. Some notable
> > > > > > threads:
> > > > > > https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/
> > > > > > https://lore.kernel.org/linux-security-module/20190619222401.14942-1-sean.j.christopherson@intel.com/
> > > > > >
> > > > > > > It's too big of a risk to accept this series without X taken
> > > care
> > > > > > > of. Patch
> > > > > > > series should neither have TODO nor TBD comments IMHO. I
> > > don't want
> > > > > > > to ack
> > > > > > > a series based on speculation what might happen in the future.
> > > > > >
> > > > > > ok
> > > > > >
> > > > > > >
> > > > > > > > > I think the best way to move forward would be to do EAUG's
> > > > > > > > > explicitly with
> > > > > > > > > an ioctl that could also include secinfo for permissions.
> > > > > Then you can
> > > > > > > > > easily do the rest with EACCEPTCOPY inside the enclave.
> > > > > > > >
> > > > > > > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be
> > > > > used for
> > > > > > > > this purpose. It already includes SECINFO which may also be
> > > > > useful if
> > > > > > > > needing to later support EAUG of PT_SS* pages.
> > > > > > >
> > > > > > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and
> > > call it
> > > > > > > a day.
> > > > > >
> > > > > > I could, yes.
> > > > > >
> > > > > > > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is
> > > > > > > this weird
> > > > > > > thing added to the #PF handler? Why is it added at all then?
> > > > > >
> > > > > > I was just speculating in my response, there is no plan to extend
> > > > > > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of).
> > > > > >
> > > > > > > > How this could work is user space calls
> > > SGX_IOC_ENCLAVE_ADD_PAGES
> > > > > > > > after enclave initialization on any memory region within the
> > > > > > > > enclave where
> > > > > > > > pages are planned to be added dynamically. This ioctl() calls
> > > > > > > > EAUG to add the
> > > > > > > > new pages with RW permissions and their vm_max_prot_bits
> > > can be
> > > > > > > > set to the
> > > > > > > > permissions found in the included SECINFO. This will support
> > > > > > > > later EACCEPTCOPY
> > > > > > > > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
> > > > > > >
> > > > > > > I don't like this type of re-use of the existing API.
> > > > > >
> > > > > > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is
> > > > > consensus
> > > > > > after
> > > > > > considering the user policy question (above) and performance
> > > trade-off
> > > > > > (more below).
> > > > > >
> > > > > > >
> > > > > > > > The big question is whether communicating user policy after
> > > > > > > > enclave initialization
> > > > > > > > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable
> > > > > > > > to all? I would
> > > > > > > > appreciate a confirmation on this direction considering the
> > > > > > > > significant history
> > > > > > > > behind this topic.
> > > > > > >
> > > > > > > I have no idea because I don't know what is user space policy.
> > > > > >
> > > > > > This discussion is about some enclave usages needing RWX
> > > permissions
> > > > > > on dynamically added enclave pages. RWX permissions on dynamically
> > > > > added
> > > > > > pages is
> > > > > > not something that should blindly be allowed for all SGX
> > > enclaves but
> > > > > > instead the user
> > > > > > needs to explicitly allow specific enclaves to have such
> > > ability. This
> > > > > > is equivalent
> > > > > > to (but not the same as) what exists in Linux today with LSM. As
> > > > > seen in
> > > > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux
> > > > > is able
> > > > > > to make
> > > > > > files and memory be both writable and executable, but it would
> > > only do
> > > > > > so for those
> > > > > > files and memory that the LSM (which is how user policy is
> > > > > communicated,
> > > > > > like SELinux)
> > > > > > indicates it is allowed, not blindly do so for all files and all
> > > > > memory.
> > > > > >
> > > > > > > > > Putting EAUG to the #PF handler and implicitly call it just
> > > > > > > > > > too flaky and
> > > > > > > > > hard to make deterministic for e.g. JIT compiler in our use
> > > > > > > > > case (not to
> > > > > > > > > mention that JIT is not possible at all because inability to
> > > > > > > > > do RX pages).
> > > > > >
> > > > > > I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more
> > > > > deterministic
> > > > > > but from
> > > > > > what I understand it would have a performance impact since it
> > > would
> > > > > > require all memory
> > > > > > that may be needed by the enclave be pre-allocated from
> > > outside the
> > > > > > enclave and not
> > > > > > just dynamically allocated from within the enclave at the time
> > > it is
> > > > > > needed.
> > > > > >
> > > > > > Would such a performance impact be acceptable?
> > > > > >
> > > > >
> > > > > User space won't always have enough info to decide whether the pages
> > > > > to be
> > > > > EAUG'd immediately. In some cases (shared libraries, JVM for
> > > > > example) lots
> > > > > of code/data pages can be mapped but never actually touched. One
> > > > > enclave/process does not know if any other more important
> > > > > enclave/process
> > > > > would need the EPC.
> > > > >
> > > > > It should be for kernel to make the final decision as it has overall
> > > > > picture
> > > > > of the system EPC usage and availability.
> > > >
> > > > EAUG ioctl does not give better capabilities for user space to waste
> > > > EPC given that EADD ioctl already exists, i.e. your argument is
> > > logically
> > > > incorrect.
> > > 
> > > The point of adding EAUG is to allow more efficient use of EPC pages.
> > > Without EAUG, enclaves have to EADD everything upfront into EPC,
> > > consuming
> > > predetermined number of EPC pages, some of which may not be used at all.
> > > With EAUG, enclaves should be able to load minimal pages to get started,
> > > pages added on #PF as they are actually accessed.
> > > 
> > > Obviously as you pointed out, some usages make more sense to
> > > pre-EAUG (EAUG
> > > before #PF). But your proposal of supporting only pre-EAUG here
> > > essentially
> > > makes EAUG behave almost the same as EADD.  If the current
> > > implementation
> > > with EAUG on #PF can also use MAP_POPULATE for pre-EAUG (seems possible
> > > based on Dave's comments), then it is flexible to cover all cases and
> > > allow
> > > kernel to optimize allocation of EPC pages.
> > 
> > There is not even a working #PF based implementation in existence, and
> > your
> > argument has too many if's for my taste.
> 
> 1) if you mean no user space is implementing this kind of solution, read
> this section, otherwise, skip to 2) below which is only couple of sentences.
> 
> If you are willing to look, there is already implementation in our SDK to do
> heap and stack expansion on demand on #PF. Enclaves may not know heap/stack
> size up front, we have implemented these features to make EPC usage more
> efficient. I don't know why normal processes can add RAM on #PF, but
> enclaves adding EPC on #PF becomes so unacceptable concept to you. And the
> kernel does that for EPC swapping already when #PF happens on a swapped out
> EPC page.

It adds O(n) round-trips for a mmap() emulation, which can be done in O(1)
round-trips with an ioctl.
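
For illustration only: a toy cost model of the two approaches discussed in
this thread, in plain C with nothing SGX specific. All sim_* names are made
up for this sketch, and it deliberately ignores the ENCLU[EACCEPT] work done
inside the enclave as well as EPC reclaim. It counts kernel entries and EPC
pages consumed when pages are added on #PF versus with one hypothetical
ioctl() covering the whole range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy cost summary for one enclave memory range. */
struct sim_cost {
	size_t kernel_entries; /* round-trips into the kernel */
	size_t epc_pages;      /* EPC pages consumed */
};

/* Fault-driven EAUG: one #PF per page that the enclave actually touches. */
static struct sim_cost sim_fault_driven(const bool *touched, size_t npages)
{
	struct sim_cost cost = { 0, 0 };

	assert(touched != NULL || npages == 0);
	for (size_t i = 0; i < npages; i++) {
		if (touched[i]) {
			cost.kernel_entries++;
			cost.epc_pages++;
		}
	}
	return cost;
}

/* Up-front EAUG: a single (hypothetical) ioctl() that adds every page. */
static struct sim_cost sim_ioctl_driven(size_t npages)
{
	struct sim_cost cost = { npages ? 1 : 0, npages };
	return cost;
}
```

Under these assumptions the ioctl() path always costs one kernel entry but
consumes EPC for every page in the range, while the #PF path costs one
entry per touched page and consumes EPC only for those pages.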

> Our implementation has gone through several rounds, the latest is
> here: https://github.com/intel/linux-sgx/tree/edmm_v2/sdk/emm. It was also
> implemented in original OOT driver based SDK implementation. Customers are
> using it and found them useful. I think this is a critical feature that many
> other runtimes will also need.

I'm not sure what the common sense argument here is.

> 2)
> It's OK for you to request additional support for your usage and I agree it
> is needed. But IMHO, totally getting rid of EAUG on #PF is bad and
> unnecessary. Current implementation can be extended to support your usage.
> What's the reason  you think MAP_POPULATE won't work for you?

I do not recall taking a stand on MAP_POPULATE.

> BR
> Haitao

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-03 21:44                         ` Dave Hansen
@ 2022-03-05  3:19                           ` Jarkko Sakkinen
  2022-03-06  0:15                             ` Jarkko Sakkinen
  2022-03-10  5:43                           ` Jarkko Sakkinen
  1 sibling, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-05  3:19 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Reinette Chatre, Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx,
	bp, Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

Sorry, I missed this.

On Thu, Mar 03, 2022 at 01:44:14PM -0800, Dave Hansen wrote:
> On 3/3/22 13:23, Reinette Chatre wrote:
> > Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
> > VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
> > then I believe that SGX would benefit.
> 
> Some Intel folks asked for this quite a while ago.  I think it's
> entirely doable: add a new vm_ops->populate() function that will allow
> ignoring VM_IO|VM_PFNMAP if present.

I'm sorry, but I don't understand what you mean by ignoring here,
i.e. I cannot fully comprehend the last sentence.

And would the vm_ops->populate() be called right after the existing ones
involved with the VMA creation process?

> Or, if nobody wants to waste all of the vm_ops space, just add an
> arch_vma_populate() or something which can call over into SGX.
> 
> I'll happily review the patches if anyone can put such a beast together.

I'll start with vm_ops->populate() and check the feedback first for
that.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-05  3:19                           ` Jarkko Sakkinen
@ 2022-03-06  0:15                             ` Jarkko Sakkinen
  2022-03-06  0:25                               ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06  0:15 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Reinette Chatre, Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx,
	bp, Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Sat, Mar 05, 2022 at 05:19:24AM +0200, Jarkko Sakkinen wrote:
> Sorry, I missed this.
> 
> On Thu, Mar 03, 2022 at 01:44:14PM -0800, Dave Hansen wrote:
> > On 3/3/22 13:23, Reinette Chatre wrote:
> > > Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
> > > VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
> > > then I believe that SGX would benefit.
> > 
> > Some Intel folks asked for this quite a while ago.  I think it's
> > entirely doable: add a new vm_ops->populate() function that will allow
> > ignoring VM_IO|VM_PFNMAP if present.
> 
> I'm sorry, but I don't understand what you mean by ignoring here,
> i.e. I cannot fully comprehend the last sentence.
> 
> And would the vm_ops->populate() be called right after the existing ones
> involved with the VMA creation process?
> 
> > Or, if nobody wants to waste all of the vm_ops space, just add an
> > arch_vma_populate() or something which can call over into SGX.
> > 
> > I'll happily review the patches if anyone can put such a beast together.
> 
> I'll start with vm_ops->populate() and check the feedback first for
> that.

I would instead extend populate() in file_operations into:

int (*populate)(struct file *, struct vm_area_struct *, bool populate);

This does not add to memory consumption.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-06  0:15                             ` Jarkko Sakkinen
@ 2022-03-06  0:25                               ` Jarkko Sakkinen
  0 siblings, 0 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06  0:25 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Reinette Chatre, Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx,
	bp, Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Sun, Mar 06, 2022 at 02:15:32AM +0200, Jarkko Sakkinen wrote:
> On Sat, Mar 05, 2022 at 05:19:24AM +0200, Jarkko Sakkinen wrote:
> > Sorry, I missed this.
> > 
> > On Thu, Mar 03, 2022 at 01:44:14PM -0800, Dave Hansen wrote:
> > > On 3/3/22 13:23, Reinette Chatre wrote:
> > > > Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
> > > > VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
> > > > then I believe that SGX would benefit.
> > > 
> > > Some Intel folks asked for this quite a while ago.  I think it's
> > > entirely doable: add a new vm_ops->populate() function that will allow
> > > ignoring VM_IO|VM_PFNMAP if present.
> > 
> > I'm sorry, but I don't understand what you mean by ignoring here,
> > i.e. I cannot fully comprehend the last sentence.
> > 
> > And would the vm_ops->populate() be called right after the existing ones
> > involved with the VMA creation process?
> > 
> > > Or, if nobody wants to waste all of the vm_ops space, just add an
> > > arch_vma_populate() or something which can call over into SGX.
> > > 
> > > I'll happily review the patches if anyone can put such a beast together.
> > 
> > I'll start with vm_ops->populate() and check the feedback first for
> > that.
> 
> I would instead extend populate() in file_operations into:
> 
> int (*populate)(struct file *, struct vm_area_struct *, bool populate);
> 
> This does not add to memory consumption.

Ugh, mixing my words, sorry :-) I meant:

int (*mmap)(struct file *, struct vm_area_struct *, bool populate);
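
For illustration only: a user-space mock of how a driver callback with such
a signature might behave. This is a sketch under assumptions, not an
existing kernel interface: the extra "populate" argument is only the
proposal above, and every sim_* type and function is a made-up stand-in
for struct file, struct vm_area_struct and the per-page work that the #PF
handler would otherwise do lazily.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_PAGE_SIZE 4096UL

/* Minimal stand-ins; the real kernel structures are far larger. */
struct sim_file { size_t populated_pages; };
struct sim_vma { unsigned long vm_start; unsigned long vm_end; };

/* Hypothetical callback type carrying the proposed "populate" argument. */
typedef int (*sim_mmap_fn)(struct sim_file *, struct sim_vma *, bool populate);

/* Stand-in for adding one page; here it only counts the page. */
static void sim_add_page(struct sim_file *file, unsigned long addr)
{
	(void)addr;
	file->populated_pages++;
}

/* Mock driver callback: add all pages up front only when asked to populate. */
static int sim_sgx_mmap(struct sim_file *file, struct sim_vma *vma, bool populate)
{
	assert(file != NULL && vma != NULL);
	if (vma->vm_end <= vma->vm_start)
		return -EINVAL;
	if (!populate)
		return 0; /* pages would be added on first access instead */
	for (unsigned long addr = vma->vm_start; addr < vma->vm_end;
	     addr += SIM_PAGE_SIZE)
		sim_add_page(file, addr);
	return 0;
}
```

In this mock a four-page range mapped with populate == false adds no pages,
while the same range with populate == true adds all four in a single call.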

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-05  1:02                               ` Jarkko Sakkinen
@ 2022-03-06 14:24                                 ` Haitao Huang
  0 siblings, 0 replies; 130+ messages in thread
From: Haitao Huang @ 2022-03-06 14:24 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Reinette Chatre, Dave Hansen, Dhanraj, Vijay, dave.hansen, tglx,
	bp, Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Fri, 04 Mar 2022 19:02:28 -0600, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Fri, Mar 04, 2022 at 09:51:22AM -0600, Haitao Huang wrote:
>> Hi Jarkko
>>
>> On Fri, 04 Mar 2022 02:30:22 -0600, Jarkko Sakkinen <jarkko@kernel.org>
>> wrote:
>>
>> > On Thu, Mar 03, 2022 at 10:03:30PM -0600, Haitao Huang wrote:
>> > >
>> > > On Thu, 03 Mar 2022 17:18:33 -0600, Jarkko Sakkinen  
>> <jarkko@kernel.org>
>> > > wrote:
>> > >
>> > > > On Thu, Mar 03, 2022 at 10:08:14AM -0600, Haitao Huang wrote:
>> > > > > Hi all,
>> > > > >
>> > > > > On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre
>> > > > > <reinette.chatre@intel.com> wrote:
>> > > > >
>> > > > > > Hi Jarkko,
>> > > > > >
>> > > > > > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote:
>> > > > > > > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre  
>> wrote:
>> > > > > > > > Hi Jarkko,
>> > > > > > > >
>> > > > > > > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote:
>> > > > > > > > > > With EACCEPTCOPY (kudos to Mark S. for reminding me of
>> > > > > > > > > > this version of
>> > > > > > > > > > EACCEPT @ chat.enarx.dev) it is possible to make R  
>> and RX
>> > > > > pages but
>> > > > > > > > > > obviously new RX pages are now out of the picture:
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > 	/*
>> > > > > > > > > > 	 * Adding a regular page that is architecturally  
>> allowed
>> > > > > to only
>> > > > > > > > > > 	 * be created with RW permissions.
>> > > > > > > > > > 	 * TBD: Interface with user space policy to support  
>> max
>> > > > > permissions
>> > > > > > > > > > 	 * of RWX.
>> > > > > > > > > > 	 */
>> > > > > > > > > > 	prot = PROT_READ | PROT_WRITE;
>> > > > > > > > > > 	encl_page->vm_run_prot_bits =  
>> calc_vm_prot_bits(prot, 0);
>> > > > > > > > > > 	encl_page->vm_max_prot_bits =
>> > > encl_page->vm_run_prot_bits;
>> > > > > > > > > >
>> > > > > > > > > > If that TBD is left out to the final version the page
>> > > > > > > > > > augmentation has a
>> > > > > > > > > > risk of a API bottleneck, and that risk can realize  
>> then
>> > > > > > > > > > also in the page
>> > > > > > > > > > permission ioctls.
>> > > > > > > > > >
>> > > > > > > > > > I.e. now any review comment is based on not fully  
>> known
>> > > > > > > > > > territory, we have
>> > > > > > > > > > one known unknown, and some unknown unknowns from
>> > > > > > > > > > unpredictable effect to
>> > > > > > > > > > future API changes.
>> > > > > > > >
>> > > > > > > > The plan to complete the "TBD" in the above snippet was to
>> > > > > > > > follow this work
>> > > > > > > > with user policy integration at this location. On a high  
>> level
>> > > > > > > > the plan was
>> > > > > > > > for this to look something like:
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >  	/*
>> > > > > > > >  	 * Adding a regular page that is architecturally allowed
>> > > to only
>> > > > > > > >  	 * be created with RW permissions.
>> > > > > > > >  	 * Interface with user space policy to support max
>> > > permissions
>> > > > > > > >  	 * of RWX.
>> > > > > > > >  	 */
>> > > > > > > >  	prot = PROT_READ | PROT_WRITE;
>> > > > > > > >  	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot,  
>> 0);
>> > > > > > > >
>> > > > > > > >         if (user space policy allows RWX on dynamically  
>> added
>> > > > > pages)
>> > > > > > > > 	 	encl_page->vm_max_prot_bits =  
>> calc_vm_prot_bits(PROT_READ |
>> > > > > > > > PROT_WRITE | PROT_EXEC, 0);
>> > > > > > > > 	else
>> > > > > > > > 		encl_page->vm_max_prot_bits =  
>> calc_vm_prot_bits(PROT_READ |
>> > > > > > > > PROT_WRITE, 0);
>> > > > > > > >
>> > > > > > > > The work that follows this series aimed to do the  
>> integration
>> > > > > with user
>> > > > > > > > space policy.
>> > > > > > >
>> > > > > > > What do you mean by "user space policy" anyway exactly? I'm
>> > > > > sorry but I
>> > > > > > > just don't fully understand this.
>> > > > > >
>> > > > > > My apologies - I just assumed that you would need no reminder
>> > > > > about this
>> > > > > > contentious
>> > > > > > part of SGX history. Essentially it means that, yes, the
>> > > kernel could
>> > > > > > theoretically
>> > > > > > permit any kind of access to any file/page, but some accesses  
>> are
>> > > > > known
>> > > > > > to generally
>> > > > > > be a bad idea - like making memory executable as well as  
>> writable
>> > > > > - and
>> > > > > > thus there
>> > > > > > are additional checks based on what user space permits before  
>> the
>> > > > > kernel
>> > > > > > allows
>> > > > > > such accesses.
>> > > > > >
>> > > > > > For example,
>> > > > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()
>> > > > > >
>> > > > > > User policy and SGX has seen significant discussion. Some  
>> notable
>> > > > > > threads:
>> > > > > >  
>> https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/
>> > > > > >  
>> https://lore.kernel.org/linux-security-module/20190619222401.14942-1-sean.j.christopherson@intel.com/
>> > > > > >
>> > > > > > > It's too big of a risk to accept this series without X taken
>> > > care
>> > > > > > > of. Patch
>> > > > > > > series should neither have TODO nor TBD comments IMHO. I
>> > > don't want
>> > > > > > > to ack
>> > > > > > > a series based on speculation what might happen in the  
>> future.
>> > > > > >
>> > > > > > ok
>> > > > > >
>> > > > > > >
>> > > > > > > > > I think the best way to move forward would be to do  
>> EAUG's
>> > > > > > > > > explicitly with
>> > > > > > > > > an ioctl that could also include secinfo for  
>> permissions.
>> > > > > Then you can
>> > > > > > > > > easily do the rest with EACCEPTCOPY inside the enclave.
>> > > > > > > >
>> > > > > > > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could  
>> possibly be
>> > > > > used for
>> > > > > > > > this purpose. It already includes SECINFO which may also  
>> be
>> > > > > useful if
>> > > > > > > > needing to later support EAUG of PT_SS* pages.
>> > > > > > >
>> > > > > > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and
>> > > call it
>> > > > > > > a day.
>> > > > > >
>> > > > > > I could, yes.
>> > > > > >
>> > > > > > > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES  
>> what is
>> > > > > > > this weird
>> > > > > > > thing added to the #PF handler? Why is it added at all then?
>> > > > > >
>> > > > > > I was just speculating in my response, there is no plan to  
>> extend
>> > > > > > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of).
>> > > > > >
>> > > > > > > > How this could work is user space calls
>> > > SGX_IOC_ENCLAVE_ADD_PAGES
>> > > > > > > > after enclave initialization on any memory region within  
>> the
>> > > > > > > > enclave where
>> > > > > > > > pages are planned to be added dynamically. This ioctl()  
>> calls
>> > > > > > > > EAUG to add the
>> > > > > > > > new pages with RW permissions and their vm_max_prot_bits
>> > > can be
>> > > > > > > > set to the
>> > > > > > > > permissions found in the included SECINFO. This will  
>> support
>> > > > > > > > later EACCEPTCOPY
>> > > > > > > > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS
>> > > > > > >
>> > > > > > > I don't like this type of re-use of the existing API.
>> > > > > >
>> > > > > > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is
>> > > > > consensus
>> > > > > > after
>> > > > > > considering the user policy question (above) and performance
>> > > trade-off
>> > > > > > (more below).
>> > > > > >
>> > > > > > >
>> > > > > > > > The big question is whether communicating user policy  
>> after
>> > > > > > > > enclave initialization
>> > > > > > > > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is  
>> acceptable
>> > > > > > > > to all? I would
>> > > > > > > > appreciate a confirmation on this direction considering  
>> the
>> > > > > > > > significant history
>> > > > > > > > behind this topic.
>> > > > > > >
>> > > > > > > I have no idea because I don't know what is user space  
>> policy.
>> > > > > >
>> > > > > > This discussion is about some enclave usages needing RWX
>> > > permissions
>> > > > > > on dynamically added enclave pages. RWX permissions on  
>> dynamically
>> > > > > added
>> > > > > > pages is
>> > > > > > not something that should blindly be allowed for all SGX
>> > > enclaves but
>> > > > > > instead the user
>> > > > > > needs to explicitly allow specific enclaves to have such
>> > > ability. This
>> > > > > > is equivalent
>> > > > > > to (but not the same as) what exists in Linux today with LSM.  
>> As
>> > > > > seen in
>> > > > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect()  
>> Linux
>> > > > > is able
>> > > > > > to make
>> > > > > > files and memory be both writable and executable, but it would
>> > > only do
>> > > > > > so for those
>> > > > > > files and memory that the LSM (which is how user policy is
>> > > > > communicated,
>> > > > > > like SELinux)
>> > > > > > indicates it is allowed, not blindly do so for all files and  
>> all
>> > > > > memory.
>> > > > > >
>> > > > > > > > > Putting EAUG into the #PF handler and calling it implicitly is
>> > > > > > > > > just too flaky and hard to make deterministic for e.g. the JIT
>> > > > > > > > > compiler in our use case (not to mention that JIT is not
>> > > > > > > > > possible at all because of the inability to do RX pages).
>> > > > > >
>> > > > > > I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more
>> > > > > deterministic
>> > > > > > but from
>> > > > > > what I understand it would have a performance impact since it
>> > > would
>> > > > > > require all memory
>> > > > > > that may be needed by the enclave be pre-allocated from
>> > > outside the
>> > > > > > enclave and not
>> > > > > > just dynamically allocated from within the enclave at the time
>> > > it is
>> > > > > > needed.
>> > > > > >
>> > > > > > Would such a performance impact be acceptable?
>> > > > > >
>> > > > >
>> > > > > User space won't always have enough info to decide whether the pages
>> > > > > should be EAUG'd immediately. In some cases (shared libraries, JVM for
>> > > > > example) lots
>> > > > > of code/data pages can be mapped but never actually touched. One
>> > > > > enclave/process does not know if any other more important
>> > > > > enclave/process
>> > > > > would need the EPC.
>> > > > >
>> > > > > It should be up to the kernel to make the final decision, as it has
>> > > > > the overall picture of the system's EPC usage and availability.
>> > > >
>> > > > EAUG ioctl does not give better capabilities for user space to  
>> waste
>> > > > EPC given that EADD ioctl already exists, i.e. your argument is
>> > > logically
>> > > > incorrect.
>> > >
>> > > The point of adding EAUG is to allow more efficient use of EPC  
>> pages.
>> > > Without EAUG, enclaves have to EADD everything upfront into EPC,
>> > > consuming
>> > > predetermined number of EPC pages, some of which may not be used at  
>> all.
>> > > With EAUG, enclaves should be able to load minimal pages to get  
>> started,
>> > > pages added on #PF as they are actually accessed.
>> > >
>> > > Obviously as you pointed out, some usages make more sense to
>> > > pre-EAUG (EAUG
>> > > before #PF). But your proposal of supporting only pre-EAUG here
>> > > essentially
>> > > makes EAUG behave almost the same as EADD.  If the current
>> > > implementation
>> > > with EAUG on #PF can also use MAP_POPULATE for pre-EAUG (seems  
>> possible
>> > > based on Dave's comments), then it is flexible enough to cover all cases
>> > > and allows the kernel to optimize allocation of EPC pages.
>> >
>> > There is not even a working #PF-based implementation in existence, and
>> > your argument has too many ifs for my taste.
>>
>> 1) If you mean no user space is implementing this kind of solution, read
>> this section; otherwise, skip to 2) below, which is only a couple of
>> sentences.
>>
>> If you are willing to look, there is already an implementation in our SDK
>> that does heap and stack expansion on demand on #PF. Enclaves may not know
>> the heap/stack size up front, so we implemented these features to make EPC
>> usage more efficient. I don't know why it is fine for normal processes to
>> add RAM on #PF, but enclaves adding EPC on #PF is such an unacceptable
>> concept to you. And the kernel already does that for EPC swapping when a
>> #PF happens on a swapped-out EPC page.
>
> It adds O(n) round-trips for a mmap() emulation, which can be done in
> O(1) round-trips with an ioctl.
>
>> Our implementation has gone through several rounds, the latest is here:
>> https://github.com/intel/linux-sgx/tree/edmm_v2/sdk/emm. It was also
>> implemented in the original OOT-driver-based SDK implementation. Customers
>> are using it and have found it useful. I think this is a critical feature
>> that many other runtimes will also need.
>
> I'm not sure what the common sense argument here is.
>
My (wrong) assumption was that you were disabling EAUG on #PF entirely. All I
was saying is that EAUG on #PF is critical for many usages and that disabling
it requires good justification.

But you are expecting an ioctl call for each #PF for those usages:  
https://lore.kernel.org/linux-sgx/YiK8NEnvgPerEdFB@iki.fi/#t. IIUC, that's  
better than total disabling but less optimal. (I have not checked all call  
sequences in detail to be sure it would work for all our cases)


>> 2)
>> It's OK for you to request additional support for your usage and I  
>> agree it
>> is needed. But IMHO, totally getting rid of EAUG on #PF is bad and
>> unnecessary. Current implementation can be extended to support your  
>> usage.
>> What's the reason  you think MAP_POPULATE won't work for you?
>
> I do not recall taking a stand on MAP_POPULATE.

Thanks for looking into that. Like I said, that should cover all usages.
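
For reference, a minimal user-space sketch of what pre-faulting with
MAP_POPULATE looks like. It uses an anonymous mapping as a stand-in, since an
enclave mapping needs /dev/sgx_enclave and SGX hardware. Whether MAP_POPULATE
on an enclave mapping would actually trigger EAUG for every page is the open
question in this thread and is only assumed here. The helper name
map_prefaulted() is made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes and ask the kernel to populate
 * (pre-fault) the mapping at mmap() time instead of on first access.
 * An anonymous private mapping stands in for the enclave mapping.
 * Returns NULL on failure.
 */
static void *map_prefaulted(size_t len)
{
	void *addr;

	assert(len > 0);
	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
	return addr == MAP_FAILED ? NULL : addr;
}
```

The caller touches the range as usual afterwards; the only difference to a
plain mmap() is that the faults are taken up front in one call.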

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 19/32] x86/sgx: Support adding of pages to an initialized enclave
  2022-02-08  0:45 ` [PATCH V2 19/32] x86/sgx: Support adding of pages to an initialized enclave Reinette Chatre
  2022-02-19 11:57   ` Jarkko Sakkinen
@ 2022-03-07 16:16   ` Jarkko Sakkinen
  1 sibling, 0 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 16:16 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

On Mon, Feb 07, 2022 at 04:45:41PM -0800, Reinette Chatre wrote:
> +	encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL);
> +	if (!encl_page)
> +		return VM_FAULT_OOM;
> +
> +	encl_page->desc = addr;
> +	encl_page->encl = encl;
> +
> +	/*
> +	 * Adding a regular page that is architecturally allowed to only
> +	 * be created with RW permissions.
> +	 * TBD: Interface with user space policy to support max permissions
> +	 * of RWX.
> +	 */
> +	prot = PROT_READ | PROT_WRITE;
> +	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> +	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;

You should use sgx_encl_page_alloc() here and not reinvent the wheel.

I wrote a patch that exports it:

https://lore.kernel.org/linux-sgx/20220306053211.135762-3-jarkko@kernel.org/T/#u

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 21/32] selftests/sgx: Test two different SGX2 EAUG flows
  2022-02-08  0:45 ` [PATCH V2 21/32] selftests/sgx: Test two different SGX2 EAUG flows Reinette Chatre
@ 2022-03-07 16:39   ` Jarkko Sakkinen
  0 siblings, 0 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 16:39 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

On Mon, Feb 07, 2022 at 04:45:43PM -0800, Reinette Chatre wrote:
> +	addr = mmap((void *)self->encl.encl_base + total_size, PAGE_SIZE,
> +		    PROT_READ | PROT_WRITE | PROT_EXEC,
> +		    MAP_SHARED | MAP_FIXED, self->encl.fd, 0);

Maybe add an inline comment stating that this is expected to work because
the range does not contain enclave pages, just as a reminder (I had to xref
sgx_encl_may_map() to check this assumption). Otherwise, fine.
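
To spell out the assumption, here is a simplified user-space model of the
check as I understand it. It is not the kernel code: struct model_page,
model_may_map() and the PROT_R/W/X constants are made up for illustration.
The point is only that slots without an enclave page are skipped, so an RWX
mapping over an empty range passes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PROT_R 0x1UL
#define PROT_W 0x2UL
#define PROT_X 0x4UL

/* Hypothetical stand-in for an enclave page slot; present == 0 is a hole. */
struct model_page {
	int present;
	unsigned long max_prot;
};

/*
 * Simplified model of the mmap()/mprotect() check: every enclave page that
 * exists in [first, first + count) must allow all requested bits. Slots
 * without an enclave page are skipped. The caller keeps the range within
 * the array.
 */
static int model_may_map(const struct model_page *pages, size_t first,
			 size_t count, unsigned long want)
{
	size_t i;

	assert(pages != NULL);
	for (i = first; i < first + count; i++) {
		if (!pages[i].present)
			continue;
		if (~pages[i].max_prot & want)
			return -EACCES;
	}
	return 0;
}
```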

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions
  2022-02-08  0:45 ` [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions Reinette Chatre
@ 2022-03-07 17:10   ` Jarkko Sakkinen
  2022-03-07 17:36     ` Reinette Chatre
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 17:10 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

On Mon, Feb 07, 2022 at 04:45:28PM -0800, Reinette Chatre wrote:
> === Summary ===
> 
> An SGX VMA can only be created if its permissions are the same or
> weaker than the Enclave Page Cache Map (EPCM) permissions. After VMA
> creation this same rule is again enforced by the page fault handler:
> faulted enclave pages are required to have equal or more relaxed
> EPCM permissions than the VMA permissions.
> 
> On SGX1 systems the additional enforcement in the page fault handler
> is redundant and on SGX2 systems it incorrectly prevents access.
> On SGX1 systems it is unnecessary to repeat the enforcement of the
> permission rule. The rule used during original VMA creation will
> ensure that any access attempt will use correct permissions.
> With SGX2 the EPCM permissions of a page can change after VMA
> creation resulting in the VMA permissions potentially being more
> relaxed than the EPCM permissions and the page fault handler
> incorrectly blocking valid access attempts.
> 
> Enable the VMA's pages to remain accessible while ensuring that
> the PTEs are installed to match the EPCM permissions but not be
> more relaxed than the VMA permissions.
> 
> === Full Changelog ===
> 
> An SGX enclave is an area of memory where parts of an application
> can reside. First an enclave is created and loaded (from
> non-enclave memory) with the code and data of an application,
> then user space can map (mmap()) the enclave memory to
> be able to enter the enclave at its defined entry points for
> execution within it.
> 
> The hardware maintains a secure structure, the Enclave Page Cache Map
> (EPCM), that tracks the contents of the enclave. Of interest here is
> its tracking of the enclave page permissions. When a page is loaded
> into the enclave its permissions are specified and recorded in the
> EPCM. In parallel the kernel maintains permissions within the
> page table entries (PTEs) and the rule is that PTE permissions
> are not allowed to be more relaxed than the EPCM permissions.
> 
> A new mapping (mmap()) of enclave memory can only succeed if the
> mapping has the same or weaker permissions than the permissions that
> were vetted during enclave creation. This is enforced by
> sgx_encl_may_map() that is called on the mmap() as well as mprotect()
> paths. This rule remains.
> 
> One feature of SGX2 is to support the modification of EPCM permissions
> after enclave initialization. Enclave pages may thus already be part
> of a VMA at the time their EPCM permissions are changed resulting
> in the VMA's permissions potentially being more relaxed than the EPCM
> permissions.
> 
> Allow permissions of existing VMAs to be more relaxed than EPCM
> permissions in preparation for dynamic EPCM permission changes
> made possible in SGX2.  New VMAs that attempt to have more relaxed
> permissions than EPCM permissions continue to be unsupported.
> 
> Reasons why permissions of existing VMAs are allowed to be more relaxed
> than EPCM permissions instead of dynamically changing VMA permissions
> when EPCM permissions change are:
> 1) Changing VMA permissions involve splitting VMAs which is an
>    operation that can fail. Additionally changing EPCM permissions of
>    a range of pages could also fail on any of the pages involved.
>    Handling these error cases causes problems. For example, if an
>    EPCM permission change fails and the VMA has already been split
>    then it is not possible to undo the VMA split nor possible to
>    undo the EPCM permission changes that did succeed before the
>    failure.
> 2) The kernel has little insight into the user space where EPCM
>    permissions are controlled from. For example, a RW page may
>    be made RO just before it is made RX and splitting the VMAs
>    while the VMAs may change soon is unnecessary.
> 
> Remove the extra permission check called on a page fault
> (vm_operations_struct->fault) or during debugging
> (vm_operations_struct->access) when loading the enclave page from swap
> that ensures that the VMA permissions are not more relaxed than the
> EPCM permissions. Since a VMA could only exist if it passed the
> original permission checks during mmap() and a VMA may indeed
> have more relaxed permissions than the EPCM permissions this extra
> permission check is no longer appropriate.
> 
> With the permission check removed, ensure that PTEs do
> not blindly inherit the VMA permissions but instead the permissions
> that the VMA and EPCM agree on. PTEs for writable pages (from VMA
> and enclave perspective) are installed with the writable bit set,
> reducing the need for this additional flow to the permission mismatch
> cases handled next.
> 
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Changes since V1:
> - Reword commit message (Jarkko).
> - Use "relax" instead of "exceed" when referring to permissions (Dave).
> - Add snippet to Documentation/x86/sgx.rst that highlights the
>   relationship between VMA, EPCM, and PTE permissions on SGX
>   systems (Andy).
> 
>  Documentation/x86/sgx.rst      | 10 +++++++++
>  arch/x86/kernel/cpu/sgx/encl.c | 38 ++++++++++++++++++----------------
>  2 files changed, 30 insertions(+), 18 deletions(-)
> 
> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
> index 89ff924b1480..5659932728a5 100644
> --- a/Documentation/x86/sgx.rst
> +++ b/Documentation/x86/sgx.rst
> @@ -99,6 +99,16 @@ The relationships between the different permission masks are:
>  * PTEs are installed to match the EPCM permissions, but not be more
>    relaxed than the VMA permissions.
>  
> +On systems supporting SGX2 EPCM permissions may change while the
> +enclave page belongs to a VMA without impacting the VMA permissions.
> +This means that a running VMA may appear to allow access to an enclave
> +page that is not allowed by its EPCM permissions. For example, when an
> +enclave page with RW EPCM permissions is mapped by a RW VMA but is
> +subsequently changed to have read-only EPCM permissions. The kernel
> +continues to maintain correct access to the enclave page through the
> +PTE that will ensure that only access allowed by both the VMA
> +and EPCM permissions are permitted.
> +
>  Application interface
>  =====================
>  
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 48afe96ae0f0..b6105d9e7c46 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -91,10 +91,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  }
>  
>  static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> -						unsigned long addr,
> -						unsigned long vm_flags)
> +						unsigned long addr)
>  {
> -	unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
>  	struct sgx_epc_page *epc_page;
>  	struct sgx_encl_page *entry;
>  
> @@ -102,14 +100,6 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
>  	if (!entry)
>  		return ERR_PTR(-EFAULT);
>  
> -	/*
> -	 * Verify that the faulted page has equal or higher build time
> -	 * permissions than the VMA permissions (i.e. the subset of {VM_READ,
> -	 * VM_WRITE, VM_EXECUTE} in vma->vm_flags).
> -	 */
> -	if ((entry->vm_max_prot_bits & vm_prot_bits) != vm_prot_bits)
> -		return ERR_PTR(-EFAULT);
> -
>  	/* Entry successfully located. */
>  	if (entry->epc_page) {
>  		if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
> @@ -138,7 +128,9 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>  {
>  	unsigned long addr = (unsigned long)vmf->address;
>  	struct vm_area_struct *vma = vmf->vma;
> +	unsigned long page_prot_bits;
>  	struct sgx_encl_page *entry;
> +	unsigned long vm_prot_bits;
>  	unsigned long phys_addr;
>  	struct sgx_encl *encl;
>  	vm_fault_t ret;
> @@ -155,7 +147,7 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>  
>  	mutex_lock(&encl->lock);
>  
> -	entry = sgx_encl_load_page(encl, addr, vma->vm_flags);
> +	entry = sgx_encl_load_page(encl, addr);
>  	if (IS_ERR(entry)) {
>  		mutex_unlock(&encl->lock);
  
> @@ -167,7 +159,19 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>  
>  	phys_addr = sgx_get_epc_phys_addr(entry->epc_page);
>  
> -	ret = vmf_insert_pfn(vma, addr, PFN_DOWN(phys_addr));
> +	/*
> +	 * Insert PTE to match the EPCM page permissions ensured to not
> +	 * exceed the VMA permissions.
> +	 */
> +	vm_prot_bits = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
> +	page_prot_bits = entry->vm_max_prot_bits & vm_prot_bits;
> +	/*
> +	 * Add VM_SHARED so that PTE is made writable right away if VMA
> +	 * and EPCM are writable (no COW in SGX).
> +	 */
> +	page_prot_bits |= (vma->vm_flags & VM_SHARED);
> +	ret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
> +				  vm_get_page_prot(page_prot_bits));
>  	if (ret != VM_FAULT_NOPAGE) {
>  		mutex_unlock(&encl->lock);
>  
> @@ -295,15 +299,14 @@ static int sgx_encl_debug_write(struct sgx_encl *encl, struct sgx_encl_page *pag
>   * Load an enclave page to EPC if required, and take encl->lock.
>   */
>  static struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,
> -						   unsigned long addr,
> -						   unsigned long vm_flags)
> +						   unsigned long addr)
>  {
>  	struct sgx_encl_page *entry;
>  
>  	for ( ; ; ) {
>  		mutex_lock(&encl->lock);
>  
> -		entry = sgx_encl_load_page(encl, addr, vm_flags);
> +		entry = sgx_encl_load_page(encl, addr);
>  		if (PTR_ERR(entry) != -EBUSY)
>  			break;
>  
> @@ -339,8 +342,7 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
>  		return -EFAULT;
>  
>  	for (i = 0; i < len; i += cnt) {
> -		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK,
> -					      vma->vm_flags);
> +		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK);
>  		if (IS_ERR(entry)) {
>  			ret = PTR_ERR(entry);
>  			break;
> -- 
> 2.25.1
> 

If you unconditionally set vm_max_prot_bits to RWX for dynamically created
pages, you would not need to do this.

These patches could then be safely dropped:

- [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions 
- [PATCH V2 08/32] x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic permission changes
- [PATCH V2 15/32] x86/sgx: Support relaxing of enclave page permissions

And that would also keep full ABI compatibility without exceptions to the
existing mainline code.
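
To illustrate, a minimal user-space model of the two checks (not kernel code;
the M_* constants and function names are made up for this sketch). The first
function mirrors the fault-time check that the patch removes, the second the
PTE permission calculation that the patch adds. With max permissions of RWX
the old check can never fail.

```c
#include <assert.h>
#include <stdbool.h>

#define M_READ  0x1UL
#define M_WRITE 0x2UL
#define M_EXEC  0x4UL
#define M_RWX   (M_READ | M_WRITE | M_EXEC)

/* Existing fault-time check: the page's max permissions must cover the VMA's. */
static bool old_fault_check(unsigned long max_prot, unsigned long vma_prot)
{
	assert(!(max_prot & ~M_RWX) && !(vma_prot & ~M_RWX));
	return (max_prot & vma_prot) == vma_prot;
}

/* Behaviour added by this patch: the PTE gets only what both masks agree on. */
static unsigned long pte_prot(unsigned long max_prot, unsigned long vma_prot)
{
	return max_prot & vma_prot & M_RWX;
}
```

A page with RW max permissions under an RWX VMA fails the old check and gets
an RW PTE with the new calculation. A page with RWX max permissions passes
the old check for any VMA.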

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions
  2022-03-07 17:10   ` Jarkko Sakkinen
@ 2022-03-07 17:36     ` Reinette Chatre
  2022-03-08  8:14       ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-03-07 17:36 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

Hi Jarkko,

On 3/7/2022 9:10 AM, Jarkko Sakkinen wrote:
> On Mon, Feb 07, 2022 at 04:45:28PM -0800, Reinette Chatre wrote:
>> === Summary ===
>>
>> An SGX VMA can only be created if its permissions are the same or
>> weaker than the Enclave Page Cache Map (EPCM) permissions. After VMA
>> creation this same rule is again enforced by the page fault handler:
>> faulted enclave pages are required to have equal or more relaxed
>> EPCM permissions than the VMA permissions.
>>
>> On SGX1 systems the additional enforcement in the page fault handler
>> is redundant and on SGX2 systems it incorrectly prevents access.
>> On SGX1 systems it is unnecessary to repeat the enforcement of the
>> permission rule. The rule used during original VMA creation will
>> ensure that any access attempt will use correct permissions.
>> With SGX2 the EPCM permissions of a page can change after VMA
>> creation resulting in the VMA permissions potentially being more
>> relaxed than the EPCM permissions and the page fault handler
>> incorrectly blocking valid access attempts.
>>
>> Enable the VMA's pages to remain accessible while ensuring that
>> the PTEs are installed to match the EPCM permissions but not be
>> more relaxed than the VMA permissions.
>>
>> === Full Changelog ===
>>
>> An SGX enclave is an area of memory where parts of an application
>> can reside. First an enclave is created and loaded (from
>> non-enclave memory) with the code and data of an application,
>> then user space can map (mmap()) the enclave memory to
>> be able to enter the enclave at its defined entry points for
>> execution within it.
>>
>> The hardware maintains a secure structure, the Enclave Page Cache Map
>> (EPCM), that tracks the contents of the enclave. Of interest here is
>> its tracking of the enclave page permissions. When a page is loaded
>> into the enclave its permissions are specified and recorded in the
>> EPCM. In parallel the kernel maintains permissions within the
>> page table entries (PTEs) and the rule is that PTE permissions
>> are not allowed to be more relaxed than the EPCM permissions.
>>
>> A new mapping (mmap()) of enclave memory can only succeed if the
>> mapping has the same or weaker permissions than the permissions that
>> were vetted during enclave creation. This is enforced by
>> sgx_encl_may_map() that is called on the mmap() as well as mprotect()
>> paths. This rule remains.
>>
>> One feature of SGX2 is to support the modification of EPCM permissions
>> after enclave initialization. Enclave pages may thus already be part
>> of a VMA at the time their EPCM permissions are changed resulting
>> in the VMA's permissions potentially being more relaxed than the EPCM
>> permissions.
>>
>> Allow permissions of existing VMAs to be more relaxed than EPCM
>> permissions in preparation for dynamic EPCM permission changes
>> made possible in SGX2.  New VMAs that attempt to have more relaxed
>> permissions than EPCM permissions continue to be unsupported.
>>
>> Reasons why permissions of existing VMAs are allowed to be more relaxed
>> than EPCM permissions instead of dynamically changing VMA permissions
>> when EPCM permissions change are:
>> 1) Changing VMA permissions involve splitting VMAs which is an
>>    operation that can fail. Additionally changing EPCM permissions of
>>    a range of pages could also fail on any of the pages involved.
>>    Handling these error cases causes problems. For example, if an
>>    EPCM permission change fails and the VMA has already been split
>>    then it is not possible to undo the VMA split nor possible to
>>    undo the EPCM permission changes that did succeed before the
>>    failure.
>> 2) The kernel has little insight into the user space where EPCM
>>    permissions are controlled from. For example, a RW page may
>>    be made RO just before it is made RX and splitting the VMAs
>>    while the VMAs may change soon is unnecessary.
>>
>> Remove the extra permission check called on a page fault
>> (vm_operations_struct->fault) or during debugging
>> (vm_operations_struct->access) when loading the enclave page from swap
>> that ensures that the VMA permissions are not more relaxed than the
>> EPCM permissions. Since a VMA could only exist if it passed the
>> original permission checks during mmap() and a VMA may indeed
>> have more relaxed permissions than the EPCM permissions this extra
>> permission check is no longer appropriate.
>>
>> With the permission check removed, ensure that PTEs do
>> not blindly inherit the VMA permissions but instead the permissions
>> that the VMA and EPCM agree on. PTEs for writable pages (from VMA
>> and enclave perspective) are installed with the writable bit set,
>> reducing the need for this additional flow to the permission mismatch
>> cases handled next.
>>
>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>> ---
>> Changes since V1:
>> - Reword commit message (Jarkko).
>> - Use "relax" instead of "exceed" when referring to permissions (Dave).
>> - Add snippet to Documentation/x86/sgx.rst that highlights the
>>   relationship between VMA, EPCM, and PTE permissions on SGX
>>   systems (Andy).
>>
>>  Documentation/x86/sgx.rst      | 10 +++++++++
>>  arch/x86/kernel/cpu/sgx/encl.c | 38 ++++++++++++++++++----------------
>>  2 files changed, 30 insertions(+), 18 deletions(-)
>>
>> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
>> index 89ff924b1480..5659932728a5 100644
>> --- a/Documentation/x86/sgx.rst
>> +++ b/Documentation/x86/sgx.rst
>> @@ -99,6 +99,16 @@ The relationships between the different permission masks are:
>>  * PTEs are installed to match the EPCM permissions, but not be more
>>    relaxed than the VMA permissions.
>>  
>> +On systems supporting SGX2 EPCM permissions may change while the
>> +enclave page belongs to a VMA without impacting the VMA permissions.
>> +This means that a running VMA may appear to allow access to an enclave
>> +page that is not allowed by its EPCM permissions. For example, when an
>> +enclave page with RW EPCM permissions is mapped by a RW VMA but is
>> +subsequently changed to have read-only EPCM permissions. The kernel
>> +continues to maintain correct access to the enclave page through the
>> +PTE that will ensure that only access allowed by both the VMA
>> +and EPCM permissions are permitted.
>> +
>>  Application interface
>>  =====================
>>  
>> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
>> index 48afe96ae0f0..b6105d9e7c46 100644
>> --- a/arch/x86/kernel/cpu/sgx/encl.c
>> +++ b/arch/x86/kernel/cpu/sgx/encl.c
>> @@ -91,10 +91,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
>>  }
>>  
>>  static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
>> -						unsigned long addr,
>> -						unsigned long vm_flags)
>> +						unsigned long addr)
>>  {
>> -	unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
>>  	struct sgx_epc_page *epc_page;
>>  	struct sgx_encl_page *entry;
>>  
>> @@ -102,14 +100,6 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
>>  	if (!entry)
>>  		return ERR_PTR(-EFAULT);
>>  
>> -	/*
>> -	 * Verify that the faulted page has equal or higher build time
>> -	 * permissions than the VMA permissions (i.e. the subset of {VM_READ,
>> -	 * VM_WRITE, VM_EXECUTE} in vma->vm_flags).
>> -	 */
>> -	if ((entry->vm_max_prot_bits & vm_prot_bits) != vm_prot_bits)
>> -		return ERR_PTR(-EFAULT);
>> -
>>  	/* Entry successfully located. */
>>  	if (entry->epc_page) {
>>  		if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
>> @@ -138,7 +128,9 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>>  {
>>  	unsigned long addr = (unsigned long)vmf->address;
>>  	struct vm_area_struct *vma = vmf->vma;
>> +	unsigned long page_prot_bits;
>>  	struct sgx_encl_page *entry;
>> +	unsigned long vm_prot_bits;
>>  	unsigned long phys_addr;
>>  	struct sgx_encl *encl;
>>  	vm_fault_t ret;
>> @@ -155,7 +147,7 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>>  
>>  	mutex_lock(&encl->lock);
>>  
>> -	entry = sgx_encl_load_page(encl, addr, vma->vm_flags);
>> +	entry = sgx_encl_load_page(encl, addr);
>>  	if (IS_ERR(entry)) {
>>  		mutex_unlock(&encl->lock);
>   
>> @@ -167,7 +159,19 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>>  
>>  	phys_addr = sgx_get_epc_phys_addr(entry->epc_page);
>>  
>> -	ret = vmf_insert_pfn(vma, addr, PFN_DOWN(phys_addr));
>> +	/*
>> +	 * Insert PTE to match the EPCM page permissions ensured to not
>> +	 * exceed the VMA permissions.
>> +	 */
>> +	vm_prot_bits = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
>> +	page_prot_bits = entry->vm_max_prot_bits & vm_prot_bits;
>> +	/*
>> +	 * Add VM_SHARED so that PTE is made writable right away if VMA
>> +	 * and EPCM are writable (no COW in SGX).
>> +	 */
>> +	page_prot_bits |= (vma->vm_flags & VM_SHARED);
>> +	ret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
>> +				  vm_get_page_prot(page_prot_bits));
>>  	if (ret != VM_FAULT_NOPAGE) {
>>  		mutex_unlock(&encl->lock);
>>  
>> @@ -295,15 +299,14 @@ static int sgx_encl_debug_write(struct sgx_encl *encl, struct sgx_encl_page *pag
>>   * Load an enclave page to EPC if required, and take encl->lock.
>>   */
>>  static struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,
>> -						   unsigned long addr,
>> -						   unsigned long vm_flags)
>> +						   unsigned long addr)
>>  {
>>  	struct sgx_encl_page *entry;
>>  
>>  	for ( ; ; ) {
>>  		mutex_lock(&encl->lock);
>>  
>> -		entry = sgx_encl_load_page(encl, addr, vm_flags);
>> +		entry = sgx_encl_load_page(encl, addr);
>>  		if (PTR_ERR(entry) != -EBUSY)
>>  			break;
>>  
>> @@ -339,8 +342,7 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
>>  		return -EFAULT;
>>  
>>  	for (i = 0; i < len; i += cnt) {
>> -		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK,
>> -					      vma->vm_flags);
>> +		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK);
>>  		if (IS_ERR(entry)) {
>>  			ret = PTR_ERR(entry);
>>  			break;
>> -- 
>> 2.25.1
>>
> 
> If you unconditionally set vm_max_prot_bits to RWX for dynamically created
> pages, you would not need to do this.
> 
> These patches could then be safely dropped:
> 
> - [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions 
> - [PATCH V2 08/32] x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic permission changes
> - [PATCH V2 15/32] x86/sgx: Support relaxing of enclave page permissions
> 
> And that would also keep full ABI compatibility without exceptions to the
> existing mainline code.
> 

Dropping these patches does not just impact dynamically created pages. Dropping
them would result in EPCM page permission restriction being supported for all
pages, those added before enclave initialization as well as dynamically added
pages, but their PTEs would not be updated to match.

For example, if a RW enclave page is added via SGX_IOC_ENCLAVE_ADD_PAGES and
then later made read-only via SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS then Linux
would keep allowing and installing RW PTEs to this page.
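
As a rough illustration of that scenario, here is a user-space model only, not
kernel code. The DEMO_* flag values and the helper names are made up for this
sketch; only the "VMA bits AND page bits" computation mirrors the fault handler
change in the patch quoted above.

```c
#include <stdbool.h>

/* Illustrative stand-ins for VM_READ/VM_WRITE/VM_EXEC (values are assumptions). */
#define DEMO_READ  0x1UL
#define DEMO_WRITE 0x2UL
#define DEMO_EXEC  0x4UL
#define DEMO_RWX   (DEMO_READ | DEMO_WRITE | DEMO_EXEC)

/*
 * PTE permissions as the patched fault handler computes them: the
 * intersection of the VMA permissions and the permissions the kernel
 * has recorded for the enclave page.
 */
static unsigned long demo_pte_prot_bits(unsigned long vma_prot,
					unsigned long page_prot)
{
	return vma_prot & page_prot & DEMO_RWX;
}

/* True if the PTE would allow an access that the EPCM does not. */
static bool demo_pte_exceeds_epcm(unsigned long pte_prot,
				  unsigned long epcm_prot)
{
	return (pte_prot & ~epcm_prot & DEMO_RWX) != 0;
}
```

With a RW VMA and a page restricted to read-only in the EPCM, passing the stale
RW record as page_prot yields a RW PTE that exceeds the EPCM, while passing the
updated read-only record yields a read-only PTE that does not.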

Allowing this goes against something explicitly disallowed from the beginning
of SGX as per:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/x86/sgx.rst#n74

"EPCM permissions are separate from the normal page tables.  This prevents the
kernel from, for instance, allowing writes to data which an enclave wishes to
remain read-only."

Reinette


* Re: [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions
  2022-03-07 17:36     ` Reinette Chatre
@ 2022-03-08  8:14       ` Jarkko Sakkinen
  2022-03-08  9:06         ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-08  8:14 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

On Mon, Mar 07, 2022 at 09:36:36AM -0800, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 3/7/2022 9:10 AM, Jarkko Sakkinen wrote:
> > On Mon, Feb 07, 2022 at 04:45:28PM -0800, Reinette Chatre wrote:
> >> === Summary ===
> >>
> >> An SGX VMA can only be created if its permissions are the same or
> >> weaker than the Enclave Page Cache Map (EPCM) permissions. After VMA
> >> creation this same rule is again enforced by the page fault handler:
> >> faulted enclave pages are required to have equal or more relaxed
> >> EPCM permissions than the VMA permissions.
> >>
> >> On SGX1 systems the additional enforcement in the page fault handler
> >> is redundant and on SGX2 systems it incorrectly prevents access.
> >> On SGX1 systems it is unnecessary to repeat the enforcement of the
> >> permission rule. The rule used during original VMA creation will
> >> ensure that any access attempt will use correct permissions.
> >> With SGX2 the EPCM permissions of a page can change after VMA
> >> creation resulting in the VMA permissions potentially being more
> >> relaxed than the EPCM permissions and the page fault handler
> >> incorrectly blocking valid access attempts.
> >>
> >> Enable the VMA's pages to remain accessible while ensuring that
> >> the PTEs are installed to match the EPCM permissions but not be
> >> more relaxed than the VMA permissions.
> >>
> >> === Full Changelog ===
> >>
> >> An SGX enclave is an area of memory where parts of an application
> >> can reside. First an enclave is created and loaded (from
> >> non-enclave memory) with the code and data of an application,
> >> then user space can map (mmap()) the enclave memory to
> >> be able to enter the enclave at its defined entry points for
> >> execution within it.
> >>
> >> The hardware maintains a secure structure, the Enclave Page Cache Map
> >> (EPCM), that tracks the contents of the enclave. Of interest here is
> >> its tracking of the enclave page permissions. When a page is loaded
> >> into the enclave its permissions are specified and recorded in the
> >> EPCM. In parallel the kernel maintains permissions within the
> >> page table entries (PTEs) and the rule is that PTE permissions
> >> are not allowed to be more relaxed than the EPCM permissions.
> >>
> >> A new mapping (mmap()) of enclave memory can only succeed if the
> >> mapping has the same or weaker permissions than the permissions that
> >> were vetted during enclave creation. This is enforced by
> >> sgx_encl_may_map() that is called on the mmap() as well as mprotect()
> >> paths. This rule remains.
> >>
> >> One feature of SGX2 is to support the modification of EPCM permissions
> >> after enclave initialization. Enclave pages may thus already be part
> >> of a VMA at the time their EPCM permissions are changed resulting
> >> in the VMA's permissions potentially being more relaxed than the EPCM
> >> permissions.
> >>
> >> Allow permissions of existing VMAs to be more relaxed than EPCM
> >> permissions in preparation for dynamic EPCM permission changes
> >> made possible in SGX2.  New VMAs that attempt to have more relaxed
> >> permissions than EPCM permissions continue to be unsupported.
> >>
> >> Reasons why permissions of existing VMAs are allowed to be more relaxed
> >> than EPCM permissions instead of dynamically changing VMA permissions
> >> when EPCM permissions change are:
> >> 1) Changing VMA permissions involve splitting VMAs which is an
> >>    operation that can fail. Additionally changing EPCM permissions of
> >>    a range of pages could also fail on any of the pages involved.
> >>    Handling these error cases causes problems. For example, if an
> >>    EPCM permission change fails and the VMA has already been split
> >>    then it is not possible to undo the VMA split nor possible to
> >>    undo the EPCM permission changes that did succeed before the
> >>    failure.
> >> 2) The kernel has little insight into the user space where EPCM
> >>    permissions are controlled from. For example, a RW page may
> >>    be made RO just before it is made RX and splitting the VMAs
> >>    while the VMAs may change soon is unnecessary.
> >>
> >> Remove the extra permission check called on a page fault
> >> (vm_operations_struct->fault) or during debugging
> >> (vm_operations_struct->access) when loading the enclave page from swap
> >> that ensures that the VMA permissions are not more relaxed than the
> >> EPCM permissions. Since a VMA could only exist if it passed the
> >> original permission checks during mmap() and a VMA may indeed
> >> have more relaxed permissions than the EPCM permissions this extra
> >> permission check is no longer appropriate.
> >>
> >> With the permission check removed, ensure that PTEs do
> >> not blindly inherit the VMA permissions but instead the permissions
> >> that the VMA and EPCM agree on. PTEs for writable pages (from VMA
> >> and enclave perspective) are installed with the writable bit set,
> >> reducing the need for this additional flow to the permission mismatch
> >> cases handled next.
> >>
> >> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> >> ---
> >> Changes since V1:
> >> - Reword commit message (Jarkko).
> >> - Use "relax" instead of "exceed" when referring to permissions (Dave).
> >> - Add snippet to Documentation/x86/sgx.rst that highlights the
> >>   relationship between VMA, EPCM, and PTE permissions on SGX
> >>   systems (Andy).
> >>
> >>  Documentation/x86/sgx.rst      | 10 +++++++++
> >>  arch/x86/kernel/cpu/sgx/encl.c | 38 ++++++++++++++++++----------------
> >>  2 files changed, 30 insertions(+), 18 deletions(-)
> >>
> >> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
> >> index 89ff924b1480..5659932728a5 100644
> >> --- a/Documentation/x86/sgx.rst
> >> +++ b/Documentation/x86/sgx.rst
> >> @@ -99,6 +99,16 @@ The relationships between the different permission masks are:
> >>  * PTEs are installed to match the EPCM permissions, but not be more
> >>    relaxed than the VMA permissions.
> >>  
> >> +On systems supporting SGX2 EPCM permissions may change while the
> >> +enclave page belongs to a VMA without impacting the VMA permissions.
> >> +This means that a running VMA may appear to allow access to an enclave
> >> +page that is not allowed by its EPCM permissions. For example, when an
> >> +enclave page with RW EPCM permissions is mapped by a RW VMA but is
> >> +subsequently changed to have read-only EPCM permissions. The kernel
> >> +continues to maintain correct access to the enclave page through the
> >> +PTE that will ensure that only access allowed by both the VMA
> >> +and EPCM permissions are permitted.
> >> +
> >>  Application interface
> >>  =====================
> >>  
> >> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> >> index 48afe96ae0f0..b6105d9e7c46 100644
> >> --- a/arch/x86/kernel/cpu/sgx/encl.c
> >> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> >> @@ -91,10 +91,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
> >>  }
> >>  
> >>  static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> >> -						unsigned long addr,
> >> -						unsigned long vm_flags)
> >> +						unsigned long addr)
> >>  {
> >> -	unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
> >>  	struct sgx_epc_page *epc_page;
> >>  	struct sgx_encl_page *entry;
> >>  
> >> @@ -102,14 +100,6 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> >>  	if (!entry)
> >>  		return ERR_PTR(-EFAULT);
> >>  
> >> -	/*
> >> -	 * Verify that the faulted page has equal or higher build time
> >> -	 * permissions than the VMA permissions (i.e. the subset of {VM_READ,
> >> -	 * VM_WRITE, VM_EXECUTE} in vma->vm_flags).
> >> -	 */
> >> -	if ((entry->vm_max_prot_bits & vm_prot_bits) != vm_prot_bits)
> >> -		return ERR_PTR(-EFAULT);
> >> -
> >>  	/* Entry successfully located. */
> >>  	if (entry->epc_page) {
> >>  		if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
> >> @@ -138,7 +128,9 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> >>  {
> >>  	unsigned long addr = (unsigned long)vmf->address;
> >>  	struct vm_area_struct *vma = vmf->vma;
> >> +	unsigned long page_prot_bits;
> >>  	struct sgx_encl_page *entry;
> >> +	unsigned long vm_prot_bits;
> >>  	unsigned long phys_addr;
> >>  	struct sgx_encl *encl;
> >>  	vm_fault_t ret;
> >> @@ -155,7 +147,7 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> >>  
> >>  	mutex_lock(&encl->lock);
> >>  
> >> -	entry = sgx_encl_load_page(encl, addr, vma->vm_flags);
> >> +	entry = sgx_encl_load_page(encl, addr);
> >>  	if (IS_ERR(entry)) {
> >>  		mutex_unlock(&encl->lock);
> >   
> >> @@ -167,7 +159,19 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> >>  
> >>  	phys_addr = sgx_get_epc_phys_addr(entry->epc_page);
> >>  
> >> -	ret = vmf_insert_pfn(vma, addr, PFN_DOWN(phys_addr));
> >> +	/*
> >> +	 * Insert PTE to match the EPCM page permissions ensured to not
> >> +	 * exceed the VMA permissions.
> >> +	 */
> >> +	vm_prot_bits = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
> >> +	page_prot_bits = entry->vm_max_prot_bits & vm_prot_bits;
> >> +	/*
> >> +	 * Add VM_SHARED so that PTE is made writable right away if VMA
> >> +	 * and EPCM are writable (no COW in SGX).
> >> +	 */
> >> +	page_prot_bits |= (vma->vm_flags & VM_SHARED);
> >> +	ret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
> >> +				  vm_get_page_prot(page_prot_bits));
> >>  	if (ret != VM_FAULT_NOPAGE) {
> >>  		mutex_unlock(&encl->lock);
> >>  
> >> @@ -295,15 +299,14 @@ static int sgx_encl_debug_write(struct sgx_encl *encl, struct sgx_encl_page *pag
> >>   * Load an enclave page to EPC if required, and take encl->lock.
> >>   */
> >>  static struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,
> >> -						   unsigned long addr,
> >> -						   unsigned long vm_flags)
> >> +						   unsigned long addr)
> >>  {
> >>  	struct sgx_encl_page *entry;
> >>  
> >>  	for ( ; ; ) {
> >>  		mutex_lock(&encl->lock);
> >>  
> >> -		entry = sgx_encl_load_page(encl, addr, vm_flags);
> >> +		entry = sgx_encl_load_page(encl, addr);
> >>  		if (PTR_ERR(entry) != -EBUSY)
> >>  			break;
> >>  
> >> @@ -339,8 +342,7 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
> >>  		return -EFAULT;
> >>  
> >>  	for (i = 0; i < len; i += cnt) {
> >> -		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK,
> >> -					      vma->vm_flags);
> >> +		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK);
> >>  		if (IS_ERR(entry)) {
> >>  			ret = PTR_ERR(entry);
> >>  			break;
> >> -- 
> >> 2.25.1
> >>
> > 
> > If you unconditionally set vm_max_prot_bits to RWX for dynamically created
> > pages, you would not need to do this.
> > 
> > These patches could then be safely dropped:
> > 
> > - [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions 
> > - [PATCH V2 08/32] x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic permission changes
> > - [PATCH V2 15/32] x86/sgx: Support relaxing of enclave page permissions
> > 
> > And that would also keep full ABI compatibility without exceptions to the
> > existing mainline code.
> > 
> 
> Dropping these patches does not just impact dynamically created pages. Dropping
> them would result in EPCM page permission restriction being supported for all
> pages, those added before enclave initialization as well as dynamically added
> pages, but their PTEs would not be updated to match.
> 
> For example, if a RW enclave page is added via SGX_IOC_ENCLAVE_ADD_PAGES and
> then later made read-only via SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS then Linux
> would keep allowing and installing RW PTEs to this page.

I think that would be perfectly fine, if someone wants to do that. There is
no collateral damage from doing that. The kernel does not get messed up because
of that. It's a use case that does not make sense in the first place, so it'd
be stupid to build anything extensive around it in the kernel.

Shooting yourself in the foot is something that the kernel does not, and should
not, protect user space from unless there is a risk of messing up the state of
the kernel itself.

Much worse is that we have e.g. a completely artificial ioctl,
SGX_IOC_ENCLAVE_RELAX_PERMISSIONS, to support this scheme, which could
cause extra round trips for a simple EMODPE.

Also, this means not having to include 06/32, which keeps 100% backwards
compatibility with mainline in run-time behaviour while not restricting
dynamically created pages at all. And we get rid of the complex bookkeeping
of vm_run_prot_bits.

And generally the whole model is then very easy to understand and explain.
If I had to give a presentation of the current mess in the patch set at a
conference, I can honestly say that I would be in serious trouble. It's
not a clean and clear security model, which is a risk by itself.

BR, Jarkko

* Re: [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions
  2022-03-08  8:14       ` Jarkko Sakkinen
@ 2022-03-08  9:06         ` Jarkko Sakkinen
  2022-03-08  9:12           ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-08  9:06 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

On Tue, Mar 08, 2022 at 10:14:42AM +0200, Jarkko Sakkinen wrote:
> On Mon, Mar 07, 2022 at 09:36:36AM -0800, Reinette Chatre wrote:
> > Hi Jarkko,
> > 
> > On 3/7/2022 9:10 AM, Jarkko Sakkinen wrote:
> > > On Mon, Feb 07, 2022 at 04:45:28PM -0800, Reinette Chatre wrote:
> > >> === Summary ===
> > >>
> > >> An SGX VMA can only be created if its permissions are the same or
> > >> weaker than the Enclave Page Cache Map (EPCM) permissions. After VMA
> > >> creation this same rule is again enforced by the page fault handler:
> > >> faulted enclave pages are required to have equal or more relaxed
> > >> EPCM permissions than the VMA permissions.
> > >>
> > >> On SGX1 systems the additional enforcement in the page fault handler
> > >> is redundant and on SGX2 systems it incorrectly prevents access.
> > >> On SGX1 systems it is unnecessary to repeat the enforcement of the
> > >> permission rule. The rule used during original VMA creation will
> > >> ensure that any access attempt will use correct permissions.
> > >> With SGX2 the EPCM permissions of a page can change after VMA
> > >> creation resulting in the VMA permissions potentially being more
> > >> relaxed than the EPCM permissions and the page fault handler
> > >> incorrectly blocking valid access attempts.
> > >>
> > >> Enable the VMA's pages to remain accessible while ensuring that
> > >> the PTEs are installed to match the EPCM permissions but not be
> > >> more relaxed than the VMA permissions.
> > >>
> > >> === Full Changelog ===
> > >>
> > >> An SGX enclave is an area of memory where parts of an application
> > >> can reside. First an enclave is created and loaded (from
> > >> non-enclave memory) with the code and data of an application,
> > >> then user space can map (mmap()) the enclave memory to
> > >> be able to enter the enclave at its defined entry points for
> > >> execution within it.
> > >>
> > >> The hardware maintains a secure structure, the Enclave Page Cache Map
> > >> (EPCM), that tracks the contents of the enclave. Of interest here is
> > >> its tracking of the enclave page permissions. When a page is loaded
> > >> into the enclave its permissions are specified and recorded in the
> > >> EPCM. In parallel the kernel maintains permissions within the
> > >> page table entries (PTEs) and the rule is that PTE permissions
> > >> are not allowed to be more relaxed than the EPCM permissions.
> > >>
> > >> A new mapping (mmap()) of enclave memory can only succeed if the
> > >> mapping has the same or weaker permissions than the permissions that
> > >> were vetted during enclave creation. This is enforced by
> > >> sgx_encl_may_map() that is called on the mmap() as well as mprotect()
> > >> paths. This rule remains.
> > >>
> > >> One feature of SGX2 is to support the modification of EPCM permissions
> > >> after enclave initialization. Enclave pages may thus already be part
> > >> of a VMA at the time their EPCM permissions are changed resulting
> > >> in the VMA's permissions potentially being more relaxed than the EPCM
> > >> permissions.
> > >>
> > >> Allow permissions of existing VMAs to be more relaxed than EPCM
> > >> permissions in preparation for dynamic EPCM permission changes
> > >> made possible in SGX2.  New VMAs that attempt to have more relaxed
> > >> permissions than EPCM permissions continue to be unsupported.
> > >>
> > >> Reasons why permissions of existing VMAs are allowed to be more relaxed
> > >> than EPCM permissions instead of dynamically changing VMA permissions
> > >> when EPCM permissions change are:
> > >> 1) Changing VMA permissions involve splitting VMAs which is an
> > >>    operation that can fail. Additionally changing EPCM permissions of
> > >>    a range of pages could also fail on any of the pages involved.
> > >>    Handling these error cases causes problems. For example, if an
> > >>    EPCM permission change fails and the VMA has already been split
> > >>    then it is not possible to undo the VMA split nor possible to
> > >>    undo the EPCM permission changes that did succeed before the
> > >>    failure.
> > >> 2) The kernel has little insight into the user space where EPCM
> > >>    permissions are controlled from. For example, a RW page may
> > >>    be made RO just before it is made RX and splitting the VMAs
> > >>    while the VMAs may change soon is unnecessary.
> > >>
> > >> Remove the extra permission check called on a page fault
> > >> (vm_operations_struct->fault) or during debugging
> > >> (vm_operations_struct->access) when loading the enclave page from swap
> > >> that ensures that the VMA permissions are not more relaxed than the
> > >> EPCM permissions. Since a VMA could only exist if it passed the
> > >> original permission checks during mmap() and a VMA may indeed
> > >> have more relaxed permissions than the EPCM permissions this extra
> > >> permission check is no longer appropriate.
> > >>
> > >> With the permission check removed, ensure that PTEs do
> > >> not blindly inherit the VMA permissions but instead the permissions
> > >> that the VMA and EPCM agree on. PTEs for writable pages (from VMA
> > >> and enclave perspective) are installed with the writable bit set,
> > >> reducing the need for this additional flow to the permission mismatch
> > >> cases handled next.
> > >>
> > >> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> > >> ---
> > >> Changes since V1:
> > >> - Reword commit message (Jarkko).
> > >> - Use "relax" instead of "exceed" when referring to permissions (Dave).
> > >> - Add snippet to Documentation/x86/sgx.rst that highlights the
> > >>   relationship between VMA, EPCM, and PTE permissions on SGX
> > >>   systems (Andy).
> > >>
> > >>  Documentation/x86/sgx.rst      | 10 +++++++++
> > >>  arch/x86/kernel/cpu/sgx/encl.c | 38 ++++++++++++++++++----------------
> > >>  2 files changed, 30 insertions(+), 18 deletions(-)
> > >>
> > >> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
> > >> index 89ff924b1480..5659932728a5 100644
> > >> --- a/Documentation/x86/sgx.rst
> > >> +++ b/Documentation/x86/sgx.rst
> > >> @@ -99,6 +99,16 @@ The relationships between the different permission masks are:
> > >>  * PTEs are installed to match the EPCM permissions, but not be more
> > >>    relaxed than the VMA permissions.
> > >>  
> > >> +On systems supporting SGX2 EPCM permissions may change while the
> > >> +enclave page belongs to a VMA without impacting the VMA permissions.
> > >> +This means that a running VMA may appear to allow access to an enclave
> > >> +page that is not allowed by its EPCM permissions. For example, when an
> > >> +enclave page with RW EPCM permissions is mapped by a RW VMA but is
> > >> +subsequently changed to have read-only EPCM permissions. The kernel
> > >> +continues to maintain correct access to the enclave page through the
> > >> +PTE that will ensure that only access allowed by both the VMA
> > >> +and EPCM permissions are permitted.
> > >> +
> > >>  Application interface
> > >>  =====================
> > >>  
> > >> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > >> index 48afe96ae0f0..b6105d9e7c46 100644
> > >> --- a/arch/x86/kernel/cpu/sgx/encl.c
> > >> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > >> @@ -91,10 +91,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
> > >>  }
> > >>  
> > >>  static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > >> -						unsigned long addr,
> > >> -						unsigned long vm_flags)
> > >> +						unsigned long addr)
> > >>  {
> > >> -	unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
> > >>  	struct sgx_epc_page *epc_page;
> > >>  	struct sgx_encl_page *entry;
> > >>  
> > >> @@ -102,14 +100,6 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > >>  	if (!entry)
> > >>  		return ERR_PTR(-EFAULT);
> > >>  
> > >> -	/*
> > >> -	 * Verify that the faulted page has equal or higher build time
> > >> -	 * permissions than the VMA permissions (i.e. the subset of {VM_READ,
> > >> -	 * VM_WRITE, VM_EXECUTE} in vma->vm_flags).
> > >> -	 */
> > >> -	if ((entry->vm_max_prot_bits & vm_prot_bits) != vm_prot_bits)
> > >> -		return ERR_PTR(-EFAULT);
> > >> -
> > >>  	/* Entry successfully located. */
> > >>  	if (entry->epc_page) {
> > >>  		if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
> > >> @@ -138,7 +128,9 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> > >>  {
> > >>  	unsigned long addr = (unsigned long)vmf->address;
> > >>  	struct vm_area_struct *vma = vmf->vma;
> > >> +	unsigned long page_prot_bits;
> > >>  	struct sgx_encl_page *entry;
> > >> +	unsigned long vm_prot_bits;
> > >>  	unsigned long phys_addr;
> > >>  	struct sgx_encl *encl;
> > >>  	vm_fault_t ret;
> > >> @@ -155,7 +147,7 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> > >>  
> > >>  	mutex_lock(&encl->lock);
> > >>  
> > >> -	entry = sgx_encl_load_page(encl, addr, vma->vm_flags);
> > >> +	entry = sgx_encl_load_page(encl, addr);
> > >>  	if (IS_ERR(entry)) {
> > >>  		mutex_unlock(&encl->lock);
> > >   
> > >> @@ -167,7 +159,19 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> > >>  
> > >>  	phys_addr = sgx_get_epc_phys_addr(entry->epc_page);
> > >>  
> > >> -	ret = vmf_insert_pfn(vma, addr, PFN_DOWN(phys_addr));
> > >> +	/*
> > >> +	 * Insert PTE to match the EPCM page permissions ensured to not
> > >> +	 * exceed the VMA permissions.
> > >> +	 */
> > >> +	vm_prot_bits = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
> > >> +	page_prot_bits = entry->vm_max_prot_bits & vm_prot_bits;
> > >> +	/*
> > >> +	 * Add VM_SHARED so that PTE is made writable right away if VMA
> > >> +	 * and EPCM are writable (no COW in SGX).
> > >> +	 */
> > >> +	page_prot_bits |= (vma->vm_flags & VM_SHARED);
> > >> +	ret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
> > >> +				  vm_get_page_prot(page_prot_bits));
> > >>  	if (ret != VM_FAULT_NOPAGE) {
> > >>  		mutex_unlock(&encl->lock);
> > >>  
> > >> @@ -295,15 +299,14 @@ static int sgx_encl_debug_write(struct sgx_encl *encl, struct sgx_encl_page *pag
> > >>   * Load an enclave page to EPC if required, and take encl->lock.
> > >>   */
> > >>  static struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,
> > >> -						   unsigned long addr,
> > >> -						   unsigned long vm_flags)
> > >> +						   unsigned long addr)
> > >>  {
> > >>  	struct sgx_encl_page *entry;
> > >>  
> > >>  	for ( ; ; ) {
> > >>  		mutex_lock(&encl->lock);
> > >>  
> > >> -		entry = sgx_encl_load_page(encl, addr, vm_flags);
> > >> +		entry = sgx_encl_load_page(encl, addr);
> > >>  		if (PTR_ERR(entry) != -EBUSY)
> > >>  			break;
> > >>  
> > >> @@ -339,8 +342,7 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
> > >>  		return -EFAULT;
> > >>  
> > >>  	for (i = 0; i < len; i += cnt) {
> > >> -		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK,
> > >> -					      vma->vm_flags);
> > >> +		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK);
> > >>  		if (IS_ERR(entry)) {
> > >>  			ret = PTR_ERR(entry);
> > >>  			break;
> > >> -- 
> > >> 2.25.1
> > >>
> > > 
> > > If you unconditionally set vm_max_prot_bits to RWX for dynamically created
> > > pages, you would not need to do this.
> > > 
> > > These patches could then be safely dropped:
> > > 
> > > - [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions 
> > > - [PATCH V2 08/32] x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic permission changes
> > > - [PATCH V2 15/32] x86/sgx: Support relaxing of enclave page permissions
> > > 
> > > And that would also keep full ABI compatibility without exceptions to the
> > > existing mainline code.
> > > 
> > 
> > Dropping these patches does not just impact dynamically created pages. Dropping
> > them would result in EPCM page permission restriction being supported for all
> > pages, those added before enclave initialization as well as dynamically added
> > pages, but their PTEs would not be updated to match.
> > 
> > For example, if a RW enclave page is added via SGX_IOC_ENCLAVE_ADD_PAGES and
> > then later made read-only via SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS then Linux
> > would keep allowing and installing RW PTEs to this page.
> 
> I think that would be perfectly fine, if someone wants to do that. There is
> no collateral damage from doing that. The kernel does not get messed up because
> of that. It's a use case that does not make sense in the first place, so it'd
> be stupid to build anything extensive around it in the kernel.
> 
> Shooting yourself in the foot is something that the kernel does not, and should
> not, protect user space from unless there is a risk of messing up the state of
> the kernel itself.
> 
> Much worse is that we have e.g. a completely artificial ioctl,
> SGX_IOC_ENCLAVE_RELAX_PERMISSIONS, to support this scheme, which could
> cause extra round trips for a simple EMODPE.
> 
> Also, this means not having to include 06/32, which keeps 100% backwards
> compatibility with mainline in run-time behaviour while not restricting
> dynamically created pages at all. And we get rid of the complex bookkeeping
> of vm_run_prot_bits.
> 
> And generally the whole model is then very easy to understand and explain.
> If I had to give a presentation of the current mess in the patch set at a
> conference, I can honestly say that I would be in serious trouble. It's
> not a clean and clear security model, which is a risk by itself.

I.e.

1. For EADD'd pages: stick to what has been the invariant for 1.5 years now.
   Do not change it by any means (e.g. 06/32).
2. For EAUG'd pages: set vm_max_prot_bits to RWX, which essentially means do
   whatever you want with PTEs and EPCM.
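
The two rules can be sketched roughly as follows. This is a user-space model
under stated assumptions, not a proposed kernel change: the SK_* flag values,
the sk_page_origin enum and both helper names are hypothetical, and only the
subset test has the same shape as the (max & requested) != requested check
visible in the quoted diff.

```c
#include <stdbool.h>

/* Illustrative stand-ins for VM_READ/VM_WRITE/VM_EXEC (values are assumptions). */
#define SK_READ  0x1UL
#define SK_WRITE 0x2UL
#define SK_EXEC  0x4UL
#define SK_RWX   (SK_READ | SK_WRITE | SK_EXEC)

/* Hypothetical tag for how a page entered the enclave. */
enum sk_page_origin { SK_PAGE_EADD, SK_PAGE_EAUG };

/*
 * Rule 1: EADD'd pages keep their build-time permissions as the maximum.
 * Rule 2: EAUG'd pages get RWX as the maximum.
 */
static unsigned long sk_max_prot_bits(enum sk_page_origin origin,
				      unsigned long build_prot)
{
	if (origin == SK_PAGE_EAUG)
		return SK_RWX;
	return build_prot & SK_RWX;
}

/* A mapping is allowed only if it requests a subset of the maximum. */
static bool sk_may_map(unsigned long max_prot, unsigned long requested)
{
	requested &= SK_RWX;
	return (max_prot & requested) == requested;
}
```

In this model a read-only EADD'd page still refuses a RW mapping, exactly as
today, while an EAUG'd page accepts any combination of R, W and X.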

It's a clear and understandable model that does nothing bad to the kernel,
and a run-time developer can surely find a way to get things going. For
user space, the most important thing is clarity in kernel behaviour,
and this does deliver that clarity. It's not perfect but it does do the
job and anyone can get it.

BR, Jarkko

* Re: [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions
  2022-03-08  9:06         ` Jarkko Sakkinen
@ 2022-03-08  9:12           ` Jarkko Sakkinen
  2022-03-08 16:04             ` Reinette Chatre
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-08  9:12 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

On Tue, Mar 08, 2022 at 11:06:46AM +0200, Jarkko Sakkinen wrote:
> On Tue, Mar 08, 2022 at 10:14:42AM +0200, Jarkko Sakkinen wrote:
> > On Mon, Mar 07, 2022 at 09:36:36AM -0800, Reinette Chatre wrote:
> > > Hi Jarkko,
> > > 
> > > On 3/7/2022 9:10 AM, Jarkko Sakkinen wrote:
> > > > On Mon, Feb 07, 2022 at 04:45:28PM -0800, Reinette Chatre wrote:
> > > >> === Summary ===
> > > >>
> > > >> An SGX VMA can only be created if its permissions are the same or
> > > >> weaker than the Enclave Page Cache Map (EPCM) permissions. After VMA
> > > >> creation this same rule is again enforced by the page fault handler:
> > > >> faulted enclave pages are required to have equal or more relaxed
> > > >> EPCM permissions than the VMA permissions.
> > > >>
> > > >> On SGX1 systems the additional enforcement in the page fault handler
> > > >> is redundant and on SGX2 systems it incorrectly prevents access.
> > > >> On SGX1 systems it is unnecessary to repeat the enforcement of the
> > > >> permission rule. The rule used during original VMA creation will
> > > >> ensure that any access attempt will use correct permissions.
> > > >> With SGX2 the EPCM permissions of a page can change after VMA
> > > >> creation resulting in the VMA permissions potentially being more
> > > >> relaxed than the EPCM permissions and the page fault handler
> > > >> incorrectly blocking valid access attempts.
> > > >>
> > > >> Enable the VMA's pages to remain accessible while ensuring that
> > > >> the PTEs are installed to match the EPCM permissions but not be
> > > >> more relaxed than the VMA permissions.
> > > >>
> > > >> === Full Changelog ===
> > > >>
> > > >> An SGX enclave is an area of memory where parts of an application
> > > >> can reside. First an enclave is created and loaded (from
> > > >> non-enclave memory) with the code and data of an application,
> > > >> then user space can map (mmap()) the enclave memory to
> > > >> be able to enter the enclave at its defined entry points for
> > > >> execution within it.
> > > >>
> > > >> The hardware maintains a secure structure, the Enclave Page Cache Map
> > > >> (EPCM), that tracks the contents of the enclave. Of interest here is
> > > >> its tracking of the enclave page permissions. When a page is loaded
> > > >> into the enclave its permissions are specified and recorded in the
> > > >> EPCM. In parallel the kernel maintains permissions within the
> > > >> page table entries (PTEs) and the rule is that PTE permissions
> > > >> are not allowed to be more relaxed than the EPCM permissions.
> > > >>
> > > >> A new mapping (mmap()) of enclave memory can only succeed if the
> > > >> mapping has the same or weaker permissions than the permissions that
> > > >> were vetted during enclave creation. This is enforced by
> > > >> sgx_encl_may_map() that is called on the mmap() as well as mprotect()
> > > >> paths. This rule remains.
> > > >>
> > > >> One feature of SGX2 is to support the modification of EPCM permissions
> > > >> after enclave initialization. Enclave pages may thus already be part
> > > >> of a VMA at the time their EPCM permissions are changed resulting
> > > >> in the VMA's permissions potentially being more relaxed than the EPCM
> > > >> permissions.
> > > >>
> > > >> Allow permissions of existing VMAs to be more relaxed than EPCM
> > > >> permissions in preparation for dynamic EPCM permission changes
> > > >> made possible in SGX2.  New VMAs that attempt to have more relaxed
> > > >> permissions than EPCM permissions continue to be unsupported.
> > > >>
> > > >> Reasons why permissions of existing VMAs are allowed to be more relaxed
> > > >> than EPCM permissions instead of dynamically changing VMA permissions
> > > >> when EPCM permissions change are:
> > > >> 1) Changing VMA permissions involve splitting VMAs which is an
> > > >>    operation that can fail. Additionally changing EPCM permissions of
> > > >>    a range of pages could also fail on any of the pages involved.
> > > >>    Handling these error cases causes problems. For example, if an
> > > >>    EPCM permission change fails and the VMA has already been split
> > > >>    then it is not possible to undo the VMA split nor possible to
> > > >>    undo the EPCM permission changes that did succeed before the
> > > >>    failure.
> > > >> 2) The kernel has little insight into the user space where EPCM
> > > >>    permissions are controlled from. For example, a RW page may
> > > >>    be made RO just before it is made RX and splitting the VMAs
> > > >>    while the VMAs may change soon is unnecessary.
> > > >>
> > > >> Remove the extra permission check called on a page fault
> > > >> (vm_operations_struct->fault) or during debugging
> > > >> (vm_operations_struct->access) when loading the enclave page from swap
> > > >> that ensures that the VMA permissions are not more relaxed than the
> > > >> EPCM permissions. Since a VMA could only exist if it passed the
> > > >> original permission checks during mmap() and a VMA may indeed
> > > >> have more relaxed permissions than the EPCM permissions this extra
> > > >> permission check is no longer appropriate.
> > > >>
> > > >> With the permission check removed, ensure that PTEs do
> > > >> not blindly inherit the VMA permissions but instead the permissions
> > > >> that the VMA and EPCM agree on. PTEs for writable pages (from VMA
> > > >> and enclave perspective) are installed with the writable bit set,
> > > >> reducing the need for this additional flow to the permission mismatch
> > > >> cases handled next.
> > > >>
> > > >> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> > > >> ---
> > > >> Changes since V1:
> > > >> - Reword commit message (Jarkko).
> > > >> - Use "relax" instead of "exceed" when referring to permissions (Dave).
> > > >> - Add snippet to Documentation/x86/sgx.rst that highlights the
> > > >>   relationship between VMA, EPCM, and PTE permissions on SGX
> > > >>   systems (Andy).
> > > >>
> > > >>  Documentation/x86/sgx.rst      | 10 +++++++++
> > > >>  arch/x86/kernel/cpu/sgx/encl.c | 38 ++++++++++++++++++----------------
> > > >>  2 files changed, 30 insertions(+), 18 deletions(-)
> > > >>
> > > >> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
> > > >> index 89ff924b1480..5659932728a5 100644
> > > >> --- a/Documentation/x86/sgx.rst
> > > >> +++ b/Documentation/x86/sgx.rst
> > > >> @@ -99,6 +99,16 @@ The relationships between the different permission masks are:
> > > >>  * PTEs are installed to match the EPCM permissions, but not be more
> > > >>    relaxed than the VMA permissions.
> > > >>  
> > > >> +On systems supporting SGX2 EPCM permissions may change while the
> > > >> +enclave page belongs to a VMA without impacting the VMA permissions.
> > > >> +This means that a running VMA may appear to allow access to an enclave
> > > >> +page that is not allowed by its EPCM permissions. For example, when an
> > > >> +enclave page with RW EPCM permissions is mapped by a RW VMA but is
> > > >> +subsequently changed to have read-only EPCM permissions. The kernel
> > > >> +continues to maintain correct access to the enclave page through the
> > > >> +PTE that will ensure that only access allowed by both the VMA
> > > >> +and EPCM permissions are permitted.
> > > >> +
> > > >>  Application interface
> > > >>  =====================
> > > >>  
> > > >> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > > >> index 48afe96ae0f0..b6105d9e7c46 100644
> > > >> --- a/arch/x86/kernel/cpu/sgx/encl.c
> > > >> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > > >> @@ -91,10 +91,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
> > > >>  }
> > > >>  
> > > >>  static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > > >> -						unsigned long addr,
> > > >> -						unsigned long vm_flags)
> > > >> +						unsigned long addr)
> > > >>  {
> > > >> -	unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
> > > >>  	struct sgx_epc_page *epc_page;
> > > >>  	struct sgx_encl_page *entry;
> > > >>  
> > > >> @@ -102,14 +100,6 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > > >>  	if (!entry)
> > > >>  		return ERR_PTR(-EFAULT);
> > > >>  
> > > >> -	/*
> > > >> -	 * Verify that the faulted page has equal or higher build time
> > > >> -	 * permissions than the VMA permissions (i.e. the subset of {VM_READ,
> > > >> -	 * VM_WRITE, VM_EXECUTE} in vma->vm_flags).
> > > >> -	 */
> > > >> -	if ((entry->vm_max_prot_bits & vm_prot_bits) != vm_prot_bits)
> > > >> -		return ERR_PTR(-EFAULT);
> > > >> -
> > > >>  	/* Entry successfully located. */
> > > >>  	if (entry->epc_page) {
> > > >>  		if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
> > > >> @@ -138,7 +128,9 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> > > >>  {
> > > >>  	unsigned long addr = (unsigned long)vmf->address;
> > > >>  	struct vm_area_struct *vma = vmf->vma;
> > > >> +	unsigned long page_prot_bits;
> > > >>  	struct sgx_encl_page *entry;
> > > >> +	unsigned long vm_prot_bits;
> > > >>  	unsigned long phys_addr;
> > > >>  	struct sgx_encl *encl;
> > > >>  	vm_fault_t ret;
> > > >> @@ -155,7 +147,7 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> > > >>  
> > > >>  	mutex_lock(&encl->lock);
> > > >>  
> > > >> -	entry = sgx_encl_load_page(encl, addr, vma->vm_flags);
> > > >> +	entry = sgx_encl_load_page(encl, addr);
> > > >>  	if (IS_ERR(entry)) {
> > > >>  		mutex_unlock(&encl->lock);
> > > >   
> > > >> @@ -167,7 +159,19 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> > > >>  
> > > >>  	phys_addr = sgx_get_epc_phys_addr(entry->epc_page);
> > > >>  
> > > >> -	ret = vmf_insert_pfn(vma, addr, PFN_DOWN(phys_addr));
> > > >> +	/*
> > > >> +	 * Insert PTE to match the EPCM page permissions ensured to not
> > > >> +	 * exceed the VMA permissions.
> > > >> +	 */
> > > >> +	vm_prot_bits = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
> > > >> +	page_prot_bits = entry->vm_max_prot_bits & vm_prot_bits;
> > > >> +	/*
> > > >> +	 * Add VM_SHARED so that PTE is made writable right away if VMA
> > > >> +	 * and EPCM are writable (no COW in SGX).
> > > >> +	 */
> > > >> +	page_prot_bits |= (vma->vm_flags & VM_SHARED);
> > > >> +	ret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
> > > >> +				  vm_get_page_prot(page_prot_bits));
> > > >>  	if (ret != VM_FAULT_NOPAGE) {
> > > >>  		mutex_unlock(&encl->lock);
> > > >>  
> > > >> @@ -295,15 +299,14 @@ static int sgx_encl_debug_write(struct sgx_encl *encl, struct sgx_encl_page *pag
> > > >>   * Load an enclave page to EPC if required, and take encl->lock.
> > > >>   */
> > > >>  static struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,
> > > >> -						   unsigned long addr,
> > > >> -						   unsigned long vm_flags)
> > > >> +						   unsigned long addr)
> > > >>  {
> > > >>  	struct sgx_encl_page *entry;
> > > >>  
> > > >>  	for ( ; ; ) {
> > > >>  		mutex_lock(&encl->lock);
> > > >>  
> > > >> -		entry = sgx_encl_load_page(encl, addr, vm_flags);
> > > >> +		entry = sgx_encl_load_page(encl, addr);
> > > >>  		if (PTR_ERR(entry) != -EBUSY)
> > > >>  			break;
> > > >>  
> > > >> @@ -339,8 +342,7 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
> > > >>  		return -EFAULT;
> > > >>  
> > > >>  	for (i = 0; i < len; i += cnt) {
> > > >> -		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK,
> > > >> -					      vma->vm_flags);
> > > >> +		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK);
> > > >>  		if (IS_ERR(entry)) {
> > > >>  			ret = PTR_ERR(entry);
> > > >>  			break;
> > > >> -- 
> > > >> 2.25.1
> > > >>
> > > > 
> > > > If you unconditionally set vm_max_prot_bits to RWX for dynamically created
> > > > pages, you would not need to do this.
> > > > 
> > > > These patches could be then safely dropped then:
> > > > 
> > > > - [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions 
> > > > - [PATCH V2 08/32] x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic permission changes
> > > > - [PATCH V2 15/32] x86/sgx: Support relaxing of enclave page permissions
> > > > 
> > > > And that would also keep full ABI compatibility without exceptions to the
> > > > existing mainline code.
> > > > 
> > > 
> > > Dropping these changes do not just impact dynamically created pages. Dropping
> > > these patches would result in EPCM page permission restriction being supported
> > > for all pages, those added before enclave initialization as well as dynamically
> > > added pages, but their PTEs will not be impacted.
> > > 
> > > For example, if a RW enclave page is added via SGX_IOC_ENCLAVE_ADD_PAGES and
> > > then later made read-only via SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS then Linux
> > > would keep allowing and installing RW PTEs to this page.
> > 
> > I think that would be perfectly fine, if someone wants to do that. There is
> > no collateral damage on doing that. Kernel does not get messed because of
> > that. It's a use case that does not make sense in the first place, so it'd
> > be stupid to build anything extensive around it to the kernel.
> > 
> > Shooting yourself to the foot is something that kernel does and should not
> > protect user space from unless there is a risk of messing the state of the
> > kernel itself.
> > 
> > Much worse is that we have e.g. completely artificial ioctl
> > SGX_IOC_ENCLAVE_RELAX_PERMISSIONS to support this scheme, which could e.g.
> > cause extra roundtrips for simple EMODPE.
> > 
> > Also this means not having to include 06/32, which keeps 100% backwards
> > compatibility in run-time behaviour to the mainline while not restricting
> > at all dynamically created pages. And we get rid of complex book keeping
> > of vm_run_prot_bits.
> > 
> > And generally the whole model is then very easy to understand and explain.
> > If I had to keep presentation of the current mess in the patch set in a
> > conference, I can honestly say that I would be in serious trouble. It's
> > not clean and clear security model, which is a risk by itself.
> 
> I.e.
> 
> 1. For EADD'd pages: stick to what has been the invariant for 1.5 years now.
>    Do not change it by any means (e.g. 06/32).
> 2. For EAUG'd pages: set vm_max_prot_bits to RWX, which essentially means do
>    whatever you want with PTEs and EPCM.
> 
> It's a clear and understandable model that does nothing bad to the kernel,
> and a run-time developer can surely find a way to get things going. For
> user space, the most important thing is the clarity in kernel behaviour,
> and this does deliver that clarity. It's not perfect but it does do the
> job and anyone can get it.

Also a quantitative argument for this is that by simplifying the security
model this way there is one ioctl less, which must be considered a +1. We do
not want to add new ioctls unless it is something we absolutely cannot live
without. We absolutely can live without SGX_IOC_ENCLAVE_RELAX_PERMISSIONS.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions
  2022-03-08  9:12           ` Jarkko Sakkinen
@ 2022-03-08 16:04             ` Reinette Chatre
  2022-03-08 17:00               ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-03-08 16:04 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

Hi Jarkko,

On 3/8/2022 1:12 AM, Jarkko Sakkinen wrote:
> On Tue, Mar 08, 2022 at 11:06:46AM +0200, Jarkko Sakkinen wrote:
>> On Tue, Mar 08, 2022 at 10:14:42AM +0200, Jarkko Sakkinen wrote:
>>> On Mon, Mar 07, 2022 at 09:36:36AM -0800, Reinette Chatre wrote:
>>>> Hi Jarkko,
>>>>
>>>> On 3/7/2022 9:10 AM, Jarkko Sakkinen wrote:
>>>>> On Mon, Feb 07, 2022 at 04:45:28PM -0800, Reinette Chatre wrote:
>>>>>> === Summary ===
>>>>>>
>>>>>> An SGX VMA can only be created if its permissions are the same or
>>>>>> weaker than the Enclave Page Cache Map (EPCM) permissions. After VMA
>>>>>> creation this same rule is again enforced by the page fault handler:
>>>>>> faulted enclave pages are required to have equal or more relaxed
>>>>>> EPCM permissions than the VMA permissions.
>>>>>>
>>>>>> On SGX1 systems the additional enforcement in the page fault handler
>>>>>> is redundant and on SGX2 systems it incorrectly prevents access.
>>>>>> On SGX1 systems it is unnecessary to repeat the enforcement of the
>>>>>> permission rule. The rule used during original VMA creation will
>>>>>> ensure that any access attempt will use correct permissions.
>>>>>> With SGX2 the EPCM permissions of a page can change after VMA
>>>>>> creation resulting in the VMA permissions potentially being more
>>>>>> relaxed than the EPCM permissions and the page fault handler
>>>>>> incorrectly blocking valid access attempts.
>>>>>>
>>>>>> Enable the VMA's pages to remain accessible while ensuring that
>>>>>> the PTEs are installed to match the EPCM permissions but not be
>>>>>> more relaxed than the VMA permissions.
>>>>>>
>>>>>> === Full Changelog ===
>>>>>>
>>>>>> An SGX enclave is an area of memory where parts of an application
>>>>>> can reside. First an enclave is created and loaded (from
>>>>>> non-enclave memory) with the code and data of an application,
>>>>>> then user space can map (mmap()) the enclave memory to
>>>>>> be able to enter the enclave at its defined entry points for
>>>>>> execution within it.
>>>>>>
>>>>>> The hardware maintains a secure structure, the Enclave Page Cache Map
>>>>>> (EPCM), that tracks the contents of the enclave. Of interest here is
>>>>>> its tracking of the enclave page permissions. When a page is loaded
>>>>>> into the enclave its permissions are specified and recorded in the
>>>>>> EPCM. In parallel the kernel maintains permissions within the
>>>>>> page table entries (PTEs) and the rule is that PTE permissions
>>>>>> are not allowed to be more relaxed than the EPCM permissions.
>>>>>>
>>>>>> A new mapping (mmap()) of enclave memory can only succeed if the
>>>>>> mapping has the same or weaker permissions than the permissions that
>>>>>> were vetted during enclave creation. This is enforced by
>>>>>> sgx_encl_may_map() that is called on the mmap() as well as mprotect()
>>>>>> paths. This rule remains.
>>>>>>
>>>>>> One feature of SGX2 is to support the modification of EPCM permissions
>>>>>> after enclave initialization. Enclave pages may thus already be part
>>>>>> of a VMA at the time their EPCM permissions are changed resulting
>>>>>> in the VMA's permissions potentially being more relaxed than the EPCM
>>>>>> permissions.
>>>>>>
>>>>>> Allow permissions of existing VMAs to be more relaxed than EPCM
>>>>>> permissions in preparation for dynamic EPCM permission changes
>>>>>> made possible in SGX2.  New VMAs that attempt to have more relaxed
>>>>>> permissions than EPCM permissions continue to be unsupported.
>>>>>>
>>>>>> Reasons why permissions of existing VMAs are allowed to be more relaxed
>>>>>> than EPCM permissions instead of dynamically changing VMA permissions
>>>>>> when EPCM permissions change are:
>>>>>> 1) Changing VMA permissions involve splitting VMAs which is an
>>>>>>    operation that can fail. Additionally changing EPCM permissions of
>>>>>>    a range of pages could also fail on any of the pages involved.
>>>>>>    Handling these error cases causes problems. For example, if an
>>>>>>    EPCM permission change fails and the VMA has already been split
>>>>>>    then it is not possible to undo the VMA split nor possible to
>>>>>>    undo the EPCM permission changes that did succeed before the
>>>>>>    failure.
>>>>>> 2) The kernel has little insight into the user space where EPCM
>>>>>>    permissions are controlled from. For example, a RW page may
>>>>>>    be made RO just before it is made RX and splitting the VMAs
>>>>>>    while the VMAs may change soon is unnecessary.
>>>>>>
>>>>>> Remove the extra permission check called on a page fault
>>>>>> (vm_operations_struct->fault) or during debugging
>>>>>> (vm_operations_struct->access) when loading the enclave page from swap
>>>>>> that ensures that the VMA permissions are not more relaxed than the
>>>>>> EPCM permissions. Since a VMA could only exist if it passed the
>>>>>> original permission checks during mmap() and a VMA may indeed
>>>>>> have more relaxed permissions than the EPCM permissions this extra
>>>>>> permission check is no longer appropriate.
>>>>>>
>>>>>> With the permission check removed, ensure that PTEs do
>>>>>> not blindly inherit the VMA permissions but instead the permissions
>>>>>> that the VMA and EPCM agree on. PTEs for writable pages (from VMA
>>>>>> and enclave perspective) are installed with the writable bit set,
>>>>>> reducing the need for this additional flow to the permission mismatch
>>>>>> cases handled next.
>>>>>>
>>>>>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>>>>>> ---
>>>>>> Changes since V1:
>>>>>> - Reword commit message (Jarkko).
>>>>>> - Use "relax" instead of "exceed" when referring to permissions (Dave).
>>>>>> - Add snippet to Documentation/x86/sgx.rst that highlights the
>>>>>>   relationship between VMA, EPCM, and PTE permissions on SGX
>>>>>>   systems (Andy).
>>>>>>
>>>>>>  Documentation/x86/sgx.rst      | 10 +++++++++
>>>>>>  arch/x86/kernel/cpu/sgx/encl.c | 38 ++++++++++++++++++----------------
>>>>>>  2 files changed, 30 insertions(+), 18 deletions(-)
>>>>>>
>>>>>> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
>>>>>> index 89ff924b1480..5659932728a5 100644
>>>>>> --- a/Documentation/x86/sgx.rst
>>>>>> +++ b/Documentation/x86/sgx.rst
>>>>>> @@ -99,6 +99,16 @@ The relationships between the different permission masks are:
>>>>>>  * PTEs are installed to match the EPCM permissions, but not be more
>>>>>>    relaxed than the VMA permissions.
>>>>>>  
>>>>>> +On systems supporting SGX2 EPCM permissions may change while the
>>>>>> +enclave page belongs to a VMA without impacting the VMA permissions.
>>>>>> +This means that a running VMA may appear to allow access to an enclave
>>>>>> +page that is not allowed by its EPCM permissions. For example, when an
>>>>>> +enclave page with RW EPCM permissions is mapped by a RW VMA but is
>>>>>> +subsequently changed to have read-only EPCM permissions. The kernel
>>>>>> +continues to maintain correct access to the enclave page through the
>>>>>> +PTE that will ensure that only access allowed by both the VMA
>>>>>> +and EPCM permissions are permitted.
>>>>>> +
>>>>>>  Application interface
>>>>>>  =====================
>>>>>>  
>>>>>> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
>>>>>> index 48afe96ae0f0..b6105d9e7c46 100644
>>>>>> --- a/arch/x86/kernel/cpu/sgx/encl.c
>>>>>> +++ b/arch/x86/kernel/cpu/sgx/encl.c
>>>>>> @@ -91,10 +91,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
>>>>>>  }
>>>>>>  
>>>>>>  static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
>>>>>> -						unsigned long addr,
>>>>>> -						unsigned long vm_flags)
>>>>>> +						unsigned long addr)
>>>>>>  {
>>>>>> -	unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
>>>>>>  	struct sgx_epc_page *epc_page;
>>>>>>  	struct sgx_encl_page *entry;
>>>>>>  
>>>>>> @@ -102,14 +100,6 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
>>>>>>  	if (!entry)
>>>>>>  		return ERR_PTR(-EFAULT);
>>>>>>  
>>>>>> -	/*
>>>>>> -	 * Verify that the faulted page has equal or higher build time
>>>>>> -	 * permissions than the VMA permissions (i.e. the subset of {VM_READ,
>>>>>> -	 * VM_WRITE, VM_EXECUTE} in vma->vm_flags).
>>>>>> -	 */
>>>>>> -	if ((entry->vm_max_prot_bits & vm_prot_bits) != vm_prot_bits)
>>>>>> -		return ERR_PTR(-EFAULT);
>>>>>> -
>>>>>>  	/* Entry successfully located. */
>>>>>>  	if (entry->epc_page) {
>>>>>>  		if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
>>>>>> @@ -138,7 +128,9 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>>>>>>  {
>>>>>>  	unsigned long addr = (unsigned long)vmf->address;
>>>>>>  	struct vm_area_struct *vma = vmf->vma;
>>>>>> +	unsigned long page_prot_bits;
>>>>>>  	struct sgx_encl_page *entry;
>>>>>> +	unsigned long vm_prot_bits;
>>>>>>  	unsigned long phys_addr;
>>>>>>  	struct sgx_encl *encl;
>>>>>>  	vm_fault_t ret;
>>>>>> @@ -155,7 +147,7 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>>>>>>  
>>>>>>  	mutex_lock(&encl->lock);
>>>>>>  
>>>>>> -	entry = sgx_encl_load_page(encl, addr, vma->vm_flags);
>>>>>> +	entry = sgx_encl_load_page(encl, addr);
>>>>>>  	if (IS_ERR(entry)) {
>>>>>>  		mutex_unlock(&encl->lock);
>>>>>   
>>>>>> @@ -167,7 +159,19 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>>>>>>  
>>>>>>  	phys_addr = sgx_get_epc_phys_addr(entry->epc_page);
>>>>>>  
>>>>>> -	ret = vmf_insert_pfn(vma, addr, PFN_DOWN(phys_addr));
>>>>>> +	/*
>>>>>> +	 * Insert PTE to match the EPCM page permissions ensured to not
>>>>>> +	 * exceed the VMA permissions.
>>>>>> +	 */
>>>>>> +	vm_prot_bits = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
>>>>>> +	page_prot_bits = entry->vm_max_prot_bits & vm_prot_bits;
>>>>>> +	/*
>>>>>> +	 * Add VM_SHARED so that PTE is made writable right away if VMA
>>>>>> +	 * and EPCM are writable (no COW in SGX).
>>>>>> +	 */
>>>>>> +	page_prot_bits |= (vma->vm_flags & VM_SHARED);
>>>>>> +	ret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
>>>>>> +				  vm_get_page_prot(page_prot_bits));
>>>>>>  	if (ret != VM_FAULT_NOPAGE) {
>>>>>>  		mutex_unlock(&encl->lock);
>>>>>>  
>>>>>> @@ -295,15 +299,14 @@ static int sgx_encl_debug_write(struct sgx_encl *encl, struct sgx_encl_page *pag
>>>>>>   * Load an enclave page to EPC if required, and take encl->lock.
>>>>>>   */
>>>>>>  static struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,
>>>>>> -						   unsigned long addr,
>>>>>> -						   unsigned long vm_flags)
>>>>>> +						   unsigned long addr)
>>>>>>  {
>>>>>>  	struct sgx_encl_page *entry;
>>>>>>  
>>>>>>  	for ( ; ; ) {
>>>>>>  		mutex_lock(&encl->lock);
>>>>>>  
>>>>>> -		entry = sgx_encl_load_page(encl, addr, vm_flags);
>>>>>> +		entry = sgx_encl_load_page(encl, addr);
>>>>>>  		if (PTR_ERR(entry) != -EBUSY)
>>>>>>  			break;
>>>>>>  
>>>>>> @@ -339,8 +342,7 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
>>>>>>  		return -EFAULT;
>>>>>>  
>>>>>>  	for (i = 0; i < len; i += cnt) {
>>>>>> -		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK,
>>>>>> -					      vma->vm_flags);
>>>>>> +		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK);
>>>>>>  		if (IS_ERR(entry)) {
>>>>>>  			ret = PTR_ERR(entry);
>>>>>>  			break;
>>>>>> -- 
>>>>>> 2.25.1
>>>>>>
>>>>>
>>>>> If you unconditionally set vm_max_prot_bits to RWX for dynamically created
>>>>> pages, you would not need to do this.
>>>>>
>>>>> These patches could be then safely dropped then:
>>>>>
>>>>> - [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions 
>>>>> - [PATCH V2 08/32] x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic permission changes
>>>>> - [PATCH V2 15/32] x86/sgx: Support relaxing of enclave page permissions
>>>>>
>>>>> And that would also keep full ABI compatibility without exceptions to the
>>>>> existing mainline code.
>>>>>
>>>>
>>>> Dropping these changes do not just impact dynamically created pages. Dropping
>>>> these patches would result in EPCM page permission restriction being supported
>>>> for all pages, those added before enclave initialization as well as dynamically
>>>> added pages, but their PTEs will not be impacted.
>>>>
>>>> For example, if a RW enclave page is added via SGX_IOC_ENCLAVE_ADD_PAGES and
>>>> then later made read-only via SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS then Linux
>>>> would keep allowing and installing RW PTEs to this page.
>>>
>>> I think that would be perfectly fine, if someone wants to do that. There is
>>> no collateral damage on doing that. Kernel does not get messed because of
>>> that. It's a use case that does not make sense in the first place, so it'd
>>> be stupid to build anything extensive around it to the kernel.
>>>
>>> Shooting yourself to the foot is something that kernel does and should not
>>> protect user space from unless there is a risk of messing the state of the
>>> kernel itself.
>>>
>>> Much worse is that we have e.g. completely artificial ioctl
>>> SGX_IOC_ENCLAVE_RELAX_PERMISSIONS to support this scheme, which could e.g.
>>> cause extra roundtrips for simple EMODPE.
>>>
>>> Also this means not having to include 06/32, which keeps 100% backwards
>>> compatibility in run-time behaviour to the mainline while not restricting
>>> at all dynamically created pages. And we get rid of complex book keeping
>>> of vm_run_prot_bits.
>>>
>>> And generally the whole model is then very easy to understand and explain.
>>> If I had to keep presentation of the current mess in the patch set in a
>>> conference, I can honestly say that I would be in serious trouble. It's
>>> not clean and clear security model, which is a risk by itself.
>>
>> I.e.
>>
>> 1. For EADD'd pages: stick to what has been the invariant for 1.5 years now.
>>    Do not change it by any means (e.g. 06/32).
>> 2. For EAUG'd pages: set vm_max_prot_bits to RWX, which essentially means do
>>    whatever you want with PTEs and EPCM.
>>
>> It's a clear and understandable model that does nothing bad to the kernel,
>> and a run-time developer can surely find a way to get things going. For
>> user space, the most important thing is the clarity in kernel behaviour,
>> and this does deliver that clarity. It's not perfect but it does do the
>> job and anyone can get it.
> 
> Also a quantitative argument for this is that by simplifying the security
> model this way there is one ioctl less, which must be considered a +1. We do
> not want to add new ioctls unless it is something we absolutely cannot live
> without. We absolutely can live without SGX_IOC_ENCLAVE_RELAX_PERMISSIONS.
> 

OK, with the implications understood and accepted I will proceed with a new
series that separates EPCM from PTEs and makes RWX PTEs possible by default
for EAUG pages. This has broader impact than just removing
the three patches you list. "[PATCH 07/32] x86/sgx: Add pfn_mkwrite() handler
for present PTEs" is also no longer needed and there is no longer a need
to flush PTEs after restricting permissions. New changes also need to
be considered, at least to the current documentation. I'll rework the series.
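
As a rough illustration of what separating EPCM from PTEs means, here is a
minimal user-space model in plain C. It is a sketch under assumptions, not
the reworked kernel code: the MODEL_* constants and model_* helpers are
hypothetical, and the only behaviour assumed is that PTEs follow the VMA
permissions while the hardware additionally checks the EPCM permissions on
every enclave access.

```c
#include <assert.h>

/* Hypothetical stand-ins for VM_READ, VM_WRITE and VM_EXEC. */
enum { MODEL_R = 0x1, MODEL_W = 0x2, MODEL_X = 0x4 };

/*
 * Assumed reworked behaviour: the PTE is installed with the VMA
 * permissions and is no longer narrowed when EPCM permissions are
 * restricted.
 */
static inline unsigned long model_pte_prot(unsigned long vma_prot)
{
	return vma_prot;
}

/*
 * An enclave access must pass both the PTE check and the EPCM check,
 * so the access that actually succeeds is the intersection of the two.
 */
static inline unsigned long model_effective_access(unsigned long pte_prot,
						   unsigned long epcm_prot)
{
	return pte_prot & epcm_prot;
}
```

For the example discussed earlier in this thread (a page added RW and later
restricted to read-only in the EPCM), the PTE stays RW in this model but the
effective access is read-only because the EPCM check still applies.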

Reinette

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions
  2022-03-08 16:04             ` Reinette Chatre
@ 2022-03-08 17:00               ` Jarkko Sakkinen
  2022-03-08 17:49                 ` Reinette Chatre
  2022-03-11 11:06                 ` Dr. Greg
  0 siblings, 2 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-08 17:00 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

On Tue, Mar 08, 2022 at 08:04:33AM -0800, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 3/8/2022 1:12 AM, Jarkko Sakkinen wrote:
> > On Tue, Mar 08, 2022 at 11:06:46AM +0200, Jarkko Sakkinen wrote:
> >> On Tue, Mar 08, 2022 at 10:14:42AM +0200, Jarkko Sakkinen wrote:
> >>> On Mon, Mar 07, 2022 at 09:36:36AM -0800, Reinette Chatre wrote:
> >>>> Hi Jarkko,
> >>>>
> >>>> On 3/7/2022 9:10 AM, Jarkko Sakkinen wrote:
> >>>>> On Mon, Feb 07, 2022 at 04:45:28PM -0800, Reinette Chatre wrote:
> >>>>>> === Summary ===
> >>>>>>
> >>>>>> An SGX VMA can only be created if its permissions are the same or
> >>>>>> weaker than the Enclave Page Cache Map (EPCM) permissions. After VMA
> >>>>>> creation this same rule is again enforced by the page fault handler:
> >>>>>> faulted enclave pages are required to have equal or more relaxed
> >>>>>> EPCM permissions than the VMA permissions.
> >>>>>>
> >>>>>> On SGX1 systems the additional enforcement in the page fault handler
> >>>>>> is redundant and on SGX2 systems it incorrectly prevents access.
> >>>>>> On SGX1 systems it is unnecessary to repeat the enforcement of the
> >>>>>> permission rule. The rule used during original VMA creation will
> >>>>>> ensure that any access attempt will use correct permissions.
> >>>>>> With SGX2 the EPCM permissions of a page can change after VMA
> >>>>>> creation resulting in the VMA permissions potentially being more
> >>>>>> relaxed than the EPCM permissions and the page fault handler
> >>>>>> incorrectly blocking valid access attempts.
> >>>>>>
> >>>>>> Enable the VMA's pages to remain accessible while ensuring that
> >>>>>> the PTEs are installed to match the EPCM permissions but not be
> >>>>>> more relaxed than the VMA permissions.
> >>>>>>
> >>>>>> === Full Changelog ===
> >>>>>>
> >>>>>> An SGX enclave is an area of memory where parts of an application
> >>>>>> can reside. First an enclave is created and loaded (from
> >>>>>> non-enclave memory) with the code and data of an application,
> >>>>>> then user space can map (mmap()) the enclave memory to
> >>>>>> be able to enter the enclave at its defined entry points for
> >>>>>> execution within it.
> >>>>>>
> >>>>>> The hardware maintains a secure structure, the Enclave Page Cache Map
> >>>>>> (EPCM), that tracks the contents of the enclave. Of interest here is
> >>>>>> its tracking of the enclave page permissions. When a page is loaded
> >>>>>> into the enclave its permissions are specified and recorded in the
> >>>>>> EPCM. In parallel the kernel maintains permissions within the
> >>>>>> page table entries (PTEs) and the rule is that PTE permissions
> >>>>>> are not allowed to be more relaxed than the EPCM permissions.
> >>>>>>
> >>>>>> A new mapping (mmap()) of enclave memory can only succeed if the
> >>>>>> mapping has the same or weaker permissions than the permissions that
> >>>>>> were vetted during enclave creation. This is enforced by
> >>>>>> sgx_encl_may_map() that is called on the mmap() as well as mprotect()
> >>>>>> paths. This rule remains.
> >>>>>>
> >>>>>> One feature of SGX2 is to support the modification of EPCM permissions
> >>>>>> after enclave initialization. Enclave pages may thus already be part
> >>>>>> of a VMA at the time their EPCM permissions are changed resulting
> >>>>>> in the VMA's permissions potentially being more relaxed than the EPCM
> >>>>>> permissions.
> >>>>>>
> >>>>>> Allow permissions of existing VMAs to be more relaxed than EPCM
> >>>>>> permissions in preparation for dynamic EPCM permission changes
> >>>>>> made possible in SGX2.  New VMAs that attempt to have more relaxed
> >>>>>> permissions than EPCM permissions continue to be unsupported.
> >>>>>>
> >>>>>> Reasons why permissions of existing VMAs are allowed to be more relaxed
> >>>>>> than EPCM permissions instead of dynamically changing VMA permissions
> >>>>>> when EPCM permissions change are:
> >>>>>> 1) Changing VMA permissions involve splitting VMAs which is an
> >>>>>>    operation that can fail. Additionally changing EPCM permissions of
> >>>>>>    a range of pages could also fail on any of the pages involved.
> >>>>>>    Handling these error cases causes problems. For example, if an
> >>>>>>    EPCM permission change fails and the VMA has already been split
> >>>>>>    then it is not possible to undo the VMA split nor possible to
> >>>>>>    undo the EPCM permission changes that did succeed before the
> >>>>>>    failure.
> >>>>>> 2) The kernel has little insight into the user space where EPCM
> >>>>>>    permissions are controlled from. For example, a RW page may
> >>>>>>    be made RO just before it is made RX and splitting the VMAs
> >>>>>>    while the VMAs may change soon is unnecessary.
> >>>>>>
> >>>>>> Remove the extra permission check called on a page fault
> >>>>>> (vm_operations_struct->fault) or during debugging
> >>>>>> (vm_operations_struct->access) when loading the enclave page from swap
> >>>>>> that ensures that the VMA permissions are not more relaxed than the
> >>>>>> EPCM permissions. Since a VMA could only exist if it passed the
> >>>>>> original permission checks during mmap() and a VMA may indeed
> >>>>>> have more relaxed permissions than the EPCM permissions this extra
> >>>>>> permission check is no longer appropriate.
> >>>>>>
> >>>>>> With the permission check removed, ensure that PTEs do
> >>>>>> not blindly inherit the VMA permissions but instead the permissions
> >>>>>> that the VMA and EPCM agree on. PTEs for writable pages (from VMA
> >>>>>> and enclave perspective) are installed with the writable bit set,
> >>>>>> reducing the need for this additional flow to the permission mismatch
> >>>>>> cases handled next.
> >>>>>>
> >>>>>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> >>>>>> ---
> >>>>>> Changes since V1:
> >>>>>> - Reword commit message (Jarkko).
> >>>>>> - Use "relax" instead of "exceed" when referring to permissions (Dave).
> >>>>>> - Add snippet to Documentation/x86/sgx.rst that highlights the
> >>>>>>   relationship between VMA, EPCM, and PTE permissions on SGX
> >>>>>>   systems (Andy).
> >>>>>>
> >>>>>>  Documentation/x86/sgx.rst      | 10 +++++++++
> >>>>>>  arch/x86/kernel/cpu/sgx/encl.c | 38 ++++++++++++++++++----------------
> >>>>>>  2 files changed, 30 insertions(+), 18 deletions(-)
> >>>>>>
> >>>>>> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
> >>>>>> index 89ff924b1480..5659932728a5 100644
> >>>>>> --- a/Documentation/x86/sgx.rst
> >>>>>> +++ b/Documentation/x86/sgx.rst
> >>>>>> @@ -99,6 +99,16 @@ The relationships between the different permission masks are:
> >>>>>>  * PTEs are installed to match the EPCM permissions, but not be more
> >>>>>>    relaxed than the VMA permissions.
> >>>>>>  
> >>>>>> +On systems supporting SGX2 EPCM permissions may change while the
> >>>>>> +enclave page belongs to a VMA without impacting the VMA permissions.
> >>>>>> +This means that a running VMA may appear to allow access to an enclave
> >>>>>> +page that is not allowed by its EPCM permissions. For example, when an
> >>>>>> +enclave page with RW EPCM permissions is mapped by a RW VMA but is
> >>>>>> +subsequently changed to have read-only EPCM permissions. The kernel
> >>>>>> +continues to maintain correct access to the enclave page through the
> >>>>>> +PTE that will ensure that only access allowed by both the VMA
> >>>>>> +and EPCM permissions are permitted.
> >>>>>> +
> >>>>>>  Application interface
> >>>>>>  =====================
> >>>>>>  
> >>>>>> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> >>>>>> index 48afe96ae0f0..b6105d9e7c46 100644
> >>>>>> --- a/arch/x86/kernel/cpu/sgx/encl.c
> >>>>>> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> >>>>>> @@ -91,10 +91,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
> >>>>>>  }
> >>>>>>  
> >>>>>>  static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> >>>>>> -						unsigned long addr,
> >>>>>> -						unsigned long vm_flags)
> >>>>>> +						unsigned long addr)
> >>>>>>  {
> >>>>>> -	unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
> >>>>>>  	struct sgx_epc_page *epc_page;
> >>>>>>  	struct sgx_encl_page *entry;
> >>>>>>  
> >>>>>> @@ -102,14 +100,6 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> >>>>>>  	if (!entry)
> >>>>>>  		return ERR_PTR(-EFAULT);
> >>>>>>  
> >>>>>> -	/*
> >>>>>> -	 * Verify that the faulted page has equal or higher build time
> >>>>>> -	 * permissions than the VMA permissions (i.e. the subset of {VM_READ,
> >>>>>> -	 * VM_WRITE, VM_EXECUTE} in vma->vm_flags).
> >>>>>> -	 */
> >>>>>> -	if ((entry->vm_max_prot_bits & vm_prot_bits) != vm_prot_bits)
> >>>>>> -		return ERR_PTR(-EFAULT);
> >>>>>> -
> >>>>>>  	/* Entry successfully located. */
> >>>>>>  	if (entry->epc_page) {
> >>>>>>  		if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
> >>>>>> @@ -138,7 +128,9 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> >>>>>>  {
> >>>>>>  	unsigned long addr = (unsigned long)vmf->address;
> >>>>>>  	struct vm_area_struct *vma = vmf->vma;
> >>>>>> +	unsigned long page_prot_bits;
> >>>>>>  	struct sgx_encl_page *entry;
> >>>>>> +	unsigned long vm_prot_bits;
> >>>>>>  	unsigned long phys_addr;
> >>>>>>  	struct sgx_encl *encl;
> >>>>>>  	vm_fault_t ret;
> >>>>>> @@ -155,7 +147,7 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> >>>>>>  
> >>>>>>  	mutex_lock(&encl->lock);
> >>>>>>  
> >>>>>> -	entry = sgx_encl_load_page(encl, addr, vma->vm_flags);
> >>>>>> +	entry = sgx_encl_load_page(encl, addr);
> >>>>>>  	if (IS_ERR(entry)) {
> >>>>>>  		mutex_unlock(&encl->lock);
> >>>>>   
> >>>>>> @@ -167,7 +159,19 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> >>>>>>  
> >>>>>>  	phys_addr = sgx_get_epc_phys_addr(entry->epc_page);
> >>>>>>  
> >>>>>> -	ret = vmf_insert_pfn(vma, addr, PFN_DOWN(phys_addr));
> >>>>>> +	/*
> >>>>>> +	 * Insert PTE to match the EPCM page permissions ensured to not
> >>>>>> +	 * exceed the VMA permissions.
> >>>>>> +	 */
> >>>>>> +	vm_prot_bits = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
> >>>>>> +	page_prot_bits = entry->vm_max_prot_bits & vm_prot_bits;
> >>>>>> +	/*
> >>>>>> +	 * Add VM_SHARED so that PTE is made writable right away if VMA
> >>>>>> +	 * and EPCM are writable (no COW in SGX).
> >>>>>> +	 */
> >>>>>> +	page_prot_bits |= (vma->vm_flags & VM_SHARED);
> >>>>>> +	ret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
> >>>>>> +				  vm_get_page_prot(page_prot_bits));
> >>>>>>  	if (ret != VM_FAULT_NOPAGE) {
> >>>>>>  		mutex_unlock(&encl->lock);
> >>>>>>  
> >>>>>> @@ -295,15 +299,14 @@ static int sgx_encl_debug_write(struct sgx_encl *encl, struct sgx_encl_page *pag
> >>>>>>   * Load an enclave page to EPC if required, and take encl->lock.
> >>>>>>   */
> >>>>>>  static struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,
> >>>>>> -						   unsigned long addr,
> >>>>>> -						   unsigned long vm_flags)
> >>>>>> +						   unsigned long addr)
> >>>>>>  {
> >>>>>>  	struct sgx_encl_page *entry;
> >>>>>>  
> >>>>>>  	for ( ; ; ) {
> >>>>>>  		mutex_lock(&encl->lock);
> >>>>>>  
> >>>>>> -		entry = sgx_encl_load_page(encl, addr, vm_flags);
> >>>>>> +		entry = sgx_encl_load_page(encl, addr);
> >>>>>>  		if (PTR_ERR(entry) != -EBUSY)
> >>>>>>  			break;
> >>>>>>  
> >>>>>> @@ -339,8 +342,7 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
> >>>>>>  		return -EFAULT;
> >>>>>>  
> >>>>>>  	for (i = 0; i < len; i += cnt) {
> >>>>>> -		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK,
> >>>>>> -					      vma->vm_flags);
> >>>>>> +		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK);
> >>>>>>  		if (IS_ERR(entry)) {
> >>>>>>  			ret = PTR_ERR(entry);
> >>>>>>  			break;
> >>>>>> -- 
> >>>>>> 2.25.1
> >>>>>>
> >>>>>
> >>>>> If you unconditionally set vm_max_prot_bits to RWX for dynamically created
> >>>>> pages, you would not need to do this.
> >>>>>
> >>>>> These patches could then be safely dropped:
> >>>>>
> >>>>> - [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions 
> >>>>> - [PATCH V2 08/32] x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic permission changes
> >>>>> - [PATCH V2 15/32] x86/sgx: Support relaxing of enclave page permissions
> >>>>>
> >>>>> And that would also keep full ABI compatibility without exceptions to the
> >>>>> existing mainline code.
> >>>>>
> >>>>
> >>>> Dropping these changes does not just impact dynamically created pages. Dropping
> >>>> these patches would result in EPCM page permission restriction being supported
> >>>> for all pages, those added before enclave initialization as well as dynamically
> >>>> added pages, but their PTEs will not be impacted.
> >>>>
> >>>> For example, if a RW enclave page is added via SGX_IOC_ENCLAVE_ADD_PAGES and
> >>>> then later made read-only via SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS then Linux
> >>>> would keep allowing and installing RW PTEs to this page.
> >>>
> >>> I think that would be perfectly fine, if someone wants to do that. There is
> >>> no collateral damage from doing that. The kernel does not get messed up
> >>> because of it. It's a use case that does not make sense in the first place,
> >>> so it'd be stupid to build anything extensive around it in the kernel.
> >>>
> >>> Shooting yourself in the foot is something that the kernel does not and
> >>> should not protect user space from, unless there is a risk of messing up the
> >>> state of the kernel itself.
> >>>
> >>> Much worse is that we have e.g. completely artificial ioctl
> >>> SGX_IOC_ENCLAVE_RELAX_PERMISSIONS to support this scheme, which could e.g.
> >>> cause extra roundtrips for simple EMODPE.
> >>>
> >>> Also this means not having to include 06/32, which keeps 100% backwards
> >>> compatibility in run-time behaviour to the mainline while not restricting
> >>> at all dynamically created pages. And we get rid of complex book keeping
> >>> of vm_run_prot_bits.
> >>>
> >>> And generally the whole model is then very easy to understand and explain.
> >>> If I had to give a presentation of the current mess in the patch set at a
> >>> conference, I can honestly say that I would be in serious trouble. It's
> >>> not a clean and clear security model, which is a risk by itself.
> >>
> >> I.e.
> >>
> >> 1. For EADD'd pages: stick to what has been the invariant for 1.5 years
> >>    now. Do not change it by any means (e.g. 06/32).
> >> 2. For EAUG'd pages: set vm_max_prot_bits to RWX, which essentially means do
> >>    whatever you want with PTEs and EPCM.
> >>
> >> It's a clear and understandable model that does nothing bad to the kernel,
> >> and a run-time developer can surely find a way to get things going. For
> >> user space, the most important thing is the clarity in kernel behaviour,
> >> and this does deliver that clarity. It's not perfect but it does do the
> >> job and anyone can get it.
> > 
> > Also a quantitative argument for this is that by simplifying the security
> > model this way there is one ioctl less, which must be considered a +1. We do
> > not want to add new ioctls unless it is something we absolutely cannot live
> > without. We absolutely can live without SGX_IOC_ENCLAVE_RELAX_PERMISSIONS.
> > 
> 
> ok, with the implications understood and accepted I will proceed with a new
> series that separates EPCM from PTEs and make RWX PTEs possible by default
> for EAUG pages. This has broader impact than just removing
> the three patches you list. "[PATCH 07/32] x86/sgx: Add pfn_mkwrite() handler
> for present PTEs" is also no longer needed and there is no longer a need
> to flush PTEs after restricting permissions. New changes also need to
> be considered - at least the current documentation. I'll rework the series.

Yes, I really think it is a solid plan. Any possible LSM hooks would most
likely attach to the build product, not the dynamic behaviour.

As far as the page fault handler goes, Haitao is correct: after all the
discussions, it makes sense. The purpose of the MAP_POPULATE series is
not to replace it but to complement it. Just wanted to clear this
up, as I said otherwise earlier this week.

Thank you.

BR, Jarkko


* Re: [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions
  2022-03-08 17:00               ` Jarkko Sakkinen
@ 2022-03-08 17:49                 ` Reinette Chatre
  2022-03-08 18:46                   ` Jarkko Sakkinen
  2022-03-11 11:06                 ` Dr. Greg
  1 sibling, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-03-08 17:49 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

Hi Jarkko,

On 3/8/2022 9:00 AM, Jarkko Sakkinen wrote:
> On Tue, Mar 08, 2022 at 08:04:33AM -0800, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 3/8/2022 1:12 AM, Jarkko Sakkinen wrote:
>>> On Tue, Mar 08, 2022 at 11:06:46AM +0200, Jarkko Sakkinen wrote:
>>>> On Tue, Mar 08, 2022 at 10:14:42AM +0200, Jarkko Sakkinen wrote:
>>>>> On Mon, Mar 07, 2022 at 09:36:36AM -0800, Reinette Chatre wrote:
>>>>>> Hi Jarkko,
>>>>>>
>>>>>> On 3/7/2022 9:10 AM, Jarkko Sakkinen wrote:
>>>>>>> On Mon, Feb 07, 2022 at 04:45:28PM -0800, Reinette Chatre wrote:
>>>>>>>> === Summary ===
>>>>>>>>
>>>>>>>> An SGX VMA can only be created if its permissions are the same or
>>>>>>>> weaker than the Enclave Page Cache Map (EPCM) permissions. After VMA
>>>>>>>> creation this same rule is again enforced by the page fault handler:
>>>>>>>> faulted enclave pages are required to have equal or more relaxed
>>>>>>>> EPCM permissions than the VMA permissions.
>>>>>>>>
>>>>>>>> On SGX1 systems the additional enforcement in the page fault handler
>>>>>>>> is redundant and on SGX2 systems it incorrectly prevents access.
>>>>>>>> On SGX1 systems it is unnecessary to repeat the enforcement of the
>>>>>>>> permission rule. The rule used during original VMA creation will
>>>>>>>> ensure that any access attempt will use correct permissions.
>>>>>>>> With SGX2 the EPCM permissions of a page can change after VMA
>>>>>>>> creation resulting in the VMA permissions potentially being more
>>>>>>>> relaxed than the EPCM permissions and the page fault handler
>>>>>>>> incorrectly blocking valid access attempts.
>>>>>>>>
>>>>>>>> Enable the VMA's pages to remain accessible while ensuring that
>>>>>>>> the PTEs are installed to match the EPCM permissions but not be
>>>>>>>> more relaxed than the VMA permissions.
>>>>>>>>
>>>>>>>> === Full Changelog ===
>>>>>>>>
>>>>>>>> An SGX enclave is an area of memory where parts of an application
>>>>>>>> can reside. First an enclave is created and loaded (from
>>>>>>>> non-enclave memory) with the code and data of an application,
>>>>>>>> then user space can map (mmap()) the enclave memory to
>>>>>>>> be able to enter the enclave at its defined entry points for
>>>>>>>> execution within it.
>>>>>>>>
>>>>>>>> The hardware maintains a secure structure, the Enclave Page Cache Map
>>>>>>>> (EPCM), that tracks the contents of the enclave. Of interest here is
>>>>>>>> its tracking of the enclave page permissions. When a page is loaded
>>>>>>>> into the enclave its permissions are specified and recorded in the
>>>>>>>> EPCM. In parallel the kernel maintains permissions within the
>>>>>>>> page table entries (PTEs) and the rule is that PTE permissions
>>>>>>>> are not allowed to be more relaxed than the EPCM permissions.
>>>>>>>>
>>>>>>>> A new mapping (mmap()) of enclave memory can only succeed if the
>>>>>>>> mapping has the same or weaker permissions than the permissions that
>>>>>>>> were vetted during enclave creation. This is enforced by
>>>>>>>> sgx_encl_may_map() that is called on the mmap() as well as mprotect()
>>>>>>>> paths. This rule remains.
>>>>>>>>
>>>>>>>> One feature of SGX2 is to support the modification of EPCM permissions
>>>>>>>> after enclave initialization. Enclave pages may thus already be part
>>>>>>>> of a VMA at the time their EPCM permissions are changed resulting
>>>>>>>> in the VMA's permissions potentially being more relaxed than the EPCM
>>>>>>>> permissions.
>>>>>>>>
>>>>>>>> Allow permissions of existing VMAs to be more relaxed than EPCM
>>>>>>>> permissions in preparation for dynamic EPCM permission changes
>>>>>>>> made possible in SGX2.  New VMAs that attempt to have more relaxed
>>>>>>>> permissions than EPCM permissions continue to be unsupported.
>>>>>>>>
>>>>>>>> Reasons why permissions of existing VMAs are allowed to be more relaxed
>>>>>>>> than EPCM permissions instead of dynamically changing VMA permissions
>>>>>>>> when EPCM permissions change are:
>>>>>>>> 1) Changing VMA permissions involve splitting VMAs which is an
>>>>>>>>    operation that can fail. Additionally changing EPCM permissions of
>>>>>>>>    a range of pages could also fail on any of the pages involved.
>>>>>>>>    Handling these error cases causes problems. For example, if an
>>>>>>>>    EPCM permission change fails and the VMA has already been split
>>>>>>>>    then it is not possible to undo the VMA split nor possible to
>>>>>>>>    undo the EPCM permission changes that did succeed before the
>>>>>>>>    failure.
>>>>>>>> 2) The kernel has little insight into the user space where EPCM
>>>>>>>>    permissions are controlled from. For example, a RW page may
>>>>>>>>    be made RO just before it is made RX and splitting the VMAs
>>>>>>>>    while the VMAs may change soon is unnecessary.
>>>>>>>>
>>>>>>>> Remove the extra permission check called on a page fault
>>>>>>>> (vm_operations_struct->fault) or during debugging
>>>>>>>> (vm_operations_struct->access) when loading the enclave page from swap
>>>>>>>> that ensures that the VMA permissions are not more relaxed than the
>>>>>>>> EPCM permissions. Since a VMA could only exist if it passed the
>>>>>>>> original permission checks during mmap() and a VMA may indeed
>>>>>>>> have more relaxed permissions than the EPCM permissions this extra
>>>>>>>> permission check is no longer appropriate.
>>>>>>>>
>>>>>>>> With the permission check removed, ensure that PTEs do
>>>>>>>> not blindly inherit the VMA permissions but instead the permissions
>>>>>>>> that the VMA and EPCM agree on. PTEs for writable pages (from VMA
>>>>>>>> and enclave perspective) are installed with the writable bit set,
>>>>>>>> reducing the need for this additional flow to the permission mismatch
>>>>>>>> cases handled next.
>>>>>>>>
>>>>>>>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>>>>>>>> ---
>>>>>>>> Changes since V1:
>>>>>>>> - Reword commit message (Jarkko).
>>>>>>>> - Use "relax" instead of "exceed" when referring to permissions (Dave).
>>>>>>>> - Add snippet to Documentation/x86/sgx.rst that highlights the
>>>>>>>>   relationship between VMA, EPCM, and PTE permissions on SGX
>>>>>>>>   systems (Andy).
>>>>>>>>
>>>>>>>>  Documentation/x86/sgx.rst      | 10 +++++++++
>>>>>>>>  arch/x86/kernel/cpu/sgx/encl.c | 38 ++++++++++++++++++----------------
>>>>>>>>  2 files changed, 30 insertions(+), 18 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
>>>>>>>> index 89ff924b1480..5659932728a5 100644
>>>>>>>> --- a/Documentation/x86/sgx.rst
>>>>>>>> +++ b/Documentation/x86/sgx.rst
>>>>>>>> @@ -99,6 +99,16 @@ The relationships between the different permission masks are:
>>>>>>>>  * PTEs are installed to match the EPCM permissions, but not be more
>>>>>>>>    relaxed than the VMA permissions.
>>>>>>>>  
>>>>>>>> +On systems supporting SGX2 EPCM permissions may change while the
>>>>>>>> +enclave page belongs to a VMA without impacting the VMA permissions.
>>>>>>>> +This means that a running VMA may appear to allow access to an enclave
>>>>>>>> +page that is not allowed by its EPCM permissions. For example, when an
>>>>>>>> +enclave page with RW EPCM permissions is mapped by a RW VMA but is
>>>>>>>> +subsequently changed to have read-only EPCM permissions. The kernel
>>>>>>>> +continues to maintain correct access to the enclave page through the
>>>>>>>> +PTE that will ensure that only access allowed by both the VMA
>>>>>>>> +and EPCM permissions are permitted.
>>>>>>>> +
>>>>>>>>  Application interface
>>>>>>>>  =====================
>>>>>>>>  
>>>>>>>> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
>>>>>>>> index 48afe96ae0f0..b6105d9e7c46 100644
>>>>>>>> --- a/arch/x86/kernel/cpu/sgx/encl.c
>>>>>>>> +++ b/arch/x86/kernel/cpu/sgx/encl.c
>>>>>>>> @@ -91,10 +91,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
>>>>>>>>  }
>>>>>>>>  
>>>>>>>>  static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
>>>>>>>> -						unsigned long addr,
>>>>>>>> -						unsigned long vm_flags)
>>>>>>>> +						unsigned long addr)
>>>>>>>>  {
>>>>>>>> -	unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
>>>>>>>>  	struct sgx_epc_page *epc_page;
>>>>>>>>  	struct sgx_encl_page *entry;
>>>>>>>>  
>>>>>>>> @@ -102,14 +100,6 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
>>>>>>>>  	if (!entry)
>>>>>>>>  		return ERR_PTR(-EFAULT);
>>>>>>>>  
>>>>>>>> -	/*
>>>>>>>> -	 * Verify that the faulted page has equal or higher build time
>>>>>>>> -	 * permissions than the VMA permissions (i.e. the subset of {VM_READ,
>>>>>>>> -	 * VM_WRITE, VM_EXECUTE} in vma->vm_flags).
>>>>>>>> -	 */
>>>>>>>> -	if ((entry->vm_max_prot_bits & vm_prot_bits) != vm_prot_bits)
>>>>>>>> -		return ERR_PTR(-EFAULT);
>>>>>>>> -
>>>>>>>>  	/* Entry successfully located. */
>>>>>>>>  	if (entry->epc_page) {
>>>>>>>>  		if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
>>>>>>>> @@ -138,7 +128,9 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>>>>>>>>  {
>>>>>>>>  	unsigned long addr = (unsigned long)vmf->address;
>>>>>>>>  	struct vm_area_struct *vma = vmf->vma;
>>>>>>>> +	unsigned long page_prot_bits;
>>>>>>>>  	struct sgx_encl_page *entry;
>>>>>>>> +	unsigned long vm_prot_bits;
>>>>>>>>  	unsigned long phys_addr;
>>>>>>>>  	struct sgx_encl *encl;
>>>>>>>>  	vm_fault_t ret;
>>>>>>>> @@ -155,7 +147,7 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>>>>>>>>  
>>>>>>>>  	mutex_lock(&encl->lock);
>>>>>>>>  
>>>>>>>> -	entry = sgx_encl_load_page(encl, addr, vma->vm_flags);
>>>>>>>> +	entry = sgx_encl_load_page(encl, addr);
>>>>>>>>  	if (IS_ERR(entry)) {
>>>>>>>>  		mutex_unlock(&encl->lock);
>>>>>>>   
>>>>>>>> @@ -167,7 +159,19 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>>>>>>>>  
>>>>>>>>  	phys_addr = sgx_get_epc_phys_addr(entry->epc_page);
>>>>>>>>  
>>>>>>>> -	ret = vmf_insert_pfn(vma, addr, PFN_DOWN(phys_addr));
>>>>>>>> +	/*
>>>>>>>> +	 * Insert PTE to match the EPCM page permissions ensured to not
>>>>>>>> +	 * exceed the VMA permissions.
>>>>>>>> +	 */
>>>>>>>> +	vm_prot_bits = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
>>>>>>>> +	page_prot_bits = entry->vm_max_prot_bits & vm_prot_bits;
>>>>>>>> +	/*
>>>>>>>> +	 * Add VM_SHARED so that PTE is made writable right away if VMA
>>>>>>>> +	 * and EPCM are writable (no COW in SGX).
>>>>>>>> +	 */
>>>>>>>> +	page_prot_bits |= (vma->vm_flags & VM_SHARED);
>>>>>>>> +	ret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
>>>>>>>> +				  vm_get_page_prot(page_prot_bits));
>>>>>>>>  	if (ret != VM_FAULT_NOPAGE) {
>>>>>>>>  		mutex_unlock(&encl->lock);
>>>>>>>>  
>>>>>>>> @@ -295,15 +299,14 @@ static int sgx_encl_debug_write(struct sgx_encl *encl, struct sgx_encl_page *pag
>>>>>>>>   * Load an enclave page to EPC if required, and take encl->lock.
>>>>>>>>   */
>>>>>>>>  static struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,
>>>>>>>> -						   unsigned long addr,
>>>>>>>> -						   unsigned long vm_flags)
>>>>>>>> +						   unsigned long addr)
>>>>>>>>  {
>>>>>>>>  	struct sgx_encl_page *entry;
>>>>>>>>  
>>>>>>>>  	for ( ; ; ) {
>>>>>>>>  		mutex_lock(&encl->lock);
>>>>>>>>  
>>>>>>>> -		entry = sgx_encl_load_page(encl, addr, vm_flags);
>>>>>>>> +		entry = sgx_encl_load_page(encl, addr);
>>>>>>>>  		if (PTR_ERR(entry) != -EBUSY)
>>>>>>>>  			break;
>>>>>>>>  
>>>>>>>> @@ -339,8 +342,7 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
>>>>>>>>  		return -EFAULT;
>>>>>>>>  
>>>>>>>>  	for (i = 0; i < len; i += cnt) {
>>>>>>>> -		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK,
>>>>>>>> -					      vma->vm_flags);
>>>>>>>> +		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK);
>>>>>>>>  		if (IS_ERR(entry)) {
>>>>>>>>  			ret = PTR_ERR(entry);
>>>>>>>>  			break;
>>>>>>>> -- 
>>>>>>>> 2.25.1
>>>>>>>>
>>>>>>>
>>>>>>> If you unconditionally set vm_max_prot_bits to RWX for dynamically created
>>>>>>> pages, you would not need to do this.
>>>>>>>
>>>>>>> These patches could then be safely dropped:
>>>>>>>
>>>>>>> - [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions 
>>>>>>> - [PATCH V2 08/32] x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic permission changes
>>>>>>> - [PATCH V2 15/32] x86/sgx: Support relaxing of enclave page permissions
>>>>>>>
>>>>>>> And that would also keep full ABI compatibility without exceptions to the
>>>>>>> existing mainline code.
>>>>>>>
>>>>>>
>>>>>> Dropping these changes does not just impact dynamically created pages. Dropping
>>>>>> these patches would result in EPCM page permission restriction being supported
>>>>>> for all pages, those added before enclave initialization as well as dynamically
>>>>>> added pages, but their PTEs will not be impacted.
>>>>>>
>>>>>> For example, if a RW enclave page is added via SGX_IOC_ENCLAVE_ADD_PAGES and
>>>>>> then later made read-only via SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS then Linux
>>>>>> would keep allowing and installing RW PTEs to this page.
>>>>>
>>>>> I think that would be perfectly fine, if someone wants to do that. There is
>>>>> no collateral damage from doing that. The kernel does not get messed up
>>>>> because of it. It's a use case that does not make sense in the first place,
>>>>> so it'd be stupid to build anything extensive around it in the kernel.
>>>>>
>>>>> Shooting yourself in the foot is something that the kernel does not and
>>>>> should not protect user space from, unless there is a risk of messing up the
>>>>> state of the kernel itself.
>>>>>
>>>>> Much worse is that we have e.g. completely artificial ioctl
>>>>> SGX_IOC_ENCLAVE_RELAX_PERMISSIONS to support this scheme, which could e.g.
>>>>> cause extra roundtrips for simple EMODPE.
>>>>>
>>>>> Also this means not having to include 06/32, which keeps 100% backwards
>>>>> compatibility in run-time behaviour to the mainline while not restricting
>>>>> at all dynamically created pages. And we get rid of complex book keeping
>>>>> of vm_run_prot_bits.
>>>>>
>>>>> And generally the whole model is then very easy to understand and explain.
>>>>> If I had to give a presentation of the current mess in the patch set at a
>>>>> conference, I can honestly say that I would be in serious trouble. It's
>>>>> not a clean and clear security model, which is a risk by itself.
>>>>
>>>> I.e.
>>>>
>>>> 1. For EADD'd pages: stick to what has been the invariant for 1.5 years
>>>>    now. Do not change it by any means (e.g. 06/32).
>>>> 2. For EAUG'd pages: set vm_max_prot_bits to RWX, which essentially means do
>>>>    whatever you want with PTEs and EPCM.
>>>>
>>>> It's a clear and understandable model that does nothing bad to the kernel,
>>>> and a run-time developer can surely find a way to get things going. For
>>>> user space, the most important thing is the clarity in kernel behaviour,
>>>> and this does deliver that clarity. It's not perfect but it does do the
>>>> job and anyone can get it.
>>>
>>> Also a quantitative argument for this is that by simplifying the security
>>> model this way there is one ioctl less, which must be considered a +1. We do
>>> not want to add new ioctls unless it is something we absolutely cannot live
>>> without. We absolutely can live without SGX_IOC_ENCLAVE_RELAX_PERMISSIONS.
>>>
>>
>> ok, with the implications understood and accepted I will proceed with a new
>> series that separates EPCM from PTEs and make RWX PTEs possible by default
>> for EAUG pages. This has broader impact than just removing
>> the three patches you list. "[PATCH 07/32] x86/sgx: Add pfn_mkwrite() handler
>> for present PTEs" is also no longer needed and there is no longer a need
>> to flush PTEs after restricting permissions. New changes also need to
>> be considered - at least the current documentation. I'll rework the series.
> 
> Yes, I really think it is a solid plan. Any possible LSM hooks would most
> likely attach to the build product, not the dynamic behaviour.
> 
> As far as the page fault handler goes, Haitao is correct: after all the
> discussions, it makes sense. The purpose of the MAP_POPULATE series is
> not to replace it but to complement it. Just wanted to clear this
> up, as I said otherwise earlier this week.
> 

Understood. I will keep the implementation where EAUG is done in page fault
handler. I do plan to pick up your patch "x86/sgx: Export sgx_encl_page_alloc()"
since a consequence of the other changes is that this can now be shared.

Reinette
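
The model agreed on above (EADD'd pages keep their build-time limit, EAUG'd
pages get vm_max_prot_bits set to RWX) can be sketched as a small user-space
toy. This is only an illustration under stated assumptions, not the kernel
implementation: the MODEL_* bits, struct model_page, model_max_prot() and
model_may_map() are hypothetical names invented here, loosely mirroring the
kind of check the thread attributes to sgx_encl_may_map() on the mmap() and
mprotect() paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical permission bits for this model only (not kernel VM_* flags). */
#define MODEL_R   0x1u
#define MODEL_W   0x2u
#define MODEL_X   0x4u
#define MODEL_RWX (MODEL_R | MODEL_W | MODEL_X)

/* How a page entered the enclave: before init (EADD) or at run time (EAUG). */
enum model_origin { MODEL_EADD, MODEL_EAUG };

struct model_page {
	enum model_origin origin;
	unsigned int build_prot; /* permissions vetted at build time (EADD only) */
};

/* Upper bound on mapping permissions, standing in for vm_max_prot_bits. */
static unsigned int model_max_prot(const struct model_page *page)
{
	if (page->origin == MODEL_EAUG)
		return MODEL_RWX;		/* rule 2: no extra limit */
	return page->build_prot & MODEL_RWX;	/* rule 1: build-time invariant */
}

/* True if every page in the range allows all requested mapping bits. */
static bool model_may_map(const struct model_page *pages, size_t count,
			  unsigned int requested)
{
	size_t i;

	assert(pages != NULL || count == 0);
	for (i = 0; i < count; i++) {
		if (requested & ~model_max_prot(&pages[i]))
			return false;
	}
	return true;
}
```

In this toy, a range containing a read-only EADD'd page rejects a RW mapping
request, while a range made up only of EAUG'd pages accepts any combination of
R, W and X.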


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions
  2022-03-08 17:49                 ` Reinette Chatre
@ 2022-03-08 18:46                   ` Jarkko Sakkinen
  0 siblings, 0 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-08 18:46 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: dave.hansen, tglx, bp, luto, mingo, linux-sgx, x86, seanjc,
	kai.huang, cathy.zhang, cedric.xing, haitao.huang, mark.shanahan,
	hpa, linux-kernel

On Tue, Mar 08, 2022 at 09:49:01AM -0800, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 3/8/2022 9:00 AM, Jarkko Sakkinen wrote:
> > On Tue, Mar 08, 2022 at 08:04:33AM -0800, Reinette Chatre wrote:
> >> Hi Jarkko,
> >>
> >> On 3/8/2022 1:12 AM, Jarkko Sakkinen wrote:
> >>> On Tue, Mar 08, 2022 at 11:06:46AM +0200, Jarkko Sakkinen wrote:
> >>>> On Tue, Mar 08, 2022 at 10:14:42AM +0200, Jarkko Sakkinen wrote:
> >>>>> On Mon, Mar 07, 2022 at 09:36:36AM -0800, Reinette Chatre wrote:
> >>>>>> Hi Jarkko,
> >>>>>>
> >>>>>> On 3/7/2022 9:10 AM, Jarkko Sakkinen wrote:
> >>>>>>> On Mon, Feb 07, 2022 at 04:45:28PM -0800, Reinette Chatre wrote:
> >>>>>>>> === Summary ===
> >>>>>>>>
> >>>>>>>> An SGX VMA can only be created if its permissions are the same or
> >>>>>>>> weaker than the Enclave Page Cache Map (EPCM) permissions. After VMA
> >>>>>>>> creation this same rule is again enforced by the page fault handler:
> >>>>>>>> faulted enclave pages are required to have equal or more relaxed
> >>>>>>>> EPCM permissions than the VMA permissions.
> >>>>>>>>
> >>>>>>>> On SGX1 systems the additional enforcement in the page fault handler
> >>>>>>>> is redundant and on SGX2 systems it incorrectly prevents access.
> >>>>>>>> On SGX1 systems it is unnecessary to repeat the enforcement of the
> >>>>>>>> permission rule. The rule used during original VMA creation will
> >>>>>>>> ensure that any access attempt will use correct permissions.
> >>>>>>>> With SGX2 the EPCM permissions of a page can change after VMA
> >>>>>>>> creation resulting in the VMA permissions potentially being more
> >>>>>>>> relaxed than the EPCM permissions and the page fault handler
> >>>>>>>> incorrectly blocking valid access attempts.
> >>>>>>>>
> >>>>>>>> Enable the VMA's pages to remain accessible while ensuring that
> >>>>>>>> the PTEs are installed to match the EPCM permissions but not be
> >>>>>>>> more relaxed than the VMA permissions.
> >>>>>>>>
> >>>>>>>> === Full Changelog ===
> >>>>>>>>
> >>>>>>>> An SGX enclave is an area of memory where parts of an application
> >>>>>>>> can reside. First an enclave is created and loaded (from
> >>>>>>>> non-enclave memory) with the code and data of an application,
> >>>>>>>> then user space can map (mmap()) the enclave memory to
> >>>>>>>> be able to enter the enclave at its defined entry points for
> >>>>>>>> execution within it.
> >>>>>>>>
> >>>>>>>> The hardware maintains a secure structure, the Enclave Page Cache Map
> >>>>>>>> (EPCM), that tracks the contents of the enclave. Of interest here is
> >>>>>>>> its tracking of the enclave page permissions. When a page is loaded
> >>>>>>>> into the enclave its permissions are specified and recorded in the
> >>>>>>>> EPCM. In parallel the kernel maintains permissions within the
> >>>>>>>> page table entries (PTEs) and the rule is that PTE permissions
> >>>>>>>> are not allowed to be more relaxed than the EPCM permissions.
> >>>>>>>>
> >>>>>>>> A new mapping (mmap()) of enclave memory can only succeed if the
> >>>>>>>> mapping has the same or weaker permissions than the permissions that
> >>>>>>>> were vetted during enclave creation. This is enforced by
> >>>>>>>> sgx_encl_may_map() that is called on the mmap() as well as mprotect()
> >>>>>>>> paths. This rule remains.
> >>>>>>>>
> >>>>>>>> One feature of SGX2 is to support the modification of EPCM permissions
> >>>>>>>> after enclave initialization. Enclave pages may thus already be part
> >>>>>>>> of a VMA at the time their EPCM permissions are changed resulting
> >>>>>>>> in the VMA's permissions potentially being more relaxed than the EPCM
> >>>>>>>> permissions.
> >>>>>>>>
> >>>>>>>> Allow permissions of existing VMAs to be more relaxed than EPCM
> >>>>>>>> permissions in preparation for dynamic EPCM permission changes
> >>>>>>>> made possible in SGX2.  New VMAs that attempt to have more relaxed
> >>>>>>>> permissions than EPCM permissions continue to be unsupported.
> >>>>>>>>
> >>>>>>>> Reasons why permissions of existing VMAs are allowed to be more relaxed
> >>>>>>>> than EPCM permissions instead of dynamically changing VMA permissions
> >>>>>>>> when EPCM permissions change are:
> >>>>>>>> 1) Changing VMA permissions involve splitting VMAs which is an
> >>>>>>>>    operation that can fail. Additionally changing EPCM permissions of
> >>>>>>>>    a range of pages could also fail on any of the pages involved.
> >>>>>>>>    Handling these error cases causes problems. For example, if an
> >>>>>>>>    EPCM permission change fails and the VMA has already been split
> >>>>>>>>    then it is not possible to undo the VMA split nor possible to
> >>>>>>>>    undo the EPCM permission changes that did succeed before the
> >>>>>>>>    failure.
> >>>>>>>> 2) The kernel has little insight into the user space where EPCM
> >>>>>>>>    permissions are controlled from. For example, a RW page may
> >>>>>>>>    be made RO just before it is made RX and splitting the VMAs
> >>>>>>>>    while the VMAs may change soon is unnecessary.
> >>>>>>>>
> >>>>>>>> Remove the extra permission check called on a page fault
> >>>>>>>> (vm_operations_struct->fault) or during debugging
> >>>>>>>> (vm_operations_struct->access) when loading the enclave page from swap
> >>>>>>>> that ensures that the VMA permissions are not more relaxed than the
> >>>>>>>> EPCM permissions. Since a VMA could only exist if it passed the
> >>>>>>>> original permission checks during mmap() and a VMA may indeed
> >>>>>>>> have more relaxed permissions than the EPCM permissions this extra
> >>>>>>>> permission check is no longer appropriate.
> >>>>>>>>
> >>>>>>>> With the permission check removed, ensure that PTEs do
> >>>>>>>> not blindly inherit the VMA permissions but instead the permissions
> >>>>>>>> that the VMA and EPCM agree on. PTEs for writable pages (from VMA
> >>>>>>>> and enclave perspective) are installed with the writable bit set,
> >>>>>>>> reducing the need for this additional flow to the permission mismatch
> >>>>>>>> cases handled next.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> >>>>>>>> ---
> >>>>>>>> Changes since V1:
> >>>>>>>> - Reword commit message (Jarkko).
> >>>>>>>> - Use "relax" instead of "exceed" when referring to permissions (Dave).
> >>>>>>>> - Add snippet to Documentation/x86/sgx.rst that highlights the
> >>>>>>>>   relationship between VMA, EPCM, and PTE permissions on SGX
> >>>>>>>>   systems (Andy).
> >>>>>>>>
> >>>>>>>>  Documentation/x86/sgx.rst      | 10 +++++++++
> >>>>>>>>  arch/x86/kernel/cpu/sgx/encl.c | 38 ++++++++++++++++++----------------
> >>>>>>>>  2 files changed, 30 insertions(+), 18 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
> >>>>>>>> index 89ff924b1480..5659932728a5 100644
> >>>>>>>> --- a/Documentation/x86/sgx.rst
> >>>>>>>> +++ b/Documentation/x86/sgx.rst
> >>>>>>>> @@ -99,6 +99,16 @@ The relationships between the different permission masks are:
> >>>>>>>>  * PTEs are installed to match the EPCM permissions, but not be more
> >>>>>>>>    relaxed than the VMA permissions.
> >>>>>>>>  
> >>>>>>>> +On systems supporting SGX2 EPCM permissions may change while the
> >>>>>>>> +enclave page belongs to a VMA without impacting the VMA permissions.
> >>>>>>>> +This means that a running VMA may appear to allow access to an enclave
> >>>>>>>> +page that is not allowed by its EPCM permissions. For example, when an
> >>>>>>>> +enclave page with RW EPCM permissions is mapped by a RW VMA but is
> >>>>>>>> +subsequently changed to have read-only EPCM permissions. The kernel
> >>>>>>>> +continues to maintain correct access to the enclave page through the
> >>>>>>>> +PTE that will ensure that only access allowed by both the VMA
> >>>>>>>> +and EPCM permissions are permitted.
> >>>>>>>> +
> >>>>>>>>  Application interface
> >>>>>>>>  =====================
> >>>>>>>>  
> >>>>>>>> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> >>>>>>>> index 48afe96ae0f0..b6105d9e7c46 100644
> >>>>>>>> --- a/arch/x86/kernel/cpu/sgx/encl.c
> >>>>>>>> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> >>>>>>>> @@ -91,10 +91,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
> >>>>>>>>  }
> >>>>>>>>  
> >>>>>>>>  static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> >>>>>>>> -						unsigned long addr,
> >>>>>>>> -						unsigned long vm_flags)
> >>>>>>>> +						unsigned long addr)
> >>>>>>>>  {
> >>>>>>>> -	unsigned long vm_prot_bits = vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
> >>>>>>>>  	struct sgx_epc_page *epc_page;
> >>>>>>>>  	struct sgx_encl_page *entry;
> >>>>>>>>  
> >>>>>>>> @@ -102,14 +100,6 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> >>>>>>>>  	if (!entry)
> >>>>>>>>  		return ERR_PTR(-EFAULT);
> >>>>>>>>  
> >>>>>>>> -	/*
> >>>>>>>> -	 * Verify that the faulted page has equal or higher build time
> >>>>>>>> -	 * permissions than the VMA permissions (i.e. the subset of {VM_READ,
> >>>>>>>> -	 * VM_WRITE, VM_EXECUTE} in vma->vm_flags).
> >>>>>>>> -	 */
> >>>>>>>> -	if ((entry->vm_max_prot_bits & vm_prot_bits) != vm_prot_bits)
> >>>>>>>> -		return ERR_PTR(-EFAULT);
> >>>>>>>> -
> >>>>>>>>  	/* Entry successfully located. */
> >>>>>>>>  	if (entry->epc_page) {
> >>>>>>>>  		if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
> >>>>>>>> @@ -138,7 +128,9 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> >>>>>>>>  {
> >>>>>>>>  	unsigned long addr = (unsigned long)vmf->address;
> >>>>>>>>  	struct vm_area_struct *vma = vmf->vma;
> >>>>>>>> +	unsigned long page_prot_bits;
> >>>>>>>>  	struct sgx_encl_page *entry;
> >>>>>>>> +	unsigned long vm_prot_bits;
> >>>>>>>>  	unsigned long phys_addr;
> >>>>>>>>  	struct sgx_encl *encl;
> >>>>>>>>  	vm_fault_t ret;
> >>>>>>>> @@ -155,7 +147,7 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> >>>>>>>>  
> >>>>>>>>  	mutex_lock(&encl->lock);
> >>>>>>>>  
> >>>>>>>> -	entry = sgx_encl_load_page(encl, addr, vma->vm_flags);
> >>>>>>>> +	entry = sgx_encl_load_page(encl, addr);
> >>>>>>>>  	if (IS_ERR(entry)) {
> >>>>>>>>  		mutex_unlock(&encl->lock);
> >>>>>>>   
> >>>>>>>> @@ -167,7 +159,19 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> >>>>>>>>  
> >>>>>>>>  	phys_addr = sgx_get_epc_phys_addr(entry->epc_page);
> >>>>>>>>  
> >>>>>>>> -	ret = vmf_insert_pfn(vma, addr, PFN_DOWN(phys_addr));
> >>>>>>>> +	/*
> >>>>>>>> +	 * Insert PTE to match the EPCM page permissions ensured to not
> >>>>>>>> +	 * exceed the VMA permissions.
> >>>>>>>> +	 */
> >>>>>>>> +	vm_prot_bits = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
> >>>>>>>> +	page_prot_bits = entry->vm_max_prot_bits & vm_prot_bits;
> >>>>>>>> +	/*
> >>>>>>>> +	 * Add VM_SHARED so that PTE is made writable right away if VMA
> >>>>>>>> +	 * and EPCM are writable (no COW in SGX).
> >>>>>>>> +	 */
> >>>>>>>> +	page_prot_bits |= (vma->vm_flags & VM_SHARED);
> >>>>>>>> +	ret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
> >>>>>>>> +				  vm_get_page_prot(page_prot_bits));
> >>>>>>>>  	if (ret != VM_FAULT_NOPAGE) {
> >>>>>>>>  		mutex_unlock(&encl->lock);
> >>>>>>>>  
> >>>>>>>> @@ -295,15 +299,14 @@ static int sgx_encl_debug_write(struct sgx_encl *encl, struct sgx_encl_page *pag
> >>>>>>>>   * Load an enclave page to EPC if required, and take encl->lock.
> >>>>>>>>   */
> >>>>>>>>  static struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,
> >>>>>>>> -						   unsigned long addr,
> >>>>>>>> -						   unsigned long vm_flags)
> >>>>>>>> +						   unsigned long addr)
> >>>>>>>>  {
> >>>>>>>>  	struct sgx_encl_page *entry;
> >>>>>>>>  
> >>>>>>>>  	for ( ; ; ) {
> >>>>>>>>  		mutex_lock(&encl->lock);
> >>>>>>>>  
> >>>>>>>> -		entry = sgx_encl_load_page(encl, addr, vm_flags);
> >>>>>>>> +		entry = sgx_encl_load_page(encl, addr);
> >>>>>>>>  		if (PTR_ERR(entry) != -EBUSY)
> >>>>>>>>  			break;
> >>>>>>>>  
> >>>>>>>> @@ -339,8 +342,7 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
> >>>>>>>>  		return -EFAULT;
> >>>>>>>>  
> >>>>>>>>  	for (i = 0; i < len; i += cnt) {
> >>>>>>>> -		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK,
> >>>>>>>> -					      vma->vm_flags);
> >>>>>>>> +		entry = sgx_encl_reserve_page(encl, (addr + i) & PAGE_MASK);
> >>>>>>>>  		if (IS_ERR(entry)) {
> >>>>>>>>  			ret = PTR_ERR(entry);
> >>>>>>>>  			break;
> >>>>>>>> -- 
> >>>>>>>> 2.25.1
> >>>>>>>>
> >>>>>>>
> >>>>>>> If you unconditionally set vm_max_prot_bits to RWX for dynamically created
> >>>>>>> pages, you would not need to do this.
> >>>>>>>
> >>>>>>> These patches could then be safely dropped:
> >>>>>>>
> >>>>>>> - [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions 
> >>>>>>> - [PATCH V2 08/32] x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic permission changes
> >>>>>>> - [PATCH V2 15/32] x86/sgx: Support relaxing of enclave page permissions
> >>>>>>>
> >>>>>>> And that would also keep full ABI compatibility without exceptions to the
> >>>>>>> existing mainline code.
> >>>>>>>
> >>>>>>
> >>>>>> Dropping these changes does not just impact dynamically created pages. Dropping
> >>>>>> these patches would result in EPCM page permission restriction being supported
> >>>>>> for all pages, those added before enclave initialization as well as dynamically
> >>>>>> added pages, but their PTEs will not be impacted.
> >>>>>>
> >>>>>> For example, if a RW enclave page is added via SGX_IOC_ENCLAVE_ADD_PAGES and
> >>>>>> then later made read-only via SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS then Linux
> >>>>>> would keep allowing and installing RW PTEs to this page.
> >>>>>
> >>>>> I think that would be perfectly fine, if someone wants to do that. There is
> >>>>> no collateral damage in doing that. The kernel does not get messed up because of
> >>>>> that. It's a use case that does not make sense in the first place, so it'd
> >>>>> be stupid to build anything extensive around it in the kernel.
> >>>>>
> >>>>> Shooting yourself in the foot is something that the kernel does not and should not
> >>>>> protect user space from unless there is a risk of messing up the state of the
> >>>>> kernel itself.
> >>>>>
> >>>>> Much worse is that we have e.g. completely artificial ioctl
> >>>>> SGX_IOC_ENCLAVE_RELAX_PERMISSIONS to support this scheme, which could e.g.
> >>>>> cause extra roundtrips for simple EMODPE.
> >>>>>
> >>>>> Also this means not having to include 06/32, which keeps 100% backwards
> >>>>> compatibility in run-time behaviour to the mainline while not restricting
> >>>>> at all dynamically created pages. And we get rid of complex book keeping
> >>>>> of vm_run_prot_bits.
> >>>>>
> >>>>> And generally the whole model is then very easy to understand and explain.
> >>>>> If I had to give a presentation of the current mess in the patch set at a
> >>>>> conference, I can honestly say that I would be in serious trouble. It's
> >>>>> not a clean and clear security model, which is a risk by itself.
> >>>>
> >>>> I.e.
> >>>>
> >>>> 1. For EADD'd pages: stick to what has been the invariant for 1.5 years now. Do
> >>>>    not change it by any means (e.g. 06/32).
> >>>> 2. For EAUG'd pages: set vm_max_prot_bits to RWX, which essentially means do
> >>>>    whatever you want with PTEs and EPCM.
> >>>>
> >>>> It's a clear and understandable model that does nothing bad to the kernel,
> >>>> and a run-time developer can surely find a way to get things going. For
> >>>> user space, the most important thing is the clarity in kernel behaviour,
> >>>> and this does deliver that clarity. It's not perfect but it does do the
> >>>> job and anyone can get it.
> >>>
> >>> Also a quantitative argument for this is that by simplifying the security model
> >>> this way it is one ioctl less, which must be considered as +1. We do not
> >>> want to add new ioctls unless it is something we absolutely cannot live
> >>> without. We absolutely can live without SGX_IOC_ENCLAVE_RELAX_PERMISSIONS.
> >>>
> >>
> >> ok, with the implications understood and accepted I will proceed with a new
> >> series that separates EPCM from PTEs and make RWX PTEs possible by default
> >> for EAUG pages. This has broader impact than just removing
> >> the three patches you list. "[PATCH 07/32] x86/sgx: Add pfn_mkwrite() handler
> >> for present PTEs" is also no longer needed and there is no longer a need
> >> to flush PTEs after restricting permissions. New changes also need to
> >> be considered - at least the current documentation. I'll rework the series.
> > 
> > Yes, I really think it is a solid plan. Any possible LSM hooks would most
> > likely attach to build product, not the dynamic behaviour.
> > 
> > As far as the page fault handler goes, after all the discussions Haitao is
> > correct that it makes sense. The purpose of the MAP_POPULATE series is
> > not to replace it but instead complement it. Just wanted to clear this
> > up as I said otherwise earlier this week.
> > 
> 
> Understood. I will keep the implementation where EAUG is done in page fault
> handler. I do plan to pick up your patch "x86/sgx: Export sgx_encl_page_alloc()"
> since a consequence of the other changes is that this can now be shared.

Yeah, I think we might be able to get this polished for v5.19. I'd expect
a revision or a few for polishing the corners, but other than that this looks
to be on the right track now.

> Reinette

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-03 21:44                         ` Dave Hansen
  2022-03-05  3:19                           ` Jarkko Sakkinen
@ 2022-03-10  5:43                           ` Jarkko Sakkinen
  2022-03-10  5:59                             ` Jarkko Sakkinen
  1 sibling, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-10  5:43 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Reinette Chatre, Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx,
	bp, Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Thu, Mar 03, 2022 at 01:44:14PM -0800, Dave Hansen wrote:
> On 3/3/22 13:23, Reinette Chatre wrote:
> > Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
> > VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
> > then I believe that SGX would benefit.
> 
> Some Intel folks asked for this quite a while ago.  I think it's
> entirely doable: add a new vm_ops->populate() function that will allow
> ignoring VM_IO|VM_PFNMAP if present.
> 
> Or, if nobody wants to waste all of the vm_ops space, just add an
> arch_vma_populate() or something which can call over into SGX.
> 
> I'll happily review the patches if anyone can put such a beast together.

Everyone would be better off if EAUGs were done unconditionally for
mmap() after initialization. A nice property is that this needs no core mm
changes.

The resource saving argument is at least a bit weak because you might use
EMODPR for the address range anyway. So you just end up doing things
slower. And to have good confidentiality, you probably also want to
clear dynamically added pages with EACCEPTCOPY (and a zero page) when
you take them into use.

I also find it a bit worrying that the enclave has direct access to allocate
kernel resources and trigger a ring-0 opcode. I don't like that part at
all. syscall/ioctl sets the correct barrier, as the host side should be
and is the resource manager, not the enclave.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-10  5:43                           ` Jarkko Sakkinen
@ 2022-03-10  5:59                             ` Jarkko Sakkinen
  0 siblings, 0 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-10  5:59 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Reinette Chatre, Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx,
	bp, Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Thu, Mar 10, 2022 at 07:43:42AM +0200, Jarkko Sakkinen wrote:
> On Thu, Mar 03, 2022 at 01:44:14PM -0800, Dave Hansen wrote:
> > On 3/3/22 13:23, Reinette Chatre wrote:
> > > Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
> > > VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
> > > then I believe that SGX would benefit.
> > 
> > Some Intel folks asked for this quite a while ago.  I think it's
> > entirely doable: add a new vm_ops->populate() function that will allow
> > ignoring VM_IO|VM_PFNMAP if present.
> > 
> > Or, if nobody wants to waste all of the vm_ops space, just add an
> > arch_vma_populate() or something which can call over into SGX.
> > 
> > I'll happily review the patches if anyone can put such a beast together.
> 
> Everyone would be better off, if EAUG's were done unconditionally for
> mmap() after initialization. Nice property is that this needs no core mm
> changes.
> 
> The resource saving argument is at least a bit weak because you might use
> EMODPR for the address range anyway. So you end up doing things just
> slower. And to have good confidentiality, you actually probably want to
> clear also dynamically added pages with EACCEPTCOPY (and zero page) when
> you take them into use.
> 
> I find it also a bit worrying that enclave has direct access to allocate
> kernel resources and trigger ring-0 opcode. I don't like that part at
> all. syscall/ioctl sets the correct barrier, as the host side should be
> and is the resource manager, not the enclave.

Actually, this should be ABI compatible too. I'd expect all kselftests
to continue to work as they are.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-02-23 19:21     ` Dhanraj, Vijay
  2022-02-23 22:42       ` Reinette Chatre
  2022-02-28 12:24       ` Jarkko Sakkinen
@ 2022-03-10  6:10       ` Jarkko Sakkinen
  2022-03-10 18:33         ` Haitao Huang
  2 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-10  6:10 UTC (permalink / raw)
  To: Dhanraj, Vijay
  Cc: Chatre, Reinette, dave.hansen, tglx, bp, Lutomirski, Andy, mingo,
	linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
> Hi All,
> 
> Regarding the recent update of splitting the page permissions change
> request into two IOCTLS (RELAX and RESTRICT), can we combine them into
> one? That is, revert to how it was done in the v1 version?
> 
> Why? Currently in Gramine (a library OS for unmodified applications,
> https://gramineproject.io/) with the new proposed change, one needs to
> store the page permission for each page or range of pages. And for every
> request of `mmap` or `mprotect`, Gramine would have to do a lookup of the
> page permissions for the request range and then call the respective IOCTL
> either RESTRICT or RELAX. This seems a little overwhelming.
> 
> Request: Instead, can we do `MODPE`,  call `RESTRICT` IOCTL, and then do
> an `EACCEPT` irrespective of RELAX or RESTRICT page permission request?
> With this approach, we can avoid storing  page permissions and simplify
> the implementation.
> 
> I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK` flows
> to do TLB shootdowns which might not be needed for RELAX IOCTL but I am
> not sure what will be the performance impact. Is there any data point to
> see the performance impact?
> 
> Thanks,
> -Vijay

This should get better in the next version. "relax" is gone. And for
dynamic EAUG'd pages only VMA and EPCM permissions matter, i.e.
internal vm_max_prot_bits is set to RWX.

I patched the existing series eno

For Enarx I'm using the following patterns.

Shim mmap() handler:
1. Ask host for mmap() syscall.
2. Construct secinfo matching the protection bits.
3. For each page in the address range: EACCEPTCOPY with a
   zero page.
    
Shim mprotect() handler:
1. Ask host for mprotect() syscall.
2. For each page in the address range: EACCEPT with PROT_NONE
   secinfo and EMODPE with the secinfo having the prot bits.
    
Backend mprotect() handler:
1. Invoke ENCLAVE_RESTRICT_PERMISSIONS ioctl for the address
   range with PROT_NONE.
2. Invoke real mprotect() syscall.

Not super-complicated.

That is the safest way to change permissions, i.e. use EMODPR only to reset
the permissions, and EMODPE as EMODP. Then the page is always either
completely inaccessible or has the correct permissions.

Any other ways to use EMODPR are a bit questionable. That's why I tend to
think that it would be better for the kernel to provide only a limited
version of it to reset the permissions. Most other uses will most likely be
misuse. IMHO there is only one legit pattern to use it, i.e. the "least
racy" pattern.

I would replace SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS with
SGX_IOC_ENCLAVE_RESET_PERMISSIONS that resets pages to PROT_NONE or embed
this straight into mprotect().

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-10  6:10       ` Jarkko Sakkinen
@ 2022-03-10 18:33         ` Haitao Huang
  2022-03-11 12:10           ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Haitao Huang @ 2022-03-10 18:33 UTC (permalink / raw)
  To: Dhanraj, Vijay, Jarkko Sakkinen
  Cc: Chatre, Reinette, dave.hansen, tglx, bp, Lutomirski, Andy, mingo,
	linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

Hi Jarkko

I have some trouble understanding the sequences below.

On Thu, 10 Mar 2022 00:10:48 -0600, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
>> Hi All,
>>
>> Regarding the recent update of splitting the page permissions change
>> request into two IOCTLS (RELAX and RESTRICT), can we combine them into
>> one? That is, revert to how it was done in the v1 version?
>>
>> Why? Currently in Gramine (a library OS for unmodified applications,
>> https://gramineproject.io/) with the new proposed change, one needs to
>> store the page permission for each page or range of pages. And for every
>> request of `mmap` or `mprotect`, Gramine would have to do a lookup of  
>> the
>> page permissions for the request range and then call the respective  
>> IOCTL
>> either RESTRICT or RELAX. This seems a little overwhelming.
>>
>> Request: Instead, can we do `MODPE`,  call `RESTRICT` IOCTL, and then do
>> an `EACCEPT` irrespective of RELAX or RESTRICT page permission request?
>> With this approach, we can avoid storing  page permissions and simplify
>> the implementation.
>>
>> I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK`  
>> flows
>> to do TLB shootdowns which might not be needed for RELAX IOCTL but I am
>> not sure what will be the performance impact. Is there any data point to
>> see the performance impact?
>>
>> Thanks,
>> -Vijay
>
> This should get better in the next version. "relax" is gone. And for
> dynamic EAUG'd pages only VMA and EPCM permissions matter, i.e.
> internal vm_max_prot_bits is set to RWX.
>
> I patched the existing series eno
>
> For Enarx I'm using the following patterns.
>
> Shim mmap() handler:
> 1. Ask host for mmap() syscall.
> 2. Construct secinfo matching the protection bits.
> 3. For each page in the address range: EACCEPTCOPY with a
>    zero page.

For EACCEPTCOPY to work, I believe PTE.RW is required for the target page.  
So this only works for mmap(..., RW) or mmap(...,RWX).

So that gives you pages with RW/RWX.

To change permissions of any of those pages from RW/RWX to R/RX, you need  
to call the ENCLAVE_RESTRICT_PERMISSIONS ioctl with R or with PROT_NONE. You  
can't just do EMODPE.

so for RW->R, you either:

1)EMODPR(EPCM.NONE)
2)EACCEPT(EPCM.NONE)
3)EMODPE(R) -- not sure this would work as spec says EMODPE requires "Read  
access permitted by enclave"

or:

1)EMODPR(EPCM.PROT_R)
2)EACCEPT(EPCM.PROT_R)


> Shim mprotect() handler:
> 1. Ask host for mprotect() syscall.
> 2. For each page in the address range: EACCEPT with PROT_NONE
>    secinfo and EMODPE with the secinfo having the prot bits.

EACCEPT requires PTE.R. And EAUG'd pages will always be initialized with  
EPCM.RW,
so EACCEPT(EPCM.PROT_NONE) will fail with SGX_PAGE_ATTRIBUTES_MISMATCH.


> Backend mprotect() handler:
> 1. Invoke ENCLAVE_RESTRICT_PERMISSIONS ioctl for the address
>    range with PROT_NONE.
> 2. Invoke real mprotect() syscall.
>
Note #1 can only be done after EACCEPT. EMODPR is not allowed for pending  
pages.

> Not super-complicated.
>
> That is the safest way to changes permissions i.e. use EMODPR only to  
> reset
> the permissions, and EMODPE as EMODP. Then the page is always either
> inaccessible completely or with the correct permissions.
>
> Any other ways to use EMODPR are a bit questionable. That's why I tend to
> think that it would be better to kernel provide only limited version of  
> it
> to reset the permissions. Most of the other use will be most likely
> mis-use. IMHO there is only one legit pattern to use it, i.e. "least
> racy" pattern.
>

I don't see it as "racy" if you copy some data into an RW page and reduce it  
to R.
From the kernel's point of view the only difference is EMODPR(NONE) vs EMODPR(R).

It's more efficient to do just EMODPR(R) than EMODPR(NONE)+ EMODPE(R).


Thanks
Haitao

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions
  2022-03-08 17:00               ` Jarkko Sakkinen
  2022-03-08 17:49                 ` Reinette Chatre
@ 2022-03-11 11:06                 ` Dr. Greg
  1 sibling, 0 replies; 130+ messages in thread
From: Dr. Greg @ 2022-03-11 11:06 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Reinette Chatre, dave.hansen, tglx, bp, luto, mingo, linux-sgx,
	x86, seanjc, kai.huang, cathy.zhang, cedric.xing, haitao.huang,
	mark.shanahan, hpa, linux-kernel, corbet

On Tue, Mar 08, 2022 at 07:00:03PM +0200, Jarkko Sakkinen wrote:

Good morning, I hope this note finds the week ending well for
everyone.

Based on previous experiences, I wasn't going to respond to this
conversation.  However, after seeing Thomas' response to Cathy's
microcode patch series, and reflecting on things a bit, I thought it
would be useful to do so in order to stimulate further scholarly discussion.

For the record, Thomas was spot-on with concerns about how much sense
it makes to persist enclaves through a microcode update.  Everyone
despises downtime, but if you are serious about security, idling a machine
out to fully 'reset it', in the face of a microcode update seems to be
the only prudent action.

However, with all due respect, the assertion that Linux is all about
the highest standards in technical honesty and soundness of
engineering is specious, at least in the context of the development
of this driver.

I see that Jarkko noted his conversations with Mark Shanahan on SGX
micro-architectural details.  For the record, I have been fortunate
enough to have engaged with Shanahan, Rozas, Johnson and a number of
the other engineers behind this technology.  I've also had the
opportunity to engage with and provide recommendations to the SGX
engineering team in Israel, before the demise of the Platform Security
Group and the exile of SGX to servers only.

I believe that Simon Johnson received a Principal Engineer nomination
for his work on SGX.  He called me one morning here at the lake and we
had a rather lengthy and entertaining discussion about the issues
surrounding doing packet inspection from inside of an enclave, I
believe at 40 Gbps line rates, maybe 10, I can't remember.  In any
event, way faster than what could be accomplished in the face of the
~60,000 cycle overhead of untrusted<->trusted context switches, which
in turn motivated the 'switchless' architecture.

So with the length and girth discussions at bay, reflections on the
technical and security issues at hand follow below, for whatever they
are worth.

> On Tue, Mar 08, 2022 at 08:04:33AM -0800, Reinette Chatre wrote:
> > ok, with the implications understood and accepted I will proceed
> > with a new series that separates EPCM from PTEs and make RWX PTEs
> > possible by default for EAUG pages. This has broader impact than
> > just removing the three patches you list. "[PATCH 07/32] x86/sgx:
> > Add pfn_mkwrite() handler for present PTEs" is also no longer
> > needed and there is no longer a need to flush PTEs after
> > restricting permissions. New changes also need to be considered -
> > at least the current documentation. I'll rework the series.

> Yes, I really think it is a solid plan. Any possible LSM hooks would
> most likely attach to build product, not the dynamic behaviour.

I assume everyone remembers, if not the kernel archives will have full
details, that we had a very lively discussion about these issues
starting well over two years ago.

Jarkko's rather dogmatic assertion that it should simply be 'The
Wild West' with respect to PTE memory permissions on dynamically
allocated enclave memory suggests that all of the hand wringing and
proselytizing about SGX being a way to circumvent LSM controls on
executable memory was political and technical grandstanding,
amounting to nothing but security theater.

Apologies if this is perceived to be a bit strident, it had been a
long week already on Wednesday morning.

I made the point at the time, that remained unacknowledged, that none
of the machinations involved had any practical security value with
respect to where everyone wanted this technology to go, ie. a driver
with full Enclave Dynamic Memory Management (EDMM) support, which is
the precipice on which we now stand.

I noted that the only valid security controls for this technology were
reputational controls based on cryptographic identities.  In fact, we
developed, and posted, a rather complete implementation of such an
infrastructure.  Here is a URL to the last patch that we had time to
fuss with putting up:

ftp://ftp.enjellic.com/pub/sgx/kernel/SFLC-5.12.patch

I think we have a 5.13, if not a 5.14 patch laying around as well.

The response, from one of the luminaries in these discussions, was:
"I dare you to post the patches so I can immediately NACK them".

I guess that pretty much covers the question of why there may be
perceived reluctance in some quarters about spending time trying to
upstream kernel functionality.

So, it was with some interest, that I noted Reinette Chatre's recent
e-mail which indicated, that in the face of the EDMM driver and the
security implications it presents, a proof-of-concept implementation
for reputational security controls had been developed.  That
implementation is based on MRENCLAVE and/or MRSIGNER values, both
cryptographically based identities, as was ours.

We didn't bother with MRENCLAVE values, though, for largely the same
reason that SGX_KEYPOLICY_MRENCLAVE isn't considered useful for
symmetric key generation inside of an enclave, with perhaps the
exception of shrouding keys to defeat speculation attacks.

So, to assist the conversation, and for the 'Lore' record.

In an EDMM environment, anyone with adverse intent is going to simply
ship an enclave that amounts to nothing more than a bootloader.  Said
enclave will set up a network connection to an external code
repository, which will verify that it is only talking to a known
enclave through remote attestation, and then download whatever code,
via a cryptographically secured connection, that they actually want to
run in the enclave.

How do I know that?

I know that because we were paid to develop those types of systems by
customers who wanted to run proprietary and/or confidential code in
the 'cloud'.  It was interesting to see the number of groups that
looked at SGX as a means to protect their 'secret sauce'.

I conclude, from Jarkko's comment above, that the kernel is going to
simply ignore this threat scenario, which seems vaguely unwise.

So, perhaps the best way to advance a profitable discussion, is for
the involved kernel developers to state, for the benefit of those of
us less enlightened, how effective LSMs are going to be developed for
the EDMM threat model.

I can offer up a few strawman approaches:

- Refuse any socket connections to an application that maps an enclave.

- Refuse to allow an application mapping an enclave to access any
files, particularly if it looks like they contain encrypted content.

- Count the number of page mappings requested and decide that NN pages
are OK but NN+ are not, MAP_POPULATE returns E2BIG??

- Implement seccomp like controls to analyze and interpret OCALL
behavior.

The first three seem a bit like show-stoppers when it comes to doing
anything useful with SGX; seccomp has a reputation of being hard to
get 'right', and that would seem to be particularly the case here.

Perhaps something more exotic.

How about a BPF enabled LSM that monitors enclave page access
patterns, so a concerned system administrator can build custom
variants of the directed page side channel attack that Oakland
demonstrated against Haven.  Oakland, in his conclusions, described
their approach as 'devastating' to the notion that SGX could prevent an
adversarial OS from knowing what is going on inside of an enclave.

The conundrum is pretty simple and straightforward.  Either the
technology works, as advertised, which means, by definition, that the
operating system has no effective insight into what an enclave is
doing.

Or some of the kernel developers are right, and there is a way for the
OS to have effective insight and thus control, over what an enclave is
trying to do.  A fact that effectively implies that Fortanix, Asylo,
Enarx, Gramine, Occlum et al. are peddling the equivalent of Security
Snake Oil when it comes to SGX enabled 'confidential' computing.

The above shouldn't be considered pejorative to those products,
companies or initiatives.  I have an acquaintance in the IT business
who tells me I worry too much about whether things work and how, because
he has made a ton of money selling people stuff that he knows doesn't
work.

For the record, I'm completely ambivalent with respect to how any of
this gets done or what the PTE permissions for dynamic content are.
For those who may not be ambivalent about whether Linux gets security
'right', let me leave the following thoughts.

It hasn't been my experience that good engineers design things or
processes for the sake of designing them.  The architecture for
EDMM/SGX was proposed and presented in the form of two papers at
HASP-2016.

Let's see, the operating system 'flow' model counts as an author none
other than Mark Shanahan himself, with Bin Xing and Rebekah Hurd.

The SGX instructions and architecture paper was done by Frank McKeen,
Ilya Alexandrovich, Ittai Anati, Dror Caspi, Simon Johnson, Rebekah
Hurd and Carlos Rozas.

The interactions I've had left me feeling like these were really smart
people.  Maybe they were on psychedelics when they designed this and
wrote the papers?  It would seem to be helpful for these discussions
to know if this was the case.

Or maybe, perhaps, there are some subtleties, hidden 'gotchas',
micro-architectural peculiarities and/or security considerations that
influenced the documented EDMM flow.  Seems like secure kernel
development would benefit from knowing that, rather than concluding
that this method is slow, and perhaps hard to implement, so Linux is
going to ignore these potential issues.

The EDMM papers, if anyone should happen to read them, indicate in the
acknowledgments that the design was done in collaboration with OS
designers.  I'm presuming that was Peinado's group at Microsoft
Research, given that is where Haven came from, maybe they were
dabbling with psychedelics.

I believe they did manage to document the early micro-architectural
errata about certain page access patterns causing processor faults.
That leads one to believe they couldn't have been too confused about
what was going on.

When I brought all of this up last time I was told I was trying to
present a 'scary boogeyman'.  That could be, I will leave that for
others to judge.

My pragmatic response to that accusation is: why would we argue to have
systemd and the distros change system behaviors to accommodate the
ability to apply an LSM to a 500K 'bootloader', when that
'bootloader' can turn around and potentially pull down gigabytes of
code that we appear to have decided doesn't need any controls applied
to it whatsoever?

Caution would seem to suggest the need to understand the implications
of these issues a bit better than we currently do.

> Thank you.
> 
> BR, Jarkko

Best wishes for a pleasant weekend to everyone.

Dr. Greg

As always,
Dr. Greg Wettstein, Ph.D, Worker      Autonomously self-defensive
Enjellic Systems Development, LLC     IOT platforms and edge devices.
4206 N. 19th Ave.
Fargo, ND  58102
PH: 701-281-1686                      EMAIL: dg@enjellic.com
------------------------------------------------------------------------------
"Thinking implies disagreement; and disagreement implies
 non-conformity; and non-conformity implies heresy; and heresy implies
 disloyalty -- so obviously thinking must be stopped"
                                -- [Call to Greatness, 1954]
                                   Adlai Stevenson

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-10 18:33         ` Haitao Huang
@ 2022-03-11 12:10           ` Jarkko Sakkinen
  2022-03-11 12:16             ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-11 12:10 UTC (permalink / raw)
  To: Haitao Huang, Chatre, Reinette
  Cc: Dhanraj, Vijay, Chatre, Reinette, dave.hansen, tglx, bp,
	Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Thu, Mar 10, 2022 at 12:33:20PM -0600, Haitao Huang wrote:
> Hi Jarkko
> 
> I have some trouble understanding the sequences below.
> 
> On Thu, 10 Mar 2022 00:10:48 -0600, Jarkko Sakkinen <jarkko@kernel.org>
> wrote:
> 
> > On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
> > > Hi All,
> > > 
> > > Regarding the recent update of splitting the page permissions change
> > > request into two IOCTLS (RELAX and RESTRICT), can we combine them into
> > > one? That is, revert to how it was done in the v1 version?
> > > 
> > > Why? Currently in Gramine (a library OS for unmodified applications,
> > > https://gramineproject.io/) with the new proposed change, one needs to
> > > store the page permission for each page or range of pages. And for every
> > > request of `mmap` or `mprotect`, Gramine would have to do a lookup
> > > of the
> > > page permissions for the request range and then call the respective
> > > IOCTL
> > > either RESTRICT or RELAX. This seems a little overwhelming.
> > > 
> > > Request: Instead, can we do `MODPE`,  call `RESTRICT` IOCTL, and then do
> > > an `EACCEPT` irrespective of RELAX or RESTRICT page permission request?
> > > With this approach, we can avoid storing  page permissions and simplify
> > > the implementation.
> > > 
> > > I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK`
> > > flows
> > > to do TLB shootdowns which might not be needed for RELAX IOCTL but I am
> > > not sure what will be the performance impact. Is there any data point to
> > > see the performance impact?
> > > 
> > > Thanks,
> > > -Vijay
> > 
> > This should get better in the next versuin. "relax" is gone. And for
> > dynamic EAUG'd pages only VMA and EPCM permissions matter, i.e.
> > internal vm_max_prot_bits is set to RWX.
> > 
> > I patched the existing series eno
> > 
> > For Enarx I'm using the following patterns.
> > 
> > Shim mmap() handler:
> > 1. Ask host for mmap() syscall.
> > 2. Construct secinfo matching the protection bits.
> > 3. For each page in the address range: EACCEPTCOPY with a
> >    zero page.
> 
> For EACCEPTCOPY to work, I believe PTE.RW is required for the target page.
> So this only works for mmap(..., RW) or mmap(...,RWX).

I use it only with EAUG.

> So that gives you pages with RW/RWX.
> 
> To change permissions of any of those pages from RW/RWX to R/RX , you need
> call ENCLAVE_RESTRICT_PERMISSIONS ioctl with R or with PROT_NONE. you can't
> just do EMODPE.
> 
> so for RW->R, you either:
> 
> 1)EMODPR(EPCM.NONE)
> 2)EACCEPT(EPCM.NONE)
> 3)EMODPE(R) -- not sure this would work as spec says EMODPE requires "Read
> access permitted by enclave"
> 
> or:
> 
> 1)EMODPR(EPCM.PROT_R)
> 2)EACCEPT(EPCM.PROT_R)

I checked the SDM and you're correct.

Then the appropriate thing is to reset to R.
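
To make the two RW->R sequences above concrete, here is a toy model in C.
It is only a sketch, not kernel or SDM code: the model_* helpers and the
PERM_* values are made up for illustration, and the two "needs read
access" rules are assumptions taken from the "Read access permitted by
enclave" notes discussed in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy permission bits for this model only (hypothetical values). */
#define PERM_R 0x1u
#define PERM_W 0x2u
#define PERM_X 0x4u

/* EMODPR model: permissions can only be restricted (intersection). */
static uint8_t model_emodpr(uint8_t epcm, uint8_t requested)
{
	return epcm & requested;
}

/* EMODPE model: permissions can only be extended.
 * Assumption: fails when the page is not readable by the enclave. */
static bool model_emodpe(uint8_t *epcm, uint8_t requested)
{
	if (!(*epcm & PERM_R))
		return false;
	*epcm |= requested;
	return true;
}

/* EACCEPT model: the enclave confirms the permissions it expects.
 * Assumption: fails when the page is not readable by the enclave. */
static bool model_eaccept(uint8_t epcm, uint8_t expected)
{
	return (epcm & PERM_R) && epcm == expected;
}
```

In this model, model_emodpr(PERM_R | PERM_W, PERM_R) followed by
model_eaccept(..., PERM_R) succeeds, while going through 0 (PROT_NONE)
leaves a page that can neither be accepted nor extended again.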

> > Shim mprotect() handler:
> > 1. Ask host for mprotect() syscall.
> > 2. For each page in the address range: EACCEPT with PROT_NONE
> >    secinfo and EMODPE with the secinfo having the prot bits.
> 
> EACCEPT requires PTE.R. And EAUG'd pages will always initialized with
> EPCM.RW,
> so EACCEPT(EPCM.PROT_NONE) will fail with SGX_PAGE_ATTRIBUTES_MISMATCH.

Ditto.

> > Backend mprotect() handler:
> > 1. Invoke ENCLAVE_RESTRICT_PERMISSIONS ioctl for the address
> >    range with PROT_NONE.
> > 2. Invoke real mprotect() syscall.
> > 
> Note #1 can only be done after EACCEPT. MODPR is not allowed for pending
> pages.

Yes, and that's what I'm doing. After that shim does EACCEPT's in a loop.

Reinette, the ioctl should already check that either R or W is set in
secinfo and return -EACCES.

I.e.

(* Check for misconfigured SECINFO flags*)
IF ( (SCRATCH_SECINFO reserved fields are not zero ) or
(SCRATCH_SECINFO.FLAGS.R is 0 and SCRATCH_SECINFO.FLAGS.W is not 0) )
THEN #GP(0); FI;

I was testing this and wondering why my enclave #GP's, and then I checked
the SDM after reading Haitao's response. So clearly a check on the kernel
side is needed.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-11 12:10           ` Jarkko Sakkinen
@ 2022-03-11 12:16             ` Jarkko Sakkinen
  2022-03-11 12:33               ` Jarkko Sakkinen
  2022-03-11 17:53               ` Reinette Chatre
  0 siblings, 2 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-11 12:16 UTC (permalink / raw)
  To: Haitao Huang, Chatre, Reinette
  Cc: Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski, Andy, mingo,
	linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Fri, Mar 11, 2022 at 02:10:24PM +0200, Jarkko Sakkinen wrote:
> On Thu, Mar 10, 2022 at 12:33:20PM -0600, Haitao Huang wrote:
> > Hi Jarkko
> > 
> > I have some trouble understanding the sequences below.
> > 
> > On Thu, 10 Mar 2022 00:10:48 -0600, Jarkko Sakkinen <jarkko@kernel.org>
> > wrote:
> > 
> > > On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
> > > > Hi All,
> > > > 
> > > > Regarding the recent update of splitting the page permissions change
> > > > request into two IOCTLS (RELAX and RESTRICT), can we combine them into
> > > > one? That is, revert to how it was done in the v1 version?
> > > > 
> > > > Why? Currently in Gramine (a library OS for unmodified applications,
> > > > https://gramineproject.io/) with the new proposed change, one needs to
> > > > store the page permission for each page or range of pages. And for every
> > > > request of `mmap` or `mprotect`, Gramine would have to do a lookup
> > > > of the
> > > > page permissions for the request range and then call the respective
> > > > IOCTL
> > > > either RESTRICT or RELAX. This seems a little overwhelming.
> > > > 
> > > > Request: Instead, can we do `MODPE`,  call `RESTRICT` IOCTL, and then do
> > > > an `EACCEPT` irrespective of RELAX or RESTRICT page permission request?
> > > > With this approach, we can avoid storing  page permissions and simplify
> > > > the implementation.
> > > > 
> > > > I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK`
> > > > flows
> > > > to do TLB shootdowns which might not be needed for RELAX IOCTL but I am
> > > > not sure what will be the performance impact. Is there any data point to
> > > > see the performance impact?
> > > > 
> > > > Thanks,
> > > > -Vijay
> > > 
> > > This should get better in the next versuin. "relax" is gone. And for
> > > dynamic EAUG'd pages only VMA and EPCM permissions matter, i.e.
> > > internal vm_max_prot_bits is set to RWX.
> > > 
> > > I patched the existing series eno
> > > 
> > > For Enarx I'm using the following patterns.
> > > 
> > > Shim mmap() handler:
> > > 1. Ask host for mmap() syscall.
> > > 2. Construct secinfo matching the protection bits.
> > > 3. For each page in the address range: EACCEPTCOPY with a
> > >    zero page.
> > 
> > For EACCEPTCOPY to work, I believe PTE.RW is required for the target page.
> > So this only works for mmap(..., RW) or mmap(...,RWX).
> 
> I use it only with EAUG.
> 
> > So that gives you pages with RW/RWX.
> > 
> > To change permissions of any of those pages from RW/RWX to R/RX , you need
> > call ENCLAVE_RESTRICT_PERMISSIONS ioctl with R or with PROT_NONE. you can't
> > just do EMODPE.
> > 
> > so for RW->R, you either:
> > 
> > 1)EMODPR(EPCM.NONE)
> > 2)EACCEPT(EPCM.NONE)
> > 3)EMODPE(R) -- not sure this would work as spec says EMODPE requires "Read
> > access permitted by enclave"
> > 
> > or:
> > 
> > 1)EMODPR(EPCM.PROT_R)
> > 2)EACCEPT(EPCM.PROT_R)
> 
> I checked from SDM and you're correct.
> 
> Then the appropriate thing is to reset to R.
> 
> > > Shim mprotect() handler:
> > > 1. Ask host for mprotect() syscall.
> > > 2. For each page in the address range: EACCEPT with PROT_NONE
> > >    secinfo and EMODPE with the secinfo having the prot bits.
> > 
> > EACCEPT requires PTE.R. And EAUG'd pages will always initialized with
> > EPCM.RW,
> > so EACCEPT(EPCM.PROT_NONE) will fail with SGX_PAGE_ATTRIBUTES_MISMATCH.
> 
> Ditto.
> 
> > > Backend mprotect() handler:
> > > 1. Invoke ENCLAVE_RESTRICT_PERMISSIONS ioctl for the address
> > >    range with PROT_NONE.
> > > 2. Invoke real mprotect() syscall.
> > > 
> > Note #1 can only be done after EACCEPT. MODPR is not allowed for pending
> > pages.
> 
> Yes, and that's what I'm doing. After that shim does EACCEPT's in a loop.
> 
> Reinette, the ioctl should already check that either R or W is set in
> secinfo and return -EACCES.
> 
> I.e.
> 
> (* Check for misconfigured SECINFO flags*)
> IF ( (SCRATCH_SECINFO reserved fields are not zero ) or
> (SCRATCH_SECINFO.FLAGS.R is 0 and SCRATCH_SECINFO.FLAGS.W is not 0) )
> THEN #GP(0); FI;
> 
> I was testing this and wondering why my enclave #GP's, and then I checked
> SDM after reading Haitao's response. So clearly check in kernel side is
> needed.

I would also consider adding such a check to "add pages". It's our least
common denominator.

If we can assume that at least R is there for every enclave page, then it
gives an invariant that enables EMODPR with R all the time.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-11 12:16             ` Jarkko Sakkinen
@ 2022-03-11 12:33               ` Jarkko Sakkinen
  2022-03-11 17:53               ` Reinette Chatre
  1 sibling, 0 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-11 12:33 UTC (permalink / raw)
  To: Haitao Huang, Chatre, Reinette
  Cc: Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski, Andy, mingo,
	linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Fri, Mar 11, 2022 at 02:16:47PM +0200, Jarkko Sakkinen wrote:
> On Fri, Mar 11, 2022 at 02:10:24PM +0200, Jarkko Sakkinen wrote:
> > On Thu, Mar 10, 2022 at 12:33:20PM -0600, Haitao Huang wrote:
> > > Hi Jarkko
> > > 
> > > I have some trouble understanding the sequences below.
> > > 
> > > On Thu, 10 Mar 2022 00:10:48 -0600, Jarkko Sakkinen <jarkko@kernel.org>
> > > wrote:
> > > 
> > > > On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
> > > > > Hi All,
> > > > > 
> > > > > Regarding the recent update of splitting the page permissions change
> > > > > request into two IOCTLS (RELAX and RESTRICT), can we combine them into
> > > > > one? That is, revert to how it was done in the v1 version?
> > > > > 
> > > > > Why? Currently in Gramine (a library OS for unmodified applications,
> > > > > https://gramineproject.io/) with the new proposed change, one needs to
> > > > > store the page permission for each page or range of pages. And for every
> > > > > request of `mmap` or `mprotect`, Gramine would have to do a lookup
> > > > > of the
> > > > > page permissions for the request range and then call the respective
> > > > > IOCTL
> > > > > either RESTRICT or RELAX. This seems a little overwhelming.
> > > > > 
> > > > > Request: Instead, can we do `MODPE`,  call `RESTRICT` IOCTL, and then do
> > > > > an `EACCEPT` irrespective of RELAX or RESTRICT page permission request?
> > > > > With this approach, we can avoid storing  page permissions and simplify
> > > > > the implementation.
> > > > > 
> > > > > I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK`
> > > > > flows
> > > > > to do TLB shootdowns which might not be needed for RELAX IOCTL but I am
> > > > > not sure what will be the performance impact. Is there any data point to
> > > > > see the performance impact?
> > > > > 
> > > > > Thanks,
> > > > > -Vijay
> > > > 
> > > > This should get better in the next versuin. "relax" is gone. And for
> > > > dynamic EAUG'd pages only VMA and EPCM permissions matter, i.e.
> > > > internal vm_max_prot_bits is set to RWX.
> > > > 
> > > > I patched the existing series eno
> > > > 
> > > > For Enarx I'm using the following patterns.
> > > > 
> > > > Shim mmap() handler:
> > > > 1. Ask host for mmap() syscall.
> > > > 2. Construct secinfo matching the protection bits.
> > > > 3. For each page in the address range: EACCEPTCOPY with a
> > > >    zero page.
> > > 
> > > For EACCEPTCOPY to work, I believe PTE.RW is required for the target page.
> > > So this only works for mmap(..., RW) or mmap(...,RWX).
> > 
> > I use it only with EAUG.
> > 
> > > So that gives you pages with RW/RWX.
> > > 
> > > To change permissions of any of those pages from RW/RWX to R/RX , you need
> > > call ENCLAVE_RESTRICT_PERMISSIONS ioctl with R or with PROT_NONE. you can't
> > > just do EMODPE.
> > > 
> > > so for RW->R, you either:
> > > 
> > > 1)EMODPR(EPCM.NONE)
> > > 2)EACCEPT(EPCM.NONE)
> > > 3)EMODPE(R) -- not sure this would work as spec says EMODPE requires "Read
> > > access permitted by enclave"
> > > 
> > > or:
> > > 
> > > 1)EMODPR(EPCM.PROT_R)
> > > 2)EACCEPT(EPCM.PROT_R)
> > 
> > I checked from SDM and you're correct.
> > 
> > Then the appropriate thing is to reset to R.
> > 
> > > > Shim mprotect() handler:
> > > > 1. Ask host for mprotect() syscall.
> > > > 2. For each page in the address range: EACCEPT with PROT_NONE
> > > >    secinfo and EMODPE with the secinfo having the prot bits.
> > > 
> > > EACCEPT requires PTE.R. And EAUG'd pages will always initialized with
> > > EPCM.RW,
> > > so EACCEPT(EPCM.PROT_NONE) will fail with SGX_PAGE_ATTRIBUTES_MISMATCH.
> > 
> > Ditto.
> > 
> > > > Backend mprotect() handler:
> > > > 1. Invoke ENCLAVE_RESTRICT_PERMISSIONS ioctl for the address
> > > >    range with PROT_NONE.
> > > > 2. Invoke real mprotect() syscall.
> > > > 
> > > Note #1 can only be done after EACCEPT. MODPR is not allowed for pending
> > > pages.
> > 
> > Yes, and that's what I'm doing. After that shim does EACCEPT's in a loop.
> > 
> > Reinette, the ioctl should already check that either R or W is set in
> > secinfo and return -EACCES.
> > 
> > I.e.
> > 
> > (* Check for misconfigured SECINFO flags*)
> > IF ( (SCRATCH_SECINFO reserved fields are not zero ) or
> > (SCRATCH_SECINFO.FLAGS.R is 0 and SCRATCH_SECINFO.FLAGS.W is not 0) )
> > THEN #GP(0); FI;
> > 
> > I was testing this and wondering why my enclave #GP's, and then I checked
> > SDM after reading Haitao's response. So clearly check in kernel side is
> > needed.
> 
> I would consider also adding such check "add pages". It's our least common
> denominator.
> 
> If we can assume that at least R is there for every enclave page, then it
> gives invariant that enables EMODPR with R all the time.

Since EAUG is already done in the #PF handler, EMODPR must be too. Otherwise
we do things inconsistently [*]. One being in the #PF handler and the other
being an ioctl is unacceptable.

Moving EMODPR to #PF handler would be trivial:

1. In mprotect() callback unmap PTE's for
   the range.
2. In #PF handler, EMODPR with read permissions.

This is something that would be understandable for the user space. The only
API ever required would be EMODPE for permission changes. You could
basically implement the whole thing for EPCM inside enclave with no ioctls
required.

That would leave only two ioctls in the series:
1. SGX_IOC_ENCLAVE_MODIFY_TYPE
2. SGX_IOC_ENCLAVE_REMOVE_PAGES

[*] For me, sticking to the #PF handler for EAUG is fine for the first
mainline version. The API side is far more critical.

BR, Jarkko


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-11 12:16             ` Jarkko Sakkinen
  2022-03-11 12:33               ` Jarkko Sakkinen
@ 2022-03-11 17:53               ` Reinette Chatre
  2022-03-11 18:11                 ` Jarkko Sakkinen
  2022-03-14  2:49                 ` Jarkko Sakkinen
  1 sibling, 2 replies; 130+ messages in thread
From: Reinette Chatre @ 2022-03-11 17:53 UTC (permalink / raw)
  To: Jarkko Sakkinen, Haitao Huang
  Cc: Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski, Andy, mingo,
	linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

Hi Jarkko,

On 3/11/2022 4:16 AM, Jarkko Sakkinen wrote:
> On Fri, Mar 11, 2022 at 02:10:24PM +0200, Jarkko Sakkinen wrote:
>> On Thu, Mar 10, 2022 at 12:33:20PM -0600, Haitao Huang wrote:
>>> Hi Jarkko
>>>
>>> I have some trouble understanding the sequences below.
>>>
>>> On Thu, 10 Mar 2022 00:10:48 -0600, Jarkko Sakkinen <jarkko@kernel.org>
>>> wrote:
>>>
>>>> On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
>>>>> Hi All,
>>>>>
>>>>> Regarding the recent update of splitting the page permissions change
>>>>> request into two IOCTLS (RELAX and RESTRICT), can we combine them into
>>>>> one? That is, revert to how it was done in the v1 version?
>>>>>
>>>>> Why? Currently in Gramine (a library OS for unmodified applications,
>>>>> https://gramineproject.io/) with the new proposed change, one needs to
>>>>> store the page permission for each page or range of pages. And for every
>>>>> request of `mmap` or `mprotect`, Gramine would have to do a lookup
>>>>> of the
>>>>> page permissions for the request range and then call the respective
>>>>> IOCTL
>>>>> either RESTRICT or RELAX. This seems a little overwhelming.
>>>>>
>>>>> Request: Instead, can we do `MODPE`,  call `RESTRICT` IOCTL, and then do
>>>>> an `EACCEPT` irrespective of RELAX or RESTRICT page permission request?
>>>>> With this approach, we can avoid storing  page permissions and simplify
>>>>> the implementation.
>>>>>
>>>>> I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK`
>>>>> flows
>>>>> to do TLB shootdowns which might not be needed for RELAX IOCTL but I am
>>>>> not sure what will be the performance impact. Is there any data point to
>>>>> see the performance impact?
>>>>>
>>>>> Thanks,
>>>>> -Vijay
>>>>
>>>> This should get better in the next versuin. "relax" is gone. And for
>>>> dynamic EAUG'd pages only VMA and EPCM permissions matter, i.e.
>>>> internal vm_max_prot_bits is set to RWX.
>>>>
>>>> I patched the existing series eno
>>>>
>>>> For Enarx I'm using the following patterns.
>>>>
>>>> Shim mmap() handler:
>>>> 1. Ask host for mmap() syscall.
>>>> 2. Construct secinfo matching the protection bits.
>>>> 3. For each page in the address range: EACCEPTCOPY with a
>>>>    zero page.
>>>
>>> For EACCEPTCOPY to work, I believe PTE.RW is required for the target page.
>>> So this only works for mmap(..., RW) or mmap(...,RWX).
>>
>> I use it only with EAUG.
>>
>>> So that gives you pages with RW/RWX.
>>>
>>> To change permissions of any of those pages from RW/RWX to R/RX , you need
>>> call ENCLAVE_RESTRICT_PERMISSIONS ioctl with R or with PROT_NONE. you can't
>>> just do EMODPE.
>>>
>>> so for RW->R, you either:
>>>
>>> 1)EMODPR(EPCM.NONE)
>>> 2)EACCEPT(EPCM.NONE)
>>> 3)EMODPE(R) -- not sure this would work as spec says EMODPE requires "Read
>>> access permitted by enclave"
>>>
>>> or:
>>>
>>> 1)EMODPR(EPCM.PROT_R)
>>> 2)EACCEPT(EPCM.PROT_R)
>>
>> I checked from SDM and you're correct.
>>
>> Then the appropriate thing is to reset to R.
>>
>>>> Shim mprotect() handler:
>>>> 1. Ask host for mprotect() syscall.
>>>> 2. For each page in the address range: EACCEPT with PROT_NONE
>>>>    secinfo and EMODPE with the secinfo having the prot bits.
>>>
>>> EACCEPT requires PTE.R. And EAUG'd pages will always initialized with
>>> EPCM.RW,
>>> so EACCEPT(EPCM.PROT_NONE) will fail with SGX_PAGE_ATTRIBUTES_MISMATCH.
>>
>> Ditto.
>>
>>>> Backend mprotect() handler:
>>>> 1. Invoke ENCLAVE_RESTRICT_PERMISSIONS ioctl for the address
>>>>    range with PROT_NONE.
>>>> 2. Invoke real mprotect() syscall.
>>>>
>>> Note #1 can only be done after EACCEPT. MODPR is not allowed for pending
>>> pages.
>>
>> Yes, and that's what I'm doing. After that shim does EACCEPT's in a loop.
>>
>> Reinette, the ioctl should already check that either R or W is set in
>> secinfo and return -EACCES.
>>
>> I.e.
>>
>> (* Check for misconfigured SECINFO flags*)
>> IF ( (SCRATCH_SECINFO reserved fields are not zero ) or
>> (SCRATCH_SECINFO.FLAGS.R is 0 and SCRATCH_SECINFO.FLAGS.W is not 0) )
>> THEN #GP(0); FI;
>>
>> I was testing this and wondering why my enclave #GP's, and then I checked
>> SDM after reading Haitao's response. So clearly check in kernel side is
>> needed.

I do not believe that you encountered the #GP documented above because that
check is already present in the current implementation of
SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS:

sgx_ioc_enclave_restrict_permissions()->sgx_perm_from_user_secinfo():
	if ((perm & SGX_SECINFO_W) && !(perm & SGX_SECINFO_R))
		return -EINVAL;

It does return EINVAL which is the catch-all error code used to represent
invalid input from user space. I am not convinced that EACCES should be used
instead though, EACCES means "Permission denied", which is not the case here.
The case here is just an invalid request.

It currently does not prevent the user from setting PROT_NONE though, which
EMODPR does seem to allow.

I saw Haitao's note that EMODPE requires "Read access permitted by enclave".
This motivates that EMODPR->PROT_NONE should not be allowed since it would
not be possible to relax permissions (run EMODPE) after that. Even so, I
also found in the SDM that EACCEPT has the note "Read access permitted
by enclave". That seems to indicate that EMODPR->PROT_NONE is not practical
from that perspective either since the enclave will not be able to
EACCEPT the change. Does that match your understanding?

I will add the check for R in SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS at least.
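
A minimal sketch of what such a check could look like, written as stand-alone
user-space C so it can be compiled outside the kernel. The helper name
restrict_perm_check() is hypothetical and is not the function in the patch;
the bit values assume the SECINFO.FLAGS layout of R, W and X in bits 0, 1
and 2.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed SECINFO.FLAGS permission bits: R = bit 0, W = bit 1, X = bit 2. */
#define SGX_SECINFO_R 0x1ULL
#define SGX_SECINFO_W 0x2ULL
#define SGX_SECINFO_X 0x4ULL

/* Hypothetical helper for illustration, not the actual kernel function. */
static int restrict_perm_check(uint64_t perm)
{
	/* Existing check quoted above: W without R is an invalid combination. */
	if ((perm & SGX_SECINFO_W) && !(perm & SGX_SECINFO_R))
		return -EINVAL;

	/* Proposed addition: always require R, which rejects PROT_NONE. */
	if (!(perm & SGX_SECINFO_R))
		return -EINVAL;

	return 0;
}
```

The second test subsumes the first; both are kept here only to show the
existing check next to the proposed one.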

> I would consider also adding such check "add pages". It's our least common
> denominator.
> 
> If we can assume that at least R is there for every enclave page, then it
> gives invariant that enables EMODPR with R all the time.

Adding pages without permissions to an enclave does not seem practical. I
do not know if there are such usages. I can add this as a separate change for
consideration.

Reinette


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-11 17:53               ` Reinette Chatre
@ 2022-03-11 18:11                 ` Jarkko Sakkinen
  2022-03-11 19:28                   ` Reinette Chatre
  2022-03-14  2:49                 ` Jarkko Sakkinen
  1 sibling, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-11 18:11 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
 
> I do not believe that you encountered the #GP documented above because that
> check is already present in the current implementation of
> SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS:
> 
> sgx_ioc_enclave_restrict_permissions()->sgx_perm_from_user_secinfo():
> 	if ((perm & SGX_SECINFO_W) && !(perm & SGX_SECINFO_R))
> 		return -EINVAL;
> 
> It does return EINVAL which is the catch-all error code used to represent
> invalid input from user space. I am not convinced that EACCES should be used
> instead though, EACCES means "Permission denied", which is not the case here.
> The case here is just an invalid request.
> 
> It currently does not prevent the user from setting PROT_NONE though, which
> EMODPR does seem to allow.
> 
> I saw Haitao's note that EMODPE requires "Read access permitted by enclave".
> This motivates that EMODPR->PROT_NONE should not be allowed since it would
> not be possible to relax permissions (run EMODPE) after that. Even so, I
> also found in the SDM that EACCEPT has the note "Read access permitted
> by enclave". That seems to indicate that EMODPR->PROT_NONE is not practical
> from that perspective either since the enclave will not be able to
> EACCEPT the change. Does that match your understanding?
> 
> I will add the check for R in SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS at least.

Yes, I think we are on the same page with this.

But there is another thing.

As EAUG is taken care of by the page fault handler, so should EMODPR be. It
makes the developer experience a whole lot easier when you don't have to
call back to the host and ask it to execute EMODPR for the range.

It's also a huge inconsistency in this patch set that they are handled
differently.

And it creates a concurrency case for user space that is complicated to say
the least, i.e. divided work between host and enclave implementation to
execute EMODPR is a nightmare scenario. On the other hand this is trivial
to sort out in kernel.

So what it means is that, in one way or another, mprotect() needs to be the
melting point for both. This can be called a mandatory requirement, however
this patch set is done, not least because of managing concurrency between
kernel and user space.

You can get that done by these steps:

1. Unmap PTE's in mprotect() flow.
2. In #PF handler, EMODPR with R set.

This is a clear API for the enclave developer because you know what state the
pages are in after mprotect(), and what you still need to do to them. Only
the syscall then needs to be performed by the host side.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-11 18:11                 ` Jarkko Sakkinen
@ 2022-03-11 19:28                   ` Reinette Chatre
  2022-03-14  3:42                     ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-03-11 19:28 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

Hi Jarkko,

On 3/11/2022 10:11 AM, Jarkko Sakkinen wrote:
> On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
>  
>> I do not believe that you encountered the #GP documented above because that
>> check is already present in the current implementation of
>> SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS:
>>
>> sgx_ioc_enclave_restrict_permissions()->sgx_perm_from_user_secinfo():
>> 	if ((perm & SGX_SECINFO_W) && !(perm & SGX_SECINFO_R))
>> 		return -EINVAL;
>>
>> It does return EINVAL which is the catch-all error code used to represent
>> invalid input from user space. I am not convinced that EACCES should be used
>> instead though, EACCES means "Permission denied", which is not the case here.
>> The case here is just an invalid request.
>>
>> It currently does not prevent the user from setting PROT_NONE though, which
>> EMODPR does seem to allow.
>>
>> I saw Haitao's note that EMODPE requires "Read access permitted by enclave".
>> This motivates that EMODPR->PROT_NONE should not be allowed since it would
>> not be possible to relax permissions (run EMODPE) after that. Even so, I
>> also found in the SDM that EACCEPT has the note "Read access permitted
>> by enclave". That seems to indicate that EMODPR->PROT_NONE is not practical
>> from that perspective either since the enclave will not be able to
>> EACCEPT the change. Does that match your understanding?
>>
>> I will add the check for R in SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS at least.
> 
> Yes, I think we are in the same line with this.
> 
> But there is another thing.
> 
> As EAUG is taken care by the page handler so should EMODPR. It makes the
> developer experience whole a lot easier when you don't have to back call
> to host and ask it to execute EMODPR for the range.
> 
> It's also a huge incosistency in this patch set that they are handled
> differently.
> 
> And it creates a concurrency case for user space that is complicated to say
> the least, i.e. divided work between host and enclave implementation to
> execute EMODPR is a nightmare scenario. On the other hand this is trivial
> to sort out in kernel.

EMODPR has possible failures due to state that is managed by the user space
runtime. Being able to communicate accurate EMODPR error codes to user space
runtime is helpful to the runtime in supporting its management of the enclave
memory. Accurate EMODPR error codes can be communicated when using an ioctl(),
not when run from within a page fault handler. 
 
> So what it means that, in one way or antoher, mprotect() needs to be the
> melting point for both.

mprotect() is the syscall to modify VMA permissions. EPCM permissions are
different from VMA permissions and they are currently treated differently
by the kernel. 

Moving EPCM permission changes to mprotect() forces EPCM permissions to be
the same as VMA permissions. That is a significant change. It is also
inconsistent since EPCM permission changes cannot be managed completely
from the kernel since the kernel can only ever restrict permissions.

> This can be called mandatory requirement, however
> this patch set it done, not least because of managing concurrency between
> kernel and user space.
> 
> You can get that done by these steps:
> 
> 1. Unmap PTE's in mprotect() flow.
> 2. In #PF handler, EMODPR with R set.

There is also the very significant ETRACK flow that
needs to be run after EMODPR. The implications of sending IPIs to all
CPUs that may be running in an enclave while in a page fault handler need
to be considered. Page faults should be as fast as possible.

If this is considered then this tremendous impact on the page fault handler
should be managed and avoided as much as possible - but how will the page
fault handler even know when it should run EMODPR? The enclave can run 
EMODPE from within the enclave at any time without any insight from the
kernel so the only way to have accurate permissions would then be to
run EMODPR on _every_ page fault, which is obviously a non-starter due
to the significant impact (EMODPR and ETRACK) and blast radius (IPIs).

Trying to move running of EMODPR earlier, during the mprotect() call itself,
is also full of obstacles since the mprotect() call may result in VMAs
being split, which is an operation that can fail, and is followed by
the EMODPR-ETRACK flows that can also fail (without being able to
undo the VMA splits). With the EMODPR-ETRACK flows that can fail it
is here also not possible to communicate accurately to user space since
now there is the whole page range to consider. For example, mprotect()
cannot communicate (a) which pages caused the failure, and (b) what
failure was encountered. This is possible when using the ioctl().
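
To illustrate the point about (a) and (b), here is a rough user-space sketch
of how a range operation can report both how far it got and what went wrong.
The structure and its field names are hypothetical and only illustrate the
idea; op() stands in for the per-page EMODPR step and the error code 11 is
made up.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical parameter block; field names are illustrative only. */
struct restrict_perm_args {
	uint64_t offset;	/* in: start of the range, page aligned */
	uint64_t length;	/* in: length of the range, page aligned */
	uint64_t result;	/* out: error code of the failing page, 0 if none */
	uint64_t count;		/* out: bytes successfully processed */
};

/* Stand-in for the per-page operation; returns 0 or a nonzero error code. */
typedef uint64_t (*page_op_t)(uint64_t addr);

static int restrict_range(struct restrict_perm_args *args, page_op_t op)
{
	uint64_t c;

	args->result = 0;
	args->count = 0;

	for (c = 0; c < args->length; c += SKETCH_PAGE_SIZE) {
		uint64_t err = op(args->offset + c);

		if (err) {
			/* (b) what failed; (a) is implied by offset + count. */
			args->result = err;
			return -EFAULT;
		}
		args->count = c + SKETCH_PAGE_SIZE;
	}

	return 0;
}

/* Example stand-in: fails on the third page with a made-up code 11. */
static uint64_t fail_on_third_page(uint64_t addr)
{
	return addr == 2 * SKETCH_PAGE_SIZE ? 11 : 0;
}
```

With a four-page range starting at offset 0, restrict_range() stops at the
third page, leaving count at two pages and result at the made-up code, so
the caller knows exactly where to resume. mprotect() has no equivalent of
these two output fields.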


> This clear API for enclave developer because you know in what state pages
> are after mprotect(), and what you need to still do to them. Only the
> syscall needs to be them performed by the host side.

Supporting permission restriction in an ioctl() enables the runtime to manage
the enclave memory without needing to map it.

I have considered the idea of supporting the permission restriction with
mprotect() but as you can see in this response I did not find it to be
practical.

Reinette


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-11 17:53               ` Reinette Chatre
  2022-03-11 18:11                 ` Jarkko Sakkinen
@ 2022-03-14  2:49                 ` Jarkko Sakkinen
  2022-03-14  2:50                   ` Jarkko Sakkinen
  1 sibling, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-14  2:49 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel, nathaniel

On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:

> I saw Haitao's note that EMODPE requires "Read access permitted by enclave".
> This motivates that EMODPR->PROT_NONE should not be allowed since it would
> not be possible to relax permissions (run EMODPE) after that. Even so, I
> also found in the SDM that EACCEPT has the note "Read access permitted
> by enclave". That seems to indicate that EMODPR->PROT_NONE is not practical
> from that perspective either since the enclave will not be able to
> EACCEPT the change. Does that match your understanding?

Yes, PROT_NONE should not be allowed.

This is however the real problem.

The current kernel patch set has an inconsistent API and the EMODPR ioctl is
simply unacceptable. It also requires more concurrency management from the
user space run-time, which would be a heck of a lot easier to do in the kernel.

If you really want EMODPR as an ioctl, then for consistency's sake EAUG
should be one too. When things go in opposite directions like this, this
patch set plainly and simply will not work out.

I would pick EAUG's strategy from these two as it requires half the calls
back to the host from an enclave. I.e. please combine mprotect() and EMODPR,
either in the #PF handler or as part of mprotect(), whichever suits you
best.

I'll try to demonstrate this with two examples.

mmap() could go something like this (simplified):
1. Execution #UD's to SYSCALL.
2. Host calls enclave's mmap() handler with mmap() parameters.
3. Enclave up-calls host's mmap().
4. Loops the range with EACCEPTCOPY.

mprotect() has to be done like this:
1. Execution #UD's to SYSCALL.
2. Host calls enclave's mprotect() handler.
3. Enclave up-calls host's mprotect().
4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.
3. Loops the range with EACCEPT.

This is just terrible IMHO. I hope these examples bring some insight.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-14  2:49                 ` Jarkko Sakkinen
@ 2022-03-14  2:50                   ` Jarkko Sakkinen
  2022-03-14  2:58                     ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-14  2:50 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel, nathaniel

On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
> On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
> 
> > I saw Haitao's note that EMODPE requires "Read access permitted by enclave".
> > This motivates that EMODPR->PROT_NONE should not be allowed since it would
> > not be possible to relax permissions (run EMODPE) after that. Even so, I
> > also found in the SDM that EACCEPT has the note "Read access permitted
> > by enclave". That seems to indicate that EMODPR->PROT_NONE is not practical
> > from that perspective either since the enclave will not be able to
> > EACCEPT the change. Does that match your understanding?
> 
> Yes, PROT_NONE should not be allowed.
> 
> This is however the real problem.
> 
> The current kernel patch set has inconsistent API and EMODPR ioctl is
> simply unacceptable. It  also requires more concurrency management from
> user space run-time, which would be heck a lot easier to do in the kernel.
> 
> If you really want EMODPR as ioctl, then for consistencys sake, then EAUG
> should be too. Like this when things go opposite directions, this patch set
> plain and simply will not work out.
> 
> I would pick EAUG's strategy from these two as it requires half the back
> calls to host from an enclave. I.e. please combine mprotect() and EMODPR,
> either in the #PF handler or as part of mprotect(), which ever suits you
> best.
> 
> I'll try demonstrate this with two examples.
> 
> mmap() could go something like this() (simplified):
> 1. Execution #UD's to SYSCALL.
> 2. Host calls enclave's mmap() handler with mmap() parameters.
> 3. Enclave up-calls host's mmap().
> 4. Loops the range with EACCEPTCOPY.
> 
> mprotect() has to be done like this:
> 1. Execution #UD's to SYSCALL.
> 2. Host calls enclave's mprotect() handler.
> 3. Enclave up-calls host's mprotect().
> 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.
> 3. Loops the range with EACCEPT.
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  5. Loops the range with EACCEPT + EMODPE.

> This is just terrible IMHO. I hope these examples bring some insight.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-14  2:50                   ` Jarkko Sakkinen
@ 2022-03-14  2:58                     ` Jarkko Sakkinen
  2022-03-14 15:39                       ` Haitao Huang
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-14  2:58 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel, nathaniel

On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
> On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
> > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
> > 
> > > I saw Haitao's note that EMODPE requires "Read access permitted by enclave".
> > > This motivates that EMODPR->PROT_NONE should not be allowed since it would
> > > not be possible to relax permissions (run EMODPE) after that. Even so, I
> > > also found in the SDM that EACCEPT has the note "Read access permitted
> > > by enclave". That seems to indicate that EMODPR->PROT_NONE is not practical
> > > from that perspective either since the enclave will not be able to
> > > EACCEPT the change. Does that match your understanding?
> > 
> > Yes, PROT_NONE should not be allowed.
> > 
> > This is however the real problem.
> > 
> > The current kernel patch set has inconsistent API and EMODPR ioctl is
> > simply unacceptable. It  also requires more concurrency management from
> > user space run-time, which would be heck a lot easier to do in the kernel.
> > 
> > If you really want EMODPR as ioctl, then for consistencys sake, then EAUG
> > should be too. Like this when things go opposite directions, this patch set
> > plain and simply will not work out.
> > 
> > I would pick EAUG's strategy from these two as it requires half the back
> > calls to host from an enclave. I.e. please combine mprotect() and EMODPR,
> > either in the #PF handler or as part of mprotect(), which ever suits you
> > best.
> > 
> > I'll try demonstrate this with two examples.
> > 
> > mmap() could go something like this() (simplified):
> > 1. Execution #UD's to SYSCALL.
> > 2. Host calls enclave's mmap() handler with mmap() parameters.
> > 3. Enclave up-calls host's mmap().
> > 4. Loops the range with EACCEPTCOPY.
> > 
> > mprotect() has to be done like this:
> > 1. Execution #UD's to SYSCALL.
> > 2. Host calls enclave's mprotect() handler.
> > 3. Enclave up-calls host's mprotect().
> > 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.
> > 3. Loops the range with EACCEPT.
>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>   5. Loops the range with EACCEPT + EMODPE.
> 
> > This is just terrible IMHO. I hope these examples bring some insight.

E.g. in Enarx we have to add a special up-call (a so-called enarxcall in an
intermediate layer that we call sallyport, which provides a shared buffer to
communicate with the enclave) just for resetting the range with PROT_READ.
It feels very redundant, adds ugly cruft and is the complete opposite of the
strategy you've chosen for EAUG, which I think is the correct choice as far
as the API is concerned.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-11 19:28                   ` Reinette Chatre
@ 2022-03-14  3:42                     ` Jarkko Sakkinen
  2022-03-14  3:45                       ` Jarkko Sakkinen
  2022-03-14 15:32                       ` Reinette Chatre
  0 siblings, 2 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-14  3:42 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> Supporting permission restriction in an ioctl() enables the runtime to manage
> the enclave memory without needing to map it.

Which is the opposite of what you do in EAUG. You can also augment pages
without needing to map them. Sure, you get that capability, but it is quite
useless in practice.

> I have considered the idea of supporting the permission restriction with
> mprotect() but as you can see in this response I did not find it to be
> practical.

Where is it practical? What is your application? How is it practical to
delegate the concurrency management of a split mprotect() to user space?
How do we get rid of a useless up-call to the host?

> Reinette

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-14  3:42                     ` Jarkko Sakkinen
@ 2022-03-14  3:45                       ` Jarkko Sakkinen
  2022-03-14  3:54                         ` Jarkko Sakkinen
  2022-03-14 15:32                       ` Reinette Chatre
  1 sibling, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-14  3:45 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Mon, Mar 14, 2022 at 05:42:43AM +0200, Jarkko Sakkinen wrote:
> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> > Supporting permission restriction in an ioctl() enables the runtime to manage
> > the enclave memory without needing to map it.
> 
> Which is opposite what you do in EAUG. You can also augment pages without
> needing the map them. Sure you get that capability, but it is quite useless
> in practice.

Essentially you are tuning for a niche artificial use case over the common
case that most people end up doing. It makes no sense.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-14  3:45                       ` Jarkko Sakkinen
@ 2022-03-14  3:54                         ` Jarkko Sakkinen
  0 siblings, 0 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-14  3:54 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Mon, Mar 14, 2022 at 05:45:48AM +0200, Jarkko Sakkinen wrote:
> On Mon, Mar 14, 2022 at 05:42:43AM +0200, Jarkko Sakkinen wrote:
> > On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> > > Supporting permission restriction in an ioctl() enables the runtime to manage
> > > the enclave memory without needing to map it.
> > 
> > Which is opposite what you do in EAUG. You can also augment pages without
> > needing the map them. Sure you get that capability, but it is quite useless
> > in practice.
> 
> Essentially you are tuning for a niche artifical use case over the common
> case that most people end up doing. It makes no sense.

Also it is important to remember why EMODPR is there: it is not there to
provide a useful control mechanism or interesting applications for SGX. It's
there because of hardware constraints. Therefore it should be used
accordingly, and its interface should certainly not be fully exposed to
user space.

Without hardware constraints, we would have only in-enclave EMODP.

It is essentially a reset mechanism for EPCM, no more, no less. Therefore,
it should be used as such, with a *fixed* value picked for resetting the EPCM
permissions of the mapped range. I think PROT_READ is the sanest choice of
the available options. Then, EMODPE can be used for the most part just like
"EMODP".
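
As a toy model of that idea (plain user-space C, not real SGX state or
instructions): the kernel side only ever resets a page to R, and the enclave
side then adds whatever it needs. All names here are hypothetical, and the
readable-page requirement for EMODPE is taken from the SDM note quoted
earlier in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define PERM_R 0x1u
#define PERM_W 0x2u
#define PERM_X 0x4u

/* Toy model of one page's EPCM permission bits; not real SGX state. */
struct epcm_page {
	uint8_t perm;
};

/*
 * Models the proposed fixed reset: EMODPR can only restrict, so resetting
 * "to R" keeps R if it was set and drops everything else.
 */
static void model_reset_to_read(struct epcm_page *p)
{
	p->perm &= PERM_R;
}

/*
 * Models in-enclave EMODPE: it can only add permissions, and here it is
 * assumed to need a page that is already readable.
 */
static bool model_emodpe(struct epcm_page *p, uint8_t add)
{
	if (!(p->perm & PERM_R))
		return false;

	p->perm |= add;
	return true;
}
```

Under the invariant that every page has at least R, a reset followed by
EMODPE can reach any valid permission set, which is what makes the pair
behave like a single "EMODP". A page without R cannot be extended in this
model, which mirrors why PROT_NONE is a dead end.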

Please do not fully expose EMODPR to user space. It's a Pandora's box
of misbehaviour and shooting yourself in the foot.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-14  3:42                     ` Jarkko Sakkinen
  2022-03-14  3:45                       ` Jarkko Sakkinen
@ 2022-03-14 15:32                       ` Reinette Chatre
  2022-03-17  4:30                         ` Jarkko Sakkinen
  1 sibling, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-03-14 15:32 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

Hi Jarkko,

On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
>> Supporting permission restriction in an ioctl() enables the runtime to manage
>> the enclave memory without needing to map it.
> 
> Which is opposite what you do in EAUG. You can also augment pages without
> needing the map them. Sure you get that capability, but it is quite useless
> in practice.
> 
>> I have considered the idea of supporting the permission restriction with
>> mprotect() but as you can see in this response I did not find it to be
>> practical.
> 
> Where is it practical? What is your application? How is it practical to
> delegate the concurrency management of a split mprotect() to user space?
> How do we get rid off a useless up-call to the host?
> 

The email you responded to contained many obstacles against using mprotect()
but you chose to ignore them and snipped them all from your response. Could
you please address the issues instead of dismissing them? 

Reinette

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-14  2:58                     ` Jarkko Sakkinen
@ 2022-03-14 15:39                       ` Haitao Huang
  2022-03-17  4:34                         ` Jarkko Sakkinen
                                           ` (2 more replies)
  0 siblings, 3 replies; 130+ messages in thread
From: Haitao Huang @ 2022-03-14 15:39 UTC (permalink / raw)
  To: Reinette Chatre, Jarkko Sakkinen
  Cc: Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski, Andy, mingo,
	linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel, nathaniel

Hi Jarkko

On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
>> On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
>> > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
>> >
>> > > I saw Haitao's note that EMODPE requires "Read access permitted by  
>> enclave".
>> > > This motivates that EMODPR->PROT_NONE should not be allowed since  
>> it would
>> > > not be possible to relax permissions (run EMODPE) after that. Even  
>> so, I
>> > > also found in the SDM that EACCEPT has the note "Read access  
>> permitted
>> > > by enclave". That seems to indicate that EMODPR->PROT_NONE is not  
>> practical
>> > > from that perspective either since the enclave will not be able to
>> > > EACCEPT the change. Does that match your understanding?
>> >
>> > Yes, PROT_NONE should not be allowed.
>> >
>> > This is however the real problem.
>> >
>> > The current kernel patch set has inconsistent API and EMODPR ioctl is
>> > simply unacceptable. It  also requires more concurrency management  
>> from
>> > user space run-time, which would be heck a lot easier to do in the  
>> kernel.
>> >
>> > If you really want EMODPR as ioctl, then for consistencys sake, then  
>> EAUG
>> > should be too. Like this when things go opposite directions, this  
>> patch set
>> > plain and simply will not work out.
>> >
>> > I would pick EAUG's strategy from these two as it requires half the  
>> back
>> > calls to host from an enclave. I.e. please combine mprotect() and  
>> EMODPR,
>> > either in the #PF handler or as part of mprotect(), which ever suits  
>> you
>> > best.
>> >
>> > I'll try demonstrate this with two examples.
>> >
>> > mmap() could go something like this() (simplified):
>> > 1. Execution #UD's to SYSCALL.
>> > 2. Host calls enclave's mmap() handler with mmap() parameters.
>> > 3. Enclave up-calls host's mmap().
>> > 4. Loops the range with EACCEPTCOPY.
>> >
>> > mprotect() has to be done like this:
>> > 1. Execution #UD's to SYSCALL.
>> > 2. Host calls enclave's mprotect() handler.
>> > 3. Enclave up-calls host's mprotect().
>> > 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.

I assume up-calls here are ocalls as we call them in our implementation,
which are the calls the enclave makes to the untrusted side via EEXIT.

If so, can your implementation combine these two up-calls into one, so that
the host side just does the ioctl() and mprotect() calls to the kernel? If
so, would that address your concern about extra up-calls?


>> > 3. Loops the range with EACCEPT.
>>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>   5. Loops the range with EACCEPT + EMODPE.
>>
>> > This is just terrible IMHO. I hope these examples bring some insight.
>
> E.g. in Enarx we have to add a special up-call (so called enarxcall in
> intermediate that we call sallyport, which provides shared buffer to
> communicate with the enclave) just for reseting the range with PROT_READ.
> Feel very redundant, adds ugly cruft and is completely opposite strategy  
> to
> what you've chosen to do with EAUG, which is I think correct choice as  
> far
> as API is concerned.

The problem with EMODPR on #PF is that the kernel needs to know what
permissions the enclave requested at the time of the #PF. So the enclave has
to make at least one call to the kernel (again via ocall in our case, I
assume up-call in your case) to make the change.

The enclave runtime may not know the permissions until upper-layer
application code (a JIT or some kind of code loader) makes the decision to
change them. The ocalls/up-calls can only be done at that time, not up front
as with mmap(), which is only used to reserve ranges.

I also see this model as consistent with what the kernel does for regular
memory mappings: adding physical pages on #PF or pre-fault and changing PTE
permissions only after mprotect() is called.

I would agree with, and prefer, combining mprotect() and the ioctl() for
EMODPR, but Reinette pointed out some issues above on managing VMAs and
handling errors in that approach.

BR
Haitao

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-14 15:32                       ` Reinette Chatre
@ 2022-03-17  4:30                         ` Jarkko Sakkinen
  2022-03-17 22:08                           ` Reinette Chatre
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-17  4:30 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
> > On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> >> Supporting permission restriction in an ioctl() enables the runtime to manage
> >> the enclave memory without needing to map it.
> > 
> > Which is opposite what you do in EAUG. You can also augment pages without
> > needing the map them. Sure you get that capability, but it is quite useless
> > in practice.
> > 
> >> I have considered the idea of supporting the permission restriction with
> >> mprotect() but as you can see in this response I did not find it to be
> >> practical.
> > 
> > Where is it practical? What is your application? How is it practical to
> > delegate the concurrency management of a split mprotect() to user space?
> > How do we get rid off a useless up-call to the host?
> > 
> 
> The email you responded to contained many obstacles against using mprotect()
> but you chose to ignore them and snipped them all from your response. Could
> you please address the issues instead of dismissing them? 

I did read the whole email but did not see anything that would make a case
for a fully exposed EMODPR, or for an API that is asymmetrical to how EAUG
works.

I had the same discussion with Haitao about PROT_NONE earlier, and am
fully aware that PROT_READ is required.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-14 15:39                       ` Haitao Huang
@ 2022-03-17  4:34                         ` Jarkko Sakkinen
  2022-03-17 14:42                           ` Haitao Huang
  2022-03-17  4:37                         ` Jarkko Sakkinen
  2022-03-17  7:01                         ` Jarkko Sakkinen
  2 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-17  4:34 UTC (permalink / raw)
  To: Haitao Huang
  Cc: Reinette Chatre, Dhanraj, Vijay, dave.hansen, tglx, bp,
	Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel, nathaniel

On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
> Hi Jarkko
> 
> On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen <jarkko@kernel.org>
> wrote:
> 
> > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
> > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
> > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
> > > >
> > > > > I saw Haitao's note that EMODPE requires "Read access permitted
> > > by enclave".
> > > > > This motivates that EMODPR->PROT_NONE should not be allowed
> > > since it would
> > > > > not be possible to relax permissions (run EMODPE) after that.
> > > Even so, I
> > > > > also found in the SDM that EACCEPT has the note "Read access
> > > permitted
> > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is
> > > not practical
> > > > > from that perspective either since the enclave will not be able to
> > > > > EACCEPT the change. Does that match your understanding?
> > > >
> > > > Yes, PROT_NONE should not be allowed.
> > > >
> > > > This is however the real problem.
> > > >
> > > > The current kernel patch set has inconsistent API and EMODPR ioctl is
> > > > simply unacceptable. It  also requires more concurrency management
> > > from
> > > > user space run-time, which would be heck a lot easier to do in the
> > > kernel.
> > > >
> > > > If you really want EMODPR as ioctl, then for consistencys sake,
> > > then EAUG
> > > > should be too. Like this when things go opposite directions, this
> > > patch set
> > > > plain and simply will not work out.
> > > >
> > > > I would pick EAUG's strategy from these two as it requires half
> > > the back
> > > > calls to host from an enclave. I.e. please combine mprotect() and
> > > EMODPR,
> > > > either in the #PF handler or as part of mprotect(), which ever
> > > suits you
> > > > best.
> > > >
> > > > I'll try demonstrate this with two examples.
> > > >
> > > > mmap() could go something like this() (simplified):
> > > > 1. Execution #UD's to SYSCALL.
> > > > 2. Host calls enclave's mmap() handler with mmap() parameters.
> > > > 3. Enclave up-calls host's mmap().
> > > > 4. Loops the range with EACCEPTCOPY.
> > > >
> > > > mprotect() has to be done like this:
> > > > 1. Execution #UD's to SYSCALL.
> > > > 2. Host calls enclave's mprotect() handler.
> > > > 3. Enclave up-calls host's mprotect().
> > > > 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.
> 
> I assume up-calls here are ocalls as we call them in our implementation,
> which are the calls enclave make to untrusted side via EEXIT.
> 
> If so, can your implementation combine this two up-calls into one, then host
> side just do ioctl() and mprotect to kernel? If so, would that address your
> concern about extra up-calls?
> 
> 
> > > > 3. Loops the range with EACCEPT.
> > >   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >   5. Loops the range with EACCEPT + EMODPE.
> > > 
> > > > This is just terrible IMHO. I hope these examples bring some insight.
> > 
> > E.g. in Enarx we have to add a special up-call (so called enarxcall in
> > intermediate that we call sallyport, which provides shared buffer to
> > communicate with the enclave) just for reseting the range with PROT_READ.
> > Feel very redundant, adds ugly cruft and is completely opposite strategy
> > to
> > what you've chosen to do with EAUG, which is I think correct choice as
> > far
> > as API is concerned.
> 
> The problem with EMODPR on #PF is that kernel needs to know what permissions
> requested from enclave at the time of #PF. So enclave has to make at least
> one call to kernel (again via ocall in our case, I assume up-call in your
> case) to make the change.

Your security scheme is broken if permissions are requested outside the
enclave, i.e. the hostile environment controls the permissions. That should
always come from the enclave and the enclave uses EACCEPT* to validate that
what was given to EMODPR, EAUG and EMODT matches its expectations.

Upper layer application should never be in charge, and a broken
security scheme should never be supported.

If EMODPR unconditionally sets the page to PROT_READ, the enclave is able
to validate this fact and then it can use EMODPE to set appropriate
permissions.
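
A minimal sketch of the flow described above, written as a toy model in
plain C rather than real ENCLS/ENCLU code. The struct epcm_entry layout,
the PERM_* values and the model_*() and enclave_mprotect() helpers are
all hypothetical and exist only for illustration; the only behaviour
taken from this thread is that EMODPR restricts, EACCEPT validates, and
EMODPE extends and needs read access.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one EPCM entry; NOT the real hardware layout. */
enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

struct epcm_entry {
	uint8_t perms;	/* current R/W/X bits */
	bool pr;	/* restriction pending, modelled after EPCM.PR */
};

/* Host side (models ENCLS[EMODPR]): can only clear permission bits. */
static void model_emodpr(struct epcm_entry *e, uint8_t perms)
{
	e->perms &= perms;
	e->pr = true;
}

/* Enclave side (models ENCLU[EACCEPT]): succeeds only on a match. */
static bool model_eaccept(struct epcm_entry *e, uint8_t expected)
{
	if (!e->pr || e->perms != expected)
		return false;
	e->pr = false;
	return true;
}

/* Enclave side (models ENCLU[EMODPE]): can only add bits, needs R. */
static bool model_emodpe(struct epcm_entry *e, uint8_t perms)
{
	if (!(e->perms & PERM_R))
		return false;
	e->perms |= perms;
	return true;
}

/* Reset to R from outside, then validate and re-extend from inside. */
static bool enclave_mprotect(struct epcm_entry *e, uint8_t wanted)
{
	model_emodpr(e, PERM_R);		/* done on the kernel side */
	if (!model_eaccept(e, PERM_R))		/* enclave checks the reset */
		return false;
	return model_emodpe(e, wanted);		/* enclave picks final perms */
}
```

In this model, calling enclave_mprotect() on an RW page with a wanted
value of PERM_R | PERM_X leaves the entry at RX with nothing pending,
and the outside world never supplied the final permissions.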

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-14 15:39                       ` Haitao Huang
  2022-03-17  4:34                         ` Jarkko Sakkinen
@ 2022-03-17  4:37                         ` Jarkko Sakkinen
  2022-03-17 14:47                           ` Haitao Huang
  2022-03-17  7:01                         ` Jarkko Sakkinen
  2 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-17  4:37 UTC (permalink / raw)
  To: Haitao Huang
  Cc: Reinette Chatre, Dhanraj, Vijay, dave.hansen, tglx, bp,
	Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel, nathaniel

On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
> I also see this model as consistent to what kernel does for regular memory
> mappings: adding physical pages on #PF or pre-fault and changing PTE
> permissions only after mprotect is called.

And you were against this in EAUG's case. As in the EAUG's case
EMODPR could be done as part of the mprotect() flow.

BR, Jarkko


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-14 15:39                       ` Haitao Huang
  2022-03-17  4:34                         ` Jarkko Sakkinen
  2022-03-17  4:37                         ` Jarkko Sakkinen
@ 2022-03-17  7:01                         ` Jarkko Sakkinen
  2022-03-17  7:11                           ` Jarkko Sakkinen
  2 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-17  7:01 UTC (permalink / raw)
  To: Haitao Huang
  Cc: Reinette Chatre, Dhanraj, Vijay, dave.hansen, tglx, bp,
	Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel, nathaniel

On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
> Hi Jarkko
> 
> On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen <jarkko@kernel.org>
> wrote:
> 
> > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
> > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
> > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
> > > >
> > > > > I saw Haitao's note that EMODPE requires "Read access permitted
> > > by enclave".
> > > > > This motivates that EMODPR->PROT_NONE should not be allowed
> > > since it would
> > > > > not be possible to relax permissions (run EMODPE) after that.
> > > Even so, I
> > > > > also found in the SDM that EACCEPT has the note "Read access
> > > permitted
> > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is
> > > not practical
> > > > > from that perspective either since the enclave will not be able to
> > > > > EACCEPT the change. Does that match your understanding?
> > > >
> > > > Yes, PROT_NONE should not be allowed.
> > > >
> > > > This is however the real problem.
> > > >
> > > > The current kernel patch set has inconsistent API and EMODPR ioctl is
> > > > simply unacceptable. It  also requires more concurrency management
> > > from
> > > > user space run-time, which would be heck a lot easier to do in the
> > > kernel.
> > > >
> > > > If you really want EMODPR as ioctl, then for consistencys sake,
> > > then EAUG
> > > > should be too. Like this when things go opposite directions, this
> > > patch set
> > > > plain and simply will not work out.
> > > >
> > > > I would pick EAUG's strategy from these two as it requires half
> > > the back
> > > > calls to host from an enclave. I.e. please combine mprotect() and
> > > EMODPR,
> > > > either in the #PF handler or as part of mprotect(), which ever
> > > suits you
> > > > best.
> > > >
> > > > I'll try demonstrate this with two examples.
> > > >
> > > > mmap() could go something like this() (simplified):
> > > > 1. Execution #UD's to SYSCALL.
> > > > 2. Host calls enclave's mmap() handler with mmap() parameters.
> > > > 3. Enclave up-calls host's mmap().
> > > > 4. Loops the range with EACCEPTCOPY.
> > > >
> > > > mprotect() has to be done like this:
> > > > 1. Execution #UD's to SYSCALL.
> > > > 2. Host calls enclave's mprotect() handler.
> > > > 3. Enclave up-calls host's mprotect().
> > > > 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.
> 
> I assume up-calls here are ocalls as we call them in our implementation,
> which are the calls enclave make to untrusted side via EEXIT.
> 
> If so, can your implementation combine this two up-calls into one, then host
> side just do ioctl() and mprotect to kernel? If so, would that address your
> concern about extra up-calls?
> 
> 
> > > > 3. Loops the range with EACCEPT.
> > >   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >   5. Loops the range with EACCEPT + EMODPE.
> > > 
> > > > This is just terrible IMHO. I hope these examples bring some insight.
> > 
> > E.g. in Enarx we have to add a special up-call (so called enarxcall in
> > intermediate that we call sallyport, which provides shared buffer to
> > communicate with the enclave) just for reseting the range with PROT_READ.
> > Feel very redundant, adds ugly cruft and is completely opposite strategy
> > to
> > what you've chosen to do with EAUG, which is I think correct choice as
> > far
> > as API is concerned.
> 
> The problem with EMODPR on #PF is that kernel needs to know what permissions
> requested from enclave at the time of #PF. So enclave has to make at least
> one call to kernel (again via ocall in our case, I assume up-call in your
> case) to make the change.

The #PF handler should do unconditionally EMODPR with PROT_READ.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-17  7:01                         ` Jarkko Sakkinen
@ 2022-03-17  7:11                           ` Jarkko Sakkinen
  2022-03-17 14:28                             ` Haitao Huang
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-17  7:11 UTC (permalink / raw)
  To: Haitao Huang
  Cc: Reinette Chatre, Dhanraj, Vijay, dave.hansen, tglx, bp,
	Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel, nathaniel

On Thu, Mar 17, 2022 at 09:01:07AM +0200, Jarkko Sakkinen wrote:
> On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
> > Hi Jarkko
> > 
> > On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen <jarkko@kernel.org>
> > wrote:
> > 
> > > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
> > > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
> > > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
> > > > >
> > > > > > I saw Haitao's note that EMODPE requires "Read access permitted
> > > > by enclave".
> > > > > > This motivates that EMODPR->PROT_NONE should not be allowed
> > > > since it would
> > > > > > not be possible to relax permissions (run EMODPE) after that.
> > > > Even so, I
> > > > > > also found in the SDM that EACCEPT has the note "Read access
> > > > permitted
> > > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is
> > > > not practical
> > > > > > from that perspective either since the enclave will not be able to
> > > > > > EACCEPT the change. Does that match your understanding?
> > > > >
> > > > > Yes, PROT_NONE should not be allowed.
> > > > >
> > > > > This is however the real problem.
> > > > >
> > > > > The current kernel patch set has inconsistent API and EMODPR ioctl is
> > > > > simply unacceptable. It  also requires more concurrency management
> > > > from
> > > > > user space run-time, which would be heck a lot easier to do in the
> > > > kernel.
> > > > >
> > > > > If you really want EMODPR as ioctl, then for consistencys sake,
> > > > then EAUG
> > > > > should be too. Like this when things go opposite directions, this
> > > > patch set
> > > > > plain and simply will not work out.
> > > > >
> > > > > I would pick EAUG's strategy from these two as it requires half
> > > > the back
> > > > > calls to host from an enclave. I.e. please combine mprotect() and
> > > > EMODPR,
> > > > > either in the #PF handler or as part of mprotect(), which ever
> > > > suits you
> > > > > best.
> > > > >
> > > > > I'll try demonstrate this with two examples.
> > > > >
> > > > > mmap() could go something like this() (simplified):
> > > > > 1. Execution #UD's to SYSCALL.
> > > > > 2. Host calls enclave's mmap() handler with mmap() parameters.
> > > > > 3. Enclave up-calls host's mmap().
> > > > > 4. Loops the range with EACCEPTCOPY.
> > > > >
> > > > > mprotect() has to be done like this:
> > > > > 1. Execution #UD's to SYSCALL.
> > > > > 2. Host calls enclave's mprotect() handler.
> > > > > 3. Enclave up-calls host's mprotect().
> > > > > 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.
> > 
> > I assume up-calls here are ocalls as we call them in our implementation,
> > which are the calls enclave make to untrusted side via EEXIT.
> > 
> > If so, can your implementation combine this two up-calls into one, then host
> > side just do ioctl() and mprotect to kernel? If so, would that address your
> > concern about extra up-calls?
> > 
> > 
> > > > > 3. Loops the range with EACCEPT.
> > > >   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > >   5. Loops the range with EACCEPT + EMODPE.
> > > > 
> > > > > This is just terrible IMHO. I hope these examples bring some insight.
> > > 
> > > E.g. in Enarx we have to add a special up-call (so called enarxcall in
> > > intermediate that we call sallyport, which provides shared buffer to
> > > communicate with the enclave) just for reseting the range with PROT_READ.
> > > Feel very redundant, adds ugly cruft and is completely opposite strategy
> > > to
> > > what you've chosen to do with EAUG, which is I think correct choice as
> > > far
> > > as API is concerned.
> > 
> > The problem with EMODPR on #PF is that kernel needs to know what permissions
> > requested from enclave at the time of #PF. So enclave has to make at least
> > one call to kernel (again via ocall in our case, I assume up-call in your
> > case) to make the change.
> 
> The #PF handler should do unconditionally EMODPR with PROT_READ.

Or mprotect(), as long as secinfo contains PROT_READ. I don't care about
this detail hugely anymore because it does not affect uapi. 

Using EMODPR as a permission control mechanism is a ridiculous idea, and
I cannot commit to maintaining a broken uapi.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-17  7:11                           ` Jarkko Sakkinen
@ 2022-03-17 14:28                             ` Haitao Huang
  2022-03-17 21:50                               ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Haitao Huang @ 2022-03-17 14:28 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Reinette Chatre, Dhanraj, Vijay, dave.hansen, tglx, bp,
	Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel, nathaniel

Hi

On Thu, 17 Mar 2022 02:11:28 -0500, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Thu, Mar 17, 2022 at 09:01:07AM +0200, Jarkko Sakkinen wrote:
>> On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
>> > Hi Jarkko
>> >
>> > On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen  
>> <jarkko@kernel.org>
>> > wrote:
>> >
>> > > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
>> > > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
>> > > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
>> > > > >
>> > > > > > I saw Haitao's note that EMODPE requires "Read access  
>> permitted
>> > > > by enclave".
>> > > > > > This motivates that EMODPR->PROT_NONE should not be allowed
>> > > > since it would
>> > > > > > not be possible to relax permissions (run EMODPE) after that.
>> > > > Even so, I
>> > > > > > also found in the SDM that EACCEPT has the note "Read access
>> > > > permitted
>> > > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is
>> > > > not practical
>> > > > > > from that perspective either since the enclave will not be  
>> able to
>> > > > > > EACCEPT the change. Does that match your understanding?
>> > > > >
>> > > > > Yes, PROT_NONE should not be allowed.
>> > > > >
>> > > > > This is however the real problem.
>> > > > >
>> > > > > The current kernel patch set has inconsistent API and EMODPR  
>> ioctl is
>> > > > > simply unacceptable. It  also requires more concurrency  
>> management
>> > > > from
>> > > > > user space run-time, which would be heck a lot easier to do in  
>> the
>> > > > kernel.
>> > > > >
>> > > > > If you really want EMODPR as ioctl, then for consistencys sake,
>> > > > then EAUG
>> > > > > should be too. Like this when things go opposite directions,  
>> this
>> > > > patch set
>> > > > > plain and simply will not work out.
>> > > > >
>> > > > > I would pick EAUG's strategy from these two as it requires half
>> > > > the back
>> > > > > calls to host from an enclave. I.e. please combine mprotect()  
>> and
>> > > > EMODPR,
>> > > > > either in the #PF handler or as part of mprotect(), which ever
>> > > > suits you
>> > > > > best.
>> > > > >
>> > > > > I'll try demonstrate this with two examples.
>> > > > >
>> > > > > mmap() could go something like this() (simplified):
>> > > > > 1. Execution #UD's to SYSCALL.
>> > > > > 2. Host calls enclave's mmap() handler with mmap() parameters.
>> > > > > 3. Enclave up-calls host's mmap().
>> > > > > 4. Loops the range with EACCEPTCOPY.
>> > > > >
>> > > > > mprotect() has to be done like this:
>> > > > > 1. Execution #UD's to SYSCALL.
>> > > > > 2. Host calls enclave's mprotect() handler.
>> > > > > 3. Enclave up-calls host's mprotect().
>> > > > > 4. Enclave up-calls host's ioctl() to  
>> SGX_IOC_ENCLAVE_PERMISSIONS.
>> >
>> > I assume up-calls here are ocalls as we call them in our  
>> implementation,
>> > which are the calls enclave make to untrusted side via EEXIT.
>> >
>> > If so, can your implementation combine this two up-calls into one,  
>> then host
>> > side just do ioctl() and mprotect to kernel? If so, would that  
>> address your
>> > concern about extra up-calls?
>> >
>> >
>> > > > > 3. Loops the range with EACCEPT.
>> > > >   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> > > >   5. Loops the range with EACCEPT + EMODPE.
>> > > >
>> > > > > This is just terrible IMHO. I hope these examples bring some  
>> insight.
>> > >
>> > > E.g. in Enarx we have to add a special up-call (so called enarxcall  
>> in
>> > > intermediate that we call sallyport, which provides shared buffer to
>> > > communicate with the enclave) just for reseting the range with  
>> PROT_READ.
>> > > Feel very redundant, adds ugly cruft and is completely opposite  
>> strategy
>> > > to
>> > > what you've chosen to do with EAUG, which is I think correct choice  
>> as
>> > > far
>> > > as API is concerned.
>> >
>> > The problem with EMODPR on #PF is that kernel needs to know what  
>> permissions
>> > requested from enclave at the time of #PF. So enclave has to make at  
>> least
>> > one call to kernel (again via ocall in our case, I assume up-call in  
>> your
>> > case) to make the change.
>>
>> The #PF handler should do unconditionally EMODPR with PROT_READ.
>
> Or mprotect(), as long as secinfo contains PROT_READ. I don't care about
> this detail hugely anymore because it does not affect uapi.
>
> Using EMODPR as a permission control mechanism is a ridiculous idea, and
> I cannot commit to maintain a broken uapi.
>

Jarkko, how would automatically forcing PROT_READ on #PF work for this  
sequence?

1) EAUG a page (has to be RW)
2) EACCEPT(RW)
3) enclave copies some data to page
4) enclave wants to change permission to R

If you are proposing mprotect, then as I indicated earlier, please address  
concerns raised by Reinette:
https://lore.kernel.org/linux-sgx/e1c04077-0165-c5ec-53be-7fd732965e80@intel.com/



Thanks
Haitao

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-17  4:34                         ` Jarkko Sakkinen
@ 2022-03-17 14:42                           ` Haitao Huang
  0 siblings, 0 replies; 130+ messages in thread
From: Haitao Huang @ 2022-03-17 14:42 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Reinette Chatre, Dhanraj, Vijay, dave.hansen, tglx, bp,
	Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel, nathaniel

On Wed, 16 Mar 2022 23:34:39 -0500, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
>> Hi Jarkko
>>
>> On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen <jarkko@kernel.org>
>> wrote:
>>
>> > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
>> > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
>> > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
>> > > >
>> > > > > I saw Haitao's note that EMODPE requires "Read access permitted
>> > > by enclave".
>> > > > > This motivates that EMODPR->PROT_NONE should not be allowed
>> > > since it would
>> > > > > not be possible to relax permissions (run EMODPE) after that.
>> > > Even so, I
>> > > > > also found in the SDM that EACCEPT has the note "Read access
>> > > permitted
>> > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is
>> > > not practical
>> > > > > from that perspective either since the enclave will not be able  
>> to
>> > > > > EACCEPT the change. Does that match your understanding?
>> > > >
>> > > > Yes, PROT_NONE should not be allowed.
>> > > >
>> > > > This is however the real problem.
>> > > >
>> > > > The current kernel patch set has inconsistent API and EMODPR  
>> ioctl is
>> > > > simply unacceptable. It  also requires more concurrency management
>> > > from
>> > > > user space run-time, which would be heck a lot easier to do in the
>> > > kernel.
>> > > >
>> > > > If you really want EMODPR as ioctl, then for consistencys sake,
>> > > then EAUG
>> > > > should be too. Like this when things go opposite directions, this
>> > > patch set
>> > > > plain and simply will not work out.
>> > > >
>> > > > I would pick EAUG's strategy from these two as it requires half
>> > > the back
>> > > > calls to host from an enclave. I.e. please combine mprotect() and
>> > > EMODPR,
>> > > > either in the #PF handler or as part of mprotect(), which ever
>> > > suits you
>> > > > best.
>> > > >
>> > > > I'll try demonstrate this with two examples.
>> > > >
>> > > > mmap() could go something like this() (simplified):
>> > > > 1. Execution #UD's to SYSCALL.
>> > > > 2. Host calls enclave's mmap() handler with mmap() parameters.
>> > > > 3. Enclave up-calls host's mmap().
>> > > > 4. Loops the range with EACCEPTCOPY.
>> > > >
>> > > > mprotect() has to be done like this:
>> > > > 1. Execution #UD's to SYSCALL.
>> > > > 2. Host calls enclave's mprotect() handler.
>> > > > 3. Enclave up-calls host's mprotect().
>> > > > 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.
>>
>> I assume up-calls here are ocalls as we call them in our implementation,
>> which are the calls enclave make to untrusted side via EEXIT.
>>
>> If so, can your implementation combine this two up-calls into one, then  
>> host
>> side just do ioctl() and mprotect to kernel? If so, would that address  
>> your
>> concern about extra up-calls?
>>
>>
>> > > > 3. Loops the range with EACCEPT.
>> > >   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> > >   5. Loops the range with EACCEPT + EMODPE.
>> > >
>> > > > This is just terrible IMHO. I hope these examples bring some  
>> insight.
>> >
>> > E.g. in Enarx we have to add a special up-call (so called enarxcall in
>> > intermediate that we call sallyport, which provides shared buffer to
>> > communicate with the enclave) just for reseting the range with  
>> PROT_READ.
>> > Feel very redundant, adds ugly cruft and is completely opposite  
>> strategy
>> > to
>> > what you've chosen to do with EAUG, which is I think correct choice as
>> > far
>> > as API is concerned.
>>
>> The problem with EMODPR on #PF is that kernel needs to know what  
>> permissions
>> requested from enclave at the time of #PF. So enclave has to make at  
>> least
>> one call to kernel (again via ocall in our case, I assume up-call in  
>> your
>> case) to make the change.
>
> Your security scheme is broken if permissions are requested outside the
> enclave, i.e. the hostile environment controls the permissions. That  
> should
> always come from the enclave and enclave uses EACCEPT* to validate that
> what was given to EMODPR, EAUG and EMODT matches its expectations.
>
> Upper layer application should never be in charge, and a broken
> security scheme should never be supported.
>
By upper layer in this case I mean code inside the enclave.
The enclave can always use EACCEPT to verify permissions and is in full
control of EPCM permissions.
The kernel (or code outside the enclave invoking the kernel) would only be
able to reduce EPCM permissions, and as you know the enclave can always
EMODPE.
So this is not related to enclave security.
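
The reduce-only argument above can be sketched as a small exhaustive
check. This is a toy model under stated assumptions, not kernel or SDM
code: restrict_perms(), accept_matches(), extend_perms() and
host_can_only_reduce() are hypothetical names, and permissions are
reduced to three R/W/X bits.

```c
#include <stdbool.h>
#include <stdint.h>

enum { PERM_R = 1, PERM_W = 2, PERM_X = 4, PERM_ALL = 7 };

/* Models a host-requested restriction: the result is an intersection. */
static uint8_t restrict_perms(uint8_t cur, uint8_t req)
{
	return cur & req;
}

/* Models the enclave-side check: accept only the expected permissions. */
static bool accept_matches(uint8_t actual, uint8_t expected)
{
	return actual == expected;
}

/* Models the enclave-side extension: bits are only ever added. */
static uint8_t extend_perms(uint8_t cur, uint8_t add)
{
	return cur | add;
}

/* True if no (cur, req) pair lets the host side gain a permission bit. */
static bool host_can_only_reduce(void)
{
	for (uint8_t cur = 0; cur <= PERM_ALL; cur++)
		for (uint8_t req = 0; req <= PERM_ALL; req++)
			if (restrict_perms(cur, req) & ~cur)
				return false;
	return true;
}
```

In this model a restriction the enclave did not expect shows up as a
failed accept_matches() call, which is the verification role EACCEPT
plays in the argument above.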

Haitao

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-17  4:37                         ` Jarkko Sakkinen
@ 2022-03-17 14:47                           ` Haitao Huang
  0 siblings, 0 replies; 130+ messages in thread
From: Haitao Huang @ 2022-03-17 14:47 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Reinette Chatre, Dhanraj, Vijay, dave.hansen, tglx, bp,
	Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel, nathaniel

On Wed, 16 Mar 2022 23:37:26 -0500, Jarkko Sakkinen <jarkko@kernel.org>  
wrote:

> On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
>> I also see this model as consistent to what kernel does for regular  
>> memory
>> mappings: adding physical pages on #PF or pre-fault and changing PTE
>> permissions only after mprotect is called.
>
> And you were against this in EAUG's case. As in the EAUG's case
> EMODPR could be done as part of the mprotect() flow.
>

I preferred not to do automatic/unconditional EAUG during mmap.
Here I think automatic/unconditional EMODPR(PROT_READ) on #PF would not
work for all cases. See my reply to your other email.

Thanks
Haitao

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-17 14:28                             ` Haitao Huang
@ 2022-03-17 21:50                               ` Jarkko Sakkinen
  2022-03-17 22:00                                 ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-17 21:50 UTC (permalink / raw)
  To: Haitao Huang
  Cc: Reinette Chatre, Dhanraj, Vijay, dave.hansen, tglx, bp,
	Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel, nathaniel

On Thu, Mar 17, 2022 at 09:28:45AM -0500, Haitao Huang wrote:
> Hi
> 
> On Thu, 17 Mar 2022 02:11:28 -0500, Jarkko Sakkinen <jarkko@kernel.org>
> wrote:
> 
> > On Thu, Mar 17, 2022 at 09:01:07AM +0200, Jarkko Sakkinen wrote:
> > > On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
> > > > Hi Jarkko
> > > >
> > > > On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen
> > > <jarkko@kernel.org>
> > > > wrote:
> > > >
> > > > > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
> > > > > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
> > > > > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
> > > > > > >
> > > > > > > > I saw Haitao's note that EMODPE requires "Read access
> > > permitted
> > > > > > by enclave".
> > > > > > > > This motivates that EMODPR->PROT_NONE should not be allowed
> > > > > > since it would
> > > > > > > > not be possible to relax permissions (run EMODPE) after that.
> > > > > > Even so, I
> > > > > > > > also found in the SDM that EACCEPT has the note "Read access
> > > > > > permitted
> > > > > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is
> > > > > > not practical
> > > > > > > > from that perspective either since the enclave will not be
> > > able to
> > > > > > > > EACCEPT the change. Does that match your understanding?
> > > > > > >
> > > > > > > Yes, PROT_NONE should not be allowed.
> > > > > > >
> > > > > > > This is however the real problem.
> > > > > > >
> > > > > > > The current kernel patch set has inconsistent API and EMODPR
> > > ioctl is
> > > > > > > simply unacceptable. It  also requires more concurrency
> > > management
> > > > > > from
> > > > > > > user space run-time, which would be heck a lot easier to do
> > > in the
> > > > > > kernel.
> > > > > > >
> > > > > > > If you really want EMODPR as ioctl, then for consistencys sake,
> > > > > > then EAUG
> > > > > > > should be too. Like this when things go opposite directions,
> > > this
> > > > > > patch set
> > > > > > > plain and simply will not work out.
> > > > > > >
> > > > > > > I would pick EAUG's strategy from these two as it requires half
> > > > > > the back
> > > > > > > calls to host from an enclave. I.e. please combine
> > > mprotect() and
> > > > > > EMODPR,
> > > > > > > either in the #PF handler or as part of mprotect(), which ever
> > > > > > suits you
> > > > > > > best.
> > > > > > >
> > > > > > > I'll try demonstrate this with two examples.
> > > > > > >
> > > > > > > mmap() could go something like this() (simplified):
> > > > > > > 1. Execution #UD's to SYSCALL.
> > > > > > > 2. Host calls enclave's mmap() handler with mmap() parameters.
> > > > > > > 3. Enclave up-calls host's mmap().
> > > > > > > 4. Loops the range with EACCEPTCOPY.
> > > > > > >
> > > > > > > mprotect() has to be done like this:
> > > > > > > 1. Execution #UD's to SYSCALL.
> > > > > > > 2. Host calls enclave's mprotect() handler.
> > > > > > > 3. Enclave up-calls host's mprotect().
> > > > > > > 4. Enclave up-calls host's ioctl() to
> > > SGX_IOC_ENCLAVE_PERMISSIONS.
> > > >
> > > > I assume up-calls here are ocalls as we call them in our
> > > implementation,
> > > > which are the calls enclave make to untrusted side via EEXIT.
> > > > >
> > > > If so, can your implementation combine this two up-calls into one,
> > > then host
> > > > side just do ioctl() and mprotect to kernel? If so, would that
> > > address your
> > > > concern about extra up-calls?
> > > >
> > > >
> > > > > > > 3. Loops the range with EACCEPT.
> > > > > >   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > >   5. Loops the range with EACCEPT + EMODPE.
> > > > > >
> > > > > > > This is just terrible IMHO. I hope these examples bring some
> > > insight.
> > > > >
> > > > > E.g. in Enarx we have to add a special up-call (so called
> > > enarxcall in
> > > > > intermediate that we call sallyport, which provides shared buffer to
> > > > > communicate with the enclave) just for reseting the range with
> > > PROT_READ.
> > > > > Feel very redundant, adds ugly cruft and is completely opposite
> > > strategy
> > > > > to
> > > > > what you've chosen to do with EAUG, which is I think correct
> > > choice as
> > > > > far
> > > > > as API is concerned.
> > > >
> > > > The problem with EMODPR on #PF is that kernel needs to know what
> > > permissions
> > > > requested from enclave at the time of #PF. So enclave has to make
> > > at least
> > > > one call to kernel (again via ocall in our case, I assume up-call
> > > in your
> > > > case) to make the change.
> > > 
> > > The #PF handler should do unconditionally EMODPR with PROT_READ.
> > 
> > Or mprotect(), as long as secinfo contains PROT_READ. I don't care about
> > this detail hugely anymore because it does not affect uapi.
> > 
> > Using EMODPR as a permission control mechanism is a ridiculous idea, and
> > I cannot commit to maintain a broken uapi.
> > 
> 
> Jarkko, how would automatically forcing PROT_READ on #PF work for this
> sequence?
> 
> 1) EAUG a page (has to be RW)
> 2) EACCEPT(RW)
> 3) enclave copies some data to page
> 4) enclave wants to change permission to R
> 
> If you are proposing mprotect, then as I indicated earlier, please address
> concerns raised by Reinette:
> https://lore.kernel.org/linux-sgx/e1c04077-0165-c5ec-53be-7fd732965e80@intel.com/

For EAUG you can choose between #PF handler and having it as part of
mmap() with the same uapi.

For EMODPR clearly #PF handler would be tricky but nothing prevents
resetting the permissions as part of mprotect() flow, which is trivial.

One good reason to have a fixed EMODPR is that e.g. properly emulating
mprotect() is almost undoable if you don't do it otherwise. Specifically
the scenario where your address range spans multiple adjacent VMAs.
Even without EMODPR it is a complex enough scenario that you really
don't want to ask for more trouble than using EMODPR in a super
conservative manner.

Having EMODPR fully exposed will only make for a more difficult API with
extra round-trips. If you want to use ring-0 instructions fully exposed,
please don't use a kernel. There's a bunch of hardware features in Intel
CPUs for which Linux does not provide wide-open 1:1 interfaces.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-17 21:50                               ` Jarkko Sakkinen
@ 2022-03-17 22:00                                 ` Jarkko Sakkinen
  2022-03-17 22:23                                   ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-17 22:00 UTC (permalink / raw)
  To: Haitao Huang
  Cc: Reinette Chatre, Dhanraj, Vijay, dave.hansen, tglx, bp,
	Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel, nathaniel

On Thu, Mar 17, 2022 at 11:50:41PM +0200, Jarkko Sakkinen wrote:
> On Thu, Mar 17, 2022 at 09:28:45AM -0500, Haitao Huang wrote:
> > Hi
> > 
> > On Thu, 17 Mar 2022 02:11:28 -0500, Jarkko Sakkinen <jarkko@kernel.org>
> > wrote:
> > 
> > > On Thu, Mar 17, 2022 at 09:01:07AM +0200, Jarkko Sakkinen wrote:
> > > > On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
> > > > > Hi Jarkko
> > > > >
> > > > > On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen
> > > > <jarkko@kernel.org>
> > > > > wrote:
> > > > >
> > > > > > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
> > > > > > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
> > > > > > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
> > > > > > > >
> > > > > > > > > I saw Haitao's note that EMODPE requires "Read access
> > > > permitted
> > > > > > > by enclave".
> > > > > > > > > This motivates that EMODPR->PROT_NONE should not be allowed
> > > > > > > since it would
> > > > > > > > > not be possible to relax permissions (run EMODPE) after that.
> > > > > > > Even so, I
> > > > > > > > > also found in the SDM that EACCEPT has the note "Read access
> > > > > > > permitted
> > > > > > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is
> > > > > > > not practical
> > > > > > > > > from that perspective either since the enclave will not be
> > > > able to
> > > > > > > > > EACCEPT the change. Does that match your understanding?
> > > > > > > >
> > > > > > > > Yes, PROT_NONE should not be allowed.
> > > > > > > >
> > > > > > > > This is however the real problem.
> > > > > > > >
> > > > > > > > The current kernel patch set has inconsistent API and EMODPR
> > > > ioctl is
> > > > > > > > simply unacceptable. It  also requires more concurrency
> > > > management
> > > > > > > from
> > > > > > > > user space run-time, which would be heck a lot easier to do
> > > > in the
> > > > > > > kernel.
> > > > > > > >
> > > > > > > > If you really want EMODPR as ioctl, then for consistencys sake,
> > > > > > > then EAUG
> > > > > > > > should be too. Like this when things go opposite directions,
> > > > this
> > > > > > > patch set
> > > > > > > > plain and simply will not work out.
> > > > > > > >
> > > > > > > > I would pick EAUG's strategy from these two as it requires half
> > > > > > > the back
> > > > > > > > calls to host from an enclave. I.e. please combine
> > > > mprotect() and
> > > > > > > EMODPR,
> > > > > > > > either in the #PF handler or as part of mprotect(), which ever
> > > > > > > suits you
> > > > > > > > best.
> > > > > > > >
> > > > > > > > I'll try demonstrate this with two examples.
> > > > > > > >
> > > > > > > > mmap() could go something like this() (simplified):
> > > > > > > > 1. Execution #UD's to SYSCALL.
> > > > > > > > 2. Host calls enclave's mmap() handler with mmap() parameters.
> > > > > > > > 3. Enclave up-calls host's mmap().
> > > > > > > > 4. Loops the range with EACCEPTCOPY.
> > > > > > > >
> > > > > > > > mprotect() has to be done like this:
> > > > > > > > 1. Execution #UD's to SYSCALL.
> > > > > > > > 2. Host calls enclave's mprotect() handler.
> > > > > > > > 3. Enclave up-calls host's mprotect().
> > > > > > > > 4. Enclave up-calls host's ioctl() to
> > > > SGX_IOC_ENCLAVE_PERMISSIONS.
> > > > >
> > > > > I assume up-calls here are ocalls as we call them in our
> > > > implementation,
> > > > > which are the calls enclave make to untrusted side via EEXIT.
> > > > > >
> > > > > If so, can your implementation combine this two up-calls into one,
> > > > then host
> > > > > side just do ioctl() and mprotect to kernel? If so, would that
> > > > address your
> > > > > concern about extra up-calls?
> > > > >
> > > > >
> > > > > > > > 3. Loops the range with EACCEPT.
> > > > > > >   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > >   5. Loops the range with EACCEPT + EMODPE.
> > > > > > >
> > > > > > > > This is just terrible IMHO. I hope these examples bring some
> > > > insight.
> > > > > >
> > > > > > E.g. in Enarx we have to add a special up-call (so called
> > > > enarxcall in
> > > > > > intermediate that we call sallyport, which provides shared buffer to
> > > > > > communicate with the enclave) just for reseting the range with
> > > > PROT_READ.
> > > > > > Feel very redundant, adds ugly cruft and is completely opposite
> > > > strategy
> > > > > > to
> > > > > > what you've chosen to do with EAUG, which is I think correct
> > > > choice as
> > > > > > far
> > > > > > as API is concerned.
> > > > >
> > > > > The problem with EMODPR on #PF is that kernel needs to know what
> > > > permissions
> > > > > requested from enclave at the time of #PF. So enclave has to make
> > > > at least
> > > > > one call to kernel (again via ocall in our case, I assume up-call
> > > > in your
> > > > > case) to make the change.
> > > > 
> > > > The #PF handler should do unconditionally EMODPR with PROT_READ.
> > > 
> > > Or mprotect(), as long as secinfo contains PROT_READ. I don't care about
> > > this detail hugely anymore because it does not affect uapi.
> > > 
> > > Using EMODPR as a permission control mechanism is a ridiculous idea, and
> > > I cannot commit to maintain a broken uapi.
> > > 
> > 
> > Jarkko, how would automatically forcing PROT_READ on #PF work for this
> > sequence?
> > 
> > 1) EAUG a page (has to be RW)
> > 2) EACCEPT(RW)
> > 3) enclave copies some data to page
> > 4) enclave wants to change permission to R
> > 
> > If you are proposing mprotect, then as I indicated earlier, please address
> > concerns raised by Reinette:
> > https://lore.kernel.org/linux-sgx/e1c04077-0165-c5ec-53be-7fd732965e80@intel.com/
> 
> For EAUG you can choose between #PF handler and having it as part of
> mmap() with the same uapi.
> 
> For EMODPR clearly #PF handler would be tricky but nothing prevents
> resetting the permissions as part of mprotect() flow, which is trivial.
> 
> One good reason to have a fixed EMODPR is that e.g. emulating properly
> mprotect() is almost undoable if you don't do it otherwise. Specifically

s/don't//g

> the scenario where your address range spans through multiple adjacent
> VMAs. It's even without EMODPR complex enough scenario that you really
> don't want to ask yourself for more trouble than use EMODPR in a super
> conservative manner.
> 
> Having EMODPR fully exposed will only make more difficult API to do with
> extra round-trips. If you want to use ring-0 instructions fully exposed,
> please don't use a kernel. There's a bunch of hardware features in Intel
> CPUs for which Linux does not provide 1:1 all wide open interfaces.
> 
> BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-17  4:30                         ` Jarkko Sakkinen
@ 2022-03-17 22:08                           ` Reinette Chatre
  2022-03-17 22:51                             ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-03-17 22:08 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

Hi Jarkko,

On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
>>>> Supporting permission restriction in an ioctl() enables the runtime to manage
>>>> the enclave memory without needing to map it.
>>>
>>> Which is opposite what you do in EAUG. You can also augment pages without
>>> needing the map them. Sure you get that capability, but it is quite useless
>>> in practice.
>>>
>>>> I have considered the idea of supporting the permission restriction with
>>>> mprotect() but as you can see in this response I did not find it to be
>>>> practical.
>>>
>>> Where is it practical? What is your application? How is it practical to
>>> delegate the concurrency management of a split mprotect() to user space?
>>> How do we get rid off a useless up-call to the host?
>>>
>>
>> The email you responded to contained many obstacles against using mprotect()
>> but you chose to ignore them and snipped them all from your response. Could
>> you please address the issues instead of dismissing them? 
> 
> I did read the whole email but did not see anything that would make a case
> for fully exposed EMODPR, or having asymmetrical towards how EAUG works.

I believe that on its own each obstacle I shared with you is significant enough
to not follow that approach. You simply respond that I am just not making a
case without acknowledging any obstacle or providing a reason why the obstacles
are not valid.

To help me understand your view, could you please respond to each of the
obstacles I list below and explain how it is not an issue?


1) ABI change:
   mprotect() is currently supported to modify VMA permissions
   irrespective of EPCM permissions. Supporting EPCM permission
   changes with mprotect() would change this behavior.
   For example, currently it is possible to have RW enclave
   memory and support multiple tasks accessing the memory. Two
   tasks can map the memory RW and later one can run mprotect()
   to reduce the VMA permissions to read-only without impacting
   the access of the other task.
   By moving EPCM permission changes to mprotect() this usage
   will no longer be supported and current behavior will change.
   
2) Only half EPCM permission management:
   Moving to mprotect() as a way to set EPCM permissions is
   not a clear interface for EPCM permission management because
   the kernel can only restrict permissions. Even so, the kernel
   has no insight into the current EPCM permissions and thus whether they
   actually need to be restricted so every mprotect() call,
   all except RWX, will need to be treated as a permission
   restriction with all the implementation obstacles
   that accompany it (more below).
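
To make obstacle 1 concrete, here is a minimal user-space model of the
difference between per-mapping VMA permissions and per-page EPCM
permissions. It is only a sketch: the PERM_* bits, the model_* structs and
the model_* functions are hypothetical names made up for this illustration
and are not kernel or SGX driver definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Permission bits for this model only; not the kernel's definitions. */
enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

/* One enclave page: EPCM permissions belong to the page itself. */
struct model_page { unsigned epcm; };

/* One task's mapping of that page: VMA permissions are per mapping. */
struct model_mapping { struct model_page *page; unsigned vma; };

/* An access succeeds only if both the mapping and the EPCM allow it. */
static bool model_access_ok(const struct model_mapping *m, unsigned want)
{
    return (m->vma & m->page->epcm & want) == want;
}

/* mprotect() as it behaves today: only the caller's mapping changes. */
static void model_mprotect_vma_only(struct model_mapping *m, unsigned prot)
{
    m->vma = prot;
}

/* mprotect() that also restricts EPCM: the shared page changes too. */
static void model_mprotect_with_epcm(struct model_mapping *m, unsigned prot)
{
    m->vma = prot;
    m->page->epcm &= prot; /* an EMODPR-like step can only remove bits */
}
```

With two mappings of the same RW page, model_mprotect_vma_only() on one
mapping leaves the other task able to write, while
model_mprotect_with_epcm() removes write access for both.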

There are two possible ways to implement permission restriction
as triggered by mprotect(), (a) during the mprotect() call or
(b) during a subsequent #PF (as suggested by you), each has
its own obstacles.

3) mprotect() implementation 

   When the user calls mprotect() the expectation is that the
   call will either succeed or fail. If the call fails the user
   expects the system to be unchanged. This is not possible if
   permission restriction is done as part of mprotect().

   (a) mprotect() may span multiple VMAs and involves VMA splits
       that (from what I understand) cannot be undone. SGX memory
       does not support VMA merges. If any SGX function
       (EMODPR or ETRACK on any page) done after a VMA split fails
       then the user will be left with fragmented memory.

   (b) The EMODPR/ETRACK pair can fail on any of the pages provided
       by the mprotect() call. If there is a failure then the
       kernel cannot undo previously executed EMODPR since the kernel
       cannot run EMODPE. The EPCM permissions are thus left in an
       inconsistent state since some of the pages would have changed
       EPCM permissions and mprotect() does not have a mechanism to
       communicate partial success.
       The partial success is needed to communicate to user space
       (i) which pages need EACCEPT, (ii) which pages need to be
       in a new request (although user space does not have information
       to help the new request succeed - see below).

   (c) User space runtime has control over management of EPC memory
       and accurate failure information would help it to do so.
       Knowing the error code of the EMODPR failure would help
       user space to take appropriate action. For example, EMODPR
       can return "SGX_PAGE_NOT_MODIFIABLE" that helps the runtime
       to learn that it needs to run EACCEPT on that page before
       the EMODPR can succeed. Alternatively, if it learns that the
       return is "SGX_EPC_PAGE_CONFLICT" then it could determine
       that some other part of the runtime attempted an ENCLU 
       function on that page.
       It is not possible to provide such detailed errors to user
       space with mprotect().
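
The reporting gap described in 3(b) and 3(c) can be sketched with a small
user-space model. Everything here is hypothetical and exists only for
illustration: the model_result values merely echo the error names
mentioned above, and struct model_restrict_args is not the parameter
structure from the patch set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page outcomes for this model. */
enum model_result {
    MODEL_OK = 0,
    MODEL_PAGE_NOT_MODIFIABLE,
    MODEL_EPC_PAGE_CONFLICT,
};

/* Hypothetical ioctl()-style argument with output fields. */
struct model_restrict_args {
    size_t npages;            /* in:  pages requested */
    size_t count;             /* out: pages restricted before stopping */
    enum model_result result; /* out: outcome of the first failing page */
};

/* page_state[i] is the outcome the restriction has on page i. */
static int model_restrict_ioctl(const enum model_result *page_state,
                                struct model_restrict_args *args)
{
    args->count = 0;
    args->result = MODEL_OK;
    for (size_t i = 0; i < args->npages; i++) {
        if (page_state[i] != MODEL_OK) {
            args->result = page_state[i];
            return -1; /* earlier pages stay restricted */
        }
        args->count++;
    }
    return 0;
}

/* mprotect()-style call: same work, but only 0 or -1 reaches the caller. */
static int model_restrict_mprotect(const enum model_result *page_state,
                                   size_t npages)
{
    struct model_restrict_args args = { .npages = npages };

    return model_restrict_ioctl(page_state, &args);
}
```

In this model a failure on the third of four pages gives the ioctl()-style
caller count == 2 and the failing page's result, while the
mprotect()-style caller only learns that the call failed.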


4) #PF implementation

   (a) There is more to restricting permissions than just running
       ENCLS[EMODPR]. After running ENCLS[EMODPR] the kernel should
       also initiate the ETRACK flow to ensure that any thread within
       the enclave is interrupted by sending an IPI to the CPU, 
       this includes the thread that just triggered the #PF.        

   (b) A second consideration of the EMODPR and ETRACK flow is that
       this has a large "blast radius" in that any thread in the
       enclave needs to be interrupted. #PFs may arrive at any time
       so setting up a page range where a fault into any page in the
       page range will trigger enclave exits for all threads is
       a significant yet random impact. I believe it would be better
       to update all pages in the range at the same time and in this
       way contain the impact of this significant EMODPR/ETRACK/IPIs
       flow.

   (c) How will the page fault handler know when EMODPR/ETRACK should
       be run? Consider that the page fault handler can be called
       significantly later than the mprotect() call and that
       user space can call EMODPE any time to extend permissions.
       This implies that EMODPR/ETRACK/IPIs should be run during
       *every* page fault, irrespective of mprotect().

   (d) If a page is in pending or modified state then EMODPR will
       always fail. This is something that needs to be fixed by
       user space runtime but the page fault will not be able
       to communicate this.     

Considering the above, could you please provide clear guidance on
how you envision permission restriction to be supported by mprotect()?

Reinette

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-17 22:00                                 ` Jarkko Sakkinen
@ 2022-03-17 22:23                                   ` Jarkko Sakkinen
  0 siblings, 0 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-17 22:23 UTC (permalink / raw)
  To: Haitao Huang
  Cc: Reinette Chatre, Dhanraj, Vijay, dave.hansen, tglx, bp,
	Lutomirski, Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel, nathaniel

On Fri, Mar 18, 2022 at 12:00:17AM +0200, Jarkko Sakkinen wrote:
> On Thu, Mar 17, 2022 at 11:50:41PM +0200, Jarkko Sakkinen wrote:
> > On Thu, Mar 17, 2022 at 09:28:45AM -0500, Haitao Huang wrote:
> > > Hi
> > > 
> > > On Thu, 17 Mar 2022 02:11:28 -0500, Jarkko Sakkinen <jarkko@kernel.org>
> > > wrote:
> > > 
> > > > On Thu, Mar 17, 2022 at 09:01:07AM +0200, Jarkko Sakkinen wrote:
> > > > > On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote:
> > > > > > Hi Jarkko
> > > > > >
> > > > > > On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen
> > > > > <jarkko@kernel.org>
> > > > > > wrote:
> > > > > >
> > > > > > > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote:
> > > > > > > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
> > > > > > > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
> > > > > > > > >
> > > > > > > > > > I saw Haitao's note that EMODPE requires "Read access
> > > > > permitted
> > > > > > > > by enclave".
> > > > > > > > > > This motivates that EMODPR->PROT_NONE should not be allowed
> > > > > > > > since it would
> > > > > > > > > > not be possible to relax permissions (run EMODPE) after that.
> > > > > > > > Even so, I
> > > > > > > > > > also found in the SDM that EACCEPT has the note "Read access
> > > > > > > > permitted
> > > > > > > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is
> > > > > > > > not practical
> > > > > > > > > > from that perspective either since the enclave will not be
> > > > > able to
> > > > > > > > > > EACCEPT the change. Does that match your understanding?
> > > > > > > > >
> > > > > > > > > Yes, PROT_NONE should not be allowed.
> > > > > > > > >
> > > > > > > > > This is however the real problem.
> > > > > > > > >
> > > > > > > > > The current kernel patch set has inconsistent API and EMODPR
> > > > > ioctl is
> > > > > > > > > simply unacceptable. It  also requires more concurrency
> > > > > management
> > > > > > > > from
> > > > > > > > > user space run-time, which would be heck a lot easier to do
> > > > > in the
> > > > > > > > kernel.
> > > > > > > > >
> > > > > > > > > If you really want EMODPR as ioctl, then for consistencys sake,
> > > > > > > > then EAUG
> > > > > > > > > should be too. Like this when things go opposite directions,
> > > > > this
> > > > > > > > patch set
> > > > > > > > > plain and simply will not work out.
> > > > > > > > >
> > > > > > > > > I would pick EAUG's strategy from these two as it requires half
> > > > > > > > the back
> > > > > > > > > calls to host from an enclave. I.e. please combine
> > > > > mprotect() and
> > > > > > > > EMODPR,
> > > > > > > > > either in the #PF handler or as part of mprotect(), which ever
> > > > > > > > suits you
> > > > > > > > > best.
> > > > > > > > >
> > > > > > > > > I'll try demonstrate this with two examples.
> > > > > > > > >
> > > > > > > > > mmap() could go something like this() (simplified):
> > > > > > > > > 1. Execution #UD's to SYSCALL.
> > > > > > > > > 2. Host calls enclave's mmap() handler with mmap() parameters.
> > > > > > > > > 3. Enclave up-calls host's mmap().
> > > > > > > > > 4. Loops the range with EACCEPTCOPY.
> > > > > > > > >
> > > > > > > > > mprotect() has to be done like this:
> > > > > > > > > 1. Execution #UD's to SYSCALL.
> > > > > > > > > 2. Host calls enclave's mprotect() handler.
> > > > > > > > > 3. Enclave up-calls host's mprotect().
> > > > > > > > > 4. Enclave up-calls host's ioctl() to
> > > > > SGX_IOC_ENCLAVE_PERMISSIONS.
> > > > > >
> > > > > > I assume up-calls here are ocalls as we call them in our
> > > > > implementation,
> > > > > > which are the calls enclave make to untrusted side via EEXIT.
> > > > > > >
> > > > > > If so, can your implementation combine this two up-calls into one,
> > > > > then host
> > > > > > side just do ioctl() and mprotect to kernel? If so, would that
> > > > > address your
> > > > > > concern about extra up-calls?
> > > > > >
> > > > > >
> > > > > > > > > 3. Loops the range with EACCEPT.
> > > > > > > >   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > > >   5. Loops the range with EACCEPT + EMODPE.
> > > > > > > >
> > > > > > > > > This is just terrible IMHO. I hope these examples bring some
> > > > > insight.
> > > > > > >
> > > > > > > E.g. in Enarx we have to add a special up-call (so called
> > > > > enarxcall in
> > > > > > > intermediate that we call sallyport, which provides shared buffer to
> > > > > > > communicate with the enclave) just for reseting the range with
> > > > > PROT_READ.
> > > > > > > Feel very redundant, adds ugly cruft and is completely opposite
> > > > > strategy
> > > > > > > to
> > > > > > > what you've chosen to do with EAUG, which is I think correct
> > > > > choice as
> > > > > > > far
> > > > > > > as API is concerned.
> > > > > >
> > > > > > The problem with EMODPR on #PF is that kernel needs to know what
> > > > > permissions
> > > > > > requested from enclave at the time of #PF. So enclave has to make
> > > > > at least
> > > > > > one call to kernel (again via ocall in our case, I assume up-call
> > > > > in your
> > > > > > case) to make the change.
> > > > > 
> > > > > The #PF handler should do unconditionally EMODPR with PROT_READ.
> > > > 
> > > > Or mprotect(), as long as secinfo contains PROT_READ. I don't care about
> > > > this detail hugely anymore because it does not affect uapi.
> > > > 
> > > > Using EMODPR as a permission control mechanism is a ridiculous idea, and
> > > > I cannot commit to maintain a broken uapi.
> > > > 
> > > 
> > > Jarkko, how would automatically forcing PROT_READ on #PF work for this
> > > sequence?
> > > 
> > > 1) EAUG a page (has to be RW)
> > > 2) EACCEPT(RW)
> > > 3) enclave copies some data to page
> > > 4) enclave wants to change permission to R
> > > 
> > > If you are proposing mprotect, then as I indicated earlier, please address
> > > concerns raised by Reinette:
> > > https://lore.kernel.org/linux-sgx/e1c04077-0165-c5ec-53be-7fd732965e80@intel.com/
> > 
> > For EAUG you can choose between #PF handler and having it as part of
> > mmap() with the same uapi.
> > 
> > For EMODPR clearly #PF handler would be tricky but nothing prevents
> > resetting the permissions as part of mprotect() flow, which is trivial.
> > 
> > One good reason to have a fixed EMODPR is that e.g. emulating properly
> > mprotect() is almost undoable if you don't do it otherwise. Specifically
> 
> s/don't//g
> 
> > the scenario where your address range spans through multiple adjacent
> > VMAs. It's even without EMODPR complex enough scenario that you really
> > don't want to ask yourself for more trouble than use EMODPR in a super
> > conservative manner.
> > 
> > Having EMODPR fully exposed will only make more difficult API to do with
> > extra round-trips. If you want to use ring-0 instructions fully exposed,
> > please don't use a kernel. There's a bunch of hardware features in Intel
> > CPUs for which Linux does not provide 1:1 all wide open interfaces.

I've now run a tweaked SGX2 v2 patch set [*] for 1.5 weeks and I'm really
confident about the stability. My laptop has not crashed a single time.
For the EAUG portion I'm probably ready, sooner rather than later, to give
reviewed-by's because the API works just great.

Just want to note that it is not the internals that I'm concerned about.
For v3 I'd suggest that it is sent as you see fit and does not get stuck
on EMODPR.

What I'll do, once I get it, is that I'll construct a small well-defined
patch or perhaps patch set, which shows how I would change the EMODPR part.

[*] I run it on my 2020 XPS13 laptop, which is SGX2 capable, and created
    this CI thing that periodically produces automated kernel package
    builds of it for Arch Linux: https://github.com/jarkkojs/aur-linux-sgx/actions.
    It's the distro kernel with the same config, Reinette's patches on top,
    and my tweaks on top of them. When v3 comes out, I'll update the kernel
    version and replace the v2+ patches with them.

BR, Jarkko

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-17 22:08                           ` Reinette Chatre
@ 2022-03-17 22:51                             ` Jarkko Sakkinen
  2022-03-18  0:11                               ` Reinette Chatre
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-17 22:51 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Thu, Mar 17, 2022 at 03:08:04PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
> > On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
> >> Hi Jarkko,
> >>
> >> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
> >>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> >>>> Supporting permission restriction in an ioctl() enables the runtime to manage
> >>>> the enclave memory without needing to map it.
> >>>
> >>> Which is opposite what you do in EAUG. You can also augment pages without
> >>> needing the map them. Sure you get that capability, but it is quite useless
> >>> in practice.
> >>>
> >>>> I have considered the idea of supporting the permission restriction with
> >>>> mprotect() but as you can see in this response I did not find it to be
> >>>> practical.
> >>>
> >>> Where is it practical? What is your application? How is it practical to
> >>> delegate the concurrency management of a split mprotect() to user space?
> >>> How do we get rid off a useless up-call to the host?
> >>>
> >>
> >> The email you responded to contained many obstacles against using mprotect()
> >> but you chose to ignore them and snipped them all from your response. Could
> >> you please address the issues instead of dismissing them? 
> > 
> > I did read the whole email but did not see anything that would make a case
> > for fully exposed EMODPR, or having asymmetrical towards how EAUG works.
> 
> I believe that on its own each obstacle I shared with you is significant enough
> to not follow that approach. You simply respond that I am just not making a
> case without acknowledging any obstacle or providing a reason why the obstacles
> are not valid.
> 
> To help me understand your view, could you please respond to each of the
> obstacles I list below and how it is not an issue?
> 
> 
> 1) ABI change:
>    mprotect() is currently supported to modify VMA permissions
>    irrespective of EPCM permissions. Supporting EPCM permission
>    changes with mprotect() would change this behavior.
>    For example, currently it is possible to have RW enclave
>    memory and support multiple tasks accessing the memory. Two
>    tasks can map the memory RW and later one can run mprotect()
>    to reduce the VMA permissions to read-only without impacting
>    the access of the other task.
>    By moving EPCM permission changes to mprotect() this usage
>    will no longer be supported and current behavior will change.

Your concurrency scenario is somewhat artificial. Obviously you need to
synchronize somehow, and breaking something that could be done with one
system call into two separate ones is not going to help with that. On the
contrary, it will add yet another layer of difficulty.

mprotect() controls PTE permissions, not EPCM permissions. Having this
division is the cornerstone of any sort of confidential computing.
That's why EACCEPT and EACCEPTCOPY exist.

There is no "current behaviour" yet because there is no mainline code, i.e.
that is an easy one to address.

> 2) Only half EPCM permission management:
>    Moving to mprotect() as a way to set EPCM permissions is
>    not a clear interface for EPCM permission management because
>    the kernel can only restrict permissions. Even so, the kernel
>    has no insight into the current EPCM permissions and thus whether they
>    actually need to be restricted so every mprotect() call,
>    all except RWX, will need to be treated as a permission
>    restriction with all the implementation obstacles
>    that accompany it (more below).
> 
> There are two possible ways to implement permission restriction
> as triggered by mprotect(), (a) during the mprotect() call or
> (b) during a subsequent #PF (as suggested by you), each has
> its own obstacles.

I would also have preferred to bundle EAUG unconditionally into the mmap()
flow. I've merely said that I don't care whether it is a part of mprotect()
flow or in the #PF handler, as long as the feature is not uncontrolled
chaos. At least in the mprotect() case it is probably easier to implement
it directly as part of mprotect().

Kernel is not the most trusted party in the confidential computing
scenarios. It is one of the adversaries. And SGX is designed in such a way
that the enclave controls the EPCM and the kernel controls PTEs. By trying
to artificially limit this you don't bring security; you only block
implementing applications based on SGX2.

We can ditch the whole SGX, if the point is that kernel controls what
happens inside enclave. Normal VMAs are much more capable for that purpose,
and kernel has full control over them with e.g. PTEs.

> 
> 3) mprotect() implementation 
> 
>    When the user calls mprotect() the expectation is that the
>    call will either succeed or fail. If the call fails the user
>    expects the system to be unchanged. This is not possible if
>    permission restriction is done as part of mprotect().
> 
>    (a) mprotect() may span multiple VMAs and involves VMA splits
>        that (from what I understand) cannot be undone. SGX memory
>        does not support VMA merges. If any SGX function
>        (EMODPR or ETRACK on any page) done after a VMA split fails
>        then the user will be left with fragmented memory.

Oh well, SGX does not even support syscalls, if we go to this level of
argument. And you are trying to sort this out with an even more flaky
interface, rather than a stable EPCM reset to the read state.

I've been implementing this exact feature lately and the only realistic way
to do it without many corner cases is to first use the current ioctl to
reset the range to READ in EPCM, and then set the appropriate permissions
with EMODPE.
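
The two-step flow described above can be sketched with a toy model. This is
only an illustration under stated assumptions: the struct and the toy_*()
helpers are hypothetical, and the only hardware facts relied on are that
EMODPR (run by the kernel) can only remove EPCM permissions, that EMODPE
(run by the enclave) can only add them, and that the effective access is
the intersection of the PTE and EPCM permissions.

```c
#include <assert.h>
#include <stdint.h>

/* Permission bits of the toy model (hypothetical values). */
enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

/* One enclave page: EPCM permissions and PTE permissions kept separately. */
struct toy_page {
	uint8_t epcm;	/* owned by hardware, extended only by the enclave */
	uint8_t pte;	/* owned by the kernel */
};

/* Kernel side, models EMODPR: can only remove EPCM permissions. */
static void toy_emodpr(struct toy_page *p, uint8_t perms)
{
	p->epcm &= perms;
}

/* Enclave side, models EMODPE: can only add EPCM permissions. */
static void toy_emodpe(struct toy_page *p, uint8_t perms)
{
	p->epcm |= perms;
}

/* What an access actually gets: intersection of PTE and EPCM. */
static uint8_t toy_effective(const struct toy_page *p)
{
	return p->epcm & p->pte;
}

/* Reset to READ first, then let the enclave extend to what it wants. */
static uint8_t toy_set_perms(struct toy_page *p, uint8_t want)
{
	toy_emodpr(p, PERM_R);	/* step 1: host resets EPCM to READ */
	/* the enclave would run EACCEPT here to acknowledge the change */
	toy_emodpe(p, want);	/* step 2: enclave extends EPCM */
	p->pte = want;		/* host updates the PTEs to match */
	return toy_effective(p);
}
```

With a page that starts out RW in both EPCM and PTE, asking for RX ends
with R|X in the model, because the reset drops W before EMODPE adds X.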


>    (b) The EMODPR/ETRACK pair can fail on any of the pages provided
>        by the mprotect() call. If there is a failure then the
>        kernel cannot undo previously executed EMODPR since the kernel
>        cannot run EMODPE. The EPCM permissions are thus left in inconsistent
>        state since some of the pages would have changed EPCM permissions
>        and mprotect() does not have mechanism to communicate
>        partial success.
>        The partial success is needed to communicate to user space
>        (i) which pages need EACCEPT, (ii) which pages need to be
>        in new request (although user space does not have information
>        to help the new request succeed - see below).

It's true but how common is that? Return e.g. -EIO, and run-time will
re-build the enclave. That anyway happens all the time with SGX for
various reasons (e.g. VM migration, S3 and whatnot). It's only important
that you know when this happens.

> 
>    (c) User space runtime has control over management of EPC memory
>        and accurate failure information would help it to do so.
>        Knowing the error code of the EMODPR failure would help
>        user space to take appropriate action. For example, EMODPR
>        can return "SGX_PAGE_NOT_MODIFIABLE" that helps the runtime
>        to learn that it needs to run EACCEPT on that page before
>        the EMODPR can succeed. Alternatively, if it learns that the
>        return is "SGX_EPC_PAGE_CONFLICT" then it could determine
>        that some other part of the runtime attempted an ENCLU 
>        function on that page.
>        It is not possible to provide such detailed errors to user
>        space with mprotect().

Actually the user space run-time is also an adversary. Kernel and user
space can e.g. kill the enclave or limit it with PTEs but EPCM is
beyond them *after* initialization. The whole point is to be able
to put e.g. containers into an untrusted cloud.
> 
> 
> 4) #PF implementation
> 
>    (a) There is more to restricting permissions than just running
>        ENCLS[EMODPR]. After running ENCLS[EMODPR] the kernel should
>        also initiate the ETRACK flow to ensure that any thread within
>        the enclave is interrupted by sending an IPI to the CPU, 
>        this includes the thread that just triggered the #PF.        
> 
>    (b) Second consideration of the EMODPR and ETRACK flow is that
>        this has a large "blast radius" in that any thread in the
>        enclave needs to be interrupted. #PFs may arrive at any time
>        so setting up a page range where a fault into any page in the
>        page range will trigger enclave exits for all threads is
>        a significant yet random impact. I believe it would be better
>        to update all pages in the range at the same time and in this
>        way contain the impact of this significant EMODPR/ETRACK/IPIs
>        flow.
> 
>    (c) How will the page fault handler know when EMODPR/ETRACK should
>        be run? Consider that the page fault handler can be called
>        significantly later than the mprotect() call and that
>        user space can call EMODPE any time to extend permissions.
>        This implies that EMODPR/ETRACK/IPIs should be run during
>        *every* page fault, irrespective of mprotect().
> 
>    (d) If a page is in pending or modified state then EMODPR will
>        always fail. This is something that needs to be fixed by
>        user space runtime but the page fault will not be able
>        to communicate this.     
> 
> Considering the above, could you please provide clear guidance on
> how you envision permission restriction to be supported by mprotect()?

I'm not specifically driving #PF implementation but because it was so
important for EAUG, I said that I'm fine with #PF based implementation.

Personally, I would do both EAUG and EMODPR as part of mmap() and
mprotect() (e.g. to catch that partial success and return that -EIO)
flow but either works for me. The API is more of a concern than the
internals.

> 
> Reinette

BR, Jarkko


* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-17 22:51                             ` Jarkko Sakkinen
@ 2022-03-18  0:11                               ` Reinette Chatre
  2022-03-20  0:24                                 ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-03-18  0:11 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Haitao Huang, Vijay Dhanraj, dave.hansen, tglx, bp, Andy Lutomirski,
	mingo, linux-sgx, x86, Sean Christopherson, Kai Huang, Cathy Zhang,
	Cedric Xing, Haitao Huang, Mark Shanahan, hpa, linux-kernel

Hi Jarkko,

On 3/17/2022 3:51 PM, Jarkko Sakkinen wrote:
> On Thu, Mar 17, 2022 at 03:08:04PM -0700, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
>>> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
>>>> Hi Jarkko,
>>>>
>>>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
>>>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
>>>>>> Supporting permission restriction in an ioctl() enables the runtime to manage
>>>>>> the enclave memory without needing to map it.
>>>>>
>>>>> Which is opposite what you do in EAUG. You can also augment pages without
>>>>> needing the map them. Sure you get that capability, but it is quite useless
>>>>> in practice.
>>>>>
>>>>>> I have considered the idea of supporting the permission restriction with
>>>>>> mprotect() but as you can see in this response I did not find it to be
>>>>>> practical.
>>>>>
>>>>> Where is it practical? What is your application? How is it practical to
>>>>> delegate the concurrency management of a split mprotect() to user space?
>>>>> How do we get rid off a useless up-call to the host?
>>>>>
>>>>
>>>> The email you responded to contained many obstacles against using mprotect()
>>>> but you chose to ignore them and snipped them all from your response. Could
>>>> you please address the issues instead of dismissing them? 
>>>
>>> I did read the whole email but did not see anything that would make a case
>>> for fully exposed EMODPR, or having asymmetrical towards how EAUG works.
>>
>> I believe that on its own each obstacle I shared with you is significant enough
>> to not follow that approach. You simply respond that I am just not making a
>> case without acknowledging any obstacle or providing a reason why the obstacles
>> are not valid.
>>
>> To help me understand your view, could you please respond to each of the
>> obstacles I list below and how it is not an issue?
>>
>>
>> 1) ABI change:
>>    mprotect() is currently supported to modify VMA permissions
>>    irrespective of EPCM permissions. Supporting EPCM permission
>>    changes with mprotect() would change this behavior.
>>    For example, currently it is possible to have RW enclave
>>    memory and support multiple tasks accessing the memory. Two
>>    tasks can map the memory RW and later one can run mprotect()
>>    to reduce the VMA permissions to read-only without impacting
>>    the access of the other task.
>>    By moving EPCM permission changes to mprotect() this usage
>>    will no longer be supported and current behavior will change.
> 
> Your concurrency scenario is somewhat artificial. Obviously you need to
> synchronize somehow, and breaking something that could be done with one
> system call into two separates is not going to help with that. On the
> contrary, it will add a yet one more difficulty layer.

This is about supporting multiple threads in a single enclave; each can
have its own memory mappings based on its needs. This is currently
supported in mainline as part of SGX1.
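
The behavior described in obstacle 1 can be approximated with ordinary
shared memory. The sketch below is only an illustration under that
assumption: it uses an anonymous shared mapping rather than an enclave
mapping, and parent_keeps_write_access() is a name made up for this
example. It shows that one task reducing its own VMA permissions with
mprotect() does not affect another task's mapping of the same page.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the parent can still write to a shared page after the child
 * made its own mapping of that page read-only, 0 if the child did not see
 * the parent's write, and -1 on a setup error.
 */
static int parent_keeps_write_access(void)
{
	long page = sysconf(_SC_PAGESIZE);
	int to_parent[2], to_child[2];
	char token = 0;
	int status;
	pid_t pid;

	assert(page > 0);
	volatile char *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
				  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED || pipe(to_parent) || pipe(to_child))
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		close(to_parent[0]);
		close(to_child[1]);
		/* Child: restrict only this task's view of the page. */
		if (mprotect((void *)mem, (size_t)page, PROT_READ))
			_exit(2);
		if (write(to_parent[1], &token, 1) != 1)
			_exit(3);
		/* Wait until the parent has written through its RW mapping. */
		if (read(to_child[0], &token, 1) != 1)
			_exit(4);
		_exit(mem[0] == 42 ? 0 : 1);
	}

	close(to_parent[1]);
	close(to_child[0]);
	if (read(to_parent[0], &token, 1) != 1)
		return -1;
	mem[0] = 42;	/* the parent's mapping is still read-write */
	if (write(to_child[1], &token, 1) != 1)
		return -1;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

If mprotect() also changed a permission that is shared between the tasks,
as EPCM permissions are, the parent's write in this sketch would be the
access that starts to fault.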

> 
> mprotect() controls PTE permissions, not EPCM permissions. It is the corner
> stone to do any sort of confidential computing to have this division.
> That's why EACCEPT and EACCEPTCOPY exist.

Right, mprotect() controls PTE permissions but now you are requesting it
to control EPCM permissions also. 

There is only one permission field in the mprotect() API so this implies
that you request VMA and EPCM permissions to be in sync. This is new
behavior - different from the current mainline behavior.

> 
> There is no "current behaviour" yet because there is no mainline code, i.e.
> that is easy one to address.

What I described is the current behavior in mainline code. It is the
current SGX1 behavior. Running an environment as I described on an SGX2
system with the mprotect() behavior you propose will see new behavior,
with some threads encountering page faults with the SGX error code,
whereas the same environment runs without issue on an SGX1 system.

I do consider this an ABI change. It should be addressed
before using mprotect() for EPCM permissions can be considered.

Please do provide your opinion about the ABI change.

>> 2) Only half EPCM permission management:
>>    Moving to mprotect() as a way to set EPCM permissions is
>>    not a clear interface for EPCM permission management because
>>    the kernel can only restrict permissions. Even so, the kernel
>>    has no insight into the current EPCM permissions and thus whether they
>>    actually need to be restricted so every mprotect() call,
>>    all except RWX, will need to be treated as a permission
>>    restriction with all the implementation obstacles
>>    that accompany it (more below).
>>
>> There are two possible ways to implement permission restriction
>> as triggered by mprotect(), (a) during the mprotect() call or
>> (b) during a subsequent #PF (as suggested by you), each has
>> its own obstacles.
> 
> I would have prefered also for EAUG to bundle it unconditionally to mmap()
> flow. I've merely said that I don't care whether it is a part of mprotect()
> flow or in the #PF handler, as long as the feature is not uncontrolled
> chaos. Probably at least in mprotect() case it is easier flow to implement
> it directly as part of mprotect().
> 
> Kernel is not the most trusted party in the confidential computing
> scenarios. It is one of the adversaries. And SGX is designed in the way
> that enclave controls EPCMD database and kernel PTEs. By trying to
> artificially limit this you don't bring security, other than trying to
> block implementing applications based on SGX2.

I do not follow your argument. How is implementing EPCM permission restriction
with an ioctl() limiting anything? 

> 
> We can ditch the whole SGX, if the point is that kernel controls what
> happens inside enclave. Normal VMAs are much more capable for that purpose,
> and kernel has full control over them with e.g. PTEs.
> 
>>
>> 3) mprotect() implementation 
>>
>>    When the user calls mprotect() the expectation is that the
>>    call will either succeed or fail. If the call fails the user
>>    expects the system to be unchanged. This is not possible if
>>    permission restriction is done as part of mprotect().
>>
>>    (a) mprotect() may span multiple VMAs and involves VMA splits
>>        that (from what I understand) cannot be undone. SGX memory
>>        does not support VMA merges. If any SGX function
>>        (EMODPR or ETRACK on any page) done after a VMA split fails
>>        then the user will be left with fragmented memory.
> 
> Oh well, SGX does not even support syscalls, if we go this level of
> arguments. And you are trying to sort this out with even more flakky
> interface, rather than stable EPCM reset to read state.

I did not find your answer on how to handle this obstacle. Are you
saying that leaving the user with fragmented memory and inconsistent
state is acceptable?

Could you please elaborate? I am trying to understand how to support
this permission restriction with mprotect() and I get stuck on the scenario
where VMAs need to be split - this has to be handled if we go this route.

If it is possible to integrate with mprotect() then I can do so but I
do not see how to do so yet and here I mention one issue and you
again just dismiss it. If we are not able to handle this then it is
indeed mprotect() that will be the "flakky interface" and we should
stick with the ioctl().

 
> I've been implementing this exact feature lately and only realistic way to
> do it without many corner cases is first use the current ioctl to reset the
> range to READ in EPCM, and with EMODPE set the appropriate permissions.

This is supported in the current implementation with the
SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl().

> 
> 
>>    (b) The EMODPR/ETRACK pair can fail on any of the pages provided
>>        by the mprotect() call. If there is a failure then the
>>        kernel cannot undo previously executed EMODPR since the kernel
>>        cannot run EMODPE. The EPCM permissions are thus left in inconsistent
>>        state since some of the pages would have changed EPCM permissions
>>        and mprotect() does not have mechanism to communicate
>>        partial success.
>>        The partial success is needed to communicate to user space
>>        (i) which pages need EACCEPT, (ii) which pages need to be
>>        in new request (although user space does not have information
>>        to help the new request succeed - see below).
> 
> It's true but how common is that?

The kernel needs to handle all scenarios, whether it is common or not.

> Return e.g. -EIO, and run-time will
> re-build the enclave. That anyway happens all the time with SGX for
> various reasons (e.g. VM migration, S3 and whatnot). It's only important
> that you know when this happens.

Please confirm: you support a user space implementation using mprotect()
that can leave the system in an inconsistent state?


>>    (c) User space runtime has control over management of EPC memory
>>        and accurate failure information would help it to do so.
>>        Knowing the error code of the EMODPR failure would help
>>        user space to take appropriate action. For example, EMODPR
>>        can return "SGX_PAGE_NOT_MODIFIABLE" that helps the runtime
>>        to learn that it needs to run EACCEPT on that page before
>>        the EMODPR can succeed. Alternatively, if it learns that the
>>        return is "SGX_EPC_PAGE_CONFLICT" then it could determine
>>        that some other part of the runtime attempted an ENCLU 
>>        function on that page.
>>        It is not possible to provide such detailed errors to user
>>        space with mprotect().
> 
> Actually user space run-time is also an adversary. Kernel and user
> space can e.g. kill the enclave or limit it with PTEs but EPCM is
> beyond them *after* initialization. The whole point is to be able
> to put e.g. containers to untrusted cloud.

You seem to be saying that while the kernel could help the
runtime to manage the enclave it should not. Is this correct?

There may be scenarios where an enclave could repair itself during runtime,
for example by running EACCEPT on a page that had a PENDING bit set.
This information is provided to the runtime with the
SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl(), but with this mprotect()
implementation the kernel cannot provide this information and thus
forces the enclave to be torn down and rebuilt to recover.
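
A minimal sketch of that difference, as a toy model rather than the actual
kernel code: every name below (struct toy_restrict_req, the toy_*()
helpers, the TOY_* codes) is hypothetical and the error values are not the
real ENCLS return codes. The ioctl()-style request carries output fields
that say how far the operation got and why it stopped, while the
mprotect()-style wrapper can only return success or a single errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-page outcomes; not the real ENCLS error values. */
enum toy_result { TOY_OK = 0, TOY_PAGE_NOT_MODIFIABLE, TOY_PAGE_CONFLICT };

/* Hypothetical ioctl()-style request with output fields. */
struct toy_restrict_req {
	size_t first;		/* in:  index of the first page */
	size_t npages;		/* in:  number of pages to restrict */
	enum toy_result result;	/* out: outcome of the last page attempted */
	size_t count;		/* out: pages restricted successfully */
};

/* Stand-in for EMODPR on one page: fails while the page is pending. */
static enum toy_result toy_emodpr(const int *pending, size_t idx)
{
	return pending[idx] ? TOY_PAGE_NOT_MODIFIABLE : TOY_OK;
}

/* ioctl()-like path: stops at the first failure and reports where and why. */
static int toy_restrict_ioctl(const int *pending, struct toy_restrict_req *req)
{
	req->count = 0;
	req->result = TOY_OK;
	for (size_t i = 0; i < req->npages; i++) {
		req->result = toy_emodpr(pending, req->first + i);
		if (req->result != TOY_OK)
			return -EFAULT;
		req->count++;
	}
	return 0;
}

/* mprotect()-like path: the caller only learns that something failed. */
static int toy_restrict_mprotect(const int *pending, size_t first,
				 size_t npages)
{
	struct toy_restrict_req req = { .first = first, .npages = npages };

	return toy_restrict_ioctl(pending, &req) ? -EIO : 0;
}
```

With four pages of which the third is pending, the ioctl()-like call
reports count == 2 and TOY_PAGE_NOT_MODIFIABLE, so the runtime knows which
page needs EACCEPT; the mprotect()-like call reports only -EIO.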

Is this (using mprotect()) the kernel implementation you prefer?

>> 4) #PF implementation
>>
>>    (a) There is more to restricting permissions than just running
>>        ENCLS[EMODPR]. After running ENCLS[EMODPR] the kernel should
>>        also initiate the ETRACK flow to ensure that any thread within
>>        the enclave is interrupted by sending an IPI to the CPU, 
>>        this includes the thread that just triggered the #PF.        
>>
>>    (b) Second consideration of the EMODPR and ETRACK flow is that
>>        this has a large "blast radius" in that any thread in the
>>        enclave needs to be interrupted. #PFs may arrive at any time
>>        so setting up a page range where a fault into any page in the
>>        page range will trigger enclave exits for all threads is
>>        a significant yet random impact. I believe it would be better
>>        to update all pages in the range at the same time and in this
>>        way contain the impact of this significant EMODPR/ETRACK/IPIs
>>        flow.
>>
>>    (c) How will the page fault handler know when EMODPR/ETRACK should
>>        be run? Consider that the page fault handler can be called
>>        significantly later than the mprotect() call and that
>>        user space can call EMODPE any time to extend permissions.
>>        This implies that EMODPR/ETRACK/IPIs should be run during
>>        *every* page fault, irrespective of mprotect().
>>
>>    (d) If a page is in pending or modified state then EMODPR will
>>        always fail. This is something that needs to be fixed by
>>        user space runtime but the page fault will not be able
>>        to communicate this.     
>>
>> Considering the above, could you please provide clear guidance on
>> how you envision permission restriction to be supported by mprotect()?
> 
> I'm not specifically driving #PF implementation but because it was so
> important for EAUG, I said that I'm fine with #PF based implementation.
> 
> Personally, I would do both EAUG and EMODPR as part of mmap() and
> mprotect() (e.g. to catch that partial success and return that -EIO)
> flow but either works for me. The API is more of a concern than the
> internals.

Are you now requesting EMODPR as part of mmap() also? Could you
please elaborate how mmap() and mprotect() can handle partial success? 

Reinette



* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-18  0:11                               ` Reinette Chatre
@ 2022-03-20  0:24                                 ` Jarkko Sakkinen
  2022-03-28 23:22                                   ` Reinette Chatre
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-20  0:24 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Haitao Huang, Vijay Dhanraj, dave.hansen, tglx, bp, Andy Lutomirski,
	mingo, linux-sgx, x86, Sean Christopherson, Kai Huang, Cathy Zhang,
	Cedric Xing, Haitao Huang, Mark Shanahan, hpa, linux-kernel

On Thu, Mar 17, 2022 at 05:11:40PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 3/17/2022 3:51 PM, Jarkko Sakkinen wrote:
> > On Thu, Mar 17, 2022 at 03:08:04PM -0700, Reinette Chatre wrote:
> >> Hi Jarkko,
> >>
> >> On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
> >>> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
> >>>> Hi Jarkko,
> >>>>
> >>>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
> >>>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> >>>>>> Supporting permission restriction in an ioctl() enables the runtime to manage
> >>>>>> the enclave memory without needing to map it.
> >>>>>
> >>>>> Which is opposite what you do in EAUG. You can also augment pages without
> >>>>> needing the map them. Sure you get that capability, but it is quite useless
> >>>>> in practice.
> >>>>>
> >>>>>> I have considered the idea of supporting the permission restriction with
> >>>>>> mprotect() but as you can see in this response I did not find it to be
> >>>>>> practical.
> >>>>>
> >>>>> Where is it practical? What is your application? How is it practical to
> >>>>> delegate the concurrency management of a split mprotect() to user space?
> >>>>> How do we get rid off a useless up-call to the host?
> >>>>>
> >>>>
> >>>> The email you responded to contained many obstacles against using mprotect()
> >>>> but you chose to ignore them and snipped them all from your response. Could
> >>>> you please address the issues instead of dismissing them? 
> >>>
> >>> I did read the whole email but did not see anything that would make a case
> >>> for fully exposed EMODPR, or having asymmetrical towards how EAUG works.
> >>
> >> I believe that on its own each obstacle I shared with you is significant enough
> >> to not follow that approach. You simply respond that I am just not making a
> >> case without acknowledging any obstacle or providing a reason why the obstacles
> >> are not valid.
> >>
> >> To help me understand your view, could you please respond to each of the
> >> obstacles I list below and how it is not an issue?
> >>
> >>
> >> 1) ABI change:
> >>    mprotect() is currently supported to modify VMA permissions
> >>    irrespective of EPCM permissions. Supporting EPCM permission
> >>    changes with mprotect() would change this behavior.
> >>    For example, currently it is possible to have RW enclave
> >>    memory and support multiple tasks accessing the memory. Two
> >>    tasks can map the memory RW and later one can run mprotect()
> >>    to reduce the VMA permissions to read-only without impacting
> >>    the access of the other task.
> >>    By moving EPCM permission changes to mprotect() this usage
> >>    will no longer be supported and current behavior will change.
> > 
> > Your concurrency scenario is somewhat artificial. Obviously you need to
> > synchronize somehow, and breaking something that could be done with one
> > system call into two separates is not going to help with that. On the
> > contrary, it will add a yet one more difficulty layer.
> 
> This is about supporting multiple threads in a single enclave, they can
> all have their own memory mappings based on the needs. This is currently
> supported in mainline as part of SGX1.
> 
> > 
> > mprotect() controls PTE permissions, not EPCM permissions. It is the corner
> > stone to do any sort of confidential computing to have this division.
> > That's why EACCEPT and EACCEPTCOPY exist.
> 
> Right, mprotect() controls PTE permissions but now you are requesting it
> to control EPCM permissions also. 
> 
> There is only one permission field in the mprotect() API so this implies
> that you request VMA and EPCM permissions to be in sync. This is new
> behavior - different from the current mainline behavior.

Not true. mprotect() should reset the EPCM permissions by running EMODPR
with a fixed PROT_READ. The enclave can then use EMODPE to set the
permissions.

> 
> > 
> > There is no "current behaviour" yet because there is no mainline code, i.e.
> > that is easy one to address.
> 
> What I described is the current behavior in mainline code. It is the
> current SGX1 behavior. Running an environment as I described on a SGX2
> system with the mprotect() behavior you propose will see new behavior
> with some threads encountering page faults with SGX error
> code when it could run without issue on SGX1 system.
> 
> I do consider this an ABI change. It should be addressed
> before using mprotect() for EPCM permissions can be considered.
> 
> Please do provide your opinion about the ABI change.

With SGX1 there's no meaningful use for mprotect() after EINIT. This
would of course apply only after EINIT, not before. We have a flag
to check whether the enclave has been initialized.

> 
> >> 2) Only half EPCM permission management:
> >>    Moving to mprotect() as a way to set EPCM permissions is
> >>    not a clear interface for EPCM permission management because
> >>    the kernel can only restrict permissions. Even so, the kernel
> >>    has no insight into the current EPCM permissions and thus whether they
> >>    actually need to be restricted so every mprotect() call,
> >>    all except RWX, will need to be treated as a permission
> >>    restriction with all the implementation obstacles
> >>    that accompany it (more below).
> >>
> >> There are two possible ways to implement permission restriction
> >> as triggered by mprotect(), (a) during the mprotect() call or
> >> (b) during a subsequent #PF (as suggested by you), each has
> >> its own obstacles.
> > 
> > I would have prefered also for EAUG to bundle it unconditionally to mmap()
> > flow. I've merely said that I don't care whether it is a part of mprotect()
> > flow or in the #PF handler, as long as the feature is not uncontrolled
> > chaos. Probably at least in mprotect() case it is easier flow to implement
> > it directly as part of mprotect().
> > 
> > Kernel is not the most trusted party in the confidential computing
> > scenarios. It is one of the adversaries. And SGX is designed in the way
> > that enclave controls EPCMD database and kernel PTEs. By trying to
> > artificially limit this you don't bring security, other than trying to
> > block implementing applications based on SGX2.
> 
> I do not follow your argument. How is implementing EPCM permission restriction
> with an ioctl() limiting anything? 

If you use minimal permissions with EMODPR, EMODPE can be used freely, as
if it were EMODP, which is great.

> 
> > 
> > We can ditch the whole SGX, if the point is that kernel controls what
> > happens inside enclave. Normal VMAs are much more capable for that purpose,
> > and kernel has full control over them with e.g. PTEs.
> > 
> >>
> >> 3) mprotect() implementation 
> >>
> >>    When the user calls mprotect() the expectation is that the
> >>    call will either succeed or fail. If the call fails the user
> >>    expects the system to be unchanged. This is not possible if
> >>    permission restriction is done as part of mprotect().
> >>
> >>    (a) mprotect() may span multiple VMAs and involves VMA splits
> >>        that (from what I understand) cannot be undone. SGX memory
> >>        does not support VMA merges. If any SGX function
> >>        (EMODPR or ETRACK on any page) done after a VMA split fails
> >>        then the user will be left with fragmented memory.
> > 
> > Oh well, SGX does not even support syscalls, if we go this level of
> > arguments. And you are trying to sort this out with even more flakky
> > interface, rather than stable EPCM reset to read state.
> 
> I did not find your answer on how to handle this obstacle. Are you
> saying that leaving the user with fragmented memory and inconsistent
> state is acceptable?
> 
> Could you please elaborate? I am trying to understand how to support
> this permission restriction with mprotect() and I get stuck on the scenario
> where VMAs need to be split - this has to be handled if we go this route.
> 
> If it is possible to integrate with mprotect() then I can do so but I
> do not see how to do so yet and here I mention one issue and you
> again just dismiss it. If we are not able to handle this then it is
> indeed mprotect() that will be the "flakky interface" and we should
> stick with the ioctl().

It's flaky because you have to pair every single mprotect() with an
ioctl() that is unconditionally set to PROT_READ. It is also worse
concurrency-wise because mprotect() can do both with mmap_sem held. It
adds an extra useless round trip to the kernel.

> 
>  
> > I've been implementing this exact feature lately and only realistic way to
> > do it without many corner cases is first use the current ioctl to reset the
> > range to READ in EPCM, and with EMODPE set the appropriate permissions.
> 
> This is supported in the current implementation with the
> SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl().
> 
> > 
> > 
> >>    (b) The EMODPR/ETRACK pair can fail on any of the pages provided
> >>        by the mprotect() call. If there is a failure then the
> >>        kernel cannot undo previously executed EMODPR since the kernel
> >>        cannot run EMODPE. The EPCM permissions are thus left in inconsistent
> >>        state since some of the pages would have changed EPCM permissions
> >>        and mprotect() does not have mechanism to communicate
> >>        partial success.
> >>        The partial success is needed to communicate to user space
> >>        (i) which pages need EACCEPT, (ii) which pages need to be
> >>        in new request (although user space does not have information
> >>        to help the new request succeed - see below).
> > 
> > It's true but how common is that?
> 
> The kernel needs to handle all scenarios, whether it is common or not.

This is not true. Kernel needs to provide meaningful interface to the
hardware that does not user space to do stupid things. We do not provide
1:1 interface to every single hardware interface. Allowing the use of EMODPE
actually does provide full control of the permissions. That should be
enough.

> 
> > Return e.g. -EIO, and run-time will
> > re-build the enclave. That anyway happens all the time with SGX for
> > various reasons (e.g. VM migration, S3 and whatnot). It's only important
> > that you know when this happens.
> 
> Please confirm: you support a user space implementation using mprotect()
> that can leave the system in inconsistent state?

It actually does not leave kernel structures in an inconsistent state, so
it's all fine. Partial success is almost nonexistent unless there is an
actual bug in the run-time. It's the same as with files, sockets etc. If
partial success happens, user space is probably already in an inconsistent
state.

I'm not sure how "system" is defined here so I cannot give a definitive
yes/no answer.

User space shooting itself in the foot is not something that the kernel
usually has to take extra measures for.

> 
> 
> >>    (c) User space runtime has control over management of EPC memory
> >>        and accurate failure information would help it to do so.
> >>        Knowing the error code of the EMODPR failure would help
> >>        user space to take appropriate action. For example, EMODPR
> >>        can return "SGX_PAGE_NOT_MODIFIABLE" that helps the runtime
> >>        to learn that it needs to run EACCEPT on that page before
> >>        the EMODPR can succeed. Alternatively, if it learns that the
> >>        return is "SGX_EPC_PAGE_CONFLICT" then it could determine
> >>        that some other part of the runtime attempted an ENCLU 
> >>        function on that page.
> >>        It is not possible to provide such detailed errors to user
> >>        space with mprotect().
> > 
> > Actually user space run-time is also an adversary. Kernel and user
> > space can e.g. kill the enclave or limit it with PTEs but EPCM is
> > beyond them *after* initialization. The whole point is to be able
> > to put e.g. containers to untrusted cloud.
> 
> You seem to be saying that while the kernel could help the
> runtime to manage the enclave it should not. Is this correct?
> 
> There may be scenarios where an enclave could repair itself during runtime,
> for example by running EACCEPT on a page that had a PENDING bit set.
> This information is provided to the runtime with the
> SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl(), but with this mprotect()
> implementation the kernel cannot provide this information and thus
> forces the enclave to be torn down and rebuilt to recover.
> 
> Is this (using mprotect()) the kernel implementation you prefer?

If there is partial success it's a bug, not a legit scenario for a
well-behaved run-time.

> 
> >> 4) #PF implementation
> >>
> >>    (a) There is more to restricting permissions than just running
> >>        ENCLS[EMODPR]. After running ENCLS[EMODPR] the kernel should
> >>        also initiate the ETRACK flow to ensure that any thread within
> >>        the enclave is interrupted by sending an IPI to the CPU, 
> >>        this includes the thread that just triggered the #PF.        
> >>
> >>    (b) Second consideration of the EMODPR and ETRACK flow is that
> >>        this has a large "blast radius" in that any thread in the
> >>        enclave needs to be interrupted. #PFs may arrive at any time
> >>        so setting up a page range where a fault into any page in the
> >>        page range will trigger enclave exits for all threads is
> >>        a significant yet random impact. I believe it would be better
> >>        to update all pages in the range at the same time and in this
> >>        way contain the impact of this significant EMODPR/ETRACK/IPIs
> >>        flow.
> >>
> >>    (c) How will the page fault handler know when EMODPR/ETRACK should
> >>        be run? Consider that the page fault handler can be called
> >>        significantly later than the mprotect() call and that
> >>        user space can call EMODPE any time to extend permissions.
> >>        This implies that EMODPR/ETRACK/IPIs should be run during
> >>        *every* page fault, irrespective of mprotect().
> >>
> >>    (d) If a page is in pending or modified state then EMODPR will
> >>        always fail. This is something that needs to be fixed by
> >>        user space runtime but the page fault will not be able
> >>        to communicate this.     
> >>
> >> Considering the above, could you please provide clear guidance on
> >> how you envision permission restriction to be supported by mprotect()?
> > 
> > I'm not specifically driving #PF implementation but because it was so
> > important for EAUG, I said that I'm fine with #PF based implementation.
> > 
> > Personally, I would do both EAUG and EMODPR as part of mmap() and
> > mprotect() (e.g. to catch that partial success and return that -EIO)
> > flow but either works for me. The API is more of a concern than the
> > internals.
> 
> Are you now requesting EMODPR as part of mmap() also? Could you
> please elaborate how mmap() and mprotect() can handle partial success? 

Nope, I was just pointing out that EAUG is #PF based but could also have
been implemented as part of the mmap() flow. API-wise it is symmetrical.

BR, Jarkko


* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-20  0:24                                 ` Jarkko Sakkinen
@ 2022-03-28 23:22                                   ` Reinette Chatre
  2022-03-30 15:00                                     ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Reinette Chatre @ 2022-03-28 23:22 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

Hi Jarkko,

On 3/19/2022 5:24 PM, Jarkko Sakkinen wrote:
> On Thu, Mar 17, 2022 at 05:11:40PM -0700, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 3/17/2022 3:51 PM, Jarkko Sakkinen wrote:
>>> On Thu, Mar 17, 2022 at 03:08:04PM -0700, Reinette Chatre wrote:
>>>> Hi Jarkko,
>>>>
>>>> On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
>>>>> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
>>>>>> Hi Jarkko,
>>>>>>
>>>>>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
>>>>>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
>>>>>>>> Supporting permission restriction in an ioctl() enables the runtime to manage
>>>>>>>> the enclave memory without needing to map it.
>>>>>>>
>>>>>>> Which is opposite what you do in EAUG. You can also augment pages without
>>>>>>> needing the map them. Sure you get that capability, but it is quite useless
>>>>>>> in practice.
>>>>>>>
>>>>>>>> I have considered the idea of supporting the permission restriction with
>>>>>>>> mprotect() but as you can see in this response I did not find it to be
>>>>>>>> practical.
>>>>>>>
>>>>>>> Where is it practical? What is your application? How is it practical to
>>>>>>> delegate the concurrency management of a split mprotect() to user space?
>>>>>>> How do we get rid off a useless up-call to the host?
>>>>>>>
>>>>>>
>>>>>> The email you responded to contained many obstacles against using mprotect()
>>>>>> but you chose to ignore them and snipped them all from your response. Could
>>>>>> you please address the issues instead of dismissing them? 
>>>>>
>>>>> I did read the whole email but did not see anything that would make a case
>>>>> for fully exposed EMODPR, or having asymmetrical towards how EAUG works.
>>>>
>>>> I believe that on its own each obstacle I shared with you is significant enough
>>>> to not follow that approach. You simply respond that I am just not making a
>>>> case without acknowledging any obstacle or providing a reason why the obstacles
>>>> are not valid.
>>>>
>>>> To help me understand your view, could you please respond to each of the
>>>> obstacles I list below and how it is not an issue?
>>>>
>>>>
>>>> 1) ABI change:
>>>>    mprotect() is currently supported to modify VMA permissions
>>>>    irrespective of EPCM permissions. Supporting EPCM permission
>>>>    changes with mprotect() would change this behavior.
>>>>    For example, currently it is possible to have RW enclave
>>>>    memory and support multiple tasks accessing the memory. Two
>>>>    tasks can map the memory RW and later one can run mprotect()
>>>>    to reduce the VMA permissions to read-only without impacting
>>>>    the access of the other task.
>>>>    By moving EPCM permission changes to mprotect() this usage
>>>>    will no longer be supported and current behavior will change.
>>>
>>> Your concurrency scenario is somewhat artificial. Obviously you need to
>>> synchronize somehow, and breaking something that could be done with one
>>> system call into two separates is not going to help with that. On the
>>> contrary, it will add a yet one more difficulty layer.
>>
>> This is about supporting multiple threads in a single enclave, they can
>> all have their own memory mappings based on the needs. This is currently
>> supported in mainline as part of SGX1.


Could you please comment on the above?

>>
>>>
>>> mprotect() controls PTE permissions, not EPCM permissions. It is the corner
>>> stone to do any sort of confidential computing to have this division.
>>> That's why EACCEPT and EACCEPTCOPY exist.
>>
>> Right, mprotect() controls PTE permissions but now you are requesting it
>> to control EPCM permissions also. 
>>
>> There is only one permission field in the mprotect() API so this implies
>> that you request VMA and EPCM permissions to be in sync. This is new
>> behavior - different from the current mainline behavior.
> 
> Not true. mprotect() should do EPCM reset by fixed PROT_READ for EMODPR.
> Then enclave can use EMODPE to set the permissions.

I think that I am starting to decipher what your vision is. If I understand
correctly mprotect() would serve a double purpose:
a) modify VMA permissions exactly as is done in SGX1 (no consideration of EPCM
   permissions and only limitation is that VMA permissions are not allowed to
   exceed vm_max_prot_bits)
b) EPCM permissions are _always_ restricted to PROT_READ irrespective of
   VMA permissions requested (new)

Is this correct?

With mprotect() always resetting EPCM to be PROT_READ there is no new sync
between VMA and EPCM permissions.
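
As an illustration of (a) and (b) above, here is a toy Python model of the
two permission layers. It is only a sketch and not kernel code: Page,
mprotect_sgx1(), mprotect_proposed(), emodpe() and effective() are made-up
names, and it assumes that the access a task actually gets is the
intersection of its VMA permissions and the EPCM permissions.

```python
# Toy model only: not kernel code. All names here are hypothetical.
R, W, X = 1, 2, 4

class Page:
    def __init__(self, epcm, vm_max_prot_bits):
        self.epcm = epcm                          # permissions held in the EPCM
        self.vm_max_prot_bits = vm_max_prot_bits  # upper bound for any VMA
        self.vma = {}                             # task name -> VMA permissions

def mprotect_sgx1(page, task, prot):
    """(a) SGX1 behavior: only the calling task's VMA permissions change."""
    if prot & ~page.vm_max_prot_bits:
        raise PermissionError("VMA permissions exceed vm_max_prot_bits")
    page.vma[task] = prot

def mprotect_proposed(page, task, prot):
    """(a) plus (b): EPCM is always reset to read-only, whatever prot is."""
    mprotect_sgx1(page, task, prot)
    page.epcm = R          # stands in for the EMODPR/ETRACK flow

def emodpe(page, prot):
    """Run from inside the enclave: can only add EPCM permission bits."""
    page.epcm |= prot

def effective(page, task):
    """Access a task gets in this model: VMA permissions AND EPCM."""
    return page.vma[task] & page.epcm

# Two tasks map the same page RW, then one of them calls mprotect().
page = Page(epcm=R | W, vm_max_prot_bits=R | W)
mprotect_sgx1(page, "task_a", R | W)
mprotect_sgx1(page, "task_b", R | W)
mprotect_proposed(page, "task_a", R)
print(effective(page, "task_b"))   # task_b's VMA is unchanged, EPCM is not
emodpe(page, W)
print(effective(page, "task_b"))
```

In this model the VMA permissions are never tied to the EPCM value, which
matches the "no new sync" observation, but the EPCM reset is visible to
every task that maps the page until the enclave runs EMODPE again.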

>>> There is no "current behaviour" yet because there is no mainline code, i.e.
>>> that is easy one to address.
>>
>> What I described is the current behavior in mainline code. It is the
>> current SGX1 behavior. Running an environment as I described on a SGX2
>> system with the mprotect() behavior you propose will see new behavior
>> with some threads encountering page faults with SGX error
>> code when it could run without issue on SGX1 system.
>>
>> I do consider this an ABI change. It should be addressed
>> before using mprotect() for EPCM permissions can be considered.
>>
>> Please do provide your opinion about the ABI change.
> 
> With SGX1 there's no meaningful use for mprotect() after EINIT. This
> would be of course applicable after EINIT, not before. We have a flag
> to check whether enclave has been initialized.

I interpret your comment to mean that the ABI change is acceptable since
existing usages of mprotect() after EINIT are not meaningful.

>>>> 2) Only half EPCM permission management:
>>>>    Moving to mprotect() as a way to set EPCM permissions is
>>>>    not a clear interface for EPCM permission management because
>>>>    the kernel can only restrict permissions. Even so, the kernel
>>>>    has no insight into the current EPCM permissions and thus whether they
>>>>    actually need to be restricted so every mprotect() call,
>>>>    all except RWX, will need to be treated as a permission
>>>>    restriction with all the implementation obstacles
>>>>    that accompany it (more below).
>>>>
>>>> There are two possible ways to implement permission restriction
>>>> as triggered by mprotect(), (a) during the mprotect() call or
>>>> (b) during a subsequent #PF (as suggested by you), each has
>>>> its own obstacles.
>>>
>>> I would have prefered also for EAUG to bundle it unconditionally to mmap()
>>> flow. I've merely said that I don't care whether it is a part of mprotect()
>>> flow or in the #PF handler, as long as the feature is not uncontrolled
>>> chaos. Probably at least in mprotect() case it is easier flow to implement
>>> it directly as part of mprotect().
>>>
>>> Kernel is not the most trusted party in the confidential computing
>>> scenarios. It is one of the adversaries. And SGX is designed in the way
>>> that enclave controls EPCMD database and kernel PTEs. By trying to
>>> artificially limit this you don't bring security, other than trying to
>>> block implementing applications based on SGX2.
>>
>> I do not follow your argument. How is implementing EPCM permission restriction
>> with an ioctl() limiting anything? 
> 
> If you use minimal permissions with EMODPR, it gives freedom for EMODPE
> to use like it was EMODP, which is great.

Understood.

> 
>>
>>>
>>> We can ditch the whole SGX, if the point is that kernel controls what
>>> happens inside enclave. Normal VMAs are much more capable for that purpose,
>>> and kernel has full control over them with e.g. PTEs.
>>>
>>>>
>>>> 3) mprotect() implementation 
>>>>
>>>>    When the user calls mprotect() the expectation is that the
>>>>    call will either succeed or fail. If the call fails the user
>>>>    expects the system to be unchanged. This is not possible if
>>>>    permission restriction is done as part of mprotect().
>>>>
>>>>    (a) mprotect() may span multiple VMAs and involves VMA splits
>>>>        that (from what I understand) cannot be undone. SGX memory
>>>>        does not support VMA merges. If any SGX function
>>>>        (EMODPR or ETRACK on any page) done after a VMA split fails
>>>>        then the user will be left with fragmented memory.
>>>
>>> Oh well, SGX does not even support syscalls, if we go this level of
>>> arguments. And you are trying to sort this out with even more flakky
>>> interface, rather than stable EPCM reset to read state.
>>
>> I did not find your answer on how to handle this obstacle. Are you
>> saying that leaving the user with fragmented memory and inconsistent
>> state is acceptable?
>>
>> Could you please elaborate? I am trying to understand how to support
>> this permission restriction with mprotect() and I get stuck on the scenario
>> where VMAs need to be split - this has to be handled if we go this route.
>>
>> If it is possible to integrate with mprotect() then I can do so but I
>> do not see how to do so yet and here I mention one issue and you
>> again just dismiss it. If we are not able to handle this then it is
>> indeed mprotect() that will be the "flakky interface" and we should
>> stick with the ioctl().
> 
> It's flakky because you have to pair every single mprotect() with
> ioctl() that is unconditionally set to PROT_READ. Also it is concurrency
> wise worse because mprotect() can do both with mmap_sem held. It adds
> an extra useless round trip to the kernel.

This still does not address my concern regarding possible fragmented memory.
Are you considering fragmented memory to be in the same category as the
inconsistent state mentioned below? (That it is a consequence of a bug in
the run-time?)

>>> I've been implementing this exact feature lately and only realistic way to
>>> do it without many corner cases is first use the current ioctl to reset the
>>> range to READ in EPCM, and with EMODPE set the appropriate permissions.
>>
>> This is supported in the current implementation with the
>> SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl().
>>
>>>
>>>
>>>>    (b) The EMODPR/ETRACK pair can fail on any of the pages provided
>>>>        by the mprotect() call. If there is a failure then the
>>>>        kernel cannot undo previously executed EMODPR since the kernel
>>>>        cannot run EMODPE. The EPCM permissions are thus left in inconsistent
>>>>        state since some of the pages would have changed EPCM permissions
>>>>        and mprotect() does not have mechanism to communicate
>>>>        partial success.
>>>>        The partial success is needed to communicate to user space
>>>>        (i) which pages need EACCEPT, (ii) which pages need to be
>>>>        in new request (although user space does not have information
>>>>        to help the new request succeed - see below).
>>>
>>> It's true but how common is that?
>>
>> The kernel needs to handle all scenarios, whether it is common or not.
> 
> This is not true. Kernel needs to provide meaningful interface to the
> hardware that does not user space to do stupid things. We do not provide
> 1:1 inteface to every single hardware interface. Allowing to use EMODPE
> actually does provide full control of the permissions. That should be
> enough.

I was not proposing that the kernel "provides a 1:1 interface for every single
hardware interface". 

My comment was that the kernel needs to handle all user space scenarios.

It is possible that an enclave page is in a state where EMODPR can fail
because of something that needs to be fixed from within the enclave or run-time,
for example, clearing an EPCM.PENDING bit. The kernel needs to handle such
scenarios. I understand from your explanations that run-time handling of
such scenarios is not a goal or requirement but instead should always
require enclave re-build.
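
To illustrate the partial success scenario, here is a toy Python model of
restricting a range of pages one at a time. It is only a sketch and not
kernel code: emodpr(), restrict_range() and mprotect_style() are made-up
names, and the only failure modeled is a page in pending state, using the
"SGX_PAGE_NOT_MODIFIABLE" error mentioned earlier in this thread.

```python
# Toy model only: not kernel code. All names here are hypothetical.
R, W = 1, 2
EIO = 5

class EmodprError(Exception):
    pass

def emodpr(page, prot):
    """Restrict one page; fails while the page is in pending state."""
    if page["pending"]:
        raise EmodprError("SGX_PAGE_NOT_MODIFIABLE")
    page["epcm"] &= prot

def restrict_range(pages, prot):
    """Return (pages_done, error).

    Pages restricted before a failure stay restricted, because the
    kernel cannot run EMODPE to undo them.
    """
    for done, page in enumerate(pages):
        try:
            emodpr(page, prot)
        except EmodprError as err:
            return done, str(err)
    return len(pages), None

def mprotect_style(pages, prot):
    """An mprotect()-like wrapper can only report success or an errno."""
    done, err = restrict_range(pages, prot)
    return 0 if err is None else -EIO   # page count and SGX error are lost

pages = [
    {"epcm": R | W, "pending": False},
    {"epcm": R | W, "pending": True},    # needs EACCEPT first
    {"epcm": R | W, "pending": False},
]
done, err = restrict_range(pages, R)
print(done, err)
pages[done]["pending"] = False           # stands in for EACCEPT in the enclave
print(restrict_range(pages[done:], R))   # retry only the remaining pages
```

With the count and the error code the run-time in this model can resume
from the failing page. With only -EIO it knows neither which pages were
already restricted nor what to fix, which is the case where re-building
the enclave is the remaining option.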

>>> Return e.g. -EIO, and run-time will
>>> re-build the enclave. That anyway happens all the time with SGX for
>>> various reasons (e.g. VM migration, S3 and whatnot). It's only important
>>> that you know when this happens.
>>
>> Please confirm: you support a user space implementation using mprotect()
>> that can leave the system in inconsistent state?
> 
> It actually does not leave kernel structures to incosistent state so it's
> all fine. Partial success is almost inexistent unless there is actual bug
> in the run-time. It's same as with files, sockets etc. If partial success
> happens, user space is probably already in incosistent state.
> 
> I'm not sure how "system" is defined here so I cannot give definitive a
> yes/no answer.
> 
> User space kicking itself to foot is not something that kernel usually
> has to take extra measures for.

I am not against allowing user space kicking itself. I was of the opinion
that it would be helpful if the kernel can provide information to user space to
salvage itself instead of always forcing it to re-build. You make it clear
here and below that this is not a goal or requirement.

>>>>    (c) User space runtime has control over management of EPC memory
>>>>        and accurate failure information would help it to do so.
>>>>        Knowing the error code of the EMODPR failure would help
>>>>        user space to take appropriate action. For example, EMODPR
>>>>        can return "SGX_PAGE_NOT_MODIFIABLE" that helps the runtime
>>>>        to learn that it needs to run EACCEPT on that page before
>>>>        the EMODPR can succeed. Alternatively, if it learns that the
>>>>        return is "SGX_EPC_PAGE_CONFLICT" then it could determine
>>>>        that some other part of the runtime attempted an ENCLU 
>>>>        function on that page.
>>>>        It is not possible to provide such detailed errors to user
>>>>        space with mprotect().
>>>
>>> Actually user space run-time is also an adversary. Kernel and user
>>> space can e.g. kill the enclave or limit it with PTEs but EPCM is
>>> beyond them *after* initialization. The whole point is to be able
>>> to put e.g. containers to untrusted cloud.
>>
>> You seem to be saying that while the kernel could help the
>> runtime to manage the enclave it should not. Is this correct?
>>
>> There may be scenarios where an enclave could repair itself during runtime,
>> for example by running EACCEPT on a page that had a PENDING bit set.
>> This information is provided to the runtime with the
>> SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl(), but with this mprotect()
>> implementation the kernel cannot provide this information and thus
>> forces the enclave to be torn down and rebuilt to recover.
>>
>> Is this (using mprotect()) the kernel implementation you prefer?
> 
> If there is partial success it's a bug, not a legit scenario for well
> behaving run-time.

ok

Reinette



* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-28 23:22                                   ` Reinette Chatre
@ 2022-03-30 15:00                                     ` Jarkko Sakkinen
  2022-03-30 15:02                                       ` Jarkko Sakkinen
  0 siblings, 1 reply; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-30 15:00 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Mon, Mar 28, 2022 at 04:22:35PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 3/19/2022 5:24 PM, Jarkko Sakkinen wrote:
> > On Thu, Mar 17, 2022 at 05:11:40PM -0700, Reinette Chatre wrote:
> >> Hi Jarkko,
> >>
> >> On 3/17/2022 3:51 PM, Jarkko Sakkinen wrote:
> >>> On Thu, Mar 17, 2022 at 03:08:04PM -0700, Reinette Chatre wrote:
> >>>> Hi Jarkko,
> >>>>
> >>>> On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
> >>>>> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
> >>>>>> Hi Jarkko,
> >>>>>>
> >>>>>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
> >>>>>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> >>>>>>>> Supporting permission restriction in an ioctl() enables the runtime to manage
> >>>>>>>> the enclave memory without needing to map it.
> >>>>>>>
> >>>>>>> Which is opposite what you do in EAUG. You can also augment pages without
> >>>>>>> needing the map them. Sure you get that capability, but it is quite useless
> >>>>>>> in practice.
> >>>>>>>
> >>>>>>>> I have considered the idea of supporting the permission restriction with
> >>>>>>>> mprotect() but as you can see in this response I did not find it to be
> >>>>>>>> practical.
> >>>>>>>
> >>>>>>> Where is it practical? What is your application? How is it practical to
> >>>>>>> delegate the concurrency management of a split mprotect() to user space?
> >>>>>>> How do we get rid off a useless up-call to the host?
> >>>>>>>
> >>>>>>
> >>>>>> The email you responded to contained many obstacles against using mprotect()
> >>>>>> but you chose to ignore them and snipped them all from your response. Could
> >>>>>> you please address the issues instead of dismissing them? 
> >>>>>
> >>>>> I did read the whole email but did not see anything that would make a case
> >>>>> for fully exposed EMODPR, or having asymmetrical towards how EAUG works.
> >>>>
> >>>> I believe that on its own each obstacle I shared with you is significant enough
> >>>> to not follow that approach. You simply respond that I am just not making a
> >>>> case without acknowledging any obstacle or providing a reason why the obstacles
> >>>> are not valid.
> >>>>
> >>>> To help me understand your view, could you please respond to each of the
> >>>> obstacles I list below and how it is not an issue?
> >>>>
> >>>>
> >>>> 1) ABI change:
> >>>>    mprotect() is currently supported to modify VMA permissions
> >>>>    irrespective of EPCM permissions. Supporting EPCM permission
> >>>>    changes with mprotect() would change this behavior.
> >>>>    For example, currently it is possible to have RW enclave
> >>>>    memory and support multiple tasks accessing the memory. Two
> >>>>    tasks can map the memory RW and later one can run mprotect()
> >>>>    to reduce the VMA permissions to read-only without impacting
> >>>>    the access of the other task.
> >>>>    By moving EPCM permission changes to mprotect() this usage
> >>>>    will no longer be supported and current behavior will change.
> >>>
> >>> Your concurrency scenario is somewhat artificial. Obviously you need to
> >>> synchronize somehow, and breaking something that could be done with one
> >>> system call into two separates is not going to help with that. On the
> >>> contrary, it will add a yet one more difficulty layer.
> >>
> >> This is about supporting multiple threads in a single enclave, they can
> >> all have their own memory mappings based on the needs. This is currently
> >> supported in mainline as part of SGX1.
> 
> 
> Could you please comment on the above?


I've probably spent over two weeks of my life addressing concerns, to the
point that I feel as if I were implementing this feature (which could be
a faster way to get it done).

So I'll just wait for the next version, see what it is like, and give my
feedback based on that. It's not really my problem to address every
possible concern.

BR, Jarkko


* Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page permissions
  2022-03-30 15:00                                     ` Jarkko Sakkinen
@ 2022-03-30 15:02                                       ` Jarkko Sakkinen
  0 siblings, 0 replies; 130+ messages in thread
From: Jarkko Sakkinen @ 2022-03-30 15:02 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Haitao Huang, Dhanraj, Vijay, dave.hansen, tglx, bp, Lutomirski,
	Andy, mingo, linux-sgx, x86, Christopherson,,
	Sean, Huang, Kai, Zhang, Cathy, Xing, Cedric, Huang, Haitao,
	Shanahan, Mark, hpa, linux-kernel

On Wed, Mar 30, 2022 at 06:00:30PM +0300, Jarkko Sakkinen wrote:
> On Mon, Mar 28, 2022 at 04:22:35PM -0700, Reinette Chatre wrote:
> > Hi Jarkko,
> > 
> > On 3/19/2022 5:24 PM, Jarkko Sakkinen wrote:
> > > On Thu, Mar 17, 2022 at 05:11:40PM -0700, Reinette Chatre wrote:
> > >> Hi Jarkko,
> > >>
> > >> On 3/17/2022 3:51 PM, Jarkko Sakkinen wrote:
> > >>> On Thu, Mar 17, 2022 at 03:08:04PM -0700, Reinette Chatre wrote:
> > >>>> Hi Jarkko,
> > >>>>
> > >>>> On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
> > >>>>> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
> > >>>>>> Hi Jarkko,
> > >>>>>>
> > >>>>>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
> > >>>>>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> > >>>>>>>> Supporting permission restriction in an ioctl() enables the runtime to manage
> > >>>>>>>> the enclave memory without needing to map it.
> > >>>>>>>
> > >>>>>>> Which is opposite what you do in EAUG. You can also augment pages without
> > >>>>>>> needing the map them. Sure you get that capability, but it is quite useless
> > >>>>>>> in practice.
> > >>>>>>>
> > >>>>>>>> I have considered the idea of supporting the permission restriction with
> > >>>>>>>> mprotect() but as you can see in this response I did not find it to be
> > >>>>>>>> practical.
> > >>>>>>>
> > >>>>>>> Where is it practical? What is your application? How is it practical to
> > >>>>>>> delegate the concurrency management of a split mprotect() to user space?
> > >>>>>>> How do we get rid off a useless up-call to the host?
> > >>>>>>>
> > >>>>>>
> > >>>>>> The email you responded to contained many obstacles against using mprotect()
> > >>>>>> but you chose to ignore them and snipped them all from your response. Could
> > >>>>>> you please address the issues instead of dismissing them? 
> > >>>>>
> > >>>>> I did read the whole email but did not see anything that would make a case
> > >>>>> for fully exposed EMODPR, or having asymmetrical towards how EAUG works.
> > >>>>
> > >>>> I believe that on its own each obstacle I shared with you is significant enough
> > >>>> to not follow that approach. You simply respond that I am just not making a
> > >>>> case without acknowledging any obstacle or providing a reason why the obstacles
> > >>>> are not valid.
> > >>>>
> > >>>> To help me understand your view, could you please respond to each of the
> > >>>> obstacles I list below and how it is not an issue?
> > >>>>
> > >>>>
> > >>>> 1) ABI change:
> > >>>>    mprotect() is currently supported to modify VMA permissions
> > >>>>    irrespective of EPCM permissions. Supporting EPCM permission
> > >>>>    changes with mprotect() would change this behavior.
> > >>>>    For example, currently it is possible to have RW enclave
> > >>>>    memory and support multiple tasks accessing the memory. Two
> > >>>>    tasks can map the memory RW and later one can run mprotect()
> > >>>>    to reduce the VMA permissions to read-only without impacting
> > >>>>    the access of the other task.
> > >>>>    By moving EPCM permission changes to mprotect() this usage
> > >>>>    will no longer be supported and current behavior will change.
> > >>>
> > >>> Your concurrency scenario is somewhat artificial. Obviously you need to
> > >>> synchronize somehow, and breaking something that could be done with one
> > >>> system call into two separates is not going to help with that. On the
> > >>> contrary, it will add a yet one more difficulty layer.
> > >>
> > >> This is about supporting multiple threads in a single enclave, they can
> > >> all have their own memory mappings based on the needs. This is currently
> > >> supported in mainline as part of SGX1.
> > 
> > 
> > Could you please comment on the above?
> 
> 
> I've probably spent over two weeks of my life addressing concerns, to the
> point that I feel as if I were implementing this feature (which could be
> a faster way to get it done).
> 
> So I'll just wait for the next version, see what it is like, and give my
> feedback based on that. It's not really my problem to address every
> possible concern.

Once v3 is out, I'll check what I think is right and what is wrong,
and might send some fixups and see where that leads. I think that is
a more constructive way to move forward. Repeating the same arguments
leads nowhere.

BR, Jarkko


end of thread, other threads:[~2022-03-30 15:03 UTC | newest]

Thread overview: 130+ messages
2022-02-08  0:45 [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 01/32] x86/sgx: Add short descriptions to ENCLS wrappers Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 02/32] x86/sgx: Add wrapper for SGX2 EMODPR function Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 03/32] x86/sgx: Add wrapper for SGX2 EMODT function Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 04/32] x86/sgx: Add wrapper for SGX2 EAUG function Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 05/32] Documentation/x86: Document SGX permission details Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 06/32] x86/sgx: Support VMA permissions more relaxed than enclave permissions Reinette Chatre
2022-03-07 17:10   ` Jarkko Sakkinen
2022-03-07 17:36     ` Reinette Chatre
2022-03-08  8:14       ` Jarkko Sakkinen
2022-03-08  9:06         ` Jarkko Sakkinen
2022-03-08  9:12           ` Jarkko Sakkinen
2022-03-08 16:04             ` Reinette Chatre
2022-03-08 17:00               ` Jarkko Sakkinen
2022-03-08 17:49                 ` Reinette Chatre
2022-03-08 18:46                   ` Jarkko Sakkinen
2022-03-11 11:06                 ` Dr. Greg
2022-02-08  0:45 ` [PATCH V2 07/32] x86/sgx: Add pfn_mkwrite() handler for present PTEs Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 08/32] x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic permission changes Reinette Chatre
2022-03-04  8:55   ` Jarkko Sakkinen
2022-03-04 19:19     ` Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 09/32] x86/sgx: Export sgx_encl_ewb_cpumask() Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 10/32] x86/sgx: Rename sgx_encl_ewb_cpumask() as sgx_encl_cpumask() Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 11/32] x86/sgx: Move PTE zap code to new sgx_zap_enclave_ptes() Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 12/32] x86/sgx: Make sgx_ipi_cb() available internally Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 13/32] x86/sgx: Create utility to validate user provided offset and length Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 14/32] x86/sgx: Keep record of SGX page type Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 15/32] x86/sgx: Support relaxing of enclave page permissions Reinette Chatre
2022-03-04  8:59   ` Jarkko Sakkinen
2022-02-08  0:45 ` [PATCH V2 16/32] x86/sgx: Support restricting " Reinette Chatre
2022-02-21  0:49   ` Jarkko Sakkinen
2022-02-22 18:35     ` Reinette Chatre
2022-02-23 15:46       ` Jarkko Sakkinen
2022-02-23 19:55         ` Reinette Chatre
2022-02-28 12:27           ` Jarkko Sakkinen
2022-02-23 19:21     ` Dhanraj, Vijay
2022-02-23 22:42       ` Reinette Chatre
2022-02-28 12:24       ` Jarkko Sakkinen
2022-02-28 13:19         ` Jarkko Sakkinen
2022-02-28 15:16         ` Dave Hansen
2022-02-28 17:44           ` Dhanraj, Vijay
2022-03-01 13:26           ` Jarkko Sakkinen
2022-03-01 13:42             ` Jarkko Sakkinen
2022-03-01 17:48               ` Reinette Chatre
2022-03-02  2:05                 ` Jarkko Sakkinen
2022-03-02  2:11                   ` Jarkko Sakkinen
2022-03-02  4:03                     ` Jarkko Sakkinen
2022-03-02 22:57                   ` Reinette Chatre
2022-03-03 16:08                     ` Haitao Huang
2022-03-03 21:23                       ` Reinette Chatre
2022-03-03 21:44                         ` Dave Hansen
2022-03-05  3:19                           ` Jarkko Sakkinen
2022-03-06  0:15                             ` Jarkko Sakkinen
2022-03-06  0:25                               ` Jarkko Sakkinen
2022-03-10  5:43                           ` Jarkko Sakkinen
2022-03-10  5:59                             ` Jarkko Sakkinen
2022-03-03 23:18                       ` Jarkko Sakkinen
2022-03-04  4:03                         ` Haitao Huang
2022-03-04  8:30                           ` Jarkko Sakkinen
2022-03-04 15:51                             ` Haitao Huang
2022-03-05  1:02                               ` Jarkko Sakkinen
2022-03-06 14:24                                 ` Haitao Huang
2022-03-03 23:12                     ` Jarkko Sakkinen
2022-03-04  0:48                       ` Reinette Chatre
2022-03-10  6:10       ` Jarkko Sakkinen
2022-03-10 18:33         ` Haitao Huang
2022-03-11 12:10           ` Jarkko Sakkinen
2022-03-11 12:16             ` Jarkko Sakkinen
2022-03-11 12:33               ` Jarkko Sakkinen
2022-03-11 17:53               ` Reinette Chatre
2022-03-11 18:11                 ` Jarkko Sakkinen
2022-03-11 19:28                   ` Reinette Chatre
2022-03-14  3:42                     ` Jarkko Sakkinen
2022-03-14  3:45                       ` Jarkko Sakkinen
2022-03-14  3:54                         ` Jarkko Sakkinen
2022-03-14 15:32                       ` Reinette Chatre
2022-03-17  4:30                         ` Jarkko Sakkinen
2022-03-17 22:08                           ` Reinette Chatre
2022-03-17 22:51                             ` Jarkko Sakkinen
2022-03-18  0:11                               ` Reinette Chatre
2022-03-20  0:24                                 ` Jarkko Sakkinen
2022-03-28 23:22                                   ` Reinette Chatre
2022-03-30 15:00                                     ` Jarkko Sakkinen
2022-03-30 15:02                                       ` Jarkko Sakkinen
2022-03-14  2:49                 ` Jarkko Sakkinen
2022-03-14  2:50                   ` Jarkko Sakkinen
2022-03-14  2:58                     ` Jarkko Sakkinen
2022-03-14 15:39                       ` Haitao Huang
2022-03-17  4:34                         ` Jarkko Sakkinen
2022-03-17 14:42                           ` Haitao Huang
2022-03-17  4:37                         ` Jarkko Sakkinen
2022-03-17 14:47                           ` Haitao Huang
2022-03-17  7:01                         ` Jarkko Sakkinen
2022-03-17  7:11                           ` Jarkko Sakkinen
2022-03-17 14:28                             ` Haitao Huang
2022-03-17 21:50                               ` Jarkko Sakkinen
2022-03-17 22:00                                 ` Jarkko Sakkinen
2022-03-17 22:23                                   ` Jarkko Sakkinen
2022-02-08  0:45 ` [PATCH V2 17/32] selftests/sgx: Add test for EPCM permission changes Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 18/32] selftests/sgx: Add test for TCS page " Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 19/32] x86/sgx: Support adding of pages to an initialized enclave Reinette Chatre
2022-02-19 11:57   ` Jarkko Sakkinen
2022-02-19 12:01     ` Jarkko Sakkinen
2022-02-20 18:40       ` Jarkko Sakkinen
2022-02-22 19:19         ` Reinette Chatre
2022-02-23 15:46           ` Jarkko Sakkinen
2022-03-07 16:16   ` Jarkko Sakkinen
2022-02-08  0:45 ` [PATCH V2 20/32] x86/sgx: Tighten accessible memory range after enclave initialization Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 21/32] selftests/sgx: Test two different SGX2 EAUG flows Reinette Chatre
2022-03-07 16:39   ` Jarkko Sakkinen
2022-02-08  0:45 ` [PATCH V2 22/32] x86/sgx: Support modifying SGX page type Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 23/32] x86/sgx: Support complete page removal Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 24/32] Documentation/x86: Introduce enclave runtime management section Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 25/32] selftests/sgx: Introduce dynamic entry point Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 26/32] selftests/sgx: Introduce TCS initialization enclave operation Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 27/32] selftests/sgx: Test complete changing of page type flow Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 28/32] selftests/sgx: Test faulty enclave behavior Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 29/32] selftests/sgx: Test invalid access to removed enclave page Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 30/32] selftests/sgx: Test reclaiming of untouched page Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 31/32] x86/sgx: Free up EPC pages directly to support large page ranges Reinette Chatre
2022-02-08  0:45 ` [PATCH V2 32/32] selftests/sgx: Page removal stress test Reinette Chatre
2022-02-22 20:27 ` [PATCH V2 00/32] x86/sgx and selftests/sgx: Support SGX2 Nathaniel McCallum
2022-02-22 22:39   ` Reinette Chatre
2022-02-23 13:24     ` Nathaniel McCallum
2022-02-23 18:25       ` Reinette Chatre
2022-03-02 16:57         ` Nathaniel McCallum
2022-03-02 21:20           ` Reinette Chatre
2022-03-03  1:13             ` Nathaniel McCallum
2022-03-03 17:49               ` Reinette Chatre
2022-03-04  0:57               ` Jarkko Sakkinen
