linux-sgx.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] x86/sgx: Test and fixes
@ 2022-08-30  3:12 Jarkko Sakkinen
  2022-08-30  3:12 ` [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error Jarkko Sakkinen
                   ` (5 more replies)
  0 siblings, 6 replies; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-08-30  3:12 UTC (permalink / raw)
  To: linux-sgx
  Cc: Haitao Huang, Vijay Dhanraj, Reinette Chatre, Dave Hansen,
	Jarkko Sakkinen

A collection of tests and fixes for SGX.

Haitao Huang (2):
  x86/sgx: Handle VA page allocation failure for EAUG on PF.
  selftests/sgx: retry the ioctls returned with EAGAIN

Jarkko Sakkinen (2):
  x86/sgx: Do not consider unsanitized pages an error
  selftests/sgx: Add a bpftrace script for tracking allocation errors

Kristen Carlson Accardi (1):
  selftests/sgx: Ignore OpenSSL 3.0 deprecated functions warning

Vijay Dhanraj (1):
  selftests/sgx: Add SGX selftest augment_via_eaccept_long

 arch/x86/kernel/cpu/sgx/encl.c             |   5 +-
 arch/x86/kernel/cpu/sgx/main.c             |  19 ++-
 tools/testing/selftests/sgx/alloc-error.bt |   7 +
 tools/testing/selftests/sgx/load.c         |   5 +-
 tools/testing/selftests/sgx/main.c         | 180 +++++++++++++++++----
 tools/testing/selftests/sgx/main.h         |   3 +-
 tools/testing/selftests/sgx/sigstruct.c    |   8 +-
 7 files changed, 189 insertions(+), 38 deletions(-)
 create mode 100644 tools/testing/selftests/sgx/alloc-error.bt

-- 
2.37.2


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-30  3:12 [PATCH 0/6] x86/sgx: Test and fixes Jarkko Sakkinen
@ 2022-08-30  3:12 ` Jarkko Sakkinen
  2022-08-30 22:54   ` Reinette Chatre
  2022-08-30  3:12 ` [PATCH 2/6] x86/sgx: Handle VA page allocation failure for EAUG on PF Jarkko Sakkinen
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-08-30  3:12 UTC (permalink / raw)
  To: linux-sgx
  Cc: Haitao Huang, Vijay Dhanraj, Reinette Chatre, Dave Hansen,
	Jarkko Sakkinen, Paul Menzel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

In sgx_init(), if misc_register() for the provision device fails, and
neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
prematurely stopped.

This triggers WARN_ON() because sgx_dirty_page_list ends up being
non-empty, and dumps the call stack:

[    0.000000] Linux version 6.0.0-rc2 (root@4beb429beb4a) (gcc (Debian
11.3.0-3) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38) #382 SMP
PREEMPT_DYNAMIC Fri Aug 26 12:52:15 UTC 2022
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.0.0-rc2
root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
module_blacklist=psmouse initcall_debug log_buf_len=4M cryptomgr.notests
[…]
[    0.268089] calling  sgx_init+0x0/0x409 @ 1
[    0.268103] sgx: EPC section 0x40200000-0x45f7ffff
[    0.268591] ------------[ cut here ]------------
[    0.268592] WARNING: CPU: 6 PID: 83 at
arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
[    0.268598] Modules linked in:
[    0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
[    0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
07/06/2022
[    0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
[    0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f
84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff
ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
[    0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
[    0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX:
0000000000000000
[    0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI:
00000000ffffffff
[    0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09:
ffff8dcd820a4180
[    0.268614] R10: 0000000000000000 R11: 0000000000000006 R12:
ffffb6c74006bce0
[    0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15:
0000000000000000
[    0.268616] FS:  0000000000000000(0000) GS:ffff8dcf25580000(0000)
knlGS:0000000000000000
[    0.268617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4:
00000000003706e0
[    0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[    0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[    0.268622] Call Trace:
[    0.268624]  <TASK>
[    0.268627]  ? _raw_spin_lock_irqsave+0x24/0x60
[    0.268632]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[    0.268634]  ? __kthread_parkme+0x36/0x90
[    0.268637]  kthread+0xe5/0x110
[    0.268639]  ? kthread_complete_and_exit+0x20/0x20
[    0.268642]  ret_from_fork+0x1f/0x30
[    0.268647]  </TASK>
[    0.268648] ---[ end trace 0000000000000000 ]---
[    0.268694] initcall sgx_init+0x0/0x409 returned -19 after 603 usecs

Ultimately this can crash the kernel, if the following is set:

	/proc/sys/kernel/panic_on_warn

Print a simple warning instead, and improve the output by printing the
number of unsanitized pages, in order to provide debug informnation for
future needs.

Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
Cc: Haitao Huang <haitao.huang@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>

v5:
- Add the klog dump and sysctl option to the commit message.

v4:
- Explain expectations for dirty_page_list in the function header, instead
  of an inline comment.
- Improve commit message to explain the conditions better.
- Return the number of pages left dirty to ksgxd() and print warning after
  the 2nd call, if there are any.

v3:
- Remove WARN_ON().
- Tuned comments and the commit message a bit.

v2:
- Replaced WARN_ON() with optional pr_info() inside
  __sgx_sanitize_pages().
- Rewrote the commit message.
- Added the fixes tag.
---
 arch/x86/kernel/cpu/sgx/main.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 515e2a5f25bb..903100fcfce3 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -49,17 +49,20 @@ static LIST_HEAD(sgx_dirty_page_list);
  * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
  * from the input list, and made available for the page allocator. SECS pages
  * prepending their children in the input list are left intact.
+ *
+ * Contents of the @dirty_page_list must be thread-local, i.e.
+ * not shared by multiple threads.
  */
-static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
+static int __sgx_sanitize_pages(struct list_head *dirty_page_list)
 {
 	struct sgx_epc_page *page;
+	int left_dirty = 0;
 	LIST_HEAD(dirty);
 	int ret;
 
-	/* dirty_page_list is thread-local, no need for a lock: */
 	while (!list_empty(dirty_page_list)) {
 		if (kthread_should_stop())
-			return;
+			break;
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
@@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 		} else {
 			/* The page is not yet clean - move to the dirty list. */
 			list_move_tail(&page->list, &dirty);
+			left_dirty++;
 		}
 
 		cond_resched();
 	}
 
 	list_splice(&dirty, dirty_page_list);
+	return left_dirty;
 }
 
 static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
@@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
 
 static int ksgxd(void *p)
 {
+	int left_dirty;
+
 	set_freezable();
 
 	/*
@@ -395,10 +402,10 @@ static int ksgxd(void *p)
 	 * required for SECS pages, whose child pages blocked EREMOVE.
 	 */
 	__sgx_sanitize_pages(&sgx_dirty_page_list);
-	__sgx_sanitize_pages(&sgx_dirty_page_list);
 
-	/* sanity check: */
-	WARN_ON(!list_empty(&sgx_dirty_page_list));
+	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
+	if (left_dirty)
+		pr_warn("%d unsanitized pages\n", left_dirty);
 
 	while (!kthread_should_stop()) {
 		if (try_to_freeze())
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 2/6] x86/sgx: Handle VA page allocation failure for EAUG on PF.
  2022-08-30  3:12 [PATCH 0/6] x86/sgx: Test and fixes Jarkko Sakkinen
  2022-08-30  3:12 ` [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error Jarkko Sakkinen
@ 2022-08-30  3:12 ` Jarkko Sakkinen
  2022-08-30 22:54   ` Reinette Chatre
  2022-08-30  3:12 ` [PATCH 3/6] selftests/sgx: Ignore OpenSSL 3.0 deprecated functions warning Jarkko Sakkinen
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-08-30  3:12 UTC (permalink / raw)
  To: linux-sgx
  Cc: Haitao Huang, Vijay Dhanraj, Reinette Chatre, Dave Hansen,
	Jarkko Sakkinen, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

From: Haitao Huang <haitao.huang@linux.intel.com>

VM_FAULT_NOPAGE is expected behaviour for -EBUSY failure path, when
augmenting a page, as this means that the reclaimer thread has been
triggered, and the intention is just to round-trip in ring-3, and
retry with a new page fault.

Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave")
Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
Tested-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
v2:
* Removed reviewed-by, no other changes.
---
 arch/x86/kernel/cpu/sgx/encl.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 24c1bb8eb196..de92c1c8b79d 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -344,8 +344,11 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
 	}
 
 	va_page = sgx_encl_grow(encl, false);
-	if (IS_ERR(va_page))
+	if (IS_ERR(va_page)) {
+		if (PTR_ERR(va_page) == -EBUSY)
+			vmret =  VM_FAULT_NOPAGE;
 		goto err_out_epc;
+	}
 
 	if (va_page)
 		list_add(&va_page->list, &encl->va_pages);
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 3/6] selftests/sgx: Ignore OpenSSL 3.0 deprecated functions warning
  2022-08-30  3:12 [PATCH 0/6] x86/sgx: Test and fixes Jarkko Sakkinen
  2022-08-30  3:12 ` [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error Jarkko Sakkinen
  2022-08-30  3:12 ` [PATCH 2/6] x86/sgx: Handle VA page allocation failure for EAUG on PF Jarkko Sakkinen
@ 2022-08-30  3:12 ` Jarkko Sakkinen
  2022-08-30 18:18   ` Reinette Chatre
  2022-08-30  3:12 ` [PATCH 4/6] selftests/sgx: Add SGX selftest augment_via_eaccept_long Jarkko Sakkinen
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-08-30  3:12 UTC (permalink / raw)
  To: linux-sgx
  Cc: Haitao Huang, Vijay Dhanraj, Reinette Chatre, Dave Hansen,
	Kristen Carlson Accardi, Jarkko Sakkinen, Shuah Khan,
	open list:KERNEL SELFTEST FRAMEWORK, open list

From: Kristen Carlson Accardi <kristen@linux.intel.com>

OpenSSL 3.0 deprecates some of the functions used in the SGX
selftests, causing build errors on new distros. For now ignore
the warnings until support for the functions is no longer
available and mark FIXME so that it can be clear this should
be removed at some point.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
 tools/testing/selftests/sgx/sigstruct.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/sgx/sigstruct.c b/tools/testing/selftests/sgx/sigstruct.c
index 50c5ab1aa6fa..a07896a46364 100644
--- a/tools/testing/selftests/sgx/sigstruct.c
+++ b/tools/testing/selftests/sgx/sigstruct.c
@@ -17,6 +17,12 @@
 #include "defines.h"
 #include "main.h"
 
+/*
+ * FIXME: OpenSSL 3.0 has deprecated some functions. For now just ignore
+ * the warnings.
+ */
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
+
 struct q1q2_ctx {
 	BN_CTX *bn_ctx;
 	BIGNUM *m;
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 4/6] selftests/sgx: Add SGX selftest augment_via_eaccept_long
  2022-08-30  3:12 [PATCH 0/6] x86/sgx: Test and fixes Jarkko Sakkinen
                   ` (2 preceding siblings ...)
  2022-08-30  3:12 ` [PATCH 3/6] selftests/sgx: Ignore OpenSSL 3.0 deprecated functions warning Jarkko Sakkinen
@ 2022-08-30  3:12 ` Jarkko Sakkinen
  2022-08-30 22:55   ` Reinette Chatre
  2022-08-30  3:12 ` [PATCH 5/6] selftests/sgx: retry the ioctls returned with EAGAIN Jarkko Sakkinen
  2022-08-30  3:12 ` [PATCH 6/6] selftests/sgx: Add a bpftrace script for tracking allocation errors Jarkko Sakkinen
  5 siblings, 1 reply; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-08-30  3:12 UTC (permalink / raw)
  To: linux-sgx
  Cc: Haitao Huang, Vijay Dhanraj, Reinette Chatre, Dave Hansen,
	Jarkko Sakkinen, Shuah Khan, open list:KERNEL SELFTEST FRAMEWORK,
	open list

From: Vijay Dhanraj <vijay.dhanraj@intel.com>

Add a new test case which is same as augment_via_eaccept but adds a
larger number of EPC pages to stress test EAUG via EACCEPT.

Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
v2:
- Addressed Reinette's feedback:
  https://lore.kernel.org/linux-sgx/24bd8e42-ff4e-0090-d9e1-cd81e4807f21@intel.com/
---
 tools/testing/selftests/sgx/load.c      |   5 +-
 tools/testing/selftests/sgx/main.c      | 141 +++++++++++++++++++++---
 tools/testing/selftests/sgx/main.h      |   3 +-
 tools/testing/selftests/sgx/sigstruct.c |   2 +-
 4 files changed, 129 insertions(+), 22 deletions(-)

diff --git a/tools/testing/selftests/sgx/load.c b/tools/testing/selftests/sgx/load.c
index 94bdeac1cf04..7de1b15c90b1 100644
--- a/tools/testing/selftests/sgx/load.c
+++ b/tools/testing/selftests/sgx/load.c
@@ -171,7 +171,8 @@ uint64_t encl_get_entry(struct encl *encl, const char *symbol)
 	return 0;
 }
 
-bool encl_load(const char *path, struct encl *encl, unsigned long heap_size)
+bool encl_load(const char *path, struct encl *encl, unsigned long heap_size,
+			   unsigned long edmm_size)
 {
 	const char device_path[] = "/dev/sgx_enclave";
 	struct encl_segment *seg;
@@ -300,7 +301,7 @@ bool encl_load(const char *path, struct encl *encl, unsigned long heap_size)
 
 	encl->src_size = encl->segment_tbl[j].offset + encl->segment_tbl[j].size;
 
-	for (encl->encl_size = 4096; encl->encl_size < encl->src_size; )
+	for (encl->encl_size = 4096; encl->encl_size < encl->src_size + edmm_size;)
 		encl->encl_size <<= 1;
 
 	return true;
diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index 9820b3809c69..867e98502120 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -21,8 +21,15 @@
 #include "../kselftest_harness.h"
 #include "main.h"
 
+/*
+ * The size was chosen based on a bug report:
+ * https://lore.kernel.org/linux-sgx/DM8PR11MB55912A7F47A84EC9913A6352F6999@DM8PR11MB5591.namprd11.prod.outlook.com/
+ */
+static const unsigned long EDMM_SIZE_LONG = 8L * 1024L * 1024L * 1024L; // 8 GB
+static const unsigned long TIMEOUT_LONG = 900; /* seconds */
 static const uint64_t MAGIC = 0x1122334455667788ULL;
 static const uint64_t MAGIC2 = 0x8877665544332211ULL;
+
 vdso_sgx_enter_enclave_t vdso_sgx_enter_enclave;
 
 /*
@@ -173,7 +180,8 @@ FIXTURE(enclave) {
 };
 
 static bool setup_test_encl(unsigned long heap_size, struct encl *encl,
-			    struct __test_metadata *_metadata)
+			    struct __test_metadata *_metadata,
+			    unsigned long edmm_size)
 {
 	Elf64_Sym *sgx_enter_enclave_sym = NULL;
 	struct vdso_symtab symtab;
@@ -183,7 +191,7 @@ static bool setup_test_encl(unsigned long heap_size, struct encl *encl,
 	unsigned int i;
 	void *addr;
 
-	if (!encl_load("test_encl.elf", encl, heap_size)) {
+	if (!encl_load("test_encl.elf", encl, heap_size, edmm_size)) {
 		encl_delete(encl);
 		TH_LOG("Failed to load the test enclave.");
 		return false;
@@ -284,7 +292,7 @@ TEST_F(enclave, unclobbered_vdso)
 	struct encl_op_get_from_buf get_op;
 	struct encl_op_put_to_buf put_op;
 
-	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata, 0));
 
 	memset(&self->run, 0, sizeof(self->run));
 	self->run.tcs = self->encl.encl_base;
@@ -357,7 +365,7 @@ TEST_F(enclave, unclobbered_vdso_oversubscribed)
 
 	total_mem = get_total_epc_mem();
 	ASSERT_NE(total_mem, 0);
-	ASSERT_TRUE(setup_test_encl(total_mem, &self->encl, _metadata));
+	ASSERT_TRUE(setup_test_encl(total_mem, &self->encl, _metadata, 0));
 
 	memset(&self->run, 0, sizeof(self->run));
 	self->run.tcs = self->encl.encl_base;
@@ -401,7 +409,7 @@ TEST_F_TIMEOUT(enclave, unclobbered_vdso_oversubscribed_remove, 900)
 	ASSERT_NE(total_mem, 0);
 	TH_LOG("Creating an enclave with %lu bytes heap may take a while ...",
 	       total_mem);
-	ASSERT_TRUE(setup_test_encl(total_mem, &self->encl, _metadata));
+	ASSERT_TRUE(setup_test_encl(total_mem, &self->encl, _metadata, 0));
 
 	/*
 	 * Hardware (SGX2) and kernel support is needed for this test. Start
@@ -506,7 +514,7 @@ TEST_F(enclave, clobbered_vdso)
 	struct encl_op_get_from_buf get_op;
 	struct encl_op_put_to_buf put_op;
 
-	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata, 0));
 
 	memset(&self->run, 0, sizeof(self->run));
 	self->run.tcs = self->encl.encl_base;
@@ -542,7 +550,7 @@ TEST_F(enclave, clobbered_vdso_and_user_function)
 	struct encl_op_get_from_buf get_op;
 	struct encl_op_put_to_buf put_op;
 
-	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata, 0));
 
 	memset(&self->run, 0, sizeof(self->run));
 	self->run.tcs = self->encl.encl_base;
@@ -575,7 +583,7 @@ TEST_F(enclave, tcs_entry)
 {
 	struct encl_op_header op;
 
-	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata, 0));
 
 	memset(&self->run, 0, sizeof(self->run));
 	self->run.tcs = self->encl.encl_base;
@@ -620,7 +628,7 @@ TEST_F(enclave, pte_permissions)
 	unsigned long data_start;
 	int ret;
 
-	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata, 0));
 
 	memset(&self->run, 0, sizeof(self->run));
 	self->run.tcs = self->encl.encl_base;
@@ -722,7 +730,7 @@ TEST_F(enclave, tcs_permissions)
 	struct sgx_enclave_restrict_permissions ioc;
 	int ret, errno_save;
 
-	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata, 0));
 
 	memset(&self->run, 0, sizeof(self->run));
 	self->run.tcs = self->encl.encl_base;
@@ -785,7 +793,7 @@ TEST_F(enclave, epcm_permissions)
 	unsigned long data_start;
 	int ret, errno_save;
 
-	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata, 0));
 
 	memset(&self->run, 0, sizeof(self->run));
 	self->run.tcs = self->encl.encl_base;
@@ -986,7 +994,7 @@ TEST_F(enclave, augment)
 	if (!sgx2_supported())
 		SKIP(return, "SGX2 not supported");
 
-	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata, 0));
 
 	memset(&self->run, 0, sizeof(self->run));
 	self->run.tcs = self->encl.encl_base;
@@ -1116,7 +1124,7 @@ TEST_F(enclave, augment_via_eaccept)
 	if (!sgx2_supported())
 		SKIP(return, "SGX2 not supported");
 
-	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata, 0));
 
 	memset(&self->run, 0, sizeof(self->run));
 	self->run.tcs = self->encl.encl_base;
@@ -1210,6 +1218,103 @@ TEST_F(enclave, augment_via_eaccept)
 	munmap(addr, PAGE_SIZE);
 }
 
+/*
+ * Test for the addition of large number of pages to an initialized enclave via
+ * a pre-emptive run of EACCEPT on every page to be added.
+ */
+TEST_F_TIMEOUT(enclave, augment_via_eaccept_long, TIMEOUT_LONG)
+{
+	struct encl_op_get_from_addr get_addr_op;
+	struct encl_op_put_to_addr put_addr_op;
+	struct encl_op_eaccept eaccept_op;
+	size_t total_size = 0;
+	unsigned long i;
+	void *addr;
+
+	if (!sgx2_supported())
+		SKIP(return, "SGX2 not supported");
+
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata,
+				    EDMM_SIZE_LONG));
+
+	memset(&self->run, 0, sizeof(self->run));
+	self->run.tcs = self->encl.encl_base;
+
+	for (i = 0; i < self->encl.nr_segments; i++) {
+		struct encl_segment *seg = &self->encl.segment_tbl[i];
+
+		total_size += seg->size;
+	}
+
+	/*
+	 * mmap() every page at end of existing enclave to be used for
+	 * EDMM.
+	 */
+	addr = mmap((void *)self->encl.encl_base + total_size, EDMM_SIZE_LONG,
+			PROT_READ | PROT_WRITE | PROT_EXEC, MAP_SHARED | MAP_FIXED,
+			self->encl.fd, 0);
+	EXPECT_NE(addr, MAP_FAILED);
+
+	self->run.exception_vector = 0;
+	self->run.exception_error_code = 0;
+	self->run.exception_addr = 0;
+
+	/*
+	 * Run EACCEPT on new page to trigger the #PF->EAUG->EACCEPT(again
+	 * without a #PF). All should be transparent to userspace.
+	 */
+	eaccept_op.flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_REG | SGX_SECINFO_PENDING;
+	eaccept_op.ret = 0;
+	eaccept_op.header.type = ENCL_OP_EACCEPT;
+
+	for (i = 0; i < EDMM_SIZE_LONG; i += 4096) {
+		eaccept_op.epc_addr = (uint64_t)(addr + i);
+
+		EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+		if (self->run.exception_vector == 14 &&
+			self->run.exception_error_code == 4 &&
+			self->run.exception_addr == self->encl.encl_base) {
+			munmap(addr, EDMM_SIZE_LONG);
+			SKIP(return, "Kernel does not support adding pages to initialized enclave");
+		}
+
+		EXPECT_EQ(self->run.exception_vector, 0);
+		EXPECT_EQ(self->run.exception_error_code, 0);
+		EXPECT_EQ(self->run.exception_addr, 0);
+		ASSERT_EQ(eaccept_op.ret, 0);
+		ASSERT_EQ(self->run.function, EEXIT);
+	}
+
+	put_addr_op.value = MAGIC;
+	put_addr_op.addr = (unsigned long)addr;
+	put_addr_op.header.type = ENCL_OP_PUT_TO_ADDRESS;
+
+	EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	/*
+	 * Read memory from newly added page that was just written to,
+	 * confirming that data previously written (MAGIC) is present.
+	 */
+	get_addr_op.value = 0;
+	get_addr_op.addr = (unsigned long)addr;
+	get_addr_op.header.type = ENCL_OP_GET_FROM_ADDRESS;
+
+	EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
+
+	EXPECT_EQ(get_addr_op.value, MAGIC);
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+
+	munmap(addr, EDMM_SIZE_LONG);
+}
+
 /*
  * SGX2 page type modification test in two phases:
  * Phase 1:
@@ -1238,7 +1343,7 @@ TEST_F(enclave, tcs_create)
 	int ret, i;
 
 	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl,
-				    _metadata));
+				    _metadata, 0));
 
 	memset(&self->run, 0, sizeof(self->run));
 	self->run.tcs = self->encl.encl_base;
@@ -1568,7 +1673,7 @@ TEST_F(enclave, remove_added_page_no_eaccept)
 	unsigned long data_start;
 	int ret, errno_save;
 
-	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata, 0));
 
 	memset(&self->run, 0, sizeof(self->run));
 	self->run.tcs = self->encl.encl_base;
@@ -1679,7 +1784,7 @@ TEST_F(enclave, remove_added_page_invalid_access)
 	unsigned long data_start;
 	int ret, errno_save;
 
-	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata, 0));
 
 	memset(&self->run, 0, sizeof(self->run));
 	self->run.tcs = self->encl.encl_base;
@@ -1794,7 +1899,7 @@ TEST_F(enclave, remove_added_page_invalid_access_after_eaccept)
 	unsigned long data_start;
 	int ret, errno_save;
 
-	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata, 0));
 
 	memset(&self->run, 0, sizeof(self->run));
 	self->run.tcs = self->encl.encl_base;
@@ -1918,7 +2023,7 @@ TEST_F(enclave, remove_untouched_page)
 	unsigned long data_start;
 	int ret, errno_save;
 
-	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata, 0));
 
 	/*
 	 * Hardware (SGX2) and kernel support is needed for this test. Start
diff --git a/tools/testing/selftests/sgx/main.h b/tools/testing/selftests/sgx/main.h
index fc585be97e2f..fe5d39ac0e1e 100644
--- a/tools/testing/selftests/sgx/main.h
+++ b/tools/testing/selftests/sgx/main.h
@@ -35,7 +35,8 @@ extern unsigned char sign_key[];
 extern unsigned char sign_key_end[];
 
 void encl_delete(struct encl *ctx);
-bool encl_load(const char *path, struct encl *encl, unsigned long heap_size);
+bool encl_load(const char *path, struct encl *encl, unsigned long heap_size,
+			   unsigned long edmm_size);
 bool encl_measure(struct encl *encl);
 bool encl_build(struct encl *encl);
 uint64_t encl_get_entry(struct encl *encl, const char *symbol);
diff --git a/tools/testing/selftests/sgx/sigstruct.c b/tools/testing/selftests/sgx/sigstruct.c
index a07896a46364..decd767434d5 100644
--- a/tools/testing/selftests/sgx/sigstruct.c
+++ b/tools/testing/selftests/sgx/sigstruct.c
@@ -349,7 +349,7 @@ bool encl_measure(struct encl *encl)
 	if (!ctx)
 		goto err;
 
-	if (!mrenclave_ecreate(ctx, encl->src_size))
+	if (!mrenclave_ecreate(ctx, encl->encl_size))
 		goto err;
 
 	for (i = 0; i < encl->nr_segments; i++) {
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 5/6] selftests/sgx: retry the ioctls returned with EAGAIN
  2022-08-30  3:12 [PATCH 0/6] x86/sgx: Test and fixes Jarkko Sakkinen
                   ` (3 preceding siblings ...)
  2022-08-30  3:12 ` [PATCH 4/6] selftests/sgx: Add SGX selftest augment_via_eaccept_long Jarkko Sakkinen
@ 2022-08-30  3:12 ` Jarkko Sakkinen
  2022-08-30 22:56   ` Reinette Chatre
  2022-08-30  3:12 ` [PATCH 6/6] selftests/sgx: Add a bpftrace script for tracking allocation errors Jarkko Sakkinen
  5 siblings, 1 reply; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-08-30  3:12 UTC (permalink / raw)
  To: linux-sgx
  Cc: Haitao Huang, Vijay Dhanraj, Reinette Chatre, Dave Hansen,
	Jarkko Sakkinen, Shuah Khan, open list:KERNEL SELFTEST FRAMEWORK,
	open list

From: Haitao Huang <haitao.huang@linux.intel.com>

For EMODT and EREMOVE ioctls with a large range, kernel
may not finish in one shot and return EAGAIN error code
and count of bytes of EPC pages on that operations are
finished successfully.

Change the unclobbered_vdso_oversubscribed_remove test
to rerun the ioctls in a loop, updating offset and length
using the byte count returned in each iteration.

Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
 tools/testing/selftests/sgx/main.c | 39 +++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index 867e98502120..3268d8b01b0b 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -399,7 +399,7 @@ TEST_F_TIMEOUT(enclave, unclobbered_vdso_oversubscribed_remove, 900)
 	unsigned long total_mem;
 	int ret, errno_save;
 	unsigned long addr;
-	unsigned long i;
+	unsigned long i, count;
 
 	/*
 	 * Create enclave with additional heap that is as big as all
@@ -461,16 +461,27 @@ TEST_F_TIMEOUT(enclave, unclobbered_vdso_oversubscribed_remove, 900)
 	modt_ioc.offset = heap->offset;
 	modt_ioc.length = heap->size;
 	modt_ioc.page_type = SGX_PAGE_TYPE_TRIM;
-
+	count = 0;
 	TH_LOG("Changing type of %zd bytes to trimmed may take a while ...",
 	       heap->size);
-	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPES, &modt_ioc);
-	errno_save = ret == -1 ? errno : 0;
+	do {
+		ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPES, &modt_ioc);
+		errno_save = ret == -1 ? errno : 0;
+		if (errno_save == EAGAIN) {
+			count += modt_ioc.count;
+			modt_ioc.offset += modt_ioc.count;
+			modt_ioc.length -= modt_ioc.count;
+			modt_ioc.result = 0;
+			modt_ioc.count = 0;
+		} else
+			break;
+	} while (modt_ioc.length != 0);
 
 	EXPECT_EQ(ret, 0);
 	EXPECT_EQ(errno_save, 0);
 	EXPECT_EQ(modt_ioc.result, 0);
-	EXPECT_EQ(modt_ioc.count, heap->size);
+	count += modt_ioc.count;
+	EXPECT_EQ(count, heap->size);
 
 	/* EACCEPT all removed pages. */
 	addr = self->encl.encl_base + heap->offset;
@@ -498,15 +509,25 @@ TEST_F_TIMEOUT(enclave, unclobbered_vdso_oversubscribed_remove, 900)
 
 	remove_ioc.offset = heap->offset;
 	remove_ioc.length = heap->size;
-
+	count = 0;
 	TH_LOG("Removing %zd bytes from enclave may take a while ...",
 	       heap->size);
-	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_REMOVE_PAGES, &remove_ioc);
-	errno_save = ret == -1 ? errno : 0;
+	do {
+		ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_REMOVE_PAGES, &remove_ioc);
+		errno_save = ret == -1 ? errno : 0;
+		if (errno_save == EAGAIN) {
+			count += remove_ioc.count;
+			remove_ioc.offset += remove_ioc.count;
+			remove_ioc.length -= remove_ioc.count;
+			remove_ioc.count = 0;
+		} else
+			break;
+	} while (remove_ioc.length != 0);
 
 	EXPECT_EQ(ret, 0);
 	EXPECT_EQ(errno_save, 0);
-	EXPECT_EQ(remove_ioc.count, heap->size);
+	count += remove_ioc.count;
+	EXPECT_EQ(count, heap->size);
 }
 
 TEST_F(enclave, clobbered_vdso)
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 6/6] selftests/sgx: Add a bpftrace script for tracking allocation errors
  2022-08-30  3:12 [PATCH 0/6] x86/sgx: Test and fixes Jarkko Sakkinen
                   ` (4 preceding siblings ...)
  2022-08-30  3:12 ` [PATCH 5/6] selftests/sgx: retry the ioctls returned with EAGAIN Jarkko Sakkinen
@ 2022-08-30  3:12 ` Jarkko Sakkinen
  2022-08-30 22:57   ` Reinette Chatre
  2022-08-31 18:23   ` Dave Hansen
  5 siblings, 2 replies; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-08-30  3:12 UTC (permalink / raw)
  To: linux-sgx
  Cc: Haitao Huang, Vijay Dhanraj, Reinette Chatre, Dave Hansen,
	Jarkko Sakkinen, Shuah Khan, open list,
	open list:KERNEL SELFTEST FRAMEWORK

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
 tools/testing/selftests/sgx/alloc-error.bt | 7 +++++++
 1 file changed, 7 insertions(+)
 create mode 100644 tools/testing/selftests/sgx/alloc-error.bt

diff --git a/tools/testing/selftests/sgx/alloc-error.bt b/tools/testing/selftests/sgx/alloc-error.bt
new file mode 100644
index 000000000000..9268d50dea29
--- /dev/null
+++ b/tools/testing/selftests/sgx/alloc-error.bt
@@ -0,0 +1,7 @@
+kr:sgx_alloc_epc_page /(uint64)retval >= (uint64)(-4095)/ {
+		 printf("sgx_alloc_epc_page: retval=%d\n", (int64)retval);
+}
+
+kr:sgx_encl_page_alloc /(uint64)retval >= (uint64)(-4095)/ {
+		 printf("sgx_encl_page_alloc: retval=%d\n", (int64)retval);
+}
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH 3/6] selftests/sgx: Ignore OpenSSL 3.0 deprecated functions warning
  2022-08-30  3:12 ` [PATCH 3/6] selftests/sgx: Ignore OpenSSL 3.0 deprecated functions warning Jarkko Sakkinen
@ 2022-08-30 18:18   ` Reinette Chatre
  2022-08-31  1:07     ` Jarkko Sakkinen
  0 siblings, 1 reply; 56+ messages in thread
From: Reinette Chatre @ 2022-08-30 18:18 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-sgx
  Cc: Haitao Huang, Vijay Dhanraj, Dave Hansen,
	Kristen Carlson Accardi, Shuah Khan,
	open list:KERNEL SELFTEST FRAMEWORK, open list

Hi Jarkko,

On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> From: Kristen Carlson Accardi <kristen@linux.intel.com>
> 
> OpenSSL 3.0 deprecates some of the functions used in the SGX
> selftests, causing build errors on new distros. For now ignore
> the warnings until support for the functions is no longer
> available and mark FIXME so that it can be clear this should
> be removed at some point.
> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> ---

This one can be dropped from this series. The fix already made it
into v6.0-rc3 via selftest.

Reinette

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-30  3:12 ` [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error Jarkko Sakkinen
@ 2022-08-30 22:54   ` Reinette Chatre
  2022-08-31  1:27     ` Huang, Kai
  2022-08-31  1:55     ` Jarkko Sakkinen
  0 siblings, 2 replies; 56+ messages in thread
From: Reinette Chatre @ 2022-08-30 22:54 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-sgx
  Cc: Haitao Huang, Vijay Dhanraj, Dave Hansen, Paul Menzel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

Hi Jarkko,

On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> In sgx_init(), if misc_register() for the provision device fails, and
> neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> prematurely stopped.

I do not think misc_register() is required to fail for the scenario to
be triggered (rather use "or" than "and"?). Perhaps just
"In sgx_init(), if a failure is encountered after ksgxd is started
(via sgx_page_reclaimer_init()) ...".

To help the reader understand the subject of this patch it may help
to explain that prematurely stopping ksgxd may leave some
unsanitized pages, but that is not a problem since SGX cannot
be used on the platform anyway. 

> This triggers WARN_ON() because sgx_dirty_page_list ends up being
> non-empty, and dumps the call stack:
> 

Traces like below can be frowned upon. I recommend that you follow the
guidance in "Backtraces in commit mesages"(sic) in 
Documentation/process/submitting-patches.rst.

> [    0.000000] Linux version 6.0.0-rc2 (root@4beb429beb4a) (gcc (Debian
> 11.3.0-3) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38) #382 SMP
> PREEMPT_DYNAMIC Fri Aug 26 12:52:15 UTC 2022
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.0.0-rc2
> root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
> module_blacklist=psmouse initcall_debug log_buf_len=4M cryptomgr.notests
> […]
> [    0.268089] calling  sgx_init+0x0/0x409 @ 1
> [    0.268103] sgx: EPC section 0x40200000-0x45f7ffff
> [    0.268591] ------------[ cut here ]------------
> [    0.268592] WARNING: CPU: 6 PID: 83 at
> arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
> [    0.268598] Modules linked in:
> [    0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
> [    0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
> 07/06/2022
> [    0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
> [    0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f
> 84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff
> ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
> [    0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
> [    0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX:
> 0000000000000000
> [    0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI:
> 00000000ffffffff
> [    0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09:
> ffff8dcd820a4180
> [    0.268614] R10: 0000000000000000 R11: 0000000000000006 R12:
> ffffb6c74006bce0
> [    0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15:
> 0000000000000000
> [    0.268616] FS:  0000000000000000(0000) GS:ffff8dcf25580000(0000)
> knlGS:0000000000000000
> [    0.268617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4:
> 00000000003706e0
> [    0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [    0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [    0.268622] Call Trace:
> [    0.268624]  <TASK>
> [    0.268627]  ? _raw_spin_lock_irqsave+0x24/0x60
> [    0.268632]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> [    0.268634]  ? __kthread_parkme+0x36/0x90
> [    0.268637]  kthread+0xe5/0x110
> [    0.268639]  ? kthread_complete_and_exit+0x20/0x20
> [    0.268642]  ret_from_fork+0x1f/0x30
> [    0.268647]  </TASK>
> [    0.268648] ---[ end trace 0000000000000000 ]---
> [    0.268694] initcall sgx_init+0x0/0x409 returned -19 after 603 usecs
> 
> Ultimately this can crash the kernel, if the following is set:
> 
> 	/proc/sys/kernel/panic_on_warn
> 
> Print a simple warning instead, and improve the output by printing the
> number of unsanitized pages, in order to provide debug informnation for
> future needs.

informnation -> information

 
...

> Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u
> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
> Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

Should this go to stable?

> 
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 515e2a5f25bb..903100fcfce3 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -49,17 +49,20 @@ static LIST_HEAD(sgx_dirty_page_list);
>   * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
>   * from the input list, and made available for the page allocator. SECS pages
>   * prepending their children in the input list are left intact.
> + *
> + * Contents of the @dirty_page_list must be thread-local, i.e.
> + * not shared by multiple threads.

Did you intend to mention something about the needed locking here? It looks
like some information is lost during the move to the function description.

>   */
> -static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
> +static int __sgx_sanitize_pages(struct list_head *dirty_page_list)
>  {
>  	struct sgx_epc_page *page;
> +	int left_dirty = 0;

I do not know how many pages this code should be ready for but at least
this could handle more by being an unsigned int considering that it is
always positive ... maybe even unsigned long?

>  	LIST_HEAD(dirty);
>  	int ret;
>  
> -	/* dirty_page_list is thread-local, no need for a lock: */
>  	while (!list_empty(dirty_page_list)) {
>  		if (kthread_should_stop())
> -			return;
> +			break;
>  
>  		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
>  
> @@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
>  		} else {
>  			/* The page is not yet clean - move to the dirty list. */
>  			list_move_tail(&page->list, &dirty);
> +			left_dirty++;
>  		}
>  
>  		cond_resched();
>  	}
>  
>  	list_splice(&dirty, dirty_page_list);
> +	return left_dirty;
>  }
>  
>  static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
> @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
>  
>  static int ksgxd(void *p)
>  {
> +	int left_dirty;
> +
>  	set_freezable();
>  
>  	/*
> @@ -395,10 +402,10 @@ static int ksgxd(void *p)
>  	 * required for SECS pages, whose child pages blocked EREMOVE.
>  	 */
>  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
>  
> -	/* sanity check: */
> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> +	if (left_dirty)
> +		pr_warn("%d unsanitized pages\n", left_dirty);
>  
>  	while (!kthread_should_stop()) {
>  		if (try_to_freeze())


Reinette

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 2/6] x86/sgx: Handle VA page allocation failure for EAUG on PF.
  2022-08-30  3:12 ` [PATCH 2/6] x86/sgx: Handle VA page allocation failure for EAUG on PF Jarkko Sakkinen
@ 2022-08-30 22:54   ` Reinette Chatre
  0 siblings, 0 replies; 56+ messages in thread
From: Reinette Chatre @ 2022-08-30 22:54 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-sgx
  Cc: Haitao Huang, Vijay Dhanraj, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

Hi Jarkko and Haitao,

On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> From: Haitao Huang <haitao.huang@linux.intel.com>
> 
> VM_FAULT_NOPAGE is expected behaviour for -EBUSY failure path, when
> augmenting a page, as this means that the reclaimer thread has been
> triggered, and the intention is just to round-trip in ring-3, and
> retry with a new page fault.

Thank you very much for finding and fixing this issue.

> 
> Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave")
> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
> Tested-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> ---
> v2:
> * Removed reviewed-by, no other changes.
> ---
>  arch/x86/kernel/cpu/sgx/encl.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 24c1bb8eb196..de92c1c8b79d 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -344,8 +344,11 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
>  	}
>  
>  	va_page = sgx_encl_grow(encl, false);
> -	if (IS_ERR(va_page))
> +	if (IS_ERR(va_page)) {
> +		if (PTR_ERR(va_page) == -EBUSY)
> +			vmret =  VM_FAULT_NOPAGE;

Some unnecessary white space remained after the copy of this snippet. No
need to propagate that mistake.

>  		goto err_out_epc;
> +	}
>  
>  	if (va_page)
>  		list_add(&va_page->list, &encl->va_pages);


Apart from the white space nitpick this looks good.

Thank you very much.

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/6] selftests/sgx: Add SGX selftest augment_via_eaccept_long
  2022-08-30  3:12 ` [PATCH 4/6] selftests/sgx: Add SGX selftest augment_via_eaccept_long Jarkko Sakkinen
@ 2022-08-30 22:55   ` Reinette Chatre
  2022-08-31  2:28     ` Jarkko Sakkinen
  0 siblings, 1 reply; 56+ messages in thread
From: Reinette Chatre @ 2022-08-30 22:55 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-sgx
  Cc: Haitao Huang, Vijay Dhanraj, Dave Hansen, Shuah Khan,
	open list:KERNEL SELFTEST FRAMEWORK, open list

Hi Vijay and Jarkko,

On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> From: Vijay Dhanraj <vijay.dhanraj@intel.com>
> 
> Add a new test case which is same as augment_via_eaccept but adds a
> larger number of EPC pages to stress test EAUG via EACCEPT.
> 
> Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
> Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org>
> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> ---
> v2:
> - Addressed Reinette's feedback:
>   https://lore.kernel.org/linux-sgx/24bd8e42-ff4e-0090-d9e1-cd81e4807f21@intel.com/
> ---
>  tools/testing/selftests/sgx/load.c      |   5 +-
>  tools/testing/selftests/sgx/main.c      | 141 +++++++++++++++++++++---
>  tools/testing/selftests/sgx/main.h      |   3 +-
>  tools/testing/selftests/sgx/sigstruct.c |   2 +-
>  4 files changed, 129 insertions(+), 22 deletions(-)

There seems to be at least three patches merged into one here:
1) Update SGX selftests to create enclaves with provided size dedicated
   to EDMM (this change causes a lot of noise and distracts from the test
   addition).
2) The mrenclave_ecreate() fix (which is still incomplete).
3) The actual test addition.


> diff --git a/tools/testing/selftests/sgx/load.c b/tools/testing/selftests/sgx/load.c
> index 94bdeac1cf04..7de1b15c90b1 100644
> --- a/tools/testing/selftests/sgx/load.c
> +++ b/tools/testing/selftests/sgx/load.c
> @@ -171,7 +171,8 @@ uint64_t encl_get_entry(struct encl *encl, const char *symbol)
>  	return 0;
>  }
>  
> -bool encl_load(const char *path, struct encl *encl, unsigned long heap_size)
> +bool encl_load(const char *path, struct encl *encl, unsigned long heap_size,
> +			   unsigned long edmm_size)
>  {

checkpatch.pl informs about alignment issues above and also a few other places.

>  	const char device_path[] = "/dev/sgx_enclave";
>  	struct encl_segment *seg;
> @@ -300,7 +301,7 @@ bool encl_load(const char *path, struct encl *encl, unsigned long heap_size)
>  
>  	encl->src_size = encl->segment_tbl[j].offset + encl->segment_tbl[j].size;
>  
> -	for (encl->encl_size = 4096; encl->encl_size < encl->src_size; )
> +	for (encl->encl_size = 4096; encl->encl_size < encl->src_size + edmm_size;)
>  		encl->encl_size <<= 1;
>  
>  	return true;
> diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
> index 9820b3809c69..867e98502120 100644
> --- a/tools/testing/selftests/sgx/main.c
> +++ b/tools/testing/selftests/sgx/main.c
> @@ -21,8 +21,15 @@
>  #include "../kselftest_harness.h"
>  #include "main.h"
>  
> +/*
> + * The size was chosen based on a bug report:
> + * https://lore.kernel.org/linux-sgx/DM8PR11MB55912A7F47A84EC9913A6352F6999@DM8PR11MB5591.namprd11.prod.outlook.com/
> + */
> +static const unsigned long EDMM_SIZE_LONG = 8L * 1024L * 1024L * 1024L; // 8 GB
> +static const unsigned long TIMEOUT_LONG = 900; /* seconds */

It is interesting that in this small snippet there are three different
comment styles (multi-line comment preceding code, tail comment using /**/,
and tail comment using //) :)

>  static const uint64_t MAGIC = 0x1122334455667788ULL;
>  static const uint64_t MAGIC2 = 0x8877665544332211ULL;
> +

Addition of empty line here.

>  vdso_sgx_enter_enclave_t vdso_sgx_enter_enclave;
>  
>  /*
> @@ -173,7 +180,8 @@ FIXTURE(enclave) {
>  };
>  
>  static bool setup_test_encl(unsigned long heap_size, struct encl *encl,
> -			    struct __test_metadata *_metadata)
> +			    struct __test_metadata *_metadata,
> +			    unsigned long edmm_size)
>  {
>  	Elf64_Sym *sgx_enter_enclave_sym = NULL;
>  	struct vdso_symtab symtab;
> @@ -183,7 +191,7 @@ static bool setup_test_encl(unsigned long heap_size, struct encl *encl,
>  	unsigned int i;
>  	void *addr;
>  
> -	if (!encl_load("test_encl.elf", encl, heap_size)) {
> +	if (!encl_load("test_encl.elf", encl, heap_size, edmm_size)) {
>  		encl_delete(encl);
>  		TH_LOG("Failed to load the test enclave.");
>  		return false;
> @@ -284,7 +292,7 @@ TEST_F(enclave, unclobbered_vdso)
>  	struct encl_op_get_from_buf get_op;
>  	struct encl_op_put_to_buf put_op;
>  
> -	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
> +	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata, 0));
>  
>  	memset(&self->run, 0, sizeof(self->run));
>  	self->run.tcs = self->encl.encl_base;
> @@ -357,7 +365,7 @@ TEST_F(enclave, unclobbered_vdso_oversubscribed)
>  
>  	total_mem = get_total_epc_mem();
>  	ASSERT_NE(total_mem, 0);
> -	ASSERT_TRUE(setup_test_encl(total_mem, &self->encl, _metadata));
> +	ASSERT_TRUE(setup_test_encl(total_mem, &self->encl, _metadata, 0));
>  

The support for EDMM size and all the call site changes could be moved to a separate
patch to make the changes easier to parse.


...

> @@ -1210,6 +1218,103 @@ TEST_F(enclave, augment_via_eaccept)
>  	munmap(addr, PAGE_SIZE);
>  }
>  
> +/*
> + * Test for the addition of large number of pages to an initialized enclave via
> + * a pre-emptive run of EACCEPT on every page to be added.
> + */
> +TEST_F_TIMEOUT(enclave, augment_via_eaccept_long, TIMEOUT_LONG)
> +{
> +	struct encl_op_get_from_addr get_addr_op;
> +	struct encl_op_put_to_addr put_addr_op;
> +	struct encl_op_eaccept eaccept_op;
> +	size_t total_size = 0;
> +	unsigned long i;
> +	void *addr;
> +
> +	if (!sgx2_supported())
> +		SKIP(return, "SGX2 not supported");
> +
> +	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata,
> +				    EDMM_SIZE_LONG));
> +
> +	memset(&self->run, 0, sizeof(self->run));
> +	self->run.tcs = self->encl.encl_base;
> +
> +	for (i = 0; i < self->encl.nr_segments; i++) {
> +		struct encl_segment *seg = &self->encl.segment_tbl[i];
> +
> +		total_size += seg->size;
> +	}
> +
> +	/*
> +	 * mmap() every page at end of existing enclave to be used for
> +	 * EDMM.
> +	 */
> +	addr = mmap((void *)self->encl.encl_base + total_size, EDMM_SIZE_LONG,
> +			PROT_READ | PROT_WRITE | PROT_EXEC, MAP_SHARED | MAP_FIXED,
> +			self->encl.fd, 0);
> +	EXPECT_NE(addr, MAP_FAILED);
> +
> +	self->run.exception_vector = 0;
> +	self->run.exception_error_code = 0;
> +	self->run.exception_addr = 0;
> +
> +	/*
> +	 * Run EACCEPT on new page to trigger the #PF->EAUG->EACCEPT(again
> +	 * without a #PF). All should be transparent to userspace.
> +	 */

s/on new page/on every page/ ?

> +	eaccept_op.flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_REG | SGX_SECINFO_PENDING;
> +	eaccept_op.ret = 0;
> +	eaccept_op.header.type = ENCL_OP_EACCEPT;
> +
> +	for (i = 0; i < EDMM_SIZE_LONG; i += 4096) {
> +		eaccept_op.epc_addr = (uint64_t)(addr + i);
> +
> +		EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
> +		if (self->run.exception_vector == 14 &&
> +			self->run.exception_error_code == 4 &&
> +			self->run.exception_addr == self->encl.encl_base) {
> +			munmap(addr, EDMM_SIZE_LONG);
> +			SKIP(return, "Kernel does not support adding pages to initialized enclave");
> +		}
> +
> +		EXPECT_EQ(self->run.exception_vector, 0);
> +		EXPECT_EQ(self->run.exception_error_code, 0);
> +		EXPECT_EQ(self->run.exception_addr, 0);
> +		ASSERT_EQ(eaccept_op.ret, 0);
> +		ASSERT_EQ(self->run.function, EEXIT);
> +	}
> +

I see that you removed all comments here after the discussion about what
comment would be appropriate. This may be ok but it does seem awkward that
there are two parts to this section and the second part still has a comment
while the first does not. How about the comment I provided in
https://lore.kernel.org/linux-sgx/69a5e350-ded9-30c9-dc41-d08c01dd05dc@intel.com/ :

/*
 * Pool of pages were successfully added to enclave. Perform sanity
 * check on first page of the pool only to ensure data can be written
 * to and read from a dynamically added enclave page.
 */

> +	put_addr_op.value = MAGIC;
> +	put_addr_op.addr = (unsigned long)addr;
> +	put_addr_op.header.type = ENCL_OP_PUT_TO_ADDRESS;
> +
> +	EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
> +
> +	EXPECT_EEXIT(&self->run);
> +	EXPECT_EQ(self->run.exception_vector, 0);
> +	EXPECT_EQ(self->run.exception_error_code, 0);
> +	EXPECT_EQ(self->run.exception_addr, 0);
> +
> +	/*
> +	 * Read memory from newly added page that was just written to,
> +	 * confirming that data previously written (MAGIC) is present.
> +	 */
> +	get_addr_op.value = 0;
> +	get_addr_op.addr = (unsigned long)addr;
> +	get_addr_op.header.type = ENCL_OP_GET_FROM_ADDRESS;
> +
> +	EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
> +
> +	EXPECT_EQ(get_addr_op.value, MAGIC);
> +	EXPECT_EEXIT(&self->run);
> +	EXPECT_EQ(self->run.exception_vector, 0);
> +	EXPECT_EQ(self->run.exception_error_code, 0);
> +	EXPECT_EQ(self->run.exception_addr, 0);
> +
> +	munmap(addr, EDMM_SIZE_LONG);
> +}
> +
>  /*
>   * SGX2 page type modification test in two phases:
>   * Phase 1:

...

> diff --git a/tools/testing/selftests/sgx/sigstruct.c b/tools/testing/selftests/sgx/sigstruct.c
> index a07896a46364..decd767434d5 100644
> --- a/tools/testing/selftests/sgx/sigstruct.c
> +++ b/tools/testing/selftests/sgx/sigstruct.c
> @@ -349,7 +349,7 @@ bool encl_measure(struct encl *encl)
>  	if (!ctx)
>  		goto err;
>  
> -	if (!mrenclave_ecreate(ctx, encl->src_size))
> +	if (!mrenclave_ecreate(ctx, encl->encl_size))
>  		goto err;
>  
>  	for (i = 0; i < encl->nr_segments; i++) {

As I mentioned in feedback to previous version, even though
encl_size is now provided, this misses that mrenclave_ecreate()
continues to recompute it within.

Reinette



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 5/6] selftests/sgx: retry the ioctls returned with EAGAIN
  2022-08-30  3:12 ` [PATCH 5/6] selftests/sgx: retry the ioctls returned with EAGAIN Jarkko Sakkinen
@ 2022-08-30 22:56   ` Reinette Chatre
  2022-08-31  2:31     ` Jarkko Sakkinen
  0 siblings, 1 reply; 56+ messages in thread
From: Reinette Chatre @ 2022-08-30 22:56 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-sgx
  Cc: Haitao Huang, Vijay Dhanraj, Dave Hansen, Shuah Khan,
	open list:KERNEL SELFTEST FRAMEWORK, open list

Hi Haitao and Jarkko,


selftests/sgx: Retry the ioctl()s returned with EAGAIN


On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> From: Haitao Huang <haitao.huang@linux.intel.com>
> 
> For EMODT and EREMOVE ioctls with a large range, kernel

ioctl()s?

> may not finish in one shot and return EAGAIN error code
> and count of bytes of EPC pages on that operations are
> finished successfully.
> 
> Change the unclobbered_vdso_oversubscribed_remove test
> to rerun the ioctls in a loop, updating offset and length

ioctl()s?

> using the byte count returned in each iteration.
> 

This is a valuable addition, thank you very much.

> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
> Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> ---
>  tools/testing/selftests/sgx/main.c | 39 +++++++++++++++++++++++-------
>  1 file changed, 30 insertions(+), 9 deletions(-)
> 
> diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
> index 867e98502120..3268d8b01b0b 100644
> --- a/tools/testing/selftests/sgx/main.c
> +++ b/tools/testing/selftests/sgx/main.c
> @@ -399,7 +399,7 @@ TEST_F_TIMEOUT(enclave, unclobbered_vdso_oversubscribed_remove, 900)
>  	unsigned long total_mem;
>  	int ret, errno_save;
>  	unsigned long addr;
> -	unsigned long i;
> +	unsigned long i, count;

Reverse fir tree?

>  
>  	/*
>  	 * Create enclave with additional heap that is as big as all
> @@ -461,16 +461,27 @@ TEST_F_TIMEOUT(enclave, unclobbered_vdso_oversubscribed_remove, 900)
>  	modt_ioc.offset = heap->offset;
>  	modt_ioc.length = heap->size;
>  	modt_ioc.page_type = SGX_PAGE_TYPE_TRIM;
> -
> +	count = 0;
>  	TH_LOG("Changing type of %zd bytes to trimmed may take a while ...",
>  	       heap->size);
> -	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPES, &modt_ioc);
> -	errno_save = ret == -1 ? errno : 0;
> +	do {
> +		ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPES, &modt_ioc);
> +		errno_save = ret == -1 ? errno : 0;
> +		if (errno_save == EAGAIN) {
> +			count += modt_ioc.count;
> +			modt_ioc.offset += modt_ioc.count;
> +			modt_ioc.length -= modt_ioc.count;
> +			modt_ioc.result = 0;

As part of the test I think it would be helpful to know if there was a result code
in here. What do you think of failing the test if it is not zero?

> +			modt_ioc.count = 0;
> +		} else
> +			break;

Watch out for unbalanced braces (also later in patch). This causes
checkpatch.pl noise.

> +	} while (modt_ioc.length != 0);
>  
>  	EXPECT_EQ(ret, 0);
>  	EXPECT_EQ(errno_save, 0);
>  	EXPECT_EQ(modt_ioc.result, 0);
> -	EXPECT_EQ(modt_ioc.count, heap->size);
> +	count += modt_ioc.count;
> +	EXPECT_EQ(count, heap->size);
>  
>  	/* EACCEPT all removed pages. */
>  	addr = self->encl.encl_base + heap->offset;
> @@ -498,15 +509,25 @@ TEST_F_TIMEOUT(enclave, unclobbered_vdso_oversubscribed_remove, 900)
>  
>  	remove_ioc.offset = heap->offset;
>  	remove_ioc.length = heap->size;
> -
> +	count = 0;
>  	TH_LOG("Removing %zd bytes from enclave may take a while ...",
>  	       heap->size);
> -	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_REMOVE_PAGES, &remove_ioc);
> -	errno_save = ret == -1 ? errno : 0;
> +	do {
> +		ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_REMOVE_PAGES, &remove_ioc);
> +		errno_save = ret == -1 ? errno : 0;
> +		if (errno_save == EAGAIN) {
> +			count += remove_ioc.count;
> +			remove_ioc.offset += remove_ioc.count;
> +			remove_ioc.length -= remove_ioc.count;
> +			remove_ioc.count = 0;
> +		} else
> +			break;
> +	} while (remove_ioc.length != 0);
>  
>  	EXPECT_EQ(ret, 0);
>  	EXPECT_EQ(errno_save, 0);
> -	EXPECT_EQ(remove_ioc.count, heap->size);
> +	count += remove_ioc.count;
> +	EXPECT_EQ(count, heap->size);
>  }
>  
>  TEST_F(enclave, clobbered_vdso)

Reinette

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 6/6] selftests/sgx: Add a bpftrace script for tracking allocation errors
  2022-08-30  3:12 ` [PATCH 6/6] selftests/sgx: Add a bpftrace script for tracking allocation errors Jarkko Sakkinen
@ 2022-08-30 22:57   ` Reinette Chatre
  2022-08-31  2:33     ` Jarkko Sakkinen
  2022-08-31 18:23   ` Dave Hansen
  1 sibling, 1 reply; 56+ messages in thread
From: Reinette Chatre @ 2022-08-30 22:57 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-sgx
  Cc: Haitao Huang, Vijay Dhanraj, Dave Hansen, Shuah Khan, open list,
	open list:KERNEL SELFTEST FRAMEWORK

Hi Jarkko,

On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> ---
>  tools/testing/selftests/sgx/alloc-error.bt | 7 +++++++
>  1 file changed, 7 insertions(+)
>  create mode 100644 tools/testing/selftests/sgx/alloc-error.bt
> 
> diff --git a/tools/testing/selftests/sgx/alloc-error.bt b/tools/testing/selftests/sgx/alloc-error.bt
> new file mode 100644
> index 000000000000..9268d50dea29
> --- /dev/null
> +++ b/tools/testing/selftests/sgx/alloc-error.bt
> @@ -0,0 +1,7 @@
> +kr:sgx_alloc_epc_page /(uint64)retval >= (uint64)(-4095)/ {
> +		 printf("sgx_alloc_epc_page: retval=%d\n", (int64)retval);
> +}
> +
> +kr:sgx_encl_page_alloc /(uint64)retval >= (uint64)(-4095)/ {
> +		 printf("sgx_encl_page_alloc: retval=%d\n", (int64)retval);
> +}


Could there be a snippet of comments in this new file to guide users
on how to use this script?

Reinette

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 3/6] selftests/sgx: Ignore OpenSSL 3.0 deprecated functions warning
  2022-08-30 18:18   ` Reinette Chatre
@ 2022-08-31  1:07     ` Jarkko Sakkinen
  0 siblings, 0 replies; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-08-31  1:07 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen,
	Kristen Carlson Accardi, Shuah Khan,
	open list:KERNEL SELFTEST FRAMEWORK, open list

On Tue, Aug 30, 2022 at 11:18:04AM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > From: Kristen Carlson Accardi <kristen@linux.intel.com>
> > 
> > OpenSSL 3.0 deprecates some of the functions used in the SGX
> > selftests, causing build errors on new distros. For now ignore
> > the warnings until support for the functions is no longer
> > available and mark FIXME so that it can be clear this should
> > be removed at some point.
> > 
> > Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > ---
> 
> This one can be dropped from this series. The fix already made it
> into v6.0-rc3 via selftest.

ok

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-30 22:54   ` Reinette Chatre
@ 2022-08-31  1:27     ` Huang, Kai
  2022-08-31  2:15       ` jarkko
  2022-08-31  1:55     ` Jarkko Sakkinen
  1 sibling, 1 reply; 56+ messages in thread
From: Huang, Kai @ 2022-08-31  1:27 UTC (permalink / raw)
  To: linux-sgx, Chatre, Reinette, jarkko
  Cc: pmenzel, x86, bp, Dhanraj, Vijay, dave.hansen, mingo, tglx, hpa,
	haitao.huang, linux-kernel

On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > In sgx_init(), if misc_register() for the provision device fails, and
> > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > prematurely stopped.
> 
> I do not think misc_register() is required to fail for the scenario to
> be triggered (rather use "or" than "and"?). Perhaps just
> "In sgx_init(), if a failure is encountered after ksgxd is started
> (via sgx_page_reclaimer_init()) ...".

IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
be initialized successfully, sgx_init() still returns 0.

Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
initialized successfully.  Then the code change in this patch won't be necessary
if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
early stage before we are sure either the driver or KVM SGX will work.

Btw currently EPC pages assigned to KVM guest cannot be reclaimed, so
theoretically ksgxd can be moved to sgx_drv_init(), but who knows someday we
will decide to make KVM guest EPC pages to be able to be reclaimed. :)



-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-30 22:54   ` Reinette Chatre
  2022-08-31  1:27     ` Huang, Kai
@ 2022-08-31  1:55     ` Jarkko Sakkinen
  2022-08-31  1:58       ` Jarkko Sakkinen
  2022-08-31 18:08       ` Reinette Chatre
  1 sibling, 2 replies; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-08-31  1:55 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Paul Menzel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Tue, Aug 30, 2022 at 03:54:27PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > In sgx_init(), if misc_register() for the provision device fails, and
> > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > prematurely stopped.
> 
> I do not think misc_register() is required to fail for the scenario to
> be triggered (rather use "or" than "and"?). Perhaps just
> "In sgx_init(), if a failure is encountered after ksgxd is started
> (via sgx_page_reclaimer_init()) ...".

This would be the fixed version of the sentence:

"
In sgx_init(), if misc_register() fails or misc_register() succeeds but
neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
prematurely stopped. This may leave some unsanitized pages, which does
not matter, because SGX will be disabled for the whole power cycle.
"

I want to keep the end states listed and not make it more abstract.

The second sentence addresses the remark below.

> To help the reader understand the subject of this patch it may help
> to explain that prematurely stopping ksgxd may leave some
> unsanitized pages, but that is not a problem since SGX cannot
> be used on the platform anyway. 
> 
> > This triggers WARN_ON() because sgx_dirty_page_list ends up being
> > non-empty, and dumps the call stack:
> > 
> 
> Traces like below can be frowned upon. I recommend that you follow the
> guidance in "Backtraces in commit mesages"(sic) in 
> Documentation/process/submitting-patches.rst.
> 
> > [    0.268592] WARNING: CPU: 6 PID: 83 at
> > arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0

Is this good enough? I had not actually spotted this section before but
nice that it exists. Apparently has been added in 5.12.

>> > 
> > Ultimately this can crash the kernel, if the following is set:
> > 
> > 	/proc/sys/kernel/panic_on_warn
> > 
> > Print a simple warning instead, and improve the output by printing the
> > number of unsanitized pages, in order to provide debug informnation for
> > future needs.
> 
> informnation -> information

+1

> 
>  
> ...
> 
> > Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u
> > Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> 
> Should this go to stable?

I guess it should. The hard reason for this that it can panic
the kernel.

> 
> > 
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > index 515e2a5f25bb..903100fcfce3 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -49,17 +49,20 @@ static LIST_HEAD(sgx_dirty_page_list);
> >   * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
> >   * from the input list, and made available for the page allocator. SECS pages
> >   * prepending their children in the input list are left intact.
> > + *
> > + * Contents of the @dirty_page_list must be thread-local, i.e.
> > + * not shared by multiple threads.
> 
> Did you intend to mention something about the needed locking here? It looks
> like some information is lost during the move to the function description.

Nothing about the locking that concerns the parameter, as the
sentence defines clear constraints for the caller.

> 
> >   */
> > -static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
> > +static int __sgx_sanitize_pages(struct list_head *dirty_page_list)
> >  {
> >  	struct sgx_epc_page *page;
> > +	int left_dirty = 0;
> 
> I do not know how many pages this code should be ready for but at least
> this could handle more by being an unsigned int considering that it is
> always positive ... maybe even unsigned long?

I would go for 'long'. More information below.

> 
> >  	LIST_HEAD(dirty);
> >  	int ret;
> >  
> > -	/* dirty_page_list is thread-local, no need for a lock: */
> >  	while (!list_empty(dirty_page_list)) {
> >  		if (kthread_should_stop())
> > -			return;
> > +			break;
> >  
> >  		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
> >  
> > @@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
> >  		} else {
> >  			/* The page is not yet clean - move to the dirty list. */
> >  			list_move_tail(&page->list, &dirty);
> > +			left_dirty++;
> >  		}
> >  
> >  		cond_resched();
> >  	}
> >  
> >  	list_splice(&dirty, dirty_page_list);
> > +	return left_dirty;
> >  }
> >  
> >  static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
> > @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
> >  
> >  static int ksgxd(void *p)
> >  {
> > +	int left_dirty;
> > +
> >  	set_freezable();
> >  
> >  	/*
> > @@ -395,10 +402,10 @@ static int ksgxd(void *p)
> >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> >  	 */
> >  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> >  
> > -	/* sanity check: */
> > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	if (left_dirty)
> > +		pr_warn("%d unsanitized pages\n", left_dirty);
> >  
> >  	while (!kthread_should_stop()) {
> >  		if (try_to_freeze())
> 
> 
> Reinette

We need to return -ECANCELED on premature stop, and number of
pages otherwise.

In premature stop, nothing should be printed, as the number
is by practical means a random number. Otherwise, it is an
indicator of a bug in the driver, and therefore a non-zero
number should be printed pr_err(), if that happens after the
second call.

Thanks for feedback.

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31  1:55     ` Jarkko Sakkinen
@ 2022-08-31  1:58       ` Jarkko Sakkinen
  2022-08-31  2:01         ` Jarkko Sakkinen
  2022-08-31 18:08       ` Reinette Chatre
  1 sibling, 1 reply; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-08-31  1:58 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Paul Menzel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Wed, Aug 31, 2022 at 04:55:24AM +0300, Jarkko Sakkinen wrote:
> On Tue, Aug 30, 2022 at 03:54:27PM -0700, Reinette Chatre wrote:
> > Hi Jarkko,
> > 
> > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > In sgx_init(), if misc_register() for the provision device fails, and
> > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > prematurely stopped.
> > 
> > I do not think misc_register() is required to fail for the scenario to
> > be triggered (rather use "or" than "and"?). Perhaps just
> > "In sgx_init(), if a failure is encountered after ksgxd is started
> > (via sgx_page_reclaimer_init()) ...".
> 
> This would be the fixed version of the sentence:
> 
> "
> In sgx_init(), if misc_register() fails or misc_register() succeeds but
> neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> prematurely stopped. This may leave some unsanitized pages, which does
> not matter, because SGX will be disabled for the whole power cycle.
> "
> 
> I want to keep the end states listed and not make it more abstract.
> 
> The second sentence addresses the remark below.
> 
> > To help the reader understand the subject of this patch it may help
> > to explain that prematurely stopping ksgxd may leave some
> > unsanitized pages, but that is not a problem since SGX cannot
> > be used on the platform anyway. 
> > 
> > > This triggers WARN_ON() because sgx_dirty_page_list ends up being
> > > non-empty, and dumps the call stack:
> > > 
> > 
> > Traces like below can be frowned upon. I recommend that you follow the
> > guidance in "Backtraces in commit mesages"(sic) in 
> > Documentation/process/submitting-patches.rst.
> > 
> > > [    0.268592] WARNING: CPU: 6 PID: 83 at
> > > arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
> 
> Is this good enough? I had not actually spotted this section before but
> nice that it exists. Apparently has been added in 5.12.
> 
> >> > 
> > > Ultimately this can crash the kernel, if the following is set:
> > > 
> > > 	/proc/sys/kernel/panic_on_warn
> > > 
> > > Print a simple warning instead, and improve the output by printing the
> > > number of unsanitized pages, in order to provide debug informnation for
> > > future needs.
> > 
> > informnation -> information
> 
> +1
> 
> > 
> >  
> > ...
> > 
> > > Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u
> > > Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > > Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > > Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
> > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > 
> > Should this go to stable?
> 
> I guess it should. The hard reason for this that it can panic
> the kernel.
> 
> > 
> > > 
> > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > > index 515e2a5f25bb..903100fcfce3 100644
> > > --- a/arch/x86/kernel/cpu/sgx/main.c
> > > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > > @@ -49,17 +49,20 @@ static LIST_HEAD(sgx_dirty_page_list);
> > >   * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
> > >   * from the input list, and made available for the page allocator. SECS pages
> > >   * prepending their children in the input list are left intact.
> > > + *
> > > + * Contents of the @dirty_page_list must be thread-local, i.e.
> > > + * not shared by multiple threads.
> > 
> > Did you intend to mention something about the needed locking here? It looks
> > like some information is lost during the move to the function description.
> 
> Nothing about the locking that concerns the parameter, as the
> sentence defines clear constraints for the caller.
> 
> > 
> > >   */
> > > -static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
> > > +static int __sgx_sanitize_pages(struct list_head *dirty_page_list)
> > >  {
> > >  	struct sgx_epc_page *page;
> > > +	int left_dirty = 0;
> > 
> > I do not know how many pages this code should be ready for but at least
> > this could handle more by being an unsigned int considering that it is
> > always positive ... maybe even unsigned long?
> 
> I would go for 'long'. More information below.
> 
> > 
> > >  	LIST_HEAD(dirty);
> > >  	int ret;
> > >  
> > > -	/* dirty_page_list is thread-local, no need for a lock: */
> > >  	while (!list_empty(dirty_page_list)) {
> > >  		if (kthread_should_stop())
> > > -			return;
> > > +			break;
> > >  
> > >  		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
> > >  
> > > @@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
> > >  		} else {
> > >  			/* The page is not yet clean - move to the dirty list. */
> > >  			list_move_tail(&page->list, &dirty);
> > > +			left_dirty++;
> > >  		}
> > >  
> > >  		cond_resched();
> > >  	}
> > >  
> > >  	list_splice(&dirty, dirty_page_list);
> > > +	return left_dirty;
> > >  }
> > >  
> > >  static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
> > > @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
> > >  
> > >  static int ksgxd(void *p)
> > >  {
> > > +	int left_dirty;
> > > +
> > >  	set_freezable();
> > >  
> > >  	/*
> > > @@ -395,10 +402,10 @@ static int ksgxd(void *p)
> > >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> > >  	 */
> > >  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > >  
> > > -	/* sanity check: */
> > > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > > +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > +	if (left_dirty)
> > > +		pr_warn("%d unsanitized pages\n", left_dirty);
> > >  
> > >  	while (!kthread_should_stop()) {
> > >  		if (try_to_freeze())
> > 
> > 
> > Reinette
> 
> We need to return -ECANCELED on premature stop, and number of
> pages otherwise.
> 
> In premature stop, nothing should be printed, as the number
> is by practical means a random number. Otherwise, it is an
> indicator of a bug in the driver, and therefore a non-zero
> number should be printed pr_err(), if that happens after the
> second call.

I.e. even though we print less we get more *information* what
is going inside the kernel. Warning is not correct for either
path IMHO.

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31  1:58       ` Jarkko Sakkinen
@ 2022-08-31  2:01         ` Jarkko Sakkinen
  0 siblings, 0 replies; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-08-31  2:01 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Paul Menzel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Wed, Aug 31, 2022 at 04:58:58AM +0300, Jarkko Sakkinen wrote:
> On Wed, Aug 31, 2022 at 04:55:24AM +0300, Jarkko Sakkinen wrote:
> > On Tue, Aug 30, 2022 at 03:54:27PM -0700, Reinette Chatre wrote:
> > > Hi Jarkko,
> > > 
> > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > prematurely stopped.
> > > 
> > > I do not think misc_register() is required to fail for the scenario to
> > > be triggered (rather use "or" than "and"?). Perhaps just
> > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > (via sgx_page_reclaimer_init()) ...".
> > 
> > This would be the fixed version of the sentence:
> > 
> > "
> > In sgx_init(), if misc_register() fails or misc_register() succeeds but
> > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > prematurely stopped. This may leave some unsanitized pages, which does
> > not matter, because SGX will be disabled for the whole power cycle.
> > "
> > 
> > I want to keep the end states listed and not make it more abstract.
> > 
> > The second sentence addresses the remark below.
> > 
> > > To help the reader understand the subject of this patch it may help
> > > to explain that prematurely stopping ksgxd may leave some
> > > unsanitized pages, but that is not a problem since SGX cannot
> > > be used on the platform anyway. 
> > > 
> > > > This triggers WARN_ON() because sgx_dirty_page_list ends up being
> > > > non-empty, and dumps the call stack:
> > > > 
> > > 
> > > Traces like below can be frowned upon. I recommend that you follow the
> > > guidance in "Backtraces in commit mesages"(sic) in 
> > > Documentation/process/submitting-patches.rst.
> > > 
> > > > [    0.268592] WARNING: CPU: 6 PID: 83 at
> > > > arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
> > 
> > Is this good enough? I had not actually spotted this section before but
> > nice that it exists. Apparently has been added in 5.12.
> > 
> > >> > 
> > > > Ultimately this can crash the kernel, if the following is set:
> > > > 
> > > > 	/proc/sys/kernel/panic_on_warn
> > > > 
> > > > Print a simple warning instead, and improve the output by printing the
> > > > number of unsanitized pages, in order to provide debug informnation for
> > > > future needs.
> > > 
> > > informnation -> information
> > 
> > +1
> > 
> > > 
> > >  
> > > ...
> > > 
> > > > Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u
> > > > Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > > > Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > > > Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
> > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > 
> > > Should this go to stable?
> > 
> > I guess it should. The hard reason for this that it can panic
> > the kernel.
> > 
> > > 
> > > > 
> > > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > > > index 515e2a5f25bb..903100fcfce3 100644
> > > > --- a/arch/x86/kernel/cpu/sgx/main.c
> > > > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > > > @@ -49,17 +49,20 @@ static LIST_HEAD(sgx_dirty_page_list);
> > > >   * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
> > > >   * from the input list, and made available for the page allocator. SECS pages
> > > >   * prepending their children in the input list are left intact.
> > > > + *
> > > > + * Contents of the @dirty_page_list must be thread-local, i.e.
> > > > + * not shared by multiple threads.
> > > 
> > > Did you intend to mention something about the needed locking here? It looks
> > > like some information is lost during the move to the function description.
> > 
> > Nothing about the locking that concerns the parameter, as the
> > sentence defines clear constraints for the caller.
> > 
> > > 
> > > >   */
> > > > -static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
> > > > +static int __sgx_sanitize_pages(struct list_head *dirty_page_list)
> > > >  {
> > > >  	struct sgx_epc_page *page;
> > > > +	int left_dirty = 0;
> > > 
> > > I do not know how many pages this code should be ready for but at least
> > > this could handle more by being an unsigned int considering that it is
> > > always positive ... maybe even unsigned long?
> > 
> > I would go for 'long'. More information below.
> > 
> > > 
> > > >  	LIST_HEAD(dirty);
> > > >  	int ret;
> > > >  
> > > > -	/* dirty_page_list is thread-local, no need for a lock: */
> > > >  	while (!list_empty(dirty_page_list)) {
> > > >  		if (kthread_should_stop())
> > > > -			return;
> > > > +			break;
> > > >  
> > > >  		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
> > > >  
> > > > @@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
> > > >  		} else {
> > > >  			/* The page is not yet clean - move to the dirty list. */
> > > >  			list_move_tail(&page->list, &dirty);
> > > > +			left_dirty++;
> > > >  		}
> > > >  
> > > >  		cond_resched();
> > > >  	}
> > > >  
> > > >  	list_splice(&dirty, dirty_page_list);
> > > > +	return left_dirty;
> > > >  }
> > > >  
> > > >  static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
> > > > @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
> > > >  
> > > >  static int ksgxd(void *p)
> > > >  {
> > > > +	int left_dirty;
> > > > +
> > > >  	set_freezable();
> > > >  
> > > >  	/*
> > > > @@ -395,10 +402,10 @@ static int ksgxd(void *p)
> > > >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> > > >  	 */
> > > >  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > > > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > > >  
> > > > -	/* sanity check: */
> > > > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > > > +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > > +	if (left_dirty)
> > > > +		pr_warn("%d unsanitized pages\n", left_dirty);
> > > >  
> > > >  	while (!kthread_should_stop()) {
> > > >  		if (try_to_freeze())
> > > 
> > > 
> > > Reinette
> > 
> > We need to return -ECANCELED on premature stop, and number of
> > pages otherwise.
> > 
> > In premature stop, nothing should be printed, as the number
> > is by practical means a random number. Otherwise, it is an
> > indicator of a bug in the driver, and therefore a non-zero
> > number should be printed pr_err(), if that happens after the
> > second call.
> 
> I.e. even though we print less we get more *information* what
> is going inside the kernel. Warning is not correct for either
> path IMHO.

Oh, sorry, I forgot one thing. The devices should be actually
deinitialized in the error case. We do not want to leave a
broken driver running.

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31  1:27     ` Huang, Kai
@ 2022-08-31  2:15       ` jarkko
  2022-08-31  2:35         ` Huang, Kai
  0 siblings, 1 reply; 56+ messages in thread
From: jarkko @ 2022-08-31  2:15 UTC (permalink / raw)
  To: Huang, Kai
  Cc: linux-sgx, Chatre, Reinette, pmenzel, x86, bp, Dhanraj, Vijay,
	dave.hansen, mingo, tglx, hpa, haitao.huang, linux-kernel

On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > Hi Jarkko,
> > 
> > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > In sgx_init(), if misc_register() for the provision device fails, and
> > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > prematurely stopped.
> > 
> > I do not think misc_register() is required to fail for the scenario to
> > be triggered (rather use "or" than "and"?). Perhaps just
> > "In sgx_init(), if a failure is encountered after ksgxd is started
> > (via sgx_page_reclaimer_init()) ...".
> 
> IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
> won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
> be initialized successfully, sgx_init() still returns 0.
> 
> Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> initialized successfully.  Then the code change in this patch won't be necessary
> if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> early stage before we are sure either the driver or KVM SGX will work.

I would focus fixing the existing flow rather than reinventing the flow.

It can be made to work, and therefore it is IMHO correct action to take.

> Btw currently EPC pages assigned to KVM guest cannot be reclaimed, so
> theoretically ksgxd can be moved to sgx_drv_init(), but who knows someday we
> will decide to make KVM guest EPC pages to be able to be reclaimed. :)

I'm open to changes but it is in my opinion out of context for this.

> 
> 
> 
> -- 
> Thanks,
> -Kai
> 
> 

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/6] selftests/sgx: Add SGX selftest augment_via_eaccept_long
  2022-08-30 22:55   ` Reinette Chatre
@ 2022-08-31  2:28     ` Jarkko Sakkinen
  2022-08-31 18:09       ` Reinette Chatre
  0 siblings, 1 reply; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-08-31  2:28 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Shuah Khan,
	open list:KERNEL SELFTEST FRAMEWORK, open list

On Tue, Aug 30, 2022 at 03:55:47PM -0700, Reinette Chatre wrote:
> Hi Vijay and Jarkko,
> 
> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > From: Vijay Dhanraj <vijay.dhanraj@intel.com>
> > 
> > Add a new test case which is same as augment_via_eaccept but adds a
> > larger number of EPC pages to stress test EAUG via EACCEPT.
> > 
> > Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
> > Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org>
> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > ---
> > v2:
> > - Addressed Reinette's feedback:
> >   https://lore.kernel.org/linux-sgx/24bd8e42-ff4e-0090-d9e1-cd81e4807f21@intel.com/
> > ---
> >  tools/testing/selftests/sgx/load.c      |   5 +-
> >  tools/testing/selftests/sgx/main.c      | 141 +++++++++++++++++++++---
> >  tools/testing/selftests/sgx/main.h      |   3 +-
> >  tools/testing/selftests/sgx/sigstruct.c |   2 +-
> >  4 files changed, 129 insertions(+), 22 deletions(-)
> 
> There seems to be at least three patches merged into one here:
> 1) Update SGX selftests to create enclaves with provided size dedicated
>    to EDMM (this change causes a lot of noise and distracts from the test
>    addition).
> 2) The mrenclave_ecreate() fix (which is still incomplete).
> 3) The actual test addition.

I would agree on this on a kernel patch but not for kselftest patch. It
does not really give useful value here. This adds a test and that is a
good enough granularity in my opinion, unless some major architecture
work is required as precursory. It is not the case here.

> > diff --git a/tools/testing/selftests/sgx/load.c b/tools/testing/selftests/sgx/load.c
> > index 94bdeac1cf04..7de1b15c90b1 100644
> > --- a/tools/testing/selftests/sgx/load.c
> > +++ b/tools/testing/selftests/sgx/load.c
> > @@ -171,7 +171,8 @@ uint64_t encl_get_entry(struct encl *encl, const char *symbol)
> >  	return 0;
> >  }
> >  
> > -bool encl_load(const char *path, struct encl *encl, unsigned long heap_size)
> > +bool encl_load(const char *path, struct encl *encl, unsigned long heap_size,
> > +			   unsigned long edmm_size)
> >  {
> 
> checkpatch.pl informs about alignment issues above and also a few other places.

Weird. I did run checkpatch through these. Will revisit.

> 
> >  	const char device_path[] = "/dev/sgx_enclave";
> >  	struct encl_segment *seg;
> > @@ -300,7 +301,7 @@ bool encl_load(const char *path, struct encl *encl, unsigned long heap_size)
> >  
> >  	encl->src_size = encl->segment_tbl[j].offset + encl->segment_tbl[j].size;
> >  
> > -	for (encl->encl_size = 4096; encl->encl_size < encl->src_size; )
> > +	for (encl->encl_size = 4096; encl->encl_size < encl->src_size + edmm_size;)
> >  		encl->encl_size <<= 1;
> >  
> >  	return true;
> > diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
> > index 9820b3809c69..867e98502120 100644
> > --- a/tools/testing/selftests/sgx/main.c
> > +++ b/tools/testing/selftests/sgx/main.c
> > @@ -21,8 +21,15 @@
> >  #include "../kselftest_harness.h"
> >  #include "main.h"
> >  
> > +/*
> > + * The size was chosen based on a bug report:
> > + * https://lore.kernel.org/linux-sgx/DM8PR11MB55912A7F47A84EC9913A6352F6999@DM8PR11MB5591.namprd11.prod.outlook.com/
> > + */
> > +static const unsigned long EDMM_SIZE_LONG = 8L * 1024L * 1024L * 1024L; // 8 GB
> > +static const unsigned long TIMEOUT_LONG = 900; /* seconds */
> 
> It is interesting that in this small snippet there are three different
> comment styles (multi-line comment preceding code, tail comment using /**/,
> and tail comment using //) :)

Oops. Will fix.

> 
> >  static const uint64_t MAGIC = 0x1122334455667788ULL;
> >  static const uint64_t MAGIC2 = 0x8877665544332211ULL;
> > +
> 
> Addition of empty line here.

Will remove.

> 
> >  vdso_sgx_enter_enclave_t vdso_sgx_enter_enclave;
> >  
> >  /*
> > @@ -173,7 +180,8 @@ FIXTURE(enclave) {
> >  };
> >  
> >  static bool setup_test_encl(unsigned long heap_size, struct encl *encl,
> > -			    struct __test_metadata *_metadata)
> > +			    struct __test_metadata *_metadata,
> > +			    unsigned long edmm_size)
> >  {
> >  	Elf64_Sym *sgx_enter_enclave_sym = NULL;
> >  	struct vdso_symtab symtab;
> > @@ -183,7 +191,7 @@ static bool setup_test_encl(unsigned long heap_size, struct encl *encl,
> >  	unsigned int i;
> >  	void *addr;
> >  
> > -	if (!encl_load("test_encl.elf", encl, heap_size)) {
> > +	if (!encl_load("test_encl.elf", encl, heap_size, edmm_size)) {
> >  		encl_delete(encl);
> >  		TH_LOG("Failed to load the test enclave.");
> >  		return false;
> > @@ -284,7 +292,7 @@ TEST_F(enclave, unclobbered_vdso)
> >  	struct encl_op_get_from_buf get_op;
> >  	struct encl_op_put_to_buf put_op;
> >  
> > -	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
> > +	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata, 0));
> >  
> >  	memset(&self->run, 0, sizeof(self->run));
> >  	self->run.tcs = self->encl.encl_base;
> > @@ -357,7 +365,7 @@ TEST_F(enclave, unclobbered_vdso_oversubscribed)
> >  
> >  	total_mem = get_total_epc_mem();
> >  	ASSERT_NE(total_mem, 0);
> > -	ASSERT_TRUE(setup_test_encl(total_mem, &self->encl, _metadata));
> > +	ASSERT_TRUE(setup_test_encl(total_mem, &self->encl, _metadata, 0));
> >  
> 
> The support for EDMM size and all the call site changes could be moved to a separate
> patch to make the changes easier to parse.
> 
> 
> ...
> 
> > @@ -1210,6 +1218,103 @@ TEST_F(enclave, augment_via_eaccept)
> >  	munmap(addr, PAGE_SIZE);
> >  }
> >  
> > +/*
> > + * Test for the addition of large number of pages to an initialized enclave via
> > + * a pre-emptive run of EACCEPT on every page to be added.
> > + */
> > +TEST_F_TIMEOUT(enclave, augment_via_eaccept_long, TIMEOUT_LONG)
> > +{
> > +	struct encl_op_get_from_addr get_addr_op;
> > +	struct encl_op_put_to_addr put_addr_op;
> > +	struct encl_op_eaccept eaccept_op;
> > +	size_t total_size = 0;
> > +	unsigned long i;
> > +	void *addr;
> > +
> > +	if (!sgx2_supported())
> > +		SKIP(return, "SGX2 not supported");
> > +
> > +	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata,
> > +				    EDMM_SIZE_LONG));
> > +
> > +	memset(&self->run, 0, sizeof(self->run));
> > +	self->run.tcs = self->encl.encl_base;
> > +
> > +	for (i = 0; i < self->encl.nr_segments; i++) {
> > +		struct encl_segment *seg = &self->encl.segment_tbl[i];
> > +
> > +		total_size += seg->size;
> > +	}
> > +
> > +	/*
> > +	 * mmap() every page at end of existing enclave to be used for
> > +	 * EDMM.
> > +	 */
> > +	addr = mmap((void *)self->encl.encl_base + total_size, EDMM_SIZE_LONG,
> > +			PROT_READ | PROT_WRITE | PROT_EXEC, MAP_SHARED | MAP_FIXED,
> > +			self->encl.fd, 0);
> > +	EXPECT_NE(addr, MAP_FAILED);
> > +
> > +	self->run.exception_vector = 0;
> > +	self->run.exception_error_code = 0;
> > +	self->run.exception_addr = 0;
> > +
> > +	/*
> > +	 * Run EACCEPT on new page to trigger the #PF->EAUG->EACCEPT(again
> > +	 * without a #PF). All should be transparent to userspace.
> > +	 */
> 
> s/on new page/on every page/ ?

+1

> 
> > +	eaccept_op.flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_REG | SGX_SECINFO_PENDING;
> > +	eaccept_op.ret = 0;
> > +	eaccept_op.header.type = ENCL_OP_EACCEPT;
> > +
> > +	for (i = 0; i < EDMM_SIZE_LONG; i += 4096) {
> > +		eaccept_op.epc_addr = (uint64_t)(addr + i);
> > +
> > +		EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
> > +		if (self->run.exception_vector == 14 &&
> > +			self->run.exception_error_code == 4 &&
> > +			self->run.exception_addr == self->encl.encl_base) {
> > +			munmap(addr, EDMM_SIZE_LONG);
> > +			SKIP(return, "Kernel does not support adding pages to initialized enclave");
> > +		}
> > +
> > +		EXPECT_EQ(self->run.exception_vector, 0);
> > +		EXPECT_EQ(self->run.exception_error_code, 0);
> > +		EXPECT_EQ(self->run.exception_addr, 0);
> > +		ASSERT_EQ(eaccept_op.ret, 0);
> > +		ASSERT_EQ(self->run.function, EEXIT);
> > +	}
> > +
> 
> I see that you removed all comments here after the discussion about what
> comment would be appropriate. This may be ok but it does seem awkward that
> there are two parts to this section and the second part still has a comment
> while the first does not. How about the comment I provided in
> https://lore.kernel.org/linux-sgx/69a5e350-ded9-30c9-dc41-d08c01dd05dc@intel.com/ :
> 
> /*
>  * Pool of pages were successfully added to enclave. Perform sanity
>  * check on first page of the pool only to ensure data can be written
>  * to and read from a dynamically added enclave page.
>  */

+1

> 
> > +	put_addr_op.value = MAGIC;
> > +	put_addr_op.addr = (unsigned long)addr;
> > +	put_addr_op.header.type = ENCL_OP_PUT_TO_ADDRESS;
> > +
> > +	EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
> > +
> > +	EXPECT_EEXIT(&self->run);
> > +	EXPECT_EQ(self->run.exception_vector, 0);
> > +	EXPECT_EQ(self->run.exception_error_code, 0);
> > +	EXPECT_EQ(self->run.exception_addr, 0);
> > +
> > +	/*
> > +	 * Read memory from newly added page that was just written to,
> > +	 * confirming that data previously written (MAGIC) is present.
> > +	 */
> > +	get_addr_op.value = 0;
> > +	get_addr_op.addr = (unsigned long)addr;
> > +	get_addr_op.header.type = ENCL_OP_GET_FROM_ADDRESS;
> > +
> > +	EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
> > +
> > +	EXPECT_EQ(get_addr_op.value, MAGIC);
> > +	EXPECT_EEXIT(&self->run);
> > +	EXPECT_EQ(self->run.exception_vector, 0);
> > +	EXPECT_EQ(self->run.exception_error_code, 0);
> > +	EXPECT_EQ(self->run.exception_addr, 0);
> > +
> > +	munmap(addr, EDMM_SIZE_LONG);
> > +}
> > +
> >  /*
> >   * SGX2 page type modification test in two phases:
> >   * Phase 1:
> 
> ...
> 
> > diff --git a/tools/testing/selftests/sgx/sigstruct.c b/tools/testing/selftests/sgx/sigstruct.c
> > index a07896a46364..decd767434d5 100644
> > --- a/tools/testing/selftests/sgx/sigstruct.c
> > +++ b/tools/testing/selftests/sgx/sigstruct.c
> > @@ -349,7 +349,7 @@ bool encl_measure(struct encl *encl)
> >  	if (!ctx)
> >  		goto err;
> >  
> > -	if (!mrenclave_ecreate(ctx, encl->src_size))
> > +	if (!mrenclave_ecreate(ctx, encl->encl_size))
> >  		goto err;
> >  
> >  	for (i = 0; i < encl->nr_segments; i++) {
> 
> As I mentioned in feedback to previous version, even though
> encl_size is now provided, this misses that mrenclave_ecreate()
> continues to recompute it within.
> 
> Reinette
> 

I simply forgot this one, sorry. Will fix.

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 5/6] selftests/sgx: retry the ioctls returned with EAGAIN
  2022-08-30 22:56   ` Reinette Chatre
@ 2022-08-31  2:31     ` Jarkko Sakkinen
  2022-08-31 18:09       ` Reinette Chatre
  2022-08-31 18:14       ` Dave Hansen
  0 siblings, 2 replies; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-08-31  2:31 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Shuah Khan,
	open list:KERNEL SELFTEST FRAMEWORK, open list

On Tue, Aug 30, 2022 at 03:56:29PM -0700, Reinette Chatre wrote:
> Hi Haitao and Jarkko,
> 
> 
> selftests/sgx: Retry the ioctl()s returned with EAGAIN
> 
> 
> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > From: Haitao Huang <haitao.huang@linux.intel.com>
> > 
> > For EMODT and EREMOVE ioctls with a large range, kernel
> 
> ioctl()s?

Ioctl is common enough to be considered as noun and is
widely phrased like that in commit messages. I don't
see any added clarity.

> 
> > may not finish in one shot and return EAGAIN error code
> > and count of bytes of EPC pages on that operations are
> > finished successfully.
> > 
> > Change the unclobbered_vdso_oversubscribed_remove test
> > to rerun the ioctls in a loop, updating offset and length
> 
> ioctl()s?
> 
> > using the byte count returned in each iteration.
> > 
> 
> This is a valuable addition, thank you very much.
> 
> > Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
> > Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > ---
> >  tools/testing/selftests/sgx/main.c | 39 +++++++++++++++++++++++-------
> >  1 file changed, 30 insertions(+), 9 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
> > index 867e98502120..3268d8b01b0b 100644
> > --- a/tools/testing/selftests/sgx/main.c
> > +++ b/tools/testing/selftests/sgx/main.c
> > @@ -399,7 +399,7 @@ TEST_F_TIMEOUT(enclave, unclobbered_vdso_oversubscribed_remove, 900)
> >  	unsigned long total_mem;
> >  	int ret, errno_save;
> >  	unsigned long addr;
> > -	unsigned long i;
> > +	unsigned long i, count;
> 
> Reverse fir tree?

+1

> 
> >  
> >  	/*
> >  	 * Create enclave with additional heap that is as big as all
> > @@ -461,16 +461,27 @@ TEST_F_TIMEOUT(enclave, unclobbered_vdso_oversubscribed_remove, 900)
> >  	modt_ioc.offset = heap->offset;
> >  	modt_ioc.length = heap->size;
> >  	modt_ioc.page_type = SGX_PAGE_TYPE_TRIM;
> > -
> > +	count = 0;
> >  	TH_LOG("Changing type of %zd bytes to trimmed may take a while ...",
> >  	       heap->size);
> > -	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPES, &modt_ioc);
> > -	errno_save = ret == -1 ? errno : 0;
> > +	do {
> > +		ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPES, &modt_ioc);
> > +		errno_save = ret == -1 ? errno : 0;
> > +		if (errno_save == EAGAIN) {
> > +			count += modt_ioc.count;
> > +			modt_ioc.offset += modt_ioc.count;
> > +			modt_ioc.length -= modt_ioc.count;
> > +			modt_ioc.result = 0;
> 
> As part of the test I think it would be helpful to know if there was a result code
> in here. What do you think of failing the test if it is not zero?

I would not mind doing that.

> 
> > +			modt_ioc.count = 0;
> > +		} else
> > +			break;
> 
> Watch out for unbalanced braces (also later in patch). This causes
> checkpatch.pl noise.

Again. I did run checkpatch to all of these. Will revisit.

> 
> > +	} while (modt_ioc.length != 0);
> >  
> >  	EXPECT_EQ(ret, 0);
> >  	EXPECT_EQ(errno_save, 0);
> >  	EXPECT_EQ(modt_ioc.result, 0);
> > -	EXPECT_EQ(modt_ioc.count, heap->size);
> > +	count += modt_ioc.count;
> > +	EXPECT_EQ(count, heap->size);
> >  
> >  	/* EACCEPT all removed pages. */
> >  	addr = self->encl.encl_base + heap->offset;
> > @@ -498,15 +509,25 @@ TEST_F_TIMEOUT(enclave, unclobbered_vdso_oversubscribed_remove, 900)
> >  
> >  	remove_ioc.offset = heap->offset;
> >  	remove_ioc.length = heap->size;
> > -
> > +	count = 0;
> >  	TH_LOG("Removing %zd bytes from enclave may take a while ...",
> >  	       heap->size);
> > -	ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_REMOVE_PAGES, &remove_ioc);
> > -	errno_save = ret == -1 ? errno : 0;
> > +	do {
> > +		ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_REMOVE_PAGES, &remove_ioc);
> > +		errno_save = ret == -1 ? errno : 0;
> > +		if (errno_save == EAGAIN) {
> > +			count += remove_ioc.count;
> > +			remove_ioc.offset += remove_ioc.count;
> > +			remove_ioc.length -= remove_ioc.count;
> > +			remove_ioc.count = 0;
> > +		} else
> > +			break;
> > +	} while (remove_ioc.length != 0);
> >  
> >  	EXPECT_EQ(ret, 0);
> >  	EXPECT_EQ(errno_save, 0);
> > -	EXPECT_EQ(remove_ioc.count, heap->size);
> > +	count += remove_ioc.count;
> > +	EXPECT_EQ(count, heap->size);
> >  }
> >  
> >  TEST_F(enclave, clobbered_vdso)
> 
> Reinette


BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 6/6] selftests/sgx: Add a bpftrace script for tracking allocation errors
  2022-08-30 22:57   ` Reinette Chatre
@ 2022-08-31  2:33     ` Jarkko Sakkinen
  2022-08-31 18:10       ` Reinette Chatre
  0 siblings, 1 reply; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-08-31  2:33 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Shuah Khan,
	open list, open list:KERNEL SELFTEST FRAMEWORK

On Tue, Aug 30, 2022 at 03:57:24PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > ---
> >  tools/testing/selftests/sgx/alloc-error.bt | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >  create mode 100644 tools/testing/selftests/sgx/alloc-error.bt
> > 
> > diff --git a/tools/testing/selftests/sgx/alloc-error.bt b/tools/testing/selftests/sgx/alloc-error.bt
> > new file mode 100644
> > index 000000000000..9268d50dea29
> > --- /dev/null
> > +++ b/tools/testing/selftests/sgx/alloc-error.bt
> > @@ -0,0 +1,7 @@
> > +kr:sgx_alloc_epc_page /(uint64)retval >= (uint64)(-4095)/ {
> > +		 printf("sgx_alloc_epc_page: retval=%d\n", (int64)retval);
> > +}
> > +
> > +kr:sgx_encl_page_alloc /(uint64)retval >= (uint64)(-4095)/ {
> > +		 printf("sgx_encl_page_alloc: retval=%d\n", (int64)retval);
> > +}
> 
> 
> Could there be a snippet of comments in this new file to guide users
> on how to use this script?

Do not mean to be rude but I'm not sure what there is to guide but
I'm open for ideas.

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31  2:15       ` jarkko
@ 2022-08-31  2:35         ` Huang, Kai
  2022-08-31  2:44           ` jarkko
  0 siblings, 1 reply; 56+ messages in thread
From: Huang, Kai @ 2022-08-31  2:35 UTC (permalink / raw)
  To: jarkko
  Cc: pmenzel, linux-sgx, x86, dave.hansen, Dhanraj, Vijay, Chatre,
	Reinette, mingo, tglx, bp, haitao.huang, hpa, linux-kernel

On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
> On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > Hi Jarkko,
> > > 
> > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > prematurely stopped.
> > > 
> > > I do not think misc_register() is required to fail for the scenario to
> > > be triggered (rather use "or" than "and"?). Perhaps just
> > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > (via sgx_page_reclaimer_init()) ...".
> > 
> > IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
> > won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
> > be initialized successfully, sgx_init() still returns 0.
> > 
> > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > initialized successfully.  Then the code change in this patch won't be necessary
> > if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> > early stage before we are sure either the driver or KVM SGX will work.
> 
> I would focus fixing the existing flow rather than reinventing the flow.
> 
> It can be made to work, and therefore it is IMHO correct action to take.

From another perspective, the *existing flow* is the reason which causes this
bug.  A real fix is to fix the flow itself.

> 
> > Btw currently EPC pages assigned to KVM guest cannot be reclaimed, so
> > theoretically ksgxd can be moved to sgx_drv_init(), but who knows someday we
> > will decide to make KVM guest EPC pages to be able to be reclaimed. :)
> 
> I'm open to changes but it is in my opinion out of context for this.
> 
> 

Yeah.  I was expressing the reason I suggested to move sgx_page_reclaimer_init()
to the end of sgx_init(), but not to sgx_drv_init().

But moving to sgx_drv_init() also makes sense to me given KVM guest EPC pages
are not reclaimable now.  For now there's no reason to run ksgxd() if only
virtual EPC driver is enabled.  We can move sgx_page_reclaimer_init() out of
sgx_drv_init() when we add KVM guest EPC page reclaiming support (if it happens
in the future).

-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31  2:35         ` Huang, Kai
@ 2022-08-31  2:44           ` jarkko
  2022-08-31  2:55             ` Huang, Kai
  0 siblings, 1 reply; 56+ messages in thread
From: jarkko @ 2022-08-31  2:44 UTC (permalink / raw)
  To: Huang, Kai
  Cc: pmenzel, linux-sgx, x86, dave.hansen, Dhanraj, Vijay, Chatre,
	Reinette, mingo, tglx, bp, haitao.huang, hpa, linux-kernel

On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
> > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > Hi Jarkko,
> > > > 
> > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > > prematurely stopped.
> > > > 
> > > > I do not think misc_register() is required to fail for the scenario to
> > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > > (via sgx_page_reclaimer_init()) ...".
> > > 
> > > IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
> > > won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
> > > be initialized successfully, sgx_init() still returns 0.
> > > 
> > > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > > initialized successfully.  Then the code change in this patch won't be necessary
> > > if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> > > early stage before we are sure either the driver or KVM SGX will work.
> > 
> > I would focus fixing the existing flow rather than reinventing the flow.
> > 
> > It can be made to work, and therefore it is IMHO correct action to take.
> 
> From another perspective, the *existing flow* is the reason which causes this
> bug.  A real fix is to fix the flow itself.

Any existing flow in part of the kernel can have a bug. That
does not mean that switching flow would be proper way to fix
a bug.

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31  2:44           ` jarkko
@ 2022-08-31  2:55             ` Huang, Kai
  2022-08-31  2:57               ` jarkko
  0 siblings, 1 reply; 56+ messages in thread
From: Huang, Kai @ 2022-08-31  2:55 UTC (permalink / raw)
  To: jarkko
  Cc: pmenzel, linux-sgx, x86, dave.hansen, Dhanraj, Vijay, Chatre,
	Reinette, mingo, tglx, bp, haitao.huang, hpa, linux-kernel

On Wed, 2022-08-31 at 05:44 +0300, jarkko@kernel.org wrote:
> On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> > On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
> > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > > Hi Jarkko,
> > > > > 
> > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > > > prematurely stopped.
> > > > > 
> > > > > I do not think misc_register() is required to fail for the scenario to
> > > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > > > (via sgx_page_reclaimer_init()) ...".
> > > > 
> > > > IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
> > > > won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
> > > > be initialized successfully, sgx_init() still returns 0.
> > > > 
> > > > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > > > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > > > initialized successfully.  Then the code change in this patch won't be necessary
> > > > if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> > > > early stage before we are sure either the driver or KVM SGX will work.
> > > 
> > > I would focus fixing the existing flow rather than reinventing the flow.
> > > 
> > > It can be made to work, and therefore it is IMHO correct action to take.
> > 
> > From another perspective, the *existing flow* is the reason which causes this
> > bug.  A real fix is to fix the flow itself.
> 
> Any existing flow in part of the kernel can have a bug. That
> does not mean that switching flow would be proper way to fix
> a bug.
> 
> BR, Jarkko

Yes but I think this is only true when the flow is reasonable.  If the flow
itself isn't reasonable, we should fix the flow (given it's easy to fix AFAICT).

Anyway, let us also hear from others.

-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31  2:55             ` Huang, Kai
@ 2022-08-31  2:57               ` jarkko
  2022-08-31  3:10                 ` Jarkko Sakkinen
  2022-08-31  3:17                 ` Huang, Kai
  0 siblings, 2 replies; 56+ messages in thread
From: jarkko @ 2022-08-31  2:57 UTC (permalink / raw)
  To: Huang, Kai
  Cc: pmenzel, linux-sgx, x86, dave.hansen, Dhanraj, Vijay, Chatre,
	Reinette, mingo, tglx, bp, haitao.huang, hpa, linux-kernel

On Wed, Aug 31, 2022 at 02:55:52AM +0000, Huang, Kai wrote:
> On Wed, 2022-08-31 at 05:44 +0300, jarkko@kernel.org wrote:
> > On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> > > On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
> > > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > > > Hi Jarkko,
> > > > > > 
> > > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > > > > prematurely stopped.
> > > > > > 
> > > > > > I do not think misc_register() is required to fail for the scenario to
> > > > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > > > > (via sgx_page_reclaimer_init()) ...".
> > > > > 
> > > > > IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
> > > > > won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
> > > > > be initialized successfully, sgx_init() still returns 0.
> > > > > 
> > > > > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > > > > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > > > > initialized successfully.  Then the code change in this patch won't be necessary
> > > > > if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> > > > > early stage before we are sure either the driver or KVM SGX will work.
> > > > 
> > > > I would focus fixing the existing flow rather than reinventing the flow.
> > > > 
> > > > It can be made to work, and therefore it is IMHO correct action to take.
> > > 
> > > From another perspective, the *existing flow* is the reason which causes this
> > > bug.  A real fix is to fix the flow itself.
> > 
> > Any existing flow in part of the kernel can have a bug. That
> > does not mean that switching flow would be proper way to fix
> > a bug.
> > 
> > BR, Jarkko
> 
> Yes but I think this is only true when the flow is reasonable.  If the flow
> itself isn't reasonable, we should fix the flow (given it's easy to fix AFAICT).
> 
> Anyway, let us also hear from others.

The flow can be made to work without issues, which in the
context of a bug fix is exactly what a bug fix should do.
Not more or less.

You don't gain any measurable value for the user with this
switch idea.

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31  2:57               ` jarkko
@ 2022-08-31  3:10                 ` Jarkko Sakkinen
  2022-08-31  3:28                   ` Huang, Kai
  2022-08-31  3:17                 ` Huang, Kai
  1 sibling, 1 reply; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-08-31  3:10 UTC (permalink / raw)
  To: Huang, Kai
  Cc: pmenzel, linux-sgx, x86, dave.hansen, Dhanraj, Vijay, Chatre,
	Reinette, mingo, tglx, bp, haitao.huang, hpa, linux-kernel

On Wed, Aug 31, 2022 at 05:57:22AM +0300, jarkko@kernel.org wrote:
> On Wed, Aug 31, 2022 at 02:55:52AM +0000, Huang, Kai wrote:
> > On Wed, 2022-08-31 at 05:44 +0300, jarkko@kernel.org wrote:
> > > On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> > > > On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
> > > > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > > > > Hi Jarkko,
> > > > > > > 
> > > > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > > > > > prematurely stopped.
> > > > > > > 
> > > > > > > I do not think misc_register() is required to fail for the scenario to
> > > > > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > > > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > > > > > (via sgx_page_reclaimer_init()) ...".
> > > > > > 
> > > > > > IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
> > > > > > won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
> > > > > > be initialized successfully, sgx_init() still returns 0.
> > > > > > 
> > > > > > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > > > > > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > > > > > initialized successfully.  Then the code change in this patch won't be necessary
> > > > > > if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> > > > > > early stage before we are sure either the driver or KVM SGX will work.
> > > > > 
> > > > > I would focus fixing the existing flow rather than reinventing the flow.
> > > > > 
> > > > > It can be made to work, and therefore it is IMHO correct action to take.
> > > > 
> > > > From another perspective, the *existing flow* is the reason which causes this
> > > > bug.  A real fix is to fix the flow itself.
> > > 
> > > Any existing flow in part of the kernel can have a bug. That
> > > does not mean that switching flow would be proper way to fix
> > > a bug.
> > > 
> > > BR, Jarkko
> > 
> > Yes but I think this is only true when the flow is reasonable.  If the flow
> > itself isn't reasonable, we should fix the flow (given it's easy to fix AFAICT).
> > 
> > Anyway, let us also hear from others.
> 
> The flow can be made to work without issues, which in the
> context of a bug fix is exactly what a bug fix should do.
> Not more or less.
> 
> You don't gain any measurable value for the user with this
> switch idea.

And besides this not proper way to review patch anyway because you did
not review the code. I'll focus on fix what is broken e.g. so that it
is easy to backport to stable and distro kernels, and call it a day.
It certainly does not have to make code "perfect", as long as known
bugs are sorted out.

You are welcome to review the next version of the patch, once I've
resolved the issues that were pointed out by Reinette, if you still
see some issue but this type of speculative discussion is frankly just
wasting everyones time.

(need to check my mutt config, do not know why it is not always
putting real name to from field)

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31  2:57               ` jarkko
  2022-08-31  3:10                 ` Jarkko Sakkinen
@ 2022-08-31  3:17                 ` Huang, Kai
  2022-08-31 15:18                   ` Haitao Huang
  1 sibling, 1 reply; 56+ messages in thread
From: Huang, Kai @ 2022-08-31  3:17 UTC (permalink / raw)
  To: jarkko
  Cc: pmenzel, linux-sgx, x86, dave.hansen, Dhanraj, Vijay, Chatre,
	Reinette, mingo, tglx, bp, haitao.huang, hpa, linux-kernel

On Wed, 2022-08-31 at 05:57 +0300, jarkko@kernel.org wrote:
> On Wed, Aug 31, 2022 at 02:55:52AM +0000, Huang, Kai wrote:
> > On Wed, 2022-08-31 at 05:44 +0300, jarkko@kernel.org wrote:
> > > On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> > > > On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
> > > > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > > > > Hi Jarkko,
> > > > > > > 
> > > > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > > > > > prematurely stopped.
> > > > > > > 
> > > > > > > I do not think misc_register() is required to fail for the scenario to
> > > > > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > > > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > > > > > (via sgx_page_reclaimer_init()) ...".
> > > > > > 
> > > > > > IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
> > > > > > won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
> > > > > > be initialized successfully, sgx_init() still returns 0.
> > > > > > 
> > > > > > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > > > > > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > > > > > initialized successfully.  Then the code change in this patch won't be necessary
> > > > > > if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> > > > > > early stage before we are sure either the driver or KVM SGX will work.
> > > > > 
> > > > > I would focus fixing the existing flow rather than reinventing the flow.
> > > > > 
> > > > > It can be made to work, and therefore it is IMHO correct action to take.
> > > > 
> > > > From another perspective, the *existing flow* is the reason which causes this
> > > > bug.  A real fix is to fix the flow itself.
> > > 
> > > Any existing flow in part of the kernel can have a bug. That
> > > does not mean that switching flow would be proper way to fix
> > > a bug.
> > > 
> > > BR, Jarkko
> > 
> > Yes but I think this is only true when the flow is reasonable.  If the flow
> > itself isn't reasonable, we should fix the flow (given it's easy to fix AFAICT).
> > 
> > Anyway, let us also hear from others.
> 
> The flow can be made to work without issues, which in the
> context of a bug fix is exactly what a bug fix should do.
> Not more or less.

No. To me the flow itself is buggy.  There's no reason to start ksgxd() before
at least SGX driver is initialized to work.

Patching the buggy flow is more like a workaround, but isn't a real fix.

> 
> You don't gain any measurable value for the user with this
> switch idea.

There is actual gain by moving sgx_page_reclaimer_init() to sgx_drv_init(), or
only calling sgx_page_reclaimer_init() when sgx_drv_init() returns success:

If somehow sgx_drv_init() fails to initialize, ksgxd() won't run.

Currently, if SGX driver fails to initialize but virtual EPC initializes
successfully, ksgxd() still runs. However it achieves nothing but only wastes
CPU cycles.


-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31  3:10                 ` Jarkko Sakkinen
@ 2022-08-31  3:28                   ` Huang, Kai
  2022-08-31  3:40                     ` jarkko
  0 siblings, 1 reply; 56+ messages in thread
From: Huang, Kai @ 2022-08-31  3:28 UTC (permalink / raw)
  To: jarkko
  Cc: pmenzel, linux-sgx, x86, dave.hansen, Dhanraj, Vijay, Chatre,
	Reinette, mingo, tglx, bp, haitao.huang, hpa, linux-kernel

On Wed, 2022-08-31 at 06:10 +0300, Jarkko Sakkinen wrote:
> On Wed, Aug 31, 2022 at 05:57:22AM +0300, jarkko@kernel.org wrote:
> > On Wed, Aug 31, 2022 at 02:55:52AM +0000, Huang, Kai wrote:
> > > On Wed, 2022-08-31 at 05:44 +0300, jarkko@kernel.org wrote:
> > > > On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> > > > > On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
> > > > > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > > > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > > > > > Hi Jarkko,
> > > > > > > > 
> > > > > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > > > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > > > > > > prematurely stopped.
> > > > > > > > 
> > > > > > > > I do not think misc_register() is required to fail for the scenario to
> > > > > > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > > > > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > > > > > > (via sgx_page_reclaimer_init()) ...".
> > > > > > > 
> > > > > > > IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
> > > > > > > won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
> > > > > > > be initialized successfully, sgx_init() still returns 0.
> > > > > > > 
> > > > > > > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > > > > > > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > > > > > > initialized successfully.  Then the code change in this patch won't be necessary
> > > > > > > if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> > > > > > > early stage before we are sure either the driver or KVM SGX will work.
> > > > > > 
> > > > > > I would focus fixing the existing flow rather than reinventing the flow.
> > > > > > 
> > > > > > It can be made to work, and therefore it is IMHO correct action to take.
> > > > > 
> > > > > From another perspective, the *existing flow* is the reason which causes this
> > > > > bug.  A real fix is to fix the flow itself.
> > > > 
> > > > Any existing flow in part of the kernel can have a bug. That
> > > > does not mean that switching flow would be proper way to fix
> > > > a bug.
> > > > 
> > > > BR, Jarkko
> > > 
> > > Yes but I think this is only true when the flow is reasonable.  If the flow
> > > itself isn't reasonable, we should fix the flow (given it's easy to fix AFAICT).
> > > 
> > > Anyway, let us also hear from others.
> > 
> > The flow can be made to work without issues, which in the
> > context of a bug fix is exactly what a bug fix should do.
> > Not more or less.
> > 
> > You don't gain any measurable value for the user with this
> > switch idea.
> 
> And besides this not proper way to review patch anyway because you did
> not review the code. 
> 

I did review the code, but I couldn't agree on the fix.  That's why I expressed
my view here.


> I'll focus on fix what is broken e.g. so that it
> is easy to backport to stable and distro kernels, and call it a day.
> It certainly does not have to make code "perfect", as long as known
> bugs are sorted out.

Why cannot the fix which fixes the flow go to stable?

> 
> You are welcome to review the next version of the patch, once I've
> resolved the issues that were pointed out by Reinette, if you still
> see some issue but this type of speculative discussion is frankly just
> wasting everyones time.

Hmm.. Why pointing out a better fix (my perspective of course) is wasting
everyone's time?


-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31  3:28                   ` Huang, Kai
@ 2022-08-31  3:40                     ` jarkko
  0 siblings, 0 replies; 56+ messages in thread
From: jarkko @ 2022-08-31  3:40 UTC (permalink / raw)
  To: Huang, Kai
  Cc: pmenzel, linux-sgx, x86, dave.hansen, Dhanraj, Vijay, Chatre,
	Reinette, mingo, tglx, bp, haitao.huang, hpa, linux-kernel

On Wed, Aug 31, 2022 at 03:28:20AM +0000, Huang, Kai wrote:
> On Wed, 2022-08-31 at 06:10 +0300, Jarkko Sakkinen wrote:
> > On Wed, Aug 31, 2022 at 05:57:22AM +0300, jarkko@kernel.org wrote:
> > > On Wed, Aug 31, 2022 at 02:55:52AM +0000, Huang, Kai wrote:
> > > > On Wed, 2022-08-31 at 05:44 +0300, jarkko@kernel.org wrote:
> > > > > On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> > > > > > On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
> > > > > > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > > > > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > > > > > > Hi Jarkko,
> > > > > > > > > 
> > > > > > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > > > > > > In sgx_init(), if misc_register() for the provision device fails, and
> > > > > > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > > > > > > > prematurely stopped.
> > > > > > > > > 
> > > > > > > > > I do not think misc_register() is required to fail for the scenario to
> > > > > > > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > > > > > > "In sgx_init(), if a failure is encountered after ksgxd is started
> > > > > > > > > (via sgx_page_reclaimer_init()) ...".
> > > > > > > > 
> > > > > > > > IMHO "a failure" might be too vague.  For instance, failure to sgx_drv_init()
> > > > > > > > won't immediately result in ksgxd to stop prematurally.  As long as KVM SGX can
> > > > > > > > be initialized successfully, sgx_init() still returns 0.
> > > > > > > > 
> > > > > > > > Btw I was thinking whether we should move sgx_page_reclaimer_init() to the end
> > > > > > > > of sgx_init(), after we make sure at least one of the driver and the KVM SGX is
> > > > > > > > initialized successfully.  Then the code change in this patch won't be necessary
> > > > > > > > if I understand correctly.  AFAICT there's no good reason to start the ksgxd at
> > > > > > > > early stage before we are sure either the driver or KVM SGX will work.
> > > > > > > 
> > > > > > > I would focus fixing the existing flow rather than reinventing the flow.
> > > > > > > 
> > > > > > > It can be made to work, and therefore it is IMHO correct action to take.
> > > > > > 
> > > > > > From another perspective, the *existing flow* is the reason which causes this
> > > > > > bug.  A real fix is to fix the flow itself.
> > > > > 
> > > > > Any existing flow in part of the kernel can have a bug. That
> > > > > does not mean that switching flow would be proper way to fix
> > > > > a bug.
> > > > > 
> > > > > BR, Jarkko
> > > > 
> > > > Yes but I think this is only true when the flow is reasonable.  If the flow
> > > > itself isn't reasonable, we should fix the flow (given it's easy to fix AFAICT).
> > > > 
> > > > Anyway, let us also hear from others.
> > > 
> > > The flow can be made to work without issues, which in the
> > > context of a bug fix is exactly what a bug fix should do.
> > > Not more or less.
> > > 
> > > You don't gain any measurable value for the user with this
> > > switch idea.
> > 
> > And besides this not proper way to review patch anyway because you did
> > not review the code. 
> > 
> 
> I did review the code, but I couldn't agree on the fix.  That's why I expressed
> my view here.
> 
> 
> > I'll focus on fix what is broken e.g. so that it
> > is easy to backport to stable and distro kernels, and call it a day.
> > It certainly does not have to make code "perfect", as long as known
> > bugs are sorted out.
> 
> Why cannot the fix which fixes the flow go to stable?
> 
> > 
> > You are welcome to review the next version of the patch, once I've
> > resolved the issues that were pointed out by Reinette, if you still
> > see some issue but this type of speculative discussion is frankly just
> > wasting everyones time.
> 
> Hmm.. Why pointing out a better fix (my perspective of course) is wasting
> everyone's time?

There was not a single inline comment.

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31  3:17                 ` Huang, Kai
@ 2022-08-31 15:18                   ` Haitao Huang
  2022-08-31 18:28                     ` jarkko
  0 siblings, 1 reply; 56+ messages in thread
From: Haitao Huang @ 2022-08-31 15:18 UTC (permalink / raw)
  To: jarkko, Huang, Kai
  Cc: pmenzel, linux-sgx, x86, dave.hansen, Dhanraj, Vijay, Chatre,
	Reinette, mingo, tglx, bp, hpa, linux-kernel

Hi Kai
On Tue, 30 Aug 2022 22:17:08 -0500, Huang, Kai <kai.huang@intel.com> wrote:

> On Wed, 2022-08-31 at 05:57 +0300, jarkko@kernel.org wrote:
>> On Wed, Aug 31, 2022 at 02:55:52AM +0000, Huang, Kai wrote:
>> > On Wed, 2022-08-31 at 05:44 +0300, jarkko@kernel.org wrote:
>> > > On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
>> > > > On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
>> > > > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
>> > > > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
>> > > > > > > Hi Jarkko,
>> > > > > > >
>> > > > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
>> > > > > > > > In sgx_init(), if misc_register() for the provision  
>> device fails, and
>> > > > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then  
>> ksgxd will be
>> > > > > > > > prematurely stopped.
>> > > > > > >
>> > > > > > > I do not think misc_register() is required to fail for the  
>> scenario to
>> > > > > > > be triggered (rather use "or" than "and"?). Perhaps just
>> > > > > > > "In sgx_init(), if a failure is encountered after ksgxd is  
>> started
>> > > > > > > (via sgx_page_reclaimer_init()) ...".
>> > > > > >
>> > > > > > IMHO "a failure" might be too vague.  For instance, failure  
>> to sgx_drv_init()
>> > > > > > won't immediately result in ksgxd to stop prematurally.  As  
>> long as KVM SGX can
>> > > > > > be initialized successfully, sgx_init() still returns 0.
>> > > > > >
>> > > > > > Btw I was thinking whether we should move  
>> sgx_page_reclaimer_init() to the end
>> > > > > > of sgx_init(), after we make sure at least one of the driver  
>> and the KVM SGX is
>> > > > > > initialized successfully.  Then the code change in this patch  
>> won't be necessary
>> > > > > > if I understand correctly.  AFAICT there's no good reason to  
>> start the ksgxd at
>> > > > > > early stage before we are sure either the driver or KVM SGX  
>> will work.
>> > > > >
>> > > > > I would focus fixing the existing flow rather than reinventing  
>> the flow.
>> > > > >
>> > > > > It can be made to work, and therefore it is IMHO correct action  
>> to take.
>> > > >
>> > > > From another perspective, the *existing flow* is the reason which  
>> causes this
>> > > > bug.  A real fix is to fix the flow itself.
>> > >
>> > > Any existing flow in part of the kernel can have a bug. That
>> > > does not mean that switching flow would be proper way to fix
>> > > a bug.
>> > >
>> > > BR, Jarkko
>> >
>> > Yes but I think this is only true when the flow is reasonable.  If  
>> the flow
>> > itself isn't reasonable, we should fix the flow (given it's easy to  
>> fix AFAICT).
>> >
>> > Anyway, let us also hear from others.
>>
>> The flow can be made to work without issues, which in the
>> context of a bug fix is exactly what a bug fix should do.
>> Not more or less.
>
> No. To me the flow itself is buggy.  There's no reason to start ksgxd()  
> before
> at least SGX driver is initialized to work.
>

Will it cause racing if we expose dev nodes to user space before
ksgxd is started and sensitization done?

> Patching the buggy flow is more like a workaround, but isn't a real fix.
>
>>
>> You don't gain any measurable value for the user with this
>> switch idea.
>
> There is actual gain by moving sgx_page_reclaimer_init() to  
> sgx_drv_init(), or
> only calling sgx_page_reclaimer_init() when sgx_drv_init() returns  
> success:
>
> If somehow sgx_drv_init() fails to initialize, ksgxd() won't run.
>
> Currently, if SGX driver fails to initialize but virtual EPC initializes
> successfully, ksgxd() still runs. However it achieves nothing but only  
> wastes
> CPU cycles.
>
>

You still need ksgxd for sanitizing (at least) and swapping (potentially)
even if only virtual EPC initializes.

Thanks
Haitao

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31  1:55     ` Jarkko Sakkinen
  2022-08-31  1:58       ` Jarkko Sakkinen
@ 2022-08-31 18:08       ` Reinette Chatre
  1 sibling, 0 replies; 56+ messages in thread
From: Reinette Chatre @ 2022-08-31 18:08 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Paul Menzel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

Hi Jarkko,

On 8/30/2022 6:55 PM, Jarkko Sakkinen wrote:
> On Tue, Aug 30, 2022 at 03:54:27PM -0700, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
>>> In sgx_init(), if misc_register() for the provision device fails, and
>>> neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
>>> prematurely stopped.
>>
>> I do not think misc_register() is required to fail for the scenario to
>> be triggered (rather use "or" than "and"?). Perhaps just
>> "In sgx_init(), if a failure is encountered after ksgxd is started
>> (via sgx_page_reclaimer_init()) ...".
> 
> This would be the fixed version of the sentence:
> 
> "
> In sgx_init(), if misc_register() fails or misc_register() succeeds but
> neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> prematurely stopped. This may leave some unsanitized pages, which does
> not matter, because SGX will be disabled for the whole power cycle.
> "
> 
> I want to keep the end states listed and not make it more abstract.
> 
> The second sentence addresses the remark below.

Thank you for capturing this. It sounds good.

> 
>> To help the reader understand the subject of this patch it may help
>> to explain that prematurely stopping ksgxd may leave some
>> unsanitized pages, but that is not a problem since SGX cannot
>> be used on the platform anyway. 
>>
>>> This triggers WARN_ON() because sgx_dirty_page_list ends up being
>>> non-empty, and dumps the call stack:
>>>
>>
>> Traces like below can be frowned upon. I recommend that you follow the
>> guidance in "Backtraces in commit mesages"(sic) in 
>> Documentation/process/submitting-patches.rst.
>>
>>> [    0.268592] WARNING: CPU: 6 PID: 83 at
>>> arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
> 
> Is this good enough? I had not actually spotted this section before but
> nice that it exists. Apparently has been added in 5.12.

The timestamp should be removed, it is among the things listed as
"distracting information". In this case the backtrace and registers within
the trace are not adding value but I think it is important to mention
the kernel version somewhere for folks to be able to interpret the
line number provided.

Yet I see you kept the whole trace in V2 ?

>>> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
>>> index 515e2a5f25bb..903100fcfce3 100644
>>> --- a/arch/x86/kernel/cpu/sgx/main.c
>>> +++ b/arch/x86/kernel/cpu/sgx/main.c
>>> @@ -49,17 +49,20 @@ static LIST_HEAD(sgx_dirty_page_list);
>>>   * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
>>>   * from the input list, and made available for the page allocator. SECS pages
>>>   * prepending their children in the input list are left intact.
>>> + *
>>> + * Contents of the @dirty_page_list must be thread-local, i.e.
>>> + * not shared by multiple threads.
>>
>> Did you intend to mention something about the needed locking here? It looks
>> like some information is lost during the move to the function description.
> 
> Nothing about the locking that concerns the parameter, as the
> sentence defines clear constraints for the caller.

ok

> 
>>
>>>   */
>>> -static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
>>> +static int __sgx_sanitize_pages(struct list_head *dirty_page_list)
>>>  {
>>>  	struct sgx_epc_page *page;
>>> +	int left_dirty = 0;
>>
>> I do not know how many pages this code should be ready for but at least
>> this could handle more by being an unsigned int considering that it is
>> always positive ... maybe even unsigned long?
> 
> I would go for 'long'. More information below.
> 
>>
>>>  	LIST_HEAD(dirty);
>>>  	int ret;
>>>  
>>> -	/* dirty_page_list is thread-local, no need for a lock: */
>>>  	while (!list_empty(dirty_page_list)) {
>>>  		if (kthread_should_stop())
>>> -			return;
>>> +			break;
>>>  
>>>  		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
>>>  
>>> @@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
>>>  		} else {
>>>  			/* The page is not yet clean - move to the dirty list. */
>>>  			list_move_tail(&page->list, &dirty);
>>> +			left_dirty++;
>>>  		}
>>>  
>>>  		cond_resched();
>>>  	}
>>>  
>>>  	list_splice(&dirty, dirty_page_list);
>>> +	return left_dirty;
>>>  }
>>>  
>>>  static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
>>> @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
>>>  
>>>  static int ksgxd(void *p)
>>>  {
>>> +	int left_dirty;
>>> +
>>>  	set_freezable();
>>>  
>>>  	/*
>>> @@ -395,10 +402,10 @@ static int ksgxd(void *p)
>>>  	 * required for SECS pages, whose child pages blocked EREMOVE.
>>>  	 */
>>>  	__sgx_sanitize_pages(&sgx_dirty_page_list);
>>> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
>>>  
>>> -	/* sanity check: */
>>> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
>>> +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
>>> +	if (left_dirty)
>>> +		pr_warn("%d unsanitized pages\n", left_dirty);
>>>  
>>>  	while (!kthread_should_stop()) {
>>>  		if (try_to_freeze())
>>
>>
>> Reinette
> 
> We need to return -ECANCELED on premature stop, and number of
> pages otherwise.
> 
> In premature stop, nothing should be printed, as the number
> is by practical means a random number. Otherwise, it is an
> indicator of a bug in the driver, and therefore a non-zero
> number should be printed pr_err(), if that happens after the
> second call.
> 

Good point.

> Oh, sorry, I forgot one thing. The devices should be actually
> deinitialized in the error case. We do not want to leave a
> broken driver running.

Is this not already done on the error path of sgx_init()?

Reinette


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/6] selftests/sgx: Add SGX selftest augment_via_eaccept_long
  2022-08-31  2:28     ` Jarkko Sakkinen
@ 2022-08-31 18:09       ` Reinette Chatre
  2022-09-01 22:16         ` Jarkko Sakkinen
  0 siblings, 1 reply; 56+ messages in thread
From: Reinette Chatre @ 2022-08-31 18:09 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Shuah Khan,
	open list:KERNEL SELFTEST FRAMEWORK, open list

Hi Jarkko,

On 8/30/2022 7:28 PM, Jarkko Sakkinen wrote:
> On Tue, Aug 30, 2022 at 03:55:47PM -0700, Reinette Chatre wrote:
>> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
>>> From: Vijay Dhanraj <vijay.dhanraj@intel.com>
>>>
>>> Add a new test case which is same as augment_via_eaccept but adds a
>>> larger number of EPC pages to stress test EAUG via EACCEPT.
>>>
>>> Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
>>> Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org>
>>> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
>>> ---
>>> v2:
>>> - Addressed Reinette's feedback:
>>>   https://lore.kernel.org/linux-sgx/24bd8e42-ff4e-0090-d9e1-cd81e4807f21@intel.com/
>>> ---
>>>  tools/testing/selftests/sgx/load.c      |   5 +-
>>>  tools/testing/selftests/sgx/main.c      | 141 +++++++++++++++++++++---
>>>  tools/testing/selftests/sgx/main.h      |   3 +-
>>>  tools/testing/selftests/sgx/sigstruct.c |   2 +-
>>>  4 files changed, 129 insertions(+), 22 deletions(-)
>>
>> There seems to be at least three patches merged into one here:
>> 1) Update SGX selftests to create enclaves with provided size dedicated
>>    to EDMM (this change causes a lot of noise and distracts from the test
>>    addition).
>> 2) The mrenclave_ecreate() fix (which is still incomplete).
>> 3) The actual test addition.
> 
> I would agree on this on a kernel patch but not for kselftest patch. It
> does not really give useful value here. This adds a test and that is a
> good enough granularity in my opinion, unless some major architecture
> work is required as precursory. It is not the case here.

I must say that for many good reasons this goes against one of the
fundamental rules of kernel patches: separate logical changes into
separate patches. This is your domain though so of course the work
within it follows your guidance and I will not pursue it further.

> 
>>> diff --git a/tools/testing/selftests/sgx/load.c b/tools/testing/selftests/sgx/load.c
>>> index 94bdeac1cf04..7de1b15c90b1 100644
>>> --- a/tools/testing/selftests/sgx/load.c
>>> +++ b/tools/testing/selftests/sgx/load.c
>>> @@ -171,7 +171,8 @@ uint64_t encl_get_entry(struct encl *encl, const char *symbol)
>>>  	return 0;
>>>  }
>>>  
>>> -bool encl_load(const char *path, struct encl *encl, unsigned long heap_size)
>>> +bool encl_load(const char *path, struct encl *encl, unsigned long heap_size,
>>> +			   unsigned long edmm_size)
>>>  {
>>
>> checkpatch.pl informs about alignment issues above and also a few other places.
> 
> Weird. I did run checkpatch through these. Will revisit.

I usually run checkpatch.pl with "--strict".

Reinette

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 5/6] selftests/sgx: retry the ioctls returned with EAGAIN
  2022-08-31  2:31     ` Jarkko Sakkinen
@ 2022-08-31 18:09       ` Reinette Chatre
  2022-09-01 22:17         ` Jarkko Sakkinen
  2022-08-31 18:14       ` Dave Hansen
  1 sibling, 1 reply; 56+ messages in thread
From: Reinette Chatre @ 2022-08-31 18:09 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Shuah Khan,
	open list:KERNEL SELFTEST FRAMEWORK, open list

Hi Jarkko,

On 8/30/2022 7:31 PM, Jarkko Sakkinen wrote:
> On Tue, Aug 30, 2022 at 03:56:29PM -0700, Reinette Chatre wrote:
>> Hi Haitao and Jarkko,
>>
>>
>> selftests/sgx: Retry the ioctl()s returned with EAGAIN
>>
>>
>> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
>>> From: Haitao Huang <haitao.huang@linux.intel.com>
>>>
>>> For EMODT and EREMOVE ioctls with a large range, kernel
>>
>> ioctl()s?
> 
> Ioctl is common enough to be considered as noun and is
> widely phrased like that in commit messages. I don't
> see any added clarity.

ok. I was asked to make this change in the SGX2 patches and
thought that I should propagate this advice :)

>>> +			modt_ioc.count = 0;
>>> +		} else
>>> +			break;
>>
>> Watch out for unbalanced braces (also later in patch). This causes
>> checkpatch.pl noise.
> 
> Again. I did run checkpatch to all of these. Will revisit.

It looks like I see it because I use "checkpatch.pl --strict".

Reinette

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 6/6] selftests/sgx: Add a bpftrace script for tracking allocation errors
  2022-08-31  2:33     ` Jarkko Sakkinen
@ 2022-08-31 18:10       ` Reinette Chatre
  2022-08-31 18:23         ` Jarkko Sakkinen
  0 siblings, 1 reply; 56+ messages in thread
From: Reinette Chatre @ 2022-08-31 18:10 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Shuah Khan,
	open list, open list:KERNEL SELFTEST FRAMEWORK

Hi Jarkko,

On 8/30/2022 7:33 PM, Jarkko Sakkinen wrote:
> On Tue, Aug 30, 2022 at 03:57:24PM -0700, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
>>> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
>>> ---
>>>  tools/testing/selftests/sgx/alloc-error.bt | 7 +++++++
>>>  1 file changed, 7 insertions(+)
>>>  create mode 100644 tools/testing/selftests/sgx/alloc-error.bt
>>>
>>> diff --git a/tools/testing/selftests/sgx/alloc-error.bt b/tools/testing/selftests/sgx/alloc-error.bt
>>> new file mode 100644
>>> index 000000000000..9268d50dea29
>>> --- /dev/null
>>> +++ b/tools/testing/selftests/sgx/alloc-error.bt
>>> @@ -0,0 +1,7 @@
>>> +kr:sgx_alloc_epc_page /(uint64)retval >= (uint64)(-4095)/ {
>>> +		 printf("sgx_alloc_epc_page: retval=%d\n", (int64)retval);
>>> +}
>>> +
>>> +kr:sgx_encl_page_alloc /(uint64)retval >= (uint64)(-4095)/ {
>>> +		 printf("sgx_encl_page_alloc: retval=%d\n", (int64)retval);
>>> +}
>>
>>
>> Could there be a snippet of comments in this new file to guide users
>> on how to use this script?
> 
> Do not mean to be rude but I'm not sure what there is to guide but
> I'm open for ideas.

How about something like below in comments as part of the script:

"bpftrace script using kretprobe to trace returns of some key functions
 in support of tracking allocation errors."

This is essentially in the subject line of the patch but that information
will be lost when somebody looks at the script and tries to figure
out what it is.

Reinette

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 5/6] selftests/sgx: retry the ioctls returned with EAGAIN
  2022-08-31  2:31     ` Jarkko Sakkinen
  2022-08-31 18:09       ` Reinette Chatre
@ 2022-08-31 18:14       ` Dave Hansen
  2022-09-01 22:18         ` Jarkko Sakkinen
  1 sibling, 1 reply; 56+ messages in thread
From: Dave Hansen @ 2022-08-31 18:14 UTC (permalink / raw)
  To: Jarkko Sakkinen, Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Shuah Khan,
	open list:KERNEL SELFTEST FRAMEWORK, open list

On 8/30/22 19:31, Jarkko Sakkinen wrote:
>> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
>>> From: Haitao Huang <haitao.huang@linux.intel.com>
>>>
>>> For EMODT and EREMOVE ioctls with a large range, kernel
>> ioctl()s?
> Ioctl is common enough to be considered as noun and is
> widely phrased like that in commit messages. I don't
> see any added clarity.

I definitely prefer ioctl().

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 6/6] selftests/sgx: Add a bpftrace script for tracking allocation errors
  2022-08-31 18:10       ` Reinette Chatre
@ 2022-08-31 18:23         ` Jarkko Sakkinen
  0 siblings, 0 replies; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-08-31 18:23 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Shuah Khan,
	open list, open list:KERNEL SELFTEST FRAMEWORK

On Wed, Aug 31, 2022 at 11:10:20AM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 8/30/2022 7:33 PM, Jarkko Sakkinen wrote:
> > On Tue, Aug 30, 2022 at 03:57:24PM -0700, Reinette Chatre wrote:
> >> Hi Jarkko,
> >>
> >> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> >>> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> >>> ---
> >>>  tools/testing/selftests/sgx/alloc-error.bt | 7 +++++++
> >>>  1 file changed, 7 insertions(+)
> >>>  create mode 100644 tools/testing/selftests/sgx/alloc-error.bt
> >>>
> >>> diff --git a/tools/testing/selftests/sgx/alloc-error.bt b/tools/testing/selftests/sgx/alloc-error.bt
> >>> new file mode 100644
> >>> index 000000000000..9268d50dea29
> >>> --- /dev/null
> >>> +++ b/tools/testing/selftests/sgx/alloc-error.bt
> >>> @@ -0,0 +1,7 @@
> >>> +kr:sgx_alloc_epc_page /(uint64)retval >= (uint64)(-4095)/ {
> >>> +		 printf("sgx_alloc_epc_page: retval=%d\n", (int64)retval);
> >>> +}
> >>> +
> >>> +kr:sgx_encl_page_alloc /(uint64)retval >= (uint64)(-4095)/ {
> >>> +		 printf("sgx_encl_page_alloc: retval=%d\n", (int64)retval);
> >>> +}
> >>
> >>
> >> Could there be a snippet of comments in this new file to guide users
> >> on how to use this script?
> > 
> > Do not mean to be rude but I'm not sure what there is to guide but
> > I'm open for ideas.
> 
> How about something like below in comments as part of the script:
> 
> "bpftrace script using kretprobe to trace returns of some key functions
>  in support of tracking allocation errors."

I think comments that I put (before seeing) also
make it clear enough (not to say that what you
had not been a valid alternative).

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 6/6] selftests/sgx: Add a bpftrace script for tracking allocation errors
  2022-08-30  3:12 ` [PATCH 6/6] selftests/sgx: Add a bpftrace script for tracking allocation errors Jarkko Sakkinen
  2022-08-30 22:57   ` Reinette Chatre
@ 2022-08-31 18:23   ` Dave Hansen
  2022-09-01 22:20     ` Jarkko Sakkinen
  1 sibling, 1 reply; 56+ messages in thread
From: Dave Hansen @ 2022-08-31 18:23 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-sgx
  Cc: Haitao Huang, Vijay Dhanraj, Reinette Chatre, Dave Hansen,
	Shuah Khan, open list, open list:KERNEL SELFTEST FRAMEWORK

On 8/29/22 20:12, Jarkko Sakkinen wrote:
> diff --git a/tools/testing/selftests/sgx/alloc-error.bt b/tools/testing/selftests/sgx/alloc-error.bt
> new file mode 100644
> index 000000000000..9268d50dea29
> --- /dev/null
> +++ b/tools/testing/selftests/sgx/alloc-error.bt
> @@ -0,0 +1,7 @@
> +kr:sgx_alloc_epc_page /(uint64)retval >= (uint64)(-4095)/ {
> +		 printf("sgx_alloc_epc_page: retval=%d\n", (int64)retval);
> +}
> +
> +kr:sgx_encl_page_alloc /(uint64)retval >= (uint64)(-4095)/ {
> +		 printf("sgx_encl_page_alloc: retval=%d\n", (int64)retval);
> +}

I guess this doesn't _hurt_, but it's also not exactly the easiest way
to get this done.  You need a whole bpf toolchain.  You could also just do:

	perf probe 'sgx_encl_page_alloc%return $retval'

Even *that* can be replicated in a few scant lines of shell code echoing
into /sys/kernel/debug/tracing.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31 15:18                   ` Haitao Huang
@ 2022-08-31 18:28                     ` jarkko
  2022-08-31 18:35                       ` Dave Hansen
  0 siblings, 1 reply; 56+ messages in thread
From: jarkko @ 2022-08-31 18:28 UTC (permalink / raw)
  To: Haitao Huang
  Cc: Huang, Kai, pmenzel, linux-sgx, x86, dave.hansen, Dhanraj, Vijay,
	Chatre, Reinette, mingo, tglx, bp, hpa, linux-kernel

On Wed, Aug 31, 2022 at 10:18:00AM -0500, Haitao Huang wrote:
> Hi Kai
> On Tue, 30 Aug 2022 22:17:08 -0500, Huang, Kai <kai.huang@intel.com> wrote:
> 
> > On Wed, 2022-08-31 at 05:57 +0300, jarkko@kernel.org wrote:
> > > On Wed, Aug 31, 2022 at 02:55:52AM +0000, Huang, Kai wrote:
> > > > On Wed, 2022-08-31 at 05:44 +0300, jarkko@kernel.org wrote:
> > > > > On Wed, Aug 31, 2022 at 02:35:53AM +0000, Huang, Kai wrote:
> > > > > > On Wed, 2022-08-31 at 05:15 +0300, jarkko@kernel.org wrote:
> > > > > > > On Wed, Aug 31, 2022 at 01:27:58AM +0000, Huang, Kai wrote:
> > > > > > > > On Tue, 2022-08-30 at 15:54 -0700, Reinette Chatre wrote:
> > > > > > > > > Hi Jarkko,
> > > > > > > > >
> > > > > > > > > On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > > > > > > > > > In sgx_init(), if misc_register() for the provision
> > > device fails, and
> > > > > > > > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds,
> > > then ksgxd will be
> > > > > > > > > > prematurely stopped.
> > > > > > > > >
> > > > > > > > > I do not think misc_register() is required to fail for
> > > the scenario to
> > > > > > > > > be triggered (rather use "or" than "and"?). Perhaps just
> > > > > > > > > "In sgx_init(), if a failure is encountered after ksgxd
> > > is started
> > > > > > > > > (via sgx_page_reclaimer_init()) ...".
> > > > > > > >
> > > > > > > > IMHO "a failure" might be too vague.  For instance,
> > > failure to sgx_drv_init()
> > > > > > > > won't immediately result in ksgxd to stop prematurally.
> > > As long as KVM SGX can
> > > > > > > > be initialized successfully, sgx_init() still returns 0.
> > > > > > > >
> > > > > > > > Btw I was thinking whether we should move
> > > sgx_page_reclaimer_init() to the end
> > > > > > > > of sgx_init(), after we make sure at least one of the
> > > driver and the KVM SGX is
> > > > > > > > initialized successfully.  Then the code change in this
> > > patch won't be necessary
> > > > > > > > if I understand correctly.  AFAICT there's no good reason
> > > to start the ksgxd at
> > > > > > > > early stage before we are sure either the driver or KVM
> > > SGX will work.
> > > > > > >
> > > > > > > I would focus fixing the existing flow rather than
> > > reinventing the flow.
> > > > > > >
> > > > > > > It can be made to work, and therefore it is IMHO correct
> > > action to take.
> > > > > >
> > > > > > From another perspective, the *existing flow* is the reason
> > > which causes this
> > > > > > bug.  A real fix is to fix the flow itself.
> > > > >
> > > > > Any existing flow in part of the kernel can have a bug. That
> > > > > does not mean that switching flow would be proper way to fix
> > > > > a bug.
> > > > >
> > > > > BR, Jarkko
> > > >
> > > > Yes but I think this is only true when the flow is reasonable.  If
> > > the flow
> > > > itself isn't reasonable, we should fix the flow (given it's easy
> > > to fix AFAICT).
> > > >
> > > > Anyway, let us also hear from others.
> > > 
> > > The flow can be made to work without issues, which in the
> > > context of a bug fix is exactly what a bug fix should do.
> > > Not more or less.
> > 
> > No. To me the flow itself is buggy.  There's no reason to start ksgxd()
> > before
> > at least SGX driver is initialized to work.
> > 
> 
> Will it cause racing if we expose dev nodes to user space before
> ksgxd is started and sensitization done?

I'll to explain this.

So the point is to fix the issue at hand, and fix it locally.

Changing initialization order is simply out of context. It's
not really an argument for or against changing it

We are fixing sanitization here, and only that with zero
side-effects to any other semantics.

It's dictated by the development process [*] but more
importantly it's also just plain common sense.

[*] https://www.kernel.org/doc/html/v5.19/process/submitting-patches.html#separate-your-changes

BR, Jarkko



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31 18:28                     ` jarkko
@ 2022-08-31 18:35                       ` Dave Hansen
  2022-08-31 18:44                         ` jarkko
                                           ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: Dave Hansen @ 2022-08-31 18:35 UTC (permalink / raw)
  To: jarkko, Haitao Huang
  Cc: Huang, Kai, pmenzel, linux-sgx, x86, dave.hansen, Dhanraj, Vijay,
	Chatre, Reinette, mingo, tglx, bp, hpa, linux-kernel

Jarkko, Kai and Haitao,

Can you three please start trimming your replies?  You don't need to and
should not quote the entirety of your messages every time you reply.

On 8/31/22 11:28, jarkko@kernel.org wrote:
>> Will it cause racing if we expose dev nodes to user space before
>> ksgxd is started and sensitization done?
> I'll to explain this.
> 
> So the point is to fix the issue at hand, and fix it locally.
> 
> Changing initialization order is simply out of context. It's
> not really an argument for or against changing it
> 
> We are fixing sanitization here, and only that with zero
> side-effects to any other semantics.
> 
> It's dictated by the development process [*] but more
> importantly it's also just plain common sense.

Kai, I think your suggestion is reasonable.  You make a good point about
not needing ksgxd for vepc.

*But*, I think it's a bit too much for a bugfix that's headed to
-stable.  I'm concerned that it will have unintended side effects,
*especially* when there's a working, tested alternative.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31 18:35                       ` Dave Hansen
@ 2022-08-31 18:44                         ` jarkko
  2022-08-31 18:45                         ` jarkko
  2022-08-31 20:42                         ` Huang, Kai
  2 siblings, 0 replies; 56+ messages in thread
From: jarkko @ 2022-08-31 18:44 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Haitao Huang, Huang, Kai, pmenzel, linux-sgx, x86, dave.hansen,
	Dhanraj, Vijay, Chatre, Reinette, mingo, tglx, bp, hpa,
	linux-kernel

On Wed, Aug 31, 2022 at 11:35:10AM -0700, Dave Hansen wrote:
> Jarkko, Kai and Haitao,
> 
> Can you three please start trimming your replies?  You don't need to and
> should not quote the entirety of your messages every time you reply.
> 
> On 8/31/22 11:28, jarkko@kernel.org wrote:
> >> Will it cause racing if we expose dev nodes to user space before
> >> ksgxd is started and sensitization done?
> > I'll to explain this.
> > 
> > So the point is to fix the issue at hand, and fix it locally.
> > 
> > Changing initialization order is simply out of context. It's
> > not really an argument for or against changing it
> > 
> > We are fixing sanitization here, and only that with zero
> > side-effects to any other semantics.
> > 
> > It's dictated by the development process [*] but more
> > importantly it's also just plain common sense.
> 
> Kai, I think your suggestion is reasonable.  You make a good point about
> not needing ksgxd for vepc.
> 
> *But*, I think it's a bit too much for a bugfix that's headed to
> -stable.  I'm concerned that it will have unintended side effects,
> *especially* when there's a working, tested alternative.

Yeah, I also actually *do* agree that the suggestions could
be reasonable.

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31 18:35                       ` Dave Hansen
  2022-08-31 18:44                         ` jarkko
@ 2022-08-31 18:45                         ` jarkko
  2022-08-31 20:42                         ` Huang, Kai
  2 siblings, 0 replies; 56+ messages in thread
From: jarkko @ 2022-08-31 18:45 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Haitao Huang, Huang, Kai, pmenzel, linux-sgx, x86, dave.hansen,
	Dhanraj, Vijay, Chatre, Reinette, mingo, tglx, bp, hpa,
	linux-kernel

On Wed, Aug 31, 2022 at 11:35:10AM -0700, Dave Hansen wrote:
> Jarkko, Kai and Haitao,
> 
> Can you three please start trimming your replies?  You don't need to and
> should not quote the entirety of your messages every time you reply.

Sure, sorry about that.

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31 18:35                       ` Dave Hansen
  2022-08-31 18:44                         ` jarkko
  2022-08-31 18:45                         ` jarkko
@ 2022-08-31 20:42                         ` Huang, Kai
  2022-09-01 22:27                           ` jarkko
  2 siblings, 1 reply; 56+ messages in thread
From: Huang, Kai @ 2022-08-31 20:42 UTC (permalink / raw)
  To: Hansen, Dave, jarkko, haitao.huang
  Cc: pmenzel, linux-sgx, x86, dave.hansen, Dhanraj, Vijay, Chatre,
	Reinette, mingo, tglx, bp, linux-kernel, hpa

On Wed, 2022-08-31 at 11:35 -0700, Dave Hansen wrote:
> Jarkko, Kai and Haitao,
> 
> Can you three please start trimming your replies?  You don't need to and
> should not quote the entirety of your messages every time you reply.
> 
> On 8/31/22 11:28, jarkko@kernel.org wrote:
> > > Will it cause racing if we expose dev nodes to user space before
> > > ksgxd is started and sensitization done?
> > I'll to explain this.
> > 
> > So the point is to fix the issue at hand, and fix it locally.
> > 
> > Changing initialization order is simply out of context. It's
> > not really an argument for or against changing it
> > 
> > We are fixing sanitization here, and only that with zero
> > side-effects to any other semantics.
> > 
> > It's dictated by the development process [*] but more
> > importantly it's also just plain common sense.
> 
> Kai, I think your suggestion is reasonable.  You make a good point about
> not needing ksgxd for vepc.
> 
> *But*, I think it's a bit too much for a bugfix that's headed to
> -stable.  I'm concerned that it will have unintended side effects,
> *especially* when there's a working, tested alternative.

Agreed. Thanks Dave/Jarkko.

-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/6] selftests/sgx: Add SGX selftest augment_via_eaccept_long
  2022-08-31 18:09       ` Reinette Chatre
@ 2022-09-01 22:16         ` Jarkko Sakkinen
  2022-09-01 23:11           ` Reinette Chatre
  0 siblings, 1 reply; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-09-01 22:16 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Shuah Khan,
	open list:KERNEL SELFTEST FRAMEWORK, open list

On Wed, Aug 31, 2022 at 11:09:02AM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 8/30/2022 7:28 PM, Jarkko Sakkinen wrote:
> > On Tue, Aug 30, 2022 at 03:55:47PM -0700, Reinette Chatre wrote:
> >> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> >>> From: Vijay Dhanraj <vijay.dhanraj@intel.com>
> >>>
> >>> Add a new test case which is same as augment_via_eaccept but adds a
> >>> larger number of EPC pages to stress test EAUG via EACCEPT.
> >>>
> >>> Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
> >>> Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org>
> >>> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> >>> ---
> >>> v2:
> >>> - Addressed Reinette's feedback:
> >>>   https://lore.kernel.org/linux-sgx/24bd8e42-ff4e-0090-d9e1-cd81e4807f21@intel.com/
> >>> ---
> >>>  tools/testing/selftests/sgx/load.c      |   5 +-
> >>>  tools/testing/selftests/sgx/main.c      | 141 +++++++++++++++++++++---
> >>>  tools/testing/selftests/sgx/main.h      |   3 +-
> >>>  tools/testing/selftests/sgx/sigstruct.c |   2 +-
> >>>  4 files changed, 129 insertions(+), 22 deletions(-)
> >>
> >> There seems to be at least three patches merged into one here:
> >> 1) Update SGX selftests to create enclaves with provided size dedicated
> >>    to EDMM (this change causes a lot of noise and distracts from the test
> >>    addition).
> >> 2) The mrenclave_ecreate() fix (which is still incomplete).
> >> 3) The actual test addition.
> > 
> > I would agree on this on a kernel patch but not for kselftest patch. It
> > does not really give useful value here. This adds a test and that is a
> > good enough granularity in my opinion, unless some major architecture
> > work is required as precursory. It is not the case here.
> 
> I must say that for many good reasons this goes against one of the
> fundamental rules of kernel patches: separate logical changes into
> separate patches. This is your domain though so of course the work
> within it follows your guidance and I will not pursue it further.

I don't consider kselftest patch exactly same as kernel patch
but I can split this. What would be good enough?

> I usually run checkpatch.pl with "--strict".

I honestly did not know that this option was available :-)
First time I hear of it.

Thanks.

> 
> Reinette

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 5/6] selftests/sgx: retry the ioctls returned with EAGAIN
  2022-08-31 18:09       ` Reinette Chatre
@ 2022-09-01 22:17         ` Jarkko Sakkinen
  0 siblings, 0 replies; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-09-01 22:17 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Shuah Khan,
	open list:KERNEL SELFTEST FRAMEWORK, open list

On Wed, Aug 31, 2022 at 11:09:21AM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 8/30/2022 7:31 PM, Jarkko Sakkinen wrote:
> > On Tue, Aug 30, 2022 at 03:56:29PM -0700, Reinette Chatre wrote:
> >> Hi Haitao and Jarkko,
> >>
> >>
> >> selftests/sgx: Retry the ioctl()s returned with EAGAIN
> >>
> >>
> >> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> >>> From: Haitao Huang <haitao.huang@linux.intel.com>
> >>>
> >>> For EMODT and EREMOVE ioctls with a large range, kernel
> >>
> >> ioctl()s?
> > 
> > Ioctl is common enough to be considered as noun and is
> > widely phrased like that in commit messages. I don't
> > see any added clarity.
> 
> ok. I was asked to make this change in the SGX2 patches and
> thought that I should propagate this advice :)

I can use the other form too, np.

> 
> >>> +			modt_ioc.count = 0;
> >>> +		} else
> >>> +			break;
> >>
> >> Watch out for unbalanced braces (also later in patch). This causes
> >> checkpatch.pl noise.
> > 
> > Again. I did run checkpatch to all of these. Will revisit.
> 
> It looks like I see it because I use "checkpatch.pl --strict".

Thanks BTW for pointing this out :-)

> Reinette

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 5/6] selftests/sgx: retry the ioctls returned with EAGAIN
  2022-08-31 18:14       ` Dave Hansen
@ 2022-09-01 22:18         ` Jarkko Sakkinen
  0 siblings, 0 replies; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-09-01 22:18 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Reinette Chatre, linux-sgx, Haitao Huang, Vijay Dhanraj,
	Dave Hansen, Shuah Khan, open list:KERNEL SELFTEST FRAMEWORK,
	open list

On Wed, Aug 31, 2022 at 11:14:02AM -0700, Dave Hansen wrote:
> On 8/30/22 19:31, Jarkko Sakkinen wrote:
> >> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> >>> From: Haitao Huang <haitao.huang@linux.intel.com>
> >>>
> >>> For EMODT and EREMOVE ioctls with a large range, kernel
> >> ioctl()s?
> > Ioctl is common enough to be considered as noun and is
> > widely phrased like that in commit messages. I don't
> > see any added clarity.
> 
> I definitely prefer ioctl().

I can use it.

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 6/6] selftests/sgx: Add a bpftrace script for tracking allocation errors
  2022-08-31 18:23   ` Dave Hansen
@ 2022-09-01 22:20     ` Jarkko Sakkinen
  2022-09-01 22:34       ` Dave Hansen
  0 siblings, 1 reply; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-09-01 22:20 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Reinette Chatre,
	Dave Hansen, Shuah Khan, open list,
	open list:KERNEL SELFTEST FRAMEWORK

On Wed, Aug 31, 2022 at 11:23:55AM -0700, Dave Hansen wrote:
> On 8/29/22 20:12, Jarkko Sakkinen wrote:
> > diff --git a/tools/testing/selftests/sgx/alloc-error.bt b/tools/testing/selftests/sgx/alloc-error.bt
> > new file mode 100644
> > index 000000000000..9268d50dea29
> > --- /dev/null
> > +++ b/tools/testing/selftests/sgx/alloc-error.bt
> > @@ -0,0 +1,7 @@
> > +kr:sgx_alloc_epc_page /(uint64)retval >= (uint64)(-4095)/ {
> > +		 printf("sgx_alloc_epc_page: retval=%d\n", (int64)retval);
> > +}
> > +
> > +kr:sgx_encl_page_alloc /(uint64)retval >= (uint64)(-4095)/ {
> > +		 printf("sgx_encl_page_alloc: retval=%d\n", (int64)retval);
> > +}
> 
> I guess this doesn't _hurt_, but it's also not exactly the easiest way
> to get this done.  You need a whole bpf toolchain.  You could also just do:
> 
> 	perf probe 'sgx_encl_page_alloc%return $retval'
> 
> Even *that* can be replicated in a few scant lines of shell code echoing
> into /sys/kernel/debug/tracing.

Thanks, I have not used perf that much. What if I replace
this with a shell script using perf? How do you use that
for two kretprobes?

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31 20:42                         ` Huang, Kai
@ 2022-09-01 22:27                           ` jarkko
  2022-09-01 22:41                             ` Huang, Kai
  0 siblings, 1 reply; 56+ messages in thread
From: jarkko @ 2022-09-01 22:27 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Hansen, Dave, haitao.huang, pmenzel, linux-sgx, x86, dave.hansen,
	Dhanraj, Vijay, Chatre, Reinette, mingo, tglx, bp, linux-kernel,
	hpa

On Wed, Aug 31, 2022 at 08:42:59PM +0000, Huang, Kai wrote:
> On Wed, 2022-08-31 at 11:35 -0700, Dave Hansen wrote:
> > Jarkko, Kai and Haitao,
> > 
> > Can you three please start trimming your replies?  You don't need to and
> > should not quote the entirety of your messages every time you reply.
> > 
> > On 8/31/22 11:28, jarkko@kernel.org wrote:
> > > > Will it cause racing if we expose dev nodes to user space before
> > > > ksgxd is started and sensitization done?
> > > I'll to explain this.
> > > 
> > > So the point is to fix the issue at hand, and fix it locally.
> > > 
> > > Changing initialization order is simply out of context. It's
> > > not really an argument for or against changing it
> > > 
> > > We are fixing sanitization here, and only that with zero
> > > side-effects to any other semantics.
> > > 
> > > It's dictated by the development process [*] but more
> > > importantly it's also just plain common sense.
> > 
> > Kai, I think your suggestion is reasonable.  You make a good point about
> > not needing ksgxd for vepc.
> > 
> > *But*, I think it's a bit too much for a bugfix that's headed to
> > -stable.  I'm concerned that it will have unintended side effects,
> > *especially* when there's a working, tested alternative.
> 
> Agreed. Thanks Dave/Jarkko.

Please do a patch. It's a very reasonable suggestion when
considered out of context of this bug.

If you go really rigid with this, the compilation process
should not compile in sanitization process in the case when
only vepc is enabled. It's useless functionality in that
case.

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 6/6] selftests/sgx: Add a bpftrace script for tracking allocation errors
  2022-09-01 22:20     ` Jarkko Sakkinen
@ 2022-09-01 22:34       ` Dave Hansen
  2022-09-01 23:55         ` Jarkko Sakkinen
  0 siblings, 1 reply; 56+ messages in thread
From: Dave Hansen @ 2022-09-01 22:34 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Reinette Chatre,
	Dave Hansen, Shuah Khan, open list,
	open list:KERNEL SELFTEST FRAMEWORK

On 9/1/22 15:20, Jarkko Sakkinen wrote:
>>> +kr:sgx_alloc_epc_page /(uint64)retval >= (uint64)(-4095)/ {
>>> +		 printf("sgx_alloc_epc_page: retval=%d\n", (int64)retval);
>>> +}
>>> +
>>> +kr:sgx_encl_page_alloc /(uint64)retval >= (uint64)(-4095)/ {
>>> +		 printf("sgx_encl_page_alloc: retval=%d\n", (int64)retval);
>>> +}
>> I guess this doesn't _hurt_, but it's also not exactly the easiest way
>> to get this done.  You need a whole bpf toolchain.  You could also just do:
>>
>> 	perf probe 'sgx_encl_page_alloc%return $retval'
>>
>> Even *that* can be replicated in a few scant lines of shell code echoing
>> into /sys/kernel/debug/tracing.
> Thanks, I have not used perf that much. What if I replace
> this with a shell script using perf? How do you use that
> for two kretprobes?

The manpage is pretty good.

But, I'd proably be doing something along these lines:

	perf probe 'sgx_encl_page_alloc%return ret=$retval'
	perf record -e probe:sgx_encl_page_alloc -aR	\
		--filter='ret >= 0xwhatever' sleep 1
	perf script

There are probably shorter ways to do it, but I'm pretty sure that works.
	

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-09-01 22:27                           ` jarkko
@ 2022-09-01 22:41                             ` Huang, Kai
  2022-09-01 23:58                               ` jarkko
  0 siblings, 1 reply; 56+ messages in thread
From: Huang, Kai @ 2022-09-01 22:41 UTC (permalink / raw)
  To: jarkko
  Cc: pmenzel, Hansen, Dave, linux-sgx, x86, dave.hansen, Dhanraj,
	Vijay, Chatre, Reinette, mingo, tglx, bp, haitao.huang, hpa,
	linux-kernel

On Fri, 2022-09-02 at 01:27 +0300, jarkko@kernel.org wrote:
> On Wed, Aug 31, 2022 at 08:42:59PM +0000, Huang, Kai wrote:
> > On Wed, 2022-08-31 at 11:35 -0700, Dave Hansen wrote:
> > > Jarkko, Kai and Haitao,
> > > 
> > > Can you three please start trimming your replies?  You don't need to and
> > > should not quote the entirety of your messages every time you reply.
> > > 
> > > On 8/31/22 11:28, jarkko@kernel.org wrote:
> > > > > Will it cause racing if we expose dev nodes to user space before
> > > > > ksgxd is started and sensitization done?
> > > > I'll to explain this.
> > > > 
> > > > So the point is to fix the issue at hand, and fix it locally.
> > > > 
> > > > Changing initialization order is simply out of context. It's
> > > > not really an argument for or against changing it
> > > > 
> > > > We are fixing sanitization here, and only that with zero
> > > > side-effects to any other semantics.
> > > > 
> > > > It's dictated by the development process [*] but more
> > > > importantly it's also just plain common sense.
> > > 
> > > Kai, I think your suggestion is reasonable.  You make a good point about
> > > not needing ksgxd for vepc.
> > > 
> > > *But*, I think it's a bit too much for a bugfix that's headed to
> > > -stable.  I'm concerned that it will have unintended side effects,
> > > *especially* when there's a working, tested alternative.
> > 
> > Agreed. Thanks Dave/Jarkko.
> 
> Please do a patch. It's a very reasonable suggestion when
> considered out of context of this bug.
> 
> If you go really rigid with this, the compilation process
> should not compile in sanitization process in the case when
> only vepc is enabled. It's useless functionality in that
> case.
> 
> BR, Jarkko

Yeah I am planning to work out one to see how it goes.

-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/6] selftests/sgx: Add SGX selftest augment_via_eaccept_long
  2022-09-01 22:16         ` Jarkko Sakkinen
@ 2022-09-01 23:11           ` Reinette Chatre
  2022-09-02  0:00             ` Jarkko Sakkinen
  0 siblings, 1 reply; 56+ messages in thread
From: Reinette Chatre @ 2022-09-01 23:11 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Shuah Khan,
	open list:KERNEL SELFTEST FRAMEWORK, open list

Hi Jarkko,

On 9/1/2022 3:16 PM, Jarkko Sakkinen wrote:
> On Wed, Aug 31, 2022 at 11:09:02AM -0700, Reinette Chatre wrote:
>> On 8/30/2022 7:28 PM, Jarkko Sakkinen wrote:
>>> On Tue, Aug 30, 2022 at 03:55:47PM -0700, Reinette Chatre wrote:
>>>> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:

>>>> There seems to be at least three patches merged into one here:
>>>> 1) Update SGX selftests to create enclaves with provided size dedicated
>>>>    to EDMM (this change causes a lot of noise and distracts from the test
>>>>    addition).
>>>> 2) The mrenclave_ecreate() fix (which is still incomplete).
>>>> 3) The actual test addition.
>>>
>>> I would agree on this on a kernel patch but not for kselftest patch. It
>>> does not really give useful value here. This adds a test and that is a
>>> good enough granularity in my opinion, unless some major architecture
>>> work is required as precursory. It is not the case here.
>>
>> I must say that for many good reasons this goes against one of the
>> fundamental rules of kernel patches: separate logical changes into
>> separate patches. This is your domain though so of course the work
>> within it follows your guidance and I will not pursue it further.
> 
> I don't consider kselftest patch exactly same as kernel patch

You are not alone.

> but I can split this. What would be good enough?

I identified three candidate patches in my original response that
is quoted above, but as I mentioned I understand the sentiment
and this is your domain so I will not insist on it.

Reinette

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 6/6] selftests/sgx: Add a bpftrace script for tracking allocation errors
  2022-09-01 22:34       ` Dave Hansen
@ 2022-09-01 23:55         ` Jarkko Sakkinen
  0 siblings, 0 replies; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-09-01 23:55 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Reinette Chatre,
	Dave Hansen, Shuah Khan, open list,
	open list:KERNEL SELFTEST FRAMEWORK

On Thu, Sep 01, 2022 at 03:34:49PM -0700, Dave Hansen wrote:
> On 9/1/22 15:20, Jarkko Sakkinen wrote:
> >>> +kr:sgx_alloc_epc_page /(uint64)retval >= (uint64)(-4095)/ {
> >>> +		 printf("sgx_alloc_epc_page: retval=%d\n", (int64)retval);
> >>> +}
> >>> +
> >>> +kr:sgx_encl_page_alloc /(uint64)retval >= (uint64)(-4095)/ {
> >>> +		 printf("sgx_encl_page_alloc: retval=%d\n", (int64)retval);
> >>> +}
> >> I guess this doesn't _hurt_, but it's also not exactly the easiest way
> >> to get this done.  You need a whole bpf toolchain.  You could also just do:
> >>
> >> 	perf probe 'sgx_encl_page_alloc%return $retval'
> >>
> >> Even *that* can be replicated in a few scant lines of shell code echoing
> >> into /sys/kernel/debug/tracing.
> > Thanks, I have not used perf that much. What if I replace
> > this with a shell script using perf? How do you use that
> > for two kretprobes?
> 
> The manpage is pretty good.
> 
> But, I'd proably be doing something along these lines:
> 
> 	perf probe 'sgx_encl_page_alloc%return ret=$retval'
> 	perf record -e probe:sgx_encl_page_alloc -aR	\
> 		--filter='ret >= 0xwhatever' sleep 1
> 	perf script
> 
> There are probably shorter ways to do it, but I'm pretty sure that works.

Thanks, will give it a shot.

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-09-01 22:41                             ` Huang, Kai
@ 2022-09-01 23:58                               ` jarkko
  2022-09-02  0:26                                 ` Huang, Kai
  0 siblings, 1 reply; 56+ messages in thread
From: jarkko @ 2022-09-01 23:58 UTC (permalink / raw)
  To: Huang, Kai
  Cc: pmenzel, Hansen, Dave, linux-sgx, x86, dave.hansen, Dhanraj,
	Vijay, Chatre, Reinette, mingo, tglx, bp, haitao.huang, hpa,
	linux-kernel

On Thu, Sep 01, 2022 at 10:41:54PM +0000, Huang, Kai wrote:
> On Fri, 2022-09-02 at 01:27 +0300, jarkko@kernel.org wrote:
> > On Wed, Aug 31, 2022 at 08:42:59PM +0000, Huang, Kai wrote:
> > > On Wed, 2022-08-31 at 11:35 -0700, Dave Hansen wrote:
> > > > Jarkko, Kai and Haitao,
> > > > 
> > > > Can you three please start trimming your replies?  You don't need to and
> > > > should not quote the entirety of your messages every time you reply.
> > > > 
> > > > On 8/31/22 11:28, jarkko@kernel.org wrote:
> > > > > > Will it cause racing if we expose dev nodes to user space before
> > > > > > ksgxd is started and sensitization done?
> > > > > I'll to explain this.
> > > > > 
> > > > > So the point is to fix the issue at hand, and fix it locally.
> > > > > 
> > > > > Changing initialization order is simply out of context. It's
> > > > > not really an argument for or against changing it
> > > > > 
> > > > > We are fixing sanitization here, and only that with zero
> > > > > side-effects to any other semantics.
> > > > > 
> > > > > It's dictated by the development process [*] but more
> > > > > importantly it's also just plain common sense.
> > > > 
> > > > Kai, I think your suggestion is reasonable.  You make a good point about
> > > > not needing ksgxd for vepc.
> > > > 
> > > > *But*, I think it's a bit too much for a bugfix that's headed to
> > > > -stable.  I'm concerned that it will have unintended side effects,
> > > > *especially* when there's a working, tested alternative.
> > > 
> > > Agreed. Thanks Dave/Jarkko.
> > 
> > Please do a patch. It's a very reasonable suggestion when
> > considered out of context of this bug.
> > 
> > If you go really rigid with this, the compilation process
> > should not compile in sanitization process in the case when
> > only vepc is enabled. It's useless functionality in that
> > case.
> > 
> > BR, Jarkko
> 
> Yeah I am planning to work out one to see how it goes.

Looking forward to try it out :-)

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/6] selftests/sgx: Add SGX selftest augment_via_eaccept_long
  2022-09-01 23:11           ` Reinette Chatre
@ 2022-09-02  0:00             ` Jarkko Sakkinen
  2022-09-02  0:02               ` Jarkko Sakkinen
  0 siblings, 1 reply; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-09-02  0:00 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Shuah Khan,
	open list:KERNEL SELFTEST FRAMEWORK, open list

On Thu, Sep 01, 2022 at 04:11:34PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 9/1/2022 3:16 PM, Jarkko Sakkinen wrote:
> > On Wed, Aug 31, 2022 at 11:09:02AM -0700, Reinette Chatre wrote:
> >> On 8/30/2022 7:28 PM, Jarkko Sakkinen wrote:
> >>> On Tue, Aug 30, 2022 at 03:55:47PM -0700, Reinette Chatre wrote:
> >>>> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> 
> >>>> There seems to be at least three patches merged into one here:
> >>>> 1) Update SGX selftests to create enclaves with provided size dedicated
> >>>>    to EDMM (this change causes a lot of noise and distracts from the test
> >>>>    addition).
> >>>> 2) The mrenclave_ecreate() fix (which is still incomplete).
> >>>> 3) The actual test addition.
> >>>
> >>> I would agree on this on a kernel patch but not for kselftest patch. It
> >>> does not really give useful value here. This adds a test and that is a
> >>> good enough granularity in my opinion, unless some major architecture
> >>> work is required as precursory. It is not the case here.
> >>
> >> I must say that for many good reasons this goes against one of the
> >> fundamental rules of kernel patches: separate logical changes into
> >> separate patches. This is your domain though so of course the work
> >> within it follows your guidance and I will not pursue it further.
> > 
> > I don't consider kselftest patch exactly same as kernel patch
> 
> You are not alone.
> 
> > but I can split this. What would be good enough?
> 
> I identified three candidate patches in my original response that
> is quoted above, but as I mentioned I understand the sentiment
> and this is your domain so I will not insist on it.

OK, fair enough, I'll rework on this. It's my domain but
at least my own aim is always only satisfy on consensus
:-)

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/6] selftests/sgx: Add SGX selftest augment_via_eaccept_long
  2022-09-02  0:00             ` Jarkko Sakkinen
@ 2022-09-02  0:02               ` Jarkko Sakkinen
  0 siblings, 0 replies; 56+ messages in thread
From: Jarkko Sakkinen @ 2022-09-02  0:02 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Shuah Khan,
	open list:KERNEL SELFTEST FRAMEWORK, open list

On Fri, Sep 02, 2022 at 03:00:36AM +0300, Jarkko Sakkinen wrote:
> On Thu, Sep 01, 2022 at 04:11:34PM -0700, Reinette Chatre wrote:
> > Hi Jarkko,
> > 
> > On 9/1/2022 3:16 PM, Jarkko Sakkinen wrote:
> > > On Wed, Aug 31, 2022 at 11:09:02AM -0700, Reinette Chatre wrote:
> > >> On 8/30/2022 7:28 PM, Jarkko Sakkinen wrote:
> > >>> On Tue, Aug 30, 2022 at 03:55:47PM -0700, Reinette Chatre wrote:
> > >>>> On 8/29/2022 8:12 PM, Jarkko Sakkinen wrote:
> > 
> > >>>> There seems to be at least three patches merged into one here:
> > >>>> 1) Update SGX selftests to create enclaves with provided size dedicated
> > >>>>    to EDMM (this change causes a lot of noise and distracts from the test
> > >>>>    addition).
> > >>>> 2) The mrenclave_ecreate() fix (which is still incomplete).
> > >>>> 3) The actual test addition.
> > >>>
> > >>> I would agree on this on a kernel patch but not for kselftest patch. It
> > >>> does not really give useful value here. This adds a test and that is a
> > >>> good enough granularity in my opinion, unless some major architecture
> > >>> work is required as precursory. It is not the case here.
> > >>
> > >> I must say that for many good reasons this goes against one of the
> > >> fundamental rules of kernel patches: separate logical changes into
> > >> separate patches. This is your domain though so of course the work
> > >> within it follows your guidance and I will not pursue it further.
> > > 
> > > I don't consider kselftest patch exactly same as kernel patch
> > 
> > You are not alone.
> > 
> > > but I can split this. What would be good enough?
> > 
> > I identified three candidate patches in my original response that
> > is quoted above, but as I mentioned I understand the sentiment
> > and this is your domain so I will not insist on it.
> 
> OK, fair enough, I'll rework on this. It's my domain but
> at least my own aim is always only satisfy on consensus
> :-)

I'll also split the patch set because this is not as urgent
as getting the fixes in. There will be separate fixes and
kselftest improvements patch sets.

BR, Jarkko

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error
  2022-09-01 23:58                               ` jarkko
@ 2022-09-02  0:26                                 ` Huang, Kai
  0 siblings, 0 replies; 56+ messages in thread
From: Huang, Kai @ 2022-09-02  0:26 UTC (permalink / raw)
  To: jarkko
  Cc: pmenzel, Hansen, Dave, linux-sgx, x86, dave.hansen, Dhanraj,
	Vijay, Chatre, Reinette, mingo, tglx, bp, haitao.huang, hpa,
	linux-kernel

> >
> > Yeah I am planning to work out one to see how it goes.
> 
> Looking forward to try it out :-)

Just let you know I'll work on it perhaps late next week or the week after, since I have other higher priority tasks to do. :)

Thanks,
-Kai


^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2022-09-02  0:27 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-30  3:12 [PATCH 0/6] x86/sgx: Test and fixes Jarkko Sakkinen
2022-08-30  3:12 ` [PATCH 1/6] x86/sgx: Do not consider unsanitized pages an error Jarkko Sakkinen
2022-08-30 22:54   ` Reinette Chatre
2022-08-31  1:27     ` Huang, Kai
2022-08-31  2:15       ` jarkko
2022-08-31  2:35         ` Huang, Kai
2022-08-31  2:44           ` jarkko
2022-08-31  2:55             ` Huang, Kai
2022-08-31  2:57               ` jarkko
2022-08-31  3:10                 ` Jarkko Sakkinen
2022-08-31  3:28                   ` Huang, Kai
2022-08-31  3:40                     ` jarkko
2022-08-31  3:17                 ` Huang, Kai
2022-08-31 15:18                   ` Haitao Huang
2022-08-31 18:28                     ` jarkko
2022-08-31 18:35                       ` Dave Hansen
2022-08-31 18:44                         ` jarkko
2022-08-31 18:45                         ` jarkko
2022-08-31 20:42                         ` Huang, Kai
2022-09-01 22:27                           ` jarkko
2022-09-01 22:41                             ` Huang, Kai
2022-09-01 23:58                               ` jarkko
2022-09-02  0:26                                 ` Huang, Kai
2022-08-31  1:55     ` Jarkko Sakkinen
2022-08-31  1:58       ` Jarkko Sakkinen
2022-08-31  2:01         ` Jarkko Sakkinen
2022-08-31 18:08       ` Reinette Chatre
2022-08-30  3:12 ` [PATCH 2/6] x86/sgx: Handle VA page allocation failure for EAUG on PF Jarkko Sakkinen
2022-08-30 22:54   ` Reinette Chatre
2022-08-30  3:12 ` [PATCH 3/6] selftests/sgx: Ignore OpenSSL 3.0 deprecated functions warning Jarkko Sakkinen
2022-08-30 18:18   ` Reinette Chatre
2022-08-31  1:07     ` Jarkko Sakkinen
2022-08-30  3:12 ` [PATCH 4/6] selftests/sgx: Add SGX selftest augment_via_eaccept_long Jarkko Sakkinen
2022-08-30 22:55   ` Reinette Chatre
2022-08-31  2:28     ` Jarkko Sakkinen
2022-08-31 18:09       ` Reinette Chatre
2022-09-01 22:16         ` Jarkko Sakkinen
2022-09-01 23:11           ` Reinette Chatre
2022-09-02  0:00             ` Jarkko Sakkinen
2022-09-02  0:02               ` Jarkko Sakkinen
2022-08-30  3:12 ` [PATCH 5/6] selftests/sgx: retry the ioctls returned with EAGAIN Jarkko Sakkinen
2022-08-30 22:56   ` Reinette Chatre
2022-08-31  2:31     ` Jarkko Sakkinen
2022-08-31 18:09       ` Reinette Chatre
2022-09-01 22:17         ` Jarkko Sakkinen
2022-08-31 18:14       ` Dave Hansen
2022-09-01 22:18         ` Jarkko Sakkinen
2022-08-30  3:12 ` [PATCH 6/6] selftests/sgx: Add a bpftrace script for tracking allocation errors Jarkko Sakkinen
2022-08-30 22:57   ` Reinette Chatre
2022-08-31  2:33     ` Jarkko Sakkinen
2022-08-31 18:10       ` Reinette Chatre
2022-08-31 18:23         ` Jarkko Sakkinen
2022-08-31 18:23   ` Dave Hansen
2022-09-01 22:20     ` Jarkko Sakkinen
2022-09-01 22:34       ` Dave Hansen
2022-09-01 23:55         ` Jarkko Sakkinen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).