linux-sgx.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] Basic recovery for machine checks inside SGX
@ 2021-07-08 18:14 Tony Luck
  2021-07-08 18:14 ` [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages Tony Luck
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Tony Luck @ 2021-07-08 18:14 UTC (permalink / raw)
  To: tony.luck; +Cc: Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

Cover the easy cases:
1) memory errors reported by patrol scrubber in unused SGX pages
2) machine checks due to poison consumption from SGX_PAGE_TYPE_REG
   pages
3) When poison is consumed in an enclave inside a guest, just kill
   the guest.

Tony Luck (4):
  x86/sgx: Track phase and type of SGX EPC pages
  x86/sgx: Add basic infrastructure to recover from errors in SGX memory
  x86/sgx: Hook sgx_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation

 .../firmware-guide/acpi/apei/einj.rst         |  19 +++
 arch/x86/include/asm/sgx.h                    |   6 +
 arch/x86/kernel/cpu/sgx/encl.c                |   4 +-
 arch/x86/kernel/cpu/sgx/ioctl.c               |   4 +-
 arch/x86/kernel/cpu/sgx/main.c                | 147 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |  17 +-
 arch/x86/kernel/cpu/sgx/virt.c                |  11 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 include/linux/mm.h                            |  15 ++
 mm/memory-failure.c                           |   4 +
 10 files changed, 219 insertions(+), 11 deletions(-)


base-commit: 62fb9874f5da54fdb243003b386128037319b219
-- 
2.29.2


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages
  2021-07-08 18:14 [PATCH 0/4] Basic recovery for machine checks inside SGX Tony Luck
@ 2021-07-08 18:14 ` Tony Luck
  2021-07-09 18:08   ` Jarkko Sakkinen
  2021-07-14 20:42   ` Reinette Chatre
  2021-07-08 18:14 ` [PATCH 2/4] x86/sgx: Add basic infrastructure to recover from errors in SGX memory Tony Luck
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 13+ messages in thread
From: Tony Luck @ 2021-07-08 18:14 UTC (permalink / raw)
  To: tony.luck; +Cc: Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

Memory errors can be reported either synchronously as memory is accessed,
or asynchronously by speculative access or by a memory controller page
scrubber.  The life cycle of an EPC page takes it through:
	dirty -> free -> in-use -> free.

Memory errors are reported using physical addresses. It is a simple
matter to find which sgx_epc_page structure maps a given address.
But then recovery code needs to be able to determine the current use of
the page to take the appropriate recovery action. Within the "in-use"
phase different actions are needed based on how the page is used in
the enclave.

Add new flags bits to describe the phase (with an extra bit for the new
phase of "poisoned"). Drop pages marked as poisoned instead of adding
them to a free list to make sure they are not re-used.

Add a type field to struct epc_page for how an in-use page has been
allocated. Re-use "enum sgx_page_type" for this type, with a couple
of additions for s/w types.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/sgx.h      |  6 ++++++
 arch/x86/kernel/cpu/sgx/encl.c  |  4 ++--
 arch/x86/kernel/cpu/sgx/ioctl.c |  4 ++--
 arch/x86/kernel/cpu/sgx/main.c  | 21 +++++++++++++++++++--
 arch/x86/kernel/cpu/sgx/sgx.h   | 14 ++++++++++++--
 arch/x86/kernel/cpu/sgx/virt.c  |  2 +-
 6 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 9c31e0ebc55b..9619a6d77a83 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -216,6 +216,8 @@ struct sgx_pageinfo {
  * %SGX_PAGE_TYPE_REG:	a regular page
  * %SGX_PAGE_TYPE_VA:	a VA page
  * %SGX_PAGE_TYPE_TRIM:	a page in trimmed state
+ *
+ * Also used to track current use of &struct sgx_epc_page
  */
 enum sgx_page_type {
 	SGX_PAGE_TYPE_SECS,
@@ -223,6 +225,10 @@ enum sgx_page_type {
 	SGX_PAGE_TYPE_REG,
 	SGX_PAGE_TYPE_VA,
 	SGX_PAGE_TYPE_TRIM,
+
+	/* sgx_epc_page.type */
+	SGX_PAGE_TYPE_FREE = 100,
+	SGX_PAGE_TYPE_KVM = 101,
 };
 
 #define SGX_NR_PAGE_TYPES	5
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 3be203297988..abf6e1a704c0 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -72,7 +72,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	struct sgx_epc_page *epc_page;
 	int ret;
 
-	epc_page = sgx_alloc_epc_page(encl_page, false);
+	epc_page = sgx_alloc_epc_page(encl_page, SGX_PAGE_TYPE_REG, false);
 	if (IS_ERR(epc_page))
 		return epc_page;
 
@@ -679,7 +679,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void)
 	struct sgx_epc_page *epc_page;
 	int ret;
 
-	epc_page = sgx_alloc_epc_page(NULL, true);
+	epc_page = sgx_alloc_epc_page(NULL,  SGX_PAGE_TYPE_VA, true);
 	if (IS_ERR(epc_page))
 		return ERR_CAST(epc_page);
 
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 83df20e3e633..a74ae00194cc 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -83,7 +83,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 
 	encl->backing = backing;
 
-	secs_epc = sgx_alloc_epc_page(&encl->secs, true);
+	secs_epc = sgx_alloc_epc_page(&encl->secs, SGX_PAGE_TYPE_SECS, true);
 	if (IS_ERR(secs_epc)) {
 		ret = PTR_ERR(secs_epc);
 		goto err_out_backing;
@@ -300,7 +300,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
 	if (IS_ERR(encl_page))
 		return PTR_ERR(encl_page);
 
-	epc_page = sgx_alloc_epc_page(encl_page, true);
+	epc_page = sgx_alloc_epc_page(encl_page, SGX_PAGE_TYPE_REG, true);
 	if (IS_ERR(epc_page)) {
 		kfree(encl_page);
 		return PTR_ERR(epc_page);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de02bbcc..643df87b3e01 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -401,7 +401,12 @@ static void sgx_reclaim_pages(void)
 		section = &sgx_epc_sections[epc_page->section];
 		node = section->node;
 
+		/* drop poison pages instead of adding to free list */
+		if (epc_page->flags & SGX_EPC_PAGE_POISON)
+			continue;
+
 		spin_lock(&node->lock);
+		epc_page->flags = SGX_EPC_PAGE_FREE;
 		list_add_tail(&epc_page->list, &node->free_page_list);
 		sgx_nr_free_pages++;
 		spin_unlock(&node->lock);
@@ -560,6 +565,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
 /**
  * sgx_alloc_epc_page() - Allocate an EPC page
  * @owner:	the owner of the EPC page
+ * @type:	type of page being allocated
  * @reclaim:	reclaim pages if necessary
  *
  * Iterate through EPC sections and borrow a free EPC page to the caller. When a
@@ -574,7 +580,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
  *   an EPC page,
  *   -errno on error
  */
-struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
+struct sgx_epc_page *sgx_alloc_epc_page(void *owner, enum sgx_page_type type, bool reclaim)
 {
 	struct sgx_epc_page *page;
 
@@ -582,6 +588,8 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 		page = __sgx_alloc_epc_page();
 		if (!IS_ERR(page)) {
 			page->owner = owner;
+			page->type = type;
+			page->flags = 0;
 			break;
 		}
 
@@ -616,14 +624,22 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
  * responsibility to make sure that the page is in uninitialized state. In other
  * words, do EREMOVE, EWB or whatever operation is necessary before calling
  * this function.
+ *
+ * Note that if the page has been tagged as poisoned, it is simply
+ * dropped on the floor instead of added to the free list to make
+ * sure we do not re-use it.
  */
 void sgx_free_epc_page(struct sgx_epc_page *page)
 {
 	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
 	struct sgx_numa_node *node = section->node;
 
+	if (page->flags & SGX_EPC_PAGE_POISON)
+		return;
+
 	spin_lock(&node->lock);
 
+	page->flags = SGX_EPC_PAGE_FREE;
 	list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 
@@ -651,7 +667,8 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
-		section->pages[i].flags = 0;
+		section->pages[i].flags = SGX_EPC_PAGE_DIRTY;
+		section->pages[i].type = SGX_PAGE_TYPE_FREE;
 		section->pages[i].owner = NULL;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..e43d3c27eb96 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -26,9 +26,19 @@
 /* Pages, which are being tracked by the page reclaimer. */
 #define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
 
+/* Pages, on the "sgx_dirty_page_list" */
+#define SGX_EPC_PAGE_DIRTY		BIT(1)
+
+/* Pages, on one of the node free lists */
+#define SGX_EPC_PAGE_FREE		BIT(2)
+
+/* Pages, with h/w poison errors */
+#define SGX_EPC_PAGE_POISON		BIT(3)
+
 struct sgx_epc_page {
 	unsigned int section;
-	unsigned int flags;
+	u16 flags;
+	u16 type;
 	struct sgx_encl_page *owner;
 	struct list_head list;
 };
@@ -82,7 +92,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page);
 
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
-struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
+struct sgx_epc_page *sgx_alloc_epc_page(void *owner, enum sgx_page_type type, bool reclaim);
 
 #ifdef CONFIG_X86_SGX_KVM
 int __init sgx_vepc_init(void);
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 64511c4a5200..044dd92ebd63 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -46,7 +46,7 @@ static int __sgx_vepc_fault(struct sgx_vepc *vepc,
 	if (epc_page)
 		return 0;
 
-	epc_page = sgx_alloc_epc_page(vepc, false);
+	epc_page = sgx_alloc_epc_page(vepc, SGX_PAGE_TYPE_KVM, false);
 	if (IS_ERR(epc_page))
 		return PTR_ERR(epc_page);
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/4] x86/sgx: Add basic infrastructure to recover from errors in SGX memory
  2021-07-08 18:14 [PATCH 0/4] Basic recovery for machine checks inside SGX Tony Luck
  2021-07-08 18:14 ` [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages Tony Luck
@ 2021-07-08 18:14 ` Tony Luck
  2021-07-08 18:14 ` [PATCH 3/4] x86/sgx: Hook sgx_memory_failure() into mainline code Tony Luck
  2021-07-08 18:14 ` [PATCH 4/4] x86/sgx: Add hook to error injection address validation Tony Luck
  3 siblings, 0 replies; 13+ messages in thread
From: Tony Luck @ 2021-07-08 18:14 UTC (permalink / raw)
  To: tony.luck; +Cc: Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

X86 machine check architecture reports a physical address when there is
a memory error.

Add an end_phys_addr field to the sgx_epc_section structure and a new
function sgx_paddr_to_page() that searches all such structures to see
if a physical address is part of an SGX EPC page.

The ACPI EINJ module has a sanity check that the injection address is
valid. Add and export a function sgx_is_epc_page() so that it can be
changed to allow injection to SGX EPC pages.

Provide a recovery function sgx_memory_failure(). If the poison was
consumed synchronously inside the enclave send a SIGBUS just the
same as for poison consumption outside of an enclave.

For asynchronously reported errors the easiest cases are when the page
is currently free. Just drop the page from the free list so that it will
never be allocated.

An SGX_PAGE_TYPE_REG page can just be unmapped from the enclave. If the
enclave doesn't ever touch that page again all is well. If it does touch
the page it will die. Possible future enhancement here to mark the
unmapped PTE so that it will cause a SIGBUS.

SGX_PAGE_TYPE_KVM pages may be recoverable depending on how they are
being used by guests. To fix that properly requires injecting the machine
check into the guest. For now just kill it.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 126 +++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h  |   3 +-
 arch/x86/kernel/cpu/sgx/virt.c |   9 +++
 3 files changed, 137 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 643df87b3e01..4a354f89373e 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -664,6 +664,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	section->end_phys_addr = phys_addr + size - 1;
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -676,6 +677,131 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
+		section = &sgx_epc_sections[i];
+
+		if (paddr < section->phys_addr || paddr > section->end_phys_addr)
+			continue;
+
+		return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+	}
+
+	return NULL;
+}
+
+bool sgx_is_epc_page(u64 paddr)
+{
+	return !!sgx_paddr_to_page(paddr);
+}
+EXPORT_SYMBOL_GPL(sgx_is_epc_page);
+
+void __attribute__((weak)) sgx_memory_failure_vepc(struct sgx_epc_page *epc_page)
+{
+}
+
+/*
+ * This can be called in three ways:
+ * a) When an enclave has just consumed poison. In this case
+ *    calling process context is the owner of the enclave.
+ * b) For some asychronous poison notification (e.g. patrol
+ *    scrubber or speculative execution saw the poison.)
+ *    In this case calling context is a kernel thread.
+ * c) Someone asked to take a page offline using the
+ *    /sys/devices/system/memory/{hard,soft}_offline_page.
+ *    In this case caller is the process writing these files.
+ */
+int sgx_memory_failure(unsigned long pfn, int flags)
+{
+	struct sgx_epc_page *epc_page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+	struct sgx_epc_section *section;
+	struct sgx_encl_page *encl_page;
+	struct sgx_numa_node *node;
+	unsigned long addr;
+	u16 page_flags;
+
+	if (!epc_page)
+		return -ENXIO;
+
+	spin_lock(&sgx_reclaimer_lock);
+
+	page_flags = epc_page->flags;
+	epc_page->flags |= SGX_EPC_PAGE_POISON;
+
+	/*
+	 * Poison was synchronously consumed by an enclave in the current
+	 * task. Send a SIGBUS here. Hardware should prevent this enclave
+	 * from being re-entered, so no concern that the poisoned page
+	 * will be touched a second time. The poisoned EPC page will be
+	 * dropped as pages are freed during task exit.
+	 */
+	if (flags & MF_ACTION_REQUIRED) {
+		if (epc_page->type == SGX_PAGE_TYPE_REG) {
+			encl_page = epc_page->owner;
+			addr = encl_page->desc & PAGE_MASK;
+			force_sig_mceerr(BUS_MCEERR_AR, (void *)addr, PAGE_SHIFT);
+		} else {
+			force_sig(SIGBUS);
+		}
+		goto unlock;
+	}
+
+	section = &sgx_epc_sections[epc_page->section];
+	node = section->node;
+
+	if (page_flags & SGX_EPC_PAGE_POISON)
+		goto unlock;
+
+	if (page_flags & SGX_EPC_PAGE_DIRTY) {
+		list_del(&epc_page->list);
+	} else if (page_flags & SGX_EPC_PAGE_FREE) {
+		spin_lock(&node->lock);
+		epc_page->owner = NULL;
+		list_del(&epc_page->list);
+		sgx_nr_free_pages--;
+		spin_unlock(&node->lock);
+		goto unlock;
+	}
+
+	switch (epc_page->type) {
+	case SGX_PAGE_TYPE_REG:
+		encl_page = epc_page->owner;
+		/* Unmap the page, unless it was already in progress to be freed */
+		if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
+			spin_unlock(&sgx_reclaimer_lock);
+			sgx_reclaimer_block(epc_page);
+			kref_put(&encl_page->encl->refcount, sgx_encl_release);
+			goto done;
+		}
+		break;
+
+	case SGX_PAGE_TYPE_KVM:
+		spin_unlock(&sgx_reclaimer_lock);
+		sgx_memory_failure_vepc(epc_page);
+		break;
+
+	default:
+		/*
+		 * I don't expect SECS, TCS, VA pages would result
+		 * in recoverable machine checks. If it turns out
+		 * that they do, then add cases for recovery for
+		 * each of them.
+		 */
+		panic("Poisoned active SGX page type %d\n", epc_page->type);
+		break;
+	}
+	goto done;
+
+unlock:
+	spin_unlock(&sgx_reclaimer_lock);
+done:
+	return 0;
+}
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index e43d3c27eb96..7701f5f88b61 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -39,7 +39,7 @@ struct sgx_epc_page {
 	unsigned int section;
 	u16 flags;
 	u16 type;
-	struct sgx_encl_page *owner;
+	void *owner;
 	struct list_head list;
 };
 
@@ -60,6 +60,7 @@ struct sgx_numa_node {
  */
 struct sgx_epc_section {
 	unsigned long phys_addr;
+	unsigned long end_phys_addr;
 	void *virt_addr;
 	struct sgx_epc_page *pages;
 	struct sgx_numa_node *node;
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 044dd92ebd63..08918166394f 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -21,6 +21,7 @@
 struct sgx_vepc {
 	struct xarray page_array;
 	struct mutex lock;
+	struct task_struct *task;
 };
 
 /*
@@ -218,6 +219,13 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
+void sgx_memory_failure_vepc(struct sgx_epc_page *epc_page)
+{
+	struct sgx_vepc *vepc = epc_page->owner;
+
+	send_sig(SIGBUS, vepc->task, false);
+}
+
 static int sgx_vepc_open(struct inode *inode, struct file *file)
 {
 	struct sgx_vepc *vepc;
@@ -227,6 +235,7 @@ static int sgx_vepc_open(struct inode *inode, struct file *file)
 		return -ENOMEM;
 	mutex_init(&vepc->lock);
 	xa_init(&vepc->page_array);
+	vepc->task = current;
 
 	file->private_data = vepc;
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/4] x86/sgx: Hook sgx_memory_failure() into mainline code
  2021-07-08 18:14 [PATCH 0/4] Basic recovery for machine checks inside SGX Tony Luck
  2021-07-08 18:14 ` [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages Tony Luck
  2021-07-08 18:14 ` [PATCH 2/4] x86/sgx: Add basic infrastructure to recover from errors in SGX memory Tony Luck
@ 2021-07-08 18:14 ` Tony Luck
  2021-07-08 18:14 ` [PATCH 4/4] x86/sgx: Add hook to error injection address validation Tony Luck
  3 siblings, 0 replies; 13+ messages in thread
From: Tony Luck @ 2021-07-08 18:14 UTC (permalink / raw)
  To: tony.luck; +Cc: Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

Add a call inside memory_failure() to check if the address is an SGX
EPC page and handle it.

Note the SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/mm.h  | 9 +++++++++
 mm/memory-failure.c | 4 ++++
 2 files changed, 13 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8ae31622deef..1b9d0912942a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3251,5 +3251,14 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
 	return 0;
 }
 
+#ifdef CONFIG_X86_SGX
+int sgx_memory_failure(unsigned long pfn, int flags);
+#else
+static inline int sgx_memory_failure(unsigned long pfn, int flags)
+{
+	return -ENXIO;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 6f5f78885ab4..02b1c58729c1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1453,6 +1453,10 @@ int memory_failure(unsigned long pfn, int flags)
 
 	p = pfn_to_online_page(pfn);
 	if (!p) {
+		res = sgx_memory_failure(pfn, flags);
+		if (res == 0)
+			return 0;
+
 		if (pfn_valid(pfn)) {
 			pgmap = get_dev_pagemap(pfn, NULL);
 			if (pgmap)
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/4] x86/sgx: Add hook to error injection address validation
  2021-07-08 18:14 [PATCH 0/4] Basic recovery for machine checks inside SGX Tony Luck
                   ` (2 preceding siblings ...)
  2021-07-08 18:14 ` [PATCH 3/4] x86/sgx: Hook sgx_memory_failure() into mainline code Tony Luck
@ 2021-07-08 18:14 ` Tony Luck
  3 siblings, 0 replies; 13+ messages in thread
From: Tony Luck @ 2021-07-08 18:14 UTC (permalink / raw)
  To: tony.luck; +Cc: Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

SGX reserved memory does not appear in the standard address maps.

Add hook to call into the SGX code to check if an address is located
in SGX memory.

There are other challenges in injecting errors into SGX. Update the
documentation with a sequence of operations to inject.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 .../firmware-guide/acpi/apei/einj.rst         | 19 +++++++++++++++++++
 drivers/acpi/apei/einj.c                      |  3 ++-
 include/linux/mm.h                            |  6 ++++++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index c042176e1707..55e2331a6438 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,5 +181,24 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+Special notes for injection into SGX enclaves:
+
+There may be a separate BIOS setup option to enable SGX injection.
+
+The injection process consists of setting some special memory controller
+trigger that will inject the error on the next write to the target
+address. But the h/w prevents any software outside of an SGX enclave
+from accessing enclave pages (even BIOS SMM mode).
+
+The following sequence can be used:
+  1) Determine physical address of enclave page
+  2) Use "notrigger=1" mode to inject (this will setup
+     the injection address, but will not actually inject)
+  3) Enter the enclave
+  4) Store data to the virtual address matching physical address from step 1
+  5) Execute CLFLUSH for that virtual address
+  6) Spin delay for 250ms
+  7) Read from the virtual address. This will trigger the error
+
 For more information about EINJ, please refer to ACPI specification
 version 4.0, section 17.5 and ACPI 5.0, section 18.6.
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 328e8aeece6c..fb634219e232 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -544,7 +544,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	    ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
 				!= REGION_INTERSECTS) &&
 	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
-				!= REGION_INTERSECTS)))
+				!= REGION_INTERSECTS) &&
+	     !sgx_is_epc_page(base_addr)))
 		return -EINVAL;
 
 inject:
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1b9d0912942a..47eb960516cf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3253,11 +3253,17 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
 
 #ifdef CONFIG_X86_SGX
 int sgx_memory_failure(unsigned long pfn, int flags);
+bool sgx_is_epc_page(u64 paddr);
 #else
 static inline int sgx_memory_failure(unsigned long pfn, int flags)
 {
 	return -ENXIO;
 }
+
+static inline bool sgx_is_epc_page(u64 paddr)
+{
+	return false;
+}
 #endif
 
 #endif /* __KERNEL__ */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages
  2021-07-08 18:14 ` [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages Tony Luck
@ 2021-07-09 18:08   ` Jarkko Sakkinen
  2021-07-09 18:09     ` Jarkko Sakkinen
  2021-07-14 20:42   ` Reinette Chatre
  1 sibling, 1 reply; 13+ messages in thread
From: Jarkko Sakkinen @ 2021-07-09 18:08 UTC (permalink / raw)
  To: Tony Luck; +Cc: Dave Hansen, x86, linux-sgx, linux-kernel

On Thu, Jul 08, 2021 at 11:14:20AM -0700, Tony Luck wrote:
> Memory errors can be reported either synchronously as memory is accessed,
> or asynchronously by speculative access or by a memory controller page
> scrubber.  The life cycle of an EPC page takes it through:
> 	dirty -> free -> in-use -> free.
> 
> Memory errors are reported using physical addresses. It is a simple
> matter to find which sgx_epc_page structure maps a given address.
> But then recovery code needs to be able to determine the current use of
> the page to take the appropriate recovery action. Within the "in-use"
> phase different actions are needed based on how the page is used in
> the enclave.
> 
> Add new flags bits to describe the phase (with an extra bit for the new
> phase of "poisoned"). Drop pages marked as poisoned instead of adding
> them to a free list to make sure they are not re-used.
> 
> Add a type field to struct epc_page for how an in-use page has been
> allocated. Re-use "enum sgx_page_type" for this type, with a couple
> of additions for s/w types.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/include/asm/sgx.h      |  6 ++++++
>  arch/x86/kernel/cpu/sgx/encl.c  |  4 ++--
>  arch/x86/kernel/cpu/sgx/ioctl.c |  4 ++--
>  arch/x86/kernel/cpu/sgx/main.c  | 21 +++++++++++++++++++--
>  arch/x86/kernel/cpu/sgx/sgx.h   | 14 ++++++++++++--
>  arch/x86/kernel/cpu/sgx/virt.c  |  2 +-
>  6 files changed, 42 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
> index 9c31e0ebc55b..9619a6d77a83 100644
> --- a/arch/x86/include/asm/sgx.h
> +++ b/arch/x86/include/asm/sgx.h
> @@ -216,6 +216,8 @@ struct sgx_pageinfo {
>   * %SGX_PAGE_TYPE_REG:	a regular page
>   * %SGX_PAGE_TYPE_VA:	a VA page
>   * %SGX_PAGE_TYPE_TRIM:	a page in trimmed state
> + *
> + * Also used to track current use of &struct sgx_epc_page
>   */
>  enum sgx_page_type {
>  	SGX_PAGE_TYPE_SECS,
> @@ -223,6 +225,10 @@ enum sgx_page_type {
>  	SGX_PAGE_TYPE_REG,
>  	SGX_PAGE_TYPE_VA,
>  	SGX_PAGE_TYPE_TRIM,
> +
> +	/* sgx_epc_page.type */
> +	SGX_PAGE_TYPE_FREE = 100,
> +	SGX_PAGE_TYPE_KVM = 101,
>  };
>  
>  #define SGX_NR_PAGE_TYPES	5
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 3be203297988..abf6e1a704c0 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -72,7 +72,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  	struct sgx_epc_page *epc_page;
>  	int ret;
>  
> -	epc_page = sgx_alloc_epc_page(encl_page, false);
> +	epc_page = sgx_alloc_epc_page(encl_page, SGX_PAGE_TYPE_REG, false);
>  	if (IS_ERR(epc_page))
>  		return epc_page;
>  
> @@ -679,7 +679,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void)
>  	struct sgx_epc_page *epc_page;
>  	int ret;
>  
> -	epc_page = sgx_alloc_epc_page(NULL, true);
> +	epc_page = sgx_alloc_epc_page(NULL,  SGX_PAGE_TYPE_VA, true);
>  	if (IS_ERR(epc_page))
>  		return ERR_CAST(epc_page);
>  
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index 83df20e3e633..a74ae00194cc 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -83,7 +83,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
>  
>  	encl->backing = backing;
>  
> -	secs_epc = sgx_alloc_epc_page(&encl->secs, true);
> +	secs_epc = sgx_alloc_epc_page(&encl->secs, SGX_PAGE_TYPE_SECS, true);
>  	if (IS_ERR(secs_epc)) {
>  		ret = PTR_ERR(secs_epc);
>  		goto err_out_backing;
> @@ -300,7 +300,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
>  	if (IS_ERR(encl_page))
>  		return PTR_ERR(encl_page);
>  
> -	epc_page = sgx_alloc_epc_page(encl_page, true);
> +	epc_page = sgx_alloc_epc_page(encl_page, SGX_PAGE_TYPE_REG, true);
>  	if (IS_ERR(epc_page)) {
>  		kfree(encl_page);
>  		return PTR_ERR(epc_page);
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 63d3de02bbcc..643df87b3e01 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -401,7 +401,12 @@ static void sgx_reclaim_pages(void)
>  		section = &sgx_epc_sections[epc_page->section];
>  		node = section->node;
>  
> +		/* drop poison pages instead of adding to free list */
> +		if (epc_page->flags & SGX_EPC_PAGE_POISON)
> +			continue;
> +
>  		spin_lock(&node->lock);
> +		epc_page->flags = SGX_EPC_PAGE_FREE;
>  		list_add_tail(&epc_page->list, &node->free_page_list);
>  		sgx_nr_free_pages++;
>  		spin_unlock(&node->lock);
> @@ -560,6 +565,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
>  /**
>   * sgx_alloc_epc_page() - Allocate an EPC page
>   * @owner:	the owner of the EPC page
> + * @type:	type of page being allocated
>   * @reclaim:	reclaim pages if necessary
>   *
>   * Iterate through EPC sections and borrow a free EPC page to the caller. When a
> @@ -574,7 +580,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
>   *   an EPC page,
>   *   -errno on error
>   */
> -struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
> +struct sgx_epc_page *sgx_alloc_epc_page(void *owner, enum sgx_page_type type, bool reclaim)
>  {
>  	struct sgx_epc_page *page;
>  
> @@ -582,6 +588,8 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
>  		page = __sgx_alloc_epc_page();
>  		if (!IS_ERR(page)) {
>  			page->owner = owner;
> +			page->type = type;
> +			page->flags = 0;
>  			break;
>  		}
>  
> @@ -616,14 +624,22 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
>   * responsibility to make sure that the page is in uninitialized state. In other
>   * words, do EREMOVE, EWB or whatever operation is necessary before calling
>   * this function.
> + *
> + * Note that if the page has been tagged as poisoned, it is simply
> + * dropped on the floor instead of added to the free list to make
> + * sure we do not re-use it.
>   */
>  void sgx_free_epc_page(struct sgx_epc_page *page)
>  {
>  	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
>  	struct sgx_numa_node *node = section->node;
>  
> +	if (page->flags & SGX_EPC_PAGE_POISON)
> +		return;

I tend to think that it would be nice to collect them somewhere instead
purposely leaking. E.g. this gives possibility to examine list with
debugging tools.

/Jarkko

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages
  2021-07-09 18:08   ` Jarkko Sakkinen
@ 2021-07-09 18:09     ` Jarkko Sakkinen
  0 siblings, 0 replies; 13+ messages in thread
From: Jarkko Sakkinen @ 2021-07-09 18:09 UTC (permalink / raw)
  To: Tony Luck; +Cc: Dave Hansen, x86, linux-sgx, linux-kernel

On Fri, Jul 09, 2021 at 09:08:03PM +0300, Jarkko Sakkinen wrote:
> On Thu, Jul 08, 2021 at 11:14:20AM -0700, Tony Luck wrote:
> > Memory errors can be reported either synchronously as memory is accessed,
> > or asynchronously by speculative access or by a memory controller page
> > scrubber.  The life cycle of an EPC page takes it through:
> > 	dirty -> free -> in-use -> free.
> > 
> > Memory errors are reported using physical addresses. It is a simple
> > matter to find which sgx_epc_page structure maps a given address.
> > But then recovery code needs to be able to determine the current use of
> > the page to take the appropriate recovery action. Within the "in-use"
> > phase different actions are needed based on how the page is used in
> > the enclave.
> > 
> > Add new flags bits to describe the phase (with an extra bit for the new
> > phase of "poisoned"). Drop pages marked as poisoned instead of adding
> > them to a free list to make sure they are not re-used.
> > 
> > Add a type field to struct epc_page for how an in-use page has been
> > allocated. Re-use "enum sgx_page_type" for this type, with a couple
> > of additions for s/w types.
> > 
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  arch/x86/include/asm/sgx.h      |  6 ++++++
> >  arch/x86/kernel/cpu/sgx/encl.c  |  4 ++--
> >  arch/x86/kernel/cpu/sgx/ioctl.c |  4 ++--
> >  arch/x86/kernel/cpu/sgx/main.c  | 21 +++++++++++++++++++--
> >  arch/x86/kernel/cpu/sgx/sgx.h   | 14 ++++++++++++--
> >  arch/x86/kernel/cpu/sgx/virt.c  |  2 +-
> >  6 files changed, 42 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
> > index 9c31e0ebc55b..9619a6d77a83 100644
> > --- a/arch/x86/include/asm/sgx.h
> > +++ b/arch/x86/include/asm/sgx.h
> > @@ -216,6 +216,8 @@ struct sgx_pageinfo {
> >   * %SGX_PAGE_TYPE_REG:	a regular page
> >   * %SGX_PAGE_TYPE_VA:	a VA page
> >   * %SGX_PAGE_TYPE_TRIM:	a page in trimmed state
> > + *
> > + * Also used to track current use of &struct sgx_epc_page
> >   */
> >  enum sgx_page_type {
> >  	SGX_PAGE_TYPE_SECS,
> > @@ -223,6 +225,10 @@ enum sgx_page_type {
> >  	SGX_PAGE_TYPE_REG,
> >  	SGX_PAGE_TYPE_VA,
> >  	SGX_PAGE_TYPE_TRIM,
> > +
> > +	/* sgx_epc_page.type */
> > +	SGX_PAGE_TYPE_FREE = 100,
> > +	SGX_PAGE_TYPE_KVM = 101,
> >  };
> >  
> >  #define SGX_NR_PAGE_TYPES	5
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > index 3be203297988..abf6e1a704c0 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -72,7 +72,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
> >  	struct sgx_epc_page *epc_page;
> >  	int ret;
> >  
> > -	epc_page = sgx_alloc_epc_page(encl_page, false);
> > +	epc_page = sgx_alloc_epc_page(encl_page, SGX_PAGE_TYPE_REG, false);
> >  	if (IS_ERR(epc_page))
> >  		return epc_page;
> >  
> > @@ -679,7 +679,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void)
> >  	struct sgx_epc_page *epc_page;
> >  	int ret;
> >  
> > -	epc_page = sgx_alloc_epc_page(NULL, true);
> > +	epc_page = sgx_alloc_epc_page(NULL,  SGX_PAGE_TYPE_VA, true);
> >  	if (IS_ERR(epc_page))
> >  		return ERR_CAST(epc_page);
> >  
> > diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> > index 83df20e3e633..a74ae00194cc 100644
> > --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> > +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> > @@ -83,7 +83,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
> >  
> >  	encl->backing = backing;
> >  
> > -	secs_epc = sgx_alloc_epc_page(&encl->secs, true);
> > +	secs_epc = sgx_alloc_epc_page(&encl->secs, SGX_PAGE_TYPE_SECS, true);
> >  	if (IS_ERR(secs_epc)) {
> >  		ret = PTR_ERR(secs_epc);
> >  		goto err_out_backing;
> > @@ -300,7 +300,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
> >  	if (IS_ERR(encl_page))
> >  		return PTR_ERR(encl_page);
> >  
> > -	epc_page = sgx_alloc_epc_page(encl_page, true);
> > +	epc_page = sgx_alloc_epc_page(encl_page, SGX_PAGE_TYPE_REG, true);
> >  	if (IS_ERR(epc_page)) {
> >  		kfree(encl_page);
> >  		return PTR_ERR(epc_page);
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > index 63d3de02bbcc..643df87b3e01 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -401,7 +401,12 @@ static void sgx_reclaim_pages(void)
> >  		section = &sgx_epc_sections[epc_page->section];
> >  		node = section->node;
> >  
> > +		/* drop poison pages instead of adding to free list */
> > +		if (epc_page->flags & SGX_EPC_PAGE_POISON)
> > +			continue;
> > +
> >  		spin_lock(&node->lock);
> > +		epc_page->flags = SGX_EPC_PAGE_FREE;
> >  		list_add_tail(&epc_page->list, &node->free_page_list);
> >  		sgx_nr_free_pages++;
> >  		spin_unlock(&node->lock);
> > @@ -560,6 +565,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
> >  /**
> >   * sgx_alloc_epc_page() - Allocate an EPC page
> >   * @owner:	the owner of the EPC page
> > + * @type:	type of page being allocated
> >   * @reclaim:	reclaim pages if necessary
> >   *
> >   * Iterate through EPC sections and borrow a free EPC page to the caller. When a
> > @@ -574,7 +580,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
> >   *   an EPC page,
> >   *   -errno on error
> >   */
> > -struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
> > +struct sgx_epc_page *sgx_alloc_epc_page(void *owner, enum sgx_page_type type, bool reclaim)
> >  {
> >  	struct sgx_epc_page *page;
> >  
> > @@ -582,6 +588,8 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
> >  		page = __sgx_alloc_epc_page();
> >  		if (!IS_ERR(page)) {
> >  			page->owner = owner;
> > +			page->type = type;
> > +			page->flags = 0;
> >  			break;
> >  		}
> >  
> > @@ -616,14 +624,22 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
> >   * responsibility to make sure that the page is in uninitialized state. In other
> >   * words, do EREMOVE, EWB or whatever operation is necessary before calling
> >   * this function.
> > + *
> > + * Note that if the page has been tagged as poisoned, it is simply
> > + * dropped on the floor instead of added to the free list to make
> > + * sure we do not re-use it.
> >   */
> >  void sgx_free_epc_page(struct sgx_epc_page *page)
> >  {
> >  	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
> >  	struct sgx_numa_node *node = section->node;
> >  
> > +	if (page->flags & SGX_EPC_PAGE_POISON)
> > +		return;
> 
> I tend to think that it would be nice to collect them somewhere instead
> purposely leaking. E.g. this gives possibility to examine list with
> debugging tools.

I'm not also sure why free and dirty pages need to be tagged. Why a
poison flag is enough? This could be better explained in the commit
message.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages
  2021-07-08 18:14 ` [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages Tony Luck
  2021-07-09 18:08   ` Jarkko Sakkinen
@ 2021-07-14 20:42   ` Reinette Chatre
  2021-07-14 20:59     ` Luck, Tony
  1 sibling, 1 reply; 13+ messages in thread
From: Reinette Chatre @ 2021-07-14 20:42 UTC (permalink / raw)
  To: Tony Luck; +Cc: Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

Hi Tony,

On 7/8/2021 11:14 AM, Tony Luck wrote:
> 
> Add a type field to struct epc_page for how an in-use page has been
> allocated. Re-use "enum sgx_page_type" for this type, with a couple
> of additions for s/w types.

Tracking the enclave page type is a useful addition that will also help 
the SGX2 support where some instructions (ENCLS[EMODPR]) are only 
allowed on pages with particular type.

Could this tracking be done at the enclave page (struct sgx_encl_page) 
instead? The enclave page's EPC page information is not available when 
the page is in swap and it would be useful to know the page type without 
loading the page from swap. The information would continue to be 
accessible from struct epc_page via the owner pointer that may make some 
of the changes easier since it would not be needed to pass the page type 
around so much and thus possibly address the SECS page issue that Sean 
pointed out in
https://lore.kernel.org/lkml/YO3FuBupQTKYaKBf@google.com/

> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index 4628acec0009..e43d3c27eb96 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -26,9 +26,19 @@
>   /* Pages, which are being tracked by the page reclaimer. */
>   #define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
>   
> +/* Pages, on the "sgx_dirty_page_list" */
> +#define SGX_EPC_PAGE_DIRTY		BIT(1)
> +
> +/* Pages, on one of the node free lists */
> +#define SGX_EPC_PAGE_FREE		BIT(2)
> +
> +/* Pages, with h/w poison errors */
> +#define SGX_EPC_PAGE_POISON		BIT(3)
> +
>   struct sgx_epc_page {
>   	unsigned int section;
> -	unsigned int flags;
> +	u16 flags;
> +	u16 type;

Could this be "enum sgx_page_type type" ?

Reinette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages
  2021-07-14 20:42   ` Reinette Chatre
@ 2021-07-14 20:59     ` Luck, Tony
  2021-07-14 21:21       ` Reinette Chatre
  0 siblings, 1 reply; 13+ messages in thread
From: Luck, Tony @ 2021-07-14 20:59 UTC (permalink / raw)
  To: Chatre, Reinette
  Cc: Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

> Could this tracking be done at the enclave page (struct sgx_encl_page) 
> instead?

In principle yes. Though Sean has some issues with me tracking types
at all.

> The enclave page's EPC page information is not available when 
> the page is in swap and it would be useful to know the page type without 
> loading the page from swap. The information would continue to be 
> accessible from struct epc_page via the owner pointer that may make some 
> of the changes easier since it would not be needed to pass the page type 
> around so much and thus possibly address the SECS page issue that Sean 
> pointed out in
> https://lore.kernel.org/lkml/YO3FuBupQTKYaKBf@google.com/

I think I noticed that the "owner" pointer in sgx_encl_page doesn't point
back to the epc_page for all types of SGX pages. So some additional
changes would be needed. I'm not at all sure why this is different (or
what use the non-REG pages use "owner" for.

>>   struct sgx_epc_page {
>>   	unsigned int section;
>> -	unsigned int flags;
>> +	u16 flags;
>> +	u16 type;
>
> Could this be "enum sgx_page_type type" ?

Maybe. I thought I needed extra types (like FREE and DIRTY). But
Sean pointed out how to avoid some of them.

-Tony

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages
  2021-07-14 20:59     ` Luck, Tony
@ 2021-07-14 21:21       ` Reinette Chatre
  2021-07-14 23:08         ` Sean Christopherson
  0 siblings, 1 reply; 13+ messages in thread
From: Reinette Chatre @ 2021-07-14 21:21 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

Hi Tony,

On 7/14/2021 1:59 PM, Luck, Tony wrote:
>> Could this tracking be done at the enclave page (struct sgx_encl_page)
>> instead?
> 
> In principle yes. Though Sean has some issues with me tracking types
> at all.

For the SGX2 work knowing the page types are useful. Some instructions 
only work on certain page types and knowing beforehand whether an 
instruction could work helps to avoid dealing with the errors when it 
does not work.

>> The enclave page's EPC page information is not available when
>> the page is in swap and it would be useful to know the page type without
>> loading the page from swap. The information would continue to be
>> accessible from struct epc_page via the owner pointer that may make some
>> of the changes easier since it would not be needed to pass the page type
>> around so much and thus possibly address the SECS page issue that Sean
>> pointed out in
>> https://lore.kernel.org/lkml/YO3FuBupQTKYaKBf@google.com/
> 
> I think I noticed that the "owner" pointer in sgx_encl_page doesn't point
> back to the epc_page for all types of SGX pages. So some additional
> changes would be needed. I'm not at all sure why this is different (or
> what use the non-REG pages use "owner" for.

This may be VA pages? struct sgx_va_page also contains a pointer to an 
EPC page. I did not consider that for this case. Perhaps these could be 
identified uniquely.

Reinette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages
  2021-07-14 21:21       ` Reinette Chatre
@ 2021-07-14 23:08         ` Sean Christopherson
  2021-07-14 23:39           ` Luck, Tony
  0 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2021-07-14 23:08 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Luck, Tony, Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

On Wed, Jul 14, 2021, Reinette Chatre wrote:
> Hi Tony,
> 
> On 7/14/2021 1:59 PM, Luck, Tony wrote:
> > > Could this tracking be done at the enclave page (struct sgx_encl_page)
> > > instead?
> > 
> > In principle yes. Though Sean has some issues with me tracking types
> > at all.

I've no objection to tracking the type for SGX2, my argument in the context of
#MC support is that there should be no need to track the type.  Either the #MC
is recoverable or it isn't, and the enclave is toast regardless of what type of
page hit the #MC.

There might be a need to identify track vEPC pages, e.g. to avoid the retpoline
associated with a virtual function table, but IMO that would be better done as a
new flag instead of overloading the page type.  E.g. a page can be both a
vEPC page and an SECS/REG/VA page depending on its use in the guest.

> For the SGX2 work knowing the page types are useful. Some instructions only
> work on certain page types and knowing beforehand whether an instruction
> could work helps to avoid dealing with the errors when it does not work.

Yes, but the SGX2 use case is specific to "native" enclaves, i.e. it can and
should be limited to sgx_encl_page, as opposed to being shoved into sgx_epc_page.

> > > The enclave page's EPC page information is not available when
> > > the page is in swap and it would be useful to know the page type without
> > > loading the page from swap. The information would continue to be
> > > accessible from struct epc_page via the owner pointer that may make some
> > > of the changes easier since it would not be needed to pass the page type
> > > around so much and thus possibly address the SECS page issue that Sean
> > > pointed out in
> > > https://lore.kernel.org/lkml/YO3FuBupQTKYaKBf@google.com/
> > 
> > I think I noticed that the "owner" pointer in sgx_encl_page doesn't point
> > back to the epc_page for all types of SGX pages. So some additional
> > changes would be needed. I'm not at all sure why this is different (or
> > what use the non-REG pages use "owner" for.
> 
> This may be VA pages? struct sgx_va_page also contains a pointer to an EPC
> page. I did not consider that for this case. Perhaps these could be
> identified uniquely.

The "owner" is currently only used for reclaim.  IIRC, the proposed EPC cgroup
also used "owner" to enable forced "reclaim", i.e. reclaiming EPC by nuking the
owning entity, e.g. tearing down a virtual EPC section.  And I believe the cgroup
also used the aforementioned vEPC flag to invoke the correct EPC OOM reaper.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages
  2021-07-14 23:08         ` Sean Christopherson
@ 2021-07-14 23:39           ` Luck, Tony
  2021-07-15 15:33             ` Sean Christopherson
  0 siblings, 1 reply; 13+ messages in thread
From: Luck, Tony @ 2021-07-14 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Chatre, Reinette
  Cc: Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

> I've no objection to tracking the type for SGX2, my argument in the context of
> #MC support is that there should be no need to track the type.  Either the #MC
> is recoverable or it isn't, and the enclave is toast regardless of what type of
> page hit the #MC.

I'll separate the "phase" from the "type".

Here phase is used for the life-cycle of EPC pages:

DIRTY -> FREE -> IN-USE -> DIRTY

Errors can be reported by memory controller page scrubbers
for pages that are not "IN-USE" ... and the recovery action is
just to make sure that they are never allocated.

When a page is IN-USE ... it has a "type". I currently
only have a way to inject errors into SGX_PAGE_TYPE_REG
pages. That means initial recovery code is going to focus on
those since that is all I can test. But I'll try not to special case
them as far as possible.

-Tony

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages
  2021-07-14 23:39           ` Luck, Tony
@ 2021-07-15 15:33             ` Sean Christopherson
  0 siblings, 0 replies; 13+ messages in thread
From: Sean Christopherson @ 2021-07-15 15:33 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Chatre, Reinette, Jarkko Sakkinen, Dave Hansen, x86, linux-sgx,
	linux-kernel

On Wed, Jul 14, 2021, Luck, Tony wrote:
> > I've no objection to tracking the type for SGX2, my argument in the context of
> > #MC support is that there should be no need to track the type.  Either the #MC
> > is recoverable or it isn't, and the enclave is toast regardless of what type of
> > page hit the #MC.
> 
> I'll separate the "phase" from the "type".
> 
> Here phase is used for the life-cycle of EPC pages:
> 
> DIRTY -> FREE -> IN-USE -> DIRTY

Not that it affects anything, but that's not quite true.  In hardware, pages are
either FREE or IN-USE, there is no concept of DIRTY.  DIRTY is the kernel's
arbitrary description of a page that has not been sanitized and so is considered
to be in an unknown state, i.e. the kernel doesn't know if it's FREE or IN-USE.

Once a page is sanitized (during boot), its state is known and the page is never
put back on the so called dirty list, i.e. the software flow is:

  DIRTY -> FREE -> IN-USE -> FREE

> Errors can be reported by memory controller page scrubbers for pages that are
> not "IN-USE" ... and the recovery action is just to make sure that they are
> never allocated.
>
> When a page is IN-USE ... it has a "type". I currently only have a way to
> inject errors into SGX_PAGE_TYPE_REG pages. That means initial recovery code
> is going to focus on those since that is all I can test. But I'll try not to
> special case them as far as possible.

Inability to test expected behavior doesn't mean we shouldn't implement towards
the expected behavior, i.e. someone somewhere must know how SECS and VA pages
behave in response to a memory error.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-07-15 15:33 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-08 18:14 [PATCH 0/4] Basic recovery for machine checks inside SGX Tony Luck
2021-07-08 18:14 ` [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages Tony Luck
2021-07-09 18:08   ` Jarkko Sakkinen
2021-07-09 18:09     ` Jarkko Sakkinen
2021-07-14 20:42   ` Reinette Chatre
2021-07-14 20:59     ` Luck, Tony
2021-07-14 21:21       ` Reinette Chatre
2021-07-14 23:08         ` Sean Christopherson
2021-07-14 23:39           ` Luck, Tony
2021-07-15 15:33             ` Sean Christopherson
2021-07-08 18:14 ` [PATCH 2/4] x86/sgx: Add basic infrastructure to recover from errors in SGX memory Tony Luck
2021-07-08 18:14 ` [PATCH 3/4] x86/sgx: Hook sgx_memory_failure() into mainline code Tony Luck
2021-07-08 18:14 ` [PATCH 4/4] x86/sgx: Add hook to error injection address validation Tony Luck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).