* [PATCH 0/4] Basic recovery for machine checks inside SGX
@ 2021-07-08 18:14 Tony Luck
  2021-07-08 18:14 ` [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages Tony Luck
                   ` (4 more replies)
  0 siblings, 5 replies; 185+ messages in thread
From: Tony Luck @ 2021-07-08 18:14 UTC (permalink / raw)
  To: tony.luck; +Cc: Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

This series covers the easy cases:
1) memory errors reported by the patrol scrubber in unused SGX pages
2) machine checks due to poison consumption from SGX_PAGE_TYPE_REG
   pages
3) poison consumed in an enclave inside a guest, where we just kill
   the guest.

Tony Luck (4):
  x86/sgx: Track phase and type of SGX EPC pages
  x86/sgx: Add basic infrastructure to recover from errors in SGX memory
  x86/sgx: Hook sgx_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation

 .../firmware-guide/acpi/apei/einj.rst         |  19 +++
 arch/x86/include/asm/sgx.h                    |   6 +
 arch/x86/kernel/cpu/sgx/encl.c                |   4 +-
 arch/x86/kernel/cpu/sgx/ioctl.c               |   4 +-
 arch/x86/kernel/cpu/sgx/main.c                | 147 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |  17 +-
 arch/x86/kernel/cpu/sgx/virt.c                |  11 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 include/linux/mm.h                            |  15 ++
 mm/memory-failure.c                           |   4 +
 10 files changed, 219 insertions(+), 11 deletions(-)


base-commit: 62fb9874f5da54fdb243003b386128037319b219
-- 
2.29.2


* [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages
  2021-07-08 18:14 [PATCH 0/4] Basic recovery for machine checks inside SGX Tony Luck
@ 2021-07-08 18:14 ` Tony Luck
  2021-07-09 18:08   ` Jarkko Sakkinen
  2021-07-14 20:42   ` Reinette Chatre
  2021-07-08 18:14 ` [PATCH 2/4] x86/sgx: Add basic infrastructure to recover from errors in SGX memory Tony Luck
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 185+ messages in thread
From: Tony Luck @ 2021-07-08 18:14 UTC (permalink / raw)
  To: tony.luck; +Cc: Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

Memory errors can be reported either synchronously as memory is accessed,
or asynchronously by speculative access or by a memory controller page
scrubber.  The life cycle of an EPC page takes it through:
	dirty -> free -> in-use -> free.

Memory errors are reported using physical addresses. It is a simple
matter to find which sgx_epc_page structure maps a given address.
But then recovery code needs to be able to determine the current use of
the page to take the appropriate recovery action. Within the "in-use"
phase different actions are needed based on how the page is used in
the enclave.

Add new flag bits to describe the phase (with an extra bit for the new
phase of "poisoned"). Drop pages marked as poisoned instead of adding
them to a free list to make sure they are not re-used.

Add a type field to struct sgx_epc_page for how an in-use page has been
allocated. Re-use "enum sgx_page_type" for this type, with a couple
of additions for s/w types.
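
For illustration only (this is not part of the patch), the phase tracking
described above can be modelled in user space. The sketch assumes a toy
pool with a bare counter in place of the per-node free lists. The names
epc_pool, epc_page_init(), epc_page_alloc() and epc_page_free() are
hypothetical; only the flag bit positions and the two s/w type values
follow this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions follow the patch; bit 0 is the existing reclaimer flag. */
#define EPC_PAGE_RECLAIMER_TRACKED	(1u << 0)
#define EPC_PAGE_DIRTY			(1u << 1)
#define EPC_PAGE_FREE			(1u << 2)
#define EPC_PAGE_POISON			(1u << 3)

enum epc_page_type {
	EPC_TYPE_SECS,
	EPC_TYPE_TCS,
	EPC_TYPE_REG,
	EPC_TYPE_VA,
	EPC_TYPE_TRIM,
	/* s/w only types, same values as in the patch */
	EPC_TYPE_FREE = 100,
	EPC_TYPE_KVM = 101,
};

struct epc_page {
	uint16_t flags;
	uint16_t type;
	void *owner;
};

/* Hypothetical stand-in for the per-node free lists: just a counter. */
struct epc_pool {
	size_t nr_free;
};

/* Boot-time state: every page starts out dirty and unowned. */
void epc_page_init(struct epc_page *page)
{
	page->flags = EPC_PAGE_DIRTY;
	page->type = EPC_TYPE_FREE;
	page->owner = NULL;
}

/* free -> in-use: record who owns the page and how it is used. */
void epc_page_alloc(struct epc_pool *pool, struct epc_page *page,
		    void *owner, enum epc_page_type type)
{
	assert(pool->nr_free > 0);
	assert(page->flags & EPC_PAGE_FREE);

	pool->nr_free--;
	page->owner = owner;
	page->type = type;
	page->flags = 0;
}

/*
 * dirty or in-use -> free. A poisoned page is dropped instead, so it can
 * never be handed out again. Returns true if the page became free.
 */
bool epc_page_free(struct epc_pool *pool, struct epc_page *page)
{
	if (page->flags & EPC_PAGE_POISON)
		return false;

	page->flags = EPC_PAGE_FREE;
	pool->nr_free++;
	return true;
}
```

In this model, setting EPC_PAGE_POISON on an in-use page is enough to make
the later epc_page_free() call leave the free count unchanged.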

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/sgx.h      |  6 ++++++
 arch/x86/kernel/cpu/sgx/encl.c  |  4 ++--
 arch/x86/kernel/cpu/sgx/ioctl.c |  4 ++--
 arch/x86/kernel/cpu/sgx/main.c  | 21 +++++++++++++++++++--
 arch/x86/kernel/cpu/sgx/sgx.h   | 14 ++++++++++++--
 arch/x86/kernel/cpu/sgx/virt.c  |  2 +-
 6 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 9c31e0ebc55b..9619a6d77a83 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -216,6 +216,8 @@ struct sgx_pageinfo {
  * %SGX_PAGE_TYPE_REG:	a regular page
  * %SGX_PAGE_TYPE_VA:	a VA page
  * %SGX_PAGE_TYPE_TRIM:	a page in trimmed state
+ *
+ * Also used to track current use of &struct sgx_epc_page
  */
 enum sgx_page_type {
 	SGX_PAGE_TYPE_SECS,
@@ -223,6 +225,10 @@ enum sgx_page_type {
 	SGX_PAGE_TYPE_REG,
 	SGX_PAGE_TYPE_VA,
 	SGX_PAGE_TYPE_TRIM,
+
+	/* sgx_epc_page.type */
+	SGX_PAGE_TYPE_FREE = 100,
+	SGX_PAGE_TYPE_KVM = 101,
 };
 
 #define SGX_NR_PAGE_TYPES	5
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 3be203297988..abf6e1a704c0 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -72,7 +72,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	struct sgx_epc_page *epc_page;
 	int ret;
 
-	epc_page = sgx_alloc_epc_page(encl_page, false);
+	epc_page = sgx_alloc_epc_page(encl_page, SGX_PAGE_TYPE_REG, false);
 	if (IS_ERR(epc_page))
 		return epc_page;
 
@@ -679,7 +679,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void)
 	struct sgx_epc_page *epc_page;
 	int ret;
 
-	epc_page = sgx_alloc_epc_page(NULL, true);
+	epc_page = sgx_alloc_epc_page(NULL, SGX_PAGE_TYPE_VA, true);
 	if (IS_ERR(epc_page))
 		return ERR_CAST(epc_page);
 
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 83df20e3e633..a74ae00194cc 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -83,7 +83,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 
 	encl->backing = backing;
 
-	secs_epc = sgx_alloc_epc_page(&encl->secs, true);
+	secs_epc = sgx_alloc_epc_page(&encl->secs, SGX_PAGE_TYPE_SECS, true);
 	if (IS_ERR(secs_epc)) {
 		ret = PTR_ERR(secs_epc);
 		goto err_out_backing;
@@ -300,7 +300,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
 	if (IS_ERR(encl_page))
 		return PTR_ERR(encl_page);
 
-	epc_page = sgx_alloc_epc_page(encl_page, true);
+	epc_page = sgx_alloc_epc_page(encl_page, SGX_PAGE_TYPE_REG, true);
 	if (IS_ERR(epc_page)) {
 		kfree(encl_page);
 		return PTR_ERR(epc_page);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de02bbcc..643df87b3e01 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -401,7 +401,12 @@ static void sgx_reclaim_pages(void)
 		section = &sgx_epc_sections[epc_page->section];
 		node = section->node;
 
+		/* drop poison pages instead of adding to free list */
+		if (epc_page->flags & SGX_EPC_PAGE_POISON)
+			continue;
+
 		spin_lock(&node->lock);
+		epc_page->flags = SGX_EPC_PAGE_FREE;
 		list_add_tail(&epc_page->list, &node->free_page_list);
 		sgx_nr_free_pages++;
 		spin_unlock(&node->lock);
@@ -560,6 +565,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
 /**
  * sgx_alloc_epc_page() - Allocate an EPC page
  * @owner:	the owner of the EPC page
+ * @type:	type of page being allocated
  * @reclaim:	reclaim pages if necessary
  *
  * Iterate through EPC sections and borrow a free EPC page to the caller. When a
@@ -574,7 +580,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
  *   an EPC page,
  *   -errno on error
  */
-struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
+struct sgx_epc_page *sgx_alloc_epc_page(void *owner, enum sgx_page_type type, bool reclaim)
 {
 	struct sgx_epc_page *page;
 
@@ -582,6 +588,8 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 		page = __sgx_alloc_epc_page();
 		if (!IS_ERR(page)) {
 			page->owner = owner;
+			page->type = type;
+			page->flags = 0;
 			break;
 		}
 
@@ -616,14 +624,22 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
  * responsibility to make sure that the page is in uninitialized state. In other
  * words, do EREMOVE, EWB or whatever operation is necessary before calling
  * this function.
+ *
+ * Note that if the page has been tagged as poisoned, it is simply
+ * dropped on the floor instead of added to the free list to make
+ * sure we do not re-use it.
  */
 void sgx_free_epc_page(struct sgx_epc_page *page)
 {
 	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
 	struct sgx_numa_node *node = section->node;
 
+	if (page->flags & SGX_EPC_PAGE_POISON)
+		return;
+
 	spin_lock(&node->lock);
 
+	page->flags = SGX_EPC_PAGE_FREE;
 	list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 
@@ -651,7 +667,8 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
-		section->pages[i].flags = 0;
+		section->pages[i].flags = SGX_EPC_PAGE_DIRTY;
+		section->pages[i].type = SGX_PAGE_TYPE_FREE;
 		section->pages[i].owner = NULL;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..e43d3c27eb96 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -26,9 +26,19 @@
 /* Pages, which are being tracked by the page reclaimer. */
 #define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
 
+/* Pages, on the "sgx_dirty_page_list" */
+#define SGX_EPC_PAGE_DIRTY		BIT(1)
+
+/* Pages, on one of the node free lists */
+#define SGX_EPC_PAGE_FREE		BIT(2)
+
+/* Pages, with h/w poison errors */
+#define SGX_EPC_PAGE_POISON		BIT(3)
+
 struct sgx_epc_page {
 	unsigned int section;
-	unsigned int flags;
+	u16 flags;
+	u16 type;
 	struct sgx_encl_page *owner;
 	struct list_head list;
 };
@@ -82,7 +92,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page);
 
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
-struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
+struct sgx_epc_page *sgx_alloc_epc_page(void *owner, enum sgx_page_type type, bool reclaim);
 
 #ifdef CONFIG_X86_SGX_KVM
 int __init sgx_vepc_init(void);
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 64511c4a5200..044dd92ebd63 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -46,7 +46,7 @@ static int __sgx_vepc_fault(struct sgx_vepc *vepc,
 	if (epc_page)
 		return 0;
 
-	epc_page = sgx_alloc_epc_page(vepc, false);
+	epc_page = sgx_alloc_epc_page(vepc, SGX_PAGE_TYPE_KVM, false);
 	if (IS_ERR(epc_page))
 		return PTR_ERR(epc_page);
 
-- 
2.29.2


* [PATCH 2/4] x86/sgx: Add basic infrastructure to recover from errors in SGX memory
  2021-07-08 18:14 [PATCH 0/4] Basic recovery for machine checks inside SGX Tony Luck
  2021-07-08 18:14 ` [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages Tony Luck
@ 2021-07-08 18:14 ` Tony Luck
  2021-07-08 18:14 ` [PATCH 3/4] x86/sgx: Hook sgx_memory_failure() into mainline code Tony Luck
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-07-08 18:14 UTC (permalink / raw)
  To: tony.luck; +Cc: Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

The x86 machine check architecture reports a physical address when there
is a memory error.

Add an end_phys_addr field to the sgx_epc_section structure and a new
function sgx_paddr_to_page() that searches all such structures to see
if a physical address is part of an SGX EPC page.
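
As a rough user-space sketch of the lookup just described (not the kernel
code itself): each section records an inclusive end address, and the page
index is the offset from the section base shifted down by the page size.
The names epc_section, epc_page and paddr_to_page() are hypothetical, the
section table is passed in as a parameter rather than being a global, and
skipping sections with no page array is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT	12	/* 4 KiB pages */

struct epc_page {
	unsigned int section;
};

struct epc_section {
	uint64_t phys_addr;	/* first byte of the section */
	uint64_t end_phys_addr;	/* last byte of the section (inclusive) */
	struct epc_page *pages;	/* one entry per page in the section */
};

/*
 * Return the page descriptor covering @paddr, or NULL if the address does
 * not fall inside any populated section.
 */
struct epc_page *paddr_to_page(struct epc_section *sections, size_t nr,
			       uint64_t paddr)
{
	assert(sections != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		struct epc_section *s = &sections[i];

		/* Assumption: unused table slots have no page array. */
		if (!s->pages)
			continue;

		if (paddr < s->phys_addr || paddr > s->end_phys_addr)
			continue;

		return &s->pages[(paddr - s->phys_addr) >> PAGE_SHIFT];
	}

	return NULL;
}
```

Because the end address is inclusive, the first byte after a section is
rejected, and any byte offset within a page maps to that page's descriptor.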

The ACPI EINJ module has a sanity check that the injection address is
valid. Add and export a function sgx_is_epc_page() so that it can be
changed to allow injection to SGX EPC pages.

Provide a recovery function sgx_memory_failure(). If the poison was
consumed synchronously inside the enclave, send a SIGBUS, just the
same as for poison consumption outside of an enclave.

For asynchronously reported errors the easiest cases are when the page
is currently free. Just drop the page from the free list so that it will
never be allocated.

An SGX_PAGE_TYPE_REG page can just be unmapped from the enclave. If the
enclave never touches that page again, all is well. If it does touch
the page, it will die. A possible future enhancement is to mark the
unmapped PTE so that it will cause a SIGBUS.

SGX_PAGE_TYPE_KVM pages may be recoverable depending on how they are
being used by guests. To fix that properly requires injecting the machine
check into the guest. For now, just kill the guest.
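
The cases above can be summarised as a small decision function. This is
only a sketch of the policy in this commit message, not the kernel code:
the enum recovery values and classify_epc_error() are hypothetical names,
and treating a dirty page the same as a free one is an assumption made
here for simplicity.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit positions as the flags added in patch 1. */
#define EPC_PAGE_DIRTY	(1u << 1)
#define EPC_PAGE_FREE	(1u << 2)
#define EPC_PAGE_POISON	(1u << 3)

enum epc_page_type {
	EPC_TYPE_SECS,
	EPC_TYPE_TCS,
	EPC_TYPE_REG,
	EPC_TYPE_VA,
	EPC_TYPE_TRIM,
	EPC_TYPE_FREE = 100,
	EPC_TYPE_KVM = 101,
};

/* Hypothetical labels for the recovery actions described in the text. */
enum recovery {
	RECOVER_SIGBUS_CURRENT,		/* poison consumed synchronously */
	RECOVER_NOTHING_TO_DO,		/* page was already marked poisoned */
	RECOVER_DROP_UNUSED,		/* take page off its list for good */
	RECOVER_UNMAP_FROM_ENCLAVE,	/* regular enclave page */
	RECOVER_KILL_GUEST,		/* page handed to a KVM guest */
	RECOVER_UNSUPPORTED,		/* SECS, TCS, VA, ...: no recovery */
};

/*
 * @action_required: true when the current task consumed the poison.
 * @flags_before:    page flags as they were before poison was recorded.
 * @type:            how an in-use page was allocated.
 */
enum recovery classify_epc_error(bool action_required,
				 unsigned int flags_before,
				 enum epc_page_type type)
{
	/* A page cannot be on the dirty list and a free list at once. */
	assert(!((flags_before & EPC_PAGE_DIRTY) &&
		 (flags_before & EPC_PAGE_FREE)));

	if (action_required)
		return RECOVER_SIGBUS_CURRENT;

	if (flags_before & EPC_PAGE_POISON)
		return RECOVER_NOTHING_TO_DO;

	/* Assumption of this sketch: dirty pages are handled like free ones. */
	if (flags_before & (EPC_PAGE_DIRTY | EPC_PAGE_FREE))
		return RECOVER_DROP_UNUSED;

	switch (type) {
	case EPC_TYPE_REG:
		return RECOVER_UNMAP_FROM_ENCLAVE;
	case EPC_TYPE_KVM:
		return RECOVER_KILL_GUEST;
	default:
		return RECOVER_UNSUPPORTED;
	}
}
```

Keeping the policy in one pure function like this makes each case easy to
check in isolation, separately from the locking and list manipulation.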

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 126 +++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h  |   3 +-
 arch/x86/kernel/cpu/sgx/virt.c |   9 +++
 3 files changed, 137 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 643df87b3e01..4a354f89373e 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -664,6 +664,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	section->end_phys_addr = phys_addr + size - 1;
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -676,6 +677,131 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
+		section = &sgx_epc_sections[i];
+
+		if (paddr < section->phys_addr || paddr > section->end_phys_addr)
+			continue;
+
+		return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+	}
+
+	return NULL;
+}
+
+bool sgx_is_epc_page(u64 paddr)
+{
+	return !!sgx_paddr_to_page(paddr);
+}
+EXPORT_SYMBOL_GPL(sgx_is_epc_page);
+
+void __attribute__((weak)) sgx_memory_failure_vepc(struct sgx_epc_page *epc_page)
+{
+}
+
+/*
+ * This can be called in three ways:
+ * a) When an enclave has just consumed poison. In this case
+ *    calling process context is the owner of the enclave.
+	 * b) For some asynchronous poison notification (e.g. patrol
+ *    scrubber or speculative execution saw the poison.)
+ *    In this case calling context is a kernel thread.
+ * c) Someone asked to take a page offline using the
+ *    /sys/devices/system/memory/{hard,soft}_offline_page.
+ *    In this case caller is the process writing these files.
+ */
+int sgx_memory_failure(unsigned long pfn, int flags)
+{
+	struct sgx_epc_page *epc_page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+	struct sgx_epc_section *section;
+	struct sgx_encl_page *encl_page;
+	struct sgx_numa_node *node;
+	unsigned long addr;
+	u16 page_flags;
+
+	if (!epc_page)
+		return -ENXIO;
+
+	spin_lock(&sgx_reclaimer_lock);
+
+	page_flags = epc_page->flags;
+	epc_page->flags |= SGX_EPC_PAGE_POISON;
+
+	/*
+	 * Poison was synchronously consumed by an enclave in the current
+	 * task. Send a SIGBUS here. Hardware should prevent this enclave
+	 * from being re-entered, so no concern that the poisoned page
+	 * will be touched a second time. The poisoned EPC page will be
+	 * dropped as pages are freed during task exit.
+	 */
+	if (flags & MF_ACTION_REQUIRED) {
+		if (epc_page->type == SGX_PAGE_TYPE_REG) {
+			encl_page = epc_page->owner;
+			addr = encl_page->desc & PAGE_MASK;
+			force_sig_mceerr(BUS_MCEERR_AR, (void *)addr, PAGE_SHIFT);
+		} else {
+			force_sig(SIGBUS);
+		}
+		goto unlock;
+	}
+
+	section = &sgx_epc_sections[epc_page->section];
+	node = section->node;
+
+	if (page_flags & SGX_EPC_PAGE_POISON)
+		goto unlock;
+
+	if (page_flags & SGX_EPC_PAGE_DIRTY) {
+		list_del(&epc_page->list);
+	} else if (page_flags & SGX_EPC_PAGE_FREE) {
+		spin_lock(&node->lock);
+		epc_page->owner = NULL;
+		list_del(&epc_page->list);
+		sgx_nr_free_pages--;
+		spin_unlock(&node->lock);
+		goto unlock;
+	}
+
+	switch (epc_page->type) {
+	case SGX_PAGE_TYPE_REG:
+		encl_page = epc_page->owner;
+		/* Unmap the page, unless it was already in progress to be freed */
+		if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
+			spin_unlock(&sgx_reclaimer_lock);
+			sgx_reclaimer_block(epc_page);
+			kref_put(&encl_page->encl->refcount, sgx_encl_release);
+			goto done;
+		}
+		break;
+
+	case SGX_PAGE_TYPE_KVM:
+		spin_unlock(&sgx_reclaimer_lock);
+		sgx_memory_failure_vepc(epc_page);
+		break;
+
+	default:
+		/*
+		 * I don't expect SECS, TCS, VA pages would result
+		 * in recoverable machine checks. If it turns out
+		 * that they do, then add cases for recovery for
+		 * each of them.
+		 */
+		panic("Poisoned active SGX page type %d\n", epc_page->type);
+		break;
+	}
+	goto done;
+
+unlock:
+	spin_unlock(&sgx_reclaimer_lock);
+done:
+	return 0;
+}
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index e43d3c27eb96..7701f5f88b61 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -39,7 +39,7 @@ struct sgx_epc_page {
 	unsigned int section;
 	u16 flags;
 	u16 type;
-	struct sgx_encl_page *owner;
+	void *owner;
 	struct list_head list;
 };
 
@@ -60,6 +60,7 @@ struct sgx_numa_node {
  */
 struct sgx_epc_section {
 	unsigned long phys_addr;
+	unsigned long end_phys_addr;
 	void *virt_addr;
 	struct sgx_epc_page *pages;
 	struct sgx_numa_node *node;
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 044dd92ebd63..08918166394f 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -21,6 +21,7 @@
 struct sgx_vepc {
 	struct xarray page_array;
 	struct mutex lock;
+	struct task_struct *task;
 };
 
 /*
@@ -218,6 +219,13 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
+void sgx_memory_failure_vepc(struct sgx_epc_page *epc_page)
+{
+	struct sgx_vepc *vepc = epc_page->owner;
+
+	send_sig(SIGBUS, vepc->task, false);
+}
+
 static int sgx_vepc_open(struct inode *inode, struct file *file)
 {
 	struct sgx_vepc *vepc;
@@ -227,6 +235,7 @@ static int sgx_vepc_open(struct inode *inode, struct file *file)
 		return -ENOMEM;
 	mutex_init(&vepc->lock);
 	xa_init(&vepc->page_array);
+	vepc->task = current;
 
 	file->private_data = vepc;
 
-- 
2.29.2


* [PATCH 3/4] x86/sgx: Hook sgx_memory_failure() into mainline code
  2021-07-08 18:14 [PATCH 0/4] Basic recovery for machine checks inside SGX Tony Luck
  2021-07-08 18:14 ` [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages Tony Luck
  2021-07-08 18:14 ` [PATCH 2/4] x86/sgx: Add basic infrastructure to recover from errors in SGX memory Tony Luck
@ 2021-07-08 18:14 ` Tony Luck
  2021-07-08 18:14 ` [PATCH 4/4] x86/sgx: Add hook to error injection address validation Tony Luck
  2021-07-19 18:20 ` [PATCH v2 0/6] Basic recovery for machine checks inside SGX Tony Luck
  4 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-07-08 18:14 UTC (permalink / raw)
  To: tony.luck; +Cc: Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

Add a call inside memory_failure() to check if the address is an SGX
EPC page and handle it.

Note that SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.
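
A toy model of the resulting dispatch order, for illustration only: the
SGX handler is tried first for any pfn without an online "struct page",
and a return of -ENXIO means "not an EPC page, keep going". The names
pfn_range, toy_sgx_memory_failure() and toy_memory_failure() are
hypothetical, and the EPC is assumed here to be a single pfn range.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical description of where the EPC lives, in page frames. */
struct pfn_range {
	unsigned long first;
	unsigned long last;	/* inclusive */
};

/* Which part of the handler ended up dealing with the error. */
enum mf_path {
	MF_PATH_ONLINE_PAGE,	/* normal "struct page" based handling */
	MF_PATH_SGX,		/* claimed by the SGX hook */
	MF_PATH_DEVICE_OR_FAIL,	/* falls through to the device mapping check */
};

/* Returns 0 if @pfn is an EPC page, -ENXIO otherwise. */
int toy_sgx_memory_failure(unsigned long pfn, const struct pfn_range *epc)
{
	if (pfn < epc->first || pfn > epc->last)
		return -ENXIO;

	return 0;
}

enum mf_path toy_memory_failure(unsigned long pfn, bool has_online_page,
				const struct pfn_range *epc)
{
	assert(epc && epc->first <= epc->last);

	if (has_online_page)
		return MF_PATH_ONLINE_PAGE;

	/* No struct page: give SGX the first chance to claim the pfn. */
	if (toy_sgx_memory_failure(pfn, epc) == 0)
		return MF_PATH_SGX;

	return MF_PATH_DEVICE_OR_FAIL;
}
```

The stub for kernels built without SGX support returns the same -ENXIO, so
in that configuration every such pfn simply takes the existing path.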

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/mm.h  | 9 +++++++++
 mm/memory-failure.c | 4 ++++
 2 files changed, 13 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8ae31622deef..1b9d0912942a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3251,5 +3251,14 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
 	return 0;
 }
 
+#ifdef CONFIG_X86_SGX
+int sgx_memory_failure(unsigned long pfn, int flags);
+#else
+static inline int sgx_memory_failure(unsigned long pfn, int flags)
+{
+	return -ENXIO;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 6f5f78885ab4..02b1c58729c1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1453,6 +1453,10 @@ int memory_failure(unsigned long pfn, int flags)
 
 	p = pfn_to_online_page(pfn);
 	if (!p) {
+		res = sgx_memory_failure(pfn, flags);
+		if (res == 0)
+			return 0;
+
 		if (pfn_valid(pfn)) {
 			pgmap = get_dev_pagemap(pfn, NULL);
 			if (pgmap)
-- 
2.29.2


* [PATCH 4/4] x86/sgx: Add hook to error injection address validation
  2021-07-08 18:14 [PATCH 0/4] Basic recovery for machine checks inside SGX Tony Luck
                   ` (2 preceding siblings ...)
  2021-07-08 18:14 ` [PATCH 3/4] x86/sgx: Hook sgx_memory_failure() into mainline code Tony Luck
@ 2021-07-08 18:14 ` Tony Luck
  2021-07-19 18:20 ` [PATCH v2 0/6] Basic recovery for machine checks inside SGX Tony Luck
  4 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-07-08 18:14 UTC (permalink / raw)
  To: tony.luck; +Cc: Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

SGX reserved memory does not appear in the standard address maps.

Add a hook to call into the SGX code to check if an address is located
in SGX memory.

There are other challenges in injecting errors into SGX. Update the
documentation with a sequence of operations to inject an error.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 .../firmware-guide/acpi/apei/einj.rst         | 19 +++++++++++++++++++
 drivers/acpi/apei/einj.c                      |  3 ++-
 include/linux/mm.h                            |  6 ++++++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index c042176e1707..55e2331a6438 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,5 +181,24 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+Special notes for injection into SGX enclaves:
+
+There may be a separate BIOS setup option to enable SGX injection.
+
+The injection process consists of setting some special memory controller
+trigger that will inject the error on the next write to the target
+address. But the h/w prevents any software outside of an SGX enclave
+from accessing enclave pages (even BIOS SMM mode).
+
+The following sequence can be used:
+  1) Determine physical address of enclave page
+  2) Use "notrigger=1" mode to inject (this will set up
+     the injection address, but will not actually inject)
+  3) Enter the enclave
+  4) Store data to the virtual address matching physical address from step 1
+  5) Execute CLFLUSH for that virtual address
+  6) Spin delay for 250ms
+  7) Read from the virtual address. This will trigger the error
+
 For more information about EINJ, please refer to ACPI specification
 version 4.0, section 17.5 and ACPI 5.0, section 18.6.
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 328e8aeece6c..fb634219e232 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -544,7 +544,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	    ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
 				!= REGION_INTERSECTS) &&
 	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
-				!= REGION_INTERSECTS)))
+				!= REGION_INTERSECTS) &&
+	     !sgx_is_epc_page(base_addr)))
 		return -EINVAL;
 
 inject:
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1b9d0912942a..47eb960516cf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3253,11 +3253,17 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
 
 #ifdef CONFIG_X86_SGX
 int sgx_memory_failure(unsigned long pfn, int flags);
+bool sgx_is_epc_page(u64 paddr);
 #else
 static inline int sgx_memory_failure(unsigned long pfn, int flags)
 {
 	return -ENXIO;
 }
+
+static inline bool sgx_is_epc_page(u64 paddr)
+{
+	return false;
+}
 #endif
 
 #endif /* __KERNEL__ */
-- 
2.29.2


* Re: [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages
  2021-07-08 18:14 ` [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages Tony Luck
@ 2021-07-09 18:08   ` Jarkko Sakkinen
  2021-07-09 18:09     ` Jarkko Sakkinen
  2021-07-14 20:42   ` Reinette Chatre
  1 sibling, 1 reply; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-07-09 18:08 UTC (permalink / raw)
  To: Tony Luck; +Cc: Dave Hansen, x86, linux-sgx, linux-kernel

On Thu, Jul 08, 2021 at 11:14:20AM -0700, Tony Luck wrote:
> Memory errors can be reported either synchronously as memory is accessed,
> or asynchronously by speculative access or by a memory controller page
> scrubber.  The life cycle of an EPC page takes it through:
> 	dirty -> free -> in-use -> free.
> 
> Memory errors are reported using physical addresses. It is a simple
> matter to find which sgx_epc_page structure maps a given address.
> But then recovery code needs to be able to determine the current use of
> the page to take the appropriate recovery action. Within the "in-use"
> phase different actions are needed based on how the page is used in
> the enclave.
> 
> Add new flags bits to describe the phase (with an extra bit for the new
> phase of "poisoned"). Drop pages marked as poisoned instead of adding
> them to a free list to make sure they are not re-used.
> 
> Add a type field to struct sgx_epc_page for how an in-use page has been
> allocated. Re-use "enum sgx_page_type" for this type, with a couple
> of additions for s/w types.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/include/asm/sgx.h      |  6 ++++++
>  arch/x86/kernel/cpu/sgx/encl.c  |  4 ++--
>  arch/x86/kernel/cpu/sgx/ioctl.c |  4 ++--
>  arch/x86/kernel/cpu/sgx/main.c  | 21 +++++++++++++++++++--
>  arch/x86/kernel/cpu/sgx/sgx.h   | 14 ++++++++++++--
>  arch/x86/kernel/cpu/sgx/virt.c  |  2 +-
>  6 files changed, 42 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
> index 9c31e0ebc55b..9619a6d77a83 100644
> --- a/arch/x86/include/asm/sgx.h
> +++ b/arch/x86/include/asm/sgx.h
> @@ -216,6 +216,8 @@ struct sgx_pageinfo {
>   * %SGX_PAGE_TYPE_REG:	a regular page
>   * %SGX_PAGE_TYPE_VA:	a VA page
>   * %SGX_PAGE_TYPE_TRIM:	a page in trimmed state
> + *
> + * Also used to track current use of &struct sgx_epc_page
>   */
>  enum sgx_page_type {
>  	SGX_PAGE_TYPE_SECS,
> @@ -223,6 +225,10 @@ enum sgx_page_type {
>  	SGX_PAGE_TYPE_REG,
>  	SGX_PAGE_TYPE_VA,
>  	SGX_PAGE_TYPE_TRIM,
> +
> +	/* sgx_epc_page.type */
> +	SGX_PAGE_TYPE_FREE = 100,
> +	SGX_PAGE_TYPE_KVM = 101,
>  };
>  
>  #define SGX_NR_PAGE_TYPES	5
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 3be203297988..abf6e1a704c0 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -72,7 +72,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  	struct sgx_epc_page *epc_page;
>  	int ret;
>  
> -	epc_page = sgx_alloc_epc_page(encl_page, false);
> +	epc_page = sgx_alloc_epc_page(encl_page, SGX_PAGE_TYPE_REG, false);
>  	if (IS_ERR(epc_page))
>  		return epc_page;
>  
> @@ -679,7 +679,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void)
>  	struct sgx_epc_page *epc_page;
>  	int ret;
>  
> -	epc_page = sgx_alloc_epc_page(NULL, true);
> +	epc_page = sgx_alloc_epc_page(NULL, SGX_PAGE_TYPE_VA, true);
>  	if (IS_ERR(epc_page))
>  		return ERR_CAST(epc_page);
>  
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index 83df20e3e633..a74ae00194cc 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -83,7 +83,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
>  
>  	encl->backing = backing;
>  
> -	secs_epc = sgx_alloc_epc_page(&encl->secs, true);
> +	secs_epc = sgx_alloc_epc_page(&encl->secs, SGX_PAGE_TYPE_SECS, true);
>  	if (IS_ERR(secs_epc)) {
>  		ret = PTR_ERR(secs_epc);
>  		goto err_out_backing;
> @@ -300,7 +300,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
>  	if (IS_ERR(encl_page))
>  		return PTR_ERR(encl_page);
>  
> -	epc_page = sgx_alloc_epc_page(encl_page, true);
> +	epc_page = sgx_alloc_epc_page(encl_page, SGX_PAGE_TYPE_REG, true);
>  	if (IS_ERR(epc_page)) {
>  		kfree(encl_page);
>  		return PTR_ERR(epc_page);
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 63d3de02bbcc..643df87b3e01 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -401,7 +401,12 @@ static void sgx_reclaim_pages(void)
>  		section = &sgx_epc_sections[epc_page->section];
>  		node = section->node;
>  
> +		/* drop poison pages instead of adding to free list */
> +		if (epc_page->flags & SGX_EPC_PAGE_POISON)
> +			continue;
> +
>  		spin_lock(&node->lock);
> +		epc_page->flags = SGX_EPC_PAGE_FREE;
>  		list_add_tail(&epc_page->list, &node->free_page_list);
>  		sgx_nr_free_pages++;
>  		spin_unlock(&node->lock);
> @@ -560,6 +565,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
>  /**
>   * sgx_alloc_epc_page() - Allocate an EPC page
>   * @owner:	the owner of the EPC page
> + * @type:	type of page being allocated
>   * @reclaim:	reclaim pages if necessary
>   *
>   * Iterate through EPC sections and borrow a free EPC page to the caller. When a
> @@ -574,7 +580,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
>   *   an EPC page,
>   *   -errno on error
>   */
> -struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
> +struct sgx_epc_page *sgx_alloc_epc_page(void *owner, enum sgx_page_type type, bool reclaim)
>  {
>  	struct sgx_epc_page *page;
>  
> @@ -582,6 +588,8 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
>  		page = __sgx_alloc_epc_page();
>  		if (!IS_ERR(page)) {
>  			page->owner = owner;
> +			page->type = type;
> +			page->flags = 0;
>  			break;
>  		}
>  
> @@ -616,14 +624,22 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
>   * responsibility to make sure that the page is in uninitialized state. In other
>   * words, do EREMOVE, EWB or whatever operation is necessary before calling
>   * this function.
> + *
> + * Note that if the page has been tagged as poisoned, it is simply
> + * dropped on the floor instead of added to the free list to make
> + * sure we do not re-use it.
>   */
>  void sgx_free_epc_page(struct sgx_epc_page *page)
>  {
>  	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
>  	struct sgx_numa_node *node = section->node;
>  
> +	if (page->flags & SGX_EPC_PAGE_POISON)
> +		return;

I tend to think that it would be nice to collect them somewhere instead
of purposely leaking them. E.g. this gives the possibility to examine the
list with debugging tools.

/Jarkko

* Re: [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages
  2021-07-09 18:08   ` Jarkko Sakkinen
@ 2021-07-09 18:09     ` Jarkko Sakkinen
  0 siblings, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-07-09 18:09 UTC (permalink / raw)
  To: Tony Luck; +Cc: Dave Hansen, x86, linux-sgx, linux-kernel

On Fri, Jul 09, 2021 at 09:08:03PM +0300, Jarkko Sakkinen wrote:
> On Thu, Jul 08, 2021 at 11:14:20AM -0700, Tony Luck wrote:
> > Memory errors can be reported either synchronously as memory is accessed,
> > or asynchronously by speculative access or by a memory controller page
> > scrubber.  The life cycle of an EPC page takes it through:
> > 	dirty -> free -> in-use -> free.
> > 
> > Memory errors are reported using physical addresses. It is a simple
> > matter to find which sgx_epc_page structure maps a given address.
> > But then recovery code needs to be able to determine the current use of
> > the page to take the appropriate recovery action. Within the "in-use"
> > phase different actions are needed based on how the page is used in
> > the enclave.
> > 
> > Add new flags bits to describe the phase (with an extra bit for the new
> > phase of "poisoned"). Drop pages marked as poisoned instead of adding
> > them to a free list to make sure they are not re-used.
> > 
> > Add a type field to struct epc_page for how an in-use page has been
> > allocated. Re-use "enum sgx_page_type" for this type, with a couple
> > of additions for s/w types.
> > 
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  arch/x86/include/asm/sgx.h      |  6 ++++++
> >  arch/x86/kernel/cpu/sgx/encl.c  |  4 ++--
> >  arch/x86/kernel/cpu/sgx/ioctl.c |  4 ++--
> >  arch/x86/kernel/cpu/sgx/main.c  | 21 +++++++++++++++++++--
> >  arch/x86/kernel/cpu/sgx/sgx.h   | 14 ++++++++++++--
> >  arch/x86/kernel/cpu/sgx/virt.c  |  2 +-
> >  6 files changed, 42 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
> > index 9c31e0ebc55b..9619a6d77a83 100644
> > --- a/arch/x86/include/asm/sgx.h
> > +++ b/arch/x86/include/asm/sgx.h
> > @@ -216,6 +216,8 @@ struct sgx_pageinfo {
> >   * %SGX_PAGE_TYPE_REG:	a regular page
> >   * %SGX_PAGE_TYPE_VA:	a VA page
> >   * %SGX_PAGE_TYPE_TRIM:	a page in trimmed state
> > + *
> > + * Also used to track current use of &struct sgx_epc_page
> >   */
> >  enum sgx_page_type {
> >  	SGX_PAGE_TYPE_SECS,
> > @@ -223,6 +225,10 @@ enum sgx_page_type {
> >  	SGX_PAGE_TYPE_REG,
> >  	SGX_PAGE_TYPE_VA,
> >  	SGX_PAGE_TYPE_TRIM,
> > +
> > +	/* sgx_epc_page.type */
> > +	SGX_PAGE_TYPE_FREE = 100,
> > +	SGX_PAGE_TYPE_KVM = 101,
> >  };
> >  
> >  #define SGX_NR_PAGE_TYPES	5
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > index 3be203297988..abf6e1a704c0 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -72,7 +72,7 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
> >  	struct sgx_epc_page *epc_page;
> >  	int ret;
> >  
> > -	epc_page = sgx_alloc_epc_page(encl_page, false);
> > +	epc_page = sgx_alloc_epc_page(encl_page, SGX_PAGE_TYPE_REG, false);
> >  	if (IS_ERR(epc_page))
> >  		return epc_page;
> >  
> > @@ -679,7 +679,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void)
> >  	struct sgx_epc_page *epc_page;
> >  	int ret;
> >  
> > -	epc_page = sgx_alloc_epc_page(NULL, true);
> > +	epc_page = sgx_alloc_epc_page(NULL,  SGX_PAGE_TYPE_VA, true);
> >  	if (IS_ERR(epc_page))
> >  		return ERR_CAST(epc_page);
> >  
> > diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> > index 83df20e3e633..a74ae00194cc 100644
> > --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> > +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> > @@ -83,7 +83,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
> >  
> >  	encl->backing = backing;
> >  
> > -	secs_epc = sgx_alloc_epc_page(&encl->secs, true);
> > +	secs_epc = sgx_alloc_epc_page(&encl->secs, SGX_PAGE_TYPE_SECS, true);
> >  	if (IS_ERR(secs_epc)) {
> >  		ret = PTR_ERR(secs_epc);
> >  		goto err_out_backing;
> > @@ -300,7 +300,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
> >  	if (IS_ERR(encl_page))
> >  		return PTR_ERR(encl_page);
> >  
> > -	epc_page = sgx_alloc_epc_page(encl_page, true);
> > +	epc_page = sgx_alloc_epc_page(encl_page, SGX_PAGE_TYPE_REG, true);
> >  	if (IS_ERR(epc_page)) {
> >  		kfree(encl_page);
> >  		return PTR_ERR(epc_page);
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > index 63d3de02bbcc..643df87b3e01 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -401,7 +401,12 @@ static void sgx_reclaim_pages(void)
> >  		section = &sgx_epc_sections[epc_page->section];
> >  		node = section->node;
> >  
> > +		/* drop poison pages instead of adding to free list */
> > +		if (epc_page->flags & SGX_EPC_PAGE_POISON)
> > +			continue;
> > +
> >  		spin_lock(&node->lock);
> > +		epc_page->flags = SGX_EPC_PAGE_FREE;
> >  		list_add_tail(&epc_page->list, &node->free_page_list);
> >  		sgx_nr_free_pages++;
> >  		spin_unlock(&node->lock);
> > @@ -560,6 +565,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
> >  /**
> >   * sgx_alloc_epc_page() - Allocate an EPC page
> >   * @owner:	the owner of the EPC page
> > + * @type:	type of page being allocated
> >   * @reclaim:	reclaim pages if necessary
> >   *
> >   * Iterate through EPC sections and borrow a free EPC page to the caller. When a
> > @@ -574,7 +580,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
> >   *   an EPC page,
> >   *   -errno on error
> >   */
> > -struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
> > +struct sgx_epc_page *sgx_alloc_epc_page(void *owner, enum sgx_page_type type, bool reclaim)
> >  {
> >  	struct sgx_epc_page *page;
> >  
> > @@ -582,6 +588,8 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
> >  		page = __sgx_alloc_epc_page();
> >  		if (!IS_ERR(page)) {
> >  			page->owner = owner;
> > +			page->type = type;
> > +			page->flags = 0;
> >  			break;
> >  		}
> >  
> > @@ -616,14 +624,22 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
> >   * responsibility to make sure that the page is in uninitialized state. In other
> >   * words, do EREMOVE, EWB or whatever operation is necessary before calling
> >   * this function.
> > + *
> > + * Note that if the page has been tagged as poisoned, it is simply
> > + * dropped on the floor instead of added to the free list to make
> > + * sure we do not re-use it.
> >   */
> >  void sgx_free_epc_page(struct sgx_epc_page *page)
> >  {
> >  	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
> >  	struct sgx_numa_node *node = section->node;
> >  
> > +	if (page->flags & SGX_EPC_PAGE_POISON)
> > +		return;
> 
> I tend to think that it would be nice to collect them somewhere instead
> of purposely leaking them. E.g. this gives the possibility to examine the
> list with debugging tools.

I'm also not sure why free and dirty pages need to be tagged. Why isn't
a poison flag enough? This could be better explained in the commit
message.


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages
  2021-07-08 18:14 ` [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages Tony Luck
  2021-07-09 18:08   ` Jarkko Sakkinen
@ 2021-07-14 20:42   ` Reinette Chatre
  2021-07-14 20:59     ` Luck, Tony
  1 sibling, 1 reply; 185+ messages in thread
From: Reinette Chatre @ 2021-07-14 20:42 UTC (permalink / raw)
  To: Tony Luck; +Cc: Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

Hi Tony,

On 7/8/2021 11:14 AM, Tony Luck wrote:
> 
> Add a type field to struct epc_page for how an in-use page has been
> allocated. Re-use "enum sgx_page_type" for this type, with a couple
> of additions for s/w types.

Tracking the enclave page type is a useful addition that will also help 
the SGX2 support where some instructions (ENCLS[EMODPR]) are only 
allowed on pages with a particular type.

Could this tracking be done at the enclave page (struct sgx_encl_page) 
instead? The enclave page's EPC page information is not available when 
the page is in swap and it would be useful to know the page type without 
loading the page from swap. The information would continue to be 
accessible from struct epc_page via the owner pointer that may make some 
of the changes easier since it would not be needed to pass the page type 
around so much and thus possibly address the SECS page issue that Sean 
pointed out in
https://lore.kernel.org/lkml/YO3FuBupQTKYaKBf@google.com/

> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index 4628acec0009..e43d3c27eb96 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -26,9 +26,19 @@
>   /* Pages, which are being tracked by the page reclaimer. */
>   #define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
>   
> +/* Pages, on the "sgx_dirty_page_list" */
> +#define SGX_EPC_PAGE_DIRTY		BIT(1)
> +
> +/* Pages, on one of the node free lists */
> +#define SGX_EPC_PAGE_FREE		BIT(2)
> +
> +/* Pages, with h/w poison errors */
> +#define SGX_EPC_PAGE_POISON		BIT(3)
> +
>   struct sgx_epc_page {
>   	unsigned int section;
> -	unsigned int flags;
> +	u16 flags;
> +	u16 type;

Could this be "enum sgx_page_type type" ?

Reinette

^ permalink raw reply	[flat|nested] 185+ messages in thread

* RE: [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages
  2021-07-14 20:42   ` Reinette Chatre
@ 2021-07-14 20:59     ` Luck, Tony
  2021-07-14 21:21       ` Reinette Chatre
  0 siblings, 1 reply; 185+ messages in thread
From: Luck, Tony @ 2021-07-14 20:59 UTC (permalink / raw)
  To: Chatre, Reinette
  Cc: Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

> Could this tracking be done at the enclave page (struct sgx_encl_page) 
> instead?

In principle yes. Though Sean has some issues with me tracking types
at all.

> The enclave page's EPC page information is not available when 
> the page is in swap and it would be useful to know the page type without 
> loading the page from swap. The information would continue to be 
> accessible from struct epc_page via the owner pointer that may make some 
> of the changes easier since it would not be needed to pass the page type 
> around so much and thus possibly address the SECS page issue that Sean 
> pointed out in
> https://lore.kernel.org/lkml/YO3FuBupQTKYaKBf@google.com/

I think I noticed that the "owner" pointer in sgx_encl_page doesn't point
back to the epc_page for all types of SGX pages. So some additional
changes would be needed. I'm not at all sure why this is different (or
what the non-REG pages use "owner" for).

>>   struct sgx_epc_page {
>>   	unsigned int section;
>> -	unsigned int flags;
>> +	u16 flags;
>> +	u16 type;
>
> Could this be "enum sgx_page_type type" ?

Maybe. I thought I needed extra types (like FREE and DIRTY). But
Sean pointed out how to avoid some of them.

-Tony

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages
  2021-07-14 20:59     ` Luck, Tony
@ 2021-07-14 21:21       ` Reinette Chatre
  2021-07-14 23:08         ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Reinette Chatre @ 2021-07-14 21:21 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

Hi Tony,

On 7/14/2021 1:59 PM, Luck, Tony wrote:
>> Could this tracking be done at the enclave page (struct sgx_encl_page)
>> instead?
> 
> In principle yes. Though Sean has some issues with me tracking types
> at all.

For the SGX2 work, knowing the page types is useful. Some instructions
only work on certain page types and knowing beforehand whether an 
instruction could work helps to avoid dealing with the errors when it 
does not work.

>> The enclave page's EPC page information is not available when
>> the page is in swap and it would be useful to know the page type without
>> loading the page from swap. The information would continue to be
>> accessible from struct epc_page via the owner pointer that may make some
>> of the changes easier since it would not be needed to pass the page type
>> around so much and thus possibly address the SECS page issue that Sean
>> pointed out in
>> https://lore.kernel.org/lkml/YO3FuBupQTKYaKBf@google.com/
> 
> I think I noticed that the "owner" pointer in sgx_encl_page doesn't point
> back to the epc_page for all types of SGX pages. So some additional
> changes would be needed. I'm not at all sure why this is different (or
> what the non-REG pages use "owner" for).

This may be VA pages? struct sgx_va_page also contains a pointer to an 
EPC page. I did not consider that for this case. Perhaps these could be 
identified uniquely.

Reinette

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages
  2021-07-14 21:21       ` Reinette Chatre
@ 2021-07-14 23:08         ` Sean Christopherson
  2021-07-14 23:39           ` Luck, Tony
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2021-07-14 23:08 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Luck, Tony, Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

On Wed, Jul 14, 2021, Reinette Chatre wrote:
> Hi Tony,
> 
> On 7/14/2021 1:59 PM, Luck, Tony wrote:
> > > Could this tracking be done at the enclave page (struct sgx_encl_page)
> > > instead?
> > 
> > In principle yes. Though Sean has some issues with me tracking types
> > at all.

I've no objection to tracking the type for SGX2, my argument in the context of
#MC support is that there should be no need to track the type.  Either the #MC
is recoverable or it isn't, and the enclave is toast regardless of what type of
page hit the #MC.

There might be a need to track vEPC pages, e.g. to avoid the retpoline
associated with a virtual function table, but IMO that would be better done as a
new flag instead of overloading the page type.  E.g. a page can be both a
vEPC page and an SECS/REG/VA page depending on its use in the guest.

> For the SGX2 work, knowing the page types is useful. Some instructions only
> work on certain page types and knowing beforehand whether an instruction
> could work helps to avoid dealing with the errors when it does not work.

Yes, but the SGX2 use case is specific to "native" enclaves, i.e. it can and
should be limited to sgx_encl_page, as opposed to being shoved into sgx_epc_page.

> > > The enclave page's EPC page information is not available when
> > > the page is in swap and it would be useful to know the page type without
> > > loading the page from swap. The information would continue to be
> > > accessible from struct epc_page via the owner pointer that may make some
> > > of the changes easier since it would not be needed to pass the page type
> > > around so much and thus possibly address the SECS page issue that Sean
> > > pointed out in
> > > https://lore.kernel.org/lkml/YO3FuBupQTKYaKBf@google.com/
> > 
> > I think I noticed that the "owner" pointer in sgx_encl_page doesn't point
> > back to the epc_page for all types of SGX pages. So some additional
> > changes would be needed. I'm not at all sure why this is different (or
> > what the non-REG pages use "owner" for).
> 
> This may be VA pages? struct sgx_va_page also contains a pointer to an EPC
> page. I did not consider that for this case. Perhaps these could be
> identified uniquely.

The "owner" is currently only used for reclaim.  IIRC, the proposed EPC cgroup
also used "owner" to enable forced "reclaim", i.e. reclaiming EPC by nuking the
owning entity, e.g. tearing down a virtual EPC section.  And I believe the cgroup
also used the aforementioned vEPC flag to invoke the correct EPC OOM reaper.

^ permalink raw reply	[flat|nested] 185+ messages in thread

* RE: [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages
  2021-07-14 23:08         ` Sean Christopherson
@ 2021-07-14 23:39           ` Luck, Tony
  2021-07-15 15:33             ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Luck, Tony @ 2021-07-14 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Chatre, Reinette
  Cc: Jarkko Sakkinen, Dave Hansen, x86, linux-sgx, linux-kernel

> I've no objection to tracking the type for SGX2, my argument in the context of
> #MC support is that there should be no need to track the type.  Either the #MC
> is recoverable or it isn't, and the enclave is toast regardless of what type of
> page hit the #MC.

I'll separate the "phase" from the "type".

Here phase is used for the life-cycle of EPC pages:

DIRTY -> FREE -> IN-USE -> DIRTY

Errors can be reported by memory controller page scrubbers
for pages that are not "IN-USE" ... and the recovery action is
just to make sure that they are never allocated.

When a page is IN-USE ... it has a "type". I currently
only have a way to inject errors into SGX_PAGE_TYPE_REG
pages. That means initial recovery code is going to focus on
those since that is all I can test. But I'll try not to special case
them as far as possible.

-Tony

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages
  2021-07-14 23:39           ` Luck, Tony
@ 2021-07-15 15:33             ` Sean Christopherson
  0 siblings, 0 replies; 185+ messages in thread
From: Sean Christopherson @ 2021-07-15 15:33 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Chatre, Reinette, Jarkko Sakkinen, Dave Hansen, x86, linux-sgx,
	linux-kernel

On Wed, Jul 14, 2021, Luck, Tony wrote:
> > I've no objection to tracking the type for SGX2, my argument in the context of
> > #MC support is that there should be no need to track the type.  Either the #MC
> > is recoverable or it isn't, and the enclave is toast regardless of what type of
> > page hit the #MC.
> 
> I'll separate the "phase" from the "type".
> 
> Here phase is used for the life-cycle of EPC pages:
> 
> DIRTY -> FREE -> IN-USE -> DIRTY

Not that it affects anything, but that's not quite true.  In hardware, pages are
either FREE or IN-USE, there is no concept of DIRTY.  DIRTY is the kernel's
arbitrary description of a page that has not been sanitized and so is considered
to be in an unknown state, i.e. the kernel doesn't know if it's FREE or IN-USE.

Once a page is sanitized (during boot), its state is known and the page is never
put back on the so called dirty list, i.e. the software flow is:

  DIRTY -> FREE -> IN-USE -> FREE

> Errors can be reported by memory controller page scrubbers for pages that are
> not "IN-USE" ... and the recovery action is just to make sure that they are
> never allocated.
>
> When a page is IN-USE ... it has a "type". I currently only have a way to
> inject errors into SGX_PAGE_TYPE_REG pages. That means initial recovery code
> is going to focus on those since that is all I can test. But I'll try not to
> special case them as far as possible.

Inability to test expected behavior doesn't mean we shouldn't implement towards
the expected behavior, i.e. someone somewhere must know how SECS and VA pages
behave in response to a memory error.

^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v2 0/6] Basic recovery for machine checks inside SGX
  2021-07-08 18:14 [PATCH 0/4] Basic recovery for machine checks inside SGX Tony Luck
                   ` (3 preceding siblings ...)
  2021-07-08 18:14 ` [PATCH 4/4] x86/sgx: Add hook to error injection address validation Tony Luck
@ 2021-07-19 18:20 ` Tony Luck
  2021-07-19 18:20   ` [PATCH v2 1/6] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
                     ` (7 more replies)
  4 siblings, 8 replies; 185+ messages in thread
From: Tony Luck @ 2021-07-19 18:20 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: x86, linux-kernel, Tony Luck

Very different from version 1 based on feedback.

Sean:	Didn't like tracking types of SGX pages, so that's all gone now. I
	do track the life cycle (in patch 1) using the "owner" field to
	determine whether a page is in use vs. dirty/free. Currently
	this series doesn't make use of that ... so patch 1 could be
	dropped. But it is very small, and I think a pre-requisite for
	future improvements to take pre-emptive action for asynch poison
	notification (rather than just hoping that the enclave will exit
	without accessing poison, or that if it does consume the poison
	the error will be recoverable).

	I think we should defer the whole asynch action to a subsequent
	series that can build on top of this (and do it properly ...
	my version 1 sent out SIGBUS signals without regard for system
	(/proc/sys/vm/memory_failure_early_kill) or per-task (prctl
	PR_MCE_KILL) policies).

Jarkko:	Said poison pages should not just be dropped on the floor. They
	should be added to a list for future tools to examine. I tried
	the list approach, but safely removing pages from free/dirty
	lists involved some complex locking, so I skipped ahead to the
	"tools" idea and just added files in debugfs to show the count
	of poison pages and a list of addresses (maybe the count is
	redundant? Could just "wc -l poison_page_list"?).

Other:	I got a complaint that after a poison page is handled Linux
	spits out this message:
		Could not invalidate pfn=0x2000c4d from 1:1 map
	this is from set_mce_nospec() and happens because EPC pages
	are not in the 1:1 map. Add code to check and ignore them.

Tony Luck (6):
  x86/sgx: Provide indication of life-cycle of EPC pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook sgx_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation

 .../firmware-guide/acpi/apei/einj.rst         |  19 +++
 arch/x86/include/asm/set_memory.h             |   4 +
 arch/x86/kernel/cpu/sgx/encl.c                |   2 +-
 arch/x86/kernel/cpu/sgx/main.c                | 137 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |   6 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 include/linux/mm.h                            |  15 ++
 mm/memory-failure.c                           |  19 ++-
 8 files changed, 195 insertions(+), 10 deletions(-)


base-commit: 2734d6c1b1a089fb593ef6a23d4b70903526fe0c
-- 
2.29.2


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v2 1/6] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-07-19 18:20 ` [PATCH v2 0/6] Basic recovery for machine checks inside SGX Tony Luck
@ 2021-07-19 18:20   ` Tony Luck
  2021-07-19 18:28     ` Dave Hansen
  2021-07-27  2:04     ` Sakkinen, Jarkko
  2021-07-19 18:20   ` [PATCH v2 2/6] x86/sgx: Add infrastructure to identify SGX " Tony Luck
                     ` (6 subsequent siblings)
  7 siblings, 2 replies; 185+ messages in thread
From: Tony Luck @ 2021-07-19 18:20 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: x86, linux-kernel, Tony Luck

SGX EPC pages go through the following life cycle:

	DIRTY ---> FREE ---> IN-USE --\
                    ^                 |
                    \-----------------/

Recovery action for poison for a DIRTY or FREE page is simple. Just
make sure never to allocate the page. IN-USE pages need some extra
handling.

It would be good to use the sgx_epc_page->owner field as an indicator
of where an EPC page is currently in that cycle (owner != NULL means
the EPC page is IN-USE). But there is one caller, sgx_alloc_va_page(),
that calls with NULL.

Make the following changes:

1) Change the type of "owner" to "void *" (it can have other types
   besides "struct sgx_encl_page *").
2) Update sgx_alloc_va_page() to pass in a dummy non-NULL value in
   this case.
3) Add a check to sgx_alloc_epc_page() to prevent calling with NULL.
4) Reset owner to NULL in sgx_free_epc_page().

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c | 2 +-
 arch/x86/kernel/cpu/sgx/main.c | 6 ++++++
 arch/x86/kernel/cpu/sgx/sgx.h  | 2 +-
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 001808e3901c..ca328d56d230 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -679,7 +679,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void)
 	struct sgx_epc_page *epc_page;
 	int ret;
 
-	epc_page = sgx_alloc_epc_page(NULL, true);
+	epc_page = sgx_alloc_epc_page("Not NULL!", true);
 	if (IS_ERR(epc_page))
 		return ERR_CAST(epc_page);
 
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de02bbcc..d61bc1f635a1 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -578,6 +578,11 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 {
 	struct sgx_epc_page *page;
 
+	if (!owner) {
+		WARN_ON_ONCE(1);
+		return NULL;
+	}
+
 	for ( ; ; ) {
 		page = __sgx_alloc_epc_page();
 		if (!IS_ERR(page)) {
@@ -624,6 +629,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	spin_lock(&node->lock);
 
+	page->owner = NULL;
 	list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..4e1a410b8a62 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -29,7 +29,7 @@
 struct sgx_epc_page {
 	unsigned int section;
 	unsigned int flags;
-	struct sgx_encl_page *owner;
+	void *owner;
 	struct list_head list;
 };
 
-- 
2.29.2

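For illustration only (not part of the patch): the life cycle in the
commit message above can be sketched in user-space C. This is a minimal
model under stated assumptions, not the kernel implementation. The
struct epc_page here, the on_dirty_list field, and the epc_* helper
names are hypothetical stand-ins; only the idea that a non-NULL "owner"
means IN-USE comes from the patch description.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct sgx_epc_page. */
struct epc_page {
	void *owner;		/* non-NULL only while the page is IN-USE */
	bool on_dirty_list;	/* assumed marker: true until sanitized */
};

enum epc_phase { EPC_DIRTY, EPC_FREE, EPC_IN_USE };

/* Derive the phase from the two fields instead of storing it. */
static enum epc_phase epc_page_phase(const struct epc_page *page)
{
	if (page->owner)
		return EPC_IN_USE;
	return page->on_dirty_list ? EPC_DIRTY : EPC_FREE;
}

/* DIRTY -> FREE: happens once, when the page is sanitized. */
static void epc_sanitize(struct epc_page *page)
{
	page->on_dirty_list = false;
}

/* FREE -> IN-USE: refuse a NULL owner and pages that are not free. */
static bool epc_alloc(struct epc_page *page, void *owner)
{
	if (!owner || epc_page_phase(page) != EPC_FREE)
		return false;
	page->owner = owner;
	return true;
}

/* IN-USE -> FREE: clearing the owner is what marks the page free. */
static void epc_free(struct epc_page *page)
{
	page->owner = NULL;
}
```

In this model a page never returns to DIRTY once sanitized, which
matches the DIRTY -> FREE -> IN-USE -> FREE flow in the diagram.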

^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v2 2/6] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-07-19 18:20 ` [PATCH v2 0/6] Basic recovery for machine checks inside SGX Tony Luck
  2021-07-19 18:20   ` [PATCH v2 1/6] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
@ 2021-07-19 18:20   ` Tony Luck
  2021-07-19 18:20   ` [PATCH v2 3/6] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-07-19 18:20 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: x86, linux-kernel, Tony Luck

X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.

Add an end_phys_addr field to the sgx_epc_section structure and a
new function sgx_paddr_to_page() that searches all such structures
and returns the struct sgx_epc_page pointer if the address is an EPC
page. This function is only intended for use within SGX code.

Export a function sgx_is_epc_page() that simply reports whether an
address is an EPC page for use elsewhere in the kernel.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 24 ++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h  |  1 +
 2 files changed, 25 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index d61bc1f635a1..41753f81a071 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -654,6 +654,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	section->end_phys_addr = phys_addr + size - 1;
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -665,6 +666,29 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
+		section = &sgx_epc_sections[i];
+
+		if (paddr < section->phys_addr || paddr > section->end_phys_addr)
+			continue;
+
+		return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+	}
+
+	return NULL;
+}
+
+bool sgx_is_epc_page(u64 paddr)
+{
+	return !!sgx_paddr_to_page(paddr);
+}
+EXPORT_SYMBOL_GPL(sgx_is_epc_page);
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4e1a410b8a62..226b081a4d05 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -50,6 +50,7 @@ struct sgx_numa_node {
  */
 struct sgx_epc_section {
 	unsigned long phys_addr;
+	unsigned long end_phys_addr;
 	void *virt_addr;
 	struct sgx_epc_page *pages;
 	struct sgx_numa_node *node;
-- 
2.29.2

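For illustration only (not part of the patch): the address lookup
described in the commit message above can be tried out in user space.
This is a sketch under assumptions: the struct layouts are simplified
stand-ins, paddr_to_page() is a hypothetical name, PAGE_SHIFT is assumed
to be 12 (4 KiB pages), and the section array is passed in explicitly
rather than being a global. The extra check for a NULL pages pointer is
an addition of this sketch to skip unused section slots.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Simplified stand-in for struct sgx_epc_page. */
struct epc_page {
	unsigned int section;
};

/* Simplified stand-in for struct sgx_epc_section. */
struct epc_section {
	uint64_t phys_addr;	/* first byte of the section */
	uint64_t end_phys_addr;	/* last byte of the section (inclusive) */
	struct epc_page *pages;	/* one entry per page, NULL if unused */
};

/*
 * Return the page descriptor covering @paddr, or NULL if @paddr does
 * not fall inside any of the @nr sections.
 */
static struct epc_page *paddr_to_page(struct epc_section *sections,
				      size_t nr, uint64_t paddr)
{
	for (size_t i = 0; i < nr; i++) {
		struct epc_section *s = &sections[i];

		if (!s->pages)
			continue;
		if (paddr < s->phys_addr || paddr > s->end_phys_addr)
			continue;

		/* Index by page frame offset from the section start. */
		return &s->pages[(paddr - s->phys_addr) >> PAGE_SHIFT];
	}

	return NULL;
}
```

Because end_phys_addr is inclusive, the last byte of the last page still
resolves to that page, and the first byte past the section does not
match.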

^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v2 3/6] x86/sgx: Initial poison handling for dirty and free pages
  2021-07-19 18:20 ` [PATCH v2 0/6] Basic recovery for machine checks inside SGX Tony Luck
  2021-07-19 18:20   ` [PATCH v2 1/6] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
  2021-07-19 18:20   ` [PATCH v2 2/6] x86/sgx: Add infrastructure to identify SGX " Tony Luck
@ 2021-07-19 18:20   ` Tony Luck
  2021-07-27  2:08     ` Sakkinen, Jarkko
  2021-07-19 18:20   ` [PATCH v2 4/6] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-07-19 18:20 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: x86, linux-kernel, Tony Luck

A memory controller patrol scrubber can report poison in a page
that isn't currently being used.

Add a new flag bit (SGX_EPC_PAGE_POISON) that can be set for an
sgx_epc_page. Check for it:
1) When sanitizing dirty pages
2) When allocating pages
3) When freeing epc pages

In all cases drop the poisoned page to make sure it will not be
reallocated.

Add debugfs files /sys/kernel/debug/sgx/poison_page_{count,list}
so that system administrators can see how many enclave pages have
been dropped and get a list of those pages.
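
For illustration only (not part of the patch): the "drop instead of
reuse" rule for the allocate and free paths can be sketched in user
space. This is a minimal model under assumptions: PAGE_POISON stands in
for SGX_EPC_PAGE_POISON, the free list is a simple singly linked stack,
locking is omitted, and page_alloc()/page_free() are hypothetical names.
Keeping nr_free equal to the number of pages actually on the list is a
choice made in this sketch.

```c
#include <stddef.h>

#define PAGE_POISON 0x8u	/* stand-in for SGX_EPC_PAGE_POISON */

struct epc_page {
	unsigned int flags;
	struct epc_page *next;
};

struct free_list {
	struct epc_page *head;
	unsigned long nr_free;	/* pages currently on the list */
};

/* Free path: a poisoned page is dropped and never reaches the list. */
static void page_free(struct free_list *fl, struct epc_page *page)
{
	if (page->flags & PAGE_POISON)
		return;

	page->next = fl->head;
	fl->head = page;
	fl->nr_free++;
}

/*
 * Allocate path: a page may have been marked poisoned while it sat on
 * the free list, so pop pages until a clean one turns up. Poisoned
 * pages are discarded rather than put back.
 */
static struct epc_page *page_alloc(struct free_list *fl)
{
	while (fl->head) {
		struct epc_page *page = fl->head;

		fl->head = page->next;
		page->next = NULL;
		fl->nr_free--;

		if (!(page->flags & PAGE_POISON))
			return page;
	}

	return NULL;
}
```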

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 50 +++++++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h  |  3 ++
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 41753f81a071..db77f62d6ef1 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -11,6 +11,7 @@
 #include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 #include <linux/slab.h>
+#include <linux/debugfs.h>
 #include <asm/sgx.h>
 #include "driver.h"
 #include "encl.h"
@@ -34,6 +35,9 @@ static unsigned long sgx_nr_free_pages;
 /* Nodes with one or more EPC sections. */
 static nodemask_t sgx_numa_mask;
 
+/* Maintain a count of poison pages */
+static u32 poison_page_count;
+
 /*
  * Array with one list_head for each possible NUMA node.  Each
  * list contains all the sgx_epc_section's which are on that
@@ -47,6 +51,9 @@ static LIST_HEAD(sgx_dirty_page_list);
  * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
  * from the input list, and made available for the page allocator. SECS pages
  * prepending their children in the input list are left intact.
+ *
+ * Don't try to clean a poisoned page. That might trigger a machine check.
+ * Just drop the page and move on.
  */
 static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 {
@@ -61,6 +68,11 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
+		if (page->flags & SGX_EPC_PAGE_POISON) {
+			list_del(&page->list);
+			continue;
+		}
+
 		ret = __eremove(sgx_get_epc_virt_addr(page));
 		if (!ret) {
 			/*
@@ -567,6 +579,9 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
  * @reclaim is set to true, directly reclaim pages when we are out of pages. No
  * mm's can be locked when @reclaim is set to true.
  *
+ * A page on the free list might have been reported as poisoned by the patrol
+ * scrubber. If so, skip this page, and try again.
+ *
  * Finally, wake up ksgxd when the number of pages goes below the watermark
  * before returning back to the caller.
  *
@@ -585,6 +600,10 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 
 	for ( ; ; ) {
 		page = __sgx_alloc_epc_page();
+
+		if (page->flags & SGX_EPC_PAGE_POISON)
+			continue;
+
 		if (!IS_ERR(page)) {
 			page->owner = owner;
 			break;
@@ -621,6 +640,8 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
  * responsibility to make sure that the page is in uninitialized state. In other
  * words, do EREMOVE, EWB or whatever operation is necessary before calling
  * this function.
+ *
+ * Drop poison pages so they won't be reallocated.
  */
 void sgx_free_epc_page(struct sgx_epc_page *page)
 {
@@ -630,7 +651,8 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 	spin_lock(&node->lock);
 
 	page->owner = NULL;
-	list_add_tail(&page->list, &node->free_page_list);
+	if (!(page->flags & SGX_EPC_PAGE_POISON))
+		list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 
 	spin_unlock(&node->lock);
@@ -820,8 +842,30 @@ int sgx_set_attribute(unsigned long *allowed_attributes,
 }
 EXPORT_SYMBOL_GPL(sgx_set_attribute);
 
+static int poison_list_show(struct seq_file *m, void *private)
+{
+	struct sgx_epc_section *section;
+	struct sgx_epc_page *page;
+	unsigned long addr;
+	int i;
+
+	for (i = 0; i < SGX_MAX_EPC_SECTIONS; i++) {
+		section = &sgx_epc_sections[i];
+		page = section->pages;
+		for (addr = section->phys_addr; addr < section->end_phys_addr;
+		     addr += PAGE_SIZE, page++) {
+			if (page->flags & SGX_EPC_PAGE_POISON)
+				seq_printf(m, "0x%lx\n", addr);
+		}
+	}
+	return 0;
+}
+
+DEFINE_SHOW_ATTRIBUTE(poison_list);
+
 static int __init sgx_init(void)
 {
+	struct dentry *dir;
 	int ret;
 	int i;
 
@@ -853,6 +897,10 @@ static int __init sgx_init(void)
 	if (sgx_vepc_init() && ret)
 		goto err_provision;
 
+	dir = debugfs_create_dir("sgx", NULL);
+	debugfs_create_u32("poison_page_count", 0400, dir, &poison_page_count);
+	debugfs_create_file("poison_page_list", 0400, dir, NULL, &poison_list_fops);
+
 	return 0;
 
 err_provision:
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 226b081a4d05..2c3987ecdfe4 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -26,6 +26,9 @@
 /* Pages, which are being tracked by the page reclaimer. */
 #define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
 
+/* Poisoned pages */
+#define SGX_EPC_PAGE_POISON		BIT(1)
+
 struct sgx_epc_page {
 	unsigned int section;
 	unsigned int flags;
-- 
2.29.2



* [PATCH v2 4/6] x86/sgx: Add SGX infrastructure to recover from poison
  2021-07-19 18:20 ` [PATCH v2 0/6] Basic recovery for machine checks inside SGX Tony Luck
                     ` (2 preceding siblings ...)
  2021-07-19 18:20   ` [PATCH v2 3/6] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-07-19 18:20   ` Tony Luck
  2021-07-19 18:20   ` [PATCH v2 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code Tony Luck
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-07-19 18:20 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: x86, linux-kernel, Tony Luck

Provide a recovery function sgx_memory_failure(). If the poison was
consumed synchronously then send a SIGBUS. Note that, unlike poison
consumed outside of SGX enclaves, the virtual address of the access is
not included with the SIGBUS. This doesn't matter, as addresses of
code/data inside an enclave are of little to no use to code executing
outside the (now dead) enclave.
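
For illustration only, here is a minimal user-space sketch of how a task
might notice such a SIGBUS. The handler and helper names are made up for
this example and are not part of the patch; it assumes nothing beyond
POSIX sigaction(). The handler only records that the signal arrived and
deliberately ignores si_addr, since no useful address is supplied in the
SGX case described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t got_sigbus;

/*
 * Hypothetical handler: note that SIGBUS arrived. si_addr is not
 * consulted, because it carries no useful address in the SGX case.
 */
static void on_sigbus(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)info;
	(void)ucontext;
	got_sigbus = 1;
}

/* Install the handler. Returns 0 on success, -1 on failure. */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigbus;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);

	return sigaction(SIGBUS, &sa, NULL);
}
```

A real runtime would tear down or rebuild the enclave once the flag is
seen; what to do at that point is a policy choice outside this sketch.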

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 57 ++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index db77f62d6ef1..430d3214d21e 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -711,6 +711,63 @@ bool sgx_is_epc_page(u64 paddr)
 }
 EXPORT_SYMBOL_GPL(sgx_is_epc_page);
 
+/*
+ * Called in process context to handle a hardware reported
+ * error in an SGX EPC page.
+ * If the MF_ACTION_REQUIRED bit is set in flags, then the
+ * context is the task that consumed the poison data. Otherwise
+ * this is called from a kernel thread unrelated to the page.
+ */
+int sgx_memory_failure(unsigned long pfn, int flags)
+{
+	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+
+	/*
+	 * mm/memory-failure.c calls this routine for all errors
+	 * where there isn't a "struct page" for the address. But that
+	 * includes other address ranges besides SGX.
+	 */
+	if (!page)
+		return -ENXIO;
+
+	/*
+	 * If poison was consumed synchronously, send a SIGBUS to
+	 * the task. Hardware has already exited the SGX enclave and
+	 * will not allow re-entry to an enclave that has a memory
+	 * error. The signal may help the task understand why the
+	 * enclave is broken.
+	 */
+	if (flags & MF_ACTION_REQUIRED)
+		force_sig(SIGBUS);
+
+	/* Already poisoned? Nothing more to do */
+	if (page->flags & SGX_EPC_PAGE_POISON)
+		return 0;
+
+	page->flags |= SGX_EPC_PAGE_POISON;
+	poison_page_count++;
+
+	/*
+	 * Nothing more to do here for dirty/free pages.
+	 * They will be added to the poison list when
+	 * they get to the head of their lists.
+	 */
+	if (!page->owner)
+		return 0;
+
+	/*
+	 * TBD: Add additional plumbing to enable pre-emptive
+	 * action for asynchronous poison notification. Until
+	 * then just hope that the poison:
+	 * a) is not accessed - sgx_free_epc_page() will deal with it
+	 *    when the user gives it back
+	 * b) results in a recoverable machine check rather than
+	 *    a fatal one
+	 */
+
+	return 0;
+}
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.29.2



* [PATCH v2 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code
  2021-07-19 18:20 ` [PATCH v2 0/6] Basic recovery for machine checks inside SGX Tony Luck
                     ` (3 preceding siblings ...)
  2021-07-19 18:20   ` [PATCH v2 4/6] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
@ 2021-07-19 18:20   ` Tony Luck
  2021-07-19 18:20   ` [PATCH v2 6/6] x86/sgx: Add hook to error injection address validation Tony Luck
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-07-19 18:20 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: x86, linux-kernel, Tony Luck

Add a call inside memory_failure() to check if the address is an SGX
EPC page and handle it.

Note the SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.

Pull the call to acquire the mutex earlier so the SGX errors are also
protected.

Make set_mce_nospec() skip SGX pages when trying to adjust
the 1:1 map.
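
The control flow described above can be sketched in user space roughly
as follows. This is a simplified model, not the kernel code: the pfn
ranges, the model_* names and the counter standing in for mf_mutex are
all assumptions made for the example, and the path for pages that do
have a "struct page" is left out. It shows the SGX handler being tried
first, the device mapping handler second, and every exit going through
a single unlock label.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for mf_mutex: counts lock depth so balance can be checked. */
static int model_lock_depth;

static void model_lock(void)   { model_lock_depth++; }
static void model_unlock(void) { model_lock_depth--; }

/* Hypothetical layout: pfns 0x1000-0x1fff are SGX EPC pages. */
static int model_sgx_memory_failure(unsigned long pfn)
{
	if (pfn < 0x1000 || pfn > 0x1fff)
		return -ENXIO;	/* not ours, let the caller keep looking */
	return 0;
}

/* Hypothetical layout: pfns 0x2000-0x2fff belong to a device mapping. */
static int model_dev_pagemap_failure(unsigned long pfn)
{
	if (pfn < 0x2000 || pfn > 0x2fff)
		return -ENXIO;	/* memory outside kernel control */
	return 0;
}

/* Model of the "no struct page" branch of memory_failure(). */
static int model_memory_failure(unsigned long pfn)
{
	int res;

	model_lock();

	res = model_sgx_memory_failure(pfn);
	if (res == 0)
		goto unlock;

	res = model_dev_pagemap_failure(pfn);

unlock:
	model_unlock();
	return res;
}
```

Taking the lock before the lookup is what lets the SGX branch share the
same serialization as the other error paths.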

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/set_memory.h |  4 ++++
 include/linux/mm.h                | 15 +++++++++++++++
 mm/memory-failure.c               | 19 +++++++++++++------
 3 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 43fa081a1adb..801af8f30c83 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_SET_MEMORY_H
 #define _ASM_X86_SET_MEMORY_H
 
+#include <linux/mm.h>
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
 
@@ -98,6 +99,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
 	unsigned long decoy_addr;
 	int rc;
 
+	/* SGX pages are not in the 1:1 map */
+	if (sgx_is_epc_page(pfn << PAGE_SHIFT))
+		return 0;
 	/*
 	 * We would like to just call:
 	 *      set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7ca22e6e694a..2ff599bcf8c2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3283,5 +3283,20 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
 	return 0;
 }
 
+#ifdef CONFIG_X86_SGX
+int sgx_memory_failure(unsigned long pfn, int flags);
+bool sgx_is_epc_page(u64 paddr);
+#else
+static inline int sgx_memory_failure(unsigned long pfn, int flags)
+{
+	return -ENXIO;
+}
+
+static inline bool sgx_is_epc_page(u64 paddr)
+{
+	return false;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index eefd823deb67..3ce6b6aabf0f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1626,21 +1626,28 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
 
+	mutex_lock(&mf_mutex);
+
 	p = pfn_to_online_page(pfn);
 	if (!p) {
+		res = sgx_memory_failure(pfn, flags);
+		if (res == 0)
+			goto unlock_mutex;
+
 		if (pfn_valid(pfn)) {
 			pgmap = get_dev_pagemap(pfn, NULL);
-			if (pgmap)
-				return memory_failure_dev_pagemap(pfn, flags,
-								  pgmap);
+			if (pgmap) {
+				res = memory_failure_dev_pagemap(pfn, flags,
+								 pgmap);
+				goto unlock_mutex;
+			}
 		}
 		pr_err("Memory failure: %#lx: memory outside kernel control\n",
 			pfn);
-		return -ENXIO;
+		res = -ENXIO;
+		goto unlock_mutex;
 	}
 
-	mutex_lock(&mf_mutex);
-
 try_again:
 	if (PageHuge(p)) {
 		res = memory_failure_hugetlb(pfn, flags);
-- 
2.29.2



* [PATCH v2 6/6] x86/sgx: Add hook to error injection address validation
  2021-07-19 18:20 ` [PATCH v2 0/6] Basic recovery for machine checks inside SGX Tony Luck
                     ` (4 preceding siblings ...)
  2021-07-19 18:20   ` [PATCH v2 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code Tony Luck
@ 2021-07-19 18:20   ` Tony Luck
  2021-07-27  1:54   ` [PATCH v2 0/6] Basic recovery for machine checks inside SGX Sakkinen, Jarkko
  2021-07-28 20:46   ` [PATCH v3 0/7] " Tony Luck
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-07-19 18:20 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: x86, linux-kernel, Tony Luck

SGX reserved memory does not appear in the standard address maps.

Add hook to call into the SGX code to check if an address is located
in SGX memory.

There are other challenges in injecting errors into SGX. Update the
documentation with a sequence of operations to inject.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 .../firmware-guide/acpi/apei/einj.rst         | 19 +++++++++++++++++++
 drivers/acpi/apei/einj.c                      |  3 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index c042176e1707..55e2331a6438 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,5 +181,24 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+Special notes for injection into SGX enclaves:
+
+There may be a separate BIOS setup option to enable SGX injection.
+
+The injection process consists of setting some special memory controller
+trigger that will inject the error on the next write to the target
+address. But the h/w prevents any software outside of an SGX enclave
+from accessing enclave pages (even BIOS SMM mode).
+
+The following sequence can be used:
+  1) Determine physical address of enclave page
+  2) Use "notrigger=1" mode to inject (this will set up
+     the injection address, but will not actually inject)
+  3) Enter the enclave
+  4) Store data to the virtual address matching physical address from step 1
+  5) Execute CLFLUSH for that virtual address
+  6) Spin delay for 250ms
+  7) Read from the virtual address. This will trigger the error
+
 For more information about EINJ, please refer to ACPI specification
 version 4.0, section 17.5 and ACPI 5.0, section 18.6.
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 2882450c443e..cd7cffc955bf 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -544,7 +544,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	    ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
 				!= REGION_INTERSECTS) &&
 	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
-				!= REGION_INTERSECTS)))
+				!= REGION_INTERSECTS) &&
+	     !sgx_is_epc_page(base_addr)))
 		return -EINVAL;
 
 inject:
-- 
2.29.2



* Re: [PATCH v2 1/6] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-07-19 18:20   ` [PATCH v2 1/6] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
@ 2021-07-19 18:28     ` Dave Hansen
  2021-07-27  2:04     ` Sakkinen, Jarkko
  1 sibling, 0 replies; 185+ messages in thread
From: Dave Hansen @ 2021-07-19 18:28 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Jarkko Sakkinen; +Cc: x86, linux-kernel

On 7/19/21 11:20 AM, Tony Luck wrote:
> 1) Change the type of "owner" to "void *" (it can have other types
>    besides "struct sgx_encl_page *").

I see:

static int __sgx_vepc_fault(struct sgx_vepc *vepc,
{
...
        epc_page = sgx_alloc_epc_page(vepc, false);

where sgx_alloc_epc_page() sets page->owner=vepc.  But, I don't see a
*reader* anywhere.  Do we actually use that vepc anywhere?


* Re: [PATCH v2 0/6] Basic recovery for machine checks inside SGX
  2021-07-19 18:20 ` [PATCH v2 0/6] Basic recovery for machine checks inside SGX Tony Luck
                     ` (5 preceding siblings ...)
  2021-07-19 18:20   ` [PATCH v2 6/6] x86/sgx: Add hook to error injection address validation Tony Luck
@ 2021-07-27  1:54   ` Sakkinen, Jarkko
  2021-07-28 20:46   ` [PATCH v3 0/7] " Tony Luck
  7 siblings, 0 replies; 185+ messages in thread
From: Sakkinen, Jarkko @ 2021-07-27  1:54 UTC (permalink / raw)
  To: Luck, Tony, Hansen, Dave, seanjc; +Cc: linux-kernel, x86

On Mon, 2021-07-19 at 11:20 -0700, Tony Luck wrote:
> Very different from version 1 based on feedback.
> 
> Sean:	Didn't like tracking types of SGX pages, so that's all gone now. I
> 	do track the life cycle (in patch 1) using the "owner" field to
> 	determine whether a page is in use vs. dirty/free. Currently
> 	this series doesn't make use of that ... so patch 1 could be
> 	dropped. But it is very small, and I think a pre-requisite for
> 	future improvements to take pre-emptive action for asynch poison
> 	notification (rather than just hoping that the enclave will exit
> 	without accessing poison, or that if it does consume the poison
> 	the error will be recoverable).
> 
> 	I think we should defer the whole asynch action to a subsequent
> 	series that can build on top of this (and do it properly ...
> 	my version 1 sent out SIGBUS signals without regard for system
> 	(/proc/sys/vm/memory_failure_early_kill) or per-task (prctl
> 	PR_MCE_KILL) policies).
> 
> Jarkko:	Said poison pages should not just be dropped on the floor. They
> 	should be added to a list for future tools to examine. I tried
> 	the list approach, but safely removing pages from free/dirty
> 	lists involved some complex locking, so I skipped ahead to the
> 	"tools" idea and just added files in debugfs to show the count
> 	of poison pages and a list of addresses (maybe the count is
> 	redundant? Could just "wc -l poison_page_list"?).
> 
> Other:	I got a complaint that after a poison page is handled Linux
> 	spits out this message:
> 		Could not invalidate pfn=0x2000c4d from 1:1 map
> 	this is from set_mce_nospec() and happens because EPC pages
> 	are not in the 1:1 map. Add code to check and ignore them.
> 
> Tony Luck (6):
>   x86/sgx: Provide indication of life-cycle of EPC pages
>   x86/sgx: Add infrastructure to identify SGX EPC pages
>   x86/sgx: Initial poison handling for dirty and free pages
>   x86/sgx: Add SGX infrastructure to recover from poison
>   x86/sgx: Hook sgx_memory_failure() into mainline code
>   x86/sgx: Add hook to error injection address validation
> 
>  .../firmware-guide/acpi/apei/einj.rst         |  19 +++
>  arch/x86/include/asm/set_memory.h             |   4 +
>  arch/x86/kernel/cpu/sgx/encl.c                |   2 +-
>  arch/x86/kernel/cpu/sgx/main.c                | 137 +++++++++++++++++-
>  arch/x86/kernel/cpu/sgx/sgx.h                 |   6 +-
>  drivers/acpi/apei/einj.c                      |   3 +-
>  include/linux/mm.h                            |  15 ++
>  mm/memory-failure.c                           |  19 ++-
>  8 files changed, 195 insertions(+), 10 deletions(-)
> 
> 
> base-commit: 2734d6c1b1a089fb593ef6a23d4b70903526fe0c

Use jarkko@kernel.org in future versions.

/Jarkko



* Re: [PATCH v2 1/6] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-07-19 18:20   ` [PATCH v2 1/6] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
  2021-07-19 18:28     ` Dave Hansen
@ 2021-07-27  2:04     ` Sakkinen, Jarkko
  1 sibling, 0 replies; 185+ messages in thread
From: Sakkinen, Jarkko @ 2021-07-27  2:04 UTC (permalink / raw)
  To: Luck, Tony, Hansen, Dave, seanjc; +Cc: linux-kernel, x86

On Mon, 2021-07-19 at 11:20 -0700, Tony Luck wrote:
> SGX EPC pages go through the following life cycle:
> 
> 	DIRTY ---> FREE ---> IN-USE --\
>                     ^                 |
>                     \-----------------/
> 
> Recovery action for poison for a DIRTY or FREE page is simple. Just
> make sure never to allocate the page. IN-USE pages need some extra
> handling.
> 
> It would be good to use the sgx_epc_page->owner field as an indicator
> of where an EPC page is currently in that cycle (owner != NULL means
> the EPC page is IN-USE). But there is one caller, sgx_alloc_va_page(),
> that calls with NULL.
> 
> Make the following changes:
> 
> 1) Change the type of "owner" to "void *" (it can have other types
>    besides "struct sgx_encl_page *").
> 2) Update sgx_alloc_va_page() to pass in a dummy non-NULL value in
>    this case.
> 3) Add a check to sgx_free_epc_page() to prevent calling with NULL.
> 4) Reset owner to NULL in sgx_free_epc_page().
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/encl.c | 2 +-
>  arch/x86/kernel/cpu/sgx/main.c | 6 ++++++
>  arch/x86/kernel/cpu/sgx/sgx.h  | 2 +-
>  3 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 001808e3901c..ca328d56d230 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -679,7 +679,7 @@ struct sgx_epc_page *sgx_alloc_va_page(void)
>  	struct sgx_epc_page *epc_page;
>  	int ret;
>  
> -	epc_page = sgx_alloc_epc_page(NULL, true);
> +	epc_page = sgx_alloc_epc_page("Not NULL!", true);


I would instead set owner to epc_page inside sgx_alloc_epc_page(),
when NULL is passed to owner. That would be semantically sound.

/Jarkko


* Re: [PATCH v2 3/6] x86/sgx: Initial poison handling for dirty and free pages
  2021-07-19 18:20   ` [PATCH v2 3/6] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-07-27  2:08     ` Sakkinen, Jarkko
  0 siblings, 0 replies; 185+ messages in thread
From: Sakkinen, Jarkko @ 2021-07-27  2:08 UTC (permalink / raw)
  To: Luck, Tony, Hansen, Dave, seanjc; +Cc: linux-kernel, x86

On Mon, 2021-07-19 at 11:20 -0700, Tony Luck wrote:
> +	dir = debugfs_create_dir("sgx", NULL);
> +	debugfs_create_u32("poison_page_count", 0400, dir, &poison_page_count);
> +	debugfs_create_file("poison_page_list", 0400, dir, NULL, &poison_list_fops);

I'm adding debugfs attributes in my reclaimer kselftest patch
set. The feedback that I got from Boris for that is that these
must be documented in Documentation/x86/sgx.rst.

/Jarkko


* [PATCH v3 0/7] Basic recovery for machine checks inside SGX
  2021-07-19 18:20 ` [PATCH v2 0/6] Basic recovery for machine checks inside SGX Tony Luck
                     ` (6 preceding siblings ...)
  2021-07-27  1:54   ` [PATCH v2 0/6] Basic recovery for machine checks inside SGX Sakkinen, Jarkko
@ 2021-07-28 20:46   ` Tony Luck
  2021-07-28 20:46     ` [PATCH v3 1/7] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
                       ` (7 more replies)
  7 siblings, 8 replies; 185+ messages in thread
From: Tony Luck @ 2021-07-28 20:46 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: x86, linux-kernel, Tony Luck

Changes since v2:

Jarkko:
	1) Don't provide a dummy non-NULL value for "owner" of new SGX EPC
	   pages at the call site. Instead change sgx_alloc_epc_page() to
	   provide a non-NULL value.
	2) Add description of the new debugfs files to sgx.rst
	   [Added a whole section on uncorrected memory errors]

Tony Luck (7):
  x86/sgx: Provide indication of life-cycle of EPC pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook sgx_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Add documentation for SGX memory errors

 .../firmware-guide/acpi/apei/einj.rst         |  19 +++
 Documentation/x86/sgx.rst                     |  26 ++++
 arch/x86/include/asm/set_memory.h             |   4 +
 arch/x86/kernel/cpu/sgx/main.c                | 134 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |   6 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 include/linux/mm.h                            |  15 ++
 mm/memory-failure.c                           |  19 ++-
 8 files changed, 216 insertions(+), 10 deletions(-)


base-commit: ff1176468d368232b684f75e82563369208bc371
-- 
2.29.2



* [PATCH v3 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-07-28 20:46   ` [PATCH v3 0/7] " Tony Luck
@ 2021-07-28 20:46     ` Tony Luck
  2021-07-28 22:12       ` Dave Hansen
  2021-07-28 20:46     ` [PATCH v3 2/7] x86/sgx: Add infrastructure to identify SGX " Tony Luck
                       ` (6 subsequent siblings)
  7 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-07-28 20:46 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: x86, linux-kernel, Tony Luck

SGX EPC pages go through the following life cycle:

	DIRTY ---> FREE ---> IN-USE --\
                    ^                 |
                    \-----------------/

Recovery action for poison for a DIRTY or FREE page is simple. Just
make sure never to allocate the page. IN-USE pages need some extra
handling.

It would be good to use the sgx_epc_page->owner field as an indicator
of where an EPC page is currently in that cycle (owner != NULL means
the EPC page is IN-USE). But there is one caller, sgx_alloc_va_page(),
that calls with NULL.

Make the following changes:

1) Change the type of "owner" to "void *" (it can have other types
   besides "struct sgx_encl_page *").
2) Add a check to sgx_alloc_epc_page(). If the caller specified the
   owner as NULL, then set the owner field to self-reference the
   SGX EPC page itself.
3) Reset owner to NULL in sgx_free_epc_page().
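
As a rough user-space illustration of the resulting invariant (owner is
non-NULL exactly while a page is IN-USE), consider the sketch below. The
model_* names are invented for the example and the structure is cut down
to the one field that matters here; it is not the kernel's struct
sgx_epc_page.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down model of an EPC page: only the owner field is kept. */
struct model_epc_page {
	void *owner;	/* NULL: DIRTY or FREE, non-NULL: IN-USE */
};

/* Allocation: a NULL owner from the caller becomes a self-reference. */
static void model_alloc(struct model_epc_page *page, void *owner)
{
	page->owner = owner ? owner : page;
}

/* Free: reset owner so the page reads as FREE again. */
static void model_free(struct model_epc_page *page)
{
	page->owner = NULL;
}

/* The life-cycle test that later patches can rely on. */
static int model_in_use(const struct model_epc_page *page)
{
	return page->owner != NULL;
}
```

The self-reference keeps callers that have no natural owner object from
needing a dummy value of their own.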

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 3 ++-
 arch/x86/kernel/cpu/sgx/sgx.h  | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de02bbcc..17d09186a6c2 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -581,7 +581,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 	for ( ; ; ) {
 		page = __sgx_alloc_epc_page();
 		if (!IS_ERR(page)) {
-			page->owner = owner;
+			page->owner = owner ? owner : page;
 			break;
 		}
 
@@ -624,6 +624,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	spin_lock(&node->lock);
 
+	page->owner = NULL;
 	list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..4e1a410b8a62 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -29,7 +29,7 @@
 struct sgx_epc_page {
 	unsigned int section;
 	unsigned int flags;
-	struct sgx_encl_page *owner;
+	void *owner;
 	struct list_head list;
 };
 
-- 
2.29.2



* [PATCH v3 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-07-28 20:46   ` [PATCH v3 0/7] " Tony Luck
  2021-07-28 20:46     ` [PATCH v3 1/7] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
@ 2021-07-28 20:46     ` Tony Luck
  2021-07-28 22:19       ` Dave Hansen
  2021-07-28 20:46     ` [PATCH v3 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
                       ` (5 subsequent siblings)
  7 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-07-28 20:46 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: x86, linux-kernel, Tony Luck

X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.

Add an end_phys_addr field to the sgx_epc_section structure and a
new function sgx_paddr_to_page() that searches all such structures
and returns the struct sgx_epc_page pointer if the address is an EPC
page. This function is only intended for use within SGX code.

Export a function sgx_is_epc_page() that simply reports whether an
address is an EPC page for use elsewhere in the kernel.
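
The lookup can be modelled in user space along these lines. This is a
sketch under stated assumptions: the two section base addresses, the
four-page section size and all model_* names are invented for the
example. What it mirrors from the patch is the inclusive end address
(base + size - 1) and the page index computed from the offset into the
section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT	12
#define MODEL_NR_SECTIONS	2
#define MODEL_PAGES_PER_SECTION	4

struct model_page {
	unsigned int flags;
};

struct model_section {
	uint64_t phys_addr;	/* first byte of the section */
	uint64_t end_phys_addr;	/* last byte of the section, inclusive */
	struct model_page pages[MODEL_PAGES_PER_SECTION];
};

/* Two made-up sections of four 4K pages each. */
static struct model_section model_sections[MODEL_NR_SECTIONS] = {
	{ .phys_addr = 0x100000, .end_phys_addr = 0x100000 + 0x4000 - 1 },
	{ .phys_addr = 0x800000, .end_phys_addr = 0x800000 + 0x4000 - 1 },
};

/* Return the page containing paddr, or NULL if no section covers it. */
static struct model_page *model_paddr_to_page(uint64_t paddr)
{
	size_t i;

	for (i = 0; i < MODEL_NR_SECTIONS; i++) {
		struct model_section *s = &model_sections[i];

		if (paddr < s->phys_addr || paddr > s->end_phys_addr)
			continue;

		return &s->pages[(paddr - s->phys_addr) >> MODEL_PAGE_SHIFT];
	}

	return NULL;
}

/* Boolean wrapper, in the spirit of sgx_is_epc_page(). */
static int model_is_epc_page(uint64_t paddr)
{
	return model_paddr_to_page(paddr) != NULL;
}
```

A linear scan is adequate here because the number of sections is small
and the lookup only runs on error paths.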

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 24 ++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h  |  1 +
 2 files changed, 25 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 17d09186a6c2..ce40c010c9cb 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -649,6 +649,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	section->end_phys_addr = phys_addr + size - 1;
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -660,6 +661,29 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
+		section = &sgx_epc_sections[i];
+
+		if (paddr < section->phys_addr || paddr > section->end_phys_addr)
+			continue;
+
+		return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+	}
+
+	return NULL;
+}
+
+bool sgx_is_epc_page(u64 paddr)
+{
+	return !!sgx_paddr_to_page(paddr);
+}
+EXPORT_SYMBOL_GPL(sgx_is_epc_page);
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4e1a410b8a62..226b081a4d05 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -50,6 +50,7 @@ struct sgx_numa_node {
  */
 struct sgx_epc_section {
 	unsigned long phys_addr;
+	unsigned long end_phys_addr;
 	void *virt_addr;
 	struct sgx_epc_page *pages;
 	struct sgx_numa_node *node;
-- 
2.29.2



* [PATCH v3 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-07-28 20:46   ` [PATCH v3 0/7] " Tony Luck
  2021-07-28 20:46     ` [PATCH v3 1/7] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
  2021-07-28 20:46     ` [PATCH v3 2/7] x86/sgx: Add infrastructure to identify SGX " Tony Luck
@ 2021-07-28 20:46     ` Tony Luck
  2021-07-30  0:42       ` Jarkko Sakkinen
  2021-07-28 20:46     ` [PATCH v3 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
                       ` (4 subsequent siblings)
  7 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-07-28 20:46 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: x86, linux-kernel, Tony Luck

A memory controller patrol scrubber can report poison in a page
that isn't currently being used.

Add a new flag bit (SGX_EPC_PAGE_POISON) that can be set for an
sgx_epc_page. Check for it:
1) When sanitizing dirty pages
2) When allocating pages
3) When freeing epc pages

In all cases drop the poisoned page to make sure it will not be
reallocated.
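
A small user-space model of the allocate and free checks is sketched
below. It is only an approximation: the model_* names are invented, the
free list is a simple LIFO stack rather than the kernel's list_head
queue, and allocation failure is reported as NULL rather than an
ERR_PTR. The point it illustrates is that a page poisoned while on the
free list is skipped at allocation time, and a page poisoned while in
use is never put back.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_POISON	0x2u	/* stands in for SGX_EPC_PAGE_POISON */
#define MODEL_NR_PAGES	4

struct model_page {
	unsigned int flags;
	struct model_page *next;
};

static struct model_page model_pages[MODEL_NR_PAGES];
static struct model_page *model_free_head;

/* Free: a poisoned page is dropped instead of rejoining the list. */
static void model_free_page(struct model_page *page)
{
	if (page->flags & MODEL_POISON)
		return;

	page->next = model_free_head;
	model_free_head = page;
}

/* Alloc: pop pages, discarding any that were poisoned while free. */
static struct model_page *model_alloc_page(void)
{
	while (model_free_head) {
		struct model_page *page = model_free_head;

		model_free_head = page->next;
		page->next = NULL;

		if (page->flags & MODEL_POISON)
			continue;

		return page;
	}

	return NULL;	/* nothing usable left */
}
```

Checking at both ends covers poison reported at any point in the cycle
without having to pull a page out of the middle of a list.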

Add debugfs files /sys/kernel/debug/sgx/poison_page_{count,list}
so that system administrators can see how many enclave pages have
been dropped and get a list of those pages.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 50 +++++++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h  |  3 ++
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index ce40c010c9cb..354f0abec12d 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -11,6 +11,7 @@
 #include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 #include <linux/slab.h>
+#include <linux/debugfs.h>
 #include <asm/sgx.h>
 #include "driver.h"
 #include "encl.h"
@@ -34,6 +35,9 @@ static unsigned long sgx_nr_free_pages;
 /* Nodes with one or more EPC sections. */
 static nodemask_t sgx_numa_mask;
 
+/* Maintain a count of poison pages */
+static u32 poison_page_count;
+
 /*
  * Array with one list_head for each possible NUMA node.  Each
  * list contains all the sgx_epc_section's which are on that
@@ -47,6 +51,9 @@ static LIST_HEAD(sgx_dirty_page_list);
  * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
  * from the input list, and made available for the page allocator. SECS pages
  * prepending their children in the input list are left intact.
+ *
+ * Don't try to clean a poisoned page. That might trigger a machine check.
+ * Just drop the page and move on.
  */
 static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 {
@@ -61,6 +68,11 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
+		if (page->flags & SGX_EPC_PAGE_POISON) {
+			list_del(&page->list);
+			continue;
+		}
+
 		ret = __eremove(sgx_get_epc_virt_addr(page));
 		if (!ret) {
 			/*
@@ -567,6 +579,9 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
  * @reclaim is set to true, directly reclaim pages when we are out of pages. No
  * mm's can be locked when @reclaim is set to true.
  *
+ * A page on the free list might have been reported as poisoned by the patrol
+ * scrubber. If so, skip this page, and try again.
+ *
  * Finally, wake up ksgxd when the number of pages goes below the watermark
  * before returning back to the caller.
  *
@@ -580,6 +595,10 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 
 	for ( ; ; ) {
 		page = __sgx_alloc_epc_page();
+
+		if (!IS_ERR(page) && (page->flags & SGX_EPC_PAGE_POISON))
+			continue;
+
 		if (!IS_ERR(page)) {
 			page->owner = owner ? owner : page;
 			break;
@@ -616,6 +635,8 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
  * responsibility to make sure that the page is in uninitialized state. In other
  * words, do EREMOVE, EWB or whatever operation is necessary before calling
  * this function.
+ *
+ * Drop poison pages so they won't be reallocated.
  */
 void sgx_free_epc_page(struct sgx_epc_page *page)
 {
@@ -625,7 +646,8 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 	spin_lock(&node->lock);
 
 	page->owner = NULL;
-	list_add_tail(&page->list, &node->free_page_list);
+	if (!(page->flags & SGX_EPC_PAGE_POISON))
+		list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 
 	spin_unlock(&node->lock);
@@ -815,8 +837,30 @@ int sgx_set_attribute(unsigned long *allowed_attributes,
 }
 EXPORT_SYMBOL_GPL(sgx_set_attribute);
 
+static int poison_list_show(struct seq_file *m, void *private)
+{
+	struct sgx_epc_section *section;
+	struct sgx_epc_page *page;
+	unsigned long addr;
+	int i;
+
+	for (i = 0; i < SGX_MAX_EPC_SECTIONS; i++) {
+		section = &sgx_epc_sections[i];
+		page = section->pages;
+		for (addr = section->phys_addr; addr < section->end_phys_addr;
+		     addr += PAGE_SIZE, page++) {
+			if (page->flags & SGX_EPC_PAGE_POISON)
+				seq_printf(m, "0x%lx\n", addr);
+		}
+	}
+	return 0;
+}
+
+DEFINE_SHOW_ATTRIBUTE(poison_list);
+
 static int __init sgx_init(void)
 {
+	struct dentry *dir;
 	int ret;
 	int i;
 
@@ -848,6 +892,10 @@ static int __init sgx_init(void)
 	if (sgx_vepc_init() && ret)
 		goto err_provision;
 
+	dir = debugfs_create_dir("sgx", NULL);
+	debugfs_create_u32("poison_page_count", 0400, dir, &poison_page_count);
+	debugfs_create_file("poison_page_list", 0400, dir, NULL, &poison_list_fops);
+
 	return 0;
 
 err_provision:
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 226b081a4d05..2c3987ecdfe4 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -26,6 +26,9 @@
 /* Pages, which are being tracked by the page reclaimer. */
 #define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
 
+/* Poisoned pages */
+#define SGX_EPC_PAGE_POISON		BIT(1)
+
 struct sgx_epc_page {
 	unsigned int section;
 	unsigned int flags;
-- 
2.29.2



* [PATCH v3 4/7] x86/sgx: Add SGX infrastructure to recover from poison
  2021-07-28 20:46   ` [PATCH v3 0/7] " Tony Luck
                       ` (2 preceding siblings ...)
  2021-07-28 20:46     ` [PATCH v3 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-07-28 20:46     ` Tony Luck
  2021-07-28 22:29       ` Dave Hansen
  2021-07-28 20:46     ` [PATCH v3 5/7] x86/sgx: Hook sgx_memory_failure() into mainline code Tony Luck
                       ` (3 subsequent siblings)
  7 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-07-28 20:46 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: x86, linux-kernel, Tony Luck

Provide a recovery function sgx_memory_failure(). If the poison was
consumed synchronously then send a SIGBUS. Note that the virtual
address of the access is not included with the SIGBUS as is the case
for poison outside of SGX enclaves. This doesn't matter as addresses
of code/data inside an enclave are of little to no use to code executing
outside the (now dead) enclave.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 57 ++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 354f0abec12d..7d281dba707a 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -706,6 +706,63 @@ bool sgx_is_epc_page(u64 paddr)
 }
 EXPORT_SYMBOL_GPL(sgx_is_epc_page);
 
+/*
+ * Called in process context to handle a hardware reported
+ * error in an SGX EPC page.
+ * If the MF_ACTION_REQUIRED bit is set in flags, then the
+ * context is the task that consumed the poison data. Otherwise
+ * this is called from a kernel thread unrelated to the page.
+ */
+int sgx_memory_failure(unsigned long pfn, int flags)
+{
+	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+
+	/*
+	 * mm/memory-failure.c calls this routine for all errors
+	 * where there isn't a "struct page" for the address. But that
+	 * includes other address ranges besides SGX.
+	 */
+	if (!page)
+		return -ENXIO;
+
+	/*
+	 * If poison was consumed synchronously, send a SIGBUS to
+	 * the task. Hardware has already exited the SGX enclave and
+	 * will not allow re-entry to an enclave that has a memory
+	 * error. The signal may help the task understand why the
+	 * enclave is broken.
+	 */
+	if (flags & MF_ACTION_REQUIRED)
+		force_sig(SIGBUS);
+
+	/* Already poisoned? Nothing more to do */
+	if (page->flags & SGX_EPC_PAGE_POISON)
+		return 0;
+
+	page->flags |= SGX_EPC_PAGE_POISON;
+	poison_page_count++;
+
+	/*
+	 * Nothing more to do here for dirty/free pages.
+	 * They will be added to the poison list when
+	 * they get to the head of their lists.
+	 */
+	if (!page->owner)
+		return 0;
+
+	/*
+	 * TBD: Add additional plumbing to enable pre-emptive
+	 * action for asynchronous poison notification. Until
+	 * then just hope that the poison:
+	 * a) is not accessed - sgx_free_epc_page() will deal with it
+	 *    when the user gives it back
+	 * b) results in a recoverable machine check rather than
+	 *    a fatal one
+	 */
+
+	return 0;
+}
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.29.2



* [PATCH v3 5/7] x86/sgx: Hook sgx_memory_failure() into mainline code
  2021-07-28 20:46   ` [PATCH v3 0/7] " Tony Luck
                       ` (3 preceding siblings ...)
  2021-07-28 20:46     ` [PATCH v3 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
@ 2021-07-28 20:46     ` Tony Luck
  2021-07-28 20:46     ` [PATCH v3 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
                       ` (2 subsequent siblings)
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-07-28 20:46 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: x86, linux-kernel, Tony Luck

Add a call inside memory_failure() to check if the address is an SGX
EPC page and handle it.

Note the SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.

Pull the call to acquire the mutex earlier so the SGX errors are also
protected.

Make set_mce_nospec() skip SGX pages when trying to adjust
the 1:1 map.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/set_memory.h |  4 ++++
 include/linux/mm.h                | 15 +++++++++++++++
 mm/memory-failure.c               | 19 +++++++++++++------
 3 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 43fa081a1adb..801af8f30c83 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_SET_MEMORY_H
 #define _ASM_X86_SET_MEMORY_H
 
+#include <linux/mm.h>
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
 
@@ -98,6 +99,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
 	unsigned long decoy_addr;
 	int rc;
 
+	/* SGX pages are not in the 1:1 map */
+	if (sgx_is_epc_page(pfn << PAGE_SHIFT))
+		return 0;
 	/*
 	 * We would like to just call:
 	 *      set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7ca22e6e694a..2ff599bcf8c2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3283,5 +3283,20 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
 	return 0;
 }
 
+#ifdef CONFIG_X86_SGX
+int sgx_memory_failure(unsigned long pfn, int flags);
+bool sgx_is_epc_page(u64 paddr);
+#else
+static inline int sgx_memory_failure(unsigned long pfn, int flags)
+{
+	return -ENXIO;
+}
+
+static inline bool sgx_is_epc_page(u64 paddr)
+{
+	return false;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index eefd823deb67..3ce6b6aabf0f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1626,21 +1626,28 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
 
+	mutex_lock(&mf_mutex);
+
 	p = pfn_to_online_page(pfn);
 	if (!p) {
+		res = sgx_memory_failure(pfn, flags);
+		if (res == 0)
+			goto unlock_mutex;
+
 		if (pfn_valid(pfn)) {
 			pgmap = get_dev_pagemap(pfn, NULL);
-			if (pgmap)
-				return memory_failure_dev_pagemap(pfn, flags,
-								  pgmap);
+			if (pgmap) {
+				res = memory_failure_dev_pagemap(pfn, flags,
+								 pgmap);
+				goto unlock_mutex;
+			}
 		}
 		pr_err("Memory failure: %#lx: memory outside kernel control\n",
 			pfn);
-		return -ENXIO;
+		res = -ENXIO;
+		goto unlock_mutex;
 	}
 
-	mutex_lock(&mf_mutex);
-
 try_again:
 	if (PageHuge(p)) {
 		res = memory_failure_hugetlb(pfn, flags);
-- 
2.29.2



* [PATCH v3 6/7] x86/sgx: Add hook to error injection address validation
  2021-07-28 20:46   ` [PATCH v3 0/7] " Tony Luck
                       ` (4 preceding siblings ...)
  2021-07-28 20:46     ` [PATCH v3 5/7] x86/sgx: Hook sgx_memory_failure() into mainline code Tony Luck
@ 2021-07-28 20:46     ` Tony Luck
  2021-07-28 20:46     ` [PATCH v3 7/7] x86/sgx: Add documentation for SGX memory errors Tony Luck
  2021-08-27 19:55     ` [PATCH v4 0/6] Basic recovery for machine checks inside SGX Tony Luck
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-07-28 20:46 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: x86, linux-kernel, Tony Luck

SGX reserved memory does not appear in the standard address maps.

Add hook to call into the SGX code to check if an address is located
in SGX memory.

There are other challenges in injecting errors into SGX. Update the
documentation with a sequence of operations to inject.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 .../firmware-guide/acpi/apei/einj.rst         | 19 +++++++++++++++++++
 drivers/acpi/apei/einj.c                      |  3 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index c042176e1707..55e2331a6438 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,5 +181,24 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+Special notes for injection into SGX enclaves:
+
+There may be a separate BIOS setup option to enable SGX injection.
+
+The injection process consists of setting some special memory controller
+trigger that will inject the error on the next write to the target
+address. But the h/w prevents any software outside of an SGX enclave
+from accessing enclave pages (even BIOS SMM mode).
+
+The following sequence can be used:
+  1) Determine physical address of enclave page
+  2) Use "notrigger=1" mode to inject (this will set up
+     the injection address, but will not actually inject)
+  3) Enter the enclave
+  4) Store data to the virtual address matching physical address from step 1
+  5) Execute CLFLUSH for that virtual address
+  6) Spin delay for 250ms
+  7) Read from the virtual address. This will trigger the error
+
 For more information about EINJ, please refer to ACPI specification
 version 4.0, section 17.5 and ACPI 5.0, section 18.6.
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 2882450c443e..cd7cffc955bf 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -544,7 +544,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	    ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
 				!= REGION_INTERSECTS) &&
 	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
-				!= REGION_INTERSECTS)))
+				!= REGION_INTERSECTS) &&
+	     !sgx_is_epc_page(base_addr)))
 		return -EINVAL;
 
 inject:
-- 
2.29.2



* [PATCH v3 7/7] x86/sgx: Add documentation for SGX memory errors
  2021-07-28 20:46   ` [PATCH v3 0/7] " Tony Luck
                       ` (5 preceding siblings ...)
  2021-07-28 20:46     ` [PATCH v3 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
@ 2021-07-28 20:46     ` Tony Luck
  2021-08-27 19:55     ` [PATCH v4 0/6] Basic recovery for machine checks inside SGX Tony Luck
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-07-28 20:46 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: x86, linux-kernel, Tony Luck

Error handling is a bit different for SGX pages. Add a section describing
how asynchronous and consumed errors are handled and the two new
debugfs files that show the count and list of pages with uncorrected
memory errors.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 Documentation/x86/sgx.rst | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index dd0ac96ff9ef..461bd1daa565 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -250,3 +250,29 @@ user wants to deploy SGX applications both on the host and in guests
 on the same machine, the user should reserve enough EPC (by taking out
 total virtual EPC size of all SGX VMs from the physical EPC size) for
 host SGX applications so they can run with acceptable performance.
+
+Uncorrected memory errors
+=========================
+Systems that support machine check recovery and have local machine
+check delivery enabled can recover from uncorrected memory errors in
+many situations.
+
+Errors in SGX pages that are not currently in use will prevent those
+pages from being allocated.
+
+Errors asynchronously reported against active SGX pages will simply note
+that the page has an error. If the enclave terminates without accessing
+the page, Linux will not return it to the free list for reallocation.
+
+When an uncorrected memory error is consumed from within an enclave the
+h/w will mark that enclave so that it cannot be re-entered.  Linux will
+send a SIGBUS to the current task.
+
+In addition to console log entries from processing the machine check or
+corrected machine check interrupt, Linux also provides debugfs files to
+indicate the number of SGX enclave pages that have reported errors and
+the physical addresses of each page:
+
+/sys/kernel/debug/sgx/poison_page_count
+
+/sys/kernel/debug/sgx/poison_page_list
-- 
2.29.2



* Re: [PATCH v3 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-07-28 20:46     ` [PATCH v3 1/7] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
@ 2021-07-28 22:12       ` Dave Hansen
  2021-07-28 22:57         ` Luck, Tony
  2021-07-30  0:33         ` Jarkko Sakkinen
  0 siblings, 2 replies; 185+ messages in thread
From: Dave Hansen @ 2021-07-28 22:12 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Jarkko Sakkinen; +Cc: x86, linux-kernel

On 7/28/21 1:46 PM, Tony Luck wrote:
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -581,7 +581,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
>  	for ( ; ; ) {
>  		page = __sgx_alloc_epc_page();
>  		if (!IS_ERR(page)) {
> -			page->owner = owner;
> +			page->owner = owner ? owner : page;
>  			break;
>  		}

I'm a little worried about this.

Let's say we get confused about the type of the page and dereference
page->owner.  If it's NULL, we get a nice oops.  If it's a real, valid
pointer, we get real valid memory back that we can scribble on.

Wouldn't it be safer to do something like:

	page->owner = owner ? owner : (void *)-1;

-1 is non-NULL, but also invalid, which makes it harder for us to poke
ourselves in the eye.


* Re: [PATCH v3 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-07-28 20:46     ` [PATCH v3 2/7] x86/sgx: Add infrastructure to identify SGX " Tony Luck
@ 2021-07-28 22:19       ` Dave Hansen
  2021-07-30  0:38         ` Jarkko Sakkinen
  0 siblings, 1 reply; 185+ messages in thread
From: Dave Hansen @ 2021-07-28 22:19 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Jarkko Sakkinen; +Cc: x86, linux-kernel

On 7/28/21 1:46 PM, Tony Luck wrote:
> Export a function sgx_is_epc_page() that simply reports whether an
> address is an EPC page for use elsewhere in the kernel.

It would be really nice to mention why this needs to be exported to
modules.  I assume it's the error injection driver or something that can
be built as a module, but this export was a surprise when I saw it.

It's probably also worth noting that this is a sloooooooow
implementation compared to the core VM code that does something
analogous: pfn_to_page().  It's fine for error handling, but we should
probably have a comment to this effect so that more liberal use doesn't
creep in anywhere.
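
To illustrate the cost concern, here is a minimal user-space sketch. It
assumes the lookup walks an array of EPC sections and range-checks each
one; the implementation is not shown in this message, so every demo_*
name below is hypothetical. pfn_to_page() avoids this kind of search,
which is why a section walk is tolerable only on a rare path such as
error handling.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12

/* Hypothetical stand-ins for the kernel's per-section bookkeeping. */
struct demo_epc_page {
	unsigned int flags;
};

struct demo_epc_section {
	uint64_t phys_addr;		/* first byte of the section */
	uint64_t end_phys_addr;		/* one past the last byte */
	struct demo_epc_page *pages;	/* one entry per 4 KiB page */
};

/*
 * Linear scan: the cost grows with the number of sections on every
 * lookup. Fine for error handling, too slow for a hot path.
 */
static struct demo_epc_page *
demo_paddr_to_page(const struct demo_epc_section *sections, size_t nr,
		   uint64_t paddr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		const struct demo_epc_section *s = &sections[i];

		if (paddr >= s->phys_addr && paddr < s->end_phys_addr)
			return &s->pages[(paddr - s->phys_addr) >> DEMO_PAGE_SHIFT];
	}
	return NULL;	/* address is not in any EPC section */
}

/* Boolean wrapper in the spirit of sgx_is_epc_page(). */
static int demo_is_epc_page(const struct demo_epc_section *sections,
			    size_t nr, uint64_t paddr)
{
	return demo_paddr_to_page(sections, nr, paddr) != NULL;
}
```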


* Re: [PATCH v3 4/7] x86/sgx: Add SGX infrastructure to recover from poison
  2021-07-28 20:46     ` [PATCH v3 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
@ 2021-07-28 22:29       ` Dave Hansen
  2021-07-28 23:00         ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Dave Hansen @ 2021-07-28 22:29 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Jarkko Sakkinen; +Cc: x86, linux-kernel

On 7/28/21 1:46 PM, Tony Luck wrote:
> +int sgx_memory_failure(unsigned long pfn, int flags)
> +{
...
> +	page->flags |= SGX_EPC_PAGE_POISON;

Is this safe outside of any locks?

I see the reclaimer doing things like:

                epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;

I'd worry that this code and other non-atomic epc_page->flags
manipulation could trample on each other.

This might need to use some atomic bit manipulation *and* convert all
the other epc_page->flags users.
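
As a rough user-space illustration of the atomic option, the sketch
below uses C11 atomics so that setting one flag bit cannot lose a
concurrent clear of another. The demo_* names and bit values are made
up for the example. In the kernel the analogous helpers would be
test_and_set_bit() and clear_bit(), which operate on unsigned long, so
the flags field would presumably need a type change as well.

```c
#include <stdatomic.h>

/* Illustrative bit assignments, not the kernel's definitions. */
#define DEMO_RECLAIMER_TRACKED	(1u << 0)
#define DEMO_POISON		(1u << 1)

struct demo_epc_page {
	atomic_uint flags;
};

/*
 * Atomically set the poison bit.
 * Returns 1 if the page was already poisoned, 0 otherwise.
 */
static int demo_test_and_set_poison(struct demo_epc_page *page)
{
	unsigned int old = atomic_fetch_or(&page->flags, DEMO_POISON);

	return (old & DEMO_POISON) != 0;
}

/* Atomically clear the reclaimer bit without disturbing other bits. */
static void demo_clear_tracked(struct demo_epc_page *page)
{
	atomic_fetch_and(&page->flags, ~DEMO_RECLAIMER_TRACKED);
}
```

This only protects the read-modify-write of the flags word itself. It
does not order the flag update against list manipulation elsewhere.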


* RE: [PATCH v3 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-07-28 22:12       ` Dave Hansen
@ 2021-07-28 22:57         ` Luck, Tony
  2021-07-28 23:12           ` Dave Hansen
  2021-07-30  0:34           ` Jarkko Sakkinen
  2021-07-30  0:33         ` Jarkko Sakkinen
  1 sibling, 2 replies; 185+ messages in thread
From: Luck, Tony @ 2021-07-28 22:57 UTC (permalink / raw)
  To: Hansen, Dave, Sean Christopherson, Jarkko Sakkinen; +Cc: x86, linux-kernel

> Wouldn't it be safer to do something like:
>
>	page->owner = owner ? owner : (void *)-1;
>
> -1 is non-NULL, but also invalid, which makes it harder for us to poke
> ourselves in the eye.

Does Linux have some #define INVALID_POINTER thing that
provides a guaranteed bad (e.g. non-canonical) value?

(void *)-1 seems hacky.

-Tony


* Re: [PATCH v3 4/7] x86/sgx: Add SGX infrastructure to recover from poison
  2021-07-28 22:29       ` Dave Hansen
@ 2021-07-28 23:00         ` Sean Christopherson
  0 siblings, 0 replies; 185+ messages in thread
From: Sean Christopherson @ 2021-07-28 23:00 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Tony Luck, Jarkko Sakkinen, x86, linux-kernel

On Wed, Jul 28, 2021, Dave Hansen wrote:
> On 7/28/21 1:46 PM, Tony Luck wrote:
> > +int sgx_memory_failure(unsigned long pfn, int flags)
> > +{
> ...
> > +	page->flags |= SGX_EPC_PAGE_POISON;
> 
> Is this safe outside of any locks?

It's safe outside of sgx_reclaimer_lock iff this can guarantee nothing else can
reach the page.  I'm pretty sure that doesn't hold true here.

> I see the reclaimer doing things like:
> 
>                 epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
> 
> I'd worry that this code and other non-atomic epc_page->flags
> manipulation could trample on each other.
> 
> This might need to use some atomic bit manipulation *and* convert all
> the other epc_page->flags users.

I don't think atomics would be sufficient as that would open all sorts of possible
races.  E.g. this new code in __sgx_sanitize_pages()

                page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);

+               if (page->flags & SGX_EPC_PAGE_POISON) {
+                       list_del(&page->list);
+                       continue;
+               }
+
		***HERE***
                ret = __eremove(sgx_get_epc_virt_addr(page));

could attempt EREMOVE on a freshly POISONed page.  That appears to be "benign"
since ENCLS is wrapped with _ASM_EXTABLE_FAULT, but it feels wrong to add a check
that we know can race.

And similar races for allocation/free could hand out a poisoned page or add one
to the free list.

@@ -585,6 +600,10 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)

        for ( ; ; ) {
                page = __sgx_alloc_epc_page();
+
+               if (page->flags & SGX_EPC_PAGE_POISON)
+                       continue;
		*** HERE ***
+


@@ -630,7 +651,8 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
        spin_lock(&node->lock);

        page->owner = NULL;
-       list_add_tail(&page->list, &node->free_page_list);
+       if (!(page->flags & SGX_EPC_PAGE_POISON))
		*** HERE ***
+               list_add_tail(&page->list, &node->free_page_list);


Setting POISON and hoping we eventually notice doesn't sound robust.  Maybe some
of these races are unavoidable due to the nature of #MC delivery, but I would hope
the kernel can at least avoid handing out a poisoned page to a different enclave.
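
One way to narrow the allocation/free windows marked above, sketched
in user-space C under the assumption that the error handler may take
the same lock that protects the free list (plausible if it runs in
process context, but not confirmed here). A pthread mutex stands in
for the node spinlock, and all demo_* names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define DEMO_POISON	(1u << 1)	/* illustrative bit value */

struct demo_epc_page {
	unsigned int flags;
	struct demo_epc_page *next;	/* singly linked free list */
};

struct demo_node {
	pthread_mutex_t lock;		/* protects free_list and page flags */
	struct demo_epc_page *free_list;
};

/* Error handler side: mark the page while holding the list lock. */
static void demo_mark_poison(struct demo_node *node, struct demo_epc_page *page)
{
	pthread_mutex_lock(&node->lock);
	page->flags |= DEMO_POISON;
	pthread_mutex_unlock(&node->lock);
}

/*
 * Allocator side: pop pages until a clean one turns up. Poisoned
 * pages are dropped from the list and never handed out.
 * Returns NULL when no clean page is left.
 */
static struct demo_epc_page *demo_alloc(struct demo_node *node)
{
	struct demo_epc_page *page;

	pthread_mutex_lock(&node->lock);
	while ((page = node->free_list) != NULL) {
		node->free_list = page->next;
		page->next = NULL;
		if (!(page->flags & DEMO_POISON))
			break;
	}
	pthread_mutex_unlock(&node->lock);
	return page;
}

/* Free side: only clean pages are re-listed. Returns 1 if re-listed. */
static int demo_free(struct demo_node *node, struct demo_epc_page *page)
{
	int listed = 0;

	pthread_mutex_lock(&node->lock);
	if (!(page->flags & DEMO_POISON)) {
		page->next = node->free_list;
		node->free_list = page;
		listed = 1;
	}
	pthread_mutex_unlock(&node->lock);
	return listed;
}
```

Because the flag check and the list update sit in one critical section,
a page cannot become poisoned between the test and the list_add/list_del
equivalent. A page that is poisoned after demo_alloc() returns is still
in use by its owner, so the IN-USE case remains open.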


* Re: [PATCH v3 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-07-28 22:57         ` Luck, Tony
@ 2021-07-28 23:12           ` Dave Hansen
  2021-07-28 23:32             ` Sean Christopherson
  2021-07-30  0:34           ` Jarkko Sakkinen
  1 sibling, 1 reply; 185+ messages in thread
From: Dave Hansen @ 2021-07-28 23:12 UTC (permalink / raw)
  To: Luck, Tony, Sean Christopherson, Jarkko Sakkinen; +Cc: x86, linux-kernel

On 7/28/21 3:57 PM, Luck, Tony wrote:
>> Wouldn't it be safer to do something like:
>>
>> 	page->owner = owner ? owner : (void *)-1;
>>
>> -1 is non-NULL, but also invalid, which makes it harder for us to poke
>> ourselves in the eye.
> Does Linux have some #define INVALID_POINTER thing that
> provides a guaranteed bad (e.g. non-canonical) value?
> 
> (void *)-1 seems hacky.

ERR_PTR(-SOMETHING) wouldn't be too bad.  I guess it could even be:

	page->owner = ERR_PTR(SGX_EPC_PAGE_VA);

and then:

#define SGX_EPC_PAGE_VA 0xffff...something...greppable

I *thought* we had a file full of these magic values, but maybe I'm
misremembering the uapi magic header.
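
For reference, a user-space sketch of the error-pointer encoding that
ERR_PTR()/IS_ERR() rely on, assuming the usual convention that the top
4095 values of the address space are reserved for negative errno codes.
The demo_* helpers are illustrative re-implementations, not the kernel
macros themselves.

```c
#include <stdint.h>

#define DEMO_MAX_ERRNO	4095	/* assumed to match the kernel's MAX_ERRNO */

/* Encode a negative errno value as a pointer. */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an encoded pointer. */
static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True when ptr lies in the reserved range at the top of the address space. */
static inline int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}
```

Under this encoding (void *)-1 also lands in the reserved range, so a
named error pointer and the bare -1 behave the same way when
dereferenced; the named form is simply easier to grep for.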


* Re: [PATCH v3 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-07-28 23:12           ` Dave Hansen
@ 2021-07-28 23:32             ` Sean Christopherson
  2021-07-28 23:48               ` Luck, Tony
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2021-07-28 23:32 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Luck, Tony, Jarkko Sakkinen, x86, linux-kernel

On Wed, Jul 28, 2021, Dave Hansen wrote:
> On 7/28/21 3:57 PM, Luck, Tony wrote:
> >> Wouldn't it be safer to do something like:
> >>
> >> 	page->owner = owner ? owner : (void *)-1;
> >>
> >> -1 is non-NULL, but also invalid, which makes it harder for us to poke
> >> ourselves in the eye.
> > Does Linux have some #define INVALID_POINTER thing that
> > provides a guaranteed bad (e.g. non-canonical) value?
> > 
> > (void *)-1 seems hacky.
> 
> ERR_PTR(-SOMETHING) wouldn't be too bad.  I guess it could even be:
> 
> 	page->owner = ERR_PTR(SGX_EPC_PAGE_VA);
> 
> and then:
> 
> #define SGX_EPC_PAGE_VA 0xffff...something...greppable
> 
> I *thought* we had a file full of these magic values, but maybe I'm
> misremembering the uapi magic header.

Rather than use a magic const, just pass in the actual va_page.  The only reason
NULL is passed is that prior to virtual EPC, there were only enclave pages and
VA pages, and assigning a non-NULL pointer to sgx_epc_page.owner, which is a
struct sgx_encl_page, was gross.  Virtual EPC sets owner somewhat prematurely;
it's needed iff an EPC cgroup is added, to support OOM EPC killing (and a pointer
to va_page is also needed in this case).

sgx_epc_page.owner can even be converted to 'void *' without additional changes
since all consumers capture it in a local sgx_encl_page variable.


diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 001808e3901c..f9da8fe4dd6b 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -674,12 +674,12 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
  *   a VA page,
  *   -errno otherwise
  */
-struct sgx_epc_page *sgx_alloc_va_page(void)
+struct sgx_epc_page *sgx_alloc_va_page(struct sgx_va_page *va_page)
 {
        struct sgx_epc_page *epc_page;
        int ret;

-       epc_page = sgx_alloc_epc_page(NULL, true);
+       epc_page = sgx_alloc_epc_page(va_page, true);
        if (IS_ERR(epc_page))
                return ERR_CAST(epc_page);

diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index fec43ca65065..3d12dbeae14a 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -111,7 +111,7 @@ void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
                                  struct sgx_encl_page *page);

-struct sgx_epc_page *sgx_alloc_va_page(void);
+struct sgx_epc_page *sgx_alloc_va_page(struct sgx_va_page *va_page);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
 void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 83df20e3e633..655ce0bb069d 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -30,7 +30,7 @@ static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
                if (!va_page)
                        return ERR_PTR(-ENOMEM);

-               va_page->epc_page = sgx_alloc_va_page();
+               va_page->epc_page = sgx_alloc_va_page(va_page);
                if (IS_ERR(va_page->epc_page)) {
                        err = ERR_CAST(va_page->epc_page);
                        kfree(va_page);
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..4e1a410b8a62 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -29,7 +29,7 @@
 struct sgx_epc_page {
        unsigned int section;
        unsigned int flags;
-       struct sgx_encl_page *owner;
+       void *owner;
        struct list_head list;
 };




* RE: [PATCH v3 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-07-28 23:32             ` Sean Christopherson
@ 2021-07-28 23:48               ` Luck, Tony
  2021-07-29  0:07                 ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Luck, Tony @ 2021-07-28 23:48 UTC (permalink / raw)
  To: Sean Christopherson, Hansen, Dave; +Cc: Jarkko Sakkinen, x86, linux-kernel

> -       epc_page = sgx_alloc_epc_page(NULL, true);
> +       epc_page = sgx_alloc_epc_page(va_page, true);

Providing a real value for the owner seems much better than all the hacks
to invent a value to use instead of NULL.

Can you add a "Signed-off-by"? Then I'll replace my part 0001 with your version.

-Tony

[Just need to coax you into re-writing all the other parts for me now :-) ]


* Re: [PATCH v3 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-07-28 23:48               ` Luck, Tony
@ 2021-07-29  0:07                 ` Sean Christopherson
  2021-07-29  0:42                   ` Luck, Tony
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2021-07-29  0:07 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Hansen, Dave, Jarkko Sakkinen, x86, linux-kernel

On Wed, Jul 28, 2021, Luck, Tony wrote:
> > -       epc_page = sgx_alloc_epc_page(NULL, true);
> > +       epc_page = sgx_alloc_epc_page(va_page, true);
> 
> Providing a real value for the owner seems much better than all the hacks
> to invent a value to use instead of NULL.
> 
> Can you add a "Signed-off-by"? Then I'll replace my part 0001 with your version.

Signed-off-by: Sean Christopherson <seanjc@google.com>

> -Tony
> 
> [Just need to coax you into re-writing all the other parts for me now :-) ]

LOL, it might be easier to convince folks to just kill off SGX ;-)


* Re: [PATCH v3 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-07-29  0:07                 ` Sean Christopherson
@ 2021-07-29  0:42                   ` Luck, Tony
  0 siblings, 0 replies; 185+ messages in thread
From: Luck, Tony @ 2021-07-29  0:42 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Hansen, Dave, Jarkko Sakkinen, x86, linux-kernel

On Thu, Jul 29, 2021 at 12:07:08AM +0000, Sean Christopherson wrote:
> On Wed, Jul 28, 2021, Luck, Tony wrote:
> > > -       epc_page = sgx_alloc_epc_page(NULL, true);
> > > +       epc_page = sgx_alloc_epc_page(va_page, true);
> > 
> > Providing a real value for the owner seems much better than all the hacks
> > to invent a value to use instead of NULL.
> > 
> > Can you add a "Signed-off-by"? Then I'll replace my part 0001 with your version.

My commit comment (updated to match how the code actually changed).
Sean's code.

N.B. I added the kernel doc entry for the new argument to sgx_alloc_va_page()

+ * @va_page:   struct sgx_va_page connected to this VA page

If you have something better, then I will swap that line out too.

-Tony

From: Sean Christopherson <seanjc@google.com>
Subject: [PATCH] x86/sgx: Provide indication of life-cycle of EPC pages

SGX EPC pages go through the following life cycle:

        DIRTY ---> FREE ---> IN-USE --\
                    ^                 |
                    \-----------------/

Recovery action for poison for a DIRTY or FREE page is simple. Just
make sure never to allocate the page. IN-USE pages need some extra
handling.

It would be good to use the sgx_epc_page->owner field as an indicator
of where an EPC page is currently in that cycle (owner != NULL means
the EPC page is IN-USE). But there is one caller, sgx_alloc_va_page(),
that calls with NULL.

Fix up the one holdout to provide a non-NULL owner.

Also change the type of "owner" to "void *" (since it can have other
types besides "struct sgx_encl_page *").

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c  | 5 +++--
 arch/x86/kernel/cpu/sgx/encl.h  | 2 +-
 arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
 arch/x86/kernel/cpu/sgx/sgx.h   | 2 +-
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 001808e3901c..ad8c61933b0a 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -667,6 +667,7 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 
 /**
  * sgx_alloc_va_page() - Allocate a Version Array (VA) page
+ * @va_page:	struct sgx_va_page connected to this VA page
  *
  * Allocate a free EPC page and convert it to a Version Array (VA) page.
  *
@@ -674,12 +675,12 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
  *   a VA page,
  *   -errno otherwise
  */
-struct sgx_epc_page *sgx_alloc_va_page(void)
+struct sgx_epc_page *sgx_alloc_va_page(struct sgx_va_page *va_page)
 {
 	struct sgx_epc_page *epc_page;
 	int ret;
 
-	epc_page = sgx_alloc_epc_page(NULL, true);
+	epc_page = sgx_alloc_epc_page(va_page, true);
 	if (IS_ERR(epc_page))
 		return ERR_CAST(epc_page);
 
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index fec43ca65065..3d12dbeae14a 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -111,7 +111,7 @@ void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
 
-struct sgx_epc_page *sgx_alloc_va_page(void);
+struct sgx_epc_page *sgx_alloc_va_page(struct sgx_va_page *va_page);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
 void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 83df20e3e633..655ce0bb069d 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -30,7 +30,7 @@ static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
 		if (!va_page)
 			return ERR_PTR(-ENOMEM);
 
-		va_page->epc_page = sgx_alloc_va_page();
+		va_page->epc_page = sgx_alloc_va_page(va_page);
 		if (IS_ERR(va_page->epc_page)) {
 			err = ERR_CAST(va_page->epc_page);
 			kfree(va_page);
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..4e1a410b8a62 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -29,7 +29,7 @@
 struct sgx_epc_page {
 	unsigned int section;
 	unsigned int flags;
-	struct sgx_encl_page *owner;
+	void *owner;
 	struct list_head list;
 };
 
-- 
2.29.2



* Re: [PATCH v3 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-07-28 22:12       ` Dave Hansen
  2021-07-28 22:57         ` Luck, Tony
@ 2021-07-30  0:33         ` Jarkko Sakkinen
  1 sibling, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-07-30  0:33 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Tony Luck, Sean Christopherson, x86, linux-kernel

On Wed, Jul 28, 2021 at 03:12:03PM -0700, Dave Hansen wrote:
> On 7/28/21 1:46 PM, Tony Luck wrote:
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -581,7 +581,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
> >  	for ( ; ; ) {
> >  		page = __sgx_alloc_epc_page();
> >  		if (!IS_ERR(page)) {
> > -			page->owner = owner;
> > +			page->owner = owner ? owner : page;
> >  			break;
> >  		}
> 
> I'm a little worried about this.
> 
> Let's say we get confused about the type of the page and dereference
> page->owner.  If it's NULL, we get a nice oops.  If it's a real, valid
> pointer, we get real valid memory back that we can scribble on.
> 
> Wouldn't it be safer to do something like:
> 
> 	page->owner = owner ? owner : (void *)-1;
> 
> -1 is non-NULL, but also invalid, which makes it harder for us to poke
> ourselves in the eye.

Works for me.

/Jarkko
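
The sentinel idea discussed above can be sketched in plain userspace C. This is only an illustration, not the code that was merged: the names EPC_OWNER_UNKNOWN, struct epc_page and the three helpers are hypothetical, and (void *)-1 is simply the value proposed in the quoted text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel: non-NULL, but not a usable address. */
#define EPC_OWNER_UNKNOWN ((void *)-1)

/* Hypothetical, cut-down stand-in for struct sgx_epc_page. */
struct epc_page {
	void *owner;	/* NULL means the page is not in use */
};

/* Record the owner; fall back to the sentinel when the caller has none. */
static void epc_page_set_owner(struct epc_page *page, void *owner)
{
	page->owner = owner ? owner : EPC_OWNER_UNKNOWN;
}

/* Any non-NULL owner, including the sentinel, marks the page as in use. */
static bool epc_page_in_use(const struct epc_page *page)
{
	return page->owner != NULL;
}

/* Only a real owner may be dereferenced; the sentinel must be filtered out. */
static bool epc_page_owner_is_real(const struct epc_page *page)
{
	return page->owner != NULL && page->owner != EPC_OWNER_UNKNOWN;
}
```

With a sentinel, "in use" and "safe to dereference" become two separate questions, which is the extra bookkeeping that passing a real owner from every caller avoids.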


* Re: [PATCH v3 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-07-28 22:57         ` Luck, Tony
  2021-07-28 23:12           ` Dave Hansen
@ 2021-07-30  0:34           ` Jarkko Sakkinen
  1 sibling, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-07-30  0:34 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Hansen, Dave, Sean Christopherson, x86, linux-kernel

On Wed, Jul 28, 2021 at 10:57:07PM +0000, Luck, Tony wrote:
> > Wouldn't it be safer to do something like:
> >
> >	page->owner = owner ? owner : (void *)-1;
> >
> > -1 is non-NULL, but also invalid, which makes it harder for us to poke
> > ourselves in the eye.
> 
> Does Linux have some #define INVALID_POINTER thing that
> provides a guaranteed bad (e.g. non-canonical) value?
> 
> (void *)-1 seems hacky.
> 
> -Tony

MAP_FAILED?

/Jarkko


* Re: [PATCH v3 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-07-28 22:19       ` Dave Hansen
@ 2021-07-30  0:38         ` Jarkko Sakkinen
  2021-07-30 16:46           ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-07-30  0:38 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Tony Luck, Sean Christopherson, x86, linux-kernel

On Wed, Jul 28, 2021 at 03:19:46PM -0700, Dave Hansen wrote:
> On 7/28/21 1:46 PM, Tony Luck wrote:
> > Export a function sgx_is_epc_page() that simply reports whether an
> > address is an EPC page for use elsewhere in the kernel.
> 
> It would be really nice to mention why this needs to be exported to
> modules.  I assume it's the error injection driver or something that can
> be built as a module, but this export was a surprise when I saw it.
> 
> It's probably also worth noting that this is a sloooooooow
> implementation compared to the core VM code that does something
> analogous: pfn_to_page().  It's fine for error handling, but we should
> probably have a comment to this effect so that more liberal use doesn't
> creep in anywhere.

You could also create an xarray to track physical EPC address ranges,
and make the query fast.

/Jarkko


* Re: [PATCH v3 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-07-28 20:46     ` [PATCH v3 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-07-30  0:42       ` Jarkko Sakkinen
  0 siblings, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-07-30  0:42 UTC (permalink / raw)
  To: Tony Luck; +Cc: Sean Christopherson, Dave Hansen, x86, linux-kernel

On Wed, Jul 28, 2021 at 01:46:49PM -0700, Tony Luck wrote:
> +	dir = debugfs_create_dir("sgx", NULL);

dir = debugfs_create_dir("sgx", arch_debugfs_dir);

/Jarkko


* Re: [PATCH v3 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-07-30  0:38         ` Jarkko Sakkinen
@ 2021-07-30 16:46           ` Sean Christopherson
  2021-07-30 16:50             ` Dave Hansen
  2021-08-02  8:48             ` Jarkko Sakkinen
  0 siblings, 2 replies; 185+ messages in thread
From: Sean Christopherson @ 2021-07-30 16:46 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: Dave Hansen, Tony Luck, x86, linux-kernel

On Fri, Jul 30, 2021, Jarkko Sakkinen wrote:
> On Wed, Jul 28, 2021 at 03:19:46PM -0700, Dave Hansen wrote:
> > On 7/28/21 1:46 PM, Tony Luck wrote:
> > > Export a function sgx_is_epc_page() that simply reports whether an
> > > address is an EPC page for use elsewhere in the kernel.
> > 
> > It would be really nice to mention why this needs to be exported to
> > modules.  I assume it's the error injection driver or something that can
> > be built as a module, but this export was a surprise when I saw it.
> > 
> > It's probably also worth noting that this is a sloooooooow
> > implementation compared to the core VM code that does something
> > analogous: pfn_to_page().  It's fine for error handling, but we should
> > probably have a comment to this effect so that more liberal use doesn't
> > creep in anywhere.
> 
> You could also create an xarray to track physical EPC address ranges,
> and make the query fast.

Eh, it's not _that_ slow due to the constraints on the number of EPC sections.
The hard limit is currently '8', and practically speaking there will be one
section per socket.  Turning a linear search into a binary search in this case
isn't going to buy much.

Out of curiosity, on multi-socket systems, are EPC sections clustered in a single
address range, or are they interleaved with regular RAM?  If they're clustered,
you could track the min/max across all sections to optimize the common case that
an address isn't in any EPC section.

static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
{
	struct sgx_epc_section *section;
	int i;

	if (paddr < min_epc_pa || paddr > max_epc_pa)
		return NULL;

	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
		section = &sgx_epc_sections[i];

		if (paddr < section->phys_addr || paddr > section->end_phys_addr)
			continue;

		return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
	}

	return NULL;
}
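
For experimenting outside the kernel, the lookup above can be approximated by a self-contained userspace sketch. The struct and function names below are hypothetical, the section bounds are the two ranges from the boot log quoted later in this thread, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumes 4 KiB pages */

struct epc_section {
	uint64_t phys_addr;	/* first byte of the section */
	uint64_t end_phys_addr;	/* last byte of the section (inclusive) */
};

/* Section bounds taken from the boot log quoted later in this thread. */
static const struct epc_section sections[] = {
	{ 0x8000c00000ULL,  0x807f7fffffULL },
	{ 0x10000c00000ULL, 0x1007fffffffULL },
};
#define NR_SECTIONS (sizeof(sections) / sizeof(sections[0]))

/*
 * Find the section containing paddr. On success return the section index
 * and store the page index within that section; return -1 otherwise.
 */
static int paddr_to_section(uint64_t paddr, uint64_t *page_idx)
{
	/* The table is sorted, so the outer bounds are the first and last entries. */
	const uint64_t min_pa = sections[0].phys_addr;
	const uint64_t max_pa = sections[NR_SECTIONS - 1].end_phys_addr;
	size_t i;

	/*
	 * Fast reject for addresses below or above every section. RAM that
	 * sits between two sections still falls through to the scan.
	 */
	if (paddr < min_pa || paddr > max_pa)
		return -1;

	for (i = 0; i < NR_SECTIONS; i++) {
		if (paddr < sections[i].phys_addr ||
		    paddr > sections[i].end_phys_addr)
			continue;
		*page_idx = (paddr - sections[i].phys_addr) >> PAGE_SHIFT;
		return (int)i;
	}
	return -1;
}
```

With only a handful of sections the scan is a few comparisons per lookup, which is the point made above about a linear search being acceptable here.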


* Re: [PATCH v3 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-07-30 16:46           ` Sean Christopherson
@ 2021-07-30 16:50             ` Dave Hansen
  2021-07-30 18:44               ` Luck, Tony
  2021-08-02  8:51               ` Jarkko Sakkinen
  2021-08-02  8:48             ` Jarkko Sakkinen
  1 sibling, 2 replies; 185+ messages in thread
From: Dave Hansen @ 2021-07-30 16:50 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen; +Cc: Tony Luck, x86, linux-kernel

On 7/30/21 9:46 AM, Sean Christopherson wrote:
> Out of curiosity, on multi-socket systems, are EPC sections clustered in a single
> address range, or are they interleaved with regular RAM?  If they're clustered,
> you could track the min/max across all sections to optimize the common case that
> an address isn't in any EPC section.

They're interleaved on the systems that I've seen:

	Socket 0 - RAM
	Socket 0 - EPC
	Socket 1 - RAM
	Socket 1 - EPC

It would probably be pretty expensive in terms of the physical address
remapping resources to cluster them.


* Re: [PATCH v3 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-07-30 16:50             ` Dave Hansen
@ 2021-07-30 18:44               ` Luck, Tony
  2021-07-30 20:35                 ` Dave Hansen
  2021-08-02  8:52                 ` Jarkko Sakkinen
  2021-08-02  8:51               ` Jarkko Sakkinen
  1 sibling, 2 replies; 185+ messages in thread
From: Luck, Tony @ 2021-07-30 18:44 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Sean Christopherson, Jarkko Sakkinen, x86, linux-kernel

On Fri, Jul 30, 2021 at 09:50:59AM -0700, Dave Hansen wrote:
> On 7/30/21 9:46 AM, Sean Christopherson wrote:
> > Out of curiosity, on multi-socket systems, are EPC sections clustered in a single
> > address range, or are they interleaved with regular RAM?  If they're clustered,
> > you could track the min/max across all sections to optimize the common case that
> > an address isn't in any EPC section.
> 
> They're interleaved on the systems that I've seen:
> 
> 	Socket 0 - RAM
> 	Socket 0 - EPC
> 	Socket 1 - RAM
> 	Socket 1 - EPC
> 
> It would probably be pretty expensive in terms of the physical address
> remapping resources to cluster them.

I thought xarray was overkill ... and it is ... but it makes the code
considerably shorter/simpler!

I think I'm going to go with it. Thanks to Jarkko for the suggestion.

Also added comments based on Dave's feedback on why the function is
exported, and that sgx_is_epc_page() will be slower than people might
expect.

-Tony

From 7026de93f5bf370be9d067cdc068a4a2a54bbd3e Mon Sep 17 00:00:00 2001
From: Tony Luck <tony.luck@intel.com>
Date: Fri, 30 Jul 2021 11:39:45 -0700
Subject: [PATCH] x86/sgx: Add infrastructure to identify SGX EPC pages

X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.

SGX EPC pages do not have Linux "struct page" associated with them.

Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray.

Create a function sgx_is_epc_page() that simply reports whether an address
is an EPC page for use elsewhere in the kernel. The ACPI error injection
code needs this function and is typically built as a module, so export it.

Note that sgx_is_epc_page() will be slower than other similar "what type
is this page" functions that can simply check bits in the "struct page".
If there is some future performance critical user of this function it
may need to be implemented in a more efficient way.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 21 +++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h  |  1 +
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 3d19bba3fa7e..d65787391b22 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -20,6 +20,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
 static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
+static DEFINE_XARRAY(epc_page_ranges);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
@@ -649,6 +650,9 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	section->end_phys_addr = phys_addr + size - 1;
+	xa_store_range(&epc_page_ranges, section->phys_addr,
+		       section->end_phys_addr, section, GFP_KERNEL);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -660,6 +664,23 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+
+	section = xa_load(&epc_page_ranges, paddr);
+	if (!section)
+		return NULL;
+
+	return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+}
+
+bool sgx_is_epc_page(u64 paddr)
+{
+	return !!xa_load(&epc_page_ranges, paddr);
+}
+EXPORT_SYMBOL_GPL(sgx_is_epc_page);
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4e1a410b8a62..226b081a4d05 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -50,6 +50,7 @@ struct sgx_numa_node {
  */
 struct sgx_epc_section {
 	unsigned long phys_addr;
+	unsigned long end_phys_addr;
 	void *virt_addr;
 	struct sgx_epc_page *pages;
 	struct sgx_numa_node *node;
-- 
2.29.2



* Re: [PATCH v3 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-07-30 18:44               ` Luck, Tony
@ 2021-07-30 20:35                 ` Dave Hansen
  2021-07-30 23:35                   ` Luck, Tony
  2021-08-02  8:52                 ` Jarkko Sakkinen
  1 sibling, 1 reply; 185+ messages in thread
From: Dave Hansen @ 2021-07-30 20:35 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Sean Christopherson, Jarkko Sakkinen, x86, linux-kernel, Matthew Wilcox

On 7/30/21 11:44 AM, Luck, Tony wrote:
> @@ -649,6 +650,9 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
>  	}
>  
>  	section->phys_addr = phys_addr;
> +	section->end_phys_addr = phys_addr + size - 1;
> +	xa_store_range(&epc_page_ranges, section->phys_addr,
> +		       section->end_phys_addr, section, GFP_KERNEL);

That is compact, but how much memory does it eat?  I'm a little worried
about this hunk of xa_store_range():

>                 do {
>                         xas_set_range(&xas, first, last);
>                         xas_store(&xas, entry);
>                         if (xas_error(&xas))
>                                 goto unlock;
>                         first += xas_size(&xas);
>                 } while (first <= last);

That makes it look like it's iterating over the whole range and making
loads of individual array entries instead of doing something super clever
like keeping an extent-style structure.

Let's say we have 1TB of EPC.  How big is the array to store these
indexes?  Would this be more compact if instead of doing a physical
address range:

	xa_store_range(&epc_page_ranges,
		       section->phys_addr,
		       section->end_phys_addr, ...);

... you did it based on PFNs:

	xa_store_range(&epc_page_ranges,
		       section->phys_addr     >> PAGE_SHIFT,
		       section->end_phys_addr >> PAGE_SHIFT, ...);

SGX sections are at *least* page-aligned, so this should be fine.
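
To make the difference between the two indexing schemes concrete, here is a small userspace sketch that only does the arithmetic. It does not model xarray's memory use, and the follow-up in this thread reports that the xa_dump() output was the same size either way. The helper names are hypothetical and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumes 4 KiB pages, as on x86 */

struct index_range {
	uint64_t first;
	uint64_t last;	/* inclusive */
};

/* Convert an inclusive physical address range to an inclusive PFN range. */
static struct index_range paddr_range_to_pfns(uint64_t start, uint64_t end)
{
	struct index_range r = {
		.first = start >> PAGE_SHIFT,
		.last  = end >> PAGE_SHIFT,
	};
	return r;
}

/* Number of distinct indices an inclusive range spans. */
static uint64_t index_count(struct index_range r)
{
	return r.last - r.first + 1;
}
```

For a section spanning 0x8000c00000-0x807f7fffff, byte indexing covers 0x7ec00000 indices while PFN indexing covers 0x7ec00, a factor of 4096 fewer.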


* RE: [PATCH v3 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-07-30 20:35                 ` Dave Hansen
@ 2021-07-30 23:35                   ` Luck, Tony
  2021-08-03 21:34                     ` Matthew Wilcox
  0 siblings, 1 reply; 185+ messages in thread
From: Luck, Tony @ 2021-07-30 23:35 UTC (permalink / raw)
  To: Hansen, Dave
  Cc: Sean Christopherson, Jarkko Sakkinen, x86, linux-kernel, Matthew Wilcox

>	xa_store_range(&epc_page_ranges,
>		       section->phys_addr,
>		       section->end_phys_addr, ...);
>
> ... you did it based on PFNs:
>
>	xa_store_range(&epc_page_ranges,
>		       section->phys_addr     >> PAGE_SHIFT,
>		       section->end_phys_addr >> PAGE_SHIFT, ...);
>
> SGX sections are at *least* page-aligned, so this should be fine.

I found xa_dump() (hidden inside #ifdef XA_DEBUG)

Trying both with and without the >> PAGE_SHIFT made no difference
to the number of lines of console output that xa_dump() spits out.
266 either way.

There are only two ranges on this system

[   11.937592] sgx: EPC section 0x8000c00000-0x807f7fffff
[   11.945811] sgx: EPC section 0x10000c00000-0x1007fffffff

So I'm a little bit sad that xarray appears to have broken them up
into a bunch of pieces.

-Tony


* Re: [PATCH v3 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-07-30 16:46           ` Sean Christopherson
  2021-07-30 16:50             ` Dave Hansen
@ 2021-08-02  8:48             ` Jarkko Sakkinen
  1 sibling, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-08-02  8:48 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Dave Hansen, Tony Luck, x86, linux-kernel

On Fri, Jul 30, 2021 at 04:46:07PM +0000, Sean Christopherson wrote:
> On Fri, Jul 30, 2021, Jarkko Sakkinen wrote:
> > On Wed, Jul 28, 2021 at 03:19:46PM -0700, Dave Hansen wrote:
> > > On 7/28/21 1:46 PM, Tony Luck wrote:
> > > > Export a function sgx_is_epc_page() that simply reports whether an
> > > > address is an EPC page for use elsewhere in the kernel.
> > > 
> > > It would be really nice to mention why this needs to be exported to
> > > modules.  I assume it's the error injection driver or something that can
> > > be built as a module, but this export was a surprise when I saw it.
> > > 
> > > It's probably also worth noting that this is a sloooooooow
> > > implementation compared to the core VM code that does something
> > > analogous: pfn_to_page().  It's fine for error handling, but we should
> > > probably have a comment to this effect so that more liberal use doesn't
> > > creep in anywhere.
> > 
> > You could also create an xarray to track physical EPC address ranges,
> > and make the query fast.
> 
> Eh, it's not _that_ slow due to the constraints on the number of EPC sections.
> The hard limit is currently '8', and practically speaking there will be one
> section per socket.  Turning a linear search into a binary search in this case
> isn't going to buy much.

Also, it consumes more memory.

Just pointing out that it is possible to improve without much fuss, if ever
required, for instance by using DEFINE_XARRAY() to define a file-scope
xarray.

> Out of curiosity, on multi-socket systems, are EPC sections clustered in a single
> address range, or are they interleaved with regular RAM?  If they're clustered,
> you could track the min/max across all sections to optimize the common case that
> an address isn't in any EPC section.

Given that the physical address ranges of different NUMA nodes are disjoint,
and each EPC section is reserved from one such range, I would presume
that they are interleaved.

> static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
> {
> 	struct sgx_epc_section *section;
> 	int i;
> 
>         if (paddr < min_epc_pa || paddr > max_epc_pa)
>                 return NULL;
> 
> 	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
> 		section = &sgx_epc_sections[i];
> 
> 		if (paddr < section->phys_addr || paddr > section->end_phys_addr)
> 			continue;
> 
> 		return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
> 	}
> 
> 	return NULL;
> }

/Jarkko


* Re: [PATCH v3 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-07-30 16:50             ` Dave Hansen
  2021-07-30 18:44               ` Luck, Tony
@ 2021-08-02  8:51               ` Jarkko Sakkinen
  1 sibling, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-08-02  8:51 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Sean Christopherson, Tony Luck, x86, linux-kernel

On Fri, Jul 30, 2021 at 09:50:59AM -0700, Dave Hansen wrote:
> On 7/30/21 9:46 AM, Sean Christopherson wrote:
> > Out of curiosity, on multi-socket systems, are EPC sections clustered in a single
> > address range, or are they interleaved with regular RAM?  If they're clustered,
> > you could track the min/max across all sections to optimize the common case that
> > an address isn't in any EPC section.
> 
> They're interleaved on the systems that I've seen:
> 
> 	Socket 0 - RAM
> 	Socket 0 - EPC
> 	Socket 1 - RAM
> 	Socket 1 - EPC
> 
> It would probably be pretty expensive in terms of the physical address
> remapping resources to cluster them.

If they were clustered, wouldn't that also break our initialization code
for NUMA? It is based on detecting which NUMA node's address range contains
the given EPC section.

I.e. if they were clustered, there would need to be some metadata (which
does not exist) to draw the connection to the correct NUMA node.

/Jarkko



* Re: [PATCH v3 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-07-30 18:44               ` Luck, Tony
  2021-07-30 20:35                 ` Dave Hansen
@ 2021-08-02  8:52                 ` Jarkko Sakkinen
  1 sibling, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-08-02  8:52 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Dave Hansen, Sean Christopherson, x86, linux-kernel

On Fri, Jul 30, 2021 at 11:44:00AM -0700, Luck, Tony wrote:
> On Fri, Jul 30, 2021 at 09:50:59AM -0700, Dave Hansen wrote:
> > On 7/30/21 9:46 AM, Sean Christopherson wrote:
> > > Out of curiosity, on multi-socket systems, are EPC sections clustered in a single
> > > address range, or are they interleaved with regular RAM?  If they're clustered,
> > > you could track the min/max across all sections to optimize the common case that
> > > an address isn't in any EPC section.
> > 
> > They're interleaved on the systems that I've seen:
> > 
> > 	Socket 0 - RAM
> > 	Socket 0 - EPC
> > 	Socket 1 - RAM
> > 	Socket 1 - EPC
> > 
> > It would probably be pretty expensive in terms of the physical address
> > remapping resources to cluster them.
> 
> I thought xarray was overkill ... and it is ... but it makes the code
> considerably shorter/simpler!
> 
> I think I'm going to go with it. Thanks to Jarkko for the suggestion.

If it makes the code considerably simpler, that in my opinion justifies the
minor size increase.

/Jarkko


* Re: [PATCH v3 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-07-30 23:35                   ` Luck, Tony
@ 2021-08-03 21:34                     ` Matthew Wilcox
  2021-08-03 23:49                       ` Luck, Tony
  0 siblings, 1 reply; 185+ messages in thread
From: Matthew Wilcox @ 2021-08-03 21:34 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Hansen, Dave, Sean Christopherson, Jarkko Sakkinen, x86,
	linux-kernel, Liam R. Howlett

On Fri, Jul 30, 2021 at 11:35:38PM +0000, Luck, Tony wrote:
> >	xa_store_range(&epc_page_ranges,
> >		       section->phys_addr,
> >		       section->end_phys_addr, ...);
> >
> > ... you did it based on PFNs:
> >
> >	xa_store_range(&epc_page_ranges,
> >		       section->phys_addr     >> PAGE_SHIFT,
> >		       section->end_phys_addr >> PAGE_SHIFT, ...);
> >
> > SGX sections are at *least* page-aligned, so this should be fine.
> 
> I found xa_dump() (hidden inside #ifdef XA_DEBUG)
> 
> Trying both with and without the >> PAGE_SHIFT made no difference
> to the number of lines of console output that xa_dump() spits out.
> 266 either way.
> 
> There are only two ranges on this system
> 
> [   11.937592] sgx: EPC section 0x8000c00000-0x807f7fffff
> [   11.945811] sgx: EPC section 0x10000c00000-0x1007fffffff
> 
> So I'm a little bit sad that xarray appears to have broken them up
> into a bunch of pieces.

That's inherent in the (current) back end data structure, I'm afraid.
As a radix tree, it can only look up based on the N bits available at
each level of the tree, so if your entry is an aligned power-of-64,
everything is nice and neat, and you're a single entry at one level
of the tree.  If you're an arbitrary range, things get more complicated,
and I have to do a little dance to redirect the lookup towards the
canonical entry.
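
A rough way to see why an arbitrary range fragments is to count how many aligned power-of-64 blocks are needed to cover it. The sketch below is a simplified greedy model written for illustration only. It is not xarray's actual algorithm, and the function name and the CHUNK constant are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK 64u	/* slots per tree node, per the "power-of-64" remark above */

/*
 * Count the aligned power-of-64 blocks a greedy cover of the inclusive
 * range [first, last] needs. Requires first <= last.
 */
static uint64_t count_aligned_blocks(uint64_t first, uint64_t last)
{
	uint64_t blocks = 0;

	for (;;) {
		uint64_t size = 1;

		/* Grow the block while it stays aligned and inside the range. */
		while (size <= UINT64_MAX / CHUNK) {
			uint64_t next = size * CHUNK;

			if (first % next != 0 || last - first < next - 1)
				break;
			size = next;
		}

		blocks++;
		if (last - first == size - 1)	/* block ends exactly at 'last' */
			break;
		first += size;
	}
	return blocks;
}
```

In this model [0, 63] is a single block, while shifting the same length by one to [1, 64] needs 64 single-index blocks, which illustrates how much alignment matters.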

Liam and I are working on a new replacement data structure called the
Maple Tree, but it's not yet ready to replace the radix tree back end.
It looks like it would be perfect for your case; there would be five
entries in it, stored in one 256-byte node:

	NULL
0x8000bfffff
	p1
0x807f7fffff
	NULL
0x10000bfffff
	p2
0x1007fffffff
	NULL
0xffff'ffff'ffff'ffff

It would actually turn into a linear scan, because that's just the
fastest way to find something in a list of five elements.  A third
range would take us to a list of seven elements, which still fits
in a single node.  Once we get to more than that, you'd have a
two-level tree, which would work until you have more than ~20 ranges.

We could do better for your case by storing 10x (start, end, p) in each
leaf node, but we're (currently) optimising for VMAs which tend to be
tightly packed, meaning that an implicit 'start' element is a better
choice as it gives us 15x (end, p) pairs.
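
The (end, p) layout with an implicit start can be sketched as a flat array scanned linearly. This is a toy model, not Maple Tree code: struct pivot_slot, pivot_lookup() and the placeholder section objects are hypothetical. The third pivot is taken to be 0x10000bfffff, one below the start of the second section, on the assumption that every pivot is the inclusive end of its range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One slot: the entry covers every index up to and including 'end'. */
struct pivot_slot {
	uint64_t end;
	const void *entry;	/* NULL marks a gap */
};

/* Placeholder objects standing in for the two section pointers. */
static const int section1 = 1, section2 = 2;
#define P1 ((const void *)&section1)
#define P2 ((const void *)&section2)

/* Each slot implicitly starts one past the previous slot's 'end'. */
static const struct pivot_slot node[] = {
	{ 0x8000bfffffULL,  NULL },
	{ 0x807f7fffffULL,  P1 },
	{ 0x10000bfffffULL, NULL },
	{ 0x1007fffffffULL, P2 },
	{ UINT64_MAX,       NULL },
};

/* Linear scan: the first slot whose 'end' is >= index owns the index. */
static const void *pivot_lookup(uint64_t index)
{
	size_t i;

	for (i = 0; i < sizeof(node) / sizeof(node[0]); i++) {
		if (index <= node[i].end)
			return node[i].entry;
	}
	return NULL;	/* not reached: the last slot ends at UINT64_MAX */
}
```

Two ranges need five slots because every gap, including the ones before the first range and after the last, is stored as an explicit NULL entry.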


* Re: [PATCH v3 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-08-03 21:34                     ` Matthew Wilcox
@ 2021-08-03 23:49                       ` Luck, Tony
  0 siblings, 0 replies; 185+ messages in thread
From: Luck, Tony @ 2021-08-03 23:49 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Hansen, Dave, Sean Christopherson, Jarkko Sakkinen, x86,
	linux-kernel, Liam R. Howlett



> On Aug 3, 2021, at 14:47, Matthew Wilcox wrote:
> 
> Liam and I are working on a new replacement data structure called the
> Maple Tree, but it's not yet ready to replace the radix tree back end.
> It looks like it would be perfect for your case; there would be five
> entries in it, stored in one 256-byte node:
> 
>    NULL
> 0x8000bfffff
>    p1
> 0x807f7fffff
>    NULL
> 0x10000bfffff
>    p2
> 0x1007fffffff
>    NULL
> 0xffff'ffff'ffff'ffff
> 
> It would actually turn into a linear scan, because that's just the
> fastest way to find something in a list of five elements.  A third
> range would take us to a list of seven elements, which still fits
> in a single node.  Once we get to more than that, you'd have a
> two-level tree, which would work until you have more than ~20 ranges.
> 
> We could do better for your case by storing 10x (start, end, p) in each
> leaf node, but we're (currently) optimising for VMAs which tend to be
> tightly packed, meaning that an implicit 'start' element is a better
> choice as it gives us 15x (end, p) pairs.

That’s good to know. While current xarray
implementation might be a bit wasteful[1],
things will get better.

I’m still going with xarray to keep the source
simple.

-Tony

[1] A few KBytes extra doesn’t even sound
too terrible to manage tens of MBytes (or
more) of SGX EPC memory on a system
with a half TByte total memory.




* [PATCH v4 0/6] Basic recovery for machine checks inside SGX
  2021-07-28 20:46   ` [PATCH v3 0/7] " Tony Luck
                       ` (6 preceding siblings ...)
  2021-07-28 20:46     ` [PATCH v3 7/7] x86/sgx: Add documentation for SGX memory errors Tony Luck
@ 2021-08-27 19:55     ` Tony Luck
  2021-08-27 19:55       ` [PATCH v4 1/6] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
                         ` (8 more replies)
  7 siblings, 9 replies; 185+ messages in thread
From: Tony Luck @ 2021-08-27 19:55 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, x86, linux-kernel, Tony Luck

Here's version 4 (just 38 more to go if I want to meet the bar set by
the base SGX series :-) )

Changes since v3:

Dave Hansen:
	1) Concerns about assigning a default value to the "owner"
	   pointer if the caller of sgx_alloc_epc_page() called with
	   a NULL value.
	Resolved: Sean provided a patch to fix the only caller that
	was using NULL. I merged it in here.

	2) Better commit message to explain why sgx_is_epc_page() is
	   exported.
	Done.

	3) Unhappy with "void *owner" in struct sgx_epc_page. Would
	   be better to use an anonymous union of all the types.
	Done.

Sean Christopherson:
	1) Races updating bits in flags field.
	Resolved: "poison" is now a separate field.

	2) More races. When poison alert happens while moving
	   a page on/off a free/dirty list.
	Resolved: Well mostly. All the run time changes are now
	done while holding the node->lock. There's a gap while
	moving pages from dirty list to free list. But that's
	a short-ish window during boot, and the races are mostly
	harmless. Worst is that we might call __eremove() for a
	page that just got marked as poisoned. But then
	sgx_free_epc_page() will see the poison flag and do the
	right thing.

Jarkko Sakkinen:
	1) Use xarray to keep track of which pages are the special
	   SGX EPC ones.
	This spawned a short discussion on whether it was overkill. But
	xarray makes the source much simpler, and there are improvements
	in the pipeline for xarray that will make it handle this use
	case more efficiently. So I made this change.

	2) Move the sgx debugfs directory under arch_debugfs_dir.
	Done.
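
As a rough illustration of two of the changes listed above (the anonymous union for "owner" and the separate "poison" field checked when a page is freed), here is a userspace sketch. All type, field and function names are hypothetical and do not reproduce the patches in this series. Keeping a poisoned page off the free list is an assumption about what "the right thing" means, and the locking is only indicated in a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the different owner types. */
struct encl_page { int id; };
struct va_page { int id; };

struct epc_page {
	bool poison;		/* separate field, not a bit in 'flags' */
	unsigned int flags;
	union {			/* anonymous union: one slot, several typed views */
		void *owner;
		struct encl_page *encl_owner;
		struct va_page *va_owner;
	};
	struct epc_page *next;	/* free-list link */
};

struct epc_node {
	struct epc_page *free_list;
	unsigned int nr_free;
	unsigned int nr_poisoned;
};

/* In the kernel, the caller would hold node->lock around this. */
static void epc_free_page(struct epc_node *node, struct epc_page *page)
{
	page->owner = NULL;		/* page leaves the IN-USE state */
	if (page->poison) {
		node->nr_poisoned++;	/* assumed: never returns to the free list */
		return;
	}
	page->next = node->free_list;
	node->free_list = page;
	node->nr_free++;
}
```

Doing the owner update and the list manipulation in one function mirrors the stated goal of changing both under the same lock, so a poison handler never sees a page that is half way between states.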

Tony Luck (6):
  x86/sgx: Provide indication of life-cycle of EPC pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook sgx_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation

 .../firmware-guide/acpi/apei/einj.rst         |  19 +++
 arch/x86/include/asm/set_memory.h             |   4 +
 arch/x86/kernel/cpu/sgx/encl.c                |   5 +-
 arch/x86/kernel/cpu/sgx/encl.h                |   2 +-
 arch/x86/kernel/cpu/sgx/ioctl.c               |   2 +-
 arch/x86/kernel/cpu/sgx/main.c                | 140 ++++++++++++++++--
 arch/x86/kernel/cpu/sgx/sgx.h                 |  14 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 include/linux/mm.h                            |  15 ++
 mm/memory-failure.c                           |  19 ++-
 10 files changed, 196 insertions(+), 27 deletions(-)


base-commit: e22ce8eb631bdc47a4a4ea7ecf4e4ba499db4f93
-- 
2.29.2



* [PATCH v4 1/6] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-08-27 19:55     ` [PATCH v4 0/6] Basic recovery for machine checks inside SGX Tony Luck
@ 2021-08-27 19:55       ` Tony Luck
  2021-09-01  3:55         ` Jarkko Sakkinen
  2021-08-27 19:55       ` [PATCH v4 2/6] x86/sgx: Add infrastructure to identify SGX " Tony Luck
                         ` (7 subsequent siblings)
  8 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-08-27 19:55 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, x86, linux-kernel, Tony Luck

SGX EPC pages go through the following life cycle:

        DIRTY ---> FREE ---> IN-USE --\
                    ^                 |
                    \-----------------/

Recovery action for poison for a DIRTY or FREE page is simple. Just
make sure never to allocate the page. IN-USE pages need some extra
handling.

It would be good to use the sgx_epc_page->owner field as an indicator
of where an EPC page is currently in that cycle (owner != NULL means
the EPC page is IN-USE). But there is one caller, sgx_alloc_va_page(),
that calls with NULL.

Since there are multiple uses of the "owner" field with different types,
change the sgx_epc_page structure to define an anonymous union with
each of the uses explicitly called out.

Start epc_pages out with a non-NULL owner while they are in DIRTY state.

Fix up the one holdout to provide a non-NULL owner.

Refactor the allocation sequence so that changes to/from NULL
value happen together with adding/removing the epc_page from
a free list while the node->lock is held.
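
The life cycle described above can be sketched as a small user-space
model. This is only an illustration under stated assumptions: the
model_* names are hypothetical, the struct is not the kernel's
struct sgx_epc_page, and the node->lock that the patch holds around
these transitions is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; NOT the kernel's struct sgx_epc_page. */
struct model_epc_page {
	void *private;		/* NULL only while the page is FREE */
};

/* Any non-NULL tag works for DIRTY; the patch uses the string "dirty". */
static char model_dirty_tag[] = "dirty";

/* Boot: every page starts out DIRTY with a non-NULL tag. */
static void model_init_dirty(struct model_epc_page *page)
{
	page->private = model_dirty_tag;
}

/* DIRTY -> FREE and IN-USE -> FREE: clear the tag. */
static void model_release(struct model_epc_page *page)
{
	page->private = NULL;
}

/* FREE -> IN-USE: callers must pass a non-NULL owner. */
static int model_alloc(struct model_epc_page *page, void *owner)
{
	assert(owner != NULL);
	if (page->private != NULL)
		return -1;	/* page is DIRTY or already IN-USE */
	page->private = owner;
	return 0;
}

/* A page is on a free list exactly when its tag is NULL. */
static int model_is_free(const struct model_epc_page *page)
{
	return page->private == NULL;
}
```

In this model a single NULL check is enough to tell a FREE page from a
DIRTY or IN-USE one, which is the property the poison handling later in
the series relies on.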

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c  |  5 +++--
 arch/x86/kernel/cpu/sgx/encl.h  |  2 +-
 arch/x86/kernel/cpu/sgx/ioctl.c |  2 +-
 arch/x86/kernel/cpu/sgx/main.c  | 23 ++++++++++++-----------
 arch/x86/kernel/cpu/sgx/sgx.h   | 12 ++++++++----
 5 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 001808e3901c..ad8c61933b0a 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -667,6 +667,7 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 
 /**
  * sgx_alloc_va_page() - Allocate a Version Array (VA) page
+ * @va_page:	struct sgx_va_page connected to this VA page
  *
  * Allocate a free EPC page and convert it to a Version Array (VA) page.
  *
@@ -674,12 +675,12 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
  *   a VA page,
  *   -errno otherwise
  */
-struct sgx_epc_page *sgx_alloc_va_page(void)
+struct sgx_epc_page *sgx_alloc_va_page(struct sgx_va_page *va_page)
 {
 	struct sgx_epc_page *epc_page;
 	int ret;
 
-	epc_page = sgx_alloc_epc_page(NULL, true);
+	epc_page = sgx_alloc_epc_page(va_page, true);
 	if (IS_ERR(epc_page))
 		return ERR_CAST(epc_page);
 
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index fec43ca65065..3d12dbeae14a 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -111,7 +111,7 @@ void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
 
-struct sgx_epc_page *sgx_alloc_va_page(void);
+struct sgx_epc_page *sgx_alloc_va_page(struct sgx_va_page *va_page);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
 void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 83df20e3e633..655ce0bb069d 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -30,7 +30,7 @@ static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
 		if (!va_page)
 			return ERR_PTR(-ENOMEM);
 
-		va_page->epc_page = sgx_alloc_va_page();
+		va_page->epc_page = sgx_alloc_va_page(va_page);
 		if (IS_ERR(va_page->epc_page)) {
 			err = ERR_CAST(va_page->epc_page);
 			kfree(va_page);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de02bbcc..4a5b51d16133 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -457,7 +457,7 @@ static bool __init sgx_page_reclaimer_init(void)
 	return true;
 }
 
-static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
+static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(void *private, int nid)
 {
 	struct sgx_numa_node *node = &sgx_numa_nodes[nid];
 	struct sgx_epc_page *page = NULL;
@@ -471,6 +471,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 
 	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
 	list_del_init(&page->list);
+	page->private = private;
 	sgx_nr_free_pages--;
 
 	spin_unlock(&node->lock);
@@ -480,6 +481,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 
 /**
  * __sgx_alloc_epc_page() - Allocate an EPC page
+ * @private:	per-caller private data
  *
  * Iterate through NUMA nodes and reserve ia free EPC page to the caller. Start
  * from the NUMA node, where the caller is executing.
@@ -488,14 +490,14 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
  * - an EPC page:	A borrowed EPC pages were available.
  * - NULL:		Out of EPC pages.
  */
-struct sgx_epc_page *__sgx_alloc_epc_page(void)
+struct sgx_epc_page *__sgx_alloc_epc_page(void *private)
 {
 	struct sgx_epc_page *page;
 	int nid_of_current = numa_node_id();
 	int nid = nid_of_current;
 
 	if (node_isset(nid_of_current, sgx_numa_mask)) {
-		page = __sgx_alloc_epc_page_from_node(nid_of_current);
+		page = __sgx_alloc_epc_page_from_node(private, nid_of_current);
 		if (page)
 			return page;
 	}
@@ -506,7 +508,7 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
 		if (nid == nid_of_current)
 			break;
 
-		page = __sgx_alloc_epc_page_from_node(nid);
+		page = __sgx_alloc_epc_page_from_node(private, nid);
 		if (page)
 			return page;
 	}
@@ -559,7 +561,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
 
 /**
  * sgx_alloc_epc_page() - Allocate an EPC page
- * @owner:	the owner of the EPC page
+ * @private:	per-caller private data
  * @reclaim:	reclaim pages if necessary
  *
  * Iterate through EPC sections and borrow a free EPC page to the caller. When a
@@ -574,16 +576,14 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
  *   an EPC page,
  *   -errno on error
  */
-struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
+struct sgx_epc_page *sgx_alloc_epc_page(void *private, bool reclaim)
 {
 	struct sgx_epc_page *page;
 
 	for ( ; ; ) {
-		page = __sgx_alloc_epc_page();
-		if (!IS_ERR(page)) {
-			page->owner = owner;
+		page = __sgx_alloc_epc_page(private);
+		if (!IS_ERR(page))
 			break;
-		}
 
 		if (list_empty(&sgx_active_page_list))
 			return ERR_PTR(-ENOMEM);
@@ -624,6 +624,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	spin_lock(&node->lock);
 
+	page->private = NULL;
 	list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 
@@ -652,7 +653,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
 		section->pages[i].flags = 0;
-		section->pages[i].owner = NULL;
+		section->pages[i].private = "dirty";
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..8b1be10a46f6 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -28,8 +28,12 @@
 
 struct sgx_epc_page {
 	unsigned int section;
-	unsigned int flags;
-	struct sgx_encl_page *owner;
+	int flags;
+	union {
+		void *private;
+		struct sgx_encl_page *owner;
+		struct sgx_encl_page *vepc;
+	};
 	struct list_head list;
 };
 
@@ -77,12 +81,12 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
 	return section->virt_addr + index * PAGE_SIZE;
 }
 
-struct sgx_epc_page *__sgx_alloc_epc_page(void);
+struct sgx_epc_page *__sgx_alloc_epc_page(void *private);
 void sgx_free_epc_page(struct sgx_epc_page *page);
 
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
-struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
+struct sgx_epc_page *sgx_alloc_epc_page(void *private, bool reclaim);
 
 #ifdef CONFIG_X86_SGX_KVM
 int __init sgx_vepc_init(void);
-- 
2.29.2


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v4 2/6] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-08-27 19:55     ` [PATCH v4 0/6] Basic recovery for machine checks inside SGX Tony Luck
  2021-08-27 19:55       ` [PATCH v4 1/6] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
@ 2021-08-27 19:55       ` Tony Luck
  2021-09-01  4:30         ` Jarkko Sakkinen
  2021-08-27 19:55       ` [PATCH v4 3/6] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
                         ` (6 subsequent siblings)
  8 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-08-27 19:55 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, x86, linux-kernel, Tony Luck

X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.

SGX EPC pages do not have Linux "struct page" associated with them.

Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray.

Create a function sgx_is_epc_page() that simply reports whether an address
is an EPC page for use elsewhere in the kernel. The ACPI error injection
code needs this function and is typically built as a module, so export it.

Note that sgx_is_epc_page() will be slower than other similar "what type
is this page" functions that can simply check bits in the "struct page".
If there is some future performance critical user of this function it
may need to be implemented in a more efficient way.
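
As a rough illustration of the lookup, here is a user-space sketch that
answers the same "is this address inside an EPC section" question. It
deliberately uses a linear scan over an array of inclusive ranges
instead of an xarray, and the model_* names and the addresses used with
it are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one EPC section: [start, end], inclusive. */
struct model_epc_range {
	uint64_t start;
	uint64_t end;
};

/* Return the range containing paddr, or NULL if there is none. */
static const struct model_epc_range *
model_find_range(const struct model_epc_range *ranges, size_t n,
		 uint64_t paddr)
{
	size_t i;

	assert(ranges != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (paddr >= ranges[i].start && paddr <= ranges[i].end)
			return &ranges[i];
	}
	return NULL;
}

/* Boolean wrapper, mirroring the shape of sgx_is_epc_page(). */
static bool model_is_epc_page(const struct model_epc_range *ranges,
			      size_t n, uint64_t paddr)
{
	return model_find_range(ranges, n, paddr) != NULL;
}
```

The end address is stored inclusively, as in the patch
(phys_addr + size - 1), so the last byte of a section matches and the
first byte past it does not.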

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 10 ++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h  |  1 +
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 4a5b51d16133..261f81b3f8af 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -20,6 +20,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
 static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
+static DEFINE_XARRAY(epc_page_ranges);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
@@ -649,6 +650,9 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	section->end_phys_addr = phys_addr + size - 1;
+	xa_store_range(&epc_page_ranges, section->phys_addr,
+		       section->end_phys_addr, section, GFP_KERNEL);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -660,6 +664,12 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+bool sgx_is_epc_page(u64 paddr)
+{
+	return !!xa_load(&epc_page_ranges, paddr);
+}
+EXPORT_SYMBOL_GPL(sgx_is_epc_page);
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 8b1be10a46f6..6a55b1971956 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -54,6 +54,7 @@ struct sgx_numa_node {
  */
 struct sgx_epc_section {
 	unsigned long phys_addr;
+	unsigned long end_phys_addr;
 	void *virt_addr;
 	struct sgx_epc_page *pages;
 	struct sgx_numa_node *node;
-- 
2.29.2


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v4 3/6] x86/sgx: Initial poison handling for dirty and free pages
  2021-08-27 19:55     ` [PATCH v4 0/6] Basic recovery for machine checks inside SGX Tony Luck
  2021-08-27 19:55       ` [PATCH v4 1/6] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
  2021-08-27 19:55       ` [PATCH v4 2/6] x86/sgx: Add infrastructure to identify SGX " Tony Luck
@ 2021-08-27 19:55       ` Tony Luck
  2021-08-27 19:55       ` [PATCH v4 4/6] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
                         ` (5 subsequent siblings)
  8 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-08-27 19:55 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, x86, linux-kernel, Tony Luck

A memory controller patrol scrubber can report poison in a page
that isn't currently being used.

Add a "poison" field to struct sgx_epc_page. Check for it:
1) When sanitizing dirty pages
2) When freeing epc pages

Poison is a new field separated from flags to avoid having to make
all updates to flags atomic, or integrate poison state changes into
some other locking scheme to protect flags.

In both cases place the poisoned page on a list of poisoned epc pages
to make sure it will not be reallocated.

Add a debugfs file, poison_page_list, in an "sgx" directory under
arch_debugfs_dir (typically /sys/kernel/debug/x86/sgx/poison_page_list)
so that system administrators get a list of those pages that have been
dropped because of poison.
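
The free-path check can be modelled in user space roughly as follows.
This is a sketch only: the model_* types are hypothetical, simple
singly linked lists stand in for the kernel's list_head, and the
node->lock taken by sgx_free_epc_page() is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page and pool types for illustration only. */
struct model_page {
	int poison;			/* non-zero once an error was reported */
	struct model_page *next;
};

struct model_pool {
	struct model_page *free_list;
	struct model_page *poison_list;
};

static void model_push(struct model_page **head, struct model_page *page)
{
	page->next = *head;
	*head = page;
}

/* Freeing a poisoned page parks it; it never reaches the free list. */
static void model_free_page(struct model_pool *pool, struct model_page *page)
{
	assert(pool != NULL && page != NULL);

	if (page->poison)
		model_push(&pool->poison_list, page);
	else
		model_push(&pool->free_list, page);
}

/* Allocation only ever looks at the free list. */
static struct model_page *model_alloc_page(struct model_pool *pool)
{
	struct model_page *page = pool->free_list;

	if (page) {
		pool->free_list = page->next;
		page->next = NULL;
	}
	return page;
}
```

Because allocation never consults the poison list, a page that lands
there stays out of circulation.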

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 30 +++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h  |  3 ++-
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 261f81b3f8af..c08df4e35ff0 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*  Copyright(c) 2016-20 Intel Corporation. */
 
+#include <linux/debugfs.h>
 #include <linux/file.h>
 #include <linux/freezer.h>
 #include <linux/highmem.h>
@@ -43,6 +44,7 @@ static nodemask_t sgx_numa_mask;
 static struct sgx_numa_node *sgx_numa_nodes;
 
 static LIST_HEAD(sgx_dirty_page_list);
+static LIST_HEAD(sgx_poison_page_list);
 
 /*
  * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
@@ -62,6 +64,12 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
+		if (page->poison) {
+			list_del(&page->list);
+			list_add(&page->list, &sgx_poison_page_list);
+			continue;
+		}
+
 		ret = __eremove(sgx_get_epc_virt_addr(page));
 		if (!ret) {
 			/*
@@ -626,7 +634,10 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 	spin_lock(&node->lock);
 
 	page->private = NULL;
-	list_add_tail(&page->list, &node->free_page_list);
+	if (page->poison)
+		list_add(&page->list, &sgx_poison_page_list);
+	else
+		list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 
 	spin_unlock(&node->lock);
@@ -657,6 +668,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
 		section->pages[i].flags = 0;
+		section->pages[i].poison = 0;
 		section->pages[i].private = "dirty";
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
@@ -801,8 +813,21 @@ int sgx_set_attribute(unsigned long *allowed_attributes,
 }
 EXPORT_SYMBOL_GPL(sgx_set_attribute);
 
+static int poison_list_show(struct seq_file *m, void *private)
+{
+	struct sgx_epc_page *page;
+
+	list_for_each_entry(page, &sgx_poison_page_list, list)
+		seq_printf(m, "0x%lx\n", sgx_get_epc_phys_addr(page));
+
+	return 0;
+}
+
+DEFINE_SHOW_ATTRIBUTE(poison_list);
+
 static int __init sgx_init(void)
 {
+	struct dentry *dir;
 	int ret;
 	int i;
 
@@ -834,6 +859,9 @@ static int __init sgx_init(void)
 	if (sgx_vepc_init() && ret)
 		goto err_provision;
 
+	dir = debugfs_create_dir("sgx", arch_debugfs_dir);
+	debugfs_create_file("poison_page_list", 0400, dir, NULL, &poison_list_fops);
+
 	return 0;
 
 err_provision:
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 6a55b1971956..77f3d98c9fbf 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -28,7 +28,8 @@
 
 struct sgx_epc_page {
 	unsigned int section;
-	int flags;
+	u16 flags;
+	u16 poison;
 	union {
 		void *private;
 		struct sgx_encl_page *owner;
-- 
2.29.2


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v4 4/6] x86/sgx: Add SGX infrastructure to recover from poison
  2021-08-27 19:55     ` [PATCH v4 0/6] Basic recovery for machine checks inside SGX Tony Luck
                         ` (2 preceding siblings ...)
  2021-08-27 19:55       ` [PATCH v4 3/6] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-08-27 19:55       ` Tony Luck
  2021-08-27 19:55       ` [PATCH v4 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code Tony Luck
                         ` (4 subsequent siblings)
  8 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-08-27 19:55 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, x86, linux-kernel, Tony Luck

Provide a recovery function sgx_memory_failure(). If the poison was
consumed synchronously then send a SIGBUS. Note that the virtual
address of the access is not included with the SIGBUS as is the case
for poison outside of SGX enclaves. This doesn't matter as addresses
of code/data inside an enclave are of little to no use to code executing
outside the (now dead) enclave.

Poison found in a free page results in the page being moved from the
free list to the poison page list.
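
From the application side, the SIGBUS could be observed with a handler
along these lines. This is a minimal user-space sketch, not part of the
series: the model_* names are hypothetical, and the handler ignores the
siginfo contents because, as noted above, no fault address is supplied
for enclave poison.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>	/* for the caller's checks; not used in the handler */
#include <signal.h>
#include <string.h>

/* Set from the handler; sig_atomic_t keeps the store async-signal-safe. */
static volatile sig_atomic_t model_sigbus_seen;

static void model_sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	/* No usable address is expected here, so do not look at info. */
	(void)info;
	(void)ucontext;

	if (sig == SIGBUS)
		model_sigbus_seen = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int model_install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = model_sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);

	return sigaction(SIGBUS, &sa, NULL);
}
```

An application would typically check the flag after an enclave call
fails and treat the enclave as lost, since the hardware will not allow
re-entry.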

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 77 ++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index c08df4e35ff0..d9fe08f68d13 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -682,6 +682,83 @@ bool sgx_is_epc_page(u64 paddr)
 }
 EXPORT_SYMBOL_GPL(sgx_is_epc_page);
 
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+
+	section = xa_load(&epc_page_ranges, paddr);
+	if (!section)
+		return NULL;
+
+	return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+}
+
+/*
+ * Called in process context to handle a hardware reported
+ * error in an SGX EPC page.
+ * If the MF_ACTION_REQUIRED bit is set in flags, then the
+ * context is the task that consumed the poison data. Otherwise
+ * this is called from a kernel thread unrelated to the page.
+ */
+int sgx_memory_failure(unsigned long pfn, int flags)
+{
+	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+	struct sgx_epc_section *section;
+	struct sgx_numa_node *node;
+
+	/*
+	 * mm/memory-failure.c calls this routine for all errors
+	 * where there isn't a "struct page" for the address. But that
+	 * includes other address ranges besides SGX.
+	 */
+	if (!page)
+		return -ENXIO;
+
+	/*
+	 * If poison was consumed synchronously, send a SIGBUS to
+	 * the task. Hardware has already exited the SGX enclave and
+	 * will not allow re-entry to an enclave that has a memory
+	 * error. The signal may help the task understand why the
+	 * enclave is broken.
+	 */
+	if (flags & MF_ACTION_REQUIRED)
+		force_sig(SIGBUS);
+
+	section = &sgx_epc_sections[page->section];
+	node = section->node;
+
+	spin_lock(&node->lock);
+
+	/* Already poisoned? Nothing more to do */
+	if (page->poison)
+		goto out;
+
+	page->poison = 1;
+
+	/*
+	 * If there is no owner, then the page is on a free list.
+	 * Move it to the poison page list.
+	 */
+	if (!page->private) {
+		list_del(&page->list);
+		list_add(&page->list, &sgx_poison_page_list);
+		goto out;
+	}
+
+	/*
+	 * TBD: Add additional plumbing to enable pre-emptive
+	 * action for asynchronous poison notification. Until
+	 * then just hope that the poison:
+	 * a) is not accessed - sgx_free_epc_page() will deal with it
+	 *    when the user gives it back
+	 * b) results in a recoverable machine check rather than
+	 *    a fatal one
+	 */
+out:
+	spin_unlock(&node->lock);
+	return 0;
+}
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.29.2


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v4 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code
  2021-08-27 19:55     ` [PATCH v4 0/6] Basic recovery for machine checks inside SGX Tony Luck
                         ` (3 preceding siblings ...)
  2021-08-27 19:55       ` [PATCH v4 4/6] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
@ 2021-08-27 19:55       ` Tony Luck
  2021-09-03  6:12         ` Jarkko Sakkinen
  2021-08-27 19:55       ` [PATCH v4 6/6] x86/sgx: Add hook to error injection address validation Tony Luck
                         ` (3 subsequent siblings)
  8 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-08-27 19:55 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, x86, linux-kernel, Tony Luck

Add a call inside memory_failure() to check if the address is an SGX
EPC page and handle it.

Note the SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.

Pull the call to acquire the mutex earlier so the SGX errors are also
protected.

Make set_mce_nospec() skip SGX pages when trying to adjust
the 1:1 map.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/set_memory.h |  4 ++++
 include/linux/mm.h                | 15 +++++++++++++++
 mm/memory-failure.c               | 19 +++++++++++++------
 3 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 43fa081a1adb..801af8f30c83 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_SET_MEMORY_H
 #define _ASM_X86_SET_MEMORY_H
 
+#include <linux/mm.h>
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
 
@@ -98,6 +99,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
 	unsigned long decoy_addr;
 	int rc;
 
+	/* SGX pages are not in the 1:1 map */
+	if (sgx_is_epc_page(pfn << PAGE_SHIFT))
+		return 0;
 	/*
 	 * We would like to just call:
 	 *      set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7ca22e6e694a..2ff599bcf8c2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3283,5 +3283,20 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
 	return 0;
 }
 
+#ifdef CONFIG_X86_SGX
+int sgx_memory_failure(unsigned long pfn, int flags);
+bool sgx_is_epc_page(u64 paddr);
+#else
+static inline int sgx_memory_failure(unsigned long pfn, int flags)
+{
+	return -ENXIO;
+}
+
+static inline bool sgx_is_epc_page(u64 paddr)
+{
+	return false;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 470400cc7513..ce04debd18f6 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1632,21 +1632,28 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
 
+	mutex_lock(&mf_mutex);
+
 	p = pfn_to_online_page(pfn);
 	if (!p) {
+		res = sgx_memory_failure(pfn, flags);
+		if (res == 0)
+			goto unlock_mutex;
+
 		if (pfn_valid(pfn)) {
 			pgmap = get_dev_pagemap(pfn, NULL);
-			if (pgmap)
-				return memory_failure_dev_pagemap(pfn, flags,
-								  pgmap);
+			if (pgmap) {
+				res = memory_failure_dev_pagemap(pfn, flags,
+								 pgmap);
+				goto unlock_mutex;
+			}
 		}
 		pr_err("Memory failure: %#lx: memory outside kernel control\n",
 			pfn);
-		return -ENXIO;
+		res = -ENXIO;
+		goto unlock_mutex;
 	}
 
-	mutex_lock(&mf_mutex);
-
 try_again:
 	if (PageHuge(p)) {
 		res = memory_failure_hugetlb(pfn, flags);
-- 
2.29.2


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v4 6/6] x86/sgx: Add hook to error injection address validation
  2021-08-27 19:55     ` [PATCH v4 0/6] Basic recovery for machine checks inside SGX Tony Luck
                         ` (4 preceding siblings ...)
  2021-08-27 19:55       ` [PATCH v4 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code Tony Luck
@ 2021-08-27 19:55       ` Tony Luck
  2021-08-27 20:28       ` [PATCH v4 0/6] Basic recovery for machine checks inside SGX Borislav Petkov
                         ` (2 subsequent siblings)
  8 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-08-27 19:55 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, x86, linux-kernel, Tony Luck

SGX reserved memory does not appear in the standard address maps.

Add hook to call into the SGX code to check if an address is located
in SGX memory.

There are other challenges in injecting errors into SGX. Update the
documentation with a sequence of operations to inject.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 .../firmware-guide/acpi/apei/einj.rst         | 19 +++++++++++++++++++
 drivers/acpi/apei/einj.c                      |  3 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index c042176e1707..55e2331a6438 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,5 +181,24 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+Special notes for injection into SGX enclaves:
+
+There may be a separate BIOS setup option to enable SGX injection.
+
+The injection process consists of setting some special memory controller
+trigger that will inject the error on the next write to the target
+address. But the h/w prevents any software outside of an SGX enclave
+from accessing enclave pages (even BIOS SMM mode).
+
+The following sequence can be used:
+  1) Determine physical address of enclave page
+  2) Use "notrigger=1" mode to inject (this will set up
+     the injection address, but will not actually inject)
+  3) Enter the enclave
+  4) Store data to the virtual address matching physical address from step 1
+  5) Execute CLFLUSH for that virtual address
+  6) Spin delay for 250ms
+  7) Read from the virtual address. This will trigger the error
+
 For more information about EINJ, please refer to ACPI specification
 version 4.0, section 17.5 and ACPI 5.0, section 18.6.
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 2882450c443e..cd7cffc955bf 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -544,7 +544,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	    ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
 				!= REGION_INTERSECTS) &&
 	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
-				!= REGION_INTERSECTS)))
+				!= REGION_INTERSECTS) &&
+	     !sgx_is_epc_page(base_addr)))
 		return -EINVAL;
 
 inject:
-- 
2.29.2


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v4 0/6] Basic recovery for machine checks inside SGX
  2021-08-27 19:55     ` [PATCH v4 0/6] Basic recovery for machine checks inside SGX Tony Luck
                         ` (5 preceding siblings ...)
  2021-08-27 19:55       ` [PATCH v4 6/6] x86/sgx: Add hook to error injection address validation Tony Luck
@ 2021-08-27 20:28       ` Borislav Petkov
  2021-08-27 20:43         ` Sean Christopherson
  2021-09-01  2:06       ` Jarkko Sakkinen
  2021-09-17 21:38       ` [PATCH v5 0/7] " Tony Luck
  8 siblings, 1 reply; 185+ messages in thread
From: Borislav Petkov @ 2021-08-27 20:28 UTC (permalink / raw)
  To: Tony Luck
  Cc: Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	x86, linux-kernel

On Fri, Aug 27, 2021 at 12:55:37PM -0700, Tony Luck wrote:
> Here's version 4 (just 38 more to go if I want to meet the bar set by
> the base SGX series :-) )

You're off by 1:

https://lore.kernel.org/lkml/20201214114200.GD26358@zn.tnic/

you have only just 37 more.

:-P

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v4 0/6] Basic recovery for machine checks inside SGX
  2021-08-27 20:28       ` [PATCH v4 0/6] Basic recovery for machine checks inside SGX Borislav Petkov
@ 2021-08-27 20:43         ` Sean Christopherson
  0 siblings, 0 replies; 185+ messages in thread
From: Sean Christopherson @ 2021-08-27 20:43 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tony Luck, Jarkko Sakkinen, Dave Hansen, Cathy Zhang, x86, linux-kernel

On Fri, Aug 27, 2021, Borislav Petkov wrote:
> On Fri, Aug 27, 2021 at 12:55:37PM -0700, Tony Luck wrote:
> > Here's version 4 (just 38 more to go if I want to meet the bar set by
> > the base SGX series :-) )
> 
> You're off by 1:
> 
> https://lore.kernel.org/lkml/20201214114200.GD26358@zn.tnic/
> 
> you have only just 37 more.
> 
> :-P

LOL, sorry for setting such high standards.

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v4 0/6] Basic recovery for machine checks inside SGX
  2021-08-27 19:55     ` [PATCH v4 0/6] Basic recovery for machine checks inside SGX Tony Luck
                         ` (6 preceding siblings ...)
  2021-08-27 20:28       ` [PATCH v4 0/6] Basic recovery for machine checks inside SGX Borislav Petkov
@ 2021-09-01  2:06       ` Jarkko Sakkinen
  2021-09-01 14:48         ` Luck, Tony
  2021-09-17 21:38       ` [PATCH v5 0/7] " Tony Luck
  8 siblings, 1 reply; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-01  2:06 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Dave Hansen
  Cc: Cathy Zhang, x86, linux-kernel

On Fri, 2021-08-27 at 12:55 -0700, Tony Luck wrote:
> Here's version 4 (just 38 more to go if I want to meet the bar set by
> the base SGX series :-) )
> 
> Changes since v3:
> 
> Dave Hansen:
> 	1) Concerns about assigning a default value to the "owner"
> 	   pointer if the caller of sgx_alloc_epc_page() called with
> 	   a NULL value.
> 	Resolved: Sean provided a patch to fix the only caller that
> 	was using NULL. I merged it in here.
> 
> 	2) Better commit message to explain why sgx_is_epc_page() is
> 	   exported.
> 	Done.
> 
> 	3) Unhappy with "void *owner" in struct sgx_epc_page. Would
> 	   be better to use an anonymous union of all the types.
> 	Done.
> 
> Sean Christopherson:
> 	1) Races updating bits in flags field.
> 	Resolved: "poison" is now a separate field.
> 
> 	2) More races. When poison alert happens while moving
> 	   a page on/off a free/dirty list.
> 	Resolved: Well mostly. All the run time changes are now
> 	done while holding the node->lock. There's a gap while
> 	moving pages from dirty list to free list. But that's
> 	a short-ish window during boot, and the races are mostly
> 	harmless. Worst is that we might call __eremove() for a
> 	page that just got marked as poisoned. But then
> 	sgx_free_epc_page() will see the poison flag and do the
> 	right thing.
> 
> Jarkko Sakkinen:
> 	1) Use xarray to keep track of which pages are the special
> 	   SGX EPC ones.
> 	This spawned a short discussion on whether it was overkill. But
> 	xarray makes the source much simpler, and there are improvements
> 	in the pipeline for xarray that will make it handle this use
> 	case more efficiently. So I made this change.
> 
> 	2) Move the sgx debugfs directory under arch_debugfs_dir.
> 	Done.
> 
> Tony Luck (6):
>   x86/sgx: Provide indication of life-cycle of EPC pages
>   x86/sgx: Add infrastructure to identify SGX EPC pages
>   x86/sgx: Initial poison handling for dirty and free pages
>   x86/sgx: Add SGX infrastructure to recover from poison
>   x86/sgx: Hook sgx_memory_failure() into mainline code
>   x86/sgx: Add hook to error injection address validation
> 
>  .../firmware-guide/acpi/apei/einj.rst         |  19 +++
>  arch/x86/include/asm/set_memory.h             |   4 +
>  arch/x86/kernel/cpu/sgx/encl.c                |   5 +-
>  arch/x86/kernel/cpu/sgx/encl.h                |   2 +-
>  arch/x86/kernel/cpu/sgx/ioctl.c               |   2 +-
>  arch/x86/kernel/cpu/sgx/main.c                | 140 ++++++++++++++++--
>  arch/x86/kernel/cpu/sgx/sgx.h                 |  14 +-
>  drivers/acpi/apei/einj.c                      |   3 +-
>  include/linux/mm.h                            |  15 ++
>  mm/memory-failure.c                           |  19 ++-
>  10 files changed, 196 insertions(+), 27 deletions(-)
> 
> 
> base-commit: e22ce8eb631bdc47a4a4ea7ecf4e4ba499db4f93

Would be nice to get this also to linux-sgx@vger.kernel.org in
future.

/Jarkko

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v4 1/6] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-08-27 19:55       ` [PATCH v4 1/6] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
@ 2021-09-01  3:55         ` Jarkko Sakkinen
  0 siblings, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-01  3:55 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Dave Hansen
  Cc: Cathy Zhang, x86, linux-kernel

On Fri, 2021-08-27 at 12:55 -0700, Tony Luck wrote:
> SGX EPC pages go through the following life cycle:
> 
>         DIRTY ---> FREE ---> IN-USE --\
>                     ^                 |
>                     \-----------------/
> 
> Recovery action for poison for a DIRTY or FREE page is simple. Just
> make sure never to allocate the page. IN-USE pages need some extra
> handling.
> 
> It would be good to use the sgx_epc_page->owner field as an indicator
> of where an EPC page is currently in that cycle (owner != NULL means
> the EPC page is IN-USE). But there is one caller, sgx_alloc_va_page(),
> that calls with NULL.
> 
> Since there are multiple uses of the "owner" field with different types
> change the sgx_epc_page structure to define an anonymous union with
> each of the uses explicitly called out.
> 
> Start epc_pages out with a non-NULL owner while they are in DIRTY state.
> 
> Fix up the one holdout to provide a non-NULL owner.
> 
> Refactor the allocation sequence so that changes to/from NULL
> value happen together with adding/removing the epc_page from
> a free list while the node->lock is held.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/encl.c  |  5 +++--
>  arch/x86/kernel/cpu/sgx/encl.h  |  2 +-
>  arch/x86/kernel/cpu/sgx/ioctl.c |  2 +-
>  arch/x86/kernel/cpu/sgx/main.c  | 23 ++++++++++++-----------
>  arch/x86/kernel/cpu/sgx/sgx.h   | 12 ++++++++----
>  5 files changed, 25 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 001808e3901c..ad8c61933b0a 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -667,6 +667,7 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
>  
>  /**
>   * sgx_alloc_va_page() - Allocate a Version Array (VA) page
> + * @va_page:	struct sgx_va_page connected to this VA page
>   *
>   * Allocate a free EPC page and convert it to a Version Array (VA) page.
>   *
> @@ -674,12 +675,12 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
>   *   a VA page,
>   *   -errno otherwise
>   */
> -struct sgx_epc_page *sgx_alloc_va_page(void)
> +struct sgx_epc_page *sgx_alloc_va_page(struct sgx_va_page *va_page)

Why not just 

struct sgx_epc_page *sgx_alloc_va_page(void *owner)

>  {
>  	struct sgx_epc_page *epc_page;
>  	int ret;
>  
> -	epc_page = sgx_alloc_epc_page(NULL, true);
> +	epc_page = sgx_alloc_epc_page(va_page, true);

epc_page = sgx_alloc_epc_page(owner, true);

>  	if (IS_ERR(epc_page))
>  		return ERR_CAST(epc_page);

This function does not do anything with the internals of struct sgx_va_page.

> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index fec43ca65065..3d12dbeae14a 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -111,7 +111,7 @@ void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
>  int sgx_encl_test_and_clear_young(struct mm_struct *mm,
>  				  struct sgx_encl_page *page);
>  
> -struct sgx_epc_page *sgx_alloc_va_page(void);
> +struct sgx_epc_page *sgx_alloc_va_page(struct sgx_va_page *va_page);
>  unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
>  void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
>  bool sgx_va_page_full(struct sgx_va_page *va_page);
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index 83df20e3e633..655ce0bb069d 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -30,7 +30,7 @@ static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
>  		if (!va_page)
>  			return ERR_PTR(-ENOMEM);
>  
> -		va_page->epc_page = sgx_alloc_va_page();
> +		va_page->epc_page = sgx_alloc_va_page(va_page);
>  		if (IS_ERR(va_page->epc_page)) {
>  			err = ERR_CAST(va_page->epc_page);
>  			kfree(va_page);
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 63d3de02bbcc..4a5b51d16133 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -457,7 +457,7 @@ static bool __init sgx_page_reclaimer_init(void)
>  	return true;
>  }
>  
> -static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
> +static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(void *private, int nid)
>  {
>  	struct sgx_numa_node *node = &sgx_numa_nodes[nid];
>  	struct sgx_epc_page *page = NULL;
> @@ -471,6 +471,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
>  
>  	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
>  	list_del_init(&page->list);
> +	page->private = private;
>  	sgx_nr_free_pages--;
>  
>  	spin_unlock(&node->lock);
> @@ -480,6 +481,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
>  
>  /**
>   * __sgx_alloc_epc_page() - Allocate an EPC page
> + * @owner:	the owner of the EPC page
>   *
>   * Iterate through NUMA nodes and reserve ia free EPC page to the caller. Start
>   * from the NUMA node, where the caller is executing.
> @@ -488,14 +490,14 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
>   * - an EPC page:	A borrowed EPC pages were available.
>   * - NULL:		Out of EPC pages.
>   */
> -struct sgx_epc_page *__sgx_alloc_epc_page(void)
> +struct sgx_epc_page *__sgx_alloc_epc_page(void *private)
>  {
>  	struct sgx_epc_page *page;
>  	int nid_of_current = numa_node_id();
>  	int nid = nid_of_current;
>  
>  	if (node_isset(nid_of_current, sgx_numa_mask)) {
> -		page = __sgx_alloc_epc_page_from_node(nid_of_current);
> +		page = __sgx_alloc_epc_page_from_node(private, nid_of_current);
>  		if (page)
>  			return page;
>  	}
> @@ -506,7 +508,7 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
>  		if (nid == nid_of_current)
>  			break;
>  
> -		page = __sgx_alloc_epc_page_from_node(nid);
> +		page = __sgx_alloc_epc_page_from_node(private, nid);
>  		if (page)
>  			return page;
>  	}
> @@ -559,7 +561,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
>  
>  /**
>   * sgx_alloc_epc_page() - Allocate an EPC page
> - * @owner:	the owner of the EPC page
> + * @private:	per-caller private data
>   * @reclaim:	reclaim pages if necessary
>   *
>   * Iterate through EPC sections and borrow a free EPC page to the caller. When a
> @@ -574,16 +576,14 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
>   *   an EPC page,
>   *   -errno on error
>   */
> -struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
> +struct sgx_epc_page *sgx_alloc_epc_page(void *private, bool reclaim)
>  {
>  	struct sgx_epc_page *page;
>  
>  	for ( ; ; ) {
> -		page = __sgx_alloc_epc_page();
> -		if (!IS_ERR(page)) {
> -			page->owner = owner;
> +		page = __sgx_alloc_epc_page(private);
> +		if (!IS_ERR(page))
>  			break;
> -		}
>  
>  		if (list_empty(&sgx_active_page_list))
>  			return ERR_PTR(-ENOMEM);
> @@ -624,6 +624,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
>  
>  	spin_lock(&node->lock);
>  
> +	page->private = NULL;
>  	list_add_tail(&page->list, &node->free_page_list);
>  	sgx_nr_free_pages++;
>  
> @@ -652,7 +653,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
>  	for (i = 0; i < nr_pages; i++) {
>  		section->pages[i].section = index;
>  		section->pages[i].flags = 0;
> -		section->pages[i].owner = NULL;
> +		section->pages[i].private = "dirty";

#define DIRTY ((void *)-1)

section->pages[i].owner = DIRTY;
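
To illustrate the idea, here is a minimal user-space sketch of how a
sentinel owner value could encode the DIRTY/FREE/IN-USE cycle. It is not
the kernel code: struct epc_page, EPC_DIRTY, enum epc_state and the three
helpers are hypothetical names, and the locking that the real allocator
needs is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel: an address that no real owner object can have. */
#define EPC_DIRTY ((void *)-1)

enum epc_state { EPC_STATE_DIRTY, EPC_STATE_FREE, EPC_STATE_IN_USE };

/* Simplified stand-in for struct sgx_epc_page: only the owner field. */
struct epc_page {
	void *owner;
};

/* Derive the life-cycle state from the owner pointer alone. */
static enum epc_state epc_page_state(const struct epc_page *page)
{
	if (page->owner == EPC_DIRTY)
		return EPC_STATE_DIRTY;
	if (page->owner == NULL)
		return EPC_STATE_FREE;
	return EPC_STATE_IN_USE;
}

/* FREE -> IN-USE: callers must pass a real, non-NULL owner. */
static void epc_page_alloc(struct epc_page *page, void *owner)
{
	assert(owner != NULL && owner != EPC_DIRTY);
	page->owner = owner;
}

/* DIRTY or IN-USE -> FREE. */
static void epc_page_free(struct epc_page *page)
{
	page->owner = NULL;
}
```

With this encoding a poison handler only has to look at the owner field
to tell whether a page needs the extra IN-USE handling.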


>  		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
>  	}
>  
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index 4628acec0009..8b1be10a46f6 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -28,8 +28,12 @@
>  
>  struct sgx_epc_page {
>  	unsigned int section;
> -	unsigned int flags;
> -	struct sgx_encl_page *owner;
> +	int flags;
> +	union {
> +		void *private;
> +		struct sgx_encl_page *owner;
> +		struct sgx_encl_page *vepc;
> +	};


Why not just keep it as void *owner, and cast it as seen
appropriate?

>  	struct list_head list;
>  };
>  
> @@ -77,12 +81,12 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
>  	return section->virt_addr + index * PAGE_SIZE;
>  }
>  
> -struct sgx_epc_page *__sgx_alloc_epc_page(void);
> +struct sgx_epc_page *__sgx_alloc_epc_page(void *private);
>  void sgx_free_epc_page(struct sgx_epc_page *page);
>  
>  void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
>  int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
> -struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
> +struct sgx_epc_page *sgx_alloc_epc_page(void *private, bool reclaim);
>  
>  #ifdef CONFIG_X86_SGX_KVM
>  int __init sgx_vepc_init(void);


/Jarkko


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v4 2/6] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-08-27 19:55       ` [PATCH v4 2/6] x86/sgx: Add infrastructure to identify SGX " Tony Luck
@ 2021-09-01  4:30         ` Jarkko Sakkinen
  0 siblings, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-01  4:30 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Dave Hansen
  Cc: Cathy Zhang, x86, linux-kernel

On Fri, 2021-08-27 at 12:55 -0700, Tony Luck wrote:
> X86 machine check architecture reports a physical address when there
> is a memory error. Handling that error requires a method to determine
> whether the physical address reported is in any of the areas reserved
> for EPC pages by BIOS.
> 
> SGX EPC pages do not have Linux "struct page" associated with them.
> 
> Keep track of the mapping from ranges of EPC pages to the sections
> that contain them using an xarray.
> 
> Create a function sgx_is_epc_page() that simply reports whether an address
> is an EPC page for use elsewhere in the kernel. The ACPI error injection
> code needs this function and is typically built as a module, so export it.
> 
> Note that sgx_is_epc_page() will be slower than other similar "what type
> is this page" functions that can simply check bits in the "struct page".
> If there is some future performance critical user of this function it
> may need to be implemented in a more efficient way.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/main.c | 10 ++++++++++
>  arch/x86/kernel/cpu/sgx/sgx.h  |  1 +
>  2 files changed, 11 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 4a5b51d16133..261f81b3f8af 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -20,6 +20,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
>  static int sgx_nr_epc_sections;
>  static struct task_struct *ksgxd_tsk;
>  static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
> +static DEFINE_XARRAY(epc_page_ranges);

Maybe we could just call this "sgx_epc_address_space"?

/Jarkko


^ permalink raw reply	[flat|nested] 185+ messages in thread

* RE: [PATCH v4 0/6] Basic recovery for machine checks inside SGX
  2021-09-01  2:06       ` Jarkko Sakkinen
@ 2021-09-01 14:48         ` Luck, Tony
  0 siblings, 0 replies; 185+ messages in thread
From: Luck, Tony @ 2021-09-01 14:48 UTC (permalink / raw)
  To: Jarkko Sakkinen, Sean Christopherson, Hansen, Dave
  Cc: Zhang, Cathy, x86, linux-kernel

> Would be nice to get this also to linux-sgx@vger.kernel.org in
> future.

Will add to list for next version.

Thanks

-Tony

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v4 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code
  2021-08-27 19:55       ` [PATCH v4 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code Tony Luck
@ 2021-09-03  6:12         ` Jarkko Sakkinen
  2021-09-03  6:56           ` Jarkko Sakkinen
  0 siblings, 1 reply; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-03  6:12 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Dave Hansen
  Cc: Cathy Zhang, x86, linux-kernel

On Fri, 2021-08-27 at 12:55 -0700, Tony Luck wrote:
> +#ifdef CONFIG_X86_SGX
> +int sgx_memory_failure(unsigned long pfn, int flags);
> +bool sgx_is_epc_page(u64 paddr);
> +#else
> +static inline int sgx_memory_failure(unsigned long pfn, int flags)
> +{
> +	return -ENXIO;
> +}
> +
> +static inline bool sgx_is_epc_page(u64 paddr)
> +{
> +	return false;
> +}
> +#endif

These decl's should be in arch/x86/include/asm/sgx.h, and as part of
patch that contains the implementations.

/Jarkko


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v4 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code
  2021-09-03  6:12         ` Jarkko Sakkinen
@ 2021-09-03  6:56           ` Jarkko Sakkinen
  2021-09-06 18:51             ` Luck, Tony
  0 siblings, 1 reply; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-03  6:56 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Dave Hansen
  Cc: Cathy Zhang, x86, linux-kernel

On Fri, 2021-09-03 at 09:12 +0300, Jarkko Sakkinen wrote:
> On Fri, 2021-08-27 at 12:55 -0700, Tony Luck wrote:
> > +#ifdef CONFIG_X86_SGX
> > +int sgx_memory_failure(unsigned long pfn, int flags);
> > +bool sgx_is_epc_page(u64 paddr);
> > +#else
> > +static inline int sgx_memory_failure(unsigned long pfn, int flags)
> > +{
> > +	return -ENXIO;
> > +}
> > +
> > +static inline bool sgx_is_epc_page(u64 paddr)
> > +{
> > +	return false;
> > +}
> > +#endif
> 
> These decl's should be in arch/x86/include/asm/sgx.h, and as part of
> patch that contains the implementations.

To align with this, I wrote a small patch:

https://lore.kernel.org/linux-sgx/20210903064156.387979-1-jarkko@kernel.org/T/#u

/Jarkko

^ permalink raw reply	[flat|nested] 185+ messages in thread

* RE: [PATCH v4 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code
  2021-09-03  6:56           ` Jarkko Sakkinen
@ 2021-09-06 18:51             ` Luck, Tony
  2021-09-07 14:07               ` Jarkko Sakkinen
  0 siblings, 1 reply; 185+ messages in thread
From: Luck, Tony @ 2021-09-06 18:51 UTC (permalink / raw)
  To: Jarkko Sakkinen, Sean Christopherson, Hansen, Dave
  Cc: Zhang, Cathy, x86, linux-kernel

On Fri, 2021-09-03 at 09:12 +0300, Jarkko Sakkinen wrote:
> On Fri, 2021-08-27 at 12:55 -0700, Tony Luck wrote:
> > +#ifdef CONFIG_X86_SGX
> > +int sgx_memory_failure(unsigned long pfn, int flags);
> > +bool sgx_is_epc_page(u64 paddr);
> > +#else
> > +static inline int sgx_memory_failure(unsigned long pfn, int flags)
> > +{
> > +	return -ENXIO;
> > +}
> > +
> > +static inline bool sgx_is_epc_page(u64 paddr)
> > +{
> > +	return false;
> > +}
> > +#endif
> 
> These decl's should be in arch/x86/include/asm/sgx.h, and as part of
> patch that contains the implementations.

But I need to use these functions in arch independent code.  Specifically in
mm/memory-failure.c and drivers/acpi/apei/einj.c

If I just #include <asm/sgx.h> in those files I'll break the build for other
architectures.

-Tony

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v4 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code
  2021-09-06 18:51             ` Luck, Tony
@ 2021-09-07 14:07               ` Jarkko Sakkinen
  2021-09-07 14:13                 ` Dave Hansen
  2021-09-07 15:03                 ` Luck, Tony
  0 siblings, 2 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-07 14:07 UTC (permalink / raw)
  To: Luck, Tony, Sean Christopherson, Hansen, Dave
  Cc: Zhang, Cathy, x86, linux-kernel

On Mon, 2021-09-06 at 18:51 +0000, Luck, Tony wrote:
> On Fri, 2021-09-03 at 09:12 +0300, Jarkko Sakkinen wrote:
> > On Fri, 2021-08-27 at 12:55 -0700, Tony Luck wrote:
> > > +#ifdef CONFIG_X86_SGX
> > > +int sgx_memory_failure(unsigned long pfn, int flags);
> > > +bool sgx_is_epc_page(u64 paddr);
> > > +#else
> > > +static inline int sgx_memory_failure(unsigned long pfn, int flags)
> > > +{
> > > +	return -ENXIO;
> > > +}
> > > +
> > > +static inline bool sgx_is_epc_page(u64 paddr)
> > > +{
> > > +	return false;
> > > +}
> > > +#endif
> > 
> > These decl's should be in arch/x86/include/asm/sgx.h, and as part of
> > patch that contains the implementations.
> 
> But I need to use these functions in arch independent code.  Specifically in
> mm/memory-failure.c and drivers/acpi/apei/einj.c
> 
> If I just #include <asm/sgx.h> in those files I'll break the build for other
> architectures.

What does specifically break the build?

/Jarkko

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v4 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code
  2021-09-07 14:07               ` Jarkko Sakkinen
@ 2021-09-07 14:13                 ` Dave Hansen
  2021-09-07 15:07                   ` Luck, Tony
  2021-09-07 15:03                 ` Luck, Tony
  1 sibling, 1 reply; 185+ messages in thread
From: Dave Hansen @ 2021-09-07 14:13 UTC (permalink / raw)
  To: Jarkko Sakkinen, Luck, Tony, Sean Christopherson
  Cc: Zhang, Cathy, x86, linux-kernel

On 9/7/21 7:07 AM, Jarkko Sakkinen wrote:
>> If I just #include <asm/sgx.h> in those files I'll break the build for other
>> architectures.
> What does specifically break the build?

Remember, our x86 "<asm/sgx.h>" is:

	arch/x86/include/asm/sgx.h

On powerpc, "<asm/sgx.h>" is:

	arch/powerpc/include/asm/sgx.h

You'll get a file not found error looking for sgx.h.

That said... Tony, it's probably a bit more friendly if the mm.h code
you add:

> +#ifdef CONFIG_X86_SGX
> +int sgx_memory_failure(unsigned long pfn, int flags);
> +bool sgx_is_epc_page(u64 paddr);
> +#else
> +static inline int sgx_memory_failure(unsigned long pfn, int flags)
> +{
> +	return -ENXIO;
> +}
> +
> +static inline bool sgx_is_epc_page(u64 paddr)
> +{
> +	return false;
> +}
> +#endif

was a bit more generic.  Maybe something like:

	int arch_memory_failure(unsigned long pfn, int flags);

BTW, I don't see sgx_is_epc_page() in arch-generic code.  Does it really
need to be in mm.h?

^ permalink raw reply	[flat|nested] 185+ messages in thread

* RE: [PATCH v4 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code
  2021-09-07 14:07               ` Jarkko Sakkinen
  2021-09-07 14:13                 ` Dave Hansen
@ 2021-09-07 15:03                 ` Luck, Tony
  2021-09-07 15:08                   ` Jarkko Sakkinen
  1 sibling, 1 reply; 185+ messages in thread
From: Luck, Tony @ 2021-09-07 15:03 UTC (permalink / raw)
  To: Jarkko Sakkinen, Sean Christopherson, Hansen, Dave
  Cc: Zhang, Cathy, x86, linux-kernel

>> If I just #include <asm/sgx.h> in those files I'll break the build for other
>> architectures.
>
> What does specifically break the build?

There is no file named arch/arm/include/asm/sgx.h (ditto for other architectures that build memory-failure.c and einj.c).

-Tony

^ permalink raw reply	[flat|nested] 185+ messages in thread

* RE: [PATCH v4 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code
  2021-09-07 14:13                 ` Dave Hansen
@ 2021-09-07 15:07                   ` Luck, Tony
  0 siblings, 0 replies; 185+ messages in thread
From: Luck, Tony @ 2021-09-07 15:07 UTC (permalink / raw)
  To: Hansen, Dave, Jarkko Sakkinen, Sean Christopherson
  Cc: Zhang, Cathy, x86, linux-kernel

> BTW, I don't see sgx_is_epc_page() in arch-generic code.  Does it really
> need to be in mm.h?

I use it in drivers/acpi/apei/einj.c

Arm is a big user of ACPI. I don't see any Kconfig exclusions for CONFIG_ACPI_APEI_EINJ.

-Tony

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v4 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code
  2021-09-07 15:03                 ` Luck, Tony
@ 2021-09-07 15:08                   ` Jarkko Sakkinen
  2021-09-07 17:46                     ` Luck, Tony
  0 siblings, 1 reply; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-07 15:08 UTC (permalink / raw)
  To: Luck, Tony, Sean Christopherson, Hansen, Dave
  Cc: Zhang, Cathy, x86, linux-kernel

On Tue, 2021-09-07 at 15:03 +0000, Luck, Tony wrote:
> > > If I just #include <asm/sgx.h> in those files I'll break the build for other
> > > architectures.
> > 
> > What does specifically break the build?
> 
> There is no file named arch/arm/include/asm/sgx.h (ditto for other architectures that build memory-failure.c and einj.c).
> 
> -Tony

Would it be too obnoxious to flag that include in those files?

/Jarkko

^ permalink raw reply	[flat|nested] 185+ messages in thread

* RE: [PATCH v4 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code
  2021-09-07 15:08                   ` Jarkko Sakkinen
@ 2021-09-07 17:46                     ` Luck, Tony
  2021-09-08  0:59                       ` Luck, Tony
  2021-09-08  2:29                       ` Jarkko Sakkinen
  0 siblings, 2 replies; 185+ messages in thread
From: Luck, Tony @ 2021-09-07 17:46 UTC (permalink / raw)
  To: Jarkko Sakkinen, Sean Christopherson, Hansen, Dave
  Cc: Zhang, Cathy, x86, linux-kernel

> Would it be too obnoxious to flag that include in those files?

Jarkko,

You mean:

#ifdef CONFIG_X86_SGX
#include <asm/sgx.h>
#endif

in mm/memory-failure.c?

That wouldn't help. I need the do-nothing stub definition on other architectures.

I'm going to explore Dave's suggestion of changing the names to something less sgx specific.

-Tony

^ permalink raw reply	[flat|nested] 185+ messages in thread

* RE: [PATCH v4 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code
  2021-09-07 17:46                     ` Luck, Tony
@ 2021-09-08  0:59                       ` Luck, Tony
  2021-09-08 16:49                         ` Dave Hansen
  2021-09-08  2:29                       ` Jarkko Sakkinen
  1 sibling, 1 reply; 185+ messages in thread
From: Luck, Tony @ 2021-09-08  0:59 UTC (permalink / raw)
  To: Jarkko Sakkinen, Sean Christopherson, Hansen, Dave
  Cc: Zhang, Cathy, x86, linux-kernel

> I'm going to explore Dave's suggestion of changing the names to something less sgx specific.

So now I have the two functions renamed to

	arch_memory_failure() and arch_is_platform_page()

in arch/x86/kernel/cpu/sgx/main.c

In arch/x86/include/asm/processor.h

+#ifdef CONFIG_X86_SGX
+int arch_memory_failure(unsigned long pfn, int flags);
+#define arch_memory_failure arch_memory_failure
+
+bool arch_is_platform_page(u64 paddr);
+#define arch_is_platform_page arch_is_platform_page
+#endif

and in include/linux/mm.h

+#ifndef arch_memory_failure
+static inline int arch_memory_failure(unsigned long pfn, int flags)
+{
+       return -ENXIO;
+}
+#endif
+#ifndef arch_is_platform_page
+static inline bool arch_is_platform_page(u64 paddr)
+{
+       return false;
+}
+#endif

Dave: Is that what you wanted?  If so I can fold these bits back into the
appropriate bits of the series, address other comments, and post v5.

Sean: If you have stuff that needs attention in v4 please holler soon.

-Tony

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v4 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code
  2021-09-07 17:46                     ` Luck, Tony
  2021-09-08  0:59                       ` Luck, Tony
@ 2021-09-08  2:29                       ` Jarkko Sakkinen
  1 sibling, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-08  2:29 UTC (permalink / raw)
  To: Luck, Tony, Sean Christopherson, Hansen, Dave
  Cc: Zhang, Cathy, x86, linux-kernel

On Tue, 2021-09-07 at 17:46 +0000, Luck, Tony wrote:
> > Would it be too obnoxious to flag that include in those files?
> 
> Jarkko,
> 
> You mean:
> 
> #ifdef CONFIG_X86_SGX
> #include <asm/sgx.h>
> #endif
> 
> > in mm/memory-failure.c?
> 
> That wouldn't help. I need the do-nothing stub definition on other architectures.
> 
> I'm going to explore Dave's suggestion of changing the names to something less sgx specific.
> 
> -Tony


Ah sorry, I get it :-) Yeah, Dave's suggestion makes much more sense.

/Jarkko

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v4 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code
  2021-09-08  0:59                       ` Luck, Tony
@ 2021-09-08 16:49                         ` Dave Hansen
  0 siblings, 0 replies; 185+ messages in thread
From: Dave Hansen @ 2021-09-08 16:49 UTC (permalink / raw)
  To: Luck, Tony, Jarkko Sakkinen, Sean Christopherson
  Cc: Zhang, Cathy, x86, linux-kernel

On 9/7/21 5:59 PM, Luck, Tony wrote:
> +#ifdef CONFIG_X86_SGX
> +int arch_memory_failure(unsigned long pfn, int flags);
> +#define arch_memory_failure arch_memory_failure
> +
> +bool arch_is_platform_page(u64 paddr);
> +#define arch_is_platform_page arch_is_platform_page
> +#endif
> 
> and in include/linux/mm.h
> 
> +#ifndef arch_memory_failure
> +static inline int arch_memory_failure(unsigned long pfn, int flags)
> +{
> +       return -ENXIO;
> +}
> +#endif
> +#ifndef arch_is_platform_page
> +static inline bool arch_is_platform_page(u64 paddr)
> +{
> +       return false;
> +}
> +#endif
> 
> Dave: Is that what you wanted?  If so I can fold these bits back into the
> appropriate bits of the series, address other comments, and post v5.

Looks good to me.

These can *also* be done with a

config ARCH_HAS_SPECIAL_MEMORY_FAILURE
	bool

in mm/Kconfig, and then:

	select ARCH_HAS_SPECIAL_MEMORY_FAILURE

in the SGX Kconfig instead of the ifndef's.  I prefer the configs
personally because they are less ambiguous and can't be screwed up by
missing #includes or weird #include ordering problems.  But, some folks
prefer to avoid polluting the CONFIG_* space.

That's just pure personal preference though.  Either way is fine.

^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v5 0/7] Basic recovery for machine checks inside SGX
  2021-08-27 19:55     ` [PATCH v4 0/6] Basic recovery for machine checks inside SGX Tony Luck
                         ` (7 preceding siblings ...)
  2021-09-01  2:06       ` Jarkko Sakkinen
@ 2021-09-17 21:38       ` Tony Luck
  2021-09-17 21:38         ` [PATCH v5 1/7] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
                           ` (7 more replies)
  8 siblings, 8 replies; 185+ messages in thread
From: Tony Luck @ 2021-09-17 21:38 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, x86, linux-kernel, Tony Luck

Now version 5.

Changes since v4:

Jarkko Sakkinen:
	+ Add linux-sgx@vger.kernel.org to Cc: list
	+ Remove explicit struct sgx_va_page *va_page type
	from argument and use in sgx_alloc_va_page(). Just
	use "void *" as this code doesn't do anything with the
	internals of struct sgx_va_page.
	+ Drop the union of all possible types for the "owner"
	field in struct sgx_epc_page (sorry Dave Hansen, this
	went in last time from your comment, but it doesn't
	seem to add much value). Back to "void *owner;"
	+ rename the xarray that tracks which addresses are
	EPC pages from "epc_page_ranges" to "sgx_epc_address_space".

Dave Hansen:
	+ Use more generic names for the globally visible
	functions that are needed in generic code:
		sgx_memory_failure -> arch_memory_failure
		sgx_is_epc_page -> arch_is_platform_page

Tony Luck:
	+ Found that ghes code spits warnings for memory addresses
	that it thinks are bad. Add a check for SGX pages.

Tony Luck (7):
  x86/sgx: Provide indication of life-cycle of EPC pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook arch_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()

 .../firmware-guide/acpi/apei/einj.rst         |  19 +++
 arch/x86/include/asm/processor.h              |   8 +
 arch/x86/include/asm/set_memory.h             |   4 +
 arch/x86/kernel/cpu/sgx/encl.c                |   5 +-
 arch/x86/kernel/cpu/sgx/encl.h                |   2 +-
 arch/x86/kernel/cpu/sgx/ioctl.c               |   2 +-
 arch/x86/kernel/cpu/sgx/main.c                | 140 ++++++++++++++++--
 arch/x86/kernel/cpu/sgx/sgx.h                 |  14 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 drivers/acpi/apei/ghes.c                      |   2 +-
 include/linux/mm.h                            |  13 ++
 mm/memory-failure.c                           |  19 ++-
 12 files changed, 203 insertions(+), 28 deletions(-)


base-commit: 6880fa6c56601bb8ed59df6c30fd390cc5f6dd8f
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v5 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-09-17 21:38       ` [PATCH v5 0/7] " Tony Luck
@ 2021-09-17 21:38         ` Tony Luck
  2021-09-21 21:28           ` Jarkko Sakkinen
  2021-09-17 21:38         ` [PATCH v5 2/7] x86/sgx: Add infrastructure to identify SGX " Tony Luck
                           ` (6 subsequent siblings)
  7 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-09-17 21:38 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, x86, linux-kernel, Tony Luck

SGX EPC pages go through the following life cycle:

        DIRTY ---> FREE ---> IN-USE --\
                    ^                 |
                    \-----------------/

Recovery action for poison for a DIRTY or FREE page is simple. Just
make sure never to allocate the page. IN-USE pages need some extra
handling.

It would be good to use the sgx_epc_page->owner field as an indicator
of where an EPC page is currently in that cycle (owner != NULL means
the EPC page is IN-USE). But there is one caller, sgx_alloc_va_page(),
that calls with NULL.

Since there are multiple uses of the "owner" field with different types
change the sgx_epc_page structure to define an anonymous union with
each of the uses explicitly called out.

Start epc_pages out with a non-NULL owner while they are in DIRTY state.

Fix up the one holdout to provide a non-NULL owner.

Refactor the allocation sequence so that changes to/from NULL
value happen together with adding/removing the epc_page from
a free list while the node->lock is held.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c  |  5 +++--
 arch/x86/kernel/cpu/sgx/encl.h  |  2 +-
 arch/x86/kernel/cpu/sgx/ioctl.c |  2 +-
 arch/x86/kernel/cpu/sgx/main.c  | 23 ++++++++++++-----------
 arch/x86/kernel/cpu/sgx/sgx.h   | 12 ++++++++----
 5 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 001808e3901c..ad8c61933b0a 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -667,6 +667,7 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 
 /**
  * sgx_alloc_va_page() - Allocate a Version Array (VA) page
+ * @va_page:	struct sgx_va_page connected to this VA page
  *
  * Allocate a free EPC page and convert it to a Version Array (VA) page.
  *
@@ -674,12 +675,12 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
  *   a VA page,
  *   -errno otherwise
  */
-struct sgx_epc_page *sgx_alloc_va_page(void)
+struct sgx_epc_page *sgx_alloc_va_page(struct sgx_va_page *va_page)
 {
 	struct sgx_epc_page *epc_page;
 	int ret;
 
-	epc_page = sgx_alloc_epc_page(NULL, true);
+	epc_page = sgx_alloc_epc_page(va_page, true);
 	if (IS_ERR(epc_page))
 		return ERR_CAST(epc_page);
 
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index fec43ca65065..3d12dbeae14a 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -111,7 +111,7 @@ void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
 
-struct sgx_epc_page *sgx_alloc_va_page(void);
+struct sgx_epc_page *sgx_alloc_va_page(struct sgx_va_page *va_page);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
 void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 83df20e3e633..655ce0bb069d 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -30,7 +30,7 @@ static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
 		if (!va_page)
 			return ERR_PTR(-ENOMEM);
 
-		va_page->epc_page = sgx_alloc_va_page();
+		va_page->epc_page = sgx_alloc_va_page(va_page);
 		if (IS_ERR(va_page->epc_page)) {
 			err = ERR_CAST(va_page->epc_page);
 			kfree(va_page);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de02bbcc..4a5b51d16133 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -457,7 +457,7 @@ static bool __init sgx_page_reclaimer_init(void)
 	return true;
 }
 
-static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
+static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(void *private, int nid)
 {
 	struct sgx_numa_node *node = &sgx_numa_nodes[nid];
 	struct sgx_epc_page *page = NULL;
@@ -471,6 +471,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 
 	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
 	list_del_init(&page->list);
+	page->private = private;
 	sgx_nr_free_pages--;
 
 	spin_unlock(&node->lock);
@@ -480,6 +481,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 
 /**
  * __sgx_alloc_epc_page() - Allocate an EPC page
+ * @private:	per-caller private data
  *
  * Iterate through NUMA nodes and reserve ia free EPC page to the caller. Start
  * from the NUMA node, where the caller is executing.
@@ -488,14 +490,14 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
  * - an EPC page:	A borrowed EPC pages were available.
  * - NULL:		Out of EPC pages.
  */
-struct sgx_epc_page *__sgx_alloc_epc_page(void)
+struct sgx_epc_page *__sgx_alloc_epc_page(void *private)
 {
 	struct sgx_epc_page *page;
 	int nid_of_current = numa_node_id();
 	int nid = nid_of_current;
 
 	if (node_isset(nid_of_current, sgx_numa_mask)) {
-		page = __sgx_alloc_epc_page_from_node(nid_of_current);
+		page = __sgx_alloc_epc_page_from_node(private, nid_of_current);
 		if (page)
 			return page;
 	}
@@ -506,7 +508,7 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
 		if (nid == nid_of_current)
 			break;
 
-		page = __sgx_alloc_epc_page_from_node(nid);
+		page = __sgx_alloc_epc_page_from_node(private, nid);
 		if (page)
 			return page;
 	}
@@ -559,7 +561,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
 
 /**
  * sgx_alloc_epc_page() - Allocate an EPC page
- * @owner:	the owner of the EPC page
+ * @private:	per-caller private data
  * @reclaim:	reclaim pages if necessary
  *
  * Iterate through EPC sections and borrow a free EPC page to the caller. When a
@@ -574,16 +576,14 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
  *   an EPC page,
  *   -errno on error
  */
-struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
+struct sgx_epc_page *sgx_alloc_epc_page(void *private, bool reclaim)
 {
 	struct sgx_epc_page *page;
 
 	for ( ; ; ) {
-		page = __sgx_alloc_epc_page();
-		if (!IS_ERR(page)) {
-			page->owner = owner;
+		page = __sgx_alloc_epc_page(private);
+		if (!IS_ERR(page))
 			break;
-		}
 
 		if (list_empty(&sgx_active_page_list))
 			return ERR_PTR(-ENOMEM);
@@ -624,6 +624,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	spin_lock(&node->lock);
 
+	page->private = NULL;
 	list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 
@@ -652,7 +653,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
 		section->pages[i].flags = 0;
-		section->pages[i].owner = NULL;
+		section->pages[i].private = "dirty";
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..8b1be10a46f6 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -28,8 +28,12 @@
 
 struct sgx_epc_page {
 	unsigned int section;
-	unsigned int flags;
-	struct sgx_encl_page *owner;
+	int flags;
+	union {
+		void *private;
+		struct sgx_encl_page *owner;
+		struct sgx_encl_page *vepc;
+	};
 	struct list_head list;
 };
 
@@ -77,12 +81,12 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
 	return section->virt_addr + index * PAGE_SIZE;
 }
 
-struct sgx_epc_page *__sgx_alloc_epc_page(void);
+struct sgx_epc_page *__sgx_alloc_epc_page(void *private);
 void sgx_free_epc_page(struct sgx_epc_page *page);
 
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
-struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
+struct sgx_epc_page *sgx_alloc_epc_page(void *private, bool reclaim);
 
 #ifdef CONFIG_X86_SGX_KVM
 int __init sgx_vepc_init(void);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v5 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-09-17 21:38       ` [PATCH v5 0/7] " Tony Luck
  2021-09-17 21:38         ` [PATCH v5 1/7] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
@ 2021-09-17 21:38         ` Tony Luck
  2021-09-21 20:23           ` Dave Hansen
  2021-09-17 21:38         ` [PATCH v5 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
                           ` (5 subsequent siblings)
  7 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-09-17 21:38 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, x86, linux-kernel, Tony Luck

X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.

SGX EPC pages do not have Linux "struct page" associated with them.

Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray.

Create a function arch_is_platform_page() that simply reports whether an address
is an EPC page for use elsewhere in the kernel. The ACPI error injection
code needs this function and is typically built as a module, so export it.

Note that arch_is_platform_page() will be slower than other similar "what type
is this page" functions that can simply check bits in the "struct page".
If there is some future performance critical user of this function it
may need to be implemented in a more efficient way.
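
For readers who want to experiment with the lookup outside the kernel, the same "is this address inside any reserved range" question can be answered with a sorted table and a binary search. This is a sketch under assumptions, not what the patch does: the patch uses an xarray, while `struct epc_range`, `epc_range_find()` and `epc_is_platform_page()` below are hypothetical user-space stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one EPC section: an inclusive address range. */
struct epc_range {
	uint64_t start;
	uint64_t end;	/* inclusive, like end_phys_addr in the patch */
};

/*
 * Binary search over ranges that are sorted by start address and do not
 * overlap. Returns the matching range, or NULL if paddr is in none of them.
 */
static const struct epc_range *epc_range_find(const struct epc_range *ranges,
					      size_t nr, uint64_t paddr)
{
	size_t lo = 0, hi = nr;

	assert(ranges != NULL || nr == 0);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (paddr < ranges[mid].start)
			hi = mid;
		else if (paddr > ranges[mid].end)
			lo = mid + 1;
		else
			return &ranges[mid];
	}
	return NULL;
}

/* Model of the "is this an EPC address?" predicate. */
static bool epc_is_platform_page(const struct epc_range *ranges, size_t nr,
				 uint64_t paddr)
{
	return epc_range_find(ranges, nr, paddr) != NULL;
}
```

With only a handful of sections the table stays tiny, but unlike the xarray it has to be kept sorted by whoever fills it in.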

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 10 ++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h  |  1 +
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 4a5b51d16133..10892513212d 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -20,6 +20,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
 static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
+static DEFINE_XARRAY(epc_page_ranges);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
@@ -649,6 +650,9 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	section->end_phys_addr = phys_addr + size - 1;
+	xa_store_range(&epc_page_ranges, section->phys_addr,
+		       section->end_phys_addr, section, GFP_KERNEL);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -660,6 +664,12 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+bool arch_is_platform_page(u64 paddr)
+{
+	return !!xa_load(&epc_page_ranges, paddr);
+}
+EXPORT_SYMBOL_GPL(arch_is_platform_page);
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 8b1be10a46f6..6a55b1971956 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -54,6 +54,7 @@ struct sgx_numa_node {
  */
 struct sgx_epc_section {
 	unsigned long phys_addr;
+	unsigned long end_phys_addr;
 	void *virt_addr;
 	struct sgx_epc_page *pages;
 	struct sgx_numa_node *node;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v5 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-09-17 21:38       ` [PATCH v5 0/7] " Tony Luck
  2021-09-17 21:38         ` [PATCH v5 1/7] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
  2021-09-17 21:38         ` [PATCH v5 2/7] x86/sgx: Add infrastructure to identify SGX " Tony Luck
@ 2021-09-17 21:38         ` Tony Luck
  2021-09-17 21:38         ` [PATCH v5 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
                           ` (4 subsequent siblings)
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-09-17 21:38 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, x86, linux-kernel, Tony Luck

A memory controller patrol scrubber can report poison in a page
that isn't currently being used.

Add a "poison" field to struct sgx_epc_page to mark such pages. Check
for it:
1) When sanitizing dirty pages
2) When freeing EPC pages

In both cases place the poisoned page on a list of poisoned EPC pages
to make sure it will not be reallocated.

Poison is a new field, separate from flags, to avoid having to make
all updates to flags atomic or to integrate poison state changes into
some other locking scheme that protects flags.

Add a debugfs file /sys/kernel/debug/x86/sgx/poison_page_list so that
system administrators get a list of those pages that have been dropped
because of poison.
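
The routing decision on release can be sketched in user-space C as follows. All names here (`struct page_model`, `struct page_list`, `page_list_push()`, `release_page()`) are hypothetical and exist only for this illustration. The sketch uses a simple singly linked list pushed at the head, so it does not model the kernel's list ordering or locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a page with a poison mark and a list hook. */
struct page_model {
	unsigned short poison;
	struct page_model *next;
};

struct page_list {
	struct page_model *head;
	size_t count;
};

static void page_list_push(struct page_list *list, struct page_model *page)
{
	page->next = list->head;
	list->head = page;
	list->count++;
}

/*
 * On release, a poisoned page is parked on the poison list, where the
 * allocator never looks. Any other page becomes reusable.
 */
static void release_page(struct page_model *page, struct page_list *free_list,
			 struct page_list *poison_list)
{
	assert(page != NULL);
	page_list_push(page->poison ? poison_list : free_list, page);
}
```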

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 30 +++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h  |  3 ++-
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 10892513212d..7a53ff876059 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*  Copyright(c) 2016-20 Intel Corporation. */
 
+#include <linux/debugfs.h>
 #include <linux/file.h>
 #include <linux/freezer.h>
 #include <linux/highmem.h>
@@ -43,6 +44,7 @@ static nodemask_t sgx_numa_mask;
 static struct sgx_numa_node *sgx_numa_nodes;
 
 static LIST_HEAD(sgx_dirty_page_list);
+static LIST_HEAD(sgx_poison_page_list);
 
 /*
  * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
@@ -62,6 +64,12 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
+		if (page->poison) {
+			list_del(&page->list);
+			list_add(&page->list, &sgx_poison_page_list);
+			continue;
+		}
+
 		ret = __eremove(sgx_get_epc_virt_addr(page));
 		if (!ret) {
 			/*
@@ -626,7 +634,10 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 	spin_lock(&node->lock);
 
 	page->private = NULL;
-	list_add_tail(&page->list, &node->free_page_list);
+	if (page->poison)
+		list_add(&page->list, &sgx_poison_page_list);
+	else
+		list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 
 	spin_unlock(&node->lock);
@@ -657,6 +668,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
 		section->pages[i].flags = 0;
+		section->pages[i].poison = 0;
 		section->pages[i].private = "dirty";
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
@@ -801,8 +813,21 @@ int sgx_set_attribute(unsigned long *allowed_attributes,
 }
 EXPORT_SYMBOL_GPL(sgx_set_attribute);
 
+static int poison_list_show(struct seq_file *m, void *private)
+{
+	struct sgx_epc_page *page;
+
+	list_for_each_entry(page, &sgx_poison_page_list, list)
+		seq_printf(m, "0x%lx\n", sgx_get_epc_phys_addr(page));
+
+	return 0;
+}
+
+DEFINE_SHOW_ATTRIBUTE(poison_list);
+
 static int __init sgx_init(void)
 {
+	struct dentry *dir;
 	int ret;
 	int i;
 
@@ -834,6 +859,9 @@ static int __init sgx_init(void)
 	if (sgx_vepc_init() && ret)
 		goto err_provision;
 
+	dir = debugfs_create_dir("sgx", arch_debugfs_dir);
+	debugfs_create_file("poison_page_list", 0400, dir, NULL, &poison_list_fops);
+
 	return 0;
 
 err_provision:
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 6a55b1971956..77f3d98c9fbf 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -28,7 +28,8 @@
 
 struct sgx_epc_page {
 	unsigned int section;
-	int flags;
+	u16 flags;
+	u16 poison;
 	union {
 		void *private;
 		struct sgx_encl_page *owner;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v5 4/7] x86/sgx: Add SGX infrastructure to recover from poison
  2021-09-17 21:38       ` [PATCH v5 0/7] " Tony Luck
                           ` (2 preceding siblings ...)
  2021-09-17 21:38         ` [PATCH v5 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-09-17 21:38         ` Tony Luck
  2021-09-17 21:38         ` [PATCH v5 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
                           ` (3 subsequent siblings)
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-09-17 21:38 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, x86, linux-kernel, Tony Luck

Provide a recovery function arch_memory_failure(). If the poison was
consumed synchronously then send a SIGBUS. Note that, unlike poison
outside of SGX enclaves, the SIGBUS does not include the virtual
address of the access. This doesn't matter as addresses of code/data
inside an enclave are of little to no use to code executing outside
the (now dead) enclave.

Poison found in a free page results in the page being moved from the
free list to the poison page list.
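
Recovery starts by turning the reported physical address into a per-page descriptor, which boils down to one subtraction and one shift once the containing section is known. A minimal sketch of that arithmetic, assuming 4 KiB pages; `MODEL_PAGE_SHIFT` and `epc_page_index()` are hypothetical names used only here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* 4 KiB pages, as on x86 */

/* Index of the page containing paddr within a section starting at base. */
static size_t epc_page_index(uint64_t base, uint64_t paddr)
{
	assert(paddr >= base);
	return (size_t)((paddr - base) >> MODEL_PAGE_SHIFT);
}
```

Any address within a page yields the same index, so the byte offset reported by the hardware does not need to be masked off first.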

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 77 ++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 7a53ff876059..8f23c8489cec 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -682,6 +682,83 @@ bool arch_is_platform_page(u64 paddr)
 }
 EXPORT_SYMBOL_GPL(arch_is_platform_page);
 
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+
+	section = xa_load(&epc_page_ranges, paddr);
+	if (!section)
+		return NULL;
+
+	return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+}
+
+/*
+ * Called in process context to handle a hardware reported
+ * error in an SGX EPC page.
+ * If the MF_ACTION_REQUIRED bit is set in flags, then the
+ * context is the task that consumed the poison data. Otherwise
+ * this is called from a kernel thread unrelated to the page.
+ */
+int arch_memory_failure(unsigned long pfn, int flags)
+{
+	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+	struct sgx_epc_section *section;
+	struct sgx_numa_node *node;
+
+	/*
+	 * mm/memory-failure.c calls this routine for all errors
+	 * where there isn't a "struct page" for the address. But that
+	 * includes other address ranges besides SGX.
+	 */
+	if (!page)
+		return -ENXIO;
+
+	/*
+	 * If poison was consumed synchronously, send a SIGBUS to
+	 * the task. Hardware has already exited the SGX enclave and
+	 * will not allow re-entry to an enclave that has a memory
+	 * error. The signal may help the task understand why the
+	 * enclave is broken.
+	 */
+	if (flags & MF_ACTION_REQUIRED)
+		force_sig(SIGBUS);
+
+	section = &sgx_epc_sections[page->section];
+	node = section->node;
+
+	spin_lock(&node->lock);
+
+	/* Already poisoned? Nothing more to do */
+	if (page->poison)
+		goto out;
+
+	page->poison = 1;
+
+	/*
+	 * If there is no owner, then the page is on a free list.
+	 * Move it to the poison page list.
+	 */
+	if (!page->private) {
+		list_del(&page->list);
+		list_add(&page->list, &sgx_poison_page_list);
+		goto out;
+	}
+
+	/*
+	 * TBD: Add additional plumbing to enable pre-emptive
+	 * action for asynchronous poison notification. Until
+	 * then just hope that the poison:
+	 * a) is not accessed - sgx_free_epc_page() will deal with it
+	 *    when the user gives it back
+	 * b) results in a recoverable machine check rather than
+	 *    a fatal one
+	 */
+out:
+	spin_unlock(&node->lock);
+	return 0;
+}
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v5 5/7] x86/sgx: Hook arch_memory_failure() into mainline code
  2021-09-17 21:38       ` [PATCH v5 0/7] " Tony Luck
                           ` (3 preceding siblings ...)
  2021-09-17 21:38         ` [PATCH v5 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
@ 2021-09-17 21:38         ` Tony Luck
  2021-09-17 21:38         ` [PATCH v5 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
                           ` (2 subsequent siblings)
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-09-17 21:38 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, x86, linux-kernel, Tony Luck

Add a call inside memory_failure() to check if the address is an SGX
EPC page and handle it.

Note the SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.

Pull the call to acquire the mutex earlier so the SGX errors are also
protected.

Make set_mce_nospec() skip SGX pages when trying to adjust
the 1:1 map.
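
The hook relies on the usual "define the name to itself" idiom so that generic code gets a stub unless an architecture supplies its own version. The following stand-alone sketch shows the idiom in one file. The function bodies and `handle_pfn_without_struct_page()` are hypothetical illustrations; only the name arch_memory_failure and the -ENXIO fallback come from this patch.

```c
#include <assert.h>
#include <errno.h>

/* --- what an architecture header would provide (hypothetical body) --- */
static inline int arch_memory_failure(unsigned long pfn, int flags)
{
	(void)pfn;
	(void)flags;
	return 0;	/* pretend the architecture handled the error */
}
#define arch_memory_failure arch_memory_failure

/* --- generic fallback, compiled only if no architecture version exists --- */
#ifndef arch_memory_failure
static inline int arch_memory_failure(unsigned long pfn, int flags)
{
	(void)pfn;
	(void)flags;
	return -ENXIO;
}
#endif

/* Generic caller: identical source whichever definition was selected. */
static int handle_pfn_without_struct_page(unsigned long pfn, int flags)
{
	int res = arch_memory_failure(pfn, flags);

	assert(res <= 0);	/* contract in this sketch: 0 or negative errno */
	if (res == 0)
		return 0;	/* handled by the architecture hook */
	return -ENXIO;		/* memory outside kernel control */
}
```

Removing the first definition and its #define makes the same caller compile against the stub, which is what builds without CONFIG_X86_SGX get.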

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/processor.h  |  8 ++++++++
 arch/x86/include/asm/set_memory.h |  4 ++++
 include/linux/mm.h                | 13 +++++++++++++
 mm/memory-failure.c               | 19 +++++++++++++------
 4 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 9ad2acaaae9b..4865f2860a4f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -853,4 +853,12 @@ enum mds_mitigations {
 	MDS_MITIGATION_VMWERV,
 };
 
+#ifdef CONFIG_X86_SGX
+int arch_memory_failure(unsigned long pfn, int flags);
+#define arch_memory_failure arch_memory_failure
+
+bool arch_is_platform_page(u64 paddr);
+#define arch_is_platform_page arch_is_platform_page
+#endif
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 43fa081a1adb..ce8dd215f5b3 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_SET_MEMORY_H
 #define _ASM_X86_SET_MEMORY_H
 
+#include <linux/mm.h>
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
 
@@ -98,6 +99,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
 	unsigned long decoy_addr;
 	int rc;
 
+	/* SGX pages are not in the 1:1 map */
+	if (arch_is_platform_page(pfn << PAGE_SHIFT))
+		return 0;
 	/*
 	 * We would like to just call:
 	 *      set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 73a52aba448f..3cc63682fe47 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3284,5 +3284,18 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
 	return 0;
 }
 
+#ifndef arch_memory_failure
+static inline int arch_memory_failure(unsigned long pfn, int flags)
+{
+	return -ENXIO;
+}
+#endif
+#ifndef arch_is_platform_page
+static inline bool arch_is_platform_page(u64 paddr)
+{
+	return false;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 54879c339024..5693bac9509c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1632,21 +1632,28 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
 
+	mutex_lock(&mf_mutex);
+
 	p = pfn_to_online_page(pfn);
 	if (!p) {
+		res = arch_memory_failure(pfn, flags);
+		if (res == 0)
+			goto unlock_mutex;
+
 		if (pfn_valid(pfn)) {
 			pgmap = get_dev_pagemap(pfn, NULL);
-			if (pgmap)
-				return memory_failure_dev_pagemap(pfn, flags,
-								  pgmap);
+			if (pgmap) {
+				res = memory_failure_dev_pagemap(pfn, flags,
+								 pgmap);
+				goto unlock_mutex;
+			}
 		}
 		pr_err("Memory failure: %#lx: memory outside kernel control\n",
 			pfn);
-		return -ENXIO;
+		res = -ENXIO;
+		goto unlock_mutex;
 	}
 
-	mutex_lock(&mf_mutex);
-
 try_again:
 	if (PageHuge(p)) {
 		res = memory_failure_hugetlb(pfn, flags);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v5 6/7] x86/sgx: Add hook to error injection address validation
  2021-09-17 21:38       ` [PATCH v5 0/7] " Tony Luck
                           ` (4 preceding siblings ...)
  2021-09-17 21:38         ` [PATCH v5 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
@ 2021-09-17 21:38         ` Tony Luck
  2021-09-17 21:38         ` [PATCH v5 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
  2021-09-22 18:21         ` [PATCH v6 0/7] Basic recovery for machine checks inside SGX Tony Luck
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-09-17 21:38 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, x86, linux-kernel, Tony Luck

SGX reserved memory does not appear in the standard address maps.

Add hook to call into the SGX code to check if an address is located
in SGX memory.

There are other challenges in injecting errors into SGX. Update the
documentation with a sequence of operations to inject.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 .../firmware-guide/acpi/apei/einj.rst         | 19 +++++++++++++++++++
 drivers/acpi/apei/einj.c                      |  3 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index c042176e1707..55e2331a6438 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,5 +181,24 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+Special notes for injection into SGX enclaves:
+
+There may be a separate BIOS setup option to enable SGX injection.
+
+The injection process consists of setting some special memory controller
+trigger that will inject the error on the next write to the target
+address. But the h/w prevents any software outside of an SGX enclave
+from accessing enclave pages (even BIOS SMM mode).
+
+The following sequence can be used:
+  1) Determine physical address of enclave page
+  2) Use "notrigger=1" mode to inject (this will set up
+     the injection address, but will not actually inject)
+  3) Enter the enclave
+  4) Store data to the virtual address matching physical address from step 1
+  5) Execute CLFLUSH for that virtual address
+  6) Spin delay for 250ms
+  7) Read from the virtual address. This will trigger the error
+
 For more information about EINJ, please refer to ACPI specification
 version 4.0, section 17.5 and ACPI 5.0, section 18.6.
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 2882450c443e..67c335baad52 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -544,7 +544,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	    ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
 				!= REGION_INTERSECTS) &&
 	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
-				!= REGION_INTERSECTS)))
+				!= REGION_INTERSECTS) &&
+	     !arch_is_platform_page(base_addr)))
 		return -EINVAL;
 
 inject:
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v5 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  2021-09-17 21:38       ` [PATCH v5 0/7] " Tony Luck
                           ` (5 preceding siblings ...)
  2021-09-17 21:38         ` [PATCH v5 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
@ 2021-09-17 21:38         ` Tony Luck
  2021-09-22 18:21         ` [PATCH v6 0/7] Basic recovery for machine checks inside SGX Tony Luck
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-09-17 21:38 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, x86, linux-kernel, Tony Luck

SGX EPC pages do not have a "struct page" associated with them so the
pfn_valid() sanity check fails and results in a warning message to
the console.

Add an additional check to skip the warning if the address of the error
is in an SGX EPC page.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 drivers/acpi/apei/ghes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0c8330ed1ffd..0c5c9acc6254 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -449,7 +449,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 		return false;
 
 	pfn = PHYS_PFN(physical_addr);
-	if (!pfn_valid(pfn)) {
+	if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX
 		"Invalid address in generic error data: %#llx\n",
 		physical_addr);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v5 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-09-17 21:38         ` [PATCH v5 2/7] x86/sgx: Add infrastructure to identify SGX " Tony Luck
@ 2021-09-21 20:23           ` Dave Hansen
  2021-09-21 20:50             ` Luck, Tony
  0 siblings, 1 reply; 185+ messages in thread
From: Dave Hansen @ 2021-09-21 20:23 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Jarkko Sakkinen
  Cc: Cathy Zhang, linux-sgx, x86, linux-kernel

On 9/17/21 2:38 PM, Tony Luck wrote:
>  /*
>   * These variables are part of the state of the reclaimer, and must be accessed
> @@ -649,6 +650,9 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
>  	}
>  
>  	section->phys_addr = phys_addr;
> +	section->end_phys_addr = phys_addr + size - 1;
> +	xa_store_range(&epc_page_ranges, section->phys_addr,
> +		       section->end_phys_addr, section, GFP_KERNEL);

Did we ever figure out how much space storing really big ranges in the
xarray consumes?

^ permalink raw reply	[flat|nested] 185+ messages in thread

* RE: [PATCH v5 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-09-21 20:23           ` Dave Hansen
@ 2021-09-21 20:50             ` Luck, Tony
  2021-09-21 22:32               ` Dave Hansen
  0 siblings, 1 reply; 185+ messages in thread
From: Luck, Tony @ 2021-09-21 20:50 UTC (permalink / raw)
  To: Hansen, Dave, Sean Christopherson, Jarkko Sakkinen, Matthew Wilcox
  Cc: Zhang, Cathy, linux-sgx, x86, linux-kernel

>>  	section->phys_addr = phys_addr;
>> +	section->end_phys_addr = phys_addr + size - 1;
>> +	xa_store_range(&epc_page_ranges, section->phys_addr,
>> +		       section->end_phys_addr, section, GFP_KERNEL);
>
> Did we ever figure out how much space storing really big ranges in the
> xarray consumes?

No. Willy said the existing xarray code would be less than optimal with
this usage, but that things would be much better when he applied some
maple tree updates to the internals of xarray.

If there is some easy way to measure the memory backing an xarray I'm
happy to get the data. Or if someone else can synthesize it ... the two
ranges on my system that are added to the xarray are:

$ dmesg | grep -i sgx
[    8.496844] sgx: EPC section 0x8000c00000-0x807f7fffff
[    8.505118] sgx: EPC section 0x10000c00000-0x1007fffffff

I.e. two ranges of a bit under 2GB each.
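
For anyone who wants to check the arithmetic, the sizes follow directly from the inclusive end addresses in the dmesg output. A small sketch, assuming 4 KiB pages; `range_bytes()` and `range_pages_4k()` are hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of an inclusive [start, end] physical address range. */
static uint64_t range_bytes(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return end - start + 1;
}

/* Number of 4 KiB pages in an inclusive range. */
static uint64_t range_pages_4k(uint64_t start, uint64_t end)
{
	return range_bytes(start, end) >> 12;
}
```

Applied to the two sections above this gives 0x7ec00000 bytes (2028 MiB, 519168 pages) and 0x7f400000 bytes (2036 MiB, 521216 pages).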

But I don't think the overhead can be too hideous:

$ grep MemFree /proc/meminfo
MemFree:        1048682016 kB

I still have ~ 1TB free. Which is much greater than the 640 KB which should
be "enough for anybody" :-).

-Tony

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v5 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-09-17 21:38         ` [PATCH v5 1/7] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
@ 2021-09-21 21:28           ` Jarkko Sakkinen
  2021-09-21 21:34             ` Luck, Tony
  2021-09-21 22:15             ` Dave Hansen
  0 siblings, 2 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-21 21:28 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, x86, linux-kernel

On Fri, 2021-09-17 at 14:38 -0700, Tony Luck wrote:
> SGX EPC pages go through the following life cycle:
> 
>         DIRTY ---> FREE ---> IN-USE --\
>                     ^                 |
>                     \-----------------/
> 
> Recovery action for poison for a DIRTY or FREE page is simple. Just
> make sure never to allocate the page. IN-USE pages need some extra
> handling.
> 
> It would be good to use the sgx_epc_page->owner field as an indicator
> of where an EPC page is currently in that cycle (owner != NULL means
> the EPC page is IN-USE). But there is one caller, sgx_alloc_va_page(),
> that calls with NULL.
> 
> Since there are multiple uses of the "owner" field with different types
> change the sgx_epc_page structure to define an anonymous union with
> each of the uses explicitly called out.

But it's still always a pointer.

And not only that, but two alternative fields in that union have *exactly* the
same type, so it's kind of artificially representing the problem as more complex
than it really is.

I'm just not getting why all this complexity, and not a few casts instead?

Nor do I get the rename of "owner" to "private". It adds very little value.
I'm not saying that "owner" is the best name ever, but it's not *that* confusing
either. I'm sure that renaming it is not very productive.

Also there was still this "dirty". We could use ((void *)-1), which was also
suggested for earlier revisions.
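
To make the ((void *)-1) idea concrete, here is a minimal userspace sketch,
assuming the owner field alone encodes the life-cycle state. The enum, the
EPC_OWNER_DIRTY macro and the helper names are hypothetical and only for
illustration; they are not in the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state names; the kernel code has no such enum. */
enum epc_state { EPC_DIRTY, EPC_FREE, EPC_IN_USE };

/* Sentinel owner value marking a page that has not been sanitized yet. */
#define EPC_OWNER_DIRTY ((void *)-1)

struct epc_page {
	void *owner;
};

/* Derive the life-cycle state purely from the owner pointer. */
static enum epc_state epc_page_state(const struct epc_page *page)
{
	if (page->owner == EPC_OWNER_DIRTY)
		return EPC_DIRTY;
	if (page->owner == NULL)
		return EPC_FREE;
	return EPC_IN_USE;
}

/* FREE -> IN-USE: every caller must supply a real, non-NULL owner. */
static void epc_page_alloc(struct epc_page *page, void *owner)
{
	assert(owner != NULL && owner != EPC_OWNER_DIRTY);
	page->owner = owner;
}

/* DIRTY -> FREE and IN-USE -> FREE both clear the owner. */
static void epc_page_free(struct epc_page *page)
{
	page->owner = NULL;
}
```

The point of the sentinel is that no extra field is needed: NULL can then
mean "free" without ambiguity, provided no allocation passes a NULL owner.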

/Jarkko


^ permalink raw reply	[flat|nested] 185+ messages in thread

* RE: [PATCH v5 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-09-21 21:28           ` Jarkko Sakkinen
@ 2021-09-21 21:34             ` Luck, Tony
  2021-09-22  5:17               ` Jarkko Sakkinen
  2021-09-21 22:15             ` Dave Hansen
  1 sibling, 1 reply; 185+ messages in thread
From: Luck, Tony @ 2021-09-21 21:34 UTC (permalink / raw)
  To: Jarkko Sakkinen, Sean Christopherson, Hansen, Dave
  Cc: Zhang, Cathy, linux-sgx, x86, linux-kernel

>> Since there are multiple uses of the "owner" field with different types
>> change the sgx_epc_page structure to define an anonymous union with
>> each of the uses explicitly called out.
>
> But it's still always a pointer.
>
> And not only that, but two alternative fields in that union have *exactly* the
> same type, so it's kind of artificially representing the problem as more complex
> than it really is.

Bother! I seem to have jumbled some old bits of v4 into this series.

I agree that we just want "void *owner;" here.  I even made the changes.
Then managed to lose them while updating.

I'll find the bits I lost and re-merge them in.

-Tony

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v5 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-09-21 21:28           ` Jarkko Sakkinen
  2021-09-21 21:34             ` Luck, Tony
@ 2021-09-21 22:15             ` Dave Hansen
  2021-09-22  5:27               ` Jarkko Sakkinen
  1 sibling, 1 reply; 185+ messages in thread
From: Dave Hansen @ 2021-09-21 22:15 UTC (permalink / raw)
  To: Jarkko Sakkinen, Tony Luck, Sean Christopherson
  Cc: Cathy Zhang, linux-sgx, x86, linux-kernel

On 9/21/21 2:28 PM, Jarkko Sakkinen wrote:
>> Since there are multiple uses of the "owner" field with different types
>> change the sgx_epc_page structure to define an anonymous union with
>> each of the uses explicitly called out.
> But it's still always a pointer.
> 
> And not only that, but two alternative fields in that union have *exactly* the
> same type, so it's kind of artificially representing the problem as more complex
> than it really is.
> 
> I'm just not getting why all this complexity, and not a few casts instead?

I suggested this.  It makes the structure more self-describing because
it explicitly lists the possible uses of the space in the structure.

Maybe I stare at 'struct page' and its 4 unions too much and I'm
enamored by their shininess.  But, in the end, I prefer unions to casting.

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v5 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-09-21 20:50             ` Luck, Tony
@ 2021-09-21 22:32               ` Dave Hansen
  2021-09-21 23:48                 ` Luck, Tony
  0 siblings, 1 reply; 185+ messages in thread
From: Dave Hansen @ 2021-09-21 22:32 UTC (permalink / raw)
  To: Luck, Tony, Sean Christopherson, Jarkko Sakkinen, Matthew Wilcox
  Cc: Zhang, Cathy, linux-sgx, x86, linux-kernel

On 9/21/21 1:50 PM, Luck, Tony wrote:
>> Did we ever figure out how much space storing really big ranges in the
>> xarray consumes?
> No. Willy said the existing xarray code would be less than optimal with
> this usage, but that things would be much better when he applied some
> maple tree updates to the internals of xarray.
> 
> If there is some easy way to measure the memory backing an xarray I'm
> happy to get the data. Or if someone else can synthesize it ... the two
> ranges on my system that are added to the xarray are:
> 
> $ dmesg | grep -i sgx
> [    8.496844] sgx: EPC section 0x8000c00000-0x807f7fffff
> [    8.505118] sgx: EPC section 0x10000c00000-0x1007fffffff
> 
> I.e. two ranges of a bit under 2GB each.
> 
> But I don't think the overhead can be too hideous:
> 
> $ grep MemFree /proc/meminfo
> MemFree:        1048682016 kB
> 
> I still have ~1TB free, which is much greater than the 640 KB which should
> be "enough for anybody" :-).

There is a kmem_cache_create() for the xarray nodes.  So, you should be
able to see the difference in /proc/meminfo's "Slab" field.  Maybe boot
with init=/bin/sh to reduce the noise and look at meminfo both with and
without your SGX patch applied, or just with the xarray bits commented out.

I don't quite know how the data structures are munged, but xas_alloc()
makes it look like 'struct xa_node' is allocated from
radix_tree_node_cachep.  If that's the case, you should also be able to
see this in even more detail in:

# grep radix /proc/slabinfo
radix_tree_node   432305 482412    584   28    4 : tunables    0    0    0 : slabdata  17229  17229      0

again, on a system with and without your new code enabled.

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v5 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-09-21 22:32               ` Dave Hansen
@ 2021-09-21 23:48                 ` Luck, Tony
  2021-09-21 23:50                   ` Dave Hansen
  0 siblings, 1 reply; 185+ messages in thread
From: Luck, Tony @ 2021-09-21 23:48 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Sean Christopherson, Jarkko Sakkinen, Matthew Wilcox, Zhang,
	Cathy, linux-sgx, x86, linux-kernel

On Tue, Sep 21, 2021 at 03:32:14PM -0700, Dave Hansen wrote:
> On 9/21/21 1:50 PM, Luck, Tony wrote:
> >> Did we ever figure out how much space storing really big ranges in the
> >> xarray consumes?
> > No. Willy said the existing xarray code would be less than optimal with
> > this usage, but that things would be much better when he applied some
> > maple tree updates to the internals of xarray.
> > 
> > If there is some easy way to measure the memory backing an xarray I'm
> > happy to get the data. Or if someone else can synthesize it ... the two
> > ranges on my system that are added to the xarray are:
> > 
> > $ dmesg | grep -i sgx
> > [    8.496844] sgx: EPC section 0x8000c00000-0x807f7fffff
> > [    8.505118] sgx: EPC section 0x10000c00000-0x1007fffffff
> > 
> > I.e. two ranges of a bit under 2GB each.
> > 
> > But I don't think the overhead can be too hideous:
> > 
> > $ grep MemFree /proc/meminfo
> > MemFree:        1048682016 kB
> > 
> > I still have ~1TB free, which is much greater than the 640 KB which should
> > be "enough for anybody" :-).
> 
> There is a kmem_cache_create() for the xarray nodes.  So, you should be
> able to see the difference in /proc/meminfo's "Slab" field.  Maybe boot
> with init=/bin/sh to reduce the noise and look at meminfo both with and
> without your SGX patch applied, or just with the xarray bits commented out.
> 
> I don't quite know how the data structures are munged, but xas_alloc()
> makes it look like 'struct xa_node' is allocated from
> radix_tree_node_cachep.  If that's the case, you should also be able to
> see this in even more detail in:
> 
> # grep radix /proc/slabinfo
> radix_tree_node   432305 482412    584   28    4 : tunables    0    0    0 : slabdata  17229  17229      0
> 
> again, on a system with and without your new code enabled.


Booting with init=/bin/sh and running that grep command right away at
the prompt:

With the xa_store_range() call commented out of my kernel:

radix_tree_node     9800   9968    584   56    8 : tunables    0    0    0 : slabdata    178    178      0


With xa_store_range() enabled:

radix_tree_node     9950  10136    584   56    8 : tunables    0    0    0 : slabdata    181    181      0



The head of the file says these are the field names:

# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>

So I think this means that I have (9950 - 9800) * 584 = 87600 more bytes
allocated. Maybe that's a lot? But percentage-wise it seems in the
noise. E.g. We allocate one "struct sgx_epc_page" for each SGX page.
On my system I have 4GB of SGX EPC, so around 32 MB of these structures.
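
The arithmetic above can be reproduced with a small userspace C sketch. The
helper names are hypothetical, and the 32 bytes per struct sgx_epc_page is an
assumption based on a 64-bit build (two 4-byte integers, an 8-byte pointer
and a 16-byte list_head):

```c
#include <assert.h>
#include <stdint.h>

/* Extra bytes used by a slab cache, from two <active_objs> readings. */
static uint64_t slab_delta_bytes(uint64_t objs_before, uint64_t objs_after,
				 uint64_t objsize)
{
	assert(objs_after >= objs_before);
	return (objs_after - objs_before) * objsize;
}

/* Bytes of per-page metadata needed to describe epc_bytes of EPC. */
static uint64_t epc_metadata_bytes(uint64_t epc_bytes, uint64_t page_size,
				   uint64_t bytes_per_page_struct)
{
	assert(page_size != 0);
	return epc_bytes / page_size * bytes_per_page_struct;
}
```

With the numbers from this mail, slab_delta_bytes(9800, 9950, 584) is 87600,
and epc_metadata_bytes() for 4GB of EPC with 4KB pages is 32 MB, so the
xarray nodes cost well under 1% of what the per-page structures already use.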

-Tony

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v5 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-09-21 23:48                 ` Luck, Tony
@ 2021-09-21 23:50                   ` Dave Hansen
  0 siblings, 0 replies; 185+ messages in thread
From: Dave Hansen @ 2021-09-21 23:50 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Sean Christopherson, Jarkko Sakkinen, Matthew Wilcox, Zhang,
	Cathy, linux-sgx, x86, linux-kernel

On 9/21/21 4:48 PM, Luck, Tony wrote:
> 
> # name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> 
> So I think this means that I have (9950 - 9800) * 584 = 87600 more bytes
> allocated. Maybe that's a lot? But percentage-wise it seems in the
> noise. E.g. We allocate one "struct sgx_epc_page" for each SGX page.
> On my system I have 4GB of SGX EPC, so around 32 MB of these structures.

100k for 4GB of EPC is certainly in the noise as far as I'm concerned.

Thanks for checking this.

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v5 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-09-21 21:34             ` Luck, Tony
@ 2021-09-22  5:17               ` Jarkko Sakkinen
  0 siblings, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-22  5:17 UTC (permalink / raw)
  To: Luck, Tony, Sean Christopherson, Hansen, Dave
  Cc: Zhang, Cathy, linux-sgx, x86, linux-kernel

On Tue, 2021-09-21 at 21:34 +0000, Luck, Tony wrote:
> > > Since there are multiple uses of the "owner" field with different types
> > > change the sgx_epc_page structure to define an anonymous union with
> > > each of the uses explicitly called out.
> > 
> > But it's still always a pointer.
> > 
> > And not only that, but two alternative fields in that union have *exactly* the
> > same type, so it's kind of artificially representing the problem as more complex
> > than it really is.
> 
> Bother! I seem to have jumbled some old bits of v4 into this series.
> 
> I agree that we just want "void *owner;" here.  I even made the changes.
> Then managed to lose them while updating.
> 
> I'll find the bits I lost and re-merge them in.
> 
> -Tony

Yeah, ok, cool, thank you. Just reporting what I was observing :-)

/Jarkko


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v5 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-09-21 22:15             ` Dave Hansen
@ 2021-09-22  5:27               ` Jarkko Sakkinen
  0 siblings, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-22  5:27 UTC (permalink / raw)
  To: Dave Hansen, Tony Luck, Sean Christopherson
  Cc: Cathy Zhang, linux-sgx, x86, linux-kernel

On Tue, 2021-09-21 at 15:15 -0700, Dave Hansen wrote:
> On 9/21/21 2:28 PM, Jarkko Sakkinen wrote:
> > > Since there are multiple uses of the "owner" field with different types
> > > change the sgx_epc_page structure to define an anonymous union with
> > > each of the uses explicitly called out.
> > But it's still always a pointer.
> > 
> > And not only that, but two alternative fields in that union have *exactly* the
> > same type, so it's kind of artificially representing the problem as more complex
> > than it really is.
> > 
> > I'm just not getting why all this complexity, and not a few casts instead?
> 
> I suggested this.  It makes the structure more self-describing because
> it explicitly lists the possible uses of the space in the structure.
> 
> Maybe I stare at 'struct page' and its 4 unions too much and I'm
> enamored by their shininess.  But, in the end, I prefer unions to casting.

Yeah, packing data into a constrained space (as in the case of struct page) is
the only application where picking a union can be called a quantitative
decision.

/Jarkko


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v6 0/7] Basic recovery for machine checks inside SGX
  2021-09-17 21:38       ` [PATCH v5 0/7] " Tony Luck
                           ` (6 preceding siblings ...)
  2021-09-17 21:38         ` [PATCH v5 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
@ 2021-09-22 18:21         ` Tony Luck
  2021-09-22 18:21           ` [PATCH v6 1/7] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
                             ` (7 more replies)
  7 siblings, 8 replies; 185+ messages in thread
From: Tony Luck @ 2021-09-22 18:21 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

Now version 6 (what I actually meant to post as v5).

Note that I've dropped linux-kernel@vger.kernel.org and x86@kernel.org
from the distribution. Time to get some internal agreement on these
changes before bothering the x86 maintainers with yet another version.

So I'm looking for Acked-by: or Reviewed-by: on any bits of this
series that are worthy, and comments on the problems I need to fix
in the not-worthy parts.

Changes since v4 (I'm going to ignore the bogus v5 I posted):

Jarkko Sakkinen:
	+ Add linux-sgx@vger.kernel.org to Cc: list
	+ Remove explicit struct sgx_va_page *va_page type
	from argument and use in sgx_alloc_va_page(). Just
	use "void *" as this code doesn't do anything with the
	internals of struct sgx_va_page.
	+ Drop the union of all possible types for the "owner"
	field in struct sgx_epc_page (sorry Dave Hansen, this
	went in last time from your comment, but it doesn't
	seem to add much value). Back to "void *owner;"
	+ rename the xarray that tracks which addresses are
	EPC pages from "epc_page_ranges" to "sgx_epc_address_space".

Dave Hansen:
	+ Use more generic names for the globally visible
	functions that are needed in generic code:
		sgx_memory_failure -> arch_memory_failure
		sgx_is_epc_page -> arch_is_platform_page
	+ Commit comment on space used by xarray to track EPC pages.

Tony Luck:
	+ Found that ghes code spits warnings for memory addresses
	that it thinks are bad. Add a check for SGX pages.

Tony Luck (7):
  x86/sgx: Provide indication of life-cycle of EPC pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook arch_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()

 .../firmware-guide/acpi/apei/einj.rst         |  19 +++
 arch/x86/include/asm/processor.h              |   8 +
 arch/x86/include/asm/set_memory.h             |   4 +
 arch/x86/kernel/cpu/sgx/encl.c                |   5 +-
 arch/x86/kernel/cpu/sgx/encl.h                |   2 +-
 arch/x86/kernel/cpu/sgx/ioctl.c               |   2 +-
 arch/x86/kernel/cpu/sgx/main.c                | 137 ++++++++++++++++--
 arch/x86/kernel/cpu/sgx/sgx.h                 |   7 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 drivers/acpi/apei/ghes.c                      |   2 +-
 include/linux/mm.h                            |  14 ++
 mm/memory-failure.c                           |  19 ++-
 12 files changed, 196 insertions(+), 26 deletions(-)


base-commit: e4e737bb5c170df6135a127739a9e6148ee3da82
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v6 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-09-22 18:21         ` [PATCH v6 0/7] Basic recovery for machine checks inside SGX Tony Luck
@ 2021-09-22 18:21           ` Tony Luck
  2021-09-23 20:21             ` Jarkko Sakkinen
  2021-09-22 18:21           ` [PATCH v6 2/7] x86/sgx: Add infrastructure to identify SGX " Tony Luck
                             ` (6 subsequent siblings)
  7 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-09-22 18:21 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

SGX EPC pages go through the following life cycle:

        DIRTY ---> FREE ---> IN-USE --\
                    ^                 |
                    \-----------------/

Recovery action for poison for a DIRTY or FREE page is simple. Just
make sure never to allocate the page. IN-USE pages need some extra
handling.

It would be good to use the sgx_epc_page->owner field as an indicator
of where an EPC page is currently in that cycle (owner != NULL means
the EPC page is IN-USE). But there is one caller, sgx_alloc_va_page(),
that calls with NULL.

Since there are multiple uses of the "owner" field with different types
change the type of sgx_epc_page.owner to "void *".

Start epc_pages out with a non-NULL owner while they are in DIRTY state.

Fix up the one holdout to provide a non-NULL owner.

Refactor the allocation sequence so that changes to/from NULL
value happen together with adding/removing the epc_page from
a free list while the node->lock is held.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c  |  5 +++--
 arch/x86/kernel/cpu/sgx/encl.h  |  2 +-
 arch/x86/kernel/cpu/sgx/ioctl.c |  2 +-
 arch/x86/kernel/cpu/sgx/main.c  | 21 +++++++++++----------
 arch/x86/kernel/cpu/sgx/sgx.h   |  4 ++--
 5 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 001808e3901c..62cf20d5fbf6 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -667,6 +667,7 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 
 /**
  * sgx_alloc_va_page() - Allocate a Version Array (VA) page
+ * @owner:	struct sgx_va_page connected to this VA page
  *
  * Allocate a free EPC page and convert it to a Version Array (VA) page.
  *
@@ -674,12 +675,12 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
  *   a VA page,
  *   -errno otherwise
  */
-struct sgx_epc_page *sgx_alloc_va_page(void)
+struct sgx_epc_page *sgx_alloc_va_page(void *owner)
 {
 	struct sgx_epc_page *epc_page;
 	int ret;
 
-	epc_page = sgx_alloc_epc_page(NULL, true);
+	epc_page = sgx_alloc_epc_page(owner, true);
 	if (IS_ERR(epc_page))
 		return ERR_CAST(epc_page);
 
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index fec43ca65065..2a972bc9b2d1 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -111,7 +111,7 @@ void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
 
-struct sgx_epc_page *sgx_alloc_va_page(void);
+struct sgx_epc_page *sgx_alloc_va_page(void *owner);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
 void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 83df20e3e633..655ce0bb069d 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -30,7 +30,7 @@ static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
 		if (!va_page)
 			return ERR_PTR(-ENOMEM);
 
-		va_page->epc_page = sgx_alloc_va_page();
+		va_page->epc_page = sgx_alloc_va_page(va_page);
 		if (IS_ERR(va_page->epc_page)) {
 			err = ERR_CAST(va_page->epc_page);
 			kfree(va_page);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de02bbcc..69743709ec90 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -457,7 +457,7 @@ static bool __init sgx_page_reclaimer_init(void)
 	return true;
 }
 
-static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
+static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(void *owner, int nid)
 {
 	struct sgx_numa_node *node = &sgx_numa_nodes[nid];
 	struct sgx_epc_page *page = NULL;
@@ -471,6 +471,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 
 	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
 	list_del_init(&page->list);
+	page->owner = owner;
 	sgx_nr_free_pages--;
 
 	spin_unlock(&node->lock);
@@ -480,6 +481,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 
 /**
  * __sgx_alloc_epc_page() - Allocate an EPC page
+ * @owner:	the owner of the EPC page
  *
  * Iterate through NUMA nodes and reserve ia free EPC page to the caller. Start
  * from the NUMA node, where the caller is executing.
@@ -488,14 +490,14 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
  * - an EPC page:	A borrowed EPC pages were available.
  * - NULL:		Out of EPC pages.
  */
-struct sgx_epc_page *__sgx_alloc_epc_page(void)
+struct sgx_epc_page *__sgx_alloc_epc_page(void *owner)
 {
 	struct sgx_epc_page *page;
 	int nid_of_current = numa_node_id();
 	int nid = nid_of_current;
 
 	if (node_isset(nid_of_current, sgx_numa_mask)) {
-		page = __sgx_alloc_epc_page_from_node(nid_of_current);
+		page = __sgx_alloc_epc_page_from_node(owner, nid_of_current);
 		if (page)
 			return page;
 	}
@@ -506,7 +508,7 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
 		if (nid == nid_of_current)
 			break;
 
-		page = __sgx_alloc_epc_page_from_node(nid);
+		page = __sgx_alloc_epc_page_from_node(owner, nid);
 		if (page)
 			return page;
 	}
@@ -559,7 +561,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
 
 /**
  * sgx_alloc_epc_page() - Allocate an EPC page
- * @owner:	the owner of the EPC page
+ * @owner:	per-caller page owner
  * @reclaim:	reclaim pages if necessary
  *
  * Iterate through EPC sections and borrow a free EPC page to the caller. When a
@@ -579,11 +581,9 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 	struct sgx_epc_page *page;
 
 	for ( ; ; ) {
-		page = __sgx_alloc_epc_page();
-		if (!IS_ERR(page)) {
-			page->owner = owner;
+		page = __sgx_alloc_epc_page(owner);
+		if (!IS_ERR(page))
 			break;
-		}
 
 		if (list_empty(&sgx_active_page_list))
 			return ERR_PTR(-ENOMEM);
@@ -624,6 +624,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	spin_lock(&node->lock);
 
+	page->owner = NULL;
 	list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 
@@ -652,7 +653,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
 		section->pages[i].flags = 0;
-		section->pages[i].owner = NULL;
+		section->pages[i].owner = (void *)-1;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..cc624778645f 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -29,7 +29,7 @@
 struct sgx_epc_page {
 	unsigned int section;
 	unsigned int flags;
-	struct sgx_encl_page *owner;
+	void *owner;
 	struct list_head list;
 };
 
@@ -77,7 +77,7 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
 	return section->virt_addr + index * PAGE_SIZE;
 }
 
-struct sgx_epc_page *__sgx_alloc_epc_page(void);
+struct sgx_epc_page *__sgx_alloc_epc_page(void *owner);
 void sgx_free_epc_page(struct sgx_epc_page *page);
 
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v6 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-09-22 18:21         ` [PATCH v6 0/7] Basic recovery for machine checks inside SGX Tony Luck
  2021-09-22 18:21           ` [PATCH v6 1/7] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
@ 2021-09-22 18:21           ` Tony Luck
  2021-09-22 18:21           ` [PATCH v6 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
                             ` (5 subsequent siblings)
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-09-22 18:21 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.

SGX EPC pages do not have Linux "struct page" associated with them.

Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray.

Create a function arch_is_platform_page() that simply reports whether an address
is an EPC page for use elsewhere in the kernel. The ACPI error injection
code needs this function and is typically built as a module, so export it.

Note that arch_is_platform_page() will be slower than other similar "what type
is this page" functions that can simply check bits in the "struct page".
If there is some future performance critical user of this function it
may need to be implemented in a more efficient way.

Note also that the current implementation of xarray allocates a few
hundred kilobytes for this usage on a system with 4GB of SGX EPC memory
configured. This isn't ideal, but worth it for the code simplicity.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 69743709ec90..72a173b3affa 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -20,6 +20,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
 static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
+static DEFINE_XARRAY(sgx_epc_address_space);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
@@ -649,6 +650,8 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	xa_store_range(&sgx_epc_address_space, section->phys_addr,
+		       phys_addr + size - 1, section, GFP_KERNEL);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -660,6 +663,12 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+bool arch_is_platform_page(u64 paddr)
+{
+	return !!xa_load(&sgx_epc_address_space, paddr);
+}
+EXPORT_SYMBOL_GPL(arch_is_platform_page);
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1
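
For readers unfamiliar with xarray, the lookup that arch_is_platform_page()
relies on in the patch above amounts to asking "which section, if any, covers
this physical address?". Below is a minimal userspace sketch of the same
question, swapping the xarray for a linear scan over a plain array. struct
epc_range and both helpers are hypothetical names, not kernel API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct epc_range {
	uint64_t start;	/* first byte of the section */
	uint64_t end;	/* last byte of the section (inclusive) */
};

/* Return the range covering paddr, or NULL if it is not in any range. */
static const struct epc_range *find_epc_range(const struct epc_range *ranges,
					      size_t n, uint64_t paddr)
{
	for (size_t i = 0; i < n; i++) {
		assert(ranges[i].start <= ranges[i].end);
		if (paddr >= ranges[i].start && paddr <= ranges[i].end)
			return &ranges[i];
	}
	return NULL;
}

/* Boolean form, analogous in spirit to "!!xa_load(...)" in the patch. */
static int is_epc_address(const struct epc_range *ranges, size_t n,
			  uint64_t paddr)
{
	return find_epc_range(ranges, n, paddr) != NULL;
}
```

A linear scan is fine for a handful of sections; the patch picks xarray for
code simplicity, not because this sketch's approach would be too slow.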


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v6 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-09-22 18:21         ` [PATCH v6 0/7] Basic recovery for machine checks inside SGX Tony Luck
  2021-09-22 18:21           ` [PATCH v6 1/7] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
  2021-09-22 18:21           ` [PATCH v6 2/7] x86/sgx: Add infrastructure to identify SGX " Tony Luck
@ 2021-09-22 18:21           ` Tony Luck
  2021-09-22 18:21           ` [PATCH v6 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
                             ` (4 subsequent siblings)
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-09-22 18:21 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

A memory controller patrol scrubber can report poison in a page
that isn't currently being used.

Add "poison" field in the sgx_epc_page that can be set for an
sgx_epc_page. Check for it:
1) When sanitizing dirty pages
2) When freeing epc pages

Poison is a new field separated from flags to avoid having to make
all updates to flags atomic, or integrate poison state changes into
some other locking scheme to protect flags.

In both cases place the poisoned page on a list of poisoned epc pages
to make sure it will not be reallocated.

Add a debugfs file /sys/kernel/debug/sgx/poison_page_list so that system
administrators can get a list of the pages that have been dropped because
of poison.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 30 +++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h  |  3 ++-
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 72a173b3affa..91be079788ee 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*  Copyright(c) 2016-20 Intel Corporation. */
 
+#include <linux/debugfs.h>
 #include <linux/file.h>
 #include <linux/freezer.h>
 #include <linux/highmem.h>
@@ -43,6 +44,7 @@ static nodemask_t sgx_numa_mask;
 static struct sgx_numa_node *sgx_numa_nodes;
 
 static LIST_HEAD(sgx_dirty_page_list);
+static LIST_HEAD(sgx_poison_page_list);
 
 /*
  * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
@@ -62,6 +64,12 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
+		if (page->poison) {
+			list_del(&page->list);
+			list_add(&page->list, &sgx_poison_page_list);
+			continue;
+		}
+
 		ret = __eremove(sgx_get_epc_virt_addr(page));
 		if (!ret) {
 			/*
@@ -626,7 +634,10 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 	spin_lock(&node->lock);
 
 	page->owner = NULL;
-	list_add_tail(&page->list, &node->free_page_list);
+	if (page->poison)
+		list_add(&page->list, &sgx_poison_page_list);
+	else
+		list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 
 	spin_unlock(&node->lock);
@@ -656,6 +667,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
 		section->pages[i].flags = 0;
+		section->pages[i].poison = 0;
 		section->pages[i].owner = (void *)-1;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
@@ -800,8 +812,21 @@ int sgx_set_attribute(unsigned long *allowed_attributes,
 }
 EXPORT_SYMBOL_GPL(sgx_set_attribute);
 
+static int poison_list_show(struct seq_file *m, void *private)
+{
+	struct sgx_epc_page *page;
+
+	list_for_each_entry(page, &sgx_poison_page_list, list)
+		seq_printf(m, "0x%lx\n", sgx_get_epc_phys_addr(page));
+
+	return 0;
+}
+
+DEFINE_SHOW_ATTRIBUTE(poison_list);
+
 static int __init sgx_init(void)
 {
+	struct dentry *dir;
 	int ret;
 	int i;
 
@@ -833,6 +858,9 @@ static int __init sgx_init(void)
 	if (sgx_vepc_init() && ret)
 		goto err_provision;
 
+	dir = debugfs_create_dir("sgx", arch_debugfs_dir);
+	debugfs_create_file("poison_page_list", 0400, dir, NULL, &poison_list_fops);
+
 	return 0;
 
 err_provision:
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index cc624778645f..9ba87bc3da61 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -28,7 +28,8 @@
 
 struct sgx_epc_page {
 	unsigned int section;
-	unsigned int flags;
+	u16 flags;
+	u16 poison;
 	void *owner;
 	struct list_head list;
 };
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v6 4/7] x86/sgx: Add SGX infrastructure to recover from poison
  2021-09-22 18:21         ` [PATCH v6 0/7] Basic recovery for machine checks inside SGX Tony Luck
                             ` (2 preceding siblings ...)
  2021-09-22 18:21           ` [PATCH v6 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-09-22 18:21           ` Tony Luck
  2021-09-22 18:21           ` [PATCH v6 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
                             ` (3 subsequent siblings)
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-09-22 18:21 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

Provide a recovery function, arch_memory_failure(). If the poison was
consumed synchronously then send a SIGBUS. Note that, unlike poison
consumed outside of SGX enclaves, the SIGBUS does not include the
virtual address of the access. This doesn't matter, as addresses of
code/data inside an enclave are of little to no use to code executing
outside the (now dead) enclave.

Poison found in a free page results in the page being moved from the
free list to the poison page list.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 77 ++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 91be079788ee..63d6b6d019d0 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -681,6 +681,83 @@ bool arch_is_platform_page(u64 paddr)
 }
 EXPORT_SYMBOL_GPL(arch_is_platform_page);
 
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+
+	section = xa_load(&sgx_epc_address_space, paddr);
+	if (!section)
+		return NULL;
+
+	return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+}
+
+/*
+ * Called in process context to handle a hardware reported
+ * error in an SGX EPC page.
+ * If the MF_ACTION_REQUIRED bit is set in flags, then the
+ * context is the task that consumed the poison data. Otherwise
+ * this is called from a kernel thread unrelated to the page.
+ */
+int arch_memory_failure(unsigned long pfn, int flags)
+{
+	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+	struct sgx_epc_section *section;
+	struct sgx_numa_node *node;
+
+	/*
+	 * mm/memory-failure.c calls this routine for all errors
+	 * where there isn't a "struct page" for the address. But that
+	 * includes other address ranges besides SGX.
+	 */
+	if (!page)
+		return -ENXIO;
+
+	/*
+	 * If poison was consumed synchronously, send a SIGBUS to
+	 * the task. Hardware has already exited the SGX enclave and
+	 * will not allow re-entry to an enclave that has a memory
+	 * error. The signal may help the task understand why the
+	 * enclave is broken.
+	 */
+	if (flags & MF_ACTION_REQUIRED)
+		force_sig(SIGBUS);
+
+	section = &sgx_epc_sections[page->section];
+	node = section->node;
+
+	spin_lock(&node->lock);
+
+	/* Already poisoned? Nothing more to do */
+	if (page->poison)
+		goto out;
+
+	page->poison = 1;
+
+	/*
+	 * If there is no owner, then the page is on a free list.
+	 * Move it to the poison page list.
+	 */
+	if (!page->owner) {
+		list_del(&page->list);
+		list_add(&page->list, &sgx_poison_page_list);
+		goto out;
+	}
+
+	/*
+	 * TBD: Add additional plumbing to enable pre-emptive
+	 * action for asynchronous poison notification. Until
+	 * then just hope that the poison:
+	 * a) is not accessed - sgx_free_epc_page() will deal with it
+	 *    when the user gives it back
+	 * b) results in a recoverable machine check rather than
+	 *    a fatal one
+	 */
+out:
+	spin_unlock(&node->lock);
+	return 0;
+}
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1
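
A minimal user-space sketch of the list bookkeeping in this patch and in
3/7, for readers who want to see the state transitions in isolation. It
is not the kernel code: struct model_page, the list helpers and all
model_* functions are hypothetical names made up for illustration,
locking (node->lock) is omitted, and SIGBUS delivery is left out. It
assumes only what the diffs show: a free page has a NULL owner, a
poisoned free page is moved to the poison list at once, and a poisoned
in-use page is diverted to the poison list when it is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Tiny stand-in for the kernel's struct list_head (hypothetical). */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

/* Append at the tail; ordering does not matter for this model. */
static void list_append(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink the node and leave it pointing at itself. */
static void list_remove(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

static size_t list_count(const struct list_node *head)
{
	size_t count = 0;
	const struct list_node *n;

	for (n = head->next; n != head; n = n->next)
		count++;
	return count;
}

struct model_page {
	struct list_node list;	/* must stay first, see model_alloc_page() */
	void *owner;		/* NULL while the page is on the free list */
	bool poison;
};

static struct list_node free_list, poison_list;

static void model_init(void)
{
	list_init(&free_list);
	list_init(&poison_list);
}

/* Models sgx_free_epc_page(): poisoned pages never rejoin the free list. */
static void model_free_page(struct model_page *page)
{
	page->owner = NULL;
	list_append(&page->list, page->poison ? &poison_list : &free_list);
}

/* Models allocation: hand out the first free page, or NULL if none. */
static struct model_page *model_alloc_page(void *owner)
{
	struct model_page *page;

	if (free_list.next == &free_list)
		return NULL;
	/* "list" is the first member, so the node address is the page address */
	page = (struct model_page *)free_list.next;
	list_remove(&page->list);
	page->owner = owner;
	return page;
}

/* Models the list handling in arch_memory_failure(). */
static void model_memory_failure(struct model_page *page)
{
	if (page->poison)
		return;			/* already poisoned, nothing to do */
	page->poison = true;
	if (!page->owner) {		/* free page: quarantine it now */
		list_remove(&page->list);
		list_append(&page->list, &poison_list);
	}
	/* in-use page: model_free_page() diverts it later */
}
```

The point of the model is that a poisoned page can reach the poison list
by two routes (directly from the free list, or via the free path), and
that in both cases it can never be allocated again.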


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v6 5/7] x86/sgx: Hook arch_memory_failure() into mainline code
  2021-09-22 18:21         ` [PATCH v6 0/7] Basic recovery for machine checks inside SGX Tony Luck
                             ` (3 preceding siblings ...)
  2021-09-22 18:21           ` [PATCH v6 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
@ 2021-09-22 18:21           ` Tony Luck
  2021-09-22 18:21           ` [PATCH v6 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
                             ` (2 subsequent siblings)
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-09-22 18:21 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

Add a call inside memory_failure() to the arch-specific code that
checks whether the address is an SGX EPC page and, if so, handles it.

Note the SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.

Pull the call to acquire the mutex earlier so the SGX errors are also
protected.

Make set_mce_nospec() skip SGX pages when trying to adjust
the 1:1 map.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/processor.h  |  8 ++++++++
 arch/x86/include/asm/set_memory.h |  4 ++++
 include/linux/mm.h                | 14 ++++++++++++++
 mm/memory-failure.c               | 19 +++++++++++++------
 4 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 9ad2acaaae9b..4865f2860a4f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -853,4 +853,12 @@ enum mds_mitigations {
 	MDS_MITIGATION_VMWERV,
 };
 
+#ifdef CONFIG_X86_SGX
+int arch_memory_failure(unsigned long pfn, int flags);
+#define arch_memory_failure arch_memory_failure
+
+bool arch_is_platform_page(u64 paddr);
+#define arch_is_platform_page arch_is_platform_page
+#endif
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 43fa081a1adb..ce8dd215f5b3 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_SET_MEMORY_H
 #define _ASM_X86_SET_MEMORY_H
 
+#include <linux/mm.h>
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
 
@@ -98,6 +99,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
 	unsigned long decoy_addr;
 	int rc;
 
+	/* SGX pages are not in the 1:1 map */
+	if (arch_is_platform_page(pfn << PAGE_SHIFT))
+		return 0;
 	/*
 	 * We would like to just call:
 	 *      set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 73a52aba448f..62b199ed5ec6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3284,5 +3284,19 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
 	return 0;
 }
 
+#ifndef arch_memory_failure
+static inline int arch_memory_failure(unsigned long pfn, int flags)
+{
+	return -ENXIO;
+}
+#endif
+
+#ifndef arch_is_platform_page
+static inline bool arch_is_platform_page(u64 paddr)
+{
+	return false;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 54879c339024..5693bac9509c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1632,21 +1632,28 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
 
+	mutex_lock(&mf_mutex);
+
 	p = pfn_to_online_page(pfn);
 	if (!p) {
+		res = arch_memory_failure(pfn, flags);
+		if (res == 0)
+			goto unlock_mutex;
+
 		if (pfn_valid(pfn)) {
 			pgmap = get_dev_pagemap(pfn, NULL);
-			if (pgmap)
-				return memory_failure_dev_pagemap(pfn, flags,
-								  pgmap);
+			if (pgmap) {
+				res = memory_failure_dev_pagemap(pfn, flags,
+								 pgmap);
+				goto unlock_mutex;
+			}
 		}
 		pr_err("Memory failure: %#lx: memory outside kernel control\n",
 			pfn);
-		return -ENXIO;
+		res = -ENXIO;
+		goto unlock_mutex;
 	}
 
-	mutex_lock(&mf_mutex);
-
 try_again:
 	if (PageHuge(p)) {
 		res = memory_failure_hugetlb(pfn, flags);
-- 
2.31.1
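
The header changes in this patch rely on the "define a macro with the
same name as the function" idiom, so that generic code can supply a
fallback only when the architecture has not provided its own version.
Below is a minimal single-file sketch of that idiom, compiled as ordinary
user-space C. The MODEL_EPC_* range is a made-up value for illustration
and is not taken from any real platform; the arch_is_platform_page() body
is likewise a stand-in, not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* ---- what an arch header would provide (hypothetical range) ---- */
#define MODEL_EPC_BASE	0x80000000ULL
#define MODEL_EPC_SIZE	0x00100000ULL

static bool arch_is_platform_page(uint64_t paddr)
{
	return paddr >= MODEL_EPC_BASE &&
	       paddr < MODEL_EPC_BASE + MODEL_EPC_SIZE;
}
/* Self-referential macro: only marks the name as "already provided". */
#define arch_is_platform_page arch_is_platform_page

/* ---- what the generic header provides ---- */
#ifndef arch_memory_failure
/* No arch override in this sketch, so this fallback is compiled in. */
static inline int arch_memory_failure(unsigned long pfn, int flags)
{
	(void)pfn;
	(void)flags;
	return -ENXIO;
}
#endif

#ifndef arch_is_platform_page
/* Skipped here, because the arch section above defined the macro. */
static inline bool arch_is_platform_page(uint64_t paddr)
{
	(void)paddr;
	return false;
}
#endif
```

In this sketch arch_is_platform_page() resolves to the "arch" version
while arch_memory_failure() falls back to the generic stub returning
-ENXIO, which is the value memory_failure() treats as "not handled by
the arch code".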


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v6 6/7] x86/sgx: Add hook to error injection address validation
  2021-09-22 18:21         ` [PATCH v6 0/7] Basic recovery for machine checks inside SGX Tony Luck
                             ` (4 preceding siblings ...)
  2021-09-22 18:21           ` [PATCH v6 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
@ 2021-09-22 18:21           ` Tony Luck
  2021-09-22 18:21           ` [PATCH v6 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
  2021-09-27 21:34           ` [PATCH v7 0/7] Basic recovery for machine checks inside SGX Tony Luck
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-09-22 18:21 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

SGX reserved memory does not appear in the standard address maps.

Add a hook to call into the SGX code to check if an address is located
in SGX memory.

There are other challenges in injecting errors into SGX. Update the
documentation with a sequence of operations to inject.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 .../firmware-guide/acpi/apei/einj.rst         | 19 +++++++++++++++++++
 drivers/acpi/apei/einj.c                      |  3 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index c042176e1707..55e2331a6438 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,5 +181,24 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+Special notes for injection into SGX enclaves:
+
+There may be a separate BIOS setup option to enable SGX injection.
+
+The injection process consists of setting some special memory controller
+trigger that will inject the error on the next write to the target
+address. But the h/w prevents any software outside of an SGX enclave
+from accessing enclave pages (even BIOS SMM mode).
+
+The following sequence can be used:
+  1) Determine physical address of enclave page
+  2) Use "notrigger=1" mode to inject (this will set up
+     the injection address, but will not actually inject)
+  3) Enter the enclave
+  4) Store data to the virtual address matching physical address from step 1
+  5) Execute CLFLUSH for that virtual address
+  6) Spin delay for 250ms
+  7) Read from the virtual address. This will trigger the error
+
 For more information about EINJ, please refer to ACPI specification
 version 4.0, section 17.5 and ACPI 5.0, section 18.6.
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 2882450c443e..67c335baad52 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -544,7 +544,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	    ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
 				!= REGION_INTERSECTS) &&
 	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
-				!= REGION_INTERSECTS)))
+				!= REGION_INTERSECTS) &&
+	     !arch_is_platform_page(base_addr)))
 		return -EINVAL;
 
 inject:
-- 
2.31.1
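
Steps 4 to 7 of the documented sequence (store, CLFLUSH, delay, read)
could be sketched roughly as follows. This is only an illustration under
several assumptions: touch_injected_address() and spin_delay_ms() are
hypothetical helper names, the code is plain user-space C rather than
enclave code (an enclave would need its own delay mechanism, since
clock() is unlikely to be available there), and the CLFLUSH is compiled
in only when the compiler advertises SSE2 support. Run against ordinary
memory it simply reads back the value it stored.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>
#if defined(__SSE2__)
#include <emmintrin.h>	/* _mm_clflush(), _mm_mfence() */
#endif

/* Busy-wait for roughly ms milliseconds of CPU time (step 6). */
static void spin_delay_ms(unsigned int ms)
{
	clock_t start = clock();
	clock_t ticks = (clock_t)((double)ms * CLOCKS_PER_SEC / 1000.0);

	while (clock() - start < ticks)
		;	/* spin */
}

/*
 * Hypothetical helper: perform steps 4-7 on one address and return
 * the value read back in step 7.
 */
static uint64_t touch_injected_address(volatile uint64_t *addr,
				       uint64_t val, unsigned int delay_ms)
{
	assert(addr);

	*addr = val;				/* step 4: store data */
#if defined(__SSE2__)
	_mm_clflush((const void *)addr);	/* step 5: flush the line */
	_mm_mfence();				/* order flush before the read */
#endif
	spin_delay_ms(delay_ms);		/* step 6: spin delay */
	return *addr;				/* step 7: read the address */
}
```

With the injection armed via "notrigger=1", a call such as
touch_injected_address(p, 1, 250) from inside the enclave would be
expected to hit the error on the final read instead of returning.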


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v6 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  2021-09-22 18:21         ` [PATCH v6 0/7] Basic recovery for machine checks inside SGX Tony Luck
                             ` (5 preceding siblings ...)
  2021-09-22 18:21           ` [PATCH v6 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
@ 2021-09-22 18:21           ` Tony Luck
  2021-09-27 21:34           ` [PATCH v7 0/7] Basic recovery for machine checks inside SGX Tony Luck
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-09-22 18:21 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

SGX EPC pages do not have a "struct page" associated with them so the
pfn_valid() sanity check fails and results in a warning message to
the console.

Add an additional check to skip the warning if the address of the error
is in an SGX EPC page.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 drivers/acpi/apei/ghes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0c8330ed1ffd..0c5c9acc6254 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -449,7 +449,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 		return false;
 
 	pfn = PHYS_PFN(physical_addr);
-	if (!pfn_valid(pfn)) {
+	if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX
 		"Invalid address in generic error data: %#llx\n",
 		physical_addr);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v6 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-09-22 18:21           ` [PATCH v6 1/7] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
@ 2021-09-23 20:21             ` Jarkko Sakkinen
  2021-09-23 20:24               ` Jarkko Sakkinen
  0 siblings, 1 reply; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-23 20:21 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Dave Hansen; +Cc: Cathy Zhang, linux-sgx

On Wed, 2021-09-22 at 11:21 -0700, Tony Luck wrote:
> SGX EPC pages go through the following life cycle:
> 
>         DIRTY ---> FREE ---> IN-USE --\
>                     ^                 |
>                     \-----------------/
> 
> Recovery action for poison for a DIRTY or FREE page is simple. Just
> make sure never to allocate the page. IN-USE pages need some extra
> handling.
> 
> It would be good to use the sgx_epc_page->owner field as an indicator
> of where an EPC page is currently in that cycle (owner != NULL means
> the EPC page is IN-USE). But there is one caller, sgx_alloc_va_page(),
> that calls with NULL.
> 
> Since there are multiple uses of the "owner" field with different types
> change the type of sgx_epc_page.owner to "void *".
> 
> Start epc_pages out with a non-NULL owner while they are in DIRTY state.
> 
> Fix up the one holdout to provide a non-NULL owner.
> 
> Refactor the allocation sequence so that changes to/from NULL
> value happen together with adding/removing the epc_page from
> a free list while the node->lock is held.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/encl.c  |  5 +++--
>  arch/x86/kernel/cpu/sgx/encl.h  |  2 +-
>  arch/x86/kernel/cpu/sgx/ioctl.c |  2 +-
>  arch/x86/kernel/cpu/sgx/main.c  | 21 +++++++++++----------
>  arch/x86/kernel/cpu/sgx/sgx.h   |  4 ++--
>  5 files changed, 18 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 001808e3901c..62cf20d5fbf6 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -667,6 +667,7 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
>  
>  /**
>   * sgx_alloc_va_page() - Allocate a Version Array (VA) page
> + * @owner:	struct sgx_va_page connected to this VA page
>   *
>   * Allocate a free EPC page and convert it to a Version Array (VA) page.
>   *
> @@ -674,12 +675,12 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
>   *   a VA page,
>   *   -errno otherwise
>   */
> -struct sgx_epc_page *sgx_alloc_va_page(void)
> +struct sgx_epc_page *sgx_alloc_va_page(void *owner)
>  {
>  	struct sgx_epc_page *epc_page;
>  	int ret;
>  
> -	epc_page = sgx_alloc_epc_page(NULL, true);
> +	epc_page = sgx_alloc_epc_page(owner, true);
>  	if (IS_ERR(epc_page))
>  		return ERR_CAST(epc_page);
>  
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index fec43ca65065..2a972bc9b2d1 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -111,7 +111,7 @@ void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
>  int sgx_encl_test_and_clear_young(struct mm_struct *mm,
>  				  struct sgx_encl_page *page);
>  
> -struct sgx_epc_page *sgx_alloc_va_page(void);
> +struct sgx_epc_page *sgx_alloc_va_page(void *owner);
>  unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
>  void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
>  bool sgx_va_page_full(struct sgx_va_page *va_page);
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index 83df20e3e633..655ce0bb069d 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -30,7 +30,7 @@ static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
>  		if (!va_page)
>  			return ERR_PTR(-ENOMEM);
>  
> -		va_page->epc_page = sgx_alloc_va_page();
> +		va_page->epc_page = sgx_alloc_va_page(va_page);
>  		if (IS_ERR(va_page->epc_page)) {
>  			err = ERR_CAST(va_page->epc_page);
>  			kfree(va_page);
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 63d3de02bbcc..69743709ec90 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -457,7 +457,7 @@ static bool __init sgx_page_reclaimer_init(void)
>  	return true;
>  }
>  
> -static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
> +static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(void *owner, int nid)
>  {
>  	struct sgx_numa_node *node = &sgx_numa_nodes[nid];
>  	struct sgx_epc_page *page = NULL;
> @@ -471,6 +471,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
>  
>  	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
>  	list_del_init(&page->list);
> +	page->owner = owner;
>  	sgx_nr_free_pages--;
>  
>  	spin_unlock(&node->lock);
> @@ -480,6 +481,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
>  
>  /**
>   * __sgx_alloc_epc_page() - Allocate an EPC page
> + * @owner:	the owner of the EPC page
>   *
>   * Iterate through NUMA nodes and reserve ia free EPC page to the caller. Start
>   * from the NUMA node, where the caller is executing.
> @@ -488,14 +490,14 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
>   * - an EPC page:	A borrowed EPC pages were available.
>   * - NULL:		Out of EPC pages.
>   */
> -struct sgx_epc_page *__sgx_alloc_epc_page(void)
> +struct sgx_epc_page *__sgx_alloc_epc_page(void *owner)
>  {
>  	struct sgx_epc_page *page;
>  	int nid_of_current = numa_node_id();
>  	int nid = nid_of_current;
>  
>  	if (node_isset(nid_of_current, sgx_numa_mask)) {
> -		page = __sgx_alloc_epc_page_from_node(nid_of_current);
> +		page = __sgx_alloc_epc_page_from_node(owner, nid_of_current);
>  		if (page)
>  			return page;
>  	}
> @@ -506,7 +508,7 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
>  		if (nid == nid_of_current)
>  			break;
>  
> -		page = __sgx_alloc_epc_page_from_node(nid);
> +		page = __sgx_alloc_epc_page_from_node(owner, nid);
>  		if (page)
>  			return page;
>  	}
> @@ -559,7 +561,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
>  
>  /**
>   * sgx_alloc_epc_page() - Allocate an EPC page
> - * @owner:	the owner of the EPC page
> + * @owner:	per-caller page owner
>   * @reclaim:	reclaim pages if necessary
>   *
>   * Iterate through EPC sections and borrow a free EPC page to the caller. When a
> @@ -579,11 +581,9 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
>  	struct sgx_epc_page *page;
>  
>  	for ( ; ; ) {
> -		page = __sgx_alloc_epc_page();
> -		if (!IS_ERR(page)) {
> -			page->owner = owner;
> +		page = __sgx_alloc_epc_page(owner);
> +		if (!IS_ERR(page))
>  			break;
> -		}
>  
>  		if (list_empty(&sgx_active_page_list))
>  			return ERR_PTR(-ENOMEM);
> @@ -624,6 +624,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
>  
>  	spin_lock(&node->lock);
>  
> +	page->owner = NULL;
>  	list_add_tail(&page->list, &node->free_page_list);
>  	sgx_nr_free_pages++;
>  
> @@ -652,7 +653,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
>  	for (i = 0; i < nr_pages; i++) {
>  		section->pages[i].section = index;
>  		section->pages[i].flags = 0;
> -		section->pages[i].owner = NULL;
> +		section->pages[i].owner = (void *)-1;

Probably should have a named constant.

Anyway, I wonder why we want to do tricks with 'owner', when the
struct has a flags field?

Right now its use is so nice and straightforward, and most
importantly intuitive.

So what I would do instead of this, would be to add something
like

/* Pages, which are being tracked by the page reclaimer. */
#define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)

/* Pages, which are allocated for use. */
#define SGX_EPC_PAGE_ALLOCATED		BIT(1)

This would be set by sgx_alloc_epc_page() and reset by
sgx_free_epc_page().

In the subsequent patch, instead of

       /*
        * If there is no owner, then the page is on a free list.
        * Move it to the poison page list.
        */
       if (!page->owner) {
               list_del(&page->list);
               list_add(&page->list, &sgx_poison_page_list);
               goto out;
       }

you could then write

       /*
        * If there is no owner, then the page is on a free list.
        * Move it to the poison page list.
        */
       if (!page->flags) {
               list_del(&page->list);
               list_add(&page->list, &sgx_poison_page_list);
               goto out;
       }

You don't actually need to compare to that flag because the
invariant would be that it is set, as long as the page is
not explicitly freed.

I think this is a better solution than in the patch set
because it does not introduce any unorthodox use of anything.

/Jarkko
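
For reference, a minimal user-space sketch of the flag-based tracking
suggested above. The two flag names are the ones proposed in the mail;
struct model_epc_page and the model_* helpers are hypothetical and exist
only for this illustration, and the BIT() macro is redefined locally.
The sketch tests the SGX_EPC_PAGE_ALLOCATED bit explicitly rather than
checking the whole flags word, which is a deliberate simplification and
not necessarily what a real patch would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BIT(n)	(1U << (n))

/* Pages, which are being tracked by the page reclaimer. */
#define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)

/* Pages, which are allocated for use. */
#define SGX_EPC_PAGE_ALLOCATED		BIT(1)

/* Hypothetical, cut-down stand-in for struct sgx_epc_page. */
struct model_epc_page {
	unsigned int flags;
	void *owner;
};

/* Set the flag on allocation; the owner may legitimately be NULL. */
static void model_alloc(struct model_epc_page *page, void *owner)
{
	assert(!(page->flags & SGX_EPC_PAGE_ALLOCATED));
	page->owner = owner;
	page->flags |= SGX_EPC_PAGE_ALLOCATED;
}

/* Clear the flag when the page goes back to the free list. */
static void model_free(struct model_epc_page *page)
{
	page->owner = NULL;
	page->flags &= ~SGX_EPC_PAGE_ALLOCATED;
}

/* A page is free exactly when the ALLOCATED bit is clear. */
static bool model_page_is_free(const struct model_epc_page *page)
{
	return !(page->flags & SGX_EPC_PAGE_ALLOCATED);
}
```

With this scheme a VA page allocated with a NULL owner still counts as
in use, so the owner field does not need a sentinel value.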


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v6 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-09-23 20:21             ` Jarkko Sakkinen
@ 2021-09-23 20:24               ` Jarkko Sakkinen
  2021-09-23 20:46                 ` Luck, Tony
  0 siblings, 1 reply; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-23 20:24 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Dave Hansen; +Cc: Cathy Zhang, linux-sgx

On Thu, 2021-09-23 at 23:21 +0300, Jarkko Sakkinen wrote:
> On Wed, 2021-09-22 at 11:21 -0700, Tony Luck wrote:
> > SGX EPC pages go through the following life cycle:
> > 
> >         DIRTY ---> FREE ---> IN-USE --\
> >                     ^                 |
> >                     \-----------------/
> > 
> > Recovery action for poison for a DIRTY or FREE page is simple. Just
> > make sure never to allocate the page. IN-USE pages need some extra
> > handling.
> > 
> > It would be good to use the sgx_epc_page->owner field as an indicator
> > of where an EPC page is currently in that cycle (owner != NULL means
> > the EPC page is IN-USE). But there is one caller, sgx_alloc_va_page(),
> > that calls with NULL.
> > 
> > Since there are multiple uses of the "owner" field with different types
> > change the type of sgx_epc_page.owner to "void *".
> > 
> > Start epc_pages out with a non-NULL owner while they are in DIRTY state.
> > 
> > Fix up the one holdout to provide a non-NULL owner.
> > 
> > Refactor the allocation sequence so that changes to/from NULL
> > value happen together with adding/removing the epc_page from
> > a free list while the node->lock is held.
> > 
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  arch/x86/kernel/cpu/sgx/encl.c  |  5 +++--
> >  arch/x86/kernel/cpu/sgx/encl.h  |  2 +-
> >  arch/x86/kernel/cpu/sgx/ioctl.c |  2 +-
> >  arch/x86/kernel/cpu/sgx/main.c  | 21 +++++++++++----------
> >  arch/x86/kernel/cpu/sgx/sgx.h   |  4 ++--
> >  5 files changed, 18 insertions(+), 16 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > index 001808e3901c..62cf20d5fbf6 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -667,6 +667,7 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
> >  
> >  /**
> >   * sgx_alloc_va_page() - Allocate a Version Array (VA) page
> > + * @owner:	struct sgx_va_page connected to this VA page
> >   *
> >   * Allocate a free EPC page and convert it to a Version Array (VA) page.
> >   *
> > @@ -674,12 +675,12 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
> >   *   a VA page,
> >   *   -errno otherwise
> >   */
> > -struct sgx_epc_page *sgx_alloc_va_page(void)
> > +struct sgx_epc_page *sgx_alloc_va_page(void *owner)
> >  {
> >  	struct sgx_epc_page *epc_page;
> >  	int ret;
> >  
> > -	epc_page = sgx_alloc_epc_page(NULL, true);
> > +	epc_page = sgx_alloc_epc_page(owner, true);
> >  	if (IS_ERR(epc_page))
> >  		return ERR_CAST(epc_page);
> >  
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> > index fec43ca65065..2a972bc9b2d1 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.h
> > +++ b/arch/x86/kernel/cpu/sgx/encl.h
> > @@ -111,7 +111,7 @@ void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
> >  int sgx_encl_test_and_clear_young(struct mm_struct *mm,
> >  				  struct sgx_encl_page *page);
> >  
> > -struct sgx_epc_page *sgx_alloc_va_page(void);
> > +struct sgx_epc_page *sgx_alloc_va_page(void *owner);
> >  unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
> >  void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
> >  bool sgx_va_page_full(struct sgx_va_page *va_page);
> > diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> > index 83df20e3e633..655ce0bb069d 100644
> > --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> > +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> > @@ -30,7 +30,7 @@ static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
> >  		if (!va_page)
> >  			return ERR_PTR(-ENOMEM);
> >  
> > -		va_page->epc_page = sgx_alloc_va_page();
> > +		va_page->epc_page = sgx_alloc_va_page(va_page);
> >  		if (IS_ERR(va_page->epc_page)) {
> >  			err = ERR_CAST(va_page->epc_page);
> >  			kfree(va_page);
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > index 63d3de02bbcc..69743709ec90 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -457,7 +457,7 @@ static bool __init sgx_page_reclaimer_init(void)
> >  	return true;
> >  }
> >  
> > -static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
> > +static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(void *owner, int nid)
> >  {
> >  	struct sgx_numa_node *node = &sgx_numa_nodes[nid];
> >  	struct sgx_epc_page *page = NULL;
> > @@ -471,6 +471,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
> >  
> >  	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
> >  	list_del_init(&page->list);
> > +	page->owner = owner;
> >  	sgx_nr_free_pages--;
> >  
> >  	spin_unlock(&node->lock);
> > @@ -480,6 +481,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
> >  
> >  /**
> >   * __sgx_alloc_epc_page() - Allocate an EPC page
> > + * @owner:	the owner of the EPC page
> >   *
> >   * Iterate through NUMA nodes and reserve ia free EPC page to the caller. Start
> >   * from the NUMA node, where the caller is executing.
> > @@ -488,14 +490,14 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
> >   * - an EPC page:	A borrowed EPC pages were available.
> >   * - NULL:		Out of EPC pages.
> >   */
> > -struct sgx_epc_page *__sgx_alloc_epc_page(void)
> > +struct sgx_epc_page *__sgx_alloc_epc_page(void *owner)
> >  {
> >  	struct sgx_epc_page *page;
> >  	int nid_of_current = numa_node_id();
> >  	int nid = nid_of_current;
> >  
> >  	if (node_isset(nid_of_current, sgx_numa_mask)) {
> > -		page = __sgx_alloc_epc_page_from_node(nid_of_current);
> > +		page = __sgx_alloc_epc_page_from_node(owner, nid_of_current);
> >  		if (page)
> >  			return page;
> >  	}
> > @@ -506,7 +508,7 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
> >  		if (nid == nid_of_current)
> >  			break;
> >  
> > -		page = __sgx_alloc_epc_page_from_node(nid);
> > +		page = __sgx_alloc_epc_page_from_node(owner, nid);
> >  		if (page)
> >  			return page;
> >  	}
> > @@ -559,7 +561,7 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
> >  
> >  /**
> >   * sgx_alloc_epc_page() - Allocate an EPC page
> > - * @owner:	the owner of the EPC page
> > + * @owner:	per-caller page owner
> >   * @reclaim:	reclaim pages if necessary
> >   *
> >   * Iterate through EPC sections and borrow a free EPC page to the caller. When a
> > @@ -579,11 +581,9 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
> >  	struct sgx_epc_page *page;
> >  
> >  	for ( ; ; ) {
> > -		page = __sgx_alloc_epc_page();
> > -		if (!IS_ERR(page)) {
> > -			page->owner = owner;
> > +		page = __sgx_alloc_epc_page(owner);
> > +		if (!IS_ERR(page))
> >  			break;
> > -		}
> >  
> >  		if (list_empty(&sgx_active_page_list))
> >  			return ERR_PTR(-ENOMEM);
> > @@ -624,6 +624,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
> >  
> >  	spin_lock(&node->lock);
> >  
> > +	page->owner = NULL;
> >  	list_add_tail(&page->list, &node->free_page_list);
> >  	sgx_nr_free_pages++;
> >  
> > @@ -652,7 +653,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
> >  	for (i = 0; i < nr_pages; i++) {
> >  		section->pages[i].section = index;
> >  		section->pages[i].flags = 0;
> > -		section->pages[i].owner = NULL;
> > +		section->pages[i].owner = (void *)-1;
> 
> Probably should have a named constant.
> 
> Anyway, I wonder why we want to do tricks with 'owner', when the
> struct has a flags field?
> 
> Right now its use is so nice and straight forward, and most
> importantly intuitive.
> 
> So what I would do instead of this, would be to add something
> like
> 
> /* Pages, which are being tracked by the page reclaimer. */
> #define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
> 
> /* Pages, which are allocated for use. */
> #define SGX_EPC_PAGE_ALLOCATED		BIT(1)
> 
> This would be set by sgx_alloc_epc_page() and reset by
> sgx_free_epc_page().
> 
> In the subsequent patch you could then instead of
> 
>        /*
>         * If there is no owner, then the page is on a free list.
>         * Move it to the poison page list.
>         */
>        if (!page->owner) {
>                list_del(&page->list);
>                list_add(&page->list, &sgx_poison_page_list);
>                goto out;
>        }
> 
> you would
> 
>        /*
>         * If there is no owner, then the page is on a free list.
>         * Move it to the poison page list.
>         */
>        if (!page->flags) {
>                list_del(&page->list);
>                list_add(&page->list, &sgx_poison_page_list);
>                goto out;
>        }
> 
> You don't actually need to compare to that flag because the
> invariant would be that it is set, as long as the page is
> not explicitly freed.
> 
> I think this is a better solution than in the patch set
> because it does not introduce any unorthodox use of anything.

And does not contain any special cases, e.g. when you debug
something you can always assume that a valid owner pointer is
always a legit sgx_encl_page instance.

/Jarkko

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v6 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-09-23 20:24               ` Jarkko Sakkinen
@ 2021-09-23 20:46                 ` Luck, Tony
  2021-09-23 22:11                   ` Luck, Tony
  0 siblings, 1 reply; 185+ messages in thread
From: Luck, Tony @ 2021-09-23 20:46 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: Sean Christopherson, Dave Hansen, Cathy Zhang, linux-sgx

On Thu, Sep 23, 2021 at 11:24:35PM +0300, Jarkko Sakkinen wrote:
> On Thu, 2021-09-23 at 23:21 +0300, Jarkko Sakkinen wrote:
> > On Wed, 2021-09-22 at 11:21 -0700, Tony Luck wrote:
> > > -		section->pages[i].owner = NULL;
> > > +		section->pages[i].owner = (void *)-1;
> > 
> > Probably should have a named constant.
> > 
> > Anyway, I wonder why we want to do tricks with 'owner', when the
> > struct has a flags field?
> > 
> > Right now its use is so nice and straight forward, and most
> > importantly intuitive.
> > 
> > So what I would do instead of this, would be to add something
> > like
> > 
> > /* Pages, which are being tracked by the page reclaimer. */
> > #define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
> > 
> > /* Pages, which are allocated for use. */
> > #define SGX_EPC_PAGE_ALLOCATED		BIT(1)
> > 
> > This would be set by sgx_alloc_epc_page() and reset by
> > sgx_free_epc_page().
> > 
> > In the subsequent patch you could then instead of
> > 
> >        /*
> >         * If there is no owner, then the page is on a free list.
> >         * Move it to the poison page list.
> >         */
> >        if (!page->owner) {
> >                list_del(&page->list);
> >                list_add(&page->list, &sgx_poison_page_list);
> >                goto out;
> >        }
> > 
> > you would
> > 
> >        /*
> >         * If there is no owner, then the page is on a free list.
> >         * Move it to the poison page list.
> >         */
> >        if (!page->flags) {
> >                list_del(&page->list);
> >                list_add(&page->list, &sgx_poison_page_list);
> >                goto out;
> >        }
> > 
> > You don't actually need to compare to that flag because the
> > invariant would be that it is set, as long as the page is
> > not explicitly freed.
> > 
> > I think this is a better solution than in the patch set
> > because it does not introduce any unorthodox use of anything.
> 
> And does not contain any special cases, e.g. when you debug
> something you can always assume that a valid owner pointer is
> always a legit sgx_encl_page instance.
> 
Jarkko,

That's nice. It avoids having to create a fictitious owner for
the dirty pages, and for the sgx_alloc_va_page() case. Which
in turn means that the owner field in struct sgx_epc_page can
remain as "struct sgx_encl_page *owner;" (neatly avoiding DaveH's
request that it be an anonymous union of all the possible types,
because it is back to just being one type).

Thanks!  Will include in next version.

-Tony

^ permalink raw reply	[flat|nested] 185+ messages in thread

* RE: [PATCH v6 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-09-23 20:46                 ` Luck, Tony
@ 2021-09-23 22:11                   ` Luck, Tony
  2021-09-28  2:13                     ` Jarkko Sakkinen
  0 siblings, 1 reply; 185+ messages in thread
From: Luck, Tony @ 2021-09-23 22:11 UTC (permalink / raw)
  To: Luck, Tony, Jarkko Sakkinen
  Cc: Sean Christopherson, Hansen, Dave, Zhang, Cathy, linux-sgx

> That's nice. It avoids having to create a fictitious owner for
> the dirty pages, and for the sgx_alloc_va_page() case. Which
> in turn means that the owner field in struct sgx_epc_page can
> remain as "struct sgx_encl_page *owner;" (neatly avoiding DaveH's
> request that it be an anonymous union of all the possible types,
> because it is back to just being one type).
>
> Thanks!  Will include in next version.

Also avoids a bunch of refactoring to make sure to set the owner field
while holding zone->lock.

I roughly coded it up and the old part 0001 was:

 arch/x86/kernel/cpu/sgx/encl.c  |    5 +++--
 arch/x86/kernel/cpu/sgx/encl.h  |    2 +-
 arch/x86/kernel/cpu/sgx/ioctl.c |    2 +-
 arch/x86/kernel/cpu/sgx/main.c  |   21 +++++++++++----------
 arch/x86/kernel/cpu/sgx/sgx.h   |    4 ++--
 5 files changed, 18 insertions(+), 16 deletions(-)

which is by no means huge, but the new part 0001 is

 arch/x86/kernel/cpu/sgx/main.c |    4 +++-
 arch/x86/kernel/cpu/sgx/sgx.h  |    3 +++
 2 files changed, 6 insertions(+), 1 deletion(-)

-Tony

^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v7 0/7] Basic recovery for machine checks inside SGX
  2021-09-22 18:21         ` [PATCH v6 0/7] Basic recovery for machine checks inside SGX Tony Luck
                             ` (6 preceding siblings ...)
  2021-09-22 18:21           ` [PATCH v6 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
@ 2021-09-27 21:34           ` Tony Luck
  2021-09-27 21:34             ` [PATCH v7 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
                               ` (7 more replies)
  7 siblings, 8 replies; 185+ messages in thread
From: Tony Luck @ 2021-09-27 21:34 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

Now version 7

Note that linux-kernel@vger.kernel.org and x86@kernel.org are still
dropped from the distribution. Time to get some internal agreement on these
changes before bothering the x86 maintainers with yet another version.

So I'm looking for Acked-by: or Reviewed-by: on any bits of this
series that are worthy, and comments on the problems I need to fix
in the not-worthy parts.

Changes since v6:

Jarkko Sakkinen:
	Don't use "owner" == NULL vs. != NULL as an indicator
	of whether an SGX EPC page is free vs. in-use. Just add
	a new flags bit. Note this drops most of the changes I
	had in part 0001. Remainder of the patches are largely
	unchanged except where they check for the new flags bit
	instead of owner != NULL.

Tony Luck (7):
  x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook arch_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()

 .../firmware-guide/acpi/apei/einj.rst         |  19 +++
 arch/x86/include/asm/processor.h              |   8 ++
 arch/x86/include/asm/set_memory.h             |   4 +
 arch/x86/kernel/cpu/sgx/main.c                | 121 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |   6 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 drivers/acpi/apei/ghes.c                      |   2 +-
 include/linux/mm.h                            |  14 ++
 mm/memory-failure.c                           |  19 ++-
 9 files changed, 185 insertions(+), 11 deletions(-)


base-commit: 5816b3e6577eaa676ceb00a848f0fd65fe2adc29
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v7 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages
  2021-09-27 21:34           ` [PATCH v7 0/7] Basic recovery for machine checks inside SGX Tony Luck
@ 2021-09-27 21:34             ` Tony Luck
  2021-09-28  2:28               ` Jarkko Sakkinen
  2021-09-27 21:34             ` [PATCH v7 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
                               ` (6 subsequent siblings)
  7 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-09-27 21:34 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

SGX EPC pages go through the following life cycle:

        DIRTY ---> FREE ---> IN-USE --\
                    ^                 |
                    \-----------------/

Recovery action for poison for a DIRTY or FREE page is simple. Just
make sure never to allocate the page. IN-USE pages need some extra
handling.

Add a new flag bit SGX_EPC_PAGE_IN_USE that is set when a page
is allocated and cleared when the page is freed.

Notes:

1) These transitions are made while holding the node->lock so that
   future code that checks the flags while holding the node->lock
   can be sure that if the SGX_EPC_PAGE_IN_USE bit is clear, then the
   page is on the free list.

2) Initially while the pages are on the dirty list the
   SGX_EPC_PAGE_IN_USE bit is set.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 4 +++-
 arch/x86/kernel/cpu/sgx/sgx.h  | 3 +++
 2 files changed, 6 insertions(+), 1 deletion(-)
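
A minimal user-space sketch of the flag transitions described above,
assuming nothing beyond the two flag values in this patch. The model_*
names are hypothetical and only for illustration; the node->lock that
the real code holds around these updates is left out.

```c
#include <stdbool.h>

/* Flag values mirror the patch; everything named model_* is hypothetical. */
#define SGX_EPC_PAGE_RECLAIMER_TRACKED (1u << 0)
#define SGX_EPC_PAGE_IN_USE            (1u << 1)

struct model_page {
	unsigned int flags;
};

/* DIRTY: pages start with IN_USE set, so they never look free. */
void model_page_init(struct model_page *page)
{
	page->flags = SGX_EPC_PAGE_IN_USE;
}

/* FREE -> IN-USE, as done in __sgx_alloc_epc_page_from_node(). */
void model_page_alloc(struct model_page *page)
{
	page->flags = SGX_EPC_PAGE_IN_USE;
}

/* IN-USE (or sanitized DIRTY) -> FREE, as done in sgx_free_epc_page(). */
void model_page_free(struct model_page *page)
{
	page->flags = 0;
}

/* A page is on a free list exactly when no flag bit is set. */
bool model_page_is_free(const struct model_page *page)
{
	return page->flags == 0;
}
```

In this model a freshly initialised page reports "not free" until it
has been through model_page_free() once, which is the reason note 2
above starts dirty pages with SGX_EPC_PAGE_IN_USE set.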

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de02bbcc..d18988a46c13 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -472,6 +472,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
 	list_del_init(&page->list);
 	sgx_nr_free_pages--;
+	page->flags = SGX_EPC_PAGE_IN_USE;
 
 	spin_unlock(&node->lock);
 
@@ -626,6 +627,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
+	page->flags = 0;
 
 	spin_unlock(&node->lock);
 }
@@ -651,7 +653,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
-		section->pages[i].flags = 0;
+		section->pages[i].flags = SGX_EPC_PAGE_IN_USE;
 		section->pages[i].owner = NULL;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..f9202d3d6278 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -26,6 +26,9 @@
 /* Pages, which are being tracked by the page reclaimer. */
 #define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
 
+/* Allocated pages */
+#define SGX_EPC_PAGE_IN_USE		BIT(1)
+
 struct sgx_epc_page {
 	unsigned int section;
 	unsigned int flags;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v7 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-09-27 21:34           ` [PATCH v7 0/7] Basic recovery for machine checks inside SGX Tony Luck
  2021-09-27 21:34             ` [PATCH v7 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
@ 2021-09-27 21:34             ` Tony Luck
  2021-09-28  2:30               ` Jarkko Sakkinen
  2021-09-27 21:34             ` [PATCH v7 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
                               ` (5 subsequent siblings)
  7 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-09-27 21:34 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.

SGX EPC pages do not have Linux "struct page" associated with them.

Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray.

Create a function arch_is_platform_page() that simply reports whether an address
is an EPC page for use elsewhere in the kernel. The ACPI error injection
code needs this function and is typically built as a module, so export it.

Note that arch_is_platform_page() will be slower than other similar "what type
is this page" functions that can simply check bits in the "struct page".
If there is some future performance critical user of this function it
may need to be implemented in a more efficient way.

Note also that the current implementation of xarray allocates a few
hundred kilobytes for this usage on a system with 4GB of SGX EPC memory
configured. This isn't ideal, but worth it for the code simplicity.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 9 +++++++++
 1 file changed, 9 insertions(+)
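
A rough user-space sketch of the lookup described above. It swaps the
xarray for a plain linear scan over an array of sections, so it only
shows the range test, not the data structure used in the patch. The
model_* names and the struct layout are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sgx_epc_section. */
struct model_section {
	uint64_t phys_addr;  /* first byte of the section */
	uint64_t size;       /* length in bytes */
};

/* Return the section containing paddr, or NULL if there is none. */
const struct model_section *
model_find_section(const struct model_section *sections, size_t nr,
		   uint64_t paddr)
{
	for (size_t i = 0; i < nr; i++) {
		const struct model_section *s = &sections[i];

		/* Same inclusive range [phys_addr, phys_addr + size - 1]. */
		if (paddr >= s->phys_addr && paddr - s->phys_addr < s->size)
			return s;
	}
	return NULL;
}

/* Mirrors the "is this address in any EPC section" question. */
bool model_is_platform_page(const struct model_section *sections, size_t nr,
			    uint64_t paddr)
{
	return model_find_section(sections, nr, paddr) != NULL;
}
```

The subtraction form of the upper-bound test avoids computing
phys_addr + size, which could overflow for a section that ends at the
top of the address space.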

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index d18988a46c13..09fa42690ff2 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -20,6 +20,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
 static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
+static DEFINE_XARRAY(sgx_epc_address_space);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
@@ -650,6 +651,8 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	xa_store_range(&sgx_epc_address_space, section->phys_addr,
+		       phys_addr + size - 1, section, GFP_KERNEL);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -661,6 +664,12 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+bool arch_is_platform_page(u64 paddr)
+{
+	return !!xa_load(&sgx_epc_address_space, paddr);
+}
+EXPORT_SYMBOL_GPL(arch_is_platform_page);
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v7 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-09-27 21:34           ` [PATCH v7 0/7] Basic recovery for machine checks inside SGX Tony Luck
  2021-09-27 21:34             ` [PATCH v7 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
  2021-09-27 21:34             ` [PATCH v7 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
@ 2021-09-27 21:34             ` Tony Luck
  2021-09-28  2:46               ` Jarkko Sakkinen
  2021-09-27 21:34             ` [PATCH v7 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
                               ` (4 subsequent siblings)
  7 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-09-27 21:34 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

A memory controller patrol scrubber can report poison in a page
that isn't currently being used.

Add a "poison" field to struct sgx_epc_page. Check for it:
1) When sanitizing dirty pages
2) When freeing epc pages

In both cases place the poisoned page on a list of poisoned epc pages
to make sure it will not be reallocated.

Poison is a new field separated from flags to avoid having to make
all updates to flags atomic, or integrate poison state changes into
some other locking scheme to protect flags.

Add a debugfs file /sys/kernel/debug/x86/sgx/poison_page_list so that
system administrators get a list of those pages that have been dropped
because of poison.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 31 ++++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h  |  3 ++-
 2 files changed, 32 insertions(+), 2 deletions(-)
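
A small user-space sketch of the free path described above, under the
assumption that a page's list membership can be modelled as an enum.
The model_* names are hypothetical. It also illustrates one consequence
of keeping poison out of flags: the wholesale "flags = 0" on free cannot
wipe the poison mark.

```c
#include <stdint.h>

/* Hypothetical model of which list a page currently sits on. */
enum model_list { MODEL_LIST_NONE, MODEL_LIST_FREE, MODEL_LIST_POISON };

struct model_page {
	uint16_t flags;           /* same split as the patch: u16 + u16 */
	uint16_t poison;
	enum model_list on_list;
};

/* Free a page: poisoned pages are diverted to the poison list. */
enum model_list model_free_page(struct model_page *page)
{
	page->on_list = page->poison ? MODEL_LIST_POISON : MODEL_LIST_FREE;
	page->flags = 0;          /* does not touch page->poison */
	return page->on_list;
}

/* Allocation only ever takes pages from the free list. */
int model_can_allocate(const struct model_page *page)
{
	return page->on_list == MODEL_LIST_FREE;
}
```

A page marked poisoned while in use therefore stays usable by its
current owner, but can never be handed out again once it is freed.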

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 09fa42690ff2..b558c9a80af4 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*  Copyright(c) 2016-20 Intel Corporation. */
 
+#include <linux/debugfs.h>
 #include <linux/file.h>
 #include <linux/freezer.h>
 #include <linux/highmem.h>
@@ -43,6 +44,7 @@ static nodemask_t sgx_numa_mask;
 static struct sgx_numa_node *sgx_numa_nodes;
 
 static LIST_HEAD(sgx_dirty_page_list);
+static LIST_HEAD(sgx_poison_page_list);
 
 /*
  * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
@@ -62,6 +64,12 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
+		if (page->poison) {
+			list_del(&page->list);
+			list_add(&page->list, &sgx_poison_page_list);
+			continue;
+		}
+
 		ret = __eremove(sgx_get_epc_virt_addr(page));
 		if (!ret) {
 			/*
@@ -626,7 +634,11 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	spin_lock(&node->lock);
 
-	list_add_tail(&page->list, &node->free_page_list);
+	page->owner = NULL;
+	if (page->poison)
+		list_add(&page->list, &sgx_poison_page_list);
+	else
+		list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 	page->flags = 0;
 
@@ -658,6 +670,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 		section->pages[i].section = index;
 		section->pages[i].flags = SGX_EPC_PAGE_IN_USE;
 		section->pages[i].owner = NULL;
+		section->pages[i].poison = 0;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
@@ -801,8 +814,21 @@ int sgx_set_attribute(unsigned long *allowed_attributes,
 }
 EXPORT_SYMBOL_GPL(sgx_set_attribute);
 
+static int poison_list_show(struct seq_file *m, void *private)
+{
+	struct sgx_epc_page *page;
+
+	list_for_each_entry(page, &sgx_poison_page_list, list)
+		seq_printf(m, "0x%lx\n", sgx_get_epc_phys_addr(page));
+
+	return 0;
+}
+
+DEFINE_SHOW_ATTRIBUTE(poison_list);
+
 static int __init sgx_init(void)
 {
+	struct dentry *dir;
 	int ret;
 	int i;
 
@@ -834,6 +860,9 @@ static int __init sgx_init(void)
 	if (sgx_vepc_init() && ret)
 		goto err_provision;
 
+	dir = debugfs_create_dir("sgx", arch_debugfs_dir);
+	debugfs_create_file("poison_page_list", 0400, dir, NULL, &poison_list_fops);
+
 	return 0;
 
 err_provision:
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index f9202d3d6278..a990a4c9a00f 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -31,7 +31,8 @@
 
 struct sgx_epc_page {
 	unsigned int section;
-	unsigned int flags;
+	u16 flags;
+	u16 poison;
 	struct sgx_encl_page *owner;
 	struct list_head list;
 };
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v7 4/7] x86/sgx: Add SGX infrastructure to recover from poison
  2021-09-27 21:34           ` [PATCH v7 0/7] Basic recovery for machine checks inside SGX Tony Luck
                               ` (2 preceding siblings ...)
  2021-09-27 21:34             ` [PATCH v7 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-09-27 21:34             ` Tony Luck
  2021-09-27 21:34             ` [PATCH v7 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
                               ` (3 subsequent siblings)
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-09-27 21:34 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

Provide a recovery function arch_memory_failure(). If the poison was
consumed synchronously then send a SIGBUS. Note that, unlike poison
outside of SGX enclaves, the virtual address of the access is not
included with the SIGBUS. This doesn't matter as addresses of code/data
inside an enclave are of little to no use to code executing outside the
(now dead) enclave.

Poison found in a free page results in the page being moved from the
free list to the poison page list.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 77 ++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)
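
A user-space sketch of the address-to-page arithmetic and the decision
order used below, assuming a single section and 4 KiB pages. The
model_* names and the returned enum are hypothetical; the SIGBUS, the
locking and the actual list moves are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12

struct model_page {
	uint16_t flags;   /* zero means the page is on a free list */
	uint16_t poison;
};

struct model_section {
	uint64_t phys_addr;
	size_t nr_pages;
	struct model_page *pages;
};

enum model_action {
	MODEL_NOT_OURS,              /* address is outside the section */
	MODEL_ALREADY_POISONED,      /* nothing more to do */
	MODEL_MOVED_TO_POISON_LIST,  /* page was free */
	MODEL_DEFERRED,              /* page in use, handled when freed */
};

/* Page index is the page-frame offset from the start of the section. */
struct model_page *model_paddr_to_page(struct model_section *s, uint64_t paddr)
{
	uint64_t idx;

	if (paddr < s->phys_addr)
		return NULL;
	idx = (paddr - s->phys_addr) >> MODEL_PAGE_SHIFT;
	if (idx >= s->nr_pages)
		return NULL;
	return &s->pages[idx];
}

enum model_action model_memory_failure(struct model_section *s, uint64_t pfn)
{
	struct model_page *page = model_paddr_to_page(s, pfn << MODEL_PAGE_SHIFT);

	if (!page)
		return MODEL_NOT_OURS;
	if (page->poison)
		return MODEL_ALREADY_POISONED;
	page->poison = 1;
	if (!page->flags)
		return MODEL_MOVED_TO_POISON_LIST;
	return MODEL_DEFERRED;
}
```

The bounds checks in model_paddr_to_page() stand in for the xa_load()
lookup, which in the patch is what guarantees that the address falls
inside a known section.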

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index b558c9a80af4..9931fabb29eb 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -683,6 +683,83 @@ bool arch_is_platform_page(u64 paddr)
 }
 EXPORT_SYMBOL_GPL(arch_is_platform_page);
 
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+
+	section = xa_load(&sgx_epc_address_space, paddr);
+	if (!section)
+		return NULL;
+
+	return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+}
+
+/*
+ * Called in process context to handle a hardware reported
+ * error in an SGX EPC page.
+ * If the MF_ACTION_REQUIRED bit is set in flags, then the
+ * context is the task that consumed the poison data. Otherwise
+ * this is called from a kernel thread unrelated to the page.
+ */
+int arch_memory_failure(unsigned long pfn, int flags)
+{
+	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+	struct sgx_epc_section *section;
+	struct sgx_numa_node *node;
+
+	/*
+	 * mm/memory-failure.c calls this routine for all errors
+	 * where there isn't a "struct page" for the address. But that
+	 * includes other address ranges besides SGX.
+	 */
+	if (!page)
+		return -ENXIO;
+
+	/*
+	 * If poison was consumed synchronously, send a SIGBUS to
+	 * the task. Hardware has already exited the SGX enclave and
+	 * will not allow re-entry to an enclave that has a memory
+	 * error. The signal may help the task understand why the
+	 * enclave is broken.
+	 */
+	if (flags & MF_ACTION_REQUIRED)
+		force_sig(SIGBUS);
+
+	section = &sgx_epc_sections[page->section];
+	node = section->node;
+
+	spin_lock(&node->lock);
+
+	/* Already poisoned? Nothing more to do */
+	if (page->poison)
+		goto out;
+
+	page->poison = 1;
+
+	/*
+	 * If flags is zero, then the page is on a free list.
+	 * Move it to the poison page list.
+	 */
+	if (!page->flags) {
+		list_del(&page->list);
+		list_add(&page->list, &sgx_poison_page_list);
+		goto out;
+	}
+
+	/*
+	 * TBD: Add additional plumbing to enable pre-emptive
+	 * action for asynchronous poison notification. Until
+	 * then just hope that the poison:
+	 * a) is not accessed - sgx_free_epc_page() will deal with it
+	 *    when the user gives it back
+	 * b) results in a recoverable machine check rather than
+	 *    a fatal one
+	 */
+out:
+	spin_unlock(&node->lock);
+	return 0;
+}
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v7 5/7] x86/sgx: Hook arch_memory_failure() into mainline code
  2021-09-27 21:34           ` [PATCH v7 0/7] Basic recovery for machine checks inside SGX Tony Luck
                               ` (3 preceding siblings ...)
  2021-09-27 21:34             ` [PATCH v7 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
@ 2021-09-27 21:34             ` Tony Luck
  2021-09-27 21:34             ` [PATCH v7 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
                               ` (2 subsequent siblings)
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-09-27 21:34 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

Add a call inside memory_failure() to call the arch specific code
to check if the address is an SGX EPC page and handle it.

Note the SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.

Pull the call to acquire the mutex earlier so the SGX errors are also
protected.

Make set_mce_nospec() skip SGX pages when trying to adjust
the 1:1 map.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/processor.h  |  8 ++++++++
 arch/x86/include/asm/set_memory.h |  4 ++++
 include/linux/mm.h                | 14 ++++++++++++++
 mm/memory-failure.c               | 19 +++++++++++++------
 4 files changed, 39 insertions(+), 6 deletions(-)
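
A stand-alone sketch of the override pattern used for the two hooks
below: the arch header declares the function and defines a macro of the
same name, and the generic header supplies a stub only when that macro
is absent. MODEL_HAVE_ARCH_HOOK and the model_* names are hypothetical
stand-ins for CONFIG_X86_SGX and the real functions.

```c
#include <errno.h>

/* "Arch header": declare the hook and define a same-named macro. */
#ifdef MODEL_HAVE_ARCH_HOOK
int model_arch_memory_failure(unsigned long pfn, int flags);
#define model_arch_memory_failure model_arch_memory_failure
#endif

/* "Generic header": fall back to a stub if no arch hook was defined. */
#ifndef model_arch_memory_failure
static inline int model_arch_memory_failure(unsigned long pfn, int flags)
{
	(void)pfn;
	(void)flags;
	return -ENXIO;
}
#endif

/* Generic caller: a return of 0 means the arch hook handled the error. */
int model_memory_failure(unsigned long pfn, int flags)
{
	int res = model_arch_memory_failure(pfn, flags);

	if (res == 0)
		return 0;
	/* Generic handling would continue here; elided in this sketch. */
	return res;
}
```

Built without MODEL_HAVE_ARCH_HOOK, the caller sees the stub and gets
-ENXIO, which matches what memory_failure() returned for such addresses
before this patch.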

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 9ad2acaaae9b..4865f2860a4f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -853,4 +853,12 @@ enum mds_mitigations {
 	MDS_MITIGATION_VMWERV,
 };
 
+#ifdef CONFIG_X86_SGX
+int arch_memory_failure(unsigned long pfn, int flags);
+#define arch_memory_failure arch_memory_failure
+
+bool arch_is_platform_page(u64 paddr);
+#define arch_is_platform_page arch_is_platform_page
+#endif
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 43fa081a1adb..ce8dd215f5b3 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_SET_MEMORY_H
 #define _ASM_X86_SET_MEMORY_H
 
+#include <linux/mm.h>
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
 
@@ -98,6 +99,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
 	unsigned long decoy_addr;
 	int rc;
 
+	/* SGX pages are not in the 1:1 map */
+	if (arch_is_platform_page(pfn << PAGE_SHIFT))
+		return 0;
 	/*
 	 * We would like to just call:
 	 *      set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 73a52aba448f..62b199ed5ec6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3284,5 +3284,19 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
 	return 0;
 }
 
+#ifndef arch_memory_failure
+static inline int arch_memory_failure(unsigned long pfn, int flags)
+{
+	return -ENXIO;
+}
+#endif
+
+#ifndef arch_is_platform_page
+static inline bool arch_is_platform_page(u64 paddr)
+{
+	return false;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3e6449f2102a..b1cbf9845c19 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1632,21 +1632,28 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
 
+	mutex_lock(&mf_mutex);
+
 	p = pfn_to_online_page(pfn);
 	if (!p) {
+		res = arch_memory_failure(pfn, flags);
+		if (res == 0)
+			goto unlock_mutex;
+
 		if (pfn_valid(pfn)) {
 			pgmap = get_dev_pagemap(pfn, NULL);
-			if (pgmap)
-				return memory_failure_dev_pagemap(pfn, flags,
-								  pgmap);
+			if (pgmap) {
+				res = memory_failure_dev_pagemap(pfn, flags,
+								 pgmap);
+				goto unlock_mutex;
+			}
 		}
 		pr_err("Memory failure: %#lx: memory outside kernel control\n",
 			pfn);
-		return -ENXIO;
+		res = -ENXIO;
+		goto unlock_mutex;
 	}
 
-	mutex_lock(&mf_mutex);
-
 try_again:
 	if (PageHuge(p)) {
 		res = memory_failure_hugetlb(pfn, flags);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v7 6/7] x86/sgx: Add hook to error injection address validation
  2021-09-27 21:34           ` [PATCH v7 0/7] Basic recovery for machine checks inside SGX Tony Luck
                               ` (4 preceding siblings ...)
  2021-09-27 21:34             ` [PATCH v7 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
@ 2021-09-27 21:34             ` Tony Luck
  2021-09-27 21:34             ` [PATCH v7 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
  2021-10-01 16:47             ` [PATCH v8 0/7] Basic recovery for machine checks inside SGX Tony Luck
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-09-27 21:34 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

SGX reserved memory does not appear in the standard address maps.

Add hook to call into the SGX code to check if an address is located
in SGX memory.

There are other challenges in injecting errors into SGX. Update the
documentation with a sequence of operations to inject.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 .../firmware-guide/acpi/apei/einj.rst         | 19 +++++++++++++++++++
 drivers/acpi/apei/einj.c                      |  3 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)
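
The address check changed in einj.c below reduces to a three-way test.
A tiny sketch of just that part, with the three region lookups replaced
by boolean inputs; model_einj_addr_rejected() is a hypothetical name and
the other conditions in einj_error_inject() are not modelled.

```c
#include <stdbool.h>

/*
 * Reject the injection address only when it is in none of:
 * system RAM, persistent memory, SGX EPC.
 */
bool model_einj_addr_rejected(bool in_system_ram, bool in_pmem,
			      bool in_sgx_epc)
{
	return !in_system_ram && !in_pmem && !in_sgx_epc;
}
```

Before this patch the third term did not exist, so an EPC address
always failed the first two tests and was rejected with -EINVAL.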

diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index c042176e1707..55e2331a6438 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,5 +181,24 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+Special notes for injection into SGX enclaves:
+
+There may be a separate BIOS setup option to enable SGX injection.
+
+The injection process consists of setting some special memory controller
+trigger that will inject the error on the next write to the target
+address. But the h/w prevents any software outside of an SGX enclave
+from accessing enclave pages (even BIOS SMM mode).
+
+The following sequence can be used:
+  1) Determine physical address of enclave page
+  2) Use "notrigger=1" mode to inject (this will set up
+     the injection address, but will not actually inject)
+  3) Enter the enclave
+  4) Store data to the virtual address matching physical address from step 1
+  5) Execute CLFLUSH for that virtual address
+  6) Spin delay for 250ms
+  7) Read from the virtual address. This will trigger the error
+
 For more information about EINJ, please refer to ACPI specification
 version 4.0, section 17.5 and ACPI 5.0, section 18.6.
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 2882450c443e..67c335baad52 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -544,7 +544,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	    ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
 				!= REGION_INTERSECTS) &&
 	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
-				!= REGION_INTERSECTS)))
+				!= REGION_INTERSECTS) &&
+	     !arch_is_platform_page(base_addr)))
 		return -EINVAL;
 
 inject:
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v7 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  2021-09-27 21:34           ` [PATCH v7 0/7] Basic recovery for machine checks inside SGX Tony Luck
                               ` (5 preceding siblings ...)
  2021-09-27 21:34             ` [PATCH v7 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
@ 2021-09-27 21:34             ` Tony Luck
  2021-10-01 16:47             ` [PATCH v8 0/7] Basic recovery for machine checks inside SGX Tony Luck
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-09-27 21:34 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

SGX EPC pages do not have a "struct page" associated with them so the
pfn_valid() sanity check fails and results in a warning message to
the console.

Add an additional check to skip the warning if the address of the error
is in an SGX EPC page.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 drivers/acpi/apei/ghes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0c8330ed1ffd..0c5c9acc6254 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -449,7 +449,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 		return false;
 
 	pfn = PHYS_PFN(physical_addr);
-	if (!pfn_valid(pfn)) {
+	if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX
 		"Invalid address in generic error data: %#llx\n",
 		physical_addr);
-- 
2.31.1
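
For readers following along outside the kernel tree, the effect of the
one-line change above can be sketched in plain userspace C. This is only
an illustration: demo_pfn_valid() and demo_is_platform_page() are
hypothetical stand-ins for pfn_valid() and arch_is_platform_page(), and
the address ranges are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12

/* Stand-in for pfn_valid(): pretend the first 1 GiB is ordinary RAM. */
static bool demo_pfn_valid(uint64_t pfn)
{
	return pfn < ((1ULL << 30) >> DEMO_PAGE_SHIFT);
}

/* Stand-in for arch_is_platform_page(): pretend a 64 MiB EPC section
 * starts at 2 GiB. */
static bool demo_is_platform_page(uint64_t paddr)
{
	return paddr >= 0x80000000ULL && paddr < 0x84000000ULL;
}

/* Mirrors the patched condition: warn only when the address is neither
 * backed by a struct page nor inside an EPC section. */
static bool demo_should_warn(uint64_t paddr)
{
	uint64_t pfn = paddr >> DEMO_PAGE_SHIFT;

	return !demo_pfn_valid(pfn) && !demo_is_platform_page(paddr);
}
```

With these made-up ranges, an address inside the pretend EPC section no
longer triggers the warning, while an address in neither range still does.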



* Re: [PATCH v6 1/7] x86/sgx: Provide indication of life-cycle of EPC pages
  2021-09-23 22:11                   ` Luck, Tony
@ 2021-09-28  2:13                     ` Jarkko Sakkinen
  0 siblings, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-28  2:13 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Sean Christopherson, Hansen, Dave, Zhang, Cathy, linux-sgx

On Thu, 2021-09-23 at 22:11 +0000, Luck, Tony wrote:
> > That's nice. It avoids having to create a fictitious owner for
> > the dirty pages, and for the sgx_alloc_va_page() case. Which
> > in turn means that the owner field in struct sgx_epc_page can
> > remain as "struct sgx_encl_page *owner;" (neatly avoiding DaveH's
> > request that it be an anonymous union of all the possible types,
> > because it is back to just being one type).
> > 
> > Thanks!  Will include in next version.
> 
> Also avoids a bunch of refactoring to make sure to set the owner field
> while holding zone->lock.
> 
> I roughly coded it up and the old part 0001 was:
> 
>  arch/x86/kernel/cpu/sgx/encl.c  |    5 +++--
>  arch/x86/kernel/cpu/sgx/encl.h  |    2 +-
>  arch/x86/kernel/cpu/sgx/ioctl.c |    2 +-
>  arch/x86/kernel/cpu/sgx/main.c  |   21 +++++++++++----------
>  arch/x86/kernel/cpu/sgx/sgx.h   |    4 ++--
>  5 files changed, 18 insertions(+), 16 deletions(-)
> 
> which is by no means huge, but the new part 0001 is
> 
>  arch/x86/kernel/cpu/sgx/main.c |    4 +++-
>  arch/x86/kernel/cpu/sgx/sgx.h  |    3 +++
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> -Tony

This is good to hear. I guess it is a no-brainer to move in this
direction, then.

/Jarkko



* Re: [PATCH v7 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages
  2021-09-27 21:34             ` [PATCH v7 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
@ 2021-09-28  2:28               ` Jarkko Sakkinen
  0 siblings, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-28  2:28 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Dave Hansen; +Cc: Cathy Zhang, linux-sgx

On Mon, 2021-09-27 at 14:34 -0700, Tony Luck wrote:
> SGX EPC pages go through the following life cycle:
> 
>         DIRTY ---> FREE ---> IN-USE --\
>                     ^                 |
>                     \-----------------/
> 
> Recovery action for poison for a DIRTY or FREE page is simple. Just
> make sure never to allocate the page. IN-USE pages need some extra
> handling.
> 
> Add a new flag bit SGX_EPC_PAGE_IN_USE that is set when a page
> is allocated and cleared when the page is freed.
> 
> Notes:
> 
> 1) These transitions are made while holding the node->lock so that
>    future code that checks the flags while holding the node->lock
>    can be sure that if the SGX_EPC_PAGE_IN_USE bit is set, then the
>    page is on the free list.
> 
> 2) Initially while the pages are on the dirty list the
>    SGX_EPC_PAGE_IN_USE bit is set.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

/Jarkko



* Re: [PATCH v7 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-09-27 21:34             ` [PATCH v7 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
@ 2021-09-28  2:30               ` Jarkko Sakkinen
  0 siblings, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-28  2:30 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Dave Hansen; +Cc: Cathy Zhang, linux-sgx

On Mon, 2021-09-27 at 14:34 -0700, Tony Luck wrote:
> X86 machine check architecture reports a physical address when there
> is a memory error. Handling that error requires a method to determine
> whether the physical address reported is in any of the areas reserved
> for EPC pages by BIOS.
> 
> SGX EPC pages do not have Linux "struct page" associated with them.
> 
> Keep track of the mapping from ranges of EPC pages to the sections
> that contain them using an xarray.
> 
> Create a function arch_is_platform_page() that simply reports whether an address
> is an EPC page for use elsewhere in the kernel. The ACPI error injection
> code needs this function and is typically built as a module, so export it.
> 
> Note that arch_is_platform_page() will be slower than other similar "what type
> is this page" functions that can simply check bits in the "struct page".
> If there is some future performance critical user of this function it
> may need to be implemented in a more efficient way.
> 
> Note also that the current implementation of xarray allocates a few
> hundred kilobytes for this usage on a system with 4GB of SGX EPC memory
> configured. This isn't ideal, but worth it for the code simplicity.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/main.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index d18988a46c13..09fa42690ff2 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -20,6 +20,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
>  static int sgx_nr_epc_sections;
>  static struct task_struct *ksgxd_tsk;
>  static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
> +static DEFINE_XARRAY(sgx_epc_address_space);
>  
>  /*
>   * These variables are part of the state of the reclaimer, and must be accessed
> @@ -650,6 +651,8 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
>  	}
>  
>  	section->phys_addr = phys_addr;
> +	xa_store_range(&sgx_epc_address_space, section->phys_addr,
> +		       phys_addr + size - 1, section, GFP_KERNEL);
>  
>  	for (i = 0; i < nr_pages; i++) {
>  		section->pages[i].section = index;
> @@ -661,6 +664,12 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
>  	return true;
>  }
>  
> +bool arch_is_platform_page(u64 paddr)
> +{
> +	return !!xa_load(&sgx_epc_address_space, paddr);
> +}
> +EXPORT_SYMBOL_GPL(arch_is_platform_page);
> +
>  /**
>   * A section metric is concatenated in a way that @low bits 12-31 define the
>   * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

/Jarkko



* Re: [PATCH v7 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-09-27 21:34             ` [PATCH v7 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-09-28  2:46               ` Jarkko Sakkinen
  2021-09-28 15:41                 ` Luck, Tony
  0 siblings, 1 reply; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-28  2:46 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Dave Hansen; +Cc: Cathy Zhang, linux-sgx

On Mon, 2021-09-27 at 14:34 -0700, Tony Luck wrote:
> A memory controller patrol scrubber can report poison in a page
> that isn't currently being used.
> 
> Add "poison" field in the sgx_epc_page that can be set for an
> sgx_epc_page. Check for it:
> 1) When sanitizing dirty pages
> 2) When freeing epc pages
> 
> Poison is a new field separated from flags to avoid having to make
> all updates to flags atomic, or integrate poison state changes into
> some other locking scheme to protect flags.
> 
> In both cases place the poisoned page on a list of poisoned epc pages
> to make sure it will not be reallocated.
> 
> Add debugfs files /sys/kernel/debug/sgx/poison_page_list so that system
> administrators get a list of those pages that have been dropped because
> of poison.

So, what would a sysadmin do with that detailed information?

I would decrease the granularity a bit and rather add something like
this for each node:

	/sys/devices/system/node/node[0-9]*/sgx/poisoned_size

which would give the total amount of poisoned memory in bytes
for that node.

See the series that I've recently posted:

https://lore.kernel.org/linux-sgx/20210914030422.377601-1-jarkko@kernel.org/T/#t

/Jarkko



* RE: [PATCH v7 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-09-28  2:46               ` Jarkko Sakkinen
@ 2021-09-28 15:41                 ` Luck, Tony
  2021-09-28 20:11                   ` Jarkko Sakkinen
  0 siblings, 1 reply; 185+ messages in thread
From: Luck, Tony @ 2021-09-28 15:41 UTC (permalink / raw)
  To: Jarkko Sakkinen, Sean Christopherson, Hansen, Dave
  Cc: Zhang, Cathy, linux-sgx

>> Add debugfs files /sys/kernel/debug/sgx/poison_page_list so that system
>> administrators get a list of those pages that have been dropped because
>> of poison.
>
> So, what would a sysadmin do with that detailed information?

It's going to be a rare case that there are any poisoned pages on that list
(a large enough cluster will have some systems that have uncorrected
recoverable errors in SGX EPC memory).

Even when there are some poisoned pages, there will only be a few. Systems
that have thousands of pages with uncorrected memory errors will surely crash
because one of those errors is going to either trigger an error marked as fatal,
or the error won’t be recoverable by Linux because it is in kernel memory.

A sysadmin might add a script to run during system shutdown (or periodically
during run-time) to save the poison page list. Then at startup run:

for addr in `cat saved_sgx_poison_page_list`
do
	echo $addr > /sys/devices/system/memory/hard_offline_page
done

to make poison persistent across reboots.

-Tony


* Re: [PATCH v7 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-09-28 15:41                 ` Luck, Tony
@ 2021-09-28 20:11                   ` Jarkko Sakkinen
  2021-09-28 20:53                     ` Luck, Tony
  0 siblings, 1 reply; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-28 20:11 UTC (permalink / raw)
  To: Luck, Tony, Sean Christopherson, Hansen, Dave; +Cc: Zhang, Cathy, linux-sgx

On Tue, 2021-09-28 at 15:41 +0000, Luck, Tony wrote:
> > > Add debugfs files /sys/kernel/debug/sgx/poison_page_list so that system
> > > administrators get a list of those pages that have been dropped because
> > > of poison.
> > 
> > So, what would a sysadmin do with that detailed information?
> 
> It's going to be a rare case that there are any poisoned pages on that list
> (a large enough cluster will have some systems that have uncorrected
> recoverable errors in SGX EPC memory).
> 
> Even when there are some poisoned pages, there will only be a few. Systems
> that have thousands of pages with uncorrected memory errors will surely crash
> because one of those errors is going to either trigger an error marked as fatal,
> or the error won’t be recoverable by Linux because it is in kernel memory.
> 
> A sysadmin might add a script to run during system shutdown (or periodically
> during run-time) to save the poison page list. Then at startup run:
> 
> for addr in `cat saved_sgx_poison_page_list`
> do
> 	echo $addr > /sys/devices/system/memory/hard_offline_page
> done
> 
> to make poison persistent across reboots.
> 
> -Tony

Couldn't it be a blob with 8 bytes for each address?

/Jarkko


* Re: [PATCH v7 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-09-28 20:11                   ` Jarkko Sakkinen
@ 2021-09-28 20:53                     ` Luck, Tony
  2021-09-30 14:40                       ` Jarkko Sakkinen
  0 siblings, 1 reply; 185+ messages in thread
From: Luck, Tony @ 2021-09-28 20:53 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Sean Christopherson, Hansen, Dave, Zhang, Cathy, linux-sgx

On Tue, Sep 28, 2021 at 11:11:30PM +0300, Jarkko Sakkinen wrote:
> On Tue, 2021-09-28 at 15:41 +0000, Luck, Tony wrote:
> > > > Add debugfs files /sys/kernel/debug/sgx/poison_page_list so that system
> > > > administrators get a list of those pages that have been dropped because
> > > > of poison.
> > > 
> > > So, what would a sysadmin do with that detailed information?
> > 
> > It's going to be a rare case that there are any poisoned pages on that list
> > (a large enough cluster will have some systems that have uncorrected
> > recoverable errors in SGX EPC memory).
> > 
> > Even when there are some poisoned pages, there will only be a few. Systems
> > that have thousands of pages with uncorrected memory errors will surely crash
> > because one of those errors is going to either trigger an error marked as fatal,
> > or the error won’t be recoverable by Linux because it is in kernel memory.
> > 
> > A sysadmin might add a script to run during system shutdown (or periodically
> > during run-time) to save the poison page list. Then at startup run:
> > 
> > for addr in `cat saved_sgx_poison_page_list`
> > do
> > 	echo $addr > /sys/devices/system/memory/hard_offline_page
> > done
> > 
> > to make poison persistent across reboots.
> > 
> > -Tony
> 
> Couldn't it be a blob with 8 bytes for each address?

It could be a blob. But that would require some perl/python
instead of simple shell to do the above persistence trick.

Or I could just drop the debugfs interface from this patch,
waiting until some use case for the data is fleshed out so
that it can be done in the most sensible way for that use case.

Untested updated patch below.

-Tony

From 551fbc5822e8faf93ff53f0a2b2448b0b98f1dde Mon Sep 17 00:00:00 2001
From: Tony Luck <tony.luck@intel.com>
Date: Mon, 27 Sep 2021 13:26:06 -0700
Subject: [PATCH] x86/sgx: Initial poison handling for dirty and free pages

A memory controller patrol scrubber can report poison in a page
that isn't currently being used.

Add "poison" field in the sgx_epc_page that can be set for an
sgx_epc_page. Check for it:
1) When sanitizing dirty pages
2) When freeing epc pages

Poison is a new field separated from flags to avoid having to make
all updates to flags atomic, or integrate poison state changes into
some other locking scheme to protect flags.

In both cases place the poisoned page on a list of poisoned epc pages
to make sure it will not be reallocated.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 14 +++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h  |  3 ++-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 09fa42690ff2..653bace26100 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -43,6 +43,7 @@ static nodemask_t sgx_numa_mask;
 static struct sgx_numa_node *sgx_numa_nodes;
 
 static LIST_HEAD(sgx_dirty_page_list);
+static LIST_HEAD(sgx_poison_page_list);
 
 /*
  * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
@@ -62,6 +63,12 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
+		if (page->poison) {
+			list_del(&page->list);
+			list_add(&page->list, &sgx_poison_page_list);
+			continue;
+		}
+
 		ret = __eremove(sgx_get_epc_virt_addr(page));
 		if (!ret) {
 			/*
@@ -626,7 +633,11 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	spin_lock(&node->lock);
 
-	list_add_tail(&page->list, &node->free_page_list);
+	page->owner = NULL;
+	if (page->poison)
+		list_add(&page->list, &sgx_poison_page_list);
+	else
+		list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 	page->flags = 0;
 
@@ -658,6 +669,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 		section->pages[i].section = index;
 		section->pages[i].flags = SGX_EPC_PAGE_IN_USE;
 		section->pages[i].owner = NULL;
+		section->pages[i].poison = 0;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index f9202d3d6278..a990a4c9a00f 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -31,7 +31,8 @@
 
 struct sgx_epc_page {
 	unsigned int section;
-	unsigned int flags;
+	u16 flags;
+	u16 poison;
 	struct sgx_encl_page *owner;
 	struct list_head list;
 };
-- 
2.31.1
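
The routing done in sgx_free_epc_page() above can be modelled in a few
lines of userspace C. This is a sketch under simplifying assumptions, not
the kernel code: the demo_* names are hypothetical, there is no locking,
and a small LIFO list stands in for struct list_head (the real free list
is appended at the tail).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_IN_USE 0x2	/* same bit position as BIT(1) in the patch */

/* Simplified stand-in for struct sgx_epc_page. */
struct demo_epc_page {
	uint16_t flags;
	uint16_t poison;
	struct demo_epc_page *next;
};

/* Minimal LIFO list; the kernel uses struct list_head instead. */
struct demo_list {
	struct demo_epc_page *head;
	size_t count;
};

static void demo_list_push(struct demo_list *list, struct demo_epc_page *page)
{
	page->next = list->head;
	list->head = page;
	list->count++;
}

static struct demo_epc_page *demo_list_pop(struct demo_list *list)
{
	struct demo_epc_page *page = list->head;

	if (page) {
		list->head = page->next;
		page->next = NULL;
		list->count--;
	}
	return page;
}

/* Free path: a poisoned page is parked on the poison list, so the
 * allocator never sees it again. */
static void demo_free_page(struct demo_list *free_list,
			   struct demo_list *poison_list,
			   struct demo_epc_page *page)
{
	if (page->poison)
		demo_list_push(poison_list, page);
	else
		demo_list_push(free_list, page);
	page->flags = 0;
}

/* Allocation only ever looks at the free list. */
static struct demo_epc_page *demo_alloc_page(struct demo_list *free_list)
{
	struct demo_epc_page *page = demo_list_pop(free_list);

	if (page)
		page->flags = DEMO_PAGE_IN_USE;
	return page;
}
```

Freeing one clean and one poisoned page leaves exactly one page that can
be allocated again; the poisoned one stays on the poison list.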



* Re: [PATCH v7 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-09-28 20:53                     ` Luck, Tony
@ 2021-09-30 14:40                       ` Jarkko Sakkinen
  2021-09-30 18:02                         ` Luck, Tony
  0 siblings, 1 reply; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-09-30 14:40 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Sean Christopherson, Hansen, Dave, Zhang, Cathy, linux-sgx

On Tue, 2021-09-28 at 13:53 -0700, Luck, Tony wrote:
> On Tue, Sep 28, 2021 at 11:11:30PM +0300, Jarkko Sakkinen wrote:
> > On Tue, 2021-09-28 at 15:41 +0000, Luck, Tony wrote:
> > > > > Add debugfs files /sys/kernel/debug/sgx/poison_page_list so that system
> > > > > administrators get a list of those pages that have been dropped because
> > > > > of poison.
> > > > 
> > > > So, what would a sysadmin do with that detailed information?
> > > 
> > > It's going to be a rare case that there are any poisoned pages on that list
> > > (a large enough cluster will have some systems that have uncorrected
> > > recoverable errors in SGX EPC memory).
> > > 
> > > Even when there are some poisoned pages, there will only be a few. Systems
> > > that have thousands of pages with uncorrected memory errors will surely crash
> > > because one of those errors is going to either trigger an error marked as fatal,
> > > or the error won’t be recoverable by Linux because it is in kernel memory.
> > > 
> > > A sysadmin might add a script to run during system shutdown (or periodically
> > > during run-time) to save the poison page list. Then at startup run:
> > > 
> > > for addr in `cat saved_sgx_poison_page_list`
> > > do
> > > 	echo $addr > /sys/devices/system/memory/hard_offline_page
> > > done
> > > 
> > > to make poison persistent across reboots.
> > > 
> > > -Tony
> > 
> > Couldn't it be a blob with 8 bytes for each address?
> 
> It could be a blob. But that would require some perl/python
> instead of simple shell to do the above persistence trick.

The way I've understood it, a list of values breaks sysfs conventions.
There can be only a single value per attribute. Even if the blob is
interpreted as a list of integers, it is still a single value as far as
sysfs is concerned.

I'd also consider programs written in C, or perhaps Rust, when we
(ever) add any new sysfs for SGX. In my opinion, it makes sense to make
any uapi things we add accessible to as many tools as we can.

Such a trivially constructed blob is not enormously hard to parse in any
language, but at least I don't enjoy parsing a list of strings in C code,
whereas loading a blob is effortless.

This kind of shows why the current sysfs conventions make sense in the
first place: they force attributes to be designed so that they are as
reachable as possible. That's why I would follow the conventions
strictly.

Finally, I would make a proper sysfs attribute out of this (and a separate
patch), which would be available per node.

/Jarkko
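
To make the comparison concrete, here is a minimal sketch of reading such
a blob in C. The format (packed 64-bit physical addresses in host byte
order) is an assumption for illustration only; no such interface was
defined in this thread, and demo_decode_poison_blob() is a hypothetical
name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Decode a hypothetical blob of packed 64-bit physical addresses in
 * host byte order. Copies at most @max addresses to @out and returns
 * the number copied, or 0 if @len is not a multiple of 8.
 */
static size_t demo_decode_poison_blob(const unsigned char *blob, size_t len,
				      uint64_t *out, size_t max)
{
	size_t n = len / sizeof(uint64_t);
	size_t i;

	if (len % sizeof(uint64_t))
		return 0;
	if (n > max)
		n = max;

	/* memcpy avoids unaligned access if the buffer is not 8-byte aligned. */
	for (i = 0; i < n; i++)
		memcpy(&out[i], blob + i * sizeof(uint64_t), sizeof(uint64_t));

	return n;
}
```

A caller would read the whole attribute into a buffer and pass it in; no
string tokenizing or number parsing is involved.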



* Re: [PATCH v7 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-09-30 14:40                       ` Jarkko Sakkinen
@ 2021-09-30 18:02                         ` Luck, Tony
  0 siblings, 0 replies; 185+ messages in thread
From: Luck, Tony @ 2021-09-30 18:02 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Sean Christopherson, Hansen, Dave, Zhang, Cathy, linux-sgx

On Thu, Sep 30, 2021 at 05:40:18PM +0300, Jarkko Sakkinen wrote:
> On Tue, 2021-09-28 at 13:53 -0700, Luck, Tony wrote:
> > > Couldn't it be a blob with 8 bytes for each address?
> > 
> > It could be a blob. But that would require some perl/python
> > instead of simple shell to do the above persistence trick.
> 
> The way I've understood it, a list of values breaks sysfs conventions.
> There can be only a single value per attribute. Even if the blob is
> interpreted as a list of integers, it is still a single value as far as
> sysfs is concerned.
> 
> I'd also consider programs written in C, or perhaps Rust, when we
> (ever) add any new sysfs for SGX. In my opinion, it makes sense to make
> any uapi things we add accessible to as many tools as we can.
> 
> Such a trivially constructed blob is not enormously hard to parse in any
> language, but at least I don't enjoy parsing a list of strings in C code,
> whereas loading a blob is effortless.
> 
> This kind of shows why the current sysfs conventions make sense in the
> first place: they force attributes to be designed so that they are as
> reachable as possible. That's why I would follow the conventions
> strictly.
> 
> Finally, I would make a proper sysfs attribute out of this (and a separate
> patch), which would be available per node.

Those are all good points. I'm going to drop any interface from this
series (because that's above and beyond the goal of "basic machine check
support"). We can spend some time to come up with the right interface
and add that in a future series.

-Tony


* [PATCH v8 0/7] Basic recovery for machine checks inside SGX
  2021-09-27 21:34           ` [PATCH v7 0/7] Basic recovery for machine checks inside SGX Tony Luck
                               ` (6 preceding siblings ...)
  2021-09-27 21:34             ` [PATCH v7 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
@ 2021-10-01 16:47             ` Tony Luck
  2021-10-01 16:47               ` [PATCH v8 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
                                 ` (8 more replies)
  7 siblings, 9 replies; 185+ messages in thread
From: Tony Luck @ 2021-10-01 16:47 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

Now version 8

Note that linux-kernel@vger.kernel.org and x86@kernel.org are still
dropped from the distribution. Time to get some internal agreement on these
changes before bothering the x86 maintainers with yet another version.

So I'm still looking for Acked-by: or Reviewed-by: on any bits of this
series that are worthy, and comments on the problems I need to fix
in the not-worthy parts.

Changes since v7

Parts 1 & 2: Added "Reviewed-by" tag from Jarkko (Thanks!!!)

Part 3: Jarkko had many good questions about the debugfs interface
	that was included to display addresses of pages on the SGX
	poison list. I don't have good answers to them all. While
	this was a useful hook while I was testing these patches
	(checking that all the thousands of SGX pages into which
	I had injected errors showed up on the list) it isn't needed
	for basic recovery. So I dropped the debugfs bits from the
	patch. We can revisit later when there is a clear use case
	for what should be provided.

Parts 4-7: Unchanged.

Tony Luck (7):
  x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook arch_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()

 .../firmware-guide/acpi/apei/einj.rst         |  19 ++++
 arch/x86/include/asm/processor.h              |   8 ++
 arch/x86/include/asm/set_memory.h             |   4 +
 arch/x86/kernel/cpu/sgx/main.c                | 104 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |   6 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 drivers/acpi/apei/ghes.c                      |   2 +-
 include/linux/mm.h                            |  14 +++
 mm/memory-failure.c                           |  19 +++-
 9 files changed, 168 insertions(+), 11 deletions(-)


base-commit: 5816b3e6577eaa676ceb00a848f0fd65fe2adc29
-- 
2.31.1



* [PATCH v8 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages
  2021-10-01 16:47             ` [PATCH v8 0/7] Basic recovery for machine checks inside SGX Tony Luck
@ 2021-10-01 16:47               ` Tony Luck
  2021-10-01 16:47               ` [PATCH v8 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
                                 ` (7 subsequent siblings)
  8 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-10-01 16:47 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

SGX EPC pages go through the following life cycle:

        DIRTY ---> FREE ---> IN-USE --\
                    ^                 |
                    \-----------------/

Recovery action for poison for a DIRTY or FREE page is simple. Just
make sure never to allocate the page. IN-USE pages need some extra
handling.

Add a new flag bit SGX_EPC_PAGE_IN_USE that is set when a page
is allocated and cleared when the page is freed.

Notes:

1) These transitions are made while holding the node->lock so that
   future code that checks the flags while holding the node->lock
   can be sure that if the SGX_EPC_PAGE_IN_USE bit is clear, then the
   page is on the free list.

2) Initially while the pages are on the dirty list the
   SGX_EPC_PAGE_IN_USE bit is set.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 4 +++-
 arch/x86/kernel/cpu/sgx/sgx.h  | 3 +++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de02bbcc..d18988a46c13 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -472,6 +472,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
 	list_del_init(&page->list);
 	sgx_nr_free_pages--;
+	page->flags = SGX_EPC_PAGE_IN_USE;
 
 	spin_unlock(&node->lock);
 
@@ -626,6 +627,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
+	page->flags = 0;
 
 	spin_unlock(&node->lock);
 }
@@ -651,7 +653,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
-		section->pages[i].flags = 0;
+		section->pages[i].flags = SGX_EPC_PAGE_IN_USE;
 		section->pages[i].owner = NULL;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..f9202d3d6278 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -26,6 +26,9 @@
 /* Pages, which are being tracked by the page reclaimer. */
 #define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
 
+/* Allocated pages */
+#define SGX_EPC_PAGE_IN_USE		BIT(1)
+
 struct sgx_epc_page {
 	unsigned int section;
 	unsigned int flags;
-- 
2.31.1
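
The flag transitions in the diff above can be walked through with a small
userspace model. It is a sketch only: the demo_* names and the "which
list" field are hypothetical bookkeeping, locking is omitted, and it
assumes that a sanitized dirty page reaches the free list through the
same free path.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PAGE_IN_USE (1u << 1)	/* mirrors BIT(1) in the patch */

/* Which list the page currently sits on (hypothetical bookkeeping). */
enum demo_list_id { DEMO_LIST_DIRTY, DEMO_LIST_FREE, DEMO_LIST_NONE };

struct demo_page {
	unsigned int flags;
	enum demo_list_id list;
};

/* Boot: every page starts on the dirty list with the bit set. */
static void demo_setup(struct demo_page *page)
{
	page->flags = DEMO_PAGE_IN_USE;
	page->list = DEMO_LIST_DIRTY;
}

/* Free: the page goes on the free list and the bit is cleared. */
static void demo_free(struct demo_page *page)
{
	page->list = DEMO_LIST_FREE;
	page->flags = 0;
}

/* Sanitize: assumed here to hand a cleaned dirty page to the free path. */
static void demo_sanitize(struct demo_page *page)
{
	assert(page->list == DEMO_LIST_DIRTY);
	demo_free(page);
}

/* Allocate: the page leaves the free list and the bit is set. */
static void demo_alloc(struct demo_page *page)
{
	assert(page->list == DEMO_LIST_FREE);
	page->list = DEMO_LIST_NONE;
	page->flags = DEMO_PAGE_IN_USE;
}

/* In this model the bit is clear exactly when the page is on the free list. */
static bool demo_flag_matches_list(const struct demo_page *page)
{
	bool on_free_list = page->list == DEMO_LIST_FREE;
	bool in_use = page->flags & DEMO_PAGE_IN_USE;

	return on_free_list == !in_use;
}
```

Running a page through setup, sanitize, alloc and free keeps the check in
demo_flag_matches_list() true at every step.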



* [PATCH v8 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-10-01 16:47             ` [PATCH v8 0/7] Basic recovery for machine checks inside SGX Tony Luck
  2021-10-01 16:47               ` [PATCH v8 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
@ 2021-10-01 16:47               ` Tony Luck
  2021-10-01 16:47               ` [PATCH v8 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
                                 ` (6 subsequent siblings)
  8 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-10-01 16:47 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.

SGX EPC pages do not have Linux "struct page" associated with them.

Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray.

Create a function arch_is_platform_page() that simply reports whether an
address is an EPC page for use elsewhere in the kernel. The ACPI error
injection code needs this function and is typically built as a module,
so export it.

Note that arch_is_platform_page() will be slower than other similar
"what type is this page" functions that can simply check bits in the
"struct page".  If there is some future performance critical user of
this function it may need to be implemented in a more efficient way.

Note also that the current implementation of xarray allocates a few
hundred kilobytes for this usage on a system with 4GB of SGX EPC memory
configured. This isn't ideal, but worth it for the code simplicity.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index d18988a46c13..09fa42690ff2 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -20,6 +20,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
 static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
+static DEFINE_XARRAY(sgx_epc_address_space);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
@@ -650,6 +651,8 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	xa_store_range(&sgx_epc_address_space, section->phys_addr,
+		       phys_addr + size - 1, section, GFP_KERNEL);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -661,6 +664,12 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+bool arch_is_platform_page(u64 paddr)
+{
+	return !!xa_load(&sgx_epc_address_space, paddr);
+}
+EXPORT_SYMBOL_GPL(arch_is_platform_page);
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1
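
The lookup that arch_is_platform_page() performs can be illustrated in
userspace with a plain table of address ranges. This is a sketch, not the
patch's method: the kernel code stores each section in an xarray, whereas
the hypothetical demo_* helpers below use a fixed array and a linear scan
purely to show the inclusive [phys_addr, phys_addr + size - 1] range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_SECTIONS 8

/* One EPC section, recorded as an inclusive physical address range. */
struct demo_section {
	uint64_t first;
	uint64_t last;
};

static struct demo_section demo_sections[DEMO_MAX_SECTIONS];
static size_t demo_nr_sections;

/* Record a section covering [phys_addr, phys_addr + size - 1]. */
static bool demo_store_range(uint64_t phys_addr, uint64_t size)
{
	if (!size || demo_nr_sections == DEMO_MAX_SECTIONS)
		return false;

	demo_sections[demo_nr_sections].first = phys_addr;
	demo_sections[demo_nr_sections].last = phys_addr + size - 1;
	demo_nr_sections++;
	return true;
}

/* Return the section containing @paddr, or NULL if there is none. */
static const struct demo_section *demo_load(uint64_t paddr)
{
	size_t i;

	for (i = 0; i < demo_nr_sections; i++) {
		if (paddr >= demo_sections[i].first &&
		    paddr <= demo_sections[i].last)
			return &demo_sections[i];
	}
	return NULL;
}

/* Same shape as the patch: any hit means "this is a platform page". */
static bool demo_is_platform_page(uint64_t paddr)
{
	return demo_load(paddr) != NULL;
}
```

The last byte of a section is still inside it, and the byte after it is
not, which matches the "- 1" in the xa_store_range() call.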



* [PATCH v8 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-10-01 16:47             ` [PATCH v8 0/7] Basic recovery for machine checks inside SGX Tony Luck
  2021-10-01 16:47               ` [PATCH v8 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
  2021-10-01 16:47               ` [PATCH v8 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
@ 2021-10-01 16:47               ` Tony Luck
  2021-10-04 23:24                 ` Jarkko Sakkinen
  2021-10-01 16:47               ` [PATCH v8 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
                                 ` (5 subsequent siblings)
  8 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-10-01 16:47 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

A memory controller patrol scrubber can report poison in a page
that isn't currently being used.

Add "poison" field in the sgx_epc_page that can be set for an
sgx_epc_page. Check for it:
1) When sanitizing dirty pages
2) When freeing epc pages

Poison is a new field separated from flags to avoid having to make
all updates to flags atomic, or integrate poison state changes into
some other locking scheme to protect flags.

In both cases place the poisoned page on a list of poisoned epc pages
to make sure it will not be reallocated.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 14 +++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h  |  3 ++-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 09fa42690ff2..653bace26100 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -43,6 +43,7 @@ static nodemask_t sgx_numa_mask;
 static struct sgx_numa_node *sgx_numa_nodes;
 
 static LIST_HEAD(sgx_dirty_page_list);
+static LIST_HEAD(sgx_poison_page_list);
 
 /*
  * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
@@ -62,6 +63,12 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
+		if (page->poison) {
+			list_del(&page->list);
+			list_add(&page->list, &sgx_poison_page_list);
+			continue;
+		}
+
 		ret = __eremove(sgx_get_epc_virt_addr(page));
 		if (!ret) {
 			/*
@@ -626,7 +633,11 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	spin_lock(&node->lock);
 
-	list_add_tail(&page->list, &node->free_page_list);
+	page->owner = NULL;
+	if (page->poison)
+		list_add(&page->list, &sgx_poison_page_list);
+	else
+		list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 	page->flags = 0;
 
@@ -658,6 +669,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 		section->pages[i].section = index;
 		section->pages[i].flags = SGX_EPC_PAGE_IN_USE;
 		section->pages[i].owner = NULL;
+		section->pages[i].poison = 0;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index f9202d3d6278..a990a4c9a00f 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -31,7 +31,8 @@
 
 struct sgx_epc_page {
 	unsigned int section;
-	unsigned int flags;
+	u16 flags;
+	u16 poison;
 	struct sgx_encl_page *owner;
 	struct list_head list;
 };
-- 
2.31.1
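
For illustration only (this is not part of the patch): a minimal
userspace sketch of the free-path behaviour described above, assuming
simplified stand-ins for the kernel structures. The names epc_page,
push(), free_epc_page() and alloc_epc_page() are hypothetical, the
singly linked stacks replace the kernel's list_head lists, and the
node->lock and sgx_nr_free_pages accounting are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct sgx_epc_page. */
struct epc_page {
	unsigned short flags;
	unsigned short poison;
	struct epc_page *next;
};

/* Singly linked stacks stand in for the kernel's list_head lists. */
static struct epc_page *free_list;
static struct epc_page *poison_list;

static void push(struct epc_page **head, struct epc_page *page)
{
	page->next = *head;
	*head = page;
}

/*
 * Same decision as the patched sgx_free_epc_page(): a poisoned page
 * is parked on the poison list instead of going back to the free list.
 */
static void free_epc_page(struct epc_page *page)
{
	page->flags = 0;
	if (page->poison)
		push(&poison_list, page);
	else
		push(&free_list, page);
}

/* Allocation only looks at the free list, so poisoned pages never return. */
static struct epc_page *alloc_epc_page(void)
{
	struct epc_page *page = free_list;

	if (page) {
		free_list = page->next;
		page->next = NULL;
		page->flags = 1; /* stands in for SGX_EPC_PAGE_IN_USE */
	}
	return page;
}
```

Because the allocator never walks the poison list, a page that lands
there cannot be handed out again, which is the only property this
patch relies on.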


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v8 4/7] x86/sgx: Add SGX infrastructure to recover from poison
  2021-10-01 16:47             ` [PATCH v8 0/7] Basic recovery for machine checks inside SGX Tony Luck
                                 ` (2 preceding siblings ...)
  2021-10-01 16:47               ` [PATCH v8 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-10-01 16:47               ` Tony Luck
  2021-10-04 23:30                 ` Jarkko Sakkinen
  2021-10-01 16:47               ` [PATCH v8 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
                                 ` (4 subsequent siblings)
  8 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-10-01 16:47 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

Provide a recovery function arch_memory_failure(). If the poison was
consumed synchronously then send a SIGBUS. Note that, unlike poison
outside of SGX enclaves, the virtual address of the access is not
included with the SIGBUS. This doesn't matter as addresses of
code/data inside an enclave are of little to no use to code executing
outside the (now dead) enclave.

Poison found in a free page results in the page being moved from the
free list to the poison page list.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 77 ++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 653bace26100..398c9749e4d1 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -682,6 +682,83 @@ bool arch_is_platform_page(u64 paddr)
 }
 EXPORT_SYMBOL_GPL(arch_is_platform_page);
 
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+
+	section = xa_load(&sgx_epc_address_space, paddr);
+	if (!section)
+		return NULL;
+
+	return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+}
+
+/*
+ * Called in process context to handle a hardware reported
+ * error in an SGX EPC page.
+ * If the MF_ACTION_REQUIRED bit is set in flags, then the
+ * context is the task that consumed the poison data. Otherwise
+ * this is called from a kernel thread unrelated to the page.
+ */
+int arch_memory_failure(unsigned long pfn, int flags)
+{
+	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+	struct sgx_epc_section *section;
+	struct sgx_numa_node *node;
+
+	/*
+	 * mm/memory-failure.c calls this routine for all errors
+	 * where there isn't a "struct page" for the address. But that
+	 * includes other address ranges besides SGX.
+	 */
+	if (!page)
+		return -ENXIO;
+
+	/*
+	 * If poison was consumed synchronously, send a SIGBUS to
+	 * the task. Hardware has already exited the SGX enclave and
+	 * will not allow re-entry to an enclave that has a memory
+	 * error. The signal may help the task understand why the
+	 * enclave is broken.
+	 */
+	if (flags & MF_ACTION_REQUIRED)
+		force_sig(SIGBUS);
+
+	section = &sgx_epc_sections[page->section];
+	node = section->node;
+
+	spin_lock(&node->lock);
+
+	/* Already poisoned? Nothing more to do */
+	if (page->poison)
+		goto out;
+
+	page->poison = 1;
+
+	/*
+	 * If flags is zero, then the page is on a free list.
+	 * Move it to the poison page list.
+	 */
+	if (!page->flags) {
+		list_del(&page->list);
+		list_add(&page->list, &sgx_poison_page_list);
+		goto out;
+	}
+
+	/*
+	 * TBD: Add additional plumbing to enable pre-emptive
+	 * action for asynchronous poison notification. Until
+	 * then just hope that the poison:
+	 * a) is not accessed - sgx_free_epc_page() will deal with it
+	 *    when the user gives it back
+	 * b) results in a recoverable machine check rather than
+	 *    a fatal one
+	 */
+out:
+	spin_unlock(&node->lock);
+	return 0;
+}
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1
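
For illustration only (this is not part of the patch): a userspace
sketch of the address-to-page lookup that sgx_paddr_to_page()
performs. It assumes 4K pages and replaces the xa_load() on
sgx_epc_address_space with a linear scan over a small array of
sections. The names epc_section, find_section() and paddr_to_page()
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PFN_DOWN(x)	((x) >> PAGE_SHIFT)

/* Hypothetical stand-ins for the SGX page and section structures. */
struct epc_page {
	unsigned int section;
	unsigned short poison;
};

struct epc_section {
	uint64_t phys_addr;	/* first byte of the section */
	uint64_t size;		/* length in bytes */
	struct epc_page *pages;	/* one entry per 4K page */
};

/* Linear scan standing in for xa_load(&sgx_epc_address_space, paddr). */
static struct epc_section *find_section(struct epc_section *secs, size_t n,
					uint64_t paddr)
{
	for (size_t i = 0; i < n; i++) {
		if (paddr >= secs[i].phys_addr &&
		    paddr - secs[i].phys_addr < secs[i].size)
			return &secs[i];
	}
	return NULL;
}

/* Index into the section's page array by page offset from its base. */
static struct epc_page *paddr_to_page(struct epc_section *secs, size_t n,
				      uint64_t paddr)
{
	struct epc_section *sec = find_section(secs, n, paddr);

	if (!sec)
		return NULL;

	return &sec->pages[PFN_DOWN(paddr - sec->phys_addr)];
}
```

A NULL return corresponds to the -ENXIO path in arch_memory_failure():
the address is not EPC memory, so the generic code carries on with its
other checks.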


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v8 5/7] x86/sgx: Hook arch_memory_failure() into mainline code
  2021-10-01 16:47             ` [PATCH v8 0/7] Basic recovery for machine checks inside SGX Tony Luck
                                 ` (3 preceding siblings ...)
  2021-10-01 16:47               ` [PATCH v8 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
@ 2021-10-01 16:47               ` Tony Luck
  2021-10-01 16:47               ` [PATCH v8 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
                                 ` (3 subsequent siblings)
  8 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-10-01 16:47 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

Add a call inside memory_failure() to call the arch specific code
to check if the address is an SGX EPC page and handle it.

Note the SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.

Pull the call to acquire the mutex earlier so the SGX errors are also
protected.

Make set_mce_nospec() skip SGX pages when trying to adjust
the 1:1 map.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/processor.h  |  8 ++++++++
 arch/x86/include/asm/set_memory.h |  4 ++++
 include/linux/mm.h                | 14 ++++++++++++++
 mm/memory-failure.c               | 19 +++++++++++++------
 4 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 9ad2acaaae9b..4865f2860a4f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -853,4 +853,12 @@ enum mds_mitigations {
 	MDS_MITIGATION_VMWERV,
 };
 
+#ifdef CONFIG_X86_SGX
+int arch_memory_failure(unsigned long pfn, int flags);
+#define arch_memory_failure arch_memory_failure
+
+bool arch_is_platform_page(u64 paddr);
+#define arch_is_platform_page arch_is_platform_page
+#endif
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 43fa081a1adb..ce8dd215f5b3 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_SET_MEMORY_H
 #define _ASM_X86_SET_MEMORY_H
 
+#include <linux/mm.h>
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
 
@@ -98,6 +99,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
 	unsigned long decoy_addr;
 	int rc;
 
+	/* SGX pages are not in the 1:1 map */
+	if (arch_is_platform_page(pfn << PAGE_SHIFT))
+		return 0;
 	/*
 	 * We would like to just call:
 	 *      set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 73a52aba448f..62b199ed5ec6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3284,5 +3284,19 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
 	return 0;
 }
 
+#ifndef arch_memory_failure
+static inline int arch_memory_failure(unsigned long pfn, int flags)
+{
+	return -ENXIO;
+}
+#endif
+
+#ifndef arch_is_platform_page
+static inline bool arch_is_platform_page(u64 paddr)
+{
+	return false;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3e6449f2102a..b1cbf9845c19 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1632,21 +1632,28 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
 
+	mutex_lock(&mf_mutex);
+
 	p = pfn_to_online_page(pfn);
 	if (!p) {
+		res = arch_memory_failure(pfn, flags);
+		if (res == 0)
+			goto unlock_mutex;
+
 		if (pfn_valid(pfn)) {
 			pgmap = get_dev_pagemap(pfn, NULL);
-			if (pgmap)
-				return memory_failure_dev_pagemap(pfn, flags,
-								  pgmap);
+			if (pgmap) {
+				res = memory_failure_dev_pagemap(pfn, flags,
+								 pgmap);
+				goto unlock_mutex;
+			}
 		}
 		pr_err("Memory failure: %#lx: memory outside kernel control\n",
 			pfn);
-		return -ENXIO;
+		res = -ENXIO;
+		goto unlock_mutex;
 	}
 
-	mutex_lock(&mf_mutex);
-
 try_again:
 	if (PageHuge(p)) {
 		res = memory_failure_hugetlb(pfn, flags);
-- 
2.31.1
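
For illustration only (this is not part of the patch): a standalone
sketch of the override idiom used above, where the arch header both
declares the function and defines a macro of the same name so that
the generic header can skip its fallback. DEMO_HAS_PLATFORM_PAGES is
a hypothetical stand-in for CONFIG_X86_SGX, the address range is made
up, and needs_direct_map_update() is a hypothetical caller loosely
modelled on the check added to set_mce_nospec().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for CONFIG_X86_SGX being enabled. */
#define DEMO_HAS_PLATFORM_PAGES 1

/* "Arch header": provides the real implementation plus a same-named macro. */
#ifdef DEMO_HAS_PLATFORM_PAGES
static inline bool arch_is_platform_page(uint64_t paddr)
{
	/* Made-up platform range: [4 GiB, 4 GiB + 1 MiB). */
	return paddr >= 0x100000000ULL && paddr < 0x100100000ULL;
}
#define arch_is_platform_page arch_is_platform_page
#endif

/* "Generic header": fallback used only when no arch override exists. */
#ifndef arch_is_platform_page
static inline bool arch_is_platform_page(uint64_t paddr)
{
	(void)paddr;
	return false;
}
#endif

/* Generic caller: platform pages are not in the 1:1 map, so skip them. */
static bool needs_direct_map_update(uint64_t paddr)
{
	return !arch_is_platform_page(paddr);
}
```

Removing the DEMO_HAS_PLATFORM_PAGES define makes the fallback take
over, and every caller then sees "not a platform page" without any
#ifdef at the call site.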


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v8 6/7] x86/sgx: Add hook to error injection address validation
  2021-10-01 16:47             ` [PATCH v8 0/7] Basic recovery for machine checks inside SGX Tony Luck
                                 ` (4 preceding siblings ...)
  2021-10-01 16:47               ` [PATCH v8 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
@ 2021-10-01 16:47               ` Tony Luck
  2021-10-01 16:47               ` [PATCH v8 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
                                 ` (2 subsequent siblings)
  8 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-10-01 16:47 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

SGX reserved memory does not appear in the standard address maps.

Add hook to call into the SGX code to check if an address is located
in SGX memory.

There are other challenges in injecting errors into SGX. Update the
documentation with a sequence of operations to inject.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 .../firmware-guide/acpi/apei/einj.rst         | 19 +++++++++++++++++++
 drivers/acpi/apei/einj.c                      |  3 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index c042176e1707..55e2331a6438 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,5 +181,24 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+Special notes for injection into SGX enclaves:
+
+There may be a separate BIOS setup option to enable SGX injection.
+
+The injection process consists of setting some special memory controller
+trigger that will inject the error on the next write to the target
+address. But the h/w prevents any software outside of an SGX enclave
+from accessing enclave pages (even BIOS SMM mode).
+
+The following sequence can be used:
+  1) Determine physical address of enclave page
+  2) Use "notrigger=1" mode to inject (this will set up
+     the injection address, but will not actually inject)
+  3) Enter the enclave
+  4) Store data to the virtual address matching physical address from step 1
+  5) Execute CLFLUSH for that virtual address
+  6) Spin delay for 250ms
+  7) Read from the virtual address. This will trigger the error
+
 For more information about EINJ, please refer to ACPI specification
 version 4.0, section 17.5 and ACPI 5.0, section 18.6.
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 2882450c443e..67c335baad52 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -544,7 +544,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	    ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
 				!= REGION_INTERSECTS) &&
 	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
-				!= REGION_INTERSECTS)))
+				!= REGION_INTERSECTS) &&
+	     !arch_is_platform_page(base_addr)))
 		return -EINVAL;
 
 inject:
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v8 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  2021-10-01 16:47             ` [PATCH v8 0/7] Basic recovery for machine checks inside SGX Tony Luck
                                 ` (5 preceding siblings ...)
  2021-10-01 16:47               ` [PATCH v8 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
@ 2021-10-01 16:47               ` Tony Luck
  2021-10-04 21:56               ` [PATCH v8 0/7] Basic recovery for machine checks inside SGX Reinette Chatre
  2021-10-11 18:59               ` [PATCH v9 " Tony Luck
  8 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-10-01 16:47 UTC (permalink / raw)
  To: Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx, Tony Luck

SGX EPC pages do not have a "struct page" associated with them so the
pfn_valid() sanity check fails and results in a warning message to
the console.

Add an additional check to skip the warning if the address of the error
is in an SGX EPC page.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 drivers/acpi/apei/ghes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0c8330ed1ffd..0c5c9acc6254 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -449,7 +449,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 		return false;
 
 	pfn = PHYS_PFN(physical_addr);
-	if (!pfn_valid(pfn)) {
+	if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX
 		"Invalid address in generic error data: %#llx\n",
 		physical_addr);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v8 0/7] Basic recovery for machine checks inside SGX
  2021-10-01 16:47             ` [PATCH v8 0/7] Basic recovery for machine checks inside SGX Tony Luck
                                 ` (6 preceding siblings ...)
  2021-10-01 16:47               ` [PATCH v8 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
@ 2021-10-04 21:56               ` Reinette Chatre
  2021-10-11 18:59               ` [PATCH v9 " Tony Luck
  8 siblings, 0 replies; 185+ messages in thread
From: Reinette Chatre @ 2021-10-04 21:56 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Jarkko Sakkinen, Dave Hansen
  Cc: Cathy Zhang, linux-sgx

Hi Tony,

On 10/1/2021 9:47 AM, Tony Luck wrote:
> Now version 8
> 
> Note that linux-kernel@vger.kernel.org and x86@kernel.org are still
> dropped from the distribution. Time to get some internal agreement on these
> changes before bothering the x86 maintainers with yet another version.
> 
> So I'm still looking for Acked-by: or Reviewed-by: on any bits of this
> series that are worthy, and comments on the problems I need to fix
> in the not-worthy parts.

Tested-by: Reinette Chatre <reinette.chatre@intel.com>

Details of testing:
I was curious how the signaling worked after reading some snippets that 
a vDSO does not generate a signal. To help my understanding I created 
the test below in the SGX selftests that implements the test sequence 
you document in patch 6/7 and with it I can see how the SIGBUS is delivered:

[BEGIN TEST OUTPUT]
#  RUN           enclave.mce ...
MCE test from pid 2746 on addr 0x7fc879644000
Set up error injection on virt 0x7fc879644000 and press any key to continue test ...

# mce: Test terminated unexpectedly by signal 7
[END TEST OUTPUT]

Below is the test I ran. It only implements steps 3 to 7 from the test 
sequence you document in patch 6/7. It does still require manual 
intervention to determine the physical address and trigger the error 
injection on the physical address. It also currently treats the SIGBUS 
as a test failure, which I did to help clearly see the signal, but it 
could be changed to TEST_F_SIGNAL to have a SIGBUS mean success. The 
test is thus not appropriate for the SGX selftests in its current form 
but is provided as informational to describe the testing I did. It 
applies on top of the recent SGX selftest changes from: 
https://lore.kernel.org/lkml/cover.1631731214.git.reinette.chatre@intel.com/

---8<---
  tools/testing/selftests/sgx/defines.h   |  7 +++++
  tools/testing/selftests/sgx/main.c      | 35 +++++++++++++++++++++
  tools/testing/selftests/sgx/test_encl.c | 41 +++++++++++++++++++++++++
  3 files changed, 83 insertions(+)

diff --git a/tools/testing/selftests/sgx/defines.h b/tools/testing/selftests/sgx/defines.h
index 02d775789ea7..2b471ba68e91 100644
--- a/tools/testing/selftests/sgx/defines.h
+++ b/tools/testing/selftests/sgx/defines.h
@@ -24,6 +24,7 @@ enum encl_op_type {
  	ENCL_OP_PUT_TO_ADDRESS,
  	ENCL_OP_GET_FROM_ADDRESS,
  	ENCL_OP_NOP,
+	ENCL_OP_MCE,
  	ENCL_OP_MAX,
  };

@@ -53,4 +54,10 @@ struct encl_op_get_from_addr {
  	uint64_t addr;
  };

+struct encl_op_mce {
+	struct encl_op_header header;
+	uint64_t addr;
+	uint64_t value;
+	uint64_t delay_cycles;
+};
  #endif /* DEFINES_H */
diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index 79669c245f94..2979306a687a 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -594,4 +594,39 @@ TEST_F(enclave, pte_permissions)
  	EXPECT_EQ(self->run.exception_addr, 0);
  }

+TEST_F_TIMEOUT(enclave, mce, 600)
+{
+	struct encl_op_mce mce_op;
+	unsigned long data_start;
+
+	ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+
+	memset(&self->run, 0, sizeof(self->run));
+	self->run.tcs = self->encl.encl_base;
+
+	data_start = self->encl.encl_base +
+		     encl_get_data_offset(&self->encl) +
+		     PAGE_SIZE;
+
+	printf("MCE test from pid %d on addr 0x%lx\n", getpid(), data_start);
+	/*
+	 * Pause so that error injection can be set up manually on the
+	 * physical address backing this page before the enclave touches it.
+	 */
+
+	printf("Set up error injection on virt 0x%lx and press any key to continue test ...\n", data_start);
+	getchar();
+	mce_op.value = MAGIC;
+	mce_op.addr = data_start;
+	mce_op.delay_cycles = 600000000;
+	mce_op.header.type = ENCL_OP_MCE;
+
+	EXPECT_EQ(ENCL_CALL(&mce_op, &self->run, true), 0);
+
+	EXPECT_EEXIT(&self->run);
+	EXPECT_EQ(self->run.exception_vector, 0);
+	EXPECT_EQ(self->run.exception_error_code, 0);
+	EXPECT_EQ(self->run.exception_addr, 0);
+}
+
  TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/sgx/test_encl.c b/tools/testing/selftests/sgx/test_encl.c
index 4fca01cfd898..223a80529ba6 100644
--- a/tools/testing/selftests/sgx/test_encl.c
+++ b/tools/testing/selftests/sgx/test_encl.c
@@ -11,6 +11,11 @@
   */
  static uint8_t encl_buffer[8192] = { 1 };

+static inline void clflush(volatile void *__p)
+{
+	asm volatile("clflush %0" : "+m" (*(volatile char __force *)__p));
+}
+
  static void *memcpy(void *dest, const void *src, size_t n)
  {
  	size_t i;
@@ -35,6 +40,30 @@ static void do_encl_op_get_from_buf(void *op)
  	memcpy(&op2->value, &encl_buffer[0], 8);
  }

+static __always_inline unsigned long long read_tsc(void)
+{
+	unsigned long low, high;
+	asm volatile("rdtsc" : "=a" (low), "=d" (high) :: "ecx");
+	return (low | high << 32);
+}
+
+static __always_inline void rep_nop(void)
+{
+	asm volatile("rep;nop": : :"memory");
+}
+
+void delay_mce(unsigned cycles)
+{
+	unsigned long long start, now;
+	start = read_tsc();
+	for (;;) {
+		now = read_tsc();
+		if (now - start >= cycles)
+			break;
+		rep_nop();
+	}
+}
+
  static void do_encl_op_put_to_addr(void *_op)
  {
  	struct encl_op_put_to_addr *op = _op;
@@ -49,6 +78,17 @@ static void do_encl_op_get_from_addr(void *_op)
  	memcpy(&op->value, (void *)op->addr, 8);
  }

+static void do_encl_op_mce(void *_op)
+{
+	struct encl_op_mce *op = _op;
+
+	memcpy((void *)op->addr, &op->value, 8);
+	clflush((void *)op->addr);
+	delay_mce(op->delay_cycles);
+	memcpy(&op->value, (void *)op->addr, 8);
+}
+
+
  static void do_encl_op_nop(void *_op)
  {

@@ -62,6 +102,7 @@ void encl_body(void *rdi,  void *rsi)
  		do_encl_op_put_to_addr,
  		do_encl_op_get_from_addr,
  		do_encl_op_nop,
+		do_encl_op_mce,
  	};

  	struct encl_op_header *op = (struct encl_op_header *)rdi;

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v8 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-10-01 16:47               ` [PATCH v8 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-10-04 23:24                 ` Jarkko Sakkinen
  0 siblings, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-10-04 23:24 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Dave Hansen; +Cc: Cathy Zhang, linux-sgx

On Fri, 2021-10-01 at 09:47 -0700, Tony Luck wrote:
> A memory controller patrol scrubber can report poison in a page
> that isn't currently being used.
> 
> Add "poison" field in the sgx_epc_page that can be set for an
> sgx_epc_page. Check for it:
> 1) When sanitizing dirty pages
> 2) When freeing epc pages
> 
> Poison is a new field separated from flags to avoid having to make
> all updates to flags atomic, or integrate poison state changes into
> some other locking scheme to protect flags.
> 
> In both cases place the poisoned page on a list of poisoned epc pages
> to make sure it will not be reallocated.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/main.c | 14 +++++++++++++-
>  arch/x86/kernel/cpu/sgx/sgx.h  |  3 ++-
>  2 files changed, 15 insertions(+), 2 deletions(-)

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

/Jarkko


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v8 4/7] x86/sgx: Add SGX infrastructure to recover from poison
  2021-10-01 16:47               ` [PATCH v8 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
@ 2021-10-04 23:30                 ` Jarkko Sakkinen
  0 siblings, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-10-04 23:30 UTC (permalink / raw)
  To: Tony Luck, Sean Christopherson, Dave Hansen; +Cc: Cathy Zhang, linux-sgx

On Fri, 2021-10-01 at 09:47 -0700, Tony Luck wrote:
> Provide a recovery function sgx_memory_failure(). If the poison was
> consumed synchronously then send a SIGBUS. Note that the virtual
> address of the access is not included with the SIGBUS as is the case
> for poison outside of SGX enclaves. This doesn't matter as addresses
> of code/data inside an enclave is of little to no use to code executing
> outside the (now dead) enclave.
> 
> Poison found in a free page results in the page being moved from the
> free list to the poison page list.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>


Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

/Jarkko


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v9 0/7] Basic recovery for machine checks inside SGX
  2021-10-01 16:47             ` [PATCH v8 0/7] Basic recovery for machine checks inside SGX Tony Luck
                                 ` (7 preceding siblings ...)
  2021-10-04 21:56               ` [PATCH v8 0/7] Basic recovery for machine checks inside SGX Reinette Chatre
@ 2021-10-11 18:59               ` Tony Luck
  2021-10-11 18:59                 ` [PATCH v9 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
                                   ` (8 more replies)
  8 siblings, 9 replies; 185+ messages in thread
From: Tony Luck @ 2021-10-11 18:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck

Posting latest version to a slightly wider audience.

The big picture is that SGX uses some memory pages that are walled off
from access by the OS. This means they:
1) Don't have "struct page" describing them
2) Don't appear in the kernel 1:1 map

But they are still backed by normal DDR memory, so errors can occur.

Parts 1-4 of this series handle the internal SGX bits to keep track of
these pages in an error context. They've had a fair amount of review
on the linux-sgx list (but if any of the 37 subscribers to that list
not named Jarkko or Reinette want to chime in with extra comments and
{Acked,Reviewed,Tested}-by that would be great).

Linux-mm reviewers can (if they like) skip to part 5 where two changes are
made: 1) Hook into memory_failure() in the same spot as device mapping 2)
Skip trying to change 1:1 map (since SGX pages aren't there).

The hooks have generic looking names rather than specifically saying
"sgx" at the suggestion of Dave Hansen. I'm not wedded to the names,
so better suggestions welcome.  I could also change to using some
"ARCH_HAS_PLATFORM_PAGES" config bits if that's the current fashion.

Rafael (and other ACPI list readers) can skip to parts 6 & 7 where there
are hooks into error injection and reporting to simply say "these odd
looking physical addresses are actually ok to use". I added some extra
notes to the einj.rst documentation on how to inject into SGX memory.

Tony Luck (7):
  x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook arch_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()

 .../firmware-guide/acpi/apei/einj.rst         |  19 ++++
 arch/x86/include/asm/processor.h              |   8 ++
 arch/x86/include/asm/set_memory.h             |   4 +
 arch/x86/kernel/cpu/sgx/main.c                | 104 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |   6 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 drivers/acpi/apei/ghes.c                      |   2 +-
 include/linux/mm.h                            |  14 +++
 mm/memory-failure.c                           |  19 +++-
 9 files changed, 168 insertions(+), 11 deletions(-)


base-commit: 64570fbc14f8d7cb3fe3995f20e26bc25ce4b2cc
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v9 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages
  2021-10-11 18:59               ` [PATCH v9 " Tony Luck
@ 2021-10-11 18:59                 ` Tony Luck
  2021-10-15 22:57                   ` Sean Christopherson
  2021-10-11 18:59                 ` [PATCH v9 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
                                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-10-11 18:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

SGX EPC pages go through the following life cycle:

        DIRTY ---> FREE ---> IN-USE --\
                    ^                 |
                    \-----------------/

Recovery action for poison for a DIRTY or FREE page is simple. Just
make sure never to allocate the page. IN-USE pages need some extra
handling.

Add a new flag bit SGX_EPC_PAGE_IN_USE that is set when a page
is allocated and cleared when the page is freed.

Notes:

1) These transitions are made while holding the node->lock so that
   future code that checks the flags while holding the node->lock
   can be sure that if the SGX_EPC_PAGE_IN_USE bit is clear, then the
   page is on the free list.

2) Initially while the pages are on the dirty list the
   SGX_EPC_PAGE_IN_USE bit is set.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 4 +++-
 arch/x86/kernel/cpu/sgx/sgx.h  | 3 +++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de02bbcc..d18988a46c13 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -472,6 +472,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
 	list_del_init(&page->list);
 	sgx_nr_free_pages--;
+	page->flags = SGX_EPC_PAGE_IN_USE;
 
 	spin_unlock(&node->lock);
 
@@ -626,6 +627,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
+	page->flags = 0;
 
 	spin_unlock(&node->lock);
 }
@@ -651,7 +653,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
-		section->pages[i].flags = 0;
+		section->pages[i].flags = SGX_EPC_PAGE_IN_USE;
 		section->pages[i].owner = NULL;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..f9202d3d6278 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -26,6 +26,9 @@
 /* Pages, which are being tracked by the page reclaimer. */
 #define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
 
+/* Allocated pages */
+#define SGX_EPC_PAGE_IN_USE		BIT(1)
+
 struct sgx_epc_page {
 	unsigned int section;
 	unsigned int flags;
-- 
2.31.1
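
For illustration only (this is not part of the patch): a userspace
sketch of the DIRTY -> FREE -> IN-USE flag transitions described in
the commit message. The helper names page_init_dirty(), page_free(),
page_alloc() and page_is_free() are hypothetical, and the node->lock
that serializes these transitions in the kernel is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define BIT(n)				(1U << (n))
#define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
#define SGX_EPC_PAGE_IN_USE		BIT(1)

/* Hypothetical stand-in holding only the flags word. */
struct epc_page {
	unsigned int flags;
};

/* Pages start life on the dirty list with the in-use bit set. */
static void page_init_dirty(struct epc_page *p)
{
	p->flags = SGX_EPC_PAGE_IN_USE;
}

/* Freeing (including the first sanitize pass) clears all flags. */
static void page_free(struct epc_page *p)
{
	p->flags = 0;
}

/* Allocation sets the in-use bit again. */
static void page_alloc(struct epc_page *p)
{
	p->flags = SGX_EPC_PAGE_IN_USE;
}

/* A clear in-use bit is what identifies a page on the free list. */
static bool page_is_free(const struct epc_page *p)
{
	return !(p->flags & SGX_EPC_PAGE_IN_USE);
}
```

Setting the bit on dirty pages means a poison handler that only knows
about the free list never mistakes a not-yet-sanitized page for a free
one.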


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v9 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-10-11 18:59               ` [PATCH v9 " Tony Luck
  2021-10-11 18:59                 ` [PATCH v9 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
@ 2021-10-11 18:59                 ` Tony Luck
  2021-10-22 10:43                   ` kernel test robot
  2021-10-11 18:59                 ` [PATCH v9 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
                                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-10-11 18:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.

SGX EPC pages do not have Linux "struct page" associated with them.

Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray.

Create a function arch_is_platform_page() that simply reports whether an
address is an EPC page for use elsewhere in the kernel. The ACPI error
injection code needs this function and is typically built as a module,
so export it.

Note that arch_is_platform_page() will be slower than other similar
"what type is this page" functions that can simply check bits in the
"struct page".  If there is some future performance critical user of
this function it may need to be implemented in a more efficient way.

Note also that the current implementation of xarray allocates a few
hundred kilobytes for this usage on a system with 4GB of SGX EPC memory
configured. This isn't ideal, but worth it for the code simplicity.
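
A minimal userspace sketch of the lookup described above, for readers
who want to see the shape of the check without the kernel plumbing. It
is a model, not the patch: the section table, its addresses and the
names epc_lookup()/is_platform_page() are made up for illustration, and
a linear scan stands in for the xarray that the patch fills with
xa_store_range() and queries with xa_load().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sgx_epc_section: one physical range. */
struct epc_section {
	uint64_t phys_addr;	/* first byte of the section */
	uint64_t size;		/* length in bytes */
};

/* Made-up example ranges; the kernel discovers the real ones at boot. */
static const struct epc_section sections[] = {
	{ 0x80000000ULL, 0x04000000ULL },	/* 64 MiB */
	{ 0xc0000000ULL, 0x02000000ULL },	/* 32 MiB */
};

/* Linear scan standing in for xa_load(): covering section, or NULL. */
static const struct epc_section *epc_lookup(uint64_t paddr)
{
	for (size_t i = 0; i < sizeof(sections) / sizeof(sections[0]); i++) {
		const struct epc_section *s = &sections[i];

		assert(s->size > 0);	/* model invariant: no empty sections */
		if (paddr >= s->phys_addr && paddr - s->phys_addr < s->size)
			return s;
	}
	return NULL;
}

/* Same shape as arch_is_platform_page(): "does any section cover paddr?" */
static bool is_platform_page(uint64_t paddr)
{
	return epc_lookup(paddr) != NULL;
}
```

The scan is O(number of sections) per query, which is fine for a model.
The patch accepts the xarray's memory cost in exchange for not having
to write this lookup by hand.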

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index d18988a46c13..09fa42690ff2 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -20,6 +20,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
 static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
+static DEFINE_XARRAY(sgx_epc_address_space);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
@@ -650,6 +651,8 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	xa_store_range(&sgx_epc_address_space, section->phys_addr,
+		       phys_addr + size - 1, section, GFP_KERNEL);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -661,6 +664,12 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+bool arch_is_platform_page(u64 paddr)
+{
+	return !!xa_load(&sgx_epc_address_space, paddr);
+}
+EXPORT_SYMBOL_GPL(arch_is_platform_page);
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v9 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-10-11 18:59               ` [PATCH v9 " Tony Luck
  2021-10-11 18:59                 ` [PATCH v9 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
  2021-10-11 18:59                 ` [PATCH v9 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
@ 2021-10-11 18:59                 ` Tony Luck
  2021-10-15 23:07                   ` Sean Christopherson
  2021-10-11 18:59                 ` [PATCH v9 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
                                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-10-11 18:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

A memory controller patrol scrubber can report poison in a page
that isn't currently being used.

Add a "poison" field to struct sgx_epc_page that can be set when a
page has a memory error. Check for it:
1) When sanitizing dirty pages
2) When freeing epc pages

Poison is a new field, separate from flags, to avoid having to make
all updates to flags atomic or to integrate poison state changes into
some other locking scheme that protects flags.

In both cases place the poisoned page on a list of poisoned epc pages
to make sure it will not be reallocated.
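
A rough userspace model of the free path described above. The struct
layout, the enum and the name free_page() are assumptions made for this
sketch; the patch itself moves pages between real list_heads under
node->lock, which is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Which list a page sits on; LIST_NONE models "handed out to a user". */
enum page_list { LIST_NONE, LIST_FREE, LIST_POISON };

/* Hypothetical, trimmed-down model of struct sgx_epc_page. */
struct epc_page {
	uint16_t flags;		/* nonzero while allocated */
	uint16_t poison;	/* set once a memory error is reported */
	enum page_list on;	/* list currently holding the page */
};

#define PAGE_IN_USE	(1u << 1)	/* models SGX_EPC_PAGE_IN_USE */

/* Models the free path: poisoned pages never return to the free list. */
static void free_page(struct epc_page *page)
{
	assert(page->on == LIST_NONE);	/* simplification: in-use pages are on no list */

	page->on = page->poison ? LIST_POISON : LIST_FREE;
	page->flags = 0;
}
```

Because the allocator only ever takes pages from the free list, parking
a page on the poison list is enough to keep it from being handed out
again.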

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 14 +++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h  |  3 ++-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 09fa42690ff2..653bace26100 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -43,6 +43,7 @@ static nodemask_t sgx_numa_mask;
 static struct sgx_numa_node *sgx_numa_nodes;
 
 static LIST_HEAD(sgx_dirty_page_list);
+static LIST_HEAD(sgx_poison_page_list);
 
 /*
  * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
@@ -62,6 +63,12 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
+		if (page->poison) {
+			list_del(&page->list);
+			list_add(&page->list, &sgx_poison_page_list);
+			continue;
+		}
+
 		ret = __eremove(sgx_get_epc_virt_addr(page));
 		if (!ret) {
 			/*
@@ -626,7 +633,11 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	spin_lock(&node->lock);
 
-	list_add_tail(&page->list, &node->free_page_list);
+	page->owner = NULL;
+	if (page->poison)
+		list_add(&page->list, &sgx_poison_page_list);
+	else
+		list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 	page->flags = 0;
 
@@ -658,6 +669,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 		section->pages[i].section = index;
 		section->pages[i].flags = SGX_EPC_PAGE_IN_USE;
 		section->pages[i].owner = NULL;
+		section->pages[i].poison = 0;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index f9202d3d6278..a990a4c9a00f 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -31,7 +31,8 @@
 
 struct sgx_epc_page {
 	unsigned int section;
-	unsigned int flags;
+	u16 flags;
+	u16 poison;
 	struct sgx_encl_page *owner;
 	struct list_head list;
 };
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v9 4/7] x86/sgx: Add SGX infrastructure to recover from poison
  2021-10-11 18:59               ` [PATCH v9 " Tony Luck
                                   ` (2 preceding siblings ...)
  2021-10-11 18:59                 ` [PATCH v9 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-10-11 18:59                 ` Tony Luck
  2021-10-15 23:10                   ` Sean Christopherson
  2021-10-11 18:59                 ` [PATCH v9 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
                                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-10-11 18:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

Provide a recovery function arch_memory_failure(). If the poison was
consumed synchronously then send a SIGBUS. Note that, unlike for poison
outside of SGX enclaves, the virtual address of the access is not
included with the SIGBUS. This doesn't matter as addresses of code/data
inside an enclave are of little to no use to code executing outside the
(now dead) enclave.

Poison found in a free page results in the page being moved from the
free list to the poison page list.
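
The decision flow can be sketched in userspace roughly as follows. This
is only a model under stated assumptions: the single four-page section
at MODEL_BASE, the MODEL_* constants and the sigbus_sent flag (which
marks where the kernel code calls force_sig(SIGBUS)) are invented for
illustration, and the node->lock locking is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT	12
#define MODEL_ACTION_REQUIRED	0x1	/* stands in for MF_ACTION_REQUIRED */
#define MODEL_NR_PAGES		4
#define MODEL_BASE		0x80000000ULL	/* made-up section start */

/* LIST_FREE is 0 so that zero-initialised pages start out free. */
enum page_list { LIST_FREE, LIST_NONE, LIST_POISON };

struct epc_page {
	uint16_t flags;		/* zero means "on the free list" */
	uint16_t poison;
	enum page_list on;
};

static struct epc_page pages[MODEL_NR_PAGES];
static bool sigbus_sent;	/* set where the kernel would force_sig(SIGBUS) */

/* Map a physical address to its page, or NULL if outside the section. */
static struct epc_page *paddr_to_page(uint64_t paddr)
{
	if (paddr < MODEL_BASE)
		return NULL;

	uint64_t idx = (paddr - MODEL_BASE) >> MODEL_PAGE_SHIFT;

	return idx < MODEL_NR_PAGES ? &pages[idx] : NULL;
}

static int model_memory_failure(uint64_t pfn, int flags)
{
	struct epc_page *page = paddr_to_page(pfn << MODEL_PAGE_SHIFT);

	if (!page)
		return -ENXIO;		/* not an EPC address: let others try */

	if (flags & MODEL_ACTION_REQUIRED)
		sigbus_sent = true;	/* synchronous consumption */

	if (page->poison)
		return 0;		/* already poisoned, nothing more to do */
	page->poison = 1;

	if (!page->flags) {		/* free page: quarantine it now */
		assert(page->on == LIST_FREE);
		page->on = LIST_POISON;
	}
	/* In-use pages stay put; the free path deals with them later. */
	return 0;
}
```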

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 77 ++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 653bace26100..398c9749e4d1 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -682,6 +682,83 @@ bool arch_is_platform_page(u64 paddr)
 }
 EXPORT_SYMBOL_GPL(arch_is_platform_page);
 
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+
+	section = xa_load(&sgx_epc_address_space, paddr);
+	if (!section)
+		return NULL;
+
+	return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+}
+
+/*
+ * Called in process context to handle a hardware reported
+ * error in an SGX EPC page.
+ * If the MF_ACTION_REQUIRED bit is set in flags, then the
+ * context is the task that consumed the poison data. Otherwise
+ * this is called from a kernel thread unrelated to the page.
+ */
+int arch_memory_failure(unsigned long pfn, int flags)
+{
+	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+	struct sgx_epc_section *section;
+	struct sgx_numa_node *node;
+
+	/*
+	 * mm/memory-failure.c calls this routine for all errors
+	 * where there isn't a "struct page" for the address. But that
+	 * includes other address ranges besides SGX.
+	 */
+	if (!page)
+		return -ENXIO;
+
+	/*
+	 * If poison was consumed synchronously, send a SIGBUS to
+	 * the task. Hardware has already exited the SGX enclave and
+	 * will not allow re-entry to an enclave that has a memory
+	 * error. The signal may help the task understand why the
+	 * enclave is broken.
+	 */
+	if (flags & MF_ACTION_REQUIRED)
+		force_sig(SIGBUS);
+
+	section = &sgx_epc_sections[page->section];
+	node = section->node;
+
+	spin_lock(&node->lock);
+
+	/* Already poisoned? Nothing more to do */
+	if (page->poison)
+		goto out;
+
+	page->poison = 1;
+
+	/*
+	 * If flags is zero, then the page is on a free list.
+	 * Move it to the poison page list.
+	 */
+	if (!page->flags) {
+		list_del(&page->list);
+		list_add(&page->list, &sgx_poison_page_list);
+		goto out;
+	}
+
+	/*
+	 * TBD: Add additional plumbing to enable pre-emptive
+	 * action for asynchronous poison notification. Until
+	 * then just hope that the poison:
+	 * a) is not accessed - sgx_free_epc_page() will deal with it
+	 *    when the user gives it back
+	 * b) results in a recoverable machine check rather than
+	 *    a fatal one
+	 */
+out:
+	spin_unlock(&node->lock);
+	return 0;
+}
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v9 5/7] x86/sgx: Hook arch_memory_failure() into mainline code
  2021-10-11 18:59               ` [PATCH v9 " Tony Luck
                                   ` (3 preceding siblings ...)
  2021-10-11 18:59                 ` [PATCH v9 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
@ 2021-10-11 18:59                 ` Tony Luck
  2021-10-12 16:49                   ` Jarkko Sakkinen
  2021-10-11 18:59                 ` [PATCH v9 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
                                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-10-11 18:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

Add a call inside memory_failure() to call the arch specific code
to check if the address is an SGX EPC page and handle it.

Note the SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.

Pull the call to acquire the mutex earlier so the SGX errors are also
protected.

Make set_mce_nospec() skip SGX pages when trying to adjust
the 1:1 map.
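
For readers unfamiliar with the override idiom used for the two hooks,
here is a small standalone sketch. The "arch header" and "generic
header" are collapsed into one file, the address range is made up, and
handle_pfn_without_struct_page() is a hypothetical caller; only the
#define/#ifndef pattern itself is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* "Arch header": this model overrides only one of the two hooks. */
static bool arch_is_platform_page(uint64_t paddr)
{
	/* Made-up range standing in for the EPC. */
	return paddr >= 0x80000000ULL && paddr < 0x84000000ULL;
}
/* Defining a macro with the function's own name marks it as provided. */
#define arch_is_platform_page arch_is_platform_page

/* "Generic header": supply defaults for anything not overridden. */
#ifndef arch_memory_failure
static inline int arch_memory_failure(unsigned long pfn, int flags)
{
	(void)pfn;
	(void)flags;
	return -ENXIO;	/* "not mine": caller falls through to other handlers */
}
#endif

#ifndef arch_is_platform_page
static inline bool arch_is_platform_page(uint64_t paddr)
{
	(void)paddr;
	return false;
}
#endif

/* Hypothetical caller: try the arch hook first, as memory_failure() does. */
static int handle_pfn_without_struct_page(unsigned long pfn, int flags)
{
	int res = arch_memory_failure(pfn, flags);

	/* In this model the hook either handles the error or declines it. */
	assert(res == 0 || res == -ENXIO);
	return res;
}
```

Here the second #ifndef block is skipped because the macro is defined,
while arch_memory_failure() falls back to the stub. Callers never need
their own #ifdef CONFIG_X86_SGX.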

Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/processor.h  |  8 ++++++++
 arch/x86/include/asm/set_memory.h |  4 ++++
 include/linux/mm.h                | 14 ++++++++++++++
 mm/memory-failure.c               | 19 +++++++++++++------
 4 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 9ad2acaaae9b..4865f2860a4f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -853,4 +853,12 @@ enum mds_mitigations {
 	MDS_MITIGATION_VMWERV,
 };
 
+#ifdef CONFIG_X86_SGX
+int arch_memory_failure(unsigned long pfn, int flags);
+#define arch_memory_failure arch_memory_failure
+
+bool arch_is_platform_page(u64 paddr);
+#define arch_is_platform_page arch_is_platform_page
+#endif
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 43fa081a1adb..ce8dd215f5b3 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_SET_MEMORY_H
 #define _ASM_X86_SET_MEMORY_H
 
+#include <linux/mm.h>
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
 
@@ -98,6 +99,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
 	unsigned long decoy_addr;
 	int rc;
 
+	/* SGX pages are not in the 1:1 map */
+	if (arch_is_platform_page(pfn << PAGE_SHIFT))
+		return 0;
 	/*
 	 * We would like to just call:
 	 *      set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 73a52aba448f..62b199ed5ec6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3284,5 +3284,19 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
 	return 0;
 }
 
+#ifndef arch_memory_failure
+static inline int arch_memory_failure(unsigned long pfn, int flags)
+{
+	return -ENXIO;
+}
+#endif
+
+#ifndef arch_is_platform_page
+static inline bool arch_is_platform_page(u64 paddr)
+{
+	return false;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3e6449f2102a..b1cbf9845c19 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1632,21 +1632,28 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
 
+	mutex_lock(&mf_mutex);
+
 	p = pfn_to_online_page(pfn);
 	if (!p) {
+		res = arch_memory_failure(pfn, flags);
+		if (res == 0)
+			goto unlock_mutex;
+
 		if (pfn_valid(pfn)) {
 			pgmap = get_dev_pagemap(pfn, NULL);
-			if (pgmap)
-				return memory_failure_dev_pagemap(pfn, flags,
-								  pgmap);
+			if (pgmap) {
+				res = memory_failure_dev_pagemap(pfn, flags,
+								 pgmap);
+				goto unlock_mutex;
+			}
 		}
 		pr_err("Memory failure: %#lx: memory outside kernel control\n",
 			pfn);
-		return -ENXIO;
+		res = -ENXIO;
+		goto unlock_mutex;
 	}
 
-	mutex_lock(&mf_mutex);
-
 try_again:
 	if (PageHuge(p)) {
 		res = memory_failure_hugetlb(pfn, flags);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v9 6/7] x86/sgx: Add hook to error injection address validation
  2021-10-11 18:59               ` [PATCH v9 " Tony Luck
                                   ` (4 preceding siblings ...)
  2021-10-11 18:59                 ` [PATCH v9 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
@ 2021-10-11 18:59                 ` Tony Luck
  2021-10-12 16:50                   ` Jarkko Sakkinen
  2021-10-11 18:59                 ` [PATCH v9 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
                                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-10-11 18:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

SGX reserved memory does not appear in the standard address maps.

Add hook to call into the SGX code to check if an address is located
in SGX memory.

There are other challenges in injecting errors into SGX. Update the
documentation with a sequence of operations to inject.
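
The effect on address validation can be modelled like this. It is a
simplified sketch: the three ranges are invented, in_range() checks a
single address rather than calling region_intersects() on a base and
size, and einj_target_ok() is a hypothetical name for the positive form
of the condition that einj_error_inject() tests before returning
-EINVAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address-map model; each range is [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

static const struct range system_ram = { 0x00100000ULL, 0x80000000ULL };
static const struct range pmem = { 0x100000000ULL, 0x140000000ULL };
static const struct range epc = { 0x80000000ULL, 0x84000000ULL };

static bool in_range(const struct range *r, uint64_t addr)
{
	assert(r->start < r->end);	/* model invariant: ranges are non-empty */
	return addr >= r->start && addr < r->end;
}

/* Accept the target if any one of the three address classes claims it. */
static bool einj_target_ok(uint64_t addr)
{
	return in_range(&system_ram, addr) ||
	       in_range(&pmem, addr) ||
	       in_range(&epc, addr);	/* models the new arch_is_platform_page() term */
}
```

Without the third term an EPC address is rejected, since EPC memory is
listed as neither system RAM nor persistent memory.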

Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 .../firmware-guide/acpi/apei/einj.rst         | 19 +++++++++++++++++++
 drivers/acpi/apei/einj.c                      |  3 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index c042176e1707..55e2331a6438 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,5 +181,24 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+Special notes for injection into SGX enclaves:
+
+There may be a separate BIOS setup option to enable SGX injection.
+
+The injection process consists of setting some special memory controller
+trigger that will inject the error on the next write to the target
+address. But the h/w prevents any software outside of an SGX enclave
+from accessing enclave pages (even BIOS SMM mode).
+
+The following sequence can be used:
+  1) Determine physical address of enclave page
+  2) Use "notrigger=1" mode to inject (this will set up
+     the injection address, but will not actually inject)
+  3) Enter the enclave
+  4) Store data to the virtual address matching physical address from step 1
+  5) Execute CLFLUSH for that virtual address
+  6) Spin delay for 250ms
+  7) Read from the virtual address. This will trigger the error
+
 For more information about EINJ, please refer to ACPI specification
 version 4.0, section 17.5 and ACPI 5.0, section 18.6.
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 2882450c443e..67c335baad52 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -544,7 +544,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	    ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
 				!= REGION_INTERSECTS) &&
 	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
-				!= REGION_INTERSECTS)))
+				!= REGION_INTERSECTS) &&
+	     !arch_is_platform_page(base_addr)))
 		return -EINVAL;
 
 inject:
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v9 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  2021-10-11 18:59               ` [PATCH v9 " Tony Luck
                                   ` (5 preceding siblings ...)
  2021-10-11 18:59                 ` [PATCH v9 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
@ 2021-10-11 18:59                 ` Tony Luck
  2021-10-12 16:51                   ` Jarkko Sakkinen
  2021-10-12 16:48                 ` [PATCH v9 0/7] Basic recovery for machine checks inside SGX Jarkko Sakkinen
  2021-10-18 20:25                 ` [PATCH v10 " Tony Luck
  8 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-10-11 18:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

SGX EPC pages do not have a "struct page" associated with them so the
pfn_valid() sanity check fails and results in a warning message to
the console.

Add an additional check to skip the warning if the address of the error
is in an SGX EPC page.

Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 drivers/acpi/apei/ghes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0c8330ed1ffd..0c5c9acc6254 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -449,7 +449,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 		return false;
 
 	pfn = PHYS_PFN(physical_addr);
-	if (!pfn_valid(pfn)) {
+	if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX
 		"Invalid address in generic error data: %#llx\n",
 		physical_addr);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v9 0/7] Basic recovery for machine checks inside SGX
  2021-10-11 18:59               ` [PATCH v9 " Tony Luck
                                   ` (6 preceding siblings ...)
  2021-10-11 18:59                 ` [PATCH v9 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
@ 2021-10-12 16:48                 ` Jarkko Sakkinen
  2021-10-12 17:57                   ` Luck, Tony
  2021-10-18 20:25                 ` [PATCH v10 " Tony Luck
  8 siblings, 1 reply; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-10-12 16:48 UTC (permalink / raw)
  To: Tony Luck, Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm

On Mon, 2021-10-11 at 11:59 -0700, Tony Luck wrote:
> Posting latest version to a slightly wider audience.
> 
> The big picture is that SGX uses some memory pages that are walled off
> from access by the OS. This means they:
> 1) Don't have "struct page" describing them
> 2) Don't appear in the kernel 1:1 map
> 
> But they are still backed by normal DDR memory, so errors can occur.
> 
> Parts 1-4 of this series handle the internal SGX bits to keep track of
> these pages in an error context. They've had a fair amount of review
> on the linux-sgx list (but if any of the 37 subscribers to that list
> not named Jarkko or Reinette want to chime in with extra comments and
> {Acked,Reviewed,Tested}-by that would be great).
> 
> Linux-mm reviewers can (if they like) skip to part 5 where two changes are
> made: 1) Hook into memory_failure() in the same spot as device mapping 2)
> Skip trying to change 1:1 map (since SGX pages aren't there).
> 
> The hooks have generic looking names rather than specifically saying
> "sgx" at the suggestion of Dave Hansen. I'm not wedded to the names,
> so better suggestions welcome.  I could also change to using some
> "ARCH_HAS_PLATFORM_PAGES" config bits if that's the current fashion.
> 
> Rafael (and other ACPI list readers) can skip to parts 6 & 7 where there
> are hooks into error injection and reporting to simply say "these odd
> looking physical addresses are actually ok to use". I added some extra
> notes to the einj.rst documentation on how to inject into SGX memory.
> 
> Tony Luck (7):
>   x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages
>   x86/sgx: Add infrastructure to identify SGX EPC pages
>   x86/sgx: Initial poison handling for dirty and free pages
>   x86/sgx: Add SGX infrastructure to recover from poison
>   x86/sgx: Hook arch_memory_failure() into mainline code
>   x86/sgx: Add hook to error injection address validation
>   x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
> 
>  .../firmware-guide/acpi/apei/einj.rst         |  19 ++++
>  arch/x86/include/asm/processor.h              |   8 ++
>  arch/x86/include/asm/set_memory.h             |   4 +
>  arch/x86/kernel/cpu/sgx/main.c                | 104 +++++++++++++++++-
>  arch/x86/kernel/cpu/sgx/sgx.h                 |   6 +-
>  drivers/acpi/apei/einj.c                      |   3 +-
>  drivers/acpi/apei/ghes.c                      |   2 +-
>  include/linux/mm.h                            |  14 +++
>  mm/memory-failure.c                           |  19 +++-
>  9 files changed, 168 insertions(+), 11 deletions(-)
> 
> 
> base-commit: 64570fbc14f8d7cb3fe3995f20e26bc25ce4b2cc

I think you instructed me on this before but I've forgotten it:
how do I simulate this and test how it works?

/Jarkko


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v9 5/7] x86/sgx: Hook arch_memory_failure() into mainline code
  2021-10-11 18:59                 ` [PATCH v9 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
@ 2021-10-12 16:49                   ` Jarkko Sakkinen
  0 siblings, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-10-12 16:49 UTC (permalink / raw)
  To: Tony Luck, Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, Reinette Chatre

On Mon, 2021-10-11 at 11:59 -0700, Tony Luck wrote:
> Add a call inside memory_failure() to call the arch specific code
> to check if the address is an SGX EPC page and handle it.
> 
> Note the SGX EPC pages do not have a "struct page" entry, so the hook
> goes in at the same point as the device mapping hook.
> 
> Pull the call to acquire the mutex earlier so the SGX errors are also
> protected.
> 
> Make set_mce_nospec() skip SGX pages when trying to adjust
> the 1:1 map.
> 
> Tested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/include/asm/processor.h  |  8 ++++++++
>  arch/x86/include/asm/set_memory.h |  4 ++++
>  include/linux/mm.h                | 14 ++++++++++++++
>  mm/memory-failure.c               | 19 +++++++++++++------
>  4 files changed, 39 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 9ad2acaaae9b..4865f2860a4f 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -853,4 +853,12 @@ enum mds_mitigations {
>         MDS_MITIGATION_VMWERV,
>  };
>  
> +#ifdef CONFIG_X86_SGX
> +int arch_memory_failure(unsigned long pfn, int flags);
> +#define arch_memory_failure arch_memory_failure
> +
> +bool arch_is_platform_page(u64 paddr);
> +#define arch_is_platform_page arch_is_platform_page
> +#endif
> +
>  #endif /* _ASM_X86_PROCESSOR_H */
> diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
> index 43fa081a1adb..ce8dd215f5b3 100644
> --- a/arch/x86/include/asm/set_memory.h
> +++ b/arch/x86/include/asm/set_memory.h
> @@ -2,6 +2,7 @@
>  #ifndef _ASM_X86_SET_MEMORY_H
>  #define _ASM_X86_SET_MEMORY_H
>  
> +#include <linux/mm.h>
>  #include <asm/page.h>
>  #include <asm-generic/set_memory.h>
>  
> @@ -98,6 +99,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
>         unsigned long decoy_addr;
>         int rc;
>  
> +       /* SGX pages are not in the 1:1 map */
> +       if (arch_is_platform_page(pfn << PAGE_SHIFT))
> +               return 0;
>         /*
>          * We would like to just call:
>          *      set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 73a52aba448f..62b199ed5ec6 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3284,5 +3284,19 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
>         return 0;
>  }
>  
> +#ifndef arch_memory_failure
> +static inline int arch_memory_failure(unsigned long pfn, int flags)
> +{
> +       return -ENXIO;
> +}
> +#endif
> +
> +#ifndef arch_is_platform_page
> +static inline bool arch_is_platform_page(u64 paddr)
> +{
> +       return false;
> +}
> +#endif
> +
>  #endif /* __KERNEL__ */
>  #endif /* _LINUX_MM_H */
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 3e6449f2102a..b1cbf9845c19 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1632,21 +1632,28 @@ int memory_failure(unsigned long pfn, int flags)
>         if (!sysctl_memory_failure_recovery)
>                 panic("Memory failure on page %lx", pfn);
>  
> +       mutex_lock(&mf_mutex);
> +
>         p = pfn_to_online_page(pfn);
>         if (!p) {
> +               res = arch_memory_failure(pfn, flags);
> +               if (res == 0)
> +                       goto unlock_mutex;
> +
>                 if (pfn_valid(pfn)) {
>                         pgmap = get_dev_pagemap(pfn, NULL);
> -                       if (pgmap)
> -                               return memory_failure_dev_pagemap(pfn, flags,
> -                                                                 pgmap);
> +                       if (pgmap) {
> +                               res = memory_failure_dev_pagemap(pfn, flags,
> +                                                                pgmap);
> +                               goto unlock_mutex;
> +                       }
>                 }
>                 pr_err("Memory failure: %#lx: memory outside kernel control\n",
>                         pfn);
> -               return -ENXIO;
> +               res = -ENXIO;
> +               goto unlock_mutex;
>         }
>  
> -       mutex_lock(&mf_mutex);
> -
>  try_again:
>         if (PageHuge(p)) {
>                 res = memory_failure_hugetlb(pfn, flags);

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

/Jarkko


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v9 6/7] x86/sgx: Add hook to error injection address validation
  2021-10-11 18:59                 ` [PATCH v9 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
@ 2021-10-12 16:50                   ` Jarkko Sakkinen
  0 siblings, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-10-12 16:50 UTC (permalink / raw)
  To: Tony Luck, Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, Reinette Chatre

On Mon, 2021-10-11 at 11:59 -0700, Tony Luck wrote:
> SGX reserved memory does not appear in the standard address maps.
> 
> Add hook to call into the SGX code to check if an address is located
> in SGX memory.
> 
> There are other challenges in injecting errors into SGX. Update the
> documentation with a sequence of operations to inject.
> 
> Tested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  .../firmware-guide/acpi/apei/einj.rst         | 19 +++++++++++++++++++
>  drivers/acpi/apei/einj.c                      |  3 ++-
>  2 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
> index c042176e1707..55e2331a6438 100644
> --- a/Documentation/firmware-guide/acpi/apei/einj.rst
> +++ b/Documentation/firmware-guide/acpi/apei/einj.rst
> @@ -181,5 +181,24 @@ You should see something like this in dmesg::
>    [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
>    [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
>  
> +Special notes for injection into SGX enclaves:
> +
> +There may be a separate BIOS setup option to enable SGX injection.
> +
> +The injection process consists of setting some special memory controller
> +trigger that will inject the error on the next write to the target
> +address. But the h/w prevents any software outside of an SGX enclave
> +from accessing enclave pages (even BIOS SMM mode).
> +
> +The following sequence can be used:
> +  1) Determine physical address of enclave page
> +  2) Use "notrigger=1" mode to inject (this will setup
> +     the injection address, but will not actually inject)
> +  3) Enter the enclave
> +  4) Store data to the virtual address matching physical address from step 1
> +  5) Execute CLFLUSH for that virtual address
> +  6) Spin delay for 250ms
> +  7) Read from the virtual address. This will trigger the error
> +
>  For more information about EINJ, please refer to ACPI specification
>  version 4.0, section 17.5 and ACPI 5.0, section 18.6.
> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
> index 2882450c443e..67c335baad52 100644
> --- a/drivers/acpi/apei/einj.c
> +++ b/drivers/acpi/apei/einj.c
> @@ -544,7 +544,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
>             ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
>                                 != REGION_INTERSECTS) &&
>              (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
> -                               != REGION_INTERSECTS)))
> +                               != REGION_INTERSECTS) &&
> +            !arch_is_platform_page(base_addr)))
>                 return -EINVAL;
>  
>  inject:

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

/Jarkko


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v9 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  2021-10-11 18:59                 ` [PATCH v9 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
@ 2021-10-12 16:51                   ` Jarkko Sakkinen
  0 siblings, 0 replies; 185+ messages in thread
From: Jarkko Sakkinen @ 2021-10-12 16:51 UTC (permalink / raw)
  To: Tony Luck, Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, Reinette Chatre

On Mon, 2021-10-11 at 11:59 -0700, Tony Luck wrote:
> SGX EPC pages do not have a "struct page" associated with them so the
> pfn_valid() sanity check fails and results in a warning message to
> the console.
> 
> Add an additional check to skip the warning if the address of the error
> is in an SGX EPC page.
> 
> Tested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  drivers/acpi/apei/ghes.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 0c8330ed1ffd..0c5c9acc6254 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -449,7 +449,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
>                 return false;
>  
>         pfn = PHYS_PFN(physical_addr);
> -       if (!pfn_valid(pfn)) {
> +       if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
>                 pr_warn_ratelimited(FW_WARN GHES_PFX
>                 "Invalid address in generic error data: %#llx\n",
>                 physical_addr);

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

/Jarkko


^ permalink raw reply	[flat|nested] 185+ messages in thread

* RE: [PATCH v9 0/7] Basic recovery for machine checks inside SGX
  2021-10-12 16:48                 ` [PATCH v9 0/7] Basic recovery for machine checks inside SGX Jarkko Sakkinen
@ 2021-10-12 17:57                   ` Luck, Tony
  0 siblings, 0 replies; 185+ messages in thread
From: Luck, Tony @ 2021-10-12 17:57 UTC (permalink / raw)
  To: Jarkko Sakkinen, Wysocki, Rafael J, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Hansen, Dave, Zhang, Cathy,
	linux-sgx, linux-acpi, linux-mm

> I think you instructed me on this before but I've forgotten it:
> how do I simulate this and test how it works?

Jarkko,

You can test the non-execution paths (e.g. where the memory error is
reported by a patrol scrubber in the memory controller) by:

# echo 0x{some_SGX_EPC_ADDRESS} > /sys/devices/system/memory/hard_offline_page

The execution paths are more difficult. You need a system that can inject
errors into EPC memory. There are some hints in the Documentation changes
in part 0006.

Reinette posted some changes to sgx tests that she used to validate.

-Tony


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v9 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages
  2021-10-11 18:59                 ` [PATCH v9 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
@ 2021-10-15 22:57                   ` Sean Christopherson
  0 siblings, 0 replies; 185+ messages in thread
From: Sean Christopherson @ 2021-10-15 22:57 UTC (permalink / raw)
  To: Tony Luck
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Jarkko Sakkinen, Dave Hansen, Cathy Zhang, linux-sgx, linux-acpi,
	linux-mm, Reinette Chatre

On Mon, Oct 11, 2021, Tony Luck wrote:
> SGX EPC pages go through the following life cycle:
> 
>         DIRTY ---> FREE ---> IN-USE --\
>                     ^                 |
>                     \-----------------/
> 
> Recovery action for poison for a DIRTY or FREE page is simple. Just
> make sure never to allocate the page. IN-USE pages need some extra
> handling.
> 
> Add a new flag bit SGX_EPC_PAGE_IN_USE that is set when a page
> is allocated and cleared when the page is freed.
> 
> Notes:
> 
> 1) These transitions are made while holding the node->lock so that
>    future code that checks the flags while holding the node->lock
>    can be sure that if the SGX_EPC_PAGE_IN_USE bit is clear, then the
>    page is on the free list.
> 
> 2) Initially while the pages are on the dirty list the
>    SGX_EPC_PAGE_IN_USE bit is set.

This needs to state _why_ pages are marked as IN_USE from the get-go.  Ignoring
the "Notes", the whole changelog clearly states that the DIRTY state does _not_
require special handling, but then "Add SGX infrastructure to recover from poison"
goes and relies on it being set.

Alternatively, why not invert it and have SGX_EPC_PAGE_FREE?  That would have
clear semantics, the poison recovery code wouldn't have to assume that !flags
means "free", and the whole changelog becomes:

  Add a flag to explicitly track whether or not an EPC page is on a free list,
  memory failure recovery code needs to be able to detect if a poisoned page is
  free so that recovery can know if it's safe to "steal" the page.

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v9 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-10-11 18:59                 ` [PATCH v9 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-10-15 23:07                   ` Sean Christopherson
  2021-10-15 23:32                     ` Luck, Tony
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2021-10-15 23:07 UTC (permalink / raw)
  To: Tony Luck
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Jarkko Sakkinen, Dave Hansen, Cathy Zhang, linux-sgx, linux-acpi,
	linux-mm, Reinette Chatre

On Mon, Oct 11, 2021, Tony Luck wrote:
> A memory controller patrol scrubber can report poison in a page
> that isn't currently being used.
> 
> Add "poison" field in the sgx_epc_page that can be set for an
> sgx_epc_page. Check for it:
> 1) When sanitizing dirty pages
> 2) When freeing epc pages
> 
> Poison is a new field separated from flags to avoid having to make
> all updates to flags atomic, or integrate poison state changes into
> some other locking scheme to protect flags.

Explain why atomic would be needed.  I lived in this code for a few years and
still had to look at the source to remember that the reclaimer can set flags
without taking node->lock.

> In both cases place the poisoned page on a list of poisoned epc pages
> to make sure it will not be reallocated.
> 
> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
> Tested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/main.c | 14 +++++++++++++-
>  arch/x86/kernel/cpu/sgx/sgx.h  |  3 ++-
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 09fa42690ff2..653bace26100 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -43,6 +43,7 @@ static nodemask_t sgx_numa_mask;
>  static struct sgx_numa_node *sgx_numa_nodes;
>  
>  static LIST_HEAD(sgx_dirty_page_list);
> +static LIST_HEAD(sgx_poison_page_list);
>  
>  /*
>   * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
> @@ -62,6 +63,12 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
>  
>  		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
>  
> +		if (page->poison) {

Does this need READ_ONCE (and WRITE_ONCE in the writer) to prevent reloading
page->poison since the sanitizer doesn't hold node->lock, i.e. page->poison can
be set any time?  Honest question, I'm terrible with memory ordering rules...

> +			list_del(&page->list);
> +			list_add(&page->list, &sgx_poison_page_list);

list_move()

> +			continue;
> +		}
> +
>  		ret = __eremove(sgx_get_epc_virt_addr(page));
>  		if (!ret) {
>  			/*
> @@ -626,7 +633,11 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
>  
>  	spin_lock(&node->lock);
>  
> -	list_add_tail(&page->list, &node->free_page_list);
> +	page->owner = NULL;
> +	if (page->poison)
> +		list_add(&page->list, &sgx_poison_page_list);

sgx_poison_page_list is a global list, whereas node->lock is, well, per node.
On a system with multiple EPCs, this could corrupt sgx_poison_page_list if
multiple poisoned pages from different nodes are freed simultaneously.

> +	else
> +		list_add_tail(&page->list, &node->free_page_list);
>  	sgx_nr_free_pages++;
>  	page->flags = 0;
>  
> @@ -658,6 +669,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
>  		section->pages[i].section = index;
>  		section->pages[i].flags = SGX_EPC_PAGE_IN_USE;
>  		section->pages[i].owner = NULL;
> +		section->pages[i].poison = 0;
>  		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
>  	}
>  
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index f9202d3d6278..a990a4c9a00f 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -31,7 +31,8 @@
>  
>  struct sgx_epc_page {
>  	unsigned int section;
> -	unsigned int flags;
> +	u16 flags;
> +	u16 poison;
>  	struct sgx_encl_page *owner;
>  	struct list_head list;
>  };
> 
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v9 4/7] x86/sgx: Add SGX infrastructure to recover from poison
  2021-10-11 18:59                 ` [PATCH v9 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
@ 2021-10-15 23:10                   ` Sean Christopherson
  2021-10-15 23:19                     ` Luck, Tony
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2021-10-15 23:10 UTC (permalink / raw)
  To: Tony Luck
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Jarkko Sakkinen, Dave Hansen, Cathy Zhang, linux-sgx, linux-acpi,
	linux-mm, Reinette Chatre

On Mon, Oct 11, 2021, Tony Luck wrote:
> +	section = &sgx_epc_sections[page->section];
> +	node = section->node;
> +
> +	spin_lock(&node->lock);
> +
> +	/* Already poisoned? Nothing more to do */
> +	if (page->poison)
> +		goto out;
> +
> +	page->poison = 1;
> +
> +	/*
> +	 * If flags is zero, then the page is on a free list.
> +	 * Move it to the poison page list.
> +	 */
> +	if (!page->flags) {

If the flag is inverted, this becomes

	if (page->flags & SGX_EPC_PAGE_FREE) {

> +		list_del(&page->list);
> +		list_add(&page->list, &sgx_poison_page_list);

list_move(), and needs the same protection for sgx_poison_page_list.

> +		goto out;
> +	}
> +
> +	/*
> +	 * TBD: Add additional plumbing to enable pre-emptive
> +	 * action for asynchronous poison notification. Until
> +	 * then just hope that the poison:
> +	 * a) is not accessed - sgx_free_epc_page() will deal with it
> +	 *    when the user gives it back
> +	 * b) results in a recoverable machine check rather than
> +	 *    a fatal one
> +	 */
> +out:
> +	spin_unlock(&node->lock);
> +	return 0;
> +}
> +
>  /**
>   * A section metric is concatenated in a way that @low bits 12-31 define the
>   * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
> 
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v9 4/7] x86/sgx: Add SGX infrastructure to recover from poison
  2021-10-15 23:10                   ` Sean Christopherson
@ 2021-10-15 23:19                     ` Luck, Tony
  0 siblings, 0 replies; 185+ messages in thread
From: Luck, Tony @ 2021-10-15 23:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Jarkko Sakkinen, Dave Hansen, Cathy Zhang, linux-sgx, linux-acpi,
	linux-mm, Reinette Chatre

On Fri, Oct 15, 2021 at 11:10:32PM +0000, Sean Christopherson wrote:
> On Mon, Oct 11, 2021, Tony Luck wrote:
> > +	section = &sgx_epc_sections[page->section];
> > +	node = section->node;
> > +
> > +	spin_lock(&node->lock);
> > +
> > +	/* Already poisoned? Nothing more to do */
> > +	if (page->poison)
> > +		goto out;
> > +
> > +	page->poison = 1;
> > +
> > +	/*
> > +	 * If flags is zero, then the page is on a free list.
> > +	 * Move it to the poison page list.
> > +	 */
> > +	if (!page->flags) {
> 
> If the flag is inverted, this becomes
> 
> 	if (page->flags & SGX_EPC_PAGE_FREE) {

I like the inversion. I'll switch to SGX_EPC_PAGE_FREE

> 
> > +		list_del(&page->list);
> > +		list_add(&page->list, &sgx_poison_page_list);
> 
> list_move(), and needs the same protection for sgx_poison_page_list.

Didn't know list_move() existed. Will change all the list_del+list_add
into list_move.

Also change the sgx_poison_page_list from global to per-node. Then
the adds will be safe (accessed while holding the node->lock).


Thanks for the review.

-Tony

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v9 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-10-15 23:07                   ` Sean Christopherson
@ 2021-10-15 23:32                     ` Luck, Tony
  0 siblings, 0 replies; 185+ messages in thread
From: Luck, Tony @ 2021-10-15 23:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Jarkko Sakkinen, Dave Hansen, Cathy Zhang, linux-sgx, linux-acpi,
	linux-mm, Reinette Chatre

On Fri, Oct 15, 2021 at 11:07:48PM +0000, Sean Christopherson wrote:
> On Mon, Oct 11, 2021, Tony Luck wrote:
> > A memory controller patrol scrubber can report poison in a page
> > that isn't currently being used.
> > 
> > Add "poison" field in the sgx_epc_page that can be set for an
> > sgx_epc_page. Check for it:
> > 1) When sanitizing dirty pages
> > 2) When freeing epc pages
> > 
> > Poison is a new field separated from flags to avoid having to make
> > all updates to flags atomic, or integrate poison state changes into
> > some other locking scheme to protect flags.
> 
> Explain why atomic would be needed.  I lived in this code for a few years and
> still had to look at the source to remember that the reclaimer can set flags
> without taking node->lock.

Will add explanation.

> 
> > In both cases place the poisoned page on a list of poisoned epc pages
> > to make sure it will not be reallocated.
> > 
> > Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
> > Tested-by: Reinette Chatre <reinette.chatre@intel.com>
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  arch/x86/kernel/cpu/sgx/main.c | 14 +++++++++++++-
> >  arch/x86/kernel/cpu/sgx/sgx.h  |  3 ++-
> >  2 files changed, 15 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > index 09fa42690ff2..653bace26100 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -43,6 +43,7 @@ static nodemask_t sgx_numa_mask;
> >  static struct sgx_numa_node *sgx_numa_nodes;
> >  
> >  static LIST_HEAD(sgx_dirty_page_list);
> > +static LIST_HEAD(sgx_poison_page_list);
> >  
> >  /*
> >   * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
> > @@ -62,6 +63,12 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
> >  
> >  		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
> >  
> > +		if (page->poison) {
> 
> Does this need READ_ONCE (and WRITE_ONCE in the writer) to prevent reloading
> page->poison since the sanitizer doesn't hold node->lock, i.e. page->poison can
> be set any time?  Honest question, I'm terrible with memory ordering rules...
> 

I think it's safe. I set page->poison in arch_memory_failure() while
holding node->lock in kthread context.  So not "at any time".

This particular read is done without holding the lock ... and is thus
racy. But there are a zillion other races early in boot before the EPC
pages get sanitized and moved to the free list. E.g. if an error is
reported before they are added to the sgx_epc_address_space xarray,
then all this code will just ignore the error as "not in Linux
controlled memory".

-Tony

^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v10 0/7] Basic recovery for machine checks inside SGX
  2021-10-11 18:59               ` [PATCH v9 " Tony Luck
                                   ` (7 preceding siblings ...)
  2021-10-12 16:48                 ` [PATCH v9 0/7] Basic recovery for machine checks inside SGX Jarkko Sakkinen
@ 2021-10-18 20:25                 ` Tony Luck
  2021-10-18 20:25                   ` [PATCH v10 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages Tony Luck
                                     ` (7 more replies)
  8 siblings, 8 replies; 185+ messages in thread
From: Tony Luck @ 2021-10-18 20:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck

v10 (based on v5.15-rc6)

Changes since v9:

ACPI reviewers (Rafael): No changes to parts 6 & 7.

MM reviewers (Horiguchi-san): No changes to part 5.

Jarkko:
	Added Reviewed-by tags to remaining patches.
	N.B. I kept the tags on parts 1, 3, 4 because
	changes based on Sean's feedback didn't seem
	consequential. Please let me know if you disagree
	and see new problems introduced by me trying to
	follow Sean's feedback.

Sean:
	1) Reverse the polarity of the neutron flow (sorry,
	Dr Who fan will always autocomplete a sentence that
	begins "reverse the polarity" that way.) Actual change
	is for the new flag bit. Instead of marking in-use
	pages with the new bit, mark free pages instead. This
	avoids the weirdness where I marked the pages on the
	dirty list as "in-use", when clearly they are not.

	2) Race conditions adding poisoned pages to the global
	list of poisoned pages.
	Fixed this by changing from a global list to a per-node
	list. Additions are protected by the node->lock.

	3) Use list_move() instead of list_del(); list_add()
	Fixed both places I used this idiom.

	4) Race looking at page->poison when cleaning dirty pages.
	Added a comment documenting why losing this race isn't
	overly harmful.
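
For readers unfamiliar with the idiom in item 3, here is a minimal
userspace sketch of what moving an entry between two lists amounts to.
All model_* names are hypothetical stand-ins; this is not the kernel's
<linux/list.h> (the kernel's list_del() poisons the removed entry's
pointers, whereas this model just re-initializes them).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a circular doubly linked list. */
struct model_list_head {
	struct model_list_head *next, *prev;
};

static void model_list_init(struct model_list_head *head)
{
	head->next = head;
	head->prev = head;
}

static bool model_list_empty(const struct model_list_head *head)
{
	return head->next == head;
}

/* Insert entry right after head. */
static void model_list_add(struct model_list_head *entry,
			   struct model_list_head *head)
{
	assert(entry && head);
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry from whatever list it is on. */
static void model_list_del(struct model_list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry;
	entry->prev = entry;
}

/* One call replacing the "del then add" pair. */
static void model_list_move(struct model_list_head *entry,
			    struct model_list_head *head)
{
	model_list_del(entry);
	model_list_add(entry, head);
}
```

In the series itself the destination list is per-node, so the move is
done while holding that node's lock.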

Tony Luck (7):
  x86/sgx: Add new sgx_epc_page flag bit to mark free pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook arch_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()

 .../firmware-guide/acpi/apei/einj.rst         |  19 +++
 arch/x86/include/asm/processor.h              |   8 ++
 arch/x86/include/asm/set_memory.h             |   4 +
 arch/x86/kernel/cpu/sgx/main.c                | 113 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |   7 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 drivers/acpi/apei/ghes.c                      |   2 +-
 include/linux/mm.h                            |  14 +++
 mm/memory-failure.c                           |  19 ++-
 9 files changed, 179 insertions(+), 10 deletions(-)


base-commit: 519d81956ee277b4419c723adfb154603c2565ba
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v10 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages
  2021-10-18 20:25                 ` [PATCH v10 " Tony Luck
@ 2021-10-18 20:25                   ` Tony Luck
  2021-10-18 20:25                   ` [PATCH v10 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
                                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-10-18 20:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

SGX EPC pages go through the following life cycle:

        DIRTY ---> FREE ---> IN-USE --\
                    ^                 |
                    \-----------------/

Recovery action for poison for a DIRTY or FREE page is simple. Just
make sure never to allocate the page. IN-USE pages need some extra
handling.

Add a new flag bit SGX_EPC_PAGE_IS_FREE that is set when a page
is added to a free list and cleared when the page is allocated.

Notes:

1) These transitions are made while holding the node->lock so that
   future code that checks the flags while holding the node->lock
   can be sure that if the SGX_EPC_PAGE_IS_FREE bit is set, then the
   page is on the free list.

2) Initially while the pages are on the dirty list the
   SGX_EPC_PAGE_IS_FREE bit is cleared.
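
The life cycle above can be sketched as a tiny userspace model. This is
an illustration only: the model_* names are made up, locking is left
out, and the real transitions happen under node->lock in the allocator.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the flag bits in sgx.h. */
#define MODEL_PAGE_RECLAIMER_TRACKED	(1u << 0)
#define MODEL_PAGE_IS_FREE		(1u << 1)

struct model_epc_page {
	unsigned int flags;
};

/* DIRTY: pages start out on the dirty list with the free bit clear. */
static void model_init_dirty(struct model_epc_page *page)
{
	page->flags = 0;
}

/* DIRTY -> FREE and IN-USE -> FREE: freeing sets the bit. */
static void model_free(struct model_epc_page *page)
{
	page->flags = MODEL_PAGE_IS_FREE;
}

/* FREE -> IN-USE: allocation clears the flags. */
static void model_alloc(struct model_epc_page *page)
{
	assert(page->flags & MODEL_PAGE_IS_FREE);
	page->flags = 0;
}

static bool model_is_free(const struct model_epc_page *page)
{
	return (page->flags & MODEL_PAGE_IS_FREE) != 0;
}
```

With this polarity a page on the dirty list never reports as free,
which is the property the poison recovery code relies on.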

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 2 ++
 arch/x86/kernel/cpu/sgx/sgx.h  | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de02bbcc..825aa91516c8 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -472,6 +472,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
 	list_del_init(&page->list);
 	sgx_nr_free_pages--;
+	page->flags = 0;
 
 	spin_unlock(&node->lock);
 
@@ -626,6 +627,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
+	page->flags = SGX_EPC_PAGE_IS_FREE;
 
 	spin_unlock(&node->lock);
 }
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..5906471156c5 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -26,6 +26,9 @@
 /* Pages, which are being tracked by the page reclaimer. */
 #define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
 
+/* Pages on free list */
+#define SGX_EPC_PAGE_IS_FREE		BIT(1)
+
 struct sgx_epc_page {
 	unsigned int section;
 	unsigned int flags;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v10 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-10-18 20:25                 ` [PATCH v10 " Tony Luck
  2021-10-18 20:25                   ` [PATCH v10 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages Tony Luck
@ 2021-10-18 20:25                   ` Tony Luck
  2021-11-19 12:25                     ` kernel test robot
  2021-10-18 20:25                   ` [PATCH v10 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
                                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-10-18 20:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.

SGX EPC pages do not have Linux "struct page" associated with them.

Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray.

Create a function arch_is_platform_page() that simply reports whether an
address is an EPC page for use elsewhere in the kernel. The ACPI error
injection code needs this function and is typically built as a module,
so export it.

Note that arch_is_platform_page() will be slower than other similar
"what type is this page" functions that can simply check bits in the
"struct page".  If there is some future performance critical user of
this function it may need to be implemented in a more efficient way.

Note also that the current implementation of xarray allocates a few
hundred kilobytes for this usage on a system with 4GB of SGX EPC memory
configured. This isn't ideal, but worth it for the code simplicity.
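
To illustrate the lookup being described, here is a userspace sketch of
a "is this physical address inside an EPC section" check. It swaps the
xarray for a plain linear scan over a small table, so it shows the
question being answered, not how the patch answers it. The model_*
names and the addresses used with it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one EPC section's physical range. */
struct model_epc_range {
	uint64_t phys_addr;	/* first byte of the section */
	uint64_t size;		/* length in bytes */
};

/* Return the range containing paddr, or NULL if there is none. */
static const struct model_epc_range *
model_find_section(const struct model_epc_range *ranges, size_t n,
		   uint64_t paddr)
{
	assert(ranges != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (paddr >= ranges[i].phys_addr &&
		    paddr - ranges[i].phys_addr < ranges[i].size)
			return &ranges[i];
	}
	return NULL;
}

/* Same yes/no answer that arch_is_platform_page() is meant to give. */
static bool model_is_platform_page(const struct model_epc_range *ranges,
				   size_t n, uint64_t paddr)
{
	return model_find_section(ranges, n, paddr) != NULL;
}
```

A linear scan is fine for a handful of sections; the xarray in the
patch trades some memory for a lookup that needs no per-caller table.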

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 825aa91516c8..5c02cffdabc8 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -20,6 +20,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
 static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
+static DEFINE_XARRAY(sgx_epc_address_space);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
@@ -650,6 +651,8 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	xa_store_range(&sgx_epc_address_space, section->phys_addr,
+		       phys_addr + size - 1, section, GFP_KERNEL);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -661,6 +664,12 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+bool arch_is_platform_page(u64 paddr)
+{
+	return !!xa_load(&sgx_epc_address_space, paddr);
+}
+EXPORT_SYMBOL_GPL(arch_is_platform_page);
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v10 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-10-18 20:25                 ` [PATCH v10 " Tony Luck
  2021-10-18 20:25                   ` [PATCH v10 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages Tony Luck
  2021-10-18 20:25                   ` [PATCH v10 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
@ 2021-10-18 20:25                   ` Tony Luck
  2021-10-18 20:25                   ` [PATCH v10 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
                                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-10-18 20:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

A memory controller patrol scrubber can report poison in a page
that isn't currently being used.

Add "poison" field in the sgx_epc_page that can be set for an
sgx_epc_page. Check for it:
1) When sanitizing dirty pages
2) When freeing epc pages

Poison is a new field separated from flags to avoid having to make all
updates to flags atomic, or integrate poison state changes into some
other locking scheme to protect flags (currently just sgx_reclaimer_lock
which protects the SGX_EPC_PAGE_RECLAIMER_TRACKED bit in page->flags).

In both cases place the poisoned page on a per-node list of poisoned
epc pages to make sure it will not be reallocated.
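
A rough userspace sketch of the routing on the free path follows.
Counters stand in for the two per-node lists, locking is omitted, and
every model_* name is hypothetical; only the branch on the poison field
is taken from the description above.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_IS_FREE	(1u << 1)

/* Poison lives in its own field, separate from flags. */
struct model_epc_page {
	uint16_t flags;
	uint16_t poison;
};

/* Counters standing in for the per-node free and poison lists. */
struct model_node {
	unsigned int free_pages;
	unsigned int poisoned_pages;
};

/*
 * Route a page being given back: poisoned pages are set aside so the
 * allocator can never hand them out again, everything else returns to
 * the free list. The flags update is the same on both paths, as in
 * the diff.
 */
static void model_free_page(struct model_node *node,
			    struct model_epc_page *page)
{
	assert(node && page);

	if (page->poison)
		node->poisoned_pages++;
	else
		node->free_pages++;

	page->flags = MODEL_PAGE_IS_FREE;
}
```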

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 26 +++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h  |  4 +++-
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 5c02cffdabc8..e5fcb8354bcc 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -62,6 +62,24 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
+		/*
+		 * Checking page->poison without holding the node->lock
+		 * is racy, but losing the race (i.e. poison is set just
+		 * after the check) just means __eremove() will be uselessly
+		 * called for a page that sgx_free_epc_page() will put onto
+		 * the node->sgx_poison_page_list later.
+		 */
+		if (page->poison) {
+			struct sgx_epc_section *section = &sgx_epc_sections[page->section];
+			struct sgx_numa_node *node = section->node;
+
+			spin_lock(&node->lock);
+			list_move(&page->list, &node->sgx_poison_page_list);
+			spin_unlock(&node->lock);
+
+			continue;
+		}
+
 		ret = __eremove(sgx_get_epc_virt_addr(page));
 		if (!ret) {
 			/*
@@ -626,7 +644,11 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	spin_lock(&node->lock);
 
-	list_add_tail(&page->list, &node->free_page_list);
+	page->owner = NULL;
+	if (page->poison)
+		list_add(&page->list, &node->sgx_poison_page_list);
+	else
+		list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 	page->flags = SGX_EPC_PAGE_IS_FREE;
 
@@ -658,6 +680,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 		section->pages[i].section = index;
 		section->pages[i].flags = 0;
 		section->pages[i].owner = NULL;
+		section->pages[i].poison = 0;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
@@ -724,6 +747,7 @@ static bool __init sgx_page_cache_init(void)
 		if (!node_isset(nid, sgx_numa_mask)) {
 			spin_lock_init(&sgx_numa_nodes[nid].lock);
 			INIT_LIST_HEAD(&sgx_numa_nodes[nid].free_page_list);
+			INIT_LIST_HEAD(&sgx_numa_nodes[nid].sgx_poison_page_list);
 			node_set(nid, sgx_numa_mask);
 		}
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 5906471156c5..9ec3136c7800 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -31,7 +31,8 @@
 
 struct sgx_epc_page {
 	unsigned int section;
-	unsigned int flags;
+	u16 flags;
+	u16 poison;
 	struct sgx_encl_page *owner;
 	struct list_head list;
 };
@@ -42,6 +43,7 @@ struct sgx_epc_page {
  */
 struct sgx_numa_node {
 	struct list_head free_page_list;
+	struct list_head sgx_poison_page_list;
 	spinlock_t lock;
 };
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 185+ messages in thread

* [PATCH v10 4/7] x86/sgx: Add SGX infrastructure to recover from poison
  2021-10-18 20:25                 ` [PATCH v10 " Tony Luck
                                     ` (2 preceding siblings ...)
  2021-10-18 20:25                   ` [PATCH v10 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-10-18 20:25                   ` Tony Luck
  2021-10-18 20:25                   ` [PATCH v10 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
                                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-10-18 20:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

Provide a recovery function arch_memory_failure(). If the poison was
consumed synchronously then send a SIGBUS. Note that, unlike poison
outside of SGX enclaves, the virtual address of the access is not
included with the SIGBUS. This doesn't matter as addresses of code/data
inside an enclave are of little to no use to code executing outside the
(now dead) enclave.

Poison found in a free page results in the page being moved from the
free list to the per-node poison page list.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
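
Illustration only, not part of the patch: a minimal userspace sketch of
the decision flow described above, assuming a simplified page model. All
model_* names and MODEL_* constants are hypothetical stand-ins; the real
code uses list_head, node->lock and force_sig(), which are replaced here
by an enum, no locking, and a boolean.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's list membership and flag bits. */
enum model_list { MODEL_LIST_NONE, MODEL_LIST_FREE, MODEL_LIST_POISON };

#define MODEL_PAGE_IS_FREE    (1u << 1) /* mirrors SGX_EPC_PAGE_IS_FREE */
#define MODEL_ACTION_REQUIRED (1 << 0)  /* stands in for MF_ACTION_REQUIRED */

struct model_epc_page {
	unsigned int flags;
	unsigned int poison;
	enum model_list list;
};

struct model_result {
	int ret;          /* 0 when handled, -ENXIO when not an EPC page */
	bool sigbus_sent; /* models the force_sig(SIGBUS) call */
};

static struct model_result model_memory_failure(struct model_epc_page *page,
						int flags)
{
	struct model_result r = { 0, false };

	/* The address did not resolve to an EPC page: not ours to handle. */
	if (!page) {
		r.ret = -ENXIO;
		return r;
	}

	/* Synchronous consumption: the consuming task gets a SIGBUS. */
	if (flags & MODEL_ACTION_REQUIRED)
		r.sigbus_sent = true;

	/* Already poisoned? Nothing more to do. */
	if (page->poison)
		return r;

	page->poison = 1;

	/* A free page moves to the poison list so it is never allocated. */
	if (page->flags & MODEL_PAGE_IS_FREE)
		page->list = MODEL_LIST_POISON;

	/* An in-use page stays where it is until its owner frees it. */
	return r;
}
```

Calling model_memory_failure() on a free page with flags == 0 marks it
poisoned and moves it to the poison list without a signal; an in-use page
with MODEL_ACTION_REQUIRED set is marked poisoned and only the signal is
recorded.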
 arch/x86/kernel/cpu/sgx/main.c | 76 ++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index e5fcb8354bcc..231c494dfd40 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -693,6 +693,82 @@ bool arch_is_platform_page(u64 paddr)
 }
 EXPORT_SYMBOL_GPL(arch_is_platform_page);
 
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+
+	section = xa_load(&sgx_epc_address_space, paddr);
+	if (!section)
+		return NULL;
+
+	return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+}
+
+/*
+ * Called in process context to handle a hardware reported
+ * error in an SGX EPC page.
+ * If the MF_ACTION_REQUIRED bit is set in flags, then the
+ * context is the task that consumed the poison data. Otherwise
+ * this is called from a kernel thread unrelated to the page.
+ */
+int arch_memory_failure(unsigned long pfn, int flags)
+{
+	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+	struct sgx_epc_section *section;
+	struct sgx_numa_node *node;
+
+	/*
+	 * mm/memory-failure.c calls this routine for all errors
+	 * where there isn't a "struct page" for the address. But that
+	 * includes other address ranges besides SGX.
+	 */
+	if (!page)
+		return -ENXIO;
+
+	/*
+	 * If poison was consumed synchronously, send a SIGBUS to
+	 * the task. Hardware has already exited the SGX enclave and
+	 * will not allow re-entry to an enclave that has a memory
+	 * error. The signal may help the task understand why the
+	 * enclave is broken.
+	 */
+	if (flags & MF_ACTION_REQUIRED)
+		force_sig(SIGBUS);
+
+	section = &sgx_epc_sections[page->section];
+	node = section->node;
+
+	spin_lock(&node->lock);
+
+	/* Already poisoned? Nothing more to do */
+	if (page->poison)
+		goto out;
+
+	page->poison = 1;
+
+	/*
+	 * If the page is on a free list, move it to the per-node
+	 * poison page list.
+	 */
+	if (page->flags & SGX_EPC_PAGE_IS_FREE) {
+		list_move(&page->list, &node->sgx_poison_page_list);
+		goto out;
+	}
+
+	/*
+	 * TBD: Add additional plumbing to enable pre-emptive
+	 * action for asynchronous poison notification. Until
+	 * then just hope that the poison:
+	 * a) is not accessed - sgx_free_epc_page() will deal with it
+	 *    when the user gives it back
+	 * b) results in a recoverable machine check rather than
+	 *    a fatal one
+	 */
+out:
+	spin_unlock(&node->lock);
+	return 0;
+}
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1



* [PATCH v10 5/7] x86/sgx: Hook arch_memory_failure() into mainline code
  2021-10-18 20:25                 ` [PATCH v10 " Tony Luck
                                     ` (3 preceding siblings ...)
  2021-10-18 20:25                   ` [PATCH v10 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
@ 2021-10-18 20:25                   ` Tony Luck
  2021-10-20  9:06                     ` Naoya Horiguchi
  2021-10-18 20:25                   ` [PATCH v10 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
                                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-10-18 20:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

Add a call inside memory_failure() to call the arch specific code
to check if the address is an SGX EPC page and handle it.

Note the SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.

Pull the call to acquire the mutex earlier so the SGX errors are also
protected.

Make set_mce_nospec() skip SGX pages when trying to adjust
the 1:1 map.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
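
Illustration only, not part of the patch: a minimal sketch of the
"#define name name" override idiom this patch uses so that generic code
gets a stub unless the architecture supplies its own hook. The model_*
names and the MODEL_ARCH_HAS_HOOK switch are hypothetical; in the patch
the arch side is guarded by CONFIG_X86_SGX.

```c
#include <errno.h>

/*
 * "Arch header" part: when the (hypothetical) MODEL_ARCH_HAS_HOOK switch
 * is defined, provide a real hook and announce it with a same-named macro.
 */
#ifdef MODEL_ARCH_HAS_HOOK
static inline int model_arch_memory_failure(unsigned long pfn, int flags)
{
	(void)pfn;
	(void)flags;
	return 0; /* pretend the arch code recognized and handled the page */
}
#define model_arch_memory_failure model_arch_memory_failure
#endif

/*
 * "Generic header" part: only define the fallback stub when no
 * architecture announced its own implementation.
 */
#ifndef model_arch_memory_failure
static inline int model_arch_memory_failure(unsigned long pfn, int flags)
{
	(void)pfn;
	(void)flags;
	return -ENXIO;
}
#endif

/* Generic caller: a return of 0 means the arch hook dealt with the error. */
static int model_memory_failure(unsigned long pfn, int flags)
{
	int res = model_arch_memory_failure(pfn, flags);

	if (res == 0)
		return 0;

	/* Fall through to the remaining generic handling (elided here). */
	return -ENXIO;
}
```

Built without MODEL_ARCH_HAS_HOOK the caller sees the stub and returns
-ENXIO; built with it, the same caller picks up the arch version with no
change to the generic code.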
 arch/x86/include/asm/processor.h  |  8 ++++++++
 arch/x86/include/asm/set_memory.h |  4 ++++
 include/linux/mm.h                | 14 ++++++++++++++
 mm/memory-failure.c               | 19 +++++++++++++------
 4 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 9ad2acaaae9b..4865f2860a4f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -853,4 +853,12 @@ enum mds_mitigations {
 	MDS_MITIGATION_VMWERV,
 };
 
+#ifdef CONFIG_X86_SGX
+int arch_memory_failure(unsigned long pfn, int flags);
+#define arch_memory_failure arch_memory_failure
+
+bool arch_is_platform_page(u64 paddr);
+#define arch_is_platform_page arch_is_platform_page
+#endif
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 43fa081a1adb..ce8dd215f5b3 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_SET_MEMORY_H
 #define _ASM_X86_SET_MEMORY_H
 
+#include <linux/mm.h>
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
 
@@ -98,6 +99,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
 	unsigned long decoy_addr;
 	int rc;
 
+	/* SGX pages are not in the 1:1 map */
+	if (arch_is_platform_page(pfn << PAGE_SHIFT))
+		return 0;
 	/*
 	 * We would like to just call:
 	 *      set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 73a52aba448f..62b199ed5ec6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3284,5 +3284,19 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
 	return 0;
 }
 
+#ifndef arch_memory_failure
+static inline int arch_memory_failure(unsigned long pfn, int flags)
+{
+	return -ENXIO;
+}
+#endif
+
+#ifndef arch_is_platform_page
+static inline bool arch_is_platform_page(u64 paddr)
+{
+	return false;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3e6449f2102a..b1cbf9845c19 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1632,21 +1632,28 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
 
+	mutex_lock(&mf_mutex);
+
 	p = pfn_to_online_page(pfn);
 	if (!p) {
+		res = arch_memory_failure(pfn, flags);
+		if (res == 0)
+			goto unlock_mutex;
+
 		if (pfn_valid(pfn)) {
 			pgmap = get_dev_pagemap(pfn, NULL);
-			if (pgmap)
-				return memory_failure_dev_pagemap(pfn, flags,
-								  pgmap);
+			if (pgmap) {
+				res = memory_failure_dev_pagemap(pfn, flags,
+								 pgmap);
+				goto unlock_mutex;
+			}
 		}
 		pr_err("Memory failure: %#lx: memory outside kernel control\n",
 			pfn);
-		return -ENXIO;
+		res = -ENXIO;
+		goto unlock_mutex;
 	}
 
-	mutex_lock(&mf_mutex);
-
 try_again:
 	if (PageHuge(p)) {
 		res = memory_failure_hugetlb(pfn, flags);
-- 
2.31.1



* [PATCH v10 6/7] x86/sgx: Add hook to error injection address validation
  2021-10-18 20:25                 ` [PATCH v10 " Tony Luck
                                     ` (4 preceding siblings ...)
  2021-10-18 20:25                   ` [PATCH v10 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
@ 2021-10-18 20:25                   ` Tony Luck
  2021-10-18 20:25                   ` [PATCH v10 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
  2021-10-26 22:00                   ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-10-18 20:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

SGX reserved memory does not appear in the standard address maps.

Add hook to call into the SGX code to check if an address is located
in SGX memory.

There are other challenges in injecting errors into SGX. Update the
documentation with a sequence of operations to inject.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 .../firmware-guide/acpi/apei/einj.rst         | 19 +++++++++++++++++++
 drivers/acpi/apei/einj.c                      |  3 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index c042176e1707..55e2331a6438 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,5 +181,24 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+Special notes for injection into SGX enclaves:
+
+There may be a separate BIOS setup option to enable SGX injection.
+
+The injection process consists of setting some special memory controller
+trigger that will inject the error on the next write to the target
+address. But the h/w prevents any software outside of an SGX enclave
+from accessing enclave pages (even BIOS SMM mode).
+
+The following sequence can be used:
+  1) Determine physical address of enclave page
+  2) Use "notrigger=1" mode to inject (this will set up
+     the injection address, but will not actually inject)
+  3) Enter the enclave
+  4) Store data to the virtual address matching physical address from step 1
+  5) Execute CLFLUSH for that virtual address
+  6) Spin delay for 250ms
+  7) Read from the virtual address. This will trigger the error
+
 For more information about EINJ, please refer to ACPI specification
 version 4.0, section 17.5 and ACPI 5.0, section 18.6.
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 2882450c443e..67c335baad52 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -544,7 +544,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	    ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
 				!= REGION_INTERSECTS) &&
 	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
-				!= REGION_INTERSECTS)))
+				!= REGION_INTERSECTS) &&
+	     !arch_is_platform_page(base_addr)))
 		return -EINVAL;
 
 inject:
-- 
2.31.1



* [PATCH v10 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  2021-10-18 20:25                 ` [PATCH v10 " Tony Luck
                                     ` (5 preceding siblings ...)
  2021-10-18 20:25                   ` [PATCH v10 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
@ 2021-10-18 20:25                   ` Tony Luck
  2021-10-26 22:00                   ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
  7 siblings, 0 replies; 185+ messages in thread
From: Tony Luck @ 2021-10-18 20:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

SGX EPC pages do not have a "struct page" associated with them so the
pfn_valid() sanity check fails and results in a warning message to
the console.

Add an additional check to skip the warning if the address of the error
is in an SGX EPC page.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 drivers/acpi/apei/ghes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0c8330ed1ffd..0c5c9acc6254 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -449,7 +449,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 		return false;
 
 	pfn = PHYS_PFN(physical_addr);
-	if (!pfn_valid(pfn)) {
+	if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX
 		"Invalid address in generic error data: %#llx\n",
 		physical_addr);
-- 
2.31.1



* Re: [PATCH v10 5/7] x86/sgx: Hook arch_memory_failure() into mainline code
  2021-10-18 20:25                   ` [PATCH v10 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
@ 2021-10-20  9:06                     ` Naoya Horiguchi
  2021-10-20 17:04                       ` Luck, Tony
  0 siblings, 1 reply; 185+ messages in thread
From: Naoya Horiguchi @ 2021-10-20  9:06 UTC (permalink / raw)
  To: Tony Luck
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, Reinette Chatre

On Mon, Oct 18, 2021 at 01:25:40PM -0700, Tony Luck wrote:
> Add a call inside memory_failure() to call the arch specific code
> to check if the address is an SGX EPC page and handle it.
> 
> Note the SGX EPC pages do not have a "struct page" entry, so the hook
> goes in at the same point as the device mapping hook.
> 
> Pull the call to acquire the mutex earlier so the SGX errors are also
> protected.
> 
> Make set_mce_nospec() skip SGX pages when trying to adjust
> the 1:1 map.
> 
> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
> Tested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
...
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 73a52aba448f..62b199ed5ec6 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3284,5 +3284,19 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
>  	return 0;
>  }
>  
> +#ifndef arch_memory_failure
> +static inline int arch_memory_failure(unsigned long pfn, int flags)
> +{
> +	return -ENXIO;
> +}
> +#endif
> +
> +#ifndef arch_is_platform_page
> +static inline bool arch_is_platform_page(u64 paddr)
> +{
> +	return false;
> +}
> +#endif
> +

How about putting these definitions near the other related functions
in the same file (like below)?

  ...
  extern void shake_page(struct page *p);
  extern atomic_long_t num_poisoned_pages __read_mostly;
  extern int soft_offline_page(unsigned long pfn, int flags);
  
  // here?
  
  /*
   * Error handlers for various types of pages.
   */
  enum mf_result {

Otherwise, the patch looks good to me.

Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Thanks,
Naoya Horiguchi


* RE: [PATCH v10 5/7] x86/sgx: Hook arch_memory_failure() into mainline code
  2021-10-20  9:06                     ` Naoya Horiguchi
@ 2021-10-20 17:04                       ` Luck, Tony
  0 siblings, 0 replies; 185+ messages in thread
From: Luck, Tony @ 2021-10-20 17:04 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Wysocki, Rafael J, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Hansen, Dave, Zhang, Cathy,
	linux-sgx, linux-acpi, linux-mm, Chatre, Reinette

> How about putting these definitions near the other related functions
> in the same file (like below)?
>
>  ...
>  extern void shake_page(struct page *p);
>  extern atomic_long_t num_poisoned_pages __read_mostly;
>  extern int soft_offline_page(unsigned long pfn, int flags);
>  
>  // here?

Makes sense to group together with these other RAS bits.
I'll move the definitions here.
  

> Otherwise, the patch looks good to me.
>
> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Thanks for the review!

-Tony


* Re: [PATCH v9 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-10-11 18:59                 ` [PATCH v9 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
@ 2021-10-22 10:43                   ` kernel test robot
  0 siblings, 0 replies; 185+ messages in thread
From: kernel test robot @ 2021-10-22 10:43 UTC (permalink / raw)
  To: Tony Luck, Rafael J. Wysocki, naoya.horiguchi
  Cc: kbuild-all, Andrew Morton, Linux Memory Management List,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi

[-- Attachment #1: Type: text/plain, Size: 2730 bytes --]

Hi Tony,

I love your patch! Yet something to improve:

[auto build test ERROR on rafael-pm/linux-next]
[also build test ERROR on hnaz-mm/master tip/x86/sgx v5.15-rc6 next-20211021]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Tony-Luck/x86-sgx-Add-new-sgx_epc_page-flag-bit-to-mark-in-use-pages/20211012-035926
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: x86_64-randconfig-a011-20211011 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/9c7bd2907252bfbf4948be9855e3535319e1e9e4
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Tony-Luck/x86-sgx-Add-new-sgx_epc_page-flag-bit-to-mark-in-use-pages/20211012-035926
        git checkout 9c7bd2907252bfbf4948be9855e3535319e1e9e4
        # save the attached .config to linux build tree
        mkdir build_dir
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: arch/x86/kernel/cpu/sgx/main.o: in function `sgx_setup_epc_section':
>> arch/x86/kernel/cpu/sgx/main.c:654: undefined reference to `xa_store_range'


vim +654 arch/x86/kernel/cpu/sgx/main.c

   635	
   636	static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
   637						 unsigned long index,
   638						 struct sgx_epc_section *section)
   639	{
   640		unsigned long nr_pages = size >> PAGE_SHIFT;
   641		unsigned long i;
   642	
   643		section->virt_addr = memremap(phys_addr, size, MEMREMAP_WB);
   644		if (!section->virt_addr)
   645			return false;
   646	
   647		section->pages = vmalloc(nr_pages * sizeof(struct sgx_epc_page));
   648		if (!section->pages) {
   649			memunmap(section->virt_addr);
   650			return false;
   651		}
   652	
   653		section->phys_addr = phys_addr;
 > 654		xa_store_range(&sgx_epc_address_space, section->phys_addr,
   655			       phys_addr + size - 1, section, GFP_KERNEL);
   656	
   657		for (i = 0; i < nr_pages; i++) {
   658			section->pages[i].section = index;
   659			section->pages[i].flags = SGX_EPC_PAGE_IN_USE;
   660			section->pages[i].owner = NULL;
   661			list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
   662		}
   663	
   664		return true;
   665	}
   666	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33705 bytes --]


* [PATCH v11 0/7] Basic recovery for machine checks inside SGX
  2021-10-18 20:25                 ` [PATCH v10 " Tony Luck
                                     ` (6 preceding siblings ...)
  2021-10-18 20:25                   ` [PATCH v10 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
@ 2021-10-26 22:00                   ` Tony Luck
  2021-10-26 22:00                     ` [PATCH v11 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages Tony Luck
                                       ` (6 more replies)
  7 siblings, 7 replies; 185+ messages in thread
From: Tony Luck @ 2021-10-26 22:00 UTC (permalink / raw)
  To: Borislav Petkov, x86
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, linux-kernel, Tony Luck

Boris,

I took this series out of lkml/x86 for a few revisions; I think
the last one posted to lkml was v5. So much has changed since then
that it might be easier to just look at this as if it were v1 and
ignore the earlier history.

First four patches add infrastructure within the SGX code to
track enclave pages (because these pages don't have a "struct
page" as they aren't directly accessible by Linux). All have
"Reviewed-by" tags from Jarkko (SGX maintainer).

Patch 5 hooks into memory_failure() to invoke recovery if
the physical address is in enclave space. This has a
"Reviewed-by" tag from Naoya Horiguchi, the maintainer for
mm/memory-failure.c.

Patch 6 is a hook into the error injection code and addition
to the error injection documentation explaining extra steps
needed to inject into SGX enclave memory.

Patch 7 is a hook into GHES error reporting path to recognize
that SGX enclave addresses are valid and need processing.

-Tony

Tony Luck (7):
  x86/sgx: Add new sgx_epc_page flag bit to mark free pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook arch_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()

 .../firmware-guide/acpi/apei/einj.rst         |  19 +++
 arch/x86/Kconfig                              |   1 +
 arch/x86/include/asm/processor.h              |   8 ++
 arch/x86/include/asm/set_memory.h             |   4 +
 arch/x86/kernel/cpu/sgx/main.c                | 113 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |   7 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 drivers/acpi/apei/ghes.c                      |   2 +-
 include/linux/mm.h                            |  13 ++
 mm/memory-failure.c                           |  19 ++-
 10 files changed, 179 insertions(+), 10 deletions(-)


base-commit: 3906fe9bb7f1a2c8667ae54e967dc8690824f4ea
-- 
2.31.1



* [PATCH v11 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages
  2021-10-26 22:00                   ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
@ 2021-10-26 22:00                     ` Tony Luck
  2021-11-15 20:22                       ` [tip: x86/sgx] " tip-bot2 for Tony Luck
  2021-10-26 22:00                     ` [PATCH v11 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
                                       ` (5 subsequent siblings)
  6 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-10-26 22:00 UTC (permalink / raw)
  To: Borislav Petkov, x86
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, linux-kernel, Tony Luck,
	Reinette Chatre

SGX EPC pages go through the following life cycle:

        DIRTY ---> FREE ---> IN-USE --\
                    ^                 |
                    \-----------------/

Recovery action for poison for a DIRTY or FREE page is simple. Just
make sure never to allocate the page. IN-USE pages need some extra
handling.

Add a new flag bit SGX_EPC_PAGE_IS_FREE that is set when a page
is added to a free list and cleared when the page is allocated.

Notes:

1) These transitions are made while holding the node->lock so that
   future code that checks the flags while holding the node->lock
   can be sure that if the SGX_EPC_PAGE_IS_FREE bit is set, then the
   page is on the free list.

2) Initially while the pages are on the dirty list the
   SGX_EPC_PAGE_IS_FREE bit is cleared.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
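
Illustration only, not part of the patch: a minimal userspace sketch of
the flag transitions in the life cycle above. The model_* names are
hypothetical, and the node->lock that the real code holds around these
transitions is omitted.

```c
/* Hypothetical mirrors of the two flag bits in sgx.h. */
#define MODEL_PAGE_RECLAIMER_TRACKED (1u << 0)
#define MODEL_PAGE_IS_FREE           (1u << 1)

struct model_page {
	unsigned int flags;
};

/* DIRTY: pages start on the dirty list with the free bit cleared. */
static void model_page_init_dirty(struct model_page *p)
{
	p->flags = 0;
}

/* DIRTY or IN-USE -> FREE: set when the page is added to a free list. */
static void model_page_free(struct model_page *p)
{
	p->flags = MODEL_PAGE_IS_FREE;
}

/* FREE -> IN-USE: cleared when the page is allocated. */
static void model_page_alloc(struct model_page *p)
{
	p->flags = 0;
}

/* In the kernel this check is only reliable while holding node->lock. */
static int model_page_is_free(const struct model_page *p)
{
	return (p->flags & MODEL_PAGE_IS_FREE) != 0;
}
```

Walking one page through init, free, alloc and free again gives
model_page_is_free() results of 0, 1, 0 and 1, matching the diagram.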
 arch/x86/kernel/cpu/sgx/main.c | 2 ++
 arch/x86/kernel/cpu/sgx/sgx.h  | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de02bbcc..825aa91516c8 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -472,6 +472,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
 	list_del_init(&page->list);
 	sgx_nr_free_pages--;
+	page->flags = 0;
 
 	spin_unlock(&node->lock);
 
@@ -626,6 +627,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
+	page->flags = SGX_EPC_PAGE_IS_FREE;
 
 	spin_unlock(&node->lock);
 }
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..5906471156c5 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -26,6 +26,9 @@
 /* Pages, which are being tracked by the page reclaimer. */
 #define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
 
+/* Pages on free list */
+#define SGX_EPC_PAGE_IS_FREE		BIT(1)
+
 struct sgx_epc_page {
 	unsigned int section;
 	unsigned int flags;
-- 
2.31.1



* [PATCH v11 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-10-26 22:00                   ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
  2021-10-26 22:00                     ` [PATCH v11 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages Tony Luck
@ 2021-10-26 22:00                     ` Tony Luck
  2021-11-15 20:22                       ` [tip: x86/sgx] " tip-bot2 for Tony Luck
  2021-10-26 22:00                     ` [PATCH v11 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
                                       ` (4 subsequent siblings)
  6 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-10-26 22:00 UTC (permalink / raw)
  To: Borislav Petkov, x86
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, linux-kernel, Tony Luck,
	Reinette Chatre

X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.

SGX EPC pages do not have Linux "struct page" associated with them.

Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray. N.B. this adds CONFIG_XARRAY_MULTI to
the SGX dependencies, so "select" that in arch/x86/Kconfig for X86/SGX.

Create a function arch_is_platform_page() that simply reports whether an
address is an EPC page for use elsewhere in the kernel. The ACPI error
injection code needs this function and is typically built as a module,
so export it.

Note that arch_is_platform_page() will be slower than other similar
"what type is this page" functions that can simply check bits in the
"struct page".  If there is some future performance critical user of
this function it may need to be implemented in a more efficient way.

Note also that the current implementation of xarray allocates a few
hundred kilobytes for this usage on a system with 4GB of SGX EPC memory
configured. This isn't ideal, but worth it for the code simplicity.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
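
Illustration only, not part of the patch: a minimal userspace sketch of
the address-to-section lookup described above. A linear scan over a small
array stands in for the xarray range lookup, and all model_* names and
the example addresses are hypothetical. The page index computation
follows the same "offset from section base, shifted by the page size"
idea as the per-section pages[] array.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* 4 KiB pages, as on x86 */

struct model_section {
	uint64_t phys_addr; /* base physical address of the section */
	uint64_t size;      /* size in bytes */
};

/* Stand-in for xa_load(): find the section covering paddr, or NULL. */
static const struct model_section *
model_find_section(const struct model_section *secs, size_t n, uint64_t paddr)
{
	for (size_t i = 0; i < n; i++) {
		if (paddr >= secs[i].phys_addr &&
		    paddr - secs[i].phys_addr < secs[i].size)
			return &secs[i];
	}
	return NULL;
}

/* Models arch_is_platform_page(): is this address inside any section? */
static int model_is_platform_page(const struct model_section *secs, size_t n,
				  uint64_t paddr)
{
	return model_find_section(secs, n, paddr) != NULL;
}

/* Index of the page within its section's page array. */
static uint64_t model_page_index(const struct model_section *sec,
				 uint64_t paddr)
{
	return (paddr - sec->phys_addr) >> MODEL_PAGE_SHIFT;
}
```

With a section based at 0x100000000 and four pages long, address
0x100001000 falls in that section at page index 1, while 0x100004000 is
one byte past its end and is not a platform page.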
 arch/x86/Kconfig               | 1 +
 arch/x86/kernel/cpu/sgx/main.c | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d9830e7e1060..b3b5b5a31f89 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1902,6 +1902,7 @@ config X86_SGX
 	select SRCU
 	select MMU_NOTIFIER
 	select NUMA_KEEP_MEMINFO if NUMA
+	select XARRAY_MULTI
 	help
 	  Intel(R) Software Guard eXtensions (SGX) is a set of CPU instructions
 	  that can be used by applications to set aside private regions of code
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 825aa91516c8..5c02cffdabc8 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -20,6 +20,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
 static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
+static DEFINE_XARRAY(sgx_epc_address_space);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
@@ -650,6 +651,8 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	xa_store_range(&sgx_epc_address_space, section->phys_addr,
+		       phys_addr + size - 1, section, GFP_KERNEL);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -661,6 +664,12 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+bool arch_is_platform_page(u64 paddr)
+{
+	return !!xa_load(&sgx_epc_address_space, paddr);
+}
+EXPORT_SYMBOL_GPL(arch_is_platform_page);
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1



* [PATCH v11 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-10-26 22:00                   ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
  2021-10-26 22:00                     ` [PATCH v11 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages Tony Luck
  2021-10-26 22:00                     ` [PATCH v11 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
@ 2021-10-26 22:00                     ` Tony Luck
  2021-11-15 20:22                       ` [tip: x86/sgx] " tip-bot2 for Tony Luck
  2021-10-26 22:00                     ` [PATCH v11 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
                                       ` (3 subsequent siblings)
  6 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-10-26 22:00 UTC (permalink / raw)
  To: Borislav Petkov, x86
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, linux-kernel, Tony Luck,
	Reinette Chatre

A memory controller patrol scrubber can report poison in a page
that isn't currently being used.

Add a "poison" field to struct sgx_epc_page so that a page can be
marked as poisoned. Check for it:
1) When sanitizing dirty pages
2) When freeing epc pages

Poison is a new field, separate from flags, to avoid having to make all
updates to flags atomic or to integrate poison state changes into some
other locking scheme that protects flags (currently just sgx_reclaimer_lock,
which protects the SGX_EPC_PAGE_RECLAIMER_TRACKED bit in page->flags).

In both cases place the poisoned page on a per-node list of poisoned
epc pages to make sure it will not be reallocated.
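
As a rough userspace illustration of the list handling described above
(not the kernel code: the struct and function names are made up for the
example, and a singly linked LIFO list stands in for list_head and the
node lock), a poisoned page is diverted at free time so the allocator
can never hand it out again:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sgx_epc_page / struct sgx_numa_node. */
struct epc_page {
	unsigned short flags;
	unsigned short poison;
	struct epc_page *next;
};

struct numa_node {
	struct epc_page *free_list;
	struct epc_page *poison_list;
};

/* Push a page onto the head of a singly linked list. */
static void push(struct epc_page **head, struct epc_page *page)
{
	page->next = *head;
	*head = page;
}

/* Free path: a poisoned page goes to the poison list, never the free list. */
static void free_epc_page(struct numa_node *node, struct epc_page *page)
{
	assert(node && page);
	if (page->poison)
		push(&node->poison_list, page);
	else
		push(&node->free_list, page);
}

/* Allocation only ever looks at the free list. */
static struct epc_page *alloc_epc_page(struct numa_node *node)
{
	struct epc_page *page = node->free_list;

	if (page)
		node->free_list = page->next;
	return page;
}
```

Because allocation never consults the poison list, no extra check is
needed on the allocation side in this sketch.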

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 26 +++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h  |  4 +++-
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 5c02cffdabc8..e5fcb8354bcc 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -62,6 +62,24 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
+		/*
+		 * Checking page->poison without holding the node->lock
+		 * is racy, but losing the race (i.e. poison is set just
+		 * after the check) just means __eremove() will be uselessly
+		 * called for a page that sgx_free_epc_page() will put onto
+		 * the node->sgx_poison_page_list later.
+		 */
+		if (page->poison) {
+			struct sgx_epc_section *section = &sgx_epc_sections[page->section];
+			struct sgx_numa_node *node = section->node;
+
+			spin_lock(&node->lock);
+			list_move(&page->list, &node->sgx_poison_page_list);
+			spin_unlock(&node->lock);
+
+			continue;
+		}
+
 		ret = __eremove(sgx_get_epc_virt_addr(page));
 		if (!ret) {
 			/*
@@ -626,7 +644,11 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	spin_lock(&node->lock);
 
-	list_add_tail(&page->list, &node->free_page_list);
+	page->owner = NULL;
+	if (page->poison)
+		list_add(&page->list, &node->sgx_poison_page_list);
+	else
+		list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 	page->flags = SGX_EPC_PAGE_IS_FREE;
 
@@ -658,6 +680,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 		section->pages[i].section = index;
 		section->pages[i].flags = 0;
 		section->pages[i].owner = NULL;
+		section->pages[i].poison = 0;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
@@ -724,6 +747,7 @@ static bool __init sgx_page_cache_init(void)
 		if (!node_isset(nid, sgx_numa_mask)) {
 			spin_lock_init(&sgx_numa_nodes[nid].lock);
 			INIT_LIST_HEAD(&sgx_numa_nodes[nid].free_page_list);
+			INIT_LIST_HEAD(&sgx_numa_nodes[nid].sgx_poison_page_list);
 			node_set(nid, sgx_numa_mask);
 		}
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 5906471156c5..9ec3136c7800 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -31,7 +31,8 @@
 
 struct sgx_epc_page {
 	unsigned int section;
-	unsigned int flags;
+	u16 flags;
+	u16 poison;
 	struct sgx_encl_page *owner;
 	struct list_head list;
 };
@@ -42,6 +43,7 @@ struct sgx_epc_page {
  */
 struct sgx_numa_node {
 	struct list_head free_page_list;
+	struct list_head sgx_poison_page_list;
 	spinlock_t lock;
 };
 
-- 
2.31.1



* [PATCH v11 4/7] x86/sgx: Add SGX infrastructure to recover from poison
  2021-10-26 22:00                   ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
                                       ` (2 preceding siblings ...)
  2021-10-26 22:00                     ` [PATCH v11 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-10-26 22:00                     ` Tony Luck
  2021-11-15 20:22                       ` [tip: x86/sgx] " tip-bot2 for Tony Luck
  2021-10-26 22:00                     ` [PATCH v11 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
                                       ` (2 subsequent siblings)
  6 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-10-26 22:00 UTC (permalink / raw)
  To: Borislav Petkov, x86
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, linux-kernel, Tony Luck,
	Reinette Chatre

Provide a recovery function arch_memory_failure(). If the poison was
consumed synchronously then send a SIGBUS. Note that, unlike poison
outside of SGX enclaves, the virtual address of the access is not
included with the SIGBUS. This doesn't matter as addresses of code/data
inside an enclave are of little to no use to code executing outside the
(now dead) enclave.

Poison found in a free page results in the page being moved from the
free list to the per-node poison page list.
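
The decision flow described above can be sketched in userspace roughly
as follows. This is only an illustration under stated assumptions: the
enum, struct and function names are invented for the example, the
MF_ACTION_REQUIRED value is illustrative, and locking and the actual
signal delivery are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MF_ACTION_REQUIRED 0x2	/* illustrative value for this sketch */

enum page_state { PAGE_FREE, PAGE_IN_USE };

enum outcome {
	ALREADY_POISONED,	/* nothing more to do */
	MOVED_TO_POISON_LIST,	/* page was free, quarantined at once */
	DEFERRED_UNTIL_FREE,	/* page in use, handled when it is freed */
};

/* Hypothetical stand-in for struct sgx_epc_page. */
struct epc_page {
	enum page_state state;
	bool poison;
};

struct result {
	bool sigbus;		/* would a SIGBUS be sent to the task? */
	enum outcome outcome;
};

static struct result handle_poison(struct epc_page *page, int flags)
{
	struct result r = {
		/* Synchronous consumption: the current task gets SIGBUS. */
		.sigbus = (flags & MF_ACTION_REQUIRED) != 0,
		.outcome = DEFERRED_UNTIL_FREE,
	};

	assert(page);

	if (page->poison) {
		r.outcome = ALREADY_POISONED;
		return r;
	}

	page->poison = true;
	if (page->state == PAGE_FREE)
		r.outcome = MOVED_TO_POISON_LIST;
	return r;
}
```

The in-use case deliberately does nothing beyond setting the mark; the
page is dealt with later, when its owner gives it back.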

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 76 ++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index e5fcb8354bcc..231c494dfd40 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -693,6 +693,82 @@ bool arch_is_platform_page(u64 paddr)
 }
 EXPORT_SYMBOL_GPL(arch_is_platform_page);
 
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+
+	section = xa_load(&sgx_epc_address_space, paddr);
+	if (!section)
+		return NULL;
+
+	return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+}
+
+/*
+ * Called in process context to handle a hardware reported
+ * error in an SGX EPC page.
+ * If the MF_ACTION_REQUIRED bit is set in flags, then the
+ * context is the task that consumed the poison data. Otherwise
+ * this is called from a kernel thread unrelated to the page.
+ */
+int arch_memory_failure(unsigned long pfn, int flags)
+{
+	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+	struct sgx_epc_section *section;
+	struct sgx_numa_node *node;
+
+	/*
+	 * mm/memory-failure.c calls this routine for all errors
+	 * where there isn't a "struct page" for the address. But that
+	 * includes other address ranges besides SGX.
+	 */
+	if (!page)
+		return -ENXIO;
+
+	/*
+	 * If poison was consumed synchronously, send a SIGBUS to
+	 * the task. Hardware has already exited the SGX enclave and
+	 * will not allow re-entry to an enclave that has a memory
+	 * error. The signal may help the task understand why the
+	 * enclave is broken.
+	 */
+	if (flags & MF_ACTION_REQUIRED)
+		force_sig(SIGBUS);
+
+	section = &sgx_epc_sections[page->section];
+	node = section->node;
+
+	spin_lock(&node->lock);
+
+	/* Already poisoned? Nothing more to do */
+	if (page->poison)
+		goto out;
+
+	page->poison = 1;
+
+	/*
+	 * If the page is on a free list, move it to the per-node
+	 * poison page list.
+	 */
+	if (page->flags & SGX_EPC_PAGE_IS_FREE) {
+		list_move(&page->list, &node->sgx_poison_page_list);
+		goto out;
+	}
+
+	/*
+	 * TBD: Add additional plumbing to enable pre-emptive
+	 * action for asynchronous poison notification. Until
+	 * then just hope that the poison:
+	 * a) is not accessed - sgx_free_epc_page() will deal with it
+	 *    when the user gives it back
+	 * b) results in a recoverable machine check rather than
+	 *    a fatal one
+	 */
+out:
+	spin_unlock(&node->lock);
+	return 0;
+}
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1



* [PATCH v11 5/7] x86/sgx: Hook arch_memory_failure() into mainline code
  2021-10-26 22:00                   ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
                                       ` (3 preceding siblings ...)
  2021-10-26 22:00                     ` [PATCH v11 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
@ 2021-10-26 22:00                     ` Tony Luck
  2021-11-15 20:22                       ` [tip: x86/sgx] " tip-bot2 for Tony Luck
  2021-10-26 22:00                     ` [PATCH v11 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
  2021-10-26 22:00                     ` [PATCH v11 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
  6 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-10-26 22:00 UTC (permalink / raw)
  To: Borislav Petkov, x86
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, linux-kernel, Tony Luck,
	Reinette Chatre

Add a call inside memory_failure() to the arch-specific code, which
checks whether the address is an SGX EPC page and handles it.

Note the SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.

Pull the call to acquire the mutex earlier so the SGX errors are also
protected.

Make set_mce_nospec() skip SGX pages when trying to adjust
the 1:1 map.
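
For readers unfamiliar with the override idiom used for such hooks, here
is a minimal standalone sketch. It assumes a made-up DEMO_HAVE_SGX
switch and a made-up caller demo_memory_failure(); only the
"#define name name" plus "#ifndef name" arrangement is the point:

```c
#include <assert.h>
#include <errno.h>

/* "Arch header": only visible when the feature is configured. */
#ifdef DEMO_HAVE_SGX	/* hypothetical switch; a real definition must then be linked in */
int arch_memory_failure(unsigned long pfn, int flags);
#define arch_memory_failure arch_memory_failure
#endif

/* "Generic header": fallback used when no arch override was declared. */
#ifndef arch_memory_failure
static inline int arch_memory_failure(unsigned long pfn, int flags)
{
	(void)pfn;
	(void)flags;
	return -ENXIO;	/* "not mine", let the caller try something else */
}
#endif

/* Hypothetical caller: try the arch hook first, then fall back. */
static int demo_memory_failure(unsigned long pfn, int flags)
{
	int res = arch_memory_failure(pfn, flags);

	if (res == 0)
		return 0;	/* arch code claimed and handled the page */

	/* Other handling would go here; this sketch just reports failure. */
	return -ENXIO;
}
```

Defining the macro to its own name lets the generic header detect the
override with the preprocessor, so callers need no #ifdef of their own.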

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/processor.h  |  8 ++++++++
 arch/x86/include/asm/set_memory.h |  4 ++++
 include/linux/mm.h                | 13 +++++++++++++
 mm/memory-failure.c               | 19 +++++++++++++------
 4 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 9ad2acaaae9b..4865f2860a4f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -853,4 +853,12 @@ enum mds_mitigations {
 	MDS_MITIGATION_VMWERV,
 };
 
+#ifdef CONFIG_X86_SGX
+int arch_memory_failure(unsigned long pfn, int flags);
+#define arch_memory_failure arch_memory_failure
+
+bool arch_is_platform_page(u64 paddr);
+#define arch_is_platform_page arch_is_platform_page
+#endif
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 43fa081a1adb..ce8dd215f5b3 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_SET_MEMORY_H
 #define _ASM_X86_SET_MEMORY_H
 
+#include <linux/mm.h>
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
 
@@ -98,6 +99,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
 	unsigned long decoy_addr;
 	int rc;
 
+	/* SGX pages are not in the 1:1 map */
+	if (arch_is_platform_page(pfn << PAGE_SHIFT))
+		return 0;
 	/*
 	 * We would like to just call:
 	 *      set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 73a52aba448f..0aa48b238db2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3124,6 +3124,19 @@ extern void shake_page(struct page *p);
 extern atomic_long_t num_poisoned_pages __read_mostly;
 extern int soft_offline_page(unsigned long pfn, int flags);
 
+#ifndef arch_memory_failure
+static inline int arch_memory_failure(unsigned long pfn, int flags)
+{
+	return -ENXIO;
+}
+#endif
+
+#ifndef arch_is_platform_page
+static inline bool arch_is_platform_page(u64 paddr)
+{
+	return false;
+}
+#endif
 
 /*
  * Error handlers for various types of pages.
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3e6449f2102a..b1cbf9845c19 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1632,21 +1632,28 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
 
+	mutex_lock(&mf_mutex);
+
 	p = pfn_to_online_page(pfn);
 	if (!p) {
+		res = arch_memory_failure(pfn, flags);
+		if (res == 0)
+			goto unlock_mutex;
+
 		if (pfn_valid(pfn)) {
 			pgmap = get_dev_pagemap(pfn, NULL);
-			if (pgmap)
-				return memory_failure_dev_pagemap(pfn, flags,
-								  pgmap);
+			if (pgmap) {
+				res = memory_failure_dev_pagemap(pfn, flags,
+								 pgmap);
+				goto unlock_mutex;
+			}
 		}
 		pr_err("Memory failure: %#lx: memory outside kernel control\n",
 			pfn);
-		return -ENXIO;
+		res = -ENXIO;
+		goto unlock_mutex;
 	}
 
-	mutex_lock(&mf_mutex);
-
 try_again:
 	if (PageHuge(p)) {
 		res = memory_failure_hugetlb(pfn, flags);
-- 
2.31.1



* [PATCH v11 6/7] x86/sgx: Add hook to error injection address validation
  2021-10-26 22:00                   ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
                                       ` (4 preceding siblings ...)
  2021-10-26 22:00                     ` [PATCH v11 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
@ 2021-10-26 22:00                     ` Tony Luck
  2021-11-15 20:22                       ` [tip: x86/sgx] " tip-bot2 for Tony Luck
  2021-10-26 22:00                     ` [PATCH v11 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
  6 siblings, 1 reply; 185+ messages in thread
From: Tony Luck @ 2021-10-26 22:00 UTC (permalink / raw)
  To: Borislav Petkov, x86
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, linux-kernel, Tony Luck,
	Reinette Chatre

SGX reserved memory does not appear in the standard address maps.

Add a hook that calls into the SGX code to check whether an address is
located in SGX memory.

There are other challenges in injecting errors into SGX. Update the
documentation with a sequence of operations to inject.
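
The effect of the extra check on address validation can be sketched as
below. This is a simplified userspace model, not the einj code: the
range tables and function names are hypothetical, persistent memory is
omitted, and a single address is tested rather than a region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range with inclusive bounds. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Return true if addr falls inside any of the n ranges. */
static bool in_ranges(const struct range *r, size_t n, uint64_t addr)
{
	assert(r || n == 0);
	for (size_t i = 0; i < n; i++)
		if (addr >= r[i].start && addr <= r[i].end)
			return true;
	return false;
}

/*
 * Return true when injection at addr should be refused: the address is
 * neither ordinary RAM nor a platform (EPC) page.
 */
static bool reject_injection(const struct range *ram, size_t nram,
			     const struct range *epc, size_t nepc,
			     uint64_t addr)
{
	return !in_ranges(ram, nram, addr) && !in_ranges(epc, nepc, addr);
}
```

With an empty EPC table the function behaves like the check without the
hook, which is why EPC addresses were refused before.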

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 .../firmware-guide/acpi/apei/einj.rst         | 19 +++++++++++++++++++
 drivers/acpi/apei/einj.c                      |  3 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index c042176e1707..55e2331a6438 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,5 +181,24 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+Special notes for injection into SGX enclaves:
+
+There may be a separate BIOS setup option to enable SGX injection.
+
+The injection process consists of setting some special memory controller
+trigger that will inject the error on the next write to the target
+address. But the h/w prevents any software outside of an SGX enclave
+from accessing enclave pages (even BIOS SMM mode).
+
+The following sequence can be used:
+  1) Determine physical address of enclave page
+  2) Use "notrigger=1" mode to inject (this will set up
+     the injection address, but will not actually inject)
+  3) Enter the enclave
+  4) Store data to the virtual address matching physical address from step 1
+  5) Execute CLFLUSH for that virtual address
+  6) Spin delay for 250ms
+  7) Read from the virtual address. This will trigger the error
+
 For more information about EINJ, please refer to ACPI specification
 version 4.0, section 17.5 and ACPI 5.0, section 18.6.
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 2882450c443e..67c335baad52 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -544,7 +544,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	    ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
 				!= REGION_INTERSECTS) &&
 	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
-				!= REGION_INTERSECTS)))
+				!= REGION_INTERSECTS) &&
+	     !arch_is_platform_page(base_addr)))
 		return -EINVAL;
 
 inject:
-- 
2.31.1



* [PATCH v11 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  2021-10-26 22:00                   ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
                                       ` (5 preceding siblings ...)
  2021-10-26 22:00                     ` [PATCH v11 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
@ 2021-10-26 22:00                     ` Tony Luck
  2021-10-29 18:39                       ` Rafael J. Wysocki
  2021-11-15 20:22                       ` [tip: x86/sgx] " tip-bot2 for Tony Luck
  6 siblings, 2 replies; 185+ messages in thread
From: Tony Luck @ 2021-10-26 22:00 UTC (permalink / raw)
  To: Borislav Petkov, x86
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, linux-kernel, Tony Luck,
	Reinette Chatre

SGX EPC pages do not have a "struct page" associated with them so the
pfn_valid() sanity check fails and results in a warning message to
the console.

Add an additional check to skip the warning if the address of the error
is in an SGX EPC page.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 drivers/acpi/apei/ghes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0c8330ed1ffd..0c5c9acc6254 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -449,7 +449,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 		return false;
 
 	pfn = PHYS_PFN(physical_addr);
-	if (!pfn_valid(pfn)) {
+	if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX
 		"Invalid address in generic error data: %#llx\n",
 		physical_addr);
-- 
2.31.1



* Re: [PATCH v11 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  2021-10-26 22:00                     ` [PATCH v11 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
@ 2021-10-29 18:39                       ` Rafael J. Wysocki
  2021-11-15 20:22                       ` [tip: x86/sgx] " tip-bot2 for Tony Luck
  1 sibling, 0 replies; 185+ messages in thread
From: Rafael J. Wysocki @ 2021-10-29 18:39 UTC (permalink / raw)
  To: Tony Luck
  Cc: Borislav Petkov, the arch/x86 maintainers, Rafael J. Wysocki,
	HORIGUCHI NAOYA(堀口 直也),
	Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, ACPI Devel Maling List,
	Linux Memory Management List, Linux Kernel Mailing List,
	Reinette Chatre

On Wed, Oct 27, 2021 at 12:01 AM Tony Luck <tony.luck@intel.com> wrote:
>
> SGX EPC pages do not have a "struct page" associated with them so the
> pfn_valid() sanity check fails and results in a warning message to
> the console.
>
> Add an additional check to skip the warning if the address of the error
> is in an SGX EPC page.
>
> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
> Tested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/acpi/apei/ghes.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 0c8330ed1ffd..0c5c9acc6254 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -449,7 +449,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
>                 return false;
>
>         pfn = PHYS_PFN(physical_addr);
> -       if (!pfn_valid(pfn)) {
> +       if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
>                 pr_warn_ratelimited(FW_WARN GHES_PFX
>                 "Invalid address in generic error data: %#llx\n",
>                 physical_addr);
> --
> 2.31.1
>


* [tip: x86/sgx] x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  2021-10-26 22:00                     ` [PATCH v11 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
  2021-10-29 18:39                       ` Rafael J. Wysocki
@ 2021-11-15 20:22                       ` tip-bot2 for Tony Luck
  1 sibling, 0 replies; 185+ messages in thread
From: tip-bot2 for Tony Luck @ 2021-11-15 20:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Tony Luck, Dave Hansen, Jarkko Sakkinen, Reinette Chatre, x86,
	linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     3ad6fd77a2d62e8f4465b429b65805eaf88e1b9e
Gitweb:        https://git.kernel.org/tip/3ad6fd77a2d62e8f4465b429b65805eaf88e1b9e
Author:        Tony Luck <tony.luck@intel.com>
AuthorDate:    Tue, 26 Oct 2021 15:00:50 -07:00
Committer:     Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Mon, 15 Nov 2021 11:13:16 -08:00

x86/sgx: Add check for SGX pages to ghes_do_memory_failure()

SGX EPC pages do not have a "struct page" associated with them so the
pfn_valid() sanity check fails and results in a warning message to
the console.

Add an additional check to skip the warning if the address of the error
is in an SGX EPC page.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-8-tony.luck@intel.com
---
 drivers/acpi/apei/ghes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0c8330e..0c5c9ac 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -449,7 +449,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 		return false;
 
 	pfn = PHYS_PFN(physical_addr);
-	if (!pfn_valid(pfn)) {
+	if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX
 		"Invalid address in generic error data: %#llx\n",
 		physical_addr);


* [tip: x86/sgx] x86/sgx: Add hook to error injection address validation
  2021-10-26 22:00                     ` [PATCH v11 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
@ 2021-11-15 20:22                       ` tip-bot2 for Tony Luck
  0 siblings, 0 replies; 185+ messages in thread
From: tip-bot2 for Tony Luck @ 2021-11-15 20:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Tony Luck, Dave Hansen, Jarkko Sakkinen, Reinette Chatre, x86,
	linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     c6acb1e7bf4656b9434335c72b8245cc84575fde
Gitweb:        https://git.kernel.org/tip/c6acb1e7bf4656b9434335c72b8245cc84575fde
Author:        Tony Luck <tony.luck@intel.com>
AuthorDate:    Tue, 26 Oct 2021 15:00:49 -07:00
Committer:     Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Mon, 15 Nov 2021 11:13:16 -08:00

x86/sgx: Add hook to error injection address validation

SGX reserved memory does not appear in the standard address maps.

Add a hook that calls into the SGX code to check whether an address is
located in SGX memory.

There are other challenges in injecting errors into SGX. Update the
documentation with a sequence of operations to inject.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-7-tony.luck@intel.com
---
 Documentation/firmware-guide/acpi/apei/einj.rst | 19 ++++++++++++++++-
 drivers/acpi/apei/einj.c                        |  3 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index c042176..55e2331 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,5 +181,24 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+Special notes for injection into SGX enclaves:
+
+There may be a separate BIOS setup option to enable SGX injection.
+
+The injection process consists of setting some special memory controller
+trigger that will inject the error on the next write to the target
+address. But the h/w prevents any software outside of an SGX enclave
+from accessing enclave pages (even BIOS SMM mode).
+
+The following sequence can be used:
+  1) Determine physical address of enclave page
+  2) Use "notrigger=1" mode to inject (this will set up
+     the injection address, but will not actually inject)
+  3) Enter the enclave
+  4) Store data to the virtual address matching physical address from step 1
+  5) Execute CLFLUSH for that virtual address
+  6) Spin delay for 250ms
+  7) Read from the virtual address. This will trigger the error
+
 For more information about EINJ, please refer to ACPI specification
 version 4.0, section 17.5 and ACPI 5.0, section 18.6.
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index edb2622..95cc2a9 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -545,7 +545,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	    ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
 				!= REGION_INTERSECTS) &&
 	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
-				!= REGION_INTERSECTS)))
+				!= REGION_INTERSECTS) &&
+	     !arch_is_platform_page(base_addr)))
 		return -EINVAL;
 
 inject:


* [tip: x86/sgx] x86/sgx: Hook arch_memory_failure() into mainline code
  2021-10-26 22:00                     ` [PATCH v11 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
@ 2021-11-15 20:22                       ` tip-bot2 for Tony Luck
  0 siblings, 0 replies; 185+ messages in thread
From: tip-bot2 for Tony Luck @ 2021-11-15 20:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Tony Luck, Dave Hansen, Jarkko Sakkinen, Naoya Horiguchi,
	Reinette Chatre, x86, linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     03b122da74b22fbe7cd98184fa5657a9ce13970c
Gitweb:        https://git.kernel.org/tip/03b122da74b22fbe7cd98184fa5657a9ce13970c
Author:        Tony Luck <tony.luck@intel.com>
AuthorDate:    Tue, 26 Oct 2021 15:00:48 -07:00
Committer:     Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Mon, 15 Nov 2021 11:13:16 -08:00

x86/sgx: Hook arch_memory_failure() into mainline code

Add a call inside memory_failure() to the arch-specific code, which
checks whether the address is an SGX EPC page and handles it.

Note the SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.

Pull the call to acquire the mutex earlier so the SGX errors are also
protected.

Make set_mce_nospec() skip SGX pages when trying to adjust
the 1:1 map.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-6-tony.luck@intel.com
---
 arch/x86/include/asm/processor.h  |  8 ++++++++
 arch/x86/include/asm/set_memory.h |  4 ++++
 include/linux/mm.h                | 13 +++++++++++++
 mm/memory-failure.c               | 19 +++++++++++++------
 4 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 355d38c..2c5f12a 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -855,4 +855,12 @@ enum mds_mitigations {
 	MDS_MITIGATION_VMWERV,
 };
 
+#ifdef CONFIG_X86_SGX
+int arch_memory_failure(unsigned long pfn, int flags);
+#define arch_memory_failure arch_memory_failure
+
+bool arch_is_platform_page(u64 paddr);
+#define arch_is_platform_page arch_is_platform_page
+#endif
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 8726175..ff0f2d9 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_SET_MEMORY_H
 #define _ASM_X86_SET_MEMORY_H
 
+#include <linux/mm.h>
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
 
@@ -99,6 +100,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
 	unsigned long decoy_addr;
 	int rc;
 
+	/* SGX pages are not in the 1:1 map */
+	if (arch_is_platform_page(pfn << PAGE_SHIFT))
+		return 0;
 	/*
 	 * We would like to just call:
 	 *      set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a7e4a9e..57f1aa2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3231,6 +3231,19 @@ extern void shake_page(struct page *p);
 extern atomic_long_t num_poisoned_pages __read_mostly;
 extern int soft_offline_page(unsigned long pfn, int flags);
 
+#ifndef arch_memory_failure
+static inline int arch_memory_failure(unsigned long pfn, int flags)
+{
+	return -ENXIO;
+}
+#endif
+
+#ifndef arch_is_platform_page
+static inline bool arch_is_platform_page(u64 paddr)
+{
+	return false;
+}
+#endif
 
 /*
  * Error handlers for various types of pages.
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 07c875f..fddee33 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1651,21 +1651,28 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
 
+	mutex_lock(&mf_mutex);
+
 	p = pfn_to_online_page(pfn);
 	if (!p) {
+		res = arch_memory_failure(pfn, flags);
+		if (res == 0)
+			goto unlock_mutex;
+
 		if (pfn_valid(pfn)) {
 			pgmap = get_dev_pagemap(pfn, NULL);
-			if (pgmap)
-				return memory_failure_dev_pagemap(pfn, flags,
-								  pgmap);
+			if (pgmap) {
+				res = memory_failure_dev_pagemap(pfn, flags,
+								 pgmap);
+				goto unlock_mutex;
+			}
 		}
 		pr_err("Memory failure: %#lx: memory outside kernel control\n",
 			pfn);
-		return -ENXIO;
+		res = -ENXIO;
+		goto unlock_mutex;
 	}
 
-	mutex_lock(&mf_mutex);
-
 try_again:
 	if (PageHuge(p)) {
 		res = memory_failure_hugetlb(pfn, flags);


* [tip: x86/sgx] x86/sgx: Add SGX infrastructure to recover from poison
  2021-10-26 22:00                     ` [PATCH v11 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
@ 2021-11-15 20:22                       ` tip-bot2 for Tony Luck
  0 siblings, 0 replies; 185+ messages in thread
From: tip-bot2 for Tony Luck @ 2021-11-15 20:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Tony Luck, Dave Hansen, Jarkko Sakkinen, Reinette Chatre, x86,
	linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     a495cbdffa30558b34f3c95555cecc4fd9688039
Gitweb:        https://git.kernel.org/tip/a495cbdffa30558b34f3c95555cecc4fd9688039
Author:        Tony Luck <tony.luck@intel.com>
AuthorDate:    Tue, 26 Oct 2021 15:00:47 -07:00
Committer:     Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Mon, 15 Nov 2021 11:13:16 -08:00

x86/sgx: Add SGX infrastructure to recover from poison

Provide a recovery function sgx_memory_failure(). If the poison was
consumed synchronously then send a SIGBUS. Note that, unlike for poison
outside of SGX enclaves, the virtual address of the access is not
included with the SIGBUS. This doesn't matter as addresses of code/data
inside an enclave are of little to no use to code executing outside the
(now dead) enclave.

Poison found in a free page results in the page being moved from the
free list to the per-node poison page list.
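
As a rough illustration of the decision flow described above, here is a
user-space sketch. It is not the kernel code: struct sim_page,
struct sim_result and sim_memory_failure() are hypothetical names, the
"quarantined" flag stands in for being on the per-node poison page list,
and locking is omitted entirely.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-page state the handler looks at. */
struct sim_page {
	bool is_free;      /* models SGX_EPC_PAGE_IS_FREE */
	bool poison;       /* models page->poison */
	bool quarantined;  /* models "moved to the per-node poison list" */
};

struct sim_result {
	int ret;           /* 0 if handled, -ENXIO if not an EPC page */
	bool sigbus;       /* would the consuming task get a SIGBUS? */
};

/* page == NULL models "address is not in any EPC section". */
static struct sim_result sim_memory_failure(struct sim_page *page,
					    bool action_required)
{
	struct sim_result r = { 0, false };

	if (!page) {
		r.ret = -ENXIO;
		return r;
	}

	/* Synchronous consumption: signal the task that hit the poison. */
	if (action_required)
		r.sigbus = true;

	/* Already poisoned: nothing more to do. */
	if (page->poison)
		return r;

	page->poison = true;

	/* Free pages can be taken out of circulation immediately. */
	if (page->is_free)
		page->quarantined = true;

	/* In-use pages are only marked; they are dealt with when freed. */
	return r;
}
```

In this model an in-use page is merely marked, which matches the "TBD"
note in the patch: nothing pre-emptive happens until the owner gives the
page back.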

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-5-tony.luck@intel.com
---
 arch/x86/kernel/cpu/sgx/main.c | 76 +++++++++++++++++++++++++++++++++-
 1 file changed, 76 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index e5fcb83..231c494 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -693,6 +693,82 @@ bool arch_is_platform_page(u64 paddr)
 }
 EXPORT_SYMBOL_GPL(arch_is_platform_page);
 
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+
+	section = xa_load(&sgx_epc_address_space, paddr);
+	if (!section)
+		return NULL;
+
+	return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+}
+
+/*
+ * Called in process context to handle a hardware reported
+ * error in an SGX EPC page.
+ * If the MF_ACTION_REQUIRED bit is set in flags, then the
+ * context is the task that consumed the poison data. Otherwise
+ * this is called from a kernel thread unrelated to the page.
+ */
+int arch_memory_failure(unsigned long pfn, int flags)
+{
+	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+	struct sgx_epc_section *section;
+	struct sgx_numa_node *node;
+
+	/*
+	 * mm/memory-failure.c calls this routine for all errors
+	 * where there isn't a "struct page" for the address. But that
+	 * includes other address ranges besides SGX.
+	 */
+	if (!page)
+		return -ENXIO;
+
+	/*
+	 * If poison was consumed synchronously, send a SIGBUS to
+	 * the task. Hardware has already exited the SGX enclave and
+	 * will not allow re-entry to an enclave that has a memory
+	 * error. The signal may help the task understand why the
+	 * enclave is broken.
+	 */
+	if (flags & MF_ACTION_REQUIRED)
+		force_sig(SIGBUS);
+
+	section = &sgx_epc_sections[page->section];
+	node = section->node;
+
+	spin_lock(&node->lock);
+
+	/* Already poisoned? Nothing more to do */
+	if (page->poison)
+		goto out;
+
+	page->poison = 1;
+
+	/*
+	 * If the page is on a free list, move it to the per-node
+	 * poison page list.
+	 */
+	if (page->flags & SGX_EPC_PAGE_IS_FREE) {
+		list_move(&page->list, &node->sgx_poison_page_list);
+		goto out;
+	}
+
+	/*
+	 * TBD: Add additional plumbing to enable pre-emptive
+	 * action for asynchronous poison notification. Until
+	 * then just hope that the poison:
+	 * a) is not accessed - sgx_free_epc_page() will deal with it
+	 *    when the user gives it back
+	 * b) results in a recoverable machine check rather than
+	 *    a fatal one
+	 */
+out:
+	spin_unlock(&node->lock);
+	return 0;
+}
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the


* [tip: x86/sgx] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-10-26 22:00                     ` [PATCH v11 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
@ 2021-11-15 20:22                       ` tip-bot2 for Tony Luck
  0 siblings, 0 replies; 185+ messages in thread
From: tip-bot2 for Tony Luck @ 2021-11-15 20:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Tony Luck, Dave Hansen, Jarkko Sakkinen, Reinette Chatre, x86,
	linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     40e0e7843e23d164625b9031514f5672f8758bf4
Gitweb:        https://git.kernel.org/tip/40e0e7843e23d164625b9031514f5672f8758bf4
Author:        Tony Luck <tony.luck@intel.com>
AuthorDate:    Tue, 26 Oct 2021 15:00:45 -07:00
Committer:     Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Mon, 15 Nov 2021 11:13:16 -08:00

x86/sgx: Add infrastructure to identify SGX EPC pages

X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.

SGX EPC pages do not have Linux "struct page" associated with them.

Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray. N.B. this adds CONFIG_XARRAY_MULTI to
the SGX dependencies, so "select" it in arch/x86/Kconfig for X86/SGX.

Create a function arch_is_platform_page() that simply reports whether an
address is an EPC page for use elsewhere in the kernel. The ACPI error
injection code needs this function and is typically built as a module,
so export it.

Note that arch_is_platform_page() will be slower than other similar
"what type is this page" functions that can simply check bits in the
"struct page".  If there is some future performance critical user of
this function it may need to be implemented in a more efficient way.

Note also that the current implementation of xarray allocates a few
hundred kilobytes for this usage on a system with 4GB of SGX EPC memory
configured. This isn't ideal, but worth it for the code simplicity.
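
For readers who want to see the shape of the lookup without the xarray,
the following user-space sketch models it with a linear scan over a
small table. The section addresses and sizes are made up, and the demo_*
names are hypothetical; the kernel uses xa_store_range() and xa_load()
instead of a loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12

struct demo_epc_section {
	uint64_t phys_addr;  /* first byte of the section */
	uint64_t size;       /* length in bytes */
};

/* Assumed layout for illustration only: two EPC sections. */
static const struct demo_epc_section demo_sections[] = {
	{ 0x100000000ULL, 0x4000000ULL },
	{ 0x200000000ULL, 0x2000000ULL },
};

/* Return the section containing paddr, or NULL if it is not EPC. */
static const struct demo_epc_section *demo_paddr_to_section(uint64_t paddr)
{
	size_t n = sizeof(demo_sections) / sizeof(demo_sections[0]);

	for (size_t i = 0; i < n; i++) {
		const struct demo_epc_section *s = &demo_sections[i];

		if (paddr >= s->phys_addr && paddr - s->phys_addr < s->size)
			return s;
	}
	return NULL;
}

/* Models arch_is_platform_page(): true if paddr lies in any section. */
static bool demo_is_platform_page(uint64_t paddr)
{
	return demo_paddr_to_section(paddr) != NULL;
}

/* Page index within a section, i.e. the byte offset shifted down. */
static size_t demo_page_index(const struct demo_epc_section *s,
			      uint64_t paddr)
{
	return (size_t)((paddr - s->phys_addr) >> DEMO_PAGE_SHIFT);
}
```

The linear scan is only acceptable here because the table is tiny; the
xarray gives the kernel the same answer without walking every section.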

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-3-tony.luck@intel.com
---
 arch/x86/Kconfig               |  1 +
 arch/x86/kernel/cpu/sgx/main.c |  9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 95dd1ee..b9281fa 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1917,6 +1917,7 @@ config X86_SGX
 	select SRCU
 	select MMU_NOTIFIER
 	select NUMA_KEEP_MEMINFO if NUMA
+	select XARRAY_MULTI
 	help
 	  Intel(R) Software Guard eXtensions (SGX) is a set of CPU instructions
 	  that can be used by applications to set aside private regions of code
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 825aa91..5c02cff 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -20,6 +20,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
 static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
+static DEFINE_XARRAY(sgx_epc_address_space);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
@@ -650,6 +651,8 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	xa_store_range(&sgx_epc_address_space, section->phys_addr,
+		       phys_addr + size - 1, section, GFP_KERNEL);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -661,6 +664,12 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+bool arch_is_platform_page(u64 paddr)
+{
+	return !!xa_load(&sgx_epc_address_space, paddr);
+}
+EXPORT_SYMBOL_GPL(arch_is_platform_page);
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the


* [tip: x86/sgx] x86/sgx: Initial poison handling for dirty and free pages
  2021-10-26 22:00                     ` [PATCH v11 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-11-15 20:22                       ` tip-bot2 for Tony Luck
  0 siblings, 0 replies; 185+ messages in thread
From: tip-bot2 for Tony Luck @ 2021-11-15 20:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Tony Luck, Dave Hansen, Jarkko Sakkinen, Reinette Chatre, x86,
	linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     992801ae92431761b3d8ec88abd5793d154d34ac
Gitweb:        https://git.kernel.org/tip/992801ae92431761b3d8ec88abd5793d154d34ac
Author:        Tony Luck <tony.luck@intel.com>
AuthorDate:    Tue, 26 Oct 2021 15:00:46 -07:00
Committer:     Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Mon, 15 Nov 2021 11:13:16 -08:00

x86/sgx: Initial poison handling for dirty and free pages

A memory controller patrol scrubber can report poison in a page
that isn't currently being used.

Add a "poison" field to struct sgx_epc_page that is set when an error
is reported for the page. Check for it:
1) When sanitizing dirty pages
2) When freeing epc pages

Poison is a new field separated from flags to avoid having to make all
updates to flags atomic, or integrate poison state changes into some
other locking scheme to protect flags (Currently just sgx_reclaimer_lock
which protects the SGX_EPC_PAGE_RECLAIMER_TRACKED bit in page->flags).

In both cases place the poisoned page on a per-node list of poisoned
epc pages to make sure it will not be reallocated.
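
The effect on the free path can be sketched in user space as below. This
is a simplified model under stated assumptions: the sketch_* names are
hypothetical, the lists are plain singly linked stacks rather than
struct list_head, and the node->lock that the kernel holds is left out.

```c
#include <stddef.h>

struct sketch_page {
	unsigned short flags;
	unsigned short poison;      /* separate from flags, as in the patch */
	struct sketch_page *next;
};

struct sketch_node {
	struct sketch_page *free_list;
	struct sketch_page *poison_list;
};

static void sketch_push(struct sketch_page **head, struct sketch_page *page)
{
	page->next = *head;
	*head = page;
}

/* Route a returned page: poisoned pages never reach the free list. */
static void sketch_free_page(struct sketch_node *node,
			     struct sketch_page *page)
{
	if (page->poison)
		sketch_push(&node->poison_list, page);
	else
		sketch_push(&node->free_list, page);
}

/* Allocation only ever looks at the free list. */
static struct sketch_page *sketch_alloc_page(struct sketch_node *node)
{
	struct sketch_page *page = node->free_list;

	if (page)
		node->free_list = page->next;
	return page;
}
```

Because the allocator never consults the poison list, a page that lands
there is effectively retired for the lifetime of the system in this
model.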

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-4-tony.luck@intel.com
---
 arch/x86/kernel/cpu/sgx/main.c | 26 +++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h  |  4 +++-
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 5c02cff..e5fcb83 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -62,6 +62,24 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
+		/*
+		 * Checking page->poison without holding the node->lock
+		 * is racy, but losing the race (i.e. poison is set just
+		 * after the check) just means __eremove() will be uselessly
+		 * called for a page that sgx_free_epc_page() will put onto
+		 * the node->sgx_poison_page_list later.
+		 */
+		if (page->poison) {
+			struct sgx_epc_section *section = &sgx_epc_sections[page->section];
+			struct sgx_numa_node *node = section->node;
+
+			spin_lock(&node->lock);
+			list_move(&page->list, &node->sgx_poison_page_list);
+			spin_unlock(&node->lock);
+
+			continue;
+		}
+
 		ret = __eremove(sgx_get_epc_virt_addr(page));
 		if (!ret) {
 			/*
@@ -626,7 +644,11 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	spin_lock(&node->lock);
 
-	list_add_tail(&page->list, &node->free_page_list);
+	page->owner = NULL;
+	if (page->poison)
+		list_add(&page->list, &node->sgx_poison_page_list);
+	else
+		list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 	page->flags = SGX_EPC_PAGE_IS_FREE;
 
@@ -658,6 +680,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 		section->pages[i].section = index;
 		section->pages[i].flags = 0;
 		section->pages[i].owner = NULL;
+		section->pages[i].poison = 0;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
@@ -724,6 +747,7 @@ static bool __init sgx_page_cache_init(void)
 		if (!node_isset(nid, sgx_numa_mask)) {
 			spin_lock_init(&sgx_numa_nodes[nid].lock);
 			INIT_LIST_HEAD(&sgx_numa_nodes[nid].free_page_list);
+			INIT_LIST_HEAD(&sgx_numa_nodes[nid].sgx_poison_page_list);
 			node_set(nid, sgx_numa_mask);
 		}
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 5906471..9ec3136 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -31,7 +31,8 @@
 
 struct sgx_epc_page {
 	unsigned int section;
-	unsigned int flags;
+	u16 flags;
+	u16 poison;
 	struct sgx_encl_page *owner;
 	struct list_head list;
 };
@@ -42,6 +43,7 @@ struct sgx_epc_page {
  */
 struct sgx_numa_node {
 	struct list_head free_page_list;
+	struct list_head sgx_poison_page_list;
 	spinlock_t lock;
 };
 


* [tip: x86/sgx] x86/sgx: Add new sgx_epc_page flag bit to mark free pages
  2021-10-26 22:00                     ` [PATCH v11 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages Tony Luck
@ 2021-11-15 20:22                       ` tip-bot2 for Tony Luck
  0 siblings, 0 replies; 185+ messages in thread
From: tip-bot2 for Tony Luck @ 2021-11-15 20:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Tony Luck, Dave Hansen, Jarkko Sakkinen, Reinette Chatre, x86,
	linux-kernel

The following commit has been merged into the x86/sgx branch of tip:

Commit-ID:     d6d261bded8a57aed4faa12d08a5b193418d3aa4
Gitweb:        https://git.kernel.org/tip/d6d261bded8a57aed4faa12d08a5b193418d3aa4
Author:        Tony Luck <tony.luck@intel.com>
AuthorDate:    Tue, 26 Oct 2021 15:00:44 -07:00
Committer:     Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Mon, 15 Nov 2021 11:13:16 -08:00

x86/sgx: Add new sgx_epc_page flag bit to mark free pages

SGX EPC pages go through the following life cycle:

        DIRTY ---> FREE ---> IN-USE --\
                    ^                 |
                    \-----------------/

Recovery action for poison for a DIRTY or FREE page is simple. Just
make sure never to allocate the page. IN-USE pages need some extra
handling.

Add a new flag bit SGX_EPC_PAGE_IS_FREE that is set when a page
is added to a free list and cleared when the page is allocated.

Notes:

1) These transitions are made while holding the node->lock so that
   future code that checks the flags while holding the node->lock
   can be sure that if the SGX_EPC_PAGE_IS_FREE bit is set, then the
   page is on the free list.

2) Initially while the pages are on the dirty list the
   SGX_EPC_PAGE_IS_FREE bit is cleared.
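
A minimal user-space sketch of these flag transitions follows. The two
bit definitions mirror the ones in the patch; struct toy_epc_page and
the toy_* helpers are hypothetical, and the node->lock that serializes
the real transitions is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define SGX_EPC_PAGE_RECLAIMER_TRACKED	(1u << 0)
#define SGX_EPC_PAGE_IS_FREE		(1u << 1)

struct toy_epc_page {
	unsigned int flags;  /* 0 while the page is DIRTY or IN-USE */
};

/* DIRTY or IN-USE -> FREE: set when the page joins a free list. */
static void toy_free_page(struct toy_epc_page *page)
{
	page->flags = SGX_EPC_PAGE_IS_FREE;
}

/* FREE -> IN-USE: allocation clears every flag, including IS_FREE. */
static void toy_alloc_page(struct toy_epc_page *page)
{
	assert(page->flags & SGX_EPC_PAGE_IS_FREE);
	page->flags = 0;
}

static bool toy_page_is_free(const struct toy_epc_page *page)
{
	return (page->flags & SGX_EPC_PAGE_IS_FREE) != 0;
}
```

A page that starts with flags == 0 models the initial DIRTY state from
note 2: it is not reported as free until toy_free_page() has run once.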

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-2-tony.luck@intel.com
---
 arch/x86/kernel/cpu/sgx/main.c | 2 ++
 arch/x86/kernel/cpu/sgx/sgx.h  | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de0..825aa91 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -472,6 +472,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
 	list_del_init(&page->list);
 	sgx_nr_free_pages--;
+	page->flags = 0;
 
 	spin_unlock(&node->lock);
 
@@ -626,6 +627,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
+	page->flags = SGX_EPC_PAGE_IS_FREE;
 
 	spin_unlock(&node->lock);
 }
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628ace..5906471 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -26,6 +26,9 @@
 /* Pages, which are being tracked by the page reclaimer. */
 #define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
 
+/* Pages on free list */
+#define SGX_EPC_PAGE_IS_FREE		BIT(1)
+
 struct sgx_epc_page {
 	unsigned int section;
 	unsigned int flags;


* Re: [PATCH v10 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-10-18 20:25                   ` [PATCH v10 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
@ 2021-11-19 12:25                     ` kernel test robot
  0 siblings, 0 replies; 185+ messages in thread
From: kernel test robot @ 2021-11-19 12:25 UTC (permalink / raw)
  To: Tony Luck; +Cc: llvm, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 1975 bytes --]

Hi Tony,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/x86/sgx]
[also build test ERROR on rafael-pm/linux-next hnaz-mm/master v5.16-rc1 next-20211118]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Tony-Luck/x86-sgx-Add-new-sgx_epc_page-flag-bit-to-mark-free-pages/20211019-052547
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 71eba1c0939e3b1ad1b71fe0171de30e265437e3
config: x86_64-randconfig-r001-20211025 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project a461fa64bb37cffd73f683c74f6b0780379fc2ca)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/bb9a962c5c9e30df5df1546cf8843b3ba3f1476a
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Tony-Luck/x86-sgx-Add-new-sgx_epc_page-flag-bit-to-mark-free-pages/20211019-052547
        git checkout bb9a962c5c9e30df5df1546cf8843b3ba3f1476a
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> ld.lld: error: undefined symbol: xa_store_range
   >>> referenced by main.c:654 (arch/x86/kernel/cpu/sgx/main.c:654)
   >>>               kernel/cpu/sgx/main.o:(sgx_setup_epc_section) in archive arch/x86/built-in.a

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 35893 bytes --]


end of thread, other threads:[~2021-11-19 12:25 UTC | newest]

Thread overview: 185+ messages
2021-07-08 18:14 [PATCH 0/4] Basic recovery for machine checks inside SGX Tony Luck
2021-07-08 18:14 ` [PATCH 1/4] x86/sgx: Track phase and type of SGX EPC pages Tony Luck
2021-07-09 18:08   ` Jarkko Sakkinen
2021-07-09 18:09     ` Jarkko Sakkinen
2021-07-14 20:42   ` Reinette Chatre
2021-07-14 20:59     ` Luck, Tony
2021-07-14 21:21       ` Reinette Chatre
2021-07-14 23:08         ` Sean Christopherson
2021-07-14 23:39           ` Luck, Tony
2021-07-15 15:33             ` Sean Christopherson
2021-07-08 18:14 ` [PATCH 2/4] x86/sgx: Add basic infrastructure to recover from errors in SGX memory Tony Luck
2021-07-08 18:14 ` [PATCH 3/4] x86/sgx: Hook sgx_memory_failure() into mainline code Tony Luck
2021-07-08 18:14 ` [PATCH 4/4] x86/sgx: Add hook to error injection address validation Tony Luck
2021-07-19 18:20 ` [PATCH v2 0/6] Basic recovery for machine checks inside SGX Tony Luck
2021-07-19 18:20   ` [PATCH v2 1/6] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
2021-07-19 18:28     ` Dave Hansen
2021-07-27  2:04     ` Sakkinen, Jarkko
2021-07-19 18:20   ` [PATCH v2 2/6] x86/sgx: Add infrastructure to identify SGX " Tony Luck
2021-07-19 18:20   ` [PATCH v2 3/6] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
2021-07-27  2:08     ` Sakkinen, Jarkko
2021-07-19 18:20   ` [PATCH v2 4/6] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
2021-07-19 18:20   ` [PATCH v2 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code Tony Luck
2021-07-19 18:20   ` [PATCH v2 6/6] x86/sgx: Add hook to error injection address validation Tony Luck
2021-07-27  1:54   ` [PATCH v2 0/6] Basic recovery for machine checks inside SGX Sakkinen, Jarkko
2021-07-28 20:46   ` [PATCH v3 0/7] " Tony Luck
2021-07-28 20:46     ` [PATCH v3 1/7] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
2021-07-28 22:12       ` Dave Hansen
2021-07-28 22:57         ` Luck, Tony
2021-07-28 23:12           ` Dave Hansen
2021-07-28 23:32             ` Sean Christopherson
2021-07-28 23:48               ` Luck, Tony
2021-07-29  0:07                 ` Sean Christopherson
2021-07-29  0:42                   ` Luck, Tony
2021-07-30  0:34           ` Jarkko Sakkinen
2021-07-30  0:33         ` Jarkko Sakkinen
2021-07-28 20:46     ` [PATCH v3 2/7] x86/sgx: Add infrastructure to identify SGX " Tony Luck
2021-07-28 22:19       ` Dave Hansen
2021-07-30  0:38         ` Jarkko Sakkinen
2021-07-30 16:46           ` Sean Christopherson
2021-07-30 16:50             ` Dave Hansen
2021-07-30 18:44               ` Luck, Tony
2021-07-30 20:35                 ` Dave Hansen
2021-07-30 23:35                   ` Luck, Tony
2021-08-03 21:34                     ` Matthew Wilcox
2021-08-03 23:49                       ` Luck, Tony
2021-08-02  8:52                 ` Jarkko Sakkinen
2021-08-02  8:51               ` Jarkko Sakkinen
2021-08-02  8:48             ` Jarkko Sakkinen
2021-07-28 20:46     ` [PATCH v3 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
2021-07-30  0:42       ` Jarkko Sakkinen
2021-07-28 20:46     ` [PATCH v3 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
2021-07-28 22:29       ` Dave Hansen
2021-07-28 23:00         ` Sean Christopherson
2021-07-28 20:46     ` [PATCH v3 5/7] x86/sgx: Hook sgx_memory_failure() into mainline code Tony Luck
2021-07-28 20:46     ` [PATCH v3 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
2021-07-28 20:46     ` [PATCH v3 7/7] x86/sgx: Add documentation for SGX memory errors Tony Luck
2021-08-27 19:55     ` [PATCH v4 0/6] Basic recovery for machine checks inside SGX Tony Luck
2021-08-27 19:55       ` [PATCH v4 1/6] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
2021-09-01  3:55         ` Jarkko Sakkinen
2021-08-27 19:55       ` [PATCH v4 2/6] x86/sgx: Add infrastructure to identify SGX " Tony Luck
2021-09-01  4:30         ` Jarkko Sakkinen
2021-08-27 19:55       ` [PATCH v4 3/6] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
2021-08-27 19:55       ` [PATCH v4 4/6] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
2021-08-27 19:55       ` [PATCH v4 5/6] x86/sgx: Hook sgx_memory_failure() into mainline code Tony Luck
2021-09-03  6:12         ` Jarkko Sakkinen
2021-09-03  6:56           ` Jarkko Sakkinen
2021-09-06 18:51             ` Luck, Tony
2021-09-07 14:07               ` Jarkko Sakkinen
2021-09-07 14:13                 ` Dave Hansen
2021-09-07 15:07                   ` Luck, Tony
2021-09-07 15:03                 ` Luck, Tony
2021-09-07 15:08                   ` Jarkko Sakkinen
2021-09-07 17:46                     ` Luck, Tony
2021-09-08  0:59                       ` Luck, Tony
2021-09-08 16:49                         ` Dave Hansen
2021-09-08  2:29                       ` Jarkko Sakkinen
2021-08-27 19:55       ` [PATCH v4 6/6] x86/sgx: Add hook to error injection address validation Tony Luck
2021-08-27 20:28       ` [PATCH v4 0/6] Basic recovery for machine checks inside SGX Borislav Petkov
2021-08-27 20:43         ` Sean Christopherson
2021-09-01  2:06       ` Jarkko Sakkinen
2021-09-01 14:48         ` Luck, Tony
2021-09-17 21:38       ` [PATCH v5 0/7] " Tony Luck
2021-09-17 21:38         ` [PATCH v5 1/7] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
2021-09-21 21:28           ` Jarkko Sakkinen
2021-09-21 21:34             ` Luck, Tony
2021-09-22  5:17               ` Jarkko Sakkinen
2021-09-21 22:15             ` Dave Hansen
2021-09-22  5:27               ` Jarkko Sakkinen
2021-09-17 21:38         ` [PATCH v5 2/7] x86/sgx: Add infrastructure to identify SGX " Tony Luck
2021-09-21 20:23           ` Dave Hansen
2021-09-21 20:50             ` Luck, Tony
2021-09-21 22:32               ` Dave Hansen
2021-09-21 23:48                 ` Luck, Tony
2021-09-21 23:50                   ` Dave Hansen
2021-09-17 21:38         ` [PATCH v5 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
2021-09-17 21:38         ` [PATCH v5 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
2021-09-17 21:38         ` [PATCH v5 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
2021-09-17 21:38         ` [PATCH v5 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
2021-09-17 21:38         ` [PATCH v5 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
2021-09-22 18:21         ` [PATCH v6 0/7] Basic recovery for machine checks inside SGX Tony Luck
2021-09-22 18:21           ` [PATCH v6 1/7] x86/sgx: Provide indication of life-cycle of EPC pages Tony Luck
2021-09-23 20:21             ` Jarkko Sakkinen
2021-09-23 20:24               ` Jarkko Sakkinen
2021-09-23 20:46                 ` Luck, Tony
2021-09-23 22:11                   ` Luck, Tony
2021-09-28  2:13                     ` Jarkko Sakkinen
2021-09-22 18:21           ` [PATCH v6 2/7] x86/sgx: Add infrastructure to identify SGX " Tony Luck
2021-09-22 18:21           ` [PATCH v6 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
2021-09-22 18:21           ` [PATCH v6 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
2021-09-22 18:21           ` [PATCH v6 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
2021-09-22 18:21           ` [PATCH v6 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
2021-09-22 18:21           ` [PATCH v6 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
2021-09-27 21:34           ` [PATCH v7 0/7] Basic recovery for machine checks inside SGX Tony Luck
2021-09-27 21:34             ` [PATCH v7 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
2021-09-28  2:28               ` Jarkko Sakkinen
2021-09-27 21:34             ` [PATCH v7 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
2021-09-28  2:30               ` Jarkko Sakkinen
2021-09-27 21:34             ` [PATCH v7 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
2021-09-28  2:46               ` Jarkko Sakkinen
2021-09-28 15:41                 ` Luck, Tony
2021-09-28 20:11                   ` Jarkko Sakkinen
2021-09-28 20:53                     ` Luck, Tony
2021-09-30 14:40                       ` Jarkko Sakkinen
2021-09-30 18:02                         ` Luck, Tony
2021-09-27 21:34             ` [PATCH v7 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
2021-09-27 21:34             ` [PATCH v7 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
2021-09-27 21:34             ` [PATCH v7 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
2021-09-27 21:34             ` [PATCH v7 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
2021-10-01 16:47             ` [PATCH v8 0/7] Basic recovery for machine checks inside SGX Tony Luck
2021-10-01 16:47               ` [PATCH v8 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
2021-10-01 16:47               ` [PATCH v8 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
2021-10-01 16:47               ` [PATCH v8 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
2021-10-04 23:24                 ` Jarkko Sakkinen
2021-10-01 16:47               ` [PATCH v8 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
2021-10-04 23:30                 ` Jarkko Sakkinen
2021-10-01 16:47               ` [PATCH v8 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
2021-10-01 16:47               ` [PATCH v8 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
2021-10-01 16:47               ` [PATCH v8 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
2021-10-04 21:56               ` [PATCH v8 0/7] Basic recovery for machine checks inside SGX Reinette Chatre
2021-10-11 18:59               ` [PATCH v9 " Tony Luck
2021-10-11 18:59                 ` [PATCH v9 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
2021-10-15 22:57                   ` Sean Christopherson
2021-10-11 18:59                 ` [PATCH v9 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
2021-10-22 10:43                   ` kernel test robot
2021-10-11 18:59                 ` [PATCH v9 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
2021-10-15 23:07                   ` Sean Christopherson
2021-10-15 23:32                     ` Luck, Tony
2021-10-11 18:59                 ` [PATCH v9 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
2021-10-15 23:10                   ` Sean Christopherson
2021-10-15 23:19                     ` Luck, Tony
2021-10-11 18:59                 ` [PATCH v9 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
2021-10-12 16:49                   ` Jarkko Sakkinen
2021-10-11 18:59                 ` [PATCH v9 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
2021-10-12 16:50                   ` Jarkko Sakkinen
2021-10-11 18:59                 ` [PATCH v9 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
2021-10-12 16:51                   ` Jarkko Sakkinen
2021-10-12 16:48                 ` [PATCH v9 0/7] Basic recovery for machine checks inside SGX Jarkko Sakkinen
2021-10-12 17:57                   ` Luck, Tony
2021-10-18 20:25                 ` [PATCH v10 " Tony Luck
2021-10-18 20:25                   ` [PATCH v10 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages Tony Luck
2021-10-18 20:25                   ` [PATCH v10 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
2021-11-19 12:25                     ` kernel test robot
2021-10-18 20:25                   ` [PATCH v10 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
2021-10-18 20:25                   ` [PATCH v10 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
2021-10-18 20:25                   ` [PATCH v10 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
2021-10-20  9:06                     ` Naoya Horiguchi
2021-10-20 17:04                       ` Luck, Tony
2021-10-18 20:25                   ` [PATCH v10 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
2021-10-18 20:25                   ` [PATCH v10 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
2021-10-26 22:00                   ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
2021-10-26 22:00                     ` [PATCH v11 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages Tony Luck
2021-11-15 20:22                       ` [tip: x86/sgx] " tip-bot2 for Tony Luck
2021-10-26 22:00                     ` [PATCH v11 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
2021-11-15 20:22                       ` [tip: x86/sgx] " tip-bot2 for Tony Luck
2021-10-26 22:00                     ` [PATCH v11 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
2021-11-15 20:22                       ` [tip: x86/sgx] " tip-bot2 for Tony Luck
2021-10-26 22:00                     ` [PATCH v11 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
2021-11-15 20:22                       ` [tip: x86/sgx] " tip-bot2 for Tony Luck
2021-10-26 22:00                     ` [PATCH v11 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
2021-11-15 20:22                       ` [tip: x86/sgx] " tip-bot2 for Tony Luck
2021-10-26 22:00                     ` [PATCH v11 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
2021-11-15 20:22                       ` [tip: x86/sgx] " tip-bot2 for Tony Luck
2021-10-26 22:00                     ` [PATCH v11 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
2021-10-29 18:39                       ` Rafael J. Wysocki
2021-11-15 20:22                       ` [tip: x86/sgx] " tip-bot2 for Tony Luck
