* [PATCH v9 0/7] Basic recovery for machine checks inside SGX
       [not found] <20211001164724.220532-1-tony.luck@intel.com>
@ 2021-10-11 18:59 ` Tony Luck
  2021-10-11 18:59   ` [PATCH v9 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
                     ` (8 more replies)
  0 siblings, 9 replies; 38+ messages in thread
From: Tony Luck @ 2021-10-11 18:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck

Posting latest version to a slightly wider audience.

The big picture is that SGX uses some memory pages that are walled off
from access by the OS. This means they:
1) Don't have "struct page" describing them
2) Don't appear in the kernel 1:1 map

But they are still backed by normal DDR memory, so errors can occur.

Parts 1-4 of this series handle the internal SGX bits to keep track of
these pages in an error context. They've had a fair amount of review
on the linux-sgx list (but if any of the 37 subscribers to that list
not named Jarkko or Reinette want to chime in with extra comments and
{Acked,Reviewed,Tested}-by that would be great).

Linux-mm reviewers can (if they like) skip to part 5 where two changes are
made: 1) Hook into memory_failure() in the same spot as device mapping 2)
Skip trying to change 1:1 map (since SGX pages aren't there).

The hooks have generic looking names rather than specifically saying
"sgx" at the suggestion of Dave Hansen. I'm not wedded to the names,
so better suggestions welcome.  I could also change to using some
"ARCH_HAS_PLATFORM_PAGES" config bits if that's the current fashion.

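For reference, here is a minimal sketch of what the "ARCH_HAS_PLATFORM_PAGES"
variant could look like (the Kconfig symbol is hypothetical, not part of this
series; the fallback stubs in part 5 key off "#ifndef arch_memory_failure"
instead):

	/* Generic fallbacks, compiled out when an arch selects the symbol */
	#ifndef CONFIG_ARCH_HAS_PLATFORM_PAGES
	static inline int arch_memory_failure(unsigned long pfn, int flags)
	{
		return -ENXIO;	/* not a platform page */
	}

	static inline bool arch_is_platform_page(u64 paddr)
	{
		return false;
	}
	#endif

x86 would then "select ARCH_HAS_PLATFORM_PAGES if X86_SGX" in its Kconfig
instead of defining the arch_memory_failure/arch_is_platform_page macros in
processor.h.
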
Rafael (and other ACPI list readers) can skip to parts 6 & 7 where there
are hooks into error injection and reporting to simply say "these odd
looking physical addresses are actually OK to use". I added some extra
notes to the einj.rst documentation on how to inject into SGX memory.

Tony Luck (7):
  x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook arch_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()

 .../firmware-guide/acpi/apei/einj.rst         |  19 ++++
 arch/x86/include/asm/processor.h              |   8 ++
 arch/x86/include/asm/set_memory.h             |   4 +
 arch/x86/kernel/cpu/sgx/main.c                | 104 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |   6 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 drivers/acpi/apei/ghes.c                      |   2 +-
 include/linux/mm.h                            |  14 +++
 mm/memory-failure.c                           |  19 +++-
 9 files changed, 168 insertions(+), 11 deletions(-)


base-commit: 64570fbc14f8d7cb3fe3995f20e26bc25ce4b2cc
-- 
2.31.1




* [PATCH v9 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages
  2021-10-11 18:59 ` [PATCH v9 0/7] Basic recovery for machine checks inside SGX Tony Luck
@ 2021-10-11 18:59   ` Tony Luck
  2021-10-15 22:57     ` Sean Christopherson
  2021-10-11 18:59   ` [PATCH v9 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
                     ` (7 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Tony Luck @ 2021-10-11 18:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

SGX EPC pages go through the following life cycle:

        DIRTY ---> FREE ---> IN-USE --\
                    ^                 |
                    \-----------------/

Recovery action for poison for a DIRTY or FREE page is simple. Just
make sure never to allocate the page. IN-USE pages need some extra
handling.

Add a new flag bit SGX_EPC_PAGE_IN_USE that is set when a page
is allocated and cleared when the page is freed.

Notes:

1) These transitions are made while holding the node->lock so that
   future code that checks the flags while holding the node->lock
   can be sure that if the SGX_EPC_PAGE_IN_USE bit is clear, then the
   page is on the free list.

2) Initially while the pages are on the dirty list the
   SGX_EPC_PAGE_IN_USE bit is set.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 4 +++-
 arch/x86/kernel/cpu/sgx/sgx.h  | 3 +++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de02bbcc..d18988a46c13 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -472,6 +472,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
 	list_del_init(&page->list);
 	sgx_nr_free_pages--;
+	page->flags = SGX_EPC_PAGE_IN_USE;
 
 	spin_unlock(&node->lock);
 
@@ -626,6 +627,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
+	page->flags = 0;
 
 	spin_unlock(&node->lock);
 }
@@ -651,7 +653,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
-		section->pages[i].flags = 0;
+		section->pages[i].flags = SGX_EPC_PAGE_IN_USE;
 		section->pages[i].owner = NULL;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..f9202d3d6278 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -26,6 +26,9 @@
 /* Pages, which are being tracked by the page reclaimer. */
 #define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
 
+/* Allocated pages */
+#define SGX_EPC_PAGE_IN_USE		BIT(1)
+
 struct sgx_epc_page {
 	unsigned int section;
 	unsigned int flags;
-- 
2.31.1




* [PATCH v9 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-10-11 18:59 ` [PATCH v9 0/7] Basic recovery for machine checks inside SGX Tony Luck
  2021-10-11 18:59   ` [PATCH v9 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
@ 2021-10-11 18:59   ` Tony Luck
  2021-10-22 10:43     ` kernel test robot
  2021-10-11 18:59   ` [PATCH v9 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
                     ` (6 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Tony Luck @ 2021-10-11 18:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.

SGX EPC pages do not have Linux "struct page" associated with them.

Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray.

Create a function arch_is_platform_page() that simply reports whether an
address is an EPC page for use elsewhere in the kernel. The ACPI error
injection code needs this function and is typically built as a module,
so export it.

Note that arch_is_platform_page() will be slower than other similar
"what type is this page" functions that can simply check bits in the
"struct page".  If there is some future performance critical user of
this function it may need to be implemented in a more efficient way.

Note also that the current implementation of xarray allocates a few
hundred kilobytes for this usage on a system with 4GB of SGX EPC memory
configured. This isn't ideal, but worth it for the code simplicity.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index d18988a46c13..09fa42690ff2 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -20,6 +20,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
 static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
+static DEFINE_XARRAY(sgx_epc_address_space);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
@@ -650,6 +651,8 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	xa_store_range(&sgx_epc_address_space, section->phys_addr,
+		       phys_addr + size - 1, section, GFP_KERNEL);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -661,6 +664,12 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+bool arch_is_platform_page(u64 paddr)
+{
+	return !!xa_load(&sgx_epc_address_space, paddr);
+}
+EXPORT_SYMBOL_GPL(arch_is_platform_page);
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1




* [PATCH v9 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-10-11 18:59 ` [PATCH v9 0/7] Basic recovery for machine checks inside SGX Tony Luck
  2021-10-11 18:59   ` [PATCH v9 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
  2021-10-11 18:59   ` [PATCH v9 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
@ 2021-10-11 18:59   ` Tony Luck
  2021-10-15 23:07     ` Sean Christopherson
  2021-10-11 18:59   ` [PATCH v9 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
                     ` (5 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Tony Luck @ 2021-10-11 18:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

A memory controller patrol scrubber can report poison in a page
that isn't currently being used.

Add a "poison" field to struct sgx_epc_page that can be set when
poison is reported in a page. Check for it:
1) When sanitizing dirty pages
2) When freeing EPC pages

Poison is kept in a field separate from flags to avoid having to make
all updates to flags atomic, or to integrate poison state changes into
some other locking scheme that protects flags.

In both cases place the poisoned page on a list of poisoned epc pages
to make sure it will not be reallocated.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 14 +++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h  |  3 ++-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 09fa42690ff2..653bace26100 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -43,6 +43,7 @@ static nodemask_t sgx_numa_mask;
 static struct sgx_numa_node *sgx_numa_nodes;
 
 static LIST_HEAD(sgx_dirty_page_list);
+static LIST_HEAD(sgx_poison_page_list);
 
 /*
  * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
@@ -62,6 +63,12 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
+		if (page->poison) {
+			list_del(&page->list);
+			list_add(&page->list, &sgx_poison_page_list);
+			continue;
+		}
+
 		ret = __eremove(sgx_get_epc_virt_addr(page));
 		if (!ret) {
 			/*
@@ -626,7 +633,11 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	spin_lock(&node->lock);
 
-	list_add_tail(&page->list, &node->free_page_list);
+	page->owner = NULL;
+	if (page->poison)
+		list_add(&page->list, &sgx_poison_page_list);
+	else
+		list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 	page->flags = 0;
 
@@ -658,6 +669,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 		section->pages[i].section = index;
 		section->pages[i].flags = SGX_EPC_PAGE_IN_USE;
 		section->pages[i].owner = NULL;
+		section->pages[i].poison = 0;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index f9202d3d6278..a990a4c9a00f 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -31,7 +31,8 @@
 
 struct sgx_epc_page {
 	unsigned int section;
-	unsigned int flags;
+	u16 flags;
+	u16 poison;
 	struct sgx_encl_page *owner;
 	struct list_head list;
 };
-- 
2.31.1




* [PATCH v9 4/7] x86/sgx: Add SGX infrastructure to recover from poison
  2021-10-11 18:59 ` [PATCH v9 0/7] Basic recovery for machine checks inside SGX Tony Luck
                     ` (2 preceding siblings ...)
  2021-10-11 18:59   ` [PATCH v9 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-10-11 18:59   ` Tony Luck
  2021-10-15 23:10     ` Sean Christopherson
  2021-10-11 18:59   ` [PATCH v9 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
                     ` (4 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Tony Luck @ 2021-10-11 18:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

Provide a recovery function arch_memory_failure(). If the poison was
consumed synchronously, send a SIGBUS. Note that, unlike the case of
poison outside of SGX enclaves, the virtual address of the access is not
included with the SIGBUS. This doesn't matter, as addresses of code/data
inside an enclave are of little to no use to code executing outside the
(now dead) enclave.

Poison found in a free page results in the page being moved from the
free list to the poison page list.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 77 ++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 653bace26100..398c9749e4d1 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -682,6 +682,83 @@ bool arch_is_platform_page(u64 paddr)
 }
 EXPORT_SYMBOL_GPL(arch_is_platform_page);
 
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+
+	section = xa_load(&sgx_epc_address_space, paddr);
+	if (!section)
+		return NULL;
+
+	return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+}
+
+/*
+ * Called in process context to handle a hardware reported
+ * error in an SGX EPC page.
+ * If the MF_ACTION_REQUIRED bit is set in flags, then the
+ * context is the task that consumed the poison data. Otherwise
+ * this is called from a kernel thread unrelated to the page.
+ */
+int arch_memory_failure(unsigned long pfn, int flags)
+{
+	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+	struct sgx_epc_section *section;
+	struct sgx_numa_node *node;
+
+	/*
+	 * mm/memory-failure.c calls this routine for all errors
+	 * where there isn't a "struct page" for the address. But that
+	 * includes other address ranges besides SGX.
+	 */
+	if (!page)
+		return -ENXIO;
+
+	/*
+	 * If poison was consumed synchronously, send a SIGBUS to
+	 * the task. Hardware has already exited the SGX enclave and
+	 * will not allow re-entry to an enclave that has a memory
+	 * error. The signal may help the task understand why the
+	 * enclave is broken.
+	 */
+	if (flags & MF_ACTION_REQUIRED)
+		force_sig(SIGBUS);
+
+	section = &sgx_epc_sections[page->section];
+	node = section->node;
+
+	spin_lock(&node->lock);
+
+	/* Already poisoned? Nothing more to do */
+	if (page->poison)
+		goto out;
+
+	page->poison = 1;
+
+	/*
+	 * If flags is zero, then the page is on a free list.
+	 * Move it to the poison page list.
+	 */
+	if (!page->flags) {
+		list_del(&page->list);
+		list_add(&page->list, &sgx_poison_page_list);
+		goto out;
+	}
+
+	/*
+	 * TBD: Add additional plumbing to enable pre-emptive
+	 * action for asynchronous poison notification. Until
+	 * then just hope that the poison:
+	 * a) is not accessed - sgx_free_epc_page() will deal with it
+	 *    when the user gives it back
+	 * b) results in a recoverable machine check rather than
+	 *    a fatal one
+	 */
+out:
+	spin_unlock(&node->lock);
+	return 0;
+}
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1




* [PATCH v9 5/7] x86/sgx: Hook arch_memory_failure() into mainline code
  2021-10-11 18:59 ` [PATCH v9 0/7] Basic recovery for machine checks inside SGX Tony Luck
                     ` (3 preceding siblings ...)
  2021-10-11 18:59   ` [PATCH v9 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
@ 2021-10-11 18:59   ` Tony Luck
  2021-10-12 16:49     ` Jarkko Sakkinen
  2021-10-11 18:59   ` [PATCH v9 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
                     ` (3 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Tony Luck @ 2021-10-11 18:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

Add a call inside memory_failure() to the arch-specific code that
checks whether the address is an SGX EPC page and handles it.

Note that SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.

Pull the call to acquire the mutex earlier so the SGX errors are also
protected.

Make set_mce_nospec() skip SGX pages when trying to adjust
the 1:1 map.

Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/processor.h  |  8 ++++++++
 arch/x86/include/asm/set_memory.h |  4 ++++
 include/linux/mm.h                | 14 ++++++++++++++
 mm/memory-failure.c               | 19 +++++++++++++------
 4 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 9ad2acaaae9b..4865f2860a4f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -853,4 +853,12 @@ enum mds_mitigations {
 	MDS_MITIGATION_VMWERV,
 };
 
+#ifdef CONFIG_X86_SGX
+int arch_memory_failure(unsigned long pfn, int flags);
+#define arch_memory_failure arch_memory_failure
+
+bool arch_is_platform_page(u64 paddr);
+#define arch_is_platform_page arch_is_platform_page
+#endif
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 43fa081a1adb..ce8dd215f5b3 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_SET_MEMORY_H
 #define _ASM_X86_SET_MEMORY_H
 
+#include <linux/mm.h>
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
 
@@ -98,6 +99,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
 	unsigned long decoy_addr;
 	int rc;
 
+	/* SGX pages are not in the 1:1 map */
+	if (arch_is_platform_page(pfn << PAGE_SHIFT))
+		return 0;
 	/*
 	 * We would like to just call:
 	 *      set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 73a52aba448f..62b199ed5ec6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3284,5 +3284,19 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
 	return 0;
 }
 
+#ifndef arch_memory_failure
+static inline int arch_memory_failure(unsigned long pfn, int flags)
+{
+	return -ENXIO;
+}
+#endif
+
+#ifndef arch_is_platform_page
+static inline bool arch_is_platform_page(u64 paddr)
+{
+	return false;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3e6449f2102a..b1cbf9845c19 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1632,21 +1632,28 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
 
+	mutex_lock(&mf_mutex);
+
 	p = pfn_to_online_page(pfn);
 	if (!p) {
+		res = arch_memory_failure(pfn, flags);
+		if (res == 0)
+			goto unlock_mutex;
+
 		if (pfn_valid(pfn)) {
 			pgmap = get_dev_pagemap(pfn, NULL);
-			if (pgmap)
-				return memory_failure_dev_pagemap(pfn, flags,
-								  pgmap);
+			if (pgmap) {
+				res = memory_failure_dev_pagemap(pfn, flags,
+								 pgmap);
+				goto unlock_mutex;
+			}
 		}
 		pr_err("Memory failure: %#lx: memory outside kernel control\n",
 			pfn);
-		return -ENXIO;
+		res = -ENXIO;
+		goto unlock_mutex;
 	}
 
-	mutex_lock(&mf_mutex);
-
 try_again:
 	if (PageHuge(p)) {
 		res = memory_failure_hugetlb(pfn, flags);
-- 
2.31.1




* [PATCH v9 6/7] x86/sgx: Add hook to error injection address validation
  2021-10-11 18:59 ` [PATCH v9 0/7] Basic recovery for machine checks inside SGX Tony Luck
                     ` (4 preceding siblings ...)
  2021-10-11 18:59   ` [PATCH v9 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
@ 2021-10-11 18:59   ` Tony Luck
  2021-10-12 16:50     ` Jarkko Sakkinen
  2021-10-11 18:59   ` [PATCH v9 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
                     ` (2 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Tony Luck @ 2021-10-11 18:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

SGX reserved memory does not appear in the standard address maps.

Add hook to call into the SGX code to check if an address is located
in SGX memory.

There are other challenges in injecting errors into SGX. Update the
documentation with a sequence of operations to inject.

Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 .../firmware-guide/acpi/apei/einj.rst         | 19 +++++++++++++++++++
 drivers/acpi/apei/einj.c                      |  3 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index c042176e1707..55e2331a6438 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,5 +181,24 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+Special notes for injection into SGX enclaves:
+
+There may be a separate BIOS setup option to enable SGX injection.
+
+The injection process consists of setting some special memory controller
+trigger that will inject the error on the next write to the target
+address. But the h/w prevents any software outside of an SGX enclave
+from accessing enclave pages (even BIOS SMM mode).
+
+The following sequence can be used:
+  1) Determine physical address of enclave page
+  2) Use "notrigger=1" mode to inject (this will setup
+     the injection address, but will not actually inject)
+  3) Enter the enclave
+  4) Store data to the virtual address matching physical address from step 1
+  5) Execute CLFLUSH for that virtual address
+  6) Spin delay for 250ms
+  7) Read from the virtual address. This will trigger the error
+
 For more information about EINJ, please refer to ACPI specification
 version 4.0, section 17.5 and ACPI 5.0, section 18.6.
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 2882450c443e..67c335baad52 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -544,7 +544,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	    ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
 				!= REGION_INTERSECTS) &&
 	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
-				!= REGION_INTERSECTS)))
+				!= REGION_INTERSECTS) &&
+	     !arch_is_platform_page(base_addr)))
 		return -EINVAL;
 
 inject:
-- 
2.31.1

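For illustration, steps 4-7 of the injection sequence above might look roughly
like this from inside the enclave (a sketch only: the enclave entry mechanism,
the virtual address that maps the physical address from step 1, and the
delay_250ms() helper are all assumed here, not provided by this series):

	/* Hypothetical helper: enclaves cannot sleep, so this would spin. */
	extern void delay_250ms(void);

	static void trigger_injected_error(unsigned long *vaddr)
	{
		*vaddr = 0x1234;			/* step 4: store to the target address */
		asm volatile("clflush %0"		/* step 5: flush the cache line */
			     : "+m" (*(volatile char *)vaddr));
		delay_250ms();				/* step 6: give the injection time to land */
		(void)*(volatile unsigned long *)vaddr;	/* step 7: the read consumes the poison */
	}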



* [PATCH v9 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  2021-10-11 18:59 ` [PATCH v9 0/7] Basic recovery for machine checks inside SGX Tony Luck
                     ` (5 preceding siblings ...)
  2021-10-11 18:59   ` [PATCH v9 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
@ 2021-10-11 18:59   ` Tony Luck
  2021-10-12 16:51     ` Jarkko Sakkinen
  2021-10-12 16:48   ` [PATCH v9 0/7] Basic recovery for machine checks inside SGX Jarkko Sakkinen
  2021-10-18 20:25   ` [PATCH v10 " Tony Luck
  8 siblings, 1 reply; 38+ messages in thread
From: Tony Luck @ 2021-10-11 18:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

SGX EPC pages do not have a "struct page" associated with them so the
pfn_valid() sanity check fails and results in a warning message to
the console.

Add an additional check to skip the warning if the address of the error
is in an SGX EPC page.

Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 drivers/acpi/apei/ghes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0c8330ed1ffd..0c5c9acc6254 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -449,7 +449,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 		return false;
 
 	pfn = PHYS_PFN(physical_addr);
-	if (!pfn_valid(pfn)) {
+	if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX
 		"Invalid address in generic error data: %#llx\n",
 		physical_addr);
-- 
2.31.1




* Re: [PATCH v9 0/7] Basic recovery for machine checks inside SGX
  2021-10-11 18:59 ` [PATCH v9 0/7] Basic recovery for machine checks inside SGX Tony Luck
                     ` (6 preceding siblings ...)
  2021-10-11 18:59   ` [PATCH v9 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
@ 2021-10-12 16:48   ` Jarkko Sakkinen
  2021-10-12 17:57     ` Luck, Tony
  2021-10-18 20:25   ` [PATCH v10 " Tony Luck
  8 siblings, 1 reply; 38+ messages in thread
From: Jarkko Sakkinen @ 2021-10-12 16:48 UTC (permalink / raw)
  To: Tony Luck, Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm

On Mon, 2021-10-11 at 11:59 -0700, Tony Luck wrote:
> Posting latest version to a slightly wider audience.
> 
> The big picture is that SGX uses some memory pages that are walled off
> from access by the OS. This means they:
> 1) Don't have "struct page" describing them
> 2) Don't appear in the kernel 1:1 map
> 
> But they are still backed by normal DDR memory, so errors can occur.
> 
> Parts 1-4 of this series handle the internal SGX bits to keep track of
> these pages in an error context. They've had a fair amount of review
> on the linux-sgx list (but if any of the 37 subscribers to that list
> not named Jarkko or Reinette want to chime in with extra comments and
> {Acked,Reviewed,Tested}-by that would be great).
> 
> Linux-mm reviewers can (if they like) skip to part 5 where two changes are
> made: 1) Hook into memory_failure() in the same spot as device mapping 2)
> Skip trying to change 1:1 map (since SGX pages aren't there).
> 
> The hooks have generic looking names rather than specifically saying
> "sgx" at the suggestion of Dave Hansen. I'm not wedded to the names,
> so better suggestions welcome.  I could also change to using some
> "ARCH_HAS_PLATFORM_PAGES" config bits if that's the current fashion.
> 
> Rafael (and other ACPI list readers) can skip to parts 6 & 7 where there
> are hooks into error injection and reporting to simply say "these odd
> looking physical addresses are actually OK to use". I added some extra
> notes to the einj.rst documentation on how to inject into SGX memory.
> 
> Tony Luck (7):
>   x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages
>   x86/sgx: Add infrastructure to identify SGX EPC pages
>   x86/sgx: Initial poison handling for dirty and free pages
>   x86/sgx: Add SGX infrastructure to recover from poison
>   x86/sgx: Hook arch_memory_failure() into mainline code
>   x86/sgx: Add hook to error injection address validation
>   x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
> 
>  .../firmware-guide/acpi/apei/einj.rst         |  19 ++++
>  arch/x86/include/asm/processor.h              |   8 ++
>  arch/x86/include/asm/set_memory.h             |   4 +
>  arch/x86/kernel/cpu/sgx/main.c                | 104 +++++++++++++++++-
>  arch/x86/kernel/cpu/sgx/sgx.h                 |   6 +-
>  drivers/acpi/apei/einj.c                      |   3 +-
>  drivers/acpi/apei/ghes.c                      |   2 +-
>  include/linux/mm.h                            |  14 +++
>  mm/memory-failure.c                           |  19 +++-
>  9 files changed, 168 insertions(+), 11 deletions(-)
> 
> 
> base-commit: 64570fbc14f8d7cb3fe3995f20e26bc25ce4b2cc

I think you instructed me on this before but I've forgotten it:
how do I simulate this and test how it works?

/Jarkko




* Re: [PATCH v9 5/7] x86/sgx: Hook arch_memory_failure() into mainline code
  2021-10-11 18:59   ` [PATCH v9 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
@ 2021-10-12 16:49     ` Jarkko Sakkinen
  0 siblings, 0 replies; 38+ messages in thread
From: Jarkko Sakkinen @ 2021-10-12 16:49 UTC (permalink / raw)
  To: Tony Luck, Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, Reinette Chatre

On Mon, 2021-10-11 at 11:59 -0700, Tony Luck wrote:
> Add a call inside memory_failure() to the arch-specific code that
> checks whether the address is an SGX EPC page and handles it.
> 
> Note that SGX EPC pages do not have a "struct page" entry, so the hook
> goes in at the same point as the device mapping hook.
> 
> Pull the call to acquire the mutex earlier so the SGX errors are also
> protected.
> 
> Make set_mce_nospec() skip SGX pages when trying to adjust
> the 1:1 map.
> 
> Tested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/include/asm/processor.h  |  8 ++++++++
>  arch/x86/include/asm/set_memory.h |  4 ++++
>  include/linux/mm.h                | 14 ++++++++++++++
>  mm/memory-failure.c               | 19 +++++++++++++------
>  4 files changed, 39 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 9ad2acaaae9b..4865f2860a4f 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -853,4 +853,12 @@ enum mds_mitigations {
>         MDS_MITIGATION_VMWERV,
>  };
>  
> +#ifdef CONFIG_X86_SGX
> +int arch_memory_failure(unsigned long pfn, int flags);
> +#define arch_memory_failure arch_memory_failure
> +
> +bool arch_is_platform_page(u64 paddr);
> +#define arch_is_platform_page arch_is_platform_page
> +#endif
> +
>  #endif /* _ASM_X86_PROCESSOR_H */
> diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
> index 43fa081a1adb..ce8dd215f5b3 100644
> --- a/arch/x86/include/asm/set_memory.h
> +++ b/arch/x86/include/asm/set_memory.h
> @@ -2,6 +2,7 @@
>  #ifndef _ASM_X86_SET_MEMORY_H
>  #define _ASM_X86_SET_MEMORY_H
>  
> +#include <linux/mm.h>
>  #include <asm/page.h>
>  #include <asm-generic/set_memory.h>
>  
> @@ -98,6 +99,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
>         unsigned long decoy_addr;
>         int rc;
>  
> +       /* SGX pages are not in the 1:1 map */
> +       if (arch_is_platform_page(pfn << PAGE_SHIFT))
> +               return 0;
>         /*
>          * We would like to just call:
>          *      set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 73a52aba448f..62b199ed5ec6 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3284,5 +3284,19 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
>         return 0;
>  }
>  
> +#ifndef arch_memory_failure
> +static inline int arch_memory_failure(unsigned long pfn, int flags)
> +{
> +       return -ENXIO;
> +}
> +#endif
> +
> +#ifndef arch_is_platform_page
> +static inline bool arch_is_platform_page(u64 paddr)
> +{
> +       return false;
> +}
> +#endif
> +
>  #endif /* __KERNEL__ */
>  #endif /* _LINUX_MM_H */
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 3e6449f2102a..b1cbf9845c19 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1632,21 +1632,28 @@ int memory_failure(unsigned long pfn, int flags)
>         if (!sysctl_memory_failure_recovery)
>                 panic("Memory failure on page %lx", pfn);
>  
> +       mutex_lock(&mf_mutex);
> +
>         p = pfn_to_online_page(pfn);
>         if (!p) {
> +               res = arch_memory_failure(pfn, flags);
> +               if (res == 0)
> +                       goto unlock_mutex;
> +
>                 if (pfn_valid(pfn)) {
>                         pgmap = get_dev_pagemap(pfn, NULL);
> -                       if (pgmap)
> -                               return memory_failure_dev_pagemap(pfn, flags,
> -                                                                 pgmap);
> +                       if (pgmap) {
> +                               res = memory_failure_dev_pagemap(pfn, flags,
> +                                                                pgmap);
> +                               goto unlock_mutex;
> +                       }
>                 }
>                 pr_err("Memory failure: %#lx: memory outside kernel control\n",
>                         pfn);
> -               return -ENXIO;
> +               res = -ENXIO;
> +               goto unlock_mutex;
>         }
>  
> -       mutex_lock(&mf_mutex);
> -
>  try_again:
>         if (PageHuge(p)) {
>                 res = memory_failure_hugetlb(pfn, flags);

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

/Jarkko



* Re: [PATCH v9 6/7] x86/sgx: Add hook to error injection address validation
  2021-10-11 18:59   ` [PATCH v9 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
@ 2021-10-12 16:50     ` Jarkko Sakkinen
  0 siblings, 0 replies; 38+ messages in thread
From: Jarkko Sakkinen @ 2021-10-12 16:50 UTC (permalink / raw)
  To: Tony Luck, Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, Reinette Chatre

On Mon, 2021-10-11 at 11:59 -0700, Tony Luck wrote:
> SGX reserved memory does not appear in the standard address maps.
> 
> Add hook to call into the SGX code to check if an address is located
> in SGX memory.
> 
> There are other challenges in injecting errors into SGX. Update the
> documentation with a sequence of operations to inject.
> 
> Tested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  .../firmware-guide/acpi/apei/einj.rst         | 19 +++++++++++++++++++
>  drivers/acpi/apei/einj.c                      |  3 ++-
>  2 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
> index c042176e1707..55e2331a6438 100644
> --- a/Documentation/firmware-guide/acpi/apei/einj.rst
> +++ b/Documentation/firmware-guide/acpi/apei/einj.rst
> @@ -181,5 +181,24 @@ You should see something like this in dmesg::
>    [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
>    [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0
> channel_mask:1 rank:0)
>  
> +Special notes for injection into SGX enclaves:
> +
> +There may be a separate BIOS setup option to enable SGX injection.
> +
> +The injection process consists of setting some special memory controller
> +trigger that will inject the error on the next write to the target
> +address. But the h/w prevents any software outside of an SGX enclave
> +from accessing enclave pages (even BIOS SMM mode).
> +
> +The following sequence can be used:
> +  1) Determine physical address of enclave page
> +  2) Use "notrigger=1" mode to inject (this will setup
> +     the injection address, but will not actually inject)
> +  3) Enter the enclave
> +  4) Store data to the virtual address matching physical address from step 1
> +  5) Execute CLFLUSH for that virtual address
> +  6) Spin delay for 250ms
> +  7) Read from the virtual address. This will trigger the error
> +
>  For more information about EINJ, please refer to ACPI specification
>  version 4.0, section 17.5 and ACPI 5.0, section 18.6.
> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
> index 2882450c443e..67c335baad52 100644
> --- a/drivers/acpi/apei/einj.c
> +++ b/drivers/acpi/apei/einj.c
> @@ -544,7 +544,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
>             ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
>                                 != REGION_INTERSECTS) &&
>              (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
> -                               != REGION_INTERSECTS)))
> +                               != REGION_INTERSECTS) &&
> +            !arch_is_platform_page(base_addr)))
>                 return -EINVAL;
>  
>  inject:

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

/Jarkko




* Re: [PATCH v9 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  2021-10-11 18:59   ` [PATCH v9 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
@ 2021-10-12 16:51     ` Jarkko Sakkinen
  0 siblings, 0 replies; 38+ messages in thread
From: Jarkko Sakkinen @ 2021-10-12 16:51 UTC (permalink / raw)
  To: Tony Luck, Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, Reinette Chatre

On Mon, 2021-10-11 at 11:59 -0700, Tony Luck wrote:
> SGX EPC pages do not have a "struct page" associated with them so the
> pfn_valid() sanity check fails and results in a warning message to
> the console.
> 
> Add an additional check to skip the warning if the address of the error
> is in an SGX EPC page.
> 
> Tested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  drivers/acpi/apei/ghes.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 0c8330ed1ffd..0c5c9acc6254 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -449,7 +449,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
>                 return false;
>  
>         pfn = PHYS_PFN(physical_addr);
> -       if (!pfn_valid(pfn)) {
> +       if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
>                 pr_warn_ratelimited(FW_WARN GHES_PFX
>                 "Invalid address in generic error data: %#llx\n",
>                 physical_addr);

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

/Jarkko




* RE: [PATCH v9 0/7] Basic recovery for machine checks inside SGX
  2021-10-12 16:48   ` [PATCH v9 0/7] Basic recovery for machine checks inside SGX Jarkko Sakkinen
@ 2021-10-12 17:57     ` Luck, Tony
  0 siblings, 0 replies; 38+ messages in thread
From: Luck, Tony @ 2021-10-12 17:57 UTC (permalink / raw)
  To: Jarkko Sakkinen, Wysocki, Rafael J, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Hansen, Dave, Zhang, Cathy,
	linux-sgx, linux-acpi, linux-mm

> I think you instructed me on this before but I've forgotten it:
> how do I simulate this and test how it works?

Jarkko,

You can test the non-execution paths (e.g. where the memory error is
reported by a patrol scrubber in the memory controller) by:

# echo 0x{some_SGX_EPC_ADDRESS} > /sys/devices/system/memory/hard_offline_page

The execution paths are more difficult. You need a system that can inject
errors into EPC memory. There are some hints in the Documentation changes
in part 0006.

Reinette posted some changes to sgx tests that she used to validate.

-Tony



* Re: [PATCH v9 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages
  2021-10-11 18:59   ` [PATCH v9 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
@ 2021-10-15 22:57     ` Sean Christopherson
  0 siblings, 0 replies; 38+ messages in thread
From: Sean Christopherson @ 2021-10-15 22:57 UTC (permalink / raw)
  To: Tony Luck
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Jarkko Sakkinen, Dave Hansen, Cathy Zhang, linux-sgx, linux-acpi,
	linux-mm, Reinette Chatre

On Mon, Oct 11, 2021, Tony Luck wrote:
> SGX EPC pages go through the following life cycle:
> 
>         DIRTY ---> FREE ---> IN-USE --\
>                     ^                 |
>                     \-----------------/
> 
> Recovery action for poison for a DIRTY or FREE page is simple. Just
> make sure never to allocate the page. IN-USE pages need some extra
> handling.
> 
> Add a new flag bit SGX_EPC_PAGE_IN_USE that is set when a page
> is allocated and cleared when the page is freed.
> 
> Notes:
> 
> 1) These transitions are made while holding the node->lock so that
>    future code that checks the flags while holding the node->lock
>    can be sure that if the SGX_EPC_PAGE_IN_USE bit is clear, then the
>    page is on the free list.
> 
> 2) Initially while the pages are on the dirty list the
>    SGX_EPC_PAGE_IN_USE bit is set.

This needs to state _why_ pages are marked as IN_USE from the get-go.  Ignoring
the "Notes", the whole changelog clearly states that the DIRTY state does _not_
require special handling, but then "Add SGX infrastructure to recover from poison"
goes and relies on it being set.

Alternatively, why not invert it and have SGX_EPC_PAGE_FREE?  That would have
clear semantics, the poison recovery code wouldn't have to assume that !flags
means "free", and the whole changelog becomes:

  Add a flag to explicitly track whether or not an EPC page is on a free list;
  memory failure recovery code needs to be able to detect if a poisoned page is
  free so that recovery can know if it's safe to "steal" the page.
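
A rough sketch of how the inverted polarity would be used (illustrative only;
v10 later in this thread ends up calling the bit SGX_EPC_PAGE_IS_FREE):

	#define SGX_EPC_PAGE_IS_FREE	BIT(1)

	/* __sgx_alloc_epc_page_from_node(): page leaves the free list */
	page->flags &= ~SGX_EPC_PAGE_IS_FREE;

	/* sgx_free_epc_page(): page goes (back) onto a free list */
	page->flags = SGX_EPC_PAGE_IS_FREE;

	/* poison handling: an explicit test instead of inferring from !flags */
	if (page->flags & SGX_EPC_PAGE_IS_FREE)
		list_move(&page->list, &sgx_poison_page_list);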



* Re: [PATCH v9 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-10-11 18:59   ` [PATCH v9 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-10-15 23:07     ` Sean Christopherson
  2021-10-15 23:32       ` Luck, Tony
  0 siblings, 1 reply; 38+ messages in thread
From: Sean Christopherson @ 2021-10-15 23:07 UTC (permalink / raw)
  To: Tony Luck
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Jarkko Sakkinen, Dave Hansen, Cathy Zhang, linux-sgx, linux-acpi,
	linux-mm, Reinette Chatre

On Mon, Oct 11, 2021, Tony Luck wrote:
> A memory controller patrol scrubber can report poison in a page
> that isn't currently being used.
> 
> Add a "poison" field to struct sgx_epc_page that can be set when
> poison is reported in a page. Check for it:
> 1) When sanitizing dirty pages
> 2) When freeing EPC pages
> 
> Poison is kept in a field separate from flags to avoid having to make
> all updates to flags atomic, or to integrate poison state changes into
> some other locking scheme that protects flags.

Explain why atomic would be needed.  I lived in this code for a few years and
still had to look at the source to remember that the reclaimer can set flags
without taking node->lock.

> In both cases place the poisoned page on a list of poisoned epc pages
> to make sure it will not be reallocated.
> 
> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
> Tested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/main.c | 14 +++++++++++++-
>  arch/x86/kernel/cpu/sgx/sgx.h  |  3 ++-
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 09fa42690ff2..653bace26100 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -43,6 +43,7 @@ static nodemask_t sgx_numa_mask;
>  static struct sgx_numa_node *sgx_numa_nodes;
>  
>  static LIST_HEAD(sgx_dirty_page_list);
> +static LIST_HEAD(sgx_poison_page_list);
>  
>  /*
>   * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
> @@ -62,6 +63,12 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
>  
>  		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
>  
> +		if (page->poison) {

Does this need READ_ONCE (and WRITE_ONCE in the writer) to prevent reloading
page->poison since the sanitizer doesn't hold node->lock, i.e. page->poison can
be set any time?  Honest question, I'm terrible with memory ordering rules...

> +			list_del(&page->list);
> +			list_add(&page->list, &sgx_poison_page_list);

list_move()

> +			continue;
> +		}
> +
>  		ret = __eremove(sgx_get_epc_virt_addr(page));
>  		if (!ret) {
>  			/*
> @@ -626,7 +633,11 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
>  
>  	spin_lock(&node->lock);
>  
> -	list_add_tail(&page->list, &node->free_page_list);
> +	page->owner = NULL;
> +	if (page->poison)
> +		list_add(&page->list, &sgx_poison_page_list);

sgx_poison_page_list is a global list, whereas node->lock is, well, per node.
On a system with multiple EPCs, this could corrupt sgx_poison_page_list if
multiple poisoned pages from different nodes are freed simultaneously.

> +	else
> +		list_add_tail(&page->list, &node->free_page_list);
>  	sgx_nr_free_pages++;
>  	page->flags = 0;
>  
> @@ -658,6 +669,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
>  		section->pages[i].section = index;
>  		section->pages[i].flags = SGX_EPC_PAGE_IN_USE;
>  		section->pages[i].owner = NULL;
> +		section->pages[i].poison = 0;
>  		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
>  	}
>  
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index f9202d3d6278..a990a4c9a00f 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -31,7 +31,8 @@
>  
>  struct sgx_epc_page {
>  	unsigned int section;
> -	unsigned int flags;
> +	u16 flags;
> +	u16 poison;
>  	struct sgx_encl_page *owner;
>  	struct list_head list;
>  };
> 
> -- 
> 2.31.1
> 



* Re: [PATCH v9 4/7] x86/sgx: Add SGX infrastructure to recover from poison
  2021-10-11 18:59   ` [PATCH v9 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
@ 2021-10-15 23:10     ` Sean Christopherson
  2021-10-15 23:19       ` Luck, Tony
  0 siblings, 1 reply; 38+ messages in thread
From: Sean Christopherson @ 2021-10-15 23:10 UTC (permalink / raw)
  To: Tony Luck
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Jarkko Sakkinen, Dave Hansen, Cathy Zhang, linux-sgx, linux-acpi,
	linux-mm, Reinette Chatre

On Mon, Oct 11, 2021, Tony Luck wrote:
> +	section = &sgx_epc_sections[page->section];
> +	node = section->node;
> +
> +	spin_lock(&node->lock);
> +
> +	/* Already poisoned? Nothing more to do */
> +	if (page->poison)
> +		goto out;
> +
> +	page->poison = 1;
> +
> +	/*
> +	 * If flags is zero, then the page is on a free list.
> +	 * Move it to the poison page list.
> +	 */
> +	if (!page->flags) {

If the flag is inverted, this becomes

	if (page->flags & SGX_EPC_PAGE_FREE) {

> +		list_del(&page->list);
> +		list_add(&page->list, &sgx_poison_page_list);

list_move(), and needs the same protection for sgx_poison_page_list.

> +		goto out;
> +	}
> +
> +	/*
> +	 * TBD: Add additional plumbing to enable pre-emptive
> +	 * action for asynchronous poison notification. Until
> +	 * then just hope that the poison:
> +	 * a) is not accessed - sgx_free_epc_page() will deal with it
> +	 *    when the user gives it back
> +	 * b) results in a recoverable machine check rather than
> +	 *    a fatal one
> +	 */
> +out:
> +	spin_unlock(&node->lock);
> +	return 0;
> +}
> +
>  /**
>   * A section metric is concatenated in a way that @low bits 12-31 define the
>   * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
> 
> -- 
> 2.31.1
> 



* Re: [PATCH v9 4/7] x86/sgx: Add SGX infrastructure to recover from poison
  2021-10-15 23:10     ` Sean Christopherson
@ 2021-10-15 23:19       ` Luck, Tony
  0 siblings, 0 replies; 38+ messages in thread
From: Luck, Tony @ 2021-10-15 23:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Jarkko Sakkinen, Dave Hansen, Cathy Zhang, linux-sgx, linux-acpi,
	linux-mm, Reinette Chatre

On Fri, Oct 15, 2021 at 11:10:32PM +0000, Sean Christopherson wrote:
> On Mon, Oct 11, 2021, Tony Luck wrote:
> > +	section = &sgx_epc_sections[page->section];
> > +	node = section->node;
> > +
> > +	spin_lock(&node->lock);
> > +
> > +	/* Already poisoned? Nothing more to do */
> > +	if (page->poison)
> > +		goto out;
> > +
> > +	page->poison = 1;
> > +
> > +	/*
> > +	 * If flags is zero, then the page is on a free list.
> > +	 * Move it to the poison page list.
> > +	 */
> > +	if (!page->flags) {
> 
> If the flag is inverted, this becomes
> 
> 	if (page->flags & SGX_EPC_PAGE_FREE) {

I like the inversion. I'll switch to SGX_EPC_PAGE_FREE

> 
> > +		list_del(&page->list);
> > +		list_add(&page->list, &sgx_poison_page_list);
> 
> list_move(), and needs the same protection for sgx_poison_page_list.

Didn't know list_move() existed. Will change all the list_del+list_add
into list_move.

Also change the sgx_poison_page_list from global to per-node. Then
the adds will be safe (accessed while holding the node->lock).
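
Something like this (just a sketch of the plan, not the final v10 patch):

	struct sgx_numa_node {
		struct list_head free_page_list;
		struct list_head sgx_poison_page_list;	/* new, per-node */
		spinlock_t lock;
		/* ... */
	};

	/* sgx_free_epc_page(), already holding node->lock */
	if (page->poison)
		list_add(&page->list, &node->sgx_poison_page_list);
	else
		list_add_tail(&page->list, &node->free_page_list);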


Thanks for the review.

-Tony



* Re: [PATCH v9 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-10-15 23:07     ` Sean Christopherson
@ 2021-10-15 23:32       ` Luck, Tony
  0 siblings, 0 replies; 38+ messages in thread
From: Luck, Tony @ 2021-10-15 23:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Jarkko Sakkinen, Dave Hansen, Cathy Zhang, linux-sgx, linux-acpi,
	linux-mm, Reinette Chatre

On Fri, Oct 15, 2021 at 11:07:48PM +0000, Sean Christopherson wrote:
> On Mon, Oct 11, 2021, Tony Luck wrote:
> > A memory controller patrol scrubber can report poison in a page
> > that isn't currently being used.
> > 
> > Add "poison" field in the sgx_epc_page that can be set for an
> > sgx_epc_page. Check for it:
> > 1) When sanitizing dirty pages
> > 2) When freeing epc pages
> > 
> > Poison is a new field separated from flags to avoid having to make
> > all updates to flags atomic, or integrate poison state changes into
> > some other locking scheme to protect flags.
> 
> Explain why atomic would be needed.  I lived in this code for a few years and
> still had to look at the source to remember that the reclaimer can set flags
> without taking node->lock.

Will add explanation.

> 
> > In both cases place the poisoned page on a list of poisoned epc pages
> > to make sure it will not be reallocated.
> > 
> > Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
> > Tested-by: Reinette Chatre <reinette.chatre@intel.com>
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  arch/x86/kernel/cpu/sgx/main.c | 14 +++++++++++++-
> >  arch/x86/kernel/cpu/sgx/sgx.h  |  3 ++-
> >  2 files changed, 15 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > index 09fa42690ff2..653bace26100 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -43,6 +43,7 @@ static nodemask_t sgx_numa_mask;
> >  static struct sgx_numa_node *sgx_numa_nodes;
> >  
> >  static LIST_HEAD(sgx_dirty_page_list);
> > +static LIST_HEAD(sgx_poison_page_list);
> >  
> >  /*
> >   * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
> > @@ -62,6 +63,12 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
> >  
> >  		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
> >  
> > +		if (page->poison) {
> 
> Does this need READ_ONCE (and WRITE_ONCE in the writer) to prevent reloading
> page->poison since the sanitizer doesn't hold node->lock, i.e. page->poison can
> be set any time?  Honest question, I'm terrible with memory ordering rules...
> 

I think it's safe. I set page->poison in arch_memory_failure() while
holding node->lock in kthread context.  So not "at any time".

This particular read is done without holding the lock ... and is thus
racy. But there are a zillion other races early in boot before the EPC
pages get sanitized and moved to the free list. E.g. if an error is
reported before they are added to the sgx_epc_address_space xarray,
then all this code will just ignore the error as "not in Linux
controlled memory".

-Tony



* [PATCH v10 0/7] Basic recovery for machine checks inside SGX
  2021-10-11 18:59 ` [PATCH v9 0/7] Basic recovery for machine checks inside SGX Tony Luck
                     ` (7 preceding siblings ...)
  2021-10-12 16:48   ` [PATCH v9 0/7] Basic recovery for machine checks inside SGX Jarkko Sakkinen
@ 2021-10-18 20:25   ` Tony Luck
  2021-10-18 20:25     ` [PATCH v10 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages Tony Luck
                       ` (7 more replies)
  8 siblings, 8 replies; 38+ messages in thread
From: Tony Luck @ 2021-10-18 20:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck

v10 (based on v5.15-rc6)

Changes since v9:

ACPI reviewers (Rafael): No changes to parts 6 & 7.

MM reviewers (Horiguchi-san): No changes to part 5.

Jarkko:
	Added Reviewed-by tags to remaining patches.
	N.B. I kept the tags on parts 1, 3, 4 because
	changes based on Sean feedback didn't seem
	consequential. Please let me know if you disagree
	and see new problems introduced by me trying to
	follow Sean's feedback.

Sean:
	1) Reverse the polarity of the neutron flow (sorry, a
	Dr Who fan will always autocomplete a sentence that
	begins "reverse the polarity" that way). The actual change
	is for the new flag bit. Instead of marking in-use
	pages with the new bit, mark free pages instead. This
	avoids the weirdness where I marked the pages on the
	dirty list as "in-use", when clearly they are not.

	2) Race conditions adding poisoned pages to the global
	list of poisoned pages.
	Fixed this by changing from a global list to a per-node
	list. Additions are protected by the node->lock.

	3) Use list_move() instead of list_del(); list_add()
	Fixed both places I used this idiom.

	4) Race looking at page->poison when cleaning dirty pages.
	Added a comment documenting why losing this race isn't
	overly harmful.

Tony Luck (7):
  x86/sgx: Add new sgx_epc_page flag bit to mark free pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook arch_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()

 .../firmware-guide/acpi/apei/einj.rst         |  19 +++
 arch/x86/include/asm/processor.h              |   8 ++
 arch/x86/include/asm/set_memory.h             |   4 +
 arch/x86/kernel/cpu/sgx/main.c                | 113 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |   7 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 drivers/acpi/apei/ghes.c                      |   2 +-
 include/linux/mm.h                            |  14 +++
 mm/memory-failure.c                           |  19 ++-
 9 files changed, 179 insertions(+), 10 deletions(-)


base-commit: 519d81956ee277b4419c723adfb154603c2565ba
-- 
2.31.1



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v10 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages
  2021-10-18 20:25   ` [PATCH v10 " Tony Luck
@ 2021-10-18 20:25     ` Tony Luck
  2021-10-18 20:25     ` [PATCH v10 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
                       ` (6 subsequent siblings)
  7 siblings, 0 replies; 38+ messages in thread
From: Tony Luck @ 2021-10-18 20:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

SGX EPC pages go through the following life cycle:

        DIRTY ---> FREE ---> IN-USE --\
                    ^                 |
                    \-----------------/

Recovery action for poison for a DIRTY or FREE page is simple. Just
make sure never to allocate the page. IN-USE pages need some extra
handling.

Add a new flag bit SGX_EPC_PAGE_IS_FREE that is set when a page
is added to a free list and cleared when the page is allocated.

Notes:

1) These transitions are made while holding the node->lock so that
   future code that checks the flags while holding the node->lock
   can be sure that if the SGX_EPC_PAGE_IS_FREE bit is set, then the
   page is on the free list.

2) Initially while the pages are on the dirty list the
   SGX_EPC_PAGE_IS_FREE bit is cleared.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 2 ++
 arch/x86/kernel/cpu/sgx/sgx.h  | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de02bbcc..825aa91516c8 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -472,6 +472,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
 	list_del_init(&page->list);
 	sgx_nr_free_pages--;
+	page->flags = 0;
 
 	spin_unlock(&node->lock);
 
@@ -626,6 +627,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
+	page->flags = SGX_EPC_PAGE_IS_FREE;
 
 	spin_unlock(&node->lock);
 }
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..5906471156c5 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -26,6 +26,9 @@
 /* Pages, which are being tracked by the page reclaimer. */
 #define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
 
+/* Pages on free list */
+#define SGX_EPC_PAGE_IS_FREE		BIT(1)
+
 struct sgx_epc_page {
 	unsigned int section;
 	unsigned int flags;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v10 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-10-18 20:25   ` [PATCH v10 " Tony Luck
  2021-10-18 20:25     ` [PATCH v10 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages Tony Luck
@ 2021-10-18 20:25     ` Tony Luck
  2021-10-18 20:25     ` [PATCH v10 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
                       ` (5 subsequent siblings)
  7 siblings, 0 replies; 38+ messages in thread
From: Tony Luck @ 2021-10-18 20:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.

SGX EPC pages do not have Linux "struct page" associated with them.

Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray.

Create a function arch_is_platform_page() that simply reports whether an
address is an EPC page for use elsewhere in the kernel. The ACPI error
injection code needs this function and is typically built as a module,
so export it.

Note that arch_is_platform_page() will be slower than other similar
"what type is this page" functions that can simply check bits in the
"struct page".  If there is some future performance critical user of
this function it may need to be implemented in a more efficient way.

Note also that the current implementation of xarray allocates a few
hundred kilobytes for this usage on a system with 4GB of SGX EPC memory
configured. This isn't ideal, but worth it for the code simplicity.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 825aa91516c8..5c02cffdabc8 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -20,6 +20,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
 static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
+static DEFINE_XARRAY(sgx_epc_address_space);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
@@ -650,6 +651,8 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	xa_store_range(&sgx_epc_address_space, section->phys_addr,
+		       phys_addr + size - 1, section, GFP_KERNEL);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -661,6 +664,12 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+bool arch_is_platform_page(u64 paddr)
+{
+	return !!xa_load(&sgx_epc_address_space, paddr);
+}
+EXPORT_SYMBOL_GPL(arch_is_platform_page);
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread
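
The non-obvious part of the xarray usage in this patch is that
xa_store_range() creates a single multi-index entry covering every index in
[phys_addr, phys_addr + size - 1], so a later xa_load() on any physical
address inside an EPC section returns that section. A minimal sketch of the
pattern (not the literal patch; the "sketch_" function names are invented
for illustration, and xa_store_range() is only built when
CONFIG_XARRAY_MULTI is enabled, which a later revision in this thread
selects explicitly):

        static DEFINE_XARRAY(sgx_epc_address_space);

        /* Section setup: one multi-index entry spans the whole physical range. */
        static void sketch_register_section(struct sgx_epc_section *section,
                                            u64 phys_addr, u64 size)
        {
                xa_store_range(&sgx_epc_address_space, phys_addr,
                               phys_addr + size - 1, section, GFP_KERNEL);
        }

        /* Lookup: any address inside a registered range returns its section. */
        static bool sketch_is_epc_address(u64 paddr)
        {
                return !!xa_load(&sgx_epc_address_space, paddr);
        }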

* [PATCH v10 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-10-18 20:25   ` [PATCH v10 " Tony Luck
  2021-10-18 20:25     ` [PATCH v10 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages Tony Luck
  2021-10-18 20:25     ` [PATCH v10 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
@ 2021-10-18 20:25     ` Tony Luck
  2021-10-18 20:25     ` [PATCH v10 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
                       ` (4 subsequent siblings)
  7 siblings, 0 replies; 38+ messages in thread
From: Tony Luck @ 2021-10-18 20:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

A memory controller patrol scrubber can report poison in a page
that isn't currently being used.

Add "poison" field in the sgx_epc_page that can be set for an
sgx_epc_page. Check for it:
1) When sanitizing dirty pages
2) When freeing epc pages

Poison is a new field separated from flags to avoid having to make all
updates to flags atomic, or integrate poison state changes into some
other locking scheme to protect flags (Currently just sgx_reclaimer_lock
which protects the SGX_EPC_PAGE_RECLAIMER_TRACKED bit in page->flags).

In both cases place the poisoned page on a per-node list of poisoned
epc pages to make sure it will not be reallocated.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 26 +++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h  |  4 +++-
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 5c02cffdabc8..e5fcb8354bcc 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -62,6 +62,24 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
+		/*
+		 * Checking page->poison without holding the node->lock
+		 * is racy, but losing the race (i.e. poison is set just
+		 * after the check) just means __eremove() will be uselessly
+		 * called for a page that sgx_free_epc_page() will put onto
+		 * the node->sgx_poison_page_list later.
+		 */
+		if (page->poison) {
+			struct sgx_epc_section *section = &sgx_epc_sections[page->section];
+			struct sgx_numa_node *node = section->node;
+
+			spin_lock(&node->lock);
+			list_move(&page->list, &node->sgx_poison_page_list);
+			spin_unlock(&node->lock);
+
+			continue;
+		}
+
 		ret = __eremove(sgx_get_epc_virt_addr(page));
 		if (!ret) {
 			/*
@@ -626,7 +644,11 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	spin_lock(&node->lock);
 
-	list_add_tail(&page->list, &node->free_page_list);
+	page->owner = NULL;
+	if (page->poison)
+		list_add(&page->list, &node->sgx_poison_page_list);
+	else
+		list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 	page->flags = SGX_EPC_PAGE_IS_FREE;
 
@@ -658,6 +680,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 		section->pages[i].section = index;
 		section->pages[i].flags = 0;
 		section->pages[i].owner = NULL;
+		section->pages[i].poison = 0;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
@@ -724,6 +747,7 @@ static bool __init sgx_page_cache_init(void)
 		if (!node_isset(nid, sgx_numa_mask)) {
 			spin_lock_init(&sgx_numa_nodes[nid].lock);
 			INIT_LIST_HEAD(&sgx_numa_nodes[nid].free_page_list);
+			INIT_LIST_HEAD(&sgx_numa_nodes[nid].sgx_poison_page_list);
 			node_set(nid, sgx_numa_mask);
 		}
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 5906471156c5..9ec3136c7800 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -31,7 +31,8 @@
 
 struct sgx_epc_page {
 	unsigned int section;
-	unsigned int flags;
+	u16 flags;
+	u16 poison;
 	struct sgx_encl_page *owner;
 	struct list_head list;
 };
@@ -42,6 +43,7 @@ struct sgx_epc_page {
  */
 struct sgx_numa_node {
 	struct list_head free_page_list;
+	struct list_head sgx_poison_page_list;
 	spinlock_t lock;
 };
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v10 4/7] x86/sgx: Add SGX infrastructure to recover from poison
  2021-10-18 20:25   ` [PATCH v10 " Tony Luck
                       ` (2 preceding siblings ...)
  2021-10-18 20:25     ` [PATCH v10 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-10-18 20:25     ` Tony Luck
  2021-10-18 20:25     ` [PATCH v10 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
                       ` (3 subsequent siblings)
  7 siblings, 0 replies; 38+ messages in thread
From: Tony Luck @ 2021-10-18 20:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

Provide a recovery function arch_memory_failure(). If the poison was
consumed synchronously then send a SIGBUS. Note that the virtual
address of the access is not included with the SIGBUS as is the case
for poison outside of SGX enclaves. This doesn't matter, as addresses
of code/data inside an enclave are of little to no use to code executing
outside the (now dead) enclave.

Poison found in a free page results in the page being moved from the
free list to the per-node poison page list.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 76 ++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index e5fcb8354bcc..231c494dfd40 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -693,6 +693,82 @@ bool arch_is_platform_page(u64 paddr)
 }
 EXPORT_SYMBOL_GPL(arch_is_platform_page);
 
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+
+	section = xa_load(&sgx_epc_address_space, paddr);
+	if (!section)
+		return NULL;
+
+	return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+}
+
+/*
+ * Called in process context to handle a hardware reported
+ * error in an SGX EPC page.
+ * If the MF_ACTION_REQUIRED bit is set in flags, then the
+ * context is the task that consumed the poison data. Otherwise
+ * this is called from a kernel thread unrelated to the page.
+ */
+int arch_memory_failure(unsigned long pfn, int flags)
+{
+	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+	struct sgx_epc_section *section;
+	struct sgx_numa_node *node;
+
+	/*
+	 * mm/memory-failure.c calls this routine for all errors
+	 * where there isn't a "struct page" for the address. But that
+	 * includes other address ranges besides SGX.
+	 */
+	if (!page)
+		return -ENXIO;
+
+	/*
+	 * If poison was consumed synchronously. Send a SIGBUS to
+	 * the task. Hardware has already exited the SGX enclave and
+	 * will not allow re-entry to an enclave that has a memory
+	 * error. The signal may help the task understand why the
+	 * enclave is broken.
+	 */
+	if (flags & MF_ACTION_REQUIRED)
+		force_sig(SIGBUS);
+
+	section = &sgx_epc_sections[page->section];
+	node = section->node;
+
+	spin_lock(&node->lock);
+
+	/* Already poisoned? Nothing more to do */
+	if (page->poison)
+		goto out;
+
+	page->poison = 1;
+
+	/*
+	 * If the page is on a free list, move it to the per-node
+	 * poison page list.
+	 */
+	if (page->flags & SGX_EPC_PAGE_IS_FREE) {
+		list_move(&page->list, &node->sgx_poison_page_list);
+		goto out;
+	}
+
+	/*
+	 * TBD: Add additional plumbing to enable pre-emptive
+	 * action for asynchronous poison notification. Until
+	 * then just hope that the poison:
+	 * a) is not accessed - sgx_free_epc_page() will deal with it
+	 *    when the user gives it back
+	 * b) results in a recoverable machine check rather than
+	 *    a fatal one
+	 */
+out:
+	spin_unlock(&node->lock);
+	return 0;
+}
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread
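
The SIGBUS described in this patch arrives without a usable fault address,
so about all an untrusted SGX runtime can do with it is note that the
enclave is dead and stop re-entering it. A hypothetical userspace sketch
(not part of this series; the handler and flag names are invented for
illustration):

        #include <signal.h>
        #include <stdatomic.h>
        #include <stdbool.h>

        static atomic_bool enclave_dead;

        static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
        {
                (void)sig; (void)info; (void)ctx;
                /*
                 * No meaningful si_addr when the poison was consumed inside
                 * an enclave; just mark the enclave unusable.
                 */
                atomic_store(&enclave_dead, true);
        }

        static void install_sigbus_handler(void)
        {
                struct sigaction sa = { 0 };

                sa.sa_sigaction = sigbus_handler;
                sa.sa_flags = SA_SIGINFO;
                sigaction(SIGBUS, &sa, NULL);
        }

        /*
         * Before each re-entry (EENTER/ERESUME) check enclave_dead and
         * rebuild or tear down the enclave instead of re-entering it.
         */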

* [PATCH v10 5/7] x86/sgx: Hook arch_memory_failure() into mainline code
  2021-10-18 20:25   ` [PATCH v10 " Tony Luck
                       ` (3 preceding siblings ...)
  2021-10-18 20:25     ` [PATCH v10 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
@ 2021-10-18 20:25     ` Tony Luck
  2021-10-20  9:06       ` Naoya Horiguchi
  2021-10-18 20:25     ` [PATCH v10 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
                       ` (2 subsequent siblings)
  7 siblings, 1 reply; 38+ messages in thread
From: Tony Luck @ 2021-10-18 20:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

Add a call inside memory_failure() to call the arch specific code
to check if the address is an SGX EPC page and handle it.

Note the SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.

Pull the call to acquire the mutex earlier so the SGX errors are also
protected.

Make set_mce_nospec() skip SGX pages when trying to adjust
the 1:1 map.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/processor.h  |  8 ++++++++
 arch/x86/include/asm/set_memory.h |  4 ++++
 include/linux/mm.h                | 14 ++++++++++++++
 mm/memory-failure.c               | 19 +++++++++++++------
 4 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 9ad2acaaae9b..4865f2860a4f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -853,4 +853,12 @@ enum mds_mitigations {
 	MDS_MITIGATION_VMWERV,
 };
 
+#ifdef CONFIG_X86_SGX
+int arch_memory_failure(unsigned long pfn, int flags);
+#define arch_memory_failure arch_memory_failure
+
+bool arch_is_platform_page(u64 paddr);
+#define arch_is_platform_page arch_is_platform_page
+#endif
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 43fa081a1adb..ce8dd215f5b3 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_SET_MEMORY_H
 #define _ASM_X86_SET_MEMORY_H
 
+#include <linux/mm.h>
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
 
@@ -98,6 +99,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
 	unsigned long decoy_addr;
 	int rc;
 
+	/* SGX pages are not in the 1:1 map */
+	if (arch_is_platform_page(pfn << PAGE_SHIFT))
+		return 0;
 	/*
 	 * We would like to just call:
 	 *      set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 73a52aba448f..62b199ed5ec6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3284,5 +3284,19 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
 	return 0;
 }
 
+#ifndef arch_memory_failure
+static inline int arch_memory_failure(unsigned long pfn, int flags)
+{
+	return -ENXIO;
+}
+#endif
+
+#ifndef arch_is_platform_page
+static inline bool arch_is_platform_page(u64 paddr)
+{
+	return false;
+}
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3e6449f2102a..b1cbf9845c19 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1632,21 +1632,28 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
 
+	mutex_lock(&mf_mutex);
+
 	p = pfn_to_online_page(pfn);
 	if (!p) {
+		res = arch_memory_failure(pfn, flags);
+		if (res == 0)
+			goto unlock_mutex;
+
 		if (pfn_valid(pfn)) {
 			pgmap = get_dev_pagemap(pfn, NULL);
-			if (pgmap)
-				return memory_failure_dev_pagemap(pfn, flags,
-								  pgmap);
+			if (pgmap) {
+				res = memory_failure_dev_pagemap(pfn, flags,
+								 pgmap);
+				goto unlock_mutex;
+			}
 		}
 		pr_err("Memory failure: %#lx: memory outside kernel control\n",
 			pfn);
-		return -ENXIO;
+		res = -ENXIO;
+		goto unlock_mutex;
 	}
 
-	mutex_lock(&mf_mutex);
-
 try_again:
 	if (PageHuge(p)) {
 		res = memory_failure_hugetlb(pfn, flags);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v10 6/7] x86/sgx: Add hook to error injection address validation
  2021-10-18 20:25   ` [PATCH v10 " Tony Luck
                       ` (4 preceding siblings ...)
  2021-10-18 20:25     ` [PATCH v10 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
@ 2021-10-18 20:25     ` Tony Luck
  2021-10-18 20:25     ` [PATCH v10 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
  2021-10-26 22:00     ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
  7 siblings, 0 replies; 38+ messages in thread
From: Tony Luck @ 2021-10-18 20:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

SGX reserved memory does not appear in the standard address maps.

Add hook to call into the SGX code to check if an address is located
in SGX memory.

There are other challenges in injecting errors into SGX. Update the
documentation with a sequence of operations to inject.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 .../firmware-guide/acpi/apei/einj.rst         | 19 +++++++++++++++++++
 drivers/acpi/apei/einj.c                      |  3 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index c042176e1707..55e2331a6438 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,5 +181,24 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+Special notes for injection into SGX enclaves:
+
+There may be a separate BIOS setup option to enable SGX injection.
+
+The injection process consists of setting some special memory controller
+trigger that will inject the error on the next write to the target
+address. But the h/w prevents any software outside of an SGX enclave
+from accessing enclave pages (even BIOS SMM mode).
+
+The following sequence can be used:
+  1) Determine physical address of enclave page
+  2) Use "notrigger=1" mode to inject (this will setup
+     the injection address, but will not actually inject)
+  3) Enter the enclave
+  4) Store data to the virtual address matching physical address from step 1
+  5) Execute CLFLUSH for that virtual address
+  6) Spin delay for 250ms
+  7) Read from the virtual address. This will trigger the error
+
 For more information about EINJ, please refer to ACPI specification
 version 4.0, section 17.5 and ACPI 5.0, section 18.6.
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 2882450c443e..67c335baad52 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -544,7 +544,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	    ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
 				!= REGION_INTERSECTS) &&
 	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
-				!= REGION_INTERSECTS)))
+				!= REGION_INTERSECTS) &&
+	     !arch_is_platform_page(base_addr)))
 		return -EINVAL;
 
 inject:
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread
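
Steps 4-7 of the injection sequence added to einj.rst above have to run
inside the enclave itself. A rough sketch of what a test enclave's entry
function might do, assuming "va" is the enclave virtual address that maps
the physical address armed with notrigger=1 (the function name, constant
and delay loop are illustrative only; precise timing facilities are
generally unavailable inside an enclave, so the delay is a crude spin that
needs tuning to roughly 250ms):

        #include <stdint.h>
        #include <emmintrin.h>          /* _mm_clflush(), _mm_mfence() */

        void einj_trigger(uint64_t *va)
        {
                *(volatile uint64_t *)va = 0x1234;      /* step 4: store   */
                _mm_mfence();
                _mm_clflush(va);                        /* step 5: flush   */
                _mm_mfence();

                /* step 6: crude delay, tune the count for ~250ms */
                for (volatile uint64_t i = 0; i < 500000000ULL; i++)
                        ;

                (void)*(volatile uint64_t *)va;         /* step 7: consume */
        }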

* [PATCH v10 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  2021-10-18 20:25   ` [PATCH v10 " Tony Luck
                       ` (5 preceding siblings ...)
  2021-10-18 20:25     ` [PATCH v10 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
@ 2021-10-18 20:25     ` Tony Luck
  2021-10-26 22:00     ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
  7 siblings, 0 replies; 38+ messages in thread
From: Tony Luck @ 2021-10-18 20:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, naoya.horiguchi
  Cc: Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, linux-acpi, linux-mm, Tony Luck,
	Reinette Chatre

SGX EPC pages do not have a "struct page" associated with them so the
pfn_valid() sanity check fails and results in a warning message to
the console.

Add an additional check to skip the warning if the address of the error
is in an SGX EPC page.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 drivers/acpi/apei/ghes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0c8330ed1ffd..0c5c9acc6254 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -449,7 +449,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 		return false;
 
 	pfn = PHYS_PFN(physical_addr);
-	if (!pfn_valid(pfn)) {
+	if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX
 		"Invalid address in generic error data: %#llx\n",
 		physical_addr);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v10 5/7] x86/sgx: Hook arch_memory_failure() into mainline code
  2021-10-18 20:25     ` [PATCH v10 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
@ 2021-10-20  9:06       ` Naoya Horiguchi
  2021-10-20 17:04         ` Luck, Tony
  0 siblings, 1 reply; 38+ messages in thread
From: Naoya Horiguchi @ 2021-10-20  9:06 UTC (permalink / raw)
  To: Tony Luck
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, Reinette Chatre

On Mon, Oct 18, 2021 at 01:25:40PM -0700, Tony Luck wrote:
> Add a call inside memory_failure() to call the arch specific code
> to check if the address is an SGX EPC page and handle it.
> 
> Note the SGX EPC pages do not have a "struct page" entry, so the hook
> goes in at the same point as the device mapping hook.
> 
> Pull the call to acquire the mutex earlier so the SGX errors are also
> protected.
> 
> Make set_mce_nospec() skip SGX pages when trying to adjust
> the 1:1 map.
> 
> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
> Tested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
...
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 73a52aba448f..62b199ed5ec6 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3284,5 +3284,19 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
>  	return 0;
>  }
>  
> +#ifndef arch_memory_failure
> +static inline int arch_memory_failure(unsigned long pfn, int flags)
> +{
> +	return -ENXIO;
> +}
> +#endif
> +
> +#ifndef arch_is_platform_page
> +static inline bool arch_is_platform_page(u64 paddr)
> +{
> +	return false;
> +}
> +#endif
> +

How about putting these definitions near the other related functions
in the same file (like below)?

  ...
  extern void shake_page(struct page *p);
  extern atomic_long_t num_poisoned_pages __read_mostly;
  extern int soft_offline_page(unsigned long pfn, int flags);
  
  // here?
  
  /*
   * Error handlers for various types of pages.
   */
  enum mf_result {

Otherwise, the patch looks good to me.

Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Thanks,
Naoya Horiguchi


^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v10 5/7] x86/sgx: Hook arch_memory_failure() into mainline code
  2021-10-20  9:06       ` Naoya Horiguchi
@ 2021-10-20 17:04         ` Luck, Tony
  0 siblings, 0 replies; 38+ messages in thread
From: Luck, Tony @ 2021-10-20 17:04 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Wysocki, Rafael J, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Hansen, Dave, Zhang, Cathy,
	linux-sgx, linux-acpi, linux-mm, Chatre, Reinette

> How about putting these definitions near the other related functions
> in the same file (like below)?
>
>  ...
>  extern void shake_page(struct page *p);
>  extern atomic_long_t num_poisoned_pages __read_mostly;
>  extern int soft_offline_page(unsigned long pfn, int flags);
>  
>  // here?

Makes sense to group together with these other RAS bits.
I'll move the definitions here.
  

> Otherwise, the patch looks good to me.
>
> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Thanks for the review!

-Tony

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v9 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-10-11 18:59   ` [PATCH v9 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
@ 2021-10-22 10:43     ` kernel test robot
  0 siblings, 0 replies; 38+ messages in thread
From: kernel test robot @ 2021-10-22 10:43 UTC (permalink / raw)
  To: Tony Luck, Rafael J. Wysocki, naoya.horiguchi
  Cc: kbuild-all, Andrew Morton, Linux Memory Management List,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi

[-- Attachment #1: Type: text/plain, Size: 2730 bytes --]

Hi Tony,

I love your patch! Yet something to improve:

[auto build test ERROR on rafael-pm/linux-next]
[also build test ERROR on hnaz-mm/master tip/x86/sgx v5.15-rc6 next-20211021]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Tony-Luck/x86-sgx-Add-new-sgx_epc_page-flag-bit-to-mark-in-use-pages/20211012-035926
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: x86_64-randconfig-a011-20211011 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/9c7bd2907252bfbf4948be9855e3535319e1e9e4
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Tony-Luck/x86-sgx-Add-new-sgx_epc_page-flag-bit-to-mark-in-use-pages/20211012-035926
        git checkout 9c7bd2907252bfbf4948be9855e3535319e1e9e4
        # save the attached .config to linux build tree
        mkdir build_dir
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: arch/x86/kernel/cpu/sgx/main.o: in function `sgx_setup_epc_section':
>> arch/x86/kernel/cpu/sgx/main.c:654: undefined reference to `xa_store_range'


vim +654 arch/x86/kernel/cpu/sgx/main.c

   635	
   636	static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
   637						 unsigned long index,
   638						 struct sgx_epc_section *section)
   639	{
   640		unsigned long nr_pages = size >> PAGE_SHIFT;
   641		unsigned long i;
   642	
   643		section->virt_addr = memremap(phys_addr, size, MEMREMAP_WB);
   644		if (!section->virt_addr)
   645			return false;
   646	
   647		section->pages = vmalloc(nr_pages * sizeof(struct sgx_epc_page));
   648		if (!section->pages) {
   649			memunmap(section->virt_addr);
   650			return false;
   651		}
   652	
   653		section->phys_addr = phys_addr;
 > 654		xa_store_range(&sgx_epc_address_space, section->phys_addr,
   655			       phys_addr + size - 1, section, GFP_KERNEL);
   656	
   657		for (i = 0; i < nr_pages; i++) {
   658			section->pages[i].section = index;
   659			section->pages[i].flags = SGX_EPC_PAGE_IN_USE;
   660			section->pages[i].owner = NULL;
   661			list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
   662		}
   663	
   664		return true;
   665	}
   666	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33705 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v11 0/7] Basic recovery for machine checks inside SGX
  2021-10-18 20:25   ` [PATCH v10 " Tony Luck
                       ` (6 preceding siblings ...)
  2021-10-18 20:25     ` [PATCH v10 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
@ 2021-10-26 22:00     ` Tony Luck
  2021-10-26 22:00       ` [PATCH v11 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages Tony Luck
                         ` (6 more replies)
  7 siblings, 7 replies; 38+ messages in thread
From: Tony Luck @ 2021-10-26 22:00 UTC (permalink / raw)
  To: Borislav Petkov, x86
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, linux-kernel, Tony Luck

Boris,

I took this series out of lkml/x86 for a few revisions; I think
the last one posted to lkml was v5. So much has changed since then
that it might be easier to just look at this as if it were v1 and
ignore the earlier history.

First four patches add infrastructure within the SGX code to
track enclave pages (because these pages don't have a "struct
page" as they aren't directly accessible by Linux). All have
"Reviewed-by" tags from Jarkko (SGX maintainer).

Patch 5 hooks into memory_failure() to invoke recovery if
the physical address is in enclave space. This has a
"Reviewed-by" tag from Naoya Horiguchi the maintainer for
mm/memory-failure.c

Patch 6 is a hook into the error injection code and an addition
to the error injection documentation explaining the extra steps
needed to inject into SGX enclave memory.

Patch 7 is a hook into the GHES error reporting path to recognize
that SGX enclave addresses are valid and need processing.

-Tony

Tony Luck (7):
  x86/sgx: Add new sgx_epc_page flag bit to mark free pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook arch_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()

 .../firmware-guide/acpi/apei/einj.rst         |  19 +++
 arch/x86/Kconfig                              |   1 +
 arch/x86/include/asm/processor.h              |   8 ++
 arch/x86/include/asm/set_memory.h             |   4 +
 arch/x86/kernel/cpu/sgx/main.c                | 113 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |   7 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 drivers/acpi/apei/ghes.c                      |   2 +-
 include/linux/mm.h                            |  13 ++
 mm/memory-failure.c                           |  19 ++-
 10 files changed, 179 insertions(+), 10 deletions(-)


base-commit: 3906fe9bb7f1a2c8667ae54e967dc8690824f4ea
-- 
2.31.1



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v11 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages
  2021-10-26 22:00     ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
@ 2021-10-26 22:00       ` Tony Luck
  2021-10-26 22:00       ` [PATCH v11 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
                         ` (5 subsequent siblings)
  6 siblings, 0 replies; 38+ messages in thread
From: Tony Luck @ 2021-10-26 22:00 UTC (permalink / raw)
  To: Borislav Petkov, x86
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, linux-kernel, Tony Luck,
	Reinette Chatre

SGX EPC pages go through the following life cycle:

        DIRTY ---> FREE ---> IN-USE --\
                    ^                 |
                    \-----------------/

Recovery action for poison for a DIRTY or FREE page is simple. Just
make sure never to allocate the page. IN-USE pages need some extra
handling.

Add a new flag bit SGX_EPC_PAGE_IS_FREE that is set when a page
is added to a free list and cleared when the page is allocated.

Notes:

1) These transitions are made while holding the node->lock so that
   future code that checks the flags while holding the node->lock
   can be sure that if the SGX_EPC_PAGE_IS_FREE bit is set, then the
   page is on the free list.

2) Initially while the pages are on the dirty list the
   SGX_EPC_PAGE_IS_FREE bit is cleared.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 2 ++
 arch/x86/kernel/cpu/sgx/sgx.h  | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 63d3de02bbcc..825aa91516c8 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -472,6 +472,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 	page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
 	list_del_init(&page->list);
 	sgx_nr_free_pages--;
+	page->flags = 0;
 
 	spin_unlock(&node->lock);
 
@@ -626,6 +627,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
+	page->flags = SGX_EPC_PAGE_IS_FREE;
 
 	spin_unlock(&node->lock);
 }
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..5906471156c5 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -26,6 +26,9 @@
 /* Pages, which are being tracked by the page reclaimer. */
 #define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
 
+/* Pages on free list */
+#define SGX_EPC_PAGE_IS_FREE		BIT(1)
+
 struct sgx_epc_page {
 	unsigned int section;
 	unsigned int flags;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages
  2021-10-26 22:00     ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
  2021-10-26 22:00       ` [PATCH v11 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages Tony Luck
@ 2021-10-26 22:00       ` Tony Luck
  2021-10-26 22:00       ` [PATCH v11 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
                         ` (4 subsequent siblings)
  6 siblings, 0 replies; 38+ messages in thread
From: Tony Luck @ 2021-10-26 22:00 UTC (permalink / raw)
  To: Borislav Petkov, x86
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, linux-kernel, Tony Luck,
	Reinette Chatre

X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.

SGX EPC pages do not have Linux "struct page" associated with them.

Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray. N.B. this adds CONFIG_XARRAY_MULTI to
the SGX dependencies, so "select" it in arch/x86/Kconfig under X86_SGX.

Create a function arch_is_platform_page() that simply reports whether an
address is an EPC page for use elsewhere in the kernel. The ACPI error
injection code needs this function and is typically built as a module,
so export it.

Note that arch_is_platform_page() will be slower than other similar
"what type is this page" functions that can simply check bits in the
"struct page".  If there is some future performance critical user of
this function it may need to be implemented in a more efficient way.

Note also that the current implementation of xarray allocates a few
hundred kilobytes for this usage on a system with 4GB of SGX EPC memory
configured. This isn't ideal, but worth it for the code simplicity.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/Kconfig               | 1 +
 arch/x86/kernel/cpu/sgx/main.c | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d9830e7e1060..b3b5b5a31f89 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1902,6 +1902,7 @@ config X86_SGX
 	select SRCU
 	select MMU_NOTIFIER
 	select NUMA_KEEP_MEMINFO if NUMA
+	select XARRAY_MULTI
 	help
 	  Intel(R) Software Guard eXtensions (SGX) is a set of CPU instructions
 	  that can be used by applications to set aside private regions of code
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 825aa91516c8..5c02cffdabc8 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -20,6 +20,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
 static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
+static DEFINE_XARRAY(sgx_epc_address_space);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
@@ -650,6 +651,8 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	xa_store_range(&sgx_epc_address_space, section->phys_addr,
+		       phys_addr + size - 1, section, GFP_KERNEL);
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -661,6 +664,12 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+bool arch_is_platform_page(u64 paddr)
+{
+	return !!xa_load(&sgx_epc_address_space, paddr);
+}
+EXPORT_SYMBOL_GPL(arch_is_platform_page);
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 3/7] x86/sgx: Initial poison handling for dirty and free pages
  2021-10-26 22:00     ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
  2021-10-26 22:00       ` [PATCH v11 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages Tony Luck
  2021-10-26 22:00       ` [PATCH v11 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
@ 2021-10-26 22:00       ` Tony Luck
  2021-10-26 22:00       ` [PATCH v11 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
                         ` (3 subsequent siblings)
  6 siblings, 0 replies; 38+ messages in thread
From: Tony Luck @ 2021-10-26 22:00 UTC (permalink / raw)
  To: Borislav Petkov, x86
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, linux-kernel, Tony Luck,
	Reinette Chatre

A memory controller patrol scrubber can report poison in a page
that isn't currently being used.

Add "poison" field in the sgx_epc_page that can be set for an
sgx_epc_page. Check for it:
1) When sanitizing dirty pages
2) When freeing epc pages

Poison is a new field separated from flags to avoid having to make all
updates to flags atomic, or integrate poison state changes into some
other locking scheme to protect flags (Currently just sgx_reclaimer_lock
which protects the SGX_EPC_PAGE_RECLAIMER_TRACKED bit in page->flags).

In both cases place the poisoned page on a per-node list of poisoned
epc pages to make sure it will not be reallocated.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 26 +++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h  |  4 +++-
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 5c02cffdabc8..e5fcb8354bcc 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -62,6 +62,24 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
+		/*
+		 * Checking page->poison without holding the node->lock
+		 * is racy, but losing the race (i.e. poison is set just
+		 * after the check) just means __eremove() will be uselessly
+		 * called for a page that sgx_free_epc_page() will put onto
+		 * the node->sgx_poison_page_list later.
+		 */
+		if (page->poison) {
+			struct sgx_epc_section *section = &sgx_epc_sections[page->section];
+			struct sgx_numa_node *node = section->node;
+
+			spin_lock(&node->lock);
+			list_move(&page->list, &node->sgx_poison_page_list);
+			spin_unlock(&node->lock);
+
+			continue;
+		}
+
 		ret = __eremove(sgx_get_epc_virt_addr(page));
 		if (!ret) {
 			/*
@@ -626,7 +644,11 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 
 	spin_lock(&node->lock);
 
-	list_add_tail(&page->list, &node->free_page_list);
+	page->owner = NULL;
+	if (page->poison)
+		list_add(&page->list, &node->sgx_poison_page_list);
+	else
+		list_add_tail(&page->list, &node->free_page_list);
 	sgx_nr_free_pages++;
 	page->flags = SGX_EPC_PAGE_IS_FREE;
 
@@ -658,6 +680,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 		section->pages[i].section = index;
 		section->pages[i].flags = 0;
 		section->pages[i].owner = NULL;
+		section->pages[i].poison = 0;
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
@@ -724,6 +747,7 @@ static bool __init sgx_page_cache_init(void)
 		if (!node_isset(nid, sgx_numa_mask)) {
 			spin_lock_init(&sgx_numa_nodes[nid].lock);
 			INIT_LIST_HEAD(&sgx_numa_nodes[nid].free_page_list);
+			INIT_LIST_HEAD(&sgx_numa_nodes[nid].sgx_poison_page_list);
 			node_set(nid, sgx_numa_mask);
 		}
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 5906471156c5..9ec3136c7800 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -31,7 +31,8 @@
 
 struct sgx_epc_page {
 	unsigned int section;
-	unsigned int flags;
+	u16 flags;
+	u16 poison;
 	struct sgx_encl_page *owner;
 	struct list_head list;
 };
@@ -42,6 +43,7 @@ struct sgx_epc_page {
  */
 struct sgx_numa_node {
 	struct list_head free_page_list;
+	struct list_head sgx_poison_page_list;
 	spinlock_t lock;
 };
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 4/7] x86/sgx: Add SGX infrastructure to recover from poison
  2021-10-26 22:00     ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
                         ` (2 preceding siblings ...)
  2021-10-26 22:00       ` [PATCH v11 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
@ 2021-10-26 22:00       ` Tony Luck
  2021-10-26 22:00       ` [PATCH v11 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
                         ` (2 subsequent siblings)
  6 siblings, 0 replies; 38+ messages in thread
From: Tony Luck @ 2021-10-26 22:00 UTC (permalink / raw)
  To: Borislav Petkov, x86
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, linux-kernel, Tony Luck,
	Reinette Chatre

Provide a recovery function arch_memory_failure(). If the poison was
consumed synchronously then send a SIGBUS. Note that the virtual
address of the access is not included with the SIGBUS as is the case
for poison outside of SGX enclaves. This doesn't matter, as addresses
of code/data inside an enclave are of little to no use to code executing
outside the (now dead) enclave.

Poison found in a free page results in the page being moved from the
free list to the per-node poison page list.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 76 ++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index e5fcb8354bcc..231c494dfd40 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -693,6 +693,82 @@ bool arch_is_platform_page(u64 paddr)
 }
 EXPORT_SYMBOL_GPL(arch_is_platform_page);
 
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+
+	section = xa_load(&sgx_epc_address_space, paddr);
+	if (!section)
+		return NULL;
+
+	return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+}
+
+/*
+ * Called in process context to handle a hardware reported
+ * error in an SGX EPC page.
+ * If the MF_ACTION_REQUIRED bit is set in flags, then the
+ * context is the task that consumed the poison data. Otherwise
+ * this is called from a kernel thread unrelated to the page.
+ */
+int arch_memory_failure(unsigned long pfn, int flags)
+{
+	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+	struct sgx_epc_section *section;
+	struct sgx_numa_node *node;
+
+	/*
+	 * mm/memory-failure.c calls this routine for all errors
+	 * where there isn't a "struct page" for the address. But that
+	 * includes other address ranges besides SGX.
+	 */
+	if (!page)
+		return -ENXIO;
+
+	/*
+	 * If poison was consumed synchronously. Send a SIGBUS to
+	 * the task. Hardware has already exited the SGX enclave and
+	 * will not allow re-entry to an enclave that has a memory
+	 * error. The signal may help the task understand why the
+	 * enclave is broken.
+	 */
+	if (flags & MF_ACTION_REQUIRED)
+		force_sig(SIGBUS);
+
+	section = &sgx_epc_sections[page->section];
+	node = section->node;
+
+	spin_lock(&node->lock);
+
+	/* Already poisoned? Nothing more to do */
+	if (page->poison)
+		goto out;
+
+	page->poison = 1;
+
+	/*
+	 * If the page is on a free list, move it to the per-node
+	 * poison page list.
+	 */
+	if (page->flags & SGX_EPC_PAGE_IS_FREE) {
+		list_move(&page->list, &node->sgx_poison_page_list);
+		goto out;
+	}
+
+	/*
+	 * TBD: Add additional plumbing to enable pre-emptive
+	 * action for asynchronous poison notification. Until
+	 * then just hope that the poison:
+	 * a) is not accessed - sgx_free_epc_page() will deal with it
+	 *    when the user gives it back
+	 * b) results in a recoverable machine check rather than
+	 *    a fatal one
+	 */
+out:
+	spin_unlock(&node->lock);
+	return 0;
+}
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v11 5/7] x86/sgx: Hook arch_memory_failure() into mainline code
  2021-10-26 22:00     ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
                         ` (3 preceding siblings ...)
  2021-10-26 22:00       ` [PATCH v11 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
@ 2021-10-26 22:00       ` Tony Luck
  2021-10-26 22:00       ` [PATCH v11 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
  2021-10-26 22:00       ` [PATCH v11 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
  6 siblings, 0 replies; 38+ messages in thread
From: Tony Luck @ 2021-10-26 22:00 UTC (permalink / raw)
  To: Borislav Petkov, x86
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, linux-kernel, Tony Luck,
	Reinette Chatre

Add a call inside memory_failure() to call the arch specific code
to check if the address is an SGX EPC page and handle it.

Note the SGX EPC pages do not have a "struct page" entry, so the hook
goes in at the same point as the device mapping hook.

Pull the call to acquire the mutex earlier so the SGX errors are also
protected.

Make set_mce_nospec() skip SGX pages when trying to adjust
the 1:1 map.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/processor.h  |  8 ++++++++
 arch/x86/include/asm/set_memory.h |  4 ++++
 include/linux/mm.h                | 13 +++++++++++++
 mm/memory-failure.c               | 19 +++++++++++++------
 4 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 9ad2acaaae9b..4865f2860a4f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -853,4 +853,12 @@ enum mds_mitigations {
 	MDS_MITIGATION_VMWERV,
 };
 
+#ifdef CONFIG_X86_SGX
+int arch_memory_failure(unsigned long pfn, int flags);
+#define arch_memory_failure arch_memory_failure
+
+bool arch_is_platform_page(u64 paddr);
+#define arch_is_platform_page arch_is_platform_page
+#endif
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 43fa081a1adb..ce8dd215f5b3 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_SET_MEMORY_H
 #define _ASM_X86_SET_MEMORY_H
 
+#include <linux/mm.h>
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
 
@@ -98,6 +99,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
 	unsigned long decoy_addr;
 	int rc;
 
+	/* SGX pages are not in the 1:1 map */
+	if (arch_is_platform_page(pfn << PAGE_SHIFT))
+		return 0;
 	/*
 	 * We would like to just call:
 	 *      set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 73a52aba448f..0aa48b238db2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3124,6 +3124,19 @@ extern void shake_page(struct page *p);
 extern atomic_long_t num_poisoned_pages __read_mostly;
 extern int soft_offline_page(unsigned long pfn, int flags);
 
+#ifndef arch_memory_failure
+static inline int arch_memory_failure(unsigned long pfn, int flags)
+{
+	return -ENXIO;
+}
+#endif
+
+#ifndef arch_is_platform_page
+static inline bool arch_is_platform_page(u64 paddr)
+{
+	return false;
+}
+#endif
 
 /*
  * Error handlers for various types of pages.
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3e6449f2102a..b1cbf9845c19 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1632,21 +1632,28 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
 
+	mutex_lock(&mf_mutex);
+
 	p = pfn_to_online_page(pfn);
 	if (!p) {
+		res = arch_memory_failure(pfn, flags);
+		if (res == 0)
+			goto unlock_mutex;
+
 		if (pfn_valid(pfn)) {
 			pgmap = get_dev_pagemap(pfn, NULL);
-			if (pgmap)
-				return memory_failure_dev_pagemap(pfn, flags,
-								  pgmap);
+			if (pgmap) {
+				res = memory_failure_dev_pagemap(pfn, flags,
+								 pgmap);
+				goto unlock_mutex;
+			}
 		}
 		pr_err("Memory failure: %#lx: memory outside kernel control\n",
 			pfn);
-		return -ENXIO;
+		res = -ENXIO;
+		goto unlock_mutex;
 	}
 
-	mutex_lock(&mf_mutex);
-
 try_again:
 	if (PageHuge(p)) {
 		res = memory_failure_hugetlb(pfn, flags);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread
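
The "#define arch_memory_failure arch_memory_failure" line in the
processor.h hunk above is what keeps the generic stub in
include/linux/mm.h from being compiled: the generic header supplies its
static inline fallback only when no same-named macro is defined. A
minimal user-space sketch of that interaction (illustrative only, not
kernel code; the printf stands in for the real SGX handler):

#include <errno.h>
#include <stdio.h>

/* "arch" side: real definition plus the same-named marker macro */
static int arch_memory_failure(unsigned long pfn, int flags)
{
	printf("arch handler claims pfn %#lx (flags %d)\n", pfn, flags);
	return 0;
}
#define arch_memory_failure arch_memory_failure

/* "generic" side: compiled only when no arch override is present */
#ifndef arch_memory_failure
static inline int arch_memory_failure(unsigned long pfn, int flags)
{
	return -ENXIO;
}
#endif

int main(void)
{
	/* resolves to the "arch" version because the macro is defined */
	return arch_memory_failure(0x12345, 0);
}

In the kernel the same trick lets arch/x86 provide the SGX-aware
handlers while every other architecture silently falls back to the
-ENXIO and "return false" stubs.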

* [PATCH v11 6/7] x86/sgx: Add hook to error injection address validation
  2021-10-26 22:00     ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
                         ` (4 preceding siblings ...)
  2021-10-26 22:00       ` [PATCH v11 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
@ 2021-10-26 22:00       ` Tony Luck
  2021-10-26 22:00       ` [PATCH v11 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
  6 siblings, 0 replies; 38+ messages in thread
From: Tony Luck @ 2021-10-26 22:00 UTC (permalink / raw)
  To: Borislav Petkov, x86
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, linux-kernel, Tony Luck,
	Reinette Chatre

SGX reserved memory does not appear in the standard address maps.

Add a hook to call into the SGX code to check whether an address is
located in SGX memory.

There are other challenges in injecting errors into SGX memory. Update
the documentation with the sequence of operations needed to inject an
error there.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 .../firmware-guide/acpi/apei/einj.rst         | 19 +++++++++++++++++++
 drivers/acpi/apei/einj.c                      |  3 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index c042176e1707..55e2331a6438 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,5 +181,24 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+Special notes for injection into SGX enclaves:
+
+There may be a separate BIOS setup option to enable SGX injection.
+
+The injection process consists of setting some special memory controller
+trigger that will inject the error on the next write to the target
+address. But the h/w prevents any software outside of an SGX enclave
+from accessing enclave pages (even BIOS SMM mode).
+
+The following sequence can be used:
+  1) Determine physical address of enclave page
+  2) Use "notrigger=1" mode to inject (this will setup
+     the injection address, but will not actually inject)
+  3) Enter the enclave
+  4) Store data to the virtual address matching physical address from step 1
+  5) Execute CLFLUSH for that virtual address
+  6) Spin delay for 250ms
+  7) Read from the virtual address. This will trigger the error
+
 For more information about EINJ, please refer to ACPI specification
 version 4.0, section 17.5 and ACPI 5.0, section 18.6.
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 2882450c443e..67c335baad52 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -544,7 +544,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	    ((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
 				!= REGION_INTERSECTS) &&
 	     (region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
-				!= REGION_INTERSECTS)))
+				!= REGION_INTERSECTS) &&
+	     !arch_is_platform_page(base_addr)))
 		return -EINVAL;
 
 inject:
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread
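
For step 2 of the sequence documented above, the standard EINJ debugfs
files (error_type, param1, param2, notrigger and error_inject, described
earlier in einj.rst) can be driven from a small user-space helper. A
sketch, assuming debugfs is mounted at /sys/kernel/debug and using a
placeholder physical address in place of the enclave page found in
step 1:

#include <stdio.h>

/* Write a single value to one of the EINJ debugfs files */
static int einj_write(const char *file, const char *val)
{
	char path[128];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/kernel/debug/apei/einj/%s", file);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%s\n", val);
	return fclose(f);
}

int main(void)
{
	einj_write("error_type", "0x10");           /* memory uncorrectable non-fatal */
	einj_write("param1", "0x12345000");         /* placeholder: address from step 1 */
	einj_write("param2", "0xfffffffffffff000"); /* address mask: whole 4K page */
	einj_write("notrigger", "1");               /* arm only; do not trigger here */
	return einj_write("error_inject", "1");
}

Steps 3-7 then have to run from inside the enclave itself, since nothing
outside it can touch the target page.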

* [PATCH v11 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  2021-10-26 22:00     ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
                         ` (5 preceding siblings ...)
  2021-10-26 22:00       ` [PATCH v11 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
@ 2021-10-26 22:00       ` Tony Luck
  2021-10-29 18:39         ` Rafael J. Wysocki
  6 siblings, 1 reply; 38+ messages in thread
From: Tony Luck @ 2021-10-26 22:00 UTC (permalink / raw)
  To: Borislav Petkov, x86
  Cc: Rafael J. Wysocki, naoya.horiguchi, Andrew Morton,
	Sean Christopherson, Jarkko Sakkinen, Dave Hansen, Cathy Zhang,
	linux-sgx, linux-acpi, linux-mm, linux-kernel, Tony Luck,
	Reinette Chatre

SGX EPC pages do not have a "struct page" associated with them so the
pfn_valid() sanity check fails and results in a warning message to
the console.

Add an additional check to skip the warning if the address of the error
is in an SGX EPC page.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 drivers/acpi/apei/ghes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0c8330ed1ffd..0c5c9acc6254 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -449,7 +449,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 		return false;
 
 	pfn = PHYS_PFN(physical_addr);
-	if (!pfn_valid(pfn)) {
+	if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX
 		"Invalid address in generic error data: %#llx\n",
 		physical_addr);
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread
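
The arch_is_platform_page() hook used here only has to answer whether a
physical address falls inside one of the SGX EPC sections (the
"infrastructure to identify SGX EPC pages" added earlier in the series).
A stand-alone sketch of that kind of range test, with a made-up section
table standing in for the list the kernel builds when it enumerates the
EPC at boot:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct epc_section {
	uint64_t base;   /* physical base address of the section */
	uint64_t size;   /* section size in bytes */
};

/* Fabricated example: one 256 MB EPC section */
static const struct epc_section sections[] = {
	{ 0x80000000ULL, 0x10000000ULL },
};

static bool is_epc_address(uint64_t paddr)
{
	for (size_t i = 0; i < sizeof(sections) / sizeof(sections[0]); i++) {
		if (paddr >= sections[i].base &&
		    paddr - sections[i].base < sections[i].size)
			return true;
	}
	return false;
}

int main(void)
{
	printf("%d %d\n", is_epc_address(0x80001000ULL), is_epc_address(0x1000ULL));
	return 0;
}

For such addresses ghes_do_memory_failure() can then skip the
pfn_valid() warning and let memory_failure() hand the pfn to
arch_memory_failure(), since there is no struct page to work with.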

* Re: [PATCH v11 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  2021-10-26 22:00       ` [PATCH v11 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
@ 2021-10-29 18:39         ` Rafael J. Wysocki
  0 siblings, 0 replies; 38+ messages in thread
From: Rafael J. Wysocki @ 2021-10-29 18:39 UTC (permalink / raw)
  To: Tony Luck
  Cc: Borislav Petkov, the arch/x86 maintainers, Rafael J. Wysocki,
	HORIGUCHI NAOYA(堀口 直也),
	Andrew Morton, Sean Christopherson, Jarkko Sakkinen, Dave Hansen,
	Cathy Zhang, linux-sgx, ACPI Devel Maling List,
	Linux Memory Management List, Linux Kernel Mailing List,
	Reinette Chatre

On Wed, Oct 27, 2021 at 12:01 AM Tony Luck <tony.luck@intel.com> wrote:
>
> SGX EPC pages do not have a "struct page" associated with them so the
> pfn_valid() sanity check fails and results in a warning message to
> the console.
>
> Add an additional check to skip the warning if the address of the error
> is in an SGX EPC page.
>
> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
> Tested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/acpi/apei/ghes.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 0c8330ed1ffd..0c5c9acc6254 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -449,7 +449,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
>                 return false;
>
>         pfn = PHYS_PFN(physical_addr);
> -       if (!pfn_valid(pfn)) {
> +       if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
>                 pr_warn_ratelimited(FW_WARN GHES_PFX
>                 "Invalid address in generic error data: %#llx\n",
>                 physical_addr);
> --
> 2.31.1
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread

Thread overview: 38+ messages
     [not found] <20211001164724.220532-1-tony.luck@intel.com>
2021-10-11 18:59 ` [PATCH v9 0/7] Basic recovery for machine checks inside SGX Tony Luck
2021-10-11 18:59   ` [PATCH v9 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages Tony Luck
2021-10-15 22:57     ` Sean Christopherson
2021-10-11 18:59   ` [PATCH v9 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
2021-10-22 10:43     ` kernel test robot
2021-10-11 18:59   ` [PATCH v9 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
2021-10-15 23:07     ` Sean Christopherson
2021-10-15 23:32       ` Luck, Tony
2021-10-11 18:59   ` [PATCH v9 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
2021-10-15 23:10     ` Sean Christopherson
2021-10-15 23:19       ` Luck, Tony
2021-10-11 18:59   ` [PATCH v9 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
2021-10-12 16:49     ` Jarkko Sakkinen
2021-10-11 18:59   ` [PATCH v9 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
2021-10-12 16:50     ` Jarkko Sakkinen
2021-10-11 18:59   ` [PATCH v9 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
2021-10-12 16:51     ` Jarkko Sakkinen
2021-10-12 16:48   ` [PATCH v9 0/7] Basic recovery for machine checks inside SGX Jarkko Sakkinen
2021-10-12 17:57     ` Luck, Tony
2021-10-18 20:25   ` [PATCH v10 " Tony Luck
2021-10-18 20:25     ` [PATCH v10 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages Tony Luck
2021-10-18 20:25     ` [PATCH v10 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
2021-10-18 20:25     ` [PATCH v10 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
2021-10-18 20:25     ` [PATCH v10 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
2021-10-18 20:25     ` [PATCH v10 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
2021-10-20  9:06       ` Naoya Horiguchi
2021-10-20 17:04         ` Luck, Tony
2021-10-18 20:25     ` [PATCH v10 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
2021-10-18 20:25     ` [PATCH v10 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
2021-10-26 22:00     ` [PATCH v11 0/7] Basic recovery for machine checks inside SGX Tony Luck
2021-10-26 22:00       ` [PATCH v11 1/7] x86/sgx: Add new sgx_epc_page flag bit to mark free pages Tony Luck
2021-10-26 22:00       ` [PATCH v11 2/7] x86/sgx: Add infrastructure to identify SGX EPC pages Tony Luck
2021-10-26 22:00       ` [PATCH v11 3/7] x86/sgx: Initial poison handling for dirty and free pages Tony Luck
2021-10-26 22:00       ` [PATCH v11 4/7] x86/sgx: Add SGX infrastructure to recover from poison Tony Luck
2021-10-26 22:00       ` [PATCH v11 5/7] x86/sgx: Hook arch_memory_failure() into mainline code Tony Luck
2021-10-26 22:00       ` [PATCH v11 6/7] x86/sgx: Add hook to error injection address validation Tony Luck
2021-10-26 22:00       ` [PATCH v11 7/7] x86/sgx: Add check for SGX pages to ghes_do_memory_failure() Tony Luck
2021-10-29 18:39         ` Rafael J. Wysocki
