linux-kernel.vger.kernel.org archive mirror
From: Tony Luck <tony.luck@intel.com>
To: tony.luck@intel.com
Cc: Jarkko Sakkinen <jarkko@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, linux-sgx@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH 2/4] x86/sgx: Add basic infrastructure to recover from errors in SGX memory
Date: Thu,  8 Jul 2021 11:14:21 -0700
Message-ID: <20210708181423.1312359-3-tony.luck@intel.com>
In-Reply-To: <20210708181423.1312359-1-tony.luck@intel.com>

The x86 machine check architecture reports a physical address when
there is a memory error.

Add an end_phys_addr field to the sgx_epc_section structure and a new
function sgx_paddr_to_page() that searches all such structures to see
if a physical address is part of an SGX EPC page.
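
For illustration only, the lookup can be modelled in userspace roughly as
below. This is a sketch, not the kernel code: struct model_section,
MODEL_PAGE_SHIFT and model_paddr_to_page() are hypothetical names, and a
4 KiB page size is assumed. The point it shows is that end_phys_addr is
inclusive, so both range comparisons are strict.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical userspace stand-in for struct sgx_epc_section. */
struct model_section {
	uint64_t phys_addr;	/* first byte of the section */
	uint64_t end_phys_addr;	/* last byte, inclusive: phys_addr + size - 1 */
};

/*
 * Return the index of the section that contains paddr and store the
 * page index within that section in *page_idx. Return -1 if paddr is
 * not covered by any of the n sections.
 */
static int model_paddr_to_page(const struct model_section *s, size_t n,
			       uint64_t paddr, uint64_t *page_idx)
{
	for (size_t i = 0; i < n; i++) {
		if (paddr < s[i].phys_addr || paddr > s[i].end_phys_addr)
			continue;

		*page_idx = (paddr - s[i].phys_addr) >> MODEL_PAGE_SHIFT;
		return (int)i;
	}

	return -1;
}
```

In this model, sgx_is_epc_page() corresponds to checking that the return
value is not -1.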

The ACPI EINJ module has a sanity check that the injection address is
valid. Add and export a function sgx_is_epc_page() so that the EINJ
check can be changed to allow injection to SGX EPC pages.

Provide a recovery function sgx_memory_failure(). If the poison was
consumed synchronously inside the enclave, send a SIGBUS, just the
same as for poison consumption outside of an enclave.
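
From the application side, a process hosting an enclave could recognize
this signal along the lines sketched below. This is a minimal example
under stated assumptions, not part of the patch: sigbus_mce_kind(),
sigbus_handler() and install_sigbus_handler() are hypothetical helper
names, and exiting is only one possible policy. It relies on the standard
sigaction(2) interface with SA_SIGINFO and the BUS_MCEERR_AR and
BUS_MCEERR_AO si_code values.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Classify a SIGBUS: 1 = action required, 2 = action optional, 0 = other. */
static int sigbus_mce_kind(const siginfo_t *si)
{
	if (si->si_signo != SIGBUS)
		return 0;
	if (si->si_code == BUS_MCEERR_AR)
		return 1;
	if (si->si_code == BUS_MCEERR_AO)
		return 2;
	return 0;
}

static void sigbus_handler(int sig, siginfo_t *si, void *ctx)
{
	static const char msg[] = "SIGBUS: memory error consumed, exiting\n";

	(void)sig;
	(void)ctx;

	if (sigbus_mce_kind(si) == 1) {
		/*
		 * si->si_addr holds the reported address and
		 * 1 << si->si_addr_lsb the affected granularity.
		 * Only async-signal-safe calls are used here.
		 */
		if (write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0) {
			/* nothing useful can be done about a failed write */
		}
		_exit(1);
	}

	/* Any other SIGBUS: fall back to the default action. */
	signal(SIGBUS, SIG_DFL);
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);

	return sigaction(SIGBUS, &sa, NULL);
}
```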

For asynchronously reported errors, the easiest case is when the page
is currently free. Just drop the page from the free list so that it will
never be allocated.
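
The free-page case can be pictured with the userspace model below. It is
a sketch under assumptions, not the kernel implementation: the model_*
names and flag values are hypothetical, locking is omitted, and a plain
circular doubly linked list stands in for the per-node free list. A
poisoned free page is unlinked and the free count is decremented, so the
allocator can never hand it out.

```c
#include <stddef.h>

#define MODEL_PAGE_FREE   0x1	/* hypothetical flag values */
#define MODEL_PAGE_POISON 0x2

struct model_page {
	unsigned int flags;
	struct model_page *prev, *next;
};

struct model_free_list {
	struct model_page head;	/* sentinel of a circular list */
	unsigned long nr_free;
};

static void model_list_init(struct model_free_list *fl)
{
	fl->head.next = fl->head.prev = &fl->head;
	fl->head.flags = 0;
	fl->nr_free = 0;
}

static void model_unlink(struct model_page *p)
{
	p->prev->next = p->next;
	p->next->prev = p->prev;
	p->next = p->prev = NULL;
}

/* Put a page at the tail of the free list. */
static void model_free_page(struct model_free_list *fl, struct model_page *p)
{
	p->flags |= MODEL_PAGE_FREE;
	p->prev = fl->head.prev;
	p->next = &fl->head;
	fl->head.prev->next = p;
	fl->head.prev = p;
	fl->nr_free++;
}

/* Take the first free page, or return NULL if the list is empty. */
static struct model_page *model_alloc_page(struct model_free_list *fl)
{
	struct model_page *p = fl->head.next;

	if (p == &fl->head)
		return NULL;
	model_unlink(p);
	p->flags &= ~MODEL_PAGE_FREE;
	fl->nr_free--;
	return p;
}

/* Mark a page poisoned; return 1 if it was dropped from the free list. */
static int model_poison_page(struct model_free_list *fl, struct model_page *p)
{
	unsigned int old = p->flags;

	p->flags |= MODEL_PAGE_POISON;
	if (old & MODEL_PAGE_POISON)
		return 0;	/* already handled earlier */
	if (!(old & MODEL_PAGE_FREE))
		return 0;	/* page is in use: needs other recovery */
	model_unlink(p);
	fl->nr_free--;
	return 1;
}
```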

An SGX_PAGE_TYPE_REG page can just be unmapped from the enclave. If the
enclave doesn't ever touch that page again, all is well. If it does touch
the page, it will die. A possible future enhancement is to mark the
unmapped PTE so that such an access causes a SIGBUS.

SGX_PAGE_TYPE_KVM pages may be recoverable depending on how they are
being used by guests. Fixing that properly requires injecting the machine
check into the guest. For now, just kill the guest.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 126 +++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h  |   3 +-
 arch/x86/kernel/cpu/sgx/virt.c |   9 +++
 3 files changed, 137 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 643df87b3e01..4a354f89373e 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -664,6 +664,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	}
 
 	section->phys_addr = phys_addr;
+	section->end_phys_addr = phys_addr + size - 1;
 
 	for (i = 0; i < nr_pages; i++) {
 		section->pages[i].section = index;
@@ -676,6 +677,131 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 	return true;
 }
 
+static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
+{
+	struct sgx_epc_section *section;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
+		section = &sgx_epc_sections[i];
+
+		if (paddr < section->phys_addr || paddr > section->end_phys_addr)
+			continue;
+
+		return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
+	}
+
+	return NULL;
+}
+
+bool sgx_is_epc_page(u64 paddr)
+{
+	return !!sgx_paddr_to_page(paddr);
+}
+EXPORT_SYMBOL_GPL(sgx_is_epc_page);
+
+void __attribute__((weak)) sgx_memory_failure_vepc(struct sgx_epc_page *epc_page)
+{
+}
+
+/*
+ * This can be called in three ways:
+ * a) When an enclave has just consumed poison. In this case
+ *    calling process context is the owner of the enclave.
+ * b) For some asynchronous poison notification (e.g. patrol
+ *    scrubber or speculative execution saw the poison.)
+ *    In this case calling context is a kernel thread.
+ * c) Someone asked to take a page offline using the
+ *    /sys/devices/system/memory/{hard,soft}_offline_page.
+ *    In this case caller is the process writing these files.
+ */
+int sgx_memory_failure(unsigned long pfn, int flags)
+{
+	struct sgx_epc_page *epc_page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
+	struct sgx_epc_section *section;
+	struct sgx_encl_page *encl_page;
+	struct sgx_numa_node *node;
+	unsigned long addr;
+	u16 page_flags;
+
+	if (!epc_page)
+		return -ENXIO;
+
+	spin_lock(&sgx_reclaimer_lock);
+
+	page_flags = epc_page->flags;
+	epc_page->flags |= SGX_EPC_PAGE_POISON;
+
+	/*
+	 * Poison was synchronously consumed by an enclave in the current
+	 * task. Send a SIGBUS here. Hardware should prevent this enclave
+	 * from being re-entered, so no concern that the poisoned page
+	 * will be touched a second time. The poisoned EPC page will be
+	 * dropped as pages are freed during task exit.
+	 */
+	if (flags & MF_ACTION_REQUIRED) {
+		if (epc_page->type == SGX_PAGE_TYPE_REG) {
+			encl_page = epc_page->owner;
+			addr = encl_page->desc & PAGE_MASK;
+			force_sig_mceerr(BUS_MCEERR_AR, (void *)addr, PAGE_SHIFT);
+		} else {
+			force_sig(SIGBUS);
+		}
+		goto unlock;
+	}
+
+	section = &sgx_epc_sections[epc_page->section];
+	node = section->node;
+
+	if (page_flags & SGX_EPC_PAGE_POISON)
+		goto unlock;
+
+	if (page_flags & SGX_EPC_PAGE_DIRTY) {
+		list_del(&epc_page->list);
+	} else if (page_flags & SGX_EPC_PAGE_FREE) {
+		spin_lock(&node->lock);
+		epc_page->owner = NULL;
+		list_del(&epc_page->list);
+		sgx_nr_free_pages--;
+		spin_unlock(&node->lock);
+		goto unlock;
+	}
+
+	switch (epc_page->type) {
+	case SGX_PAGE_TYPE_REG:
+		encl_page = epc_page->owner;
+		/* Unmap the page, unless it was already in progress to be freed */
+		if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
+			spin_unlock(&sgx_reclaimer_lock);
+			sgx_reclaimer_block(epc_page);
+			kref_put(&encl_page->encl->refcount, sgx_encl_release);
+			goto done;
+		}
+		goto unlock;
+
+	case SGX_PAGE_TYPE_KVM:
+		spin_unlock(&sgx_reclaimer_lock);
+		sgx_memory_failure_vepc(epc_page);
+		break;
+
+	default:
+		/*
+		 * I don't expect SECS, TCS, VA pages would result
+		 * in recoverable machine checks. If it turns out
+		 * that they do, then add cases for recovery for
+		 * each of them.
+		 */
+		panic("Poisoned active SGX page type %d\n", epc_page->type);
+		break;
+	}
+	goto done;
+
+unlock:
+	spin_unlock(&sgx_reclaimer_lock);
+done:
+	return 0;
+}
+
 /**
  * A section metric is concatenated in a way that @low bits 12-31 define the
  * bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index e43d3c27eb96..7701f5f88b61 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -39,7 +39,7 @@ struct sgx_epc_page {
 	unsigned int section;
 	u16 flags;
 	u16 type;
-	struct sgx_encl_page *owner;
+	void *owner;
 	struct list_head list;
 };
 
@@ -60,6 +60,7 @@ struct sgx_numa_node {
  */
 struct sgx_epc_section {
 	unsigned long phys_addr;
+	unsigned long end_phys_addr;
 	void *virt_addr;
 	struct sgx_epc_page *pages;
 	struct sgx_numa_node *node;
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 044dd92ebd63..08918166394f 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -21,6 +21,7 @@
 struct sgx_vepc {
 	struct xarray page_array;
 	struct mutex lock;
+	struct task_struct *task;
 };
 
 /*
@@ -218,6 +219,13 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
+void sgx_memory_failure_vepc(struct sgx_epc_page *epc_page)
+{
+	struct sgx_vepc *vepc = epc_page->owner;
+
+	send_sig(SIGBUS, vepc->task, false);
+}
+
 static int sgx_vepc_open(struct inode *inode, struct file *file)
 {
 	struct sgx_vepc *vepc;
@@ -227,6 +235,7 @@ static int sgx_vepc_open(struct inode *inode, struct file *file)
 		return -ENOMEM;
 	mutex_init(&vepc->lock);
 	xa_init(&vepc->page_array);
+	vepc->task = current;
 
 	file->private_data = vepc;
 
-- 
2.29.2

