[RFC PATCH 0/4] Machine check recovery for SGX

From: Tony Luck <tony.luck@intel.com>
To: linux-kernel@vger.kernel.org
Cc: x86@kernel.org, Dave Hansen <dave.hansen@intel.com>,
	Jarkko Sakkinen <jarkko.sakkinen@intel.com>,
	Sean Christopherson <seanjc@google.com>,
	Tony Luck <tony.luck@intel.com>
Subject: [RFC PATCH 0/4] Machine check recovery for SGX
Date: Tue,  8 Jun 2021 14:40:34 -0700	[thread overview]
Message-ID: <20210608214038.1026259-1-tony.luck@intel.com> (raw)

Early draft because there are people outside of Intel that want to
see how this is coming along, and this is the easiest way to share.

I wouldn't advise running this code on a production system as testing
has been very light.

SGX memory pages are allocated from special protected memory ranges
and do not have Linux "struct page" structures to manage them.

A recent architecture change results in new behavior for SGX enclaves
on a system when a recoverable local machine check occurs.
a) If the machine check is triggered by code executing outside of an
   enclave, then it can be handled as normal by the OS. Enclaves are
   not affected
b) If the machine check is triggered by code in an enclave, then that
   enclave cannot be re-entered. But other enclaves on the system can
   continue to execute.

This means that "recovery" from an error in an active enclave page
will result in the termination of that enclave.

Memory controller patrol scrubbing may find errors in unused SGX pages.
Those can simply be removed from the free list so that they will not
be used.

On bare metal there are two cases for "regular" SGX pages:
1) Error is found by patrol scrubber. Action is to remove the page
   from the enclave. If the page isn't accessed again, then the enclave
   can continue to execute. If the page is accessed the page cannot be
   replaced, so the enclave will be terminated.
   This part of the code (and the free page part) tested using
   /sys/devices/system/memory/hard_offline_page to call memory_failure().

2) Error triggers a machine check when enclave code accesses poison. In
   this case a SIGBUS is sent to the task that owns the enclave (just
   like the non-SGX case).
   This part of the code has been tested with EINJ error injection.

Poison in other types of SGX pages (e.g. SECS) isn't handled yet.

The virtualization case is just a shell. Linux doesn't know how the
guest is using each page. For now just SIGKILL the task (qemu?) that
owns the SGX pages.
This part of the code compiles, but has not been tested.

Tony Luck (4):
  x86/sgx: Track phase and type of SGX EPC pages
  x86/sgx: Add basic infrastructure to recover from errors in SGX memory
  x86/sgx: Hook sgx_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation

 .../firmware-guide/acpi/apei/einj.rst         |  19 +++
 arch/x86/include/asm/sgx.h                    |   6 +
 arch/x86/kernel/cpu/sgx/encl.c                |   4 +-
 arch/x86/kernel/cpu/sgx/ioctl.c               |   4 +-
 arch/x86/kernel/cpu/sgx/main.c                | 147 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |  17 +-
 arch/x86/kernel/cpu/sgx/virt.c                |  11 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 include/linux/mm.h                            |  15 ++
 mm/memory-failure.c                           |   4 +
 10 files changed, 219 insertions(+), 11 deletions(-)

-- 
2.29.2