From: Tony Luck <tony.luck@intel.com>
To: linux-kernel@vger.kernel.org
Cc: x86@kernel.org, Dave Hansen <dave.hansen@intel.com>,
	Jarkko Sakkinen <jarkko.sakkinen@intel.com>,
	Sean Christopherson <seanjc@google.com>,
	Tony Luck <tony.luck@intel.com>
Subject: [RFC PATCH 0/4] Machine check recovery for SGX
Date: Tue,  8 Jun 2021 14:40:34 -0700
Message-ID: <20210608214038.1026259-1-tony.luck@intel.com>

Early draft because there are people outside of Intel who want to
see how this is coming along, and this is the easiest way to share.

I wouldn't advise running this code on a production system as testing
has been very light.

SGX memory pages are allocated from special protected memory ranges
and do not have Linux "struct page" structures to manage them.

A recent architecture change results in new behavior for SGX enclaves
on a system when a recoverable local machine check occurs.
a) If the machine check is triggered by code executing outside of an
   enclave, then it can be handled as normal by the OS. Enclaves are
   not affected.
b) If the machine check is triggered by code in an enclave, then that
   enclave cannot be re-entered. But other enclaves on the system can
   continue to execute.

This means that "recovery" from an error in an active enclave page
will result in the termination of that enclave.

Memory controller patrol scrubbing may find errors in unused SGX pages.
Those can simply be removed from the free list so that they will not
be used.
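
As an illustration only (this is not the code in this series), the
free-page case could be sketched as below. The names toy_epc_page,
toy_free_list_add(), toy_free_list_remove_poisoned() and
toy_free_list_len(), the singly linked list, and the 4 KiB page size
are all assumptions made up for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL	/* assumed page size for this sketch */

/* Hypothetical stand-in for an EPC page descriptor, not a kernel struct. */
struct toy_epc_page {
	uint64_t paddr;			/* page-aligned physical address */
	bool poisoned;			/* set once an error is reported */
	struct toy_epc_page *next;	/* link in the toy free list */
};

/* Push a page onto the head of the free list. */
void toy_free_list_add(struct toy_epc_page **head, struct toy_epc_page *page)
{
	assert(page != NULL);
	page->next = *head;
	*head = page;
}

/*
 * If the page containing @addr is on the free list, unlink it and mark
 * it poisoned so it is never handed out again. Returns true when a
 * free page was found and removed, false otherwise.
 */
bool toy_free_list_remove_poisoned(struct toy_epc_page **head, uint64_t addr)
{
	uint64_t base = addr & ~(TOY_PAGE_SIZE - 1);
	struct toy_epc_page **link;

	for (link = head; *link; link = &(*link)->next) {
		struct toy_epc_page *page = *link;

		if (page->paddr == base) {
			*link = page->next;
			page->next = NULL;
			page->poisoned = true;
			return true;
		}
	}
	return false;
}

/* Count the pages still available for allocation. */
size_t toy_free_list_len(const struct toy_epc_page *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

A false return means the page is not free, so the caller would have to
fall through to the in-use cases.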

On bare metal there are two cases for "regular" SGX pages:
1) Error is found by patrol scrubber. Action is to remove the page
   from the enclave. If the page isn't accessed again, then the enclave
   can continue to execute. If the page is accessed again, it cannot be
   replaced, so the enclave will be terminated.
   This part of the code (and the free page part) was tested using
   /sys/devices/system/memory/hard_offline_page to call memory_failure().

2) Error triggers a machine check when enclave code accesses poison. In
   this case a SIGBUS is sent to the task that owns the enclave (just
   like the non-SGX case).
   This part of the code has been tested with EINJ error injection.

Poison in other types of SGX pages (e.g. SECS) isn't handled yet.

The virtualization case is just a shell. Linux doesn't know how the
guest is using each page. For now just SIGKILL the task (qemu?) that
owns the SGX pages.
This part of the code compiles, but has not been tested.
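
The cases above can be summarized as a small decision function. This
is only a sketch of the policy described in this letter, not the code
in the patches. The enum names and toy_pick_action() are hypothetical,
and treating a free page the same way regardless of how the error was
found is an assumption.

```c
#include <assert.h>

/* Hypothetical classification of an SGX page at the time of the error. */
enum toy_page_state {
	TOY_PAGE_FREE,		/* on the free list, not in use */
	TOY_PAGE_REGULAR,	/* "regular" enclave page on bare metal */
	TOY_PAGE_OTHER,		/* other page types, e.g. SECS */
	TOY_PAGE_VIRT,		/* handed to a guest, usage unknown to Linux */
};

/* How the error was discovered. */
enum toy_error_source {
	TOY_ERR_PATROL_SCRUB,	/* found by the memory controller scrubber */
	TOY_ERR_MACHINE_CHECK,	/* enclave code consumed the poison */
};

/* What the recovery path would do. */
enum toy_action {
	TOY_ACT_DROP_FROM_FREE_LIST,
	TOY_ACT_REMOVE_FROM_ENCLAVE,
	TOY_ACT_SIGBUS,
	TOY_ACT_SIGKILL,
	TOY_ACT_UNHANDLED,
};

enum toy_action toy_pick_action(enum toy_page_state state,
				enum toy_error_source src)
{
	switch (state) {
	case TOY_PAGE_FREE:
		/* Assumption: same action whatever the error source. */
		return TOY_ACT_DROP_FROM_FREE_LIST;
	case TOY_PAGE_REGULAR:
		/* Case 2 sends SIGBUS, case 1 removes the page. */
		return src == TOY_ERR_MACHINE_CHECK ?
			TOY_ACT_SIGBUS : TOY_ACT_REMOVE_FROM_ENCLAVE;
	case TOY_PAGE_VIRT:
		/* For now just kill the task that owns the pages. */
		return TOY_ACT_SIGKILL;
	case TOY_PAGE_OTHER:
		/* Poison in e.g. SECS pages isn't handled yet. */
		return TOY_ACT_UNHANDLED;
	}
	assert(0 && "unknown page state");
	return TOY_ACT_UNHANDLED;
}
```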

Tony Luck (4):
  x86/sgx: Track phase and type of SGX EPC pages
  x86/sgx: Add basic infrastructure to recover from errors in SGX memory
  x86/sgx: Hook sgx_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation

 .../firmware-guide/acpi/apei/einj.rst         |  19 +++
 arch/x86/include/asm/sgx.h                    |   6 +
 arch/x86/kernel/cpu/sgx/encl.c                |   4 +-
 arch/x86/kernel/cpu/sgx/ioctl.c               |   4 +-
 arch/x86/kernel/cpu/sgx/main.c                | 147 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h                 |  17 +-
 arch/x86/kernel/cpu/sgx/virt.c                |  11 +-
 drivers/acpi/apei/einj.c                      |   3 +-
 include/linux/mm.h                            |  15 ++
 mm/memory-failure.c                           |   4 +
 10 files changed, 219 insertions(+), 11 deletions(-)

-- 
2.29.2


