From: Paolo Bonzini <pbonzini@redhat.com>
To: Kai Huang <kai.huang@intel.com>, Borislav Petkov <bp@alien8.de>,
	Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org, x86@kernel.org, linux-sgx@vger.kernel.org,
	linux-kernel@vger.kernel.org, jarkko@kernel.org, luto@kernel.org,
	dave.hansen@intel.com, rick.p.edgecombe@intel.com,
	haitao.huang@intel.com, tglx@linutronix.de, mingo@redhat.com,
	hpa@zytor.com
Subject: Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
Date: Wed, 24 Mar 2021 11:09:20 +0100
Message-ID: <236c0aa9-92f2-97c8-ab11-d55b9a98c931@redhat.com>
In-Reply-To: <b35f66a10ecc07a1eecb829912d5664886ca169b.camel@intel.com>

On 24/03/21 10:38, Kai Huang wrote:
> Hi Sean, Boris, Paolo,
> 
> Thanks for the discussion. I tried to digest all your conversations and
> hopefully I have understood you correctly. I pasted the new patch here
> (not full patch, but relevant part only). I modified the error msg, added
> some writeup to Documentation/x86/sgx.rst, and put Sean's explanation of this
> bug to the commit msg (per Paolo). I am terrible Documentation writer, so
> please help to check and give comments. Thanks!

I have some phrasing suggestions below but that was actually pretty good.

> ---
> commit 1e297a535bcb4f51a08343c40207520017d85efe (HEAD)
> Author: Kai Huang <kai.huang@intel.com>
> Date:   Wed Jan 20 03:40:53 2021 +0200
> 
>      x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
>      
>      EREMOVE takes a page and removes any association between that page and
>      an enclave.  It must be run on a page before it can be added into
>      another enclave.  Currently, EREMOVE is run as part of pages being freed
>      into the SGX page allocator.  It is not expected to fail.
>      
>      KVM does not track how guest pages are used, which means that SGX
>      virtualization use of EREMOVE might fail.  Specifically, it is
>      legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to
>      KVM guest, because KVM/kernel doesn't track SECS pages.
>
>      Break out the EREMOVE call from the SGX page allocator.  This will allow
>      the SGX virtualization code to use the allocator directly.  (SGX/KVM
>      will also introduce a more permissive EREMOVE helper).

Ok, I think I got the source of my confusion.  The part in parentheses
is the key.  It was not clear that KVM can deal with EREMOVE failures
*without printing the error*.  Good!

>      Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be
>      more specific that it is used to free EPC page assigned host enclave.
>      Replace sgx_free_epc_page() with sgx_encl_free_epc_page() in all call
>      sites so there's no functional change.
>      
>      Improve error message when EREMOVE fails, and kernel is about to leak
>      EPC page, which is likely a kernel bug.  This is effectively a kernel
>      use-after-free of EPC, and due to the way SGX works, the bug is detected
>      at freeing.  Rather than add the page back to the pool of available EPC,
>      the kernel intentionally leaks the page to avoid additional errors in
>      the future.
>      
>      Also add documentation to explain to user what is the bug and suggest
>      user what to do when this bug happens, although extremely unlikely.

Rewritten:

EREMOVE takes a page and removes any association between that page and
an enclave.  It must be run on a page before it can be added into
another enclave.  Currently, EREMOVE is run as part of pages being freed
into the SGX page allocator.  It is not expected to fail, as a failure
would indicate a use-after-free of EPC.  Rather than add the page back
to the pool of available EPC, the kernel intentionally leaks the page to
avoid additional errors in the future.

However, KVM does not track how guest pages are used, which means that
the SGX virtualization use of EREMOVE might fail.  Specifically, it is
legitimate for EREMOVE to return SGX_CHILD_PRESENT for EPC assigned to
a KVM guest, because KVM/kernel doesn't track SECS pages.

To allow SGX/KVM to introduce a more permissive EREMOVE helper and to
let the SGX virtualization code use the allocator directly, break out
the EREMOVE call from the SGX page allocator.  Rename the original
sgx_free_epc_page() to sgx_encl_free_epc_page(), indicating that it is
used to free an EPC page assigned to a host enclave.  Replace
sgx_free_epc_page() with sgx_encl_free_epc_page() in all call sites so
there's no functional change.

At the same time, improve the error message when EREMOVE fails, and add
documentation to explain to the user what the bug is and what to do
when it happens, although that is extremely unlikely.
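
For illustration, here is a minimal user-space sketch of the split described
in the rewritten commit message. It is not the kernel code. Only the names
sgx_free_epc_page() and sgx_encl_free_epc_page() come from the patch; struct
epc_page, its fields, mock_eremove(), the helper virt_try_free_epc_page() and
the numeric value given to SGX_CHILD_PRESENT are assumptions made up for this
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Placeholder value for this sketch; see the kernel headers for the real one. */
#define SGX_CHILD_PRESENT 13

/* Hypothetical stand-in for an EPC page descriptor. */
struct epc_page {
	int eremove_result;	/* what the mocked EREMOVE will return */
	bool in_free_pool;	/* true once the page is back in the allocator */
};

/* Mock of EREMOVE: 0 means success, anything else is an error code. */
static int mock_eremove(struct epc_page *page)
{
	assert(page != NULL);
	return page->eremove_result;
}

/* Allocator-level free: no EREMOVE, just return the page to the pool. */
static void sgx_free_epc_page(struct epc_page *page)
{
	assert(page != NULL);
	page->in_free_pool = true;
}

/*
 * Host-enclave free: run EREMOVE first.  A failure is treated as a kernel
 * bug, so the page is reported and intentionally leaked instead of being
 * returned to the pool.
 */
static void sgx_encl_free_epc_page(struct epc_page *page)
{
	int ret = mock_eremove(page);

	if (ret) {
		fprintf(stderr,
			"EREMOVE returned %d and an EPC page was leaked\n", ret);
		return;
	}
	sgx_free_epc_page(page);
}

/*
 * Hypothetical permissive helper for the virtualization side: an EREMOVE
 * failure is returned to the caller without any message, and the page is
 * only handed back to the allocator once EREMOVE succeeds.
 */
static int virt_try_free_epc_page(struct epc_page *page)
{
	int ret = mock_eremove(page);

	if (ret)
		return ret;
	sgx_free_epc_page(page);
	return 0;
}
```

With this shape, the host path keeps the "warn and leak" behaviour, while a
caller of the permissive helper can see SGX_CHILD_PRESENT, deal with the
child pages, and retry later without anything being printed.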

> +Although extremely unlikely, EPC leaks can happen if kernel SGX bug happens,
> +when a WARNING with below message is shown in dmesg:

Remove "Although extremely unlikely".

> +"...EREMOVE returned ..., kernel bug likely.  EPC page leaked, SGX may become
> +unusable.  Please refer to Documentation/x86/sgx.rst for more information."
> +
> +This is effectively a kernel use-after-free of EPC, and due to the way SGX
> +works, the bug is detected at freeing. Rather than add the page back to the pool
> +of available EPC, the kernel intentionally leaks the page to avoid additional
> +errors in the future.
> +
> +When this happens, kernel will likely soon leak majority of EPC pages, and SGX
> +will likely become unusable. However while this may be fatal to SGX, other
> +kernel functionalities are unlikely to be impacted, and should continue to work.
> +
> +As a result, when this happens, user should stop running any new SGX workloads,
> +(or just any new workloads), and migrate all valuable workloads, for instance,
> +virtual machines, to other places.

Remove everything starting with "for instance".

> +Although a machine reboot can recover all
> +EPC, debugging and fixing this bug is appreciated.

Replace the second part with "the bug should be reported to the Linux developers".
The poor user is not expected to debug SGX. ;)

> +/* Error message for EREMOVE failure, when kernel is about to leak EPC page */
> +#define EREMOVE_ERROR_MESSAGE \
> +       "EREMOVE returned %d (0x%x), kernel bug likely.  EPC page leaked, SGX may become unusable.  Please refer to Documentation/x86/sgx.rst for more information."

Rewritten:

EREMOVE returned %d and an EPC page was leaked; SGX may become unusable.
This is a kernel bug, refer to Documentation/x86/sgx.rst for more information.

Also please split it across multiple lines.
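
One way to split the message, sketched in plain user-space C under the
assumption that adjacent string literals are simply concatenated in the macro.
The wording is the rewritten message above; the helper format_eremove_error()
is hypothetical and uses snprintf() only so the result can be inspected
outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Adjacent string literals are joined by the compiler, so the macro can
 * span several source lines while still expanding to a single format string.
 */
#define EREMOVE_ERROR_MESSAGE \
	"EREMOVE returned %d and an EPC page was leaked; " \
	"SGX may become unusable.  " \
	"This is a kernel bug, refer to Documentation/x86/sgx.rst " \
	"for more information."

/* Hypothetical helper: format the message into buf for a given error code. */
static int format_eremove_error(char *buf, size_t len, int ret)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, EREMOVE_ERROR_MESSAGE, ret);
}
```

The single %d matches the rewritten message, which drops the hexadecimal
form of the return value that the original macro printed.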

Paolo

