* [PATCH v4 0/3] x86/sgx: fine grained SGX MCA behavior
@ 2022-06-08  3:26 Zhiquan Li
  2022-06-08  3:26 ` [PATCH v4 1/3] x86/sgx: Repurpose the owner field as the virtual address of virtual EPC page Zhiquan Li
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Zhiquan Li @ 2022-06-08  3:26 UTC (permalink / raw)
  To: linux-sgx, tony.luck, jarkko, dave.hansen
  Cc: seanjc, kai.huang, fan.du, cathy.zhang, zhiquan1.li

V3: https://lore.kernel.org/linux-sgx/41704e5d4c03b49fcda12e695595211d950cfb08.camel@kernel.org/T/#t

Changes since V3:
- Take the definition of EPC page flag SGX_EPC_PAGE_KVM_GUEST from
  Cathy's third patch of the SGX rebootless recovery patch set, but
  discard the irrelevant portion, since that patch set might need more
  time to be reworked and these are two different features.
  Link: https://lore.kernel.org/linux-sgx/41704e5d4c03b49fcda12e695595211d950cfb08.camel@kernel.org/T/#m9782d23496cacecb7da07a67daa79f4b322ae170

V2: https://lore.kernel.org/linux-sgx/694234d7-6a0d-e85f-f2f9-e52b4a61e1ec@intel.com/T/#t

Changes since V2:
- Repurpose the owner field as the virtual address of virtual EPC page
- Remove struct sgx_vepc_page and relevant code.
- Remove patch 01 as the changes are not necessary in new design.
- Rework patch 02 as suggested by Jarkko.
- Adapt patch 03 and 04 since struct sgx_vepc_page was discarded.
- Replace EPC page flag SGX_EPC_PAGE_IS_VEPC with
  SGX_EPC_PAGE_KVM_GUEST as they are duplicates.
  Link: https://lore.kernel.org/linux-sgx/eb95b32ecf3d44a695610cf7f2816785@intel.com/T/#u

V1: https://lore.kernel.org/linux-sgx/443cb425-009c-2784-56f4-5e707122de76@intel.com/T/#t

Changes since V1:
- Updated cover letter and commit messages, added valuable
  information from Jarkko, Tony and Kai’s comments.
- Added documentation for struct sgx_vepc and
  struct sgx_vepc_page.

Hi everyone,

This series contains a few patches that make SGX MCA behavior finer
grained.

When a VM guest accesses an SGX EPC page with a memory failure, the
current behavior kills the whole guest, whereas the expected behavior
is to kill only the SGX application inside it.

To fix this, we send SIGBUS with code BUS_MCEERR_AR together with
extra information that lets the hypervisor inject #MC into the guest,
which is helpful in the SGX virtualization case.

The rest happens on the guest side. A hypervisor such as Qemu already
has a mature facility to convert an HVA to a GPA and inject #MC into
the guest OS.

We then extend the solution to the normal SGX case, so that the task
has the opportunity to make a further decision when an EPC page has a
memory failure.
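
For illustration, a userspace receiver of such a signal might decode it
roughly as follows. This is only a sketch and not part of the series:
struct mce_report and decode_mce_sigbus() are hypothetical names, while
si_code, si_addr and si_addr_lsb are the standard Linux siginfo_t
fields that a SIGBUS handler installed with SA_SIGINFO can read.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a memory-failure SIGBUS; not a kernel or libc type. */
struct mce_report {
	uintptr_t page_start;	/* first byte of the poisoned range */
	size_t length;		/* size of the poisoned range in bytes */
};

/*
 * Return true and fill *out if the signal is an "action required"
 * machine check; return false for any other signal or SIGBUS code.
 */
static bool decode_mce_sigbus(const siginfo_t *si, struct mce_report *out)
{
	assert(si != NULL && out != NULL);

	if (si->si_signo != SIGBUS || si->si_code != BUS_MCEERR_AR)
		return false;

	/* si_addr_lsb is the log2 of the affected range (PAGE_SHIFT here). */
	size_t length = (size_t)1 << si->si_addr_lsb;

	out->length = length;
	out->page_start = (uintptr_t)si->si_addr & ~(uintptr_t)(length - 1);
	return true;
}
```

A handler registered through sigaction() with SA_SIGINFO could call
this helper on its siginfo_t argument and then decide whether to tear
down only the affected enclave or to exit.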

However, when a page triggers a machine check, only the PFN is
reported, whereas the virtual address is required for the hypervisor
to inject #MC into the guest. So repurpose the “owner” field as the
virtual address of the virtual EPC page so that arch_memory_failure()
can easily retrieve it.

Add a new EPC page flag, SGX_EPC_PAGE_KVM_GUEST, to indicate how the
field should be interpreted.

Suppose an enclave is shared by multiple processes. When an enclave
page triggers a machine check, the enclave is disabled so that it
cannot be entered again. Killing the other processes that have the same
enclave mapped would perhaps be overkill, but they are going to find
that the enclave is "dead" the next time they try to use it. Thanks to
Jarkko for the heads-up and to Tony for the clarification on this point.

Our intention is to provide additional info so that the application has
more choices. The current behavior is gentle, and we don’t want to
change it.

If you expect the other processes to be informed in such a case, then
you are looking for an MCA “early kill” feature, which is worth a
separate patch set.

Unlike host enclaves, a virtual EPC instance cannot be shared by
multiple VMs, because how enclaves are created is entirely up to the
guest. Sharing a virtual EPC instance would very likely break enclaves
in all VMs unexpectedly.

The SGX virtual EPC driver doesn't explicitly prevent a virtual EPC
instance from being shared by multiple VMs via fork(). However, KVM
doesn't support running a VM across multiple mm structures, and the de
facto userspace hypervisor (Qemu) doesn't use fork() to create a new
VM, so in practice this should not happen.

This series is based on tip/x86/sgx.

Tests:
1. MCE injection test for SGX in VM.
   As we expected, the application was killed and VM was alive.
2. MCE injection test for SGX on host.
   As we expected, the application received SIGBUS with extra info.
3. Kernel selftest/sgx: PASS
4. Internal SGX stress test: PASS
5. kmemleak test: No memory leakage detected.

Your feedback is much appreciated.

Best Regards,
Zhiquan

Zhiquan Li (3):
  x86/sgx: Repurpose the owner field as the virtual address of virtual
    EPC page
  x86/sgx: Fine grained SGX MCA behavior for virtualization
  x86/sgx: Fine grained SGX MCA behavior for normal case

 arch/x86/kernel/cpu/sgx/main.c | 27 +++++++++++++++++++++++++--
 arch/x86/kernel/cpu/sgx/sgx.h  |  2 ++
 arch/x86/kernel/cpu/sgx/virt.c |  4 +++-
 3 files changed, 30 insertions(+), 3 deletions(-)

-- 
2.25.1



* [PATCH v4 1/3] x86/sgx: Repurpose the owner field as the virtual address of virtual EPC page
  2022-06-08  3:26 [PATCH v4 0/3] x86/sgx: fine grained SGX MCA behavior Zhiquan Li
@ 2022-06-08  3:26 ` Zhiquan Li
  2022-06-08  3:45   ` Zhiquan Li
  2022-06-08  3:54   ` Kai Huang
  2022-06-08  3:26 ` [PATCH v4 2/3] x86/sgx: Fine grained SGX MCA behavior for virtualization Zhiquan Li
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 12+ messages in thread
From: Zhiquan Li @ 2022-06-08  3:26 UTC (permalink / raw)
  To: linux-sgx, tony.luck, jarkko, dave.hansen
  Cc: seanjc, kai.huang, fan.du, cathy.zhang, zhiquan1.li

When a page triggers a machine check, only the physical address of the
EPC page is reported. However, the virtual address is required for the
hypervisor to inject #MC into the guest. So repurpose the "owner" field
as the virtual address of the virtual EPC page so that
arch_memory_failure() can easily retrieve it.

Add a new EPC page flag, SGX_EPC_PAGE_KVM_GUEST, to indicate how the
field should be interpreted.

Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
---
Changes since V3:
- Take the definition of EPC page flag SGX_EPC_PAGE_KVM_GUEST from
  Cathy's third patch of the SGX rebootless recovery patch set, but
  discard the irrelevant portion, since that patch set might need more
  time to be reworked and these are two different features.
  Link: https://lore.kernel.org/linux-sgx/41704e5d4c03b49fcda12e695595211d950cfb08.camel@kernel.org/T/#m9782d23496cacecb7da07a67daa79f4b322ae170

Changes since V2:
- Rework the patch as suggested by Jarkko.
- Remove struct sgx_vepc_page and relevant code.
- Remove the definition of the new EPC page flag SGX_EPC_PAGE_IS_VEPC
  as it duplicates SGX_EPC_PAGE_KVM_GUEST.
  Link: https://lore.kernel.org/linux-sgx/eb95b32ecf3d44a695610cf7f2816785@intel.com/T/#u

Changes since V1:
- Add documentation suggested by Jarkko.
---
 arch/x86/kernel/cpu/sgx/sgx.h  | 2 ++
 arch/x86/kernel/cpu/sgx/virt.c | 4 +++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 0f17def9fe6f..b43582da1bcf 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -28,6 +28,8 @@
 
 /* Pages on free list */
 #define SGX_EPC_PAGE_IS_FREE		BIT(1)
+/* Pages allocated for KVM guest */
+#define SGX_EPC_PAGE_KVM_GUEST		BIT(2)
 
 struct sgx_epc_page {
 	unsigned int section;
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 6a77a14eee38..776ae5c1c032 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -46,10 +46,12 @@ static int __sgx_vepc_fault(struct sgx_vepc *vepc,
 	if (epc_page)
 		return 0;
 
-	epc_page = sgx_alloc_epc_page(vepc, false);
+	epc_page = sgx_alloc_epc_page((void *)addr, false);
 	if (IS_ERR(epc_page))
 		return PTR_ERR(epc_page);
 
+	epc_page->flags |= SGX_EPC_PAGE_KVM_GUEST;
+
 	ret = xa_err(xa_store(&vepc->page_array, index, epc_page, GFP_KERNEL));
 	if (ret)
 		goto err_free;
-- 
2.25.1



* [PATCH v4 2/3] x86/sgx: Fine grained SGX MCA behavior for virtualization
  2022-06-08  3:26 [PATCH v4 0/3] x86/sgx: fine grained SGX MCA behavior Zhiquan Li
  2022-06-08  3:26 ` [PATCH v4 1/3] x86/sgx: Repurpose the owner field as the virtual address of virtual EPC page Zhiquan Li
@ 2022-06-08  3:26 ` Zhiquan Li
  2022-06-08  3:52   ` Kai Huang
  2022-06-08  3:26 ` [PATCH v4 3/3] x86/sgx: Fine grained SGX MCA behavior for normal case Zhiquan Li
  2022-06-08  8:10 ` [PATCH v4 0/3] x86/sgx: fine grained SGX MCA behavior Jarkko Sakkinen
  3 siblings, 1 reply; 12+ messages in thread
From: Zhiquan Li @ 2022-06-08  3:26 UTC (permalink / raw)
  To: linux-sgx, tony.luck, jarkko, dave.hansen
  Cc: seanjc, kai.huang, fan.du, cathy.zhang, zhiquan1.li

When a VM guest accesses an SGX EPC page with a memory failure, the
current behavior kills the whole guest, whereas the expected behavior
is to kill only the SGX application inside it.

To fix this, we send SIGBUS with code BUS_MCEERR_AR together with
extra information that lets the hypervisor inject #MC into the guest,
which is helpful in the SGX case.

The rest happens on the guest side. A hypervisor such as Qemu already
has a mature facility to convert an HVA to a GPA and inject #MC into
the guest OS.
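
As a rough illustration of the HVA to GPA step, the sketch below walks
a table of memory slots and translates a host virtual address into a
guest physical address. It is an assumption-laden model and not Qemu's
actual code: struct model_memslot and model_hva_to_gpa() are
hypothetical names, and a real hypervisor keeps this mapping in its own
data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory slot: a guest-physical range backed by host memory. */
struct model_memslot {
	uint64_t hva_start;	/* host virtual address of the first byte */
	uint64_t gpa_start;	/* guest physical address of the first byte */
	uint64_t size;		/* length of the slot in bytes */
};

/*
 * Translate hva into a guest physical address. Return false if no
 * slot covers the address, in which case *gpa is left untouched.
 */
static bool model_hva_to_gpa(const struct model_memslot *slots, size_t nslots,
			     uint64_t hva, uint64_t *gpa)
{
	assert(gpa != NULL);

	for (size_t i = 0; i < nslots; i++) {
		const struct model_memslot *s = &slots[i];

		if (hva >= s->hva_start && hva - s->hva_start < s->size) {
			*gpa = s->gpa_start + (hva - s->hva_start);
			return true;
		}
	}
	return false;
}
```

In this model the address taken from the SIGBUS siginfo would be passed
as hva, and the resulting GPA is what the hypervisor would report to
the guest in the injected #MC.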

Unlike host enclaves, a virtual EPC instance cannot be shared by
multiple VMs, because how enclaves are created is entirely up to the
guest. Sharing a virtual EPC instance would very likely break enclaves
in all VMs unexpectedly.

The SGX virtual EPC driver doesn't explicitly prevent a virtual EPC
instance from being shared by multiple VMs via fork(). However, KVM
doesn't support running a VM across multiple mm structures, and the de
facto userspace hypervisor (Qemu) doesn't use fork() to create a new
VM, so in practice this should not happen.

Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/linux-sgx/443cb425-009c-2784-56f4-5e707122de76@intel.com/T/#m1d1f4098f4fad78034e8706a60e4d79c119db407
---
No changes since V3.

Changes since V2:
- Retrieve virtual address from "owner" field of struct sgx_epc_page,
  instead of struct sgx_vepc_page.
- Replace EPC page flag SGX_EPC_PAGE_IS_VEPC with
  SGX_EPC_PAGE_KVM_GUEST as they are duplicated.

Changes since V1:
- Add Acked-by from Kai Huang.
- Add Kai’s excellent explanation of why we need not consider one
  virtual EPC being shared by two guests.
---
 arch/x86/kernel/cpu/sgx/main.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index ab4ec54bbdd9..faca7f73b06d 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -715,6 +715,8 @@ int arch_memory_failure(unsigned long pfn, int flags)
 	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
 	struct sgx_epc_section *section;
 	struct sgx_numa_node *node;
+	int ret = 0;
+	unsigned long vaddr;
 
 	/*
 	 * mm/memory-failure.c calls this routine for all errors
@@ -731,8 +733,26 @@ int arch_memory_failure(unsigned long pfn, int flags)
 	 * error. The signal may help the task understand why the
 	 * enclave is broken.
 	 */
-	if (flags & MF_ACTION_REQUIRED)
-		force_sig(SIGBUS);
+	if (flags & MF_ACTION_REQUIRED) {
+		/*
+		 * Provide extra info to the task so that it can make further
+		 * decision but not simply kill it. This is quite useful for
+		 * virtualization case.
+		 */
+		if (page->flags & SGX_EPC_PAGE_KVM_GUEST) {
+			/*
+			 * The "owner" field is repurposed as the virtual address
+			 * of virtual EPC page.
+			 */
+			vaddr = (unsigned long)page->owner & PAGE_MASK;
+			ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)vaddr,
+					PAGE_SHIFT);
+			if (ret < 0)
+				pr_err("Memory failure: Error sending signal to %s:%d: %d\n",
+					current->comm, current->pid, ret);
+		} else
+			force_sig(SIGBUS);
+	}
 
 	section = &sgx_epc_sections[page->section];
 	node = section->node;
-- 
2.25.1



* [PATCH v4 3/3] x86/sgx: Fine grained SGX MCA behavior for normal case
  2022-06-08  3:26 [PATCH v4 0/3] x86/sgx: fine grained SGX MCA behavior Zhiquan Li
  2022-06-08  3:26 ` [PATCH v4 1/3] x86/sgx: Repurpose the owner field as the virtual address of virtual EPC page Zhiquan Li
  2022-06-08  3:26 ` [PATCH v4 2/3] x86/sgx: Fine grained SGX MCA behavior for virtualization Zhiquan Li
@ 2022-06-08  3:26 ` Zhiquan Li
  2022-06-08  8:10 ` [PATCH v4 0/3] x86/sgx: fine grained SGX MCA behavior Jarkko Sakkinen
  3 siblings, 0 replies; 12+ messages in thread
From: Zhiquan Li @ 2022-06-08  3:26 UTC (permalink / raw)
  To: linux-sgx, tony.luck, jarkko, dave.hansen
  Cc: seanjc, kai.huang, fan.du, cathy.zhang, zhiquan1.li

When an application accesses an SGX EPC page with a memory failure, the
task receives a SIGBUS signal without any extra info, unless the EPC
page has the SGX_EPC_PAGE_KVM_GUEST flag. However, in some cases SGX is
used only in a sub-task, and we don't want the entire task group to be
killed because an SGX EPC page belonging to that sub-task has a memory
failure.

To fix this, we extend the solution to the normal case. That is, a
regular SGX EPC page with a memory failure triggers a SIGBUS signal
with code BUS_MCEERR_AR and additional info, so that the user has the
opportunity to make a further decision.
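
The way the flag selects the meaning of "owner" can be modelled in
userspace as below. This is a simplified sketch under stated
assumptions, not the kernel code: the model_* types and MODEL_* macros
are hypothetical stand-ins, a 4 KiB page size is assumed, and only the
fields needed for the address lookup are kept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT	12
#define MODEL_PAGE_SIZE		(1UL << MODEL_PAGE_SHIFT)
#define MODEL_PAGE_MASK		(~(MODEL_PAGE_SIZE - 1))

/* Mirrors BIT(2) used for SGX_EPC_PAGE_KVM_GUEST in patch 1/3. */
#define MODEL_EPC_PAGE_KVM_GUEST	(1U << 2)

/* Stand-in for the enclave page descriptor: address plus low state bits. */
struct model_encl_page {
	unsigned long desc;
};

/* Stand-in for struct sgx_epc_page. */
struct model_epc_page {
	unsigned int flags;
	void *owner;	/* raw address or descriptor pointer, per flags */
};

/* Pick the page-aligned user address to report for a failed EPC page. */
static unsigned long model_fault_vaddr(const struct model_epc_page *page)
{
	assert(page != NULL && page->owner != NULL);

	/* Guest page: "owner" itself holds the virtual address. */
	if (page->flags & MODEL_EPC_PAGE_KVM_GUEST)
		return (unsigned long)(uintptr_t)page->owner & MODEL_PAGE_MASK;

	/* Regular page: "owner" points at a descriptor holding the address. */
	return ((const struct model_encl_page *)page->owner)->desc &
	       MODEL_PAGE_MASK;
}
```

Masking with the page mask drops the offset in the guest case and the
low-order state bits in the regular case, so both branches yield a
page-aligned address.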

Suppose an enclave is shared by multiple processes. When an enclave
page triggers a machine check, the enclave is disabled so that it
cannot be entered again. Killing the other processes that have the same
enclave mapped would perhaps be overkill, but they are going to find
that the enclave is "dead" the next time they try to use it. Thanks to
Jarkko for the heads-up and to Tony for the clarification on this point.

Our intention is to provide additional info so that the application has
more choices. The current behavior is gentle, and we don't want to
change it.

Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
---
No changes since V3.

Changes since V2:
- Adapted the code since struct sgx_vepc_page was discarded.
- Replace EPC page flag SGX_EPC_PAGE_IS_VEPC with
  SGX_EPC_PAGE_KVM_GUEST as they are duplicated.

Changes since V1:
- Add valuable information from Jarkko and Tony into the commit
  message.
---
 arch/x86/kernel/cpu/sgx/main.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index faca7f73b06d..69a2a29c8957 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -739,12 +739,15 @@ int arch_memory_failure(unsigned long pfn, int flags)
 		 * decision but not simply kill it. This is quite useful for
 		 * virtualization case.
 		 */
-		if (page->flags & SGX_EPC_PAGE_KVM_GUEST) {
+		if (page->owner) {
 			/*
 			 * The "owner" field is repurposed as the virtual address
 			 * of virtual EPC page.
 			 */
-			vaddr = (unsigned long)page->owner & PAGE_MASK;
+			if (page->flags & SGX_EPC_PAGE_KVM_GUEST)
+				vaddr = (unsigned long)page->owner & PAGE_MASK;
+			else
+				vaddr = (unsigned long)page->owner->desc & PAGE_MASK;
 			ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)vaddr,
 					PAGE_SHIFT);
 			if (ret < 0)
-- 
2.25.1



* Re: [PATCH v4 1/3] x86/sgx: Repurpose the owner field as the virtual address of virtual EPC page
  2022-06-08  3:26 ` [PATCH v4 1/3] x86/sgx: Repurpose the owner field as the virtual address of virtual EPC page Zhiquan Li
@ 2022-06-08  3:45   ` Zhiquan Li
  2022-06-08  3:54   ` Kai Huang
  1 sibling, 0 replies; 12+ messages in thread
From: Zhiquan Li @ 2022-06-08  3:45 UTC (permalink / raw)
  To: linux-sgx, Zhang, Cathy
  Cc: seanjc, kai.huang, fan.du, dave.hansen, jarkko, tony.luck

On 2022/6/8 11:26, Zhiquan Li wrote:
> When a page triggers a machine check, it only reports the
> physical address of EPC page. But in order to inject #MC into
> hypervisor, the virtual address is required. Then repurpose the
> "owner" field as the virtual address of the virtual EPC page so that
> arch_memory_failure() can easily retrieve it.
> 
> Add a new EPC page flag - SGX_EPC_PAGE_KVM_GUEST to interpret the
> meaning of the field.
> 
> Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>

Hi Cathy,

I forgot to add your signature here.

The flag SGX_EPC_PAGE_KVM_GUEST is taken from your patch:
https://lore.kernel.org/linux-sgx/YoveWpEsH6Hghc5Y@kernel.org/T/#u

Can I add "Co-developed-by" as well as "Signed-off-by" for you?

Best Regards,
Zhiquan

> ---
> Changes since V3:
> - Move the definition of EPC page flag SGX_EPC_PAGE_KVM_GUEST from
>   Cathy's third patch of SGX rebootless recovery patch set but discard
>   irrelevant portion, since it might need more time to re-forge and
>   these are two different features.
>   Link: https://lore.kernel.org/linux-sgx/41704e5d4c03b49fcda12e695595211d950cfb08.camel@kernel.org/T/#m9782d23496cacecb7da07a67daa79f4b322ae170
> 
> Changes since V2:
> - Rework the patch suggested by Jarkko.
> - Remove struct sgx_vepc_page and relevant code.
> - Remove new EPC page flag SGX_EPC_PAGE_IS_VEPC definition as it is
>   duplicated to SGX_EPC_PAGE_KVM_GUEST.
>   Link: https://lore.kernel.org/linux-sgx/eb95b32ecf3d44a695610cf7f2816785@intel.com/T/#u
> 
> Changes since V1:
> - Add documentation suggested by Jarkko.



* Re: [PATCH v4 2/3] x86/sgx: Fine grained SGX MCA behavior for virtualization
  2022-06-08  3:26 ` [PATCH v4 2/3] x86/sgx: Fine grained SGX MCA behavior for virtualization Zhiquan Li
@ 2022-06-08  3:52   ` Kai Huang
  2022-06-08  8:13     ` Jarkko Sakkinen
  0 siblings, 1 reply; 12+ messages in thread
From: Kai Huang @ 2022-06-08  3:52 UTC (permalink / raw)
  To: Zhiquan Li, linux-sgx, tony.luck, jarkko, dave.hansen
  Cc: seanjc, fan.du, cathy.zhang

On Wed, 2022-06-08 at 11:26 +0800, Zhiquan Li wrote:
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -715,6 +715,8 @@ int arch_memory_failure(unsigned long pfn, int flags)
>  	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
>  	struct sgx_epc_section *section;
>  	struct sgx_numa_node *node;
> +	int ret = 0;
> +	unsigned long vaddr;

Please switch the order of the two variables so that all variables are in
reverse Christmas tree order.

-- 
Thanks,
-Kai




* Re: [PATCH v4 1/3] x86/sgx: Repurpose the owner field as the virtual address of virtual EPC page
  2022-06-08  3:26 ` [PATCH v4 1/3] x86/sgx: Repurpose the owner field as the virtual address of virtual EPC page Zhiquan Li
  2022-06-08  3:45   ` Zhiquan Li
@ 2022-06-08  3:54   ` Kai Huang
  1 sibling, 0 replies; 12+ messages in thread
From: Kai Huang @ 2022-06-08  3:54 UTC (permalink / raw)
  To: Zhiquan Li, linux-sgx, tony.luck, jarkko, dave.hansen
  Cc: seanjc, fan.du, cathy.zhang

On Wed, 2022-06-08 at 11:26 +0800, Zhiquan Li wrote:
> When a page triggers a machine check, it only reports the
> physical address of EPC page. But in order to inject #MC into
> hypervisor, the virtual address is required. Then repurpose the
> "owner" field as the virtual address of the virtual EPC page so that
> arch_memory_failure() can easily retrieve it.
> 
> Add a new EPC page flag - SGX_EPC_PAGE_KVM_GUEST to interpret the
> meaning of the field.
> 
> Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
> ---
> Changes since V3:
> - Move the definition of EPC page flag SGX_EPC_PAGE_KVM_GUEST from
>   Cathy's third patch of SGX rebootless recovery patch set but discard
>   irrelevant portion, since it might need more time to re-forge and
>   these are two different features.
>   Link: https://lore.kernel.org/linux-sgx/41704e5d4c03b49fcda12e695595211d950cfb08.camel@kernel.org/T/#m9782d23496cacecb7da07a67daa79f4b322ae170
> 
> Changes since V2:
> - Rework the patch suggested by Jarkko.
> - Remove struct sgx_vepc_page and relevant code.
> - Remove new EPC page flag SGX_EPC_PAGE_IS_VEPC definition as it is
>   duplicated to SGX_EPC_PAGE_KVM_GUEST.
>   Link: https://lore.kernel.org/linux-sgx/eb95b32ecf3d44a695610cf7f2816785@intel.com/T/#u
> 
> Changes since V1:
> - Add documentation suggested by Jarkko.
> ---
>  arch/x86/kernel/cpu/sgx/sgx.h  | 2 ++
>  arch/x86/kernel/cpu/sgx/virt.c | 4 +++-
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index 0f17def9fe6f..b43582da1bcf 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -28,6 +28,8 @@
>  
>  /* Pages on free list */
>  #define SGX_EPC_PAGE_IS_FREE		BIT(1)
> +/* Pages allocated for KVM guest */
> +#define SGX_EPC_PAGE_KVM_GUEST		BIT(2)
>  
>  struct sgx_epc_page {
>  	unsigned int section;
> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> index 6a77a14eee38..776ae5c1c032 100644
> --- a/arch/x86/kernel/cpu/sgx/virt.c
> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> @@ -46,10 +46,12 @@ static int __sgx_vepc_fault(struct sgx_vepc *vepc,
>  	if (epc_page)
>  		return 0;
>  
> -	epc_page = sgx_alloc_epc_page(vepc, false);
> +	epc_page = sgx_alloc_epc_page((void *)addr, false);
>  	if (IS_ERR(epc_page))
>  		return PTR_ERR(epc_page);
>  
> +	epc_page->flags |= SGX_EPC_PAGE_KVM_GUEST;
> +
>  	ret = xa_err(xa_store(&vepc->page_array, index, epc_page, GFP_KERNEL));
>  	if (ret)
>  		goto err_free;


Acked-by: Kai Huang <kai.huang@intel.com>

-- 
Thanks,
-Kai




* Re: [PATCH v4 0/3] x86/sgx: fine grained SGX MCA behavior
  2022-06-08  3:26 [PATCH v4 0/3] x86/sgx: fine grained SGX MCA behavior Zhiquan Li
                   ` (2 preceding siblings ...)
  2022-06-08  3:26 ` [PATCH v4 3/3] x86/sgx: Fine grained SGX MCA behavior for normal case Zhiquan Li
@ 2022-06-08  8:10 ` Jarkko Sakkinen
  2022-06-08  9:12   ` Jarkko Sakkinen
  2022-06-08  9:48   ` Zhiquan Li
  3 siblings, 2 replies; 12+ messages in thread
From: Jarkko Sakkinen @ 2022-06-08  8:10 UTC (permalink / raw)
  To: Zhiquan Li
  Cc: linux-sgx, tony.luck, dave.hansen, seanjc, kai.huang, fan.du,
	cathy.zhang

On Wed, Jun 08, 2022 at 11:26:51AM +0800, Zhiquan Li wrote:
> V3: https://lore.kernel.org/linux-sgx/41704e5d4c03b49fcda12e695595211d950cfb08.camel@kernel.org/T/#t
> 
> Changes since V3:
> - Move the definition of EPC page flag SGX_EPC_PAGE_KVM_GUEST from
>   Cathy's third patch of SGX rebootless recovery patch set but discard
>   irrelevant portion, since it might need more time to re-forge and
>   these are two different features.
>   Link: https://lore.kernel.org/linux-sgx/41704e5d4c03b49fcda12e695595211d950cfb08.camel@kernel.org/T/#m9782d23496cacecb7da07a67daa79f4b322ae170
> 
> V2: https://lore.kernel.org/linux-sgx/694234d7-6a0d-e85f-f2f9-e52b4a61e1ec@intel.com/T/#t
> 
> Changes since V2:
> - Repurpose the owner field as the virtual address of virtual EPC page
> - Remove struct sgx_vepc_page and relevant code.
> - Remove patch 01 as the changes are not necessary in new design.
> - Rework patch 02 suggested by Jarkko.
> - Adapt patch 03 and 04 since struct sgx_vepc_page was discarded.
> - Replace EPC page flag SGX_EPC_PAGE_IS_VEPC with
>   SGX_EPC_PAGE_KVM_GUEST as they are duplicated.
>   Link: https://lore.kernel.org/linux-sgx/eb95b32ecf3d44a695610cf7f2816785@intel.com/T/#u
> 
> V1: https://lore.kernel.org/linux-sgx/443cb425-009c-2784-56f4-5e707122de76@intel.com/T/#t
> 
> Changes since V1:
> - Updated cover letter and commit messages, added valuable
>   information from Jarkko, Tony and Kai’s comments.
> - Added documentations for struct struct sgx_vepc and
>   struct sgx_vepc_page.
> 
> Hi everyone,
> 
> This series contains a few patches to fine grained SGX MCA behavior.
> 
> When VM guest access a SGX EPC page with memory failure, current
> behavior will kill the guest, expected only kill the SGX application
> inside it.
> 
> To fix it we send SIGBUS with code BUS_MCEERR_AR and some extra
> information for hypervisor to inject #MC information to guest, which
> is helpful in SGX virtualization case.
> 
> The rest of things are guest side. Currently the hypervisor like
> Qemu already has mature facility to convert HVA to GPA and inject #MC
> to the guest OS.
> 
> Then we extend the solution for the normal SGX case, so that the task
> has opportunity to make further decision while EPC page has memory
> failure.
> 
> However, when a page triggers a machine check, it only reports the PFN.
> But in order to inject #MC into hypervisor, the virtual address
> is required. Then repurpose the “owner” field as the virtual address of
> the virtual EPC page so that arch_memory_failure() can easily retrieve
> it.
> 
> Add a new EPC page flag - SGX_EPC_PAGE_KVM_GUEST to interpret the
> meaning of the field.
> 
> Suppose an enclave is shared by multiple processes, when an enclave
> page triggers a machine check, the enclave will be disabled so that
> it couldn't be entered again. Killing other processes with the same
> enclave mapped would perhaps be overkill, but they are going to find
> that the enclave is "dead" next time they try to use it. Thanks for
> Jarkko’s head up and Tony’s clarification on this point.
> 
> Our intension is to provide additional info so that the application has
> more choices. Current behavior looks gently, and we don’t want to
> change it.
> 
> If you expect the other processes to be informed in such case, then
> you’re looking for an MCA “early kill” feature which worth another
> patch set to implement it.
> 
> Unlike host enclaves, virtual EPC instance cannot be shared by multiple
> VMs. It is because how enclaves are created is totally up to the guest.
> Sharing virtual EPC instance will be very likely to unexpectedly break
> enclaves in all VMs.
> 
> SGX virtual EPC driver doesn't explicitly prevent virtual EPC instance
> being shared by multiple VMs via fork(). However KVM doesn't support
> running a VM across multiple mm structures, and the de facto userspace
> hypervisor (Qemu) doesn't use fork() to create a new VM, so in practice
> this should not happen.
> 
> This series is based on tip/x86/sgx.
> 
> Tests:
> 1. MCE injection test for SGX in VM.
>    As we expected, the application was killed and VM was alive.
> 2. MCE injection test for SGX on host.
>    As we expected, the application received SIGBUS with extra info.
> 3. Kernel selftest/sgx: PASS
> 4. Internal SGX stress test: PASS
> 5. kmemleak test: No memory leakage detected.
> 
> Much appreciate your feedback.
> 
> Best Regards,
> Zhiquan
> 
> Zhiquan Li (3):
>   x86/sgx: Repurpose the owner field as the virtual address of virtual
>     EPC page
>   x86/sgx: Fine grained SGX MCA behavior for virtualization
>   x86/sgx: Fine grained SGX MCA behavior for normal case
> 
>  arch/x86/kernel/cpu/sgx/main.c | 27 +++++++++++++++++++++++++--
>  arch/x86/kernel/cpu/sgx/sgx.h  |  2 ++
>  arch/x86/kernel/cpu/sgx/virt.c |  4 +++-
>  3 files changed, 30 insertions(+), 3 deletions(-)
> 
> -- 
> 2.25.1
> 

LGTM, I'll have to check if I'm able to trigger MCE with
/sys/devices/system/memory/hard_offline_page, as hinted by Tony.

Just trying to think how to get a legit PFN number. I guess one workable
way is to attach kretprobe to sgx_alloc_epc_page(), and do similar
conversion as in sgx_get_epc_phys_addr() for ((struct sgx_epc_page
*)retval) and print it out.
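
A userspace model of that conversion might look like the sketch below.
It only illustrates the arithmetic (section base plus page index times
page size, then a shift to get the PFN); the model_* names are
hypothetical, a 4 KiB page size is assumed, and the real structures
carry more fields than shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT	12
#define MODEL_PAGE_SIZE		((uint64_t)1 << MODEL_PAGE_SHIFT)

/* Stand-in for one EPC page descriptor. */
struct model_epc_page {
	unsigned int section;
};

/* Stand-in for an EPC section: a physical range plus its descriptors. */
struct model_epc_section {
	uint64_t phys_addr;		/* physical base of the section */
	struct model_epc_page *pages;	/* one descriptor per EPC page */
	size_t npages;
};

/* Physical address = section base + index of the descriptor * page size. */
static uint64_t model_epc_phys_addr(const struct model_epc_section *section,
				    const struct model_epc_page *page)
{
	ptrdiff_t index = page - section->pages;

	assert(index >= 0 && (size_t)index < section->npages);
	return section->phys_addr + (uint64_t)index * MODEL_PAGE_SIZE;
}

/* PFN is the physical address with the in-page offset bits shifted out. */
static uint64_t model_epc_pfn(const struct model_epc_section *section,
			      const struct model_epc_page *page)
{
	return model_epc_phys_addr(section, page) >> MODEL_PAGE_SHIFT;
}
```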

BR, Jarkko


* Re: [PATCH v4 2/3] x86/sgx: Fine grained SGX MCA behavior for virtualization
  2022-06-08  3:52   ` Kai Huang
@ 2022-06-08  8:13     ` Jarkko Sakkinen
  2022-06-08  8:33       ` Zhiquan Li
  0 siblings, 1 reply; 12+ messages in thread
From: Jarkko Sakkinen @ 2022-06-08  8:13 UTC (permalink / raw)
  To: Kai Huang
  Cc: Zhiquan Li, linux-sgx, tony.luck, dave.hansen, seanjc, fan.du,
	cathy.zhang

On Wed, Jun 08, 2022 at 03:52:46PM +1200, Kai Huang wrote:
> On Wed, 2022-06-08 at 11:26 +0800, Zhiquan Li wrote:
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -715,6 +715,8 @@ int arch_memory_failure(unsigned long pfn, int flags)
> >  	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
> >  	struct sgx_epc_section *section;
> >  	struct sgx_numa_node *node;
> > +	int ret = 0;
> > +	unsigned long vaddr;
> 
> Please switch the order of the two variables so all of variables are in reverse
> Christmas style.

Yeah, we prefer that. Is it necessary to initialize ret?

BR, Jarkko


* Re: [PATCH v4 2/3] x86/sgx: Fine grained SGX MCA behavior for virtualization
  2022-06-08  8:13     ` Jarkko Sakkinen
@ 2022-06-08  8:33       ` Zhiquan Li
  0 siblings, 0 replies; 12+ messages in thread
From: Zhiquan Li @ 2022-06-08  8:33 UTC (permalink / raw)
  To: Jarkko Sakkinen, Kai Huang
  Cc: linux-sgx, tony.luck, dave.hansen, seanjc, fan.du, cathy.zhang


On 2022/6/8 16:13, Jarkko Sakkinen wrote:
> On Wed, Jun 08, 2022 at 03:52:46PM +1200, Kai Huang wrote:
>> On Wed, 2022-06-08 at 11:26 +0800, Zhiquan Li wrote:
>>> --- a/arch/x86/kernel/cpu/sgx/main.c
>>> +++ b/arch/x86/kernel/cpu/sgx/main.c
>>> @@ -715,6 +715,8 @@ int arch_memory_failure(unsigned long pfn, int flags)
>>>  	struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
>>>  	struct sgx_epc_section *section;
>>>  	struct sgx_numa_node *node;
>>> +	int ret = 0;
>>> +	unsigned long vaddr;
>>
>> Please switch the order of the two variables so all of variables are in reverse
>> Christmas style.
> 
> Yeah, we prefer that. Is it necessary to initialize ret?
> 

No problem, I will switch the order in the V5 patch.

I followed mm/memory-failure.c:kill_proc() when initializing "ret".
It is always overwritten by the return value of force_sig_mceerr(), so
the initialization is not necessary and we can remove it in the V5 patch.

Best Regards,
Zhiquan

> BR, Jarkko


* Re: [PATCH v4 0/3] x86/sgx: fine grained SGX MCA behavior
  2022-06-08  8:10 ` [PATCH v4 0/3] x86/sgx: fine grained SGX MCA behavior Jarkko Sakkinen
@ 2022-06-08  9:12   ` Jarkko Sakkinen
  2022-06-08  9:48   ` Zhiquan Li
  1 sibling, 0 replies; 12+ messages in thread
From: Jarkko Sakkinen @ 2022-06-08  9:12 UTC (permalink / raw)
  To: Zhiquan Li
  Cc: linux-sgx, tony.luck, dave.hansen, seanjc, kai.huang, fan.du,
	cathy.zhang

On Wed, Jun 08, 2022 at 11:10:23AM +0300, Jarkko Sakkinen wrote:
> LGTM, I'll have to check if I'm able to trigger MCE with
> /sys/devices/system/memory/hard_offline_page, as hinted by Tony.
> 
> Just trying to think how to get a legit PFN number. I guess one workable
> way is to attach kretprobe to sgx_alloc_epc_page(), and do similar
> conversion as in sgx_get_epc_phys_addr() for ((struct sgx_epc_page
> *)retval) and print it out.

Or I just look up the address range with dmesg, and then loop through
the PFNs, writing them one by one until the enclave dies.
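
A minimal sketch of that loop, under stated assumptions: the EPC base and
size below are hypothetical stand-ins for whatever range dmesg reports, and
the helper name epc_page_addrs is made up for illustration. Note that
hard_offline_page, per its ABI documentation, takes a physical address
rather than a PFN, so the helper prints one address per 4 KiB page.

```shell
# Print the physical address of every 4 KiB page in [base, base + size).
# Both arguments are byte values; hex (0x...) is accepted.
epc_page_addrs() {
    base=$(($1))
    size=$(($2))
    off=0
    while [ "$off" -lt "$size" ]; do
        printf '0x%x\n' $((base + off))
        off=$((off + 4096))
    done
}

# Hypothetical usage (as root, on a test machine only; range is made up):
#   for pa in $(epc_page_addrs 0x8000c00000 0x100000); do
#       echo "$pa" > /sys/devices/system/memory/hard_offline_page
#   done
```

Generating the addresses separately from the write makes it easy to
review the list before any page is actually offlined.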

BR, Jarkko

* Re: [PATCH v4 0/3] x86/sgx: fine grained SGX MCA behavior
  2022-06-08  8:10 ` [PATCH v4 0/3] x86/sgx: fine grained SGX MCA behavior Jarkko Sakkinen
  2022-06-08  9:12   ` Jarkko Sakkinen
@ 2022-06-08  9:48   ` Zhiquan Li
  1 sibling, 0 replies; 12+ messages in thread
From: Zhiquan Li @ 2022-06-08  9:48 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, tony.luck, dave.hansen, seanjc, kai.huang, fan.du,
	cathy.zhang


On 2022/6/8 16:10, Jarkko Sakkinen wrote:
> LGTM, I'll have to check if I'm able to trigger MCE with
> /sys/devices/system/memory/hard_offline_page, as hinted by Tony.
> 
> Just trying to think how to get a legit PFN number. I guess one workable
> way is to attach kretprobe to sgx_alloc_epc_page(), and do similar
> conversion as in sgx_get_epc_phys_addr() for ((struct sgx_epc_page
> *)retval) and print it out.
> 

We followed the hint in Documentation/firmware-guide/acpi/apei/einj.rst
added by Tony.
To validate the virtualization part, we do steps 1-2 on the host and steps
3-7 in the VM.

As for how to get the SGX EPC page mappings GVA -> GPA -> HPA,
we do something like this:

1. Get GVA -> GPA in guest OS

1) Find the probe point in sgx_vma_fault(); vmf_insert_pfn() is called
   only once in sgx_vma_fault():

   crash> dis sgx_vma_fault | grep vmf_insert_pfn
   0xffffffff8ce527b1 <sgx_vma_fault+113>: callq  0xffffffff8d0ec1d0 <vmf_insert_pfn>

2) Get the mapping of GVA to guest PFN

   echo 'p:sgxvmfault sgx_vma_fault+113 vaddr=%si pfn=%dx' >> /sys/kernel/debug/tracing/kprobe_events
   cat /sys/kernel/debug/tracing/kprobe_events
   echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
   cat /sys/kernel/debug/tracing/trace_pipe

2. Get GPA -> HPA on host OS
__sgx_vepc_fault() can tell us the mapping of HVA -> HPA, but to inject a
memory failure, we need GPA -> HPA. There are several ways to achieve this,
e.g.:

- Patch Qemu to show GPA -> HVA, then we can easily convert HVA -> HPA
- Walk the EPT
- Patch the kernel to show GPA -> HPA

We use the last one because it's the most straightforward.

@@ -4047,6 +4047,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
    else
        r = __direct_map(vcpu, fault);

+   if (!!sgx_paddr_to_page(fault->pfn << PAGE_SHIFT))
+       trace_printk("SGX: gpa:0x%llx hpa:0x%llx\n", fault->gfn << PAGE_SHIFT, fault->pfn << PAGE_SHIFT);
 out_unlock:
    if (is_tdp_mmu_fault)
        read_unlock(&vcpu->kvm->mmu_lock);

(The ftrace kprobe filter cannot support such a complex expression, so we
have to patch the host kernel directly.)
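
For step 1, each line read from trace_pipe carries the GVA (vaddr=, taken
from %si, the second argument of vmf_insert_pfn()) and the guest PFN (pfn=,
taken from %dx, the third argument). The GPA is simply the PFN shifted left
by 12. A minimal sketch of that conversion follows; the helper name
trace_to_gva_gpa and the sample trace line are hypothetical, and the exact
trace_pipe prefix may differ on a given kernel.

```shell
# Turn one kprobe trace line into "GVA GPA", with GPA = pfn << 12.
trace_to_gva_gpa() {
    line=$1
    # Cut out the value after "vaddr=" and "pfn=", up to the next space.
    vaddr=${line##*vaddr=}; vaddr=${vaddr%% *}
    pfn=${line##*pfn=};     pfn=${pfn%% *}
    printf '0x%x 0x%x\n' $(($vaddr)) $(($pfn << 12))
}

# Hypothetical usage with a made-up trace line:
#   trace_to_gva_gpa 'app-1 [000] .... 1.0: sgxvmfault: (sgx_vma_fault+0x71/0x100) vaddr=0x7f0a00001000 pfn=0x18012b'
```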

With the GVA -> GPA -> HPA mappings in hand, we can inject real errors
into enclave memory using ACPI/EINJ. Touching the GVA in the guest OS
then triggers the error and shows how patch 02 works.
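
The host-side EINJ setup can be sketched roughly as below. This is an
illustration under assumptions, not the exact commands we ran: the helper
name einj_prepare is made up, the file names (param1, param2, error_type,
notrigger, error_inject) and the 0x10 "memory uncorrectable non-fatal" type
are as I understand them from einj.rst, and the HPA in the usage line is
hypothetical. The directory is a parameter so the helper can be tried
against a scratch directory first.

```shell
# Arm an uncorrectable memory error on the page containing HPA.
# $1: EINJ directory (normally /sys/kernel/debug/apei/einj), $2: HPA.
einj_prepare() {
    dir=$1
    page=$(($2 & ~0xfff))                      # page-align the address
    printf '0x%x\n' "$page" > "$dir/param1"    # injection address
    echo 0xfffffffffffff000 > "$dir/param2"    # mask: anywhere in this page
    echo 0x10 > "$dir/error_type"              # memory uncorrectable non-fatal
    echo 1 > "$dir/notrigger"                  # arm only; the guest access triggers it
    echo 1 > "$dir/error_inject"
}

# Hypothetical usage on the host (as root):
#   einj_prepare /sys/kernel/debug/apei/einj 0x8000c0123000
```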

Finally, the Qemu console shows the message below, and Qemu is not killed:

    qemu-system-x86_64: Guest MCE Memory Error at QEMU addr 0x7f3273f2a000 and GUEST addr 0x18012b000 of type BUS_MCEERR_AR injected


Best Regards,
Zhiquan

> BR, Jarkko

Thread overview: 12+ messages
2022-06-08  3:26 [PATCH v4 0/3] x86/sgx: fine grained SGX MCA behavior Zhiquan Li
2022-06-08  3:26 ` [PATCH v4 1/3] x86/sgx: Repurpose the owner field as the virtual address of virtual EPC page Zhiquan Li
2022-06-08  3:45   ` Zhiquan Li
2022-06-08  3:54   ` Kai Huang
2022-06-08  3:26 ` [PATCH v4 2/3] x86/sgx: Fine grained SGX MCA behavior for virtualization Zhiquan Li
2022-06-08  3:52   ` Kai Huang
2022-06-08  8:13     ` Jarkko Sakkinen
2022-06-08  8:33       ` Zhiquan Li
2022-06-08  3:26 ` [PATCH v4 3/3] x86/sgx: Fine grained SGX MCA behavior for normal case Zhiquan Li
2022-06-08  8:10 ` [PATCH v4 0/3] x86/sgx: fine grained SGX MCA behavior Jarkko Sakkinen
2022-06-08  9:12   ` Jarkko Sakkinen
2022-06-08  9:48   ` Zhiquan Li
