linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/6] Support TDX guests on Hyper-V
@ 2023-04-08 20:47 Dexuan Cui
  2023-04-08 20:47 ` [PATCH v4 1/6] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed Dexuan Cui
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Dexuan Cui @ 2023-04-08 20:47 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	haiyangz, hpa, jane.chu, kirill.shutemov, kys, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, mikelley
  Cc: linux-kernel, Tianyu.Lan, Dexuan Cui

The patchset adds the Hyper-V specific code so that a TDX guest can run
on Hyper-V. Please review. Thanks!

This v4 patchset is based on Michael Kelley's v7 DDA patchset:
https://github.com/kelleymh/linux/commits/v7
Michael's patches are being merged upstream (some of the patches are
already in tip.git and the others are going upstream via the Hyper-V
tree).

This v4 patchset addresses the comments from Kirill, Sathyanarayanan
and Michael. Please see each patch's log message for the changes.

If you want to view the patches on GitHub, they are also in this branch:
https://github.com/dcui/tdx/commits/decui/michaelv7dda/tdx/v4

FYI, v1-v3 are here:
https://lwn.net/ml/linux-kernel/20221121195151.21812-1-decui@microsoft.com/
https://lwn.net/ml/linux-kernel/20221207003325.21503-7-decui@microsoft.com/
https://lwn.net/ml/linux-kernel/20230206192419.24525-1-decui@microsoft.com/

Thanks,
Dexuan

Dexuan Cui (6):
  x86/tdx: Retry TDVMCALL_MAP_GPA() when needed
  x86/tdx: Support vmalloc() for tdx_enc_status_changed()
  x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
  x86/hyperv: Support hypercalls for TDX guests
  Drivers: hv: vmbus: Support TDX guests
  x86/hyperv: Fix serial console interrupts for TDX guests

 arch/x86/coco/tdx/tdx.c            | 122 ++++++++++++++++++++++-------
 arch/x86/hyperv/hv_apic.c          |   6 +-
 arch/x86/hyperv/hv_init.c          |  27 ++++++-
 arch/x86/hyperv/ivm.c              |  20 +++++
 arch/x86/include/asm/hyperv-tlfs.h |   3 +-
 arch/x86/include/asm/mshyperv.h    |  20 +++++
 arch/x86/kernel/cpu/mshyperv.c     |  43 ++++++++++
 drivers/hv/hv.c                    |  62 +++++++++++++--
 drivers/hv/hv_common.c             |  30 +++++++
 include/asm-generic/mshyperv.h     |   1 +
 10 files changed, 295 insertions(+), 39 deletions(-)

-- 
2.25.1



* [PATCH v4 1/6] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed
  2023-04-08 20:47 [PATCH v4 0/6] Support TDX guests on Hyper-V Dexuan Cui
@ 2023-04-08 20:47 ` Dexuan Cui
  2023-04-11 16:15   ` Michael Kelley (LINUX)
  2023-04-08 20:47 ` [PATCH v4 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed() Dexuan Cui
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Dexuan Cui @ 2023-04-08 20:47 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	haiyangz, hpa, jane.chu, kirill.shutemov, kys, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, mikelley
  Cc: linux-kernel, Tianyu.Lan, Dexuan Cui

The GHCI spec for TDX 1.0 says that the MapGPA call may fail with the
R10 error code TDG.VP.VMCALL_RETRY (1), in which case the guest must
retry the operation for the pages in the region starting at the GPA
specified in R11.

When a TDX guest runs on Hyper-V, Hyper-V returns the retry error
when hyperv_init() -> swiotlb_update_mem_attributes() ->
set_memory_decrypted() decrypts up to 1GB of swiotlb bounce buffers.
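
The retry rule described above can be sketched in user space as follows.
This is only an illustration, not the kernel code: fake_map_gpa() and
struct fake_vmm are hypothetical stand-ins for the real hypercall and
host, and MAX_RETRY mirrors the max_retry_cnt of 3 used in the diff
below. The fake host converts at most `chunk` bytes per call and reports
where it stopped, so the caller resumes from that address and gives up
only when no forward progress is made several times in a row.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_SUCCESS	0
#define STATUS_RETRY	1	/* mirrors TDVMCALL_STATUS_RETRY in the patch */
#define MAX_RETRY	3	/* mirrors max_retry_cnt in the patch */

/* Hypothetical host model: converts at most 'chunk' bytes per call. */
struct fake_vmm {
	uint64_t chunk;		/* bytes converted per call */
	unsigned int calls;	/* number of calls observed */
};

/* Hypothetical stand-in for the MapGPA call; not a real kernel API. */
static uint64_t fake_map_gpa(struct fake_vmm *vmm, uint64_t start,
			     uint64_t len, uint64_t *fail_addr)
{
	vmm->calls++;
	if (len <= vmm->chunk)
		return STATUS_SUCCESS;

	/* Report the first address that was not converted. */
	*fail_addr = start + vmm->chunk;
	return STATUS_RETRY;
}

static bool map_gpa_with_retry(struct fake_vmm *vmm, uint64_t start,
			       uint64_t end)
{
	unsigned int retry_cnt = 0;
	uint64_t fail_addr = 0, ret;

	for (;;) {
		ret = fake_map_gpa(vmm, start, end - start, &fail_addr);
		if (ret != STATUS_RETRY)
			break;

		/* The reported address must lie inside the request. */
		if (fail_addr < start || fail_addr >= end)
			return false;

		if (fail_addr == start) {
			/* No progress: bound the number of attempts. */
			if (++retry_cnt > MAX_RETRY)
				return false;
		} else {
			/* Progress was made: resume where the host stopped. */
			retry_cnt = 0;
			start = fail_addr;
		}
	}

	return ret == STATUS_SUCCESS;
}
```

With a fake host that converts one 4 KiB page per call, a 16 KiB request
completes after four calls; a host that never makes progress causes the
loop to give up after the initial attempt plus three retries.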

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 arch/x86/coco/tdx/tdx.c | 64 +++++++++++++++++++++++++++++++++--------
 1 file changed, 52 insertions(+), 12 deletions(-)

Changes in v2:
  Used __tdx_hypercall() directly in tdx_map_gpa().
  Added a max_retry_cnt of 1000.
  Renamed a few variables, e.g., r11 -> map_fail_paddr.

Changes in v3:
  Changed max_retry_cnt from 1000 to 3.

Changes in v4:
  __tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT) -> __tdx_hypercall_ret()
  Added Kirill's Acked-by.

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 4c4c6db39eca3..5574c91541a2d 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -28,6 +28,8 @@
 #define TDVMCALL_MAP_GPA		0x10001
 #define TDVMCALL_REPORT_FATAL_ERROR	0x10003
 
+#define TDVMCALL_STATUS_RETRY		1
+
 /* MMIO direction */
 #define EPT_READ	0
 #define EPT_WRITE	1
@@ -788,14 +790,15 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len,
 }
 
 /*
- * Inform the VMM of the guest's intent for this physical page: shared with
- * the VMM or private to the guest.  The VMM is expected to change its mapping
- * of the page in response.
+ * Notify the VMM about page mapping conversion. More info about ABI
+ * can be found in TDX Guest-Host-Communication Interface (GHCI),
+ * section "TDG.VP.VMCALL<MapGPA>".
  */
-static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
+static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
 {
-	phys_addr_t start = __pa(vaddr);
-	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
+	int max_retry_cnt = 3, retry_cnt = 0;
+	struct tdx_hypercall_args args;
+	u64 map_fail_paddr, ret;
 
 	if (!enc) {
 		/* Set the shared (decrypted) bits: */
@@ -803,12 +806,49 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
 		end   |= cc_mkdec(0);
 	}
 
-	/*
-	 * Notify the VMM about page mapping conversion. More info about ABI
-	 * can be found in TDX Guest-Host-Communication Interface (GHCI),
-	 * section "TDG.VP.VMCALL<MapGPA>"
-	 */
-	if (_tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0))
+	while (1) {
+		memset(&args, 0, sizeof(args));
+		args.r10 = TDX_HYPERCALL_STANDARD;
+		args.r11 = TDVMCALL_MAP_GPA;
+		args.r12 = start;
+		args.r13 = end - start;
+
+		ret = __tdx_hypercall_ret(&args);
+		if (ret != TDVMCALL_STATUS_RETRY)
+			break;
+		/*
+		 * The guest must retry the operation for the pages in the
+		 * region starting at the GPA specified in R11. Make sure R11
+		 * contains a sane value.
+		 */
+		map_fail_paddr = args.r11;
+		if (map_fail_paddr < start || map_fail_paddr >= end)
+			return false;
+
+		if (map_fail_paddr == start) {
+			retry_cnt++;
+			if (retry_cnt > max_retry_cnt)
+				return false;
+		} else {
+			retry_cnt = 0;
+			start = map_fail_paddr;
+		}
+	}
+
+	return !ret;
+}
+
+/*
+ * Inform the VMM of the guest's intent for this physical page: shared with
+ * the VMM or private to the guest. The VMM is expected to change its mapping
+ * of the page in response.
+ */
+static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
+{
+	phys_addr_t start = __pa(vaddr);
+	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
+
+	if (!tdx_map_gpa(start, end, enc))
 		return false;
 
 	/* private->shared conversion  requires only MapGPA call */
-- 
2.25.1



* [PATCH v4 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed()
  2023-04-08 20:47 [PATCH v4 0/6] Support TDX guests on Hyper-V Dexuan Cui
  2023-04-08 20:47 ` [PATCH v4 1/6] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed Dexuan Cui
@ 2023-04-08 20:47 ` Dexuan Cui
  2023-04-11 16:28   ` Michael Kelley (LINUX)
  2023-04-08 20:47 ` [PATCH v4 3/6] x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests Dexuan Cui
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Dexuan Cui @ 2023-04-08 20:47 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	haiyangz, hpa, jane.chu, kirill.shutemov, kys, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, mikelley
  Cc: linux-kernel, Tianyu.Lan, Dexuan Cui

When a TDX guest runs on Hyper-V, the hv_netvsc driver's netvsc_init_buf()
allocates buffers using vzalloc() and needs to share the buffers with the
host OS by calling set_memory_decrypted(), which does not work for
vmalloc() memory yet. Add the support by handling the pages one by one.
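
The reason for going page by page is that a vmalloc() range is virtually
contiguous but its backing pages can be scattered in physical memory, so
each page has to be translated and converted on its own. A minimal
user-space sketch of that idea follows. All names in it (struct
sketch_mapping, convert_range_by_page(), log_convert()) are hypothetical
and only model the loop; the real code uses slow_virt_to_phys() and
tdx_enc_status_changed_phys() as shown in the diff below.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical page table: physical address of each virtual page. */
struct sketch_mapping {
	const uint64_t *phys_of_page;
	size_t npages;
};

/* Callback that converts one physically contiguous range. */
typedef bool (*convert_fn)(uint64_t start_pa, uint64_t end_pa, void *ctx);

/*
 * Walk a virtually contiguous range one page at a time, translating
 * each page separately because the backing frames need not be adjacent.
 */
static bool convert_range_by_page(const struct sketch_mapping *map,
				  size_t first_page, size_t numpages,
				  convert_fn convert, void *ctx)
{
	for (size_t i = 0; i < numpages; i++) {
		size_t idx = first_page + i;
		uint64_t pa;

		if (idx >= map->npages)
			return false;	/* range runs past the mapping */

		pa = map->phys_of_page[idx];
		if (!convert(pa, pa + SKETCH_PAGE_SIZE, ctx))
			return false;
	}

	return true;
}

/* Example callback: records how many single-page conversions happened. */
struct convert_log {
	size_t calls;
	uint64_t last_pa;
};

static bool log_convert(uint64_t start_pa, uint64_t end_pa, void *ctx)
{
	struct convert_log *log = ctx;

	if (end_pa - start_pa != SKETCH_PAGE_SIZE)
		return false;

	log->calls++;
	log->last_pa = start_pa;
	return true;
}
```

The cost of this approach is one conversion call per 4 KiB page, which is
why the directly mapped case keeps the single-call path.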

Co-developed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 arch/x86/coco/tdx/tdx.c | 76 ++++++++++++++++++++++++++++-------------
 1 file changed, 52 insertions(+), 24 deletions(-)

Changes in v2:
  Changed tdx_enc_status_changed() in place.

Hi Dave, I checked the huge vmalloc mapping code, but I still don't know
how to get the underlying huge page info (if a huge page is in use) so
that PG_LEVEL_2M/1G could be used in try_accept_page() for vmalloc. I
checked is_vm_area_hugepages() and __vfree() -> __vunmap(), and I think
the underlying page allocation info is internal to the mm code: there is
no mm API for me to get the info in tdx_enc_status_changed().

Changes in v3:
  No change since v2.

Changes in v4:
  Added Kirill's Co-developed-by since Kirill helped to improve the
    code by adding tdx_enc_status_changed_phys().

  Thanks Kirill for the clarification on load_unaligned_zeropad()!

  The vzalloc() usage in drivers/net/hyperv/netvsc.c: netvsc_init_buf()
    remains the same. It may not be worth it to "allocate a vmalloc
    region, allocate pages manually", because we have to consider the
    worst case where the system is suffering from severe memory
    fragmentation and we can only allocate multiple single pages. We may
    not want to complicate the code in netvsc_init_buf(). We'll support
    NIC SR-IOV for TDX VMs on Hyper-V, so the netvsc send/recv buffers
    won't be used when the VF NIC is up.

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 5574c91541a2d..731be50b3d093 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -7,6 +7,7 @@
 #include <linux/cpufeature.h>
 #include <linux/export.h>
 #include <linux/io.h>
+#include <linux/mm.h>
 #include <asm/coco.h>
 #include <asm/tdx.h>
 #include <asm/vmx.h>
@@ -789,6 +790,34 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len,
 	return true;
 }
 
+static bool try_accept_page(phys_addr_t start, phys_addr_t end)
+{
+	/*
+	 * For shared->private conversion, accept the page using
+	 * TDX_ACCEPT_PAGE TDX module call.
+	 */
+	while (start < end) {
+		unsigned long len = end - start;
+
+		/*
+		 * Try larger accepts first. It gives chance to VMM to keep
+		 * 1G/2M SEPT entries where possible and speeds up process by
+		 * cutting number of hypercalls (if successful).
+		 */
+
+		if (try_accept_one(&start, len, PG_LEVEL_1G))
+			continue;
+
+		if (try_accept_one(&start, len, PG_LEVEL_2M))
+			continue;
+
+		if (!try_accept_one(&start, len, PG_LEVEL_4K))
+			return false;
+	}
+
+	return true;
+}
+
 /*
  * Notify the VMM about page mapping conversion. More info about ABI
  * can be found in TDX Guest-Host-Communication Interface (GHCI),
@@ -838,6 +867,19 @@ static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
 	return !ret;
 }
 
+static bool tdx_enc_status_changed_phys(phys_addr_t start, phys_addr_t end,
+					bool enc)
+{
+	if (!tdx_map_gpa(start, end, enc))
+		return false;
+
+	/* private->shared conversion requires only MapGPA call */
+	if (!enc)
+		return true;
+
+	return try_accept_page(start, end);
+}
+
 /*
  * Inform the VMM of the guest's intent for this physical page: shared with
  * the VMM or private to the guest. The VMM is expected to change its mapping
@@ -845,37 +887,23 @@ static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
  */
 static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
 {
-	phys_addr_t start = __pa(vaddr);
-	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
+	unsigned long start = vaddr;
+	unsigned long end = start + numpages * PAGE_SIZE;
 
-	if (!tdx_map_gpa(start, end, enc))
+	if (offset_in_page(start) != 0)
 		return false;
 
-	/* private->shared conversion  requires only MapGPA call */
-	if (!enc)
-		return true;
+	if (!is_vmalloc_addr((void *)start))
+		return tdx_enc_status_changed_phys(__pa(start), __pa(end), enc);
 
-	/*
-	 * For shared->private conversion, accept the page using
-	 * TDX_ACCEPT_PAGE TDX module call.
-	 */
 	while (start < end) {
-		unsigned long len = end - start;
+		phys_addr_t start_pa = slow_virt_to_phys((void *)start);
+		phys_addr_t end_pa = start_pa + PAGE_SIZE;
 
-		/*
-		 * Try larger accepts first. It gives chance to VMM to keep
-		 * 1G/2M SEPT entries where possible and speeds up process by
-		 * cutting number of hypercalls (if successful).
-		 */
-
-		if (try_accept_one(&start, len, PG_LEVEL_1G))
-			continue;
-
-		if (try_accept_one(&start, len, PG_LEVEL_2M))
-			continue;
-
-		if (!try_accept_one(&start, len, PG_LEVEL_4K))
+		if (!tdx_enc_status_changed_phys(start_pa, end_pa, enc))
 			return false;
+
+		start += PAGE_SIZE;
 	}
 
 	return true;
-- 
2.25.1



* [PATCH v4 3/6] x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
  2023-04-08 20:47 [PATCH v4 0/6] Support TDX guests on Hyper-V Dexuan Cui
  2023-04-08 20:47 ` [PATCH v4 1/6] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed Dexuan Cui
  2023-04-08 20:47 ` [PATCH v4 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed() Dexuan Cui
@ 2023-04-08 20:47 ` Dexuan Cui
  2023-04-11 16:30   ` Michael Kelley (LINUX)
  2023-04-08 20:47 ` [PATCH v4 4/6] x86/hyperv: Support hypercalls for " Dexuan Cui
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Dexuan Cui @ 2023-04-08 20:47 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	haiyangz, hpa, jane.chu, kirill.shutemov, kys, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, mikelley
  Cc: linux-kernel, Tianyu.Lan, Dexuan Cui

No logic change to SNP/VBS guests.

hv_isolation_type_tdx() will be used to instruct a TDX guest on Hyper-V to
do some TDX-specific operations, e.g. hv_do_hypercall() should use
__tdx_hypercall(), and a TDX guest on Hyper-V should handle the Hyper-V
Event/Message/Monitor pages specially.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 arch/x86/hyperv/ivm.c              | 6 ++++++
 arch/x86/include/asm/hyperv-tlfs.h | 3 ++-
 arch/x86/include/asm/mshyperv.h    | 3 +++
 arch/x86/kernel/cpu/mshyperv.c     | 2 ++
 drivers/hv/hv_common.c             | 6 ++++++
 5 files changed, 19 insertions(+), 1 deletion(-)

Changes in v2:
  Added "#ifdef CONFIG_INTEL_TDX_GUEST and #endif" for
    hv_isolation_type_tdx() in arch/x86/hyperv/ivm.c.

    Simplified the changes in ms_hyperv_init_platform().

Changes in v3:
  Added Kuppuswamy's Reviewed-by.

Changes in v4:
  A minor rebase to Michael's v7 DDA patchset.

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 127d5b7b63de1..3658ade4f4121 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -400,6 +400,7 @@ bool hv_is_isolation_supported(void)
 }
 
 DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
+DEFINE_STATIC_KEY_FALSE(isolation_type_tdx);
 
 /*
  * hv_isolation_type_snp - Check system runs in the AMD SEV-SNP based
@@ -409,3 +410,8 @@ bool hv_isolation_type_snp(void)
 {
 	return static_branch_unlikely(&isolation_type_snp);
 }
+
+bool hv_isolation_type_tdx(void)
+{
+	return static_branch_unlikely(&isolation_type_tdx);
+}
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index b4fb75bd10138..338f383c721c9 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -169,7 +169,8 @@
 enum hv_isolation_type {
 	HV_ISOLATION_TYPE_NONE	= 0,
 	HV_ISOLATION_TYPE_VBS	= 1,
-	HV_ISOLATION_TYPE_SNP	= 2
+	HV_ISOLATION_TYPE_SNP	= 2,
+	HV_ISOLATION_TYPE_TDX	= 3
 };
 
 /* Hyper-V specific model specific registers (MSRs) */
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index e3cef98a01420..de7ceae9e65e9 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -22,6 +22,7 @@
 union hv_ghcb;
 
 DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
+DECLARE_STATIC_KEY_FALSE(isolation_type_tdx);
 
 typedef int (*hyperv_fill_flush_list_func)(
 		struct hv_guest_mapping_flush_list *flush,
@@ -38,6 +39,8 @@ extern u64 hv_current_partition_id;
 
 extern union hv_ghcb * __percpu *hv_ghcb_pg;
 
+extern bool hv_isolation_type_tdx(void);
+
 int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
 int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
 int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index ff348ebb6ae28..a87fb934cd4b4 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -405,6 +405,8 @@ static void __init ms_hyperv_init_platform(void)
 
 		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
 			static_branch_enable(&isolation_type_snp);
+		else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX)
+			static_branch_enable(&isolation_type_tdx);
 	}
 
 	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 6d40b6c7b23b9..c55db7ea6580b 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -271,6 +271,12 @@ bool __weak hv_isolation_type_snp(void)
 }
 EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
 
+bool __weak hv_isolation_type_tdx(void)
+{
+	return false;
+}
+EXPORT_SYMBOL_GPL(hv_isolation_type_tdx);
+
 void __weak hv_setup_vmbus_handler(void (*handler)(void))
 {
 }
-- 
2.25.1



* [PATCH v4 4/6] x86/hyperv: Support hypercalls for TDX guests
  2023-04-08 20:47 [PATCH v4 0/6] Support TDX guests on Hyper-V Dexuan Cui
                   ` (2 preceding siblings ...)
  2023-04-08 20:47 ` [PATCH v4 3/6] x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests Dexuan Cui
@ 2023-04-08 20:47 ` Dexuan Cui
  2023-04-11 16:36   ` Michael Kelley (LINUX)
  2023-04-08 20:47 ` [PATCH v4 5/6] Drivers: hv: vmbus: Support " Dexuan Cui
  2023-04-08 20:47 ` [PATCH v4 6/6] x86/hyperv: Fix serial console interrupts for " Dexuan Cui
  5 siblings, 1 reply; 20+ messages in thread
From: Dexuan Cui @ 2023-04-08 20:47 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	haiyangz, hpa, jane.chu, kirill.shutemov, kys, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, mikelley
  Cc: linux-kernel, Tianyu.Lan, Dexuan Cui

A TDX guest uses the GHCI call rather than hv_hypercall_pg.

In hv_do_hypercall(), Hyper-V requires that the input/output addresses
have the cc_mask set.
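
As a rough illustration of what "have the cc_mask set" means for the
addresses handed to the hypervisor, the sketch below ORs a shared bit
into the input and output GPAs before placing them in the argument
registers. Everything here is hypothetical: the sketch_* names are
invented for this example, and the bit position (47) is an arbitrary
assumption, since the real mask is determined by the platform at boot.
The register assignment (r10, rdx, r8) follows hv_tdx_hypercall() in the
diff below.

```c
#include <stdint.h>

/* Assumption for illustration only: shared bit at position 47. */
#define SKETCH_SHARED_BIT 47

static const uint64_t sketch_cc_mask = 1ULL << SKETCH_SHARED_BIT;

/* Mark a guest physical address as shared (decrypted). */
static uint64_t sketch_mkdec(uint64_t gpa)
{
	return gpa | sketch_cc_mask;
}

/* Mark a guest physical address as private (encrypted). */
static uint64_t sketch_mkenc(uint64_t gpa)
{
	return gpa & ~sketch_cc_mask;
}

/* Hypothetical argument block; mirrors the registers used in the patch. */
struct sketch_hcall_args {
	uint64_t r10;	/* hypercall control word */
	uint64_t rdx;	/* input GPA, shared bit set */
	uint64_t r8;	/* output GPA, shared bit set */
};

static struct sketch_hcall_args
sketch_prepare_hypercall(uint64_t control, uint64_t input_gpa,
			 uint64_t output_gpa)
{
	struct sketch_hcall_args args = {
		.r10 = control,
		.rdx = sketch_mkdec(input_gpa),
		.r8  = sketch_mkdec(output_gpa),
	};

	return args;
}
```

Note that the fast hypercall variants in the diff pass register values
rather than addresses, so no mask is applied to them there.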

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
Changes in v2:
  Implemented hv_tdx_hypercall() in C rather than in assembly code.
  Renamed the parameter names of hv_tdx_hypercall().
  Used cc_mkdec() directly in hv_do_hypercall().

Changes in v3:
  Decrypted/encrypted hyperv_pcpu_input_arg in
    hv_common_cpu_init() and hv_common_cpu_die().

Changes in v4:
  __tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT) -> __tdx_hypercall_ret()
  hv_common_cpu_die(): explicitly ignore the error from set_memory_encrypted() [Michael Kelley]
  Added Sathyanarayanan's Reviewed-by.

 arch/x86/hyperv/hv_init.c       |  8 ++++++++
 arch/x86/hyperv/ivm.c           | 14 ++++++++++++++
 arch/x86/include/asm/mshyperv.h | 17 +++++++++++++++++
 drivers/hv/hv_common.c          | 24 ++++++++++++++++++++++++
 include/asm-generic/mshyperv.h  |  1 +
 5 files changed, 64 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index a5f9474f08e12..f175e0de821c3 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -432,6 +432,10 @@ void __init hyperv_init(void)
 	/* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
 	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
+	/* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
+	if (hv_isolation_type_tdx())
+		goto skip_hypercall_pg_init;
+
 	hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
 			VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
 			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
@@ -471,6 +475,7 @@ void __init hyperv_init(void)
 		wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 	}
 
+skip_hypercall_pg_init:
 	/*
 	 * hyperv_init() is called before LAPIC is initialized: see
 	 * apic_intr_mode_init() -> x86_platform.apic_post_init() and
@@ -594,6 +599,9 @@ bool hv_is_hyperv_initialized(void)
 	if (x86_hyper_type != X86_HYPER_MS_HYPERV)
 		return false;
 
+	/* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
+	if (hv_isolation_type_tdx())
+		return true;
 	/*
 	 * Verify that earlier initialization succeeded by checking
 	 * that the hypercall page is setup
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 3658ade4f4121..23304c9ddd34f 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -415,3 +415,17 @@ bool hv_isolation_type_tdx(void)
 {
 	return static_branch_unlikely(&isolation_type_tdx);
 }
+
+u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2)
+{
+	struct tdx_hypercall_args args = { };
+
+	args.r10 = control;
+	args.rdx = param1;
+	args.r8  = param2;
+
+	(void)__tdx_hypercall_ret(&args);
+
+	return args.r11;
+}
+EXPORT_SYMBOL_GPL(hv_tdx_hypercall);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index de7ceae9e65e9..71077326f57bc 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -10,6 +10,7 @@
 #include <asm/nospec-branch.h>
 #include <asm/paravirt.h>
 #include <asm/mshyperv.h>
+#include <asm/coco.h>
 
 /*
  * Hyper-V always provides a single IO-APIC at this MMIO address.
@@ -45,6 +46,12 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
 int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
 int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
 
+u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
+
+/*
+ * If the hypercall involves no input or output parameters, the hypervisor
+ * ignores the corresponding GPA pointer.
+ */
 static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 {
 	u64 input_address = input ? virt_to_phys(input) : 0;
@@ -52,6 +59,10 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 	u64 hv_status;
 
 #ifdef CONFIG_X86_64
+	if (hv_isolation_type_tdx())
+		return hv_tdx_hypercall(control,
+					cc_mkdec(input_address),
+					cc_mkdec(output_address));
 	if (!hv_hypercall_pg)
 		return U64_MAX;
 
@@ -95,6 +106,9 @@ static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
 	u64 hv_status;
 
 #ifdef CONFIG_X86_64
+	if (hv_isolation_type_tdx())
+		return hv_tdx_hypercall(control, input1, 0);
+
 	{
 		__asm__ __volatile__(CALL_NOSPEC
 				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
@@ -140,6 +154,9 @@ static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
 	u64 hv_status;
 
 #ifdef CONFIG_X86_64
+	if (hv_isolation_type_tdx())
+		return hv_tdx_hypercall(control, input1, input2);
+
 	{
 		__asm__ __volatile__("mov %4, %%r8\n"
 				     CALL_NOSPEC
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index c55db7ea6580b..10e85682e83eb 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -21,6 +21,7 @@
 #include <linux/ptrace.h>
 #include <linux/slab.h>
 #include <linux/dma-map-ops.h>
+#include <linux/set_memory.h>
 #include <asm/hyperv-tlfs.h>
 #include <asm/mshyperv.h>
 
@@ -128,6 +129,7 @@ int hv_common_cpu_init(unsigned int cpu)
 	u64 msr_vp_index;
 	gfp_t flags;
 	int pgcount = hv_root_partition ? 2 : 1;
+	int ret;
 
 	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
 	flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
@@ -137,6 +139,17 @@ int hv_common_cpu_init(unsigned int cpu)
 	if (!(*inputarg))
 		return -ENOMEM;
 
+	if (hv_isolation_type_tdx()) {
+		ret = set_memory_decrypted((unsigned long)*inputarg, pgcount);
+		if (ret) {
+			/* It may be unsafe to free *inputarg */
+			*inputarg = NULL;
+			return ret;
+		}
+
+		memset(*inputarg, 0x00, pgcount * HV_HYP_PAGE_SIZE);
+	}
+
 	if (hv_root_partition) {
 		outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
 		*outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
@@ -157,6 +170,8 @@ int hv_common_cpu_die(unsigned int cpu)
 	unsigned long flags;
 	void **inputarg, **outputarg;
 	void *mem;
+	int pgcount = hv_root_partition ? 2 : 1;
+	int ret;
 
 	local_irq_save(flags);
 
@@ -171,6 +186,15 @@ int hv_common_cpu_die(unsigned int cpu)
 
 	local_irq_restore(flags);
 
+	if (hv_isolation_type_tdx()) {
+		ret = set_memory_encrypted((unsigned long)mem, pgcount);
+		if (ret)
+			pr_warn("Hyper-V: Failed to encrypt input arg on cpu%d: %d\n",
+				cpu, ret);
+		/* It's unsafe to free 'mem'. */
+		return 0;
+	}
+
 	kfree(mem);
 
 	return 0;
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index afcd9ae9588cb..83e56ebe0cb7f 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -58,6 +58,7 @@ extern void * __percpu *hyperv_pcpu_output_arg;
 extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
 extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
 extern bool hv_isolation_type_snp(void);
+extern bool hv_isolation_type_tdx(void);
 
 /* Helper functions that provide a consistent pattern for checking Hyper-V hypercall status. */
 static inline int hv_result(u64 status)
-- 
2.25.1



* [PATCH v4 5/6] Drivers: hv: vmbus: Support TDX guests
  2023-04-08 20:47 [PATCH v4 0/6] Support TDX guests on Hyper-V Dexuan Cui
                   ` (3 preceding siblings ...)
  2023-04-08 20:47 ` [PATCH v4 4/6] x86/hyperv: Support hypercalls for " Dexuan Cui
@ 2023-04-08 20:47 ` Dexuan Cui
  2023-04-11 16:53   ` Michael Kelley (LINUX)
  2023-04-12 14:05   ` Michael Kelley (LINUX)
  2023-04-08 20:47 ` [PATCH v4 6/6] x86/hyperv: Fix serial console interrupts for " Dexuan Cui
  5 siblings, 2 replies; 20+ messages in thread
From: Dexuan Cui @ 2023-04-08 20:47 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	haiyangz, hpa, jane.chu, kirill.shutemov, kys, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, mikelley
  Cc: linux-kernel, Tianyu.Lan, Dexuan Cui

Add Hyper-V specific code so that a TDX guest can run on Hyper-V:
  No need to use hv_vp_assist_page.
  Don't use the unsafe Hyper-V TSC page.
  Don't try to use HV_REGISTER_CRASH_CTL.
  Don't trust Hyper-V's TLB-flushing hypercalls.
  Don't use lazy EOI.
  Share SynIC Event/Message pages and VMBus Monitor pages with the host.
  Use pgprot_decrypted(PAGE_KERNEL) in hv_ringbuffer_init().

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Changes in v2:
  Used a new function hv_set_memory_enc_dec_needed() in
    __set_memory_enc_pgtable().
  Added the missing set_memory_encrypted() in hv_synic_free().

Changes in v3:
  Use pgprot_decrypted(PAGE_KERNEL) in hv_ringbuffer_init().
  (Do not use PAGE_KERNEL_NOENC, which doesn't exist for ARM64).

  Used cc_mkdec() in hv_synic_enable_regs().

  ms_hyperv_init_platform():
    Explicitly do not use HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED.
    Explicitly do not use HV_X64_APIC_ACCESS_RECOMMENDED.

  Enabled __send_ipi_mask() and __send_ipi_one() for TDX guests.

Changes in v4:
  A minor rebase to Michael's v7 DDA patchset. I'm very happy that
    I can drop my v3 change to arch/x86/mm/pat/set_memory.c due to
    Michael's work.

 arch/x86/hyperv/hv_apic.c      |  6 ++--
 arch/x86/hyperv/hv_init.c      | 19 ++++++++---
 arch/x86/kernel/cpu/mshyperv.c | 21 +++++++++++-
 drivers/hv/hv.c                | 62 +++++++++++++++++++++++++++++++---
 4 files changed, 96 insertions(+), 12 deletions(-)

diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index fb8b2c088681a..16919c7b3196e 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -173,7 +173,8 @@ static bool __send_ipi_mask(const struct cpumask *mask, int vector,
 	    (exclude_self && weight == 1 && cpumask_test_cpu(this_cpu, mask)))
 		return true;
 
-	if (!hv_hypercall_pg)
+	/* A TDX guest doesn't use hv_hypercall_pg. */
+	if (!hv_isolation_type_tdx() && !hv_hypercall_pg)
 		return false;
 
 	if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
@@ -227,7 +228,8 @@ static bool __send_ipi_one(int cpu, int vector)
 
 	trace_hyperv_send_ipi_one(cpu, vector);
 
-	if (!hv_hypercall_pg || (vp == VP_INVAL))
+	/* A TDX guest doesn't use hv_hypercall_pg. */
+	if ((!hv_isolation_type_tdx() && !hv_hypercall_pg) || (vp == VP_INVAL))
 		return false;
 
 	if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index f175e0de821c3..f28357ecad7d9 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -79,7 +79,7 @@ static int hyperv_init_ghcb(void)
 static int hv_cpu_init(unsigned int cpu)
 {
 	union hv_vp_assist_msr_contents msr = { 0 };
-	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
+	struct hv_vp_assist_page **hvp;
 	int ret;
 
 	ret = hv_common_cpu_init(cpu);
@@ -89,6 +89,7 @@ static int hv_cpu_init(unsigned int cpu)
 	if (!hv_vp_assist_page)
 		return 0;
 
+	hvp = &hv_vp_assist_page[cpu];
 	if (hv_root_partition) {
 		/*
 		 * For root partition we get the hypervisor provided VP assist
@@ -398,11 +399,21 @@ void __init hyperv_init(void)
 	if (hv_common_init())
 		return;
 
-	hv_vp_assist_page = kcalloc(num_possible_cpus(),
-				    sizeof(*hv_vp_assist_page), GFP_KERNEL);
+	/*
+	 * The VP assist page is useless to a TDX guest: the only use we
+	 * would have for it is lazy EOI, which can not be used with TDX.
+	 */
+	if (hv_isolation_type_tdx())
+		hv_vp_assist_page = NULL;
+	else
+		hv_vp_assist_page = kcalloc(num_possible_cpus(),
+					    sizeof(*hv_vp_assist_page),
+					    GFP_KERNEL);
 	if (!hv_vp_assist_page) {
 		ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
-		goto common_free;
+
+		if (!hv_isolation_type_tdx())
+			goto common_free;
 	}
 
 	if (hv_isolation_type_snp()) {
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index a87fb934cd4b4..e9106c9d92f81 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -405,8 +405,27 @@ static void __init ms_hyperv_init_platform(void)
 
 		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
 			static_branch_enable(&isolation_type_snp);
-		else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX)
+		else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
 			static_branch_enable(&isolation_type_tdx);
+
+			/*
+			 * The GPAs of SynIC Event/Message pages and VMBus
+			 * Monitor pages need to be added by this offset.
+			 */
+			ms_hyperv.shared_gpa_boundary = cc_mkdec(0);
+
+			/* Don't use the unsafe Hyper-V TSC page */
+			ms_hyperv.features &= ~HV_MSR_REFERENCE_TSC_AVAILABLE;
+
+			/* HV_REGISTER_CRASH_CTL is unsupported */
+			ms_hyperv.misc_features &= ~HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
+
+			/* Don't trust Hyper-V's TLB-flushing hypercalls */
+			ms_hyperv.hints &= ~HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
+
+			/* A TDX VM must use x2APIC and doesn't use lazy EOI */
+			ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
+		}
 	}
 
 	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 008234894d287..22ecb79d21efd 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -18,6 +18,7 @@
 #include <linux/clockchips.h>
 #include <linux/delay.h>
 #include <linux/interrupt.h>
+#include <linux/set_memory.h>
 #include <clocksource/hyperv_timer.h>
 #include <asm/mshyperv.h>
 #include "hyperv_vmbus.h"
@@ -119,6 +120,7 @@ int hv_synic_alloc(void)
 {
 	int cpu;
 	struct hv_per_cpu_context *hv_cpu;
+	int ret = -ENOMEM;
 
 	/*
 	 * First, zero all per-cpu memory areas so hv_synic_free() can
@@ -168,6 +170,30 @@ int hv_synic_alloc(void)
 			pr_err("Unable to allocate post msg page\n");
 			goto err;
 		}
+
+
+		if (hv_isolation_type_tdx()) {
+			ret = set_memory_decrypted(
+				(unsigned long)hv_cpu->synic_message_page, 1);
+			if (ret) {
+				pr_err("Failed to decrypt SYNIC msg page\n");
+				goto err;
+			}
+
+			ret = set_memory_decrypted(
+				(unsigned long)hv_cpu->synic_event_page, 1);
+			if (ret) {
+				pr_err("Failed to decrypt SYNIC event page\n");
+				goto err;
+			}
+
+			ret = set_memory_decrypted(
+				(unsigned long)hv_cpu->post_msg_page, 1);
+			if (ret) {
+				pr_err("Failed to decrypt post msg page\n");
+				goto err;
+			}
+		}
 	}
 
 	return 0;
@@ -176,18 +202,42 @@ int hv_synic_alloc(void)
 	 * Any memory allocations that succeeded will be freed when
 	 * the caller cleans up by calling hv_synic_free()
 	 */
-	return -ENOMEM;
+	return ret;
 }
 
 
 void hv_synic_free(void)
 {
 	int cpu;
+	int ret;
 
 	for_each_present_cpu(cpu) {
 		struct hv_per_cpu_context *hv_cpu
 			= per_cpu_ptr(hv_context.cpu_context, cpu);
 
+		if (hv_isolation_type_tdx()) {
+			ret = set_memory_encrypted(
+				(unsigned long)hv_cpu->synic_message_page, 1);
+			if (ret) {
+				pr_err("Failed to encrypt SYNIC msg page\n");
+				continue;
+			}
+
+			ret = set_memory_encrypted(
+				(unsigned long)hv_cpu->synic_event_page, 1);
+			if (ret) {
+				pr_err("Failed to encrypt SYNIC event page\n");
+				continue;
+			}
+
+			ret = set_memory_encrypted(
+				(unsigned long)hv_cpu->post_msg_page, 1);
+			if (ret) {
+				pr_err("Failed to encrypt post msg page\n");
+				continue;
+			}
+		}
+
 		free_page((unsigned long)hv_cpu->synic_event_page);
 		free_page((unsigned long)hv_cpu->synic_message_page);
 		free_page((unsigned long)hv_cpu->post_msg_page);
@@ -225,8 +275,9 @@ void hv_synic_enable_regs(unsigned int cpu)
 		if (!hv_cpu->synic_message_page)
 			pr_err("Fail to map synic message page.\n");
 	} else {
-		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
-			>> HV_HYP_PAGE_SHIFT;
+		simp.base_simp_gpa =
+			cc_mkdec(virt_to_phys(hv_cpu->synic_message_page)) >>
+			HV_HYP_PAGE_SHIFT;
 	}
 
 	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
@@ -244,8 +295,9 @@ void hv_synic_enable_regs(unsigned int cpu)
 		if (!hv_cpu->synic_event_page)
 			pr_err("Fail to map synic event page.\n");
 	} else {
-		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
-			>> HV_HYP_PAGE_SHIFT;
+		siefp.base_siefp_gpa =
+			cc_mkdec(virt_to_phys(hv_cpu->synic_event_page)) >>
+			HV_HYP_PAGE_SHIFT;
 	}
 
 	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
-- 
2.25.1
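
To illustrate the cc_mkdec() changes in hv_synic_enable_regs() above, here
is a minimal userspace sketch of how the shared bit ends up in the page
number written to the SynIC registers. The names mkdec(), synic_base_gpa()
and SHARED_MASK are made up for this example, and bit 47 is only an assumed
position for the shared bit; on a real TDX guest it depends on the guest
physical address width.

```c
#include <assert.h>
#include <stdint.h>

#define HYP_PAGE_SHIFT 12

/*
 * Assumption for illustration only: the shared bit is bit 47. On real TDX
 * guests the position depends on the guest physical address width.
 */
#define SHARED_MASK (UINT64_C(1) << 47)

/* Model of cc_mkdec() for TDX: mark an address as shared (decrypted). */
static uint64_t mkdec(uint64_t gpa)
{
	return gpa | SHARED_MASK;
}

/* Page number to program into a SynIC register for a page-aligned address. */
static uint64_t synic_base_gpa(uint64_t phys)
{
	/* The register field holds a page number, so require alignment. */
	assert((phys & ((UINT64_C(1) << HYP_PAGE_SHIFT) - 1)) == 0);
	return mkdec(phys) >> HYP_PAGE_SHIFT;
}
```

The shift happens after the shared bit is set, so the bit survives in the
page number. mkdec(0) yields the bare mask, which is what the patch stores
in ms_hyperv.shared_gpa_boundary.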


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v4 6/6] x86/hyperv: Fix serial console interrupts for TDX guests
  2023-04-08 20:47 [PATCH v4 0/6] Support TDX guests on Hyper-V Dexuan Cui
                   ` (4 preceding siblings ...)
  2023-04-08 20:47 ` [PATCH v4 5/6] Drivers: hv: vmbus: Support " Dexuan Cui
@ 2023-04-08 20:47 ` Dexuan Cui
  2023-04-11 17:13   ` Michael Kelley (LINUX)
  5 siblings, 1 reply; 20+ messages in thread
From: Dexuan Cui @ 2023-04-08 20:47 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	haiyangz, hpa, jane.chu, kirill.shutemov, kys, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, mikelley
  Cc: linux-kernel, Tianyu.Lan, Dexuan Cui

When a TDX guest runs on Hyper-V, the UEFI firmware sets the HW_REDUCED
flag, and consequently ttyS0 interrupts can't work. Fix the issue by
overriding x86_init.acpi.reduced_hw_early_init().

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Changes since v1:
    None.

 arch/x86/kernel/cpu/mshyperv.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e9106c9d92f81..deedced0f2bb0 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -318,6 +318,26 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
 }
 #endif
 
+/*
+ * When a TDX guest runs on Hyper-V, the firmware sets the HW_REDUCED flag: see
+ * acpi_tb_create_local_fadt(). Consequently ttyS0 interrupts can't work because
+ * request_irq() -> ... -> irq_to_desc() returns NULL for ttyS0. This happens
+ * because mp_config_acpi_legacy_irqs() sees a nr_legacy_irqs() of 0, so it
+ * doesn't initialize the array 'mp_irqs[]', and later setup_IO_APIC_irqs() ->
+ * find_irq_entry() fails to find the legacy irqs from the array, and hence
+ * doesn't create the necessary irq description info.
+ *
+ * This is a copy of acpi_generic_reduced_hw_init() from
+ * arch/x86/kernel/acpi/boot.c that doesn't change 'legacy_pic', so it keeps
+ * its default value 'default_legacy_pic'. mp_config_acpi_legacy_irqs() then
+ * sees a non-zero nr_legacy_irqs(), and serial console interrupts work.
+ */
+static void __init reduced_hw_init(void)
+{
+	x86_init.timers.timer_init	= x86_init_noop;
+	x86_init.irqs.pre_vector_init	= x86_init_noop;
+}
+
 static void __init ms_hyperv_init_platform(void)
 {
 	int hv_max_functions_eax;
@@ -425,6 +445,8 @@ static void __init ms_hyperv_init_platform(void)
 
 			/* A TDX VM must use x2APIC and doesn't use lazy EOI */
 			ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
+
+			x86_init.acpi.reduced_hw_early_init = reduced_hw_init;
 		}
 	}
 
-- 
2.25.1
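
The effect of overriding the hook can be modelled in userspace. The sketch
below is a simplified model, not kernel code: struct legacy_pic_model,
boot_with_hw_reduced() and the two hook functions are hypothetical names,
and the timer_init/pre_vector_init assignments from the patch are left out
because they do not affect the legacy IRQ count.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of the kernel's legacy PIC descriptor. */
struct legacy_pic_model {
	int nr_legacy_irqs;
};

static struct legacy_pic_model default_pic = { .nr_legacy_irqs = 16 };
static struct legacy_pic_model null_pic    = { .nr_legacy_irqs = 0 };

/* Current selection; starts at the default, as in the kernel. */
static struct legacy_pic_model *legacy_pic = &default_pic;

/* Model of the generic hook: switches to the null PIC. */
static void generic_reduced_hw_init(void)
{
	legacy_pic = &null_pic;
}

/* Model of the Hyper-V TDX override: leaves legacy_pic untouched. */
static void hv_tdx_reduced_hw_init(void)
{
}

/* Hook slot, analogous to x86_init.acpi.reduced_hw_early_init. */
static void (*reduced_hw_early_init)(void) = generic_reduced_hw_init;

/* Run the hook from a fresh state and report the legacy IRQ count. */
static int boot_with_hw_reduced(void)
{
	assert(reduced_hw_early_init != NULL);
	legacy_pic = &default_pic;
	reduced_hw_early_init();
	return legacy_pic->nr_legacy_irqs;
}
```

With the generic hook the model reports zero legacy IRQs, which is the case
where ttyS0 gets no irq descriptor. With the override it keeps reporting
the default count.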


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* RE: [PATCH v4 1/6] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed
  2023-04-08 20:47 ` [PATCH v4 1/6] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed Dexuan Cui
@ 2023-04-11 16:15   ` Michael Kelley (LINUX)
  0 siblings, 0 replies; 20+ messages in thread
From: Michael Kelley (LINUX) @ 2023-04-11 16:15 UTC (permalink / raw)
  To: Dexuan Cui, ak, arnd, bp, brijesh.singh, dan.j.williams,
	dave.hansen, Haiyang Zhang, hpa, jane.chu, kirill.shutemov,
	KY Srinivasan, linux-arch, linux-hyperv, luto, mingo, peterz,
	rostedt, sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck,
	wei.liu, x86
  Cc: linux-kernel, Tianyu Lan

From: Dexuan Cui <decui@microsoft.com> Sent: Saturday, April 8, 2023 1:48 PM
> 
> GHCI spec for TDX 1.0 says that the MapGPA call may fail with the R10
> error code = TDG.VP.VMCALL_RETRY (1), and the guest must retry this
> operation for the pages in the region starting at the GPA specified
> in R11.
> 
> When a TDX guest runs on Hyper-V, Hyper-V returns the retry error
> when hyperv_init() -> swiotlb_update_mem_attributes() ->
> set_memory_decrypted() decrypts up to 1GB of swiotlb bounce buffers.
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  arch/x86/coco/tdx/tdx.c | 64 +++++++++++++++++++++++++++++++++--------
>  1 file changed, 52 insertions(+), 12 deletions(-)
> 
> Changes in v2:
>   Used __tdx_hypercall() directly in tdx_map_gpa().
>   Added a max_retry_cnt of 1000.
>   Renamed a few variables, e.g., r11 -> map_fail_paddr.
> 
> Changes in v3:
>   Changed max_retry_cnt from 1000 to 3.
> 
> Changes in v4:
>   __tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT) -> __tdx_hypercall_ret()
>   Added Kirill's Acked-by.
> 
> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index 4c4c6db39eca3..5574c91541a2d 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -28,6 +28,8 @@
>  #define TDVMCALL_MAP_GPA		0x10001
>  #define TDVMCALL_REPORT_FATAL_ERROR	0x10003
> 
> +#define TDVMCALL_STATUS_RETRY		1
> +
>  /* MMIO direction */
>  #define EPT_READ	0
>  #define EPT_WRITE	1
> @@ -788,14 +790,15 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len,
>  }
> 
>  /*
> - * Inform the VMM of the guest's intent for this physical page: shared with
> - * the VMM or private to the guest.  The VMM is expected to change its mapping
> - * of the page in response.
> + * Notify the VMM about page mapping conversion. More info about ABI
> + * can be found in TDX Guest-Host-Communication Interface (GHCI),
> + * section "TDG.VP.VMCALL<MapGPA>".
>   */
> -static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
> +static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
>  {
> -	phys_addr_t start = __pa(vaddr);
> -	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
> +	int max_retry_cnt = 3, retry_cnt = 0;
> +	struct tdx_hypercall_args args;
> +	u64 map_fail_paddr, ret;
> 
>  	if (!enc) {
>  		/* Set the shared (decrypted) bits: */
> @@ -803,12 +806,49 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
>  		end   |= cc_mkdec(0);
>  	}
> 
> -	/*
> -	 * Notify the VMM about page mapping conversion. More info about ABI
> -	 * can be found in TDX Guest-Host-Communication Interface (GHCI),
> -	 * section "TDG.VP.VMCALL<MapGPA>"
> -	 */
> -	if (_tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0))
> +	while (1) {
> +		memset(&args, 0, sizeof(args));
> +		args.r10 = TDX_HYPERCALL_STANDARD;
> +		args.r11 = TDVMCALL_MAP_GPA;
> +		args.r12 = start;
> +		args.r13 = end - start;
> +
> +		ret = __tdx_hypercall_ret(&args);
> +		if (ret != TDVMCALL_STATUS_RETRY)
> +			break;
> +		/*
> +		 * The guest must retry the operation for the pages in the
> +		 * region starting at the GPA specified in R11. Make sure R11
> +		 * contains a sane value.
> +		 */
> +		map_fail_paddr = args.r11;
> +		if (map_fail_paddr < start || map_fail_paddr >= end)
> +			return false;
> +
> +		if (map_fail_paddr == start) {
> +			retry_cnt++;
> +			if (retry_cnt > max_retry_cnt)
> +				return false;
> +		} else {
> +			retry_cnt = 0;
> +			start = map_fail_paddr;
> +		}
> +	}
> +
> +	return !ret;
> +}
> +
> +/*
> + * Inform the VMM of the guest's intent for this physical page: shared with
> + * the VMM or private to the guest. The VMM is expected to change its mapping
> + * of the page in response.
> + */
> +static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
> +{
> +	phys_addr_t start = __pa(vaddr);
> +	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
> +
> +	if (!tdx_map_gpa(start, end, enc))
>  		return false;
> 
>  	/* private->shared conversion  requires only MapGPA call */
> --
> 2.25.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
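
For readers following the retry rule in the quoted patch, here is a minimal
userspace sketch of the same control flow. It is not the kernel code:
struct mock_vmm and mock_map_gpa() are hypothetical stand-ins for the host
side, and the shared-bit setup done by tdx_map_gpa() is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_OK     0u
#define STATUS_RETRY  1u   /* mirrors TDVMCALL_STATUS_RETRY in the patch */
#define PAGE_SZ       4096u

/* Hypothetical host model: converts at most 'budget_pages' per call. */
struct mock_vmm {
	uint64_t budget_pages;   /* 0 means the host never makes progress */
	unsigned int calls;      /* number of calls observed */
};

/*
 * Mock of the MapGPA call. On partial progress it returns STATUS_RETRY and
 * stores the first unconverted address in *fail_paddr (R11 in the real ABI).
 */
static uint64_t mock_map_gpa(struct mock_vmm *vmm, uint64_t start,
			     uint64_t size, uint64_t *fail_paddr)
{
	uint64_t done = vmm->budget_pages * PAGE_SZ;

	vmm->calls++;
	if (done >= size)
		return STATUS_OK;
	*fail_paddr = start + done;
	return STATUS_RETRY;
}

/* Same control flow as tdx_map_gpa() in the patch. */
static bool map_gpa_with_retry(struct mock_vmm *vmm, uint64_t start,
			       uint64_t end)
{
	const int max_retry_cnt = 3;
	int retry_cnt = 0;
	uint64_t fail_paddr = 0, ret;

	assert(start < end);

	for (;;) {
		ret = mock_map_gpa(vmm, start, end - start, &fail_paddr);
		if (ret != STATUS_RETRY)
			break;

		/* Reject a resume address outside the requested range. */
		if (fail_paddr < start || fail_paddr >= end)
			return false;

		if (fail_paddr == start) {
			/* No progress: give up after max_retry_cnt retries. */
			if (++retry_cnt > max_retry_cnt)
				return false;
		} else {
			/* Progress: resume from the failing address. */
			retry_cnt = 0;
			start = fail_paddr;
		}
	}

	return ret == STATUS_OK;
}
```

The retry counter only advances when the host reports no progress at all,
so a large region that the host converts in several chunks is not cut off
by the limit of 3.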


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v4 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed()
  2023-04-08 20:47 ` [PATCH v4 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed() Dexuan Cui
@ 2023-04-11 16:28   ` Michael Kelley (LINUX)
  2023-04-12 15:19     ` kirill.shutemov
  2023-04-12 15:20     ` kirill.shutemov
  0 siblings, 2 replies; 20+ messages in thread
From: Michael Kelley (LINUX) @ 2023-04-11 16:28 UTC (permalink / raw)
  To: Dexuan Cui, ak, arnd, bp, brijesh.singh, dan.j.williams,
	dave.hansen, Haiyang Zhang, hpa, jane.chu, kirill.shutemov,
	KY Srinivasan, linux-arch, linux-hyperv, luto, mingo, peterz,
	rostedt, sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck,
	wei.liu, x86
  Cc: linux-kernel, Tianyu Lan

From: Dexuan Cui <decui@microsoft.com> Sent: Saturday, April 8, 2023 1:48 PM
> 
> When a TDX guest runs on Hyper-V, the hv_netvsc driver's netvsc_init_buf()
> allocates buffers using vzalloc(), and needs to share the buffers with the
> host OS by calling set_memory_decrypted(), which is not working for
> vmalloc() yet. Add the support by handling the pages one by one.
> 
> Co-developed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  arch/x86/coco/tdx/tdx.c | 76 ++++++++++++++++++++++++++++-------------
>  1 file changed, 52 insertions(+), 24 deletions(-)
> 
> Changes in v2:
>   Changed tdx_enc_status_changed() in place.
> 
> Hi, Dave, I checked the huge vmalloc mapping code, but still don't know
> how to get the underlying huge page info (if huge page is in use) and
> try to use PG_LEVEL_2M/1G in try_accept_page() for vmalloc: I checked
> is_vm_area_hugepages() and  __vfree() -> __vunmap(), and I think the
> underlying page allocation info is internal to the mm code, and there
> is no mm API for me to get the info in tdx_enc_status_changed().
> 
> Changes in v3:
>   No change since v2.
> 
> Changes in v4:
>   Added Kirill's Co-developed-by since Kirill helped to improve the
>     code by adding tdx_enc_status_changed_phys().
> 
>   Thanks Kirill for the clarification on load_unaligned_zeropad()!
> 
>   The vzalloc() usage in drivers/net/hyperv/netvsc.c: netvsc_init_buf()
>     remains the same. It may not be worth it to "allocate a vmalloc region,
>     allocate pages manually", because we have to consider the worst case
>     where the system is suffering from severe memory fragmentation and
>     we can only allocate multiple single pages. We may not want to
>     complicate the code in netvsc_init_buf(). We'll support NIC SR-IOV
>     for TDX VMs on Hyper-V, so the netvsc send/recv buffers won't be
>     used when the VF NIC is up.
> 
> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index 5574c91541a2d..731be50b3d093 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -7,6 +7,7 @@
>  #include <linux/cpufeature.h>
>  #include <linux/export.h>
>  #include <linux/io.h>
> +#include <linux/mm.h>
>  #include <asm/coco.h>
>  #include <asm/tdx.h>
>  #include <asm/vmx.h>
> @@ -789,6 +790,34 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len,
>  	return true;
>  }
> 
> +static bool try_accept_page(phys_addr_t start, phys_addr_t end)
> +{
> +	/*
> +	 * For shared->private conversion, accept the page using
> +	 * TDX_ACCEPT_PAGE TDX module call.
> +	 */
> +	while (start < end) {
> +		unsigned long len = end - start;
> +
> +		/*
> +		 * Try larger accepts first. It gives chance to VMM to keep
> +		 * 1G/2M SEPT entries where possible and speeds up process by
> +		 * cutting number of hypercalls (if successful).
> +		 */
> +
> +		if (try_accept_one(&start, len, PG_LEVEL_1G))
> +			continue;
> +
> +		if (try_accept_one(&start, len, PG_LEVEL_2M))
> +			continue;
> +
> +		if (!try_accept_one(&start, len, PG_LEVEL_4K))
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
>  /*
>   * Notify the VMM about page mapping conversion. More info about ABI
>   * can be found in TDX Guest-Host-Communication Interface (GHCI),
> @@ -838,6 +867,19 @@ static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
>  	return !ret;
>  }
> 
> +static bool tdx_enc_status_changed_phys(phys_addr_t start, phys_addr_t end,
> +					bool enc)
> +{
> +	if (!tdx_map_gpa(start, end, enc))
> +		return false;
> +
> +	/* private->shared conversion requires only MapGPA call */
> +	if (!enc)
> +		return true;
> +
> +	return try_accept_page(start, end);
> +}
> +
>  /*
>   * Inform the VMM of the guest's intent for this physical page: shared with
>   * the VMM or private to the guest. The VMM is expected to change its mapping
> @@ -845,37 +887,23 @@ static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
>   */
>  static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
>  {
> -	phys_addr_t start = __pa(vaddr);
> -	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
> +	unsigned long start = vaddr;
> +	unsigned long end = start + numpages * PAGE_SIZE;
> 
> -	if (!tdx_map_gpa(start, end, enc))
> +	if (offset_in_page(start) != 0)
>  		return false;
> 
> -	/* private->shared conversion  requires only MapGPA call */
> -	if (!enc)
> -		return true;
> +	if (!is_vmalloc_addr((void *)start))
> +		return tdx_enc_status_changed_phys(__pa(start), __pa(end), enc);
> 
> -	/*
> -	 * For shared->private conversion, accept the page using
> -	 * TDX_ACCEPT_PAGE TDX module call.
> -	 */
>  	while (start < end) {
> -		unsigned long len = end - start;
> +		phys_addr_t start_pa = slow_virt_to_phys((void *)start);

I've usually seen "page_to_phys(vmalloc_to_page(start))" for getting
the physical address corresponding to a vmalloc() address.  For example,
see virt_to_hvpfn(), kvm_kaddr_to_phys(), nx842_get_pa(),
rproc_va_to_pa(), etc.

Does anyone know if there is a reason to use one vs. the other?
slow_virt_to_phys() is x86-only, but that's not a problem here.

Otherwise,
Reviewed-by: Michael Kelley <mikelley@microsoft.com>

> +		phys_addr_t end_pa = start_pa + PAGE_SIZE;
> 
> -		/*
> -		 * Try larger accepts first. It gives chance to VMM to keep
> -		 * 1G/2M SEPT entries where possible and speeds up process by
> -		 * cutting number of hypercalls (if successful).
> -		 */
> -
> -		if (try_accept_one(&start, len, PG_LEVEL_1G))
> -			continue;
> -
> -		if (try_accept_one(&start, len, PG_LEVEL_2M))
> -			continue;
> -
> -		if (!try_accept_one(&start, len, PG_LEVEL_4K))
> +		if (!tdx_enc_status_changed_phys(start_pa, end_pa, enc))
>  			return false;
> +
> +		start += PAGE_SIZE;
>  	}
> 
>  	return true;
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v4 3/6] x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
  2023-04-08 20:47 ` [PATCH v4 3/6] x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests Dexuan Cui
@ 2023-04-11 16:30   ` Michael Kelley (LINUX)
  0 siblings, 0 replies; 20+ messages in thread
From: Michael Kelley (LINUX) @ 2023-04-11 16:30 UTC (permalink / raw)
  To: Dexuan Cui, ak, arnd, bp, brijesh.singh, dan.j.williams,
	dave.hansen, Haiyang Zhang, hpa, jane.chu, kirill.shutemov,
	KY Srinivasan, linux-arch, linux-hyperv, luto, mingo, peterz,
	rostedt, sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck,
	wei.liu, x86
  Cc: linux-kernel, Tianyu Lan

From: Dexuan Cui <decui@microsoft.com> Sent: Saturday, April 8, 2023 1:48 PM
> 
> No logic change to SNP/VBS guests.
> 
> hv_isolation_type_tdx() will be used to instruct a TDX guest on Hyper-V to
> do some TDX-specific operations, e.g. hv_do_hypercall() should use
> __tdx_hypercall(), and a TDX guest on Hyper-V should handle the Hyper-V
> Event/Message/Monitor pages specially.
> 
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  arch/x86/hyperv/ivm.c              | 6 ++++++
>  arch/x86/include/asm/hyperv-tlfs.h | 3 ++-
>  arch/x86/include/asm/mshyperv.h    | 3 +++
>  arch/x86/kernel/cpu/mshyperv.c     | 2 ++
>  drivers/hv/hv_common.c             | 6 ++++++
>  5 files changed, 19 insertions(+), 1 deletion(-)
> 
> Changes in v2:
>   Added "#ifdef CONFIG_INTEL_TDX_GUEST and #endif" for
>     hv_isolation_type_tdx() in arch/x86/hyperv/ivm.c.
> 
>     Simplified the changes in ms_hyperv_init_platform().
> 
> Changes in v3:
>   Added Kuppuswamy's Reviewed-by.
> 
> Changes in v4:
>   A minor rebase to Michael's v7 DDA patchset.
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 127d5b7b63de1..3658ade4f4121 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -400,6 +400,7 @@ bool hv_is_isolation_supported(void)
>  }
> 
>  DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
> +DEFINE_STATIC_KEY_FALSE(isolation_type_tdx);
> 
>  /*
>   * hv_isolation_type_snp - Check system runs in the AMD SEV-SNP based
> @@ -409,3 +410,8 @@ bool hv_isolation_type_snp(void)
>  {
>  	return static_branch_unlikely(&isolation_type_snp);
>  }
> +
> +bool hv_isolation_type_tdx(void)
> +{
> +	return static_branch_unlikely(&isolation_type_tdx);
> +}
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index b4fb75bd10138..338f383c721c9 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -169,7 +169,8 @@
>  enum hv_isolation_type {
>  	HV_ISOLATION_TYPE_NONE	= 0,
>  	HV_ISOLATION_TYPE_VBS	= 1,
> -	HV_ISOLATION_TYPE_SNP	= 2
> +	HV_ISOLATION_TYPE_SNP	= 2,
> +	HV_ISOLATION_TYPE_TDX	= 3
>  };
> 
>  /* Hyper-V specific model specific registers (MSRs) */
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index e3cef98a01420..de7ceae9e65e9 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -22,6 +22,7 @@
>  union hv_ghcb;
> 
>  DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
> +DECLARE_STATIC_KEY_FALSE(isolation_type_tdx);
> 
>  typedef int (*hyperv_fill_flush_list_func)(
>  		struct hv_guest_mapping_flush_list *flush,
> @@ -38,6 +39,8 @@ extern u64 hv_current_partition_id;
> 
>  extern union hv_ghcb * __percpu *hv_ghcb_pg;
> 
> +extern bool hv_isolation_type_tdx(void);
> +
>  int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
>  int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
>  int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index ff348ebb6ae28..a87fb934cd4b4 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -405,6 +405,8 @@ static void __init ms_hyperv_init_platform(void)
> 
>  		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
>  			static_branch_enable(&isolation_type_snp);
> +		else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX)
> +			static_branch_enable(&isolation_type_tdx);
>  	}
> 
>  	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index 6d40b6c7b23b9..c55db7ea6580b 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -271,6 +271,12 @@ bool __weak hv_isolation_type_snp(void)
>  }
>  EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
> 
> +bool __weak hv_isolation_type_tdx(void)
> +{
> +	return false;
> +}
> +EXPORT_SYMBOL_GPL(hv_isolation_type_tdx);
> +
>  void __weak hv_setup_vmbus_handler(void (*handler)(void))
>  {
>  }
> --
> 2.25.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v4 4/6] x86/hyperv: Support hypercalls for TDX guests
  2023-04-08 20:47 ` [PATCH v4 4/6] x86/hyperv: Support hypercalls for " Dexuan Cui
@ 2023-04-11 16:36   ` Michael Kelley (LINUX)
  0 siblings, 0 replies; 20+ messages in thread
From: Michael Kelley (LINUX) @ 2023-04-11 16:36 UTC (permalink / raw)
  To: Dexuan Cui, ak, arnd, bp, brijesh.singh, dan.j.williams,
	dave.hansen, Haiyang Zhang, hpa, jane.chu, kirill.shutemov,
	KY Srinivasan, linux-arch, linux-hyperv, luto, mingo, peterz,
	rostedt, sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck,
	wei.liu, x86
  Cc: linux-kernel, Tianyu Lan

From: Dexuan Cui <decui@microsoft.com> Sent: Saturday, April 8, 2023 1:48 PM
> 
> A TDX guest uses the GHCI call rather than hv_hypercall_pg.
> 
> In hv_do_hypercall(), Hyper-V requires that the input/output addresses
> must have the cc_mask.
> 
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
> Changes in v2:
>   Implemented hv_tdx_hypercall() in C rather than in assembly code.
>   Renamed the parameter names of hv_tdx_hypercall().
>   Used cc_mkdec() directly in hv_do_hypercall().
> 
> Changes in v3:
>   Decrypted/encrypted hyperv_pcpu_input_arg in
>     hv_common_cpu_init() and hv_common_cpu_die().
> 
> Changes in v4:
>   __tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT) -> __tdx_hypercall_ret()
>   hv_common_cpu_die(): explicitly ignore the error from set_memory_encrypted() [Michael Kelley]
>   Added Sathyanarayanan's Reviewed-by.
> 
>  arch/x86/hyperv/hv_init.c       |  8 ++++++++
>  arch/x86/hyperv/ivm.c           | 14 ++++++++++++++
>  arch/x86/include/asm/mshyperv.h | 17 +++++++++++++++++
>  drivers/hv/hv_common.c          | 24 ++++++++++++++++++++++++
>  include/asm-generic/mshyperv.h  |  1 +
>  5 files changed, 64 insertions(+)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index a5f9474f08e12..f175e0de821c3 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -432,6 +432,10 @@ void __init hyperv_init(void)
>  	/* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
>  	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
> 
> +	/* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
> +	if (hv_isolation_type_tdx())
> +		goto skip_hypercall_pg_init;
> +
>  	hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
>  			VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
>  			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
> @@ -471,6 +475,7 @@ void __init hyperv_init(void)
>  		wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>  	}
> 
> +skip_hypercall_pg_init:
>  	/*
>  	 * hyperv_init() is called before LAPIC is initialized: see
>  	 * apic_intr_mode_init() -> x86_platform.apic_post_init() and
> @@ -594,6 +599,9 @@ bool hv_is_hyperv_initialized(void)
>  	if (x86_hyper_type != X86_HYPER_MS_HYPERV)
>  		return false;
> 
> +	/* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
> +	if (hv_isolation_type_tdx())
> +		return true;
>  	/*
>  	 * Verify that earlier initialization succeeded by checking
>  	 * that the hypercall page is setup
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 3658ade4f4121..23304c9ddd34f 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -415,3 +415,17 @@ bool hv_isolation_type_tdx(void)
>  {
>  	return static_branch_unlikely(&isolation_type_tdx);
>  }
> +
> +u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2)
> +{
> +	struct tdx_hypercall_args args = { };
> +
> +	args.r10 = control;
> +	args.rdx = param1;
> +	args.r8  = param2;
> +
> +	(void)__tdx_hypercall_ret(&args);
> +
> +	return args.r11;
> +}
> +EXPORT_SYMBOL_GPL(hv_tdx_hypercall);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index de7ceae9e65e9..71077326f57bc 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -10,6 +10,7 @@
>  #include <asm/nospec-branch.h>
>  #include <asm/paravirt.h>
>  #include <asm/mshyperv.h>
> +#include <asm/coco.h>
> 
>  /*
>   * Hyper-V always provides a single IO-APIC at this MMIO address.
> @@ -45,6 +46,12 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
>  int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
>  int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
> 
> +u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
> +
> +/*
> + * If the hypercall involves no input or output parameters, the hypervisor
> + * ignores the corresponding GPA pointer.
> + */
>  static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  {
>  	u64 input_address = input ? virt_to_phys(input) : 0;
> @@ -52,6 +59,10 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  	u64 hv_status;
> 
>  #ifdef CONFIG_X86_64
> +	if (hv_isolation_type_tdx())
> +		return hv_tdx_hypercall(control,
> +					cc_mkdec(input_address),
> +					cc_mkdec(output_address));
>  	if (!hv_hypercall_pg)
>  		return U64_MAX;
> 
> @@ -95,6 +106,9 @@ static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
>  	u64 hv_status;
> 
>  #ifdef CONFIG_X86_64
> +	if (hv_isolation_type_tdx())
> +		return hv_tdx_hypercall(control, input1, 0);
> +
>  	{
>  		__asm__ __volatile__(CALL_NOSPEC
>  				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> @@ -140,6 +154,9 @@ static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
>  	u64 hv_status;
> 
>  #ifdef CONFIG_X86_64
> +	if (hv_isolation_type_tdx())
> +		return hv_tdx_hypercall(control, input1, input2);
> +
>  	{
>  		__asm__ __volatile__("mov %4, %%r8\n"
>  				     CALL_NOSPEC
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index c55db7ea6580b..10e85682e83eb 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -21,6 +21,7 @@
>  #include <linux/ptrace.h>
>  #include <linux/slab.h>
>  #include <linux/dma-map-ops.h>
> +#include <linux/set_memory.h>
>  #include <asm/hyperv-tlfs.h>
>  #include <asm/mshyperv.h>
> 
> @@ -128,6 +129,7 @@ int hv_common_cpu_init(unsigned int cpu)
>  	u64 msr_vp_index;
>  	gfp_t flags;
>  	int pgcount = hv_root_partition ? 2 : 1;
> +	int ret;
> 
>  	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
>  	flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
> @@ -137,6 +139,17 @@ int hv_common_cpu_init(unsigned int cpu)
>  	if (!(*inputarg))
>  		return -ENOMEM;
> 
> +	if (hv_isolation_type_tdx()) {
> +		ret = set_memory_decrypted((unsigned long)*inputarg, pgcount);
> +		if (ret) {
> +			/* It may be unsafe to free *inputarg */
> +			*inputarg = NULL;
> +			return ret;
> +		}
> +
> +		memset(*inputarg, 0x00, pgcount * HV_HYP_PAGE_SIZE);
> +	}
> +
>  	if (hv_root_partition) {
>  		outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
>  		*outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
> @@ -157,6 +170,8 @@ int hv_common_cpu_die(unsigned int cpu)
>  	unsigned long flags;
>  	void **inputarg, **outputarg;
>  	void *mem;
> +	int pgcount = hv_root_partition ? 2 : 1;
> +	int ret;
> 
>  	local_irq_save(flags);
> 
> @@ -171,6 +186,15 @@ int hv_common_cpu_die(unsigned int cpu)
> 
>  	local_irq_restore(flags);
> 
> +	if (hv_isolation_type_tdx()) {
> +		ret = set_memory_encrypted((unsigned long)mem, pgcount);
> +		if (ret)
> +			pr_warn("Hyper-V: Failed to encrypt input arg on cpu%d: %d\n",
> +				cpu, ret);
> +		/* It's unsafe to free 'mem'. */
> +		return 0;
> +	}
> +
>  	kfree(mem);
> 
>  	return 0;
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index afcd9ae9588cb..83e56ebe0cb7f 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -58,6 +58,7 @@ extern void * __percpu *hyperv_pcpu_output_arg;
>  extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
>  extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
>  extern bool hv_isolation_type_snp(void);
> +extern bool hv_isolation_type_tdx(void);
> 
>  /* Helper functions that provide a consistent pattern for checking Hyper-V hypercall status. */
>  static inline int hv_result(u64 status)
> --
> 2.25.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v4 5/6] Drivers: hv: vmbus: Support TDX guests
  2023-04-08 20:47 ` [PATCH v4 5/6] Drivers: hv: vmbus: Support " Dexuan Cui
@ 2023-04-11 16:53   ` Michael Kelley (LINUX)
  2023-04-22  1:05     ` Dexuan Cui
  2023-04-12 14:05   ` Michael Kelley (LINUX)
  1 sibling, 1 reply; 20+ messages in thread
From: Michael Kelley (LINUX) @ 2023-04-11 16:53 UTC (permalink / raw)
  To: Dexuan Cui, ak, arnd, bp, brijesh.singh, dan.j.williams,
	dave.hansen, Haiyang Zhang, hpa, jane.chu, kirill.shutemov,
	KY Srinivasan, linux-arch, linux-hyperv, luto, mingo, peterz,
	rostedt, sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck,
	wei.liu, x86
  Cc: linux-kernel, Tianyu Lan

From: Dexuan Cui <decui@microsoft.com> Sent: Saturday, April 8, 2023 1:48 PM
> 
> Add Hyper-V specific code so that a TDX guest can run on Hyper-V:
>   No need to use hv_vp_assist_page.
>   Don't use the unsafe Hyper-V TSC page.
>   Don't try to use HV_REGISTER_CRASH_CTL.
>   Don't trust Hyper-V's TLB-flushing hypercalls.
>   Don't use lazy EOI.
>   Share SynIC Event/Message pages and VMBus Monitor pages with the host.
>   Use pgprot_decrypted(PAGE_KERNEL) in hv_ringbuffer_init().
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
> 
> Changes in v2:
>   Used a new function hv_set_memory_enc_dec_needed() in
>     __set_memory_enc_pgtable().
>   Added the missing set_memory_encrypted() in hv_synic_free().
> 
> Changes in v3:
>   Use pgprot_decrypted(PAGE_KERNEL) in hv_ringbuffer_init().
>   (Do not use PAGE_KERNEL_NOENC, which doesn't exist for ARM64).
> 
>   Used cc_mkdec() in hv_synic_enable_regs().
> 
>   ms_hyperv_init_platform():
>     Explicitly do not use HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED.
>     Explicitly do not use HV_X64_APIC_ACCESS_RECOMMENDED.
> 
>   Enabled __send_ipi_mask() and __send_ipi_one() for TDX guests.
> 
> Changes in v4:
>   A minor rebase to Michael's v7 DDA patchset. I'm very happy that
>     I can drop my v3 change to arch/x86/mm/pat/set_memory.c due to
>     Michael's work.
> 
>  arch/x86/hyperv/hv_apic.c      |  6 ++--
>  arch/x86/hyperv/hv_init.c      | 19 ++++++++---
>  arch/x86/kernel/cpu/mshyperv.c | 21 +++++++++++-
>  drivers/hv/hv.c                | 62 +++++++++++++++++++++++++++++++---
>  4 files changed, 96 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
> index fb8b2c088681a..16919c7b3196e 100644
> --- a/arch/x86/hyperv/hv_apic.c
> +++ b/arch/x86/hyperv/hv_apic.c
> @@ -173,7 +173,8 @@ static bool __send_ipi_mask(const struct cpumask *mask, int vector,
>  	    (exclude_self && weight == 1 && cpumask_test_cpu(this_cpu, mask)))
>  		return true;
> 
> -	if (!hv_hypercall_pg)
> +	/* A TDX guest doesn't use hv_hypercall_pg. */
> +	if (!hv_isolation_type_tdx() && !hv_hypercall_pg)
>  		return false;
> 
>  	if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
> @@ -227,7 +228,8 @@ static bool __send_ipi_one(int cpu, int vector)
> 
>  	trace_hyperv_send_ipi_one(cpu, vector);
> 
> -	if (!hv_hypercall_pg || (vp == VP_INVAL))
> +	/* A TDX guest doesn't use hv_hypercall_pg. */
> +	if ((!hv_isolation_type_tdx() && !hv_hypercall_pg) || (vp == VP_INVAL))
>  		return false;
> 
>  	if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index f175e0de821c3..f28357ecad7d9 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -79,7 +79,7 @@ static int hyperv_init_ghcb(void)
>  static int hv_cpu_init(unsigned int cpu)
>  {
>  	union hv_vp_assist_msr_contents msr = { 0 };
> -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
> +	struct hv_vp_assist_page **hvp;
>  	int ret;
> 
>  	ret = hv_common_cpu_init(cpu);
> @@ -89,6 +89,7 @@ static int hv_cpu_init(unsigned int cpu)
>  	if (!hv_vp_assist_page)
>  		return 0;
> 
> +	hvp = &hv_vp_assist_page[cpu];
>  	if (hv_root_partition) {
>  		/*
>  		 * For root partition we get the hypervisor provided VP assist
> @@ -398,11 +399,21 @@ void __init hyperv_init(void)
>  	if (hv_common_init())
>  		return;
> 
> -	hv_vp_assist_page = kcalloc(num_possible_cpus(),
> -				    sizeof(*hv_vp_assist_page), GFP_KERNEL);
> +	/*
> +	 * The VP assist page is useless to a TDX guest: the only use we
> +	 * would have for it is lazy EOI, which can not be used with TDX.
> +	 */
> +	if (hv_isolation_type_tdx())
> +		hv_vp_assist_page = NULL;
> +	else
> +		hv_vp_assist_page = kcalloc(num_possible_cpus(),
> +					    sizeof(*hv_vp_assist_page),
> +					    GFP_KERNEL);
>  	if (!hv_vp_assist_page) {
>  		ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
> -		goto common_free;
> +
> +		if (!hv_isolation_type_tdx())
> +			goto common_free;
>  	}
> 
>  	if (hv_isolation_type_snp()) {
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index a87fb934cd4b4..e9106c9d92f81 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -405,8 +405,27 @@ static void __init ms_hyperv_init_platform(void)
> 
>  		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
>  			static_branch_enable(&isolation_type_snp);
> -		else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX)
> +		else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
>  			static_branch_enable(&isolation_type_tdx);
> +
> +			/*
> +			 * The GPAs of SynIC Event/Message pages and VMBus
> +			 * Monitor pages need to have this offset added.
> +			 */
> +			ms_hyperv.shared_gpa_boundary = cc_mkdec(0);
> +
> +			/* Don't use the unsafe Hyper-V TSC page */
> +			ms_hyperv.features &= ~HV_MSR_REFERENCE_TSC_AVAILABLE;
> +
> +			/* HV_REGISTER_CRASH_CTL is unsupported */
> +			ms_hyperv.misc_features &= ~HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
> +
> +			/* Don't trust Hyper-V's TLB-flushing hypercalls */
> +			ms_hyperv.hints &= ~HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
> +
> +			/* A TDX VM must use x2APIC and doesn't use lazy EOI */
> +			ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
> +		}
>  	}
> 
>  	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 008234894d287..22ecb79d21efd 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -18,6 +18,7 @@
>  #include <linux/clockchips.h>
>  #include <linux/delay.h>
>  #include <linux/interrupt.h>
> +#include <linux/set_memory.h>
>  #include <clocksource/hyperv_timer.h>
>  #include <asm/mshyperv.h>
>  #include "hyperv_vmbus.h"
> @@ -119,6 +120,7 @@ int hv_synic_alloc(void)
>  {
>  	int cpu;
>  	struct hv_per_cpu_context *hv_cpu;
> +	int ret = -ENOMEM;
> 
>  	/*
>  	 * First, zero all per-cpu memory areas so hv_synic_free() can
> @@ -168,6 +170,30 @@ int hv_synic_alloc(void)
>  			pr_err("Unable to allocate post msg page\n");
>  			goto err;
>  		}
> +
> +
> +		if (hv_isolation_type_tdx()) {
> +			ret = set_memory_decrypted(
> +				(unsigned long)hv_cpu->synic_message_page, 1);
> +			if (ret) {
> +				pr_err("Failed to decrypt SYNIC msg page\n");
> +				goto err;
> +			}
> +
> +			ret = set_memory_decrypted(
> +				(unsigned long)hv_cpu->synic_event_page, 1);
> +			if (ret) {
> +				pr_err("Failed to decrypt SYNIC event page\n");
> +				goto err;
> +			}
> +
> +			ret = set_memory_decrypted(
> +				(unsigned long)hv_cpu->post_msg_page, 1);
> +			if (ret) {
> +				pr_err("Failed to decrypt post msg page\n");
> +				goto err;
> +			}
> +		}

The error path here doesn't always work correctly.  If one or more of the
three pages is decrypted, and then one of the decryptions fails, we're left
in a state where all three pages are allocated, but some are decrypted
and some are not.  hv_synic_free() won't know which pages are allocated
but still encrypted, and will try to re-encrypt a page that wasn't
successfully decrypted.

The code to clean up from an error is messy because there are three pages
involved.  You've posted a separate patch to eliminate the need for the
post_msg_page.  If that patch came before this patch series (or were
incorporated into it), the code here would only have to deal with two
pages instead of three, making the cleanup of decryption errors easier.
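
One way to make the error path well-defined is to undo the conversions
already made before bailing out, so that hv_synic_free() never sees a mix
of decrypted and still-encrypted pages.  The following is a minimal
userspace sketch of that rollback pattern, not kernel code: struct
shared_page, stub_decrypt(), stub_encrypt() and decrypt_all() are
hypothetical stand-ins for the per-CPU pages and for
set_memory_decrypted()/set_memory_encrypted(), and the stubbed
re-encryption is assumed never to fail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one page that may be shared with the host. */
struct shared_page {
	bool decrypted;
};

/* Hypothetical stub for set_memory_decrypted(); fails when 'fail' is set. */
static int stub_decrypt(struct shared_page *p, bool fail)
{
	if (fail)
		return -1;
	p->decrypted = true;
	return 0;
}

/* Hypothetical stub for set_memory_encrypted(); assumed to always succeed. */
static int stub_encrypt(struct shared_page *p)
{
	p->decrypted = false;
	return 0;
}

/*
 * Decrypt 'n' pages in order.  'fail_at' injects a failure at that index
 * (pass n or larger for no failure).  On failure, re-encrypt the pages
 * that were already converted, in reverse order, so the caller is left
 * with a uniform all-encrypted state.
 */
static int decrypt_all(struct shared_page *pages, size_t n, size_t fail_at)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (stub_decrypt(&pages[i], i == fail_at)) {
			while (i-- > 0)
				stub_encrypt(&pages[i]);
			return -1;
		}
	}
	return 0;
}
```

With three pages and a failure injected at index 1, decrypt_all() returns
-1 and leaves every page marked as not decrypted.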

>  	}
> 
>  	return 0;
> @@ -176,18 +202,42 @@ int hv_synic_alloc(void)
>  	 * Any memory allocations that succeeded will be freed when
>  	 * the caller cleans up by calling hv_synic_free()
>  	 */
> -	return -ENOMEM;
> +	return ret;
>  }
> 
> 
>  void hv_synic_free(void)
>  {
>  	int cpu;
> +	int ret;
> 
>  	for_each_present_cpu(cpu) {
>  		struct hv_per_cpu_context *hv_cpu
>  			= per_cpu_ptr(hv_context.cpu_context, cpu);
> 
> +		if (hv_isolation_type_tdx()) {
> +			ret = set_memory_encrypted(
> +				(unsigned long)hv_cpu->synic_message_page, 1);
> +			if (ret) {
> +				pr_err("Failed to encrypt SYNIC msg page\n");
> +				continue;
> +			}
> +
> +			ret = set_memory_encrypted(
> +				(unsigned long)hv_cpu->synic_event_page, 1);
> +			if (ret) {
> +				pr_err("Failed to encrypt SYNIC event page\n");
> +				continue;
> +			}
> +
> +			ret = set_memory_encrypted(
> +				(unsigned long)hv_cpu->post_msg_page, 1);
> +			if (ret) {
> +				pr_err("Failed to encrypt post msg page\n");
> +				continue;
> +			}

If any of the three re-encryptions fails, we'll leak all three pages.  That's
probably OK.  Eliminating the post_msg_page will help.
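
For comparison, handling each page on its own would leak only the page
whose re-encryption failed.  A minimal userspace sketch of that idea
follows; struct page_state, stub_encrypt() and release_pages() are
hypothetical names used for illustration, and the 'freed' flag stands in
for free_page().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page: shared with the host, and/or freed. */
struct page_state {
	bool shared;
	bool freed;
};

/* Hypothetical stub for set_memory_encrypted(); fails when 'fail' is set. */
static int stub_encrypt(struct page_state *p, bool fail)
{
	if (fail)
		return -1;
	p->shared = false;
	return 0;
}

/*
 * Try to re-encrypt and free each page independently.  A page that
 * cannot be re-encrypted is deliberately leaked: returning memory that
 * is still shared with the host to the allocator would be unsafe.
 * Returns the number of leaked pages.
 */
static size_t release_pages(struct page_state *pages, const bool *fail,
			    size_t n)
{
	size_t i, leaked = 0;

	for (i = 0; i < n; i++) {
		if (pages[i].shared && stub_encrypt(&pages[i], fail[i])) {
			leaked++;
			continue;
		}
		pages[i].freed = true;
	}
	return leaked;
}
```

If only the second of three pages fails to re-encrypt, release_pages()
reports one leaked page and the other two are still freed.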

> +		}
> +
>  		free_page((unsigned long)hv_cpu->synic_event_page);
>  		free_page((unsigned long)hv_cpu->synic_message_page);
>  		free_page((unsigned long)hv_cpu->post_msg_page);
> @@ -225,8 +275,9 @@ void hv_synic_enable_regs(unsigned int cpu)
>  		if (!hv_cpu->synic_message_page)
>  			pr_err("Fail to map synic message page.\n");
>  	} else {
> -		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
> -			>> HV_HYP_PAGE_SHIFT;
> +		simp.base_simp_gpa =
> +			cc_mkdec(virt_to_phys(hv_cpu->synic_message_page)) >>
> +			HV_HYP_PAGE_SHIFT;
>  	}
> 
>  	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
> @@ -244,8 +295,9 @@ void hv_synic_enable_regs(unsigned int cpu)
>  		if (!hv_cpu->synic_event_page)
>  			pr_err("Fail to map synic event page.\n");
>  	} else {
> -		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
> -			>> HV_HYP_PAGE_SHIFT;
> +		siefp.base_siefp_gpa =
> +			cc_mkdec(virt_to_phys(hv_cpu->synic_event_page)) >>
> +			HV_HYP_PAGE_SHIFT;
>  	}
> 
>  	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v4 6/6] x86/hyperv: Fix serial console interrupts for TDX guests
  2023-04-08 20:47 ` [PATCH v4 6/6] x86/hyperv: Fix serial console interrupts for " Dexuan Cui
@ 2023-04-11 17:13   ` Michael Kelley (LINUX)
  2023-04-21  3:54     ` Dexuan Cui
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Kelley (LINUX) @ 2023-04-11 17:13 UTC (permalink / raw)
  To: Dexuan Cui, ak, arnd, bp, brijesh.singh, dan.j.williams,
	dave.hansen, Haiyang Zhang, hpa, jane.chu, kirill.shutemov,
	KY Srinivasan, linux-arch, linux-hyperv, luto, mingo, peterz,
	rostedt, sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck,
	wei.liu, x86
  Cc: linux-kernel, Tianyu Lan

From: Dexuan Cui <decui@microsoft.com> Sent: Saturday, April 8, 2023 1:48 PM
> 
> When a TDX guest runs on Hyper-V, the UEFI firmware sets the HW_REDUCED
> flag, and consequently ttyS0 interrupts can't work. Fix the issue by
> overriding x86_init.acpi.reduced_hw_early_init().
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
> 
> Changes since v1:
>     None.
> 
>  arch/x86/kernel/cpu/mshyperv.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index e9106c9d92f81..deedced0f2bb0 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -318,6 +318,26 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
>  }
>  #endif
> 
> +/*
> + * When a TDX guest runs on Hyper-V, the firmware sets the HW_REDUCED flag: see
> + * acpi_tb_create_local_fadt(). Consequently ttyS0 interrupts can't work because
> + * request_irq() -> ... -> irq_to_desc() returns NULL for ttyS0. This happens
> + * because mp_config_acpi_legacy_irqs() sees a nr_legacy_irqs() of 0, so it
> + * doesn't initialize the array 'mp_irqs[]', and later setup_IO_APIC_irqs() ->
> + * find_irq_entry() fails to find the legacy irqs from the array, and hence
> + * doesn't create the necessary irq description info.
> + *
> + * Copy arch/x86/kernel/acpi/boot.c: acpi_generic_reduced_hw_init() but doesn't
> + * change 'legacy_pic', so it keeps its default value 'default_legacy_pic' in
> + * mp_config_acpi_legacy_irqs(), which sees a non-zero nr_legacy_irqs(), and
> + * eventually serial console interrupts can work properly.

I had a little trouble parsing this comment.  Slightly better wording is:

    * Clone arch/x86/kernel/acpi/boot.c: acpi_generic_reduced_hw_init() here,
    * except don't change 'legacy_pic'.  It keeps its default value 'default_legacy_pic'.
    * mp_config_acpi_legacy_irqs() sees a non-zero nr_legacy_irqs(), and
    * eventually serial console interrupts can work properly.

Otherwise,

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

> + */
> +static void __init reduced_hw_init(void)
> +{
> +	x86_init.timers.timer_init	= x86_init_noop;
> +	x86_init.irqs.pre_vector_init	= x86_init_noop;
> +}
> +
>  static void __init ms_hyperv_init_platform(void)
>  {
>  	int hv_max_functions_eax;
> @@ -425,6 +445,8 @@ static void __init ms_hyperv_init_platform(void)
> 
>  			/* A TDX VM must use x2APIC and doesn't use lazy EOI */
>  			ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
> +
> +			x86_init.acpi.reduced_hw_early_init = reduced_hw_init;
>  		}
>  	}
> 
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v4 5/6] Drivers: hv: vmbus: Support TDX guests
  2023-04-08 20:47 ` [PATCH v4 5/6] Drivers: hv: vmbus: Support " Dexuan Cui
  2023-04-11 16:53   ` Michael Kelley (LINUX)
@ 2023-04-12 14:05   ` Michael Kelley (LINUX)
  2023-04-21  3:34     ` Dexuan Cui
  1 sibling, 1 reply; 20+ messages in thread
From: Michael Kelley (LINUX) @ 2023-04-12 14:05 UTC (permalink / raw)
  To: Dexuan Cui, ak, arnd, bp, brijesh.singh, dan.j.williams,
	dave.hansen, Haiyang Zhang, hpa, jane.chu, kirill.shutemov,
	KY Srinivasan, linux-arch, linux-hyperv, luto, mingo, peterz,
	rostedt, sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck,
	wei.liu, x86
  Cc: linux-kernel, Tianyu Lan

From: Dexuan Cui <decui@microsoft.com> Sent: Saturday, April 8, 2023 1:48 PM
> 
> Add Hyper-V specific code so that a TDX guest can run on Hyper-V:
>   No need to use hv_vp_assist_page.
>   Don't use the unsafe Hyper-V TSC page.
>   Don't try to use HV_REGISTER_CRASH_CTL.
>   Don't trust Hyper-V's TLB-flushing hypercalls.
>   Don't use lazy EOI.
>   Share SynIC Event/Message pages and VMBus Monitor pages with the host.
>   Use pgprot_decrypted(PAGE_KERNEL) in hv_ringbuffer_init().
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
> 
> Changes in v2:
>   Used a new function hv_set_memory_enc_dec_needed() in
>     __set_memory_enc_pgtable().
>   Added the missing set_memory_encrypted() in hv_synic_free().
> 
> Changes in v3:
>   Use pgprot_decrypted(PAGE_KERNEL) in hv_ringbuffer_init().
>   (Do not use PAGE_KERNEL_NOENC, which doesn't exist for ARM64).
> 
>   Used cc_mkdec() in hv_synic_enable_regs().
> 
>   ms_hyperv_init_platform():
>     Explicitly do not use HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED.
>     Explicitly do not use HV_X64_APIC_ACCESS_RECOMMENDED.
> 
>   Enabled __send_ipi_mask() and __send_ipi_one() for TDX guests.
> 
> Changes in v4:
>   A minor rebase to Michael's v7 DDA patchset. I'm very happy that
>     I can drop my v3 change to arch/x86/mm/pat/set_memory.c due to
>     Michael's work.
> 
>  arch/x86/hyperv/hv_apic.c      |  6 ++--
>  arch/x86/hyperv/hv_init.c      | 19 ++++++++---
>  arch/x86/kernel/cpu/mshyperv.c | 21 +++++++++++-
>  drivers/hv/hv.c                | 62 +++++++++++++++++++++++++++++++---
>  4 files changed, 96 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
> index fb8b2c088681a..16919c7b3196e 100644
> --- a/arch/x86/hyperv/hv_apic.c
> +++ b/arch/x86/hyperv/hv_apic.c
> @@ -173,7 +173,8 @@ static bool __send_ipi_mask(const struct cpumask *mask, int vector,
>  	    (exclude_self && weight == 1 && cpumask_test_cpu(this_cpu, mask)))
>  		return true;
> 
> -	if (!hv_hypercall_pg)
> +	/* A TDX guest doesn't use hv_hypercall_pg. */
> +	if (!hv_isolation_type_tdx() && !hv_hypercall_pg)
>  		return false;
> 
>  	if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
> @@ -227,7 +228,8 @@ static bool __send_ipi_one(int cpu, int vector)
> 
>  	trace_hyperv_send_ipi_one(cpu, vector);
> 
> -	if (!hv_hypercall_pg || (vp == VP_INVAL))
> +	/* A TDX guest doesn't use hv_hypercall_pg. */
> +	if ((!hv_isolation_type_tdx() && !hv_hypercall_pg) || (vp == VP_INVAL))
>  		return false;
> 
>  	if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index f175e0de821c3..f28357ecad7d9 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -79,7 +79,7 @@ static int hyperv_init_ghcb(void)
>  static int hv_cpu_init(unsigned int cpu)
>  {
>  	union hv_vp_assist_msr_contents msr = { 0 };
> -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
> +	struct hv_vp_assist_page **hvp;
>  	int ret;
> 
>  	ret = hv_common_cpu_init(cpu);
> @@ -89,6 +89,7 @@ static int hv_cpu_init(unsigned int cpu)
>  	if (!hv_vp_assist_page)
>  		return 0;
> 
> +	hvp = &hv_vp_assist_page[cpu];
>  	if (hv_root_partition) {
>  		/*
>  		 * For root partition we get the hypervisor provided VP assist
> @@ -398,11 +399,21 @@ void __init hyperv_init(void)
>  	if (hv_common_init())
>  		return;
> 
> -	hv_vp_assist_page = kcalloc(num_possible_cpus(),
> -				    sizeof(*hv_vp_assist_page), GFP_KERNEL);
> +	/*
> +	 * The VP assist page is useless to a TDX guest: the only use we
> +	 * would have for it is lazy EOI, which can not be used with TDX.
> +	 */
> +	if (hv_isolation_type_tdx())
> +		hv_vp_assist_page = NULL;
> +	else
> +		hv_vp_assist_page = kcalloc(num_possible_cpus(),
> +					    sizeof(*hv_vp_assist_page),
> +					    GFP_KERNEL);
>  	if (!hv_vp_assist_page) {
>  		ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
> -		goto common_free;
> +
> +		if (!hv_isolation_type_tdx())
> +			goto common_free;
>  	}
> 
>  	if (hv_isolation_type_snp()) {
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index a87fb934cd4b4..e9106c9d92f81 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -405,8 +405,27 @@ static void __init ms_hyperv_init_platform(void)
> 
>  		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
>  			static_branch_enable(&isolation_type_snp);
> -		else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX)
> +		else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
>  			static_branch_enable(&isolation_type_tdx);
> +
> +			/*
> +			 * The GPAs of SynIC Event/Message pages and VMBus
> +			 * Monitor pages need to have this offset added.
> +			 */
> +			ms_hyperv.shared_gpa_boundary = cc_mkdec(0);
> +
> +			/* Don't use the unsafe Hyper-V TSC page */
> +			ms_hyperv.features &= ~HV_MSR_REFERENCE_TSC_AVAILABLE;
> +
> +			/* HV_REGISTER_CRASH_CTL is unsupported */
> +			ms_hyperv.misc_features &= ~HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
> +
> +			/* Don't trust Hyper-V's TLB-flushing hypercalls */
> +			ms_hyperv.hints &= ~HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
> +
> +			/* A TDX VM must use x2APIC and doesn't use lazy EOI */
> +			ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
> +		}
>  	}
> 
>  	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 008234894d287..22ecb79d21efd 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -18,6 +18,7 @@
>  #include <linux/clockchips.h>
>  #include <linux/delay.h>
>  #include <linux/interrupt.h>
> +#include <linux/set_memory.h>
>  #include <clocksource/hyperv_timer.h>
>  #include <asm/mshyperv.h>
>  #include "hyperv_vmbus.h"
> @@ -119,6 +120,7 @@ int hv_synic_alloc(void)
>  {
>  	int cpu;
>  	struct hv_per_cpu_context *hv_cpu;
> +	int ret = -ENOMEM;
> 
>  	/*
>  	 * First, zero all per-cpu memory areas so hv_synic_free() can
> @@ -168,6 +170,30 @@ int hv_synic_alloc(void)
>  			pr_err("Unable to allocate post msg page\n");
>  			goto err;
>  		}
> +
> +
> +		if (hv_isolation_type_tdx()) {
> +			ret = set_memory_decrypted(
> +				(unsigned long)hv_cpu->synic_message_page, 1);
> +			if (ret) {
> +				pr_err("Failed to decrypt SYNIC msg page\n");
> +				goto err;
> +			}
> +
> +			ret = set_memory_decrypted(
> +				(unsigned long)hv_cpu->synic_event_page, 1);
> +			if (ret) {
> +				pr_err("Failed to decrypt SYNIC event page\n");
> +				goto err;
> +			}
> +
> +			ret = set_memory_decrypted(
> +				(unsigned long)hv_cpu->post_msg_page, 1);
> +			if (ret) {
> +				pr_err("Failed to decrypt post msg page\n");
> +				goto err;
> +			}
> +		}

One other comment:  The memory for the synic_message_page,
synic_event_page, and post_msg_page is obtained using get_zeroed_page().
But after the decryption, the memory contents will be random garbage that
isn't all zeroes.  You'll need to do a memset() after the decryption to get the
contents back to zero.  Compare with Patch 6 in Tianyu's fully enlightened
SNP patch series.
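
The ordering can be illustrated with a small userspace sketch.  It
assumes the worst case described above, namely that the conversion leaves
arbitrary bytes in the page; stub_set_memory_decrypted() is a
hypothetical stand-in that models this by scrambling the buffer, and
SKETCH_PAGE_SIZE, share_zeroed_page() and page_is_zero() are likewise
illustrative names.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/*
 * Hypothetical stand-in for set_memory_decrypted().  It models the
 * conversion by scrambling the page, so the old contents do not survive.
 */
static int stub_set_memory_decrypted(unsigned char *page)
{
	size_t i;

	for (i = 0; i < SKETCH_PAGE_SIZE; i++)
		page[i] ^= (unsigned char)(0xA5 + i);
	return 0;
}

/* Convert the page first, then zero it, so the shared view starts clean. */
static int share_zeroed_page(unsigned char *page)
{
	int ret = stub_set_memory_decrypted(page);

	if (ret)
		return ret;
	memset(page, 0, SKETCH_PAGE_SIZE);
	return 0;
}

/* Return 1 if every byte of the page is zero, 0 otherwise. */
static int page_is_zero(const unsigned char *page)
{
	size_t i;

	for (i = 0; i < SKETCH_PAGE_SIZE; i++)
		if (page[i])
			return 0;
	return 1;
}
```

Zeroing before the conversion would not help in this model, because the
scrambling step runs afterwards; only the memset() that follows the
conversion guarantees an all-zero page.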

Michael

>  	}
> 
>  	return 0;
> @@ -176,18 +202,42 @@ int hv_synic_alloc(void)
>  	 * Any memory allocations that succeeded will be freed when
>  	 * the caller cleans up by calling hv_synic_free()
>  	 */
> -	return -ENOMEM;
> +	return ret;
>  }
> 
> 
>  void hv_synic_free(void)
>  {
>  	int cpu;
> +	int ret;
> 
>  	for_each_present_cpu(cpu) {
>  		struct hv_per_cpu_context *hv_cpu
>  			= per_cpu_ptr(hv_context.cpu_context, cpu);
> 
> +		if (hv_isolation_type_tdx()) {
> +			ret = set_memory_encrypted(
> +				(unsigned long)hv_cpu->synic_message_page, 1);
> +			if (ret) {
> +				pr_err("Failed to encrypt SYNIC msg page\n");
> +				continue;
> +			}
> +
> +			ret = set_memory_encrypted(
> +				(unsigned long)hv_cpu->synic_event_page, 1);
> +			if (ret) {
> +				pr_err("Failed to encrypt SYNIC event page\n");
> +				continue;
> +			}
> +
> +			ret = set_memory_encrypted(
> +				(unsigned long)hv_cpu->post_msg_page, 1);
> +			if (ret) {
> +				pr_err("Failed to encrypt post msg page\n");
> +				continue;
> +			}
> +		}
> +
>  		free_page((unsigned long)hv_cpu->synic_event_page);
>  		free_page((unsigned long)hv_cpu->synic_message_page);
>  		free_page((unsigned long)hv_cpu->post_msg_page);
> @@ -225,8 +275,9 @@ void hv_synic_enable_regs(unsigned int cpu)
>  		if (!hv_cpu->synic_message_page)
>  			pr_err("Fail to map synic message page.\n");
>  	} else {
> -		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
> -			>> HV_HYP_PAGE_SHIFT;
> +		simp.base_simp_gpa =
> +			cc_mkdec(virt_to_phys(hv_cpu->synic_message_page)) >>
> +			HV_HYP_PAGE_SHIFT;
>  	}
> 
>  	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
> @@ -244,8 +295,9 @@ void hv_synic_enable_regs(unsigned int cpu)
>  		if (!hv_cpu->synic_event_page)
>  			pr_err("Fail to map synic event page.\n");
>  	} else {
> -		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
> -			>> HV_HYP_PAGE_SHIFT;
> +		siefp.base_siefp_gpa =
> +			cc_mkdec(virt_to_phys(hv_cpu->synic_event_page)) >>
> +			HV_HYP_PAGE_SHIFT;
>  	}
> 
>  	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed()
  2023-04-11 16:28   ` Michael Kelley (LINUX)
@ 2023-04-12 15:19     ` kirill.shutemov
  2023-04-21  3:14       ` Dexuan Cui
  2023-04-12 15:20     ` kirill.shutemov
  1 sibling, 1 reply; 20+ messages in thread
From: kirill.shutemov @ 2023-04-12 15:19 UTC (permalink / raw)
  To: Michael Kelley (LINUX)
  Cc: Dexuan Cui, ak, arnd, bp, brijesh.singh, dan.j.williams,
	dave.hansen, Haiyang Zhang, hpa, jane.chu, KY Srinivasan,
	linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, linux-kernel, Tianyu Lan

On Tue, Apr 11, 2023 at 04:28:20PM +0000, Michael Kelley (LINUX) wrote:
> From: Dexuan Cui <decui@microsoft.com> Sent: Saturday, April 8, 2023 1:48 PM
> > 
> > When a TDX guest runs on Hyper-V, the hv_netvsc driver's netvsc_init_buf()
> > allocates buffers using vzalloc(), and needs to share the buffers with the
> > host OS by calling set_memory_decrypted(), which is not working for
> > vmalloc() yet. Add the support by handling the pages one by one.
> > 
> > Co-developed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Co-developed-by requires[1] Signed-off-by. You can use mine, if you want.

[1] Documentation/process/submitting-patches.rst

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed()
  2023-04-11 16:28   ` Michael Kelley (LINUX)
  2023-04-12 15:19     ` kirill.shutemov
@ 2023-04-12 15:20     ` kirill.shutemov
  1 sibling, 0 replies; 20+ messages in thread
From: kirill.shutemov @ 2023-04-12 15:20 UTC (permalink / raw)
  To: Michael Kelley (LINUX)
  Cc: Dexuan Cui, ak, arnd, bp, brijesh.singh, dan.j.williams,
	dave.hansen, Haiyang Zhang, hpa, jane.chu, KY Srinivasan,
	linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, linux-kernel, Tianyu Lan

On Tue, Apr 11, 2023 at 04:28:20PM +0000, Michael Kelley (LINUX) wrote:
> Does anyone know if there is a reason to use one vs. the other?
> slow_virt_to_phys() is x86-only, but that's not a problem here.

slow_virt_to_phys() is more generic, as it also works for non-vmalloc
addresses, but for this use the two are generally equivalent.
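
Whichever helper is used, it has to be applied one page at a time,
because a vmalloc() range is virtually contiguous but not necessarily
physically contiguous.  A minimal userspace sketch of the per-page walk,
using a toy page table rather than the kernel's: toy_pfn[],
toy_virt_to_phys() and translate_range() are hypothetical names and do
not correspond to any kernel API.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)
#define TOY_NPAGES        4

/*
 * Toy page table: index is the virtual page number, value is the
 * physical frame number.  The frames are deliberately out of order.
 */
static const uint64_t toy_pfn[TOY_NPAGES] = { 7, 3, 9, 4 };

/* Translate one virtual address; the caller guarantees it is mapped. */
static uint64_t toy_virt_to_phys(uint64_t vaddr)
{
	uint64_t vpn = vaddr >> SKETCH_PAGE_SHIFT;
	uint64_t off = vaddr & (SKETCH_PAGE_SIZE - 1);

	return (toy_pfn[vpn] << SKETCH_PAGE_SHIFT) | off;
}

/*
 * Walk a virtually contiguous range page by page and record the physical
 * address of each page.  Stops at the first unmapped page and returns
 * the number of pages translated.
 */
static size_t translate_range(uint64_t vstart, size_t npages, uint64_t *phys)
{
	size_t i;

	for (i = 0; i < npages; i++) {
		uint64_t vaddr = vstart + i * SKETCH_PAGE_SIZE;

		if ((vaddr >> SKETCH_PAGE_SHIFT) >= TOY_NPAGES)
			break;
		phys[i] = toy_virt_to_phys(vaddr);
	}
	return i;
}
```

A single translation of the start address followed by a multi-page
conversion would only be correct if the frames happened to be adjacent,
which the toy table above shows is not guaranteed.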

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v4 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed()
  2023-04-12 15:19     ` kirill.shutemov
@ 2023-04-21  3:14       ` Dexuan Cui
  0 siblings, 0 replies; 20+ messages in thread
From: Dexuan Cui @ 2023-04-21  3:14 UTC (permalink / raw)
  To: kirill.shutemov, Michael Kelley (LINUX)
  Cc: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	Haiyang Zhang, hpa, jane.chu, KY Srinivasan, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, linux-kernel, Tianyu Lan

> From: kirill.shutemov@linux.intel.com <kirill.shutemov@linux.intel.com>
> Sent: Wednesday, April 12, 2023 8:20 AM
> ...
> > > Co-developed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> 
> Co-developed-by requires[1] Signed-off-by. You can use mine, if you want.

Thanks, Kirill. I'll add your Signed-off-by in v5.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v4 5/6] Drivers: hv: vmbus: Support TDX guests
  2023-04-12 14:05   ` Michael Kelley (LINUX)
@ 2023-04-21  3:34     ` Dexuan Cui
  0 siblings, 0 replies; 20+ messages in thread
From: Dexuan Cui @ 2023-04-21  3:34 UTC (permalink / raw)
  To: Michael Kelley (LINUX),
	ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	Haiyang Zhang, hpa, jane.chu, kirill.shutemov, KY Srinivasan,
	linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86
  Cc: linux-kernel, Tianyu Lan

> From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> Sent: Wednesday, April 12, 2023 7:05 AM
> > @@ -168,6 +170,30 @@ int hv_synic_alloc(void)
> >  			pr_err("Unable to allocate post msg page\n");
> >  			goto err;
> >  		}
> > +
> > +
> > +		if (hv_isolation_type_tdx()) {
> > +			ret = set_memory_decrypted(
> > +				(unsigned long)hv_cpu->synic_message_page, 1);
> > +			if (ret) {
> > +				pr_err("Failed to decrypt SYNIC msg page\n");
> > +				goto err;
> > +			}
> > +
> > +			ret = set_memory_decrypted(
> > +				(unsigned long)hv_cpu->synic_event_page, 1);
> > +			if (ret) {
> > +				pr_err("Failed to decrypt SYNIC event page\n");
> > +				goto err;
> > +			}
> > +
> > +			ret = set_memory_decrypted(
> > +				(unsigned long)hv_cpu->post_msg_page, 1);
> > +			if (ret) {
> > +				pr_err("Failed to decrypt post msg page\n");
> > +				goto err;
> > +			}
> > +		}
> 
> One other comment:  The memory for the synic_message_page,
> synic_event_page, and post_msg_page is obtained using get_zeroed_page().
> But after the decryption, the memory contents will be random garbage that

I'm not sure about this, as I don't see any "unknown msgtype=" message from
vmbus_on_msg_dpc() :-) 

I agree it's good to add a memset(). Will do it in v5.

> isn't all zeroes.  You'll need to do a memset() after the decryption to get the
> contents back to zero.  Compare with Patch 6 in Tianyu's fully enlightened
> SNP patch series.
> 
> Michael
 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v4 6/6] x86/hyperv: Fix serial console interrupts for TDX guests
  2023-04-11 17:13   ` Michael Kelley (LINUX)
@ 2023-04-21  3:54     ` Dexuan Cui
  0 siblings, 0 replies; 20+ messages in thread
From: Dexuan Cui @ 2023-04-21  3:54 UTC (permalink / raw)
  To: Michael Kelley (LINUX),
	ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	Haiyang Zhang, hpa, jane.chu, kirill.shutemov, KY Srinivasan,
	linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86
  Cc: linux-kernel, Tianyu Lan

> From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> Sent: Tuesday, April 11, 2023 10:14 AM
> >  ...
> > + * Copy arch/x86/kernel/acpi/boot.c: acpi_generic_reduced_hw_init() but doesn't
> > + * change 'legacy_pic', so it keeps its default value 'default_legacy_pic' in
> > + * mp_config_acpi_legacy_irqs(), which sees a non-zero nr_legacy_irqs(), and
> > + * eventually serial console interrupts can work properly.
> 
> I had a little trouble parsing this comment.  Slightly better wording is:
> 
>     * Clone arch/x86/kernel/acpi/boot.c: acpi_generic_reduced_hw_init() here,
>     * except don't change 'legacy_pic'.  It keeps its default value 'default_legacy_pic'.
>     * mp_config_acpi_legacy_irqs() sees a non-zero nr_legacy_irqs(), and
>     * eventually serial console interrupts can work properly.
> 
> Otherwise,
> 
> Reviewed-by: Michael Kelley <mikelley@microsoft.com>

Thanks! Will use this in v5.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v4 5/6] Drivers: hv: vmbus: Support TDX guests
  2023-04-11 16:53   ` Michael Kelley (LINUX)
@ 2023-04-22  1:05     ` Dexuan Cui
  0 siblings, 0 replies; 20+ messages in thread
From: Dexuan Cui @ 2023-04-22  1:05 UTC (permalink / raw)
  To: Michael Kelley (LINUX),
	ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	Haiyang Zhang, hpa, jane.chu, kirill.shutemov, KY Srinivasan,
	linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86
  Cc: linux-kernel, Tianyu Lan

> From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> Sent: Tuesday, April 11, 2023 9:53 AM
> ...
> > @@ -168,6 +170,30 @@ int hv_synic_alloc(void)
> >  			pr_err("Unable to allocate post msg page\n");
> >  			goto err;
> > ...
> The error path here doesn't always work correctly.  If one or more of the
> three pages is decrypted, and then one of the decryptions fails, we're left
> in a state where all three pages are allocated, but some are decrypted
> and some are not.  hv_synic_free() won't know which pages are allocated
> but still encrypted, and will try to re-encrypt a page that wasn't
> successfully decrypted.
> 
> The code to clean up from an error is messy because there are three pages
> involved. You've posted a separate patch to eliminate the need for the
> post_msg_page.  If that patch came before this patch series (or maybe
> incorporated into this series), the code here only has to deal with two
> pages instead of three, making the cleanup of decryption errors easier.
> 
> >  	}
> > ...
> >  void hv_synic_free(void)
> >  {
> >  	int cpu;
> > +	int ret;
> > ...
> If any of the three re-encryptions fails, we'll leak all three pages. That's
> probably OK.  Eliminating the post_msg_page will help.

The post_msg_page has been removed in the hyperv-next branch.
I'm rebasing this patch and am going to post v5.
I think I can use the changes below:

@@ -159,6 +161,28 @@ int hv_synic_alloc(void)
                                goto err;
                        }
                }
+
+               /* It's better to leak the page if the decryption fails. */
+               if (hv_isolation_type_tdx()) {
+                       ret = set_memory_decrypted(
+                               (unsigned long)hv_cpu->synic_message_page, 1);
+                       if (ret) {
+                               pr_err("Failed to decrypt SYNIC msg page\n");
+                               hv_cpu->synic_message_page = NULL;
+                               goto err;
+                       }
+
+                       ret = set_memory_decrypted(
+                               (unsigned long)hv_cpu->synic_event_page, 1);
+                       if (ret) {
+                               pr_err("Failed to decrypt SYNIC event page\n");
+                               hv_cpu->synic_event_page = NULL;
+                               goto err;
+                       }
+
+                       memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
+                       memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
+               }
        }
... 
 void hv_synic_free(void)
 {
        int cpu;
+       int ret;
 
        for_each_present_cpu(cpu) {
                struct hv_per_cpu_context *hv_cpu
                        = per_cpu_ptr(hv_context.cpu_context, cpu);
 
+               /* It's better to leak the page if the encryption fails. */
+               if (hv_isolation_type_tdx()) {
+                       if (hv_cpu->synic_message_page) {
+                               ret = set_memory_encrypted((unsigned long)
+                                       hv_cpu->synic_message_page, 1);
+                               if (ret) {
+                                       pr_err("Failed to encrypt SYNIC msg page\n");
+                                       hv_cpu->synic_message_page = NULL;
+                               }
+                       }
+
+                       if (hv_cpu->synic_event_page) {
+                               ret = set_memory_encrypted((unsigned long)
+                                       hv_cpu->synic_event_page, 1);
+                               if (ret) {
+                                       pr_err("Failed to encrypt SYNIC event page\n");
+                                       hv_cpu->synic_event_page = NULL;
+                               }
+                       }
+               }
+
                free_page((unsigned long)hv_cpu->synic_event_page);
                free_page((unsigned long)hv_cpu->synic_message_page);
        }
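
To make the leak-on-failure idea in the changes above easier to reason
about, here is a minimal userspace sketch. It is only a model, not kernel
code: struct page_model, model_get_page(), model_free_page(),
model_convert() and the counters are all hypothetical stand-ins for
get_zeroed_page(), free_page() and set_memory_decrypted()/
set_memory_encrypted(), and the 'fail' flags exist just to inject errors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGES 4

/* Hypothetical stand-in for one SynIC page pointer; not a kernel type. */
struct page_model {
	void *mem;    /* NULL means "nothing to free or convert" */
	bool shared;  /* true once the page is "decrypted" (host-visible) */
};

static unsigned char model_pool[MODEL_PAGES][64];
static size_t model_next;   /* next unused pool slot */
static int model_freed;     /* pages returned to the model allocator */
static int model_leaked;    /* pages abandoned on purpose */

/* Stand-in for get_zeroed_page(): returns NULL when the pool is empty. */
static void *model_get_page(void)
{
	return model_next < MODEL_PAGES ? model_pool[model_next++] : NULL;
}

/* Stand-in for free_page(): a NULL argument is a no-op. */
static void model_free_page(void *mem)
{
	if (mem)
		model_freed++;
}

/* Stand-in for set_memory_{de,en}crypted(); 'fail' injects an error. */
static int model_convert(struct page_model *p, bool shared, bool fail)
{
	assert(p->mem != NULL);
	if (fail)
		return -1;
	p->shared = shared;
	return 0;
}

/* Allocate a page and share it with the host. */
static int model_alloc(struct page_model *p, bool fail_decrypt)
{
	p->shared = false;
	p->mem = model_get_page();
	if (!p->mem)
		return -1;
	if (model_convert(p, true, fail_decrypt)) {
		/* Page state is unknown: forget it instead of freeing it. */
		p->mem = NULL;
		model_leaked++;
		return -1;
	}
	return 0;
}

/* Make the page private again, then free it; leak it if that fails. */
static void model_free(struct page_model *p, bool fail_encrypt)
{
	if (p->mem && model_convert(p, false, fail_encrypt)) {
		p->mem = NULL;
		model_leaked++;
	}
	model_free_page(p->mem);  /* no-op when the pointer was cleared */
	p->mem = NULL;
}
```

In this model, a page whose conversion failed has its pointer cleared, so
a later model_free() neither tries to re-encrypt it nor hands it back to
the allocator. A page that was converted successfully is converted back
and freed as usual.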

BTW, at the end of hv_synic_alloc() there is a comment saying:
        /*
         * Any memory allocations that succeeded will be freed when
         * the caller cleans up by calling hv_synic_free()
         */

If hv_synic_alloc() -> get_zeroed_page() fails, hv_context.hv_numa_map is
not freed and hv_synic_alloc() returns -ENOMEM to vmbus_bus_init(); next,
we do "goto err_alloc;", i.e. hv_synic_free() is not called. We can post a
separate patch for this.

Thanks,
Dexuan



end of thread, other threads:[~2023-04-22  1:05 UTC | newest]

Thread overview: 20+ messages
2023-04-08 20:47 [PATCH v4 0/6] Support TDX guests on Hyper-V Dexuan Cui
2023-04-08 20:47 ` [PATCH v4 1/6] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed Dexuan Cui
2023-04-11 16:15   ` Michael Kelley (LINUX)
2023-04-08 20:47 ` [PATCH v4 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed() Dexuan Cui
2023-04-11 16:28   ` Michael Kelley (LINUX)
2023-04-12 15:19     ` kirill.shutemov
2023-04-21  3:14       ` Dexuan Cui
2023-04-12 15:20     ` kirill.shutemov
2023-04-08 20:47 ` [PATCH v4 3/6] x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests Dexuan Cui
2023-04-11 16:30   ` Michael Kelley (LINUX)
2023-04-08 20:47 ` [PATCH v4 4/6] x86/hyperv: Support hypercalls for " Dexuan Cui
2023-04-11 16:36   ` Michael Kelley (LINUX)
2023-04-08 20:47 ` [PATCH v4 5/6] Drivers: hv: vmbus: Support " Dexuan Cui
2023-04-11 16:53   ` Michael Kelley (LINUX)
2023-04-22  1:05     ` Dexuan Cui
2023-04-12 14:05   ` Michael Kelley (LINUX)
2023-04-21  3:34     ` Dexuan Cui
2023-04-08 20:47 ` [PATCH v4 6/6] x86/hyperv: Fix serial console interrupts for " Dexuan Cui
2023-04-11 17:13   ` Michael Kelley (LINUX)
2023-04-21  3:54     ` Dexuan Cui
