linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/6] Support TDX guests on Hyper-V
@ 2022-12-07  0:33 Dexuan Cui
  2022-12-07  0:33 ` [PATCH v2 1/6] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed Dexuan Cui
                   ` (6 more replies)
  0 siblings, 7 replies; 28+ messages in thread
From: Dexuan Cui @ 2022-12-07  0:33 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	haiyangz, hpa, jane.chu, kirill.shutemov, kys, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, mikelley
  Cc: linux-kernel, Dexuan Cui

This patchset adds the Hyper-V specific code so that a TDX guest can run
on Hyper-V. Please review. Thanks!

v1 is here:
https://lwn.net/ml/linux-kernel/20221121195151.21812-1-decui@microsoft.com/

I think I addressed all the comments received against v1. Please let me
know in case I missed anything.

I added a section "changes in v2" in each patch: in patch 0002, the
section contains a few questions. Please refer to the patch.

This v2 pathset is based on the hyperv-next branch:
https://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git/log/?h=hyperv-next

Thanks,
Dexuan

Dexuan Cui (5):
  x86/tdx: Retry TDVMCALL_MAP_GPA() when needed
  x86/tdx: Support vmalloc() for tdx_enc_status_changed()
  x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
  x86/hyperv: Support hypercalls for TDX guests
  Drivers: hv: vmbus: Support TDX guests

Kirill A. Shutemov (1):
  x86/tdx: Expand __tdx_hypercall() to handle more arguments

 arch/x86/coco/tdx/tdcall.S         |  82 +++++++++++++++------
 arch/x86/coco/tdx/tdx.c            | 113 ++++++++++++++++++++++-------
 arch/x86/hyperv/hv_init.c          |  27 ++++++-
 arch/x86/hyperv/ivm.c              |  28 +++++++
 arch/x86/include/asm/hyperv-tlfs.h |   3 +-
 arch/x86/include/asm/mshyperv.h    |  20 +++++
 arch/x86/include/asm/shared/tdx.h  |   6 ++
 arch/x86/kernel/asm-offsets.c      |   6 ++
 arch/x86/kernel/cpu/mshyperv.c     |  22 +++++-
 arch/x86/mm/pat/set_memory.c       |   2 +-
 drivers/hv/connection.c            |   4 +-
 drivers/hv/hv.c                    |  60 ++++++++++++++-
 drivers/hv/hv_common.c             |  12 +++
 drivers/hv/ring_buffer.c           |   2 +-
 include/asm-generic/mshyperv.h     |   2 +
 15 files changed, 328 insertions(+), 61 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 1/6] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed
  2022-12-07  0:33 [PATCH v2 0/6] Support TDX guests on Hyper-V Dexuan Cui
@ 2022-12-07  0:33 ` Dexuan Cui
  2022-12-08 19:48   ` Kirill A. Shutemov
  2022-12-07  0:33 ` [PATCH v2 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed() Dexuan Cui
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Dexuan Cui @ 2022-12-07  0:33 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	haiyangz, hpa, jane.chu, kirill.shutemov, kys, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, mikelley
  Cc: linux-kernel, Dexuan Cui

GHCI spec for TDX 1.0 says that the MapGPA call may fail with the R10
error code = TDG.VP.VMCALL_RETRY (1), and the guest must retry this
operation for the pages in the region starting at the GPA specified
in R11.

When a TDX guest runs on Hyper-V, Hyper-V returns the retry error
when hyperv_init() -> swiotlb_update_mem_attributes() ->
set_memory_decrypted() decrypts up to 1GB of swiotlb bounce buffers.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Changes in v2:
  Used __tdx_hypercall() directly in tdx_map_gpa().
  Added a max_retry_cnt of 1000.
  Renamed a few variables, e.g., r11 -> map_fail_paddr.

 arch/x86/coco/tdx/tdx.c | 64 +++++++++++++++++++++++++++++++++--------
 1 file changed, 52 insertions(+), 12 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 3fee96931ff5..cdeda698d308 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -20,6 +20,8 @@
 /* TDX hypercall Leaf IDs */
 #define TDVMCALL_MAP_GPA		0x10001
 
+#define TDVMCALL_STATUS_RETRY		1
+
 /* MMIO direction */
 #define EPT_READ	0
 #define EPT_WRITE	1
@@ -692,14 +694,15 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len,
 }
 
 /*
- * Inform the VMM of the guest's intent for this physical page: shared with
- * the VMM or private to the guest.  The VMM is expected to change its mapping
- * of the page in response.
+ * Notify the VMM about page mapping conversion. More info about ABI
+ * can be found in TDX Guest-Host-Communication Interface (GHCI),
+ * section "TDG.VP.VMCALL<MapGPA>".
  */
-static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
+static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
 {
-	phys_addr_t start = __pa(vaddr);
-	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
+	int max_retry_cnt = 1000, retry_cnt = 0;
+	struct tdx_hypercall_args args;
+	u64 map_fail_paddr, ret;
 
 	if (!enc) {
 		/* Set the shared (decrypted) bits: */
@@ -707,12 +710,49 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
 		end   |= cc_mkdec(0);
 	}
 
-	/*
-	 * Notify the VMM about page mapping conversion. More info about ABI
-	 * can be found in TDX Guest-Host-Communication Interface (GHCI),
-	 * section "TDG.VP.VMCALL<MapGPA>"
-	 */
-	if (_tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0))
+	while (1) {
+		memset(&args, 0, sizeof(args));
+		args.r10 = TDX_HYPERCALL_STANDARD;
+		args.r11 = TDVMCALL_MAP_GPA;
+		args.r12 = start;
+		args.r13 = end - start;
+
+		ret = __tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT);
+		if (ret != TDVMCALL_STATUS_RETRY)
+			break;
+		/*
+		 * The guest must retry the operation for the pages in the
+		 * region starting at the GPA specified in R11. Make sure R11
+		 * contains a sane value.
+		 */
+		map_fail_paddr = args.r11;
+		if (map_fail_paddr < start || map_fail_paddr >= end)
+			return false;
+
+		if (map_fail_paddr == start) {
+			retry_cnt++;
+			if (retry_cnt > max_retry_cnt)
+				return false;
+		} else {
+			retry_cnt = 0;
+			start = map_fail_paddr;
+		}
+	}
+
+	return !ret;
+}
+
+/*
+ * Inform the VMM of the guest's intent for this physical page: shared with
+ * the VMM or private to the guest. The VMM is expected to change its mapping
+ * of the page in response.
+ */
+static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
+{
+	phys_addr_t start = __pa(vaddr);
+	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
+
+	if (!tdx_map_gpa(start, end, enc))
 		return false;
 
 	/* private->shared conversion  requires only MapGPA call */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed()
  2022-12-07  0:33 [PATCH v2 0/6] Support TDX guests on Hyper-V Dexuan Cui
  2022-12-07  0:33 ` [PATCH v2 1/6] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed Dexuan Cui
@ 2022-12-07  0:33 ` Dexuan Cui
  2023-01-05  9:44   ` Zhi Wang
  2022-12-07  0:33 ` [PATCH v2 3/6] x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests Dexuan Cui
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Dexuan Cui @ 2022-12-07  0:33 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	haiyangz, hpa, jane.chu, kirill.shutemov, kys, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, mikelley
  Cc: linux-kernel, Dexuan Cui

When a TDX guest runs on Hyper-V, the hv_netvsc driver's netvsc_init_buf()
allocates buffers using vzalloc(), and needs to share the buffers with the
host OS by calling set_memory_decrypted(), which is not working for
vmalloc() yet. Add the support by handling the pages one by one.

Signed-off-by: Dexuan Cui <decui@microsoft.com>

---

Changes in v2:
  Changed tdx_enc_status_changed() in place.
  
Hi, Dave, I checked the huge vmalloc mapping code, but still don't know
how to get the underlying huge page info (if huge page is in use) and
try to use PG_LEVEL_2M/1G in try_accept_page() for vmalloc: I checked
is_vm_area_hugepages() and  __vfree() -> __vunmap(), and I think the
underlying page allocation info is internal to the mm code, and there
is no mm API to for me get the info in tdx_enc_status_changed().

Hi, Kirill, the load_unaligned_zeropad() issue is not addressed in
this patch. The issue looks like a generic issue that also happens to
AMD SNP vTOM mode and C-bit mode. Will need to figure out how to
address the issue. If we decide to adjust direct mapping to have the
shared bit set, it lools like we need to do the below for each
'start_va' vmalloc page:
  pa = slow_virt_to_phys(start_va);
  set_memory_decrypted(phys_to_virt(pa), 1); -- this line calls
tdx_enc_status_changed() the second time for the page, which is bad.
It looks like we need to find a way to reuse the cpa_flush() related
code in __set_memory_enc_pgtable() and make sure we call
tdx_enc_status_changed() only once for a vmalloc page?

  
 arch/x86/coco/tdx/tdx.c | 69 ++++++++++++++++++++++++++---------------
 1 file changed, 44 insertions(+), 25 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index cdeda698d308..795ac56f06b8 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -5,6 +5,7 @@
 #define pr_fmt(fmt)     "tdx: " fmt
 
 #include <linux/cpufeature.h>
+#include <linux/mm.h>
 #include <asm/coco.h>
 #include <asm/tdx.h>
 #include <asm/vmx.h>
@@ -693,6 +694,34 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len,
 	return true;
 }
 
+static bool try_accept_page(phys_addr_t start, phys_addr_t end)
+{
+	/*
+	 * For shared->private conversion, accept the page using
+	 * TDX_ACCEPT_PAGE TDX module call.
+	 */
+	while (start < end) {
+		unsigned long len = end - start;
+
+		/*
+		 * Try larger accepts first. It gives chance to VMM to keep
+		 * 1G/2M SEPT entries where possible and speeds up process by
+		 * cutting number of hypercalls (if successful).
+		 */
+
+		if (try_accept_one(&start, len, PG_LEVEL_1G))
+			continue;
+
+		if (try_accept_one(&start, len, PG_LEVEL_2M))
+			continue;
+
+		if (!try_accept_one(&start, len, PG_LEVEL_4K))
+			return false;
+	}
+
+	return true;
+}
+
 /*
  * Notify the VMM about page mapping conversion. More info about ABI
  * can be found in TDX Guest-Host-Communication Interface (GHCI),
@@ -749,37 +778,27 @@ static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
  */
 static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
 {
-	phys_addr_t start = __pa(vaddr);
-	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
+	bool is_vmalloc = is_vmalloc_addr((void *)vaddr);
+	unsigned long len = numpages * PAGE_SIZE;
+	void *start_va = (void *)vaddr, *end_va = start_va + len;
+	phys_addr_t start_pa, end_pa;
 
-	if (!tdx_map_gpa(start, end, enc))
+	if (offset_in_page(start_va) != 0)
 		return false;
 
-	/* private->shared conversion  requires only MapGPA call */
-	if (!enc)
-		return true;
-
-	/*
-	 * For shared->private conversion, accept the page using
-	 * TDX_ACCEPT_PAGE TDX module call.
-	 */
-	while (start < end) {
-		unsigned long len = end - start;
-
-		/*
-		 * Try larger accepts first. It gives chance to VMM to keep
-		 * 1G/2M SEPT entries where possible and speeds up process by
-		 * cutting number of hypercalls (if successful).
-		 */
-
-		if (try_accept_one(&start, len, PG_LEVEL_1G))
-			continue;
+	while (start_va < end_va) {
+		start_pa = is_vmalloc ? slow_virt_to_phys(start_va) :
+					__pa(start_va);
+		end_pa = start_pa + (is_vmalloc ? PAGE_SIZE : len);
 
-		if (try_accept_one(&start, len, PG_LEVEL_2M))
-			continue;
+		if (!tdx_map_gpa(start_pa, end_pa, enc))
+			return false;
 
-		if (!try_accept_one(&start, len, PG_LEVEL_4K))
+		/* private->shared conversion requires only MapGPA call */
+		if (enc && !try_accept_page(start_pa, end_pa))
 			return false;
+
+		start_va += is_vmalloc ? PAGE_SIZE : len;
 	}
 
 	return true;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 3/6] x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
  2022-12-07  0:33 [PATCH v2 0/6] Support TDX guests on Hyper-V Dexuan Cui
  2022-12-07  0:33 ` [PATCH v2 1/6] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed Dexuan Cui
  2022-12-07  0:33 ` [PATCH v2 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed() Dexuan Cui
@ 2022-12-07  0:33 ` Dexuan Cui
  2022-12-12  0:59   ` Sathyanarayanan Kuppuswamy
  2022-12-07  0:33 ` [PATCH v2 4/6] x86/tdx: Expand __tdx_hypercall() to handle more arguments Dexuan Cui
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Dexuan Cui @ 2022-12-07  0:33 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	haiyangz, hpa, jane.chu, kirill.shutemov, kys, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, mikelley
  Cc: linux-kernel, Dexuan Cui

No logic change to SNP/VBS guests.

hv_isolation_type_tdx() wil be used to instruct a TDX guest on Hyper-V to
do some TDX-specific operations, e.g. hv_do_hypercall() should use
__tdx_hypercall(), and a TDX guest on Hyper-V should handle the Hyper-V
Event/Message/Monitor pages specially.

Signed-off-by: Dexuan Cui <decui@microsoft.com>

---

Changes in v2:
  Added "#ifdef CONFIG_INTEL_TDX_GUEST and #endif" for
    hv_isolation_type_tdx() in arch/x86/hyperv/ivm.c.

    Simplified the changes in ms_hyperv_init_platform().

 arch/x86/hyperv/ivm.c              | 9 +++++++++
 arch/x86/include/asm/hyperv-tlfs.h | 3 ++-
 arch/x86/include/asm/mshyperv.h    | 3 +++
 arch/x86/kernel/cpu/mshyperv.c     | 7 ++++++-
 drivers/hv/hv_common.c             | 6 ++++++
 5 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 1dbcbd9da74d..13ccb52eecd7 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -269,6 +269,15 @@ bool hv_isolation_type_snp(void)
 	return static_branch_unlikely(&isolation_type_snp);
 }
 
+#ifdef CONFIG_INTEL_TDX_GUEST
+DEFINE_STATIC_KEY_FALSE(isolation_type_tdx);
+
+bool hv_isolation_type_tdx(void)
+{
+	return static_branch_unlikely(&isolation_type_tdx);
+}
+#endif
+
 /*
  * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
  *
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 6d9368ea3701..6c0a04d078f5 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -161,7 +161,8 @@
 enum hv_isolation_type {
 	HV_ISOLATION_TYPE_NONE	= 0,
 	HV_ISOLATION_TYPE_VBS	= 1,
-	HV_ISOLATION_TYPE_SNP	= 2
+	HV_ISOLATION_TYPE_SNP	= 2,
+	HV_ISOLATION_TYPE_TDX	= 3
 };
 
 /* Hyper-V specific model specific registers (MSRs) */
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 61f0c206bff0..8a2cafec4675 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -14,6 +14,7 @@
 union hv_ghcb;
 
 DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
+DECLARE_STATIC_KEY_FALSE(isolation_type_tdx);
 
 typedef int (*hyperv_fill_flush_list_func)(
 		struct hv_guest_mapping_flush_list *flush,
@@ -32,6 +33,8 @@ extern u64 hv_current_partition_id;
 
 extern union hv_ghcb * __percpu *hv_ghcb_pg;
 
+extern bool hv_isolation_type_tdx(void);
+
 int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
 int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
 int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 46668e255421..941372449ff2 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -339,9 +339,14 @@ static void __init ms_hyperv_init_platform(void)
 		}
 		/* Isolation VMs are unenlightened SEV-based VMs, thus this check: */
 		if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
-			if (hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE)
+			if (hv_get_isolation_type() == HV_ISOLATION_TYPE_VBS ||
+			    hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
 				cc_set_vendor(CC_VENDOR_HYPERV);
 		}
+
+		if (IS_ENABLED(CONFIG_INTEL_TDX_GUEST) &&
+		    hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX)
+			static_branch_enable(&isolation_type_tdx);
 	}
 
 	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index ae68298c0dca..a9a03ab04b97 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -268,6 +268,12 @@ bool __weak hv_isolation_type_snp(void)
 }
 EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
 
+bool __weak hv_isolation_type_tdx(void)
+{
+	return false;
+}
+EXPORT_SYMBOL_GPL(hv_isolation_type_tdx);
+
 void __weak hv_setup_vmbus_handler(void (*handler)(void))
 {
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 4/6] x86/tdx: Expand __tdx_hypercall() to handle more arguments
  2022-12-07  0:33 [PATCH v2 0/6] Support TDX guests on Hyper-V Dexuan Cui
                   ` (2 preceding siblings ...)
  2022-12-07  0:33 ` [PATCH v2 3/6] x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests Dexuan Cui
@ 2022-12-07  0:33 ` Dexuan Cui
  2022-12-07 22:14   ` Sathyanarayanan Kuppuswamy
  2022-12-08 22:07   ` Kirill A. Shutemov
  2022-12-07  0:33 ` [PATCH v2 5/6] x86/hyperv: Support hypercalls for TDX guests Dexuan Cui
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 28+ messages in thread
From: Dexuan Cui @ 2022-12-07  0:33 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	haiyangz, hpa, jane.chu, kirill.shutemov, kys, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, mikelley
  Cc: linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

So far __tdx_hypercall() only handles six arguments for VMCALL.
Expanding it to six more register would allow to cover more use-cases.

Using RDI and RSI as VMCALL arguments requires more register shuffling.
RAX is used to hold tdx_hypercall_args pointer and RBP stores flags.

While there, fix typo in the comment on panic branch.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
---

This patch is from Kirill. I'm posting the patch on behalf him:
https://lwn.net/ml/linux-kernel/20221202214741.7vfmqgvgubxqffen@box.shutemov.name/

This is actually v1, but let's use v2 in the Subject to be consistent
with the Subjects of the other patches.

 arch/x86/coco/tdx/tdcall.S        | 82 ++++++++++++++++++++++---------
 arch/x86/include/asm/shared/tdx.h |  6 +++
 arch/x86/kernel/asm-offsets.c     |  6 +++
 3 files changed, 70 insertions(+), 24 deletions(-)

diff --git a/arch/x86/coco/tdx/tdcall.S b/arch/x86/coco/tdx/tdcall.S
index f9eb1134f22d..64e57739dc9d 100644
--- a/arch/x86/coco/tdx/tdcall.S
+++ b/arch/x86/coco/tdx/tdcall.S
@@ -13,6 +13,12 @@
 /*
  * Bitmasks of exposed registers (with VMM).
  */
+#define TDX_RDX		BIT(2)
+#define TDX_RBX		BIT(3)
+#define TDX_RSI		BIT(6)
+#define TDX_RDI		BIT(7)
+#define TDX_R8		BIT(8)
+#define TDX_R9		BIT(9)
 #define TDX_R10		BIT(10)
 #define TDX_R11		BIT(11)
 #define TDX_R12		BIT(12)
@@ -27,9 +33,9 @@
  * details can be found in TDX GHCI specification, section
  * titled "TDCALL [TDG.VP.VMCALL] leaf".
  */
-#define TDVMCALL_EXPOSE_REGS_MASK	( TDX_R10 | TDX_R11 | \
-					  TDX_R12 | TDX_R13 | \
-					  TDX_R14 | TDX_R15 )
+#define TDVMCALL_EXPOSE_REGS_MASK	\
+	( TDX_RDX | TDX_RBX | TDX_RSI | TDX_RDI | TDX_R8  | TDX_R9  | \
+	  TDX_R10 | TDX_R11 | TDX_R12 | TDX_R13 | TDX_R14 | TDX_R15 )
 
 /*
  * __tdx_module_call()  - Used by TDX guests to request services from
@@ -124,19 +130,32 @@ SYM_FUNC_START(__tdx_hypercall)
 	push %r14
 	push %r13
 	push %r12
+	push %rbx
+	push %rbp
+
+	movq %rdi, %rax
+	movq %rsi, %rbp
+
+	/* Copy hypercall registers from arg struct: */
+	movq TDX_HYPERCALL_r8(%rax),  %r8
+	movq TDX_HYPERCALL_r9(%rax),  %r9
+	movq TDX_HYPERCALL_r10(%rax), %r10
+	movq TDX_HYPERCALL_r11(%rax), %r11
+	movq TDX_HYPERCALL_r12(%rax), %r12
+	movq TDX_HYPERCALL_r13(%rax), %r13
+	movq TDX_HYPERCALL_r14(%rax), %r14
+	movq TDX_HYPERCALL_r15(%rax), %r15
+	movq TDX_HYPERCALL_rdi(%rax), %rdi
+	movq TDX_HYPERCALL_rsi(%rax), %rsi
+	movq TDX_HYPERCALL_rbx(%rax), %rbx
+	movq TDX_HYPERCALL_rdx(%rax), %rdx
+
+	push %rax
 
 	/* Mangle function call ABI into TDCALL ABI: */
 	/* Set TDCALL leaf ID (TDVMCALL (0)) in RAX */
 	xor %eax, %eax
 
-	/* Copy hypercall registers from arg struct: */
-	movq TDX_HYPERCALL_r10(%rdi), %r10
-	movq TDX_HYPERCALL_r11(%rdi), %r11
-	movq TDX_HYPERCALL_r12(%rdi), %r12
-	movq TDX_HYPERCALL_r13(%rdi), %r13
-	movq TDX_HYPERCALL_r14(%rdi), %r14
-	movq TDX_HYPERCALL_r15(%rdi), %r15
-
 	movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
 
 	/*
@@ -148,14 +167,14 @@ SYM_FUNC_START(__tdx_hypercall)
 	 * HLT operation indefinitely. Since this is the not the desired
 	 * result, conditionally call STI before TDCALL.
 	 */
-	testq $TDX_HCALL_ISSUE_STI, %rsi
+	testq $TDX_HCALL_ISSUE_STI, %rbp
 	jz .Lskip_sti
 	sti
 .Lskip_sti:
 	tdcall
 
 	/*
-	 * RAX==0 indicates a failure of the TDVMCALL mechanism itself and that
+	 * RAX!=0 indicates a failure of the TDVMCALL mechanism itself and that
 	 * something has gone horribly wrong with the TDX module.
 	 *
 	 * The return status of the hypercall operation is in a separate
@@ -165,30 +184,45 @@ SYM_FUNC_START(__tdx_hypercall)
 	testq %rax, %rax
 	jne .Lpanic
 
-	/* TDVMCALL leaf return code is in R10 */
-	movq %r10, %rax
+	pop %rax
 
 	/* Copy hypercall result registers to arg struct if needed */
-	testq $TDX_HCALL_HAS_OUTPUT, %rsi
+	testq $TDX_HCALL_HAS_OUTPUT, %rbp
 	jz .Lout
 
-	movq %r10, TDX_HYPERCALL_r10(%rdi)
-	movq %r11, TDX_HYPERCALL_r11(%rdi)
-	movq %r12, TDX_HYPERCALL_r12(%rdi)
-	movq %r13, TDX_HYPERCALL_r13(%rdi)
-	movq %r14, TDX_HYPERCALL_r14(%rdi)
-	movq %r15, TDX_HYPERCALL_r15(%rdi)
+	movq %r8,  TDX_HYPERCALL_r8(%rax)
+	movq %r9,  TDX_HYPERCALL_r9(%rax)
+	movq %r10, TDX_HYPERCALL_r10(%rax)
+	movq %r11, TDX_HYPERCALL_r11(%rax)
+	movq %r12, TDX_HYPERCALL_r12(%rax)
+	movq %r13, TDX_HYPERCALL_r13(%rax)
+	movq %r14, TDX_HYPERCALL_r14(%rax)
+	movq %r15, TDX_HYPERCALL_r15(%rax)
+	movq %rdi, TDX_HYPERCALL_rdi(%rax)
+	movq %rsi, TDX_HYPERCALL_rsi(%rax)
+	movq %rbx, TDX_HYPERCALL_rbx(%rax)
+	movq %rdx, TDX_HYPERCALL_rdx(%rax)
 .Lout:
+	/* TDVMCALL leaf return code is in R10 */
+	movq %r10, %rax
+
 	/*
 	 * Zero out registers exposed to the VMM to avoid speculative execution
 	 * with VMM-controlled values. This needs to include all registers
-	 * present in TDVMCALL_EXPOSE_REGS_MASK (except R12-R15). R12-R15
-	 * context will be restored.
+	 * present in TDVMCALL_EXPOSE_REGS_MASK, except RBX, and R12-R15 which
+	 * will be restored.
 	 */
+	xor %r8d,  %r8d
+	xor %r9d,  %r9d
 	xor %r10d, %r10d
 	xor %r11d, %r11d
+	xor %rdi,  %rdi
+	xor %rsi,  %rsi
+	xor %rdx,  %rdx
 
 	/* Restore callee-saved GPRs as mandated by the x86_64 ABI */
+	pop %rbp
+	pop %rbx
 	pop %r12
 	pop %r13
 	pop %r14
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index e53f26228fbb..8068faa52de1 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -22,12 +22,18 @@
  * This is a software only structure and not part of the TDX module/VMM ABI.
  */
 struct tdx_hypercall_args {
+	u64 r8;
+	u64 r9;
 	u64 r10;
 	u64 r11;
 	u64 r12;
 	u64 r13;
 	u64 r14;
 	u64 r15;
+	u64 rdi;
+	u64 rsi;
+	u64 rbx;
+	u64 rdx;
 };
 
 /* Used to request services from the VMM */
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 437308004ef2..9f09947495e2 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -75,12 +75,18 @@ static void __used common(void)
 	OFFSET(TDX_MODULE_r11, tdx_module_output, r11);
 
 	BLANK();
+	OFFSET(TDX_HYPERCALL_r8,  tdx_hypercall_args, r8);
+	OFFSET(TDX_HYPERCALL_r9,  tdx_hypercall_args, r9);
 	OFFSET(TDX_HYPERCALL_r10, tdx_hypercall_args, r10);
 	OFFSET(TDX_HYPERCALL_r11, tdx_hypercall_args, r11);
 	OFFSET(TDX_HYPERCALL_r12, tdx_hypercall_args, r12);
 	OFFSET(TDX_HYPERCALL_r13, tdx_hypercall_args, r13);
 	OFFSET(TDX_HYPERCALL_r14, tdx_hypercall_args, r14);
 	OFFSET(TDX_HYPERCALL_r15, tdx_hypercall_args, r15);
+	OFFSET(TDX_HYPERCALL_rdi, tdx_hypercall_args, rdi);
+	OFFSET(TDX_HYPERCALL_rsi, tdx_hypercall_args, rsi);
+	OFFSET(TDX_HYPERCALL_rbx, tdx_hypercall_args, rbx);
+	OFFSET(TDX_HYPERCALL_rdx, tdx_hypercall_args, rdx);
 
 	BLANK();
 	OFFSET(BP_scratch, boot_params, scratch);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 5/6] x86/hyperv: Support hypercalls for TDX guests
  2022-12-07  0:33 [PATCH v2 0/6] Support TDX guests on Hyper-V Dexuan Cui
                   ` (3 preceding siblings ...)
  2022-12-07  0:33 ` [PATCH v2 4/6] x86/tdx: Expand __tdx_hypercall() to handle more arguments Dexuan Cui
@ 2022-12-07  0:33 ` Dexuan Cui
  2022-12-12 16:38   ` Michael Kelley (LINUX)
  2023-01-06 11:23   ` Zhi Wang
  2022-12-07  0:33 ` [PATCH v2 6/6] Drivers: hv: vmbus: Support " Dexuan Cui
  2022-12-12  0:04 ` [PATCH v2 0/6] Support TDX guests on Hyper-V Dexuan Cui
  6 siblings, 2 replies; 28+ messages in thread
From: Dexuan Cui @ 2022-12-07  0:33 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	haiyangz, hpa, jane.chu, kirill.shutemov, kys, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, mikelley
  Cc: linux-kernel, Dexuan Cui

A TDX guest uses the GHCI call rather than hv_hypercall_pg.

In hv_do_hypercall(), Hyper-V requires that the input/output addresses
must have the cc_mask.

Signed-off-by: Dexuan Cui <decui@microsoft.com>

---

Changes in v2:
  Implemented hv_tdx_hypercall() in C rather than in assembly code.
  Renamed the parameter names of hv_tdx_hypercall().
  Used cc_mkdec() directly in hv_do_hypercall().

 arch/x86/hyperv/hv_init.c       |  8 ++++++++
 arch/x86/hyperv/ivm.c           | 14 ++++++++++++++
 arch/x86/include/asm/mshyperv.h | 17 +++++++++++++++++
 3 files changed, 39 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index a823fde1ad7f..c0ba53ad8b8e 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -430,6 +430,10 @@ void __init hyperv_init(void)
 	/* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
 	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
+	/* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
+	if (hv_isolation_type_tdx())
+		goto skip_hypercall_pg_init;
+
 	hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
 			VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
 			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
@@ -469,6 +473,7 @@ void __init hyperv_init(void)
 		wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 	}
 
+skip_hypercall_pg_init:
 	/*
 	 * hyperv_init() is called before LAPIC is initialized: see
 	 * apic_intr_mode_init() -> x86_platform.apic_post_init() and
@@ -604,6 +609,9 @@ bool hv_is_hyperv_initialized(void)
 	if (x86_hyper_type != X86_HYPER_MS_HYPERV)
 		return false;
 
+	/* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
+	if (hv_isolation_type_tdx())
+		return true;
 	/*
 	 * Verify that earlier initialization succeeded by checking
 	 * that the hypercall page is setup
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 13ccb52eecd7..07e4253b5809 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -276,6 +276,20 @@ bool hv_isolation_type_tdx(void)
 {
 	return static_branch_unlikely(&isolation_type_tdx);
 }
+
+u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2)
+{
+	struct tdx_hypercall_args args = { };
+
+	args.r10 = control;
+	args.rdx = param1;
+	args.r8  = param2;
+
+	(void)__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT);
+
+	return args.r11;
+}
+EXPORT_SYMBOL_GPL(hv_tdx_hypercall);
 #endif
 
 /*
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 8a2cafec4675..a4d665472d9e 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -10,6 +10,7 @@
 #include <asm/nospec-branch.h>
 #include <asm/paravirt.h>
 #include <asm/mshyperv.h>
+#include <asm/coco.h>
 
 union hv_ghcb;
 
@@ -39,6 +40,12 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
 int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
 int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
 
+u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
+
+/*
+ * If the hypercall involves no input or output parameters, the hypervisor
+ * ignores the corresponding GPA pointer.
+ */
 static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 {
 	u64 input_address = input ? virt_to_phys(input) : 0;
@@ -46,6 +53,10 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 	u64 hv_status;
 
 #ifdef CONFIG_X86_64
+	if (hv_isolation_type_tdx())
+		return hv_tdx_hypercall(control,
+					cc_mkdec(input_address),
+					cc_mkdec(output_address));
 	if (!hv_hypercall_pg)
 		return U64_MAX;
 
@@ -83,6 +94,9 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
 	u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
 
 #ifdef CONFIG_X86_64
+	if (hv_isolation_type_tdx())
+		return hv_tdx_hypercall(control, input1, 0);
+
 	{
 		__asm__ __volatile__(CALL_NOSPEC
 				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
@@ -114,6 +128,9 @@ static inline u64 hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2)
 	u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
 
 #ifdef CONFIG_X86_64
+	if (hv_isolation_type_tdx())
+		return hv_tdx_hypercall(control, input1, input2);
+
 	{
 		__asm__ __volatile__("mov %4, %%r8\n"
 				     CALL_NOSPEC
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 6/6] Drivers: hv: vmbus: Support TDX guests
  2022-12-07  0:33 [PATCH v2 0/6] Support TDX guests on Hyper-V Dexuan Cui
                   ` (4 preceding siblings ...)
  2022-12-07  0:33 ` [PATCH v2 5/6] x86/hyperv: Support hypercalls for TDX guests Dexuan Cui
@ 2022-12-07  0:33 ` Dexuan Cui
  2022-12-12 17:02   ` Michael Kelley (LINUX)
  2022-12-12  0:04 ` [PATCH v2 0/6] Support TDX guests on Hyper-V Dexuan Cui
  6 siblings, 1 reply; 28+ messages in thread
From: Dexuan Cui @ 2022-12-07  0:33 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	haiyangz, hpa, jane.chu, kirill.shutemov, kys, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, mikelley
  Cc: linux-kernel, Dexuan Cui

Add Hyper-V specific code so that a TDX guest can run on Hyper-V:
  No need to use hv_vp_assist_page.
  Don't use the unsafe Hyper-V TSC page.
  Don't try to use HV_REGISTER_CRASH_CTL.
  Share SynIC Event/Message pages and VMBus Monitor pages with the host.
  Use pgprot_decrypted(PAGE_KERNEL_NOENC))in hv_ringbuffer_init().

Signed-off-by: Dexuan Cui <decui@microsoft.com>

Changes in v2:
  Used a new function hv_set_memory_enc_dec_needed() in
    __set_memory_enc_pgtable().
  Added the missing set_memory_encrypted() in hv_synic_free().
  
---

 arch/x86/hyperv/hv_init.c      | 19 ++++++++---
 arch/x86/hyperv/ivm.c          |  5 +++
 arch/x86/kernel/cpu/mshyperv.c | 17 +++++++++-
 arch/x86/mm/pat/set_memory.c   |  2 +-
 drivers/hv/connection.c        |  4 ++-
 drivers/hv/hv.c                | 60 +++++++++++++++++++++++++++++++++-
 drivers/hv/hv_common.c         |  6 ++++
 drivers/hv/ring_buffer.c       |  2 +-
 include/asm-generic/mshyperv.h |  2 ++
 9 files changed, 108 insertions(+), 9 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index c0ba53ad8b8e..8d7b63346194 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -77,7 +77,7 @@ static int hyperv_init_ghcb(void)
 static int hv_cpu_init(unsigned int cpu)
 {
 	union hv_vp_assist_msr_contents msr = { 0 };
-	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
+	struct hv_vp_assist_page **hvp;
 	int ret;
 
 	ret = hv_common_cpu_init(cpu);
@@ -87,6 +87,7 @@ static int hv_cpu_init(unsigned int cpu)
 	if (!hv_vp_assist_page)
 		return 0;
 
+	hvp = &hv_vp_assist_page[cpu];
 	if (hv_root_partition) {
 		/*
 		 * For root partition we get the hypervisor provided VP assist
@@ -396,11 +397,21 @@ void __init hyperv_init(void)
 	if (hv_common_init())
 		return;
 
-	hv_vp_assist_page = kcalloc(num_possible_cpus(),
-				    sizeof(*hv_vp_assist_page), GFP_KERNEL);
+	/*
+	 * The VP assist page is useless to a TDX guest: the only use we
+	 * would have for it is lazy EOI, which can not be used with TDX.
+	 */
+	if (hv_isolation_type_tdx())
+		hv_vp_assist_page = NULL;
+	else
+		hv_vp_assist_page = kcalloc(num_possible_cpus(),
+					    sizeof(*hv_vp_assist_page),
+					    GFP_KERNEL);
 	if (!hv_vp_assist_page) {
 		ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
-		goto common_free;
+
+		if (!hv_isolation_type_tdx())
+			goto common_free;
 	}
 
 	if (hv_isolation_type_snp()) {
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 07e4253b5809..4398042f10d5 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -258,6 +258,11 @@ bool hv_is_isolation_supported(void)
 	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
 }
 
+bool hv_set_memory_enc_dec_needed(void)
+{
+	return hv_is_isolation_supported() && !hv_isolation_type_tdx();
+}
+
 DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
 
 /*
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 941372449ff2..24569da3c1f8 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -345,8 +345,23 @@ static void __init ms_hyperv_init_platform(void)
 		}
 
 		if (IS_ENABLED(CONFIG_INTEL_TDX_GUEST) &&
-		    hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX)
+		    hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
 			static_branch_enable(&isolation_type_tdx);
+
+			/*
+			 * The GPAs of SynIC Event/Message pages and VMBus
+			 * Moniter pages need to be added by this offset.
+			 */
+			ms_hyperv.shared_gpa_boundary = cc_mkdec(0);
+
+			/* Don't use the unsafe Hyper-V TSC page */
+			ms_hyperv.features &=
+				~HV_MSR_REFERENCE_TSC_AVAILABLE;
+
+			/* HV_REGISTER_CRASH_CTL is unsupported */
+			ms_hyperv.misc_features &=
+				 ~HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
+		}
 	}
 
 	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 2e5a045731de..5892196f8ade 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2120,7 +2120,7 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
 
 static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
 {
-	if (hv_is_isolation_supported())
+	if (hv_set_memory_enc_dec_needed())
 		return hv_set_mem_host_visibility(addr, numpages, !enc);
 
 	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 9dc27e5d367a..1ecc3c29e3f7 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -250,12 +250,14 @@ int vmbus_connect(void)
 		 * Isolation VM with AMD SNP needs to access monitor page via
 		 * address space above shared gpa boundary.
 		 */
-		if (hv_isolation_type_snp()) {
+		if (hv_isolation_type_snp() || hv_isolation_type_tdx()) {
 			vmbus_connection.monitor_pages_pa[0] +=
 				ms_hyperv.shared_gpa_boundary;
 			vmbus_connection.monitor_pages_pa[1] +=
 				ms_hyperv.shared_gpa_boundary;
+		}
 
+		if (hv_isolation_type_snp()) {
 			vmbus_connection.monitor_pages[0]
 				= memremap(vmbus_connection.monitor_pages_pa[0],
 					   HV_HYP_PAGE_SIZE,
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 4d6480d57546..78aca415985c 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -18,6 +18,7 @@
 #include <linux/clockchips.h>
 #include <linux/delay.h>
 #include <linux/interrupt.h>
+#include <linux/set_memory.h>
 #include <clocksource/hyperv_timer.h>
 #include <asm/mshyperv.h>
 #include "hyperv_vmbus.h"
@@ -119,6 +120,7 @@ int hv_synic_alloc(void)
 {
 	int cpu;
 	struct hv_per_cpu_context *hv_cpu;
+	int ret = -ENOMEM;
 
 	/*
 	 * First, zero all per-cpu memory areas so hv_synic_free() can
@@ -168,6 +170,30 @@ int hv_synic_alloc(void)
 			pr_err("Unable to allocate post msg page\n");
 			goto err;
 		}
+
+
+		if (hv_isolation_type_tdx()) {
+			ret = set_memory_decrypted(
+				(unsigned long)hv_cpu->synic_message_page, 1);
+			if (ret) {
+				pr_err("Failed to decrypt SYNIC msg page\n");
+				goto err;
+			}
+
+			ret = set_memory_decrypted(
+				(unsigned long)hv_cpu->synic_event_page, 1);
+			if (ret) {
+				pr_err("Failed to decrypt SYNIC event page\n");
+				goto err;
+			}
+
+			ret = set_memory_decrypted(
+				(unsigned long)hv_cpu->post_msg_page, 1);
+			if (ret) {
+				pr_err("Failed to decrypt post msg page\n");
+				goto err;
+			}
+		}
 	}
 
 	return 0;
@@ -176,18 +202,42 @@ int hv_synic_alloc(void)
 	 * Any memory allocations that succeeded will be freed when
 	 * the caller cleans up by calling hv_synic_free()
 	 */
-	return -ENOMEM;
+	return ret;
 }
 
 
 void hv_synic_free(void)
 {
 	int cpu;
+	int ret;
 
 	for_each_present_cpu(cpu) {
 		struct hv_per_cpu_context *hv_cpu
 			= per_cpu_ptr(hv_context.cpu_context, cpu);
 
+		if (hv_isolation_type_tdx()) {
+			ret = set_memory_encrypted(
+				(unsigned long)hv_cpu->synic_message_page, 1);
+			if (ret) {
+				pr_err("Failed to encrypt SYNIC msg page\n");
+				continue;
+			}
+
+			ret = set_memory_encrypted(
+				(unsigned long)hv_cpu->synic_event_page, 1);
+			if (ret) {
+				pr_err("Failed to encrypt SYNIC event page\n");
+				continue;
+			}
+
+			ret = set_memory_encrypted(
+				(unsigned long)hv_cpu->post_msg_page, 1);
+			if (ret) {
+				pr_err("Failed to encrypt post msg page\n");
+				continue;
+			}
+		}
+
 		free_page((unsigned long)hv_cpu->synic_event_page);
 		free_page((unsigned long)hv_cpu->synic_message_page);
 		free_page((unsigned long)hv_cpu->post_msg_page);
@@ -225,6 +275,10 @@ void hv_synic_enable_regs(unsigned int cpu)
 	} else {
 		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
 			>> HV_HYP_PAGE_SHIFT;
+
+		if (hv_isolation_type_tdx())
+			simp.base_simp_gpa |= ms_hyperv.shared_gpa_boundary
+				>> HV_HYP_PAGE_SHIFT;
 	}
 
 	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
@@ -243,6 +297,10 @@ void hv_synic_enable_regs(unsigned int cpu)
 	} else {
 		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
 			>> HV_HYP_PAGE_SHIFT;
+
+		if (hv_isolation_type_tdx())
+			siefp.base_siefp_gpa |= ms_hyperv.shared_gpa_boundary
+				>> HV_HYP_PAGE_SHIFT;
 	}
 
 	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index a9a03ab04b97..192dcf295dfc 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -262,6 +262,12 @@ bool __weak hv_is_isolation_supported(void)
 }
 EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
 
+bool __weak hv_set_memory_enc_dec_needed(void)
+{
+	return false;
+}
+EXPORT_SYMBOL_GPL(hv_set_memory_enc_dec_needed);
+
 bool __weak hv_isolation_type_snp(void)
 {
 	return false;
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index c6692fd5ab15..a51da82316ce 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -233,7 +233,7 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
 
 		ring_info->ring_buffer = (struct hv_ring_buffer *)
 			vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP,
-				PAGE_KERNEL);
+				pgprot_decrypted(PAGE_KERNEL_NOENC));
 
 		kfree(pages_wraparound);
 		if (!ring_info->ring_buffer)
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index bfb9eb9d7215..b7b1b18c9854 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -262,6 +262,7 @@ bool hv_is_hyperv_initialized(void);
 bool hv_is_hibernation_supported(void);
 enum hv_isolation_type hv_get_isolation_type(void);
 bool hv_is_isolation_supported(void);
+bool hv_set_memory_enc_dec_needed(void);
 bool hv_isolation_type_snp(void);
 u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
 void hyperv_cleanup(void);
@@ -274,6 +275,7 @@ static inline bool hv_is_hyperv_initialized(void) { return false; }
 static inline bool hv_is_hibernation_supported(void) { return false; }
 static inline void hyperv_cleanup(void) {}
 static inline bool hv_is_isolation_supported(void) { return false; }
+static inline bool hv_set_memory_enc_dec_needed(void) { return false; }
 static inline enum hv_isolation_type hv_get_isolation_type(void)
 {
 	return HV_ISOLATION_TYPE_NONE;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/6] x86/tdx: Expand __tdx_hypercall() to handle more arguments
  2022-12-07  0:33 ` [PATCH v2 4/6] x86/tdx: Expand __tdx_hypercall() to handle more arguments Dexuan Cui
@ 2022-12-07 22:14   ` Sathyanarayanan Kuppuswamy
  2022-12-08 15:54     ` Dexuan Cui
  2022-12-08 22:07   ` Kirill A. Shutemov
  1 sibling, 1 reply; 28+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2022-12-07 22:14 UTC (permalink / raw)
  To: Dexuan Cui, ak, arnd, bp, brijesh.singh, dan.j.williams,
	dave.hansen, haiyangz, hpa, jane.chu, kirill.shutemov, kys,
	linux-arch, linux-hyperv, luto, mingo, peterz, rostedt, seanjc,
	tglx, tony.luck, wei.liu, x86, mikelley
  Cc: linux-kernel



On 12/6/22 4:33 PM, Dexuan Cui wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 
> So far __tdx_hypercall() only handles six arguments for VMCALL.
> Expanding it to six more register would allow to cover more use-cases.
> 
> Using RDI and RSI as VMCALL arguments requires more register shuffling.
> RAX is used to hold tdx_hypercall_args pointer and RBP stores flags.
> 
> While there, fix typo in the comment on panic branch.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Dexuan Cui <decui@microsoft.com>
> Tested-by: Dexuan Cui <decui@microsoft.com>
> ---
> 
> This patch is from Kirill. I'm posting the patch on behalf him:
> https://lwn.net/ml/linux-kernel/20221202214741.7vfmqgvgubxqffen@box.shutemov.name/
> 
> This is actually v1, but let's use v2 in the Subject to be consistent
> with the Subjects of the other patches.
> 
>  arch/x86/coco/tdx/tdcall.S        | 82 ++++++++++++++++++++++---------
>  arch/x86/include/asm/shared/tdx.h |  6 +++
>  arch/x86/kernel/asm-offsets.c     |  6 +++
>  3 files changed, 70 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/x86/coco/tdx/tdcall.S b/arch/x86/coco/tdx/tdcall.S
> index f9eb1134f22d..64e57739dc9d 100644
> --- a/arch/x86/coco/tdx/tdcall.S
> +++ b/arch/x86/coco/tdx/tdcall.S
> @@ -13,6 +13,12 @@
>  /*
>   * Bitmasks of exposed registers (with VMM).
>   */
> +#define TDX_RDX		BIT(2)
> +#define TDX_RBX		BIT(3)
> +#define TDX_RSI		BIT(6)
> +#define TDX_RDI		BIT(7)
> +#define TDX_R8		BIT(8)
> +#define TDX_R9		BIT(9)
>  #define TDX_R10		BIT(10)
>  #define TDX_R11		BIT(11)
>  #define TDX_R12		BIT(12)
> @@ -27,9 +33,9 @@
>   * details can be found in TDX GHCI specification, section
>   * titled "TDCALL [TDG.VP.VMCALL] leaf".
>   */
> -#define TDVMCALL_EXPOSE_REGS_MASK	( TDX_R10 | TDX_R11 | \
> -					  TDX_R12 | TDX_R13 | \
> -					  TDX_R14 | TDX_R15 )
> +#define TDVMCALL_EXPOSE_REGS_MASK	\
> +	( TDX_RDX | TDX_RBX | TDX_RSI | TDX_RDI | TDX_R8  | TDX_R9  | \
> +	  TDX_R10 | TDX_R11 | TDX_R12 | TDX_R13 | TDX_R14 | TDX_R15 )
>  

You seem to have expanded the list to include all VMCALL supported
registers except RBP. Why not include it as well? That way, it will be
a complete support.

>  /*
>   * __tdx_module_call()  - Used by TDX guests to request services from
> @@ -124,19 +130,32 @@ SYM_FUNC_START(__tdx_hypercall)
>  	push %r14
>  	push %r13
>  	push %r12
> +	push %rbx
> +	push %rbp
> +
> +	movq %rdi, %rax
> +	movq %rsi, %rbp
> +
> +	/* Copy hypercall registers from arg struct: */
> +	movq TDX_HYPERCALL_r8(%rax),  %r8
> +	movq TDX_HYPERCALL_r9(%rax),  %r9
> +	movq TDX_HYPERCALL_r10(%rax), %r10
> +	movq TDX_HYPERCALL_r11(%rax), %r11
> +	movq TDX_HYPERCALL_r12(%rax), %r12
> +	movq TDX_HYPERCALL_r13(%rax), %r13
> +	movq TDX_HYPERCALL_r14(%rax), %r14
> +	movq TDX_HYPERCALL_r15(%rax), %r15
> +	movq TDX_HYPERCALL_rdi(%rax), %rdi
> +	movq TDX_HYPERCALL_rsi(%rax), %rsi
> +	movq TDX_HYPERCALL_rbx(%rax), %rbx
> +	movq TDX_HYPERCALL_rdx(%rax), %rdx
> +
> +	push %rax
>  
>  	/* Mangle function call ABI into TDCALL ABI: */
>  	/* Set TDCALL leaf ID (TDVMCALL (0)) in RAX */
>  	xor %eax, %eax
>  
> -	/* Copy hypercall registers from arg struct: */
> -	movq TDX_HYPERCALL_r10(%rdi), %r10
> -	movq TDX_HYPERCALL_r11(%rdi), %r11
> -	movq TDX_HYPERCALL_r12(%rdi), %r12
> -	movq TDX_HYPERCALL_r13(%rdi), %r13
> -	movq TDX_HYPERCALL_r14(%rdi), %r14
> -	movq TDX_HYPERCALL_r15(%rdi), %r15
> -
>  	movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
>  
>  	/*
> @@ -148,14 +167,14 @@ SYM_FUNC_START(__tdx_hypercall)
>  	 * HLT operation indefinitely. Since this is the not the desired
>  	 * result, conditionally call STI before TDCALL.
>  	 */
> -	testq $TDX_HCALL_ISSUE_STI, %rsi
> +	testq $TDX_HCALL_ISSUE_STI, %rbp
>  	jz .Lskip_sti
>  	sti
>  .Lskip_sti:
>  	tdcall
>  
>  	/*
> -	 * RAX==0 indicates a failure of the TDVMCALL mechanism itself and that
> +	 * RAX!=0 indicates a failure of the TDVMCALL mechanism itself and that
>  	 * something has gone horribly wrong with the TDX module.
>  	 *
>  	 * The return status of the hypercall operation is in a separate
> @@ -165,30 +184,45 @@ SYM_FUNC_START(__tdx_hypercall)
>  	testq %rax, %rax
>  	jne .Lpanic
>  
> -	/* TDVMCALL leaf return code is in R10 */
> -	movq %r10, %rax
> +	pop %rax
>  
>  	/* Copy hypercall result registers to arg struct if needed */
> -	testq $TDX_HCALL_HAS_OUTPUT, %rsi
> +	testq $TDX_HCALL_HAS_OUTPUT, %rbp
>  	jz .Lout
>  
> -	movq %r10, TDX_HYPERCALL_r10(%rdi)
> -	movq %r11, TDX_HYPERCALL_r11(%rdi)
> -	movq %r12, TDX_HYPERCALL_r12(%rdi)
> -	movq %r13, TDX_HYPERCALL_r13(%rdi)
> -	movq %r14, TDX_HYPERCALL_r14(%rdi)
> -	movq %r15, TDX_HYPERCALL_r15(%rdi)
> +	movq %r8,  TDX_HYPERCALL_r8(%rax)
> +	movq %r9,  TDX_HYPERCALL_r9(%rax)
> +	movq %r10, TDX_HYPERCALL_r10(%rax)
> +	movq %r11, TDX_HYPERCALL_r11(%rax)
> +	movq %r12, TDX_HYPERCALL_r12(%rax)
> +	movq %r13, TDX_HYPERCALL_r13(%rax)
> +	movq %r14, TDX_HYPERCALL_r14(%rax)
> +	movq %r15, TDX_HYPERCALL_r15(%rax)
> +	movq %rdi, TDX_HYPERCALL_rdi(%rax)
> +	movq %rsi, TDX_HYPERCALL_rsi(%rax)
> +	movq %rbx, TDX_HYPERCALL_rbx(%rax)
> +	movq %rdx, TDX_HYPERCALL_rdx(%rax)
>  .Lout:
> +	/* TDVMCALL leaf return code is in R10 */
> +	movq %r10, %rax
> +
>  	/*
>  	 * Zero out registers exposed to the VMM to avoid speculative execution
>  	 * with VMM-controlled values. This needs to include all registers
> -	 * present in TDVMCALL_EXPOSE_REGS_MASK (except R12-R15). R12-R15
> -	 * context will be restored.
> +	 * present in TDVMCALL_EXPOSE_REGS_MASK, except RBX, and R12-R15 which
> +	 * will be restored.
>  	 */
> +	xor %r8d,  %r8d
> +	xor %r9d,  %r9d
>  	xor %r10d, %r10d
>  	xor %r11d, %r11d
> +	xor %rdi,  %rdi
> +	xor %rsi,  %rsi
> +	xor %rdx,  %rdx
>  
>  	/* Restore callee-saved GPRs as mandated by the x86_64 ABI */
> +	pop %rbp
> +	pop %rbx
>  	pop %r12
>  	pop %r13
>  	pop %r14
> diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
> index e53f26228fbb..8068faa52de1 100644
> --- a/arch/x86/include/asm/shared/tdx.h
> +++ b/arch/x86/include/asm/shared/tdx.h
> @@ -22,12 +22,18 @@
>   * This is a software only structure and not part of the TDX module/VMM ABI.
>   */
>  struct tdx_hypercall_args {
> +	u64 r8;
> +	u64 r9;
>  	u64 r10;
>  	u64 r11;
>  	u64 r12;
>  	u64 r13;
>  	u64 r14;
>  	u64 r15;
> +	u64 rdi;
> +	u64 rsi;
> +	u64 rbx;
> +	u64 rdx;
>  };
>  
>  /* Used to request services from the VMM */
> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
> index 437308004ef2..9f09947495e2 100644
> --- a/arch/x86/kernel/asm-offsets.c
> +++ b/arch/x86/kernel/asm-offsets.c
> @@ -75,12 +75,18 @@ static void __used common(void)
>  	OFFSET(TDX_MODULE_r11, tdx_module_output, r11);
>  
>  	BLANK();
> +	OFFSET(TDX_HYPERCALL_r8,  tdx_hypercall_args, r8);
> +	OFFSET(TDX_HYPERCALL_r9,  tdx_hypercall_args, r9);
>  	OFFSET(TDX_HYPERCALL_r10, tdx_hypercall_args, r10);
>  	OFFSET(TDX_HYPERCALL_r11, tdx_hypercall_args, r11);
>  	OFFSET(TDX_HYPERCALL_r12, tdx_hypercall_args, r12);
>  	OFFSET(TDX_HYPERCALL_r13, tdx_hypercall_args, r13);
>  	OFFSET(TDX_HYPERCALL_r14, tdx_hypercall_args, r14);
>  	OFFSET(TDX_HYPERCALL_r15, tdx_hypercall_args, r15);
> +	OFFSET(TDX_HYPERCALL_rdi, tdx_hypercall_args, rdi);
> +	OFFSET(TDX_HYPERCALL_rsi, tdx_hypercall_args, rsi);
> +	OFFSET(TDX_HYPERCALL_rbx, tdx_hypercall_args, rbx);
> +	OFFSET(TDX_HYPERCALL_rdx, tdx_hypercall_args, rdx);
>  
>  	BLANK();
>  	OFFSET(BP_scratch, boot_params, scratch);

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v2 4/6] x86/tdx: Expand __tdx_hypercall() to handle more arguments
  2022-12-07 22:14   ` Sathyanarayanan Kuppuswamy
@ 2022-12-08 15:54     ` Dexuan Cui
  2022-12-08 22:06       ` Kirill A. Shutemov
  0 siblings, 1 reply; 28+ messages in thread
From: Dexuan Cui @ 2022-12-08 15:54 UTC (permalink / raw)
  To: Sathyanarayanan Kuppuswamy, ak, arnd, bp, brijesh.singh,
	Williams, Dan J, dave.hansen, Haiyang Zhang, hpa, jane.chu,
	kirill.shutemov, KY Srinivasan, linux-arch, linux-hyperv, luto,
	mingo, peterz, rostedt, seanjc, tglx, tony.luck, wei.liu, x86,
	Michael Kelley (LINUX)
  Cc: linux-kernel

> From: Sathyanarayanan Kuppuswamy
> Sent: Wednesday, December 7, 2022 2:14 PM
>  [...]
> > --- a/arch/x86/coco/tdx/tdcall.S
> > +++ b/arch/x86/coco/tdx/tdcall.S
> > @@ -13,6 +13,12 @@
> >  /*
> >   * Bitmasks of exposed registers (with VMM).
> >   */
> > +#define TDX_RDX		BIT(2)
> > +#define TDX_RBX		BIT(3)
> > +#define TDX_RSI		BIT(6)
> > +#define TDX_RDI		BIT(7)
> > +#define TDX_R8		BIT(8)
> > +#define TDX_R9		BIT(9)
> >  #define TDX_R10		BIT(10)
> >  #define TDX_R11		BIT(11)
> >  #define TDX_R12		BIT(12)
> > @@ -27,9 +33,9 @@
> >   * details can be found in TDX GHCI specification, section
> >   * titled "TDCALL [TDG.VP.VMCALL] leaf".
> >   */
> > -#define TDVMCALL_EXPOSE_REGS_MASK	( TDX_R10 | TDX_R11 | \
> > -					  TDX_R12 | TDX_R13 | \
> > -					  TDX_R14 | TDX_R15 )
> > +#define TDVMCALL_EXPOSE_REGS_MASK	\
> > +	( TDX_RDX | TDX_RBX | TDX_RSI | TDX_RDI | TDX_R8  | TDX_R9  | \
> > +	  TDX_R10 | TDX_R11 | TDX_R12 | TDX_R13 | TDX_R14 | TDX_R15 )
> >
> 
> You seem to have expanded the list to include all VMCALL supported
> registers except RBP. Why not include it as well? That way, it will be
> a complete support.

Hi Kirill, can you please share your thoughts?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/6] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed
  2022-12-07  0:33 ` [PATCH v2 1/6] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed Dexuan Cui
@ 2022-12-08 19:48   ` Kirill A. Shutemov
  2022-12-08 19:54     ` Dexuan Cui
  0 siblings, 1 reply; 28+ messages in thread
From: Kirill A. Shutemov @ 2022-12-08 19:48 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	haiyangz, hpa, jane.chu, kirill.shutemov, kys, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, mikelley, linux-kernel

On Tue, Dec 06, 2022 at 04:33:20PM -0800, Dexuan Cui wrote:
> GHCI spec for TDX 1.0 says that the MapGPA call may fail with the R10
> error code = TDG.VP.VMCALL_RETRY (1), and the guest must retry this
> operation for the pages in the region starting at the GPA specified
> in R11.
> 
> When a TDX guest runs on Hyper-V, Hyper-V returns the retry error
> when hyperv_init() -> swiotlb_update_mem_attributes() ->
> set_memory_decrypted() decrypts up to 1GB of swiotlb bounce buffers.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
> 
> Changes in v2:
>   Used __tdx_hypercall() directly in tdx_map_gpa().
>   Added a max_retry_cnt of 1000.
>   Renamed a few variables, e.g., r11 -> map_fail_paddr.
> 
>  arch/x86/coco/tdx/tdx.c | 64 +++++++++++++++++++++++++++++++++--------
>  1 file changed, 52 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index 3fee96931ff5..cdeda698d308 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -20,6 +20,8 @@
>  /* TDX hypercall Leaf IDs */
>  #define TDVMCALL_MAP_GPA		0x10001
>  
> +#define TDVMCALL_STATUS_RETRY		1
> +
>  /* MMIO direction */
>  #define EPT_READ	0
>  #define EPT_WRITE	1
> @@ -692,14 +694,15 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len,
>  }
>  
>  /*
> - * Inform the VMM of the guest's intent for this physical page: shared with
> - * the VMM or private to the guest.  The VMM is expected to change its mapping
> - * of the page in response.
> + * Notify the VMM about page mapping conversion. More info about ABI
> + * can be found in TDX Guest-Host-Communication Interface (GHCI),
> + * section "TDG.VP.VMCALL<MapGPA>".
>   */
> -static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
> +static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
>  {
> -	phys_addr_t start = __pa(vaddr);
> -	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
> +	int max_retry_cnt = 1000, retry_cnt = 0;

Hm. max_retry_cnt looks too high to me. I expected to see 3 or something.

Any justification for it to be *that* high?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v2 1/6] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed
  2022-12-08 19:48   ` Kirill A. Shutemov
@ 2022-12-08 19:54     ` Dexuan Cui
  0 siblings, 0 replies; 28+ messages in thread
From: Dexuan Cui @ 2022-12-08 19:54 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: ak, arnd, bp, brijesh.singh, Williams, Dan J, dave.hansen,
	Haiyang Zhang, hpa, jane.chu, kirill.shutemov, KY Srinivasan,
	linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, Michael Kelley (LINUX),
	linux-kernel

> From: Kirill A. Shutemov <kirill@shutemov.name>
> Sent: Thursday, December 8, 2022 11:48 AM
> To: Dexuan Cui <decui@microsoft.com>
> > [...]
> > +static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
> >  {
> > -	phys_addr_t start = __pa(vaddr);
> > -	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
> > +	int max_retry_cnt = 1000, retry_cnt = 0;
> 
> Hm. max_retry_cnt looks too high to me. I expected to see 3 or something.
> 
> Any justification for it to be *that* high?

No. I just used an enough big number :-)
I'll change it to 3 in the next version.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/6] x86/tdx: Expand __tdx_hypercall() to handle more arguments
  2022-12-08 15:54     ` Dexuan Cui
@ 2022-12-08 22:06       ` Kirill A. Shutemov
  0 siblings, 0 replies; 28+ messages in thread
From: Kirill A. Shutemov @ 2022-12-08 22:06 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Sathyanarayanan Kuppuswamy, ak, arnd, bp, brijesh.singh,
	Williams, Dan J, dave.hansen, Haiyang Zhang, hpa, jane.chu,
	kirill.shutemov, KY Srinivasan, linux-arch, linux-hyperv, luto,
	mingo, peterz, rostedt, seanjc, tglx, tony.luck, wei.liu, x86,
	Michael Kelley (LINUX),
	linux-kernel

On Thu, Dec 08, 2022 at 03:54:32PM +0000, Dexuan Cui wrote:
> > From: Sathyanarayanan Kuppuswamy
> > Sent: Wednesday, December 7, 2022 2:14 PM
> >  [...]
> > > --- a/arch/x86/coco/tdx/tdcall.S
> > > +++ b/arch/x86/coco/tdx/tdcall.S
> > > @@ -13,6 +13,12 @@
> > >  /*
> > >   * Bitmasks of exposed registers (with VMM).
> > >   */
> > > +#define TDX_RDX		BIT(2)
> > > +#define TDX_RBX		BIT(3)
> > > +#define TDX_RSI		BIT(6)
> > > +#define TDX_RDI		BIT(7)
> > > +#define TDX_R8		BIT(8)
> > > +#define TDX_R9		BIT(9)
> > >  #define TDX_R10		BIT(10)
> > >  #define TDX_R11		BIT(11)
> > >  #define TDX_R12		BIT(12)
> > > @@ -27,9 +33,9 @@
> > >   * details can be found in TDX GHCI specification, section
> > >   * titled "TDCALL [TDG.VP.VMCALL] leaf".
> > >   */
> > > -#define TDVMCALL_EXPOSE_REGS_MASK	( TDX_R10 | TDX_R11 | \
> > > -					  TDX_R12 | TDX_R13 | \
> > > -					  TDX_R14 | TDX_R15 )
> > > +#define TDVMCALL_EXPOSE_REGS_MASK	\
> > > +	( TDX_RDX | TDX_RBX | TDX_RSI | TDX_RDI | TDX_R8  | TDX_R9  | \
> > > +	  TDX_R10 | TDX_R11 | TDX_R12 | TDX_R13 | TDX_R14 | TDX_R15 )
> > >
> > 
> > You seem to have expanded the list to include all VMCALL supported
> > registers except RBP. Why not include it as well? That way, it will be
> > a complete support.
> 
> Hi Kirill, can you please share your thoughts?

I wrote the patch to handle redefined ReportFatalError() (the updated GHCI
comes soon). It doesn't need the RBP. And we run out of registers to stash
arguments into. Let's think about this when the first user of RBP comes up.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/6] x86/tdx: Expand __tdx_hypercall() to handle more arguments
  2022-12-07  0:33 ` [PATCH v2 4/6] x86/tdx: Expand __tdx_hypercall() to handle more arguments Dexuan Cui
  2022-12-07 22:14   ` Sathyanarayanan Kuppuswamy
@ 2022-12-08 22:07   ` Kirill A. Shutemov
  2022-12-08 23:09     ` Dexuan Cui
  1 sibling, 1 reply; 28+ messages in thread
From: Kirill A. Shutemov @ 2022-12-08 22:07 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	haiyangz, hpa, jane.chu, kirill.shutemov, kys, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, mikelley, linux-kernel

On Tue, Dec 06, 2022 at 04:33:23PM -0800, Dexuan Cui wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 
> So far __tdx_hypercall() only handles six arguments for VMCALL.
> Expanding it to six more register would allow to cover more use-cases.
> 
> Using RDI and RSI as VMCALL arguments requires more register shuffling.
> RAX is used to hold tdx_hypercall_args pointer and RBP stores flags.
> 
> While there, fix typo in the comment on panic branch.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Dexuan Cui <decui@microsoft.com>
> Tested-by: Dexuan Cui <decui@microsoft.com>

Since you submit the patch you need to add your Signed-off-by.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v2 4/6] x86/tdx: Expand __tdx_hypercall() to handle more arguments
  2022-12-08 22:07   ` Kirill A. Shutemov
@ 2022-12-08 23:09     ` Dexuan Cui
  0 siblings, 0 replies; 28+ messages in thread
From: Dexuan Cui @ 2022-12-08 23:09 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: ak, arnd, bp, brijesh.singh, Williams, Dan J, dave.hansen,
	Haiyang Zhang, hpa, jane.chu, kirill.shutemov, KY Srinivasan,
	linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, Michael Kelley (LINUX),
	linux-kernel

> From: Kirill A. Shutemov <kirill@shutemov.name>
> Sent: Thursday, December 8, 2022 2:07 PM
> [...]
> 
> Since you submit the patch you need to add your Signed-off-by.

Ok. Will do.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v2 0/6] Support TDX guests on Hyper-V
  2022-12-07  0:33 [PATCH v2 0/6] Support TDX guests on Hyper-V Dexuan Cui
                   ` (5 preceding siblings ...)
  2022-12-07  0:33 ` [PATCH v2 6/6] Drivers: hv: vmbus: Support " Dexuan Cui
@ 2022-12-12  0:04 ` Dexuan Cui
  6 siblings, 0 replies; 28+ messages in thread
From: Dexuan Cui @ 2022-12-12  0:04 UTC (permalink / raw)
  To: ak, arnd, bp, brijesh.singh, Williams, Dan J, dave.hansen,
	Haiyang Zhang, hpa, jane.chu, kirill.shutemov, KY Srinivasan,
	linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, Michael Kelley (LINUX)
  Cc: linux-kernel, Sebastian, Shiny

> From: Dexuan Cui <decui@microsoft.com>
> Sent: Tuesday, December 6, 2022 4:33 PM
> [...]
> 
> This patchset adds the Hyper-V specific code so that a TDX guest can run
> on Hyper-V. Please review. Thanks!

Hi Kiril, Sathyanarayanan, thanks for your comments. It'd be great if you
can comment on other patches.

Hi Dave, can you please also share your thoughts, especially for
patch 1, 2, 4 and 5?

Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 3/6] x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
  2022-12-07  0:33 ` [PATCH v2 3/6] x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests Dexuan Cui
@ 2022-12-12  0:59   ` Sathyanarayanan Kuppuswamy
  0 siblings, 0 replies; 28+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2022-12-12  0:59 UTC (permalink / raw)
  To: Dexuan Cui, ak, arnd, bp, brijesh.singh, dan.j.williams,
	dave.hansen, haiyangz, hpa, jane.chu, kirill.shutemov, kys,
	linux-arch, linux-hyperv, luto, mingo, peterz, rostedt, seanjc,
	tglx, tony.luck, wei.liu, x86, mikelley
  Cc: linux-kernel

Hi,

On 12/6/22 4:33 PM, Dexuan Cui wrote:
> No logic change to SNP/VBS guests.
> 
> hv_isolation_type_tdx() wil be used to instruct a TDX guest on Hyper-V to
> do some TDX-specific operations, e.g. hv_do_hypercall() should use
> __tdx_hypercall(), and a TDX guest on Hyper-V should handle the Hyper-V
> Event/Message/Monitor pages specially.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> 
> ---

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

> 
> Changes in v2:
>   Added "#ifdef CONFIG_INTEL_TDX_GUEST and #endif" for
>     hv_isolation_type_tdx() in arch/x86/hyperv/ivm.c.
> 
>     Simplified the changes in ms_hyperv_init_platform().
> 
>  arch/x86/hyperv/ivm.c              | 9 +++++++++
>  arch/x86/include/asm/hyperv-tlfs.h | 3 ++-
>  arch/x86/include/asm/mshyperv.h    | 3 +++
>  arch/x86/kernel/cpu/mshyperv.c     | 7 ++++++-
>  drivers/hv/hv_common.c             | 6 ++++++
>  5 files changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 1dbcbd9da74d..13ccb52eecd7 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -269,6 +269,15 @@ bool hv_isolation_type_snp(void)
>  	return static_branch_unlikely(&isolation_type_snp);
>  }
>  
> +#ifdef CONFIG_INTEL_TDX_GUEST
> +DEFINE_STATIC_KEY_FALSE(isolation_type_tdx);
> +
> +bool hv_isolation_type_tdx(void)
> +{
> +	return static_branch_unlikely(&isolation_type_tdx);
> +}
> +#endif
> +
>  /*
>   * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
>   *
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 6d9368ea3701..6c0a04d078f5 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -161,7 +161,8 @@
>  enum hv_isolation_type {
>  	HV_ISOLATION_TYPE_NONE	= 0,
>  	HV_ISOLATION_TYPE_VBS	= 1,
> -	HV_ISOLATION_TYPE_SNP	= 2
> +	HV_ISOLATION_TYPE_SNP	= 2,
> +	HV_ISOLATION_TYPE_TDX	= 3
>  };
>  
>  /* Hyper-V specific model specific registers (MSRs) */
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 61f0c206bff0..8a2cafec4675 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -14,6 +14,7 @@
>  union hv_ghcb;
>  
>  DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
> +DECLARE_STATIC_KEY_FALSE(isolation_type_tdx);
>  
>  typedef int (*hyperv_fill_flush_list_func)(
>  		struct hv_guest_mapping_flush_list *flush,
> @@ -32,6 +33,8 @@ extern u64 hv_current_partition_id;
>  
>  extern union hv_ghcb * __percpu *hv_ghcb_pg;
>  
> +extern bool hv_isolation_type_tdx(void);
> +
>  int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
>  int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
>  int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 46668e255421..941372449ff2 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -339,9 +339,14 @@ static void __init ms_hyperv_init_platform(void)
>  		}
>  		/* Isolation VMs are unenlightened SEV-based VMs, thus this check: */
>  		if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
> -			if (hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE)
> +			if (hv_get_isolation_type() == HV_ISOLATION_TYPE_VBS ||
> +			    hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
>  				cc_set_vendor(CC_VENDOR_HYPERV);
>  		}
> +
> +		if (IS_ENABLED(CONFIG_INTEL_TDX_GUEST) &&
> +		    hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX)
> +			static_branch_enable(&isolation_type_tdx);
>  	}
>  
>  	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index ae68298c0dca..a9a03ab04b97 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -268,6 +268,12 @@ bool __weak hv_isolation_type_snp(void)
>  }
>  EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
>  
> +bool __weak hv_isolation_type_tdx(void)
> +{
> +	return false;
> +}
> +EXPORT_SYMBOL_GPL(hv_isolation_type_tdx);
> +
>  void __weak hv_setup_vmbus_handler(void (*handler)(void))
>  {
>  }

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v2 5/6] x86/hyperv: Support hypercalls for TDX guests
  2022-12-07  0:33 ` [PATCH v2 5/6] x86/hyperv: Support hypercalls for TDX guests Dexuan Cui
@ 2022-12-12 16:38   ` Michael Kelley (LINUX)
  2022-12-12 19:10     ` Dexuan Cui
  2023-01-06 11:23   ` Zhi Wang
  1 sibling, 1 reply; 28+ messages in thread
From: Michael Kelley (LINUX) @ 2022-12-12 16:38 UTC (permalink / raw)
  To: Dexuan Cui, ak, arnd, bp, brijesh.singh, Williams, Dan J,
	dave.hansen, Haiyang Zhang, hpa, jane.chu, kirill.shutemov,
	KY Srinivasan, linux-arch, linux-hyperv, luto, mingo, peterz,
	rostedt, sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck,
	wei.liu, x86
  Cc: linux-kernel

From: Dexuan Cui <decui@microsoft.com> Sent: Tuesday, December 6, 2022 4:33 PM
> 
> A TDX guest uses the GHCI call rather than hv_hypercall_pg.
> 
> In hv_do_hypercall(), Hyper-V requires that the input/output addresses
> must have the cc_mask.

Is it a requirement that the input/output addresses refer to guest memory
pages that have marked as shared/decrypted?  For example, I don't see
any code to mark the hyperv_pcpu_input_arg page as shared/decrypted.
Do the use cases for the hyperv_pcpu_input_arg page just not occur in a
TDX VM?

Michael

> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> 
> ---
> 
> Changes in v2:
>   Implemented hv_tdx_hypercall() in C rather than in assembly code.
>   Renamed the parameter names of hv_tdx_hypercall().
>   Used cc_mkdec() directly in hv_do_hypercall().
> 
>  arch/x86/hyperv/hv_init.c       |  8 ++++++++
>  arch/x86/hyperv/ivm.c           | 14 ++++++++++++++
>  arch/x86/include/asm/mshyperv.h | 17 +++++++++++++++++
>  3 files changed, 39 insertions(+)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index a823fde1ad7f..c0ba53ad8b8e 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -430,6 +430,10 @@ void __init hyperv_init(void)
>  	/* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
>  	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
> 
> +	/* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
> +	if (hv_isolation_type_tdx())
> +		goto skip_hypercall_pg_init;
> +
>  	hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
>  			VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
>  			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
> @@ -469,6 +473,7 @@ void __init hyperv_init(void)
>  		wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>  	}
> 
> +skip_hypercall_pg_init:
>  	/*
>  	 * hyperv_init() is called before LAPIC is initialized: see
>  	 * apic_intr_mode_init() -> x86_platform.apic_post_init() and
> @@ -604,6 +609,9 @@ bool hv_is_hyperv_initialized(void)
>  	if (x86_hyper_type != X86_HYPER_MS_HYPERV)
>  		return false;
> 
> +	/* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
> +	if (hv_isolation_type_tdx())
> +		return true;
>  	/*
>  	 * Verify that earlier initialization succeeded by checking
>  	 * that the hypercall page is setup
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 13ccb52eecd7..07e4253b5809 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -276,6 +276,20 @@ bool hv_isolation_type_tdx(void)
>  {
>  	return static_branch_unlikely(&isolation_type_tdx);
>  }
> +
> +u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2)
> +{
> +	struct tdx_hypercall_args args = { };
> +
> +	args.r10 = control;
> +	args.rdx = param1;
> +	args.r8  = param2;
> +
> +	(void)__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT);
> +
> +	return args.r11;
> +}
> +EXPORT_SYMBOL_GPL(hv_tdx_hypercall);
>  #endif
> 
>  /*
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 8a2cafec4675..a4d665472d9e 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -10,6 +10,7 @@
>  #include <asm/nospec-branch.h>
>  #include <asm/paravirt.h>
>  #include <asm/mshyperv.h>
> +#include <asm/coco.h>
> 
>  union hv_ghcb;
> 
> @@ -39,6 +40,12 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32
> num_pages);
>  int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
>  int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
> 
> +u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
> +
> +/*
> + * If the hypercall involves no input or output parameters, the hypervisor
> + * ignores the corresponding GPA pointer.
> + */
>  static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  {
>  	u64 input_address = input ? virt_to_phys(input) : 0;
> @@ -46,6 +53,10 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void
> *output)
>  	u64 hv_status;
> 
>  #ifdef CONFIG_X86_64
> +	if (hv_isolation_type_tdx())
> +		return hv_tdx_hypercall(control,
> +					cc_mkdec(input_address),
> +					cc_mkdec(output_address));
>  	if (!hv_hypercall_pg)
>  		return U64_MAX;
> 
> @@ -83,6 +94,9 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
>  	u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
> 
>  #ifdef CONFIG_X86_64
> +	if (hv_isolation_type_tdx())
> +		return hv_tdx_hypercall(control, input1, 0);
> +
>  	{
>  		__asm__ __volatile__(CALL_NOSPEC
>  				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> @@ -114,6 +128,9 @@ static inline u64 hv_do_fast_hypercall16(u16 code, u64 input1,
> u64 input2)
>  	u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
> 
>  #ifdef CONFIG_X86_64
> +	if (hv_isolation_type_tdx())
> +		return hv_tdx_hypercall(control, input1, input2);
> +
>  	{
>  		__asm__ __volatile__("mov %4, %%r8\n"
>  				     CALL_NOSPEC
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v2 6/6] Drivers: hv: vmbus: Support TDX guests
  2022-12-07  0:33 ` [PATCH v2 6/6] Drivers: hv: vmbus: Support " Dexuan Cui
@ 2022-12-12 17:02   ` Michael Kelley (LINUX)
  2022-12-12 19:18     ` Dexuan Cui
  0 siblings, 1 reply; 28+ messages in thread
From: Michael Kelley (LINUX) @ 2022-12-12 17:02 UTC (permalink / raw)
  To: Dexuan Cui, ak, arnd, bp, brijesh.singh, Williams, Dan J,
	dave.hansen, Haiyang Zhang, hpa, jane.chu, kirill.shutemov,
	KY Srinivasan, linux-arch, linux-hyperv, luto, mingo, peterz,
	rostedt, sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck,
	wei.liu, x86
  Cc: linux-kernel

From: Dexuan Cui <decui@microsoft.com> Sent: Tuesday, December 6, 2022 4:33 PM
> 
> Add Hyper-V specific code so that a TDX guest can run on Hyper-V:
>   No need to use hv_vp_assist_page.
>   Don't use the unsafe Hyper-V TSC page.
>   Don't try to use HV_REGISTER_CRASH_CTL.
>   Share SynIC Event/Message pages and VMBus Monitor pages with the host.
>   Use pgprot_decrypted(PAGE_KERNEL_NOENC))in hv_ringbuffer_init().
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> 
> Changes in v2:
>   Used a new function hv_set_memory_enc_dec_needed() in
>     __set_memory_enc_pgtable().
>   Added the missing set_memory_encrypted() in hv_synic_free().
> 
> ---
> 
>  arch/x86/hyperv/hv_init.c      | 19 ++++++++---
>  arch/x86/hyperv/ivm.c          |  5 +++
>  arch/x86/kernel/cpu/mshyperv.c | 17 +++++++++-
>  arch/x86/mm/pat/set_memory.c   |  2 +-
>  drivers/hv/connection.c        |  4 ++-
>  drivers/hv/hv.c                | 60 +++++++++++++++++++++++++++++++++-
>  drivers/hv/hv_common.c         |  6 ++++
>  drivers/hv/ring_buffer.c       |  2 +-
>  include/asm-generic/mshyperv.h |  2 ++
>  9 files changed, 108 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index c0ba53ad8b8e..8d7b63346194 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -77,7 +77,7 @@ static int hyperv_init_ghcb(void)
>  static int hv_cpu_init(unsigned int cpu)
>  {
>  	union hv_vp_assist_msr_contents msr = { 0 };
> -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
> +	struct hv_vp_assist_page **hvp;
>  	int ret;
> 
>  	ret = hv_common_cpu_init(cpu);
> @@ -87,6 +87,7 @@ static int hv_cpu_init(unsigned int cpu)
>  	if (!hv_vp_assist_page)
>  		return 0;
> 
> +	hvp = &hv_vp_assist_page[cpu];
>  	if (hv_root_partition) {
>  		/*
>  		 * For root partition we get the hypervisor provided VP assist
> @@ -396,11 +397,21 @@ void __init hyperv_init(void)
>  	if (hv_common_init())
>  		return;
> 
> -	hv_vp_assist_page = kcalloc(num_possible_cpus(),
> -				    sizeof(*hv_vp_assist_page), GFP_KERNEL);
> +	/*
> +	 * The VP assist page is useless to a TDX guest: the only use we
> +	 * would have for it is lazy EOI, which can not be used with TDX.
> +	 */
> +	if (hv_isolation_type_tdx())
> +		hv_vp_assist_page = NULL;
> +	else
> +		hv_vp_assist_page = kcalloc(num_possible_cpus(),
> +					    sizeof(*hv_vp_assist_page),
> +					    GFP_KERNEL);
>  	if (!hv_vp_assist_page) {
>  		ms_hyperv.hints &=
> ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
> -		goto common_free;
> +
> +		if (!hv_isolation_type_tdx())
> +			goto common_free;
>  	}
> 
>  	if (hv_isolation_type_snp()) {
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 07e4253b5809..4398042f10d5 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -258,6 +258,11 @@ bool hv_is_isolation_supported(void)
>  	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>  }
> 
> +bool hv_set_memory_enc_dec_needed(void)
> +{
> +	return hv_is_isolation_supported() && !hv_isolation_type_tdx();
> +}
> +
>  DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
> 
>  /*
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 941372449ff2..24569da3c1f8 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -345,8 +345,23 @@ static void __init ms_hyperv_init_platform(void)
>  		}
> 
>  		if (IS_ENABLED(CONFIG_INTEL_TDX_GUEST) &&
> -		    hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX)
> +		    hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
>  			static_branch_enable(&isolation_type_tdx);
> +
> +			/*
> +			 * The GPAs of SynIC Event/Message pages and VMBus
> +			 * Moniter pages need to be added by this offset.
> +			 */
> +			ms_hyperv.shared_gpa_boundary = cc_mkdec(0);
> +
> +			/* Don't use the unsafe Hyper-V TSC page */
> +			ms_hyperv.features &=
> +				~HV_MSR_REFERENCE_TSC_AVAILABLE;
> +
> +			/* HV_REGISTER_CRASH_CTL is unsupported */
> +			ms_hyperv.misc_features &=
> +				 ~HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
> +		}
>  	}
> 
>  	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 2e5a045731de..5892196f8ade 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -2120,7 +2120,7 @@ static int __set_memory_enc_pgtable(unsigned long addr,
> int numpages, bool enc)
> 
>  static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>  {
> -	if (hv_is_isolation_supported())
> +	if (hv_set_memory_enc_dec_needed())
>  		return hv_set_mem_host_visibility(addr, numpages, !enc);
> 
>  	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 9dc27e5d367a..1ecc3c29e3f7 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -250,12 +250,14 @@ int vmbus_connect(void)
>  		 * Isolation VM with AMD SNP needs to access monitor page via
>  		 * address space above shared gpa boundary.
>  		 */
> -		if (hv_isolation_type_snp()) {
> +		if (hv_isolation_type_snp() || hv_isolation_type_tdx()) {
>  			vmbus_connection.monitor_pages_pa[0] +=
>  				ms_hyperv.shared_gpa_boundary;
>  			vmbus_connection.monitor_pages_pa[1] +=
>  				ms_hyperv.shared_gpa_boundary;
> +		}
> 
> +		if (hv_isolation_type_snp()) {
>  			vmbus_connection.monitor_pages[0]
>  				=
> memremap(vmbus_connection.monitor_pages_pa[0],
>  					   HV_HYP_PAGE_SIZE,
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 4d6480d57546..78aca415985c 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -18,6 +18,7 @@
>  #include <linux/clockchips.h>
>  #include <linux/delay.h>
>  #include <linux/interrupt.h>
> +#include <linux/set_memory.h>
>  #include <clocksource/hyperv_timer.h>
>  #include <asm/mshyperv.h>
>  #include "hyperv_vmbus.h"
> @@ -119,6 +120,7 @@ int hv_synic_alloc(void)
>  {
>  	int cpu;
>  	struct hv_per_cpu_context *hv_cpu;
> +	int ret = -ENOMEM;
> 
>  	/*
>  	 * First, zero all per-cpu memory areas so hv_synic_free() can
> @@ -168,6 +170,30 @@ int hv_synic_alloc(void)
>  			pr_err("Unable to allocate post msg page\n");
>  			goto err;
>  		}
> +
> +
> +		if (hv_isolation_type_tdx()) {
> +			ret = set_memory_decrypted(
> +				(unsigned long)hv_cpu->synic_message_page, 1);
> +			if (ret) {
> +				pr_err("Failed to decrypt SYNIC msg page\n");
> +				goto err;
> +			}
> +
> +			ret = set_memory_decrypted(
> +				(unsigned long)hv_cpu->synic_event_page, 1);
> +			if (ret) {
> +				pr_err("Failed to decrypt SYNIC event page\n");
> +				goto err;
> +			}
> +
> +			ret = set_memory_decrypted(
> +				(unsigned long)hv_cpu->post_msg_page, 1);
> +			if (ret) {
> +				pr_err("Failed to decrypt post msg page\n");
> +				goto err;
> +			}
> +		}
>  	}
> 
>  	return 0;
> @@ -176,18 +202,42 @@ int hv_synic_alloc(void)
>  	 * Any memory allocations that succeeded will be freed when
>  	 * the caller cleans up by calling hv_synic_free()
>  	 */
> -	return -ENOMEM;
> +	return ret;
>  }
> 
> 
>  void hv_synic_free(void)
>  {
>  	int cpu;
> +	int ret;
> 
>  	for_each_present_cpu(cpu) {
>  		struct hv_per_cpu_context *hv_cpu
>  			= per_cpu_ptr(hv_context.cpu_context, cpu);
> 
> +		if (hv_isolation_type_tdx()) {
> +			ret = set_memory_encrypted(
> +				(unsigned long)hv_cpu->synic_message_page, 1);
> +			if (ret) {
> +				pr_err("Failed to encrypt SYNIC msg page\n");
> +				continue;
> +			}
> +
> +			ret = set_memory_encrypted(
> +				(unsigned long)hv_cpu->synic_event_page, 1);
> +			if (ret) {
> +				pr_err("Failed to encrypt SYNIC event page\n");
> +				continue;
> +			}
> +
> +			ret = set_memory_encrypted(
> +				(unsigned long)hv_cpu->post_msg_page, 1);
> +			if (ret) {
> +				pr_err("Failed to encrypt post msg page\n");
> +				continue;
> +			}
> +		}
> +
>  		free_page((unsigned long)hv_cpu->synic_event_page);
>  		free_page((unsigned long)hv_cpu->synic_message_page);
>  		free_page((unsigned long)hv_cpu->post_msg_page);
> @@ -225,6 +275,10 @@ void hv_synic_enable_regs(unsigned int cpu)
>  	} else {
>  		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
>  			>> HV_HYP_PAGE_SHIFT;
> +
> +		if (hv_isolation_type_tdx())
> +			simp.base_simp_gpa |= ms_hyperv.shared_gpa_boundary
> +				>> HV_HYP_PAGE_SHIFT;

Since we're using cc_mkdec() in hv_do_hypercall() to set the SHARED bit,
perhaps the same could be done here to simplify the code and avoid the
explicit call to hv_isolation_type_tdx():

		simp.base_simp_gpa = cc_mkdec(virt_to_phys(hv_cpu->synic_message))
			>> HV_HYP_PAGE_SHIFT;

cc_mkdec() does nothing in a normal VM, and vTOM VMs are already
special-cased.

>  	}
> 
>  	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
> @@ -243,6 +297,10 @@ void hv_synic_enable_regs(unsigned int cpu)
>  	} else {
>  		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
>  			>> HV_HYP_PAGE_SHIFT;
> +
> +		if (hv_isolation_type_tdx())
> +			siefp.base_siefp_gpa |= ms_hyperv.shared_gpa_boundary
> +				>> HV_HYP_PAGE_SHIFT;

Same here.


>  	}
> 
>  	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index a9a03ab04b97..192dcf295dfc 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -262,6 +262,12 @@ bool __weak hv_is_isolation_supported(void)
>  }
>  EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
> 
> +bool __weak hv_set_memory_enc_dec_needed(void)
> +{
> +	return false;
> +}
> +EXPORT_SYMBOL_GPL(hv_set_memory_enc_dec_needed);
> +
>  bool __weak hv_isolation_type_snp(void)
>  {
>  	return false;
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index c6692fd5ab15..a51da82316ce 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -233,7 +233,7 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
> 
>  		ring_info->ring_buffer = (struct hv_ring_buffer *)
>  			vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP,
> -				PAGE_KERNEL);
> +				pgprot_decrypted(PAGE_KERNEL_NOENC));
> 
>  		kfree(pages_wraparound);
>  		if (!ring_info->ring_buffer)
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index bfb9eb9d7215..b7b1b18c9854 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -262,6 +262,7 @@ bool hv_is_hyperv_initialized(void);
>  bool hv_is_hibernation_supported(void);
>  enum hv_isolation_type hv_get_isolation_type(void);
>  bool hv_is_isolation_supported(void);
> +bool hv_set_memory_enc_dec_needed(void);
>  bool hv_isolation_type_snp(void);
>  u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
>  void hyperv_cleanup(void);
> @@ -274,6 +275,7 @@ static inline bool hv_is_hyperv_initialized(void) { return false; }
>  static inline bool hv_is_hibernation_supported(void) { return false; }
>  static inline void hyperv_cleanup(void) {}
>  static inline bool hv_is_isolation_supported(void) { return false; }
> +static inline bool hv_set_memory_enc_dec_needed(void) { return false; }
>  static inline enum hv_isolation_type hv_get_isolation_type(void)
>  {
>  	return HV_ISOLATION_TYPE_NONE;
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v2 5/6] x86/hyperv: Support hypercalls for TDX guests
  2022-12-12 16:38   ` Michael Kelley (LINUX)
@ 2022-12-12 19:10     ` Dexuan Cui
  0 siblings, 0 replies; 28+ messages in thread
From: Dexuan Cui @ 2022-12-12 19:10 UTC (permalink / raw)
  To: Michael Kelley (LINUX),
	ak, arnd, bp, brijesh.singh, Williams, Dan J, dave.hansen,
	Haiyang Zhang, hpa, jane.chu, kirill.shutemov, KY Srinivasan,
	linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86
  Cc: linux-kernel

> From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> Sent: Monday, December 12, 2022 8:39 AM
> > [...]
> > A TDX guest uses the GHCI call rather than hv_hypercall_pg.
> >
> > In hv_do_hypercall(), Hyper-V requires that the input/output addresses
> > must have the cc_mask.
> 
> Is it a requirement that the input/output addresses refer to guest memory
> pages that have marked as shared/decrypted?  For example, I don't see

Yes. 

> any code to mark the hyperv_pcpu_input_arg page as shared/decrypted.
> Do the use cases for the hyperv_pcpu_input_arg page just not occur in a
> TDX VM?

I missed this when sending v2, and I realized this when testing DDA.
Will get this fixed in v3.

BTW, I noticed Tianyu posted a similar patch:
[RFC PATCH V2 10/18] drivers: hv: Decrypt percpu hvcall input arg page in sev-snp enlightened guest


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v2 6/6] Drivers: hv: vmbus: Support TDX guests
  2022-12-12 17:02   ` Michael Kelley (LINUX)
@ 2022-12-12 19:18     ` Dexuan Cui
  0 siblings, 0 replies; 28+ messages in thread
From: Dexuan Cui @ 2022-12-12 19:18 UTC (permalink / raw)
  To: Michael Kelley (LINUX),
	ak, arnd, bp, brijesh.singh, Williams, Dan J, dave.hansen,
	Haiyang Zhang, hpa, jane.chu, kirill.shutemov, KY Srinivasan,
	linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86
  Cc: linux-kernel

> From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> Sent: Monday, December 12, 2022 9:02 AM
> > [...]
> > @@ -225,6 +275,10 @@ void hv_synic_enable_regs(unsigned int cpu)
> >  	} else {
> >  		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
> >  			>> HV_HYP_PAGE_SHIFT;
> > +
> > +		if (hv_isolation_type_tdx())
> > +			simp.base_simp_gpa |= ms_hyperv.shared_gpa_boundary
> > +				>> HV_HYP_PAGE_SHIFT;
> 
> Since we're using cc_mkdec() in hv_do_hypercall() to set the SHARED bit,
> perhaps the same could be done here to simplify the code and avoid the
> explicit call to hv_isolation_type_tdx():

Good idea! Will do.

> 		simp.base_simp_gpa =
> cc_mkdec(virt_to_phys(hv_cpu->synic_message))
> 			>> HV_HYP_PAGE_SHIFT;
> 
> cc_mkdec() does nothing in a normal VM, and vTOM VMs are already
> special-cased.

This should work for C-bit SNP as well (clearing cc_mask from the GPA
doesn't really change the GPA since the GPA doesn't have the bit in the
first place).

> >  	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
> > @@ -243,6 +297,10 @@ void hv_synic_enable_regs(unsigned int cpu)
> >  	} else {
> >  		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
> >  			>> HV_HYP_PAGE_SHIFT;
> > +
> > +		if (hv_isolation_type_tdx())
> > +			siefp.base_siefp_gpa |= ms_hyperv.shared_gpa_boundary
> > +				>> HV_HYP_PAGE_SHIFT;
> 
> Same here.

Will do.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed()
  2022-12-07  0:33 ` [PATCH v2 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed() Dexuan Cui
@ 2023-01-05  9:44   ` Zhi Wang
  2023-01-05 17:33     ` Dexuan Cui
  0 siblings, 1 reply; 28+ messages in thread
From: Zhi Wang @ 2023-01-05  9:44 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	haiyangz, hpa, jane.chu, kirill.shutemov, kys, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, mikelley, linux-kernel, zhi.a.wang

On Tue,  6 Dec 2022 16:33:21 -0800
Dexuan Cui <decui@microsoft.com> wrote:

> When a TDX guest runs on Hyper-V, the hv_netvsc driver's
> netvsc_init_buf() allocates buffers using vzalloc(), and needs to share
> the buffers with the host OS by calling set_memory_decrypted(), which is
> not working for vmalloc() yet. Add the support by handling the pages one
> by one.
> 

It seems calling set_memory_decrypted() in netvsc_init_buf() is missing in
this patch series. I guess there should be another one extra patch to cover
that.

> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> 
> ---
> 
> Changes in v2:
>   Changed tdx_enc_status_changed() in place.
>   
> Hi, Dave, I checked the huge vmalloc mapping code, but still don't know
> how to get the underlying huge page info (if huge page is in use) and
> try to use PG_LEVEL_2M/1G in try_accept_page() for vmalloc: I checked
> is_vm_area_hugepages() and  __vfree() -> __vunmap(), and I think the
> underlying page allocation info is internal to the mm code, and there
> is no mm API to for me get the info in tdx_enc_status_changed().
> 
> Hi, Kirill, the load_unaligned_zeropad() issue is not addressed in
> this patch. The issue looks like a generic issue that also happens to
> AMD SNP vTOM mode and C-bit mode. Will need to figure out how to
> address the issue. If we decide to adjust direct mapping to have the
> shared bit set, it lools like we need to do the below for each
> 'start_va' vmalloc page:
>   pa = slow_virt_to_phys(start_va);
>   set_memory_decrypted(phys_to_virt(pa), 1); -- this line calls
> tdx_enc_status_changed() the second time for the page, which is bad.
> It looks like we need to find a way to reuse the cpa_flush() related
> code in __set_memory_enc_pgtable() and make sure we call
> tdx_enc_status_changed() only once for a vmalloc page?
> 
>   
>  arch/x86/coco/tdx/tdx.c | 69 ++++++++++++++++++++++++++---------------
>  1 file changed, 44 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index cdeda698d308..795ac56f06b8 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -5,6 +5,7 @@
>  #define pr_fmt(fmt)     "tdx: " fmt
>  
>  #include <linux/cpufeature.h>
> +#include <linux/mm.h>
>  #include <asm/coco.h>
>  #include <asm/tdx.h>
>  #include <asm/vmx.h>
> @@ -693,6 +694,34 @@ static bool try_accept_one(phys_addr_t *start,
> unsigned long len, return true;
>  }
>  
> +static bool try_accept_page(phys_addr_t start, phys_addr_t end)
> +{
> +	/*
> +	 * For shared->private conversion, accept the page using
> +	 * TDX_ACCEPT_PAGE TDX module call.
> +	 */
> +	while (start < end) {
> +		unsigned long len = end - start;
> +
> +		/*
> +		 * Try larger accepts first. It gives chance to VMM to
> keep
> +		 * 1G/2M SEPT entries where possible and speeds up
> process by
> +		 * cutting number of hypercalls (if successful).
> +		 */
> +
> +		if (try_accept_one(&start, len, PG_LEVEL_1G))
> +			continue;
> +
> +		if (try_accept_one(&start, len, PG_LEVEL_2M))
> +			continue;
> +
> +		if (!try_accept_one(&start, len, PG_LEVEL_4K))
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
>  /*
>   * Notify the VMM about page mapping conversion. More info about ABI
>   * can be found in TDX Guest-Host-Communication Interface (GHCI),
> @@ -749,37 +778,27 @@ static bool tdx_map_gpa(phys_addr_t start,
> phys_addr_t end, bool enc) */
>  static bool tdx_enc_status_changed(unsigned long vaddr, int numpages,
> bool enc) {
> -	phys_addr_t start = __pa(vaddr);
> -	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
> +	bool is_vmalloc = is_vmalloc_addr((void *)vaddr);
> +	unsigned long len = numpages * PAGE_SIZE;
> +	void *start_va = (void *)vaddr, *end_va = start_va + len;
> +	phys_addr_t start_pa, end_pa;
>  
> -	if (!tdx_map_gpa(start, end, enc))
> +	if (offset_in_page(start_va) != 0)
>  		return false;
>  
> -	/* private->shared conversion  requires only MapGPA call */
> -	if (!enc)
> -		return true;
> -
> -	/*
> -	 * For shared->private conversion, accept the page using
> -	 * TDX_ACCEPT_PAGE TDX module call.
> -	 */
> -	while (start < end) {
> -		unsigned long len = end - start;
> -
> -		/*
> -		 * Try larger accepts first. It gives chance to VMM to
> keep
> -		 * 1G/2M SEPT entries where possible and speeds up
> process by
> -		 * cutting number of hypercalls (if successful).
> -		 */
> -
> -		if (try_accept_one(&start, len, PG_LEVEL_1G))
> -			continue;
> +	while (start_va < end_va) {
> +		start_pa = is_vmalloc ? slow_virt_to_phys(start_va) :
> +					__pa(start_va);
> +		end_pa = start_pa + (is_vmalloc ? PAGE_SIZE : len);
>  
> -		if (try_accept_one(&start, len, PG_LEVEL_2M))
> -			continue;
> +		if (!tdx_map_gpa(start_pa, end_pa, enc))
> +			return false;
>  
> -		if (!try_accept_one(&start, len, PG_LEVEL_4K))
> +		/* private->shared conversion requires only MapGPA call
> */
> +		if (enc && !try_accept_page(start_pa, end_pa))
>  			return false;
> +
> +		start_va += is_vmalloc ? PAGE_SIZE : len;
>  	}
>  
>  	return true;


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v2 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed()
  2023-01-05  9:44   ` Zhi Wang
@ 2023-01-05 17:33     ` Dexuan Cui
  2023-01-05 18:10       ` Zhi Wang
  0 siblings, 1 reply; 28+ messages in thread
From: Dexuan Cui @ 2023-01-05 17:33 UTC (permalink / raw)
  To: Zhi Wang
  Cc: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	Haiyang Zhang, hpa, jane.chu, kirill.shutemov, KY Srinivasan,
	linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, Michael Kelley (LINUX),
	linux-kernel, zhi.a.wang

> From: Zhi Wang <zhi.wang.linux@gmail.com>
> Sent: Thursday, January 5, 2023 1:45 AM
>  [...]
> On Tue,  6 Dec 2022 16:33:21 -0800
> Dexuan Cui <decui@microsoft.com> wrote:
> 
> > When a TDX guest runs on Hyper-V, the hv_netvsc driver's
> > netvsc_init_buf() allocates buffers using vzalloc(), and needs to share
> > the buffers with the host OS by calling set_memory_decrypted(), which is
> > not working for vmalloc() yet. Add the support by handling the pages one
> > by one.
> 
> It seems calling set_memory_decrypted() in netvsc_init_buf() is missing in
> this patch series. I guess there should be another one extra patch to cover
> that.

set_memory_decrypted() is not missing here. In netvsc_init_buf(), after
the line "net_device->recv_buf = vzalloc(buf_size);", we have 

vmbus_establish_gpadl(device->channel, net_device->recv_buf, ...), which

calls __vmbus_establish_gpadl(), which calls 

set_memory_decrypted((unsigned long)kbuffer, ...)


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed()
  2023-01-05 17:33     ` Dexuan Cui
@ 2023-01-05 18:10       ` Zhi Wang
  2023-01-05 20:29         ` Dexuan Cui
  0 siblings, 1 reply; 28+ messages in thread
From: Zhi Wang @ 2023-01-05 18:10 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	Haiyang Zhang, hpa, jane.chu, kirill.shutemov, KY Srinivasan,
	linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, Michael Kelley (LINUX),
	linux-kernel, zhi.a.wang

On Thu, 5 Jan 2023 17:33:16 +0000
Dexuan Cui <decui@microsoft.com> wrote:

> > From: Zhi Wang <zhi.wang.linux@gmail.com>
> > Sent: Thursday, January 5, 2023 1:45 AM
> >  [...]
> > On Tue,  6 Dec 2022 16:33:21 -0800
> > Dexuan Cui <decui@microsoft.com> wrote:
> > 
> > > When a TDX guest runs on Hyper-V, the hv_netvsc driver's
> > > netvsc_init_buf() allocates buffers using vzalloc(), and needs to
> > > share the buffers with the host OS by calling
> > > set_memory_decrypted(), which is not working for vmalloc() yet. Add
> > > the support by handling the pages one by one.
> > 
> > It seems calling set_memory_decrypted() in netvsc_init_buf() is
> > missing in this patch series. I guess there should be another one
> > extra patch to cover that.
> 
> set_memory_decrypted() is not missing here. In netvsc_init_buf(), after
> the line "net_device->recv_buf = vzalloc(buf_size);", we have 
> 
> vmbus_establish_gpadl(device->channel, net_device->recv_buf, ...), which
> 
> calls __vmbus_establish_gpadl(), which calls 
> 
> set_memory_decrypted((unsigned long)kbuffer, ...)
> 

I see. Then do we still need the hv_map_memory()in the following
code piece in netvsc.c after {set_memoery_encrypted, decrypted}()
supporting memory from vmalloc()?

	/* set_memory_decrypted() is called here. */

        ret = vmbus_establish_gpadl(device->channel, net_device->recv_buf,
                                    buf_size,
                                    &net_device->recv_buf_gpadl_handle);
        if (ret != 0) {
                netdev_err(ndev,
                        "unable to establish receive buffer's gpadl\n");
                goto cleanup;
        }
	
	/* Should we remove this? */
        if (hv_isolation_type_snp()) {
                vaddr = hv_map_memory(net_device->recv_buf, buf_size);
                if (!vaddr) {
                        ret = -ENOMEM;
                        goto cleanup;
                }

                net_device->recv_original_buf = net_device->recv_buf;
                net_device->recv_buf = vaddr;
        }

I assume that we need an VA mapped to a shared GPA here.

The VA(net_device->recv_buf) has been associated with a shared GPA in
set_memory_decrypted() by adjusting the kernel page table. hv_map_memory()
is with similar purpose but just a different way:

void *hv_map_memory(void *addr, unsigned long size)
{
        unsigned long *pfns = kcalloc(size / PAGE_SIZE,
                                      sizeof(unsigned long), GFP_KERNEL);
        void *vaddr;
        int i;

        if (!pfns)
                return NULL;

        for (i = 0; i < size / PAGE_SIZE; i++)
                pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) +
                        (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);

        vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
        kfree(pfns);

        return vaddr;
}

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v2 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed()
  2023-01-05 18:10       ` Zhi Wang
@ 2023-01-05 20:29         ` Dexuan Cui
  2023-01-06 10:10           ` Zhi Wang
  0 siblings, 1 reply; 28+ messages in thread
From: Dexuan Cui @ 2023-01-05 20:29 UTC (permalink / raw)
  To: Zhi Wang
  Cc: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	Haiyang Zhang, hpa, jane.chu, kirill.shutemov, KY Srinivasan,
	linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, Michael Kelley (LINUX),
	linux-kernel, zhi.a.wang

> From: Zhi Wang <zhi.wang.linux@gmail.com>
> Sent: Thursday, January 5, 2023 10:10 AM
> [...]
> I see. Then do we still need the hv_map_memory()in the following
> code piece in netvsc.c after {set_memoery_encrypted, decrypted}()
> supporting memory from vmalloc()?

For SNP, set_memory_decrypted() is already able to support memory
from vmalloc().

For TDX, currently set_memory_decrypted()() is unable to support
memory from vmalloc().

>         /* set_memory_decrypted() is called here. */
>         ret = vmbus_establish_gpadl(device->channel,
> net_device->recv_buf, buf_size, 
> &net_device->recv_buf_gpadl_handle);
>         if (ret != 0) {
>                 netdev_err(ndev,
>                         "unable to establish receive buffer's gpadl\n");
>                 goto cleanup;
>         }
> 
>         /* Should we remove this? */

The below block of code is for SNP rather than TDX, so it has nothing to do
with the patch here. BTW, the code is ineeded removed in Michael's patchset,
which is for device assignment support for SNP guests on Hyper-V:
https://lwn.net/ml/linux-kernel/1669951831-4180-11-git-send-email-mikelley@microsoft.com/
and I'm happy with the removal of the code.

>         if (hv_isolation_type_snp()) {
>                 vaddr = hv_map_memory(net_device->recv_buf, buf_size);
>                 if (!vaddr) {
>                         ret = -ENOMEM;
>                         goto cleanup;
>                 }
> 
>                 net_device->recv_original_buf = net_device->recv_buf;
>                 net_device->recv_buf = vaddr;
>         }
> 
> I assume that we need an VA mapped to a shared GPA here.

Yes.

> The VA(net_device->recv_buf) has been associated with a shared GPA in
> set_memory_decrypted() by adjusting the kernel page table.

For a SNP guest with pavavisor on Hyper-V, this is not true in the current
mainline kernel: see set_memory_decrypted() -> __set_memory_enc_dec():

static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
{
		//Dexuan: For a SNP guest with paravisor on Hyper-V, currently we
        // only call hv_set_mem_host_visibility(), i.e. the page tabe is not
        // updated. This is being changed by Michael's patchset, e.g.,
https://lwn.net/ml/linux-kernel/1669951831-4180-7-git-send-email-mikelley@microsoft.com/
        
        if (hv_is_isolation_supported())
                return hv_set_mem_host_visibility(addr, numpages, !enc);

        if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
                return __set_memory_enc_pgtable(addr, numpages, enc);

        return 0;
}

> hv_map_memory()
> is with similar purpose but just a different way:
> 
> void *hv_map_memory(void *addr, unsigned long size)
> {
>         unsigned long *pfns = kcalloc(size / PAGE_SIZE,
>                                       sizeof(unsigned long),
> GFP_KERNEL);
>         void *vaddr;
>         int i;
> 
>         if (!pfns)
>                 return NULL;
> 
>         for (i = 0; i < size / PAGE_SIZE; i++)
>                 pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) +
>                         (ms_hyperv.shared_gpa_boundary >>
> PAGE_SHIFT);
> 
>         vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
>         kfree(pfns);
> 
>         return vaddr;
> }

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed()
  2023-01-05 20:29         ` Dexuan Cui
@ 2023-01-06 10:10           ` Zhi Wang
  2023-01-06 15:39             ` Dexuan Cui
  0 siblings, 1 reply; 28+ messages in thread
From: Zhi Wang @ 2023-01-06 10:10 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	Haiyang Zhang, hpa, jane.chu, kirill.shutemov, KY Srinivasan,
	linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, Michael Kelley (LINUX),
	linux-kernel, zhi.a.wang

On Thu, 5 Jan 2023 20:29:25 +0000
Dexuan Cui <decui@microsoft.com> wrote:

> > From: Zhi Wang <zhi.wang.linux@gmail.com>
> > Sent: Thursday, January 5, 2023 10:10 AM
> > [...]
> > I see. Then do we still need the hv_map_memory()in the following
> > code piece in netvsc.c after {set_memoery_encrypted, decrypted}()
> > supporting memory from vmalloc()?
> 
> For SNP, set_memory_decrypted() is already able to support memory
> from vmalloc().
> 
> For TDX, currently set_memory_decrypted()() is unable to support
> memory from vmalloc().
> 
I guess we both agree that memory conversion in HV should be done through
coco so the hv_map_memory can be removed (even the extra does not hurt
currently)

The memory conversion in current HV code is done by different approaches.
Some are going through the coco, some are not, which ends up
with if(hv_isolation_type_snp()) in memory allocation path. It can be
confusing. I suppose a reasonable purpose of hv_isolation_type_snp()
should cover the AMD SEV-SNP specific parts which haven't been (or are
not going to be) covered by coco. For example the GHCB stuff. 

Thanks,
Zhi.

> >         /* set_memory_decrypted() is called here. */
> >         ret = vmbus_establish_gpadl(device->channel,
> > net_device->recv_buf, buf_size, 
> > &net_device->recv_buf_gpadl_handle);
> >         if (ret != 0) {
> >                 netdev_err(ndev,
> >                         "unable to establish receive buffer's
> > gpadl\n"); goto cleanup;
> >         }
> > 
> >         /* Should we remove this? */
> 
> The below block of code is for SNP rather than TDX, so it has nothing to
> do with the patch here. BTW, the code is ineeded removed in Michael's
> patchset, which is for device assignment support for SNP guests on
> Hyper-V:
> https://lwn.net/ml/linux-kernel/1669951831-4180-11-git-send-email-mikelley@microsoft.com/

So happy to see this. :)

> and I'm happy with the removal of the code.
> 
> >         if (hv_isolation_type_snp()) {
> >                 vaddr = hv_map_memory(net_device->recv_buf, buf_size);
> >                 if (!vaddr) {
> >                         ret = -ENOMEM;
> >                         goto cleanup;
> >                 }
> > 
> >                 net_device->recv_original_buf = net_device->recv_buf;
> >                 net_device->recv_buf = vaddr;
> >         }
> > 
> > I assume that we need an VA mapped to a shared GPA here.
> 
> Yes.
> 
> > The VA(net_device->recv_buf) has been associated with a shared GPA in
> > set_memory_decrypted() by adjusting the kernel page table.
> 
> For a SNP guest with pavavisor on Hyper-V, this is not true in the
> current mainline kernel: see set_memory_decrypted() ->
> __set_memory_enc_dec():
> 
> static int __set_memory_enc_dec(unsigned long addr, int numpages, bool
> enc) {
> 		//Dexuan: For a SNP guest with paravisor on Hyper-V,
> currently we // only call hv_set_mem_host_visibility(), i.e. the page
> tabe is not // updated. This is being changed by Michael's patchset,
> e.g.,
> https://lwn.net/ml/linux-kernel/1669951831-4180-7-git-send-email-mikelley@microsoft.com/ 
>         if (hv_is_isolation_supported())
>                 return hv_set_mem_host_visibility(addr, numpages, !enc);
> 
>         if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
>                 return __set_memory_enc_pgtable(addr, numpages, enc);
> 
>         return 0;
> }
> 
> > hv_map_memory()
> > is with similar purpose but just a different way:
> > 
> > void *hv_map_memory(void *addr, unsigned long size)
> > {
> >         unsigned long *pfns = kcalloc(size / PAGE_SIZE,
> >                                       sizeof(unsigned long),
> > GFP_KERNEL);
> >         void *vaddr;
> >         int i;
> > 
> >         if (!pfns)
> >                 return NULL;
> > 
> >         for (i = 0; i < size / PAGE_SIZE; i++)
> >                 pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) +
> >                         (ms_hyperv.shared_gpa_boundary >>
> > PAGE_SHIFT);
> > 
> >         vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
> >         kfree(pfns);
> > 
> >         return vaddr;
> > }


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 5/6] x86/hyperv: Support hypercalls for TDX guests
  2022-12-07  0:33 ` [PATCH v2 5/6] x86/hyperv: Support hypercalls for TDX guests Dexuan Cui
  2022-12-12 16:38   ` Michael Kelley (LINUX)
@ 2023-01-06 11:23   ` Zhi Wang
  2023-01-09  7:27     ` Dexuan Cui
  1 sibling, 1 reply; 28+ messages in thread
From: Zhi Wang @ 2023-01-06 11:23 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	haiyangz, hpa, jane.chu, kirill.shutemov, kys, linux-arch,
	linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, mikelley, linux-kernel

On Tue,  6 Dec 2022 16:33:24 -0800
Dexuan Cui <decui@microsoft.com> wrote:

> A TDX guest uses the GHCI call rather than hv_hypercall_pg.
> 
> In hv_do_hypercall(), Hyper-V requires that the input/output addresses
> must have the cc_mask.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> 
> ---
> 
> Changes in v2:
>   Implemented hv_tdx_hypercall() in C rather than in assembly code.
>   Renamed the parameter names of hv_tdx_hypercall().
>   Used cc_mkdec() directly in hv_do_hypercall().
> 
>  arch/x86/hyperv/hv_init.c       |  8 ++++++++
>  arch/x86/hyperv/ivm.c           | 14 ++++++++++++++
>  arch/x86/include/asm/mshyperv.h | 17 +++++++++++++++++
>  3 files changed, 39 insertions(+)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index a823fde1ad7f..c0ba53ad8b8e 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -430,6 +430,10 @@ void __init hyperv_init(void)
>  	/* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
>  	hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
>  
> +	/* A TDX guest uses the GHCI call rather than hv_hypercall_pg.
> */
> +	if (hv_isolation_type_tdx())
> +		goto skip_hypercall_pg_init;
> +
>  	hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1,
> VMALLOC_START, VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
>  			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
> @@ -469,6 +473,7 @@ void __init hyperv_init(void)
>  		wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>  	}
>  
> +skip_hypercall_pg_init:
>  	/*
>  	 * hyperv_init() is called before LAPIC is initialized: see
>  	 * apic_intr_mode_init() -> x86_platform.apic_post_init() and
> @@ -604,6 +609,9 @@ bool hv_is_hyperv_initialized(void)
>  	if (x86_hyper_type != X86_HYPER_MS_HYPERV)
>  		return false;
>  
> +	/* A TDX guest uses the GHCI call rather than hv_hypercall_pg.
> */
> +	if (hv_isolation_type_tdx())
> +		return true;
>  	/*
>  	 * Verify that earlier initialization succeeded by checking
>  	 * that the hypercall page is setup
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 13ccb52eecd7..07e4253b5809 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -276,6 +276,20 @@ bool hv_isolation_type_tdx(void)
>  {
>  	return static_branch_unlikely(&isolation_type_tdx);
>  }
> +
> +u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2)
> +{
> +	struct tdx_hypercall_args args = { };
> +
> +	args.r10 = control;
> +	args.rdx = param1;
> +	args.r8  = param2;
> +
> +	(void)__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT);
> +
> +	return args.r11;
> +}
> +EXPORT_SYMBOL_GPL(hv_tdx_hypercall);
>  #endif
>  
>  /*
> diff --git a/arch/x86/include/asm/mshyperv.h
> b/arch/x86/include/asm/mshyperv.h index 8a2cafec4675..a4d665472d9e 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -10,6 +10,7 @@
>  #include <asm/nospec-branch.h>
>  #include <asm/paravirt.h>
>  #include <asm/mshyperv.h>
> +#include <asm/coco.h>
>  
>  union hv_ghcb;
>  
> @@ -39,6 +40,12 @@ int hv_call_deposit_pages(int node, u64 partition_id,
> u32 num_pages); int hv_call_add_logical_proc(int node, u32 lp_index, u32
> acpi_id); int hv_call_create_vp(int node, u64 partition_id, u32
> vp_index, u32 flags); 
> +u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
> +
> +/*
> + * If the hypercall involves no input or output parameters, the
> hypervisor
> + * ignores the corresponding GPA pointer.
> + */
>  static inline u64 hv_do_hypercall(u64 control, void *input, void
> *output) {
>  	u64 input_address = input ? virt_to_phys(input) : 0;
> @@ -46,6 +53,10 @@ static inline u64 hv_do_hypercall(u64 control, void
> *input, void *output) u64 hv_status;
>  
>  #ifdef CONFIG_X86_64
> +	if (hv_isolation_type_tdx())
> +		return hv_tdx_hypercall(control,
> +					cc_mkdec(input_address),
> +					cc_mkdec(output_address));
>  	if (!hv_hypercall_pg)
>  		return U64_MAX;
>  

> @@ -83,6 +94,9 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64
> input1) u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
>  
>  #ifdef CONFIG_X86_64
> +	if (hv_isolation_type_tdx())
> +		return hv_tdx_hypercall(control, input1, 0);
> +
>  	{
>  		__asm__ __volatile__(CALL_NOSPEC
>  				     : "=a" (hv_status),
> ASM_CALL_CONSTRAINT, @@ -114,6 +128,9 @@ static inline u64
> hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2) u64 hv_status,
> control = (u64)code | HV_HYPERCALL_FAST_BIT; 
>  #ifdef CONFIG_X86_64
> +	if (hv_isolation_type_tdx())
> +		return hv_tdx_hypercall(control, input1, input2);
> +
In some paths, for example vmbus_set_event(), choosing the SNP-based or
generic hypercall happens in the caller, while now TDX-based hypercall is
embraced in the generic hypercall path, e.g. hv_do_fast_hypercall8(). Which
style will be chosen in the future? Seems the coding structure needs to be
aligned.

void vmbus_set_event(struct vmbus_channel *channel)
{
        u32 child_relid = channel->offermsg.child_relid;

        if (!channel->is_dedicated_interrupt)
                vmbus_send_interrupt(child_relid);

        ++channel->sig_events;

        if (hv_isolation_type_snp())
                hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
                                NULL, sizeof(channel->sig_event));
        else
                hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT,
        channel->sig_event);
}


>  	{
>  		__asm__ __volatile__("mov %4, %%r8\n"
>  				     CALL_NOSPEC


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v2 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed()
  2023-01-06 10:10           ` Zhi Wang
@ 2023-01-06 15:39             ` Dexuan Cui
  0 siblings, 0 replies; 28+ messages in thread
From: Dexuan Cui @ 2023-01-06 15:39 UTC (permalink / raw)
  To: Zhi Wang
  Cc: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	Haiyang Zhang, hpa, jane.chu, kirill.shutemov, KY Srinivasan,
	linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, Michael Kelley (LINUX),
	linux-kernel, zhi.a.wang

> From: Zhi Wang <zhi.wang.linux@gmail.com>
> Sent: Friday, January 6, 2023 2:11 AM
> To: Dexuan Cui <decui@microsoft.com>
>  [...]
> I guess we both agree that memory conversion in HV should be done through
> coco so the hv_map_memory can be removed (even the extra does not hurt
> currently)

Correct. As I mentioned, Michael's pachset is doing that and hopefully it would
be merged into the upstream soon.

> The memory conversion in current HV code is done by different approaches.
> Some are going through the coco, some are not, which ends up
> with if(hv_isolation_type_snp()) in memory allocation path. It can be
> confusing. I suppose a reasonable purpose of hv_isolation_type_snp()
> should cover the AMD SEV-SNP specific parts which haven't been (or are
> not going to be) covered by coco. For example the GHCB stuff.
> 
> Thanks,
> Zhi.

Exactly.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v2 5/6] x86/hyperv: Support hypercalls for TDX guests
  2023-01-06 11:23   ` Zhi Wang
@ 2023-01-09  7:27     ` Dexuan Cui
  0 siblings, 0 replies; 28+ messages in thread
From: Dexuan Cui @ 2023-01-09  7:27 UTC (permalink / raw)
  To: Zhi Wang
  Cc: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
	Haiyang Zhang, hpa, jane.chu, kirill.shutemov, KY Srinivasan,
	linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
	sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu,
	x86, Michael Kelley (LINUX),
	linux-kernel

> From: Zhi Wang <zhi.wang.linux@gmail.com>
> Sent: Friday, January 6, 2023 3:24 AM
> > @@ -83,6 +94,9 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64
> > input1) u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
> >
> >  #ifdef CONFIG_X86_64
> > +	if (hv_isolation_type_tdx())
> > +		return hv_tdx_hypercall(control, input1, 0);
> > +
> >  	{
> >  		__asm__ __volatile__(CALL_NOSPEC
> >  				     : "=a" (hv_status),
> > ASM_CALL_CONSTRAINT, @@ -114,6 +128,9 @@ static inline u64
> > hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2) u64 hv_status,
> > control = (u64)code | HV_HYPERCALL_FAST_BIT;
> >  #ifdef CONFIG_X86_64
> > +	if (hv_isolation_type_tdx())
> > +		return hv_tdx_hypercall(control, input1, input2);
> > +
> In some paths, for example vmbus_set_event(), choosing the SNP-based or

In a SNP guest with pavavisor on Hyper-V, hv_ghcb_hypercall() is called in
only two places: vmbus_set_event() and hv_post_message(), where the
hypercalls, which are done via GHCB, need to be handled by the Hyper-V
hypervisor directly; in other places, the hypercalls, which are done via the
hypercall page, need to be handled by the pavavisor. That's to say, there
are 2 different kinds of hypercalls for a SNP guest with pavavisor on Hyper-V,
and hence we have to use 2 styles.

> generic hypercall happens in the caller, while now TDX-based hypercall is
> embraced in the generic hypercall path, e.g. hv_do_fast_hypercall8(). Which
> style will be chosen in the future? Seems the coding structure needs to be
> aligned.

For a TDX guest without pavavisor on Hyper-V, there is only one style of
hypercalls, so I make the change in hv_do_hypercall() and
hv_do_fast_hypercall*() directly. 

I don't think we can make any clean-up changes right now. When we
support the TDX guest with pavavisor on Hyper-V, we'll figure out if we
can make any improvement.

> void vmbus_set_event(struct vmbus_channel *channel)
> {
>         u32 child_relid = channel->offermsg.child_relid;
> 
>         if (!channel->is_dedicated_interrupt)
>                 vmbus_send_interrupt(child_relid);
> 
>         ++channel->sig_events;
> 
>         if (hv_isolation_type_snp())
>                 hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT,
> &channel->sig_event,
>                                 NULL, sizeof(channel->sig_event));
>         else
>                 hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT,
>         channel->sig_event);
> }
> 
> 
> >  	{
> >  		__asm__ __volatile__("mov %4, %%r8\n"
> >  				     CALL_NOSPEC

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2023-01-09  7:28 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-07  0:33 [PATCH v2 0/6] Support TDX guests on Hyper-V Dexuan Cui
2022-12-07  0:33 ` [PATCH v2 1/6] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed Dexuan Cui
2022-12-08 19:48   ` Kirill A. Shutemov
2022-12-08 19:54     ` Dexuan Cui
2022-12-07  0:33 ` [PATCH v2 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed() Dexuan Cui
2023-01-05  9:44   ` Zhi Wang
2023-01-05 17:33     ` Dexuan Cui
2023-01-05 18:10       ` Zhi Wang
2023-01-05 20:29         ` Dexuan Cui
2023-01-06 10:10           ` Zhi Wang
2023-01-06 15:39             ` Dexuan Cui
2022-12-07  0:33 ` [PATCH v2 3/6] x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests Dexuan Cui
2022-12-12  0:59   ` Sathyanarayanan Kuppuswamy
2022-12-07  0:33 ` [PATCH v2 4/6] x86/tdx: Expand __tdx_hypercall() to handle more arguments Dexuan Cui
2022-12-07 22:14   ` Sathyanarayanan Kuppuswamy
2022-12-08 15:54     ` Dexuan Cui
2022-12-08 22:06       ` Kirill A. Shutemov
2022-12-08 22:07   ` Kirill A. Shutemov
2022-12-08 23:09     ` Dexuan Cui
2022-12-07  0:33 ` [PATCH v2 5/6] x86/hyperv: Support hypercalls for TDX guests Dexuan Cui
2022-12-12 16:38   ` Michael Kelley (LINUX)
2022-12-12 19:10     ` Dexuan Cui
2023-01-06 11:23   ` Zhi Wang
2023-01-09  7:27     ` Dexuan Cui
2022-12-07  0:33 ` [PATCH v2 6/6] Drivers: hv: vmbus: Support " Dexuan Cui
2022-12-12 17:02   ` Michael Kelley (LINUX)
2022-12-12 19:18     ` Dexuan Cui
2022-12-12  0:04 ` [PATCH v2 0/6] Support TDX guests on Hyper-V Dexuan Cui

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).