* [PATCH v3 0/3] x86/hyperv: Mark CoCo VM pages not present when changing encrypted state
@ 2024-01-05 18:30 mhkelley58
  2024-01-05 18:30 ` [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback mhkelley58
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: mhkelley58 @ 2024-01-05 18:30 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, x86, hpa, kirill.shutemov,
	haiyangz, wei.liu, decui, luto, peterz, akpm, urezki, hch,
	lstoakes, thomas.lendacky, ardb, jroedel, seanjc,
	rick.p.edgecombe, sathyanarayanan.kuppuswamy, linux-kernel,
	linux-coco, linux-hyperv, linux-mm

From: Michael Kelley <mhklinux@outlook.com>

In a CoCo VM, when transitioning memory from encrypted to decrypted, or
vice versa, the caller of set_memory_encrypted() or set_memory_decrypted()
is responsible for ensuring the memory isn't in use and isn't referenced
while the transition is in progress.  The transition has multiple steps,
and the memory is in an inconsistent state until all steps are complete.
A reference while the state is inconsistent could result in an exception
that can't be cleanly fixed up.

However, the kernel load_unaligned_zeropad() mechanism could cause a stray
reference that can't be prevented by the caller of set_memory_encrypted()
or set_memory_decrypted(), so there's specific code to handle this case.
But a CoCo VM running on Hyper-V may be configured to run with a paravisor,
with the #VC or #VE exception routed to the paravisor. There's no
architectural way to forward the exceptions back to the guest kernel, and
in such a case, the load_unaligned_zeropad() specific code doesn't work.

To avoid this problem, mark pages as "not present" while a transition
is in progress. If load_unaligned_zeropad() causes a stray reference, a
normal page fault is generated instead of #VC or #VE, and the
page-fault-based fixup handlers for load_unaligned_zeropad() resolve the
reference. When the encrypted/decrypted transition is complete, mark the
pages as "present" again.

This version of the patch series marks transitioning pages "not present"
only when running as a Hyper-V guest with a paravisor. Previous
versions[1] marked transitioning pages "not present" regardless of the
hypervisor and regardless of whether a paravisor is in use.  That more
general use had the benefit of decoupling the load_unaligned_zeropad()
fixup from CoCo VM #VE and #VC exception handling.  But the implementation
was problematic for SEV-SNP because the SEV-SNP hypervisor callbacks
require a valid virtual address, not a physical address like with TDX and
the Hyper-V paravisor.  Marking the transitioning pages "not present"
causes the virtual address to not be valid, and the PVALIDATE
instruction in the SEV-SNP callback fails. Constructing a temporary
virtual address for this purpose is slower and adds complexity that
negates the benefits of the more general use. So this version narrows
the applicability of the approach to just where it is required
because of the #VC and #VE exceptions being routed to a paravisor.

The previous version minimized the TLB flushing done during page
transitions between encrypted and decrypted. Because this version
marks the pages "not present" in hypervisor-specific callbacks and
not in __set_memory_enc_pgtable(), doing such optimization is more
difficult to coordinate. But the page transitions are not a hot path,
so this version eschews optimization of TLB flushing in favor of
simplicity.

Since this version no longer touches __set_memory_enc_pgtable(),
I've also removed patches that add comments about error handling
in that function.  Rick Edgecombe has proposed patches to improve
that error handling, and I'll leave those comments to Rick's
patches.

Patch 1 handles implications of the hypervisor callbacks needing
to do virt-to-phys translations on pages that are temporarily
marked not present.

Patch 2 makes the existing set_memory_p() function available for
use in the hypervisor callbacks.

Patch 3 is the core change that marks the transitioning pages
as not present.

This patch set is based on the linux-next20240103 code tree.

Changes in v3:
* Major rework and simplification per discussion above.

Changes in v2:
* Added Patches 3 and 4 to deal with the failure on SEV-SNP
  [Tom Lendacky]
* Split the main change into two separate patches (Patch 5 and
  Patch 6) to improve reviewability and to offer the option of
  retaining both hypervisor callbacks.
* Patch 5 moves set_memory_p() out of an #ifdef CONFIG_X86_64
  so that the code builds correctly for 32-bit, even though it
  is never executed for 32-bit [reported by kernel test robot]

[1] https://lore.kernel.org/lkml/20231121212016.1154303-1-mhklinux@outlook.com/

Michael Kelley (3):
  x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor
    callback
  x86/mm: Regularize set_memory_p() parameters and make non-static
  x86/hyperv: Make encrypted/decrypted changes safe for
    load_unaligned_zeropad()

 arch/x86/hyperv/ivm.c             | 58 ++++++++++++++++++++++++++++---
 arch/x86/include/asm/set_memory.h |  1 +
 arch/x86/mm/pat/set_memory.c      | 25 +++++++------
 3 files changed, 70 insertions(+), 14 deletions(-)

-- 
2.25.1



* [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback
  2024-01-05 18:30 [PATCH v3 0/3] x86/hyperv: Mark CoCo VM pages not present when changing encrypted state mhkelley58
@ 2024-01-05 18:30 ` mhkelley58
  2024-01-08 13:07   ` kirill.shutemov
  2024-01-12  1:20   ` Edgecombe, Rick P
  2024-01-05 18:30 ` [PATCH v3 2/3] x86/mm: Regularize set_memory_p() parameters and make non-static mhkelley58
  2024-01-05 18:30 ` [PATCH v3 3/3] x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad() mhkelley58
  2 siblings, 2 replies; 18+ messages in thread
From: mhkelley58 @ 2024-01-05 18:30 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, x86, hpa, kirill.shutemov,
	haiyangz, wei.liu, decui, luto, peterz, akpm, urezki, hch,
	lstoakes, thomas.lendacky, ardb, jroedel, seanjc,
	rick.p.edgecombe, sathyanarayanan.kuppuswamy, linux-kernel,
	linux-coco, linux-hyperv, linux-mm

From: Michael Kelley <mhklinux@outlook.com>

In preparation for temporarily marking pages not present during a
transition between encrypted and decrypted, use slow_virt_to_phys()
in the hypervisor callback. As long as the PFN is correct,
slow_virt_to_phys() works even if the leaf PTE is not present.
The existing functions that depend on vmalloc_to_page() all
require that the leaf PTE be marked present, so they don't work.

Update the comments for slow_virt_to_phys() to note this broader usage
and the requirement to work even if the PTE is not marked present.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
 arch/x86/hyperv/ivm.c        |  9 ++++++++-
 arch/x86/mm/pat/set_memory.c | 13 +++++++++----
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 02e55237d919..8ba18635e338 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -524,7 +524,14 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
 		return false;
 
 	for (i = 0, pfn = 0; i < pagecount; i++) {
-		pfn_array[pfn] = virt_to_hvpfn((void *)kbuffer + i * HV_HYP_PAGE_SIZE);
+		/*
+		 * Use slow_virt_to_phys() because the PRESENT bit has been
+		 * temporarily cleared in the PTEs.  slow_virt_to_phys() works
+		 * without the PRESENT bit while virt_to_hvpfn() or similar
+		 * does not.
+		 */
+		pfn_array[pfn] = slow_virt_to_phys((void *)kbuffer +
+					i * HV_HYP_PAGE_SIZE) >> HV_HYP_PAGE_SHIFT;
 		pfn++;
 
 		if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index bda9f129835e..8e19796e7ce5 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -755,10 +755,15 @@ pmd_t *lookup_pmd_address(unsigned long address)
  * areas on 32-bit NUMA systems.  The percpu areas can
  * end up in this kind of memory, for instance.
  *
- * This could be optimized, but it is only intended to be
- * used at initialization time, and keeping it
- * unoptimized should increase the testing coverage for
- * the more obscure platforms.
+ * It is also used in callbacks for CoCo VM page transitions between private
+ * and shared because it works when the PRESENT bit is not set in the leaf
+ * PTE. In such cases, the state of the PTEs, including the PFN, is otherwise
+ * known to be valid, so the returned physical address is correct. The similar
+ * function vmalloc_to_pfn() can't be used because it requires the PRESENT bit.
+ *
+ * This could be optimized, but it is only used in paths that are not perf
+ * sensitive, and keeping it unoptimized should increase the testing coverage
+ * for the more obscure platforms.
  */
 phys_addr_t slow_virt_to_phys(void *__virt_addr)
 {
-- 
2.25.1



* [PATCH v3 2/3] x86/mm: Regularize set_memory_p() parameters and make non-static
  2024-01-05 18:30 [PATCH v3 0/3] x86/hyperv: Mark CoCo VM pages not present when changing encrypted state mhkelley58
  2024-01-05 18:30 ` [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback mhkelley58
@ 2024-01-05 18:30 ` mhkelley58
  2024-01-08 13:10   ` kirill.shutemov
  2024-01-12  0:56   ` Edgecombe, Rick P
  2024-01-05 18:30 ` [PATCH v3 3/3] x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad() mhkelley58
  2 siblings, 2 replies; 18+ messages in thread
From: mhkelley58 @ 2024-01-05 18:30 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, x86, hpa, kirill.shutemov,
	haiyangz, wei.liu, decui, luto, peterz, akpm, urezki, hch,
	lstoakes, thomas.lendacky, ardb, jroedel, seanjc,
	rick.p.edgecombe, sathyanarayanan.kuppuswamy, linux-kernel,
	linux-coco, linux-hyperv, linux-mm

From: Michael Kelley <mhklinux@outlook.com>

set_memory_p() is currently static.  It has parameters that don't
match set_memory_p() under arch/powerpc and that aren't congruent
with the other set_memory_* functions. There's no good reason for
the difference.

Fix this by making the parameters consistent, and update the one
existing call site.  Make the function non-static and add it to
include/asm/set_memory.h so that it is completely parallel to
set_memory_np() and is usable in other modules.

No functional change.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
 arch/x86/include/asm/set_memory.h |  1 +
 arch/x86/mm/pat/set_memory.c      | 12 ++++++------
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index a5e89641bd2d..9aee31862b4a 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -47,6 +47,7 @@ int set_memory_uc(unsigned long addr, int numpages);
 int set_memory_wc(unsigned long addr, int numpages);
 int set_memory_wb(unsigned long addr, int numpages);
 int set_memory_np(unsigned long addr, int numpages);
+int set_memory_p(unsigned long addr, int numpages);
 int set_memory_4k(unsigned long addr, int numpages);
 int set_memory_encrypted(unsigned long addr, int numpages);
 int set_memory_decrypted(unsigned long addr, int numpages);
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 8e19796e7ce5..05d42395a462 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2046,17 +2046,12 @@ int set_mce_nospec(unsigned long pfn)
 	return rc;
 }
 
-static int set_memory_p(unsigned long *addr, int numpages)
-{
-	return change_page_attr_set(addr, numpages, __pgprot(_PAGE_PRESENT), 0);
-}
-
 /* Restore full speculative operation to the pfn. */
 int clear_mce_nospec(unsigned long pfn)
 {
 	unsigned long addr = (unsigned long) pfn_to_kaddr(pfn);
 
-	return set_memory_p(&addr, 1);
+	return set_memory_p(addr, 1);
 }
 EXPORT_SYMBOL_GPL(clear_mce_nospec);
 #endif /* CONFIG_X86_64 */
@@ -2109,6 +2104,11 @@ int set_memory_np_noalias(unsigned long addr, int numpages)
 					CPA_NO_CHECK_ALIAS, NULL);
 }
 
+int set_memory_p(unsigned long addr, int numpages)
+{
+	return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
+}
+
 int set_memory_4k(unsigned long addr, int numpages)
 {
 	return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
-- 
2.25.1



* [PATCH v3 3/3] x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad()
  2024-01-05 18:30 [PATCH v3 0/3] x86/hyperv: Mark CoCo VM pages not present when changing encrypted state mhkelley58
  2024-01-05 18:30 ` [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback mhkelley58
  2024-01-05 18:30 ` [PATCH v3 2/3] x86/mm: Regularize set_memory_p() parameters and make non-static mhkelley58
@ 2024-01-05 18:30 ` mhkelley58
  2024-01-08 18:37   ` Kuppuswamy Sathyanarayanan
  2024-01-12  0:26   ` Edgecombe, Rick P
  2 siblings, 2 replies; 18+ messages in thread
From: mhkelley58 @ 2024-01-05 18:30 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, x86, hpa, kirill.shutemov,
	haiyangz, wei.liu, decui, luto, peterz, akpm, urezki, hch,
	lstoakes, thomas.lendacky, ardb, jroedel, seanjc,
	rick.p.edgecombe, sathyanarayanan.kuppuswamy, linux-kernel,
	linux-coco, linux-hyperv, linux-mm

From: Michael Kelley <mhklinux@outlook.com>

In a CoCo VM, when transitioning memory from encrypted to decrypted, or
vice versa, the caller of set_memory_encrypted() or set_memory_decrypted()
is responsible for ensuring the memory isn't in use and isn't referenced
while the transition is in progress.  The transition has multiple steps,
and the memory is in an inconsistent state until all steps are complete.
A reference while the state is inconsistent could result in an exception
that can't be cleanly fixed up.

However, the kernel load_unaligned_zeropad() mechanism could cause a stray
reference that can't be prevented by the caller of set_memory_encrypted()
or set_memory_decrypted(), so there's specific code to handle this case.
But a CoCo VM running on Hyper-V may be configured to run with a paravisor,
with the #VC or #VE exception routed to the paravisor. There's no
architectural way to forward the exceptions back to the guest kernel, and
in such a case, the load_unaligned_zeropad() specific code doesn't work.

To avoid this problem, mark pages as "not present" while a transition
is in progress. If load_unaligned_zeropad() causes a stray reference, a
normal page fault is generated instead of #VC or #VE, and the
page-fault-based fixup handlers for load_unaligned_zeropad() resolve the
reference. When the encrypted/decrypted transition is complete, mark the
pages as "present" again.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
 arch/x86/hyperv/ivm.c | 49 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 46 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 8ba18635e338..5ad39256a5d2 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -15,6 +15,7 @@
 #include <asm/io.h>
 #include <asm/coco.h>
 #include <asm/mem_encrypt.h>
+#include <asm/set_memory.h>
 #include <asm/mshyperv.h>
 #include <asm/hypervisor.h>
 #include <asm/mtrr.h>
@@ -502,6 +503,31 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
 		return -EFAULT;
 }
 
+/*
+ * When transitioning memory between encrypted and decrypted, the caller
+ * of set_memory_encrypted() or set_memory_decrypted() is responsible for
+ * ensuring that the memory isn't in use and isn't referenced while the
+ * transition is in progress.  The transition has multiple steps, and the
+ * memory is in an inconsistent state until all steps are complete. A
+ * reference while the state is inconsistent could result in an exception
+ * that can't be cleanly fixed up.
+ *
+ * But the Linux kernel load_unaligned_zeropad() mechanism could cause a
+ * stray reference that can't be prevented by the caller, so Linux has
+ * specific code to handle this case. But when the #VC and #VE exceptions
+ * are routed to a paravisor, the specific code doesn't work. To avoid this
+ * problem, mark the pages as "not present" while the transition is in
+ * progress. If load_unaligned_zeropad() causes a stray reference, a normal
+ * page fault is generated instead of #VC or #VE, and the page-fault-based
+ * handlers for load_unaligned_zeropad() resolve the reference.  When the
+ * transition is complete, hv_vtom_set_host_visibility() marks the pages
+ * as "present" again.
+ */
+static bool hv_vtom_clear_present(unsigned long kbuffer, int pagecount, bool enc)
+{
+	return !set_memory_np(kbuffer, pagecount);
+}
+
 /*
  * hv_vtom_set_host_visibility - Set specified memory visible to host.
  *
@@ -521,7 +547,7 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
 
 	pfn_array = kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
 	if (!pfn_array)
-		return false;
+		goto err_set_memory_p;
 
 	for (i = 0, pfn = 0; i < pagecount; i++) {
 		/*
@@ -545,14 +571,30 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
 		}
 	}
 
- err_free_pfn_array:
+err_free_pfn_array:
 	kfree(pfn_array);
+
+err_set_memory_p:
+	/*
+	 * Set the PTE PRESENT bits again to revert what hv_vtom_clear_present()
+	 * did. Do this even if there is an error earlier in this function in
+	 * order to avoid leaving the memory range in a "broken" state. Setting
+	 * the PRESENT bits shouldn't fail, but return an error if it does.
+	 */
+	if (set_memory_p(kbuffer, pagecount))
+		result = false;
+
 	return result;
 }
 
 static bool hv_vtom_tlb_flush_required(bool private)
 {
-	return true;
+	/*
+	 * Since hv_vtom_clear_present() marks the PTEs as "not present"
+	 * and flushes the TLB, they can't be in the TLB. That makes the
+	 * flush controlled by this function redundant, so return "false".
+	 */
+	return false;
 }
 
 static bool hv_vtom_cache_flush_required(void)
@@ -615,6 +657,7 @@ void __init hv_vtom_init(void)
 	x86_platform.hyper.is_private_mmio = hv_is_private_mmio;
 	x86_platform.guest.enc_cache_flush_required = hv_vtom_cache_flush_required;
 	x86_platform.guest.enc_tlb_flush_required = hv_vtom_tlb_flush_required;
+	x86_platform.guest.enc_status_change_prepare = hv_vtom_clear_present;
 	x86_platform.guest.enc_status_change_finish = hv_vtom_set_host_visibility;
 
 	/* Set WB as the default cache mode. */
-- 
2.25.1



* Re: [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback
  2024-01-05 18:30 ` [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback mhkelley58
@ 2024-01-08 13:07   ` kirill.shutemov
  2024-01-08 16:00     ` Michael Kelley
  2024-01-12  1:20   ` Edgecombe, Rick P
  1 sibling, 1 reply; 18+ messages in thread
From: kirill.shutemov @ 2024-01-08 13:07 UTC (permalink / raw)
  To: mhklinux
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, haiyangz, wei.liu, decui,
	luto, peterz, akpm, urezki, hch, lstoakes, thomas.lendacky, ardb,
	jroedel, seanjc, rick.p.edgecombe, sathyanarayanan.kuppuswamy,
	linux-kernel, linux-coco, linux-hyperv, linux-mm

On Fri, Jan 05, 2024 at 10:30:23AM -0800, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
> 
> In preparation for temporarily marking pages not present during a
> transition between encrypted and decrypted, use slow_virt_to_phys()
> in the hypervisor callback. As long as the PFN is correct,
> slow_virt_to_phys() works even if the leaf PTE is not present.
> The existing functions that depend on vmalloc_to_page() all
> require that the leaf PTE be marked present, so they don't work.
> 
> Update the comments for slow_virt_to_phys() to note this broader usage
> and the requirement to work even if the PTE is not marked present.
> 
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> ---
>  arch/x86/hyperv/ivm.c        |  9 ++++++++-
>  arch/x86/mm/pat/set_memory.c | 13 +++++++++----
>  2 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 02e55237d919..8ba18635e338 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -524,7 +524,14 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
>  		return false;
>  
>  	for (i = 0, pfn = 0; i < pagecount; i++) {
> -		pfn_array[pfn] = virt_to_hvpfn((void *)kbuffer + i * HV_HYP_PAGE_SIZE);
> +		/*
> +		 * Use slow_virt_to_phys() because the PRESENT bit has been
> +		 * temporarily cleared in the PTEs.  slow_virt_to_phys() works
> +		 * without the PRESENT bit while virt_to_hvpfn() or similar
> +		 * does not.
> +		 */
> +		pfn_array[pfn] = slow_virt_to_phys((void *)kbuffer +
> +					i * HV_HYP_PAGE_SIZE) >> HV_HYP_PAGE_SHIFT;

I think you can make it much more readable by introducing a few variables:

		virt = (void *)kbuffer + i * HV_HYP_PAGE_SIZE;
		phys = slow_virt_to_phys(virt);
		pfn_array[pfn] = phys >> HV_HYP_PAGE_SHIFT;

>  		pfn++;
>  
>  		if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index bda9f129835e..8e19796e7ce5 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -755,10 +755,15 @@ pmd_t *lookup_pmd_address(unsigned long address)
>   * areas on 32-bit NUMA systems.  The percpu areas can
>   * end up in this kind of memory, for instance.
>   *
> - * This could be optimized, but it is only intended to be
> - * used at initialization time, and keeping it
> - * unoptimized should increase the testing coverage for
> - * the more obscure platforms.
> + * It is also used in callbacks for CoCo VM page transitions between private
> + * and shared because it works when the PRESENT bit is not set in the leaf
> + * PTE. In such cases, the state of the PTEs, including the PFN, is otherwise
> + * known to be valid, so the returned physical address is correct. The similar
> + * function vmalloc_to_pfn() can't be used because it requires the PRESENT bit.
> + *
> + * This could be optimized, but it is only used in paths that are not perf
> + * sensitive, and keeping it unoptimized should increase the testing coverage
> + * for the more obscure platforms.
>   */
>  phys_addr_t slow_virt_to_phys(void *__virt_addr)
>  {
> -- 
> 2.25.1
> 

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [PATCH v3 2/3] x86/mm: Regularize set_memory_p() parameters and make non-static
  2024-01-05 18:30 ` [PATCH v3 2/3] x86/mm: Regularize set_memory_p() parameters and make non-static mhkelley58
@ 2024-01-08 13:10   ` kirill.shutemov
  2024-01-12  0:56   ` Edgecombe, Rick P
  1 sibling, 0 replies; 18+ messages in thread
From: kirill.shutemov @ 2024-01-08 13:10 UTC (permalink / raw)
  To: mhklinux
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, haiyangz, wei.liu, decui,
	luto, peterz, akpm, urezki, hch, lstoakes, thomas.lendacky, ardb,
	jroedel, seanjc, rick.p.edgecombe, sathyanarayanan.kuppuswamy,
	linux-kernel, linux-coco, linux-hyperv, linux-mm

On Fri, Jan 05, 2024 at 10:30:24AM -0800, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
> 
> set_memory_p() is currently static.  It has parameters that don't
> match set_memory_p() under arch/powerpc and that aren't congruent
> with the other set_memory_* functions. There's no good reason for
> the difference.
> 
> Fix this by making the parameters consistent, and update the one
> existing call site.  Make the function non-static and add it to
> include/asm/set_memory.h so that it is completely parallel to
> set_memory_np() and is usable in other modules.
> 
> No functional change.
> 
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* RE: [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback
  2024-01-08 13:07   ` kirill.shutemov
@ 2024-01-08 16:00     ` Michael Kelley
  0 siblings, 0 replies; 18+ messages in thread
From: Michael Kelley @ 2024-01-08 16:00 UTC (permalink / raw)
  To: kirill.shutemov
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, haiyangz, wei.liu, decui,
	luto, peterz, akpm, urezki, hch, lstoakes, thomas.lendacky, ardb,
	jroedel, seanjc, rick.p.edgecombe, sathyanarayanan.kuppuswamy,
	linux-kernel, linux-coco, linux-hyperv, linux-mm

From: kirill.shutemov@linux.intel.com <kirill.shutemov@linux.intel.com> Sent: Monday, January 8, 2024 5:08 AM
> 
> On Fri, Jan 05, 2024 at 10:30:23AM -0800, mhkelley58@gmail.com wrote:
> > From: Michael Kelley <mhklinux@outlook.com>
> >
> > In preparation for temporarily marking pages not present during a
> > transition between encrypted and decrypted, use slow_virt_to_phys()
> > in the hypervisor callback. As long as the PFN is correct,
> > slow_virt_to_phys() works even if the leaf PTE is not present.
> > The existing functions that depend on vmalloc_to_page() all
> > require that the leaf PTE be marked present, so they don't work.
> >
> > Update the comments for slow_virt_to_phys() to note this broader usage
> > and the requirement to work even if the PTE is not marked present.
> >
> > Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> > ---
> >  arch/x86/hyperv/ivm.c        |  9 ++++++++-
> >  arch/x86/mm/pat/set_memory.c | 13 +++++++++----
> >  2 files changed, 17 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> > index 02e55237d919..8ba18635e338 100644
> > --- a/arch/x86/hyperv/ivm.c
> > +++ b/arch/x86/hyperv/ivm.c
> > @@ -524,7 +524,14 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
> >  		return false;
> >
> >  	for (i = 0, pfn = 0; i < pagecount; i++) {
> > -		pfn_array[pfn] = virt_to_hvpfn((void *)kbuffer + i * HV_HYP_PAGE_SIZE);
> > +		/*
> > +		 * Use slow_virt_to_phys() because the PRESENT bit has been
> > +		 * temporarily cleared in the PTEs.  slow_virt_to_phys() works
> > +		 * without the PRESENT bit while virt_to_hvpfn() or similar
> > +		 * does not.
> > +		 */
> > +		pfn_array[pfn] = slow_virt_to_phys((void *)kbuffer +
> > +					i * HV_HYP_PAGE_SIZE) >> HV_HYP_PAGE_SHIFT;
> 
> I think you can make it much more readable by introducing a few variables:
> 
> 		virt = (void *)kbuffer + i * HV_HYP_PAGE_SIZE;
> 		phys = slow_virt_to_phys(virt);
> 		pfn_array[pfn] = phys >> HV_HYP_PAGE_SHIFT;
> 

Agreed.  I'll do this in the next version.

Michael


* Re: [PATCH v3 3/3] x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad()
  2024-01-05 18:30 ` [PATCH v3 3/3] x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad() mhkelley58
@ 2024-01-08 18:37   ` Kuppuswamy Sathyanarayanan
  2024-01-08 19:13     ` Michael Kelley
  2024-01-12  0:26   ` Edgecombe, Rick P
  1 sibling, 1 reply; 18+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2024-01-08 18:37 UTC (permalink / raw)
  To: mhklinux, tglx, mingo, bp, dave.hansen, x86, hpa,
	kirill.shutemov, haiyangz, wei.liu, decui, luto, peterz, akpm,
	urezki, hch, lstoakes, thomas.lendacky, ardb, jroedel, seanjc,
	rick.p.edgecombe, linux-kernel, linux-coco, linux-hyperv,
	linux-mm



On 1/5/2024 10:30 AM, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
> 
> In a CoCo VM, when transitioning memory from encrypted to decrypted, or
> vice versa, the caller of set_memory_encrypted() or set_memory_decrypted()
> is responsible for ensuring the memory isn't in use and isn't referenced
> while the transition is in progress.  The transition has multiple steps,
> and the memory is in an inconsistent state until all steps are complete.
> A reference while the state is inconsistent could result in an exception
> that can't be cleanly fixed up.
> 
> However, the kernel load_unaligned_zeropad() mechanism could cause a stray
> reference that can't be prevented by the caller of set_memory_encrypted()
> or set_memory_decrypted(), so there's specific code to handle this case.
> But a CoCo VM running on Hyper-V may be configured to run with a paravisor,
> with the #VC or #VE exception routed to the paravisor. There's no
> architectural way to forward the exceptions back to the guest kernel, and
> in such a case, the load_unaligned_zeropad() specific code doesn't work.
> 
> To avoid this problem, mark pages as "not present" while a transition
> is in progress. If load_unaligned_zeropad() causes a stray reference, a
> normal page fault is generated instead of #VC or #VE, and the
> page-fault-based fixup handlers for load_unaligned_zeropad() resolve the
> reference. When the encrypted/decrypted transition is complete, mark the
> pages as "present" again.

Change looks good to me. But I am wondering why you are adding it as part of the
prepare and finish callbacks instead of directly in the set_memory_encrypted() function.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>


> 
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> ---
>  arch/x86/hyperv/ivm.c | 49 ++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 46 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 8ba18635e338..5ad39256a5d2 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -15,6 +15,7 @@
>  #include <asm/io.h>
>  #include <asm/coco.h>
>  #include <asm/mem_encrypt.h>
> +#include <asm/set_memory.h>
>  #include <asm/mshyperv.h>
>  #include <asm/hypervisor.h>
>  #include <asm/mtrr.h>
> @@ -502,6 +503,31 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
>  		return -EFAULT;
>  }
>  
> +/*
> + * When transitioning memory between encrypted and decrypted, the caller
> + * of set_memory_encrypted() or set_memory_decrypted() is responsible for
> + * ensuring that the memory isn't in use and isn't referenced while the
> + * transition is in progress.  The transition has multiple steps, and the
> + * memory is in an inconsistent state until all steps are complete. A
> + * reference while the state is inconsistent could result in an exception
> + * that can't be cleanly fixed up.
> + *
> + * But the Linux kernel load_unaligned_zeropad() mechanism could cause a
> + * stray reference that can't be prevented by the caller, so Linux has
> + * specific code to handle this case. But when the #VC and #VE exceptions
> + * are routed to a paravisor, the specific code doesn't work. To avoid this
> + * problem, mark the pages as "not present" while the transition is in
> + * progress. If load_unaligned_zeropad() causes a stray reference, a normal
> + * page fault is generated instead of #VC or #VE, and the page-fault-based
> + * handlers for load_unaligned_zeropad() resolve the reference.  When the
> + * transition is complete, hv_vtom_set_host_visibility() marks the pages
> + * as "present" again.
> + */
> +static bool hv_vtom_clear_present(unsigned long kbuffer, int pagecount, bool enc)
> +{
> +	return !set_memory_np(kbuffer, pagecount);
> +}
> +
>  /*
>   * hv_vtom_set_host_visibility - Set specified memory visible to host.
>   *
> @@ -521,7 +547,7 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
>  
>  	pfn_array = kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
>  	if (!pfn_array)
> -		return false;
> +		goto err_set_memory_p;
>  
>  	for (i = 0, pfn = 0; i < pagecount; i++) {
>  		/*
> @@ -545,14 +571,30 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
>  		}
>  	}
>  
> - err_free_pfn_array:
> +err_free_pfn_array:
>  	kfree(pfn_array);
> +
> +err_set_memory_p:
> +	/*
> +	 * Set the PTE PRESENT bits again to revert what hv_vtom_clear_present()
> +	 * did. Do this even if there is an error earlier in this function in
> +	 * order to avoid leaving the memory range in a "broken" state. Setting
> +	 * the PRESENT bits shouldn't fail, but return an error if it does.
> +	 */
> +	if (set_memory_p(kbuffer, pagecount))
> +		result = false;
> +
>  	return result;
>  }
>  
>  static bool hv_vtom_tlb_flush_required(bool private)
>  {
> -	return true;
> +	/*
> +	 * Since hv_vtom_clear_present() marks the PTEs as "not present"
> +	 * and flushes the TLB, they can't be in the TLB. That makes the
> +	 * flush controlled by this function redundant, so return "false".
> +	 */
> +	return false;
>  }
>  
>  static bool hv_vtom_cache_flush_required(void)
> @@ -615,6 +657,7 @@ void __init hv_vtom_init(void)
>  	x86_platform.hyper.is_private_mmio = hv_is_private_mmio;
>  	x86_platform.guest.enc_cache_flush_required = hv_vtom_cache_flush_required;
>  	x86_platform.guest.enc_tlb_flush_required = hv_vtom_tlb_flush_required;
> +	x86_platform.guest.enc_status_change_prepare = hv_vtom_clear_present;
>  	x86_platform.guest.enc_status_change_finish = hv_vtom_set_host_visibility;
>  
>  	/* Set WB as the default cache mode. */

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v3 3/3] x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad()
  2024-01-08 18:37   ` Kuppuswamy Sathyanarayanan
@ 2024-01-08 19:13     ` Michael Kelley
  2024-01-08 19:24       ` Kuppuswamy Sathyanarayanan
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Kelley @ 2024-01-08 19:13 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan, tglx, mingo, bp, dave.hansen, x86,
	hpa, kirill.shutemov, haiyangz, wei.liu, decui, luto, peterz,
	akpm, urezki, hch, lstoakes, thomas.lendacky, ardb, jroedel,
	seanjc, rick.p.edgecombe, linux-kernel, linux-coco, linux-hyperv,
	linux-mm

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Sent: Monday, January 8, 2024 10:37 AM
> 
> On 1/5/2024 10:30 AM, mhkelley58@gmail.com wrote:
> > From: Michael Kelley <mhklinux@outlook.com>
> >
> > In a CoCo VM, when transitioning memory from encrypted to decrypted, or
> > vice versa, the caller of set_memory_encrypted() or set_memory_decrypted()
> > is responsible for ensuring the memory isn't in use and isn't referenced
> > while the transition is in progress.  The transition has multiple steps,
> > and the memory is in an inconsistent state until all steps are complete.
> > A reference while the state is inconsistent could result in an exception
> > that can't be cleanly fixed up.
> >
> > However, the kernel load_unaligned_zeropad() mechanism could cause a stray
> > reference that can't be prevented by the caller of set_memory_encrypted()
> > or set_memory_decrypted(), so there's specific code to handle this case.
> > But a CoCo VM running on Hyper-V may be configured to run with a paravisor,
> > with the #VC or #VE exception routed to the paravisor. There's no
> > architectural way to forward the exceptions back to the guest kernel, and
> > in such a case, the load_unaligned_zeropad() specific code doesn't work.
> >
> > To avoid this problem, mark pages as "not present" while a transition
> > is in progress. If load_unaligned_zeropad() causes a stray reference, a
> > normal page fault is generated instead of #VC or #VE, and the
> > page-fault-based fixup handlers for load_unaligned_zeropad() resolve the
> > reference. When the encrypted/decrypted transition is complete, mark the
> > pages as "present" again.
> 
> Change looks good to me. But I am wondering why you are adding it as part of
> the prepare and finish callbacks instead of directly in the set_memory_encrypted() function.
> 

The prepare/finish callbacks are different for TDX, SEV-SNP, and
Hyper-V CoCo guests running with a paravisor -- so there are three sets
of callbacks.  As described in the cover letter, I've given up on using this
scheme for the TDX and SEV-SNP cases, because of the difficulty with
the SEV-SNP callbacks needing a valid virtual address (whereas TDX and
Hyper-V paravisor need only a physical address).  So it seems like the
callbacks specific to the Hyper-V paravisor are the natural place for the
code.  That leaves the TDX and SEV-SNP code paths unchanged, which
was my intent.

Or maybe I'm not understanding your comment?  If that's the case,
please elaborate.

Michael

> Reviewed-by: Kuppuswamy Sathyanarayanan
> <sathyanarayanan.kuppuswamy@linux.intel.com>
> 


* Re: [PATCH v3 3/3] x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad()
  2024-01-08 19:13     ` Michael Kelley
@ 2024-01-08 19:24       ` Kuppuswamy Sathyanarayanan
  0 siblings, 0 replies; 18+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2024-01-08 19:24 UTC (permalink / raw)
  To: Michael Kelley, tglx, mingo, bp, dave.hansen, x86, hpa,
	kirill.shutemov, haiyangz, wei.liu, decui, luto, peterz, akpm,
	urezki, hch, lstoakes, thomas.lendacky, ardb, jroedel, seanjc,
	rick.p.edgecombe, linux-kernel, linux-coco, linux-hyperv,
	linux-mm



On 1/8/2024 11:13 AM, Michael Kelley wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Sent: Monday, January 8, 2024 10:37 AM
>>
>> On 1/5/2024 10:30 AM, mhkelley58@gmail.com wrote:
>>> From: Michael Kelley <mhklinux@outlook.com>
>>>
>>> In a CoCo VM, when transitioning memory from encrypted to decrypted, or
>>> vice versa, the caller of set_memory_encrypted() or set_memory_decrypted()
>>> is responsible for ensuring the memory isn't in use and isn't referenced
>>> while the transition is in progress.  The transition has multiple steps,
>>> and the memory is in an inconsistent state until all steps are complete.
>>> A reference while the state is inconsistent could result in an exception
>>> that can't be cleanly fixed up.
>>>
>>> However, the kernel load_unaligned_zeropad() mechanism could cause a stray
>>> reference that can't be prevented by the caller of set_memory_encrypted()
>>> or set_memory_decrypted(), so there's specific code to handle this case.
>>> But a CoCo VM running on Hyper-V may be configured to run with a paravisor,
>>> with the #VC or #VE exception routed to the paravisor. There's no
>>> architectural way to forward the exceptions back to the guest kernel, and
>>> in such a case, the load_unaligned_zeropad() specific code doesn't work.
>>>
>>> To avoid this problem, mark pages as "not present" while a transition
>>> is in progress. If load_unaligned_zeropad() causes a stray reference, a
>>> normal page fault is generated instead of #VC or #VE, and the
>>> page-fault-based fixup handlers for load_unaligned_zeropad() resolve the
>>> reference. When the encrypted/decrypted transition is complete, mark the
>>> pages as "present" again.
>>
>> Change looks good to me. But I am wondering why you are adding it as part of
>> the prepare and finish callbacks instead of directly in the set_memory_encrypted() function.
>>
> 
> The prepare/finish callbacks are different for TDX, SEV-SNP, and
> Hyper-V CoCo guests running with a paravisor -- so there are three sets
> of callbacks.  As described in the cover letter, I've given up on using this
> scheme for the TDX and SEV-SNP cases, because of the difficulty with
> the SEV-SNP callbacks needing a valid virtual address (whereas TDX and
> Hyper-V paravisor need only a physical address).  So it seems like the
> callbacks specific to the Hyper-V paravisor are the natural place for the
> code.  That leaves the TDX and SEV-SNP code paths unchanged, which
> was my intent.
> 

Got it. Thanks for clarifying it.

> Or maybe I'm not understanding your comment?  If that's the case,
> please elaborate.
> 
> Michael
> 
>> Reviewed-by: Kuppuswamy Sathyanarayanan
>> <sathyanarayanan.kuppuswamy@linux.intel.com>
>>

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


* Re: [PATCH v3 3/3] x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad()
  2024-01-05 18:30 ` [PATCH v3 3/3] x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad() mhkelley58
  2024-01-08 18:37   ` Kuppuswamy Sathyanarayanan
@ 2024-01-12  0:26   ` Edgecombe, Rick P
  2024-01-12  3:19     ` Michael Kelley
  1 sibling, 1 reply; 18+ messages in thread
From: Edgecombe, Rick P @ 2024-01-12  0:26 UTC (permalink / raw)
  To: linux-hyperv, ardb, hch, Lutomirski, Andy, mhklinux, linux-coco,
	dave.hansen, thomas.lendacky, haiyangz, akpm, kirill.shutemov,
	mingo, seanjc, linux-kernel, tglx, Cui, Dexuan, urezki, linux-mm,
	hpa, peterz, wei.liu, bp, Rodel, Jorg,
	sathyanarayanan.kuppuswamy, lstoakes, x86

On Fri, 2024-01-05 at 10:30 -0800, mhkelley58@gmail.com wrote:
>   * hv_vtom_set_host_visibility - Set specified memory visible to host.
>   *
> @@ -521,7 +547,7 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
>  
>         pfn_array = kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
>         if (!pfn_array)
> -               return false;
> +               goto err_set_memory_p;
>  

If kmalloc() fails here, and set_memory_p() succeeds below, then
hv_vtom_set_host_visibility() returns true, but skips all the
hv_mark_gpa_visibility() work. Shouldn't it return false?


* Re: [PATCH v3 2/3] x86/mm: Regularize set_memory_p() parameters and make non-static
  2024-01-05 18:30 ` [PATCH v3 2/3] x86/mm: Regularize set_memory_p() parameters and make non-static mhkelley58
  2024-01-08 13:10   ` kirill.shutemov
@ 2024-01-12  0:56   ` Edgecombe, Rick P
  1 sibling, 0 replies; 18+ messages in thread
From: Edgecombe, Rick P @ 2024-01-12  0:56 UTC (permalink / raw)
  To: linux-hyperv, ardb, hch, Lutomirski, Andy, mhklinux, linux-coco,
	dave.hansen, thomas.lendacky, haiyangz, akpm, kirill.shutemov,
	mingo, seanjc, linux-kernel, tglx, Cui, Dexuan, urezki, linux-mm,
	hpa, peterz, wei.liu, bp, Rodel, Jorg,
	sathyanarayanan.kuppuswamy, lstoakes, x86

On Fri, 2024-01-05 at 10:30 -0800, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
> 
> set_memory_p() is currently static.  It has parameters that don't
> match set_memory_p() under arch/powerpc and that aren't congruent
> with the other set_memory_* functions. There's no good reason for
> the difference.
> 
> Fix this by making the parameters consistent, and update the one
> existing call site.  Make the function non-static and add it to
> include/asm/set_memory.h so that it is completely parallel to
> set_memory_np() and is usable in other modules.
> 
> No functional change.
> 
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>


* Re: [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback
  2024-01-05 18:30 ` [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback mhkelley58
  2024-01-08 13:07   ` kirill.shutemov
@ 2024-01-12  1:20   ` Edgecombe, Rick P
  2024-01-12 15:07     ` Michael Kelley
  1 sibling, 1 reply; 18+ messages in thread
From: Edgecombe, Rick P @ 2024-01-12  1:20 UTC (permalink / raw)
  To: linux-hyperv, ardb, hch, Lutomirski, Andy, mhklinux, linux-coco,
	dave.hansen, thomas.lendacky, haiyangz, akpm, kirill.shutemov,
	mingo, seanjc, linux-kernel, tglx, Cui, Dexuan, urezki, linux-mm,
	hpa, peterz, wei.liu, bp, Rodel, Jorg,
	sathyanarayanan.kuppuswamy, lstoakes, x86

On Fri, 2024-01-05 at 10:30 -0800, mhkelley58@gmail.com wrote:
> + * It is also used in callbacks for CoCo VM page transitions between private
> + * and shared because it works when the PRESENT bit is not set in the leaf
> + * PTE. In such cases, the state of the PTEs, including the PFN, is otherwise
> + * known to be valid, so the returned physical address is correct.  The similar
> + * function vmalloc_to_pfn() can't be used because it requires the PRESENT bit.

I'm not sure about this comment. It is mostly about callers far away
and other functions in vmalloc. Probably a decent chance to get stale.
It also kind of begs the question of why vmalloc_to_pfn() requires the
present bit in the leaf.

It seems the first part of the comment is about why this is needed when
__pa() exists. One reason given is that __pa() doesn't work with
vmalloc memory. Then the next bit talks about another similar function
that works with vmalloc memory.

So the comment is a risk to get stale, and leaves me a little confused
why this function exists.

I think the reason is because vmalloc_to_pfn() *only* works with
vmalloc memory and this is needed to work on other alias mappings.


* RE: [PATCH v3 3/3] x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad()
  2024-01-12  0:26   ` Edgecombe, Rick P
@ 2024-01-12  3:19     ` Michael Kelley
  0 siblings, 0 replies; 18+ messages in thread
From: Michael Kelley @ 2024-01-12  3:19 UTC (permalink / raw)
  To: Edgecombe, Rick P, linux-hyperv, ardb, hch, Lutomirski, Andy,
	linux-coco, dave.hansen, thomas.lendacky, haiyangz, akpm,
	kirill.shutemov, mingo, seanjc, linux-kernel, tglx, Cui, Dexuan,
	urezki, linux-mm, hpa, peterz, wei.liu, bp, Rodel, Jorg,
	sathyanarayanan.kuppuswamy, lstoakes, x86

From: Edgecombe, Rick P <rick.p.edgecombe@intel.com> Sent: Thursday, January 11, 2024 4:27 PM
> 
> On Fri, 2024-01-05 at 10:30 -0800, mhkelley58@gmail.com wrote:
> >   * hv_vtom_set_host_visibility - Set specified memory visible to host.
> >   *
> > @@ -521,7 +547,7 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
> >
> >         pfn_array = kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
> >         if (!pfn_array)
> > -               return false;
> > +               goto err_set_memory_p;
> >
> 
> If kmalloc() fails here, and set_memory_p() succeeds below, then
> hv_vtom_set_host_visibility() returns true, but skips all the
> hv_mark_gpa_visibility() work. Shouldn't it return false?

Ooops.  Yes.  If the kmalloc() fails, need to set "result" to false before
doing the goto.  Will fix in the next version.

Michael


* RE: [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback
  2024-01-12  1:20   ` Edgecombe, Rick P
@ 2024-01-12 15:07     ` Michael Kelley
  2024-01-12 17:17       ` Edgecombe, Rick P
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Kelley @ 2024-01-12 15:07 UTC (permalink / raw)
  To: Edgecombe, Rick P, linux-hyperv, ardb, hch, Lutomirski, Andy,
	linux-coco, dave.hansen, thomas.lendacky, haiyangz, akpm,
	kirill.shutemov, mingo, seanjc, linux-kernel, tglx, Cui, Dexuan,
	urezki, linux-mm, hpa, peterz, wei.liu, bp, Rodel, Jorg,
	sathyanarayanan.kuppuswamy, lstoakes, x86

From: Edgecombe, Rick P <rick.p.edgecombe@intel.com> Sent: Thursday, January 11, 2024 5:20 PM
> 
> On Fri, 2024-01-05 at 10:30 -0800, mhkelley58@gmail.com wrote:
> > + * It is also used in callbacks for CoCo VM page transitions between private
> > + * and shared because it works when the PRESENT bit is not set in the leaf
> > + * PTE. In such cases, the state of the PTEs, including the PFN, is otherwise
> > + * known to be valid, so the returned physical address is correct.  The similar
> > + * function vmalloc_to_pfn() can't be used because it requires the PRESENT bit.
> 
> I'm not sure about this comment. It is mostly about callers far away
> and other functions in vmalloc. Probably a decent chance to get stale.
> It also kind of begs the question of why vmalloc_to_pfn() requires the
> present bit in the leaf.

The comment is Kirill Shutemov's suggestion based on comments in
an earlier version of the patch series.  See [1].   The intent is to prevent
some later revision to slow_virt_to_phys() from adding a check for the
present bit and breaking the CoCo VM hypervisor callback.  Yes, the
comment could get stale, but I'm not sure how else to call out the
implicit dependency.  The idea of creating a private version of
slow_virt_to_phys() for use only in the CoCo VM hypervisor callback
is also discussed in the thread, but that seems worse overall.

As for why vmalloc_to_page() checks the present bit, I don't know.
But vmalloc_to_page() is very widely used, so trying to change it
doesn't seem viable.

> 
> It seems the first part of the comment is about why this is needed when
> __pa() exists. One reason given is that __pa() doesn't work with
> vmalloc memory. Then the next bit talks about another similar function
> that works with vmalloc memory.
> 
> So the comment is a risk to get stale, and leaves me a little confused
> why this function exists.
> 
> I think the reason is because vmalloc_to_pfn() *only* works with
> vmalloc memory and this is needed to work on other alias mappings.

Presumably so.  The first paragraph of the existing comment also
calls out "alloc_remap() areas on 32-bit NUMA systems" as
needing slow_virt_to_phys().

Michael

[1] https://lore.kernel.org/lkml/20230828221333.5blshosyqafbgwlc@box.shutemov.name/


* Re: [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback
  2024-01-12 15:07     ` Michael Kelley
@ 2024-01-12 17:17       ` Edgecombe, Rick P
  2024-01-12 19:24         ` Michael Kelley
  0 siblings, 1 reply; 18+ messages in thread
From: Edgecombe, Rick P @ 2024-01-12 17:17 UTC (permalink / raw)
  To: linux-coco, ardb, Lutomirski, Andy, mhklinux, hch, dave.hansen,
	thomas.lendacky, haiyangz, akpm, kirill.shutemov, mingo, seanjc,
	linux-kernel, tglx, Cui, Dexuan, urezki, linux-mm, linux-hyperv,
	peterz, hpa, bp, wei.liu, Rodel, Jorg,
	sathyanarayanan.kuppuswamy, lstoakes, x86

On Fri, 2024-01-12 at 15:07 +0000, Michael Kelley wrote:
> The comment is Kirill Shutemov's suggestion based on comments in
> an earlier version of the patch series.  See [1].  The intent is to prevent
> some later revision to slow_virt_to_phys() from adding a check for the
> present bit and breaking the CoCo VM hypervisor callback.  Yes, the
> comment could get stale, but I'm not sure how else to call out the
> implicit dependency.  The idea of creating a private version of
> slow_virt_to_phys() for use only in the CoCo VM hypervisor callback
> is also discussed in the thread, but that seems worse overall.

Well, it's not a huge deal, but I would have just put a comment at the
caller like:

/*
 * Use slow_virt_to_phys() instead of vmalloc_to_page(), because it
 * returns the PFN even for NP PTEs.
 */

If someone is changing slow_virt_to_phys() they should check the
callers to make sure they won't break anything. They can see the
comment then and have an easy time.

An optional comment at slow_virt_to_phys() could explain how the
function works in regards to the present bit, but not include details
about the CoCo VM page transition's usage of the present bit. The proposed
comment text looks like something more appropriate for a commit log.


* RE: [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback
  2024-01-12 17:17       ` Edgecombe, Rick P
@ 2024-01-12 19:24         ` Michael Kelley
  2024-01-15 10:00           ` kirill.shutemov
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Kelley @ 2024-01-12 19:24 UTC (permalink / raw)
  To: Edgecombe, Rick P, linux-coco, ardb, Lutomirski, Andy, hch,
	dave.hansen, thomas.lendacky, haiyangz, akpm, kirill.shutemov,
	mingo, seanjc, linux-kernel, tglx, Cui, Dexuan, urezki, linux-mm,
	linux-hyperv, peterz, hpa, bp, wei.liu, Rodel, Jorg,
	sathyanarayanan.kuppuswamy, lstoakes, x86

From: Edgecombe, Rick P <rick.p.edgecombe@intel.com> Sent: Friday, January 12, 2024 9:17 AM
> 
> On Fri, 2024-01-12 at 15:07 +0000, Michael Kelley wrote:
> > The comment is Kirill Shutemov's suggestion based on comments in
> > an earlier version of the patch series.  See [1].  The intent is to prevent
> > some later revision to slow_virt_to_phys() from adding a check for the
> > present bit and breaking the CoCo VM hypervisor callback.  Yes, the
> > comment could get stale, but I'm not sure how else to call out the
> > implicit dependency.  The idea of creating a private version of
> > slow_virt_to_phys() for use only in the CoCo VM hypervisor callback
> > is also discussed in the thread, but that seems worse overall.
> 
> Well, it's not a huge deal, but I would have just put a comment at the
> caller like:
> 
> /*
>  * Use slow_virt_to_phys() instead of vmalloc_to_page(), because it
>  * returns the PFN even for NP PTEs.
>  */

Yes, that comment is added in this patch.

> 
> If someone is changing slow_virt_to_phys() they should check the
> callers to make sure they won't break anything. They can see the
> comment then and have an easy time.
> 
> An optional comment at slow_virt_to_phys() could explain how the
> function works in regards to the present bit, but not include details
> about CoCoO VM page transition's usage of the present bit. The proposed
> comment text looks like something more appropriate for a commit log.

Kirill -- you originally asked for a comment in slow_virt_to_phys(). [1]
Are you OK with Rick's proposal?

Michael

[1] https://lore.kernel.org/lkml/20230828221333.5blshosyqafbgwlc@box.shutemov.name/



* Re: [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback
  2024-01-12 19:24         ` Michael Kelley
@ 2024-01-15 10:00           ` kirill.shutemov
  0 siblings, 0 replies; 18+ messages in thread
From: kirill.shutemov @ 2024-01-15 10:00 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Edgecombe, Rick P, linux-coco, ardb, Lutomirski, Andy, hch,
	dave.hansen, thomas.lendacky, haiyangz, akpm, mingo, seanjc,
	linux-kernel, tglx, Cui, Dexuan, urezki, linux-mm, linux-hyperv,
	peterz, hpa, bp, wei.liu, Rodel, Jorg,
	sathyanarayanan.kuppuswamy, lstoakes, x86

On Fri, Jan 12, 2024 at 07:24:35PM +0000, Michael Kelley wrote:
> From: Edgecombe, Rick P <rick.p.edgecombe@intel.com> Sent: Friday, January 12, 2024 9:17 AM
> > 
> > On Fri, 2024-01-12 at 15:07 +0000, Michael Kelley wrote:
> > > The comment is Kirill Shutemov's suggestion based on comments in
> > > an earlier version of the patch series.  See [1].  The intent is to prevent
> > > some later revision to slow_virt_to_phys() from adding a check for the
> > > present bit and breaking the CoCo VM hypervisor callback.  Yes, the
> > > comment could get stale, but I'm not sure how else to call out the
> > > implicit dependency.  The idea of creating a private version of
> > > slow_virt_to_phys() for use only in the CoCo VM hypervisor callback
> > > is also discussed in the thread, but that seems worse overall.
> > 
> > Well, it's not a huge deal, but I would have just put a comment at the
> > caller like:
> > 
> > /*
> >  * Use slow_virt_to_phys() instead of vmalloc_to_page(), because it
> >  * returns the PFN even for NP PTEs.
> >  */
> 
> Yes, that comment is added in this patch.
> 
> > 
> > If someone is changing slow_virt_to_phys() they should check the
> > callers to make sure they won't break anything. They can see the
> > comment then and have an easy time.
> > 
> > An optional comment at slow_virt_to_phys() could explain how the
> > function works in regards to the present bit, but not include details
> > about CoCoO VM page transition's usage of the present bit. The proposed
> > comment text looks like something more appropriate for a commit log.
> 
> Kirill -- you originally asked for a comment in slow_virt_to_phys(). [1]
> Are you OK with Rick's proposal?

Yes, sounds sensible.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


end of thread, other threads:[~2024-01-15 10:00 UTC | newest]

Thread overview: 18+ messages
2024-01-05 18:30 [PATCH v3 0/3] x86/hyperv: Mark CoCo VM pages not present when changing encrypted state mhkelley58
2024-01-05 18:30 ` [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback mhkelley58
2024-01-08 13:07   ` kirill.shutemov
2024-01-08 16:00     ` Michael Kelley
2024-01-12  1:20   ` Edgecombe, Rick P
2024-01-12 15:07     ` Michael Kelley
2024-01-12 17:17       ` Edgecombe, Rick P
2024-01-12 19:24         ` Michael Kelley
2024-01-15 10:00           ` kirill.shutemov
2024-01-05 18:30 ` [PATCH v3 2/3] x86/mm: Regularize set_memory_p() parameters and make non-static mhkelley58
2024-01-08 13:10   ` kirill.shutemov
2024-01-12  0:56   ` Edgecombe, Rick P
2024-01-05 18:30 ` [PATCH v3 3/3] x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad() mhkelley58
2024-01-08 18:37   ` Kuppuswamy Sathyanarayanan
2024-01-08 19:13     ` Michael Kelley
2024-01-08 19:24       ` Kuppuswamy Sathyanarayanan
2024-01-12  0:26   ` Edgecombe, Rick P
2024-01-12  3:19     ` Michael Kelley
