All of lore.kernel.org
* [v4 0/5] Migrate non-migrated pages of a SVM.
@ 2020-07-17  8:00 ` Ram Pai
  0 siblings, 0 replies; 28+ messages in thread
From: Ram Pai @ 2020-07-17  8:00 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev
  Cc: ldufour, linuxram, cclaudio, bharata, sathnaga, aneesh.kumar,
	sukadev, bauerman, david

The time to switch a VM to a Secure-VM increases with the size of the
VM. A 100GB VM takes about 7 minutes. This is unacceptable.  This linear
increase is caused by suboptimal behavior in the Ultravisor and the
Hypervisor.  The Ultravisor unnecessarily migrates every GFN of the
VM from normal-memory to secure-memory, when it needs to migrate only
the necessary and sufficient GFNs.

However, when this optimization is incorporated in the Ultravisor, the
Hypervisor starts misbehaving. The Hypervisor has an inbuilt assumption
that the Ultravisor will explicitly request migration of each and every
GFN of the VM. If only the necessary and sufficient GFNs are requested
for migration, the Hypervisor continues to manage the remaining GFNs as
normal GFNs. This leads to memory corruption, manifested
consistently when the SVM reboots.

The same is true when a memory slot is hotplugged into an SVM. The
Hypervisor expects the Ultravisor to request migration of all GFNs to
secure-GFNs, but the Hypervisor cannot handle H_SVM_PAGE_IN
requests from the Ultravisor made in the context of the
UV_REGISTER_MEM_SLOT ucall.  This problem manifests as random errors
in the SVM when a memory slot is hotplugged.

This patch series automatically migrates the non-migrated pages of an
SVM, and thus solves both problems.

Testing: Passed rigorous testing with SVMs of various sizes.

Changelog:

v4:  .  Incorporated Bharata's comments:
	- Optimization -- replace the write mmap semaphore with a read mmap semaphore.
	- disable page-merge during memory hotplug.
	- rearranged the patches. consolidated the page-migration-retry logic
		in a single patch.

v3: . Optimized the page-migration retry-logic.
    . Relax and relinquish the CPU regularly while bulk-migrating
    	the non-migrated pages. Bulk migration was causing
	soft-lockups; fixed it.
    . Added a new patch to retry page-migration a couple of times
    	before returning H_BUSY in H_SVM_PAGE_IN. This issue was
	seen a few times in a 24-hour continuous reboot test of the SVMs.

v2: . Fixed a bug observed by Laurent. The state of the GFNs associated
	with Secure-VMs was not reset during memslot flush.
    . Re-organized the code, for easier review.
    . Better description of the patch series.

v1: Fixed a bug observed by Bharata. Pages that were paged-in and later
paged-out must also be skipped from migration during H_SVM_INIT_DONE.


Laurent Dufour (1):
  KVM: PPC: Book3S HV: migrate hot plugged memory

Ram Pai (4):
  KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START
  KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
  KVM: PPC: Book3S HV: in H_SVM_INIT_DONE, migrate remaining normal-GFNs
    to secure-GFNs.
  KVM: PPC: Book3S HV: retry page migration before erroring-out

 Documentation/powerpc/ultravisor.rst        |   4 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |  12 +
 arch/powerpc/kvm/book3s_hv.c                |  10 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c          | 508 ++++++++++++++++++++++++----
 4 files changed, 457 insertions(+), 77 deletions(-)

-- 
1.8.3.1




* [v4 1/5] KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START
  2020-07-17  8:00 ` Ram Pai
@ 2020-07-17  8:00   ` Ram Pai
From: Ram Pai @ 2020-07-17  8:00 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev
  Cc: ldufour, linuxram, cclaudio, bharata, sathnaga, aneesh.kumar,
	sukadev, bauerman, david

Page-merging of pages in memory-slots associated with a Secure VM
is currently disabled in the H_SVM_PAGE_IN handler.

This operation should be done much earlier, the moment the VM
initiates its secure-transition. Delaying this operation increases
the probability of those pages acquiring new references, making it
impossible to migrate them.

Disable page-merging in H_SVM_INIT_START handling instead.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
 Documentation/powerpc/ultravisor.rst |  1 +
 arch/powerpc/kvm/book3s_hv_uvmem.c   | 98 +++++++++++++++++++++++++++---------
 2 files changed, 76 insertions(+), 23 deletions(-)

diff --git a/Documentation/powerpc/ultravisor.rst b/Documentation/powerpc/ultravisor.rst
index df136c8..a1c8c37 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -895,6 +895,7 @@ Return values
     One of the following values:
 
 	* H_SUCCESS	 on success.
+        * H_STATE        if the VM is not in a position to switch to secure.
 
 Description
 ~~~~~~~~~~~
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index e6f76bc..0baa293 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -211,6 +211,65 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, struct kvm *kvm,
 	return false;
 }
 
+static int kvmppc_memslot_page_merge(struct kvm *kvm,
+		struct kvm_memory_slot *memslot, bool merge)
+{
+	unsigned long gfn = memslot->base_gfn;
+	unsigned long end, start = gfn_to_hva(kvm, gfn);
+	int ret = 0;
+	struct vm_area_struct *vma;
+	int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
+
+	if (kvm_is_error_hva(start))
+		return H_STATE;
+
+	end = start + (memslot->npages << PAGE_SHIFT);
+
+	mmap_write_lock(kvm->mm);
+	do {
+		vma = find_vma_intersection(kvm->mm, start, end);
+		if (!vma) {
+			ret = H_STATE;
+			break;
+		}
+		ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
+			  merge_flag, &vma->vm_flags);
+		if (ret) {
+			ret = H_STATE;
+			break;
+		}
+		start = vma->vm_end + 1;
+	} while (end > vma->vm_end);
+
+	mmap_write_unlock(kvm->mm);
+	return ret;
+}
+
+static int __kvmppc_page_merge(struct kvm *kvm, bool merge)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *memslot;
+	int ret = 0;
+
+	slots = kvm_memslots(kvm);
+	kvm_for_each_memslot(memslot, slots) {
+		ret = kvmppc_memslot_page_merge(kvm, memslot, merge);
+		if (ret)
+			break;
+	}
+	return ret;
+}
+
+static inline int kvmppc_disable_page_merge(struct kvm *kvm)
+{
+	return __kvmppc_page_merge(kvm, false);
+}
+
+static inline int kvmppc_enable_page_merge(struct kvm *kvm)
+{
+	return __kvmppc_page_merge(kvm, true);
+}
+
 unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 {
 	struct kvm_memslots *slots;
@@ -232,11 +291,18 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 		return H_AUTHORITY;
 
 	srcu_idx = srcu_read_lock(&kvm->srcu);
+
+	/* disable page-merging for all memslot */
+	ret = kvmppc_disable_page_merge(kvm);
+	if (ret)
+		goto out;
+
+	/* register the memslot */
 	slots = kvm_memslots(kvm);
 	kvm_for_each_memslot(memslot, slots) {
 		if (kvmppc_uvmem_slot_init(kvm, memslot)) {
 			ret = H_PARAMETER;
-			goto out;
+			break;
 		}
 		ret = uv_register_mem_slot(kvm->arch.lpid,
 					   memslot->base_gfn << PAGE_SHIFT,
@@ -245,9 +311,12 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 		if (ret < 0) {
 			kvmppc_uvmem_slot_free(kvm, memslot);
 			ret = H_PARAMETER;
-			goto out;
+			break;
 		}
 	}
+
+	if (ret)
+		kvmppc_enable_page_merge(kvm);
 out:
 	srcu_read_unlock(&kvm->srcu, srcu_idx);
 	return ret;
@@ -384,7 +453,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm)
  */
 static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
 		   unsigned long end, unsigned long gpa, struct kvm *kvm,
-		   unsigned long page_shift, bool *downgrade)
+		   unsigned long page_shift)
 {
 	unsigned long src_pfn, dst_pfn = 0;
 	struct migrate_vma mig;
@@ -400,18 +469,6 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
 	mig.src = &src_pfn;
 	mig.dst = &dst_pfn;
 
-	/*
-	 * We come here with mmap_lock write lock held just for
-	 * ksm_madvise(), otherwise we only need read mmap_lock.
-	 * Hence downgrade to read lock once ksm_madvise() is done.
-	 */
-	ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
-			  MADV_UNMERGEABLE, &vma->vm_flags);
-	mmap_write_downgrade(kvm->mm);
-	*downgrade = true;
-	if (ret)
-		return ret;
-
 	ret = migrate_vma_setup(&mig);
 	if (ret)
 		return ret;
@@ -503,7 +560,6 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
 		unsigned long flags,
 		unsigned long page_shift)
 {
-	bool downgrade = false;
 	unsigned long start, end;
 	struct vm_area_struct *vma;
 	int srcu_idx;
@@ -524,7 +580,7 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
 
 	ret = H_PARAMETER;
 	srcu_idx = srcu_read_lock(&kvm->srcu);
-	mmap_write_lock(kvm->mm);
+	mmap_read_lock(kvm->mm);
 
 	start = gfn_to_hva(kvm, gfn);
 	if (kvm_is_error_hva(start))
@@ -540,16 +596,12 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
 	if (!vma || vma->vm_start > start || vma->vm_end < end)
 		goto out_unlock;
 
-	if (!kvmppc_svm_page_in(vma, start, end, gpa, kvm, page_shift,
-				&downgrade))
+	if (!kvmppc_svm_page_in(vma, start, end, gpa, kvm, page_shift))
 		ret = H_SUCCESS;
 out_unlock:
 	mutex_unlock(&kvm->arch.uvmem_lock);
 out:
-	if (downgrade)
-		mmap_read_unlock(kvm->mm);
-	else
-		mmap_write_unlock(kvm->mm);
+	mmap_read_unlock(kvm->mm);
 	srcu_read_unlock(&kvm->srcu, srcu_idx);
 	return ret;
 }
-- 
1.8.3.1




* [v4 2/5] KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
  2020-07-17  8:00 ` Ram Pai
@ 2020-07-17  8:00   ` Ram Pai
From: Ram Pai @ 2020-07-17  8:00 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev
  Cc: ldufour, linuxram, cclaudio, bharata, sathnaga, aneesh.kumar,
	sukadev, bauerman, david

During the life of an SVM, its GFNs transition through normal, secure and
shared states. Since the kernel does not track GFNs that are shared, it
is not possible to disambiguate a shared GFN from a GFN whose PFN has
not yet been migrated to a secure-PFN. Nor is it possible to
disambiguate a secure-GFN from a GFN whose PFN has been paged out from
the ultravisor.

The ability to identify the state of a GFN is needed to skip migration
of its PFN to secure-PFN during ESM transition.

The code is re-organized to track the states of a GFN as explained
below.

************************************************************************
 1. States of a GFN
    ---------------
 The GFN can be in one of the following states.

 (a) Secure - The GFN is secure. The GFN is associated with
 	a Secure VM; the contents of the GFN are not accessible
 	to the Hypervisor.  This GFN can be backed by a secure-PFN,
 	or can be backed by a normal-PFN with contents encrypted.
 	The former is true when the GFN is paged into the
 	ultravisor. The latter is true when the GFN is paged out
 	of the ultravisor.

 (b) Shared - The GFN is shared. The GFN is associated with
 	a secure VM. The contents of the GFN are accessible to
 	the Hypervisor. This GFN is backed by a normal-PFN and its
 	content is un-encrypted.

 (c) Normal - The GFN is a normal GFN. The GFN is associated with
 	a normal VM. The contents of the GFN are accessible to
 	the Hypervisor. Its content is never encrypted.

 2. States of a VM.
    ---------------

 (a) Normal VM:  A VM whose contents are always accessible to
 	the hypervisor.  All its GFNs are normal-GFNs.

 (b) Secure VM: A VM whose contents are not accessible to the
 	hypervisor without the VM's consent.  Its GFNs are
 	either Shared-GFN or Secure-GFNs.

 (c) Transient VM: A Normal VM that is transitioning to secure VM.
 	The transition starts on successful return of
 	H_SVM_INIT_START, and ends on successful return
 	of H_SVM_INIT_DONE. This transient VM can have GFNs
 	in any of the three states; i.e. Secure-GFN, Shared-GFN,
 	and Normal-GFN.	The VM never executes in this state
 	in supervisor-mode.

 3. Memory slot State.
    ------------------
  	The state of a memory slot mirrors the state of the
  	VM the memory slot is associated with.

 4. VM State transition.
    --------------------

  A VM always starts in Normal Mode.

  H_SVM_INIT_START moves the VM into transient state. During this
  time the Ultravisor may request some of its GFNs to be shared or
  secured. So its GFNs can be in one of the three GFN states.

  H_SVM_INIT_DONE moves the VM entirely from transient state to
  secure-state. At this point any left-over normal-GFNs are
  transitioned to Secure-GFN.

  H_SVM_INIT_ABORT moves the transient VM back to normal VM.
  All its GFNs are moved to Normal-GFNs.

  UV_TERMINATE transitions the secure-VM back to a normal-VM. All
  the secure-GFNs and shared-GFNs are transitioned to normal-GFNs.
  Note: The contents of the normal-GFN are undefined at this point.

 5. GFN state implementation:
    -------------------------

 Secure GFN is associated with a secure-PFN; also called uvmem_pfn,
 when the GFN is paged-in. Its pfn[] has KVMPPC_GFN_UVMEM_PFN flag
 set, and contains the value of the secure-PFN.
 It is associated with a normal-PFN; also called mem_pfn, when
 the GFN is paged out. Its pfn[] has KVMPPC_GFN_MEM_PFN flag set.
 The value of the normal-PFN is not tracked.

 Shared GFN is associated with a normal-PFN. Its pfn[] has
 KVMPPC_UVMEM_SHARED_PFN flag set. The value of the normal-PFN
 is not tracked.

 Normal GFN is associated with normal-PFN. Its pfn[] has
 no flag set. The value of the normal-PFN is not tracked.

 6. Life cycle of a GFN
    --------------------
 --------------------------------------------------------------
 |        |     Share  |  Unshare | SVM       |H_SVM_INIT_DONE|
 |        |operation   |operation | abort/    |               |
 |        |            |          | terminate |               |
 -------------------------------------------------------------
 |        |            |          |           |               |
 | Secure |     Shared | Secure   |Normal     |Secure         |
 |        |            |          |           |               |
 | Shared |     Shared | Secure   |Normal     |Shared         |
 |        |            |          |           |               |
 | Normal |     Shared | Secure   |Normal     |Secure         |
 --------------------------------------------------------------

 7. Life cycle of a VM
    --------------------
 --------------------------------------------------------------------
 |         |  start    |  H_SVM_  |H_SVM_   |H_SVM_     |UV_SVM_    |
 |         |  VM       |INIT_START|INIT_DONE|INIT_ABORT |TERMINATE  |
 |         |           |          |         |           |           |
 --------- ----------------------------------------------------------
 |         |           |          |         |           |           |
 | Normal  | Normal    | Transient|Error    |Error      |Normal     |
 |         |           |          |         |           |           |
 | Secure  |   Error   | Error    |Error    |Error      |Normal     |
 |         |           |          |         |           |           |
 |Transient|   N/A     | Error    |Secure   |Normal     |Normal     |
 --------------------------------------------------------------------

************************************************************************

Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Claudio Carvalho <cclaudio@linux.ibm.com>
Cc: kvm-ppc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 187 +++++++++++++++++++++++++++++++++----
 1 file changed, 168 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 0baa293..df2e272 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -98,7 +98,127 @@
 static unsigned long *kvmppc_uvmem_bitmap;
 static DEFINE_SPINLOCK(kvmppc_uvmem_bitmap_lock);
 
-#define KVMPPC_UVMEM_PFN	(1UL << 63)
+/*
+ * States of a GFN
+ * ---------------
+ * The GFN can be in one of the following states.
+ *
+ * (a) Secure - The GFN is secure. The GFN is associated with
+ *	a Secure VM, the contents of the GFN is not accessible
+ *	to the Hypervisor.  This GFN can be backed by a secure-PFN,
+ *	or can be backed by a normal-PFN with contents encrypted.
+ *	The former is true when the GFN is paged-in into the
+ *	ultravisor. The latter is true when the GFN is paged-out
+ *	of the ultravisor.
+ *
+ * (b) Shared - The GFN is shared. The GFN is associated with
+ *	a secure VM. The contents of the GFN are accessible to
+ *	the Hypervisor. This GFN is backed by a normal-PFN and its
+ *	content is un-encrypted.
+ *
+ * (c) Normal - The GFN is a normal GFN. The GFN is associated with
+ *	a normal VM. The contents of the GFN are accessible to
+ *	the Hypervisor. Its content is never encrypted.
+ *
+ * States of a VM.
+ * ---------------
+ *
+ * Normal VM:  A VM whose contents are always accessible to
+ *	the hypervisor.  All its GFNs are normal-GFNs.
+ *
+ * Secure VM: A VM whose contents are not accessible to the
+ *	hypervisor without the VM's consent.  Its GFNs are
+ *	either Shared-GFN or Secure-GFNs.
+ *
+ * Transient VM: A Normal VM that is transitioning to secure VM.
+ *	The transition starts on successful return of
+ *	H_SVM_INIT_START, and ends on successful return
+ *	of H_SVM_INIT_DONE. This transient VM can have GFNs
+ *	in any of the three states; i.e. Secure-GFN, Shared-GFN,
+ *	and Normal-GFN.	The VM never executes in this state
+ *	in supervisor-mode.
+ *
+ * Memory slot State.
+ * -----------------------------
+ *	The state of a memory slot mirrors the state of the
+ *	VM the memory slot is associated with.
+ *
+ * VM State transition.
+ * --------------------
+ *
+ *  A VM always starts in Normal Mode.
+ *
+ *  H_SVM_INIT_START moves the VM into transient state. During this
+ *  time the Ultravisor may request some of its GFNs to be shared or
+ *  secured. So its GFNs can be in one of the three GFN states.
+ *
+ *  H_SVM_INIT_DONE moves the VM entirely from transient state to
+ *  secure-state. At this point any left-over normal-GFNs are
+ *  transitioned to Secure-GFN.
+ *
+ *  H_SVM_INIT_ABORT moves the transient VM back to normal VM.
+ *  All its GFNs are moved to Normal-GFNs.
+ *
+ *  UV_TERMINATE transitions the secure-VM back to a normal-VM. All
+ *  the secure-GFNs and shared-GFNs are transitioned to normal-GFNs.
+ *  Note: The contents of the normal-GFN are undefined at this point.
+ *
+ * GFN state implementation:
+ * -------------------------
+ *
+ * Secure GFN is associated with a secure-PFN; also called uvmem_pfn,
+ * when the GFN is paged-in. Its pfn[] has KVMPPC_GFN_UVMEM_PFN flag
+ * set, and contains the value of the secure-PFN.
+ * It is associated with a normal-PFN; also called mem_pfn, when
+ * the GFN is paged out. Its pfn[] has KVMPPC_GFN_MEM_PFN flag set.
+ * The value of the normal-PFN is not tracked.
+ *
+ * Shared GFN is associated with a normal-PFN. Its pfn[] has
+ * KVMPPC_UVMEM_SHARED_PFN flag set. The value of the normal-PFN
+ * is not tracked.
+ *
+ * Normal GFN is associated with normal-PFN. Its pfn[] has
+ * no flag set. The value of the normal-PFN is not tracked.
+ *
+ * Life cycle of a GFN
+ * --------------------
+ *
+ * --------------------------------------------------------------
+ * |        |     Share  |  Unshare | SVM       |H_SVM_INIT_DONE|
+ * |        |operation   |operation | abort/    |               |
+ * |        |            |          | terminate |               |
+ * -------------------------------------------------------------
+ * |        |            |          |           |               |
+ * | Secure |     Shared | Secure   |Normal     |Secure         |
+ * |        |            |          |           |               |
+ * | Shared |     Shared | Secure   |Normal     |Shared         |
+ * |        |            |          |           |               |
+ * | Normal |     Shared | Secure   |Normal     |Secure         |
+ * --------------------------------------------------------------
+ *
+ * Life cycle of a VM
+ * --------------------
+ *
+ * --------------------------------------------------------------------
+ * |         |  start    |  H_SVM_  |H_SVM_   |H_SVM_     |UV_SVM_    |
+ * |         |  VM       |INIT_START|INIT_DONE|INIT_ABORT |TERMINATE  |
+ * |         |           |          |         |           |           |
+ * --------- ----------------------------------------------------------
+ * |         |           |          |         |           |           |
+ * | Normal  | Normal    | Transient|Error    |Error      |Normal     |
+ * |         |           |          |         |           |           |
+ * | Secure  |   Error   | Error    |Error    |Error      |Normal     |
+ * |         |           |          |         |           |           |
+ * |Transient|   N/A     | Error    |Secure   |Normal     |Normal     |
+ * --------------------------------------------------------------------
+ */
+
+#define KVMPPC_GFN_UVMEM_PFN	(1UL << 63)
+#define KVMPPC_GFN_MEM_PFN	(1UL << 62)
+#define KVMPPC_GFN_SHARED	(1UL << 61)
+#define KVMPPC_GFN_SECURE	(KVMPPC_GFN_UVMEM_PFN | KVMPPC_GFN_MEM_PFN)
+#define KVMPPC_GFN_FLAG_MASK	(KVMPPC_GFN_SECURE | KVMPPC_GFN_SHARED)
+#define KVMPPC_GFN_PFN_MASK	(~KVMPPC_GFN_FLAG_MASK)
 
 struct kvmppc_uvmem_slot {
 	struct list_head list;
@@ -106,11 +226,11 @@ struct kvmppc_uvmem_slot {
 	unsigned long base_pfn;
 	unsigned long *pfns;
 };
-
 struct kvmppc_uvmem_page_pvt {
 	struct kvm *kvm;
 	unsigned long gpa;
 	bool skip_page_out;
+	bool remove_gfn;
 };
 
 bool kvmppc_uvmem_available(void)
@@ -163,8 +283,8 @@ void kvmppc_uvmem_slot_free(struct kvm *kvm, const struct kvm_memory_slot *slot)
 	mutex_unlock(&kvm->arch.uvmem_lock);
 }
 
-static void kvmppc_uvmem_pfn_insert(unsigned long gfn, unsigned long uvmem_pfn,
-				    struct kvm *kvm)
+static void kvmppc_mark_gfn(unsigned long gfn, struct kvm *kvm,
+			unsigned long flag, unsigned long uvmem_pfn)
 {
 	struct kvmppc_uvmem_slot *p;
 
@@ -172,24 +292,41 @@ static void kvmppc_uvmem_pfn_insert(unsigned long gfn, unsigned long uvmem_pfn,
 		if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
 			unsigned long index = gfn - p->base_pfn;
 
-			p->pfns[index] = uvmem_pfn | KVMPPC_UVMEM_PFN;
+			if (flag == KVMPPC_GFN_UVMEM_PFN)
+				p->pfns[index] = uvmem_pfn | flag;
+			else
+				p->pfns[index] = flag;
 			return;
 		}
 	}
 }
 
-static void kvmppc_uvmem_pfn_remove(unsigned long gfn, struct kvm *kvm)
+/* mark the GFN as a secure-GFN associated with the @uvmem_pfn device-PFN. */
+static void kvmppc_gfn_secure_uvmem_pfn(unsigned long gfn,
+			unsigned long uvmem_pfn, struct kvm *kvm)
 {
-	struct kvmppc_uvmem_slot *p;
+	kvmppc_mark_gfn(gfn, kvm, KVMPPC_GFN_UVMEM_PFN, uvmem_pfn);
+}
 
-	list_for_each_entry(p, &kvm->arch.uvmem_pfns, list) {
-		if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
-			p->pfns[gfn - p->base_pfn] = 0;
-			return;
-		}
-	}
+/* mark the GFN as secure-GFN associated with a memory-PFN. */
+static void kvmppc_gfn_secure_mem_pfn(unsigned long gfn, struct kvm *kvm)
+{
+	kvmppc_mark_gfn(gfn, kvm, KVMPPC_GFN_MEM_PFN, 0);
+}
+
+/* mark the GFN as a shared GFN. */
+static void kvmppc_gfn_shared(unsigned long gfn, struct kvm *kvm)
+{
+	kvmppc_mark_gfn(gfn, kvm, KVMPPC_GFN_SHARED, 0);
+}
+
+/* mark the GFN as a non-existent GFN. */
+static void kvmppc_gfn_remove(unsigned long gfn, struct kvm *kvm)
+{
+	kvmppc_mark_gfn(gfn, kvm, 0, 0);
 }
 
+/* return true, if the GFN is a secure-GFN backed by a secure-PFN */
 static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, struct kvm *kvm,
 				    unsigned long *uvmem_pfn)
 {
@@ -199,10 +336,10 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, struct kvm *kvm,
 		if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
 			unsigned long index = gfn - p->base_pfn;
 
-			if (p->pfns[index] & KVMPPC_UVMEM_PFN) {
+			if (p->pfns[index] & KVMPPC_GFN_UVMEM_PFN) {
 				if (uvmem_pfn)
 					*uvmem_pfn = p->pfns[index] &
-						     ~KVMPPC_UVMEM_PFN;
+						     KVMPPC_GFN_PFN_MASK;
 				return true;
 			} else
 				return false;
@@ -353,6 +490,7 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 
 		mutex_lock(&kvm->arch.uvmem_lock);
 		if (!kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
+			kvmppc_gfn_remove(gfn, kvm);
 			mutex_unlock(&kvm->arch.uvmem_lock);
 			continue;
 		}
@@ -360,6 +498,7 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 		uvmem_page = pfn_to_page(uvmem_pfn);
 		pvt = uvmem_page->zone_device_data;
 		pvt->skip_page_out = skip_page_out;
+		pvt->remove_gfn = true;
 		mutex_unlock(&kvm->arch.uvmem_lock);
 
 		pfn = gfn_to_pfn(kvm, gfn);
@@ -429,7 +568,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm)
 		goto out_clear;
 
 	uvmem_pfn = bit + pfn_first;
-	kvmppc_uvmem_pfn_insert(gpa >> PAGE_SHIFT, uvmem_pfn, kvm);
+	kvmppc_gfn_secure_uvmem_pfn(gpa >> PAGE_SHIFT, uvmem_pfn, kvm);
 
 	pvt->gpa = gpa;
 	pvt->kvm = kvm;
@@ -524,6 +663,7 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
 		uvmem_page = pfn_to_page(uvmem_pfn);
 		pvt = uvmem_page->zone_device_data;
 		pvt->skip_page_out = true;
+		pvt->remove_gfn = false;
 	}
 
 retry:
@@ -537,12 +677,16 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
 		uvmem_page = pfn_to_page(uvmem_pfn);
 		pvt = uvmem_page->zone_device_data;
 		pvt->skip_page_out = true;
+		pvt->remove_gfn = false;
 		kvm_release_pfn_clean(pfn);
 		goto retry;
 	}
 
-	if (!uv_page_in(kvm->arch.lpid, pfn << page_shift, gpa, 0, page_shift))
+	if (!uv_page_in(kvm->arch.lpid, pfn << page_shift, gpa, 0,
+				page_shift)) {
+		kvmppc_gfn_shared(gfn, kvm);
 		ret = H_SUCCESS;
+	}
 	kvm_release_pfn_clean(pfn);
 	mutex_unlock(&kvm->arch.uvmem_lock);
 out:
@@ -598,6 +742,7 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
 
 	if (!kvmppc_svm_page_in(vma, start, end, gpa, kvm, page_shift))
 		ret = H_SUCCESS;
+
 out_unlock:
 	mutex_unlock(&kvm->arch.uvmem_lock);
 out:
@@ -706,7 +851,8 @@ static vm_fault_t kvmppc_uvmem_migrate_to_ram(struct vm_fault *vmf)
 /*
  * Release the device PFN back to the pool
  *
- * Gets called when secure page becomes a normal page during H_SVM_PAGE_OUT.
+ * Gets called when a secure GFN transitions from a secure-PFN
+ * to a normal-PFN during H_SVM_PAGE_OUT.
  * Gets called with kvm->arch.uvmem_lock held.
  */
 static void kvmppc_uvmem_page_free(struct page *page)
@@ -721,7 +867,10 @@ static void kvmppc_uvmem_page_free(struct page *page)
 
 	pvt = page->zone_device_data;
 	page->zone_device_data = NULL;
-	kvmppc_uvmem_pfn_remove(pvt->gpa >> PAGE_SHIFT, pvt->kvm);
+	if (pvt->remove_gfn)
+		kvmppc_gfn_remove(pvt->gpa >> PAGE_SHIFT, pvt->kvm);
+	else
+		kvmppc_gfn_secure_mem_pfn(pvt->gpa >> PAGE_SHIFT, pvt->kvm);
 	kfree(pvt);
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [v4 2/5] KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
@ 2020-07-17  8:00   ` Ram Pai
  0 siblings, 0 replies; 28+ messages in thread
From: Ram Pai @ 2020-07-17  8:00 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev
  Cc: ldufour, linuxram, cclaudio, bharata, sathnaga, aneesh.kumar,
	sukadev, bauerman, david

During the life of an SVM, its GFNs transition through normal, secure
and shared states. Since the kernel does not track GFNs that are
shared, it is not possible to disambiguate a shared GFN from a GFN
whose PFN has not yet been migrated to a secure-PFN. Nor is it
possible to disambiguate a secure-GFN from a GFN whose PFN has been
paged out by the ultravisor.

The ability to identify the state of a GFN is needed to skip migration
of its PFN to secure-PFN during ESM transition.

The code is re-organized to track the states of a GFN as explained
below.

************************************************************************
 1. States of a GFN
    ---------------
 The GFN can be in one of the following states.

 (a) Secure - The GFN is secure. The GFN is associated with
 	a Secure VM; the contents of the GFN are not accessible
 	to the Hypervisor.  This GFN can be backed by a secure-PFN,
 	or by a normal-PFN with its contents encrypted.
 	The former is true when the GFN is paged in to the
 	ultravisor; the latter when the GFN is paged out
 	of the ultravisor.

 (b) Shared - The GFN is shared. The GFN is associated with
 	a secure VM. The contents of the GFN are accessible to
 	the Hypervisor. This GFN is backed by a normal-PFN and its
 	contents are un-encrypted.

 (c) Normal - The GFN is normal. The GFN is associated with
 	a normal VM. The contents of the GFN are accessible to
 	the Hypervisor. Its contents are never encrypted.

 2. States of a VM.
    ---------------

 (a) Normal VM:  A VM whose contents are always accessible to
 	the hypervisor.  All its GFNs are normal-GFNs.

 (b) Secure VM: A VM whose contents are not accessible to the
 	hypervisor without the VM's consent.  Its GFNs are
 	either Shared-GFN or Secure-GFNs.

 (c) Transient VM: A Normal VM that is transitioning to a secure VM.
 	The transition starts on successful return of
 	H_SVM_INIT_START, and ends on successful return
 	of H_SVM_INIT_DONE. This transient VM can have GFNs
 	in any of the three states, i.e. Secure-GFN, Shared-GFN
 	and Normal-GFN. The VM never executes in this state
 	in supervisor-mode.

 3. Memory slot State.
    ------------------
  	The state of a memory slot mirrors the state of the
  	VM the memory slot is associated with.

 4. VM State transition.
    --------------------

  A VM always starts in Normal Mode.

  H_SVM_INIT_START moves the VM into transient state. During this
  time the Ultravisor may request some of its GFNs to be shared or
  secured. So its GFNs can be in one of the three GFN states.

  H_SVM_INIT_DONE moves the VM entirely from transient state to
  secure-state. At this point any left-over normal-GFNs are
  transitioned to Secure-GFN.

  H_SVM_INIT_ABORT moves the transient VM back to a normal VM.
  All its GFNs are moved to Normal-GFNs.

  UV_TERMINATE transitions the secure-VM back to a normal-VM. All
  the secure-GFNs and shared-GFNs are transitioned to normal-GFNs.
  Note: The contents of the normal-GFNs are undefined at this point.

 5. GFN state implementation:
    -------------------------

 A Secure GFN is associated with a secure-PFN, also called uvmem_pfn,
 when the GFN is paged in. Its pfns[] entry has the KVMPPC_GFN_UVMEM_PFN
 flag set, and contains the value of the secure-PFN.
 It is associated with a normal-PFN, also called mem_pfn, when
 the GFN is paged out. Its pfns[] entry has the KVMPPC_GFN_MEM_PFN flag
 set. The value of the normal-PFN is not tracked.

 A Shared GFN is associated with a normal-PFN. Its pfns[] entry has
 the KVMPPC_GFN_SHARED flag set. The value of the normal-PFN
 is not tracked.

 A Normal GFN is associated with a normal-PFN. Its pfns[] entry has
 no flag set. The value of the normal-PFN is not tracked.
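
 The encoding above can be modeled in a few lines of user-space C.
 The flag values mirror the patch; the helper names are illustrative,
 not the ones used in the kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values mirror the patch; everything else is an illustrative
 * user-space model of how one pfns[] entry encodes GFN state. */
#define GFN_UVMEM_PFN	(1UL << 63)
#define GFN_MEM_PFN	(1UL << 62)
#define GFN_SHARED	(1UL << 61)
#define GFN_FLAG_MASK	(GFN_UVMEM_PFN | GFN_MEM_PFN | GFN_SHARED)
#define GFN_PFN_MASK	(~GFN_FLAG_MASK)

/* Encode one entry: only a device-backed secure GFN stores its PFN. */
static unsigned long mark_gfn(unsigned long flag, unsigned long uvmem_pfn)
{
	if (flag == GFN_UVMEM_PFN)
		return uvmem_pfn | flag;
	return flag;	/* mem-pfn, shared or normal: flag bits only */
}

/* Decode: report whether the entry is device-backed, extract the PFN. */
static bool is_uvmem_pfn(unsigned long entry, unsigned long *uvmem_pfn)
{
	if (!(entry & GFN_UVMEM_PFN))
		return false;
	if (uvmem_pfn)
		*uvmem_pfn = entry & GFN_PFN_MASK;
	return true;
}
```

 A normal GFN is simply a zero entry (mark_gfn(0, 0)), which is why
 kvmppc_gfn_remove() marks a GFN with a zero flag.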

 6. Life cycle of a GFN
    --------------------
 --------------------------------------------------------------
 |        |     Share  |  Unshare | SVM       |H_SVM_INIT_DONE|
 |        |operation   |operation | abort/    |               |
 |        |            |          | terminate |               |
 -------------------------------------------------------------
 |        |            |          |           |               |
 | Secure |     Shared | Secure   |Normal     |Secure         |
 |        |            |          |           |               |
 | Shared |     Shared | Secure   |Normal     |Shared         |
 |        |            |          |           |               |
 | Normal |     Shared | Secure   |Normal     |Secure         |
 --------------------------------------------------------------
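
 For illustration, the table above can be read as a transition
 function. A sketch under assumed enum and function names (none of
 them appear in the patch):

```c
#include <assert.h>

/* Illustrative model of the GFN life-cycle table above. */
enum gfn_state { GFN_NORMAL, GFN_SHARED_ST, GFN_SECURE_ST };
enum gfn_event { EV_SHARE, EV_UNSHARE, EV_ABORT_OR_TERMINATE, EV_INIT_DONE };

static enum gfn_state gfn_next_state(enum gfn_state cur, enum gfn_event ev)
{
	switch (ev) {
	case EV_SHARE:
		return GFN_SHARED_ST;	/* any state -> Shared */
	case EV_UNSHARE:
		return GFN_SECURE_ST;	/* any state -> Secure */
	case EV_ABORT_OR_TERMINATE:
		return GFN_NORMAL;	/* any state -> Normal */
	case EV_INIT_DONE:
		/* Shared stays Shared; Normal and Secure end up Secure */
		return cur == GFN_SHARED_ST ? GFN_SHARED_ST : GFN_SECURE_ST;
	}
	return cur;
}
```

 The only event whose outcome depends on the current state is
 H_SVM_INIT_DONE, which leaves Shared GFNs alone and secures
 everything else.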

 7. Life cycle of a VM
    --------------------
 --------------------------------------------------------------------
 |         |  start    |  H_SVM_  |H_SVM_   |H_SVM_     |UV_SVM_    |
 |         |  VM       |INIT_START|INIT_DONE|INIT_ABORT |TERMINATE  |
 |         |           |          |         |           |           |
 --------- ----------------------------------------------------------
 |         |           |          |         |           |           |
 | Normal  | Normal    | Transient|Error    |Error      |Normal     |
 |         |           |          |         |           |           |
 | Secure  |   Error   | Error    |Error    |Error      |Normal     |
 |         |           |          |         |           |           |
 |Transient|   N/A     | Error    |Secure   |Normal     |Normal     |
 --------------------------------------------------------------------

************************************************************************

Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Claudio Carvalho <cclaudio@linux.ibm.com>
Cc: kvm-ppc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 187 +++++++++++++++++++++++++++++++++----
 1 file changed, 168 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 0baa293..df2e272 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -98,7 +98,127 @@
 static unsigned long *kvmppc_uvmem_bitmap;
 static DEFINE_SPINLOCK(kvmppc_uvmem_bitmap_lock);
 
-#define KVMPPC_UVMEM_PFN	(1UL << 63)
+/*
+ * States of a GFN
+ * ---------------
+ * The GFN can be in one of the following states.
+ *
+ * (a) Secure - The GFN is secure. The GFN is associated with
+ *	a Secure VM; the contents of the GFN are not accessible
+ *	to the Hypervisor.  This GFN can be backed by a secure-PFN,
+ *	or by a normal-PFN with its contents encrypted.
+ *	The former is true when the GFN is paged in to the
+ *	ultravisor; the latter when the GFN is paged out
+ *	of the ultravisor.
+ *
+ * (b) Shared - The GFN is shared. The GFN is associated with
+ *	a secure VM. The contents of the GFN are accessible to
+ *	the Hypervisor. This GFN is backed by a normal-PFN and its
+ *	contents are un-encrypted.
+ *
+ * (c) Normal - The GFN is normal. The GFN is associated with
+ *	a normal VM. The contents of the GFN are accessible to
+ *	the Hypervisor. Its contents are never encrypted.
+ *
+ * States of a VM.
+ * ---------------
+ *
+ * Normal VM:  A VM whose contents are always accessible to
+ *	the hypervisor.  All its GFNs are normal-GFNs.
+ *
+ * Secure VM: A VM whose contents are not accessible to the
+ *	hypervisor without the VM's consent.  Its GFNs are
+ *	either Shared-GFN or Secure-GFNs.
+ *
+ * Transient VM: A Normal VM that is transitioning to a secure VM.
+ *	The transition starts on successful return of
+ *	H_SVM_INIT_START, and ends on successful return
+ *	of H_SVM_INIT_DONE. This transient VM can have GFNs
+ *	in any of the three states, i.e. Secure-GFN, Shared-GFN
+ *	and Normal-GFN. The VM never executes in this state
+ *	in supervisor-mode.
+ *
+ * Memory slot State.
+ * -----------------------------
+ *	The state of a memory slot mirrors the state of the
+ *	VM the memory slot is associated with.
+ *
+ * VM State transition.
+ * --------------------
+ *
+ *  A VM always starts in Normal Mode.
+ *
+ *  H_SVM_INIT_START moves the VM into transient state. During this
+ *  time the Ultravisor may request some of its GFNs to be shared or
+ *  secured. So its GFNs can be in one of the three GFN states.
+ *
+ *  H_SVM_INIT_DONE moves the VM entirely from transient state to
+ *  secure-state. At this point any left-over normal-GFNs are
+ *  transitioned to Secure-GFN.
+ *
+ *  H_SVM_INIT_ABORT moves the transient VM back to a normal VM.
+ *  All its GFNs are moved to Normal-GFNs.
+ *
+ *  UV_TERMINATE transitions the secure-VM back to a normal-VM. All
+ *  the secure-GFNs and shared-GFNs are transitioned to normal-GFNs.
+ *  Note: The contents of the normal-GFNs are undefined at this point.
+ *
+ * GFN state implementation:
+ * -------------------------
+ *
+ * A Secure GFN is associated with a secure-PFN, also called uvmem_pfn,
+ * when the GFN is paged in. Its pfns[] entry has the KVMPPC_GFN_UVMEM_PFN
+ * flag set, and contains the value of the secure-PFN.
+ * It is associated with a normal-PFN, also called mem_pfn, when
+ * the GFN is paged out. Its pfns[] entry has the KVMPPC_GFN_MEM_PFN flag
+ * set. The value of the normal-PFN is not tracked.
+ *
+ * A Shared GFN is associated with a normal-PFN. Its pfns[] entry has
+ * the KVMPPC_GFN_SHARED flag set. The value of the normal-PFN
+ * is not tracked.
+ *
+ * A Normal GFN is associated with a normal-PFN. Its pfns[] entry has
+ * no flag set. The value of the normal-PFN is not tracked.
+ *
+ * Life cycle of a GFN
+ * --------------------
+ *
+ * --------------------------------------------------------------
+ * |        |     Share  |  Unshare | SVM       |H_SVM_INIT_DONE|
+ * |        |operation   |operation | abort/    |               |
+ * |        |            |          | terminate |               |
+ * -------------------------------------------------------------
+ * |        |            |          |           |               |
+ * | Secure |     Shared | Secure   |Normal     |Secure         |
+ * |        |            |          |           |               |
+ * | Shared |     Shared | Secure   |Normal     |Shared         |
+ * |        |            |          |           |               |
+ * | Normal |     Shared | Secure   |Normal     |Secure         |
+ * --------------------------------------------------------------
+ *
+ * Life cycle of a VM
+ * --------------------
+ *
+ * --------------------------------------------------------------------
+ * |         |  start    |  H_SVM_  |H_SVM_   |H_SVM_     |UV_SVM_    |
+ * |         |  VM       |INIT_START|INIT_DONE|INIT_ABORT |TERMINATE  |
+ * |         |           |          |         |           |           |
+ * --------- ----------------------------------------------------------
+ * |         |           |          |         |           |           |
+ * | Normal  | Normal    | Transient|Error    |Error      |Normal     |
+ * |         |           |          |         |           |           |
+ * | Secure  |   Error   | Error    |Error    |Error      |Normal     |
+ * |         |           |          |         |           |           |
+ * |Transient|   N/A     | Error    |Secure   |Normal     |Normal     |
+ * --------------------------------------------------------------------
+ */
+
+#define KVMPPC_GFN_UVMEM_PFN	(1UL << 63)
+#define KVMPPC_GFN_MEM_PFN	(1UL << 62)
+#define KVMPPC_GFN_SHARED	(1UL << 61)
+#define KVMPPC_GFN_SECURE	(KVMPPC_GFN_UVMEM_PFN | KVMPPC_GFN_MEM_PFN)
+#define KVMPPC_GFN_FLAG_MASK	(KVMPPC_GFN_SECURE | KVMPPC_GFN_SHARED)
+#define KVMPPC_GFN_PFN_MASK	(~KVMPPC_GFN_FLAG_MASK)
 
 struct kvmppc_uvmem_slot {
 	struct list_head list;
@@ -106,11 +226,11 @@ struct kvmppc_uvmem_slot {
 	unsigned long base_pfn;
 	unsigned long *pfns;
 };
-
 struct kvmppc_uvmem_page_pvt {
 	struct kvm *kvm;
 	unsigned long gpa;
 	bool skip_page_out;
+	bool remove_gfn;
 };
 
 bool kvmppc_uvmem_available(void)
@@ -163,8 +283,8 @@ void kvmppc_uvmem_slot_free(struct kvm *kvm, const struct kvm_memory_slot *slot)
 	mutex_unlock(&kvm->arch.uvmem_lock);
 }
 
-static void kvmppc_uvmem_pfn_insert(unsigned long gfn, unsigned long uvmem_pfn,
-				    struct kvm *kvm)
+static void kvmppc_mark_gfn(unsigned long gfn, struct kvm *kvm,
+			unsigned long flag, unsigned long uvmem_pfn)
 {
 	struct kvmppc_uvmem_slot *p;
 
@@ -172,24 +292,41 @@ static void kvmppc_uvmem_pfn_insert(unsigned long gfn, unsigned long uvmem_pfn,
 		if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
 			unsigned long index = gfn - p->base_pfn;
 
-			p->pfns[index] = uvmem_pfn | KVMPPC_UVMEM_PFN;
+			if (flag == KVMPPC_GFN_UVMEM_PFN)
+				p->pfns[index] = uvmem_pfn | flag;
+			else
+				p->pfns[index] = flag;
 			return;
 		}
 	}
 }
 
-static void kvmppc_uvmem_pfn_remove(unsigned long gfn, struct kvm *kvm)
+/* mark the GFN as a secure-GFN associated with the @uvmem_pfn device-PFN. */
+static void kvmppc_gfn_secure_uvmem_pfn(unsigned long gfn,
+			unsigned long uvmem_pfn, struct kvm *kvm)
 {
-	struct kvmppc_uvmem_slot *p;
+	kvmppc_mark_gfn(gfn, kvm, KVMPPC_GFN_UVMEM_PFN, uvmem_pfn);
+}
 
-	list_for_each_entry(p, &kvm->arch.uvmem_pfns, list) {
-		if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
-			p->pfns[gfn - p->base_pfn] = 0;
-			return;
-		}
-	}
+/* mark the GFN as secure-GFN associated with a memory-PFN. */
+static void kvmppc_gfn_secure_mem_pfn(unsigned long gfn, struct kvm *kvm)
+{
+	kvmppc_mark_gfn(gfn, kvm, KVMPPC_GFN_MEM_PFN, 0);
+}
+
+/* mark the GFN as a shared GFN. */
+static void kvmppc_gfn_shared(unsigned long gfn, struct kvm *kvm)
+{
+	kvmppc_mark_gfn(gfn, kvm, KVMPPC_GFN_SHARED, 0);
+}
+
+/* mark the GFN as a non-existent GFN. */
+static void kvmppc_gfn_remove(unsigned long gfn, struct kvm *kvm)
+{
+	kvmppc_mark_gfn(gfn, kvm, 0, 0);
 }
 
+/* return true, if the GFN is a secure-GFN backed by a secure-PFN */
 static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, struct kvm *kvm,
 				    unsigned long *uvmem_pfn)
 {
@@ -199,10 +336,10 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, struct kvm *kvm,
 		if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
 			unsigned long index = gfn - p->base_pfn;
 
-			if (p->pfns[index] & KVMPPC_UVMEM_PFN) {
+			if (p->pfns[index] & KVMPPC_GFN_UVMEM_PFN) {
 				if (uvmem_pfn)
 					*uvmem_pfn = p->pfns[index] &
-						     ~KVMPPC_UVMEM_PFN;
+						     KVMPPC_GFN_PFN_MASK;
 				return true;
 			} else
 				return false;
@@ -353,6 +490,7 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 
 		mutex_lock(&kvm->arch.uvmem_lock);
 		if (!kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
+			kvmppc_gfn_remove(gfn, kvm);
 			mutex_unlock(&kvm->arch.uvmem_lock);
 			continue;
 		}
@@ -360,6 +498,7 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 		uvmem_page = pfn_to_page(uvmem_pfn);
 		pvt = uvmem_page->zone_device_data;
 		pvt->skip_page_out = skip_page_out;
+		pvt->remove_gfn = true;
 		mutex_unlock(&kvm->arch.uvmem_lock);
 
 		pfn = gfn_to_pfn(kvm, gfn);
@@ -429,7 +568,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm)
 		goto out_clear;
 
 	uvmem_pfn = bit + pfn_first;
-	kvmppc_uvmem_pfn_insert(gpa >> PAGE_SHIFT, uvmem_pfn, kvm);
+	kvmppc_gfn_secure_uvmem_pfn(gpa >> PAGE_SHIFT, uvmem_pfn, kvm);
 
 	pvt->gpa = gpa;
 	pvt->kvm = kvm;
@@ -524,6 +663,7 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
 		uvmem_page = pfn_to_page(uvmem_pfn);
 		pvt = uvmem_page->zone_device_data;
 		pvt->skip_page_out = true;
+		pvt->remove_gfn = false;
 	}
 
 retry:
@@ -537,12 +677,16 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
 		uvmem_page = pfn_to_page(uvmem_pfn);
 		pvt = uvmem_page->zone_device_data;
 		pvt->skip_page_out = true;
+		pvt->remove_gfn = false;
 		kvm_release_pfn_clean(pfn);
 		goto retry;
 	}
 
-	if (!uv_page_in(kvm->arch.lpid, pfn << page_shift, gpa, 0, page_shift))
+	if (!uv_page_in(kvm->arch.lpid, pfn << page_shift, gpa, 0,
+				page_shift)) {
+		kvmppc_gfn_shared(gfn, kvm);
 		ret = H_SUCCESS;
+	}
 	kvm_release_pfn_clean(pfn);
 	mutex_unlock(&kvm->arch.uvmem_lock);
 out:
@@ -598,6 +742,7 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
 
 	if (!kvmppc_svm_page_in(vma, start, end, gpa, kvm, page_shift))
 		ret = H_SUCCESS;
+
 out_unlock:
 	mutex_unlock(&kvm->arch.uvmem_lock);
 out:
@@ -706,7 +851,8 @@ static vm_fault_t kvmppc_uvmem_migrate_to_ram(struct vm_fault *vmf)
 /*
  * Release the device PFN back to the pool
  *
- * Gets called when secure page becomes a normal page during H_SVM_PAGE_OUT.
+ * Gets called when a secure GFN transitions from a secure-PFN
+ * to a normal-PFN during H_SVM_PAGE_OUT.
  * Gets called with kvm->arch.uvmem_lock held.
  */
 static void kvmppc_uvmem_page_free(struct page *page)
@@ -721,7 +867,10 @@ static void kvmppc_uvmem_page_free(struct page *page)
 
 	pvt = page->zone_device_data;
 	page->zone_device_data = NULL;
-	kvmppc_uvmem_pfn_remove(pvt->gpa >> PAGE_SHIFT, pvt->kvm);
+	if (pvt->remove_gfn)
+		kvmppc_gfn_remove(pvt->gpa >> PAGE_SHIFT, pvt->kvm);
+	else
+		kvmppc_gfn_secure_mem_pfn(pvt->gpa >> PAGE_SHIFT, pvt->kvm);
 	kfree(pvt);
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [v4 3/5] KVM: PPC: Book3S HV: in H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs.
  2020-07-17  8:00 ` Ram Pai
@ 2020-07-17  8:00   ` Ram Pai
  -1 siblings, 0 replies; 28+ messages in thread
From: Ram Pai @ 2020-07-17  8:00 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev
  Cc: ldufour, linuxram, cclaudio, bharata, sathnaga, aneesh.kumar,
	sukadev, bauerman, david

The Ultravisor is expected to explicitly call H_SVM_PAGE_IN for all the
pages of the SVM before calling H_SVM_INIT_DONE. This causes a huge
delay in transitioning the VM to an SVM. The Ultravisor is only
interested in the pages that contain the kernel, initrd and other
important data structures. The rest contain throw-away content.

However, if not all pages are requested by the Ultravisor, the
Hypervisor continues to treat the GFNs corresponding to the
non-requested pages as normal GFNs. This can lead to data corruption
and undefined behavior.

In the H_SVM_INIT_DONE handler, move all the PFNs associated with the
SVM's GFNs to secure-PFNs. Skip the GFNs that are already Paged-in,
Shared, or Paged-in followed by a Paged-out.
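
The skip logic amounts to scanning a slot's pfns[] for entries with no
state flag set. A simplified user-space sketch of what
kvmppc_next_nontransitioned_gfn() does (struct and names abbreviated,
locking omitted):

```c
#include <assert.h>
#include <stdbool.h>

#define GFN_FLAG_MASK	(7UL << 61)	/* secure + shared flag bits */

/* Hypothetical stand-in for one kvmppc_uvmem_slot: a base GFN plus a
 * pfns[] array whose flag bits record each GFN's state. */
struct uvmem_slot {
	unsigned long base_pfn;
	unsigned long nr_pfns;
	unsigned long *pfns;
};

/* Starting at *gfn, find the next GFN with no state flag set, i.e.
 * one that has not yet transitioned to a secure or shared GFN. */
static bool next_nontransitioned_gfn(const struct uvmem_slot *p,
				     unsigned long *gfn)
{
	unsigned long i;

	for (i = *gfn; i < p->base_pfn + p->nr_pfns; i++) {
		if (!(p->pfns[i - p->base_pfn] & GFN_FLAG_MASK)) {
			*gfn = i;
			return true;
		}
	}
	return false;
}
```

Each GFN the scan returns is then migrated with
kvmppc_svm_migrate_page(..., pagein=false), since its contents are
throw-away and need not be copied in via UV_PAGE_IN.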

Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Claudio Carvalho <cclaudio@linux.ibm.com>
Cc: kvm-ppc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
 Documentation/powerpc/ultravisor.rst        |   2 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |   2 +
 arch/powerpc/kvm/book3s_hv_uvmem.c          | 135 +++++++++++++++++++++++++---
 3 files changed, 125 insertions(+), 14 deletions(-)

diff --git a/Documentation/powerpc/ultravisor.rst b/Documentation/powerpc/ultravisor.rst
index a1c8c37..ba6b1bf 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -934,6 +934,8 @@ Return values
 	* H_UNSUPPORTED		if called from the wrong context (e.g.
 				from an SVM or before an H_SVM_INIT_START
 				hypercall).
+	* H_STATE		if the hypervisor could not successfully
+				transition the VM to a Secure VM.
 
 Description
 ~~~~~~~~~~~
diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index 9cb7d8b..f229ab5 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -23,6 +23,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
 unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm);
 void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 			     struct kvm *kvm, bool skip_page_out);
+int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
+			const struct kvm_memory_slot *memslot);
 #else
 static inline int kvmppc_uvmem_init(void)
 {
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index df2e272..3274663 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -93,6 +93,7 @@
 #include <asm/ultravisor.h>
 #include <asm/mman.h>
 #include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s_uvmem.h>
 
 static struct dev_pagemap kvmppc_uvmem_pgmap;
 static unsigned long *kvmppc_uvmem_bitmap;
@@ -348,8 +349,45 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, struct kvm *kvm,
 	return false;
 }
 
+/*
+ * Starting from *gfn, search for the next available GFN that has not yet
+ * been transitioned to a secure GFN.  Return the value of that GFN in *gfn.
+ * If such a GFN is found, return true, else return false.
+ */
+static bool kvmppc_next_nontransitioned_gfn(const struct kvm_memory_slot *memslot,
+		struct kvm *kvm, unsigned long *gfn)
+{
+	struct kvmppc_uvmem_slot *p;
+	bool ret = false;
+	unsigned long i;
+
+	mutex_lock(&kvm->arch.uvmem_lock);
+
+	list_for_each_entry(p, &kvm->arch.uvmem_pfns, list)
+		if (*gfn >= p->base_pfn && *gfn < p->base_pfn + p->nr_pfns)
+			break;
+	if (!p)
+		goto out;
+	/*
+	 * The code below assumes a one-to-one correspondence between
+	 * kvmppc_uvmem_slot and memslot.
+	 */
+	for (i = *gfn; i < p->base_pfn + p->nr_pfns; i++) {
+		unsigned long index = i - p->base_pfn;
+
+		if (!(p->pfns[index] & KVMPPC_GFN_FLAG_MASK)) {
+			*gfn = i;
+			ret = true;
+			break;
+		}
+	}
+out:
+	mutex_unlock(&kvm->arch.uvmem_lock);
+	return ret;
+}
+
 static int kvmppc_memslot_page_merge(struct kvm *kvm,
-		struct kvm_memory_slot *memslot, bool merge)
+		const struct kvm_memory_slot *memslot, bool merge)
 {
 	unsigned long gfn = memslot->base_gfn;
 	unsigned long end, start = gfn_to_hva(kvm, gfn);
@@ -461,12 +499,31 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 
 unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
 {
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *memslot;
+	int srcu_idx;
+	long ret = H_SUCCESS;
+
 	if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
 		return H_UNSUPPORTED;
 
+	/* migrate any unmoved normal PFNs to device PFNs */
+	srcu_idx = srcu_read_lock(&kvm->srcu);
+	slots = kvm_memslots(kvm);
+	kvm_for_each_memslot(memslot, slots) {
+		ret = kvmppc_uv_migrate_mem_slot(kvm, memslot);
+		if (ret) {
+			ret = H_STATE;
+			goto out;
+		}
+	}
+
 	kvm->arch.secure_guest |= KVMPPC_SECURE_INIT_DONE;
 	pr_info("LPID %d went secure\n", kvm->arch.lpid);
-	return H_SUCCESS;
+
+out:
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+	return ret;
 }
 
 /*
@@ -587,12 +644,14 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm)
 }
 
 /*
- * Alloc a PFN from private device memory pool and copy page from normal
- * memory to secure memory using UV_PAGE_IN uvcall.
+ * Alloc a PFN from private device memory pool. If @pagein is true,
+ * copy page from normal memory to secure memory using UV_PAGE_IN uvcall.
  */
-static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
-		   unsigned long end, unsigned long gpa, struct kvm *kvm,
-		   unsigned long page_shift)
+static int kvmppc_svm_migrate_page(struct vm_area_struct *vma,
+		unsigned long start,
+		unsigned long end, unsigned long gpa, struct kvm *kvm,
+		unsigned long page_shift,
+		bool pagein)
 {
 	unsigned long src_pfn, dst_pfn = 0;
 	struct migrate_vma mig;
@@ -623,11 +682,16 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
 		goto out_finalize;
 	}
 
-	pfn = *mig.src >> MIGRATE_PFN_SHIFT;
-	spage = migrate_pfn_to_page(*mig.src);
-	if (spage)
-		uv_page_in(kvm->arch.lpid, pfn << page_shift, gpa, 0,
-			   page_shift);
+	if (pagein) {
+		pfn = *mig.src >> MIGRATE_PFN_SHIFT;
+		spage = migrate_pfn_to_page(*mig.src);
+		if (spage) {
+			ret = uv_page_in(kvm->arch.lpid, pfn << page_shift,
+					gpa, 0, page_shift);
+			if (ret)
+				goto out_finalize;
+		}
+	}
 
 	*mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
 	migrate_vma_pages(&mig);
@@ -636,6 +700,46 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
 	return ret;
 }
 
+int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
+		const struct kvm_memory_slot *memslot)
+{
+	unsigned long gfn = memslot->base_gfn;
+	struct vm_area_struct *vma;
+	unsigned long start, end;
+	int ret = 0;
+
+	while (kvmppc_next_nontransitioned_gfn(memslot, kvm, &gfn)) {
+
+		mmap_read_lock(kvm->mm);
+		start = gfn_to_hva(kvm, gfn);
+		if (kvm_is_error_hva(start)) {
+			ret = H_STATE;
+			goto next;
+		}
+
+		end = start + (1UL << PAGE_SHIFT);
+		vma = find_vma_intersection(kvm->mm, start, end);
+		if (!vma || vma->vm_start > start || vma->vm_end < end) {
+			ret = H_STATE;
+			goto next;
+		}
+
+		mutex_lock(&kvm->arch.uvmem_lock);
+		ret = kvmppc_svm_migrate_page(vma, start, end,
+				(gfn << PAGE_SHIFT), kvm, PAGE_SHIFT, false);
+		mutex_unlock(&kvm->arch.uvmem_lock);
+		if (ret)
+			ret = H_STATE;
+
+next:
+		mmap_read_unlock(kvm->mm);
+
+		if (ret)
+			break;
+	}
+	return ret;
+}
+
 /*
  * Shares the page with HV, thus making it a normal page.
  *
@@ -740,8 +844,11 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
 	if (!vma || vma->vm_start > start || vma->vm_end < end)
 		goto out_unlock;
 
-	if (!kvmppc_svm_page_in(vma, start, end, gpa, kvm, page_shift))
-		ret = H_SUCCESS;
+	if (kvmppc_svm_migrate_page(vma, start, end, gpa, kvm, page_shift,
+				true))
+		goto out_unlock;
+
+	ret = H_SUCCESS;
 
 out_unlock:
 	mutex_unlock(&kvm->arch.uvmem_lock);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [v4 3/5] KVM: PPC: Book3S HV: in H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs.
@ 2020-07-17  8:00   ` Ram Pai
  0 siblings, 0 replies; 28+ messages in thread
From: Ram Pai @ 2020-07-17  8:00 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev
  Cc: ldufour, linuxram, cclaudio, bharata, sathnaga, aneesh.kumar,
	sukadev, bauerman, david

The Ultravisor is expected to explicitly call H_SVM_PAGE_IN for all the
pages of the SVM before calling H_SVM_INIT_DONE. This causes a huge
delay in transitioning the VM to an SVM. The Ultravisor is only
interested in the pages that contain the kernel, initrd and other
important data structures; the rest contain throw-away content.

However, if not all pages are requested by the Ultravisor, the
Hypervisor continues to consider the GFNs corresponding to the
non-requested pages as normal GFNs. This can lead to data corruption
and undefined behavior.

In the H_SVM_INIT_DONE handler, move all the PFNs associated with the
SVM's GFNs to secure-PFNs. Skip the GFNs that are already paged-in,
shared, or paged-in and subsequently paged-out.

Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Claudio Carvalho <cclaudio@linux.ibm.com>
Cc: kvm-ppc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
 Documentation/powerpc/ultravisor.rst        |   2 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |   2 +
 arch/powerpc/kvm/book3s_hv_uvmem.c          | 135 +++++++++++++++++++++++++---
 3 files changed, 125 insertions(+), 14 deletions(-)

diff --git a/Documentation/powerpc/ultravisor.rst b/Documentation/powerpc/ultravisor.rst
index a1c8c37..ba6b1bf 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -934,6 +934,8 @@ Return values
 	* H_UNSUPPORTED		if called from the wrong context (e.g.
 				from an SVM or before an H_SVM_INIT_START
 				hypercall).
+	* H_STATE		if the hypervisor could not successfully
+                                transition the VM to Secure VM.
 
 Description
 ~~~~~~~~~~~
diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index 9cb7d8b..f229ab5 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -23,6 +23,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
 unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm);
 void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 			     struct kvm *kvm, bool skip_page_out);
+int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
+			const struct kvm_memory_slot *memslot);
 #else
 static inline int kvmppc_uvmem_init(void)
 {
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index df2e272..3274663 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -93,6 +93,7 @@
 #include <asm/ultravisor.h>
 #include <asm/mman.h>
 #include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s_uvmem.h>
 
 static struct dev_pagemap kvmppc_uvmem_pgmap;
 static unsigned long *kvmppc_uvmem_bitmap;
@@ -348,8 +349,45 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, struct kvm *kvm,
 	return false;
 }
 
+/*
+ * starting from *gfn search for the next available GFN that is not yet
+ * transitioned to a secure GFN.  return the value of that GFN in *gfn.  If a
+ * GFN is found, return true, else return false
+ */
+static bool kvmppc_next_nontransitioned_gfn(const struct kvm_memory_slot *memslot,
+		struct kvm *kvm, unsigned long *gfn)
+{
+	struct kvmppc_uvmem_slot *p;
+	bool ret = false;
+	unsigned long i;
+
+	mutex_lock(&kvm->arch.uvmem_lock);
+
+	list_for_each_entry(p, &kvm->arch.uvmem_pfns, list)
+		if (*gfn >= p->base_pfn && *gfn < p->base_pfn + p->nr_pfns)
+			break;
+	if (!p)
+		goto out;
+	/*
+	 * The code below assumes, one to one correspondence between
+	 * kvmppc_uvmem_slot and memslot.
+	 */
+	for (i = *gfn; i < p->base_pfn + p->nr_pfns; i++) {
+		unsigned long index = i - p->base_pfn;
+
+		if (!(p->pfns[index] & KVMPPC_GFN_FLAG_MASK)) {
+			*gfn = i;
+			ret = true;
+			break;
+		}
+	}
+out:
+	mutex_unlock(&kvm->arch.uvmem_lock);
+	return ret;
+}
+
 static int kvmppc_memslot_page_merge(struct kvm *kvm,
-		struct kvm_memory_slot *memslot, bool merge)
+		const struct kvm_memory_slot *memslot, bool merge)
 {
 	unsigned long gfn = memslot->base_gfn;
 	unsigned long end, start = gfn_to_hva(kvm, gfn);
@@ -461,12 +499,31 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 
 unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
 {
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *memslot;
+	int srcu_idx;
+	long ret = H_SUCCESS;
+
 	if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
 		return H_UNSUPPORTED;
 
+	/* migrate any unmoved normal pfn to device pfns*/
+	srcu_idx = srcu_read_lock(&kvm->srcu);
+	slots = kvm_memslots(kvm);
+	kvm_for_each_memslot(memslot, slots) {
+		ret = kvmppc_uv_migrate_mem_slot(kvm, memslot);
+		if (ret) {
+			ret = H_STATE;
+			goto out;
+		}
+	}
+
 	kvm->arch.secure_guest |= KVMPPC_SECURE_INIT_DONE;
 	pr_info("LPID %d went secure\n", kvm->arch.lpid);
-	return H_SUCCESS;
+
+out:
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+	return ret;
 }
 
 /*
@@ -587,12 +644,14 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm)
 }
 
 /*
- * Alloc a PFN from private device memory pool and copy page from normal
- * memory to secure memory using UV_PAGE_IN uvcall.
+ * Alloc a PFN from private device memory pool. If @pagein is true,
+ * copy page from normal memory to secure memory using UV_PAGE_IN uvcall.
  */
-static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
-		   unsigned long end, unsigned long gpa, struct kvm *kvm,
-		   unsigned long page_shift)
+static int kvmppc_svm_migrate_page(struct vm_area_struct *vma,
+		unsigned long start,
+		unsigned long end, unsigned long gpa, struct kvm *kvm,
+		unsigned long page_shift,
+		bool pagein)
 {
 	unsigned long src_pfn, dst_pfn = 0;
 	struct migrate_vma mig;
@@ -623,11 +682,16 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
 		goto out_finalize;
 	}
 
-	pfn = *mig.src >> MIGRATE_PFN_SHIFT;
-	spage = migrate_pfn_to_page(*mig.src);
-	if (spage)
-		uv_page_in(kvm->arch.lpid, pfn << page_shift, gpa, 0,
-			   page_shift);
+	if (pagein) {
+		pfn = *mig.src >> MIGRATE_PFN_SHIFT;
+		spage = migrate_pfn_to_page(*mig.src);
+		if (spage) {
+			ret = uv_page_in(kvm->arch.lpid, pfn << page_shift,
+					gpa, 0, page_shift);
+			if (ret)
+				goto out_finalize;
+		}
+	}
 
 	*mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
 	migrate_vma_pages(&mig);
@@ -636,6 +700,46 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
 	return ret;
 }
 
+int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
+		const struct kvm_memory_slot *memslot)
+{
+	unsigned long gfn = memslot->base_gfn;
+	struct vm_area_struct *vma;
+	unsigned long start, end;
+	int ret = 0;
+
+	while (kvmppc_next_nontransitioned_gfn(memslot, kvm, &gfn)) {
+
+		mmap_read_lock(kvm->mm);
+		start = gfn_to_hva(kvm, gfn);
+		if (kvm_is_error_hva(start)) {
+			ret = H_STATE;
+			goto next;
+		}
+
+		end = start + (1UL << PAGE_SHIFT);
+		vma = find_vma_intersection(kvm->mm, start, end);
+		if (!vma || vma->vm_start > start || vma->vm_end < end) {
+			ret = H_STATE;
+			goto next;
+		}
+
+		mutex_lock(&kvm->arch.uvmem_lock);
+		ret = kvmppc_svm_migrate_page(vma, start, end,
+				(gfn << PAGE_SHIFT), kvm, PAGE_SHIFT, false);
+		mutex_unlock(&kvm->arch.uvmem_lock);
+		if (ret)
+			ret = H_STATE;
+
+next:
+		mmap_read_unlock(kvm->mm);
+
+		if (ret)
+			break;
+	}
+	return ret;
+}
+
 /*
  * Shares the page with HV, thus making it a normal page.
  *
@@ -740,8 +844,11 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
 	if (!vma || vma->vm_start > start || vma->vm_end < end)
 		goto out_unlock;
 
-	if (!kvmppc_svm_page_in(vma, start, end, gpa, kvm, page_shift))
-		ret = H_SUCCESS;
+	if (kvmppc_svm_migrate_page(vma, start, end, gpa, kvm, page_shift,
+				true))
+		goto out_unlock;
+
+	ret = H_SUCCESS;
 
 out_unlock:
 	mutex_unlock(&kvm->arch.uvmem_lock);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [v4 4/5] KVM: PPC: Book3S HV: retry page migration before erroring-out
  2020-07-17  8:00 ` Ram Pai
@ 2020-07-17  8:00   ` Ram Pai
  -1 siblings, 0 replies; 28+ messages in thread
From: Ram Pai @ 2020-07-17  8:00 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev
  Cc: ldufour, linuxram, cclaudio, bharata, sathnaga, aneesh.kumar,
	sukadev, bauerman, david

A page requested for page-in can sometimes hold transient references,
and hence cannot be migrated immediately. Retry a few times before
returning an error.

The same is true for the non-migrated pages that are migrated in the
H_SVM_INIT_DONE handler. Retry a few times there as well before
returning an error.

The H_SVM_PAGE_IN interface is enhanced to return H_BUSY if the page
is not in a migratable state.

Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Claudio Carvalho <cclaudio@linux.ibm.com>
Cc: kvm-ppc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
 Documentation/powerpc/ultravisor.rst |   1 +
 arch/powerpc/kvm/book3s_hv_uvmem.c   | 106 ++++++++++++++++++++++++-----------
 2 files changed, 74 insertions(+), 33 deletions(-)

diff --git a/Documentation/powerpc/ultravisor.rst b/Documentation/powerpc/ultravisor.rst
index ba6b1bf..fe533ad 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -1035,6 +1035,7 @@ Return values
 	* H_PARAMETER	if ``guest_pa`` is invalid.
 	* H_P2		if ``flags`` is invalid.
 	* H_P3		if ``order`` of page is invalid.
+	* H_BUSY	if ``page`` is not in a state to pagein
 
 Description
 ~~~~~~~~~~~
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 3274663..a206984 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -672,7 +672,7 @@ static int kvmppc_svm_migrate_page(struct vm_area_struct *vma,
 		return ret;
 
 	if (!(*mig.src & MIGRATE_PFN_MIGRATE)) {
-		ret = -1;
+		ret = -2;
 		goto out_finalize;
 	}
 
@@ -700,43 +700,73 @@ static int kvmppc_svm_migrate_page(struct vm_area_struct *vma,
 	return ret;
 }
 
-int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
-		const struct kvm_memory_slot *memslot)
+/*
+ * return 1, if some page migration failed because of transient error,
+ * while the remaining pages migrated successfully.
+ * The caller can use this as a hint to retry.
+ *
+ * return 0 otherwise. *ret indicates the success status
+ * of this call.
+ */
+static bool __kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
+		const struct kvm_memory_slot *memslot, int *ret)
 {
 	unsigned long gfn = memslot->base_gfn;
 	struct vm_area_struct *vma;
 	unsigned long start, end;
-	int ret = 0;
+	bool retry = false;
 
+	*ret = 0;
 	while (kvmppc_next_nontransitioned_gfn(memslot, kvm, &gfn)) {
 
 		mmap_read_lock(kvm->mm);
 		start = gfn_to_hva(kvm, gfn);
 		if (kvm_is_error_hva(start)) {
-			ret = H_STATE;
+			*ret = H_STATE;
 			goto next;
 		}
 
 		end = start + (1UL << PAGE_SHIFT);
 		vma = find_vma_intersection(kvm->mm, start, end);
 		if (!vma || vma->vm_start > start || vma->vm_end < end) {
-			ret = H_STATE;
+			*ret = H_STATE;
 			goto next;
 		}
 
 		mutex_lock(&kvm->arch.uvmem_lock);
-		ret = kvmppc_svm_migrate_page(vma, start, end,
+		*ret = kvmppc_svm_migrate_page(vma, start, end,
 				(gfn << PAGE_SHIFT), kvm, PAGE_SHIFT, false);
 		mutex_unlock(&kvm->arch.uvmem_lock);
-		if (ret)
-			ret = H_STATE;
 
 next:
 		mmap_read_unlock(kvm->mm);
+		if (*ret == -2) {
+			retry = true;
+			continue;
+		}
 
-		if (ret)
-			break;
+		if (*ret)
+			return false;
 	}
+	return retry;
+}
+
+#define REPEAT_COUNT 10
+
+int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
+		const struct kvm_memory_slot *memslot)
+{
+	int ret = 0, repeat_count = REPEAT_COUNT;
+
+	/*
+	 * try migration of pages in the memslot 'repeat_count' number of
+	 * times, provided each time migration fails because of transient
+	 * errors only.
+	 */
+	while (__kvmppc_uv_migrate_mem_slot(kvm, memslot, &ret) &&
+		repeat_count--)
+		;
+
 	return ret;
 }
 
@@ -812,7 +842,7 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
 	struct vm_area_struct *vma;
 	int srcu_idx;
 	unsigned long gfn = gpa >> page_shift;
-	int ret;
+	int ret, repeat_count = REPEAT_COUNT;
 
 	if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
 		return H_UNSUPPORTED;
@@ -826,34 +856,44 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
 	if (flags & H_PAGE_IN_SHARED)
 		return kvmppc_share_page(kvm, gpa, page_shift);
 
-	ret = H_PARAMETER;
 	srcu_idx = srcu_read_lock(&kvm->srcu);
-	mmap_read_lock(kvm->mm);
 
-	start = gfn_to_hva(kvm, gfn);
-	if (kvm_is_error_hva(start))
-		goto out;
-
-	mutex_lock(&kvm->arch.uvmem_lock);
 	/* Fail the page-in request of an already paged-in page */
-	if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL))
-		goto out_unlock;
+	mutex_lock(&kvm->arch.uvmem_lock);
+	ret = kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL);
+	mutex_unlock(&kvm->arch.uvmem_lock);
+	if (ret) {
+		srcu_read_unlock(&kvm->srcu, srcu_idx);
+		return H_PARAMETER;
+	}
 
-	end = start + (1UL << page_shift);
-	vma = find_vma_intersection(kvm->mm, start, end);
-	if (!vma || vma->vm_start > start || vma->vm_end < end)
-		goto out_unlock;
+	do {
+		ret = H_PARAMETER;
+		mmap_read_lock(kvm->mm);
 
-	if (kvmppc_svm_migrate_page(vma, start, end, gpa, kvm, page_shift,
-				true))
-		goto out_unlock;
+		start = gfn_to_hva(kvm, gfn);
+		if (kvm_is_error_hva(start)) {
+			mmap_read_unlock(kvm->mm);
+			break;
+		}
 
-	ret = H_SUCCESS;
+		end = start + (1UL << page_shift);
+		vma = find_vma_intersection(kvm->mm, start, end);
+		if (!vma || vma->vm_start > start || vma->vm_end < end) {
+			mmap_read_unlock(kvm->mm);
+			break;
+		}
+
+		mutex_lock(&kvm->arch.uvmem_lock);
+		ret = kvmppc_svm_migrate_page(vma, start, end, gpa, kvm, page_shift, true);
+		mutex_unlock(&kvm->arch.uvmem_lock);
+
+		mmap_read_unlock(kvm->mm);
+	} while (ret == -2 && repeat_count--);
+
+	if (ret == -2)
+		ret = H_BUSY;
 
-out_unlock:
-	mutex_unlock(&kvm->arch.uvmem_lock);
-out:
-	mmap_read_unlock(kvm->mm);
 	srcu_read_unlock(&kvm->srcu, srcu_idx);
 	return ret;
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [v4 5/5] KVM: PPC: Book3S HV: migrate hot plugged memory
  2020-07-17  8:00 ` Ram Pai
@ 2020-07-17  8:00   ` Ram Pai
  -1 siblings, 0 replies; 28+ messages in thread
From: Ram Pai @ 2020-07-17  8:00 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev
  Cc: ldufour, linuxram, cclaudio, bharata, sathnaga, aneesh.kumar,
	sukadev, bauerman, david

From: Laurent Dufour <ldufour@linux.ibm.com>

When a memory slot is hot plugged to an SVM, the PFNs associated with
the GFNs in that slot must be migrated to secure-PFNs, aka device-PFNs.

Call kvmppc_uv_migrate_mem_slot() to accomplish this. Also disable
page-merge for all pages in the memory slot.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
[rearranged the code, and modified the commit log]
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s_uvmem.h | 10 ++++++++++
 arch/powerpc/kvm/book3s_hv.c                | 10 ++--------
 arch/powerpc/kvm/book3s_hv_uvmem.c          | 22 ++++++++++++++++++++++
 3 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index f229ab5..6f7da00 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -25,6 +25,9 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 			     struct kvm *kvm, bool skip_page_out);
 int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
 			const struct kvm_memory_slot *memslot);
+void kvmppc_memslot_create(struct kvm *kvm, const struct kvm_memory_slot *new);
+void kvmppc_memslot_delete(struct kvm *kvm, const struct kvm_memory_slot *old);
+
 #else
 static inline int kvmppc_uvmem_init(void)
 {
@@ -84,5 +87,12 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
 static inline void
 kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 			struct kvm *kvm, bool skip_page_out) { }
+
+static inline void  kvmppc_memslot_create(struct kvm *kvm,
+		const struct kvm_memory_slot *new) { }
+
+static inline void  kvmppc_memslot_delete(struct kvm *kvm,
+		const struct kvm_memory_slot *old) { }
+
 #endif /* CONFIG_PPC_UV */
 #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d331b46..bf3be3b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4515,16 +4515,10 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
 
 	switch (change) {
 	case KVM_MR_CREATE:
-		if (kvmppc_uvmem_slot_init(kvm, new))
-			return;
-		uv_register_mem_slot(kvm->arch.lpid,
-				     new->base_gfn << PAGE_SHIFT,
-				     new->npages * PAGE_SIZE,
-				     0, new->id);
+		kvmppc_memslot_create(kvm, new);
 		break;
 	case KVM_MR_DELETE:
-		uv_unregister_mem_slot(kvm->arch.lpid, old->id);
-		kvmppc_uvmem_slot_free(kvm, old);
+		kvmppc_memslot_delete(kvm, old);
 		break;
 	default:
 		/* TODO: Handle KVM_MR_MOVE */
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index a206984..a2b4d25 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -1089,6 +1089,28 @@ int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
 	return (ret == U_SUCCESS) ? RESUME_GUEST : -EFAULT;
 }
 
+void kvmppc_memslot_create(struct kvm *kvm, const struct kvm_memory_slot *new)
+{
+	if (kvmppc_uvmem_slot_init(kvm, new))
+		return;
+
+	if (kvmppc_memslot_page_merge(kvm, new, false))
+		return;
+
+	if (uv_register_mem_slot(kvm->arch.lpid, new->base_gfn << PAGE_SHIFT,
+			new->npages * PAGE_SIZE, 0, new->id))
+		return;
+
+	kvmppc_uv_migrate_mem_slot(kvm, new);
+}
+
+void kvmppc_memslot_delete(struct kvm *kvm, const struct kvm_memory_slot *old)
+{
+	uv_unregister_mem_slot(kvm->arch.lpid, old->id);
+	kvmppc_memslot_page_merge(kvm, old, true);
+	kvmppc_uvmem_slot_free(kvm, old);
+}
+
 static u64 kvmppc_get_secmem_size(void)
 {
 	struct device_node *np;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread


* Re: [v4 1/5] KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START
  2020-07-17  8:00   ` Ram Pai
@ 2020-07-22  8:52     ` Bharata B Rao
  -1 siblings, 0 replies; 28+ messages in thread
From: Bharata B Rao @ 2020-07-22  8:52 UTC (permalink / raw)
  To: Ram Pai
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david

On Fri, Jul 17, 2020 at 01:00:23AM -0700, Ram Pai wrote:
> Page-merging of pages in memory-slots associated with a Secure VM,
> is disabled in H_SVM_PAGE_IN handler.
> 
> This operation should have been done much earlier; the moment the VM
> is initiated for secure-transition. Delaying this operation increases
> the probability for those pages to acquire new references, making it
> impossible to migrate those pages.
> 
> Disable page-migration in H_SVM_INIT_START handling.
> 
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>

Reviewed-by: Bharata B Rao <bharata@linux.ibm.com>

with a few observations below...

> ---
>  Documentation/powerpc/ultravisor.rst |  1 +
>  arch/powerpc/kvm/book3s_hv_uvmem.c   | 98 +++++++++++++++++++++++++++---------
>  2 files changed, 76 insertions(+), 23 deletions(-)
> 
> diff --git a/Documentation/powerpc/ultravisor.rst b/Documentation/powerpc/ultravisor.rst
> index df136c8..a1c8c37 100644
> --- a/Documentation/powerpc/ultravisor.rst
> +++ b/Documentation/powerpc/ultravisor.rst
> @@ -895,6 +895,7 @@ Return values
>      One of the following values:
>  
>  	* H_SUCCESS	 on success.
> +        * H_STATE        if the VM is not in a position to switch to secure.
>  
>  Description
>  ~~~~~~~~~~~
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index e6f76bc..0baa293 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -211,6 +211,65 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, struct kvm *kvm,
>  	return false;
>  }
>  
> +static int kvmppc_memslot_page_merge(struct kvm *kvm,
> +		struct kvm_memory_slot *memslot, bool merge)
> +{
> +	unsigned long gfn = memslot->base_gfn;
> +	unsigned long end, start = gfn_to_hva(kvm, gfn);
> +	int ret = 0;
> +	struct vm_area_struct *vma;
> +	int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
> +
> +	if (kvm_is_error_hva(start))
> +		return H_STATE;
> +
> +	end = start + (memslot->npages << PAGE_SHIFT);
> +
> +	mmap_write_lock(kvm->mm);
> +	do {
> +		vma = find_vma_intersection(kvm->mm, start, end);
> +		if (!vma) {
> +			ret = H_STATE;
> +			break;
> +		}
> +		ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> +			  merge_flag, &vma->vm_flags);
> +		if (ret) {
> +			ret = H_STATE;
> +			break;
> +		}
> +		start = vma->vm_end + 1;

This should be start = vma->vm_end I believe.

> +	} while (end > vma->vm_end);
> +
> +	mmap_write_unlock(kvm->mm);
> +	return ret;
> +}
> +
> +static int __kvmppc_page_merge(struct kvm *kvm, bool merge)
> +{
> +	struct kvm_memslots *slots;
> +	struct kvm_memory_slot *memslot;
> +	int ret = 0;
> +
> +	slots = kvm_memslots(kvm);
> +	kvm_for_each_memslot(memslot, slots) {
> +		ret = kvmppc_memslot_page_merge(kvm, memslot, merge);
> +		if (ret)
> +			break;
> +	}
> +	return ret;
> +}

You walk through all the slots here to issue kvm_madvise, but...

> +
> +static inline int kvmppc_disable_page_merge(struct kvm *kvm)
> +{
> +	return __kvmppc_page_merge(kvm, false);
> +}
> +
> +static inline int kvmppc_enable_page_merge(struct kvm *kvm)
> +{
> +	return __kvmppc_page_merge(kvm, true);
> +}
> +
>  unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>  {
>  	struct kvm_memslots *slots;
> @@ -232,11 +291,18 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>  		return H_AUTHORITY;
>  
>  	srcu_idx = srcu_read_lock(&kvm->srcu);
> +
> +	/* disable page-merging for all memslot */
> +	ret = kvmppc_disable_page_merge(kvm);
> +	if (ret)
> +		goto out;
> +
> +	/* register the memslot */
>  	slots = kvm_memslots(kvm);
>  	kvm_for_each_memslot(memslot, slots) {

... you are walking thro' the same set of slots here anyway. I think
it makes sense to issue merge advices from here itself. That will
help you to share code with kvmppc_memslot_create() in 5/5.

All the below 3 calls are common to both the code paths, I think
they can be carved out into a separate function if you prefer.

kvmppc_uvmem_slot_init
kvmppc_memslot_page_merge
uv_register_mem_slot

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [v4 1/5] KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START
@ 2020-07-22  8:52     ` Bharata B Rao
  0 siblings, 0 replies; 28+ messages in thread
From: Bharata B Rao @ 2020-07-22  8:52 UTC (permalink / raw)
  To: Ram Pai
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david

On Fri, Jul 17, 2020 at 01:00:23AM -0700, Ram Pai wrote:
> Page-merging of pages in memory-slots associated with a Secure VM,
> is disabled in H_SVM_PAGE_IN handler.
> 
> This operation should have been done much earlier, at the moment the VM
> is initiated for secure-transition. Delaying this operation increases
> the probability of those pages acquiring new references, making it
> impossible to migrate those pages.
> 
> Disable page-migration in H_SVM_INIT_START handling.
> 
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>

Reviewed-by: Bharata B Rao <bharata@linux.ibm.com>

with a few observations below...

> ---
>  Documentation/powerpc/ultravisor.rst |  1 +
>  arch/powerpc/kvm/book3s_hv_uvmem.c   | 98 +++++++++++++++++++++++++++---------
>  2 files changed, 76 insertions(+), 23 deletions(-)
> 
> diff --git a/Documentation/powerpc/ultravisor.rst b/Documentation/powerpc/ultravisor.rst
> index df136c8..a1c8c37 100644
> --- a/Documentation/powerpc/ultravisor.rst
> +++ b/Documentation/powerpc/ultravisor.rst
> @@ -895,6 +895,7 @@ Return values
>      One of the following values:
>  
>  	* H_SUCCESS	 on success.
> +        * H_STATE        if the VM is not in a position to switch to secure.
>  
>  Description
>  ~~~~~~~~~~~
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index e6f76bc..0baa293 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -211,6 +211,65 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, struct kvm *kvm,
>  	return false;
>  }
>  
> +static int kvmppc_memslot_page_merge(struct kvm *kvm,
> +		struct kvm_memory_slot *memslot, bool merge)
> +{
> +	unsigned long gfn = memslot->base_gfn;
> +	unsigned long end, start = gfn_to_hva(kvm, gfn);
> +	int ret = 0;
> +	struct vm_area_struct *vma;
> +	int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
> +
> +	if (kvm_is_error_hva(start))
> +		return H_STATE;
> +
> +	end = start + (memslot->npages << PAGE_SHIFT);
> +
> +	mmap_write_lock(kvm->mm);
> +	do {
> +		vma = find_vma_intersection(kvm->mm, start, end);
> +		if (!vma) {
> +			ret = H_STATE;
> +			break;
> +		}
> +		ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> +			  merge_flag, &vma->vm_flags);
> +		if (ret) {
> +			ret = H_STATE;
> +			break;
> +		}
> +		start = vma->vm_end + 1;

This should be start = vma->vm_end, I believe.

> +	} while (end > vma->vm_end);
> +
> +	mmap_write_unlock(kvm->mm);
> +	return ret;
> +}
> +
> +static int __kvmppc_page_merge(struct kvm *kvm, bool merge)
> +{
> +	struct kvm_memslots *slots;
> +	struct kvm_memory_slot *memslot;
> +	int ret = 0;
> +
> +	slots = kvm_memslots(kvm);
> +	kvm_for_each_memslot(memslot, slots) {
> +		ret = kvmppc_memslot_page_merge(kvm, memslot, merge);
> +		if (ret)
> +			break;
> +	}
> +	return ret;
> +}

You walk through all the slots here to issue ksm_madvise, but...

> +
> +static inline int kvmppc_disable_page_merge(struct kvm *kvm)
> +{
> +	return __kvmppc_page_merge(kvm, false);
> +}
> +
> +static inline int kvmppc_enable_page_merge(struct kvm *kvm)
> +{
> +	return __kvmppc_page_merge(kvm, true);
> +}
> +
>  unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>  {
>  	struct kvm_memslots *slots;
> @@ -232,11 +291,18 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>  		return H_AUTHORITY;
>  
>  	srcu_idx = srcu_read_lock(&kvm->srcu);
> +
> +	/* disable page-merging for all memslots */
> +	ret = kvmppc_disable_page_merge(kvm);
> +	if (ret)
> +		goto out;
> +
> +	/* register the memslot */
>  	slots = kvm_memslots(kvm);
>  	kvm_for_each_memslot(memslot, slots) {

... you are walking through the same set of slots here anyway. I think
it makes sense to issue the merge advice from here itself. That will
let you share code with kvmppc_memslot_create() in 5/5.

All three calls below are common to both code paths; I think
they can be carved out into a separate function if you prefer.

kvmppc_uvmem_slot_init
kvmppc_memslot_page_merge
uv_register_mem_slot

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [v4 5/5] KVM: PPC: Book3S HV: migrate hot plugged memory
  2020-07-17  8:00   ` Ram Pai
@ 2020-07-22 10:13     ` Bharata B Rao
  0 siblings, 0 replies; 28+ messages in thread
From: Bharata B Rao @ 2020-07-22 10:13 UTC (permalink / raw)
  To: Ram Pai
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david

On Fri, Jul 17, 2020 at 01:00:27AM -0700, Ram Pai wrote:
> From: Laurent Dufour <ldufour@linux.ibm.com>
> 
> When a memory slot is hot plugged to a SVM, PFNs associated with the
> GFNs in that slot must be migrated to secure-PFNs, aka device-PFNs.
> 
> Call kvmppc_uv_migrate_mem_slot() to accomplish this.
> Disable page-merge for all pages in the memory slot.
> 
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> [rearranged the code, and modified the commit log]
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/kvm_book3s_uvmem.h | 10 ++++++++++
>  arch/powerpc/kvm/book3s_hv.c                | 10 ++--------
>  arch/powerpc/kvm/book3s_hv_uvmem.c          | 22 ++++++++++++++++++++++
>  3 files changed, 34 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> index f229ab5..6f7da00 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> @@ -25,6 +25,9 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
>  			     struct kvm *kvm, bool skip_page_out);
>  int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
>  			const struct kvm_memory_slot *memslot);
> +void kvmppc_memslot_create(struct kvm *kvm, const struct kvm_memory_slot *new);
> +void kvmppc_memslot_delete(struct kvm *kvm, const struct kvm_memory_slot *old);

The names look a bit generic, but these functions are specific
to secure guests. Maybe rename them to kvmppc_uvmem_memslot_[create/delete]?

> +
>  #else
>  static inline int kvmppc_uvmem_init(void)
>  {
> @@ -84,5 +87,12 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
>  static inline void
>  kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
>  			struct kvm *kvm, bool skip_page_out) { }
> +
> +static inline void  kvmppc_memslot_create(struct kvm *kvm,
> +		const struct kvm_memory_slot *new) { }
> +
> +static inline void  kvmppc_memslot_delete(struct kvm *kvm,
> +		const struct kvm_memory_slot *old) { }
> +
>  #endif /* CONFIG_PPC_UV */
>  #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index d331b46..bf3be3b 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4515,16 +4515,10 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
>  
>  	switch (change) {
>  	case KVM_MR_CREATE:
> -		if (kvmppc_uvmem_slot_init(kvm, new))
> -			return;
> -		uv_register_mem_slot(kvm->arch.lpid,
> -				     new->base_gfn << PAGE_SHIFT,
> -				     new->npages * PAGE_SIZE,
> -				     0, new->id);
> +		kvmppc_memslot_create(kvm, new);
>  		break;
>  	case KVM_MR_DELETE:
> -		uv_unregister_mem_slot(kvm->arch.lpid, old->id);
> -		kvmppc_uvmem_slot_free(kvm, old);
> +		kvmppc_memslot_delete(kvm, old);
>  		break;
>  	default:
>  		/* TODO: Handle KVM_MR_MOVE */
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index a206984..a2b4d25 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -1089,6 +1089,28 @@ int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
>  	return (ret == U_SUCCESS) ? RESUME_GUEST : -EFAULT;
>  }
>  
> +void kvmppc_memslot_create(struct kvm *kvm, const struct kvm_memory_slot *new)
> +{
> +	if (kvmppc_uvmem_slot_init(kvm, new))
> +		return;
> +
> +	if (kvmppc_memslot_page_merge(kvm, new, false))
> +		return;
> +
> +	if (uv_register_mem_slot(kvm->arch.lpid, new->base_gfn << PAGE_SHIFT,
> +			new->npages * PAGE_SIZE, 0, new->id))
> +		return;
> +
> +	kvmppc_uv_migrate_mem_slot(kvm, new);

Quite a few things can return failure here, including
kvmppc_uv_migrate_mem_slot(), and we are ignoring all of those.
I am wondering if this should be called from prepare_memory_region callback
instead of commit_memory_region. In the prepare phase, we have a way
to back out in case of error. Can you check if moving this call to
prepare callback is feasible?

In the other case in 1/5, the code issues ksm unmerge request on error,
but not here.

Also check if the code for the first three calls can be shared with the
similar code in 1/5.

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [v4 2/5] KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
  2020-07-17  8:00   ` Ram Pai
@ 2020-07-23  4:49     ` Bharata B Rao
  0 siblings, 0 replies; 28+ messages in thread
From: Bharata B Rao @ 2020-07-23  4:49 UTC (permalink / raw)
  To: Ram Pai
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david

On Fri, Jul 17, 2020 at 01:00:24AM -0700, Ram Pai wrote:
> During the life of SVM, its GFNs transition through normal, secure and
> shared states. Since the kernel does not track GFNs that are shared, it
> is not possible to disambiguate a shared GFN from a GFN whose PFN has
> not yet been migrated to a secure-PFN. Also it is not possible to
> disambiguate a secure-GFN from a GFN whose PFN has been paged out from
> the ultravisor.
> 
> The ability to identify the state of a GFN is needed to skip migration
> of its PFN to secure-PFN during ESM transition.
> 
> The code is re-organized to track the states of a GFN as explained
> below.
> 
> ************************************************************************
>  1. States of a GFN
>     ---------------
>  The GFN can be in one of the following states.
> 
>  (a) Secure - The GFN is secure. The GFN is associated with
>  	a Secure VM, the contents of the GFN is not accessible
>  	to the Hypervisor.  This GFN can be backed by a secure-PFN,
>  	or can be backed by a normal-PFN with contents encrypted.
>  	The former is true when the GFN is paged-in into the
>  	ultravisor. The latter is true when the GFN is paged-out
>  	of the ultravisor.
> 
>  (b) Shared - The GFN is shared. The GFN is associated with a
>  	a secure VM. The contents of the GFN is accessible to
>  	Hypervisor. This GFN is backed by a normal-PFN and its
>  	content is un-encrypted.
> 
>  (c) Normal - The GFN is normal. The GFN is associated with
>  	a normal VM. The contents of the GFN is accessible to
>  	the Hypervisor. Its content is never encrypted.
> 
>  2. States of a VM.
>     ---------------
> 
>  (a) Normal VM:  A VM whose contents are always accessible to
>  	the hypervisor.  All its GFNs are normal-GFNs.
> 
>  (b) Secure VM: A VM whose contents are not accessible to the
>  	hypervisor without the VM's consent.  Its GFNs are
>  	either Shared-GFN or Secure-GFNs.
> 
>  (c) Transient VM: A Normal VM that is transitioning to secure VM.
>  	The transition starts on successful return of
>  	H_SVM_INIT_START, and ends on successful return
>  	of H_SVM_INIT_DONE. This transient VM, can have GFNs
>  	in any of the three states; i.e Secure-GFN, Shared-GFN,
>  	and Normal-GFN.	The VM never executes in this state
>  	in supervisor-mode.
> 
>  3. Memory slot State.
>     ------------------
>   	The state of a memory slot mirrors the state of the
>   	VM the memory slot is associated with.
> 
>  4. VM State transition.
>     --------------------
> 
>   A VM always starts in Normal Mode.
> 
>   H_SVM_INIT_START moves the VM into transient state. During this
>   time the Ultravisor may request some of its GFNs to be shared or
>   secured. So its GFNs can be in one of the three GFN states.
> 
>   H_SVM_INIT_DONE moves the VM entirely from transient state to
>   secure-state. At this point any left-over normal-GFNs are
>   transitioned to Secure-GFN.
> 
>   H_SVM_INIT_ABORT moves the transient VM back to normal VM.
>   All its GFNs are moved to Normal-GFNs.
> 
>   UV_TERMINATE transitions the secure-VM back to normal-VM. All
>   the secure-GFNs and shared-GFNs are transitioned to normal-GFNs.
>   Note: The contents of the normal-GFN is undefined at this point.
> 
>  5. GFN state implementation:
>     -------------------------
> 
>  Secure GFN is associated with a secure-PFN; also called uvmem_pfn,
>  when the GFN is paged-in. Its pfn[] has KVMPPC_GFN_UVMEM_PFN flag
>  set, and contains the value of the secure-PFN.
>  It is associated with a normal-PFN; also called mem_pfn, when
>  the GFN is paged out. Its pfn[] has KVMPPC_GFN_MEM_PFN flag set.
>  The value of the normal-PFN is not tracked.
> 
>  Shared GFN is associated with a normal-PFN. Its pfn[] has
>  KVMPPC_UVMEM_SHARED_PFN flag set. The value of the normal-PFN
>  is not tracked.
> 
>  Normal GFN is associated with normal-PFN. Its pfn[] has
>  no flag set. The value of the normal-PFN is not tracked.
> 
>  6. Life cycle of a GFN
>     --------------------
>  --------------------------------------------------------------
>  |        |     Share  |  Unshare | SVM       |H_SVM_INIT_DONE|
>  |        |operation   |operation | abort/    |               |
>  |        |            |          | terminate |               |
>  -------------------------------------------------------------
>  |        |            |          |           |               |
>  | Secure |     Shared | Secure   |Normal     |Secure         |
>  |        |            |          |           |               |
>  | Shared |     Shared | Secure   |Normal     |Shared         |
>  |        |            |          |           |               |
>  | Normal |     Shared | Secure   |Normal     |Secure         |
>  --------------------------------------------------------------
> 
>  7. Life cycle of a VM
>     --------------------
>  --------------------------------------------------------------------
>  |         |  start    |  H_SVM_  |H_SVM_   |H_SVM_     |UV_SVM_    |
>  |         |  VM       |INIT_START|INIT_DONE|INIT_ABORT |TERMINATE  |
>  |         |           |          |         |           |           |
>  --------- ----------------------------------------------------------
>  |         |           |          |         |           |           |
>  | Normal  | Normal    | Transient|Error    |Error      |Normal     |
>  |         |           |          |         |           |           |
>  | Secure  |   Error   | Error    |Error    |Error      |Normal     |
>  |         |           |          |         |           |           |
>  |Transient|   N/A     | Error    |Secure   |Normal     |Normal     |
>  --------------------------------------------------------------------
> 
> ************************************************************************
> 
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Bharata B Rao <bharata@linux.ibm.com>
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> Cc: Laurent Dufour <ldufour@linux.ibm.com>
> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Cc: Claudio Carvalho <cclaudio@linux.ibm.com>
> Cc: kvm-ppc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c | 187 +++++++++++++++++++++++++++++++++----
>  1 file changed, 168 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 0baa293..df2e272 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -98,7 +98,127 @@
>  static unsigned long *kvmppc_uvmem_bitmap;
>  static DEFINE_SPINLOCK(kvmppc_uvmem_bitmap_lock);
>  
> -#define KVMPPC_UVMEM_PFN	(1UL << 63)
> +/*
> + * States of a GFN
> + * ---------------
> + * The GFN can be in one of the following states.
> + *
> + * (a) Secure - The GFN is secure. The GFN is associated with
> + *	a Secure VM, the contents of the GFN is not accessible
> + *	to the Hypervisor.  This GFN can be backed by a secure-PFN,
> + *	or can be backed by a normal-PFN with contents encrypted.
> + *	The former is true when the GFN is paged-in into the
> + *	ultravisor. The latter is true when the GFN is paged-out
> + *	of the ultravisor.
> + *
> + * (b) Shared - The GFN is shared. The GFN is associated with a
> + *	a secure VM. The contents of the GFN is accessible to
> + *	Hypervisor. This GFN is backed by a normal-PFN and its
> + *	content is un-encrypted.
> + *
> + * (c) Normal - The GFN is normal. The GFN is associated with
> + *	a normal VM. The contents of the GFN is accessible to
> + *	the Hypervisor. Its content is never encrypted.
> + *
> + * States of a VM.
> + * ---------------
> + *
> + * Normal VM:  A VM whose contents are always accessible to
> + *	the hypervisor.  All its GFNs are normal-GFNs.
> + *
> + * Secure VM: A VM whose contents are not accessible to the
> + *	hypervisor without the VM's consent.  Its GFNs are
> + *	either Shared-GFN or Secure-GFNs.
> + *
> + * Transient VM: A Normal VM that is transitioning to secure VM.
> + *	The transition starts on successful return of
> + *	H_SVM_INIT_START, and ends on successful return
> + *	of H_SVM_INIT_DONE. This transient VM, can have GFNs
> + *	in any of the three states; i.e Secure-GFN, Shared-GFN,
> + *	and Normal-GFN.	The VM never executes in this state
> + *	in supervisor-mode.
> + *
> + * Memory slot State.
> + * -----------------------------
> + *	The state of a memory slot mirrors the state of the
> + *	VM the memory slot is associated with.
> + *
> + * VM State transition.
> + * --------------------
> + *
> + *  A VM always starts in Normal Mode.
> + *
> + *  H_SVM_INIT_START moves the VM into transient state. During this
> + *  time the Ultravisor may request some of its GFNs to be shared or
> + *  secured. So its GFNs can be in one of the three GFN states.
> + *
> + *  H_SVM_INIT_DONE moves the VM entirely from transient state to
> + *  secure-state. At this point any left-over normal-GFNs are
> + *  transitioned to Secure-GFN.
> + *
> + *  H_SVM_INIT_ABORT moves the transient VM back to normal VM.
> + *  All its GFNs are moved to Normal-GFNs.
> + *
> + *  UV_TERMINATE transitions the secure-VM back to normal-VM. All
> + *  the secure-GFNs and shared-GFNs are transitioned to normal-GFNs.
> + *  Note: The contents of the normal-GFN is undefined at this point.
> + *
> + * GFN state implementation:
> + * -------------------------
> + *
> + * Secure GFN is associated with a secure-PFN; also called uvmem_pfn,
> + * when the GFN is paged-in. Its pfn[] has KVMPPC_GFN_UVMEM_PFN flag
> + * set, and contains the value of the secure-PFN.
> + * It is associated with a normal-PFN; also called mem_pfn, when
> + * the GFN is paged out. Its pfn[] has KVMPPC_GFN_MEM_PFN flag set.
> + * The value of the normal-PFN is not tracked.
> + *
> + * Shared GFN is associated with a normal-PFN. Its pfn[] has
> + * KVMPPC_UVMEM_SHARED_PFN flag set. The value of the normal-PFN
> + * is not tracked.
> + *
> + * Normal GFN is associated with normal-PFN. Its pfn[] has
> + * no flag set. The value of the normal-PFN is not tracked.
> + *
> + * Life cycle of a GFN
> + * --------------------
> + *
> + * --------------------------------------------------------------
> + * |        |     Share  |  Unshare | SVM       |H_SVM_INIT_DONE|
> + * |        |operation   |operation | abort/    |               |
> + * |        |            |          | terminate |               |
> + * -------------------------------------------------------------
> + * |        |            |          |           |               |
> + * | Secure |     Shared | Secure   |Normal     |Secure         |
> + * |        |            |          |           |               |
> + * | Shared |     Shared | Secure   |Normal     |Shared         |
> + * |        |            |          |           |               |
> + * | Normal |     Shared | Secure   |Normal     |Secure         |
> + * --------------------------------------------------------------
> + *
> + * Life cycle of a VM
> + * --------------------
> + *
> + * --------------------------------------------------------------------
> + * |         |  start    |  H_SVM_  |H_SVM_   |H_SVM_     |UV_SVM_    |
> + * |         |  VM       |INIT_START|INIT_DONE|INIT_ABORT |TERMINATE  |
> + * |         |           |          |         |           |           |
> + * --------- ----------------------------------------------------------
> + * |         |           |          |         |           |           |
> + * | Normal  | Normal    | Transient|Error    |Error      |Normal     |
> + * |         |           |          |         |           |           |
> + * | Secure  |   Error   | Error    |Error    |Error      |Normal     |
> + * |         |           |          |         |           |           |
> + * |Transient|   N/A     | Error    |Secure   |Normal     |Normal     |
> + * --------------------------------------------------------------------
> + */
> +
> +#define KVMPPC_GFN_UVMEM_PFN	(1UL << 63)
> +#define KVMPPC_GFN_MEM_PFN	(1UL << 62)
> +#define KVMPPC_GFN_SHARED	(1UL << 61)
> +#define KVMPPC_GFN_SECURE	(KVMPPC_GFN_UVMEM_PFN | KVMPPC_GFN_MEM_PFN)
> +#define KVMPPC_GFN_FLAG_MASK	(KVMPPC_GFN_SECURE | KVMPPC_GFN_SHARED)
> +#define KVMPPC_GFN_PFN_MASK	(~KVMPPC_GFN_FLAG_MASK)
>  
>  struct kvmppc_uvmem_slot {
>  	struct list_head list;
> @@ -106,11 +226,11 @@ struct kvmppc_uvmem_slot {
>  	unsigned long base_pfn;
>  	unsigned long *pfns;
>  };
> -
>  struct kvmppc_uvmem_page_pvt {
>  	struct kvm *kvm;
>  	unsigned long gpa;
>  	bool skip_page_out;
> +	bool remove_gfn;
>  };
>  
>  bool kvmppc_uvmem_available(void)
> @@ -163,8 +283,8 @@ void kvmppc_uvmem_slot_free(struct kvm *kvm, const struct kvm_memory_slot *slot)
>  	mutex_unlock(&kvm->arch.uvmem_lock);
>  }
>  
> -static void kvmppc_uvmem_pfn_insert(unsigned long gfn, unsigned long uvmem_pfn,
> -				    struct kvm *kvm)
> +static void kvmppc_mark_gfn(unsigned long gfn, struct kvm *kvm,
> +			unsigned long flag, unsigned long uvmem_pfn)
>  {
>  	struct kvmppc_uvmem_slot *p;
>  
> @@ -172,24 +292,41 @@ static void kvmppc_uvmem_pfn_insert(unsigned long gfn, unsigned long uvmem_pfn,
>  		if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
>  			unsigned long index = gfn - p->base_pfn;
>  
> -			p->pfns[index] = uvmem_pfn | KVMPPC_UVMEM_PFN;
> +			if (flag == KVMPPC_GFN_UVMEM_PFN)
> +				p->pfns[index] = uvmem_pfn | flag;
> +			else
> +				p->pfns[index] = flag;
>  			return;
>  		}
>  	}
>  }
>  
> -static void kvmppc_uvmem_pfn_remove(unsigned long gfn, struct kvm *kvm)
> +/* mark the GFN as secure-GFN associated with @uvmem pfn device-PFN. */
> +static void kvmppc_gfn_secure_uvmem_pfn(unsigned long gfn,
> +			unsigned long uvmem_pfn, struct kvm *kvm)
>  {
> -	struct kvmppc_uvmem_slot *p;
> +	kvmppc_mark_gfn(gfn, kvm, KVMPPC_GFN_UVMEM_PFN, uvmem_pfn);
* Re: [v4 2/5] KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
@ 2020-07-23  4:49     ` Bharata B Rao
  0 siblings, 0 replies; 28+ messages in thread
From: Bharata B Rao @ 2020-07-23  4:49 UTC (permalink / raw)
  To: Ram Pai
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david

On Fri, Jul 17, 2020 at 01:00:24AM -0700, Ram Pai wrote:
> During the life of SVM, its GFNs transition through normal, secure and
> shared states. Since the kernel does not track GFNs that are shared, it
> is not possible to disambiguate a shared GFN from a GFN whose PFN has
> not yet been migrated to a secure-PFN. Also it is not possible to
> disambiguate a secure-GFN from a GFN whose PFN has been paged out of
> the ultravisor.
> 
> The ability to identify the state of a GFN is needed to skip migration
> of its PFN to secure-PFN during ESM transition.
> 
> The code is re-organized to track the states of a GFN as explained
> below.
> 
> ************************************************************************
>  1. States of a GFN
>     ---------------
>  The GFN can be in one of the following states.
> 
>  (a) Secure - The GFN is secure. The GFN is associated with
>  	a Secure VM, the contents of the GFN is not accessible
>  	to the Hypervisor.  This GFN can be backed by a secure-PFN,
>  	or can be backed by a normal-PFN with contents encrypted.
>  	The former is true when the GFN is paged into the
>  	ultravisor. The latter is true when the GFN is paged-out
>  	of the ultravisor.
> 
>  (b) Shared - The GFN is shared. The GFN is associated with
>  	a secure VM. The contents of the GFN are accessible to
>  	the Hypervisor. This GFN is backed by a normal-PFN and its
>  	content is un-encrypted.
> 
>  (c) Normal - The GFN is normal. The GFN is associated with
>  	a normal VM. The contents of the GFN are accessible to
>  	the Hypervisor. Its content is never encrypted.
> 
>  2. States of a VM.
>     ---------------
> 
>  (a) Normal VM:  A VM whose contents are always accessible to
>  	the hypervisor.  All its GFNs are normal-GFNs.
> 
>  (b) Secure VM: A VM whose contents are not accessible to the
>  	hypervisor without the VM's consent.  Its GFNs are
>  	either Shared-GFN or Secure-GFNs.
> 
>  (c) Transient VM: A Normal VM that is transitioning to secure VM.
>  	The transition starts on successful return of
>  	H_SVM_INIT_START, and ends on successful return
>  	H_SVM_INIT_DONE. This transient VM can have GFNs
>  	in any of the three states; i.e., Secure-GFN, Shared-GFN,
>  	and Normal-GFN.	The VM never executes in this state
>  	in supervisor-mode.
> 
>  3. Memory slot State.
>     ------------------
>   	The state of a memory slot mirrors the state of the
>   	VM the memory slot is associated with.
> 
>  4. VM State transition.
>     --------------------
> 
>   A VM always starts in Normal Mode.
> 
>   H_SVM_INIT_START moves the VM into transient state. During this
>   time the Ultravisor may request some of its GFNs to be shared or
>   secured. So its GFNs can be in one of the three GFN states.
> 
>   H_SVM_INIT_DONE moves the VM entirely from transient state to
>   secure-state. At this point any left-over normal-GFNs are
>   transitioned to Secure-GFN.
> 
>   H_SVM_INIT_ABORT moves the transient VM back to normal VM.
>   All its GFNs are moved to Normal-GFNs.
> 
>   UV_TERMINATE transitions the secure-VM back to normal-VM. All
>   the secure-GFNs and shared-GFNs are transitioned to normal-GFN.
>   Note: The contents of the normal-GFN are undefined at this point.
> 
>  5. GFN state implementation:
>     -------------------------
> 
>  Secure GFN is associated with a secure-PFN; also called uvmem_pfn,
>  when the GFN is paged-in. Its pfn[] has KVMPPC_GFN_UVMEM_PFN flag
>  set, and contains the value of the secure-PFN.
>  It is associated with a normal-PFN; also called mem_pfn, when
>  the GFN is paged out. Its pfn[] has KVMPPC_GFN_MEM_PFN flag set.
>  The value of the normal-PFN is not tracked.
> 
>  Shared GFN is associated with a normal-PFN. Its pfn[] has
>  KVMPPC_UVMEM_SHARED_PFN flag set. The value of the normal-PFN
>  is not tracked.
> 
>  Normal GFN is associated with normal-PFN. Its pfn[] has
>  no flag set. The value of the normal-PFN is not tracked.
> 
>  6. Life cycle of a GFN
>     --------------------
>  --------------------------------------------------------------
>  |        |     Share  |  Unshare | SVM       |H_SVM_INIT_DONE|
>  |        |operation   |operation | abort/    |               |
>  |        |            |          | terminate |               |
>  --------------------------------------------------------------
>  |        |            |          |           |               |
>  | Secure |     Shared | Secure   |Normal     |Secure         |
>  |        |            |          |           |               |
>  | Shared |     Shared | Secure   |Normal     |Shared         |
>  |        |            |          |           |               |
>  | Normal |     Shared | Secure   |Normal     |Secure         |
>  --------------------------------------------------------------
> 
>  7. Life cycle of a VM
>     --------------------
>  --------------------------------------------------------------------
>  |         |  start    |  H_SVM_  |H_SVM_   |H_SVM_     |UV_SVM_    |
>  |         |  VM       |INIT_START|INIT_DONE|INIT_ABORT |TERMINATE  |
>  |         |           |          |         |           |           |
>  --------------------------------------------------------------------
>  |         |           |          |         |           |           |
>  | Normal  | Normal    | Transient|Error    |Error      |Normal     |
>  |         |           |          |         |           |           |
>  | Secure  |   Error   | Error    |Error    |Error      |Normal     |
>  |         |           |          |         |           |           |
>  |Transient|   N/A     | Error    |Secure   |Normal     |Normal     |
>  --------------------------------------------------------------------
> 
> ************************************************************************
> 
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Bharata B Rao <bharata@linux.ibm.com>
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> Cc: Laurent Dufour <ldufour@linux.ibm.com>
> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Cc: Claudio Carvalho <cclaudio@linux.ibm.com>
> Cc: kvm-ppc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c | 187 +++++++++++++++++++++++++++++++++----
>  1 file changed, 168 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 0baa293..df2e272 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -98,7 +98,127 @@
>  static unsigned long *kvmppc_uvmem_bitmap;
>  static DEFINE_SPINLOCK(kvmppc_uvmem_bitmap_lock);
>  
> -#define KVMPPC_UVMEM_PFN	(1UL << 63)
> +/*
> + * States of a GFN
> + * ---------------
> + * The GFN can be in one of the following states.
> + *
> + * (a) Secure - The GFN is secure. The GFN is associated with
> + *	a Secure VM, the contents of the GFN is not accessible
> + *	to the Hypervisor.  This GFN can be backed by a secure-PFN,
> + *	or can be backed by a normal-PFN with contents encrypted.
> + *	The former is true when the GFN is paged into the
> + *	ultravisor. The latter is true when the GFN is paged-out
> + *	of the ultravisor.
> + *
> + * (b) Shared - The GFN is shared. The GFN is associated with
> + *	a secure VM. The contents of the GFN are accessible to
> + *	the Hypervisor. This GFN is backed by a normal-PFN and its
> + *	content is un-encrypted.
> + *
> + * (c) Normal - The GFN is normal. The GFN is associated with
> + *	a normal VM. The contents of the GFN are accessible to
> + *	the Hypervisor. Its content is never encrypted.
> + *
> + * States of a VM.
> + * ---------------
> + *
> + * Normal VM:  A VM whose contents are always accessible to
> + *	the hypervisor.  All its GFNs are normal-GFNs.
> + *
> + * Secure VM: A VM whose contents are not accessible to the
> + *	hypervisor without the VM's consent.  Its GFNs are
> + *	either Shared-GFN or Secure-GFNs.
> + *
> + * Transient VM: A Normal VM that is transitioning to secure VM.
> + *	The transition starts on successful return of
> + *	H_SVM_INIT_START, and ends on successful return
> + *	of H_SVM_INIT_DONE. This transient VM can have GFNs
> + *	in any of the three states; i.e., Secure-GFN, Shared-GFN,
> + *	and Normal-GFN.	The VM never executes in this state
> + *	in supervisor-mode.
> + *
> + * Memory slot State.
> + * -----------------------------
> + *	The state of a memory slot mirrors the state of the
> + *	VM the memory slot is associated with.
> + *
> + * VM State transition.
> + * --------------------
> + *
> + *  A VM always starts in Normal Mode.
> + *
> + *  H_SVM_INIT_START moves the VM into transient state. During this
> + *  time the Ultravisor may request some of its GFNs to be shared or
> + *  secured. So its GFNs can be in one of the three GFN states.
> + *
> + *  H_SVM_INIT_DONE moves the VM entirely from transient state to
> + *  secure-state. At this point any left-over normal-GFNs are
> + *  transitioned to Secure-GFN.
> + *
> + *  H_SVM_INIT_ABORT moves the transient VM back to normal VM.
> + *  All its GFNs are moved to Normal-GFNs.
> + *
> + *  UV_TERMINATE transitions the secure-VM back to normal-VM. All
> + *  the secure-GFNs and shared-GFNs are transitioned to normal-GFN.
> + *  Note: The contents of the normal-GFN are undefined at this point.
> + *
> + * GFN state implementation:
> + * -------------------------
> + *
> + * Secure GFN is associated with a secure-PFN; also called uvmem_pfn,
> + * when the GFN is paged-in. Its pfn[] has KVMPPC_GFN_UVMEM_PFN flag
> + * set, and contains the value of the secure-PFN.
> + * It is associated with a normal-PFN; also called mem_pfn, when
> + * the GFN is paged out. Its pfn[] has KVMPPC_GFN_MEM_PFN flag set.
> + * The value of the normal-PFN is not tracked.
> + *
> + * Shared GFN is associated with a normal-PFN. Its pfn[] has
> + * KVMPPC_UVMEM_SHARED_PFN flag set. The value of the normal-PFN
> + * is not tracked.
> + *
> + * Normal GFN is associated with normal-PFN. Its pfn[] has
> + * no flag set. The value of the normal-PFN is not tracked.
> + *
> + * Life cycle of a GFN
> + * --------------------
> + *
> + * --------------------------------------------------------------
> + * |        |     Share  |  Unshare | SVM       |H_SVM_INIT_DONE|
> + * |        |operation   |operation | abort/    |               |
> + * |        |            |          | terminate |               |
> + * --------------------------------------------------------------
> + * |        |            |          |           |               |
> + * | Secure |     Shared | Secure   |Normal     |Secure         |
> + * |        |            |          |           |               |
> + * | Shared |     Shared | Secure   |Normal     |Shared         |
> + * |        |            |          |           |               |
> + * | Normal |     Shared | Secure   |Normal     |Secure         |
> + * --------------------------------------------------------------
> + *
> + * Life cycle of a VM
> + * --------------------
> + *
> + * --------------------------------------------------------------------
> + * |         |  start    |  H_SVM_  |H_SVM_   |H_SVM_     |UV_SVM_    |
> + * |         |  VM       |INIT_START|INIT_DONE|INIT_ABORT |TERMINATE  |
> + * |         |           |          |         |           |           |
> + * --------------------------------------------------------------------
> + * |         |           |          |         |           |           |
> + * | Normal  | Normal    | Transient|Error    |Error      |Normal     |
> + * |         |           |          |         |           |           |
> + * | Secure  |   Error   | Error    |Error    |Error      |Normal     |
> + * |         |           |          |         |           |           |
> + * |Transient|   N/A     | Error    |Secure   |Normal     |Normal     |
> + * --------------------------------------------------------------------
> + */
> +
> +#define KVMPPC_GFN_UVMEM_PFN	(1UL << 63)
> +#define KVMPPC_GFN_MEM_PFN	(1UL << 62)
> +#define KVMPPC_GFN_SHARED	(1UL << 61)
> +#define KVMPPC_GFN_SECURE	(KVMPPC_GFN_UVMEM_PFN | KVMPPC_GFN_MEM_PFN)
> +#define KVMPPC_GFN_FLAG_MASK	(KVMPPC_GFN_SECURE | KVMPPC_GFN_SHARED)
> +#define KVMPPC_GFN_PFN_MASK	(~KVMPPC_GFN_FLAG_MASK)
>  
>  struct kvmppc_uvmem_slot {
>  	struct list_head list;
> @@ -106,11 +226,11 @@ struct kvmppc_uvmem_slot {
>  	unsigned long base_pfn;
>  	unsigned long *pfns;
>  };
> -
>  struct kvmppc_uvmem_page_pvt {
>  	struct kvm *kvm;
>  	unsigned long gpa;
>  	bool skip_page_out;
> +	bool remove_gfn;
>  };
>  
>  bool kvmppc_uvmem_available(void)
> @@ -163,8 +283,8 @@ void kvmppc_uvmem_slot_free(struct kvm *kvm, const struct kvm_memory_slot *slot)
>  	mutex_unlock(&kvm->arch.uvmem_lock);
>  }
>  
> -static void kvmppc_uvmem_pfn_insert(unsigned long gfn, unsigned long uvmem_pfn,
> -				    struct kvm *kvm)
> +static void kvmppc_mark_gfn(unsigned long gfn, struct kvm *kvm,
> +			unsigned long flag, unsigned long uvmem_pfn)
>  {
>  	struct kvmppc_uvmem_slot *p;
>  
> @@ -172,24 +292,41 @@ static void kvmppc_uvmem_pfn_insert(unsigned long gfn, unsigned long uvmem_pfn,
>  		if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
>  			unsigned long index = gfn - p->base_pfn;
>  
> -			p->pfns[index] = uvmem_pfn | KVMPPC_UVMEM_PFN;
> +			if (flag == KVMPPC_GFN_UVMEM_PFN)
> +				p->pfns[index] = uvmem_pfn | flag;
> +			else
> +				p->pfns[index] = flag;
>  			return;
>  		}
>  	}
>  }
>  
> -static void kvmppc_uvmem_pfn_remove(unsigned long gfn, struct kvm *kvm)
> +/* mark the GFN as secure-GFN associated with @uvmem pfn device-PFN. */
> +static void kvmppc_gfn_secure_uvmem_pfn(unsigned long gfn,
> +			unsigned long uvmem_pfn, struct kvm *kvm)
>  {
> -	struct kvmppc_uvmem_slot *p;
> +	kvmppc_mark_gfn(gfn, kvm, KVMPPC_GFN_UVMEM_PFN, uvmem_pfn);
> +}
>  
> -	list_for_each_entry(p, &kvm->arch.uvmem_pfns, list) {
> -		if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
> -			p->pfns[gfn - p->base_pfn] = 0;
> -			return;
> -		}
> -	}
> +/* mark the GFN as secure-GFN associated with a memory-PFN. */
> +static void kvmppc_gfn_secure_mem_pfn(unsigned long gfn, struct kvm *kvm)
> +{
> +	kvmppc_mark_gfn(gfn, kvm, KVMPPC_GFN_MEM_PFN, 0);
> +}
> +
> +/* mark the GFN as a shared GFN. */
> +static void kvmppc_gfn_shared(unsigned long gfn, struct kvm *kvm)
> +{
> +	kvmppc_mark_gfn(gfn, kvm, KVMPPC_GFN_SHARED, 0);
> +}
> +
> +/* mark the GFN as a non-existent GFN. */
> +static void kvmppc_gfn_remove(unsigned long gfn, struct kvm *kvm)
> +{
> +	kvmppc_mark_gfn(gfn, kvm, 0, 0);
>  }
>  
> +/* return true, if the GFN is a secure-GFN backed by a secure-PFN */
>  static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, struct kvm *kvm,
>  				    unsigned long *uvmem_pfn)
>  {
> @@ -199,10 +336,10 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, struct kvm *kvm,
>  		if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
>  			unsigned long index = gfn - p->base_pfn;
>  
> -			if (p->pfns[index] & KVMPPC_UVMEM_PFN) {
> +			if (p->pfns[index] & KVMPPC_GFN_UVMEM_PFN) {
>  				if (uvmem_pfn)
>  					*uvmem_pfn = p->pfns[index] &
> -						     ~KVMPPC_UVMEM_PFN;
> +						     KVMPPC_GFN_PFN_MASK;
>  				return true;
>  			} else
>  				return false;
> @@ -353,6 +490,7 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
>  
>  		mutex_lock(&kvm->arch.uvmem_lock);
>  		if (!kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
> +			kvmppc_gfn_remove(gfn, kvm);
>  			mutex_unlock(&kvm->arch.uvmem_lock);
>  			continue;
>  		}
> @@ -360,6 +498,7 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
>  		uvmem_page = pfn_to_page(uvmem_pfn);
>  		pvt = uvmem_page->zone_device_data;
>  		pvt->skip_page_out = skip_page_out;
> +		pvt->remove_gfn = true;
>  		mutex_unlock(&kvm->arch.uvmem_lock);
>  
>  		pfn = gfn_to_pfn(kvm, gfn);
> @@ -429,7 +568,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm)
>  		goto out_clear;
>  
>  	uvmem_pfn = bit + pfn_first;
> -	kvmppc_uvmem_pfn_insert(gpa >> PAGE_SHIFT, uvmem_pfn, kvm);
> +	kvmppc_gfn_secure_uvmem_pfn(gpa >> PAGE_SHIFT, uvmem_pfn, kvm);
>  
>  	pvt->gpa = gpa;
>  	pvt->kvm = kvm;
> @@ -524,6 +663,7 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
>  		uvmem_page = pfn_to_page(uvmem_pfn);
>  		pvt = uvmem_page->zone_device_data;
>  		pvt->skip_page_out = true;
> +		pvt->remove_gfn = false;
>  	}
>  
>  retry:
> @@ -537,12 +677,16 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
>  		uvmem_page = pfn_to_page(uvmem_pfn);
>  		pvt = uvmem_page->zone_device_data;
>  		pvt->skip_page_out = true;
> +		pvt->remove_gfn = false;

This is the case of converting an already secure page into a shared page.
A comment here as to why remove_gfn is set to false would help.

Also isn't it by default false? Is there a situation where it starts
out by default false, becomes true later and you are required to
explicitly mark it false here?

Otherwise, Reviewed-by: Bharata B Rao <bharata@linux.ibm.com>

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [v4 3/5] KVM: PPC: Book3S HV: in H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs.
  2020-07-17  8:00   ` Ram Pai
@ 2020-07-23  6:22     ` Bharata B Rao
  0 siblings, 0 replies; 28+ messages in thread
From: Bharata B Rao @ 2020-07-23  6:10 UTC (permalink / raw)
  To: Ram Pai
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david

On Fri, Jul 17, 2020 at 01:00:25AM -0700, Ram Pai wrote:
>  
> +int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
> +		const struct kvm_memory_slot *memslot)

Don't see any callers for this outside of this file, so why not static?

> +{
> +	unsigned long gfn = memslot->base_gfn;
> +	struct vm_area_struct *vma;
> +	unsigned long start, end;
> +	int ret = 0;
> +
> +	while (kvmppc_next_nontransitioned_gfn(memslot, kvm, &gfn)) {

So you checked the state of the gfn under uvmem_lock above, but released
it too.

> +
> +		mmap_read_lock(kvm->mm);
> +		start = gfn_to_hva(kvm, gfn);
> +		if (kvm_is_error_hva(start)) {
> +			ret = H_STATE;
> +			goto next;
> +		}
> +
> +		end = start + (1UL << PAGE_SHIFT);
> +		vma = find_vma_intersection(kvm->mm, start, end);
> +		if (!vma || vma->vm_start > start || vma->vm_end < end) {
> +			ret = H_STATE;
> +			goto next;
> +		}
> +
> +		mutex_lock(&kvm->arch.uvmem_lock);
> +		ret = kvmppc_svm_migrate_page(vma, start, end,
> +				(gfn << PAGE_SHIFT), kvm, PAGE_SHIFT, false);

What is the guarantee that the gfn is still in the same earlier state when
you do the migration here?

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [v4 4/5] KVM: PPC: Book3S HV: retry page migration before erroring-out
  2020-07-17  8:00   ` Ram Pai
@ 2020-07-23  6:25     ` Bharata B Rao
  0 siblings, 0 replies; 28+ messages in thread
From: Bharata B Rao @ 2020-07-23  6:13 UTC (permalink / raw)
  To: Ram Pai
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david

On Fri, Jul 17, 2020 at 01:00:26AM -0700, Ram Pai wrote:
> @@ -812,7 +842,7 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
>  	struct vm_area_struct *vma;
>  	int srcu_idx;
>  	unsigned long gfn = gpa >> page_shift;
> -	int ret;
> +	int ret, repeat_count = REPEAT_COUNT;
>  
>  	if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
>  		return H_UNSUPPORTED;
> @@ -826,34 +856,44 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
>  	if (flags & H_PAGE_IN_SHARED)
>  		return kvmppc_share_page(kvm, gpa, page_shift);
>  
> -	ret = H_PARAMETER;
>  	srcu_idx = srcu_read_lock(&kvm->srcu);
> -	mmap_read_lock(kvm->mm);
>  
> -	start = gfn_to_hva(kvm, gfn);
> -	if (kvm_is_error_hva(start))
> -		goto out;
> -
> -	mutex_lock(&kvm->arch.uvmem_lock);
>  	/* Fail the page-in request of an already paged-in page */
> -	if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL))
> -		goto out_unlock;
> +	mutex_lock(&kvm->arch.uvmem_lock);
> +	ret = kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL);
> +	mutex_unlock(&kvm->arch.uvmem_lock);

Same comment as for the prev patch. I don't think you can release
the lock here.

> +	if (ret) {
> +		srcu_read_unlock(&kvm->srcu, srcu_idx);
> +		return H_PARAMETER;
> +	}
>  
> -	end = start + (1UL << page_shift);
> -	vma = find_vma_intersection(kvm->mm, start, end);
> -	if (!vma || vma->vm_start > start || vma->vm_end < end)
> -		goto out_unlock;
> +	do {
> +		ret = H_PARAMETER;
> +		mmap_read_lock(kvm->mm);
>  
> -	if (kvmppc_svm_migrate_page(vma, start, end, gpa, kvm, page_shift,
> -				true))
> -		goto out_unlock;
> +		start = gfn_to_hva(kvm, gfn);
> +		if (kvm_is_error_hva(start)) {
> +			mmap_read_unlock(kvm->mm);
> +			break;
> +		}
>  
> -	ret = H_SUCCESS;
> +		end = start + (1UL << page_shift);
> +		vma = find_vma_intersection(kvm->mm, start, end);
> +		if (!vma || vma->vm_start > start || vma->vm_end < end) {
> +			mmap_read_unlock(kvm->mm);
> +			break;
> +		}
> +
> +		mutex_lock(&kvm->arch.uvmem_lock);
> +		ret = kvmppc_svm_migrate_page(vma, start, end, gpa, kvm, page_shift, true);
> +		mutex_unlock(&kvm->arch.uvmem_lock);
> +
> +		mmap_read_unlock(kvm->mm);
> +	} while (ret == -2 && repeat_count--);
> +
> +	if (ret == -2)
> +		ret = H_BUSY;
>  
> -out_unlock:
> -	mutex_unlock(&kvm->arch.uvmem_lock);
> -out:
> -	mmap_read_unlock(kvm->mm);
>  	srcu_read_unlock(&kvm->srcu, srcu_idx);
>  	return ret;
>  }
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [v4 2/5] KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs
  2020-07-23  4:49     ` Bharata B Rao
@ 2020-07-23 11:14       ` Ram Pai
  0 siblings, 0 replies; 28+ messages in thread
From: Ram Pai @ 2020-07-23 11:14 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david

On Thu, Jul 23, 2020 at 10:18:30AM +0530, Bharata B Rao wrote:
> On Fri, Jul 17, 2020 at 01:00:24AM -0700, Ram Pai wrote:
> >  	pvt->gpa = gpa;
..snip..
> >  	pvt->kvm = kvm;
> > @@ -524,6 +663,7 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
> >  		uvmem_page = pfn_to_page(uvmem_pfn);
> >  		pvt = uvmem_page->zone_device_data;
> >  		pvt->skip_page_out = true;
> > +		pvt->remove_gfn = false;
> >  	}
> >  
> >  retry:
> > @@ -537,12 +677,16 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
> >  		uvmem_page = pfn_to_page(uvmem_pfn);
> >  		pvt = uvmem_page->zone_device_data;
> >  		pvt->skip_page_out = true;
> > +		pvt->remove_gfn = false;
> 
> This is the case of making an already secure page as shared page.
> A comment here as to why remove_gfn is set to false here will help.
> 
> Also isn't it by default false? Is there a situation where it starts
> out by default false, becomes true later and you are required to
> explicitly mark it false here?

It is by default false. And will be true when the GFN is
released/invalidated through kvmppc_uvmem_drop_pages().

It is marked false explicitly here, just to be safe, and to protect
against any implicit changes.

> 
> Otherwise, Reviewed-by: Bharata B Rao <bharata@linux.ibm.com>
> 
Thanks for the review.

RP

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [v4 3/5] KVM: PPC: Book3S HV: in H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs.
  2020-07-23  6:22     ` Bharata B Rao
@ 2020-07-23 11:39       ` Ram Pai
  -1 siblings, 0 replies; 28+ messages in thread
From: Ram Pai @ 2020-07-23 11:39 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david

On Thu, Jul 23, 2020 at 11:40:37AM +0530, Bharata B Rao wrote:
> On Fri, Jul 17, 2020 at 01:00:25AM -0700, Ram Pai wrote:
> >  
> > +int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
> > +		const struct kvm_memory_slot *memslot)
> 
> Don't see any callers for this outside of this file, so why not static?
> 
> > +{
> > +	unsigned long gfn = memslot->base_gfn;
> > +	struct vm_area_struct *vma;
> > +	unsigned long start, end;
> > +	int ret = 0;
> > +
> > +	while (kvmppc_next_nontransitioned_gfn(memslot, kvm, &gfn)) {
> 
> So you checked the state of gfn under uvmem_lock above, but release
> it too.
> 
> > +
> > +		mmap_read_lock(kvm->mm);
> > +		start = gfn_to_hva(kvm, gfn);
> > +		if (kvm_is_error_hva(start)) {
> > +			ret = H_STATE;
> > +			goto next;
> > +		}
> > +
> > +		end = start + (1UL << PAGE_SHIFT);
> > +		vma = find_vma_intersection(kvm->mm, start, end);
> > +		if (!vma || vma->vm_start > start || vma->vm_end < end) {
> > +			ret = H_STATE;
> > +			goto next;
> > +		}
> > +
> > +		mutex_lock(&kvm->arch.uvmem_lock);
> > +		ret = kvmppc_svm_migrate_page(vma, start, end,
> > +				(gfn << PAGE_SHIFT), kvm, PAGE_SHIFT, false);
> 
> What is the guarantee that the gfn is in the same earlier state when you
> do migration here?

Are you worried about the case where some other thread sneaks in and
migrates the GFN, making this migration request a duplicate?

That is theoretically possible, though practically improbable. This
transition is attempted only when there is a single active vcpu in the
VM.

However, maybe we should not bake that assumption into this code. Will
remove it.

RP

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [v4 4/5] KVM: PPC: Book3S HV: retry page migration before erroring-out
  2020-07-23  6:25     ` Bharata B Rao
@ 2020-07-23 11:44       ` Ram Pai
  -1 siblings, 0 replies; 28+ messages in thread
From: Ram Pai @ 2020-07-23 11:44 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david

I am dropping this patch based on our conversation, where we agreed we
need to root-cause the migration failure.

On Thu, Jul 23, 2020 at 11:43:44AM +0530, Bharata B Rao wrote:
> On Fri, Jul 17, 2020 at 01:00:26AM -0700, Ram Pai wrote:
> > @@ -812,7 +842,7 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
> >  	struct vm_area_struct *vma;
> >  	int srcu_idx;
> >  	unsigned long gfn = gpa >> page_shift;
> > -	int ret;
> > +	int ret, repeat_count = REPEAT_COUNT;
> >  
> >  	if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
> >  		return H_UNSUPPORTED;
> > @@ -826,34 +856,44 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
> >  	if (flags & H_PAGE_IN_SHARED)
> >  		return kvmppc_share_page(kvm, gpa, page_shift);
> >  
> > -	ret = H_PARAMETER;
> >  	srcu_idx = srcu_read_lock(&kvm->srcu);
> > -	mmap_read_lock(kvm->mm);
> >  
> > -	start = gfn_to_hva(kvm, gfn);
> > -	if (kvm_is_error_hva(start))
> > -		goto out;
> > -
> > -	mutex_lock(&kvm->arch.uvmem_lock);
> >  	/* Fail the page-in request of an already paged-in page */
> > -	if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL))
> > -		goto out_unlock;
> > +	mutex_lock(&kvm->arch.uvmem_lock);
> > +	ret = kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL);
> > +	mutex_unlock(&kvm->arch.uvmem_lock);
> 
> Same comment as for the prev patch. I don't think you can release
> the lock here.
> 
> > +	if (ret) {
> > +		srcu_read_unlock(&kvm->srcu, srcu_idx);
> > +		return H_PARAMETER;
> > +	}
> >  
> > -	end = start + (1UL << page_shift);
> > -	vma = find_vma_intersection(kvm->mm, start, end);
> > -	if (!vma || vma->vm_start > start || vma->vm_end < end)
> > -		goto out_unlock;
> > +	do {
> > +		ret = H_PARAMETER;
> > +		mmap_read_lock(kvm->mm);
> >  
> > -	if (kvmppc_svm_migrate_page(vma, start, end, gpa, kvm, page_shift,
> > -				true))
> > -		goto out_unlock;
> > +		start = gfn_to_hva(kvm, gfn);
> > +		if (kvm_is_error_hva(start)) {
> > +			mmap_read_unlock(kvm->mm);
> > +			break;
> > +		}
> >  
> > -	ret = H_SUCCESS;
> > +		end = start + (1UL << page_shift);
> > +		vma = find_vma_intersection(kvm->mm, start, end);
> > +		if (!vma || vma->vm_start > start || vma->vm_end < end) {
> > +			mmap_read_unlock(kvm->mm);
> > +			break;
> > +		}
> > +
> > +		mutex_lock(&kvm->arch.uvmem_lock);
> > +		ret = kvmppc_svm_migrate_page(vma, start, end, gpa, kvm, page_shift, true);
> > +		mutex_unlock(&kvm->arch.uvmem_lock);
> > +
> > +		mmap_read_unlock(kvm->mm);
> > +	} while (ret == -2 && repeat_count--);
> > +
> > +	if (ret == -2)
> > +		ret = H_BUSY;
> >  
> > -out_unlock:
> > -	mutex_unlock(&kvm->arch.uvmem_lock);
> > -out:
> > -	mmap_read_unlock(kvm->mm);
> >  	srcu_read_unlock(&kvm->srcu, srcu_idx);
> >  	return ret;
> >  }
> > -- 
> > 1.8.3.1

-- 
Ram Pai

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2020-07-23 11:46 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-17  8:00 [v4 0/5] Migrate non-migrated pages of a SVM Ram Pai
2020-07-17  8:00 ` Ram Pai
2020-07-17  8:00 ` [v4 1/5] KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START Ram Pai
2020-07-17  8:00   ` Ram Pai
2020-07-22  8:52   ` Bharata B Rao
2020-07-22  8:52     ` Bharata B Rao
2020-07-17  8:00 ` [v4 2/5] KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs Ram Pai
2020-07-17  8:00   ` Ram Pai
2020-07-23  4:48   ` Bharata B Rao
2020-07-23  4:49     ` Bharata B Rao
2020-07-23 11:14     ` Ram Pai
2020-07-23 11:14       ` Ram Pai
2020-07-17  8:00 ` [v4 3/5] KVM: PPC: Book3S HV: in H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs Ram Pai
2020-07-17  8:00   ` Ram Pai
2020-07-23  6:10   ` Bharata B Rao
2020-07-23  6:22     ` Bharata B Rao
2020-07-23 11:39     ` Ram Pai
2020-07-23 11:39       ` Ram Pai
2020-07-17  8:00 ` [v4 4/5] KVM: PPC: Book3S HV: retry page migration before erroring-out Ram Pai
2020-07-17  8:00   ` Ram Pai
2020-07-23  6:13   ` Bharata B Rao
2020-07-23  6:25     ` Bharata B Rao
2020-07-23 11:44     ` Ram Pai
2020-07-23 11:44       ` Ram Pai
2020-07-17  8:00 ` [v4 5/5] KVM: PPC: Book3S HV: migrate hot plugged memory Ram Pai
2020-07-17  8:00   ` Ram Pai
2020-07-22 10:01   ` Bharata B Rao
2020-07-22 10:13     ` Bharata B Rao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.