LinuxPPC-Dev Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v10 0/8] KVM: PPC: Driver to manage pages of secure guest
@ 2019-11-04  4:17 Bharata B Rao
  2019-11-04  4:17 ` [PATCH v10 1/8] mm: ksm: Export ksm_madvise() Bharata B Rao
                   ` (8 more replies)
  0 siblings, 9 replies; 39+ messages in thread
From: Bharata B Rao @ 2019-11-04  4:17 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc, linux-mm
  Cc: linuxram, cclaudio, Bharata B Rao, jglisse, aneesh.kumar, paulus,
	sukadev, hch

Hi,

This is the next version of the patchset that adds required support
in the KVM hypervisor to run secure guests on PEF-enabled POWER platforms.

The major change in this version is about not using kvm.arch->rmap[]
array to store device PFNs, thus not depending on the memslot availability
to reach to the device PFN from the fault path. Instead of rmap[], we
now have a different array which gets created and destroyed along with
memslot creation and deletion. These arrays hang off from kvm.arch and
are arragned in a simple linked list for now. We could move to some other
data structure in future if walking of linked list becomes an overhead
due to large number of memslots.

Other changes include:

- Rearranged/Merged/Cleaned up patches, removed all Acks/Reviewed-by since
  all the patches have changed.
- Added a new patch to support H_SVM_INIT_ABORT hcall (From Suka)
- Added KSM unmerge support so that VMAs that have device PFNs don't
  participate in KSM merging and eventually crash in KSM code.
- Release device pages during unplug (Paul) and ensure that memory
  hotplug and unplug works correctly.
- Let kvm-hv module to load on PEF-disabled platforms (Ram) when
  CONFIG_PPC_UV is enabled allowing regular non-secure guests
  to still run.
- Support guest reset when swithing to secure is in progress.
- Check if page is already secure in kvmppc_send_page_to_uv() before
  sending it to UV.
- Fixed sentinal for header file kvm_book3s_uvmem.h (Jason)

Now, all the dependencies required by this patchset are in powerpc/next
on which this patchset is based upon.

Outside of PowerPC code, this needs a change in KSM code as this
patchset uses ksm_madvise() which is not exported.

Anshuman Khandual (1):
  KVM: PPC: Ultravisor: Add PPC_UV config option

Bharata B Rao (6):
  mm: ksm: Export ksm_madvise()
  KVM: PPC: Support for running secure guests
  KVM: PPC: Shared pages support for secure guests
  KVM: PPC: Radix changes for secure guest
  KVM: PPC: Handle memory plug/unplug to secure VM
  KVM: PPC: Support reset of secure guest

Sukadev Bhattiprolu (1):
  KVM: PPC: Implement H_SVM_INIT_ABORT hcall

 Documentation/powerpc/ultravisor.rst        |  39 +
 Documentation/virt/kvm/api.txt              |  19 +
 arch/powerpc/Kconfig                        |  17 +
 arch/powerpc/include/asm/hvcall.h           |  10 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |  80 ++
 arch/powerpc/include/asm/kvm_host.h         |   7 +
 arch/powerpc/include/asm/kvm_ppc.h          |   2 +
 arch/powerpc/include/asm/ultravisor-api.h   |   6 +
 arch/powerpc/include/asm/ultravisor.h       |  36 +
 arch/powerpc/kvm/Makefile                   |   3 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c      |  25 +
 arch/powerpc/kvm/book3s_hv.c                | 144 ++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S     |  23 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c          | 794 ++++++++++++++++++++
 arch/powerpc/kvm/powerpc.c                  |  12 +
 include/uapi/linux/kvm.h                    |   1 +
 mm/ksm.c                                    |   1 +
 17 files changed, 1217 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_uvmem.h
 create mode 100644 arch/powerpc/kvm/book3s_hv_uvmem.c

-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v10 1/8] mm: ksm: Export ksm_madvise()
  2019-11-04  4:17 [PATCH v10 0/8] KVM: PPC: Driver to manage pages of secure guest Bharata B Rao
@ 2019-11-04  4:17 ` Bharata B Rao
  2019-11-06  4:33   ` Paul Mackerras
  2019-11-04  4:17 ` [PATCH v10 2/8] KVM: PPC: Support for running secure guests Bharata B Rao
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Bharata B Rao @ 2019-11-04  4:17 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc, linux-mm
  Cc: linuxram, cclaudio, Bharata B Rao, jglisse, aneesh.kumar, paulus,
	sukadev, hch

KVM PPC module needs ksm_madvise() for supporting secure guests.
Guest pages that become secure are represented as device private
pages in the host. Such pages shouldn't participate in KSM merging.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 mm/ksm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/ksm.c b/mm/ksm.c
index dbee2eb4dd05..e45b02ad3f0b 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2478,6 +2478,7 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(ksm_madvise);
 
 int __ksm_enter(struct mm_struct *mm)
 {
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v10 2/8] KVM: PPC: Support for running secure guests
  2019-11-04  4:17 [PATCH v10 0/8] KVM: PPC: Driver to manage pages of secure guest Bharata B Rao
  2019-11-04  4:17 ` [PATCH v10 1/8] mm: ksm: Export ksm_madvise() Bharata B Rao
@ 2019-11-04  4:17 ` Bharata B Rao
  2019-11-06  4:34   ` Paul Mackerras
  2019-11-04  4:17 ` [PATCH v10 3/8] KVM: PPC: Shared pages support for " Bharata B Rao
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Bharata B Rao @ 2019-11-04  4:17 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc, linux-mm
  Cc: linuxram, cclaudio, Bharata B Rao, jglisse, aneesh.kumar, paulus,
	sukadev, hch

A pseries guest can be run as secure guest on Ultravisor-enabled
POWER platforms. On such platforms, this driver will be used to manage
the movement of guest pages between the normal memory managed by
hypervisor (HV) and secure memory managed by Ultravisor (UV).

HV is informed about the guest's transition to secure mode via hcalls:

H_SVM_INIT_START: Initiate securing a VM
H_SVM_INIT_DONE: Conclude securing a VM

As part of H_SVM_INIT_START, register all existing memslots with
the UV. H_SVM_INIT_DONE call by UV informs HV that transition of
the guest to secure mode is complete.

These two states (transition to secure mode STARTED and transition
to secure mode COMPLETED) are recorded in kvm->arch.secure_guest.
Setting these states will cause the assembly code that enters the
guest to call the UV_RETURN ucall instead of trying to enter the
guest directly.

Migration of pages betwen normal and secure memory of secure
guest is implemented in H_SVM_PAGE_IN and H_SVM_PAGE_OUT hcalls.

H_SVM_PAGE_IN: Move the content of a normal page to secure page
H_SVM_PAGE_OUT: Move the content of a secure page to normal page

Private ZONE_DEVICE memory equal to the amount of secure memory
available in the platform for running secure guests is created.
Whenever a page belonging to the guest becomes secure, a page from
this private device memory is used to represent and track that secure
page on the HV side. The movement of pages between normal and secure
memory is done via migrate_vma_pages() using UV_PAGE_IN and
UV_PAGE_OUT ucalls.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/include/asm/hvcall.h           |   6 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |  62 ++
 arch/powerpc/include/asm/kvm_host.h         |   6 +
 arch/powerpc/include/asm/ultravisor-api.h   |   3 +
 arch/powerpc/include/asm/ultravisor.h       |  21 +
 arch/powerpc/kvm/Makefile                   |   3 +
 arch/powerpc/kvm/book3s_hv.c                |  29 +
 arch/powerpc/kvm/book3s_hv_uvmem.c          | 613 ++++++++++++++++++++
 8 files changed, 743 insertions(+)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_uvmem.h
 create mode 100644 arch/powerpc/kvm/book3s_hv_uvmem.c

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 11112023e327..4150732c81a0 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -342,6 +342,12 @@
 #define H_TLB_INVALIDATE	0xF808
 #define H_COPY_TOFROM_GUEST	0xF80C
 
+/* Platform-specific hcalls used by the Ultravisor */
+#define H_SVM_PAGE_IN		0xEF00
+#define H_SVM_PAGE_OUT		0xEF04
+#define H_SVM_INIT_START	0xEF08
+#define H_SVM_INIT_DONE		0xEF0C
+
 /* Values for 2nd argument to H_SET_MODE */
 #define H_SET_MODE_RESOURCE_SET_CIABR		1
 #define H_SET_MODE_RESOURCE_SET_DAWR		2
diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
new file mode 100644
index 000000000000..95f389c2937b
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_KVM_BOOK3S_UVMEM_H__
+#define __ASM_KVM_BOOK3S_UVMEM_H__
+
+#ifdef CONFIG_PPC_UV
+int kvmppc_uvmem_init(void);
+void kvmppc_uvmem_free(void);
+int kvmppc_uvmem_slot_init(struct kvm *kvm, const struct kvm_memory_slot *slot);
+void kvmppc_uvmem_slot_free(struct kvm *kvm,
+			    const struct kvm_memory_slot *slot);
+unsigned long kvmppc_h_svm_page_in(struct kvm *kvm,
+				   unsigned long gra,
+				   unsigned long flags,
+				   unsigned long page_shift);
+unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
+				    unsigned long gra,
+				    unsigned long flags,
+				    unsigned long page_shift);
+unsigned long kvmppc_h_svm_init_start(struct kvm *kvm);
+unsigned long kvmppc_h_svm_init_done(struct kvm *kvm);
+#else
+static inline int kvmppc_uvmem_init(void)
+{
+	return 0;
+}
+
+static inline void kvmppc_uvmem_free(void) { }
+
+static inline int
+kvmppc_uvmem_slot_init(struct kvm *kvm, const struct kvm_memory_slot *slot)
+{
+	return 0;
+}
+
+static inline void
+kvmppc_uvmem_slot_free(struct kvm *kvm, const struct kvm_memory_slot *slot) { }
+
+static inline unsigned long
+kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gra,
+		     unsigned long flags, unsigned long page_shift)
+{
+	return H_UNSUPPORTED;
+}
+
+static inline unsigned long
+kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long gra,
+		      unsigned long flags, unsigned long page_shift)
+{
+	return H_UNSUPPORTED;
+}
+
+static inline unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
+{
+	return H_UNSUPPORTED;
+}
+
+static inline unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
+{
+	return H_UNSUPPORTED;
+}
+#endif /* CONFIG_PPC_UV */
+#endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 6fe6ad64cba5..577ca95fac7c 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -275,6 +275,10 @@ struct kvm_hpt_info {
 
 struct kvm_resize_hpt;
 
+/* Flag values for kvm_arch.secure_guest */
+#define KVMPPC_SECURE_INIT_START 0x1 /* H_SVM_INIT_START has been called */
+#define KVMPPC_SECURE_INIT_DONE  0x2 /* H_SVM_INIT_DONE completed */
+
 struct kvm_arch {
 	unsigned int lpid;
 	unsigned int smt_mode;		/* # vcpus per virtual core */
@@ -330,6 +334,8 @@ struct kvm_arch {
 #endif
 	struct kvmppc_ops *kvm_ops;
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	struct mutex uvmem_lock;
+	struct list_head uvmem_pfns;
 	struct mutex mmu_setup_lock;	/* nests inside vcpu mutexes */
 	u64 l1_ptcr;
 	int max_nested_lpid;
diff --git a/arch/powerpc/include/asm/ultravisor-api.h b/arch/powerpc/include/asm/ultravisor-api.h
index 4fcda1d5793d..2483f15bd71a 100644
--- a/arch/powerpc/include/asm/ultravisor-api.h
+++ b/arch/powerpc/include/asm/ultravisor-api.h
@@ -26,6 +26,9 @@
 #define UV_WRITE_PATE			0xF104
 #define UV_RETURN			0xF11C
 #define UV_ESM				0xF110
+#define UV_REGISTER_MEM_SLOT		0xF120
+#define UV_PAGE_IN			0xF128
+#define UV_PAGE_OUT			0xF12C
 #define UV_SHARE_PAGE			0xF130
 #define UV_UNSHARE_PAGE			0xF134
 #define UV_UNSHARE_ALL_PAGES		0xF140
diff --git a/arch/powerpc/include/asm/ultravisor.h b/arch/powerpc/include/asm/ultravisor.h
index b1bc2e043ed4..79bb005e8ee9 100644
--- a/arch/powerpc/include/asm/ultravisor.h
+++ b/arch/powerpc/include/asm/ultravisor.h
@@ -46,4 +46,25 @@ static inline int uv_unshare_all_pages(void)
 	return ucall_norets(UV_UNSHARE_ALL_PAGES);
 }
 
+static inline int uv_page_in(u64 lpid, u64 src_ra, u64 dst_gpa, u64 flags,
+			     u64 page_shift)
+{
+	return ucall_norets(UV_PAGE_IN, lpid, src_ra, dst_gpa, flags,
+			    page_shift);
+}
+
+static inline int uv_page_out(u64 lpid, u64 dst_ra, u64 src_gpa, u64 flags,
+			      u64 page_shift)
+{
+	return ucall_norets(UV_PAGE_OUT, lpid, dst_ra, src_gpa, flags,
+			    page_shift);
+}
+
+static inline int uv_register_mem_slot(u64 lpid, u64 start_gpa, u64 size,
+				       u64 flags, u64 slotid)
+{
+	return ucall_norets(UV_REGISTER_MEM_SLOT, lpid, start_gpa,
+			    size, flags, slotid);
+}
+
 #endif	/* _ASM_POWERPC_ULTRAVISOR_H */
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 4c67cc79de7c..2bfeaa13befb 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -71,6 +71,9 @@ kvm-hv-y += \
 	book3s_64_mmu_radix.o \
 	book3s_hv_nested.o
 
+kvm-hv-$(CONFIG_PPC_UV) += \
+	book3s_hv_uvmem.o
+
 kvm-hv-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \
 	book3s_hv_tm.o
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 709cf1fd4cf4..80e84277d11f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -72,6 +72,8 @@
 #include <asm/xics.h>
 #include <asm/xive.h>
 #include <asm/hw_breakpoint.h>
+#include <asm/kvm_host.h>
+#include <asm/kvm_book3s_uvmem.h>
 
 #include "book3s.h"
 
@@ -1078,6 +1080,25 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 					 kvmppc_get_gpr(vcpu, 5),
 					 kvmppc_get_gpr(vcpu, 6));
 		break;
+	case H_SVM_PAGE_IN:
+		ret = kvmppc_h_svm_page_in(vcpu->kvm,
+					   kvmppc_get_gpr(vcpu, 4),
+					   kvmppc_get_gpr(vcpu, 5),
+					   kvmppc_get_gpr(vcpu, 6));
+		break;
+	case H_SVM_PAGE_OUT:
+		ret = kvmppc_h_svm_page_out(vcpu->kvm,
+					    kvmppc_get_gpr(vcpu, 4),
+					    kvmppc_get_gpr(vcpu, 5),
+					    kvmppc_get_gpr(vcpu, 6));
+		break;
+	case H_SVM_INIT_START:
+		ret = kvmppc_h_svm_init_start(vcpu->kvm);
+		break;
+	case H_SVM_INIT_DONE:
+		ret = kvmppc_h_svm_init_done(vcpu->kvm);
+		break;
+
 	default:
 		return RESUME_HOST;
 	}
@@ -4784,6 +4805,8 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 	char buf[32];
 	int ret;
 
+	mutex_init(&kvm->arch.uvmem_lock);
+	INIT_LIST_HEAD(&kvm->arch.uvmem_pfns);
 	mutex_init(&kvm->arch.mmu_setup_lock);
 
 	/* Allocate the guest's logical partition ID */
@@ -4955,6 +4978,7 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
 		kvm->arch.process_table = 0;
 		kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
 	}
+
 	kvmppc_free_lpid(kvm->arch.lpid);
 
 	kvmppc_free_pimap(kvm);
@@ -5544,11 +5568,16 @@ static int kvmppc_book3s_init_hv(void)
 			no_mixing_hpt_and_radix = true;
 	}
 
+	r = kvmppc_uvmem_init();
+	if (r < 0)
+		pr_err("KVM-HV: kvmppc_uvmem_init failed %d\n", r);
+
 	return r;
 }
 
 static void kvmppc_book3s_exit_hv(void)
 {
+	kvmppc_uvmem_free();
 	kvmppc_free_host_rm_ops();
 	if (kvmppc_radix_possible())
 		kvmppc_radix_exit();
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
new file mode 100644
index 000000000000..fe456fd07c74
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -0,0 +1,613 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Secure pages management: Migration of pages between normal and secure
+ * memory of KVM guests.
+ *
+ * Copyright 2018 Bharata B Rao, IBM Corp. <bharata@linux.ibm.com>
+ */
+
+/*
+ * A pseries guest can be run as secure guest on Ultravisor-enabled
+ * POWER platforms. On such platforms, this driver will be used to manage
+ * the movement of guest pages between the normal memory managed by
+ * hypervisor (HV) and secure memory managed by Ultravisor (UV).
+ *
+ * The page-in or page-out requests from UV will come to HV as hcalls and
+ * HV will call back into UV via ultracalls to satisfy these page requests.
+ *
+ * Private ZONE_DEVICE memory equal to the amount of secure memory
+ * available in the platform for running secure guests is hotplugged.
+ * Whenever a page belonging to the guest becomes secure, a page from this
+ * private device memory is used to represent and track that secure page
+ * on the HV side.
+ */
+
+/*
+ * Notes on locking
+ *
+ * kvm->arch.uvmem_lock is a per-guest lock that prevents concurrent
+ * page-in and page-out requests for the same GPA. Concurrent accesses
+ * can either come via UV (guest vCPUs requesting for same page)
+ * or when HV and guest simultaneously access the same page.
+ * This mutex serializes the migration of page from HV(normal) to
+ * UV(secure) and vice versa. So the serialization points are around
+ * migrate_vma routines and page-in/out routines.
+ *
+ * Per-guest mutex comes with a cost though. Mainly it serializes the
+ * fault path as page-out can occur when HV faults on accessing secure
+ * guest pages. Currently UV issues page-in requests for all the guest
+ * PFNs one at a time during early boot (UV_ESM uvcall), so this is
+ * not a cause for concern. Also currently the number of page-outs caused
+ * by HV touching secure pages is very very low. If an when UV supports
+ * overcommitting, then we might see concurrent guest driven page-outs.
+ *
+ * Locking order
+ *
+ * 1. srcu_read_lock(&kvm->srcu) - Protects KVM memslots
+ * 2. down_read(&kvm->mm->mmap_sem) - find_vma, migrate_vma_pages and helpers
+ * 3. mutex_lock(&kvm->arch.uvmem_lock) - protects read/writes to uvmem slots
+ *					  thus acting as sync-points
+ *					  for page-in/out
+ */
+
+/*
+ * Notes on page size
+ *
+ * Currently UV uses 2MB mappings internally, but will issue H_SVM_PAGE_IN
+ * and H_SVM_PAGE_OUT hcalls in PAGE_SIZE(64K) granularity. HV tracks
+ * secure GPAs at 64K page size and maintains one device PFN for each
+ * 64K secure GPA. UV_PAGE_IN and UV_PAGE_OUT calls by HV are also issued
+ * for 64K page at a time.
+ *
+ * HV faulting on secure pages: When HV touches any secure page, it
+ * faults and issues a UV_PAGE_OUT request with 64K page size. Currently
+ * UV splits and remaps the 2MB page if necessary and copies out the
+ * required 64K page contents.
+ *
+ * In summary, the current secure pages handling code in HV assumes
+ * 64K page size and in fact fails any page-in/page-out requests of
+ * non-64K size upfront. If and when UV starts supporting multiple
+ * page-sizes, we need to break this assumption.
+ */
+
+#include <linux/pagemap.h>
+#include <linux/migrate.h>
+#include <linux/kvm_host.h>
+#include <linux/ksm.h>
+#include <asm/ultravisor.h>
+#include <asm/mman.h>
+
+static struct dev_pagemap kvmppc_uvmem_pgmap;
+static unsigned long *kvmppc_uvmem_bitmap;
+static DEFINE_SPINLOCK(kvmppc_uvmem_bitmap_lock);
+
+#define KVMPPC_UVMEM_PFN	(1UL << 63)
+
+struct kvmppc_uvmem_slot {
+	struct list_head list;
+	unsigned long nr_pfns;
+	unsigned long base_pfn;
+	unsigned long *pfns;
+};
+
+struct kvmppc_uvmem_page_pvt {
+	struct kvm *kvm;
+	unsigned long gpa;
+};
+
+int kvmppc_uvmem_slot_init(struct kvm *kvm, const struct kvm_memory_slot *slot)
+{
+	struct kvmppc_uvmem_slot *p;
+
+	p = kzalloc(sizeof(*p), GFP_KERNEL);
+	if (!p)
+		return -ENOMEM;
+	p->pfns = vzalloc(array_size(slot->npages, sizeof(*p->pfns)));
+	if (!p->pfns) {
+		kfree(p);
+		return -ENOMEM;
+	}
+	p->nr_pfns = slot->npages;
+	p->base_pfn = slot->base_gfn;
+
+	mutex_lock(&kvm->arch.uvmem_lock);
+	list_add(&p->list, &kvm->arch.uvmem_pfns);
+	mutex_unlock(&kvm->arch.uvmem_lock);
+
+	return 0;
+}
+
+/*
+ * All device PFNs are already released by the time we come here.
+ */
+void kvmppc_uvmem_slot_free(struct kvm *kvm, const struct kvm_memory_slot *slot)
+{
+	struct kvmppc_uvmem_slot *p, *next;
+
+	mutex_lock(&kvm->arch.uvmem_lock);
+	list_for_each_entry_safe(p, next, &kvm->arch.uvmem_pfns, list) {
+		if (p->base_pfn == slot->base_gfn) {
+			vfree(p->pfns);
+			list_del(&p->list);
+			kfree(p);
+			break;
+		}
+	}
+	mutex_unlock(&kvm->arch.uvmem_lock);
+}
+
+static void kvmppc_uvmem_pfn_insert(unsigned long gfn, unsigned long uvmem_pfn,
+				    struct kvm *kvm)
+{
+	struct kvmppc_uvmem_slot *p;
+
+	list_for_each_entry(p, &kvm->arch.uvmem_pfns, list) {
+		if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
+			unsigned long index = gfn - p->base_pfn;
+
+			p->pfns[index] = uvmem_pfn | KVMPPC_UVMEM_PFN;
+			return;
+		}
+	}
+}
+
+static void kvmppc_uvmem_pfn_remove(unsigned long gfn, struct kvm *kvm)
+{
+	struct kvmppc_uvmem_slot *p;
+
+	list_for_each_entry(p, &kvm->arch.uvmem_pfns, list) {
+		if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
+			p->pfns[gfn - p->base_pfn] = 0;
+			return;
+		}
+	}
+}
+
+static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, struct kvm *kvm,
+				    unsigned long *uvmem_pfn)
+{
+	struct kvmppc_uvmem_slot *p;
+
+	list_for_each_entry(p, &kvm->arch.uvmem_pfns, list) {
+		if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
+			unsigned long index = gfn - p->base_pfn;
+
+			if (p->pfns[index] & KVMPPC_UVMEM_PFN) {
+				if (uvmem_pfn)
+					*uvmem_pfn = p->pfns[index] &
+						     ~KVMPPC_UVMEM_PFN;
+				return true;
+			} else
+				return false;
+		}
+	}
+	return false;
+}
+
+unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *memslot;
+	int ret = H_SUCCESS;
+	int srcu_idx;
+
+	srcu_idx = srcu_read_lock(&kvm->srcu);
+	slots = kvm_memslots(kvm);
+	kvm_for_each_memslot(memslot, slots) {
+		if (kvmppc_uvmem_slot_init(kvm, memslot)) {
+			ret = H_PARAMETER;
+			goto out;
+		}
+		ret = uv_register_mem_slot(kvm->arch.lpid,
+					   memslot->base_gfn << PAGE_SHIFT,
+					   memslot->npages * PAGE_SIZE,
+					   0, memslot->id);
+		if (ret < 0) {
+			kvmppc_uvmem_slot_free(kvm, memslot);
+			ret = H_PARAMETER;
+			goto out;
+		}
+	}
+	kvm->arch.secure_guest |= KVMPPC_SECURE_INIT_START;
+out:
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+	return ret;
+}
+
+unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
+{
+	if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
+		return H_UNSUPPORTED;
+
+	kvm->arch.secure_guest |= KVMPPC_SECURE_INIT_DONE;
+	return H_SUCCESS;
+}
+
+/*
+ * Get a free device PFN from the pool
+ *
+ * Called when a normal page is moved to secure memory (UV_PAGE_IN). Device
+ * PFN will be used to keep track of the secure page on HV side.
+ *
+ * Called with kvm->arch.uvmem_lock held
+ */
+static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm)
+{
+	struct page *dpage = NULL;
+	unsigned long bit, uvmem_pfn;
+	struct kvmppc_uvmem_page_pvt *pvt;
+	unsigned long pfn_last, pfn_first;
+
+	pfn_first = kvmppc_uvmem_pgmap.res.start >> PAGE_SHIFT;
+	pfn_last = pfn_first +
+		   (resource_size(&kvmppc_uvmem_pgmap.res) >> PAGE_SHIFT);
+
+	spin_lock(&kvmppc_uvmem_bitmap_lock);
+	bit = find_first_zero_bit(kvmppc_uvmem_bitmap,
+				  pfn_last - pfn_first);
+	if (bit >= (pfn_last - pfn_first))
+		goto out;
+	bitmap_set(kvmppc_uvmem_bitmap, bit, 1);
+	spin_unlock(&kvmppc_uvmem_bitmap_lock);
+
+	pvt = kzalloc(sizeof(*pvt), GFP_KERNEL);
+	if (!pvt)
+		goto out_clear;
+
+	uvmem_pfn = bit + pfn_first;
+	kvmppc_uvmem_pfn_insert(gpa >> PAGE_SHIFT, uvmem_pfn, kvm);
+
+	pvt->gpa = gpa;
+	pvt->kvm = kvm;
+
+	dpage = pfn_to_page(uvmem_pfn);
+	dpage->zone_device_data = pvt;
+	get_page(dpage);
+	lock_page(dpage);
+	return dpage;
+out_clear:
+	spin_lock(&kvmppc_uvmem_bitmap_lock);
+	bitmap_clear(kvmppc_uvmem_bitmap, bit, 1);
+out:
+	spin_unlock(&kvmppc_uvmem_bitmap_lock);
+	return NULL;
+}
+
+/*
+ * Alloc a PFN from private device memory pool and copy page from normal
+ * memory to secure memory using UV_PAGE_IN uvcall.
+ */
+static int
+kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
+		   unsigned long end, unsigned long gpa, struct kvm *kvm,
+		   unsigned long page_shift)
+{
+	unsigned long src_pfn, dst_pfn = 0;
+	struct migrate_vma mig;
+	struct page *spage;
+	unsigned long pfn;
+	struct page *dpage;
+	int ret = 0;
+
+	memset(&mig, 0, sizeof(mig));
+	mig.vma = vma;
+	mig.start = start;
+	mig.end = end;
+	mig.src = &src_pfn;
+	mig.dst = &dst_pfn;
+
+	ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
+			  MADV_UNMERGEABLE, &vma->vm_flags);
+	if (ret)
+		return ret;
+
+	ret = migrate_vma_setup(&mig);
+	if (ret)
+		return ret;
+
+	if (!(*mig.src & MIGRATE_PFN_MIGRATE)) {
+		ret = -1;
+		goto out_finalize;
+	}
+
+	dpage = kvmppc_uvmem_get_page(gpa, kvm);
+	if (!dpage) {
+		ret = -1;
+		goto out_finalize;
+	}
+
+	pfn = *mig.src >> MIGRATE_PFN_SHIFT;
+	spage = migrate_pfn_to_page(*mig.src);
+	if (spage)
+		uv_page_in(kvm->arch.lpid, pfn << page_shift, gpa, 0,
+			   page_shift);
+
+	*mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+	migrate_vma_pages(&mig);
+out_finalize:
+	migrate_vma_finalize(&mig);
+	return ret;
+}
+
+/*
+ * H_SVM_PAGE_IN: Move page from normal memory to secure memory.
+ */
+unsigned long
+kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
+		     unsigned long flags, unsigned long page_shift)
+{
+	unsigned long start, end;
+	struct vm_area_struct *vma;
+	int srcu_idx;
+	unsigned long gfn = gpa >> page_shift;
+	int ret;
+
+	if (page_shift != PAGE_SHIFT)
+		return H_P3;
+
+	if (flags)
+		return H_P2;
+
+	ret = H_PARAMETER;
+	srcu_idx = srcu_read_lock(&kvm->srcu);
+	down_read(&kvm->mm->mmap_sem);
+
+	start = gfn_to_hva(kvm, gfn);
+	if (kvm_is_error_hva(start))
+		goto out;
+
+	mutex_lock(&kvm->arch.uvmem_lock);
+	/* Fail the page-in request of an already paged-in page */
+	if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL))
+		goto out_unlock;
+
+	end = start + (1UL << page_shift);
+	vma = find_vma_intersection(kvm->mm, start, end);
+	if (!vma || vma->vm_start > start || vma->vm_end < end)
+		goto out_unlock;
+
+	if (!kvmppc_svm_page_in(vma, start, end, gpa, kvm, page_shift))
+		ret = H_SUCCESS;
+out_unlock:
+	mutex_unlock(&kvm->arch.uvmem_lock);
+out:
+	up_read(&kvm->mm->mmap_sem);
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+	return ret;
+}
+
+/*
+ * Provision a new page on HV side and copy over the contents
+ * from secure memory using UV_PAGE_OUT uvcall.
+ */
+static int
+kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start,
+		    unsigned long end, unsigned long page_shift,
+		    struct kvm *kvm, unsigned long gpa)
+{
+	unsigned long src_pfn, dst_pfn = 0;
+	struct migrate_vma mig;
+	struct page *dpage, *spage;
+	unsigned long pfn;
+	int ret = U_SUCCESS;
+
+	memset(&mig, 0, sizeof(mig));
+	mig.vma = vma;
+	mig.start = start;
+	mig.end = end;
+	mig.src = &src_pfn;
+	mig.dst = &dst_pfn;
+
+	mutex_lock(&kvm->arch.uvmem_lock);
+	/* The requested page is already paged-out, nothing to do */
+	if (!kvmppc_gfn_is_uvmem_pfn(gpa >> page_shift, kvm, NULL))
+		goto out;
+
+	ret = migrate_vma_setup(&mig);
+	if (ret)
+		return ret;
+
+	spage = migrate_pfn_to_page(*mig.src);
+	if (!spage || !(*mig.src & MIGRATE_PFN_MIGRATE))
+		goto out_finalize;
+
+	if (!is_zone_device_page(spage))
+		goto out_finalize;
+
+	dpage = alloc_page_vma(GFP_HIGHUSER, vma, start);
+	if (!dpage) {
+		ret = -1;
+		goto out_finalize;
+	}
+
+	lock_page(dpage);
+	pfn = page_to_pfn(dpage);
+
+	ret = uv_page_out(kvm->arch.lpid, pfn << page_shift,
+			  gpa, 0, page_shift);
+	if (ret == U_SUCCESS)
+		*mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
+	else {
+		unlock_page(dpage);
+		__free_page(dpage);
+		goto out_finalize;
+	}
+
+	migrate_vma_pages(&mig);
+out_finalize:
+	migrate_vma_finalize(&mig);
+out:
+	mutex_unlock(&kvm->arch.uvmem_lock);
+	return ret;
+}
+
+/*
+ * Fault handler callback that gets called when HV touches any page that
+ * has been moved to secure memory, we ask UV to give back the page by
+ * issuing UV_PAGE_OUT uvcall.
+ *
+ * This eventually results in dropping of device PFN and the newly
+ * provisioned page/PFN gets populated in QEMU page tables.
+ */
+static vm_fault_t kvmppc_uvmem_migrate_to_ram(struct vm_fault *vmf)
+{
+	struct kvmppc_uvmem_page_pvt *pvt = vmf->page->zone_device_data;
+
+	if (kvmppc_svm_page_out(vmf->vma, vmf->address,
+				vmf->address + PAGE_SIZE, PAGE_SHIFT,
+				pvt->kvm, pvt->gpa))
+		return VM_FAULT_SIGBUS;
+	else
+		return 0;
+}
+
+/*
+ * Release the device PFN back to the pool
+ *
+ * Gets called when secure page becomes a normal page during H_SVM_PAGE_OUT.
+ * Gets called with kvm->arch.uvmem_lock held.
+ */
+static void kvmppc_uvmem_page_free(struct page *page)
+{
+	unsigned long pfn = page_to_pfn(page) -
+			(kvmppc_uvmem_pgmap.res.start >> PAGE_SHIFT);
+	struct kvmppc_uvmem_page_pvt *pvt;
+
+	spin_lock(&kvmppc_uvmem_bitmap_lock);
+	bitmap_clear(kvmppc_uvmem_bitmap, pfn, 1);
+	spin_unlock(&kvmppc_uvmem_bitmap_lock);
+
+	pvt = page->zone_device_data;
+	page->zone_device_data = NULL;
+	kvmppc_uvmem_pfn_remove(pvt->gpa >> PAGE_SHIFT, pvt->kvm);
+	kfree(pvt);
+}
+
+static const struct dev_pagemap_ops kvmppc_uvmem_ops = {
+	.page_free = kvmppc_uvmem_page_free,
+	.migrate_to_ram	= kvmppc_uvmem_migrate_to_ram,
+};
+
+/*
+ * H_SVM_PAGE_OUT: Move page from secure memory to normal memory.
+ */
+unsigned long
+kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long gpa,
+		      unsigned long flags, unsigned long page_shift)
+{
+	unsigned long gfn = gpa >> page_shift;
+	unsigned long start, end;
+	struct vm_area_struct *vma;
+	int srcu_idx;
+	int ret;
+
+	if (page_shift != PAGE_SHIFT)
+		return H_P3;
+
+	if (flags)
+		return H_P2;
+
+	ret = H_PARAMETER;
+	srcu_idx = srcu_read_lock(&kvm->srcu);
+	down_read(&kvm->mm->mmap_sem);
+	start = gfn_to_hva(kvm, gfn);
+	if (kvm_is_error_hva(start))
+		goto out;
+
+	end = start + (1UL << page_shift);
+	vma = find_vma_intersection(kvm->mm, start, end);
+	if (!vma || vma->vm_start > start || vma->vm_end < end)
+		goto out;
+
+	if (!kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa))
+		ret = H_SUCCESS;
+out:
+	up_read(&kvm->mm->mmap_sem);
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+	return ret;
+}
+
+static u64 kvmppc_get_secmem_size(void)
+{
+	struct device_node *np;
+	int i, len;
+	const __be32 *prop;
+	u64 size = 0;
+
+	np = of_find_compatible_node(NULL, NULL, "ibm,uv-firmware");
+	if (!np)
+		goto out;
+
+	prop = of_get_property(np, "secure-memory-ranges", &len);
+	if (!prop)
+		goto out_put;
+
+	for (i = 0; i < len / (sizeof(*prop) * 4); i++)
+		size += of_read_number(prop + (i * 4) + 2, 2);
+
+out_put:
+	of_node_put(np);
+out:
+	return size;
+}
+
+int kvmppc_uvmem_init(void)
+{
+	int ret = 0;
+	unsigned long size;
+	struct resource *res;
+	void *addr;
+	unsigned long pfn_last, pfn_first;
+
+	size = kvmppc_get_secmem_size();
+	if (!size) {
+		/*
+		 * Don't fail the initialization of kvm-hv module if
+		 * the platform doesn't export ibm,uv-firmware node.
+		 * Let normal guests run on such PEF-disabled platform.
+		 */
+		pr_info("KVMPPC-UVMEM: No support for secure guests\n");
+		goto out;
+	}
+
+	res = request_free_mem_region(&iomem_resource, size, "kvmppc_uvmem");
+	if (IS_ERR(res)) {
+		ret = PTR_ERR(res);
+		goto out;
+	}
+
+	kvmppc_uvmem_pgmap.type = MEMORY_DEVICE_PRIVATE;
+	kvmppc_uvmem_pgmap.res = *res;
+	kvmppc_uvmem_pgmap.ops = &kvmppc_uvmem_ops;
+	addr = memremap_pages(&kvmppc_uvmem_pgmap, NUMA_NO_NODE);
+	if (IS_ERR(addr)) {
+		ret = PTR_ERR(addr);
+		goto out_free_region;
+	}
+
+	pfn_first = res->start >> PAGE_SHIFT;
+	pfn_last = pfn_first + (resource_size(res) >> PAGE_SHIFT);
+	kvmppc_uvmem_bitmap = kcalloc(BITS_TO_LONGS(pfn_last - pfn_first),
+				      sizeof(unsigned long), GFP_KERNEL);
+	if (!kvmppc_uvmem_bitmap) {
+		ret = -ENOMEM;
+		goto out_unmap;
+	}
+
+	pr_info("KVMPPC-UVMEM: Secure Memory size 0x%lx\n", size);
+	return ret;
+out_unmap:
+	memunmap_pages(&kvmppc_uvmem_pgmap);
+out_free_region:
+	release_mem_region(res->start, size);
+out:
+	return ret;
+}
+
+void kvmppc_uvmem_free(void)
+{
+	memunmap_pages(&kvmppc_uvmem_pgmap);
+	release_mem_region(kvmppc_uvmem_pgmap.res.start,
+			   resource_size(&kvmppc_uvmem_pgmap.res));
+	kfree(kvmppc_uvmem_bitmap);
+}
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v10 3/8] KVM: PPC: Shared pages support for secure guests
  2019-11-04  4:17 [PATCH v10 0/8] KVM: PPC: Driver to manage pages of secure guest Bharata B Rao
  2019-11-04  4:17 ` [PATCH v10 1/8] mm: ksm: Export ksm_madvise() Bharata B Rao
  2019-11-04  4:17 ` [PATCH v10 2/8] KVM: PPC: Support for running secure guests Bharata B Rao
@ 2019-11-04  4:17 ` " Bharata B Rao
  2019-11-06  4:52   ` Paul Mackerras
  2019-11-04  4:17 ` [PATCH v10 4/8] KVM: PPC: Radix changes for secure guest Bharata B Rao
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Bharata B Rao @ 2019-11-04  4:17 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc, linux-mm
  Cc: linuxram, cclaudio, Bharata B Rao, jglisse, aneesh.kumar, paulus,
	sukadev, hch

A secure guest will share some of its pages with hypervisor (Eg. virtio
bounce buffers etc). Support sharing of pages between hypervisor and
ultravisor.

Shared page is reachable via both HV and UV side page tables. Once a
secure page is converted to shared page, the device page that represents
the secure page is unmapped from the HV side page tables.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/include/asm/hvcall.h  |  3 ++
 arch/powerpc/kvm/book3s_hv_uvmem.c | 85 ++++++++++++++++++++++++++++--
 2 files changed, 84 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 4150732c81a0..13bd870609c3 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -342,6 +342,9 @@
 #define H_TLB_INVALIDATE	0xF808
 #define H_COPY_TOFROM_GUEST	0xF80C
 
+/* Flags for H_SVM_PAGE_IN */
+#define H_PAGE_IN_SHARED        0x1
+
 /* Platform-specific hcalls used by the Ultravisor */
 #define H_SVM_PAGE_IN		0xEF00
 #define H_SVM_PAGE_OUT		0xEF04
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index fe456fd07c74..d9395a23b10d 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -19,7 +19,10 @@
  * available in the platform for running secure guests is hotplugged.
  * Whenever a page belonging to the guest becomes secure, a page from this
  * private device memory is used to represent and track that secure page
- * on the HV side.
+ * on the HV side. Some pages (like virtio buffers, VPA pages etc) are
+ * shared between UV and HV. However such pages aren't represented by
+ * device private memory and mappings to shared memory exist in both
+ * UV and HV page tables.
  */
 
 /*
@@ -64,6 +67,9 @@
  * UV splits and remaps the 2MB page if necessary and copies out the
  * required 64K page contents.
  *
+ * Shared pages: Whenever guest shares a secure page, UV will split and
+ * remap the 2MB page if required and issue H_SVM_PAGE_IN with 64K page size.
+ *
  * In summary, the current secure pages handling code in HV assumes
  * 64K page size and in fact fails any page-in/page-out requests of
  * non-64K size upfront. If and when UV starts supporting multiple
@@ -93,6 +99,7 @@ struct kvmppc_uvmem_slot {
 struct kvmppc_uvmem_page_pvt {
 	struct kvm *kvm;
 	unsigned long gpa;
+	bool skip_page_out;
 };
 
 int kvmppc_uvmem_slot_init(struct kvm *kvm, const struct kvm_memory_slot *slot)
@@ -329,8 +336,64 @@ kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
 	return ret;
 }
 
+/*
+ * Shares the page with HV, thus making it a normal page.
+ *
+ * - If the page is already secure, then provision a new page and share
+ * - If the page is a normal page, share the existing page
+ *
+ * In the former case, uses dev_pagemap_ops.migrate_to_ram handler
+ * to unmap the device page from QEMU's page tables.
+ */
+static unsigned long
+kvmppc_share_page(struct kvm *kvm, unsigned long gpa, unsigned long page_shift)
+{
+
+	int ret = H_PARAMETER;
+	struct page *uvmem_page;
+	struct kvmppc_uvmem_page_pvt *pvt;
+	unsigned long pfn;
+	unsigned long gfn = gpa >> page_shift;
+	int srcu_idx;
+	unsigned long uvmem_pfn;
+
+	srcu_idx = srcu_read_lock(&kvm->srcu);
+	mutex_lock(&kvm->arch.uvmem_lock);
+	if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
+		uvmem_page = pfn_to_page(uvmem_pfn);
+		pvt = uvmem_page->zone_device_data;
+		pvt->skip_page_out = true;
+	}
+
+retry:
+	mutex_unlock(&kvm->arch.uvmem_lock);
+	pfn = gfn_to_pfn(kvm, gfn);
+	if (is_error_noslot_pfn(pfn))
+		goto out;
+
+	mutex_lock(&kvm->arch.uvmem_lock);
+	if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
+		uvmem_page = pfn_to_page(uvmem_pfn);
+		pvt = uvmem_page->zone_device_data;
+		pvt->skip_page_out = true;
+		kvm_release_pfn_clean(pfn);
+		goto retry;
+	}
+
+	if (!uv_page_in(kvm->arch.lpid, pfn << page_shift, gpa, 0, page_shift))
+		ret = H_SUCCESS;
+	kvm_release_pfn_clean(pfn);
+	mutex_unlock(&kvm->arch.uvmem_lock);
+out:
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+	return ret;
+}
+
 /*
  * H_SVM_PAGE_IN: Move page from normal memory to secure memory.
+ *
+ * H_PAGE_IN_SHARED flag makes the page shared which means that the same
+ * memory in is visible from both UV and HV.
  */
 unsigned long
 kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
@@ -345,9 +408,12 @@ kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
 	if (page_shift != PAGE_SHIFT)
 		return H_P3;
 
-	if (flags)
+	if (flags & ~H_PAGE_IN_SHARED)
 		return H_P2;
 
+	if (flags & H_PAGE_IN_SHARED)
+		return kvmppc_share_page(kvm, gpa, page_shift);
+
 	ret = H_PARAMETER;
 	srcu_idx = srcu_read_lock(&kvm->srcu);
 	down_read(&kvm->mm->mmap_sem);
@@ -388,6 +454,7 @@ kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start,
 	unsigned long src_pfn, dst_pfn = 0;
 	struct migrate_vma mig;
 	struct page *dpage, *spage;
+	struct kvmppc_uvmem_page_pvt *pvt;
 	unsigned long pfn;
 	int ret = U_SUCCESS;
 
@@ -421,10 +488,20 @@ kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start,
 	}
 
 	lock_page(dpage);
+	pvt = spage->zone_device_data;
 	pfn = page_to_pfn(dpage);
 
-	ret = uv_page_out(kvm->arch.lpid, pfn << page_shift,
-			  gpa, 0, page_shift);
+	/*
+	 * This function is used in two cases:
+	 * - When HV touches a secure page, for which we do UV_PAGE_OUT
+	 * - When a secure page is converted to shared page, we *get*
+	 *   the page to essentially unmap the device page. In this
+	 *   case we skip page-out.
+	 */
+	if (!pvt->skip_page_out)
+		ret = uv_page_out(kvm->arch.lpid, pfn << page_shift,
+				  gpa, 0, page_shift);
+
 	if (ret == U_SUCCESS)
 		*mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
 	else {
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v10 4/8] KVM: PPC: Radix changes for secure guest
  2019-11-04  4:17 [PATCH v10 0/8] KVM: PPC: Driver to manage pages of secure guest Bharata B Rao
                   ` (2 preceding siblings ...)
  2019-11-04  4:17 ` [PATCH v10 3/8] KVM: PPC: Shared pages support for " Bharata B Rao
@ 2019-11-04  4:17 ` Bharata B Rao
  2019-11-06  5:58   ` Paul Mackerras
  2019-11-04  4:17 ` [PATCH v10 5/8] KVM: PPC: Handle memory plug/unplug to secure VM Bharata B Rao
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Bharata B Rao @ 2019-11-04  4:17 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc, linux-mm
  Cc: linuxram, cclaudio, Bharata B Rao, jglisse, aneesh.kumar, paulus,
	sukadev, hch

- After the guest becomes secure, when we handle a page fault of a page
  belonging to SVM in HV, send that page to UV via UV_PAGE_IN.
- Whenever a page is unmapped on the HV side, inform UV via UV_PAGE_INVAL.
- Ensure all those routines that walk the secondary page tables of
  the guest don't do so in case of secure VM. For secure guest, the
  active secondary page tables are in secure memory and the secondary
  page tables in HV are freed when guest becomes secure.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |  6 ++++
 arch/powerpc/include/asm/ultravisor-api.h   |  1 +
 arch/powerpc/include/asm/ultravisor.h       |  5 ++++
 arch/powerpc/kvm/book3s_64_mmu_radix.c      | 22 ++++++++++++++
 arch/powerpc/kvm/book3s_hv_uvmem.c          | 32 +++++++++++++++++++++
 5 files changed, 66 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index 95f389c2937b..3033a9585b43 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -18,6 +18,7 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
 				    unsigned long page_shift);
 unsigned long kvmppc_h_svm_init_start(struct kvm *kvm);
 unsigned long kvmppc_h_svm_init_done(struct kvm *kvm);
+int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn);
 #else
 static inline int kvmppc_uvmem_init(void)
 {
@@ -58,5 +59,10 @@ static inline unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
 {
 	return H_UNSUPPORTED;
 }
+
+static inline int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
+{
+	return -EFAULT;
+}
 #endif /* CONFIG_PPC_UV */
 #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
diff --git a/arch/powerpc/include/asm/ultravisor-api.h b/arch/powerpc/include/asm/ultravisor-api.h
index 2483f15bd71a..e774274ab30e 100644
--- a/arch/powerpc/include/asm/ultravisor-api.h
+++ b/arch/powerpc/include/asm/ultravisor-api.h
@@ -32,5 +32,6 @@
 #define UV_SHARE_PAGE			0xF130
 #define UV_UNSHARE_PAGE			0xF134
 #define UV_UNSHARE_ALL_PAGES		0xF140
+#define UV_PAGE_INVAL			0xF138
 
 #endif /* _ASM_POWERPC_ULTRAVISOR_API_H */
diff --git a/arch/powerpc/include/asm/ultravisor.h b/arch/powerpc/include/asm/ultravisor.h
index 79bb005e8ee9..40cc8bace654 100644
--- a/arch/powerpc/include/asm/ultravisor.h
+++ b/arch/powerpc/include/asm/ultravisor.h
@@ -67,4 +67,9 @@ static inline int uv_register_mem_slot(u64 lpid, u64 start_gpa, u64 size,
 			    size, flags, slotid);
 }
 
+static inline int uv_page_inval(u64 lpid, u64 gpa, u64 page_shift)
+{
+	return ucall_norets(UV_PAGE_INVAL, lpid, gpa, page_shift);
+}
+
 #endif	/* _ASM_POWERPC_ULTRAVISOR_H */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 2d415c36a61d..4aec55a0ebc7 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -19,6 +19,8 @@
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/pte-walk.h>
+#include <asm/ultravisor.h>
+#include <asm/kvm_book3s_uvmem.h>
 
 /*
  * Supported radix tree geometry.
@@ -915,6 +917,9 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	if (!(dsisr & DSISR_PRTABLE_FAULT))
 		gpa |= ea & 0xfff;
 
+	if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)
+		return kvmppc_send_page_to_uv(kvm, gfn);
+
 	/* Get the corresponding memslot */
 	memslot = gfn_to_memslot(kvm, gfn);
 
@@ -972,6 +977,11 @@ int kvm_unmap_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	unsigned long gpa = gfn << PAGE_SHIFT;
 	unsigned int shift;
 
+	if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE) {
+		uv_page_inval(kvm->arch.lpid, gpa, PAGE_SIZE);
+		return 0;
+	}
+
 	ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift);
 	if (ptep && pte_present(*ptep))
 		kvmppc_unmap_pte(kvm, ptep, gpa, shift, memslot,
@@ -989,6 +999,9 @@ int kvm_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	int ref = 0;
 	unsigned long old, *rmapp;
 
+	if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)
+		return ref;
+
 	ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift);
 	if (ptep && pte_present(*ptep) && pte_young(*ptep)) {
 		old = kvmppc_radix_update_pte(kvm, ptep, _PAGE_ACCESSED, 0,
@@ -1013,6 +1026,9 @@ int kvm_test_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	unsigned int shift;
 	int ref = 0;
 
+	if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)
+		return ref;
+
 	ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift);
 	if (ptep && pte_present(*ptep) && pte_young(*ptep))
 		ref = 1;
@@ -1030,6 +1046,9 @@ static int kvm_radix_test_clear_dirty(struct kvm *kvm,
 	int ret = 0;
 	unsigned long old, *rmapp;
 
+	if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)
+		return ret;
+
 	ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift);
 	if (ptep && pte_present(*ptep) && pte_dirty(*ptep)) {
 		ret = 1;
@@ -1082,6 +1101,9 @@ void kvmppc_radix_flush_memslot(struct kvm *kvm,
 	unsigned long gpa;
 	unsigned int shift;
 
+	if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)
+		return;
+
 	gpa = memslot->base_gfn << PAGE_SHIFT;
 	spin_lock(&kvm->mmu_lock);
 	for (n = memslot->npages; n; --n) {
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index d9395a23b10d..6dbb11ee4bc0 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -70,6 +70,17 @@
  * Shared pages: Whenever guest shares a secure page, UV will split and
  * remap the 2MB page if required and issue H_SVM_PAGE_IN with 64K page size.
  *
+ * HV invalidating a page: When a regular page belonging to secure
+ * guest gets unmapped, HV informs UV with UV_PAGE_INVAL of 64K
+ * page size. Using 64K page size is correct here because any non-secure
+ * page will essentially be of 64K page size. Splitting by UV during sharing
+ * and page-out ensures this.
+ *
+ * Page fault handling: When HV handles page fault of a page belonging
+ * to secure guest, it sends that to UV with a 64K UV_PAGE_IN request.
+ * Using 64K size is correct here too as UV would have split the 2MB page
+ * into 64k mappings and would have done page-outs earlier.
+ *
  * In summary, the current secure pages handling code in HV assumes
  * 64K page size and in fact fails any page-in/page-out requests of
  * non-64K size upfront. If and when UV starts supporting multiple
@@ -604,6 +615,27 @@ kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long gpa,
 	return ret;
 }
 
+int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
+{
+	unsigned long pfn;
+	int ret = U_SUCCESS;
+
+	pfn = gfn_to_pfn(kvm, gfn);
+	if (is_error_noslot_pfn(pfn))
+		return -EFAULT;
+
+	mutex_lock(&kvm->arch.uvmem_lock);
+	if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL))
+		goto out;
+
+	ret = uv_page_in(kvm->arch.lpid, pfn << PAGE_SHIFT, gfn << PAGE_SHIFT,
+			 0, PAGE_SHIFT);
+out:
+	kvm_release_pfn_clean(pfn);
+	mutex_unlock(&kvm->arch.uvmem_lock);
+	return (ret == U_SUCCESS) ? RESUME_GUEST : -EFAULT;
+}
+
 static u64 kvmppc_get_secmem_size(void)
 {
 	struct device_node *np;
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v10 5/8] KVM: PPC: Handle memory plug/unplug to secure VM
  2019-11-04  4:17 [PATCH v10 0/8] KVM: PPC: Driver to manage pages of secure guest Bharata B Rao
                   ` (3 preceding siblings ...)
  2019-11-04  4:17 ` [PATCH v10 4/8] KVM: PPC: Radix changes for secure guest Bharata B Rao
@ 2019-11-04  4:17 ` Bharata B Rao
  2019-11-11  4:25   ` Paul Mackerras
  2019-11-04  4:17 ` [PATCH v10 6/8] KVM: PPC: Support reset of secure guest Bharata B Rao
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Bharata B Rao @ 2019-11-04  4:17 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc, linux-mm
  Cc: Sukadev Bhattiprolu, linuxram, cclaudio, Bharata B Rao, jglisse,
	aneesh.kumar, paulus, sukadev, hch

Register the new memslot with UV during plug and unregister
the memslot during unplug. In addition, release all the
device pages during unplug.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
	[Added skip_page_out arg to kvmppc_uvmem_drop_pages()]
---
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |  6 ++++
 arch/powerpc/include/asm/ultravisor-api.h   |  1 +
 arch/powerpc/include/asm/ultravisor.h       |  5 +++
 arch/powerpc/kvm/book3s_64_mmu_radix.c      |  3 ++
 arch/powerpc/kvm/book3s_hv.c                | 24 +++++++++++++
 arch/powerpc/kvm/book3s_hv_uvmem.c          | 37 +++++++++++++++++++++
 6 files changed, 76 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index 3033a9585b43..3cf8425b9838 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -19,6 +19,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
 unsigned long kvmppc_h_svm_init_start(struct kvm *kvm);
 unsigned long kvmppc_h_svm_init_done(struct kvm *kvm);
 int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn);
+void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
+			     struct kvm *kvm, bool skip_page_out);
 #else
 static inline int kvmppc_uvmem_init(void)
 {
@@ -64,5 +66,9 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
 {
 	return -EFAULT;
 }
+
+static inline void
+kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
+			struct kvm *kvm, bool skip_page_out) { }
 #endif /* CONFIG_PPC_UV */
 #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
diff --git a/arch/powerpc/include/asm/ultravisor-api.h b/arch/powerpc/include/asm/ultravisor-api.h
index e774274ab30e..4b0d044caa2a 100644
--- a/arch/powerpc/include/asm/ultravisor-api.h
+++ b/arch/powerpc/include/asm/ultravisor-api.h
@@ -27,6 +27,7 @@
 #define UV_RETURN			0xF11C
 #define UV_ESM				0xF110
 #define UV_REGISTER_MEM_SLOT		0xF120
+#define UV_UNREGISTER_MEM_SLOT		0xF124
 #define UV_PAGE_IN			0xF128
 #define UV_PAGE_OUT			0xF12C
 #define UV_SHARE_PAGE			0xF130
diff --git a/arch/powerpc/include/asm/ultravisor.h b/arch/powerpc/include/asm/ultravisor.h
index 40cc8bace654..b8e59b7b4ac8 100644
--- a/arch/powerpc/include/asm/ultravisor.h
+++ b/arch/powerpc/include/asm/ultravisor.h
@@ -67,6 +67,11 @@ static inline int uv_register_mem_slot(u64 lpid, u64 start_gpa, u64 size,
 			    size, flags, slotid);
 }
 
+static inline int uv_unregister_mem_slot(u64 lpid, u64 slotid)
+{
+	return ucall_norets(UV_UNREGISTER_MEM_SLOT, lpid, slotid);
+}
+
 static inline int uv_page_inval(u64 lpid, u64 gpa, u64 page_shift)
 {
 	return ucall_norets(UV_PAGE_INVAL, lpid, gpa, page_shift);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 4aec55a0ebc7..ee70bfc28c82 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1101,6 +1101,9 @@ void kvmppc_radix_flush_memslot(struct kvm *kvm,
 	unsigned long gpa;
 	unsigned int shift;
 
+	if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START)
+		kvmppc_uvmem_drop_pages(memslot, kvm, true);
+
 	if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)
 		return;
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 80e84277d11f..cb7ae1e9e4f2 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -74,6 +74,7 @@
 #include <asm/hw_breakpoint.h>
 #include <asm/kvm_host.h>
 #include <asm/kvm_book3s_uvmem.h>
+#include <asm/ultravisor.h>
 
 #include "book3s.h"
 
@@ -4532,6 +4533,29 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
 	if (change == KVM_MR_FLAGS_ONLY && kvm_is_radix(kvm) &&
 	    ((new->flags ^ old->flags) & KVM_MEM_LOG_DIRTY_PAGES))
 		kvmppc_radix_flush_memslot(kvm, old);
+	/*
+	 * If UV hasn't yet called H_SVM_INIT_START, don't register memslots.
+	 */
+	if (!kvm->arch.secure_guest)
+		return;
+
+	switch (change) {
+	case KVM_MR_CREATE:
+		if (kvmppc_uvmem_slot_init(kvm, new))
+			return;
+		uv_register_mem_slot(kvm->arch.lpid,
+				     new->base_gfn << PAGE_SHIFT,
+				     new->npages * PAGE_SIZE,
+				     0, new->id);
+		break;
+	case KVM_MR_DELETE:
+		uv_unregister_mem_slot(kvm->arch.lpid, old->id);
+		kvmppc_uvmem_slot_free(kvm, old);
+		break;
+	default:
+		/* TODO: Handle KVM_MR_MOVE */
+		break;
+	}
 }
 
 /*
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 6dbb11ee4bc0..6f26a0351a31 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -241,6 +241,43 @@ unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
 	return H_SUCCESS;
 }
 
+/*
+ * Drop device pages that we maintain for the secure guest
+ *
+ * We first mark the pages to be skipped from UV_PAGE_OUT when there
+ * is HV side fault on these pages. Next we *get* these pages, forcing
+ * fault on them, do fault time migration to replace the device PTEs in
+ * QEMU page table with normal PTEs from newly allocated pages.
+ */
+void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
+			     struct kvm *kvm, bool skip_page_out)
+{
+	int i;
+	struct kvmppc_uvmem_page_pvt *pvt;
+	unsigned long pfn, uvmem_pfn;
+	unsigned long gfn = free->base_gfn;
+
+	for (i = free->npages; i; --i, ++gfn) {
+		struct page *uvmem_page;
+
+		mutex_lock(&kvm->arch.uvmem_lock);
+		if (!kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
+			mutex_unlock(&kvm->arch.uvmem_lock);
+			continue;
+		}
+
+		uvmem_page = pfn_to_page(uvmem_pfn);
+		pvt = uvmem_page->zone_device_data;
+		pvt->skip_page_out = skip_page_out;
+		mutex_unlock(&kvm->arch.uvmem_lock);
+
+		pfn = gfn_to_pfn(kvm, gfn);
+		if (is_error_noslot_pfn(pfn))
+			continue;
+		kvm_release_pfn_clean(pfn);
+	}
+}
+
 /*
  * Get a free device PFN from the pool
  *
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v10 6/8] KVM: PPC: Support reset of secure guest
  2019-11-04  4:17 [PATCH v10 0/8] KVM: PPC: Driver to manage pages of secure guest Bharata B Rao
                   ` (4 preceding siblings ...)
  2019-11-04  4:17 ` [PATCH v10 5/8] KVM: PPC: Handle memory plug/unplug to secure VM Bharata B Rao
@ 2019-11-04  4:17 ` Bharata B Rao
  2019-11-11  5:28   ` Paul Mackerras
  2019-11-12  5:34   ` Paul Mackerras
  2019-11-04  4:17 ` [PATCH v10 7/8] KVM: PPC: Implement H_SVM_INIT_ABORT hcall Bharata B Rao
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 39+ messages in thread
From: Bharata B Rao @ 2019-11-04  4:17 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc, linux-mm
  Cc: linuxram, cclaudio, Bharata B Rao, jglisse, aneesh.kumar, paulus,
	sukadev, hch

Add support for reset of secure guest via a new ioctl KVM_PPC_SVM_OFF.
This ioctl will be issued by QEMU during reset and includes the
the following steps:

- Ask UV to terminate the guest via UV_SVM_TERMINATE ucall
- Unpin the VPA pages so that they can be migrated back to secure
  side when guest becomes secure again. This is required because
  pinned pages can't be migrated.
- Reinitialize guest's partitioned scoped page tables. These are
  freed when guest becomes secure (H_SVM_INIT_DONE)
- Release all device pages of the secure guest.

After these steps, guest is ready to issue UV_ESM call once again
to switch to secure mode.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
	[Implementation of uv_svm_terminate() and its call from
	guest shutdown path]
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
	[Unpinning of VPA pages]
---
 Documentation/virt/kvm/api.txt            | 19 +++++
 arch/powerpc/include/asm/kvm_ppc.h        |  2 +
 arch/powerpc/include/asm/ultravisor-api.h |  1 +
 arch/powerpc/include/asm/ultravisor.h     |  5 ++
 arch/powerpc/kvm/book3s_hv.c              | 88 +++++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv_uvmem.c        |  6 ++
 arch/powerpc/kvm/powerpc.c                | 12 ++++
 include/uapi/linux/kvm.h                  |  1 +
 8 files changed, 134 insertions(+)

diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
index 4833904d32a5..1b2e1d2002ba 100644
--- a/Documentation/virt/kvm/api.txt
+++ b/Documentation/virt/kvm/api.txt
@@ -4126,6 +4126,25 @@ Valid values for 'action':
 #define KVM_PMU_EVENT_ALLOW 0
 #define KVM_PMU_EVENT_DENY 1
 
+4.121 KVM_PPC_SVM_OFF
+
+Capability: basic
+Architectures: powerpc
+Type: vm ioctl
+Parameters: none
+Returns: 0 on successful completion,
+Errors:
+  EINVAL:    if ultravisor failed to terminate the secure guest
+  ENOMEM:    if hypervisor failed to allocate new radix page tables for guest
+
+This ioctl is used to turn off the secure mode of the guest or transition
+the guest from secure mode to normal mode. This is invoked when the guest
+is reset. This has no effect if called for a normal guest.
+
+This ioctl issues an ultravisor call to terminate the secure guest,
+unpins the VPA pages, reinitializes guest's partition scoped page
+tables and releases all the device pages that are used to track the
+secure pages by hypervisor.
 
 5. The kvm_run structure
 ------------------------
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index ee62776e5433..6d1bb597fe95 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -177,6 +177,7 @@ extern void kvm_spapr_tce_release_iommu_group(struct kvm *kvm,
 extern int kvmppc_switch_mmu_to_hpt(struct kvm *kvm);
 extern int kvmppc_switch_mmu_to_radix(struct kvm *kvm);
 extern void kvmppc_setup_partition_table(struct kvm *kvm);
+int kvmppc_reinit_partition_table(struct kvm *kvm);
 
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 				struct kvm_create_spapr_tce_64 *args);
@@ -321,6 +322,7 @@ struct kvmppc_ops {
 			       int size);
 	int (*store_to_eaddr)(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr,
 			      int size);
+	int (*svm_off)(struct kvm *kvm);
 };
 
 extern struct kvmppc_ops *kvmppc_hv_ops;
diff --git a/arch/powerpc/include/asm/ultravisor-api.h b/arch/powerpc/include/asm/ultravisor-api.h
index 4b0d044caa2a..b66f6db7be6c 100644
--- a/arch/powerpc/include/asm/ultravisor-api.h
+++ b/arch/powerpc/include/asm/ultravisor-api.h
@@ -34,5 +34,6 @@
 #define UV_UNSHARE_PAGE			0xF134
 #define UV_UNSHARE_ALL_PAGES		0xF140
 #define UV_PAGE_INVAL			0xF138
+#define UV_SVM_TERMINATE		0xF13C
 
 #endif /* _ASM_POWERPC_ULTRAVISOR_API_H */
diff --git a/arch/powerpc/include/asm/ultravisor.h b/arch/powerpc/include/asm/ultravisor.h
index b8e59b7b4ac8..790b0e63681f 100644
--- a/arch/powerpc/include/asm/ultravisor.h
+++ b/arch/powerpc/include/asm/ultravisor.h
@@ -77,4 +77,9 @@ static inline int uv_page_inval(u64 lpid, u64 gpa, u64 page_shift)
 	return ucall_norets(UV_PAGE_INVAL, lpid, gpa, page_shift);
 }
 
+static inline int uv_svm_terminate(u64 lpid)
+{
+	return ucall_norets(UV_SVM_TERMINATE, lpid);
+}
+
 #endif	/* _ASM_POWERPC_ULTRAVISOR_H */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cb7ae1e9e4f2..d2bc4e9bbe7e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2443,6 +2443,15 @@ static void unpin_vpa(struct kvm *kvm, struct kvmppc_vpa *vpa)
 					vpa->dirty);
 }
 
+static void unpin_vpa_reset(struct kvm *kvm, struct kvmppc_vpa *vpa)
+{
+	unpin_vpa(kvm, vpa);
+	vpa->gpa = 0;
+	vpa->pinned_addr = NULL;
+	vpa->dirty = false;
+	vpa->update_pending = 0;
+}
+
 static void kvmppc_core_vcpu_free_hv(struct kvm_vcpu *vcpu)
 {
 	spin_lock(&vcpu->arch.vpa_update_lock);
@@ -4611,6 +4620,25 @@ void kvmppc_setup_partition_table(struct kvm *kvm)
 	kvmhv_set_ptbl_entry(kvm->arch.lpid, dw0, dw1);
 }
 
+/*
+ * Called from KVM_PPC_SVM_OFF ioctl at guest reset time when secure
+ * guest is converted back to normal guest.
+ */
+int kvmppc_reinit_partition_table(struct kvm *kvm)
+{
+	int ret;
+
+	if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE))
+		return 0;
+
+	ret = kvmppc_init_vm_radix(kvm);
+	if (ret)
+		return ret;
+
+	kvmppc_setup_partition_table(kvm);
+	return 0;
+}
+
 /*
  * Set up HPT (hashed page table) and RMA (real-mode area).
  * Must be called with kvm->arch.mmu_setup_lock held.
@@ -5000,6 +5028,7 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
 		if (nesting_enabled(kvm))
 			kvmhv_release_all_nested(kvm);
 		kvm->arch.process_table = 0;
+		uv_svm_terminate(kvm->arch.lpid);
 		kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
 	}
 
@@ -5442,6 +5471,64 @@ static int kvmhv_store_to_eaddr(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr,
 	return rc;
 }
 
+/*
+ *  IOCTL handler to turn off secure mode of guest
+ *
+ * - Issue ucall to terminate the guest on the UV side
+ * - Unpin the VPA pages (Enables these pages to be migrated back
+ *   when VM becomes secure again)
+ * - Recreate partition table as the guest is transitioning back to
+ *   normal mode
+ * - Release all device pages
+ */
+static int kvmhv_svm_off(struct kvm *kvm)
+{
+	struct kvm_vcpu *vcpu;
+	int srcu_idx;
+	int ret = 0;
+	int i;
+
+	if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
+		return ret;
+
+	srcu_idx = srcu_read_lock(&kvm->srcu);
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		struct kvm_memory_slot *memslot;
+		struct kvm_memslots *slots = __kvm_memslots(kvm, i);
+
+		if (!slots)
+			continue;
+
+		kvm_for_each_memslot(memslot, slots) {
+			kvmppc_uvmem_drop_pages(memslot, kvm, true);
+			uv_unregister_mem_slot(kvm->arch.lpid, memslot->id);
+		}
+	}
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+
+	ret = uv_svm_terminate(kvm->arch.lpid);
+	if (ret != U_SUCCESS) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		spin_lock(&vcpu->arch.vpa_update_lock);
+		unpin_vpa_reset(kvm, &vcpu->arch.dtl);
+		unpin_vpa_reset(kvm, &vcpu->arch.slb_shadow);
+		unpin_vpa_reset(kvm, &vcpu->arch.vpa);
+		spin_unlock(&vcpu->arch.vpa_update_lock);
+	}
+
+	ret = kvmppc_reinit_partition_table(kvm);
+	if (ret)
+		goto out;
+
+	kvm->arch.secure_guest = 0;
+out:
+	return ret;
+}
+
 static struct kvmppc_ops kvm_ops_hv = {
 	.get_sregs = kvm_arch_vcpu_ioctl_get_sregs_hv,
 	.set_sregs = kvm_arch_vcpu_ioctl_set_sregs_hv,
@@ -5484,6 +5571,7 @@ static struct kvmppc_ops kvm_ops_hv = {
 	.enable_nested = kvmhv_enable_nested,
 	.load_from_eaddr = kvmhv_load_from_eaddr,
 	.store_to_eaddr = kvmhv_store_to_eaddr,
+	.svm_off = kvmhv_svm_off,
 };
 
 static int kvm_init_subcore_bitmap(void)
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 6f26a0351a31..2df0d3f80c60 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -93,6 +93,7 @@
 #include <linux/ksm.h>
 #include <asm/ultravisor.h>
 #include <asm/mman.h>
+#include <asm/kvm_ppc.h>
 
 static struct dev_pagemap kvmppc_uvmem_pgmap;
 static unsigned long *kvmppc_uvmem_bitmap;
@@ -237,6 +238,11 @@ unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
 	if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
 		return H_UNSUPPORTED;
 
+	if (kvm_is_radix(kvm)) {
+		kvmppc_free_radix(kvm);
+		pr_info("LPID %d went secure, freed HV side radix pgtables\n",
+			kvm->arch.lpid);
+	}
 	kvm->arch.secure_guest |= KVMPPC_SECURE_INIT_DONE;
 	return H_SUCCESS;
 }
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 3a77bb643452..ec9713c1d928 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -31,6 +31,8 @@
 #include <asm/hvcall.h>
 #include <asm/plpar_wrappers.h>
 #endif
+#include <asm/ultravisor.h>
+#include <asm/kvm_host.h>
 
 #include "timing.h"
 #include "irq.h"
@@ -2411,6 +2413,16 @@ long kvm_arch_vm_ioctl(struct file *filp,
 			r = -EFAULT;
 		break;
 	}
+	case KVM_PPC_SVM_OFF: {
+		struct kvm *kvm = filp->private_data;
+
+		r = 0;
+		if (!kvm->arch.kvm_ops->svm_off)
+			goto out;
+
+		r = kvm->arch.kvm_ops->svm_off(kvm);
+		break;
+	}
 	default: {
 		struct kvm *kvm = filp->private_data;
 		r = kvm->arch.kvm_ops->arch_vm_ioctl(filp, ioctl, arg);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 52641d8ca9e8..efa8ad88cbd2 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1337,6 +1337,7 @@ struct kvm_s390_ucas_mapping {
 #define KVM_PPC_GET_CPU_CHAR	  _IOR(KVMIO,  0xb1, struct kvm_ppc_cpu_char)
 /* Available with KVM_CAP_PMU_EVENT_FILTER */
 #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
+#define KVM_PPC_SVM_OFF		  _IO(KVMIO,  0xb3)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v10 7/8] KVM: PPC: Implement H_SVM_INIT_ABORT hcall
  2019-11-04  4:17 [PATCH v10 0/8] KVM: PPC: Driver to manage pages of secure guest Bharata B Rao
                   ` (5 preceding siblings ...)
  2019-11-04  4:17 ` [PATCH v10 6/8] KVM: PPC: Support reset of secure guest Bharata B Rao
@ 2019-11-04  4:17 ` Bharata B Rao
  2019-11-11  4:19   ` Paul Mackerras
  2019-11-04  4:18 ` [PATCH v10 8/8] KVM: PPC: Ultravisor: Add PPC_UV config option Bharata B Rao
  2019-11-06  4:30 ` [PATCH v10 0/8] KVM: PPC: Driver to manage pages of secure guest Paul Mackerras
  8 siblings, 1 reply; 39+ messages in thread
From: Bharata B Rao @ 2019-11-04  4:17 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc, linux-mm
  Cc: Sukadev Bhattiprolu, linuxram, cclaudio, Bharata B Rao, jglisse,
	Ram Pai, aneesh.kumar, paulus, sukadev, hch

From: Sukadev Bhattiprolu <sukadev@linux.ibm.com>

Implement the H_SVM_INIT_ABORT hcall which the Ultravisor can use to
abort an SVM after it has issued the H_SVM_INIT_START and before the
H_SVM_INIT_DONE hcalls. This hcall could be used when Ultravisor
encounters security violations or other errors when starting an SVM.

Note that this hcall is different from UV_SVM_TERMINATE ucall which
is used by HV to terminate/cleanup an SVM.

In case of H_SVM_INIT_ABORT, we should page-out all the pages back to
HV (i.e., we should not skip the page-out). Otherwise the VM's pages,
possibly including its text/data would be stuck in secure memory.
Since the SVM did not go secure, its MSR_S bit will be clear and the
VM wont be able to access its pages even to do a clean exit.

Based on patches and discussion with Ram Pai and Bharata Rao.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Ram Pai <linuxram@linux.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 Documentation/powerpc/ultravisor.rst        | 39 +++++++++++++++++++++
 arch/powerpc/include/asm/hvcall.h           |  1 +
 arch/powerpc/include/asm/kvm_book3s_uvmem.h |  6 ++++
 arch/powerpc/include/asm/kvm_host.h         |  1 +
 arch/powerpc/kvm/book3s_hv.c                |  3 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S     | 23 ++++++++++--
 arch/powerpc/kvm/book3s_hv_uvmem.c          | 29 +++++++++++++++
 7 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/Documentation/powerpc/ultravisor.rst b/Documentation/powerpc/ultravisor.rst
index 730854f73830..286cabadc566 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -948,6 +948,45 @@ Use cases
     up its internal state for this virtual machine.
 
 
+H_SVM_INIT_ABORT
+----------------
+
+    Abort the process of securing an SVM.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+	uint64_t hypercall(const uint64_t H_SVM_INIT_ABORT)
+
+Return values
+~~~~~~~~~~~~~
+
+    One of the following values:
+
+	* H_SUCCESS 		on success.
+	* H_UNSUPPORTED		if called from the wrong context (e.g.
+				from an SVM or before an H_SVM_INIT_START
+				hypercall).
+
+Description
+~~~~~~~~~~~
+
+    Abort the process of securing a virtual machine. This call must
+    be made after a prior call to ``H_SVM_INIT_START`` hypercall.
+
+Use cases
+~~~~~~~~~
+
+
+    On successfully securing a virtual machine, the Ultravisor informs
+    If the Ultravisor is unable to secure a virtual machine either due
+    to lack of resources or because the VM's security information could
+    not be validated, Ultravisor informs the Hypervisor about it.
+    Hypervisor can use this call to clean up any internal state for this
+    virtual machine.
+
 H_SVM_PAGE_IN
 -------------
 
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 13bd870609c3..e90c073e437e 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -350,6 +350,7 @@
 #define H_SVM_PAGE_OUT		0xEF04
 #define H_SVM_INIT_START	0xEF08
 #define H_SVM_INIT_DONE		0xEF0C
+#define H_SVM_INIT_ABORT	0xEF14
 
 /* Values for 2nd argument to H_SET_MODE */
 #define H_SET_MODE_RESOURCE_SET_CIABR		1
diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
index 3cf8425b9838..eaea400ea715 100644
--- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
+++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
@@ -18,6 +18,7 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
 				    unsigned long page_shift);
 unsigned long kvmppc_h_svm_init_start(struct kvm *kvm);
 unsigned long kvmppc_h_svm_init_done(struct kvm *kvm);
+unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm);
 int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn);
 void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 			     struct kvm *kvm, bool skip_page_out);
@@ -62,6 +63,11 @@ static inline unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
 	return H_UNSUPPORTED;
 }
 
+static inline unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm)
+{
+	return H_UNSUPPORTED;
+}
+
 static inline int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
 {
 	return -EFAULT;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 577ca95fac7c..8310c0407383 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -278,6 +278,7 @@ struct kvm_resize_hpt;
 /* Flag values for kvm_arch.secure_guest */
 #define KVMPPC_SECURE_INIT_START 0x1 /* H_SVM_INIT_START has been called */
 #define KVMPPC_SECURE_INIT_DONE  0x2 /* H_SVM_INIT_DONE completed */
+#define KVMPPC_SECURE_INIT_ABORT 0x4 /* H_SVM_INIT_ABORT issued */
 
 struct kvm_arch {
 	unsigned int lpid;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d2bc4e9bbe7e..ad4e38ce7b55 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1099,6 +1099,9 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 	case H_SVM_INIT_DONE:
 		ret = kvmppc_h_svm_init_done(vcpu->kvm);
 		break;
+	case H_SVM_INIT_ABORT:
+		ret = kvmppc_h_svm_init_abort(vcpu->kvm);
+		break;
 
 	default:
 		return RESUME_HOST;
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index faebcbb8c4db..8d192c9947cd 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1112,10 +1112,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	ld	r6, VCPU_KVM(r4)
 	lbz	r7, KVM_SECURE_GUEST(r6)
 	cmpdi	r7, 0
+	bne	check_svm_abort
+
 	ld	r6, VCPU_GPR(R6)(r4)
 	ld	r7, VCPU_GPR(R7)(r4)
-	bne	ret_to_ultra
-
 	lwz	r0, VCPU_CR(r4)
 	mtcr	r0
 
@@ -1125,6 +1125,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	ld	r4, VCPU_GPR(R4)(r4)
 	HRFI_TO_GUEST
 	b	.
+
+/*
+ * If SVM is about to abort, return to UV one last time but clear the
+ * secure_guest state so future fast_guest_returns return to the normal
+ * VM. We expect following state and we will restore the state.
+ *   R6 = kvm
+ *   R7 = kvm->secure_guest
+ */
+check_svm_abort:
+
+	cmpdi	r7, 4	/* KVMPPC_SECURE_INIT_ABORT */
+	bne	ret_to_ultra
+	li	r7, 0
+	stb	r7, KVM_SECURE_GUEST(r6)
+
 /*
  * Use UV_RETURN ultracall to return control back to the Ultravisor after
  * processing an hypercall or interrupt that was forwarded (a.k.a. reflected)
@@ -1134,8 +1149,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
  *   R0 = hcall result
  *   R2 = SRR1, so UV can detect a synthesized interrupt (if any)
  *   R3 = UV_RETURN
+ *   R6 = kvm (to be restored)
+ *   R7 = kvm->secure_guest (to be restored)
  */
 ret_to_ultra:
+	ld	r6, VCPU_GPR(R6)(r4)
+	ld	r7, VCPU_GPR(R7)(r4)
 	lwz	r0, VCPU_CR(r4)
 	mtcr	r0
 
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 2df0d3f80c60..627dfe4abf08 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -284,6 +284,35 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
 	}
 }
 
+unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm)
+{
+	int i;
+	int srcu_idx;
+
+	if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
+		return H_UNSUPPORTED;
+
+	srcu_idx = srcu_read_lock(&kvm->srcu);
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		struct kvm_memory_slot *memslot;
+		struct kvm_memslots *slots = __kvm_memslots(kvm, i);
+
+		if (!slots)
+			continue;
+
+		kvm_for_each_memslot(memslot, slots) {
+			kvmppc_uvmem_drop_pages(memslot, kvm, false);
+			uv_unregister_mem_slot(kvm->arch.lpid, memslot->id);
+			kvmppc_uvmem_slot_free(kvm, memslot);
+		}
+	}
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+
+	kvm->arch.secure_guest = KVMPPC_SECURE_INIT_ABORT;
+	pr_info("LPID %d: Switching to secure aborted\n", kvm->arch.lpid);
+	return H_SUCCESS;
+}
+
 /*
  * Get a free device PFN from the pool
  *
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v10 8/8] KVM: PPC: Ultravisor: Add PPC_UV config option
  2019-11-04  4:17 [PATCH v10 0/8] KVM: PPC: Driver to manage pages of secure guest Bharata B Rao
                   ` (6 preceding siblings ...)
  2019-11-04  4:17 ` [PATCH v10 7/8] KVM: PPC: Implement H_SVM_INIT_ABORT hcall Bharata B Rao
@ 2019-11-04  4:18 ` Bharata B Rao
  2019-11-06  4:30 ` [PATCH v10 0/8] KVM: PPC: Driver to manage pages of secure guest Paul Mackerras
  8 siblings, 0 replies; 39+ messages in thread
From: Bharata B Rao @ 2019-11-04  4:18 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc, linux-mm
  Cc: Sukadev Bhattiprolu, linuxram, cclaudio, Bharata B Rao, jglisse,
	aneesh.kumar, paulus, sukadev, hch, Anshuman Khandual

From: Anshuman Khandual <khandual@linux.vnet.ibm.com>

CONFIG_PPC_UV adds support for ultravisor.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
[ Update config help and commit message ]
Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
---
 arch/powerpc/Kconfig | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 3e56c9c2f16e..d7fef29b47c9 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -451,6 +451,23 @@ config PPC_TRANSACTIONAL_MEM
 	help
 	  Support user-mode Transactional Memory on POWERPC.
 
+config PPC_UV
+	bool "Ultravisor support"
+	depends on KVM_BOOK3S_HV_POSSIBLE
+	select ZONE_DEVICE
+	select DEV_PAGEMAP_OPS
+	select DEVICE_PRIVATE
+	select MEMORY_HOTPLUG
+	select MEMORY_HOTREMOVE
+	default n
+	help
+	  This option paravirtualizes the kernel to run in POWER platforms that
+	  supports the Protected Execution Facility (PEF). On such platforms,
+	  the ultravisor firmware runs at a privilege level above the
+	  hypervisor.
+
+	  If unsure, say "N".
+
 config LD_HEAD_STUB_CATCH
 	bool "Reserve 256 bytes to cope with linker stubs in HEAD text" if EXPERT
 	depends on PPC64
-- 
2.21.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 0/8] KVM: PPC: Driver to manage pages of secure guest
  2019-11-04  4:17 [PATCH v10 0/8] KVM: PPC: Driver to manage pages of secure guest Bharata B Rao
                   ` (7 preceding siblings ...)
  2019-11-04  4:18 ` [PATCH v10 8/8] KVM: PPC: Ultravisor: Add PPC_UV config option Bharata B Rao
@ 2019-11-06  4:30 ` Paul Mackerras
  2019-11-06  6:20   ` Bharata B Rao
  8 siblings, 1 reply; 39+ messages in thread
From: Paul Mackerras @ 2019-11-06  4:30 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: linuxram, cclaudio, kvm-ppc, linux-mm, jglisse, aneesh.kumar,
	paulus, sukadev, linuxppc-dev, hch

On Mon, Nov 04, 2019 at 09:47:52AM +0530, Bharata B Rao wrote:
> Hi,
> 
> This is the next version of the patchset that adds required support
> in the KVM hypervisor to run secure guests on PEF-enabled POWER platforms.
> 
> The major change in this version is about not using kvm.arch->rmap[]
> array to store device PFNs, thus not depending on the memslot availability
> to reach to the device PFN from the fault path. Instead of rmap[], we
> now have a different array which gets created and destroyed along with
> memslot creation and deletion. These arrays hang off from kvm.arch and
> are arragned in a simple linked list for now. We could move to some other
> data structure in future if walking of linked list becomes an overhead
> due to large number of memslots.

Thanks.  This is looking really close now.

> Other changes include:
> 
> - Rearranged/Merged/Cleaned up patches, removed all Acks/Reviewed-by since
>   all the patches have changed.
> - Added a new patch to support H_SVM_INIT_ABORT hcall (From Suka)
> - Added KSM unmerge support so that VMAs that have device PFNs don't
>   participate in KSM merging and eventually crash in KSM code.
> - Release device pages during unplug (Paul) and ensure that memory
>   hotplug and unplug works correctly.
> - Let kvm-hv module to load on PEF-disabled platforms (Ram) when
>   CONFIG_PPC_UV is enabled allowing regular non-secure guests
>   to still run.
> - Support guest reset when swithing to secure is in progress.
> - Check if page is already secure in kvmppc_send_page_to_uv() before
>   sending it to UV.
> - Fixed sentinal for header file kvm_book3s_uvmem.h (Jason)
> 
> Now, all the dependencies required by this patchset are in powerpc/next
> on which this patchset is based upon.

Can you tell me what patches that are in powerpc/next but not upstream
this depends on?

Paul.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 1/8] mm: ksm: Export ksm_madvise()
  2019-11-04  4:17 ` [PATCH v10 1/8] mm: ksm: Export ksm_madvise() Bharata B Rao
@ 2019-11-06  4:33   ` Paul Mackerras
  2019-11-06  6:45     ` Bharata B Rao
  0 siblings, 1 reply; 39+ messages in thread
From: Paul Mackerras @ 2019-11-06  4:33 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: linuxram, cclaudio, kvm-ppc, linux-mm, jglisse, aneesh.kumar,
	paulus, sukadev, linuxppc-dev, hch

On Mon, Nov 04, 2019 at 09:47:53AM +0530, Bharata B Rao wrote:
> KVM PPC module needs ksm_madvise() for supporting secure guests.
> Guest pages that become secure are represented as device private
> pages in the host. Such pages shouldn't participate in KSM merging.

If we don't do the ksm_madvise call, then as far as I can tell, it
should all still work correctly, but we might have KSM pulling pages
in unnecessarily, causing a reduction in performance.  Is that right?

> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>

Acked-by: Paul Mackerras <paulus@ozlabs.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 2/8] KVM: PPC: Support for running secure guests
  2019-11-04  4:17 ` [PATCH v10 2/8] KVM: PPC: Support for running secure guests Bharata B Rao
@ 2019-11-06  4:34   ` Paul Mackerras
  0 siblings, 0 replies; 39+ messages in thread
From: Paul Mackerras @ 2019-11-06  4:34 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: linuxram, cclaudio, kvm-ppc, linux-mm, jglisse, aneesh.kumar,
	paulus, sukadev, linuxppc-dev, hch

On Mon, Nov 04, 2019 at 09:47:54AM +0530, Bharata B Rao wrote:
> A pseries guest can be run as secure guest on Ultravisor-enabled
> POWER platforms. On such platforms, this driver will be used to manage
> the movement of guest pages between the normal memory managed by
> hypervisor (HV) and secure memory managed by Ultravisor (UV).
> 
> HV is informed about the guest's transition to secure mode via hcalls:
> 
> H_SVM_INIT_START: Initiate securing a VM
> H_SVM_INIT_DONE: Conclude securing a VM
> 
> As part of H_SVM_INIT_START, register all existing memslots with
> the UV. H_SVM_INIT_DONE call by UV informs HV that transition of
> the guest to secure mode is complete.
> 
> These two states (transition to secure mode STARTED and transition
> to secure mode COMPLETED) are recorded in kvm->arch.secure_guest.
> Setting these states will cause the assembly code that enters the
> guest to call the UV_RETURN ucall instead of trying to enter the
> guest directly.
> 
> Migration of pages betwen normal and secure memory of secure
> guest is implemented in H_SVM_PAGE_IN and H_SVM_PAGE_OUT hcalls.
> 
> H_SVM_PAGE_IN: Move the content of a normal page to secure page
> H_SVM_PAGE_OUT: Move the content of a secure page to normal page
> 
> Private ZONE_DEVICE memory equal to the amount of secure memory
> available in the platform for running secure guests is created.
> Whenever a page belonging to the guest becomes secure, a page from
> this private device memory is used to represent and track that secure
> page on the HV side. The movement of pages between normal and secure
> memory is done via migrate_vma_pages() using UV_PAGE_IN and
> UV_PAGE_OUT ucalls.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>

Reviewed-by: Paul Mackerras <paulus@ozlabs.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 3/8] KVM: PPC: Shared pages support for secure guests
  2019-11-04  4:17 ` [PATCH v10 3/8] KVM: PPC: Shared pages support for " Bharata B Rao
@ 2019-11-06  4:52   ` Paul Mackerras
  2019-11-06  8:22     ` Bharata B Rao
  0 siblings, 1 reply; 39+ messages in thread
From: Paul Mackerras @ 2019-11-06  4:52 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: linuxram, cclaudio, kvm-ppc, linux-mm, jglisse, aneesh.kumar,
	paulus, sukadev, linuxppc-dev, hch

On Mon, Nov 04, 2019 at 09:47:55AM +0530, Bharata B Rao wrote:
> A secure guest will share some of its pages with hypervisor (Eg. virtio
> bounce buffers etc). Support sharing of pages between hypervisor and
> ultravisor.
> 
> Shared page is reachable via both HV and UV side page tables. Once a
> secure page is converted to shared page, the device page that represents
> the secure page is unmapped from the HV side page tables.

I'd like to understand a little better what's going on - see below...

> +/*
> + * Shares the page with HV, thus making it a normal page.
> + *
> + * - If the page is already secure, then provision a new page and share
> + * - If the page is a normal page, share the existing page
> + *
> + * In the former case, uses dev_pagemap_ops.migrate_to_ram handler
> + * to unmap the device page from QEMU's page tables.
> + */
> +static unsigned long
> +kvmppc_share_page(struct kvm *kvm, unsigned long gpa, unsigned long page_shift)
> +{
> +
> +	int ret = H_PARAMETER;
> +	struct page *uvmem_page;
> +	struct kvmppc_uvmem_page_pvt *pvt;
> +	unsigned long pfn;
> +	unsigned long gfn = gpa >> page_shift;
> +	int srcu_idx;
> +	unsigned long uvmem_pfn;
> +
> +	srcu_idx = srcu_read_lock(&kvm->srcu);
> +	mutex_lock(&kvm->arch.uvmem_lock);
> +	if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
> +		uvmem_page = pfn_to_page(uvmem_pfn);
> +		pvt = uvmem_page->zone_device_data;
> +		pvt->skip_page_out = true;
> +	}
> +
> +retry:
> +	mutex_unlock(&kvm->arch.uvmem_lock);
> +	pfn = gfn_to_pfn(kvm, gfn);

At this point, pfn is the value obtained from the page table for
userspace (e.g. QEMU), right?  I would think it should be equal to
uvmem_pfn in most cases, shouldn't it?  If not, what is it going to
be?

> +	if (is_error_noslot_pfn(pfn))
> +		goto out;
> +
> +	mutex_lock(&kvm->arch.uvmem_lock);
> +	if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
> +		uvmem_page = pfn_to_page(uvmem_pfn);
> +		pvt = uvmem_page->zone_device_data;
> +		pvt->skip_page_out = true;
> +		kvm_release_pfn_clean(pfn);

This is going to do a put_page(), unless pfn is a reserved pfn.  If it
does a put_page(), where did we do the corresponding get_page()?
However, since kvmppc_gfn_is_uvmem_pfn() returned true, doesn't that
mean that pfn here should be a device pfn, and in fact should be the
same as uvmem_pfn (possibly with some extra bit(s) set)?  What does
kvm_is_reserved_pfn() return for a device pfn?

Paul.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 4/8] KVM: PPC: Radix changes for secure guest
  2019-11-04  4:17 ` [PATCH v10 4/8] KVM: PPC: Radix changes for secure guest Bharata B Rao
@ 2019-11-06  5:58   ` Paul Mackerras
  2019-11-06  8:36     ` Bharata B Rao
  0 siblings, 1 reply; 39+ messages in thread
From: Paul Mackerras @ 2019-11-06  5:58 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: linuxram, cclaudio, kvm-ppc, linux-mm, jglisse, aneesh.kumar,
	paulus, sukadev, linuxppc-dev, hch

On Mon, Nov 04, 2019 at 09:47:56AM +0530, Bharata B Rao wrote:
> - After the guest becomes secure, when we handle a page fault of a page
>   belonging to SVM in HV, send that page to UV via UV_PAGE_IN.
> - Whenever a page is unmapped on the HV side, inform UV via UV_PAGE_INVAL.
> - Ensure all those routines that walk the secondary page tables of
>   the guest don't do so in case of secure VM. For secure guest, the
>   active secondary page tables are in secure memory and the secondary
>   page tables in HV are freed when guest becomes secure.

Why do we free the page tables?  Just to save a little memory?  It
feels like it would make things more fragile.

Also, I don't see where the freeing gets done in this patch.

Paul.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 0/8] KVM: PPC: Driver to manage pages of secure guest
  2019-11-06  4:30 ` [PATCH v10 0/8] KVM: PPC: Driver to manage pages of secure guest Paul Mackerras
@ 2019-11-06  6:20   ` Bharata B Rao
  0 siblings, 0 replies; 39+ messages in thread
From: Bharata B Rao @ 2019-11-06  6:20 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linuxram, cclaudio, kvm-ppc, linux-mm, jglisse, aneesh.kumar,
	paulus, sukadev, linuxppc-dev, hch

On Wed, Nov 06, 2019 at 03:30:58PM +1100, Paul Mackerras wrote:
> On Mon, Nov 04, 2019 at 09:47:52AM +0530, Bharata B Rao wrote:
> > 
> > Now, all the dependencies required by this patchset are in powerpc/next
> > on which this patchset is based upon.
> 
> Can you tell me what patches that are in powerpc/next but not upstream
> this depends on?

Sorry, I should have been clear. All the dependencies are upstream.

Regards,
Bharata.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 1/8] mm: ksm: Export ksm_madvise()
  2019-11-06  4:33   ` Paul Mackerras
@ 2019-11-06  6:45     ` Bharata B Rao
  2019-11-07  5:45       ` Paul Mackerras
  0 siblings, 1 reply; 39+ messages in thread
From: Bharata B Rao @ 2019-11-06  6:45 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linuxram, cclaudio, kvm-ppc, linux-mm, jglisse, aneesh.kumar,
	paulus, sukadev, linuxppc-dev, hch

On Wed, Nov 06, 2019 at 03:33:29PM +1100, Paul Mackerras wrote:
> On Mon, Nov 04, 2019 at 09:47:53AM +0530, Bharata B Rao wrote:
> > KVM PPC module needs ksm_madvise() for supporting secure guests.
> > Guest pages that become secure are represented as device private
> > pages in the host. Such pages shouldn't participate in KSM merging.
> 
> If we don't do the ksm_madvise call, then as far as I can tell, it
> should all still work correctly, but we might have KSM pulling pages
> in unnecessarily, causing a reduction in performance.  Is that right?

I thought so too. When KSM tries to merge a secure page, it should
cause a fault resulting in page-out the secure page. However I see
the below crash when KSM is enabled and KSM scan tries to kmap and
memcmp the device private page.

BUG: Unable to handle kernel data access at 0xc007fffe00010000
Faulting instruction address: 0xc0000000000ab5a0
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 22 Comm: ksmd Not tainted 5.4.0-rc2-00026-g2249c0ae4a53-dirty #376
NIP:  c0000000000ab5a0 LR: c0000000003d7c3c CTR: 0000000000000004
REGS: c0000001c85d79b0 TRAP: 0300   Not tainted  (5.4.0-rc2-00026-g2249c0ae4a53-dirty)
MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002242  XER: 20040000
CFAR: c0000000000ab3d0 DAR: c007fffe00010000 DSISR: 40000000 IRQMASK: 0 
GPR00: 0000000000000004 c0000001c85d7c40 c0000000018ce000 c0000001c3880000 
GPR04: c007fffe00010000 0000000000010000 0000000000000000 ffffffffffffffff 
GPR08: c000000001992298 0000603820002138 ffffffffffffffff ffffffff00003a69 
GPR12: 0000000024002242 c000000002550000 c0000001c8700000 c00000000179b728 
GPR16: c00c01ffff800040 c00000000179b5b8 c00c00000070e200 ffffffffffffffff 
GPR20: 0000000000000000 0000000000000000 fffffffffffff000 c00000000179b648 
GPR24: c0000000024464a0 c00000000249f568 c000000001118918 0000000000000000 
GPR28: c0000001c804c590 c00000000249f518 0000000000000000 c0000001c8700000 
NIP [c0000000000ab5a0] memcmp+0x320/0x6a0
LR [c0000000003d7c3c] memcmp_pages+0x8c/0xe0
Call Trace:
[c0000001c85d7c40] [c0000001c804c590] 0xc0000001c804c590 (unreliable)
[c0000001c85d7c70] [c0000000004591d0] ksm_scan_thread+0x960/0x21b0
[c0000001c85d7db0] [c0000000001bf328] kthread+0x198/0x1a0
[c0000001c85d7e20] [c00000000000bfbc] ret_from_kernel_thread+0x5c/0x80
Instruction dump:
ebc1fff0 eba1ffe8 eb81ffe0 eb61ffd8 4e800020 38600001 4d810020 3860ffff 
4e800020 38000004 7c0903a6 7d201c28 <7d402428> 7c295040 38630008 38840008 

In anycase, we wouldn't want secure guests pages to be pulled out due
to KSM, hence disabled merging.

Regards,
Bharata.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 3/8] KVM: PPC: Shared pages support for secure guests
  2019-11-06  4:52   ` Paul Mackerras
@ 2019-11-06  8:22     ` Bharata B Rao
  2019-11-06  8:29       ` Bharata B Rao
  0 siblings, 1 reply; 39+ messages in thread
From: Bharata B Rao @ 2019-11-06  8:22 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linuxram, cclaudio, kvm-ppc, linux-mm, jglisse, aneesh.kumar,
	paulus, sukadev, linuxppc-dev, hch

On Wed, Nov 06, 2019 at 03:52:38PM +1100, Paul Mackerras wrote:
> On Mon, Nov 04, 2019 at 09:47:55AM +0530, Bharata B Rao wrote:
> > A secure guest will share some of its pages with hypervisor (Eg. virtio
> > bounce buffers etc). Support sharing of pages between hypervisor and
> > ultravisor.
> > 
> > Shared page is reachable via both HV and UV side page tables. Once a
> > secure page is converted to shared page, the device page that represents
> > the secure page is unmapped from the HV side page tables.
> 
> I'd like to understand a little better what's going on - see below...
> 
> > +/*
> > + * Shares the page with HV, thus making it a normal page.
> > + *
> > + * - If the page is already secure, then provision a new page and share
> > + * - If the page is a normal page, share the existing page
> > + *
> > + * In the former case, uses dev_pagemap_ops.migrate_to_ram handler
> > + * to unmap the device page from QEMU's page tables.
> > + */
> > +static unsigned long
> > +kvmppc_share_page(struct kvm *kvm, unsigned long gpa, unsigned long page_shift)
> > +{
> > +
> > +	int ret = H_PARAMETER;
> > +	struct page *uvmem_page;
> > +	struct kvmppc_uvmem_page_pvt *pvt;
> > +	unsigned long pfn;
> > +	unsigned long gfn = gpa >> page_shift;
> > +	int srcu_idx;
> > +	unsigned long uvmem_pfn;
> > +
> > +	srcu_idx = srcu_read_lock(&kvm->srcu);
> > +	mutex_lock(&kvm->arch.uvmem_lock);
> > +	if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
> > +		uvmem_page = pfn_to_page(uvmem_pfn);
> > +		pvt = uvmem_page->zone_device_data;
> > +		pvt->skip_page_out = true;
> > +	}
> > +
> > +retry:
> > +	mutex_unlock(&kvm->arch.uvmem_lock);
> > +	pfn = gfn_to_pfn(kvm, gfn);
> 
> At this point, pfn is the value obtained from the page table for
> userspace (e.g. QEMU), right?

Yes.

> I would think it should be equal to
> uvmem_pfn in most cases, shouldn't it?

Yes, in most cases (Common case is to share a page that is already secure)

> If not, what is it going to
> be?

It can be a regular pfn if non-secure page is being shared again.

> 
> > +	if (is_error_noslot_pfn(pfn))
> > +		goto out;
> > +
> > +	mutex_lock(&kvm->arch.uvmem_lock);
> > +	if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
> > +		uvmem_page = pfn_to_page(uvmem_pfn);
> > +		pvt = uvmem_page->zone_device_data;
> > +		pvt->skip_page_out = true;
> > +		kvm_release_pfn_clean(pfn);
> 
> This is going to do a put_page(), unless pfn is a reserved pfn.  If it
> does a put_page(), where did we do the corresponding get_page()?

gfn_to_pfn() will come with a reference held.

> However, since kvmppc_gfn_is_uvmem_pfn() returned true, doesn't that
> mean that pfn here should be a device pfn, and in fact should be the
> same as uvmem_pfn (possibly with some extra bit(s) set)?

If secure page is being converted to share, pfn will be uvmem_pfn (device pfn).
If not, it will be regular pfn.

>  What does
> kvm_is_reserved_pfn() return for a device pfn?

From this code patch, we will never call kvm_release_pfn_clean() on a device
pfn. The prior call to gfn_to_pfn() would fault, result in page-out thus
converting the device pfn to regular pfn (page share request for secure page
case).

Regards,
Bharata.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 3/8] KVM: PPC: Shared pages support for secure guests
  2019-11-06  8:22     ` Bharata B Rao
@ 2019-11-06  8:29       ` Bharata B Rao
  0 siblings, 0 replies; 39+ messages in thread
From: Bharata B Rao @ 2019-11-06  8:29 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linuxram, cclaudio, kvm-ppc, linux-mm, jglisse, aneesh.kumar,
	paulus, sukadev, linuxppc-dev, hch

On Wed, Nov 06, 2019 at 01:52:39PM +0530, Bharata B Rao wrote:
> > However, since kvmppc_gfn_is_uvmem_pfn() returned true, doesn't that
> > mean that pfn here should be a device pfn, and in fact should be the
> > same as uvmem_pfn (possibly with some extra bit(s) set)?
> 
> If secure page is being converted to share, pfn will be uvmem_pfn (device pfn).

Also, kvmppc_gfn_is_uvmem_pfn() needn't always return true. It returns
true only for secure pages, while this routine handles sharing of both
secure and normal pages.

Regards,
Bharata.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 4/8] KVM: PPC: Radix changes for secure guest
  2019-11-06  5:58   ` Paul Mackerras
@ 2019-11-06  8:36     ` Bharata B Rao
  0 siblings, 0 replies; 39+ messages in thread
From: Bharata B Rao @ 2019-11-06  8:36 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linuxram, cclaudio, kvm-ppc, linux-mm, jglisse, aneesh.kumar,
	paulus, sukadev, linuxppc-dev, hch

On Wed, Nov 06, 2019 at 04:58:23PM +1100, Paul Mackerras wrote:
> On Mon, Nov 04, 2019 at 09:47:56AM +0530, Bharata B Rao wrote:
> > - After the guest becomes secure, when we handle a page fault of a page
> >   belonging to SVM in HV, send that page to UV via UV_PAGE_IN.
> > - Whenever a page is unmapped on the HV side, inform UV via UV_PAGE_INVAL.
> > - Ensure all those routines that walk the secondary page tables of
> >   the guest don't do so in case of secure VM. For secure guest, the
> >   active secondary page tables are in secure memory and the secondary
> >   page tables in HV are freed when guest becomes secure.
> 
> Why do we free the page tables?  Just to save a little memory?  It
> feels like it would make things more fragile.

I guess we could just leave the page tables around and they would get
populated again if and when the guest is reset (i,e., when it goes back
to non-secure mode)

However it appeared cleaner to cleanup the page tables given that aren't
in user any longer.
> 
> Also, I don't see where the freeing gets done in this patch.

There isn't a very good reason for freeing code to be not part of this patch.
I just put that in reset patch (6/8) where there is code for reinitializing
page tables again.

Regards,
Bharata.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 1/8] mm: ksm: Export ksm_madvise()
  2019-11-06  6:45     ` Bharata B Rao
@ 2019-11-07  5:45       ` Paul Mackerras
  2019-11-15 14:10         ` Bharata B Rao
  0 siblings, 1 reply; 39+ messages in thread
From: Paul Mackerras @ 2019-11-07  5:45 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: linuxram, cclaudio, kvm-ppc, linux-mm, jglisse, aneesh.kumar,
	paulus, sukadev, linuxppc-dev, hch

On Wed, Nov 06, 2019 at 12:15:42PM +0530, Bharata B Rao wrote:
> On Wed, Nov 06, 2019 at 03:33:29PM +1100, Paul Mackerras wrote:
> > On Mon, Nov 04, 2019 at 09:47:53AM +0530, Bharata B Rao wrote:
> > > KVM PPC module needs ksm_madvise() for supporting secure guests.
> > > Guest pages that become secure are represented as device private
> > > pages in the host. Such pages shouldn't participate in KSM merging.
> > 
> > If we don't do the ksm_madvise call, then as far as I can tell, it
> > should all still work correctly, but we might have KSM pulling pages
> > in unnecessarily, causing a reduction in performance.  Is that right?
> 
> I thought so too. When KSM tries to merge a secure page, it should
> cause a fault resulting in page-out the secure page. However I see
> the below crash when KSM is enabled and KSM scan tries to kmap and
> memcmp the device private page.
> 
> BUG: Unable to handle kernel data access at 0xc007fffe00010000
> Faulting instruction address: 0xc0000000000ab5a0
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in:
> CPU: 0 PID: 22 Comm: ksmd Not tainted 5.4.0-rc2-00026-g2249c0ae4a53-dirty #376
> NIP:  c0000000000ab5a0 LR: c0000000003d7c3c CTR: 0000000000000004
> REGS: c0000001c85d79b0 TRAP: 0300   Not tainted  (5.4.0-rc2-00026-g2249c0ae4a53-dirty)
> MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002242  XER: 20040000
> CFAR: c0000000000ab3d0 DAR: c007fffe00010000 DSISR: 40000000 IRQMASK: 0 
> GPR00: 0000000000000004 c0000001c85d7c40 c0000000018ce000 c0000001c3880000 
> GPR04: c007fffe00010000 0000000000010000 0000000000000000 ffffffffffffffff 
> GPR08: c000000001992298 0000603820002138 ffffffffffffffff ffffffff00003a69 
> GPR12: 0000000024002242 c000000002550000 c0000001c8700000 c00000000179b728 
> GPR16: c00c01ffff800040 c00000000179b5b8 c00c00000070e200 ffffffffffffffff 
> GPR20: 0000000000000000 0000000000000000 fffffffffffff000 c00000000179b648 
> GPR24: c0000000024464a0 c00000000249f568 c000000001118918 0000000000000000 
> GPR28: c0000001c804c590 c00000000249f518 0000000000000000 c0000001c8700000 
> NIP [c0000000000ab5a0] memcmp+0x320/0x6a0
> LR [c0000000003d7c3c] memcmp_pages+0x8c/0xe0
> Call Trace:
> [c0000001c85d7c40] [c0000001c804c590] 0xc0000001c804c590 (unreliable)
> [c0000001c85d7c70] [c0000000004591d0] ksm_scan_thread+0x960/0x21b0
> [c0000001c85d7db0] [c0000000001bf328] kthread+0x198/0x1a0
> [c0000001c85d7e20] [c00000000000bfbc] ret_from_kernel_thread+0x5c/0x80
> Instruction dump:
> ebc1fff0 eba1ffe8 eb81ffe0 eb61ffd8 4e800020 38600001 4d810020 3860ffff 
> 4e800020 38000004 7c0903a6 7d201c28 <7d402428> 7c295040 38630008 38840008 

Hmmm, that seems like a bug in the ZONE_DEVICE stuff generally.  All
that ksm is doing as far as I can see is follow_page() and
kmap_atomic().  I wonder how many other places in the kernel might
also be prone to crashing if they try to touch device pages?

> In anycase, we wouldn't want secure guests pages to be pulled out due
> to KSM, hence disabled merging.

Sure, I don't disagree with that, but I worry that we are papering
over a bug here.

Paul.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 7/8] KVM: PPC: Implement H_SVM_INIT_ABORT hcall
  2019-11-04  4:17 ` [PATCH v10 7/8] KVM: PPC: Implement H_SVM_INIT_ABORT hcall Bharata B Rao
@ 2019-11-11  4:19   ` Paul Mackerras
  2019-11-12  1:01     ` Ram Pai
  0 siblings, 1 reply; 39+ messages in thread
From: Paul Mackerras @ 2019-11-11  4:19 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: Sukadev Bhattiprolu, linuxram, cclaudio, kvm-ppc, linux-mm,
	jglisse, Ram Pai, aneesh.kumar, paulus, sukadev, linuxppc-dev,
	hch

On Mon, Nov 04, 2019 at 09:47:59AM +0530, Bharata B Rao wrote:
> From: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
> 
> Implement the H_SVM_INIT_ABORT hcall which the Ultravisor can use to
> abort an SVM after it has issued the H_SVM_INIT_START and before the
> H_SVM_INIT_DONE hcalls. This hcall could be used when Ultravisor
> encounters security violations or other errors when starting an SVM.
> 
> Note that this hcall is different from UV_SVM_TERMINATE ucall which
> is used by HV to terminate/cleanup an SVM.
> 
> In case of H_SVM_INIT_ABORT, we should page-out all the pages back to
> HV (i.e., we should not skip the page-out). Otherwise the VM's pages,
> possibly including its text/data would be stuck in secure memory.
> Since the SVM did not go secure, its MSR_S bit will be clear and the
> VM wont be able to access its pages even to do a clean exit.

It seems fragile to me to have one more transfer back into the
ultravisor after this call.  Why does the UV need to do this call and
then get control back again one more time?  Why can't the UV defer
doing this call until it can do it without expecting to see a return
from the hcall?  And if it does need to see a return from the hcall,
what would happen if a malicious hypervisor doesn't do the return?

Paul.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 5/8] KVM: PPC: Handle memory plug/unplug to secure VM
  2019-11-04  4:17 ` [PATCH v10 5/8] KVM: PPC: Handle memory plug/unplug to secure VM Bharata B Rao
@ 2019-11-11  4:25   ` Paul Mackerras
  0 siblings, 0 replies; 39+ messages in thread
From: Paul Mackerras @ 2019-11-11  4:25 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: Sukadev Bhattiprolu, linuxram, cclaudio, kvm-ppc, linux-mm,
	jglisse, aneesh.kumar, paulus, sukadev, linuxppc-dev, hch

On Mon, Nov 04, 2019 at 09:47:57AM +0530, Bharata B Rao wrote:
> Register the new memslot with UV during plug and unregister
> the memslot during unplug. In addition, release all the
> device pages during unplug.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
> 	[Added skip_page_out arg to kvmppc_uvmem_drop_pages()]

Reviewed-by: Paul Mackerras <paulus@ozlabs.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 6/8] KVM: PPC: Support reset of secure guest
  2019-11-04  4:17 ` [PATCH v10 6/8] KVM: PPC: Support reset of secure guest Bharata B Rao
@ 2019-11-11  5:28   ` Paul Mackerras
  2019-11-11  6:55     ` Bharata B Rao
  2019-11-12  5:34   ` Paul Mackerras
  1 sibling, 1 reply; 39+ messages in thread
From: Paul Mackerras @ 2019-11-11  5:28 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: linuxram, cclaudio, kvm-ppc, linux-mm, jglisse, aneesh.kumar,
	paulus, sukadev, linuxppc-dev, hch

On Mon, Nov 04, 2019 at 09:47:58AM +0530, Bharata B Rao wrote:
> Add support for reset of secure guest via a new ioctl KVM_PPC_SVM_OFF.
> This ioctl will be issued by QEMU during reset and includes the
> the following steps:
> 
> - Ask UV to terminate the guest via UV_SVM_TERMINATE ucall
> - Unpin the VPA pages so that they can be migrated back to secure
>   side when guest becomes secure again. This is required because
>   pinned pages can't be migrated.

Unpinning the VPA pages is normally handled during VM reset by QEMU
doing set_one_reg operations to set the values for the
KVM_REG_PPC_VPA_ADDR, KVM_REG_PPC_VPA_SLB and KVM_REG_PPC_VPA_DTL
pseudo-registers to zero.  Is there some reason why this isn't
happening for a secure VM, and if so, what is that reason?
If it is happening, then why do we need to unpin the pages explicitly
here?

> - Reinitialize guest's partitioned scoped page tables. These are
>   freed when guest becomes secure (H_SVM_INIT_DONE)

It doesn't seem particularly useful to me to free the partition-scoped
page tables when the guest becomes secure, and it feels like it makes
things more fragile.  If you don't free them then, then you don't need
to reallocate them now.

> - Release all device pages of the secure guest.
> 
> After these steps, guest is ready to issue UV_ESM call once again
> to switch to secure mode.

Paul.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 6/8] KVM: PPC: Support reset of secure guest
  2019-11-11  5:28   ` Paul Mackerras
@ 2019-11-11  6:55     ` Bharata B Rao
  0 siblings, 0 replies; 39+ messages in thread
From: Bharata B Rao @ 2019-11-11  6:55 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linuxram, cclaudio, kvm-ppc, linux-mm, jglisse, aneesh.kumar,
	paulus, sukadev, linuxppc-dev, hch

On Mon, Nov 11, 2019 at 04:28:06PM +1100, Paul Mackerras wrote:
> On Mon, Nov 04, 2019 at 09:47:58AM +0530, Bharata B Rao wrote:
> > Add support for reset of secure guest via a new ioctl KVM_PPC_SVM_OFF.
> > This ioctl will be issued by QEMU during reset and includes the
> > the following steps:
> > 
> > - Ask UV to terminate the guest via UV_SVM_TERMINATE ucall
> > - Unpin the VPA pages so that they can be migrated back to secure
> >   side when guest becomes secure again. This is required because
> >   pinned pages can't be migrated.
> 
> Unpinning the VPA pages is normally handled during VM reset by QEMU
> doing set_one_reg operations to set the values for the
> KVM_REG_PPC_VPA_ADDR, KVM_REG_PPC_VPA_SLB and KVM_REG_PPC_VPA_DTL
> pseudo-registers to zero.  Is there some reason why this isn't
> happening for a secure VM, and if so, what is that reason?
> If it is happening, then why do we need to unpin the pages explicitly
> here?

We were observing these VPA pages still remaining pinned during
reset and hence subsequent paging-in of these pages were failing.
Unpinning them fixed the problem.

I will investigate and get back on why exactly these pages weren't
gettting unpinned normally as part of reset.

> 
> > - Reinitialize guest's partitioned scoped page tables. These are
> >   freed when guest becomes secure (H_SVM_INIT_DONE)
> 
> It doesn't seem particularly useful to me to free the partition-scoped
> page tables when the guest becomes secure, and it feels like it makes
> things more fragile.  If you don't free them then, then you don't need
> to reallocate them now.

Sure, I will not free them in the next version.

Regards,
Bharata.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 7/8] KVM: PPC: Implement H_SVM_INIT_ABORT hcall
  2019-11-11  4:19   ` Paul Mackerras
@ 2019-11-12  1:01     ` Ram Pai
  2019-11-12  5:38       ` Paul Mackerras
  0 siblings, 1 reply; 39+ messages in thread
From: Ram Pai @ 2019-11-12  1:01 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Sukadev Bhattiprolu, cclaudio, kvm-ppc, Bharata B Rao, linux-mm,
	jglisse, Ram Pai, aneesh.kumar, paulus, sukadev, linuxppc-dev,
	hch

On Mon, Nov 11, 2019 at 03:19:24PM +1100, Paul Mackerras wrote:
> On Mon, Nov 04, 2019 at 09:47:59AM +0530, Bharata B Rao wrote:
> > From: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
> > 
> > Implement the H_SVM_INIT_ABORT hcall which the Ultravisor can use to
> > abort an SVM after it has issued the H_SVM_INIT_START and before the
> > H_SVM_INIT_DONE hcalls. This hcall could be used when Ultravisor
> > encounters security violations or other errors when starting an SVM.
> > 
> > Note that this hcall is different from UV_SVM_TERMINATE ucall which
> > is used by HV to terminate/cleanup an SVM.
> > 
> > In case of H_SVM_INIT_ABORT, we should page-out all the pages back to
> > HV (i.e., we should not skip the page-out). Otherwise the VM's pages,
> > possibly including its text/data would be stuck in secure memory.
> > Since the SVM did not go secure, its MSR_S bit will be clear and the
> > VM wont be able to access its pages even to do a clean exit.
> 
> It seems fragile to me to have one more transfer back into the
> ultravisor after this call.  Why does the UV need to do this call and
> then get control back again one more time?  
> Why can't the UV defer
> doing this call until it can do it without expecting to see a return
> from the hcall?  

Sure. But, what if the hypervisor calls back into the UV through a
ucall, asking for some page to be paged-out?  If the ultravisor has
cleaned up the state associated with the SVM, it wont be able to service
that request.

H_SVM_INIT_ABORT is invoked to tell the hypervisor that the
secure-state-transition for the VM cannot be continued any further.
Hypervisor can than choose to do whatever with that information. It can
cleanup its state, and/or make ucalls to get some information from the
ultravisor.  It can also choose not to return control back to the ultravisor.


> And if it does need to see a return from the hcall,
> what would happen if a malicious hypervisor doesn't do the return?

That is fine.  At most it will be a denail-of-service attack.

RP

> 
> Paul.





If the ultravisor cleans up the SVM's state on its side and then informs
the Hypervisor to abort the SVM, the hypervisor will not be able to
cleanly terminate the VM.  Because to terminate the SVM, the hypervisor
still needs the services of the Ultravisor. For example: to get the
pages back into the hypervisor if needed. Another example is, the
hypervisor can call UV_SVM_TERMINATE.  Regardless of which ucall
gets called, the ultravisor has to hold on to enough state of the
SVM to service that request.

The current design assumes that the hypervisor explicitly informs the
ultravisor, that it is done with the SVM, through the UV_SVM_TERMINATE
ucall. Till that point the Ultravisor must to be ready to service any
ucalls made by the hypervisor on the SVM's behalf.


And if the ultravisor has cleaned-up the state of the SVM on it side,
any such ucall requests by the hypervisor will return with error. 

In summary -- for the hypervisor to cleanly terminate an SVM, it needs the
services of the ultravisor.  Only the hypervisor knows, when it would
NOT anymore need the services of the ultravisor for a SVM. Only after
the hypervisor communicates that through the UV_SVM_TERMINATE ucall,
the ultravisor will be able to confidently clean the state of the SVM
on its side.


The H_SVM_INIT_ABORT is a mechanism for the UV to inform the HV
to do whatever it needs to do to cleanup its state of the SVM; which
includes making ucalls to the ultravisor.


-- 
Ram Pai


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 6/8] KVM: PPC: Support reset of secure guest
  2019-11-04  4:17 ` [PATCH v10 6/8] KVM: PPC: Support reset of secure guest Bharata B Rao
  2019-11-11  5:28   ` Paul Mackerras
@ 2019-11-12  5:34   ` Paul Mackerras
  2019-11-13 15:29     ` Bharata B Rao
  1 sibling, 1 reply; 39+ messages in thread
From: Paul Mackerras @ 2019-11-12  5:34 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: linuxram, cclaudio, kvm-ppc, linux-mm, jglisse, aneesh.kumar,
	paulus, sukadev, linuxppc-dev, hch

On Mon, Nov 04, 2019 at 09:47:58AM +0530, Bharata B Rao wrote:
[snip]
> @@ -5442,6 +5471,64 @@ static int kvmhv_store_to_eaddr(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr,
>  	return rc;
>  }
>  
> +/*
> + *  IOCTL handler to turn off secure mode of guest
> + *
> + * - Issue ucall to terminate the guest on the UV side
> + * - Unpin the VPA pages (Enables these pages to be migrated back
> + *   when VM becomes secure again)
> + * - Recreate partition table as the guest is transitioning back to
> + *   normal mode
> + * - Release all device pages
> + */
> +static int kvmhv_svm_off(struct kvm *kvm)
> +{
> +	struct kvm_vcpu *vcpu;
> +	int srcu_idx;
> +	int ret = 0;
> +	int i;
> +
> +	if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
> +		return ret;
> +

A further comment on this code: it should check that no vcpus are
running and fail if any are running, and it should prevent any vcpus
from running until the function is finished, using code like that in
kvmhv_configure_mmu().  That is, it should do something like this:

	mutex_lock(&kvm->arch.mmu_setup_lock);
	mmu_was_ready = kvm->arch.mmu_ready;
	if (kvm->arch.mmu_ready) {
		kvm->arch.mmu_ready = 0;
		/* order mmu_ready vs. vcpus_running */
		smp_mb();
		if (atomic_read(&kvm->arch.vcpus_running)) {
			kvm->arch.mmu_ready = 1;
			ret = -EBUSY;
			goto out_unlock;
		}
	}

and then after clearing kvm->arch.secure_guest below:

> +	srcu_idx = srcu_read_lock(&kvm->srcu);
> +	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> +		struct kvm_memory_slot *memslot;
> +		struct kvm_memslots *slots = __kvm_memslots(kvm, i);
> +
> +		if (!slots)
> +			continue;
> +
> +		kvm_for_each_memslot(memslot, slots) {
> +			kvmppc_uvmem_drop_pages(memslot, kvm, true);
> +			uv_unregister_mem_slot(kvm->arch.lpid, memslot->id);
> +		}
> +	}
> +	srcu_read_unlock(&kvm->srcu, srcu_idx);
> +
> +	ret = uv_svm_terminate(kvm->arch.lpid);
> +	if (ret != U_SUCCESS) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		spin_lock(&vcpu->arch.vpa_update_lock);
> +		unpin_vpa_reset(kvm, &vcpu->arch.dtl);
> +		unpin_vpa_reset(kvm, &vcpu->arch.slb_shadow);
> +		unpin_vpa_reset(kvm, &vcpu->arch.vpa);
> +		spin_unlock(&vcpu->arch.vpa_update_lock);
> +	}
> +
> +	ret = kvmppc_reinit_partition_table(kvm);
> +	if (ret)
> +		goto out;
> +
> +	kvm->arch.secure_guest = 0;

you need to do:

	kvm->arch.mmu_ready = mmu_was_ready;
 out_unlock:
	mutex_unlock(&kvm->arch.mmu_setup_lock);

> +out:
> +	return ret;
> +}
> +

With that extra check in place, it should be safe to unpin the vpas if
there is a good reason to do so.  ("Userspace has some bug that we
haven't found" isn't a good reason to do so.)

Paul.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 7/8] KVM: PPC: Implement H_SVM_INIT_ABORT hcall
  2019-11-12  1:01     ` Ram Pai
@ 2019-11-12  5:38       ` Paul Mackerras
  2019-11-12  7:52         ` Ram Pai
  0 siblings, 1 reply; 39+ messages in thread
From: Paul Mackerras @ 2019-11-12  5:38 UTC (permalink / raw)
  To: Ram Pai
  Cc: Sukadev Bhattiprolu, cclaudio, kvm-ppc, Bharata B Rao, linux-mm,
	jglisse, Ram Pai, aneesh.kumar, paulus, sukadev, linuxppc-dev,
	hch

On Mon, Nov 11, 2019 at 05:01:58PM -0800, Ram Pai wrote:
> On Mon, Nov 11, 2019 at 03:19:24PM +1100, Paul Mackerras wrote:
> > On Mon, Nov 04, 2019 at 09:47:59AM +0530, Bharata B Rao wrote:
> > > From: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
> > > 
> > > Implement the H_SVM_INIT_ABORT hcall which the Ultravisor can use to
> > > abort an SVM after it has issued the H_SVM_INIT_START and before the
> > > H_SVM_INIT_DONE hcalls. This hcall could be used when Ultravisor
> > > encounters security violations or other errors when starting an SVM.
> > > 
> > > Note that this hcall is different from UV_SVM_TERMINATE ucall which
> > > is used by HV to terminate/cleanup an SVM.
> > > 
> > > In case of H_SVM_INIT_ABORT, we should page-out all the pages back to
> > > HV (i.e., we should not skip the page-out). Otherwise the VM's pages,
> > > possibly including its text/data would be stuck in secure memory.
> > > Since the SVM did not go secure, its MSR_S bit will be clear and the
> > > VM wont be able to access its pages even to do a clean exit.
> > 
> > It seems fragile to me to have one more transfer back into the
> > ultravisor after this call.  Why does the UV need to do this call and
> > then get control back again one more time?  
> > Why can't the UV defer
> > doing this call until it can do it without expecting to see a return
> > from the hcall?  
> 
> Sure. But, what if the hypervisor calls back into the UV through a
> ucall, asking for some page to be paged-out?  If the ultravisor has
> cleaned up the state associated with the SVM, it wont be able to service
> that request.
> 
> H_SVM_INIT_ABORT is invoked to tell the hypervisor that the
> secure-state-transition for the VM cannot be continued any further.
> Hypervisor can than choose to do whatever with that information. It can
> cleanup its state, and/or make ucalls to get some information from the
> ultravisor.  It can also choose not to return control back to the ultravisor.
> 
> 
> > And if it does need to see a return from the hcall,
> > what would happen if a malicious hypervisor doesn't do the return?
> 
> That is fine.  At most it will be a denail-of-service attack.
> 
> RP
> 
> > 
> > Paul.
> 
> 
> 
> 
> 
> If the ultravisor cleans up the SVM's state on its side and then informs
> the Hypervisor to abort the SVM, the hypervisor will not be able to
> cleanly terminate the VM.  Because to terminate the SVM, the hypervisor
> still needs the services of the Ultravisor. For example: to get the
> pages back into the hypervisor if needed. Another example is, the
> hypervisor can call UV_SVM_TERMINATE.  Regardless of which ucall
> gets called, the ultravisor has to hold on to enough state of the
> SVM to service that request.

OK, that's a good reason.  That should be explained in the commit
message.

> The current design assumes that the hypervisor explicitly informs the
> ultravisor, that it is done with the SVM, through the UV_SVM_TERMINATE
> ucall. Till that point the Ultravisor must to be ready to service any
> ucalls made by the hypervisor on the SVM's behalf.

I see that UV_SVM_TERMINATE is done when the VM is being destroyed (at
which point kvm->arch.secure_guest doesn't matter any more), and in
kvmhv_svm_off(), where kvm->arch.secure_guest gets cleared
explicitly.  Hence I don't see any need for clearing it in the
assembly code on the next secure guest entry.  I think the change to
book3s_hv_rmhandlers.S can just be dropped.

Paul.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 7/8] KVM: PPC: Implement H_SVM_INIT_ABORT hcall
  2019-11-12  5:38       ` Paul Mackerras
@ 2019-11-12  7:52         ` Ram Pai
  2019-11-12 11:32           ` Paul Mackerras
  0 siblings, 1 reply; 39+ messages in thread
From: Ram Pai @ 2019-11-12  7:52 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Sukadev Bhattiprolu, cclaudio, kvm-ppc, Bharata B Rao, linux-mm,
	jglisse, Ram Pai, aneesh.kumar, paulus, sukadev, linuxppc-dev,
	hch

On Tue, Nov 12, 2019 at 04:38:36PM +1100, Paul Mackerras wrote:
> On Mon, Nov 11, 2019 at 05:01:58PM -0800, Ram Pai wrote:
> > On Mon, Nov 11, 2019 at 03:19:24PM +1100, Paul Mackerras wrote:
> > > On Mon, Nov 04, 2019 at 09:47:59AM +0530, Bharata B Rao wrote:
> > > > From: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
> > > > 
> > > > Implement the H_SVM_INIT_ABORT hcall which the Ultravisor can use to
> > > > abort an SVM after it has issued the H_SVM_INIT_START and before the
> > > > H_SVM_INIT_DONE hcalls. This hcall could be used when Ultravisor
> > > > encounters security violations or other errors when starting an SVM.
> > > > 
> > > > Note that this hcall is different from UV_SVM_TERMINATE ucall which
> > > > is used by HV to terminate/cleanup an SVM.
> > > > 
> > > > In case of H_SVM_INIT_ABORT, we should page-out all the pages back to
> > > > HV (i.e., we should not skip the page-out). Otherwise the VM's pages,
> > > > possibly including its text/data would be stuck in secure memory.
> > > > Since the SVM did not go secure, its MSR_S bit will be clear and the
> > > > VM wont be able to access its pages even to do a clean exit.
> > > 
...skip...
> > 
> > If the ultravisor cleans up the SVM's state on its side and then informs
> > the Hypervisor to abort the SVM, the hypervisor will not be able to
> > cleanly terminate the VM.  Because to terminate the SVM, the hypervisor
> > still needs the services of the Ultravisor. For example: to get the
> > pages back into the hypervisor if needed. Another example is, the
> > hypervisor can call UV_SVM_TERMINATE.  Regardless of which ucall
> > gets called, the ultravisor has to hold on to enough state of the
> > SVM to service that request.
> 
> OK, that's a good reason.  That should be explained in the commit
> message.
> 
> > The current design assumes that the hypervisor explicitly informs the
> > ultravisor, that it is done with the SVM, through the UV_SVM_TERMINATE
> > ucall. Till that point the Ultravisor must to be ready to service any
> > ucalls made by the hypervisor on the SVM's behalf.
> 
> I see that UV_SVM_TERMINATE is done when the VM is being destroyed (at
> which point kvm->arch.secure_guest doesn't matter any more), and in
> kvmhv_svm_off(), where kvm->arch.secure_guest gets cleared
> explicitly.  Hence I don't see any need for clearing it in the
> assembly code on the next secure guest entry.  I think the change to
> book3s_hv_rmhandlers.S can just be dropped.

There is subtle problem removing that code from the assembly.

If the H_SVM_INIT_ABORT hcall returns to the ultravisor without clearing
kvm->arch.secure_guest, the hypervisor will continue to think that the
VM is a secure VM.   However the primary reason the H_SVM_INIT_ABORT
hcall was invoked, was to inform the Hypervisor that it should no longer
consider the VM as a Secure VM. So there is a inconsistency there.

This is fine, as long as the VM does not invoke any hcall or does not
receive any hypervisor-exceptions.  The moment either of those happen,
the control goes into the hypervisor, the hypervisor services
the exception/hcall and while returning, it will see that the
kvm->arch.secure_guest flag is set and **incorrectly** return
to the ultravisor through a UV_RETURN ucall.  Ultravisor will
not know what to do with it, because it does not consider that
VM as a Secure VM.  Bad things happen.

( Sidenote: when H_SVM_INIT_ABORT hcalls returns from the hypervisor,
  the ultravisor cleans up its internal state corresponding of that
  aborted-SVM and returns back to the caller with MSR[S]=0 )


RP


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 7/8] KVM: PPC: Implement H_SVM_INIT_ABORT hcall
  2019-11-12  7:52         ` Ram Pai
@ 2019-11-12 11:32           ` Paul Mackerras
  2019-11-12 14:45             ` Ram Pai
  0 siblings, 1 reply; 39+ messages in thread
From: Paul Mackerras @ 2019-11-12 11:32 UTC (permalink / raw)
  To: Ram Pai
  Cc: Sukadev Bhattiprolu, cclaudio, kvm-ppc, Bharata B Rao, linux-mm,
	jglisse, Ram Pai, aneesh.kumar, paulus, sukadev, linuxppc-dev,
	hch

On Mon, Nov 11, 2019 at 11:52:15PM -0800, Ram Pai wrote:
> On Tue, Nov 12, 2019 at 04:38:36PM +1100, Paul Mackerras wrote:
> > On Mon, Nov 11, 2019 at 05:01:58PM -0800, Ram Pai wrote:
> > > On Mon, Nov 11, 2019 at 03:19:24PM +1100, Paul Mackerras wrote:
> > > > On Mon, Nov 04, 2019 at 09:47:59AM +0530, Bharata B Rao wrote:
> > > > > From: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
> > > > > 
> > > > > Implement the H_SVM_INIT_ABORT hcall which the Ultravisor can use to
> > > > > abort an SVM after it has issued the H_SVM_INIT_START and before the
> > > > > H_SVM_INIT_DONE hcalls. This hcall could be used when Ultravisor
> > > > > encounters security violations or other errors when starting an SVM.
> > > > > 
> > > > > Note that this hcall is different from UV_SVM_TERMINATE ucall which
> > > > > is used by HV to terminate/cleanup an SVM.
> > > > > 
> > > > > In case of H_SVM_INIT_ABORT, we should page-out all the pages back to
> > > > > HV (i.e., we should not skip the page-out). Otherwise the VM's pages,
> > > > > possibly including its text/data would be stuck in secure memory.
> > > > > Since the SVM did not go secure, its MSR_S bit will be clear and the
> > > > > VM wont be able to access its pages even to do a clean exit.
> > > > 
> ...skip...
> > > 
> > > If the ultravisor cleans up the SVM's state on its side and then informs
> > > the Hypervisor to abort the SVM, the hypervisor will not be able to
> > > cleanly terminate the VM.  Because to terminate the SVM, the hypervisor
> > > still needs the services of the Ultravisor. For example: to get the
> > > pages back into the hypervisor if needed. Another example is, the
> > > hypervisor can call UV_SVM_TERMINATE.  Regardless of which ucall
> > > gets called, the ultravisor has to hold on to enough state of the
> > > SVM to service that request.
> > 
> > OK, that's a good reason.  That should be explained in the commit
> > message.
> > 
> > > The current design assumes that the hypervisor explicitly informs the
> > > ultravisor, that it is done with the SVM, through the UV_SVM_TERMINATE
> > > ucall. Till that point the Ultravisor must to be ready to service any
> > > ucalls made by the hypervisor on the SVM's behalf.
> > 
> > I see that UV_SVM_TERMINATE is done when the VM is being destroyed (at
> > which point kvm->arch.secure_guest doesn't matter any more), and in
> > kvmhv_svm_off(), where kvm->arch.secure_guest gets cleared
> > explicitly.  Hence I don't see any need for clearing it in the
> > assembly code on the next secure guest entry.  I think the change to
> > book3s_hv_rmhandlers.S can just be dropped.
> 
> There is subtle problem removing that code from the assembly.
> 
> If the H_SVM_INIT_ABORT hcall returns to the ultravisor without clearing
> kvm->arch.secure_guest, the hypervisor will continue to think that the
> VM is a secure VM.   However the primary reason the H_SVM_INIT_ABORT
> hcall was invoked, was to inform the Hypervisor that it should no longer
> consider the VM as a Secure VM. So there is a inconsistency there.

Most of the checks that look at whether a VM is a secure VM use code
like "if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)".  Now
since KVMPPC_SECURE_INIT_ABORT is 4, an if statement such as that will
take the false branch once we have set kvm->arch.secure_guest to
KVMPPC_SECURE_INIT_ABORT in kvmppc_h_svm_init_abort.  So in fact in
most places we will treat the VM as a normal VM from then on.  If
there are any places where we still need to treat the VM as a secure
VM while we are processing the abort we can easily do that too.

> This is fine, as long as the VM does not invoke any hcall or does not
> receive any hypervisor-exceptions.  The moment either of those happen,
> the control goes into the hypervisor, the hypervisor services
> the exception/hcall and while returning, it will see that the
> kvm->arch.secure_guest flag is set and **incorrectly** return
> to the ultravisor through a UV_RETURN ucall.  Ultravisor will
> not know what to do with it, because it does not consider that
> VM as a Secure VM.  Bad things happen.

If bad things happen in the ultravisor because the hypervisor did
something it shouldn't, then it's game over, you just lost, thanks for
playing.  The ultravisor MUST be able to cope with bogus UV_RETURN
calls for a VM that it doesn't consider to be a secure VM.  You need
to work out how to handle such calls safely and implement that in the
ultravisor.

Paul.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 7/8] KVM: PPC: Implement H_SVM_INIT_ABORT hcall
  2019-11-12 11:32           ` Paul Mackerras
@ 2019-11-12 14:45             ` Ram Pai
  2019-11-13  0:14               ` Paul Mackerras
  0 siblings, 1 reply; 39+ messages in thread
From: Ram Pai @ 2019-11-12 14:45 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Sukadev Bhattiprolu, cclaudio, kvm-ppc, Bharata B Rao, linux-mm,
	jglisse, Ram Pai, aneesh.kumar, paulus, sukadev, linuxppc-dev,
	hch

On Tue, Nov 12, 2019 at 10:32:04PM +1100, Paul Mackerras wrote:
> On Mon, Nov 11, 2019 at 11:52:15PM -0800, Ram Pai wrote:
> > On Tue, Nov 12, 2019 at 04:38:36PM +1100, Paul Mackerras wrote:
> > > On Mon, Nov 11, 2019 at 05:01:58PM -0800, Ram Pai wrote:
> > > > On Mon, Nov 11, 2019 at 03:19:24PM +1100, Paul Mackerras wrote:
> > > > > On Mon, Nov 04, 2019 at 09:47:59AM +0530, Bharata B Rao wrote:
> > > > > > From: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
> > > > > > 
> > > > > > Implement the H_SVM_INIT_ABORT hcall which the Ultravisor can use to
> > > > > > abort an SVM after it has issued the H_SVM_INIT_START and before the
> > > > > > H_SVM_INIT_DONE hcalls. This hcall could be used when Ultravisor
> > > > > > encounters security violations or other errors when starting an SVM.
> > > > > > 
> > > > > > Note that this hcall is different from UV_SVM_TERMINATE ucall which
> > > > > > is used by HV to terminate/cleanup an SVM.
> > > > > > 
> > > > > > In case of H_SVM_INIT_ABORT, we should page-out all the pages back to
> > > > > > HV (i.e., we should not skip the page-out). Otherwise the VM's pages,
> > > > > > possibly including its text/data would be stuck in secure memory.
> > > > > > Since the SVM did not go secure, its MSR_S bit will be clear and the
> > > > > > VM wont be able to access its pages even to do a clean exit.
> > > > > 
> > ...skip...
> > > > 
> > > > If the ultravisor cleans up the SVM's state on its side and then informs
> > > > the Hypervisor to abort the SVM, the hypervisor will not be able to
> > > > cleanly terminate the VM.  Because to terminate the SVM, the hypervisor
> > > > still needs the services of the Ultravisor. For example: to get the
> > > > pages back into the hypervisor if needed. Another example is, the
> > > > hypervisor can call UV_SVM_TERMINATE.  Regardless of which ucall
> > > > gets called, the ultravisor has to hold on to enough state of the
> > > > SVM to service that request.
> > > 
> > > OK, that's a good reason.  That should be explained in the commit
> > > message.
> > > 
> > > > The current design assumes that the hypervisor explicitly informs the
> > > > ultravisor, that it is done with the SVM, through the UV_SVM_TERMINATE
> > > > ucall. Till that point the Ultravisor must to be ready to service any
> > > > ucalls made by the hypervisor on the SVM's behalf.
> > > 
> > > I see that UV_SVM_TERMINATE is done when the VM is being destroyed (at
> > > which point kvm->arch.secure_guest doesn't matter any more), and in
> > > kvmhv_svm_off(), where kvm->arch.secure_guest gets cleared
> > > explicitly.  Hence I don't see any need for clearing it in the
> > > assembly code on the next secure guest entry.  I think the change to
> > > book3s_hv_rmhandlers.S can just be dropped.
> > 
> > There is subtle problem removing that code from the assembly.
> > 
> > If the H_SVM_INIT_ABORT hcall returns to the ultravisor without clearing
> > kvm->arch.secure_guest, the hypervisor will continue to think that the
> > VM is a secure VM.   However the primary reason the H_SVM_INIT_ABORT
> > hcall was invoked, was to inform the Hypervisor that it should no longer
> > consider the VM as a Secure VM. So there is a inconsistency there.
> 
> Most of the checks that look at whether a VM is a secure VM use code
> like "if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)".  Now
> since KVMPPC_SECURE_INIT_ABORT is 4, an if statement such as that will
> take the false branch once we have set kvm->arch.secure_guest to
> KVMPPC_SECURE_INIT_ABORT in kvmppc_h_svm_init_abort.  So in fact in
> most places we will treat the VM as a normal VM from then on.  If
> there are any places where we still need to treat the VM as a secure
> VM while we are processing the abort we can easily do that too.

Is the suggestion --  KVMPPC_SECURE_INIT_ABORT should never return back
to the Ultravisor?   Because removing that assembly code will NOT lead the
Hypervisor back into the Ultravisor.  This is fine with the
ultravisor. But then the hypervisor will not know where to return to.
If it wants to return directly to the VM, it wont know to
which address. It will be in a limbo.

> 
> > This is fine, as long as the VM does not invoke any hcall or does not
> > receive any hypervisor-exceptions.  The moment either of those happen,
> > the control goes into the hypervisor, the hypervisor services
> > the exception/hcall and while returning, it will see that the
> > kvm->arch.secure_guest flag is set and **incorrectly** return
> > to the ultravisor through a UV_RETURN ucall.  Ultravisor will
> > not know what to do with it, because it does not consider that
> > VM as a Secure VM.  Bad things happen.
> 
> If bad things happen in the ultravisor because the hypervisor did
> something it shouldn't, then it's game over, you just lost, thanks for
> playing.  The ultravisor MUST be able to cope with bogus UV_RETURN
> calls for a VM that it doesn't consider to be a secure VM.  You need
> to work out how to handle such calls safely and implement that in the
> ultravisor.

Actually we do handle this gracefully in the ultravisor :). 
We just retun back to the hypervisor saying "sorry dont know what
to do with it, please handle it yourself".

However hypervisor would not know what to do with that return, and bad
things happen in the hypervisor.

RP


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 7/8] KVM: PPC: Implement H_SVM_INIT_ABORT hcall
  2019-11-12 14:45             ` Ram Pai
@ 2019-11-13  0:14               ` Paul Mackerras
  2019-11-13  6:32                 ` Ram Pai
  0 siblings, 1 reply; 39+ messages in thread
From: Paul Mackerras @ 2019-11-13  0:14 UTC (permalink / raw)
  To: Ram Pai
  Cc: Sukadev Bhattiprolu, cclaudio, kvm-ppc, Bharata B Rao, linux-mm,
	jglisse, Ram Pai, aneesh.kumar, paulus, sukadev, linuxppc-dev,
	hch

On Tue, Nov 12, 2019 at 06:45:55AM -0800, Ram Pai wrote:
> On Tue, Nov 12, 2019 at 10:32:04PM +1100, Paul Mackerras wrote:
> > On Mon, Nov 11, 2019 at 11:52:15PM -0800, Ram Pai wrote:
> > > There is subtle problem removing that code from the assembly.
> > > 
> > > If the H_SVM_INIT_ABORT hcall returns to the ultravisor without clearing
> > > kvm->arch.secure_guest, the hypervisor will continue to think that the
> > > VM is a secure VM.   However the primary reason the H_SVM_INIT_ABORT
> > > hcall was invoked, was to inform the Hypervisor that it should no longer
> > > consider the VM as a Secure VM. So there is a inconsistency there.
> > 
> > Most of the checks that look at whether a VM is a secure VM use code
> > like "if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)".  Now
> > since KVMPPC_SECURE_INIT_ABORT is 4, an if statement such as that will
> > take the false branch once we have set kvm->arch.secure_guest to
> > KVMPPC_SECURE_INIT_ABORT in kvmppc_h_svm_init_abort.  So in fact in
> > most places we will treat the VM as a normal VM from then on.  If
> > there are any places where we still need to treat the VM as a secure
> > VM while we are processing the abort we can easily do that too.
> 
> Is the suggestion --  KVMPPC_SECURE_INIT_ABORT should never return back
> to the Ultravisor?   Because removing that assembly code will NOT lead the

No.  The suggestion is that vcpu->arch.secure_guest stays set to
KVMPPC_SECURE_INIT_ABORT until userspace calls KVM_PPC_SVM_OFF.

> Hypervisor back into the Ultravisor.  This is fine with the
> ultravisor. But then the hypervisor will not know where to return to.
> If it wants to return directly to the VM, it wont know to
> which address. It will be in a limbo.
> 
> > 
> > > This is fine, as long as the VM does not invoke any hcall or does not
> > > receive any hypervisor-exceptions.  The moment either of those happen,
> > > the control goes into the hypervisor, the hypervisor services
> > > the exception/hcall and while returning, it will see that the
> > > kvm->arch.secure_guest flag is set and **incorrectly** return
> > > to the ultravisor through a UV_RETURN ucall.  Ultravisor will
> > > not know what to do with it, because it does not consider that
> > > VM as a Secure VM.  Bad things happen.
> > 
> > If bad things happen in the ultravisor because the hypervisor did
> > something it shouldn't, then it's game over, you just lost, thanks for
> > playing.  The ultravisor MUST be able to cope with bogus UV_RETURN
> > calls for a VM that it doesn't consider to be a secure VM.  You need
> > to work out how to handle such calls safely and implement that in the
> > ultravisor.
> 
> Actually we do handle this gracefully in the ultravisor :). 
> We just retun back to the hypervisor saying "sorry dont know what
> to do with it, please handle it yourself".
> 
> However hypervisor would not know what to do with that return, and bad
> things happen in the hypervisor.

Right.  We need something after the "sc 2" to handle the case where
the ultravisor returns with an error from the UV_RETURN.

Paul.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 7/8] KVM: PPC: Implement H_SVM_INIT_ABORT hcall
  2019-11-13  0:14               ` Paul Mackerras
@ 2019-11-13  6:32                 ` Ram Pai
  2019-11-13 21:18                   ` Paul Mackerras
  0 siblings, 1 reply; 39+ messages in thread
From: Ram Pai @ 2019-11-13  6:32 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Sukadev Bhattiprolu, cclaudio, kvm-ppc, Bharata B Rao, linux-mm,
	jglisse, Ram Pai, aneesh.kumar, paulus, sukadev, linuxppc-dev,
	hch

On Wed, Nov 13, 2019 at 11:14:27AM +1100, Paul Mackerras wrote:
> On Tue, Nov 12, 2019 at 06:45:55AM -0800, Ram Pai wrote:
> > On Tue, Nov 12, 2019 at 10:32:04PM +1100, Paul Mackerras wrote:
> > > On Mon, Nov 11, 2019 at 11:52:15PM -0800, Ram Pai wrote:
> > > > There is subtle problem removing that code from the assembly.
> > > > 
> > > > If the H_SVM_INIT_ABORT hcall returns to the ultravisor without clearing
> > > > kvm->arch.secure_guest, the hypervisor will continue to think that the
> > > > VM is a secure VM.   However the primary reason the H_SVM_INIT_ABORT
> > > > hcall was invoked, was to inform the Hypervisor that it should no longer
> > > > consider the VM as a Secure VM. So there is a inconsistency there.
> > > 
> > > Most of the checks that look at whether a VM is a secure VM use code
> > > like "if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)".  Now
> > > since KVMPPC_SECURE_INIT_ABORT is 4, an if statement such as that will
> > > take the false branch once we have set kvm->arch.secure_guest to
> > > KVMPPC_SECURE_INIT_ABORT in kvmppc_h_svm_init_abort.  So in fact in
> > > most places we will treat the VM as a normal VM from then on.  If
> > > there are any places where we still need to treat the VM as a secure
> > > VM while we are processing the abort we can easily do that too.
> > 
> > Is the suggestion --  KVMPPC_SECURE_INIT_ABORT should never return back
> > to the Ultravisor?   Because removing that assembly code will NOT lead the
> 
> No.  The suggestion is that vcpu->arch.secure_guest stays set to
> KVMPPC_SECURE_INIT_ABORT until userspace calls KVM_PPC_SVM_OFF.

In the fast_guest_return path, if it finds 
(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_ABORT) is true, should it return to
UV or not?

Ideally it should return back to the ultravisor the first time
KVMPPC_SECURE_INIT_ABORT is set, and not than onwards.

RP


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 6/8] KVM: PPC: Support reset of secure guest
  2019-11-12  5:34   ` Paul Mackerras
@ 2019-11-13 15:29     ` Bharata B Rao
  2019-11-14  5:07       ` Paul Mackerras
  0 siblings, 1 reply; 39+ messages in thread
From: Bharata B Rao @ 2019-11-13 15:29 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linuxram, cclaudio, kvm-ppc, linux-mm, jglisse, aneesh.kumar,
	paulus, sukadev, linuxppc-dev, hch

On Tue, Nov 12, 2019 at 04:34:34PM +1100, Paul Mackerras wrote:
> On Mon, Nov 04, 2019 at 09:47:58AM +0530, Bharata B Rao wrote:
> [snip]
> > @@ -5442,6 +5471,64 @@ static int kvmhv_store_to_eaddr(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr,
> >  	return rc;
> >  }
> >  
> > +/*
> > + *  IOCTL handler to turn off secure mode of guest
> > + *
> > + * - Issue ucall to terminate the guest on the UV side
> > + * - Unpin the VPA pages (Enables these pages to be migrated back
> > + *   when VM becomes secure again)
> > + * - Recreate partition table as the guest is transitioning back to
> > + *   normal mode
> > + * - Release all device pages
> > + */
> > +static int kvmhv_svm_off(struct kvm *kvm)
> > +{
> > +	struct kvm_vcpu *vcpu;
> > +	int srcu_idx;
> > +	int ret = 0;
> > +	int i;
> > +
> > +	if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
> > +		return ret;
> > +
> 
> A further comment on this code: it should check that no vcpus are
> running and fail if any are running, and it should prevent any vcpus
> from running until the function is finished, using code like that in
> kvmhv_configure_mmu().  That is, it should do something like this:
> 
> 	mutex_lock(&kvm->arch.mmu_setup_lock);
> 	mmu_was_ready = kvm->arch.mmu_ready;
> 	if (kvm->arch.mmu_ready) {
> 		kvm->arch.mmu_ready = 0;
> 		/* order mmu_ready vs. vcpus_running */
> 		smp_mb();
> 		if (atomic_read(&kvm->arch.vcpus_running)) {
> 			kvm->arch.mmu_ready = 1;
> 			ret = -EBUSY;
> 			goto out_unlock;
> 		}
> 	}
> 
> and then after clearing kvm->arch.secure_guest below:
> 
> > +	srcu_idx = srcu_read_lock(&kvm->srcu);
> > +	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> > +		struct kvm_memory_slot *memslot;
> > +		struct kvm_memslots *slots = __kvm_memslots(kvm, i);
> > +
> > +		if (!slots)
> > +			continue;
> > +
> > +		kvm_for_each_memslot(memslot, slots) {
> > +			kvmppc_uvmem_drop_pages(memslot, kvm, true);
> > +			uv_unregister_mem_slot(kvm->arch.lpid, memslot->id);
> > +		}
> > +	}
> > +	srcu_read_unlock(&kvm->srcu, srcu_idx);
> > +
> > +	ret = uv_svm_terminate(kvm->arch.lpid);
> > +	if (ret != U_SUCCESS) {
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	kvm_for_each_vcpu(i, vcpu, kvm) {
> > +		spin_lock(&vcpu->arch.vpa_update_lock);
> > +		unpin_vpa_reset(kvm, &vcpu->arch.dtl);
> > +		unpin_vpa_reset(kvm, &vcpu->arch.slb_shadow);
> > +		unpin_vpa_reset(kvm, &vcpu->arch.vpa);
> > +		spin_unlock(&vcpu->arch.vpa_update_lock);
> > +	}
> > +
> > +	ret = kvmppc_reinit_partition_table(kvm);
> > +	if (ret)
> > +		goto out;
> > +
> > +	kvm->arch.secure_guest = 0;
> 
> you need to do:
> 
> 	kvm->arch.mmu_ready = mmu_was_ready;
>  out_unlock:
> 	mutex_unlock(&kvm->arch.mmu_setup_lock);
> 
> > +out:
> > +	return ret;
> > +}
> > +
> 
> With that extra check in place, it should be safe to unpin the vpas if
> there is a good reason to do so.  ("Userspace has some bug that we
> haven't found" isn't a good reason to do so.)

QEMU indeed does set_one_reg to reset the VPAs but that only marks
the VPA update as pending. The actual unpinning happens when vcpu
gets to run after reset at which time the VPAs are updated after
any unpinning (if required)

When secure guest reboots, vpu 0 gets to run and does unpin its
VPA pages and then proceeds with switching to secure. Here UV
tries to page-in all the guest pages, including the still pinned
VPA pages corresponding to other vcpus which haven't had a chance
to run till now. They are all still pinned and hence page-in fails.

To prevent this, we have to explicitly unpin the VPA pages during
this svm off ioctl. This will ensure that SMP secure guest is able
to reboot correctly.

So I will incorporate the code chunk you have shown above to fail
if any vcpu is running and prevent any vcpu from running when
we unpin VPAs from this ioctl.

Regards,
Bharata.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 7/8] KVM: PPC: Implement H_SVM_INIT_ABORT hcall
  2019-11-13  6:32                 ` Ram Pai
@ 2019-11-13 21:18                   ` Paul Mackerras
  2019-11-13 21:50                     ` Ram Pai
  0 siblings, 1 reply; 39+ messages in thread
From: Paul Mackerras @ 2019-11-13 21:18 UTC (permalink / raw)
  To: Ram Pai
  Cc: Sukadev Bhattiprolu, cclaudio, kvm-ppc, Bharata B Rao, linux-mm,
	jglisse, Ram Pai, aneesh.kumar, paulus, sukadev, linuxppc-dev,
	hch

On Tue, Nov 12, 2019 at 10:32:33PM -0800, Ram Pai wrote:
> On Wed, Nov 13, 2019 at 11:14:27AM +1100, Paul Mackerras wrote:
> > On Tue, Nov 12, 2019 at 06:45:55AM -0800, Ram Pai wrote:
> > > On Tue, Nov 12, 2019 at 10:32:04PM +1100, Paul Mackerras wrote:
> > > > On Mon, Nov 11, 2019 at 11:52:15PM -0800, Ram Pai wrote:
> > > > > There is subtle problem removing that code from the assembly.
> > > > > 
> > > > > If the H_SVM_INIT_ABORT hcall returns to the ultravisor without clearing
> > > > > kvm->arch.secure_guest, the hypervisor will continue to think that the
> > > > > VM is a secure VM.   However the primary reason the H_SVM_INIT_ABORT
> > > > > hcall was invoked, was to inform the Hypervisor that it should no longer
> > > > > consider the VM as a Secure VM. So there is a inconsistency there.
> > > > 
> > > > Most of the checks that look at whether a VM is a secure VM use code
> > > > like "if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)".  Now
> > > > since KVMPPC_SECURE_INIT_ABORT is 4, an if statement such as that will
> > > > take the false branch once we have set kvm->arch.secure_guest to
> > > > KVMPPC_SECURE_INIT_ABORT in kvmppc_h_svm_init_abort.  So in fact in
> > > > most places we will treat the VM as a normal VM from then on.  If
> > > > there are any places where we still need to treat the VM as a secure
> > > > VM while we are processing the abort we can easily do that too.
> > > 
> > > Is the suggestion --  KVMPPC_SECURE_INIT_ABORT should never return back
> > > to the Ultravisor?   Because removing that assembly code will NOT lead the
> > 
> > No.  The suggestion is that vcpu->arch.secure_guest stays set to
> > KVMPPC_SECURE_INIT_ABORT until userspace calls KVM_PPC_SVM_OFF.
> 
> In the fast_guest_return path, if it finds 
> (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_ABORT) is true, should it return to
> UV or not?
> 
> Ideally it should return back to the ultravisor the first time
> KVMPPC_SECURE_INIT_ABORT is set, and not than onwards.

What is ideal about that behavior?  Why would that be a particularly
good thing to do?

Paul.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 7/8] KVM: PPC: Implement H_SVM_INIT_ABORT hcall
  2019-11-13 21:18                   ` Paul Mackerras
@ 2019-11-13 21:50                     ` Ram Pai
  2019-11-14  5:08                       ` Paul Mackerras
  0 siblings, 1 reply; 39+ messages in thread
From: Ram Pai @ 2019-11-13 21:50 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Sukadev Bhattiprolu, cclaudio, kvm-ppc, Bharata B Rao, linux-mm,
	jglisse, Ram Pai, aneesh.kumar, paulus, sukadev, linuxppc-dev,
	hch

On Thu, Nov 14, 2019 at 08:18:24AM +1100, Paul Mackerras wrote:
> On Tue, Nov 12, 2019 at 10:32:33PM -0800, Ram Pai wrote:
> > On Wed, Nov 13, 2019 at 11:14:27AM +1100, Paul Mackerras wrote:
> > > On Tue, Nov 12, 2019 at 06:45:55AM -0800, Ram Pai wrote:
> > > > On Tue, Nov 12, 2019 at 10:32:04PM +1100, Paul Mackerras wrote:
> > > > > On Mon, Nov 11, 2019 at 11:52:15PM -0800, Ram Pai wrote:
> > > > > > There is subtle problem removing that code from the assembly.
> > > > > > 
> > > > > > If the H_SVM_INIT_ABORT hcall returns to the ultravisor without clearing
> > > > > > kvm->arch.secure_guest, the hypervisor will continue to think that the
> > > > > > VM is a secure VM.   However the primary reason the H_SVM_INIT_ABORT
> > > > > > hcall was invoked, was to inform the Hypervisor that it should no longer
> > > > > > consider the VM as a Secure VM. So there is a inconsistency there.
> > > > > 
> > > > > Most of the checks that look at whether a VM is a secure VM use code
> > > > > like "if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)".  Now
> > > > > since KVMPPC_SECURE_INIT_ABORT is 4, an if statement such as that will
> > > > > take the false branch once we have set kvm->arch.secure_guest to
> > > > > KVMPPC_SECURE_INIT_ABORT in kvmppc_h_svm_init_abort.  So in fact in
> > > > > most places we will treat the VM as a normal VM from then on.  If
> > > > > there are any places where we still need to treat the VM as a secure
> > > > > VM while we are processing the abort we can easily do that too.
> > > > 
> > > > Is the suggestion --  KVMPPC_SECURE_INIT_ABORT should never return back
> > > > to the Ultravisor?   Because removing that assembly code will NOT lead the
> > > 
> > > No.  The suggestion is that vcpu->arch.secure_guest stays set to
> > > KVMPPC_SECURE_INIT_ABORT until userspace calls KVM_PPC_SVM_OFF.
> > 
> > In the fast_guest_return path, if it finds 
> > (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_ABORT) is true, should it return to
> > UV or not?
> > 
> > Ideally it should return back to the ultravisor the first time
> > KVMPPC_SECURE_INIT_ABORT is set, and not than onwards.
> 
> What is ideal about that behavior?  Why would that be a particularly
> good thing to do?

It is following the rule -- "return back to the caller".

RP

-- 
Ram Pai


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 6/8] KVM: PPC: Support reset of secure guest
  2019-11-13 15:29     ` Bharata B Rao
@ 2019-11-14  5:07       ` Paul Mackerras
  0 siblings, 0 replies; 39+ messages in thread
From: Paul Mackerras @ 2019-11-14  5:07 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: linuxram, cclaudio, kvm-ppc, linux-mm, jglisse, aneesh.kumar,
	paulus, sukadev, linuxppc-dev, hch

On Wed, Nov 13, 2019 at 08:59:08PM +0530, Bharata B Rao wrote:
> On Tue, Nov 12, 2019 at 04:34:34PM +1100, Paul Mackerras wrote:
> > On Mon, Nov 04, 2019 at 09:47:58AM +0530, Bharata B Rao wrote:
> > [snip]
> > > @@ -5442,6 +5471,64 @@ static int kvmhv_store_to_eaddr(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr,
> > >  	return rc;
> > >  }
> > >  
> > > +/*
> > > + *  IOCTL handler to turn off secure mode of guest
> > > + *
> > > + * - Issue ucall to terminate the guest on the UV side
> > > + * - Unpin the VPA pages (Enables these pages to be migrated back
> > > + *   when VM becomes secure again)
> > > + * - Recreate partition table as the guest is transitioning back to
> > > + *   normal mode
> > > + * - Release all device pages
> > > + */
> > > +static int kvmhv_svm_off(struct kvm *kvm)
> > > +{
> > > +	struct kvm_vcpu *vcpu;
> > > +	int srcu_idx;
> > > +	int ret = 0;
> > > +	int i;
> > > +
> > > +	if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
> > > +		return ret;
> > > +
> > 
> > A further comment on this code: it should check that no vcpus are
> > running and fail if any are running, and it should prevent any vcpus
> > from running until the function is finished, using code like that in
> > kvmhv_configure_mmu().  That is, it should do something like this:
> > 
> > 	mutex_lock(&kvm->arch.mmu_setup_lock);
> > 	mmu_was_ready = kvm->arch.mmu_ready;
> > 	if (kvm->arch.mmu_ready) {
> > 		kvm->arch.mmu_ready = 0;
> > 		/* order mmu_ready vs. vcpus_running */
> > 		smp_mb();
> > 		if (atomic_read(&kvm->arch.vcpus_running)) {
> > 			kvm->arch.mmu_ready = 1;
> > 			ret = -EBUSY;
> > 			goto out_unlock;
> > 		}
> > 	}
> > 
> > and then after clearing kvm->arch.secure_guest below:
> > 
> > > +	srcu_idx = srcu_read_lock(&kvm->srcu);
> > > +	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> > > +		struct kvm_memory_slot *memslot;
> > > +		struct kvm_memslots *slots = __kvm_memslots(kvm, i);
> > > +
> > > +		if (!slots)
> > > +			continue;
> > > +
> > > +		kvm_for_each_memslot(memslot, slots) {
> > > +			kvmppc_uvmem_drop_pages(memslot, kvm, true);
> > > +			uv_unregister_mem_slot(kvm->arch.lpid, memslot->id);
> > > +		}
> > > +	}
> > > +	srcu_read_unlock(&kvm->srcu, srcu_idx);
> > > +
> > > +	ret = uv_svm_terminate(kvm->arch.lpid);
> > > +	if (ret != U_SUCCESS) {
> > > +		ret = -EINVAL;
> > > +		goto out;
> > > +	}
> > > +
> > > +	kvm_for_each_vcpu(i, vcpu, kvm) {
> > > +		spin_lock(&vcpu->arch.vpa_update_lock);
> > > +		unpin_vpa_reset(kvm, &vcpu->arch.dtl);
> > > +		unpin_vpa_reset(kvm, &vcpu->arch.slb_shadow);
> > > +		unpin_vpa_reset(kvm, &vcpu->arch.vpa);
> > > +		spin_unlock(&vcpu->arch.vpa_update_lock);
> > > +	}
> > > +
> > > +	ret = kvmppc_reinit_partition_table(kvm);
> > > +	if (ret)
> > > +		goto out;
> > > +
> > > +	kvm->arch.secure_guest = 0;
> > 
> > you need to do:
> > 
> > 	kvm->arch.mmu_ready = mmu_was_ready;
> >  out_unlock:
> > 	mutex_unlock(&kvm->arch.mmu_setup_lock);
> > 
> > > +out:
> > > +	return ret;
> > > +}
> > > +
> > 
> > With that extra check in place, it should be safe to unpin the vpas if
> > there is a good reason to do so.  ("Userspace has some bug that we
> > haven't found" isn't a good reason to do so.)
> 
> QEMU indeed does set_one_reg to reset the VPAs but that only marks
> the VPA update as pending. The actual unpinning happens when vcpu
> gets to run after reset at which time the VPAs are updated after
> any unpinning (if required)
> 
> When secure guest reboots, vpu 0 gets to run and does unpin its
> VPA pages and then proceeds with switching to secure. Here UV
> tries to page-in all the guest pages, including the still pinned
> VPA pages corresponding to other vcpus which haven't had a chance
> to run till now. They are all still pinned and hence page-in fails.
> 
> To prevent this, we have to explicitly unpin the VPA pages during
> this svm off ioctl. This will ensure that SMP secure guest is able
> to reboot correctly.

OK, that makes sense.  Please put a comment in the code explaining
this briefly.

> So I will incorporate the code chunk you have shown above to fail
> if any vcpu is running and prevent any vcpu from running when
> we unpin VPAs from this ioctl.

Sounds good.

Paul.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 7/8] KVM: PPC: Implement H_SVM_INIT_ABORT hcall
  2019-11-13 21:50                     ` Ram Pai
@ 2019-11-14  5:08                       ` Paul Mackerras
  2019-11-14  7:02                         ` Ram Pai
  0 siblings, 1 reply; 39+ messages in thread
From: Paul Mackerras @ 2019-11-14  5:08 UTC (permalink / raw)
  To: Ram Pai
  Cc: Sukadev Bhattiprolu, cclaudio, kvm-ppc, Bharata B Rao, linux-mm,
	jglisse, Ram Pai, aneesh.kumar, paulus, sukadev, linuxppc-dev,
	hch

On Wed, Nov 13, 2019 at 01:50:42PM -0800, Ram Pai wrote:
> On Thu, Nov 14, 2019 at 08:18:24AM +1100, Paul Mackerras wrote:
> > On Tue, Nov 12, 2019 at 10:32:33PM -0800, Ram Pai wrote:
> > > On Wed, Nov 13, 2019 at 11:14:27AM +1100, Paul Mackerras wrote:
> > > > On Tue, Nov 12, 2019 at 06:45:55AM -0800, Ram Pai wrote:
> > > > > On Tue, Nov 12, 2019 at 10:32:04PM +1100, Paul Mackerras wrote:
> > > > > > On Mon, Nov 11, 2019 at 11:52:15PM -0800, Ram Pai wrote:
> > > > > > > There is subtle problem removing that code from the assembly.
> > > > > > > 
> > > > > > > If the H_SVM_INIT_ABORT hcall returns to the ultravisor without clearing
> > > > > > > kvm->arch.secure_guest, the hypervisor will continue to think that the
> > > > > > > VM is a secure VM.   However the primary reason the H_SVM_INIT_ABORT
> > > > > > > hcall was invoked, was to inform the Hypervisor that it should no longer
> > > > > > > consider the VM as a Secure VM. So there is a inconsistency there.
> > > > > > 
> > > > > > Most of the checks that look at whether a VM is a secure VM use code
> > > > > > like "if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)".  Now
> > > > > > since KVMPPC_SECURE_INIT_ABORT is 4, an if statement such as that will
> > > > > > take the false branch once we have set kvm->arch.secure_guest to
> > > > > > KVMPPC_SECURE_INIT_ABORT in kvmppc_h_svm_init_abort.  So in fact in
> > > > > > most places we will treat the VM as a normal VM from then on.  If
> > > > > > there are any places where we still need to treat the VM as a secure
> > > > > > VM while we are processing the abort we can easily do that too.
> > > > > 
> > > > > Is the suggestion --  KVMPPC_SECURE_INIT_ABORT should never return back
> > > > > to the Ultravisor?   Because removing that assembly code will NOT lead the
> > > > 
> > > > No.  The suggestion is that vcpu->arch.secure_guest stays set to
> > > > KVMPPC_SECURE_INIT_ABORT until userspace calls KVM_PPC_SVM_OFF.
> > > 
> > > In the fast_guest_return path, if it finds 
> > > (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_ABORT) is true, should it return to
> > > UV or not?
> > > 
> > > Ideally it should return back to the ultravisor the first time
> > > KVMPPC_SECURE_INIT_ABORT is set, and not than onwards.
> > 
> > What is ideal about that behavior?  Why would that be a particularly
> > good thing to do?
> 
> It is following the rule -- "return back to the caller".

That doesn't address the question of why vcpu->arch.secure_guest
should be cleared at the point where we do that.

Paul.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 7/8] KVM: PPC: Implement H_SVM_INIT_ABORT hcall
  2019-11-14  5:08                       ` Paul Mackerras
@ 2019-11-14  7:02                         ` Ram Pai
  0 siblings, 0 replies; 39+ messages in thread
From: Ram Pai @ 2019-11-14  7:02 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Sukadev Bhattiprolu, cclaudio, kvm-ppc, Bharata B Rao, linux-mm,
	jglisse, Ram Pai, aneesh.kumar, paulus, sukadev, linuxppc-dev,
	hch

On Thu, Nov 14, 2019 at 04:08:25PM +1100, Paul Mackerras wrote:
> On Wed, Nov 13, 2019 at 01:50:42PM -0800, Ram Pai wrote:
> > On Thu, Nov 14, 2019 at 08:18:24AM +1100, Paul Mackerras wrote:
> > > On Tue, Nov 12, 2019 at 10:32:33PM -0800, Ram Pai wrote:
> > > > On Wed, Nov 13, 2019 at 11:14:27AM +1100, Paul Mackerras wrote:
> > > > > On Tue, Nov 12, 2019 at 06:45:55AM -0800, Ram Pai wrote:
> > > > > > On Tue, Nov 12, 2019 at 10:32:04PM +1100, Paul Mackerras wrote:
> > > > > > > On Mon, Nov 11, 2019 at 11:52:15PM -0800, Ram Pai wrote:
> > > > > > > > There is subtle problem removing that code from the assembly.
> > > > > > > > 
> > > > > > > > If the H_SVM_INIT_ABORT hcall returns to the ultravisor without clearing
> > > > > > > > kvm->arch.secure_guest, the hypervisor will continue to think that the
> > > > > > > > VM is a secure VM.   However the primary reason the H_SVM_INIT_ABORT
> > > > > > > > hcall was invoked, was to inform the Hypervisor that it should no longer
> > > > > > > > consider the VM as a Secure VM. So there is a inconsistency there.
> > > > > > > 
> > > > > > > Most of the checks that look at whether a VM is a secure VM use code
> > > > > > > like "if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)".  Now
> > > > > > > since KVMPPC_SECURE_INIT_ABORT is 4, an if statement such as that will
> > > > > > > take the false branch once we have set kvm->arch.secure_guest to
> > > > > > > KVMPPC_SECURE_INIT_ABORT in kvmppc_h_svm_init_abort.  So in fact in
> > > > > > > most places we will treat the VM as a normal VM from then on.  If
> > > > > > > there are any places where we still need to treat the VM as a secure
> > > > > > > VM while we are processing the abort we can easily do that too.
> > > > > > 
> > > > > > Is the suggestion --  KVMPPC_SECURE_INIT_ABORT should never return back
> > > > > > to the Ultravisor?   Because removing that assembly code will NOT lead the
> > > > > 
> > > > > No.  The suggestion is that vcpu->arch.secure_guest stays set to
> > > > > KVMPPC_SECURE_INIT_ABORT until userspace calls KVM_PPC_SVM_OFF.
> > > > 
> > > > In the fast_guest_return path, if it finds 
> > > > (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_ABORT) is true, should it return to
> > > > UV or not?
> > > > 
> > > > Ideally it should return back to the ultravisor the first time
> > > > KVMPPC_SECURE_INIT_ABORT is set, and not than onwards.
> > > 
> > > What is ideal about that behavior?  Why would that be a particularly
> > > good thing to do?
> > 
> > It is following the rule -- "return back to the caller".
> 
> That doesn't address the question of why vcpu->arch.secure_guest
> should be cleared at the point where we do that.

Ok. here is the sequence that I expect to happen.

-------------------------------------------------------------------
1) VM makes a ESM(Enter Secure Mode) ucall.

  1A) UV does the H_SVM_INIT_START hcall... the chit-chat between UV and HV happens
	and somewhere in that chit-chat, UV decides that the ESM call
	cannot be successfully completed.  Hence it calls
	H_SVM_INIT_ABORT to inform the Hypervisor.

    1A_a) Hypervisor aborts the VM's transition and returns back to the ultravisor.

  1B) UV cleans up the state of the VM on its side and returns
    	back to the VM, with failure.  VM is still in a normal VM state.
	Its MSR(S) is still 0.

2) VM gets a HDEC exception.

   2A) Hypervisor receives that HDEC exception. It handles the exception
   	and returns back to the VM.
-------------------------------------------------------------------

If you agree with the above sequence than, the patch needs all the proposed changes.


At step (1A_a) and step (2A),  the hypervisor is faced with the question
	--  Where should the control be returned to?

  for step 1A_a it has to return to the UV, and for step 2A it has to
  return to the VM. In other words the control has to **return back to
  the caller**.


It certainly has to rely on the vcpu->arch.secure_guest flag to do so.
If any flag in vcpu->arch.secure_guest is set, it has to return to 
UV.  Otherwise it has to return to VM.

This is the reason, the proposed patch in the fast_guest_return path
looks at the vcpu->arch.secure_guest. If it finds any flag set, it
returns to UV. However before returning to UV, it also clears all the
flags if  H_SVM_INIT_ABORT is set. This is to ensure that HV does not
return to the wrong place; i.e to UV, but returns to the VM at step 2A.

RP


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v10 1/8] mm: ksm: Export ksm_madvise()
  2019-11-07  5:45       ` Paul Mackerras
@ 2019-11-15 14:10         ` Bharata B Rao
  0 siblings, 0 replies; 39+ messages in thread
From: Bharata B Rao @ 2019-11-15 14:10 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linuxram, cclaudio, kvm-ppc, linux-mm, jglisse, aneesh.kumar,
	paulus, sukadev, linuxppc-dev, hch

On Thu, Nov 07, 2019 at 04:45:35PM +1100, Paul Mackerras wrote:
> On Wed, Nov 06, 2019 at 12:15:42PM +0530, Bharata B Rao wrote:
> > On Wed, Nov 06, 2019 at 03:33:29PM +1100, Paul Mackerras wrote:
> > > On Mon, Nov 04, 2019 at 09:47:53AM +0530, Bharata B Rao wrote:
> > > > KVM PPC module needs ksm_madvise() for supporting secure guests.
> > > > Guest pages that become secure are represented as device private
> > > > pages in the host. Such pages shouldn't participate in KSM merging.
> > > 
> > > If we don't do the ksm_madvise call, then as far as I can tell, it
> > > should all still work correctly, but we might have KSM pulling pages
> > > in unnecessarily, causing a reduction in performance.  Is that right?
> > 
> > I thought so too. When KSM tries to merge a secure page, it should
> > cause a fault resulting in page-out the secure page. However I see
> > the below crash when KSM is enabled and KSM scan tries to kmap and
> > memcmp the device private page.
> > 
> > BUG: Unable to handle kernel data access at 0xc007fffe00010000
> > Faulting instruction address: 0xc0000000000ab5a0
> > Oops: Kernel access of bad area, sig: 11 [#1]
> > LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
> > Modules linked in:
> > CPU: 0 PID: 22 Comm: ksmd Not tainted 5.4.0-rc2-00026-g2249c0ae4a53-dirty #376
> > NIP:  c0000000000ab5a0 LR: c0000000003d7c3c CTR: 0000000000000004
> > REGS: c0000001c85d79b0 TRAP: 0300   Not tainted  (5.4.0-rc2-00026-g2249c0ae4a53-dirty)
> > MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002242  XER: 20040000
> > CFAR: c0000000000ab3d0 DAR: c007fffe00010000 DSISR: 40000000 IRQMASK: 0 
> > GPR00: 0000000000000004 c0000001c85d7c40 c0000000018ce000 c0000001c3880000 
> > GPR04: c007fffe00010000 0000000000010000 0000000000000000 ffffffffffffffff 
> > GPR08: c000000001992298 0000603820002138 ffffffffffffffff ffffffff00003a69 
> > GPR12: 0000000024002242 c000000002550000 c0000001c8700000 c00000000179b728 
> > GPR16: c00c01ffff800040 c00000000179b5b8 c00c00000070e200 ffffffffffffffff 
> > GPR20: 0000000000000000 0000000000000000 fffffffffffff000 c00000000179b648 
> > GPR24: c0000000024464a0 c00000000249f568 c000000001118918 0000000000000000 
> > GPR28: c0000001c804c590 c00000000249f518 0000000000000000 c0000001c8700000 
> > NIP [c0000000000ab5a0] memcmp+0x320/0x6a0
> > LR [c0000000003d7c3c] memcmp_pages+0x8c/0xe0
> > Call Trace:
> > [c0000001c85d7c40] [c0000001c804c590] 0xc0000001c804c590 (unreliable)
> > [c0000001c85d7c70] [c0000000004591d0] ksm_scan_thread+0x960/0x21b0
> > [c0000001c85d7db0] [c0000000001bf328] kthread+0x198/0x1a0
> > [c0000001c85d7e20] [c00000000000bfbc] ret_from_kernel_thread+0x5c/0x80
> > Instruction dump:
> > ebc1fff0 eba1ffe8 eb81ffe0 eb61ffd8 4e800020 38600001 4d810020 3860ffff 
> > 4e800020 38000004 7c0903a6 7d201c28 <7d402428> 7c295040 38630008 38840008 
> 
> Hmmm, that seems like a bug in the ZONE_DEVICE stuff generally.  All
> that ksm is doing as far as I can see is follow_page() and
> kmap_atomic().  I wonder how many other places in the kernel might
> also be prone to crashing if they try to touch device pages?

In the above shown crash, we don't go via follow_page() and hence
I believe we don't hit the fault path. I see that we come here
after getting the page from get_ksm_page() which returns a device
private page which the subsequent memcmp_pages() does kmap_atomic and
tries to access the address resulting in the above crash.

> 
> > In anycase, we wouldn't want secure guests pages to be pulled out due
> > to KSM, hence disabled merging.
> 
> Sure, I don't disagree with that, but I worry that we are papering
> over a bug here.

Looks like yes. May be someone with better understanding of KSM code
can comment here?

Regards,
Bharata.


^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, back to index

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-04  4:17 [PATCH v10 0/8] KVM: PPC: Driver to manage pages of secure guest Bharata B Rao
2019-11-04  4:17 ` [PATCH v10 1/8] mm: ksm: Export ksm_madvise() Bharata B Rao
2019-11-06  4:33   ` Paul Mackerras
2019-11-06  6:45     ` Bharata B Rao
2019-11-07  5:45       ` Paul Mackerras
2019-11-15 14:10         ` Bharata B Rao
2019-11-04  4:17 ` [PATCH v10 2/8] KVM: PPC: Support for running secure guests Bharata B Rao
2019-11-06  4:34   ` Paul Mackerras
2019-11-04  4:17 ` [PATCH v10 3/8] KVM: PPC: Shared pages support for " Bharata B Rao
2019-11-06  4:52   ` Paul Mackerras
2019-11-06  8:22     ` Bharata B Rao
2019-11-06  8:29       ` Bharata B Rao
2019-11-04  4:17 ` [PATCH v10 4/8] KVM: PPC: Radix changes for secure guest Bharata B Rao
2019-11-06  5:58   ` Paul Mackerras
2019-11-06  8:36     ` Bharata B Rao
2019-11-04  4:17 ` [PATCH v10 5/8] KVM: PPC: Handle memory plug/unplug to secure VM Bharata B Rao
2019-11-11  4:25   ` Paul Mackerras
2019-11-04  4:17 ` [PATCH v10 6/8] KVM: PPC: Support reset of secure guest Bharata B Rao
2019-11-11  5:28   ` Paul Mackerras
2019-11-11  6:55     ` Bharata B Rao
2019-11-12  5:34   ` Paul Mackerras
2019-11-13 15:29     ` Bharata B Rao
2019-11-14  5:07       ` Paul Mackerras
2019-11-04  4:17 ` [PATCH v10 7/8] KVM: PPC: Implement H_SVM_INIT_ABORT hcall Bharata B Rao
2019-11-11  4:19   ` Paul Mackerras
2019-11-12  1:01     ` Ram Pai
2019-11-12  5:38       ` Paul Mackerras
2019-11-12  7:52         ` Ram Pai
2019-11-12 11:32           ` Paul Mackerras
2019-11-12 14:45             ` Ram Pai
2019-11-13  0:14               ` Paul Mackerras
2019-11-13  6:32                 ` Ram Pai
2019-11-13 21:18                   ` Paul Mackerras
2019-11-13 21:50                     ` Ram Pai
2019-11-14  5:08                       ` Paul Mackerras
2019-11-14  7:02                         ` Ram Pai
2019-11-04  4:18 ` [PATCH v10 8/8] KVM: PPC: Ultravisor: Add PPC_UV config option Bharata B Rao
2019-11-06  4:30 ` [PATCH v10 0/8] KVM: PPC: Driver to manage pages of secure guest Paul Mackerras
2019-11-06  6:20   ` Bharata B Rao

LinuxPPC-Dev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linuxppc-dev/0 linuxppc-dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linuxppc-dev linuxppc-dev/ https://lore.kernel.org/linuxppc-dev \
		linuxppc-dev@lists.ozlabs.org linuxppc-dev@ozlabs.org
	public-inbox-index linuxppc-dev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.ozlabs.lists.linuxppc-dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git