* [PATCH 00/11] KVM implementation of PAPR HPT resizing extension
@ 2016-12-15  5:53 David Gibson
  2016-12-15  5:53 ` [PATCH 01/11] powerpc/kvm: Reserve capabilities and ioctls for HPT resizing David Gibson
                   ` (10 more replies)
  0 siblings, 11 replies; 29+ messages in thread
From: David Gibson @ 2016-12-15  5:53 UTC (permalink / raw)
  To: paulus
  Cc: michael, benh, sjitindarsingh, thuth, lvivier, linuxppc-dev,
	linux-kernel, kvm, David Gibson

Here is the KVM implementation for the proposed PAPR extension which
allows the runtime resizing of a PAPR guest's Hashed Page Table (HPT).

Using this requires a guest kernel with support for the extension.
Patches for guest side support in Linux were posted earlier:
  https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-December/152164.html

It also requires userspace (i.e. qemu) to intercept the HPT resizing
hypercalls and invoke the KVM ioctl()s to implement them.  This is
done instead of having KVM directly intercept the hypercalls, so that
userspace can, if useful, impose additional restrictions on resizes:
for example, it could refuse them entirely if policy for the VM
precludes resizing, or it could limit the HPT size the guest can
request to meet resource limits.
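
As an illustration, here is a minimal sketch of how userspace might
mediate the prepare hypercall.  This is hedged: the vm_fd variable, the
policy cap and the errno-to-hcall-status mapping are invented for this
example, and are not taken from the actual qemu patches.

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>   /* struct kvm_ppc_resize_hpt, ioctl numbers */

    /* PAPR hcall status tokens (values as used by qemu's spapr.h) */
    #define H_SUCCESS    0
    #define H_HARDWARE   (-1)
    #define H_RESOURCE   (-16)

    /* Called when the guest makes H_RESIZE_HPT_PREPARE; vm_fd is the
     * KVM VM file descriptor. */
    static long handle_resize_hpt_prepare(int vm_fd, uint64_t flags,
                                          uint32_t shift)
    {
            struct kvm_ppc_resize_hpt rhpt = {
                    .flags = flags,
                    .shift = shift,
            };

            if (shift > 40)         /* example policy: refuse HPTs over 2^40 bytes */
                    return H_RESOURCE;

            if (ioctl(vm_fd, KVM_PPC_RESIZE_HPT_PREPARE, &rhpt) < 0)
                    return H_HARDWARE;  /* real code would map errno more finely */
            return H_SUCCESS;
    }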

Patches to implement the userspace part of HPT resizing are proposed
for qemu-2.9, and can be found at:
  https://github.com/dgibson/qemu/tree/upstream/hpt-resize

I'm posting these now, in the hopes that both these and the
corresponding guest side patches can be staged and merged for the 4.11
window.

David Gibson (11):
  powerpc/kvm: Reserve capabilities and ioctls for HPT resizing
  powerpc/kvm: Rename kvm_alloc_hpt() for clarity
  powerpc/kvm: Gather HPT related variables into sub-structure
  powerpc/kvm: Don't store values derivable from HPT order
  powerpc/kvm: Split HPT allocation from activation
  powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size
  powerpc/kvm: Create kvmppc_unmap_hpte_helper()
  powerpc/kvm: KVM-HV HPT resizing stub implementation
  powerpc/kvm: Outline of KVM-HV HPT resizing implementation
  powerpc/kvm: KVM-HV HPT resizing implementation
  powerpc/kvm: Advertise availability of HPT resizing on KVM HV

 arch/powerpc/include/asm/kvm_book3s_64.h |  15 +
 arch/powerpc/include/asm/kvm_host.h      |  17 +-
 arch/powerpc/include/asm/kvm_ppc.h       |  15 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 629 +++++++++++++++++++++++++------
 arch/powerpc/kvm/book3s_hv.c             |  50 ++-
 arch/powerpc/kvm/book3s_hv_builtin.c     |   8 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |  62 +--
 arch/powerpc/kvm/powerpc.c               |   6 +
 include/uapi/linux/kvm.h                 |  10 +
 9 files changed, 637 insertions(+), 175 deletions(-)

-- 
2.9.3

* [PATCH 01/11] powerpc/kvm: Reserve capabilities and ioctls for HPT resizing
  2016-12-15  5:53 [PATCH 00/11] KVM implementation of PAPR HPT resizing extension David Gibson
@ 2016-12-15  5:53 ` David Gibson
  2016-12-16  9:25   ` Thomas Huth
  2016-12-16 13:15   ` Thomas Huth
  2016-12-15  5:53 ` [PATCH 02/11] powerpc/kvm: Rename kvm_alloc_hpt() for clarity David Gibson
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 29+ messages in thread
From: David Gibson @ 2016-12-15  5:53 UTC (permalink / raw)
  To: paulus
  Cc: michael, benh, sjitindarsingh, thuth, lvivier, linuxppc-dev,
	linux-kernel, kvm, David Gibson

This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
advertise whether KVM is capable of handling the PAPR extensions for
resizing the hashed page table during guest runtime.

At present, HPT resizing is possible with KVM PR without kernel
modification, since the HPT is managed within qemu.  It's not yet possible
with KVM HV, because there the HPT is managed by KVM.  Currently, qemu has
to use other capabilities which (by accident) reveal whether PR or HV is in
use to know whether it can advertise HPT resizing to the guest.

To avoid ambiguity with existing kernels, the encoding is a bit odd.
    0 means "unknown", since that's what previous kernels will return
    1 means "HPT resize possible, if and only if the HPT is allocated in
      userspace rather than in the kernel".  Userspace can check
      KVM_CAP_PPC_ALLOC_HTAB to determine if that's the case.  In practice
      this will give the same results as userspace's fallback check.
    2 means "HPT resize available and implemented via the ioctl()s
      KVM_PPC_RESIZE_HPT_PREPARE and KVM_PPC_RESIZE_HPT_COMMIT"

For now we always return 1, but the intention is to return 2 once HPT
resize is implemented for KVM HV.
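
To make the intended use concrete, here is a hedged sketch of how
userspace could decode this.  KVM_CHECK_EXTENSION on the VM fd is assumed
to be available (the system fd would do on older kernels), and the
legacy-fallback helper is purely illustrative:

    /* Returns non-zero if HPT resizing can be advertised to the guest. */
    static int hpt_resize_supported(int vm_fd)
    {
            switch (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_SPAPR_RESIZE_HPT)) {
            case 2:         /* kernel implements PREPARE/COMMIT ioctls */
                    return 1;
            case 1:         /* resizable iff the HPT lives in userspace */
                    return !ioctl(vm_fd, KVM_CHECK_EXTENSION,
                                  KVM_CAP_PPC_ALLOC_HTAB);
            default:        /* 0: old kernel; fall back to guessing */
                    return guess_from_other_caps(vm_fd);
            }
    }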

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/kvm/powerpc.c |  3 +++
 include/uapi/linux/kvm.h   | 10 ++++++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index efd1183..bb23923 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -605,6 +605,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_SPAPR_MULTITCE:
 		r = 1;
 		break;
+	case KVM_CAP_SPAPR_RESIZE_HPT:
+		r = 1; /* resize allowed only if HPT is outside kernel */
+		break;
 #endif
 	case KVM_CAP_PPC_HTM:
 		r = cpu_has_feature(CPU_FTR_TM_COMP) &&
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index cac48ed..904afe0 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -685,6 +685,12 @@ struct kvm_ppc_smmu_info {
 	struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
 };
 
+/* for KVM_PPC_RESIZE_HPT_{PREPARE,COMMIT} */
+struct kvm_ppc_resize_hpt {
+	__u64 flags;
+	__u32 shift;
+};
+
 #define KVMIO 0xAE
 
 /* machine type bits, to be used as argument to KVM_CREATE_VM */
@@ -871,6 +877,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_S390_USER_INSTR0 130
 #define KVM_CAP_MSI_DEVID 131
 #define KVM_CAP_PPC_HTM 132
+#define KVM_CAP_SPAPR_RESIZE_HPT 133
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1187,6 +1194,9 @@ struct kvm_s390_ucas_mapping {
 #define KVM_ARM_SET_DEVICE_ADDR	  _IOW(KVMIO,  0xab, struct kvm_arm_device_addr)
 /* Available with KVM_CAP_PPC_RTAS */
 #define KVM_PPC_RTAS_DEFINE_TOKEN _IOW(KVMIO,  0xac, struct kvm_rtas_token_args)
+/* Available with KVM_CAP_SPAPR_RESIZE_HPT */
+#define KVM_PPC_RESIZE_HPT_PREPARE _IOR(KVMIO, 0xad, struct kvm_ppc_resize_hpt)
+#define KVM_PPC_RESIZE_HPT_COMMIT _IOR(KVMIO, 0xae, struct kvm_ppc_resize_hpt)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
-- 
2.9.3

* [PATCH 02/11] powerpc/kvm: Rename kvm_alloc_hpt() for clarity
  2016-12-15  5:53 [PATCH 00/11] KVM implementation of PAPR HPT resizing extension David Gibson
  2016-12-15  5:53 ` [PATCH 01/11] powerpc/kvm: Reserve capabilities and ioctls for HPT resizing David Gibson
@ 2016-12-15  5:53 ` David Gibson
  2016-12-16  9:03   ` Thomas Huth
  2016-12-15  5:53 ` [PATCH 03/11] powerpc/kvm: Gather HPT related variables into sub-structure David Gibson
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: David Gibson @ 2016-12-15  5:53 UTC (permalink / raw)
  To: paulus
  Cc: michael, benh, sjitindarsingh, thuth, lvivier, linuxppc-dev,
	linux-kernel, kvm, David Gibson

The difference between kvm_alloc_hpt() and kvmppc_alloc_hpt() is not at
all obvious from the name.  In practice kvmppc_alloc_hpt() allocates an HPT
by whatever means, and calls kvm_alloc_hpt() which will attempt to allocate
it with CMA only.

To make this less confusing, rename kvm_alloc_hpt() to kvm_alloc_hpt_cma().
Similarly, kvm_release_hpt() is renamed kvm_free_hpt_cma().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/include/asm/kvm_ppc.h   | 4 ++--
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 8 ++++----
 arch/powerpc/kvm/book3s_hv_builtin.c | 8 ++++----
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 2da67bf..3db6be9 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -186,8 +186,8 @@ extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
 		unsigned long tce_value, unsigned long npages);
 extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 			     unsigned long ioba);
-extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
-extern void kvm_release_hpt(struct page *page, unsigned long nr_pages);
+extern struct page *kvm_alloc_hpt_cma(unsigned long nr_pages);
+extern void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages);
 extern int kvmppc_core_init_vm(struct kvm *kvm);
 extern void kvmppc_core_destroy_vm(struct kvm *kvm);
 extern void kvmppc_core_free_memslot(struct kvm *kvm,
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index b795dd1..ae17cdd 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -62,7 +62,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 	}
 
 	kvm->arch.hpt_cma_alloc = 0;
-	page = kvm_alloc_hpt(1ul << (order - PAGE_SHIFT));
+	page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
 	if (page) {
 		hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
 		memset((void *)hpt, 0, (1ul << order));
@@ -108,7 +108,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 
  out_freehpt:
 	if (kvm->arch.hpt_cma_alloc)
-		kvm_release_hpt(page, 1 << (order - PAGE_SHIFT));
+		kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
 	else
 		free_pages(hpt, order - PAGE_SHIFT);
 	return -ENOMEM;
@@ -155,8 +155,8 @@ void kvmppc_free_hpt(struct kvm *kvm)
 	kvmppc_free_lpid(kvm->arch.lpid);
 	vfree(kvm->arch.revmap);
 	if (kvm->arch.hpt_cma_alloc)
-		kvm_release_hpt(virt_to_page(kvm->arch.hpt_virt),
-				1 << (kvm->arch.hpt_order - PAGE_SHIFT));
+		kvm_free_hpt_cma(virt_to_page(kvm->arch.hpt_virt),
+				 1 << (kvm->arch.hpt_order - PAGE_SHIFT));
 	else
 		free_pages(kvm->arch.hpt_virt,
 			   kvm->arch.hpt_order - PAGE_SHIFT);
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 5bb24be..4c4aa47 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -52,19 +52,19 @@ static int __init early_parse_kvm_cma_resv(char *p)
 }
 early_param("kvm_cma_resv_ratio", early_parse_kvm_cma_resv);
 
-struct page *kvm_alloc_hpt(unsigned long nr_pages)
+struct page *kvm_alloc_hpt_cma(unsigned long nr_pages)
 {
 	VM_BUG_ON(order_base_2(nr_pages) < KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
 
 	return cma_alloc(kvm_cma, nr_pages, order_base_2(HPT_ALIGN_PAGES));
 }
-EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
+EXPORT_SYMBOL_GPL(kvm_alloc_hpt_cma);
 
-void kvm_release_hpt(struct page *page, unsigned long nr_pages)
+void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages)
 {
 	cma_release(kvm_cma, page, nr_pages);
 }
-EXPORT_SYMBOL_GPL(kvm_release_hpt);
+EXPORT_SYMBOL_GPL(kvm_free_hpt_cma);
 
 /**
  * kvm_cma_reserve() - reserve area for kvm hash pagetable
-- 
2.9.3

* [PATCH 03/11] powerpc/kvm: Gather HPT related variables into sub-structure
  2016-12-15  5:53 [PATCH 00/11] KVM implementation of PAPR HPT resizing extension David Gibson
  2016-12-15  5:53 ` [PATCH 01/11] powerpc/kvm: Reserve capabilities and ioctls for HPT resizing David Gibson
  2016-12-15  5:53 ` [PATCH 02/11] powerpc/kvm: Rename kvm_alloc_hpt() for clarity David Gibson
@ 2016-12-15  5:53 ` David Gibson
  2016-12-16  9:24   ` Thomas Huth
  2016-12-15  5:53 ` [PATCH 04/11] powerpc/kvm: Don't store values derivable from HPT order David Gibson
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: David Gibson @ 2016-12-15  5:53 UTC (permalink / raw)
  To: paulus
  Cc: michael, benh, sjitindarsingh, thuth, lvivier, linuxppc-dev,
	linux-kernel, kvm, David Gibson

Currently, the powerpc kvm_arch structure contains a number of variables
tracking the state of the guest's hashed page table (HPT) in KVM HV.  This
patch gathers them all together into a single kvm_hpt_info substructure.
This makes life more convenient for the upcoming HPT resizing
implementation.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/include/asm/kvm_host.h | 16 ++++---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 90 ++++++++++++++++++-------------------
 arch/powerpc/kvm/book3s_hv.c        |  2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 62 ++++++++++++-------------
 4 files changed, 87 insertions(+), 83 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index e59b172..2673271 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -241,12 +241,20 @@ struct kvm_arch_memory_slot {
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 };
 
+struct kvm_hpt_info {
+	unsigned long virt;
+	struct revmap_entry *rev;
+	unsigned long npte;
+	unsigned long mask;
+	u32 order;
+	int cma;
+};
+
 struct kvm_arch {
 	unsigned int lpid;
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	unsigned int tlb_sets;
-	unsigned long hpt_virt;
-	struct revmap_entry *revmap;
+	struct kvm_hpt_info hpt;
 	atomic64_t mmio_update;
 	unsigned int host_lpid;
 	unsigned long host_lpcr;
@@ -256,14 +264,10 @@ struct kvm_arch {
 	unsigned long lpcr;
 	unsigned long vrma_slb_v;
 	int hpte_setup_done;
-	u32 hpt_order;
 	atomic_t vcpus_running;
 	u32 online_vcores;
-	unsigned long hpt_npte;
-	unsigned long hpt_mask;
 	atomic_t hpte_mod_interest;
 	cpumask_t need_tlb_flush;
-	int hpt_cma_alloc;
 	struct dentry *debugfs_dir;
 	struct dentry *htab_dentry;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index ae17cdd..b5799d1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -61,12 +61,12 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 			order = PPC_MIN_HPT_ORDER;
 	}
 
-	kvm->arch.hpt_cma_alloc = 0;
+	kvm->arch.hpt.cma = 0;
 	page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
 	if (page) {
 		hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
 		memset((void *)hpt, 0, (1ul << order));
-		kvm->arch.hpt_cma_alloc = 1;
+		kvm->arch.hpt.cma = 1;
 	}
 
 	/* Lastly try successively smaller sizes from the page allocator */
@@ -81,22 +81,22 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 	if (!hpt)
 		return -ENOMEM;
 
-	kvm->arch.hpt_virt = hpt;
-	kvm->arch.hpt_order = order;
+	kvm->arch.hpt.virt = hpt;
+	kvm->arch.hpt.order = order;
 	/* HPTEs are 2**4 bytes long */
-	kvm->arch.hpt_npte = 1ul << (order - 4);
+	kvm->arch.hpt.npte = 1ul << (order - 4);
 	/* 128 (2**7) bytes in each HPTEG */
-	kvm->arch.hpt_mask = (1ul << (order - 7)) - 1;
+	kvm->arch.hpt.mask = (1ul << (order - 7)) - 1;
 
 	atomic64_set(&kvm->arch.mmio_update, 0);
 
 	/* Allocate reverse map array */
-	rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt_npte);
+	rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt.npte);
 	if (!rev) {
 		pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
 		goto out_freehpt;
 	}
-	kvm->arch.revmap = rev;
+	kvm->arch.hpt.rev = rev;
 	kvm->arch.sdr1 = __pa(hpt) | (order - 18);
 
 	pr_info("KVM guest htab at %lx (order %ld), LPID %x\n",
@@ -107,7 +107,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 	return 0;
 
  out_freehpt:
-	if (kvm->arch.hpt_cma_alloc)
+	if (kvm->arch.hpt.cma)
 		kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
 	else
 		free_pages(hpt, order - PAGE_SHIFT);
@@ -129,10 +129,10 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
 			goto out;
 		}
 	}
-	if (kvm->arch.hpt_virt) {
-		order = kvm->arch.hpt_order;
+	if (kvm->arch.hpt.virt) {
+		order = kvm->arch.hpt.order;
 		/* Set the entire HPT to 0, i.e. invalid HPTEs */
-		memset((void *)kvm->arch.hpt_virt, 0, 1ul << order);
+		memset((void *)kvm->arch.hpt.virt, 0, 1ul << order);
 		/*
 		 * Reset all the reverse-mapping chains for all memslots
 		 */
@@ -153,13 +153,13 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
 void kvmppc_free_hpt(struct kvm *kvm)
 {
 	kvmppc_free_lpid(kvm->arch.lpid);
-	vfree(kvm->arch.revmap);
-	if (kvm->arch.hpt_cma_alloc)
-		kvm_free_hpt_cma(virt_to_page(kvm->arch.hpt_virt),
-				 1 << (kvm->arch.hpt_order - PAGE_SHIFT));
+	vfree(kvm->arch.hpt.rev);
+	if (kvm->arch.hpt.cma)
+		kvm_free_hpt_cma(virt_to_page(kvm->arch.hpt.virt),
+				 1 << (kvm->arch.hpt.order - PAGE_SHIFT));
 	else
-		free_pages(kvm->arch.hpt_virt,
-			   kvm->arch.hpt_order - PAGE_SHIFT);
+		free_pages(kvm->arch.hpt.virt,
+			   kvm->arch.hpt.order - PAGE_SHIFT);
 }
 
 /* Bits in first HPTE dword for pagesize 4k, 64k or 16M */
@@ -194,8 +194,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
 	if (npages > 1ul << (40 - porder))
 		npages = 1ul << (40 - porder);
 	/* Can't use more than 1 HPTE per HPTEG */
-	if (npages > kvm->arch.hpt_mask + 1)
-		npages = kvm->arch.hpt_mask + 1;
+	if (npages > kvm->arch.hpt.mask + 1)
+		npages = kvm->arch.hpt.mask + 1;
 
 	hp0 = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) |
 		HPTE_V_BOLTED | hpte0_pgsize_encoding(psize);
@@ -205,7 +205,7 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
 	for (i = 0; i < npages; ++i) {
 		addr = i << porder;
 		/* can't use hpt_hash since va > 64 bits */
-		hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & kvm->arch.hpt_mask;
+		hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & kvm->arch.hpt.mask;
 		/*
 		 * We assume that the hash table is empty and no
 		 * vcpus are using it at this stage.  Since we create
@@ -338,11 +338,11 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 		preempt_enable();
 		return -ENOENT;
 	}
-	hptep = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
+	hptep = (__be64 *)(kvm->arch.hpt.virt + (index << 4));
 	v = orig_v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
 	if (cpu_has_feature(CPU_FTR_ARCH_300))
 		v = hpte_new_to_old_v(v, be64_to_cpu(hptep[1]));
-	gr = kvm->arch.revmap[index].guest_rpte;
+	gr = kvm->arch.hpt.rev[index].guest_rpte;
 
 	unlock_hpte(hptep, orig_v);
 	preempt_enable();
@@ -480,8 +480,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		}
 	}
 	index = vcpu->arch.pgfault_index;
-	hptep = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
-	rev = &kvm->arch.revmap[index];
+	hptep = (__be64 *)(kvm->arch.hpt.virt + (index << 4));
+	rev = &kvm->arch.hpt.rev[index];
 	preempt_disable();
 	while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
 		cpu_relax();
@@ -745,7 +745,7 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			   unsigned long gfn)
 {
-	struct revmap_entry *rev = kvm->arch.revmap;
+	struct revmap_entry *rev = kvm->arch.hpt.rev;
 	unsigned long h, i, j;
 	__be64 *hptep;
 	unsigned long ptel, psize, rcbits;
@@ -763,7 +763,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		 * rmap chain lock.
 		 */
 		i = *rmapp & KVMPPC_RMAP_INDEX;
-		hptep = (__be64 *) (kvm->arch.hpt_virt + (i << 4));
+		hptep = (__be64 *) (kvm->arch.hpt.virt + (i << 4));
 		if (!try_lock_hpte(hptep, HPTE_V_HVLOCK)) {
 			/* unlock rmap before spinning on the HPTE lock */
 			unlock_rmap(rmapp);
@@ -846,7 +846,7 @@ void kvmppc_core_flush_memslot_hv(struct kvm *kvm,
 static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			 unsigned long gfn)
 {
-	struct revmap_entry *rev = kvm->arch.revmap;
+	struct revmap_entry *rev = kvm->arch.hpt.rev;
 	unsigned long head, i, j;
 	__be64 *hptep;
 	int ret = 0;
@@ -864,7 +864,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 
 	i = head = *rmapp & KVMPPC_RMAP_INDEX;
 	do {
-		hptep = (__be64 *) (kvm->arch.hpt_virt + (i << 4));
+		hptep = (__be64 *) (kvm->arch.hpt.virt + (i << 4));
 		j = rev[i].forw;
 
 		/* If this HPTE isn't referenced, ignore it */
@@ -904,7 +904,7 @@ int kvm_age_hva_hv(struct kvm *kvm, unsigned long start, unsigned long end)
 static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			      unsigned long gfn)
 {
-	struct revmap_entry *rev = kvm->arch.revmap;
+	struct revmap_entry *rev = kvm->arch.hpt.rev;
 	unsigned long head, i, j;
 	unsigned long *hp;
 	int ret = 1;
@@ -919,7 +919,7 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	if (*rmapp & KVMPPC_RMAP_PRESENT) {
 		i = head = *rmapp & KVMPPC_RMAP_INDEX;
 		do {
-			hp = (unsigned long *)(kvm->arch.hpt_virt + (i << 4));
+			hp = (unsigned long *)(kvm->arch.hpt.virt + (i << 4));
 			j = rev[i].forw;
 			if (be64_to_cpu(hp[1]) & HPTE_R_R)
 				goto out;
@@ -953,7 +953,7 @@ static int vcpus_running(struct kvm *kvm)
  */
 static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 {
-	struct revmap_entry *rev = kvm->arch.revmap;
+	struct revmap_entry *rev = kvm->arch.hpt.rev;
 	unsigned long head, i, j;
 	unsigned long n;
 	unsigned long v, r;
@@ -978,7 +978,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 	i = head = *rmapp & KVMPPC_RMAP_INDEX;
 	do {
 		unsigned long hptep1;
-		hptep = (__be64 *) (kvm->arch.hpt_virt + (i << 4));
+		hptep = (__be64 *) (kvm->arch.hpt.virt + (i << 4));
 		j = rev[i].forw;
 
 		/*
@@ -1290,8 +1290,8 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
 	flags = ctx->flags;
 
 	i = ctx->index;
-	hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
-	revp = kvm->arch.revmap + i;
+	hptp = (__be64 *)(kvm->arch.hpt.virt + (i * HPTE_SIZE));
+	revp = kvm->arch.hpt.rev + i;
 	lbuf = (unsigned long __user *)buf;
 
 	nb = 0;
@@ -1306,7 +1306,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
 
 		/* Skip uninteresting entries, i.e. clean on not-first pass */
 		if (!first_pass) {
-			while (i < kvm->arch.hpt_npte &&
+			while (i < kvm->arch.hpt.npte &&
 			       !hpte_dirty(revp, hptp)) {
 				++i;
 				hptp += 2;
@@ -1316,7 +1316,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
 		hdr.index = i;
 
 		/* Grab a series of valid entries */
-		while (i < kvm->arch.hpt_npte &&
+		while (i < kvm->arch.hpt.npte &&
 		       hdr.n_valid < 0xffff &&
 		       nb + HPTE_SIZE < count &&
 		       record_hpte(flags, hptp, hpte, revp, 1, first_pass)) {
@@ -1332,7 +1332,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
 			++revp;
 		}
 		/* Now skip invalid entries while we can */
-		while (i < kvm->arch.hpt_npte &&
+		while (i < kvm->arch.hpt.npte &&
 		       hdr.n_invalid < 0xffff &&
 		       record_hpte(flags, hptp, hpte, revp, 0, first_pass)) {
 			/* found an invalid entry */
@@ -1353,7 +1353,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
 		}
 
 		/* Check if we've wrapped around the hash table */
-		if (i >= kvm->arch.hpt_npte) {
+		if (i >= kvm->arch.hpt.npte) {
 			i = 0;
 			ctx->first_pass = 0;
 			break;
@@ -1412,11 +1412,11 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 
 		err = -EINVAL;
 		i = hdr.index;
-		if (i >= kvm->arch.hpt_npte ||
-		    i + hdr.n_valid + hdr.n_invalid > kvm->arch.hpt_npte)
+		if (i >= kvm->arch.hpt.npte ||
+		    i + hdr.n_valid + hdr.n_invalid > kvm->arch.hpt.npte)
 			break;
 
-		hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
+		hptp = (__be64 *)(kvm->arch.hpt.virt + (i * HPTE_SIZE));
 		lbuf = (unsigned long __user *)buf;
 		for (j = 0; j < hdr.n_valid; ++j) {
 			__be64 hpte_v;
@@ -1603,8 +1603,8 @@ static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
 
 	kvm = p->kvm;
 	i = p->hpt_index;
-	hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
-	for (; len != 0 && i < kvm->arch.hpt_npte; ++i, hptp += 2) {
+	hptp = (__be64 *)(kvm->arch.hpt.virt + (i * HPTE_SIZE));
+	for (; len != 0 && i < kvm->arch.hpt.npte; ++i, hptp += 2) {
 		if (!(be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT)))
 			continue;
 
@@ -1614,7 +1614,7 @@ static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
 			cpu_relax();
 		v = be64_to_cpu(hptp[0]) & ~HPTE_V_HVLOCK;
 		hr = be64_to_cpu(hptp[1]);
-		gr = kvm->arch.revmap[i].guest_rpte;
+		gr = kvm->arch.hpt.rev[i].guest_rpte;
 		unlock_hpte(hptp, v);
 		preempt_enable();
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8dcbe37..78baa2b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3114,7 +3114,7 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 		goto out;	/* another vcpu beat us to it */
 
 	/* Allocate hashed page table (if not done already) and reset it */
-	if (!kvm->arch.hpt_virt) {
+	if (!kvm->arch.hpt.virt) {
 		err = kvmppc_alloc_hpt(kvm, NULL);
 		if (err) {
 			pr_err("KVM: Couldn't alloc HPT\n");
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 9ef3c4b..7e2b048 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -79,10 +79,10 @@ void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
 
 	if (*rmap & KVMPPC_RMAP_PRESENT) {
 		i = *rmap & KVMPPC_RMAP_INDEX;
-		head = &kvm->arch.revmap[i];
+		head = &kvm->arch.hpt.rev[i];
 		if (realmode)
 			head = real_vmalloc_addr(head);
-		tail = &kvm->arch.revmap[head->back];
+		tail = &kvm->arch.hpt.rev[head->back];
 		if (realmode)
 			tail = real_vmalloc_addr(tail);
 		rev->forw = i;
@@ -147,8 +147,8 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index,
 	lock_rmap(rmap);
 
 	head = *rmap & KVMPPC_RMAP_INDEX;
-	next = real_vmalloc_addr(&kvm->arch.revmap[rev->forw]);
-	prev = real_vmalloc_addr(&kvm->arch.revmap[rev->back]);
+	next = real_vmalloc_addr(&kvm->arch.hpt.rev[rev->forw]);
+	prev = real_vmalloc_addr(&kvm->arch.hpt.rev[rev->back]);
 	next->back = rev->back;
 	prev->forw = rev->forw;
 	if (head == pte_index) {
@@ -283,11 +283,11 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 
 	/* Find and lock the HPTEG slot to use */
  do_insert:
-	if (pte_index >= kvm->arch.hpt_npte)
+	if (pte_index >= kvm->arch.hpt.npte)
 		return H_PARAMETER;
 	if (likely((flags & H_EXACT) == 0)) {
 		pte_index &= ~7UL;
-		hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+		hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
 		for (i = 0; i < 8; ++i) {
 			if ((be64_to_cpu(*hpte) & HPTE_V_VALID) == 0 &&
 			    try_lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID |
@@ -318,7 +318,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 		}
 		pte_index += i;
 	} else {
-		hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+		hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
 		if (!try_lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID |
 				   HPTE_V_ABSENT)) {
 			/* Lock the slot and check again */
@@ -335,7 +335,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 	}
 
 	/* Save away the guest's idea of the second HPTE dword */
-	rev = &kvm->arch.revmap[pte_index];
+	rev = &kvm->arch.hpt.rev[pte_index];
 	if (realmode)
 		rev = real_vmalloc_addr(rev);
 	if (rev) {
@@ -458,9 +458,9 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	struct revmap_entry *rev;
 	u64 pte, orig_pte, pte_r;
 
-	if (pte_index >= kvm->arch.hpt_npte)
+	if (pte_index >= kvm->arch.hpt.npte)
 		return H_PARAMETER;
-	hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+	hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
 	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
 		cpu_relax();
 	pte = orig_pte = be64_to_cpu(hpte[0]);
@@ -476,7 +476,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 		return H_NOT_FOUND;
 	}
 
-	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+	rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
 	v = pte & ~HPTE_V_HVLOCK;
 	if (v & HPTE_V_VALID) {
 		hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
@@ -544,13 +544,13 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 				break;
 			}
 			if (req != 1 || flags == 3 ||
-			    pte_index >= kvm->arch.hpt_npte) {
+			    pte_index >= kvm->arch.hpt.npte) {
 				/* parameter error */
 				args[j] = ((0xa0 | flags) << 56) + pte_index;
 				ret = H_PARAMETER;
 				break;
 			}
-			hp = (__be64 *) (kvm->arch.hpt_virt + (pte_index << 4));
+			hp = (__be64 *) (kvm->arch.hpt.virt + (pte_index << 4));
 			/* to avoid deadlock, don't spin except for first */
 			if (!try_lock_hpte(hp, HPTE_V_HVLOCK)) {
 				if (n)
@@ -587,7 +587,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 			}
 
 			args[j] = ((0x80 | flags) << 56) + pte_index;
-			rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+			rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
 			note_hpte_modification(kvm, rev);
 
 			if (!(hp0 & HPTE_V_VALID)) {
@@ -642,10 +642,10 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	unsigned long v, r, rb, mask, bits;
 	u64 pte_v, pte_r;
 
-	if (pte_index >= kvm->arch.hpt_npte)
+	if (pte_index >= kvm->arch.hpt.npte)
 		return H_PARAMETER;
 
-	hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+	hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
 	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
 		cpu_relax();
 	v = pte_v = be64_to_cpu(hpte[0]);
@@ -665,7 +665,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	/* Update guest view of 2nd HPTE dword */
 	mask = HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N |
 		HPTE_R_KEY_HI | HPTE_R_KEY_LO;
-	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+	rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
 	if (rev) {
 		r = (rev->guest_rpte & ~mask) | bits;
 		rev->guest_rpte = r;
@@ -711,15 +711,15 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 	int i, n = 1;
 	struct revmap_entry *rev = NULL;
 
-	if (pte_index >= kvm->arch.hpt_npte)
+	if (pte_index >= kvm->arch.hpt.npte)
 		return H_PARAMETER;
 	if (flags & H_READ_4) {
 		pte_index &= ~3;
 		n = 4;
 	}
-	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+	rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
 	for (i = 0; i < n; ++i, ++pte_index) {
-		hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+		hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
 		v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
 		r = be64_to_cpu(hpte[1]);
 		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
@@ -750,11 +750,11 @@ long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
 	unsigned long *rmap;
 	long ret = H_NOT_FOUND;
 
-	if (pte_index >= kvm->arch.hpt_npte)
+	if (pte_index >= kvm->arch.hpt.npte)
 		return H_PARAMETER;
 
-	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
-	hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+	rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
+	hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
 	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
 		cpu_relax();
 	v = be64_to_cpu(hpte[0]);
@@ -796,11 +796,11 @@ long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
 	unsigned long *rmap;
 	long ret = H_NOT_FOUND;
 
-	if (pte_index >= kvm->arch.hpt_npte)
+	if (pte_index >= kvm->arch.hpt.npte)
 		return H_PARAMETER;
 
-	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
-	hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+	rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
+	hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
 	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
 		cpu_relax();
 	v = be64_to_cpu(hpte[0]);
@@ -949,7 +949,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 		somask = (1UL << 28) - 1;
 		vsid = (slb_v & ~SLB_VSID_B) >> SLB_VSID_SHIFT;
 	}
-	hash = (vsid ^ ((eaddr & somask) >> pshift)) & kvm->arch.hpt_mask;
+	hash = (vsid ^ ((eaddr & somask) >> pshift)) & kvm->arch.hpt.mask;
 	avpn = slb_v & ~(somask >> 16);	/* also includes B */
 	avpn |= (eaddr & somask) >> 16;
 
@@ -960,7 +960,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 	val |= avpn;
 
 	for (;;) {
-		hpte = (__be64 *)(kvm->arch.hpt_virt + (hash << 7));
+		hpte = (__be64 *)(kvm->arch.hpt.virt + (hash << 7));
 
 		for (i = 0; i < 16; i += 2) {
 			/* Read the PTE racily */
@@ -996,7 +996,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 		if (val & HPTE_V_SECONDARY)
 			break;
 		val |= HPTE_V_SECONDARY;
-		hash = hash ^ kvm->arch.hpt_mask;
+		hash = hash ^ kvm->arch.hpt.mask;
 	}
 	return -1;
 }
@@ -1045,14 +1045,14 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 				return status;	/* there really was no HPTE */
 			return 0;	/* for prot fault, HPTE disappeared */
 		}
-		hpte = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
+		hpte = (__be64 *)(kvm->arch.hpt.virt + (index << 4));
 		v = orig_v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
 		r = be64_to_cpu(hpte[1]);
 		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
 			v = hpte_new_to_old_v(v, r);
 			r = hpte_new_to_old_r(r);
 		}
-		rev = real_vmalloc_addr(&kvm->arch.revmap[index]);
+		rev = real_vmalloc_addr(&kvm->arch.hpt.rev[index]);
 		gr = rev->guest_rpte;
 
 		unlock_hpte(hpte, orig_v);
-- 
2.9.3

* [PATCH 04/11] powerpc/kvm: Don't store values derivable from HPT order
  2016-12-15  5:53 [PATCH 00/11] KVM implementation of PAPR HPT resizing extension David Gibson
                   ` (2 preceding siblings ...)
  2016-12-15  5:53 ` [PATCH 03/11] powerpc/kvm: Gather HPT related variables into sub-structure David Gibson
@ 2016-12-15  5:53 ` David Gibson
  2016-12-16 10:39   ` Thomas Huth
  2016-12-15  5:53 ` [PATCH 05/11] powerpc/kvm: Split HPT allocation from activation David Gibson
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: David Gibson @ 2016-12-15  5:53 UTC (permalink / raw)
  To: paulus
  Cc: michael, benh, sjitindarsingh, thuth, lvivier, linuxppc-dev,
	linux-kernel, kvm, David Gibson

Currently the kvm_hpt_info structure stores the hashed page table's order,
and also the number of HPTEs it contains and a mask for its size.  The
last two can be easily derived from the order, so remove them and just
calculate them as necessary with a couple of helper inlines.
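
For example, with the minimum order of 18 (a 256kiB HPT), the new helpers
work out as:

    kvmppc_hpt_npte(hpt) == 1UL << (18 - 4)        /* 16384 16-byte HPTEs   */
    kvmppc_hpt_mask(hpt) == (1UL << (18 - 7)) - 1  /* 2047; 128-byte HPTEGs */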

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 12 ++++++++++++
 arch/powerpc/include/asm/kvm_host.h      |  2 --
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 28 +++++++++++++---------------
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 18 +++++++++---------
 4 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 8482921..8810327 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -350,6 +350,18 @@ extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
 
 extern void kvmhv_rm_send_ipi(int cpu);
 
+static inline unsigned long kvmppc_hpt_npte(struct kvm_hpt_info *hpt)
+{
+	/* HPTEs are 2**4 bytes long */
+	return 1UL << (hpt->order - 4);
+}
+
+static inline unsigned long kvmppc_hpt_mask(struct kvm_hpt_info *hpt)
+{
+	/* 128 (2**7) bytes in each HPTEG */
+	return (1UL << (hpt->order - 7)) - 1;
+}
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2673271..3900f63 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -244,8 +244,6 @@ struct kvm_arch_memory_slot {
 struct kvm_hpt_info {
 	unsigned long virt;
 	struct revmap_entry *rev;
-	unsigned long npte;
-	unsigned long mask;
 	u32 order;
 	int cma;
 };
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index b5799d1..fe88132 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -83,15 +83,11 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 
 	kvm->arch.hpt.virt = hpt;
 	kvm->arch.hpt.order = order;
-	/* HPTEs are 2**4 bytes long */
-	kvm->arch.hpt.npte = 1ul << (order - 4);
-	/* 128 (2**7) bytes in each HPTEG */
-	kvm->arch.hpt.mask = (1ul << (order - 7)) - 1;
 
 	atomic64_set(&kvm->arch.mmio_update, 0);
 
 	/* Allocate reverse map array */
-	rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt.npte);
+	rev = vmalloc(sizeof(struct revmap_entry) * kvmppc_hpt_npte(&kvm->arch.hpt));
 	if (!rev) {
 		pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
 		goto out_freehpt;
@@ -194,8 +190,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
 	if (npages > 1ul << (40 - porder))
 		npages = 1ul << (40 - porder);
 	/* Can't use more than 1 HPTE per HPTEG */
-	if (npages > kvm->arch.hpt.mask + 1)
-		npages = kvm->arch.hpt.mask + 1;
+	if (npages > kvmppc_hpt_mask(&kvm->arch.hpt) + 1)
+		npages = kvmppc_hpt_mask(&kvm->arch.hpt) + 1;
 
 	hp0 = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) |
 		HPTE_V_BOLTED | hpte0_pgsize_encoding(psize);
@@ -205,7 +201,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
 	for (i = 0; i < npages; ++i) {
 		addr = i << porder;
 		/* can't use hpt_hash since va > 64 bits */
-		hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & kvm->arch.hpt.mask;
+		hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25)))
+			& kvmppc_hpt_mask(&kvm->arch.hpt);
 		/*
 		 * We assume that the hash table is empty and no
 		 * vcpus are using it at this stage.  Since we create
@@ -1306,7 +1303,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
 
 		/* Skip uninteresting entries, i.e. clean on not-first pass */
 		if (!first_pass) {
-			while (i < kvm->arch.hpt.npte &&
+			while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
 			       !hpte_dirty(revp, hptp)) {
 				++i;
 				hptp += 2;
@@ -1316,7 +1313,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
 		hdr.index = i;
 
 		/* Grab a series of valid entries */
-		while (i < kvm->arch.hpt.npte &&
+		while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
 		       hdr.n_valid < 0xffff &&
 		       nb + HPTE_SIZE < count &&
 		       record_hpte(flags, hptp, hpte, revp, 1, first_pass)) {
@@ -1332,7 +1329,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
 			++revp;
 		}
 		/* Now skip invalid entries while we can */
-		while (i < kvm->arch.hpt.npte &&
+		while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
 		       hdr.n_invalid < 0xffff &&
 		       record_hpte(flags, hptp, hpte, revp, 0, first_pass)) {
 			/* found an invalid entry */
@@ -1353,7 +1350,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
 		}
 
 		/* Check if we've wrapped around the hash table */
-		if (i >= kvm->arch.hpt.npte) {
+		if (i >= kvmppc_hpt_npte(&kvm->arch.hpt)) {
 			i = 0;
 			ctx->first_pass = 0;
 			break;
@@ -1412,8 +1409,8 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 
 		err = -EINVAL;
 		i = hdr.index;
-		if (i >= kvm->arch.hpt.npte ||
-		    i + hdr.n_valid + hdr.n_invalid > kvm->arch.hpt.npte)
+		if (i >= kvmppc_hpt_npte(&kvm->arch.hpt) ||
+		    i + hdr.n_valid + hdr.n_invalid > kvmppc_hpt_npte(&kvm->arch.hpt))
 			break;
 
 		hptp = (__be64 *)(kvm->arch.hpt.virt + (i * HPTE_SIZE));
@@ -1604,7 +1601,8 @@ static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
 	kvm = p->kvm;
 	i = p->hpt_index;
 	hptp = (__be64 *)(kvm->arch.hpt.virt + (i * HPTE_SIZE));
-	for (; len != 0 && i < kvm->arch.hpt.npte; ++i, hptp += 2) {
+	for (; len != 0 && i < kvmppc_hpt_npte(&kvm->arch.hpt);
+	     ++i, hptp += 2) {
 		if (!(be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT)))
 			continue;
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 7e2b048..4ef3561 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -283,7 +283,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 
 	/* Find and lock the HPTEG slot to use */
  do_insert:
-	if (pte_index >= kvm->arch.hpt.npte)
+	if (pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt))
 		return H_PARAMETER;
 	if (likely((flags & H_EXACT) == 0)) {
 		pte_index &= ~7UL;
@@ -458,7 +458,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	struct revmap_entry *rev;
 	u64 pte, orig_pte, pte_r;
 
-	if (pte_index >= kvm->arch.hpt.npte)
+	if (pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt))
 		return H_PARAMETER;
 	hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
 	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
@@ -544,7 +544,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 				break;
 			}
 			if (req != 1 || flags == 3 ||
-			    pte_index >= kvm->arch.hpt.npte) {
+			    pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt)) {
 				/* parameter error */
 				args[j] = ((0xa0 | flags) << 56) + pte_index;
 				ret = H_PARAMETER;
@@ -642,7 +642,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	unsigned long v, r, rb, mask, bits;
 	u64 pte_v, pte_r;
 
-	if (pte_index >= kvm->arch.hpt.npte)
+	if (pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt))
 		return H_PARAMETER;
 
 	hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
@@ -711,7 +711,7 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 	int i, n = 1;
 	struct revmap_entry *rev = NULL;
 
-	if (pte_index >= kvm->arch.hpt.npte)
+	if (pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt))
 		return H_PARAMETER;
 	if (flags & H_READ_4) {
 		pte_index &= ~3;
@@ -750,7 +750,7 @@ long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
 	unsigned long *rmap;
 	long ret = H_NOT_FOUND;
 
-	if (pte_index >= kvm->arch.hpt.npte)
+	if (pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt))
 		return H_PARAMETER;
 
 	rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
@@ -796,7 +796,7 @@ long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
 	unsigned long *rmap;
 	long ret = H_NOT_FOUND;
 
-	if (pte_index >= kvm->arch.hpt.npte)
+	if (pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt))
 		return H_PARAMETER;
 
 	rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
@@ -949,7 +949,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 		somask = (1UL << 28) - 1;
 		vsid = (slb_v & ~SLB_VSID_B) >> SLB_VSID_SHIFT;
 	}
-	hash = (vsid ^ ((eaddr & somask) >> pshift)) & kvm->arch.hpt.mask;
+	hash = (vsid ^ ((eaddr & somask) >> pshift)) & kvmppc_hpt_mask(&kvm->arch.hpt);
 	avpn = slb_v & ~(somask >> 16);	/* also includes B */
 	avpn |= (eaddr & somask) >> 16;
 
@@ -996,7 +996,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 		if (val & HPTE_V_SECONDARY)
 			break;
 		val |= HPTE_V_SECONDARY;
-		hash = hash ^ kvm->arch.hpt.mask;
+		hash = hash ^ kvmppc_hpt_mask(&kvm->arch.hpt);
 	}
 	return -1;
 }
-- 
2.9.3

* [PATCH 05/11] powerpc/kvm: Split HPT allocation from activation
  2016-12-15  5:53 [PATCH 00/11] KVM implementation of PAPR HPT resizing extension David Gibson
                   ` (3 preceding siblings ...)
  2016-12-15  5:53 ` [PATCH 04/11] powerpc/kvm: Don't store values derivable from HPT order David Gibson
@ 2016-12-15  5:53 ` David Gibson
  2016-12-16 11:57   ` Thomas Huth
  2016-12-15  5:53 ` [PATCH 06/11] powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size David Gibson
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: David Gibson @ 2016-12-15  5:53 UTC (permalink / raw)
  To: paulus
  Cc: michael, benh, sjitindarsingh, thuth, lvivier, linuxppc-dev,
	linux-kernel, kvm, David Gibson

Currently, kvmppc_alloc_hpt() both allocates a new hashed page table (HPT)
and sets it up as the active page table for a VM.  For the upcoming HPT
resize implementation we're going to want to allocate HPTs separately from
activating them.

So, split the allocation itself out into kvmppc_allocate_hpt() and perform
the activation with a new kvmppc_set_hpt() function.  Likewise,
kvmppc_free_hpt() now only frees a previously allocated HPT, without
unsetting it as a VM's active HPT.

We also move the logic to fall back to smaller HPT sizes if the first try
fails into the single caller which used that behaviour,
kvmppc_hv_setup_htab_rma().  This introduces a slight semantic change, in
that previously if the initial attempt at CMA allocation failed, we would
fall back to attempting smaller sizes with the page allocator.  Now, at
each size we try CMA first, then the page allocator.  As far as I can tell
this change should be harmless.

To match, we make kvmppc_free_hpt() just free the actual HPT itself.  The
call to kvmppc_free_lpid() that was there, we move to the single caller.
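
The resulting usage pattern is roughly as follows (a condensed sketch of
the kvmppc_hv_setup_htab_rma() and kvmppc_core_destroy_vm_hv() hunks
below, with error handling trimmed):

    struct kvm_hpt_info info;

    if (!kvmppc_allocate_hpt(&info, order))  /* allocate only */
            kvmppc_set_hpt(kvm, &info);      /* then activate for this VM */

    /* ... and on VM teardown: */
    kvmppc_free_lpid(kvm->arch.lpid);        /* now done by the caller */
    kvmppc_free_hpt(&kvm->arch.hpt);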

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  3 ++
 arch/powerpc/include/asm/kvm_ppc.h       |  5 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 90 ++++++++++++++++----------------
 arch/powerpc/kvm/book3s_hv.c             | 18 +++++--
 4 files changed, 65 insertions(+), 51 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 8810327..6dc4004 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -22,6 +22,9 @@
 
 #include <asm/book3s/64/mmu-hash.h>
 
+/* Power architecture requires HPT is at least 256kB */
+#define PPC_MIN_HPT_ORDER	18
+
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 static inline struct kvmppc_book3s_shadow_vcpu *svcpu_get(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 3db6be9..41575b8 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -155,9 +155,10 @@ extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu);
 extern int kvmppc_kvm_pv(struct kvm_vcpu *vcpu);
 extern void kvmppc_map_magic(struct kvm_vcpu *vcpu);
 
-extern long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp);
+extern int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order);
+extern void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info);
 extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp);
-extern void kvmppc_free_hpt(struct kvm *kvm);
+extern void kvmppc_free_hpt(struct kvm_hpt_info *info);
 extern long kvmppc_prepare_vrma(struct kvm *kvm,
 				struct kvm_userspace_memory_region *mem);
 extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index fe88132..68bb228 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -40,76 +40,70 @@
 
 #include "trace_hv.h"
 
-/* Power architecture requires HPT is at least 256kB */
-#define PPC_MIN_HPT_ORDER	18
-
 static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
 				long pte_index, unsigned long pteh,
 				unsigned long ptel, unsigned long *pte_idx_ret);
 static void kvmppc_rmap_reset(struct kvm *kvm);
 
-long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
+int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order)
 {
 	unsigned long hpt = 0;
-	struct revmap_entry *rev;
+	int cma = 0;
 	struct page *page = NULL;
-	long order = KVM_DEFAULT_HPT_ORDER;
-
-	if (htab_orderp) {
-		order = *htab_orderp;
-		if (order < PPC_MIN_HPT_ORDER)
-			order = PPC_MIN_HPT_ORDER;
-	}
+	struct revmap_entry *rev;
+	unsigned long npte;
 
-	kvm->arch.hpt.cma = 0;
+	hpt = 0;
+	cma = 0;
 	page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
 	if (page) {
 		hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
 		memset((void *)hpt, 0, (1ul << order));
-		kvm->arch.hpt.cma = 1;
+		cma = 1;
 	}
 
-	/* Lastly try successively smaller sizes from the page allocator */
-	/* Only do this if userspace didn't specify a size via ioctl */
-	while (!hpt && order > PPC_MIN_HPT_ORDER && !htab_orderp) {
-		hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
-				       __GFP_NOWARN, order - PAGE_SHIFT);
-		if (!hpt)
-			--order;
-	}
+	if (!hpt)
+		hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT
+				       |__GFP_NOWARN, order - PAGE_SHIFT);
 
 	if (!hpt)
 		return -ENOMEM;
 
-	kvm->arch.hpt.virt = hpt;
-	kvm->arch.hpt.order = order;
-
-	atomic64_set(&kvm->arch.mmio_update, 0);
+	/* HPTEs are 2**4 bytes long */
+	npte = 1ul << (order - 4);
 
 	/* Allocate reverse map array */
-	rev = vmalloc(sizeof(struct revmap_entry) * kvmppc_hpt_npte(&kvm->arch.hpt));
+	rev = vmalloc(sizeof(struct revmap_entry) * npte);
 	if (!rev) {
-		pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
+		pr_err("kvmppc_allocate_hpt: Couldn't alloc reverse map array\n");
 		goto out_freehpt;
 	}
-	kvm->arch.hpt.rev = rev;
-	kvm->arch.sdr1 = __pa(hpt) | (order - 18);
 
-	pr_info("KVM guest htab at %lx (order %ld), LPID %x\n",
-		hpt, order, kvm->arch.lpid);
+	info->order = order;
+	info->virt = hpt;
+	info->cma = cma;
+	info->rev = rev;
 
-	if (htab_orderp)
-		*htab_orderp = order;
 	return 0;
 
  out_freehpt:
-	if (kvm->arch.hpt.cma)
+	if (cma)
 		kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
 	else
-		free_pages(hpt, order - PAGE_SHIFT);
+		free_pages(info->virt, order - PAGE_SHIFT);
 	return -ENOMEM;
 }
 
+void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info)
+{
+	atomic64_set(&kvm->arch.mmio_update, 0);
+	kvm->arch.hpt = *info;
+	kvm->arch.sdr1 = __pa(info->virt) | (info->order - 18);
+
+	pr_info("KVM guest htab at %lx (order %ld), LPID %x\n",
+		info->virt, (long)info->order, kvm->arch.lpid);
+}
+
 long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
 {
 	long err = -EBUSY;
@@ -138,24 +132,28 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
 		*htab_orderp = order;
 		err = 0;
 	} else {
-		err = kvmppc_alloc_hpt(kvm, htab_orderp);
-		order = *htab_orderp;
+		struct kvm_hpt_info info;
+
+		err = kvmppc_allocate_hpt(&info, *htab_orderp);
+		if (err < 0)
+			goto out;
+		kvmppc_set_hpt(kvm, &info);
 	}
  out:
 	mutex_unlock(&kvm->lock);
 	return err;
 }
 
-void kvmppc_free_hpt(struct kvm *kvm)
+void kvmppc_free_hpt(struct kvm_hpt_info *info)
 {
-	kvmppc_free_lpid(kvm->arch.lpid);
-	vfree(kvm->arch.hpt.rev);
-	if (kvm->arch.hpt.cma)
-		kvm_free_hpt_cma(virt_to_page(kvm->arch.hpt.virt),
-				 1 << (kvm->arch.hpt.order - PAGE_SHIFT));
+	vfree(info->rev);
+	if (info->cma)
+		kvm_free_hpt_cma(virt_to_page(info->virt),
+				 1 << (info->order - PAGE_SHIFT));
 	else
-		free_pages(kvm->arch.hpt.virt,
-			   kvm->arch.hpt.order - PAGE_SHIFT);
+		free_pages(info->virt, info->order - PAGE_SHIFT);
+	info->virt = 0;
+	info->order = 0;
 }
 
 /* Bits in first HPTE dword for pagesize 4k, 64k or 16M */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 78baa2b..71c5adb 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3115,11 +3115,22 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 
 	/* Allocate hashed page table (if not done already) and reset it */
 	if (!kvm->arch.hpt.virt) {
-		err = kvmppc_alloc_hpt(kvm, NULL);
-		if (err) {
+		int order = KVM_DEFAULT_HPT_ORDER;
+		struct kvm_hpt_info info;
+
+		err = kvmppc_allocate_hpt(&info, order);
+		/* If we get here, it means userspace didn't specify a
+		 * size explicitly.  So, try successively smaller
+		 * sizes if the default failed. */
+		while (err < 0 && --order >= PPC_MIN_HPT_ORDER)
+			err  = kvmppc_allocate_hpt(&info, order);
+
+		if (err < 0) {
 			pr_err("KVM: Couldn't alloc HPT\n");
 			goto out;
 		}
+
+		kvmppc_set_hpt(kvm, &info);
 	}
 
 	/* Look up the memslot for guest physical address 0 */
@@ -3357,7 +3368,8 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
 
 	kvmppc_free_vcores(kvm);
 
-	kvmppc_free_hpt(kvm);
+	kvmppc_free_lpid(kvm->arch.lpid);
+	kvmppc_free_hpt(&kvm->arch.hpt);
 
 	kvmppc_free_pimap(kvm);
 }
-- 
2.9.3

* [PATCH 06/11] powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size
  2016-12-15  5:53 [PATCH 00/11] KVM implementation of PAPR HPT resizing extension David Gibson
                   ` (4 preceding siblings ...)
  2016-12-15  5:53 ` [PATCH 05/11] powerpc/kvm: Split HPT allocation from activation David Gibson
@ 2016-12-15  5:53 ` David Gibson
  2016-12-16 12:44   ` Thomas Huth
  2016-12-15  5:54 ` [PATCH 07/11] powerpc/kvm: Create kvmppc_unmap_hpte_helper() David Gibson
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: David Gibson @ 2016-12-15  5:53 UTC (permalink / raw)
  To: paulus
  Cc: michael, benh, sjitindarsingh, thuth, lvivier, linuxppc-dev,
	linux-kernel, kvm, David Gibson

The KVM_PPC_ALLOCATE_HTAB ioctl() is used to set the size of the hashed
page table (HPT) that userspace expects a guest VM to have, and is also
used to clear that HPT when necessary (e.g. on guest reboot).

At present, once the ioctl() is called for the first time, the HPT size
can never be changed thereafter - on subsequent calls the HPT is cleared,
but remains at the size established by the first call.

With the upcoming HPT resize implementation, we're going to need to allow
userspace to resize the HPT at reset (to change it back to the default
size if the guest changed it).

So, we need to allow this ioctl() to change the HPT size.
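
As a hedged example of what this permits from userspace (vm_fd and the
chosen order are illustrative):

    uint32_t htab_order = 24;   /* e.g. reset the guest to a 16MiB HPT */

    /* A repeat call may now free the old HPT and allocate one of the new
     * size, rather than only clearing the existing one.  Note the order
     * becomes a pure input: as of this patch the kernel no longer writes
     * the resulting order back through the pointer. */
    if (ioctl(vm_fd, KVM_PPC_ALLOCATE_HTAB, &htab_order) < 0)
            perror("KVM_PPC_ALLOCATE_HTAB");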

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/include/asm/kvm_ppc.h  |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 53 ++++++++++++++++++++-----------------
 arch/powerpc/kvm/book3s_hv.c        |  5 +---
 3 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 41575b8..3b837bc 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -157,7 +157,7 @@ extern void kvmppc_map_magic(struct kvm_vcpu *vcpu);
 
 extern int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order);
 extern void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info);
-extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp);
+extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order);
 extern void kvmppc_free_hpt(struct kvm_hpt_info *info);
 extern long kvmppc_prepare_vrma(struct kvm *kvm,
 				struct kvm_userspace_memory_region *mem);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 68bb228..8e5ac2f 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -104,10 +104,22 @@ void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info)
 		info->virt, (long)info->order, kvm->arch.lpid);
 }
 
-long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
+void kvmppc_free_hpt(struct kvm_hpt_info *info)
+{
+	vfree(info->rev);
+	if (info->cma)
+		kvm_free_hpt_cma(virt_to_page(info->virt),
+				 1 << (info->order - PAGE_SHIFT));
+	else
+		free_pages(info->virt, info->order - PAGE_SHIFT);
+	info->virt = 0;
+	info->order = 0;
+}
+
+long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order)
 {
 	long err = -EBUSY;
-	long order;
+	struct kvm_hpt_info info;
 
 	mutex_lock(&kvm->lock);
 	if (kvm->arch.hpte_setup_done) {
@@ -119,8 +131,9 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
 			goto out;
 		}
 	}
-	if (kvm->arch.hpt.virt) {
-		order = kvm->arch.hpt.order;
+	if (kvm->arch.hpt.order == order) {
+		/* We already have a suitable HPT */
+
 		/* Set the entire HPT to 0, i.e. invalid HPTEs */
 		memset((void *)kvm->arch.hpt.virt, 0, 1ul << order);
 		/*
@@ -129,33 +142,23 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
 		kvmppc_rmap_reset(kvm);
 		/* Ensure that each vcpu will flush its TLB on next entry. */
 		cpumask_setall(&kvm->arch.need_tlb_flush);
-		*htab_orderp = order;
 		err = 0;
-	} else {
-		struct kvm_hpt_info info;
-
-		err = kvmppc_allocate_hpt(&info, *htab_orderp);
-		if (err < 0)
-			goto out;
-		kvmppc_set_hpt(kvm, &info);
+		goto out;
 	}
- out:
+
+	if (kvm->arch.hpt.virt)
+		kvmppc_free_hpt(&kvm->arch.hpt);
+
+	err = kvmppc_allocate_hpt(&info, order);
+	if (err < 0)
+		goto out;
+	kvmppc_set_hpt(kvm, &info);
+
+out:
 	mutex_unlock(&kvm->lock);
 	return err;
 }
 
-void kvmppc_free_hpt(struct kvm_hpt_info *info)
-{
-	vfree(info->rev);
-	if (info->cma)
-		kvm_free_hpt_cma(virt_to_page(info->virt),
-				 1 << (info->order - PAGE_SHIFT));
-	else
-		free_pages(info->virt, info->order - PAGE_SHIFT);
-	info->virt = 0;
-	info->order = 0;
-}
-
 /* Bits in first HPTE dword for pagesize 4k, 64k or 16M */
 static inline unsigned long hpte0_pgsize_encoding(unsigned long pgsize)
 {
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 71c5adb..957e473 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3600,12 +3600,9 @@ static long kvm_arch_vm_ioctl_hv(struct file *filp,
 		r = -EFAULT;
 		if (get_user(htab_order, (u32 __user *)argp))
 			break;
-		r = kvmppc_alloc_reset_hpt(kvm, &htab_order);
+		r = kvmppc_alloc_reset_hpt(kvm, htab_order);
 		if (r)
 			break;
-		r = -EFAULT;
-		if (put_user(htab_order, (u32 __user *)argp))
-			break;
 		r = 0;
 		break;
 	}
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 07/11] powerpc/kvm: Create kvmppc_unmap_hpte_helper()
  2016-12-15  5:53 [PATCH 00/11] KVM implementation of PAPR HPT resizing extension David Gibson
                   ` (5 preceding siblings ...)
  2016-12-15  5:53 ` [PATCH 06/11] powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size David Gibson
@ 2016-12-15  5:54 ` David Gibson
  2016-12-16 12:48   ` Thomas Huth
  2016-12-15  5:54 ` [PATCH 08/11] powerpc/kvm: KVM-HV HPT resizing stub implementation David Gibson
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: David Gibson @ 2016-12-15  5:54 UTC (permalink / raw)
  To: paulus
  Cc: michael, benh, sjitindarsingh, thuth, lvivier, linuxppc-dev,
	linux-kernel, kvm, David Gibson

The kvm_unmap_rmapp() function, called from certain MMU notifiers, is used
to force all guest mappings of a particular host page to be set ABSENT, and
removed from the reverse mappings.

For HPT resizing, we will have some cases where we want to set just a
single guest HPTE ABSENT and remove its reverse mappings.  To prepare for
this, we split out the logic from kvm_unmap_rmapp() to evict a single HPTE,
moving it to a new helper function.
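
For context, the locking protocol the new helper expects can be sketched
like this (a sketch of the caller's obligations, mirroring what the
updated kvm_unmap_rmapp() below does, not new code):

	lock_rmap(rmapp);
	/* ... locate HPTE index i and lock it with try_lock_hpte() ... */
	kvmppc_unmap_hpte(kvm, i, rmapp, gfn);
	unlock_rmap(rmapp);
	__unlock_hpte(hptep, be64_to_cpu(hptep[0]));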

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 77 +++++++++++++++++++++----------------
 1 file changed, 44 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 8e5ac2f..cd145eb 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -740,13 +740,53 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 	return kvm_handle_hva_range(kvm, hva, hva + 1, handler);
 }
 
+/* Must be called with both HPTE and rmap locked */
+static void kvmppc_unmap_hpte(struct kvm *kvm, unsigned long i,
+			      unsigned long *rmapp, unsigned long gfn)
+{
+	__be64 *hptep = (__be64 *) (kvm->arch.hpt.virt + (i << 4));
+	struct revmap_entry *rev = kvm->arch.hpt.rev;
+	unsigned long j, h;
+	unsigned long ptel, psize, rcbits;
+
+	j = rev[i].forw;
+	if (j == i) {
+		/* chain is now empty */
+		*rmapp &= ~(KVMPPC_RMAP_PRESENT | KVMPPC_RMAP_INDEX);
+	} else {
+		/* remove i from chain */
+		h = rev[i].back;
+		rev[h].forw = j;
+		rev[j].back = h;
+		rev[i].forw = rev[i].back = i;
+		*rmapp = (*rmapp & ~KVMPPC_RMAP_INDEX) | j;
+	}
+
+	/* Now check and modify the HPTE */
+	ptel = rev[i].guest_rpte;
+	psize = hpte_page_size(be64_to_cpu(hptep[0]), ptel);
+	if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) &&
+	    hpte_rpn(ptel, psize) == gfn) {
+		hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+		kvmppc_invalidate_hpte(kvm, hptep, i);
+		hptep[1] &= ~cpu_to_be64(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
+		/* Harvest R and C */
+		rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
+		*rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
+		if (rcbits & HPTE_R_C)
+			kvmppc_update_rmap_change(rmapp, psize);
+		if (rcbits & ~rev[i].guest_rpte) {
+			rev[i].guest_rpte = ptel | rcbits;
+			note_hpte_modification(kvm, &rev[i]);
+		}
+	}	
+}
+
 static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			   unsigned long gfn)
 {
-	struct revmap_entry *rev = kvm->arch.hpt.rev;
-	unsigned long h, i, j;
+	unsigned long i;
 	__be64 *hptep;
-	unsigned long ptel, psize, rcbits;
 
 	for (;;) {
 		lock_rmap(rmapp);
@@ -769,37 +809,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 				cpu_relax();
 			continue;
 		}
-		j = rev[i].forw;
-		if (j == i) {
-			/* chain is now empty */
-			*rmapp &= ~(KVMPPC_RMAP_PRESENT | KVMPPC_RMAP_INDEX);
-		} else {
-			/* remove i from chain */
-			h = rev[i].back;
-			rev[h].forw = j;
-			rev[j].back = h;
-			rev[i].forw = rev[i].back = i;
-			*rmapp = (*rmapp & ~KVMPPC_RMAP_INDEX) | j;
-		}
 
-		/* Now check and modify the HPTE */
-		ptel = rev[i].guest_rpte;
-		psize = hpte_page_size(be64_to_cpu(hptep[0]), ptel);
-		if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) &&
-		    hpte_rpn(ptel, psize) == gfn) {
-			hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
-			kvmppc_invalidate_hpte(kvm, hptep, i);
-			hptep[1] &= ~cpu_to_be64(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
-			/* Harvest R and C */
-			rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
-			*rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
-			if (rcbits & HPTE_R_C)
-				kvmppc_update_rmap_change(rmapp, psize);
-			if (rcbits & ~rev[i].guest_rpte) {
-				rev[i].guest_rpte = ptel | rcbits;
-				note_hpte_modification(kvm, &rev[i]);
-			}
-		}
+		kvmppc_unmap_hpte(kvm, i, rmapp, gfn);
 		unlock_rmap(rmapp);
 		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	}
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 08/11] powerpc/kvm: KVM-HV HPT resizing stub implementation
  2016-12-15  5:53 [PATCH 00/11] KVM implementation of PAPR HPT resizing extension David Gibson
                   ` (6 preceding siblings ...)
  2016-12-15  5:54 ` [PATCH 07/11] powerpc/kvm: Create kvmppc_unmap_hpte_helper() David Gibson
@ 2016-12-15  5:54 ` David Gibson
  2016-12-16 15:35   ` Thomas Huth
  2016-12-15  5:54 ` [PATCH 09/11] powerpc/kvm: Outline of KVM-HV HPT resizing implementation David Gibson
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: David Gibson @ 2016-12-15  5:54 UTC (permalink / raw)
  To: paulus
  Cc: michael, benh, sjitindarsingh, thuth, lvivier, linuxppc-dev,
	linux-kernel, kvm, David Gibson

This patch adds a stub (always failing) implementation of the ioctl()s
for the HPT resizing PAPR extension.
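
For reference, the shape of the userspace invocation these stubs sit
behind looks roughly like this (a sketch; with only this patch applied
both calls simply fail):

	struct kvm_ppc_resize_hpt rhpt = {
		.flags = 0,
		.shift = 26,	/* e.g. ask for a 2^26 byte (64MiB) HPT */
	};

	/* Both fail with EIO until the real implementation lands */
	ret = ioctl(vm_fd, KVM_PPC_RESIZE_HPT_PREPARE, &rhpt);
	ret = ioctl(vm_fd, KVM_PPC_RESIZE_HPT_COMMIT, &rhpt);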

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/include/asm/kvm_ppc.h  |  4 ++++
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 16 ++++++++++++++++
 arch/powerpc/kvm/book3s_hv.c        | 22 ++++++++++++++++++++++
 3 files changed, 42 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 3b837bc..f8eaed0 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -215,6 +215,10 @@ extern void kvmppc_bookehv_exit(void);
 extern int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu);
 
 extern int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *);
+extern long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
+					    struct kvm_ppc_resize_hpt *rhpt);
+extern long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
+					   struct kvm_ppc_resize_hpt *rhpt);
 
 int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index cd145eb..ac0f18b 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1160,6 +1160,22 @@ void kvmppc_unpin_guest_page(struct kvm *kvm, void *va, unsigned long gpa,
 }
 
 /*
+ * HPT resizing
+ */
+
+long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
+				     struct kvm_ppc_resize_hpt *rhpt)
+{
+	return -EIO;
+}
+
+long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
+				    struct kvm_ppc_resize_hpt *rhpt)
+{
+	return -EIO;
+}
+
+/*
  * Functions for reading and writing the hash table via reads and
  * writes on a file descriptor.
  *
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 957e473..d022322 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3617,6 +3617,28 @@ static long kvm_arch_vm_ioctl_hv(struct file *filp,
 		break;
 	}
 
+	case KVM_PPC_RESIZE_HPT_PREPARE: {
+		struct kvm_ppc_resize_hpt rhpt;
+
+		r = -EFAULT;
+		if (copy_from_user(&rhpt, argp, sizeof(rhpt)))
+			break;
+
+		r = kvm_vm_ioctl_resize_hpt_prepare(kvm, &rhpt);
+		break;
+	}
+
+	case KVM_PPC_RESIZE_HPT_COMMIT: {
+		struct kvm_ppc_resize_hpt rhpt;
+
+		r = -EFAULT;
+		if (copy_from_user(&rhpt, argp, sizeof(rhpt)))
+			break;
+
+		r = kvm_vm_ioctl_resize_hpt_commit(kvm, &rhpt);
+		break;
+	}
+
 	default:
 		r = -ENOTTY;
 	}
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 09/11] powerpc/kvm: Outline of KVM-HV HPT resizing implementation
  2016-12-15  5:53 [PATCH 00/11] KVM implementation of PAPR HPT resizing extension David Gibson
                   ` (7 preceding siblings ...)
  2016-12-15  5:54 ` [PATCH 08/11] powerpc/kvm: KVM-HV HPT resizing stub implementation David Gibson
@ 2016-12-15  5:54 ` David Gibson
  2016-12-15  5:54 ` [PATCH 10/11] powerpc/kvm: " David Gibson
  2016-12-15  5:54 ` [PATCH 11/11] powerpc/kvm: Advertise availability of HPT resizing on KVM HV David Gibson
  10 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2016-12-15  5:54 UTC (permalink / raw)
  To: paulus
  Cc: michael, benh, sjitindarsingh, thuth, lvivier, linuxppc-dev,
	linux-kernel, kvm, David Gibson

This adds an outline (not yet working) of an implementation for the HPT
resizing PAPR extension.  Specifically it adds the work function which will
handle preparation for the resize, and synchronization between this, the
HPT resizing hypercalls, the guest page fault path and guest HPT update
paths.
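
The calling convention this implements is worth spelling out from the
caller's side (a hypothetical userspace loop; a positive return value
from PREPARE is an estimated completion time in milliseconds, as in the
code below):

	int ret;

	/* Start (or poll) the preparation; retry while in progress */
	do {
		ret = ioctl(vm_fd, KVM_PPC_RESIZE_HPT_PREPARE, &rhpt);
		if (ret > 0)
			usleep(ret * 1000);
	} while (ret > 0);

	if (ret == 0)	/* preparation complete, switch over */
		ret = ioctl(vm_fd, KVM_PPC_RESIZE_HPT_COMMIT, &rhpt);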

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/include/asm/kvm_host.h |   3 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 179 +++++++++++++++++++++++++++++++++++-
 arch/powerpc/kvm/book3s_hv.c        |   3 +
 3 files changed, 183 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 3900f63..23559c3 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -248,6 +248,8 @@ struct kvm_hpt_info {
 	int cma;
 };
 
+struct kvm_resize_hpt;
+
 struct kvm_arch {
 	unsigned int lpid;
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
@@ -268,6 +270,7 @@ struct kvm_arch {
 	cpumask_t need_tlb_flush;
 	struct dentry *debugfs_dir;
 	struct dentry *htab_dentry;
+	struct kvm_resize_hpt *resize_hpt; /* protected by kvm->lock */
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	struct mutex hpt_mutex;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index ac0f18b..a2ac749 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -40,9 +40,34 @@
 
 #include "trace_hv.h"
 
+//#define DEBUG_RESIZE_HPT	1
+
+#ifdef DEBUG_RESIZE_HPT
+#define resize_hpt_debug(resize, ...)				\
+	do {							\
+		printk(KERN_DEBUG "RESIZE HPT %p: ", resize);	\
+		printk(__VA_ARGS__);				\
+	} while (0)
+#else
+#define resize_hpt_debug(resize, ...)				\
+	do { } while (0)
+#endif
+
 static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
 				long pte_index, unsigned long pteh,
 				unsigned long ptel, unsigned long *pte_idx_ret);
+
+struct kvm_resize_hpt {
+	/* These fields read-only after init */
+	struct kvm *kvm;
+	struct work_struct work;
+	u32 order;
+
+	/* These fields protected by kvm->lock */
+	int error;
+	bool prepare_done;
+};
+
 static void kvmppc_rmap_reset(struct kvm *kvm);
 
 int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order)
@@ -1162,17 +1187,167 @@ void kvmppc_unpin_guest_page(struct kvm *kvm, void *va, unsigned long gpa,
 /*
  * HPT resizing
  */
+static int resize_hpt_allocate(struct kvm_resize_hpt *resize)
+{
+	return 0;
+}
+
+static int resize_hpt_rehash(struct kvm_resize_hpt *resize)
+{
+	return -EIO;
+}
+
+static void resize_hpt_pivot(struct kvm_resize_hpt *resize)
+{
+}
+
+static void resize_hpt_release(struct kvm *kvm, struct kvm_resize_hpt *resize)
+{
+	BUG_ON(kvm->arch.resize_hpt != resize);
+	kvm->arch.resize_hpt = NULL;
+	kfree(resize);
+}
+
+static void resize_hpt_prepare_work(struct work_struct *work)
+{
+	struct kvm_resize_hpt *resize = container_of(work,
+						     struct kvm_resize_hpt,
+						     work);
+	struct kvm *kvm = resize->kvm;
+	int err;
+
+	resize_hpt_debug(resize, "resize_hpt_prepare_work(): order = %d\n",
+			 resize->order);
+
+	err = resize_hpt_allocate(resize);
+
+	mutex_lock(&kvm->lock);
+
+	resize->error = err;
+	resize->prepare_done = true;
+
+	mutex_unlock(&kvm->lock);
+}
 
 long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
 				     struct kvm_ppc_resize_hpt *rhpt)
 {
-	return -EIO;
+	unsigned long flags = rhpt->flags;
+	unsigned long shift = rhpt->shift;
+	struct kvm_resize_hpt *resize;
+	int ret;
+
+	if (flags != 0)
+		return -EINVAL;
+
+	if (shift && ((shift < 18) || (shift > 46)))
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+
+	resize = kvm->arch.resize_hpt;
+
+	if (resize) {
+		if (resize->order == shift) {
+			/* Suitable resize in progress */
+			if (resize->prepare_done) {
+				ret = resize->error;
+				if (ret != 0)
+					resize_hpt_release(kvm, resize);
+			} else {
+				ret = 100; /* estimated time in ms */
+			}
+
+			goto out;
+		}
+
+		/* not suitable, cancel it */
+		resize_hpt_release(kvm, resize);
+	}
+
+	ret = 0;
+	if (!shift)
+		goto out; /* nothing to do */
+
+	/* start new resize */
+
+	resize = kzalloc(sizeof(*resize), GFP_KERNEL);
+	resize->order = shift;
+	resize->kvm = kvm;
+	INIT_WORK(&resize->work, resize_hpt_prepare_work);
+	kvm->arch.resize_hpt = resize;
+
+	schedule_work(&resize->work);
+
+	ret = 100; /* estimated time in ms */
+
+out:
+	mutex_unlock(&kvm->lock);
+	return ret;
+}
+
+static void resize_hpt_boot_vcpu(void *opaque)
+{
+	/* Nothing to do, just force a KVM exit */
 }
 
 long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
 				    struct kvm_ppc_resize_hpt *rhpt)
 {
-	return -EIO;
+	unsigned long flags = rhpt->flags;
+	unsigned long shift = rhpt->shift;
+	struct kvm_resize_hpt *resize;
+	long ret;
+
+	if (flags != 0)
+		return -EINVAL;
+
+	if (shift && ((shift < 18) || (shift > 46)))
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+
+	resize = kvm->arch.resize_hpt;
+
+	/* This shouldn't be possible */
+	ret = -EIO;
+	if (WARN_ON(!kvm->arch.hpte_setup_done))
+		goto out_no_hpt;
+
+	/* Stop VCPUs from running while we mess with the HPT */
+	kvm->arch.hpte_setup_done = 0;
+	smp_mb();
+
+	/* Boot all CPUs out of the guest so they re-read
+	 * hpte_setup_done */
+	on_each_cpu(resize_hpt_boot_vcpu, NULL, 1);
+
+	ret = -ENXIO;
+	if (!resize || (resize->order != shift))
+		goto out;
+
+	ret = -EBUSY;
+	if (!resize->prepare_done)
+		goto out;
+
+	ret = resize->error;
+	if (ret != 0)
+		goto out;
+
+	ret = resize_hpt_rehash(resize);
+	if (ret != 0)
+		goto out;
+
+	resize_hpt_pivot(resize);
+
+out:
+	/* Let VCPUs run again */
+	kvm->arch.hpte_setup_done = 1;
+	smp_mb();
+out_no_hpt:
+	resize_hpt_release(kvm, resize);
+	mutex_unlock(&kvm->lock);
+	return ret;
 }
 
 /*
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d022322..613b8a5 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3323,6 +3323,9 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 
 	kvm->arch.lpcr = lpcr;
 
+	/* Initialization for future HPT resizes */
+	kvm->arch.resize_hpt = NULL;
+
 	/*
 	 * Work out how many sets the TLB has, for the use of
 	 * the TLB invalidation loop in book3s_hv_rmhandlers.S.
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 10/11] powerpc/kvm: KVM-HV HPT resizing implementation
  2016-12-15  5:53 [PATCH 00/11] KVM implementation of PAPR HPT resizing extension David Gibson
                   ` (8 preceding siblings ...)
  2016-12-15  5:54 ` [PATCH 09/11] powerpc/kvm: Outline of KVM-HV HPT resizing implementation David Gibson
@ 2016-12-15  5:54 ` David Gibson
  2016-12-15  5:54 ` [PATCH 11/11] powerpc/kvm: Advertise availability of HPT resizing on KVM HV David Gibson
  10 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2016-12-15  5:54 UTC (permalink / raw)
  To: paulus
  Cc: michael, benh, sjitindarsingh, thuth, lvivier, linuxppc-dev,
	linux-kernel, kvm, David Gibson

This adds the "guts" of the implementation for the HPT resizing PAPR
extension.  It has the code to allocate and clear a new HPT, rehash an
existing HPT's entries into it, and accomplish the switchover for a
KVM guest from the old HPT to the new one.
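
In outline, the pieces fit together as below (a sketch of the control
flow, not literal code from the diff; resize_hpt_allocate() runs from the
PREPARE work function, the other two from the COMMIT ioctl with the vcpus
held out of the guest):

	err = resize_hpt_allocate(resize);	/* new, zeroed HPT + revmap */
	...
	err = resize_hpt_rehash(resize);	/* re-hash every old HPTE */
	if (!err)
		resize_hpt_pivot(resize);	/* swap in the new HPT */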

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 188 +++++++++++++++++++++++++++++++++++-
 1 file changed, 187 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index a2ac749..2a86c07 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -66,6 +66,10 @@ struct kvm_resize_hpt {
 	/* These fields protected by kvm->lock */
 	int error;
 	bool prepare_done;
+
+	/* Private to the work thread, until prepare_done is true,
+	 * then protected by kvm->resize_hpt_sem */
+	struct kvm_hpt_info hpt;
 };
 
 static void kvmppc_rmap_reset(struct kvm *kvm);
@@ -1189,21 +1193,203 @@ void kvmppc_unpin_guest_page(struct kvm *kvm, void *va, unsigned long gpa,
  */
 static int resize_hpt_allocate(struct kvm_resize_hpt *resize)
 {
+	int rc;
+
+	rc = kvmppc_allocate_hpt(&resize->hpt, resize->order);
+	if (rc < 0)
+		return rc;
+
+	resize_hpt_debug(resize, "resize_hpt_allocate(): HPT @ 0x%lx\n",
+			 resize->hpt.virt);
+
 	return 0;
 }
 
+static unsigned long resize_hpt_rehash_hpte(struct kvm_resize_hpt *resize,
+					    unsigned long idx)
+{
+	struct kvm *kvm = resize->kvm;
+	struct kvm_hpt_info *old = &kvm->arch.hpt;
+	struct kvm_hpt_info *new = &resize->hpt;
+	unsigned long old_hash_mask = (1ULL << (old->order - 7)) - 1;
+	unsigned long new_hash_mask = (1ULL << (new->order - 7)) - 1;
+	__be64 *hptep, *new_hptep;
+	unsigned long vpte, rpte, guest_rpte;
+	int ret;
+	struct revmap_entry *rev;
+	unsigned long apsize, psize, avpn, pteg, hash;
+	unsigned long new_idx, new_pteg, replace_vpte;
+
+	hptep = (__be64 *)(old->virt + (idx << 4));
+
+	/* Guest is stopped, so new HPTEs can't be added or faulted
+	 * in, only unmapped or altered by host actions.  So, it's
+	 * safe to check this before we take the HPTE lock */
+	vpte = be64_to_cpu(hptep[0]);
+	if (!(vpte & HPTE_V_VALID) && !(vpte & HPTE_V_ABSENT))
+		return 0; /* nothing to do */
+
+	while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
+		cpu_relax();
+
+	vpte = be64_to_cpu(hptep[0]);
+
+	ret = 0;
+	if (!(vpte & HPTE_V_VALID) && !(vpte & HPTE_V_ABSENT))
+		/* Nothing to do */
+		goto out;
+
+	/* Unmap */
+	rev = &old->rev[idx];
+	guest_rpte = rev->guest_rpte;
+
+	ret = -EIO;
+	apsize = hpte_page_size(vpte, guest_rpte);
+	if (!apsize)
+		goto out;
+
+	if (vpte & HPTE_V_VALID) {
+		unsigned long gfn = hpte_rpn(guest_rpte, apsize);
+		int srcu_idx = srcu_read_lock(&kvm->srcu);
+		struct kvm_memory_slot *memslot =
+			__gfn_to_memslot(kvm_memslots(kvm), gfn);
+
+		if (memslot) {
+			unsigned long *rmapp;
+			rmapp = &memslot->arch.rmap[gfn - memslot->base_gfn];
+
+			lock_rmap(rmapp);
+			kvmppc_unmap_hpte(kvm, idx, rmapp, gfn);
+			unlock_rmap(rmapp);
+		}
+
+		srcu_read_unlock(&kvm->srcu, srcu_idx);
+	}
+
+	/* Reload PTE after unmap */
+	vpte = be64_to_cpu(hptep[0]);
+
+	BUG_ON(vpte & HPTE_V_VALID);
+	BUG_ON(!(vpte & HPTE_V_ABSENT));
+
+	ret = 0;
+	if (!(vpte & HPTE_V_BOLTED))
+		goto out;
+
+	rpte = be64_to_cpu(hptep[1]);
+	psize = hpte_base_page_size(vpte, rpte);
+	avpn = HPTE_V_AVPN_VAL(vpte) & ~((psize - 1) >> 23);
+	pteg = idx / HPTES_PER_GROUP;
+	if (vpte & HPTE_V_SECONDARY)
+		pteg = ~pteg;
+
+	if (!(vpte & HPTE_V_1TB_SEG)) {
+		unsigned long offset, vsid;
+
+		/* We only have 28 - 23 bits of offset in avpn */
+		offset = (avpn & 0x1f) << 23;
+		vsid = avpn >> 5;
+		/* We can find more bits from the pteg value */
+		if (psize < (1ULL << 23))
+			offset |= ((vsid ^ pteg) & old_hash_mask) * psize;
+
+		hash = vsid ^ (offset / psize);
+	} else {
+		unsigned long offset, vsid;
+
+		/* We only have 40 - 23 bits of seg_off in avpn */
+		offset = (avpn & 0x1ffff) << 23;
+		vsid = avpn >> 17;
+		if (psize < (1ULL << 23))
+			offset |= ((vsid ^ (vsid << 25) ^ pteg) & old_hash_mask) * psize;
+
+		hash = vsid ^ (vsid << 25) ^ (offset / psize);
+	}
+
+	new_pteg = hash & new_hash_mask;
+	if (vpte & HPTE_V_SECONDARY) {
+		BUG_ON(~pteg != (hash & old_hash_mask));
+		new_pteg = ~new_pteg;
+	} else {
+		BUG_ON(pteg != (hash & old_hash_mask));
+	}
+
+	new_idx = new_pteg * HPTES_PER_GROUP + (idx % HPTES_PER_GROUP);
+	new_hptep = (__be64 *)(new->virt + (new_idx << 4));
+
+	replace_vpte = be64_to_cpu(new_hptep[0]);
+
+	if (replace_vpte & (HPTE_V_VALID | HPTE_V_ABSENT)) {
+		BUG_ON(new->order >= old->order);
+
+		if (replace_vpte & HPTE_V_BOLTED) {
+			if (vpte & HPTE_V_BOLTED)
+				/* Bolted collision, nothing we can do */
+				ret = -ENOSPC;
+			/* Discard the new HPTE */
+			goto out;
+		}
+
+		/* Discard the previous HPTE */
+	}
+
+	new_hptep[1] = cpu_to_be64(rpte);
+	new->rev[new_idx].guest_rpte = guest_rpte;
+	/* No need for a barrier, since new HPT isn't active */
+	new_hptep[0] = cpu_to_be64(vpte);
+	unlock_hpte(new_hptep, vpte);
+
+out:
+	unlock_hpte(hptep, vpte);
+	return ret;
+}
+
 static int resize_hpt_rehash(struct kvm_resize_hpt *resize)
 {
-	return -EIO;
+	struct kvm *kvm = resize->kvm;
+	unsigned long i;
+	int rc;
+
+	for (i = 0; i < kvmppc_hpt_npte(&kvm->arch.hpt); i++) {
+		rc = resize_hpt_rehash_hpte(resize, i);
+		if (rc != 0)
+			return rc;
+	}
+
+	return 0;
 }
 
 static void resize_hpt_pivot(struct kvm_resize_hpt *resize)
 {
+	struct kvm *kvm = resize->kvm;
+	struct kvm_hpt_info hpt_tmp;
+
+	/* Exchange the pending tables in the resize structure with
+	 * the active tables */
+
+	resize_hpt_debug(resize, "resize_hpt_pivot()\n");
+
+	spin_lock(&kvm->mmu_lock);
+	asm volatile("ptesync" : : : "memory");
+
+	hpt_tmp = kvm->arch.hpt;
+	kvmppc_set_hpt(kvm, &resize->hpt);
+	resize->hpt = hpt_tmp;
+
+	spin_unlock(&kvm->mmu_lock);
+
+	synchronize_srcu_expedited(&kvm->srcu);
+
+	resize_hpt_debug(resize, "resize_hpt_pivot() done\n");
 }
 
 static void resize_hpt_release(struct kvm *kvm, struct kvm_resize_hpt *resize)
 {
 	BUG_ON(kvm->arch.resize_hpt != resize);
+
+	if (resize->hpt.virt)
+		kvmppc_free_hpt(&resize->hpt);
+
 	kvm->arch.resize_hpt = NULL;
 	kfree(resize);
 }
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 11/11] powerpc/kvm: Advertise availability of HPT resizing on KVM HV
  2016-12-15  5:53 [PATCH 00/11] KVM implementation of PAPR HPT resizing extension David Gibson
                   ` (9 preceding siblings ...)
  2016-12-15  5:54 ` [PATCH 10/11] powerpc/kvm: " David Gibson
@ 2016-12-15  5:54 ` David Gibson
  10 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2016-12-15  5:54 UTC (permalink / raw)
  To: paulus
  Cc: michael, benh, sjitindarsingh, thuth, lvivier, linuxppc-dev,
	linux-kernel, kvm, David Gibson

This updates the KVM_CAP_SPAPR_RESIZE_HPT capability to advertise the
presence of in-kernel HPT resizing on KVM HV.  In fact the HPT resizing
isn't fully implemented, but this allows us to experiment with what's
there.
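
From the userspace side, the check then looks something like this sketch
(interpreting the values according to the encoding from patch 01):

	ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_SPAPR_RESIZE_HPT);
	if (ret >= 2) {
		/* in-kernel resize via the PREPARE/COMMIT ioctls */
	} else if (ret == 1) {
		/* resize possible iff the HPT is allocated in userspace */
	} else {
		/* 0: older kernel, capability unknown */
	}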

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/kvm/powerpc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index bb23923..965d26b 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -606,7 +606,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = 1;
 		break;
 	case KVM_CAP_SPAPR_RESIZE_HPT:
-		r = 1; /* resize allowed only if HPT is outside kernel */
+		if (hv_enabled)
+			r = 2; /* In-kernel resize implementation */
+		else
+			r = 1; /* outside kernel resize allowed */
 		break;
 #endif
 	case KVM_CAP_PPC_HTM:
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 02/11] powerpc/kvm: Rename kvm_alloc_hpt() for clarity
  2016-12-15  5:53 ` [PATCH 02/11] powerpc/kvm: Rename kvm_alloc_hpt() for clarity David Gibson
@ 2016-12-16  9:03   ` Thomas Huth
  0 siblings, 0 replies; 29+ messages in thread
From: Thomas Huth @ 2016-12-16  9:03 UTC (permalink / raw)
  To: David Gibson, paulus
  Cc: michael, benh, sjitindarsingh, lvivier, linuxppc-dev, linux-kernel, kvm

On 15.12.2016 06:53, David Gibson wrote:
> The difference between kvm_alloc_hpt() and kvmppc_alloc_hpt() is not at
> all obvious from the name.  In practice kvmppc_alloc_hpt() allocates an HPT
> by whatever means, and calls kvm_alloc_hpt() which will attempt to allocate
> it with CMA only.
> 
> To make this less confusing, rename kvm_alloc_hpt() to kvm_alloc_hpt_cma().
> Similarly, kvm_release_hpt() is renamed kvm_free_hpt_cma().
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  arch/powerpc/include/asm/kvm_ppc.h   | 4 ++--
>  arch/powerpc/kvm/book3s_64_mmu_hv.c  | 8 ++++----
>  arch/powerpc/kvm/book3s_hv_builtin.c | 8 ++++----
>  3 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 2da67bf..3db6be9 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -186,8 +186,8 @@ extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>  		unsigned long tce_value, unsigned long npages);
>  extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  			     unsigned long ioba);
> -extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
> -extern void kvm_release_hpt(struct page *page, unsigned long nr_pages);
> +extern struct page *kvm_alloc_hpt_cma(unsigned long nr_pages);
> +extern void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages);
>  extern int kvmppc_core_init_vm(struct kvm *kvm);
>  extern void kvmppc_core_destroy_vm(struct kvm *kvm);
>  extern void kvmppc_core_free_memslot(struct kvm *kvm,
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index b795dd1..ae17cdd 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -62,7 +62,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
>  	}
>  
>  	kvm->arch.hpt_cma_alloc = 0;
> -	page = kvm_alloc_hpt(1ul << (order - PAGE_SHIFT));
> +	page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
>  	if (page) {
>  		hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
>  		memset((void *)hpt, 0, (1ul << order));
> @@ -108,7 +108,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
>  
>   out_freehpt:
>  	if (kvm->arch.hpt_cma_alloc)
> -		kvm_release_hpt(page, 1 << (order - PAGE_SHIFT));
> +		kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
>  	else
>  		free_pages(hpt, order - PAGE_SHIFT);
>  	return -ENOMEM;
> @@ -155,8 +155,8 @@ void kvmppc_free_hpt(struct kvm *kvm)
>  	kvmppc_free_lpid(kvm->arch.lpid);
>  	vfree(kvm->arch.revmap);
>  	if (kvm->arch.hpt_cma_alloc)
> -		kvm_release_hpt(virt_to_page(kvm->arch.hpt_virt),
> -				1 << (kvm->arch.hpt_order - PAGE_SHIFT));
> +		kvm_free_hpt_cma(virt_to_page(kvm->arch.hpt_virt),
> +				 1 << (kvm->arch.hpt_order - PAGE_SHIFT));
>  	else
>  		free_pages(kvm->arch.hpt_virt,
>  			   kvm->arch.hpt_order - PAGE_SHIFT);
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
> index 5bb24be..4c4aa47 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -52,19 +52,19 @@ static int __init early_parse_kvm_cma_resv(char *p)
>  }
>  early_param("kvm_cma_resv_ratio", early_parse_kvm_cma_resv);
>  
> -struct page *kvm_alloc_hpt(unsigned long nr_pages)
> +struct page *kvm_alloc_hpt_cma(unsigned long nr_pages)
>  {
>  	VM_BUG_ON(order_base_2(nr_pages) < KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
>  
>  	return cma_alloc(kvm_cma, nr_pages, order_base_2(HPT_ALIGN_PAGES));
>  }
> -EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
> +EXPORT_SYMBOL_GPL(kvm_alloc_hpt_cma);
>  
> -void kvm_release_hpt(struct page *page, unsigned long nr_pages)
> +void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages)
>  {
>  	cma_release(kvm_cma, page, nr_pages);
>  }
> -EXPORT_SYMBOL_GPL(kvm_release_hpt);
> +EXPORT_SYMBOL_GPL(kvm_free_hpt_cma);
>  
>  /**
>   * kvm_cma_reserve() - reserve area for kvm hash pagetable

Reviewed-by: Thomas Huth <thuth@redhat.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 03/11] powerpc/kvm: Gather HPT related variables into sub-structure
  2016-12-15  5:53 ` [PATCH 03/11] powerpc/kvm: Gather HPT related variables into sub-structure David Gibson
@ 2016-12-16  9:24   ` Thomas Huth
  2016-12-19  0:03     ` David Gibson
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Huth @ 2016-12-16  9:24 UTC (permalink / raw)
  To: David Gibson, paulus
  Cc: michael, benh, sjitindarsingh, lvivier, linuxppc-dev, linux-kernel, kvm

On 15.12.2016 06:53, David Gibson wrote:
> Currently, the powerpc kvm_arch structure contains a number of variables
> tracking the state of the guest's hashed page table (HPT) in KVM HV.  This
> patch gathers them all together into a single kvm_hpt_info substructure.
> This makes life more convenient for the upcoming HPT resizing
> implementation.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  arch/powerpc/include/asm/kvm_host.h | 16 ++++---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 90 ++++++++++++++++++-------------------
>  arch/powerpc/kvm/book3s_hv.c        |  2 +-
>  arch/powerpc/kvm/book3s_hv_rm_mmu.c | 62 ++++++++++++-------------
>  4 files changed, 87 insertions(+), 83 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index e59b172..2673271 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -241,12 +241,20 @@ struct kvm_arch_memory_slot {
>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
>  };
>  
> +struct kvm_hpt_info {
> +	unsigned long virt;
> +	struct revmap_entry *rev;
> +	unsigned long npte;
> +	unsigned long mask;
> +	u32 order;
> +	int cma;
> +};

While you're at it, it would be really great if you could add a comment
at the end of each line with a short description of what the variables
are about. E.g. if I just read "virt" and do not have much clue of the
code yet, I have a hard time to figure out what this means...
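
E.g. something like this (just guessing the semantics from the
surrounding code, so please double-check the wording):

struct kvm_hpt_info {
	unsigned long virt;		/* kernel virtual address of the HPT */
	struct revmap_entry *rev;	/* reverse map array, 1 entry per HPTE */
	unsigned long npte;		/* number of HPTEs (HPT size / 16) */
	unsigned long mask;		/* HPTEG index mask (HPT size / 128 - 1) */
	u32 order;			/* log2 of the HPT size in bytes */
	int cma;			/* 1 if allocated from the CMA pool */
};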

 Thomas

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/11] powerpc/kvm: Reserve capabilities and ioctls for HPT resizing
  2016-12-15  5:53 ` [PATCH 01/11] powerpc/kvm: Reserve capabilities and ioctls for HPT resizing David Gibson
@ 2016-12-16  9:25   ` Thomas Huth
  2016-12-19  6:36     ` David Gibson
  2016-12-16 13:15   ` Thomas Huth
  1 sibling, 1 reply; 29+ messages in thread
From: Thomas Huth @ 2016-12-16  9:25 UTC (permalink / raw)
  To: David Gibson, paulus
  Cc: michael, benh, sjitindarsingh, lvivier, linuxppc-dev, linux-kernel, kvm

On 15.12.2016 06:53, David Gibson wrote:
> This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
> advertise whether KVM is capable of handling the PAPR extensions for
> resizing the hashed page table during guest runtime.
> 
> At present, HPT resizing is possible with KVM PR without kernel
> modification, since the HPT is managed within qemu.  It's not possible yet
> with KVM HV, because the HPT is managed by KVM.  At present, qemu has to
> use other capabilities which (by accident) reveal whether PR or HV is in
> use to know if it can advertise HPT resizing capability to the guest.
> 
> To avoid ambiguity with existing kernels, the encoding is a bit odd.
>     0 means "unknown" since that's what previous kernels will return
>     1 means "HPT resize possible if and only if the HPT is allocated in
>       userspace, rather than in the kernel".  Userspace can check
>       KVM_CAP_PPC_ALLOC_HTAB to determine if that's the case.  In practice
>       this will give the same results as userspace's fallback check.
>     2 will mean "HPT resize available and implemented via ioctl()s
>       KVM_PPC_RESIZE_HPT_PREPARE and KVM_PPC_RESIZE_HPT_COMMIT"

This encoding IMHO clearly needs some proper documentation in
Documentation/virtual/kvm/api.txt ... and maybe also some dedicated
#defines in an uapi header file.
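
Something along these lines, maybe (names made up, just to illustrate):

#define KVM_SPAPR_RESIZE_HPT_UNKNOWN	0	/* pre-existing kernels */
#define KVM_SPAPR_RESIZE_HPT_USERSPACE	1	/* HPT managed in userspace */
#define KVM_SPAPR_RESIZE_HPT_KERNEL	2	/* in-kernel PREPARE/COMMIT */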

 Thomas

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 04/11] powerpc/kvm: Don't store values derivable from HPT order
  2016-12-15  5:53 ` [PATCH 04/11] powerpc/kvm: Don't store values derivable from HPT order David Gibson
@ 2016-12-16 10:39   ` Thomas Huth
  2016-12-19  0:04     ` David Gibson
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Huth @ 2016-12-16 10:39 UTC (permalink / raw)
  To: David Gibson, paulus
  Cc: michael, benh, sjitindarsingh, lvivier, linuxppc-dev, linux-kernel, kvm

On 15.12.2016 06:53, David Gibson wrote:
> Currently the kvm_hpt_info structure stores the hashed page table's order,
> and also the number of HPTEs it contains and a mask for its size.  The
> last two can be easily derived from the order, so remove them and just
> calculate them as necessary with a couple of helper inlines.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
[...]
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index b5799d1..fe88132 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -83,15 +83,11 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
>  
>  	kvm->arch.hpt.virt = hpt;
>  	kvm->arch.hpt.order = order;
> -	/* HPTEs are 2**4 bytes long */
> -	kvm->arch.hpt.npte = 1ul << (order - 4);
> -	/* 128 (2**7) bytes in each HPTEG */
> -	kvm->arch.hpt.mask = (1ul << (order - 7)) - 1;
>  
>  	atomic64_set(&kvm->arch.mmio_update, 0);
>  
>  	/* Allocate reverse map array */
> -	rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt.npte);
> +	rev = vmalloc(sizeof(struct revmap_entry) * kvmppc_hpt_npte(&kvm->arch.hpt));
>  	if (!rev) {
>  		pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
>  		goto out_freehpt;
> @@ -194,8 +190,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
>  	if (npages > 1ul << (40 - porder))
>  		npages = 1ul << (40 - porder);
>  	/* Can't use more than 1 HPTE per HPTEG */
> -	if (npages > kvm->arch.hpt.mask + 1)
> -		npages = kvm->arch.hpt.mask + 1;
> +	if (npages > kvmppc_hpt_mask(&kvm->arch.hpt) + 1)
> +		npages = kvmppc_hpt_mask(&kvm->arch.hpt) + 1;
>  
>  	hp0 = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) |
>  		HPTE_V_BOLTED | hpte0_pgsize_encoding(psize);
> @@ -205,7 +201,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
>  	for (i = 0; i < npages; ++i) {
>  		addr = i << porder;
>  		/* can't use hpt_hash since va > 64 bits */
> -		hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & kvm->arch.hpt.mask;
> +		hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25)))
> +			& kvmppc_hpt_mask(&kvm->arch.hpt);
>  		/*
>  		 * We assume that the hash table is empty and no
>  		 * vcpus are using it at this stage.  Since we create

kvmppc_hpt_mask() is now called three times in kvmppc_map_vrma() ... you
could use a local variable to store the value so that the calculation
only has to be done once here.
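
E.g. (hpt_mask is just a suggested name):

	unsigned long hpt_mask = kvmppc_hpt_mask(&kvm->arch.hpt);

and then use hpt_mask in the three spots.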

> @@ -1306,7 +1303,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
>  
>  		/* Skip uninteresting entries, i.e. clean on not-first pass */
>  		if (!first_pass) {
> -			while (i < kvm->arch.hpt.npte &&
> +			while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
>  			       !hpte_dirty(revp, hptp)) {
>  				++i;
>  				hptp += 2;
> @@ -1316,7 +1313,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
>  		hdr.index = i;
>  
>  		/* Grab a series of valid entries */
> -		while (i < kvm->arch.hpt.npte &&
> +		while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
>  		       hdr.n_valid < 0xffff &&
>  		       nb + HPTE_SIZE < count &&
>  		       record_hpte(flags, hptp, hpte, revp, 1, first_pass)) {
> @@ -1332,7 +1329,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
>  			++revp;
>  		}
>  		/* Now skip invalid entries while we can */
> -		while (i < kvm->arch.hpt.npte &&
> +		while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
>  		       hdr.n_invalid < 0xffff &&
>  		       record_hpte(flags, hptp, hpte, revp, 0, first_pass)) {
>  			/* found an invalid entry */

Ditto, you could use a local variable to store the value from
kvmppc_hpt_npte()

Anyway, apart from these nits, patch looks fine to me, so:

Reviewed-by: Thomas Huth <thuth@redhat.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 05/11] powerpc/kvm: Split HPT allocation from activation
  2016-12-15  5:53 ` [PATCH 05/11] powerpc/kvm: Split HPT allocation from activation David Gibson
@ 2016-12-16 11:57   ` Thomas Huth
  2016-12-19  0:24     ` David Gibson
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Huth @ 2016-12-16 11:57 UTC (permalink / raw)
  To: David Gibson, paulus
  Cc: michael, benh, sjitindarsingh, lvivier, linuxppc-dev, linux-kernel, kvm

On 15.12.2016 06:53, David Gibson wrote:
> Currently, kvmppc_alloc_hpt() both allocates a new hashed page table (HPT)
> and sets it up as the active page table for a VM.  For the upcoming HPT
> resize implementation we're going to want to allocate HPTs separately from
> activating them.
> 
> So, split the allocation itself out into kvmppc_allocate_hpt() and perform
> the activation with a new kvmppc_set_hpt() function.  Likewise we split
> kvmppc_free_hpt(), which just frees the HPT, from kvmppc_release_hpt()
> which unsets it as an active HPT, then frees it.
> 
> We also move the logic to fall back to smaller HPT sizes if the first try
> fails into the single caller which used that behaviour,
> kvmppc_hv_setup_htab_rma().  This introduces a slight semantic change, in
> that previously if the initial attempt at CMA allocation failed, we would
> fall back to attempting smaller sizes with the page allocator.  Now, we
> try first CMA, then the page allocator at each size.  As far as I can tell
> this change should be harmless.
> 
> To match, we make kvmppc_free_hpt() just free the actual HPT itself.  The
> call to kvmppc_free_lpid() that was there, we move to the single caller.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  arch/powerpc/include/asm/kvm_book3s_64.h |  3 ++
>  arch/powerpc/include/asm/kvm_ppc.h       |  5 +-
>  arch/powerpc/kvm/book3s_64_mmu_hv.c      | 90 ++++++++++++++++----------------
>  arch/powerpc/kvm/book3s_hv.c             | 18 +++++--
>  4 files changed, 65 insertions(+), 51 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
> index 8810327..6dc4004 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
> @@ -22,6 +22,9 @@
>  
>  #include <asm/book3s/64/mmu-hash.h>
>  
> +/* Power architecture requires HPT is at least 256kB */
> +#define PPC_MIN_HPT_ORDER	18
> +
>  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>  static inline struct kvmppc_book3s_shadow_vcpu *svcpu_get(struct kvm_vcpu *vcpu)
>  {
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 3db6be9..41575b8 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -155,9 +155,10 @@ extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu);
>  extern int kvmppc_kvm_pv(struct kvm_vcpu *vcpu);
>  extern void kvmppc_map_magic(struct kvm_vcpu *vcpu);
>  
> -extern long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp);
> +extern int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order);
> +extern void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info);
>  extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp);
> -extern void kvmppc_free_hpt(struct kvm *kvm);
> +extern void kvmppc_free_hpt(struct kvm_hpt_info *info);
>  extern long kvmppc_prepare_vrma(struct kvm *kvm,
>  				struct kvm_userspace_memory_region *mem);
>  extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index fe88132..68bb228 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -40,76 +40,70 @@
>  
>  #include "trace_hv.h"
>  
> -/* Power architecture requires HPT is at least 256kB */
> -#define PPC_MIN_HPT_ORDER	18
> -
>  static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
>  				long pte_index, unsigned long pteh,
>  				unsigned long ptel, unsigned long *pte_idx_ret);
>  static void kvmppc_rmap_reset(struct kvm *kvm);
>  
> -long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
> +int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order)
>  {
>  	unsigned long hpt = 0;
> -	struct revmap_entry *rev;
> +	int cma = 0;
>  	struct page *page = NULL;
> -	long order = KVM_DEFAULT_HPT_ORDER;
> -
> -	if (htab_orderp) {
> -		order = *htab_orderp;
> -		if (order < PPC_MIN_HPT_ORDER)
> -			order = PPC_MIN_HPT_ORDER;
> -	}

Not sure, but as far as I can see, *htab_orderp could still be provided
from userspace via kvm_arch_vm_ioctl_hv() -> kvmppc_alloc_reset_hpt() ?
In that case, I think you should still check that the order is bigger
than PPC_MIN_HPT_ORDER, and return an error code otherwise?
Or if you feel confident that this value can never be supplied by
userspace anymore, add at least a BUG_ON(order < PPC_MIN_HPT_ORDER) here?

> +	struct revmap_entry *rev;
> +	unsigned long npte;
>  
> -	kvm->arch.hpt.cma = 0;
> +	hpt = 0;
> +	cma = 0;

Both hpt and cma are initialized to zero in the variable declarations
already, so the above two lines are redundant.

>  	page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
>  	if (page) {
>  		hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
>  		memset((void *)hpt, 0, (1ul << order));
> -		kvm->arch.hpt.cma = 1;
> +		cma = 1;
>  	}
>  
> -	/* Lastly try successively smaller sizes from the page allocator */
> -	/* Only do this if userspace didn't specify a size via ioctl */
> -	while (!hpt && order > PPC_MIN_HPT_ORDER && !htab_orderp) {
> -		hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
> -				       __GFP_NOWARN, order - PAGE_SHIFT);
> -		if (!hpt)
> -			--order;
> -	}
> +	if (!hpt)
> +		hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT
> +				       |__GFP_NOWARN, order - PAGE_SHIFT);
>  
>  	if (!hpt)
>  		return -ENOMEM;
>  
> -	kvm->arch.hpt.virt = hpt;
> -	kvm->arch.hpt.order = order;
> -
> -	atomic64_set(&kvm->arch.mmio_update, 0);
> +	/* HPTEs are 2**4 bytes long */
> +	npte = 1ul << (order - 4);
>  
>  	/* Allocate reverse map array */
> -	rev = vmalloc(sizeof(struct revmap_entry) * kvmppc_hpt_npte(&kvm->arch.hpt));
> +	rev = vmalloc(sizeof(struct revmap_entry) * npte);
>  	if (!rev) {
> -		pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
> +		pr_err("kvmppc_allocate_hpt: Couldn't alloc reverse map array\n");
>  		goto out_freehpt;

So here you jump to out_freehpt before setting info->virt ...

>  	}
> -	kvm->arch.hpt.rev = rev;
> -	kvm->arch.sdr1 = __pa(hpt) | (order - 18);
>  
> -	pr_info("KVM guest htab at %lx (order %ld), LPID %x\n",
> -		hpt, order, kvm->arch.lpid);
> +	info->order = order;
> +	info->virt = hpt;
> +	info->cma = cma;
> +	info->rev = rev;
>  
> -	if (htab_orderp)
> -		*htab_orderp = order;
>  	return 0;
>  
>   out_freehpt:
> -	if (kvm->arch.hpt.cma)
> +	if (cma)
>  		kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
>  	else
> -		free_pages(hpt, order - PAGE_SHIFT);
> +		free_pages(info->virt, order - PAGE_SHIFT);

... but here you free info->virt which has not been set in case of the
"goto" above. That's clearly wrong.

>  	return -ENOMEM;
>  }
>  
> +void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info)
> +{
> +	atomic64_set(&kvm->arch.mmio_update, 0);
> +	kvm->arch.hpt = *info;
> +	kvm->arch.sdr1 = __pa(info->virt) | (info->order - 18);
> +
> +	pr_info("KVM guest htab at %lx (order %ld), LPID %x\n",
> +		info->virt, (long)info->order, kvm->arch.lpid);

Not directly related to your patch, but these messages pop up in the
dmesg each time I start a guest ... for the normal user who does not
have a clue what "htab" means, this can be pretty much annoying. Could
you maybe turn this into a pr_debug() instead?

> +}
> +
>  long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
>  {
>  	long err = -EBUSY;
> @@ -138,24 +132,28 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
>  		*htab_orderp = order;
>  		err = 0;
>  	} else {
> -		err = kvmppc_alloc_hpt(kvm, htab_orderp);
> -		order = *htab_orderp;
> +		struct kvm_hpt_info info;
> +
> +		err = kvmppc_allocate_hpt(&info, *htab_orderp);
> +		if (err < 0)
> +			goto out;
> +		kvmppc_set_hpt(kvm, &info);

Just a matter of taste (I dislike gotos if avoidable), but you could
also do:

	err = kvmppc_allocate_hpt(&info, *htab_orderp);
	if (!err)
		kvmppc_set_hpt(kvm, &info);

... and that's even one line shorter ;-)

>  	}
>   out:
>  	mutex_unlock(&kvm->lock);
>  	return err;
>  }
>  
> -void kvmppc_free_hpt(struct kvm *kvm)
> +void kvmppc_free_hpt(struct kvm_hpt_info *info)
>  {
> -	kvmppc_free_lpid(kvm->arch.lpid);
> -	vfree(kvm->arch.hpt.rev);
> -	if (kvm->arch.hpt.cma)
> -		kvm_free_hpt_cma(virt_to_page(kvm->arch.hpt.virt),
> -				 1 << (kvm->arch.hpt.order - PAGE_SHIFT));
> +	vfree(info->rev);
> +	if (info->cma)
> +		kvm_free_hpt_cma(virt_to_page(info->virt),
> +				 1 << (info->order - PAGE_SHIFT));
>  	else
> -		free_pages(kvm->arch.hpt.virt,
> -			   kvm->arch.hpt.order - PAGE_SHIFT);
> +		free_pages(info->virt, info->order - PAGE_SHIFT);
> +	info->virt = 0;
> +	info->order = 0;
>  }
>  
>  /* Bits in first HPTE dword for pagesize 4k, 64k or 16M */
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 78baa2b..71c5adb 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3115,11 +3115,22 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
>  
>  	/* Allocate hashed page table (if not done already) and reset it */
>  	if (!kvm->arch.hpt.virt) {
> -		err = kvmppc_alloc_hpt(kvm, NULL);
> -		if (err) {
> +		int order = KVM_DEFAULT_HPT_ORDER;
> +		struct kvm_hpt_info info;
> +
> +		err = kvmppc_allocate_hpt(&info, order);
> +		/* If we get here, it means userspace didn't specify a
> +		 * size explicitly.  So, try successively smaller
> +		 * sizes if the default failed. */
> +		while (err < 0 && --order >= PPC_MIN_HPT_ORDER)
> +			err  = kvmppc_allocate_hpt(&info, order);

Not sure, but maybe replace "err < 0" with "err == -ENOMEM" in the while
condition? Or should it also loop on other possible future error types?

> +		if (err < 0) {
>  			pr_err("KVM: Couldn't alloc HPT\n");
>  			goto out;
>  		}
> +
> +		kvmppc_set_hpt(kvm, &info);
>  	}
>  
>  	/* Look up the memslot for guest physical address 0 */
> @@ -3357,7 +3368,8 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
>  
>  	kvmppc_free_vcores(kvm);
>  
> -	kvmppc_free_hpt(kvm);
> +	kvmppc_free_lpid(kvm->arch.lpid);
> +	kvmppc_free_hpt(&kvm->arch.hpt);
>  
>  	kvmppc_free_pimap(kvm);
>  }
> 

 Thomas

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 06/11] powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size
  2016-12-15  5:53 ` [PATCH 06/11] powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size David Gibson
@ 2016-12-16 12:44   ` Thomas Huth
  2016-12-19  0:48     ` David Gibson
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Huth @ 2016-12-16 12:44 UTC (permalink / raw)
  To: David Gibson, paulus
  Cc: michael, benh, sjitindarsingh, lvivier, linuxppc-dev, linux-kernel, kvm

On 15.12.2016 06:53, David Gibson wrote:
> The KVM_PPC_ALLOCATE_HTAB ioctl() is used to set the size of hashed page
> table (HPT) that userspace expects a guest VM to have, and is also used to
> clear that HPT when necessary (e.g. guest reboot).
> 
> At present, once the ioctl() is called for the first time, the HPT size can
> never be changed thereafter - it will be cleared but always sized as from
> the first call.
> 
> With upcoming HPT resize implementation, we're going to need to allow
> userspace to resize the HPT at reset (to change it back to the default size
> if the guest changed it).
> 
> So, we need to allow this ioctl() to change the HPT size.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  arch/powerpc/include/asm/kvm_ppc.h  |  2 +-
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 53 ++++++++++++++++++++-----------------
>  arch/powerpc/kvm/book3s_hv.c        |  5 +---
>  3 files changed, 30 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 41575b8..3b837bc 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -157,7 +157,7 @@ extern void kvmppc_map_magic(struct kvm_vcpu *vcpu);
>  
>  extern int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order);
>  extern void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info);
> -extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp);
> +extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order);
>  extern void kvmppc_free_hpt(struct kvm_hpt_info *info);
>  extern long kvmppc_prepare_vrma(struct kvm *kvm,
>  				struct kvm_userspace_memory_region *mem);
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 68bb228..8e5ac2f 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -104,10 +104,22 @@ void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info)
>  		info->virt, (long)info->order, kvm->arch.lpid);
>  }
>  
> -long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
> +void kvmppc_free_hpt(struct kvm_hpt_info *info)
> +{
> +	vfree(info->rev);
> +	if (info->cma)
> +		kvm_free_hpt_cma(virt_to_page(info->virt),
> +				 1 << (info->order - PAGE_SHIFT));
> +	else
> +		free_pages(info->virt, info->order - PAGE_SHIFT);
> +	info->virt = 0;
> +	info->order = 0;
> +}

Why do you need to move kvmppc_free_hpt() around? Seems like unnecessary
code churn to me?

> +long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order)
>  {
>  	long err = -EBUSY;
> -	long order;
> +	struct kvm_hpt_info info;
>  
>  	mutex_lock(&kvm->lock);
>  	if (kvm->arch.hpte_setup_done) {
> @@ -119,8 +131,9 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
>  			goto out;
>  		}
>  	}
> -	if (kvm->arch.hpt.virt) {
> -		order = kvm->arch.hpt.order;
> +	if (kvm->arch.hpt.order == order) {
> +		/* We already have a suitable HPT */
> +
>  		/* Set the entire HPT to 0, i.e. invalid HPTEs */
>  		memset((void *)kvm->arch.hpt.virt, 0, 1ul << order);
>  		/*
> @@ -129,33 +142,23 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
>  		kvmppc_rmap_reset(kvm);
>  		/* Ensure that each vcpu will flush its TLB on next entry. */
>  		cpumask_setall(&kvm->arch.need_tlb_flush);
> -		*htab_orderp = order;
>  		err = 0;
> -	} else {
> -		struct kvm_hpt_info info;
> -
> -		err = kvmppc_allocate_hpt(&info, *htab_orderp);
> -		if (err < 0)
> -			goto out;
> -		kvmppc_set_hpt(kvm, &info);
> +		goto out;
>  	}
> - out:
> +
> +	if (kvm->arch.hpt.virt)
> +		kvmppc_free_hpt(&kvm->arch.hpt);
> +
> +	err = kvmppc_allocate_hpt(&info, order);
> +	if (err < 0)
> +		goto out;
> +	kvmppc_set_hpt(kvm, &info);
> +
> +out:
>  	mutex_unlock(&kvm->lock);
>  	return err;
>  }
>  
> -void kvmppc_free_hpt(struct kvm_hpt_info *info)
> -{
> -	vfree(info->rev);
> -	if (info->cma)
> -		kvm_free_hpt_cma(virt_to_page(info->virt),
> -				 1 << (info->order - PAGE_SHIFT));
> -	else
> -		free_pages(info->virt, info->order - PAGE_SHIFT);
> -	info->virt = 0;
> -	info->order = 0;
> -}
> -
>  /* Bits in first HPTE dword for pagesize 4k, 64k or 16M */
>  static inline unsigned long hpte0_pgsize_encoding(unsigned long pgsize)
>  {
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 71c5adb..957e473 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3600,12 +3600,9 @@ static long kvm_arch_vm_ioctl_hv(struct file *filp,
>  		r = -EFAULT;
>  		if (get_user(htab_order, (u32 __user *)argp))
>  			break;
> -		r = kvmppc_alloc_reset_hpt(kvm, &htab_order);
> +		r = kvmppc_alloc_reset_hpt(kvm, htab_order);
>  		if (r)
>  			break;
> -		r = -EFAULT;
> -		if (put_user(htab_order, (u32 __user *)argp))
> -			break;

Now that htab_order is not changed anymore by the kernel, I'm pretty
sure you need some checks on the value here before calling
kvmppc_alloc_reset_hpt(), e.g. return an error code if htab_order <
PPC_MIN_HPT_ORDER.
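
E.g. something like this before the call (the upper bound of 46 is
borrowed from the checks in the later resize patches, so maybe a shared
#define would be nice):

	r = -EINVAL;
	if (htab_order < PPC_MIN_HPT_ORDER || htab_order > 46)
		break;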

And, I'm not sure if I got that right, but in former times, the
htab_order from the userspace application was just a suggestion, and now
it's mandatory, right? So if an old userspace used a very high value
here (or even something smaller than PPC_MIN_HPT_ORDER like 0), the
kernel fixed this up and the userspace could run happily with the fixed
value afterwards. But since this value from userspace is mandatory now,
such a userspace application is now broken. So maybe it's better to
introduce a new ioctl for this new behavior instead, to avoid breaking
old userspace applications?

Anyway, you should also update Documentation/virtual/kvm/api.txt to
reflect the new behavior.

>  		r = 0;
>  		break;
>  	}

 Thomas

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 07/11] powerpc/kvm: Create kvmppc_unmap_hpte_helper()
  2016-12-15  5:54 ` [PATCH 07/11] powerpc/kvm: Create kvmppc_unmap_hpte_helper() David Gibson
@ 2016-12-16 12:48   ` Thomas Huth
  0 siblings, 0 replies; 29+ messages in thread
From: Thomas Huth @ 2016-12-16 12:48 UTC (permalink / raw)
  To: David Gibson, paulus
  Cc: michael, benh, sjitindarsingh, lvivier, linuxppc-dev, linux-kernel, kvm

On 15.12.2016 06:54, David Gibson wrote:
> The kvm_unmap_rmapp() function, called from certain MMU notifiers, is used
> to force all guest mappings of a particular host page to be set ABSENT, and
> removed from the reverse mappings.
> 
> For HPT resizing, we will have some cases where we want to set just a
> single guest HPTE ABSENT and remove its reverse mappings.  To prepare with
> this, we split out the logic from kvm_unmap_rmapp() to evict a single HPTE,
> moving it to a new helper function.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 77 +++++++++++++++++++++----------------
>  1 file changed, 44 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 8e5ac2f..cd145eb 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -740,13 +740,53 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
>  	return kvm_handle_hva_range(kvm, hva, hva + 1, handler);
>  }
>  
> +/* Must be called with both HPTE and rmap locked */
> +static void kvmppc_unmap_hpte(struct kvm *kvm, unsigned long i,
> +			      unsigned long *rmapp, unsigned long gfn)
> +{
> +	__be64 *hptep = (__be64 *) (kvm->arch.hpt.virt + (i << 4));
> +	struct revmap_entry *rev = kvm->arch.hpt.rev;
> +	unsigned long j, h;
> +	unsigned long ptel, psize, rcbits;
> +
> +	j = rev[i].forw;
> +	if (j == i) {
> +		/* chain is now empty */
> +		*rmapp &= ~(KVMPPC_RMAP_PRESENT | KVMPPC_RMAP_INDEX);
> +	} else {
> +		/* remove i from chain */
> +		h = rev[i].back;
> +		rev[h].forw = j;
> +		rev[j].back = h;
> +		rev[i].forw = rev[i].back = i;
> +		*rmapp = (*rmapp & ~KVMPPC_RMAP_INDEX) | j;
> +	}
> +
> +	/* Now check and modify the HPTE */
> +	ptel = rev[i].guest_rpte;
> +	psize = hpte_page_size(be64_to_cpu(hptep[0]), ptel);
> +	if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) &&
> +	    hpte_rpn(ptel, psize) == gfn) {
> +		hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
> +		kvmppc_invalidate_hpte(kvm, hptep, i);
> +		hptep[1] &= ~cpu_to_be64(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
> +		/* Harvest R and C */
> +		rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
> +		*rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
> +		if (rcbits & HPTE_R_C)
> +			kvmppc_update_rmap_change(rmapp, psize);
> +		if (rcbits & ~rev[i].guest_rpte) {
> +			rev[i].guest_rpte = ptel | rcbits;
> +			note_hpte_modification(kvm, &rev[i]);
> +		}
> +	}	

Superfluous TAB at the end of above line.

Apart from that, the patch looks fine to me.

 Thomas
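
For context, a minimal sketch of how the caller side can be phrased once the
helper exists; locking is simplified here (the real loop must also take the
HPTE lock, e.g. via try_lock_hpte(), before calling the helper), so treat this
as an illustration rather than the patch's final code:

        static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
                                   unsigned long gfn)
        {
                unsigned long i;

                for (;;) {
                        lock_rmap(rmapp);
                        if (!(*rmapp & KVMPPC_RMAP_PRESENT)) {
                                unlock_rmap(rmapp);
                                break;
                        }
                        /* evict the first HPTE on the rmap chain */
                        i = *rmapp & KVMPPC_RMAP_INDEX;
                        kvmppc_unmap_hpte(kvm, i, rmapp, gfn);
                        unlock_rmap(rmapp);
                }
                return 0;
        }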

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/11] powerpc/kvm: Reserve capabilities and ioctls for HPT resizing
  2016-12-15  5:53 ` [PATCH 01/11] powerpc/kvm: Reserve capabilities and ioctls for HPT resizing David Gibson
  2016-12-16  9:25   ` Thomas Huth
@ 2016-12-16 13:15   ` Thomas Huth
  2016-12-19  6:37     ` David Gibson
  1 sibling, 1 reply; 29+ messages in thread
From: Thomas Huth @ 2016-12-16 13:15 UTC (permalink / raw)
  To: David Gibson, paulus
  Cc: michael, benh, sjitindarsingh, lvivier, linuxppc-dev, linux-kernel, kvm

On 15.12.2016 06:53, David Gibson wrote:
> This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
> advertise whether KVM is capable of handling the PAPR extensions for
> resizing the hashed page table during guest runtime.
> 
> At present, HPT resizing is possible with KVM PR without kernel
> modification, since the HPT is managed within qemu.  It's not possible yet
> with KVM HV, because the HPT is managed by KVM.  At present, qemu has to
> use other capabilities which (by accident) reveal whether PR or HV is in
> use to know if it can advertise HPT resizing capability to the guest.
> 
> To avoid ambiguity with existing kernels, the encoding is a bit odd.
>     0 means "unknown" since that's what previous kernels will return
>     1 means "HPT resize is possible if and only if the HPT is allocated in
>       userspace, rather than in the kernel".  Userspace can check
>       KVM_CAP_PPC_ALLOC_HTAB to determine if that's the case.  In practice
>       this will give the same results as userspace's fallback check.
>     2 will mean "HPT resize available and implemented via ioctl()s
>       KVM_PPC_RESIZE_HPT_PREPARE and KVM_PPC_RESIZE_HPT_COMMIT"
> 
> For now we always return 1, but the intention is to return 2 once HPT
> resize is implemented for KVM HV.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  arch/powerpc/kvm/powerpc.c |  3 +++
>  include/uapi/linux/kvm.h   | 10 ++++++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index efd1183..bb23923 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -605,6 +605,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_SPAPR_MULTITCE:
>  		r = 1;
>  		break;
> +	case KVM_CAP_SPAPR_RESIZE_HPT:
> +		r = 1; /* resize allowed only if HPT is outside kernel */
> +		break;
>  #endif
>  	case KVM_CAP_PPC_HTM:
>  		r = cpu_has_feature(CPU_FTR_TM_COMP) &&
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index cac48ed..904afe0 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -685,6 +685,12 @@ struct kvm_ppc_smmu_info {
>  	struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
>  };
>  
> +/* for KVM_PPC_RESIZE_HPT_{PREPARE,COMMIT} */
> +struct kvm_ppc_resize_hpt {
> +	__u64 flags;
> +	__u32 shift;
> +};

I think you should also add a final "__u32 pad" to that struct to make
sure that it is naturally aligned (like it is done with struct
kvm_coalesced_mmio_zone already for example).

 Thomas
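
To illustrate the encoding described in the commit message, a rough
userspace-side sketch (a hypothetical helper, not actual qemu code; error
handling elided):

        #include <sys/ioctl.h>
        #include <linux/kvm.h>

        /* Returns nonzero if HPT resizing can be advertised to the guest.
         * vmfd is an open KVM VM file descriptor. */
        static int spapr_resize_hpt_possible(int vmfd)
        {
                int v = ioctl(vmfd, KVM_CHECK_EXTENSION, KVM_CAP_SPAPR_RESIZE_HPT);

                if (v == 2)     /* kernel implements the resize ioctl()s */
                        return 1;
                if (v == 1)     /* resizable iff the HPT lives in userspace */
                        return !ioctl(vmfd, KVM_CHECK_EXTENSION,
                                      KVM_CAP_PPC_ALLOC_HTAB);
                return 0;       /* 0 or error: old kernel, no resizing assumed */
        }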

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 08/11] powerpc/kvm: KVM-HV HPT resizing stub implementation
  2016-12-15  5:54 ` [PATCH 08/11] powerpc/kvm: KVM-HV HPT resizing stub implementation David Gibson
@ 2016-12-16 15:35   ` Thomas Huth
  0 siblings, 0 replies; 29+ messages in thread
From: Thomas Huth @ 2016-12-16 15:35 UTC (permalink / raw)
  To: David Gibson, paulus
  Cc: michael, benh, sjitindarsingh, lvivier, linuxppc-dev, linux-kernel, kvm

On 15.12.2016 06:54, David Gibson wrote:
> This patch adds a stub (always failing) implementation of the ioctl()s
> for the HPT resizing PAPR extension.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  arch/powerpc/include/asm/kvm_ppc.h  |  4 ++++
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 16 ++++++++++++++++
>  arch/powerpc/kvm/book3s_hv.c        | 22 ++++++++++++++++++++++
>  3 files changed, 42 insertions(+)

I think I'd just squash this with the patch where you do the real
implementation.

 Thomas
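
For reference, the stub shape under discussion is roughly as follows (the body
and error value are assumptions; the patch itself is not quoted in this
thread):

        /* Always-failing placeholder until the real implementation lands */
        long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
                                             struct kvm_ppc_resize_hpt *rhpt)
        {
                return -EIO;    /* assumed placeholder error */
        }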

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 03/11] powerpc/kvm: Gather HPT related variables into sub-structure
  2016-12-16  9:24   ` Thomas Huth
@ 2016-12-19  0:03     ` David Gibson
  0 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2016-12-19  0:03 UTC (permalink / raw)
  To: Thomas Huth
  Cc: paulus, michael, benh, sjitindarsingh, lvivier, linuxppc-dev,
	linux-kernel, kvm

On Fri, Dec 16, 2016 at 10:24:17AM +0100, Thomas Huth wrote:
> On 15.12.2016 06:53, David Gibson wrote:
> > Currently, the powerpc kvm_arch structure contains a number of variables
> > tracking the state of the guest's hashed page table (HPT) in KVM HV.  This
> > patch gathers them all together into a single kvm_hpt_info substructure.
> > This makes life more convenient for the upcoming HPT resizing
> > implementation.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  arch/powerpc/include/asm/kvm_host.h | 16 ++++---
> >  arch/powerpc/kvm/book3s_64_mmu_hv.c | 90 ++++++++++++++++++-------------------
> >  arch/powerpc/kvm/book3s_hv.c        |  2 +-
> >  arch/powerpc/kvm/book3s_hv_rm_mmu.c | 62 ++++++++++++-------------
> >  4 files changed, 87 insertions(+), 83 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> > index e59b172..2673271 100644
> > --- a/arch/powerpc/include/asm/kvm_host.h
> > +++ b/arch/powerpc/include/asm/kvm_host.h
> > @@ -241,12 +241,20 @@ struct kvm_arch_memory_slot {
> >  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
> >  };
> >  
> > +struct kvm_hpt_info {
> > +	unsigned long virt;
> > +	struct revmap_entry *rev;
> > +	unsigned long npte;
> > +	unsigned long mask;
> > +	u32 order;
> > +	int cma;
> > +};
> 
> While you're at it, it would be really great if you could add a comment
> at the end of each line with a short description of what the variables
> are about. E.g. if I just read "virt" and do not know the code well
> yet, I have a hard time figuring out what this means...

Good idea, done.
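
The commented structure would then look something like this (the comment texts
are illustrative guesses, not copied from the final patch):

        struct kvm_hpt_info {
                unsigned long virt;             /* kernel virtual address of the HPT */
                struct revmap_entry *rev;       /* reverse-mapping array, one entry per HPTE */
                unsigned long npte;             /* number of HPTEs in the table */
                unsigned long mask;             /* HPTEG index mask */
                u32 order;                      /* log2 of the HPT size in bytes */
                int cma;                        /* whether the HPT came from the CMA pool */
        };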

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 04/11] powerpc/kvm: Don't store values derivable from HPT order
  2016-12-16 10:39   ` Thomas Huth
@ 2016-12-19  0:04     ` David Gibson
  0 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2016-12-19  0:04 UTC (permalink / raw)
  To: Thomas Huth
  Cc: paulus, michael, benh, sjitindarsingh, lvivier, linuxppc-dev,
	linux-kernel, kvm

On Fri, Dec 16, 2016 at 11:39:26AM +0100, Thomas Huth wrote:
> On 15.12.2016 06:53, David Gibson wrote:
> > Currently the kvm_hpt_info structure stores the hashed page table's order,
> > and also the number of HPTEs it contains and a mask for its size.  The
> > last two can be easily derived from the order, so remove them and just
> > calculate them as necessary with a couple of helper inlines.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> [...]
> > diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> > index b5799d1..fe88132 100644
> > --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> > +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> > @@ -83,15 +83,11 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
> >  
> >  	kvm->arch.hpt.virt = hpt;
> >  	kvm->arch.hpt.order = order;
> > -	/* HPTEs are 2**4 bytes long */
> > -	kvm->arch.hpt.npte = 1ul << (order - 4);
> > -	/* 128 (2**7) bytes in each HPTEG */
> > -	kvm->arch.hpt.mask = (1ul << (order - 7)) - 1;
> >  
> >  	atomic64_set(&kvm->arch.mmio_update, 0);
> >  
> >  	/* Allocate reverse map array */
> > -	rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt.npte);
> > +	rev = vmalloc(sizeof(struct revmap_entry) * kvmppc_hpt_npte(&kvm->arch.hpt));
> >  	if (!rev) {
> >  		pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
> >  		goto out_freehpt;
> > @@ -194,8 +190,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
> >  	if (npages > 1ul << (40 - porder))
> >  		npages = 1ul << (40 - porder);
> >  	/* Can't use more than 1 HPTE per HPTEG */
> > -	if (npages > kvm->arch.hpt.mask + 1)
> > -		npages = kvm->arch.hpt.mask + 1;
> > +	if (npages > kvmppc_hpt_mask(&kvm->arch.hpt) + 1)
> > +		npages = kvmppc_hpt_mask(&kvm->arch.hpt) + 1;
> >  
> >  	hp0 = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) |
> >  		HPTE_V_BOLTED | hpte0_pgsize_encoding(psize);
> > @@ -205,7 +201,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
> >  	for (i = 0; i < npages; ++i) {
> >  		addr = i << porder;
> >  		/* can't use hpt_hash since va > 64 bits */
> > -		hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & kvm->arch.hpt.mask;
> > +		hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25)))
> > +			& kvmppc_hpt_mask(&kvm->arch.hpt);
> >  		/*
> >  		 * We assume that the hash table is empty and no
> >  		 * vcpus are using it at this stage.  Since we create
> 
> kvmppc_hpt_mask() is now called three times in kvmppc_map_vrma() ... you
> could use a local variable to store the value so that the calculation
> has to be done only once here.

Well, I was assuming that inlining plus gcc's common subexpression
elimination would deal with that.
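
For reference, given the calculations the patch removes, the helper inlines
presumably look something like this (a reconstruction, not quoted from the
patch); with bodies this trivial, relying on inlining plus CSE is plausible,
though a local variable would make the single computation explicit:

        /* HPTEs are 2**4 bytes long */
        static inline unsigned long kvmppc_hpt_npte(struct kvm_hpt_info *hpt)
        {
                return 1UL << (hpt->order - 4);
        }

        /* 128 (2**7) bytes in each HPTEG */
        static inline unsigned long kvmppc_hpt_mask(struct kvm_hpt_info *hpt)
        {
                return (1UL << (hpt->order - 7)) - 1;
        }
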

> 
> > @@ -1306,7 +1303,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
> >  
> >  		/* Skip uninteresting entries, i.e. clean on not-first pass */
> >  		if (!first_pass) {
> > -			while (i < kvm->arch.hpt.npte &&
> > +			while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
> >  			       !hpte_dirty(revp, hptp)) {
> >  				++i;
> >  				hptp += 2;
> > @@ -1316,7 +1313,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
> >  		hdr.index = i;
> >  
> >  		/* Grab a series of valid entries */
> > -		while (i < kvm->arch.hpt.npte &&
> > +		while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
> >  		       hdr.n_valid < 0xffff &&
> >  		       nb + HPTE_SIZE < count &&
> >  		       record_hpte(flags, hptp, hpte, revp, 1, first_pass)) {
> > @@ -1332,7 +1329,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
> >  			++revp;
> >  		}
> >  		/* Now skip invalid entries while we can */
> > -		while (i < kvm->arch.hpt.npte &&
> > +		while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
> >  		       hdr.n_invalid < 0xffff &&
> >  		       record_hpte(flags, hptp, hpte, revp, 0, first_pass)) {
> >  			/* found an invalid entry */
> 
> Ditto, you could use a local variable to store the value from
> kvmppc_hpt_npte()
> 
> Anyway, apart from these nits, patch looks fine to me, so:
> 
> Reviewed-by: Thomas Huth <thuth@redhat.com>
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 05/11] powerpc/kvm: Split HPT allocation from activation
  2016-12-16 11:57   ` Thomas Huth
@ 2016-12-19  0:24     ` David Gibson
  0 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2016-12-19  0:24 UTC (permalink / raw)
  To: Thomas Huth
  Cc: paulus, michael, benh, sjitindarsingh, lvivier, linuxppc-dev,
	linux-kernel, kvm

On Fri, Dec 16, 2016 at 12:57:26PM +0100, Thomas Huth wrote:
> On 15.12.2016 06:53, David Gibson wrote:
> > Currently, kvmppc_alloc_hpt() both allocates a new hashed page table (HPT)
> > and sets it up as the active page table for a VM.  For the upcoming HPT
> > resize implementation we're going to want to allocate HPTs separately from
> > activating them.
> > 
> > So, split the allocation itself out into kvmppc_allocate_hpt() and perform
> > the activation with a new kvmppc_set_hpt() function.  Likewise we split
> > kvmppc_free_hpt(), which just frees the HPT, from kvmppc_release_hpt()
> > which unsets it as an active HPT, then frees it.
> > 
> > We also move the logic to fall back to smaller HPT sizes if the first try
> > fails into the single caller which used that behaviour,
> > kvmppc_hv_setup_htab_rma().  This introduces a slight semantic change, in
> > that previously if the initial attempt at CMA allocation failed, we would
> > fall back to attempting smaller sizes with the page allocator.  Now, we
> > try first CMA, then the page allocator at each size.  As far as I can tell
> > this change should be harmless.
> > 
> > To match, we make kvmppc_free_hpt() just free the actual HPT itself.  The
> > call to kvmppc_free_lpid() that was there, we move to the single caller.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  arch/powerpc/include/asm/kvm_book3s_64.h |  3 ++
> >  arch/powerpc/include/asm/kvm_ppc.h       |  5 +-
> >  arch/powerpc/kvm/book3s_64_mmu_hv.c      | 90 ++++++++++++++++----------------
> >  arch/powerpc/kvm/book3s_hv.c             | 18 +++++--
> >  4 files changed, 65 insertions(+), 51 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
> > index 8810327..6dc4004 100644
> > --- a/arch/powerpc/include/asm/kvm_book3s_64.h
> > +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
> > @@ -22,6 +22,9 @@
> >  
> >  #include <asm/book3s/64/mmu-hash.h>
> >  
> > +/* Power architecture requires HPT is at least 256kB */
> > +#define PPC_MIN_HPT_ORDER	18
> > +
> >  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
> >  static inline struct kvmppc_book3s_shadow_vcpu *svcpu_get(struct kvm_vcpu *vcpu)
> >  {
> > diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> > index 3db6be9..41575b8 100644
> > --- a/arch/powerpc/include/asm/kvm_ppc.h
> > +++ b/arch/powerpc/include/asm/kvm_ppc.h
> > @@ -155,9 +155,10 @@ extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu);
> >  extern int kvmppc_kvm_pv(struct kvm_vcpu *vcpu);
> >  extern void kvmppc_map_magic(struct kvm_vcpu *vcpu);
> >  
> > -extern long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp);
> > +extern int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order);
> > +extern void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info);
> >  extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp);
> > -extern void kvmppc_free_hpt(struct kvm *kvm);
> > +extern void kvmppc_free_hpt(struct kvm_hpt_info *info);
> >  extern long kvmppc_prepare_vrma(struct kvm *kvm,
> >  				struct kvm_userspace_memory_region *mem);
> >  extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
> > diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> > index fe88132..68bb228 100644
> > --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> > +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> > @@ -40,76 +40,70 @@
> >  
> >  #include "trace_hv.h"
> >  
> > -/* Power architecture requires HPT is at least 256kB */
> > -#define PPC_MIN_HPT_ORDER	18
> > -
> >  static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
> >  				long pte_index, unsigned long pteh,
> >  				unsigned long ptel, unsigned long *pte_idx_ret);
> >  static void kvmppc_rmap_reset(struct kvm *kvm);
> >  
> > -long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
> > +int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order)
> >  {
> >  	unsigned long hpt = 0;
> > -	struct revmap_entry *rev;
> > +	int cma = 0;
> >  	struct page *page = NULL;
> > -	long order = KVM_DEFAULT_HPT_ORDER;
> > -
> > -	if (htab_orderp) {
> > -		order = *htab_orderp;
> > -		if (order < PPC_MIN_HPT_ORDER)
> > -			order = PPC_MIN_HPT_ORDER;
> > -	}
> 
> Not sure, but as far as I can see, *htab_orderp could still be provided
> from userspace via kvm_arch_vm_ioctl_hv() -> kvmppc_alloc_reset_hpt() ?
> In that case, I think you should still check that the order is bigger
> than PPC_MIN_HPT_ORDER, and return an error code otherwise?
> Or if you feel confident that this value can never be supplied by
> userspace anymore, add at least a BUG_ON(order < PPC_MIN_HPT_ORDER) here?

Ah, you're right, I do need to check this.
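
A sketch of the added validation (the exact error value is an assumption; an
upper bound may be wanted as well):

        int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order)
        {
                /* Reject out-of-range orders instead of silently
                 * clamping them as the old code did. */
                if (order < PPC_MIN_HPT_ORDER)
                        return -EINVAL;

                /* ... allocation proceeds as in the patch ... */
        }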

> > +	struct revmap_entry *rev;
> > +	unsigned long npte;
> >  
> > -	kvm->arch.hpt.cma = 0;
> > +	hpt = 0;
> > +	cma = 0;
> 
> Both hpt and cma are initialized to zero in the variable declarations
> already, so the above two lines are redundant.

Oops.  I even spotted that one at some point, then forgot about it
again.

> >  	page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
> >  	if (page) {
> >  		hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
> >  		memset((void *)hpt, 0, (1ul << order));
> > -		kvm->arch.hpt.cma = 1;
> > +		cma = 1;
> >  	}
> >  
> > -	/* Lastly try successively smaller sizes from the page allocator */
> > -	/* Only do this if userspace didn't specify a size via ioctl */
> > -	while (!hpt && order > PPC_MIN_HPT_ORDER && !htab_orderp) {
> > -		hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
> > -				       __GFP_NOWARN, order - PAGE_SHIFT);
> > -		if (!hpt)
> > -			--order;
> > -	}
> > +	if (!hpt)
> > +		hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT
> > +				       |__GFP_NOWARN, order - PAGE_SHIFT);
> >  
> >  	if (!hpt)
> >  		return -ENOMEM;
> >  
> > -	kvm->arch.hpt.virt = hpt;
> > -	kvm->arch.hpt.order = order;
> > -
> > -	atomic64_set(&kvm->arch.mmio_update, 0);
> > +	/* HPTEs are 2**4 bytes long */
> > +	npte = 1ul << (order - 4);
> >  
> >  	/* Allocate reverse map array */
> > -	rev = vmalloc(sizeof(struct revmap_entry) * kvmppc_hpt_npte(&kvm->arch.hpt));
> > +	rev = vmalloc(sizeof(struct revmap_entry) * npte);
> >  	if (!rev) {
> > -		pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
> > +		pr_err("kvmppc_allocate_hpt: Couldn't alloc reverse map array\n");
> >  		goto out_freehpt;
> 
> So here you jump to out_freehpt before setting info->virt ...
> 
> >  	}
> > -	kvm->arch.hpt.rev = rev;
> > -	kvm->arch.sdr1 = __pa(hpt) | (order - 18);
> >  
> > -	pr_info("KVM guest htab at %lx (order %ld), LPID %x\n",
> > -		hpt, order, kvm->arch.lpid);
> > +	info->order = order;
> > +	info->virt = hpt;
> > +	info->cma = cma;
> > +	info->rev = rev;
> >  
> > -	if (htab_orderp)
> > -		*htab_orderp = order;
> >  	return 0;
> >  
> >   out_freehpt:
> > -	if (kvm->arch.hpt.cma)
> > +	if (cma)
> >  		kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
> >  	else
> > -		free_pages(hpt, order - PAGE_SHIFT);
> > +		free_pages(info->virt, order - PAGE_SHIFT);
> 
> ... but here you free info->virt which has not been set in case of the
> "goto" above. That's clearly wrong.

Good catch.  Also, there's only one use of that label, so there's not
really any reason to use a goto at all.  Adjusted.
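
With the label gone, the tail of kvmppc_allocate_hpt() can free from locals
that are always initialized at that point; a sketch of the adjusted code:

        rev = vmalloc(sizeof(struct revmap_entry) * npte);
        if (!rev) {
                pr_err("kvmppc_allocate_hpt: Couldn't alloc reverse map array\n");
                /* hpt, cma and page are all valid here, unlike info->virt */
                if (cma)
                        kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
                else
                        free_pages(hpt, order - PAGE_SHIFT);
                return -ENOMEM;
        }

        info->order = order;
        info->virt = hpt;
        info->cma = cma;
        info->rev = rev;

        return 0;
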

> >  	return -ENOMEM;
> >  }
> >  
> > +void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info)
> > +{
> > +	atomic64_set(&kvm->arch.mmio_update, 0);
> > +	kvm->arch.hpt = *info;
> > +	kvm->arch.sdr1 = __pa(info->virt) | (info->order - 18);
> > +
> > +	pr_info("KVM guest htab at %lx (order %ld), LPID %x\n",
> > +		info->virt, (long)info->order, kvm->arch.lpid);
> 
> Not directly related to your patch, but these messages pop up in the
> dmesg each time I start a guest ... for the normal user who does not
> have a clue what "htab" means, this can be pretty annoying. Could
> you maybe turn this into a pr_debug() instead?

Maybe, but that's not really in scope for these patches.

> > +}
> > +
> >  long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
> >  {
> >  	long err = -EBUSY;
> > @@ -138,24 +132,28 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
> >  		*htab_orderp = order;
> >  		err = 0;
> >  	} else {
> > -		err = kvmppc_alloc_hpt(kvm, htab_orderp);
> > -		order = *htab_orderp;
> > +		struct kvm_hpt_info info;
> > +
> > +		err = kvmppc_allocate_hpt(&info, *htab_orderp);
> > +		if (err < 0)
> > +			goto out;
> > +		kvmppc_set_hpt(kvm, &info);
> 
> Just a matter of taste (I dislike gotos if avoidable), but you could
> also do:
> 
> 	err = kvmppc_allocate_hpt(&info, *htab_orderp);
> 	if (!err)
> 		kvmppc_set_hpt(kvm, &info);
> 
> ... and that's even one line shorter ;-)

Hrm.  I'd prefer to keep the goto: when most 1-line if statements are
the exception/failure case, it's very easy to miss that here it's the
success case.

> >  	}
> >   out:
> >  	mutex_unlock(&kvm->lock);
> >  	return err;
> >  }
> >  
> > -void kvmppc_free_hpt(struct kvm *kvm)
> > +void kvmppc_free_hpt(struct kvm_hpt_info *info)
> >  {
> > -	kvmppc_free_lpid(kvm->arch.lpid);
> > -	vfree(kvm->arch.hpt.rev);
> > -	if (kvm->arch.hpt.cma)
> > -		kvm_free_hpt_cma(virt_to_page(kvm->arch.hpt.virt),
> > -				 1 << (kvm->arch.hpt.order - PAGE_SHIFT));
> > +	vfree(info->rev);
> > +	if (info->cma)
> > +		kvm_free_hpt_cma(virt_to_page(info->virt),
> > +				 1 << (info->order - PAGE_SHIFT));
> >  	else
> > -		free_pages(kvm->arch.hpt.virt,
> > -			   kvm->arch.hpt.order - PAGE_SHIFT);
> > +		free_pages(info->virt, info->order - PAGE_SHIFT);
> > +	info->virt = 0;
> > +	info->order = 0;
> >  }
> >  
> >  /* Bits in first HPTE dword for pagesize 4k, 64k or 16M */
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index 78baa2b..71c5adb 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -3115,11 +3115,22 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
> >  
> >  	/* Allocate hashed page table (if not done already) and reset it */
> >  	if (!kvm->arch.hpt.virt) {
> > -		err = kvmppc_alloc_hpt(kvm, NULL);
> > -		if (err) {
> > +		int order = KVM_DEFAULT_HPT_ORDER;
> > +		struct kvm_hpt_info info;
> > +
> > +		err = kvmppc_allocate_hpt(&info, order);
> > +		/* If we get here, it means userspace didn't specify a
> > +		 * size explicitly.  So, try successively smaller
> > +		 * sizes if the default failed. */
> > +		while (err < 0 && --order >= PPC_MIN_HPT_ORDER)
> > +			err  = kvmppc_allocate_hpt(&info, order);
> 
> Not sure, but maybe replace "err < 0" with "err == -ENOMEM" in the while
> condition? Or should it also loop on other possible future error types?

No, I think you're right.  Looping only on -ENOMEM is a better idea.
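
With that change applied, the fallback loop would read (sketch):

        err = kvmppc_allocate_hpt(&info, order);
        /* Userspace didn't specify a size, so try successively smaller
         * sizes, but only when the failure was an allocation failure. */
        while (err == -ENOMEM && --order >= PPC_MIN_HPT_ORDER)
                err = kvmppc_allocate_hpt(&info, order);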

> > +		if (err < 0) {
> >  			pr_err("KVM: Couldn't alloc HPT\n");
> >  			goto out;
> >  		}
> > +
> > +		kvmppc_set_hpt(kvm, &info);
> >  	}
> >  
> >  	/* Look up the memslot for guest physical address 0 */
> > @@ -3357,7 +3368,8 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
> >  
> >  	kvmppc_free_vcores(kvm);
> >  
> > -	kvmppc_free_hpt(kvm);
> > +	kvmppc_free_lpid(kvm->arch.lpid);
> > +	kvmppc_free_hpt(&kvm->arch.hpt);
> >  
> >  	kvmppc_free_pimap(kvm);
> >  }
> > 
> 
>  Thomas
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 06/11] powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size
  2016-12-16 12:44   ` Thomas Huth
@ 2016-12-19  0:48     ` David Gibson
  2016-12-19  7:49       ` Thomas Huth
  0 siblings, 1 reply; 29+ messages in thread
From: David Gibson @ 2016-12-19  0:48 UTC (permalink / raw)
  To: Thomas Huth
  Cc: paulus, michael, benh, sjitindarsingh, lvivier, linuxppc-dev,
	linux-kernel, kvm

On Fri, Dec 16, 2016 at 01:44:57PM +0100, Thomas Huth wrote:
> On 15.12.2016 06:53, David Gibson wrote:
> > The KVM_PPC_ALLOCATE_HTAB ioctl() is used to set the size of hashed page
> > table (HPT) that userspace expects a guest VM to have, and is also used to
> > clear that HPT when necessary (e.g. guest reboot).
> > 
> > At present, once the ioctl() is called for the first time, the HPT size can
> > never be changed thereafter - it will be cleared but always sized as from
> > the first call.
> > 
> > With upcoming HPT resize implementation, we're going to need to allow
> > userspace to resize the HPT at reset (to change it back to the default size
> > if the guest changed it).
> > 
> > So, we need to allow this ioctl() to change the HPT size.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  arch/powerpc/include/asm/kvm_ppc.h  |  2 +-
> >  arch/powerpc/kvm/book3s_64_mmu_hv.c | 53 ++++++++++++++++++++-----------------
> >  arch/powerpc/kvm/book3s_hv.c        |  5 +---
> >  3 files changed, 30 insertions(+), 30 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> > index 41575b8..3b837bc 100644
> > --- a/arch/powerpc/include/asm/kvm_ppc.h
> > +++ b/arch/powerpc/include/asm/kvm_ppc.h
> > @@ -157,7 +157,7 @@ extern void kvmppc_map_magic(struct kvm_vcpu *vcpu);
> >  
> >  extern int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order);
> >  extern void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info);
> > -extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp);
> > +extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order);
> >  extern void kvmppc_free_hpt(struct kvm_hpt_info *info);
> >  extern long kvmppc_prepare_vrma(struct kvm *kvm,
> >  				struct kvm_userspace_memory_region *mem);
> > diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> > index 68bb228..8e5ac2f 100644
> > --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> > +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> > @@ -104,10 +104,22 @@ void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info)
> >  		info->virt, (long)info->order, kvm->arch.lpid);
> >  }
> >  
> > -long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
> > +void kvmppc_free_hpt(struct kvm_hpt_info *info)
> > +{
> > +	vfree(info->rev);
> > +	if (info->cma)
> > +		kvm_free_hpt_cma(virt_to_page(info->virt),
> > +				 1 << (info->order - PAGE_SHIFT));
> > +	else
> > +		free_pages(info->virt, info->order - PAGE_SHIFT);
> > +	info->virt = 0;
> > +	info->order = 0;
> > +}
> 
> Why do you need to move kvmppc_free_hpt() around? Seems like unnecessary
> code churn to me?

Previously, kvmppc_free_hpt() wasn't needed in
kvmppc_alloc_reset_hpt(); now it is.  So we need to move it above that
function.  I could move it in the previous patch, but that would
obscure what the actual changes are to it, so it seemed better to do
it here.

> 
> > +long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order)
> >  {
> >  	long err = -EBUSY;
> > -	long order;
> > +	struct kvm_hpt_info info;
> >  
> >  	mutex_lock(&kvm->lock);
> >  	if (kvm->arch.hpte_setup_done) {
> > @@ -119,8 +131,9 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
> >  			goto out;
> >  		}
> >  	}
> > -	if (kvm->arch.hpt.virt) {
> > -		order = kvm->arch.hpt.order;
> > +	if (kvm->arch.hpt.order == order) {
> > +		/* We already have a suitable HPT */
> > +
> >  		/* Set the entire HPT to 0, i.e. invalid HPTEs */
> >  		memset((void *)kvm->arch.hpt.virt, 0, 1ul << order);
> >  		/*
> > @@ -129,33 +142,23 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
> >  		kvmppc_rmap_reset(kvm);
> >  		/* Ensure that each vcpu will flush its TLB on next entry. */
> >  		cpumask_setall(&kvm->arch.need_tlb_flush);
> > -		*htab_orderp = order;
> >  		err = 0;
> > -	} else {
> > -		struct kvm_hpt_info info;
> > -
> > -		err = kvmppc_allocate_hpt(&info, *htab_orderp);
> > -		if (err < 0)
> > -			goto out;
> > -		kvmppc_set_hpt(kvm, &info);
> > +		goto out;
> >  	}
> > - out:
> > +
> > +	if (kvm->arch.hpt.virt)
> > +		kvmppc_free_hpt(&kvm->arch.hpt);
> > +
> > +	err = kvmppc_allocate_hpt(&info, order);
> > +	if (err < 0)
> > +		goto out;
> > +	kvmppc_set_hpt(kvm, &info);
> > +
> > +out:
> >  	mutex_unlock(&kvm->lock);
> >  	return err;
> >  }
> >  
> > -void kvmppc_free_hpt(struct kvm_hpt_info *info)
> > -{
> > -	vfree(info->rev);
> > -	if (info->cma)
> > -		kvm_free_hpt_cma(virt_to_page(info->virt),
> > -				 1 << (info->order - PAGE_SHIFT));
> > -	else
> > -		free_pages(info->virt, info->order - PAGE_SHIFT);
> > -	info->virt = 0;
> > -	info->order = 0;
> > -}
> > -
> >  /* Bits in first HPTE dword for pagesize 4k, 64k or 16M */
> >  static inline unsigned long hpte0_pgsize_encoding(unsigned long pgsize)
> >  {
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index 71c5adb..957e473 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -3600,12 +3600,9 @@ static long kvm_arch_vm_ioctl_hv(struct file *filp,
> >  		r = -EFAULT;
> >  		if (get_user(htab_order, (u32 __user *)argp))
> >  			break;
> > -		r = kvmppc_alloc_reset_hpt(kvm, &htab_order);
> > +		r = kvmppc_alloc_reset_hpt(kvm, htab_order);
> >  		if (r)
> >  			break;
> > -		r = -EFAULT;
> > -		if (put_user(htab_order, (u32 __user *)argp))
> > -			break;
> 
> Now that htab_order is not changed anymore by the kernel, I'm pretty
> sure you need some checks on the value here before calling
> kvmppc_alloc_reset_hpt(), e.g. return an error code if htab_order <
> PPC_MIN_HPT_ORDER.

Right.  I've done that by putting the checks into
kvmppc_allocate_hpt() in the earlier patch.

> And, I'm not sure if I got that right, but in former times, the
> htab_order from the userspace application was just a suggestion, and now
> it's mandatory, right? So if an old userspace used a very high value
> here (or even something smaller than PPC_MIN_HPT_ORDER like 0), the
> kernel fixed this up and the userspace could run happily with the fixed
> value afterwards. But since this value from userspace is mandatory now,
> such a userspace application is broken. So maybe it's better to
> introduce a new ioctl for this new behavior instead, to avoid breaking
> old userspace applications?

A long time ago it was just a hint.  However, that behaviour was
already changed in 572abd5 "KVM: PPC: Book3S HV: Don't fall back to
smaller HPT size in allocation ioctl".  This is important: without
that we could get a different HPT size on the two ends of a migration,
which broke things nastily.

> Anyway, you should also update Documentation/virtual/kvm/api.txt to
> reflect the new behavior.

Good point.  Done.

> 
> >  		r = 0;
> >  		break;
> >  	}
> 
>  Thomas
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/11] powerpc/kvm: Reserve capabilities and ioctls for HPT resizing
  2016-12-16  9:25   ` Thomas Huth
@ 2016-12-19  6:36     ` David Gibson
  0 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2016-12-19  6:36 UTC (permalink / raw)
  To: Thomas Huth
  Cc: paulus, michael, benh, sjitindarsingh, lvivier, linuxppc-dev,
	linux-kernel, kvm

On Fri, Dec 16, 2016 at 10:25:55AM +0100, Thomas Huth wrote:
> On 15.12.2016 06:53, David Gibson wrote:
> > This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
> > advertise whether KVM is capable of handling the PAPR extensions for
> > resizing the hashed page table during guest runtime.
> > 
> > At present, HPT resizing is possible with KVM PR without kernel
> > modification, since the HPT is managed within qemu.  It's not possible yet
> > with KVM HV, because the HPT is managed by KVM.  At present, qemu has to
> > use other capabilities which (by accident) reveal whether PR or HV is in
> > use to know if it can advertise HPT resizing capability to the guest.
> > 
> > To avoid ambiguity with existing kernels, the encoding is a bit odd.
> >     0 means "unknown" since that's what previous kernels will return
> >     1 means "HPT resize is possible if and only if the HPT is allocated in
> >       userspace, rather than in the kernel".  Userspace can check
> >       KVM_CAP_PPC_ALLOC_HTAB to determine if that's the case.  In practice
> >       this will give the same results as userspace's fallback check.
> >     2 will mean "HPT resize available and implemented via ioctl()s
> >       KVM_PPC_RESIZE_HPT_PREPARE and KVM_PPC_RESIZE_HPT_COMMIT"
> 
> This encoding IMHO clearly needs some proper documentation in
> Documentation/virtual/kvm/api.txt ... and maybe also some dedicated
> #defines in an uapi header file.

Ah, yeah.  Actually I'm talking to paulus again to see if we can come
up with a way to encode the necessary facts without something as weird
as this one.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/11] powerpc/kvm: Reserve capabilities and ioctls for HPT resizing
  2016-12-16 13:15   ` Thomas Huth
@ 2016-12-19  6:37     ` David Gibson
  0 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2016-12-19  6:37 UTC (permalink / raw)
  To: Thomas Huth
  Cc: paulus, michael, benh, sjitindarsingh, lvivier, linuxppc-dev,
	linux-kernel, kvm

On Fri, Dec 16, 2016 at 02:15:30PM +0100, Thomas Huth wrote:
> On 15.12.2016 06:53, David Gibson wrote:
> > This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
> > advertise whether KVM is capable of handling the PAPR extensions for
> > resizing the hashed page table during guest runtime.
> > 
> > At present, HPT resizing is possible with KVM PR without kernel
> > modification, since the HPT is managed within qemu.  It's not possible yet
> > with KVM HV, because the HPT is managed by KVM.  At present, qemu has to
> > use other capabilities which (by accident) reveal whether PR or HV is in
> > use to know if it can advertise HPT resizing capability to the guest.
> > 
> > To avoid ambiguity with existing kernels, the encoding is a bit odd.
> >     0 means "unknown" since that's what previous kernels will return
> >     1 means "HPT resize is possible if and only if the HPT is allocated in
> >       userspace, rather than in the kernel".  Userspace can check
> >       KVM_CAP_PPC_ALLOC_HTAB to determine if that's the case.  In practice
> >       this will give the same results as userspace's fallback check.
> >     2 will mean "HPT resize available and implemented via ioctl()s
> >       KVM_PPC_RESIZE_HPT_PREPARE and KVM_PPC_RESIZE_HPT_COMMIT"
> > 
> > For now we always return 1, but the intention is to return 2 once HPT
> > resize is implemented for KVM HV.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  arch/powerpc/kvm/powerpc.c |  3 +++
> >  include/uapi/linux/kvm.h   | 10 ++++++++++
> >  2 files changed, 13 insertions(+)
> > 
> > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> > index efd1183..bb23923 100644
> > --- a/arch/powerpc/kvm/powerpc.c
> > +++ b/arch/powerpc/kvm/powerpc.c
> > @@ -605,6 +605,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >  	case KVM_CAP_SPAPR_MULTITCE:
> >  		r = 1;
> >  		break;
> > +	case KVM_CAP_SPAPR_RESIZE_HPT:
> > +		r = 1; /* resize allowed only if HPT is outside kernel */
> > +		break;
> >  #endif
> >  	case KVM_CAP_PPC_HTM:
> >  		r = cpu_has_feature(CPU_FTR_TM_COMP) &&
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index cac48ed..904afe0 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -685,6 +685,12 @@ struct kvm_ppc_smmu_info {
> >  	struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
> >  };
> >  
> > +/* for KVM_PPC_RESIZE_HPT_{PREPARE,COMMIT} */
> > +struct kvm_ppc_resize_hpt {
> > +	__u64 flags;
> > +	__u32 shift;
> > +};
> 
> I think you should also add a final "__u32 pad" to that struct to make
> sure that it is naturally aligned (like it is done with struct
> kvm_coalesced_mmio_zone already for example).

Seems reasonable; done.
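
With the padding applied, the struct is 16 bytes and naturally aligned; a
sketch of the adjusted definition:

        /* for KVM_PPC_RESIZE_HPT_{PREPARE,COMMIT} */
        struct kvm_ppc_resize_hpt {
                __u64 flags;
                __u32 shift;
                __u32 pad;      /* keep the struct naturally aligned */
        };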

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 06/11] powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size
  2016-12-19  0:48     ` David Gibson
@ 2016-12-19  7:49       ` Thomas Huth
  2016-12-19 23:16         ` David Gibson
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Huth @ 2016-12-19  7:49 UTC (permalink / raw)
  To: David Gibson
  Cc: paulus, michael, benh, sjitindarsingh, lvivier, linuxppc-dev,
	linux-kernel, kvm


On 19.12.2016 01:48, David Gibson wrote:
> On Fri, Dec 16, 2016 at 01:44:57PM +0100, Thomas Huth wrote:
>> On 15.12.2016 06:53, David Gibson wrote:
>>> The KVM_PPC_ALLOCATE_HTAB ioctl() is used to set the size of hashed page
>>> table (HPT) that userspace expects a guest VM to have, and is also used to
>>> clear that HPT when necessary (e.g. guest reboot).
>>>
>>> At present, once the ioctl() is called for the first time, the HPT size can
>>> never be changed thereafter - it will be cleared but always sized as from
>>> the first call.
>>>
>>> With upcoming HPT resize implementation, we're going to need to allow
>>> userspace to resize the HPT at reset (to change it back to the default size
>>> if the guest changed it).
>>>
>>> So, we need to allow this ioctl() to change the HPT size.
>>>
>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>> ---
[...]
>>> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>>> index 68bb228..8e5ac2f 100644
>>> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
>>> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>>> @@ -104,10 +104,22 @@ void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info)
>>>  		info->virt, (long)info->order, kvm->arch.lpid);
>>>  }
>>>  
>>> -long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
>>> +void kvmppc_free_hpt(struct kvm_hpt_info *info)
>>> +{
>>> +	vfree(info->rev);
>>> +	if (info->cma)
>>> +		kvm_free_hpt_cma(virt_to_page(info->virt),
>>> +				 1 << (info->order - PAGE_SHIFT));
>>> +	else
>>> +		free_pages(info->virt, info->order - PAGE_SHIFT);
>>> +	info->virt = 0;
>>> +	info->order = 0;
>>> +}
>>
>> Why do you need to move kvmppc_free_hpt() around? Seems like unnecessary
>> code churn to me?
> 
> Previously, kvmppc_free_hpt() wasn't needed in
> kvmppc_alloc_reset_hpt(); now it is.  So we need to move it above that
> function.  I could move it in the previous patch, but that would
> obscure what the actual changes are to it, so it seemed better to do
> it here.

kvmppc_free_hpt() is not a static function; there is a prototype in a
header for this somewhere, so as far as I can see, it should also work
without moving this function?

[...]
>>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>>> index 71c5adb..957e473 100644
>>> --- a/arch/powerpc/kvm/book3s_hv.c
>>> +++ b/arch/powerpc/kvm/book3s_hv.c
>>> @@ -3600,12 +3600,9 @@ static long kvm_arch_vm_ioctl_hv(struct file *filp,
>>>  		r = -EFAULT;
>>>  		if (get_user(htab_order, (u32 __user *)argp))
>>>  			break;
>>> -		r = kvmppc_alloc_reset_hpt(kvm, &htab_order);
>>> +		r = kvmppc_alloc_reset_hpt(kvm, htab_order);
>>>  		if (r)
>>>  			break;
>>> -		r = -EFAULT;
>>> -		if (put_user(htab_order, (u32 __user *)argp))
>>> -			break;
>>
>> Now that htab_order is not changed anymore by the kernel, I'm pretty
>> sure you need some checks on the value here before calling
>> kvmppc_alloc_reset_hpt(), e.g. return an error code if htab_order <
>> PPC_MIN_HPT_ORDER.
> 
> Right.  I've done that by putting the checks into
> kvmppc_allocate_hpt() in the earlier patch.
> 
>> And, I'm not sure if I got that right, but in former times, the
>> htab_order from the userspace application was just a suggestion, and now
>> it's mandatory, right? So if an old userspace used a very high value
>> here (or even something smaller than PPC_MIN_HPT_ORDER like 0), the
>> kernel fixed this up and the userspace could run happily with the fixed
>> value afterwards. But since this value from userspace is mandatory now,
>> such a userspace application is broken. So maybe it's better to
>> introduce a new ioctl for this new behavior instead, to avoid breaking
>> old userspace applications?
> 
> A long time ago it was just a hint.  However, that behaviour was
> already changed in 572abd5 "KVM: PPC: Book3S HV: Don't fall back to
> smaller HPT size in allocation ioctl".  This is important: without
> that we could get a different HPT size on the two ends of a migration,
> which broke things nastily.

OK, makes sense, especially if the userspace provided an order. But if I
get that patch right, it was still possible to call with order == 0 to
get the automatic sizing? Do we need to preserve that behavior for some
very old userspace applications?

 Thomas



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 06/11] powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size
  2016-12-19  7:49       ` Thomas Huth
@ 2016-12-19 23:16         ` David Gibson
  0 siblings, 0 replies; 29+ messages in thread
From: David Gibson @ 2016-12-19 23:16 UTC (permalink / raw)
  To: Thomas Huth
  Cc: paulus, michael, benh, sjitindarsingh, lvivier, linuxppc-dev,
	linux-kernel, kvm

On Mon, Dec 19, 2016 at 08:49:24AM +0100, Thomas Huth wrote:
> On 19.12.2016 01:48, David Gibson wrote:
> > On Fri, Dec 16, 2016 at 01:44:57PM +0100, Thomas Huth wrote:
> >> On 15.12.2016 06:53, David Gibson wrote:
> >>> The KVM_PPC_ALLOCATE_HTAB ioctl() is used to set the size of hashed page
> >>> table (HPT) that userspace expects a guest VM to have, and is also used to
> >>> clear that HPT when necessary (e.g. guest reboot).
> >>>
> >>> At present, once the ioctl() is called for the first time, the HPT size can
> >>> never be changed thereafter - it will be cleared but always sized as from
> >>> the first call.
> >>>
> >>> With upcoming HPT resize implementation, we're going to need to allow
> >>> userspace to resize the HPT at reset (to change it back to the default size
> >>> if the guest changed it).
> >>>
> >>> So, we need to allow this ioctl() to change the HPT size.
> >>>
> >>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >>> ---
> [...]
> >>> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> >>> index 68bb228..8e5ac2f 100644
> >>> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> >>> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> >>> @@ -104,10 +104,22 @@ void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info)
> >>>  		info->virt, (long)info->order, kvm->arch.lpid);
> >>>  }
> >>>  
> >>> -long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
> >>> +void kvmppc_free_hpt(struct kvm_hpt_info *info)
> >>> +{
> >>> +	vfree(info->rev);
> >>> +	if (info->cma)
> >>> +		kvm_free_hpt_cma(virt_to_page(info->virt),
> >>> +				 1 << (info->order - PAGE_SHIFT));
> >>> +	else
> >>> +		free_pages(info->virt, info->order - PAGE_SHIFT);
> >>> +	info->virt = 0;
> >>> +	info->order = 0;
> >>> +}
> >>
> >> Why do you need to move kvmppc_free_hpt() around? Seems like unnecessary
> >> code churn to me?
> > 
> > Previously, kvmppc_free_hpt() wasn't needed in
> > kvmppc_alloc_reset_hpt(); now it is.  So we need to move it above that
> > function.  I could move it in the previous patch, but that would
> > obscure what the actual changes are to it, so it seemed better to do
> > it here.
> 
> kvmppc_free_hpt() is not a static function; there is a prototype in a
> header for this somewhere, so as far as I can see, it should also work
> without moving this function?

Oh yeah.. how did I miss that..

> 
> [...]
> >>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> >>> index 71c5adb..957e473 100644
> >>> --- a/arch/powerpc/kvm/book3s_hv.c
> >>> +++ b/arch/powerpc/kvm/book3s_hv.c
> >>> @@ -3600,12 +3600,9 @@ static long kvm_arch_vm_ioctl_hv(struct file *filp,
> >>>  		r = -EFAULT;
> >>>  		if (get_user(htab_order, (u32 __user *)argp))
> >>>  			break;
> >>> -		r = kvmppc_alloc_reset_hpt(kvm, &htab_order);
> >>> +		r = kvmppc_alloc_reset_hpt(kvm, htab_order);
> >>>  		if (r)
> >>>  			break;
> >>> -		r = -EFAULT;
> >>> -		if (put_user(htab_order, (u32 __user *)argp))
> >>> -			break;
> >>
> >> Now that htab_order is not changed anymore by the kernel, I'm pretty
> >> sure you need some checks on the value here before calling
> >> kvmppc_alloc_reset_hpt(), e.g. return an error code if htab_order <
> >> PPC_MIN_HPT_ORDER.
> > 
> > Right.  I've done that by putting the checks into
> > kvmppc_allocate_hpt() in the earlier patch.
> > 
> >> And, I'm not sure if I got that right, but in former times, the
> >> htab_order from the userspace application was just a suggestion, and now
> >> it's mandatory, right? So if an old userspace used a very high value
> >> here (or even something smaller than PPC_MIN_HPT_ORDER like 0), the
> >> kernel fixed this up and the userspace could run happily with the fixed
> >> value afterwards. But since this value from userspace is mandatory now,
> >> such a userspace application is broken. So maybe it's better to
> >> introduce a new ioctl for this new behavior instead, to avoid breaking
> >> old userspace applications?
> > 
> > A long time ago it was just a hint.  However, that behaviour was
> > already changed in 572abd5 "KVM: PPC: Book3S HV: Don't fall back to
> > smaller HPT size in allocation ioctl".  This is important: without
> > that we could get a different HPT size on the two ends of a migration,
> > which broke things nastily.
> 
> OK, makes sense, especially if the userspace provided an order. But if I
> get that patch right, it was still possible to call with order == 0 to
> get the automatic sizing?

No, I don't think so.  It was possible to pass a NULL htab_orderp to
the allocation functions to get autosizing, but that would only
actually happen when called from the KVM_RUN path, because the explicit
allocation ioctl() hadn't been called yet.

> Do we need to preserve that behavior for some
> very old userspace applications?

No; any userspace that has used this ioctl() at all has always
passed an explicit size and never attempted to use autosizing.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 29+ messages in thread

Thread overview: 29+ messages
2016-12-15  5:53 [PATCH 00/11] KVM implementation of PAPR HPT resizing extension David Gibson
2016-12-15  5:53 ` [PATCH 01/11] powerpc/kvm: Reserve capabilities and ioctls for HPT resizing David Gibson
2016-12-16  9:25   ` Thomas Huth
2016-12-19  6:36     ` David Gibson
2016-12-16 13:15   ` Thomas Huth
2016-12-19  6:37     ` David Gibson
2016-12-15  5:53 ` [PATCH 02/11] powerpc/kvm: Rename kvm_alloc_hpt() for clarity David Gibson
2016-12-16  9:03   ` Thomas Huth
2016-12-15  5:53 ` [PATCH 03/11] powerpc/kvm: Gather HPT related variables into sub-structure David Gibson
2016-12-16  9:24   ` Thomas Huth
2016-12-19  0:03     ` David Gibson
2016-12-15  5:53 ` [PATCH 04/11] powerpc/kvm: Don't store values derivable from HPT order David Gibson
2016-12-16 10:39   ` Thomas Huth
2016-12-19  0:04     ` David Gibson
2016-12-15  5:53 ` [PATCH 05/11] powerpc/kvm: Split HPT allocation from activation David Gibson
2016-12-16 11:57   ` Thomas Huth
2016-12-19  0:24     ` David Gibson
2016-12-15  5:53 ` [PATCH 06/11] powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size David Gibson
2016-12-16 12:44   ` Thomas Huth
2016-12-19  0:48     ` David Gibson
2016-12-19  7:49       ` Thomas Huth
2016-12-19 23:16         ` David Gibson
2016-12-15  5:54 ` [PATCH 07/11] powerpc/kvm: Create kvmppc_unmap_hpte_helper() David Gibson
2016-12-16 12:48   ` Thomas Huth
2016-12-15  5:54 ` [PATCH 08/11] powerpc/kvm: KVM-HV HPT resizing stub implementation David Gibson
2016-12-16 15:35   ` Thomas Huth
2016-12-15  5:54 ` [PATCH 09/11] powerpc/kvm: Outline of KVM-HV HPT resizing implementation David Gibson
2016-12-15  5:54 ` [PATCH 10/11] powerpc/kvm: " David Gibson
2016-12-15  5:54 ` [PATCH 11/11] powerpc/kvm: Advertise availablity of HPT resizing on KVM HV David Gibson
