kvm.vger.kernel.org archive mirror
* [PATCH 00/23] KVM: PPC: Book3S HV: Support for nested HPT guests
@ 2019-08-26  6:20 Suraj Jitindar Singh
  2019-08-26  6:20 ` [PATCH 01/23] KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler Suraj Jitindar Singh
                   ` (22 more replies)
  0 siblings, 23 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:20 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

This patch series adds support for running a nested kvm guest which uses the hpt
(hash page table) mmu type.

Patch series based on v5.3-rc6.

The first 8 patches in this series enable a radix guest (L1) running under a
radix hypervisor (L0) to act as a guest hypervisor for its own nested (L2)
guest, where the nested guest uses hash page table translation. This mainly
involved ensuring that the guest hypervisor uses the new kvmhv_run_single_vcpu()
entry path, and that the appropriate functions which are normally called on the
real mode entry path in book3s_hv_rmhandlers.S are also called on the new
virtual mode entry path when a hpt guest is being run.

The remainder of the patches enable an (L0) hypervisor to perform hash page
table translation for a nested (L2) hpt guest which is running under one of
its radix (L1) guests acting as a guest hypervisor. This primarily required
changes to the nested guest entry path to ensure that a shadow hpt would be
allocated for the nested hpt guest, that the slb was context switched, and
that the real mode entry path in book3s_hv_rmhandlers.S could be used to
enter/exit a nested hpt guest.

It was also necessary to be able to create translations by inserting ptes into
the shadow page table, where each such pte combines the translation from L2
virtual address to L1 guest physical address with the translation from L1 guest
physical address to L0 host real address. Additionally, invalidations of these
translations need to be handled at both levels: by L1 via the H_TLB_INVALIDATE
hcall to invalidate an L2 virtual address to L1 guest physical address
translation, and by L0 when paging out an L1 guest page which had subsequently
been mapped through to L2, thus invalidating the L1 guest physical address to
L0 host real address translation.
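
To illustrate how the two stages combine, here is a rough sketch of the
shadow hpt insertion logic. This is pseudocode only: the helper names
(xlate_l2_va_to_l1_gpa(), xlate_l1_gpa_to_l0_ra(), insert_shadow_hpte()) are
hypothetical stand-ins for the routines added later in the series.

static long sketch_nested_hpt_fault(struct kvm_vcpu *vcpu, unsigned long l2_va)
{
	unsigned long l1_gpa, l0_ra;
	long ret;

	/* Stage 1: L2 virtual address -> L1 guest physical address,
	 * from the guest hpt which L1 maintains for its L2 guest. */
	ret = xlate_l2_va_to_l1_gpa(vcpu, l2_va, &l1_gpa);
	if (ret)
		return ret;	/* reflect the fault to the L1 hypervisor */

	/* Stage 2: L1 guest physical address -> L0 host real address,
	 * from L0's partition scoped table for the L1 guest. */
	ret = xlate_l1_gpa_to_l0_ra(vcpu->kvm, l1_gpa, &l0_ra);
	if (ret)
		return ret;	/* fault the L1 page in first */

	/* Insert a pte mapping l2_va -> l0_ra into the shadow hpt and
	 * record it in the nest rmap, so that either level can later
	 * invalidate it: L1 via H_TLB_INVALIDATE, or L0 when paging out
	 * the L1 guest page. */
	return insert_shadow_hpte(vcpu, l2_va, l0_ra);
}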

Still lacking support for:
Passthrough of emulated mmio devices to nested hpt guests, since the current
method of reading nested guest memory relies on quadrants, which are only
available when using radix translation.

Paul Mackerras (1):
  KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault
    handler

Suraj Jitindar Singh (22):
  KVM: PPC: Book3S HV: Increment mmu_notifier_seq when modifying radix
    pte rc bits
  KVM: PPC: Book3S HV: Nested: Don't allow hash guests to run nested
    guests
  KVM: PPC: Book3S HV: Handle making H_ENTER_NESTED hcall in a separate
    function
  KVM: PPC: Book3S HV: Enable calling kvmppc_hpte_hv_fault in virtual
    mode
  KVM: PPC: Book3S HV: Allow hpt manipulation hcalls to be called in
    virtual mode
  KVM: PPC: Book3S HV: Make kvmppc_invalidate_hpte() take lpid not a kvm
    struct
  KVM: PPC: Book3S HV: Nested: Allow pseries hypervisor to run hpt
    nested guest
  KVM: PPC: Book3S HV: Nested: Improve comments and naming of nest rmap
    functions
  KVM: PPC: Book3S HV: Nested: Increase gpa field in nest rmap to 46
    bits
  KVM: PPC: Book3S HV: Nested: Remove single nest rmap entries
  KVM: PPC: Book3S HV: Nested: add kvmhv_remove_all_nested_rmap_lpid()
  KVM: PPC: Book3S HV: Nested: Infrastructure for nested hpt guest setup
  KVM: PPC: Book3S HV: Nested: Context switch slb for nested hpt guest
  KVM: PPC: Book3S HV: Store lpcr and hdec_exp in the vcpu struct
  KVM: PPC: Book3S HV: Nested: Make kvmppc_run_vcpu() entry path nested
    capable
  KVM: PPC: Book3S HV: Nested: Rename kvmhv_xlate_addr_nested_radix
  KVM: PPC: Book3S HV: Separate out hashing from
    kvmppc_hv_find_lock_hpte()
  KVM: PPC: Book3S HV: Nested: Implement nested hpt mmu translation
  KVM: PPC: Book3S HV: Nested: Handle tlbie hcall for nested hpt guest
  KVM: PPC: Book3S HV: Nested: Implement nest rmap invalidations for hpt
    guests
  KVM: PPC: Book3S HV: Nested: Enable nested hpt guests
  KVM: PPC: Book3S HV: Add nested hpt pte information to debugfs

 arch/powerpc/include/asm/book3s/64/mmu-hash.h |   15 +
 arch/powerpc/include/asm/book3s/64/mmu.h      |    9 +
 arch/powerpc/include/asm/hvcall.h             |   36 -
 arch/powerpc/include/asm/kvm_asm.h            |    5 +
 arch/powerpc/include/asm/kvm_book3s.h         |   30 +-
 arch/powerpc/include/asm/kvm_book3s_64.h      |   87 +-
 arch/powerpc/include/asm/kvm_host.h           |   57 +
 arch/powerpc/include/asm/kvm_ppc.h            |    5 +-
 arch/powerpc/kernel/asm-offsets.c             |    5 +
 arch/powerpc/kvm/book3s.c                     |    1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c           |  136 ++-
 arch/powerpc/kvm/book3s_64_mmu_radix.c        |  167 +--
 arch/powerpc/kvm/book3s_hv.c                  |  327 ++++--
 arch/powerpc/kvm/book3s_hv_builtin.c          |   33 +-
 arch/powerpc/kvm/book3s_hv_interrupts.S       |   25 +-
 arch/powerpc/kvm/book3s_hv_nested.c           | 1381 ++++++++++++++++++++++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c           |  298 ++++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S       |  126 ++-
 arch/powerpc/kvm/book3s_xive.h                |   15 +
 arch/powerpc/kvm/powerpc.c                    |    3 +-
 20 files changed, 2136 insertions(+), 625 deletions(-)

-- 
2.13.6



* [PATCH 01/23] KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: Book3S HV: Support for nested HPT guests Suraj Jitindar Singh
@ 2019-08-26  6:20 ` Suraj Jitindar Singh
  2019-08-26  6:20 ` [PATCH 02/23] KVM: PPC: Book3S HV: Increment mmu_notifier_seq when modifying radix pte rc bits Suraj Jitindar Singh
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:20 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm

From: Paul Mackerras <paulus@ozlabs.org>

This makes the same changes in the page fault handler for HPT guests
that commits 31c8b0d0694a ("KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot()
in page fault handler", 2018-03-01), 71d29f43b633 ("KVM: PPC: Book3S HV:
Don't use compound_order to determine host mapping size", 2018-09-11)
and 6579804c4317 ("KVM: PPC: Book3S HV: Avoid crash from THP collapse
during radix page fault", 2018-10-04) made for the page fault handler
for radix guests.

In summary, where we used to call get_user_pages_fast() and then do
special handling for VM_PFNMAP vmas, we now call __get_user_pages_fast()
and then __gfn_to_pfn_memslot() if that fails, followed by reading the
Linux PTE to get the host PFN, host page size and mapping attributes.

This also brings in the change from SetPageDirty() to set_page_dirty_lock()
which was done for the radix page fault handler in commit c3856aeb2940
("KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault
handler", 2018-02-23).

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 118 +++++++++++++++++-------------------
 1 file changed, 57 insertions(+), 61 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 9a75f0e1933b..a485bb018193 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -497,17 +497,18 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	__be64 *hptep;
 	unsigned long mmu_seq, psize, pte_size;
 	unsigned long gpa_base, gfn_base;
-	unsigned long gpa, gfn, hva, pfn;
+	unsigned long gpa, gfn, hva, pfn, hpa;
 	struct kvm_memory_slot *memslot;
 	unsigned long *rmap;
 	struct revmap_entry *rev;
-	struct page *page, *pages[1];
-	long index, ret, npages;
+	struct page *page;
+	long index, ret;
 	bool is_ci;
-	unsigned int writing, write_ok;
-	struct vm_area_struct *vma;
+	bool writing, write_ok;
+	unsigned int shift;
 	unsigned long rcbits;
 	long mmio_update;
+	pte_t pte, *ptep;
 
 	if (kvm_is_radix(kvm))
 		return kvmppc_book3s_radix_page_fault(run, vcpu, ea, dsisr);
@@ -581,59 +582,62 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	smp_rmb();
 
 	ret = -EFAULT;
-	is_ci = false;
-	pfn = 0;
 	page = NULL;
-	pte_size = PAGE_SIZE;
 	writing = (dsisr & DSISR_ISSTORE) != 0;
 	/* If writing != 0, then the HPTE must allow writing, if we get here */
 	write_ok = writing;
 	hva = gfn_to_hva_memslot(memslot, gfn);
-	npages = get_user_pages_fast(hva, 1, writing ? FOLL_WRITE : 0, pages);
-	if (npages < 1) {
-		/* Check if it's an I/O mapping */
-		down_read(&current->mm->mmap_sem);
-		vma = find_vma(current->mm, hva);
-		if (vma && vma->vm_start <= hva && hva + psize <= vma->vm_end &&
-		    (vma->vm_flags & VM_PFNMAP)) {
-			pfn = vma->vm_pgoff +
-				((hva - vma->vm_start) >> PAGE_SHIFT);
-			pte_size = psize;
-			is_ci = pte_ci(__pte((pgprot_val(vma->vm_page_prot))));
-			write_ok = vma->vm_flags & VM_WRITE;
-		}
-		up_read(&current->mm->mmap_sem);
-		if (!pfn)
-			goto out_put;
+
+	/*
+	 * Do a fast check first, since __gfn_to_pfn_memslot doesn't
+	 * do it with !atomic && !async, which is how we call it.
+	 * We always ask for write permission since the common case
+	 * is that the page is writable.
+	 */
+	if (__get_user_pages_fast(hva, 1, 1, &page) == 1) {
+		write_ok = true;
 	} else {
-		page = pages[0];
-		pfn = page_to_pfn(page);
-		if (PageHuge(page)) {
-			page = compound_head(page);
-			pte_size <<= compound_order(page);
-		}
-		/* if the guest wants write access, see if that is OK */
-		if (!writing && hpte_is_writable(r)) {
-			pte_t *ptep, pte;
-			unsigned long flags;
-			/*
-			 * We need to protect against page table destruction
-			 * hugepage split and collapse.
-			 */
-			local_irq_save(flags);
-			ptep = find_current_mm_pte(current->mm->pgd,
-						   hva, NULL, NULL);
-			if (ptep) {
-				pte = kvmppc_read_update_linux_pte(ptep, 1);
-				if (__pte_write(pte))
-					write_ok = 1;
-			}
-			local_irq_restore(flags);
+		/* Call KVM generic code to do the slow-path check */
+		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+					   writing, &write_ok);
+		if (is_error_noslot_pfn(pfn))
+			return -EFAULT;
+		page = NULL;
+		if (pfn_valid(pfn)) {
+			page = pfn_to_page(pfn);
+			if (PageReserved(page))
+				page = NULL;
 		}
 	}
 
+	/*
+	 * Read the PTE from the process' radix tree and use that
+	 * so we get the shift and attribute bits.
+	 */
+	local_irq_disable();
+	ptep = __find_linux_pte(vcpu->arch.pgdir, hva, NULL, &shift);
+	/*
+	 * If the PTE disappeared temporarily due to a THP
+	 * collapse, just return and let the guest try again.
+	 */
+	if (!ptep) {
+		local_irq_enable();
+		if (page)
+			put_page(page);
+		return RESUME_GUEST;
+	}
+	pte = *ptep;
+	local_irq_enable();
+	hpa = pte_pfn(pte) << PAGE_SHIFT;
+	pte_size = PAGE_SIZE;
+	if (shift)
+		pte_size = 1ul << shift;
+	is_ci = pte_ci(pte);
+
 	if (psize > pte_size)
 		goto out_put;
+	if (pte_size > psize)
+		hpa |= hva & (pte_size - psize);
 
 	/* Check WIMG vs. the actual page we're accessing */
 	if (!hpte_cache_flags_ok(r, is_ci)) {
@@ -647,14 +651,13 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	}
 
 	/*
-	 * Set the HPTE to point to pfn.
-	 * Since the pfn is at PAGE_SIZE granularity, make sure we
+	 * Set the HPTE to point to hpa.
+	 * Since the hpa is at PAGE_SIZE granularity, make sure we
 	 * don't mask out lower-order bits if psize < PAGE_SIZE.
 	 */
 	if (psize < PAGE_SIZE)
 		psize = PAGE_SIZE;
-	r = (r & HPTE_R_KEY_HI) | (r & ~(HPTE_R_PP0 - psize)) |
-					((pfn << PAGE_SHIFT) & ~(psize - 1));
+	r = (r & HPTE_R_KEY_HI) | (r & ~(HPTE_R_PP0 - psize)) | hpa;
 	if (hpte_is_writable(r) && !write_ok)
 		r = hpte_make_readonly(r);
 	ret = RESUME_GUEST;
@@ -719,20 +722,13 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	asm volatile("ptesync" : : : "memory");
 	preempt_enable();
 	if (page && hpte_is_writable(r))
-		SetPageDirty(page);
+		set_page_dirty_lock(page);
 
  out_put:
 	trace_kvm_page_fault_exit(vcpu, hpte, ret);
 
-	if (page) {
-		/*
-		 * We drop pages[0] here, not page because page might
-		 * have been set to the head page of a compound, but
-		 * we have to drop the reference on the correct tail
-		 * page to match the get inside gup()
-		 */
-		put_page(pages[0]);
-	}
+	if (page)
+		put_page(page);
 	return ret;
 
  out_unlock:
-- 
2.13.6



* [PATCH 02/23] KVM: PPC: Book3S HV: Increment mmu_notifier_seq when modifying radix pte rc bits
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: Book3S HV: Support for nested HPT guests Suraj Jitindar Singh
  2019-08-26  6:20 ` [PATCH 01/23] KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler Suraj Jitindar Singh
@ 2019-08-26  6:20 ` Suraj Jitindar Singh
  2019-08-26  6:20 ` [PATCH 03/23] KVM: PPC: Book3S HV: Nested: Don't allow hash guests to run nested guests Suraj Jitindar Singh
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:20 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

The kvm mmu_notifier_seq is used to communicate to code which is constructing
a dependent page mapping that the underlying page mapping has changed. For
example, when constructing a mapping for a nested guest it is used to detect
when the guest mapping has changed, which would render the nested guest
mapping invalid.

When running nested guests it is important that the rc bits are kept in sync
between the two ptes on the host in which they exist: the pte for the guest
and the pte for the nested guest. This is done when inserting the nested pte
in __kvmhv_nested_page_fault_radix() by reducing the rc bits set in the nested
pte to those already set in the guest pte, and when setting the bits in the
nested pte in response to an interrupt in kvmhv_handle_nested_set_rc_radix()
by also setting the same bits in the guest pte (and not setting them in the
nested pte if that fails).

When the host wants to remove rc bits from the guest pte in
kvm_radix_test_clear_dirty(), it first removes them from the guest pte and
then from any corresponding nested ptes which map the same guest page. This
means there is a window in which the rc bits could get out of sync between
the two ptes: they might be observed as set in the guest pte and copied into
the nested pte on that assumption, while the host is in the process of
clearing them, leading to an inconsistency.

In kvm_radix_test_clear_dirty() the mmu_lock spin lock is held across removing
the rc bits from the guest and nested pte, and the same is done across updating
the rc bits in the guest and nested pte in kvmhv_handle_nested_set_rc_radix(),
so there is no window for them to get out of sync in those cases. However when
constructing the pte in __kvmhv_nested_page_fault_radix() we drop the mmu_lock
spin lock between reading the guest pte and inserting the nested pte,
presenting a window for them to get out of sync: the rc bits could be observed
as set in the guest pte and set in the nested pte accordingly, but in the
meantime the rc bits in the guest pte could have been cleared, and since the
nested pte hadn't yet been inserted there is no way for
kvm_radix_test_clear_dirty() to clear them, so an inconsistency can arise.

To avoid the possibility of the rc bits getting out of sync, increment
the mmu_notifier_seq in kvm_radix_test_clear_dirty() under the mmu_lock
when clearing rc bits. This means that when inserting the nested pte in
__kvmhv_nested_page_fault_radix() we will bail out and retry when we see
that the mmu_seq differs, indicating that the guest pte has changed.
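
The consumer side of this is the usual kvm mmu_notifier_seq pattern. As a
rough sketch (illustrative only, not the exact code in
__kvmhv_nested_page_fault_radix()):

	unsigned long mmu_seq;

	mmu_seq = kvm->mmu_notifier_seq;
	smp_rmb();

	/* ... read the guest pte and construct the nested pte, with its
	 * rc bits reduced to those seen set in the guest pte ... */

	spin_lock(&kvm->mmu_lock);
	if (mmu_notifier_retry(kvm, mmu_seq)) {
		/* mmu_notifier_seq was bumped, e.g. by
		 * kvm_radix_test_clear_dirty() clearing rc bits, so the
		 * guest pte may have changed: bail out and retry. */
		spin_unlock(&kvm->mmu_lock);
		return RESUME_GUEST;
	}
	/* ... insert the nested pte while still holding mmu_lock ... */
	spin_unlock(&kvm->mmu_lock);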

Fixes: ae59a7e1945b ("KVM: PPC: Book3S HV: Keep rc bits in shadow pgtable in sync with host")
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 2d415c36a61d..310d8dde9a48 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1044,6 +1044,8 @@ static int kvm_radix_test_clear_dirty(struct kvm *kvm,
 		kvmhv_update_nest_rmap_rc_list(kvm, rmapp, _PAGE_DIRTY, 0,
 					       old & PTE_RPN_MASK,
 					       1UL << shift);
+		/* Notify anyone trying to map the page that it has changed */
+		kvm->mmu_notifier_seq++;
 		spin_unlock(&kvm->mmu_lock);
 	}
 	return ret;
-- 
2.13.6



* [PATCH 03/23] KVM: PPC: Book3S HV: Nested: Don't allow hash guests to run nested guests
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: Book3S HV: Support for nested HPT guests Suraj Jitindar Singh
  2019-08-26  6:20 ` [PATCH 01/23] KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler Suraj Jitindar Singh
  2019-08-26  6:20 ` [PATCH 02/23] KVM: PPC: Book3S HV: Increment mmu_notifier_seq when modifying radix pte rc bits Suraj Jitindar Singh
@ 2019-08-26  6:20 ` Suraj Jitindar Singh
  2019-10-23  4:47   ` Paul Mackerras
  2019-08-26  6:20 ` [PATCH 04/23] KVM: PPC: Book3S HV: Handle making H_ENTER_NESTED hcall in a separate function Suraj Jitindar Singh
                   ` (19 subsequent siblings)
  22 siblings, 1 reply; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:20 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

Don't allow hpt (hash page table) guests to act as guest hypervisors and
thus be able to run nested guests. There is currently no support for this;
if a nested guest is to be run it must be run at the lowest level.
Explicitly disallow hash guests from enabling the nested kvm-hv capability
at the hypervisor level.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/kvm/book3s_hv.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cde3f5a4b3e4..ce960301bfaa 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5336,8 +5336,12 @@ static int kvmhv_enable_nested(struct kvm *kvm)
 		return -ENODEV;
 
 	/* kvm == NULL means the caller is testing if the capability exists */
-	if (kvm)
+	if (kvm) {
+		/* Only radix guests can act as nested hv and thus run guests */
+		if (!kvm_is_radix(kvm))
+			return -1;
 		kvm->arch.nested_enable = true;
+	}
 	return 0;
 }
 
-- 
2.13.6



* [PATCH 04/23] KVM: PPC: Book3S HV: Handle making H_ENTER_NESTED hcall in a separate function
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: Book3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (2 preceding siblings ...)
  2019-08-26  6:20 ` [PATCH 03/23] KVM: PPC: Book3S HV: Nested: Don't allow hash guests to run nested guests Suraj Jitindar Singh
@ 2019-08-26  6:20 ` Suraj Jitindar Singh
  2019-08-26  6:20 ` [PATCH 05/23] KVM: PPC: Book3S HV: Enable calling kvmppc_hpte_hv_fault in virtual mode Suraj Jitindar Singh
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:20 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

A pseries guest hypervisor (that is, a guest acting as a hypervisor for its
own guests) which would like to run its own guest (a nested guest) must do
so by invoking the top level hypervisor via the H_ENTER_NESTED hcall.

Move the code which makes this hcall into its own function for code clarity
and readability.

No functional change.
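
For reference, the hcall itself (unchanged by this patch, only moved) hands
the register state to the top level hypervisor by real address and returns
the trap which caused the nested guest to exit:

	trap = plpar_hcall_norets(H_ENTER_NESTED, __pa(&hvregs),
				  __pa(&vcpu->arch.regs));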

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/kvm/book3s_hv.c | 88 ++++++++++++++++++++++++--------------------
 1 file changed, 49 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ce960301bfaa..4901738a3c31 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3367,7 +3367,55 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 }
 
 /*
+ * Handle making the H_ENTER_NESTED hcall if we're pseries.
+ */
+static int kvmhv_pseries_enter_guest(struct kvm_vcpu *vcpu, u64 time_limit,
+				     unsigned long lpcr)
+{
+	/* call our hypervisor to load up HV regs and go */
+	struct hv_guest_state hvregs;
+	/* we need to save/restore host & guest psscr since L0 doesn't for us */
+	unsigned long host_psscr;
+	int trap;
+
+	host_psscr = mfspr(SPRN_PSSCR_PR);
+	mtspr(SPRN_PSSCR_PR, vcpu->arch.psscr);
+	kvmhv_save_hv_regs(vcpu, &hvregs);
+	hvregs.lpcr = lpcr;
+	vcpu->arch.regs.msr = vcpu->arch.shregs.msr;
+	hvregs.version = HV_GUEST_STATE_VERSION;
+	if (vcpu->arch.nested) {
+		hvregs.lpid = vcpu->arch.nested->shadow_lpid;
+		hvregs.vcpu_token = vcpu->arch.nested_vcpu_id;
+	} else {
+		hvregs.lpid = vcpu->kvm->arch.lpid;
+		hvregs.vcpu_token = vcpu->vcpu_id;
+	}
+	hvregs.hdec_expiry = time_limit;
+	trap = plpar_hcall_norets(H_ENTER_NESTED, __pa(&hvregs),
+				  __pa(&vcpu->arch.regs));
+	kvmhv_restore_hv_return_state(vcpu, &hvregs);
+	vcpu->arch.shregs.msr = vcpu->arch.regs.msr;
+	vcpu->arch.shregs.dar = mfspr(SPRN_DAR);
+	vcpu->arch.shregs.dsisr = mfspr(SPRN_DSISR);
+	vcpu->arch.psscr = mfspr(SPRN_PSSCR_PR);
+	mtspr(SPRN_PSSCR_PR, host_psscr);
+
+	/* H_CEDE has to be handled now, not later */
+	if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
+	    kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
+		kvmppc_nested_cede(vcpu);
+		trap = 0;
+	}
+
+	return trap;
+}
+
+/*
  * Load up hypervisor-mode registers on P9.
+ * This is only called on baremetal (powernv) systems, i.e. where
+ * CPU_FTR_HVMODE is set. It is only used for radix guests; that radix
+ * guest may be a direct guest of this hypervisor or a nested guest.
  */
 static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
 				     unsigned long lpcr)
@@ -3569,45 +3617,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
 	mtspr(SPRN_DEC, vcpu->arch.dec_expires - mftb());
 
 	if (kvmhv_on_pseries()) {
-		/*
-		 * We need to save and restore the guest visible part of the
-		 * psscr (i.e. using SPRN_PSSCR_PR) since the hypervisor
-		 * doesn't do this for us. Note only required if pseries since
-		 * this is done in kvmhv_load_hv_regs_and_go() below otherwise.
-		 */
-		unsigned long host_psscr;
-		/* call our hypervisor to load up HV regs and go */
-		struct hv_guest_state hvregs;
-
-		host_psscr = mfspr(SPRN_PSSCR_PR);
-		mtspr(SPRN_PSSCR_PR, vcpu->arch.psscr);
-		kvmhv_save_hv_regs(vcpu, &hvregs);
-		hvregs.lpcr = lpcr;
-		vcpu->arch.regs.msr = vcpu->arch.shregs.msr;
-		hvregs.version = HV_GUEST_STATE_VERSION;
-		if (vcpu->arch.nested) {
-			hvregs.lpid = vcpu->arch.nested->shadow_lpid;
-			hvregs.vcpu_token = vcpu->arch.nested_vcpu_id;
-		} else {
-			hvregs.lpid = vcpu->kvm->arch.lpid;
-			hvregs.vcpu_token = vcpu->vcpu_id;
-		}
-		hvregs.hdec_expiry = time_limit;
-		trap = plpar_hcall_norets(H_ENTER_NESTED, __pa(&hvregs),
-					  __pa(&vcpu->arch.regs));
-		kvmhv_restore_hv_return_state(vcpu, &hvregs);
-		vcpu->arch.shregs.msr = vcpu->arch.regs.msr;
-		vcpu->arch.shregs.dar = mfspr(SPRN_DAR);
-		vcpu->arch.shregs.dsisr = mfspr(SPRN_DSISR);
-		vcpu->arch.psscr = mfspr(SPRN_PSSCR_PR);
-		mtspr(SPRN_PSSCR_PR, host_psscr);
-
-		/* H_CEDE has to be handled now, not later */
-		if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
-		    kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
-			kvmppc_nested_cede(vcpu);
-			trap = 0;
-		}
+		trap = kvmhv_pseries_enter_guest(vcpu, time_limit, lpcr);
 	} else {
 		trap = kvmhv_load_hv_regs_and_go(vcpu, time_limit, lpcr);
 	}
-- 
2.13.6



* [PATCH 05/23] KVM: PPC: Book3S HV: Enable calling kvmppc_hpte_hv_fault in virtual mode
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: Book3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (3 preceding siblings ...)
  2019-08-26  6:20 ` [PATCH 04/23] KVM: PPC: Book3S HV: Handle making H_ENTER_NESTED hcall in a separate function Suraj Jitindar Singh
@ 2019-08-26  6:20 ` Suraj Jitindar Singh
  2019-08-26  6:20 ` [PATCH 06/23] KVM: PPC: Book3S HV: Allow hpt manipulation hcalls to be called " Suraj Jitindar Singh
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:20 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

The function kvmppc_hpte_hv_fault() is used to search a hpt (hash page
table) for an existing entry given a slb entry, a fault code and a fault
address.

Currently this function is only called in real mode. Modify this function so
that it can also be called in virtual mode, and add a parameter which
specifies whether the function is being called from real mode or not.
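
A virtual mode call site would then look something like the sketch below
(hypothetical; the real callers are added later in the series), with
preemption disabled around the call as the function now requires:

	preempt_disable();
	ret = kvmppc_hpte_hv_fault(vcpu, ea, slb_v, dsisr,
				   true,	/* data fault */
				   false);	/* virtual mode */
	preempt_enable();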

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/include/asm/kvm_ppc.h      | 3 ++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c     | 9 +++++++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 ++
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 2484e6a8f5ca..2c4d659cf8bb 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -688,7 +688,8 @@ long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
 long kvmppc_rm_h_page_init(struct kvm_vcpu *vcpu, unsigned long flags,
 			   unsigned long dest, unsigned long src);
 long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
-                          unsigned long slb_v, unsigned int status, bool data);
+                          unsigned long slb_v, unsigned int status,
+			  bool data, bool is_realmode);
 unsigned long kvmppc_rm_h_xirr(struct kvm_vcpu *vcpu);
 unsigned long kvmppc_rm_h_xirr_x(struct kvm_vcpu *vcpu);
 unsigned long kvmppc_rm_h_ipoll(struct kvm_vcpu *vcpu, unsigned long server);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 63e0ce91e29d..9f7ad4eaa528 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -1182,9 +1182,12 @@ EXPORT_SYMBOL(kvmppc_hv_find_lock_hpte);
  * -1 to pass the fault up to host kernel mode code, -2 to do that
  * and also load the instruction word (for MMIO emulation),
  * or 0 if we should make the guest retry the access.
+ * For a nested hypervisor, this will be called in virtual mode
+ * (is_realmode == false) and should be called with preemption disabled.
  */
 long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
-			  unsigned long slb_v, unsigned int status, bool data)
+			  unsigned long slb_v, unsigned int status,
+			  bool data, bool is_realmode)
 {
 	struct kvm *kvm = vcpu->kvm;
 	long int index;
@@ -1222,7 +1225,9 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 			v = hpte_new_to_old_v(v, r);
 			r = hpte_new_to_old_r(r);
 		}
-		rev = real_vmalloc_addr(&kvm->arch.hpt.rev[index]);
+		rev = &kvm->arch.hpt.rev[index];
+		if (is_realmode)
+			rev = real_vmalloc_addr(rev);
 		gr = rev->guest_rpte;
 
 		unlock_hpte(hpte, orig_v);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 337e64468d78..54e1864d4702 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2066,6 +2066,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	/* Search the hash table. */
 	mr	r3, r9			/* vcpu pointer */
 	li	r7, 1			/* data fault */
+	li	r8, 1			/* is real mode */
 	bl	kvmppc_hpte_hv_fault
 	ld	r9, HSTATE_KVM_VCPU(r13)
 	ld	r10, VCPU_PC(r9)
@@ -2158,6 +2159,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	mr	r4, r10
 	mr	r6, r11
 	li	r7, 0			/* instruction fault */
+	li	r8, 1			/* is real mode */
 	bl	kvmppc_hpte_hv_fault
 	ld	r9, HSTATE_KVM_VCPU(r13)
 	ld	r10, VCPU_PC(r9)
-- 
2.13.6



* [PATCH 06/23] KVM: PPC: Book3S HV: Allow hpt manipulation hcalls to be called in virtual mode
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: Book3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (4 preceding siblings ...)
  2019-08-26  6:20 ` [PATCH 05/23] KVM: PPC: Book3S HV: Enable calling kvmppc_hpte_hv_fault in virtual mode Suraj Jitindar Singh
@ 2019-08-26  6:20 ` Suraj Jitindar Singh
  2019-08-26  6:20 ` [PATCH 07/23] KVM: PPC: Book3S HV: Make kvmppc_invalidate_hpte() take lpid not a kvm struct Suraj Jitindar Singh
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:20 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

The hcalls H_ENTER, H_REMOVE, H_READ, H_CLEAR_MOD, H_CLEAR_REF,
H_PROTECT and H_BULK_REMOVE are used by a guest to call into the
hypervisor in order to control manipulation of its hpt (hash page table)
by the hypervisor on the guest's behalf.

Currently the functions which handle these hcalls are only called in
real mode from the kvm exit path in book3s_hv_rmhandlers.S.

Modify the functions which handle these hcalls so that they can also be
called in virtual mode, and call them from the virtual mode hcall handling
function kvmppc_pseries_do_hcall() if we're a pseries machine. There is
no need to call these functions again on powernv as they were already
called from real mode.
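
The realmode flag threaded through these handlers mostly controls whether
pointers into the hpt reverse map need translating with real_vmalloc_addr(),
which is only needed (and only valid) when running in real mode. The
recurring pattern in this patch is:

	rev = &kvm->arch.hpt.rev[pte_index];
	if (realmode)
		rev = real_vmalloc_addr(rev);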

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/include/asm/kvm_book3s.h |  16 ++++-
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |  10 +---
 arch/powerpc/kvm/book3s_hv.c          |  60 +++++++++++++++++++
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   | 107 +++++++++++++++++++++++++---------
 4 files changed, 159 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 506e4df2d730..6d0bb22dc637 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -252,9 +252,23 @@ extern void kvmppc_unpin_guest_page(struct kvm *kvm, void *addr,
 extern long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 			long pte_index, unsigned long pteh, unsigned long ptel,
 			pgd_t *pgdir, bool realmode, unsigned long *idx_ret);
+extern long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
+				       long pte_index, unsigned long pteh,
+				       unsigned long ptel,
+				       unsigned long *pte_idx_ret);
 extern long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 			unsigned long pte_index, unsigned long avpn,
-			unsigned long *hpret);
+			bool realmode, unsigned long *hpret);
+extern long kvmppc_do_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
+			     unsigned long pte_index, bool realmode);
+extern long kvmppc_do_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
+				  unsigned long pte_index, bool realmode);
+extern long kvmppc_do_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
+				  unsigned long pte_index, bool realmode);
+extern long kvmppc_do_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
+				unsigned long pte_index, unsigned long avpn,
+				unsigned long va, bool realmode);
+extern long kvmppc_do_h_bulk_remove(struct kvm_vcpu *vcpu, bool realmode);
 extern long kvmppc_hv_get_dirty_log_hpt(struct kvm *kvm,
 			struct kvm_memory_slot *memslot, unsigned long *map);
 extern void kvmppc_harvest_vpa_dirty(struct kvmppc_vpa *vpa,
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index a485bb018193..ab97b6bcf226 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -42,10 +42,6 @@
 	do { } while (0)
 #endif
 
-static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
-				long pte_index, unsigned long pteh,
-				unsigned long ptel, unsigned long *pte_idx_ret);
-
 struct kvm_resize_hpt {
 	/* These fields read-only after init */
 	struct kvm *kvm;
@@ -287,7 +283,7 @@ static void kvmppc_mmu_book3s_64_hv_reset_msr(struct kvm_vcpu *vcpu)
 	kvmppc_set_msr(vcpu, msr);
 }
 
-static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
+long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
 				long pte_index, unsigned long pteh,
 				unsigned long ptel, unsigned long *pte_idx_ret)
 {
@@ -1907,7 +1903,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 			nb += HPTE_SIZE;
 
 			if (be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT))
-				kvmppc_do_h_remove(kvm, 0, i, 0, tmp);
+				kvmppc_do_h_remove(kvm, 0, i, 0, false, tmp);
 			err = -EIO;
 			ret = kvmppc_virtmode_do_h_enter(kvm, H_EXACT, i, v, r,
 							 tmp);
@@ -1937,7 +1933,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 
 		for (j = 0; j < hdr.n_invalid; ++j) {
 			if (be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT))
-				kvmppc_do_h_remove(kvm, 0, i, 0, tmp);
+				kvmppc_do_h_remove(kvm, 0, i, 0, false, tmp);
 			++i;
 			hptp += 2;
 		}
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 4901738a3c31..67e242214191 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -916,6 +916,66 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 		return RESUME_HOST;
 
 	switch (req) {
+	/*
+	 * The following hpt manipulation hcalls
+	 * (H_REMOVE/H_ENTER/H_READ/H_CLEAR_[MOD/REF]/H_PROTECT/H_BULK_REMOVE)
+	 * are normally handled in real mode in book3s_hv_rmhandlers.S for a
+	 * baremetal (powernv) hypervisor. For a pseries (nested) hypervisor we
+	 * didn't use that entry path, so we have to try to handle them here
+	 * before punting them to userspace.
+	 * NOTE: There's no point trying to call the handlers again for
+	 *       !pseries since the only way we got here is if we couldn't
+	 *       handle them.
+	 */
+	case H_REMOVE:
+		if (!kvmhv_on_pseries())
+			return RESUME_HOST;
+		ret = kvmppc_do_h_remove(vcpu->kvm, kvmppc_get_gpr(vcpu, 4),
+					 kvmppc_get_gpr(vcpu, 5),
+					 kvmppc_get_gpr(vcpu, 6), false,
+					 &vcpu->arch.regs.gpr[4]);
+		break;
+	case H_ENTER:
+		if (!kvmhv_on_pseries())
+			return RESUME_HOST;
+		ret = kvmppc_virtmode_do_h_enter(vcpu->kvm,
+						 kvmppc_get_gpr(vcpu, 4),
+						 kvmppc_get_gpr(vcpu, 5),
+						 kvmppc_get_gpr(vcpu, 6),
+						 kvmppc_get_gpr(vcpu, 7),
+						 &vcpu->arch.regs.gpr[4]);
+		break;
+	case H_READ:
+		if (!kvmhv_on_pseries())
+			return RESUME_HOST;
+		ret = kvmppc_do_h_read(vcpu, kvmppc_get_gpr(vcpu, 4),
+				       kvmppc_get_gpr(vcpu, 5), false);
+		break;
+	case H_CLEAR_MOD:
+		if (!kvmhv_on_pseries())
+			return RESUME_HOST;
+		ret = kvmppc_do_h_clear_mod(vcpu, kvmppc_get_gpr(vcpu, 4),
+					    kvmppc_get_gpr(vcpu, 5), false);
+		break;
+	case H_CLEAR_REF:
+		if (!kvmhv_on_pseries())
+			return RESUME_HOST;
+		ret = kvmppc_do_h_clear_ref(vcpu, kvmppc_get_gpr(vcpu, 4),
+					    kvmppc_get_gpr(vcpu, 5), false);
+		break;
+	case H_PROTECT:
+		if (!kvmhv_on_pseries())
+			return RESUME_HOST;
+		ret = kvmppc_do_h_protect(vcpu, kvmppc_get_gpr(vcpu, 4),
+					  kvmppc_get_gpr(vcpu, 5),
+					  kvmppc_get_gpr(vcpu, 6),
+					  0UL, false);
+		break;
+	case H_BULK_REMOVE:
+		if (!kvmhv_on_pseries())
+			return RESUME_HOST;
+		ret = kvmppc_do_h_bulk_remove(vcpu, false);
+		break;
 	case H_CEDE:
 		break;
 	case H_PROD:
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 9f7ad4eaa528..bd31d36332a8 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -137,7 +137,7 @@ static void kvmppc_set_dirty_from_hpte(struct kvm *kvm,
 static unsigned long *revmap_for_hpte(struct kvm *kvm, unsigned long hpte_v,
 				      unsigned long hpte_gr,
 				      struct kvm_memory_slot **memslotp,
-				      unsigned long *gfnp)
+				      unsigned long *gfnp, bool realmode)
 {
 	struct kvm_memory_slot *memslot;
 	unsigned long *rmap;
@@ -152,14 +152,17 @@ static unsigned long *revmap_for_hpte(struct kvm *kvm, unsigned long hpte_v,
 	if (!memslot)
 		return NULL;
 
-	rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]);
+	rmap = &memslot->arch.rmap[gfn - memslot->base_gfn];
+	if (realmode)
+		rmap = real_vmalloc_addr(rmap);
 	return rmap;
 }
 
 /* Remove this HPTE from the chain for a real page */
 static void remove_revmap_chain(struct kvm *kvm, long pte_index,
 				struct revmap_entry *rev,
-				unsigned long hpte_v, unsigned long hpte_r)
+				unsigned long hpte_v, unsigned long hpte_r,
+				bool realmode)
 {
 	struct revmap_entry *next, *prev;
 	unsigned long ptel, head;
@@ -170,14 +173,18 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index,
 
 	rcbits = hpte_r & (HPTE_R_R | HPTE_R_C);
 	ptel = rev->guest_rpte |= rcbits;
-	rmap = revmap_for_hpte(kvm, hpte_v, ptel, &memslot, &gfn);
+	rmap = revmap_for_hpte(kvm, hpte_v, ptel, &memslot, &gfn, realmode);
 	if (!rmap)
 		return;
 	lock_rmap(rmap);
 
 	head = *rmap & KVMPPC_RMAP_INDEX;
-	next = real_vmalloc_addr(&kvm->arch.hpt.rev[rev->forw]);
-	prev = real_vmalloc_addr(&kvm->arch.hpt.rev[rev->back]);
+	next = &kvm->arch.hpt.rev[rev->forw];
+	if (realmode)
+		next = real_vmalloc_addr(next);
+	prev = &kvm->arch.hpt.rev[rev->back];
+	if (realmode)
+		prev = real_vmalloc_addr(prev);
 	next->back = rev->back;
 	prev->forw = rev->forw;
 	if (head == pte_index) {
@@ -475,7 +482,7 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
 
 long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 			unsigned long pte_index, unsigned long avpn,
-			unsigned long *hpret)
+			bool realmode, unsigned long *hpret)
 {
 	__be64 *hpte;
 	unsigned long v, r, rb;
@@ -502,7 +509,9 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 		return H_NOT_FOUND;
 	}
 
-	rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
+	rev = &kvm->arch.hpt.rev[pte_index];
+	if (realmode)
+		rev = real_vmalloc_addr(rev);
 	v = pte & ~HPTE_V_HVLOCK;
 	if (v & HPTE_V_VALID) {
 		hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
@@ -518,7 +527,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 		 * obtain reliable values of R and C.
 		 */
 		remove_revmap_chain(kvm, pte_index, rev, v,
-				    be64_to_cpu(hpte[1]));
+				    be64_to_cpu(hpte[1]), realmode);
 	}
 	r = rev->guest_rpte & ~HPTE_GR_RESERVED;
 	note_hpte_modification(kvm, rev);
@@ -538,11 +547,11 @@ EXPORT_SYMBOL_GPL(kvmppc_do_h_remove);
 long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
 		     unsigned long pte_index, unsigned long avpn)
 {
-	return kvmppc_do_h_remove(vcpu->kvm, flags, pte_index, avpn,
+	return kvmppc_do_h_remove(vcpu->kvm, flags, pte_index, avpn, true,
 				  &vcpu->arch.regs.gpr[4]);
 }
 
-long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
+long kvmppc_do_h_bulk_remove(struct kvm_vcpu *vcpu, bool realmode)
 {
 	struct kvm *kvm = vcpu->kvm;
 	unsigned long *args = &vcpu->arch.regs.gpr[4];
@@ -615,7 +624,9 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 			}
 
 			args[j] = ((0x80 | flags) << 56) + pte_index;
-			rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
+			rev = &kvm->arch.hpt.rev[pte_index];
+			if (realmode)
+				rev = real_vmalloc_addr(rev);
 			note_hpte_modification(kvm, rev);
 
 			if (!(hp0 & HPTE_V_VALID)) {
@@ -650,7 +661,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 			hp = hptes[k];
 			rev = revs[k];
 			remove_revmap_chain(kvm, pte_index, rev,
-				be64_to_cpu(hp[0]), be64_to_cpu(hp[1]));
+				be64_to_cpu(hp[0]), be64_to_cpu(hp[1]), true);
 			rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C);
 			args[j] |= rcbits << (56 - 5);
 			__unlock_hpte(hp, 0);
@@ -659,10 +670,16 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(kvmppc_do_h_bulk_remove);
 
-long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
-		      unsigned long pte_index, unsigned long avpn,
-		      unsigned long va)
+long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
+{
+	return kvmppc_do_h_bulk_remove(vcpu, true);
+}
+
+long kvmppc_do_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
+			 unsigned long pte_index, unsigned long avpn,
+			 unsigned long va, bool realmode)
 {
 	struct kvm *kvm = vcpu->kvm;
 	__be64 *hpte;
@@ -695,7 +712,9 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	/* Update guest view of 2nd HPTE dword */
 	mask = HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N |
 		HPTE_R_KEY_HI | HPTE_R_KEY_LO;
-	rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
+	rev = &kvm->arch.hpt.rev[pte_index];
+	if (realmode)
+		rev = real_vmalloc_addr(rev);
 	if (rev) {
 		r = (rev->guest_rpte & ~mask) | bits;
 		rev->guest_rpte = r;
@@ -730,9 +749,17 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 
 	return H_SUCCESS;
 }
+EXPORT_SYMBOL_GPL(kvmppc_do_h_protect);
 
-long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
-		   unsigned long pte_index)
+long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
+		      unsigned long pte_index, unsigned long avpn,
+		      unsigned long va)
+{
+	return kvmppc_do_h_protect(vcpu, flags, pte_index, avpn, va, true);
+}
+
+long kvmppc_do_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
+		      unsigned long pte_index, bool realmode)
 {
 	struct kvm *kvm = vcpu->kvm;
 	__be64 *hpte;
@@ -748,7 +775,9 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 		pte_index &= ~3;
 		n = 4;
 	}
-	rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
+	rev = &kvm->arch.hpt.rev[pte_index];
+	if (realmode)
+		rev = real_vmalloc_addr(rev);
 	for (i = 0; i < n; ++i, ++pte_index) {
 		hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
 		v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
@@ -770,9 +799,16 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 	}
 	return H_SUCCESS;
 }
+EXPORT_SYMBOL_GPL(kvmppc_do_h_read);
 
-long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
-			unsigned long pte_index)
+long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
+		   unsigned long pte_index)
+{
+	return kvmppc_do_h_read(vcpu, flags, pte_index, true);
+}
+
+long kvmppc_do_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
+			   unsigned long pte_index, bool realmode)
 {
 	struct kvm *kvm = vcpu->kvm;
 	__be64 *hpte;
@@ -786,7 +822,9 @@ long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
 	if (pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt))
 		return H_PARAMETER;
 
-	rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
+	rev = &kvm->arch.hpt.rev[pte_index];
+	if (realmode)
+		rev = real_vmalloc_addr(rev);
 	hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
 	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
 		cpu_relax();
@@ -804,7 +842,8 @@ long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
 		gr |= r & (HPTE_R_R | HPTE_R_C);
 		if (r & HPTE_R_R) {
 			kvmppc_clear_ref_hpte(kvm, hpte, pte_index);
-			rmap = revmap_for_hpte(kvm, v, gr, NULL, NULL);
+			rmap = revmap_for_hpte(kvm, v, gr, NULL, NULL,
+					       realmode);
 			if (rmap) {
 				lock_rmap(rmap);
 				*rmap |= KVMPPC_RMAP_REFERENCED;
@@ -818,10 +857,17 @@ long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
 	unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(kvmppc_do_h_clear_ref);
 
-long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
+long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
 			unsigned long pte_index)
 {
+	return kvmppc_do_h_clear_ref(vcpu, flags, pte_index, true);
+}
+
+long kvmppc_do_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
+			   unsigned long pte_index, bool realmode)
+{
 	struct kvm *kvm = vcpu->kvm;
 	__be64 *hpte;
 	unsigned long v, r, gr;
@@ -833,7 +879,9 @@ long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
 	if (pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt))
 		return H_PARAMETER;
 
-	rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
+	rev = &kvm->arch.hpt.rev[pte_index];
+	if (realmode)
+		rev = real_vmalloc_addr(rev);
 	hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
 	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
 		cpu_relax();
@@ -865,6 +913,13 @@ long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
 	unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(kvmppc_do_h_clear_mod);
+
+long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
+			unsigned long pte_index)
+{
+	return kvmppc_do_h_clear_mod(vcpu, flags, pte_index, true);
+}
 
 static int kvmppc_get_hpa(struct kvm_vcpu *vcpu, unsigned long gpa,
 			  int writing, unsigned long *hpa,
-- 
2.13.6



* [PATCH 07/23] KVM: PPC: Book3S HV: Make kvmppc_invalidate_hpte() take lpid not a kvm struct
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: Book3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (5 preceding siblings ...)
  2019-08-26  6:20 ` [PATCH 06/23] KVM: PPC: Book3S HV: Allow hpt manipulation hcalls to be called " Suraj Jitindar Singh
@ 2019-08-26  6:20 ` Suraj Jitindar Singh
  2019-08-26  6:20 ` [PATCH 08/23] KVM: PPC: Book3S HV: Nested: Allow pseries hypervisor to run hpt nested guest Suraj Jitindar Singh
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:20 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

The function kvmppc_invalidate_hpte() is used to invalidate and perform
the tlbies for a given pte in a hpt (hash page table). Currently the
function takes a kvm struct as an argument; however, the only member
of this struct that it accesses is the lpid field. Modify this function
to take an lpid argument in place of the kvm struct.

No functional change.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/include/asm/kvm_book3s.h |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |  6 +++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   | 22 ++++++++++++----------
 3 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 6d0bb22dc637..c69eeb4e176c 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -241,7 +241,7 @@ extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
 			unsigned long *rmap, long pte_index, int realmode);
 extern void kvmppc_update_dirty_map(const struct kvm_memory_slot *memslot,
 			unsigned long gfn, unsigned long psize);
-extern void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
+extern void kvmppc_invalidate_hpte(unsigned int lpid, __be64 *hptep,
 			unsigned long pte_index);
 void kvmppc_clear_ref_hpte(struct kvm *kvm, __be64 *hptep,
 			unsigned long pte_index);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index ab97b6bcf226..bbb23b3f8bb9 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -701,7 +701,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		/* HPTE was previously valid, so we need to invalidate it */
 		unlock_rmap(rmap);
 		hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
-		kvmppc_invalidate_hpte(kvm, hptep, index);
+		kvmppc_invalidate_hpte(kvm->arch.lpid, hptep, index);
 		/* don't lose previous R and C bits */
 		r |= be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
 	} else {
@@ -829,7 +829,7 @@ static void kvmppc_unmap_hpte(struct kvm *kvm, unsigned long i,
 	if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) &&
 	    hpte_rpn(ptel, psize) == gfn) {
 		hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
-		kvmppc_invalidate_hpte(kvm, hptep, i);
+		kvmppc_invalidate_hpte(kvm->arch.lpid, hptep, i);
 		hptep[1] &= ~cpu_to_be64(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
 		/* Harvest R and C */
 		rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
@@ -1094,7 +1094,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 
 		/* need to make it temporarily absent so C is stable */
 		hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
-		kvmppc_invalidate_hpte(kvm, hptep, i);
+		kvmppc_invalidate_hpte(kvm->arch.lpid, hptep, i);
 		v = be64_to_cpu(hptep[0]);
 		r = be64_to_cpu(hptep[1]);
 		if (r & HPTE_R_C) {
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index bd31d36332a8..1d26d509aaf6 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -440,7 +440,7 @@ static inline int is_mmio_hpte(unsigned long v, unsigned long r)
 		(HPTE_R_KEY_HI | HPTE_R_KEY_LO));
 }
 
-static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
+static void do_tlbies(unsigned int lpid, unsigned long *rbvalues,
 		      long npages, int global, bool need_sync)
 {
 	long i;
@@ -455,7 +455,7 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
 			asm volatile("ptesync" : : : "memory");
 		for (i = 0; i < npages; ++i) {
 			asm volatile(PPC_TLBIE_5(%0,%1,0,0,0) : :
-				     "r" (rbvalues[i]), "r" (kvm->arch.lpid));
+				     "r" (rbvalues[i]), "r" (lpid));
 		}
 
 		if (cpu_has_feature(CPU_FTR_P9_TLBIE_BUG)) {
@@ -465,7 +465,7 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
 			 */
 			asm volatile("ptesync": : :"memory");
 			asm volatile(PPC_TLBIE_5(%0,%1,0,0,0) : :
-				     "r" (rbvalues[0]), "r" (kvm->arch.lpid));
+				     "r" (rbvalues[0]), "r" (lpid));
 		}
 
 		asm volatile("eieio; tlbsync; ptesync" : : : "memory");
@@ -516,7 +516,8 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	if (v & HPTE_V_VALID) {
 		hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
 		rb = compute_tlbie_rb(v, pte_r, pte_index);
-		do_tlbies(kvm, &rb, 1, global_invalidates(kvm), true);
+		do_tlbies(kvm->arch.lpid, &rb, 1, global_invalidates(kvm),
+			  true);
 		/*
 		 * The reference (R) and change (C) bits in a HPT
 		 * entry can be set by hardware at any time up until
@@ -652,7 +653,7 @@ long kvmppc_do_h_bulk_remove(struct kvm_vcpu *vcpu, bool realmode)
 			break;
 
 		/* Now that we've collected a batch, do the tlbies */
-		do_tlbies(kvm, tlbrb, n, global, true);
+		do_tlbies(kvm->arch.lpid, tlbrb, n, global, true);
 
 		/* Read PTE low words after tlbie to get final R/C values */
 		for (k = 0; k < n; ++k) {
@@ -736,7 +737,8 @@ long kvmppc_do_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 			rb = compute_tlbie_rb(v, r, pte_index);
 			hpte[0] = cpu_to_be64((pte_v & ~HPTE_V_VALID) |
 					      HPTE_V_ABSENT);
-			do_tlbies(kvm, &rb, 1, global_invalidates(kvm), true);
+			do_tlbies(kvm->arch.lpid, &rb, 1,
+				  global_invalidates(kvm), true);
 			/* Don't lose R/C bit updates done by hardware */
 			r |= be64_to_cpu(hpte[1]) & (HPTE_R_R | HPTE_R_C);
 			hpte[1] = cpu_to_be64(r);
@@ -898,7 +900,7 @@ long kvmppc_do_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
 	if (v & HPTE_V_VALID) {
 		/* need to make it temporarily absent so C is stable */
 		hpte[0] |= cpu_to_be64(HPTE_V_ABSENT);
-		kvmppc_invalidate_hpte(kvm, hpte, pte_index);
+		kvmppc_invalidate_hpte(kvm->arch.lpid, hpte, pte_index);
 		r = be64_to_cpu(hpte[1]);
 		gr |= r & (HPTE_R_R | HPTE_R_C);
 		if (r & HPTE_R_C) {
@@ -1064,7 +1066,7 @@ long kvmppc_rm_h_page_init(struct kvm_vcpu *vcpu, unsigned long flags,
 	return ret;
 }
 
-void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
+void kvmppc_invalidate_hpte(unsigned int lpid, __be64 *hptep,
 			unsigned long pte_index)
 {
 	unsigned long rb;
@@ -1078,7 +1080,7 @@ void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
 		hp1 = hpte_new_to_old_r(hp1);
 	}
 	rb = compute_tlbie_rb(hp0, hp1, pte_index);
-	do_tlbies(kvm, &rb, 1, 1, true);
+	do_tlbies(lpid, &rb, 1, 1, true);
 }
 EXPORT_SYMBOL_GPL(kvmppc_invalidate_hpte);
 
@@ -1099,7 +1101,7 @@ void kvmppc_clear_ref_hpte(struct kvm *kvm, __be64 *hptep,
 	rbyte = (be64_to_cpu(hptep[1]) & ~HPTE_R_R) >> 8;
 	/* modify only the second-last byte, which contains the ref bit */
 	*((char *)hptep + 14) = rbyte;
-	do_tlbies(kvm, &rb, 1, 1, false);
+	do_tlbies(kvm->arch.lpid, &rb, 1, 1, false);
 }
 EXPORT_SYMBOL_GPL(kvmppc_clear_ref_hpte);
 
-- 
2.13.6



* [PATCH 08/23] KVM: PPC: Book3S HV: Nested: Allow pseries hypervisor to run hpt nested guest
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: Book3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (6 preceding siblings ...)
  2019-08-26  6:20 ` [PATCH 07/23] KVM: PPC: Book3S HV: Make kvmppc_invalidate_hpte() take lpid not a kvm struct Suraj Jitindar Singh
@ 2019-08-26  6:20 ` Suraj Jitindar Singh
  2019-08-26  6:20 ` [PATCH 09/23] KVM: PPC: Book3S HV: Nested: Improve comments and naming of nest rmap functions Suraj Jitindar Singh
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:20 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

Allow a pseries guest hypervisor (must be a radix guest) to run a hpt
(hash page table) nested guest.

Modify the entry path such that a pseries hypervisor will always use the
kvmhv_run_single_vcpu() function to enter a guest, which will result in
it calling H_ENTER_NESTED. Also extend the H_ENTER_NESTED API with a
version 2 which adds a slb pointer to the argument list, providing the
slb state to be used to run the nested guest.

Also modify the exit path such that a pseries hypervisor will call
kvmppc_hpte_hv_fault() when handling a page fault; this would normally
be called from real mode in book3s_hv_rmhandlers.S on a powernv
hypervisor. This is required by the subsequent functions which are
invoked to handle the page fault. Also save the slb state on guest exit
and don't zero slb_max.

Modify kvm_vm_ioctl_get_smmu_info_hv() such that only 4k and 64k page
size support is reported to the guest. This is to ensure that we can
maintain a 1-to-1 mapping between the guest hpt and the shadow hpt for
simplicity (this could be relaxed later with appropriate support), since
a 16M page would likely have to be broken into multiple smaller pages,
given that the radix guest is likely to be backed by at most 2M pages.

Modify do_tlbies() such that a pseries hypervisor will make the
H_TLB_INVALIDATE hcall to notify its hypervisor of the invalidation of
partition scoped translation information, which is required to keep the
shadow hpt in sync.

Finally allow a pseries hypervisor to run a nested hpt guest by reporting
the KVM_CAP_PPC_MMU_HASH_V3 capability and allowing handling of
kvmhv_configure_mmu().
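
A rough sketch of the do_tlbies() change described above is given below. The
H_TLB_INVALIDATE argument encoding shown is a placeholder, not the exact
values used by the patch (see the book3s_hv_rm_mmu.c hunk for those), and
the global/need_sync handling is elided:

static void do_tlbies_sketch(unsigned int lpid, unsigned long *rbvalues,
			     long npages, int global, bool need_sync)
{
	long i;

	if (kvmhv_on_pseries()) {
		/* We are an L1 (guest) hypervisor: ask L0 to invalidate
		 * the shadow translations it maintains on our behalf. */
		for (i = 0; i < npages; ++i)
			plpar_hcall_norets(H_TLB_INVALIDATE,
					   0 /* flags: placeholder */,
					   lpid, rbvalues[i]);
		return;
	}

	/* ... otherwise issue tlbie instructions directly, as before ... */
}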

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/include/asm/hvcall.h     | 36 --------------
 arch/powerpc/include/asm/kvm_book3s.h |  2 +
 arch/powerpc/include/asm/kvm_host.h   | 55 +++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv.c          | 90 ++++++++++++++++++++++++++++-------
 arch/powerpc/kvm/book3s_hv_nested.c   | 22 ++++++++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   | 20 ++++++++
 arch/powerpc/kvm/powerpc.c            |  3 +-
 7 files changed, 173 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 11112023e327..19afab30c7d0 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -481,42 +481,6 @@ struct h_cpu_char_result {
 	u64 behaviour;
 };
 
-/* Register state for entering a nested guest with H_ENTER_NESTED */
-struct hv_guest_state {
-	u64 version;		/* version of this structure layout */
-	u32 lpid;
-	u32 vcpu_token;
-	/* These registers are hypervisor privileged (at least for writing) */
-	u64 lpcr;
-	u64 pcr;
-	u64 amor;
-	u64 dpdes;
-	u64 hfscr;
-	s64 tb_offset;
-	u64 dawr0;
-	u64 dawrx0;
-	u64 ciabr;
-	u64 hdec_expiry;
-	u64 purr;
-	u64 spurr;
-	u64 ic;
-	u64 vtb;
-	u64 hdar;
-	u64 hdsisr;
-	u64 heir;
-	u64 asdr;
-	/* These are OS privileged but need to be set late in guest entry */
-	u64 srr0;
-	u64 srr1;
-	u64 sprg[4];
-	u64 pidr;
-	u64 cfar;
-	u64 ppr;
-};
-
-/* Latest version of hv_guest_state structure */
-#define HV_GUEST_STATE_VERSION	1
-
 #endif /* __ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_HVCALL_H */
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index c69eeb4e176c..40218e81b75f 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -317,6 +317,8 @@ long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu);
 int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu,
 			  u64 time_limit, unsigned long lpcr);
 void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
+void kvmhv_save_guest_slb(struct kvm_vcpu *vcpu, struct guest_slb *slbp);
+void kvmhv_restore_guest_slb(struct kvm_vcpu *vcpu, struct guest_slb *slbp);
 void kvmhv_restore_hv_return_state(struct kvm_vcpu *vcpu,
 				   struct hv_guest_state *hr);
 long int kvmhv_nested_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu);
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index e6e5f59aaa97..bad09c213be6 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -813,6 +813,61 @@ struct kvm_vcpu_arch {
 #endif /* CONFIG_KVM_BOOK3S_HV_EXIT_TIMING */
 };
 
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+
+/* Following definitions used for the H_ENTER_NESTED hcall parameters */
+
+/* Following structure(s) added in Version 1 */
+
+/* Register state for entering a nested guest with H_ENTER_NESTED */
+struct hv_guest_state {
+	/* version 1 */
+	u64 version;		/* version of this structure layout */
+	u32 lpid;
+	u32 vcpu_token;
+	/* These registers are hypervisor privileged (at least for writing) */
+	u64 lpcr;
+	u64 pcr;
+	u64 amor;
+	u64 dpdes;
+	u64 hfscr;
+	s64 tb_offset;
+	u64 dawr0;
+	u64 dawrx0;
+	u64 ciabr;
+	u64 hdec_expiry;
+	u64 purr;
+	u64 spurr;
+	u64 ic;
+	u64 vtb;
+	u64 hdar;
+	u64 hdsisr;
+	u64 heir;
+	u64 asdr;
+	/* These are OS privileged but need to be set late in guest entry */
+	u64 srr0;
+	u64 srr1;
+	u64 sprg[4];
+	u64 pidr;
+	u64 cfar;
+	u64 ppr;
+};
+
+/* Following structure(s) added in Version 2 */
+
+/* SLB state for entering a nested guest with H_ENTER_NESTED */
+struct guest_slb {
+	struct kvmppc_slb slb[64];
+	int slb_max;		/* 1 + index of last valid entry in slb[] */
+	int slb_nr;		/* total number of entries in SLB */
+};
+
+/* Min and max supported versions of the above structure(s) */
+#define HV_GUEST_STATE_MIN_VERSION	1
+#define HV_GUEST_STATE_MAX_VERSION	2
+
+#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
+
 #define VCPU_FPR(vcpu, i)	(vcpu)->arch.fp.fpr[i][TS_FPROFFSET]
 #define VCPU_VSX_FPR(vcpu, i, j)	((vcpu)->arch.fp.fpr[i][j])
 #define VCPU_VSX_VR(vcpu, i)		((vcpu)->arch.vr.vr[i])
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 67e242214191..be72bc6b4cd5 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3434,16 +3434,28 @@ static int kvmhv_pseries_enter_guest(struct kvm_vcpu *vcpu, u64 time_limit,
 {
 	/* call our hypervisor to load up HV regs and go */
 	struct hv_guest_state hvregs;
+	struct guest_slb *slbp = NULL;
 	/* we need to save/restore host & guest psscr since L0 doesn't for us */
 	unsigned long host_psscr;
 	int trap;
 
+	if (!kvmhv_vcpu_is_radix(vcpu)) {
+		slbp = kzalloc(sizeof(*slbp), GFP_KERNEL);
+		if (!slbp) {
+			pr_err_ratelimited("KVM: Couldn't alloc hv_guest_slb\n");
+			return 0;
+		}
+		kvmhv_save_guest_slb(vcpu, slbp);
+		hvregs.version = 2;	/* V2 required for hpt guest support */
+	} else {
+		hvregs.version = 1;	/* V1 sufficient for radix guest */
+	}
+
 	host_psscr = mfspr(SPRN_PSSCR_PR);
 	mtspr(SPRN_PSSCR_PR, vcpu->arch.psscr);
 	kvmhv_save_hv_regs(vcpu, &hvregs);
 	hvregs.lpcr = lpcr;
 	vcpu->arch.regs.msr = vcpu->arch.shregs.msr;
-	hvregs.version = HV_GUEST_STATE_VERSION;
 	if (vcpu->arch.nested) {
 		hvregs.lpid = vcpu->arch.nested->shadow_lpid;
 		hvregs.vcpu_token = vcpu->arch.nested_vcpu_id;
@@ -3453,8 +3465,12 @@ static int kvmhv_pseries_enter_guest(struct kvm_vcpu *vcpu, u64 time_limit,
 	}
 	hvregs.hdec_expiry = time_limit;
 	trap = plpar_hcall_norets(H_ENTER_NESTED, __pa(&hvregs),
-				  __pa(&vcpu->arch.regs));
+				  __pa(&vcpu->arch.regs), __pa(slbp));
 	kvmhv_restore_hv_return_state(vcpu, &hvregs);
+	if (!kvmhv_vcpu_is_radix(vcpu)) {
+		kvmhv_restore_guest_slb(vcpu, slbp);
+		kfree(slbp);
+	}
 	vcpu->arch.shregs.msr = vcpu->arch.regs.msr;
 	vcpu->arch.shregs.dar = mfspr(SPRN_DAR);
 	vcpu->arch.shregs.dsisr = mfspr(SPRN_DSISR);
@@ -3466,6 +3482,49 @@ static int kvmhv_pseries_enter_guest(struct kvm_vcpu *vcpu, u64 time_limit,
 	    kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
 		kvmppc_nested_cede(vcpu);
 		trap = 0;
+	} else if ((!kvmhv_vcpu_is_radix(vcpu)) &&
+				(trap == BOOK3S_INTERRUPT_H_DATA_STORAGE ||
+				 trap == BOOK3S_INTERRUPT_H_INST_STORAGE)) {
+		bool data = (trap == BOOK3S_INTERRUPT_H_DATA_STORAGE);
+		unsigned long addr, slb_v;
+		unsigned int dsisr;
+		long ret;
+
+		/* NOTE: fault_gpa was reused to store faulting slb entry. */
+		slb_v = vcpu->arch.fault_gpa;
+		if (data) {
+			addr = vcpu->arch.fault_dar;
+			dsisr = vcpu->arch.fault_dsisr;
+		} else {
+			addr = kvmppc_get_pc(vcpu);
+			dsisr = vcpu->arch.shregs.msr & DSISR_SRR1_MATCH_64S;
+			if (vcpu->arch.shregs.msr & HSRR1_HISI_WRITE)
+				dsisr |= DSISR_ISSTORE;
+		}
+
+		/*
+		 * kvmppc_hpte_hv_fault is normally called on the exit path in
+		 * book3s_hv_rmhandlers.S, however here (for a pseries
+		 * hypervisor) we used the H_ENTER_NESTED hcall and so missed
+		 * calling it. Thus call it here, now.
+		 */
+		ret = kvmppc_hpte_hv_fault(vcpu, addr, slb_v, dsisr, data, 0);
+		if (!ret) { /* let the guest try again */
+			trap = 0;
+		} else if ((!vcpu->arch.nested) && (ret > 0)) {
+			/*
+			 * Synthesize a DSI or ISI for the guest
+			 * NOTE: don't need to worry about this being a segment
+			 * fault since if that was the case the L0 hypervisor
+			 * would have delivered this to the nested guest
+			 * directly already.
+			 */
+			if (data)
+				kvmppc_core_queue_data_storage(vcpu, addr, ret);
+			else
+				kvmppc_core_queue_inst_storage(vcpu, ret);
+			trap = 0;
+		}
 	}
 
 	return trap;
@@ -3682,7 +3741,8 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
 		trap = kvmhv_load_hv_regs_and_go(vcpu, time_limit, lpcr);
 	}
 
-	vcpu->arch.slb_max = 0;
+	if (kvm_is_radix(vcpu->kvm))
+		vcpu->arch.slb_max = 0;
 	dec = mfspr(SPRN_DEC);
 	if (!(lpcr & LPCR_LD)) /* Sign extend if not using large decrementer */
 		dec = (s32) dec;
@@ -4346,9 +4406,12 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu)
 		 * for radix guests using the guest PIDR value and LPID 0.
 		 * The workaround is in the old path (kvmppc_run_vcpu())
 		 * but not the new path (kvmhv_run_single_vcpu()).
+		 * N.B. We need to use the kvmhv_run_single_vcpu() path on
+		 *      pseries to ensure we call H_ENTER_NESTED.
 		 */
-		if (kvm->arch.threads_indep && kvm_is_radix(kvm) &&
-		    !no_mixing_hpt_and_radix)
+		if (kvmhv_on_pseries() || (kvm->arch.threads_indep &&
+					   kvm_is_radix(kvm) &&
+					   !no_mixing_hpt_and_radix))
 			r = kvmhv_run_single_vcpu(run, vcpu, ~(u64)0,
 						  vcpu->arch.vcore->lpcr);
 		else
@@ -4396,9 +4459,10 @@ static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
 	(*sps)->enc[0].page_shift = shift;
 	(*sps)->enc[0].pte_enc = kvmppc_pgsize_lp_encoding(shift, shift);
 	/*
-	 * Add 16MB MPSS support (may get filtered out by userspace)
+	 * Add 16MB MPSS support (may get filtered out by userspace) if we're
+	 * not running as a nested hypervisor (pseries)
 	 */
-	if (shift != 24) {
+	if (shift != 24 && !kvmhv_on_pseries()) {
 		int penc = kvmppc_pgsize_lp_encoding(shift, 24);
 		if (penc != -1) {
 			(*sps)->enc[1].page_shift = 24;
@@ -4429,11 +4493,9 @@ static int kvm_vm_ioctl_get_smmu_info_hv(struct kvm *kvm,
 	sps = &info->sps[0];
 	kvmppc_add_seg_page_size(&sps, 12, 0);
 	kvmppc_add_seg_page_size(&sps, 16, SLB_VSID_L | SLB_VSID_LP_01);
-	kvmppc_add_seg_page_size(&sps, 24, SLB_VSID_L);
-
-	/* If running as a nested hypervisor, we don't support HPT guests */
-	if (kvmhv_on_pseries())
-		info->flags |= KVM_PPC_NO_HASH;
+	if (!kvmhv_on_pseries()) {
+		kvmppc_add_seg_page_size(&sps, 24, SLB_VSID_L);
+	} /* else no 16M page size support */
 
 	return 0;
 }
@@ -5362,10 +5424,6 @@ static int kvmhv_configure_mmu(struct kvm *kvm, struct kvm_ppc_mmuv3_cfg *cfg)
 	if (radix && !radix_enabled())
 		return -EINVAL;
 
-	/* If we're a nested hypervisor, we currently only support radix */
-	if (kvmhv_on_pseries() && !radix)
-		return -EINVAL;
-
 	mutex_lock(&kvm->arch.mmu_setup_lock);
 	if (radix != kvm_is_radix(kvm)) {
 		if (kvm->arch.mmu_ready) {
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 735e0ac6f5b2..68d492e8861e 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -51,6 +51,16 @@ void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
 	hr->ppr = vcpu->arch.ppr;
 }
 
+void kvmhv_save_guest_slb(struct kvm_vcpu *vcpu, struct guest_slb *slbp)
+{
+	int i;
+
+	for (i = 0; i < 64; i++)
+		slbp->slb[i] = vcpu->arch.slb[i];
+	slbp->slb_max = vcpu->arch.slb_max;
+	slbp->slb_nr = vcpu->arch.slb_nr;
+}
+
 static void byteswap_pt_regs(struct pt_regs *regs)
 {
 	unsigned long *addr = (unsigned long *) regs;
@@ -169,6 +179,16 @@ static void restore_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
 	vcpu->arch.ppr = hr->ppr;
 }
 
+void kvmhv_restore_guest_slb(struct kvm_vcpu *vcpu, struct guest_slb *slbp)
+{
+	int i;
+
+	for (i = 0; i < 64; i++)
+		vcpu->arch.slb[i] = slbp->slb[i];
+	vcpu->arch.slb_max = slbp->slb_max;
+	vcpu->arch.slb_nr = slbp->slb_nr;
+}
+
 void kvmhv_restore_hv_return_state(struct kvm_vcpu *vcpu,
 				   struct hv_guest_state *hr)
 {
@@ -239,7 +259,7 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 		return H_PARAMETER;
 	if (kvmppc_need_byteswap(vcpu))
 		byteswap_hv_regs(&l2_hv);
-	if (l2_hv.version != HV_GUEST_STATE_VERSION)
+	if (l2_hv.version != 1)
 		return H_P2;
 
 	regs_ptr = kvmppc_get_gpr(vcpu, 5);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 1d26d509aaf6..53fe51d04d78 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -446,6 +446,25 @@ static void do_tlbies(unsigned int lpid, unsigned long *rbvalues,
 	long i;
 
 	/*
+	 * Handle the case where we're running as a nested hypervisor and so
+	 * have to make an hcall to handle invalidations for us.
+	 */
+	if (kvmhv_on_pseries()) {
+		unsigned long rc, ric = 0, prs = 0, r = 0;
+
+		for (i = 0; i < npages; i++) {
+			rc = plpar_hcall_norets(H_TLB_INVALIDATE,
+						H_TLBIE_P1_ENC(ric, prs, r),
+						lpid, rbvalues[i]);
+			if (rc)
+				pr_err("KVM: HPT TLB page invalidation hcall failed"
+					", rc=%ld\n", rc);
+		}
+
+		return;
+	}
+
+	/*
 	 * We use the POWER9 5-operand versions of tlbie and tlbiel here.
 	 * Since we are using RIC=0 PRS=0 R=0, and P7/P8 tlbiel ignores
 	 * the RS field, this is backwards-compatible with P7 and P8.
@@ -1355,3 +1374,4 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 
 	return -1;		/* send fault up to host kernel mode */
 }
+EXPORT_SYMBOL_GPL(kvmppc_hpte_hv_fault);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 3e566c2e6066..b74f794873cd 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -604,8 +604,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = !!(hv_enabled && radix_enabled());
 		break;
 	case KVM_CAP_PPC_MMU_HASH_V3:
-		r = !!(hv_enabled && cpu_has_feature(CPU_FTR_ARCH_300) &&
-		       cpu_has_feature(CPU_FTR_HVMODE));
+		r = !!(hv_enabled && cpu_has_feature(CPU_FTR_ARCH_300));
 		break;
 	case KVM_CAP_PPC_NESTED_HV:
 		r = !!(hv_enabled && kvmppc_hv_ops->enable_nested &&
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 09/23] KVM: PPC: Book3S HV: Nested: Improve comments and naming of nest rmap functions
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: BOok3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (7 preceding siblings ...)
  2019-08-26  6:20 ` [PATCH 08/23] KVM: PPC: Book3S HV: Nested: Allow pseries hypervisor to run hpt nested guest Suraj Jitindar Singh
@ 2019-08-26  6:20 ` Suraj Jitindar Singh
  2019-08-26  6:20 ` [PATCH 10/23] KVM: PPC: Book3S HV: Nested: Increase gpa field in nest rmap to 46 bits Suraj Jitindar Singh
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:20 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

The nested rmap entries are used to track nested pages which map a given
guest page, so that this information can be retrieved from the guest
memslot.

Improve the naming of some of these functions so that it's clearer
what they do: the functions with remove in the name remove the rmap
_and_ perform an invalidation, so rename them to invalidate to reflect
this.

kvmhv_insert_nest_rmap() takes a kvm struct as an argument which is unused,
so remove this argument.

Additionally improve the function comments and add information about which
locks must be held for clarity.

No functional change.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  4 +--
 arch/powerpc/kvm/book3s_64_mmu_radix.c   |  8 +++---
 arch/powerpc/kvm/book3s_hv_nested.c      | 49 +++++++++++++++++++++++---------
 3 files changed, 42 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index bb7c8cc77f1a..bec78f15e2f5 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -624,12 +624,12 @@ extern int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
 			     unsigned long gpa, unsigned int level,
 			     unsigned long mmu_seq, unsigned int lpid,
 			     unsigned long *rmapp, struct rmap_nested **n_rmap);
-extern void kvmhv_insert_nest_rmap(struct kvm *kvm, unsigned long *rmapp,
+extern void kvmhv_insert_nest_rmap(unsigned long *rmapp,
 				   struct rmap_nested **n_rmap);
 extern void kvmhv_update_nest_rmap_rc_list(struct kvm *kvm, unsigned long *rmapp,
 					   unsigned long clr, unsigned long set,
 					   unsigned long hpa, unsigned long nbytes);
-extern void kvmhv_remove_nest_rmap_range(struct kvm *kvm,
+extern void kvmhv_invalidate_nest_rmap_range(struct kvm *kvm,
 				const struct kvm_memory_slot *memslot,
 				unsigned long gpa, unsigned long hpa,
 				unsigned long nbytes);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 310d8dde9a48..48b844d33dc9 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -405,7 +405,7 @@ void kvmppc_unmap_pte(struct kvm *kvm, pte_t *pte, unsigned long gpa,
 
 	gpa &= ~(page_size - 1);
 	hpa = old & PTE_RPN_MASK;
-	kvmhv_remove_nest_rmap_range(kvm, memslot, gpa, hpa, page_size);
+	kvmhv_invalidate_nest_rmap_range(kvm, memslot, gpa, hpa, page_size);
 
 	if ((old & _PAGE_DIRTY) && memslot->dirty_bitmap)
 		kvmppc_update_dirty_map(memslot, gfn, page_size);
@@ -643,7 +643,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
 		}
 		kvmppc_radix_set_pte_at(kvm, gpa, (pte_t *)pud, pte);
 		if (rmapp && n_rmap)
-			kvmhv_insert_nest_rmap(kvm, rmapp, n_rmap);
+			kvmhv_insert_nest_rmap(rmapp, n_rmap);
 		ret = 0;
 		goto out_unlock;
 	}
@@ -695,7 +695,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
 		}
 		kvmppc_radix_set_pte_at(kvm, gpa, pmdp_ptep(pmd), pte);
 		if (rmapp && n_rmap)
-			kvmhv_insert_nest_rmap(kvm, rmapp, n_rmap);
+			kvmhv_insert_nest_rmap(rmapp, n_rmap);
 		ret = 0;
 		goto out_unlock;
 	}
@@ -721,7 +721,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
 	}
 	kvmppc_radix_set_pte_at(kvm, gpa, ptep, pte);
 	if (rmapp && n_rmap)
-		kvmhv_insert_nest_rmap(kvm, rmapp, n_rmap);
+		kvmhv_insert_nest_rmap(rmapp, n_rmap);
 	ret = 0;
 
  out_unlock:
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 68d492e8861e..555b45a35fec 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -776,8 +776,8 @@ static inline bool kvmhv_n_rmap_is_equal(u64 rmap_1, u64 rmap_2)
 				       RMAP_NESTED_GPA_MASK));
 }
 
-void kvmhv_insert_nest_rmap(struct kvm *kvm, unsigned long *rmapp,
-			    struct rmap_nested **n_rmap)
+/* called with kvm->mmu_lock held */
+void kvmhv_insert_nest_rmap(unsigned long *rmapp, struct rmap_nested **n_rmap)
 {
 	struct llist_node *entry = ((struct llist_head *) rmapp)->first;
 	struct rmap_nested *cursor;
@@ -808,6 +808,11 @@ void kvmhv_insert_nest_rmap(struct kvm *kvm, unsigned long *rmapp,
 	*n_rmap = NULL;
 }
 
+/*
+ * called with kvm->mmu_lock held
+ * Given a single rmap entry, update the rc bits in the corresponding shadow
+ * pte. Should only be used to clear rc bits.
+ */
 static void kvmhv_update_nest_rmap_rc(struct kvm *kvm, u64 n_rmap,
 				      unsigned long clr, unsigned long set,
 				      unsigned long hpa, unsigned long mask)
@@ -838,8 +843,10 @@ static void kvmhv_update_nest_rmap_rc(struct kvm *kvm, u64 n_rmap,
 }
 
 /*
+ * called with kvm->mmu_lock held
  * For a given list of rmap entries, update the rc bits in all ptes in shadow
  * page tables for nested guests which are referenced by the rmap list.
+ * Should only be used to clear rc bits.
  */
 void kvmhv_update_nest_rmap_rc_list(struct kvm *kvm, unsigned long *rmapp,
 				    unsigned long clr, unsigned long set,
@@ -859,8 +866,12 @@ void kvmhv_update_nest_rmap_rc_list(struct kvm *kvm, unsigned long *rmapp,
 		kvmhv_update_nest_rmap_rc(kvm, rmap, clr, set, hpa, mask);
 }
 
-static void kvmhv_remove_nest_rmap(struct kvm *kvm, u64 n_rmap,
-				   unsigned long hpa, unsigned long mask)
+/*
+ * called with kvm->mmu_lock held
+ * Given a single rmap entry, invalidate the corresponding shadow pte.
+ */
+static void kvmhv_invalidate_nest_rmap(struct kvm *kvm, u64 n_rmap,
+				       unsigned long hpa, unsigned long mask)
 {
 	struct kvm_nested_guest *gp;
 	unsigned long gpa;
@@ -880,24 +891,35 @@ static void kvmhv_remove_nest_rmap(struct kvm *kvm, u64 n_rmap,
 		kvmppc_unmap_pte(kvm, ptep, gpa, shift, NULL, gp->shadow_lpid);
 }
 
-static void kvmhv_remove_nest_rmap_list(struct kvm *kvm, unsigned long *rmapp,
-					unsigned long hpa, unsigned long mask)
+/*
+ * called with kvm->mmu_lock held
+ * For a given list of rmap entries, invalidate the corresponding shadow ptes
+ * for nested guests which are referenced by the rmap list.
+ */
+static void kvmhv_invalidate_nest_rmap_list(struct kvm *kvm,
+					    unsigned long *rmapp,
+					    unsigned long hpa,
+					    unsigned long mask)
 {
 	struct llist_node *entry = llist_del_all((struct llist_head *) rmapp);
 	struct rmap_nested *cursor;
 	unsigned long rmap;
 
 	for_each_nest_rmap_safe(cursor, entry, &rmap) {
-		kvmhv_remove_nest_rmap(kvm, rmap, hpa, mask);
+		kvmhv_invalidate_nest_rmap(kvm, rmap, hpa, mask);
 		kfree(cursor);
 	}
 }
 
-/* called with kvm->mmu_lock held */
-void kvmhv_remove_nest_rmap_range(struct kvm *kvm,
-				  const struct kvm_memory_slot *memslot,
-				  unsigned long gpa, unsigned long hpa,
-				  unsigned long nbytes)
+/*
+ * called with kvm->mmu_lock held
+ * For a given memslot, invalidate all of the rmap entries which fall into the
+ * given range.
+ */
+void kvmhv_invalidate_nest_rmap_range(struct kvm *kvm,
+				      const struct kvm_memory_slot *memslot,
+				      unsigned long gpa, unsigned long hpa,
+				      unsigned long nbytes)
 {
 	unsigned long gfn, end_gfn;
 	unsigned long addr_mask;
@@ -912,10 +934,11 @@ void kvmhv_remove_nest_rmap_range(struct kvm *kvm,
 
 	for (; gfn < end_gfn; gfn++) {
 		unsigned long *rmap = &memslot->arch.rmap[gfn];
-		kvmhv_remove_nest_rmap_list(kvm, rmap, hpa, addr_mask);
+		kvmhv_invalidate_nest_rmap_list(kvm, rmap, hpa, addr_mask);
 	}
 }
 
+/* Free the nest rmap structures for a given memslot */
 static void kvmhv_free_memslot_nest_rmap(struct kvm_memory_slot *free)
 {
 	unsigned long page;
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 10/23] KVM: PPC: Book3S HV: Nested: Increase gpa field in nest rmap to 46 bits
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: BOok3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (8 preceding siblings ...)
  2019-08-26  6:20 ` [PATCH 09/23] KVM: PPC: Book3S HV: Nested: Improve comments and naming of nest rmap functions Suraj Jitindar Singh
@ 2019-08-26  6:20 ` Suraj Jitindar Singh
  2019-08-26  6:20 ` [PATCH 11/23] KVM: PPC: Book3S HV: Nested: Remove single nest rmap entries Suraj Jitindar Singh
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:20 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

The nested rmap entries are used to track nested pages which map a given
guest page, so that this information can be retrieved from the guest
memslot.

Increase the size of the gpa (guest physical address) field in the
nested rmap entry to 46 bits so that it can be reused to store a nested
hpt (hash page table) guest entry, where this field will hold the hpt
index (which can be up to 46 bits).

Additionally introduce helper functions to access these bit fields for
simplicity.
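
As a worked example of the new packing (values chosen arbitrarily; the
helpers below convert between a gpa and the stored frame number using
PAGE_SHIFT):

	/*
	 * l1_lpid = 5, guest frame number = 0x4000:
	 *   lpid_to_n_rmap(5)               = 5UL << 52   = 0x0050000000000000
	 *   0x4000 << RMAP_NESTED_GPA_SHIFT = 0x4000 << 6 = 0x0000000000100000
	 *   rmap                                          = 0x0050000000100000
	 * n_rmap_to_lpid() and n_rmap_to_gpa() recover the two fields again.
	 */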

No functional change.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  5 ++--
 arch/powerpc/kvm/book3s_hv_nested.c      | 41 ++++++++++++++++++++++++--------
 2 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index bec78f15e2f5..ef6af64a4451 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -50,12 +50,13 @@ struct kvm_nested_guest {
 /*
  * We define a nested rmap entry as a single 64-bit quantity
  * 0xFFF0000000000000	12-bit lpid field
- * 0x000FFFFFFFFFF000	40-bit guest 4k page frame number
+ * 0x000FFFFFFFFFFFC0	46-bit guest page frame number (radix) or hpt index
  * 0x0000000000000001	1-bit  single entry flag
  */
 #define RMAP_NESTED_LPID_MASK		0xFFF0000000000000UL
 #define RMAP_NESTED_LPID_SHIFT		(52)
-#define RMAP_NESTED_GPA_MASK		0x000FFFFFFFFFF000UL
+#define RMAP_NESTED_GPA_MASK		0x000FFFFFFFFFFFC0UL
+#define RMAP_NESTED_GPA_SHIFT		(6)
 #define RMAP_NESTED_IS_SINGLE_ENTRY	0x0000000000000001UL
 
 /* Structure for a nested guest rmap entry */
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 555b45a35fec..c6304aa949c1 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -770,10 +770,31 @@ static struct kvm_nested_guest *kvmhv_find_nested(struct kvm *kvm, int lpid)
 	return kvm->arch.nested_guests[lpid];
 }
 
-static inline bool kvmhv_n_rmap_is_equal(u64 rmap_1, u64 rmap_2)
+static inline u64 n_rmap_to_gpa(u64 rmap)
 {
-	return !((rmap_1 ^ rmap_2) & (RMAP_NESTED_LPID_MASK |
-				       RMAP_NESTED_GPA_MASK));
+	return ((rmap & RMAP_NESTED_GPA_MASK) >> RMAP_NESTED_GPA_SHIFT)
+		<< PAGE_SHIFT;
+}
+
+static inline u64 gpa_to_n_rmap(u64 gpa)
+{
+	return ((gpa >> PAGE_SHIFT) << RMAP_NESTED_GPA_SHIFT) &
+		RMAP_NESTED_GPA_MASK;
+}
+
+static inline int n_rmap_to_lpid(u64 rmap)
+{
+	return (int) ((rmap & RMAP_NESTED_LPID_MASK) >> RMAP_NESTED_LPID_SHIFT);
+}
+
+static inline u64 lpid_to_n_rmap(int lpid)
+{
+	return (((u64) lpid) << RMAP_NESTED_LPID_SHIFT) & RMAP_NESTED_LPID_MASK;
+}
+
+static inline bool kvmhv_n_rmap_is_equal(u64 rmap_1, u64 rmap_2, u64 mask)
+{
+	return !((rmap_1 ^ rmap_2) & mask);
 }
 
 /* called with kvm->mmu_lock held */
@@ -792,7 +813,8 @@ void kvmhv_insert_nest_rmap(unsigned long *rmapp, struct rmap_nested **n_rmap)
 
 	/* Do any entries match what we're trying to insert? */
 	for_each_nest_rmap_safe(cursor, entry, &rmap) {
-		if (kvmhv_n_rmap_is_equal(rmap, new_rmap))
+		if (kvmhv_n_rmap_is_equal(rmap, new_rmap, RMAP_NESTED_LPID_MASK
+							| RMAP_NESTED_GPA_MASK))
 			return;
 	}
 
@@ -822,8 +844,8 @@ static void kvmhv_update_nest_rmap_rc(struct kvm *kvm, u64 n_rmap,
 	unsigned int shift, lpid;
 	pte_t *ptep;
 
-	gpa = n_rmap & RMAP_NESTED_GPA_MASK;
-	lpid = (n_rmap & RMAP_NESTED_LPID_MASK) >> RMAP_NESTED_LPID_SHIFT;
+	gpa = n_rmap_to_gpa(n_rmap);
+	lpid = n_rmap_to_lpid(n_rmap);
 	gp = kvmhv_find_nested(kvm, lpid);
 	if (!gp)
 		return;
@@ -878,8 +900,8 @@ static void kvmhv_invalidate_nest_rmap(struct kvm *kvm, u64 n_rmap,
 	unsigned int shift, lpid;
 	pte_t *ptep;
 
-	gpa = n_rmap & RMAP_NESTED_GPA_MASK;
-	lpid = (n_rmap & RMAP_NESTED_LPID_MASK) >> RMAP_NESTED_LPID_SHIFT;
+	gpa = n_rmap_to_gpa(n_rmap);
+	lpid = n_rmap_to_lpid(n_rmap);
 	gp = kvmhv_find_nested(kvm, lpid);
 	if (!gp)
 		return;
@@ -1454,8 +1476,7 @@ static long int __kvmhv_nested_page_fault(struct kvm_run *run,
 	n_rmap = kzalloc(sizeof(*n_rmap), GFP_KERNEL);
 	if (!n_rmap)
 		return RESUME_GUEST; /* Let the guest try again */
-	n_rmap->rmap = (n_gpa & RMAP_NESTED_GPA_MASK) |
-		(((unsigned long) gp->l1_lpid) << RMAP_NESTED_LPID_SHIFT);
+	n_rmap->rmap = gpa_to_n_rmap(n_gpa) | lpid_to_n_rmap(gp->l1_lpid);
 	rmapp = &memslot->arch.rmap[gfn - memslot->base_gfn];
 	ret = kvmppc_create_pte(kvm, gp->shadow_pgtable, pte, n_gpa, level,
 				mmu_seq, gp->shadow_lpid, rmapp, &n_rmap);
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 11/23] KVM: PPC: Book3S HV: Nested: Remove single nest rmap entries
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: BOok3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (9 preceding siblings ...)
  2019-08-26  6:20 ` [PATCH 10/23] KVM: PPC: Book3S HV: Nested: Increase gpa field in nest rmap to 46 bits Suraj Jitindar Singh
@ 2019-08-26  6:20 ` Suraj Jitindar Singh
  2019-08-26  6:20 ` [PATCH 12/23] KVM: PPC: Book3S HV: Nested: add kvmhv_remove_all_nested_rmap_lpid() Suraj Jitindar Singh
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:20 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

The nested rmap entries are used to track nested pages which map a given
guest page, so that this information can be retrieved from the guest
memslot. These entries are stored in the guest memslot as a singly
linked list (llist), with the next pointer of the last entry in the list
used to store a "single entry" so as to avoid adding another list entry.

This approach, while saving a small amount of memory, significantly
complicates the list handling. For simplicity and code clarity, remove
these "single entries" and always insert another list entry to hold the
final value.
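
With the special case gone, the per-gfn rmap slot behaves as an ordinary
llist head and the standard helpers can be used directly; a sketch of
the pattern used in the diff below:

	struct llist_head *head = (struct llist_head *)rmapp;
	struct llist_node *entry;
	struct rmap_nested *cursor, *next;

	/* insertion: just push the new entry onto the list */
	llist_add(&n_rmap->list, head);

	/* teardown: detach the whole list, then free each entry */
	entry = llist_del_all(head);
	llist_for_each_entry_safe(cursor, next, entry, list)
		kfree(cursor);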

No functional change.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 65 ++++++++++----------------------
 arch/powerpc/kvm/book3s_hv_nested.c      | 51 ++++++++++---------------
 2 files changed, 39 insertions(+), 77 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index ef6af64a4451..410e609efd37 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -48,66 +48,39 @@ struct kvm_nested_guest {
 };
 
 /*
+ * We use nested rmap entries to store information so that we can find pte
+ * entries in the shadow page table of a nested guest when the host is modifying
+ * a pte for an l1 guest page. For a radix nested guest this is the nested
+ * gpa which can then be used to walk the radix shadow page table to find a
+ * pte. For a hash nested guest this is an index into the shadow hpt which can
+ * be used to find a pte. In either case the lpid of the nested guest is stored.
+ *
+ * These entries are stored in a rmap_nested struct. There may be multiple
+ * entries for a single l1 guest page since that guest may have multiple nested
+ * guests and map the same page into more than one, or a single nested guest may
+ * map the same l1 guest page with multiple hpt entries. To accommodate this
+ * the rmap_nested entries are linked together in a singly linked list with the
+ * corresponding rmap entry in the rmap array of the memslot containing the
+ * head pointer of the linked list (or NULL if list is empty).
+ */
+
+/*
  * We define a nested rmap entry as a single 64-bit quantity
  * 0xFFF0000000000000	12-bit lpid field
  * 0x000FFFFFFFFFFFC0	46-bit guest page frame number (radix) or hpt index
- * 0x0000000000000001	1-bit  single entry flag
+ * 0x000000000000003F	6-bit unused field
  */
 #define RMAP_NESTED_LPID_MASK		0xFFF0000000000000UL
 #define RMAP_NESTED_LPID_SHIFT		(52)
 #define RMAP_NESTED_GPA_MASK		0x000FFFFFFFFFFFC0UL
 #define RMAP_NESTED_GPA_SHIFT		(6)
-#define RMAP_NESTED_IS_SINGLE_ENTRY	0x0000000000000001UL
 
 /* Structure for a nested guest rmap entry */
 struct rmap_nested {
 	struct llist_node list;
-	u64 rmap;
+	u64 rmap;			/* layout defined above */
 };
 
-/*
- * for_each_nest_rmap_safe - iterate over the list of nested rmap entries
- *			     safe against removal of the list entry or NULL list
- * @pos:	a (struct rmap_nested *) to use as a loop cursor
- * @node:	pointer to the first entry
- *		NOTE: this can be NULL
- * @rmapp:	an (unsigned long *) in which to return the rmap entries on each
- *		iteration
- *		NOTE: this must point to already allocated memory
- *
- * The nested_rmap is a llist of (struct rmap_nested) entries pointed to by the
- * rmap entry in the memslot. The list is always terminated by a "single entry"
- * stored in the list element of the final entry of the llist. If there is ONLY
- * a single entry then this is itself in the rmap entry of the memslot, not a
- * llist head pointer.
- *
- * Note that the iterator below assumes that a nested rmap entry is always
- * non-zero.  This is true for our usage because the LPID field is always
- * non-zero (zero is reserved for the host).
- *
- * This should be used to iterate over the list of rmap_nested entries with
- * processing done on the u64 rmap value given by each iteration. This is safe
- * against removal of list entries and it is always safe to call free on (pos).
- *
- * e.g.
- * struct rmap_nested *cursor;
- * struct llist_node *first;
- * unsigned long rmap;
- * for_each_nest_rmap_safe(cursor, first, &rmap) {
- *	do_something(rmap);
- *	free(cursor);
- * }
- */
-#define for_each_nest_rmap_safe(pos, node, rmapp)			       \
-	for ((pos) = llist_entry((node), typeof(*(pos)), list);		       \
-	     (node) &&							       \
-	     (*(rmapp) = ((RMAP_NESTED_IS_SINGLE_ENTRY & ((u64) (node))) ?     \
-			  ((u64) (node)) : ((pos)->rmap))) &&		       \
-	     (((node) = ((RMAP_NESTED_IS_SINGLE_ENTRY & ((u64) (node))) ?      \
-			 ((struct llist_node *) ((pos) = NULL)) :	       \
-			 (pos)->list.next)), true);			       \
-	     (pos) = llist_entry((node), typeof(*(pos)), list))
-
 struct kvm_nested_guest *kvmhv_get_nested(struct kvm *kvm, int l1_lpid,
 					  bool create);
 void kvmhv_put_nested(struct kvm_nested_guest *gp);
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index c6304aa949c1..c76e499437ee 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -800,31 +800,20 @@ static inline bool kvmhv_n_rmap_is_equal(u64 rmap_1, u64 rmap_2, u64 mask)
 /* called with kvm->mmu_lock held */
 void kvmhv_insert_nest_rmap(unsigned long *rmapp, struct rmap_nested **n_rmap)
 {
-	struct llist_node *entry = ((struct llist_head *) rmapp)->first;
+	struct llist_head *head = (struct llist_head *) rmapp;
 	struct rmap_nested *cursor;
-	u64 rmap, new_rmap = (*n_rmap)->rmap;
+	u64 new_rmap = (*n_rmap)->rmap;
 
-	/* Are there any existing entries? */
-	if (!(*rmapp)) {
-		/* No -> use the rmap as a single entry */
-		*rmapp = new_rmap | RMAP_NESTED_IS_SINGLE_ENTRY;
-		return;
-	}
-
-	/* Do any entries match what we're trying to insert? */
-	for_each_nest_rmap_safe(cursor, entry, &rmap) {
-		if (kvmhv_n_rmap_is_equal(rmap, new_rmap, RMAP_NESTED_LPID_MASK
-							| RMAP_NESTED_GPA_MASK))
+	/* Do any existing entries match what we're trying to insert? */
+	llist_for_each_entry(cursor, head->first, list) {
+		if (kvmhv_n_rmap_is_equal(cursor->rmap, new_rmap,
+					  RMAP_NESTED_LPID_MASK |
+					  RMAP_NESTED_GPA_MASK))
 			return;
 	}
 
-	/* Do we need to create a list or just add the new entry? */
-	rmap = *rmapp;
-	if (rmap & RMAP_NESTED_IS_SINGLE_ENTRY) /* Not previously a list */
-		*rmapp = 0UL;
-	llist_add(&((*n_rmap)->list), (struct llist_head *) rmapp);
-	if (rmap & RMAP_NESTED_IS_SINGLE_ENTRY) /* Not previously a list */
-		(*n_rmap)->list.next = (struct llist_node *) rmap;
+	/* Insert the new entry */
+	llist_add(&((*n_rmap)->list), head);
 
 	/* Set NULL so not freed by caller */
 	*n_rmap = NULL;
@@ -874,9 +863,9 @@ void kvmhv_update_nest_rmap_rc_list(struct kvm *kvm, unsigned long *rmapp,
 				    unsigned long clr, unsigned long set,
 				    unsigned long hpa, unsigned long nbytes)
 {
-	struct llist_node *entry = ((struct llist_head *) rmapp)->first;
+	struct llist_head *head = (struct llist_head *) rmapp;
 	struct rmap_nested *cursor;
-	unsigned long rmap, mask;
+	unsigned long mask;
 
 	if ((clr | set) & ~(_PAGE_DIRTY | _PAGE_ACCESSED))
 		return;
@@ -884,8 +873,9 @@ void kvmhv_update_nest_rmap_rc_list(struct kvm *kvm, unsigned long *rmapp,
 	mask = PTE_RPN_MASK & ~(nbytes - 1);
 	hpa &= mask;
 
-	for_each_nest_rmap_safe(cursor, entry, &rmap)
-		kvmhv_update_nest_rmap_rc(kvm, rmap, clr, set, hpa, mask);
+	llist_for_each_entry(cursor, head->first, list)
+		kvmhv_update_nest_rmap_rc(kvm, cursor->rmap, clr, set, hpa,
+					  mask);
 }
 
 /*
@@ -924,11 +914,10 @@ static void kvmhv_invalidate_nest_rmap_list(struct kvm *kvm,
 					    unsigned long mask)
 {
 	struct llist_node *entry = llist_del_all((struct llist_head *) rmapp);
-	struct rmap_nested *cursor;
-	unsigned long rmap;
+	struct rmap_nested *cursor, *next;
 
-	for_each_nest_rmap_safe(cursor, entry, &rmap) {
-		kvmhv_invalidate_nest_rmap(kvm, rmap, hpa, mask);
+	llist_for_each_entry_safe(cursor, next, entry, list) {
+		kvmhv_invalidate_nest_rmap(kvm, cursor->rmap, hpa, mask);
 		kfree(cursor);
 	}
 }
@@ -966,12 +955,12 @@ static void kvmhv_free_memslot_nest_rmap(struct kvm_memory_slot *free)
 	unsigned long page;
 
 	for (page = 0; page < free->npages; page++) {
-		unsigned long rmap, *rmapp = &free->arch.rmap[page];
-		struct rmap_nested *cursor;
+		unsigned long *rmapp = &free->arch.rmap[page];
+		struct rmap_nested *cursor, *next;
 		struct llist_node *entry;
 
 		entry = llist_del_all((struct llist_head *) rmapp);
-		for_each_nest_rmap_safe(cursor, entry, &rmap)
+		llist_for_each_entry_safe(cursor, next, entry, list)
 			kfree(cursor);
 	}
 }
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 12/23] KVM: PPC: Book3S HV: Nested: add kvmhv_remove_all_nested_rmap_lpid()
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: BOok3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (10 preceding siblings ...)
  2019-08-26  6:20 ` [PATCH 11/23] KVM: PPC: Book3S HV: Nested: Remove single nest rmap entries Suraj Jitindar Singh
@ 2019-08-26  6:20 ` Suraj Jitindar Singh
  2019-08-26  6:20 ` [PATCH 13/23] KVM: PPC: Book3S HV: Nested: Infrastructure for nested hpt guest setup Suraj Jitindar Singh
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:20 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

Nested rmap entries are used to store a reverse mapping from an L1 guest
page in a kvm memslot to a nested guest pte.

Implement a function to remove all nest rmap entries of a given lpid
from all of the memslots for a given guest.
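
A sketch of the intended usage (the two-argument kvmhv_flush_lpid()
shown here appears in a later patch of this series):

	/* caller holds gp->tlb_lock */
	kvmhv_remove_all_nested_rmap_lpid(kvm, gp->l1_lpid);
	/* ...followed by an lpid wide invalidation, e.g. */
	kvmhv_flush_lpid(gp->shadow_lpid, gp->radix);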

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/kvm/book3s_hv_nested.c | 47 +++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index c76e499437ee..58a5de2aa2af 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -819,6 +819,53 @@ void kvmhv_insert_nest_rmap(unsigned long *rmapp, struct rmap_nested **n_rmap)
 	*n_rmap = NULL;
 }
 
+/* called with kvm->mmu_lock held */
+static void kvmhv_remove_nested_rmap_lpid(unsigned long *rmapp, int l1_lpid)
+{
+	struct llist_node **next = &(((struct llist_head *) rmapp)->first);
+	u64 match = lpid_to_n_rmap(l1_lpid);
+
+	while (*next) {
+		struct llist_node *entry = (*next);
+		struct rmap_nested *n_rmap = llist_entry(entry, typeof(*n_rmap),
+							 list);
+
+		if (kvmhv_n_rmap_is_equal(match, n_rmap->rmap,
+					  RMAP_NESTED_LPID_MASK)) {
+			*next = entry->next;
+			kfree(n_rmap);
+		} else {
+			next = &(entry->next);
+		}
+	}
+}
+
+/*
+ * caller must hold gp->tlb_lock
+ * For a given nested lpid, remove all of the rmap entries which match that
+ * nested lpid. Note that no invalidation/tlbie is done for the entries; it is
+ * assumed that the caller will perform an lpid wide invalidation after calling
+ * this function.
+ */
+static void kvmhv_remove_all_nested_rmap_lpid(struct kvm *kvm, int l1_lpid)
+{
+	struct kvm_memory_slot *memslot;
+
+	kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
+		unsigned long page;
+
+		for (page = 0; page < memslot->npages; page++) {
+			unsigned long *rmapp;
+
+			spin_lock(&kvm->mmu_lock);
+			rmapp = &memslot->arch.rmap[page];
+			if (*rmapp) /* Are there any rmap entries? */
+				kvmhv_remove_nested_rmap_lpid(rmapp, l1_lpid);
+			spin_unlock(&kvm->mmu_lock);
+		}
+	}
+}
+
 /*
  * called with kvm->mmu_lock held
  * Given a single rmap entry, update the rc bits in the corresponding shadow
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 13/23] KVM: PPC: Book3S HV: Nested: Infrastructure for nested hpt guest setup
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: BOok3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (11 preceding siblings ...)
  2019-08-26  6:20 ` [PATCH 12/23] KVM: PPC: Book3S HV: Nested: add kvmhv_remove_all_nested_rmap_lpid() Suraj Jitindar Singh
@ 2019-08-26  6:20 ` Suraj Jitindar Singh
  2019-10-24  3:43   ` Paul Mackerras
  2019-08-26  6:21 ` [PATCH 14/23] KVM: PPC: Book3S HV: Nested: Context switch slb for nested hpt guest Suraj Jitindar Singh
                   ` (9 subsequent siblings)
  22 siblings, 1 reply; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:20 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

Add the infrastructure to book3s_hv_nested.c to allow a nested hpt (hash
page table) guest to be setup. As this patch doesn't add the capability
of creating or removing mmu translations return H_PARAMETER when an
attempt to actually run a nested hpt guest is made.

Add fields to the nested guest struct to store the hpt and the vrma slb
entry.

Update kvmhv_update_ptbl_cache() to determine when a nested guest is
switching from radix to hpt or hpt to radix and perform the required
setup. A page table (radix) or hpt (hash) must be allocated, with any
existing table being freed and the radix field in the nested guest
struct being updated under the mmu_lock (this means that when holding
the mmu_lock the radix field can be tested and the existence of the
correct type of page table guaranteed). Also remove all of the nest rmap
entries which belong to this nested guest, since a nested rmap entry is
specific to whether the nested guest is hash or radix.

When a nested guest is initially created or when the partition table
entry is empty we assume a radix guest since it is much less expensive
to allocate a radix page table compared to a hpt.

The hpt which is allocated in the hypervisor for the nested guest
(called the shadow hpt) is identical in size to the one allocated in the
guest hypervisor, to ensure a 1-to-1 mapping between page table entries.
This simplifies handling of the entries; however, this requirement could
be relaxed in future if support were added.
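
For example (illustrative values), the HTABSIZE field of the
L1-supplied patb0 directly determines the order of the shadow hpt
allocated below:

	/* HTABSIZE = 10 => order = 10 + 18 = 28, i.e. a 256MB shadow hpt,
	 * giving exactly one shadow slot per guest hpt entry. */
	order = kvmhv_patb_get_hpt_order(patb0);
	rc = kvmppc_allocate_hpt(&info, order);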

Introduce a hash nested_page_fault function to be invoked when the
nested guest which experiences a page fault is hash; it returns -EINVAL
for now. Also return -EINVAL when handling the H_TLB_INVALIDATE hcall.
Support is also still lacking for the hypervisor paging out a guest page
which has been mapped through to a nested guest. These 3 pieces of
functionality are added in subsequent patches.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |   2 +
 arch/powerpc/include/asm/book3s/64/mmu.h      |   9 +
 arch/powerpc/include/asm/kvm_book3s_64.h      |   4 +-
 arch/powerpc/kvm/book3s_hv_nested.c           | 255 ++++++++++++++++++++++----
 4 files changed, 235 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 15b75005bc34..c04e37b2c30d 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -45,12 +45,14 @@
 #define SLB_VSID_KP		ASM_CONST(0x0000000000000400)
 #define SLB_VSID_N		ASM_CONST(0x0000000000000200) /* no-execute */
 #define SLB_VSID_L		ASM_CONST(0x0000000000000100)
+#define SLB_VSID_L_SHIFT	8
 #define SLB_VSID_C		ASM_CONST(0x0000000000000080) /* class */
 #define SLB_VSID_LP		ASM_CONST(0x0000000000000030)
 #define SLB_VSID_LP_00		ASM_CONST(0x0000000000000000)
 #define SLB_VSID_LP_01		ASM_CONST(0x0000000000000010)
 #define SLB_VSID_LP_10		ASM_CONST(0x0000000000000020)
 #define SLB_VSID_LP_11		ASM_CONST(0x0000000000000030)
+#define SLB_VSID_LP_SHIFT	4
 #define SLB_VSID_LLP		(SLB_VSID_L|SLB_VSID_LP)
 
 #define SLB_VSID_KERNEL		(SLB_VSID_KP)
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 23b83d3593e2..8c02e40f1125 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -62,6 +62,7 @@ struct patb_entry {
 extern struct patb_entry *partition_tb;
 
 /* Bits in patb0 field */
+/* Radix */
 #define PATB_HR		(1UL << 63)
 #define RPDB_MASK	0x0fffffffffffff00UL
 #define RPDB_SHIFT	(1UL << 8)
@@ -70,6 +71,14 @@ extern struct patb_entry *partition_tb;
 #define RTS2_SHIFT	5		/* bottom 3 bits of radix tree size */
 #define RTS2_MASK	(7UL << RTS2_SHIFT)
 #define RPDS_MASK	0x1f		/* root page dir. size field */
+/* Hash */
+#define PATB_HTABORG	0x0ffffffffffc0000UL	/* hpt base */
+#define PATB_PS		0xe0			/* page size */
+#define PATB_PS_L	0x80
+#define PATB_PS_L_SHIFT	7
+#define PATB_PS_LP	0x60
+#define PATB_PS_LP_SHIFT	5
+#define PATB_HTABSIZE	0x1f			/* hpt size */
 
 /* Bits in patb1 field */
 #define PATB_GR		(1UL << 63)	/* guest uses radix; must match HR */
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 410e609efd37..c874ab3a037e 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -35,7 +35,9 @@ struct kvm_nested_guest {
 	struct kvm *l1_host;		/* L1 VM that owns this nested guest */
 	int l1_lpid;			/* lpid L1 guest thinks this guest is */
 	int shadow_lpid;		/* real lpid of this nested guest */
-	pgd_t *shadow_pgtable;		/* our page table for this guest */
+	pgd_t *shadow_pgtable;		/* page table for this guest if radix */
+	struct kvm_hpt_info shadow_hpt;	/* hpt for this guest if hash */
+	u64 vrma_slb_v;			/* vrma slb for this guest if hash */
 	u64 l1_gr_to_hr;		/* L1's addr of part'n-scoped table */
 	u64 process_table;		/* process table entry for this guest */
 	long refcnt;			/* number of pointers to this struct */
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 58a5de2aa2af..82690eafee77 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -23,6 +23,7 @@
 static struct patb_entry *pseries_partition_tb;
 
 static void kvmhv_update_ptbl_cache(struct kvm_nested_guest *gp);
+static void kvmhv_remove_all_nested_rmap_lpid(struct kvm *kvm, int lpid);
 static void kvmhv_free_memslot_nest_rmap(struct kvm_memory_slot *free);
 
 void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
@@ -247,6 +248,7 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 	s64 delta_purr, delta_spurr, delta_ic, delta_vtb;
 	u64 mask;
 	unsigned long lpcr;
+	u8 radix;
 
 	if (vcpu->kvm->arch.l1_ptcr == 0)
 		return H_NOT_AVAILABLE;
@@ -282,6 +284,25 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 		mutex_unlock(&l2->tlb_lock);
 	}
 
+	mutex_lock(&l2->tlb_lock);
+	radix = l2->radix;
+	mutex_unlock(&l2->tlb_lock);
+	/* some lpcr sanity checking */
+	if (radix) {
+		/* radix requires gtse and uprt */
+		if ((~l2_hv.lpcr & LPCR_HR) || (~l2_hv.lpcr & LPCR_GTSE) ||
+					       (~l2_hv.lpcr & LPCR_UPRT) ||
+					       (l2_hv.lpcr & LPCR_VPM1))
+			return H_PARAMETER;
+	} else {
+		return H_PARAMETER;
+		/* hpt doesn't support gtse or uprt and requires vpm */
+		if ((l2_hv.lpcr & LPCR_HR) || (l2_hv.lpcr & LPCR_GTSE) ||
+					      (l2_hv.lpcr & LPCR_UPRT) ||
+					      (~l2_hv.lpcr & LPCR_VPM1))
+			return H_PARAMETER;
+	}
+
 	/* save l1 values of things */
 	vcpu->arch.regs.msr = vcpu->arch.shregs.msr;
 	saved_l1_regs = vcpu->arch.regs;
@@ -297,7 +318,8 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 	vcpu->arch.regs = l2_regs;
 	vcpu->arch.shregs.msr = vcpu->arch.regs.msr;
 	mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD |
-		LPCR_LPES | LPCR_MER;
+		LPCR_LPES | LPCR_MER | LPCR_HR | LPCR_GTSE | LPCR_UPRT |
+		LPCR_VPM1;
 	lpcr = (vc->lpcr & ~mask) | (l2_hv.lpcr & mask);
 	sanitise_hv_regs(vcpu, &l2_hv);
 	restore_hv_regs(vcpu, &l2_hv);
@@ -413,16 +435,26 @@ void kvmhv_nested_exit(void)
 	}
 }
 
-static void kvmhv_flush_lpid(unsigned int lpid)
+/*
+ * Flushes the partition scoped translations of a given lpid.
+ */
+static void kvmhv_flush_lpid(unsigned int lpid, bool radix)
 {
 	long rc;
 
 	if (!kvmhv_on_pseries()) {
-		radix__flush_tlb_lpid(lpid);
+		if (radix) {
+			radix__flush_tlb_lpid(lpid);
+		} else {
+			asm volatile("ptesync": : :"memory");
+			asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : :
+				     "r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
+			asm volatile("eieio; tlbsync; ptesync": : :"memory");
+		}
 		return;
 	}
 
-	rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
+	rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, radix),
 				lpid, TLBIEL_INVAL_SET_LPID);
 	if (rc)
 		pr_err("KVM: TLB LPID invalidation hcall failed, rc=%ld\n", rc);
@@ -430,23 +462,43 @@ static void kvmhv_flush_lpid(unsigned int lpid)
 
 void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1)
 {
+	bool radix;
+
 	if (!kvmhv_on_pseries()) {
 		mmu_partition_table_set_entry(lpid, dw0, dw1);
 		return;
 	}
 
+	/* radix flag based on old entry */
+	radix = !!(be64_to_cpu(pseries_partition_tb[lpid].patb0) & PATB_HR);
 	pseries_partition_tb[lpid].patb0 = cpu_to_be64(dw0);
 	pseries_partition_tb[lpid].patb1 = cpu_to_be64(dw1);
 	/* L0 will do the necessary barriers */
-	kvmhv_flush_lpid(lpid);
+	kvmhv_flush_lpid(lpid, radix);
+}
+
+static inline int kvmhv_patb_get_hpt_order(u64 patb0)
+{
+	return (patb0 & PATB_HTABSIZE) + 18;
+}
+
+static inline u64 kvmhv_patb_get_htab_size(int order)
+{
+	return (order - 18) & PATB_HTABSIZE;
 }
 
 static void kvmhv_set_nested_ptbl(struct kvm_nested_guest *gp)
 {
 	unsigned long dw0;
 
-	dw0 = PATB_HR | radix__get_tree_size() |
-		__pa(gp->shadow_pgtable) | RADIX_PGD_INDEX_SIZE;
+	if (gp->radix) {
+		dw0 = PATB_HR | radix__get_tree_size() |
+			__pa(gp->shadow_pgtable) | RADIX_PGD_INDEX_SIZE;
+	} else {
+		dw0 = (PATB_HTABORG & __pa(gp->shadow_hpt.virt)) |
+			(PATB_PS & gp->l1_gr_to_hr) |
+			kvmhv_patb_get_htab_size(gp->shadow_hpt.order);
+	}
 	kvmhv_set_ptbl_entry(gp->shadow_lpid, dw0, gp->process_table);
 }
 
@@ -521,6 +573,15 @@ long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu)
 
 	mutex_lock(&gp->tlb_lock);
 
+	if (!gp->radix) {
+		/*
+		 * Currently quadrants are the only way to read nested guest
+		 * memory, which is only valid for a radix guest.
+		 */
+		rc = H_PARAMETER;
+		goto out_unlock;
+	}
+
 	if (is_load) {
 		/* Load from the nested guest into our buffer */
 		rc = __kvmhv_copy_tofrom_guest_radix(gp->shadow_lpid, pid,
@@ -556,6 +617,69 @@ long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu)
 	goto out_unlock;
 }
 
+/* Caller must hold gp->tlb_lock */
+static int kvmhv_switch_to_radix_nested(struct kvm_nested_guest *gp)
+{
+	struct kvm *kvm = gp->l1_host;
+	pgd_t *pgtable;
+
+	/* try to allocate a radix tree */
+	pgtable = pgd_alloc(kvm->mm);
+	if (!pgtable) {
+		pr_err_ratelimited("KVM: Couldn't alloc nested radix tree\n");
+		return -ENOMEM;
+	}
+
+	/* mmu_lock protects shadow_hpt & radix in nested guest struct */
+	spin_lock(&kvm->mmu_lock);
+	kvmppc_free_hpt(&gp->shadow_hpt);
+	gp->radix = 1;
+	gp->shadow_pgtable = pgtable;
+	spin_unlock(&kvm->mmu_lock);
+
+	/* remove all nested rmap entries and perform global invalidation */
+	kvmhv_remove_all_nested_rmap_lpid(kvm, gp->l1_lpid);
+	kvmhv_flush_lpid(gp->shadow_lpid, gp->radix);
+
+	return 0;
+}
+
+/* Caller must hold gp->tlb_lock */
+static int kvmhv_switch_to_hpt_nested(struct kvm_nested_guest *gp, int order)
+{
+	struct kvm *kvm = gp->l1_host;
+	struct kvm_hpt_info info;
+	int rc;
+
+	/* try to allocate an hpt */
+	rc = kvmppc_allocate_hpt(&info, order);
+	if (rc) {
+		pr_err_ratelimited("KVM: Couldn't alloc nested hpt\n");
+		return rc;
+	}
+
+	/* mmu_lock protects shadow_pgtable & radix in nested guest struct */
+	spin_lock(&kvm->mmu_lock);
+	kvmppc_free_pgtable_radix(kvm, gp->shadow_pgtable, gp->shadow_lpid);
+	pgd_free(kvm->mm, gp->shadow_pgtable);
+	gp->shadow_pgtable = NULL;
+	gp->radix = 0;
+	gp->shadow_hpt = info;
+	spin_unlock(&kvm->mmu_lock);
+
+	/* remove all nested rmap entries and perform global invalidation */
+	kvmhv_remove_all_nested_rmap_lpid(kvm, gp->l1_lpid);
+	kvmhv_flush_lpid(gp->shadow_lpid, gp->radix);
+
+	return 0;
+}
+
+static inline u64 kvmhv_patb_ps_to_slb_llp(u64 patb)
+{
+	return (((patb & PATB_PS_L) >> PATB_PS_L_SHIFT) << SLB_VSID_L_SHIFT) |
+	       (((patb & PATB_PS_LP) >> PATB_PS_LP_SHIFT) << SLB_VSID_LP_SHIFT);
+}
+
 /*
  * Reload the partition table entry for a guest.
  * Caller must hold gp->tlb_lock.
@@ -567,23 +691,48 @@ static void kvmhv_update_ptbl_cache(struct kvm_nested_guest *gp)
 	unsigned long ptbl_addr;
 	struct kvm *kvm = gp->l1_host;
 
+	gp->l1_gr_to_hr = 0;
+	gp->process_table = 0;
 	ret = -EFAULT;
 	ptbl_addr = (kvm->arch.l1_ptcr & PRTB_MASK) + (gp->l1_lpid << 4);
 	if (gp->l1_lpid < (1ul << ((kvm->arch.l1_ptcr & PRTS_MASK) + 8)))
 		ret = kvm_read_guest(kvm, ptbl_addr,
 				     &ptbl_entry, sizeof(ptbl_entry));
-	if (ret) {
-		gp->l1_gr_to_hr = 0;
-		gp->process_table = 0;
-	} else {
-		gp->l1_gr_to_hr = be64_to_cpu(ptbl_entry.patb0);
-		gp->process_table = be64_to_cpu(ptbl_entry.patb1);
+	if (!ret) {
+		u64 patb0 = be64_to_cpu(ptbl_entry.patb0);
+		u64 process_table = be64_to_cpu(ptbl_entry.patb1);
+
+		if (patb0) {
+			bool radix = !!(patb0 & PATB_HR);
+
+			if (radix && !gp->radix)
+				ret = kvmhv_switch_to_radix_nested(gp);
+			else if (!radix && gp->radix)
+				ret = kvmhv_switch_to_hpt_nested(gp,
+					kvmhv_patb_get_hpt_order(patb0));
+			if (!ret) {
+				gp->l1_gr_to_hr = patb0;
+				gp->process_table = process_table;
+				if (!radix) { /* update vrma slb_v */
+					u64 senc;
+
+					senc = kvmhv_patb_ps_to_slb_llp(patb0);
+					gp->vrma_slb_v = senc | SLB_VSID_B_1T |
+						(VRMA_VSID << SLB_VSID_SHIFT_1T);
+				}
+			}
+		}
 	}
 	kvmhv_set_nested_ptbl(gp);
 }
 
 struct kvm_nested_guest *kvmhv_alloc_nested(struct kvm *kvm, unsigned int lpid)
 {
+	/*
+	 * Allocate the state for a nested guest.
+	 * Note: assume radix to avoid allocating a hpt when not necessary as
+	 * this can consume a large amount of contiguous memory in the host.
+	 */
 	struct kvm_nested_guest *gp;
 	long shadow_lpid;
 
@@ -620,15 +769,17 @@ static void kvmhv_release_nested(struct kvm_nested_guest *gp)
 {
 	struct kvm *kvm = gp->l1_host;
 
-	if (gp->shadow_pgtable) {
-		/*
-		 * No vcpu is using this struct and no call to
-		 * kvmhv_get_nested can find this struct,
-		 * so we don't need to hold kvm->mmu_lock.
-		 */
+	/*
+	 * No vcpu is using this struct and no call to
+	 * kvmhv_get_nested can find this struct,
+	 * so we don't need to hold kvm->mmu_lock.
+	 */
+	if (gp->radix && gp->shadow_pgtable) {
 		kvmppc_free_pgtable_radix(kvm, gp->shadow_pgtable,
 					  gp->shadow_lpid);
 		pgd_free(kvm->mm, gp->shadow_pgtable);
+	} else if ((!gp->radix) && gp->shadow_hpt.virt) {
+		kvmppc_free_hpt(&gp->shadow_hpt);
 	}
 	kvmhv_set_ptbl_entry(gp->shadow_lpid, 0, 0);
 	kvmppc_free_lpid(gp->shadow_lpid);
@@ -701,9 +852,18 @@ static void kvmhv_flush_nested(struct kvm_nested_guest *gp)
 	struct kvm *kvm = gp->l1_host;
 
 	spin_lock(&kvm->mmu_lock);
-	kvmppc_free_pgtable_radix(kvm, gp->shadow_pgtable, gp->shadow_lpid);
+	if (gp->radix) {
+		kvmppc_free_pgtable_radix(kvm, gp->shadow_pgtable,
+					  gp->shadow_lpid);
+	} else {
+		memset((void *) gp->shadow_hpt.virt, 0,
+			1UL << gp->shadow_hpt.order);
+		memset((void *) gp->shadow_hpt.rev, 0,
+			(1UL << (gp->shadow_hpt.order - 4)) *
+			sizeof(struct revmap_entry));
+	}
 	spin_unlock(&kvm->mmu_lock);
-	kvmhv_flush_lpid(gp->shadow_lpid);
+	kvmhv_flush_lpid(gp->shadow_lpid, gp->radix);
 	kvmhv_update_ptbl_cache(gp);
 	if (gp->l1_gr_to_hr == 0)
 		kvmhv_remove_nested(gp);
@@ -887,7 +1047,10 @@ static void kvmhv_update_nest_rmap_rc(struct kvm *kvm, u64 n_rmap,
 		return;
 
 	/* Find the pte */
-	ptep = __find_linux_pte(gp->shadow_pgtable, gpa, NULL, &shift);
+	if (gp->radix)
+		ptep = __find_linux_pte(gp->shadow_pgtable, gpa, NULL, &shift);
+	else
+		ptep = NULL;	/* XXX TODO */
 	/*
 	 * If the pte is present and the pfn is still the same, update the pte.
 	 * If the pfn has changed then this is a stale rmap entry, the nested
@@ -944,7 +1107,10 @@ static void kvmhv_invalidate_nest_rmap(struct kvm *kvm, u64 n_rmap,
 		return;
 
 	/* Find and invalidate the pte */
-	ptep = __find_linux_pte(gp->shadow_pgtable, gpa, NULL, &shift);
+	if (gp->radix)
+		ptep = __find_linux_pte(gp->shadow_pgtable, gpa, NULL, &shift);
+	else
+		ptep = NULL;	/* XXX TODO */
 	/* Don't spuriously invalidate ptes if the pfn has changed */
 	if (ptep && pte_present(*ptep) && ((pte_val(*ptep) & mask) == hpa))
 		kvmppc_unmap_pte(kvm, ptep, gpa, shift, NULL, gp->shadow_lpid);
@@ -1012,9 +1178,9 @@ static void kvmhv_free_memslot_nest_rmap(struct kvm_memory_slot *free)
 	}
 }
 
-static bool kvmhv_invalidate_shadow_pte(struct kvm_vcpu *vcpu,
-					struct kvm_nested_guest *gp,
-					long gpa, int *shift_ret)
+static bool kvmhv_invalidate_shadow_pte_radix(struct kvm_vcpu *vcpu,
+					      struct kvm_nested_guest *gp,
+					      long gpa, int *shift_ret)
 {
 	struct kvm *kvm = vcpu->kvm;
 	bool ret = false;
@@ -1079,6 +1245,7 @@ static int kvmhv_emulate_tlbie_tlb_addr(struct kvm_vcpu *vcpu, int lpid,
 	long npages;
 	int shift, shadow_shift;
 	unsigned long addr;
+	int rc = 0;
 
 	shift = ap_to_shift(ap);
 	addr = epn << 12;
@@ -1094,17 +1261,25 @@ static int kvmhv_emulate_tlbie_tlb_addr(struct kvm_vcpu *vcpu, int lpid,
 		return 0;
 	mutex_lock(&gp->tlb_lock);
 
+	/* XXX TODO hpt */
+	if (!gp->radix) {
+		rc = -EINVAL;
+		goto out_unlock;
+	}
+
 	/* There may be more than one host page backing this single guest pte */
 	do {
-		kvmhv_invalidate_shadow_pte(vcpu, gp, addr, &shadow_shift);
+		kvmhv_invalidate_shadow_pte_radix(vcpu, gp, addr,
+						  &shadow_shift);
 
 		npages -= 1UL << (shadow_shift - PAGE_SHIFT);
 		addr += 1UL << shadow_shift;
 	} while (npages > 0);
 
+out_unlock:
 	mutex_unlock(&gp->tlb_lock);
 	kvmhv_put_nested(gp);
-	return 0;
+	return rc;
 }
 
 static void kvmhv_emulate_tlbie_lpid(struct kvm_vcpu *vcpu,
@@ -1112,6 +1287,7 @@ static void kvmhv_emulate_tlbie_lpid(struct kvm_vcpu *vcpu,
 {
 	struct kvm *kvm = vcpu->kvm;
 
+	/* XXX TODO hpt */
 	mutex_lock(&gp->tlb_lock);
 	switch (ric) {
 	case 0:
@@ -1119,8 +1295,8 @@ static void kvmhv_emulate_tlbie_lpid(struct kvm_vcpu *vcpu,
 		spin_lock(&kvm->mmu_lock);
 		kvmppc_free_pgtable_radix(kvm, gp->shadow_pgtable,
 					  gp->shadow_lpid);
-		kvmhv_flush_lpid(gp->shadow_lpid);
 		spin_unlock(&kvm->mmu_lock);
+		kvmhv_flush_lpid(gp->shadow_lpid, gp->radix);
 		break;
 	case 1:
 		/*
@@ -1358,9 +1534,9 @@ static inline int kvmppc_radix_shift_to_level(int shift)
 }
 
 /* called with gp->tlb_lock held */
-static long int __kvmhv_nested_page_fault(struct kvm_run *run,
-					  struct kvm_vcpu *vcpu,
-					  struct kvm_nested_guest *gp)
+static long int __kvmhv_nested_page_fault_radix(struct kvm_run *run,
+						struct kvm_vcpu *vcpu,
+						struct kvm_nested_guest *gp)
 {
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_memory_slot *memslot;
@@ -1524,17 +1700,28 @@ static long int __kvmhv_nested_page_fault(struct kvm_run *run,
 	return ret;
 
  inval:
-	kvmhv_invalidate_shadow_pte(vcpu, gp, n_gpa, NULL);
+	kvmhv_invalidate_shadow_pte_radix(vcpu, gp, n_gpa, NULL);
 	return RESUME_GUEST;
 }
 
+/* called with gp->tlb_lock held */
+static long int __kvmhv_nested_page_fault_hash(struct kvm_run *run,
+					       struct kvm_vcpu *vcpu,
+					       struct kvm_nested_guest *gp)
+{
+	return -EINVAL;
+}
+
 long int kvmhv_nested_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu)
 {
 	struct kvm_nested_guest *gp = vcpu->arch.nested;
 	long int ret;
 
 	mutex_lock(&gp->tlb_lock);
-	ret = __kvmhv_nested_page_fault(run, vcpu, gp);
+	if (gp->radix)
+		ret = __kvmhv_nested_page_fault_radix(run, vcpu, gp);
+	else
+		ret = __kvmhv_nested_page_fault_hash(run, vcpu, gp);
 	mutex_unlock(&gp->tlb_lock);
 	return ret;
 }
-- 
2.13.6



* [PATCH 14/23] KVM: PPC: Book3S HV: Nested: Context switch slb for nested hpt guest
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: BOok3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (12 preceding siblings ...)
  2019-08-26  6:20 ` [PATCH 13/23] KVM: PPC: Book3S HV: Nested: Infrastructure for nested hpt guest setup Suraj Jitindar Singh
@ 2019-08-26  6:21 ` Suraj Jitindar Singh
  2019-10-24  4:48   ` Paul Mackerras
  2019-08-26  6:21 ` [PATCH 15/23] KVM: PPC: Book3S HV: Store lpcr and hdec_exp in the vcpu struct Suraj Jitindar Singh
                   ` (8 subsequent siblings)
  22 siblings, 1 reply; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:21 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

A version 2 of the H_ENTER_NESTED hcall was added with an argument to
specify the slb entries which should be used to run the nested guest.
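
For reference, the L0 handler below reads the slb pointer from gpr 6, so
an L1 hypervisor making the version 2 call would presumably pass a third
real-address argument alongside the existing two. The snippet below is
only an illustrative sketch of that calling convention, not code from
this series:

	/*
	 * Hypothetical L1-side call -- register usage (r4/r5/r6) is
	 * inferred from the L0 handler in this patch.
	 */
	l2_hv.version = 2;
	trap = plpar_hcall_norets(H_ENTER_NESTED, __pa(&l2_hv),
				  __pa(&l2_regs), __pa(&l2_slb));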

Add support for this version of the hcall structures to
kvmhv_enter_nested_guest() and context switch the slb when the nested
guest being run is a hpt (hash page table) guest.
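
The guest_slb structure itself is defined elsewhere in the series and is
not shown here; purely as an illustrative aid, its shape can be inferred
from the fields touched by byteswap_guest_slb() below (the per-entry
fields mirror the existing struct kvmppc_slb), roughly:

	/* Illustrative sketch only -- inferred from byteswap_guest_slb(),
	 * not the actual definition.
	 */
	struct guest_slb_entry {
		u64 esid, vsid;		/* segment and virtual segment ids */
		u64 orige, origv;	/* original slbmte operands */
		u32 valid, Ks, Kp, nx, large, tb, class;
		u8  base_page_size;	/* u8, so never byteswapped */
	};

	struct guest_slb {
		struct guest_slb_entry slb[64];
		u64 slb_max;
		u64 slb_nr;
	};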

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/kvm/book3s_hv_nested.c | 84 ++++++++++++++++++++++++++++++++++---
 1 file changed, 79 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 82690eafee77..883f8896ed60 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -104,6 +104,28 @@ static void byteswap_hv_regs(struct hv_guest_state *hr)
 	hr->ppr = swab64(hr->ppr);
 }
 
+static void byteswap_guest_slb(struct guest_slb *slbp)
+{
+	int i;
+
+	for (i = 0; i < 64; i++) {
+		slbp->slb[i].esid = swab64(slbp->slb[i].esid);
+		slbp->slb[i].vsid = swab64(slbp->slb[i].vsid);
+		slbp->slb[i].orige = swab64(slbp->slb[i].orige);
+		slbp->slb[i].origv = swab64(slbp->slb[i].origv);
+		slbp->slb[i].valid = swab32(slbp->slb[i].valid);
+		slbp->slb[i].Ks = swab32(slbp->slb[i].Ks);
+		slbp->slb[i].Kp = swab32(slbp->slb[i].Kp);
+		slbp->slb[i].nx = swab32(slbp->slb[i].nx);
+		slbp->slb[i].large = swab32(slbp->slb[i].large);
+		slbp->slb[i].tb = swab32(slbp->slb[i].tb);
+		slbp->slb[i].class = swab32(slbp->slb[i].class);
+		/* base_page_size is u8 thus no need to byteswap */
+	}
+	slbp->slb_max = swab64(slbp->slb_max);
+	slbp->slb_nr = swab64(slbp->slb_nr);
+}
+
 static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap,
 				 struct hv_guest_state *hr)
 {
@@ -238,12 +260,13 @@ static void kvmhv_nested_mmio_needed(struct kvm_vcpu *vcpu, u64 regs_ptr)
 
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 {
-	long int err, r;
+	long int err, r, ret = H_SUCCESS;
 	struct kvm_nested_guest *l2;
 	struct pt_regs l2_regs, saved_l1_regs;
 	struct hv_guest_state l2_hv, saved_l1_hv;
+	struct guest_slb *l2_slb = NULL, *saved_l1_slb = NULL;
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
-	u64 hv_ptr, regs_ptr;
+	u64 hv_ptr, regs_ptr, slb_ptr = 0UL;
 	u64 hdec_exp;
 	s64 delta_purr, delta_spurr, delta_ic, delta_vtb;
 	u64 mask;
@@ -261,7 +284,9 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 		return H_PARAMETER;
 	if (kvmppc_need_byteswap(vcpu))
 		byteswap_hv_regs(&l2_hv);
-	if (l2_hv.version != 1)
+	/* Do we support the guest version of the argument structures */
+	if ((l2_hv.version > HV_GUEST_STATE_MAX_VERSION) ||
+			(l2_hv.version < HV_GUEST_STATE_MIN_VERSION))
 		return H_P2;
 
 	regs_ptr = kvmppc_get_gpr(vcpu, 5);
@@ -296,6 +321,9 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 			return H_PARAMETER;
 	} else {
 		return H_PARAMETER;
+		/* must be at least V2 to support hpt guest */
+		if (l2_hv.version < 2)
+			return H_PARAMETER;
 		/* hpt doesn't support gtse or uprt and requires vpm */
 		if ((l2_hv.lpcr & LPCR_HR) || (l2_hv.lpcr & LPCR_GTSE) ||
 					      (l2_hv.lpcr & LPCR_UPRT) ||
@@ -307,6 +335,26 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 	vcpu->arch.regs.msr = vcpu->arch.shregs.msr;
 	saved_l1_regs = vcpu->arch.regs;
 	kvmhv_save_hv_regs(vcpu, &saved_l1_hv);
+	/* if running hpt then context switch the slb in the vcpu struct */
+	if (!radix) {
+		slb_ptr = kvmppc_get_gpr(vcpu, 6);
+		l2_slb = kzalloc(sizeof(*l2_slb), GFP_KERNEL);
+		saved_l1_slb = kzalloc(sizeof(*saved_l1_slb), GFP_KERNEL);
+
+		if ((!l2_slb) || (!saved_l1_slb)) {
+			ret = H_HARDWARE;
+			goto out_free;
+		}
+		err = kvm_vcpu_read_guest(vcpu, slb_ptr, l2_slb,
+					  sizeof(struct guest_slb));
+		if (err) {
+			ret = H_PARAMETER;
+			goto out_free;
+		}
+		if (kvmppc_need_byteswap(vcpu))
+			byteswap_guest_slb(l2_slb);
+		kvmhv_save_guest_slb(vcpu, saved_l1_slb);
+	}
 
 	/* convert TB values/offsets to host (L0) values */
 	hdec_exp = l2_hv.hdec_expiry - vc->tb_offset;
@@ -323,6 +371,8 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 	lpcr = (vc->lpcr & ~mask) | (l2_hv.lpcr & mask);
 	sanitise_hv_regs(vcpu, &l2_hv);
 	restore_hv_regs(vcpu, &l2_hv);
+	if (!radix)
+		kvmhv_restore_guest_slb(vcpu, l2_slb);
 
 	vcpu->arch.ret = RESUME_GUEST;
 	vcpu->arch.trap = 0;
@@ -332,8 +382,11 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 			r = RESUME_HOST;
 			break;
 		}
-		r = kvmhv_run_single_vcpu(vcpu->arch.kvm_run, vcpu, hdec_exp,
-					  lpcr);
+		if (radix)
+			r = kvmhv_run_single_vcpu(vcpu->arch.kvm_run, vcpu,
+						  hdec_exp, lpcr);
+		else
+			r = RESUME_HOST; /* XXX TODO hpt entry path */
 	} while (is_kvmppc_resume_guest(r));
 
 	/* save L2 state for return */
@@ -344,6 +397,8 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 	delta_ic = vcpu->arch.ic - l2_hv.ic;
 	delta_vtb = vc->vtb - l2_hv.vtb;
 	save_hv_return_state(vcpu, vcpu->arch.trap, &l2_hv);
+	if (!radix)
+		kvmhv_save_guest_slb(vcpu, l2_slb);
 
 	/* restore L1 state */
 	vcpu->arch.nested = NULL;
@@ -354,6 +409,8 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 		vcpu->arch.shregs.msr |= MSR_TS_S;
 	vc->tb_offset = saved_l1_hv.tb_offset;
 	restore_hv_regs(vcpu, &saved_l1_hv);
+	if (!radix)
+		kvmhv_restore_guest_slb(vcpu, saved_l1_slb);
 	vcpu->arch.purr += delta_purr;
 	vcpu->arch.spurr += delta_spurr;
 	vcpu->arch.ic += delta_ic;
@@ -363,9 +420,21 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 
 	/* copy l2_hv_state and regs back to guest */
 	if (kvmppc_need_byteswap(vcpu)) {
+		if (!radix)
+			byteswap_guest_slb(l2_slb);
 		byteswap_hv_regs(&l2_hv);
 		byteswap_pt_regs(&l2_regs);
 	}
+	if (!radix) {
+		err = kvm_vcpu_write_guest(vcpu, slb_ptr, l2_slb,
+					   sizeof(struct guest_slb));
+		if (err) {
+			ret = H_AUTHORITY;
+			goto out_free;
+		}
+		kfree(l2_slb);
+		kfree(saved_l1_slb);
+	}
 	err = kvm_vcpu_write_guest(vcpu, hv_ptr, &l2_hv,
 				   sizeof(struct hv_guest_state));
 	if (err)
@@ -384,6 +453,11 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 	}
 
 	return vcpu->arch.trap;
+
+out_free:
+	kfree(l2_slb);
+	kfree(saved_l1_slb);
+	return ret;
 }
 
 long kvmhv_nested_init(void)
-- 
2.13.6



* [PATCH 15/23] KVM: PPC: Book3S HV: Store lpcr and hdec_exp in the vcpu struct
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: BOok3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (13 preceding siblings ...)
  2019-08-26  6:21 ` [PATCH 14/23] KVM: PPC: Book3S HV: Nested: Context switch slb for nested hpt guest Suraj Jitindar Singh
@ 2019-08-26  6:21 ` Suraj Jitindar Singh
  2019-08-26  6:21 ` [PATCH 16/23] KVM: PPC: Book3S HV: Nested: Make kvmppc_run_vcpu() entry path nested capable Suraj Jitindar Singh
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:21 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

When running a single vcpu with kvmhv_run_single_vcpu() the lpcr and
hypervisor decrementer expiry are passed as function arguments. When
running a vcore with kvmppc_run_vcpu() the lpcr is taken from the vcore
and there is no need to consider the hypervisor decrementer expiry as it
only applies when running a nested guest.

These fields will need to be accessed in the guest entry path in
book3s_hv_rmhandlers.S when running a nested hpt (hash page table)
guest. To allow for this, store the lpcr and hdec_exp in the vcpu struct.

No functional change.
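
In other words, callers now stash the values in the vcpu before entry,
roughly as follows (see the book3s_hv.c hunk below):

	/* values now travel via the vcpu struct rather than as arguments */
	vcpu->arch.lpcr = vcpu->arch.vcore->lpcr;
	vcpu->arch.hdec_exp = ~(u64)0;	/* no nested hdec limit */
	r = kvmhv_run_single_vcpu(run, vcpu);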

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/include/asm/kvm_book3s.h |  3 +--
 arch/powerpc/include/asm/kvm_host.h   |  2 ++
 arch/powerpc/kvm/book3s_hv.c          | 40 +++++++++++++++++------------------
 arch/powerpc/kvm/book3s_hv_nested.c   | 10 ++++-----
 4 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 40218e81b75f..e1dc1872e453 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -314,8 +314,7 @@ void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1);
 void kvmhv_release_all_nested(struct kvm *kvm);
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
 long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu);
-int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu,
-			  u64 time_limit, unsigned long lpcr);
+int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
 void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
 void kvmhv_save_guest_slb(struct kvm_vcpu *vcpu, struct guest_slb *slbp);
 void kvmhv_restore_guest_slb(struct kvm_vcpu *vcpu, struct guest_slb *slbp);
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index bad09c213be6..b092701951ee 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -793,10 +793,12 @@ struct kvm_vcpu_arch {
 
 	u32 online;
 
+	unsigned long lpcr;
 	/* For support of nested guests */
 	struct kvm_nested_guest *nested;
 	u32 nested_vcpu_id;
 	gpa_t nested_io_gpr;
+	u64 hdec_exp;
 #endif
 
 #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index be72bc6b4cd5..8407071d5e22 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3429,8 +3429,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 /*
  * Handle making the H_ENTER_NESTED hcall if we're pseries.
  */
-static int kvmhv_pseries_enter_guest(struct kvm_vcpu *vcpu, u64 time_limit,
-				     unsigned long lpcr)
+static int kvmhv_pseries_enter_guest(struct kvm_vcpu *vcpu, u64 time_limit)
 {
 	/* call our hypervisor to load up HV regs and go */
 	struct hv_guest_state hvregs;
@@ -3454,7 +3453,7 @@ static int kvmhv_pseries_enter_guest(struct kvm_vcpu *vcpu, u64 time_limit,
 	host_psscr = mfspr(SPRN_PSSCR_PR);
 	mtspr(SPRN_PSSCR_PR, vcpu->arch.psscr);
 	kvmhv_save_hv_regs(vcpu, &hvregs);
-	hvregs.lpcr = lpcr;
+	hvregs.lpcr = vcpu->arch.lpcr;
 	vcpu->arch.regs.msr = vcpu->arch.shregs.msr;
 	if (vcpu->arch.nested) {
 		hvregs.lpid = vcpu->arch.nested->shadow_lpid;
@@ -3536,8 +3535,7 @@ static int kvmhv_pseries_enter_guest(struct kvm_vcpu *vcpu, u64 time_limit,
  * CPU_FTR_HVMODE is set. This is only used for radix guests, however that
  * radix guest may be a direct guest of this hypervisor or a nested guest.
  */
-static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
-				     unsigned long lpcr)
+static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit)
 {
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 	s64 hdec;
@@ -3594,7 +3592,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
 
 	mtspr(SPRN_AMOR, ~0UL);
 
-	mtspr(SPRN_LPCR, lpcr);
+	mtspr(SPRN_LPCR, vcpu->arch.lpcr);
 	isync();
 
 	kvmppc_xive_push_vcpu(vcpu);
@@ -3666,8 +3664,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
  * Virtual-mode guest entry for POWER9 and later when the host and
  * guest are both using the radix MMU.  The LPIDR has already been set.
  */
-int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
-			 unsigned long lpcr)
+int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu)
 {
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 	unsigned long host_dscr = mfspr(SPRN_DSCR);
@@ -3675,7 +3672,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
 	unsigned long host_iamr = mfspr(SPRN_IAMR);
 	unsigned long host_amr = mfspr(SPRN_AMR);
 	s64 dec;
-	u64 tb;
+	u64 tb, time_limit;
 	int trap, save_pmu;
 
 	dec = mfspr(SPRN_DEC);
@@ -3683,8 +3680,8 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
 	if (dec < 512)
 		return BOOK3S_INTERRUPT_HV_DECREMENTER;
 	local_paca->kvm_hstate.dec_expires = dec + tb;
-	if (local_paca->kvm_hstate.dec_expires < time_limit)
-		time_limit = local_paca->kvm_hstate.dec_expires;
+	time_limit = min_t(u64, local_paca->kvm_hstate.dec_expires,
+				vcpu->arch.hdec_exp);
 
 	vcpu->arch.ceded = 0;
 
@@ -3736,15 +3733,16 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
 	mtspr(SPRN_DEC, vcpu->arch.dec_expires - mftb());
 
 	if (kvmhv_on_pseries()) {
-		trap = kvmhv_pseries_enter_guest(vcpu, time_limit, lpcr);
+		trap = kvmhv_pseries_enter_guest(vcpu, time_limit);
 	} else {
-		trap = kvmhv_load_hv_regs_and_go(vcpu, time_limit, lpcr);
+		trap = kvmhv_load_hv_regs_and_go(vcpu, time_limit);
 	}
 
 	if (kvm_is_radix(vcpu->kvm))
 		vcpu->arch.slb_max = 0;
 	dec = mfspr(SPRN_DEC);
-	if (!(lpcr & LPCR_LD)) /* Sign extend if not using large decrementer */
+	/* Sign extend if not using large decrementer */
+	if (!(vcpu->arch.lpcr & LPCR_LD))
 		dec = (s32) dec;
 	tb = mftb();
 	vcpu->arch.dec_expires = dec + tb;
@@ -4145,9 +4143,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	return vcpu->arch.ret;
 }
 
-int kvmhv_run_single_vcpu(struct kvm_run *kvm_run,
-			  struct kvm_vcpu *vcpu, u64 time_limit,
-			  unsigned long lpcr)
+int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
 	int trap, r, pcpu;
 	int srcu_idx, lpid;
@@ -4206,7 +4202,7 @@ int kvmhv_run_single_vcpu(struct kvm_run *kvm_run,
 		}
 		if (test_bit(BOOK3S_IRQPRIO_EXTERNAL,
 			     &vcpu->arch.pending_exceptions))
-			lpcr |= LPCR_MER;
+			vcpu->arch.lpcr |= LPCR_MER;
 	} else if (vcpu->arch.pending_exceptions ||
 		   vcpu->arch.doorbell_request ||
 		   xive_interrupt_pending(vcpu)) {
@@ -4242,7 +4238,7 @@ int kvmhv_run_single_vcpu(struct kvm_run *kvm_run,
 	/* Tell lockdep that we're about to enable interrupts */
 	trace_hardirqs_on();
 
-	trap = kvmhv_p9_guest_entry(vcpu, time_limit, lpcr);
+	trap = kvmhv_p9_guest_entry(vcpu);
 	vcpu->arch.trap = trap;
 
 	trace_hardirqs_off();
@@ -4399,6 +4395,9 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu)
 	vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST;
 
 	do {
+		/* update vcpu->arch.lpcr in case a previous loop modified it */
+		vcpu->arch.lpcr = vcpu->arch.vcore->lpcr;
+		vcpu->arch.hdec_exp = ~(u64)0;
 		/*
 		 * The early POWER9 chips that can't mix radix and HPT threads
 		 * on the same core also need the workaround for the problem
@@ -4412,8 +4411,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu)
 		if (kvmhv_on_pseries() || (kvm->arch.threads_indep &&
 					   kvm_is_radix(kvm) &&
 					   !no_mixing_hpt_and_radix))
-			r = kvmhv_run_single_vcpu(run, vcpu, ~(u64)0,
-						  vcpu->arch.vcore->lpcr);
+			r = kvmhv_run_single_vcpu(run, vcpu);
 		else
 			r = kvmppc_run_vcpu(run, vcpu);
 
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 883f8896ed60..f80491e9ff97 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -267,7 +267,6 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 	struct guest_slb *l2_slb = NULL, *saved_l1_slb = NULL;
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 	u64 hv_ptr, regs_ptr, slb_ptr = 0UL;
-	u64 hdec_exp;
 	s64 delta_purr, delta_spurr, delta_ic, delta_vtb;
 	u64 mask;
 	unsigned long lpcr;
@@ -357,7 +356,7 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 	}
 
 	/* convert TB values/offsets to host (L0) values */
-	hdec_exp = l2_hv.hdec_expiry - vc->tb_offset;
+	vcpu->arch.hdec_exp = l2_hv.hdec_expiry - vc->tb_offset;
 	vc->tb_offset += l2_hv.tb_offset;
 
 	/* set L1 state to L2 state */
@@ -377,14 +376,15 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 	vcpu->arch.ret = RESUME_GUEST;
 	vcpu->arch.trap = 0;
 	do {
-		if (mftb() >= hdec_exp) {
+		if (mftb() >= vcpu->arch.hdec_exp) {
 			vcpu->arch.trap = BOOK3S_INTERRUPT_HV_DECREMENTER;
 			r = RESUME_HOST;
 			break;
 		}
+		/* update vcpu->arch.lpcr in case a previous loop modified it */
+		vcpu->arch.lpcr = lpcr;
 		if (radix)
-			r = kvmhv_run_single_vcpu(vcpu->arch.kvm_run, vcpu,
-						  hdec_exp, lpcr);
+			r = kvmhv_run_single_vcpu(vcpu->arch.kvm_run, vcpu);
 		else
 			r = RESUME_HOST; /* XXX TODO hpt entry path */
 	} while (is_kvmppc_resume_guest(r));
-- 
2.13.6



* [PATCH 16/23] KVM: PPC: Book3S HV: Nested: Make kvmppc_run_vcpu() entry path nested capable
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: BOok3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (14 preceding siblings ...)
  2019-08-26  6:21 ` [PATCH 15/23] KVM: PPC: Book3S HV: Store lpcr and hdec_exp in the vcpu struct Suraj Jitindar Singh
@ 2019-08-26  6:21 ` Suraj Jitindar Singh
  2019-08-26  6:21 ` [PATCH 17/23] KVM: PPC: Book3S HV: Nested: Rename kvmhv_xlate_addr_nested_radix Suraj Jitindar Singh
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:21 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

In order to run a hpt (hash page table) guest, the kvm entry path used
must enter real mode before loading up the guest mmu state. Currently
the only path which does this is kvmppc_run_vcpu(), which uses the
entry path in book3s_hv_rmhandlers.S; until now this path didn't
accommodate running a nested guest.

Have the nested hpt guest entry path call kvmppc_run_vcpu() and modify
the entry path in book3s_hv_rmhandlers.S to be able to run a nested
guest.
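
The nested guest entry dispatch in kvmhv_enter_nested_guest() then
becomes (as in the book3s_hv_nested.c hunk below):

	if (radix)
		r = kvmhv_run_single_vcpu(vcpu->arch.kvm_run, vcpu);
	else
		r = kvmppc_run_vcpu(vcpu->arch.kvm_run, vcpu);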

For the entry path this means loading the smaller of the guest
hypervisor decrementer and the host decrementer into the hypervisor
decrementer, since we want control back when either expires.
Additionally the correct LPID and LPCR must be loaded, and the guest
slb entries must be restored. When checking in
kvmppc_guest_entry_inject_int() whether an interrupt can be injected
into the guest, we return -1 if entering a nested guest while something
is pending for the L1 guest, to indicate that the nested guest
shouldn't be entered and control should be passed back to the L1 guest.
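
A rough C-level sketch of the decrementer selection now done in
assembly in book3s_hv_interrupts.S (host_dec here stands for the value
read from SPRN_DEC and is illustrative only):

	s64 hdec = host_dec;			/* normal, non-nested case */
	if (vcpu && vcpu->arch.nested) {
		s64 guest_hdec = vcpu->arch.hdec_exp - mftb();
		if (guest_hdec < hdec)
			hdec = guest_hdec;	/* want control back sooner */
	}
	mtspr(SPRN_HDEC, hdec);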

On the exit path we must save the guest slb entries to be returned to
the L1 guest hypervisor. Additionally the correct vrma_slb_v entry must
be loaded for kvmppc_hpte_hv_fault() if the guest was in real mode, and
the correct hpt must be used in kvmppc_hpte_hv_fault(). Finally, the
correct handle_exit function must be called depending on whether a
nested guest was being run.
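
For example, kvmppc_hpte_hv_fault() now selects which hpt to search as
follows (taken from the book3s_hv_rm_mmu.c hunk below):

	hpt = &kvm->arch.hpt;
	nested = vcpu->arch.nested;
	if (nested)
		hpt = &nested->shadow_hpt;
	...
	index = kvmppc_hv_find_lock_hpte(hpt, addr, slb_v, valid);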

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/include/asm/kvm_asm.h      |   5 ++
 arch/powerpc/include/asm/kvm_book3s.h   |   3 +-
 arch/powerpc/include/asm/kvm_ppc.h      |   2 +-
 arch/powerpc/kernel/asm-offsets.c       |   5 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c     |   2 +-
 arch/powerpc/kvm/book3s_hv.c            |  55 +++++++-------
 arch/powerpc/kvm/book3s_hv_builtin.c    |  33 ++++++---
 arch/powerpc/kvm/book3s_hv_interrupts.S |  25 ++++++-
 arch/powerpc/kvm/book3s_hv_nested.c     |   2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c     |  80 +++++++++++++++------
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 124 ++++++++++++++++++++++----------
 arch/powerpc/kvm/book3s_xive.h          |  15 ++++
 12 files changed, 252 insertions(+), 99 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index 635fb154b33f..83bfd74ce67c 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -104,6 +104,11 @@
  * completely in the guest.
  */
 #define BOOK3S_INTERRUPT_HV_RM_HARD	0x5555
+/*
+ * Special trap used when running a nested guest to communicate that control
+ * should be passed back to the L1 guest, e.g. an interrupt is pending.
+ */
+#define BOOK3S_INTERRUPT_HV_NEST_EXIT	0x5556
 
 #define BOOK3S_IRQPRIO_SYSTEM_RESET		0
 #define BOOK3S_IRQPRIO_DATA_SEGMENT		1
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index e1dc1872e453..f13dab096dad 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -158,7 +158,7 @@ extern void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu);
 extern int kvmppc_book3s_hv_page_fault(struct kvm_run *run,
 			struct kvm_vcpu *vcpu, unsigned long addr,
 			unsigned long status);
-extern long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr,
+extern long kvmppc_hv_find_lock_hpte(struct kvm_hpt_info *hpt, gva_t eaddr,
 			unsigned long slb_v, unsigned long valid);
 extern int kvmppc_hv_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu,
 			unsigned long gpa, gva_t ea, int is_store);
@@ -315,6 +315,7 @@ void kvmhv_release_all_nested(struct kvm *kvm);
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
 long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu);
 int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
+int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
 void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
 void kvmhv_save_guest_slb(struct kvm_vcpu *vcpu, struct guest_slb *slbp);
 void kvmhv_restore_guest_slb(struct kvm_vcpu *vcpu, struct guest_slb *slbp);
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 2c4d659cf8bb..46bbdc38b2c5 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -697,7 +697,7 @@ int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
                     unsigned long mfrr);
 int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr);
 int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr);
-void kvmppc_guest_entry_inject_int(struct kvm_vcpu *vcpu);
+int kvmppc_guest_entry_inject_int(struct kvm_vcpu *vcpu);
 
 /*
  * Host-side operations we want to set up while running in real
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 4ccb6b3a7fbd..7652ad430aab 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -511,9 +511,14 @@ int main(void)
 	OFFSET(VCPU_VPA, kvm_vcpu, arch.vpa.pinned_addr);
 	OFFSET(VCPU_VPA_DIRTY, kvm_vcpu, arch.vpa.dirty);
 	OFFSET(VCPU_HEIR, kvm_vcpu, arch.emul_inst);
+	OFFSET(VCPU_LPCR, kvm_vcpu, arch.lpcr);
 	OFFSET(VCPU_NESTED, kvm_vcpu, arch.nested);
+	OFFSET(VCPU_NESTED_LPID, kvm_nested_guest, shadow_lpid);
+	OFFSET(VCPU_NESTED_RADIX, kvm_nested_guest, radix);
+	OFFSET(VCPU_NESTED_VRMA_SLB_V, kvm_nested_guest, vrma_slb_v);
 	OFFSET(VCPU_CPU, kvm_vcpu, cpu);
 	OFFSET(VCPU_THREAD_CPU, kvm_vcpu, arch.thread_cpu);
+	OFFSET(VCPU_HDEC_EXP, kvm_vcpu, arch.hdec_exp);
 #endif
 #ifdef CONFIG_PPC_BOOK3S
 	OFFSET(VCPU_PURR, kvm_vcpu, arch.purr);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index bbb23b3f8bb9..2b30b48dce49 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -361,7 +361,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 
 	preempt_disable();
 	/* Find the HPTE in the hash table */
-	index = kvmppc_hv_find_lock_hpte(kvm, eaddr, slb_v,
+	index = kvmppc_hv_find_lock_hpte(&kvm->arch.hpt, eaddr, slb_v,
 					 HPTE_V_VALID | HPTE_V_ABSENT);
 	if (index < 0) {
 		preempt_enable();
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8407071d5e22..4020bb52fca7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -74,6 +74,7 @@
 #include <asm/hw_breakpoint.h>
 
 #include "book3s.h"
+#include "book3s_xive.h"
 
 #define CREATE_TRACE_POINTS
 #include "trace_hv.h"
@@ -1520,7 +1521,11 @@ static int kvmppc_handle_nested_exit(struct kvm_run *run, struct kvm_vcpu *vcpu)
 	/* We're good on these - the host merely wanted to get our attention */
 	case BOOK3S_INTERRUPT_HV_DECREMENTER:
 		vcpu->stat.dec_exits++;
-		r = RESUME_GUEST;
+		/* if the guest hdec has expired then it wants control back */
+		if (mftb() >= vcpu->arch.hdec_exp)
+			r = RESUME_HOST;
+		else
+			r = RESUME_GUEST;
 		break;
 	case BOOK3S_INTERRUPT_EXTERNAL:
 		vcpu->stat.ext_intr_exits++;
@@ -1583,6 +1588,15 @@ static int kvmppc_handle_nested_exit(struct kvm_run *run, struct kvm_vcpu *vcpu)
 		if (!xics_on_xive())
 			kvmppc_xics_rm_complete(vcpu, 0);
 		break;
+	case BOOK3S_INTERRUPT_HV_NEST_EXIT:
+		/*
+		 * Occurs on nested guest entry path to indicate that control
+		 * should be passed back to the L1 guest hypervisor,
+		 * e.g. because of a pending interrupt.
+		 */
+		vcpu->arch.trap = 0;
+		r = RESUME_HOST;
+		break;
 	default:
 		r = RESUME_HOST;
 		break;
@@ -2957,7 +2971,6 @@ static void post_guest_process(struct kvmppc_vcore *vc, bool is_master)
 {
 	int still_running = 0, i;
 	u64 now;
-	long ret;
 	struct kvm_vcpu *vcpu;
 
 	spin_lock(&vc->lock);
@@ -2978,13 +2991,16 @@ static void post_guest_process(struct kvmppc_vcore *vc, bool is_master)
 
 		trace_kvm_guest_exit(vcpu);
 
-		ret = RESUME_GUEST;
-		if (vcpu->arch.trap)
-			ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu,
-						    vcpu->arch.run_task);
-
-		vcpu->arch.ret = ret;
-		vcpu->arch.trap = 0;
+		vcpu->arch.ret = RESUME_GUEST;
+		if (vcpu->arch.trap) {
+			if (vcpu->arch.nested)
+				vcpu->arch.ret = kvmppc_handle_nested_exit(
+						 vcpu->arch.kvm_run, vcpu);
+			else
+				vcpu->arch.ret = kvmppc_handle_exit_hv(
+						 vcpu->arch.kvm_run, vcpu,
+						 vcpu->arch.run_task);
+		}
 
 		spin_lock(&vc->lock);
 		if (is_kvmppc_resume_guest(vcpu->arch.ret)) {
@@ -3297,6 +3313,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 			if (!vcpu->arch.ptid)
 				thr0_done = true;
 			active |= 1 << (thr + vcpu->arch.ptid);
+			vcpu->arch.trap = 0;
 		}
 		/*
 		 * We need to start the first thread of each subcore
@@ -3847,21 +3864,6 @@ static void shrink_halt_poll_ns(struct kvmppc_vcore *vc)
 		vc->halt_poll_ns /= halt_poll_ns_shrink;
 }
 
-#ifdef CONFIG_KVM_XICS
-static inline bool xive_interrupt_pending(struct kvm_vcpu *vcpu)
-{
-	if (!xics_on_xive())
-		return false;
-	return vcpu->arch.irq_pending || vcpu->arch.xive_saved_state.pipr <
-		vcpu->arch.xive_saved_state.cppr;
-}
-#else
-static inline bool xive_interrupt_pending(struct kvm_vcpu *vcpu)
-{
-	return false;
-}
-#endif /* CONFIG_KVM_XICS */
-
 static bool kvmppc_vcpu_woken(struct kvm_vcpu *vcpu)
 {
 	if (vcpu->arch.pending_exceptions || vcpu->arch.prodded ||
@@ -4013,7 +4015,7 @@ static int kvmhv_setup_mmu(struct kvm_vcpu *vcpu)
 	return r;
 }
 
-static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
+int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
 	int n_ceded, i, r;
 	struct kvmppc_vcore *vc;
@@ -4082,7 +4084,8 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 			continue;
 		}
 		for_each_runnable_thread(i, v, vc) {
-			kvmppc_core_prepare_to_enter(v);
+			if (!vcpu->arch.nested)
+				kvmppc_core_prepare_to_enter(v);
 			if (signal_pending(v->arch.run_task)) {
 				kvmppc_remove_runnable(vc, v);
 				v->stat.signal_exits++;
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 7c1909657b55..049c3111b530 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -759,26 +759,40 @@ void kvmhv_p9_restore_lpcr(struct kvm_split_mode *sip)
  * Is there a PRIV_DOORBELL pending for the guest (on POWER9)?
  * Can we inject a Decrementer or a External interrupt?
  */
-void kvmppc_guest_entry_inject_int(struct kvm_vcpu *vcpu)
+int kvmppc_guest_entry_inject_int(struct kvm_vcpu *vcpu)
 {
 	int ext;
 	unsigned long vec = 0;
-	unsigned long lpcr;
+	unsigned long old_lpcr, lpcr;
+
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	/*
+	 * Don't enter a nested guest if there is something pending for this
+	 * vcpu for the l1 guest. Return -1 to indicate this.
+	 */
+	if (vcpu->arch.nested && (vcpu->arch.pending_exceptions ||
+				  vcpu->arch.prodded ||
+				  vcpu->arch.doorbell_request ||
+				  xive_interrupt_pending(vcpu)))
+		return -1;
+#endif
 
 	/* Insert EXTERNAL bit into LPCR at the MER bit position */
 	ext = (vcpu->arch.pending_exceptions >> BOOK3S_IRQPRIO_EXTERNAL) & 1;
-	lpcr = mfspr(SPRN_LPCR);
-	lpcr |= ext << LPCR_MER_SH;
-	mtspr(SPRN_LPCR, lpcr);
-	isync();
+	old_lpcr = mfspr(SPRN_LPCR);
+	lpcr = old_lpcr | (ext << LPCR_MER_SH);
+	if (lpcr != old_lpcr) {
+		mtspr(SPRN_LPCR, lpcr);
+		isync();
+	}
 
 	if (vcpu->arch.shregs.msr & MSR_EE) {
 		if (ext) {
 			vec = BOOK3S_INTERRUPT_EXTERNAL;
 		} else {
-			long int dec = mfspr(SPRN_DEC);
+			s64 dec = mfspr(SPRN_DEC);
 			if (!(lpcr & LPCR_LD))
-				dec = (int) dec;
+				dec = (s32) dec;
 			if (dec < 0)
 				vec = BOOK3S_INTERRUPT_DECREMENTER;
 		}
@@ -795,12 +809,13 @@ void kvmppc_guest_entry_inject_int(struct kvm_vcpu *vcpu)
 		vcpu->arch.shregs.msr = msr;
 	}
 
-	if (vcpu->arch.doorbell_request) {
+	if (cpu_has_feature(CPU_FTR_ARCH_300) && vcpu->arch.doorbell_request) {
 		mtspr(SPRN_DPDES, 1);
 		vcpu->arch.vcore->dpdes = 1;
 		smp_wmb();
 		vcpu->arch.doorbell_request = 0;
 	}
+	return 0;
 }
 
 static void flush_guest_tlb(struct kvm *kvm)
diff --git a/arch/powerpc/kvm/book3s_hv_interrupts.S b/arch/powerpc/kvm/book3s_hv_interrupts.S
index 63fd81f3039d..624f9951731d 100644
--- a/arch/powerpc/kvm/book3s_hv_interrupts.S
+++ b/arch/powerpc/kvm/book3s_hv_interrupts.S
@@ -58,10 +58,20 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	/*
 	 * Put whatever is in the decrementer into the
 	 * hypervisor decrementer.
+	 * If running a nested guest then put the lower of the host decrementer
+	 * and the guest hypervisor decrementer into the hypervisor decrementer
+	 * since we want control back from the nested guest when either expires.
 	 */
 BEGIN_FTR_SECTION
 	ld	r5, HSTATE_KVM_VCORE(r13)
-	ld	r6, VCORE_KVM(r5)
+	ld	r6, HSTATE_KVM_VCPU(r13)
+	cmpdi   cr1, r6, 0              /* Do we actually have a vcpu? */
+	beq     cr1, 33f
+	ld      r7, VCPU_NESTED(r6)
+	cmpdi   cr1, r7, 0              /* Do we have a nested guest? */
+	beq     cr1, 33f
+	ld      r10, VCPU_HDEC_EXP(r6)  /* If so load the hdec expiry */
+33:	ld	r6, VCORE_KVM(r5)
 	ld	r9, KVM_HOST_LPCR(r6)
 	andis.	r9, r9, LPCR_LD@h
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
@@ -72,8 +82,17 @@ BEGIN_FTR_SECTION
 	bne	32f
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	extsw	r8,r8
-32:	mtspr	SPRN_HDEC,r8
-	add	r8,r8,r7
+BEGIN_FTR_SECTION
+32:	beq     cr1, 34f		/* did we load hdec expiry above? */
+	subf    r10, r7, r10		/* r10 = guest_hdec = hdec_exp - tb */
+	cmpd    r8, r10			/* host decrementer < hdec? */
+	ble     34f
+	mtspr   SPRN_HDEC, r10		/* put guest_hdec into the hv decr */
+	b       35f
+34:
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
+	mtspr	SPRN_HDEC,r8		/* put host decr into hv decr */
+35:	add	r8,r8,r7
 	std	r8,HSTATE_DECEXP(r13)
 
 	/* Jump to partition switch code */
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index f80491e9ff97..54d6ff0bee5b 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -386,7 +386,7 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 		if (radix)
 			r = kvmhv_run_single_vcpu(vcpu->arch.kvm_run, vcpu);
 		else
-			r = RESUME_HOST; /* XXX TODO hpt entry path */
+			r = kvmppc_run_vcpu(vcpu->arch.kvm_run, vcpu);
 	} while (is_kvmppc_resume_guest(r));
 
 	/* save L2 state for return */
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 53fe51d04d78..a939782d8a5e 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -1166,8 +1166,8 @@ static struct mmio_hpte_cache_entry *
  * preempt_disable(), otherwise, the holding of HPTE_V_HVLOCK
  * can trigger deadlock issue.
  */
-long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
-			      unsigned long valid)
+long kvmppc_hv_find_lock_hpte(struct kvm_hpt_info *hpt, gva_t eaddr,
+			     unsigned long slb_v, unsigned long valid)
 {
 	unsigned int i;
 	unsigned int pshift;
@@ -1195,7 +1195,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 		somask = (1UL << 28) - 1;
 		vsid = (slb_v & ~SLB_VSID_B) >> SLB_VSID_SHIFT;
 	}
-	hash = (vsid ^ ((eaddr & somask) >> pshift)) & kvmppc_hpt_mask(&kvm->arch.hpt);
+	hash = (vsid ^ ((eaddr & somask) >> pshift)) & kvmppc_hpt_mask(hpt);
 	avpn = slb_v & ~(somask >> 16);	/* also includes B */
 	avpn |= (eaddr & somask) >> 16;
 
@@ -1206,7 +1206,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 	val |= avpn;
 
 	for (;;) {
-		hpte = (__be64 *)(kvm->arch.hpt.virt + (hash << 7));
+		hpte = (__be64 *)(hpt->virt + (hash << 7));
 
 		for (i = 0; i < 16; i += 2) {
 			/* Read the PTE racily */
@@ -1242,7 +1242,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 		if (val & HPTE_V_SECONDARY)
 			break;
 		val |= HPTE_V_SECONDARY;
-		hash = hash ^ kvmppc_hpt_mask(&kvm->arch.hpt);
+		hash = hash ^ kvmppc_hpt_mask(hpt);
 	}
 	return -1;
 }
@@ -1265,7 +1265,9 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 			  unsigned long slb_v, unsigned int status,
 			  bool data, bool is_realmode)
 {
+	struct kvm_nested_guest *nested;
 	struct kvm *kvm = vcpu->kvm;
+	struct kvm_hpt_info *hpt;
 	long int index;
 	unsigned long v, r, gr, orig_v;
 	__be64 *hpte;
@@ -1275,12 +1277,20 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 	struct mmio_hpte_cache_entry *cache_entry = NULL;
 	long mmio_update = 0;
 
+	hpt = &kvm->arch.hpt;
+	nested = vcpu->arch.nested;
+	if (nested)
+		hpt = &nested->shadow_hpt;
+
 	/* For protection fault, expect to find a valid HPTE */
 	valid = HPTE_V_VALID;
 	if (status & DSISR_NOHPTE) {
 		valid |= HPTE_V_ABSENT;
-		mmio_update = atomic64_read(&kvm->arch.mmio_update);
-		cache_entry = mmio_cache_search(vcpu, addr, slb_v, mmio_update);
+		if (!nested) {
+			mmio_update = atomic64_read(&kvm->arch.mmio_update);
+			cache_entry = mmio_cache_search(vcpu, addr, slb_v,
+							mmio_update);
+		}
 	}
 	if (cache_entry) {
 		index = cache_entry->pte_index;
@@ -1288,20 +1298,26 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 		r = cache_entry->hpte_r;
 		gr = cache_entry->rpte;
 	} else {
-		index = kvmppc_hv_find_lock_hpte(kvm, addr, slb_v, valid);
+		index = kvmppc_hv_find_lock_hpte(hpt, addr, slb_v, valid);
 		if (index < 0) {
-			if (status & DSISR_NOHPTE)
+			if (status & DSISR_NOHPTE) {
+				if (nested) {
+					/* have to look for HPTE in L1's HPT */
+					vcpu->arch.pgfault_index = index;
+					return -1;
+				}
 				return status;	/* there really was no HPTE */
+			}
 			return 0;	/* for prot fault, HPTE disappeared */
 		}
-		hpte = (__be64 *)(kvm->arch.hpt.virt + (index << 4));
+		hpte = (__be64 *)(hpt->virt + (index << 4));
 		v = orig_v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
 		r = be64_to_cpu(hpte[1]);
 		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
 			v = hpte_new_to_old_v(v, r);
 			r = hpte_new_to_old_r(r);
 		}
-		rev = &kvm->arch.hpt.rev[index];
+		rev = &hpt->rev[index];
 		if (is_realmode)
 			rev = real_vmalloc_addr(rev);
 		gr = rev->guest_rpte;
@@ -1318,17 +1334,25 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 	key = (vcpu->arch.shregs.msr & MSR_PR) ? SLB_VSID_KP : SLB_VSID_KS;
 	status &= ~DSISR_NOHPTE;	/* DSISR_NOHPTE == SRR1_ISI_NOPT */
 	if (!data) {
-		if (gr & (HPTE_R_N | HPTE_R_G))
-			return status | SRR1_ISI_N_OR_G;
-		if (!hpte_read_permission(pp, slb_v & key))
-			return status | SRR1_ISI_PROT;
+		if (gr & (HPTE_R_N | HPTE_R_G)) {
+			status |= SRR1_ISI_N_OR_G;
+			goto forward_to_guest;
+		}
+		if (!hpte_read_permission(pp, slb_v & key)) {
+			status |= SRR1_ISI_PROT;
+			goto forward_to_guest;
+		}
 	} else if (status & DSISR_ISSTORE) {
 		/* check write permission */
-		if (!hpte_write_permission(pp, slb_v & key))
-			return status | DSISR_PROTFAULT;
+		if (!hpte_write_permission(pp, slb_v & key)) {
+			status |= DSISR_PROTFAULT;
+			goto forward_to_guest;
+		}
 	} else {
-		if (!hpte_read_permission(pp, slb_v & key))
-			return status | DSISR_PROTFAULT;
+		if (!hpte_read_permission(pp, slb_v & key)) {
+			status |= DSISR_PROTFAULT;
+			goto forward_to_guest;
+		}
 	}
 
 	/* Check storage key, if applicable */
@@ -1343,13 +1367,14 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 	/* Save HPTE info for virtual-mode handler */
 	vcpu->arch.pgfault_addr = addr;
 	vcpu->arch.pgfault_index = index;
+
 	vcpu->arch.pgfault_hpte[0] = v;
 	vcpu->arch.pgfault_hpte[1] = r;
 	vcpu->arch.pgfault_cache = cache_entry;
 
 	/* Check the storage key to see if it is possibly emulated MMIO */
-	if ((r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) ==
-	    (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) {
+	if (!nested && (r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) ==
+			    (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) {
 		if (!cache_entry) {
 			unsigned int pshift = 12;
 			unsigned int pshift_index;
@@ -1373,5 +1398,18 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 	}
 
 	return -1;		/* send fault up to host kernel mode */
+
+forward_to_guest:
+	if (nested) {
+		/*
+		 * This was technically caused by missing permissions in the L1
+		 * pte, go up to the virtual mode handler so we can forward
+		 * this interrupt to L1.
+		 */
+		vcpu->arch.pgfault_index = -1;
+		vcpu->arch.fault_dsisr = status;
+		return -1;
+	}
+	return status;
 }
 EXPORT_SYMBOL_GPL(kvmppc_hpte_hv_fault);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 54e1864d4702..43cdd9f7fab5 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -606,15 +606,29 @@ kvmppc_hv_entry:
 	cmpwi	r6,0
 	bne	10f
 
-	lwz	r7,KVM_LPID(r9)
+	/* Load guest lpid (on P9 need to check if running a nested guest) */
 BEGIN_FTR_SECTION
+	cmpdi	r4, 0			/* do we have a vcpu? */
+	beq	19f
+	ld	r5, VCPU_NESTED(r4)	/* vcpu running nested guest? */
+	cmpdi	cr2, r5, 0		/* use cr2 as indication of nested */
+	/*
+	 * If we're using this entry path for a nested guest that nested guest
+	 * must be hash, otherwise we'd have used __kvmhv_vcpu_entry_p9.
+	 */
+	beq	cr2, 19f
+	ld	r7, VCPU_NESTED_LPID(r5)
+	b	20f
+19:
+FTR_SECTION_ELSE
 	ld	r6,KVM_SDR1(r9)
 	li	r0,LPID_RSVD		/* switch to reserved LPID */
 	mtspr	SPRN_LPID,r0
 	ptesync
 	mtspr	SPRN_SDR1,r6		/* switch to partition page table */
-END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
-	mtspr	SPRN_LPID,r7
+ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
+	lwz	r7,KVM_LPID(r9)
+20:	mtspr	SPRN_LPID,r7
 	isync
 
 	/* See if we need to flush the TLB. */
@@ -892,7 +906,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
 	HMT_MEDIUM
 21:
 	/* Set LPCR. */
-	ld	r8,VCORE_LPCR(r5)
+	ld	r8,VCPU_LPCR(r4)
 	mtspr	SPRN_LPCR,r8
 	isync
 
@@ -915,10 +929,14 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
 	blt	hdec_soon
 
 	/* For hash guest, clear out and reload the SLB */
+BEGIN_FTR_SECTION
+	bne	cr2, 10f		/* cr2 indicates nested -> hash */
 	ld	r6, VCPU_KVM(r4)
 	lbz	r0, KVM_RADIX(r6)
 	cmpwi	r0, 0
 	bne	9f
+10:
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	li	r6, 0
 	slbmte	r6, r6
 	slbia
@@ -1018,19 +1036,18 @@ no_xive:
 	stw	r0, STACK_SLOT_SHORT_PATH(r1)
 
 deliver_guest_interrupt:	/* r4 = vcpu, r13 = paca */
-	/* Check if we can deliver an external or decrementer interrupt now */
-	ld	r0, VCPU_PENDING_EXC(r4)
-BEGIN_FTR_SECTION
-	/* On POWER9, also check for emulated doorbell interrupt */
-	lbz	r3, VCPU_DBELL_REQ(r4)
-	or	r0, r0, r3
-END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
-	cmpdi	r0, 0
-	beq	71f
+	/* Check if we can deliver external/decrementer/dbell interrupt now */
 	mr	r3, r4
 	bl	kvmppc_guest_entry_inject_int
-	ld	r4, HSTATE_KVM_VCPU(r13)
+	cmpdi	r3, 0
+	beq	71f
+	/* kvmppc_guest_entry_inject_int returned -1 don't enter nested guest */
+	ld	r9, HSTATE_KVM_VCPU(r13)
+	li	r12, BOOK3S_INTERRUPT_HV_NEST_EXIT
+	b	guest_exit_cont
+
 71:
+	ld	r4, HSTATE_KVM_VCPU(r13)
 	ld	r6, VCPU_SRR0(r4)
 	ld	r7, VCPU_SRR1(r4)
 	mtspr	SPRN_SRR0, r6
@@ -1462,11 +1479,17 @@ guest_exit_cont:		/* r9 = vcpu, r12 = trap, r13 = paca */
 	bne	guest_exit_short_path
 
 	/* For hash guest, read the guest SLB and save it away */
-	ld	r5, VCPU_KVM(r9)
-	lbz	r0, KVM_RADIX(r5)
 	li	r5, 0
+BEGIN_FTR_SECTION
+	ld	r6, VCPU_NESTED(r9)	/* vcpu running nested guest? */
+	cmpdi	r6, 0
+	bne	4f			/* must be hash if we're nested */
+	ld	r7, VCPU_KVM(r9)
+	lbz	r0, KVM_RADIX(r7)
 	cmpwi	r0, 0
 	bne	3f			/* for radix, save 0 entries */
+4:
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	lwz	r0,VCPU_SLB_NR(r9)	/* number of entries in SLB */
 	mtctr	r0
 	li	r6,0
@@ -1517,7 +1540,7 @@ guest_bypass:
 	mftb	r6
 	/* On P9, if the guest has large decr enabled, don't sign extend */
 BEGIN_FTR_SECTION
-	ld	r4, VCORE_LPCR(r3)
+	ld	r4, VCPU_LPCR(r9)
 	andis.	r4, r4, LPCR_LD@h
 	bne	16f
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
@@ -1749,6 +1772,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	/*
 	 * Are we running hash or radix ?
 	 */
+	ld	r6, VCPU_NESTED(r9)	/* vcpu running nested guest? */
+	cmpdi	r6, 0
+	bne	2f			/* must be hash if we're nested */
 	ld	r5, VCPU_KVM(r9)
 	lbz	r0, KVM_RADIX(r5)
 	cmpwi	cr2, r0, 0
@@ -2036,22 +2062,38 @@ kvmppc_tm_emul:
  * reflect the HDSI to the guest as a DSI.
  */
 kvmppc_hdsi:
-	ld	r3, VCPU_KVM(r9)
-	lbz	r0, KVM_RADIX(r3)
 	mfspr	r4, SPRN_HDAR
 	mfspr	r6, SPRN_HDSISR
 BEGIN_FTR_SECTION
 	/* Look for DSISR canary. If we find it, retry instruction */
 	cmpdi	r6, 0x7fff
 	beq	6f
+	/* Are we hash or radix? */
+	ld	r3, VCPU_NESTED(r9)
+	cmpdi	cr2, r3, 0
+	beq	cr2, 10f
+	lbz	r0, VCPU_NESTED_RADIX(r3)	/* nested check nested->radix */
+	b	11f
+10:	ld      r5, VCPU_KVM(r9)
+	lbz     r0, KVM_RADIX(r5)	/* !nested check kvm->arch.radix */
+11:	cmpwi	r0, 0
+	bne	.Lradix_hdsi            /* on radix, just save DAR/DSISR/ASDR */
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
-	cmpwi	r0, 0
-	bne	.Lradix_hdsi		/* on radix, just save DAR/DSISR/ASDR */
 	/* HPTE not found fault or protection fault? */
 	andis.	r0, r6, (DSISR_NOHPTE | DSISR_PROTFAULT)@h
 	beq	1f			/* if not, send it to the guest */
 	andi.	r0, r11, MSR_DR		/* data relocation enabled? */
-	beq	3f
+	bne	3f
+	/* not relocated, load the VRMA_SLB_V for kvmppc_hpte_hv_fault() */
+BEGIN_FTR_SECTION
+	beq	cr2, 12f			/* cr2 indicates nested */
+	ld	r5, VCPU_NESTED_VRMA_SLB_V(r3)	/* r3 = nested (loaded above) */
+	b	4f
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
+	ld	r5, VCPU_KVM(r9)
+12:	ld	r5, KVM_VRMA_SLB_V(r5)
+	b	4f
+3:
 BEGIN_FTR_SECTION
 	mfspr	r5, SPRN_ASDR		/* on POWER9, use ASDR to get VSID */
 	b	4f
@@ -2097,10 +2139,6 @@ fast_interrupt_c_return:
 	mr	r4, r9
 	b	fast_guest_return
 
-3:	ld	r5, VCPU_KVM(r9)	/* not relocated, use VRMA */
-	ld	r5, KVM_VRMA_SLB_V(r5)
-	b	4b
-
 	/* If this is for emulated MMIO, load the instruction word */
 2:	li	r8, KVM_INST_FETCH_FAILED	/* In case lwz faults */
 
@@ -2137,14 +2175,32 @@ fast_interrupt_c_return:
  * it is an HPTE not found fault for a page that we have paged out.
  */
 kvmppc_hisi:
-	ld	r3, VCPU_KVM(r9)
-	lbz	r0, KVM_RADIX(r3)
-	cmpwi	r0, 0
+BEGIN_FTR_SECTION
+	/* Are we hash or radix? */
+	ld	r3, VCPU_NESTED(r9)
+	cmpdi	cr2, r3, 0
+	beq	cr2, 10f
+	lbz	r0, VCPU_NESTED_RADIX(r3)	/* nested check nested->radix */
+	b	11f
+10:	ld      r6, VCPU_KVM(r9)
+	lbz     r0, KVM_RADIX(r6)	/* !nested check kvm->arch.radix */
+11:	cmpwi	r0, 0
 	bne	.Lradix_hisi		/* for radix, just save ASDR */
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	andis.	r0, r11, SRR1_ISI_NOPT@h
 	beq	1f
 	andi.	r0, r11, MSR_IR		/* instruction relocation enabled? */
-	beq	3f
+	bne	3f
+	/* not relocated, load the VRMA_SLB_V for kvmppc_hpte_hv_fault() */
+BEGIN_FTR_SECTION
+	beq	cr2, 12f			/* cr2 indicates nested */
+	ld	r5, VCPU_NESTED_VRMA_SLB_V(r3)	/* r3 = nested (loaded above) */
+	b	4f
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
+	ld	r6, VCPU_KVM(r9)
+12:	ld	r5, KVM_VRMA_SLB_V(r6)
+	b	4f
+3:
 BEGIN_FTR_SECTION
 	mfspr	r5, SPRN_ASDR		/* on POWER9, use ASDR to get VSID */
 	b	4f
@@ -2179,10 +2235,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	bl	kvmppc_msr_interrupt
 	b	fast_interrupt_c_return
 
-3:	ld	r6, VCPU_KVM(r9)	/* not relocated, use VRMA */
-	ld	r5, KVM_VRMA_SLB_V(r6)
-	b	4b
-
 /*
  * Try to handle an hcall in real mode.
  * Returns to the guest if we handle it, or continues on up to
@@ -2624,8 +2676,8 @@ END_FTR_SECTION(CPU_FTR_TM | CPU_FTR_P9_TM_HV_ASSIST, 0)
 	mftb	r5
 BEGIN_FTR_SECTION
 	/* On P9 check whether the guest has large decrementer mode enabled */
-	ld	r6, HSTATE_KVM_VCORE(r13)
-	ld	r6, VCORE_LPCR(r6)
+	ld	r6, HSTATE_KVM_VCPU(r13)
+	ld	r6, VCPU_LPCR(r6)
 	andis.	r6, r6, LPCR_LD@h
 	bne	68f
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
index 50494d0ee375..d6f10d7ec4d2 100644
--- a/arch/powerpc/kvm/book3s_xive.h
+++ b/arch/powerpc/kvm/book3s_xive.h
@@ -283,5 +283,20 @@ int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
 				  bool single_escalation);
 struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type);
 
+static inline bool xive_interrupt_pending(struct kvm_vcpu *vcpu)
+{
+	if (!xics_on_xive())
+		return false;
+	return vcpu->arch.irq_pending || vcpu->arch.xive_saved_state.pipr <
+		vcpu->arch.xive_saved_state.cppr;
+}
+
+#else /* !CONFIG_KVM_XICS */
+
+static inline bool xive_interrupt_pending(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 #endif /* CONFIG_KVM_XICS */
 #endif /* _KVM_PPC_BOOK3S_XICS_H */
-- 
2.13.6



* [PATCH 17/23] KVM: PPC: Book3S HV: Nested: Rename kvmhv_xlate_addr_nested_radix
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: BOok3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (15 preceding siblings ...)
  2019-08-26  6:21 ` [PATCH 16/23] KVM: PPC: Book3S HV: Nested: Make kvmppc_run_vcpu() entry path nested capable Suraj Jitindar Singh
@ 2019-08-26  6:21 ` Suraj Jitindar Singh
  2019-08-26  6:21 ` [PATCH 18/23] KVM: PPC: Book3S HV: Separate out hashing from kvmppc_hv_find_lock_hpte() Suraj Jitindar Singh
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:21 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

Rename kvmhv_translate_addr_nested() to kvmhv_xlate_addr_nested_radix()
for clarity, since a hash variant will need to be added.

Additionally, if we can't permit an access because a nested guest is
writing to a page whose memslot has the KVM_MEM_READONLY flag set, then
inject a storage interrupt into the nested guest; there's not much else
which can be done in this case. For now this is the only place where an
interrupt is injected directly into the nested guest by the L0
hypervisor without involving the L1 guest hypervisor.
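
Concretely, the read-only memslot case in the nested page fault path
ends up doing the following (see the hunk below):

	if (memslot->flags & KVM_MEM_READONLY) {
		if (writing) {
			/* L0 gives the nested guest a DSI directly */
			kvmhv_inject_nested_storage_int(vcpu, data, writing, ea,
							DSISR_PROTFAULT);
			return RESUME_GUEST;
		}
		kvm_ro = true;
	}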

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/kvm/book3s.c           |  1 +
 arch/powerpc/kvm/book3s_hv_nested.c | 81 +++++++++++++++++++++++++++----------
 2 files changed, 61 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 9524d92bc45d..ff6586f1ed6f 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -140,6 +140,7 @@ void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags)
 	kvmppc_set_pc(vcpu, kvmppc_interrupt_offset(vcpu) + vec);
 	vcpu->arch.mmu.reset_msr(vcpu);
 }
+EXPORT_SYMBOL_GPL(kvmppc_inject_interrupt);
 
 static int kvmppc_book3s_vec2irqprio(unsigned int vec)
 {
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 54d6ff0bee5b..8ed50d4bd9a6 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -258,6 +258,14 @@ static void kvmhv_nested_mmio_needed(struct kvm_vcpu *vcpu, u64 regs_ptr)
 	}
 }
 
+static void kvmhv_update_intr_msr(struct kvm_vcpu *vcpu, unsigned long lpcr)
+{
+	if (lpcr & LPCR_ILE)
+		vcpu->arch.intr_msr |= MSR_LE;
+	else
+		vcpu->arch.intr_msr &= ~MSR_LE;
+}
+
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 {
 	long int err, r, ret = H_SUCCESS;
@@ -368,6 +376,7 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 		LPCR_LPES | LPCR_MER | LPCR_HR | LPCR_GTSE | LPCR_UPRT |
 		LPCR_VPM1;
 	lpcr = (vc->lpcr & ~mask) | (l2_hv.lpcr & mask);
+	kvmhv_update_intr_msr(vcpu, lpcr);
 	sanitise_hv_regs(vcpu, &l2_hv);
 	restore_hv_regs(vcpu, &l2_hv);
 	if (!radix)
@@ -409,6 +418,7 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 		vcpu->arch.shregs.msr |= MSR_TS_S;
 	vc->tb_offset = saved_l1_hv.tb_offset;
 	restore_hv_regs(vcpu, &saved_l1_hv);
+	kvmhv_update_intr_msr(vcpu, vc->lpcr);
 	if (!radix)
 		kvmhv_restore_guest_slb(vcpu, saved_l1_slb);
 	vcpu->arch.purr += delta_purr;
@@ -1480,13 +1490,38 @@ long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu)
 	return H_SUCCESS;
 }
 
-/* Used to convert a nested guest real address to a L1 guest real address */
-static int kvmhv_translate_addr_nested(struct kvm_vcpu *vcpu,
-				       struct kvm_nested_guest *gp,
-				       unsigned long n_gpa, unsigned long dsisr,
-				       struct kvmppc_pte *gpte_p)
+/*
+ * Inject a storage interrupt (instruction or data) to the nested guest.
+ *
+ * Normally don't inject interrupts to the nested guest directly but
+ * instead let its guest hypervisor handle injecting interrupts. However
+ * there are cases where the guest hypervisor is providing access to a page
+ * but the level 0 hypervisor is not, and in this case we need to inject an
+ * interrupt directly.
+ */
+static void kvmhv_inject_nested_storage_int(struct kvm_vcpu *vcpu, bool data,
+					    bool writing, u64 addr, u64 flags)
 {
-	u64 fault_addr, flags = dsisr & DSISR_ISSTORE;
+	int vec = BOOK3S_INTERRUPT_INST_STORAGE;
+
+	if (writing)
+		flags |= DSISR_ISSTORE;
+	if (data) {
+		vec = BOOK3S_INTERRUPT_DATA_STORAGE;
+		kvmppc_set_dar(vcpu, addr);
+		kvmppc_set_dsisr(vcpu, flags);
+	}
+	kvmppc_inject_interrupt(vcpu, vec, flags);
+}
+
+/* Used to convert a radix nested guest real addr to a L1 guest real address */
+static int kvmhv_xlate_addr_nested_radix(struct kvm_vcpu *vcpu,
+					 struct kvm_nested_guest *gp,
+					 unsigned long n_gpa, bool data,
+					 bool writing,
+					 struct kvmppc_pte *gpte_p)
+{
+	u64 fault_addr, flags = writing ? DSISR_ISSTORE : 0ULL;
 	int ret;
 
 	ret = kvmppc_mmu_walk_radix_tree(vcpu, n_gpa, gpte_p, gp->l1_gr_to_hr,
@@ -1511,13 +1546,13 @@ static int kvmhv_translate_addr_nested(struct kvm_vcpu *vcpu,
 		goto forward_to_l1;
 	} else {
 		/* We found a pte -> check permissions */
-		if (dsisr & DSISR_ISSTORE) {
+		if (writing) {
 			/* Can we write? */
 			if (!gpte_p->may_write) {
 				flags |= DSISR_PROTFAULT;
 				goto forward_to_l1;
 			}
-		} else if (vcpu->arch.trap == BOOK3S_INTERRUPT_H_INST_STORAGE) {
+		} else if (!data) {
 			/* Can we execute? */
 			if (!gpte_p->may_execute) {
 				flags |= SRR1_ISI_N_OR_G;
@@ -1536,18 +1571,18 @@ static int kvmhv_translate_addr_nested(struct kvm_vcpu *vcpu,
 
 forward_to_l1:
 	vcpu->arch.fault_dsisr = flags;
-	if (vcpu->arch.trap == BOOK3S_INTERRUPT_H_INST_STORAGE) {
+	if (!data) {
 		vcpu->arch.shregs.msr &= ~0x783f0000ul;
-		vcpu->arch.shregs.msr |= flags;
+		vcpu->arch.shregs.msr |= (flags & 0x783f0000ul);
 	}
 	return RESUME_HOST;
 }
 
-static long kvmhv_handle_nested_set_rc(struct kvm_vcpu *vcpu,
-				       struct kvm_nested_guest *gp,
-				       unsigned long n_gpa,
-				       struct kvmppc_pte gpte,
-				       unsigned long dsisr)
+static long kvmhv_handle_nested_set_rc_radix(struct kvm_vcpu *vcpu,
+					     struct kvm_nested_guest *gp,
+					     unsigned long n_gpa,
+					     struct kvmppc_pte gpte,
+					     unsigned long dsisr)
 {
 	struct kvm *kvm = vcpu->kvm;
 	bool writing = !!(dsisr & DSISR_ISSTORE);
@@ -1623,7 +1658,8 @@ static long int __kvmhv_nested_page_fault_radix(struct kvm_run *run,
 	unsigned long *rmapp;
 	unsigned long n_gpa, gpa, gfn, perm = 0UL;
 	unsigned int shift, l1_shift, level;
-	bool writing = !!(dsisr & DSISR_ISSTORE);
+	bool data = vcpu->arch.trap == BOOK3S_INTERRUPT_H_DATA_STORAGE;
+	bool writing = data && (dsisr & DSISR_ISSTORE);
 	bool kvm_ro = false;
 	long int ret;
 
@@ -1638,7 +1674,8 @@ static long int __kvmhv_nested_page_fault_radix(struct kvm_run *run,
 	n_gpa = vcpu->arch.fault_gpa & ~0xF000000000000FFFULL;
 	if (!(dsisr & DSISR_PRTABLE_FAULT))
 		n_gpa |= ea & 0xFFF;
-	ret = kvmhv_translate_addr_nested(vcpu, gp, n_gpa, dsisr, &gpte);
+	ret = kvmhv_xlate_addr_nested_radix(vcpu, gp, n_gpa, data, writing,
+					    &gpte);
 
 	/*
 	 * If the hardware found a translation but we don't now have a usable
@@ -1654,7 +1691,8 @@ static long int __kvmhv_nested_page_fault_radix(struct kvm_run *run,
 
 	/* Failed to set the reference/change bits */
 	if (dsisr & DSISR_SET_RC) {
-		ret = kvmhv_handle_nested_set_rc(vcpu, gp, n_gpa, gpte, dsisr);
+		ret = kvmhv_handle_nested_set_rc_radix(vcpu, gp, n_gpa, gpte,
+						       dsisr);
 		if (ret == RESUME_HOST)
 			return ret;
 		if (ret)
@@ -1687,7 +1725,8 @@ static long int __kvmhv_nested_page_fault_radix(struct kvm_run *run,
 	if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID)) {
 		if (dsisr & (DSISR_PRTABLE_FAULT | DSISR_BADACCESS)) {
 			/* unusual error -> reflect to the guest as a DSI */
-			kvmppc_core_queue_data_storage(vcpu, ea, dsisr);
+			kvmhv_inject_nested_storage_int(vcpu, data, writing, ea,
+							dsisr);
 			return RESUME_GUEST;
 		}
 
@@ -1697,8 +1736,8 @@ static long int __kvmhv_nested_page_fault_radix(struct kvm_run *run,
 	if (memslot->flags & KVM_MEM_READONLY) {
 		if (writing) {
 			/* Give the guest a DSI */
-			kvmppc_core_queue_data_storage(vcpu, ea,
-					DSISR_ISSTORE | DSISR_PROTFAULT);
+			kvmhv_inject_nested_storage_int(vcpu, data, writing, ea,
+							DSISR_PROTFAULT);
 			return RESUME_GUEST;
 		}
 		kvm_ro = true;
-- 
2.13.6



* [PATCH 18/23] KVM: PPC: Book3S HV: Separate out hashing from kvmppc_hv_find_lock_hpte()
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: BOok3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (16 preceding siblings ...)
  2019-08-26  6:21 ` [PATCH 17/23] KVM: PPC: Book3S HV: Nested: Rename kvmhv_xlate_addr_nested_radix Suraj Jitindar Singh
@ 2019-08-26  6:21 ` Suraj Jitindar Singh
  2019-08-26  6:21 ` [PATCH 19/23] KVM: PPC: Book3S HV: Nested: Implement nested hpt mmu translation Suraj Jitindar Singh
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:21 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

Separate out the hashing function from kvmppc_hv_find_lock_hpte() as
kvmppc_hv_get_hash_value() to allow this function to be reused and
prevent code duplication.

No functional change.
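
Not part of the patch itself: the sketch below is a self-contained userspace
illustration of the arithmetic the new helper performs for a 1T segment, so
the primary/secondary PTEG relationship used by its callers is easy to see.
The vsid, eaddr and hpt_mask values are made-up examples; the real code
derives the VSID from slb_v and the mask from kvmppc_hpt_mask().

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t vsid = 0x123456789ULL;		/* assumed example 1T-segment VSID */
	uint64_t eaddr = 0x0000700042345000ULL;	/* assumed example effective address */
	uint64_t somask = (1ULL << 40) - 1;	/* segment offset mask, 1T segment */
	unsigned int pshift = 12;		/* 4k base page size */
	uint64_t hpt_mask = (1ULL << 18) - 1;	/* assumed stand-in for kvmppc_hpt_mask() */
	uint64_t hash, secondary;

	vsid ^= vsid << 25;			/* 1T VSID folding, as in the helper */
	hash = (vsid ^ ((eaddr & somask) >> pshift)) & hpt_mask;
	secondary = hash ^ hpt_mask;		/* secondary hash tried by the callers */

	/* each PTEG is 128 bytes, hence the << 7 when indexing the HPT */
	printf("primary pteg offset 0x%llx, secondary 0x%llx\n",
	       (unsigned long long)(hash << 7),
	       (unsigned long long)(secondary << 7));
	return 0;
}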

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 60 ++++++++++++++++++++++---------------
 1 file changed, 36 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index a939782d8a5e..c8a379a6f533 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -1162,6 +1162,40 @@ static struct mmio_hpte_cache_entry *
 	return &vcpu->arch.mmio_cache.entry[index];
 }
 
+/*
+ * Given an effective address and a slb entry, work out the hash and the
+ * virtual page number
+ */
+unsigned long kvmppc_hv_get_hash_value(struct kvm_hpt_info *hpt, gva_t eaddr,
+				       unsigned long slb_v, unsigned long *avpn,
+				       unsigned int *pshift_p)
+{
+	unsigned long hash, somask, vsid;
+	unsigned int pshift = 12;
+
+	if (slb_v & SLB_VSID_L)
+		pshift = slb_base_page_shift[(slb_v & SLB_VSID_LP) >> 4];
+	if (slb_v & SLB_VSID_B_1T) {
+		somask = (1UL << 40) - 1;
+		vsid = (slb_v & ~SLB_VSID_B) >> SLB_VSID_SHIFT_1T;
+		vsid ^= vsid << 25;
+	} else {
+		somask = (1UL << 28) - 1;
+		vsid = (slb_v & ~SLB_VSID_B) >> SLB_VSID_SHIFT;
+	}
+	hash = (vsid ^ ((eaddr & somask) >> pshift)) & kvmppc_hpt_mask(hpt);
+	*avpn = slb_v & ~(somask >> 16);	/* also includes B */
+	*avpn |= (eaddr & somask) >> 16;
+
+	if (pshift >= 24)
+		*avpn &= ~((1UL << (pshift - 16)) - 1);
+	else
+		*avpn &= ~0x7fUL;
+	*pshift_p = pshift;
+
+	return hash;
+}
+
 /* When called from virtmode, this func should be protected by
  * preempt_disable(), otherwise, the holding of HPTE_V_HVLOCK
  * can trigger deadlock issue.
@@ -1171,39 +1205,17 @@ long kvmppc_hv_find_lock_hpte(struct kvm_hpt_info *hpt, gva_t eaddr,
 {
 	unsigned int i;
 	unsigned int pshift;
-	unsigned long somask;
-	unsigned long vsid, hash;
-	unsigned long avpn;
 	__be64 *hpte;
-	unsigned long mask, val;
+	unsigned long hash, mask, val;
 	unsigned long v, r, orig_v;
 
 	/* Get page shift, work out hash and AVPN etc. */
 	mask = SLB_VSID_B | HPTE_V_AVPN | HPTE_V_SECONDARY;
-	val = 0;
-	pshift = 12;
+	hash = kvmppc_hv_get_hash_value(hpt, eaddr, slb_v, &val, &pshift);
 	if (slb_v & SLB_VSID_L) {
 		mask |= HPTE_V_LARGE;
 		val |= HPTE_V_LARGE;
-		pshift = slb_base_page_shift[(slb_v & SLB_VSID_LP) >> 4];
-	}
-	if (slb_v & SLB_VSID_B_1T) {
-		somask = (1UL << 40) - 1;
-		vsid = (slb_v & ~SLB_VSID_B) >> SLB_VSID_SHIFT_1T;
-		vsid ^= vsid << 25;
-	} else {
-		somask = (1UL << 28) - 1;
-		vsid = (slb_v & ~SLB_VSID_B) >> SLB_VSID_SHIFT;
 	}
-	hash = (vsid ^ ((eaddr & somask) >> pshift)) & kvmppc_hpt_mask(hpt);
-	avpn = slb_v & ~(somask >> 16);	/* also includes B */
-	avpn |= (eaddr & somask) >> 16;
-
-	if (pshift >= 24)
-		avpn &= ~((1UL << (pshift - 16)) - 1);
-	else
-		avpn &= ~0x7fUL;
-	val |= avpn;
 
 	for (;;) {
 		hpte = (__be64 *)(hpt->virt + (hash << 7));
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 19/23] KVM: PPC: Book3S HV: Nested: Implement nested hpt mmu translation
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: BOok3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (17 preceding siblings ...)
  2019-08-26  6:21 ` [PATCH 18/23] KVM: PPC: Book3S HV: Separate out hashing from kvmppc_hv_find_lock_hpte() Suraj Jitindar Singh
@ 2019-08-26  6:21 ` Suraj Jitindar Singh
  2019-08-26  6:21 ` [PATCH 20/23] KVM: PPC: Book3S HV: Nested: Handle tlbie hcall for nested hpt guest Suraj Jitindar Singh
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:21 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

Implement the insertion of nested hpt (hash page table) translations into
the shadow hpt. The shadow hpt is used to store ptes (page table
entries) which provide the translation from nested guest virtual address
to host real address. The translation from nested guest effective
address to virtual address is provided by the slb which the nested guest
can manage itself directly.

In order to construct this translation the hash page table in L1 memory
must first be searched to provide a translation from nested guest
virtual address to L1 guest real address. If no such entry is found then
an interrupt is provided to the L1 guest hypervisor so that it can
insert an entry.

The L1 guest real address is then translated to a host real address
through the radix page tables for the L1 guest, with an entry created
if one doesn't already exist.
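
Not part of the patch: once the host real address is known, the real page
number placed in the shadow hpte combines the host page with low-order bits
of the L1 guest real address whenever the host backing page is larger than
the L1 page. The sketch below is a minimal userspace illustration of that
arithmetic; all addresses and shifts are made-up examples, and the
HPTE_R_RPN_3_0 masking done by the real code is omitted for brevity.

#include <stdio.h>
#include <stdint.h>

static uint64_t shadow_rpn(uint64_t hpa, uint64_t gpa,
			   unsigned int host_shift, unsigned int l1_shift)
{
	uint64_t l1_psize = 1ULL << l1_shift;
	uint64_t rpn;

	/* real page bits come from the host translation ... */
	rpn = hpa & ~((1ULL << host_shift) - 1);
	/* ... plus the offset-within-host-page bits from the L1 real address */
	if (host_shift > l1_shift)
		rpn |= gpa & ((1ULL << host_shift) - l1_psize);
	return rpn;
}

int main(void)
{
	/* assumed example: a 2M host page backing a 64k L1 guest page */
	uint64_t hpa = 0x0000000812345000ULL;
	uint64_t gpa = 0x00000000400f0000ULL;

	printf("shadow hpte rpn: 0x%llx\n",
	       (unsigned long long)shadow_rpn(hpa, gpa, 21, 16));
	return 0;
}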

The rc bits are then set for the pte in L1 guest memory and for the host
radix pte for the L1 guest, since hardware will set them in the shadow pte
automatically and so no interrupt will be delivered which would let us keep
them in sync. In fact the rc bits in the shadow hpt are set by software
here. The c (change) bit is only set if the nested guest is writing;
otherwise the page is mapped read only so that we can fault on a write,
set the change bit and upgrade the write permissions.

The combination of the pte permissions is applied to the entry inserted
and a nest rmap entry inserted.
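
Not part of the patch: a minimal sketch of the permission combination
described above, i.e. the access required by the fault checked against what
the L0 pte (and a read-only memslot) actually grants before the shadow hpte
is built. The EX_* bit values are stand-ins, not the kernel's _PAGE_*
definitions.

#include <stdio.h>
#include <stdbool.h>

#define EX_READ		0x1	/* stand-in for _PAGE_READ */
#define EX_WRITE	0x2	/* stand-in for _PAGE_WRITE */
#define EX_EXEC		0x4	/* stand-in for _PAGE_EXEC */

static bool host_allows(unsigned int host_perm, bool data, bool writing,
			bool kvm_ro)
{
	unsigned int req, have = 0;

	/* what the faulting access needs */
	req = data ? (writing ? (EX_READ | EX_WRITE) : EX_READ) : EX_EXEC;
	/* what the L0 pte and memslot grant */
	if (host_perm & EX_READ)
		have |= EX_READ;
	if (host_perm & EX_WRITE)
		have |= EX_READ | (kvm_ro ? 0 : EX_WRITE);
	if (host_perm & EX_EXEC)
		have |= EX_EXEC;

	return !(req & ~have);	/* any missing bit -> reflect a storage interrupt */
}

int main(void)
{
	printf("store to a read-only memslot allowed? %d\n",
	       host_allows(EX_READ | EX_WRITE, true, true, true));
	printf("load from a RW page allowed? %d\n",
	       host_allows(EX_READ | EX_WRITE, true, false, false));
	return 0;
}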

Since we may come in with an existing entry when we just need to upgrade
the write permissions on a page (in which case the index was found and
stored in kvmppc_hpte_hv_fault()) we check for this case and just load
the guest rpte entry from the rev map rather than having to look it up
in L1 guest memory again.

This doesn't support the invalidation of translations by either the L0
or L1 hypervisor; that functionality is added in subsequent patches.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |   2 +
 arch/powerpc/include/asm/kvm_book3s.h         |   4 +
 arch/powerpc/include/asm/kvm_book3s_64.h      |   9 +
 arch/powerpc/kvm/book3s_hv_nested.c           | 385 +++++++++++++++++++++++++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c           |   6 +-
 5 files changed, 404 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index c04e37b2c30d..f33dcb84a0bf 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -90,6 +90,8 @@
 #define HPTE_R_KEY_HI		ASM_CONST(0x3000000000000000)
 #define HPTE_R_KEY_BIT0		ASM_CONST(0x2000000000000000)
 #define HPTE_R_KEY_BIT1		ASM_CONST(0x1000000000000000)
+#define HPTE_R_B		ASM_CONST(0x0c00000000000000)
+#define HPTE_R_B_1T		ASM_CONST(0x0400000000000000)
 #define HPTE_R_RPN_SHIFT	12
 #define HPTE_R_RPN		ASM_CONST(0x0ffffffffffff000)
 #define HPTE_R_RPN_3_0		ASM_CONST(0x01fffffffffff000)
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index f13dab096dad..b43d7f348712 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -158,6 +158,10 @@ extern void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu);
 extern int kvmppc_book3s_hv_page_fault(struct kvm_run *run,
 			struct kvm_vcpu *vcpu, unsigned long addr,
 			unsigned long status);
+extern unsigned long kvmppc_hv_get_hash_value(struct kvm_hpt_info *hpt,
+					      gva_t eaddr, unsigned long slb_v,
+					      unsigned long *avpn,
+					      unsigned int *pshift_p);
 extern long kvmppc_hv_find_lock_hpte(struct kvm_hpt_info *hpt, gva_t eaddr,
 			unsigned long slb_v, unsigned long valid);
 extern int kvmppc_hv_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index c874ab3a037e..0db673501110 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -369,6 +369,15 @@ static inline unsigned long hpte_make_readonly(unsigned long ptel)
 	return ptel;
 }
 
+static inline unsigned long hpte_make_writable(unsigned long ptel)
+{
+	if ((ptel & HPTE_R_PP0) || (ptel & HPTE_R_PP) == PP_RWXX)
+		ptel = (ptel & ~HPTE_R_PPP) | PP_RWXX;
+	else
+		ptel = (ptel & ~HPTE_R_PP) | PP_RWRW;
+	return ptel;
+}
+
 static inline bool hpte_cache_flags_ok(unsigned long hptel, bool is_ci)
 {
 	unsigned int wimg = hptel & HPTE_R_WIMG;
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 8ed50d4bd9a6..463745e535c5 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -1026,6 +1026,16 @@ static inline u64 gpa_to_n_rmap(u64 gpa)
 		RMAP_NESTED_GPA_MASK;
 }
 
+static inline u64 n_rmap_to_index(u64 rmap)
+{
+	return (rmap & RMAP_NESTED_GPA_MASK) >> RMAP_NESTED_GPA_SHIFT;
+}
+
+static inline u64 index_to_n_rmap(u64 index)
+{
+	return (index << RMAP_NESTED_GPA_SHIFT) & RMAP_NESTED_GPA_MASK;
+}
+
 static inline int n_rmap_to_lpid(u64 rmap)
 {
 	return (int) ((rmap & RMAP_NESTED_LPID_MASK) >> RMAP_NESTED_LPID_SHIFT);
@@ -1817,12 +1827,385 @@ static long int __kvmhv_nested_page_fault_radix(struct kvm_run *run,
 	return RESUME_GUEST;
 }
 
+/*
+ * Used to convert a hash nested guest virtual addr to a L1 guest real addr
+ * Returns pte index of pte which provided the translation
+ */
+static long kvmhv_xlate_addr_nested_hash(struct kvm_vcpu *vcpu,
+					 struct kvm_nested_guest *gp,
+					 u64 eaddr, u64 slb_v, bool data,
+					 bool writing, u64 *v_p, u64 *r_p)
+{
+	unsigned long v, v_mask, v_match, r, r_mask, r_match;
+	u64 flags = writing ? DSISR_ISSTORE : 0ULL;
+	int pshift, i, ret;
+	u64 hash, pp, key;
+	u64 pteg[16];
+
+	/* NOTE: All handling done in new ISA V3.0 hpte format */
+
+	/* Compute the hash */
+	hash = kvmppc_hv_get_hash_value(&gp->shadow_hpt, eaddr, slb_v, &v_match,
+					&pshift);
+	/* Bits which must match */
+	v_mask = HPTE_V_AVPN_3_0 | HPTE_V_SECONDARY | HPTE_V_VALID;
+	v_match |= HPTE_V_VALID;
+	if (slb_v & SLB_VSID_L) {
+		v_mask |= HPTE_V_LARGE;
+		v_match |= HPTE_V_LARGE;
+	}
+	r_mask = HPTE_R_B;
+	r_match = (slb_v & SLB_VSID_B_1T) ? HPTE_R_B_1T : 0ULL;
+
+	/*
+	 * Read the pteg from L1 guest memory and search for a matching pte.
+	 * Note: No need to lock the pte since we hold the tlb_lock meaning
+	 * that L1 can't complete a tlbie and change the pte out from under us.
+	 */
+	while (true) {
+		u64 pteg_addr = (gp->l1_gr_to_hr & PATB_HTABORG) + (hash << 7);
+
+		ret = kvm_vcpu_read_guest(vcpu, pteg_addr, pteg, sizeof(pteg));
+		if (ret) {
+			flags |= DSISR_NOHPTE;
+			goto forward_to_l1;
+		}
+
+		for (i = 0; i < 16; i += 2) {
+			v = be64_to_cpu(pteg[i]) & ~HPTE_V_HVLOCK;
+			r = be64_to_cpu(pteg[i + 1]);
+
+			if (!((v ^ v_match) & v_mask) &&
+					!((r ^ r_match) & r_mask) &&
+					(kvmppc_hpte_base_page_shift(v, r) ==
+					 pshift))
+				goto match_found;
+		}
+
+		if (v_match & HPTE_V_SECONDARY) {
+			flags |= DSISR_NOHPTE;
+			goto forward_to_l1;
+		}
+		/* Try the secondary hash */
+		v_match |= HPTE_V_SECONDARY;
+		hash = hash ^ kvmppc_hpt_mask(&gp->shadow_hpt);
+	}
+
+match_found:
+	/* Match found - check the permissions */
+	pp = r & HPTE_R_PPP;
+	key = slb_v & (vcpu->arch.shregs.msr & MSR_PR ? SLB_VSID_KP :
+							SLB_VSID_KS);
+	if (!data) {		/* check execute permissions */
+		if (r & (HPTE_R_N | HPTE_R_G)) {
+			flags |= SRR1_ISI_N_OR_G;
+			goto forward_to_l1;
+		}
+		if (!hpte_read_permission(pp, key)) {
+			flags |= SRR1_ISI_PROT;
+			goto forward_to_l1;
+		}
+	} else if (writing) {	/* check write permissions */
+		if (!hpte_write_permission(pp, key)) {
+			flags |= DSISR_PROTFAULT;
+			goto forward_to_l1;
+		}
+	} else {		/* check read permissions */
+		if (!hpte_read_permission(pp, key)) {
+			flags |= DSISR_PROTFAULT;
+			goto forward_to_l1;
+		}
+	}
+
+	*v_p = v & ~HPTE_V_HVLOCK;
+	*r_p = r;
+	return (hash << 3) + (i >> 1);
+
+forward_to_l1:
+	vcpu->arch.fault_dsisr = flags;
+	if (!data) {
+		vcpu->arch.shregs.msr &= ~0x783f0000ul;
+		vcpu->arch.shregs.msr |= (flags & 0x783f0000ul);
+	}
+	return -1;
+}
+
+static long kvmhv_handle_nested_set_rc_hash(struct kvm_vcpu *vcpu,
+					    struct kvm_nested_guest *gp,
+					    unsigned long gpa, u64 index,
+					    u64 *gr, u64 *hr, bool writing)
+{
+	struct kvm *kvm = vcpu->kvm;
+	u64 pgflags;
+	long ret;
+
+	pgflags = _PAGE_ACCESSED;
+	if (writing)
+		pgflags |= _PAGE_DIRTY;
+
+	/* Are the rc bits set in the L1 hash pte? */
+	if (pgflags & ~(*gr)) {
+		__be64 gr_be;
+		u64 addr = (gp->l1_gr_to_hr & PATB_HTABORG) + (index << 4);
+		addr += sizeof(*gr);	/* Writing second doubleword */
+
+		/* Update rc in the L1 guest pte */
+		(*gr) |= pgflags;
+		gr_be = cpu_to_be64(*gr);
+		ret = kvm_write_guest(kvm, addr, &gr_be, sizeof(gr_be));
+		if (ret)	/* Let the guest try again */
+			return -EINVAL;
+	}
+
+	/* Set the rc bit in the pte of our (L0) pgtable for the L1 guest */
+	spin_lock(&kvm->mmu_lock);
+	ret = kvmppc_hv_handle_set_rc(kvm, kvm->arch.pgtable, writing,
+				      gpa, kvm->arch.lpid);
+	spin_unlock(&kvm->mmu_lock);
+	if (!ret)		/* Let the guest try again */
+		return -EINVAL;
+
+	/* Set the rc bit in the pte of the shadow_hpt for the nested guest */
+	(*hr) |= pgflags;
+
+	return 0;
+}
+
 /* called with gp->tlb_lock held */
 static long int __kvmhv_nested_page_fault_hash(struct kvm_run *run,
 					       struct kvm_vcpu *vcpu,
 					       struct kvm_nested_guest *gp)
 {
-	return -EINVAL;
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_memory_slot *memslot;
+	struct rmap_nested *n_rmap;
+	unsigned long hpte[3] = { 0UL };
+	unsigned long mmu_seq;
+	unsigned long dsisr = vcpu->arch.fault_dsisr;
+	unsigned long ea = vcpu->arch.fault_dar;
+	long index = vcpu->arch.pgfault_index;
+	unsigned long psize, *rmapp;
+	bool data = vcpu->arch.trap == BOOK3S_INTERRUPT_H_DATA_STORAGE;
+	bool writing = data && (dsisr & DSISR_ISSTORE);
+	bool kvm_ro = false;
+	u64 gv = 0ULL, gr = 0ULL, hr = 0ULL;
+	u64 gpa, gfn, hpa;
+	int l1_shift, shift, req_perm, h_perm;
+	pte_t pte, *pte_p;
+	__be64 *hptep;
+	long int ret;
+
+	/*
+	 * 1. Translate to a L1 Guest Real Addr
+	 * If there was no existing entry (pgfault_index < 0) then we need to
+	 * search for the guest hpte in l1 memory.
+	 * If we found an entry in kvmppc_hpte_hv_fault() (pgfault_index >= 0)
+	 * then lock the hpte and check it hasn't changed. If it has (because
+	 * a tlbie has completed between then and now) let the guest try again.
+	 * If the entry is valid then we are coming in here to upgrade the write
+	 * permissions on an existing hpte which we mapped read only to avoid
+	 * setting the change bit, and now the guest is writing to it.
+	 * If the entry isn't valid (which means it's absent) then the
+	 * guest_rpte is still valid, we just made it absent when the host
+	 * paged out the underlying page which was used to back the guest memory
+	 * NOTE: Since the shadow_hpt was allocated the same size as the l1 hpt
+	 * the index is preserved giving a 1-to-1 mapping between the hash page
+	 * tables, this could be changed in future.
+	 */
+	if (index >= 0) {
+		hptep = (__be64 *)(gp->shadow_hpt.virt + (index << 4));
+
+		preempt_disable();
+		while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
+			cpu_relax();
+		hpte[0] = gv = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
+		hpte[1] = hr = be64_to_cpu(hptep[1]);
+		hpte[2] = gr = gp->shadow_hpt.rev[index].guest_rpte;
+		unlock_hpte(hptep, hpte[0]);
+		preempt_enable();
+
+		/* hpt modified under us? */
+		if (hpte[0] != hpte_old_to_new_v(vcpu->arch.pgfault_hpte[0]) ||
+		    hpte[1] != hpte_old_to_new_r(vcpu->arch.pgfault_hpte[0],
+						 vcpu->arch.pgfault_hpte[1]))
+			return RESUME_GUEST;	/* Let the guest try again */
+	} else {
+		/* Note: fault_gpa was used to store the slb_v entry */
+		index = kvmhv_xlate_addr_nested_hash(vcpu, gp, ea,
+						     vcpu->arch.fault_gpa, data,
+						     writing, &gv, &gr);
+		if (index < 0)
+			return RESUME_HOST;
+		hptep = (__be64 *)(gp->shadow_hpt.virt + (index << 4));
+	}
+	l1_shift = kvmppc_hpte_actual_page_shift(gv, gr);
+	psize = (1UL << l1_shift);
+	gfn = (gr & HPTE_R_RPN_3_0 & ~(psize - 1)) >> PAGE_SHIFT;
+	gpa = (gfn << PAGE_SHIFT) | (ea & (psize - 1));
+
+	/* 2. Find the host memslot */
+
+	memslot = gfn_to_memslot(kvm, gfn);
+	if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID)) {
+		/* passthrough of emulated MMIO case */
+		pr_err("emulated MMIO passthrough?\n");
+		return -EINVAL;
+	}
+	if (memslot->flags & KVM_MEM_READONLY) {
+		if (writing) {
+			/* Give the guest a DSI */
+			kvmhv_inject_nested_storage_int(vcpu, data, writing, ea,
+							DSISR_PROTFAULT);
+			return RESUME_GUEST;
+		}
+		kvm_ro = true;
+	}
+
+	/* 3. Translate to a L0 Host Real Address through the L0 page table */
+
+	/* Used to check for invalidations in progress */
+	mmu_seq = kvm->mmu_notifier_seq;
+	smp_rmb();
+
+	/* See if can find translation in our partition scoped tables for L1 */
+	if (!kvm->arch.radix) {
+		/* only support nested hpt guest under radix l1 guest */
+		pr_err("nested hpt guest only supported under radix guest\n");
+		return -EINVAL;
+	}
+	pte = __pte(0);
+	spin_lock(&kvm->mmu_lock);
+	pte_p = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift);
+	spin_unlock(&kvm->mmu_lock);
+
+	if (!shift)
+		shift = PAGE_SHIFT;
+	if (pte_p)
+		pte = *pte_p;
+
+	if (!pte_present(pte) || (writing && !(pte_val(pte) & _PAGE_WRITE))) {
+		int level;
+		/* No suitable pte found -> try to insert a mapping */
+		ret = kvmppc_book3s_instantiate_page(vcpu, gpa, memslot,
+						writing, kvm_ro, &pte, &level);
+		if (ret == -EAGAIN)
+			return RESUME_GUEST;
+		else if (ret)
+			return ret;
+		shift = kvmppc_radix_level_to_shift(level);
+	}
+
+	if (shift < l1_shift)	/* Don't support L1 using larger page than us */
+		return -EINVAL;
+	if (!hpte_cache_flags_ok(gr, pte_ci(pte)))
+		return -EINVAL;
+	hpa = pte_pfn(pte) << PAGE_SHIFT;
+	/* Align gfn to the start of the page */
+	gfn = (gpa & ~((1UL << shift) - 1)) >> PAGE_SHIFT;
+
+	/* 4. Compute the PTE we're going to insert */
+
+	if (!hr) {	/* Not an existing entry */
+		hr = gr & ~HPTE_R_RPN_3_0;	/* Copy everything except rpn */
+		hr |= ((psize - HPTE_R_KEY_BIT2) & gr);	/* psize encoding */
+		hr |= (hpa & HPTE_R_RPN_3_0 & ~((1UL << shift) - 1));
+		if (shift > l1_shift)	/* take some bits from the gpa */
+			hr |= (gpa & ((1UL << shift) - psize));
+	}
+
+	/* Limit permissions based on the L0 pte */
+	req_perm = data ? (writing ? (_PAGE_READ | _PAGE_WRITE) : _PAGE_READ)
+			: _PAGE_EXEC;
+	h_perm = (pte_val(pte) & _PAGE_READ) ? _PAGE_READ : 0;
+	h_perm |= (pte_val(pte) & _PAGE_WRITE) ? (_PAGE_READ |
+						 (kvm_ro ? 0 : _PAGE_WRITE))
+					       : 0;
+	h_perm |= (pte_val(pte) & _PAGE_EXEC) ? _PAGE_EXEC : 0;
+	if (req_perm & ~h_perm) {
+		/* host doesn't provide a required permission -> dsi to guest */
+		kvmhv_inject_nested_storage_int(vcpu, data, writing, ea,
+						DSISR_PROTFAULT);
+		return RESUME_GUEST;
+	}
+	if (!(h_perm & _PAGE_EXEC))	/* Make page no execute */
+		hr |= HPTE_R_N;
+	if (!(h_perm & _PAGE_WRITE)) {	/* Make page no write */
+		hr = hpte_make_readonly(hr);
+		writing = 0;
+	} else if (!writing) {
+		/*
+		 * Make page no write so we can defer setting the change bit.
+		 * If the guest writes to the page we'll come back in to
+		 * upgrade the permissions and set the change bit then.
+		 */
+		hr = hpte_make_readonly(hr);
+	} else {	/* _PAGE_WRITE && writing */
+		hr = hpte_make_writable(hr);
+	}
+
+	/* 5. Update rc bits if required */
+
+	ret = kvmhv_handle_nested_set_rc_hash(vcpu, gp, gpa, index, &gr, &hr,
+					      writing);
+	if (ret)
+		return RESUME_GUEST;		/* Let the guest try again */
+
+	/* 6. Generate the nest rmap */
+
+	n_rmap = kzalloc(sizeof(*n_rmap), GFP_KERNEL);
+	if (!n_rmap)				/* Let the guest try again */
+		return RESUME_GUEST;
+	n_rmap->rmap = index_to_n_rmap(index) | lpid_to_n_rmap(gp->l1_lpid);
+	rmapp = &memslot->arch.rmap[gfn - memslot->base_gfn];
+
+	/* 7. Insert the PTE */
+
+	/* Check if we might have been invalidated; let the guest retry if so */
+	spin_lock(&kvm->mmu_lock);
+	if (mmu_notifier_retry(kvm, mmu_seq))
+		goto out_free;
+
+	/* Lock the hpte */
+	preempt_disable();
+	while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
+		cpu_relax();
+
+	/* Check that the entry hasn't been changed out from under us */
+	if ((be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK) != hpte[0] ||
+	     be64_to_cpu(hptep[1]) != hpte[1] ||
+	     gp->shadow_hpt.rev[index].guest_rpte != hpte[2])
+		goto out_unlock;		/* Let the guest try again */
+
+	/* Ensure valid bit set in hpte */
+	gv = (gv & ~HPTE_V_ABSENT) | HPTE_V_VALID;
+
+	if (be64_to_cpu(hptep[0]) & HPTE_V_VALID) {
+		/* HPTE was previously valid, so we need to invalidate it */
+		hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+		kvmppc_invalidate_hpte(gp->shadow_lpid, hptep, index);
+	}
+
+	/* Insert the rmap entry */
+	kvmhv_insert_nest_rmap(rmapp, &n_rmap);
+
+	/* Always update guest_rpte in case we updated rc bits */
+	gp->shadow_hpt.rev[index].guest_rpte = gr;
+
+	hptep[1] = cpu_to_be64(hr);
+	eieio();
+	__unlock_hpte(hptep, gv);
+	preempt_enable();
+
+out_free:
+	spin_unlock(&kvm->mmu_lock);
+	if (n_rmap)
+		kfree(n_rmap);
+	return RESUME_GUEST;
+
+out_unlock:
+	__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
+	preempt_enable();
+	goto out_free;
 }
 
 long int kvmhv_nested_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu)
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c8a379a6f533..3c01957acb0e 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -1195,6 +1195,7 @@ unsigned long kvmppc_hv_get_hash_value(struct kvm_hpt_info *hpt, gva_t eaddr,
 
 	return hash;
 }
+EXPORT_SYMBOL_GPL(kvmppc_hv_get_hash_value);
 
 /* When called from virtmode, this func should be protected by
  * preempt_disable(), otherwise, the holding of HPTE_V_HVLOCK
@@ -1291,8 +1292,11 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 
 	hpt = &kvm->arch.hpt;
 	nested = vcpu->arch.nested;
-	if (nested)
+	if (nested) {
 		hpt = &nested->shadow_hpt;
+		/* reuse fault_gpa field to save slb for nested pgfault funcn */
+		vcpu->arch.fault_gpa = slb_v;
+	}
 
 	/* For protection fault, expect to find a valid HPTE */
 	valid = HPTE_V_VALID;
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 20/23] KVM: PPC: Book3S HV: Nested: Handle tlbie hcall for nested hpt guest
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: BOok3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (18 preceding siblings ...)
  2019-08-26  6:21 ` [PATCH 19/23] KVM: PPC: Book3S HV: Nested: Implement nested hpt mmu translation Suraj Jitindar Singh
@ 2019-08-26  6:21 ` Suraj Jitindar Singh
  2019-08-26  6:21 ` [PATCH 21/23] KVM: PPC: Book3S HV: Nested: Implement nest rmap invalidations for hpt guests Suraj Jitindar Singh
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:21 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

The tlbie instruction is used to invalidate caching of translation
information derived from the partition and process scoped page tables.
This instruction is hypervisor privileged when specifying partition
scoped translations to be invalidated. Thus this interface is
paravirtualised with the H_TLB_INVALIDATE hcall which is used by a
pseries hypervisor to perform partition scoped invalidations. This is
then handled in the hypervisor in kvmhv_emulate_priv_tlbie().

When handling this hcall in the hypervisor it is necessary to invalidate
caching of partition scoped translations which are in the shadow page
table (radix) or shadow hpt (hash page table). This functionality
already exists for radix, so implement it for a hash guest.

LPID wide invalidations are already handled and don't differ from the
radix case. However the case where a single tlb entry corresponding to a
virtual address is being invalidated needs to be implemented here. The
information to find the entry is provided as an hcall argument and is
expected to match the layout of rb specified for the tlbie instruction.

The rb value provides an abbreviated virtual address, base and actual
page size and segment size to be used to search for the corresponding
tlb entry. A hash is computed and this is used to find the pteg which
needs to be searched. However, since only 64 bits of the virtual address
are supplied, and since depending on the segment and hpt size up to all 78
bits of the virtual address may be needed to compute the hash, we have to
mask out the bits which can't be determined and iterate through all
possible combinations, looking for a matching entry in every pteg
addressed by one of the possible hash values. Although there is in theory
a 1-to-1 relationship between ptes in the shadow hpt and the hpt
maintained by the guest hypervisor, we need to invalidate all matches
since they can't be differentiated.
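
Not part of the patch: a userspace sketch of the iteration over candidate
PTEGs for the 1T segment case, where the hash bits that can't be recovered
from the rb value are masked off and every combination is probed. hash and
hpt_mask are made-up example values (hpt_mask stands in for
kvmppc_hpt_mask()).

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t hpt_mask = (1ULL << 27) - 1;	/* assumed stand-in for kvmppc_hpt_mask() */
	uint64_t hash = 0x5b54321ULL;		/* assumed partial hash from the known VA bits */
	uint64_t mask = (0x3ffULL << 24) & hpt_mask; /* hash bits we can't recover (1T segs) */
	uint64_t incr = 1ULL << 24;
	uint64_t num = mask >> 24;		/* number of extra candidates */
	uint64_t h, i;

	for (i = 0, h = hash & ~mask; i <= num; i++, h += incr)
		printf("probe PTEG at HPT offset 0x%llx\n",
		       (unsigned long long)(h << 7));
	return 0;
}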

When a matching entry is found it is invalidated if it was valid and the
corresponding tlbie is issued by the hypervisor; in either case the pte is
then zeroed. An optimisation here would be to just make the pte absent and
extend the rev map to store the host real doubleword, since it is still
valid.

Note: ric == 3 (cluster bombs) is not supported even though the ISA
technically allows for it; the encoding is implementation dependent and
Linux doesn't use it.
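
Not part of the patch: a self-contained sketch of decoding the rb value
into the segment size and abbreviated virtual address using the new
TLBIE_RB_* masks; the rbval used here is a made-up example.

#include <stdio.h>
#include <stdint.h>

#define TLBIE_RB_AVA_4K		0xfffffffffffff000ULL
#define TLBIE_RB_AVA_L		0xfffffffffff00000ULL
#define TLBIE_RB_LP		0x00000000000ff000ULL
#define TLBIE_RB_B		0x0000000000000300ULL
#define TLBIE_RB_B_SHIFT	50
#define TLBIE_RB_AVAL		0x00000000000000feULL
#define TLBIE_RB_AVAL_SHIFT	12
#define TLBIE_RB_L		0x0000000000000001ULL

int main(void)
{
	uint64_t rbval = 0x00000712345f4101ULL;	/* assumed example rb value */
	uint64_t b = (rbval & TLBIE_RB_B) << TLBIE_RB_B_SHIFT;
	uint64_t ava;

	if (rbval & TLBIE_RB_L)		/* large base page: ava is split across fields */
		ava = (rbval & TLBIE_RB_AVA_L) |
		      ((rbval & TLBIE_RB_AVAL) << TLBIE_RB_AVAL_SHIFT);
	else				/* 4k base page */
		ava = rbval & TLBIE_RB_AVA_4K;

	printf("b=0x%llx ava=0x%llx lp=0x%llx\n",
	       (unsigned long long)b, (unsigned long long)ava,
	       (unsigned long long)((rbval & TLBIE_RB_LP) >> 12));
	return 0;
}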

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  11 +
 arch/powerpc/kvm/book3s_hv_nested.c           | 293 ++++++++++++++++++++++----
 2 files changed, 258 insertions(+), 46 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index f33dcb84a0bf..70f4545fecbb 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -129,6 +129,17 @@
 #define TLBIEL_INVAL_SET_MASK	0xfff000	/* set number to inval. */
 #define TLBIEL_INVAL_SET_SHIFT	12
 
+/* Fields in the rb registers for the tlbie instruction */
+#define TLBIE_RB_AVA_4K		ASM_CONST(0xfffffffffffff000)
+#define TLBIE_RB_AVA_L		ASM_CONST(0xfffffffffff00000)
+#define TLBIE_RB_LP		ASM_CONST(0x00000000000ff000)
+#define TLBIE_RB_B		ASM_CONST(0x0000000000000300)
+#define TLBIE_RB_B_1T		ASM_CONST(0x0000000000000100)
+#define TLBIE_RB_B_SHIFT	50	/* Shift to match the pte location */
+#define TLBIE_RB_AVAL		ASM_CONST(0x00000000000000fe)
+#define TLBIE_RB_AVAL_SHIFT	12
+#define TLBIE_RB_L		ASM_CONST(0x0000000000000001)
+
 #define POWER7_TLB_SETS		128	/* # sets in POWER7 TLB */
 #define POWER8_TLB_SETS		512	/* # sets in POWER8 TLB */
 #define POWER9_TLB_SETS_HASH	256	/* # sets in POWER9 TLB Hash mode */
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 463745e535c5..57add167115e 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -931,10 +931,10 @@ void kvmhv_release_all_nested(struct kvm *kvm)
 }
 
 /* caller must hold gp->tlb_lock */
-static void kvmhv_flush_nested(struct kvm_nested_guest *gp)
+static void kvmhv_flush_nested(struct kvm *kvm, struct kvm_nested_guest *gp,
+			       bool invalidate_ptbl)
 {
-	struct kvm *kvm = gp->l1_host;
-
+	/* Invalidate (zero) all entries in the shadow pgtable or shadow hpt */
 	spin_lock(&kvm->mmu_lock);
 	if (gp->radix) {
 		kvmppc_free_pgtable_radix(kvm, gp->shadow_pgtable,
@@ -947,10 +947,15 @@ static void kvmhv_flush_nested(struct kvm_nested_guest *gp)
 			sizeof(struct revmap_entry));
 	}
 	spin_unlock(&kvm->mmu_lock);
+	/* remove all nested rmap entries and perform global invalidation */
+	kvmhv_remove_all_nested_rmap_lpid(kvm, gp->l1_lpid);
 	kvmhv_flush_lpid(gp->shadow_lpid, gp->radix);
-	kvmhv_update_ptbl_cache(gp);
-	if (gp->l1_gr_to_hr == 0)
-		kvmhv_remove_nested(gp);
+	/* was caching of the partition table entries also invalidated? */
+	if (invalidate_ptbl) {
+		kvmhv_update_ptbl_cache(gp);
+		if (gp->l1_gr_to_hr == 0)
+			kvmhv_remove_nested(gp);
+	}
 }
 
 struct kvm_nested_guest *kvmhv_get_nested(struct kvm *kvm, int l1_lpid,
@@ -1296,6 +1301,158 @@ static bool kvmhv_invalidate_shadow_pte_radix(struct kvm_vcpu *vcpu,
 	return ret;
 }
 
+/* Called with the hpte locked */
+static void kvmhv_invalidate_shadow_pte_hash(struct kvm_hpt_info *hpt,
+					     unsigned int lpid, __be64 *hptep,
+					     unsigned long index)
+{
+	hpt->rev[index].guest_rpte = 0UL;
+	if (hptep[0] & cpu_to_be64(HPTE_V_VALID)) {
+		/* HPTE was previously valid, so we need to invalidate it */
+		hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+		kvmppc_invalidate_hpte(lpid, hptep, index);
+	}
+	hptep[1] = 0ULL;
+	eieio();
+	__unlock_hpte(hptep, 0UL);
+}
+
+/* Calculate hash given a virtual address, base page shift, and segment size */
+static unsigned long kvmppc_hv_get_hash_value_va(struct kvm_hpt_info *hpt,
+						 unsigned long va, int pshift,
+						 unsigned long b)
+{
+	unsigned long hash, somask;
+
+	if (b & HPTE_R_B_1T) {	/* 1T segment */
+		somask = (1UL << 40) - 1;
+		hash = va >> 40;
+		hash ^= hash << 25;
+	} else {		/* 256M segment */
+		somask = (1UL << 28) - 1;
+		hash = va >> 28;
+	}
+	hash ^= ((va & somask) >> pshift);
+	hash &= kvmppc_hpt_mask(hpt);
+
+	return hash;
+}
+
+/* called with gp->tlb_lock held */
+static void kvmhv_tlbie_hpt_addr(struct kvm_nested_guest *gp, unsigned long va,
+				 int base_pshift, int actual_pshift,
+				 unsigned long b)
+{
+	unsigned long mask, hash_incr, num, i;
+	struct kvm_hpt_info *hpt = &gp->shadow_hpt;
+	__be64 *hptep;
+	unsigned long hash, v, v_mask, v_match, r, r_mask, r_match;
+
+	hash = kvmppc_hv_get_hash_value_va(hpt, va, base_pshift, b);
+
+	/*
+	 * The virtual address provided to us in the rb register for tlbie is
+	 * bits 14:77 of the virtual address, however we support a 68 bit
+	 * virtual address on P9. This means that we actually need bits 10:77 of
+	 * the virtual address to calculate all possible hash values for a 68
+	 * bit virtual address space. This means that dependent on the size of
+	 * the hpt (and thus the number of hash bits we actually use to find
+	 * the pteg index) we might have to search up to 16 ptegs (1TB segs) or
+	 * 8 ptegs (256M segs) for a match.
+	 */
+	if (b & HPTE_R_B_1T) {	/* 1T segment */
+		/*
+		 * The hash when using 1T segments uses bits 0:37 of the VA.
+		 * Thus to cover the missing bits of the VA (bits 0:13) we need
+		 * to zero any of these bits being used (as determined by
+		 * kvmppc_hpt_mask()) and then search all possible values.
+		 */
+		hash_incr = 1UL << 24;
+		mask = (0x3ffUL << 24) & kvmppc_hpt_mask(hpt);
+		hash &= ~mask;
+		num = mask >> 24;
+	} else {		/* 256M segment */
+		/*
+		 * The hash when using 256M segments uses bits 11:49 of the VA.
+		 * Thus to cover the missing bits of the VA (bits 11:13) we need
+		 * to zero any of these bits being used (as determined by
+		 * kvmppc_hpt_mask()) and then search all possible values.
+		 */
+		hash_incr = 1UL << 36;
+		mask = (0x7UL << 36) & kvmppc_hpt_mask(hpt);
+		hash &= ~mask;
+		num = mask >> 36;
+	}
+
+	/* Calculate what we're going to match the hpte on */
+	v_match = va >> 16;	/* Align va to ava in the hpte */
+	if (base_pshift >= 24)
+		v_match &= ~((1UL << (base_pshift - 16)) - 1);
+	else
+		v_match &= ~0x7fUL;
+	if (actual_pshift > 12)
+		v_match |= HPTE_V_LARGE;
+	r_match = b;
+	/* We don't have the top 4 bits of the ava to match on */
+	v_mask = (TLBIE_RB_AVA_4K >> 16) & HPTE_V_AVPN_3_0;
+	v_mask |= HPTE_V_LARGE | HPTE_V_SECONDARY;
+	r_mask = HPTE_R_B;
+
+	/* Iterate through the ptegs which we have to search */
+	for (i = 0; i <= num; i++, hash += hash_incr) {
+		unsigned long pteg_addr = hash << 7;
+		v_match &= ~HPTE_V_SECONDARY;
+
+		/* Try both the primary and the secondary hash */
+		while (true) {
+			int j;
+			hptep = (__be64 *)(hpt->virt + pteg_addr);
+
+			/* There are 8 entries in the pteg to search */
+			for (j = 0; j < 16; j += 2) {
+				preempt_disable();
+				/* Lock the pte */
+				while (!try_lock_hpte(&hptep[j], HPTE_V_HVLOCK))
+					cpu_relax();
+				v = be64_to_cpu(hptep[j]) & ~HPTE_V_HVLOCK;
+				r = be64_to_cpu(hptep[j + 1]);
+
+				/*
+				 * Check for a match under the lock
+				 * NOTE: the entry might be valid or absent
+				 */
+				if ((v & (HPTE_V_VALID | HPTE_V_ABSENT)) &&
+				    !((v ^ v_match) & v_mask) &&
+				    !((r ^ r_match) & r_mask) &&
+				    (kvmppc_hpte_base_page_shift(v, r) ==
+				     base_pshift) &&
+				    (kvmppc_hpte_actual_page_shift(v, r) ==
+				     actual_pshift))
+					kvmhv_invalidate_shadow_pte_hash(hpt,
+						gp->shadow_lpid, &hptep[j],
+						(pteg_addr >> 4) + (j >> 1));
+				else
+					__unlock_hpte(&hptep[j], v);
+				preempt_enable();
+				/*
+				 * In theory there is a 1-to-1 mapping between
+				 * entries in the L1 hpt and our shadow hpt,
+				 * however since L1 can't exactly specify a
+				 * hpte (since we're missing some va bits) we
+				 * must invalidate any match which we find and
+				 * continue the search.
+				 */
+			}
+
+			if (v_match & HPTE_V_SECONDARY)
+				break;
+			/* try the secondary hash */
+			v_match |= HPTE_V_SECONDARY;
+			pteg_addr ^= (kvmppc_hpt_mask(hpt) << 7);
+		}
+	}
+}
+
 static inline int get_ric(unsigned int instr)
 {
 	return (instr >> 18) & 0x3;
@@ -1331,44 +1488,82 @@ static inline long get_epn(unsigned long r_val)
 	return r_val >> 12;
 }
 
+/* SLB[lp] encodings for base page shifts */
+static int slb_base_page_shift[4] = {
+	24,     /* 16M */
+	16,     /* 64k */
+	34,     /* 16G */
+	20,     /* 1M, unsupported */
+};
+
 static int kvmhv_emulate_tlbie_tlb_addr(struct kvm_vcpu *vcpu, int lpid,
-					int ap, long epn)
+					bool radix, unsigned long rbval)
 {
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_nested_guest *gp;
-	long npages;
-	int shift, shadow_shift;
-	unsigned long addr;
 	int rc = 0;
 
-	shift = ap_to_shift(ap);
-	addr = epn << 12;
-	if (shift < 0)
-		/* Invalid ap encoding */
-		return -EINVAL;
-
-	addr &= ~((1UL << shift) - 1);
-	npages = 1UL << (shift - PAGE_SHIFT);
-
 	gp = kvmhv_get_nested(kvm, lpid, false);
 	if (!gp) /* No such guest -> nothing to do */
 		return 0;
 	mutex_lock(&gp->tlb_lock);
 
-	/* XXX TODO hpt */
-	if (!gp->radix) {
-		rc = -EINVAL;
-		goto out_unlock;
-	}
+	if (radix) {	/* Radix Invalidation */
+		int shift, shadow_shift;
+		unsigned long addr;
+		long npages;
 
-	/* There may be more than one host page backing this single guest pte */
-	do {
-		kvmhv_invalidate_shadow_pte_radix(vcpu, gp, addr,
-						  &shadow_shift);
+		/* Radix invalidation but this is a hpt guest, nothing to do */
+		if (!gp->radix)
+			goto out_unlock;
+
+		shift = ap_to_shift(get_ap(rbval));
+		addr = get_epn(rbval) << 12;
+		if (shift < 0) {	/* Invalid ap encoding */
+			rc = -EINVAL;
+			goto out_unlock;
+		}
+
+		addr &= ~((1UL << shift) - 1);
+		npages = 1UL << (shift - PAGE_SHIFT);
+		/* There may be more than one host page backing this single guest pte */
+		do {
+			kvmhv_invalidate_shadow_pte_radix(vcpu, gp, addr,
+							  &shadow_shift);
+
+			npages -= 1UL << (shadow_shift - PAGE_SHIFT);
+			addr += 1UL << shadow_shift;
+		} while (npages > 0);
+	} else {	/* Hash Invalidation */
+		int base_pshift = 12, actual_pshift = 12;
+		unsigned long ava, b = (rbval & TLBIE_RB_B) << TLBIE_RB_B_SHIFT;
+
+		/* HPT invalidation but this is a radix guest, nothing to do */
+		if (gp->radix)
+			goto out_unlock;
+
+		/* Decode the rbval into ava, b, and base and actual pshifts */
+		if (rbval & TLBIE_RB_L) {	/* large base page size */
+			unsigned long lp = rbval & TLBIE_RB_LP;
+			ava = (rbval & TLBIE_RB_AVA_L) |
+			      ((rbval & TLBIE_RB_AVAL) << TLBIE_RB_AVAL_SHIFT);
+
+			/* base and actual page size encoded in lp field */
+			base_pshift = kvmppc_hpte_base_page_shift(HPTE_V_LARGE,
+								  lp);
+			actual_pshift = kvmppc_hpte_actual_page_shift(HPTE_V_LARGE,
+								      lp);
+		} else {			/* !large base page size */
+			int ap = get_ap(rbval);
+			ava = rbval & TLBIE_RB_AVA_4K;
+
+			/* actual page size encoded in ap field */
+			if (ap & 0x4)
+				actual_pshift = slb_base_page_shift[ap & 0x3];
+		}
 
-		npages -= 1UL << (shadow_shift - PAGE_SHIFT);
-		addr += 1UL << shadow_shift;
-	} while (npages > 0);
+		kvmhv_tlbie_hpt_addr(gp, ava, base_pshift, actual_pshift, b);
+	}
 
 out_unlock:
 	mutex_unlock(&gp->tlb_lock);
@@ -1381,16 +1576,11 @@ static void kvmhv_emulate_tlbie_lpid(struct kvm_vcpu *vcpu,
 {
 	struct kvm *kvm = vcpu->kvm;
 
-	/* XXX TODO hpt */
 	mutex_lock(&gp->tlb_lock);
 	switch (ric) {
 	case 0:
 		/* Invalidate TLB */
-		spin_lock(&kvm->mmu_lock);
-		kvmppc_free_pgtable_radix(kvm, gp->shadow_pgtable,
-					  gp->shadow_lpid);
-		spin_unlock(&kvm->mmu_lock);
-		kvmhv_flush_lpid(gp->shadow_lpid, gp->radix);
+		kvmhv_flush_nested(kvm, gp, false);
 		break;
 	case 1:
 		/*
@@ -1400,7 +1590,7 @@ static void kvmhv_emulate_tlbie_lpid(struct kvm_vcpu *vcpu,
 		break;
 	case 2:
 		/* Invalidate TLB, PWC and caching of partition table entries */
-		kvmhv_flush_nested(gp);
+		kvmhv_flush_nested(kvm, gp, true);
 		break;
 	default:
 		break;
@@ -1431,9 +1621,8 @@ static int kvmhv_emulate_priv_tlbie(struct kvm_vcpu *vcpu, unsigned int instr,
 {
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_nested_guest *gp;
-	int r, ric, prs, is, ap;
+	int r, ric, prs, is;
 	int lpid;
-	long epn;
 	int ret = 0;
 
 	ric = get_ric(instr);
@@ -1444,14 +1633,28 @@ static int kvmhv_emulate_priv_tlbie(struct kvm_vcpu *vcpu, unsigned int instr,
 
 	/*
 	 * These cases are invalid and are not handled:
-	 * r   != 1 -> Only radix supported
+	 *
+	 * Radix:
 	 * prs == 1 -> Not HV privileged
 	 * ric == 3 -> No cluster bombs for radix
 	 * is  == 1 -> Partition scoped translations not associated with pid
 	 * (!is) && (ric == 1 || ric == 2) -> Not supported by ISA
+	 *
+	 * HPT:
+	 * prs == 1 && ric != 2	-> Only process scoped caching is process table
+	 * ric == 1		-> No page walk cache for HPT
+	 * (!is) && ric == 2	-> Not supported by ISA
+	 * ric == 3		-> Although cluster bombs are technically
+	 * 			   supported for is == 0, their encoding is
+	 * 			   implementation specific and linux doesn't
+	 * 			   use them, so we don't handle them for now.
+	 * is == 1		-> HPT translations not associated with pid
 	 */
-	if ((!r) || (prs) || (ric == 3) || (is == 1) ||
-	    ((!is) && (ric == 1 || ric == 2)))
+	if (r && ((prs) || (ric == 3) || (is == 1) ||
+			   ((!is) && (ric == 1 || ric == 2))))
+		return -EINVAL;
+	else if (!r && ((prs && (ric != 2)) || (ric == 1) ||
+			(!is && (ric == 2)) || (is == 1) || (ric == 3)))
 		return -EINVAL;
 
 	switch (is) {
@@ -1460,9 +1663,7 @@ static int kvmhv_emulate_priv_tlbie(struct kvm_vcpu *vcpu, unsigned int instr,
 		 * We know ric == 0
 		 * Invalidate TLB for a given target address
 		 */
-		epn = get_epn(rbval);
-		ap = get_ap(rbval);
-		ret = kvmhv_emulate_tlbie_tlb_addr(vcpu, lpid, ap, epn);
+		ret = kvmhv_emulate_tlbie_tlb_addr(vcpu, lpid, r, rbval);
 		break;
 	case 2:
 		/* Invalidate matching LPID */
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 21/23] KVM: PPC: Book3S HV: Nested: Implement nest rmap invalidations for hpt guests
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: BOok3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (19 preceding siblings ...)
  2019-08-26  6:21 ` [PATCH 20/23] KVM: PPC: Book3S HV: Nested: Handle tlbie hcall for nested hpt guest Suraj Jitindar Singh
@ 2019-08-26  6:21 ` Suraj Jitindar Singh
  2019-08-26  6:21 ` [PATCH 22/23] KVM: PPC: Book3S HV: Nested: Enable nested " Suraj Jitindar Singh
  2019-08-26  6:21 ` [PATCH 23/23] KVM: PPC: Book3S HV: Add nested hpt pte information to debugfs Suraj Jitindar Singh
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:21 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

The nest rmap is used to store a reverse mapping from the (L1) guest real
address back to a pte in the shadow page table which maps it. This is used
when the host is modifying a L1 guest pte (either invalidating it or
modifying the rc bits) to make the necessary changes to the ptes in
the shadow tables which map that L1 guest page. This is already
implemented for a nested radix guest where the rmap entry stores the gpa
(guest physical address) of the nested pte which can be used to traverse
the shadow page table and find any matching ptes. Implement this nested
rmap invalidation for nested hpt (hash page table) guests.

We reuse the nest rmap structure that already exists for radix nested
guests for nested hpt guests. Instead of storing the gpa the hpt index
of the pte is stored. This means that a pte in the shadow hpt can be
uniquely identified by the nest rmap. As with the radix case we check
that the same host page is being addressed to detect if this is a stale
rmap entry, in which case we skip the invalidation.
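
Not part of the patch: a minimal sketch of packing an lpid and a shadow-hpt
index into a nest rmap entry in the manner of index_to_n_rmap() and
lpid_to_n_rmap(). The EX_* shifts and masks are assumed stand-ins for the
RMAP_NESTED_* constants defined earlier in the series, not the real values.

#include <stdio.h>
#include <stdint.h>

#define EX_LPID_SHIFT	48			/* assumed, not the real value */
#define EX_LPID_MASK	(0xffffULL << EX_LPID_SHIFT)
#define EX_INDEX_SHIFT	2			/* assumed, not the real value */
#define EX_INDEX_MASK	(((1ULL << 46) - 1) << EX_INDEX_SHIFT)

static uint64_t make_hpt_rmap(int lpid, uint64_t index)
{
	return (((uint64_t)lpid << EX_LPID_SHIFT) & EX_LPID_MASK) |
	       ((index << EX_INDEX_SHIFT) & EX_INDEX_MASK);
}

int main(void)
{
	uint64_t rmap = make_hpt_rmap(5, 0x1234);	/* L1 lpid 5, shadow hpt index 0x1234 */

	printf("lpid=%d index=0x%llx\n",
	       (int)((rmap & EX_LPID_MASK) >> EX_LPID_SHIFT),
	       (unsigned long long)((rmap & EX_INDEX_MASK) >> EX_INDEX_SHIFT));
	return 0;
}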

When the host is invalidating a mapping for a L1 guest page, use the
nest rmap to find any shadow ptes in the shadow hpt which map that page
and invalidate them, also invalidating any caching of the entry. A future
optimisation would be to make the pte absent so that we can avoid having
to look up the guest rpte the next time an entry is faulted in.

When the host is clearing rc bits for a mapping for a L1 guest page use
the nest rmap to find any shadow ptes in the shadow hpt which map that
page and invalidate them as in the above case for invalidating a L1 guest
page. It is not sufficient to clear the rc bits in the shadow pte since
hardware can set them again without software intervention, so the mapping
must be made invalid so that we will take a page fault and can ensure that
the rc bits stay in sync in the page fault handler.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/kvm/book3s_hv_nested.c | 114 +++++++++++++++++++++++++++---------
 1 file changed, 85 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 57add167115e..90788a52b298 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -25,6 +25,9 @@ static struct patb_entry *pseries_partition_tb;
 static void kvmhv_update_ptbl_cache(struct kvm_nested_guest *gp);
 static void kvmhv_remove_all_nested_rmap_lpid(struct kvm *kvm, int lpid);
 static void kvmhv_free_memslot_nest_rmap(struct kvm_memory_slot *free);
+static void kvmhv_invalidate_shadow_pte_hash(struct kvm_hpt_info *hpt,
+					     unsigned int lpid, __be64 *hptep,
+					     unsigned long index);
 
 void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
 {
@@ -1135,30 +1138,57 @@ static void kvmhv_update_nest_rmap_rc(struct kvm *kvm, u64 n_rmap,
 				      unsigned long hpa, unsigned long mask)
 {
 	struct kvm_nested_guest *gp;
-	unsigned long gpa;
-	unsigned int shift, lpid;
-	pte_t *ptep;
+	unsigned int lpid;
 
-	gpa = n_rmap_to_gpa(n_rmap);
 	lpid = n_rmap_to_lpid(n_rmap);;
 	gp = kvmhv_find_nested(kvm, lpid);
 	if (!gp)
 		return;
 
-	/* Find the pte */
-	if (gp->radix)
-		ptep = __find_linux_pte(gp->shadow_pgtable, gpa, NULL, &shift);
-	else
-		ptep = NULL;	/* XXX TODO */
 	/*
-	 * If the pte is present and the pfn is still the same, update the pte.
-	 * If the pfn has changed then this is a stale rmap entry, the nested
-	 * gpa actually points somewhere else now, and there is nothing to do.
-	 * XXX A future optimisation would be to remove the rmap entry here.
+	 * Find the pte, and ensure it's valid and still points to the same
+	 * host page. If the pfn has changed then this is a stale rmap entry,
+	 * the shadow pte actually points somewhere else now, and there is
+	 * nothing to do. Otherwise clear the requested rc bits from the shadow
+	 * pte and perform the appropriate cache invalidation.
+	 * XXX A future optimisation would be to remove the rmap entry
 	 */
-	if (ptep && pte_present(*ptep) && ((pte_val(*ptep) & mask) == hpa)) {
-		__radix_pte_update(ptep, clr, set);
-		kvmppc_radix_tlbie_page(kvm, gpa, shift, lpid);
+	if (gp->radix) {
+		unsigned long gpa = n_rmap_to_gpa(n_rmap);
+		unsigned int shift;
+		pte_t *ptep;
+
+		ptep = __find_linux_pte(gp->shadow_pgtable, gpa, NULL, &shift);
+		/* pte present and still points to the same host page? */
+		if (ptep && pte_present(*ptep) && ((pte_val(*ptep) & mask) ==
+						   hpa)) {
+			__radix_pte_update(ptep, clr, set);
+			kvmppc_radix_tlbie_page(kvm, gpa, shift, lpid);
+		}
+	 } else {
+		unsigned long v, r, index = n_rmap_to_index(n_rmap);
+		__be64 *hptep = (__be64 *)(gp->shadow_hpt.virt + (index << 4));
+
+		preempt_disable();
+		while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
+			cpu_relax();
+		v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
+		r = be64_to_cpu(hptep[1]);
+
+		/*
+		 * It's not enough to just clear the rc bits here since the
+		 * hardware can just set them again transparently, we need to
+		 * make the pte invalid so that an attempt to access the page
+		 * will invoke the page fault handler and we can ensure
+		 * consistency across the rc bits in the various ptes.
+		 */
+		if ((v & HPTE_V_VALID) && ((r & mask) == hpa))
+			kvmhv_invalidate_shadow_pte_hash(&gp->shadow_hpt,
+							 gp->shadow_lpid, hptep,
+							 index);
+		else	/* Leave pte unchanged */
+			__unlock_hpte(hptep, v);
+		preempt_enable();
 	}
 }
 
@@ -1179,7 +1209,7 @@ void kvmhv_update_nest_rmap_rc_list(struct kvm *kvm, unsigned long *rmapp,
 	if ((clr | set) & ~(_PAGE_DIRTY | _PAGE_ACCESSED))
 		return;
 
-	mask = PTE_RPN_MASK & ~(nbytes - 1);
+	mask = HPTE_R_RPN_3_0 & ~(nbytes - 1);
 	hpa &= mask;
 
 	llist_for_each_entry(cursor, head->first, list)
@@ -1195,24 +1225,50 @@ static void kvmhv_invalidate_nest_rmap(struct kvm *kvm, u64 n_rmap,
 				       unsigned long hpa, unsigned long mask)
 {
 	struct kvm_nested_guest *gp;
-	unsigned long gpa;
-	unsigned int shift, lpid;
-	pte_t *ptep;
+	unsigned int lpid;
 
-	gpa = n_rmap_to_gpa(n_rmap);
 	lpid = n_rmap_to_lpid(n_rmap);;
 	gp = kvmhv_find_nested(kvm, lpid);
 	if (!gp)
 		return;
 
-	/* Find and invalidate the pte */
-	if (gp->radix)
+	/*
+	 * Find the pte, and ensure it's valid and still points to the same
+	 * host page. If the pfn has changed then this is a stale rmap entry,
+	 * the shadow pte actually points somewhere else now, and there is
+	 * nothing to do. Otherwise invalidate the shadow pte and perform the
+	 * appropriate cache invalidation.
+	 */
+	if (gp->radix) {
+		unsigned long gpa = n_rmap_to_gpa(n_rmap);
+		unsigned int shift;
+		pte_t *ptep;
+
 		ptep = __find_linux_pte(gp->shadow_pgtable, gpa, NULL, &shift);
-	else
-		ptep = NULL;	/* XXX TODO */
-	/* Don't spuriously invalidate ptes if the pfn has changed */
-	if (ptep && pte_present(*ptep) && ((pte_val(*ptep) & mask) == hpa))
-		kvmppc_unmap_pte(kvm, ptep, gpa, shift, NULL, gp->shadow_lpid);
+		/* pte present and still points to the same host page? */
+		if (ptep && pte_present(*ptep) && ((pte_val(*ptep) & mask) ==
+						   hpa))
+			kvmppc_unmap_pte(kvm, ptep, gpa, shift, NULL,
+					 gp->shadow_lpid);
+	} else {
+		unsigned long v, r, index = n_rmap_to_index(n_rmap);
+		__be64 *hptep = (__be64 *)(gp->shadow_hpt.virt + (index << 4));
+
+		preempt_disable();
+		while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
+			cpu_relax();
+		v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
+		r = be64_to_cpu(hptep[1]);
+
+		/* Invalidate existing pte if valid and host addr matches */
+		if ((v & HPTE_V_VALID) && ((r & mask) == hpa))
+			kvmhv_invalidate_shadow_pte_hash(&gp->shadow_hpt,
+							 gp->shadow_lpid, hptep,
+							 index);
+		else	/* Leave pte unchanged */
+			__unlock_hpte(hptep, v);
+		preempt_enable();
+	}
 }
 
 /*
@@ -1252,7 +1308,7 @@ void kvmhv_invalidate_nest_rmap_range(struct kvm *kvm,
 	gfn = (gpa >> PAGE_SHIFT) - memslot->base_gfn;
 	end_gfn = gfn + (nbytes >> PAGE_SHIFT);
 
-	addr_mask = PTE_RPN_MASK & ~(nbytes - 1);
+	addr_mask = HPTE_R_RPN_3_0 & ~(nbytes - 1);
 	hpa &= addr_mask;
 
 	for (; gfn < end_gfn; gfn++) {
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 22/23] KVM: PPC: Book3S HV: Nested: Enable nested hpt guests
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: BOok3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (20 preceding siblings ...)
  2019-08-26  6:21 ` [PATCH 21/23] KVM: PPC: Book3S HV: Nested: Implement nest rmap invalidations for hpt guests Suraj Jitindar Singh
@ 2019-08-26  6:21 ` Suraj Jitindar Singh
  2019-08-26  6:21 ` [PATCH 23/23] KVM: PPC: Book3S HV: Add nested hpt pte information to debugfs Suraj Jitindar Singh
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:21 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

Allow nested hpt (hash page table) guests to be run.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/kvm/book3s_hv_nested.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 90788a52b298..bc7b411793be 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -330,7 +330,6 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
 					       (l2_hv.lpcr & LPCR_VPM1))
 			return H_PARAMETER;
 	} else {
-		return H_PARAMETER;
 		/* must be at least V2 to support hpt guest */
 		if (l2_hv.version < 2)
 			return H_PARAMETER;
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 23/23] KVM: PPC: Book3S HV: Add nested hpt pte information to debugfs
  2019-08-26  6:20 [PATCH 00/23] KVM: PPC: BOok3S HV: Support for nested HPT guests Suraj Jitindar Singh
                   ` (21 preceding siblings ...)
  2019-08-26  6:21 ` [PATCH 22/23] KVM: PPC: Book3S HV: Nested: Enable nested " Suraj Jitindar Singh
@ 2019-08-26  6:21 ` Suraj Jitindar Singh
  22 siblings, 0 replies; 27+ messages in thread
From: Suraj Jitindar Singh @ 2019-08-26  6:21 UTC (permalink / raw)
  To: kvm-ppc; +Cc: paulus, kvm, Suraj Jitindar Singh

The radix entry in the vm debugfs folder is used to export debug
information about the ptes contained in the page table of a radix guest.
There is a corresponding htab entry which provides the same pte
information for a hpt (hash page table) guest.

When a radix guest is running a nested guest this entry also provides
information about the shadow ptes in the shadow page table.

Add the same information for a nested hpt guest running under a radix
guest to this entry. The format followed is the same as that for the
htab entry for a hpt guest.
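
Not part of the patch: a small userspace sketch of dumping the per-VM
debugfs file this patch extends. The path layout
(/sys/kernel/debug/kvm/<pid>-<vm fd>/radix) is an assumption about the
usual KVM debugfs naming rather than something defined here; for nested
hpt guests each line is printed as "<index> <hpte[0]> <hpte[1]> <guest_rpte>".

#include <stdio.h>

int main(int argc, char **argv)
{
	/* path is an assumed example of the usual KVM debugfs layout */
	const char *path = argc > 1 ? argv[1]
				    : "/sys/kernel/debug/kvm/12345-11/radix";
	char line[256];
	FILE *f = fopen(path, "r");

	if (!f) {
		perror("fopen");
		return 1;
	}
	/* nested hpt guests print "<index> <hpte[0]> <hpte[1]> <guest_rpte>" lines */
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);
	fclose(f);
	return 0;
}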

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 157 +++++++++++++++++++++------------
 1 file changed, 101 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 48b844d33dc9..8ab9d487c9e8 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1188,6 +1188,8 @@ static int debugfs_radix_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
+#define HPTE_SIZE      (2 * sizeof(unsigned long))
+
 static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
 				 size_t len, loff_t *ppos)
 {
@@ -1197,6 +1199,7 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
 	struct kvm *kvm;
 	unsigned long gpa;
 	pgd_t *pgt;
+	struct kvm_hpt_info *hpt;
 	struct kvm_nested_guest *nested;
 	pgd_t pgd, *pgdp;
 	pud_t pud, *pudp;
@@ -1234,84 +1237,126 @@ static ssize_t debugfs_radix_read(struct file *file, char __user *buf,
 	gpa = p->gpa;
 	nested = NULL;
 	pgt = NULL;
+	hpt = NULL;
 	while (len != 0 && p->lpid >= 0) {
-		if (gpa >= RADIX_PGTABLE_RANGE) {
-			gpa = 0;
-			pgt = NULL;
-			if (nested) {
-				kvmhv_put_nested(nested);
-				nested = NULL;
-			}
-			p->lpid = kvmhv_nested_next_lpid(kvm, p->lpid);
-			p->hdr = 0;
-			if (p->lpid < 0)
-				break;
-		}
-		if (!pgt) {
+		if (!pgt && !hpt) {
 			if (p->lpid == 0) {
 				pgt = kvm->arch.pgtable;
 			} else {
 				nested = kvmhv_get_nested(kvm, p->lpid, false);
 				if (!nested) {
-					gpa = RADIX_PGTABLE_RANGE;
+					p->lpid = kvmhv_nested_next_lpid(kvm,
+							p->lpid);
 					continue;
 				}
-				pgt = nested->shadow_pgtable;
+				if (nested->radix)
+					pgt = nested->shadow_pgtable;
+				else
+					hpt = &nested->shadow_hpt;
 			}
 		}
+		if ((pgt && (gpa >= RADIX_PGTABLE_RANGE)) || (hpt &&
+					(gpa >= kvmppc_hpt_npte(hpt)))) {
+			gpa = 0;
+			pgt = NULL;
+			hpt = NULL;
+			if (nested) {
+				kvmhv_put_nested(nested);
+				nested = NULL;
+			}
+			p->lpid = kvmhv_nested_next_lpid(kvm, p->lpid);
+			p->hdr = 0;
+			continue;
+		}
 		n = 0;
 		if (!p->hdr) {
 			if (p->lpid > 0)
 				n = scnprintf(p->buf, sizeof(p->buf),
-					      "\nNested LPID %d: ", p->lpid);
-			n += scnprintf(p->buf + n, sizeof(p->buf) - n,
-				      "pgdir: %lx\n", (unsigned long)pgt);
+					      "\nNested LPID %d: \n", p->lpid);
+			if (pgt)
+				n += scnprintf(p->buf + n, sizeof(p->buf) - n,
+					      "RADIX:\npgdir: %lx\n",
+					      (unsigned long)pgt);
+			else
+				n += scnprintf(p->buf + n, sizeof(p->buf) - n,
+					       "HASH:\nnpte: %lx\n",
+					       kvmppc_hpt_npte(hpt));
 			p->hdr = 1;
 			goto copy;
 		}
 
-		pgdp = pgt + pgd_index(gpa);
-		pgd = READ_ONCE(*pgdp);
-		if (!(pgd_val(pgd) & _PAGE_PRESENT)) {
-			gpa = (gpa & PGDIR_MASK) + PGDIR_SIZE;
-			continue;
-		}
+		if (pgt) {
+			pgdp = pgt + pgd_index(gpa);
+			pgd = READ_ONCE(*pgdp);
+			if (!(pgd_val(pgd) & _PAGE_PRESENT)) {
+				gpa = (gpa & PGDIR_MASK) + PGDIR_SIZE;
+				continue;
+			}
 
-		pudp = pud_offset(&pgd, gpa);
-		pud = READ_ONCE(*pudp);
-		if (!(pud_val(pud) & _PAGE_PRESENT)) {
-			gpa = (gpa & PUD_MASK) + PUD_SIZE;
-			continue;
-		}
-		if (pud_val(pud) & _PAGE_PTE) {
-			pte = pud_val(pud);
-			shift = PUD_SHIFT;
-			goto leaf;
-		}
+			pudp = pud_offset(&pgd, gpa);
+			pud = READ_ONCE(*pudp);
+			if (!(pud_val(pud) & _PAGE_PRESENT)) {
+				gpa = (gpa & PUD_MASK) + PUD_SIZE;
+				continue;
+			}
+			if (pud_val(pud) & _PAGE_PTE) {
+				pte = pud_val(pud);
+				shift = PUD_SHIFT;
+				goto leaf;
+			}
 
-		pmdp = pmd_offset(&pud, gpa);
-		pmd = READ_ONCE(*pmdp);
-		if (!(pmd_val(pmd) & _PAGE_PRESENT)) {
-			gpa = (gpa & PMD_MASK) + PMD_SIZE;
-			continue;
-		}
-		if (pmd_val(pmd) & _PAGE_PTE) {
-			pte = pmd_val(pmd);
-			shift = PMD_SHIFT;
-			goto leaf;
-		}
+			pmdp = pmd_offset(&pud, gpa);
+			pmd = READ_ONCE(*pmdp);
+			if (!(pmd_val(pmd) & _PAGE_PRESENT)) {
+				gpa = (gpa & PMD_MASK) + PMD_SIZE;
+				continue;
+			}
+			if (pmd_val(pmd) & _PAGE_PTE) {
+				pte = pmd_val(pmd);
+				shift = PMD_SHIFT;
+				goto leaf;
+			}
 
-		ptep = pte_offset_kernel(&pmd, gpa);
-		pte = pte_val(READ_ONCE(*ptep));
-		if (!(pte & _PAGE_PRESENT)) {
-			gpa += PAGE_SIZE;
-			continue;
-		}
-		shift = PAGE_SHIFT;
-	leaf:
-		n = scnprintf(p->buf, sizeof(p->buf),
+			ptep = pte_offset_kernel(&pmd, gpa);
+			pte = pte_val(READ_ONCE(*ptep));
+			if (!(pte & _PAGE_PRESENT)) {
+				gpa += PAGE_SIZE;
+				continue;
+			}
+			shift = PAGE_SHIFT;
+		leaf:
+			n = scnprintf(p->buf, sizeof(p->buf),
 			      " %lx: %lx %d\n", gpa, pte, shift);
-		gpa += 1ul << shift;
+			gpa += 1ul << shift;
+		} else { /* hpt */
+			__be64 *hptp = (__be64 *)(hpt->virt + (gpa * HPTE_SIZE));
+			unsigned long v, hr, gr;
+
+			if (!(be64_to_cpu(hptp[0]) & (HPTE_V_VALID |
+							HPTE_V_ABSENT))) {
+				gpa++;
+				continue;
+			}
+			/* lock the HPTE so it's stable and read it */
+			preempt_disable();
+			while (!try_lock_hpte(hptp, HPTE_V_HVLOCK))
+				cpu_relax();
+			v = be64_to_cpu(hptp[0]) & ~HPTE_V_HVLOCK;
+			hr = be64_to_cpu(hptp[1]);
+			gr = hpt->rev[gpa].guest_rpte;
+			unlock_hpte(hptp, v);
+			preempt_enable();
+
+			if (!(v & (HPTE_V_VALID | HPTE_V_ABSENT))) {
+				gpa++;
+				continue;
+			}
+
+			n = scnprintf(p->buf, sizeof(p->buf),
+				      "%6lx %.16lx %.16lx %.16lx\n",
+				      gpa, v, hr, gr);
+			gpa++;
+		}
 	copy:
 		p->chars_left = n;
 		if (n > len)
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 03/23] KVM: PPC: Book3S HV: Nested: Don't allow hash guests to run nested guests
  2019-08-26  6:20 ` [PATCH 03/23] KVM: PPC: Book3S HV: Nested: Don't allow hash guests to run nested guests Suraj Jitindar Singh
@ 2019-10-23  4:47   ` Paul Mackerras
  0 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2019-10-23  4:47 UTC (permalink / raw)
  To: Suraj Jitindar Singh; +Cc: kvm-ppc, kvm

On Mon, Aug 26, 2019 at 04:20:49PM +1000, Suraj Jitindar Singh wrote:
> Don't allow hpt (hash page table) guests to act as guest hypervisors and
> thus be able to run nested guests. There is currently no support for
> this, if a nested guest is to be run it must be run at the lowest level.
> Explicitly disallow hash guests from enabling the nested kvm-hv capability
> at the hypervisor level.
> 
> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
> ---
>  arch/powerpc/kvm/book3s_hv.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index cde3f5a4b3e4..ce960301bfaa 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -5336,8 +5336,12 @@ static int kvmhv_enable_nested(struct kvm *kvm)
>  		return -ENODEV;
>  
>  	/* kvm == NULL means the caller is testing if the capability exists */
> -	if (kvm)
> +	if (kvm) {
> +		/* Only radix guests can act as nested hv and thus run guests */
> +		if (!kvm_is_radix(kvm))
> +			return -1;
>  		kvm->arch.nested_enable = true;
> +	}

I don't think this is necessary, and is possibly undesirable, since a
guest can switch between HPT and radix mode.  In fact if a guest in
HPT mode tries to do any of the hcalls for managing nested guests, it
will get errors, because we have this:

static inline bool nesting_enabled(struct kvm *kvm)
{
	return kvm->arch.nested_enable && kvm_is_radix(kvm);
}

and H_SET_PARTITION_TABLE, H_ENTER_NESTED, etc. all return H_FUNCTION
if nesting_enabled() is false.  (This is as the code is today, without
your patch.)  Furthermore, kvmppc_switch_mmu_to_hpt() does this:

	if (nesting_enabled(kvm))
		kvmhv_release_all_nested(kvm);

So I think it is all covered already without your patch.
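
For reference, the gating in kvmppc_pseries_do_hcall() looks roughly
like this (paraphrased from memory, so it may not match the tree
exactly):

	case H_SET_PARTITION_TABLE:
		ret = H_FUNCTION;
		if (nesting_enabled(vcpu->kvm))
			ret = kvmhv_set_partition_table(vcpu);
		break;
	case H_ENTER_NESTED:
		ret = H_FUNCTION;
		if (nesting_enabled(vcpu->kvm))
			ret = kvmhv_enter_nested_guest(vcpu);
		break;

so an HPT-mode guest attempting these hcalls already gets H_FUNCTION
back without any extra check at capability-enable time.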

Paul.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 13/23] KVM: PPC: Book3S HV: Nested: Infrastructure for nested hpt guest setup
  2019-08-26  6:20 ` [PATCH 13/23] KVM: PPC: Book3S HV: Nested: Infrastructure for nested hpt guest setup Suraj Jitindar Singh
@ 2019-10-24  3:43   ` Paul Mackerras
  0 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2019-10-24  3:43 UTC (permalink / raw)
  To: Suraj Jitindar Singh; +Cc: kvm-ppc, kvm

On Mon, Aug 26, 2019 at 04:20:59PM +1000, Suraj Jitindar Singh wrote:
> Add the infrastructure to book3s_hv_nested.c to allow a nested hpt (hash
> page table) guest to be set up. As this patch doesn't add the capability
> of creating or removing mmu translations, return H_PARAMETER when an
> attempt to actually run a nested hpt guest is made.
> 
> Add fields to the nested guest struct to store the hpt and the vrma slb
> entry.
> 
> Update kvmhv_update_ptbl_cache() to determine when a nested guest is
> switching from radix to hpt or hpt to radix and perform the required
> setup. A page table (radix) or hpt (hash) must be allocated with any
> existing table being freed and the radix field in the nested guest
> struct being updated under the mmu_lock (this means that when holding
> the mmu_lock the radix field can be tested and the existence of the
> correct type of page table guaranteed). Also remove all of the nest rmap
> entries which belong to this nested guest since a nested rmap entry is
> specific to whether the nested guest is hash or radix.
> 
> When a nested guest is initially created or when the partition table
> entry is empty we assume a radix guest since it is much less expensive
> to allocate a radix page table compared to a hpt.
> 
> The hpt which is allocated in the hypervisor for the nested guest
> (called the shadow hpt) is identical in size to the one allocated in the
> guest hypervisor to ensure a 1-to-1 mapping between page table entries.
> This simplifies handling of the entries; however, this requirement could
> be relaxed in future if support were added.
> 
> Introduce a hash nested_page_fault function to be invoked when the
> nested guest which experiences a page fault is a hash guest; it returns
> -EINVAL for now. Also return -EINVAL when handling the H_TLB_INVALIDATE
> hcall. Support for the hypervisor paging out a guest page which has been
> mapped through to a nested guest is also still lacking. These 3 pieces
> of functionality are added in subsequent patches.
> 
> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>

Small nit below...

> +/* Caller must hold gp->tlb_lock */
> +static int kvmhv_switch_to_radix_nested(struct kvm_nested_guest *gp)
> +{
> +	struct kvm *kvm = gp->l1_host;
> +	pgd_t *pgtable;
> +
> +	/* try to allocate a radix tree */
> +	pgtable = pgd_alloc(kvm->mm);
> +	if (!pgtable) {
> +		pr_err_ratelimited("KVM: Couldn't alloc nested radix tree\n");
> +		return -ENOMEM;
> +	}
> +
> +	/* mmu_lock protects shadow_hpt & radix in nested guest struct */
> +	spin_lock(&kvm->mmu_lock);
> +	kvmppc_free_hpt(&gp->shadow_hpt);
> +	gp->radix = 1;
> +	gp->shadow_pgtable = pgtable;
> +	spin_unlock(&kvm->mmu_lock);
> +
> +	/* remove all nested rmap entries and perform global invalidation */
> +	kvmhv_remove_all_nested_rmap_lpid(kvm, gp->l1_lpid);
> +	kvmhv_flush_lpid(gp->shadow_lpid, gp->radix);

Shouldn't this flush be done using the old value of gp->radix, i.e. 0?
Both because we want to flush the old translations for the guest, and
because we haven't changed the partition table entry for the guest at
this point, so it still says HPT.
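
Something like the following is what I'd expect here (just a sketch,
assuming the two-argument kvmhv_flush_lpid(lpid, radix) this series
introduces, untested):

	/* remove all nested rmap entries and perform global invalidation */
	kvmhv_remove_all_nested_rmap_lpid(kvm, gp->l1_lpid);
	/* flush using the old translation mode: the guest was HPT here */
	kvmhv_flush_lpid(gp->shadow_lpid, false);

i.e. pass the pre-switch value rather than the just-updated gp->radix.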

> +
> +	return 0;
> +}
> +
> +/* Caller must hold gp->tlb_lock */
> +static int kvmhv_switch_to_hpt_nested(struct kvm_nested_guest *gp, int order)
> +{
> +	struct kvm *kvm = gp->l1_host;
> +	struct kvm_hpt_info info;
> +	int rc;
> +
> +	/* try to allocate an hpt */
> +	rc = kvmppc_allocate_hpt(&info, order);
> +	if (rc) {
> +		pr_err_ratelimited("KVM: Couldn't alloc nested hpt\n");
> +		return rc;
> +	}
> +
> +	/* mmu_lock protects shadow_pgtable & radix in nested guest struct */
> +	spin_lock(&kvm->mmu_lock);
> +	kvmppc_free_pgtable_radix(kvm, gp->shadow_pgtable, gp->shadow_lpid);
> +	pgd_free(kvm->mm, gp->shadow_pgtable);
> +	gp->shadow_pgtable = NULL;
> +	gp->radix = 0;
> +	gp->shadow_hpt = info;
> +	spin_unlock(&kvm->mmu_lock);
> +
> +	/* remove all nested rmap entries and perform global invalidation */
> +	kvmhv_remove_all_nested_rmap_lpid(kvm, gp->l1_lpid);
> +	kvmhv_flush_lpid(gp->shadow_lpid, gp->radix);

Similarly, shouldn't this be a radix flush?
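
i.e., with the same caveats as above, something like:

	kvmhv_flush_lpid(gp->shadow_lpid, true);	/* old mode was radix */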

> +
> +	return 0;
> +}

Paul.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 14/23] KVM: PPC: Book3S HV: Nested: Context switch slb for nested hpt guest
  2019-08-26  6:21 ` [PATCH 14/23] KVM: PPC: Book3S HV: Nested: Context switch slb for nested hpt guest Suraj Jitindar Singh
@ 2019-10-24  4:48   ` Paul Mackerras
  0 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2019-10-24  4:48 UTC (permalink / raw)
  To: Suraj Jitindar Singh; +Cc: kvm-ppc, kvm

On Mon, Aug 26, 2019 at 04:21:00PM +1000, Suraj Jitindar Singh wrote:
> A version 2 of the H_ENTER_NESTED hcall was added with an argument to
> specify the slb entries which should be used to run the nested guest.
> 
> Add support for this version of the hcall structures to
> kvmhv_enter_nested_guest() and context switch the slb when the nested
> guest being run is a hpt (hash page table) guest.
> 
> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>

Question below...

> @@ -307,6 +335,26 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>  	vcpu->arch.regs.msr = vcpu->arch.shregs.msr;
>  	saved_l1_regs = vcpu->arch.regs;
>  	kvmhv_save_hv_regs(vcpu, &saved_l1_hv);
> +	/* if running hpt then context switch the slb in the vcpu struct */
> +	if (!radix) {
> +		slb_ptr = kvmppc_get_gpr(vcpu, 6);
> +		l2_slb = kzalloc(sizeof(*l2_slb), GFP_KERNEL);
> +		saved_l1_slb = kzalloc(sizeof(*saved_l1_slb), GFP_KERNEL);
> +
> +		if ((!l2_slb) || (!saved_l1_slb)) {
> +			ret = H_HARDWARE;
> +			goto out_free;
> +		}
> +		err = kvm_vcpu_read_guest(vcpu, slb_ptr, l2_slb,
> +					  sizeof(struct guest_slb));
> +		if (err) {
> +			ret = H_PARAMETER;
> +			goto out_free;
> +		}
> +		if (kvmppc_need_byteswap(vcpu))
> +			byteswap_guest_slb(l2_slb);
> +		kvmhv_save_guest_slb(vcpu, saved_l1_slb);

Why are we bothering to save the SLB state of the L1 guest, which has
to be a radix guest?  Won't the L1 SLB state always just have 0
entries?

> @@ -354,6 +409,8 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>  		vcpu->arch.shregs.msr |= MSR_TS_S;
>  	vc->tb_offset = saved_l1_hv.tb_offset;
>  	restore_hv_regs(vcpu, &saved_l1_hv);
> +	if (!radix)
> +		kvmhv_restore_guest_slb(vcpu, saved_l1_slb);

Likewise here can't we just set vcpu->arch.slb_max and
vcpu->arch.slb_nr to zero?
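
i.e. something like this sketch instead of the save/restore (untested,
and assuming a radix L1 really never has valid SLB entries here):

	/* radix L1: no SLB entries to restore, just drop any stale ones */
	vcpu->arch.slb_max = 0;
	vcpu->arch.slb_nr = 0;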

Paul.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2019-10-24  5:38 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-26  6:20 [PATCH 00/23] KVM: PPC: BOok3S HV: Support for nested HPT guests Suraj Jitindar Singh
2019-08-26  6:20 ` [PATCH 01/23] KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler Suraj Jitindar Singh
2019-08-26  6:20 ` [PATCH 02/23] KVM: PPC: Book3S HV: Increment mmu_notifier_seq when modifying radix pte rc bits Suraj Jitindar Singh
2019-08-26  6:20 ` [PATCH 03/23] KVM: PPC: Book3S HV: Nested: Don't allow hash guests to run nested guests Suraj Jitindar Singh
2019-10-23  4:47   ` Paul Mackerras
2019-08-26  6:20 ` [PATCH 04/23] KVM: PPC: Book3S HV: Handle making H_ENTER_NESTED hcall in a separate function Suraj Jitindar Singh
2019-08-26  6:20 ` [PATCH 05/23] KVM: PPC: Book3S HV: Enable calling kvmppc_hpte_hv_fault in virtual mode Suraj Jitindar Singh
2019-08-26  6:20 ` [PATCH 06/23] KVM: PPC: Book3S HV: Allow hpt manipulation hcalls to be called " Suraj Jitindar Singh
2019-08-26  6:20 ` [PATCH 07/23] KVM: PPC: Book3S HV: Make kvmppc_invalidate_hpte() take lpid not a kvm struct Suraj Jitindar Singh
2019-08-26  6:20 ` [PATCH 08/23] KVM: PPC: Book3S HV: Nested: Allow pseries hypervisor to run hpt nested guest Suraj Jitindar Singh
2019-08-26  6:20 ` [PATCH 09/23] KVM: PPC: Book3S HV: Nested: Improve comments and naming of nest rmap functions Suraj Jitindar Singh
2019-08-26  6:20 ` [PATCH 10/23] KVM: PPC: Book3S HV: Nested: Increase gpa field in nest rmap to 46 bits Suraj Jitindar Singh
2019-08-26  6:20 ` [PATCH 11/23] KVM: PPC: Book3S HV: Nested: Remove single nest rmap entries Suraj Jitindar Singh
2019-08-26  6:20 ` [PATCH 12/23] KVM: PPC: Book3S HV: Nested: add kvmhv_remove_all_nested_rmap_lpid() Suraj Jitindar Singh
2019-08-26  6:20 ` [PATCH 13/23] KVM: PPC: Book3S HV: Nested: Infrastructure for nested hpt guest setup Suraj Jitindar Singh
2019-10-24  3:43   ` Paul Mackerras
2019-08-26  6:21 ` [PATCH 14/23] KVM: PPC: Book3S HV: Nested: Context switch slb for nested hpt guest Suraj Jitindar Singh
2019-10-24  4:48   ` Paul Mackerras
2019-08-26  6:21 ` [PATCH 15/23] KVM: PPC: Book3S HV: Store lpcr and hdec_exp in the vcpu struct Suraj Jitindar Singh
2019-08-26  6:21 ` [PATCH 16/23] KVM: PPC: Book3S HV: Nested: Make kvmppc_run_vcpu() entry path nested capable Suraj Jitindar Singh
2019-08-26  6:21 ` [PATCH 17/23] KVM: PPC: Book3S HV: Nested: Rename kvmhv_xlate_addr_nested_radix Suraj Jitindar Singh
2019-08-26  6:21 ` [PATCH 18/23] KVM: PPC: Book3S HV: Separate out hashing from kvmppc_hv_find_lock_hpte() Suraj Jitindar Singh
2019-08-26  6:21 ` [PATCH 19/23] KVM: PPC: Book3S HV: Nested: Implement nested hpt mmu translation Suraj Jitindar Singh
2019-08-26  6:21 ` [PATCH 20/23] KVM: PPC: Book3S HV: Nested: Handle tlbie hcall for nested hpt guest Suraj Jitindar Singh
2019-08-26  6:21 ` [PATCH 21/23] KVM: PPC: Book3S HV: Nested: Implement nest rmap invalidations for hpt guests Suraj Jitindar Singh
2019-08-26  6:21 ` [PATCH 22/23] KVM: PPC: Book3S HV: Nested: Enable nested " Suraj Jitindar Singh
2019-08-26  6:21 ` [PATCH 23/23] KVM: PPC: Book3S HV: Add nested hpt pte information to debugfs Suraj Jitindar Singh
