* [PATCH kernel v2 0/6] KVM: PPC: TCE improvements
From: Alexey Kardashevskiy @ 2018-09-10  8:29 UTC
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Aneesh Kumar K.V,
	Paul Mackerras, Alex Williamson

Hi,

Here is my current queue of TCE/KVM patches.

1/6 is a bugfix for https://bugzilla.redhat.com/show_bug.cgi?id=1620360
2/6..5/6 are to help with testing https://bugzilla.redhat.com/show_bug.cgi?id=1613190
6/6 is a small cleanup


This is based on sha1
11da3a7 Linus Torvalds "Linux 4.19-rc3".

Please comment. Thanks.


Alex, I have cc:ed you to keep you informed: the [RFC 1/6] version did
change drivers/vfio/vfio_iommu_spapr_tce.c but this one does not.


Alexey Kardashevskiy (6):
  KVM: PPC: Avoid marking DMA-mapped pages dirty in real mode
  KVM: PPC: Validate all TCEs before updating tables
  KVM: PPC: Inform the userspace about TCE update failures
  KVM: PPC: Validate TCEs against preregistered memory page sizes
  KVM: PPC: Propagate errors to the guest when failed instead of
    ignoring
  KVM: PPC: Remove redundant permission bits removal

 arch/powerpc/include/asm/book3s/64/pgtable.h |   1 -
 arch/powerpc/include/asm/iommu.h             |   2 -
 arch/powerpc/include/asm/kvm_ppc.h           |   4 +-
 arch/powerpc/include/asm/mmu_context.h       |   1 +
 arch/powerpc/kernel/iommu.c                  |  25 ------
 arch/powerpc/kvm/book3s_64_vio.c             |  89 +++++++++++++++------
 arch/powerpc/kvm/book3s_64_vio_hv.c          | 114 +++++++++++++++++----------
 arch/powerpc/mm/init_64.c                    |  49 ------------
 arch/powerpc/mm/mmu_context_iommu.c          |  34 +++++++-
 9 files changed, 170 insertions(+), 149 deletions(-)

-- 
2.11.0


* [PATCH kernel v2 1/6] KVM: PPC: Avoid marking DMA-mapped pages dirty in real mode
From: Alexey Kardashevskiy @ 2018-09-10  8:29 UTC
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Aneesh Kumar K.V,
	Paul Mackerras, Alex Williamson

At the moment the real mode handler of H_PUT_TCE calls iommu_tce_xchg_rm()
which in turn reads the old TCE and, if it was a valid entry, marks
the physical page dirty if it was mapped for writing. Since this runs in
real mode, realmode_pfn_to_page() is used instead of pfn_to_page()
to get the page struct. However SetPageDirty() itself reads the compound
page head and returns a virtual address of the head page struct, and
setting the dirty bit through that address kills the system.

This adds additional dirty bit tracking into the MM/IOMMU API for use
in real mode. Note that this does not change how VFIO and
KVM (in virtual mode) set this bit. The KVM (real mode) changes are:
- use the lowest bit of the cached host physical address to carry
the dirty bit (a sketch of this trick follows below);
- mark pages dirty when they are unpinned, which happens when
the preregistered memory is released, which in turn always happens in
virtual mode;
- add an mm_iommu_ua_mark_dirty_rm() helper to set the delayed dirty bit;
- change iommu_tce_xchg_rm() to take the kvm struct so the mm is available
to the new mm_iommu_ua_mark_dirty_rm() helper;
- move iommu_tce_xchg_rm() to book3s_64_vio_hv.c (which is its only
caller anyway) to reduce the spread of real mode KVM and IOMMU knowledge
across different subsystems.

This removes realmode_pfn_to_page() as it is not used anymore.

While at it, remove some EXPORT_SYMBOL_GPL() markers as that code is for
real mode only and modules cannot call it anyway.
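
For illustration, here is a minimal standalone sketch of the low-bit trick
(the helper names here are ours, not the kernel's): a page-aligned host
physical address always has its low bits clear, so bit 0 is free to carry
the delayed dirty flag and is masked off whenever the address is read back.

	/* Sketch only: a dirty flag in bit 0 of a 4K-aligned address. */
	#define HPA_DIRTY	0x1UL
	#define HPA_MASK	(~(4096UL - 1))	/* assumes 4K alignment */

	static inline unsigned long hpa_mark_dirty(unsigned long cached)
	{
		return cached | HPA_DIRTY;	/* set the delayed dirty bit */
	}

	static inline unsigned long hpa_addr(unsigned long cached)
	{
		return cached & HPA_MASK;	/* strip flags before use */
	}

	static inline int hpa_test_dirty(unsigned long cached)
	{
		return cached & HPA_DIRTY;	/* checked at unpin time */
	}

In the patch itself the same idea appears as MM_IOMMU_TABLE_GROUP_PAGE_DIRTY
and MM_IOMMU_TABLE_GROUP_PAGE_MASK in mmu_context_iommu.c.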

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v2:
* only do the delayed dirtying for real mode
* no change in VFIO IOMMU SPAPR TCE driver is needed anymore
* inverted MM_IOMMU_TABLE_GROUP_PAGE_MASK
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |  1 -
 arch/powerpc/include/asm/iommu.h             |  2 --
 arch/powerpc/include/asm/mmu_context.h       |  1 +
 arch/powerpc/kernel/iommu.c                  | 25 --------------
 arch/powerpc/kvm/book3s_64_vio_hv.c          | 39 +++++++++++++++++-----
 arch/powerpc/mm/init_64.c                    | 49 ----------------------------
 arch/powerpc/mm/mmu_context_iommu.c          | 34 ++++++++++++++++---
 7 files changed, 62 insertions(+), 89 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 13a688f..2fdc865 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1051,7 +1051,6 @@ static inline void vmemmap_remove_mapping(unsigned long start,
 	return hash__vmemmap_remove_mapping(start, page_size);
 }
 #endif
-struct page *realmode_pfn_to_page(unsigned long pfn);
 
 static inline pte_t pmd_pte(pmd_t pmd)
 {
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index ab3a4fb..3d4b88c 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -220,8 +220,6 @@ extern void iommu_del_device(struct device *dev);
 extern int __init tce_iommu_bus_notifier_init(void);
 extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
 		unsigned long *hpa, enum dma_data_direction *direction);
-extern long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
-		unsigned long *hpa, enum dma_data_direction *direction);
 #else
 static inline void iommu_register_group(struct iommu_table_group *table_group,
 					int pci_domain_number,
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index b2f89b6..b694d6a 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -38,6 +38,7 @@ extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
 		unsigned long ua, unsigned int pageshift, unsigned long *hpa);
 extern long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
 		unsigned long ua, unsigned int pageshift, unsigned long *hpa);
+extern void mm_iommu_ua_mark_dirty_rm(struct mm_struct *mm, unsigned long ua);
 extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
 extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
 #endif
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index af7a20d..19b4c62 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1013,31 +1013,6 @@ long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
 }
 EXPORT_SYMBOL_GPL(iommu_tce_xchg);
 
-#ifdef CONFIG_PPC_BOOK3S_64
-long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
-		unsigned long *hpa, enum dma_data_direction *direction)
-{
-	long ret;
-
-	ret = tbl->it_ops->exchange_rm(tbl, entry, hpa, direction);
-
-	if (!ret && ((*direction == DMA_FROM_DEVICE) ||
-			(*direction == DMA_BIDIRECTIONAL))) {
-		struct page *pg = realmode_pfn_to_page(*hpa >> PAGE_SHIFT);
-
-		if (likely(pg)) {
-			SetPageDirty(pg);
-		} else {
-			tbl->it_ops->exchange_rm(tbl, entry, hpa, direction);
-			ret = -EFAULT;
-		}
-	}
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(iommu_tce_xchg_rm);
-#endif
-
 int iommu_take_ownership(struct iommu_table *tbl)
 {
 	unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 506a4d4..6821ead 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -187,12 +187,35 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
 EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-static void kvmppc_rm_clear_tce(struct iommu_table *tbl, unsigned long entry)
+static long iommu_tce_xchg_rm(struct mm_struct *mm, struct iommu_table *tbl,
+		unsigned long entry, unsigned long *hpa,
+		enum dma_data_direction *direction)
+{
+	long ret;
+
+	ret = tbl->it_ops->exchange_rm(tbl, entry, hpa, direction);
+
+	if (!ret && ((*direction == DMA_FROM_DEVICE) ||
+				(*direction == DMA_BIDIRECTIONAL))) {
+		__be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY_RM(tbl, entry);
+		/*
+		 * kvmppc_rm_tce_iommu_do_map() updates the UA cache after
+		 * calling this, so we still get a valid UA here.
+		 */
+		if (pua && *pua)
+			mm_iommu_ua_mark_dirty_rm(mm, be64_to_cpu(*pua));
+	}
+
+	return ret;
+}
+
+static void kvmppc_rm_clear_tce(struct kvm *kvm, struct iommu_table *tbl,
+		unsigned long entry)
 {
 	unsigned long hpa = 0;
 	enum dma_data_direction dir = DMA_NONE;
 
-	iommu_tce_xchg_rm(tbl, entry, &hpa, &dir);
+	iommu_tce_xchg_rm(kvm->mm, tbl, entry, &hpa, &dir);
 }
 
 static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
@@ -224,7 +247,7 @@ static long kvmppc_rm_tce_iommu_do_unmap(struct kvm *kvm,
 	unsigned long hpa = 0;
 	long ret;
 
-	if (iommu_tce_xchg_rm(tbl, entry, &hpa, &dir))
+	if (iommu_tce_xchg_rm(kvm->mm, tbl, entry, &hpa, &dir))
 		/*
 		 * real mode xchg can fail if struct page crosses
 		 * a page boundary
@@ -236,7 +259,7 @@ static long kvmppc_rm_tce_iommu_do_unmap(struct kvm *kvm,
 
 	ret = kvmppc_rm_tce_iommu_mapped_dec(kvm, tbl, entry);
 	if (ret)
-		iommu_tce_xchg_rm(tbl, entry, &hpa, &dir);
+		iommu_tce_xchg_rm(kvm->mm, tbl, entry, &hpa, &dir);
 
 	return ret;
 }
@@ -282,7 +305,7 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
 	if (WARN_ON_ONCE_RM(mm_iommu_mapped_inc(mem)))
 		return H_CLOSED;
 
-	ret = iommu_tce_xchg_rm(tbl, entry, &hpa, &dir);
+	ret = iommu_tce_xchg_rm(kvm->mm, tbl, entry, &hpa, &dir);
 	if (ret) {
 		mm_iommu_mapped_dec(mem);
 		/*
@@ -371,7 +394,7 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 			return ret;
 
 		WARN_ON_ONCE_RM(1);
-		kvmppc_rm_clear_tce(stit->tbl, entry);
+		kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl, entry);
 	}
 
 	kvmppc_tce_put(stt, entry, tce);
@@ -520,7 +543,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 				goto unlock_exit;
 
 			WARN_ON_ONCE_RM(1);
-			kvmppc_rm_clear_tce(stit->tbl, entry);
+			kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl, entry);
 		}
 
 		kvmppc_tce_put(stt, entry + i, tce);
@@ -571,7 +594,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
 				return ret;
 
 			WARN_ON_ONCE_RM(1);
-			kvmppc_rm_clear_tce(stit->tbl, entry);
+			kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl, entry);
 		}
 	}
 
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 51ce091..7a9886f 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -308,55 +308,6 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 {
 }
 
-/*
- * We do not have access to the sparsemem vmemmap, so we fallback to
- * walking the list of sparsemem blocks which we already maintain for
- * the sake of crashdump. In the long run, we might want to maintain
- * a tree if performance of that linear walk becomes a problem.
- *
- * realmode_pfn_to_page functions can fail due to:
- * 1) As real sparsemem blocks do not lay in RAM continously (they
- * are in virtual address space which is not available in the real mode),
- * the requested page struct can be split between blocks so get_page/put_page
- * may fail.
- * 2) When huge pages are used, the get_page/put_page API will fail
- * in real mode as the linked addresses in the page struct are virtual
- * too.
- */
-struct page *realmode_pfn_to_page(unsigned long pfn)
-{
-	struct vmemmap_backing *vmem_back;
-	struct page *page;
-	unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
-	unsigned long pg_va = (unsigned long) pfn_to_page(pfn);
-
-	for (vmem_back = vmemmap_list; vmem_back; vmem_back = vmem_back->list) {
-		if (pg_va < vmem_back->virt_addr)
-			continue;
-
-		/* After vmemmap_list entry free is possible, need check all */
-		if ((pg_va + sizeof(struct page)) <=
-				(vmem_back->virt_addr + page_size)) {
-			page = (struct page *) (vmem_back->phys + pg_va -
-				vmem_back->virt_addr);
-			return page;
-		}
-	}
-
-	/* Probably that page struct is split between real pages */
-	return NULL;
-}
-EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
-
-#else
-
-struct page *realmode_pfn_to_page(unsigned long pfn)
-{
-	struct page *page = pfn_to_page(pfn);
-	return page;
-}
-EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
-
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
 #ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index c9ee9e2..56c2234 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -18,11 +18,15 @@
 #include <linux/migrate.h>
 #include <linux/hugetlb.h>
 #include <linux/swap.h>
+#include <linux/sizes.h>
 #include <asm/mmu_context.h>
 #include <asm/pte-walk.h>
 
 static DEFINE_MUTEX(mem_list_mutex);
 
+#define MM_IOMMU_TABLE_GROUP_PAGE_DIRTY	0x1
+#define MM_IOMMU_TABLE_GROUP_PAGE_MASK	~(SZ_4K - 1)
+
 struct mm_iommu_table_group_mem_t {
 	struct list_head next;
 	struct rcu_head rcu;
@@ -263,6 +267,9 @@ static void mm_iommu_unpin(struct mm_iommu_table_group_mem_t *mem)
 		if (!page)
 			continue;
 
+		if (mem->hpas[i] & MM_IOMMU_TABLE_GROUP_PAGE_DIRTY)
+			SetPageDirty(page);
+
 		put_page(page);
 		mem->hpas[i] = 0;
 	}
@@ -360,7 +367,6 @@ struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm(struct mm_struct *mm,
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(mm_iommu_lookup_rm);
 
 struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
 		unsigned long ua, unsigned long entries)
@@ -390,7 +396,7 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
 	if (pageshift > mem->pageshift)
 		return -EFAULT;
 
-	*hpa = *va | (ua & ~PAGE_MASK);
+	*hpa = (*va & MM_IOMMU_TABLE_GROUP_PAGE_MASK) | (ua & ~PAGE_MASK);
 
 	return 0;
 }
@@ -413,11 +419,31 @@ long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
 	if (!pa)
 		return -EFAULT;
 
-	*hpa = *pa | (ua & ~PAGE_MASK);
+	*hpa = (*pa & MM_IOMMU_TABLE_GROUP_PAGE_MASK) | (ua & ~PAGE_MASK);
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa_rm);
+
+extern void mm_iommu_ua_mark_dirty_rm(struct mm_struct *mm, unsigned long ua)
+{
+	struct mm_iommu_table_group_mem_t *mem;
+	long entry;
+	void *va;
+	unsigned long *pa;
+
+	mem = mm_iommu_lookup_rm(mm, ua, PAGE_SIZE);
+	if (!mem)
+		return;
+
+	entry = (ua - mem->ua) >> PAGE_SHIFT;
+	va = &mem->hpas[entry];
+
+	pa = (void *) vmalloc_to_phys(va);
+	if (!pa)
+		return;
+
+	*pa |= MM_IOMMU_TABLE_GROUP_PAGE_DIRTY;
+}
 
 long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem)
 {
-- 
2.11.0


* [PATCH kernel v2 2/6] KVM: PPC: Validate all TCEs before updating tables
From: Alexey Kardashevskiy @ 2018-09-10  8:29 UTC
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Aneesh Kumar K.V,
	Paul Mackerras, Alex Williamson

The KVM TCE handlers are written so that they fail when either something
has gone horribly wrong or userspace has made an obvious mistake such as
passing a misaligned address.

We are going to enhance the TCE checker to fail on attempts to map a bigger
IOMMU page than the underlying pinned memory, so let's validate TCEs
beforehand.

This should cause no behavioral change.
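
As a rough standalone sketch (generic names, not the actual kernel code),
the handlers move to a two-pass shape, so a bad entry in the middle of
a list can no longer leave the table half-updated:

	/* Sketch: validate every entry first, then commit.
	 * Assumes the usual H_* return codes from asm/hvcall.h. */
	static long put_tce_list(const unsigned long *tces,
			unsigned long npages,
			long (*validate)(unsigned long),
			void (*put)(unsigned long))
	{
		unsigned long i;

		for (i = 0; i < npages; ++i)	/* pass 1: reject bad input */
			if (validate(tces[i]) != H_SUCCESS)
				return H_PARAMETER;

		for (i = 0; i < npages; ++i)	/* pass 2: update the table */
			put(tces[i]);

		return H_SUCCESS;
	}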

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
Changes:
v2:
* added a comment for the second get_user() from v1 discussion
---
 arch/powerpc/kvm/book3s_64_vio.c    | 18 ++++++++++++++++++
 arch/powerpc/kvm/book3s_64_vio_hv.c |  4 ++++
 2 files changed, 22 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 9a3f264..3c17977 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -599,6 +599,24 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 		ret = kvmppc_tce_validate(stt, tce);
 		if (ret != H_SUCCESS)
 			goto unlock_exit;
+	}
+
+	for (i = 0; i < npages; ++i) {
+		/*
+		 * This looks unsafe, because we validate, then regrab
+		 * the TCE from userspace which could have been changed by
+		 * another thread.
+		 *
+		 * But it actually is safe, because the relevant checks will be
+		 * re-executed in the following code.  If userspace tries to
+		 * change this dodgily it will result in a messier failure mode
+		 * but won't threaten the host.
+		 */
+		if (get_user(tce, tces + i)) {
+			ret = H_TOO_HARD;
+			goto unlock_exit;
+		}
+		tce = be64_to_cpu(tce);
 
 		if (kvmppc_gpa_to_ua(vcpu->kvm,
 				tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 6821ead..c2848e0b 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -524,6 +524,10 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 		ret = kvmppc_tce_validate(stt, tce);
 		if (ret != H_SUCCESS)
 			goto unlock_exit;
+	}
+
+	for (i = 0; i < npages; ++i) {
+		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
 
 		ua = 0;
 		if (kvmppc_gpa_to_ua(vcpu->kvm,
-- 
2.11.0


* [PATCH kernel v2 3/6] KVM: PPC: Inform the userspace about TCE update failures
From: Alexey Kardashevskiy @ 2018-09-10  8:29 UTC
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Aneesh Kumar K.V,
	Paul Mackerras, Alex Williamson

We return H_TOO_HARD from TCE update handlers when we think that
the next handler (realmode -> virtual mode -> user mode) has a chance to
handle the request; H_HARDWARE/H_CLOSED otherwise.

This changes the handlers to return H_TOO_HARD on every error, giving
userspace an opportunity to handle any request or at least log
them all.
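
Schematically (with hypothetical helper names; a sketch rather than the
actual call chain), H_TOO_HARD lets each layer hand the request to the
next one, with userspace as the final fallback:

	/* Sketch of the chain: real mode -> virtual mode -> userspace.
	 * The three *_put_tce() helpers are stand-ins, not kernel APIs. */
	long handle_put_tce(unsigned long tce)
	{
		long ret;

		ret = realmode_put_tce(tce);	/* fastest, most restricted */
		if (ret == H_TOO_HARD)
			ret = virtmode_put_tce(tce);	/* kernel virtual mode */
		if (ret == H_TOO_HARD)
			ret = userspace_put_tce(tce);	/* e.g. QEMU emulation */

		return ret;
	}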

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/kvm/book3s_64_vio.c    | 8 ++++----
 arch/powerpc/kvm/book3s_64_vio_hv.c | 6 +++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 3c17977..984cec8 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -401,7 +401,7 @@ static long kvmppc_tce_iommu_do_unmap(struct kvm *kvm,
 	long ret;
 
 	if (WARN_ON_ONCE(iommu_tce_xchg(tbl, entry, &hpa, &dir)))
-		return H_HARDWARE;
+		return H_TOO_HARD;
 
 	if (dir == DMA_NONE)
 		return H_SUCCESS;
@@ -449,15 +449,15 @@ long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
 		return H_TOO_HARD;
 
 	if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, tbl->it_page_shift, &hpa)))
-		return H_HARDWARE;
+		return H_TOO_HARD;
 
 	if (mm_iommu_mapped_inc(mem))
-		return H_CLOSED;
+		return H_TOO_HARD;
 
 	ret = iommu_tce_xchg(tbl, entry, &hpa, &dir);
 	if (WARN_ON_ONCE(ret)) {
 		mm_iommu_mapped_dec(mem);
-		return H_HARDWARE;
+		return H_TOO_HARD;
 	}
 
 	if (dir != DMA_NONE)
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index c2848e0b..7388b66 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -300,10 +300,10 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
 
 	if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, tbl->it_page_shift,
 			&hpa)))
-		return H_HARDWARE;
+		return H_TOO_HARD;
 
 	if (WARN_ON_ONCE_RM(mm_iommu_mapped_inc(mem)))
-		return H_CLOSED;
+		return H_TOO_HARD;
 
 	ret = iommu_tce_xchg_rm(kvm->mm, tbl, entry, &hpa, &dir);
 	if (ret) {
@@ -501,7 +501,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 
 		rmap = (void *) vmalloc_to_phys(rmap);
 		if (WARN_ON_ONCE_RM(!rmap))
-			return H_HARDWARE;
+			return H_TOO_HARD;
 
 		/*
 		 * Synchronize with the MMU notifier callbacks in
-- 
2.11.0


* [PATCH kernel v2 4/6] KVM: PPC: Validate TCEs against preregistered memory page sizes
From: Alexey Kardashevskiy @ 2018-09-10  8:29 UTC
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Aneesh Kumar K.V,
	Paul Mackerras, Alex Williamson

Userspace can request an arbitrary supported page size for a DMA
window and this works fine as long as the mapped memory is backed with
pages of the same or bigger size; if this is not the case,
mm_iommu_ua_to_hpa{_rm}() fail and the tables are not populated with
dangerously incorrect TCEs.

However, since it is quite easy to misconfigure KVM, and we do not revert
the changes already made to TCE tables if an error happens in the middle,
we had better validate the acceptable page size before we even touch
the tables.

This enhances kvmppc_tce_validate() to check the hardware IOMMU page sizes
against the preregistered memory page sizes.

Since the new check uses real/virtual mode helpers, this renames
kvmppc_tce_validate() to kvmppc_rm_tce_validate() to handle the real mode
case and mirrors it for virtual mode under the old name. The real
mode handler is not reused for virtual mode because:
1. it uses _lockless() list traversing primitives instead of RCU;
2. realmode's mm_iommu_ua_to_hpa_rm() uses vmalloc_to_phys(), which
virtual mode does not need, and since on POWER9+radix only the virtual
mode handlers actually work, we do not want to slow down that path even
a bit.

This removes EXPORT_SYMBOL_GPL(kvmppc_tce_validate) as the validators
are static now.

From now on, attempts to map IOMMU pages bigger than allowed will
result in a KVM exit.
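
The heart of the new check is a page shift comparison; here is a minimal
sketch (simplified and with our own names, not the kernel helpers):

	/* Sketch: an IOMMU page must not outsize its pinned backing page. */
	static long check_backing_page_size(unsigned int iommu_page_shift,
			unsigned int mem_page_shift)
	{
		/* e.g. a 16MB IOMMU page (shift 24) needs 16MB+ backing */
		if (iommu_page_shift > mem_page_shift)
			return H_TOO_HARD;

		return H_SUCCESS;
	}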

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
Changes:
v2:
* updated commit log
---
 arch/powerpc/include/asm/kvm_ppc.h  |  2 --
 arch/powerpc/kvm/book3s_64_vio.c    | 35 +++++++++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_64_vio_hv.c | 30 +++++++++++++++++++++++-------
 3 files changed, 58 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index e991821..2f5d431 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -194,8 +194,6 @@ extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
 		(iommu_tce_check_ioba((stt)->page_shift, (stt)->offset, \
 				(stt)->size, (ioba), (npages)) ?        \
 				H_PARAMETER : H_SUCCESS)
-extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
-		unsigned long tce);
 extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
 		unsigned long *ua, unsigned long **prmap);
 extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 984cec8..01e1994 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -363,6 +363,41 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 	return ret;
 }
 
+static long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt,
+		unsigned long tce)
+{
+	unsigned long gpa = tce & ~(TCE_PCI_READ | TCE_PCI_WRITE);
+	enum dma_data_direction dir = iommu_tce_direction(tce);
+	struct kvmppc_spapr_tce_iommu_table *stit;
+	unsigned long ua = 0;
+
+	/* Allow userspace to poison TCE table */
+	if (dir == DMA_NONE)
+		return H_SUCCESS;
+
+	if (iommu_tce_check_gpa(stt->page_shift, gpa))
+		return H_TOO_HARD;
+
+	if (kvmppc_gpa_to_ua(stt->kvm, tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
+				&ua, NULL))
+		return H_TOO_HARD;
+
+	list_for_each_entry_rcu(stit, &stt->iommu_tables, next) {
+		unsigned long hpa = 0;
+		struct mm_iommu_table_group_mem_t *mem;
+		long shift = stit->tbl->it_page_shift;
+
+		mem = mm_iommu_lookup(stt->kvm->mm, ua, 1ULL << shift);
+		if (!mem)
+			return H_TOO_HARD;
+
+		if (mm_iommu_ua_to_hpa(mem, ua, shift, &hpa))
+			return H_TOO_HARD;
+	}
+
+	return H_SUCCESS;
+}
+
 static void kvmppc_clear_tce(struct iommu_table *tbl, unsigned long entry)
 {
 	unsigned long hpa = 0;
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 7388b66..977e95a 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -94,14 +94,14 @@ EXPORT_SYMBOL_GPL(kvmppc_find_table);
  * to the table and user space is supposed to process them), we can skip
  * checking other things (such as TCE is a guest RAM address or the page
  * was actually allocated).
- *
- * WARNING: This will be called in real-mode on HV KVM and virtual
- *          mode on PR KVM
  */
-long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, unsigned long tce)
+static long kvmppc_rm_tce_validate(struct kvmppc_spapr_tce_table *stt,
+		unsigned long tce)
 {
 	unsigned long gpa = tce & ~(TCE_PCI_READ | TCE_PCI_WRITE);
 	enum dma_data_direction dir = iommu_tce_direction(tce);
+	struct kvmppc_spapr_tce_iommu_table *stit;
+	unsigned long ua = 0;
 
 	/* Allow userspace to poison TCE table */
 	if (dir == DMA_NONE)
@@ -110,9 +110,25 @@ long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, unsigned long tce)
 	if (iommu_tce_check_gpa(stt->page_shift, gpa))
 		return H_PARAMETER;
 
+	if (kvmppc_gpa_to_ua(stt->kvm, tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
+				&ua, NULL))
+		return H_TOO_HARD;
+
+	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
+		unsigned long hpa = 0;
+		struct mm_iommu_table_group_mem_t *mem;
+		long shift = stit->tbl->it_page_shift;
+
+		mem = mm_iommu_lookup_rm(stt->kvm->mm, ua, 1ULL << shift);
+		if (!mem)
+			return H_TOO_HARD;
+
+		if (mm_iommu_ua_to_hpa_rm(mem, ua, shift, &hpa))
+			return H_TOO_HARD;
+	}
+
 	return H_SUCCESS;
 }
-EXPORT_SYMBOL_GPL(kvmppc_tce_validate);
 
 /* Note on the use of page_address() in real mode,
  *
@@ -368,7 +384,7 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	if (ret != H_SUCCESS)
 		return ret;
 
-	ret = kvmppc_tce_validate(stt, tce);
+	ret = kvmppc_rm_tce_validate(stt, tce);
 	if (ret != H_SUCCESS)
 		return ret;
 
@@ -521,7 +537,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 	for (i = 0; i < npages; ++i) {
 		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
 
-		ret = kvmppc_tce_validate(stt, tce);
+		ret = kvmppc_rm_tce_validate(stt, tce);
 		if (ret != H_SUCCESS)
 			goto unlock_exit;
 	}
-- 
2.11.0


* [PATCH kernel v2 5/6] KVM: PPC: Propagate errors to the guest when failed instead of ignoring
From: Alexey Kardashevskiy @ 2018-09-10  8:29 UTC
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Aneesh Kumar K.V,
	Paul Mackerras, Alex Williamson

At the moment, if the PUT_TCE{_INDIRECT} handlers fail to update
the hardware tables, we print a warning once, clear the entry and
continue. This was done because, at the time, the assumption was that if
a VFIO device is hotplugged into the guest and userspace replays
virtual DMA mappings (i.e. TCEs) into the hardware tables, and this fails,
then there is nothing useful we can do about it.

However, the assumption is not valid: these handlers are not called for
TCE replay (the VFIO ioctl interface is used for that); they only handle
new TCEs.

This returns an error to the guest if there is a request which cannot be
processed. By now the only possible failure is H_TOO_HARD.
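
The change to each handler reduces to the following pattern (a sketch with
generic names, not the exact kernel code): the entry is still cleared on
failure to keep the table consistent, but the error now reaches the guest
instead of being swallowed after a warning.

	/* Sketch: before vs. after for one TCE update attempt.
	 * clear_tce() is a stand-in for the kvmppc_*clear_tce() helpers. */
	static long update_one_tce(struct iommu_table *tbl,
			unsigned long entry, long ret)
	{
		/* Before: WARN_ON_ONCE(1); clear_tce(tbl, entry); carry on. */
		/* After: */
		if (ret != H_SUCCESS) {
			clear_tce(tbl, entry);	/* keep the table consistent */
			return ret;		/* propagate to the guest */
		}

		return H_SUCCESS;
	}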

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/kvm/book3s_64_vio.c    | 20 ++++++--------------
 arch/powerpc/kvm/book3s_64_vio_hv.c | 21 +++++++--------------
 2 files changed, 13 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 01e1994..8231b17 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -568,14 +568,10 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 			ret = kvmppc_tce_iommu_map(vcpu->kvm, stt, stit->tbl,
 					entry, ua, dir);
 
-		if (ret == H_SUCCESS)
-			continue;
-
-		if (ret == H_TOO_HARD)
+		if (ret != H_SUCCESS) {
+			kvmppc_clear_tce(stit->tbl, entry);
 			goto unlock_exit;
-
-		WARN_ON_ONCE(1);
-		kvmppc_clear_tce(stit->tbl, entry);
+		}
 	}
 
 	kvmppc_tce_put(stt, entry, tce);
@@ -663,14 +659,10 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 					stit->tbl, entry + i, ua,
 					iommu_tce_direction(tce));
 
-			if (ret == H_SUCCESS)
-				continue;
-
-			if (ret == H_TOO_HARD)
+			if (ret != H_SUCCESS) {
+				kvmppc_clear_tce(stit->tbl, entry);
 				goto unlock_exit;
-
-			WARN_ON_ONCE(1);
-			kvmppc_clear_tce(stit->tbl, entry);
+			}
 		}
 
 		kvmppc_tce_put(stt, entry + i, tce);
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 977e95a..adf3b21 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -403,14 +403,10 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
 					stit->tbl, entry, ua, dir);
 
-		if (ret == H_SUCCESS)
-			continue;
-
-		if (ret == H_TOO_HARD)
+		if (ret != H_SUCCESS) {
+			kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl, entry);
 			return ret;
-
-		WARN_ON_ONCE_RM(1);
-		kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl, entry);
+		}
 	}
 
 	kvmppc_tce_put(stt, entry, tce);
@@ -556,14 +552,11 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 					stit->tbl, entry + i, ua,
 					iommu_tce_direction(tce));
 
-			if (ret == H_SUCCESS)
-				continue;
-
-			if (ret == H_TOO_HARD)
+			if (ret != H_SUCCESS) {
+				kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl,
+						entry);
 				goto unlock_exit;
-
-			WARN_ON_ONCE_RM(1);
-			kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl, entry);
+			}
 		}
 
 		kvmppc_tce_put(stt, entry + i, tce);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH kernel v2 6/6] KVM: PPC: Remove redundant permission bits removal
  2018-09-10  8:29 ` Alexey Kardashevskiy
@ 2018-09-10  8:29   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 18+ messages in thread
From: Alexey Kardashevskiy @ 2018-09-10  8:29 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Aneesh Kumar K.V,
	Paul Mackerras, Alex Williamson

The kvmppc_gpa_to_ua() helper itself takes care of the permission
bits in the TCE and yet every single caller removes them.

This changes the semantics of kvmppc_gpa_to_ua() so that it takes TCEs
(which are GPAs plus the TCE permission bits), making the callers simpler.

This should cause no behavioural change.
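
For illustration only — a minimal sketch, not part of the patch, using
the same constants as the diff below and assuming TCE_PCI_READ and
TCE_PCI_WRITE occupy bits below PAGE_SHIFT — here is why passing an
unmasked TCE is safe:

	/* Sketch: permission bits cannot leak into the translation */
	unsigned long tce = gpa | TCE_PCI_READ | TCE_PCI_WRITE;
	unsigned long gfn = tce >> PAGE_SHIFT;	/* low bits shift away */
	unsigned long off = tce & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE);
	/* gfn and off come out identical whether or not the caller
	 * cleared the permission bits first, hence no behavioural change. */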

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v2:
* %s/kvmppc_gpa_to_ua/kvmppc_tce_to_ua/g
---
 arch/powerpc/include/asm/kvm_ppc.h  |  2 +-
 arch/powerpc/kvm/book3s_64_vio.c    | 12 ++++--------
 arch/powerpc/kvm/book3s_64_vio_hv.c | 22 +++++++++-------------
 3 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 2f5d431..38d0328 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -194,7 +194,7 @@ extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
 		(iommu_tce_check_ioba((stt)->page_shift, (stt)->offset, \
 				(stt)->size, (ioba), (npages)) ?        \
 				H_PARAMETER : H_SUCCESS)
-extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
+extern long kvmppc_tce_to_ua(struct kvm *kvm, unsigned long tce,
 		unsigned long *ua, unsigned long **prmap);
 extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
 		unsigned long idx, unsigned long tce);
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 8231b17..c0c64d1 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -378,8 +378,7 @@ static long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt,
 	if (iommu_tce_check_gpa(stt->page_shift, gpa))
 		return H_TOO_HARD;
 
-	if (kvmppc_gpa_to_ua(stt->kvm, tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
-				&ua, NULL))
+	if (kvmppc_tce_to_ua(stt->kvm, tce, &ua, NULL))
 		return H_TOO_HARD;
 
 	list_for_each_entry_rcu(stit, &stt->iommu_tables, next) {
@@ -552,8 +551,7 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 
-	if ((dir != DMA_NONE) && kvmppc_gpa_to_ua(vcpu->kvm,
-			tce & ~(TCE_PCI_READ | TCE_PCI_WRITE), &ua, NULL)) {
+	if ((dir != DMA_NONE) && kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL)) {
 		ret = H_PARAMETER;
 		goto unlock_exit;
 	}
@@ -614,7 +612,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 		return ret;
 
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
-	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL)) {
+	if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, &ua, NULL)) {
 		ret = H_TOO_HARD;
 		goto unlock_exit;
 	}
@@ -649,9 +647,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 		}
 		tce = be64_to_cpu(tce);
 
-		if (kvmppc_gpa_to_ua(vcpu->kvm,
-				tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
-				&ua, NULL))
+		if (kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL))
 			return H_PARAMETER;
 
 		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index adf3b21..389dac1 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -110,8 +110,7 @@ static long kvmppc_rm_tce_validate(struct kvmppc_spapr_tce_table *stt,
 	if (iommu_tce_check_gpa(stt->page_shift, gpa))
 		return H_PARAMETER;
 
-	if (kvmppc_gpa_to_ua(stt->kvm, tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
-				&ua, NULL))
+	if (kvmppc_tce_to_ua(stt->kvm, tce, &ua, NULL))
 		return H_TOO_HARD;
 
 	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
@@ -180,10 +179,10 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
 }
 EXPORT_SYMBOL_GPL(kvmppc_tce_put);
 
-long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
+long kvmppc_tce_to_ua(struct kvm *kvm, unsigned long tce,
 		unsigned long *ua, unsigned long **prmap)
 {
-	unsigned long gfn = gpa >> PAGE_SHIFT;
+	unsigned long gfn = tce >> PAGE_SHIFT;
 	struct kvm_memory_slot *memslot;
 
 	memslot = search_memslots(kvm_memslots(kvm), gfn);
@@ -191,7 +190,7 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
 		return -EINVAL;
 
 	*ua = __gfn_to_hva_memslot(memslot, gfn) |
-		(gpa & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
+		(tce & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	if (prmap)
@@ -200,7 +199,7 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
+EXPORT_SYMBOL_GPL(kvmppc_tce_to_ua);
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 static long iommu_tce_xchg_rm(struct mm_struct *mm, struct iommu_table *tbl,
@@ -389,8 +388,7 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 		return ret;
 
 	dir = iommu_tce_direction(tce);
-	if ((dir != DMA_NONE) && kvmppc_gpa_to_ua(vcpu->kvm,
-			tce & ~(TCE_PCI_READ | TCE_PCI_WRITE), &ua, NULL))
+	if ((dir != DMA_NONE) && kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL))
 		return H_PARAMETER;
 
 	entry = ioba >> stt->page_shift;
@@ -492,7 +490,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 		 */
 		struct mm_iommu_table_group_mem_t *mem;
 
-		if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL))
+		if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, &ua, NULL))
 			return H_TOO_HARD;
 
 		mem = mm_iommu_lookup_rm(vcpu->kvm->mm, ua, IOMMU_PAGE_SIZE_4K);
@@ -508,7 +506,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 		 * We do not require memory to be preregistered in this case
 		 * so lock rmap and do __find_linux_pte_or_hugepte().
 		 */
-		if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
+		if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
 			return H_TOO_HARD;
 
 		rmap = (void *) vmalloc_to_phys(rmap);
@@ -542,9 +540,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
 
 		ua = 0;
-		if (kvmppc_gpa_to_ua(vcpu->kvm,
-				tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
-				&ua, NULL))
+		if (kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL))
 			return H_PARAMETER;
 
 		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH kernel v2 1/6] KVM: PPC: Avoid marking DMA-mapped pages dirty in real mode
  2018-09-10  8:29   ` Alexey Kardashevskiy
@ 2018-09-11  3:13     ` David Gibson
  0 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2018-09-11  3:13 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm-ppc, Aneesh Kumar K.V, Paul Mackerras, Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 13681 bytes --]

On Mon, Sep 10, 2018 at 06:29:07PM +1000, Alexey Kardashevskiy wrote:
> At the moment the real mode handler of H_PUT_TCE calls iommu_tce_xchg_rm()
> which in turn reads the old TCE and, if it was a valid entry, marks
> the physical page dirty if it was mapped for writing. Since this is
> real mode, realmode_pfn_to_page() is used instead of pfn_to_page()
> to get the page struct. However SetPageDirty() itself reads the compound
> page head, which is a virtual address of the head page struct, and
> setting the dirty bit through that address kills the system.
> 
> This adds additional dirty bit tracking into the MM/IOMMU API for use
> in the real mode. Note that this does not change how VFIO and
> KVM (in virtual mode) set this bit. The KVM (real mode) changes include:
> - use the lowest bit of the cached host phys address to carry
> the dirty bit;
> - mark pages dirty when they are unpinned which happens when
> the preregistered memory is released which always happens in virtual
> mode;
> - add mm_iommu_ua_mark_dirty_rm() helper to set delayed dirty bit;
> - change iommu_tce_xchg_rm() to take the kvm struct for the mm to use
> in the new mm_iommu_ua_mark_dirty_rm() helper;
> - move iommu_tce_xchg_rm() to book3s_64_vio_hv.c (which is the only
> caller anyway) to reduce the real mode KVM and IOMMU knowledge
> across different subsystems.
> 
> This removes realmode_pfn_to_page() as it is not used anymore.
> 
> While we are at it, remove some EXPORT_SYMBOL_GPL() markers as that
> code is for real mode only and modules cannot call it anyway.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
> Changes:
> v2:
> * only do delaying dirtying for the real mode
> * no change in VFIO IOMMU SPAPR TCE driver is needed anymore
> * inverted MM_IOMMU_TABLE_GROUP_PAGE_MASK
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h |  1 -
>  arch/powerpc/include/asm/iommu.h             |  2 --
>  arch/powerpc/include/asm/mmu_context.h       |  1 +
>  arch/powerpc/kernel/iommu.c                  | 25 --------------
>  arch/powerpc/kvm/book3s_64_vio_hv.c          | 39 +++++++++++++++++-----
>  arch/powerpc/mm/init_64.c                    | 49 ----------------------------
>  arch/powerpc/mm/mmu_context_iommu.c          | 34 ++++++++++++++++---
>  7 files changed, 62 insertions(+), 89 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 13a688f..2fdc865 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -1051,7 +1051,6 @@ static inline void vmemmap_remove_mapping(unsigned long start,
>  	return hash__vmemmap_remove_mapping(start, page_size);
>  }
>  #endif
> -struct page *realmode_pfn_to_page(unsigned long pfn);
>  
>  static inline pte_t pmd_pte(pmd_t pmd)
>  {
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index ab3a4fb..3d4b88c 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -220,8 +220,6 @@ extern void iommu_del_device(struct device *dev);
>  extern int __init tce_iommu_bus_notifier_init(void);
>  extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
>  		unsigned long *hpa, enum dma_data_direction *direction);
> -extern long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
> -		unsigned long *hpa, enum dma_data_direction *direction);
>  #else
>  static inline void iommu_register_group(struct iommu_table_group *table_group,
>  					int pci_domain_number,
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index b2f89b6..b694d6a 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -38,6 +38,7 @@ extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
>  		unsigned long ua, unsigned int pageshift, unsigned long *hpa);
>  extern long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
>  		unsigned long ua, unsigned int pageshift, unsigned long *hpa);
> +extern void mm_iommu_ua_mark_dirty_rm(struct mm_struct *mm, unsigned long ua);
>  extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
>  extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
>  #endif
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index af7a20d..19b4c62 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -1013,31 +1013,6 @@ long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
>  }
>  EXPORT_SYMBOL_GPL(iommu_tce_xchg);
>  
> -#ifdef CONFIG_PPC_BOOK3S_64
> -long iommu_tce_xchg_rm(struct iommu_table *tbl, unsigned long entry,
> -		unsigned long *hpa, enum dma_data_direction *direction)
> -{
> -	long ret;
> -
> -	ret = tbl->it_ops->exchange_rm(tbl, entry, hpa, direction);
> -
> -	if (!ret && ((*direction == DMA_FROM_DEVICE) ||
> -			(*direction == DMA_BIDIRECTIONAL))) {
> -		struct page *pg = realmode_pfn_to_page(*hpa >> PAGE_SHIFT);
> -
> -		if (likely(pg)) {
> -			SetPageDirty(pg);
> -		} else {
> -			tbl->it_ops->exchange_rm(tbl, entry, hpa, direction);
> -			ret = -EFAULT;
> -		}
> -	}
> -
> -	return ret;
> -}
> -EXPORT_SYMBOL_GPL(iommu_tce_xchg_rm);
> -#endif
> -
>  int iommu_take_ownership(struct iommu_table *tbl)
>  {
>  	unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 506a4d4..6821ead 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -187,12 +187,35 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
>  EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
>  
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> -static void kvmppc_rm_clear_tce(struct iommu_table *tbl, unsigned long entry)
> +static long iommu_tce_xchg_rm(struct mm_struct *mm, struct iommu_table *tbl,
> +		unsigned long entry, unsigned long *hpa,
> +		enum dma_data_direction *direction)
> +{
> +	long ret;
> +
> +	ret = tbl->it_ops->exchange_rm(tbl, entry, hpa, direction);
> +
> +	if (!ret && ((*direction == DMA_FROM_DEVICE) ||
> +				(*direction == DMA_BIDIRECTIONAL))) {
> +		__be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY_RM(tbl, entry);
> +		/*
> +		 * kvmppc_rm_tce_iommu_do_map() updates the UA cache after
> +		 * calling this so we still get here a valid UA.
> +		 */
> +		if (pua && *pua)
> +			mm_iommu_ua_mark_dirty_rm(mm, be64_to_cpu(*pua));
> +	}
> +
> +	return ret;
> +}
> +
> +static void kvmppc_rm_clear_tce(struct kvm *kvm, struct iommu_table *tbl,
> +		unsigned long entry)
>  {
>  	unsigned long hpa = 0;
>  	enum dma_data_direction dir = DMA_NONE;
>  
> -	iommu_tce_xchg_rm(tbl, entry, &hpa, &dir);
> +	iommu_tce_xchg_rm(kvm->mm, tbl, entry, &hpa, &dir);
>  }
>  
>  static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
> @@ -224,7 +247,7 @@ static long kvmppc_rm_tce_iommu_do_unmap(struct kvm *kvm,
>  	unsigned long hpa = 0;
>  	long ret;
>  
> -	if (iommu_tce_xchg_rm(tbl, entry, &hpa, &dir))
> +	if (iommu_tce_xchg_rm(kvm->mm, tbl, entry, &hpa, &dir))
>  		/*
>  		 * real mode xchg can fail if struct page crosses
>  		 * a page boundary
> @@ -236,7 +259,7 @@ static long kvmppc_rm_tce_iommu_do_unmap(struct kvm *kvm,
>  
>  	ret = kvmppc_rm_tce_iommu_mapped_dec(kvm, tbl, entry);
>  	if (ret)
> -		iommu_tce_xchg_rm(tbl, entry, &hpa, &dir);
> +		iommu_tce_xchg_rm(kvm->mm, tbl, entry, &hpa, &dir);
>  
>  	return ret;
>  }
> @@ -282,7 +305,7 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
>  	if (WARN_ON_ONCE_RM(mm_iommu_mapped_inc(mem)))
>  		return H_CLOSED;
>  
> -	ret = iommu_tce_xchg_rm(tbl, entry, &hpa, &dir);
> +	ret = iommu_tce_xchg_rm(kvm->mm, tbl, entry, &hpa, &dir);
>  	if (ret) {
>  		mm_iommu_mapped_dec(mem);
>  		/*
> @@ -371,7 +394,7 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  			return ret;
>  
>  		WARN_ON_ONCE_RM(1);
> -		kvmppc_rm_clear_tce(stit->tbl, entry);
> +		kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl, entry);
>  	}
>  
>  	kvmppc_tce_put(stt, entry, tce);
> @@ -520,7 +543,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  				goto unlock_exit;
>  
>  			WARN_ON_ONCE_RM(1);
> -			kvmppc_rm_clear_tce(stit->tbl, entry);
> +			kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl, entry);
>  		}
>  
>  		kvmppc_tce_put(stt, entry + i, tce);
> @@ -571,7 +594,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
>  				return ret;
>  
>  			WARN_ON_ONCE_RM(1);
> -			kvmppc_rm_clear_tce(stit->tbl, entry);
> +			kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl, entry);
>  		}
>  	}
>  
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index 51ce091..7a9886f 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -308,55 +308,6 @@ void register_page_bootmem_memmap(unsigned long section_nr,
>  {
>  }
>  
> -/*
> - * We do not have access to the sparsemem vmemmap, so we fallback to
> - * walking the list of sparsemem blocks which we already maintain for
> - * the sake of crashdump. In the long run, we might want to maintain
> - * a tree if performance of that linear walk becomes a problem.
> - *
> - * realmode_pfn_to_page functions can fail due to:
> - * 1) As real sparsemem blocks do not lay in RAM continously (they
> - * are in virtual address space which is not available in the real mode),
> - * the requested page struct can be split between blocks so get_page/put_page
> - * may fail.
> - * 2) When huge pages are used, the get_page/put_page API will fail
> - * in real mode as the linked addresses in the page struct are virtual
> - * too.
> - */
> -struct page *realmode_pfn_to_page(unsigned long pfn)
> -{
> -	struct vmemmap_backing *vmem_back;
> -	struct page *page;
> -	unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
> -	unsigned long pg_va = (unsigned long) pfn_to_page(pfn);
> -
> -	for (vmem_back = vmemmap_list; vmem_back; vmem_back = vmem_back->list) {
> -		if (pg_va < vmem_back->virt_addr)
> -			continue;
> -
> -		/* After vmemmap_list entry free is possible, need check all */
> -		if ((pg_va + sizeof(struct page)) <=
> -				(vmem_back->virt_addr + page_size)) {
> -			page = (struct page *) (vmem_back->phys + pg_va -
> -				vmem_back->virt_addr);
> -			return page;
> -		}
> -	}
> -
> -	/* Probably that page struct is split between real pages */
> -	return NULL;
> -}
> -EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
> -
> -#else
> -
> -struct page *realmode_pfn_to_page(unsigned long pfn)
> -{
> -	struct page *page = pfn_to_page(pfn);
> -	return page;
> -}
> -EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
> -
>  #endif /* CONFIG_SPARSEMEM_VMEMMAP */
>  
>  #ifdef CONFIG_PPC_BOOK3S_64
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index c9ee9e2..56c2234 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -18,11 +18,15 @@
>  #include <linux/migrate.h>
>  #include <linux/hugetlb.h>
>  #include <linux/swap.h>
> +#include <linux/sizes.h>
>  #include <asm/mmu_context.h>
>  #include <asm/pte-walk.h>
>  
>  static DEFINE_MUTEX(mem_list_mutex);
>  
> +#define MM_IOMMU_TABLE_GROUP_PAGE_DIRTY	0x1
> +#define MM_IOMMU_TABLE_GROUP_PAGE_MASK	~(SZ_4K - 1)
> +
>  struct mm_iommu_table_group_mem_t {
>  	struct list_head next;
>  	struct rcu_head rcu;
> @@ -263,6 +267,9 @@ static void mm_iommu_unpin(struct mm_iommu_table_group_mem_t *mem)
>  		if (!page)
>  			continue;
>  
> +		if (mem->hpas[i] & MM_IOMMU_TABLE_GROUP_PAGE_DIRTY)
> +			SetPageDirty(page);
> +
>  		put_page(page);
>  		mem->hpas[i] = 0;
>  	}
> @@ -360,7 +367,6 @@ struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm(struct mm_struct *mm,
>  
>  	return ret;
>  }
> -EXPORT_SYMBOL_GPL(mm_iommu_lookup_rm);
>  
>  struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
>  		unsigned long ua, unsigned long entries)
> @@ -390,7 +396,7 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
>  	if (pageshift > mem->pageshift)
>  		return -EFAULT;
>  
> -	*hpa = *va | (ua & ~PAGE_MASK);
> +	*hpa = (*va & MM_IOMMU_TABLE_GROUP_PAGE_MASK) | (ua & ~PAGE_MASK);
>  
>  	return 0;
>  }
> @@ -413,11 +419,31 @@ long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
>  	if (!pa)
>  		return -EFAULT;
>  
> -	*hpa = *pa | (ua & ~PAGE_MASK);
> +	*hpa = (*pa & MM_IOMMU_TABLE_GROUP_PAGE_MASK) | (ua & ~PAGE_MASK);
>  
>  	return 0;
>  }
> -EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa_rm);
> +
> +extern void mm_iommu_ua_mark_dirty_rm(struct mm_struct *mm, unsigned long ua)
> +{
> +	struct mm_iommu_table_group_mem_t *mem;
> +	long entry;
> +	void *va;
> +	unsigned long *pa;
> +
> +	mem = mm_iommu_lookup_rm(mm, ua, PAGE_SIZE);
> +	if (!mem)
> +		return;
> +
> +	entry = (ua - mem->ua) >> PAGE_SHIFT;
> +	va = &mem->hpas[entry];
> +
> +	pa = (void *) vmalloc_to_phys(va);
> +	if (!pa)
> +		return;
> +
> +	*pa |= MM_IOMMU_TABLE_GROUP_PAGE_DIRTY;
> +}
>  
>  long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem)
>  {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH kernel v2 6/6] KVM: PPC: Remove redundant permission bits removal
  2018-09-10  8:29   ` Alexey Kardashevskiy
@ 2018-09-11  3:15     ` David Gibson
  0 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2018-09-11  3:15 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm-ppc, Aneesh Kumar K.V, Paul Mackerras, Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 6853 bytes --]

On Mon, Sep 10, 2018 at 06:29:12PM +1000, Alexey Kardashevskiy wrote:
> The kvmppc_gpa_to_ua() helper itself takes care of the permission
> bits in the TCE and yet every single caller removes them.
> 
> This changes the semantics of kvmppc_gpa_to_ua() so that it takes TCEs
> (which are GPAs plus the TCE permission bits), making the callers simpler.
> 
> This should cause no behavioural change.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
> Changes:
> v2:
> * %s/kvmppc_gpa_to_ua/kvmppc_tce_to_ua/g
> ---
>  arch/powerpc/include/asm/kvm_ppc.h  |  2 +-
>  arch/powerpc/kvm/book3s_64_vio.c    | 12 ++++--------
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 22 +++++++++-------------
>  3 files changed, 14 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 2f5d431..38d0328 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -194,7 +194,7 @@ extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
>  		(iommu_tce_check_ioba((stt)->page_shift, (stt)->offset, \
>  				(stt)->size, (ioba), (npages)) ?        \
>  				H_PARAMETER : H_SUCCESS)
> -extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
> +extern long kvmppc_tce_to_ua(struct kvm *kvm, unsigned long tce,
>  		unsigned long *ua, unsigned long **prmap);
>  extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
>  		unsigned long idx, unsigned long tce);
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 8231b17..c0c64d1 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -378,8 +378,7 @@ static long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt,
>  	if (iommu_tce_check_gpa(stt->page_shift, gpa))
>  		return H_TOO_HARD;
>  
> -	if (kvmppc_gpa_to_ua(stt->kvm, tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
> -				&ua, NULL))
> +	if (kvmppc_tce_to_ua(stt->kvm, tce, &ua, NULL))
>  		return H_TOO_HARD;
>  
>  	list_for_each_entry_rcu(stit, &stt->iommu_tables, next) {
> @@ -552,8 +551,7 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  
>  	idx = srcu_read_lock(&vcpu->kvm->srcu);
>  
> -	if ((dir != DMA_NONE) && kvmppc_gpa_to_ua(vcpu->kvm,
> -			tce & ~(TCE_PCI_READ | TCE_PCI_WRITE), &ua, NULL)) {
> +	if ((dir != DMA_NONE) && kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL)) {
>  		ret = H_PARAMETER;
>  		goto unlock_exit;
>  	}
> @@ -614,7 +612,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  		return ret;
>  
>  	idx = srcu_read_lock(&vcpu->kvm->srcu);
> -	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL)) {
> +	if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, &ua, NULL)) {
>  		ret = H_TOO_HARD;
>  		goto unlock_exit;
>  	}
> @@ -649,9 +647,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  		}
>  		tce = be64_to_cpu(tce);
>  
> -		if (kvmppc_gpa_to_ua(vcpu->kvm,
> -				tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
> -				&ua, NULL))
> +		if (kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL))
>  			return H_PARAMETER;
>  
>  		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index adf3b21..389dac1 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -110,8 +110,7 @@ static long kvmppc_rm_tce_validate(struct kvmppc_spapr_tce_table *stt,
>  	if (iommu_tce_check_gpa(stt->page_shift, gpa))
>  		return H_PARAMETER;
>  
> -	if (kvmppc_gpa_to_ua(stt->kvm, tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
> -				&ua, NULL))
> +	if (kvmppc_tce_to_ua(stt->kvm, tce, &ua, NULL))
>  		return H_TOO_HARD;
>  
>  	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
> @@ -180,10 +179,10 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_tce_put);
>  
> -long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
> +long kvmppc_tce_to_ua(struct kvm *kvm, unsigned long tce,
>  		unsigned long *ua, unsigned long **prmap)
>  {
> -	unsigned long gfn = gpa >> PAGE_SHIFT;
> +	unsigned long gfn = tce >> PAGE_SHIFT;
>  	struct kvm_memory_slot *memslot;
>  
>  	memslot = search_memslots(kvm_memslots(kvm), gfn);
> @@ -191,7 +190,7 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
>  		return -EINVAL;
>  
>  	*ua = __gfn_to_hva_memslot(memslot, gfn) |
> -		(gpa & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
> +		(tce & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
>  
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>  	if (prmap)
> @@ -200,7 +199,7 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
>  
>  	return 0;
>  }
> -EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
> +EXPORT_SYMBOL_GPL(kvmppc_tce_to_ua);
>  
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>  static long iommu_tce_xchg_rm(struct mm_struct *mm, struct iommu_table *tbl,
> @@ -389,8 +388,7 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  		return ret;
>  
>  	dir = iommu_tce_direction(tce);
> -	if ((dir != DMA_NONE) && kvmppc_gpa_to_ua(vcpu->kvm,
> -			tce & ~(TCE_PCI_READ | TCE_PCI_WRITE), &ua, NULL))
> +	if ((dir != DMA_NONE) && kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL))
>  		return H_PARAMETER;
>  
>  	entry = ioba >> stt->page_shift;
> @@ -492,7 +490,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  		 */
>  		struct mm_iommu_table_group_mem_t *mem;
>  
> -		if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL))
> +		if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, &ua, NULL))
>  			return H_TOO_HARD;
>  
>  		mem = mm_iommu_lookup_rm(vcpu->kvm->mm, ua, IOMMU_PAGE_SIZE_4K);
> @@ -508,7 +506,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  		 * We do not require memory to be preregistered in this case
>  		 * so lock rmap and do __find_linux_pte_or_hugepte().
>  		 */
> -		if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
> +		if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
>  			return H_TOO_HARD;
>  
>  		rmap = (void *) vmalloc_to_phys(rmap);
> @@ -542,9 +540,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
>  
>  		ua = 0;
> -		if (kvmppc_gpa_to_ua(vcpu->kvm,
> -				tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
> -				&ua, NULL))
> +		if (kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL))
>  			return H_PARAMETER;
>  
>  		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH kernel v2 6/6] KVM: PPC: Remove redundand permission bits removal
@ 2018-09-11  3:15     ` David Gibson
  0 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2018-09-11  3:15 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm-ppc, Aneesh Kumar K.V, Paul Mackerras, Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 6853 bytes --]

On Mon, Sep 10, 2018 at 06:29:12PM +1000, Alexey Kardashevskiy wrote:
> The kvmppc_gpa_to_ua() helper itself takes care of the permission
> bits in the TCE and yet every single caller removes them.
> 
> This changes semantics of kvmppc_gpa_to_ua() so it takes TCEs
> (which are GPAs + TCE permission bits) to make the callers simpler.
> 
> This should cause no behavioural change.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
> Changes:
> v2:
> * %s/kvmppc_gpa_to_ua/kvmppc_tce_to_ua/g
> ---
>  arch/powerpc/include/asm/kvm_ppc.h  |  2 +-
>  arch/powerpc/kvm/book3s_64_vio.c    | 12 ++++--------
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 22 +++++++++-------------
>  3 files changed, 14 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 2f5d431..38d0328 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -194,7 +194,7 @@ extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
>  		(iommu_tce_check_ioba((stt)->page_shift, (stt)->offset, \
>  				(stt)->size, (ioba), (npages)) ?        \
>  				H_PARAMETER : H_SUCCESS)
> -extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
> +extern long kvmppc_tce_to_ua(struct kvm *kvm, unsigned long tce,
>  		unsigned long *ua, unsigned long **prmap);
>  extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
>  		unsigned long idx, unsigned long tce);
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 8231b17..c0c64d1 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -378,8 +378,7 @@ static long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt,
>  	if (iommu_tce_check_gpa(stt->page_shift, gpa))
>  		return H_TOO_HARD;
>  
> -	if (kvmppc_gpa_to_ua(stt->kvm, tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
> -				&ua, NULL))
> +	if (kvmppc_tce_to_ua(stt->kvm, tce, &ua, NULL))
>  		return H_TOO_HARD;
>  
>  	list_for_each_entry_rcu(stit, &stt->iommu_tables, next) {
> @@ -552,8 +551,7 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  
>  	idx = srcu_read_lock(&vcpu->kvm->srcu);
>  
> -	if ((dir != DMA_NONE) && kvmppc_gpa_to_ua(vcpu->kvm,
> -			tce & ~(TCE_PCI_READ | TCE_PCI_WRITE), &ua, NULL)) {
> +	if ((dir != DMA_NONE) && kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL)) {
>  		ret = H_PARAMETER;
>  		goto unlock_exit;
>  	}
> @@ -614,7 +612,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  		return ret;
>  
>  	idx = srcu_read_lock(&vcpu->kvm->srcu);
> -	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL)) {
> +	if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, &ua, NULL)) {
>  		ret = H_TOO_HARD;
>  		goto unlock_exit;
>  	}
> @@ -649,9 +647,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  		}
>  		tce = be64_to_cpu(tce);
>  
> -		if (kvmppc_gpa_to_ua(vcpu->kvm,
> -				tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
> -				&ua, NULL))
> +		if (kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL))
>  			return H_PARAMETER;
>  
>  		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index adf3b21..389dac1 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -110,8 +110,7 @@ static long kvmppc_rm_tce_validate(struct kvmppc_spapr_tce_table *stt,
>  	if (iommu_tce_check_gpa(stt->page_shift, gpa))
>  		return H_PARAMETER;
>  
> -	if (kvmppc_gpa_to_ua(stt->kvm, tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
> -				&ua, NULL))
> +	if (kvmppc_tce_to_ua(stt->kvm, tce, &ua, NULL))
>  		return H_TOO_HARD;
>  
>  	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
> @@ -180,10 +179,10 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_tce_put);
>  
> -long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
> +long kvmppc_tce_to_ua(struct kvm *kvm, unsigned long tce,
>  		unsigned long *ua, unsigned long **prmap)
>  {
> -	unsigned long gfn = gpa >> PAGE_SHIFT;
> +	unsigned long gfn = tce >> PAGE_SHIFT;
>  	struct kvm_memory_slot *memslot;
>  
>  	memslot = search_memslots(kvm_memslots(kvm), gfn);
> @@ -191,7 +190,7 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
>  		return -EINVAL;
>  
>  	*ua = __gfn_to_hva_memslot(memslot, gfn) |
> -		(gpa & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
> +		(tce & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
>  
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>  	if (prmap)
> @@ -200,7 +199,7 @@ long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
>  
>  	return 0;
>  }
> -EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
> +EXPORT_SYMBOL_GPL(kvmppc_tce_to_ua);
>  
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>  static long iommu_tce_xchg_rm(struct mm_struct *mm, struct iommu_table *tbl,
> @@ -389,8 +388,7 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  		return ret;
>  
>  	dir = iommu_tce_direction(tce);
> -	if ((dir != DMA_NONE) && kvmppc_gpa_to_ua(vcpu->kvm,
> -			tce & ~(TCE_PCI_READ | TCE_PCI_WRITE), &ua, NULL))
> +	if ((dir != DMA_NONE) && kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL))
>  		return H_PARAMETER;
>  
>  	entry = ioba >> stt->page_shift;
> @@ -492,7 +490,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  		 */
>  		struct mm_iommu_table_group_mem_t *mem;
>  
> -		if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL))
> +		if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, &ua, NULL))
>  			return H_TOO_HARD;
>  
>  		mem = mm_iommu_lookup_rm(vcpu->kvm->mm, ua, IOMMU_PAGE_SIZE_4K);
> @@ -508,7 +506,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  		 * We do not require memory to be preregistered in this case
>  		 * so lock rmap and do __find_linux_pte_or_hugepte().
>  		 */
> -		if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
> +		if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
>  			return H_TOO_HARD;
>  
>  		rmap = (void *) vmalloc_to_phys(rmap);
> @@ -542,9 +540,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
>  
>  		ua = 0;
> -		if (kvmppc_gpa_to_ua(vcpu->kvm,
> -				tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
> -				&ua, NULL))
> +		if (kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL))
>  			return H_PARAMETER;
>  
>  		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
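
As a sanity check on the last hunks: a minimal standalone sketch (not from
the patch; PAGE_SHIFT and the TCE bit values below are illustrative
assumptions, the kernel's own definitions live in asm/tce.h) of why the
caller-side "tce & ~(TCE_PCI_READ | TCE_PCI_WRITE)" is redundant once
kvmppc_tce_to_ua() takes the raw tce: the gfn shift drops the low
permission bits, and the offset mask clears them explicitly.

#include <assert.h>
#include <stdio.h>

#define PAGE_SHIFT	16	/* assumption: 64K host pages */
#define PAGE_MASK	(~((1UL << PAGE_SHIFT) - 1))
#define TCE_PCI_WRITE	0x2UL	/* assumed values; low bits, below PAGE_SHIFT */
#define TCE_PCI_READ	0x1UL

int main(void)
{
	unsigned long tce = 0x40000000UL | 0xabcUL |
			TCE_PCI_READ | TCE_PCI_WRITE;
	unsigned long masked = tce & ~(TCE_PCI_READ | TCE_PCI_WRITE);

	/* gfn: the shift alone already discards the permission bits */
	assert(tce >> PAGE_SHIFT == masked >> PAGE_SHIFT);

	/* page offset: the combined mask clears them explicitly */
	assert((tce & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE)) ==
	       (masked & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE)));

	printf("ok: pre-masking changes neither gfn nor offset\n");
	return 0;
}

So with the new name the callers can pass the tce through unmodified,
which is exactly what the hunks above do.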

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
