* [PATCH kernel 0/9] KVM: PPC: Add in-kernel multitce handling
@ 2015-09-15 10:49 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 46+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-15 10:49 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Paul Mackerras, Alexander Graf,
	David Gibson, kvm-ppc, kvm

These patches enable in-kernel acceleration of the H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls, which update multiple (up to 512) TCE entries in
a single call and so save the overhead of repeated context switches.
QEMU already supports these hypercalls, so this is purely an optimization.

Both HV and PR KVM modes are supported.

This does not affect VFIO; that support is coming next.

Please comment. Thanks.
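
For readers unfamiliar with the hcalls, here is a conceptual sketch of
what an in-kernel H_PUT_TCE_INDIRECT handler does, written against the
helpers this series introduces (an illustration only, not the code from
patch 9; translating the guest's TCE list page to a host pointer, in
both real and virtual mode, is the part the series actually implements):

static long put_tce_indirect_sketch(struct kvmppc_spapr_tce_table *stt,
		unsigned long ioba, const u64 *tces, unsigned long npages)
{
	unsigned long i, idx = ioba >> IOMMU_PAGE_SHIFT_4K;
	long ret;

	if (npages > 512)
		return H_PARAMETER;

	ret = kvmppc_ioba_validate(stt, ioba, npages);
	if (ret)
		return ret;

	/* Validate everything first so the update loop cannot fail halfway */
	for (i = 0; i < npages; i++) {
		ret = kvmppc_tce_validate(stt, tces[i]);
		if (ret)
			return ret;
	}

	for (i = 0; i < npages; i++)
		kvmppc_tce_put(stt, idx + i, tces[i]);

	return H_SUCCESS;
}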


Alexey Kardashevskiy (9):
  rcu: Define notrace version of list_for_each_entry_rcu
  KVM: PPC: Make real_vmalloc_addr() public
  KVM: PPC: Rework H_PUT_TCE/H_GET_TCE handlers
  KVM: PPC: Use RCU for arch.spapr_tce_tables
  KVM: PPC: Account TCE-containing pages in locked_vm
  KVM: PPC: Replace SPAPR_TCE_SHIFT with IOMMU_PAGE_SHIFT_4K
  KVM: PPC: Move reusable bits of H_PUT_TCE handler to helpers
  KVM: Fix KVM_SMI chapter number
  KVM: PPC: Add support for multiple-TCE hcalls

 Documentation/virtual/kvm/api.txt        |  27 ++-
 arch/powerpc/include/asm/kvm_book3s_64.h |   2 -
 arch/powerpc/include/asm/kvm_host.h      |   1 +
 arch/powerpc/include/asm/kvm_ppc.h       |  16 ++
 arch/powerpc/include/asm/mmu-hash64.h    |   3 +
 arch/powerpc/kvm/book3s.c                |   2 +-
 arch/powerpc/kvm/book3s_64_vio.c         | 185 ++++++++++++++++--
 arch/powerpc/kvm/book3s_64_vio_hv.c      | 310 +++++++++++++++++++++++++++----
 arch/powerpc/kvm/book3s_hv.c             |  26 ++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |  17 --
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |   6 +-
 arch/powerpc/kvm/book3s_pr_papr.c        |  35 ++++
 arch/powerpc/kvm/powerpc.c               |   3 +
 arch/powerpc/mm/hash_utils_64.c          |  17 ++
 include/linux/rculist.h                  |  38 ++++
 15 files changed, 610 insertions(+), 78 deletions(-)

-- 
2.4.0.rc3.8.gfb3e7d5


* [PATCH kernel 1/9] rcu: Define notrace version of list_for_each_entry_rcu
  2015-09-15 10:49 ` Alexey Kardashevskiy
@ 2015-09-15 10:49   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 46+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-15 10:49 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Paul Mackerras, Alexander Graf,
	David Gibson, kvm-ppc, kvm

This defines list_for_each_entry_rcu_notrace and list_entry_rcu_notrace,
which use rcu_dereference_raw_notrace instead of rcu_dereference_raw.
This allows list_for_each_entry_rcu_notrace to be used in real mode
(i.e. with the MMU off), where the usual RCU debugging and tracing hooks
must not run.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 include/linux/rculist.h | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 17c6b1f..439c4d7 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -253,6 +253,25 @@ static inline void list_splice_init_rcu(struct list_head *list,
 })
 
 /**
+ * list_entry_rcu_notrace - get the struct for this entry
+ * @ptr:        the &struct list_head pointer.
+ * @type:       the type of the struct this is embedded in.
+ * @member:     the name of the list_struct within the struct.
+ *
+ * This primitive may safely run concurrently with the _rcu list-mutation
+ * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
+ *
+ * This is the same as list_entry_rcu() except that it does
+ * not do any RCU debugging or tracing.
+ */
+#define list_entry_rcu_notrace(ptr, type, member) \
+({ \
+	typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
+	container_of((typeof(ptr))rcu_dereference_raw_notrace(__ptr), \
+			type, member); \
+})
+
+/**
  * Where are list_empty_rcu() and list_first_entry_rcu()?
  *
  * Implementing those functions following their counterparts list_empty() and
@@ -308,6 +327,25 @@ static inline void list_splice_init_rcu(struct list_head *list,
 		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
 
 /**
+ * list_for_each_entry_rcu_notrace - iterate over rcu list of given type
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_struct within the struct.
+ *
+ * This list-traversal primitive may safely run concurrently with
+ * the _rcu list-mutation primitives such as list_add_rcu()
+ * as long as the traversal is guarded by rcu_read_lock().
+ *
+ * This is the same as list_for_each_entry_rcu() except that it does
+ * not do any RCU debugging or tracing.
+ */
+#define list_for_each_entry_rcu_notrace(pos, head, member) \
+	for (pos = list_entry_rcu_notrace((head)->next, typeof(*pos), member); \
+		&pos->member != (head); \
+		pos = list_entry_rcu_notrace(pos->member.next, typeof(*pos), \
+				member))
+
+/**
  * list_for_each_entry_continue_rcu - continue iteration over list of given type
  * @pos:	the type * to use as a loop cursor.
  * @head:	the head for your list.
-- 
2.4.0.rc3.8.gfb3e7d5
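
For illustration, a minimal reader written against the new macro (the
series' real user is kvmppc_find_table() in patch 3; "foo" here is a
made-up example structure):

struct foo {
	struct list_head list;
	int id;
};

/* Safe with the MMU off: no RCU debug or tracing hooks are invoked */
static struct foo *find_foo_realmode(struct list_head *head, int id)
{
	struct foo *f;

	list_for_each_entry_rcu_notrace(f, head, list)
		if (f->id == id)
			return f;

	return NULL;
}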


* [PATCH kernel 2/9] KVM: PPC: Make real_vmalloc_addr() public
  2015-09-15 10:49 ` Alexey Kardashevskiy
@ 2015-09-15 10:49   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 46+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-15 10:49 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Paul Mackerras, Alexander Graf,
	David Gibson, kvm-ppc, kvm

This helper translates vmalloc'd addresses to linear-map addresses.
At the moment it is only used by the KVM MMU code and resides in the
HV KVM code. We will need it later in the TCE code and in the DMA memory
preregistration code, both of which run in real mode.

This makes real_vmalloc_addr() public and moves it to the generic powerpc
code as it does nothing KVM-specific.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/mmu-hash64.h |  3 +++
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   | 17 -----------------
 arch/powerpc/mm/hash_utils_64.c       | 17 +++++++++++++++++
 3 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index a82f534..fd06b73 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -606,6 +606,9 @@ static inline unsigned long get_kernel_vsid(unsigned long ea, int ssize)
 	context = (MAX_USER_CONTEXT) + ((ea >> 60) - 0xc) + 1;
 	return get_vsid(context, ea, ssize);
 }
+
+void *real_vmalloc_addr(void *x);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_MMU_HASH64_H_ */
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c1df9bb..987b7d1 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -22,23 +22,6 @@
 #include <asm/synch.h>
 #include <asm/ppc-opcode.h>
 
-/* Translate address of a vmalloc'd thing to a linear map address */
-static void *real_vmalloc_addr(void *x)
-{
-	unsigned long addr = (unsigned long) x;
-	pte_t *p;
-	/*
-	 * assume we don't have huge pages in vmalloc space...
-	 * So don't worry about THP collapse/split. Called
-	 * Only in realmode, hence won't need irq_save/restore.
-	 */
-	p = __find_linux_pte_or_hugepte(swapper_pg_dir, addr, NULL);
-	if (!p || !pte_present(*p))
-		return NULL;
-	addr = (pte_pfn(*p) << PAGE_SHIFT) | (addr & ~PAGE_MASK);
-	return __va(addr);
-}
-
 /* Return 1 if we need to do a global tlbie, 0 if we can use tlbiel */
 static int global_invalidates(struct kvm *kvm, unsigned long flags)
 {
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 5ec987f..9737d6a 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1556,3 +1556,20 @@ void setup_initial_memory_limit(phys_addr_t first_memblock_base,
 	/* Finally limit subsequent allocations */
 	memblock_set_current_limit(ppc64_rma_size);
 }
+
+/* Translate address of a vmalloc'd thing to a linear map address */
+void *real_vmalloc_addr(void *x)
+{
+	unsigned long addr = (unsigned long) x;
+	pte_t *p;
+	/*
+	 * Assume we don't have huge pages in vmalloc space,
+	 * so don't worry about THP collapse/split. This is called
+	 * only in real mode, hence no need for irq_save/restore.
+	 */
+	p = __find_linux_pte_or_hugepte(swapper_pg_dir, addr, NULL);
+	if (!p || !pte_present(*p))
+		return NULL;
+	addr = (pte_pfn(*p) << PAGE_SHIFT) | (addr & ~PAGE_MASK);
+	return __va(addr);
+}
-- 
2.4.0.rc3.8.gfb3e7d5
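
A usage sketch (hypothetical caller, not from this series): in a
real-mode handler a vmalloc'd pointer must be converted to its
linear-map alias before being dereferenced, since real mode bypasses
the page tables:

/* "table" was allocated with vmalloc(); called in real mode */
static u64 read_entry_realmode(u64 *table, unsigned long idx)
{
	u64 *p = real_vmalloc_addr(&table[idx]);

	/* NULL means the backing page is not present */
	return p ? *p : 0;
}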


* [PATCH kernel 3/9] KVM: PPC: Rework H_PUT_TCE/H_GET_TCE handlers
  2015-09-15 10:49 ` Alexey Kardashevskiy
@ 2015-09-15 10:49   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 46+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-15 10:49 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Paul Mackerras, Alexander Graf,
	David Gibson, kvm-ppc, kvm

This reworks the existing H_PUT_TCE/H_GET_TCE handlers to have a single
exit path. This allows the next patch to add locking cleanly.

This moves the ioba boundary check into a helper and adds a check that
the least significant bits of ioba, which must be zero, actually are.

The patch is mostly mechanical (only the check of the least significant
ioba bits is new), so no change in behaviour is expected.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kvm/book3s_64_vio_hv.c | 102 +++++++++++++++++++++++-------------
 1 file changed, 66 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 89e96b3..8ae12ac 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -35,71 +35,101 @@
 #include <asm/ppc-opcode.h>
 #include <asm/kvm_host.h>
 #include <asm/udbg.h>
+#include <asm/iommu.h>
 
 #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
 
+/*
+ * Finds a TCE table descriptor by LIOBN.
+ *
+ * WARNING: This will be called in real or virtual mode on HV KVM and virtual
+ *          mode on PR KVM
+ */
+static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
+		unsigned long liobn)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvmppc_spapr_tce_table *stt;
+
+	list_for_each_entry_rcu_notrace(stt, &kvm->arch.spapr_tce_tables, list)
+		if (stt->liobn == liobn)
+			return stt;
+
+	return NULL;
+}
+
+/*
+ * Validates IO address.
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ *          mode on PR KVM
+ */
+static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
+		unsigned long ioba, unsigned long npages)
+{
+	unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
+	unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
+	unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K;
+
+	if ((ioba & mask) || (idx + npages > size))
+		return H_PARAMETER;
+
+	return H_SUCCESS;
+}
+
 /* WARNING: This will be called in real-mode on HV KVM and virtual
  *          mode on PR KVM
  */
 long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 		      unsigned long ioba, unsigned long tce)
 {
-	struct kvm *kvm = vcpu->kvm;
-	struct kvmppc_spapr_tce_table *stt;
+	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
+	long ret = H_TOO_HARD;
+	unsigned long idx;
+	struct page *page;
+	u64 *tbl;
 
 	/* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
 	/* 	    liobn, ioba, tce); */
 
-	list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
-		if (stt->liobn == liobn) {
-			unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
-			struct page *page;
-			u64 *tbl;
+	if (!stt)
+		return ret;
 
-			/* udbg_printf("H_PUT_TCE: liobn 0x%lx => stt=%p  window_size=0x%x\n", */
-			/* 	    liobn, stt, stt->window_size); */
-			if (ioba >= stt->window_size)
-				return H_PARAMETER;
+	ret = kvmppc_ioba_validate(stt, ioba, 1);
+	if (ret)
+		return ret;
 
-			page = stt->pages[idx / TCES_PER_PAGE];
-			tbl = (u64 *)page_address(page);
+	idx = ioba >> SPAPR_TCE_SHIFT;
+	page = stt->pages[idx / TCES_PER_PAGE];
+	tbl = (u64 *)page_address(page);
 
-			/* FIXME: Need to validate the TCE itself */
-			/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
-			tbl[idx % TCES_PER_PAGE] = tce;
-			return H_SUCCESS;
-		}
-	}
+	/* FIXME: Need to validate the TCE itself */
+	/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
+	tbl[idx % TCES_PER_PAGE] = tce;
 
-	/* Didn't find the liobn, punt it to userspace */
-	return H_TOO_HARD;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
 
 long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 		      unsigned long ioba)
 {
-	struct kvm *kvm = vcpu->kvm;
-	struct kvmppc_spapr_tce_table *stt;
+	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
+	long ret = H_TOO_HARD;
 
-	list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
-		if (stt->liobn == liobn) {
+
+	if (stt) {
+		ret = kvmppc_ioba_validate(stt, ioba, 1);
+		if (!ret) {
 			unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
-			struct page *page;
-			u64 *tbl;
-
-			if (ioba >= stt->window_size)
-				return H_PARAMETER;
-
-			page = stt->pages[idx / TCES_PER_PAGE];
-			tbl = (u64 *)page_address(page);
+			struct page *page = stt->pages[idx / TCES_PER_PAGE];
+			u64 *tbl = (u64 *)page_address(page);
 
 			vcpu->arch.gpr[4] = tbl[idx % TCES_PER_PAGE];
-			return H_SUCCESS;
 		}
 	}
 
-	/* Didn't find the liobn, punt it to userspace */
-	return H_TOO_HARD;
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(kvmppc_h_get_tce);
-- 
2.4.0.rc3.8.gfb3e7d5
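
To make kvmppc_ioba_validate() concrete, take a 256MB DMA window as an
illustration: window_size = 0x10000000 gives size = 65536 4K TCE entries.
Then ioba = 0x0ffff000 with npages = 1 yields idx = 65535 and passes,
ioba = 0x10000000 yields idx = 65536 and fails with H_PARAMETER, and any
ioba with non-zero low 12 bits (say 0x800) fails the alignment check.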


* [PATCH kernel 4/9] KVM: PPC: Use RCU for arch.spapr_tce_tables
  2015-09-15 10:49 ` Alexey Kardashevskiy
@ 2015-09-15 10:49   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 46+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-15 10:49 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Paul Mackerras, Alexander Graf,
	David Gibson, kvm-ppc, kvm

At the moment spapr_tce_tables is not protected against races. This makes
use of the RCU variants of the list helpers. As some bits are executed in
real mode, this uses the just-introduced list_for_each_entry_rcu_notrace().

This converts release_spapr_tce_table() to an RCU callback.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/kvm_host.h |  1 +
 arch/powerpc/kvm/book3s.c           |  2 +-
 arch/powerpc/kvm/book3s_64_vio.c    | 20 +++++++++++---------
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 98eebbf6..e19d412 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -178,6 +178,7 @@ struct kvmppc_spapr_tce_table {
 	struct kvm *kvm;
 	u64 liobn;
 	u32 window_size;
+	struct rcu_head rcu;
 	struct page *pages[0];
 };
 
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 53285d5..3418f7c 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -806,7 +806,7 @@ int kvmppc_core_init_vm(struct kvm *kvm)
 {
 
 #ifdef CONFIG_PPC64
-	INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
+	INIT_LIST_HEAD_RCU(&kvm->arch.spapr_tce_tables);
 	INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
 #endif
 
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 54cf9bc..9526c34 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -45,19 +45,16 @@ static long kvmppc_stt_npages(unsigned long window_size)
 		     * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
 }
 
-static void release_spapr_tce_table(struct kvmppc_spapr_tce_table *stt)
+static void release_spapr_tce_table(struct rcu_head *head)
 {
-	struct kvm *kvm = stt->kvm;
+	struct kvmppc_spapr_tce_table *stt = container_of(head,
+			struct kvmppc_spapr_tce_table, rcu);
 	int i;
 
-	mutex_lock(&kvm->lock);
-	list_del(&stt->list);
 	for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
 		__free_page(stt->pages[i]);
+
 	kfree(stt);
-	mutex_unlock(&kvm->lock);
-
-	kvm_put_kvm(kvm);
 }
 
 static int kvm_spapr_tce_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
@@ -88,7 +85,12 @@ static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
 {
 	struct kvmppc_spapr_tce_table *stt = filp->private_data;
 
-	release_spapr_tce_table(stt);
+	list_del_rcu(&stt->list);
+
+	kvm_put_kvm(stt->kvm);
+
+	call_rcu(&stt->rcu, release_spapr_tce_table);
+
 	return 0;
 }
 
@@ -131,7 +133,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 	kvm_get_kvm(kvm);
 
 	mutex_lock(&kvm->lock);
-	list_add(&stt->list, &kvm->arch.spapr_tce_tables);
+	list_add_rcu(&stt->list, &kvm->arch.spapr_tce_tables);
 
 	mutex_unlock(&kvm->lock);
 
-- 
2.4.0.rc3.8.gfb3e7d5
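
For illustration, the reader side of the pattern this establishes (a
hypothetical virtual-mode helper; the real-mode reader uses the notrace
variant from patch 1 instead):

static u32 window_size_of_liobn(struct kvm *kvm, unsigned long liobn)
{
	struct kvmppc_spapr_tce_table *stt;
	u32 ret = 0;

	rcu_read_lock();
	list_for_each_entry_rcu(stt, &kvm->arch.spapr_tce_tables, list) {
		if (stt->liobn == liobn) {
			/* Only use stt inside the read-side critical section */
			ret = stt->window_size;
			break;
		}
	}
	rcu_read_unlock();

	return ret;
}

The updater side is the list_del_rcu() + call_rcu() pairing visible in
the diff: the table is unlinked first and freed only after all current
readers have finished.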


* [PATCH kernel 5/9] KVM: PPC: Account TCE-containing pages in locked_vm
  2015-09-15 10:49 ` Alexey Kardashevskiy
@ 2015-09-15 10:49   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 46+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-15 10:49 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Paul Mackerras, Alexander Graf,
	David Gibson, kvm-ppc, kvm

At the moment the pages used for TCE tables (in addition to the pages
addressed by TCEs) are not counted in the locked_vm counter, so a malicious
userspace tool can call ioctl(KVM_CREATE_SPAPR_TCE) as many times as
RLIMIT_NOFILE allows and lock a lot of memory.

This adds counting for pages used for TCE tables.

This counts the number of pages required for a table plus pages for
the kvmppc_spapr_tce_table struct (TCE table descriptor) itself.

This does not change the amount of (de)allocated memory.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kvm/book3s_64_vio.c | 51 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 9526c34..b70787d 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -45,13 +45,56 @@ static long kvmppc_stt_npages(unsigned long window_size)
 		     * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
 }
 
+static long kvmppc_account_memlimit(long npages, bool inc)
+{
+	long ret = 0;
+	const long bytes = sizeof(struct kvmppc_spapr_tce_table) +
+			(abs(npages) * sizeof(struct page *));
+	const long stt_pages = ALIGN(bytes, PAGE_SIZE) / PAGE_SIZE;
+
+	if (!current || !current->mm)
+		return ret; /* process exited */
+
+	npages += stt_pages;
+
+	down_write(&current->mm->mmap_sem);
+
+	if (inc) {
+		long locked, lock_limit;
+
+		locked = current->mm->locked_vm + npages;
+		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
+			ret = -ENOMEM;
+		else
+			current->mm->locked_vm += npages;
+	} else {
+		if (npages > current->mm->locked_vm)
+			npages = current->mm->locked_vm;
+
+		current->mm->locked_vm -= npages;
+	}
+
+	pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid,
+			inc ? '+' : '-',
+			npages << PAGE_SHIFT,
+			current->mm->locked_vm << PAGE_SHIFT,
+			rlimit(RLIMIT_MEMLOCK),
+			ret ? " - exceeded" : "");
+
+	up_write(&current->mm->mmap_sem);
+
+	return ret;
+}
+
 static void release_spapr_tce_table(struct rcu_head *head)
 {
 	struct kvmppc_spapr_tce_table *stt = container_of(head,
 			struct kvmppc_spapr_tce_table, rcu);
 	int i;
+	long npages = kvmppc_stt_npages(stt->window_size);
 
-	for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
+	for (i = 0; i < npages; i++)
 		__free_page(stt->pages[i]);
 
 	kfree(stt);
@@ -89,6 +132,7 @@ static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
 
 	kvm_put_kvm(stt->kvm);
 
+	kvmppc_account_memlimit(kvmppc_stt_npages(stt->window_size), false);
 	call_rcu(&stt->rcu, release_spapr_tce_table);
 
 	return 0;
@@ -114,6 +158,11 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 	}
 
 	npages = kvmppc_stt_npages(args->window_size);
+	ret = kvmppc_account_memlimit(npages, true);
+	if (ret) {
+		stt = NULL;
+		goto fail;
+	}
 
 	stt = kzalloc(sizeof(*stt) + npages * sizeof(struct page *),
 		      GFP_KERNEL);
-- 
2.4.0.rc3.8.gfb3e7d5
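
For a concrete (illustrative) figure, assuming a 4K host page size: a
256MB window needs 65536 TCEs of 8 bytes each, i.e. 512KB of table or
128 pages; the descriptor plus its 128 page pointers fit in one more
page, so kvmppc_account_memlimit() charges 129 pages against
RLIMIT_MEMLOCK for such a table.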


* [PATCH kernel 6/9] KVM: PPC: Replace SPAPR_TCE_SHIFT with IOMMU_PAGE_SHIFT_4K
  2015-09-15 10:49 ` Alexey Kardashevskiy
@ 2015-09-15 10:49   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 46+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-15 10:49 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Paul Mackerras, Alexander Graf,
	David Gibson, kvm-ppc, kvm

SPAPR_TCE_SHIFT is used in only a few places, and IOMMU_PAGE_SHIFT_4K can
easily be used instead, so remove SPAPR_TCE_SHIFT.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 2 --
 arch/powerpc/kvm/book3s_64_vio.c         | 3 ++-
 arch/powerpc/kvm/book3s_64_vio_hv.c      | 4 ++--
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2aa79c8..7529aab 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -33,8 +33,6 @@ static inline void svcpu_put(struct kvmppc_book3s_shadow_vcpu *svcpu)
 }
 #endif
 
-#define SPAPR_TCE_SHIFT		12
-
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 #define KVM_DEFAULT_HPT_ORDER	24	/* 16MB HPT by default */
 #endif
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index b70787d..e347856 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -36,12 +36,13 @@
 #include <asm/ppc-opcode.h>
 #include <asm/kvm_host.h>
 #include <asm/udbg.h>
+#include <asm/iommu.h>
 
 #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
 
 static long kvmppc_stt_npages(unsigned long window_size)
 {
-	return ALIGN((window_size >> SPAPR_TCE_SHIFT)
+	return ALIGN((window_size >> IOMMU_PAGE_SHIFT_4K)
 		     * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
 }
 
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 8ae12ac..6cf1ab3 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -99,7 +99,7 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	if (ret)
 		return ret;
 
-	idx = ioba >> SPAPR_TCE_SHIFT;
+	idx = ioba >> IOMMU_PAGE_SHIFT_4K;
 	page = stt->pages[idx / TCES_PER_PAGE];
 	tbl = (u64 *)page_address(page);
 
@@ -121,7 +121,7 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	if (stt) {
 		ret = kvmppc_ioba_validate(stt, ioba, 1);
 		if (!ret) {
-			unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
+			unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
 			struct page *page = stt->pages[idx / TCES_PER_PAGE];
 			u64 *tbl = (u64 *)page_address(page);
 
-- 
2.4.0.rc3.8.gfb3e7d5


* [PATCH kernel 7/9] KVM: PPC: Move reusable bits of H_PUT_TCE handler to helpers
  2015-09-15 10:49 ` Alexey Kardashevskiy
@ 2015-09-15 10:49   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 46+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-15 10:49 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Paul Mackerras, Alexander Graf,
	David Gibson, kvm-ppc, kvm

The upcoming multi-TCE support (the H_PUT_TCE_INDIRECT/H_STUFF_TCE
hypercalls) will need to validate both the TCE value (that it carries no
unexpected bits) and the IO address (that it is within the DMA window
boundaries).

This introduces helpers to validate a TCE and an IO address.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/kvm_ppc.h  |  4 ++
 arch/powerpc/kvm/book3s_64_vio_hv.c | 89 ++++++++++++++++++++++++++++++++-----
 2 files changed, 83 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index c6ef05b..fcde896 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -166,6 +166,10 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 				struct kvm_create_spapr_tce *args);
+extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
+		unsigned long ioba, unsigned long npages);
+extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
+		unsigned long tce);
 extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 			     unsigned long ioba, unsigned long tce);
 extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 6cf1ab3..f0fd84c 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -36,6 +36,7 @@
 #include <asm/kvm_host.h>
 #include <asm/udbg.h>
 #include <asm/iommu.h>
+#include <asm/tce.h>
 
 #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
 
@@ -64,7 +65,7 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
  * WARNING: This will be called in real-mode on HV KVM and virtual
  *          mode on PR KVM
  */
-static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
+long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
 		unsigned long ioba, unsigned long npages)
 {
 	unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
@@ -76,6 +77,79 @@ static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
 
 	return H_SUCCESS;
 }
+EXPORT_SYMBOL_GPL(kvmppc_ioba_validate);
+
+/*
+ * Validates TCE address.
+ * At the moment flags and page mask are validated.
+ * As the host kernel does not access those addresses (just puts them
+ * to the table and user space is supposed to process them), we can skip
+ * checking other things (such as TCE is a guest RAM address or the page
+ * was actually allocated).
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ *          mode on PR KVM
+ */
+long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, unsigned long tce)
+{
+	unsigned long mask = ((1ULL << IOMMU_PAGE_SHIFT_4K) - 1) &
+			~(TCE_PCI_WRITE | TCE_PCI_READ);
+
+	if (tce & mask)
+		return H_PARAMETER;
+
+	return H_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(kvmppc_tce_validate);
+
+/* Note on the use of page_address() in real mode:
+ *
+ * It is safe to use page_address() in real mode on ppc64 because
+ * page_address() is always defined as lowmem_page_address(),
+ * which returns __va(PFN_PHYS(page_to_pfn(page))), a purely arithmetic
+ * operation that does not access the page struct.
+ *
+ * Theoretically page_address() could be defined differently, but then
+ * either WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL would have to be
+ * enabled.
+ * WANT_PAGE_VIRTUAL is never enabled on ppc32/ppc64,
+ * HASHED_PAGE_VIRTUAL could be enabled for ppc32 only and only
+ * if CONFIG_HIGHMEM is defined. As CONFIG_SPARSEMEM_VMEMMAP
+ * is not expected to be enabled on ppc32, page_address()
+ * is safe for ppc32 as well.
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ *          mode on PR KVM
+ */
+static u64 *kvmppc_page_address(struct page *page)
+{
+#if defined(HASHED_PAGE_VIRTUAL) || defined(WANT_PAGE_VIRTUAL)
+#error TODO: fix to avoid page_address() here
+#endif
+	return (u64 *) page_address(page);
+}
+
+/*
+ * Handles TCE requests for emulated devices.
+ * Puts guest TCE values to the table and expects user space to convert them.
+ * Called in both real and virtual modes.
+ * It cannot fail, so kvmppc_tce_validate() must be called beforehand.
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ *          mode on PR KVM
+ */
+void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
+		unsigned long idx, unsigned long tce)
+{
+	struct page *page;
+	u64 *tbl;
+
+	page = stt->pages[idx / TCES_PER_PAGE];
+	tbl = kvmppc_page_address(page);
+
+	tbl[idx % TCES_PER_PAGE] = tce;
+}
+EXPORT_SYMBOL_GPL(kvmppc_tce_put);
 
 /* WARNING: This will be called in real-mode on HV KVM and virtual
  *          mode on PR KVM
@@ -85,9 +159,6 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 {
 	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
 	long ret = H_TOO_HARD;
-	unsigned long idx;
-	struct page *page;
-	u64 *tbl;
 
 	/* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
 	/* 	    liobn, ioba, tce); */
@@ -99,13 +170,11 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	if (ret)
 		return ret;
 
-	idx = ioba >> IOMMU_PAGE_SHIFT_4K;
-	page = stt->pages[idx / TCES_PER_PAGE];
-	tbl = (u64 *)page_address(page);
+	ret = kvmppc_tce_validate(stt, tce);
+	if (ret)
+		return ret;
 
-	/* FIXME: Need to validate the TCE itself */
-	/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
-	tbl[idx % TCES_PER_PAGE] = tce;
+	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
 
 	return ret;
 }
-- 
2.4.0.rc3.8.gfb3e7d5
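
For illustration, a TCE value that passes kvmppc_tce_validate() can be
formed like this (a sketch; make_tce() is made up for the example,
TCE_PCI_READ/TCE_PCI_WRITE come from asm/tce.h):

/* Build a TCE: 4K-aligned address plus access bits in the low bits */
static u64 make_tce(unsigned long addr, bool writable)
{
	u64 tce = addr & ~((1ULL << IOMMU_PAGE_SHIFT_4K) - 1);

	tce |= TCE_PCI_READ;
	if (writable)
		tce |= TCE_PCI_WRITE;

	return tce;
}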


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH kernel 7/9] KVM: PPC: Move reusable bits of H_PUT_TCE handler to helpers
@ 2015-09-15 10:49   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 46+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-15 10:49 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Paul Mackerras, Alexander Graf,
	David Gibson, kvm-ppc, kvm

Upcoming multi-tce support (H_PUT_TCE_INDIRECT/H_STUFF_TCE hypercalls)
will validate TCE (not to have unexpected bits) and IO address
(to be within the DMA window boundaries).

This introduces helpers to validate TCE and IO address.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/kvm_ppc.h  |  4 ++
 arch/powerpc/kvm/book3s_64_vio_hv.c | 89 ++++++++++++++++++++++++++++++++-----
 2 files changed, 83 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index c6ef05b..fcde896 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -166,6 +166,10 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 				struct kvm_create_spapr_tce *args);
+extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
+		unsigned long ioba, unsigned long npages);
+extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
+		unsigned long tce);
 extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 			     unsigned long ioba, unsigned long tce);
 extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 6cf1ab3..f0fd84c 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -36,6 +36,7 @@
 #include <asm/kvm_host.h>
 #include <asm/udbg.h>
 #include <asm/iommu.h>
+#include <asm/tce.h>
 
 #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
 
@@ -64,7 +65,7 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
  * WARNING: This will be called in real-mode on HV KVM and virtual
  *          mode on PR KVM
  */
-static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
+long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
 		unsigned long ioba, unsigned long npages)
 {
 	unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
@@ -76,6 +77,79 @@ static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
 
 	return H_SUCCESS;
 }
+EXPORT_SYMBOL_GPL(kvmppc_ioba_validate);
+
+/*
+ * Validates TCE address.
+ * At the moment flags and page mask are validated.
+ * As the host kernel does not access those addresses (just puts them
+ * to the table and user space is supposed to process them), we can skip
+ * checking other things (such as TCE is a guest RAM address or the page
+ * was actually allocated).
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ *          mode on PR KVM
+ */
+long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, unsigned long tce)
+{
+	unsigned long mask = ((1ULL << IOMMU_PAGE_SHIFT_4K) - 1) &
+			~(TCE_PCI_WRITE | TCE_PCI_READ);
+
+	if (tce & mask)
+		return H_PARAMETER;
+
+	return H_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(kvmppc_tce_validate);
+
+/* Note on the use of page_address() in real mode,
+ *
+ * It is safe to use page_address() in real mode on ppc64 because
+ * page_address() is always defined as lowmem_page_address()
+ * which returns __va(PFN_PHYS(page_to_pfn(page))) which is arithmetial
+ * operation and does not access page struct.
+ *
+ * Theoretically page_address() could be defined different
+ * but either WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL
+ * should be enabled.
+ * WANT_PAGE_VIRTUAL is never enabled on ppc32/ppc64,
+ * HASHED_PAGE_VIRTUAL could be enabled for ppc32 only and only
+ * if CONFIG_HIGHMEM is defined. As CONFIG_SPARSEMEM_VMEMMAP
+ * is not expected to be enabled on ppc32, page_address()
+ * is safe for ppc32 as well.
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ *          mode on PR KVM
+ */
+static u64 *kvmppc_page_address(struct page *page)
+{
+#if defined(HASHED_PAGE_VIRTUAL) || defined(WANT_PAGE_VIRTUAL)
+#error TODO: fix to avoid page_address() here
+#endif
+	return (u64 *) page_address(page);
+}
+
+/*
+ * Handles TCE requests for emulated devices.
+ * Puts guest TCE values to the table and expects user space to convert them.
+ * Called in both real and virtual modes.
+ * Cannot fail so kvmppc_tce_validate must be called before it.
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ *          mode on PR KVM
+ */
+void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
+		unsigned long idx, unsigned long tce)
+{
+	struct page *page;
+	u64 *tbl;
+
+	page = stt->pages[idx / TCES_PER_PAGE];
+	tbl = kvmppc_page_address(page);
+
+	tbl[idx % TCES_PER_PAGE] = tce;
+}
+EXPORT_SYMBOL_GPL(kvmppc_tce_put);
 
 /* WARNING: This will be called in real-mode on HV KVM and virtual
  *          mode on PR KVM
@@ -85,9 +159,6 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 {
 	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
 	long ret = H_TOO_HARD;
-	unsigned long idx;
-	struct page *page;
-	u64 *tbl;
 
 	/* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
 	/* 	    liobn, ioba, tce); */
@@ -99,13 +170,11 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	if (ret)
 		return ret;
 
-	idx = ioba >> IOMMU_PAGE_SHIFT_4K;
-	page = stt->pages[idx / TCES_PER_PAGE];
-	tbl = (u64 *)page_address(page);
+	ret = kvmppc_tce_validate(stt, tce);
+	if (ret)
+		return ret;
 
-	/* FIXME: Need to validate the TCE itself */
-	/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
-	tbl[idx % TCES_PER_PAGE] = tce;
+	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
 
 	return ret;
 }
-- 
2.4.0.rc3.8.gfb3e7d5


^ permalink raw reply related	[flat|nested] 46+ messages in thread
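
To make the flag check in kvmppc_tce_validate() above concrete: with
IOMMU_PAGE_SHIFT_4K equal to 12 and the permission flags in the two lowest
bits, the computed mask is 0xffc, so a TCE is rejected exactly when any of
bits 2..11 is set. Below is a minimal standalone sketch of that arithmetic;
the constant values are assumptions matching asm/tce.h at the time of this
series, and the sketch is illustration only, not kernel code.

#include <stdio.h>
#include <stdint.h>

#define IOMMU_PAGE_SHIFT_4K	12
#define TCE_PCI_READ		0x1ULL	/* assumed value, as in asm/tce.h */
#define TCE_PCI_WRITE		0x2ULL	/* assumed value, as in asm/tce.h */

static int tce_flags_valid(uint64_t tce)
{
	/* 0xfff & ~0x3 == 0xffc: bits 2..11 must be clear */
	uint64_t mask = ((1ULL << IOMMU_PAGE_SHIFT_4K) - 1) &
			~(TCE_PCI_WRITE | TCE_PCI_READ);

	return (tce & mask) == 0;
}

int main(void)
{
	/* 4K-aligned real page number plus both permission bits: accepted */
	printf("%d\n", tce_flags_valid(0x100000ULL | TCE_PCI_READ | TCE_PCI_WRITE));
	/* address not aligned to a 4K IOMMU page: rejected */
	printf("%d\n", tce_flags_valid(0x100010ULL));
	return 0;
}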

* [PATCH kernel 8/9] KVM: Fix KVM_SMI chapter number
  2015-09-15 10:49 ` Alexey Kardashevskiy
@ 2015-09-15 10:49   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 46+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-15 10:49 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Paul Mackerras, Alexander Graf,
	David Gibson, kvm-ppc, kvm

The KVM_SMI chapter follows the KVM_S390_SET_IRQ_STATE chapter, which is
numbered "4.95", so this changes the KVM_SMI chapter number to 4.96.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 Documentation/virtual/kvm/api.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index d9eccee..d86d831 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3009,7 +3009,7 @@ len must be a multiple of sizeof(struct kvm_s390_irq). It must be > 0
 and it must not exceed (max_vcpus + 32) * sizeof(struct kvm_s390_irq),
 which is the maximum number of possibly pending cpu-local interrupts.
 
-4.90 KVM_SMI
+4.96 KVM_SMI
 
 Capability: KVM_CAP_X86_SMM
 Architectures: x86
-- 
2.4.0.rc3.8.gfb3e7d5


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH kernel 9/9] KVM: PPC: Add support for multiple-TCE hcalls
  2015-09-15 10:49 ` Alexey Kardashevskiy
@ 2015-09-15 10:49   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 46+ messages in thread
From: Alexey Kardashevskiy @ 2015-09-15 10:49 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Paul Mackerras, Alexander Graf,
	David Gibson, kvm-ppc, kvm

This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
devices or emulated PCI.  These calls allow adding multiple entries
(up to 512) to the TCE table in one call, which saves time on
transitions between kernel and user space.

This implements the KVM_CAP_PPC_MULTITCE capability. When present,
the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE.
If they cannot be handled by the kernel, they are passed on to
user space. User space still has to have an implementation
for these.

Both HV and PR-style KVM are supported.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 Documentation/virtual/kvm/api.txt       |  25 ++++++
 arch/powerpc/include/asm/kvm_ppc.h      |  12 +++
 arch/powerpc/kvm/book3s_64_vio.c        | 111 +++++++++++++++++++++++-
 arch/powerpc/kvm/book3s_64_vio_hv.c     | 145 ++++++++++++++++++++++++++++++--
 arch/powerpc/kvm/book3s_hv.c            |  26 +++++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   6 +-
 arch/powerpc/kvm/book3s_pr_papr.c       |  35 ++++++++
 arch/powerpc/kvm/powerpc.c              |   3 +
 8 files changed, 350 insertions(+), 13 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index d86d831..593c62a 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3019,6 +3019,31 @@ Returns: 0 on success, -1 on error
 
 Queues an SMI on the thread's vcpu.
 
+4.97 KVM_CAP_PPC_MULTITCE
+
+Capability: KVM_CAP_PPC_MULTITCE
+Architectures: ppc
+Type: vm
+
+This capability means the kernel can handle the H_PUT_TCE_INDIRECT
+and H_STUFF_TCE hypercalls without passing them to user space.
+This significantly accelerates DMA operations for PPC KVM guests.
+User space should expect that its handlers for these hypercalls
+are not going to be called if user space previously registered the
+LIOBN in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
+
+In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
+user space might have to advertise them to the guest. For example,
+an IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+The hypercalls mentioned above may or may not be processed successfully
+in the kernel-based fast path. If they cannot be handled by the kernel,
+they will get passed on to user space. So user space still has to have
+an implementation for these despite the in-kernel acceleration.
+
+This capability is always enabled.
+
 5. The kvm_run structure
 ------------------------
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index fcde896..e5b968e 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -166,12 +166,24 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 				struct kvm_create_spapr_tce *args);
+extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
+		struct kvm_vcpu *vcpu, unsigned long liobn);
 extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
 		unsigned long ioba, unsigned long npages);
 extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
 		unsigned long tce);
+extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
+		unsigned long *ua, unsigned long **prmap);
+extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
+		unsigned long idx, unsigned long tce);
 extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 			     unsigned long ioba, unsigned long tce);
+extern long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_list, unsigned long npages);
+extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_value, unsigned long npages);
 extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 			     unsigned long ioba);
 extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index e347856..d3fc732 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -14,6 +14,7 @@
  *
  * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
  * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
+ * Copyright 2013 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
  */
 
 #include <linux/types.h>
@@ -37,8 +38,7 @@
 #include <asm/kvm_host.h>
 #include <asm/udbg.h>
 #include <asm/iommu.h>
-
-#define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
+#include <asm/tce.h>
 
 static long kvmppc_stt_npages(unsigned long window_size)
 {
@@ -200,3 +200,110 @@ fail:
 	}
 	return ret;
 }
+
+long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce)
+{
+	long ret;
+	struct kvmppc_spapr_tce_table *stt;
+
+	stt = kvmppc_find_table(vcpu, liobn);
+	if (!stt)
+		return H_TOO_HARD;
+
+	ret = kvmppc_ioba_validate(stt, ioba, 1);
+	if (ret)
+		return ret;
+
+	ret = kvmppc_tce_validate(stt, tce);
+	if (ret)
+		return ret;
+
+	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
+
+	return H_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
+
+long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_list, unsigned long npages)
+{
+	struct kvmppc_spapr_tce_table *stt;
+	long i, ret = H_SUCCESS, idx;
+	unsigned long entry, ua = 0;
+	u64 __user *tces, tce;
+
+	stt = kvmppc_find_table(vcpu, liobn);
+	if (!stt)
+		return H_TOO_HARD;
+
+	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
+	/*
+	 * The SPAPR spec says that the maximum size of the list is 512 TCEs,
+	 * so the whole list fits in a 4K page.
+	 */
+	if (npages > 512)
+		return H_PARAMETER;
+
+	if (tce_list & ~IOMMU_PAGE_MASK_4K)
+		return H_PARAMETER;
+
+	ret = kvmppc_ioba_validate(stt, ioba, npages);
+	if (ret)
+		return ret;
+
+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL)) {
+		ret = H_TOO_HARD;
+		goto unlock_exit;
+	}
+	tces = (u64 *) ua;
+
+	for (i = 0; i < npages; ++i) {
+		if (get_user(tce, tces + i)) {
+			ret = H_PARAMETER;
+			goto unlock_exit;
+		}
+		tce = be64_to_cpu(tce);
+
+		ret = kvmppc_tce_validate(stt, tce);
+		if (ret)
+			goto unlock_exit;
+
+		kvmppc_tce_put(stt, entry + i, tce);
+	}
+
+unlock_exit:
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kvmppc_h_put_tce_indirect);
+
+long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_value, unsigned long npages)
+{
+	struct kvmppc_spapr_tce_table *stt;
+	long i, ret;
+
+	stt = kvmppc_find_table(vcpu, liobn);
+	if (!stt)
+		return H_TOO_HARD;
+
+	ret = kvmppc_ioba_validate(stt, ioba, npages);
+	if (ret)
+		return ret;
+
+	ret = kvmppc_tce_validate(stt, tce_value);
+	if (ret || (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ)))
+		return H_PARAMETER;
+
+	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
+		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
+
+	return H_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index f0fd84c..bca7b12 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -14,6 +14,7 @@
  *
  * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
  * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
+ * Copyright 2013 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
  */
 
 #include <linux/types.h>
@@ -30,6 +31,7 @@
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
 #include <asm/mmu-hash64.h>
+#include <asm/mmu_context.h>
 #include <asm/hvcall.h>
 #include <asm/synch.h>
 #include <asm/ppc-opcode.h>
@@ -37,6 +39,7 @@
 #include <asm/udbg.h>
 #include <asm/iommu.h>
 #include <asm/tce.h>
+#include <asm/iommu.h>
 
 #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
 
@@ -46,7 +49,7 @@
  * WARNING: This will be called in real or virtual mode on HV KVM and virtual
  *          mode on PR KVM
  */
-static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
+struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
 		unsigned long liobn)
 {
 	struct kvm *kvm = vcpu->kvm;
@@ -58,6 +61,7 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
 
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(kvmppc_find_table);
 
 /*
  * Validates IO address.
@@ -151,11 +155,32 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
 }
 EXPORT_SYMBOL_GPL(kvmppc_tce_put);
 
-/* WARNING: This will be called in real-mode on HV KVM and virtual
- *          mode on PR KVM
- */
-long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
-		      unsigned long ioba, unsigned long tce)
+long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
+		unsigned long *ua, unsigned long **prmap)
+{
+	unsigned long gfn = gpa >> PAGE_SHIFT;
+	struct kvm_memory_slot *memslot;
+
+	memslot = search_memslots(kvm_memslots(kvm), gfn);
+	if (!memslot)
+		return -EINVAL;
+
+	*ua = __gfn_to_hva_memslot(memslot, gfn) |
+		(gpa & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
+
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	if (prmap)
+		*prmap = real_vmalloc_addr(&memslot->arch.rmap[
+				gfn - memslot->base_gfn]);
+#endif
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
+
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
+		unsigned long ioba, unsigned long tce)
 {
 	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
 	long ret = H_TOO_HARD;
@@ -178,7 +203,111 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
+
+static long kvmppc_rm_ua_to_hpa(struct kvm_vcpu *vcpu,
+		unsigned long ua, unsigned long *phpa)
+{
+	pte_t *ptep, pte;
+	unsigned shift = 0;
+
+	ptep = __find_linux_pte_or_hugepte(vcpu->arch.pgdir, ua, &shift);
+	if (!ptep || !pte_present(*ptep))
+		return -ENXIO;
+	pte = *ptep;
+
+	if (!shift)
+		shift = PAGE_SHIFT;
+
+	/* Avoid handling anything potentially complicated in realmode */
+	if (shift > PAGE_SHIFT)
+		return -EAGAIN;
+
+	if (!pte_young(pte))
+		return -EAGAIN;
+
+	*phpa = (pte_pfn(pte) << PAGE_SHIFT) | (ua & ((1ULL << shift) - 1)) |
+			(ua & ~PAGE_MASK);
+
+	return 0;
+}
+
+long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_list,	unsigned long npages)
+{
+	struct kvmppc_spapr_tce_table *stt;
+	long i, ret = H_SUCCESS;
+	unsigned long tces, entry, ua = 0;
+	unsigned long *rmap = NULL;
+
+	stt = kvmppc_find_table(vcpu, liobn);
+	if (!stt)
+		return H_TOO_HARD;
+
+	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
+	/*
+	 * The spec says that the maximum size of the list is 512 TCEs,
+	 * so the whole list resides within a single 4K page.
+	 */
+	if (npages > 512)
+		return H_PARAMETER;
+
+	if (tce_list & ~IOMMU_PAGE_MASK_4K)
+		return H_PARAMETER;
+
+	ret = kvmppc_ioba_validate(stt, ioba, npages);
+	if (ret)
+		return ret;
+
+	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
+		return H_TOO_HARD;
+
+	lock_rmap(rmap);
+	if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
+		ret = H_TOO_HARD;
+		goto unlock_exit;
+	}
+
+	for (i = 0; i < npages; ++i) {
+		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
+
+		ret = kvmppc_tce_validate(stt, tce);
+		if (ret)
+			goto unlock_exit;
+
+		kvmppc_tce_put(stt, entry + i, tce);
+	}
+
+unlock_exit:
+	unlock_rmap(rmap);
+
+	return ret;
+}
+
+long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_value, unsigned long npages)
+{
+	struct kvmppc_spapr_tce_table *stt;
+	long i, ret;
+
+	stt = kvmppc_find_table(vcpu, liobn);
+	if (!stt)
+		return H_TOO_HARD;
+
+	ret = kvmppc_ioba_validate(stt, ioba, npages);
+	if (ret)
+		return ret;
+
+	ret = kvmppc_tce_validate(stt, tce_value);
+	if (ret || (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ)))
+		return H_PARAMETER;
+
+	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
+		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
+
+	return H_SUCCESS;
+}
 
 long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 		      unsigned long ioba)
@@ -202,3 +331,5 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	return ret;
 }
 EXPORT_SYMBOL_GPL(kvmppc_h_get_tce);
+
+#endif /* KVM_BOOK3S_HV_POSSIBLE */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c5edf17..408b1b1 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -775,7 +775,31 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 		if (kvmppc_xics_enabled(vcpu)) {
 			ret = kvmppc_xics_hcall(vcpu, req);
 			break;
-		} /* fallthrough */
+		}
+		return RESUME_HOST;
+	case H_PUT_TCE:
+		ret = kvmppc_h_put_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
+						kvmppc_get_gpr(vcpu, 5),
+						kvmppc_get_gpr(vcpu, 6));
+		if (ret == H_TOO_HARD)
+			return RESUME_HOST;
+		break;
+	case H_PUT_TCE_INDIRECT:
+		ret = kvmppc_h_put_tce_indirect(vcpu, kvmppc_get_gpr(vcpu, 4),
+						kvmppc_get_gpr(vcpu, 5),
+						kvmppc_get_gpr(vcpu, 6),
+						kvmppc_get_gpr(vcpu, 7));
+		if (ret == H_TOO_HARD)
+			return RESUME_HOST;
+		break;
+	case H_STUFF_TCE:
+		ret = kvmppc_h_stuff_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
+						kvmppc_get_gpr(vcpu, 5),
+						kvmppc_get_gpr(vcpu, 6),
+						kvmppc_get_gpr(vcpu, 7));
+		if (ret == H_TOO_HARD)
+			return RESUME_HOST;
+		break;
 	default:
 		return RESUME_HOST;
 	}
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 2273dca..fd1997c 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1917,7 +1917,7 @@ hcall_real_table:
 	.long	DOTSYM(kvmppc_h_clear_ref) - hcall_real_table
 	.long	DOTSYM(kvmppc_h_protect) - hcall_real_table
 	.long	DOTSYM(kvmppc_h_get_tce) - hcall_real_table
-	.long	DOTSYM(kvmppc_h_put_tce) - hcall_real_table
+	.long	DOTSYM(kvmppc_rm_h_put_tce) - hcall_real_table
 	.long	0		/* 0x24 - H_SET_SPRG0 */
 	.long	DOTSYM(kvmppc_h_set_dabr) - hcall_real_table
 	.long	0		/* 0x2c */
@@ -1995,8 +1995,8 @@ hcall_real_table:
 	.long	0		/* 0x12c */
 	.long	0		/* 0x130 */
 	.long	DOTSYM(kvmppc_h_set_xdabr) - hcall_real_table
-	.long	0		/* 0x138 */
-	.long	0		/* 0x13c */
+	.long	DOTSYM(kvmppc_rm_h_stuff_tce) - hcall_real_table
+	.long	DOTSYM(kvmppc_rm_h_put_tce_indirect) - hcall_real_table
 	.long	0		/* 0x140 */
 	.long	0		/* 0x144 */
 	.long	0		/* 0x148 */
diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index f2c75a1..02176fd 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -280,6 +280,37 @@ static int kvmppc_h_pr_logical_ci_store(struct kvm_vcpu *vcpu)
 	return EMULATE_DONE;
 }
 
+static int kvmppc_h_pr_put_tce_indirect(struct kvm_vcpu *vcpu)
+{
+	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
+	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
+	unsigned long tce = kvmppc_get_gpr(vcpu, 6);
+	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
+	long rc;
+
+	rc = kvmppc_h_put_tce_indirect(vcpu, liobn, ioba,
+			tce, npages);
+	if (rc == H_TOO_HARD)
+		return EMULATE_FAIL;
+	kvmppc_set_gpr(vcpu, 3, rc);
+	return EMULATE_DONE;
+}
+
+static int kvmppc_h_pr_stuff_tce(struct kvm_vcpu *vcpu)
+{
+	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
+	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
+	unsigned long tce_value = kvmppc_get_gpr(vcpu, 6);
+	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
+	long rc;
+
+	rc = kvmppc_h_stuff_tce(vcpu, liobn, ioba, tce_value, npages);
+	if (rc == H_TOO_HARD)
+		return EMULATE_FAIL;
+	kvmppc_set_gpr(vcpu, 3, rc);
+	return EMULATE_DONE;
+}
+
 static int kvmppc_h_pr_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
 {
 	long rc = kvmppc_xics_hcall(vcpu, cmd);
@@ -306,6 +337,10 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
 		return kvmppc_h_pr_bulk_remove(vcpu);
 	case H_PUT_TCE:
 		return kvmppc_h_pr_put_tce(vcpu);
+	case H_PUT_TCE_INDIRECT:
+		return kvmppc_h_pr_put_tce_indirect(vcpu);
+	case H_STUFF_TCE:
+		return kvmppc_h_pr_stuff_tce(vcpu);
 	case H_CEDE:
 		kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
 		kvm_vcpu_block(vcpu);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2e51289..c7c2802 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -566,6 +566,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_PPC_GET_SMMU_INFO:
 		r = 1;
 		break;
+	case KVM_CAP_SPAPR_MULTITCE:
+		r = 1;
+		break;
 #endif
 	default:
 		r = 0;
-- 
2.4.0.rc3.8.gfb3e7d5


^ permalink raw reply related	[flat|nested] 46+ messages in thread
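
For user space, probing this is an ordinary KVM_CHECK_EXTENSION on the
/dev/kvm file descriptor. A minimal sketch follows; it assumes the
KVM_CAP_SPAPR_MULTITCE constant enabled in the powerpc.c hunk above is
visible through <linux/kvm.h>, and error handling is trimmed.

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm, multitce;

	kvm = open("/dev/kvm", O_RDWR);
	if (kvm < 0)
		return 1;

	/* > 0 means the kernel handles H_PUT_TCE_INDIRECT/H_STUFF_TCE */
	multitce = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_SPAPR_MULTITCE);
	printf("in-kernel multi-TCE handling: %s\n", multitce > 0 ? "yes" : "no");

	return 0;
}

Note the asymmetry the documentation hunk describes: a positive result only
means the fast path exists; individual hypercalls may still come back to
user space with H_TOO_HARD, so the user-space handlers must stay.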

* Re: [PATCH kernel 5/9] KVM: PPC: Account TCE-containing pages in locked_vm
  2015-09-15 10:49   ` Alexey Kardashevskiy
@ 2015-11-30  2:06     ` Paul Mackerras
  -1 siblings, 0 replies; 46+ messages in thread
From: Paul Mackerras @ 2015-11-30  2:06 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Alexander Graf, David Gibson, kvm-ppc, kvm

On Tue, Sep 15, 2015 at 08:49:35PM +1000, Alexey Kardashevskiy wrote:
> At the moment pages used for TCE tables (in addition to pages addressed
> by TCEs) are not counted in locked_vm counter so a malicious userspace
> tool can call ioctl(KVM_CREATE_SPAPR_TCE) as many times as RLIMIT_NOFILE and
> lock a lot of memory.
> 
> This adds counting for pages used for TCE tables.
> 
> This counts the number of pages required for a table plus pages for
> the kvmppc_spapr_tce_table struct (TCE table descriptor) itself.
> 
> This does not change the amount of (de)allocated memory.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/kvm/book3s_64_vio.c | 51 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 9526c34..b70787d 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -45,13 +45,56 @@ static long kvmppc_stt_npages(unsigned long window_size)
>  		     * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
>  }
>  
> +static long kvmppc_account_memlimit(long npages, bool inc)
> +{
> +	long ret = 0;
> +	const long bytes = sizeof(struct kvmppc_spapr_tce_table) +
> +			(abs(npages) * sizeof(struct page *));

Why abs(npages)?  Can npages be negative?  If so, what does that mean?

Paul.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH kernel 5/9] KVM: PPC: Account TCE-containing pages in locked_vm
  2015-11-30  2:06     ` Paul Mackerras
@ 2015-11-30  5:09       ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 46+ messages in thread
From: Alexey Kardashevskiy @ 2015-11-30  5:09 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, Alexander Graf, David Gibson, kvm-ppc, kvm

On 11/30/2015 01:06 PM, Paul Mackerras wrote:
> On Tue, Sep 15, 2015 at 08:49:35PM +1000, Alexey Kardashevskiy wrote:
>> At the moment pages used for TCE tables (in addition to pages addressed
>> by TCEs) are not counted in locked_vm counter so a malicious userspace
>> tool can call ioctl(KVM_CREATE_SPAPR_TCE) as many times as RLIMIT_NOFILE and
>> lock a lot of memory.
>>
>> This adds counting for pages used for TCE tables.
>>
>> This counts the number of pages required for a table plus pages for
>> the kvmppc_spapr_tce_table struct (TCE table descriptor) itself.
>>
>> This does not change the amount of (de)allocated memory.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>   arch/powerpc/kvm/book3s_64_vio.c | 51 +++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 50 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index 9526c34..b70787d 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -45,13 +45,56 @@ static long kvmppc_stt_npages(unsigned long window_size)
>>   		     * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
>>   }
>>
>> +static long kvmppc_account_memlimit(long npages, bool inc)
>> +{
>> +	long ret = 0;
>> +	const long bytes = sizeof(struct kvmppc_spapr_tce_table) +
>> +			(abs(npages) * sizeof(struct page *));
>
> Why abs(npages)?  Can npages be negative?  If so, what does that mean?


A leftover from older versions when there was one shared
account_memlimit(long npages). It does not make sense here; I need to
remove it.


-- 
Alexey

^ permalink raw reply	[flat|nested] 46+ messages in thread
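
For context, the shared helper described above would have looked roughly
like the following, with a single signed npages meaning "increment" when
positive and "decrement" when negative, which is where the abs() came from.
This is a reconstruction for illustration only, not code that was posted or
merged; it assumes the usual kernel APIs of that era (mmap_sem, rlimit(),
capable()).

static long account_memlimit(struct mm_struct *mm, long npages)
{
	long ret = 0;

	down_write(&mm->mmap_sem);
	if (npages > 0) {
		unsigned long locked = mm->locked_vm + npages;
		unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;

		if (locked > limit && !capable(CAP_IPC_LOCK))
			ret = -ENOMEM;
		else
			mm->locked_vm = locked;
	} else {
		/* Releasing abs(npages) pages; never underflow the counter */
		mm->locked_vm -= min(mm->locked_vm, (unsigned long)-npages);
	}
	up_write(&mm->mmap_sem);

	return ret;
}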

* Re: [PATCH kernel 1/9] rcu: Define notrace version of list_for_each_entry_rcu
  2015-09-15 10:49   ` Alexey Kardashevskiy
@ 2015-12-08  2:05     ` David Gibson
  -1 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-12-08  2:05 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Paul Mackerras, Alexander Graf, kvm-ppc, kvm

On Tue, Sep 15, 2015 at 08:49:31PM +1000, Alexey Kardashevskiy wrote:
> This defines list_for_each_entry_rcu_notrace and list_entry_rcu_notrace
> which use rcu_dereference_raw_notrace instead of rcu_dereference_raw.
> This allows using list_for_each_entry_rcu_notrace in real mode (MMU is off).
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  include/linux/rculist.h | 38 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index 17c6b1f..439c4d7 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -253,6 +253,25 @@ static inline void list_splice_init_rcu(struct list_head *list,
>  })
>  
>  /**
> + * list_entry_rcu_notrace - get the struct for this entry
> + * @ptr:        the &struct list_head pointer.
> + * @type:       the type of the struct this is embedded in.
> + * @member:     the name of the list_struct within the struct.
> + *
> + * This primitive may safely run concurrently with the _rcu list-mutation
> + * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
> + *
> + * This is the same as list_entry_rcu() except that it does
> + * not do any RCU debugging or tracing.
> + */
> +#define list_entry_rcu_notrace(ptr, type, member) \
> +({ \
> +	typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
> +	container_of((typeof(ptr))rcu_dereference_raw_notrace(__ptr), \
> +			type, member); \
> +})
> +
> +/**
>   * Where are list_empty_rcu() and list_first_entry_rcu()?
>   *
>   * Implementing those functions following their counterparts list_empty() and
> @@ -308,6 +327,25 @@ static inline void list_splice_init_rcu(struct list_head *list,
>  		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>  
>  /**
> + * list_for_each_entry_rcu_notrace - iterate over rcu list of given type
> + * @pos:	the type * to use as a loop cursor.
> + * @head:	the head for your list.
> + * @member:	the name of the list_struct within the struct.
> + *
> + * This list-traversal primitive may safely run concurrently with
> + * the _rcu list-mutation primitives such as list_add_rcu()
> + * as long as the traversal is guarded by rcu_read_lock().
> + *
> + * This is the same as list_for_each_entry_rcu() except that it does
> + * not do any RCU debugging or tracing.
> + */
> +#define list_for_each_entry_rcu_notrace(pos, head, member) \
> +	for (pos = list_entry_rcu_notrace((head)->next, typeof(*pos), member); \
> +		&pos->member != (head); \
> +		pos = list_entry_rcu_notrace(pos->member.next, typeof(*pos), \
> +				member))
> +
> +/**
>   * list_for_each_entry_continue_rcu - continue iteration over list of given type
>   * @pos:	the type * to use as a loop cursor.
>   * @head:	the head for your list.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 46+ messages in thread
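
As a usage illustration, the notrace variant is a drop-in replacement for
list_for_each_entry_rcu() that skips the debugging/tracing hooks which
would fault with the MMU off. The hypothetical walker below has the same
shape as the kvmppc_find_table() helper introduced later in this series:

struct liobn_entry {
	unsigned long liobn;
	struct list_head list;
};

/* Safe to call in real mode: no tracepoints, no RCU lockdep checks */
static struct liobn_entry *find_entry_realmode(struct list_head *head,
		unsigned long liobn)
{
	struct liobn_entry *e;

	list_for_each_entry_rcu_notrace(e, head, list)
		if (e->liobn == liobn)
			return e;

	return NULL;
}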

* Re: [PATCH kernel 2/9] KVM: PPC: Make real_vmalloc_addr() public
  2015-09-15 10:49   ` Alexey Kardashevskiy
@ 2015-12-08  2:08     ` David Gibson
  -1 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-12-08  2:08 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Paul Mackerras, Alexander Graf, kvm-ppc, kvm

On Tue, Sep 15, 2015 at 08:49:32PM +1000, Alexey Kardashevskiy wrote:
> This helper translates vmalloc'd addresses to linear addresses.
> It is only used by the KVM MMU code now and resides in the HV KVM code.
> We will need it further in the TCE code and the DMA memory preregistration
> code called in real mode.
> 
> This makes real_vmalloc_addr() public and moves it to the powerpc code as
> it does not do anything special for KVM.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Hmm, I have a couple of small concerns.

First, I'm not sure if the name is clear enough for a public function.

Second, I'm not sure if mmu-hash64.h is the right place for it.  This
is still a function with very specific and limited usage, I wonder if
we should have somewhere else for such special real mode helpers.

Paulus, thoughts?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 46+ messages in thread
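
For readers unfamiliar with the helper under discussion: it maps a
vmalloc'd address to its linear-mapping alias, which is the form that
stays usable with the MMU off. A virtual-mode equivalent would be roughly
the sketch below; the real helper has to walk the kernel page table by
hand because it runs in real mode, so this is illustration only.

#include <linux/vmalloc.h>
#include <linux/mm.h>

/* Virtual-mode illustration of what real_vmalloc_addr() computes */
static void *vmalloc_addr_to_linear(void *x)
{
	struct page *p = vmalloc_to_page(x);

	if (!p)
		return NULL;

	/* Linear-map address of the backing page plus the in-page offset */
	return page_address(p) + offset_in_page(x);
}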

* Re: [PATCH kernel 3/9] KVM: PPC: Rework H_PUT_TCE/H_GET_TCE handlers
  2015-09-15 10:49   ` Alexey Kardashevskiy
@ 2015-12-08  2:18     ` David Gibson
  -1 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-12-08  2:18 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Paul Mackerras, Alexander Graf, kvm-ppc, kvm

On Tue, Sep 15, 2015 at 08:49:33PM +1000, Alexey Kardashevskiy wrote:
> This reworks the existing H_PUT_TCE/H_GET_TCE handlers to have one
> exit path. This allows the next patch to add locks cleanly.

I don't see a problem with the actual code, but it doesn't seem to
match this description: I still see multiple return statements for
h_put_tce at least.

> This moves the ioba boundary check to a helper and adds a check that
> the least significant bits, which have to be zero, are in fact zero.
> 
> The patch is pretty mechanical (only the check for the least significant
> ioba bits is added), so no change in behaviour is expected.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 102 +++++++++++++++++++++++-------------
>  1 file changed, 66 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 89e96b3..8ae12ac 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -35,71 +35,101 @@
>  #include <asm/ppc-opcode.h>
>  #include <asm/kvm_host.h>
>  #include <asm/udbg.h>
> +#include <asm/iommu.h>
>  
>  #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>  
> +/*
> + * Finds a TCE table descriptor by LIOBN.
> + *
> + * WARNING: This will be called in real or virtual mode on HV KVM and virtual
> + *          mode on PR KVM
> + */
> +static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
> +		unsigned long liobn)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvmppc_spapr_tce_table *stt;
> +
> +	list_for_each_entry_rcu_notrace(stt, &kvm->arch.spapr_tce_tables, list)
> +		if (stt->liobn == liobn)
> +			return stt;
> +
> +	return NULL;
> +}
> +
> +/*
> + * Validates IO address.
> + *
> + * WARNING: This will be called in real-mode on HV KVM and virtual
> + *          mode on PR KVM
> + */
> +static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
> +		unsigned long ioba, unsigned long npages)
> +{
> +	unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
> +	unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
> +	unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K;
> +
> +	if ((ioba & mask) || (size + npages <= idx))
> +		return H_PARAMETER;

Not sure if it's worth a check for overflow in (size+npages) there.
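
For illustration, an overflow-safe variant of the check could bound idx
before any addition so nothing can wrap (a sketch only, not part of the
patch; same stt/ioba/npages as above):

	unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
	unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
	unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K;

	/* idx < size is established first, so "size - idx" cannot
	 * underflow and no addition is needed at all */
	if ((ioba & mask) || (idx >= size) || (npages > size - idx))
		return H_PARAMETER;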

> +
> +	return H_SUCCESS;
> +}
> +
>  /* WARNING: This will be called in real-mode on HV KVM and virtual
>   *          mode on PR KVM
>   */
>  long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  		      unsigned long ioba, unsigned long tce)
>  {
> -	struct kvm *kvm = vcpu->kvm;
> -	struct kvmppc_spapr_tce_table *stt;
> +	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
> +	long ret = H_TOO_HARD;
> +	unsigned long idx;
> +	struct page *page;
> +	u64 *tbl;
>  
>  	/* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
>  	/* 	    liobn, ioba, tce); */
>  
> -	list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
> -		if (stt->liobn == liobn) {
> -			unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
> -			struct page *page;
> -			u64 *tbl;
> +	if (!stt)
> +		return ret;
>  
> -			/* udbg_printf("H_PUT_TCE: liobn 0x%lx => stt=%p  window_size=0x%x\n", */
> -			/* 	    liobn, stt, stt->window_size); */
> -			if (ioba >= stt->window_size)
> -				return H_PARAMETER;
> +	ret = kvmppc_ioba_validate(stt, ioba, 1);
> +	if (ret)
> +		return ret;
>  
> -			page = stt->pages[idx / TCES_PER_PAGE];
> -			tbl = (u64 *)page_address(page);
> +	idx = ioba >> SPAPR_TCE_SHIFT;
> +	page = stt->pages[idx / TCES_PER_PAGE];
> +	tbl = (u64 *)page_address(page);
>  
> -			/* FIXME: Need to validate the TCE itself */
> -			/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
> -			tbl[idx % TCES_PER_PAGE] = tce;
> -			return H_SUCCESS;
> -		}
> -	}
> +	/* FIXME: Need to validate the TCE itself */
> +	/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
> +	tbl[idx % TCES_PER_PAGE] = tce;
>  
> -	/* Didn't find the liobn, punt it to userspace */
> -	return H_TOO_HARD;
> +	return ret;

So, this relies on the fact that kvmppc_ioba_validate() would have
returned H_SUCCESS some distance above.  This seems rather fragile if
you insert anything else which alters ret in between.  Since this is
the success path, I think it's clearer to explicitly return H_SUCCESS.
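
A minimal sketch of that shape, for the tail of kvmppc_h_put_tce() as
quoted above:

	tbl[idx % TCES_PER_PAGE] = tce;

	return H_SUCCESS;	/* explicit success, independent of ret */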

>  }
>  EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
>  
>  long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  		      unsigned long ioba)
>  {
> -	struct kvm *kvm = vcpu->kvm;
> -	struct kvmppc_spapr_tce_table *stt;
> +	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
> +	long ret = H_TOO_HARD;
>  
> -	list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
> -		if (stt->liobn == liobn) {
> +
> +	if (stt) {
> +		ret = kvmppc_ioba_validate(stt, ioba, 1);
> +		if (!ret) {

This relies on the fact that H_SUCCESS == 0; I'm not sure if that's
something we're already doing elsewhere or not.
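
If relying on that is unwanted, the test can spell the comparison out
(sketch):

	ret = kvmppc_ioba_validate(stt, ioba, 1);
	if (ret == H_SUCCESS) {
		/* ... read the TCE as in the hunk below ... */
	}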


>  			unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
> -			struct page *page;
> -			u64 *tbl;
> -
> -			if (ioba >= stt->window_size)
> -				return H_PARAMETER;
> -
> -			page = stt->pages[idx / TCES_PER_PAGE];
> -			tbl = (u64 *)page_address(page);
> +			struct page *page = stt->pages[idx / TCES_PER_PAGE];
> +			u64 *tbl = (u64 *)page_address(page);
>  
>  			vcpu->arch.gpr[4] = tbl[idx % TCES_PER_PAGE];
> -			return H_SUCCESS;
>  		}
>  	}
>  
> -	/* Didn't find the liobn, punt it to userspace */
> -	return H_TOO_HARD;
> +
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_h_get_tce);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH kernel 4/9] KVM: PPC: Use RCU for arch.spapr_tce_tables
  2015-09-15 10:49   ` Alexey Kardashevskiy
@ 2015-12-08  2:35     ` David Gibson
  -1 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-12-08  2:35 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Paul Mackerras, Alexander Graf, kvm-ppc, kvm

[-- Attachment #1: Type: text/plain, Size: 737 bytes --]

On Tue, Sep 15, 2015 at 08:49:34PM +1000, Alexey Kardashevskiy wrote:
> At the moment spapr_tce_tables is not protected against races. This makes
> use of RCU-variants of list helpers. As some bits are executed in real
> mode, this makes use of just introduced list_for_each_entry_rcu_notrace().
> 
> This converts release_spapr_tce_table() to a RCU scheduled handler.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Looks correct, to the best of my limited knowledge of RCU.
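
For reference, the deferred-free pattern the patch relies on looks
roughly like this (a sketch reusing the names from the quoted diff):

	/* writer: unlink under the lock, free after a grace period */
	list_del_rcu(&stt->list);
	call_rcu(&stt->rcu, release_spapr_tce_table);

	/* real-mode reader: traverse without tracing */
	list_for_each_entry_rcu_notrace(stt, &kvm->arch.spapr_tce_tables, list)
		if (stt->liobn == liobn)
			return stt;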

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH kernel 5/9] KVM: PPC: Account TCE-containing pages in locked_vm
  2015-09-15 10:49   ` Alexey Kardashevskiy
@ 2015-12-08  5:18     ` David Gibson
  -1 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-12-08  5:18 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Paul Mackerras, Alexander Graf, kvm-ppc, kvm

[-- Attachment #1: Type: text/plain, Size: 3928 bytes --]

On Tue, Sep 15, 2015 at 08:49:35PM +1000, Alexey Kardashevskiy wrote:
> At the moment pages used for TCE tables (in addition to pages addressed
> by TCEs) are not counted in the locked_vm counter, so a malicious userspace
> tool can call ioctl(KVM_CREATE_SPAPR_TCE) as many times as RLIMIT_NOFILE
> allows and lock a lot of memory.
> 
> This adds counting for pages used for TCE tables.
> 
> This counts the number of pages required for a table plus pages for
> the kvmppc_spapr_tce_table struct (TCE table descriptor) itself.

Hmm.  Does it make sense to account for the descriptor struct itself?
I mean there are lots of little structures the kernel will allocate on
a process's behalf, and I don't think most of them get accounted
against locked vm.

> This does not change the amount of (de)allocated memory.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/kvm/book3s_64_vio.c | 51 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 9526c34..b70787d 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -45,13 +45,56 @@ static long kvmppc_stt_npages(unsigned long window_size)
>  		     * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
>  }
>  
> +static long kvmppc_account_memlimit(long npages, bool inc)
> +{
> +	long ret = 0;
> +	const long bytes = sizeof(struct kvmppc_spapr_tce_table) +
> +			(abs(npages) * sizeof(struct page *));
> +	const long stt_pages = ALIGN(bytes, PAGE_SIZE) / PAGE_SIZE;

Overflow checks might be useful here, I'm not sure.
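
One possible guard, purely illustrative (the bound is an assumption and
not something the patch does):

	/* refuse npages large enough to overflow the bytes computation */
	if (abs(npages) > (LONG_MAX - sizeof(struct kvmppc_spapr_tce_table)) /
			sizeof(struct page *))
		return -EINVAL;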

> +
> +	if (!current || !current->mm)
> +		return ret; /* process exited */
> +
> +	npages += stt_pages;
> +
> +	down_write(&current->mm->mmap_sem);
> +
> +	if (inc) {
> +		long locked, lock_limit;
> +
> +		locked = current->mm->locked_vm + npages;
> +		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> +		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
> +			ret = -ENOMEM;
> +		else
> +			current->mm->locked_vm += npages;
> +	} else {
> +		if (npages > current->mm->locked_vm)

Should this be a WARN_ON?  It means something has gone wrong
previously in the accounting, doesn't it?
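
That could be written as (sketch; WARN_ON_ONCE() returns the condition,
so the clamp still happens):

	if (WARN_ON_ONCE(npages > current->mm->locked_vm))
		npages = current->mm->locked_vm;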

> +			npages = current->mm->locked_vm;
> +
> +		current->mm->locked_vm -= npages;
> +	}
> +
> +	pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid,
> +			inc ? '+' : '-',
> +			npages << PAGE_SHIFT,
> +			current->mm->locked_vm << PAGE_SHIFT,
> +			rlimit(RLIMIT_MEMLOCK),
> +			ret ? " - exceeded" : "");
> +
> +	up_write(&current->mm->mmap_sem);
> +
> +	return ret;
> +}
> +
>  static void release_spapr_tce_table(struct rcu_head *head)
>  {
>  	struct kvmppc_spapr_tce_table *stt = container_of(head,
>  			struct kvmppc_spapr_tce_table, rcu);
>  	int i;
> +	long npages = kvmppc_stt_npages(stt->window_size);
>  
> -	for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
> +	for (i = 0; i < npages; i++)
>  		__free_page(stt->pages[i]);
>  
>  	kfree(stt);
> @@ -89,6 +132,7 @@ static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
>  
>  	kvm_put_kvm(stt->kvm);
>  
> +	kvmppc_account_memlimit(kvmppc_stt_npages(stt->window_size), false);
>  	call_rcu(&stt->rcu, release_spapr_tce_table);
>  
>  	return 0;
> @@ -114,6 +158,11 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  	}
>  
>  	npages = kvmppc_stt_npages(args->window_size);
> +	ret = kvmppc_account_memlimit(npages, true);
> +	if (ret) {
> +		stt = NULL;
> +		goto fail;
> +	}
>  
>  	stt = kzalloc(sizeof(*stt) + npages * sizeof(struct page *),
>  		      GFP_KERNEL);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH kernel 6/9] KVM: PPC: Replace SPAPR_TCE_SHIFT with IOMMU_PAGE_SHIFT_4K
  2015-09-15 10:49   ` Alexey Kardashevskiy
@ 2015-12-08  5:19     ` David Gibson
  -1 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-12-08  5:19 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Paul Mackerras, Alexander Graf, kvm-ppc, kvm

[-- Attachment #1: Type: text/plain, Size: 2764 bytes --]

On Tue, Sep 15, 2015 at 08:49:36PM +1000, Alexey Kardashevskiy wrote:
> SPAPR_TCE_SHIFT is used in only a few places, and since IOMMU_PAGE_SHIFT_4K
> can easily be used instead, remove SPAPR_TCE_SHIFT.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/include/asm/kvm_book3s_64.h | 2 --
>  arch/powerpc/kvm/book3s_64_vio.c         | 3 ++-
>  arch/powerpc/kvm/book3s_64_vio_hv.c      | 4 ++--
>  3 files changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
> index 2aa79c8..7529aab 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
> @@ -33,8 +33,6 @@ static inline void svcpu_put(struct kvmppc_book3s_shadow_vcpu *svcpu)
>  }
>  #endif
>  
> -#define SPAPR_TCE_SHIFT		12
> -
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>  #define KVM_DEFAULT_HPT_ORDER	24	/* 16MB HPT by default */
>  #endif
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index b70787d..e347856 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -36,12 +36,13 @@
>  #include <asm/ppc-opcode.h>
>  #include <asm/kvm_host.h>
>  #include <asm/udbg.h>
> +#include <asm/iommu.h>
>  
>  #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>  
>  static long kvmppc_stt_npages(unsigned long window_size)
>  {
> -	return ALIGN((window_size >> SPAPR_TCE_SHIFT)
> +	return ALIGN((window_size >> IOMMU_PAGE_SHIFT_4K)
>  		     * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
>  }
>  
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 8ae12ac..6cf1ab3 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -99,7 +99,7 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  	if (ret)
>  		return ret;
>  
> -	idx = ioba >> SPAPR_TCE_SHIFT;
> +	idx = ioba >> IOMMU_PAGE_SHIFT_4K;
>  	page = stt->pages[idx / TCES_PER_PAGE];
>  	tbl = (u64 *)page_address(page);
>  
> @@ -121,7 +121,7 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  	if (stt) {
>  		ret = kvmppc_ioba_validate(stt, ioba, 1);
>  		if (!ret) {
> -			unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
> +			unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
>  			struct page *page = stt->pages[idx / TCES_PER_PAGE];
>  			u64 *tbl = (u64 *)page_address(page);
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH kernel 7/9] KVM: PPC: Move reusable bits of H_PUT_TCE handler to helpers
  2015-09-15 10:49   ` Alexey Kardashevskiy
@ 2015-12-08  5:27     ` David Gibson
  -1 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-12-08  5:27 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Paul Mackerras, Alexander Graf, kvm-ppc, kvm

[-- Attachment #1: Type: text/plain, Size: 6631 bytes --]

On Tue, Sep 15, 2015 at 08:49:37PM +1000, Alexey Kardashevskiy wrote:
> Upcoming multi-tce support (H_PUT_TCE_INDIRECT/H_STUFF_TCE hypercalls)
> will validate the TCE (checking it has no unexpected bits) and the IO
> address (checking it is within the DMA window boundaries).
> 
> This introduces helpers to validate TCE and IO address.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/include/asm/kvm_ppc.h  |  4 ++
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 89 ++++++++++++++++++++++++++++++++-----
>  2 files changed, 83 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index c6ef05b..fcde896 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -166,6 +166,10 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
>  
>  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  				struct kvm_create_spapr_tce *args);
> +extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
> +		unsigned long ioba, unsigned long npages);
> +extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
> +		unsigned long tce);
>  extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  			     unsigned long ioba, unsigned long tce);
>  extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 6cf1ab3..f0fd84c 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -36,6 +36,7 @@
>  #include <asm/kvm_host.h>
>  #include <asm/udbg.h>
>  #include <asm/iommu.h>
> +#include <asm/tce.h>
>  
>  #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>  
> @@ -64,7 +65,7 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>   * WARNING: This will be called in real-mode on HV KVM and virtual
>   *          mode on PR KVM
>   */
> -static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
> +long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
>  		unsigned long ioba, unsigned long npages)
>  {
>  	unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
> @@ -76,6 +77,79 @@ static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
>  
>  	return H_SUCCESS;
>  }
> +EXPORT_SYMBOL_GPL(kvmppc_ioba_validate);

Why does it need to be exported - the new users will still be in the
KVM module, won't they?

> +
> +/*
> + * Validates TCE address.
> + * At the moment flags and page mask are validated.
> + * As the host kernel does not access those addresses (just puts them
> + * to the table and user space is supposed to process them), we can skip
> + * checking other things (such as TCE is a guest RAM address or the page
> + * was actually allocated).

Hmm.  These comments apply given that the only current user of this is
the kvm acceleration of userspace TCE tables, but the name suggests it
would validate any TCE, including in kernel ones for which this would
be unsafe.

> + * WARNING: This will be called in real-mode on HV KVM and virtual
> + *          mode on PR KVM
> + */
> +long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, unsigned long tce)
> +{
> +	unsigned long mask = ((1ULL << IOMMU_PAGE_SHIFT_4K) - 1) &
> +			~(TCE_PCI_WRITE | TCE_PCI_READ);
> +
> +	if (tce & mask)
> +		return H_PARAMETER;
> +
> +	return H_SUCCESS;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_tce_validate);
> +
> +/* Note on the use of page_address() in real mode,
> + *
> + * It is safe to use page_address() in real mode on ppc64 because
> + * page_address() is always defined as lowmem_page_address()
> + * which returns __va(PFN_PHYS(page_to_pfn(page))), an arithmetic
> + * operation that does not access the page struct.
> + *
> + * Theoretically page_address() could be defined differently, but
> + * then either WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL would have
> + * to be enabled.
> + * WANT_PAGE_VIRTUAL is never enabled on ppc32/ppc64,
> + * HASHED_PAGE_VIRTUAL could be enabled for ppc32 only and only
> + * if CONFIG_HIGHMEM is defined. As CONFIG_SPARSEMEM_VMEMMAP
> + * is not expected to be enabled on ppc32, page_address()
> + * is safe for ppc32 as well.
> + *
> + * WARNING: This will be called in real-mode on HV KVM and virtual
> + *          mode on PR KVM
> + */
> +static u64 *kvmppc_page_address(struct page *page)
> +{
> +#if defined(HASHED_PAGE_VIRTUAL) || defined(WANT_PAGE_VIRTUAL)
> +#error TODO: fix to avoid page_address() here
> +#endif
> +	return (u64 *) page_address(page);
> +}
> +
> +/*
> + * Handles TCE requests for emulated devices.
> + * Puts guest TCE values to the table and expects user space to convert them.
> + * Called in both real and virtual modes.
> + * Cannot fail so kvmppc_tce_validate must be called before it.
> + *
> + * WARNING: This will be called in real-mode on HV KVM and virtual
> + *          mode on PR KVM
> + */
> +void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
> +		unsigned long idx, unsigned long tce)
> +{
> +	struct page *page;
> +	u64 *tbl;
> +
> +	page = stt->pages[idx / TCES_PER_PAGE];
> +	tbl = kvmppc_page_address(page);
> +
> +	tbl[idx % TCES_PER_PAGE] = tce;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_tce_put);
>  
>  /* WARNING: This will be called in real-mode on HV KVM and virtual
>   *          mode on PR KVM
> @@ -85,9 +159,6 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  {
>  	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
>  	long ret = H_TOO_HARD;
> -	unsigned long idx;
> -	struct page *page;
> -	u64 *tbl;
>  
>  	/* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
>  	/* 	    liobn, ioba, tce); */
> @@ -99,13 +170,11 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  	if (ret)
>  		return ret;
>  
> -	idx = ioba >> IOMMU_PAGE_SHIFT_4K;
> -	page = stt->pages[idx / TCES_PER_PAGE];
> -	tbl = (u64 *)page_address(page);
> +	ret = kvmppc_tce_validate(stt, tce);
> +	if (ret)
> +		return ret;
>  
> -	/* FIXME: Need to validate the TCE itself */
> -	/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
> -	tbl[idx % TCES_PER_PAGE] = tce;
> +	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
>  
>  	return ret;
>  }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH kernel 8/9] KVM: Fix KVM_SMI chapter number
  2015-09-15 10:49   ` Alexey Kardashevskiy
@ 2015-12-08  5:29     ` David Gibson
  -1 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-12-08  5:29 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Paul Mackerras, Alexander Graf, kvm-ppc, kvm

[-- Attachment #1: Type: text/plain, Size: 540 bytes --]


On Tue, Sep 15, 2015 at 08:49:38PM +1000, Alexey Kardashevskiy wrote:
> The KVM_SMI capability follows the KVM_S390_SET_IRQ_STATE capability,
> which is "4.95", so this changes the number of the KVM_SMI chapter to 4.96.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH kernel 9/9] KVM: PPC: Add support for multiple-TCE hcalls
  2015-09-15 10:49   ` Alexey Kardashevskiy
@ 2015-12-08  5:48     ` David Gibson
  -1 siblings, 0 replies; 46+ messages in thread
From: David Gibson @ 2015-12-08  5:48 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Paul Mackerras, Alexander Graf, kvm-ppc, kvm

[-- Attachment #1: Type: text/plain, Size: 19242 bytes --]

On Tue, Sep 15, 2015 at 08:49:39PM +1000, Alexey Kardashevskiy wrote:
> This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
> H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
> devices or emulated PCI.  These calls allow adding multiple entries
> (up to 512) into the TCE table in one call, which saves time on
> transitions between kernel and user space.
> 
> This implements the KVM_CAP_PPC_MULTITCE capability. When present,
> the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE.
> If they cannot be handled by the kernel, they are passed on to
> user space. User space still has to have an implementation
> for these.
> 
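As an illustration, user space can probe for this with the standard
KVM_CHECK_EXTENSION ioctl before advertising "hcall-multi-tce" to the
guest (a sketch; vm_fd being an open VM descriptor is assumed):

	/* a positive return means the kernel handles the multi-TCE hcalls */
	int have_multitce = ioctl(vm_fd, KVM_CHECK_EXTENSION,
				  KVM_CAP_PPC_MULTITCE);
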
> Both HV and PR-style KVM are supported.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  Documentation/virtual/kvm/api.txt       |  25 ++++++
>  arch/powerpc/include/asm/kvm_ppc.h      |  12 +++
>  arch/powerpc/kvm/book3s_64_vio.c        | 111 +++++++++++++++++++++++-
>  arch/powerpc/kvm/book3s_64_vio_hv.c     | 145 ++++++++++++++++++++++++++++++--
>  arch/powerpc/kvm/book3s_hv.c            |  26 +++++-
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   6 +-
>  arch/powerpc/kvm/book3s_pr_papr.c       |  35 ++++++++
>  arch/powerpc/kvm/powerpc.c              |   3 +
>  8 files changed, 350 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index d86d831..593c62a 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3019,6 +3019,31 @@ Returns: 0 on success, -1 on error
>  
>  Queues an SMI on the thread's vcpu.
>  
> +4.97 KVM_CAP_PPC_MULTITCE
> +
> +Capability: KVM_CAP_PPC_MULTITCE
> +Architectures: ppc
> +Type: vm
> +
> +This capability means the kernel is capable of handling hypercalls
> +H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing them on to user
> +space. This significantly accelerates DMA operations for PPC KVM guests.
> +User space should expect that its handlers for these hypercalls
> +are not going to be called if user space previously registered LIOBN
> +in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
> +
> +In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
> +user space might have to advertise them to the guest. For example,
> +an IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
> +present in the "ibm,hypertas-functions" device-tree property.
> +
> +The hypercalls mentioned above may or may not be processed successfully
> +in the kernel-based fast path. If they cannot be handled by the kernel,
> +they will get passed on to user space. So user space still has to have
> +an implementation for these despite the in kernel acceleration.
> +
> +This capability is always enabled.
> +
>  5. The kvm_run structure
>  ------------------------
>  
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index fcde896..e5b968e 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -166,12 +166,24 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
>  
>  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  				struct kvm_create_spapr_tce *args);
> +extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
> +		struct kvm_vcpu *vcpu, unsigned long liobn);
>  extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
>  		unsigned long ioba, unsigned long npages);
>  extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
>  		unsigned long tce);
> +extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
> +		unsigned long *ua, unsigned long **prmap);
> +extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
> +		unsigned long idx, unsigned long tce);
>  extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  			     unsigned long ioba, unsigned long tce);
> +extern long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_list, unsigned long npages);
> +extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_value, unsigned long npages);
>  extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  			     unsigned long ioba);
>  extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index e347856..d3fc732 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -14,6 +14,7 @@
>   *
>   * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>   * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
> + * Copyright 2013 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>   */
>  
>  #include <linux/types.h>
> @@ -37,8 +38,7 @@
>  #include <asm/kvm_host.h>
>  #include <asm/udbg.h>
>  #include <asm/iommu.h>
> -
> -#define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
> +#include <asm/tce.h>
>  
>  static long kvmppc_stt_npages(unsigned long window_size)
>  {
> @@ -200,3 +200,110 @@ fail:
>  	}
>  	return ret;
>  }
> +
> +long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce)
> +{
> +	long ret;
> +	struct kvmppc_spapr_tce_table *stt;
> +
> +	stt = kvmppc_find_table(vcpu, liobn);
> +	if (!stt)
> +		return H_TOO_HARD;
> +
> +	ret = kvmppc_ioba_validate(stt, ioba, 1);
> +	if (ret)
> +		return ret;
> +
> +	ret = kvmppc_tce_validate(stt, tce);
> +	if (ret)
> +		return ret;
> +
> +	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
> +
> +	return H_SUCCESS;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
> +
> +long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_list, unsigned long npages)
> +{
> +	struct kvmppc_spapr_tce_table *stt;
> +	long i, ret = H_SUCCESS, idx;
> +	unsigned long entry, ua = 0;
> +	u64 __user *tces, tce;
> +
> +	stt = kvmppc_find_table(vcpu, liobn);
> +	if (!stt)
> +		return H_TOO_HARD;
> +
> +	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
> +	/*
> +	 * The sPAPR spec says that the maximum size of the list is 512 TCEs,
> +	 * so the whole table fits in a single 4K page
> +	 */
> +	if (npages > 512)
> +		return H_PARAMETER;
> +
> +	if (tce_list & ~IOMMU_PAGE_MASK_4K)

IOMMU_PAGE_MASK_4K doesn't seem like the right thing here.  It is 4k,
but that restriction is derived from the smallest possible main memory
page size, rather than from anything to do with the IOMMU page size.
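
A sketch of the alternative being suggested, tying the alignment to the
4K list itself rather than to the IOMMU page size (SZ_4K is from
linux/sizes.h):

	/* a 512-entry list occupies exactly 4K, so require 4K alignment */
	if (tce_list & (SZ_4K - 1))
		return H_PARAMETER;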

> +		return H_PARAMETER;
> +
> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
> +	if (ret)
> +		return ret;
> +
> +	idx = srcu_read_lock(&vcpu->kvm->srcu);
> +	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL)) {
> +		ret = H_TOO_HARD;
> +		goto unlock_exit;
> +	}
> +	tces = (u64 *) ua;

The u64 * should have a usermem sparse annotation, no?
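
That is, something like (sketch):

	tces = (u64 __user *) ua;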

> +	for (i = 0; i < npages; ++i) {
> +		if (get_user(tce, tces + i)) {
> +			ret = H_PARAMETER;
> +			goto unlock_exit;
> +		}
> +		tce = be64_to_cpu(tce);
> +		ret = kvmppc_tce_validate(stt, tce);
> +		if (ret)
> +			goto unlock_exit;
> +
> +		kvmppc_tce_put(stt, entry + i, tce);
> +	}
> +
> +unlock_exit:
> +	srcu_read_unlock(&vcpu->kvm->srcu, idx);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_h_put_tce_indirect);
> +
> +long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_value, unsigned long npages)
> +{
> +	struct kvmppc_spapr_tce_table *stt;
> +	long i, ret;
> +
> +	stt = kvmppc_find_table(vcpu, liobn);
> +	if (!stt)
> +		return H_TOO_HARD;
> +
> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
> +	if (ret)
> +		return ret;
> +
> +	ret = kvmppc_tce_validate(stt, tce_value);
> +	if (ret || (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ)))
> +		return H_PARAMETER;
> +
> +	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
> +		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
> +
> +	return H_SUCCESS;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index f0fd84c..bca7b12 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -14,6 +14,7 @@
>   *
>   * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>   * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
> + * Copyright 2013 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>   */
>  
>  #include <linux/types.h>
> @@ -30,6 +31,7 @@
>  #include <asm/kvm_ppc.h>
>  #include <asm/kvm_book3s.h>
>  #include <asm/mmu-hash64.h>
> +#include <asm/mmu_context.h>
>  #include <asm/hvcall.h>
>  #include <asm/synch.h>
>  #include <asm/ppc-opcode.h>
> @@ -37,6 +39,7 @@
>  #include <asm/udbg.h>
>  #include <asm/iommu.h>
>  #include <asm/tce.h>
> +#include <asm/iommu.h>
>  
>  #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>  
> @@ -46,7 +49,7 @@
>   * WARNING: This will be called in real or virtual mode on HV KVM and virtual
>   *          mode on PR KVM
>   */
> -static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
> +struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>  		unsigned long liobn)
>  {
>  	struct kvm *kvm = vcpu->kvm;
> @@ -58,6 +61,7 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>  
>  	return NULL;
>  }
> +EXPORT_SYMBOL_GPL(kvmppc_find_table);
>  
>  /*
>   * Validates IO address.
> @@ -151,11 +155,32 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_tce_put);
>  
> -/* WARNING: This will be called in real-mode on HV KVM and virtual
> - *          mode on PR KVM
> - */
> -long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> -		      unsigned long ioba, unsigned long tce)
> +long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
> +		unsigned long *ua, unsigned long **prmap)

I'm kind of surprised there isn't already a function to do this somewhere.

> +{
> +	unsigned long gfn = gpa >> PAGE_SHIFT;
> +	struct kvm_memory_slot *memslot;
> +
> +	memslot = search_memslots(kvm_memslots(kvm), gfn);
> +	if (!memslot)
> +		return -EINVAL;
> +
> +	*ua = __gfn_to_hva_memslot(memslot, gfn) |
> +		(gpa & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
> +
> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +	if (prmap)
> +		*prmap = real_vmalloc_addr(&memslot->arch.rmap[
> +				gfn - memslot->base_gfn]);
> +#endif
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
> +
> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> +		unsigned long ioba, unsigned long tce)
>  {
>  	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
>  	long ret = H_TOO_HARD;
> @@ -178,7 +203,111 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  
>  	return ret;
>  }
> -EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
> +
> +static long kvmppc_rm_ua_to_hpa(struct kvm_vcpu *vcpu,
> +		unsigned long ua, unsigned long *phpa)
> +{
> +	pte_t *ptep, pte;
> +	unsigned shift = 0;
> +
> +	ptep = __find_linux_pte_or_hugepte(vcpu->arch.pgdir, ua, &shift);
> +	if (!ptep || !pte_present(*ptep))
> +		return -ENXIO;
> +	pte = *ptep;
> +
> +	if (!shift)
> +		shift = PAGE_SHIFT;
> +
> +	/* Avoid handling anything potentially complicated in realmode */
> +	if (shift > PAGE_SHIFT)
> +		return -EAGAIN;
> +
> +	if (!pte_young(pte))
> +		return -EAGAIN;

Does it also need to be dirty, since you might be writing to this page?

> +	*phpa = (pte_pfn(pte) << PAGE_SHIFT) | (ua & ((1ULL << shift) - 1)) |
> +			(ua & ~PAGE_MASK);
> +
> +	return 0;
> +}
> +
> +long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_list,	unsigned long npages)
> +{
> +	struct kvmppc_spapr_tce_table *stt;
> +	long i, ret = H_SUCCESS;
> +	unsigned long tces, entry, ua = 0;
> +	unsigned long *rmap = NULL;
> +
> +	stt = kvmppc_find_table(vcpu, liobn);
> +	if (!stt)
> +		return H_TOO_HARD;
> +
> +	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
> +	/*
> +	 * The spec says that the maximum size of the list is 512 TCEs
> +	 * so the whole table addressed resides within a single 4K page
> +	 */
> +	if (npages > 512)
> +		return H_PARAMETER;
> +
> +	if (tce_list & ~IOMMU_PAGE_MASK_4K)
> +		return H_PARAMETER;
> +
> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
> +	if (ret)
> +		return ret;
> +
> +	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
> +		return H_TOO_HARD;
> +
> +	lock_rmap(rmap);
> +	if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
> +		ret = H_TOO_HARD;
> +		goto unlock_exit;
> +	}
> +
> +	for (i = 0; i < npages; ++i) {
> +		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
> +
> +		ret = kvmppc_tce_validate(stt, tce);
> +		if (ret)
> +			goto unlock_exit;
> +
> +		kvmppc_tce_put(stt, entry + i, tce);
> +	}
> +
> +unlock_exit:
> +	unlock_rmap(rmap);
> +
> +	return ret;
> +}
> +
> +long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_value, unsigned long npages)
> +{
> +	struct kvmppc_spapr_tce_table *stt;
> +	long i, ret;
> +
> +	stt = kvmppc_find_table(vcpu, liobn);
> +	if (!stt)
> +		return H_TOO_HARD;
> +
> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
> +	if (ret)
> +		return ret;
> +
> +	ret = kvmppc_tce_validate(stt, tce_value);
> +	if (ret || (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ)))
> +		return H_PARAMETER;
> +
> +	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
> +		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
> +
> +	return H_SUCCESS;
> +}
>  
>  long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  		      unsigned long ioba)
> @@ -202,3 +331,5 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_h_get_tce);
> +
> +#endif /* KVM_BOOK3S_HV_POSSIBLE */
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index c5edf17..408b1b1 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -775,7 +775,31 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>  		if (kvmppc_xics_enabled(vcpu)) {
>  			ret = kvmppc_xics_hcall(vcpu, req);
>  			break;
> -		} /* fallthrough */
> +		}
> +		return RESUME_HOST;
> +	case H_PUT_TCE:
> +		ret = kvmppc_h_put_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
> +						kvmppc_get_gpr(vcpu, 5),
> +						kvmppc_get_gpr(vcpu, 6));
> +		if (ret == H_TOO_HARD)
> +			return RESUME_HOST;
> +		break;
> +	case H_PUT_TCE_INDIRECT:
> +		ret = kvmppc_h_put_tce_indirect(vcpu, kvmppc_get_gpr(vcpu, 4),
> +						kvmppc_get_gpr(vcpu, 5),
> +						kvmppc_get_gpr(vcpu, 6),
> +						kvmppc_get_gpr(vcpu, 7));
> +		if (ret == H_TOO_HARD)
> +			return RESUME_HOST;
> +		break;
> +	case H_STUFF_TCE:
> +		ret = kvmppc_h_stuff_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
> +						kvmppc_get_gpr(vcpu, 5),
> +						kvmppc_get_gpr(vcpu, 6),
> +						kvmppc_get_gpr(vcpu, 7));
> +		if (ret == H_TOO_HARD)
> +			return RESUME_HOST;
> +		break;
>  	default:
>  		return RESUME_HOST;
>  	}
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 2273dca..fd1997c 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -1917,7 +1917,7 @@ hcall_real_table:
>  	.long	DOTSYM(kvmppc_h_clear_ref) - hcall_real_table
>  	.long	DOTSYM(kvmppc_h_protect) - hcall_real_table
>  	.long	DOTSYM(kvmppc_h_get_tce) - hcall_real_table
> -	.long	DOTSYM(kvmppc_h_put_tce) - hcall_real_table
> +	.long	DOTSYM(kvmppc_rm_h_put_tce) - hcall_real_table
>  	.long	0		/* 0x24 - H_SET_SPRG0 */
>  	.long	DOTSYM(kvmppc_h_set_dabr) - hcall_real_table
>  	.long	0		/* 0x2c */
> @@ -1995,8 +1995,8 @@ hcall_real_table:
>  	.long	0		/* 0x12c */
>  	.long	0		/* 0x130 */
>  	.long	DOTSYM(kvmppc_h_set_xdabr) - hcall_real_table
> -	.long	0		/* 0x138 */
> -	.long	0		/* 0x13c */
> +	.long	DOTSYM(kvmppc_rm_h_stuff_tce) - hcall_real_table
> +	.long	DOTSYM(kvmppc_rm_h_put_tce_indirect) - hcall_real_table
>  	.long	0		/* 0x140 */
>  	.long	0		/* 0x144 */
>  	.long	0		/* 0x148 */
> diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
> index f2c75a1..02176fd 100644
> --- a/arch/powerpc/kvm/book3s_pr_papr.c
> +++ b/arch/powerpc/kvm/book3s_pr_papr.c
> @@ -280,6 +280,37 @@ static int kvmppc_h_pr_logical_ci_store(struct kvm_vcpu *vcpu)
>  	return EMULATE_DONE;
>  }
>  
> +static int kvmppc_h_pr_put_tce_indirect(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
> +	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
> +	unsigned long tce = kvmppc_get_gpr(vcpu, 6);
> +	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
> +	long rc;
> +
> +	rc = kvmppc_h_put_tce_indirect(vcpu, liobn, ioba,
> +			tce, npages);
> +	if (rc == H_TOO_HARD)
> +		return EMULATE_FAIL;
> +	kvmppc_set_gpr(vcpu, 3, rc);
> +	return EMULATE_DONE;
> +}
> +
> +static int kvmppc_h_pr_stuff_tce(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
> +	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
> +	unsigned long tce_value = kvmppc_get_gpr(vcpu, 6);
> +	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
> +	long rc;
> +
> +	rc = kvmppc_h_stuff_tce(vcpu, liobn, ioba, tce_value, npages);
> +	if (rc == H_TOO_HARD)
> +		return EMULATE_FAIL;
> +	kvmppc_set_gpr(vcpu, 3, rc);
> +	return EMULATE_DONE;
> +}
> +
>  static int kvmppc_h_pr_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
>  {
>  	long rc = kvmppc_xics_hcall(vcpu, cmd);
> @@ -306,6 +337,10 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
>  		return kvmppc_h_pr_bulk_remove(vcpu);
>  	case H_PUT_TCE:
>  		return kvmppc_h_pr_put_tce(vcpu);
> +	case H_PUT_TCE_INDIRECT:
> +		return kvmppc_h_pr_put_tce_indirect(vcpu);
> +	case H_STUFF_TCE:
> +		return kvmppc_h_pr_stuff_tce(vcpu);
>  	case H_CEDE:
>  		kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
>  		kvm_vcpu_block(vcpu);
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 2e51289..c7c2802 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -566,6 +566,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_PPC_GET_SMMU_INFO:
>  		r = 1;
>  		break;
> +	case KVM_CAP_SPAPR_MULTITCE:
> +		r = 1;
> +		break;
>  #endif
>  	default:
>  		r = 0;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH kernel 7/9] KVM: PPC: Move reusable bits of H_PUT_TCE handler to helpers
  2015-12-08  5:27     ` David Gibson
@ 2015-12-22  7:24       ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 46+ messages in thread
From: Alexey Kardashevskiy @ 2015-12-22  7:24 UTC (permalink / raw)
  To: David Gibson; +Cc: linuxppc-dev, Paul Mackerras, Alexander Graf, kvm-ppc, kvm

On 12/08/2015 04:27 PM, David Gibson wrote:
> On Tue, Sep 15, 2015 at 08:49:37PM +1000, Alexey Kardashevskiy wrote:
>> Upcoming multi-tce support (H_PUT_TCE_INDIRECT/H_STUFF_TCE hypercalls)
>> will validate TCE (not to have unexpected bits) and IO address
>> (to be within the DMA window boundaries).
>>
>> This introduces helpers to validate TCE and IO address.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>   arch/powerpc/include/asm/kvm_ppc.h  |  4 ++
>>   arch/powerpc/kvm/book3s_64_vio_hv.c | 89 ++++++++++++++++++++++++++++++++-----
>>   2 files changed, 83 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
>> index c6ef05b..fcde896 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -166,6 +166,10 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
>>
>>   extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>>   				struct kvm_create_spapr_tce *args);
>> +extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
>> +		unsigned long ioba, unsigned long npages);
>> +extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
>> +		unsigned long tce);
>>   extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>   			     unsigned long ioba, unsigned long tce);
>>   extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> index 6cf1ab3..f0fd84c 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> @@ -36,6 +36,7 @@
>>   #include <asm/kvm_host.h>
>>   #include <asm/udbg.h>
>>   #include <asm/iommu.h>
>> +#include <asm/tce.h>
>>
>>   #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>>
>> @@ -64,7 +65,7 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>>    * WARNING: This will be called in real-mode on HV KVM and virtual
>>    *          mode on PR KVM
>>    */
>> -static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
>> +long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
>>   		unsigned long ioba, unsigned long npages)
>>   {
>>   	unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
>> @@ -76,6 +77,79 @@ static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
>>
>>   	return H_SUCCESS;
>>   }
>> +EXPORT_SYMBOL_GPL(kvmppc_ioba_validate);
>
> Why does it need to be exported - the new users will still be in the
> KVM module, won't they?


book3s_64_vio_hv.c contains real-mode code and is always compiled into
vmlinux; the helper is meant to be called from book3s_64_vio.c, which may
be compiled as a module.
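
(Roughly, the split being described -- a sketch of the layout, not the
actual tree:

	/* in book3s_64_vio_hv.c, which is always built into vmlinux: */
	long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
			unsigned long ioba, unsigned long npages)
	{
		/* ... range checks as in the patch ... */
		return H_SUCCESS;
	}
	/* needed because book3s_64_vio.c may live in the kvm module: */
	EXPORT_SYMBOL_GPL(kvmppc_ioba_validate);

so the export crosses a possible vmlinux-to-module boundary rather than
being aimed at third-party modules.)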


>
>> +
>> +/*
>> + * Validates TCE address.
>> + * At the moment flags and page mask are validated.
>> + * As the host kernel does not access those addresses (just puts them
>> + * to the table and user space is supposed to process them), we can skip
>> + * checking other things (such as TCE is a guest RAM address or the page
>> + * was actually allocated).
>
> Hmm.  These comments apply given that the only current user of this is
> the kvm acceleration of userspace TCE tables, but the name suggests it
> would validate any TCE, including in kernel ones for which this would
> be unsafe.


The function has the "kvmppc_" prefix and the file resides in
arch/powerpc/kvm, so to my taste it is self-explanatory that it only
handles TCEs from KVM guests (not even from a random user-space tool), no?
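
(For concreteness, the mask computed by kvmppc_tce_validate() in the hunk
below works out to 0xFFC, assuming the usual asm/tce.h values
TCE_PCI_READ = 0x1, TCE_PCI_WRITE = 0x2 and IOMMU_PAGE_SHIFT_4K = 12.
A standalone sketch, not kernel code:

	#include <stdio.h>

	#define TCE_PCI_READ	0x1UL
	#define TCE_PCI_WRITE	0x2UL
	#define SHIFT_4K	12

	/* mask = 0xFFF & ~0x3 = 0xFFC: bits 2..11 must be clear */
	static int tce_ok(unsigned long tce)
	{
		unsigned long mask = ((1UL << SHIFT_4K) - 1) &
				~(TCE_PCI_WRITE | TCE_PCI_READ);
		return (tce & mask) == 0;
	}

	int main(void)
	{
		printf("%d\n", tce_ok(0x1003)); /* 1: aligned RPN + R/W bits */
		printf("%d\n", tce_ok(0x1080)); /* 0: bits 2..11 set */
		return 0;
	}

i.e. a TCE passes iff it is a 4K-aligned address carrying at most the two
permission bits.)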



>> + * WARNING: This will be called in real-mode on HV KVM and virtual
>> + *          mode on PR KVM
>> + */
>> +long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, unsigned long tce)
>> +{
>> +	unsigned long mask = ((1ULL << IOMMU_PAGE_SHIFT_4K) - 1) &
>> +			~(TCE_PCI_WRITE | TCE_PCI_READ);
>> +
>> +	if (tce & mask)
>> +		return H_PARAMETER;
>> +
>> +	return H_SUCCESS;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_tce_validate);
>> +
>> +/* Note on the use of page_address() in real mode,
>> + *
>> + * It is safe to use page_address() in real mode on ppc64 because
>> + * page_address() is always defined as lowmem_page_address()
>> + * which returns __va(PFN_PHYS(page_to_pfn(page))) which is arithmetial
>> + * operation and does not access page struct.
>> + *
>> + * Theoretically page_address() could be defined different
>> + * but either WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL
>> + * should be enabled.
>> + * WANT_PAGE_VIRTUAL is never enabled on ppc32/ppc64,
>> + * HASHED_PAGE_VIRTUAL could be enabled for ppc32 only and only
>> + * if CONFIG_HIGHMEM is defined. As CONFIG_SPARSEMEM_VMEMMAP
>> + * is not expected to be enabled on ppc32, page_address()
>> + * is safe for ppc32 as well.
>> + *
>> + * WARNING: This will be called in real-mode on HV KVM and virtual
>> + *          mode on PR KVM
>> + */
>> +static u64 *kvmppc_page_address(struct page *page)
>> +{
>> +#if defined(HASHED_PAGE_VIRTUAL) || defined(WANT_PAGE_VIRTUAL)
>> +#error TODO: fix to avoid page_address() here
>> +#endif
>> +	return (u64 *) page_address(page);
>> +}
>> +
>> +/*
>> + * Handles TCE requests for emulated devices.
>> + * Puts guest TCE values to the table and expects user space to convert them.
>> + * Called in both real and virtual modes.
>> + * Cannot fail so kvmppc_tce_validate must be called before it.
>> + *
>> + * WARNING: This will be called in real-mode on HV KVM and virtual
>> + *          mode on PR KVM
>> + */
>> +void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
>> +		unsigned long idx, unsigned long tce)
>> +{
>> +	struct page *page;
>> +	u64 *tbl;
>> +
>> +	page = stt->pages[idx / TCES_PER_PAGE];
>> +	tbl = kvmppc_page_address(page);
>> +
>> +	tbl[idx % TCES_PER_PAGE] = tce;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_tce_put);
>>
>>   /* WARNING: This will be called in real-mode on HV KVM and virtual
>>    *          mode on PR KVM
>> @@ -85,9 +159,6 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>   {
>>   	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
>>   	long ret = H_TOO_HARD;
>> -	unsigned long idx;
>> -	struct page *page;
>> -	u64 *tbl;
>>
>>   	/* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
>>   	/* 	    liobn, ioba, tce); */
>> @@ -99,13 +170,11 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>   	if (ret)
>>   		return ret;
>>
>> -	idx = ioba >> IOMMU_PAGE_SHIFT_4K;
>> -	page = stt->pages[idx / TCES_PER_PAGE];
>> -	tbl = (u64 *)page_address(page);
>> +	ret = kvmppc_tce_validate(stt, tce);
>> +	if (ret)
>> +		return ret;
>>
>> -	/* FIXME: Need to validate the TCE itself */
>> -	/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
>> -	tbl[idx % TCES_PER_PAGE] = tce;
>> +	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
>>
>>   	return ret;
>>   }
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH kernel 9/9] KVM: PPC: Add support for multiple-TCE hcalls
  2015-12-08  5:48     ` David Gibson
@ 2015-12-22  7:42       ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 46+ messages in thread
From: Alexey Kardashevskiy @ 2015-12-22  7:42 UTC (permalink / raw)
  To: David Gibson; +Cc: linuxppc-dev, Paul Mackerras, Alexander Graf, kvm-ppc, kvm

On 12/08/2015 04:48 PM, David Gibson wrote:
> On Tue, Sep 15, 2015 at 08:49:39PM +1000, Alexey Kardashevskiy wrote:
>> This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
>> H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
>> devices or emulated PCI.  These calls allow adding multiple entries
>> (up to 512) into the TCE table in one call which saves time on
>> transition between kernel and user space.
>>
>> This implements the KVM_CAP_PPC_MULTITCE capability. When present,
>> the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE.
>> If they can not be handled by the kernel, they are passed on to
>> the user space. The user space still has to have an implementation
>> for these.
>>
>> Both HV and PR-style KVM are supported.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>   Documentation/virtual/kvm/api.txt       |  25 ++++++
>>   arch/powerpc/include/asm/kvm_ppc.h      |  12 +++
>>   arch/powerpc/kvm/book3s_64_vio.c        | 111 +++++++++++++++++++++++-
>>   arch/powerpc/kvm/book3s_64_vio_hv.c     | 145 ++++++++++++++++++++++++++++++--
>>   arch/powerpc/kvm/book3s_hv.c            |  26 +++++-
>>   arch/powerpc/kvm/book3s_hv_rmhandlers.S |   6 +-
>>   arch/powerpc/kvm/book3s_pr_papr.c       |  35 ++++++++
>>   arch/powerpc/kvm/powerpc.c              |   3 +
>>   8 files changed, 350 insertions(+), 13 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index d86d831..593c62a 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -3019,6 +3019,31 @@ Returns: 0 on success, -1 on error
>>
>>   Queues an SMI on the thread's vcpu.
>>
>> +4.97 KVM_CAP_PPC_MULTITCE
>> +
>> +Capability: KVM_CAP_PPC_MULTITCE
>> +Architectures: ppc
>> +Type: vm
>> +
>> +This capability means the kernel is capable of handling hypercalls
>> +H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user
>> +space. This significantly accelerates DMA operations for PPC KVM guests.
>> +User space should expect that its handlers for these hypercalls
>> +are not going to be called if user space previously registered LIOBN
>> +in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
>> +
>> +In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
>> +user space might have to advertise it for the guest. For example,
>> +an IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
>> +present in the "ibm,hypertas-functions" device-tree property.
>> +
>> +The hypercalls mentioned above may or may not be processed successfully
>> +in the kernel-based fast path. If they cannot be handled by the kernel,
>> +they will get passed on to user space. So user space still has to have
>> +an implementation for these despite the in-kernel acceleration.
>> +
>> +This capability is always enabled.
>> +
>>   5. The kvm_run structure
>>   ------------------------
>>
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
>> index fcde896..e5b968e 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -166,12 +166,24 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
>>
>>   extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>>   				struct kvm_create_spapr_tce *args);
>> +extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
>> +		struct kvm_vcpu *vcpu, unsigned long liobn);
>>   extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
>>   		unsigned long ioba, unsigned long npages);
>>   extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
>>   		unsigned long tce);
>> +extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
>> +		unsigned long *ua, unsigned long **prmap);
>> +extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
>> +		unsigned long idx, unsigned long tce);
>>   extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>   			     unsigned long ioba, unsigned long tce);
>> +extern long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_list, unsigned long npages);
>> +extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_value, unsigned long npages);
>>   extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>   			     unsigned long ioba);
>>   extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index e347856..d3fc732 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -14,6 +14,7 @@
>>    *
>>    * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>>    * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
>> + * Copyright 2013 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>>    */
>>
>>   #include <linux/types.h>
>> @@ -37,8 +38,7 @@
>>   #include <asm/kvm_host.h>
>>   #include <asm/udbg.h>
>>   #include <asm/iommu.h>
>> -
>> -#define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>> +#include <asm/tce.h>
>>
>>   static long kvmppc_stt_npages(unsigned long window_size)
>>   {
>> @@ -200,3 +200,110 @@ fail:
>>   	}
>>   	return ret;
>>   }
>> +
>> +long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce)
>> +{
>> +	long ret;
>> +	struct kvmppc_spapr_tce_table *stt;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, 1);
>> +	if (ret)
>> +		return ret;
>> +
>> +	ret = kvmppc_tce_validate(stt, tce);
>> +	if (ret)
>> +		return ret;
>> +
>> +	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
>> +
>> +	return H_SUCCESS;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
>> +
>> +long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_list, unsigned long npages)
>> +{
>> +	struct kvmppc_spapr_tce_table *stt;
>> +	long i, ret = H_SUCCESS, idx;
>> +	unsigned long entry, ua = 0;
>> +	u64 __user *tces, tce;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
>> +	/*
>> +	 * The SPAPR spec says that the maximum size of the list is 512 TCEs
>> +	 * so the whole table fits in a single 4K page
>> +	 */
>> +	if (npages > 512)
>> +		return H_PARAMETER;
>> +
>> +	if (tce_list & ~IOMMU_PAGE_MASK_4K)
>
> IOMMU_PAGE_MASK_4K doesn't seem like the right thing here.  It is 4k,
> but that restriction is derived from the smallest possible main memory
> page size, rather than from anything to do with the IOMMU page size.


Ok, I'll make it SZ_4K then.
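
(For illustration, the check would then read something like this -- SZ_4K
being the linux/sizes.h constant, and this sketch showing the intent
rather than quoting the final patch:

	if (tce_list & (SZ_4K - 1))
		return H_PARAMETER;

i.e. the list must be aligned to the smallest possible host page size,
independent of the IOMMU page size.)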


>
>> +		return H_PARAMETER;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
>> +	if (ret)
>> +		return ret;
>> +
>> +	idx = srcu_read_lock(&vcpu->kvm->srcu);
>> +	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL)) {
>> +		ret = H_TOO_HARD;
>> +		goto unlock_exit;
>> +	}
>> +	tces = (u64 *) ua;
>
> The u64 * should have a usermem sparse annotation, no?


Like this?

tces = (u64 __user *) ua;
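
(That keeps the __user address space through the cast -- sparse via
"make C=1" would otherwise warn about mixing address spaces, since both
the declaration and get_user() expect the annotation:

	u64 __user *tces, tce;
	/* ... */
	tces = (u64 __user *) ua;
	if (get_user(tce, tces + i))
		ret = H_PARAMETER;

The cast is the only change needed; the declaration in the patch already
carries __user.)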


>
>> +	for (i = 0; i < npages; ++i) {
>> +		if (get_user(tce, tces + i)) {
>> +			ret = H_PARAMETER;
>> +			goto unlock_exit;
>> +		}
>> +		tce = be64_to_cpu(tce);
>> +		ret = kvmppc_tce_validate(stt, tce);
>> +		if (ret)
>> +			goto unlock_exit;
>> +
>> +		kvmppc_tce_put(stt, entry + i, tce);
>> +	}
>> +
>> +unlock_exit:
>> +	srcu_read_unlock(&vcpu->kvm->srcu, idx);
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_h_put_tce_indirect);
>> +
>> +long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_value, unsigned long npages)
>> +{
>> +	struct kvmppc_spapr_tce_table *stt;
>> +	long i, ret;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
>> +	if (ret)
>> +		return ret;
>> +
>> +	ret = kvmppc_tce_validate(stt, tce_value);
>> +	if (ret || (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ)))
>> +		return H_PARAMETER;
>> +
>> +	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
>> +		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
>> +
>> +	return H_SUCCESS;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> index f0fd84c..bca7b12 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> @@ -14,6 +14,7 @@
>>    *
>>    * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>>    * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
>> + * Copyright 2013 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>>    */
>>
>>   #include <linux/types.h>
>> @@ -30,6 +31,7 @@
>>   #include <asm/kvm_ppc.h>
>>   #include <asm/kvm_book3s.h>
>>   #include <asm/mmu-hash64.h>
>> +#include <asm/mmu_context.h>
>>   #include <asm/hvcall.h>
>>   #include <asm/synch.h>
>>   #include <asm/ppc-opcode.h>
>> @@ -37,6 +39,7 @@
>>   #include <asm/udbg.h>
>>   #include <asm/iommu.h>
>>   #include <asm/tce.h>
>> +#include <asm/iommu.h>
>>
>>   #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>>
>> @@ -46,7 +49,7 @@
>>    * WARNING: This will be called in real or virtual mode on HV KVM and virtual
>>    *          mode on PR KVM
>>    */
>> -static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>> +struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>>   		unsigned long liobn)
>>   {
>>   	struct kvm *kvm = vcpu->kvm;
>> @@ -58,6 +61,7 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>>
>>   	return NULL;
>>   }
>> +EXPORT_SYMBOL_GPL(kvmppc_find_table);
>>
>>   /*
>>    * Validates IO address.
>> @@ -151,11 +155,32 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
>>   }
>>   EXPORT_SYMBOL_GPL(kvmppc_tce_put);
>>
>> -/* WARNING: This will be called in real-mode on HV KVM and virtual
>> - *          mode on PR KVM
>> - */
>> -long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>> -		      unsigned long ioba, unsigned long tce)
>> +long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
>> +		unsigned long *ua, unsigned long **prmap)
>
> I'm kind of surprised there isn't already a function to do this somewhere.
>
>> +{
>> +	unsigned long gfn = gpa >> PAGE_SHIFT;
>> +	struct kvm_memory_slot *memslot;
>> +
>> +	memslot = search_memslots(kvm_memslots(kvm), gfn);
>> +	if (!memslot)
>> +		return -EINVAL;
>> +
>> +	*ua = __gfn_to_hva_memslot(memslot, gfn) |
>> +		(gpa & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
>> +
>> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>> +	if (prmap)
>> +		*prmap = real_vmalloc_addr(&memslot->arch.rmap[
>> +				gfn - memslot->base_gfn]);
>> +#endif
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
>> +
>> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>> +long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>> +		unsigned long ioba, unsigned long tce)
>>   {
>>   	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
>>   	long ret = H_TOO_HARD;
>> @@ -178,7 +203,111 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>
>>   	return ret;
>>   }
>> -EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
>> +
>> +static long kvmppc_rm_ua_to_hpa(struct kvm_vcpu *vcpu,
>> +		unsigned long ua, unsigned long *phpa)
>> +{
>> +	pte_t *ptep, pte;
>> +	unsigned shift = 0;
>> +
>> +	ptep = __find_linux_pte_or_hugepte(vcpu->arch.pgdir, ua, &shift);
>> +	if (!ptep || !pte_present(*ptep))
>> +		return -ENXIO;
>> +	pte = *ptep;
>> +
>> +	if (!shift)
>> +		shift = PAGE_SHIFT;
>> +
>> +	/* Avoid handling anything potentially complicated in realmode */
>> +	if (shift > PAGE_SHIFT)
>> +		return -EAGAIN;
>> +
>> +	if (!pte_young(pte))
>> +		return -EAGAIN;
>
> Does it also need to be dirty, since you might be writing to this page?

This particular helper is used to get the address of the TCE list page (the
actual TCEs for VFIO will be translated using the memory pre-registration
mechanism), so no, we should not be writing to this page. And setting the
dirty bit is done by iommu_tce_xchg/iommu_tce_xchg_rm anyway, when needed.
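
(In other words, a hypothetical variant of this helper that intended to
write through the mapping would need an extra check along the lines of:

	/* hypothetical: only if real mode wrote through this translation */
	if (writing && !pte_dirty(pte))
		return -EAGAIN;	/* punt to virtual mode */

but since the TCE list page is only ever read here, pte_young() alone is
enough.)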



-- 
Alexey

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH kernel 9/9] KVM: PPC: Add support for multiple-TCE hcalls
@ 2015-12-22  7:42       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 46+ messages in thread
From: Alexey Kardashevskiy @ 2015-12-22  7:42 UTC (permalink / raw)
  To: David Gibson; +Cc: linuxppc-dev, Paul Mackerras, Alexander Graf, kvm-ppc, kvm

On 12/08/2015 04:48 PM, David Gibson wrote:
> On Tue, Sep 15, 2015 at 08:49:39PM +1000, Alexey Kardashevskiy wrote:
>> This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
>> H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
>> devices or emulated PCI.  These calls allow adding multiple entries
>> (up to 512) into the TCE table in one call which saves time on
>> transition between kernel and user space.
>>
>> This implements the KVM_CAP_PPC_MULTITCE capability. When present,
>> the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE.
>> If they can not be handled by the kernel, they are passed on to
>> the user space. The user space still has to have an implementation
>> for these.
>>
>> Both HV and PR-syle KVM are supported.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>   Documentation/virtual/kvm/api.txt       |  25 ++++++
>>   arch/powerpc/include/asm/kvm_ppc.h      |  12 +++
>>   arch/powerpc/kvm/book3s_64_vio.c        | 111 +++++++++++++++++++++++-
>>   arch/powerpc/kvm/book3s_64_vio_hv.c     | 145 ++++++++++++++++++++++++++++++--
>>   arch/powerpc/kvm/book3s_hv.c            |  26 +++++-
>>   arch/powerpc/kvm/book3s_hv_rmhandlers.S |   6 +-
>>   arch/powerpc/kvm/book3s_pr_papr.c       |  35 ++++++++
>>   arch/powerpc/kvm/powerpc.c              |   3 +
>>   8 files changed, 350 insertions(+), 13 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index d86d831..593c62a 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -3019,6 +3019,31 @@ Returns: 0 on success, -1 on error
>>
>>   Queues an SMI on the thread's vcpu.
>>
>> +4.97 KVM_CAP_PPC_MULTITCE
>> +
>> +Capability: KVM_CAP_PPC_MULTITCE
>> +Architectures: ppc
>> +Type: vm
>> +
>> +This capability means the kernel is capable of handling the hypercalls
>> +H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing them to user
>> +space. This significantly accelerates DMA operations for PPC KVM guests.
>> +User space should expect that its handlers for these hypercalls
>> +will not be called if it previously registered the LIOBN
>> +in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
>> +
>> +In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
>> +user space might have to advertise them to the guest. For example,
>> +an IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
>> +present in the "ibm,hypertas-functions" device-tree property.
>> +
>> +The hypercalls mentioned above may or may not be processed successfully
>> +in the kernel-based fast path. If they cannot be handled by the kernel,
>> +they will get passed on to user space. So user space still has to have
>> +an implementation for these despite the in-kernel acceleration.
>> +
>> +This capability is always enabled.
>> +
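
A user-space VMM would probe this in the usual way; a minimal sketch of the
check (vm_fd is assumed to be an already-created VM descriptor):

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* > 0: the kernel will try H_PUT_TCE_INDIRECT/H_STUFF_TCE itself */
	int multitce = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_MULTITCE);

Either way the VMM keeps its own handlers, since individual calls may still
be punted back to it.
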
>>   5. The kvm_run structure
>>   ------------------------
>>
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
>> index fcde896..e5b968e 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -166,12 +166,24 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
>>
>>   extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>>   				struct kvm_create_spapr_tce *args);
>> +extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
>> +		struct kvm_vcpu *vcpu, unsigned long liobn);
>>   extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
>>   		unsigned long ioba, unsigned long npages);
>>   extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
>>   		unsigned long tce);
>> +extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
>> +		unsigned long *ua, unsigned long **prmap);
>> +extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
>> +		unsigned long idx, unsigned long tce);
>>   extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>   			     unsigned long ioba, unsigned long tce);
>> +extern long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_list, unsigned long npages);
>> +extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_value, unsigned long npages);
>>   extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>   			     unsigned long ioba);
>>   extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index e347856..d3fc732 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -14,6 +14,7 @@
>>    *
>>    * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>>    * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
>> + * Copyright 2013 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>>    */
>>
>>   #include <linux/types.h>
>> @@ -37,8 +38,7 @@
>>   #include <asm/kvm_host.h>
>>   #include <asm/udbg.h>
>>   #include <asm/iommu.h>
>> -
>> -#define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>> +#include <asm/tce.h>
>>
>>   static long kvmppc_stt_npages(unsigned long window_size)
>>   {
>> @@ -200,3 +200,110 @@ fail:
>>   	}
>>   	return ret;
>>   }
>> +
>> +long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce)
>> +{
>> +	long ret;
>> +	struct kvmppc_spapr_tce_table *stt;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, 1);
>> +	if (ret)
>> +		return ret;
>> +
>> +	ret = kvmppc_tce_validate(stt, tce);
>> +	if (ret)
>> +		return ret;
>> +
>> +	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
>> +
>> +	return H_SUCCESS;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
>> +
>> +long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_list, unsigned long npages)
>> +{
>> +	struct kvmppc_spapr_tce_table *stt;
>> +	long i, ret = H_SUCCESS, idx;
>> +	unsigned long entry, ua = 0;
>> +	u64 __user *tces, tce;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
>> +	/*
>> +	 * The SPAPR spec says that the maximum size of the list is 512 TCEs,
>> +	 * so the whole table fits in a 4K page
>> +	 */
>> +	if (npages > 512)
>> +		return H_PARAMETER;
>> +
>> +	if (tce_list & ~IOMMU_PAGE_MASK_4K)
>
> IOMMU_PAGE_MASK_4K doesn't seem like the right thing here.  It is 4k,
> but that restriction is derived from the smallest possible main memory
> page size, rather than from anything to do with the IOMMU page size.


Ok, I'll make it SZ_4K then.
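
i.e. roughly (an untested sketch of the intended follow-up; SZ_4K comes
from linux/sizes.h):

	/* the TCE list must be 4K-aligned so it fits in one page */
	if (tce_list & (SZ_4K - 1))
		return H_PARAMETER;
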


>
>> +		return H_PARAMETER;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
>> +	if (ret)
>> +		return ret;
>> +
>> +	idx = srcu_read_lock(&vcpu->kvm->srcu);
>> +	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL)) {
>> +		ret = H_TOO_HARD;
>> +		goto unlock_exit;
>> +	}
>> +	tces = (u64 *) ua;
>
> The u64 * should have a usermem sparse annotation, no?


Like this?

tces = (u64 __user *) ua;
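
(That matches the "u64 __user *tces" declaration above, so the later
get_user(tce, tces + i) then passes sparse's address-space checking.)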


>
>> +	for (i = 0; i < npages; ++i) {
>> +		if (get_user(tce, tces + i)) {
>> +			ret = H_PARAMETER;
>> +			goto unlock_exit;
>> +		}
>> +		tce = be64_to_cpu(tce);
>> +		ret = kvmppc_tce_validate(stt, tce);
>> +		if (ret)
>> +			goto unlock_exit;
>> +
>> +		kvmppc_tce_put(stt, entry + i, tce);
>> +	}
>> +
>> +unlock_exit:
>> +	srcu_read_unlock(&vcpu->kvm->srcu, idx);
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_h_put_tce_indirect);
>> +
>> +long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_value, unsigned long npages)
>> +{
>> +	struct kvmppc_spapr_tce_table *stt;
>> +	long i, ret;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
>> +	if (ret)
>> +		return ret;
>> +
>> +	ret = kvmppc_tce_validate(stt, tce_value);
>> +	if (ret || (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ)))
>> +		return H_PARAMETER;
>> +
>> +	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
>> +		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
>> +
>> +	return H_SUCCESS;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> index f0fd84c..bca7b12 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> @@ -14,6 +14,7 @@
>>    *
>>    * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>>    * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
>> + * Copyright 2013 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>>    */
>>
>>   #include <linux/types.h>
>> @@ -30,6 +31,7 @@
>>   #include <asm/kvm_ppc.h>
>>   #include <asm/kvm_book3s.h>
>>   #include <asm/mmu-hash64.h>
>> +#include <asm/mmu_context.h>
>>   #include <asm/hvcall.h>
>>   #include <asm/synch.h>
>>   #include <asm/ppc-opcode.h>
>> @@ -37,6 +39,7 @@
>>   #include <asm/udbg.h>
>>   #include <asm/iommu.h>
>>   #include <asm/tce.h>
>> +#include <asm/iommu.h>
>>
>>   #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>>
>> @@ -46,7 +49,7 @@
>>    * WARNING: This will be called in real or virtual mode on HV KVM and virtual
>>    *          mode on PR KVM
>>    */
>> -static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>> +struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>>   		unsigned long liobn)
>>   {
>>   	struct kvm *kvm = vcpu->kvm;
>> @@ -58,6 +61,7 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>>
>>   	return NULL;
>>   }
>> +EXPORT_SYMBOL_GPL(kvmppc_find_table);
>>
>>   /*
>>    * Validates IO address.
>> @@ -151,11 +155,32 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
>>   }
>>   EXPORT_SYMBOL_GPL(kvmppc_tce_put);
>>
>> -/* WARNING: This will be called in real-mode on HV KVM and virtual
>> - *          mode on PR KVM
>> - */
>> -long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>> -		      unsigned long ioba, unsigned long tce)
>> +long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
>> +		unsigned long *ua, unsigned long **prmap)
>
> I'm kind of surprised there isn't already a function to do this somewhere.
>
>> +{
>> +	unsigned long gfn = gpa >> PAGE_SHIFT;
>> +	struct kvm_memory_slot *memslot;
>> +
>> +	memslot = search_memslots(kvm_memslots(kvm), gfn);
>> +	if (!memslot)
>> +		return -EINVAL;
>> +
>> +	*ua = __gfn_to_hva_memslot(memslot, gfn) |
>> +		(gpa & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
>> +
>> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>> +	if (prmap)
>> +		*prmap = real_vmalloc_addr(&memslot->arch.rmap[
>> +				gfn - memslot->base_gfn]);
>> +#endif
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
>> +
>> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>> +long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>> +		unsigned long ioba, unsigned long tce)
>>   {
>>   	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
>>   	long ret = H_TOO_HARD;
>> @@ -178,7 +203,111 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>
>>   	return ret;
>>   }
>> -EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
>> +
>> +static long kvmppc_rm_ua_to_hpa(struct kvm_vcpu *vcpu,
>> +		unsigned long ua, unsigned long *phpa)
>> +{
>> +	pte_t *ptep, pte;
>> +	unsigned shift = 0;
>> +
>> +	ptep = __find_linux_pte_or_hugepte(vcpu->arch.pgdir, ua, &shift);
>> +	if (!ptep || !pte_present(*ptep))
>> +		return -ENXIO;
>> +	pte = *ptep;
>> +
>> +	if (!shift)
>> +		shift = PAGE_SHIFT;
>> +
>> +	/* Avoid handling anything potentially complicated in realmode */
>> +	if (shift > PAGE_SHIFT)
>> +		return -EAGAIN;
>> +
>> +	if (!pte_young(pte))
>> +		return -EAGAIN;
>
> Does it also need to be dirty, since you might be writing to this page?

This particular helper is used to get the address of the TCE list page (the 
actual TCEs for VFIO will be translated using the memory pre-registration 
mechanism), so no, we should not be writing to this page. And setting the 
dirty bit is done by iommu_tce_xchg/iommu_tce_xchg_rm anyway, when needed.
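
For illustration, the realmode H_PUT_TCE_INDIRECT path only ever reads
through this translation; roughly (a sketch of the flow, not the literal
patch code; rmap locking elided):

	unsigned long ua = 0, tces;

	/* guest physical -> user virtual -> host physical, done once */
	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL) ||
	    kvmppc_rm_ua_to_hpa(vcpu, ua, &tces))
		return H_TOO_HARD;	/* retry the hcall in virtual mode */

	for (i = 0; i < npages; ++i) {
		/* read-only access: the TCE list page is never written */
		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);

		/* ... kvmppc_tce_validate() + kvmppc_tce_put() ... */
	}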



-- 
Alexey

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2015-12-22  7:42 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-15 10:49 [PATCH kernel 0/9] KVM: PPC: Add in-kernel multitce handling Alexey Kardashevskiy
2015-09-15 10:49 ` Alexey Kardashevskiy
2015-09-15 10:49 ` [PATCH kernel 1/9] rcu: Define notrace version of list_for_each_entry_rcu Alexey Kardashevskiy
2015-09-15 10:49   ` Alexey Kardashevskiy
2015-12-08  2:05   ` David Gibson
2015-12-08  2:05     ` David Gibson
2015-09-15 10:49 ` [PATCH kernel 2/9] KVM: PPC: Make real_vmalloc_addr() public Alexey Kardashevskiy
2015-09-15 10:49   ` Alexey Kardashevskiy
2015-12-08  2:08   ` David Gibson
2015-12-08  2:08     ` David Gibson
2015-09-15 10:49 ` [PATCH kernel 3/9] KVM: PPC: Rework H_PUT_TCE/H_GET_TCE handlers Alexey Kardashevskiy
2015-09-15 10:49   ` Alexey Kardashevskiy
2015-12-08  2:18   ` David Gibson
2015-12-08  2:18     ` David Gibson
2015-09-15 10:49 ` [PATCH kernel 4/9] KVM: PPC: Use RCU for arch.spapr_tce_tables Alexey Kardashevskiy
2015-09-15 10:49   ` Alexey Kardashevskiy
2015-12-08  2:35   ` David Gibson
2015-12-08  2:35     ` David Gibson
2015-09-15 10:49 ` [PATCH kernel 5/9] KVM: PPC: Account TCE-containing pages in locked_vm Alexey Kardashevskiy
2015-09-15 10:49   ` Alexey Kardashevskiy
2015-11-30  2:06   ` Paul Mackerras
2015-11-30  2:06     ` Paul Mackerras
2015-11-30  5:09     ` Alexey Kardashevskiy
2015-11-30  5:09       ` Alexey Kardashevskiy
2015-12-08  5:18   ` David Gibson
2015-12-08  5:18     ` David Gibson
2015-09-15 10:49 ` [PATCH kernel 6/9] KVM: PPC: Replace SPAPR_TCE_SHIFT with IOMMU_PAGE_SHIFT_4K Alexey Kardashevskiy
2015-09-15 10:49   ` Alexey Kardashevskiy
2015-12-08  5:19   ` David Gibson
2015-12-08  5:19     ` David Gibson
2015-09-15 10:49 ` [PATCH kernel 7/9] KVM: PPC: Move reusable bits of H_PUT_TCE handler to helpers Alexey Kardashevskiy
2015-09-15 10:49   ` Alexey Kardashevskiy
2015-12-08  5:27   ` David Gibson
2015-12-08  5:27     ` David Gibson
2015-12-22  7:24     ` Alexey Kardashevskiy
2015-12-22  7:24       ` Alexey Kardashevskiy
2015-09-15 10:49 ` [PATCH kernel 8/9] KVM: Fix KVM_SMI chapter number Alexey Kardashevskiy
2015-09-15 10:49   ` Alexey Kardashevskiy
2015-12-08  5:29   ` David Gibson
2015-12-08  5:29     ` David Gibson
2015-09-15 10:49 ` [PATCH kernel 9/9] KVM: PPC: Add support for multiple-TCE hcalls Alexey Kardashevskiy
2015-09-15 10:49   ` Alexey Kardashevskiy
2015-12-08  5:48   ` David Gibson
2015-12-08  5:48     ` David Gibson
2015-12-22  7:42     ` Alexey Kardashevskiy
2015-12-22  7:42       ` Alexey Kardashevskiy
