* [PATCH kernel v2 0/6] KVM: PPC: Add in-kernel multitce handling
@ 2016-01-21  7:39 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 48+ messages in thread
From: Alexey Kardashevskiy @ 2016-01-21  7:39 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Paul Mackerras, David Gibson, kvm-ppc, kvm

These patches enable in-kernel acceleration for the H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls, which allow updating multiple (up to 512) TCE entries
in a single call, saving the overhead of switching context for every entry.
QEMU already supports these hypercalls, so this is purely an optimization.
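
To illustrate the saving, a sketch of the guest side: instead of one
H_PUT_TCE hypercall per page, the guest fills one 4K-aligned page with
big-endian TCE values and issues a single H_PUT_TCE_INDIRECT (hypothetical
code; hcall(), tce_page and dma_addr[] stand in for the guest's real
hypercall wrapper and data):

	/* Hypothetical guest-side sketch, not part of this series. */
	u64 *tces = tce_page;		/* one 4K-aligned page, up to 512 entries */
	long i, rc;

	for (i = 0; i < npages; ++i)	/* npages <= 512 */
		tces[i] = cpu_to_be64(dma_addr[i] | TCE_PCI_READ | TCE_PCI_WRITE);

	/* one transition to the hypervisor instead of npages of them */
	rc = hcall(H_PUT_TCE_INDIRECT, liobn, ioba, __pa(tces), npages);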

Both HV and PR KVM modes are supported.

This does not affect VFIO; that support is coming next.

This depends on "powerpc: Make vmalloc_to_phys() public".

Please comment. Thanks.


Alexey Kardashevskiy (6):
  KVM: PPC: Rework H_PUT_TCE/H_GET_TCE handlers
  KVM: PPC: Use RCU for arch.spapr_tce_tables
  KVM: PPC: Account TCE-containing pages in locked_vm
  KVM: PPC: Replace SPAPR_TCE_SHIFT with IOMMU_PAGE_SHIFT_4K
  KVM: PPC: Move reusable bits of H_PUT_TCE handler to helpers
  KVM: PPC: Add support for multiple-TCE hcalls

 Documentation/virtual/kvm/api.txt        |  25 +++
 arch/powerpc/include/asm/kvm_book3s_64.h |   2 -
 arch/powerpc/include/asm/kvm_host.h      |   1 +
 arch/powerpc/include/asm/kvm_ppc.h       |  16 ++
 arch/powerpc/kvm/book3s.c                |   2 +-
 arch/powerpc/kvm/book3s_64_vio.c         | 188 ++++++++++++++++--
 arch/powerpc/kvm/book3s_64_vio_hv.c      | 318 ++++++++++++++++++++++++++-----
 arch/powerpc/kvm/book3s_hv.c             |  26 ++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |   6 +-
 arch/powerpc/kvm/book3s_pr_papr.c        |  35 ++++
 arch/powerpc/kvm/powerpc.c               |   3 +
 11 files changed, 557 insertions(+), 65 deletions(-)

-- 
2.5.0.rc3


* [PATCH kernel v2 1/6] KVM: PPC: Rework H_PUT_TCE/H_GET_TCE handlers
  2016-01-21  7:39 ` Alexey Kardashevskiy
@ 2016-01-21  7:39   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 48+ messages in thread
From: Alexey Kardashevskiy @ 2016-01-21  7:39 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Paul Mackerras, David Gibson, kvm-ppc, kvm

This reworks the existing H_PUT_TCE/H_GET_TCE handlers so that the following
patches apply more cleanly.

This moves the ioba boundary check into a helper and adds a check that the
least significant bits of ioba are zero, i.e. that ioba is aligned to the 4K
IOMMU page size (for example, ioba = 0x1001 now fails with H_PARAMETER where
it previously indexed the same TCE as ioba = 0x1000).

The patch is mostly mechanical (only the check for the least significant
ioba bits is new), so no change in behaviour is expected.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changelog:
v2:
* compare @ret with H_SUCCESS instead of assuming H_SUCCESS is zero
* made error reporting cleaner
---
 arch/powerpc/kvm/book3s_64_vio_hv.c | 111 +++++++++++++++++++++++-------------
 1 file changed, 72 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 89e96b3..862f9a2 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -35,71 +35,104 @@
 #include <asm/ppc-opcode.h>
 #include <asm/kvm_host.h>
 #include <asm/udbg.h>
+#include <asm/iommu.h>
 
 #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
 
+/*
+ * Finds a TCE table descriptor by LIOBN.
+ *
+ * WARNING: This will be called in real or virtual mode on HV KVM and virtual
+ *          mode on PR KVM
+ */
+static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
+		unsigned long liobn)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvmppc_spapr_tce_table *stt;
+
+	list_for_each_entry_lockless(stt, &kvm->arch.spapr_tce_tables, list)
+		if (stt->liobn == liobn)
+			return stt;
+
+	return NULL;
+}
+
+/*
+ * Validates IO address.
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ *          mode on PR KVM
+ */
+static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
+		unsigned long ioba, unsigned long npages)
+{
+	unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
+	unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
+	unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K;
+
+	if ((ioba & mask) || (idx + npages > size))
+		return H_PARAMETER;
+
+	return H_SUCCESS;
+}
+
 /* WARNING: This will be called in real-mode on HV KVM and virtual
  *          mode on PR KVM
  */
 long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 		      unsigned long ioba, unsigned long tce)
 {
-	struct kvm *kvm = vcpu->kvm;
-	struct kvmppc_spapr_tce_table *stt;
+	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
+	long ret;
+	unsigned long idx;
+	struct page *page;
+	u64 *tbl;
 
 	/* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
 	/* 	    liobn, ioba, tce); */
 
-	list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
-		if (stt->liobn == liobn) {
-			unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
-			struct page *page;
-			u64 *tbl;
+	if (!stt)
+		return H_TOO_HARD;
 
-			/* udbg_printf("H_PUT_TCE: liobn 0x%lx => stt=%p  window_size=0x%x\n", */
-			/* 	    liobn, stt, stt->window_size); */
-			if (ioba >= stt->window_size)
-				return H_PARAMETER;
+	ret = kvmppc_ioba_validate(stt, ioba, 1);
+	if (ret != H_SUCCESS)
+		return ret;
 
-			page = stt->pages[idx / TCES_PER_PAGE];
-			tbl = (u64 *)page_address(page);
+	idx = ioba >> SPAPR_TCE_SHIFT;
+	page = stt->pages[idx / TCES_PER_PAGE];
+	tbl = (u64 *)page_address(page);
 
-			/* FIXME: Need to validate the TCE itself */
-			/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
-			tbl[idx % TCES_PER_PAGE] = tce;
-			return H_SUCCESS;
-		}
-	}
+	/* FIXME: Need to validate the TCE itself */
+	/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
+	tbl[idx % TCES_PER_PAGE] = tce;
 
-	/* Didn't find the liobn, punt it to userspace */
-	return H_TOO_HARD;
+	return H_SUCCESS;
 }
 EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
 
 long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
-		      unsigned long ioba)
+		unsigned long ioba)
 {
-	struct kvm *kvm = vcpu->kvm;
-	struct kvmppc_spapr_tce_table *stt;
+	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
+	long ret;
+	unsigned long idx;
+	struct page *page;
+	u64 *tbl;
 
-	list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
-		if (stt->liobn == liobn) {
-			unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
-			struct page *page;
-			u64 *tbl;
+	if (!stt)
+		return H_TOO_HARD;
 
-			if (ioba >= stt->window_size)
-				return H_PARAMETER;
+	ret = kvmppc_ioba_validate(stt, ioba, 1);
+	if (ret != H_SUCCESS)
+		return ret;
 
-			page = stt->pages[idx / TCES_PER_PAGE];
-			tbl = (u64 *)page_address(page);
+	idx = ioba >> SPAPR_TCE_SHIFT;
+	page = stt->pages[idx / TCES_PER_PAGE];
+	tbl = (u64 *)page_address(page);
 
-			vcpu->arch.gpr[4] = tbl[idx % TCES_PER_PAGE];
-			return H_SUCCESS;
-		}
-	}
+	vcpu->arch.gpr[4] = tbl[idx % TCES_PER_PAGE];
 
-	/* Didn't find the liobn, punt it to userspace */
-	return H_TOO_HARD;
+	return H_SUCCESS;
 }
 EXPORT_SYMBOL_GPL(kvmppc_h_get_tce);
-- 
2.5.0.rc3


* [PATCH kernel v2 2/6] KVM: PPC: Use RCU for arch.spapr_tce_tables
  2016-01-21  7:39 ` Alexey Kardashevskiy
@ 2016-01-21  7:39   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 48+ messages in thread
From: Alexey Kardashevskiy @ 2016-01-21  7:39 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Paul Mackerras, David Gibson, kvm-ppc, kvm

At the moment spapr_tce_tables is not protected against races. This switches
the list over to the RCU variants of the list helpers. As some of this code
is executed in real mode, the lockless traversal helper
list_for_each_entry_lockless() is used there.

This converts release_spapr_tce_table() to an RCU-scheduled callback.
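
The resulting object lifecycle follows the usual RCU publish/retire pattern;
schematically (a minimal sketch condensed from the diff below):

	/* Updater side: publish under kvm->lock, retire via call_rcu() */
	mutex_lock(&kvm->lock);
	list_add_rcu(&stt->list, &kvm->arch.spapr_tce_tables);
	mutex_unlock(&kvm->lock);
	/* ... on release ... */
	list_del_rcu(&stt->list);
	call_rcu(&stt->rcu, release_spapr_tce_table); /* freed after a grace period */

	/* Reader side (may run in real mode, takes no locks) */
	list_for_each_entry_lockless(stt, &kvm->arch.spapr_tce_tables, list)
		if (stt->liobn == liobn)
			return stt;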

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/include/asm/kvm_host.h |  1 +
 arch/powerpc/kvm/book3s.c           |  2 +-
 arch/powerpc/kvm/book3s_64_vio.c    | 20 +++++++++++---------
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 271fefb..c7ee696 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -184,6 +184,7 @@ struct kvmppc_spapr_tce_table {
 	struct kvm *kvm;
 	u64 liobn;
 	u32 window_size;
+	struct rcu_head rcu;
 	struct page *pages[0];
 };
 
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 638c6d9..b34220d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -807,7 +807,7 @@ int kvmppc_core_init_vm(struct kvm *kvm)
 {
 
 #ifdef CONFIG_PPC64
-	INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
+	INIT_LIST_HEAD_RCU(&kvm->arch.spapr_tce_tables);
 	INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
 #endif
 
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 54cf9bc..9526c34 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -45,19 +45,16 @@ static long kvmppc_stt_npages(unsigned long window_size)
 		     * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
 }
 
-static void release_spapr_tce_table(struct kvmppc_spapr_tce_table *stt)
+static void release_spapr_tce_table(struct rcu_head *head)
 {
-	struct kvm *kvm = stt->kvm;
+	struct kvmppc_spapr_tce_table *stt = container_of(head,
+			struct kvmppc_spapr_tce_table, rcu);
 	int i;
 
-	mutex_lock(&kvm->lock);
-	list_del(&stt->list);
 	for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
 		__free_page(stt->pages[i]);
+
 	kfree(stt);
-	mutex_unlock(&kvm->lock);
-
-	kvm_put_kvm(kvm);
 }
 
 static int kvm_spapr_tce_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
@@ -88,7 +85,12 @@ static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
 {
 	struct kvmppc_spapr_tce_table *stt = filp->private_data;
 
-	release_spapr_tce_table(stt);
+	list_del_rcu(&stt->list);
+
+	kvm_put_kvm(stt->kvm);
+
+	call_rcu(&stt->rcu, release_spapr_tce_table);
+
 	return 0;
 }
 
@@ -131,7 +133,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 	kvm_get_kvm(kvm);
 
 	mutex_lock(&kvm->lock);
-	list_add(&stt->list, &kvm->arch.spapr_tce_tables);
+	list_add_rcu(&stt->list, &kvm->arch.spapr_tce_tables);
 
 	mutex_unlock(&kvm->lock);
 
-- 
2.5.0.rc3


* [PATCH kernel v2 3/6] KVM: PPC: Account TCE-containing pages in locked_vm
  2016-01-21  7:39 ` Alexey Kardashevskiy
@ 2016-01-21  7:39   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 48+ messages in thread
From: Alexey Kardashevskiy @ 2016-01-21  7:39 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Paul Mackerras, David Gibson, kvm-ppc, kvm

At the moment pages used for TCE tables (in addition to the pages addressed
by TCEs) are not counted in the locked_vm counter, so a malicious userspace
tool can call ioctl(KVM_CREATE_SPAPR_TCE) up to RLIMIT_NOFILE times and lock
a large amount of memory.

This adds accounting for the pages used by TCE tables.

This counts the number of pages required for the table itself plus the pages
needed for the kvmppc_spapr_tce_table struct (the TCE table descriptor).

This changes release_spapr_tce_table() to store @npages on the stack to
avoid calling kvmppc_stt_npages() on every loop iteration (a tiny
optimization, probably).

This does not change the amount of (de)allocated memory.
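
As a worked example (assuming 4K pages and 8-byte struct page pointers): a
1GB DMA window needs (1GB >> 12) = 262144 TCEs, i.e. 2MB of table or 512
pages; the descriptor adds ALIGN(sizeof(struct kvmppc_spapr_tce_table) +
512 * sizeof(struct page *), PAGE_SIZE) / PAGE_SIZE = 2 more pages (the
pointer array alone fills one page), so 514 pages get charged to locked_vm.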

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v2:
* switched from long to unsigned long types
* added WARN_ON_ONCE() in locked_vm decrement case
---
 arch/powerpc/kvm/book3s_64_vio.c | 55 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 52 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 9526c34..ea498b4 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -39,19 +39,62 @@
 
 #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
 
-static long kvmppc_stt_npages(unsigned long window_size)
+static unsigned long kvmppc_stt_npages(unsigned long window_size)
 {
 	return ALIGN((window_size >> SPAPR_TCE_SHIFT)
 		     * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
 }
 
+static long kvmppc_account_memlimit(unsigned long npages, bool inc)
+{
+	long ret = 0;
+	const unsigned long bytes = sizeof(struct kvmppc_spapr_tce_table) +
+			(npages * sizeof(struct page *));
+	const unsigned long stt_pages = ALIGN(bytes, PAGE_SIZE) / PAGE_SIZE;
+
+	if (!current || !current->mm)
+		return ret; /* process exited */
+
+	npages += stt_pages;
+
+	down_write(&current->mm->mmap_sem);
+
+	if (inc) {
+		unsigned long locked, lock_limit;
+
+		locked = current->mm->locked_vm + npages;
+		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
+			ret = -ENOMEM;
+		else
+			current->mm->locked_vm += npages;
+	} else {
+		if (WARN_ON_ONCE(npages > current->mm->locked_vm))
+			npages = current->mm->locked_vm;
+
+		current->mm->locked_vm -= npages;
+	}
+
+	pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid,
+			inc ? '+' : '-',
+			npages << PAGE_SHIFT,
+			current->mm->locked_vm << PAGE_SHIFT,
+			rlimit(RLIMIT_MEMLOCK),
+			ret ? " - exceeded" : "");
+
+	up_write(&current->mm->mmap_sem);
+
+	return ret;
+}
+
 static void release_spapr_tce_table(struct rcu_head *head)
 {
 	struct kvmppc_spapr_tce_table *stt = container_of(head,
 			struct kvmppc_spapr_tce_table, rcu);
 	int i;
+	unsigned long npages = kvmppc_stt_npages(stt->window_size);
 
-	for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
+	for (i = 0; i < npages; i++)
 		__free_page(stt->pages[i]);
 
 	kfree(stt);
@@ -89,6 +132,7 @@ static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
 
 	kvm_put_kvm(stt->kvm);
 
+	kvmppc_account_memlimit(kvmppc_stt_npages(stt->window_size), false);
 	call_rcu(&stt->rcu, release_spapr_tce_table);
 
 	return 0;
@@ -103,7 +147,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 				   struct kvm_create_spapr_tce *args)
 {
 	struct kvmppc_spapr_tce_table *stt = NULL;
-	long npages;
+	unsigned long npages;
 	int ret = -ENOMEM;
 	int i;
 
@@ -114,6 +158,11 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 	}
 
 	npages = kvmppc_stt_npages(args->window_size);
+	ret = kvmppc_account_memlimit(npages, true);
+	if (ret) {
+		stt = NULL;
+		goto fail;
+	}
 
 	stt = kzalloc(sizeof(*stt) + npages * sizeof(struct page *),
 		      GFP_KERNEL);
-- 
2.5.0.rc3


* [PATCH kernel v2 4/6] KVM: PPC: Replace SPAPR_TCE_SHIFT with IOMMU_PAGE_SHIFT_4K
  2016-01-21  7:39 ` Alexey Kardashevskiy
@ 2016-01-21  7:39   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 48+ messages in thread
From: Alexey Kardashevskiy @ 2016-01-21  7:39 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Paul Mackerras, David Gibson, kvm-ppc, kvm

SPAPR_TCE_SHIFT is used in only a few places, and since IOMMU_PAGE_SHIFT_4K
can easily be used instead, remove SPAPR_TCE_SHIFT.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 2 --
 arch/powerpc/kvm/book3s_64_vio.c         | 3 ++-
 arch/powerpc/kvm/book3s_64_vio_hv.c      | 4 ++--
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2aa79c8..7529aab 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -33,8 +33,6 @@ static inline void svcpu_put(struct kvmppc_book3s_shadow_vcpu *svcpu)
 }
 #endif
 
-#define SPAPR_TCE_SHIFT		12
-
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 #define KVM_DEFAULT_HPT_ORDER	24	/* 16MB HPT by default */
 #endif
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index ea498b4..975f0ab 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -36,12 +36,13 @@
 #include <asm/ppc-opcode.h>
 #include <asm/kvm_host.h>
 #include <asm/udbg.h>
+#include <asm/iommu.h>
 
 #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
 
 static unsigned long kvmppc_stt_npages(unsigned long window_size)
 {
-	return ALIGN((window_size >> SPAPR_TCE_SHIFT)
+	return ALIGN((window_size >> IOMMU_PAGE_SHIFT_4K)
 		     * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
 }
 
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 862f9a2..e142171 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -99,7 +99,7 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	if (ret != H_SUCCESS)
 		return ret;
 
-	idx = ioba >> SPAPR_TCE_SHIFT;
+	idx = ioba >> IOMMU_PAGE_SHIFT_4K;
 	page = stt->pages[idx / TCES_PER_PAGE];
 	tbl = (u64 *)page_address(page);
 
@@ -127,7 +127,7 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	if (ret != H_SUCCESS)
 		return ret;
 
-	idx = ioba >> SPAPR_TCE_SHIFT;
+	idx = ioba >> IOMMU_PAGE_SHIFT_4K;
 	page = stt->pages[idx / TCES_PER_PAGE];
 	tbl = (u64 *)page_address(page);
 
-- 
2.5.0.rc3


* [PATCH kernel v2 5/6] KVM: PPC: Move reusable bits of H_PUT_TCE handler to helpers
  2016-01-21  7:39 ` Alexey Kardashevskiy
@ 2016-01-21  7:39   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 48+ messages in thread
From: Alexey Kardashevskiy @ 2016-01-21  7:39 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Paul Mackerras, David Gibson, kvm-ppc, kvm

The upcoming multi-TCE support (the H_PUT_TCE_INDIRECT/H_STUFF_TCE
hypercalls) will need to validate each TCE (to make sure it has no
unexpected bits set) and each IO address (to make sure it is within the DMA
window boundaries).

This introduces helpers to validate a TCE and an IO address. The helpers are
exported because they are compiled into vmlinux (so they can run in real
mode) and will later be used by the KVM kernel module in virtual mode.
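
Once the series is complete, the handlers use these helpers in a fixed
order; schematically (a sketch condensed from the handlers below, not a
new API):

	stt = kvmppc_find_table(vcpu, liobn);
	if (!stt)
		return H_TOO_HARD;		/* unknown LIOBN: punt to user space */

	ret = kvmppc_ioba_validate(stt, ioba, npages);	/* window + alignment */
	if (ret != H_SUCCESS)
		return ret;

	ret = kvmppc_tce_validate(stt, tce);	/* flags and page mask only */
	if (ret != H_SUCCESS)
		return ret;

	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);	/* cannot fail */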

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v2:
* added a note to the commit log about why the new helpers are exported
* did not add a note that xxx_validate() validates TCEs for KVM (not for
host kernel DMA) as the helper names and file location already make their
purpose clear
---
 arch/powerpc/include/asm/kvm_ppc.h  |  4 ++
 arch/powerpc/kvm/book3s_64_vio_hv.c | 92 ++++++++++++++++++++++++++++++++-----
 2 files changed, 84 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 2241d53..9513911 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -166,6 +166,10 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 				struct kvm_create_spapr_tce *args);
+extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
+		unsigned long ioba, unsigned long npages);
+extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
+		unsigned long tce);
 extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 			     unsigned long ioba, unsigned long tce);
 extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index e142171..8cd3a95 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -36,6 +36,7 @@
 #include <asm/kvm_host.h>
 #include <asm/udbg.h>
 #include <asm/iommu.h>
+#include <asm/tce.h>
 
 #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
 
@@ -64,18 +65,90 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
  * WARNING: This will be called in real-mode on HV KVM and virtual
  *          mode on PR KVM
  */
-static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
+long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
 		unsigned long ioba, unsigned long npages)
 {
-	unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
+	unsigned long mask = IOMMU_PAGE_MASK_4K;
 	unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
 	unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K;
 
-	if ((ioba & mask) || (idx + npages > size))
+	if ((ioba & ~mask) || (idx + npages > size))
 		return H_PARAMETER;
 
 	return H_SUCCESS;
 }
+EXPORT_SYMBOL_GPL(kvmppc_ioba_validate);
+
+/*
+ * Validates TCE address.
+ * At the moment flags and page mask are validated.
+ * As the host kernel does not access those addresses (just puts them
+ * to the table and user space is supposed to process them), we can skip
+ * checking other things (such as TCE is a guest RAM address or the page
+ * was actually allocated).
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ *          mode on PR KVM
+ */
+long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, unsigned long tce)
+{
+	unsigned long mask = IOMMU_PAGE_MASK_4K | TCE_PCI_WRITE | TCE_PCI_READ;
+
+	if (tce & ~mask)
+		return H_PARAMETER;
+
+	return H_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(kvmppc_tce_validate);
+
+/* Note on the use of page_address() in real mode.
+ *
+ * It is safe to use page_address() in real mode on ppc64 because
+ * page_address() is always defined as lowmem_page_address(),
+ * which returns __va(PFN_PHYS(page_to_pfn(page))), an arithmetic
+ * operation that does not access the page struct.
+ *
+ * Theoretically page_address() could be defined differently, but
+ * then either WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL would have
+ * to be enabled.
+ * WANT_PAGE_VIRTUAL is never enabled on ppc32/ppc64,
+ * HASHED_PAGE_VIRTUAL could be enabled on ppc32 only, and only
+ * if CONFIG_HIGHMEM is defined. As CONFIG_SPARSEMEM_VMEMMAP
+ * is not expected to be enabled on ppc32, page_address()
+ * is safe for ppc32 as well.
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ *          mode on PR KVM
+ */
+static u64 *kvmppc_page_address(struct page *page)
+{
+#if defined(HASHED_PAGE_VIRTUAL) || defined(WANT_PAGE_VIRTUAL)
+#error TODO: fix to avoid page_address() here
+#endif
+	return (u64 *) page_address(page);
+}
+
+/*
+ * Handles TCE requests for emulated devices.
+ * Puts guest TCE values to the table and expects user space to convert them.
+ * Called in both real and virtual modes.
+ * Cannot fail so kvmppc_tce_validate must be called before it.
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ *          mode on PR KVM
+ */
+void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
+		unsigned long idx, unsigned long tce)
+{
+	struct page *page;
+	u64 *tbl;
+
+	page = stt->pages[idx / TCES_PER_PAGE];
+	tbl = kvmppc_page_address(page);
+
+	tbl[idx % TCES_PER_PAGE] = tce;
+}
+EXPORT_SYMBOL_GPL(kvmppc_tce_put);
 
 /* WARNING: This will be called in real-mode on HV KVM and virtual
  *          mode on PR KVM
@@ -85,9 +158,6 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 {
 	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
 	long ret;
-	unsigned long idx;
-	struct page *page;
-	u64 *tbl;
 
 	/* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
 	/* 	    liobn, ioba, tce); */
@@ -99,13 +169,11 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	if (ret != H_SUCCESS)
 		return ret;
 
-	idx = ioba >> IOMMU_PAGE_SHIFT_4K;
-	page = stt->pages[idx / TCES_PER_PAGE];
-	tbl = (u64 *)page_address(page);
+	ret = kvmppc_tce_validate(stt, tce);
+	if (ret != H_SUCCESS)
+		return ret;
 
-	/* FIXME: Need to validate the TCE itself */
-	/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
-	tbl[idx % TCES_PER_PAGE] = tce;
+	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
 
 	return H_SUCCESS;
 }
-- 
2.5.0.rc3


* [PATCH kernel v2 6/6] KVM: PPC: Add support for multiple-TCE hcalls
  2016-01-21  7:39 ` Alexey Kardashevskiy
@ 2016-01-21  7:39   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 48+ messages in thread
From: Alexey Kardashevskiy @ 2016-01-21  7:39 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Paul Mackerras, David Gibson, kvm-ppc, kvm

This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
devices or emulated PCI. These calls allow adding multiple entries
(up to 512) to the TCE table in one call, which saves time on the
transitions between kernel and user space.

This implements the KVM_CAP_PPC_MULTITCE capability. When it is present,
the kernel will try to handle H_PUT_TCE_INDIRECT and H_STUFF_TCE. If they
cannot be handled by the kernel, they are passed on to user space, which
still has to provide an implementation for them.

Both HV and PR-style KVM are supported.
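
Since the capability is always enabled (see the api.txt hunk below), user
space can probe it with the standard KVM_CHECK_EXTENSION ioctl; a minimal
user-space sketch, using the capability name this series defines
(multitce_in_kernel is illustrative, error handling omitted):

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	int kvm_fd = open("/dev/kvm", O_RDWR);

	if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_MULTITCE) > 0)
		/* the kernel may consume H_PUT_TCE_INDIRECT/H_STUFF_TCE itself;
		 * user space handlers must still exist but may not be called */
		multitce_in_kernel = 1;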

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v2:
* compare @ret with H_SUCCESS instead of assuming H_SUCCESS is zero
* s/~IOMMU_PAGE_MASK_4K/SZ_4K-1/ when testing @tce_list
---
 Documentation/virtual/kvm/api.txt       |  25 ++++++
 arch/powerpc/include/asm/kvm_ppc.h      |  12 +++
 arch/powerpc/kvm/book3s_64_vio.c        | 110 +++++++++++++++++++++++-
 arch/powerpc/kvm/book3s_64_vio_hv.c     | 145 ++++++++++++++++++++++++++++++--
 arch/powerpc/kvm/book3s_hv.c            |  26 +++++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   6 +-
 arch/powerpc/kvm/book3s_pr_papr.c       |  35 ++++++++
 arch/powerpc/kvm/powerpc.c              |   3 +
 8 files changed, 349 insertions(+), 13 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 07e4cdf..da39435 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3035,6 +3035,31 @@ Returns: 0 on success, -1 on error
 
 Queues an SMI on the thread's vcpu.
 
+4.97 KVM_CAP_PPC_MULTITCE
+
+Capability: KVM_CAP_PPC_MULTITCE
+Architectures: ppc
+Type: vm
+
+This capability means the kernel is capable of handling hypercalls
+H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those on to user
+space. This significantly accelerates DMA operations for PPC KVM guests.
+User space should expect that its handlers for these hypercalls
+are not going to be called if user space previously registered the LIOBN
+in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
+
+In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
+user space might have to advertise it for the guest. For example,
+an IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+The hypercalls mentioned above may or may not be processed successfully
+in the kernel-based fast path. If they cannot be handled by the kernel,
+they will get passed on to user space. So user space still has to have
+an implementation for these despite the in-kernel acceleration.
+
+This capability is always enabled.
+
 5. The kvm_run structure
 ------------------------
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 9513911..4cadee5 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -166,12 +166,24 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 				struct kvm_create_spapr_tce *args);
+extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
+		struct kvm_vcpu *vcpu, unsigned long liobn);
 extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
 		unsigned long ioba, unsigned long npages);
 extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
 		unsigned long tce);
+extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
+		unsigned long *ua, unsigned long **prmap);
+extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
+		unsigned long idx, unsigned long tce);
 extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 			     unsigned long ioba, unsigned long tce);
+extern long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_list, unsigned long npages);
+extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_value, unsigned long npages);
 extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 			     unsigned long ioba);
 extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 975f0ab..987f406 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -14,6 +14,7 @@
  *
  * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
  * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
+ * Copyright 2016 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
  */
 
 #include <linux/types.h>
@@ -37,8 +38,7 @@
 #include <asm/kvm_host.h>
 #include <asm/udbg.h>
 #include <asm/iommu.h>
-
-#define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
+#include <asm/tce.h>
 
 static unsigned long kvmppc_stt_npages(unsigned long window_size)
 {
@@ -200,3 +200,109 @@ fail:
 	}
 	return ret;
 }
+
+long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce)
+{
+	long ret;
+	struct kvmppc_spapr_tce_table *stt;
+
+	stt = kvmppc_find_table(vcpu, liobn);
+	if (!stt)
+		return H_TOO_HARD;
+
+	ret = kvmppc_ioba_validate(stt, ioba, 1);
+	if (ret != H_SUCCESS)
+		return ret;
+
+	ret = kvmppc_tce_validate(stt, tce);
+	if (ret != H_SUCCESS)
+		return ret;
+
+	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
+
+	return H_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
+
+long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_list, unsigned long npages)
+{
+	struct kvmppc_spapr_tce_table *stt;
+	long i, ret = H_SUCCESS, idx;
+	unsigned long entry, ua = 0;
+	u64 __user *tces, tce;
+
+	stt = kvmppc_find_table(vcpu, liobn);
+	if (!stt)
+		return H_TOO_HARD;
+
+	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
+	/*
+	 * The SPAPR spec says that the maximum size of the list is 512 TCEs,
+	 * so the whole table fits in a 4K page
+	 */
+	if (npages > 512)
+		return H_PARAMETER;
+
+	if (tce_list & (SZ_4K - 1))
+		return H_PARAMETER;
+
+	ret = kvmppc_ioba_validate(stt, ioba, npages);
+	if (ret != H_SUCCESS)
+		return ret;
+
+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL)) {
+		ret = H_TOO_HARD;
+		goto unlock_exit;
+	}
+	tces = (u64 __user *) ua;
+
+	for (i = 0; i < npages; ++i) {
+		if (get_user(tce, tces + i)) {
+			ret = H_PARAMETER;
+			goto unlock_exit;
+		}
+		tce = be64_to_cpu(tce);
+
+		ret = kvmppc_tce_validate(stt, tce);
+		if (ret != H_SUCCESS)
+			goto unlock_exit;
+
+		kvmppc_tce_put(stt, entry + i, tce);
+	}
+
+unlock_exit:
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kvmppc_h_put_tce_indirect);
+
+long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_value, unsigned long npages)
+{
+	struct kvmppc_spapr_tce_table *stt;
+	long i, ret;
+
+	stt = kvmppc_find_table(vcpu, liobn);
+	if (!stt)
+		return H_TOO_HARD;
+
+	ret = kvmppc_ioba_validate(stt, ioba, npages);
+	if (ret != H_SUCCESS)
+		return ret;
+
+	if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
+		return H_PARAMETER;
+
+	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
+		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
+
+	return H_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 8cd3a95..58c63ed 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -14,6 +14,7 @@
  *
  * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
  * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
+ * Copyright 2016 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
  */
 
 #include <linux/types.h>
@@ -30,6 +31,7 @@
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
 #include <asm/mmu-hash64.h>
+#include <asm/mmu_context.h>
 #include <asm/hvcall.h>
 #include <asm/synch.h>
 #include <asm/ppc-opcode.h>
@@ -37,6 +39,7 @@
 #include <asm/udbg.h>
 #include <asm/iommu.h>
 #include <asm/tce.h>
+#include <asm/iommu.h>
 
 #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
 
@@ -46,7 +49,7 @@
  * WARNING: This will be called in real or virtual mode on HV KVM and virtual
  *          mode on PR KVM
  */
-static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
+struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
 		unsigned long liobn)
 {
 	struct kvm *kvm = vcpu->kvm;
@@ -58,6 +61,7 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
 
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(kvmppc_find_table);
 
 /*
  * Validates IO address.
@@ -150,11 +154,31 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
 }
 EXPORT_SYMBOL_GPL(kvmppc_tce_put);
 
-/* WARNING: This will be called in real-mode on HV KVM and virtual
- *          mode on PR KVM
- */
-long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
-		      unsigned long ioba, unsigned long tce)
+long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
+		unsigned long *ua, unsigned long **prmap)
+{
+	unsigned long gfn = gpa >> PAGE_SHIFT;
+	struct kvm_memory_slot *memslot;
+
+	memslot = search_memslots(kvm_memslots(kvm), gfn);
+	if (!memslot)
+		return -EINVAL;
+
+	*ua = __gfn_to_hva_memslot(memslot, gfn) |
+		(gpa & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
+
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	if (prmap)
+		*prmap = &memslot->arch.rmap[gfn - memslot->base_gfn];
+#endif
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
+
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
+		unsigned long ioba, unsigned long tce)
 {
 	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
 	long ret;
@@ -177,7 +201,112 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 
 	return H_SUCCESS;
 }
-EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
+
+static long kvmppc_rm_ua_to_hpa(struct kvm_vcpu *vcpu,
+		unsigned long ua, unsigned long *phpa)
+{
+	pte_t *ptep, pte;
+	unsigned shift = 0;
+
+	ptep = __find_linux_pte_or_hugepte(vcpu->arch.pgdir, ua, NULL, &shift);
+	if (!ptep || !pte_present(*ptep))
+		return -ENXIO;
+	pte = *ptep;
+
+	if (!shift)
+		shift = PAGE_SHIFT;
+
+	/* Avoid handling anything potentially complicated in realmode */
+	if (shift > PAGE_SHIFT)
+		return -EAGAIN;
+
+	if (!pte_young(pte))
+		return -EAGAIN;
+
+	*phpa = (pte_pfn(pte) << PAGE_SHIFT) | (ua & ((1ULL << shift) - 1)) |
+			(ua & ~PAGE_MASK);
+
+	return 0;
+}
+
+long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_list,	unsigned long npages)
+{
+	struct kvmppc_spapr_tce_table *stt;
+	long i, ret = H_SUCCESS;
+	unsigned long tces, entry, ua = 0;
+	unsigned long *rmap = NULL;
+
+	stt = kvmppc_find_table(vcpu, liobn);
+	if (!stt)
+		return H_TOO_HARD;
+
+	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
+	/*
+	 * The spec says that the maximum size of the list is 512 TCEs
+	 * so the whole table being addressed resides within a single 4K page.
+	 */
+	if (npages > 512)
+		return H_PARAMETER;
+
+	if (tce_list & (SZ_4K - 1))
+		return H_PARAMETER;
+
+	ret = kvmppc_ioba_validate(stt, ioba, npages);
+	if (ret != H_SUCCESS)
+		return ret;
+
+	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
+		return H_TOO_HARD;
+
+	rmap = (void *) vmalloc_to_phys(rmap);
+
+	lock_rmap(rmap);
+	if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
+		ret = H_TOO_HARD;
+		goto unlock_exit;
+	}
+
+	for (i = 0; i < npages; ++i) {
+		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
+
+		ret = kvmppc_tce_validate(stt, tce);
+		if (ret != H_SUCCESS)
+			goto unlock_exit;
+
+		kvmppc_tce_put(stt, entry + i, tce);
+	}
+
+unlock_exit:
+	unlock_rmap(rmap);
+
+	return ret;
+}
+
+long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_value, unsigned long npages)
+{
+	struct kvmppc_spapr_tce_table *stt;
+	long i, ret;
+
+	stt = kvmppc_find_table(vcpu, liobn);
+	if (!stt)
+		return H_TOO_HARD;
+
+	ret = kvmppc_ioba_validate(stt, ioba, npages);
+	if (ret != H_SUCCESS)
+		return ret;
+
+	if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
+		return H_PARAMETER;
+
+	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
+		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
+
+	return H_SUCCESS;
+}
 
 long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 		unsigned long ioba)
@@ -204,3 +333,5 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	return H_SUCCESS;
 }
 EXPORT_SYMBOL_GPL(kvmppc_h_get_tce);
+
+#endif /* KVM_BOOK3S_HV_POSSIBLE */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cff207b..df3fbae 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -768,7 +768,31 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 		if (kvmppc_xics_enabled(vcpu)) {
 			ret = kvmppc_xics_hcall(vcpu, req);
 			break;
-		} /* fallthrough */
+		}
+		return RESUME_HOST;
+	case H_PUT_TCE:
+		ret = kvmppc_h_put_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
+						kvmppc_get_gpr(vcpu, 5),
+						kvmppc_get_gpr(vcpu, 6));
+		if (ret == H_TOO_HARD)
+			return RESUME_HOST;
+		break;
+	case H_PUT_TCE_INDIRECT:
+		ret = kvmppc_h_put_tce_indirect(vcpu, kvmppc_get_gpr(vcpu, 4),
+						kvmppc_get_gpr(vcpu, 5),
+						kvmppc_get_gpr(vcpu, 6),
+						kvmppc_get_gpr(vcpu, 7));
+		if (ret == H_TOO_HARD)
+			return RESUME_HOST;
+		break;
+	case H_STUFF_TCE:
+		ret = kvmppc_h_stuff_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
+						kvmppc_get_gpr(vcpu, 5),
+						kvmppc_get_gpr(vcpu, 6),
+						kvmppc_get_gpr(vcpu, 7));
+		if (ret == H_TOO_HARD)
+			return RESUME_HOST;
+		break;
 	default:
 		return RESUME_HOST;
 	}
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 3c6badc..3bf6e72 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1928,7 +1928,7 @@ hcall_real_table:
 	.long	DOTSYM(kvmppc_h_clear_ref) - hcall_real_table
 	.long	DOTSYM(kvmppc_h_protect) - hcall_real_table
 	.long	DOTSYM(kvmppc_h_get_tce) - hcall_real_table
-	.long	DOTSYM(kvmppc_h_put_tce) - hcall_real_table
+	.long	DOTSYM(kvmppc_rm_h_put_tce) - hcall_real_table
 	.long	0		/* 0x24 - H_SET_SPRG0 */
 	.long	DOTSYM(kvmppc_h_set_dabr) - hcall_real_table
 	.long	0		/* 0x2c */
@@ -2006,8 +2006,8 @@ hcall_real_table:
 	.long	0		/* 0x12c */
 	.long	0		/* 0x130 */
 	.long	DOTSYM(kvmppc_h_set_xdabr) - hcall_real_table
-	.long	0		/* 0x138 */
-	.long	0		/* 0x13c */
+	.long	DOTSYM(kvmppc_rm_h_stuff_tce) - hcall_real_table
+	.long	DOTSYM(kvmppc_rm_h_put_tce_indirect) - hcall_real_table
 	.long	0		/* 0x140 */
 	.long	0		/* 0x144 */
 	.long	0		/* 0x148 */
diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index f2c75a1..02176fd 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -280,6 +280,37 @@ static int kvmppc_h_pr_logical_ci_store(struct kvm_vcpu *vcpu)
 	return EMULATE_DONE;
 }
 
+static int kvmppc_h_pr_put_tce_indirect(struct kvm_vcpu *vcpu)
+{
+	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
+	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
+	unsigned long tce = kvmppc_get_gpr(vcpu, 6);
+	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
+	long rc;
+
+	rc = kvmppc_h_put_tce_indirect(vcpu, liobn, ioba,
+			tce, npages);
+	if (rc == H_TOO_HARD)
+		return EMULATE_FAIL;
+	kvmppc_set_gpr(vcpu, 3, rc);
+	return EMULATE_DONE;
+}
+
+static int kvmppc_h_pr_stuff_tce(struct kvm_vcpu *vcpu)
+{
+	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
+	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
+	unsigned long tce_value = kvmppc_get_gpr(vcpu, 6);
+	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
+	long rc;
+
+	rc = kvmppc_h_stuff_tce(vcpu, liobn, ioba, tce_value, npages);
+	if (rc == H_TOO_HARD)
+		return EMULATE_FAIL;
+	kvmppc_set_gpr(vcpu, 3, rc);
+	return EMULATE_DONE;
+}
+
 static int kvmppc_h_pr_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
 {
 	long rc = kvmppc_xics_hcall(vcpu, cmd);
@@ -306,6 +337,10 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
 		return kvmppc_h_pr_bulk_remove(vcpu);
 	case H_PUT_TCE:
 		return kvmppc_h_pr_put_tce(vcpu);
+	case H_PUT_TCE_INDIRECT:
+		return kvmppc_h_pr_put_tce_indirect(vcpu);
+	case H_STUFF_TCE:
+		return kvmppc_h_pr_stuff_tce(vcpu);
 	case H_CEDE:
 		kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
 		kvm_vcpu_block(vcpu);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6fd2405..164735c 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -569,6 +569,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_PPC_GET_SMMU_INFO:
 		r = 1;
 		break;
+	case KVM_CAP_SPAPR_MULTITCE:
+		r = 1;
+		break;
 #endif
 	default:
 		r = 0;
-- 
2.5.0.rc3


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH kernel v2 6/6] KVM: PPC: Add support for multiple-TCE hcalls
  2016-01-21  7:39   ` Alexey Kardashevskiy
@ 2016-01-21  7:56     ` kbuild test robot
  -1 siblings, 0 replies; 48+ messages in thread
From: kbuild test robot @ 2016-01-21  7:56 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kbuild-all, linuxppc-dev, Alexey Kardashevskiy, Paul Mackerras,
	David Gibson, kvm-ppc, kvm

Hi Alexey,

[auto build test ERROR on kvm/linux-next]
[also build test ERROR on v4.4 next-20160121]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Alexey-Kardashevskiy/KVM-PPC-Add-in-kernel-multitce-handling/20160121-154336
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: powerpc-allyesconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=powerpc 

All error/warnings (new ones prefixed by >>):

   arch/powerpc/kvm/book3s_64_vio_hv.c: In function 'kvmppc_find_table':
   arch/powerpc/kvm/book3s_64_vio_hv.c:58:2: error: implicit declaration of function 'list_for_each_entry_lockless' [-Werror=implicit-function-declaration]
     list_for_each_entry_lockless(stt, &kvm->arch.spapr_tce_tables, list)
     ^
   arch/powerpc/kvm/book3s_64_vio_hv.c:58:65: error: 'list' undeclared (first use in this function)
     list_for_each_entry_lockless(stt, &kvm->arch.spapr_tce_tables, list)
                                                                    ^
   arch/powerpc/kvm/book3s_64_vio_hv.c:58:65: note: each undeclared identifier is reported only once for each function it appears in
   arch/powerpc/kvm/book3s_64_vio_hv.c:59:3: error: expected ';' before 'if'
      if (stt->liobn == liobn)
      ^
   arch/powerpc/kvm/book3s_64_vio_hv.c: In function 'kvmppc_rm_h_put_tce_indirect':
>> arch/powerpc/kvm/book3s_64_vio_hv.c:263:18: error: implicit declaration of function 'vmalloc_to_phys' [-Werror=implicit-function-declaration]
     rmap = (void *) vmalloc_to_phys(rmap);
                     ^
>> arch/powerpc/kvm/book3s_64_vio_hv.c:263:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     rmap = (void *) vmalloc_to_phys(rmap);
            ^
   cc1: some warnings being treated as errors

vim +/vmalloc_to_phys +263 arch/powerpc/kvm/book3s_64_vio_hv.c

   257		if (ret != H_SUCCESS)
   258			return ret;
   259	
   260		if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
   261			return H_TOO_HARD;
   262	
 > 263		rmap = (void *) vmalloc_to_phys(rmap);
   264	
   265		lock_rmap(rmap);
   266		if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH kernel v2 6/6] KVM: PPC: Add support for multiple-TCE hcalls
  2016-01-21  7:56     ` kbuild test robot
@ 2016-01-21  8:09       ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 48+ messages in thread
From: Alexey Kardashevskiy @ 2016-01-21  8:09 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, linuxppc-dev, Paul Mackerras, David Gibson, kvm-ppc, kvm

Right, this also depends on

69b907297f4e list: Add lockless list traversal primitives
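
For completeness, the other prerequisite from the cover letter ("powerpc:
Make vmalloc_to_phys() public") presumably provides a declaration along
these lines - an assumption based on how the patch uses it, and it would
also explain the int-to-pointer warning above, since without a prototype
the compiler assumes an int return:

	/* assumed shape; see the prerequisite patch for the actual definition */
	unsigned long vmalloc_to_phys(void *address);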



On 01/21/2016 06:56 PM, kbuild test robot wrote:
> Hi Alexey,
>
> [auto build test ERROR on kvm/linux-next]
> [also build test ERROR on v4.4 next-20160121]
> [if your patch is applied to the wrong git tree, please drop us a note to help improving the system]
>
> url:    https://github.com/0day-ci/linux/commits/Alexey-Kardashevskiy/KVM-PPC-Add-in-kernel-multitce-handling/20160121-154336
> base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
> config: powerpc-allyesconfig (attached as .config)
> reproduce:
>          wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>          chmod +x ~/bin/make.cross
>          # save the attached .config to linux build tree
>          make.cross ARCH=powerpc
>
> All error/warnings (new ones prefixed by >>):
>
>     arch/powerpc/kvm/book3s_64_vio_hv.c: In function 'kvmppc_find_table':
>     arch/powerpc/kvm/book3s_64_vio_hv.c:58:2: error: implicit declaration of function 'list_for_each_entry_lockless' [-Werror=implicit-function-declaration]
>       list_for_each_entry_lockless(stt, &kvm->arch.spapr_tce_tables, list)
>       ^
>     arch/powerpc/kvm/book3s_64_vio_hv.c:58:65: error: 'list' undeclared (first use in this function)
>       list_for_each_entry_lockless(stt, &kvm->arch.spapr_tce_tables, list)
>                                                                      ^
>     arch/powerpc/kvm/book3s_64_vio_hv.c:58:65: note: each undeclared identifier is reported only once for each function it appears in
>     arch/powerpc/kvm/book3s_64_vio_hv.c:59:3: error: expected ';' before 'if'
>        if (stt->liobn == liobn)
>        ^
>     arch/powerpc/kvm/book3s_64_vio_hv.c: In function 'kvmppc_rm_h_put_tce_indirect':
>>> arch/powerpc/kvm/book3s_64_vio_hv.c:263:18: error: implicit declaration of function 'vmalloc_to_phys' [-Werror=implicit-function-declaration]
>       rmap = (void *) vmalloc_to_phys(rmap);
>                       ^
>>> arch/powerpc/kvm/book3s_64_vio_hv.c:263:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
>       rmap = (void *) vmalloc_to_phys(rmap);
>              ^
>     cc1: some warnings being treated as errors
>
> vim +/vmalloc_to_phys +263 arch/powerpc/kvm/book3s_64_vio_hv.c
>
>     257		if (ret != H_SUCCESS)
>     258			return ret;
>     259	
>     260		if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
>     261			return H_TOO_HARD;
>     262	
>   > 263		rmap = (void *) vmalloc_to_phys(rmap);
>     264	
>     265		lock_rmap(rmap);
>     266		if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
>
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH kernel v2 1/6] KVM: PPC: Rework H_PUT_TCE/H_GET_TCE handlers
  2016-01-21  7:39   ` Alexey Kardashevskiy
@ 2016-01-22  0:42     ` David Gibson
  -1 siblings, 0 replies; 48+ messages in thread
From: David Gibson @ 2016-01-22  0:42 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, Paul Mackerras, kvm-ppc, kvm

On Thu, Jan 21, 2016 at 06:39:32PM +1100, Alexey Kardashevskiy wrote:
> This reworks the existing H_PUT_TCE/H_GET_TCE handlers to have following
> patches applied nicer.
> 
> This moves the ioba boundaries check to a helper and adds a check for
> least bits which have to be zeros.
> 
> The patch is pretty mechanical (only check for least ioba bits is added)
> so no change in behaviour is expected.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Concept looks good, but there are a couple of nits.

> ---
> Changelog:
> v2:
> * compare @ret with H_SUCCESS instead of assuming H_SUCCESS is zero
> * made error reporting cleaner
> ---
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 111 +++++++++++++++++++++++-------------
>  1 file changed, 72 insertions(+), 39 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 89e96b3..862f9a2 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -35,71 +35,104 @@
>  #include <asm/ppc-opcode.h>
>  #include <asm/kvm_host.h>
>  #include <asm/udbg.h>
> +#include <asm/iommu.h>
>  
>  #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>  
> +/*
> + * Finds a TCE table descriptor by LIOBN.
> + *
> + * WARNING: This will be called in real or virtual mode on HV KVM and virtual
> + *          mode on PR KVM
> + */
> +static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
> +		unsigned long liobn)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvmppc_spapr_tce_table *stt;
> +
> +	list_for_each_entry_lockless(stt, &kvm->arch.spapr_tce_tables, list)

list_for_each_entry_lockless?  According to the comments in the
header, that's for RCU-protected lists, whereas this one is just
protected by the lock in the kvm structure.  This is replacing a plain
list_for_each_entry().
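
For reference, the two idioms side by side, as a sketch in the context of
kvmppc_find_table() (the lockless variant is from the "list: Add lockless
list traversal primitives" patch this series depends on):

	/* plain traversal: every walker holds kvm->lock */
	mutex_lock(&kvm->lock);
	list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list)
		if (stt->liobn == liobn)
			break;
	mutex_unlock(&kvm->lock);

	/* RCU traversal: updaters synchronize, readers only mark a section */
	rcu_read_lock();
	list_for_each_entry_lockless(stt, &kvm->arch.spapr_tce_tables, list)
		if (stt->liobn == liobn)
			break;
	rcu_read_unlock();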


> +		if (stt->liobn == liobn)
> +			return stt;
> +
> +	return NULL;
> +}
> +
> +/*
> + * Validates IO address.
> + *
> + * WARNING: This will be called in real-mode on HV KVM and virtual
> + *          mode on PR KVM
> + */
> +static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
> +		unsigned long ioba, unsigned long npages)
> +{
> +	unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
> +	unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
> +	unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K;
> +
> +	if ((ioba & mask) || (idx + npages > size))

It doesn't matter for the current callers, but you should check for
overflow in idx + npages as well.
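
Concretely, one overflow-proof variant would be (a sketch, keeping the
existing return convention of kvmppc_ioba_validate()):

	if ((ioba & mask) || (idx + npages > size) ||
	    (idx + npages < idx))	/* idx + npages wrapped around */
		return H_PARAMETER;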

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH kernel v2 1/6] KVM: PPC: Rework H_PUT_TCE/H_GET_TCE handlers
  2016-01-22  0:42     ` David Gibson
@ 2016-01-22  1:59       ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 48+ messages in thread
From: Alexey Kardashevskiy @ 2016-01-22  1:59 UTC (permalink / raw)
  To: David Gibson; +Cc: linuxppc-dev, Paul Mackerras, kvm-ppc, kvm

On 01/22/2016 11:42 AM, David Gibson wrote:
> On Thu, Jan 21, 2016 at 06:39:32PM +1100, Alexey Kardashevskiy wrote:
>> This reworks the existing H_PUT_TCE/H_GET_TCE handlers to have following
>> patches applied nicer.
>>
>> This moves the ioba boundaries check to a helper and adds a check for
>> least bits which have to be zeros.
>>
>> The patch is pretty mechanical (only check for least ioba bits is added)
>> so no change in behaviour is expected.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
> Concept looks good, but there are a couple of nits.
>
>> ---
>> Changelog:
>> v2:
>> * compare @ret with H_SUCCESS instead of assuming H_SUCCESS is zero
>> * made error reporting cleaner
>> ---
>>   arch/powerpc/kvm/book3s_64_vio_hv.c | 111 +++++++++++++++++++++++-------------
>>   1 file changed, 72 insertions(+), 39 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> index 89e96b3..862f9a2 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> @@ -35,71 +35,104 @@
>>   #include <asm/ppc-opcode.h>
>>   #include <asm/kvm_host.h>
>>   #include <asm/udbg.h>
>> +#include <asm/iommu.h>
>>
>>   #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>>
>> +/*
>> + * Finds a TCE table descriptor by LIOBN.
>> + *
>> + * WARNING: This will be called in real or virtual mode on HV KVM and virtual
>> + *          mode on PR KVM
>> + */
>> +static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn)
>> +{
>> +	struct kvm *kvm = vcpu->kvm;
>> +	struct kvmppc_spapr_tce_table *stt;
>> +
>> +	list_for_each_entry_lockless(stt, &kvm->arch.spapr_tce_tables, list)
>
> list_for_each_entry_lockless?  According to the comments in the
> header, that's for RCU protected lists, whereas this one is just
> protected by the lock in the kvm structure.  This is replacing a plain
> list_for_each_entry().

My bad, the next patch should have done this
s/list_for_each_entry/list_for_each_entry_lockless/


>
>
>> +		if (stt->liobn == liobn)
>> +			return stt;
>> +
>> +	return NULL;
>> +}
>> +
>> +/*
>> + * Validates IO address.
>> + *
>> + * WARNING: This will be called in real-mode on HV KVM and virtual
>> + *          mode on PR KVM
>> + */
>> +static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
>> +		unsigned long ioba, unsigned long npages)
>> +{
>> +	unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
>> +	unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
>> +	unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K;
>> +
>> +	if ((ioba & mask) || (idx + npages > size))
>
> It doesn't matter for the current callers, but you should check for
> overflow in idx + npages as well.


npages can only be 1..512, and this is checked in the H_PUT_TCE/etc handlers.
idx is at most 52 bits long.
And this is not going to change because H_PUT_TCE_INDIRECT will always be 
limited by 512 (or one 4K page).
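
The two limits encode the same invariant: 512 * sizeof(u64) = 4096 bytes,
i.e. exactly one 4K page, which is what the SZ_4K alignment check on
@tce_list relies on. Stated as a compile-time check it would look like
this (just a sketch, not part of the patch):

	BUILD_BUG_ON(512 * sizeof(u64) != SZ_4K);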

Do I still need the overflow check here?


-- 
Alexey

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH kernel v2 1/6] KVM: PPC: Rework H_PUT_TCE/H_GET_TCE handlers
  2016-01-22  1:59       ` Alexey Kardashevskiy
@ 2016-01-24 23:43         ` David Gibson
  -1 siblings, 0 replies; 48+ messages in thread
From: David Gibson @ 2016-01-24 23:43 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, Paul Mackerras, kvm-ppc, kvm

On Fri, Jan 22, 2016 at 12:59:47PM +1100, Alexey Kardashevskiy wrote:
> On 01/22/2016 11:42 AM, David Gibson wrote:
> >On Thu, Jan 21, 2016 at 06:39:32PM +1100, Alexey Kardashevskiy wrote:
> >>This reworks the existing H_PUT_TCE/H_GET_TCE handlers to have following
> >>patches applied nicer.
> >>
> >>This moves the ioba boundaries check to a helper and adds a check for
> >>least bits which have to be zeros.
> >>
> >>The patch is pretty mechanical (only check for least ioba bits is added)
> >>so no change in behaviour is expected.
> >>
> >>Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >
> >Concept looks good, but there are a couple of nits.
> >
> >>---
> >>Changelog:
> >>v2:
> >>* compare @ret with H_SUCCESS instead of assuming H_SUCCESS is zero
> >>* made error reporting cleaner
> >>---
> >>  arch/powerpc/kvm/book3s_64_vio_hv.c | 111 +++++++++++++++++++++++-------------
> >>  1 file changed, 72 insertions(+), 39 deletions(-)
> >>
> >>diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> >>index 89e96b3..862f9a2 100644
> >>--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> >>+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> >>@@ -35,71 +35,104 @@
> >>  #include <asm/ppc-opcode.h>
> >>  #include <asm/kvm_host.h>
> >>  #include <asm/udbg.h>
> >>+#include <asm/iommu.h>
> >>
> >>  #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
> >>
> >>+/*
> >>+ * Finds a TCE table descriptor by LIOBN.
> >>+ *
> >>+ * WARNING: This will be called in real or virtual mode on HV KVM and virtual
> >>+ *          mode on PR KVM
> >>+ */
> >>+static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
> >>+		unsigned long liobn)
> >>+{
> >>+	struct kvm *kvm = vcpu->kvm;
> >>+	struct kvmppc_spapr_tce_table *stt;
> >>+
> >>+	list_for_each_entry_lockless(stt, &kvm->arch.spapr_tce_tables, list)
> >
> >list_for_each_entry_lockless?  According to the comments in the
> >header, that's for RCU protected lists, whereas this one is just
> >protected by the lock in the kvm structure.  This is replacing a plain
> >list_for_each_entry().
> 
> My bad, the next patch should have done this
> s/list_for_each_entry/list_for_each_entry_lockless/

Ah, yes.  I hadn't yet looked at the second patch.

> >>+		if (stt->liobn == liobn)
> >>+			return stt;
> >>+
> >>+	return NULL;
> >>+}
> >>+
> >>+/*
> >>+ * Validates IO address.
> >>+ *
> >>+ * WARNING: This will be called in real-mode on HV KVM and virtual
> >>+ *          mode on PR KVM
> >>+ */
> >>+static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
> >>+		unsigned long ioba, unsigned long npages)
> >>+{
> >>+	unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
> >>+	unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
> >>+	unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K;
> >>+
> >>+	if ((ioba & mask) || (idx + npages > size))
> >
> >It doesn't matter for the current callers, but you should check for
> >overflow in idx + npages as well.
> 
> 
> npages can only be 1..512, and this is checked in the H_PUT_TCE/etc handlers.
> idx is at most 52 bits long.
> And this is not going to change because H_PUT_TCE_INDIRECT will always be
> limited by 512 (or one 4K page).

Ah, ok.

> Do I still need the overflow check here?

Hm, I guess it's not essential.  I'd still prefer to see it though,
since it's good practice in general.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH kernel v2 2/6] KVM: PPC: Use RCU for arch.spapr_tce_tables
  2016-01-21  7:39   ` Alexey Kardashevskiy
@ 2016-01-24 23:46     ` David Gibson
  -1 siblings, 0 replies; 48+ messages in thread
From: David Gibson @ 2016-01-24 23:46 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, Paul Mackerras, kvm-ppc, kvm

On Thu, Jan 21, 2016 at 06:39:33PM +1100, Alexey Kardashevskiy wrote:
> At the moment spapr_tce_tables is not protected against races.

That's not really true - it's protected by the kvm->lock mutex.

> This makes
> use of the RCU variants of the list helpers. As some bits are executed in real
> mode, this makes use of the just-introduced list_for_each_entry_rcu_notrace().
> 
> This converts release_spapr_tce_table() to an RCU-scheduled handler.

The change itself is fine, though.

> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  arch/powerpc/include/asm/kvm_host.h |  1 +
>  arch/powerpc/kvm/book3s.c           |  2 +-
>  arch/powerpc/kvm/book3s_64_vio.c    | 20 +++++++++++---------
>  3 files changed, 13 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 271fefb..c7ee696 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -184,6 +184,7 @@ struct kvmppc_spapr_tce_table {
>  	struct kvm *kvm;
>  	u64 liobn;
>  	u32 window_size;
> +	struct rcu_head rcu;
>  	struct page *pages[0];
>  };
>  
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 638c6d9..b34220d 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -807,7 +807,7 @@ int kvmppc_core_init_vm(struct kvm *kvm)
>  {
>  
>  #ifdef CONFIG_PPC64
> -	INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
> +	INIT_LIST_HEAD_RCU(&kvm->arch.spapr_tce_tables);
>  	INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
>  #endif
>  
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 54cf9bc..9526c34 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -45,19 +45,16 @@ static long kvmppc_stt_npages(unsigned long window_size)
>  		     * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
>  }
>  
> -static void release_spapr_tce_table(struct kvmppc_spapr_tce_table *stt)
> +static void release_spapr_tce_table(struct rcu_head *head)
>  {
> -	struct kvm *kvm = stt->kvm;
> +	struct kvmppc_spapr_tce_table *stt = container_of(head,
> +			struct kvmppc_spapr_tce_table, rcu);
>  	int i;
>  
> -	mutex_lock(&kvm->lock);
> -	list_del(&stt->list);
>  	for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
>  		__free_page(stt->pages[i]);
> +
>  	kfree(stt);
> -	mutex_unlock(&kvm->lock);
> -
> -	kvm_put_kvm(kvm);
>  }
>  
>  static int kvm_spapr_tce_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> @@ -88,7 +85,12 @@ static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
>  {
>  	struct kvmppc_spapr_tce_table *stt = filp->private_data;
>  
> -	release_spapr_tce_table(stt);
> +	list_del_rcu(&stt->list);
> +
> +	kvm_put_kvm(stt->kvm);
> +
> +	call_rcu(&stt->rcu, release_spapr_tce_table);
> +
>  	return 0;
>  }
>  
> @@ -131,7 +133,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  	kvm_get_kvm(kvm);
>  
>  	mutex_lock(&kvm->lock);
> -	list_add(&stt->list, &kvm->arch.spapr_tce_tables);
> +	list_add_rcu(&stt->list, &kvm->arch.spapr_tce_tables);
>  
>  	mutex_unlock(&kvm->lock);
>  
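
For context, the lookup this enables on the reader side is the standard RCU
pattern. A minimal virtual-mode sketch, assuming the list is only ever
modified under kvm->lock as in the hunk above (find_table_rcu is a
hypothetical name, not a function from the series):

	static struct kvmppc_spapr_tce_table *find_table_rcu(struct kvm *kvm,
			unsigned long liobn)
	{
		struct kvmppc_spapr_tce_table *stt, *found = NULL;

		rcu_read_lock();
		list_for_each_entry_rcu(stt, &kvm->arch.spapr_tce_tables, list) {
			if (stt->liobn == liobn) {
				found = stt;
				break;
			}
		}
		rcu_read_unlock();

		return found;
	}

The call_rcu() in the release path is what guarantees that a reader which
found the stt before list_del_rcu() can finish with it before the memory
is freed.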

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH kernel v2 3/6] KVM: PPC: Account TCE-containing pages in locked_vm
  2016-01-21  7:39   ` Alexey Kardashevskiy
@ 2016-01-24 23:57     ` David Gibson
  -1 siblings, 0 replies; 48+ messages in thread
From: David Gibson @ 2016-01-24 23:57 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, Paul Mackerras, kvm-ppc, kvm

On Thu, Jan 21, 2016 at 06:39:34PM +1100, Alexey Kardashevskiy wrote:
> At the moment pages used for TCE tables (in addition to pages addressed
> by TCEs) are not counted in the locked_vm counter, so a malicious userspace
> tool can call ioctl(KVM_CREATE_SPAPR_TCE) as many times as RLIMIT_NOFILE
> allows and lock a lot of memory.
> 
> This adds counting for pages used for TCE tables.
> 
> This counts the number of pages required for a table plus pages for
> the kvmppc_spapr_tce_table struct (TCE table descriptor) itself.
> 
> This changes release_spapr_tce_table() to store @npages on stack to
> avoid calling kvmppc_stt_npages() in the loop (tiny optimization,
> probably).
> 
> This does not change the amount of (de)allocated memory.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v2:
> * switched from long to unsigned long types
> * added WARN_ON_ONCE() in locked_vm decrement case
> ---
>  arch/powerpc/kvm/book3s_64_vio.c | 55 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 52 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 9526c34..ea498b4 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -39,19 +39,62 @@
>  
>  #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>  
> -static long kvmppc_stt_npages(unsigned long window_size)
> +static unsigned long kvmppc_stt_npages(unsigned long window_size)
>  {
>  	return ALIGN((window_size >> SPAPR_TCE_SHIFT)
>  		     * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
>  }
>  
> +static long kvmppc_account_memlimit(unsigned long npages, bool inc)
> +{
> +	long ret = 0;
> +	const unsigned long bytes = sizeof(struct kvmppc_spapr_tce_table) +
> +			(npages * sizeof(struct page *));
> +	const unsigned long stt_pages = ALIGN(bytes, PAGE_SIZE) / PAGE_SIZE;

Urgh, this is made pretty hard to follow by the fact that in some
places npages / stt_pages refers to the number of pages occupied by
the actual TCE tables, and in other places to the number of pages
occupied by the overhead data structures.  Please use different (and
consistent) variables for the two things to make this clearer.

It also seems odd that the calculation of the overhead pages is done here
while the base number of pages is calculated in the caller, even though
both quantities come from the stt structure itself.
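
One way to address both points would be to derive the two quantities from
the stt in a single helper with distinct names. A sketch only, assuming
kvmppc_account_memlimit() is reduced to pure rlimit bookkeeping on whatever
page count it is given (kvmppc_account_stt is a hypothetical name):

	static long kvmppc_account_stt(struct kvmppc_spapr_tce_table *stt,
			bool inc)
	{
		/* pages holding the TCE entries themselves */
		unsigned long table_pages = kvmppc_stt_npages(stt->window_size);
		/* pages holding the descriptor and its page pointer array */
		unsigned long desc_bytes = sizeof(*stt) +
				table_pages * sizeof(struct page *);
		unsigned long desc_pages = ALIGN(desc_bytes, PAGE_SIZE) / PAGE_SIZE;

		return kvmppc_account_memlimit(table_pages + desc_pages, inc);
	}

The wrinkle is the create path, where accounting happens before the stt is
allocated, so that caller would still need to compute the same quantities
up front.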

> +	if (!current || !current->mm)
> +		return ret; /* process exited */
> +
> +	npages += stt_pages;
> +
> +	down_write(&current->mm->mmap_sem);
> +
> +	if (inc) {
> +		unsigned long locked, lock_limit;
> +
> +		locked = current->mm->locked_vm + npages;
> +		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> +		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
> +			ret = -ENOMEM;
> +		else
> +			current->mm->locked_vm += npages;
> +	} else {
> +		if (WARN_ON_ONCE(npages > current->mm->locked_vm))
> +			npages = current->mm->locked_vm;
> +
> +		current->mm->locked_vm -= npages;
> +	}
> +
> +	pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid,
> +			inc ? '+' : '-',
> +			npages << PAGE_SHIFT,
> +			current->mm->locked_vm << PAGE_SHIFT,
> +			rlimit(RLIMIT_MEMLOCK),
> +			ret ? " - exceeded" : "");
> +
> +	up_write(&current->mm->mmap_sem);
> +
> +	return ret;
> +}
> +
>  static void release_spapr_tce_table(struct rcu_head *head)
>  {
>  	struct kvmppc_spapr_tce_table *stt = container_of(head,
>  			struct kvmppc_spapr_tce_table, rcu);
>  	int i;
> +	unsigned long npages = kvmppc_stt_npages(stt->window_size);
>  
> -	for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
> +	for (i = 0; i < npages; i++)
>  		__free_page(stt->pages[i]);
>  
>  	kfree(stt);
> @@ -89,6 +132,7 @@ static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
>  
>  	kvm_put_kvm(stt->kvm);
>  
> +	kvmppc_account_memlimit(kvmppc_stt_npages(stt->window_size), false);
>  	call_rcu(&stt->rcu, release_spapr_tce_table);
>  
>  	return 0;
> @@ -103,7 +147,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  				   struct kvm_create_spapr_tce *args)
>  {
>  	struct kvmppc_spapr_tce_table *stt = NULL;
> -	long npages;
> +	unsigned long npages;
>  	int ret = -ENOMEM;
>  	int i;
>  
> @@ -114,6 +158,11 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  	}
>  
>  	npages = kvmppc_stt_npages(args->window_size);
> +	ret = kvmppc_account_memlimit(npages, true);
> +	if (ret) {
> +		stt = NULL;
> +		goto fail;
> +	}
>  
>  	stt = kzalloc(sizeof(*stt) + npages * sizeof(struct page *),
>  		      GFP_KERNEL);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH kernel v2 5/6] KVM: PPC: Move reusable bits of H_PUT_TCE handler to helpers
  2016-01-21  7:39   ` Alexey Kardashevskiy
@ 2016-01-25  0:12     ` David Gibson
  -1 siblings, 0 replies; 48+ messages in thread
From: David Gibson @ 2016-01-25  0:12 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, Paul Mackerras, kvm-ppc, kvm

On Thu, Jan 21, 2016 at 06:39:36PM +1100, Alexey Kardashevskiy wrote:
> Upcoming multi-tce support (the H_PUT_TCE_INDIRECT/H_STUFF_TCE hypercalls)
> will validate TCEs (that they contain no unexpected bits) and the IO address
> (that it is within the DMA window boundaries).
> 
> This introduces helpers to validate TCE and IO address. The helpers are
> exported as they compile into vmlinux (to work in realmode) and will be
> used later by KVM kernel module in virtual mode.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v2:
> * added note to the commit log about why new helpers are exported
> * did not add a note that xxx_validate() validates TCEs for KVM (not for
> host kernel DMA) as the helper names and file location tell what they
> are for
> ---
>  arch/powerpc/include/asm/kvm_ppc.h  |  4 ++
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 92 ++++++++++++++++++++++++++++++++-----
>  2 files changed, 84 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 2241d53..9513911 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -166,6 +166,10 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
>  
>  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  				struct kvm_create_spapr_tce *args);
> +extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
> +		unsigned long ioba, unsigned long npages);
> +extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
> +		unsigned long tce);
>  extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  			     unsigned long ioba, unsigned long tce);
>  extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index e142171..8cd3a95 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -36,6 +36,7 @@
>  #include <asm/kvm_host.h>
>  #include <asm/udbg.h>
>  #include <asm/iommu.h>
> +#include <asm/tce.h>
>  
>  #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>  
> @@ -64,18 +65,90 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>   * WARNING: This will be called in real-mode on HV KVM and virtual
>   *          mode on PR KVM
>   */
> -static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
> +long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
>  		unsigned long ioba, unsigned long npages)
>  {
> -	unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
> +	unsigned long mask = IOMMU_PAGE_MASK_4K;
>  	unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
>  	unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K;
>  
> -	if ((ioba & mask) || (idx + npages > size))
> +	if ((ioba & ~mask) || (idx + npages > size))
>  		return H_PARAMETER;
>  
>  	return H_SUCCESS;
>  }
> +EXPORT_SYMBOL_GPL(kvmppc_ioba_validate);
> +
> +/*
> + * Validates TCE address.
> + * At the moment flags and page mask are validated.
> + * As the host kernel does not access those addresses (just puts them
> + * to the table and user space is supposed to process them), we can skip
> + * checking other things (such as TCE is a guest RAM address or the page
> + * was actually allocated).
> + *
> + * WARNING: This will be called in real-mode on HV KVM and virtual
> + *          mode on PR KVM
> + */
> +long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt,
>  unsigned long tce)

It would be nice to write this in terms of kvmppc_ioba_validate() above.

> +{
> +	unsigned long mask = IOMMU_PAGE_MASK_4K | TCE_PCI_WRITE | TCE_PCI_READ;
> +
> +	if (tce & ~mask)
> +		return H_PARAMETER;
> +
> +	return H_SUCCESS;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_tce_validate);
> +
> +/* Note on the use of page_address() in real mode,
> + *
> + * It is safe to use page_address() in real mode on ppc64 because
> + * page_address() is always defined as lowmem_page_address()
> + * which returns __va(PFN_PHYS(page_to_pfn(page))) which is arithmetial
> + * operation and does not access page struct.
> + *
> + * Theoretically page_address() could be defined different
> + * but either WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL
> + * should be enabled.
> + * WANT_PAGE_VIRTUAL is never enabled on ppc32/ppc64,
> + * HASHED_PAGE_VIRTUAL could be enabled for ppc32 only and only
> + * if CONFIG_HIGHMEM is defined. As CONFIG_SPARSEMEM_VMEMMAP
> + * is not expected to be enabled on ppc32, page_address()
> + * is safe for ppc32 as well.
> + *
> + * WARNING: This will be called in real-mode on HV KVM and virtual
> + *          mode on PR KVM
> + */
> +static u64 *kvmppc_page_address(struct page *page)
> +{
> +#if defined(HASHED_PAGE_VIRTUAL) || defined(WANT_PAGE_VIRTUAL)
> +#error TODO: fix to avoid page_address() here
> +#endif
> +	return (u64 *) page_address(page);
> +}
> +
> +/*
> + * Handles TCE requests for emulated devices.
> + * Puts guest TCE values to the table and expects user space to convert them.
> + * Called in both real and virtual modes.
> + * Cannot fail so kvmppc_tce_validate must be called before it.
> + *
> + * WARNING: This will be called in real-mode on HV KVM and virtual
> + *          mode on PR KVM
> + */
> +void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
> +		unsigned long idx, unsigned long tce)
> +{
> +	struct page *page;
> +	u64 *tbl;
> +
> +	page = stt->pages[idx / TCES_PER_PAGE];
> +	tbl = kvmppc_page_address(page);
> +
> +	tbl[idx % TCES_PER_PAGE] = tce;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_tce_put);
>  
>  /* WARNING: This will be called in real-mode on HV KVM and virtual
>   *          mode on PR KVM
> @@ -85,9 +158,6 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  {
>  	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
>  	long ret;
> -	unsigned long idx;
> -	struct page *page;
> -	u64 *tbl;
>  
>  	/* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
>  	/* 	    liobn, ioba, tce); */
> @@ -99,13 +169,11 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  	if (ret != H_SUCCESS)
>  		return ret;
>  
> -	idx = ioba >> IOMMU_PAGE_SHIFT_4K;
> -	page = stt->pages[idx / TCES_PER_PAGE];
> -	tbl = (u64 *)page_address(page);
> +	ret = kvmppc_tce_validate(stt, tce);
> +	if (ret != H_SUCCESS)
> +		return ret;
>  
> -	/* FIXME: Need to validate the TCE itself */
> -	/* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
> -	tbl[idx % TCES_PER_PAGE] = tce;
> +	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
>  
>  	return H_SUCCESS;
>  }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH kernel v2 5/6] KVM: PPC: Move reusable bits of H_PUT_TCE handler to helpers
  2016-01-25  0:12     ` David Gibson
@ 2016-01-25  0:18       ` David Gibson
  -1 siblings, 0 replies; 48+ messages in thread
From: David Gibson @ 2016-01-25  0:18 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, Paul Mackerras, kvm-ppc, kvm

On Mon, Jan 25, 2016 at 11:12:36AM +1100, David Gibson wrote:
> On Thu, Jan 21, 2016 at 06:39:36PM +1100, Alexey Kardashevskiy wrote:
> > Upcoming multi-tce support (H_PUT_TCE_INDIRECT/H_STUFF_TCE hypercalls)
> > will validate TCE (not to have unexpected bits) and IO address
> > (to be within the DMA window boundaries).
> > 
> > This introduces helpers to validate TCE and IO address. The helpers are
> > exported as they compile into vmlinux (to work in realmode) and will be
> > used later by KVM kernel module in virtual mode.
> > 
> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > ---
> > Changes:
> > v2:
> > * added note to the commit log about why new helpers are exported
> > * did not add a note that xxx_validate() validate TCEs for KVM (not for
> > host kernel DMA) as the helper names and file location tell what are
> > they for
> > ---
> >  arch/powerpc/include/asm/kvm_ppc.h  |  4 ++
> >  arch/powerpc/kvm/book3s_64_vio_hv.c | 92 ++++++++++++++++++++++++++++++++-----
> >  2 files changed, 84 insertions(+), 12 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> > index 2241d53..9513911 100644
> > --- a/arch/powerpc/include/asm/kvm_ppc.h
> > +++ b/arch/powerpc/include/asm/kvm_ppc.h
> > @@ -166,6 +166,10 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
> >  
> >  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
> >  				struct kvm_create_spapr_tce *args);
> > +extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
> > +		unsigned long ioba, unsigned long npages);
> > +extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
> > +		unsigned long tce);
> >  extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> >  			     unsigned long ioba, unsigned long tce);
> >  extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> > diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> > index e142171..8cd3a95 100644
> > --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> > +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> > @@ -36,6 +36,7 @@
> >  #include <asm/kvm_host.h>
> >  #include <asm/udbg.h>
> >  #include <asm/iommu.h>
> > +#include <asm/tce.h>
> >  
> >  #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
> >  
> > @@ -64,18 +65,90 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
> >   * WARNING: This will be called in real-mode on HV KVM and virtual
> >   *          mode on PR KVM
> >   */
> > -static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
> > +long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
> >  		unsigned long ioba, unsigned long npages)
> >  {
> > -	unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
> > +	unsigned long mask = IOMMU_PAGE_MASK_4K;
> >  	unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
> >  	unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K;
> >  
> > -	if ((ioba & mask) || (idx + npages > size))
> > +	if ((ioba & ~mask) || (idx + npages > size))
> >  		return H_PARAMETER;
> >  
> >  	return H_SUCCESS;
> >  }
> > +EXPORT_SYMBOL_GPL(kvmppc_ioba_validate);
> > +
> > +/*
> > + * Validates TCE address.
> > + * At the moment flags and page mask are validated.
> > + * As the host kernel does not access those addresses (just puts them
> > + * to the table and user space is supposed to process them), we can skip
> > + * checking other things (such as TCE is a guest RAM address or the page
> > + * was actually allocated).
> > + *
> > + * WARNING: This will be called in real-mode on HV KVM and virtual
> > + *          mode on PR KVM
> > + */
> > +long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt,
> >  unsigned long tce)
> 
> It would be nice to write this in terms of kvmppc_ioba_validate() above.

Duh, sorry.  Realised shortly afterwards that's nonsense.  One is
looking at the IOBA, the other at the real address.
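
In other words, the two helpers validate different address spaces, roughly:

	kvmppc_ioba_validate(stt, ioba, npages); /* bus offset in the DMA window */
	kvmppc_tce_validate(stt, tce);           /* guest real address + R/W bits */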

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH kernel v2 6/6] KVM: PPC: Add support for multiple-TCE hcalls
  2016-01-21  7:39   ` Alexey Kardashevskiy
@ 2016-01-25  0:44     ` David Gibson
  -1 siblings, 0 replies; 48+ messages in thread
From: David Gibson @ 2016-01-25  0:44 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, Paul Mackerras, kvm-ppc, kvm

On Thu, Jan 21, 2016 at 06:39:37PM +1100, Alexey Kardashevskiy wrote:
> This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
> H_STUFF_TCE hypercalls for user-space-emulated devices such as IBMVIO
> devices or emulated PCI.  These calls allow adding multiple entries
> (up to 512) into the TCE table in one call, which saves time on
> transitions between kernel and user space.
> 
> This implements the KVM_CAP_PPC_MULTITCE capability. When present,
> the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE.
> If they cannot be handled by the kernel, they are passed on to
> user space. The user space still has to have an implementation
> for these.
> 
> Both HV and PR-style KVM are supported.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v2:
> * compare @ret with H_SUCCESS instead of assuming H_SUCCESS is zero
> * s/~IOMMU_PAGE_MASK_4K/SZ_4K-1/ when testing @tce_list
> ---
>  Documentation/virtual/kvm/api.txt       |  25 ++++++
>  arch/powerpc/include/asm/kvm_ppc.h      |  12 +++
>  arch/powerpc/kvm/book3s_64_vio.c        | 110 +++++++++++++++++++++++-
>  arch/powerpc/kvm/book3s_64_vio_hv.c     | 145 ++++++++++++++++++++++++++++++--
>  arch/powerpc/kvm/book3s_hv.c            |  26 +++++-
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   6 +-
>  arch/powerpc/kvm/book3s_pr_papr.c       |  35 ++++++++
>  arch/powerpc/kvm/powerpc.c              |   3 +
>  8 files changed, 349 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 07e4cdf..da39435 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3035,6 +3035,31 @@ Returns: 0 on success, -1 on error
>  
>  Queues an SMI on the thread's vcpu.
>  
> +4.97 KVM_CAP_PPC_MULTITCE
> +
> +Capability: KVM_CAP_PPC_MULTITCE
> +Architectures: ppc
> +Type: vm
> +
> +This capability means the kernel is capable of handling hypercalls
> +H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user
> +space. This significantly accelerates DMA operations for PPC KVM guests.
> +User space should expect that its handlers for these hypercalls
> +are not going to be called if user space previously registered LIOBN
> +in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
> +
> +In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
> +user space might have to advertise it for the guest. For example,
> +IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
> +present in the "ibm,hypertas-functions" device-tree property.
> +
> +The hypercalls mentioned above may or may not be processed successfully
> +in the kernel based fast path. If they can not be handled by the kernel,
> +they will get passed on to user space. So user space still has to have
> +an implementation for these despite the in kernel acceleration.
> +
> +This capability is always enabled.
> +
>  5. The kvm_run structure
>  ------------------------
>  
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 9513911..4cadee5 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -166,12 +166,24 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
>  
>  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  				struct kvm_create_spapr_tce *args);
> +extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
> +		struct kvm_vcpu *vcpu, unsigned long liobn);
>  extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
>  		unsigned long ioba, unsigned long npages);
>  extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
>  		unsigned long tce);
> +extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
> +		unsigned long *ua, unsigned long **prmap);

Putting a userspace address into an unsigned long is pretty nasty: it
should be a something __user *.
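
A sketch of what that could look like, assuming callers only ever read u64
TCEs through the result (this is not a signature from the series):

	extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
			u64 __user **ua, unsigned long **prmap);

with the virtual-mode call site then becoming something like:

	u64 __user *tces;

	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &tces, NULL)) {
		ret = H_TOO_HARD;
		goto unlock_exit;
	}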

> +extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
> +		unsigned long idx, unsigned long tce);
>  extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  			     unsigned long ioba, unsigned long tce);
> +extern long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_list, unsigned long npages);
> +extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_value, unsigned long npages);
>  extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  			     unsigned long ioba);
>  extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 975f0ab..987f406 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -14,6 +14,7 @@
>   *
>   * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>   * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
> + * Copyright 2016 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>   */
>  
>  #include <linux/types.h>
> @@ -37,8 +38,7 @@
>  #include <asm/kvm_host.h>
>  #include <asm/udbg.h>
>  #include <asm/iommu.h>
> -
> -#define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
> +#include <asm/tce.h>
>  
>  static unsigned long kvmppc_stt_npages(unsigned long window_size)
>  {
> @@ -200,3 +200,109 @@ fail:
>  	}
>  	return ret;
>  }
> +
> +long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce)
> +{
> +	long ret;
> +	struct kvmppc_spapr_tce_table *stt;
> +
> +	stt = kvmppc_find_table(vcpu, liobn);
> +	if (!stt)
> +		return H_TOO_HARD;
> +
> +	ret = kvmppc_ioba_validate(stt, ioba, 1);
> +	if (ret != H_SUCCESS)
> +		return ret;
> +
> +	ret = kvmppc_tce_validate(stt, tce);
> +	if (ret != H_SUCCESS)
> +		return ret;
> +
> +	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
> +
> +	return H_SUCCESS;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
> +
> +long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_list, unsigned long npages)
> +{
> +	struct kvmppc_spapr_tce_table *stt;
> +	long i, ret = H_SUCCESS, idx;
> +	unsigned long entry, ua = 0;
> +	u64 __user *tces, tce;
> +
> +	stt = kvmppc_find_table(vcpu, liobn);
> +	if (!stt)
> +		return H_TOO_HARD;
> +
> +	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
> +	/*
> +	 * SPAPR spec says that the maximum size of the list is 512 TCEs
> +	 * so the whole table fits in 4K page
> +	 */
> +	if (npages > 512)
> +		return H_PARAMETER;
> +
> +	if (tce_list & (SZ_4K - 1))
> +		return H_PARAMETER;
> +
> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
> +	if (ret != H_SUCCESS)
> +		return ret;
> +
> +	idx = srcu_read_lock(&vcpu->kvm->srcu);
> +	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL)) {
> +		ret = H_TOO_HARD;
> +		goto unlock_exit;
> +	}
> +	tces = (u64 __user *) ua;
> +
> +	for (i = 0; i < npages; ++i) {
> +		if (get_user(tce, tces + i)) {
> +			ret = H_PARAMETER;
> +			goto unlock_exit;
> +		}
> +		tce = be64_to_cpu(tce);
> +
> +		ret = kvmppc_tce_validate(stt, tce);
> +		if (ret != H_SUCCESS)
> +			goto unlock_exit;
> +
> +		kvmppc_tce_put(stt, entry + i, tce);
> +	}
> +
> +unlock_exit:
> +	srcu_read_unlock(&vcpu->kvm->srcu, idx);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_h_put_tce_indirect);
> +
> +long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_value, unsigned long npages)
> +{
> +	struct kvmppc_spapr_tce_table *stt;
> +	long i, ret;
> +
> +	stt = kvmppc_find_table(vcpu, liobn);
> +	if (!stt)
> +		return H_TOO_HARD;
> +
> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
> +	if (ret != H_SUCCESS)
> +		return ret;
> +
> +	if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
> +		return H_PARAMETER;

Do we really need to allow no-permission but non-zero TCEs?
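
If not, the stricter alternative would be to reject any non-zero clearing
value outright (a sketch of that alternative, not what the patch does):

	/* only an all-zero TCE may be used to clear entries */
	if (tce_value)
		return H_PARAMETER;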

> +
> +	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
> +		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
> +
> +	return H_SUCCESS;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 8cd3a95..58c63ed 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -14,6 +14,7 @@
>   *
>   * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>   * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
> + * Copyright 2016 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>   */
>  
>  #include <linux/types.h>
> @@ -30,6 +31,7 @@
>  #include <asm/kvm_ppc.h>
>  #include <asm/kvm_book3s.h>
>  #include <asm/mmu-hash64.h>
> +#include <asm/mmu_context.h>
>  #include <asm/hvcall.h>
>  #include <asm/synch.h>
>  #include <asm/ppc-opcode.h>
> @@ -37,6 +39,7 @@
>  #include <asm/udbg.h>
>  #include <asm/iommu.h>
>  #include <asm/tce.h>
> +#include <asm/iommu.h>
>  
>  #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>  
> @@ -46,7 +49,7 @@
>   * WARNING: This will be called in real or virtual mode on HV KVM and virtual
>   *          mode on PR KVM
>   */
> -static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
> +struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>  		unsigned long liobn)
>  {
>  	struct kvm *kvm = vcpu->kvm;
> @@ -58,6 +61,7 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>  
>  	return NULL;
>  }
> +EXPORT_SYMBOL_GPL(kvmppc_find_table);
>  
>  /*
>   * Validates IO address.
> @@ -150,11 +154,31 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_tce_put);
>  
> -/* WARNING: This will be called in real-mode on HV KVM and virtual
> - *          mode on PR KVM
> - */
> -long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> -		      unsigned long ioba, unsigned long tce)
> +long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
> +		unsigned long *ua, unsigned long **prmap)
> +{
> +	unsigned long gfn = gpa >> PAGE_SHIFT;
> +	struct kvm_memory_slot *memslot;
> +
> +	memslot = search_memslots(kvm_memslots(kvm), gfn);
> +	if (!memslot)
> +		return -EINVAL;
> +
> +	*ua = __gfn_to_hva_memslot(memslot, gfn) |
> +		(gpa & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
> +
> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE

It's a bit odd to see a test for HV_POSSIBLE in a file named
book3s_64_vio_hv.c


> +	if (prmap)
> +		*prmap = &memslot->arch.rmap[gfn - memslot->base_gfn];
> +#endif
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
> +
> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> +		unsigned long ioba, unsigned long tce)
>  {
>  	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
>  	long ret;
> @@ -177,7 +201,112 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  
>  	return H_SUCCESS;
>  }
> -EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
> +
> +static long kvmppc_rm_ua_to_hpa(struct kvm_vcpu *vcpu,
> +		unsigned long ua, unsigned long *phpa)

ua should be a something __user * rather than an unsigned long.  And
come to that hpa should be a something * rather than an unsigned long.
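
I.e. something along these lines (a sketch only; the exact types are
debatable):

	static long kvmppc_rm_ua_to_hpa(struct kvm_vcpu *vcpu,
			void __user *ua, void **phpa);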

> +{
> +	pte_t *ptep, pte;
> +	unsigned shift = 0;
> +
> +	ptep = __find_linux_pte_or_hugepte(vcpu->arch.pgdir, ua, NULL, &shift);
> +	if (!ptep || !pte_present(*ptep))
> +		return -ENXIO;
> +	pte = *ptep;
> +
> +	if (!shift)
> +		shift = PAGE_SHIFT;
> +
> +	/* Avoid handling anything potentially complicated in realmode */
> +	if (shift > PAGE_SHIFT)
> +		return -EAGAIN;
> +
> +	if (!pte_young(pte))
> +		return -EAGAIN;
> +
> +	*phpa = (pte_pfn(pte) << PAGE_SHIFT) | (ua & ((1ULL << shift) - 1)) |
> +			(ua & ~PAGE_MASK);
> +
> +	return 0;
> +}
> +
> +long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_list,	unsigned long npages)
> +{
> +	struct kvmppc_spapr_tce_table *stt;
> +	long i, ret = H_SUCCESS;
> +	unsigned long tces, entry, ua = 0;
> +	unsigned long *rmap = NULL;
> +
> +	stt = kvmppc_find_table(vcpu, liobn);
> +	if (!stt)
> +		return H_TOO_HARD;
> +
> +	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
> +	/*
> +	 * The spec says that the maximum size of the list is 512 TCEs
> +	 * so the whole table fits in a single 4K page
> +	 */
> +	if (npages > 512)
> +		return H_PARAMETER;
> +
> +	if (tce_list & (SZ_4K - 1))
> +		return H_PARAMETER;
> +
> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
> +	if (ret != H_SUCCESS)
> +		return ret;
> +
> +	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
> +		return H_TOO_HARD;
> +
> +	rmap = (void *) vmalloc_to_phys(rmap);
> +
> +	lock_rmap(rmap);
> +	if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
> +		ret = H_TOO_HARD;
> +		goto unlock_exit;
> +	}
> +
> +	for (i = 0; i < npages; ++i) {
> +		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
> +
> +		ret = kvmppc_tce_validate(stt, tce);
> +		if (ret != H_SUCCESS)
> +			goto unlock_exit;
> +
> +		kvmppc_tce_put(stt, entry + i, tce);
> +	}
> +
> +unlock_exit:
> +	unlock_rmap(rmap);
> +
> +	return ret;
> +}
> +
> +long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_value, unsigned long npages)

Unlike put_indirect, this code appears to be identical to the non
realmode code - can you combine them?
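
E.g. (a sketch, assuming the two bodies really do stay identical): keep a
single copy in book3s_64_vio_hv.c, which is built real-mode safe, and
reduce the realmode entry point to a trivial wrapper:

	long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
			unsigned long liobn, unsigned long ioba,
			unsigned long tce_value, unsigned long npages)
	{
		/* shared body would live next to this in book3s_64_vio_hv.c */
		return kvmppc_h_stuff_tce(vcpu, liobn, ioba, tce_value, npages);
	}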

> +{
> +	struct kvmppc_spapr_tce_table *stt;
> +	long i, ret;
> +
> +	stt = kvmppc_find_table(vcpu, liobn);
> +	if (!stt)
> +		return H_TOO_HARD;
> +
> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
> +	if (ret != H_SUCCESS)
> +		return ret;
> +
> +	if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
> +		return H_PARAMETER;
> +
> +	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
> +		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
> +
> +	return H_SUCCESS;
> +}
>  
>  long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  		unsigned long ioba)
> @@ -204,3 +333,5 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  	return H_SUCCESS;
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_h_get_tce);
> +
> +#endif /* KVM_BOOK3S_HV_POSSIBLE */
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index cff207b..df3fbae 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -768,7 +768,31 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>  		if (kvmppc_xics_enabled(vcpu)) {
>  			ret = kvmppc_xics_hcall(vcpu, req);
>  			break;
> -		} /* fallthrough */
> +		}
> +		return RESUME_HOST;
> +	case H_PUT_TCE:
> +		ret = kvmppc_h_put_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
> +						kvmppc_get_gpr(vcpu, 5),
> +						kvmppc_get_gpr(vcpu, 6));
> +		if (ret == H_TOO_HARD)
> +			return RESUME_HOST;
> +		break;
> +	case H_PUT_TCE_INDIRECT:
> +		ret = kvmppc_h_put_tce_indirect(vcpu, kvmppc_get_gpr(vcpu, 4),
> +						kvmppc_get_gpr(vcpu, 5),
> +						kvmppc_get_gpr(vcpu, 6),
> +						kvmppc_get_gpr(vcpu, 7));
> +		if (ret == H_TOO_HARD)
> +			return RESUME_HOST;
> +		break;
> +	case H_STUFF_TCE:
> +		ret = kvmppc_h_stuff_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
> +						kvmppc_get_gpr(vcpu, 5),
> +						kvmppc_get_gpr(vcpu, 6),
> +						kvmppc_get_gpr(vcpu, 7));
> +		if (ret == H_TOO_HARD)
> +			return RESUME_HOST;
> +		break;
>  	default:
>  		return RESUME_HOST;
>  	}
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 3c6badc..3bf6e72 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -1928,7 +1928,7 @@ hcall_real_table:
>  	.long	DOTSYM(kvmppc_h_clear_ref) - hcall_real_table
>  	.long	DOTSYM(kvmppc_h_protect) - hcall_real_table
>  	.long	DOTSYM(kvmppc_h_get_tce) - hcall_real_table
> -	.long	DOTSYM(kvmppc_h_put_tce) - hcall_real_table
> +	.long	DOTSYM(kvmppc_rm_h_put_tce) - hcall_real_table
>  	.long	0		/* 0x24 - H_SET_SPRG0 */
>  	.long	DOTSYM(kvmppc_h_set_dabr) - hcall_real_table
>  	.long	0		/* 0x2c */
> @@ -2006,8 +2006,8 @@ hcall_real_table:
>  	.long	0		/* 0x12c */
>  	.long	0		/* 0x130 */
>  	.long	DOTSYM(kvmppc_h_set_xdabr) - hcall_real_table
> -	.long	0		/* 0x138 */
> -	.long	0		/* 0x13c */
> +	.long	DOTSYM(kvmppc_rm_h_stuff_tce) - hcall_real_table
> +	.long	DOTSYM(kvmppc_rm_h_put_tce_indirect) - hcall_real_table
>  	.long	0		/* 0x140 */
>  	.long	0		/* 0x144 */
>  	.long	0		/* 0x148 */
> diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
> index f2c75a1..02176fd 100644
> --- a/arch/powerpc/kvm/book3s_pr_papr.c
> +++ b/arch/powerpc/kvm/book3s_pr_papr.c
> @@ -280,6 +280,37 @@ static int kvmppc_h_pr_logical_ci_store(struct kvm_vcpu *vcpu)
>  	return EMULATE_DONE;
>  }
>  
> +static int kvmppc_h_pr_put_tce_indirect(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
> +	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
> +	unsigned long tce = kvmppc_get_gpr(vcpu, 6);
> +	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
> +	long rc;
> +
> +	rc = kvmppc_h_put_tce_indirect(vcpu, liobn, ioba,
> +			tce, npages);
> +	if (rc == H_TOO_HARD)
> +		return EMULATE_FAIL;
> +	kvmppc_set_gpr(vcpu, 3, rc);
> +	return EMULATE_DONE;
> +}
> +
> +static int kvmppc_h_pr_stuff_tce(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
> +	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
> +	unsigned long tce_value = kvmppc_get_gpr(vcpu, 6);
> +	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
> +	long rc;
> +
> +	rc = kvmppc_h_stuff_tce(vcpu, liobn, ioba, tce_value, npages);
> +	if (rc == H_TOO_HARD)
> +		return EMULATE_FAIL;
> +	kvmppc_set_gpr(vcpu, 3, rc);
> +	return EMULATE_DONE;
> +}
> +
>  static int kvmppc_h_pr_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
>  {
>  	long rc = kvmppc_xics_hcall(vcpu, cmd);
> @@ -306,6 +337,10 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
>  		return kvmppc_h_pr_bulk_remove(vcpu);
>  	case H_PUT_TCE:
>  		return kvmppc_h_pr_put_tce(vcpu);
> +	case H_PUT_TCE_INDIRECT:
> +		return kvmppc_h_pr_put_tce_indirect(vcpu);
> +	case H_STUFF_TCE:
> +		return kvmppc_h_pr_stuff_tce(vcpu);
>  	case H_CEDE:
>  		kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
>  		kvm_vcpu_block(vcpu);
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 6fd2405..164735c 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -569,6 +569,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_PPC_GET_SMMU_INFO:
>  		r = 1;
>  		break;
> +	case KVM_CAP_SPAPR_MULTITCE:
> +		r = 1;
> +		break;

Hmm, usual practice has been not to enable new KVM hcalls, unless
userspace (qemu) explicitly enables them with ENABLE_HCALL.  I don't
see an obvious way this extension could break, but it's probably
safest to continue that pattern.

>  #endif
>  	default:
>  		r = 0;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [PATCH kernel v2 6/6] KVM: PPC: Add support for multiple-TCE hcalls
  2016-01-25  0:44     ` David Gibson
@ 2016-01-25  1:24       ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 48+ messages in thread
From: Alexey Kardashevskiy @ 2016-01-25  1:24 UTC (permalink / raw)
  To: David Gibson; +Cc: linuxppc-dev, Paul Mackerras, kvm-ppc, kvm

On 01/25/2016 11:44 AM, David Gibson wrote:
> On Thu, Jan 21, 2016 at 06:39:37PM +1100, Alexey Kardashevskiy wrote:
>> This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
>> H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
>> devices or emulated PCI.  These calls allow adding multiple entries
>> (up to 512) into the TCE table in one call which saves time on
>> transition between kernel and user space.
>>
>> This implements the KVM_CAP_PPC_MULTITCE capability. When present,
>> the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE.
>> If they can not be handled by the kernel, they are passed on to
>> the user space. The user space still has to have an implementation
>> for these.
>>
>> Both HV and PR-style KVM are supported.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>> Changes:
>> v2:
>> * compare @ret with H_SUCCESS instead of assuming H_SUCCESS is zero
>> * s/~IOMMU_PAGE_MASK_4K/SZ_4K-1/ when testing @tce_list
>> ---
>>   Documentation/virtual/kvm/api.txt       |  25 ++++++
>>   arch/powerpc/include/asm/kvm_ppc.h      |  12 +++
>>   arch/powerpc/kvm/book3s_64_vio.c        | 110 +++++++++++++++++++++++-
>>   arch/powerpc/kvm/book3s_64_vio_hv.c     | 145 ++++++++++++++++++++++++++++++--
>>   arch/powerpc/kvm/book3s_hv.c            |  26 +++++-
>>   arch/powerpc/kvm/book3s_hv_rmhandlers.S |   6 +-
>>   arch/powerpc/kvm/book3s_pr_papr.c       |  35 ++++++++
>>   arch/powerpc/kvm/powerpc.c              |   3 +
>>   8 files changed, 349 insertions(+), 13 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index 07e4cdf..da39435 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -3035,6 +3035,31 @@ Returns: 0 on success, -1 on error
>>
>>   Queues an SMI on the thread's vcpu.
>>
>> +4.97 KVM_CAP_PPC_MULTITCE
>> +
>> +Capability: KVM_CAP_PPC_MULTITCE
>> +Architectures: ppc
>> +Type: vm
>> +
>> +This capability means the kernel is capable of handling hypercalls
>> +H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user
>> +space. This significantly accelerates DMA operations for PPC KVM guests.
>> +User space should expect that its handlers for these hypercalls
>> +are not going to be called if user space previously registered LIOBN
>> +in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
>> +
>> +In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
>> +user space might have to advertise it for the guest. For example,
>> +IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
>> +present in the "ibm,hypertas-functions" device-tree property.
>> +
>> +The hypercalls mentioned above may or may not be processed successfully
>> +in the kernel based fast path. If they can not be handled by the kernel,
>> +they will get passed on to user space. So user space still has to have
>> +an implementation for these despite the in kernel acceleration.
>> +
>> +This capability is always enabled.
>> +
>>   5. The kvm_run structure
>>   ------------------------
>>
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
>> index 9513911..4cadee5 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -166,12 +166,24 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
>>
>>   extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>>   				struct kvm_create_spapr_tce *args);
>> +extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
>> +		struct kvm_vcpu *vcpu, unsigned long liobn);
>>   extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
>>   		unsigned long ioba, unsigned long npages);
>>   extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
>>   		unsigned long tce);
>> +extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
>> +		unsigned long *ua, unsigned long **prmap);
>
> Putting a userspace address into an unsigned long is pretty nasty: it
> should be a something __user *.
>
>> +extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
>> +		unsigned long idx, unsigned long tce);
>>   extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>   			     unsigned long ioba, unsigned long tce);
>> +extern long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_list, unsigned long npages);
>> +extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_value, unsigned long npages);
>>   extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>   			     unsigned long ioba);
>>   extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index 975f0ab..987f406 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -14,6 +14,7 @@
>>    *
>>    * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>>    * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
>> + * Copyright 2016 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>>    */
>>
>>   #include <linux/types.h>
>> @@ -37,8 +38,7 @@
>>   #include <asm/kvm_host.h>
>>   #include <asm/udbg.h>
>>   #include <asm/iommu.h>
>> -
>> -#define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>> +#include <asm/tce.h>
>>
>>   static unsigned long kvmppc_stt_npages(unsigned long window_size)
>>   {
>> @@ -200,3 +200,109 @@ fail:
>>   	}
>>   	return ret;
>>   }
>> +
>> +long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce)
>> +{
>> +	long ret;
>> +	struct kvmppc_spapr_tce_table *stt;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, 1);
>> +	if (ret != H_SUCCESS)
>> +		return ret;
>> +
>> +	ret = kvmppc_tce_validate(stt, tce);
>> +	if (ret != H_SUCCESS)
>> +		return ret;
>> +
>> +	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
>> +
>> +	return H_SUCCESS;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
>> +
>> +long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_list, unsigned long npages)
>> +{
>> +	struct kvmppc_spapr_tce_table *stt;
>> +	long i, ret = H_SUCCESS, idx;
>> +	unsigned long entry, ua = 0;
>> +	u64 __user *tces, tce;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
>> +	/*
>> +	 * SPAPR spec says that the maximum size of the list is 512 TCEs
>> +	 * so the whole table fits in a single 4K page
>> +	 */
>> +	if (npages > 512)
>> +		return H_PARAMETER;
>> +
>> +	if (tce_list & (SZ_4K - 1))
>> +		return H_PARAMETER;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
>> +	if (ret != H_SUCCESS)
>> +		return ret;
>> +
>> +	idx = srcu_read_lock(&vcpu->kvm->srcu);
>> +	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL)) {
>> +		ret = H_TOO_HARD;
>> +		goto unlock_exit;
>> +	}
>> +	tces = (u64 __user *) ua;
>> +
>> +	for (i = 0; i < npages; ++i) {
>> +		if (get_user(tce, tces + i)) {
>> +			ret = H_PARAMETER;
>> +			goto unlock_exit;
>> +		}
>> +		tce = be64_to_cpu(tce);
>> +
>> +		ret = kvmppc_tce_validate(stt, tce);
>> +		if (ret != H_SUCCESS)
>> +			goto unlock_exit;
>> +
>> +		kvmppc_tce_put(stt, entry + i, tce);
>> +	}
>> +
>> +unlock_exit:
>> +	srcu_read_unlock(&vcpu->kvm->srcu, idx);
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_h_put_tce_indirect);
>> +
>> +long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_value, unsigned long npages)
>> +{
>> +	struct kvmppc_spapr_tce_table *stt;
>> +	long i, ret;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
>> +	if (ret != H_SUCCESS)
>> +		return ret;
>> +
>> +	if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
>> +		return H_PARAMETER;
>
> Do we really need to allow no-permission but non-zero TCEs?

Not sure; one could want to poison the table for debugging purposes.
Totally useless?
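
(Hypothetical example of what I mean: a guest chasing stray DMA could
stuff a recognizable no-access pattern instead of zeroes,

	/* poison: non-zero, but TCE_PCI_READ/TCE_PCI_WRITE both clear */
	plpar_tce_stuff(liobn, ioba, 0x00dead0000000000UL, npages);

so bogus entries stand out in a table dump.)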


>> +
>> +	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
>> +		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
>> +
>> +	return H_SUCCESS;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> index 8cd3a95..58c63ed 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> @@ -14,6 +14,7 @@
>>    *
>>    * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>>    * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
>> + * Copyright 2016 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>>    */
>>
>>   #include <linux/types.h>
>> @@ -30,6 +31,7 @@
>>   #include <asm/kvm_ppc.h>
>>   #include <asm/kvm_book3s.h>
>>   #include <asm/mmu-hash64.h>
>> +#include <asm/mmu_context.h>
>>   #include <asm/hvcall.h>
>>   #include <asm/synch.h>
>>   #include <asm/ppc-opcode.h>
>> @@ -37,6 +39,7 @@
>>   #include <asm/udbg.h>
>>   #include <asm/iommu.h>
>>   #include <asm/tce.h>
>> +#include <asm/iommu.h>
>>
>>   #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>>
>> @@ -46,7 +49,7 @@
>>    * WARNING: This will be called in real or virtual mode on HV KVM and virtual
>>    *          mode on PR KVM
>>    */
>> -static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>> +struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>>   		unsigned long liobn)
>>   {
>>   	struct kvm *kvm = vcpu->kvm;
>> @@ -58,6 +61,7 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>>
>>   	return NULL;
>>   }
>> +EXPORT_SYMBOL_GPL(kvmppc_find_table);
>>
>>   /*
>>    * Validates IO address.
>> @@ -150,11 +154,31 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
>>   }
>>   EXPORT_SYMBOL_GPL(kvmppc_tce_put);
>>
>> -/* WARNING: This will be called in real-mode on HV KVM and virtual
>> - *          mode on PR KVM
>> - */
>> -long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>> -		      unsigned long ioba, unsigned long tce)
>> +long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
>> +		unsigned long *ua, unsigned long **prmap)
>> +{
>> +	unsigned long gfn = gpa >> PAGE_SHIFT;
>> +	struct kvm_memory_slot *memslot;
>> +
>> +	memslot = search_memslots(kvm_memslots(kvm), gfn);
>> +	if (!memslot)
>> +		return -EINVAL;
>> +
>> +	*ua = __gfn_to_hva_memslot(memslot, gfn) |
>> +		(gpa & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
>> +
>> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>
> It's a bit odd to see a test for HV_POSSIBLE in a file named
> book3s_64_vio_hv.c


True, the file name should probably have been changed to book3s_64_vio_rm.c.


>
>> +	if (prmap)
>> +		*prmap = &memslot->arch.rmap[gfn - memslot->base_gfn];
>> +#endif
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
>> +
>> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>> +long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>> +		unsigned long ioba, unsigned long tce)
>>   {
>>   	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
>>   	long ret;
>> @@ -177,7 +201,112 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>
>>   	return H_SUCCESS;
>>   }
>> -EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
>> +
>> +static long kvmppc_rm_ua_to_hpa(struct kvm_vcpu *vcpu,
>> +		unsigned long ua, unsigned long *phpa)
>
> ua should be a something __user * rather than an unsigned long.  And
> come to that hpa should be a something * rather than an unsigned long.


@ua matches the return type of __gfn_to_hva_memslot() so I kept it. Also, the
only place where I actually read from this address is the virtual-mode
H_PUT_TCE_INDIRECT handler; all other places just do translation with it, so
making it "unsigned long" saves some type conversions. It is also used in
mm_iommu_ua_to_hpa() which is upstream now (commit 15b244a88e1b289, part
of the DDW and preregistration patchset). Do I still need to change it?

Regarding @phpa, the agreement here is that we use "void *" for 0xC000...
style addresses, and "unsigned long" when the top 4 bits are not set, as
dereferencing such pointers will normally fail.
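
A quick illustration of that convention (not from the patch):

	void *lin = __va(hpa);       /* 0xC000... linear map, dereferencable */
	unsigned long raw = hpa;     /* raw address, never dereferenced */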



>> +{
>> +	pte_t *ptep, pte;
>> +	unsigned shift = 0;
>> +
>> +	ptep = __find_linux_pte_or_hugepte(vcpu->arch.pgdir, ua, NULL, &shift);
>> +	if (!ptep || !pte_present(*ptep))
>> +		return -ENXIO;
>> +	pte = *ptep;
>> +
>> +	if (!shift)
>> +		shift = PAGE_SHIFT;
>> +
>> +	/* Avoid handling anything potentially complicated in realmode */
>> +	if (shift > PAGE_SHIFT)
>> +		return -EAGAIN;
>> +
>> +	if (!pte_young(pte))
>> +		return -EAGAIN;
>> +
>> +	*phpa = (pte_pfn(pte) << PAGE_SHIFT) | (ua & ((1ULL << shift) - 1)) |
>> +			(ua & ~PAGE_MASK);
>> +
>> +	return 0;
>> +}
>> +
>> +long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_list,	unsigned long npages)
>> +{
>> +	struct kvmppc_spapr_tce_table *stt;
>> +	long i, ret = H_SUCCESS;
>> +	unsigned long tces, entry, ua = 0;
>> +	unsigned long *rmap = NULL;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
>> +	/*
>> +	 * The spec says that the maximum size of the list is 512 TCEs
>> +	 * so the whole table fits in a single 4K page
>> +	 */
>> +	if (npages > 512)
>> +		return H_PARAMETER;
>> +
>> +	if (tce_list & (SZ_4K - 1))
>> +		return H_PARAMETER;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
>> +	if (ret != H_SUCCESS)
>> +		return ret;
>> +
>> +	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
>> +		return H_TOO_HARD;
>> +
>> +	rmap = (void *) vmalloc_to_phys(rmap);
>> +
>> +	lock_rmap(rmap);
>> +	if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
>> +		ret = H_TOO_HARD;
>> +		goto unlock_exit;
>> +	}
>> +
>> +	for (i = 0; i < npages; ++i) {
>> +		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
>> +
>> +		ret = kvmppc_tce_validate(stt, tce);
>> +		if (ret != H_SUCCESS)
>> +			goto unlock_exit;
>> +
>> +		kvmppc_tce_put(stt, entry + i, tce);
>> +	}
>> +
>> +unlock_exit:
>> +	unlock_rmap(rmap);
>> +
>> +	return ret;
>> +}
>> +
>> +long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_value, unsigned long npages)
>
> Unlike put_indirect, this code appears to be identical to the non
> realmode code - can you combine them?


It is identical at this point, but it will grow different bits in "KVM: PPC:
vfio kvm device: support spapr tce" later; maybe I will eventually manage to
get to that part...


>> +{
>> +	struct kvmppc_spapr_tce_table *stt;
>> +	long i, ret;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
>> +	if (ret != H_SUCCESS)
>> +		return ret;
>> +
>> +	if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
>> +		return H_PARAMETER;
>> +
>> +	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
>> +		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
>> +
>> +	return H_SUCCESS;
>> +}
>>
>>   long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>   		unsigned long ioba)
>> @@ -204,3 +333,5 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>   	return H_SUCCESS;
>>   }
>>   EXPORT_SYMBOL_GPL(kvmppc_h_get_tce);
>> +
>> +#endif /* KVM_BOOK3S_HV_POSSIBLE */
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index cff207b..df3fbae 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -768,7 +768,31 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>>   		if (kvmppc_xics_enabled(vcpu)) {
>>   			ret = kvmppc_xics_hcall(vcpu, req);
>>   			break;
>> -		} /* fallthrough */
>> +		}
>> +		return RESUME_HOST;
>> +	case H_PUT_TCE:
>> +		ret = kvmppc_h_put_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
>> +						kvmppc_get_gpr(vcpu, 5),
>> +						kvmppc_get_gpr(vcpu, 6));
>> +		if (ret == H_TOO_HARD)
>> +			return RESUME_HOST;
>> +		break;
>> +	case H_PUT_TCE_INDIRECT:
>> +		ret = kvmppc_h_put_tce_indirect(vcpu, kvmppc_get_gpr(vcpu, 4),
>> +						kvmppc_get_gpr(vcpu, 5),
>> +						kvmppc_get_gpr(vcpu, 6),
>> +						kvmppc_get_gpr(vcpu, 7));
>> +		if (ret == H_TOO_HARD)
>> +			return RESUME_HOST;
>> +		break;
>> +	case H_STUFF_TCE:
>> +		ret = kvmppc_h_stuff_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
>> +						kvmppc_get_gpr(vcpu, 5),
>> +						kvmppc_get_gpr(vcpu, 6),
>> +						kvmppc_get_gpr(vcpu, 7));
>> +		if (ret == H_TOO_HARD)
>> +			return RESUME_HOST;
>> +		break;
>>   	default:
>>   		return RESUME_HOST;
>>   	}
>> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> index 3c6badc..3bf6e72 100644
>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> @@ -1928,7 +1928,7 @@ hcall_real_table:
>>   	.long	DOTSYM(kvmppc_h_clear_ref) - hcall_real_table
>>   	.long	DOTSYM(kvmppc_h_protect) - hcall_real_table
>>   	.long	DOTSYM(kvmppc_h_get_tce) - hcall_real_table
>> -	.long	DOTSYM(kvmppc_h_put_tce) - hcall_real_table
>> +	.long	DOTSYM(kvmppc_rm_h_put_tce) - hcall_real_table
>>   	.long	0		/* 0x24 - H_SET_SPRG0 */
>>   	.long	DOTSYM(kvmppc_h_set_dabr) - hcall_real_table
>>   	.long	0		/* 0x2c */
>> @@ -2006,8 +2006,8 @@ hcall_real_table:
>>   	.long	0		/* 0x12c */
>>   	.long	0		/* 0x130 */
>>   	.long	DOTSYM(kvmppc_h_set_xdabr) - hcall_real_table
>> -	.long	0		/* 0x138 */
>> -	.long	0		/* 0x13c */
>> +	.long	DOTSYM(kvmppc_rm_h_stuff_tce) - hcall_real_table
>> +	.long	DOTSYM(kvmppc_rm_h_put_tce_indirect) - hcall_real_table
>>   	.long	0		/* 0x140 */
>>   	.long	0		/* 0x144 */
>>   	.long	0		/* 0x148 */
>> diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
>> index f2c75a1..02176fd 100644
>> --- a/arch/powerpc/kvm/book3s_pr_papr.c
>> +++ b/arch/powerpc/kvm/book3s_pr_papr.c
>> @@ -280,6 +280,37 @@ static int kvmppc_h_pr_logical_ci_store(struct kvm_vcpu *vcpu)
>>   	return EMULATE_DONE;
>>   }
>>
>> +static int kvmppc_h_pr_put_tce_indirect(struct kvm_vcpu *vcpu)
>> +{
>> +	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
>> +	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
>> +	unsigned long tce = kvmppc_get_gpr(vcpu, 6);
>> +	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
>> +	long rc;
>> +
>> +	rc = kvmppc_h_put_tce_indirect(vcpu, liobn, ioba,
>> +			tce, npages);
>> +	if (rc == H_TOO_HARD)
>> +		return EMULATE_FAIL;
>> +	kvmppc_set_gpr(vcpu, 3, rc);
>> +	return EMULATE_DONE;
>> +}
>> +
>> +static int kvmppc_h_pr_stuff_tce(struct kvm_vcpu *vcpu)
>> +{
>> +	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
>> +	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
>> +	unsigned long tce_value = kvmppc_get_gpr(vcpu, 6);
>> +	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
>> +	long rc;
>> +
>> +	rc = kvmppc_h_stuff_tce(vcpu, liobn, ioba, tce_value, npages);
>> +	if (rc == H_TOO_HARD)
>> +		return EMULATE_FAIL;
>> +	kvmppc_set_gpr(vcpu, 3, rc);
>> +	return EMULATE_DONE;
>> +}
>> +
>>   static int kvmppc_h_pr_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
>>   {
>>   	long rc = kvmppc_xics_hcall(vcpu, cmd);
>> @@ -306,6 +337,10 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
>>   		return kvmppc_h_pr_bulk_remove(vcpu);
>>   	case H_PUT_TCE:
>>   		return kvmppc_h_pr_put_tce(vcpu);
>> +	case H_PUT_TCE_INDIRECT:
>> +		return kvmppc_h_pr_put_tce_indirect(vcpu);
>> +	case H_STUFF_TCE:
>> +		return kvmppc_h_pr_stuff_tce(vcpu);
>>   	case H_CEDE:
>>   		kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
>>   		kvm_vcpu_block(vcpu);
>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>> index 6fd2405..164735c 100644
>> --- a/arch/powerpc/kvm/powerpc.c
>> +++ b/arch/powerpc/kvm/powerpc.c
>> @@ -569,6 +569,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>   	case KVM_CAP_PPC_GET_SMMU_INFO:
>>   		r = 1;
>>   		break;
>> +	case KVM_CAP_SPAPR_MULTITCE:
>> +		r = 1;
>> +		break;
>
> Hmm, usual practice has been not to enable new KVM hcalls, unless
> userspace (qemu) explicitly enables them with ENABLE_HCALL.  I don't
> see an obvious way this extension could break, but it's probably
> safest to continue that pattern.


This advertises the capability but does not enable it; this is still
required in QEMU:

ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PPC_ENABLE_HCALL, 0,
                         H_PUT_TCE_INDIRECT, 1);
ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PPC_ENABLE_HCALL, 0,
                         H_STUFF_TCE, 1);

as the multi-TCE hcalls are not in default_hcall_list in
arch/powerpc/kvm/book3s_hv.c.
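
(QEMU would naturally gate those enable_cap calls on the extension being
present in the first place, something like

	if (kvm_check_extension(kvm_state, KVM_CAP_SPAPR_MULTITCE)) {
		/* enable H_PUT_TCE_INDIRECT and H_STUFF_TCE as above */
	}

- just a sketch of the intended usage.)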


>>   #endif
>>   	default:
>>   		r = 0;
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH kernel v2 6/6] KVM: PPC: Add support for multiple-TCE hcalls
@ 2016-01-25  1:24       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 48+ messages in thread
From: Alexey Kardashevskiy @ 2016-01-25  1:24 UTC (permalink / raw)
  To: David Gibson; +Cc: linuxppc-dev, Paul Mackerras, kvm-ppc, kvm

On 01/25/2016 11:44 AM, David Gibson wrote:
> On Thu, Jan 21, 2016 at 06:39:37PM +1100, Alexey Kardashevskiy wrote:
>> This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
>> H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
>> devices or emulated PCI.  These calls allow adding multiple entries
>> (up to 512) into the TCE table in one call which saves time on
>> transition between kernel and user space.
>>
>> This implements the KVM_CAP_PPC_MULTITCE capability. When present,
>> the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE.
>> If they can not be handled by the kernel, they are passed on to
>> the user space. The user space still has to have an implementation
>> for these.
>>
>> Both HV and PR-syle KVM are supported.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>> Changes:
>> v2:
>> * compare @ret with H_SUCCESS instead of assuming H_SUCCESS is zero
>> * s/~IOMMU_PAGE_MASK_4K/SZ_4K-1/ when testing @tce_list
>> ---
>>   Documentation/virtual/kvm/api.txt       |  25 ++++++
>>   arch/powerpc/include/asm/kvm_ppc.h      |  12 +++
>>   arch/powerpc/kvm/book3s_64_vio.c        | 110 +++++++++++++++++++++++-
>>   arch/powerpc/kvm/book3s_64_vio_hv.c     | 145 ++++++++++++++++++++++++++++++--
>>   arch/powerpc/kvm/book3s_hv.c            |  26 +++++-
>>   arch/powerpc/kvm/book3s_hv_rmhandlers.S |   6 +-
>>   arch/powerpc/kvm/book3s_pr_papr.c       |  35 ++++++++
>>   arch/powerpc/kvm/powerpc.c              |   3 +
>>   8 files changed, 349 insertions(+), 13 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index 07e4cdf..da39435 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -3035,6 +3035,31 @@ Returns: 0 on success, -1 on error
>>
>>   Queues an SMI on the thread's vcpu.
>>
>> +4.97 KVM_CAP_PPC_MULTITCE
>> +
>> +Capability: KVM_CAP_PPC_MULTITCE
>> +Architectures: ppc
>> +Type: vm
>> +
>> +This capability means the kernel is capable of handling hypercalls
>> +H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user
>> +space. This significantly accelerates DMA operations for PPC KVM guests.
>> +User space should expect that its handlers for these hypercalls
>> +are not going to be called if user space previously registered LIOBN
>> +in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
>> +
>> +In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
>> +user space might have to advertise it for the guest. For example,
>> +IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
>> +present in the "ibm,hypertas-functions" device-tree property.
>> +
>> +The hypercalls mentioned above may or may not be processed successfully
>> +in the kernel based fast path. If they can not be handled by the kernel,
>> +they will get passed on to user space. So user space still has to have
>> +an implementation for these despite the in kernel acceleration.
>> +
>> +This capability is always enabled.
>> +
>>   5. The kvm_run structure
>>   ------------------------
>>
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
>> index 9513911..4cadee5 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -166,12 +166,24 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
>>
>>   extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>>   				struct kvm_create_spapr_tce *args);
>> +extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
>> +		struct kvm_vcpu *vcpu, unsigned long liobn);
>>   extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
>>   		unsigned long ioba, unsigned long npages);
>>   extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
>>   		unsigned long tce);
>> +extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
>> +		unsigned long *ua, unsigned long **prmap);
>
> Putting a userspace address into an unsigned long is pretty nasty: it
> should be a something __user *.
>
>> +extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
>> +		unsigned long idx, unsigned long tce);
>>   extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>   			     unsigned long ioba, unsigned long tce);
>> +extern long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_list, unsigned long npages);
>> +extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_value, unsigned long npages);
>>   extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>   			     unsigned long ioba);
>>   extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index 975f0ab..987f406 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -14,6 +14,7 @@
>>    *
>>    * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>>    * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
>> + * Copyright 2016 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>>    */
>>
>>   #include <linux/types.h>
>> @@ -37,8 +38,7 @@
>>   #include <asm/kvm_host.h>
>>   #include <asm/udbg.h>
>>   #include <asm/iommu.h>
>> -
>> -#define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>> +#include <asm/tce.h>
>>
>>   static unsigned long kvmppc_stt_npages(unsigned long window_size)
>>   {
>> @@ -200,3 +200,109 @@ fail:
>>   	}
>>   	return ret;
>>   }
>> +
>> +long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce)
>> +{
>> +	long ret;
>> +	struct kvmppc_spapr_tce_table *stt;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, 1);
>> +	if (ret != H_SUCCESS)
>> +		return ret;
>> +
>> +	ret = kvmppc_tce_validate(stt, tce);
>> +	if (ret != H_SUCCESS)
>> +		return ret;
>> +
>> +	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
>> +
>> +	return H_SUCCESS;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
>> +
>> +long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_list, unsigned long npages)
>> +{
>> +	struct kvmppc_spapr_tce_table *stt;
>> +	long i, ret = H_SUCCESS, idx;
>> +	unsigned long entry, ua = 0;
>> +	u64 __user *tces, tce;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
>> +	/*
>> +	 * SPAPR spec says that the maximum size of the list is 512 TCEs
>> +	 * so the whole table fits in 4K page
>> +	 */
>> +	if (npages > 512)
>> +		return H_PARAMETER;
>> +
>> +	if (tce_list & (SZ_4K - 1))
>> +		return H_PARAMETER;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
>> +	if (ret != H_SUCCESS)
>> +		return ret;
>> +
>> +	idx = srcu_read_lock(&vcpu->kvm->srcu);
>> +	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL)) {
>> +		ret = H_TOO_HARD;
>> +		goto unlock_exit;
>> +	}
>> +	tces = (u64 __user *) ua;
>> +
>> +	for (i = 0; i < npages; ++i) {
>> +		if (get_user(tce, tces + i)) {
>> +			ret = H_PARAMETER;
>> +			goto unlock_exit;
>> +		}
>> +		tce = be64_to_cpu(tce);
>> +
>> +		ret = kvmppc_tce_validate(stt, tce);
>> +		if (ret != H_SUCCESS)
>> +			goto unlock_exit;
>> +
>> +		kvmppc_tce_put(stt, entry + i, tce);
>> +	}
>> +
>> +unlock_exit:
>> +	srcu_read_unlock(&vcpu->kvm->srcu, idx);
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_h_put_tce_indirect);
>> +
>> +long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_value, unsigned long npages)
>> +{
>> +	struct kvmppc_spapr_tce_table *stt;
>> +	long i, ret;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
>> +	if (ret != H_SUCCESS)
>> +		return ret;
>> +
>> +	if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
>> +		return H_PARAMETER;
>
> Do we really need to allow no-permission but non-zero TCEs?

Not sure; for debugging purposes one could want to poison the table. 
Totally useless?
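
(For illustration only: a hypothetical guest-side use of the poisoning idea.
The hcall wrapper invocation and the 0x0dead... pattern are mine, not part
of this patch:

	/* fill a 512-entry window with a recognizable no-permission pattern */
	plpar_hcall_norets(H_STUFF_TCE, liobn, 0 /* ioba */,
			   0x0dead0000000UL /* RPN poison, R/W bits clear */,
			   512 /* npages */);

Since kvmppc_h_stuff_tce() only rejects values with TCE_PCI_READ or
TCE_PCI_WRITE set, such a pattern would land in the table unmodified.)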


>> +
>> +	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
>> +		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
>> +
>> +	return H_SUCCESS;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> index 8cd3a95..58c63ed 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> @@ -14,6 +14,7 @@
>>    *
>>    * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>>    * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
>> + * Copyright 2016 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>>    */
>>
>>   #include <linux/types.h>
>> @@ -30,6 +31,7 @@
>>   #include <asm/kvm_ppc.h>
>>   #include <asm/kvm_book3s.h>
>>   #include <asm/mmu-hash64.h>
>> +#include <asm/mmu_context.h>
>>   #include <asm/hvcall.h>
>>   #include <asm/synch.h>
>>   #include <asm/ppc-opcode.h>
>> @@ -37,6 +39,7 @@
>>   #include <asm/udbg.h>
>>   #include <asm/iommu.h>
>>   #include <asm/tce.h>
>> +#include <asm/iommu.h>
>>
>>   #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>>
>> @@ -46,7 +49,7 @@
>>    * WARNING: This will be called in real or virtual mode on HV KVM and virtual
>>    *          mode on PR KVM
>>    */
>> -static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>> +struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>>   		unsigned long liobn)
>>   {
>>   	struct kvm *kvm = vcpu->kvm;
>> @@ -58,6 +61,7 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
>>
>>   	return NULL;
>>   }
>> +EXPORT_SYMBOL_GPL(kvmppc_find_table);
>>
>>   /*
>>    * Validates IO address.
>> @@ -150,11 +154,31 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
>>   }
>>   EXPORT_SYMBOL_GPL(kvmppc_tce_put);
>>
>> -/* WARNING: This will be called in real-mode on HV KVM and virtual
>> - *          mode on PR KVM
>> - */
>> -long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>> -		      unsigned long ioba, unsigned long tce)
>> +long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
>> +		unsigned long *ua, unsigned long **prmap)
>> +{
>> +	unsigned long gfn = gpa >> PAGE_SHIFT;
>> +	struct kvm_memory_slot *memslot;
>> +
>> +	memslot = search_memslots(kvm_memslots(kvm), gfn);
>> +	if (!memslot)
>> +		return -EINVAL;
>> +
>> +	*ua = __gfn_to_hva_memslot(memslot, gfn) |
>> +		(gpa & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
>> +
>> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>
> It's a bit odd to see a test for HV_POSSIBLE in a file named
> book3s_64_vio_hv.c


True, the file name should probably have been changed to book3s_64_vio_rm.c.


>
>> +	if (prmap)
>> +		*prmap = &memslot->arch.rmap[gfn - memslot->base_gfn];
>> +#endif
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
>> +
>> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>> +long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>> +		unsigned long ioba, unsigned long tce)
>>   {
>>   	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
>>   	long ret;
>> @@ -177,7 +201,112 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>
>>   	return H_SUCCESS;
>>   }
>> -EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
>> +
>> +static long kvmppc_rm_ua_to_hpa(struct kvm_vcpu *vcpu,
>> +		unsigned long ua, unsigned long *phpa)
>
> ua should be a something __user * rather than an unsigned long.  And
> come to that hpa should be a something * rather than an unsigned long.


@ua is the return type of __gfn_to_hva_memslot() so I kept it. Also, the 
only place where I actually read from this address is the virtual-mode 
H_PUT_TCE_INDIRECT handler; all other places just do translation with it, so 
making it "unsigned long" saves some type conversions. It is also used in 
mm_iommu_ua_to_hpa() which is upstream now (commit 15b244a88e1b289, part 
of the DDW and preregistration patchset). Still need to change it?

Regarding @phpa, the agreement here is that we use "void *" for 0xC000.... 
type of addresses, and "unsigned long" if the top 4 bits are not set, as 
dereferencing such pointers will normally fail.
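
(A tiny sketch of that convention, with made-up example values, just to make
the distinction concrete:

	void *p = (void *) 0xc000000012340000UL; /* linear-map VA, may dereference */
	unsigned long hpa = 0x12340000;          /* physical address, just a number */

i.e. a "void *" promises that dereferencing works, an "unsigned long" says
it is only a number.)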



>> +{
>> +	pte_t *ptep, pte;
>> +	unsigned shift = 0;
>> +
>> +	ptep = __find_linux_pte_or_hugepte(vcpu->arch.pgdir, ua, NULL, &shift);
>> +	if (!ptep || !pte_present(*ptep))
>> +		return -ENXIO;
>> +	pte = *ptep;
>> +
>> +	if (!shift)
>> +		shift = PAGE_SHIFT;
>> +
>> +	/* Avoid handling anything potentially complicated in realmode */
>> +	if (shift > PAGE_SHIFT)
>> +		return -EAGAIN;
>> +
>> +	if (!pte_young(pte))
>> +		return -EAGAIN;
>> +
>> +	*phpa = (pte_pfn(pte) << PAGE_SHIFT) | (ua & ((1ULL << shift) - 1)) |
>> +			(ua & ~PAGE_MASK);
>> +
>> +	return 0;
>> +}
>> +
>> +long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_list,	unsigned long npages)
>> +{
>> +	struct kvmppc_spapr_tce_table *stt;
>> +	long i, ret = H_SUCCESS;
>> +	unsigned long tces, entry, ua = 0;
>> +	unsigned long *rmap = NULL;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
>> +	/*
>> +	 * The spec says that the maximum size of the list is 512 TCEs
>> +	 * so the whole table addressed resides in 4K page
>> +	 */
>> +	if (npages > 512)
>> +		return H_PARAMETER;
>> +
>> +	if (tce_list & (SZ_4K - 1))
>> +		return H_PARAMETER;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
>> +	if (ret != H_SUCCESS)
>> +		return ret;
>> +
>> +	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
>> +		return H_TOO_HARD;
>> +
>> +	rmap = (void *) vmalloc_to_phys(rmap);
>> +
>> +	lock_rmap(rmap);
>> +	if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
>> +		ret = H_TOO_HARD;
>> +		goto unlock_exit;
>> +	}
>> +
>> +	for (i = 0; i < npages; ++i) {
>> +		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
>> +
>> +		ret = kvmppc_tce_validate(stt, tce);
>> +		if (ret != H_SUCCESS)
>> +			goto unlock_exit;
>> +
>> +		kvmppc_tce_put(stt, entry + i, tce);
>> +	}
>> +
>> +unlock_exit:
>> +	unlock_rmap(rmap);
>> +
>> +	return ret;
>> +}
>> +
>> +long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_value, unsigned long npages)
>
> Unlike put_indirect, this code appears to be identical to the non
> realmode code - can you combine them?


It is identical at this point, but it will get different bits in "KVM: PPC: 
vfio kvm device: support spapr tce" later; maybe sometime later I will 
manage to get to that part, eventually...


>> +{
>> +	struct kvmppc_spapr_tce_table *stt;
>> +	long i, ret;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
>> +	if (ret != H_SUCCESS)
>> +		return ret;
>> +
>> +	if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
>> +		return H_PARAMETER;
>> +
>> +	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
>> +		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
>> +
>> +	return H_SUCCESS;
>> +}
>>
>>   long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>   		unsigned long ioba)
>> @@ -204,3 +333,5 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>   	return H_SUCCESS;
>>   }
>>   EXPORT_SYMBOL_GPL(kvmppc_h_get_tce);
>> +
>> +#endif /* KVM_BOOK3S_HV_POSSIBLE */
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index cff207b..df3fbae 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -768,7 +768,31 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>>   		if (kvmppc_xics_enabled(vcpu)) {
>>   			ret = kvmppc_xics_hcall(vcpu, req);
>>   			break;
>> -		} /* fallthrough */
>> +		}
>> +		return RESUME_HOST;
>> +	case H_PUT_TCE:
>> +		ret = kvmppc_h_put_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
>> +						kvmppc_get_gpr(vcpu, 5),
>> +						kvmppc_get_gpr(vcpu, 6));
>> +		if (ret == H_TOO_HARD)
>> +			return RESUME_HOST;
>> +		break;
>> +	case H_PUT_TCE_INDIRECT:
>> +		ret = kvmppc_h_put_tce_indirect(vcpu, kvmppc_get_gpr(vcpu, 4),
>> +						kvmppc_get_gpr(vcpu, 5),
>> +						kvmppc_get_gpr(vcpu, 6),
>> +						kvmppc_get_gpr(vcpu, 7));
>> +		if (ret == H_TOO_HARD)
>> +			return RESUME_HOST;
>> +		break;
>> +	case H_STUFF_TCE:
>> +		ret = kvmppc_h_stuff_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
>> +						kvmppc_get_gpr(vcpu, 5),
>> +						kvmppc_get_gpr(vcpu, 6),
>> +						kvmppc_get_gpr(vcpu, 7));
>> +		if (ret == H_TOO_HARD)
>> +			return RESUME_HOST;
>> +		break;
>>   	default:
>>   		return RESUME_HOST;
>>   	}
>> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> index 3c6badc..3bf6e72 100644
>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> @@ -1928,7 +1928,7 @@ hcall_real_table:
>>   	.long	DOTSYM(kvmppc_h_clear_ref) - hcall_real_table
>>   	.long	DOTSYM(kvmppc_h_protect) - hcall_real_table
>>   	.long	DOTSYM(kvmppc_h_get_tce) - hcall_real_table
>> -	.long	DOTSYM(kvmppc_h_put_tce) - hcall_real_table
>> +	.long	DOTSYM(kvmppc_rm_h_put_tce) - hcall_real_table
>>   	.long	0		/* 0x24 - H_SET_SPRG0 */
>>   	.long	DOTSYM(kvmppc_h_set_dabr) - hcall_real_table
>>   	.long	0		/* 0x2c */
>> @@ -2006,8 +2006,8 @@ hcall_real_table:
>>   	.long	0		/* 0x12c */
>>   	.long	0		/* 0x130 */
>>   	.long	DOTSYM(kvmppc_h_set_xdabr) - hcall_real_table
>> -	.long	0		/* 0x138 */
>> -	.long	0		/* 0x13c */
>> +	.long	DOTSYM(kvmppc_rm_h_stuff_tce) - hcall_real_table
>> +	.long	DOTSYM(kvmppc_rm_h_put_tce_indirect) - hcall_real_table
>>   	.long	0		/* 0x140 */
>>   	.long	0		/* 0x144 */
>>   	.long	0		/* 0x148 */
>> diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
>> index f2c75a1..02176fd 100644
>> --- a/arch/powerpc/kvm/book3s_pr_papr.c
>> +++ b/arch/powerpc/kvm/book3s_pr_papr.c
>> @@ -280,6 +280,37 @@ static int kvmppc_h_pr_logical_ci_store(struct kvm_vcpu *vcpu)
>>   	return EMULATE_DONE;
>>   }
>>
>> +static int kvmppc_h_pr_put_tce_indirect(struct kvm_vcpu *vcpu)
>> +{
>> +	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
>> +	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
>> +	unsigned long tce = kvmppc_get_gpr(vcpu, 6);
>> +	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
>> +	long rc;
>> +
>> +	rc = kvmppc_h_put_tce_indirect(vcpu, liobn, ioba,
>> +			tce, npages);
>> +	if (rc == H_TOO_HARD)
>> +		return EMULATE_FAIL;
>> +	kvmppc_set_gpr(vcpu, 3, rc);
>> +	return EMULATE_DONE;
>> +}
>> +
>> +static int kvmppc_h_pr_stuff_tce(struct kvm_vcpu *vcpu)
>> +{
>> +	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
>> +	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
>> +	unsigned long tce_value = kvmppc_get_gpr(vcpu, 6);
>> +	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
>> +	long rc;
>> +
>> +	rc = kvmppc_h_stuff_tce(vcpu, liobn, ioba, tce_value, npages);
>> +	if (rc == H_TOO_HARD)
>> +		return EMULATE_FAIL;
>> +	kvmppc_set_gpr(vcpu, 3, rc);
>> +	return EMULATE_DONE;
>> +}
>> +
>>   static int kvmppc_h_pr_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
>>   {
>>   	long rc = kvmppc_xics_hcall(vcpu, cmd);
>> @@ -306,6 +337,10 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
>>   		return kvmppc_h_pr_bulk_remove(vcpu);
>>   	case H_PUT_TCE:
>>   		return kvmppc_h_pr_put_tce(vcpu);
>> +	case H_PUT_TCE_INDIRECT:
>> +		return kvmppc_h_pr_put_tce_indirect(vcpu);
>> +	case H_STUFF_TCE:
>> +		return kvmppc_h_pr_stuff_tce(vcpu);
>>   	case H_CEDE:
>>   		kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
>>   		kvm_vcpu_block(vcpu);
>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>> index 6fd2405..164735c 100644
>> --- a/arch/powerpc/kvm/powerpc.c
>> +++ b/arch/powerpc/kvm/powerpc.c
>> @@ -569,6 +569,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>   	case KVM_CAP_PPC_GET_SMMU_INFO:
>>   		r = 1;
>>   		break;
>> +	case KVM_CAP_SPAPR_MULTITCE:
>> +		r = 1;
>> +		break;
>
> Hmm, usual practice has been not to enable new KVM hcalls, unless
> userspace (qemu) explicitly enables them with ENABLE_HCALL.  I don't
> see an obvious way this extension could break, but it's probably
> safest to continue that pattern.


This advertises the capability but does not enable it; this is still 
required in QEMU:

ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PPC_ENABLE_HCALL, 0,
                         H_PUT_TCE_INDIRECT, 1);
ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PPC_ENABLE_HCALL, 0,
                         H_STUFF_TCE, 1);

as multi-tce hcalls are not in the default_hcall_list list in 
arch/powerpc/kvm/book3s_hv.c.
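
For completeness, a sketch of how the two steps could fit together on the
QEMU side (probe the capability, then enable the hcalls); this is an
illustration, not a quote from the actual QEMU code:

    if (kvm_vm_check_extension(kvm_state, KVM_CAP_SPAPR_MULTITCE)) {
        ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PPC_ENABLE_HCALL, 0,
                                H_PUT_TCE_INDIRECT, 1);
        ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PPC_ENABLE_HCALL, 0,
                                H_STUFF_TCE, 1);
    }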


>>   #endif
>>   	default:
>>   		r = 0;
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH kernel v2 6/6] KVM: PPC: Add support for multiple-TCE hcalls
  2016-01-25  1:24       ` Alexey Kardashevskiy
@ 2016-01-25  5:21         ` David Gibson
  -1 siblings, 0 replies; 48+ messages in thread
From: David Gibson @ 2016-01-25  5:21 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, Paul Mackerras, kvm-ppc, kvm

[-- Attachment #1: Type: text/plain, Size: 22857 bytes --]

On Mon, Jan 25, 2016 at 12:24:29PM +1100, Alexey Kardashevskiy wrote:
> On 01/25/2016 11:44 AM, David Gibson wrote:
> >On Thu, Jan 21, 2016 at 06:39:37PM +1100, Alexey Kardashevskiy wrote:
> >>This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
> >>H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
> >>devices or emulated PCI.  These calls allow adding multiple entries
> >>(up to 512) into the TCE table in one call which saves time on
> >>transition between kernel and user space.
> >>
> >>This implements the KVM_CAP_PPC_MULTITCE capability. When present,
> >>the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE.
> >>If they can not be handled by the kernel, they are passed on to
> >>the user space. The user space still has to have an implementation
> >>for these.
> >>
> >>Both HV and PR-style KVM are supported.
> >>
> >>Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>---
> >>Changes:
> >>v2:
> >>* compare @ret with H_SUCCESS instead of assuming H_SUCCESS is zero
> >>* s/~IOMMU_PAGE_MASK_4K/SZ_4K-1/ when testing @tce_list
> >>---
> >>  Documentation/virtual/kvm/api.txt       |  25 ++++++
> >>  arch/powerpc/include/asm/kvm_ppc.h      |  12 +++
> >>  arch/powerpc/kvm/book3s_64_vio.c        | 110 +++++++++++++++++++++++-
> >>  arch/powerpc/kvm/book3s_64_vio_hv.c     | 145 ++++++++++++++++++++++++++++++--
> >>  arch/powerpc/kvm/book3s_hv.c            |  26 +++++-
> >>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   6 +-
> >>  arch/powerpc/kvm/book3s_pr_papr.c       |  35 ++++++++
> >>  arch/powerpc/kvm/powerpc.c              |   3 +
> >>  8 files changed, 349 insertions(+), 13 deletions(-)
> >>
> >>diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >>index 07e4cdf..da39435 100644
> >>--- a/Documentation/virtual/kvm/api.txt
> >>+++ b/Documentation/virtual/kvm/api.txt
> >>@@ -3035,6 +3035,31 @@ Returns: 0 on success, -1 on error
> >>
> >>  Queues an SMI on the thread's vcpu.
> >>
> >>+4.97 KVM_CAP_PPC_MULTITCE
> >>+
> >>+Capability: KVM_CAP_PPC_MULTITCE
> >>+Architectures: ppc
> >>+Type: vm
> >>+
> >>+This capability means the kernel is capable of handling hypercalls
> >>+H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user
> >>+space. This significantly accelerates DMA operations for PPC KVM guests.
> >>+User space should expect that its handlers for these hypercalls
> >>+are not going to be called if user space previously registered LIOBN
> >>+in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
> >>+
> >>+In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
> >>+user space might have to advertise it for the guest. For example,
> >>+IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
> >>+present in the "ibm,hypertas-functions" device-tree property.
> >>+
> >>+The hypercalls mentioned above may or may not be processed successfully
> >>+in the kernel based fast path. If they can not be handled by the kernel,
> >>+they will get passed on to user space. So user space still has to have
> >>+an implementation for these despite the in kernel acceleration.
> >>+
> >>+This capability is always enabled.
> >>+
> >>  5. The kvm_run structure
> >>  ------------------------
> >>
> >>diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> >>index 9513911..4cadee5 100644
> >>--- a/arch/powerpc/include/asm/kvm_ppc.h
> >>+++ b/arch/powerpc/include/asm/kvm_ppc.h
> >>@@ -166,12 +166,24 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
> >>
> >>  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
> >>  				struct kvm_create_spapr_tce *args);
> >>+extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
> >>+		struct kvm_vcpu *vcpu, unsigned long liobn);
> >>  extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
> >>  		unsigned long ioba, unsigned long npages);
> >>  extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
> >>  		unsigned long tce);
> >>+extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
> >>+		unsigned long *ua, unsigned long **prmap);
> >
> >Putting a userspace address into an unsigned long is pretty nasty: it
> >should be a something __user *.
> >
> >>+extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
> >>+		unsigned long idx, unsigned long tce);
> >>  extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> >>  			     unsigned long ioba, unsigned long tce);
> >>+extern long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> >>+		unsigned long liobn, unsigned long ioba,
> >>+		unsigned long tce_list, unsigned long npages);
> >>+extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
> >>+		unsigned long liobn, unsigned long ioba,
> >>+		unsigned long tce_value, unsigned long npages);
> >>  extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> >>  			     unsigned long ioba);
> >>  extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
> >>diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> >>index 975f0ab..987f406 100644
> >>--- a/arch/powerpc/kvm/book3s_64_vio.c
> >>+++ b/arch/powerpc/kvm/book3s_64_vio.c
> >>@@ -14,6 +14,7 @@
> >>   *
> >>   * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
> >>   * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
> >>+ * Copyright 2016 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
> >>   */
> >>
> >>  #include <linux/types.h>
> >>@@ -37,8 +38,7 @@
> >>  #include <asm/kvm_host.h>
> >>  #include <asm/udbg.h>
> >>  #include <asm/iommu.h>
> >>-
> >>-#define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
> >>+#include <asm/tce.h>
> >>
> >>  static unsigned long kvmppc_stt_npages(unsigned long window_size)
> >>  {
> >>@@ -200,3 +200,109 @@ fail:
> >>  	}
> >>  	return ret;
> >>  }
> >>+
> >>+long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
> >>+		unsigned long liobn, unsigned long ioba,
> >>+		unsigned long tce)
> >>+{
> >>+	long ret;
> >>+	struct kvmppc_spapr_tce_table *stt;
> >>+
> >>+	stt = kvmppc_find_table(vcpu, liobn);
> >>+	if (!stt)
> >>+		return H_TOO_HARD;
> >>+
> >>+	ret = kvmppc_ioba_validate(stt, ioba, 1);
> >>+	if (ret != H_SUCCESS)
> >>+		return ret;
> >>+
> >>+	ret = kvmppc_tce_validate(stt, tce);
> >>+	if (ret != H_SUCCESS)
> >>+		return ret;
> >>+
> >>+	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
> >>+
> >>+	return H_SUCCESS;
> >>+}
> >>+EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
> >>+
> >>+long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> >>+		unsigned long liobn, unsigned long ioba,
> >>+		unsigned long tce_list, unsigned long npages)
> >>+{
> >>+	struct kvmppc_spapr_tce_table *stt;
> >>+	long i, ret = H_SUCCESS, idx;
> >>+	unsigned long entry, ua = 0;
> >>+	u64 __user *tces, tce;
> >>+
> >>+	stt = kvmppc_find_table(vcpu, liobn);
> >>+	if (!stt)
> >>+		return H_TOO_HARD;
> >>+
> >>+	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
> >>+	/*
> >>+	 * SPAPR spec says that the maximum size of the list is 512 TCEs
> >>+	 * so the whole table fits in 4K page
> >>+	 */
> >>+	if (npages > 512)
> >>+		return H_PARAMETER;
> >>+
> >>+	if (tce_list & (SZ_4K - 1))
> >>+		return H_PARAMETER;
> >>+
> >>+	ret = kvmppc_ioba_validate(stt, ioba, npages);
> >>+	if (ret != H_SUCCESS)
> >>+		return ret;
> >>+
> >>+	idx = srcu_read_lock(&vcpu->kvm->srcu);
> >>+	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL)) {
> >>+		ret = H_TOO_HARD;
> >>+		goto unlock_exit;
> >>+	}
> >>+	tces = (u64 __user *) ua;
> >>+
> >>+	for (i = 0; i < npages; ++i) {
> >>+		if (get_user(tce, tces + i)) {
> >>+			ret = H_PARAMETER;
> >>+			goto unlock_exit;
> >>+		}
> >>+		tce = be64_to_cpu(tce);
> >>+
> >>+		ret = kvmppc_tce_validate(stt, tce);
> >>+		if (ret != H_SUCCESS)
> >>+			goto unlock_exit;
> >>+
> >>+		kvmppc_tce_put(stt, entry + i, tce);
> >>+	}
> >>+
> >>+unlock_exit:
> >>+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
> >>+
> >>+	return ret;
> >>+}
> >>+EXPORT_SYMBOL_GPL(kvmppc_h_put_tce_indirect);
> >>+
> >>+long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
> >>+		unsigned long liobn, unsigned long ioba,
> >>+		unsigned long tce_value, unsigned long npages)
> >>+{
> >>+	struct kvmppc_spapr_tce_table *stt;
> >>+	long i, ret;
> >>+
> >>+	stt = kvmppc_find_table(vcpu, liobn);
> >>+	if (!stt)
> >>+		return H_TOO_HARD;
> >>+
> >>+	ret = kvmppc_ioba_validate(stt, ioba, npages);
> >>+	if (ret != H_SUCCESS)
> >>+		return ret;
> >>+
> >>+	if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
> >>+		return H_PARAMETER;
> >
> >Do we really need to allow no-permission but non-zero TCEs?
> 
> Not sure; for debugging purposes one could want to poison the table. Totally
> useless?

Hmm, I guess.  Ok, leave it as is.

> >>+
> >>+	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
> >>+		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
> >>+
> >>+	return H_SUCCESS;
> >>+}
> >>+EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
> >>diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> >>index 8cd3a95..58c63ed 100644
> >>--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> >>+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> >>@@ -14,6 +14,7 @@
> >>   *
> >>   * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
> >>   * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
> >>+ * Copyright 2016 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
> >>   */
> >>
> >>  #include <linux/types.h>
> >>@@ -30,6 +31,7 @@
> >>  #include <asm/kvm_ppc.h>
> >>  #include <asm/kvm_book3s.h>
> >>  #include <asm/mmu-hash64.h>
> >>+#include <asm/mmu_context.h>
> >>  #include <asm/hvcall.h>
> >>  #include <asm/synch.h>
> >>  #include <asm/ppc-opcode.h>
> >>@@ -37,6 +39,7 @@
> >>  #include <asm/udbg.h>
> >>  #include <asm/iommu.h>
> >>  #include <asm/tce.h>
> >>+#include <asm/iommu.h>
> >>
> >>  #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
> >>
> >>@@ -46,7 +49,7 @@
> >>   * WARNING: This will be called in real or virtual mode on HV KVM and virtual
> >>   *          mode on PR KVM
> >>   */
> >>-static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
> >>+struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
> >>  		unsigned long liobn)
> >>  {
> >>  	struct kvm *kvm = vcpu->kvm;
> >>@@ -58,6 +61,7 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
> >>
> >>  	return NULL;
> >>  }
> >>+EXPORT_SYMBOL_GPL(kvmppc_find_table);
> >>
> >>  /*
> >>   * Validates IO address.
> >>@@ -150,11 +154,31 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
> >>  }
> >>  EXPORT_SYMBOL_GPL(kvmppc_tce_put);
> >>
> >>-/* WARNING: This will be called in real-mode on HV KVM and virtual
> >>- *          mode on PR KVM
> >>- */
> >>-long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> >>-		      unsigned long ioba, unsigned long tce)
> >>+long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
> >>+		unsigned long *ua, unsigned long **prmap)
> >>+{
> >>+	unsigned long gfn = gpa >> PAGE_SHIFT;
> >>+	struct kvm_memory_slot *memslot;
> >>+
> >>+	memslot = search_memslots(kvm_memslots(kvm), gfn);
> >>+	if (!memslot)
> >>+		return -EINVAL;
> >>+
> >>+	*ua = __gfn_to_hva_memslot(memslot, gfn) |
> >>+		(gpa & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
> >>+
> >>+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> >
> >It's a bit odd to see a test for HV_POSSIBLE in a file named
> >book3s_64_vio_hv.c
> 
> 
> True, the file name should probably have been changed to book3s_64_vio_rm.c.
> 
> 
> >
> >>+	if (prmap)
> >>+		*prmap = &memslot->arch.rmap[gfn - memslot->base_gfn];
> >>+#endif
> >>+
> >>+	return 0;
> >>+}
> >>+EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
> >>+
> >>+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> >>+long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> >>+		unsigned long ioba, unsigned long tce)
> >>  {
> >>  	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
> >>  	long ret;
> >>@@ -177,7 +201,112 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> >>
> >>  	return H_SUCCESS;
> >>  }
> >>-EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
> >>+
> >>+static long kvmppc_rm_ua_to_hpa(struct kvm_vcpu *vcpu,
> >>+		unsigned long ua, unsigned long *phpa)
> >
> >ua should be a something __user * rather than an unsigned long.  And
> >come to that hpa should be a something * rather than an unsigned long.
> 
> 
> @ua is the return type of __gfn_to_hva_memslot() so I kept it. Also, the
> only place where I actually read from this address is the virtual-mode
> H_PUT_TCE_INDIRECT handler; all other places just do translation with it, so
> making it "unsigned long" saves some type conversions. It is also used in
> mm_iommu_ua_to_hpa() which is upstream now (commit 15b244a88e1b289, part
> of the DDW and preregistration patchset). Still need to change it?

Hmm, I guess not.

> Regarding @phpa, the agreement here is that we use "void *" for 0xC000....
> type of addresses, and "unsigned long" if the top 4 bits are not set, as
> dereferencing such pointers will normally fail.

Ah, yes, sorry, forgot that it was an HV physical addr, not an HV
virtual addr.

> 
> 
> 
> >>+{
> >>+	pte_t *ptep, pte;
> >>+	unsigned shift = 0;
> >>+
> >>+	ptep = __find_linux_pte_or_hugepte(vcpu->arch.pgdir, ua, NULL, &shift);
> >>+	if (!ptep || !pte_present(*ptep))
> >>+		return -ENXIO;
> >>+	pte = *ptep;
> >>+
> >>+	if (!shift)
> >>+		shift = PAGE_SHIFT;
> >>+
> >>+	/* Avoid handling anything potentially complicated in realmode */
> >>+	if (shift > PAGE_SHIFT)
> >>+		return -EAGAIN;
> >>+
> >>+	if (!pte_young(pte))
> >>+		return -EAGAIN;
> >>+
> >>+	*phpa = (pte_pfn(pte) << PAGE_SHIFT) | (ua & ((1ULL << shift) - 1)) |
> >>+			(ua & ~PAGE_MASK);
> >>+
> >>+	return 0;
> >>+}
> >>+
> >>+long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> >>+		unsigned long liobn, unsigned long ioba,
> >>+		unsigned long tce_list,	unsigned long npages)
> >>+{
> >>+	struct kvmppc_spapr_tce_table *stt;
> >>+	long i, ret = H_SUCCESS;
> >>+	unsigned long tces, entry, ua = 0;
> >>+	unsigned long *rmap = NULL;
> >>+
> >>+	stt = kvmppc_find_table(vcpu, liobn);
> >>+	if (!stt)
> >>+		return H_TOO_HARD;
> >>+
> >>+	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
> >>+	/*
> >>+	 * The spec says that the maximum size of the list is 512 TCEs
> >>+	 * so the whole table addressed resides in 4K page
> >>+	 */
> >>+	if (npages > 512)
> >>+		return H_PARAMETER;
> >>+
> >>+	if (tce_list & (SZ_4K - 1))
> >>+		return H_PARAMETER;
> >>+
> >>+	ret = kvmppc_ioba_validate(stt, ioba, npages);
> >>+	if (ret != H_SUCCESS)
> >>+		return ret;
> >>+
> >>+	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
> >>+		return H_TOO_HARD;
> >>+
> >>+	rmap = (void *) vmalloc_to_phys(rmap);
> >>+
> >>+	lock_rmap(rmap);
> >>+	if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
> >>+		ret = H_TOO_HARD;
> >>+		goto unlock_exit;
> >>+	}
> >>+
> >>+	for (i = 0; i < npages; ++i) {
> >>+		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
> >>+
> >>+		ret = kvmppc_tce_validate(stt, tce);
> >>+		if (ret != H_SUCCESS)
> >>+			goto unlock_exit;
> >>+
> >>+		kvmppc_tce_put(stt, entry + i, tce);
> >>+	}
> >>+
> >>+unlock_exit:
> >>+	unlock_rmap(rmap);
> >>+
> >>+	return ret;
> >>+}
> >>+
> >>+long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
> >>+		unsigned long liobn, unsigned long ioba,
> >>+		unsigned long tce_value, unsigned long npages)
> >
> >Unlike put_indirect, this code appears to be identical to the non
> >realmode code - can you combine them?
> 
> 
> It is identical at this point, but it will get different bits in "KVM: PPC:
> vfio kvm device: support spapr tce" later; maybe sometime later I will
> manage to get to that part, eventually...

I think I'd prefer to see them split only in the patch that actually
makes them different.

> 
> 
> >>+{
> >>+	struct kvmppc_spapr_tce_table *stt;
> >>+	long i, ret;
> >>+
> >>+	stt = kvmppc_find_table(vcpu, liobn);
> >>+	if (!stt)
> >>+		return H_TOO_HARD;
> >>+
> >>+	ret = kvmppc_ioba_validate(stt, ioba, npages);
> >>+	if (ret != H_SUCCESS)
> >>+		return ret;
> >>+
> >>+	if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
> >>+		return H_PARAMETER;
> >>+
> >>+	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
> >>+		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
> >>+
> >>+	return H_SUCCESS;
> >>+}
> >>
> >>  long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> >>  		unsigned long ioba)
> >>@@ -204,3 +333,5 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> >>  	return H_SUCCESS;
> >>  }
> >>  EXPORT_SYMBOL_GPL(kvmppc_h_get_tce);
> >>+
> >>+#endif /* KVM_BOOK3S_HV_POSSIBLE */
> >>diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> >>index cff207b..df3fbae 100644
> >>--- a/arch/powerpc/kvm/book3s_hv.c
> >>+++ b/arch/powerpc/kvm/book3s_hv.c
> >>@@ -768,7 +768,31 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
> >>  		if (kvmppc_xics_enabled(vcpu)) {
> >>  			ret = kvmppc_xics_hcall(vcpu, req);
> >>  			break;
> >>-		} /* fallthrough */
> >>+		}
> >>+		return RESUME_HOST;
> >>+	case H_PUT_TCE:
> >>+		ret = kvmppc_h_put_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
> >>+						kvmppc_get_gpr(vcpu, 5),
> >>+						kvmppc_get_gpr(vcpu, 6));
> >>+		if (ret == H_TOO_HARD)
> >>+			return RESUME_HOST;
> >>+		break;
> >>+	case H_PUT_TCE_INDIRECT:
> >>+		ret = kvmppc_h_put_tce_indirect(vcpu, kvmppc_get_gpr(vcpu, 4),
> >>+						kvmppc_get_gpr(vcpu, 5),
> >>+						kvmppc_get_gpr(vcpu, 6),
> >>+						kvmppc_get_gpr(vcpu, 7));
> >>+		if (ret == H_TOO_HARD)
> >>+			return RESUME_HOST;
> >>+		break;
> >>+	case H_STUFF_TCE:
> >>+		ret = kvmppc_h_stuff_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
> >>+						kvmppc_get_gpr(vcpu, 5),
> >>+						kvmppc_get_gpr(vcpu, 6),
> >>+						kvmppc_get_gpr(vcpu, 7));
> >>+		if (ret == H_TOO_HARD)
> >>+			return RESUME_HOST;
> >>+		break;
> >>  	default:
> >>  		return RESUME_HOST;
> >>  	}
> >>diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> >>index 3c6badc..3bf6e72 100644
> >>--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> >>+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> >>@@ -1928,7 +1928,7 @@ hcall_real_table:
> >>  	.long	DOTSYM(kvmppc_h_clear_ref) - hcall_real_table
> >>  	.long	DOTSYM(kvmppc_h_protect) - hcall_real_table
> >>  	.long	DOTSYM(kvmppc_h_get_tce) - hcall_real_table
> >>-	.long	DOTSYM(kvmppc_h_put_tce) - hcall_real_table
> >>+	.long	DOTSYM(kvmppc_rm_h_put_tce) - hcall_real_table
> >>  	.long	0		/* 0x24 - H_SET_SPRG0 */
> >>  	.long	DOTSYM(kvmppc_h_set_dabr) - hcall_real_table
> >>  	.long	0		/* 0x2c */
> >>@@ -2006,8 +2006,8 @@ hcall_real_table:
> >>  	.long	0		/* 0x12c */
> >>  	.long	0		/* 0x130 */
> >>  	.long	DOTSYM(kvmppc_h_set_xdabr) - hcall_real_table
> >>-	.long	0		/* 0x138 */
> >>-	.long	0		/* 0x13c */
> >>+	.long	DOTSYM(kvmppc_rm_h_stuff_tce) - hcall_real_table
> >>+	.long	DOTSYM(kvmppc_rm_h_put_tce_indirect) - hcall_real_table
> >>  	.long	0		/* 0x140 */
> >>  	.long	0		/* 0x144 */
> >>  	.long	0		/* 0x148 */
> >>diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
> >>index f2c75a1..02176fd 100644
> >>--- a/arch/powerpc/kvm/book3s_pr_papr.c
> >>+++ b/arch/powerpc/kvm/book3s_pr_papr.c
> >>@@ -280,6 +280,37 @@ static int kvmppc_h_pr_logical_ci_store(struct kvm_vcpu *vcpu)
> >>  	return EMULATE_DONE;
> >>  }
> >>
> >>+static int kvmppc_h_pr_put_tce_indirect(struct kvm_vcpu *vcpu)
> >>+{
> >>+	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
> >>+	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
> >>+	unsigned long tce = kvmppc_get_gpr(vcpu, 6);
> >>+	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
> >>+	long rc;
> >>+
> >>+	rc = kvmppc_h_put_tce_indirect(vcpu, liobn, ioba,
> >>+			tce, npages);
> >>+	if (rc == H_TOO_HARD)
> >>+		return EMULATE_FAIL;
> >>+	kvmppc_set_gpr(vcpu, 3, rc);
> >>+	return EMULATE_DONE;
> >>+}
> >>+
> >>+static int kvmppc_h_pr_stuff_tce(struct kvm_vcpu *vcpu)
> >>+{
> >>+	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
> >>+	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
> >>+	unsigned long tce_value = kvmppc_get_gpr(vcpu, 6);
> >>+	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
> >>+	long rc;
> >>+
> >>+	rc = kvmppc_h_stuff_tce(vcpu, liobn, ioba, tce_value, npages);
> >>+	if (rc == H_TOO_HARD)
> >>+		return EMULATE_FAIL;
> >>+	kvmppc_set_gpr(vcpu, 3, rc);
> >>+	return EMULATE_DONE;
> >>+}
> >>+
> >>  static int kvmppc_h_pr_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
> >>  {
> >>  	long rc = kvmppc_xics_hcall(vcpu, cmd);
> >>@@ -306,6 +337,10 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
> >>  		return kvmppc_h_pr_bulk_remove(vcpu);
> >>  	case H_PUT_TCE:
> >>  		return kvmppc_h_pr_put_tce(vcpu);
> >>+	case H_PUT_TCE_INDIRECT:
> >>+		return kvmppc_h_pr_put_tce_indirect(vcpu);
> >>+	case H_STUFF_TCE:
> >>+		return kvmppc_h_pr_stuff_tce(vcpu);
> >>  	case H_CEDE:
> >>  		kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
> >>  		kvm_vcpu_block(vcpu);
> >>diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> >>index 6fd2405..164735c 100644
> >>--- a/arch/powerpc/kvm/powerpc.c
> >>+++ b/arch/powerpc/kvm/powerpc.c
> >>@@ -569,6 +569,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >>  	case KVM_CAP_PPC_GET_SMMU_INFO:
> >>  		r = 1;
> >>  		break;
> >>+	case KVM_CAP_SPAPR_MULTITCE:
> >>+		r = 1;
> >>+		break;
> >
> >Hmm, usual practice has been not to enable new KVM hcalls, unless
> >userspace (qemu) explicitly enables them with ENABLE_HCALL.  I don't
> >see an obvious way this extension could break, but it's probably
> >safest to continue that pattern.
> 
> 
> This advertises the capability but does not enable it; this is still
> required in QEMU:
> 
> ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PPC_ENABLE_HCALL, 0,
>                         H_PUT_TCE_INDIRECT, 1);
> ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PPC_ENABLE_HCALL, 0,
>                         H_STUFF_TCE, 1);
> 
> as multi-tce hcalls are not in the default_hcall_list list in
> arch/powerpc/kvm/book3s_hv.c.

Ah, right, sorry.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH kernel v2 6/6] KVM: PPC: Add support for multiple-TCE hcalls
@ 2016-01-25  5:21         ` David Gibson
  0 siblings, 0 replies; 48+ messages in thread
From: David Gibson @ 2016-01-25  5:21 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, Paul Mackerras, kvm-ppc, kvm

[-- Attachment #1: Type: text/plain, Size: 22857 bytes --]

On Mon, Jan 25, 2016 at 12:24:29PM +1100, Alexey Kardashevskiy wrote:
> On 01/25/2016 11:44 AM, David Gibson wrote:
> >On Thu, Jan 21, 2016 at 06:39:37PM +1100, Alexey Kardashevskiy wrote:
> >>This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
> >>H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
> >>devices or emulated PCI.  These calls allow adding multiple entries
> >>(up to 512) into the TCE table in one call which saves time on
> >>transition between kernel and user space.
> >>
> >>This implements the KVM_CAP_PPC_MULTITCE capability. When present,
> >>the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE.
> >>If they can not be handled by the kernel, they are passed on to
> >>the user space. The user space still has to have an implementation
> >>for these.
> >>
> >>Both HV and PR-syle KVM are supported.
> >>
> >>Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>---
> >>Changes:
> >>v2:
> >>* compare @ret with H_SUCCESS instead of assuming H_SUCCESS is zero
> >>* s/~IOMMU_PAGE_MASK_4K/SZ_4K-1/ when testing @tce_list
> >>---
> >>  Documentation/virtual/kvm/api.txt       |  25 ++++++
> >>  arch/powerpc/include/asm/kvm_ppc.h      |  12 +++
> >>  arch/powerpc/kvm/book3s_64_vio.c        | 110 +++++++++++++++++++++++-
> >>  arch/powerpc/kvm/book3s_64_vio_hv.c     | 145 ++++++++++++++++++++++++++++++--
> >>  arch/powerpc/kvm/book3s_hv.c            |  26 +++++-
> >>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   6 +-
> >>  arch/powerpc/kvm/book3s_pr_papr.c       |  35 ++++++++
> >>  arch/powerpc/kvm/powerpc.c              |   3 +
> >>  8 files changed, 349 insertions(+), 13 deletions(-)
> >>
> >>diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >>index 07e4cdf..da39435 100644
> >>--- a/Documentation/virtual/kvm/api.txt
> >>+++ b/Documentation/virtual/kvm/api.txt
> >>@@ -3035,6 +3035,31 @@ Returns: 0 on success, -1 on error
> >>
> >>  Queues an SMI on the thread's vcpu.
> >>
> >>+4.97 KVM_CAP_PPC_MULTITCE
> >>+
> >>+Capability: KVM_CAP_PPC_MULTITCE
> >>+Architectures: ppc
> >>+Type: vm
> >>+
> >>+This capability means the kernel is capable of handling hypercalls
> >>+H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user
> >>+space. This significantly accelerates DMA operations for PPC KVM guests.
> >>+User space should expect that its handlers for these hypercalls
> >>+are not going to be called if user space previously registered LIOBN
> >>+in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
> >>+
> >>+In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
> >>+user space might have to advertise it for the guest. For example,
> >>+IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
> >>+present in the "ibm,hypertas-functions" device-tree property.
> >>+
> >>+The hypercalls mentioned above may or may not be processed successfully
> >>+in the kernel based fast path. If they can not be handled by the kernel,
> >>+they will get passed on to user space. So user space still has to have
> >>+an implementation for these despite the in kernel acceleration.
> >>+
> >>+This capability is always enabled.
> >>+
> >>  5. The kvm_run structure
> >>  ------------------------
> >>
> >>diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> >>index 9513911..4cadee5 100644
> >>--- a/arch/powerpc/include/asm/kvm_ppc.h
> >>+++ b/arch/powerpc/include/asm/kvm_ppc.h
> >>@@ -166,12 +166,24 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
> >>
> >>  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
> >>  				struct kvm_create_spapr_tce *args);
> >>+extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
> >>+		struct kvm_vcpu *vcpu, unsigned long liobn);
> >>  extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
> >>  		unsigned long ioba, unsigned long npages);
> >>  extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
> >>  		unsigned long tce);
> >>+extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
> >>+		unsigned long *ua, unsigned long **prmap);
> >
> >Putting a userspace address into an unsigned long is pretty nasty: it
> >should be a something __user *.
> >
> >>+extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
> >>+		unsigned long idx, unsigned long tce);
> >>  extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> >>  			     unsigned long ioba, unsigned long tce);
> >>+extern long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> >>+		unsigned long liobn, unsigned long ioba,
> >>+		unsigned long tce_list, unsigned long npages);
> >>+extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
> >>+		unsigned long liobn, unsigned long ioba,
> >>+		unsigned long tce_value, unsigned long npages);
> >>  extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> >>  			     unsigned long ioba);
> >>  extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
> >>diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> >>index 975f0ab..987f406 100644
> >>--- a/arch/powerpc/kvm/book3s_64_vio.c
> >>+++ b/arch/powerpc/kvm/book3s_64_vio.c
> >>@@ -14,6 +14,7 @@
> >>   *
> >>   * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
> >>   * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
> >>+ * Copyright 2016 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
> >>   */
> >>
> >>  #include <linux/types.h>
> >>@@ -37,8 +38,7 @@
> >>  #include <asm/kvm_host.h>
> >>  #include <asm/udbg.h>
> >>  #include <asm/iommu.h>
> >>-
> >>-#define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
> >>+#include <asm/tce.h>
> >>
> >>  static unsigned long kvmppc_stt_npages(unsigned long window_size)
> >>  {
> >>@@ -200,3 +200,109 @@ fail:
> >>  	}
> >>  	return ret;
> >>  }
> >>+
> >>+long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
> >>+		unsigned long liobn, unsigned long ioba,
> >>+		unsigned long tce)
> >>+{
> >>+	long ret;
> >>+	struct kvmppc_spapr_tce_table *stt;
> >>+
> >>+	stt = kvmppc_find_table(vcpu, liobn);
> >>+	if (!stt)
> >>+		return H_TOO_HARD;
> >>+
> >>+	ret = kvmppc_ioba_validate(stt, ioba, 1);
> >>+	if (ret != H_SUCCESS)
> >>+		return ret;
> >>+
> >>+	ret = kvmppc_tce_validate(stt, tce);
> >>+	if (ret != H_SUCCESS)
> >>+		return ret;
> >>+
> >>+	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
> >>+
> >>+	return H_SUCCESS;
> >>+}
> >>+EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
> >>+
> >>+long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> >>+		unsigned long liobn, unsigned long ioba,
> >>+		unsigned long tce_list, unsigned long npages)
> >>+{
> >>+	struct kvmppc_spapr_tce_table *stt;
> >>+	long i, ret = H_SUCCESS, idx;
> >>+	unsigned long entry, ua = 0;
> >>+	u64 __user *tces, tce;
> >>+
> >>+	stt = kvmppc_find_table(vcpu, liobn);
> >>+	if (!stt)
> >>+		return H_TOO_HARD;
> >>+
> >>+	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
> >>+	/*
> >>+	 * SPAPR spec says that the maximum size of the list is 512 TCEs
> >>+	 * so the whole table fits in 4K page
> >>+	 */
> >>+	if (npages > 512)
> >>+		return H_PARAMETER;
> >>+
> >>+	if (tce_list & (SZ_4K - 1))
> >>+		return H_PARAMETER;
> >>+
> >>+	ret = kvmppc_ioba_validate(stt, ioba, npages);
> >>+	if (ret != H_SUCCESS)
> >>+		return ret;
> >>+
> >>+	idx = srcu_read_lock(&vcpu->kvm->srcu);
> >>+	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL)) {
> >>+		ret = H_TOO_HARD;
> >>+		goto unlock_exit;
> >>+	}
> >>+	tces = (u64 __user *) ua;
> >>+
> >>+	for (i = 0; i < npages; ++i) {
> >>+		if (get_user(tce, tces + i)) {
> >>+			ret = H_PARAMETER;
> >>+			goto unlock_exit;
> >>+		}
> >>+		tce = be64_to_cpu(tce);
> >>+
> >>+		ret = kvmppc_tce_validate(stt, tce);
> >>+		if (ret != H_SUCCESS)
> >>+			goto unlock_exit;
> >>+
> >>+		kvmppc_tce_put(stt, entry + i, tce);
> >>+	}
> >>+
> >>+unlock_exit:
> >>+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
> >>+
> >>+	return ret;
> >>+}
> >>+EXPORT_SYMBOL_GPL(kvmppc_h_put_tce_indirect);
> >>+
> >>+long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
> >>+		unsigned long liobn, unsigned long ioba,
> >>+		unsigned long tce_value, unsigned long npages)
> >>+{
> >>+	struct kvmppc_spapr_tce_table *stt;
> >>+	long i, ret;
> >>+
> >>+	stt = kvmppc_find_table(vcpu, liobn);
> >>+	if (!stt)
> >>+		return H_TOO_HARD;
> >>+
> >>+	ret = kvmppc_ioba_validate(stt, ioba, npages);
> >>+	if (ret != H_SUCCESS)
> >>+		return ret;
> >>+
> >>+	if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
> >>+		return H_PARAMETER;
> >
> >Do we really need to allow no-permission but non-zero TCEs?
> 
> Not sure, for debugging purposes one could want to poison the table. Totally
> useless?

Hmm, I guess.  Ok, leave it as is.

> >>+
> >>+	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
> >>+		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
> >>+
> >>+	return H_SUCCESS;
> >>+}
> >>+EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
> >>diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> >>index 8cd3a95..58c63ed 100644
> >>--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> >>+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> >>@@ -14,6 +14,7 @@
> >>   *
> >>   * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
> >>   * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
> >>+ * Copyright 2016 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
> >>   */
> >>
> >>  #include <linux/types.h>
> >>@@ -30,6 +31,7 @@
> >>  #include <asm/kvm_ppc.h>
> >>  #include <asm/kvm_book3s.h>
> >>  #include <asm/mmu-hash64.h>
> >>+#include <asm/mmu_context.h>
> >>  #include <asm/hvcall.h>
> >>  #include <asm/synch.h>
> >>  #include <asm/ppc-opcode.h>
> >>@@ -37,6 +39,7 @@
> >>  #include <asm/udbg.h>
> >>  #include <asm/iommu.h>
> >>  #include <asm/tce.h>
> >>+#include <asm/iommu.h>
> >>
> >>  #define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
> >>
> >>@@ -46,7 +49,7 @@
> >>   * WARNING: This will be called in real or virtual mode on HV KVM and virtual
> >>   *          mode on PR KVM
> >>   */
> >>-static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
> >>+struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
> >>  		unsigned long liobn)
> >>  {
> >>  	struct kvm *kvm = vcpu->kvm;
> >>@@ -58,6 +61,7 @@ static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu,
> >>
> >>  	return NULL;
> >>  }
> >>+EXPORT_SYMBOL_GPL(kvmppc_find_table);
> >>
> >>  /*
> >>   * Validates IO address.
> >>@@ -150,11 +154,31 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
> >>  }
> >>  EXPORT_SYMBOL_GPL(kvmppc_tce_put);
> >>
> >>-/* WARNING: This will be called in real-mode on HV KVM and virtual
> >>- *          mode on PR KVM
> >>- */
> >>-long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> >>-		      unsigned long ioba, unsigned long tce)
> >>+long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
> >>+		unsigned long *ua, unsigned long **prmap)
> >>+{
> >>+	unsigned long gfn = gpa >> PAGE_SHIFT;
> >>+	struct kvm_memory_slot *memslot;
> >>+
> >>+	memslot = search_memslots(kvm_memslots(kvm), gfn);
> >>+	if (!memslot)
> >>+		return -EINVAL;
> >>+
> >>+	*ua = __gfn_to_hva_memslot(memslot, gfn) |
> >>+		(gpa & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
> >>+
> >>+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> >
> >It's a bit odd to see a test for HV_POSSIBLE in a file named
> >book3s_64_vio_hv.c
> 
> 
> True, the file name should have probably been changed book3s_64_vio_rm.c.
> 
> 
> >
> >>+	if (prmap)
> >>+		*prmap = &memslot->arch.rmap[gfn - memslot->base_gfn];
> >>+#endif
> >>+
> >>+	return 0;
> >>+}
> >>+EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua);
> >>+
> >>+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> >>+long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> >>+		unsigned long ioba, unsigned long tce)
> >>  {
> >>  	struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
> >>  	long ret;
> >>@@ -177,7 +201,112 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> >>
> >>  	return H_SUCCESS;
> >>  }
> >>-EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
> >>+
> >>+static long kvmppc_rm_ua_to_hpa(struct kvm_vcpu *vcpu,
> >>+		unsigned long ua, unsigned long *phpa)
> >
> >ua should be a something __user * rather than an unsigned long.  And
> >come to that hpa should be a something * rather than an unsigned long.
> 
> 
> @ua is the return type of __gfn_to_hva_memslot() so I kept it. Also, the
> only place where I actually read from this address is the virtualmode's
> H_PUT_TCE_INDIRECT handler, all other places just do translation with it so
> making it "unsigned long" saves some type convertions. It is also used in
> mm_iommu_ua_to_hpa() which is in upstream now (commit 15b244a88e1b289, part
> of DDW and preregistration patchset). Still need to change it?

Hmm, I guess not.

> Regarding @phpa, the agreement here that we use "void *" for 0xC000.... type
> of addresses; and we use "unsigned long" if top 4 bits are not set as
> dereferencing such pointers will normally fail.

Ah, yes, sorry, forgot that it was an HV physical addr, not an HV
virtual addr.

> 
> 
> 
> >>+{
> >>+	pte_t *ptep, pte;
> >>+	unsigned shift = 0;
> >>+
> >>+	ptep = __find_linux_pte_or_hugepte(vcpu->arch.pgdir, ua, NULL, &shift);
> >>+	if (!ptep || !pte_present(*ptep))
> >>+		return -ENXIO;
> >>+	pte = *ptep;
> >>+
> >>+	if (!shift)
> >>+		shift = PAGE_SHIFT;
> >>+
> >>+	/* Avoid handling anything potentially complicated in realmode */
> >>+	if (shift > PAGE_SHIFT)
> >>+		return -EAGAIN;
> >>+
> >>+	if (!pte_young(pte))
> >>+		return -EAGAIN;
> >>+
> >>+	*phpa = (pte_pfn(pte) << PAGE_SHIFT) | (ua & ((1ULL << shift) - 1)) |
> >>+			(ua & ~PAGE_MASK);
> >>+
> >>+	return 0;
> >>+}
> >>+
> >>+long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> >>+		unsigned long liobn, unsigned long ioba,
> >>+		unsigned long tce_list,	unsigned long npages)
> >>+{
> >>+	struct kvmppc_spapr_tce_table *stt;
> >>+	long i, ret = H_SUCCESS;
> >>+	unsigned long tces, entry, ua = 0;
> >>+	unsigned long *rmap = NULL;
> >>+
> >>+	stt = kvmppc_find_table(vcpu, liobn);
> >>+	if (!stt)
> >>+		return H_TOO_HARD;
> >>+
> >>+	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
> >>+	/*
> >>+	 * The spec says that the maximum size of the list is 512 TCEs
> >>+	 * so the whole table addressed resides in 4K page
> >>+	 */
> >>+	if (npages > 512)
> >>+		return H_PARAMETER;
> >>+
> >>+	if (tce_list & (SZ_4K - 1))
> >>+		return H_PARAMETER;
> >>+
> >>+	ret = kvmppc_ioba_validate(stt, ioba, npages);
> >>+	if (ret != H_SUCCESS)
> >>+		return ret;
> >>+
> >>+	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
> >>+		return H_TOO_HARD;
> >>+
> >>+	rmap = (void *) vmalloc_to_phys(rmap);
> >>+
> >>+	lock_rmap(rmap);
> >>+	if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
> >>+		ret = H_TOO_HARD;
> >>+		goto unlock_exit;
> >>+	}
> >>+
> >>+	for (i = 0; i < npages; ++i) {
> >>+		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
> >>+
> >>+		ret = kvmppc_tce_validate(stt, tce);
> >>+		if (ret != H_SUCCESS)
> >>+			goto unlock_exit;
> >>+
> >>+		kvmppc_tce_put(stt, entry + i, tce);
> >>+	}
> >>+
> >>+unlock_exit:
> >>+	unlock_rmap(rmap);
> >>+
> >>+	return ret;
> >>+}
> >>+
> >>+long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
> >>+		unsigned long liobn, unsigned long ioba,
> >>+		unsigned long tce_value, unsigned long npages)
> >
> >Unlike put_indirect, this code appears to be identical to the non
> >realmode code - can you combine them?
> 
> 
> It is at this point but this will get different bits in "KVM: PPC: vfio kvm
> device: support spapr tce" later, may be sometime later I will manage to get
> to that part, eventually...

I think I'd prefer to see them split only in the patch that actually
makes them different.

> 
> 
> >>+{
> >>+	struct kvmppc_spapr_tce_table *stt;
> >>+	long i, ret;
> >>+
> >>+	stt = kvmppc_find_table(vcpu, liobn);
> >>+	if (!stt)
> >>+		return H_TOO_HARD;
> >>+
> >>+	ret = kvmppc_ioba_validate(stt, ioba, npages);
> >>+	if (ret != H_SUCCESS)
> >>+		return ret;
> >>+
> >>+	if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
> >>+		return H_PARAMETER;
> >>+
> >>+	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
> >>+		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
> >>+
> >>+	return H_SUCCESS;
> >>+}
> >>
> >>  long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> >>  		unsigned long ioba)
> >>@@ -204,3 +333,5 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> >>  	return H_SUCCESS;
> >>  }
> >>  EXPORT_SYMBOL_GPL(kvmppc_h_get_tce);
> >>+
> >>+#endif /* KVM_BOOK3S_HV_POSSIBLE */
> >>diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> >>index cff207b..df3fbae 100644
> >>--- a/arch/powerpc/kvm/book3s_hv.c
> >>+++ b/arch/powerpc/kvm/book3s_hv.c
> >>@@ -768,7 +768,31 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
> >>  		if (kvmppc_xics_enabled(vcpu)) {
> >>  			ret = kvmppc_xics_hcall(vcpu, req);
> >>  			break;
> >>-		} /* fallthrough */
> >>+		}
> >>+		return RESUME_HOST;
> >>+	case H_PUT_TCE:
> >>+		ret = kvmppc_h_put_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
> >>+						kvmppc_get_gpr(vcpu, 5),
> >>+						kvmppc_get_gpr(vcpu, 6));
> >>+		if (ret == H_TOO_HARD)
> >>+			return RESUME_HOST;
> >>+		break;
> >>+	case H_PUT_TCE_INDIRECT:
> >>+		ret = kvmppc_h_put_tce_indirect(vcpu, kvmppc_get_gpr(vcpu, 4),
> >>+						kvmppc_get_gpr(vcpu, 5),
> >>+						kvmppc_get_gpr(vcpu, 6),
> >>+						kvmppc_get_gpr(vcpu, 7));
> >>+		if (ret == H_TOO_HARD)
> >>+			return RESUME_HOST;
> >>+		break;
> >>+	case H_STUFF_TCE:
> >>+		ret = kvmppc_h_stuff_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
> >>+						kvmppc_get_gpr(vcpu, 5),
> >>+						kvmppc_get_gpr(vcpu, 6),
> >>+						kvmppc_get_gpr(vcpu, 7));
> >>+		if (ret == H_TOO_HARD)
> >>+			return RESUME_HOST;
> >>+		break;
> >>  	default:
> >>  		return RESUME_HOST;
> >>  	}
> >>diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> >>index 3c6badc..3bf6e72 100644
> >>--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> >>+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> >>@@ -1928,7 +1928,7 @@ hcall_real_table:
> >>  	.long	DOTSYM(kvmppc_h_clear_ref) - hcall_real_table
> >>  	.long	DOTSYM(kvmppc_h_protect) - hcall_real_table
> >>  	.long	DOTSYM(kvmppc_h_get_tce) - hcall_real_table
> >>-	.long	DOTSYM(kvmppc_h_put_tce) - hcall_real_table
> >>+	.long	DOTSYM(kvmppc_rm_h_put_tce) - hcall_real_table
> >>  	.long	0		/* 0x24 - H_SET_SPRG0 */
> >>  	.long	DOTSYM(kvmppc_h_set_dabr) - hcall_real_table
> >>  	.long	0		/* 0x2c */
> >>@@ -2006,8 +2006,8 @@ hcall_real_table:
> >>  	.long	0		/* 0x12c */
> >>  	.long	0		/* 0x130 */
> >>  	.long	DOTSYM(kvmppc_h_set_xdabr) - hcall_real_table
> >>-	.long	0		/* 0x138 */
> >>-	.long	0		/* 0x13c */
> >>+	.long	DOTSYM(kvmppc_rm_h_stuff_tce) - hcall_real_table
> >>+	.long	DOTSYM(kvmppc_rm_h_put_tce_indirect) - hcall_real_table
> >>  	.long	0		/* 0x140 */
> >>  	.long	0		/* 0x144 */
> >>  	.long	0		/* 0x148 */
> >>diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
> >>index f2c75a1..02176fd 100644
> >>--- a/arch/powerpc/kvm/book3s_pr_papr.c
> >>+++ b/arch/powerpc/kvm/book3s_pr_papr.c
> >>@@ -280,6 +280,37 @@ static int kvmppc_h_pr_logical_ci_store(struct kvm_vcpu *vcpu)
> >>  	return EMULATE_DONE;
> >>  }
> >>
> >>+static int kvmppc_h_pr_put_tce_indirect(struct kvm_vcpu *vcpu)
> >>+{
> >>+	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
> >>+	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
> >>+	unsigned long tce = kvmppc_get_gpr(vcpu, 6);
> >>+	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
> >>+	long rc;
> >>+
> >>+	rc = kvmppc_h_put_tce_indirect(vcpu, liobn, ioba,
> >>+			tce, npages);
> >>+	if (rc == H_TOO_HARD)
> >>+		return EMULATE_FAIL;
> >>+	kvmppc_set_gpr(vcpu, 3, rc);
> >>+	return EMULATE_DONE;
> >>+}
> >>+
> >>+static int kvmppc_h_pr_stuff_tce(struct kvm_vcpu *vcpu)
> >>+{
> >>+	unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
> >>+	unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
> >>+	unsigned long tce_value = kvmppc_get_gpr(vcpu, 6);
> >>+	unsigned long npages = kvmppc_get_gpr(vcpu, 7);
> >>+	long rc;
> >>+
> >>+	rc = kvmppc_h_stuff_tce(vcpu, liobn, ioba, tce_value, npages);
> >>+	if (rc == H_TOO_HARD)
> >>+		return EMULATE_FAIL;
> >>+	kvmppc_set_gpr(vcpu, 3, rc);
> >>+	return EMULATE_DONE;
> >>+}
> >>+
> >>  static int kvmppc_h_pr_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
> >>  {
> >>  	long rc = kvmppc_xics_hcall(vcpu, cmd);
> >>@@ -306,6 +337,10 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
> >>  		return kvmppc_h_pr_bulk_remove(vcpu);
> >>  	case H_PUT_TCE:
> >>  		return kvmppc_h_pr_put_tce(vcpu);
> >>+	case H_PUT_TCE_INDIRECT:
> >>+		return kvmppc_h_pr_put_tce_indirect(vcpu);
> >>+	case H_STUFF_TCE:
> >>+		return kvmppc_h_pr_stuff_tce(vcpu);
> >>  	case H_CEDE:
> >>  		kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
> >>  		kvm_vcpu_block(vcpu);
> >>diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> >>index 6fd2405..164735c 100644
> >>--- a/arch/powerpc/kvm/powerpc.c
> >>+++ b/arch/powerpc/kvm/powerpc.c
> >>@@ -569,6 +569,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >>  	case KVM_CAP_PPC_GET_SMMU_INFO:
> >>  		r = 1;
> >>  		break;
> >>+	case KVM_CAP_SPAPR_MULTITCE:
> >>+		r = 1;
> >>+		break;
> >
> >Hmm, usual practice has been not to enable new KVM hcalls, unless
> >userspace (qemu) explicitly enables them with ENABLE_HCALL.  I don't
> >see an obvious way this extension could break, but it's probably
> >safest to continue that pattern.
> 
> 
> This advertises the capability but does not enable it; enabling is still
> required in QEMU:
> 
> ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PPC_ENABLE_HCALL, 0,
>                         H_PUT_TCE_INDIRECT, 1);
> ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PPC_ENABLE_HCALL, 0,
>                         H_STUFF_TCE, 1);
> 
> as multi-tce hcalls are not in default_hcall_list in
> arch/powerpc/kvm/book3s_hv.c.
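
For context, HV KVM keeps a per-VM bitmap of enabled hcalls which
KVM_CAP_PPC_ENABLE_HCALL sets and default_hcall_list merely seeds; the
dispatch path consults it before any in-kernel handler runs. A rough
sketch of that gate, simplified from the check near the top of
kvmppc_pseries_do_hcall() (not verbatim kernel code):

	/* Sketch: hcalls whose bit is clear in the per-VM bitmap are
	 * punted back to user space before any in-kernel handler runs. */
	unsigned long req = kvmppc_get_gpr(vcpu, 3);

	if (req <= MAX_HCALL_OPCODE &&
	    !test_bit(req / 4, vcpu->kvm->arch.enabled_hcalls))
		return RESUME_HOST;

So advertising KVM_CAP_SPAPR_MULTITCE and enabling H_PUT_TCE_INDIRECT /
H_STUFF_TCE remain two separate steps.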

Ah, right, sorry.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH kernel v2 1/6] KVM: PPC: Rework H_PUT_TCE/H_GET_TCE handlers
  2016-01-22  1:59       ` Alexey Kardashevskiy
@ 2016-02-11  4:11         ` Paul Mackerras
  0 siblings, 0 replies; 48+ messages in thread
From: Paul Mackerras @ 2016-02-11  4:11 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: David Gibson, linuxppc-dev, kvm-ppc, kvm

On Fri, Jan 22, 2016 at 12:59:47PM +1100, Alexey Kardashevskiy wrote:
> On 01/22/2016 11:42 AM, David Gibson wrote:
> >On Thu, Jan 21, 2016 at 06:39:32PM +1100, Alexey Kardashevskiy wrote:
[snip]
> >>+	if ((ioba & mask) || (idx + npages > size))
> >
> >It doesn't matter for the current callers, but you should check for
> >overflow in idx + npages as well.
> 
> 
> npages can only be 1..512, and this is checked in the H_PUT_TCE/etc.
> handlers. idx is at most 52 bits.
> This is not going to change, because H_PUT_TCE_INDIRECT will always be
> limited to 512 entries (one 4K page).
> 
> Do I still need the overflow check here?

You could add "|| npages > TCES_PER_PAGE" and that would make it clear
that there can't be any overflow, and it should get removed by the
compiler for the calls with constant npages.
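
Folded into the validator from patch 1/6, the suggestion would read roughly
as below (a sketch assembled from the quoted hunk; the struct field names
are assumptions, not the committed code):

	static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
			unsigned long ioba, unsigned long npages)
	{
		unsigned long mask = (1UL << IOMMU_PAGE_SHIFT_4K) - 1;
		unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
		unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K;

		/* npages <= TCES_PER_PAGE (512) makes it obvious that
		 * idx + npages cannot overflow; the extra test folds
		 * away for callers passing constant npages. */
		if ((ioba & mask) || (idx + npages > size) ||
				npages > TCES_PER_PAGE)
			return H_PARAMETER;

		return H_SUCCESS;
	}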

Paul.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH kernel v2 5/6] KVM: PPC: Move reusable bits of H_PUT_TCE handler to helpers
  2016-01-21  7:39   ` Alexey Kardashevskiy
@ 2016-02-11  4:39     ` Paul Mackerras
  0 siblings, 0 replies; 48+ messages in thread
From: Paul Mackerras @ 2016-02-11  4:39 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, David Gibson, kvm-ppc, kvm

On Thu, Jan 21, 2016 at 06:39:36PM +1100, Alexey Kardashevskiy wrote:
> Upcoming multi-tce support (H_PUT_TCE_INDIRECT/H_STUFF_TCE hypercalls)
> will validate the TCE (to make sure it has no unexpected bits) and the
> IO address (to make sure it is within the DMA window boundaries).
> 
> This introduces helpers to validate the TCE and the IO address. The
> helpers are exported as they are compiled into vmlinux (to work in real
> mode) and will later be used by the KVM kernel module in virtual mode.
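
For readers without the full patch in front of them, a minimal sketch of
the TCE validation helper being described (the exact mask is an assumption
built from the TCE_PCI_READ/TCE_PCI_WRITE bits used elsewhere in this
series):

	/* Reject a TCE carrying anything besides the page address and
	 * the TCE_PCI_READ/TCE_PCI_WRITE permission bits. */
	long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt,
			unsigned long tce)
	{
		unsigned long mask = ((1UL << IOMMU_PAGE_SHIFT_4K) - 1) &
				~(TCE_PCI_WRITE | TCE_PCI_READ);

		if (tce & mask)
			return H_PARAMETER;

		return H_SUCCESS;
	}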

Comments below...

> +/* Note on the use of page_address() in real mode,
> + *
> + * It is safe to use page_address() in real mode on ppc64 because
> + * page_address() is always defined as lowmem_page_address()
> + * which returns __va(PFN_PHYS(page_to_pfn(page))) which is arithmetial

"arithmetic" not "arithmetial"

> + * operation and does not access page struct.
> + *
> + * Theoretically page_address() could be defined different
> + * but either WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL
> + * should be enabled.

"would have to be enabled" not "should be enabled"

Apart from those nits, the patch looks fine.

Paul.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH kernel v2 6/6] KVM: PPC: Add support for multiple-TCE hcalls
  2016-01-21  7:39   ` Alexey Kardashevskiy
@ 2016-02-11  5:32     ` Paul Mackerras
  0 siblings, 0 replies; 48+ messages in thread
From: Paul Mackerras @ 2016-02-11  5:32 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, David Gibson, kvm-ppc, kvm

On Thu, Jan 21, 2016 at 06:39:37PM +1100, Alexey Kardashevskiy wrote:
> This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
> H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
> devices or emulated PCI.  These calls allow adding multiple entries
> (up to 512) into the TCE table in one call, which saves time on the
> transition between kernel and user space.
> 
> This implements the KVM_CAP_PPC_MULTITCE capability. When present,
> the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE.
> If they cannot be handled by the kernel, they are passed on to
> user space. The user space still has to have an implementation
> for these.
> 
> Both HV and PR-style KVM are supported.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

[snip]

> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 975f0ab..987f406 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -14,6 +14,7 @@
>   *
>   * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>   * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
> + * Copyright 2016 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>   */
>  
>  #include <linux/types.h>
> @@ -37,8 +38,7 @@
>  #include <asm/kvm_host.h>
>  #include <asm/udbg.h>
>  #include <asm/iommu.h>
> -
> -#define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
> +#include <asm/tce.h>
>  
>  static unsigned long kvmppc_stt_npages(unsigned long window_size)
>  {
> @@ -200,3 +200,109 @@ fail:
>  	}
>  	return ret;
>  }
> +
> +long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce)
> +{
> +	long ret;
> +	struct kvmppc_spapr_tce_table *stt;
> +
> +	stt = kvmppc_find_table(vcpu, liobn);
> +	if (!stt)
> +		return H_TOO_HARD;
> +
> +	ret = kvmppc_ioba_validate(stt, ioba, 1);
> +	if (ret != H_SUCCESS)
> +		return ret;
> +
> +	ret = kvmppc_tce_validate(stt, tce);
> +	if (ret != H_SUCCESS)
> +		return ret;
> +
> +	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
> +
> +	return H_SUCCESS;
> +}
> +EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);

As far as I can see, this is functionally identical to the
kvmppc_h_put_tce that we have in book3s_64_vio_hv.c, which gets renamed
later on in this patch.  It would be good to have an explanation in
the commit message why we want two almost-identical functions (and
similarly for kvmppc_h_stuff_tce).  Is it because a future patch is
going to make them different, for instance?

> +long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_value, unsigned long npages)
> +{
> +	struct kvmppc_spapr_tce_table *stt;
> +	long i, ret;
> +
> +	stt = kvmppc_find_table(vcpu, liobn);
> +	if (!stt)
> +		return H_TOO_HARD;
> +
> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
> +	if (ret != H_SUCCESS)
> +		return ret;
> +
> +	if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
> +		return H_PARAMETER;
> +
> +	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)

Looks like we need a bounds check on npages, presumably in
kvmppc_ioba_validate().

> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 8cd3a95..58c63ed 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
[...]
> +long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> +		unsigned long liobn, unsigned long ioba,
> +		unsigned long tce_list,	unsigned long npages)
> +{
> +	struct kvmppc_spapr_tce_table *stt;
> +	long i, ret = H_SUCCESS;
> +	unsigned long tces, entry, ua = 0;
> +	unsigned long *rmap = NULL;
> +
> +	stt = kvmppc_find_table(vcpu, liobn);
> +	if (!stt)
> +		return H_TOO_HARD;
> +
> +	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
> +	/*
> +	 * The spec says that the maximum size of the list is 512 TCEs,
> +	 * so the whole table addressed resides in a single 4K page
> +	 */
> +	if (npages > 512)
> +		return H_PARAMETER;
> +
> +	if (tce_list & (SZ_4K - 1))
> +		return H_PARAMETER;
> +
> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
> +	if (ret != H_SUCCESS)
> +		return ret;
> +
> +	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
> +		return H_TOO_HARD;
> +
> +	rmap = (void *) vmalloc_to_phys(rmap);
> +
> +	lock_rmap(rmap);

A comment here explaining why we lock the rmap and what that achieves
would be useful for future generations.

> +	if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
> +		ret = H_TOO_HARD;
> +		goto unlock_exit;
> +	}
> +
> +	for (i = 0; i < npages; ++i) {
> +		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
> +
> +		ret = kvmppc_tce_validate(stt, tce);
> +		if (ret != H_SUCCESS)
> +			goto unlock_exit;
> +
> +		kvmppc_tce_put(stt, entry + i, tce);
> +	}
> +
> +unlock_exit:
> +	unlock_rmap(rmap);
> +
> +	return ret;
> +}

Paul.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH kernel v2 6/6] KVM: PPC: Add support for multiple-TCE hcalls
  2016-02-11  5:32     ` Paul Mackerras
@ 2016-02-12  4:54       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 48+ messages in thread
From: Alexey Kardashevskiy @ 2016-02-12  4:54 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, David Gibson, kvm-ppc, kvm

On 02/11/2016 04:32 PM, Paul Mackerras wrote:
> On Thu, Jan 21, 2016 at 06:39:37PM +1100, Alexey Kardashevskiy wrote:
>> This adds real and virtual mode handlers for the H_PUT_TCE_INDIRECT and
>> H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
>> devices or emulated PCI.  These calls allow adding multiple entries
>> (up to 512) into the TCE table in one call, which saves time on the
>> transition between kernel and user space.
>>
>> This implements the KVM_CAP_PPC_MULTITCE capability. When present,
>> the kernel will try handling H_PUT_TCE_INDIRECT and H_STUFF_TCE.
>> If they cannot be handled by the kernel, they are passed on to
>> user space. The user space still has to have an implementation
>> for these.
>>
>> Both HV and PR-style KVM are supported.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
> [snip]
>
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index 975f0ab..987f406 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -14,6 +14,7 @@
>>    *
>>    * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>>    * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com>
>> + * Copyright 2016 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com>
>>    */
>>
>>   #include <linux/types.h>
>> @@ -37,8 +38,7 @@
>>   #include <asm/kvm_host.h>
>>   #include <asm/udbg.h>
>>   #include <asm/iommu.h>
>> -
>> -#define TCES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
>> +#include <asm/tce.h>
>>
>>   static unsigned long kvmppc_stt_npages(unsigned long window_size)
>>   {
>> @@ -200,3 +200,109 @@ fail:
>>   	}
>>   	return ret;
>>   }
>> +
>> +long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce)
>> +{
>> +	long ret;
>> +	struct kvmppc_spapr_tce_table *stt;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, 1);
>> +	if (ret != H_SUCCESS)
>> +		return ret;
>> +
>> +	ret = kvmppc_tce_validate(stt, tce);
>> +	if (ret != H_SUCCESS)
>> +		return ret;
>> +
>> +	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
>> +
>> +	return H_SUCCESS;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
>
> As far as I can see, this is functionally identical to the
> kvmppc_h_put_tce that we have in book3s_64_vio_hv.c, which gets renamed
> later on in this patch.  It would be good to have an explanation in
> the commit message why we want two almost-identical functions (and
> similarly for kvmppc_h_stuff_tce).  Is it because a future patch is
> going to make them different, for instance?
>
>> +long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_value, unsigned long npages)
>> +{
>> +	struct kvmppc_spapr_tce_table *stt;
>> +	long i, ret;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
>> +	if (ret != H_SUCCESS)
>> +		return ret;
>> +
>> +	if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
>> +		return H_PARAMETER;
>> +
>> +	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
>
> Looks like we need a bounds check on npages, presumably in
> kvmppc_ioba_validate().
>
>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> index 8cd3a95..58c63ed 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> [...]
>> +long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>> +		unsigned long liobn, unsigned long ioba,
>> +		unsigned long tce_list,	unsigned long npages)
>> +{
>> +	struct kvmppc_spapr_tce_table *stt;
>> +	long i, ret = H_SUCCESS;
>> +	unsigned long tces, entry, ua = 0;
>> +	unsigned long *rmap = NULL;
>> +
>> +	stt = kvmppc_find_table(vcpu, liobn);
>> +	if (!stt)
>> +		return H_TOO_HARD;
>> +
>> +	entry = ioba >> IOMMU_PAGE_SHIFT_4K;
>> +	/*
>> +	 * The spec says that the maximum size of the list is 512 TCEs,
>> +	 * so the whole table addressed resides in a single 4K page
>> +	 */
>> +	if (npages > 512)
>> +		return H_PARAMETER;
>> +
>> +	if (tce_list & (SZ_4K - 1))
>> +		return H_PARAMETER;
>> +
>> +	ret = kvmppc_ioba_validate(stt, ioba, npages);
>> +	if (ret != H_SUCCESS)
>> +		return ret;
>> +
>> +	if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
>> +		return H_TOO_HARD;
>> +
>> +	rmap = (void *) vmalloc_to_phys(rmap);
>> +
>> +	lock_rmap(rmap);
>
> A comment here explaining why we lock the rmap and what that achieves
> would be useful for future generations.


/* This protects the guest page with the TCE list from going away while we
are reading the TCE list */

?

By "going away" I mean H_ENTER/H_REMOVE executed on parallel CPUs; is this
roughly correct? I grepped for "lock_rmap()" and did not find a single
comment next to it...



>
>> +	if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) {
>> +		ret = H_TOO_HARD;
>> +		goto unlock_exit;
>> +	}
>> +
>> +	for (i = 0; i < npages; ++i) {
>> +		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
>> +
>> +		ret = kvmppc_tce_validate(stt, tce);
>> +		if (ret != H_SUCCESS)
>> +			goto unlock_exit;
>> +
>> +		kvmppc_tce_put(stt, entry + i, tce);
>> +	}
>> +
>> +unlock_exit:
>> +	unlock_rmap(rmap);
>> +
>> +	return ret;
>> +}
>
> Paul.
>


-- 
Alexey

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH kernel v2 6/6] KVM: PPC: Add support for multiple-TCE hcalls
  2016-02-12  4:54       ` Alexey Kardashevskiy
@ 2016-02-12  5:52         ` Paul Mackerras
  0 siblings, 0 replies; 48+ messages in thread
From: Paul Mackerras @ 2016-02-12  5:52 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, David Gibson, kvm-ppc, kvm

On Fri, Feb 12, 2016 at 03:54:18PM +1100, Alexey Kardashevskiy wrote:
> On 02/11/2016 04:32 PM, Paul Mackerras wrote:
> >On Thu, Jan 21, 2016 at 06:39:37PM +1100, Alexey Kardashevskiy wrote:
> >>+	rmap = (void *) vmalloc_to_phys(rmap);
> >>+
> >>+	lock_rmap(rmap);
> >
> >A comment here explaining why we lock the rmap and what that achieves
> >would be useful for future generations.
> 
> 
> /* This protects the guest page with the TCE list from going away while we
> are reading the TCE list */
> 
> ?
> 
> By "going away" I mean H_ENTER/H_REMOVE executed on parallel CPUs; is this
> roughly correct? I grepped for "lock_rmap()" and did not find a single
> comment next to it...

Actually, taking the rmap lock stops the guest real -> host real
mapping from changing.  For the comment, I suggest this:

	/*
	 * Synchronize with the MMU notifier callbacks in
	 * book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
	 * While we have the rmap lock, code running on other CPUs
	 * cannot finish unmapping the host real page that backs
	 * this guest real page, so we are OK to access the host
	 * real page.
	 */
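
Dropped into the handler quoted earlier in the thread, the comment would
sit directly above the lock; a placement sketch only:

	rmap = (void *) vmalloc_to_phys(rmap);

	/*
	 * Synchronize with the MMU notifier callbacks in
	 * book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.).
	 * While we have the rmap lock, code running on other CPUs
	 * cannot finish unmapping the host real page that backs
	 * this guest real page, so we are OK to access the host
	 * real page.
	 */
	lock_rmap(rmap);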

Paul.

^ permalink raw reply	[flat|nested] 48+ messages in thread
