* [PATCH kernel 0/4] KVM: PPC: Some error handling rework
@ 2018-08-30  3:16 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 20+ messages in thread
From: Alexey Kardashevskiy @ 2018-08-30  3:16 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Paul Mackerras

There are some inconsistencies in how I handle errors, which prevents
us from doing the validation properly in these two patches:
KVM: PPC: Check if IOMMU page is contained in the pinned physical page
KVM: PPC: Book3S: Fix guest DMA when guest partially backed by THP pages


This is based on sha1
ff69279 Ard Biesheuvel "powerpc: disable support for relative ksymtab references".

Please comment. Thanks.



Alexey Kardashevskiy (4):
  KVM: PPC: Validate all tces before updating tables
  KVM: PPC: Inform the userspace about TCE update failures
  KVM: PPC: Validate TCEs against preregistered memory page sizes
  KVM: PPC: Propagate errors to the guest when failed instead of
    ignoring

 arch/powerpc/include/asm/kvm_ppc.h  |  2 -
 arch/powerpc/kvm/book3s_64_vio.c    | 78 ++++++++++++++++++++++++++++---------
 arch/powerpc/kvm/book3s_64_vio_hv.c | 60 ++++++++++++++++------------
 3 files changed, 96 insertions(+), 44 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH kernel 1/4] KVM: PPC: Validate all tces before updating tables
  2018-08-30  3:16 ` Alexey Kardashevskiy
@ 2018-08-30  3:16   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 20+ messages in thread
From: Alexey Kardashevskiy @ 2018-08-30  3:16 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Paul Mackerras

The KVM TCE handlers are written so that they fail either when something
has gone horribly wrong or when userspace has made an obvious mistake
such as passing a misaligned address.

We are going to enhance the TCE checker to fail on attempts to map
an IOMMU page bigger than the underlying pinned memory page, so let's
validate the TCEs beforehand.

This should cause no behavioral change.
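
For illustration, a minimal user-space sketch of the two-pass structure
this patch introduces (hypothetical names, not the actual kernel code):

#include <stdio.h>

#define H_SUCCESS   0
#define H_PARAMETER (-1)

/*
 * Hypothetical stand-in for kvmppc_tce_validate(): reject TCEs which
 * are not aligned to the IOMMU page size (low 2 bits are permissions).
 */
static long validate_tce(unsigned long tce, unsigned long page_mask)
{
	return (tce & page_mask & ~3UL) ? H_PARAMETER : H_SUCCESS;
}

static long put_tce_indirect(const unsigned long *tces, int npages,
		unsigned long *table, unsigned long page_mask)
{
	int i;

	/* Pass 1: validate every TCE before touching the table... */
	for (i = 0; i < npages; ++i)
		if (validate_tce(tces[i], page_mask) != H_SUCCESS)
			return H_PARAMETER;

	/* Pass 2: ...so a failure above leaves the table untouched. */
	for (i = 0; i < npages; ++i)
		table[i] = tces[i];

	return H_SUCCESS;
}

int main(void)
{
	unsigned long tces[2] = { 0x10000, 0x123 }; /* second is misaligned */
	unsigned long table[2] = { 0, 0 };
	long ret = put_tce_indirect(tces, 2, table, 0xfffUL);

	/* Prints "ret=-1 table[0]=0": nothing was committed. */
	printf("ret=%ld table[0]=%#lx\n", ret, table[0]);
	return 0;
}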

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kvm/book3s_64_vio.c    | 8 ++++++++
 arch/powerpc/kvm/book3s_64_vio_hv.c | 4 ++++
 2 files changed, 12 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 9a3f264..0fef22b 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -599,6 +599,14 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 		ret = kvmppc_tce_validate(stt, tce);
 		if (ret != H_SUCCESS)
 			goto unlock_exit;
+	}
+
+	for (i = 0; i < npages; ++i) {
+		if (get_user(tce, tces + i)) {
+			ret = H_TOO_HARD;
+			goto unlock_exit;
+		}
+		tce = be64_to_cpu(tce);
 
 		if (kvmppc_gpa_to_ua(vcpu->kvm,
 				tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 506a4d4..7ab6f3f 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -501,6 +501,10 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 		ret = kvmppc_tce_validate(stt, tce);
 		if (ret != H_SUCCESS)
 			goto unlock_exit;
+	}
+
+	for (i = 0; i < npages; ++i) {
+		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
 
 		ua = 0;
 		if (kvmppc_gpa_to_ua(vcpu->kvm,
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH kernel 2/4] KVM: PPC: Inform the userspace about TCE update failures
  2018-08-30  3:16 ` Alexey Kardashevskiy
@ 2018-08-30  3:16   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 20+ messages in thread
From: Alexey Kardashevskiy @ 2018-08-30  3:16 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Paul Mackerras

We return H_TOO_HARD from TCE update handlers when we think that
the next handler (realmode -> virtual mode -> user mode) has a chance to
handle the request; H_HARDWARE/H_CLOSED otherwise.

This changes the handlers to return H_TOO_HARD on every error, giving
userspace an opportunity to handle any request, or at least to log
them all.
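
A rough user-space model of the realmode -> virtual mode -> user mode
cascade, with hypothetical names (the real dispatch happens via the
hcall exit paths):

#include <stdio.h>

#define H_SUCCESS  0
#define H_TOO_HARD (-2)	/* "pass the request on to the next layer" */

/* Hypothetical models of the three layers which may handle the update. */
static long realmode_put_tce(unsigned long tce) { return H_TOO_HARD; }
static long virtmode_put_tce(unsigned long tce) { return H_TOO_HARD; }
static long usermode_put_tce(unsigned long tce)
{
	printf("userspace: handling (or at least logging) tce=%#lx\n", tce);
	return H_SUCCESS;
}

static long put_tce(unsigned long tce)
{
	long ret = realmode_put_tce(tce);

	if (ret == H_TOO_HARD)
		ret = virtmode_put_tce(tce);	/* leave real mode, retry */
	if (ret == H_TOO_HARD)
		ret = usermode_put_tce(tce);	/* exit to userspace, retry */
	return ret;	/* this is what the guest finally sees */
}

int main(void)
{
	return put_tce(0x10000) == H_SUCCESS ? 0 : 1;
}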

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kvm/book3s_64_vio.c    | 8 ++++----
 arch/powerpc/kvm/book3s_64_vio_hv.c | 6 +++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 0fef22b..3e8ac98 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -401,7 +401,7 @@ static long kvmppc_tce_iommu_do_unmap(struct kvm *kvm,
 	long ret;
 
 	if (WARN_ON_ONCE(iommu_tce_xchg(tbl, entry, &hpa, &dir)))
-		return H_HARDWARE;
+		return H_TOO_HARD;
 
 	if (dir == DMA_NONE)
 		return H_SUCCESS;
@@ -449,15 +449,15 @@ long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
 		return H_TOO_HARD;
 
 	if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, tbl->it_page_shift, &hpa)))
-		return H_HARDWARE;
+		return H_TOO_HARD;
 
 	if (mm_iommu_mapped_inc(mem))
-		return H_CLOSED;
+		return H_TOO_HARD;
 
 	ret = iommu_tce_xchg(tbl, entry, &hpa, &dir);
 	if (WARN_ON_ONCE(ret)) {
 		mm_iommu_mapped_dec(mem);
-		return H_HARDWARE;
+		return H_TOO_HARD;
 	}
 
 	if (dir != DMA_NONE)
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 7ab6f3f..9584d9b 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -277,10 +277,10 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
 
 	if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, tbl->it_page_shift,
 			&hpa)))
-		return H_HARDWARE;
+		return H_TOO_HARD;
 
 	if (WARN_ON_ONCE_RM(mm_iommu_mapped_inc(mem)))
-		return H_CLOSED;
+		return H_TOO_HARD;
 
 	ret = iommu_tce_xchg_rm(tbl, entry, &hpa, &dir);
 	if (ret) {
@@ -478,7 +478,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 
 		rmap = (void *) vmalloc_to_phys(rmap);
 		if (WARN_ON_ONCE_RM(!rmap))
-			return H_HARDWARE;
+			return H_TOO_HARD;
 
 		/*
 		 * Synchronize with the MMU notifier callbacks in
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH kernel 3/4] KVM: PPC: Validate TCEs against preregistered memory page sizes
  2018-08-30  3:16 ` Alexey Kardashevskiy
@ 2018-08-30  3:16   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 20+ messages in thread
From: Alexey Kardashevskiy @ 2018-08-30  3:16 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Paul Mackerras

The userspace can request an arbitrary supported page size for a DMA
window and this works fine as long as the mapped memory is backed with
pages of the same or bigger size; if this is not the case,
mm_iommu_ua_to_hpa{_rm}() fails and the tables do not get populated with
dangerously incorrect TCEs.

However, since it is quite easy to misconfigure KVM, and we do not
revert the changes already made to the TCE tables if an error happens in
the middle, we had better validate the acceptable page size before we
even touch the tables.

This enhances kvmppc_tce_validate() to check the hardware IOMMU page sizes
against the preregistered memory page sizes.

Since the new check uses real/virtual mode helpers, this renames
kvmppc_tce_validate() to kvmppc_rm_tce_validate() to handle the real mode
case and mirrors it for the virtual mode under the old name. The real
mode handler is not used for the virtual mode as:
1. it uses _lockless() list traversing primitives instead of RCU;
2. realmode's mm_iommu_ua_to_hpa_rm() uses vmalloc_to_phys(), which
virtual mode does not have to use, and since on POWER9+radix only the
virtual mode handlers actually work, we do not want to slow down that
path even a bit.

This removes EXPORT_SYMBOL_GPL(kvmppc_tce_validate) as the validators
are static now.

From now on, attempts to map IOMMU pages bigger than allowed will
result in a KVM exit.
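
The core of the new check boils down to a page shift comparison: an
IOMMU page is only mappable if it fits entirely within one pinned
backing page. A simplified model (hypothetical names):

#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model: the preregistered memory is backed by pages of
 * 1UL << mem_page_shift, the DMA window uses 1UL << iommu_page_shift.
 * mm_iommu_ua_to_hpa{_rm}() effectively enforces the same condition.
 */
static bool tce_page_size_ok(unsigned int iommu_page_shift,
		unsigned int mem_page_shift)
{
	return iommu_page_shift <= mem_page_shift;
}

int main(void)
{
	printf("64K IOMMU page, 64K backing: %d\n", tce_page_size_ok(16, 16));
	printf("16M IOMMU page, 64K backing: %d\n", tce_page_size_ok(24, 16));
	printf("64K IOMMU page, 16M THP:     %d\n", tce_page_size_ok(16, 24));
	return 0;	/* prints 1, 0, 1 */
}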

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/kvm_ppc.h  |  2 --
 arch/powerpc/kvm/book3s_64_vio.c    | 42 +++++++++++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_64_vio_hv.c | 30 +++++++++++++++++++-------
 3 files changed, 65 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index e991821..2f5d431 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -194,8 +194,6 @@ extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
 		(iommu_tce_check_ioba((stt)->page_shift, (stt)->offset, \
 				(stt)->size, (ioba), (npages)) ?        \
 				H_PARAMETER : H_SUCCESS)
-extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
-		unsigned long tce);
 extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
 		unsigned long *ua, unsigned long **prmap);
 extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 3e8ac98..5cd2a66 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -363,6 +363,41 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 	return ret;
 }
 
+static long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt,
+		unsigned long tce)
+{
+	unsigned long gpa = tce & ~(TCE_PCI_READ | TCE_PCI_WRITE);
+	enum dma_data_direction dir = iommu_tce_direction(tce);
+	struct kvmppc_spapr_tce_iommu_table *stit;
+	unsigned long ua = 0;
+
+	/* Allow userspace to poison TCE table */
+	if (dir == DMA_NONE)
+		return H_SUCCESS;
+
+	if (iommu_tce_check_gpa(stt->page_shift, gpa))
+		return H_TOO_HARD;
+
+	if (kvmppc_gpa_to_ua(stt->kvm, tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
+				&ua, NULL))
+		return H_TOO_HARD;
+
+	list_for_each_entry_rcu(stit, &stt->iommu_tables, next) {
+		unsigned long hpa = 0;
+		struct mm_iommu_table_group_mem_t *mem;
+		long shift = stit->tbl->it_page_shift;
+
+		mem = mm_iommu_lookup(stt->kvm->mm, ua, 1ULL << shift);
+		if (!mem)
+			return H_TOO_HARD;
+
+		if (mm_iommu_ua_to_hpa(mem, ua, shift, &hpa))
+			return H_TOO_HARD;
+	}
+
+	return H_SUCCESS;
+}
+
 static void kvmppc_clear_tce(struct iommu_table *tbl, unsigned long entry)
 {
 	unsigned long hpa = 0;
@@ -602,6 +637,13 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 	}
 
 	for (i = 0; i < npages; ++i) {
+		/*
+		 * This get_user() may produce a different result than the
+		 * one in the validation loop above, but we translate the TCE
+		 * again a little later anyway; if that fails, we simply stop
+		 * and return an error, as it is most likely userspace
+		 * shooting itself in the foot.
+		 */
 		if (get_user(tce, tces + i)) {
 			ret = H_TOO_HARD;
 			goto unlock_exit;
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 9584d9b..e79ffbb 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -94,14 +94,14 @@ EXPORT_SYMBOL_GPL(kvmppc_find_table);
  * to the table and user space is supposed to process them), we can skip
  * checking other things (such as TCE is a guest RAM address or the page
  * was actually allocated).
- *
- * WARNING: This will be called in real-mode on HV KVM and virtual
- *          mode on PR KVM
  */
-long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, unsigned long tce)
+static long kvmppc_rm_tce_validate(struct kvmppc_spapr_tce_table *stt,
+		unsigned long tce)
 {
 	unsigned long gpa = tce & ~(TCE_PCI_READ | TCE_PCI_WRITE);
 	enum dma_data_direction dir = iommu_tce_direction(tce);
+	struct kvmppc_spapr_tce_iommu_table *stit;
+	unsigned long ua = 0;
 
 	/* Allow userspace to poison TCE table */
 	if (dir == DMA_NONE)
@@ -110,9 +110,25 @@ long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, unsigned long tce)
 	if (iommu_tce_check_gpa(stt->page_shift, gpa))
 		return H_PARAMETER;
 
+	if (kvmppc_gpa_to_ua(stt->kvm, tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
+				&ua, NULL))
+		return H_TOO_HARD;
+
+	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
+		unsigned long hpa = 0;
+		struct mm_iommu_table_group_mem_t *mem;
+		long shift = stit->tbl->it_page_shift;
+
+		mem = mm_iommu_lookup_rm(stt->kvm->mm, ua, 1ULL << shift);
+		if (!mem)
+			return H_TOO_HARD;
+
+		if (mm_iommu_ua_to_hpa_rm(mem, ua, shift, &hpa))
+			return H_TOO_HARD;
+	}
+
 	return H_SUCCESS;
 }
-EXPORT_SYMBOL_GPL(kvmppc_tce_validate);
 
 /* Note on the use of page_address() in real mode,
  *
@@ -345,7 +361,7 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	if (ret != H_SUCCESS)
 		return ret;
 
-	ret = kvmppc_tce_validate(stt, tce);
+	ret = kvmppc_rm_tce_validate(stt, tce);
 	if (ret != H_SUCCESS)
 		return ret;
 
@@ -498,7 +514,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 	for (i = 0; i < npages; ++i) {
 		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
 
-		ret = kvmppc_tce_validate(stt, tce);
+		ret = kvmppc_rm_tce_validate(stt, tce);
 		if (ret != H_SUCCESS)
 			goto unlock_exit;
 	}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH kernel 4/4] KVM: PPC: Propagate errors to the guest when failed instead of ignoring
  2018-08-30  3:16 ` Alexey Kardashevskiy
@ 2018-08-30  3:16   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 20+ messages in thread
From: Alexey Kardashevskiy @ 2018-08-30  3:16 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Paul Mackerras

At the moment, if the PUT_TCE{_INDIRECT} handlers fail to update
the hardware tables, we print a warning once, clear the entry and
continue. This is because, at the time, the assumption was that when
a VFIO device is hotplugged into the guest and the userspace replays
virtual DMA mappings (i.e. TCEs) into the hardware tables, there is
nothing useful we can do if this fails.

However the assumption is not valid as these handlers are not called for
TCE replay (the VFIO ioctl interface is used for that); these handlers
are only for new TCEs.

This returns an error to the guest if there is a request which cannot be
processed. By now, the only possible failure is H_TOO_HARD.
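
Schematically, the per-entry policy change looks like this (a toy
user-space model with hypothetical names, not the kernel code):

#include <stdio.h>

#define H_SUCCESS  0
#define H_TOO_HARD (-2)

static unsigned long table[4];

static void clear_tce(int entry) { table[entry] = 0; }

/* Hypothetical map helper which fails for one particular entry. */
static long iommu_map(int entry)
{
	return entry == 2 ? H_TOO_HARD : H_SUCCESS;
}

static long put_tce(int entry, unsigned long tce)
{
	long ret = iommu_map(entry);

	if (ret != H_SUCCESS) {
		/*
		 * Old behaviour: WARN once, clear, report H_SUCCESS anyway.
		 * New behaviour: clear the entry and propagate the error.
		 */
		clear_tce(entry);
		return ret;
	}
	table[entry] = tce;
	return H_SUCCESS;
}

int main(void)
{
	for (int i = 0; i < 4; ++i)
		printf("entry %d -> %ld\n", i, put_tce(i, 0x1000UL + i));
	return 0;	/* entry 2 now reports -2 instead of silent success */
}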

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kvm/book3s_64_vio.c    | 20 ++++++--------------
 arch/powerpc/kvm/book3s_64_vio_hv.c | 20 ++++++--------------
 2 files changed, 12 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 5cd2a66..5e3151b 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -568,14 +568,10 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 			ret = kvmppc_tce_iommu_map(vcpu->kvm, stt, stit->tbl,
 					entry, ua, dir);
 
-		if (ret == H_SUCCESS)
-			continue;
-
-		if (ret == H_TOO_HARD)
+		if (ret != H_SUCCESS) {
+			kvmppc_clear_tce(stit->tbl, entry);
 			goto unlock_exit;
-
-		WARN_ON_ONCE(1);
-		kvmppc_clear_tce(stit->tbl, entry);
+		}
 	}
 
 	kvmppc_tce_put(stt, entry, tce);
@@ -660,14 +656,10 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 					stit->tbl, entry + i, ua,
 					iommu_tce_direction(tce));
 
-			if (ret == H_SUCCESS)
-				continue;
-
-			if (ret == H_TOO_HARD)
+			if (ret != H_SUCCESS) {
+				kvmppc_clear_tce(stit->tbl, entry);
 				goto unlock_exit;
-
-			WARN_ON_ONCE(1);
-			kvmppc_clear_tce(stit->tbl, entry);
+			}
 		}
 
 		kvmppc_tce_put(stt, entry + i, tce);
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index e79ffbb..8d82133 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -380,14 +380,10 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
 					stit->tbl, entry, ua, dir);
 
-		if (ret == H_SUCCESS)
-			continue;
-
-		if (ret == H_TOO_HARD)
+		if (ret != H_SUCCESS) {
+			kvmppc_rm_clear_tce(stit->tbl, entry);
 			return ret;
-
-		WARN_ON_ONCE_RM(1);
-		kvmppc_rm_clear_tce(stit->tbl, entry);
+		}
 	}
 
 	kvmppc_tce_put(stt, entry, tce);
@@ -533,14 +529,10 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 					stit->tbl, entry + i, ua,
 					iommu_tce_direction(tce));
 
-			if (ret == H_SUCCESS)
-				continue;
-
-			if (ret == H_TOO_HARD)
+			if (ret != H_SUCCESS) {
+				kvmppc_rm_clear_tce(stit->tbl, entry);
 				goto unlock_exit;
-
-			WARN_ON_ONCE_RM(1);
-			kvmppc_rm_clear_tce(stit->tbl, entry);
+			}
 		}
 
 		kvmppc_tce_put(stt, entry + i, tce);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH kernel 1/4] KVM: PPC: Validate all tces before updating tables
  2018-08-30  3:16   ` Alexey Kardashevskiy
@ 2018-08-30  4:01     ` David Gibson
  -1 siblings, 0 replies; 20+ messages in thread
From: David Gibson @ 2018-08-30  4:01 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, kvm-ppc, Paul Mackerras

On Thu, Aug 30, 2018 at 01:16:44PM +1000, Alexey Kardashevskiy wrote:
> The KVM TCE handlers are written so that they fail either when something
> has gone horribly wrong or when userspace has made an obvious mistake
> such as passing a misaligned address.
> 
> We are going to enhance the TCE checker to fail on attempts to map
> an IOMMU page bigger than the underlying pinned memory page, so let's
> validate the TCEs beforehand.
> 
> This should cause no behavioral change.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

With one misgiving..

> ---
>  arch/powerpc/kvm/book3s_64_vio.c    | 8 ++++++++
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 4 ++++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 9a3f264..0fef22b 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -599,6 +599,14 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  		ret = kvmppc_tce_validate(stt, tce);
>  		if (ret != H_SUCCESS)
>  			goto unlock_exit;
> +	}
> +
> +	for (i = 0; i < npages; ++i) {
> +		if (get_user(tce, tces + i)) {

This looks unsafe, because we validate, then regrab the TCE from
userspace which could have been changed by another thread.

But it actually is safe, because the relevant checks will be
re-executed in the following code.  If userspace tries to change this
dodgily it will result in a messier failure mode but won't threaten
the host.

Long term, I think we would be better off copying everything into
kernel space then doing the validation just once.  But the difference
should only become apparent with a malicious or badly broken guest,
and in the meantime this series addresses a real problem.

So, I think we should go ahead with it despite that imperfection.
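
The "copy once, then validate" variant would look roughly like this
user-space sketch (hypothetical names; the kernel version would use
copy_from_user() instead of memcpy()):

#include <stdlib.h>
#include <string.h>

#define H_SUCCESS   0
#define H_PARAMETER (-1)
#define H_TOO_HARD  (-2)

/* Hypothetical 4K-alignment check, as in the sketch for patch 1/4. */
static long validate_tce(unsigned long tce)
{
	return (tce & 0xffcUL) ? H_PARAMETER : H_SUCCESS;
}

static long put_tce_indirect(const unsigned long *user_tces, int npages,
		unsigned long *table)
{
	unsigned long *tces;
	long ret = H_SUCCESS;
	int i;

	/*
	 * Snapshot the TCE list once so another userspace thread cannot
	 * change it between validation and use.
	 */
	tces = malloc(npages * sizeof(*tces));
	if (!tces)
		return H_TOO_HARD;
	memcpy(tces, user_tces, npages * sizeof(*tces));

	for (i = 0; i < npages; ++i) {
		ret = validate_tce(tces[i]);
		if (ret != H_SUCCESS)
			goto out;
	}

	for (i = 0; i < npages; ++i)
		table[i] = tces[i];	/* exactly what was validated */
out:
	free(tces);
	return ret;
}

int main(void)
{
	unsigned long tces[2] = { 0x10000, 0x20000 }, table[2];

	return put_tce_indirect(tces, 2, table) == H_SUCCESS ? 0 : 1;
}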


> +			ret = H_TOO_HARD;
> +			goto unlock_exit;
> +		}
> +		tce = be64_to_cpu(tce);
>  
>  		if (kvmppc_gpa_to_ua(vcpu->kvm,
>  				tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 506a4d4..7ab6f3f 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -501,6 +501,10 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  		ret = kvmppc_tce_validate(stt, tce);
>  		if (ret != H_SUCCESS)
>  			goto unlock_exit;
> +	}
> +
> +	for (i = 0; i < npages; ++i) {
> +		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
>  
>  		ua = 0;
>  		if (kvmppc_gpa_to_ua(vcpu->kvm,

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH kernel 2/4] KVM: PPC: Inform the userspace about TCE update failures
  2018-08-30  3:16   ` Alexey Kardashevskiy
@ 2018-08-30  4:01     ` David Gibson
  -1 siblings, 0 replies; 20+ messages in thread
From: David Gibson @ 2018-08-30  4:01 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, kvm-ppc, Paul Mackerras

On Thu, Aug 30, 2018 at 01:16:45PM +1000, Alexey Kardashevskiy wrote:
> We return H_TOO_HARD from TCE update handlers when we think that
> the next handler (realmode -> virtual mode -> user mode) has a chance to
> handle the request; H_HARDWARE/H_CLOSED otherwise.
> 
> This changes the handlers to return H_TOO_HARD on every error, giving
> userspace an opportunity to handle any request, or at least to log
> them all.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/kvm/book3s_64_vio.c    | 8 ++++----
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 6 +++---
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 0fef22b..3e8ac98 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -401,7 +401,7 @@ static long kvmppc_tce_iommu_do_unmap(struct kvm *kvm,
>  	long ret;
>  
>  	if (WARN_ON_ONCE(iommu_tce_xchg(tbl, entry, &hpa, &dir)))
> -		return H_HARDWARE;
> +		return H_TOO_HARD;
>  
>  	if (dir == DMA_NONE)
>  		return H_SUCCESS;
> @@ -449,15 +449,15 @@ long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
>  		return H_TOO_HARD;
>  
>  	if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, tbl->it_page_shift, &hpa)))
> -		return H_HARDWARE;
> +		return H_TOO_HARD;
>  
>  	if (mm_iommu_mapped_inc(mem))
> -		return H_CLOSED;
> +		return H_TOO_HARD;
>  
>  	ret = iommu_tce_xchg(tbl, entry, &hpa, &dir);
>  	if (WARN_ON_ONCE(ret)) {
>  		mm_iommu_mapped_dec(mem);
> -		return H_HARDWARE;
> +		return H_TOO_HARD;
>  	}
>  
>  	if (dir != DMA_NONE)
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 7ab6f3f..9584d9b 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -277,10 +277,10 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
>  
>  	if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, tbl->it_page_shift,
>  			&hpa)))
> -		return H_HARDWARE;
> +		return H_TOO_HARD;
>  
>  	if (WARN_ON_ONCE_RM(mm_iommu_mapped_inc(mem)))
> -		return H_CLOSED;
> +		return H_TOO_HARD;
>  
>  	ret = iommu_tce_xchg_rm(tbl, entry, &hpa, &dir);
>  	if (ret) {
> @@ -478,7 +478,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  
>  		rmap = (void *) vmalloc_to_phys(rmap);
>  		if (WARN_ON_ONCE_RM(!rmap))
> -			return H_HARDWARE;
> +			return H_TOO_HARD;
>  
>  		/*
>  		 * Synchronize with the MMU notifier callbacks in

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH kernel 3/4] KVM: PPC: Validate TCEs against preregistered memory page sizes
  2018-08-30  3:16   ` Alexey Kardashevskiy
@ 2018-08-30  4:03     ` David Gibson
  -1 siblings, 0 replies; 20+ messages in thread
From: David Gibson @ 2018-08-30  4:03 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, kvm-ppc, Paul Mackerras

On Thu, Aug 30, 2018 at 01:16:46PM +1000, Alexey Kardashevskiy wrote:
> The userspace can request an arbitrary supported page size for a DMA
> window and this works fine as long as the mapped memory is backed with
> pages of the same or bigger size; if this is not the case,
> mm_iommu_ua_to_hpa{_rm}() fails and the tables do not get populated with
> dangerously incorrect TCEs.
> 
> However, since it is quite easy to misconfigure KVM, and we do not
> revert the changes already made to the TCE tables if an error happens in
> the middle, we had better validate the acceptable page size before we
> even touch the tables.
> 
> This enhances kvmppc_tce_validate() to check the hardware IOMMU page sizes
> against the preregistered memory page sizes.
> 
> Since the new check uses real/virtual mode helpers, this renames
> kvmppc_tce_validate() to kvmppc_rm_tce_validate() to handle the real mode
> case and mirrors it for the virtual mode under the old name. The real
> mode handler is not used for the virtual mode as:
> 1. it uses _lockless() list traversing primitives instead of RCU;
> 2. realmode's mm_iommu_ua_to_hpa_rm() uses vmalloc_to_phys(), which
> virtual mode does not have to use, and since on POWER9+radix only the
> virtual mode handlers actually work, we do not want to slow down that
> path even a bit.
> 
> This removes EXPORT_SYMBOL_GPL(kvmppc_tce_validate) as the validators
> are static now.
> 
> From now on, attempts to map IOMMU pages bigger than allowed will
> result in a KVM exit.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/include/asm/kvm_ppc.h  |  2 --
>  arch/powerpc/kvm/book3s_64_vio.c    | 42 +++++++++++++++++++++++++++++++++++++
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 30 +++++++++++++++++++-------
>  3 files changed, 65 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index e991821..2f5d431 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -194,8 +194,6 @@ extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
>  		(iommu_tce_check_ioba((stt)->page_shift, (stt)->offset, \
>  				(stt)->size, (ioba), (npages)) ?        \
>  				H_PARAMETER : H_SUCCESS)
> -extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
> -		unsigned long tce);
>  extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
>  		unsigned long *ua, unsigned long **prmap);
>  extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 3e8ac98..5cd2a66 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -363,6 +363,41 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  	return ret;
>  }
>  
> +static long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt,
> +		unsigned long tce)
> +{
> +	unsigned long gpa = tce & ~(TCE_PCI_READ | TCE_PCI_WRITE);
> +	enum dma_data_direction dir = iommu_tce_direction(tce);
> +	struct kvmppc_spapr_tce_iommu_table *stit;
> +	unsigned long ua = 0;
> +
> +	/* Allow userspace to poison TCE table */
> +	if (dir == DMA_NONE)
> +		return H_SUCCESS;
> +
> +	if (iommu_tce_check_gpa(stt->page_shift, gpa))
> +		return H_TOO_HARD;
> +
> +	if (kvmppc_gpa_to_ua(stt->kvm, tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
> +				&ua, NULL))
> +		return H_TOO_HARD;
> +
> +	list_for_each_entry_rcu(stit, &stt->iommu_tables, next) {
> +		unsigned long hpa = 0;
> +		struct mm_iommu_table_group_mem_t *mem;
> +		long shift = stit->tbl->it_page_shift;
> +
> +		mem = mm_iommu_lookup(stt->kvm->mm, ua, 1ULL << shift);
> +		if (!mem)
> +			return H_TOO_HARD;
> +
> +		if (mm_iommu_ua_to_hpa(mem, ua, shift, &hpa))
> +			return H_TOO_HARD;
> +	}
> +
> +	return H_SUCCESS;
> +}
> +
>  static void kvmppc_clear_tce(struct iommu_table *tbl, unsigned long entry)
>  {
>  	unsigned long hpa = 0;
> @@ -602,6 +637,13 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  	}
>  
>  	for (i = 0; i < npages; ++i) {
> +		/*
> +		 * This get_user() may produce a different result than the
> +		 * one in the validation loop above, but we translate the TCE
> +		 * again a little later anyway; if that fails, we simply stop
> +		 * and return an error, as it is most likely userspace
> +		 * shooting itself in the foot.
> +		 */
>  		if (get_user(tce, tces + i)) {
>  			ret = H_TOO_HARD;
>  			goto unlock_exit;
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 9584d9b..e79ffbb 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -94,14 +94,14 @@ EXPORT_SYMBOL_GPL(kvmppc_find_table);
>   * to the table and user space is supposed to process them), we can skip
>   * checking other things (such as TCE is a guest RAM address or the page
>   * was actually allocated).
> - *
> - * WARNING: This will be called in real-mode on HV KVM and virtual
> - *          mode on PR KVM
>   */
> -long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, unsigned long tce)
> +static long kvmppc_rm_tce_validate(struct kvmppc_spapr_tce_table *stt,
> +		unsigned long tce)
>  {
>  	unsigned long gpa = tce & ~(TCE_PCI_READ | TCE_PCI_WRITE);
>  	enum dma_data_direction dir = iommu_tce_direction(tce);
> +	struct kvmppc_spapr_tce_iommu_table *stit;
> +	unsigned long ua = 0;
>  
>  	/* Allow userspace to poison TCE table */
>  	if (dir == DMA_NONE)
> @@ -110,9 +110,25 @@ long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, unsigned long tce)
>  	if (iommu_tce_check_gpa(stt->page_shift, gpa))
>  		return H_PARAMETER;
>  
> +	if (kvmppc_gpa_to_ua(stt->kvm, tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
> +				&ua, NULL))
> +		return H_TOO_HARD;
> +
> +	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
> +		unsigned long hpa = 0;
> +		struct mm_iommu_table_group_mem_t *mem;
> +		long shift = stit->tbl->it_page_shift;
> +
> +		mem = mm_iommu_lookup_rm(stt->kvm->mm, ua, 1ULL << shift);
> +		if (!mem)
> +			return H_TOO_HARD;
> +
> +		if (mm_iommu_ua_to_hpa_rm(mem, ua, shift, &hpa))
> +			return H_TOO_HARD;
> +	}
> +
>  	return H_SUCCESS;
>  }
> -EXPORT_SYMBOL_GPL(kvmppc_tce_validate);
>  
>  /* Note on the use of page_address() in real mode,
>   *
> @@ -345,7 +361,7 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  	if (ret != H_SUCCESS)
>  		return ret;
>  
> -	ret = kvmppc_tce_validate(stt, tce);
> +	ret = kvmppc_rm_tce_validate(stt, tce);
>  	if (ret != H_SUCCESS)
>  		return ret;
>  
> @@ -498,7 +514,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  	for (i = 0; i < npages; ++i) {
>  		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
>  
> -		ret = kvmppc_tce_validate(stt, tce);
> +		ret = kvmppc_rm_tce_validate(stt, tce);
>  		if (ret != H_SUCCESS)
>  			goto unlock_exit;
>  	}

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH kernel 4/4] KVM: PPC: Propagate errors to the guest when failed instead of ignoring
  2018-08-30  3:16   ` Alexey Kardashevskiy
@ 2018-08-30  4:04     ` David Gibson
  -1 siblings, 0 replies; 20+ messages in thread
From: David Gibson @ 2018-08-30  4:04 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, kvm-ppc, Paul Mackerras

[-- Attachment #1: Type: text/plain, Size: 3672 bytes --]

On Thu, Aug 30, 2018 at 01:16:47PM +1000, Alexey Kardashevskiy wrote:
> At the moment, if the PUT_TCE{_INDIRECT} handlers fail to update
> the hardware tables, we print a warning once, clear the entry and
> continue. This was done because, at the time, the assumption was that
> if a VFIO device is hotplugged into the guest and the userspace replays
> the virtual DMA mappings (i.e. TCEs) into the hardware tables, there is
> nothing useful we can do if that replay fails.
> 
> However, the assumption is not valid: these handlers are not called for
> TCE replay (the VFIO ioctl interface is used for that); they only
> handle new TCEs.
> 
> This returns an error to the guest if a request cannot be processed.
> At present, the only possible failure is H_TOO_HARD.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/kvm/book3s_64_vio.c    | 20 ++++++--------------
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 20 ++++++--------------
>  2 files changed, 12 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 5cd2a66..5e3151b 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -568,14 +568,10 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  			ret = kvmppc_tce_iommu_map(vcpu->kvm, stt, stit->tbl,
>  					entry, ua, dir);
>  
> -		if (ret == H_SUCCESS)
> -			continue;
> -
> -		if (ret == H_TOO_HARD)
> +		if (ret != H_SUCCESS) {
> +			kvmppc_clear_tce(stit->tbl, entry);
>  			goto unlock_exit;
> -
> -		WARN_ON_ONCE(1);
> -		kvmppc_clear_tce(stit->tbl, entry);
> +		}
>  	}
>  
>  	kvmppc_tce_put(stt, entry, tce);
> @@ -660,14 +656,10 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  					stit->tbl, entry + i, ua,
>  					iommu_tce_direction(tce));
>  
> -			if (ret == H_SUCCESS)
> -				continue;
> -
> -			if (ret == H_TOO_HARD)
> +			if (ret != H_SUCCESS) {
> +				kvmppc_clear_tce(stit->tbl, entry);
>  				goto unlock_exit;
> -
> -			WARN_ON_ONCE(1);
> -			kvmppc_clear_tce(stit->tbl, entry);
> +			}
>  		}
>  
>  		kvmppc_tce_put(stt, entry + i, tce);
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index e79ffbb..8d82133 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -380,14 +380,10 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
>  					stit->tbl, entry, ua, dir);
>  
> -		if (ret == H_SUCCESS)
> -			continue;
> -
> -		if (ret == H_TOO_HARD)
> +		if (ret != H_SUCCESS) {
> +			kvmppc_rm_clear_tce(stit->tbl, entry);
>  			return ret;
> -
> -		WARN_ON_ONCE_RM(1);
> -		kvmppc_rm_clear_tce(stit->tbl, entry);
> +		}
>  	}
>  
>  	kvmppc_tce_put(stt, entry, tce);
> @@ -533,14 +529,10 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  					stit->tbl, entry + i, ua,
>  					iommu_tce_direction(tce));
>  
> -			if (ret == H_SUCCESS)
> -				continue;
> -
> -			if (ret == H_TOO_HARD)
> +			if (ret != H_SUCCESS) {
> +				kvmppc_rm_clear_tce(stit->tbl, entry);
>  				goto unlock_exit;
> -
> -			WARN_ON_ONCE_RM(1);
> -			kvmppc_rm_clear_tce(stit->tbl, entry);
> +			}
>  		}
>  
>  		kvmppc_tce_put(stt, entry + i, tce);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread
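
From the guest's point of view, the change above means a hypercall
return-code check can now actually observe a mapping failure. The fragment
below is an illustrative editor's sketch, loosely modeled on the pseries
TCE code; the surrounding context and the variables liobn, ioba and tce are
assumptions, not part of the series.

	long rc;

	/* map one TCE through the hypervisor */
	rc = plpar_hcall_norets(H_PUT_TCE, liobn, ioba, tce);
	if (rc != H_SUCCESS)
		/* previously a host-side table update failure was swallowed;
		 * with this patch e.g. H_TOO_HARD reaches the guest driver */
		pr_err("H_PUT_TCE failed: liobn=0x%lx rc=%ld\n", liobn, rc);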

* Re: [PATCH kernel 1/4] KVM: PPC: Validate all tces before updating tables
  2018-08-30  4:01     ` David Gibson
@ 2018-08-31  4:04       ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 20+ messages in thread
From: Alexey Kardashevskiy @ 2018-08-31  4:04 UTC (permalink / raw)
  To: David Gibson; +Cc: linuxppc-dev, kvm-ppc, Paul Mackerras



On 30/08/2018 14:01, David Gibson wrote:
> On Thu, Aug 30, 2018 at 01:16:44PM +1000, Alexey Kardashevskiy wrote:
>> The KVM TCE handlers are written so that they fail when either
>> something has gone horribly wrong or the userspace has made an obvious
>> mistake such as passing a misaligned address.
>>
>> We are going to enhance the TCE checker to fail on attempts to map a
>> bigger IOMMU page than the underlying pinned memory page, so let's
>> validate TCEs beforehand.
>>
>> This should cause no behavioral change.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> With one misgiving..
> 
>> ---
>>  arch/powerpc/kvm/book3s_64_vio.c    | 8 ++++++++
>>  arch/powerpc/kvm/book3s_64_vio_hv.c | 4 ++++
>>  2 files changed, 12 insertions(+)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index 9a3f264..0fef22b 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -599,6 +599,14 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>>  		ret = kvmppc_tce_validate(stt, tce);
>>  		if (ret != H_SUCCESS)
>>  			goto unlock_exit;
>> +	}
>> +
>> +	for (i = 0; i < npages; ++i) {
>> +		if (get_user(tce, tces + i)) {
> 
> This looks unsafe, because we validate, then re-grab the TCE from
> userspace, which could have been changed by another thread in the
> meantime.
> 
> But it actually is safe, because the relevant checks will be
> re-executed in the following code.  If userspace dodgily changes the
> value in between, the result is a messier failure mode, but it won't
> threaten the host.


I have put the comment for this get_user() into 3/4 while it should
have been here:
+		/*
+		 * This get_user() may produce a different result than the
+		 * one a few lines above in the validation loop, but we
+		 * translate the TCE again a little later anyway and if that
+		 * fails, we simply stop and return an error as it is likely
+		 * the userspace shooting itself in the foot.
+		 */

I might repost; still testing that THP+realmode patch....


> 
> Long term, I think we would be better off copying everything into
> kernel space and then doing the validation just once.  But the difference
> should only become apparent with a malicious or badly broken guest,
> and in the meantime this series addresses a real problem.
> 
> So, I think we should go ahead with it despite that imperfection.
> 
> 
>> +			ret = H_TOO_HARD;
>> +			goto unlock_exit;
>> +		}
>> +		tce = be64_to_cpu(tce);
>>  
>>  		if (kvmppc_gpa_to_ua(vcpu->kvm,
>>  				tce & ~(TCE_PCI_READ | TCE_PCI_WRITE),
>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> index 506a4d4..7ab6f3f 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> @@ -501,6 +501,10 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>>  		ret = kvmppc_tce_validate(stt, tce);
>>  		if (ret != H_SUCCESS)
>>  			goto unlock_exit;
>> +	}
>> +
>> +	for (i = 0; i < npages; ++i) {
>> +		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
>>  
>>  		ua = 0;
>>  		if (kvmppc_gpa_to_ua(vcpu->kvm,
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 20+ messages in thread
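
To make the "copy to kernel space, then validate once" idea discussed above
concrete, here is a minimal editor's sketch. It is not part of the series;
the function name, the on-stack buffer and the omitted locking/SRCU context
are illustrative assumptions. The single copy_from_user() means the
validation loop and the update loop operate on the same snapshot, closing
the window David describes.

	static long h_put_tce_indirect_copied(struct kvmppc_spapr_tce_table *stt,
			u64 __user *tces, unsigned long entry,
			unsigned long npages)
	{
		/* illustrative only: a real version would not put 4K on the
		 * kernel stack; PAPR limits H_PUT_TCE_INDIRECT to 512 TCEs */
		__be64 kbuf[512];
		unsigned long i;
		long ret;

		if (npages > 512)
			return H_PARAMETER;

		/* single fetch: later loops only ever see this snapshot, so
		 * userspace cannot change a TCE between check and use */
		if (copy_from_user(kbuf, tces, npages * sizeof(u64)))
			return H_TOO_HARD;

		for (i = 0; i < npages; ++i) {
			ret = kvmppc_tce_validate(stt, be64_to_cpu(kbuf[i]));
			if (ret != H_SUCCESS)
				return ret;
		}

		for (i = 0; i < npages; ++i)
			kvmppc_tce_put(stt, entry + i, be64_to_cpu(kbuf[i]));

		return H_SUCCESS;
	}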

end of thread, other threads:[~2018-08-31  4:04 UTC | newest]

Thread overview: 10 messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-30  3:16 [PATCH kernel 0/4] KVM: PPC: Some error handling rework Alexey Kardashevskiy
2018-08-30  3:16 ` [PATCH kernel 1/4] KVM: PPC: Validate all tces before updating tables Alexey Kardashevskiy
2018-08-30  4:01   ` David Gibson
2018-08-31  4:04     ` Alexey Kardashevskiy
2018-08-30  3:16 ` [PATCH kernel 2/4] KVM: PPC: Inform the userspace about TCE update failures Alexey Kardashevskiy
2018-08-30  4:01   ` David Gibson
2018-08-30  3:16 ` [PATCH kernel 3/4] KVM: PPC: Validate TCEs against preregistered memory page sizes Alexey Kardashevskiy
2018-08-30  4:03   ` David Gibson
2018-08-30  3:16 ` [PATCH kernel 4/4] KVM: PPC: Propagate errors to the guest when failed instead of ignoring Alexey Kardashevskiy
2018-08-30  4:04   ` David Gibson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.