* [PATCH kernel 0/2] KVM: PPC: Allow backing bigger guest IOMMU pages with smaller physical
@ 2018-05-02  4:07 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-05-02  4:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Paul Mackerras

This allows the guest to use big IOMMU pages even when the host does
not support that exact page size (e.g. 16MB is not available on P9).
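
As a back-of-the-envelope illustration (shift values assumed, not code
from the series): one 16MB guest IOMMU page backed by 64KB host pages
expands every guest TCE into a run of host TCEs:

	subpages = 1ULL << (24 - 16);	/* 16MB page over 64KB pages: 256 */
	io_entry = guest_entry * subpages;
	/* host entries io_entry .. io_entry + 255 back this guest entry */

Patch 2/2 implements this expansion in its map/unmap loops.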


Please comment. Thanks.



Alexey Kardashevskiy (2):
  KVM: PPC: Use correct page shift in H_STUFF_TCE
  KVM: PPC: Allow backing bigger guest IOMMU pages with smaller physical
    pages

 arch/powerpc/kvm/book3s_64_vio.c    | 68 +++++++++++++++++++++++++++++--------
 arch/powerpc/kvm/book3s_64_vio_hv.c | 54 +++++++++++++++++++++++++----
 2 files changed, 100 insertions(+), 22 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH kernel 1/2] KVM: PPC: Use correct page shift in H_STUFF_TCE
  2018-05-02  4:07 ` Alexey Kardashevskiy
@ 2018-05-02  4:07   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-05-02  4:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Paul Mackerras

The other TCE handlers use the page shift from the guest visible TCE
table (described by kvmppc_spapr_tce_table), so let's make the
H_STUFF_TCE handlers do the same thing.

This should cause no behavioral change now, but soon we will allow
iommu_table::it_page_shift to differ from the emulated table's page
size, and then this will start to matter.
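
A worked example of the difference (values chosen for illustration):
with a 16MB guest IOMMU page (shift 24) backed by a 64KB hardware
table (shift 16), an ioba of 0x1000000 must select guest entry 1:

	entry = ioba >> stt->page_shift;	/* 0x1000000 >> 24 = 1 */
	/* ioba >> stit->tbl->it_page_shift would yield 256 instead */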

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kvm/book3s_64_vio.c    | 2 +-
 arch/powerpc/kvm/book3s_64_vio_hv.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 4dffa61..041e54d 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -615,7 +615,7 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
 		return H_PARAMETER;
 
 	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
-		unsigned long entry = ioba >> stit->tbl->it_page_shift;
+		unsigned long entry = ioba >> stt->page_shift;
 
 		for (i = 0; i < npages; ++i) {
 			ret = kvmppc_tce_iommu_unmap(vcpu->kvm,
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 6651f73..e220fab 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -526,7 +526,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
 		return H_PARAMETER;
 
 	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
-		unsigned long entry = ioba >> stit->tbl->it_page_shift;
+		unsigned long entry = ioba >> stt->page_shift;
 
 		for (i = 0; i < npages; ++i) {
 			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm,
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH kernel 2/2] KVM: PPC: Allow backing bigger guest IOMMU pages with smaller physical pages
  2018-05-02  4:07 ` Alexey Kardashevskiy
@ 2018-05-02  4:07   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-05-02  4:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Paul Mackerras

At the moment the host only supports the IOMMU page sizes which the
guest is aware of, which are 4KB/64KB/16MB. However, P9 does not
support 16MB IOMMU pages; 2MB and 1GB pages are supported instead. We
can still emulate bigger guest pages (for example 16MB) with smaller
host pages (4KB/64KB/2MB).

This allows the physical IOMMU pages to use a page size smaller than
or equal to the guest visible IOMMU page size.
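
For example (shift values assumed for illustration), with 16MB guest
pages (shift 24) backed by 2MB host pages (shift 21), each guest TCE
expands into 8 host TCEs:

	subpages = 1ULL << (24 - 21);	/* 8 */
	io_entry = entry * subpages;	/* guest entry 5 -> host entries 40..47 */
	/* subpage i maps ua + i * IOMMU_PAGE_SIZE(tbl), i.e. ua + i * 2MB */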

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kvm/book3s_64_vio.c    | 66 +++++++++++++++++++++++++++++--------
 arch/powerpc/kvm/book3s_64_vio_hv.c | 52 +++++++++++++++++++++++++----
 2 files changed, 98 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 041e54d..e10d6a3 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -176,14 +176,12 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 
 		if (!tbltmp)
 			continue;
-		/*
-		 * Make sure hardware table parameters are exactly the same;
-		 * this is used in the TCE handlers where boundary checks
-		 * use only the first attached table.
-		 */
-		if ((tbltmp->it_page_shift == stt->page_shift) &&
-				(tbltmp->it_offset == stt->offset) &&
-				(tbltmp->it_size == stt->size)) {
+		/* Make sure hardware table parameters are compatible */
+		if ((tbltmp->it_page_shift <= stt->page_shift) &&
+				(tbltmp->it_offset << tbltmp->it_page_shift ==
+				 stt->offset << stt->page_shift) &&
+				(tbltmp->it_size << tbltmp->it_page_shift ==
+				 stt->size << stt->page_shift)) {
 			/*
 			 * Reference the table to avoid races with
 			 * add/remove DMA windows.
@@ -396,7 +394,7 @@ static long kvmppc_tce_iommu_mapped_dec(struct kvm *kvm,
 	return H_SUCCESS;
 }
 
-static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
+static long kvmppc_tce_iommu_do_unmap(struct kvm *kvm,
 		struct iommu_table *tbl, unsigned long entry)
 {
 	enum dma_data_direction dir = DMA_NONE;
@@ -416,7 +414,25 @@ static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
 	return ret;
 }
 
-long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
+static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
+		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
+		unsigned long entry)
+{
+	unsigned long ret = H_SUCCESS;
+	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
+	unsigned long io_entry = entry * subpages;
+	unsigned long subpg;
+
+	for (subpg = 0; subpg < subpages; ++subpg) {
+		ret = kvmppc_tce_iommu_do_unmap(kvm, tbl, io_entry + subpg);
+		if (ret != H_SUCCESS)
+			break;
+	}
+
+	return ret;
+}
+
+long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
 		unsigned long entry, unsigned long ua,
 		enum dma_data_direction dir)
 {
@@ -453,6 +469,28 @@ long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
 	return 0;
 }
 
+static long kvmppc_tce_iommu_map(struct kvm *kvm,
+		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
+		unsigned long entry, unsigned long ua,
+		enum dma_data_direction dir)
+{
+	unsigned long ret = H_SUCCESS;
+	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
+	unsigned long io_entry = entry * subpages;
+	unsigned long subpg, pgoff;
+
+	for (subpg = 0, pgoff = 0; subpg < subpages;
+			++subpg, pgoff += IOMMU_PAGE_SIZE(tbl)) {
+
+		ret = kvmppc_tce_iommu_do_map(kvm, tbl,
+				io_entry + subpg, ua + pgoff, dir);
+		if (ret != H_SUCCESS)
+			break;
+	}
+
+	return ret;
+}
+
 long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 		      unsigned long ioba, unsigned long tce)
 {
@@ -491,10 +529,10 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 
 	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
 		if (dir == DMA_NONE)
-			ret = kvmppc_tce_iommu_unmap(vcpu->kvm,
+			ret = kvmppc_tce_iommu_unmap(vcpu->kvm, stt,
 					stit->tbl, entry);
 		else
-			ret = kvmppc_tce_iommu_map(vcpu->kvm, stit->tbl,
+			ret = kvmppc_tce_iommu_map(vcpu->kvm, stt, stit->tbl,
 					entry, ua, dir);
 
 		if (ret == H_SUCCESS)
@@ -570,7 +608,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 			return H_PARAMETER;
 
 		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
-			ret = kvmppc_tce_iommu_map(vcpu->kvm,
+			ret = kvmppc_tce_iommu_map(vcpu->kvm, stt,
 					stit->tbl, entry + i, ua,
 					iommu_tce_direction(tce));
 
@@ -618,7 +656,7 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
 		unsigned long entry = ioba >> stt->page_shift;
 
 		for (i = 0; i < npages; ++i) {
-			ret = kvmppc_tce_iommu_unmap(vcpu->kvm,
+			ret = kvmppc_tce_iommu_unmap(vcpu->kvm, stt,
 					stit->tbl, entry + i);
 
 			if (ret == H_SUCCESS)
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index e220fab..258e786 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -221,7 +221,7 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
 	return H_SUCCESS;
 }
 
-static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
+static long kvmppc_rm_tce_iommu_do_unmap(struct kvm *kvm,
 		struct iommu_table *tbl, unsigned long entry)
 {
 	enum dma_data_direction dir = DMA_NONE;
@@ -245,7 +245,25 @@ static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
 	return ret;
 }
 
-static long kvmppc_rm_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
+static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
+		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
+		unsigned long entry)
+{
+	unsigned long ret = H_SUCCESS;
+	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
+	unsigned long io_entry = entry * subpages;
+	unsigned long subpg;
+
+	for (subpg = 0; subpg < subpages; ++subpg) {
+		ret = kvmppc_rm_tce_iommu_do_unmap(kvm, tbl, io_entry + subpg);
+		if (ret != H_SUCCESS)
+			break;
+	}
+
+	return ret;
+}
+
+static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
 		unsigned long entry, unsigned long ua,
 		enum dma_data_direction dir)
 {
@@ -290,6 +308,28 @@ static long kvmppc_rm_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
 	return 0;
 }
 
+static long kvmppc_rm_tce_iommu_map(struct kvm *kvm,
+		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
+		unsigned long entry, unsigned long ua,
+		enum dma_data_direction dir)
+{
+	unsigned long ret = H_SUCCESS;
+	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
+	unsigned long io_entry = entry * subpages;
+	unsigned long subpg, pgoff;
+
+	for (subpg = 0, pgoff = 0; subpg < subpages;
+			++subpg, pgoff += IOMMU_PAGE_SIZE(tbl)) {
+
+		ret = kvmppc_rm_tce_iommu_do_map(kvm, tbl,
+				io_entry + subpg, ua + pgoff, dir);
+		if (ret != H_SUCCESS)
+			break;
+	}
+
+	return ret;
+}
+
 long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 		unsigned long ioba, unsigned long tce)
 {
@@ -327,10 +367,10 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 
 	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
 		if (dir == DMA_NONE)
-			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm,
+			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm, stt,
 					stit->tbl, entry);
 		else
-			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm,
+			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
 					stit->tbl, entry, ua, dir);
 
 		if (ret == H_SUCCESS)
@@ -477,7 +517,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 			return H_PARAMETER;
 
 		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
-			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm,
+			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
 					stit->tbl, entry + i, ua,
 					iommu_tce_direction(tce));
 
@@ -529,7 +569,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
 		unsigned long entry = ioba >> stt->page_shift;
 
 		for (i = 0; i < npages; ++i) {
-			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm,
+			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm, stt,
 					stit->tbl, entry + i);
 
 			if (ret == H_SUCCESS)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 1/2] KVM: PPC: Use correct page shift in H_STUFF_TCE
  2018-05-02  4:07   ` Alexey Kardashevskiy
@ 2018-05-02  5:41     ` David Gibson
  -1 siblings, 0 replies; 24+ messages in thread
From: David Gibson @ 2018-05-02  5:41 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, kvm-ppc, Paul Mackerras

On Wed, May 02, 2018 at 02:07:22PM +1000, Alexey Kardashevskiy wrote:
> The other TCE handlers use the page shift from the guest visible TCE
> table (described by kvmppc_spapr_tce_table), so let's make the
> H_STUFF_TCE handlers do the same thing.
> 
> This should cause no behavioral change now, but soon we will allow
> iommu_table::it_page_shift to differ from the emulated table's page
> size, and then this will start to matter.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/kvm/book3s_64_vio.c    | 2 +-
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 4dffa61..041e54d 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -615,7 +615,7 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>  		return H_PARAMETER;
>  
>  	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
> -		unsigned long entry = ioba >> stit->tbl->it_page_shift;
> +		unsigned long entry = ioba >> stt->page_shift;
>  
>  		for (i = 0; i < npages; ++i) {
>  			ret = kvmppc_tce_iommu_unmap(vcpu->kvm,
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 6651f73..e220fab 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -526,7 +526,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
>  		return H_PARAMETER;
>  
>  	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
> -		unsigned long entry = ioba >> stit->tbl->it_page_shift;
> +		unsigned long entry = ioba >> stt->page_shift;
>  
>  		for (i = 0; i < npages; ++i) {
>  			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm,

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 2/2] KVM: PPC: Allow backing bigger guest IOMMU pages with smaller physical pages
  2018-05-02  4:07   ` Alexey Kardashevskiy
@ 2018-05-02  5:49     ` David Gibson
  -1 siblings, 0 replies; 24+ messages in thread
From: David Gibson @ 2018-05-02  5:49 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, kvm-ppc, Paul Mackerras

On Wed, May 02, 2018 at 02:07:23PM +1000, Alexey Kardashevskiy wrote:
> At the moment we only support in the host the IOMMU page sizes which
> the guest is aware of, which is 4KB/64KB/16MB. However P9 does not support
> 16MB IOMMU pages, 2MB and 1GB pages are supported instead. We can still
> emulate bigger guest pages (for example 16MB) with smaller host pages
> (4KB/64KB/2MB).
> 
> This allows the physical IOMMU pages to use a page size smaller or equal
> than the guest visible IOMMU page size.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Except for one possible nit..

> ---
>  arch/powerpc/kvm/book3s_64_vio.c    | 66 +++++++++++++++++++++++++++++--------
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 52 +++++++++++++++++++++++++----
>  2 files changed, 98 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 041e54d..e10d6a3 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -176,14 +176,12 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
>  
>  		if (!tbltmp)
>  			continue;
> -		/*
> -		 * Make sure hardware table parameters are exactly the same;
> -		 * this is used in the TCE handlers where boundary checks
> -		 * use only the first attached table.
> -		 */
> -		if ((tbltmp->it_page_shift == stt->page_shift) &&
> -				(tbltmp->it_offset == stt->offset) &&
> -				(tbltmp->it_size == stt->size)) {
> +		/* Make sure hardware table parameters are compatible */
> +		if ((tbltmp->it_page_shift <= stt->page_shift) &&
> +				(tbltmp->it_offset << tbltmp->it_page_shift ==
> +				 stt->offset << stt->page_shift) &&
> +				(tbltmp->it_size << tbltmp->it_page_shift ==
> +				 stt->size << stt->page_shift)) {

Do we need to worry about stt->offset << stt->page_shift overflowing
with a buggy or malicious userspace?
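
One way to rule that out, sketched here rather than taken from this
series, would be to bound the window when it is created:

	/* hypothetical check at window-creation time */
	if (stt->offset > (~0ULL >> stt->page_shift) ||
	    stt->size > (~0ULL >> stt->page_shift) - stt->offset)
		return -EINVAL;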

>  			/*
>  			 * Reference the table to avoid races with
>  			 * add/remove DMA windows.
> @@ -396,7 +394,7 @@ static long kvmppc_tce_iommu_mapped_dec(struct kvm *kvm,
>  	return H_SUCCESS;
>  }
>  
> -static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
> +static long kvmppc_tce_iommu_do_unmap(struct kvm *kvm,
>  		struct iommu_table *tbl, unsigned long entry)
>  {
>  	enum dma_data_direction dir = DMA_NONE;
> @@ -416,7 +414,25 @@ static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
>  	return ret;
>  }
>  
> -long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
> +static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
> +		unsigned long entry)
> +{
> +	unsigned long ret = H_SUCCESS;
> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
> +	unsigned long io_entry = entry * subpages;
> +	unsigned long subpg;
> +
> +	for (subpg = 0; subpg < subpages; ++subpg) {
> +		ret = kvmppc_tce_iommu_do_unmap(kvm, tbl, io_entry + subpg);
> +		if (ret != H_SUCCESS)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
> +long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
>  		unsigned long entry, unsigned long ua,
>  		enum dma_data_direction dir)
>  {
> @@ -453,6 +469,28 @@ long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>  	return 0;
>  }
>  
> +static long kvmppc_tce_iommu_map(struct kvm *kvm,
> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
> +		unsigned long entry, unsigned long ua,
> +		enum dma_data_direction dir)
> +{
> +	unsigned long ret = H_SUCCESS;
> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
> +	unsigned long io_entry = entry * subpages;
> +	unsigned long subpg, pgoff;
> +
> +	for (subpg = 0, pgoff = 0; subpg < subpages;
> +			++subpg, pgoff += IOMMU_PAGE_SIZE(tbl)) {
> +
> +		ret = kvmppc_tce_iommu_do_map(kvm, tbl,
> +				io_entry + subpg, ua + pgoff, dir);
> +		if (ret != H_SUCCESS)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
>  long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  		      unsigned long ioba, unsigned long tce)
>  {
> @@ -491,10 +529,10 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  
>  	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>  		if (dir == DMA_NONE)
> -			ret = kvmppc_tce_iommu_unmap(vcpu->kvm,
> +			ret = kvmppc_tce_iommu_unmap(vcpu->kvm, stt,
>  					stit->tbl, entry);
>  		else
> -			ret = kvmppc_tce_iommu_map(vcpu->kvm, stit->tbl,
> +			ret = kvmppc_tce_iommu_map(vcpu->kvm, stt, stit->tbl,
>  					entry, ua, dir);
>  
>  		if (ret == H_SUCCESS)
> @@ -570,7 +608,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  			return H_PARAMETER;
>  
>  		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
> -			ret = kvmppc_tce_iommu_map(vcpu->kvm,
> +			ret = kvmppc_tce_iommu_map(vcpu->kvm, stt,
>  					stit->tbl, entry + i, ua,
>  					iommu_tce_direction(tce));
>  
> @@ -618,7 +656,7 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>  		unsigned long entry = ioba >> stt->page_shift;
>  
>  		for (i = 0; i < npages; ++i) {
> -			ret = kvmppc_tce_iommu_unmap(vcpu->kvm,
> +			ret = kvmppc_tce_iommu_unmap(vcpu->kvm, stt,
>  					stit->tbl, entry + i);
>  
>  			if (ret == H_SUCCESS)
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index e220fab..258e786 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -221,7 +221,7 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
>  	return H_SUCCESS;
>  }
>  
> -static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
> +static long kvmppc_rm_tce_iommu_do_unmap(struct kvm *kvm,
>  		struct iommu_table *tbl, unsigned long entry)
>  {
>  	enum dma_data_direction dir = DMA_NONE;
> @@ -245,7 +245,25 @@ static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
>  	return ret;
>  }
>  
> -static long kvmppc_rm_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
> +static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
> +		unsigned long entry)
> +{
> +	unsigned long ret = H_SUCCESS;
> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
> +	unsigned long io_entry = entry * subpages;
> +	unsigned long subpg;
> +
> +	for (subpg = 0; subpg < subpages; ++subpg) {
> +		ret = kvmppc_rm_tce_iommu_do_unmap(kvm, tbl, io_entry + subpg);
> +		if (ret != H_SUCCESS)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
> +static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
>  		unsigned long entry, unsigned long ua,
>  		enum dma_data_direction dir)
>  {
> @@ -290,6 +308,28 @@ static long kvmppc_rm_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>  	return 0;
>  }
>  
> +static long kvmppc_rm_tce_iommu_map(struct kvm *kvm,
> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
> +		unsigned long entry, unsigned long ua,
> +		enum dma_data_direction dir)
> +{
> +	unsigned long ret = H_SUCCESS;
> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
> +	unsigned long io_entry = entry * subpages;
> +	unsigned long subpg, pgoff;
> +
> +	for (subpg = 0, pgoff = 0; subpg < subpages;
> +			++subpg, pgoff += IOMMU_PAGE_SIZE(tbl)) {
> +
> +		ret = kvmppc_rm_tce_iommu_do_map(kvm, tbl,
> +				io_entry + subpg, ua + pgoff, dir);
> +		if (ret != H_SUCCESS)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
>  long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  		unsigned long ioba, unsigned long tce)
>  {
> @@ -327,10 +367,10 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  
>  	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>  		if (dir == DMA_NONE)
> -			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm,
> +			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm, stt,
>  					stit->tbl, entry);
>  		else
> -			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm,
> +			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
>  					stit->tbl, entry, ua, dir);
>  
>  		if (ret == H_SUCCESS)
> @@ -477,7 +517,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  			return H_PARAMETER;
>  
>  		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
> -			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm,
> +			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
>  					stit->tbl, entry + i, ua,
>  					iommu_tce_direction(tce));
>  
> @@ -529,7 +569,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
>  		unsigned long entry = ioba >> stt->page_shift;
>  
>  		for (i = 0; i < npages; ++i) {
> -			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm,
> +			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm, stt,
>  					stit->tbl, entry + i);
>  
>  			if (ret == H_SUCCESS)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 2/2] KVM: PPC: Allow backing bigger guest IOMMU pages with smaller physical pages
  2018-05-02  4:07   ` Alexey Kardashevskiy
@ 2018-05-02  5:53     ` Balbir Singh
  -1 siblings, 0 replies; 24+ messages in thread
From: Balbir Singh @ 2018-05-02  5:53 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, kvm-ppc, David Gibson

On Wed,  2 May 2018 14:07:23 +1000
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> At the moment the host only supports the IOMMU page sizes which the
> guest is aware of, which are 4KB/64KB/16MB. However, P9 does not
> support 16MB IOMMU pages; 2MB and 1GB pages are supported instead. We
> can still emulate bigger guest pages (for example 16MB) with smaller
> host pages (4KB/64KB/2MB).
> 
> This allows the physical IOMMU pages to use a page size smaller than
> or equal to the guest visible IOMMU page size.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/kvm/book3s_64_vio.c    | 66 +++++++++++++++++++++++++++++--------
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 52 +++++++++++++++++++++++++----
>  2 files changed, 98 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 041e54d..e10d6a3 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -176,14 +176,12 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
>  
>  		if (!tbltmp)
>  			continue;
> -		/*
> -		 * Make sure hardware table parameters are exactly the same;
> -		 * this is used in the TCE handlers where boundary checks
> -		 * use only the first attached table.
> -		 */
> -		if ((tbltmp->it_page_shift == stt->page_shift) &&
> -				(tbltmp->it_offset == stt->offset) &&
> -				(tbltmp->it_size == stt->size)) {
> +		/* Make sure hardware table parameters are compatible */
> +		if ((tbltmp->it_page_shift <= stt->page_shift) &&
> +				(tbltmp->it_offset << tbltmp->it_page_shift ==
> +				 stt->offset << stt->page_shift) &&
> +				(tbltmp->it_size << tbltmp->it_page_shift ==
> +				 stt->size << stt->page_shift)) {

Very difficult to parse

How about -

subpages_shift = stt->page_shift - tbltmp->it_page_shift;

and then

matches_offset = !(tbltmp->it_offset - (stt->offset << subpages_shift));
matches_size = !(tbltmp->it_size - (stt->size << subpages_shift));

The condition then is just

		if ((tbltmp->it_page_shift <= stt->page_shift) &&
				matches_offset && matches_size) {



>  			/*
>  			 * Reference the table to avoid races with
>  			 * add/remove DMA windows.
> @@ -396,7 +394,7 @@ static long kvmppc_tce_iommu_mapped_dec(struct kvm *kvm,
>  	return H_SUCCESS;
>  }
>  
> -static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
> +static long kvmppc_tce_iommu_do_unmap(struct kvm *kvm,
>  		struct iommu_table *tbl, unsigned long entry)
>  {
>  	enum dma_data_direction dir = DMA_NONE;
> @@ -416,7 +414,25 @@ static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
>  	return ret;
>  }
>  
> -long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
> +static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
> +		unsigned long entry)
> +{
> +	unsigned long ret = H_SUCCESS;
> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
> +	unsigned long io_entry = entry * subpages;

How do we know this multiplication does not overflow?
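
For scale (shift values assumed for illustration), the largest
plausible split here is a 16MB guest page over a 4KB hardware table:

	subpages = 1ULL << (24 - 12);	/* 4096 */

so the product only wraps if entry approaches 2^52.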

> +	unsigned long subpg;

Why not just i?

> +
> +	for (subpg = 0; subpg < subpages; ++subpg) {
> +		ret = kvmppc_tce_iommu_do_unmap(kvm, tbl, io_entry + subpg);
> +		if (ret != H_SUCCESS)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
> +long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
>  		unsigned long entry, unsigned long ua,
>  		enum dma_data_direction dir)
>  {
> @@ -453,6 +469,28 @@ long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>  	return 0;
>  }
>  
> +static long kvmppc_tce_iommu_map(struct kvm *kvm,
> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
> +		unsigned long entry, unsigned long ua,
> +		enum dma_data_direction dir)
> +{
> +	unsigned long ret = H_SUCCESS;
> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
> +	unsigned long io_entry = entry * subpages;
> +	unsigned long subpg, pgoff;
> +
> +	for (subpg = 0, pgoff = 0; subpg < subpages;
> +			++subpg, pgoff += IOMMU_PAGE_SIZE(tbl)) {
> +
> +		ret = kvmppc_tce_iommu_do_map(kvm, tbl,
> +				io_entry + subpg, ua + pgoff, dir);
> +		if (ret != H_SUCCESS)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
>  long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  		      unsigned long ioba, unsigned long tce)
>  {
> @@ -491,10 +529,10 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  
>  	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>  		if (dir == DMA_NONE)
> -			ret = kvmppc_tce_iommu_unmap(vcpu->kvm,
> +			ret = kvmppc_tce_iommu_unmap(vcpu->kvm, stt,
>  					stit->tbl, entry);
>  		else
> -			ret = kvmppc_tce_iommu_map(vcpu->kvm, stit->tbl,
> +			ret = kvmppc_tce_iommu_map(vcpu->kvm, stt, stit->tbl,
>  					entry, ua, dir);
>  
>  		if (ret == H_SUCCESS)
> @@ -570,7 +608,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  			return H_PARAMETER;
>  
>  		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
> -			ret = kvmppc_tce_iommu_map(vcpu->kvm,
> +			ret = kvmppc_tce_iommu_map(vcpu->kvm, stt,
>  					stit->tbl, entry + i, ua,
>  					iommu_tce_direction(tce));
>  
> @@ -618,7 +656,7 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>  		unsigned long entry = ioba >> stt->page_shift;
>  
>  		for (i = 0; i < npages; ++i) {
> -			ret = kvmppc_tce_iommu_unmap(vcpu->kvm,
> +			ret = kvmppc_tce_iommu_unmap(vcpu->kvm, stt,
>  					stit->tbl, entry + i);
>  
>  			if (ret == H_SUCCESS)
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index e220fab..258e786 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -221,7 +221,7 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
>  	return H_SUCCESS;
>  }
>  
> -static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
> +static long kvmppc_rm_tce_iommu_do_unmap(struct kvm *kvm,
>  		struct iommu_table *tbl, unsigned long entry)
>  {
>  	enum dma_data_direction dir = DMA_NONE;
> @@ -245,7 +245,25 @@ static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
>  	return ret;
>  }
>  
> -static long kvmppc_rm_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
> +static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
> +		unsigned long entry)
> +{
> +	unsigned long ret = H_SUCCESS;
> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
> +	unsigned long io_entry = entry * subpages;
> +	unsigned long subpg;
> +
> +	for (subpg = 0; subpg < subpages; ++subpg) {
> +		ret = kvmppc_rm_tce_iommu_do_unmap(kvm, tbl, io_entry + subpg);
> +		if (ret != H_SUCCESS)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
> +static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
>  		unsigned long entry, unsigned long ua,
>  		enum dma_data_direction dir)
>  {
> @@ -290,6 +308,28 @@ static long kvmppc_rm_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>  	return 0;
>  }
>  
> +static long kvmppc_rm_tce_iommu_map(struct kvm *kvm,
> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
> +		unsigned long entry, unsigned long ua,
> +		enum dma_data_direction dir)
> +{
> +	unsigned long ret = H_SUCCESS;
> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
> +	unsigned long io_entry = entry * subpages;
> +	unsigned long subpg, pgoff;
> +
> +	for (subpg = 0, pgoff = 0; subpg < subpages;
> +			++subpg, pgoff += IOMMU_PAGE_SIZE(tbl)) {
> +
> +		ret = kvmppc_rm_tce_iommu_do_map(kvm, tbl,
> +				io_entry + subpg, ua + pgoff, dir);
> +		if (ret != H_SUCCESS)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
>  long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  		unsigned long ioba, unsigned long tce)
>  {
> @@ -327,10 +367,10 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  
>  	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>  		if (dir == DMA_NONE)
> -			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm,
> +			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm, stt,
>  					stit->tbl, entry);
>  		else
> -			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm,
> +			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
>  					stit->tbl, entry, ua, dir);
>  
>  		if (ret == H_SUCCESS)
> @@ -477,7 +517,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  			return H_PARAMETER;
>  
>  		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
> -			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm,
> +			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
>  					stit->tbl, entry + i, ua,
>  					iommu_tce_direction(tce));
>  
> @@ -529,7 +569,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
>  		unsigned long entry = ioba >> stt->page_shift;
>  
>  		for (i = 0; i < npages; ++i) {
> -			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm,
> +			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm, stt,
>  					stit->tbl, entry + i);
>  
>  			if (ret == H_SUCCESS)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 2/2] KVM: PPC: Allow backing bigger guest IOMMU pages with smaller physical pages
@ 2018-05-02  5:53     ` Balbir Singh
  0 siblings, 0 replies; 24+ messages in thread
From: Balbir Singh @ 2018-05-02  5:53 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, kvm-ppc, David Gibson

On Wed,  2 May 2018 14:07:23 +1000
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> At the moment we only support in the host the IOMMU page sizes which
> the guest is aware of, which are 4KB/64KB/16MB. However, P9 does not
> support 16MB IOMMU pages; 2MB and 1GB pages are supported instead. We can
> still emulate bigger guest pages (for example 16MB) with smaller host
> pages (4KB/64KB/2MB).
> 
> This allows the physical IOMMU pages to use a page size smaller than or
> equal to the guest visible IOMMU page size.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/kvm/book3s_64_vio.c    | 66 +++++++++++++++++++++++++++++--------
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 52 +++++++++++++++++++++++++----
>  2 files changed, 98 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 041e54d..e10d6a3 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -176,14 +176,12 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
>  
>  		if (!tbltmp)
>  			continue;
> -		/*
> -		 * Make sure hardware table parameters are exactly the same;
> -		 * this is used in the TCE handlers where boundary checks
> -		 * use only the first attached table.
> -		 */
> -		if ((tbltmp->it_page_shift == stt->page_shift) &&
> -				(tbltmp->it_offset == stt->offset) &&
> -				(tbltmp->it_size == stt->size)) {
> +		/* Make sure hardware table parameters are compatible */
> +		if ((tbltmp->it_page_shift <= stt->page_shift) &&
> +				(tbltmp->it_offset << tbltmp->it_page_shift ==
> +				 stt->offset << stt->page_shift) &&
> +				(tbltmp->it_size << tbltmp->it_page_shift ==
> +				 stt->size << stt->page_shift)) {

Very difficult to parse

How about -

subpages_shift = (stt->page_shift - tbl->it_page_shift)

and then

matches_offset = !(tbltmp->it_offset - (stt->offset << subpages_shift));
matches_size = !(tbltmp->it_size - (stt->size << subpages_shift));

The condition then is just 

		if ((tbltmp->it_page_shift == stt->page_shift) &&
				matches_offset && matches_size) {



>  			/*
>  			 * Reference the table to avoid races with
>  			 * add/remove DMA windows.
> @@ -396,7 +394,7 @@ static long kvmppc_tce_iommu_mapped_dec(struct kvm *kvm,
>  	return H_SUCCESS;
>  }
>  
> -static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
> +static long kvmppc_tce_iommu_do_unmap(struct kvm *kvm,
>  		struct iommu_table *tbl, unsigned long entry)
>  {
>  	enum dma_data_direction dir = DMA_NONE;
> @@ -416,7 +414,25 @@ static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
>  	return ret;
>  }
>  
> -long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
> +static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
> +		unsigned long entry)
> +{
> +	unsigned long ret = H_SUCCESS;
> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
> +	unsigned long io_entry = entry * subpages;

How do we know this multiplication does not overflow?

> +	unsigned long subpg;

Why not just i?

> +
> +	for (subpg = 0; subpg < subpages; ++subpg) {
> +		ret = kvmppc_tce_iommu_do_unmap(kvm, tbl, io_entry + subpg);
> +		if (ret != H_SUCCESS)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
> +long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
>  		unsigned long entry, unsigned long ua,
>  		enum dma_data_direction dir)
>  {
> @@ -453,6 +469,28 @@ long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>  	return 0;
>  }
>  
> +static long kvmppc_tce_iommu_map(struct kvm *kvm,
> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
> +		unsigned long entry, unsigned long ua,
> +		enum dma_data_direction dir)
> +{
> +	unsigned long ret = H_SUCCESS;
> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
> +	unsigned long io_entry = entry * subpages;
> +	unsigned long subpg, pgoff;
> +
> +	for (subpg = 0, pgoff = 0; subpg < subpages;
> +			++subpg, pgoff += IOMMU_PAGE_SIZE(tbl)) {
> +
> +		ret = kvmppc_tce_iommu_do_map(kvm, tbl,
> +				io_entry + subpg, ua + pgoff, dir);
> +		if (ret != H_SUCCESS)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
>  long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  		      unsigned long ioba, unsigned long tce)
>  {
> @@ -491,10 +529,10 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  
>  	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>  		if (dir == DMA_NONE)
> -			ret = kvmppc_tce_iommu_unmap(vcpu->kvm,
> +			ret = kvmppc_tce_iommu_unmap(vcpu->kvm, stt,
>  					stit->tbl, entry);
>  		else
> -			ret = kvmppc_tce_iommu_map(vcpu->kvm, stit->tbl,
> +			ret = kvmppc_tce_iommu_map(vcpu->kvm, stt, stit->tbl,
>  					entry, ua, dir);
>  
>  		if (ret == H_SUCCESS)
> @@ -570,7 +608,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  			return H_PARAMETER;
>  
>  		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
> -			ret = kvmppc_tce_iommu_map(vcpu->kvm,
> +			ret = kvmppc_tce_iommu_map(vcpu->kvm, stt,
>  					stit->tbl, entry + i, ua,
>  					iommu_tce_direction(tce));
>  
> @@ -618,7 +656,7 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>  		unsigned long entry = ioba >> stt->page_shift;
>  
>  		for (i = 0; i < npages; ++i) {
> -			ret = kvmppc_tce_iommu_unmap(vcpu->kvm,
> +			ret = kvmppc_tce_iommu_unmap(vcpu->kvm, stt,
>  					stit->tbl, entry + i);
>  
>  			if (ret == H_SUCCESS)
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index e220fab..258e786 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -221,7 +221,7 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
>  	return H_SUCCESS;
>  }
>  
> -static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
> +static long kvmppc_rm_tce_iommu_do_unmap(struct kvm *kvm,
>  		struct iommu_table *tbl, unsigned long entry)
>  {
>  	enum dma_data_direction dir = DMA_NONE;
> @@ -245,7 +245,25 @@ static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
>  	return ret;
>  }
>  
> -static long kvmppc_rm_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
> +static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
> +		unsigned long entry)
> +{
> +	unsigned long ret = H_SUCCESS;
> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
> +	unsigned long io_entry = entry * subpages;
> +	unsigned long subpg;
> +
> +	for (subpg = 0; subpg < subpages; ++subpg) {
> +		ret = kvmppc_rm_tce_iommu_do_unmap(kvm, tbl, io_entry + subpg);
> +		if (ret != H_SUCCESS)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
> +static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
>  		unsigned long entry, unsigned long ua,
>  		enum dma_data_direction dir)
>  {
> @@ -290,6 +308,28 @@ static long kvmppc_rm_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>  	return 0;
>  }
>  
> +static long kvmppc_rm_tce_iommu_map(struct kvm *kvm,
> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
> +		unsigned long entry, unsigned long ua,
> +		enum dma_data_direction dir)
> +{
> +	unsigned long ret = H_SUCCESS;
> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
> +	unsigned long io_entry = entry * subpages;
> +	unsigned long subpg, pgoff;
> +
> +	for (subpg = 0, pgoff = 0; subpg < subpages;
> +			++subpg, pgoff += IOMMU_PAGE_SIZE(tbl)) {
> +
> +		ret = kvmppc_rm_tce_iommu_do_map(kvm, tbl,
> +				io_entry + subpg, ua + pgoff, dir);
> +		if (ret != H_SUCCESS)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
>  long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  		unsigned long ioba, unsigned long tce)
>  {
> @@ -327,10 +367,10 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>  
>  	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>  		if (dir == DMA_NONE)
> -			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm,
> +			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm, stt,
>  					stit->tbl, entry);
>  		else
> -			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm,
> +			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
>  					stit->tbl, entry, ua, dir);
>  
>  		if (ret == H_SUCCESS)
> @@ -477,7 +517,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>  			return H_PARAMETER;
>  
>  		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
> -			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm,
> +			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
>  					stit->tbl, entry + i, ua,
>  					iommu_tce_direction(tce));
>  
> @@ -529,7 +569,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
>  		unsigned long entry = ioba >> stt->page_shift;
>  
>  		for (i = 0; i < npages; ++i) {
> -			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm,
> +			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm, stt,
>  					stit->tbl, entry + i);
>  
>  			if (ret == H_SUCCESS)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 0/2] KVM: PPC: Allow backing bigger guest IOMMU pages with smaller physical
  2018-05-02  4:07 ` Alexey Kardashevskiy
@ 2018-05-02  6:16   ` David Gibson
  -1 siblings, 0 replies; 24+ messages in thread
From: David Gibson @ 2018-05-02  6:16 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linuxppc-dev, kvm-ppc, Paul Mackerras

[-- Attachment #1: Type: text/plain, Size: 933 bytes --]

On Wed, May 02, 2018 at 02:07:21PM +1000, Alexey Kardashevskiy wrote:
> This is to allow the guest to use big host IOMMU pages even when
> the exact page size (i.e. 16MB) is not supported by the host
> (i.e.P9).

Thanks for implementing this; it will make a bunch of things rather easier.

> 
> 
> Please comment. Thanks.
> 
> 
> 
> Alexey Kardashevskiy (2):
>   KVM: PPC: Use correct page shift in H_STUFF_TCE
>   KVM: PPC: Allow backing bigger guest IOMMU pages with smaller physical
>     pages
> 
>  arch/powerpc/kvm/book3s_64_vio.c    | 68 +++++++++++++++++++++++++++++--------
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 54 +++++++++++++++++++++++++----
>  2 files changed, 100 insertions(+), 22 deletions(-)
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 2/2] KVM: PPC: Allow backing bigger guest IOMMU pages with smaller physical pages
  2018-05-02  5:53     ` Balbir Singh
@ 2018-05-02  6:26       ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-05-02  6:26 UTC (permalink / raw)
  To: Balbir Singh; +Cc: linuxppc-dev, kvm-ppc, David Gibson

On 2/5/18 3:53 pm, Balbir Singh wrote:
> On Wed,  2 May 2018 14:07:23 +1000
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> At the moment we only support in the host the IOMMU page sizes which
>> the guest is aware of, which are 4KB/64KB/16MB. However, P9 does not
>> support 16MB IOMMU pages; 2MB and 1GB pages are supported instead. We can
>> still emulate bigger guest pages (for example 16MB) with smaller host
>> pages (4KB/64KB/2MB).
>>
>> This allows the physical IOMMU pages to use a page size smaller than or
>> equal to the guest visible IOMMU page size.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  arch/powerpc/kvm/book3s_64_vio.c    | 66 +++++++++++++++++++++++++++++--------
>>  arch/powerpc/kvm/book3s_64_vio_hv.c | 52 +++++++++++++++++++++++++----
>>  2 files changed, 98 insertions(+), 20 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index 041e54d..e10d6a3 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -176,14 +176,12 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
>>  
>>  		if (!tbltmp)
>>  			continue;
>> -		/*
>> -		 * Make sure hardware table parameters are exactly the same;
>> -		 * this is used in the TCE handlers where boundary checks
>> -		 * use only the first attached table.
>> -		 */
>> -		if ((tbltmp->it_page_shift == stt->page_shift) &&
>> -				(tbltmp->it_offset == stt->offset) &&
>> -				(tbltmp->it_size == stt->size)) {
>> +		/* Make sure hardware table parameters are compatible */
>> +		if ((tbltmp->it_page_shift <= stt->page_shift) &&
>> +				(tbltmp->it_offset << tbltmp->it_page_shift ==
>> +				 stt->offset << stt->page_shift) &&
>> +				(tbltmp->it_size << tbltmp->it_page_shift ==
>> +				 stt->size << stt->page_shift)) {
> 
> Very difficult to parse
> 
> How about -
> 
> subpages_shift = (stt->page_shift - tbl->it_page_shift)
> 
> and then
> 
> matches_offset = !(tbltmp->it_offset - (stt->offset << subpages_shift));
> matches_size = !(tbltmp->it_size - (stt->size << subpages_shift));
> 
> The condition then is just 
> 
> 		if ((tbltmp->it_page_shift == stt->page_shift) &&
> 				matches_offset && matches_size) {

This is harder to parse. My variant is bulky but straightforward otherwise.


> 
> 
> 
>>  			/*
>>  			 * Reference the table to avoid races with
>>  			 * add/remove DMA windows.
>> @@ -396,7 +394,7 @@ static long kvmppc_tce_iommu_mapped_dec(struct kvm *kvm,
>>  	return H_SUCCESS;
>>  }
>>  
>> -static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
>> +static long kvmppc_tce_iommu_do_unmap(struct kvm *kvm,
>>  		struct iommu_table *tbl, unsigned long entry)
>>  {
>>  	enum dma_data_direction dir = DMA_NONE;
>> @@ -416,7 +414,25 @@ static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
>>  	return ret;
>>  }
>>  
>> -long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>> +static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
>> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
>> +		unsigned long entry)
>> +{
>> +	unsigned long ret = H_SUCCESS;
>> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
>> +	unsigned long io_entry = entry * subpages;
> 
> How do we know this multiplication does not overflow?

Entry and page_shift come from userspace and are checked in the appropriate
ioctl/hcall handlers.
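
For reference, the boundary check being referred to amounts to something
like the sketch below; the helper name and exact shape are illustrative
only, not the literal handler code:

	/* Sketch: bound the guest-visible entry against the window
	 * before any subpage arithmetic happens; stt->offset and
	 * stt->size are counted in guest IOMMU pages. */
	static long ioba_in_window(struct kvmppc_spapr_tce_table *stt,
			unsigned long ioba, unsigned long npages)
	{
		unsigned long idx = ioba >> stt->page_shift;

		if (!npages || npages > stt->size)
			return H_PARAMETER;
		if (idx < stt->offset ||
				idx - stt->offset > stt->size - npages)
			return H_PARAMETER;
		return H_SUCCESS;
	}

With entry bounded by the window like this, the entry * subpages
multiplication stays well within an unsigned long for any sane window.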

> 
>> +	unsigned long subpg;
> 
> Why not just i?

I can imagine pages so huge that backing them with 4K pages would overflow
32 bits. It is very (very) unlikely, but this is a 64bit arch anyway and
there is not much point in non-long types.
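
To put concrete numbers on the subpage arithmetic (a sketch only, reusing
the shifts from the commit message):

	/* A 16MB guest IOMMU page (shift 24) backed by 64KB host IOMMU
	 * pages (shift 16): every guest TCE covers 256 host TCEs. */
	unsigned long subpages = 1ULL << (24 - 16);	/* == 256 */
	unsigned long io_entry = entry * subpages;	/* first host entry */

so guest entry N translates to host entries N * 256 .. N * 256 + 255.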


>> +
>> +	for (subpg = 0; subpg < subpages; ++subpg) {
>> +		ret = kvmppc_tce_iommu_do_unmap(kvm, tbl, io_entry + subpg);
>> +		if (ret != H_SUCCESS)
>> +			break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
>>  		unsigned long entry, unsigned long ua,
>>  		enum dma_data_direction dir)
>>  {
>> @@ -453,6 +469,28 @@ long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>>  	return 0;
>>  }
>>  
>> +static long kvmppc_tce_iommu_map(struct kvm *kvm,
>> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
>> +		unsigned long entry, unsigned long ua,
>> +		enum dma_data_direction dir)
>> +{
>> +	unsigned long ret = H_SUCCESS;
>> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
>> +	unsigned long io_entry = entry * subpages;
>> +	unsigned long subpg, pgoff;
>> +
>> +	for (subpg = 0, pgoff = 0; subpg < subpages;
>> +			++subpg, pgoff += IOMMU_PAGE_SIZE(tbl)) {
>> +
>> +		ret = kvmppc_tce_iommu_do_map(kvm, tbl,
>> +				io_entry + subpg, ua + pgoff, dir);
>> +		if (ret != H_SUCCESS)
>> +			break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>  long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>  		      unsigned long ioba, unsigned long tce)
>>  {
>> @@ -491,10 +529,10 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>  
>>  	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>>  		if (dir == DMA_NONE)
>> -			ret = kvmppc_tce_iommu_unmap(vcpu->kvm,
>> +			ret = kvmppc_tce_iommu_unmap(vcpu->kvm, stt,
>>  					stit->tbl, entry);
>>  		else
>> -			ret = kvmppc_tce_iommu_map(vcpu->kvm, stit->tbl,
>> +			ret = kvmppc_tce_iommu_map(vcpu->kvm, stt, stit->tbl,
>>  					entry, ua, dir);
>>  
>>  		if (ret == H_SUCCESS)
>> @@ -570,7 +608,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>>  			return H_PARAMETER;
>>  
>>  		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>> -			ret = kvmppc_tce_iommu_map(vcpu->kvm,
>> +			ret = kvmppc_tce_iommu_map(vcpu->kvm, stt,
>>  					stit->tbl, entry + i, ua,
>>  					iommu_tce_direction(tce));
>>  
>> @@ -618,7 +656,7 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>>  		unsigned long entry = ioba >> stt->page_shift;
>>  
>>  		for (i = 0; i < npages; ++i) {
>> -			ret = kvmppc_tce_iommu_unmap(vcpu->kvm,
>> +			ret = kvmppc_tce_iommu_unmap(vcpu->kvm, stt,
>>  					stit->tbl, entry + i);
>>  
>>  			if (ret == H_SUCCESS)
>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> index e220fab..258e786 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> @@ -221,7 +221,7 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
>>  	return H_SUCCESS;
>>  }
>>  
>> -static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
>> +static long kvmppc_rm_tce_iommu_do_unmap(struct kvm *kvm,
>>  		struct iommu_table *tbl, unsigned long entry)
>>  {
>>  	enum dma_data_direction dir = DMA_NONE;
>> @@ -245,7 +245,25 @@ static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
>>  	return ret;
>>  }
>>  
>> -static long kvmppc_rm_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>> +static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
>> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
>> +		unsigned long entry)
>> +{
>> +	unsigned long ret = H_SUCCESS;
>> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
>> +	unsigned long io_entry = entry * subpages;
>> +	unsigned long subpg;
>> +
>> +	for (subpg = 0; subpg < subpages; ++subpg) {
>> +		ret = kvmppc_rm_tce_iommu_do_unmap(kvm, tbl, io_entry + subpg);
>> +		if (ret != H_SUCCESS)
>> +			break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
>>  		unsigned long entry, unsigned long ua,
>>  		enum dma_data_direction dir)
>>  {
>> @@ -290,6 +308,28 @@ static long kvmppc_rm_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>>  	return 0;
>>  }
>>  
>> +static long kvmppc_rm_tce_iommu_map(struct kvm *kvm,
>> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
>> +		unsigned long entry, unsigned long ua,
>> +		enum dma_data_direction dir)
>> +{
>> +	unsigned long ret = H_SUCCESS;
>> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
>> +	unsigned long io_entry = entry * subpages;
>> +	unsigned long subpg, pgoff;
>> +
>> +	for (subpg = 0, pgoff = 0; subpg < subpages;
>> +			++subpg, pgoff += IOMMU_PAGE_SIZE(tbl)) {
>> +
>> +		ret = kvmppc_rm_tce_iommu_do_map(kvm, tbl,
>> +				io_entry + subpg, ua + pgoff, dir);
>> +		if (ret != H_SUCCESS)
>> +			break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>  long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>  		unsigned long ioba, unsigned long tce)
>>  {
>> @@ -327,10 +367,10 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>  
>>  	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>>  		if (dir == DMA_NONE)
>> -			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm,
>> +			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm, stt,
>>  					stit->tbl, entry);
>>  		else
>> -			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm,
>> +			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
>>  					stit->tbl, entry, ua, dir);
>>  
>>  		if (ret == H_SUCCESS)
>> @@ -477,7 +517,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>>  			return H_PARAMETER;
>>  
>>  		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>> -			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm,
>> +			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
>>  					stit->tbl, entry + i, ua,
>>  					iommu_tce_direction(tce));
>>  
>> @@ -529,7 +569,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
>>  		unsigned long entry = ioba >> stt->page_shift;
>>  
>>  		for (i = 0; i < npages; ++i) {
>> -			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm,
>> +			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm, stt,
>>  					stit->tbl, entry + i);
>>  
>>  			if (ret == H_SUCCESS)
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 2/2] KVM: PPC: Allow backing bigger guest IOMMU pages with smaller physical pages
  2018-05-02  6:26       ` Alexey Kardashevskiy
@ 2018-05-02  7:03         ` David Gibson
  -1 siblings, 0 replies; 24+ messages in thread
From: David Gibson @ 2018-05-02  7:03 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Balbir Singh, linuxppc-dev, kvm-ppc

[-- Attachment #1: Type: text/plain, Size: 806 bytes --]

On Wed, May 02, 2018 at 04:26:04PM +1000, Alexey Kardashevskiy wrote:
> On 2/5/18 3:53 pm, Balbir Singh wrote:
> > On Wed,  2 May 2018 14:07:23 +1000
> > Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
[snip]
> >> +	unsigned long subpg;
> > 
> > Why not just i?
> 
> I can imagine pages so huge that backing them with 4K pages would overflow
> 32 bits. It is very (very) unlikely, but this is a 64bit arch anyway and
> there is not much point in non-long types.

I think Balbir was talking about the variable name, not the type.
i.e. "Why not call it 'i' the default name for a random loop
counter?".

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 2/2] KVM: PPC: Allow backing bigger guest IOMMU pages with smaller physical pages
  2018-05-02  6:26       ` Alexey Kardashevskiy
@ 2018-05-02  8:59         ` Balbir Singh
  -1 siblings, 0 replies; 24+ messages in thread
From: Balbir Singh @ 2018-05-02  8:59 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
	open list:KERNEL VIRTUAL MACHINE (KVM) FOR POWERPC, David Gibson

On Wed, May 2, 2018 at 4:26 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> On 2/5/18 3:53 pm, Balbir Singh wrote:
>> On Wed,  2 May 2018 14:07:23 +1000
>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>
>>> At the moment we only support in the host the IOMMU page sizes which
>>> the guest is aware of, which are 4KB/64KB/16MB. However, P9 does not
>>> support 16MB IOMMU pages; 2MB and 1GB pages are supported instead. We can
>>> still emulate bigger guest pages (for example 16MB) with smaller host
>>> pages (4KB/64KB/2MB).
>>>
>>> This allows the physical IOMMU pages to use a page size smaller than or
>>> equal to the guest visible IOMMU page size.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>  arch/powerpc/kvm/book3s_64_vio.c    | 66 +++++++++++++++++++++++++++++--------
>>>  arch/powerpc/kvm/book3s_64_vio_hv.c | 52 +++++++++++++++++++++++++----
>>>  2 files changed, 98 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>>> index 041e54d..e10d6a3 100644
>>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>>> @@ -176,14 +176,12 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
>>>
>>>              if (!tbltmp)
>>>                      continue;
>>> -            /*
>>> -             * Make sure hardware table parameters are exactly the same;
>>> -             * this is used in the TCE handlers where boundary checks
>>> -             * use only the first attached table.
>>> -             */
>>> -            if ((tbltmp->it_page_shift == stt->page_shift) &&
>>> -                            (tbltmp->it_offset == stt->offset) &&
>>> -                            (tbltmp->it_size == stt->size)) {
>>> +            /* Make sure hardware table parameters are compatible */
>>> +            if ((tbltmp->it_page_shift <= stt->page_shift) &&
>>> +                            (tbltmp->it_offset << tbltmp->it_page_shift ==
>>> +                             stt->offset << stt->page_shift) &&
>>> +                            (tbltmp->it_size << tbltmp->it_page_shift ==
>>> +                             stt->size << stt->page_shift)) {
>>
>> Very difficult to parse
>>
>> How about -
>>
>> subpages_shift = (stt->page_shift - tbl->it_page_shift)
>>
>> and then
>>
>> matches_offset = !(tbltmp->it_offset - (stt->offset << subpages_shift));
>> matches_size = !(tbltmp->it_size - (stt->size << subpages_shift));
>>
>> The condition then is just
>>
>>               if ((tbltmp->it_page_shift == stt->page_shift) &&
>>                               matches_offset && matches_size) {
>
> This is harder to parse. My variant is bulky but straight forward otherwise.

OK, I don't read this code as often; I found the it_page_shift and page_shift
variants hard to understand, but I guess that means I need to read more code.

>
>
>>
>>
>>
>>>                      /*
>>>                       * Reference the table to avoid races with
>>>                       * add/remove DMA windows.
>>> @@ -396,7 +394,7 @@ static long kvmppc_tce_iommu_mapped_dec(struct kvm *kvm,
>>>      return H_SUCCESS;
>>>  }
>>>
>>> -static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
>>> +static long kvmppc_tce_iommu_do_unmap(struct kvm *kvm,
>>>              struct iommu_table *tbl, unsigned long entry)
>>>  {
>>>      enum dma_data_direction dir = DMA_NONE;
>>> @@ -416,7 +414,25 @@ static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
>>>      return ret;
>>>  }
>>>
>>> -long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>>> +static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
>>> +            struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
>>> +            unsigned long entry)
>>> +{
>>> +    unsigned long ret = H_SUCCESS;
>>> +    unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
>>> +    unsigned long io_entry = entry * subpages;
>>
>> How do we know this multiplication does not overflow?
>
> Entry, page_shift come from the userspace and checked in appropriate
> ioctl/hcall handlers.
>

Yes, I am not sure how many safety nets we need, but I worry about
such things :)

>>
>>> +    unsigned long subpg;
>>
>> Why not just i?
>
> I can imagine pages so huge that backing them with 4K pages would overflow
> 32 bits. It is very (very) unlikely, but this is a 64bit arch anyway and
> there is not much point in non-long types.
>

What David said, i is an easy iterator to understand :)

Balbir Singh.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 2/2] KVM: PPC: Allow backing bigger guest IOMMU pages with smaller physical pages
  2018-05-02  5:49     ` David Gibson
@ 2018-05-02  9:10       ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-05-02  9:10 UTC (permalink / raw)
  To: David Gibson; +Cc: linuxppc-dev, kvm-ppc, Paul Mackerras

On 2/5/18 3:49 pm, David Gibson wrote:
> On Wed, May 02, 2018 at 02:07:23PM +1000, Alexey Kardashevskiy wrote:
>> At the moment we only support in the host the IOMMU page sizes which
>> the guest is aware of, which are 4KB/64KB/16MB. However, P9 does not
>> support 16MB IOMMU pages; 2MB and 1GB pages are supported instead. We can
>> still emulate bigger guest pages (for example 16MB) with smaller host
>> pages (4KB/64KB/2MB).
>>
>> This allows the physical IOMMU pages to use a page size smaller than or
>> equal to the guest visible IOMMU page size.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> Except for one possible nit..
> 
>> ---
>>  arch/powerpc/kvm/book3s_64_vio.c    | 66 +++++++++++++++++++++++++++++--------
>>  arch/powerpc/kvm/book3s_64_vio_hv.c | 52 +++++++++++++++++++++++++----
>>  2 files changed, 98 insertions(+), 20 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index 041e54d..e10d6a3 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -176,14 +176,12 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
>>  
>>  		if (!tbltmp)
>>  			continue;
>> -		/*
>> -		 * Make sure hardware table parameters are exactly the same;
>> -		 * this is used in the TCE handlers where boundary checks
>> -		 * use only the first attached table.
>> -		 */
>> -		if ((tbltmp->it_page_shift == stt->page_shift) &&
>> -				(tbltmp->it_offset == stt->offset) &&
>> -				(tbltmp->it_size == stt->size)) {
>> +		/* Make sure hardware table parameters are compatible */
>> +		if ((tbltmp->it_page_shift <= stt->page_shift) &&
>> +				(tbltmp->it_offset << tbltmp->it_page_shift ==
>> +				 stt->offset << stt->page_shift) &&
>> +				(tbltmp->it_size << tbltmp->it_page_shift ==
>> +				 stt->size << stt->page_shift)) {
> 
> Do we need to worry about stt->offset << stt->page_shift overflowing
> with a buggy or malicious userspace?


I cannot see how it can break anything... But some sanity check that the
entire kvmppc_spapr_tce_table window fits in 64 bits probably would not
hurt; I'll make a separate patch.
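
Such a check could look roughly like the following at table creation
time; this is only a sketch of the idea, not the eventual patch:

	/* stt->offset and stt->size are counted in guest IOMMU pages;
	 * reject a window that wraps or whose end would not fit in
	 * 64 bits of bus address space once shifted by page_shift. */
	unsigned long window_end = stt->offset + stt->size;

	if (!stt->size || window_end < stt->offset ||
			window_end > (1ULL << (64 - stt->page_shift)))
		return -EINVAL;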


> 
>>  			/*
>>  			 * Reference the table to avoid races with
>>  			 * add/remove DMA windows.
>> @@ -396,7 +394,7 @@ static long kvmppc_tce_iommu_mapped_dec(struct kvm *kvm,
>>  	return H_SUCCESS;
>>  }
>>  
>> -static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
>> +static long kvmppc_tce_iommu_do_unmap(struct kvm *kvm,
>>  		struct iommu_table *tbl, unsigned long entry)
>>  {
>>  	enum dma_data_direction dir = DMA_NONE;
>> @@ -416,7 +414,25 @@ static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
>>  	return ret;
>>  }
>>  
>> -long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>> +static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
>> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
>> +		unsigned long entry)
>> +{
>> +	unsigned long ret = H_SUCCESS;
>> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
>> +	unsigned long io_entry = entry * subpages;
>> +	unsigned long subpg;
>> +
>> +	for (subpg = 0; subpg < subpages; ++subpg) {
>> +		ret = kvmppc_tce_iommu_do_unmap(kvm, tbl, io_entry + subpg);
>> +		if (ret != H_SUCCESS)
>> +			break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
>>  		unsigned long entry, unsigned long ua,
>>  		enum dma_data_direction dir)
>>  {
>> @@ -453,6 +469,28 @@ long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>>  	return 0;
>>  }
>>  
>> +static long kvmppc_tce_iommu_map(struct kvm *kvm,
>> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
>> +		unsigned long entry, unsigned long ua,
>> +		enum dma_data_direction dir)
>> +{
>> +	unsigned long ret = H_SUCCESS;
>> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
>> +	unsigned long io_entry = entry * subpages;
>> +	unsigned long subpg, pgoff;
>> +
>> +	for (subpg = 0, pgoff = 0; subpg < subpages;
>> +			++subpg, pgoff += IOMMU_PAGE_SIZE(tbl)) {
>> +
>> +		ret = kvmppc_tce_iommu_do_map(kvm, tbl,
>> +				io_entry + subpg, ua + pgoff, dir);
>> +		if (ret != H_SUCCESS)
>> +			break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>  long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>  		      unsigned long ioba, unsigned long tce)
>>  {
>> @@ -491,10 +529,10 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>  
>>  	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>>  		if (dir == DMA_NONE)
>> -			ret = kvmppc_tce_iommu_unmap(vcpu->kvm,
>> +			ret = kvmppc_tce_iommu_unmap(vcpu->kvm, stt,
>>  					stit->tbl, entry);
>>  		else
>> -			ret = kvmppc_tce_iommu_map(vcpu->kvm, stit->tbl,
>> +			ret = kvmppc_tce_iommu_map(vcpu->kvm, stt, stit->tbl,
>>  					entry, ua, dir);
>>  
>>  		if (ret == H_SUCCESS)
>> @@ -570,7 +608,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>>  			return H_PARAMETER;
>>  
>>  		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>> -			ret = kvmppc_tce_iommu_map(vcpu->kvm,
>> +			ret = kvmppc_tce_iommu_map(vcpu->kvm, stt,
>>  					stit->tbl, entry + i, ua,
>>  					iommu_tce_direction(tce));
>>  
>> @@ -618,7 +656,7 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>>  		unsigned long entry = ioba >> stt->page_shift;
>>  
>>  		for (i = 0; i < npages; ++i) {
>> -			ret = kvmppc_tce_iommu_unmap(vcpu->kvm,
>> +			ret = kvmppc_tce_iommu_unmap(vcpu->kvm, stt,
>>  					stit->tbl, entry + i);
>>  
>>  			if (ret == H_SUCCESS)
>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> index e220fab..258e786 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> @@ -221,7 +221,7 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
>>  	return H_SUCCESS;
>>  }
>>  
>> -static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
>> +static long kvmppc_rm_tce_iommu_do_unmap(struct kvm *kvm,
>>  		struct iommu_table *tbl, unsigned long entry)
>>  {
>>  	enum dma_data_direction dir = DMA_NONE;
>> @@ -245,7 +245,25 @@ static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
>>  	return ret;
>>  }
>>  
>> -static long kvmppc_rm_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>> +static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
>> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
>> +		unsigned long entry)
>> +{
>> +	unsigned long ret = H_SUCCESS;
>> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
>> +	unsigned long io_entry = entry * subpages;
>> +	unsigned long subpg;
>> +
>> +	for (subpg = 0; subpg < subpages; ++subpg) {
>> +		ret = kvmppc_rm_tce_iommu_do_unmap(kvm, tbl, io_entry + subpg);
>> +		if (ret != H_SUCCESS)
>> +			break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
>>  		unsigned long entry, unsigned long ua,
>>  		enum dma_data_direction dir)
>>  {
>> @@ -290,6 +308,28 @@ static long kvmppc_rm_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>>  	return 0;
>>  }
>>  
>> +static long kvmppc_rm_tce_iommu_map(struct kvm *kvm,
>> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
>> +		unsigned long entry, unsigned long ua,
>> +		enum dma_data_direction dir)
>> +{
>> +	unsigned long ret = H_SUCCESS;
>> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
>> +	unsigned long io_entry = entry * subpages;
>> +	unsigned long subpg, pgoff;
>> +
>> +	for (subpg = 0, pgoff = 0; subpg < subpages;
>> +			++subpg, pgoff += IOMMU_PAGE_SIZE(tbl)) {
>> +
>> +		ret = kvmppc_rm_tce_iommu_do_map(kvm, tbl,
>> +				io_entry + subpg, ua + pgoff, dir);
>> +		if (ret != H_SUCCESS)
>> +			break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>  long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>  		unsigned long ioba, unsigned long tce)
>>  {
>> @@ -327,10 +367,10 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>  
>>  	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>>  		if (dir == DMA_NONE)
>> -			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm,
>> +			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm, stt,
>>  					stit->tbl, entry);
>>  		else
>> -			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm,
>> +			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
>>  					stit->tbl, entry, ua, dir);
>>  
>>  		if (ret == H_SUCCESS)
>> @@ -477,7 +517,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>>  			return H_PARAMETER;
>>  
>>  		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>> -			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm,
>> +			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
>>  					stit->tbl, entry + i, ua,
>>  					iommu_tce_direction(tce));
>>  
>> @@ -529,7 +569,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
>>  		unsigned long entry = ioba >> stt->page_shift;
>>  
>>  		for (i = 0; i < npages; ++i) {
>> -			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm,
>> +			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm, stt,
>>  					stit->tbl, entry + i);
>>  
>>  			if (ret == H_SUCCESS)
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 2/2] KVM: PPC: Allow backing bigger guest IOMMU pages with smaller physical pages
@ 2018-05-02  9:10       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-05-02  9:10 UTC (permalink / raw)
  To: David Gibson; +Cc: linuxppc-dev, kvm-ppc, Paul Mackerras

On 2/5/18 3:49 pm, David Gibson wrote:
> On Wed, May 02, 2018 at 02:07:23PM +1000, Alexey Kardashevskiy wrote:
>> At the moment we only support in the host the IOMMU page sizes which
>> the guest is aware of, which is 4KB/64KB/16MB. However P9 does not support
>> 16MB IOMMU pages, 2MB and 1GB pages are supported instead. We can still
>> emulate bigger guest pages (for example 16MB) with smaller host pages
>> (4KB/64KB/2MB).
>>
>> This allows the physical IOMMU pages to use a page size smaller or equal
>> than the guest visible IOMMU page size.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> Except for one possible nit..
> 
>> ---
>>  arch/powerpc/kvm/book3s_64_vio.c    | 66 +++++++++++++++++++++++++++++--------
>>  arch/powerpc/kvm/book3s_64_vio_hv.c | 52 +++++++++++++++++++++++++----
>>  2 files changed, 98 insertions(+), 20 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index 041e54d..e10d6a3 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -176,14 +176,12 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
>>  
>>  		if (!tbltmp)
>>  			continue;
>> -		/*
>> -		 * Make sure hardware table parameters are exactly the same;
>> -		 * this is used in the TCE handlers where boundary checks
>> -		 * use only the first attached table.
>> -		 */
>> -		if ((tbltmp->it_page_shift == stt->page_shift) &&
>> -				(tbltmp->it_offset == stt->offset) &&
>> -				(tbltmp->it_size == stt->size)) {
>> +		/* Make sure hardware table parameters are compatible */
>> +		if ((tbltmp->it_page_shift <= stt->page_shift) &&
>> +				(tbltmp->it_offset << tbltmp->it_page_shift ==
>> +				 stt->offset << stt->page_shift) &&
>> +				(tbltmp->it_size << tbltmp->it_page_shift ==
>> +				 stt->size << stt->page_shift)) {
> 
> Do we need to worry about stt->offset << stt->page_shift overflowing
> with a buggy or malicious userspace?


I cannot see how it can break anything... But some sanity check that the
entire kvmppc_spapr_tce_table window fits into 64 bits would probably not
hurt; I'll make a separate patch. Something along the lines of the sketch
below should do.
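
For scale, a 16MB guest IOMMU page (shift 24) backed by 2MB host pages
(shift 21) expands to 1UL << (24 - 21) = 8 host TCE entries per guest
entry. A minimal sketch of such a sanity check, reusing the field names
from this patch (the actual separate patch may end up looking different):

	/* Sketch only: refuse a window whose offset or size, scaled
	 * to bytes, would overflow an unsigned long.
	 */
	if ((stt->offset > (ULONG_MAX >> stt->page_shift)) ||
			(stt->size > (ULONG_MAX >> stt->page_shift)))
		return -EINVAL;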


> 
>>  			/*
>>  			 * Reference the table to avoid races with
>>  			 * add/remove DMA windows.
>> @@ -396,7 +394,7 @@ static long kvmppc_tce_iommu_mapped_dec(struct kvm *kvm,
>>  	return H_SUCCESS;
>>  }
>>  
>> -static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
>> +static long kvmppc_tce_iommu_do_unmap(struct kvm *kvm,
>>  		struct iommu_table *tbl, unsigned long entry)
>>  {
>>  	enum dma_data_direction dir = DMA_NONE;
>> @@ -416,7 +414,25 @@ static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
>>  	return ret;
>>  }
>>  
>> -long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>> +static long kvmppc_tce_iommu_unmap(struct kvm *kvm,
>> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
>> +		unsigned long entry)
>> +{
>> +	unsigned long ret = H_SUCCESS;
>> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
>> +	unsigned long io_entry = entry * subpages;
>> +	unsigned long subpg;
>> +
>> +	for (subpg = 0; subpg < subpages; ++subpg) {
>> +		ret = kvmppc_tce_iommu_do_unmap(kvm, tbl, io_entry + subpg);
>> +		if (ret != H_SUCCESS)
>> +			break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
>>  		unsigned long entry, unsigned long ua,
>>  		enum dma_data_direction dir)
>>  {
>> @@ -453,6 +469,28 @@ long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>>  	return 0;
>>  }
>>  
>> +static long kvmppc_tce_iommu_map(struct kvm *kvm,
>> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
>> +		unsigned long entry, unsigned long ua,
>> +		enum dma_data_direction dir)
>> +{
>> +	unsigned long ret = H_SUCCESS;
>> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
>> +	unsigned long io_entry = entry * subpages;
>> +	unsigned long subpg, pgoff;
>> +
>> +	for (subpg = 0, pgoff = 0; subpg < subpages;
>> +			++subpg, pgoff += IOMMU_PAGE_SIZE(tbl)) {
>> +
>> +		ret = kvmppc_tce_iommu_do_map(kvm, tbl,
>> +				io_entry + subpg, ua + pgoff, dir);
>> +		if (ret != H_SUCCESS)
>> +			break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>  long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>  		      unsigned long ioba, unsigned long tce)
>>  {
>> @@ -491,10 +529,10 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>  
>>  	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>>  		if (dir == DMA_NONE)
>> -			ret = kvmppc_tce_iommu_unmap(vcpu->kvm,
>> +			ret = kvmppc_tce_iommu_unmap(vcpu->kvm, stt,
>>  					stit->tbl, entry);
>>  		else
>> -			ret = kvmppc_tce_iommu_map(vcpu->kvm, stit->tbl,
>> +			ret = kvmppc_tce_iommu_map(vcpu->kvm, stt, stit->tbl,
>>  					entry, ua, dir);
>>  
>>  		if (ret == H_SUCCESS)
>> @@ -570,7 +608,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>>  			return H_PARAMETER;
>>  
>>  		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>> -			ret = kvmppc_tce_iommu_map(vcpu->kvm,
>> +			ret = kvmppc_tce_iommu_map(vcpu->kvm, stt,
>>  					stit->tbl, entry + i, ua,
>>  					iommu_tce_direction(tce));
>>  
>> @@ -618,7 +656,7 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
>>  		unsigned long entry = ioba >> stt->page_shift;
>>  
>>  		for (i = 0; i < npages; ++i) {
>> -			ret = kvmppc_tce_iommu_unmap(vcpu->kvm,
>> +			ret = kvmppc_tce_iommu_unmap(vcpu->kvm, stt,
>>  					stit->tbl, entry + i);
>>  
>>  			if (ret == H_SUCCESS)
>> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> index e220fab..258e786 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
>> @@ -221,7 +221,7 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
>>  	return H_SUCCESS;
>>  }
>>  
>> -static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
>> +static long kvmppc_rm_tce_iommu_do_unmap(struct kvm *kvm,
>>  		struct iommu_table *tbl, unsigned long entry)
>>  {
>>  	enum dma_data_direction dir = DMA_NONE;
>> @@ -245,7 +245,25 @@ static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
>>  	return ret;
>>  }
>>  
>> -static long kvmppc_rm_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>> +static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
>> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
>> +		unsigned long entry)
>> +{
>> +	unsigned long ret = H_SUCCESS;
>> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
>> +	unsigned long io_entry = entry * subpages;
>> +	unsigned long subpg;
>> +
>> +	for (subpg = 0; subpg < subpages; ++subpg) {
>> +		ret = kvmppc_rm_tce_iommu_do_unmap(kvm, tbl, io_entry + subpg);
>> +		if (ret != H_SUCCESS)
>> +			break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
>>  		unsigned long entry, unsigned long ua,
>>  		enum dma_data_direction dir)
>>  {
>> @@ -290,6 +308,28 @@ static long kvmppc_rm_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
>>  	return 0;
>>  }
>>  
>> +static long kvmppc_rm_tce_iommu_map(struct kvm *kvm,
>> +		struct kvmppc_spapr_tce_table *stt, struct iommu_table *tbl,
>> +		unsigned long entry, unsigned long ua,
>> +		enum dma_data_direction dir)
>> +{
>> +	unsigned long ret = H_SUCCESS;
>> +	unsigned long subpages = 1ULL << (stt->page_shift - tbl->it_page_shift);
>> +	unsigned long io_entry = entry * subpages;
>> +	unsigned long subpg, pgoff;
>> +
>> +	for (subpg = 0, pgoff = 0; subpg < subpages;
>> +			++subpg, pgoff += IOMMU_PAGE_SIZE(tbl)) {
>> +
>> +		ret = kvmppc_rm_tce_iommu_do_map(kvm, tbl,
>> +				io_entry + subpg, ua + pgoff, dir);
>> +		if (ret != H_SUCCESS)
>> +			break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>  long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>  		unsigned long ioba, unsigned long tce)
>>  {
>> @@ -327,10 +367,10 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
>>  
>>  	list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>>  		if (dir == DMA_NONE)
>> -			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm,
>> +			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm, stt,
>>  					stit->tbl, entry);
>>  		else
>> -			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm,
>> +			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
>>  					stit->tbl, entry, ua, dir);
>>  
>>  		if (ret == H_SUCCESS)
>> @@ -477,7 +517,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>>  			return H_PARAMETER;
>>  
>>  		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>> -			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm,
>> +			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt,
>>  					stit->tbl, entry + i, ua,
>>  					iommu_tce_direction(tce));
>>  
>> @@ -529,7 +569,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
>>  		unsigned long entry = ioba >> stt->page_shift;
>>  
>>  		for (i = 0; i < npages; ++i) {
>> -			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm,
>> +			ret = kvmppc_rm_tce_iommu_unmap(vcpu->kvm, stt,
>>  					stit->tbl, entry + i);
>>  
>>  			if (ret == H_SUCCESS)
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH kernel 2/2] KVM: PPC: Allow backing bigger guest IOMMU pages with smaller physical pages
  2018-05-02  8:59         ` Balbir Singh
@ 2018-05-02  9:13           ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 24+ messages in thread
From: Alexey Kardashevskiy @ 2018-05-02  9:13 UTC (permalink / raw)
  To: Balbir Singh
  Cc: open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
	open list:KERNEL VIRTUAL MACHINE (KVM) FOR POWERPC, David Gibson

On 2/5/18 6:59 pm, Balbir Singh wrote:

>>>
>>>> +    unsigned long subpg;
>>>
>>> Why not just i?
>>
>> I can imagine pages so huge that backing them with 4K pages will
>> overflow 32 bits. It is very (very) unlikely, but this is a 64-bit arch
>> and there is not much point in non-long types anyway.
>>
> 
> What David said, i is an easy iterator to understand :)


Ah, blind me. The reason for this is that there is another "i" in the
caller, and I found it easier not to mix this "i" (the host TCE entry
number) with that "i" (the guest TCE entry number); see the sketch below.
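
A simplified sketch of the two loops in question (condensed from the two
patches; error handling dropped for brevity):

	/* caller, e.g. kvmppc_h_stuff_tce(): "i" walks guest TCE entries */
	for (i = 0; i < npages; ++i)
		ret = kvmppc_tce_iommu_unmap(vcpu->kvm, stt,
				stit->tbl, entry + i);

	/* callee, kvmppc_tce_iommu_unmap(): "subpg" walks the smaller
	 * host pages backing a single guest entry */
	for (subpg = 0; subpg < subpages; ++subpg)
		ret = kvmppc_tce_iommu_do_unmap(kvm, tbl, io_entry + subpg);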


-- 
Alexey

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-05-02  9:13 UTC | newest]

Thread overview: 24+ messages
2018-05-02  4:07 [PATCH kernel 0/2] KVM: PPC: Allow backing bigger guest IOMMU pages with smaller physical Alexey Kardashevskiy
2018-05-02  4:07 ` Alexey Kardashevskiy
2018-05-02  4:07 ` [PATCH kernel 1/2] KVM: PPC: Use correct page shift in H_STUFF_TCE Alexey Kardashevskiy
2018-05-02  4:07   ` Alexey Kardashevskiy
2018-05-02  5:41   ` David Gibson
2018-05-02  5:41     ` David Gibson
2018-05-02  4:07 ` [PATCH kernel 2/2] KVM: PPC: Allow backing bigger guest IOMMU pages with smaller physical pages Alexey Kardashevskiy
2018-05-02  4:07   ` Alexey Kardashevskiy
2018-05-02  5:49   ` David Gibson
2018-05-02  5:49     ` David Gibson
2018-05-02  9:10     ` Alexey Kardashevskiy
2018-05-02  9:10       ` Alexey Kardashevskiy
2018-05-02  5:53   ` Balbir Singh
2018-05-02  5:53     ` Balbir Singh
2018-05-02  6:26     ` Alexey Kardashevskiy
2018-05-02  6:26       ` Alexey Kardashevskiy
2018-05-02  7:03       ` David Gibson
2018-05-02  7:03         ` David Gibson
2018-05-02  8:59       ` Balbir Singh
2018-05-02  8:59         ` Balbir Singh
2018-05-02  9:13         ` Alexey Kardashevskiy
2018-05-02  9:13           ` Alexey Kardashevskiy
2018-05-02  6:16 ` [PATCH kernel 0/2] KVM: PPC: Allow backing bigger guest IOMMU pages with smaller physical David Gibson
2018-05-02  6:16   ` David Gibson
