All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH kernel 0/5] powerpc/powenv/ioda: Allow huge DMA window at 4GB
@ 2020-02-18  7:36 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-18  7:36 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Alex Williamson, kvm-ppc,
	David Gibson


Here is an attempt to support bigger DMA space for devices
supporting DMA masks less than 59 bits (GPUs come into mind
first). POWER9 PHBs have an option to map 2 windows at 0
and select a windows based on DMA address being below or above
4GB.

This adds the "iommu=iommu_bypass" kernel parameter and
supports VFIO+pseries machine - current this requires telling
upstream+unmodified QEMU about this via
-global spapr-pci-host-bridge.dma64_win_addr=0x100000000
or per-phb property. 4/4 advertises the new option but
there is no automation around it in QEMU (should it be?).

For now it is either 1<<59 or 4GB mode; dynamic switching is
not supported (could be via sysfs).

This is a rebased version of
https://lore.kernel.org/kvm/20191202015953.127902-1-aik@ozlabs.ru/

This is based on sha1
71c3a888cbca Linus Torvalds "Merge tag 'powerpc-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux".

Please comment. Thanks.



Alexey Kardashevskiy (5):
  powerpc/powernv/ioda: Move TCE bypass base to PE
  powerpc/powernv/ioda: Rework for huge DMA window at 4GB
  powerpc/powernv/ioda: Allow smaller TCE table levels
  powerpc/powernv/phb4: Add 4GB IOMMU bypass mode
  vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB

 arch/powerpc/include/asm/iommu.h              |   2 +
 arch/powerpc/include/asm/opal-api.h           |   9 +-
 arch/powerpc/include/asm/opal.h               |   2 +
 arch/powerpc/platforms/powernv/pci.h          |   2 +-
 include/uapi/linux/vfio.h                     |   2 +
 arch/powerpc/platforms/powernv/npu-dma.c      |   1 +
 arch/powerpc/platforms/powernv/opal-call.c    |   2 +
 arch/powerpc/platforms/powernv/pci-ioda-tce.c |   4 +-
 arch/powerpc/platforms/powernv/pci-ioda.c     | 229 ++++++++++++++----
 drivers/vfio/vfio_iommu_spapr_tce.c           |  10 +-
 10 files changed, 207 insertions(+), 56 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH kernel 0/5] powerpc/powenv/ioda: Allow huge DMA window at 4GB
@ 2020-02-18  7:36 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-18  7:36 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Alex Williamson, kvm-ppc,
	David Gibson


Here is an attempt to support bigger DMA space for devices
supporting DMA masks less than 59 bits (GPUs come into mind
first). POWER9 PHBs have an option to map 2 windows at 0
and select a windows based on DMA address being below or above
4GB.

This adds the "iommu=iommu_bypass" kernel parameter and
supports VFIO+pseries machine - current this requires telling
upstream+unmodified QEMU about this via
-global spapr-pci-host-bridge.dma64_win_addr=0x100000000
or per-phb property. 4/4 advertises the new option but
there is no automation around it in QEMU (should it be?).

For now it is either 1<<59 or 4GB mode; dynamic switching is
not supported (could be via sysfs).

This is a rebased version of
https://lore.kernel.org/kvm/20191202015953.127902-1-aik@ozlabs.ru/

This is based on sha1
71c3a888cbca Linus Torvalds "Merge tag 'powerpc-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux".

Please comment. Thanks.



Alexey Kardashevskiy (5):
  powerpc/powernv/ioda: Move TCE bypass base to PE
  powerpc/powernv/ioda: Rework for huge DMA window at 4GB
  powerpc/powernv/ioda: Allow smaller TCE table levels
  powerpc/powernv/phb4: Add 4GB IOMMU bypass mode
  vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB

 arch/powerpc/include/asm/iommu.h              |   2 +
 arch/powerpc/include/asm/opal-api.h           |   9 +-
 arch/powerpc/include/asm/opal.h               |   2 +
 arch/powerpc/platforms/powernv/pci.h          |   2 +-
 include/uapi/linux/vfio.h                     |   2 +
 arch/powerpc/platforms/powernv/npu-dma.c      |   1 +
 arch/powerpc/platforms/powernv/opal-call.c    |   2 +
 arch/powerpc/platforms/powernv/pci-ioda-tce.c |   4 +-
 arch/powerpc/platforms/powernv/pci-ioda.c     | 229 ++++++++++++++----
 drivers/vfio/vfio_iommu_spapr_tce.c           |  10 +-
 10 files changed, 207 insertions(+), 56 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH kernel 1/5] powerpc/powernv/ioda: Move TCE bypass base to PE
  2020-02-18  7:36 ` Alexey Kardashevskiy
@ 2020-02-18  7:36   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 16+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-18  7:36 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Alex Williamson, kvm-ppc,
	David Gibson

We are about to allow another location for the second DMA window and
we will need to advertise it outside of the powernv platform code.

This moves bypass base address to iommu_table_group so drivers such as
VFIO SPAPR TCE can see it.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/iommu.h          |  1 +
 arch/powerpc/platforms/powernv/pci.h      |  1 -
 arch/powerpc/platforms/powernv/npu-dma.c  |  1 +
 arch/powerpc/platforms/powernv/pci-ioda.c | 26 ++++++++++-------------
 4 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 350101e11ddb..479439ef003e 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -194,6 +194,7 @@ struct iommu_table_group {
 	__u64 pgsizes; /* Bitmap of supported page sizes */
 	__u32 max_dynamic_windows_supported;
 	__u32 max_levels;
+	__u64 tce64_start;
 
 	struct iommu_group *group;
 	struct iommu_table *tables[IOMMU_TABLE_GROUP_MAX_TABLES];
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index d3bbdeab3a32..a808dd396522 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -67,7 +67,6 @@ struct pnv_ioda_pe {
 
 	/* 64-bit TCE bypass region */
 	bool			tce_bypass_enabled;
-	uint64_t		tce_bypass_base;
 
 	/* MSIs. MVE index is identical for for 32 and 64 bit MSI
 	 * and -1 if not supported. (It's actually identical to the
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index b95b9e3c4c98..97a479848003 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -469,6 +469,7 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
 	table_group->tce32_start = pe->table_group.tce32_start;
 	table_group->tce32_size = pe->table_group.tce32_size;
 	table_group->max_levels = pe->table_group.max_levels;
+	table_group->tce64_start = pe->table_group.tce64_start;
 	if (!table_group->pgsizes)
 		table_group->pgsizes = pe->table_group.pgsizes;
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 22c22cd7bd82..52db10ab4fef 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1777,7 +1777,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pci_dev *pdev)
 
 	pe = &phb->ioda.pe_array[pdn->pe_number];
 	WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
-	pdev->dev.archdata.dma_offset = pe->tce_bypass_base;
+	pdev->dev.archdata.dma_offset = pe->table_group.tce64_start;
 	set_iommu_table_base(&pdev->dev, pe->table_group.tables[0]);
 	/*
 	 * Note: iommu_add_device() will fail here as
@@ -1869,7 +1869,8 @@ static bool pnv_pci_ioda_iommu_bypass_supported(struct pci_dev *pdev,
 
 	pe = &phb->ioda.pe_array[pdn->pe_number];
 	if (pe->tce_bypass_enabled) {
-		u64 top = pe->tce_bypass_base + memblock_end_of_DRAM() - 1;
+		u64 top = pe->table_group.tce64_start +
+			memblock_end_of_DRAM() - 1;
 		if (dma_mask >= top)
 			return true;
 	}
@@ -1903,7 +1904,7 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus)
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		set_iommu_table_base(&dev->dev, pe->table_group.tables[0]);
-		dev->dev.archdata.dma_offset = pe->tce_bypass_base;
+		dev->dev.archdata.dma_offset = pe->table_group.tce64_start;
 
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
 			pnv_ioda_setup_bus_dma(pe, dev->subordinate);
@@ -2361,16 +2362,12 @@ static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
 
 		top = roundup_pow_of_two(top);
 		rc = opal_pci_map_pe_dma_window_real(pe->phb->opal_id,
-						     pe->pe_number,
-						     window_id,
-						     pe->tce_bypass_base,
-						     top);
+				pe->pe_number, window_id,
+				pe->table_group.tce64_start, top);
 	} else {
 		rc = opal_pci_map_pe_dma_window_real(pe->phb->opal_id,
-						     pe->pe_number,
-						     window_id,
-						     pe->tce_bypass_base,
-						     0);
+				pe->pe_number, window_id,
+				pe->table_group.tce64_start, 0);
 	}
 	if (rc)
 		pe_err(pe, "OPAL error %lld configuring bypass window\n", rc);
@@ -2385,7 +2382,8 @@ static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
 	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
 			table_group);
 	int nid = pe->phb->hose->node;
-	__u64 bus_offset = num ? pe->tce_bypass_base : table_group->tce32_start;
+	__u64 bus_offset = num ?
+		pe->table_group.tce64_start : table_group->tce32_start;
 	long ret;
 	struct iommu_table *tbl;
 
@@ -2735,9 +2733,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 	if (!pnv_pci_ioda_pe_dma_weight(pe))
 		return;
 
-	/* TVE #1 is selected by PCI address bit 59 */
-	pe->tce_bypass_base = 1ull << 59;
-
 	/* The PE will reserve all possible 32-bits space */
 	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
 		phb->ioda.m32_pci_base);
@@ -2745,6 +2740,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 	/* Setup linux iommu table */
 	pe->table_group.tce32_start = 0;
 	pe->table_group.tce32_size = phb->ioda.m32_pci_base;
+	pe->table_group.tce64_start = 1UL << 59;
 	pe->table_group.max_dynamic_windows_supported =
 			IOMMU_TABLE_GROUP_MAX_TABLES;
 	pe->table_group.max_levels = POWERNV_IOMMU_MAX_LEVELS;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH kernel 1/5] powerpc/powernv/ioda: Move TCE bypass base to PE
@ 2020-02-18  7:36   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-18  7:36 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Alex Williamson, kvm-ppc,
	David Gibson

We are about to allow another location for the second DMA window and
we will need to advertise it outside of the powernv platform code.

This moves bypass base address to iommu_table_group so drivers such as
VFIO SPAPR TCE can see it.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/iommu.h          |  1 +
 arch/powerpc/platforms/powernv/pci.h      |  1 -
 arch/powerpc/platforms/powernv/npu-dma.c  |  1 +
 arch/powerpc/platforms/powernv/pci-ioda.c | 26 ++++++++++-------------
 4 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 350101e11ddb..479439ef003e 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -194,6 +194,7 @@ struct iommu_table_group {
 	__u64 pgsizes; /* Bitmap of supported page sizes */
 	__u32 max_dynamic_windows_supported;
 	__u32 max_levels;
+	__u64 tce64_start;
 
 	struct iommu_group *group;
 	struct iommu_table *tables[IOMMU_TABLE_GROUP_MAX_TABLES];
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index d3bbdeab3a32..a808dd396522 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -67,7 +67,6 @@ struct pnv_ioda_pe {
 
 	/* 64-bit TCE bypass region */
 	bool			tce_bypass_enabled;
-	uint64_t		tce_bypass_base;
 
 	/* MSIs. MVE index is identical for for 32 and 64 bit MSI
 	 * and -1 if not supported. (It's actually identical to the
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index b95b9e3c4c98..97a479848003 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -469,6 +469,7 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
 	table_group->tce32_start = pe->table_group.tce32_start;
 	table_group->tce32_size = pe->table_group.tce32_size;
 	table_group->max_levels = pe->table_group.max_levels;
+	table_group->tce64_start = pe->table_group.tce64_start;
 	if (!table_group->pgsizes)
 		table_group->pgsizes = pe->table_group.pgsizes;
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 22c22cd7bd82..52db10ab4fef 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1777,7 +1777,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pci_dev *pdev)
 
 	pe = &phb->ioda.pe_array[pdn->pe_number];
 	WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
-	pdev->dev.archdata.dma_offset = pe->tce_bypass_base;
+	pdev->dev.archdata.dma_offset = pe->table_group.tce64_start;
 	set_iommu_table_base(&pdev->dev, pe->table_group.tables[0]);
 	/*
 	 * Note: iommu_add_device() will fail here as
@@ -1869,7 +1869,8 @@ static bool pnv_pci_ioda_iommu_bypass_supported(struct pci_dev *pdev,
 
 	pe = &phb->ioda.pe_array[pdn->pe_number];
 	if (pe->tce_bypass_enabled) {
-		u64 top = pe->tce_bypass_base + memblock_end_of_DRAM() - 1;
+		u64 top = pe->table_group.tce64_start +
+			memblock_end_of_DRAM() - 1;
 		if (dma_mask >= top)
 			return true;
 	}
@@ -1903,7 +1904,7 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus)
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		set_iommu_table_base(&dev->dev, pe->table_group.tables[0]);
-		dev->dev.archdata.dma_offset = pe->tce_bypass_base;
+		dev->dev.archdata.dma_offset = pe->table_group.tce64_start;
 
 		if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
 			pnv_ioda_setup_bus_dma(pe, dev->subordinate);
@@ -2361,16 +2362,12 @@ static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
 
 		top = roundup_pow_of_two(top);
 		rc = opal_pci_map_pe_dma_window_real(pe->phb->opal_id,
-						     pe->pe_number,
-						     window_id,
-						     pe->tce_bypass_base,
-						     top);
+				pe->pe_number, window_id,
+				pe->table_group.tce64_start, top);
 	} else {
 		rc = opal_pci_map_pe_dma_window_real(pe->phb->opal_id,
-						     pe->pe_number,
-						     window_id,
-						     pe->tce_bypass_base,
-						     0);
+				pe->pe_number, window_id,
+				pe->table_group.tce64_start, 0);
 	}
 	if (rc)
 		pe_err(pe, "OPAL error %lld configuring bypass window\n", rc);
@@ -2385,7 +2382,8 @@ static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
 	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
 			table_group);
 	int nid = pe->phb->hose->node;
-	__u64 bus_offset = num ? pe->tce_bypass_base : table_group->tce32_start;
+	__u64 bus_offset = num ?
+		pe->table_group.tce64_start : table_group->tce32_start;
 	long ret;
 	struct iommu_table *tbl;
 
@@ -2735,9 +2733,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 	if (!pnv_pci_ioda_pe_dma_weight(pe))
 		return;
 
-	/* TVE #1 is selected by PCI address bit 59 */
-	pe->tce_bypass_base = 1ull << 59;
-
 	/* The PE will reserve all possible 32-bits space */
 	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
 		phb->ioda.m32_pci_base);
@@ -2745,6 +2740,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 	/* Setup linux iommu table */
 	pe->table_group.tce32_start = 0;
 	pe->table_group.tce32_size = phb->ioda.m32_pci_base;
+	pe->table_group.tce64_start = 1UL << 59;
 	pe->table_group.max_dynamic_windows_supported  			IOMMU_TABLE_GROUP_MAX_TABLES;
 	pe->table_group.max_levels = POWERNV_IOMMU_MAX_LEVELS;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH kernel 2/5] powerpc/powernv/ioda: Rework for huge DMA window at 4GB
  2020-02-18  7:36 ` Alexey Kardashevskiy
@ 2020-02-18  7:36   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 16+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-18  7:36 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Alex Williamson, kvm-ppc,
	David Gibson

This moves code to make the next patches look simpler. In particular:

1. Separate locals declaration as we will be allocating a smaller DMA
window if a TVE1_4GB option (allows a huge DMA windows at 4GB) is enabled;

2. Pass the bypass offset directly to pnv_pci_ioda2_create_table()
as it is the only information needed from @pe;

3. Use PAGE_SHIFT for it_map allocation estimate and @tceshift for
the IOMMU page size; this makes the distinction clear and allows
easy switching between different IOMMU page size.

These changes should not cause behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

I really need 1), 2) makes the code less dependent on the PE struct member
value (==easier to follow), 3) is to enable 2MB quickly for the default
DMA window for debugging/performance testing.
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 38 ++++++++++++-----------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 52db10ab4fef..f5f1b4e25530 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2375,15 +2375,10 @@ static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
 		pe->tce_bypass_enabled = enable;
 }
 
-static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
-		int num, __u32 page_shift, __u64 window_size, __u32 levels,
+static long pnv_pci_ioda2_create_table(int nid, int num, __u64 bus_offset,
+		__u32 page_shift, __u64 window_size, __u32 levels,
 		bool alloc_userspace_copy, struct iommu_table **ptbl)
 {
-	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
-			table_group);
-	int nid = pe->phb->hose->node;
-	__u64 bus_offset = num ?
-		pe->table_group.tce64_start : table_group->tce32_start;
 	long ret;
 	struct iommu_table *tbl;
 
@@ -2410,21 +2405,23 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 {
 	struct iommu_table *tbl = NULL;
 	long rc;
-	unsigned long res_start, res_end;
+	u64 max_memory, maxblock, window_size;
+	const unsigned int tceshift = PAGE_SHIFT;
+	unsigned long res_start, res_end, tces_order, tcelevel_order, levels;
 
 	/*
 	 * crashkernel= specifies the kdump kernel's maximum memory at
 	 * some offset and there is no guaranteed the result is a power
 	 * of 2, which will cause errors later.
 	 */
-	const u64 max_memory = __rounddown_pow_of_two(memory_hotplug_max());
+	max_memory = __rounddown_pow_of_two(memory_hotplug_max());
 
 	/*
 	 * In memory constrained environments, e.g. kdump kernel, the
 	 * DMA window can be larger than available memory, which will
 	 * cause errors later.
 	 */
-	const u64 maxblock = 1UL << (PAGE_SHIFT + MAX_ORDER - 1);
+	maxblock = 1UL << (PAGE_SHIFT + MAX_ORDER - 1);
 
 	/*
 	 * We create the default window as big as we can. The constraint is
@@ -2434,11 +2431,11 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 	 * to support crippled devices (i.e. not fully 64bit DMAble) only.
 	 */
 	/* iommu_table::it_map uses 1 bit per IOMMU page, hence 8 */
-	const u64 window_size = min((maxblock * 8) << PAGE_SHIFT, max_memory);
+	window_size = min((maxblock * 8) << tceshift, max_memory);
 	/* Each TCE level cannot exceed maxblock so go multilevel if needed */
-	unsigned long tces_order = ilog2(window_size >> PAGE_SHIFT);
-	unsigned long tcelevel_order = ilog2(maxblock >> 3);
-	unsigned int levels = tces_order / tcelevel_order;
+	tces_order = ilog2(window_size >> tceshift);
+	tcelevel_order = ilog2(maxblock >> 3);
+	levels = tces_order / tcelevel_order;
 
 	if (tces_order % tcelevel_order)
 		levels += 1;
@@ -2448,8 +2445,8 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 	 */
 	levels = max_t(unsigned int, levels, POWERNV_IOMMU_DEFAULT_LEVELS);
 
-	rc = pnv_pci_ioda2_create_table(&pe->table_group, 0, PAGE_SHIFT,
-			window_size, levels, false, &tbl);
+	rc = pnv_pci_ioda2_create_table(pe->phb->hose->node,
+			0, 0, tceshift, window_size, levels, false, &tbl);
 	if (rc) {
 		pe_err(pe, "Failed to create 32-bit TCE table, err %ld",
 				rc);
@@ -2551,8 +2548,13 @@ static long pnv_pci_ioda2_create_table_userspace(
 		int num, __u32 page_shift, __u64 window_size, __u32 levels,
 		struct iommu_table **ptbl)
 {
-	long ret = pnv_pci_ioda2_create_table(table_group,
-			num, page_shift, window_size, levels, true, ptbl);
+	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
+			table_group);
+	__u64 bus_offset = num ?
+		pe->table_group.tce64_start : table_group->tce32_start;
+	long ret = pnv_pci_ioda2_create_table(pe->phb->hose->node,
+			num, bus_offset, page_shift, window_size, levels, true,
+			ptbl);
 
 	if (!ret)
 		(*ptbl)->it_allocated_size = pnv_pci_ioda2_get_table_size(
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH kernel 2/5] powerpc/powernv/ioda: Rework for huge DMA window at 4GB
@ 2020-02-18  7:36   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-18  7:36 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Alex Williamson, kvm-ppc,
	David Gibson

This moves code to make the next patches look simpler. In particular:

1. Separate locals declaration as we will be allocating a smaller DMA
window if a TVE1_4GB option (allows a huge DMA windows at 4GB) is enabled;

2. Pass the bypass offset directly to pnv_pci_ioda2_create_table()
as it is the only information needed from @pe;

3. Use PAGE_SHIFT for it_map allocation estimate and @tceshift for
the IOMMU page size; this makes the distinction clear and allows
easy switching between different IOMMU page size.

These changes should not cause behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

I really need 1), 2) makes the code less dependent on the PE struct member
value (=easier to follow), 3) is to enable 2MB quickly for the default
DMA window for debugging/performance testing.
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 38 ++++++++++++-----------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 52db10ab4fef..f5f1b4e25530 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2375,15 +2375,10 @@ static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
 		pe->tce_bypass_enabled = enable;
 }
 
-static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
-		int num, __u32 page_shift, __u64 window_size, __u32 levels,
+static long pnv_pci_ioda2_create_table(int nid, int num, __u64 bus_offset,
+		__u32 page_shift, __u64 window_size, __u32 levels,
 		bool alloc_userspace_copy, struct iommu_table **ptbl)
 {
-	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
-			table_group);
-	int nid = pe->phb->hose->node;
-	__u64 bus_offset = num ?
-		pe->table_group.tce64_start : table_group->tce32_start;
 	long ret;
 	struct iommu_table *tbl;
 
@@ -2410,21 +2405,23 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 {
 	struct iommu_table *tbl = NULL;
 	long rc;
-	unsigned long res_start, res_end;
+	u64 max_memory, maxblock, window_size;
+	const unsigned int tceshift = PAGE_SHIFT;
+	unsigned long res_start, res_end, tces_order, tcelevel_order, levels;
 
 	/*
 	 * crashkernel= specifies the kdump kernel's maximum memory at
 	 * some offset and there is no guaranteed the result is a power
 	 * of 2, which will cause errors later.
 	 */
-	const u64 max_memory = __rounddown_pow_of_two(memory_hotplug_max());
+	max_memory = __rounddown_pow_of_two(memory_hotplug_max());
 
 	/*
 	 * In memory constrained environments, e.g. kdump kernel, the
 	 * DMA window can be larger than available memory, which will
 	 * cause errors later.
 	 */
-	const u64 maxblock = 1UL << (PAGE_SHIFT + MAX_ORDER - 1);
+	maxblock = 1UL << (PAGE_SHIFT + MAX_ORDER - 1);
 
 	/*
 	 * We create the default window as big as we can. The constraint is
@@ -2434,11 +2431,11 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 	 * to support crippled devices (i.e. not fully 64bit DMAble) only.
 	 */
 	/* iommu_table::it_map uses 1 bit per IOMMU page, hence 8 */
-	const u64 window_size = min((maxblock * 8) << PAGE_SHIFT, max_memory);
+	window_size = min((maxblock * 8) << tceshift, max_memory);
 	/* Each TCE level cannot exceed maxblock so go multilevel if needed */
-	unsigned long tces_order = ilog2(window_size >> PAGE_SHIFT);
-	unsigned long tcelevel_order = ilog2(maxblock >> 3);
-	unsigned int levels = tces_order / tcelevel_order;
+	tces_order = ilog2(window_size >> tceshift);
+	tcelevel_order = ilog2(maxblock >> 3);
+	levels = tces_order / tcelevel_order;
 
 	if (tces_order % tcelevel_order)
 		levels += 1;
@@ -2448,8 +2445,8 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 	 */
 	levels = max_t(unsigned int, levels, POWERNV_IOMMU_DEFAULT_LEVELS);
 
-	rc = pnv_pci_ioda2_create_table(&pe->table_group, 0, PAGE_SHIFT,
-			window_size, levels, false, &tbl);
+	rc = pnv_pci_ioda2_create_table(pe->phb->hose->node,
+			0, 0, tceshift, window_size, levels, false, &tbl);
 	if (rc) {
 		pe_err(pe, "Failed to create 32-bit TCE table, err %ld",
 				rc);
@@ -2551,8 +2548,13 @@ static long pnv_pci_ioda2_create_table_userspace(
 		int num, __u32 page_shift, __u64 window_size, __u32 levels,
 		struct iommu_table **ptbl)
 {
-	long ret = pnv_pci_ioda2_create_table(table_group,
-			num, page_shift, window_size, levels, true, ptbl);
+	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
+			table_group);
+	__u64 bus_offset = num ?
+		pe->table_group.tce64_start : table_group->tce32_start;
+	long ret = pnv_pci_ioda2_create_table(pe->phb->hose->node,
+			num, bus_offset, page_shift, window_size, levels, true,
+			ptbl);
 
 	if (!ret)
 		(*ptbl)->it_allocated_size = pnv_pci_ioda2_get_table_size(
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH kernel 3/5] powerpc/powernv/ioda: Allow smaller TCE table levels
  2020-02-18  7:36 ` Alexey Kardashevskiy
@ 2020-02-18  7:36   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 16+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-18  7:36 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Alex Williamson, kvm-ppc,
	David Gibson

Now the minimum allocation size for a TCE table level is PAGE_SIZE (64k)
as this is the minimum for alloc_pages(). The limit was set in POWER8
where we did not have sparse RAM so we did not need sparse TCE tables.
On POWER9 we have gaps in the phys address space for which using multi
level TCE tables makes sense. The problem with that is that 64K per level
is too much for 2 levels and 1GB pages as it exceeds the hardware limit
of 55bits so we need smaller levels.

This drops the minimum level size to 4K.

For a machine with 2 CPUs, top RAM address 0x4000.0000.0000
(each node gets 32TiB) and 1GiB IOMMU pages:
Before the patch: 512KiB or 8 pages.
After the patch: 3 pages: one level1 + 2xlevel2 tables, each can map
up to 64k>>3<<30 = 8TiB of physical space.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index 5dc6847d5f4c..82e680da9d94 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -37,7 +37,7 @@ static __be64 *pnv_alloc_tce_level(int nid, unsigned int shift)
 	__be64 *addr;
 
 	tce_mem = alloc_pages_node(nid, GFP_ATOMIC | __GFP_NOWARN,
-			shift - PAGE_SHIFT);
+			shift > PAGE_SHIFT ? shift - PAGE_SHIFT : 0);
 	if (!tce_mem) {
 		pr_err("Failed to allocate a TCE memory, level shift=%d\n",
 				shift);
@@ -282,7 +282,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 	/* Adjust direct table size from window_size and levels */
 	entries_shift = (entries_shift + levels - 1) / levels;
 	level_shift = entries_shift + 3;
-	level_shift = max_t(unsigned int, level_shift, PAGE_SHIFT);
+	level_shift = max_t(unsigned int, level_shift, 12); /* 4K is minimum */
 
 	if ((level_shift - 3) * levels + page_shift >= 55)
 		return -EINVAL;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH kernel 3/5] powerpc/powernv/ioda: Allow smaller TCE table levels
@ 2020-02-18  7:36   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-18  7:36 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Alex Williamson, kvm-ppc,
	David Gibson

Now the minimum allocation size for a TCE table level is PAGE_SIZE (64k)
as this is the minimum for alloc_pages(). The limit was set in POWER8
where we did not have sparse RAM so we did not need sparse TCE tables.
On POWER9 we have gaps in the phys address space for which using multi
level TCE tables makes sense. The problem with that is that 64K per level
is too much for 2 levels and 1GB pages as it exceeds the hardware limit
of 55bits so we need smaller levels.

This drops the minimum level size to 4K.

For a machine with 2 CPUs, top RAM address 0x4000.0000.0000
(each node gets 32TiB) and 1GiB IOMMU pages:
Before the patch: 512KiB or 8 pages.
After the patch: 3 pages: one level1 + 2xlevel2 tables, each can map
up to 64k>>3<<30 = 8TiB of physical space.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index 5dc6847d5f4c..82e680da9d94 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -37,7 +37,7 @@ static __be64 *pnv_alloc_tce_level(int nid, unsigned int shift)
 	__be64 *addr;
 
 	tce_mem = alloc_pages_node(nid, GFP_ATOMIC | __GFP_NOWARN,
-			shift - PAGE_SHIFT);
+			shift > PAGE_SHIFT ? shift - PAGE_SHIFT : 0);
 	if (!tce_mem) {
 		pr_err("Failed to allocate a TCE memory, level shift=%d\n",
 				shift);
@@ -282,7 +282,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 	/* Adjust direct table size from window_size and levels */
 	entries_shift = (entries_shift + levels - 1) / levels;
 	level_shift = entries_shift + 3;
-	level_shift = max_t(unsigned int, level_shift, PAGE_SHIFT);
+	level_shift = max_t(unsigned int, level_shift, 12); /* 4K is minimum */
 
 	if ((level_shift - 3) * levels + page_shift >= 55)
 		return -EINVAL;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH kernel 4/5] powerpc/powernv/phb4: Add 4GB IOMMU bypass mode
  2020-02-18  7:36 ` Alexey Kardashevskiy
@ 2020-02-18  7:36   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 16+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-18  7:36 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Alex Williamson, kvm-ppc,
	David Gibson

IODA2 systems (POWER8/9) allow DMA windows at 2 fixed locations - 0 and
0x800.0000.0000.0000==1<<59, stored in TVT as TVE0/1. PHB4 on POWER9 has
an additional PHB mode to allow mapping both windows at 0 and selecting
one based on IOBA address - accesses below 4GB go via TVE0 and above
4GB - via TVE1; it is called TVT Select 'GTE4GB' Option in PHB4.

This adds a new value for the "iommu" kernel parameter - "iommu_bypass".
When enabled this programs PHB to enable the TVE1_4GB mode and uses TVE1
to map the entire host RAM using the largest available IOMMU pages
- 1GB for PHB and 256MB for NPU-PHB effectively.

This creates huge DMA windows at 4GB for VFIO; new iommu_table::it_dmaoff
flag tells the VFIO IOMMU driver about the bus offset it should be using.
Since the actual second table still covers the first 4GB of the bus space
iommu_table::it_offset is set to 0. The next patch will explicitly
advertise new offset to the userspace.

When TVE1_4GB enabled, this creates a small (2GB typically) 32 bit window
as there is no need to cover as much of lower DMA space - the 4GB+ window
does it better anyway.

This keeps the existing behavior by default as the TVE1_4GB flag is set
per PHB by device assignment is done on PE basis and managing both modes
dynamically might get nasty.

This relies on skiboot supporting the new OPAL_PHB_SET_OPTION OPAL call.

Suggested-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/iommu.h           |   1 +
 arch/powerpc/include/asm/opal-api.h        |   9 +-
 arch/powerpc/include/asm/opal.h            |   2 +
 arch/powerpc/platforms/powernv/pci.h       |   1 +
 arch/powerpc/platforms/powernv/opal-call.c |   2 +
 arch/powerpc/platforms/powernv/pci-ioda.c  | 187 ++++++++++++++++++---
 6 files changed, 173 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 479439ef003e..e0ca82d31253 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -97,6 +97,7 @@ struct iommu_table {
 	unsigned long  it_level_size;
 	unsigned long  it_allocated_size;
 	unsigned long  it_offset;    /* Offset into global table */
+	unsigned long  it_dmaoff;
 	unsigned long  it_base;      /* mapped address of tce table */
 	unsigned long  it_index;     /* which iommu table this is */
 	unsigned long  it_type;      /* type: PCI or Virtual Bus */
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index c1f25a760eb1..7873754f5ea6 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -214,7 +214,9 @@
 #define OPAL_SECVAR_GET				176
 #define OPAL_SECVAR_GET_NEXT			177
 #define OPAL_SECVAR_ENQUEUE_UPDATE		178
-#define OPAL_LAST				178
+#define OPAL_PHB_SET_OPTION			179
+#define OPAL_PHB_GET_OPTION			180
+#define OPAL_LAST				180
 
 #define QUIESCE_HOLD			1 /* Spin all calls at entry */
 #define QUIESCE_REJECT			2 /* Fail all calls with OPAL_BUSY */
@@ -437,6 +439,11 @@ enum OpalSlotLedState {
 	OPAL_SLOT_LED_STATE_ON = 1	/* LED is ON */
 };
 
+enum OpalPhbOption {
+	OPAL_PHB_OPTION_TVE1_4GB = 0x1,
+	OPAL_PHB_OPTION_MMIO_EEH_DISABLE = 0x2,
+};
+
 /*
  * Address cycle types for LPC accesses. These also correspond
  * to the content of the first cell of the "reg" property for
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9986ac34b8e2..89b712288cdd 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -142,6 +142,8 @@ int64_t opal_pci_map_pe_dma_window(uint64_t phb_id, uint16_t pe_number, uint16_t
 int64_t opal_pci_map_pe_dma_window_real(uint64_t phb_id, uint16_t pe_number,
 					uint16_t dma_window_number, uint64_t pci_start_addr,
 					uint64_t pci_mem_size);
+int64_t opal_phb_set_option(uint64_t phb_id, uint64_t opt, uint64_t setting);
+int64_t opal_phb_get_option(uint64_t phb_id, uint64_t opt, uint64_t *setting);
 int64_t opal_pci_reset(uint64_t id, uint8_t reset_scope, uint8_t assert_state);
 
 int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer,
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index a808dd396522..ce00278185b0 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -100,6 +100,7 @@ struct pnv_phb {
 	int			has_dbgfs;
 	struct dentry		*dbgfs;
 #endif
+	struct iommu_table	*bypass_tbl; /* PNV_IOMMU_TCE_BYPASS only */
 
 	unsigned int		msi_base;
 	unsigned int		msi32_support;
diff --git a/arch/powerpc/platforms/powernv/opal-call.c b/arch/powerpc/platforms/powernv/opal-call.c
index 5cd0f52d258f..3130d5a41570 100644
--- a/arch/powerpc/platforms/powernv/opal-call.c
+++ b/arch/powerpc/platforms/powernv/opal-call.c
@@ -293,3 +293,5 @@ OPAL_CALL(opal_mpipl_query_tag,			OPAL_MPIPL_QUERY_TAG);
 OPAL_CALL(opal_secvar_get,			OPAL_SECVAR_GET);
 OPAL_CALL(opal_secvar_get_next,			OPAL_SECVAR_GET_NEXT);
 OPAL_CALL(opal_secvar_enqueue_update,		OPAL_SECVAR_ENQUEUE_UPDATE);
+OPAL_CALL(opal_phb_set_option,			OPAL_PHB_SET_OPTION);
+OPAL_CALL(opal_phb_get_option,			OPAL_PHB_GET_OPTION);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f5f1b4e25530..74b4ac8cb71d 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -51,6 +51,10 @@ static const char * const pnv_phb_names[] = { "IODA1", "IODA2", "NPU_NVLINK",
 					      "NPU_OCAPI" };
 
 static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
+static unsigned long pnv_ioda_parse_tce_sizes(struct pnv_phb *phb);
+static long pnv_pci_ioda2_create_table(int nid, int num, __u64 bus_offset,
+		__u32 page_shift, __u64 window_size, __u32 levels,
+		bool alloc_userspace_copy, struct iommu_table **ptbl);
 
 void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
 			    const char *fmt, ...)
@@ -83,7 +87,14 @@ void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
 	va_end(args);
 }
 
-static bool pnv_iommu_bypass_disabled __read_mostly;
+enum pnv_iommu_bypass_mode {
+	PNV_IOMMU_NO_TRANSLATE,
+	PNV_IOMMU_BYPASS_DISABLED,
+	PNV_IOMMU_TCE_BYPASS
+};
+
+static enum pnv_iommu_bypass_mode pnv_iommu_bypass_mode __read_mostly =
+		PNV_IOMMU_NO_TRANSLATE;
 static bool pci_reset_phbs __read_mostly;
 
 static int __init iommu_setup(char *str)
@@ -93,9 +104,13 @@ static int __init iommu_setup(char *str)
 
 	while (*str) {
 		if (!strncmp(str, "nobypass", 8)) {
-			pnv_iommu_bypass_disabled = true;
+			pnv_iommu_bypass_mode = PNV_IOMMU_BYPASS_DISABLED;
 			pr_info("PowerNV: IOMMU bypass window disabled.\n");
 			break;
+		} else if (!strncmp(str, "iommu_bypass", 12)) {
+			pnv_iommu_bypass_mode = PNV_IOMMU_TCE_BYPASS;
+			pr_info("PowerNV: IOMMU TCE bypass window selected.\n");
+			break;
 		}
 		str += strcspn(str, ",");
 		if (*str == ',')
@@ -2351,28 +2366,125 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
 	return 0;
 }
 
+static long pnv_pci_ioda2_set_bypass_iommu(struct pnv_ioda_pe *pe,
+		unsigned long offset)
+{
+	struct pnv_phb *phb = pe->phb;
+	long rc;
+	struct memblock_region *r;
+	unsigned long i, pgsizes, tceoff;
+
+
+	pgsizes = pnv_ioda_parse_tce_sizes(phb);
+	/* Filter sizes to have round number of TCEs to cover 0..offset */
+	pgsizes &= offset | (offset - 1);
+	if (!pgsizes)
+		return -1;
+
+	if (!phb->bypass_tbl) {
+		struct iommu_table *tbl = NULL;
+
+		rc = pnv_pci_ioda2_create_table(phb->hose->node,
+				1 /* window number */,
+				0 /* bus offset */,
+				__fls(pgsizes),
+				roundup_pow_of_two(memory_hotplug_max() +
+					offset),
+				2 /* levels */,
+				false /* userspace cache */,
+				&tbl);
+		if (rc)
+			return -1;
+
+		tceoff = offset >> tbl->it_page_shift;
+
+		for_each_memblock(memory, r) {
+			phys_addr_t start, end;
+
+			start = r->base >> tbl->it_page_shift;
+			end = start + (r->size >> tbl->it_page_shift);
+
+			for (i = start; i < end; ++i)
+				tbl->it_ops->set(tbl, i + tceoff, 1,
+						i << tbl->it_page_shift,
+						DMA_BIDIRECTIONAL, 0);
+		}
+		phb->bypass_tbl = tbl;
+		pe_err(pe, "Created 64-bit bypass TCE table at 0x%lx\n",
+			phb->bypass_tbl->it_base);
+	} else {
+		iommu_tce_table_get(phb->bypass_tbl);
+	}
+
+	rc = pnv_pci_ioda2_set_window(&pe->table_group, 1, phb->bypass_tbl);
+	if (rc) {
+		iommu_tce_table_put(phb->bypass_tbl);
+		return -1;
+	}
+
+	pe->tce_bypass_enabled = true;
+
+	return 0;
+}
+
 static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
 {
+	struct pnv_phb *phb = pe->phb;
 	uint16_t window_id = (pe->pe_number << 1 ) + 1;
 	int64_t rc;
+	phys_addr_t top;
 
-	pe_info(pe, "%sabling 64-bit DMA bypass\n", enable ? "En" : "Dis");
-	if (enable) {
-		phys_addr_t top = memblock_end_of_DRAM();
+	if (!enable) {
+		pe_info(pe, "Disabling 64-bit bypass\n");
+		/*
+		 * Whether it was bypass or iommu bypass, this will write
+		 * a zero to TVE and disable the window;
+		 * opal_pci_map_pe_dma_window() does the same thing when
+		 * tce_table_size==0.
+		 */
+		rc = opal_pci_map_pe_dma_window_real(phb->opal_id,
+				pe->pe_number, window_id, 0, 0);
+		if (rc)
+			pe_err(pe, "OPAL error %lld configuring bypass window\n",
+				rc);
 
-		top = roundup_pow_of_two(top);
-		rc = opal_pci_map_pe_dma_window_real(pe->phb->opal_id,
-				pe->pe_number, window_id,
-				pe->table_group.tce64_start, top);
-	} else {
-		rc = opal_pci_map_pe_dma_window_real(pe->phb->opal_id,
-				pe->pe_number, window_id,
-				pe->table_group.tce64_start, 0);
+		pe->tce_bypass_enabled = false;
+		return;
+	}
+
+	if (pnv_iommu_bypass_mode == PNV_IOMMU_TCE_BYPASS) {
+		if (!opal_phb_set_option(phb->opal_id,
+					OPAL_PHB_OPTION_TVE1_4GB, 1)) {
+			pe->table_group.tce64_start = SZ_4G;
+			if (!pnv_pci_ioda2_set_bypass_iommu(pe,
+						pe->table_group.tce64_start)) {
+				pe->tce_bypass_enabled = true;
+				pe_info(pe, "Enabled 64-bit IOMMU bypass at %llx\n",
+						pe->table_group.tce64_start);
+				return;
+			}
+			pe_err(pe, "Enabled TVE1_4GB but failed to configure TCE table");
+		} else {
+			/* IOMMU bypass failed, fallback to direct bypass */
+			pnv_iommu_bypass_mode = PNV_IOMMU_NO_TRANSLATE;
+		}
+	}
+
+	if (pnv_iommu_bypass_mode == PNV_IOMMU_NO_TRANSLATE) {
+		/*
+		 * FIXME: if we enable dynamic switch, here we need to disable
+		 * OPAL_PCI_PHB_FLAG_TVE1_4GB
+		 */
+		top = roundup_pow_of_two(memblock_end_of_DRAM());
+		if (!opal_pci_map_pe_dma_window_real(phb->opal_id,
+					pe->pe_number, window_id,
+					pe->table_group.tce64_start, top)) {
+			pe->tce_bypass_enabled = true;
+			pe_info(pe, "Enabled 64-bit direct bypass at %llx\n",
+					pe->table_group.tce64_start);
+			return;
+		}
 	}
-	if (rc)
-		pe_err(pe, "OPAL error %lld configuring bypass window\n", rc);
-	else
-		pe->tce_bypass_enabled = enable;
 }
 
 static long pnv_pci_ioda2_create_table(int nid, int num, __u64 bus_offset,
@@ -2409,6 +2521,9 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 	const unsigned int tceshift = PAGE_SHIFT;
 	unsigned long res_start, res_end, tces_order, tcelevel_order, levels;
 
+	if (pnv_iommu_bypass_mode != PNV_IOMMU_BYPASS_DISABLED)
+		pnv_pci_ioda2_set_bypass(pe, true);
+
 	/*
 	 * crashkernel= specifies the kdump kernel's maximum memory at
 	 * some offset and there is no guaranteed the result is a power
@@ -2432,6 +2547,15 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 	 */
 	/* iommu_table::it_map uses 1 bit per IOMMU page, hence 8 */
 	window_size = min((maxblock * 8) << tceshift, max_memory);
+
+	/*
+	 * If we get TVE#1_4GB on, there is no point in having a huge default
+	 * DMA window.
+	 */
+	if (pnv_iommu_bypass_mode == PNV_IOMMU_TCE_BYPASS)
+		window_size = min_t(u64, pe->table_group.tce32_size,
+				window_size);
+
 	/* Each TCE level cannot exceed maxblock so go multilevel if needed */
 	tces_order = ilog2(window_size >> tceshift);
 	tcelevel_order = ilog2(maxblock >> 3);
@@ -2470,9 +2594,6 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 		return rc;
 	}
 
-	if (!pnv_iommu_bypass_disabled)
-		pnv_pci_ioda2_set_bypass(pe, true);
-
 	/*
 	 * Set table base for the case of IOMMU DMA use. Usually this is done
 	 * from dma_dev_setup() which is not called when a device is returned
@@ -2550,15 +2671,27 @@ static long pnv_pci_ioda2_create_table_userspace(
 {
 	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
 			table_group);
-	__u64 bus_offset = num ?
-		pe->table_group.tce64_start : table_group->tce32_start;
-	long ret = pnv_pci_ioda2_create_table(pe->phb->hose->node,
-			num, bus_offset, page_shift, window_size, levels, true,
-			ptbl);
+	__u64 bus_offset = num ? table_group->tce64_start :
+				 table_group->tce32_start;
+	long ret;
 
-	if (!ret)
+	if (bus_offset == SZ_4G)
+		ret = pnv_pci_ioda2_create_table(pe->phb->hose->node,
+				num, 0, page_shift,
+				roundup_pow_of_two(window_size + bus_offset),
+				levels, true, ptbl);
+	else
+		ret = pnv_pci_ioda2_create_table(pe->phb->hose->node,
+				num, bus_offset, page_shift,
+				window_size,
+				levels, true, ptbl);
+
+	if (!ret) {
 		(*ptbl)->it_allocated_size = pnv_pci_ioda2_get_table_size(
 				page_shift, window_size, levels);
+		(*ptbl)->it_dmaoff = bus_offset >> page_shift;
+	}
+
 	return ret;
 }
 
@@ -2624,8 +2757,6 @@ static void pnv_ioda_setup_bus_iommu_group(struct pnv_ioda_pe *pe,
 				bus);
 }
 
-static unsigned long pnv_ioda_parse_tce_sizes(struct pnv_phb *phb);
-
 static void pnv_pci_ioda_setup_iommu_api(void)
 {
 	struct pci_controller *hose;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH kernel 4/5] powerpc/powernv/phb4: Add 4GB IOMMU bypass mode
@ 2020-02-18  7:36   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-18  7:36 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Alex Williamson, kvm-ppc,
	David Gibson

IODA2 systems (POWER8/9) allow DMA windows at 2 fixed locations - 0 and
0x800.0000.0000.0000=1<<59, stored in TVT as TVE0/1. PHB4 on POWER9 has
an additional PHB mode to allow mapping both windows at 0 and selecting
one based on IOBA address - accesses below 4GB go via TVE0 and above
4GB - via TVE1; it is called TVT Select 'GTE4GB' Option in PHB4.

This adds a new value for the "iommu" kernel parameter - "iommu_bypass".
When enabled this programs PHB to enable the TVE1_4GB mode and uses TVE1
to map the entire host RAM using the largest available IOMMU pages
- 1GB for PHB and 256MB for NPU-PHB effectively.

This creates huge DMA windows at 4GB for VFIO; new iommu_table::it_dmaoff
flag tells the VFIO IOMMU driver about the bus offset it should be using.
Since the actual second table still covers the first 4GB of the bus space
iommu_table::it_offset is set to 0. The next patch will explicitly
advertise new offset to the userspace.

When TVE1_4GB enabled, this creates a small (2GB typically) 32 bit window
as there is no need to cover as much of lower DMA space - the 4GB+ window
does it better anyway.

This keeps the existing behavior by default as the TVE1_4GB flag is set
per PHB by device assignment is done on PE basis and managing both modes
dynamically might get nasty.

This relies on skiboot supporting the new OPAL_PHB_SET_OPTION OPAL call.

Suggested-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/iommu.h           |   1 +
 arch/powerpc/include/asm/opal-api.h        |   9 +-
 arch/powerpc/include/asm/opal.h            |   2 +
 arch/powerpc/platforms/powernv/pci.h       |   1 +
 arch/powerpc/platforms/powernv/opal-call.c |   2 +
 arch/powerpc/platforms/powernv/pci-ioda.c  | 187 ++++++++++++++++++---
 6 files changed, 173 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 479439ef003e..e0ca82d31253 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -97,6 +97,7 @@ struct iommu_table {
 	unsigned long  it_level_size;
 	unsigned long  it_allocated_size;
 	unsigned long  it_offset;    /* Offset into global table */
+	unsigned long  it_dmaoff;
 	unsigned long  it_base;      /* mapped address of tce table */
 	unsigned long  it_index;     /* which iommu table this is */
 	unsigned long  it_type;      /* type: PCI or Virtual Bus */
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index c1f25a760eb1..7873754f5ea6 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -214,7 +214,9 @@
 #define OPAL_SECVAR_GET				176
 #define OPAL_SECVAR_GET_NEXT			177
 #define OPAL_SECVAR_ENQUEUE_UPDATE		178
-#define OPAL_LAST				178
+#define OPAL_PHB_SET_OPTION			179
+#define OPAL_PHB_GET_OPTION			180
+#define OPAL_LAST				180
 
 #define QUIESCE_HOLD			1 /* Spin all calls at entry */
 #define QUIESCE_REJECT			2 /* Fail all calls with OPAL_BUSY */
@@ -437,6 +439,11 @@ enum OpalSlotLedState {
 	OPAL_SLOT_LED_STATE_ON = 1	/* LED is ON */
 };
 
+enum OpalPhbOption {
+	OPAL_PHB_OPTION_TVE1_4GB = 0x1,
+	OPAL_PHB_OPTION_MMIO_EEH_DISABLE = 0x2,
+};
+
 /*
  * Address cycle types for LPC accesses. These also correspond
  * to the content of the first cell of the "reg" property for
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9986ac34b8e2..89b712288cdd 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -142,6 +142,8 @@ int64_t opal_pci_map_pe_dma_window(uint64_t phb_id, uint16_t pe_number, uint16_t
 int64_t opal_pci_map_pe_dma_window_real(uint64_t phb_id, uint16_t pe_number,
 					uint16_t dma_window_number, uint64_t pci_start_addr,
 					uint64_t pci_mem_size);
+int64_t opal_phb_set_option(uint64_t phb_id, uint64_t opt, uint64_t setting);
+int64_t opal_phb_get_option(uint64_t phb_id, uint64_t opt, uint64_t *setting);
 int64_t opal_pci_reset(uint64_t id, uint8_t reset_scope, uint8_t assert_state);
 
 int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer,
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index a808dd396522..ce00278185b0 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -100,6 +100,7 @@ struct pnv_phb {
 	int			has_dbgfs;
 	struct dentry		*dbgfs;
 #endif
+	struct iommu_table	*bypass_tbl; /* PNV_IOMMU_TCE_BYPASS only */
 
 	unsigned int		msi_base;
 	unsigned int		msi32_support;
diff --git a/arch/powerpc/platforms/powernv/opal-call.c b/arch/powerpc/platforms/powernv/opal-call.c
index 5cd0f52d258f..3130d5a41570 100644
--- a/arch/powerpc/platforms/powernv/opal-call.c
+++ b/arch/powerpc/platforms/powernv/opal-call.c
@@ -293,3 +293,5 @@ OPAL_CALL(opal_mpipl_query_tag,			OPAL_MPIPL_QUERY_TAG);
 OPAL_CALL(opal_secvar_get,			OPAL_SECVAR_GET);
 OPAL_CALL(opal_secvar_get_next,			OPAL_SECVAR_GET_NEXT);
 OPAL_CALL(opal_secvar_enqueue_update,		OPAL_SECVAR_ENQUEUE_UPDATE);
+OPAL_CALL(opal_phb_set_option,			OPAL_PHB_SET_OPTION);
+OPAL_CALL(opal_phb_get_option,			OPAL_PHB_GET_OPTION);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f5f1b4e25530..74b4ac8cb71d 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -51,6 +51,10 @@ static const char * const pnv_phb_names[] = { "IODA1", "IODA2", "NPU_NVLINK",
 					      "NPU_OCAPI" };
 
 static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
+static unsigned long pnv_ioda_parse_tce_sizes(struct pnv_phb *phb);
+static long pnv_pci_ioda2_create_table(int nid, int num, __u64 bus_offset,
+		__u32 page_shift, __u64 window_size, __u32 levels,
+		bool alloc_userspace_copy, struct iommu_table **ptbl);
 
 void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
 			    const char *fmt, ...)
@@ -83,7 +87,14 @@ void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
 	va_end(args);
 }
 
-static bool pnv_iommu_bypass_disabled __read_mostly;
+enum pnv_iommu_bypass_mode {
+	PNV_IOMMU_NO_TRANSLATE,
+	PNV_IOMMU_BYPASS_DISABLED,
+	PNV_IOMMU_TCE_BYPASS
+};
+
+static enum pnv_iommu_bypass_mode pnv_iommu_bypass_mode __read_mostly +		PNV_IOMMU_NO_TRANSLATE;
 static bool pci_reset_phbs __read_mostly;
 
 static int __init iommu_setup(char *str)
@@ -93,9 +104,13 @@ static int __init iommu_setup(char *str)
 
 	while (*str) {
 		if (!strncmp(str, "nobypass", 8)) {
-			pnv_iommu_bypass_disabled = true;
+			pnv_iommu_bypass_mode = PNV_IOMMU_BYPASS_DISABLED;
 			pr_info("PowerNV: IOMMU bypass window disabled.\n");
 			break;
+		} else if (!strncmp(str, "iommu_bypass", 12)) {
+			pnv_iommu_bypass_mode = PNV_IOMMU_TCE_BYPASS;
+			pr_info("PowerNV: IOMMU TCE bypass window selected.\n");
+			break;
 		}
 		str += strcspn(str, ",");
 		if (*str = ',')
@@ -2351,28 +2366,125 @@ static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
 	return 0;
 }
 
+static long pnv_pci_ioda2_set_bypass_iommu(struct pnv_ioda_pe *pe,
+		unsigned long offset)
+{
+	struct pnv_phb *phb = pe->phb;
+	long rc;
+	struct memblock_region *r;
+	unsigned long i, pgsizes, tceoff;
+
+
+	pgsizes = pnv_ioda_parse_tce_sizes(phb);
+	/* Filter sizes to have round number of TCEs to cover 0..offset */
+	pgsizes &= offset | (offset - 1);
+	if (!pgsizes)
+		return -1;
+
+	if (!phb->bypass_tbl) {
+		struct iommu_table *tbl = NULL;
+
+		rc = pnv_pci_ioda2_create_table(phb->hose->node,
+				1 /* window number */,
+				0 /* bus offset */,
+				__fls(pgsizes),
+				roundup_pow_of_two(memory_hotplug_max() +
+					offset),
+				2 /* levels */,
+				false /* userspace cache */,
+				&tbl);
+		if (rc)
+			return -1;
+
+		tceoff = offset >> tbl->it_page_shift;
+
+		for_each_memblock(memory, r) {
+			phys_addr_t start, end;
+
+			start = r->base >> tbl->it_page_shift;
+			end = start + (r->size >> tbl->it_page_shift);
+
+			for (i = start; i < end; ++i)
+				tbl->it_ops->set(tbl, i + tceoff, 1,
+						i << tbl->it_page_shift,
+						DMA_BIDIRECTIONAL, 0);
+		}
+		phb->bypass_tbl = tbl;
+		pe_err(pe, "Created 64-bit bypass TCE table at 0x%lx\n",
+			phb->bypass_tbl->it_base);
+	} else {
+		iommu_tce_table_get(phb->bypass_tbl);
+	}
+
+	rc = pnv_pci_ioda2_set_window(&pe->table_group, 1, phb->bypass_tbl);
+	if (rc) {
+		iommu_tce_table_put(phb->bypass_tbl);
+		return -1;
+	}
+
+	pe->tce_bypass_enabled = true;
+
+	return 0;
+}
+
 static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
 {
+	struct pnv_phb *phb = pe->phb;
 	uint16_t window_id = (pe->pe_number << 1 ) + 1;
 	int64_t rc;
+	phys_addr_t top;
 
-	pe_info(pe, "%sabling 64-bit DMA bypass\n", enable ? "En" : "Dis");
-	if (enable) {
-		phys_addr_t top = memblock_end_of_DRAM();
+	if (!enable) {
+		pe_info(pe, "Disabling 64-bit bypass\n");
+		/*
+		 * Whether it was bypass or iommu bypass, this will write
+		 * a zero to TVE and disable the window;
+		 * opal_pci_map_pe_dma_window() does the same thing when
+		 * tce_table_size=0.
+		 */
+		rc = opal_pci_map_pe_dma_window_real(phb->opal_id,
+				pe->pe_number, window_id, 0, 0);
+		if (rc)
+			pe_err(pe, "OPAL error %lld configuring bypass window\n",
+				rc);
 
-		top = roundup_pow_of_two(top);
-		rc = opal_pci_map_pe_dma_window_real(pe->phb->opal_id,
-				pe->pe_number, window_id,
-				pe->table_group.tce64_start, top);
-	} else {
-		rc = opal_pci_map_pe_dma_window_real(pe->phb->opal_id,
-				pe->pe_number, window_id,
-				pe->table_group.tce64_start, 0);
+		pe->tce_bypass_enabled = false;
+		return;
+	}
+
+	if (pnv_iommu_bypass_mode = PNV_IOMMU_TCE_BYPASS) {
+		if (!opal_phb_set_option(phb->opal_id,
+					OPAL_PHB_OPTION_TVE1_4GB, 1)) {
+			pe->table_group.tce64_start = SZ_4G;
+			if (!pnv_pci_ioda2_set_bypass_iommu(pe,
+						pe->table_group.tce64_start)) {
+				pe->tce_bypass_enabled = true;
+				pe_info(pe, "Enabled 64-bit IOMMU bypass at %llx\n",
+						pe->table_group.tce64_start);
+				return;
+			}
+			pe_err(pe, "Enabled TVE1_4GB but failed to configure TCE table");
+		} else {
+			/* IOMMU bypass failed, fallback to direct bypass */
+			pnv_iommu_bypass_mode = PNV_IOMMU_NO_TRANSLATE;
+		}
+	}
+
+	if (pnv_iommu_bypass_mode = PNV_IOMMU_NO_TRANSLATE) {
+		/*
+		 * FIXME: if we enable dynamic switch, here we need to disable
+		 * OPAL_PCI_PHB_FLAG_TVE1_4GB
+		 */
+		top = roundup_pow_of_two(memblock_end_of_DRAM());
+		if (!opal_pci_map_pe_dma_window_real(phb->opal_id,
+					pe->pe_number, window_id,
+					pe->table_group.tce64_start, top)) {
+			pe->tce_bypass_enabled = true;
+			pe_info(pe, "Enabled 64-bit direct bypass at %llx\n",
+					pe->table_group.tce64_start);
+			return;
+		}
 	}
-	if (rc)
-		pe_err(pe, "OPAL error %lld configuring bypass window\n", rc);
-	else
-		pe->tce_bypass_enabled = enable;
 }
 
 static long pnv_pci_ioda2_create_table(int nid, int num, __u64 bus_offset,
@@ -2409,6 +2521,9 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 	const unsigned int tceshift = PAGE_SHIFT;
 	unsigned long res_start, res_end, tces_order, tcelevel_order, levels;
 
+	if (pnv_iommu_bypass_mode != PNV_IOMMU_BYPASS_DISABLED)
+		pnv_pci_ioda2_set_bypass(pe, true);
+
 	/*
 	 * crashkernel= specifies the kdump kernel's maximum memory at
 	 * some offset and there is no guaranteed the result is a power
@@ -2432,6 +2547,15 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 	 */
 	/* iommu_table::it_map uses 1 bit per IOMMU page, hence 8 */
 	window_size = min((maxblock * 8) << tceshift, max_memory);
+
+	/*
+	 * If we get TVE#1_4GB on, there is no point in having a huge default
+	 * DMA window.
+	 */
+	if (pnv_iommu_bypass_mode = PNV_IOMMU_TCE_BYPASS)
+		window_size = min_t(u64, pe->table_group.tce32_size,
+				window_size);
+
 	/* Each TCE level cannot exceed maxblock so go multilevel if needed */
 	tces_order = ilog2(window_size >> tceshift);
 	tcelevel_order = ilog2(maxblock >> 3);
@@ -2470,9 +2594,6 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 		return rc;
 	}
 
-	if (!pnv_iommu_bypass_disabled)
-		pnv_pci_ioda2_set_bypass(pe, true);
-
 	/*
 	 * Set table base for the case of IOMMU DMA use. Usually this is done
 	 * from dma_dev_setup() which is not called when a device is returned
@@ -2550,15 +2671,27 @@ static long pnv_pci_ioda2_create_table_userspace(
 {
 	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
 			table_group);
-	__u64 bus_offset = num ?
-		pe->table_group.tce64_start : table_group->tce32_start;
-	long ret = pnv_pci_ioda2_create_table(pe->phb->hose->node,
-			num, bus_offset, page_shift, window_size, levels, true,
-			ptbl);
+	__u64 bus_offset = num ? table_group->tce64_start :
+				 table_group->tce32_start;
+	long ret;
 
-	if (!ret)
+	if (bus_offset = SZ_4G)
+		ret = pnv_pci_ioda2_create_table(pe->phb->hose->node,
+				num, 0, page_shift,
+				roundup_pow_of_two(window_size + bus_offset),
+				levels, true, ptbl);
+	else
+		ret = pnv_pci_ioda2_create_table(pe->phb->hose->node,
+				num, bus_offset, page_shift,
+				window_size,
+				levels, true, ptbl);
+
+	if (!ret) {
 		(*ptbl)->it_allocated_size = pnv_pci_ioda2_get_table_size(
 				page_shift, window_size, levels);
+		(*ptbl)->it_dmaoff = bus_offset >> page_shift;
+	}
+
 	return ret;
 }
 
@@ -2624,8 +2757,6 @@ static void pnv_ioda_setup_bus_iommu_group(struct pnv_ioda_pe *pe,
 				bus);
 }
 
-static unsigned long pnv_ioda_parse_tce_sizes(struct pnv_phb *phb);
-
 static void pnv_pci_ioda_setup_iommu_api(void)
 {
 	struct pci_controller *hose;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH kernel 5/5] vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB
  2020-02-18  7:36 ` Alexey Kardashevskiy
@ 2020-02-18  7:36   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 16+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-18  7:36 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Alex Williamson, kvm-ppc,
	David Gibson

So far the only option for a big 64big DMA window was a window located
at 0x800.0000.0000.0000 (1<<59) which creates problems for devices
supporting smaller DMA masks.

This exploits a POWER9 PHB option to allow the second DMA window to map
at 0 and advertises it with a 4GB offset to avoid overlap with
the default 32bit window.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 include/uapi/linux/vfio.h           |  2 ++
 drivers/vfio/vfio_iommu_spapr_tce.c | 10 ++++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9e843a147ead..c7f89d47335a 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -831,9 +831,11 @@ struct vfio_iommu_spapr_tce_info {
 	__u32 argsz;
 	__u32 flags;
 #define VFIO_IOMMU_SPAPR_INFO_DDW	(1 << 0)	/* DDW supported */
+#define VFIO_IOMMU_SPAPR_INFO_DDW_START	(1 << 1)	/* DDW offset */
 	__u32 dma32_window_start;	/* 32 bit window start (bytes) */
 	__u32 dma32_window_size;	/* 32 bit window size (bytes) */
 	struct vfio_iommu_spapr_tce_ddw_info ddw;
+	__u64 dma64_window_start;
 };
 
 #define VFIO_IOMMU_SPAPR_TCE_GET_INFO	_IO(VFIO_TYPE, VFIO_BASE + 12)
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 16b3adc508db..4f22be3c4aa2 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -691,7 +691,7 @@ static long tce_iommu_create_window(struct tce_container *container,
 	container->tables[num] = tbl;
 
 	/* Return start address assigned by platform in create_table() */
-	*start_addr = tbl->it_offset << tbl->it_page_shift;
+	*start_addr = tbl->it_dmaoff << tbl->it_page_shift;
 
 	return 0;
 
@@ -842,7 +842,13 @@ static long tce_iommu_ioctl(void *iommu_data,
 			info.ddw.levels = table_group->max_levels;
 		}
 
-		ddwsz = offsetofend(struct vfio_iommu_spapr_tce_info, ddw);
+		ddwsz = offsetofend(struct vfio_iommu_spapr_tce_info,
+				dma64_window_start);
+
+		if (info.argsz >= ddwsz) {
+			info.flags |= VFIO_IOMMU_SPAPR_INFO_DDW_START;
+			info.dma64_window_start = table_group->tce64_start;
+		}
 
 		if (info.argsz >= ddwsz)
 			minsz = ddwsz;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH kernel 5/5] vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB
@ 2020-02-18  7:36   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-18  7:36 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Alex Williamson, kvm-ppc,
	David Gibson

So far the only option for a big 64big DMA window was a window located
at 0x800.0000.0000.0000 (1<<59) which creates problems for devices
supporting smaller DMA masks.

This exploits a POWER9 PHB option to allow the second DMA window to map
at 0 and advertises it with a 4GB offset to avoid overlap with
the default 32bit window.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 include/uapi/linux/vfio.h           |  2 ++
 drivers/vfio/vfio_iommu_spapr_tce.c | 10 ++++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9e843a147ead..c7f89d47335a 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -831,9 +831,11 @@ struct vfio_iommu_spapr_tce_info {
 	__u32 argsz;
 	__u32 flags;
 #define VFIO_IOMMU_SPAPR_INFO_DDW	(1 << 0)	/* DDW supported */
+#define VFIO_IOMMU_SPAPR_INFO_DDW_START	(1 << 1)	/* DDW offset */
 	__u32 dma32_window_start;	/* 32 bit window start (bytes) */
 	__u32 dma32_window_size;	/* 32 bit window size (bytes) */
 	struct vfio_iommu_spapr_tce_ddw_info ddw;
+	__u64 dma64_window_start;
 };
 
 #define VFIO_IOMMU_SPAPR_TCE_GET_INFO	_IO(VFIO_TYPE, VFIO_BASE + 12)
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 16b3adc508db..4f22be3c4aa2 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -691,7 +691,7 @@ static long tce_iommu_create_window(struct tce_container *container,
 	container->tables[num] = tbl;
 
 	/* Return start address assigned by platform in create_table() */
-	*start_addr = tbl->it_offset << tbl->it_page_shift;
+	*start_addr = tbl->it_dmaoff << tbl->it_page_shift;
 
 	return 0;
 
@@ -842,7 +842,13 @@ static long tce_iommu_ioctl(void *iommu_data,
 			info.ddw.levels = table_group->max_levels;
 		}
 
-		ddwsz = offsetofend(struct vfio_iommu_spapr_tce_info, ddw);
+		ddwsz = offsetofend(struct vfio_iommu_spapr_tce_info,
+				dma64_window_start);
+
+		if (info.argsz >= ddwsz) {
+			info.flags |= VFIO_IOMMU_SPAPR_INFO_DDW_START;
+			info.dma64_window_start = table_group->tce64_start;
+		}
 
 		if (info.argsz >= ddwsz)
 			minsz = ddwsz;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH kernel 5/5] vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB
  2020-02-18  7:36   ` Alexey Kardashevskiy
@ 2020-02-20 18:19     ` Alex Williamson
  -1 siblings, 0 replies; 16+ messages in thread
From: Alex Williamson @ 2020-02-20 18:19 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Alistair Popple, linuxppc-dev, kvm-ppc, David Gibson

On Tue, 18 Feb 2020 18:36:50 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> So far the only option for a big 64big DMA window was a window located
> at 0x800.0000.0000.0000 (1<<59) which creates problems for devices
> supporting smaller DMA masks.
> 
> This exploits a POWER9 PHB option to allow the second DMA window to map
> at 0 and advertises it with a 4GB offset to avoid overlap with
> the default 32bit window.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  include/uapi/linux/vfio.h           |  2 ++
>  drivers/vfio/vfio_iommu_spapr_tce.c | 10 ++++++++--
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 9e843a147ead..c7f89d47335a 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -831,9 +831,11 @@ struct vfio_iommu_spapr_tce_info {
>  	__u32 argsz;
>  	__u32 flags;
>  #define VFIO_IOMMU_SPAPR_INFO_DDW	(1 << 0)	/* DDW supported */
> +#define VFIO_IOMMU_SPAPR_INFO_DDW_START	(1 << 1)	/* DDW offset */
>  	__u32 dma32_window_start;	/* 32 bit window start (bytes) */
>  	__u32 dma32_window_size;	/* 32 bit window size (bytes) */
>  	struct vfio_iommu_spapr_tce_ddw_info ddw;
> +	__u64 dma64_window_start;
>  };
>  
>  #define VFIO_IOMMU_SPAPR_TCE_GET_INFO	_IO(VFIO_TYPE, VFIO_BASE + 12)
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 16b3adc508db..4f22be3c4aa2 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -691,7 +691,7 @@ static long tce_iommu_create_window(struct tce_container *container,
>  	container->tables[num] = tbl;
>  
>  	/* Return start address assigned by platform in create_table() */
> -	*start_addr = tbl->it_offset << tbl->it_page_shift;
> +	*start_addr = tbl->it_dmaoff << tbl->it_page_shift;
>  
>  	return 0;
>  
> @@ -842,7 +842,13 @@ static long tce_iommu_ioctl(void *iommu_data,
>  			info.ddw.levels = table_group->max_levels;
>  		}
>  
> -		ddwsz = offsetofend(struct vfio_iommu_spapr_tce_info, ddw);
> +		ddwsz = offsetofend(struct vfio_iommu_spapr_tce_info,
> +				dma64_window_start);

This breaks existing users, now they no longer get the ddw struct
unless their argsz also includes the new dma64 window field.

> +
> +		if (info.argsz >= ddwsz) {
> +			info.flags |= VFIO_IOMMU_SPAPR_INFO_DDW_START;
> +			info.dma64_window_start = table_group->tce64_start;
> +		}

This is inconsistent with ddw where we set the flag regardless of
argsz, but obviously only provide the field to the user if they've
provided room for it.  Thanks,

Alex

>  
>  		if (info.argsz >= ddwsz)
>  			minsz = ddwsz;


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH kernel 5/5] vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB
@ 2020-02-20 18:19     ` Alex Williamson
  0 siblings, 0 replies; 16+ messages in thread
From: Alex Williamson @ 2020-02-20 18:19 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Alistair Popple, linuxppc-dev, kvm-ppc, David Gibson

On Tue, 18 Feb 2020 18:36:50 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> So far the only option for a big 64big DMA window was a window located
> at 0x800.0000.0000.0000 (1<<59) which creates problems for devices
> supporting smaller DMA masks.
> 
> This exploits a POWER9 PHB option to allow the second DMA window to map
> at 0 and advertises it with a 4GB offset to avoid overlap with
> the default 32bit window.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  include/uapi/linux/vfio.h           |  2 ++
>  drivers/vfio/vfio_iommu_spapr_tce.c | 10 ++++++++--
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 9e843a147ead..c7f89d47335a 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -831,9 +831,11 @@ struct vfio_iommu_spapr_tce_info {
>  	__u32 argsz;
>  	__u32 flags;
>  #define VFIO_IOMMU_SPAPR_INFO_DDW	(1 << 0)	/* DDW supported */
> +#define VFIO_IOMMU_SPAPR_INFO_DDW_START	(1 << 1)	/* DDW offset */
>  	__u32 dma32_window_start;	/* 32 bit window start (bytes) */
>  	__u32 dma32_window_size;	/* 32 bit window size (bytes) */
>  	struct vfio_iommu_spapr_tce_ddw_info ddw;
> +	__u64 dma64_window_start;
>  };
>  
>  #define VFIO_IOMMU_SPAPR_TCE_GET_INFO	_IO(VFIO_TYPE, VFIO_BASE + 12)
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 16b3adc508db..4f22be3c4aa2 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -691,7 +691,7 @@ static long tce_iommu_create_window(struct tce_container *container,
>  	container->tables[num] = tbl;
>  
>  	/* Return start address assigned by platform in create_table() */
> -	*start_addr = tbl->it_offset << tbl->it_page_shift;
> +	*start_addr = tbl->it_dmaoff << tbl->it_page_shift;
>  
>  	return 0;
>  
> @@ -842,7 +842,13 @@ static long tce_iommu_ioctl(void *iommu_data,
>  			info.ddw.levels = table_group->max_levels;
>  		}
>  
> -		ddwsz = offsetofend(struct vfio_iommu_spapr_tce_info, ddw);
> +		ddwsz = offsetofend(struct vfio_iommu_spapr_tce_info,
> +				dma64_window_start);

This breaks existing users, now they no longer get the ddw struct
unless their argsz also includes the new dma64 window field.

> +
> +		if (info.argsz >= ddwsz) {
> +			info.flags |= VFIO_IOMMU_SPAPR_INFO_DDW_START;
> +			info.dma64_window_start = table_group->tce64_start;
> +		}

This is inconsistent with ddw where we set the flag regardless of
argsz, but obviously only provide the field to the user if they've
provided room for it.  Thanks,

Alex

>  
>  		if (info.argsz >= ddwsz)
>  			minsz = ddwsz;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH kernel 0/5] powerpc/powenv/ioda: Allow huge DMA window at 4GB
  2020-02-18  7:36 ` Alexey Kardashevskiy
@ 2020-03-18  4:31   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 16+ messages in thread
From: Alexey Kardashevskiy @ 2020-03-18  4:31 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alistair Popple, Alex Williamson, kvm-ppc, David Gibson



On 18/02/2020 18:36, Alexey Kardashevskiy wrote:
> Here is an attempt to support bigger DMA space for devices
> supporting DMA masks less than 59 bits (GPUs come into mind
> first). POWER9 PHBs have an option to map 2 windows at 0
> and select a windows based on DMA address being below or above
> 4GB.
> 
> This adds the "iommu=iommu_bypass" kernel parameter and
> supports VFIO+pseries machine - current this requires telling
> upstream+unmodified QEMU about this via
> -global spapr-pci-host-bridge.dma64_win_addr=0x100000000
> or per-phb property. 4/4 advertises the new option but
> there is no automation around it in QEMU (should it be?).
> 
> For now it is either 1<<59 or 4GB mode; dynamic switching is
> not supported (could be via sysfs).
> 
> This is a rebased version of
> https://lore.kernel.org/kvm/20191202015953.127902-1-aik@ozlabs.ru/
> 
> This is based on sha1
> 71c3a888cbca Linus Torvalds "Merge tag 'powerpc-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux".
> 
> Please comment. Thanks.


Anyone from the POWERPC side wants to comment? Thanks,


> 
> 
> 
> Alexey Kardashevskiy (5):
>   powerpc/powernv/ioda: Move TCE bypass base to PE
>   powerpc/powernv/ioda: Rework for huge DMA window at 4GB
>   powerpc/powernv/ioda: Allow smaller TCE table levels
>   powerpc/powernv/phb4: Add 4GB IOMMU bypass mode
>   vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB
> 
>  arch/powerpc/include/asm/iommu.h              |   2 +
>  arch/powerpc/include/asm/opal-api.h           |   9 +-
>  arch/powerpc/include/asm/opal.h               |   2 +
>  arch/powerpc/platforms/powernv/pci.h          |   2 +-
>  include/uapi/linux/vfio.h                     |   2 +
>  arch/powerpc/platforms/powernv/npu-dma.c      |   1 +
>  arch/powerpc/platforms/powernv/opal-call.c    |   2 +
>  arch/powerpc/platforms/powernv/pci-ioda-tce.c |   4 +-
>  arch/powerpc/platforms/powernv/pci-ioda.c     | 229 ++++++++++++++----
>  drivers/vfio/vfio_iommu_spapr_tce.c           |  10 +-
>  10 files changed, 207 insertions(+), 56 deletions(-)
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH kernel 0/5] powerpc/powenv/ioda: Allow huge DMA window at 4GB
@ 2020-03-18  4:31   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Alexey Kardashevskiy @ 2020-03-18  4:31 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alistair Popple, Alex Williamson, kvm-ppc, David Gibson



On 18/02/2020 18:36, Alexey Kardashevskiy wrote:
> Here is an attempt to support bigger DMA space for devices
> supporting DMA masks less than 59 bits (GPUs come into mind
> first). POWER9 PHBs have an option to map 2 windows at 0
> and select a windows based on DMA address being below or above
> 4GB.
> 
> This adds the "iommu=iommu_bypass" kernel parameter and
> supports VFIO+pseries machine - current this requires telling
> upstream+unmodified QEMU about this via
> -global spapr-pci-host-bridge.dma64_win_addr=0x100000000
> or per-phb property. 4/4 advertises the new option but
> there is no automation around it in QEMU (should it be?).
> 
> For now it is either 1<<59 or 4GB mode; dynamic switching is
> not supported (could be via sysfs).
> 
> This is a rebased version of
> https://lore.kernel.org/kvm/20191202015953.127902-1-aik@ozlabs.ru/
> 
> This is based on sha1
> 71c3a888cbca Linus Torvalds "Merge tag 'powerpc-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux".
> 
> Please comment. Thanks.


Anyone from the POWERPC side wants to comment? Thanks,


> 
> 
> 
> Alexey Kardashevskiy (5):
>   powerpc/powernv/ioda: Move TCE bypass base to PE
>   powerpc/powernv/ioda: Rework for huge DMA window at 4GB
>   powerpc/powernv/ioda: Allow smaller TCE table levels
>   powerpc/powernv/phb4: Add 4GB IOMMU bypass mode
>   vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB
> 
>  arch/powerpc/include/asm/iommu.h              |   2 +
>  arch/powerpc/include/asm/opal-api.h           |   9 +-
>  arch/powerpc/include/asm/opal.h               |   2 +
>  arch/powerpc/platforms/powernv/pci.h          |   2 +-
>  include/uapi/linux/vfio.h                     |   2 +
>  arch/powerpc/platforms/powernv/npu-dma.c      |   1 +
>  arch/powerpc/platforms/powernv/opal-call.c    |   2 +
>  arch/powerpc/platforms/powernv/pci-ioda-tce.c |   4 +-
>  arch/powerpc/platforms/powernv/pci-ioda.c     | 229 ++++++++++++++----
>  drivers/vfio/vfio_iommu_spapr_tce.c           |  10 +-
>  10 files changed, 207 insertions(+), 56 deletions(-)
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-03-18  4:34 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-18  7:36 [PATCH kernel 0/5] powerpc/powenv/ioda: Allow huge DMA window at 4GB Alexey Kardashevskiy
2020-02-18  7:36 ` Alexey Kardashevskiy
2020-02-18  7:36 ` [PATCH kernel 1/5] powerpc/powernv/ioda: Move TCE bypass base to PE Alexey Kardashevskiy
2020-02-18  7:36   ` Alexey Kardashevskiy
2020-02-18  7:36 ` [PATCH kernel 2/5] powerpc/powernv/ioda: Rework for huge DMA window at 4GB Alexey Kardashevskiy
2020-02-18  7:36   ` Alexey Kardashevskiy
2020-02-18  7:36 ` [PATCH kernel 3/5] powerpc/powernv/ioda: Allow smaller TCE table levels Alexey Kardashevskiy
2020-02-18  7:36   ` Alexey Kardashevskiy
2020-02-18  7:36 ` [PATCH kernel 4/5] powerpc/powernv/phb4: Add 4GB IOMMU bypass mode Alexey Kardashevskiy
2020-02-18  7:36   ` Alexey Kardashevskiy
2020-02-18  7:36 ` [PATCH kernel 5/5] vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB Alexey Kardashevskiy
2020-02-18  7:36   ` Alexey Kardashevskiy
2020-02-20 18:19   ` Alex Williamson
2020-02-20 18:19     ` Alex Williamson
2020-03-18  4:31 ` [PATCH kernel 0/5] powerpc/powenv/ioda: Allow huge DMA window " Alexey Kardashevskiy
2020-03-18  4:31   ` Alexey Kardashevskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.