linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 00/11] DDW + Indirect Mapping
@ 2021-07-16  8:27 Leonardo Bras
  2021-07-16  8:27 ` [PATCH v5 01/11] powerpc/pseries/iommu: Replace hard-coded page shift Leonardo Bras
                   ` (10 more replies)
  0 siblings, 11 replies; 37+ messages in thread
From: Leonardo Bras @ 2021-07-16  8:27 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen, Frederic Barrat
  Cc: linuxppc-dev, linux-kernel

So far it's assumed possible to map the guest RAM 1:1 to the bus, which
works with a small number of devices. SRIOV changes it as the user can
configure hundreds VFs and since phyp preallocates TCEs and does not
allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
per a PE to limit waste of physical pages.

As of today, if the assumed direct mapping is not possible, DDW creation
is skipped and the default DMA window "ibm,dma-window" is used instead.

Using the DDW instead of the default DMA window may allow to expand the
amount of memory that can be DMA-mapped, given the number of pages (TCEs)
may stay the same (or increase) and the default DMA window offers only
4k-pages while DDW may offer larger pages (4k, 64k, 16M ...).

Patch #1 replaces hard-coded 4K page size with a variable containing the
correct page size for the window.

Patch #2 introduces iommu_table_in_use(), and replace manual bit-field
checking where it's used. It will be used for aborting enable_ddw() if
there is any current iommu allocation and we are trying single window
indirect mapping.

Patch #3 introduces iommu_pseries_alloc_table() that will be helpful
when indirect mapping needs to replace the iommu_table.

Patch #4 adds helpers for adding DDWs in the list.

Patch #5 refactors enable_ddw() so it returns if direct mapping is
possible, instead of DMA offset. It helps for next patches on
indirect DMA mapping and also allows DMA windows starting at 0x00.

Patch #6 bring new helper to simplify enable_ddw(), allowing
some reorganization for introducing indirect mapping DDW.

Patch #7 adds new helper _iommu_table_setparms() and use it in other
*setparams*() to fill iommu_table. It will also be used for creating a
new iommu_table for indirect mapping.

Patch #8 updates remove_dma_window() to accept different property names,
so we can introduce a new property for indirect mapping.

Patch #9 extracts find_existing_ddw_windows() into
find_existing_ddw_windows_named(), and calls it by it's property name.
This will be useful when the property for indirect mapping is created,
so we can search the device-tree for both properties.

Patch #10:
Instead of destroying the created DDW if it doesn't map the whole
partition, make use of it instead of the default DMA window as it improves
performance. Also, update the iommu_table and re-generate the pools.
It introduces a new property name for DDW with indirect DMA mapping.

Patch #11:
Does some renaming of 'direct window' to 'dma window', given the DDW
created can now be also used in indirect mapping if direct mapping is not
available.

All patches were tested into an LPAR with an virtio-net interface that
allows default DMA window and DDW to coexist.

Changes since v4:
- Solve conflicts with new upstream versions
- Avoid unecessary code moving by doing variable declaration before definition
- Rename _iommu_table_setparms to iommu_table_setparms_common and changed base
  parameter from unsigned long to void* in order to avoid unecessary casting.
- Fix breaking case for existing direct-mapping.
- Fix IORESOURCE_MEM bound issue
- Move new tbl to pci->table_group->tables[1] instead of replacing [0]
v4 Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=241597&state=%2A&archive=both

Changes since v3:
- Fixed inverted free order at ddw_property_create()
- Updated goto tag naming
v3 Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=240287&state=%2A&archive=both

Changes since v2:
- Some patches got removed from the series and sent by themselves,
- New tbl created for DDW + indirect mapping reserves MMIO32 space,
- Improved reserved area algorithm,
- Improved commit messages,
- Removed define for default DMA window prop name,
- Avoided some unnecessary renaming,
- Removed some unnecessary empty lines,
- Changed some code moving to forward declarations.
v2 Link: http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=201210&state=%2A&archive=both


Leonardo Bras (11):
  powerpc/pseries/iommu: Replace hard-coded page shift
  powerpc/kernel/iommu: Add new iommu_table_in_use() helper
  powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper
  powerpc/pseries/iommu: Add ddw_list_new_entry() helper
  powerpc/pseries/iommu: Allow DDW windows starting at 0x00
  powerpc/pseries/iommu: Add ddw_property_create() and refactor
    enable_ddw()
  powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new
    helper
  powerpc/pseries/iommu: Update remove_dma_window() to accept property
    name
  powerpc/pseries/iommu: Find existing DDW with given property name
  powerpc/pseries/iommu: Make use of DDW for indirect mapping
  powerpc/pseries/iommu: Rename "direct window" to "dma window"

 arch/powerpc/include/asm/iommu.h       |   1 +
 arch/powerpc/include/asm/tce.h         |   8 -
 arch/powerpc/kernel/iommu.c            |  65 ++--
 arch/powerpc/platforms/pseries/iommu.c | 481 +++++++++++++++----------
 4 files changed, 330 insertions(+), 225 deletions(-)

-- 
2.32.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v5 01/11] powerpc/pseries/iommu: Replace hard-coded page shift
  2021-07-16  8:27 [PATCH v5 00/11] DDW + Indirect Mapping Leonardo Bras
@ 2021-07-16  8:27 ` Leonardo Bras
  2021-07-19 13:48   ` Frederic Barrat
  2021-07-16  8:27 ` [PATCH v5 02/11] powerpc/kernel/iommu: Add new iommu_table_in_use() helper Leonardo Bras
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: Leonardo Bras @ 2021-07-16  8:27 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen, Frederic Barrat
  Cc: linuxppc-dev, linux-kernel

Some functions assume IOMMU page size can only be 4K (pageshift == 12).
Update them to accept any page size passed, so we can use 64K pages.

In the process, some defines like TCE_SHIFT were made obsolete, and then
removed.

IODA3 Revision 3.0_prd1 (OpenPowerFoundation), Figures 3.4 and 3.5 show
a RPN of 52-bit, and considers a 12-bit pageshift, so there should be
no need of using TCE_RPN_MASK, which masks out any bit after 40 in rpn.
It's usage removed from tce_build_pSeries(), tce_build_pSeriesLP(), and
tce_buildmulti_pSeriesLP().

Most places had a tbl struct, so using tbl->it_page_shift was simple.
tce_free_pSeriesLP() was a special case, since callers not always have a
tbl struct, so adding a tceshift parameter seems the right thing to do.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/tce.h         |  8 ------
 arch/powerpc/platforms/pseries/iommu.c | 39 +++++++++++++++-----------
 2 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
index db5fc2f2262d..0c34d2756d92 100644
--- a/arch/powerpc/include/asm/tce.h
+++ b/arch/powerpc/include/asm/tce.h
@@ -19,15 +19,7 @@
 #define TCE_VB			0
 #define TCE_PCI			1
 
-/* TCE page size is 4096 bytes (1 << 12) */
-
-#define TCE_SHIFT	12
-#define TCE_PAGE_SIZE	(1 << TCE_SHIFT)
-
 #define TCE_ENTRY_SIZE		8		/* each TCE is 64 bits */
-
-#define TCE_RPN_MASK		0xfffffffffful  /* 40-bit RPN (4K pages) */
-#define TCE_RPN_SHIFT		12
 #define TCE_VALID		0x800		/* TCE valid */
 #define TCE_ALLIO		0x400		/* TCE valid for all lpars */
 #define TCE_PCI_WRITE		0x2		/* write from PCI allowed */
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 0c55b991f665..b1b8d12bab39 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -107,6 +107,8 @@ static int tce_build_pSeries(struct iommu_table *tbl, long index,
 	u64 proto_tce;
 	__be64 *tcep;
 	u64 rpn;
+	const unsigned long tceshift = tbl->it_page_shift;
+	const unsigned long pagesize = IOMMU_PAGE_SIZE(tbl);
 
 	proto_tce = TCE_PCI_READ; // Read allowed
 
@@ -117,10 +119,10 @@ static int tce_build_pSeries(struct iommu_table *tbl, long index,
 
 	while (npages--) {
 		/* can't move this out since we might cross MEMBLOCK boundary */
-		rpn = __pa(uaddr) >> TCE_SHIFT;
-		*tcep = cpu_to_be64(proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT);
+		rpn = __pa(uaddr) >> tceshift;
+		*tcep = cpu_to_be64(proto_tce | rpn << tceshift);
 
-		uaddr += TCE_PAGE_SIZE;
+		uaddr += pagesize;
 		tcep++;
 	}
 	return 0;
@@ -146,7 +148,7 @@ static unsigned long tce_get_pseries(struct iommu_table *tbl, long index)
 	return be64_to_cpu(*tcep);
 }
 
-static void tce_free_pSeriesLP(unsigned long liobn, long, long);
+static void tce_free_pSeriesLP(unsigned long liobn, long, long, long);
 static void tce_freemulti_pSeriesLP(struct iommu_table*, long, long);
 
 static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
@@ -166,12 +168,12 @@ static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
 		proto_tce |= TCE_PCI_WRITE;
 
 	while (npages--) {
-		tce = proto_tce | (rpn & TCE_RPN_MASK) << tceshift;
+		tce = proto_tce | rpn << tceshift;
 		rc = plpar_tce_put((u64)liobn, (u64)tcenum << tceshift, tce);
 
 		if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
 			ret = (int)rc;
-			tce_free_pSeriesLP(liobn, tcenum_start,
+			tce_free_pSeriesLP(liobn, tcenum_start, tceshift,
 			                   (npages_start - (npages + 1)));
 			break;
 		}
@@ -205,10 +207,11 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 	long tcenum_start = tcenum, npages_start = npages;
 	int ret = 0;
 	unsigned long flags;
+	const unsigned long tceshift = tbl->it_page_shift;
 
 	if ((npages == 1) || !firmware_has_feature(FW_FEATURE_PUT_TCE_IND)) {
 		return tce_build_pSeriesLP(tbl->it_index, tcenum,
-					   tbl->it_page_shift, npages, uaddr,
+					   tceshift, npages, uaddr,
 		                           direction, attrs);
 	}
 
@@ -225,13 +228,13 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 		if (!tcep) {
 			local_irq_restore(flags);
 			return tce_build_pSeriesLP(tbl->it_index, tcenum,
-					tbl->it_page_shift,
+					tceshift,
 					npages, uaddr, direction, attrs);
 		}
 		__this_cpu_write(tce_page, tcep);
 	}
 
-	rpn = __pa(uaddr) >> TCE_SHIFT;
+	rpn = __pa(uaddr) >> tceshift;
 	proto_tce = TCE_PCI_READ;
 	if (direction != DMA_TO_DEVICE)
 		proto_tce |= TCE_PCI_WRITE;
@@ -245,12 +248,12 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 		limit = min_t(long, npages, 4096/TCE_ENTRY_SIZE);
 
 		for (l = 0; l < limit; l++) {
-			tcep[l] = cpu_to_be64(proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT);
+			tcep[l] = cpu_to_be64(proto_tce | rpn << tceshift);
 			rpn++;
 		}
 
 		rc = plpar_tce_put_indirect((u64)tbl->it_index,
-					    (u64)tcenum << 12,
+					    (u64)tcenum << tceshift,
 					    (u64)__pa(tcep),
 					    limit);
 
@@ -277,12 +280,13 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 	return ret;
 }
 
-static void tce_free_pSeriesLP(unsigned long liobn, long tcenum, long npages)
+static void tce_free_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
+			       long npages)
 {
 	u64 rc;
 
 	while (npages--) {
-		rc = plpar_tce_put((u64)liobn, (u64)tcenum << 12, 0);
+		rc = plpar_tce_put((u64)liobn, (u64)tcenum << tceshift, 0);
 
 		if (rc && printk_ratelimit()) {
 			printk("tce_free_pSeriesLP: plpar_tce_put failed. rc=%lld\n", rc);
@@ -301,9 +305,11 @@ static void tce_freemulti_pSeriesLP(struct iommu_table *tbl, long tcenum, long n
 	u64 rc;
 
 	if (!firmware_has_feature(FW_FEATURE_STUFF_TCE))
-		return tce_free_pSeriesLP(tbl->it_index, tcenum, npages);
+		return tce_free_pSeriesLP(tbl->it_index, tcenum,
+					  tbl->it_page_shift, npages);
 
-	rc = plpar_tce_stuff((u64)tbl->it_index, (u64)tcenum << 12, 0, npages);
+	rc = plpar_tce_stuff((u64)tbl->it_index,
+			     (u64)tcenum << tbl->it_page_shift, 0, npages);
 
 	if (rc && printk_ratelimit()) {
 		printk("tce_freemulti_pSeriesLP: plpar_tce_stuff failed\n");
@@ -319,7 +325,8 @@ static unsigned long tce_get_pSeriesLP(struct iommu_table *tbl, long tcenum)
 	u64 rc;
 	unsigned long tce_ret;
 
-	rc = plpar_tce_get((u64)tbl->it_index, (u64)tcenum << 12, &tce_ret);
+	rc = plpar_tce_get((u64)tbl->it_index,
+			   (u64)tcenum << tbl->it_page_shift, &tce_ret);
 
 	if (rc && printk_ratelimit()) {
 		printk("tce_get_pSeriesLP: plpar_tce_get failed. rc=%lld\n", rc);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 02/11] powerpc/kernel/iommu: Add new iommu_table_in_use() helper
  2021-07-16  8:27 [PATCH v5 00/11] DDW + Indirect Mapping Leonardo Bras
  2021-07-16  8:27 ` [PATCH v5 01/11] powerpc/pseries/iommu: Replace hard-coded page shift Leonardo Bras
@ 2021-07-16  8:27 ` Leonardo Bras
  2021-07-19 13:53   ` Frederic Barrat
  2021-07-16  8:27 ` [PATCH v5 03/11] powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper Leonardo Bras
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: Leonardo Bras @ 2021-07-16  8:27 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen, Frederic Barrat
  Cc: linuxppc-dev, linux-kernel

Having a function to check if the iommu table has any allocation helps
deciding if a tbl can be reset for using a new DMA window.

It should be enough to replace all instances of !bitmap_empty(tbl...).

iommu_table_in_use() skips reserved memory, so we don't need to worry about
releasing it before testing. This causes iommu_table_release_pages() to
become unnecessary, given it is only used to remove reserved memory for
testing.

Also, only allow storing reserved memory values in tbl if they are valid
in the table, so there is no need to check it in the new helper.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/iommu.h |  1 +
 arch/powerpc/kernel/iommu.c      | 65 ++++++++++++++++----------------
 2 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index deef7c94d7b6..bf3b84128525 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -154,6 +154,7 @@ extern int iommu_tce_table_put(struct iommu_table *tbl);
  */
 extern struct iommu_table *iommu_init_table(struct iommu_table *tbl,
 		int nid, unsigned long res_start, unsigned long res_end);
+bool iommu_table_in_use(struct iommu_table *tbl);
 
 #define IOMMU_TABLE_GROUP_MAX_TABLES	2
 
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 2af89a5e379f..b10bf58ae467 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -690,32 +690,24 @@ static void iommu_table_reserve_pages(struct iommu_table *tbl,
 	if (tbl->it_offset == 0)
 		set_bit(0, tbl->it_map);
 
-	tbl->it_reserved_start = res_start;
-	tbl->it_reserved_end = res_end;
-
-	/* Check if res_start..res_end isn't empty and overlaps the table */
-	if (res_start && res_end &&
-			(tbl->it_offset + tbl->it_size < res_start ||
-			 res_end < tbl->it_offset))
-		return;
+	if (res_start < tbl->it_offset)
+		res_start = tbl->it_offset;
 
-	for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
-		set_bit(i - tbl->it_offset, tbl->it_map);
-}
+	if (res_end > (tbl->it_offset + tbl->it_size))
+		res_end = tbl->it_offset + tbl->it_size;
 
-static void iommu_table_release_pages(struct iommu_table *tbl)
-{
-	int i;
+	/* Check if res_start..res_end is a valid range in the table */
+	if (res_start >= res_end) {
+		tbl->it_reserved_start = tbl->it_offset;
+		tbl->it_reserved_end = tbl->it_offset;
+		return;
+	}
 
-	/*
-	 * In case we have reserved the first bit, we should not emit
-	 * the warning below.
-	 */
-	if (tbl->it_offset == 0)
-		clear_bit(0, tbl->it_map);
+	tbl->it_reserved_start = res_start;
+	tbl->it_reserved_end = res_end;
 
 	for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
-		clear_bit(i - tbl->it_offset, tbl->it_map);
+		set_bit(i - tbl->it_offset, tbl->it_map);
 }
 
 /*
@@ -779,6 +771,22 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
 	return tbl;
 }
 
+bool iommu_table_in_use(struct iommu_table *tbl)
+{
+	unsigned long start = 0, end;
+
+	/* ignore reserved bit0 */
+	if (tbl->it_offset == 0)
+		start = 1;
+	end = tbl->it_reserved_start - tbl->it_offset;
+	if (find_next_bit(tbl->it_map, end, start) != end)
+		return true;
+
+	start = tbl->it_reserved_end - tbl->it_offset;
+	end = tbl->it_size;
+	return find_next_bit(tbl->it_map, end, start) != end;
+}
+
 static void iommu_table_free(struct kref *kref)
 {
 	struct iommu_table *tbl;
@@ -795,10 +803,8 @@ static void iommu_table_free(struct kref *kref)
 
 	iommu_debugfs_del(tbl);
 
-	iommu_table_release_pages(tbl);
-
 	/* verify that table contains no entries */
-	if (!bitmap_empty(tbl->it_map, tbl->it_size))
+	if (iommu_table_in_use(tbl))
 		pr_warn("%s: Unexpected TCEs\n", __func__);
 
 	/* free bitmap */
@@ -1099,18 +1105,13 @@ int iommu_take_ownership(struct iommu_table *tbl)
 	for (i = 0; i < tbl->nr_pools; i++)
 		spin_lock_nest_lock(&tbl->pools[i].lock, &tbl->large_pool.lock);
 
-	iommu_table_release_pages(tbl);
-
-	if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
+	if (iommu_table_in_use(tbl)) {
 		pr_err("iommu_tce: it_map is not empty");
 		ret = -EBUSY;
-		/* Undo iommu_table_release_pages, i.e. restore bit#0, etc */
-		iommu_table_reserve_pages(tbl, tbl->it_reserved_start,
-				tbl->it_reserved_end);
-	} else {
-		memset(tbl->it_map, 0xff, sz);
 	}
 
+	memset(tbl->it_map, 0xff, sz);
+
 	for (i = 0; i < tbl->nr_pools; i++)
 		spin_unlock(&tbl->pools[i].lock);
 	spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 03/11] powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper
  2021-07-16  8:27 [PATCH v5 00/11] DDW + Indirect Mapping Leonardo Bras
  2021-07-16  8:27 ` [PATCH v5 01/11] powerpc/pseries/iommu: Replace hard-coded page shift Leonardo Bras
  2021-07-16  8:27 ` [PATCH v5 02/11] powerpc/kernel/iommu: Add new iommu_table_in_use() helper Leonardo Bras
@ 2021-07-16  8:27 ` Leonardo Bras
  2021-07-19 14:04   ` Frederic Barrat
  2021-07-16  8:27 ` [PATCH v5 04/11] powerpc/pseries/iommu: Add ddw_list_new_entry() helper Leonardo Bras
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: Leonardo Bras @ 2021-07-16  8:27 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen, Frederic Barrat
  Cc: linuxppc-dev, linux-kernel

Creates a helper to allow allocating a new iommu_table without the need
to reallocate the iommu_group.

This will be helpful for replacing the iommu_table for the new DMA window,
after we remove the old one with iommu_tce_table_put().

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/pseries/iommu.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index b1b8d12bab39..33d82865d6e6 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -53,28 +53,31 @@ enum {
 	DDW_EXT_QUERY_OUT_SIZE = 2
 };
 
-static struct iommu_table_group *iommu_pseries_alloc_group(int node)
+static struct iommu_table *iommu_pseries_alloc_table(int node)
 {
-	struct iommu_table_group *table_group;
 	struct iommu_table *tbl;
 
-	table_group = kzalloc_node(sizeof(struct iommu_table_group), GFP_KERNEL,
-			   node);
-	if (!table_group)
-		return NULL;
-
 	tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL, node);
 	if (!tbl)
-		goto free_group;
+		return NULL;
 
 	INIT_LIST_HEAD_RCU(&tbl->it_group_list);
 	kref_init(&tbl->it_kref);
+	return tbl;
+}
 
-	table_group->tables[0] = tbl;
+static struct iommu_table_group *iommu_pseries_alloc_group(int node)
+{
+	struct iommu_table_group *table_group;
+
+	table_group = kzalloc_node(sizeof(*table_group), GFP_KERNEL, node);
+	if (!table_group)
+		return NULL;
 
-	return table_group;
+	table_group->tables[0] = iommu_pseries_alloc_table(node);
+	if (table_group->tables[0])
+		return table_group;
 
-free_group:
 	kfree(table_group);
 	return NULL;
 }
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 04/11] powerpc/pseries/iommu: Add ddw_list_new_entry() helper
  2021-07-16  8:27 [PATCH v5 00/11] DDW + Indirect Mapping Leonardo Bras
                   ` (2 preceding siblings ...)
  2021-07-16  8:27 ` [PATCH v5 03/11] powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper Leonardo Bras
@ 2021-07-16  8:27 ` Leonardo Bras
  2021-07-19 14:14   ` Frederic Barrat
  2021-07-16  8:27 ` [PATCH v5 05/11] powerpc/pseries/iommu: Allow DDW windows starting at 0x00 Leonardo Bras
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: Leonardo Bras @ 2021-07-16  8:27 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen, Frederic Barrat
  Cc: linuxppc-dev, linux-kernel

There are two functions creating direct_window_list entries in a
similar way, so create a ddw_list_new_entry() to avoid duplicity and
simplify those functions.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/pseries/iommu.c | 32 +++++++++++++++++---------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 33d82865d6e6..712d1667144a 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -874,6 +874,21 @@ static u64 find_existing_ddw(struct device_node *pdn, int *window_shift)
 	return dma_addr;
 }
 
+static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
+						const struct dynamic_dma_window_prop *dma64)
+{
+	struct direct_window *window;
+
+	window = kzalloc(sizeof(*window), GFP_KERNEL);
+	if (!window)
+		return NULL;
+
+	window->device = pdn;
+	window->prop = dma64;
+
+	return window;
+}
+
 static int find_existing_ddw_windows(void)
 {
 	int len;
@@ -886,18 +901,15 @@ static int find_existing_ddw_windows(void)
 
 	for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
 		direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
-		if (!direct64)
-			continue;
-
-		window = kzalloc(sizeof(*window), GFP_KERNEL);
-		if (!window || len < sizeof(struct dynamic_dma_window_prop)) {
-			kfree(window);
+		if (!direct64 || len < sizeof(*direct64)) {
 			remove_ddw(pdn, true);
 			continue;
 		}
 
-		window->device = pdn;
-		window->prop = direct64;
+		window = ddw_list_new_entry(pdn, direct64);
+		if (!window)
+			break;
+
 		spin_lock(&direct_window_list_lock);
 		list_add(&window->list, &direct_window_list);
 		spin_unlock(&direct_window_list_lock);
@@ -1307,7 +1319,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %pOF\n",
 		  create.liobn, dn);
 
-	window = kzalloc(sizeof(*window), GFP_KERNEL);
+	window = ddw_list_new_entry(pdn, ddwprop);
 	if (!window)
 		goto out_clear_window;
 
@@ -1326,8 +1338,6 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 		goto out_free_window;
 	}
 
-	window->device = pdn;
-	window->prop = ddwprop;
 	spin_lock(&direct_window_list_lock);
 	list_add(&window->list, &direct_window_list);
 	spin_unlock(&direct_window_list_lock);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 05/11] powerpc/pseries/iommu: Allow DDW windows starting at 0x00
  2021-07-16  8:27 [PATCH v5 00/11] DDW + Indirect Mapping Leonardo Bras
                   ` (3 preceding siblings ...)
  2021-07-16  8:27 ` [PATCH v5 04/11] powerpc/pseries/iommu: Add ddw_list_new_entry() helper Leonardo Bras
@ 2021-07-16  8:27 ` Leonardo Bras
  2021-07-20 17:44   ` Frederic Barrat
  2021-07-16  8:27 ` [PATCH v5 06/11] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw() Leonardo Bras
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: Leonardo Bras @ 2021-07-16  8:27 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen, Frederic Barrat
  Cc: linuxppc-dev, linux-kernel

enable_ddw() currently returns the address of the DMA window, which is
considered invalid if has the value 0x00.

Also, it only considers valid an address returned from find_existing_ddw
if it's not 0x00.

Changing this behavior makes sense, given the users of enable_ddw() only
need to know if direct mapping is possible. It can also allow a DMA window
starting at 0x00 to be used.

This will be helpful for using a DDW with indirect mapping, as the window
address will be different than 0x00, but it will not map the whole
partition.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/pseries/iommu.c | 36 +++++++++++++-------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 712d1667144a..b34b473bbdc1 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -853,25 +853,26 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
 			np, ret);
 }
 
-static u64 find_existing_ddw(struct device_node *pdn, int *window_shift)
+static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, int *window_shift)
 {
 	struct direct_window *window;
 	const struct dynamic_dma_window_prop *direct64;
-	u64 dma_addr = 0;
+	bool found = false;
 
 	spin_lock(&direct_window_list_lock);
 	/* check if we already created a window and dupe that config if so */
 	list_for_each_entry(window, &direct_window_list, list) {
 		if (window->device == pdn) {
 			direct64 = window->prop;
-			dma_addr = be64_to_cpu(direct64->dma_base);
+			*dma_addr = be64_to_cpu(direct64->dma_base);
 			*window_shift = be32_to_cpu(direct64->window_shift);
+			found = true;
 			break;
 		}
 	}
 	spin_unlock(&direct_window_list_lock);
 
-	return dma_addr;
+	return found;
 }
 
 static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
@@ -1161,20 +1162,20 @@ static int iommu_get_page_shift(u32 query_page_size)
  * pdn: the parent pe node with the ibm,dma_window property
  * Future: also check if we can remap the base window for our base page size
  *
- * returns the dma offset for use by the direct mapped DMA code.
+ * returns true if can map all pages (direct mapping), false otherwise..
  */
-static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
+static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 {
 	int len = 0, ret;
 	int max_ram_len = order_base_2(ddw_memory_hotplug_max());
 	struct ddw_query_response query;
 	struct ddw_create_response create;
 	int page_shift;
-	u64 dma_addr;
 	struct device_node *dn;
 	u32 ddw_avail[DDW_APPLICABLE_SIZE];
 	struct direct_window *window;
 	struct property *win64;
+	bool ddw_enabled = false;
 	struct dynamic_dma_window_prop *ddwprop;
 	struct failed_ddw_pdn *fpdn;
 	bool default_win_removed = false;
@@ -1186,9 +1187,10 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 
 	mutex_lock(&direct_window_init_mutex);
 
-	dma_addr = find_existing_ddw(pdn, &len);
-	if (dma_addr != 0)
+	if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &len)) {
+		ddw_enabled = true;
 		goto out_unlock;
+	}
 
 	/*
 	 * If we already went through this for a previous function of
@@ -1342,7 +1344,8 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	list_add(&window->list, &direct_window_list);
 	spin_unlock(&direct_window_list_lock);
 
-	dma_addr = be64_to_cpu(ddwprop->dma_base);
+	dev->dev.archdata.dma_offset = be64_to_cpu(ddwprop->dma_base);
+	ddw_enabled = true;
 	goto out_unlock;
 
 out_free_window:
@@ -1374,10 +1377,10 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	 * as RAM, then we failed to create a window to cover persistent
 	 * memory and need to set the DMA limit.
 	 */
-	if (pmem_present && dma_addr && (len == max_ram_len))
-		dev->dev.bus_dma_limit = dma_addr + (1ULL << len);
+	if (pmem_present && ddw_enabled && (len == max_ram_len))
+		dev->dev.bus_dma_limit = dev->dev.archdata.dma_offset + (1ULL << len);
 
-	return dma_addr;
+	return ddw_enabled;
 }
 
 static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
@@ -1456,11 +1459,8 @@ static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask)
 			break;
 	}
 
-	if (pdn && PCI_DN(pdn)) {
-		pdev->dev.archdata.dma_offset = enable_ddw(pdev, pdn);
-		if (pdev->dev.archdata.dma_offset)
-			return true;
-	}
+	if (pdn && PCI_DN(pdn))
+		return enable_ddw(pdev, pdn);
 
 	return false;
 }
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 06/11] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()
  2021-07-16  8:27 [PATCH v5 00/11] DDW + Indirect Mapping Leonardo Bras
                   ` (4 preceding siblings ...)
  2021-07-16  8:27 ` [PATCH v5 05/11] powerpc/pseries/iommu: Allow DDW windows starting at 0x00 Leonardo Bras
@ 2021-07-16  8:27 ` Leonardo Bras
  2021-07-20 17:49   ` Frederic Barrat
  2021-07-16  8:27 ` [PATCH v5 07/11] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper Leonardo Bras
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: Leonardo Bras @ 2021-07-16  8:27 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen, Frederic Barrat
  Cc: linuxppc-dev, linux-kernel

Code used to create a ddw property that was previously scattered in
enable_ddw() is now gathered in ddw_property_create(), which deals with
allocation and filling the property, letting it ready for
of_property_add(), which now occurs in sequence.

This created an opportunity to reorganize the second part of enable_ddw():

Without this patch enable_ddw() does, in order:
kzalloc() property & members, create_ddw(), fill ddwprop inside property,
ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory,
of_add_property(), and list_add().

With this patch enable_ddw() does, in order:
create_ddw(), ddw_property_create(), of_add_property(),
ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory,
and list_add().

This change requires of_remove_property() in case anything fails after
of_add_property(), but we get to do tce_setrange_multi_pSeriesLP_walk
in all memory, which looks the most expensive operation, only if
everything else succeeds.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/pseries/iommu.c | 93 ++++++++++++++++----------
 1 file changed, 57 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index b34b473bbdc1..7ca79a04fa52 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1153,6 +1153,35 @@ static int iommu_get_page_shift(u32 query_page_size)
 	return 0;
 }
 
+static struct property *ddw_property_create(const char *propname, u32 liobn, u64 dma_addr,
+					    u32 page_shift, u32 window_shift)
+{
+	struct dynamic_dma_window_prop *ddwprop;
+	struct property *win64;
+
+	win64 = kzalloc(sizeof(*win64), GFP_KERNEL);
+	if (!win64)
+		return NULL;
+
+	win64->name = kstrdup(propname, GFP_KERNEL);
+	ddwprop = kzalloc(sizeof(*ddwprop), GFP_KERNEL);
+	win64->value = ddwprop;
+	win64->length = sizeof(*ddwprop);
+	if (!win64->name || !win64->value) {
+		kfree(win64->name);
+		kfree(win64->value);
+		kfree(win64);
+		return NULL;
+	}
+
+	ddwprop->liobn = cpu_to_be32(liobn);
+	ddwprop->dma_base = cpu_to_be64(dma_addr);
+	ddwprop->tce_shift = cpu_to_be32(page_shift);
+	ddwprop->window_shift = cpu_to_be32(window_shift);
+
+	return win64;
+}
+
 /*
  * If the PE supports dynamic dma windows, and there is space for a table
  * that can map all pages in a linear offset, then setup such a table,
@@ -1171,12 +1200,12 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	struct ddw_query_response query;
 	struct ddw_create_response create;
 	int page_shift;
+	u64 win_addr;
 	struct device_node *dn;
 	u32 ddw_avail[DDW_APPLICABLE_SIZE];
 	struct direct_window *window;
 	struct property *win64;
 	bool ddw_enabled = false;
-	struct dynamic_dma_window_prop *ddwprop;
 	struct failed_ddw_pdn *fpdn;
 	bool default_win_removed = false;
 	bool pmem_present;
@@ -1293,72 +1322,64 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 			1ULL << page_shift);
 		goto out_failed;
 	}
-	win64 = kzalloc(sizeof(struct property), GFP_KERNEL);
-	if (!win64) {
-		dev_info(&dev->dev,
-			"couldn't allocate property for 64bit dma window\n");
-		goto out_failed;
-	}
-	win64->name = kstrdup(DIRECT64_PROPNAME, GFP_KERNEL);
-	win64->value = ddwprop = kmalloc(sizeof(*ddwprop), GFP_KERNEL);
-	win64->length = sizeof(*ddwprop);
-	if (!win64->name || !win64->value) {
-		dev_info(&dev->dev,
-			"couldn't allocate property name and value\n");
-		goto out_free_prop;
-	}
 
 	ret = create_ddw(dev, ddw_avail, &create, page_shift, len);
 	if (ret != 0)
-		goto out_free_prop;
-
-	ddwprop->liobn = cpu_to_be32(create.liobn);
-	ddwprop->dma_base = cpu_to_be64(((u64)create.addr_hi << 32) |
-			create.addr_lo);
-	ddwprop->tce_shift = cpu_to_be32(page_shift);
-	ddwprop->window_shift = cpu_to_be32(len);
+		goto out_failed;
 
 	dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %pOF\n",
 		  create.liobn, dn);
 
-	window = ddw_list_new_entry(pdn, ddwprop);
+	win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
+	win64 = ddw_property_create(DIRECT64_PROPNAME, create.liobn, win_addr,
+				    page_shift, len);
+	if (!win64) {
+		dev_info(&dev->dev,
+			 "couldn't allocate property, property name, or value\n");
+		goto out_remove_win;
+	}
+
+	ret = of_add_property(pdn, win64);
+	if (ret) {
+		dev_err(&dev->dev, "unable to add dma window property for %pOF: %d",
+			pdn, ret);
+		goto out_free_prop;
+	}
+
+	window = ddw_list_new_entry(pdn, win64->value);
 	if (!window)
-		goto out_clear_window;
+		goto out_del_prop;
 
 	ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
 			win64->value, tce_setrange_multi_pSeriesLP_walk);
 	if (ret) {
 		dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
 			 dn, ret);
-		goto out_free_window;
-	}
-
-	ret = of_add_property(pdn, win64);
-	if (ret) {
-		dev_err(&dev->dev, "unable to add dma window property for %pOF: %d",
-			 pdn, ret);
-		goto out_free_window;
+		goto out_del_list;
 	}
 
 	spin_lock(&direct_window_list_lock);
 	list_add(&window->list, &direct_window_list);
 	spin_unlock(&direct_window_list_lock);
 
-	dev->dev.archdata.dma_offset = be64_to_cpu(ddwprop->dma_base);
+	dev->dev.archdata.dma_offset = win_addr;
 	ddw_enabled = true;
 	goto out_unlock;
 
-out_free_window:
+out_del_list:
 	kfree(window);
 
-out_clear_window:
-	remove_ddw(pdn, true);
+out_del_prop:
+	of_remove_property(pdn, win64);
 
 out_free_prop:
 	kfree(win64->name);
 	kfree(win64->value);
 	kfree(win64);
 
+out_remove_win:
+	remove_ddw(pdn, true);
+
 out_failed:
 	if (default_win_removed)
 		reset_dma_window(dev, pdn);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 07/11] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper
  2021-07-16  8:27 [PATCH v5 00/11] DDW + Indirect Mapping Leonardo Bras
                   ` (5 preceding siblings ...)
  2021-07-16  8:27 ` [PATCH v5 06/11] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw() Leonardo Bras
@ 2021-07-16  8:27 ` Leonardo Bras
  2021-07-20 17:50   ` Frederic Barrat
  2021-07-16  8:27 ` [PATCH v5 08/11] powerpc/pseries/iommu: Update remove_dma_window() to accept property name Leonardo Bras
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: Leonardo Bras @ 2021-07-16  8:27 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen, Frederic Barrat
  Cc: linuxppc-dev, linux-kernel

Add a new helper _iommu_table_setparms(), and use it in
iommu_table_setparms() and iommu_table_setparms_lpar() to avoid duplicated
code.

Also, setting tbl->it_ops was happening outsite iommu_table_setparms*(),
so move it to the new helper. Since we need the iommu_table_ops to be
declared before used, declare iommu_table_lpar_multi_ops and
iommu_table_pseries_ops to before their respective iommu_table_setparms*().

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 72 ++++++++++++++------------
 1 file changed, 38 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 7ca79a04fa52..108c3dcca686 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -501,6 +501,24 @@ static int tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn,
 	return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg);
 }
 
+static void iommu_table_setparms_common(struct iommu_table *tbl, unsigned long busno,
+					unsigned long liobn, unsigned long win_addr,
+					unsigned long window_size, unsigned long page_shift,
+					void *base, struct iommu_table_ops *table_ops)
+{
+	tbl->it_busno = busno;
+	tbl->it_index = liobn;
+	tbl->it_offset = win_addr >> page_shift;
+	tbl->it_size = window_size >> page_shift;
+	tbl->it_page_shift = page_shift;
+	tbl->it_base = (unsigned long)base;
+	tbl->it_blocksize = 16;
+	tbl->it_type = TCE_PCI;
+	tbl->it_ops = table_ops;
+}
+
+struct iommu_table_ops iommu_table_pseries_ops;
+
 static void iommu_table_setparms(struct pci_controller *phb,
 				 struct device_node *dn,
 				 struct iommu_table *tbl)
@@ -509,8 +527,13 @@ static void iommu_table_setparms(struct pci_controller *phb,
 	const unsigned long *basep;
 	const u32 *sizep;
 
-	node = phb->dn;
+	/* Test if we are going over 2GB of DMA space */
+	if (phb->dma_window_base_cur + phb->dma_window_size > SZ_2G) {
+		udbg_printf("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
+		panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
+	}
 
+	node = phb->dn;
 	basep = of_get_property(node, "linux,tce-base", NULL);
 	sizep = of_get_property(node, "linux,tce-size", NULL);
 	if (basep == NULL || sizep == NULL) {
@@ -519,33 +542,18 @@ static void iommu_table_setparms(struct pci_controller *phb,
 		return;
 	}
 
-	tbl->it_base = (unsigned long)__va(*basep);
+	iommu_table_setparms_common(tbl, phb->bus->number, 0, phb->dma_window_base_cur,
+				    phb->dma_window_size, IOMMU_PAGE_SHIFT_4K,
+				    __va(*basep), &iommu_table_pseries_ops);
 
 	if (!is_kdump_kernel())
 		memset((void *)tbl->it_base, 0, *sizep);
 
-	tbl->it_busno = phb->bus->number;
-	tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
-
-	/* Units of tce entries */
-	tbl->it_offset = phb->dma_window_base_cur >> tbl->it_page_shift;
-
-	/* Test if we are going over 2GB of DMA space */
-	if (phb->dma_window_base_cur + phb->dma_window_size > 0x80000000ul) {
-		udbg_printf("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
-		panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
-	}
-
 	phb->dma_window_base_cur += phb->dma_window_size;
-
-	/* Set the tce table size - measured in entries */
-	tbl->it_size = phb->dma_window_size >> tbl->it_page_shift;
-
-	tbl->it_index = 0;
-	tbl->it_blocksize = 16;
-	tbl->it_type = TCE_PCI;
 }
 
+struct iommu_table_ops iommu_table_lpar_multi_ops;
+
 /*
  * iommu_table_setparms_lpar
  *
@@ -557,17 +565,13 @@ static void iommu_table_setparms_lpar(struct pci_controller *phb,
 				      struct iommu_table_group *table_group,
 				      const __be32 *dma_window)
 {
-	unsigned long offset, size;
+	unsigned long offset, size, liobn;
 
-	of_parse_dma_window(dn, dma_window, &tbl->it_index, &offset, &size);
+	of_parse_dma_window(dn, dma_window, &liobn, &offset, &size);
+
+	iommu_table_setparms_common(tbl, phb->bus->number, liobn, offset, size, IOMMU_PAGE_SHIFT_4K, NULL,
+				    &iommu_table_lpar_multi_ops);
 
-	tbl->it_busno = phb->bus->number;
-	tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
-	tbl->it_base   = 0;
-	tbl->it_blocksize  = 16;
-	tbl->it_type = TCE_PCI;
-	tbl->it_offset = offset >> tbl->it_page_shift;
-	tbl->it_size = size >> tbl->it_page_shift;
 
 	table_group->tce32_start = offset;
 	table_group->tce32_size = size;
@@ -647,7 +651,7 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
 	tbl = pci->table_group->tables[0];
 
 	iommu_table_setparms(pci->phb, dn, tbl);
-	tbl->it_ops = &iommu_table_pseries_ops;
+
 	if (!iommu_init_table(tbl, pci->phb->node, 0, 0))
 		panic("Failed to initialize iommu table");
 
@@ -730,7 +734,7 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
 		tbl = ppci->table_group->tables[0];
 		iommu_table_setparms_lpar(ppci->phb, pdn, tbl,
 				ppci->table_group, dma_window);
-		tbl->it_ops = &iommu_table_lpar_multi_ops;
+
 		if (!iommu_init_table(tbl, ppci->phb->node, 0, 0))
 			panic("Failed to initialize iommu table");
 		iommu_register_group(ppci->table_group,
@@ -760,7 +764,7 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
 		PCI_DN(dn)->table_group = iommu_pseries_alloc_group(phb->node);
 		tbl = PCI_DN(dn)->table_group->tables[0];
 		iommu_table_setparms(phb, dn, tbl);
-		tbl->it_ops = &iommu_table_pseries_ops;
+
 		if (!iommu_init_table(tbl, phb->node, 0, 0))
 			panic("Failed to initialize iommu table");
 
@@ -1443,7 +1447,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 		tbl = pci->table_group->tables[0];
 		iommu_table_setparms_lpar(pci->phb, pdn, tbl,
 				pci->table_group, dma_window);
-		tbl->it_ops = &iommu_table_lpar_multi_ops;
+
 		iommu_init_table(tbl, pci->phb->node, 0, 0);
 		iommu_register_group(pci->table_group,
 				pci_domain_nr(pci->phb->bus), 0);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 08/11] powerpc/pseries/iommu: Update remove_dma_window() to accept property name
  2021-07-16  8:27 [PATCH v5 00/11] DDW + Indirect Mapping Leonardo Bras
                   ` (6 preceding siblings ...)
  2021-07-16  8:27 ` [PATCH v5 07/11] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper Leonardo Bras
@ 2021-07-16  8:27 ` Leonardo Bras
  2021-07-20 17:51   ` Frederic Barrat
  2021-07-16  8:27 ` [PATCH v5 09/11] powerpc/pseries/iommu: Find existing DDW with given " Leonardo Bras
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: Leonardo Bras @ 2021-07-16  8:27 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen, Frederic Barrat
  Cc: linuxppc-dev, linux-kernel

Update remove_dma_window() so it can be used to remove DDW with a given
property name.

This enables the creation of new property names for DDW, so we can
have different usage for it, like indirect mapping.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/pseries/iommu.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 108c3dcca686..17c6f4706e76 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -830,31 +830,32 @@ static void remove_dma_window(struct device_node *np, u32 *ddw_avail,
 			np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
 }
 
-static void remove_ddw(struct device_node *np, bool remove_prop)
+static int remove_ddw(struct device_node *np, bool remove_prop, const char *win_name)
 {
 	struct property *win;
 	u32 ddw_avail[DDW_APPLICABLE_SIZE];
 	int ret = 0;
 
+	win = of_find_property(np, win_name, NULL);
+	if (!win)
+		return -EINVAL;
+
 	ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
 					 &ddw_avail[0], DDW_APPLICABLE_SIZE);
 	if (ret)
-		return;
-
-	win = of_find_property(np, DIRECT64_PROPNAME, NULL);
-	if (!win)
-		return;
+		return 0;
 
 	if (win->length >= sizeof(struct dynamic_dma_window_prop))
 		remove_dma_window(np, ddw_avail, win);
 
 	if (!remove_prop)
-		return;
+		return 0;
 
 	ret = of_remove_property(np, win);
 	if (ret)
 		pr_warn("%pOF: failed to remove direct window property: %d\n",
 			np, ret);
+	return 0;
 }
 
 static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, int *window_shift)
@@ -907,7 +908,7 @@ static int find_existing_ddw_windows(void)
 	for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
 		direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
 		if (!direct64 || len < sizeof(*direct64)) {
-			remove_ddw(pdn, true);
+			remove_ddw(pdn, true, DIRECT64_PROPNAME);
 			continue;
 		}
 
@@ -1382,7 +1383,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	kfree(win64);
 
 out_remove_win:
-	remove_ddw(pdn, true);
+	remove_ddw(pdn, true, DIRECT64_PROPNAME);
 
 out_failed:
 	if (default_win_removed)
@@ -1547,7 +1548,7 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
 		 * we have to remove the property when releasing
 		 * the device node.
 		 */
-		remove_ddw(np, false);
+		remove_ddw(np, false, DIRECT64_PROPNAME);
 		if (pci && pci->table_group)
 			iommu_pseries_free_group(pci->table_group,
 					np->full_name);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 09/11] powerpc/pseries/iommu: Find existing DDW with given property name
  2021-07-16  8:27 [PATCH v5 00/11] DDW + Indirect Mapping Leonardo Bras
                   ` (7 preceding siblings ...)
  2021-07-16  8:27 ` [PATCH v5 08/11] powerpc/pseries/iommu: Update remove_dma_window() to accept property name Leonardo Bras
@ 2021-07-16  8:27 ` Leonardo Bras
  2021-07-20 17:52   ` Frederic Barrat
  2021-07-16  8:27 ` [PATCH v5 10/11] powerpc/pseries/iommu: Make use of DDW for indirect mapping Leonardo Bras
  2021-07-16  8:27 ` [PATCH v5 11/11] powerpc/pseries/iommu: Rename "direct window" to "dma window" Leonardo Bras
  10 siblings, 1 reply; 37+ messages in thread
From: Leonardo Bras @ 2021-07-16  8:27 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen, Frederic Barrat
  Cc: linuxppc-dev, linux-kernel

At the moment pseries stores information about created directly mapped
DDW window in DIRECT64_PROPNAME.

With the objective of implementing indirect DMA mapping with DDW, it's
necessary to have another propriety name to make sure kexec'ing into older
kernels does not break, as it would if we reuse DIRECT64_PROPNAME.

In order to have this, find_existing_ddw_windows() needs to be able to
look for different property names.

Extract find_existing_ddw_windows() into find_existing_ddw_windows_named()
and calls it with current property name.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/pseries/iommu.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 17c6f4706e76..22d251e15b61 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -895,24 +895,21 @@ static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
 	return window;
 }
 
-static int find_existing_ddw_windows(void)
+static void find_existing_ddw_windows_named(const char *name)
 {
 	int len;
 	struct device_node *pdn;
 	struct direct_window *window;
-	const struct dynamic_dma_window_prop *direct64;
-
-	if (!firmware_has_feature(FW_FEATURE_LPAR))
-		return 0;
+	const struct dynamic_dma_window_prop *dma64;
 
-	for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
-		direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
-		if (!direct64 || len < sizeof(*direct64)) {
-			remove_ddw(pdn, true, DIRECT64_PROPNAME);
+	for_each_node_with_property(pdn, name) {
+		dma64 = of_get_property(pdn, name, &len);
+		if (!dma64 || len < sizeof(*dma64)) {
+			remove_ddw(pdn, true, name);
 			continue;
 		}
 
-		window = ddw_list_new_entry(pdn, direct64);
+		window = ddw_list_new_entry(pdn, dma64);
 		if (!window)
 			break;
 
@@ -920,6 +917,14 @@ static int find_existing_ddw_windows(void)
 		list_add(&window->list, &direct_window_list);
 		spin_unlock(&direct_window_list_lock);
 	}
+}
+
+static int find_existing_ddw_windows(void)
+{
+	if (!firmware_has_feature(FW_FEATURE_LPAR))
+		return 0;
+
+	find_existing_ddw_windows_named(DIRECT64_PROPNAME);
 
 	return 0;
 }
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 10/11] powerpc/pseries/iommu: Make use of DDW for indirect mapping
  2021-07-16  8:27 [PATCH v5 00/11] DDW + Indirect Mapping Leonardo Bras
                   ` (8 preceding siblings ...)
  2021-07-16  8:27 ` [PATCH v5 09/11] powerpc/pseries/iommu: Find existing DDW with given " Leonardo Bras
@ 2021-07-16  8:27 ` Leonardo Bras
  2021-07-20 18:12   ` Frederic Barrat
  2021-07-16  8:27 ` [PATCH v5 11/11] powerpc/pseries/iommu: Rename "direct window" to "dma window" Leonardo Bras
  10 siblings, 1 reply; 37+ messages in thread
From: Leonardo Bras @ 2021-07-16  8:27 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen, Frederic Barrat
  Cc: linuxppc-dev, linux-kernel

So far it's assumed possible to map the guest RAM 1:1 to the bus, which
works with a small number of devices. SRIOV changes it as the user can
configure hundreds VFs and since phyp preallocates TCEs and does not
allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
per a PE to limit waste of physical pages.

As of today, if the assumed direct mapping is not possible, DDW creation
is skipped and the default DMA window "ibm,dma-window" is used instead.

By using DDW, indirect mapping  can get more TCEs than available for the
default DMA window, and also get access to using much larger pagesizes
(16MB as implemented in qemu vs 4k from default DMA window), causing a
significant increase on the maximum amount of memory that can be IOMMU
mapped at the same time.

Indirect mapping will only be used if direct mapping is not a
possibility.

For indirect mapping, it's necessary to re-create the iommu_table with
the new DMA window parameters, so iommu_alloc() can use it.

Removing the default DMA window for using DDW with indirect mapping
is only allowed if there is no current IOMMU memory allocated in
the iommu_table. enable_ddw() is aborted otherwise.

Even though there won't be both direct and indirect mappings at the
same time, we can't reuse the DIRECT64_PROPNAME property name, or else
an older kexec()ed kernel can assume direct mapping, and skip
iommu_alloc(), causing undesirable behavior.
So a new property name DMA64_PROPNAME "linux,dma64-ddr-window-info"
was created to represent a DDW that does not allow direct mapping.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 87 +++++++++++++++++++++-----
 1 file changed, 72 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 22d251e15b61..a67e71c49aeb 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -375,6 +375,7 @@ static DEFINE_SPINLOCK(direct_window_list_lock);
 /* protects initializing window twice for same device */
 static DEFINE_MUTEX(direct_window_init_mutex);
 #define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
+#define DMA64_PROPNAME "linux,dma64-ddr-window-info"
 
 static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
 					unsigned long num_pfn, const void *arg)
@@ -925,6 +926,7 @@ static int find_existing_ddw_windows(void)
 		return 0;
 
 	find_existing_ddw_windows_named(DIRECT64_PROPNAME);
+	find_existing_ddw_windows_named(DMA64_PROPNAME);
 
 	return 0;
 }
@@ -1211,14 +1213,17 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	struct ddw_create_response create;
 	int page_shift;
 	u64 win_addr;
+	const char *win_name;
 	struct device_node *dn;
 	u32 ddw_avail[DDW_APPLICABLE_SIZE];
 	struct direct_window *window;
 	struct property *win64;
 	bool ddw_enabled = false;
 	struct failed_ddw_pdn *fpdn;
-	bool default_win_removed = false;
+	bool default_win_removed = false, direct_mapping = false;
 	bool pmem_present;
+	struct pci_dn *pci = PCI_DN(pdn);
+	struct iommu_table *tbl = pci->table_group->tables[0];
 
 	dn = of_find_node_by_type(NULL, "ibm,pmemory");
 	pmem_present = dn != NULL;
@@ -1227,6 +1232,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	mutex_lock(&direct_window_init_mutex);
 
 	if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &len)) {
+		direct_mapping = (len >= max_ram_len);
 		ddw_enabled = true;
 		goto out_unlock;
 	}
@@ -1307,8 +1313,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 			  query.page_size);
 		goto out_failed;
 	}
-	/* verify the window * number of ptes will map the partition */
-	/* check largest block * page size > max memory hotplug addr */
+
 	/*
 	 * The "ibm,pmemory" can appear anywhere in the address space.
 	 * Assuming it is still backed by page structs, try MAX_PHYSMEM_BITS
@@ -1324,13 +1329,25 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 			dev_info(&dev->dev, "Skipping ibm,pmemory");
 	}
 
+	/* check if the available block * number of ptes will map everything */
 	if (query.largest_available_block < (1ULL << (len - page_shift))) {
 		dev_dbg(&dev->dev,
 			"can't map partition max 0x%llx with %llu %llu-sized pages\n",
 			1ULL << len,
 			query.largest_available_block,
 			1ULL << page_shift);
-		goto out_failed;
+
+		/* DDW + IOMMU on single window may fail if there is any allocation */
+		if (default_win_removed && iommu_table_in_use(tbl)) {
+			dev_dbg(&dev->dev, "current IOMMU table in use, can't be replaced.\n");
+			goto out_failed;
+		}
+
+		len = order_base_2(query.largest_available_block << page_shift);
+		win_name = DMA64_PROPNAME;
+	} else {
+		direct_mapping = true;
+		win_name = DIRECT64_PROPNAME;
 	}
 
 	ret = create_ddw(dev, ddw_avail, &create, page_shift, len);
@@ -1341,8 +1358,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 		  create.liobn, dn);
 
 	win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
-	win64 = ddw_property_create(DIRECT64_PROPNAME, create.liobn, win_addr,
-				    page_shift, len);
+	win64 = ddw_property_create(win_name, create.liobn, win_addr, page_shift, len);
 	if (!win64) {
 		dev_info(&dev->dev,
 			 "couldn't allocate property, property name, or value\n");
@@ -1360,12 +1376,51 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	if (!window)
 		goto out_del_prop;
 
-	ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
-			win64->value, tce_setrange_multi_pSeriesLP_walk);
-	if (ret) {
-		dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
-			 dn, ret);
-		goto out_del_list;
+	if (direct_mapping) {
+		/* DDW maps the whole partition, so enable direct DMA mapping */
+		ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
+					    win64->value, tce_setrange_multi_pSeriesLP_walk);
+		if (ret) {
+			dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
+				 dn, ret);
+			goto out_del_list;
+		}
+	} else {
+		struct iommu_table *newtbl;
+		int i;
+
+		for (i = 0; i < ARRAY_SIZE(pci->phb->mem_resources); i++) {
+			const unsigned long mask = IORESOURCE_MEM_64 | IORESOURCE_MEM;
+
+			/* Look for MMIO32 */
+			if ((pci->phb->mem_resources[i].flags & mask) == IORESOURCE_MEM)
+				break;
+		}
+
+		if (i == ARRAY_SIZE(pci->phb->mem_resources))
+			goto out_del_list;
+
+		/* New table for using DDW instead of the default DMA window */
+		newtbl = iommu_pseries_alloc_table(pci->phb->node);
+		if (!newtbl) {
+			dev_dbg(&dev->dev, "couldn't create new IOMMU table\n");
+			goto out_del_list;
+		}
+
+		iommu_table_setparms_common(newtbl, pci->phb->bus->number, create.liobn, win_addr,
+					    1UL << len, page_shift, NULL, &iommu_table_lpar_multi_ops);
+		iommu_init_table(newtbl, pci->phb->node, pci->phb->mem_resources[i].start,
+				 pci->phb->mem_resources[i].end);
+
+		pci->table_group->tables[1] = newtbl;
+
+		/* Keep default DMA window stuct if removed */
+		if (default_win_removed) {
+			tbl->it_size = 0;
+			kfree(tbl->it_map);
+		}
+
+		set_iommu_table_base(&dev->dev, newtbl);
 	}
 
 	spin_lock(&direct_window_list_lock);
@@ -1408,10 +1463,10 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	 * as RAM, then we failed to create a window to cover persistent
 	 * memory and need to set the DMA limit.
 	 */
-	if (pmem_present && ddw_enabled && (len == max_ram_len))
+	if (pmem_present && ddw_enabled && direct_mapping && len == max_ram_len)
 		dev->dev.bus_dma_limit = dev->dev.archdata.dma_offset + (1ULL << len);
 
-	return ddw_enabled;
+    return ddw_enabled && direct_mapping;
 }
 
 static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
@@ -1553,7 +1608,9 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
 		 * we have to remove the property when releasing
 		 * the device node.
 		 */
-		remove_ddw(np, false, DIRECT64_PROPNAME);
+		if (remove_ddw(np, false, DIRECT64_PROPNAME))
+			remove_ddw(np, false, DMA64_PROPNAME);
+
 		if (pci && pci->table_group)
 			iommu_pseries_free_group(pci->table_group,
 					np->full_name);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 11/11] powerpc/pseries/iommu: Rename "direct window" to "dma window"
  2021-07-16  8:27 [PATCH v5 00/11] DDW + Indirect Mapping Leonardo Bras
                   ` (9 preceding siblings ...)
  2021-07-16  8:27 ` [PATCH v5 10/11] powerpc/pseries/iommu: Make use of DDW for indirect mapping Leonardo Bras
@ 2021-07-16  8:27 ` Leonardo Bras
  2021-07-20 18:12   ` Frederic Barrat
  10 siblings, 1 reply; 37+ messages in thread
From: Leonardo Bras @ 2021-07-16  8:27 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen, Frederic Barrat
  Cc: linuxppc-dev, linux-kernel

A previous change introduced the usage of DDW as a bigger indirect DMA
mapping when the DDW available size does not map the whole partition.

As most of the code that manipulates direct mappings was reused for
indirect mappings, it's necessary to rename all names and debug/info
messages to reflect that it can be used for both kinds of mapping.

This should cause no behavioural change, just adjust naming.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 91 +++++++++++++-------------
 1 file changed, 47 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index a67e71c49aeb..52548dfb8b45 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -349,7 +349,7 @@ struct dynamic_dma_window_prop {
 	__be32	window_shift;	/* ilog2(tce_window_size) */
 };
 
-struct direct_window {
+struct dma_win {
 	struct device_node *device;
 	const struct dynamic_dma_window_prop *prop;
 	struct list_head list;
@@ -369,11 +369,11 @@ struct ddw_create_response {
 	u32 addr_lo;
 };
 
-static LIST_HEAD(direct_window_list);
+static LIST_HEAD(dma_win_list);
 /* prevents races between memory on/offline and window creation */
-static DEFINE_SPINLOCK(direct_window_list_lock);
+static DEFINE_SPINLOCK(dma_win_list_lock);
 /* protects initializing window twice for same device */
-static DEFINE_MUTEX(direct_window_init_mutex);
+static DEFINE_MUTEX(dma_win_init_mutex);
 #define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
 #define DMA64_PROPNAME "linux,dma64-ddr-window-info"
 
@@ -713,7 +713,10 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
 	pr_debug("pci_dma_bus_setup_pSeriesLP: setting up bus %pOF\n",
 		 dn);
 
-	/* Find nearest ibm,dma-window, walking up the device tree */
+	/*
+	 * Find nearest ibm,dma-window (default DMA window), walking up the
+	 * device tree
+	 */
 	for (pdn = dn; pdn != NULL; pdn = pdn->parent) {
 		dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
 		if (dma_window != NULL)
@@ -822,11 +825,11 @@ static void remove_dma_window(struct device_node *np, u32 *ddw_avail,
 
 	ret = rtas_call(ddw_avail[DDW_REMOVE_PE_DMA_WIN], 1, 1, NULL, liobn);
 	if (ret)
-		pr_warn("%pOF: failed to remove direct window: rtas returned "
+		pr_warn("%pOF: failed to remove DMA window: rtas returned "
 			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
 			np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
 	else
-		pr_debug("%pOF: successfully removed direct window: rtas returned "
+		pr_debug("%pOF: successfully removed DMA window: rtas returned "
 			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
 			np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
 }
@@ -854,37 +857,37 @@ static int remove_ddw(struct device_node *np, bool remove_prop, const char *win_
 
 	ret = of_remove_property(np, win);
 	if (ret)
-		pr_warn("%pOF: failed to remove direct window property: %d\n",
+		pr_warn("%pOF: failed to remove DMA window property: %d\n",
 			np, ret);
 	return 0;
 }
 
 static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, int *window_shift)
 {
-	struct direct_window *window;
-	const struct dynamic_dma_window_prop *direct64;
+	struct dma_win *window;
+	const struct dynamic_dma_window_prop *dma64;
 	bool found = false;
 
-	spin_lock(&direct_window_list_lock);
+	spin_lock(&dma_win_list_lock);
 	/* check if we already created a window and dupe that config if so */
-	list_for_each_entry(window, &direct_window_list, list) {
+	list_for_each_entry(window, &dma_win_list, list) {
 		if (window->device == pdn) {
-			direct64 = window->prop;
-			*dma_addr = be64_to_cpu(direct64->dma_base);
-			*window_shift = be32_to_cpu(direct64->window_shift);
+			dma64 = window->prop;
+			*dma_addr = be64_to_cpu(dma64->dma_base);
+			*window_shift = be32_to_cpu(dma64->window_shift);
 			found = true;
 			break;
 		}
 	}
-	spin_unlock(&direct_window_list_lock);
+	spin_unlock(&dma_win_list_lock);
 
 	return found;
 }
 
-static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
-						const struct dynamic_dma_window_prop *dma64)
+static struct dma_win *ddw_list_new_entry(struct device_node *pdn,
+					  const struct dynamic_dma_window_prop *dma64)
 {
-	struct direct_window *window;
+	struct dma_win *window;
 
 	window = kzalloc(sizeof(*window), GFP_KERNEL);
 	if (!window)
@@ -900,7 +903,7 @@ static void find_existing_ddw_windows_named(const char *name)
 {
 	int len;
 	struct device_node *pdn;
-	struct direct_window *window;
+	struct dma_win *window;
 	const struct dynamic_dma_window_prop *dma64;
 
 	for_each_node_with_property(pdn, name) {
@@ -914,9 +917,9 @@ static void find_existing_ddw_windows_named(const char *name)
 		if (!window)
 			break;
 
-		spin_lock(&direct_window_list_lock);
-		list_add(&window->list, &direct_window_list);
-		spin_unlock(&direct_window_list_lock);
+		spin_lock(&dma_win_list_lock);
+		list_add(&window->list, &dma_win_list);
+		spin_unlock(&dma_win_list_lock);
 	}
 }
 
@@ -1216,7 +1219,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	const char *win_name;
 	struct device_node *dn;
 	u32 ddw_avail[DDW_APPLICABLE_SIZE];
-	struct direct_window *window;
+	struct dma_win *window;
 	struct property *win64;
 	bool ddw_enabled = false;
 	struct failed_ddw_pdn *fpdn;
@@ -1229,7 +1232,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	pmem_present = dn != NULL;
 	of_node_put(dn);
 
-	mutex_lock(&direct_window_init_mutex);
+	mutex_lock(&dma_win_init_mutex);
 
 	if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &len)) {
 		direct_mapping = (len >= max_ram_len);
@@ -1309,8 +1312,8 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 
 	page_shift = iommu_get_page_shift(query.page_size);
 	if (!page_shift) {
-		dev_dbg(&dev->dev, "no supported direct page size in mask %x",
-			  query.page_size);
+		dev_dbg(&dev->dev, "no supported page size in mask %x",
+			query.page_size);
 		goto out_failed;
 	}
 
@@ -1367,7 +1370,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 
 	ret = of_add_property(pdn, win64);
 	if (ret) {
-		dev_err(&dev->dev, "unable to add dma window property for %pOF: %d",
+		dev_err(&dev->dev, "unable to add DMA window property for %pOF: %d",
 			pdn, ret);
 		goto out_free_prop;
 	}
@@ -1381,7 +1384,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 		ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
 					    win64->value, tce_setrange_multi_pSeriesLP_walk);
 		if (ret) {
-			dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
+			dev_info(&dev->dev, "failed to map DMA window for %pOF: %d\n",
 				 dn, ret);
 			goto out_del_list;
 		}
@@ -1423,9 +1426,9 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 		set_iommu_table_base(&dev->dev, newtbl);
 	}
 
-	spin_lock(&direct_window_list_lock);
-	list_add(&window->list, &direct_window_list);
-	spin_unlock(&direct_window_list_lock);
+	spin_lock(&dma_win_list_lock);
+	list_add(&window->list, &dma_win_list);
+	spin_unlock(&dma_win_list_lock);
 
 	dev->dev.archdata.dma_offset = win_addr;
 	ddw_enabled = true;
@@ -1456,7 +1459,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	list_add(&fpdn->list, &failed_ddw_pdn_list);
 
 out_unlock:
-	mutex_unlock(&direct_window_init_mutex);
+	mutex_unlock(&dma_win_init_mutex);
 
 	/*
 	 * If we have persistent memory and the window size is only as big
@@ -1554,29 +1557,29 @@ static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask)
 static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
 		void *data)
 {
-	struct direct_window *window;
+	struct dma_win *window;
 	struct memory_notify *arg = data;
 	int ret = 0;
 
 	switch (action) {
 	case MEM_GOING_ONLINE:
-		spin_lock(&direct_window_list_lock);
-		list_for_each_entry(window, &direct_window_list, list) {
+		spin_lock(&dma_win_list_lock);
+		list_for_each_entry(window, &dma_win_list, list) {
 			ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
 					arg->nr_pages, window->prop);
 			/* XXX log error */
 		}
-		spin_unlock(&direct_window_list_lock);
+		spin_unlock(&dma_win_list_lock);
 		break;
 	case MEM_CANCEL_ONLINE:
 	case MEM_OFFLINE:
-		spin_lock(&direct_window_list_lock);
-		list_for_each_entry(window, &direct_window_list, list) {
+		spin_lock(&dma_win_list_lock);
+		list_for_each_entry(window, &dma_win_list, list) {
 			ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
 					arg->nr_pages, window->prop);
 			/* XXX log error */
 		}
-		spin_unlock(&direct_window_list_lock);
+		spin_unlock(&dma_win_list_lock);
 		break;
 	default:
 		break;
@@ -1597,7 +1600,7 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
 	struct of_reconfig_data *rd = data;
 	struct device_node *np = rd->dn;
 	struct pci_dn *pci = PCI_DN(np);
-	struct direct_window *window;
+	struct dma_win *window;
 
 	switch (action) {
 	case OF_RECONFIG_DETACH_NODE:
@@ -1615,15 +1618,15 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
 			iommu_pseries_free_group(pci->table_group,
 					np->full_name);
 
-		spin_lock(&direct_window_list_lock);
-		list_for_each_entry(window, &direct_window_list, list) {
+		spin_lock(&dma_win_list_lock);
+		list_for_each_entry(window, &dma_win_list, list) {
 			if (window->device == np) {
 				list_del(&window->list);
 				kfree(window);
 				break;
 			}
 		}
-		spin_unlock(&direct_window_list_lock);
+		spin_unlock(&dma_win_list_lock);
 		break;
 	default:
 		err = NOTIFY_DONE;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 01/11] powerpc/pseries/iommu: Replace hard-coded page shift
  2021-07-16  8:27 ` [PATCH v5 01/11] powerpc/pseries/iommu: Replace hard-coded page shift Leonardo Bras
@ 2021-07-19 13:48   ` Frederic Barrat
  2021-07-19 18:43     ` Leonardo Brás
  0 siblings, 1 reply; 37+ messages in thread
From: Frederic Barrat @ 2021-07-19 13:48 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel



On 16/07/2021 10:27, Leonardo Bras wrote:
> Some functions assume IOMMU page size can only be 4K (pageshift == 12).
> Update them to accept any page size passed, so we can use 64K pages.
> 
> In the process, some defines like TCE_SHIFT were made obsolete, and then
> removed.
> 
> IODA3 Revision 3.0_prd1 (OpenPowerFoundation), Figures 3.4 and 3.5 show
> a RPN of 52-bit, and considers a 12-bit pageshift, so there should be
> no need of using TCE_RPN_MASK, which masks out any bit after 40 in rpn.
> It's usage removed from tce_build_pSeries(), tce_build_pSeriesLP(), and
> tce_buildmulti_pSeriesLP().
> 
> Most places had a tbl struct, so using tbl->it_page_shift was simple.
> tce_free_pSeriesLP() was a special case, since callers not always have a
> tbl struct, so adding a tceshift parameter seems the right thing to do.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---

FWIW,
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>



>   arch/powerpc/include/asm/tce.h         |  8 ------
>   arch/powerpc/platforms/pseries/iommu.c | 39 +++++++++++++++-----------
>   2 files changed, 23 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
> index db5fc2f2262d..0c34d2756d92 100644
> --- a/arch/powerpc/include/asm/tce.h
> +++ b/arch/powerpc/include/asm/tce.h
> @@ -19,15 +19,7 @@
>   #define TCE_VB			0
>   #define TCE_PCI			1
>   
> -/* TCE page size is 4096 bytes (1 << 12) */
> -
> -#define TCE_SHIFT	12
> -#define TCE_PAGE_SIZE	(1 << TCE_SHIFT)
> -
>   #define TCE_ENTRY_SIZE		8		/* each TCE is 64 bits */
> -
> -#define TCE_RPN_MASK		0xfffffffffful  /* 40-bit RPN (4K pages) */
> -#define TCE_RPN_SHIFT		12
>   #define TCE_VALID		0x800		/* TCE valid */
>   #define TCE_ALLIO		0x400		/* TCE valid for all lpars */
>   #define TCE_PCI_WRITE		0x2		/* write from PCI allowed */
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 0c55b991f665..b1b8d12bab39 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -107,6 +107,8 @@ static int tce_build_pSeries(struct iommu_table *tbl, long index,
>   	u64 proto_tce;
>   	__be64 *tcep;
>   	u64 rpn;
> +	const unsigned long tceshift = tbl->it_page_shift;
> +	const unsigned long pagesize = IOMMU_PAGE_SIZE(tbl);
>   
>   	proto_tce = TCE_PCI_READ; // Read allowed
>   
> @@ -117,10 +119,10 @@ static int tce_build_pSeries(struct iommu_table *tbl, long index,
>   
>   	while (npages--) {
>   		/* can't move this out since we might cross MEMBLOCK boundary */
> -		rpn = __pa(uaddr) >> TCE_SHIFT;
> -		*tcep = cpu_to_be64(proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT);
> +		rpn = __pa(uaddr) >> tceshift;
> +		*tcep = cpu_to_be64(proto_tce | rpn << tceshift);
>   
> -		uaddr += TCE_PAGE_SIZE;
> +		uaddr += pagesize;
>   		tcep++;
>   	}
>   	return 0;
> @@ -146,7 +148,7 @@ static unsigned long tce_get_pseries(struct iommu_table *tbl, long index)
>   	return be64_to_cpu(*tcep);
>   }
>   
> -static void tce_free_pSeriesLP(unsigned long liobn, long, long);
> +static void tce_free_pSeriesLP(unsigned long liobn, long, long, long);
>   static void tce_freemulti_pSeriesLP(struct iommu_table*, long, long);
>   
>   static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
> @@ -166,12 +168,12 @@ static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
>   		proto_tce |= TCE_PCI_WRITE;
>   
>   	while (npages--) {
> -		tce = proto_tce | (rpn & TCE_RPN_MASK) << tceshift;
> +		tce = proto_tce | rpn << tceshift;
>   		rc = plpar_tce_put((u64)liobn, (u64)tcenum << tceshift, tce);
>   
>   		if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
>   			ret = (int)rc;
> -			tce_free_pSeriesLP(liobn, tcenum_start,
> +			tce_free_pSeriesLP(liobn, tcenum_start, tceshift,
>   			                   (npages_start - (npages + 1)));
>   			break;
>   		}
> @@ -205,10 +207,11 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>   	long tcenum_start = tcenum, npages_start = npages;
>   	int ret = 0;
>   	unsigned long flags;
> +	const unsigned long tceshift = tbl->it_page_shift;
>   
>   	if ((npages == 1) || !firmware_has_feature(FW_FEATURE_PUT_TCE_IND)) {
>   		return tce_build_pSeriesLP(tbl->it_index, tcenum,
> -					   tbl->it_page_shift, npages, uaddr,
> +					   tceshift, npages, uaddr,
>   		                           direction, attrs);
>   	}
>   
> @@ -225,13 +228,13 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>   		if (!tcep) {
>   			local_irq_restore(flags);
>   			return tce_build_pSeriesLP(tbl->it_index, tcenum,
> -					tbl->it_page_shift,
> +					tceshift,
>   					npages, uaddr, direction, attrs);
>   		}
>   		__this_cpu_write(tce_page, tcep);
>   	}
>   
> -	rpn = __pa(uaddr) >> TCE_SHIFT;
> +	rpn = __pa(uaddr) >> tceshift;
>   	proto_tce = TCE_PCI_READ;
>   	if (direction != DMA_TO_DEVICE)
>   		proto_tce |= TCE_PCI_WRITE;
> @@ -245,12 +248,12 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>   		limit = min_t(long, npages, 4096/TCE_ENTRY_SIZE);
>   
>   		for (l = 0; l < limit; l++) {
> -			tcep[l] = cpu_to_be64(proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT);
> +			tcep[l] = cpu_to_be64(proto_tce | rpn << tceshift);
>   			rpn++;
>   		}
>   
>   		rc = plpar_tce_put_indirect((u64)tbl->it_index,
> -					    (u64)tcenum << 12,
> +					    (u64)tcenum << tceshift,
>   					    (u64)__pa(tcep),
>   					    limit);
>   
> @@ -277,12 +280,13 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>   	return ret;
>   }
>   
> -static void tce_free_pSeriesLP(unsigned long liobn, long tcenum, long npages)
> +static void tce_free_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
> +			       long npages)
>   {
>   	u64 rc;
>   
>   	while (npages--) {
> -		rc = plpar_tce_put((u64)liobn, (u64)tcenum << 12, 0);
> +		rc = plpar_tce_put((u64)liobn, (u64)tcenum << tceshift, 0);
>   
>   		if (rc && printk_ratelimit()) {
>   			printk("tce_free_pSeriesLP: plpar_tce_put failed. rc=%lld\n", rc);
> @@ -301,9 +305,11 @@ static void tce_freemulti_pSeriesLP(struct iommu_table *tbl, long tcenum, long n
>   	u64 rc;
>   
>   	if (!firmware_has_feature(FW_FEATURE_STUFF_TCE))
> -		return tce_free_pSeriesLP(tbl->it_index, tcenum, npages);
> +		return tce_free_pSeriesLP(tbl->it_index, tcenum,
> +					  tbl->it_page_shift, npages);
>   
> -	rc = plpar_tce_stuff((u64)tbl->it_index, (u64)tcenum << 12, 0, npages);
> +	rc = plpar_tce_stuff((u64)tbl->it_index,
> +			     (u64)tcenum << tbl->it_page_shift, 0, npages);
>   
>   	if (rc && printk_ratelimit()) {
>   		printk("tce_freemulti_pSeriesLP: plpar_tce_stuff failed\n");
> @@ -319,7 +325,8 @@ static unsigned long tce_get_pSeriesLP(struct iommu_table *tbl, long tcenum)
>   	u64 rc;
>   	unsigned long tce_ret;
>   
> -	rc = plpar_tce_get((u64)tbl->it_index, (u64)tcenum << 12, &tce_ret);
> +	rc = plpar_tce_get((u64)tbl->it_index,
> +			   (u64)tcenum << tbl->it_page_shift, &tce_ret);
>   
>   	if (rc && printk_ratelimit()) {
>   		printk("tce_get_pSeriesLP: plpar_tce_get failed. rc=%lld\n", rc);
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 02/11] powerpc/kernel/iommu: Add new iommu_table_in_use() helper
  2021-07-16  8:27 ` [PATCH v5 02/11] powerpc/kernel/iommu: Add new iommu_table_in_use() helper Leonardo Bras
@ 2021-07-19 13:53   ` Frederic Barrat
  2021-07-20  5:38     ` Leonardo Brás
  0 siblings, 1 reply; 37+ messages in thread
From: Frederic Barrat @ 2021-07-19 13:53 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel



On 16/07/2021 10:27, Leonardo Bras wrote:
> @@ -1099,18 +1105,13 @@ int iommu_take_ownership(struct iommu_table *tbl)
>   	for (i = 0; i < tbl->nr_pools; i++)
>   		spin_lock_nest_lock(&tbl->pools[i].lock, &tbl->large_pool.lock);
>   
> -	iommu_table_release_pages(tbl);
> -
> -	if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
> +	if (iommu_table_in_use(tbl)) {
>   		pr_err("iommu_tce: it_map is not empty");
>   		ret = -EBUSY;
> -		/* Undo iommu_table_release_pages, i.e. restore bit#0, etc */
> -		iommu_table_reserve_pages(tbl, tbl->it_reserved_start,
> -				tbl->it_reserved_end);
> -	} else {
> -		memset(tbl->it_map, 0xff, sz);
>   	}
>   
> +	memset(tbl->it_map, 0xff, sz);
> +


So if the table is not empty, we fail (EBUSY) but we now also completely 
overwrite the bitmap. It was in an unexpected state, but we're making it 
worse. Or am I missing something?

   Fred


>   	for (i = 0; i < tbl->nr_pools; i++)
>   		spin_unlock(&tbl->pools[i].lock);
>   	spin_unlock_irqrestore(&tbl->large_pool.lock, flags);

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 03/11] powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper
  2021-07-16  8:27 ` [PATCH v5 03/11] powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper Leonardo Bras
@ 2021-07-19 14:04   ` Frederic Barrat
  2021-07-19 18:47     ` Leonardo Brás
  0 siblings, 1 reply; 37+ messages in thread
From: Frederic Barrat @ 2021-07-19 14:04 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel



On 16/07/2021 10:27, Leonardo Bras wrote:
> Creates a helper to allow allocating a new iommu_table without the need
> to reallocate the iommu_group.
> 
> This will be helpful for replacing the iommu_table for the new DMA window,
> after we remove the old one with iommu_tce_table_put().
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>   arch/powerpc/platforms/pseries/iommu.c | 25 ++++++++++++++-----------
>   1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index b1b8d12bab39..33d82865d6e6 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -53,28 +53,31 @@ enum {
>   	DDW_EXT_QUERY_OUT_SIZE = 2
>   };
>   
> -static struct iommu_table_group *iommu_pseries_alloc_group(int node)
> +static struct iommu_table *iommu_pseries_alloc_table(int node)
>   {
> -	struct iommu_table_group *table_group;
>   	struct iommu_table *tbl;
>   
> -	table_group = kzalloc_node(sizeof(struct iommu_table_group), GFP_KERNEL,
> -			   node);
> -	if (!table_group)
> -		return NULL;
> -
>   	tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL, node);
>   	if (!tbl)
> -		goto free_group;
> +		return NULL;
>   
>   	INIT_LIST_HEAD_RCU(&tbl->it_group_list);
>   	kref_init(&tbl->it_kref);
> +	return tbl;
> +}
>   
> -	table_group->tables[0] = tbl;
> +static struct iommu_table_group *iommu_pseries_alloc_group(int node)
> +{
> +	struct iommu_table_group *table_group;
> +
> +	table_group = kzalloc_node(sizeof(*table_group), GFP_KERNEL, node);
> +	if (!table_group)
> +		return NULL;
>   
> -	return table_group;
> +	table_group->tables[0] = iommu_pseries_alloc_table(node);
> +	if (table_group->tables[0])
> +		return table_group;


Nitpick: for readability, we'd usually expect the error path to be 
detected with the if statement and keep going on the good path, and here 
the code does the opposite. No big deal though, so

Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>


>   
> -free_group:
>   	kfree(table_group);
>   	return NULL;
>   }
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 04/11] powerpc/pseries/iommu: Add ddw_list_new_entry() helper
  2021-07-16  8:27 ` [PATCH v5 04/11] powerpc/pseries/iommu: Add ddw_list_new_entry() helper Leonardo Bras
@ 2021-07-19 14:14   ` Frederic Barrat
  2021-07-19 18:47     ` Leonardo Brás
  0 siblings, 1 reply; 37+ messages in thread
From: Frederic Barrat @ 2021-07-19 14:14 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel



On 16/07/2021 10:27, Leonardo Bras wrote:
> There are two functions creating direct_window_list entries in a
> similar way, so create a ddw_list_new_entry() to avoid duplicity and
> simplify those functions.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---

LGTM
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>


>   arch/powerpc/platforms/pseries/iommu.c | 32 +++++++++++++++++---------
>   1 file changed, 21 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 33d82865d6e6..712d1667144a 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -874,6 +874,21 @@ static u64 find_existing_ddw(struct device_node *pdn, int *window_shift)
>   	return dma_addr;
>   }
>   
> +static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
> +						const struct dynamic_dma_window_prop *dma64)
> +{
> +	struct direct_window *window;
> +
> +	window = kzalloc(sizeof(*window), GFP_KERNEL);
> +	if (!window)
> +		return NULL;
> +
> +	window->device = pdn;
> +	window->prop = dma64;
> +
> +	return window;
> +}
> +
>   static int find_existing_ddw_windows(void)
>   {
>   	int len;
> @@ -886,18 +901,15 @@ static int find_existing_ddw_windows(void)
>   
>   	for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
>   		direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
> -		if (!direct64)
> -			continue;
> -
> -		window = kzalloc(sizeof(*window), GFP_KERNEL);
> -		if (!window || len < sizeof(struct dynamic_dma_window_prop)) {
> -			kfree(window);
> +		if (!direct64 || len < sizeof(*direct64)) {
>   			remove_ddw(pdn, true);
>   			continue;
>   		}
>   
> -		window->device = pdn;
> -		window->prop = direct64;
> +		window = ddw_list_new_entry(pdn, direct64);
> +		if (!window)
> +			break;
> +
>   		spin_lock(&direct_window_list_lock);
>   		list_add(&window->list, &direct_window_list);
>   		spin_unlock(&direct_window_list_lock);
> @@ -1307,7 +1319,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %pOF\n",
>   		  create.liobn, dn);
>   
> -	window = kzalloc(sizeof(*window), GFP_KERNEL);
> +	window = ddw_list_new_entry(pdn, ddwprop);
>   	if (!window)
>   		goto out_clear_window;
>   
> @@ -1326,8 +1338,6 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		goto out_free_window;
>   	}
>   
> -	window->device = pdn;
> -	window->prop = ddwprop;
>   	spin_lock(&direct_window_list_lock);
>   	list_add(&window->list, &direct_window_list);
>   	spin_unlock(&direct_window_list_lock);
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 01/11] powerpc/pseries/iommu: Replace hard-coded page shift
  2021-07-19 13:48   ` Frederic Barrat
@ 2021-07-19 18:43     ` Leonardo Brás
  0 siblings, 0 replies; 37+ messages in thread
From: Leonardo Brás @ 2021-07-19 18:43 UTC (permalink / raw)
  To: Frederic Barrat, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel

On Mon, 2021-07-19 at 15:48 +0200, Frederic Barrat wrote:
> 
> 
> On 16/07/2021 10:27, Leonardo Bras wrote:
> > Some functions assume IOMMU page size can only be 4K (pageshift ==
> > 12).
> > Update them to accept any page size passed, so we can use 64K
> > pages.
> > 
> > In the process, some defines like TCE_SHIFT were made obsolete, and
> > then
> > removed.
> > 
> > IODA3 Revision 3.0_prd1 (OpenPowerFoundation), Figures 3.4 and 3.5
> > show
> > a RPN of 52-bit, and considers a 12-bit pageshift, so there should
> > be
> > no need of using TCE_RPN_MASK, which masks out any bit after 40 in
> > rpn.
> > It's usage removed from tce_build_pSeries(), tce_build_pSeriesLP(),
> > and
> > tce_buildmulti_pSeriesLP().
> > 
> > Most places had a tbl struct, so using tbl->it_page_shift was
> > simple.
> > tce_free_pSeriesLP() was a special case, since callers not always
> > have a
> > tbl struct, so adding a tceshift parameter seems the right thing to
> > do.
> > 
> > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> > Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > ---
> 
> FWIW,
> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
> 

Thanks!


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 03/11] powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper
  2021-07-19 14:04   ` Frederic Barrat
@ 2021-07-19 18:47     ` Leonardo Brás
  0 siblings, 0 replies; 37+ messages in thread
From: Leonardo Brás @ 2021-07-19 18:47 UTC (permalink / raw)
  To: Frederic Barrat, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel

On Mon, 2021-07-19 at 16:04 +0200, Frederic Barrat wrote:
> 
> 
> On 16/07/2021 10:27, Leonardo Bras wrote:
> > Creates a helper to allow allocating a new iommu_table without the
> > need
> > to reallocate the iommu_group.
> > 
> > This will be helpful for replacing the iommu_table for the new DMA
> > window,
> > after we remove the old one with iommu_tce_table_put().
> > 
> > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> > Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > ---
> >   arch/powerpc/platforms/pseries/iommu.c | 25 ++++++++++++++-------
> > ----
> >   1 file changed, 14 insertions(+), 11 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c
> > b/arch/powerpc/platforms/pseries/iommu.c
> > index b1b8d12bab39..33d82865d6e6 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -53,28 +53,31 @@ enum {
> >         DDW_EXT_QUERY_OUT_SIZE = 2
> >   };
> >   
> > -static struct iommu_table_group *iommu_pseries_alloc_group(int
> > node)
> > +static struct iommu_table *iommu_pseries_alloc_table(int node)
> >   {
> > -       struct iommu_table_group *table_group;
> >         struct iommu_table *tbl;
> >   
> > -       table_group = kzalloc_node(sizeof(struct
> > iommu_table_group), GFP_KERNEL,
> > -                          node);
> > -       if (!table_group)
> > -               return NULL;
> > -
> >         tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
> > node);
> >         if (!tbl)
> > -               goto free_group;
> > +               return NULL;
> >   
> >         INIT_LIST_HEAD_RCU(&tbl->it_group_list);
> >         kref_init(&tbl->it_kref);
> > +       return tbl;
> > +}
> >   
> > -       table_group->tables[0] = tbl;
> > +static struct iommu_table_group *iommu_pseries_alloc_group(int
> > node)
> > +{
> > +       struct iommu_table_group *table_group;
> > +
> > +       table_group = kzalloc_node(sizeof(*table_group),
> > GFP_KERNEL, node);
> > +       if (!table_group)
> > +               return NULL;
> >   
> > -       return table_group;
> > +       table_group->tables[0] = iommu_pseries_alloc_table(node);
> > +       if (table_group->tables[0])
> > +               return table_group;
> 
> 
> Nitpick: for readability, we'd usually expect the error path to be 
> detected with the if statement and keep going on the good path, and
> here 
> the code does the opposite. No big deal though, so
> 
> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
> 
> 

Thanks for the tip and review!

Best regards,
Leonardo Bras


> >   
> > -free_group:
> >         kfree(table_group);
> >         return NULL;
> >   }
> > 



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 04/11] powerpc/pseries/iommu: Add ddw_list_new_entry() helper
  2021-07-19 14:14   ` Frederic Barrat
@ 2021-07-19 18:47     ` Leonardo Brás
  0 siblings, 0 replies; 37+ messages in thread
From: Leonardo Brás @ 2021-07-19 18:47 UTC (permalink / raw)
  To: Frederic Barrat, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel

On Mon, 2021-07-19 at 16:14 +0200, Frederic Barrat wrote:
> 
> 
> On 16/07/2021 10:27, Leonardo Bras wrote:
> > There are two functions creating direct_window_list entries in a
> > similar way, so create a ddw_list_new_entry() to avoid duplicity
> > and
> > simplify those functions.
> > 
> > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> > Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > ---
> 
> LGTM
> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
> 


Thanks!


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 02/11] powerpc/kernel/iommu: Add new iommu_table_in_use() helper
  2021-07-19 13:53   ` Frederic Barrat
@ 2021-07-20  5:38     ` Leonardo Brás
  2021-07-20  9:41       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 37+ messages in thread
From: Leonardo Brás @ 2021-07-20  5:38 UTC (permalink / raw)
  To: Frederic Barrat, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel

Hello Fred, thanks for this feedback!

Sorry if I miss anything, this snippet was written for v1 over an year
ago, and I have not taken a look at it ever since.

On Mon, 2021-07-19 at 15:53 +0200, Frederic Barrat wrote:
> 
> 
> On 16/07/2021 10:27, Leonardo Bras wrote:
> > @@ -1099,18 +1105,13 @@ int iommu_take_ownership(struct iommu_table
> > *tbl)
> >         for (i = 0; i < tbl->nr_pools; i++)
> >                 spin_lock_nest_lock(&tbl->pools[i].lock, &tbl-
> > >large_pool.lock);
> >   
> > -       iommu_table_release_pages(tbl);
> > -
> > -       if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
> > +       if (iommu_table_in_use(tbl)) {
> >                 pr_err("iommu_tce: it_map is not empty");
> >                 ret = -EBUSY;
> > -               /* Undo iommu_table_release_pages, i.e. restore
> > bit#0, etc */
> > -               iommu_table_reserve_pages(tbl, tbl-
> > >it_reserved_start,
> > -                               tbl->it_reserved_end);
> > -       } else {
> > -               memset(tbl->it_map, 0xff, sz);
> >         }
> >   
> > +       memset(tbl->it_map, 0xff, sz);
> > +
> 
> 
> So if the table is not empty, we fail (EBUSY) but we now also
> completely 
> overwrite the bitmap. It was in an unexpected state, but we're making
> it 
> worse. Or am I missing something?

IIRC there was a reason to do that at the time, but TBH I don't really
remember it, and by looking at the code right now you seem to be
correct about this causing trouble.

I will send a v6 fixing it soon.
Please review the remaining patches for some issue I may be missing.

Alexey, any comments on that?

> 
>    Fred
> 

Again, thank you for reviewing Fred! 
Best regards,
Leonardo Bras






^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 02/11] powerpc/kernel/iommu: Add new iommu_table_in_use() helper
  2021-07-20  5:38     ` Leonardo Brás
@ 2021-07-20  9:41       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 37+ messages in thread
From: Alexey Kardashevskiy @ 2021-07-20  9:41 UTC (permalink / raw)
  To: Leonardo Brás, Frederic Barrat, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel



On 20/07/2021 15:38, Leonardo Brás wrote:
> Hello Fred, thanks for this feedback!
> 
> Sorry if I miss anything, this snippet was written for v1 over an year
> ago, and I have not taken a look at it ever since.
> 
> On Mon, 2021-07-19 at 15:53 +0200, Frederic Barrat wrote:
>>
>>
>> On 16/07/2021 10:27, Leonardo Bras wrote:
>>> @@ -1099,18 +1105,13 @@ int iommu_take_ownership(struct iommu_table
>>> *tbl)
>>>          for (i = 0; i < tbl->nr_pools; i++)
>>>                  spin_lock_nest_lock(&tbl->pools[i].lock, &tbl-
>>>> large_pool.lock);
>>>    
>>> -       iommu_table_release_pages(tbl);
>>> -
>>> -       if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
>>> +       if (iommu_table_in_use(tbl)) {
>>>                  pr_err("iommu_tce: it_map is not empty");
>>>                  ret = -EBUSY;
>>> -               /* Undo iommu_table_release_pages, i.e. restore
>>> bit#0, etc */
>>> -               iommu_table_reserve_pages(tbl, tbl-
>>>> it_reserved_start,
>>> -                               tbl->it_reserved_end);
>>> -       } else {
>>> -               memset(tbl->it_map, 0xff, sz);
>>>          }
>>>    
>>> +       memset(tbl->it_map, 0xff, sz);
>>> +
>>
>>
>> So if the table is not empty, we fail (EBUSY) but we now also
>> completely
>> overwrite the bitmap. It was in an unexpected state, but we're making
>> it
>> worse. Or am I missing something?
> 
> IIRC there was a reason to do that at the time, but TBH I don't really
> remember it, and by looking at the code right now you seem to be
> correct about this causing trouble.
> 
> I will send a v6 fixing it soon.
> Please review the remaining patches for some issue I may be missing.
> 
> Alexey, any comments on that?


Agree with Fred, this is a bug, EBUSY is not that unexpected :-/ Thanks,


> 
>>
>>     Fred
>>
> 
> Again, thank you for reviewing Fred!
> Best regards,
> Leonardo Bras
> 
> 
> 
> 
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 05/11] powerpc/pseries/iommu: Allow DDW windows starting at 0x00
  2021-07-16  8:27 ` [PATCH v5 05/11] powerpc/pseries/iommu: Allow DDW windows starting at 0x00 Leonardo Bras
@ 2021-07-20 17:44   ` Frederic Barrat
  0 siblings, 0 replies; 37+ messages in thread
From: Frederic Barrat @ 2021-07-20 17:44 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel



On 16/07/2021 10:27, Leonardo Bras wrote:
> enable_ddw() currently returns the address of the DMA window, which is
> considered invalid if has the value 0x00.
> 
> Also, it only considers valid an address returned from find_existing_ddw
> if it's not 0x00.
> 
> Changing this behavior makes sense, given the users of enable_ddw() only
> need to know if direct mapping is possible. It can also allow a DMA window
> starting at 0x00 to be used.
> 
> This will be helpful for using a DDW with indirect mapping, as the window
> address will be different than 0x00, but it will not map the whole
> partition.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---


Looks good to me
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>



>   arch/powerpc/platforms/pseries/iommu.c | 36 +++++++++++++-------------
>   1 file changed, 18 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 712d1667144a..b34b473bbdc1 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -853,25 +853,26 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
>   			np, ret);
>   }
>   
> -static u64 find_existing_ddw(struct device_node *pdn, int *window_shift)
> +static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, int *window_shift)
>   {
>   	struct direct_window *window;
>   	const struct dynamic_dma_window_prop *direct64;
> -	u64 dma_addr = 0;
> +	bool found = false;
>   
>   	spin_lock(&direct_window_list_lock);
>   	/* check if we already created a window and dupe that config if so */
>   	list_for_each_entry(window, &direct_window_list, list) {
>   		if (window->device == pdn) {
>   			direct64 = window->prop;
> -			dma_addr = be64_to_cpu(direct64->dma_base);
> +			*dma_addr = be64_to_cpu(direct64->dma_base);
>   			*window_shift = be32_to_cpu(direct64->window_shift);
> +			found = true;
>   			break;
>   		}
>   	}
>   	spin_unlock(&direct_window_list_lock);
>   
> -	return dma_addr;
> +	return found;
>   }
>   
>   static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
> @@ -1161,20 +1162,20 @@ static int iommu_get_page_shift(u32 query_page_size)
>    * pdn: the parent pe node with the ibm,dma_window property
>    * Future: also check if we can remap the base window for our base page size
>    *
> - * returns the dma offset for use by the direct mapped DMA code.
> + * returns true if can map all pages (direct mapping), false otherwise..
>    */
> -static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> +static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   {
>   	int len = 0, ret;
>   	int max_ram_len = order_base_2(ddw_memory_hotplug_max());
>   	struct ddw_query_response query;
>   	struct ddw_create_response create;
>   	int page_shift;
> -	u64 dma_addr;
>   	struct device_node *dn;
>   	u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   	struct direct_window *window;
>   	struct property *win64;
> +	bool ddw_enabled = false;
>   	struct dynamic_dma_window_prop *ddwprop;
>   	struct failed_ddw_pdn *fpdn;
>   	bool default_win_removed = false;
> @@ -1186,9 +1187,10 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   
>   	mutex_lock(&direct_window_init_mutex);
>   
> -	dma_addr = find_existing_ddw(pdn, &len);
> -	if (dma_addr != 0)
> +	if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &len)) {
> +		ddw_enabled = true;
>   		goto out_unlock;
> +	}
>   
>   	/*
>   	 * If we already went through this for a previous function of
> @@ -1342,7 +1344,8 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	list_add(&window->list, &direct_window_list);
>   	spin_unlock(&direct_window_list_lock);
>   
> -	dma_addr = be64_to_cpu(ddwprop->dma_base);
> +	dev->dev.archdata.dma_offset = be64_to_cpu(ddwprop->dma_base);
> +	ddw_enabled = true;
>   	goto out_unlock;
>   
>   out_free_window:
> @@ -1374,10 +1377,10 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	 * as RAM, then we failed to create a window to cover persistent
>   	 * memory and need to set the DMA limit.
>   	 */
> -	if (pmem_present && dma_addr && (len == max_ram_len))
> -		dev->dev.bus_dma_limit = dma_addr + (1ULL << len);
> +	if (pmem_present && ddw_enabled && (len == max_ram_len))
> +		dev->dev.bus_dma_limit = dev->dev.archdata.dma_offset + (1ULL << len);
>   
> -	return dma_addr;
> +	return ddw_enabled;
>   }
>   
>   static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
> @@ -1456,11 +1459,8 @@ static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask)
>   			break;
>   	}
>   
> -	if (pdn && PCI_DN(pdn)) {
> -		pdev->dev.archdata.dma_offset = enable_ddw(pdev, pdn);
> -		if (pdev->dev.archdata.dma_offset)
> -			return true;
> -	}
> +	if (pdn && PCI_DN(pdn))
> +		return enable_ddw(pdev, pdn);
>   
>   	return false;
>   }
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 06/11] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()
  2021-07-16  8:27 ` [PATCH v5 06/11] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw() Leonardo Bras
@ 2021-07-20 17:49   ` Frederic Barrat
  2021-08-17  5:59     ` Leonardo Brás
  0 siblings, 1 reply; 37+ messages in thread
From: Frederic Barrat @ 2021-07-20 17:49 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel



On 16/07/2021 10:27, Leonardo Bras wrote:
> Code used to create a ddw property that was previously scattered in
> enable_ddw() is now gathered in ddw_property_create(), which deals with
> allocation and filling the property, letting it ready for
> of_property_add(), which now occurs in sequence.
> 
> This created an opportunity to reorganize the second part of enable_ddw():
> 
> Without this patch enable_ddw() does, in order:
> kzalloc() property & members, create_ddw(), fill ddwprop inside property,
> ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory,
> of_add_property(), and list_add().
> 
> With this patch enable_ddw() does, in order:
> create_ddw(), ddw_property_create(), of_add_property(),
> ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory,
> and list_add().
> 
> This change requires of_remove_property() in case anything fails after
> of_add_property(), but we get to do tce_setrange_multi_pSeriesLP_walk
> in all memory, which looks the most expensive operation, only if
> everything else succeeds.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>   arch/powerpc/platforms/pseries/iommu.c | 93 ++++++++++++++++----------
>   1 file changed, 57 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index b34b473bbdc1..7ca79a04fa52 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1153,6 +1153,35 @@ static int iommu_get_page_shift(u32 query_page_size)
>   	return 0;
>   }
>   
> +static struct property *ddw_property_create(const char *propname, u32 liobn, u64 dma_addr,
> +					    u32 page_shift, u32 window_shift)
> +{
> +	struct dynamic_dma_window_prop *ddwprop;
> +	struct property *win64;
> +
> +	win64 = kzalloc(sizeof(*win64), GFP_KERNEL);
> +	if (!win64)
> +		return NULL;
> +
> +	win64->name = kstrdup(propname, GFP_KERNEL);
> +	ddwprop = kzalloc(sizeof(*ddwprop), GFP_KERNEL);
> +	win64->value = ddwprop;
> +	win64->length = sizeof(*ddwprop);
> +	if (!win64->name || !win64->value) {
> +		kfree(win64->name);
> +		kfree(win64->value);
> +		kfree(win64);
> +		return NULL;
> +	}
> +
> +	ddwprop->liobn = cpu_to_be32(liobn);
> +	ddwprop->dma_base = cpu_to_be64(dma_addr);
> +	ddwprop->tce_shift = cpu_to_be32(page_shift);
> +	ddwprop->window_shift = cpu_to_be32(window_shift);
> +
> +	return win64;
> +}
> +
>   /*
>    * If the PE supports dynamic dma windows, and there is space for a table
>    * that can map all pages in a linear offset, then setup such a table,
> @@ -1171,12 +1200,12 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	struct ddw_query_response query;
>   	struct ddw_create_response create;
>   	int page_shift;
> +	u64 win_addr;
>   	struct device_node *dn;
>   	u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   	struct direct_window *window;
>   	struct property *win64;
>   	bool ddw_enabled = false;
> -	struct dynamic_dma_window_prop *ddwprop;
>   	struct failed_ddw_pdn *fpdn;
>   	bool default_win_removed = false;
>   	bool pmem_present;
> @@ -1293,72 +1322,64 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   			1ULL << page_shift);
>   		goto out_failed;
>   	}
> -	win64 = kzalloc(sizeof(struct property), GFP_KERNEL);
> -	if (!win64) {
> -		dev_info(&dev->dev,
> -			"couldn't allocate property for 64bit dma window\n");
> -		goto out_failed;
> -	}
> -	win64->name = kstrdup(DIRECT64_PROPNAME, GFP_KERNEL);
> -	win64->value = ddwprop = kmalloc(sizeof(*ddwprop), GFP_KERNEL);
> -	win64->length = sizeof(*ddwprop);
> -	if (!win64->name || !win64->value) {
> -		dev_info(&dev->dev,
> -			"couldn't allocate property name and value\n");
> -		goto out_free_prop;
> -	}
>   
>   	ret = create_ddw(dev, ddw_avail, &create, page_shift, len);
>   	if (ret != 0)
> -		goto out_free_prop;
> -
> -	ddwprop->liobn = cpu_to_be32(create.liobn);
> -	ddwprop->dma_base = cpu_to_be64(((u64)create.addr_hi << 32) |
> -			create.addr_lo);
> -	ddwprop->tce_shift = cpu_to_be32(page_shift);
> -	ddwprop->window_shift = cpu_to_be32(len);
> +		goto out_failed;
>   
>   	dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %pOF\n",
>   		  create.liobn, dn);
>   
> -	window = ddw_list_new_entry(pdn, ddwprop);
> +	win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
> +	win64 = ddw_property_create(DIRECT64_PROPNAME, create.liobn, win_addr,
> +				    page_shift, len);
> +	if (!win64) {
> +		dev_info(&dev->dev,
> +			 "couldn't allocate property, property name, or value\n");
> +		goto out_remove_win;
> +	}
> +
> +	ret = of_add_property(pdn, win64);
> +	if (ret) {
> +		dev_err(&dev->dev, "unable to add dma window property for %pOF: %d",
> +			pdn, ret);
> +		goto out_free_prop;
> +	}
> +
> +	window = ddw_list_new_entry(pdn, win64->value);
>   	if (!window)
> -		goto out_clear_window;
> +		goto out_del_prop;
>   
>   	ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
>   			win64->value, tce_setrange_multi_pSeriesLP_walk);
>   	if (ret) {
>   		dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
>   			 dn, ret);
> -		goto out_free_window;
> -	}
> -
> -	ret = of_add_property(pdn, win64);
> -	if (ret) {
> -		dev_err(&dev->dev, "unable to add dma window property for %pOF: %d",
> -			 pdn, ret);
> -		goto out_free_window;
> +		goto out_del_list;
>   	}
>   
>   	spin_lock(&direct_window_list_lock);
>   	list_add(&window->list, &direct_window_list);
>   	spin_unlock(&direct_window_list_lock);
>   
> -	dev->dev.archdata.dma_offset = be64_to_cpu(ddwprop->dma_base);
> +	dev->dev.archdata.dma_offset = win_addr;
>   	ddw_enabled = true;
>   	goto out_unlock;
>   
> -out_free_window:
> +out_del_list:
>   	kfree(window);
>   
> -out_clear_window:
> -	remove_ddw(pdn, true);
> +out_del_prop:
> +	of_remove_property(pdn, win64);
>   
>   out_free_prop:
>   	kfree(win64->name);
>   	kfree(win64->value);
>   	kfree(win64);
>   
> +out_remove_win:
> +	remove_ddw(pdn, true);


I believe there's a small problem here. We jump directly to 
out_remove_win if allocating the property failed. Yet, the first thing 
remove_ddw() does is look for the property. So it will never find it and 
the window is never removed by the hypervisor.

   Fred


> +
>   out_failed:
>   	if (default_win_removed)
>   		reset_dma_window(dev, pdn);
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 07/11] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper
  2021-07-16  8:27 ` [PATCH v5 07/11] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper Leonardo Bras
@ 2021-07-20 17:50   ` Frederic Barrat
  0 siblings, 0 replies; 37+ messages in thread
From: Frederic Barrat @ 2021-07-20 17:50 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel



On 16/07/2021 10:27, Leonardo Bras wrote:
> Add a new helper _iommu_table_setparms(), and use it in
> iommu_table_setparms() and iommu_table_setparms_lpar() to avoid duplicated
> code.
> 
> Also, setting tbl->it_ops was happening outsite iommu_table_setparms*(),
> so move it to the new helper. Since we need the iommu_table_ops to be
> declared before used, declare iommu_table_lpar_multi_ops and
> iommu_table_pseries_ops to before their respective iommu_table_setparms*().
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---


Looks good
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>



>   arch/powerpc/platforms/pseries/iommu.c | 72 ++++++++++++++------------
>   1 file changed, 38 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 7ca79a04fa52..108c3dcca686 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -501,6 +501,24 @@ static int tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn,
>   	return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg);
>   }
>   
> +static void iommu_table_setparms_common(struct iommu_table *tbl, unsigned long busno,
> +					unsigned long liobn, unsigned long win_addr,
> +					unsigned long window_size, unsigned long page_shift,
> +					void *base, struct iommu_table_ops *table_ops)
> +{
> +	tbl->it_busno = busno;
> +	tbl->it_index = liobn;
> +	tbl->it_offset = win_addr >> page_shift;
> +	tbl->it_size = window_size >> page_shift;
> +	tbl->it_page_shift = page_shift;
> +	tbl->it_base = (unsigned long)base;
> +	tbl->it_blocksize = 16;
> +	tbl->it_type = TCE_PCI;
> +	tbl->it_ops = table_ops;
> +}
> +
> +struct iommu_table_ops iommu_table_pseries_ops;
> +
>   static void iommu_table_setparms(struct pci_controller *phb,
>   				 struct device_node *dn,
>   				 struct iommu_table *tbl)
> @@ -509,8 +527,13 @@ static void iommu_table_setparms(struct pci_controller *phb,
>   	const unsigned long *basep;
>   	const u32 *sizep;
>   
> -	node = phb->dn;
> +	/* Test if we are going over 2GB of DMA space */
> +	if (phb->dma_window_base_cur + phb->dma_window_size > SZ_2G) {
> +		udbg_printf("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> +		panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> +	}
>   
> +	node = phb->dn;
>   	basep = of_get_property(node, "linux,tce-base", NULL);
>   	sizep = of_get_property(node, "linux,tce-size", NULL);
>   	if (basep == NULL || sizep == NULL) {
> @@ -519,33 +542,18 @@ static void iommu_table_setparms(struct pci_controller *phb,
>   		return;
>   	}
>   
> -	tbl->it_base = (unsigned long)__va(*basep);
> +	iommu_table_setparms_common(tbl, phb->bus->number, 0, phb->dma_window_base_cur,
> +				    phb->dma_window_size, IOMMU_PAGE_SHIFT_4K,
> +				    __va(*basep), &iommu_table_pseries_ops);
>   
>   	if (!is_kdump_kernel())
>   		memset((void *)tbl->it_base, 0, *sizep);
>   
> -	tbl->it_busno = phb->bus->number;
> -	tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
> -
> -	/* Units of tce entries */
> -	tbl->it_offset = phb->dma_window_base_cur >> tbl->it_page_shift;
> -
> -	/* Test if we are going over 2GB of DMA space */
> -	if (phb->dma_window_base_cur + phb->dma_window_size > 0x80000000ul) {
> -		udbg_printf("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> -		panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> -	}
> -
>   	phb->dma_window_base_cur += phb->dma_window_size;
> -
> -	/* Set the tce table size - measured in entries */
> -	tbl->it_size = phb->dma_window_size >> tbl->it_page_shift;
> -
> -	tbl->it_index = 0;
> -	tbl->it_blocksize = 16;
> -	tbl->it_type = TCE_PCI;
>   }
>   
> +struct iommu_table_ops iommu_table_lpar_multi_ops;
> +
>   /*
>    * iommu_table_setparms_lpar
>    *
> @@ -557,17 +565,13 @@ static void iommu_table_setparms_lpar(struct pci_controller *phb,
>   				      struct iommu_table_group *table_group,
>   				      const __be32 *dma_window)
>   {
> -	unsigned long offset, size;
> +	unsigned long offset, size, liobn;
>   
> -	of_parse_dma_window(dn, dma_window, &tbl->it_index, &offset, &size);
> +	of_parse_dma_window(dn, dma_window, &liobn, &offset, &size);
> +
> +	iommu_table_setparms_common(tbl, phb->bus->number, liobn, offset, size, IOMMU_PAGE_SHIFT_4K, NULL,
> +				    &iommu_table_lpar_multi_ops);
>   
> -	tbl->it_busno = phb->bus->number;
> -	tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
> -	tbl->it_base   = 0;
> -	tbl->it_blocksize  = 16;
> -	tbl->it_type = TCE_PCI;
> -	tbl->it_offset = offset >> tbl->it_page_shift;
> -	tbl->it_size = size >> tbl->it_page_shift;
>   
>   	table_group->tce32_start = offset;
>   	table_group->tce32_size = size;
> @@ -647,7 +651,7 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
>   	tbl = pci->table_group->tables[0];
>   
>   	iommu_table_setparms(pci->phb, dn, tbl);
> -	tbl->it_ops = &iommu_table_pseries_ops;
> +
>   	if (!iommu_init_table(tbl, pci->phb->node, 0, 0))
>   		panic("Failed to initialize iommu table");
>   
> @@ -730,7 +734,7 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
>   		tbl = ppci->table_group->tables[0];
>   		iommu_table_setparms_lpar(ppci->phb, pdn, tbl,
>   				ppci->table_group, dma_window);
> -		tbl->it_ops = &iommu_table_lpar_multi_ops;
> +
>   		if (!iommu_init_table(tbl, ppci->phb->node, 0, 0))
>   			panic("Failed to initialize iommu table");
>   		iommu_register_group(ppci->table_group,
> @@ -760,7 +764,7 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
>   		PCI_DN(dn)->table_group = iommu_pseries_alloc_group(phb->node);
>   		tbl = PCI_DN(dn)->table_group->tables[0];
>   		iommu_table_setparms(phb, dn, tbl);
> -		tbl->it_ops = &iommu_table_pseries_ops;
> +
>   		if (!iommu_init_table(tbl, phb->node, 0, 0))
>   			panic("Failed to initialize iommu table");
>   
> @@ -1443,7 +1447,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
>   		tbl = pci->table_group->tables[0];
>   		iommu_table_setparms_lpar(pci->phb, pdn, tbl,
>   				pci->table_group, dma_window);
> -		tbl->it_ops = &iommu_table_lpar_multi_ops;
> +
>   		iommu_init_table(tbl, pci->phb->node, 0, 0);
>   		iommu_register_group(pci->table_group,
>   				pci_domain_nr(pci->phb->bus), 0);
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 08/11] powerpc/pseries/iommu: Update remove_dma_window() to accept property name
  2021-07-16  8:27 ` [PATCH v5 08/11] powerpc/pseries/iommu: Update remove_dma_window() to accept property name Leonardo Bras
@ 2021-07-20 17:51   ` Frederic Barrat
  2021-08-17  5:59     ` Leonardo Brás
  0 siblings, 1 reply; 37+ messages in thread
From: Frederic Barrat @ 2021-07-20 17:51 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel



On 16/07/2021 10:27, Leonardo Bras wrote:
> Update remove_dma_window() so it can be used to remove DDW with a given
> property name.
> 
> This enables the creation of new property names for DDW, so we can
> have different usage for it, like indirect mapping.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>   arch/powerpc/platforms/pseries/iommu.c | 21 +++++++++++----------
>   1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 108c3dcca686..17c6f4706e76 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -830,31 +830,32 @@ static void remove_dma_window(struct device_node *np, u32 *ddw_avail,
>   			np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
>   }
>   
> -static void remove_ddw(struct device_node *np, bool remove_prop)
> +static int remove_ddw(struct device_node *np, bool remove_prop, const char *win_name)
>   {


Why switch to returning an int? None of the callers check it.

   Fred


>   	struct property *win;
>   	u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   	int ret = 0;
>   
> +	win = of_find_property(np, win_name, NULL);
> +	if (!win)
> +		return -EINVAL;
> +
>   	ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
>   					 &ddw_avail[0], DDW_APPLICABLE_SIZE);
>   	if (ret)
> -		return;
> -
> -	win = of_find_property(np, DIRECT64_PROPNAME, NULL);
> -	if (!win)
> -		return;
> +		return 0;
>   
>   	if (win->length >= sizeof(struct dynamic_dma_window_prop))
>   		remove_dma_window(np, ddw_avail, win);
>   
>   	if (!remove_prop)
> -		return;
> +		return 0;
>   
>   	ret = of_remove_property(np, win);
>   	if (ret)
>   		pr_warn("%pOF: failed to remove direct window property: %d\n",
>   			np, ret);
> +	return 0;
>   }
>   
>   static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, int *window_shift)
> @@ -907,7 +908,7 @@ static int find_existing_ddw_windows(void)
>   	for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
>   		direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
>   		if (!direct64 || len < sizeof(*direct64)) {
> -			remove_ddw(pdn, true);
> +			remove_ddw(pdn, true, DIRECT64_PROPNAME);
>   			continue;
>   		}
>   
> @@ -1382,7 +1383,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	kfree(win64);
>   
>   out_remove_win:
> -	remove_ddw(pdn, true);
> +	remove_ddw(pdn, true, DIRECT64_PROPNAME);
>   
>   out_failed:
>   	if (default_win_removed)
> @@ -1547,7 +1548,7 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
>   		 * we have to remove the property when releasing
>   		 * the device node.
>   		 */
> -		remove_ddw(np, false);
> +		remove_ddw(np, false, DIRECT64_PROPNAME);
>   		if (pci && pci->table_group)
>   			iommu_pseries_free_group(pci->table_group,
>   					np->full_name);
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 09/11] powerpc/pseries/iommu: Find existing DDW with given property name
  2021-07-16  8:27 ` [PATCH v5 09/11] powerpc/pseries/iommu: Find existing DDW with given " Leonardo Bras
@ 2021-07-20 17:52   ` Frederic Barrat
  0 siblings, 0 replies; 37+ messages in thread
From: Frederic Barrat @ 2021-07-20 17:52 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel



On 16/07/2021 10:27, Leonardo Bras wrote:
> At the moment pseries stores information about created directly mapped
> DDW window in DIRECT64_PROPNAME.
> 
> With the objective of implementing indirect DMA mapping with DDW, it's
> necessary to have another propriety name to make sure kexec'ing into older
> kernels does not break, as it would if we reuse DIRECT64_PROPNAME.
> 
> In order to have this, find_existing_ddw_windows() needs to be able to
> look for different property names.
> 
> Extract find_existing_ddw_windows() into find_existing_ddw_windows_named()
> and calls it with current property name.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---


Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>




>   arch/powerpc/platforms/pseries/iommu.c | 25 +++++++++++++++----------
>   1 file changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 17c6f4706e76..22d251e15b61 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -895,24 +895,21 @@ static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
>   	return window;
>   }
>   
> -static int find_existing_ddw_windows(void)
> +static void find_existing_ddw_windows_named(const char *name)
>   {
>   	int len;
>   	struct device_node *pdn;
>   	struct direct_window *window;
> -	const struct dynamic_dma_window_prop *direct64;
> -
> -	if (!firmware_has_feature(FW_FEATURE_LPAR))
> -		return 0;
> +	const struct dynamic_dma_window_prop *dma64;
>   
> -	for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
> -		direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
> -		if (!direct64 || len < sizeof(*direct64)) {
> -			remove_ddw(pdn, true, DIRECT64_PROPNAME);
> +	for_each_node_with_property(pdn, name) {
> +		dma64 = of_get_property(pdn, name, &len);
> +		if (!dma64 || len < sizeof(*dma64)) {
> +			remove_ddw(pdn, true, name);
>   			continue;
>   		}
>   
> -		window = ddw_list_new_entry(pdn, direct64);
> +		window = ddw_list_new_entry(pdn, dma64);
>   		if (!window)
>   			break;
>   
> @@ -920,6 +917,14 @@ static int find_existing_ddw_windows(void)
>   		list_add(&window->list, &direct_window_list);
>   		spin_unlock(&direct_window_list_lock);
>   	}
> +}
> +
> +static int find_existing_ddw_windows(void)
> +{
> +	if (!firmware_has_feature(FW_FEATURE_LPAR))
> +		return 0;
> +
> +	find_existing_ddw_windows_named(DIRECT64_PROPNAME);
>   
>   	return 0;
>   }
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 10/11] powerpc/pseries/iommu: Make use of DDW for indirect mapping
  2021-07-16  8:27 ` [PATCH v5 10/11] powerpc/pseries/iommu: Make use of DDW for indirect mapping Leonardo Bras
@ 2021-07-20 18:12   ` Frederic Barrat
  2021-07-21  3:32     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 37+ messages in thread
From: Frederic Barrat @ 2021-07-20 18:12 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel



On 16/07/2021 10:27, Leonardo Bras wrote:
> So far it's assumed possible to map the guest RAM 1:1 to the bus, which
> works with a small number of devices. SRIOV changes it as the user can
> configure hundreds VFs and since phyp preallocates TCEs and does not
> allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
> per a PE to limit waste of physical pages.
> 
> As of today, if the assumed direct mapping is not possible, DDW creation
> is skipped and the default DMA window "ibm,dma-window" is used instead.
> 
> By using DDW, indirect mapping  can get more TCEs than available for the
> default DMA window, and also get access to using much larger pagesizes
> (16MB as implemented in qemu vs 4k from default DMA window), causing a
> significant increase on the maximum amount of memory that can be IOMMU
> mapped at the same time.
> 
> Indirect mapping will only be used if direct mapping is not a
> possibility.
> 
> For indirect mapping, it's necessary to re-create the iommu_table with
> the new DMA window parameters, so iommu_alloc() can use it.
> 
> Removing the default DMA window for using DDW with indirect mapping
> is only allowed if there is no current IOMMU memory allocated in
> the iommu_table. enable_ddw() is aborted otherwise.
> 
> Even though there won't be both direct and indirect mappings at the
> same time, we can't reuse the DIRECT64_PROPNAME property name, or else
> an older kexec()ed kernel can assume direct mapping, and skip
> iommu_alloc(), causing undesirable behavior.
> So a new property name DMA64_PROPNAME "linux,dma64-ddr-window-info"
> was created to represent a DDW that does not allow direct mapping.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>   arch/powerpc/platforms/pseries/iommu.c | 87 +++++++++++++++++++++-----
>   1 file changed, 72 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 22d251e15b61..a67e71c49aeb 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -375,6 +375,7 @@ static DEFINE_SPINLOCK(direct_window_list_lock);
>   /* protects initializing window twice for same device */
>   static DEFINE_MUTEX(direct_window_init_mutex);
>   #define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
> +#define DMA64_PROPNAME "linux,dma64-ddr-window-info"
>   
>   static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
>   					unsigned long num_pfn, const void *arg)
> @@ -925,6 +926,7 @@ static int find_existing_ddw_windows(void)
>   		return 0;
>   
>   	find_existing_ddw_windows_named(DIRECT64_PROPNAME);
> +	find_existing_ddw_windows_named(DMA64_PROPNAME);
>   
>   	return 0;
>   }
> @@ -1211,14 +1213,17 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	struct ddw_create_response create;
>   	int page_shift;
>   	u64 win_addr;
> +	const char *win_name;
>   	struct device_node *dn;
>   	u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   	struct direct_window *window;
>   	struct property *win64;
>   	bool ddw_enabled = false;
>   	struct failed_ddw_pdn *fpdn;
> -	bool default_win_removed = false;
> +	bool default_win_removed = false, direct_mapping = false;
>   	bool pmem_present;
> +	struct pci_dn *pci = PCI_DN(pdn);
> +	struct iommu_table *tbl = pci->table_group->tables[0];
>   
>   	dn = of_find_node_by_type(NULL, "ibm,pmemory");
>   	pmem_present = dn != NULL;
> @@ -1227,6 +1232,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	mutex_lock(&direct_window_init_mutex);
>   
>   	if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &len)) {
> +		direct_mapping = (len >= max_ram_len);
>   		ddw_enabled = true;
>   		goto out_unlock;
>   	}
> @@ -1307,8 +1313,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   			  query.page_size);
>   		goto out_failed;
>   	}
> -	/* verify the window * number of ptes will map the partition */
> -	/* check largest block * page size > max memory hotplug addr */
> +
>   	/*
>   	 * The "ibm,pmemory" can appear anywhere in the address space.
>   	 * Assuming it is still backed by page structs, try MAX_PHYSMEM_BITS
> @@ -1324,13 +1329,25 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   			dev_info(&dev->dev, "Skipping ibm,pmemory");
>   	}
>   
> +	/* check if the available block * number of ptes will map everything */
>   	if (query.largest_available_block < (1ULL << (len - page_shift))) {
>   		dev_dbg(&dev->dev,
>   			"can't map partition max 0x%llx with %llu %llu-sized pages\n",
>   			1ULL << len,
>   			query.largest_available_block,
>   			1ULL << page_shift);
> -		goto out_failed;
> +
> +		/* DDW + IOMMU on single window may fail if there is any allocation */
> +		if (default_win_removed && iommu_table_in_use(tbl)) {
> +			dev_dbg(&dev->dev, "current IOMMU table in use, can't be replaced.\n");
> +			goto out_failed;
> +		}
> +
> +		len = order_base_2(query.largest_available_block << page_shift);
> +		win_name = DMA64_PROPNAME;
> +	} else {
> +		direct_mapping = true;
> +		win_name = DIRECT64_PROPNAME;
>   	}
>   
>   	ret = create_ddw(dev, ddw_avail, &create, page_shift, len);
> @@ -1341,8 +1358,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		  create.liobn, dn);
>   
>   	win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
> -	win64 = ddw_property_create(DIRECT64_PROPNAME, create.liobn, win_addr,
> -				    page_shift, len);
> +	win64 = ddw_property_create(win_name, create.liobn, win_addr, page_shift, len);
>   	if (!win64) {
>   		dev_info(&dev->dev,
>   			 "couldn't allocate property, property name, or value\n");
> @@ -1360,12 +1376,51 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	if (!window)
>   		goto out_del_prop;
>   
> -	ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
> -			win64->value, tce_setrange_multi_pSeriesLP_walk);
> -	if (ret) {
> -		dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
> -			 dn, ret);
> -		goto out_del_list;
> +	if (direct_mapping) {
> +		/* DDW maps the whole partition, so enable direct DMA mapping */
> +		ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
> +					    win64->value, tce_setrange_multi_pSeriesLP_walk);
> +		if (ret) {
> +			dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
> +				 dn, ret);
> +			goto out_del_list;
> +		}
> +	} else {
> +		struct iommu_table *newtbl;
> +		int i;
> +
> +		for (i = 0; i < ARRAY_SIZE(pci->phb->mem_resources); i++) {
> +			const unsigned long mask = IORESOURCE_MEM_64 | IORESOURCE_MEM;
> +
> +			/* Look for MMIO32 */
> +			if ((pci->phb->mem_resources[i].flags & mask) == IORESOURCE_MEM)
> +				break;
> +		}
> +
> +		if (i == ARRAY_SIZE(pci->phb->mem_resources))
> +			goto out_del_list;


So we exit and do nothing if there's no MMIO32 bar?
Isn't the intent just to figure out the MMIO32 area to reserve it when 
init'ing the table? In which case we could default to 0,0

I'm actually not clear why we are reserving this area on pseries.


> +
> +		/* New table for using DDW instead of the default DMA window */
> +		newtbl = iommu_pseries_alloc_table(pci->phb->node);
> +		if (!newtbl) {
> +			dev_dbg(&dev->dev, "couldn't create new IOMMU table\n");
> +			goto out_del_list;
> +		}
> +
> +		iommu_table_setparms_common(newtbl, pci->phb->bus->number, create.liobn, win_addr,
> +					    1UL << len, page_shift, NULL, &iommu_table_lpar_multi_ops);
> +		iommu_init_table(newtbl, pci->phb->node, pci->phb->mem_resources[i].start,
> +				 pci->phb->mem_resources[i].end);
> +
> +		pci->table_group->tables[1] = newtbl;
> +
> +		/* Keep default DMA window stuct if removed */
> +		if (default_win_removed) {
> +			tbl->it_size = 0;
> +			kfree(tbl->it_map);
> +		}
> +
> +		set_iommu_table_base(&dev->dev, newtbl);
>   	}
>   
>   	spin_lock(&direct_window_list_lock);




Somewhere around here, we have:

out_remove_win:
	remove_ddw(pdn, true, DIRECT64_PROPNAME);

We should replace with:
	remove_ddw(pdn, true, win_name);


   Fred



> @@ -1408,10 +1463,10 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	 * as RAM, then we failed to create a window to cover persistent
>   	 * memory and need to set the DMA limit.
>   	 */
> -	if (pmem_present && ddw_enabled && (len == max_ram_len))
> +	if (pmem_present && ddw_enabled && direct_mapping && len == max_ram_len)
>   		dev->dev.bus_dma_limit = dev->dev.archdata.dma_offset + (1ULL << len);
>   
> -	return ddw_enabled;
> +    return ddw_enabled && direct_mapping;
>   }
>   
>   static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
> @@ -1553,7 +1608,9 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
>   		 * we have to remove the property when releasing
>   		 * the device node.
>   		 */
> -		remove_ddw(np, false, DIRECT64_PROPNAME);
> +		if (remove_ddw(np, false, DIRECT64_PROPNAME))
> +			remove_ddw(np, false, DMA64_PROPNAME);
> +
>   		if (pci && pci->table_group)
>   			iommu_pseries_free_group(pci->table_group,
>   					np->full_name);
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 11/11] powerpc/pseries/iommu: Rename "direct window" to "dma window"
  2021-07-16  8:27 ` [PATCH v5 11/11] powerpc/pseries/iommu: Rename "direct window" to "dma window" Leonardo Bras
@ 2021-07-20 18:12   ` Frederic Barrat
  0 siblings, 0 replies; 37+ messages in thread
From: Frederic Barrat @ 2021-07-20 18:12 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel



On 16/07/2021 10:27, Leonardo Bras wrote:
> A previous change introduced the usage of DDW as a bigger indirect DMA
> mapping when the DDW available size does not map the whole partition.
> 
> As most of the code that manipulates direct mappings was reused for
> indirect mappings, it's necessary to rename all names and debug/info
> messages to reflect that it can be used for both kinds of mapping.
> 
> This should cause no behavioural change, just adjust naming.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---


LGTM:
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>



>   arch/powerpc/platforms/pseries/iommu.c | 91 +++++++++++++-------------
>   1 file changed, 47 insertions(+), 44 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index a67e71c49aeb..52548dfb8b45 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -349,7 +349,7 @@ struct dynamic_dma_window_prop {
>   	__be32	window_shift;	/* ilog2(tce_window_size) */
>   };
>   
> -struct direct_window {
> +struct dma_win {
>   	struct device_node *device;
>   	const struct dynamic_dma_window_prop *prop;
>   	struct list_head list;
> @@ -369,11 +369,11 @@ struct ddw_create_response {
>   	u32 addr_lo;
>   };
>   
> -static LIST_HEAD(direct_window_list);
> +static LIST_HEAD(dma_win_list);
>   /* prevents races between memory on/offline and window creation */
> -static DEFINE_SPINLOCK(direct_window_list_lock);
> +static DEFINE_SPINLOCK(dma_win_list_lock);
>   /* protects initializing window twice for same device */
> -static DEFINE_MUTEX(direct_window_init_mutex);
> +static DEFINE_MUTEX(dma_win_init_mutex);
>   #define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
>   #define DMA64_PROPNAME "linux,dma64-ddr-window-info"
>   
> @@ -713,7 +713,10 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
>   	pr_debug("pci_dma_bus_setup_pSeriesLP: setting up bus %pOF\n",
>   		 dn);
>   
> -	/* Find nearest ibm,dma-window, walking up the device tree */
> +	/*
> +	 * Find nearest ibm,dma-window (default DMA window), walking up the
> +	 * device tree
> +	 */
>   	for (pdn = dn; pdn != NULL; pdn = pdn->parent) {
>   		dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
>   		if (dma_window != NULL)
> @@ -822,11 +825,11 @@ static void remove_dma_window(struct device_node *np, u32 *ddw_avail,
>   
>   	ret = rtas_call(ddw_avail[DDW_REMOVE_PE_DMA_WIN], 1, 1, NULL, liobn);
>   	if (ret)
> -		pr_warn("%pOF: failed to remove direct window: rtas returned "
> +		pr_warn("%pOF: failed to remove DMA window: rtas returned "
>   			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
>   			np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
>   	else
> -		pr_debug("%pOF: successfully removed direct window: rtas returned "
> +		pr_debug("%pOF: successfully removed DMA window: rtas returned "
>   			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
>   			np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
>   }
> @@ -854,37 +857,37 @@ static int remove_ddw(struct device_node *np, bool remove_prop, const char *win_
>   
>   	ret = of_remove_property(np, win);
>   	if (ret)
> -		pr_warn("%pOF: failed to remove direct window property: %d\n",
> +		pr_warn("%pOF: failed to remove DMA window property: %d\n",
>   			np, ret);
>   	return 0;
>   }
>   
>   static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, int *window_shift)
>   {
> -	struct direct_window *window;
> -	const struct dynamic_dma_window_prop *direct64;
> +	struct dma_win *window;
> +	const struct dynamic_dma_window_prop *dma64;
>   	bool found = false;
>   
> -	spin_lock(&direct_window_list_lock);
> +	spin_lock(&dma_win_list_lock);
>   	/* check if we already created a window and dupe that config if so */
> -	list_for_each_entry(window, &direct_window_list, list) {
> +	list_for_each_entry(window, &dma_win_list, list) {
>   		if (window->device == pdn) {
> -			direct64 = window->prop;
> -			*dma_addr = be64_to_cpu(direct64->dma_base);
> -			*window_shift = be32_to_cpu(direct64->window_shift);
> +			dma64 = window->prop;
> +			*dma_addr = be64_to_cpu(dma64->dma_base);
> +			*window_shift = be32_to_cpu(dma64->window_shift);
>   			found = true;
>   			break;
>   		}
>   	}
> -	spin_unlock(&direct_window_list_lock);
> +	spin_unlock(&dma_win_list_lock);
>   
>   	return found;
>   }
>   
> -static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
> -						const struct dynamic_dma_window_prop *dma64)
> +static struct dma_win *ddw_list_new_entry(struct device_node *pdn,
> +					  const struct dynamic_dma_window_prop *dma64)
>   {
> -	struct direct_window *window;
> +	struct dma_win *window;
>   
>   	window = kzalloc(sizeof(*window), GFP_KERNEL);
>   	if (!window)
> @@ -900,7 +903,7 @@ static void find_existing_ddw_windows_named(const char *name)
>   {
>   	int len;
>   	struct device_node *pdn;
> -	struct direct_window *window;
> +	struct dma_win *window;
>   	const struct dynamic_dma_window_prop *dma64;
>   
>   	for_each_node_with_property(pdn, name) {
> @@ -914,9 +917,9 @@ static void find_existing_ddw_windows_named(const char *name)
>   		if (!window)
>   			break;
>   
> -		spin_lock(&direct_window_list_lock);
> -		list_add(&window->list, &direct_window_list);
> -		spin_unlock(&direct_window_list_lock);
> +		spin_lock(&dma_win_list_lock);
> +		list_add(&window->list, &dma_win_list);
> +		spin_unlock(&dma_win_list_lock);
>   	}
>   }
>   
> @@ -1216,7 +1219,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	const char *win_name;
>   	struct device_node *dn;
>   	u32 ddw_avail[DDW_APPLICABLE_SIZE];
> -	struct direct_window *window;
> +	struct dma_win *window;
>   	struct property *win64;
>   	bool ddw_enabled = false;
>   	struct failed_ddw_pdn *fpdn;
> @@ -1229,7 +1232,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	pmem_present = dn != NULL;
>   	of_node_put(dn);
>   
> -	mutex_lock(&direct_window_init_mutex);
> +	mutex_lock(&dma_win_init_mutex);
>   
>   	if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &len)) {
>   		direct_mapping = (len >= max_ram_len);
> @@ -1309,8 +1312,8 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   
>   	page_shift = iommu_get_page_shift(query.page_size);
>   	if (!page_shift) {
> -		dev_dbg(&dev->dev, "no supported direct page size in mask %x",
> -			  query.page_size);
> +		dev_dbg(&dev->dev, "no supported page size in mask %x",
> +			query.page_size);
>   		goto out_failed;
>   	}
>   
> @@ -1367,7 +1370,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   
>   	ret = of_add_property(pdn, win64);
>   	if (ret) {
> -		dev_err(&dev->dev, "unable to add dma window property for %pOF: %d",
> +		dev_err(&dev->dev, "unable to add DMA window property for %pOF: %d",
>   			pdn, ret);
>   		goto out_free_prop;
>   	}
> @@ -1381,7 +1384,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
>   					    win64->value, tce_setrange_multi_pSeriesLP_walk);
>   		if (ret) {
> -			dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
> +			dev_info(&dev->dev, "failed to map DMA window for %pOF: %d\n",
>   				 dn, ret);
>   			goto out_del_list;
>   		}
> @@ -1423,9 +1426,9 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		set_iommu_table_base(&dev->dev, newtbl);
>   	}
>   
> -	spin_lock(&direct_window_list_lock);
> -	list_add(&window->list, &direct_window_list);
> -	spin_unlock(&direct_window_list_lock);
> +	spin_lock(&dma_win_list_lock);
> +	list_add(&window->list, &dma_win_list);
> +	spin_unlock(&dma_win_list_lock);
>   
>   	dev->dev.archdata.dma_offset = win_addr;
>   	ddw_enabled = true;
> @@ -1456,7 +1459,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	list_add(&fpdn->list, &failed_ddw_pdn_list);
>   
>   out_unlock:
> -	mutex_unlock(&direct_window_init_mutex);
> +	mutex_unlock(&dma_win_init_mutex);
>   
>   	/*
>   	 * If we have persistent memory and the window size is only as big
> @@ -1554,29 +1557,29 @@ static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask)
>   static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
>   		void *data)
>   {
> -	struct direct_window *window;
> +	struct dma_win *window;
>   	struct memory_notify *arg = data;
>   	int ret = 0;
>   
>   	switch (action) {
>   	case MEM_GOING_ONLINE:
> -		spin_lock(&direct_window_list_lock);
> -		list_for_each_entry(window, &direct_window_list, list) {
> +		spin_lock(&dma_win_list_lock);
> +		list_for_each_entry(window, &dma_win_list, list) {
>   			ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
>   					arg->nr_pages, window->prop);
>   			/* XXX log error */
>   		}
> -		spin_unlock(&direct_window_list_lock);
> +		spin_unlock(&dma_win_list_lock);
>   		break;
>   	case MEM_CANCEL_ONLINE:
>   	case MEM_OFFLINE:
> -		spin_lock(&direct_window_list_lock);
> -		list_for_each_entry(window, &direct_window_list, list) {
> +		spin_lock(&dma_win_list_lock);
> +		list_for_each_entry(window, &dma_win_list, list) {
>   			ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
>   					arg->nr_pages, window->prop);
>   			/* XXX log error */
>   		}
> -		spin_unlock(&direct_window_list_lock);
> +		spin_unlock(&dma_win_list_lock);
>   		break;
>   	default:
>   		break;
> @@ -1597,7 +1600,7 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
>   	struct of_reconfig_data *rd = data;
>   	struct device_node *np = rd->dn;
>   	struct pci_dn *pci = PCI_DN(np);
> -	struct direct_window *window;
> +	struct dma_win *window;
>   
>   	switch (action) {
>   	case OF_RECONFIG_DETACH_NODE:
> @@ -1615,15 +1618,15 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
>   			iommu_pseries_free_group(pci->table_group,
>   					np->full_name);
>   
> -		spin_lock(&direct_window_list_lock);
> -		list_for_each_entry(window, &direct_window_list, list) {
> +		spin_lock(&dma_win_list_lock);
> +		list_for_each_entry(window, &dma_win_list, list) {
>   			if (window->device == np) {
>   				list_del(&window->list);
>   				kfree(window);
>   				break;
>   			}
>   		}
> -		spin_unlock(&direct_window_list_lock);
> +		spin_unlock(&dma_win_list_lock);
>   		break;
>   	default:
>   		err = NOTIFY_DONE;
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 10/11] powerpc/pseries/iommu: Make use of DDW for indirect mapping
  2021-07-20 18:12   ` Frederic Barrat
@ 2021-07-21  3:32     ` Alexey Kardashevskiy
  2021-07-21 15:04       ` Frederic Barrat
  2021-08-17  5:59       ` Leonardo Brás
  0 siblings, 2 replies; 37+ messages in thread
From: Alexey Kardashevskiy @ 2021-07-21  3:32 UTC (permalink / raw)
  To: Frederic Barrat, Leonardo Bras, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel



On 21/07/2021 04:12, Frederic Barrat wrote:
> 
> 
> On 16/07/2021 10:27, Leonardo Bras wrote:
>> So far it's assumed possible to map the guest RAM 1:1 to the bus, which
>> works with a small number of devices. SRIOV changes it as the user can
>> configure hundreds VFs and since phyp preallocates TCEs and does not
>> allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
>> per a PE to limit waste of physical pages.
>>
>> As of today, if the assumed direct mapping is not possible, DDW creation
>> is skipped and the default DMA window "ibm,dma-window" is used instead.
>>
>> By using DDW, indirect mapping  can get more TCEs than available for the
>> default DMA window, and also get access to using much larger pagesizes
>> (16MB as implemented in qemu vs 4k from default DMA window), causing a
>> significant increase on the maximum amount of memory that can be IOMMU
>> mapped at the same time.
>>
>> Indirect mapping will only be used if direct mapping is not a
>> possibility.
>>
>> For indirect mapping, it's necessary to re-create the iommu_table with
>> the new DMA window parameters, so iommu_alloc() can use it.
>>
>> Removing the default DMA window for using DDW with indirect mapping
>> is only allowed if there is no current IOMMU memory allocated in
>> the iommu_table. enable_ddw() is aborted otherwise.
>>
>> Even though there won't be both direct and indirect mappings at the
>> same time, we can't reuse the DIRECT64_PROPNAME property name, or else
>> an older kexec()ed kernel can assume direct mapping, and skip
>> iommu_alloc(), causing undesirable behavior.
>> So a new property name DMA64_PROPNAME "linux,dma64-ddr-window-info"
>> was created to represent a DDW that does not allow direct mapping.
>>
>> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
>> ---
>>   arch/powerpc/platforms/pseries/iommu.c | 87 +++++++++++++++++++++-----
>>   1 file changed, 72 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
>> b/arch/powerpc/platforms/pseries/iommu.c
>> index 22d251e15b61..a67e71c49aeb 100644
>> --- a/arch/powerpc/platforms/pseries/iommu.c
>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>> @@ -375,6 +375,7 @@ static DEFINE_SPINLOCK(direct_window_list_lock);
>>   /* protects initializing window twice for same device */
>>   static DEFINE_MUTEX(direct_window_init_mutex);
>>   #define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
>> +#define DMA64_PROPNAME "linux,dma64-ddr-window-info"
>>   static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
>>                       unsigned long num_pfn, const void *arg)
>> @@ -925,6 +926,7 @@ static int find_existing_ddw_windows(void)
>>           return 0;
>>       find_existing_ddw_windows_named(DIRECT64_PROPNAME);
>> +    find_existing_ddw_windows_named(DMA64_PROPNAME);
>>       return 0;
>>   }
>> @@ -1211,14 +1213,17 @@ static bool enable_ddw(struct pci_dev *dev, 
>> struct device_node *pdn)
>>       struct ddw_create_response create;
>>       int page_shift;
>>       u64 win_addr;
>> +    const char *win_name;
>>       struct device_node *dn;
>>       u32 ddw_avail[DDW_APPLICABLE_SIZE];
>>       struct direct_window *window;
>>       struct property *win64;
>>       bool ddw_enabled = false;
>>       struct failed_ddw_pdn *fpdn;
>> -    bool default_win_removed = false;
>> +    bool default_win_removed = false, direct_mapping = false;
>>       bool pmem_present;
>> +    struct pci_dn *pci = PCI_DN(pdn);
>> +    struct iommu_table *tbl = pci->table_group->tables[0];
>>       dn = of_find_node_by_type(NULL, "ibm,pmemory");
>>       pmem_present = dn != NULL;
>> @@ -1227,6 +1232,7 @@ static bool enable_ddw(struct pci_dev *dev, 
>> struct device_node *pdn)
>>       mutex_lock(&direct_window_init_mutex);
>>       if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &len)) {
>> +        direct_mapping = (len >= max_ram_len);
>>           ddw_enabled = true;
>>           goto out_unlock;
>>       }
>> @@ -1307,8 +1313,7 @@ static bool enable_ddw(struct pci_dev *dev, 
>> struct device_node *pdn)
>>                 query.page_size);
>>           goto out_failed;
>>       }
>> -    /* verify the window * number of ptes will map the partition */
>> -    /* check largest block * page size > max memory hotplug addr */
>> +
>>       /*
>>        * The "ibm,pmemory" can appear anywhere in the address space.
>>        * Assuming it is still backed by page structs, try 
>> MAX_PHYSMEM_BITS
>> @@ -1324,13 +1329,25 @@ static bool enable_ddw(struct pci_dev *dev, 
>> struct device_node *pdn)
>>               dev_info(&dev->dev, "Skipping ibm,pmemory");
>>       }
>> +    /* check if the available block * number of ptes will map 
>> everything */
>>       if (query.largest_available_block < (1ULL << (len - page_shift))) {
>>           dev_dbg(&dev->dev,
>>               "can't map partition max 0x%llx with %llu %llu-sized 
>> pages\n",
>>               1ULL << len,
>>               query.largest_available_block,
>>               1ULL << page_shift);
>> -        goto out_failed;
>> +
>> +        /* DDW + IOMMU on single window may fail if there is any 
>> allocation */
>> +        if (default_win_removed && iommu_table_in_use(tbl)) {
>> +            dev_dbg(&dev->dev, "current IOMMU table in use, can't be 
>> replaced.\n");
>> +            goto out_failed;
>> +        }
>> +
>> +        len = order_base_2(query.largest_available_block << page_shift);
>> +        win_name = DMA64_PROPNAME;
>> +    } else {
>> +        direct_mapping = true;
>> +        win_name = DIRECT64_PROPNAME;
>>       }
>>       ret = create_ddw(dev, ddw_avail, &create, page_shift, len);
>> @@ -1341,8 +1358,7 @@ static bool enable_ddw(struct pci_dev *dev, 
>> struct device_node *pdn)
>>             create.liobn, dn);
>>       win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
>> -    win64 = ddw_property_create(DIRECT64_PROPNAME, create.liobn, 
>> win_addr,
>> -                    page_shift, len);
>> +    win64 = ddw_property_create(win_name, create.liobn, win_addr, 
>> page_shift, len);
>>       if (!win64) {
>>           dev_info(&dev->dev,
>>                "couldn't allocate property, property name, or value\n");
>> @@ -1360,12 +1376,51 @@ static bool enable_ddw(struct pci_dev *dev, 
>> struct device_node *pdn)
>>       if (!window)
>>           goto out_del_prop;
>> -    ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
>> -            win64->value, tce_setrange_multi_pSeriesLP_walk);
>> -    if (ret) {
>> -        dev_info(&dev->dev, "failed to map direct window for %pOF: 
>> %d\n",
>> -             dn, ret);
>> -        goto out_del_list;
>> +    if (direct_mapping) {
>> +        /* DDW maps the whole partition, so enable direct DMA mapping */
>> +        ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> 
>> PAGE_SHIFT,
>> +                        win64->value, 
>> tce_setrange_multi_pSeriesLP_walk);
>> +        if (ret) {
>> +            dev_info(&dev->dev, "failed to map direct window for 
>> %pOF: %d\n",
>> +                 dn, ret);
>> +            goto out_del_list;
>> +        }
>> +    } else {
>> +        struct iommu_table *newtbl;
>> +        int i;
>> +
>> +        for (i = 0; i < ARRAY_SIZE(pci->phb->mem_resources); i++) {
>> +            const unsigned long mask = IORESOURCE_MEM_64 | 
>> IORESOURCE_MEM;
>> +
>> +            /* Look for MMIO32 */
>> +            if ((pci->phb->mem_resources[i].flags & mask) == 
>> IORESOURCE_MEM)
>> +                break;
>> +        }
>> +
>> +        if (i == ARRAY_SIZE(pci->phb->mem_resources))
>> +            goto out_del_list;
> 
> 
> So we exit and do nothing if there's no MMIO32 bar?
> Isn't the intent just to figure out the MMIO32 area to reserve it when 
> init'ing the table? In which case we could default to 0,0
> 
> I'm actually not clear why we are reserving this area on pseries.



If we do not reserve it, then the iommu code will allocate DMA pages 
from there and these addresses are MMIO32 from the kernel pov at least. 
I saw crashes when (I think) a device tried DMAing to the top 2GB of the 
bus space which happened to be a some other device's BAR.


> 
> 
>> +
>> +        /* New table for using DDW instead of the default DMA window */
>> +        newtbl = iommu_pseries_alloc_table(pci->phb->node);
>> +        if (!newtbl) {
>> +            dev_dbg(&dev->dev, "couldn't create new IOMMU table\n");
>> +            goto out_del_list;
>> +        }
>> +
>> +        iommu_table_setparms_common(newtbl, pci->phb->bus->number, 
>> create.liobn, win_addr,
>> +                        1UL << len, page_shift, NULL, 
>> &iommu_table_lpar_multi_ops);
>> +        iommu_init_table(newtbl, pci->phb->node, 
>> pci->phb->mem_resources[i].start,
>> +                 pci->phb->mem_resources[i].end);
>> +
>> +        pci->table_group->tables[1] = newtbl;
>> +
>> +        /* Keep default DMA window stuct if removed */
>> +        if (default_win_removed) {
>> +            tbl->it_size = 0;
>> +            kfree(tbl->it_map);
>> +        }
>> +
>> +        set_iommu_table_base(&dev->dev, newtbl);
>>       }
>>       spin_lock(&direct_window_list_lock);
> 
> 
> 
> 
> Somewhere around here, we have:
> 
> out_remove_win:
>      remove_ddw(pdn, true, DIRECT64_PROPNAME);
> 
> We should replace with:
>      remove_ddw(pdn, true, win_name);


True. Good spotting. Or rework remove_dma_window() to take just a liobn. 
Thanks,

> 
> 
>    Fred
> 
> 
> 
>> @@ -1408,10 +1463,10 @@ static bool enable_ddw(struct pci_dev *dev, 
>> struct device_node *pdn)
>>        * as RAM, then we failed to create a window to cover persistent
>>        * memory and need to set the DMA limit.
>>        */
>> -    if (pmem_present && ddw_enabled && (len == max_ram_len))
>> +    if (pmem_present && ddw_enabled && direct_mapping && len == 
>> max_ram_len)
>>           dev->dev.bus_dma_limit = dev->dev.archdata.dma_offset + 
>> (1ULL << len);
>> -    return ddw_enabled;
>> +    return ddw_enabled && direct_mapping;
>>   }
>>   static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
>> @@ -1553,7 +1608,9 @@ static int iommu_reconfig_notifier(struct 
>> notifier_block *nb, unsigned long acti
>>            * we have to remove the property when releasing
>>            * the device node.
>>            */
>> -        remove_ddw(np, false, DIRECT64_PROPNAME);
>> +        if (remove_ddw(np, false, DIRECT64_PROPNAME))
>> +            remove_ddw(np, false, DMA64_PROPNAME);
>> +
>>           if (pci && pci->table_group)
>>               iommu_pseries_free_group(pci->table_group,
>>                       np->full_name);
>>

-- 
Alexey

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 10/11] powerpc/pseries/iommu: Make use of DDW for indirect mapping
  2021-07-21  3:32     ` Alexey Kardashevskiy
@ 2021-07-21 15:04       ` Frederic Barrat
  2021-07-23  5:34         ` Alexey Kardashevskiy
  2021-08-17  5:59       ` Leonardo Brás
  1 sibling, 1 reply; 37+ messages in thread
From: Frederic Barrat @ 2021-07-21 15:04 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Leonardo Bras, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel



On 21/07/2021 05:32, Alexey Kardashevskiy wrote:
>>> +        struct iommu_table *newtbl;
>>> +        int i;
>>> +
>>> +        for (i = 0; i < ARRAY_SIZE(pci->phb->mem_resources); i++) {
>>> +            const unsigned long mask = IORESOURCE_MEM_64 | 
>>> IORESOURCE_MEM;
>>> +
>>> +            /* Look for MMIO32 */
>>> +            if ((pci->phb->mem_resources[i].flags & mask) == 
>>> IORESOURCE_MEM)
>>> +                break;
>>> +        }
>>> +
>>> +        if (i == ARRAY_SIZE(pci->phb->mem_resources))
>>> +            goto out_del_list;
>>
>>
>> So we exit and do nothing if there's no MMIO32 bar?
>> Isn't the intent just to figure out the MMIO32 area to reserve it when 
>> init'ing the table? In which case we could default to 0,0
>>
>> I'm actually not clear why we are reserving this area on pseries.
> 
> 
> 
> If we do not reserve it, then the iommu code will allocate DMA pages 
> from there and these addresses are MMIO32 from the kernel pov at least. 
> I saw crashes when (I think) a device tried DMAing to the top 2GB of the 
> bus space which happened to be a some other device's BAR.


hmmm... then figuring out the correct range needs more work. We could 
have more than one MMIO32 bar. And they don't have to be adjacent. I 
don't see that we are reserving any range on the initial table though 
(on pseries).

   Fred

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 10/11] powerpc/pseries/iommu: Make use of DDW for indirect mapping
  2021-07-21 15:04       ` Frederic Barrat
@ 2021-07-23  5:34         ` Alexey Kardashevskiy
  2021-08-17  5:59           ` Leonardo Brás
  0 siblings, 1 reply; 37+ messages in thread
From: Alexey Kardashevskiy @ 2021-07-23  5:34 UTC (permalink / raw)
  To: Frederic Barrat, Leonardo Bras, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel



On 22/07/2021 01:04, Frederic Barrat wrote:
> 
> 
> On 21/07/2021 05:32, Alexey Kardashevskiy wrote:
>>>> +        struct iommu_table *newtbl;
>>>> +        int i;
>>>> +
>>>> +        for (i = 0; i < ARRAY_SIZE(pci->phb->mem_resources); i++) {
>>>> +            const unsigned long mask = IORESOURCE_MEM_64 | 
>>>> IORESOURCE_MEM;
>>>> +
>>>> +            /* Look for MMIO32 */
>>>> +            if ((pci->phb->mem_resources[i].flags & mask) == 
>>>> IORESOURCE_MEM)
>>>> +                break;
>>>> +        }
>>>> +
>>>> +        if (i == ARRAY_SIZE(pci->phb->mem_resources))
>>>> +            goto out_del_list;
>>>
>>>
>>> So we exit and do nothing if there's no MMIO32 bar?
>>> Isn't the intent just to figure out the MMIO32 area to reserve it 
>>> when init'ing the table? In which case we could default to 0,0
>>>
>>> I'm actually not clear why we are reserving this area on pseries.
>>
>>
>>
>> If we do not reserve it, then the iommu code will allocate DMA pages 
>> from there and these addresses are MMIO32 from the kernel pov at 
>> least. I saw crashes when (I think) a device tried DMAing to the top 
>> 2GB of the bus space which happened to be a some other device's BAR.
> 
> 
> hmmm... then figuring out the correct range needs more work. We could 
> have more than one MMIO32 bar. And they don't have to be adjacent. 

They all have to be within the MMIO32 window of a PHB and we reserve the 
entire window here.

> I 
> don't see that we are reserving any range on the initial table though 
> (on pseries).
True, we did not need to, as the hypervisor always took care of DMA and 
MMIO32 regions to not overlap.

And in this series we do not (strictly speaking) need this either as 
phyp never allocates more than one window dynamically and that only 
window is always the second one starting from 0x800.0000.0000.0000. It 
is probably my mistake that KVM allows a new window to start from 0 - 
PAPR did not prohibit this explicitly.

And for the KVM case, we do not need to remove the default window as KVM 
can pretty much always allocate as many TCE as the VM wants. But we 
still allow removing the default window and creating a huge one instead 
at 0x0 as this way we can allow 1:1 for every single PCI device even if 
it only allows 48 (or similar but less than 64bit) DMA. Hope this makes 
sense. Thanks,


-- 
Alexey

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 06/11] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()
  2021-07-20 17:49   ` Frederic Barrat
@ 2021-08-17  5:59     ` Leonardo Brás
  0 siblings, 0 replies; 37+ messages in thread
From: Leonardo Brás @ 2021-08-17  5:59 UTC (permalink / raw)
  To: Frederic Barrat, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel

Hello Fred, thanks for this feedback!

On Tue, 2021-07-20 at 19:49 +0200, Frederic Barrat wrote:
> 
> >         kfree(window);
> >   
> > -out_clear_window:
> > -       remove_ddw(pdn, true);
> > +out_del_prop:
> > +       of_remove_property(pdn, win64);
> >   
> >   out_free_prop:
> >         kfree(win64->name);
> >         kfree(win64->value);
> >         kfree(win64);
> >   
> > +out_remove_win:
> > +       remove_ddw(pdn, true);
> 
> 
> I believe there's a small problem here. We jump directly to 
> out_remove_win if allocating the property failed. Yet, the first
> thing 
> remove_ddw() does is look for the property. So it will never find it
> and 
> the window is never removed by the hypervisor.
> 
>    Fred

That makes sense, thanks for catching this one!

What I intended here was just removing the DDW, so I think it should be
ok replacing remove_ddw() by a new helper that only does the rtas-call.

I will send a v6 with this change soon.

> 
> 
> > +
> >   out_failed:
> >         if (default_win_removed)
> >                 reset_dma_window(dev, pdn);
> > 



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 10/11] powerpc/pseries/iommu: Make use of DDW for indirect mapping
  2021-07-21  3:32     ` Alexey Kardashevskiy
  2021-07-21 15:04       ` Frederic Barrat
@ 2021-08-17  5:59       ` Leonardo Brás
  1 sibling, 0 replies; 37+ messages in thread
From: Leonardo Brás @ 2021-08-17  5:59 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Frederic Barrat, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel

Hello Alexey, Fred. Thanks for reviewing!


On Wed, 2021-07-21 at 13:32 +1000, Alexey Kardashevskiy wrote:
> > >     spin_lock(&direct_window_list_lock);
> > 
> > 
> > 
> > 
> > Somewhere around here, we have:
> > 
> > out_remove_win:
> >      remove_ddw(pdn, true, DIRECT64_PROPNAME);
> > 
> > We should replace with:
> >      remove_ddw(pdn, true, win_name);
> 
> 
> True. Good spotting. Or rework remove_dma_window() to take just a
> liobn. 
> Thanks,

Fred,
Good catch! On rebasing the last changes I did miss this one.

Due to reworking v5 06/11, I ended up with a solution that takes a
liobn instead of a win name (as suggested by Alexey), and this one is
no more :)


> 
> > 
> > 
> >    Fred
> > 
> > 
> > 
> > > 
> 

Thanks!


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 08/11] powerpc/pseries/iommu: Update remove_dma_window() to accept property name
  2021-07-20 17:51   ` Frederic Barrat
@ 2021-08-17  5:59     ` Leonardo Brás
  2021-08-17  6:12       ` Leonardo Brás
  0 siblings, 1 reply; 37+ messages in thread
From: Leonardo Brás @ 2021-08-17  5:59 UTC (permalink / raw)
  To: Frederic Barrat, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel

Hello Fred, thanks for the feedback!

On Tue, 2021-07-20 at 19:51 +0200, Frederic Barrat wrote:
> 
> 
> On 16/07/2021 10:27, Leonardo Bras wrote:
> > Update remove_dma_window() so it can be used to remove DDW with a
> > given
> > property name.
> > 
> > This enables the creation of new property names for DDW, so we can
> > have different usage for it, like indirect mapping.
> > 
> > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> > Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > ---
> >   arch/powerpc/platforms/pseries/iommu.c | 21 +++++++++++----------
> >   1 file changed, 11 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c
> > b/arch/powerpc/platforms/pseries/iommu.c
> > index 108c3dcca686..17c6f4706e76 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -830,31 +830,32 @@ static void remove_dma_window(struct
> > device_node *np, u32 *ddw_avail,
> >                         np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN],
> > liobn);
> >   }
> >   
> > -static void remove_ddw(struct device_node *np, bool remove_prop)
> > +static int remove_ddw(struct device_node *np, bool remove_prop,
> > const char *win_name)
> >   {
> 
> 
> Why switch to returning an int? None of the callers check it.

IIRC, in a previous version it did make sense, which is not the case
anymore. I will revert this.

Thanks!

> 
>    Fred



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 10/11] powerpc/pseries/iommu: Make use of DDW for indirect mapping
  2021-07-23  5:34         ` Alexey Kardashevskiy
@ 2021-08-17  5:59           ` Leonardo Brás
  0 siblings, 0 replies; 37+ messages in thread
From: Leonardo Brás @ 2021-08-17  5:59 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Frederic Barrat, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel

Alexey, Fred:

On Fri, 2021-07-23 at 15:34 +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 22/07/2021 01:04, Frederic Barrat wrote:
> > 
> > 
> > On 21/07/2021 05:32, Alexey Kardashevskiy wrote:
> > > > > +        struct iommu_table *newtbl;
> > > > > +        int i;
> > > > > +
> > > > > +        for (i = 0; i < ARRAY_SIZE(pci->phb->mem_resources);
> > > > > i++) {
> > > > > +            const unsigned long mask = IORESOURCE_MEM_64 | 
> > > > > IORESOURCE_MEM;
> > > > > +
> > > > > +            /* Look for MMIO32 */
> > > > > +            if ((pci->phb->mem_resources[i].flags & mask) ==
> > > > > IORESOURCE_MEM)
> > > > > +                break;
> > > > > +        }
> > > > > +
> > > > > +        if (i == ARRAY_SIZE(pci->phb->mem_resources))
> > > > > +            goto out_del_list;
> > > > 
> > > > 
> > > > So we exit and do nothing if there's no MMIO32 bar?
> > > > Isn't the intent just to figure out the MMIO32 area to reserve
> > > > it 
> > > > when init'ing the table? In which case we could default to 0,0
> > > > 
> > > > I'm actually not clear why we are reserving this area on
> > > > pseries.
> > > 
> > > 
> > > 
> > > If we do not reserve it, then the iommu code will allocate DMA
> > > pages 
> > > from there and these addresses are MMIO32 from the kernel pov at 
> > > least. I saw crashes when (I think) a device tried DMAing to the
> > > top 
> > > 2GB of the bus space which happened to be a some other device's
> > > BAR.
> > 
> > 
> > hmmm... then figuring out the correct range needs more work. We
> > could 
> > have more than one MMIO32 bar. And they don't have to be adjacent. 
> 
> They all have to be within the MMIO32 window of a PHB and we reserve
> the 
> entire window here.
> 
> > I 
> > don't see that we are reserving any range on the initial table
> > though 
> > (on pseries).
> True, we did not need to, as the hypervisor always took care of DMA
> and 
> MMIO32 regions to not overlap.
> 
> And in this series we do not (strictly speaking) need this either as 
> phyp never allocates more than one window dynamically and that only 
> window is always the second one starting from 0x800.0000.0000.0000.
> It 
> is probably my mistake that KVM allows a new window to start from 0 -
> PAPR did not prohibit this explicitly.
> 
> And for the KVM case, we do not need to remove the default window as
> KVM 
> can pretty much always allocate as many TCE as the VM wants. But we 
> still allow removing the default window and creating a huge one
> instead 
> at 0x0 as this way we can allow 1:1 for every single PCI device even
> if 
> it only allows 48 (or similar but less than 64bit) DMA. Hope this
> makes 
> sense. Thanks,
> 
> 

Thank you for this discussion, I got to learn a lot!

If I got this, no further change will be necessary, is that correct?

I am testing a v6, and I intend to send it soon.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 08/11] powerpc/pseries/iommu: Update remove_dma_window() to accept property name
  2021-08-17  5:59     ` Leonardo Brás
@ 2021-08-17  6:12       ` Leonardo Brás
  2021-08-24  6:31         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 37+ messages in thread
From: Leonardo Brás @ 2021-08-17  6:12 UTC (permalink / raw)
  To: Frederic Barrat, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Alexey Kardashevskiy, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel

On Tue, 2021-08-17 at 02:59 -0300, Leonardo Brás wrote:
> Hello Fred, thanks for the feedback!
> 
> On Tue, 2021-07-20 at 19:51 +0200, Frederic Barrat wrote:
> > 
> > 
> > On 16/07/2021 10:27, Leonardo Bras wrote:
> > > Update remove_dma_window() so it can be used to remove DDW with a
> > > given
> > > property name.
> > > 
> > > This enables the creation of new property names for DDW, so we
> > > can
> > > have different usage for it, like indirect mapping.
> > > 
> > > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> > > Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > > ---
> > >   arch/powerpc/platforms/pseries/iommu.c | 21 +++++++++++--------
> > > --
> > >   1 file changed, 11 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/platforms/pseries/iommu.c
> > > b/arch/powerpc/platforms/pseries/iommu.c
> > > index 108c3dcca686..17c6f4706e76 100644
> > > --- a/arch/powerpc/platforms/pseries/iommu.c
> > > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > > @@ -830,31 +830,32 @@ static void remove_dma_window(struct
> > > device_node *np, u32 *ddw_avail,
> > >                         np, ret,
> > > ddw_avail[DDW_REMOVE_PE_DMA_WIN],
> > > liobn);
> > >   }
> > >   
> > > -static void remove_ddw(struct device_node *np, bool remove_prop)
> > > +static int remove_ddw(struct device_node *np, bool remove_prop,
> > > const char *win_name)
> > >   {
> > 
> > 
> > Why switch to returning an int? None of the callers check it.
> 
> IIRC, in a previous version it did make sense, which is not the case
> anymore. I will revert this.
> 
> Thanks!

Oh, sorry about that, it is in fact still needed:

It will make sense in patch v5 10/11:
On iommu_reconfig_notifier(), if (action == OF_RECONFIG_DETACH_NODE),
we need to remove a DDW if it exists.

As there may be different window names, it tests for DIRECT64_PROPNAME,
and if it's not found, it tests for DMA64_PROPNAME.

This approach will skip scanning for DMA64_PROPNAME if
DIRECT64_PROPNAME was found, as both may not exist in the same node.
But for this approach to work we need remove_ddw() to return error if
the property is not found.

Does it make sense? or should I just test for both?

Best regards,
Leonardo Bras




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 08/11] powerpc/pseries/iommu: Update remove_dma_window() to accept property name
  2021-08-17  6:12       ` Leonardo Brás
@ 2021-08-24  6:31         ` Alexey Kardashevskiy
  0 siblings, 0 replies; 37+ messages in thread
From: Alexey Kardashevskiy @ 2021-08-24  6:31 UTC (permalink / raw)
  To: Leonardo Brás, Frederic Barrat, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, David Gibson,
	kernel test robot, Nicolin Chen
  Cc: linuxppc-dev, linux-kernel



On 17/08/2021 16:12, Leonardo Brás wrote:
> On Tue, 2021-08-17 at 02:59 -0300, Leonardo Brás wrote:
>> Hello Fred, thanks for the feedback!
>>
>> On Tue, 2021-07-20 at 19:51 +0200, Frederic Barrat wrote:
>>>
>>>
>>> On 16/07/2021 10:27, Leonardo Bras wrote:
>>>> Update remove_dma_window() so it can be used to remove DDW with a
>>>> given
>>>> property name.
>>>>
>>>> This enables the creation of new property names for DDW, so we
>>>> can
>>>> have different usage for it, like indirect mapping.
>>>>
>>>> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
>>>> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> ---
>>>>    arch/powerpc/platforms/pseries/iommu.c | 21 +++++++++++--------
>>>> --
>>>>    1 file changed, 11 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c
>>>> b/arch/powerpc/platforms/pseries/iommu.c
>>>> index 108c3dcca686..17c6f4706e76 100644
>>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>>> @@ -830,31 +830,32 @@ static void remove_dma_window(struct
>>>> device_node *np, u32 *ddw_avail,
>>>>                          np, ret,
>>>> ddw_avail[DDW_REMOVE_PE_DMA_WIN],
>>>> liobn);
>>>>    }
>>>>    
>>>> -static void remove_ddw(struct device_node *np, bool remove_prop)
>>>> +static int remove_ddw(struct device_node *np, bool remove_prop,
>>>> const char *win_name)
>>>>    {
>>>
>>>
>>> Why switch to returning an int? None of the callers check it.
>>
>> IIRC, in a previous version it did make sense, which is not the case
>> anymore. I will revert this.
>>
>> Thanks!
> 
> Oh, sorry about that, it is in fact still needed:


Then you should have added it in 10/11.

> 
> It will make sense in patch v5 10/11:
> On iommu_reconfig_notifier(), if (action == OF_RECONFIG_DETACH_NODE),
> we need to remove a DDW if it exists.
> 
> As there may be different window names, it tests for DIRECT64_PROPNAME,
> and if it's not found, it tests for DMA64_PROPNAME.
> 
> This approach will skip scanning for DMA64_PROPNAME if
> DIRECT64_PROPNAME was found, as both may not exist in the same node.
> But for this approach to work we need remove_ddw() to return error if
> the property is not found.
> 
> Does it make sense? or should I just test for both?

Or you could just try removing both without checking the return code, it 
is one extra of_find_property in very rare code path. Not worth 
reposting though imho. (sorry I was off last week, catching up). Thanks,



-- 
Alexey

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2021-08-24  6:32 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-16  8:27 [PATCH v5 00/11] DDW + Indirect Mapping Leonardo Bras
2021-07-16  8:27 ` [PATCH v5 01/11] powerpc/pseries/iommu: Replace hard-coded page shift Leonardo Bras
2021-07-19 13:48   ` Frederic Barrat
2021-07-19 18:43     ` Leonardo Brás
2021-07-16  8:27 ` [PATCH v5 02/11] powerpc/kernel/iommu: Add new iommu_table_in_use() helper Leonardo Bras
2021-07-19 13:53   ` Frederic Barrat
2021-07-20  5:38     ` Leonardo Brás
2021-07-20  9:41       ` Alexey Kardashevskiy
2021-07-16  8:27 ` [PATCH v5 03/11] powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper Leonardo Bras
2021-07-19 14:04   ` Frederic Barrat
2021-07-19 18:47     ` Leonardo Brás
2021-07-16  8:27 ` [PATCH v5 04/11] powerpc/pseries/iommu: Add ddw_list_new_entry() helper Leonardo Bras
2021-07-19 14:14   ` Frederic Barrat
2021-07-19 18:47     ` Leonardo Brás
2021-07-16  8:27 ` [PATCH v5 05/11] powerpc/pseries/iommu: Allow DDW windows starting at 0x00 Leonardo Bras
2021-07-20 17:44   ` Frederic Barrat
2021-07-16  8:27 ` [PATCH v5 06/11] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw() Leonardo Bras
2021-07-20 17:49   ` Frederic Barrat
2021-08-17  5:59     ` Leonardo Brás
2021-07-16  8:27 ` [PATCH v5 07/11] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper Leonardo Bras
2021-07-20 17:50   ` Frederic Barrat
2021-07-16  8:27 ` [PATCH v5 08/11] powerpc/pseries/iommu: Update remove_dma_window() to accept property name Leonardo Bras
2021-07-20 17:51   ` Frederic Barrat
2021-08-17  5:59     ` Leonardo Brás
2021-08-17  6:12       ` Leonardo Brás
2021-08-24  6:31         ` Alexey Kardashevskiy
2021-07-16  8:27 ` [PATCH v5 09/11] powerpc/pseries/iommu: Find existing DDW with given " Leonardo Bras
2021-07-20 17:52   ` Frederic Barrat
2021-07-16  8:27 ` [PATCH v5 10/11] powerpc/pseries/iommu: Make use of DDW for indirect mapping Leonardo Bras
2021-07-20 18:12   ` Frederic Barrat
2021-07-21  3:32     ` Alexey Kardashevskiy
2021-07-21 15:04       ` Frederic Barrat
2021-07-23  5:34         ` Alexey Kardashevskiy
2021-08-17  5:59           ` Leonardo Brás
2021-08-17  5:59       ` Leonardo Brás
2021-07-16  8:27 ` [PATCH v5 11/11] powerpc/pseries/iommu: Rename "direct window" to "dma window" Leonardo Bras
2021-07-20 18:12   ` Frederic Barrat

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).