linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH v2 00/14] DDW Indirect Mapping
@ 2020-09-11 17:07 Leonardo Bras
  2020-09-11 17:07 ` [PATCH v2 01/14] powerpc/pseries/iommu: Replace hard-coded page shift Leonardo Bras
                   ` (13 more replies)
  0 siblings, 14 replies; 40+ messages in thread
From: Leonardo Bras @ 2020-09-11 17:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Joel Stanley, Christophe Leroy,
	Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, 

##
This patchset is based on top of:
https://github.com/linuxppc/linux/tree/next
that already contains
http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=194179&state=%2A&archive=both
##

So far it's assumed possible to map the guest RAM 1:1 to the bus, which
works with a small number of devices. SRIOV changes this, as the user can
configure hundreds of VFs, and since phyp preallocates TCEs and does not
allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
per PE to limit the waste of physical pages.
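
As a rough illustration of that cost (numbers are only an example, not
measured in this series):
- 1:1 mapping of a 64GB partition with 64K IOMMU pages
- 64GB / 64K = 1M TCEs per PE
- 1M TCEs x 8 bytes (TCE_ENTRY_SIZE) = 8MB of preallocated TCE table per PE
- with hundreds of VFs, each with its own PE, this adds up quickly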

As of today, if the assumed direct mapping is not possible, DDW creation
is skipped and the default DMA window "ibm,dma-window" is used instead.

Using the DDW instead of the default DMA window may allow expanding the
amount of memory that can be DMA-mapped, given that the number of pages
(TCEs) may stay the same while the default DMA window offers only 4k pages
and DDW may offer larger pages (64k).
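
For example (illustrative numbers only): a window backed by 512k TCEs maps
- 512k x 4k  =  2GB with the default window's 4k pages
- 512k x 64k = 32GB with 64k DDW pages
so the same number of TCEs covers 16x more memory.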

Patch #1 replaces hard-coded 4K page size with a variable containing the
correct page size for the window.

Patch #2 makes sure IOMMU_PAGE_SIZE() <= PAGE_SIZE, to avoid mapping
pages from other processes.

Patch #3 saves TCEs for small allocations when
IOMMU_PAGE_SIZE() < PAGE_SIZE.

Patch #4 lets small allocations use the largepool if there is no more space
left in the other pools, thus allowing the whole DMA window to be used by
smaller allocations.

Patch #5 introduces iommu_table_in_use(), and replaces manual bit-field
checking where it's used. It will be used for aborting enable_ddw() if
there is any current IOMMU allocation and we are trying single-window
indirect mapping.

Patch #6 introduces iommu_pseries_alloc_table() that will be helpful
when indirect mapping needs to replace the iommu_table.

Patch #7 adds a helper for adding DDWs to the list.

Patch #8 refactors enable_ddw() so it returns whether direct mapping is
possible, instead of the DMA offset. This helps the next patches on
indirect DMA mapping and also allows DMA windows starting at 0x00.

Patch #9 brings a new helper to simplify enable_ddw(), allowing
some reorganization for introducing indirect mapping DDW.

Patch #10 adds a new helper, _iommu_table_setparms(), and uses it in the
other *setparms*() functions to fill the iommu_table. It will also be used
for creating a new iommu_table for indirect mapping.

Patch #11 updates remove_dma_window() to accept different property names,
so we can introduce a new property for indirect mapping.

Patch #12 extracts find_existing_ddw_windows() into
find_existing_ddw_windows_named(), and calls it with the property name.
This will be useful when the property for indirect mapping is created,
so we can search the device-tree for both properties.

Patch #13:
Instead of destroying the created DDW if it doesn't map the whole
partition, make use of it instead of the default DMA window as it improves
performance. Also, update the iommu_table and re-generate the pools.
It introduces a new property name for DDW with indirect DMA mapping.

Patch #14:
Does some renaming of 'direct window' to 'dma window', given the DDW
created can now also be used in indirect mapping if direct mapping is not
available.

All patches were tested on an LPAR with an Ethernet VF:
4005:01:00.0 Ethernet controller: Mellanox Technologies MT27700 Family
[ConnectX-4 Virtual Function]

The patchset was tested with a 64GB DDW, which did not map the whole
partition (128GB).

Leonardo Bras (14):
  powerpc/pseries/iommu: Replace hard-coded page shift
  powerpc/pseries/iommu: Makes sure IOMMU_PAGE_SIZE <= PAGE_SIZE
  powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs
  powerpc/kernel/iommu: Use largepool as a last resort when !largealloc
  powerpc/kernel/iommu: Add new iommu_table_in_use() helper
  powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper
  powerpc/pseries/iommu: Add ddw_list_new_entry() helper
  powerpc/pseries/iommu: Allow DDW windows starting at 0x00
  powerpc/pseries/iommu: Add ddw_property_create() and refactor
    enable_ddw()
  powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new
    helper
  powerpc/pseries/iommu: Update remove_dma_window() to accept property
    name
  powerpc/pseries/iommu: Find existing DDW with given property name
  powerpc/pseries/iommu: Make use of DDW for indirect mapping
  powerpc/pseries/iommu: Rename "direct window" to "dma window"

 arch/powerpc/include/asm/iommu.h       |   1 +
 arch/powerpc/include/asm/tce.h         |   8 -
 arch/powerpc/kernel/iommu.c            |  86 ++--
 arch/powerpc/platforms/pseries/iommu.c | 648 ++++++++++++++-----------
 4 files changed, 417 insertions(+), 326 deletions(-)

-- 
2.25.4



* [PATCH v2 01/14] powerpc/pseries/iommu: Replace hard-coded page shift
  2020-09-11 17:07 [PATCH v2 00/14] DDW Indirect Mapping Leonardo Bras
@ 2020-09-11 17:07 ` Leonardo Bras
  2020-09-29  3:56   ` Alexey Kardashevskiy
  2020-09-11 17:07 ` [PATCH v2 02/14] powerpc/pseries/iommu: Makes sure IOMMU_PAGE_SIZE <= PAGE_SIZE Leonardo Bras
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras @ 2020-09-11 17:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Joel Stanley, Christophe Leroy,
	Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, 

Some functions assume IOMMU page size can only be 4K (pageshift == 12).
Update them to accept any page size passed, so we can use 64K pages.

In the process, some defines like TCE_SHIFT were made obsolete, and then
removed.

IODA3 Revision 3.0_prd1 (OpenPowerFoundation), Figures 3.4 and 3.5, show
an RPN of 52 bits and consider a 12-bit pageshift, so there should be
no need to use TCE_RPN_MASK, which masks out any bit after 40 in the rpn.
Its usage is removed from tce_build_pSeries(), tce_build_pSeriesLP(), and
tce_buildmulti_pSeriesLP().

Most places had a tbl struct, so using tbl->it_page_shift was simple.
tce_free_pSeriesLP() was a special case, since callers do not always have
a tbl struct, so adding a tceshift parameter seems the right thing to do.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/include/asm/tce.h         |  8 ------
 arch/powerpc/platforms/pseries/iommu.c | 39 +++++++++++++++-----------
 2 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
index db5fc2f2262d..0c34d2756d92 100644
--- a/arch/powerpc/include/asm/tce.h
+++ b/arch/powerpc/include/asm/tce.h
@@ -19,15 +19,7 @@
 #define TCE_VB			0
 #define TCE_PCI			1
 
-/* TCE page size is 4096 bytes (1 << 12) */
-
-#define TCE_SHIFT	12
-#define TCE_PAGE_SIZE	(1 << TCE_SHIFT)
-
 #define TCE_ENTRY_SIZE		8		/* each TCE is 64 bits */
-
-#define TCE_RPN_MASK		0xfffffffffful  /* 40-bit RPN (4K pages) */
-#define TCE_RPN_SHIFT		12
 #define TCE_VALID		0x800		/* TCE valid */
 #define TCE_ALLIO		0x400		/* TCE valid for all lpars */
 #define TCE_PCI_WRITE		0x2		/* write from PCI allowed */
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index e4198700ed1a..9db3927607a4 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -107,6 +107,8 @@ static int tce_build_pSeries(struct iommu_table *tbl, long index,
 	u64 proto_tce;
 	__be64 *tcep;
 	u64 rpn;
+	const unsigned long tceshift = tbl->it_page_shift;
+	const unsigned long pagesize = IOMMU_PAGE_SIZE(tbl);
 
 	proto_tce = TCE_PCI_READ; // Read allowed
 
@@ -117,10 +119,10 @@ static int tce_build_pSeries(struct iommu_table *tbl, long index,
 
 	while (npages--) {
 		/* can't move this out since we might cross MEMBLOCK boundary */
-		rpn = __pa(uaddr) >> TCE_SHIFT;
-		*tcep = cpu_to_be64(proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT);
+		rpn = __pa(uaddr) >> tceshift;
+		*tcep = cpu_to_be64(proto_tce | rpn << tceshift);
 
-		uaddr += TCE_PAGE_SIZE;
+		uaddr += pagesize;
 		tcep++;
 	}
 	return 0;
@@ -146,7 +148,7 @@ static unsigned long tce_get_pseries(struct iommu_table *tbl, long index)
 	return be64_to_cpu(*tcep);
 }
 
-static void tce_free_pSeriesLP(unsigned long liobn, long, long);
+static void tce_free_pSeriesLP(unsigned long liobn, long, long, long);
 static void tce_freemulti_pSeriesLP(struct iommu_table*, long, long);
 
 static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
@@ -166,12 +168,12 @@ static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
 		proto_tce |= TCE_PCI_WRITE;
 
 	while (npages--) {
-		tce = proto_tce | (rpn & TCE_RPN_MASK) << tceshift;
+		tce = proto_tce | rpn << tceshift;
 		rc = plpar_tce_put((u64)liobn, (u64)tcenum << tceshift, tce);
 
 		if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
 			ret = (int)rc;
-			tce_free_pSeriesLP(liobn, tcenum_start,
+			tce_free_pSeriesLP(liobn, tcenum_start, tceshift,
 			                   (npages_start - (npages + 1)));
 			break;
 		}
@@ -205,10 +207,11 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 	long tcenum_start = tcenum, npages_start = npages;
 	int ret = 0;
 	unsigned long flags;
+	const unsigned long tceshift = tbl->it_page_shift;
 
 	if ((npages == 1) || !firmware_has_feature(FW_FEATURE_PUT_TCE_IND)) {
 		return tce_build_pSeriesLP(tbl->it_index, tcenum,
-					   tbl->it_page_shift, npages, uaddr,
+					   tceshift, npages, uaddr,
 		                           direction, attrs);
 	}
 
@@ -225,13 +228,13 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 		if (!tcep) {
 			local_irq_restore(flags);
 			return tce_build_pSeriesLP(tbl->it_index, tcenum,
-					tbl->it_page_shift,
+					tceshift,
 					npages, uaddr, direction, attrs);
 		}
 		__this_cpu_write(tce_page, tcep);
 	}
 
-	rpn = __pa(uaddr) >> TCE_SHIFT;
+	rpn = __pa(uaddr) >> tceshift;
 	proto_tce = TCE_PCI_READ;
 	if (direction != DMA_TO_DEVICE)
 		proto_tce |= TCE_PCI_WRITE;
@@ -245,12 +248,12 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 		limit = min_t(long, npages, 4096/TCE_ENTRY_SIZE);
 
 		for (l = 0; l < limit; l++) {
-			tcep[l] = cpu_to_be64(proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT);
+			tcep[l] = cpu_to_be64(proto_tce | rpn << tceshift);
 			rpn++;
 		}
 
 		rc = plpar_tce_put_indirect((u64)tbl->it_index,
-					    (u64)tcenum << 12,
+					    (u64)tcenum << tceshift,
 					    (u64)__pa(tcep),
 					    limit);
 
@@ -277,12 +280,13 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 	return ret;
 }
 
-static void tce_free_pSeriesLP(unsigned long liobn, long tcenum, long npages)
+static void tce_free_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
+			       long npages)
 {
 	u64 rc;
 
 	while (npages--) {
-		rc = plpar_tce_put((u64)liobn, (u64)tcenum << 12, 0);
+		rc = plpar_tce_put((u64)liobn, (u64)tcenum << tceshift, 0);
 
 		if (rc && printk_ratelimit()) {
 			printk("tce_free_pSeriesLP: plpar_tce_put failed. rc=%lld\n", rc);
@@ -301,9 +305,11 @@ static void tce_freemulti_pSeriesLP(struct iommu_table *tbl, long tcenum, long n
 	u64 rc;
 
 	if (!firmware_has_feature(FW_FEATURE_STUFF_TCE))
-		return tce_free_pSeriesLP(tbl->it_index, tcenum, npages);
+		return tce_free_pSeriesLP(tbl->it_index, tcenum,
+					  tbl->it_page_shift, npages);
 
-	rc = plpar_tce_stuff((u64)tbl->it_index, (u64)tcenum << 12, 0, npages);
+	rc = plpar_tce_stuff((u64)tbl->it_index,
+			     (u64)tcenum << tbl->it_page_shift, 0, npages);
 
 	if (rc && printk_ratelimit()) {
 		printk("tce_freemulti_pSeriesLP: plpar_tce_stuff failed\n");
@@ -319,7 +325,8 @@ static unsigned long tce_get_pSeriesLP(struct iommu_table *tbl, long tcenum)
 	u64 rc;
 	unsigned long tce_ret;
 
-	rc = plpar_tce_get((u64)tbl->it_index, (u64)tcenum << 12, &tce_ret);
+	rc = plpar_tce_get((u64)tbl->it_index,
+			   (u64)tcenum << tbl->it_page_shift, &tce_ret);
 
 	if (rc && printk_ratelimit()) {
 		printk("tce_get_pSeriesLP: plpar_tce_get failed. rc=%lld\n", rc);
-- 
2.25.4



* [PATCH v2 02/14] powerpc/pseries/iommu: Makes sure IOMMU_PAGE_SIZE <= PAGE_SIZE
  2020-09-11 17:07 [PATCH v2 00/14] DDW Indirect Mapping Leonardo Bras
  2020-09-11 17:07 ` [PATCH v2 01/14] powerpc/pseries/iommu: Replace hard-coded page shift Leonardo Bras
@ 2020-09-11 17:07 ` Leonardo Bras
  2020-09-29  3:57   ` Alexey Kardashevskiy
  2020-09-11 17:07 ` [PATCH v2 03/14] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs Leonardo Bras
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras @ 2020-09-11 17:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Joel Stanley, Christophe Leroy,
	Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, 

Having the system pagesize smaller than the IOMMU pagesize may cause a page
owned by another process/VM to be written by a buggy driver/device.

As it's intended to use DDW for indirect mapping, it's possible to get
a configuration where (PAGE_SIZE = 4k) < (IOMMU_PAGE_SIZE() = 64k).

To avoid this, make sure create_ddw() can only use pagesize <= PAGE_SIZE.
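
To make the hazard concrete (illustrative example, not from a real trace):
- PAGE_SIZE = 4k, IOMMU_PAGE_SIZE() = 64k
- a driver maps a single 4k buffer
- the 64k TCE also covers the 64k/4k - 1 = 15 neighbouring system pages,
  which may belong to another process or VM
Limiting page_shift to PAGE_SHIFT removes this exposure.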

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 9db3927607a4..4eff3a6a441e 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1206,17 +1206,20 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 			goto out_failed;
 		}
 	}
-	if (query.page_size & 4) {
-		page_shift = 24; /* 16MB */
-	} else if (query.page_size & 2) {
+
+	/* Choose the biggest pagesize available <= PAGE_SHIFT */
+	if ((query.page_size & 4)) {
+		page_shift = 24; /* 16MB - Hugepages */
+	} else if ((query.page_size & 2) && PAGE_SHIFT >= 16) {
 		page_shift = 16; /* 64kB */
-	} else if (query.page_size & 1) {
+	} else if ((query.page_size & 1) && PAGE_SHIFT >= 12) {
 		page_shift = 12; /* 4kB */
 	} else {
 		dev_dbg(&dev->dev, "no supported direct page size in mask %x",
 			  query.page_size);
 		goto out_failed;
 	}
+
 	/* verify the window * number of ptes will map the partition */
 	/* check largest block * page size > max memory hotplug addr */
 	max_addr = ddw_memory_hotplug_max();
-- 
2.25.4



* [PATCH v2 03/14] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs
  2020-09-11 17:07 [PATCH v2 00/14] DDW Indirect Mapping Leonardo Bras
  2020-09-11 17:07 ` [PATCH v2 01/14] powerpc/pseries/iommu: Replace hard-coded page shift Leonardo Bras
  2020-09-11 17:07 ` [PATCH v2 02/14] powerpc/pseries/iommu: Makes sure IOMMU_PAGE_SIZE <= PAGE_SIZE Leonardo Bras
@ 2020-09-11 17:07 ` Leonardo Bras
  2020-09-29  3:57   ` Alexey Kardashevskiy
  2020-09-11 17:07 ` [PATCH v2 04/14] powerpc/kernel/iommu: Use largepool as a last resort when !largealloc Leonardo Bras
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras @ 2020-09-11 17:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Joel Stanley, Christophe Leroy,
	Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, 

Currently both iommu_alloc_coherent() and iommu_free_coherent() align the
desired allocation size to PAGE_SIZE, and get system pages and IOMMU
mappings (TCEs) for that value.

When IOMMU_PAGE_SIZE < PAGE_SIZE, this behavior may cause unnecessary
TCEs to be created for mapping the whole system page.

Example:
- PAGE_SIZE = 64k, IOMMU_PAGE_SIZE() = 4k
- iommu_alloc_coherent() is called for 128 bytes
- 1 system page (64k) is allocated
- 16 IOMMU pages (16 x 4k) are allocated (16 TCEs used)

It would be enough to use a single TCE for this, so 15 TCEs are
wasted in the process.

Update iommu_*_coherent() so the size used for iommu_alloc() and
iommu_free() is aligned to IOMMU_PAGE_SIZE() rather than PAGE_SIZE.

Also, on iommu_range_alloc(), replace ALIGN(n, 1 << tbl->it_page_shift)
with IOMMU_PAGE_ALIGN(n, tbl), which is easier to read and does the
same.
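
A minimal sketch of the new sizing, using the example above (paraphrasing
the hunk below, not a verbatim copy):

	/* PAGE_SIZE = 64k, IOMMU_PAGE_SIZE() = 4k, 128-byte request */
	size = PAGE_ALIGN(size);                   /* 64k -> 1 system page */
	size_io = IOMMU_PAGE_ALIGN(size_io, tbl);  /*  4k                  */
	nio_pages = size_io >> tbl->it_page_shift; /*  1 TCE instead of 16 */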

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/kernel/iommu.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 9704f3f76e63..7961645a6980 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -237,10 +237,9 @@ static unsigned long iommu_range_alloc(struct device *dev,
 	}
 
 	if (dev)
-		boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
-				      1 << tbl->it_page_shift);
+		boundary_size = IOMMU_PAGE_ALIGN(dma_get_seg_boundary(dev) + 1, tbl);
 	else
-		boundary_size = ALIGN(1UL << 32, 1 << tbl->it_page_shift);
+		boundary_size = IOMMU_PAGE_ALIGN(1UL << 32, tbl);
 	/* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
 
 	n = iommu_area_alloc(tbl->it_map, limit, start, npages, tbl->it_offset,
@@ -858,6 +857,7 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
 	unsigned int order;
 	unsigned int nio_pages, io_order;
 	struct page *page;
+	size_t size_io = size;
 
 	size = PAGE_ALIGN(size);
 	order = get_order(size);
@@ -884,8 +884,9 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
 	memset(ret, 0, size);
 
 	/* Set up tces to cover the allocated range */
-	nio_pages = size >> tbl->it_page_shift;
-	io_order = get_iommu_order(size, tbl);
+	size_io = IOMMU_PAGE_ALIGN(size_io, tbl);
+	nio_pages = size_io >> tbl->it_page_shift;
+	io_order = get_iommu_order(size_io, tbl);
 	mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
 			      mask >> tbl->it_page_shift, io_order, 0);
 	if (mapping == DMA_MAPPING_ERROR) {
@@ -900,10 +901,9 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size,
 			 void *vaddr, dma_addr_t dma_handle)
 {
 	if (tbl) {
-		unsigned int nio_pages;
+		size_t size_io = IOMMU_PAGE_ALIGN(size, tbl);
+		unsigned int nio_pages = size_io >> tbl->it_page_shift;
 
-		size = PAGE_ALIGN(size);
-		nio_pages = size >> tbl->it_page_shift;
 		iommu_free(tbl, dma_handle, nio_pages);
 		size = PAGE_ALIGN(size);
 		free_pages((unsigned long)vaddr, get_order(size));
-- 
2.25.4



* [PATCH v2 04/14] powerpc/kernel/iommu: Use largepool as a last resort when !largealloc
  2020-09-11 17:07 [PATCH v2 00/14] DDW Indirect Mapping Leonardo Bras
                   ` (2 preceding siblings ...)
  2020-09-11 17:07 ` [PATCH v2 03/14] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs Leonardo Bras
@ 2020-09-11 17:07 ` Leonardo Bras
  2020-09-11 17:07 ` [PATCH v2 05/14] powerpc/kernel/iommu: Add new iommu_table_in_use() helper Leonardo Bras
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras @ 2020-09-11 17:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Joel Stanley, Christophe Leroy,
	Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, 

As of today, iommu_range_alloc() for !largealloc (npages <= 15) can only
use 3/4 of the available pages, given that pages in the largepool are not
available for !largealloc.

This could mean some drivers are unable to fully use all the available
pages for the DMA window.

Add pages on largepool as a last resort for !largealloc, making all pages
of the DMA window available.
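
Roughly, the retry ladder in iommu_range_alloc() then becomes (paraphrased,
not a verbatim copy of the function):

	/*
	 * pass 0                  : retry the current pool from its start
	 * pass 1 .. nr_pools      : try each of the other small pools
	 * pass nr_pools + 1 (new) : try the largepool as a last resort
	 * after that              : give up (DMA_MAPPING_ERROR)
	 */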

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kernel/iommu.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 7961645a6980..ffb2637dc82b 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -261,6 +261,15 @@ static unsigned long iommu_range_alloc(struct device *dev,
 			pass++;
 			goto again;
 
+		} else if (pass == tbl->nr_pools + 1) {
+			/* Last resort: try largepool */
+			spin_unlock(&pool->lock);
+			pool = &tbl->large_pool;
+			spin_lock(&pool->lock);
+			pool->hint = pool->start;
+			pass++;
+			goto again;
+
 		} else {
 			/* Give up */
 			spin_unlock_irqrestore(&(pool->lock), flags);
-- 
2.25.4



* [PATCH v2 05/14] powerpc/kernel/iommu: Add new iommu_table_in_use() helper
  2020-09-11 17:07 [PATCH v2 00/14] DDW Indirect Mapping Leonardo Bras
                   ` (3 preceding siblings ...)
  2020-09-11 17:07 ` [PATCH v2 04/14] powerpc/kernel/iommu: Use largepool as a last resort when !largealloc Leonardo Bras
@ 2020-09-11 17:07 ` Leonardo Bras
  2020-09-29  3:57   ` Alexey Kardashevskiy
  2020-09-11 17:07 ` [PATCH v2 06/14] powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper Leonardo Bras
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras @ 2020-09-11 17:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Joel Stanley, Christophe Leroy,
	Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, 

Having a function to check if the iommu table has any allocation helps
decide whether a tbl can be reset for using a new DMA window.

It should be enough to replace all instances of !bitmap_empty(tbl...).

iommu_table_in_use() skips reserved memory, so we don't need to worry about
releasing it before testing. This makes iommu_table_release_pages()
unnecessary, given it was only used to remove the reserved memory for that
test.

Also, only allow storing reserved memory values in tbl if they are valid
in the table, so there is no need to check it in the new helper.
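
A minimal usage sketch (hypothetical caller, in the spirit of the later
indirect-mapping patch; not part of this diff):

	/* Abort replacing the window if the table already has live mappings */
	if (iommu_table_in_use(tbl)) {
		dev_warn(&dev->dev, "current IOMMU mappings, can't replace window\n");
		goto out_failed;
	}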

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/include/asm/iommu.h |  1 +
 arch/powerpc/kernel/iommu.c      | 61 +++++++++++++++-----------------
 2 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 5032f1593299..2913e5c8b1f8 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -154,6 +154,7 @@ extern int iommu_tce_table_put(struct iommu_table *tbl);
  */
 extern struct iommu_table *iommu_init_table(struct iommu_table *tbl,
 		int nid, unsigned long res_start, unsigned long res_end);
+bool iommu_table_in_use(struct iommu_table *tbl);
 
 #define IOMMU_TABLE_GROUP_MAX_TABLES	2
 
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index ffb2637dc82b..c838da3d8f32 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -655,34 +655,21 @@ static void iommu_table_reserve_pages(struct iommu_table *tbl,
 	if (tbl->it_offset == 0)
 		set_bit(0, tbl->it_map);
 
+	/* Check if res_start..res_end is a valid range in the table */
+	if (res_start >= res_end || res_start < tbl->it_offset ||
+	    res_end > (tbl->it_offset + tbl->it_size)) {
+		tbl->it_reserved_start = tbl->it_offset;
+		tbl->it_reserved_end = tbl->it_offset;
+		return;
+	}
+
 	tbl->it_reserved_start = res_start;
 	tbl->it_reserved_end = res_end;
 
-	/* Check if res_start..res_end isn't empty and overlaps the table */
-	if (res_start && res_end &&
-			(tbl->it_offset + tbl->it_size < res_start ||
-			 res_end < tbl->it_offset))
-		return;
-
 	for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
 		set_bit(i - tbl->it_offset, tbl->it_map);
 }
 
-static void iommu_table_release_pages(struct iommu_table *tbl)
-{
-	int i;
-
-	/*
-	 * In case we have reserved the first bit, we should not emit
-	 * the warning below.
-	 */
-	if (tbl->it_offset == 0)
-		clear_bit(0, tbl->it_map);
-
-	for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
-		clear_bit(i - tbl->it_offset, tbl->it_map);
-}
-
 /*
  * Build a iommu_table structure.  This contains a bit map which
  * is used to manage allocation of the tce space.
@@ -743,6 +730,23 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
 	return tbl;
 }
 
+bool iommu_table_in_use(struct iommu_table *tbl)
+{
+	unsigned long start = 0, end;
+
+	/* ignore reserved bit0 */
+	if (tbl->it_offset == 0)
+		start = 1;
+	end = tbl->it_reserved_start - tbl->it_offset;
+	if (find_next_bit(tbl->it_map, end, start) != end)
+		return true;
+
+	start = tbl->it_reserved_end - tbl->it_offset;
+	end = tbl->it_size;
+	return find_next_bit(tbl->it_map, end, start) != end;
+
+}
+
 static void iommu_table_free(struct kref *kref)
 {
 	unsigned long bitmap_sz;
@@ -759,10 +763,8 @@ static void iommu_table_free(struct kref *kref)
 		return;
 	}
 
-	iommu_table_release_pages(tbl);
-
 	/* verify that table contains no entries */
-	if (!bitmap_empty(tbl->it_map, tbl->it_size))
+	if (iommu_table_in_use(tbl))
 		pr_warn("%s: Unexpected TCEs\n", __func__);
 
 	/* calculate bitmap size in bytes */
@@ -1068,18 +1070,13 @@ int iommu_take_ownership(struct iommu_table *tbl)
 	for (i = 0; i < tbl->nr_pools; i++)
 		spin_lock(&tbl->pools[i].lock);
 
-	iommu_table_release_pages(tbl);
-
-	if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
+	if (iommu_table_in_use(tbl)) {
 		pr_err("iommu_tce: it_map is not empty");
 		ret = -EBUSY;
-		/* Undo iommu_table_release_pages, i.e. restore bit#0, etc */
-		iommu_table_reserve_pages(tbl, tbl->it_reserved_start,
-				tbl->it_reserved_end);
-	} else {
-		memset(tbl->it_map, 0xff, sz);
 	}
 
+	memset(tbl->it_map, 0xff, sz);
+
 	for (i = 0; i < tbl->nr_pools; i++)
 		spin_unlock(&tbl->pools[i].lock);
 	spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
-- 
2.25.4



* [PATCH v2 06/14] powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper
  2020-09-11 17:07 [PATCH v2 00/14] DDW Indirect Mapping Leonardo Bras
                   ` (4 preceding siblings ...)
  2020-09-11 17:07 ` [PATCH v2 05/14] powerpc/kernel/iommu: Add new iommu_table_in_use() helper Leonardo Bras
@ 2020-09-11 17:07 ` Leonardo Bras
  2020-09-11 17:07 ` [PATCH v2 07/14] powerpc/pseries/iommu: Add ddw_list_new_entry() helper Leonardo Bras
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras @ 2020-09-11 17:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Joel Stanley, Christophe Leroy,
	Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, 

Creates a helper to allow allocating a new iommu_table without the need
to reallocate the iommu_group.

This will be helpful for replacing the iommu_table for the new DMA window,
after we remove the old one with iommu_tce_table_put().

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/pseries/iommu.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 4eff3a6a441e..49008d2e217d 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -53,28 +53,31 @@ enum {
 	DDW_EXT_QUERY_OUT_SIZE = 2
 };
 
-static struct iommu_table_group *iommu_pseries_alloc_group(int node)
+static struct iommu_table *iommu_pseries_alloc_table(int node)
 {
-	struct iommu_table_group *table_group;
 	struct iommu_table *tbl;
 
-	table_group = kzalloc_node(sizeof(struct iommu_table_group), GFP_KERNEL,
-			   node);
-	if (!table_group)
-		return NULL;
-
 	tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL, node);
 	if (!tbl)
-		goto free_group;
+		return NULL;
 
 	INIT_LIST_HEAD_RCU(&tbl->it_group_list);
 	kref_init(&tbl->it_kref);
+	return tbl;
+}
 
-	table_group->tables[0] = tbl;
+static struct iommu_table_group *iommu_pseries_alloc_group(int node)
+{
+	struct iommu_table_group *table_group;
+
+	table_group = kzalloc_node(sizeof(*table_group), GFP_KERNEL, node);
+	if (!table_group)
+		return NULL;
 
-	return table_group;
+	table_group->tables[0] = iommu_pseries_alloc_table(node);
+	if (table_group->tables[0])
+		return table_group;
 
-free_group:
 	kfree(table_group);
 	return NULL;
 }
-- 
2.25.4



* [PATCH v2 07/14] powerpc/pseries/iommu: Add ddw_list_new_entry() helper
  2020-09-11 17:07 [PATCH v2 00/14] DDW Indirect Mapping Leonardo Bras
                   ` (5 preceding siblings ...)
  2020-09-11 17:07 ` [PATCH v2 06/14] powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper Leonardo Bras
@ 2020-09-11 17:07 ` Leonardo Bras
  2020-09-29  3:57   ` Alexey Kardashevskiy
  2020-09-11 17:07 ` [PATCH v2 08/14] powerpc/pseries/iommu: Allow DDW windows starting at 0x00 Leonardo Bras
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras @ 2020-09-11 17:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Joel Stanley, Christophe Leroy,
	Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, 

There are two functions creating direct_window_list entries in a
similar way, so create a ddw_list_new_entry() helper to avoid duplication
and simplify those functions.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 32 +++++++++++++++++---------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 49008d2e217d..e4c17d27dcf3 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -869,6 +869,21 @@ static u64 find_existing_ddw(struct device_node *pdn)
 	return dma_addr;
 }
 
+static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
+						const struct dynamic_dma_window_prop *dma64)
+{
+	struct direct_window *window;
+
+	window = kzalloc(sizeof(*window), GFP_KERNEL);
+	if (!window)
+		return NULL;
+
+	window->device = pdn;
+	window->prop = dma64;
+
+	return window;
+}
+
 static int find_existing_ddw_windows(void)
 {
 	int len;
@@ -881,18 +896,15 @@ static int find_existing_ddw_windows(void)
 
 	for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
 		direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
-		if (!direct64)
-			continue;
-
-		window = kzalloc(sizeof(*window), GFP_KERNEL);
-		if (!window || len < sizeof(struct dynamic_dma_window_prop)) {
-			kfree(window);
+		if (!direct64 || len < sizeof(*direct64)) {
 			remove_ddw(pdn, true);
 			continue;
 		}
 
-		window->device = pdn;
-		window->prop = direct64;
+		window = ddw_list_new_entry(pdn, direct64);
+		if (!window)
+			break;
+
 		spin_lock(&direct_window_list_lock);
 		list_add(&window->list, &direct_window_list);
 		spin_unlock(&direct_window_list_lock);
@@ -1261,7 +1273,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %pOF\n",
 		  create.liobn, dn);
 
-	window = kzalloc(sizeof(*window), GFP_KERNEL);
+	window = ddw_list_new_entry(pdn, ddwprop);
 	if (!window)
 		goto out_clear_window;
 
@@ -1280,8 +1292,6 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 		goto out_free_window;
 	}
 
-	window->device = pdn;
-	window->prop = ddwprop;
 	spin_lock(&direct_window_list_lock);
 	list_add(&window->list, &direct_window_list);
 	spin_unlock(&direct_window_list_lock);
-- 
2.25.4



* [PATCH v2 08/14] powerpc/pseries/iommu: Allow DDW windows starting at 0x00
  2020-09-11 17:07 [PATCH v2 00/14] DDW Indirect Mapping Leonardo Bras
                   ` (6 preceding siblings ...)
  2020-09-11 17:07 ` [PATCH v2 07/14] powerpc/pseries/iommu: Add ddw_list_new_entry() helper Leonardo Bras
@ 2020-09-11 17:07 ` Leonardo Bras
  2020-09-11 17:07 ` [PATCH v2 09/14] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw() Leonardo Bras
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras @ 2020-09-11 17:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Joel Stanley, Christophe Leroy,
	Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, 

enable_ddw() currently returns the address of the DMA window, which is
considered invalid if it has the value 0x00.

Also, it only considers an address returned from find_existing_ddw()
valid if it's not 0x00.

Changing this behavior makes sense, given the users of enable_ddw() only
need to know if direct mapping is possible. It can also allow a DMA window
starting at 0x00 to be used.

This will be helpful for using a DDW with indirect mapping, as the window
address will be different from 0x00, but it will not map the whole
partition.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/pseries/iommu.c | 30 ++++++++++++--------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index e4c17d27dcf3..bba8584a8f9d 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -849,24 +849,25 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
 			np, ret);
 }
 
-static u64 find_existing_ddw(struct device_node *pdn)
+static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
 {
 	struct direct_window *window;
 	const struct dynamic_dma_window_prop *direct64;
-	u64 dma_addr = 0;
+	bool found = false;
 
 	spin_lock(&direct_window_list_lock);
 	/* check if we already created a window and dupe that config if so */
 	list_for_each_entry(window, &direct_window_list, list) {
 		if (window->device == pdn) {
 			direct64 = window->prop;
-			dma_addr = be64_to_cpu(direct64->dma_base);
+			*dma_addr = be64_to_cpu(direct64->dma_base);
+			found = true;
 			break;
 		}
 	}
 	spin_unlock(&direct_window_list_lock);
 
-	return dma_addr;
+	return found;
 }
 
 static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
@@ -1129,15 +1130,15 @@ static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
  * pdn: the parent pe node with the ibm,dma_window property
  * Future: also check if we can remap the base window for our base page size
  *
- * returns the dma offset for use by the direct mapped DMA code.
+ * returns true if can map all pages (direct mapping), false otherwise..
  */
-static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
+static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 {
 	int len, ret;
 	struct ddw_query_response query;
 	struct ddw_create_response create;
 	int page_shift;
-	u64 dma_addr, max_addr;
+	u64 max_addr;
 	struct device_node *dn;
 	u32 ddw_avail[DDW_APPLICABLE_SIZE];
 	struct direct_window *window;
@@ -1148,8 +1149,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 
 	mutex_lock(&direct_window_init_mutex);
 
-	dma_addr = find_existing_ddw(pdn);
-	if (dma_addr != 0)
+	if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset))
 		goto out_unlock;
 
 	/*
@@ -1296,7 +1296,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	list_add(&window->list, &direct_window_list);
 	spin_unlock(&direct_window_list_lock);
 
-	dma_addr = be64_to_cpu(ddwprop->dma_base);
+	dev->dev.archdata.dma_offset = be64_to_cpu(ddwprop->dma_base);
 	goto out_unlock;
 
 out_free_window:
@@ -1309,6 +1309,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	kfree(win64->name);
 	kfree(win64->value);
 	kfree(win64);
+	win64 = NULL;
 
 out_failed:
 	if (default_win_removed)
@@ -1322,7 +1323,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 
 out_unlock:
 	mutex_unlock(&direct_window_init_mutex);
-	return dma_addr;
+	return win64;
 }
 
 static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
@@ -1401,11 +1402,8 @@ static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask)
 			break;
 	}
 
-	if (pdn && PCI_DN(pdn)) {
-		pdev->dev.archdata.dma_offset = enable_ddw(pdev, pdn);
-		if (pdev->dev.archdata.dma_offset)
-			return true;
-	}
+	if (pdn && PCI_DN(pdn))
+		return enable_ddw(pdev, pdn);
 
 	return false;
 }
-- 
2.25.4



* [PATCH v2 09/14] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()
  2020-09-11 17:07 [PATCH v2 00/14] DDW Indirect Mapping Leonardo Bras
                   ` (7 preceding siblings ...)
  2020-09-11 17:07 ` [PATCH v2 08/14] powerpc/pseries/iommu: Allow DDW windows starting at 0x00 Leonardo Bras
@ 2020-09-11 17:07 ` Leonardo Bras
  2020-09-29  3:56   ` Alexey Kardashevskiy
  2020-09-11 17:07 ` [PATCH v2 10/14] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper Leonardo Bras
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras @ 2020-09-11 17:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Joel Stanley, Christophe Leroy,
	Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, 

Code used to create a ddw property that was previously scattered in
enable_ddw() is now gathered in ddw_property_create(), which deals with
allocating and filling the property, leaving it ready for
of_add_property(), which now occurs in sequence.

This created an opportunity to reorganize the second part of enable_ddw():

Without this patch enable_ddw() does, in order:
kzalloc() property & members, create_ddw(), fill ddwprop inside property,
ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory,
of_add_property(), and list_add().

With this patch enable_ddw() does, in order:
create_ddw(), ddw_property_create(), of_add_property(),
ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory,
and list_add().

This change requires of_remove_property() in case anything fails after
of_add_property(), but we get to do tce_setrange_multi_pSeriesLP_walk
in all memory, which looks like the most expensive operation, only if
everything else succeeds.
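
The resulting error unwinding, roughly (paraphrased from the labels in the
hunk below; the labels fall through):

	/*
	 * tce walk failed            -> out_list_del:  kfree(window)
	 * ddw_list_new_entry failed  -> out_prop_del:  of_remove_property()
	 * of_add_property failed     -> out_prop_free: kfree name/value/prop
	 * ddw_property_create failed -> out_win_del:   remove_ddw()
	 * create_ddw failed          -> out_failed:    reset default window
	 */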

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 106 +++++++++++++++----------
 1 file changed, 63 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index bba8584a8f9d..510ccb0521af 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -177,7 +177,7 @@ static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
 		if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
 			ret = (int)rc;
 			tce_free_pSeriesLP(liobn, tcenum_start, tceshift,
-			                   (npages_start - (npages + 1)));
+					   (npages_start - (npages + 1)));
 			break;
 		}
 
@@ -215,7 +215,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 	if ((npages == 1) || !firmware_has_feature(FW_FEATURE_PUT_TCE_IND)) {
 		return tce_build_pSeriesLP(tbl->it_index, tcenum,
 					   tceshift, npages, uaddr,
-		                           direction, attrs);
+					   direction, attrs);
 	}
 
 	local_irq_save(flags);	/* to protect tcep and the page behind it */
@@ -269,7 +269,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 	if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
 		ret = (int)rc;
 		tce_freemulti_pSeriesLP(tbl, tcenum_start,
-		                        (npages_start - (npages + limit)));
+					(npages_start - (npages + limit)));
 		return ret;
 	}
 
@@ -1121,6 +1121,35 @@ static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
 			 ret);
 }
 
+static struct property *ddw_property_create(const char *propname, u32 liobn, u64 dma_addr,
+					    u32 page_shift, u32 window_shift)
+{
+	struct dynamic_dma_window_prop *ddwprop;
+	struct property *win64;
+
+	win64 = kzalloc(sizeof(*win64), GFP_KERNEL);
+	if (!win64)
+		return NULL;
+
+	win64->name = kstrdup(propname, GFP_KERNEL);
+	ddwprop = kzalloc(sizeof(*ddwprop), GFP_KERNEL);
+	win64->value = ddwprop;
+	win64->length = sizeof(*ddwprop);
+	if (!win64->name || !win64->value) {
+		kfree(win64->name);
+		kfree(win64->value);
+		kfree(win64);
+		return NULL;
+	}
+
+	ddwprop->liobn = cpu_to_be32(liobn);
+	ddwprop->dma_base = cpu_to_be64(dma_addr);
+	ddwprop->tce_shift = cpu_to_be32(page_shift);
+	ddwprop->window_shift = cpu_to_be32(window_shift);
+
+	return win64;
+}
+
 /*
  * If the PE supports dynamic dma windows, and there is space for a table
  * that can map all pages in a linear offset, then setup such a table,
@@ -1138,12 +1167,11 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	struct ddw_query_response query;
 	struct ddw_create_response create;
 	int page_shift;
-	u64 max_addr;
+	u64 max_addr, win_addr;
 	struct device_node *dn;
 	u32 ddw_avail[DDW_APPLICABLE_SIZE];
 	struct direct_window *window;
-	struct property *win64;
-	struct dynamic_dma_window_prop *ddwprop;
+	struct property *win64 = NULL;
 	struct failed_ddw_pdn *fpdn;
 	bool default_win_removed = false;
 
@@ -1245,72 +1273,64 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 		goto out_failed;
 	}
 	len = order_base_2(max_addr);
-	win64 = kzalloc(sizeof(struct property), GFP_KERNEL);
-	if (!win64) {
-		dev_info(&dev->dev,
-			"couldn't allocate property for 64bit dma window\n");
-		goto out_failed;
-	}
-	win64->name = kstrdup(DIRECT64_PROPNAME, GFP_KERNEL);
-	win64->value = ddwprop = kmalloc(sizeof(*ddwprop), GFP_KERNEL);
-	win64->length = sizeof(*ddwprop);
-	if (!win64->name || !win64->value) {
-		dev_info(&dev->dev,
-			"couldn't allocate property name and value\n");
-		goto out_free_prop;
-	}
 
 	ret = create_ddw(dev, ddw_avail, &create, page_shift, len);
 	if (ret != 0)
-		goto out_free_prop;
-
-	ddwprop->liobn = cpu_to_be32(create.liobn);
-	ddwprop->dma_base = cpu_to_be64(((u64)create.addr_hi << 32) |
-			create.addr_lo);
-	ddwprop->tce_shift = cpu_to_be32(page_shift);
-	ddwprop->window_shift = cpu_to_be32(len);
+		goto out_failed;
 
 	dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %pOF\n",
-		  create.liobn, dn);
+		create.liobn, dn);
 
-	window = ddw_list_new_entry(pdn, ddwprop);
+	win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
+	win64 = ddw_property_create(DIRECT64_PROPNAME, create.liobn, win_addr,
+				    page_shift, len);
+	if (!win64) {
+		dev_info(&dev->dev,
+			 "couldn't allocate property, property name, or value\n");
+		goto out_win_del;
+	}
+
+	ret = of_add_property(pdn, win64);
+	if (ret) {
+		dev_err(&dev->dev, "unable to add dma window property for %pOF: %d",
+			pdn, ret);
+		goto out_prop_free;
+	}
+
+	window = ddw_list_new_entry(pdn, win64->value);
 	if (!window)
-		goto out_clear_window;
+		goto out_prop_del;
 
 	ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
 			win64->value, tce_setrange_multi_pSeriesLP_walk);
 	if (ret) {
 		dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
 			 dn, ret);
-		goto out_free_window;
-	}
-
-	ret = of_add_property(pdn, win64);
-	if (ret) {
-		dev_err(&dev->dev, "unable to add dma window property for %pOF: %d",
-			 pdn, ret);
-		goto out_free_window;
+		goto out_list_del;
 	}
 
 	spin_lock(&direct_window_list_lock);
 	list_add(&window->list, &direct_window_list);
 	spin_unlock(&direct_window_list_lock);
 
-	dev->dev.archdata.dma_offset = be64_to_cpu(ddwprop->dma_base);
+	dev->dev.archdata.dma_offset = win_addr;
 	goto out_unlock;
 
-out_free_window:
+out_list_del:
 	kfree(window);
 
-out_clear_window:
-	remove_ddw(pdn, true);
+out_prop_del:
+	of_remove_property(pdn, win64);
 
-out_free_prop:
+out_prop_free:
 	kfree(win64->name);
 	kfree(win64->value);
 	kfree(win64);
 	win64 = NULL;
 
+out_win_del:
+	remove_ddw(pdn, true);
+
 out_failed:
 	if (default_win_removed)
 		reset_dma_window(dev, pdn);
-- 
2.25.4



* [PATCH v2 10/14] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper
  2020-09-11 17:07 [PATCH v2 00/14] DDW Indirect Mapping Leonardo Bras
                   ` (8 preceding siblings ...)
  2020-09-11 17:07 ` [PATCH v2 09/14] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw() Leonardo Bras
@ 2020-09-11 17:07 ` Leonardo Bras
  2020-09-29  3:56   ` Alexey Kardashevskiy
  2020-09-11 17:07 ` [PATCH v2 11/14] powerpc/pseries/iommu: Update remove_dma_window() to accept property name Leonardo Bras
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras @ 2020-09-11 17:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Joel Stanley, Christophe Leroy,
	Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, 

Add a new helper _iommu_table_setparms(), and use it in
iommu_table_setparms() and iommu_table_setparms_lpar() to avoid duplicated
code.

Also, setting tbl->it_ops was happening outside iommu_table_setparms*(),
so move it to the new helper. Since we need the iommu_table_ops to be
declared before use, move iommu_table_lpar_multi_ops and
iommu_table_pseries_ops to before their respective iommu_table_setparms*().

The tce_exchange_pseries() also had to be moved up, since it's used in
iommu_table_lpar_multi_ops.xchg_no_kill.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 149 ++++++++++++-------------
 1 file changed, 72 insertions(+), 77 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 510ccb0521af..abd36b257725 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -495,12 +495,62 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
 	return rc;
 }
 
+#ifdef CONFIG_IOMMU_API
+static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned
+				long *tce, enum dma_data_direction *direction,
+				bool realmode)
+{
+	long rc;
+	unsigned long ioba = (unsigned long)index << tbl->it_page_shift;
+	unsigned long flags, oldtce = 0;
+	u64 proto_tce = iommu_direction_to_tce_perm(*direction);
+	unsigned long newtce = *tce | proto_tce;
+
+	spin_lock_irqsave(&tbl->large_pool.lock, flags);
+
+	rc = plpar_tce_get((u64)tbl->it_index, ioba, &oldtce);
+	if (!rc)
+		rc = plpar_tce_put((u64)tbl->it_index, ioba, newtce);
+
+	if (!rc) {
+		*direction = iommu_tce_direction(oldtce);
+		*tce = oldtce & ~(TCE_PCI_READ | TCE_PCI_WRITE);
+	}
+
+	spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
+
+	return rc;
+}
+#endif
+
 static int tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn,
 		unsigned long num_pfn, void *arg)
 {
 	return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg);
 }
 
+static inline void _iommu_table_setparms(struct iommu_table *tbl, unsigned long busno,
+					 unsigned long liobn, unsigned long win_addr,
+					 unsigned long window_size, unsigned long page_shift,
+					 unsigned long base, struct iommu_table_ops *table_ops)
+{
+	tbl->it_busno = busno;
+	tbl->it_index = liobn;
+	tbl->it_offset = win_addr >> page_shift;
+	tbl->it_size = window_size >> page_shift;
+	tbl->it_page_shift = page_shift;
+	tbl->it_base = base;
+	tbl->it_blocksize = 16;
+	tbl->it_type = TCE_PCI;
+	tbl->it_ops = table_ops;
+}
+
+struct iommu_table_ops iommu_table_pseries_ops = {
+	.set = tce_build_pSeries,
+	.clear = tce_free_pSeries,
+	.get = tce_get_pseries
+};
+
 static void iommu_table_setparms(struct pci_controller *phb,
 				 struct device_node *dn,
 				 struct iommu_table *tbl)
@@ -509,8 +559,13 @@ static void iommu_table_setparms(struct pci_controller *phb,
 	const unsigned long *basep;
 	const u32 *sizep;
 
-	node = phb->dn;
+	/* Test if we are going over 2GB of DMA space */
+	if (phb->dma_window_base_cur + phb->dma_window_size > 0x80000000ul) {
+		udbg_printf("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
+		panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
+	}
 
+	node = phb->dn;
 	basep = of_get_property(node, "linux,tce-base", NULL);
 	sizep = of_get_property(node, "linux,tce-size", NULL);
 	if (basep == NULL || sizep == NULL) {
@@ -519,33 +574,25 @@ static void iommu_table_setparms(struct pci_controller *phb,
 		return;
 	}
 
-	tbl->it_base = (unsigned long)__va(*basep);
+	_iommu_table_setparms(tbl, phb->bus->number, 0, phb->dma_window_base_cur,
+			      phb->dma_window_size, IOMMU_PAGE_SHIFT_4K,
+			      (unsigned long)__va(*basep), &iommu_table_pseries_ops);
 
 	if (!is_kdump_kernel())
 		memset((void *)tbl->it_base, 0, *sizep);
 
-	tbl->it_busno = phb->bus->number;
-	tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
-
-	/* Units of tce entries */
-	tbl->it_offset = phb->dma_window_base_cur >> tbl->it_page_shift;
-
-	/* Test if we are going over 2GB of DMA space */
-	if (phb->dma_window_base_cur + phb->dma_window_size > 0x80000000ul) {
-		udbg_printf("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
-		panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
-	}
-
 	phb->dma_window_base_cur += phb->dma_window_size;
-
-	/* Set the tce table size - measured in entries */
-	tbl->it_size = phb->dma_window_size >> tbl->it_page_shift;
-
-	tbl->it_index = 0;
-	tbl->it_blocksize = 16;
-	tbl->it_type = TCE_PCI;
 }
 
+struct iommu_table_ops iommu_table_lpar_multi_ops = {
+	.set = tce_buildmulti_pSeriesLP,
+#ifdef CONFIG_IOMMU_API
+	.xchg_no_kill = tce_exchange_pseries,
+#endif
+	.clear = tce_freemulti_pSeriesLP,
+	.get = tce_get_pSeriesLP
+};
+
 /*
  * iommu_table_setparms_lpar
  *
@@ -557,28 +604,17 @@ static void iommu_table_setparms_lpar(struct pci_controller *phb,
 				      struct iommu_table_group *table_group,
 				      const __be32 *dma_window)
 {
-	unsigned long offset, size;
+	unsigned long offset, size, liobn;
 
-	of_parse_dma_window(dn, dma_window, &tbl->it_index, &offset, &size);
+	of_parse_dma_window(dn, dma_window, &liobn, &offset, &size);
 
-	tbl->it_busno = phb->bus->number;
-	tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
-	tbl->it_base   = 0;
-	tbl->it_blocksize  = 16;
-	tbl->it_type = TCE_PCI;
-	tbl->it_offset = offset >> tbl->it_page_shift;
-	tbl->it_size = size >> tbl->it_page_shift;
+	_iommu_table_setparms(tbl, phb->bus->number, liobn, offset, size, IOMMU_PAGE_SHIFT_4K, 0,
+			      &iommu_table_lpar_multi_ops);
 
 	table_group->tce32_start = offset;
 	table_group->tce32_size = size;
 }
 
-struct iommu_table_ops iommu_table_pseries_ops = {
-	.set = tce_build_pSeries,
-	.clear = tce_free_pSeries,
-	.get = tce_get_pseries
-};
-
 static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
 {
 	struct device_node *dn;
@@ -647,7 +683,6 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
 	tbl = pci->table_group->tables[0];
 
 	iommu_table_setparms(pci->phb, dn, tbl);
-	tbl->it_ops = &iommu_table_pseries_ops;
 	iommu_init_table(tbl, pci->phb->node, 0, 0);
 
 	/* Divide the rest (1.75GB) among the children */
@@ -658,43 +693,6 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
 	pr_debug("ISA/IDE, window size is 0x%llx\n", pci->phb->dma_window_size);
 }
 
-#ifdef CONFIG_IOMMU_API
-static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned
-				long *tce, enum dma_data_direction *direction,
-				bool realmode)
-{
-	long rc;
-	unsigned long ioba = (unsigned long) index << tbl->it_page_shift;
-	unsigned long flags, oldtce = 0;
-	u64 proto_tce = iommu_direction_to_tce_perm(*direction);
-	unsigned long newtce = *tce | proto_tce;
-
-	spin_lock_irqsave(&tbl->large_pool.lock, flags);
-
-	rc = plpar_tce_get((u64)tbl->it_index, ioba, &oldtce);
-	if (!rc)
-		rc = plpar_tce_put((u64)tbl->it_index, ioba, newtce);
-
-	if (!rc) {
-		*direction = iommu_tce_direction(oldtce);
-		*tce = oldtce & ~(TCE_PCI_READ | TCE_PCI_WRITE);
-	}
-
-	spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
-
-	return rc;
-}
-#endif
-
-struct iommu_table_ops iommu_table_lpar_multi_ops = {
-	.set = tce_buildmulti_pSeriesLP,
-#ifdef CONFIG_IOMMU_API
-	.xchg_no_kill = tce_exchange_pseries,
-#endif
-	.clear = tce_freemulti_pSeriesLP,
-	.get = tce_get_pSeriesLP
-};
-
 static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
 {
 	struct iommu_table *tbl;
@@ -729,7 +727,6 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
 		tbl = ppci->table_group->tables[0];
 		iommu_table_setparms_lpar(ppci->phb, pdn, tbl,
 				ppci->table_group, dma_window);
-		tbl->it_ops = &iommu_table_lpar_multi_ops;
 		iommu_init_table(tbl, ppci->phb->node, 0, 0);
 		iommu_register_group(ppci->table_group,
 				pci_domain_nr(bus), 0);
@@ -758,7 +755,6 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
 		PCI_DN(dn)->table_group = iommu_pseries_alloc_group(phb->node);
 		tbl = PCI_DN(dn)->table_group->tables[0];
 		iommu_table_setparms(phb, dn, tbl);
-		tbl->it_ops = &iommu_table_pseries_ops;
 		iommu_init_table(tbl, phb->node, 0, 0);
 		set_iommu_table_base(&dev->dev, tbl);
 		return;
@@ -1385,7 +1381,6 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 		tbl = pci->table_group->tables[0];
 		iommu_table_setparms_lpar(pci->phb, pdn, tbl,
 				pci->table_group, dma_window);
-		tbl->it_ops = &iommu_table_lpar_multi_ops;
 		iommu_init_table(tbl, pci->phb->node, 0, 0);
 		iommu_register_group(pci->table_group,
 				pci_domain_nr(pci->phb->bus), 0);
-- 
2.25.4



* [PATCH v2 11/14] powerpc/pseries/iommu: Update remove_dma_window() to accept property name
  2020-09-11 17:07 [PATCH v2 00/14] DDW Indirect Mapping Leonardo Bras
                   ` (9 preceding siblings ...)
  2020-09-11 17:07 ` [PATCH v2 10/14] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper Leonardo Bras
@ 2020-09-11 17:07 ` Leonardo Bras
  2020-09-29  3:56   ` Alexey Kardashevskiy
  2020-09-11 17:07 ` [PATCH v2 12/14] powerpc/pseries/iommu: Find existing DDW with given " Leonardo Bras
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras @ 2020-09-11 17:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Joel Stanley, Christophe Leroy,
	Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, 

Update remove_dma_window() so it can be used to remove DDW with a given
property name.

This enables the creation of new property names for DDW, so we can
put them to different uses, such as indirect mapping.
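
For illustration, a caller that has to cope with either property can
simply try both names in turn, roughly as a later patch in this series
does in iommu_reconfig_notifier():

	if (remove_ddw(np, false, DIRECT64_PROPNAME))
		remove_ddw(np, false, DMA64_PROPNAME);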

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index abd36b257725..f6a65ecd1db5 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -818,31 +818,32 @@ static void remove_dma_window(struct device_node *np, u32 *ddw_avail,
 			np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
 }
 
-static void remove_ddw(struct device_node *np, bool remove_prop)
+static int remove_ddw(struct device_node *np, bool remove_prop, const char *win_name)
 {
 	struct property *win;
 	u32 ddw_avail[DDW_APPLICABLE_SIZE];
 	int ret = 0;
 
+	win = of_find_property(np, win_name, NULL);
+	if (!win)
+		return -EINVAL;
+
 	ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
 					 &ddw_avail[0], DDW_APPLICABLE_SIZE);
 	if (ret)
-		return;
-
-	win = of_find_property(np, DIRECT64_PROPNAME, NULL);
-	if (!win)
-		return;
+		return 0;
 
 	if (win->length >= sizeof(struct dynamic_dma_window_prop))
 		remove_dma_window(np, ddw_avail, win);
 
 	if (!remove_prop)
-		return;
+		return 0;
 
 	ret = of_remove_property(np, win);
 	if (ret)
 		pr_warn("%pOF: failed to remove direct window property: %d\n",
 			np, ret);
+	return 0;
 }
 
 static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
@@ -894,7 +895,7 @@ static int find_existing_ddw_windows(void)
 	for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
 		direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
 		if (!direct64 || len < sizeof(*direct64)) {
-			remove_ddw(pdn, true);
+			remove_ddw(pdn, true, DIRECT64_PROPNAME);
 			continue;
 		}
 
@@ -1325,7 +1326,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	win64 = NULL;
 
 out_win_del:
-	remove_ddw(pdn, true);
+	remove_ddw(pdn, true, DIRECT64_PROPNAME);
 
 out_failed:
 	if (default_win_removed)
@@ -1480,7 +1481,7 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
 		 * we have to remove the property when releasing
 		 * the device node.
 		 */
-		remove_ddw(np, false);
+		remove_ddw(np, false, DIRECT64_PROPNAME);
 		if (pci && pci->table_group)
 			iommu_pseries_free_group(pci->table_group,
 					np->full_name);
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 12/14] powerpc/pseries/iommu: Find existing DDW with given property name
  2020-09-11 17:07 [PATCH v2 00/14] DDW Indirect Mapping Leonardo Bras
                   ` (10 preceding siblings ...)
  2020-09-11 17:07 ` [PATCH v2 11/14] powerpc/pseries/iommu: Update remove_dma_window() to accept property name Leonardo Bras
@ 2020-09-11 17:07 ` Leonardo Bras
  2020-09-11 17:07 ` [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping Leonardo Bras
  2020-09-11 17:07 ` [PATCH v2 14/14] powerpc/pseries/iommu: Rename "direct window" to "dma window" Leonardo Bras
  13 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras @ 2020-09-11 17:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Joel Stanley, Christophe Leroy,
	Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, 

Extract find_existing_ddw_windows() into find_existing_ddw_windows_named(),
and call it with the current property name.

This will allow find_existing_ddw_windows() to check more than one
property name, enabling the creation of new property names, like the one
that will be used for indirect mapping.
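
As a sketch of where this is heading (the DMA64_PROPNAME lookup is only
added in the next patch), the wrapper keeps the LPAR check and simply
calls the named helper once per property:

	static int find_existing_ddw_windows(void)
	{
		if (!firmware_has_feature(FW_FEATURE_LPAR))
			return 0;

		find_existing_ddw_windows_named(DIRECT64_PROPNAME);
		find_existing_ddw_windows_named(DMA64_PROPNAME);

		return 0;
	}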

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index f6a65ecd1db5..9b7c03652e72 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -882,24 +882,21 @@ static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
 	return window;
 }
 
-static int find_existing_ddw_windows(void)
+static void find_existing_ddw_windows_named(const char *name)
 {
 	int len;
 	struct device_node *pdn;
 	struct direct_window *window;
-	const struct dynamic_dma_window_prop *direct64;
-
-	if (!firmware_has_feature(FW_FEATURE_LPAR))
-		return 0;
+	const struct dynamic_dma_window_prop *dma64;
 
-	for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
-		direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
-		if (!direct64 || len < sizeof(*direct64)) {
-			remove_ddw(pdn, true, DIRECT64_PROPNAME);
+	for_each_node_with_property(pdn, name) {
+		dma64 = of_get_property(pdn, name, &len);
+		if (!dma64 || len < sizeof(*dma64)) {
+			remove_ddw(pdn, true, name);
 			continue;
 		}
 
-		window = ddw_list_new_entry(pdn, direct64);
+		window = ddw_list_new_entry(pdn, dma64);
 		if (!window)
 			break;
 
@@ -907,6 +904,14 @@ static int find_existing_ddw_windows(void)
 		list_add(&window->list, &direct_window_list);
 		spin_unlock(&direct_window_list_lock);
 	}
+}
+
+static int find_existing_ddw_windows(void)
+{
+	if (!firmware_has_feature(FW_FEATURE_LPAR))
+		return 0;
+
+	find_existing_ddw_windows_named(DIRECT64_PROPNAME);
 
 	return 0;
 }
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping
  2020-09-11 17:07 [PATCH v2 00/14] DDW Indirect Mapping Leonardo Bras
                   ` (11 preceding siblings ...)
  2020-09-11 17:07 ` [PATCH v2 12/14] powerpc/pseries/iommu: Find existing DDW with given " Leonardo Bras
@ 2020-09-11 17:07 ` Leonardo Bras
  2020-09-29  3:56   ` Alexey Kardashevskiy
  2020-09-11 17:07 ` [PATCH v2 14/14] powerpc/pseries/iommu: Rename "direct window" to "dma window" Leonardo Bras
  13 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras @ 2020-09-11 17:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Joel Stanley, Christophe Leroy,
	Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, 

So far it's assumed possible to map the guest RAM 1:1 to the bus, which
works with a small number of devices. SRIOV changes this, as the user
can configure hundreds of VFs, and since phyp preallocates TCEs and does
not allow IOMMU pages bigger than 64K, it has to limit the number of
TCEs per PE to limit the waste of physical pages.

As of today, if the assumed direct mapping is not possible, DDW creation
is skipped and the default DMA window "ibm,dma-window" is used instead.

The default DMA window uses 4k pages instead of 64k pages, and since
the number of pages (TCEs) may stay the same (in the pHyp case), making
use of DDW instead of the default DMA window for indirect mapping will
expand by up to 16x the amount of memory that can be DMA-mapped.
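
(For example, with an unchanged budget of 2M TCEs, the default window
covers 2M * 4KB = 8GB of DMA space, while a 64KB-page DDW covers
2M * 64KB = 128GB, i.e. the 16x factor above. The TCE count is only
illustrative.)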

Indirect mapping will only be used if direct mapping is not a
possibility.

For indirect mapping, it's necessary to re-create the iommu_table with
the new DMA window parameters, so iommu_alloc() can use it.

Removing the default DMA window in order to use DDW with indirect
mapping is only allowed if there is no IOMMU memory currently allocated
in the iommu_table; enable_ddw() is aborted otherwise.

Even though there won't be both direct and indirect mappings at the
same time, we can't reuse the DIRECT64_PROPNAME property name, or else
an older kexec()ed kernel can assume direct mapping, and skip
iommu_alloc(), causing undesirable behavior.
So a new property name DMA64_PROPNAME "linux,dma64-ddr-window-info"
was created to represent a DDW that does not allow direct mapping.
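
The core of the sizing/naming decision in enable_ddw() (taken from the
diff below, debug print omitted) then becomes:

	if (query.largest_available_block < (max_addr >> page_shift)) {
		/* DDW can't map the whole partition: use indirect mapping */
		len = order_base_2(query.largest_available_block << page_shift);
		win_name = DMA64_PROPNAME;
	} else {
		/* DDW maps the whole partition: keep direct mapping */
		direct_mapping = true;
		len = order_base_2(max_addr);
		win_name = DIRECT64_PROPNAME;
	}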

Note: ddw_memory_hotplug_max() was moved up so it can be used in
find_existing_ddw().

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 160 ++++++++++++++++---------
 1 file changed, 103 insertions(+), 57 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 9b7c03652e72..c4de23080d1b 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -375,6 +375,7 @@ static DEFINE_SPINLOCK(direct_window_list_lock);
 /* protects initializing window twice for same device */
 static DEFINE_MUTEX(direct_window_init_mutex);
 #define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
+#define DMA64_PROPNAME "linux,dma64-ddr-window-info"
 
 static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
 					unsigned long num_pfn, const void *arg)
@@ -846,10 +847,48 @@ static int remove_ddw(struct device_node *np, bool remove_prop, const char *win_
 	return 0;
 }
 
-static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
+static phys_addr_t ddw_memory_hotplug_max(void)
+{
+	phys_addr_t max_addr = memory_hotplug_max();
+	struct device_node *memory;
+
+	/*
+	 * The "ibm,pmemory" can appear anywhere in the address space.
+	 * Assuming it is still backed by page structs, set the upper limit
+	 * for the huge DMA window as MAX_PHYSMEM_BITS.
+	 */
+	if (of_find_node_by_type(NULL, "ibm,pmemory"))
+		return (sizeof(phys_addr_t) * 8 <= MAX_PHYSMEM_BITS) ?
+			(phys_addr_t)-1 : (1ULL << MAX_PHYSMEM_BITS);
+
+	for_each_node_by_type(memory, "memory") {
+		unsigned long start, size;
+		int n_mem_addr_cells, n_mem_size_cells, len;
+		const __be32 *memcell_buf;
+
+		memcell_buf = of_get_property(memory, "reg", &len);
+		if (!memcell_buf || len <= 0)
+			continue;
+
+		n_mem_addr_cells = of_n_addr_cells(memory);
+		n_mem_size_cells = of_n_size_cells(memory);
+
+		start = of_read_number(memcell_buf, n_mem_addr_cells);
+		memcell_buf += n_mem_addr_cells;
+		size = of_read_number(memcell_buf, n_mem_size_cells);
+		memcell_buf += n_mem_size_cells;
+
+		max_addr = max_t(phys_addr_t, max_addr, start + size);
+	}
+
+	return max_addr;
+}
+
+static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, bool *direct_mapping)
 {
 	struct direct_window *window;
 	const struct dynamic_dma_window_prop *direct64;
+	unsigned long window_size;
 	bool found = false;
 
 	spin_lock(&direct_window_list_lock);
@@ -858,6 +897,10 @@ static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
 		if (window->device == pdn) {
 			direct64 = window->prop;
 			*dma_addr = be64_to_cpu(direct64->dma_base);
+
+			window_size = (1UL << be32_to_cpu(direct64->window_shift));
+			*direct_mapping = (window_size >= ddw_memory_hotplug_max());
+
 			found = true;
 			break;
 		}
@@ -912,6 +955,7 @@ static int find_existing_ddw_windows(void)
 		return 0;
 
 	find_existing_ddw_windows_named(DIRECT64_PROPNAME);
+	find_existing_ddw_windows_named(DMA64_PROPNAME);
 
 	return 0;
 }
@@ -1054,43 +1098,6 @@ struct failed_ddw_pdn {
 
 static LIST_HEAD(failed_ddw_pdn_list);
 
-static phys_addr_t ddw_memory_hotplug_max(void)
-{
-	phys_addr_t max_addr = memory_hotplug_max();
-	struct device_node *memory;
-
-	/*
-	 * The "ibm,pmemory" can appear anywhere in the address space.
-	 * Assuming it is still backed by page structs, set the upper limit
-	 * for the huge DMA window as MAX_PHYSMEM_BITS.
-	 */
-	if (of_find_node_by_type(NULL, "ibm,pmemory"))
-		return (sizeof(phys_addr_t) * 8 <= MAX_PHYSMEM_BITS) ?
-			(phys_addr_t) -1 : (1ULL << MAX_PHYSMEM_BITS);
-
-	for_each_node_by_type(memory, "memory") {
-		unsigned long start, size;
-		int n_mem_addr_cells, n_mem_size_cells, len;
-		const __be32 *memcell_buf;
-
-		memcell_buf = of_get_property(memory, "reg", &len);
-		if (!memcell_buf || len <= 0)
-			continue;
-
-		n_mem_addr_cells = of_n_addr_cells(memory);
-		n_mem_size_cells = of_n_size_cells(memory);
-
-		start = of_read_number(memcell_buf, n_mem_addr_cells);
-		memcell_buf += n_mem_addr_cells;
-		size = of_read_number(memcell_buf, n_mem_size_cells);
-		memcell_buf += n_mem_size_cells;
-
-		max_addr = max_t(phys_addr_t, max_addr, start + size);
-	}
-
-	return max_addr;
-}
-
 /*
  * Platforms supporting the DDW option starting with LoPAR level 2.7 implement
  * ibm,ddw-extensions, which carries the rtas token for
@@ -1173,14 +1180,19 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	struct device_node *dn;
 	u32 ddw_avail[DDW_APPLICABLE_SIZE];
 	struct direct_window *window;
+	const char *win_name;
 	struct property *win64 = NULL;
 	struct failed_ddw_pdn *fpdn;
-	bool default_win_removed = false;
+	bool default_win_removed = false, direct_mapping = false;
+	struct pci_dn *pci = PCI_DN(pdn);
+	struct iommu_table *tbl = pci->table_group->tables[0];
 
 	mutex_lock(&direct_window_init_mutex);
 
-	if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset))
-		goto out_unlock;
+	if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &direct_mapping)) {
+		mutex_unlock(&direct_window_init_mutex);
+		return direct_mapping;
+	}
 
 	/*
 	 * If we already went through this for a previous function of
@@ -1266,15 +1278,25 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	}
 
 	/* verify the window * number of ptes will map the partition */
-	/* check largest block * page size > max memory hotplug addr */
 	max_addr = ddw_memory_hotplug_max();
 	if (query.largest_available_block < (max_addr >> page_shift)) {
-		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu "
-			  "%llu-sized pages\n", max_addr,  query.largest_available_block,
-			  1ULL << page_shift);
+		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu %llu-sized pages\n",
+			max_addr, query.largest_available_block,
+			1ULL << page_shift);
+
+		len = order_base_2(query.largest_available_block << page_shift);
+		win_name = DMA64_PROPNAME;
+	} else {
+		direct_mapping = true;
+		len = order_base_2(max_addr);
+		win_name = DIRECT64_PROPNAME;
+	}
+
+	/* DDW + IOMMU on single window may fail if there is any allocation */
+	if (default_win_removed && !direct_mapping && iommu_table_in_use(tbl)) {
+		dev_dbg(&dev->dev, "current IOMMU table in use, can't be replaced.\n");
 		goto out_failed;
 	}
-	len = order_base_2(max_addr);
 
 	ret = create_ddw(dev, ddw_avail, &create, page_shift, len);
 	if (ret != 0)
@@ -1284,8 +1306,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 		create.liobn, dn);
 
 	win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
-	win64 = ddw_property_create(DIRECT64_PROPNAME, create.liobn, win_addr,
-				    page_shift, len);
+	win64 = ddw_property_create(win_name, create.liobn, win_addr, page_shift, len);
 	if (!win64) {
 		dev_info(&dev->dev,
 			 "couldn't allocate property, property name, or value\n");
@@ -1300,15 +1321,37 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	}
 
 	window = ddw_list_new_entry(pdn, win64->value);
-	if (!window)
+	if (!window) {
+		dev_dbg(&dev->dev, "couldn't create new list entry\n");
 		goto out_prop_del;
+	}
 
-	ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
-			win64->value, tce_setrange_multi_pSeriesLP_walk);
-	if (ret) {
-		dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
-			 dn, ret);
-		goto out_list_del;
+	if (direct_mapping) {
+		/* DDW maps the whole partition, so enable direct DMA mapping */
+		ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
+					    win64->value, tce_setrange_multi_pSeriesLP_walk);
+		if (ret) {
+			dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
+				 dn, ret);
+			goto out_list_del;
+		}
+	} else {
+		/* New table for using DDW instead of the default DMA window */
+		tbl = iommu_pseries_alloc_table(pci->phb->node);
+		if (!tbl) {
+			dev_dbg(&dev->dev, "couldn't create new IOMMU table\n");
+			goto out_list_del;
+		}
+
+		_iommu_table_setparms(tbl, pci->phb->bus->number, create.liobn, win_addr,
+				      1UL << len, page_shift, 0, &iommu_table_lpar_multi_ops);
+		iommu_init_table(tbl, pci->phb->node, 0, 0);
+
+		/* Free old table and replace by the newer */
+		iommu_tce_table_put(pci->table_group->tables[0]);
+		pci->table_group->tables[0] = tbl;
+
+		set_iommu_table_base(&dev->dev, tbl);
 	}
 
 	spin_lock(&direct_window_list_lock);
@@ -1345,7 +1388,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 
 out_unlock:
 	mutex_unlock(&direct_window_init_mutex);
-	return win64;
+	return win64 && direct_mapping;
 }
 
 static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
@@ -1486,7 +1529,10 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
 		 * we have to remove the property when releasing
 		 * the device node.
 		 */
-		remove_ddw(np, false, DIRECT64_PROPNAME);
+
+		if (remove_ddw(np, false, DIRECT64_PROPNAME))
+			remove_ddw(np, false, DMA64_PROPNAME);
+
 		if (pci && pci->table_group)
 			iommu_pseries_free_group(pci->table_group,
 					np->full_name);
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v2 14/14] powerpc/pseries/iommu: Rename "direct window" to "dma window"
  2020-09-11 17:07 [PATCH v2 00/14] DDW Indirect Mapping Leonardo Bras
                   ` (12 preceding siblings ...)
  2020-09-11 17:07 ` [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping Leonardo Bras
@ 2020-09-11 17:07 ` Leonardo Bras
  2020-09-29  3:55   ` Alexey Kardashevskiy
  13 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras @ 2020-09-11 17:07 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Leonardo Bras, Joel Stanley, Christophe Leroy,
	Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, 

A previous change introduced the use of DDW as a bigger indirect DMA
window when the available DDW size does not map the whole partition.

As most of the code that manipulates direct mappings was reused for
indirect mappings, it's necessary to rename those symbols and debug/info
messages to reflect that they can be used for both kinds of mapping.

Also, define DEFAULT_DMA_WIN as "ibm,dma-window" to document that
this is the name of the default DMA window.
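
For reference, the define and one of the lookups it replaces, as in the
diff below:

	#define DEFAULT_DMA_WIN "ibm,dma-window"
	...
	dma_window = of_get_property(pdn, DEFAULT_DMA_WIN, NULL);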

These changes are not supposed to change how the code works in any
way; they only adjust naming.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 102 +++++++++++++------------
 1 file changed, 53 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index c4de23080d1b..56638b7f07fc 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -349,7 +349,7 @@ struct dynamic_dma_window_prop {
 	__be32	window_shift;	/* ilog2(tce_window_size) */
 };
 
-struct direct_window {
+struct dma_win {
 	struct device_node *device;
 	const struct dynamic_dma_window_prop *prop;
 	struct list_head list;
@@ -369,13 +369,14 @@ struct ddw_create_response {
 	u32 addr_lo;
 };
 
-static LIST_HEAD(direct_window_list);
+static LIST_HEAD(dma_win_list);
 /* prevents races between memory on/offline and window creation */
-static DEFINE_SPINLOCK(direct_window_list_lock);
+static DEFINE_SPINLOCK(dma_win_list_lock);
 /* protects initializing window twice for same device */
-static DEFINE_MUTEX(direct_window_init_mutex);
+static DEFINE_MUTEX(dma_win_init_mutex);
 #define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
 #define DMA64_PROPNAME "linux,dma64-ddr-window-info"
+#define DEFAULT_DMA_WIN "ibm,dma-window"
 
 static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
 					unsigned long num_pfn, const void *arg)
@@ -706,15 +707,18 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
 	pr_debug("pci_dma_bus_setup_pSeriesLP: setting up bus %pOF\n",
 		 dn);
 
-	/* Find nearest ibm,dma-window, walking up the device tree */
+	/*
+	 * Find nearest ibm,dma-window (default DMA window), walking up the
+	 * device tree
+	 */
 	for (pdn = dn; pdn != NULL; pdn = pdn->parent) {
-		dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
+		dma_window = of_get_property(pdn, DEFAULT_DMA_WIN, NULL);
 		if (dma_window != NULL)
 			break;
 	}
 
 	if (dma_window == NULL) {
-		pr_debug("  no ibm,dma-window property !\n");
+		pr_debug("  no %s property !\n", DEFAULT_DMA_WIN);
 		return;
 	}
 
@@ -810,11 +814,11 @@ static void remove_dma_window(struct device_node *np, u32 *ddw_avail,
 
 	ret = rtas_call(ddw_avail[DDW_REMOVE_PE_DMA_WIN], 1, 1, NULL, liobn);
 	if (ret)
-		pr_warn("%pOF: failed to remove direct window: rtas returned "
+		pr_warn("%pOF: failed to remove dma window: rtas returned "
 			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
 			np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
 	else
-		pr_debug("%pOF: successfully removed direct window: rtas returned "
+		pr_debug("%pOF: successfully removed dma window: rtas returned "
 			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
 			np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
 }
@@ -842,7 +846,7 @@ static int remove_ddw(struct device_node *np, bool remove_prop, const char *win_
 
 	ret = of_remove_property(np, win);
 	if (ret)
-		pr_warn("%pOF: failed to remove direct window property: %d\n",
+		pr_warn("%pOF: failed to remove dma window property: %d\n",
 			np, ret);
 	return 0;
 }
@@ -886,34 +890,34 @@ static phys_addr_t ddw_memory_hotplug_max(void)
 
 static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, bool *direct_mapping)
 {
-	struct direct_window *window;
-	const struct dynamic_dma_window_prop *direct64;
+	struct dma_win *window;
+	const struct dynamic_dma_window_prop *dma64;
 	unsigned long window_size;
 	bool found = false;
 
-	spin_lock(&direct_window_list_lock);
+	spin_lock(&dma_win_list_lock);
 	/* check if we already created a window and dupe that config if so */
-	list_for_each_entry(window, &direct_window_list, list) {
+	list_for_each_entry(window, &dma_win_list, list) {
 		if (window->device == pdn) {
-			direct64 = window->prop;
-			*dma_addr = be64_to_cpu(direct64->dma_base);
+			dma64 = window->prop;
+			*dma_addr = be64_to_cpu(dma64->dma_base);
 
-			window_size = (1UL << be32_to_cpu(direct64->window_shift));
+			window_size = (1UL << be32_to_cpu(dma64->window_shift));
 			*direct_mapping = (window_size >= ddw_memory_hotplug_max());
 
 			found = true;
 			break;
 		}
 	}
-	spin_unlock(&direct_window_list_lock);
+	spin_unlock(&dma_win_list_lock);
 
 	return found;
 }
 
-static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
-						const struct dynamic_dma_window_prop *dma64)
+static struct dma_win *ddw_list_new_entry(struct device_node *pdn,
+					  const struct dynamic_dma_window_prop *dma64)
 {
-	struct direct_window *window;
+	struct dma_win *window;
 
 	window = kzalloc(sizeof(*window), GFP_KERNEL);
 	if (!window)
@@ -929,7 +933,7 @@ static void find_existing_ddw_windows_named(const char *name)
 {
 	int len;
 	struct device_node *pdn;
-	struct direct_window *window;
+	struct dma_win *window;
 	const struct dynamic_dma_window_prop *dma64;
 
 	for_each_node_with_property(pdn, name) {
@@ -943,9 +947,9 @@ static void find_existing_ddw_windows_named(const char *name)
 		if (!window)
 			break;
 
-		spin_lock(&direct_window_list_lock);
-		list_add(&window->list, &direct_window_list);
-		spin_unlock(&direct_window_list_lock);
+		spin_lock(&dma_win_list_lock);
+		list_add(&window->list, &dma_win_list);
+		spin_unlock(&dma_win_list_lock);
 	}
 }
 
@@ -1179,7 +1183,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	u64 max_addr, win_addr;
 	struct device_node *dn;
 	u32 ddw_avail[DDW_APPLICABLE_SIZE];
-	struct direct_window *window;
+	struct dma_win *window;
 	const char *win_name;
 	struct property *win64 = NULL;
 	struct failed_ddw_pdn *fpdn;
@@ -1187,10 +1191,10 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	struct pci_dn *pci = PCI_DN(pdn);
 	struct iommu_table *tbl = pci->table_group->tables[0];
 
-	mutex_lock(&direct_window_init_mutex);
+	mutex_lock(&dma_win_init_mutex);
 
 	if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &direct_mapping)) {
-		mutex_unlock(&direct_window_init_mutex);
+		mutex_unlock(&dma_win_init_mutex);
 		return direct_mapping;
 	}
 
@@ -1241,7 +1245,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 		struct property *default_win;
 		int reset_win_ext;
 
-		default_win = of_find_property(pdn, "ibm,dma-window", NULL);
+		default_win = of_find_property(pdn, DEFAULT_DMA_WIN, NULL);
 		if (!default_win)
 			goto out_failed;
 
@@ -1272,8 +1276,8 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	} else if ((query.page_size & 1) && PAGE_SHIFT >= 12) {
 		page_shift = 12; /* 4kB */
 	} else {
-		dev_dbg(&dev->dev, "no supported direct page size in mask %x",
-			  query.page_size);
+		dev_dbg(&dev->dev, "no supported page size in mask %x",
+			query.page_size);
 		goto out_failed;
 	}
 
@@ -1331,7 +1335,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 		ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
 					    win64->value, tce_setrange_multi_pSeriesLP_walk);
 		if (ret) {
-			dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
+			dev_info(&dev->dev, "failed to map DMA window for %pOF: %d\n",
 				 dn, ret);
 			goto out_list_del;
 		}
@@ -1354,9 +1358,9 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 		set_iommu_table_base(&dev->dev, tbl);
 	}
 
-	spin_lock(&direct_window_list_lock);
-	list_add(&window->list, &direct_window_list);
-	spin_unlock(&direct_window_list_lock);
+	spin_lock(&dma_win_list_lock);
+	list_add(&window->list, &dma_win_list);
+	spin_unlock(&dma_win_list_lock);
 
 	dev->dev.archdata.dma_offset = win_addr;
 	goto out_unlock;
@@ -1387,7 +1391,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	list_add(&fpdn->list, &failed_ddw_pdn_list);
 
 out_unlock:
-	mutex_unlock(&direct_window_init_mutex);
+	mutex_unlock(&dma_win_init_mutex);
 	return win64 && direct_mapping;
 }
 
@@ -1411,7 +1415,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 
 	for (pdn = dn; pdn && PCI_DN(pdn) && !PCI_DN(pdn)->table_group;
 	     pdn = pdn->parent) {
-		dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
+		dma_window = of_get_property(pdn, DEFAULT_DMA_WIN, NULL);
 		if (dma_window)
 			break;
 	}
@@ -1461,7 +1465,7 @@ static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask)
 	 */
 	for (pdn = dn; pdn && PCI_DN(pdn) && !PCI_DN(pdn)->table_group;
 			pdn = pdn->parent) {
-		dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
+		dma_window = of_get_property(pdn, DEFAULT_DMA_WIN, NULL);
 		if (dma_window)
 			break;
 	}
@@ -1475,29 +1479,29 @@ static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask)
 static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
 		void *data)
 {
-	struct direct_window *window;
+	struct dma_win *window;
 	struct memory_notify *arg = data;
 	int ret = 0;
 
 	switch (action) {
 	case MEM_GOING_ONLINE:
-		spin_lock(&direct_window_list_lock);
-		list_for_each_entry(window, &direct_window_list, list) {
+		spin_lock(&dma_win_list_lock);
+		list_for_each_entry(window, &dma_win_list, list) {
 			ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
 					arg->nr_pages, window->prop);
 			/* XXX log error */
 		}
-		spin_unlock(&direct_window_list_lock);
+		spin_unlock(&dma_win_list_lock);
 		break;
 	case MEM_CANCEL_ONLINE:
 	case MEM_OFFLINE:
-		spin_lock(&direct_window_list_lock);
-		list_for_each_entry(window, &direct_window_list, list) {
+		spin_lock(&dma_win_list_lock);
+		list_for_each_entry(window, &dma_win_list, list) {
 			ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
 					arg->nr_pages, window->prop);
 			/* XXX log error */
 		}
-		spin_unlock(&direct_window_list_lock);
+		spin_unlock(&dma_win_list_lock);
 		break;
 	default:
 		break;
@@ -1518,7 +1522,7 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
 	struct of_reconfig_data *rd = data;
 	struct device_node *np = rd->dn;
 	struct pci_dn *pci = PCI_DN(np);
-	struct direct_window *window;
+	struct dma_win *window;
 
 	switch (action) {
 	case OF_RECONFIG_DETACH_NODE:
@@ -1537,15 +1541,15 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
 			iommu_pseries_free_group(pci->table_group,
 					np->full_name);
 
-		spin_lock(&direct_window_list_lock);
-		list_for_each_entry(window, &direct_window_list, list) {
+		spin_lock(&dma_win_list_lock);
+		list_for_each_entry(window, &dma_win_list, list) {
 			if (window->device == np) {
 				list_del(&window->list);
 				kfree(window);
 				break;
 			}
 		}
-		spin_unlock(&direct_window_list_lock);
+		spin_unlock(&dma_win_list_lock);
 		break;
 	default:
 		err = NOTIFY_DONE;
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 14/14] powerpc/pseries/iommu: Rename "direct window" to "dma window"
  2020-09-11 17:07 ` [PATCH v2 14/14] powerpc/pseries/iommu: Rename "direct window" to "dma window" Leonardo Bras
@ 2020-09-29  3:55   ` Alexey Kardashevskiy
  2020-09-29 20:54     ` Leonardo Bras
  0 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2020-09-29  3:55 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> A previous change introduced the usage of DDW as a bigger indirect DMA
> mapping when the DDW available size does not map the whole partition.
> 
> As most of the code that manipulates direct mappings was reused for
> indirect mappings, it's necessary to rename all names and debug/info
> messages to reflect that it can be used for both kinds of mapping.
> 
> Also, defines DEFAULT_DMA_WIN as "ibm,dma-window" to document that
> it's the name of the default DMA window.

"ibm,dma-window" is so old that it does not need a macro (which btw would
be DMA_WIN_PROPNAME to match the other names) :)


> Those changes are not supposed to change how the code works in any
> way, just adjust naming.

I simply have this in my .vimrc for cases like this one:

===
This should cause no behavioural change.
===


> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>   arch/powerpc/platforms/pseries/iommu.c | 102 +++++++++++++------------
>   1 file changed, 53 insertions(+), 49 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index c4de23080d1b..56638b7f07fc 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -349,7 +349,7 @@ struct dynamic_dma_window_prop {
>   	__be32	window_shift;	/* ilog2(tce_window_size) */
>   };
>   
> -struct direct_window {
> +struct dma_win {
>   	struct device_node *device;
>   	const struct dynamic_dma_window_prop *prop;
>   	struct list_head list;
> @@ -369,13 +369,14 @@ struct ddw_create_response {
>   	u32 addr_lo;
>   };
>   
> -static LIST_HEAD(direct_window_list);
> +static LIST_HEAD(dma_win_list);
>   /* prevents races between memory on/offline and window creation */
> -static DEFINE_SPINLOCK(direct_window_list_lock);
> +static DEFINE_SPINLOCK(dma_win_list_lock);
>   /* protects initializing window twice for same device */
> -static DEFINE_MUTEX(direct_window_init_mutex);
> +static DEFINE_MUTEX(dma_win_init_mutex);
>   #define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
>   #define DMA64_PROPNAME "linux,dma64-ddr-window-info"
> +#define DEFAULT_DMA_WIN "ibm,dma-window"
>   
>   static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
>   					unsigned long num_pfn, const void *arg)
> @@ -706,15 +707,18 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
>   	pr_debug("pci_dma_bus_setup_pSeriesLP: setting up bus %pOF\n",
>   		 dn);
>   
> -	/* Find nearest ibm,dma-window, walking up the device tree */
> +	/*
> +	 * Find nearest ibm,dma-window (default DMA window), walking up the
> +	 * device tree
> +	 */
>   	for (pdn = dn; pdn != NULL; pdn = pdn->parent) {
> -		dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
> +		dma_window = of_get_property(pdn, DEFAULT_DMA_WIN, NULL);
>   		if (dma_window != NULL)
>   			break;
>   	}
>   
>   	if (dma_window == NULL) {
> -		pr_debug("  no ibm,dma-window property !\n");
> +		pr_debug("  no %s property !\n", DEFAULT_DMA_WIN);
>   		return;
>   	}
>   
> @@ -810,11 +814,11 @@ static void remove_dma_window(struct device_node *np, u32 *ddw_avail,
>   
>   	ret = rtas_call(ddw_avail[DDW_REMOVE_PE_DMA_WIN], 1, 1, NULL, liobn);
>   	if (ret)
> -		pr_warn("%pOF: failed to remove direct window: rtas returned "
> +		pr_warn("%pOF: failed to remove dma window: rtas returned "
>   			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
>   			np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
>   	else
> -		pr_debug("%pOF: successfully removed direct window: rtas returned "
> +		pr_debug("%pOF: successfully removed dma window: rtas returned "


s/dma/DMA/ in all user-visible strings.



>   			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
>   			np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
>   }
> @@ -842,7 +846,7 @@ static int remove_ddw(struct device_node *np, bool remove_prop, const char *win_
>   
>   	ret = of_remove_property(np, win);
>   	if (ret)
> -		pr_warn("%pOF: failed to remove direct window property: %d\n",
> +		pr_warn("%pOF: failed to remove dma window property: %d\n",
>   			np, ret);
>   	return 0;
>   }
> @@ -886,34 +890,34 @@ static phys_addr_t ddw_memory_hotplug_max(void)
>   
>   static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, bool *direct_mapping)
>   {
> -	struct direct_window *window;
> -	const struct dynamic_dma_window_prop *direct64;
> +	struct dma_win *window;
> +	const struct dynamic_dma_window_prop *dma64;
>   	unsigned long window_size;
>   	bool found = false;
>   
> -	spin_lock(&direct_window_list_lock);
> +	spin_lock(&dma_win_list_lock);
>   	/* check if we already created a window and dupe that config if so */
> -	list_for_each_entry(window, &direct_window_list, list) {
> +	list_for_each_entry(window, &dma_win_list, list) {
>   		if (window->device == pdn) {
> -			direct64 = window->prop;
> -			*dma_addr = be64_to_cpu(direct64->dma_base);
> +			dma64 = window->prop;
> +			*dma_addr = be64_to_cpu(dma64->dma_base);
>   
> -			window_size = (1UL << be32_to_cpu(direct64->window_shift));
> +			window_size = (1UL << be32_to_cpu(dma64->window_shift));
>   			*direct_mapping = (window_size >= ddw_memory_hotplug_max());
>   
>   			found = true;
>   			break;
>   		}
>   	}
> -	spin_unlock(&direct_window_list_lock);
> +	spin_unlock(&dma_win_list_lock);
>   
>   	return found;
>   }
>   
> -static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
> -						const struct dynamic_dma_window_prop *dma64)
> +static struct dma_win *ddw_list_new_entry(struct device_node *pdn,
> +					  const struct dynamic_dma_window_prop *dma64)
>   {
> -	struct direct_window *window;
> +	struct dma_win *window;
>   
>   	window = kzalloc(sizeof(*window), GFP_KERNEL);
>   	if (!window)
> @@ -929,7 +933,7 @@ static void find_existing_ddw_windows_named(const char *name)
>   {
>   	int len;
>   	struct device_node *pdn;
> -	struct direct_window *window;
> +	struct dma_win *window;
>   	const struct dynamic_dma_window_prop *dma64;
>   
>   	for_each_node_with_property(pdn, name) {
> @@ -943,9 +947,9 @@ static void find_existing_ddw_windows_named(const char *name)
>   		if (!window)
>   			break;
>   
> -		spin_lock(&direct_window_list_lock);
> -		list_add(&window->list, &direct_window_list);
> -		spin_unlock(&direct_window_list_lock);
> +		spin_lock(&dma_win_list_lock);
> +		list_add(&window->list, &dma_win_list);
> +		spin_unlock(&dma_win_list_lock);
>   	}
>   }
>   
> @@ -1179,7 +1183,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	u64 max_addr, win_addr;
>   	struct device_node *dn;
>   	u32 ddw_avail[DDW_APPLICABLE_SIZE];
> -	struct direct_window *window;
> +	struct dma_win *window;
>   	const char *win_name;
>   	struct property *win64 = NULL;
>   	struct failed_ddw_pdn *fpdn;
> @@ -1187,10 +1191,10 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	struct pci_dn *pci = PCI_DN(pdn);
>   	struct iommu_table *tbl = pci->table_group->tables[0];
>   
> -	mutex_lock(&direct_window_init_mutex);
> +	mutex_lock(&dma_win_init_mutex);
>   
>   	if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &direct_mapping)) {
> -		mutex_unlock(&direct_window_init_mutex);
> +		mutex_unlock(&dma_win_init_mutex);
>   		return direct_mapping;
>   	}
>   
> @@ -1241,7 +1245,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		struct property *default_win;
>   		int reset_win_ext;
>   
> -		default_win = of_find_property(pdn, "ibm,dma-window", NULL);
> +		default_win = of_find_property(pdn, DEFAULT_DMA_WIN, NULL);
>   		if (!default_win)
>   			goto out_failed;
>   
> @@ -1272,8 +1276,8 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	} else if ((query.page_size & 1) && PAGE_SHIFT >= 12) {
>   		page_shift = 12; /* 4kB */
>   	} else {
> -		dev_dbg(&dev->dev, "no supported direct page size in mask %x",
> -			  query.page_size);
> +		dev_dbg(&dev->dev, "no supported page size in mask %x",
> +			query.page_size);
>   		goto out_failed;
>   	}
>   
> @@ -1331,7 +1335,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
>   					    win64->value, tce_setrange_multi_pSeriesLP_walk);
>   		if (ret) {
> -			dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
> +			dev_info(&dev->dev, "failed to map DMA window for %pOF: %d\n",
>   				 dn, ret);
>   			goto out_list_del;
>   		}
> @@ -1354,9 +1358,9 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		set_iommu_table_base(&dev->dev, tbl);
>   	}
>   
> -	spin_lock(&direct_window_list_lock);
> -	list_add(&window->list, &direct_window_list);
> -	spin_unlock(&direct_window_list_lock);
> +	spin_lock(&dma_win_list_lock);
> +	list_add(&window->list, &dma_win_list);
> +	spin_unlock(&dma_win_list_lock);
>   
>   	dev->dev.archdata.dma_offset = win_addr;
>   	goto out_unlock;
> @@ -1387,7 +1391,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	list_add(&fpdn->list, &failed_ddw_pdn_list);
>   
>   out_unlock:
> -	mutex_unlock(&direct_window_init_mutex);
> +	mutex_unlock(&dma_win_init_mutex);
>   	return win64 && direct_mapping;
>   }
>   
> @@ -1411,7 +1415,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
>   
>   	for (pdn = dn; pdn && PCI_DN(pdn) && !PCI_DN(pdn)->table_group;
>   	     pdn = pdn->parent) {
> -		dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
> +		dma_window = of_get_property(pdn, DEFAULT_DMA_WIN, NULL);
>   		if (dma_window)
>   			break;
>   	}
> @@ -1461,7 +1465,7 @@ static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask)
>   	 */
>   	for (pdn = dn; pdn && PCI_DN(pdn) && !PCI_DN(pdn)->table_group;
>   			pdn = pdn->parent) {
> -		dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
> +		dma_window = of_get_property(pdn, DEFAULT_DMA_WIN, NULL);
>   		if (dma_window)
>   			break;
>   	}
> @@ -1475,29 +1479,29 @@ static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask)
>   static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
>   		void *data)
>   {
> -	struct direct_window *window;
> +	struct dma_win *window;
>   	struct memory_notify *arg = data;
>   	int ret = 0;
>   
>   	switch (action) {
>   	case MEM_GOING_ONLINE:
> -		spin_lock(&direct_window_list_lock);
> -		list_for_each_entry(window, &direct_window_list, list) {
> +		spin_lock(&dma_win_list_lock);
> +		list_for_each_entry(window, &dma_win_list, list) {
>   			ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
>   					arg->nr_pages, window->prop);
>   			/* XXX log error */
>   		}
> -		spin_unlock(&direct_window_list_lock);
> +		spin_unlock(&dma_win_list_lock);
>   		break;
>   	case MEM_CANCEL_ONLINE:
>   	case MEM_OFFLINE:
> -		spin_lock(&direct_window_list_lock);
> -		list_for_each_entry(window, &direct_window_list, list) {
> +		spin_lock(&dma_win_list_lock);
> +		list_for_each_entry(window, &dma_win_list, list) {
>   			ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
>   					arg->nr_pages, window->prop);
>   			/* XXX log error */
>   		}
> -		spin_unlock(&direct_window_list_lock);
> +		spin_unlock(&dma_win_list_lock);
>   		break;
>   	default:
>   		break;
> @@ -1518,7 +1522,7 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
>   	struct of_reconfig_data *rd = data;
>   	struct device_node *np = rd->dn;
>   	struct pci_dn *pci = PCI_DN(np);
> -	struct direct_window *window;
> +	struct dma_win *window;
>   
>   	switch (action) {
>   	case OF_RECONFIG_DETACH_NODE:
> @@ -1537,15 +1541,15 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
>   			iommu_pseries_free_group(pci->table_group,
>   					np->full_name);
>   
> -		spin_lock(&direct_window_list_lock);
> -		list_for_each_entry(window, &direct_window_list, list) {
> +		spin_lock(&dma_win_list_lock);
> +		list_for_each_entry(window, &dma_win_list, list) {
>   			if (window->device == np) {
>   				list_del(&window->list);
>   				kfree(window);
>   				break;
>   			}
>   		}
> -		spin_unlock(&direct_window_list_lock);
> +		spin_unlock(&dma_win_list_lock);
>   		break;
>   	default:
>   		err = NOTIFY_DONE;
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 11/14] powerpc/pseries/iommu: Update remove_dma_window() to accept property name
  2020-09-11 17:07 ` [PATCH v2 11/14] powerpc/pseries/iommu: Update remove_dma_window() to accept property name Leonardo Bras
@ 2020-09-29  3:56   ` Alexey Kardashevskiy
  2021-04-13  5:44     ` Leonardo Bras
  0 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2020-09-29  3:56 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> Update remove_dma_window() so it can be used to remove DDW with a given
> property name.
> 

Out of context this seems useless. How about?
===
At the moment pseries stores information about the created directly
mapped DDW window in DIRECT64_PROPNAME. We are going to implement an
indirect DDW window which we need to preserve across kexec, so we need
another property for that.
===

Feel free to correct my English :)


> This enables the creation of new property names for DDW, so we can
> have different usage for it, like indirect mapping.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>   arch/powerpc/platforms/pseries/iommu.c | 21 +++++++++++----------
>   1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index abd36b257725..f6a65ecd1db5 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -818,31 +818,32 @@ static void remove_dma_window(struct device_node *np, u32 *ddw_avail,
>   			np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
>   }
>   
> -static void remove_ddw(struct device_node *np, bool remove_prop)
> +static int remove_ddw(struct device_node *np, bool remove_prop, const char *win_name)
>   {
>   	struct property *win;
>   	u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   	int ret = 0;
>   
> +	win = of_find_property(np, win_name, NULL);
> +	if (!win)
> +		return -EINVAL;
> +
>   	ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
>   					 &ddw_avail[0], DDW_APPLICABLE_SIZE);
>   	if (ret)
> -		return;
> -
> -	win = of_find_property(np, DIRECT64_PROPNAME, NULL);
> -	if (!win)
> -		return;
> +		return 0;
>   
>   	if (win->length >= sizeof(struct dynamic_dma_window_prop))
>   		remove_dma_window(np, ddw_avail, win);
>   
>   	if (!remove_prop)
> -		return;
> +		return 0;
>   
>   	ret = of_remove_property(np, win);
>   	if (ret)
>   		pr_warn("%pOF: failed to remove direct window property: %d\n",
>   			np, ret);
> +	return 0;


You do not test the return code anywhere until 13/14, so I'd say merge
this one into 13/14; the same comment applies to 12/14. If you do not
move chunks in 13/14, it is going to be a fairly small patch.


>   }
>   
>   static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
> @@ -894,7 +895,7 @@ static int find_existing_ddw_windows(void)
>   	for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
>   		direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
>   		if (!direct64 || len < sizeof(*direct64)) {
> -			remove_ddw(pdn, true);
> +			remove_ddw(pdn, true, DIRECT64_PROPNAME);
>   			continue;
>   		}
>   
> @@ -1325,7 +1326,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	win64 = NULL;
>   
>   out_win_del:
> -	remove_ddw(pdn, true);
> +	remove_ddw(pdn, true, DIRECT64_PROPNAME);
>   
>   out_failed:
>   	if (default_win_removed)
> @@ -1480,7 +1481,7 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
>   		 * we have to remove the property when releasing
>   		 * the device node.
>   		 */
> -		remove_ddw(np, false);
> +		remove_ddw(np, false, DIRECT64_PROPNAME);
>   		if (pci && pci->table_group)
>   			iommu_pseries_free_group(pci->table_group,
>   					np->full_name);
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping
  2020-09-11 17:07 ` [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping Leonardo Bras
@ 2020-09-29  3:56   ` Alexey Kardashevskiy
  2021-04-13  5:49     ` Leonardo Bras
  0 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2020-09-29  3:56 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> So far it's assumed possible to map the guest RAM 1:1 to the bus, which
> works with a small number of devices. SRIOV changes it as the user can
> configure hundreds VFs and since phyp preallocates TCEs and does not
> allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
> per a PE to limit waste of physical pages.
> 
> As of today, if the assumed direct mapping is not possible, DDW creation
> is skipped and the default DMA window "ibm,dma-window" is used instead.
> 
> The default DMA window uses 4k pages instead of 64k pages, and since
> the amount of pages (TCEs) may stay the same (on pHyp case), making
> use of DDW instead of the default DMA window for indirect mapping will
> expand in up to 16x the amount of memory that can be mapped on DMA.
> 
> Indirect mapping will only be used if direct mapping is not a
> possibility.
> 
> For indirect mapping, it's necessary to re-create the iommu_table with
> the new DMA window parameters, so iommu_alloc() can use it.
> 
> Removing the default DMA window for using DDW with indirect mapping
> is only allowed if there is no current IOMMU memory allocated in
> the iommu_table. enable_ddw() is aborted otherwise.
> 
> Even though there won't be both direct and indirect mappings at the
> same time, we can't reuse the DIRECT64_PROPNAME property name, or else
> an older kexec()ed kernel can assume direct mapping, and skip
> iommu_alloc(), causing undesirable behavior.
> So a new property name DMA64_PROPNAME "linux,dma64-ddr-window-info"
> was created to represent a DDW that does not allow direct mapping.
> 
> Note: ddw_memory_hotplug_max() was moved up so it can be used in
> find_existing_ddw().
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>   arch/powerpc/platforms/pseries/iommu.c | 160 ++++++++++++++++---------
>   1 file changed, 103 insertions(+), 57 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 9b7c03652e72..c4de23080d1b 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -375,6 +375,7 @@ static DEFINE_SPINLOCK(direct_window_list_lock);
>   /* protects initializing window twice for same device */
>   static DEFINE_MUTEX(direct_window_init_mutex);
>   #define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
> +#define DMA64_PROPNAME "linux,dma64-ddr-window-info"
>   
>   static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
>   					unsigned long num_pfn, const void *arg)
> @@ -846,10 +847,48 @@ static int remove_ddw(struct device_node *np, bool remove_prop, const char *win_
>   	return 0;
>   }
>   
> -static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
> +static phys_addr_t ddw_memory_hotplug_max(void)


Please use a forward declaration or a separate patch; this adds
unnecessary noise to the actual change.
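
For example, a one-line forward declaration next to the other file-scope
statics would avoid moving the function body (a sketch, assuming the
helper keeps its current prototype):

	static phys_addr_t ddw_memory_hotplug_max(void);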


> +{
> +	phys_addr_t max_addr = memory_hotplug_max();
> +	struct device_node *memory;
> +
> +	/*
> +	 * The "ibm,pmemory" can appear anywhere in the address space.
> +	 * Assuming it is still backed by page structs, set the upper limit
> +	 * for the huge DMA window as MAX_PHYSMEM_BITS.
> +	 */
> +	if (of_find_node_by_type(NULL, "ibm,pmemory"))
> +		return (sizeof(phys_addr_t) * 8 <= MAX_PHYSMEM_BITS) ?
> +			(phys_addr_t)-1 : (1ULL << MAX_PHYSMEM_BITS);
> +
> +	for_each_node_by_type(memory, "memory") {
> +		unsigned long start, size;
> +		int n_mem_addr_cells, n_mem_size_cells, len;
> +		const __be32 *memcell_buf;
> +
> +		memcell_buf = of_get_property(memory, "reg", &len);
> +		if (!memcell_buf || len <= 0)
> +			continue;
> +
> +		n_mem_addr_cells = of_n_addr_cells(memory);
> +		n_mem_size_cells = of_n_size_cells(memory);
> +
> +		start = of_read_number(memcell_buf, n_mem_addr_cells);
> +		memcell_buf += n_mem_addr_cells;
> +		size = of_read_number(memcell_buf, n_mem_size_cells);
> +		memcell_buf += n_mem_size_cells;
> +
> +		max_addr = max_t(phys_addr_t, max_addr, start + size);
> +	}
> +
> +	return max_addr;
> +}
> +
> +static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, bool *direct_mapping)
>   {
>   	struct direct_window *window;
>   	const struct dynamic_dma_window_prop *direct64;
> +	unsigned long window_size;
>   	bool found = false;
>   
>   	spin_lock(&direct_window_list_lock);
> @@ -858,6 +897,10 @@ static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
>   		if (window->device == pdn) {
>   			direct64 = window->prop;
>   			*dma_addr = be64_to_cpu(direct64->dma_base);
> +
> +			window_size = (1UL << be32_to_cpu(direct64->window_shift));
> +			*direct_mapping = (window_size >= ddw_memory_hotplug_max());
> +
>   			found = true;
>   			break;
>   		}
> @@ -912,6 +955,7 @@ static int find_existing_ddw_windows(void)
>   		return 0;
>   
>   	find_existing_ddw_windows_named(DIRECT64_PROPNAME);
> +	find_existing_ddw_windows_named(DMA64_PROPNAME);
>   
>   	return 0;
>   }
> @@ -1054,43 +1098,6 @@ struct failed_ddw_pdn {
>   
>   static LIST_HEAD(failed_ddw_pdn_list);
>   
> -static phys_addr_t ddw_memory_hotplug_max(void)
> -{
> -	phys_addr_t max_addr = memory_hotplug_max();
> -	struct device_node *memory;
> -
> -	/*
> -	 * The "ibm,pmemory" can appear anywhere in the address space.
> -	 * Assuming it is still backed by page structs, set the upper limit
> -	 * for the huge DMA window as MAX_PHYSMEM_BITS.
> -	 */
> -	if (of_find_node_by_type(NULL, "ibm,pmemory"))
> -		return (sizeof(phys_addr_t) * 8 <= MAX_PHYSMEM_BITS) ?
> -			(phys_addr_t) -1 : (1ULL << MAX_PHYSMEM_BITS);
> -
> -	for_each_node_by_type(memory, "memory") {
> -		unsigned long start, size;
> -		int n_mem_addr_cells, n_mem_size_cells, len;
> -		const __be32 *memcell_buf;
> -
> -		memcell_buf = of_get_property(memory, "reg", &len);
> -		if (!memcell_buf || len <= 0)
> -			continue;
> -
> -		n_mem_addr_cells = of_n_addr_cells(memory);
> -		n_mem_size_cells = of_n_size_cells(memory);
> -
> -		start = of_read_number(memcell_buf, n_mem_addr_cells);
> -		memcell_buf += n_mem_addr_cells;
> -		size = of_read_number(memcell_buf, n_mem_size_cells);
> -		memcell_buf += n_mem_size_cells;
> -
> -		max_addr = max_t(phys_addr_t, max_addr, start + size);
> -	}
> -
> -	return max_addr;
> -}
> -
>   /*
>    * Platforms supporting the DDW option starting with LoPAR level 2.7 implement
>    * ibm,ddw-extensions, which carries the rtas token for
> @@ -1173,14 +1180,19 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	struct device_node *dn;
>   	u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   	struct direct_window *window;
> +	const char *win_name;
>   	struct property *win64 = NULL;
>   	struct failed_ddw_pdn *fpdn;
> -	bool default_win_removed = false;
> +	bool default_win_removed = false, direct_mapping = false;
> +	struct pci_dn *pci = PCI_DN(pdn);
> +	struct iommu_table *tbl = pci->table_group->tables[0];
>   
>   	mutex_lock(&direct_window_init_mutex);
>   
> -	if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset))
> -		goto out_unlock;
> +	if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &direct_mapping)) {
> +		mutex_unlock(&direct_window_init_mutex);
> +		return direct_mapping;
> +	}
>   
>   	/*
>   	 * If we already went through this for a previous function of
> @@ -1266,15 +1278,25 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	}
>   
>   	/* verify the window * number of ptes will map the partition */
> -	/* check largest block * page size > max memory hotplug addr */
>   	max_addr = ddw_memory_hotplug_max();
>   	if (query.largest_available_block < (max_addr >> page_shift)) {
> -		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu "
> -			  "%llu-sized pages\n", max_addr,  query.largest_available_block,
> -			  1ULL << page_shift);
> +		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu %llu-sized pages\n",
> +			max_addr, query.largest_available_block,
> +			1ULL << page_shift);
> +
> +		len = order_base_2(query.largest_available_block << page_shift);
> +		win_name = DMA64_PROPNAME;
> +	} else {
> +		direct_mapping = true;
> +		len = order_base_2(max_addr);
> +		win_name = DIRECT64_PROPNAME;
> +	}
> +
> +	/* DDW + IOMMU on single window may fail if there is any allocation */
> +	if (default_win_removed && !direct_mapping && iommu_table_in_use(tbl)) {
> +		dev_dbg(&dev->dev, "current IOMMU table in use, can't be replaced.\n");
>   		goto out_failed;
>   	}
> -	len = order_base_2(max_addr);
>   
>   	ret = create_ddw(dev, ddw_avail, &create, page_shift, len);
>   	if (ret != 0)
> @@ -1284,8 +1306,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		create.liobn, dn);
>   
>   	win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
> -	win64 = ddw_property_create(DIRECT64_PROPNAME, create.liobn, win_addr,
> -				    page_shift, len);
> +	win64 = ddw_property_create(win_name, create.liobn, win_addr, page_shift, len);
>   	if (!win64) {
>   		dev_info(&dev->dev,
>   			 "couldn't allocate property, property name, or value\n");
> @@ -1300,15 +1321,37 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	}
>   
>   	window = ddw_list_new_entry(pdn, win64->value);
> -	if (!window)
> +	if (!window) {
> +		dev_dbg(&dev->dev, "couldn't create new list entry\n");
>   		goto out_prop_del;
> +	}
>   
> -	ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
> -			win64->value, tce_setrange_multi_pSeriesLP_walk);
> -	if (ret) {
> -		dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
> -			 dn, ret);
> -		goto out_list_del;
> +	if (direct_mapping) {
> +		/* DDW maps the whole partition, so enable direct DMA mapping */
> +		ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
> +					    win64->value, tce_setrange_multi_pSeriesLP_walk);
> +		if (ret) {
> +			dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
> +				 dn, ret);
> +			goto out_list_del;
> +		}
> +	} else {
> +		/* New table for using DDW instead of the default DMA window */
> +		tbl = iommu_pseries_alloc_table(pci->phb->node);
> +		if (!tbl) {
> +			dev_dbg(&dev->dev, "couldn't create new IOMMU table\n");
> +			goto out_list_del;
> +		}
> +
> +		_iommu_table_setparms(tbl, pci->phb->bus->number, create.liobn, win_addr,
> +				      1UL << len, page_shift, 0, &iommu_table_lpar_multi_ops);
> +		iommu_init_table(tbl, pci->phb->node, 0, 0);


It is 0,0 only if win_addr > 0, which is not the QEMU case.
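
A rough sketch of what I mean (assumptions: the range that has to stay
unmapped is the PHB's 32-bit MMIO aperture, and which mem_resources[]
entry holds it would still need to be looked up properly):

	if (win_addr == 0) {
		/* Window starts at bus address 0: keep the TCE indexes that
		 * would alias the 32-bit MMIO space reserved. */
		struct resource *r = &pci->phb->mem_resources[0]; /* assumed MMIO32 entry */

		iommu_init_table(tbl, pci->phb->node,
				 r->start >> page_shift, r->end >> page_shift);
	} else {
		iommu_init_table(tbl, pci->phb->node, 0, 0);
	}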


> +
> +		/* Free old table and replace by the newer */
> +		iommu_tce_table_put(pci->table_group->tables[0]);
> +		pci->table_group->tables[0] = tbl;
> +
> +		set_iommu_table_base(&dev->dev, tbl);
>   	}
>   
>   	spin_lock(&direct_window_list_lock);
> @@ -1345,7 +1388,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   
>   out_unlock:
>   	mutex_unlock(&direct_window_init_mutex);
> -	return win64;
> +	return win64 && direct_mapping;
>   }
>   
>   static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
> @@ -1486,7 +1529,10 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
>   		 * we have to remove the property when releasing
>   		 * the device node.
>   		 */
> -		remove_ddw(np, false, DIRECT64_PROPNAME);
> +
> +		if (remove_ddw(np, false, DIRECT64_PROPNAME))
> +			remove_ddw(np, false, DMA64_PROPNAME);
> +
>   		if (pci && pci->table_group)
>   			iommu_pseries_free_group(pci->table_group,
>   					np->full_name);
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 10/14] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper
  2020-09-11 17:07 ` [PATCH v2 10/14] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper Leonardo Bras
@ 2020-09-29  3:56   ` Alexey Kardashevskiy
  2021-04-11  8:16     ` Leonardo Bras
  0 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2020-09-29  3:56 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> Add a new helper _iommu_table_setparms(), and use it in
> iommu_table_setparms() and iommu_table_setparms_lpar() to avoid duplicated
> code.
> 
> Also, setting tbl->it_ops was happening outside iommu_table_setparms*(),
> so move it to the new helper. Since we need the iommu_table_ops to be
> declared before they are used, move iommu_table_lpar_multi_ops and
> iommu_table_pseries_ops to before their respective iommu_table_setparms*().
> 
> The tce_exchange_pseries() also had to be moved up, since it's used in
> iommu_table_lpar_multi_ops.xchg_no_kill.


Use forward declarations (preferred) or make a separate patch for moving 
chunks (I do not see much point).
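
Something like this near the top of the file would be enough (just a sketch;
alternatively both ops structures could be made static and forward-declared
the same way):

extern struct iommu_table_ops iommu_table_pseries_ops;
extern struct iommu_table_ops iommu_table_lpar_multi_ops;

Then iommu_table_setparms()/iommu_table_setparms_lpar() can pass the ops to
the new helper without moving the definitions or tce_exchange_pseries() around.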


> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>   arch/powerpc/platforms/pseries/iommu.c | 149 ++++++++++++-------------
>   1 file changed, 72 insertions(+), 77 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 510ccb0521af..abd36b257725 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -495,12 +495,62 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
>   	return rc;
>   }
>   
> +#ifdef CONFIG_IOMMU_API
> +static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned
> +				long *tce, enum dma_data_direction *direction,
> +				bool realmode)
> +{
> +	long rc;
> +	unsigned long ioba = (unsigned long)index << tbl->it_page_shift;
> +	unsigned long flags, oldtce = 0;
> +	u64 proto_tce = iommu_direction_to_tce_perm(*direction);
> +	unsigned long newtce = *tce | proto_tce;
> +
> +	spin_lock_irqsave(&tbl->large_pool.lock, flags);
> +
> +	rc = plpar_tce_get((u64)tbl->it_index, ioba, &oldtce);
> +	if (!rc)
> +		rc = plpar_tce_put((u64)tbl->it_index, ioba, newtce);
> +
> +	if (!rc) {
> +		*direction = iommu_tce_direction(oldtce);
> +		*tce = oldtce & ~(TCE_PCI_READ | TCE_PCI_WRITE);
> +	}
> +
> +	spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
> +
> +	return rc;
> +}
> +#endif
> +
>   static int tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn,
>   		unsigned long num_pfn, void *arg)
>   {
>   	return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg);
>   }
>   
> +static inline void _iommu_table_setparms(struct iommu_table *tbl, unsigned long busno,
> +					 unsigned long liobn, unsigned long win_addr,
> +					 unsigned long window_size, unsigned long page_shift,
> +					 unsigned long base, struct iommu_table_ops *table_ops)
> +{
> +	tbl->it_busno = busno;
> +	tbl->it_index = liobn;
> +	tbl->it_offset = win_addr >> page_shift;
> +	tbl->it_size = window_size >> page_shift;
> +	tbl->it_page_shift = page_shift;
> +	tbl->it_base = base;
> +	tbl->it_blocksize = 16;
> +	tbl->it_type = TCE_PCI;
> +	tbl->it_ops = table_ops;
> +}
> +
> +struct iommu_table_ops iommu_table_pseries_ops = {
> +	.set = tce_build_pSeries,
> +	.clear = tce_free_pSeries,
> +	.get = tce_get_pseries
> +};
> +
>   static void iommu_table_setparms(struct pci_controller *phb,
>   				 struct device_node *dn,
>   				 struct iommu_table *tbl)
> @@ -509,8 +559,13 @@ static void iommu_table_setparms(struct pci_controller *phb,
>   	const unsigned long *basep;
>   	const u32 *sizep;
>   
> -	node = phb->dn;
> +	/* Test if we are going over 2GB of DMA space */
> +	if (phb->dma_window_base_cur + phb->dma_window_size > 0x80000000ul) {
> +		udbg_printf("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> +		panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> +	}


s/0x80000000ul/2*SZ_1G/

but more to the point - why this check? QEMU can create windows at 0 and 
as big as the VM requested. And I am pretty sure I can construct a QEMU 
command line such that it has no MMIO32 at all and a 4GB default DMA 
window.
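
In code form, the first remark is just this (a sketch; SZ_1G comes from
linux/sizes.h, and the multiplier wants to be unsigned long so the constant
does not overflow int):

	if (phb->dma_window_base_cur + phb->dma_window_size > 2UL * SZ_1G) {
		udbg_printf("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
		panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
	}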


>   
> +	node = phb->dn;
>   	basep = of_get_property(node, "linux,tce-base", NULL);
>   	sizep = of_get_property(node, "linux,tce-size", NULL);
>   	if (basep == NULL || sizep == NULL) {
> @@ -519,33 +574,25 @@ static void iommu_table_setparms(struct pci_controller *phb,
>   		return;
>   	}
>   
> -	tbl->it_base = (unsigned long)__va(*basep);
> +	_iommu_table_setparms(tbl, phb->bus->number, 0, phb->dma_window_base_cur,
> +			      phb->dma_window_size, IOMMU_PAGE_SHIFT_4K,
> +			      (unsigned long)__va(*basep), &iommu_table_pseries_ops);
>   
>   	if (!is_kdump_kernel())
>   		memset((void *)tbl->it_base, 0, *sizep);
>   
> -	tbl->it_busno = phb->bus->number;
> -	tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
> -
> -	/* Units of tce entries */
> -	tbl->it_offset = phb->dma_window_base_cur >> tbl->it_page_shift;
> -
> -	/* Test if we are going over 2GB of DMA space */
> -	if (phb->dma_window_base_cur + phb->dma_window_size > 0x80000000ul) {
> -		udbg_printf("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> -		panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> -	}
> -
>   	phb->dma_window_base_cur += phb->dma_window_size;
> -
> -	/* Set the tce table size - measured in entries */
> -	tbl->it_size = phb->dma_window_size >> tbl->it_page_shift;
> -
> -	tbl->it_index = 0;
> -	tbl->it_blocksize = 16;
> -	tbl->it_type = TCE_PCI;
>   }
>   
> +struct iommu_table_ops iommu_table_lpar_multi_ops = {
> +	.set = tce_buildmulti_pSeriesLP,
> +#ifdef CONFIG_IOMMU_API
> +	.xchg_no_kill = tce_exchange_pseries,
> +#endif
> +	.clear = tce_freemulti_pSeriesLP,
> +	.get = tce_get_pSeriesLP
> +};
> +
>   /*
>    * iommu_table_setparms_lpar
>    *
> @@ -557,28 +604,17 @@ static void iommu_table_setparms_lpar(struct pci_controller *phb,
>   				      struct iommu_table_group *table_group,
>   				      const __be32 *dma_window)
>   {
> -	unsigned long offset, size;
> +	unsigned long offset, size, liobn;
>   
> -	of_parse_dma_window(dn, dma_window, &tbl->it_index, &offset, &size);
> +	of_parse_dma_window(dn, dma_window, &liobn, &offset, &size);
>   
> -	tbl->it_busno = phb->bus->number;
> -	tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
> -	tbl->it_base   = 0;
> -	tbl->it_blocksize  = 16;
> -	tbl->it_type = TCE_PCI;
> -	tbl->it_offset = offset >> tbl->it_page_shift;
> -	tbl->it_size = size >> tbl->it_page_shift;
> +	_iommu_table_setparms(tbl, phb->bus->number, liobn, offset, size, IOMMU_PAGE_SHIFT_4K, 0,
> +			      &iommu_table_lpar_multi_ops);
>   
>   	table_group->tce32_start = offset;
>   	table_group->tce32_size = size;
>   }
>   
> -struct iommu_table_ops iommu_table_pseries_ops = {
> -	.set = tce_build_pSeries,
> -	.clear = tce_free_pSeries,
> -	.get = tce_get_pseries
> -};
> -
>   static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
>   {
>   	struct device_node *dn;
> @@ -647,7 +683,6 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
>   	tbl = pci->table_group->tables[0];
>   
>   	iommu_table_setparms(pci->phb, dn, tbl);
> -	tbl->it_ops = &iommu_table_pseries_ops;
>   	iommu_init_table(tbl, pci->phb->node, 0, 0);
>   
>   	/* Divide the rest (1.75GB) among the children */
> @@ -658,43 +693,6 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
>   	pr_debug("ISA/IDE, window size is 0x%llx\n", pci->phb->dma_window_size);
>   }
>   
> -#ifdef CONFIG_IOMMU_API
> -static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned
> -				long *tce, enum dma_data_direction *direction,
> -				bool realmode)
> -{
> -	long rc;
> -	unsigned long ioba = (unsigned long) index << tbl->it_page_shift;
> -	unsigned long flags, oldtce = 0;
> -	u64 proto_tce = iommu_direction_to_tce_perm(*direction);
> -	unsigned long newtce = *tce | proto_tce;
> -
> -	spin_lock_irqsave(&tbl->large_pool.lock, flags);
> -
> -	rc = plpar_tce_get((u64)tbl->it_index, ioba, &oldtce);
> -	if (!rc)
> -		rc = plpar_tce_put((u64)tbl->it_index, ioba, newtce);
> -
> -	if (!rc) {
> -		*direction = iommu_tce_direction(oldtce);
> -		*tce = oldtce & ~(TCE_PCI_READ | TCE_PCI_WRITE);
> -	}
> -
> -	spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
> -
> -	return rc;
> -}
> -#endif
> -
> -struct iommu_table_ops iommu_table_lpar_multi_ops = {
> -	.set = tce_buildmulti_pSeriesLP,
> -#ifdef CONFIG_IOMMU_API
> -	.xchg_no_kill = tce_exchange_pseries,
> -#endif
> -	.clear = tce_freemulti_pSeriesLP,
> -	.get = tce_get_pSeriesLP
> -};
> -
>   static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
>   {
>   	struct iommu_table *tbl;
> @@ -729,7 +727,6 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
>   		tbl = ppci->table_group->tables[0];
>   		iommu_table_setparms_lpar(ppci->phb, pdn, tbl,
>   				ppci->table_group, dma_window);
> -		tbl->it_ops = &iommu_table_lpar_multi_ops;
>   		iommu_init_table(tbl, ppci->phb->node, 0, 0);
>   		iommu_register_group(ppci->table_group,
>   				pci_domain_nr(bus), 0);
> @@ -758,7 +755,6 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
>   		PCI_DN(dn)->table_group = iommu_pseries_alloc_group(phb->node);
>   		tbl = PCI_DN(dn)->table_group->tables[0];
>   		iommu_table_setparms(phb, dn, tbl);
> -		tbl->it_ops = &iommu_table_pseries_ops;
>   		iommu_init_table(tbl, phb->node, 0, 0);
>   		set_iommu_table_base(&dev->dev, tbl);
>   		return;
> @@ -1385,7 +1381,6 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
>   		tbl = pci->table_group->tables[0];
>   		iommu_table_setparms_lpar(pci->phb, pdn, tbl,
>   				pci->table_group, dma_window);
> -		tbl->it_ops = &iommu_table_lpar_multi_ops;
>   		iommu_init_table(tbl, pci->phb->node, 0, 0);
>   		iommu_register_group(pci->table_group,
>   				pci_domain_nr(pci->phb->bus), 0);
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 09/14] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()
  2020-09-11 17:07 ` [PATCH v2 09/14] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw() Leonardo Bras
@ 2020-09-29  3:56   ` Alexey Kardashevskiy
  2021-04-11  7:52     ` Leonardo Bras
  0 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2020-09-29  3:56 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> Code used to create a ddw property that was previously scattered in
> enable_ddw() is now gathered in ddw_property_create(), which deals with
> allocation and filling the property, leaving it ready for
> of_add_property(), which now occurs in sequence.
> 
> This created an opportunity to reorganize the second part of enable_ddw():
> 
> Without this patch enable_ddw() does, in order:
> kzalloc() property & members, create_ddw(), fill ddwprop inside property,
> ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory,
> of_add_property(), and list_add().
> 
> With this patch enable_ddw() does, in order:
> create_ddw(), ddw_property_create(), of_add_property(),
> ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory,
> and list_add().
> 
> This change requires of_remove_property() in case anything fails after
> of_add_property(), but we get to do tce_setrange_multi_pSeriesLP_walk
> in all memory, which looks like the most expensive operation, only if
> everything else succeeds.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>   arch/powerpc/platforms/pseries/iommu.c | 106 +++++++++++++++----------
>   1 file changed, 63 insertions(+), 43 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index bba8584a8f9d..510ccb0521af 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -177,7 +177,7 @@ static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
>   		if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
>   			ret = (int)rc;
>   			tce_free_pSeriesLP(liobn, tcenum_start, tceshift,
> -			                   (npages_start - (npages + 1)));
> +					   (npages_start - (npages + 1)));

Unrelated whitespace change; the existing code does not seem incorrect 
(tabs first, then spaces).


>   			break;
>   		}
>   
> @@ -215,7 +215,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>   	if ((npages == 1) || !firmware_has_feature(FW_FEATURE_PUT_TCE_IND)) {
>   		return tce_build_pSeriesLP(tbl->it_index, tcenum,
>   					   tceshift, npages, uaddr,
> -		                           direction, attrs);
> +					   direction, attrs);


And here.

>   	}
>   
>   	local_irq_save(flags);	/* to protect tcep and the page behind it */
> @@ -269,7 +269,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>   	if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
>   		ret = (int)rc;
>   		tce_freemulti_pSeriesLP(tbl, tcenum_start,
> -		                        (npages_start - (npages + limit)));
> +					(npages_start - (npages + limit)));

And here.

>   		return ret;
>   	}
>   
> @@ -1121,6 +1121,35 @@ static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
>   			 ret);
>   }
>   
> +static struct property *ddw_property_create(const char *propname, u32 liobn, u64 dma_addr,
> +					    u32 page_shift, u32 window_shift)
> +{
> +	struct dynamic_dma_window_prop *ddwprop;
> +	struct property *win64;
> +
> +	win64 = kzalloc(sizeof(*win64), GFP_KERNEL);
> +	if (!win64)
> +		return NULL;
> +
> +	win64->name = kstrdup(propname, GFP_KERNEL);
> +	ddwprop = kzalloc(sizeof(*ddwprop), GFP_KERNEL);
> +	win64->value = ddwprop;
> +	win64->length = sizeof(*ddwprop);
> +	if (!win64->name || !win64->value) {
> +		kfree(win64->name);
> +		kfree(win64->value);
> +		kfree(win64);
> +		return NULL;
> +	}
> +
> +	ddwprop->liobn = cpu_to_be32(liobn);
> +	ddwprop->dma_base = cpu_to_be64(dma_addr);
> +	ddwprop->tce_shift = cpu_to_be32(page_shift);
> +	ddwprop->window_shift = cpu_to_be32(window_shift);
> +
> +	return win64;
> +}
> +
>   /*
>    * If the PE supports dynamic dma windows, and there is space for a table
>    * that can map all pages in a linear offset, then setup such a table,
> @@ -1138,12 +1167,11 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	struct ddw_query_response query;
>   	struct ddw_create_response create;
>   	int page_shift;
> -	u64 max_addr;
> +	u64 max_addr, win_addr;
>   	struct device_node *dn;
>   	u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   	struct direct_window *window;
> -	struct property *win64;
> -	struct dynamic_dma_window_prop *ddwprop;
> +	struct property *win64 = NULL;
>   	struct failed_ddw_pdn *fpdn;
>   	bool default_win_removed = false;
>   
> @@ -1245,72 +1273,64 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		goto out_failed;
>   	}
>   	len = order_base_2(max_addr);
> -	win64 = kzalloc(sizeof(struct property), GFP_KERNEL);
> -	if (!win64) {
> -		dev_info(&dev->dev,
> -			"couldn't allocate property for 64bit dma window\n");
> -		goto out_failed;
> -	}
> -	win64->name = kstrdup(DIRECT64_PROPNAME, GFP_KERNEL);
> -	win64->value = ddwprop = kmalloc(sizeof(*ddwprop), GFP_KERNEL);
> -	win64->length = sizeof(*ddwprop);
> -	if (!win64->name || !win64->value) {
> -		dev_info(&dev->dev,
> -			"couldn't allocate property name and value\n");
> -		goto out_free_prop;
> -	}
>   
>   	ret = create_ddw(dev, ddw_avail, &create, page_shift, len);
>   	if (ret != 0)
> -		goto out_free_prop;
> -
> -	ddwprop->liobn = cpu_to_be32(create.liobn);
> -	ddwprop->dma_base = cpu_to_be64(((u64)create.addr_hi << 32) |
> -			create.addr_lo);
> -	ddwprop->tce_shift = cpu_to_be32(page_shift);
> -	ddwprop->window_shift = cpu_to_be32(len);
> +		goto out_failed;
>   
>   	dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %pOF\n",
> -		  create.liobn, dn);
> +		create.liobn, dn);


Unrelated. If you think the spaces/tabs thing needs to be fixed, make it 
a separate patch and do all these changes there at once.


>   
> -	window = ddw_list_new_entry(pdn, ddwprop);
> +	win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
> +	win64 = ddw_property_create(DIRECT64_PROPNAME, create.liobn, win_addr,
> +				    page_shift, len);
> +	if (!win64) {
> +		dev_info(&dev->dev,
> +			 "couldn't allocate property, property name, or value\n");
> +		goto out_win_del;
> +	}
> +
> +	ret = of_add_property(pdn, win64);
> +	if (ret) {
> +		dev_err(&dev->dev, "unable to add dma window property for %pOF: %d",
> +			pdn, ret);
> +		goto out_prop_free;
> +	}
> +
> +	window = ddw_list_new_entry(pdn, win64->value);
>   	if (!window)
> -		goto out_clear_window;
> +		goto out_prop_del;
>   
>   	ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
>   			win64->value, tce_setrange_multi_pSeriesLP_walk);
>   	if (ret) {
>   		dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
>   			 dn, ret);
> -		goto out_free_window;
> -	}
> -
> -	ret = of_add_property(pdn, win64);
> -	if (ret) {
> -		dev_err(&dev->dev, "unable to add dma window property for %pOF: %d",
> -			 pdn, ret);
> -		goto out_free_window;
> +		goto out_list_del;
>   	}
>   
>   	spin_lock(&direct_window_list_lock);
>   	list_add(&window->list, &direct_window_list);
>   	spin_unlock(&direct_window_list_lock);
>   
> -	dev->dev.archdata.dma_offset = be64_to_cpu(ddwprop->dma_base);
> +	dev->dev.archdata.dma_offset = win_addr;
>   	goto out_unlock;
>   
> -out_free_window:
> +out_list_del:
>   	kfree(window);
>   
> -out_clear_window:
> -	remove_ddw(pdn, true);
> +out_prop_del:
> +	of_remove_property(pdn, win64);
>   
> -out_free_prop:
> +out_prop_free:


Really? :) s/out_prop_del/out_del_prop/ maybe? The fewer unrelated 
changes, the better.


>   	kfree(win64->name);
>   	kfree(win64->value);
>   	kfree(win64);
>   	win64 = NULL;
>   
> +out_win_del:
> +	remove_ddw(pdn, true);
> +
>   out_failed:
>   	if (default_win_removed)
>   		reset_dma_window(dev, pdn);
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/14] powerpc/pseries/iommu: Replace hard-coded page shift
  2020-09-11 17:07 ` [PATCH v2 01/14] powerpc/pseries/iommu: Replace hard-coded page shift Leonardo Bras
@ 2020-09-29  3:56   ` Alexey Kardashevskiy
  2020-09-29 18:25     ` Leonardo Bras
  0 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2020-09-29  3:56 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,

These normally go right before "signed-off-by".


> 
> Some functions assume IOMMU page size can only be 4K (pageshift == 12).
> Update them to accept any page size passed, so we can use 64K pages.
> 
> In the process, some defines like TCE_SHIFT were made obsolete, and then
> removed.
> 
> IODA3 Revision 3.0_prd1 (OpenPowerFoundation), Figures 3.4 and 3.5 show
> a RPN of 52-bit, and considers a 12-bit pageshift, so there should be
> no need of using TCE_RPN_MASK, which masks out any bit after 40 in rpn.
> Its usage is removed from tce_build_pSeries(), tce_build_pSeriesLP(), and
> tce_buildmulti_pSeriesLP().
> 
> Most places had a tbl struct, so using tbl->it_page_shift was simple.
> tce_free_pSeriesLP() was a special case, since callers do not always have a
> tbl struct, so adding a tceshift parameter seems the right thing to do.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>


Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


> ---
>   arch/powerpc/include/asm/tce.h         |  8 ------
>   arch/powerpc/platforms/pseries/iommu.c | 39 +++++++++++++++-----------
>   2 files changed, 23 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
> index db5fc2f2262d..0c34d2756d92 100644
> --- a/arch/powerpc/include/asm/tce.h
> +++ b/arch/powerpc/include/asm/tce.h
> @@ -19,15 +19,7 @@
>   #define TCE_VB			0
>   #define TCE_PCI			1
>   
> -/* TCE page size is 4096 bytes (1 << 12) */
> -
> -#define TCE_SHIFT	12
> -#define TCE_PAGE_SIZE	(1 << TCE_SHIFT)
> -
>   #define TCE_ENTRY_SIZE		8		/* each TCE is 64 bits */
> -
> -#define TCE_RPN_MASK		0xfffffffffful  /* 40-bit RPN (4K pages) */
> -#define TCE_RPN_SHIFT		12
>   #define TCE_VALID		0x800		/* TCE valid */
>   #define TCE_ALLIO		0x400		/* TCE valid for all lpars */
>   #define TCE_PCI_WRITE		0x2		/* write from PCI allowed */
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index e4198700ed1a..9db3927607a4 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -107,6 +107,8 @@ static int tce_build_pSeries(struct iommu_table *tbl, long index,
>   	u64 proto_tce;
>   	__be64 *tcep;
>   	u64 rpn;
> +	const unsigned long tceshift = tbl->it_page_shift;
> +	const unsigned long pagesize = IOMMU_PAGE_SIZE(tbl);
>   
>   	proto_tce = TCE_PCI_READ; // Read allowed
>   
> @@ -117,10 +119,10 @@ static int tce_build_pSeries(struct iommu_table *tbl, long index,
>   
>   	while (npages--) {
>   		/* can't move this out since we might cross MEMBLOCK boundary */
> -		rpn = __pa(uaddr) >> TCE_SHIFT;
> -		*tcep = cpu_to_be64(proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT);
> +		rpn = __pa(uaddr) >> tceshift;
> +		*tcep = cpu_to_be64(proto_tce | rpn << tceshift);
>   
> -		uaddr += TCE_PAGE_SIZE;
> +		uaddr += pagesize;
>   		tcep++;
>   	}
>   	return 0;
> @@ -146,7 +148,7 @@ static unsigned long tce_get_pseries(struct iommu_table *tbl, long index)
>   	return be64_to_cpu(*tcep);
>   }
>   
> -static void tce_free_pSeriesLP(unsigned long liobn, long, long);
> +static void tce_free_pSeriesLP(unsigned long liobn, long, long, long);
>   static void tce_freemulti_pSeriesLP(struct iommu_table*, long, long);
>   
>   static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
> @@ -166,12 +168,12 @@ static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
>   		proto_tce |= TCE_PCI_WRITE;
>   
>   	while (npages--) {
> -		tce = proto_tce | (rpn & TCE_RPN_MASK) << tceshift;
> +		tce = proto_tce | rpn << tceshift;
>   		rc = plpar_tce_put((u64)liobn, (u64)tcenum << tceshift, tce);
>   
>   		if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
>   			ret = (int)rc;
> -			tce_free_pSeriesLP(liobn, tcenum_start,
> +			tce_free_pSeriesLP(liobn, tcenum_start, tceshift,
>   			                   (npages_start - (npages + 1)));
>   			break;
>   		}
> @@ -205,10 +207,11 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>   	long tcenum_start = tcenum, npages_start = npages;
>   	int ret = 0;
>   	unsigned long flags;
> +	const unsigned long tceshift = tbl->it_page_shift;
>   
>   	if ((npages == 1) || !firmware_has_feature(FW_FEATURE_PUT_TCE_IND)) {
>   		return tce_build_pSeriesLP(tbl->it_index, tcenum,
> -					   tbl->it_page_shift, npages, uaddr,
> +					   tceshift, npages, uaddr,
>   		                           direction, attrs);
>   	}
>   
> @@ -225,13 +228,13 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>   		if (!tcep) {
>   			local_irq_restore(flags);
>   			return tce_build_pSeriesLP(tbl->it_index, tcenum,
> -					tbl->it_page_shift,
> +					tceshift,
>   					npages, uaddr, direction, attrs);
>   		}
>   		__this_cpu_write(tce_page, tcep);
>   	}
>   
> -	rpn = __pa(uaddr) >> TCE_SHIFT;
> +	rpn = __pa(uaddr) >> tceshift;
>   	proto_tce = TCE_PCI_READ;
>   	if (direction != DMA_TO_DEVICE)
>   		proto_tce |= TCE_PCI_WRITE;
> @@ -245,12 +248,12 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>   		limit = min_t(long, npages, 4096/TCE_ENTRY_SIZE);
>   
>   		for (l = 0; l < limit; l++) {
> -			tcep[l] = cpu_to_be64(proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT);
> +			tcep[l] = cpu_to_be64(proto_tce | rpn << tceshift);
>   			rpn++;
>   		}
>   
>   		rc = plpar_tce_put_indirect((u64)tbl->it_index,
> -					    (u64)tcenum << 12,
> +					    (u64)tcenum << tceshift,
>   					    (u64)__pa(tcep),
>   					    limit);
>   
> @@ -277,12 +280,13 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
>   	return ret;
>   }
>   
> -static void tce_free_pSeriesLP(unsigned long liobn, long tcenum, long npages)
> +static void tce_free_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
> +			       long npages)
>   {
>   	u64 rc;
>   
>   	while (npages--) {
> -		rc = plpar_tce_put((u64)liobn, (u64)tcenum << 12, 0);
> +		rc = plpar_tce_put((u64)liobn, (u64)tcenum << tceshift, 0);
>   
>   		if (rc && printk_ratelimit()) {
>   			printk("tce_free_pSeriesLP: plpar_tce_put failed. rc=%lld\n", rc);
> @@ -301,9 +305,11 @@ static void tce_freemulti_pSeriesLP(struct iommu_table *tbl, long tcenum, long n
>   	u64 rc;
>   
>   	if (!firmware_has_feature(FW_FEATURE_STUFF_TCE))
> -		return tce_free_pSeriesLP(tbl->it_index, tcenum, npages);
> +		return tce_free_pSeriesLP(tbl->it_index, tcenum,
> +					  tbl->it_page_shift, npages);
>   
> -	rc = plpar_tce_stuff((u64)tbl->it_index, (u64)tcenum << 12, 0, npages);
> +	rc = plpar_tce_stuff((u64)tbl->it_index,
> +			     (u64)tcenum << tbl->it_page_shift, 0, npages);
>   
>   	if (rc && printk_ratelimit()) {
>   		printk("tce_freemulti_pSeriesLP: plpar_tce_stuff failed\n");
> @@ -319,7 +325,8 @@ static unsigned long tce_get_pSeriesLP(struct iommu_table *tbl, long tcenum)
>   	u64 rc;
>   	unsigned long tce_ret;
>   
> -	rc = plpar_tce_get((u64)tbl->it_index, (u64)tcenum << 12, &tce_ret);
> +	rc = plpar_tce_get((u64)tbl->it_index,
> +			   (u64)tcenum << tbl->it_page_shift, &tce_ret);
>   
>   	if (rc && printk_ratelimit()) {
>   		printk("tce_get_pSeriesLP: plpar_tce_get failed. rc=%lld\n", rc);
> 

-- 
--
Alexey

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 05/14] powerpc/kernel/iommu: Add new iommu_table_in_use() helper
  2020-09-11 17:07 ` [PATCH v2 05/14] powerpc/kernel/iommu: Add new iommu_table_in_use() helper Leonardo Bras
@ 2020-09-29  3:57   ` Alexey Kardashevskiy
  2021-04-11  6:55     ` Leonardo Bras
  0 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2020-09-29  3:57 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> Having a function to check if the iommu table has any allocation helps
> deciding if a tbl can be reset for using a new DMA window.
> 
> It should be enough to replace all instances of !bitmap_empty(tbl...).
> 
> iommu_table_in_use() skips reserved memory, so we don't need to worry about
> releasing it before testing. This causes iommu_table_release_pages() to
> become unnecessary, given it is only used to remove reserved memory for
> testing.
> 
> Also, only allow storing reserved memory values in tbl if they are valid
> in the table, so there is no need to check it in the new helper.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>   arch/powerpc/include/asm/iommu.h |  1 +
>   arch/powerpc/kernel/iommu.c      | 61 +++++++++++++++-----------------
>   2 files changed, 30 insertions(+), 32 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index 5032f1593299..2913e5c8b1f8 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -154,6 +154,7 @@ extern int iommu_tce_table_put(struct iommu_table *tbl);
>    */
>   extern struct iommu_table *iommu_init_table(struct iommu_table *tbl,
>   		int nid, unsigned long res_start, unsigned long res_end);
> +bool iommu_table_in_use(struct iommu_table *tbl);
>   
>   #define IOMMU_TABLE_GROUP_MAX_TABLES	2
>   
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index ffb2637dc82b..c838da3d8f32 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -655,34 +655,21 @@ static void iommu_table_reserve_pages(struct iommu_table *tbl,
>   	if (tbl->it_offset == 0)
>   		set_bit(0, tbl->it_map);
>   
> +	/* Check if res_start..res_end is a valid range in the table */
> +	if (res_start >= res_end || res_start < tbl->it_offset ||
> +	    res_end > (tbl->it_offset + tbl->it_size)) {
> +		tbl->it_reserved_start = tbl->it_offset;
> +		tbl->it_reserved_end = tbl->it_offset;


This silently ignores any overlap between the reserved area and the 
window, which does not seem right.



> +		return;
> +	}
> +
>   	tbl->it_reserved_start = res_start;
>   	tbl->it_reserved_end = res_end;
>   
> -	/* Check if res_start..res_end isn't empty and overlaps the table */
> -	if (res_start && res_end &&
> -			(tbl->it_offset + tbl->it_size < res_start ||
> -			 res_end < tbl->it_offset))
> -		return;
> -
>   	for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
>   		set_bit(i - tbl->it_offset, tbl->it_map);
>   }
>   
> -static void iommu_table_release_pages(struct iommu_table *tbl)
> -{
> -	int i;
> -
> -	/*
> -	 * In case we have reserved the first bit, we should not emit
> -	 * the warning below.
> -	 */
> -	if (tbl->it_offset == 0)
> -		clear_bit(0, tbl->it_map);
> -
> -	for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
> -		clear_bit(i - tbl->it_offset, tbl->it_map);
> -}
> -
>   /*
>    * Build a iommu_table structure.  This contains a bit map which
>    * is used to manage allocation of the tce space.
> @@ -743,6 +730,23 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
>   	return tbl;
>   }
>   
> +bool iommu_table_in_use(struct iommu_table *tbl)
> +{
> +	unsigned long start = 0, end;
> +
> +	/* ignore reserved bit0 */
> +	if (tbl->it_offset == 0)
> +		start = 1;
> +	end = tbl->it_reserved_start - tbl->it_offset;
> +	if (find_next_bit(tbl->it_map, end, start) != end)
> +		return true;
> +
> +	start = tbl->it_reserved_end - tbl->it_offset;
> +	end = tbl->it_size;
> +	return find_next_bit(tbl->it_map, end, start) != end;
> +

Unnecessary empty line.

> +}
> +
>   static void iommu_table_free(struct kref *kref)
>   {
>   	unsigned long bitmap_sz;
> @@ -759,10 +763,8 @@ static void iommu_table_free(struct kref *kref)
>   		return;
>   	}
>   
> -	iommu_table_release_pages(tbl);
> -
>   	/* verify that table contains no entries */
> -	if (!bitmap_empty(tbl->it_map, tbl->it_size))
> +	if (iommu_table_in_use(tbl))
>   		pr_warn("%s: Unexpected TCEs\n", __func__);
>   
>   	/* calculate bitmap size in bytes */
> @@ -1068,18 +1070,13 @@ int iommu_take_ownership(struct iommu_table *tbl)
>   	for (i = 0; i < tbl->nr_pools; i++)
>   		spin_lock(&tbl->pools[i].lock);
>   
> -	iommu_table_release_pages(tbl);
> -
> -	if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
> +	if (iommu_table_in_use(tbl)) {
>   		pr_err("iommu_tce: it_map is not empty");
>   		ret = -EBUSY;
> -		/* Undo iommu_table_release_pages, i.e. restore bit#0, etc */
> -		iommu_table_reserve_pages(tbl, tbl->it_reserved_start,
> -				tbl->it_reserved_end);
> -	} else {
> -		memset(tbl->it_map, 0xff, sz);
>   	}
>   
> +	memset(tbl->it_map, 0xff, sz);
> +
>   	for (i = 0; i < tbl->nr_pools; i++)
>   		spin_unlock(&tbl->pools[i].lock);
>   	spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
> 

-- 
--
Alexey

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 02/14] powerpc/pseries/iommu: Makes sure IOMMU_PAGE_SIZE <= PAGE_SIZE
  2020-09-11 17:07 ` [PATCH v2 02/14] powerpc/pseries/iommu: Makes sure IOMMU_PAGE_SIZE <= PAGE_SIZE Leonardo Bras
@ 2020-09-29  3:57   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 40+ messages in thread
From: Alexey Kardashevskiy @ 2020-09-29  3:57 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> Having System pagesize < IOMMU pagesize may cause a page owned by another
> process/VM to be written by a buggy driver / device.
> 
> As it's intended to use DDW for indirect mapping, it's possible to get
> a configuration where (PAGE_SIZE = 4k) < (IOMMU_PAGE_SIZE() = 64k).


Have you tried a 4K VM kernel under phyp with 64K IOMMU pages? phyp may 
assume it is still correct, as it knows that it allocates memory in 256MB 
contiguous chunks; for DMA purposes phyp may assume that there is just a 
VM kernel which owns all the VM physical memory and can align the size in 
iommu_range_alloc() to the IOMMU page size. Not sure.


> 
> To avoid this, make sure create_ddw() can only use pagesize <= PAGE_SIZE.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>   arch/powerpc/platforms/pseries/iommu.c | 11 +++++++----
>   1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 9db3927607a4..4eff3a6a441e 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1206,17 +1206,20 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   			goto out_failed;
>   		}
>   	}
> -	if (query.page_size & 4) {
> -		page_shift = 24; /* 16MB */
> -	} else if (query.page_size & 2) {
> +
> +	/* Choose the biggest pagesize available <= PAGE_SHIFT */
> +	if ((query.page_size & 4)) {


This just doubles the parentheses.

Also I believe for the indirect case you should not even try 16MB.


> +		page_shift = 24; /* 16MB - Hugepages */
> +	} else if ((query.page_size & 2) && PAGE_SHIFT >= 16) {


PAGE_SHIFT == 16

>   		page_shift = 16; /* 64kB */
> -	} else if (query.page_size & 1) {
> +	} else if ((query.page_size & 1) && PAGE_SHIFT >= 12) {


Kind of a useless check: if it is not 64K, it is 4K.

I understand you are trying to be safe here, but this gives a new reader 
an idea of more flexibility than there is :) BUILD_BUG_ON() would do 
here imho.
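
Putting those remarks together, the page size selection could read roughly
like this (only a sketch, and leaving the 16MB/indirect question aside):

	/* State the base page size assumption at build time instead of
	 * checking it at runtime; pseries is 4K or 64K only. */
	BUILD_BUG_ON(PAGE_SHIFT != 12 && PAGE_SHIFT != 16);

	if ((query.page_size & 2) && PAGE_SHIFT == 16)
		page_shift = 16;	/* 64kB */
	else if (query.page_size & 1)
		page_shift = 12;	/* 4kB */
	else
		goto out_failed;	/* no usable page size offered */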



>   		page_shift = 12; /* 4kB */
>   	} else {
>   		dev_dbg(&dev->dev, "no supported direct page size in mask %x",
>   			  query.page_size);
>   		goto out_failed;
>   	}
> +

Unrelated.

>   	/* verify the window * number of ptes will map the partition */
>   	/* check largest block * page size > max memory hotplug addr */
>   	max_addr = ddw_memory_hotplug_max();
> 

-- 
--
Alexey

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 03/14] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs
  2020-09-11 17:07 ` [PATCH v2 03/14] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs Leonardo Bras
@ 2020-09-29  3:57   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 40+ messages in thread
From: Alexey Kardashevskiy @ 2020-09-29  3:57 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> Currently both iommu_alloc_coherent() and iommu_free_coherent() align the
> desired allocation size to PAGE_SIZE, and gets system pages and IOMMU
> mappings (TCEs) for that value.
> 
> When IOMMU_PAGE_SIZE < PAGE_SIZE, this behavior may cause unnecessary
> TCEs to be created for mapping the whole system page.
> 
> Example:
> - PAGE_SIZE = 64k, IOMMU_PAGE_SIZE() = 4k
> - iommu_alloc_coherent() is called for 128 bytes
> - 1 system page (64k) is allocated
> - 16 IOMMU pages (16 x 4k) are allocated (16 TCEs used)
> 
> It would be enough to use a single TCE for this, so 15 TCEs are
> wasted in the process.
> 
> Update iommu_*_coherent() to make sure the size alignment happens only
> for IOMMU_PAGE_SIZE() before calling iommu_alloc() and iommu_free().
> 
> Also, on iommu_range_alloc(), replace ALIGN(n, 1 << tbl->it_page_shift)
> with IOMMU_PAGE_ALIGN(n, tbl), which is easier to read and does the
> same.


This seems alright but rather unrelated to the series; it probably makes 
sense to post it separately.

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>   arch/powerpc/kernel/iommu.c | 16 ++++++++--------
>   1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index 9704f3f76e63..7961645a6980 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -237,10 +237,9 @@ static unsigned long iommu_range_alloc(struct device *dev,
>   	}
>   
>   	if (dev)
> -		boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> -				      1 << tbl->it_page_shift);
> +		boundary_size = IOMMU_PAGE_ALIGN(dma_get_seg_boundary(dev) + 1, tbl);
>   	else
> -		boundary_size = ALIGN(1UL << 32, 1 << tbl->it_page_shift);
> +		boundary_size = IOMMU_PAGE_ALIGN(1UL << 32, tbl);
>   	/* 4GB boundary for iseries_hv_alloc and iseries_hv_map */
>   
>   	n = iommu_area_alloc(tbl->it_map, limit, start, npages, tbl->it_offset,
> @@ -858,6 +857,7 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
>   	unsigned int order;
>   	unsigned int nio_pages, io_order;
>   	struct page *page;
> +	size_t size_io = size;
>   
>   	size = PAGE_ALIGN(size);
>   	order = get_order(size);
> @@ -884,8 +884,9 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
>   	memset(ret, 0, size);
>   
>   	/* Set up tces to cover the allocated range */
> -	nio_pages = size >> tbl->it_page_shift;
> -	io_order = get_iommu_order(size, tbl);
> +	size_io = IOMMU_PAGE_ALIGN(size_io, tbl);
> +	nio_pages = size_io >> tbl->it_page_shift;
> +	io_order = get_iommu_order(size_io, tbl);
>   	mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
>   			      mask >> tbl->it_page_shift, io_order, 0);
>   	if (mapping == DMA_MAPPING_ERROR) {
> @@ -900,10 +901,9 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size,
>   			 void *vaddr, dma_addr_t dma_handle)
>   {
>   	if (tbl) {
> -		unsigned int nio_pages;
> +		size_t size_io = IOMMU_PAGE_ALIGN(size, tbl);
> +		unsigned int nio_pages = size_io >> tbl->it_page_shift;
>   
> -		size = PAGE_ALIGN(size);
> -		nio_pages = size >> tbl->it_page_shift;
>   		iommu_free(tbl, dma_handle, nio_pages);
>   		size = PAGE_ALIGN(size);
>   		free_pages((unsigned long)vaddr, get_order(size));
> 

-- 
--
Alexey

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 07/14] powerpc/pseries/iommu: Add ddw_list_new_entry() helper
  2020-09-11 17:07 ` [PATCH v2 07/14] powerpc/pseries/iommu: Add ddw_list_new_entry() helper Leonardo Bras
@ 2020-09-29  3:57   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 40+ messages in thread
From: Alexey Kardashevskiy @ 2020-09-29  3:57 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev



On 12/09/2020 03:07, Leonardo Bras wrote:
> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> There are two functions creating direct_window_list entries in a
> similar way, so create a ddw_list_new_entry() to avoid duplication and
> simplify those functions.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


> ---
>   arch/powerpc/platforms/pseries/iommu.c | 32 +++++++++++++++++---------
>   1 file changed, 21 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 49008d2e217d..e4c17d27dcf3 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -869,6 +869,21 @@ static u64 find_existing_ddw(struct device_node *pdn)
>   	return dma_addr;
>   }
>   
> +static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
> +						const struct dynamic_dma_window_prop *dma64)
> +{
> +	struct direct_window *window;
> +
> +	window = kzalloc(sizeof(*window), GFP_KERNEL);
> +	if (!window)
> +		return NULL;
> +
> +	window->device = pdn;
> +	window->prop = dma64;
> +
> +	return window;
> +}
> +
>   static int find_existing_ddw_windows(void)
>   {
>   	int len;
> @@ -881,18 +896,15 @@ static int find_existing_ddw_windows(void)
>   
>   	for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
>   		direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
> -		if (!direct64)
> -			continue;
> -
> -		window = kzalloc(sizeof(*window), GFP_KERNEL);
> -		if (!window || len < sizeof(struct dynamic_dma_window_prop)) {
> -			kfree(window);
> +		if (!direct64 || len < sizeof(*direct64)) {
>   			remove_ddw(pdn, true);
>   			continue;
>   		}
>   
> -		window->device = pdn;
> -		window->prop = direct64;
> +		window = ddw_list_new_entry(pdn, direct64);
> +		if (!window)
> +			break;
> +
>   		spin_lock(&direct_window_list_lock);
>   		list_add(&window->list, &direct_window_list);
>   		spin_unlock(&direct_window_list_lock);
> @@ -1261,7 +1273,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   	dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %pOF\n",
>   		  create.liobn, dn);
>   
> -	window = kzalloc(sizeof(*window), GFP_KERNEL);
> +	window = ddw_list_new_entry(pdn, ddwprop);
>   	if (!window)
>   		goto out_clear_window;
>   
> @@ -1280,8 +1292,6 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		goto out_free_window;
>   	}
>   
> -	window->device = pdn;
> -	window->prop = ddwprop;
>   	spin_lock(&direct_window_list_lock);
>   	list_add(&window->list, &direct_window_list);
>   	spin_unlock(&direct_window_list_lock);
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 01/14] powerpc/pseries/iommu: Replace hard-coded page shift
  2020-09-29  3:56   ` Alexey Kardashevskiy
@ 2020-09-29 18:25     ` Leonardo Bras
  0 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras @ 2020-09-29 18:25 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:
> 
> On 12/09/2020 03:07, Leonardo Bras wrote:
> > Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> 
> These normally go right before "signed-off-by".
> 

Yeah, it looks like something went wrong between git format-patch and
git send-email. I will look into it in the next series.
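
For reference, this is roughly where the tags are expected to end up in the
commit message (a sketch using patch 01 as the example, description elided),
i.e. in the trailer block next to the Signed-off-by rather than at the top
of the body:

    powerpc/pseries/iommu: Replace hard-coded page shift

    <patch description>

    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Leonardo Bras <leobras.c@gmail.com>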

> 
> > Some functions assume IOMMU page size can only be 4K (pageshift == 12).
> > Update them to accept any page size passed, so we can use 64K pages.
> > 
> > In the process, some defines like TCE_SHIFT were made obsolete, and then
> > removed.
> > 
> > IODA3 Revision 3.0_prd1 (OpenPowerFoundation), Figures 3.4 and 3.5 show
> > a RPN of 52-bit, and considers a 12-bit pageshift, so there should be
> > no need of using TCE_RPN_MASK, which masks out any bit after 40 in rpn.
> > Its usage is removed from tce_build_pSeries(), tce_build_pSeriesLP(), and
> > tce_buildmulti_pSeriesLP().
> > 
> > Most places had a tbl struct, so using tbl->it_page_shift was simple.
> > tce_free_pSeriesLP() was a special case, since callers do not always have a
> > tbl struct, so adding a tceshift parameter seems the right thing to do.
> > 
> > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> 
> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Thanks for reviewing!

> 
> 
> > ---
> >   arch/powerpc/include/asm/tce.h         |  8 ------
> >   arch/powerpc/platforms/pseries/iommu.c | 39 +++++++++++++++-----------
> >   2 files changed, 23 insertions(+), 24 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
> > index db5fc2f2262d..0c34d2756d92 100644
> > --- a/arch/powerpc/include/asm/tce.h
> > +++ b/arch/powerpc/include/asm/tce.h
> > @@ -19,15 +19,7 @@
> >   #define TCE_VB			0
> >   #define TCE_PCI			1
> >   
> > -/* TCE page size is 4096 bytes (1 << 12) */
> > -
> > -#define TCE_SHIFT	12
> > -#define TCE_PAGE_SIZE	(1 << TCE_SHIFT)
> > -
> >   #define TCE_ENTRY_SIZE		8		/* each TCE is 64 bits */
> > -
> > -#define TCE_RPN_MASK		0xfffffffffful  /* 40-bit RPN (4K pages) */
> > -#define TCE_RPN_SHIFT		12
> >   #define TCE_VALID		0x800		/* TCE valid */
> >   #define TCE_ALLIO		0x400		/* TCE valid for all lpars */
> >   #define TCE_PCI_WRITE		0x2		/* write from PCI allowed */
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > index e4198700ed1a..9db3927607a4 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -107,6 +107,8 @@ static int tce_build_pSeries(struct iommu_table *tbl, long index,
> >   	u64 proto_tce;
> >   	__be64 *tcep;
> >   	u64 rpn;
> > +	const unsigned long tceshift = tbl->it_page_shift;
> > +	const unsigned long pagesize = IOMMU_PAGE_SIZE(tbl);
> >   
> >   	proto_tce = TCE_PCI_READ; // Read allowed
> >   
> > @@ -117,10 +119,10 @@ static int tce_build_pSeries(struct iommu_table *tbl, long index,
> >   
> >   	while (npages--) {
> >   		/* can't move this out since we might cross MEMBLOCK boundary */
> > -		rpn = __pa(uaddr) >> TCE_SHIFT;
> > -		*tcep = cpu_to_be64(proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT);
> > +		rpn = __pa(uaddr) >> tceshift;
> > +		*tcep = cpu_to_be64(proto_tce | rpn << tceshift);
> >   
> > -		uaddr += TCE_PAGE_SIZE;
> > +		uaddr += pagesize;
> >   		tcep++;
> >   	}
> >   	return 0;
> > @@ -146,7 +148,7 @@ static unsigned long tce_get_pseries(struct iommu_table *tbl, long index)
> >   	return be64_to_cpu(*tcep);
> >   }
> >   
> > -static void tce_free_pSeriesLP(unsigned long liobn, long, long);
> > +static void tce_free_pSeriesLP(unsigned long liobn, long, long, long);
> >   static void tce_freemulti_pSeriesLP(struct iommu_table*, long, long);
> >   
> >   static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
> > @@ -166,12 +168,12 @@ static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
> >   		proto_tce |= TCE_PCI_WRITE;
> >   
> >   	while (npages--) {
> > -		tce = proto_tce | (rpn & TCE_RPN_MASK) << tceshift;
> > +		tce = proto_tce | rpn << tceshift;
> >   		rc = plpar_tce_put((u64)liobn, (u64)tcenum << tceshift, tce);
> >   
> >   		if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
> >   			ret = (int)rc;
> > -			tce_free_pSeriesLP(liobn, tcenum_start,
> > +			tce_free_pSeriesLP(liobn, tcenum_start, tceshift,
> >   			                   (npages_start - (npages + 1)));
> >   			break;
> >   		}
> > @@ -205,10 +207,11 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
> >   	long tcenum_start = tcenum, npages_start = npages;
> >   	int ret = 0;
> >   	unsigned long flags;
> > +	const unsigned long tceshift = tbl->it_page_shift;
> >   
> >   	if ((npages == 1) || !firmware_has_feature(FW_FEATURE_PUT_TCE_IND)) {
> >   		return tce_build_pSeriesLP(tbl->it_index, tcenum,
> > -					   tbl->it_page_shift, npages, uaddr,
> > +					   tceshift, npages, uaddr,
> >   		                           direction, attrs);
> >   	}
> >   
> > @@ -225,13 +228,13 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
> >   		if (!tcep) {
> >   			local_irq_restore(flags);
> >   			return tce_build_pSeriesLP(tbl->it_index, tcenum,
> > -					tbl->it_page_shift,
> > +					tceshift,
> >   					npages, uaddr, direction, attrs);
> >   		}
> >   		__this_cpu_write(tce_page, tcep);
> >   	}
> >   
> > -	rpn = __pa(uaddr) >> TCE_SHIFT;
> > +	rpn = __pa(uaddr) >> tceshift;
> >   	proto_tce = TCE_PCI_READ;
> >   	if (direction != DMA_TO_DEVICE)
> >   		proto_tce |= TCE_PCI_WRITE;
> > @@ -245,12 +248,12 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
> >   		limit = min_t(long, npages, 4096/TCE_ENTRY_SIZE);
> >   
> >   		for (l = 0; l < limit; l++) {
> > -			tcep[l] = cpu_to_be64(proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT);
> > +			tcep[l] = cpu_to_be64(proto_tce | rpn << tceshift);
> >   			rpn++;
> >   		}
> >   
> >   		rc = plpar_tce_put_indirect((u64)tbl->it_index,
> > -					    (u64)tcenum << 12,
> > +					    (u64)tcenum << tceshift,
> >   					    (u64)__pa(tcep),
> >   					    limit);
> >   
> > @@ -277,12 +280,13 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
> >   	return ret;
> >   }
> >   
> > -static void tce_free_pSeriesLP(unsigned long liobn, long tcenum, long npages)
> > +static void tce_free_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
> > +			       long npages)
> >   {
> >   	u64 rc;
> >   
> >   	while (npages--) {
> > -		rc = plpar_tce_put((u64)liobn, (u64)tcenum << 12, 0);
> > +		rc = plpar_tce_put((u64)liobn, (u64)tcenum << tceshift, 0);
> >   
> >   		if (rc && printk_ratelimit()) {
> >   			printk("tce_free_pSeriesLP: plpar_tce_put failed. rc=%lld\n", rc);
> > @@ -301,9 +305,11 @@ static void tce_freemulti_pSeriesLP(struct iommu_table *tbl, long tcenum, long n
> >   	u64 rc;
> >   
> >   	if (!firmware_has_feature(FW_FEATURE_STUFF_TCE))
> > -		return tce_free_pSeriesLP(tbl->it_index, tcenum, npages);
> > +		return tce_free_pSeriesLP(tbl->it_index, tcenum,
> > +					  tbl->it_page_shift, npages);
> >   
> > -	rc = plpar_tce_stuff((u64)tbl->it_index, (u64)tcenum << 12, 0, npages);
> > +	rc = plpar_tce_stuff((u64)tbl->it_index,
> > +			     (u64)tcenum << tbl->it_page_shift, 0, npages);
> >   
> >   	if (rc && printk_ratelimit()) {
> >   		printk("tce_freemulti_pSeriesLP: plpar_tce_stuff failed\n");
> > @@ -319,7 +325,8 @@ static unsigned long tce_get_pSeriesLP(struct iommu_table *tbl, long tcenum)
> >   	u64 rc;
> >   	unsigned long tce_ret;
> >   
> > -	rc = plpar_tce_get((u64)tbl->it_index, (u64)tcenum << 12, &tce_ret);
> > +	rc = plpar_tce_get((u64)tbl->it_index,
> > +			   (u64)tcenum << tbl->it_page_shift, &tce_ret);
> >   
> >   	if (rc && printk_ratelimit()) {
> >   		printk("tce_get_pSeriesLP: plpar_tce_get failed. rc=%lld\n", rc);
> > 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 14/14] powerpc/pseries/iommu: Rename "direct window" to "dma window"
  2020-09-29  3:55   ` Alexey Kardashevskiy
@ 2020-09-29 20:54     ` Leonardo Bras
  2020-09-30  7:29       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras @ 2020-09-29 20:54 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

On Tue, 2020-09-29 at 13:55 +1000, Alexey Kardashevskiy wrote:
> 
> On 12/09/2020 03:07, Leonardo Bras wrote:
> > Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> > 
> > A previous change introduced the usage of DDW as a bigger indirect DMA
> > mapping when the DDW available size does not map the whole partition.
> > 
> > As most of the code that manipulates direct mappings was reused for
> > indirect mappings, it's necessary to rename all names and debug/info
> > messages to reflect that it can be used for both kinds of mapping.
> > 
> > Also, defines DEFAULT_DMA_WIN as "ibm,dma-window" to document that
> > it's the name of the default DMA window.
> 
> "ibm,dma-window" is so old that it does not need a macro (which btw would 
> be DMA_WIN_PROPNAME to match the other names) :)

Thanks for bringing that to my attention!
In fact, DMA_WIN_PROPNAME makes more sense, but it's still generic and
doesn't look to point to the default one.

Would that be ok to call it DEFAULT_WIN_PROPNAME?


> 
> 
> > Those changes are not supposed to change how the code works in any
> > way, just adjust naming.
> 
> I simply have this in my .vimrc for the cases like this one:
> 
> ===
> This should cause no behavioural change.
> ===

Great tip! I will make sure to have this saved here :)

Thank you!


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 14/14] powerpc/pseries/iommu: Rename "direct window" to "dma window"
  2020-09-29 20:54     ` Leonardo Bras
@ 2020-09-30  7:29       ` Alexey Kardashevskiy
  2021-04-13  6:03         ` Leonardo Bras
  0 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2020-09-30  7:29 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev



On 30/09/2020 06:54, Leonardo Bras wrote:
> On Tue, 2020-09-29 at 13:55 +1000, Alexey Kardashevskiy wrote:
>>
>> On 12/09/2020 03:07, Leonardo Bras wrote:
>>> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
>>>
>>> A previous change introduced the usage of DDW as a bigger indirect DMA
>>> mapping when the DDW available size does not map the whole partition.
>>>
>>> As most of the code that manipulates direct mappings was reused for
>>> indirect mappings, it's necessary to rename all names and debug/info
>>> messages to reflect that it can be used for both kinds of mapping.
>>>
>>> Also, defines DEFAULT_DMA_WIN as "ibm,dma-window" to document that
>>> it's the name of the default DMA window.
>>
>> "ibm,dma-window" is so old that it does not need a macro (which btw would
>> be DMA_WIN_PROPNAME to match the other names) :)
> 
> Thanks for bringing that to my attention!
> In fact, DMA_WIN_PROPNAME makes more sense, but it's still generic and
> doesn't look to point to the default one.
> 
> Would that be ok to call it DEFAULT_WIN_PROPNAME?


I would not touch it at all; the property name is painfully well known and 
is not going to change, ever.
not see any:

[fstn1-p1 kernel-dma-bypass]$ git grep "ibm,dma-window"  | wc -l
8
[fstn1-p1 kernel-dma-bypass]$ git grep "define.*ibm,dma-window"  | wc -l
0



> 
> 
>>
>>
>>> Those changes are not supposed to change how the code works in any
>>> way, just adjust naming.
>>
>> I simply have this in my .vimrc for the cases like this one:
>>
>> ===
>> This should cause no behavioural change.
>> ===
> 
> Great tip! I will make sure to have this saved here :)
> 
> Thank you!
> 

-- 
Alexey


* Re: [PATCH v2 05/14] powerpc/kernel/iommu: Add new iommu_table_in_use() helper
  2020-09-29  3:57   ` Alexey Kardashevskiy
@ 2021-04-11  6:55     ` Leonardo Bras
  0 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras @ 2021-04-11  6:55 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

Hello Alexey, thanks for the feedback!

On Tue, 2020-09-29 at 13:57 +1000, Alexey Kardashevskiy wrote:
> 
> On 12/09/2020 03:07, Leonardo Bras wrote:
> > diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> > index ffb2637dc82b..c838da3d8f32 100644
> > --- a/arch/powerpc/kernel/iommu.c
> > +++ b/arch/powerpc/kernel/iommu.c
> > @@ -655,34 +655,21 @@ static void iommu_table_reserve_pages(struct iommu_table *tbl,
> >   	if (tbl->it_offset == 0)
> >   		set_bit(0, tbl->it_map);
> >   
> > 
> > 
> > 
> > +	/* Check if res_start..res_end is a valid range in the table */
> > +	if (res_start >= res_end || res_start < tbl->it_offset ||
> > +	    res_end > (tbl->it_offset + tbl->it_size)) {
> > +		tbl->it_reserved_start = tbl->it_offset;
> > +		tbl->it_reserved_end = tbl->it_offset;
> 
> 
> This silently ignores overlapped range of the reserved area and the 
> window which does not seem right.

Humm, that makes sense.
Would it be better to do something like this?

if (res_start < tbl->it_offset) 
	res_start = tbl->it_offset;

if (res_end > (tbl->it_offset + tbl->it_size))
	res_end = tbl->it_offset + tbl->it_size;

if (res_start >= res_end) {
	tbl->it_reserved_start = tbl->it_offset;
	tbl->it_reserved_end = tbl->it_offset;
	return;
}


> > +		return;
> > +	}
> > +
> >   	tbl->it_reserved_start = res_start;
> >   	tbl->it_reserved_end = res_end;

> >   -	/* Check if res_start..res_end isn't empty and overlaps the table */
> > -	if (res_start && res_end &&
> > -			(tbl->it_offset + tbl->it_size < res_start ||
> > -			 res_end < tbl->it_offset))
> > -		return;
> > -
> >   	for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
> >   		set_bit(i - tbl->it_offset, tbl->it_map);
> >   }
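
Putting it together (just a sketch reconstructed from the hunk quoted
above plus the clamping, not the final patch), the helper would end up
looking roughly like:

static void iommu_table_reserve_pages(struct iommu_table *tbl,
		unsigned long res_start, unsigned long res_end)
{
	unsigned long i;

	/* Page 0 reservation stays as it is today */
	if (tbl->it_offset == 0)
		set_bit(0, tbl->it_map);

	/* Clamp the reserved range to the table instead of discarding it */
	if (res_start < tbl->it_offset)
		res_start = tbl->it_offset;
	if (res_end > (tbl->it_offset + tbl->it_size))
		res_end = tbl->it_offset + tbl->it_size;

	/* Empty (or fully clamped away) range: nothing to mark in it_map */
	if (res_start >= res_end) {
		tbl->it_reserved_start = tbl->it_offset;
		tbl->it_reserved_end = tbl->it_offset;
		return;
	}

	tbl->it_reserved_start = res_start;
	tbl->it_reserved_end = res_end;

	for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
		set_bit(i - tbl->it_offset, tbl->it_map);
}
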
> > +bool iommu_table_in_use(struct iommu_table *tbl)
> > +{
> > +	unsigned long start = 0, end;
> > +
> > +	/* ignore reserved bit0 */
> > +	if (tbl->it_offset == 0)
> > +		start = 1;
> > +	end = tbl->it_reserved_start - tbl->it_offset;
> > +	if (find_next_bit(tbl->it_map, end, start) != end)
> > +		return true;
> > +
> > +	start = tbl->it_reserved_end - tbl->it_offset;
> > +	end = tbl->it_size;
> > +	return find_next_bit(tbl->it_map, end, start) != end;
> > +
> 
> Unnecessary empty line.

Sure, removing. 
Thanks!



* Re: [PATCH v2 09/14] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()
  2020-09-29  3:56   ` Alexey Kardashevskiy
@ 2021-04-11  7:52     ` Leonardo Bras
  0 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras @ 2021-04-11  7:52 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:
> > 
> >   	dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %pOF\n",
> > -		  create.liobn, dn);
> > +		create.liobn, dn);
> 
> 
> Unrelated. If you think the spaces/tabs thing needs to be fixed, make it 
> a separate patch and do all these changes there at once.

Sorry, it was some issue with my editor / diff. 
I removed those changes for next version.

> > -out_free_prop:
> > +out_prop_free:
> 
> 
> Really? :) s/out_prop_del/out_del_prop/ may be? The less unrelated 
> changes the better.

I changed all labels I added to the out_<action>_<target> form, which I
think keeps them consistent with the existing labels.


Thanks for reviewing!
Leonardo Bras



* Re: [PATCH v2 10/14] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper
  2020-09-29  3:56   ` Alexey Kardashevskiy
@ 2021-04-11  8:16     ` Leonardo Bras
  0 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras @ 2021-04-11  8:16 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:
> 
> On 12/09/2020 03:07, Leonardo Bras wrote:
> > Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> > 
> > Add a new helper _iommu_table_setparms(), and use it in
> > iommu_table_setparms() and iommu_table_setparms_lpar() to avoid duplicated
> > code.
> > 
> > Also, setting tbl->it_ops was happening outside iommu_table_setparms*(),
> > so move it to the new helper. Since we need the iommu_table_ops to be
> > declared before used, move iommu_table_lpar_multi_ops and
> > iommu_table_pseries_ops to before their respective iommu_table_setparms*().
> > 
> > The tce_exchange_pseries() also had to be moved up, since it's used in
> > iommu_table_lpar_multi_ops.xchg_no_kill.
> 
> 
> Use forward declarations (preferred) or make a separate patch for moving 
> chunks (I do not see much point).

Fixed :)

> > @@ -509,8 +559,13 @@ static void iommu_table_setparms(struct pci_controller *phb,
> >   	const unsigned long *basep;
> >   	const u32 *sizep;

> > -	node = phb->dn;
> > +	/* Test if we are going over 2GB of DMA space */
> > 
> > 
> > 
> > +	if (phb->dma_window_base_cur + phb->dma_window_size > 0x80000000ul) {
> > +		udbg_printf("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> > +		panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> > +	}
> 
> 
> s/0x80000000ul/2*SZ_1G/

Done!
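
With that change, the pre-existing check would read something like this
(sketch only; SZ_1G comes from <linux/sizes.h>):

	/* Test if we are going over 2GB of DMA space */
	if (phb->dma_window_base_cur + phb->dma_window_size > 2 * SZ_1G) {
		udbg_printf("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
		panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
	}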

> 
> but more to the point - why this check? QEMU can create windows at 0 and 
> as big as the VM requested. And I am pretty sure I can construct QEMU 
> command line such as it won't have MMIO32 at all and a 4GB default DMA 
> window.
> 

Oh, the diff came out a little strange here. I did not add this snippet, it
was already in that function, but since I created the helper, the diff
made it look like I introduced this piece of code.
Please take a look at the diff snippet below. (These same lines were
already there.)

> > @@ -519,33 +574,25 @@ static void iommu_table_setparms(struct pci_controller *phb,
> >   		return;
> >   	}
> >   
> > -	tbl->it_base = (unsigned long)__va(*basep);
> > 
> > 
> > 
> > +	_iommu_table_setparms(tbl, phb->bus->number, 0, phb->dma_window_base_cur,
> > +			      phb->dma_window_size, IOMMU_PAGE_SHIFT_4K,
> > +			      (unsigned long)__va(*basep), &iommu_table_pseries_ops);
> >     	if (!is_kdump_kernel())
> > 
> > 
> > 
> >   		memset((void *)tbl->it_base, 0, *sizep);
> > 
> > -	tbl->it_busno = phb->bus->number;
> > -	tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
> > -
> > -	/* Units of tce entries */
> > -	tbl->it_offset = phb->dma_window_base_cur >> tbl->it_page_shift;
> > -
> > -	/* Test if we are going over 2GB of DMA space */
> > -	if (phb->dma_window_base_cur + phb->dma_window_size > 0x80000000ul) {
> > -		udbg_printf("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> > -		panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> > -	}
> > -
> >   	phb->dma_window_base_cur += phb->dma_window_size;
> > -
> > -	/* Set the tce table size - measured in entries */
> > -	tbl->it_size = phb->dma_window_size >> tbl->it_page_shift;
> > -
> > -	tbl->it_index = 0;
> > -	tbl->it_blocksize = 16;
> > -	tbl->it_type = TCE_PCI;
> >   }
> >   

Thanks for reviewing, Alexey!




* Re: [PATCH v2 11/14] powerpc/pseries/iommu: Update remove_dma_window() to accept property name
  2020-09-29  3:56   ` Alexey Kardashevskiy
@ 2021-04-13  5:44     ` Leonardo Bras
  0 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras @ 2021-04-13  5:44 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:
> 
> On 12/09/2020 03:07, Leonardo Bras wrote:
> > Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> > 
> > Update remove_dma_window() so it can be used to remove DDW with a given
> > property name.
> > 
> 
> Out of context this seems useless. How about?
> ===
> At the moment pseries stores information about the created directly mapped
> DDW window in DIRECT64_PROPNAME. We are going to implement an indirect DDW
> window, which we need to preserve during kexec, so we need another
> property for that.
> ===
> 
> Feel free to correct my english :)

Thanks Alexey! It helped me a lot to better describe the reasoning
behind the change!

> > 
> >   	ret = of_remove_property(np, win);
> >   	if (ret)
> >   		pr_warn("%pOF: failed to remove direct window property: %d\n",
> >   			np, ret);
> > +	return 0;
> 
> 
> You do not test the return code anywhere until 13/14 so I'd say merge 
> this one into 13/14, the same comment applies to 12/14. If you do not 
> move chunks in 13/14, it is going to be fairly small patch.

I have applied most of the suggested changes for patches 11, 12 and 13,
but as a single diff they still amount to 275 lines.
To be honest, 7 months after sending this patchset (and working on
other stuff), patch 13 is already a lot to read on its own, and merging
it with 11 & 12 seems to be too much.

Would it be ok to apply the changes and keep them all separate, or, as
a middle ground, merge just 11 & 12 together?

Adding your suggested text above should give enough context for them.
I could also explain why the return code is left unused for now.

Best regards,
Leonardo Bras




* Re: [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping
  2020-09-29  3:56   ` Alexey Kardashevskiy
@ 2021-04-13  5:49     ` Leonardo Bras
  2021-04-13  7:18       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras @ 2021-04-13  5:49 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

Thanks for the feedback!

On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:
> > -static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
> > +static phys_addr_t ddw_memory_hotplug_max(void)
> 
> 
> Please, forward declaration or a separate patch; this creates 
> unnecessary noise to the actual change.
> 

Sure, done!

> 
> > +		_iommu_table_setparms(tbl, pci->phb->bus->number, create.liobn, win_addr,
> > +				      1UL << len, page_shift, 0, &iommu_table_lpar_multi_ops);
> > +		iommu_init_table(tbl, pci->phb->node, 0, 0);
> 
> 
> It is 0,0 only if win_addr>0 which is not the QEMU case.
> 

Oh, ok.
I previously thought it was ok to use 0,0 here as any other usage in
this file was also 0,0. 

What should I use to get the correct parameters? Is using the previous
tbl->it_reserved_start and tbl->it_reserved_end enough?

Best regards,
Leonardo Bras
> 



* Re: [PATCH v2 14/14] powerpc/pseries/iommu: Rename "direct window" to "dma window"
  2020-09-30  7:29       ` Alexey Kardashevskiy
@ 2021-04-13  6:03         ` Leonardo Bras
  0 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras @ 2021-04-13  6:03 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

On Wed, 2020-09-30 at 17:29 +1000, Alexey Kardashevskiy wrote:
> 
> On 30/09/2020 06:54, Leonardo Bras wrote:
> > On Tue, 2020-09-29 at 13:55 +1000, Alexey Kardashevskiy wrote:
> > > 
> > > On 12/09/2020 03:07, Leonardo Bras wrote:
> > > > Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
> > > > 
> > > > A previous change introduced the usage of DDW as a bigger indirect DMA
> > > > mapping when the DDW available size does not map the whole partition.
> > > > 
> > > > As most of the code that manipulates direct mappings was reused for
> > > > indirect mappings, it's necessary to rename all names and debug/info
> > > > messages to reflect that it can be used for both kinds of mapping.
> > > > 
> > > > Also, defines DEFAULT_DMA_WIN as "ibm,dma-window" to document that
> > > > it's the name of the default DMA window.
> > > 
> > > "ibm,dma-window" is so old so it does not need a macro (which btw would
> > > be DMA_WIN_PROPNAME to match the other names) :)
> > 
> > Thanks for bringing that to my attention!
> > In fact, DMA_WIN_PROPNAME makes more sense, but it's still generic and
> > doesn't make it clear that it refers to the default window.
> > 
> > Would it be ok to call it DEFAULT_WIN_PROPNAME?
> 
> 
> I would not touch it at all, the property name is painfully known and 
> not going to change ever. Does anyone else define it as a macro? I do 
> not see any:

Ok then, reverting define :)

Thanks!

> 
> [fstn1-p1 kernel-dma-bypass]$ git grep "ibm,dma-window"  | wc -l
> 8
> [fstn1-p1 kernel-dma-bypass]$ git grep "define.*ibm,dma-window"  | wc -l
> 0
> 
> 
> 
> > 
> > 
> > > 
> > > 
> > > > Those changes are not supposed to change how the code works in any
> > > > way, just adjust naming.
> > > 
> > > I simply have this in my .vimrc for the cases like this one:
> > > 
> > > ===
> > > This should cause no behavioural change.
> > > ===
> > 
> > Great tip! I will make sure to have this saved here :)
> > 
> > Thank you!
> > 
> 




* Re: [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping
  2021-04-13  5:49     ` Leonardo Bras
@ 2021-04-13  7:18       ` Alexey Kardashevskiy
  2021-04-13  7:33         ` Leonardo Bras
  0 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2021-04-13  7:18 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev



On 13/04/2021 15:49, Leonardo Bras wrote:
> Thanks for the feedback!
> 
> On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:
>>> -static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
>>> +static phys_addr_t ddw_memory_hotplug_max(void)
>>
>>
>> Please, forward declaration or a separate patch; this creates
>> unnecessary noise to the actual change.
>>
> 
> Sure, done!
> 
>>
>>> +		_iommu_table_setparms(tbl, pci->phb->bus->number, create.liobn, win_addr,
>>> +				      1UL << len, page_shift, 0, &iommu_table_lpar_multi_ops);
>>> +		iommu_init_table(tbl, pci->phb->node, 0, 0);
>>
>>
>> It is 0,0 only if win_addr>0 which is not the QEMU case.
>>
> 
> Oh, ok.
> I previously thought it was ok to use 0,0 here as any other usage in
> this file was also 0,0.
> 
> What should I use to get the correct parameters? Is using the previous
> tbl->it_reserved_start and tbl->it_reserved_end enough?

depends on whether you carry reserved start/end even if they are outside 
of the dma window.


-- 
Alexey


* Re: [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping
  2021-04-13  7:18       ` Alexey Kardashevskiy
@ 2021-04-13  7:33         ` Leonardo Bras
  2021-04-13  7:41           ` Alexey Kardashevskiy
  0 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras @ 2021-04-13  7:33 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

On Tue, 2021-04-13 at 17:18 +1000, Alexey Kardashevskiy wrote:
> 
> On 13/04/2021 15:49, Leonardo Bras wrote:
> > Thanks for the feedback!
> > 
> > On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:
> > > > -static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
> > > > +static phys_addr_t ddw_memory_hotplug_max(void)
> > > 
> > > 
> > > Please, forward declaration or a separate patch; this creates
> > > unnecessary noise to the actual change.
> > > 
> > 
> > Sure, done!
> > 
> > > 
> > > > +		_iommu_table_setparms(tbl, pci->phb->bus->number, create.liobn, win_addr,
> > > > +				      1UL << len, page_shift, 0, &iommu_table_lpar_multi_ops);
> > > > +		iommu_init_table(tbl, pci->phb->node, 0, 0);
> > > 
> > > 
> > > It is 0,0 only if win_addr>0 which is not the QEMU case.
> > > 
> > 
> > Oh, ok.
> > I previously thought it was ok to use 0,0 here as any other usage in
> > this file was also 0,0.
> > 
> > What should I use to get the correct parameters? Is using the previous
> > tbl->it_reserved_start and tbl->it_reserved_end enough?
> 
> depends on whether you carry reserved start/end even if they are outside 
> of the dma window.
> 

Oh, that makes sense.
On a previous patch (5/14 IIRC), I changed the behavior to only store
the valid range on tbl, but now I understand why it's important to
store the raw value.

Ok, I will change it back so the reserved range stays in tbl even if it
does not intersect with the DMA window. This way I can reuse the values
in case of indirect mapping with DDW.

Is that ok? Are the reserved values supposed to stay the same after
changing from Default DMA window to DDW?

Best regards,
Leonardo Bras



* Re: [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping
  2021-04-13  7:33         ` Leonardo Bras
@ 2021-04-13  7:41           ` Alexey Kardashevskiy
  2021-04-13  7:58             ` Leonardo Bras
  0 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2021-04-13  7:41 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev



On 13/04/2021 17:33, Leonardo Bras wrote:
> On Tue, 2021-04-13 at 17:18 +1000, Alexey Kardashevskiy wrote:
>>
>> On 13/04/2021 15:49, Leonardo Bras wrote:
>>> Thanks for the feedback!
>>>
>>> On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:
>>>>> -static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
>>>>> +static phys_addr_t ddw_memory_hotplug_max(void)
>>>>
>>>>
>>>> Please, forward declaration or a separate patch; this creates
>>>> unnecessary noise to the actual change.
>>>>
>>>
>>> Sure, done!
>>>
>>>>
>>>>> +		_iommu_table_setparms(tbl, pci->phb->bus->number, create.liobn, win_addr,
>>>>> +				      1UL << len, page_shift, 0, &iommu_table_lpar_multi_ops);
>>>>> +		iommu_init_table(tbl, pci->phb->node, 0, 0);
>>>>
>>>>
>>>> It is 0,0 only if win_addr>0 which is not the QEMU case.
>>>>
>>>
>>> Oh, ok.
>>> I previously thought it was ok to use 0,0 here as any other usage in
>>> this file was also 0,0.
>>>
>>> What should I use to get the correct parameters? Is using the previous
>>> tbl->it_reserved_start and tbl->it_reserved_end enough?
>>
>> depends on whether you carry reserved start/end even if they are outside
>> of the dma window.
>>
> 
> Oh, that makes sense.
> On a previous patch (5/14 IIRC), I changed the behavior to only store
> the valid range on tbl, but now I understand why it's important to
> store the raw value.
> 
> Ok, I will change it back so the reserved range stays in tbl even if it
> does not intersect with the DMA window. This way I can reuse the values
> in case of indirect mapping with DDW.
> 
> Is that ok? Are the reserved values supposed to stay the same after
> changing from Default DMA window to DDW?

I added them to know what bits in it_map to ignore when checking if 
there is any active user of the table. If you have non zero reserved 
start/end but they do not affect it_map, then it is rather weird way to 
carry reserved start/end from DDW to no-DDW. May be do not set these at 
all for DDW with window start at 1<<59 and when going back to no-DDW (or 
if DDW starts at 0) - just set them from MMIO32, just as they are 
initialized in the first place.
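
In other words, something like this (very rough sketch; mmio32_start and
mmio32_end are made-up names for whatever gets read from the PHB, here
assumed to be already converted to TCE units):

	if (win_addr)
		/* huge DDW at 1<<59 - MMIO32 cannot overlap it, nothing to reserve */
		iommu_init_table(tbl, pci->phb->node, 0, 0);
	else
		/* window starting at 0 - carve the MMIO32 hole out of it */
		iommu_init_table(tbl, pci->phb->node, mmio32_start, mmio32_end);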



-- 
Alexey


* Re: [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping
  2021-04-13  7:41           ` Alexey Kardashevskiy
@ 2021-04-13  7:58             ` Leonardo Bras
  2021-04-13  8:24               ` Alexey Kardashevskiy
  0 siblings, 1 reply; 40+ messages in thread
From: Leonardo Bras @ 2021-04-13  7:58 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

On Tue, 2021-04-13 at 17:41 +1000, Alexey Kardashevskiy wrote:
> 
> On 13/04/2021 17:33, Leonardo Bras wrote:
> > On Tue, 2021-04-13 at 17:18 +1000, Alexey Kardashevskiy wrote:
> > > 
> > > On 13/04/2021 15:49, Leonardo Bras wrote:
> > > > Thanks for the feedback!
> > > > 
> > > > On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:
> > > > > > -static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
> > > > > > +static phys_addr_t ddw_memory_hotplug_max(void)
> > > > > 
> > > > > 
> > > > > Please, forward declaration or a separate patch; this creates
> > > > > unnecessary noise to the actual change.
> > > > > 
> > > > 
> > > > Sure, done!
> > > > 
> > > > > 
> > > > > > +		_iommu_table_setparms(tbl, pci->phb->bus->number, create.liobn, win_addr,
> > > > > > +				      1UL << len, page_shift, 0, &iommu_table_lpar_multi_ops);
> > > > > > +		iommu_init_table(tbl, pci->phb->node, 0, 0);
> > > > > 
> > > > > 
> > > > > It is 0,0 only if win_addr>0 which is not the QEMU case.
> > > > > 
> > > > 
> > > > Oh, ok.
> > > > I previously thought it was ok to use 0,0 here as any other usage in
> > > > this file was also 0,0.
> > > > 
> > > > What should I use to get the correct parameters? Is using the previous
> > > > tbl->it_reserved_start and tbl->it_reserved_end enough?
> > > 
> > > depends on whether you carry reserved start/end even if they are outside
> > > of the dma window.
> > > 
> > 
> > Oh, that makes sense.
> > On a previous patch (5/14 IIRC), I changed the behavior to only store
> > the valid range on tbl, but now I understand why it's important to
> > store the raw value.
> > 
> > Ok, I will change it back so the reserved range stays in tbl even if it
> > does not intersect with the DMA window. This way I can reuse the values
> > in case of indirect mapping with DDW.
> > 
> > Is that ok? Are the reserved values supposed to stay the same after
> > changing from Default DMA window to DDW?
> 
> I added them to know what bits in it_map to ignore when checking if 
> there is any active user of the table. If you have non zero reserved 
> start/end but they do not affect it_map, then it is rather weird way to 
> carry reserved start/end from DDW to no-DDW.
> 

Ok, agreed.

>  May be do not set these at 
> all for DDW with window start at 1<<59 and when going back to no-DDW (or 
> if DDW starts at 0) - just set them from MMIO32, just as they are 
> initialized in the first place.
> 

If I get it correctly from pci_of_scan.c, MMIO32 = {0, 32MB}, is that
correct?

So, if DDW starts at any value in this range (most probably at zero),
we should remove the rest, is that correct?

Could it always use iommu_init_table(..., 0, 32MB) here, so it always
reserves any part of the DMA window that's in this range? Or might there
be other ranges of reserved values?

> and when going back to no-DDW 

After iommu_init_table() there should be no failure, so it looks like
there is no 'going back to no-DDW'. Am I missing something?

Thanks for helping!

Best regards,
Leonardo Bras



* Re: [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping
  2021-04-13  7:58             ` Leonardo Bras
@ 2021-04-13  8:24               ` Alexey Kardashevskiy
  2021-04-13 23:01                 ` Leonardo Bras
  0 siblings, 1 reply; 40+ messages in thread
From: Alexey Kardashevskiy @ 2021-04-13  8:24 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev



On 13/04/2021 17:58, Leonardo Bras wrote:
> On Tue, 2021-04-13 at 17:41 +1000, Alexey Kardashevskiy wrote:
>>
>> On 13/04/2021 17:33, Leonardo Bras wrote:
>>> On Tue, 2021-04-13 at 17:18 +1000, Alexey Kardashevskiy wrote:
>>>>
>>>> On 13/04/2021 15:49, Leonardo Bras wrote:
>>>>> Thanks for the feedback!
>>>>>
>>>>> On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:
>>>>>>> -static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
>>>>>>> +static phys_addr_t ddw_memory_hotplug_max(void)
>>>>>>
>>>>>>
>>>>>> Please, forward declaration or a separate patch; this creates
>>>>>> unnecessary noise to the actual change.
>>>>>>
>>>>>
>>>>> Sure, done!
>>>>>
>>>>>>
>>>>>>> +		_iommu_table_setparms(tbl, pci->phb->bus->number, create.liobn, win_addr,
>>>>>>> +				      1UL << len, page_shift, 0, &iommu_table_lpar_multi_ops);
>>>>>>> +		iommu_init_table(tbl, pci->phb->node, 0, 0);
>>>>>>
>>>>>>
>>>>>> It is 0,0 only if win_addr>0 which is not the QEMU case.
>>>>>>
>>>>>
>>>>> Oh, ok.
>>>>> I previously thought it was ok to use 0,0 here as any other usage in
>>>>> this file was also 0,0.
>>>>>
>>>>> What should I use to get the correct parameters? Is using the previous
>>>>> tbl->it_reserved_start and tbl->it_reserved_end enough?
>>>>
>>>> depends on whether you carry reserved start/end even if they are outside
>>>> of the dma window.
>>>>
>>>
>>> Oh, that makes sense.
>>> On a previous patch (5/14 IIRC), I changed the behavior to only store
>>> the valid range on tbl, but now I understand why it's important to
>>> store the raw value.
>>>
>>> Ok, I will change it back so the reserved range stays in tbl even if it
>>> does not intersect with the DMA window. This way I can reuse the values
>>> in case of indirect mapping with DDW.
>>>
>>> Is that ok? Are the reserved values supposed to stay the same after
>>> changing from Default DMA window to DDW?
>>
>> I added them to know what bits in it_map to ignore when checking if
>> there is any active user of the table. If you have non zero reserved
>> start/end but they do not affect it_map, then it is rather weird way to
>> carry reserved start/end from DDW to no-DDW.
>>
> 
> Ok, agreed.
> 
>>   May be do not set these at
>> all for DDW with window start at 1<<59 and when going back to no-DDW (or
>> if DDW starts at 0) - just set them from MMIO32, just as they are
>> initialized in the first place.
>>
> 
> If I get it correctly from pci_of_scan.c, MMIO32 = {0, 32MB}, is that
> correct?

No, under QEMU it is 0x8000.0000-0x1.0000.0000:

/proc/device-tree/pci@800000020000000/ranges

7 cells for each resource, the second one is MMIO32 (the first is IO 
ports, the last is 64bit MMIO).

> 
> So, if DDW starts at any value in this range (most probably at zero),
> we should remove the rest, is that correct?
> 
> > Could it always use iommu_init_table(..., 0, 32MB) here, so it always
> > reserves any part of the DMA window that's in this range? Or might there
> > be other ranges of reserved values?
> 
>> and when going back to no-DDW
> 
> After iommu_init_table() there should be no failure, so it looks like
> there is no 'going back to no-DDW'. Am I missing something?

Well, a random driver could request 32bit DMA and if the new window is 
1:1, then it would break but this does not seem to happen and we do not 
support it anyway so no loss here.


-- 
Alexey


* Re: [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping
  2021-04-13  8:24               ` Alexey Kardashevskiy
@ 2021-04-13 23:01                 ` Leonardo Bras
  0 siblings, 0 replies; 40+ messages in thread
From: Leonardo Bras @ 2021-04-13 23:01 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy,
	Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev

On Tue, 2021-04-13 at 18:24 +1000, Alexey Kardashevskiy wrote:
> 
> On 13/04/2021 17:58, Leonardo Bras wrote:
> > On Tue, 2021-04-13 at 17:41 +1000, Alexey Kardashevskiy wrote:
> > > 
> > > On 13/04/2021 17:33, Leonardo Bras wrote:
> > > > On Tue, 2021-04-13 at 17:18 +1000, Alexey Kardashevskiy wrote:
> > > > > 
> > > > > On 13/04/2021 15:49, Leonardo Bras wrote:
> > > > > > Thanks for the feedback!
> > > > > > 
> > > > > > On Tue, 2020-09-29 at 13:56 +1000, Alexey Kardashevskiy wrote:
> > > > > > > > -static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
> > > > > > > > +static phys_addr_t ddw_memory_hotplug_max(void)
> > > > > > > 
> > > > > > > 
> > > > > > > Please, forward declaration or a separate patch; this creates
> > > > > > > unnecessary noise to the actual change.
> > > > > > > 
> > > > > > 
> > > > > > Sure, done!
> > > > > > 
> > > > > > > 
> > > > > > > > +		_iommu_table_setparms(tbl, pci->phb->bus->number, create.liobn, win_addr,
> > > > > > > > +				      1UL << len, page_shift, 0, &iommu_table_lpar_multi_ops);
> > > > > > > > +		iommu_init_table(tbl, pci->phb->node, 0, 0);
> > > > > > > 
> > > > > > > 
> > > > > > > It is 0,0 only if win_addr>0 which is not the QEMU case.
> > > > > > > 
> > > > > > 
> > > > > > Oh, ok.
> > > > > > I previously thought it was ok to use 0,0 here as any other usage in
> > > > > > this file was also 0,0.
> > > > > > 
> > > > > > What should I use to get the correct parameters? Is using the previous
> > > > > > tbl->it_reserved_start and tbl->it_reserved_end enough?
> > > > > 
> > > > > depends on whether you carry reserved start/end even if they are outside
> > > > > of the dma window.
> > > > > 
> > > > 
> > > > Oh, that makes sense.
> > > > On a previous patch (5/14 IIRC), I changed the behavior to only store
> > > > the valid range on tbl, but now I understand why it's important to
> > > > store the raw value.
> > > > 
> > > > Ok, I will change it back so the reserved range stays in tbl even if it
> > > > does not intersect with the DMA window. This way I can reuse the values
> > > > in case of indirect mapping with DDW.
> > > > 
> > > > Is that ok? Are the reserved values supposed to stay the same after
> > > > changing from Default DMA window to DDW?
> > > 
> > > I added them to know what bits in it_map to ignore when checking if
> > > there is any active user of the table. If you have non zero reserved
> > > start/end but they do not affect it_map, then it is rather weird way to
> > > carry reserved start/end from DDW to no-DDW.
> > > 
> > 
> > Ok, agreed.
> > 
> > >   May be do not set these at
> > > all for DDW with window start at 1<<59 and when going back to no-DDW (or
> > > if DDW starts at 0) - just set them from MMIO32, just as they are
> > > initialized in the first place.
> > > 
> > 
> > If I get it correctly from pci_of_scan.c, MMIO32 = {0, 32MB}, is that
> > correct?
> 
> No, under QEMU it is 0x8000.0000-0x1.0000.0000:
> 
> /proc/device-tree/pci@800000020000000/ranges
> 
> 7 cells for each resource, the second one is MMIO32 (the first is IO 
> ports, the last is 64bit MMIO).
> > 
> > So, if DDW starts at any value in this range (most probably at zero),
> > we should remove the rest, is that correct?
> > 
> > Could it always use iommu_init_table(..., 0, 32MB) here, so it always
> > reserves any part of the DMA window that's in this range? Or might there
> > be other ranges of reserved values?
> > 
> > > and when going back to no-DDW
> > 
> > After iommu_init_table() there should be no failure, so it looks like
> > there is no 'going back to no-DDW'. Am I missing something?
> 
> Well, a random driver could request 32bit DMA and if the new window is 
> 1:1, then it would break but this does not seem to happen and we do not 
> support it anyway so no loss here.
> 

So you would recommend reading "ranges" with of_get_property(), using
the second entry (cells 7 - 13) at this point, and getting its base &
size to make sure nothing gets mapped in that range? (It should have no
effect if the range does not intersect with the DMA window.)
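
If so, something like the sketch below is what I have in mind (the
helper name is made up, and the cell handling just follows the 7-cell
layout you described: PCI bus address taken from cells 1-2 of the
second entry, size from cells 5-6):

/* Sketch only: return the MMIO32 range (PCI bus addresses) of a PHB */
static int ddw_read_mmio32_range(struct device_node *dn, u64 *start, u64 *end)
{
	const __be32 *ranges;
	const int nc = 7;	/* 3 PCI addr + 2 parent addr + 2 size cells */
	int len;

	ranges = of_get_property(dn, "ranges", &len);
	if (!ranges || len < 2 * nc * (int)sizeof(__be32))
		return -ENODEV;

	/* second entry == MMIO32, as in the QEMU case you mentioned */
	ranges += nc;
	*start = ((u64)be32_to_cpu(ranges[1]) << 32) | be32_to_cpu(ranges[2]);
	*end = *start + (((u64)be32_to_cpu(ranges[5]) << 32) |
			 be32_to_cpu(ranges[6]));

	return 0;
}

Then start/end, shifted by the window's page shift, could go straight
into iommu_init_table() as the reserved range, and it should be a no-op
whenever that range does not intersect the DMA window.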

Thank you for reviewing!
Leonardo Bras



Thread overview: 40+ messages
2020-09-11 17:07 [PATCH v2 00/14] DDW Indirect Mapping Leonardo Bras
2020-09-11 17:07 ` [PATCH v2 01/14] powerpc/pseries/iommu: Replace hard-coded page shift Leonardo Bras
2020-09-29  3:56   ` Alexey Kardashevskiy
2020-09-29 18:25     ` Leonardo Bras
2020-09-11 17:07 ` [PATCH v2 02/14] powerpc/pseries/iommu: Makes sure IOMMU_PAGE_SIZE <= PAGE_SIZE Leonardo Bras
2020-09-29  3:57   ` Alexey Kardashevskiy
2020-09-11 17:07 ` [PATCH v2 03/14] powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs Leonardo Bras
2020-09-29  3:57   ` Alexey Kardashevskiy
2020-09-11 17:07 ` [PATCH v2 04/14] powerpc/kernel/iommu: Use largepool as a last resort when !largealloc Leonardo Bras
2020-09-11 17:07 ` [PATCH v2 05/14] powerpc/kernel/iommu: Add new iommu_table_in_use() helper Leonardo Bras
2020-09-29  3:57   ` Alexey Kardashevskiy
2021-04-11  6:55     ` Leonardo Bras
2020-09-11 17:07 ` [PATCH v2 06/14] powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper Leonardo Bras
2020-09-11 17:07 ` [PATCH v2 07/14] powerpc/pseries/iommu: Add ddw_list_new_entry() helper Leonardo Bras
2020-09-29  3:57   ` Alexey Kardashevskiy
2020-09-11 17:07 ` [PATCH v2 08/14] powerpc/pseries/iommu: Allow DDW windows starting at 0x00 Leonardo Bras
2020-09-11 17:07 ` [PATCH v2 09/14] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw() Leonardo Bras
2020-09-29  3:56   ` Alexey Kardashevskiy
2021-04-11  7:52     ` Leonardo Bras
2020-09-11 17:07 ` [PATCH v2 10/14] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper Leonardo Bras
2020-09-29  3:56   ` Alexey Kardashevskiy
2021-04-11  8:16     ` Leonardo Bras
2020-09-11 17:07 ` [PATCH v2 11/14] powerpc/pseries/iommu: Update remove_dma_window() to accept property name Leonardo Bras
2020-09-29  3:56   ` Alexey Kardashevskiy
2021-04-13  5:44     ` Leonardo Bras
2020-09-11 17:07 ` [PATCH v2 12/14] powerpc/pseries/iommu: Find existing DDW with given " Leonardo Bras
2020-09-11 17:07 ` [PATCH v2 13/14] powerpc/pseries/iommu: Make use of DDW for indirect mapping Leonardo Bras
2020-09-29  3:56   ` Alexey Kardashevskiy
2021-04-13  5:49     ` Leonardo Bras
2021-04-13  7:18       ` Alexey Kardashevskiy
2021-04-13  7:33         ` Leonardo Bras
2021-04-13  7:41           ` Alexey Kardashevskiy
2021-04-13  7:58             ` Leonardo Bras
2021-04-13  8:24               ` Alexey Kardashevskiy
2021-04-13 23:01                 ` Leonardo Bras
2020-09-11 17:07 ` [PATCH v2 14/14] powerpc/pseries/iommu: Rename "direct window" to "dma window" Leonardo Bras
2020-09-29  3:55   ` Alexey Kardashevskiy
2020-09-29 20:54     ` Leonardo Bras
2020-09-30  7:29       ` Alexey Kardashevskiy
2021-04-13  6:03         ` Leonardo Bras
