* [RFC PATCH 0/7 v2] ppc: enable dynamic dma window support
@ 2010-10-27  3:35 Nishanth Aravamudan
  2010-10-27  3:35 ` [RFC PATCH 1/7 v2] macio: ensure all dma routines get copied over Nishanth Aravamudan
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Nishanth Aravamudan @ 2010-10-27  3:35 UTC (permalink / raw)
  To: nacc; +Cc: sonnyrao, miltonm, Paul Mackerras, linuxppc-dev

The following series, which builds upon the series of cleanups I posted
on 9/15 and 10/18 as "ppc iommu cleanups", enables the pseries firmware
feature dynamic dma windows. This feature will allow future devices to
have a 64-bit DMA mapping covering all memory, coexisting with a smaller
IOMMU window in 32-bit PCI space.

Changes from v1 to v2:

Fixed numerous bugs/issues found in testing.
Reworked to be based off platform hook dma_set_mask().

Nishanth Aravamudan (7):
  macio: ensure all dma routines get copied over
  ppc: add memory_hotplug_max
  ppc: do not search for dma-window property on dlpar remove
  ppc: checking for pdn->parent is redundant
  ppc/iommu: do not need to check for dma_window == NULL
  ppc/iommu: pass phb only to iommu_table_setparms_lpar
  ppc: add dynamic dma window support

 arch/powerpc/include/asm/device.h      |    6 +
 arch/powerpc/include/asm/mmzone.h      |    5 +
 arch/powerpc/mm/numa.c                 |   26 ++
 arch/powerpc/platforms/pseries/iommu.c |  600 ++++++++++++++++++++++++++++++--
 drivers/macintosh/macio_asic.c         |    7 +-
 5 files changed, 619 insertions(+), 25 deletions(-)

* [RFC PATCH 1/7 v2] macio: ensure all dma routines get copied over
  2010-10-27  3:35 [RFC PATCH 0/7 v2] ppc: enable dynamic dma window support Nishanth Aravamudan
@ 2010-10-27  3:35 ` Nishanth Aravamudan
  2010-10-27  3:35 ` [RFC PATCH 2/7 v2] ppc: add memory_hotplug_max Nishanth Aravamudan
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Nishanth Aravamudan @ 2010-10-27  3:35 UTC (permalink / raw)
  To: nacc; +Cc: sonnyrao, miltonm, Paul Mackerras, linuxppc-dev

Also add a comment to dev_archdata, indicating that changes there need
to be verified against the driver code.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
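For reviewers, a minimal userspace sketch of why the whole-struct
assignment is more robust than copying fields one at a time. The struct
and helper names below are hypothetical stand-ins, not the kernel's
struct dev_archdata; the point is only that a field added later is
silently missed by a per-field copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dev_archdata; field names are illustrative. */
struct archdata_sketch {
	const void *dma_ops;
	void *dma_data;
	int extra;	/* imagine this field was added after the copy code was written */
};

/* Per-field copy: any field not listed here is left untouched in dst. */
static void copy_by_field(struct archdata_sketch *dst,
			  const struct archdata_sketch *src)
{
	dst->dma_ops = src->dma_ops;
	dst->dma_data = src->dma_data;
}

/* Whole-struct assignment: every field is copied, including future ones. */
static void copy_whole(struct archdata_sketch *dst,
		       const struct archdata_sketch *src)
{
	*dst = *src;
}
```
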
 arch/powerpc/include/asm/device.h |    6 ++++++
 drivers/macintosh/macio_asic.c    |    7 +++----
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index a3954e4..16d25c0 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -9,6 +9,12 @@
 struct dma_map_ops;
 struct device_node;
 
+/*
+ * Arch extensions to struct device.
+ *
+ * When adding fields, consider macio_add_one_device in
+ * drivers/macintosh/macio_asic.c
+ */
 struct dev_archdata {
 	/* DMA operations on that device */
 	struct dma_map_ops	*dma_ops;
diff --git a/drivers/macintosh/macio_asic.c b/drivers/macintosh/macio_asic.c
index b6e7ddc..4daf9e5 100644
--- a/drivers/macintosh/macio_asic.c
+++ b/drivers/macintosh/macio_asic.c
@@ -387,11 +387,10 @@ static struct macio_dev * macio_add_one_device(struct macio_chip *chip,
 	/* Set the DMA ops to the ones from the PCI device, this could be
 	 * fishy if we didn't know that on PowerMac it's always direct ops
 	 * or iommu ops that will work fine
+	 *
+	 * To get all the fields, copy all archdata
 	 */
-	dev->ofdev.dev.archdata.dma_ops =
-		chip->lbus.pdev->dev.archdata.dma_ops;
-	dev->ofdev.dev.archdata.dma_data =
-		chip->lbus.pdev->dev.archdata.dma_data;
+	dev->ofdev.dev.archdata = chip->lbus.pdev->dev.archdata;
 #endif /* CONFIG_PCI */
 
 #ifdef DEBUG
-- 
1.7.1

* [RFC PATCH 2/7 v2] ppc: add memory_hotplug_max
  2010-10-27  3:35 [RFC PATCH 0/7 v2] ppc: enable dynamic dma window support Nishanth Aravamudan
  2010-10-27  3:35 ` [RFC PATCH 1/7 v2] macio: ensure all dma routines get copied over Nishanth Aravamudan
@ 2010-10-27  3:35 ` Nishanth Aravamudan
  2010-10-27  3:35 ` [RFC PATCH 3/7 v2] ppc: do not search for dma-window property on dlpar remove Nishanth Aravamudan
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Nishanth Aravamudan @ 2010-10-27  3:35 UTC (permalink / raw)
  To: nacc
  Cc: sonnyrao, linuxppc-dev, miltonm, H Hartley Sweeten,
	Paul Mackerras, Anton Blanchard, H. Peter Anvin, Yinghai Lu

Add a function to get the maximum address that can be hotplug added.
This is needed to calculate the size of the tce table needed to cover
all memory in 1:1 mode.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
Comments on where to export?
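
To make the intended calculation concrete, here is a small userspace
sketch of the arithmetic in memory_hotplug_max(). The function name and
parameters are hypothetical; like the patch, it assumes the
dynamic-reconfiguration LMBs are laid out contiguously from address 0,
so the highest hot-addable address is simply lmb_size * lmb_count.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch only: return the larger of the hot-addable ceiling
 * (lmb_size * lmb_count) and the current end of DRAM.
 */
static uint64_t hotplug_max_sketch(uint64_t lmb_size, uint32_t lmb_count,
				   uint64_t end_of_dram)
{
	uint64_t drconf_max = lmb_size * lmb_count;

	return drconf_max > end_of_dram ? drconf_max : end_of_dram;
}
```

With 64 LMBs of 256MB each and 8GB currently present, the sketch yields
16GB; with no drconf node (size and count both 0) it falls back to the
end of DRAM.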
---
 arch/powerpc/include/asm/mmzone.h |    5 +++++
 arch/powerpc/mm/numa.c            |   26 ++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/mmzone.h b/arch/powerpc/include/asm/mmzone.h
index aac87cb..fd3fd58 100644
--- a/arch/powerpc/include/asm/mmzone.h
+++ b/arch/powerpc/include/asm/mmzone.h
@@ -33,6 +33,9 @@ extern int numa_cpu_lookup_table[];
 extern cpumask_var_t node_to_cpumask_map[];
 #ifdef CONFIG_MEMORY_HOTPLUG
 extern unsigned long max_pfn;
+u64 memory_hotplug_max(void);
+#else
+#define memory_hotplug_max() memblock_end_of_DRAM()
 #endif
 
 /*
@@ -42,6 +45,8 @@ extern unsigned long max_pfn;
 #define node_start_pfn(nid)	(NODE_DATA(nid)->node_start_pfn)
 #define node_end_pfn(nid)	(NODE_DATA(nid)->node_end_pfn)
 
+#else
+#define memory_hotplug_max() memblock_end_of_DRAM()
 #endif /* CONFIG_NEED_MULTIPLE_NODES */
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 74505b2..8c0944c 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1247,4 +1247,30 @@ int hot_add_scn_to_nid(unsigned long scn_addr)
 	return nid;
 }
 
+static u64 hot_add_drconf_memory_max(void)
+{
+        struct device_node *memory = NULL;
+        unsigned int drconf_cell_cnt = 0;
+        u64 lmb_size = 0;
+        const u32 *dm = 0;
+
+        memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+        if (memory) {
+                drconf_cell_cnt = of_get_drconf_memory(memory, &dm);
+                lmb_size = of_get_lmb_size(memory);
+                of_node_put(memory);
+        }
+        return lmb_size * drconf_cell_cnt;
+}
+
+/*
+ * memory_hotplug_max - return max address of memory that may be added
+ *
+ * This is currently only used on systems that support drconfig memory
+ * hotplug.
+ */
+u64 memory_hotplug_max(void)
+{
+        return max(hot_add_drconf_memory_max(), memblock_end_of_DRAM());
+}
 #endif /* CONFIG_MEMORY_HOTPLUG */
-- 
1.7.1

* [RFC PATCH 3/7 v2] ppc: do not search for dma-window property on dlpar remove
  2010-10-27  3:35 [RFC PATCH 0/7 v2] ppc: enable dynamic dma window support Nishanth Aravamudan
  2010-10-27  3:35 ` [RFC PATCH 1/7 v2] macio: ensure all dma routines get copied over Nishanth Aravamudan
  2010-10-27  3:35 ` [RFC PATCH 2/7 v2] ppc: add memory_hotplug_max Nishanth Aravamudan
@ 2010-10-27  3:35 ` Nishanth Aravamudan
  2010-11-29  1:38   ` Benjamin Herrenschmidt
  2010-10-27  3:35 ` [RFC PATCH 4/7 v2] ppc: checking for pdn->parent is redundant Nishanth Aravamudan
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Nishanth Aravamudan @ 2010-10-27  3:35 UTC (permalink / raw)
  To: nacc; +Cc: sonnyrao, miltonm, Paul Mackerras, Anton Blanchard, linuxppc-dev

The iommu_table pointer in the pci auxiliary struct of device_node has
not been used by the iommu ops since the dma refactor of
12d04eef927bf61328af2c7cbe756c96f98ac3bf, however this code still uses
it to find tables for dlpar. By only setting the PCI_DN iommu_table
pointer on nodes with dma window properties, we will be able to quickly
find the node for later checks, and can remove the table without looking
for the dma window property on dlpar remove.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 9184db3..8ab32da 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -455,9 +455,6 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
 		ppci->iommu_table = iommu_init_table(tbl, ppci->phb->node);
 		pr_debug("  created table: %p\n", ppci->iommu_table);
 	}
-
-	if (pdn != dn)
-		PCI_DN(dn)->iommu_table = ppci->iommu_table;
 }
 
 
@@ -571,8 +568,7 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
 
 	switch (action) {
 	case PSERIES_RECONFIG_REMOVE:
-		if (pci && pci->iommu_table &&
-		    of_get_property(np, "ibm,dma-window", NULL))
+		if (pci && pci->iommu_table)
 			iommu_free_table(pci->iommu_table, np->full_name);
 		break;
 	default:
-- 
1.7.1

* [RFC PATCH 4/7 v2] ppc: checking for pdn->parent is redundant
  2010-10-27  3:35 [RFC PATCH 0/7 v2] ppc: enable dynamic dma window support Nishanth Aravamudan
                   ` (2 preceding siblings ...)
  2010-10-27  3:35 ` [RFC PATCH 3/7 v2] ppc: do not search for dma-window property on dlpar remove Nishanth Aravamudan
@ 2010-10-27  3:35 ` Nishanth Aravamudan
  2010-10-27  3:35 ` [RFC PATCH 5/7 v2] ppc/iommu: do not need to check for dma_window == NULL Nishanth Aravamudan
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Nishanth Aravamudan @ 2010-10-27  3:35 UTC (permalink / raw)
  To: nacc; +Cc: sonnyrao, miltonm, Paul Mackerras, Anton Blanchard, linuxppc-dev

The device tree root is never a pci bus, and will not have a
PCI_DN(pdn), so the check for PCI_DN added in
650f7b3b2f0ead0673e90452cf3dedde97c537ba makes the check for pdn->parent
redundant and it can be removed.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 8ab32da..0ae5a60 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -530,10 +530,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 	}
 	pr_debug("  parent is %s\n", pdn->full_name);
 
-	/* Check for parent == NULL so we don't try to setup the empty EADS
-	 * slots on POWER4 machines.
-	 */
-	if (dma_window == NULL || pdn->parent == NULL) {
+	if (dma_window == NULL) {
 		pr_debug("  no dma window for device, linking to parent\n");
 		set_iommu_table_base(&dev->dev, PCI_DN(pdn)->iommu_table);
 		return;
-- 
1.7.1

* [RFC PATCH 5/7 v2] ppc/iommu: do not need to check for dma_window == NULL
  2010-10-27  3:35 [RFC PATCH 0/7 v2] ppc: enable dynamic dma window support Nishanth Aravamudan
                   ` (3 preceding siblings ...)
  2010-10-27  3:35 ` [RFC PATCH 4/7 v2] ppc: checking for pdn->parent is redundant Nishanth Aravamudan
@ 2010-10-27  3:35 ` Nishanth Aravamudan
  2010-10-27  3:35 ` [RFC PATCH 6/7 v2] ppc/iommu: pass phb only to iommu_table_setparms_lpar Nishanth Aravamudan
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Nishanth Aravamudan @ 2010-10-27  3:35 UTC (permalink / raw)
  To: nacc; +Cc: sonnyrao, miltonm, Paul Mackerras, Anton Blanchard, linuxppc-dev

The block in pci_dma_dev_setup_pSeriesLP for dma_window == NULL can be
removed because we will only terminate the loop if we had already allocated
an iommu table for that node or we found a window.  While there may be
no window for the device, the interesting part is whether we are reusing a
table or creating it for the first device under it.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c |    6 ------
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 0ae5a60..9d564b9 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -530,12 +530,6 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 	}
 	pr_debug("  parent is %s\n", pdn->full_name);
 
-	if (dma_window == NULL) {
-		pr_debug("  no dma window for device, linking to parent\n");
-		set_iommu_table_base(&dev->dev, PCI_DN(pdn)->iommu_table);
-		return;
-	}
-
 	pci = PCI_DN(pdn);
 	if (!pci->iommu_table) {
 		tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
-- 
1.7.1

* [RFC PATCH 6/7 v2] ppc/iommu: pass phb only to iommu_table_setparms_lpar
  2010-10-27  3:35 [RFC PATCH 0/7 v2] ppc: enable dynamic dma window support Nishanth Aravamudan
                   ` (4 preceding siblings ...)
  2010-10-27  3:35 ` [RFC PATCH 5/7 v2] ppc/iommu: do not need to check for dma_window == NULL Nishanth Aravamudan
@ 2010-10-27  3:35 ` Nishanth Aravamudan
  2010-12-09  4:24   ` Benjamin Herrenschmidt
  2010-10-27  3:35 ` [RFC PATCH 7/7 v2] ppc: add dynamic dma window support Nishanth Aravamudan
  2010-11-08 19:42 ` [RFC PATCH 0/7 v2] ppc: enable dynamic dma window support Nishanth Aravamudan
  7 siblings, 1 reply; 21+ messages in thread
From: Nishanth Aravamudan @ 2010-10-27  3:35 UTC (permalink / raw)
  To: nacc; +Cc: sonnyrao, miltonm, Paul Mackerras, Anton Blanchard, linuxppc-dev

iommu_table_setparms_lpar needs either the phb or the subbusnumber
(not both); pass the phb to make it similar to iommu_table_setparms.

Note: In cases where a caller was passing bus->number previously to
iommu_table_setparms_lpar() rather than phb->bus->number, this can lead
to a different value in tbl->it_busno. The only example of this was the
removed pci_dma_dev_setup_pSeriesLP(), removed in "ppc/iommu: remove
unneeded pci_dma_dev_setup_pSeriesLP".

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 9d564b9..45c6865 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -323,14 +323,13 @@ static void iommu_table_setparms(struct pci_controller *phb,
 static void iommu_table_setparms_lpar(struct pci_controller *phb,
 				      struct device_node *dn,
 				      struct iommu_table *tbl,
-				      const void *dma_window,
-				      int bussubno)
+				      const void *dma_window)
 {
 	unsigned long offset, size;
 
-	tbl->it_busno  = bussubno;
 	of_parse_dma_window(dn, dma_window, &tbl->it_index, &offset, &size);
 
+	tbl->it_busno = phb->bus->number;
 	tbl->it_base   = 0;
 	tbl->it_blocksize  = 16;
 	tbl->it_type = TCE_PCI;
@@ -534,8 +533,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 	if (!pci->iommu_table) {
 		tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
 				   pci->phb->node);
-		iommu_table_setparms_lpar(pci->phb, pdn, tbl, dma_window,
-			pci->phb->bus->number);
+		iommu_table_setparms_lpar(pci->phb, pdn, tbl, dma_window);
 		pci->iommu_table = iommu_init_table(tbl, pci->phb->node);
 		pr_debug("  created table: %p\n", pci->iommu_table);
 	} else {
-- 
1.7.1

* [RFC PATCH 7/7 v2] ppc: add dynamic dma window support
  2010-10-27  3:35 [RFC PATCH 0/7 v2] ppc: enable dynamic dma window support Nishanth Aravamudan
                   ` (5 preceding siblings ...)
  2010-10-27  3:35 ` [RFC PATCH 6/7 v2] ppc/iommu: pass phb only to iommu_table_setparms_lpar Nishanth Aravamudan
@ 2010-10-27  3:35 ` Nishanth Aravamudan
  2010-12-09  4:17   ` Benjamin Herrenschmidt
  2010-12-09 19:09   ` Nishanth Aravamudan
  2010-11-08 19:42 ` [RFC PATCH 0/7 v2] ppc: enable dynamic dma window support Nishanth Aravamudan
  7 siblings, 2 replies; 21+ messages in thread
From: Nishanth Aravamudan @ 2010-10-27  3:35 UTC (permalink / raw)
  To: nacc; +Cc: sonnyrao, miltonm, Paul Mackerras, Anton Blanchard, linuxppc-dev

If firmware allows us to map all of a partition's memory for DMA on a
particular bridge, create a 1:1 mapping of that memory. Add hooks for
dealing with hotplug events. Dynamic DMA windows can use page sizes larger
than the default, and we use the largest one possible.

Not-yet-signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

---
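As a reading aid, the "largest page size possible" selection and the
window-size check in enable_ddw() can be sketched in userspace as
follows. All function names are hypothetical. The bit meanings (1 = 4kB,
2 = 64kB, 4 = 16MB) are taken from the patch itself, and
order_base_2_sketch() is a stand-in for the kernel's order_base_2(),
written only for inputs between 1 and 2^63.

```c
#include <assert.h>
#include <stdint.h>

/* Pick the largest supported TCE page shift from the query mask, or -1. */
static int pick_page_shift(uint32_t mask)
{
	if (mask & 4)
		return 24;	/* 16MB */
	if (mask & 2)
		return 16;	/* 64kB */
	if (mask & 1)
		return 12;	/* 4kB */
	return -1;		/* no supported direct page size */
}

/* Smallest shift such that (1 << shift) >= n; valid for 1 <= n <= 2^63. */
static int order_base_2_sketch(uint64_t n)
{
	int shift = 0;

	while (shift < 63 && (1ULL << shift) < n)
		shift++;
	return shift;
}

/* The window is usable if the largest block, in TCE pages, covers max_addr. */
static int window_fits(uint32_t largest_block_pages, uint64_t max_addr,
		       int page_shift)
{
	return largest_block_pages >= (max_addr >> page_shift);
}
```

For a 16GB partition and 16MB pages, the window needs 1024 TCEs and a
window shift of 34.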

I've tested this briefly on a machine with suitable firmware/hardware.
Things seem to work well, but I want to do more exhaustive I/O testing
before asking for upstream merging. I would really appreciate any
feedback on the updated approach.

Specific questions:

Ben, did I hook into the dma_set_mask() platform callback as you
expected? Anything I can do better or which perhaps might lead to
gotchas later?

I've added a disable_ddw option, but perhaps it would be better to
just disable the feature if iommu=force?
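
One more point that may be worth checking in review: the range helpers
(tce_setrange_multi_pSeriesLP and tce_clearrange_multi_pSeriesLP) align
the start address down to a TCE page boundary and then convert the byte
count to a number of TCEs. The userspace sketch below shows that
conversion with a conventional round-up. The struct and function names
are hypothetical, and the round-up is my assumption about the intent; as
far as I can tell, the "|= tce_size - 1" idiom in the patch gives the
same result only when the adjusted byte count is already a multiple of
the TCE page size.

```c
#include <assert.h>
#include <stdint.h>

/* First TCE-aligned address and number of TCEs needed to cover a range. */
struct tce_span {
	uint64_t first;
	uint64_t count;
};

static struct tce_span tce_span_for_range(uint64_t start, uint64_t len,
					  unsigned int tce_shift)
{
	uint64_t tce_size = 1ULL << tce_shift;
	struct tce_span span;

	/* grow the length by the part of the first TCE page before start */
	len += start & (tce_size - 1);
	/* round the start back to the beginning of its TCE page */
	span.first = start & ~(tce_size - 1);
	/* round the byte count up to whole TCE pages */
	span.count = (len + tce_size - 1) >> tce_shift;
	return span;
}
```

For example, a 64kB range starting at 0x18000 with 64kB TCE pages
straddles two TCE pages, so the sketch reports a count of 2 starting at
0x10000.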

---
 arch/powerpc/platforms/pseries/iommu.c |  577 +++++++++++++++++++++++++++++++-
 1 files changed, 575 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 45c6865..8090b6b 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -33,6 +33,7 @@
 #include <linux/pci.h>
 #include <linux/dma-mapping.h>
 #include <linux/crash_dump.h>
+#include <linux/memory.h>
 #include <asm/io.h>
 #include <asm/prom.h>
 #include <asm/rtas.h>
@@ -45,6 +46,7 @@
 #include <asm/tce.h>
 #include <asm/ppc-pci.h>
 #include <asm/udbg.h>
+#include <asm/mmzone.h>
 
 #include "plpar_wrappers.h"
 
@@ -270,6 +272,139 @@ static unsigned long tce_get_pSeriesLP(struct iommu_table *tbl, long tcenum)
 	return tce_ret;
 }
 
+/* this is compatible with cells for the device tree property */
+struct dynamic_dma_window_prop {
+	__be32	liobn;		/* tce table number */
+	__be64	dma_base;	/* address hi,lo */
+	__be32	tce_shift;	/* ilog2(tce_page_size) */
+	__be32	window_shift;	/* ilog2(tce_window_size) */
+};
+
+struct direct_window {
+	struct device_node *device;
+	const struct dynamic_dma_window_prop *prop;
+	struct list_head list;
+};
+static LIST_HEAD(direct_window_list);
+/* prevents races between memory on/offline and window creation */
+static DEFINE_SPINLOCK(direct_window_list_lock);
+/* protects initializing window twice for same device */
+static DEFINE_MUTEX(direct_window_init_mutex);
+#define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
+
+static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
+					unsigned long num_pfn, const void *arg)
+{
+	const struct dynamic_dma_window_prop *maprange = arg;
+	int rc;
+	u64 tce_size, num_tce, dma_offset, next;
+	u32 tce_shift;
+	long limit;
+
+	tce_shift = be32_to_cpu(maprange->tce_shift);
+	tce_size = 1ULL << tce_shift;
+	next = start_pfn << PAGE_SHIFT;
+	num_tce = num_pfn << PAGE_SHIFT;
+
+	/* round back to the beginning of the tce page size */
+	num_tce += next & (tce_size - 1);
+	next &= ~(tce_size - 1);
+
+	/* convert to number of tces */
+	num_tce |= tce_size - 1;
+	num_tce >>= tce_shift;
+
+	do {
+		/*
+		 * Set up the page with TCE data, looping through and setting
+		 * the values.
+		 */
+		limit = min_t(long, num_tce, 512);
+		dma_offset = next + be64_to_cpu(maprange->dma_base);
+
+		rc = plpar_tce_stuff((u64)be32_to_cpu(maprange->liobn),
+					    (u64)dma_offset,
+					     0, limit);
+		num_tce -= limit;
+	} while (num_tce > 0 && !rc);
+
+	return rc;
+}
+
+static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
+					unsigned long num_pfn, const void *arg)
+{
+	const struct dynamic_dma_window_prop *maprange = arg;
+	u64 *tcep, tce_size, num_tce, dma_offset, next, proto_tce, liobn;
+	u32 tce_shift;
+	u64 rc = 0;
+	long l, limit;
+
+	local_irq_disable();	/* to protect tcep and the page behind it */
+	tcep = __get_cpu_var(tce_page);
+
+	if (!tcep) {
+		tcep = (u64 *)__get_free_page(GFP_ATOMIC);
+		if (!tcep) {
+			local_irq_enable();
+			return -ENOMEM;
+		}
+		__get_cpu_var(tce_page) = tcep;
+	}
+
+	proto_tce = TCE_PCI_READ | TCE_PCI_WRITE;
+
+	liobn = (u64)be32_to_cpu(maprange->liobn);
+	tce_shift = be32_to_cpu(maprange->tce_shift);
+	tce_size = 1ULL << tce_shift;
+	next = start_pfn << PAGE_SHIFT;
+	num_tce = num_pfn << PAGE_SHIFT;
+
+	/* round back to the beginning of the tce page size */
+	num_tce += next & (tce_size - 1);
+	next &= ~(tce_size - 1);
+
+	/* convert to number of tces */
+	num_tce |= tce_size - 1;
+	num_tce >>= tce_shift;
+
+	/* We can map max one pageful of TCEs at a time */
+	do {
+		/*
+		 * Set up the page with TCE data, looping through and setting
+		 * the values.
+		 */
+		limit = min_t(long, num_tce, 4096/TCE_ENTRY_SIZE);
+		dma_offset = next + be64_to_cpu(maprange->dma_base);
+
+		for (l = 0; l < limit; l++) {
+			tcep[l] = proto_tce | next;
+			next += tce_size;
+		}
+
+		rc = plpar_tce_put_indirect(liobn,
+					    (u64)dma_offset,
+					    (u64)virt_to_abs(tcep),
+					    limit);
+
+		num_tce -= limit;
+	} while (num_tce > 0 && !rc);
+                printk("plpar_tce_put_indirect for offset 0x%llx and tcep[0] 0x%llx returned %llu\n",
+                                (u64)dma_offset, tcep[0], rc);
+
+	/* error cleanup: caller will clear whole range */
+
+	local_irq_enable();
+	return rc;
+}
+
+static int tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn,
+                unsigned long num_pfn, void *arg)
+{
+        return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg);
+}
+
+
 #ifdef CONFIG_PCI
 static void iommu_table_setparms(struct pci_controller *phb,
 				 struct device_node *dn,
@@ -449,8 +584,7 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
 	if (!ppci->iommu_table) {
 		tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
 				   ppci->phb->node);
-		iommu_table_setparms_lpar(ppci->phb, pdn, tbl, dma_window,
-			bus->number);
+		iommu_table_setparms_lpar(ppci->phb, pdn, tbl, dma_window);
 		ppci->iommu_table = iommu_init_table(tbl, ppci->phb->node);
 		pr_debug("  created table: %p\n", ppci->iommu_table);
 	}
@@ -496,6 +630,338 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
 		       pci_name(dev));
 }
 
+static int __read_mostly disable_ddw;
+
+static int __init disable_ddw_setup(char *str)
+{
+        disable_ddw = 1;
+        printk(KERN_INFO "ppc iommu: disabling ddw.\n");
+
+        return 0;
+}
+
+early_param("disable_ddw", disable_ddw_setup);
+
+static void remove_ddw(struct device_node *np)
+{
+	struct dynamic_dma_window_prop *dwp;
+	struct property *win64;
+	const u32 *ddr_avail;
+        u64 liobn;
+	int len, ret;
+
+	ddr_avail = of_get_property(np, "ibm,ddw-applicable", &len);
+	win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
+	if (!win64 || !ddr_avail || len < 3 * sizeof(u32))
+		return;
+
+	dwp = win64->value;
+        liobn = (u64)be32_to_cpu(dwp->liobn);
+
+	/* clear the whole window, note the arg is in kernel pages */
+	ret = tce_clearrange_multi_pSeriesLP(0,
+		1ULL << (be32_to_cpu(dwp->window_shift) - PAGE_SHIFT), dwp);
+	if (ret)
+		pr_warning("%s failed to clear tces in window.\n",
+			 np->full_name);
+        else
+		pr_warning("%s successfully cleared tces in window.\n",
+			 np->full_name);
+
+	ret = rtas_call(ddr_avail[2], 1, 1, NULL, liobn);
+	if (ret)
+		pr_warning("%s: failed to remove direct window: rtas returned "
+			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
+			np->full_name, ret, ddr_avail[2], liobn);
+	else
+		pr_warning("%s: successfully removed direct window: rtas returned "
+			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
+			np->full_name, ret, ddr_avail[2], liobn);
+
+	ret = prom_remove_property(np, win64);
+	if (ret)
+		pr_warning("%s: failed to remove direct window property (%i)\n",
+			np->full_name, ret);
+	else
+		pr_warning("%s: successfully removed direct window property (%i)\n",
+			np->full_name, ret);
+}
+
+
+static u64 dupe_ddw_if_already_created(struct pci_dev *dev, struct device_node *pdn)
+{
+	struct device_node *dn;
+	struct pci_dn *pcidn;
+	struct direct_window *window;
+	const struct dynamic_dma_window_prop *direct64;
+	u64 dma_addr = 0;
+
+	dn = pci_device_to_OF_node(dev);
+	pcidn = PCI_DN(dn);
+	spin_lock(&direct_window_list_lock);
+	/* check if we already created a window and dupe that config if so */
+	list_for_each_entry(window, &direct_window_list, list) {
+		if (window->device == pdn) {
+			direct64 = window->prop;
+			dma_addr = direct64->dma_base;
+			break;
+		}
+	}
+	spin_unlock(&direct_window_list_lock);
+
+	return dma_addr;
+}
+
+static u64 dupe_ddw_if_kexec(struct pci_dev *dev, struct device_node *pdn)
+{
+	struct device_node *dn;
+	struct pci_dn *pcidn;
+	int len;
+	struct direct_window *window;
+	const struct dynamic_dma_window_prop *direct64;
+	u64 dma_addr = 0;
+
+	dn = pci_device_to_OF_node(dev);
+	pcidn = PCI_DN(dn);
+	direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
+	if (direct64) {
+		window = kzalloc(sizeof(*window), GFP_KERNEL);
+		if (!window) {
+			remove_ddw(pdn);
+		} else {
+			window->device = pdn;
+			window->prop = direct64;
+			spin_lock(&direct_window_list_lock);
+			list_add(&window->list, &direct_window_list);
+			spin_unlock(&direct_window_list_lock);
+			dma_addr = direct64->dma_base;
+		}
+	}
+
+	return dma_addr;
+}
+
+static int query_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *query)
+{
+	struct device_node *dn;
+	struct pci_dn *pcidn;
+	u32 cfg_addr;
+	u64 buid;
+	int ret;
+
+	/*
+	 * Get the config address and phb buid of the PE window.
+	 * Rely on eeh to retrieve this for us.
+	 * Retrieve them from the pci device, not the node with the
+	 * dma-window property
+	 */
+	dn = pci_device_to_OF_node(dev);
+	pcidn = PCI_DN(dn);
+	cfg_addr = pcidn->eeh_config_addr;
+	if (pcidn->eeh_pe_config_addr)
+		cfg_addr = pcidn->eeh_pe_config_addr;
+	buid = pcidn->phb->buid;
+	ret = rtas_call(ddr_avail[0], 3, 5, query,
+		  cfg_addr, BUID_HI(buid), BUID_LO(buid));
+	dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
+		" returned %d\n", ddr_avail[0], cfg_addr, BUID_HI(buid),
+		BUID_LO(buid), ret);
+	return ret;
+}
+
+static int create_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *create, int page_shift, int window_shift)
+{
+	struct device_node *dn;
+	struct pci_dn *pcidn;
+	u32 cfg_addr;
+	u64 buid;
+	int ret;
+
+	/*
+	 * Get the config address and phb buid of the PE window.
+	 * Rely on eeh to retrieve this for us.
+	 * Retrieve them from the pci device, not the node with the
+	 * dma-window property
+	 */
+	dn = pci_device_to_OF_node(dev);
+	pcidn = PCI_DN(dn);
+	cfg_addr = pcidn->eeh_config_addr;
+	if (pcidn->eeh_pe_config_addr)
+		cfg_addr = pcidn->eeh_pe_config_addr;
+	buid = pcidn->phb->buid;
+
+	do {
+		/* extra outputs are LIOBN and dma-addr (hi, lo) */
+		ret = rtas_call(ddr_avail[1], 5, 4, &create[0], cfg_addr,
+				BUID_HI(buid), BUID_LO(buid), page_shift, window_shift);
+	} while(rtas_busy_delay(ret));
+	dev_info(&dev->dev,
+		"ibm,create-pe-dma-window(%x) %x %x %x %x %x returned %d "
+		"(liobn = 0x%x starting addr = %x %x)\n", ddr_avail[1],
+		 cfg_addr, BUID_HI(buid), BUID_LO(buid), page_shift,
+		 window_shift, ret, create[0], create[1], create[2]);
+	
+	return ret;
+}
+
+/*
+ * If the PE supports dynamic dma windows, and there is space for a table
+ * that can map all pages in a linear offset, then setup such a table,
+ * and record the dma-offset in the struct device.
+ *
+ * dev: the pci device we are checking
+ * pdn: the parent pe node with the ibm,dma-window property
+ * Future: also check if we can remap the base window for our base page size
+ *
+ * returns the dma offset for use by dma_set_mask
+ */
+static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
+{
+	int len, ret;
+	u32 query[4], create[3];
+	int page_shift;
+	u64 dma_addr, max_addr;
+	struct device_node *dn;
+	const u32 *uninitialized_var(ddr_avail);
+	struct direct_window *window;
+	struct property *uninitialized_var(win64);
+	struct dynamic_dma_window_prop *ddwprop;
+	const struct dynamic_dma_window_prop *direct64;
+
+	mutex_lock(&direct_window_init_mutex);
+
+	dma_addr = dupe_ddw_if_already_created(dev, pdn);
+	if (dma_addr != 0)
+		goto out_unlock;
+
+	dma_addr = dupe_ddw_if_kexec(dev, pdn);
+	if (dma_addr != 0)
+		goto out_unlock;
+
+	/*
+	 * the ibm,ddw-applicable property holds the tokens for:
+	 * ibm,query-pe-dma-window
+	 * ibm,create-pe-dma-window
+	 * ibm,remove-pe-dma-window
+	 * for the given node in that order.
+	 * the property is actually in the parent, not the PE
+	 */
+	ddr_avail = of_get_property(pdn, "ibm,ddw-applicable", &len);
+	if (!ddr_avail || len < 3 * sizeof(u32))
+		goto out_unlock;
+
+       /*
+	 * Query if there is a second window of size to map the
+	 * whole partition.  Query returns number of windows, largest
+	 * block assigned to PE (partition endpoint), and two bitmasks
+	 * of page sizes: supported and supported for migrate-dma.
+	 */
+	dn = pci_device_to_OF_node(dev);
+	ret = query_ddw(dev, ddr_avail, &query[0]);
+	if (ret != 0)
+		goto out_unlock;
+
+	if (!query[0]) {
+		/*
+		 * no additional windows are available for this device.
+		 * We might be able to reallocate the existing window,
+		 * trading in for a larger page size.
+		 */
+		dev_dbg(&dev->dev, "no free dynamic windows");
+		goto out_unlock;
+	}
+	if (query[2] & 4) {
+		page_shift = 24; /* 16MB */
+	} else if (query[2] & 2) {
+		page_shift = 16; /* 64kB */
+	} else if (query[2] & 1) {
+		page_shift = 12; /* 4kB */
+	} else {
+		dev_dbg(&dev->dev, "no supported direct page size in mask %x",
+			  query[2]);
+		goto out_unlock;
+	}
+	/* verify the window * number of ptes will map the partition */
+	/* check largest block * page size > max memory hotplug addr */
+	max_addr = memory_hotplug_max();
+	if (query[1] < (max_addr >> page_shift)) {
+		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %u "
+			  "%llu-sized pages\n", max_addr,  query[1],
+			  1ULL << page_shift);
+		goto out_unlock;
+	}
+	len = order_base_2(max_addr);
+	win64 = kzalloc(sizeof(struct property), GFP_KERNEL);
+	if (!win64) {
+		dev_info(&dev->dev,
+			"couldn't allocate property for 64bit dma window\n");
+		goto out_unlock;
+	}
+	win64->name = kstrdup(DIRECT64_PROPNAME, GFP_KERNEL);
+	win64->value = ddwprop = kmalloc(sizeof(*ddwprop), GFP_KERNEL);
+	if (!win64->name || !win64->value) {
+		dev_info(&dev->dev,
+			"couldn't allocate property name and value\n");
+		goto out_free_prop;
+	}
+
+	ret = create_ddw(dev, ddr_avail, &create[0], page_shift, len);
+	if (ret != 0)
+		goto out_free_prop;
+
+	*ddwprop = (struct dynamic_dma_window_prop) {
+		.liobn = cpu_to_be32(create[0]),
+		.dma_base = cpu_to_be64(((u64)create[1] << 32) + (u64)create[2]),
+		.tce_shift = cpu_to_be32(page_shift),
+		.window_shift = cpu_to_be32(len)
+	};
+
+	dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %s\n",
+		  create[0], dn->full_name);
+
+	ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
+			win64->value, tce_setrange_multi_pSeriesLP_walk);
+	if (ret) {
+		dev_info(&dev->dev, "failed to map direct window for %s: %d\n",
+			 dn->full_name, ret);
+		goto out_clear_window;
+	}
+
+	ret = prom_add_property(pdn, win64);
+	if (ret) {
+		dev_err(&dev->dev, "unable to add dma window property for %s: %d",
+			 pdn->full_name, ret);
+		goto out_clear_window;
+	}
+
+	direct64 = ddwprop;
+
+	window = kzalloc(sizeof(*window), GFP_KERNEL);
+	if (!window)
+		goto out_clear_window;
+	window->device = pdn;
+	window->prop = direct64;
+	spin_lock(&direct_window_list_lock);
+	list_add(&window->list, &direct_window_list);
+	spin_unlock(&direct_window_list_lock);
+
+	dma_addr = of_read_number(&create[1], 2);
+	set_dma_offset(&dev->dev, dma_addr);
+	goto out_unlock;
+
+out_clear_window:
+	remove_ddw(pdn);
+
+out_free_prop:
+	kfree(win64->name);
+	kfree(win64->value);
+	kfree(win64);
+
+out_unlock:
+	mutex_unlock(&direct_window_init_mutex);
+	return dma_addr;
+}
+
 static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 {
 	struct device_node *pdn, *dn;
@@ -542,23 +1008,128 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 
 	set_iommu_table_base(&dev->dev, pci->iommu_table);
 }
+
+static int dma_set_mask_pSeriesLP(struct device *dev, u64 dma_mask)
+{
+	bool ddw_enabled = false;
+	struct device_node *pdn, *dn;
+	struct pci_dev *pdev;
+	const void *dma_window = NULL;
+	u64 dma_offset;
+
+	if (!dev->dma_mask || !dma_supported(dev, dma_mask))
+		return -EIO;
+
+	/* only attempt to use a new window if 64-bit DMA is requested */
+	if (!disable_ddw && dma_mask == DMA_BIT_MASK(64)) {
+		pdev = to_pci_dev(dev);
+
+		dn = pci_device_to_OF_node(pdev);
+		dev_dbg(dev, "node is %s\n", dn->full_name);
+
+		/*
+		 * the device tree might contain the dma-window properties
+		 * per-device and not necessarily for the bus. So we need to
+		 * search upwards in the tree until we either hit a dma-window
+		 * property, OR find a parent with a table already allocated.
+		 */
+		for (pdn = dn; pdn && PCI_DN(pdn) && !PCI_DN(pdn)->iommu_table;
+				pdn = pdn->parent) {
+			dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
+			if (dma_window)
+				break;
+		}
+		if (pdn && PCI_DN(pdn)) {
+			dma_offset = enable_ddw(pdev, pdn);
+			if (dma_offset != 0) {
+				dev_info(dev, "Using 64-bit direct DMA at offset %llx\n", dma_offset);
+				set_dma_offset(dev, dma_offset);
+				set_dma_ops(dev, &dma_direct_ops);
+				ddw_enabled = true;
+			}
+		}
+	}
+
+	/* fall-through to iommu ops */
+	if (!ddw_enabled) {
+		dev_info(dev, "Using 32-bit DMA via iommu\n");
+		set_dma_ops(dev, &dma_iommu_ops);
+	}
+
+	*dev->dma_mask = dma_mask;
+	return 0;
+}
+
 #else  /* CONFIG_PCI */
 #define pci_dma_bus_setup_pSeries	NULL
 #define pci_dma_dev_setup_pSeries	NULL
 #define pci_dma_bus_setup_pSeriesLP	NULL
 #define pci_dma_dev_setup_pSeriesLP	NULL
+#define dma_set_mask_pSeriesLP		NULL
 #endif /* !CONFIG_PCI */
 
+static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
+		void *data)
+{
+	struct direct_window *window;
+	struct memory_notify *arg = data;
+	int ret = 0;
+
+	switch (action) {
+	case MEM_GOING_ONLINE:
+		spin_lock(&direct_window_list_lock);
+		list_for_each_entry(window, &direct_window_list, list) {
+			ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
+					arg->nr_pages, window->prop);
+			/* XXX log error */
+		}
+		spin_unlock(&direct_window_list_lock);
+		break;
+	case MEM_CANCEL_ONLINE:
+	case MEM_OFFLINE:
+		spin_lock(&direct_window_list_lock);
+		list_for_each_entry(window, &direct_window_list, list) {
+			ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
+					arg->nr_pages, window->prop);
+			/* XXX log error */
+		}
+		spin_unlock(&direct_window_list_lock);
+		break;
+	default:
+		break;
+	}
+	if (ret && action != MEM_CANCEL_ONLINE)
+		return NOTIFY_BAD;
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block iommu_mem_nb = {
+	.notifier_call = iommu_mem_notifier,
+};
+
 static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *node)
 {
 	int err = NOTIFY_OK;
 	struct device_node *np = node;
 	struct pci_dn *pci = PCI_DN(np);
+	struct direct_window *window;
 
 	switch (action) {
 	case PSERIES_RECONFIG_REMOVE:
 		if (pci && pci->iommu_table)
 			iommu_free_table(pci->iommu_table, np->full_name);
+
+		spin_lock(&direct_window_list_lock);
+		list_for_each_entry(window, &direct_window_list, list) {
+			if (window->device == np) {
+				list_del(&window->list);
+				break;
+			}
+		}
+		spin_unlock(&direct_window_list_lock);
+
+		remove_ddw(np);
 		break;
 	default:
 		err = NOTIFY_DONE;
@@ -588,6 +1159,7 @@ void iommu_init_early_pSeries(void)
 		ppc_md.tce_get   = tce_get_pSeriesLP;
 		ppc_md.pci_dma_bus_setup = pci_dma_bus_setup_pSeriesLP;
 		ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_pSeriesLP;
+		ppc_md.dma_set_mask = dma_set_mask_pSeriesLP;
 	} else {
 		ppc_md.tce_build = tce_build_pSeries;
 		ppc_md.tce_free  = tce_free_pSeries;
@@ -598,6 +1170,7 @@ void iommu_init_early_pSeries(void)
 
 
 	pSeries_reconfig_notifier_register(&iommu_reconfig_nb);
+	register_memory_notifier(&iommu_mem_nb);
 
 	set_pci_dma_ops(&dma_iommu_ops);
 }
-- 
1.7.1


* Re: [RFC PATCH 0/7 v2] ppc: enable dynamic dma window support
  2010-10-27  3:35 [RFC PATCH 0/7 v2] ppc: enable dynamic dma window support Nishanth Aravamudan
                   ` (6 preceding siblings ...)
  2010-10-27  3:35 ` [RFC PATCH 7/7 v2] ppc: add dynamic dma window support Nishanth Aravamudan
@ 2010-11-08 19:42 ` Nishanth Aravamudan
  7 siblings, 0 replies; 21+ messages in thread
From: Nishanth Aravamudan @ 2010-11-08 19:42 UTC (permalink / raw)
  To: sonnyrao, miltonm, Benjamin Herrenschmidt, Paul Mackerras,
	Grant Likely, linuxppc-dev

Hi all,

On 26.10.2010 [20:35:10 -0700], Nishanth Aravamudan wrote:
> The following series, which builds upon the series of cleanups I posted
> on 9/15 and 10/18 as "ppc iommu cleanups", enables the pseries firmware
> feature dynamic dma windows. This feature will allow future devices to
> have a 64-bit DMA mapping covering all memory, coexisting with a smaller
> IOMMU window in 32-bit PCI space

Pinging on this changeset. If there are no objections, I will
repost it without the RFC tag and ask Ben to merge it.

Thanks,
Nish

> Changes from v1 to v2:
> 
> Fixed numerous bugs/issues found in testing.
> Reworked to be based off platform hook dma_set_mask().
> 
> Nishanth Aravamudan (7):
>   macio: ensure all dma routines get copied over
>   ppc: add memory_hotplug_max
>   ppc: do not search for dma-window property on dlpar remove
>   ppc: checking for pdn->parent is redundant
>   ppc/iommu: do not need to check for dma_window == NULL
>   ppc/iommu: pass phb only to iommu_table_setparms_lpar
>   ppc: add dynamic dma window support
> 
>  arch/powerpc/include/asm/device.h      |    6 +
>  arch/powerpc/include/asm/mmzone.h      |    5 +
>  arch/powerpc/mm/numa.c                 |   26 ++
>  arch/powerpc/platforms/pseries/iommu.c |  600 ++++++++++++++++++++++++++++++--
>  drivers/macintosh/macio_asic.c         |    7 +-
>  5 files changed, 619 insertions(+), 25 deletions(-)
> 
> 

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center


* Re: [RFC PATCH 3/7 v2] ppc: do not search for dma-window property on dlpar remove
  2010-10-27  3:35 ` [RFC PATCH 3/7 v2] ppc: do not search for dma-window property on dlpar remove Nishanth Aravamudan
@ 2010-11-29  1:38   ` Benjamin Herrenschmidt
  2010-12-01  0:30     ` Nishanth Aravamudan
  2010-12-04  0:30     ` Nishanth Aravamudan
  0 siblings, 2 replies; 21+ messages in thread
From: Benjamin Herrenschmidt @ 2010-11-29  1:38 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: sonnyrao, miltonm, Paul Mackerras, Anton Blanchard, linuxppc-dev

On Tue, 2010-10-26 at 20:35 -0700, Nishanth Aravamudan wrote:
> The iommu_table pointer in the pci auxiliary struct of device_node has
> not been used by the iommu ops since the dma refactor of
> 12d04eef927bf61328af2c7cbe756c96f98ac3bf, however this code still uses
> it to find tables for dlpar. By only setting the PCI_DN iommu_table
> pointer on nodes with dma window properties, we will be able to quickly
> find the node for later checks, and can remove the table without looking
> for the the dma window property on dlpar remove.

The answer might well be yes but are we sure this works with busses &
devices that don't have a dma,window ? ie. we always properly look for
parents when assigning pci devices arch_data iommu table ? Did you test
it ? :-) (Best way is to find a card with a P2P bridge on it).
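
For illustration only, the parent walk in question can be sketched in
userspace as below. This is a minimal sketch under stated assumptions:
struct node and find_dma_node() are hypothetical stand-ins, not the
kernel's struct device_node or any pseries helper, and the loop is a
simplification of what the series does. It models a device behind a P2P
bridge where only the PHB node carries a window.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device tree node; not struct device_node. */
struct node {
	struct node *parent;
	const void *dma_window;	/* non-NULL if the node has a window property */
	void *iommu_table;	/* non-NULL if a table was already set up */
};

/*
 * Walk from dn towards the root and return the first node that either
 * has a window property or already owns a table; NULL if none does.
 */
static struct node *find_dma_node(struct node *dn)
{
	for (; dn; dn = dn->parent) {
		if (dn->dma_window || dn->iommu_table)
			return dn;
	}
	return NULL;
}
```

A device two levels below the PHB resolves to the PHB node, which is the
case a card with a P2P bridge would exercise.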

Cheers,
Ben. 

> Signed-off-by: Milton Miller <miltonm@bga.com>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/iommu.c |    6 +-----
>  1 files changed, 1 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 9184db3..8ab32da 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -455,9 +455,6 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
>  		ppci->iommu_table = iommu_init_table(tbl, ppci->phb->node);
>  		pr_debug("  created table: %p\n", ppci->iommu_table);
>  	}
> -
> -	if (pdn != dn)
> -		PCI_DN(dn)->iommu_table = ppci->iommu_table;
>  }
>  
> @@ -571,8 +568,7 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
>  
>  	switch (action) {
>  	case PSERIES_RECONFIG_REMOVE:
> -		if (pci && pci->iommu_table &&
> -		    of_get_property(np, "ibm,dma-window", NULL))
> +		if (pci && pci->iommu_table)
>  			iommu_free_table(pci->iommu_table, np->full_name);
>  		break;
>  	default:


* Re: [RFC PATCH 3/7 v2] ppc: do not search for dma-window property on dlpar remove
  2010-11-29  1:38   ` Benjamin Herrenschmidt
@ 2010-12-01  0:30     ` Nishanth Aravamudan
  2010-12-04  0:30     ` Nishanth Aravamudan
  1 sibling, 0 replies; 21+ messages in thread
From: Nishanth Aravamudan @ 2010-12-01  0:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: sonnyrao, miltonm, Paul Mackerras, Anton Blanchard, linuxppc-dev

On 29.11.2010 [12:38:41 +1100], Benjamin Herrenschmidt wrote:
> On Tue, 2010-10-26 at 20:35 -0700, Nishanth Aravamudan wrote:
> > The iommu_table pointer in the pci auxiliary struct of device_node has
> > not been used by the iommu ops since the dma refactor of
> > 12d04eef927bf61328af2c7cbe756c96f98ac3bf, however this code still uses
> > it to find tables for dlpar. By only setting the PCI_DN iommu_table
> > pointer on nodes with dma window properties, we will be able to quickly
> > find the node for later checks, and can remove the table without looking
> > for the the dma window property on dlpar remove.
> 
> The answer might well be yes but are we sure this works with busses &
> devices that don't have a dma,window ? ie. we always properly look for
> parents when assigning pci devices arch_data iommu table ? Did you test
> it ? :-) (Best way is to find a card with a P2P bridge on it).

I haven't tested this particular case. I'm getting a machine to do so
now, though.

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center


* Re: [RFC PATCH 3/7 v2] ppc: do not search for dma-window property on dlpar remove
  2010-11-29  1:38   ` Benjamin Herrenschmidt
  2010-12-01  0:30     ` Nishanth Aravamudan
@ 2010-12-04  0:30     ` Nishanth Aravamudan
  1 sibling, 0 replies; 21+ messages in thread
From: Nishanth Aravamudan @ 2010-12-04  0:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: sonnyrao, miltonm, Paul Mackerras, Anton Blanchard, linuxppc-dev

On 29.11.2010 [12:38:41 +1100], Benjamin Herrenschmidt wrote:
> On Tue, 2010-10-26 at 20:35 -0700, Nishanth Aravamudan wrote:
> > The iommu_table pointer in the pci auxiliary struct of device_node has
> > not been used by the iommu ops since the dma refactor of
> > 12d04eef927bf61328af2c7cbe756c96f98ac3bf, however this code still uses
> > it to find tables for dlpar. By only setting the PCI_DN iommu_table
> > pointer on nodes with dma window properties, we will be able to quickly
> > find the node for later checks, and can remove the table without looking
> > for the the dma window property on dlpar remove.
> 
> The answer might well be yes but are we sure this works with busses &
> devices that don't have a dma,window ? ie. we always properly look for
> parents when assigning pci devices arch_data iommu table ? Did you test
> it ? :-) (Best way is to find a card with a P2P bridge on it).

So I spent quite a while looking for some device or bus that didn't have
"ibm,dma-window" and the boxes I have access to didn't contain any :/

I did test dlpar remove now on p6 and it worked fine.

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center


* Re: [RFC PATCH 7/7 v2] ppc: add dynamic dma window support
  2010-10-27  3:35 ` [RFC PATCH 7/7 v2] ppc: add dynamic dma window support Nishanth Aravamudan
@ 2010-12-09  4:17   ` Benjamin Herrenschmidt
  2010-12-09 19:00     ` Nishanth Aravamudan
  2010-12-09 19:09   ` Nishanth Aravamudan
  1 sibling, 1 reply; 21+ messages in thread
From: Benjamin Herrenschmidt @ 2010-12-09  4:17 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: sonnyrao, miltonm, Paul Mackerras, Anton Blanchard, linuxppc-dev

On Tue, 2010-10-26 at 20:35 -0700, Nishanth Aravamudan wrote:

Not many comments... I'm amazed how complex the firmware folks managed to
make this ...

>  static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *node)
>  {
>  	int err = NOTIFY_OK;
>  	struct device_node *np = node;
>  	struct pci_dn *pci = PCI_DN(np);
> +	struct direct_window *window;
>  
>  	switch (action) {
>  	case PSERIES_RECONFIG_REMOVE:
>  		if (pci && pci->iommu_table)
>  			iommu_free_table(pci->iommu_table, np->full_name);
> +
> +		spin_lock(&direct_window_list_lock);
> +		list_for_each_entry(window, &direct_window_list, list) {
> +			if (window->device == np) {
> +				list_del(&window->list);
> +				break;
> +			}
> +		}
> +		spin_unlock(&direct_window_list_lock);

Should you also kfree the window ?
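
The unlink-and-free pattern being asked about can be sketched in
userspace roughly as follows. This is a minimal sketch, not the patch's
code: struct window, window_add(), window_remove() and window_count()
are hypothetical names, a singly linked list stands in for the kernel's
list_head, free() stands in for kfree(), and locking is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct direct_window; not the kernel type. */
struct window {
	const void *device;
	struct window *next;
};

/* Push a new entry for dev at the head of the list; 0 on success. */
static int window_add(struct window **head, const void *dev)
{
	struct window *w = calloc(1, sizeof(*w));

	if (!w)
		return -1;
	w->device = dev;
	w->next = *head;
	*head = w;
	return 0;
}

/* Unlink the first entry matching dev and free it; 1 if one was found. */
static int window_remove(struct window **head, const void *dev)
{
	struct window **pp;

	assert(head != NULL);
	for (pp = head; *pp; pp = &(*pp)->next) {
		if ((*pp)->device == dev) {
			struct window *victim = *pp;

			*pp = victim->next;	/* unlink first ... */
			free(victim);		/* ... then release the entry */
			return 1;
		}
	}
	return 0;
}

/* Count the entries currently on the list. */
static size_t window_count(const struct window *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

Unlinking without the free() would leave the entry unreachable but
still allocated, which is the leak the question points at.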


Cheers,
Ben.


* Re: [RFC PATCH 6/7 v2] ppc/iommu: pass phb only to iommu_table_setparms_lpar
  2010-10-27  3:35 ` [RFC PATCH 6/7 v2] ppc/iommu: pass phb only to iommu_table_setparms_lpar Nishanth Aravamudan
@ 2010-12-09  4:24   ` Benjamin Herrenschmidt
  2010-12-09 16:16     ` Nishanth Aravamudan
  0 siblings, 1 reply; 21+ messages in thread
From: Benjamin Herrenschmidt @ 2010-12-09  4:24 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: sonnyrao, miltonm, Paul Mackerras, Anton Blanchard, linuxppc-dev

On Tue, 2010-10-26 at 20:35 -0700, Nishanth Aravamudan wrote:
> iommu_table_setparms_lpar needs either the phb or the subbusnumber
> (not both), pass the phb to make it similar to iommu_table_setparms.
> 
> Note: In cases where a caller was passing bus->number previously to
> iommu_table_setparms_lpar() rather than phb->bus->number, this can lead
> to a different value in tbl->it_busno. The only example of this was the
> removed pci_dma_dev_setup_pSeriesLP(), removed in "ppc/iommu: remove
> unneeded pci_dma_dev_setup_pSeriesLP".
> 
> Signed-off-by: Milton Miller <miltonm@bga.com>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/iommu.c |    8 +++-----
>  1 files changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 9d564b9..45c6865 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -323,14 +323,13 @@ static void iommu_table_setparms(struct pci_controller *phb,
>  static void iommu_table_setparms_lpar(struct pci_controller *phb,
>  				      struct device_node *dn,
>  				      struct iommu_table *tbl,
> -				      const void *dma_window,
> -				      int bussubno)
> +				      const void *dma_window)
>  {
>  	unsigned long offset, size;
>  
> -	tbl->it_busno  = bussubno;
>  	of_parse_dma_window(dn, dma_window, &tbl->it_index, &offset, &size);
>  
> +	tbl->it_busno = phb->bus->number;
>  	tbl->it_base   = 0;
>  	tbl->it_blocksize  = 16;
>  	tbl->it_type = TCE_PCI;
> @@ -534,8 +533,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
>  	if (!pci->iommu_table) {
>  		tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
>  				   pci->phb->node);
> -		iommu_table_setparms_lpar(pci->phb, pdn, tbl, dma_window,
> -			pci->phb->bus->number);
> +		iommu_table_setparms_lpar(pci->phb, pdn, tbl, dma_window);
>  		pci->iommu_table = iommu_init_table(tbl, pci->phb->node);
>  		pr_debug("  created table: %p\n", pci->iommu_table);
>  	} else {

There's another caller :-) I've fixed that up locally and will push with
the fix.

Cheers,
Ben.


* Re: [RFC PATCH 6/7 v2] ppc/iommu: pass phb only to iommu_table_setparms_lpar
  2010-12-09  4:24   ` Benjamin Herrenschmidt
@ 2010-12-09 16:16     ` Nishanth Aravamudan
  0 siblings, 0 replies; 21+ messages in thread
From: Nishanth Aravamudan @ 2010-12-09 16:16 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev, sonnyrao, Paul Mackerras, Anton Blanchard, miltonm

On 09.12.2010 [15:24:39 +1100], Benjamin Herrenschmidt wrote:
> On Tue, 2010-10-26 at 20:35 -0700, Nishanth Aravamudan wrote:
> > iommu_table_setparms_lpar needs either the phb or the subbusnumber
> > (not both), pass the phb to make it similar to iommu_table_setparms.
> > 
> > Note: In cases where a caller was passing bus->number previously to
> > iommu_table_setparms_lpar() rather than phb->bus->number, this can lead
> > to a different value in tbl->it_busno. The only example of this was the
> > removed pci_dma_dev_setup_pSeriesLP(), removed in "ppc/iommu: remove
> > unneeded pci_dma_dev_setup_pSeriesLP".
> > 
> > Signed-off-by: Milton Miller <miltonm@bga.com>
> > Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> > ---
> >  arch/powerpc/platforms/pseries/iommu.c |    8 +++-----
> >  1 files changed, 3 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > index 9d564b9..45c6865 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -323,14 +323,13 @@ static void iommu_table_setparms(struct pci_controller *phb,
> >  static void iommu_table_setparms_lpar(struct pci_controller *phb,
> >  				      struct device_node *dn,
> >  				      struct iommu_table *tbl,
> > -				      const void *dma_window,
> > -				      int bussubno)
> > +				      const void *dma_window)
> >  {
> >  	unsigned long offset, size;
> >  
> > -	tbl->it_busno  = bussubno;
> >  	of_parse_dma_window(dn, dma_window, &tbl->it_index, &offset, &size);
> >  
> > +	tbl->it_busno = phb->bus->number;
> >  	tbl->it_base   = 0;
> >  	tbl->it_blocksize  = 16;
> >  	tbl->it_type = TCE_PCI;
> > @@ -534,8 +533,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
> >  	if (!pci->iommu_table) {
> >  		tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
> >  				   pci->phb->node);
> > -		iommu_table_setparms_lpar(pci->phb, pdn, tbl, dma_window,
> > -			pci->phb->bus->number);
> > +		iommu_table_setparms_lpar(pci->phb, pdn, tbl, dma_window);
> >  		pci->iommu_table = iommu_init_table(tbl, pci->phb->node);
> >  		pr_debug("  created table: %p\n", pci->iommu_table);
> >  	} else {
> 
> There's another caller :-) I've fixed that up locally and will push with
> the fix.

Shoot! Thanks for catching that.

-Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center


* Re: [RFC PATCH 7/7 v2] ppc: add dynamic dma window support
  2010-12-09  4:17   ` Benjamin Herrenschmidt
@ 2010-12-09 19:00     ` Nishanth Aravamudan
  0 siblings, 0 replies; 21+ messages in thread
From: Nishanth Aravamudan @ 2010-12-09 19:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: sonnyrao, miltonm, Paul Mackerras, Anton Blanchard, linuxppc-dev

On 09.12.2010 [15:17:06 +1100], Benjamin Herrenschmidt wrote:
> On Tue, 2010-10-26 at 20:35 -0700, Nishanth Aravamudan wrote:
> 
> Not many comments... I'm amazed how complex the firmware folks managed to
> make this ...
> 
> >  static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *node)
> >  {
> >  	int err = NOTIFY_OK;
> >  	struct device_node *np = node;
> >  	struct pci_dn *pci = PCI_DN(np);
> > +	struct direct_window *window;
> >  
> >  	switch (action) {
> >  	case PSERIES_RECONFIG_REMOVE:
> >  		if (pci && pci->iommu_table)
> >  			iommu_free_table(pci->iommu_table, np->full_name);
> > +
> > +		spin_lock(&direct_window_list_lock);
> > +		list_for_each_entry(window, &direct_window_list, list) {
> > +			if (window->device == np) {
> > +				list_del(&window->list);
> > +				break;
> > +			}
> > +		}
> > +		spin_unlock(&direct_window_list_lock);
> 
> Should you also kfree the window ?

Yeah, looks like I should. I have a few other questions due to testing,
but I'll reply to my original e-mail with those.

Thanks for the review!
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center


* Re: [RFC PATCH 7/7 v2] ppc: add dynamic dma window support
  2010-10-27  3:35 ` [RFC PATCH 7/7 v2] ppc: add dynamic dma window support Nishanth Aravamudan
  2010-12-09  4:17   ` Benjamin Herrenschmidt
@ 2010-12-09 19:09   ` Nishanth Aravamudan
  2010-12-11  0:07     ` [PATCH 7/7 v3] " Nishanth Aravamudan
  1 sibling, 1 reply; 21+ messages in thread
From: Nishanth Aravamudan @ 2010-12-09 19:09 UTC (permalink / raw)
  To: sonnyrao, miltonm, Benjamin Herrenschmidt, Paul Mackerras,
	Grant Likely, Anton Blanchard, linuxppc-dev

On 26.10.2010 [20:35:17 -0700], Nishanth Aravamudan wrote:
> If firmware allows us to map all of a partition's memory for DMA on a
> particular bridge, create a 1:1 mapping of that memory. Add hooks for
> dealing with hotplug events. Dyanmic DMA windows can use larger than the
> default page size, and we use the largest one possible.
> 
> Not-yet-signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> 
> ---
> 
> I've tested this briefly on a machine with suitable firmware/hardware.
> Things seem to work well, but I want to do more exhaustive I/O testing
> before asking for upstream merging. I would really appreciate any
> feedback on the updated approach.
> 
> Specific questions:
> 
> Ben, did I hook into the dma_set_mask() platform callback as you
> expected? Anything I can do better or which perhaps might lead to
> gotchas later?
> 
> I've added a disable_ddw option, but perhaps it would be better to
> just disable the feature if iommu=force?

So for the final version, I probably should document this option in
kernel-parameters.txt w/ the patch, right?

<snip>

> +static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
> +					unsigned long num_pfn, const void *arg)
> +{
> +	const struct dynamic_dma_window_prop *maprange = arg;
> +	int rc;
> +	u64 tce_size, num_tce, dma_offset, next;
> +	u32 tce_shift;
> +	long limit;
> +
> +	tce_shift = be32_to_cpu(maprange->tce_shift);
> +	tce_size = 1ULL << tce_shift;
> +	next = start_pfn << PAGE_SHIFT;
> +	num_tce = num_pfn << PAGE_SHIFT;
> +
> +	/* round back to the beginning of the tce page size */
> +	num_tce += next & (tce_size - 1);
> +	next &= ~(tce_size - 1);
> +
> +	/* covert to number of tces */
> +	num_tce |= tce_size - 1;
> +	num_tce >>= tce_shift;
> +
> +	do {
> +		/*
> +		 * Set up the page with TCE data, looping through and setting
> +		 * the values.
> +		 */
> +		limit = min_t(long, num_tce, 512);
> +		dma_offset = next + be64_to_cpu(maprange->dma_base);
> +
> +		rc = plpar_tce_stuff(be64_to_cpu(maprange->liobn),
> +					    (u64)dma_offset,
> +					     0, limit);
> +		num_tce -= limit;
> +	} while (num_tce > 0 && !rc);
> +
> +	return rc;
> +}

There is a typo here: the liobn is a 32-bit value, so it should be
converted with be32_to_cpu(), not be64_to_cpu(). I've fixed this up
locally and will update it when I send out the final version of this patch.
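
To make the width mismatch concrete, here is a small userspace sketch.
It is an illustration under stated assumptions, not kernel code:
swab32()/swab64() are local helpers written for this example, and the
host is assumed to be little-endian purely to make the effect visible.
On a big-endian host (which, as far as I know, is what pseries runs)
both conversions are no-ops, so the mismatch is about type correctness
rather than a visible failure there.

```c
#include <assert.h>
#include <stdint.h>

/* Local byte-swap helpers for this example (not the kernel macros). */
static uint32_t swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

static uint64_t swab64(uint64_t v)
{
	return ((uint64_t)swab32((uint32_t)v) << 32) |
	       swab32((uint32_t)(v >> 32));
}

/* What a little-endian CPU sees when it loads a 32-bit big-endian field. */
static uint32_t raw_on_le_host(uint32_t value)
{
	return swab32(value);
}

/* Correct decode: 32-bit field, 32-bit conversion. */
static uint32_t decode_as_be32(uint32_t raw)
{
	return swab32(raw);
}

/* Mismatched decode: the 32-bit field is widened, then swapped as 64-bit. */
static uint64_t decode_as_be64(uint32_t raw)
{
	return swab64((uint64_t)raw);
}
```

With these helpers, the 32-bit decode returns the original value, while
the 64-bit decode leaves it shifted into the upper 32 bits.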

I'm finding that on dlpar remove of adapters running in slots supporting
64-bit DMA, plpar_tce_stuff is failing. Can you think of a reason why?
It looks basically the same as the put_indirect below...

> +static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
> +					unsigned long num_pfn, const void *arg)
> +{
> +	const struct dynamic_dma_window_prop *maprange = arg;
> +	u64 *tcep, tce_size, num_tce, dma_offset, next, proto_tce, liobn;
> +	u32 tce_shift;
> +	u64 rc = 0;
> +	long l, limit;
> +
> +	local_irq_disable();	/* to protect tcep and the page behind it */
> +	tcep = __get_cpu_var(tce_page);
> +
> +	if (!tcep) {
> +		tcep = (u64 *)__get_free_page(GFP_ATOMIC);
> +		if (!tcep) {
> +			local_irq_enable();
> +			return -ENOMEM;
> +		}
> +		__get_cpu_var(tce_page) = tcep;
> +	}
> +
> +	proto_tce = TCE_PCI_READ | TCE_PCI_WRITE;
> +
> +	liobn = (u64)be32_to_cpu(maprange->liobn);
> +	tce_shift = be32_to_cpu(maprange->tce_shift);
> +	tce_size = 1ULL << tce_shift;
> +	next = start_pfn << PAGE_SHIFT;
> +	num_tce = num_pfn << PAGE_SHIFT;
> +
> +	/* round back to the beginning of the tce page size */
> +	num_tce += next & (tce_size - 1);
> +	next &= ~(tce_size - 1);
> +
> +	/* covert to number of tces */
> +	num_tce |= tce_size - 1;
> +	num_tce >>= tce_shift;
> +
> +	/* We can map max one pageful of TCEs at a time */
> +	do {
> +		/*
> +		 * Set up the page with TCE data, looping through and setting
> +		 * the values.
> +		 */
> +		limit = min_t(long, num_tce, 4096/TCE_ENTRY_SIZE);
> +		dma_offset = next + be64_to_cpu(maprange->dma_base);
> +
> +		for (l = 0; l < limit; l++) {
> +			tcep[l] = proto_tce | next;
> +			next += tce_size;
> +		}
> +
> +		rc = plpar_tce_put_indirect(liobn,
> +					    (u64)dma_offset,
> +					    (u64)virt_to_abs(tcep),
> +					    limit);
> +
> +		num_tce -= limit;
> +	} while (num_tce > 0 && !rc);
> +                printk("plpar_tce_put_indirect for offset 0x%llx and tcep[0] 0x%llx returned %llu\n",
> +                                (u64)dma_offset, tcep[0], rc);
> +

I'll clean up the debugging in the final version too.

<snip>

> +static void remove_ddw(struct device_node *np)
> +{
> +	struct dynamic_dma_window_prop *dwp;
> +	struct property *win64;
> +	const u32 *ddr_avail;
> +        u64 liobn;
> +	int len, ret;
> +
> +	ddr_avail = of_get_property(np, "ibm,ddw-applicable", &len);
> +	win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
> +	if (!win64 || !ddr_avail || len < 3 * sizeof(u32))
> +		return;
> +
> +	dwp = win64->value;
> +        liobn = (u64)be32_to_cpu(dwp->liobn);
> +
> +	/* clear the whole window, note the arg is in kernel pages */
> +	ret = tce_clearrange_multi_pSeriesLP(0,
> +		1ULL << (be32_to_cpu(dwp->window_shift) - PAGE_SHIFT), dwp);
> +	if (ret)
> +		pr_warning("%s failed to clear tces in window.\n",
> +			 np->full_name);
> +        else
> +		pr_warning("%s successfully cleared tces in window.\n",
> +			 np->full_name);
> +
> +	ret = rtas_call(ddr_avail[2], 1, 1, NULL, liobn);
> +	if (ret)
> +		pr_warning("%s: failed to remove direct window: rtas returned "
> +			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
> +			np->full_name, ret, ddr_avail[2], liobn);
> +	else
> +		pr_warning("%s: successfully removed direct window: rtas returned "
> +			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
> +			np->full_name, ret, ddr_avail[2], liobn);
> +
> +	ret = prom_remove_property(np, win64);
> +	if (ret)
> +		pr_warning("%s: failed to remove direct window property (%i)\n",
> +			np->full_name, ret);
> +	else
> +		pr_warning("%s: successfully removed direct window property (%i)\n",
> +			np->full_name, ret);
> +}

When this function gets called on dlpar remove of an adapter, it throws
a proc warning because the property has already been removed from
/proc/device-tree (but not the kernel representation) before the
notifiers get called:

static int pSeries_reconfig_remove_node(struct device_node *np)
{
        struct device_node *parent, *child;

        parent = of_get_parent(np);
        if (!parent)
                return -EINVAL;

        if ((child = of_get_next_child(np, NULL))) {
                of_node_put(child);
                of_node_put(parent);
                return -EBUSY;
        }

        remove_node_proc_entries(np);

        blocking_notifier_call_chain(&pSeries_reconfig_chain,
                            PSERIES_RECONFIG_REMOVE, np);
        of_detach_node(np);

        of_node_put(parent);
        of_node_put(np); /* Must decrement the refcount */
        return 0;
}

Am I reading that correctly? Should I add a parameter to remove_ddw that
specifies if it is being called from the reconfig notifier (or perhaps
just whether it needs to remove the property)?

Also, just so I understand, it doesn't seem like dlpar provides an
option for the notifier chain to indicate failure (e.g., the tce stuff
failing above) and prevent the dlpar operation. AFAICT after discussing
with the firmware folks, it's actually non-fatal for the TCEs not to be
cleared during the dlpar remove, but seems like it might indicate an
issue if it happens in the field?

Otherwise, I've hit no problems testing this series under load. Once I
get some feedback on these questions, I'll roll out a new version,
hopefully tomorrow, that can be accepted.

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center


* [PATCH 7/7 v3] ppc: add dynamic dma window support
  2010-12-09 19:09   ` Nishanth Aravamudan
@ 2010-12-11  0:07     ` Nishanth Aravamudan
  2011-01-08  2:53       ` [PATCH] ppc: update dynamic dma support Nishanth Aravamudan
  0 siblings, 1 reply; 21+ messages in thread
From: Nishanth Aravamudan @ 2010-12-11  0:07 UTC (permalink / raw)
  To: sonnyrao, miltonm, Benjamin Herrenschmidt, Paul Mackerras,
	Grant Likely, Anton Blanchard, linuxppc-dev

On 09.12.2010 [11:09:20 -0800], Nishanth Aravamudan wrote:
> On 26.10.2010 [20:35:17 -0700], Nishanth Aravamudan wrote:
> > If firmware allows us to map all of a partition's memory for DMA on a
> > particular bridge, create a 1:1 mapping of that memory. Add hooks for
> > dealing with hotplug events. Dyanmic DMA windows can use larger than the
> > default page size, and we use the largest one possible.
> > 
> > Not-yet-signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> > 
> > ---
> > 
> > I've tested this briefly on a machine with suitable firmware/hardware.
> > Things seem to work well, but I want to do more exhaustive I/O testing
> > before asking for upstream merging. I would really appreciate any
> > feedback on the updated approach.
> > 
> > Specific questions:
> > 
> > Ben, did I hook into the dma_set_mask() platform callback as you
> > expected? Anything I can do better or which perhaps might lead to
> > gotchas later?
> > 
> > I've added a disable_ddw option, but perhaps it would be better to
> > just disable the feature if iommu=force?
> 
> So for the final version, I probably should document this option in
> kernel-parameters.txt w/ the patch, right?

Here's an updated version. Ben, think you can pick this up to your tree?

Thanks,
Nish

ppc: add dynamic dma window support
    
If firmware allows us to map all of a partition's memory for DMA on a
particular bridge, create a 1:1 mapping of that memory. Add hooks for
dealing with hotplug events. Dynamic DMA windows can use page sizes larger
than the default, and we use the largest one possible.
    
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

---

I've tested this fairly heavily on a machine with suitable
firmware/hardware, including dlpar operations.
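
As a reading aid, the "largest page size possible" choice and the
coverage check can be sketched in userspace as below. This is a minimal
sketch: the bit meanings (4 = 16MB, 2 = 64kB, 1 = 4kB) and the
comparison are taken from the query[2] and query[1] tests in the patch,
while pick_page_shift() and window_covers() are hypothetical names that
do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Return log2 of the largest page size advertised in mask, or -1. */
static int pick_page_shift(uint32_t mask)
{
	if (mask & 4)
		return 24;	/* 16MB */
	if (mask & 2)
		return 16;	/* 64kB */
	if (mask & 1)
		return 12;	/* 4kB */
	return -1;		/* no supported page size */
}

/*
 * Mirror of the patch's test: the window is rejected when
 * max_blocks < (max_addr >> page_shift). Returns nonzero if accepted.
 */
static int window_covers(uint64_t max_addr, uint32_t max_blocks,
			 int page_shift)
{
	assert(page_shift >= 0 && page_shift < 64);
	return (uint64_t)max_blocks >= (max_addr >> page_shift);
}
```

For example, with 16MB pages a 1TB maximum hotplug address needs 65536
entries, so a firmware limit of 65535 would be rejected.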

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index cdd2a6e..e9ac890 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -618,6 +618,10 @@ and is between 256 and 4096 characters. It is defined in the file
 	disable=	[IPV6]
 			See Documentation/networking/ipv6.txt.
 
+	disable_ddw     [PPC]
+			Disable Dynamic DMA Window support. Use this
+			to work around buggy firmware.
+
 	disable_ipv6=	[IPV6]
 			See Documentation/networking/ipv6.txt.
 
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 45c6865..4ba2338 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -33,6 +33,7 @@
 #include <linux/pci.h>
 #include <linux/dma-mapping.h>
 #include <linux/crash_dump.h>
+#include <linux/memory.h>
 #include <asm/io.h>
 #include <asm/prom.h>
 #include <asm/rtas.h>
@@ -45,6 +46,7 @@
 #include <asm/tce.h>
 #include <asm/ppc-pci.h>
 #include <asm/udbg.h>
+#include <asm/mmzone.h>
 
 #include "plpar_wrappers.h"
 
@@ -270,6 +272,137 @@ static unsigned long tce_get_pSeriesLP(struct iommu_table *tbl, long tcenum)
 	return tce_ret;
 }
 
+/* this is compatible with cells for the device tree property */
+struct dynamic_dma_window_prop {
+	__be32	liobn;		/* tce table number */
+	__be64	dma_base;	/* address hi,lo */
+	__be32	tce_shift;	/* ilog2(tce_page_size) */
+	__be32	window_shift;	/* ilog2(tce_window_size) */
+};
+
+struct direct_window {
+	struct device_node *device;
+	const struct dynamic_dma_window_prop *prop;
+	struct list_head list;
+};
+static LIST_HEAD(direct_window_list);
+/* prevents races between memory on/offline and window creation */
+static DEFINE_SPINLOCK(direct_window_list_lock);
+/* protects initializing window twice for same device */
+static DEFINE_MUTEX(direct_window_init_mutex);
+#define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
+
+static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
+					unsigned long num_pfn, const void *arg)
+{
+	const struct dynamic_dma_window_prop *maprange = arg;
+	int rc;
+	u64 tce_size, num_tce, dma_offset, next;
+	u32 tce_shift;
+	long limit;
+
+	tce_shift = be32_to_cpu(maprange->tce_shift);
+	tce_size = 1ULL << tce_shift;
+	next = start_pfn << PAGE_SHIFT;
+	num_tce = num_pfn << PAGE_SHIFT;
+
+	/* round back to the beginning of the tce page size */
+	num_tce += next & (tce_size - 1);
+	next &= ~(tce_size - 1);
+
+	/* convert to number of tces */
+	num_tce |= tce_size - 1;
+	num_tce >>= tce_shift;
+
+	do {
+		/*
+		 * Set up the page with TCE data, looping through and setting
+		 * the values.
+		 */
+		limit = min_t(long, num_tce, 512);
+		dma_offset = next + be64_to_cpu(maprange->dma_base);
+
+		rc = plpar_tce_stuff((u64)be32_to_cpu(maprange->liobn),
+					    (u64)dma_offset,
+					     0, limit);
+		num_tce -= limit;
+	} while (num_tce > 0 && !rc);
+
+	return rc;
+}
+
+static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
+					unsigned long num_pfn, const void *arg)
+{
+	const struct dynamic_dma_window_prop *maprange = arg;
+	u64 *tcep, tce_size, num_tce, dma_offset, next, proto_tce, liobn;
+	u32 tce_shift;
+	u64 rc = 0;
+	long l, limit;
+
+	local_irq_disable();	/* to protect tcep and the page behind it */
+	tcep = __get_cpu_var(tce_page);
+
+	if (!tcep) {
+		tcep = (u64 *)__get_free_page(GFP_ATOMIC);
+		if (!tcep) {
+			local_irq_enable();
+			return -ENOMEM;
+		}
+		__get_cpu_var(tce_page) = tcep;
+	}
+
+	proto_tce = TCE_PCI_READ | TCE_PCI_WRITE;
+
+	liobn = (u64)be32_to_cpu(maprange->liobn);
+	tce_shift = be32_to_cpu(maprange->tce_shift);
+	tce_size = 1ULL << tce_shift;
+	next = start_pfn << PAGE_SHIFT;
+	num_tce = num_pfn << PAGE_SHIFT;
+
+	/* round back to the beginning of the tce page size */
+	num_tce += next & (tce_size - 1);
+	next &= ~(tce_size - 1);
+
+	/* convert to number of tces */
+	num_tce |= tce_size - 1;
+	num_tce >>= tce_shift;
+
+	/* We can map max one pageful of TCEs at a time */
+	do {
+		/*
+		 * Set up the page with TCE data, looping through and setting
+		 * the values.
+		 */
+		limit = min_t(long, num_tce, 4096/TCE_ENTRY_SIZE);
+		dma_offset = next + be64_to_cpu(maprange->dma_base);
+
+		for (l = 0; l < limit; l++) {
+			tcep[l] = proto_tce | next;
+			next += tce_size;
+		}
+
+		rc = plpar_tce_put_indirect(liobn,
+					    (u64)dma_offset,
+					    (u64)virt_to_abs(tcep),
+					    limit);
+
+		num_tce -= limit;
+	} while (num_tce > 0 && !rc);
+
+	/* error cleanup: caller will clear whole range */
+
+	local_irq_enable();
+	return rc;
+}
+
+static int tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn,
+		unsigned long num_pfn, void *arg)
+{
+	return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg);
+}
+
+
 #ifdef CONFIG_PCI
 static void iommu_table_setparms(struct pci_controller *phb,
 				 struct device_node *dn,
@@ -449,8 +582,7 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
 	if (!ppci->iommu_table) {
 		tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
 				   ppci->phb->node);
-		iommu_table_setparms_lpar(ppci->phb, pdn, tbl, dma_window,
-			bus->number);
+		iommu_table_setparms_lpar(ppci->phb, pdn, tbl, dma_window);
 		ppci->iommu_table = iommu_init_table(tbl, ppci->phb->node);
 		pr_debug("  created table: %p\n", ppci->iommu_table);
 	}
@@ -496,6 +628,328 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
 		       pci_name(dev));
 }
 
+static int __read_mostly disable_ddw;
+
+static int __init disable_ddw_setup(char *str)
+{
+	disable_ddw = 1;
+	printk(KERN_INFO "ppc iommu: disabling ddw.\n");
+
+	return 0;
+}
+
+early_param("disable_ddw", disable_ddw_setup);
+
+static void remove_ddw(struct device_node *np)
+{
+	struct dynamic_dma_window_prop *dwp;
+	struct property *win64;
+	const u32 *ddr_avail;
+	u64 liobn;
+	int len, ret;
+
+	ddr_avail = of_get_property(np, "ibm,ddw-applicable", &len);
+	win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
+	if (!win64 || !ddr_avail || len < 3 * sizeof(u32))
+		return;
+
+	dwp = win64->value;
+	liobn = (u64)be32_to_cpu(dwp->liobn);
+
+	/* clear the whole window, note the arg is in kernel pages */
+	ret = tce_clearrange_multi_pSeriesLP(0,
+		1ULL << (be32_to_cpu(dwp->window_shift) - PAGE_SHIFT), dwp);
+	if (ret)
+		pr_warning("%s failed to clear tces in window.\n",
+			 np->full_name);
+	else
+		pr_debug("%s successfully cleared tces in window.\n",
+			 np->full_name);
+
+	ret = rtas_call(ddr_avail[2], 1, 1, NULL, liobn);
+	if (ret)
+		pr_warning("%s: failed to remove direct window: rtas returned "
+			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
+			np->full_name, ret, ddr_avail[2], liobn);
+	else
+		pr_debug("%s: successfully removed direct window: rtas returned "
+			"%d to ibm,remove-pe-dma-window(%x) %llx\n",
+			np->full_name, ret, ddr_avail[2], liobn);
+}
+
+
+static u64 dupe_ddw_if_already_created(struct pci_dev *dev, struct device_node *pdn)
+{
+	struct device_node *dn;
+	struct pci_dn *pcidn;
+	struct direct_window *window;
+	const struct dynamic_dma_window_prop *direct64;
+	u64 dma_addr = 0;	/* 0 means no existing window was found */
+
+	dn = pci_device_to_OF_node(dev);
+	pcidn = PCI_DN(dn);
+	spin_lock(&direct_window_list_lock);
+	/* check if we already created a window and dupe that config if so */
+	list_for_each_entry(window, &direct_window_list, list) {
+		if (window->device == pdn) {
+			direct64 = window->prop;
+			dma_addr = direct64->dma_base;
+			break;
+		}
+	}
+	spin_unlock(&direct_window_list_lock);
+
+	return dma_addr;
+}
+
+static u64 dupe_ddw_if_kexec(struct pci_dev *dev, struct device_node *pdn)
+{
+	struct device_node *dn;
+	struct pci_dn *pcidn;
+	int len;
+	struct direct_window *window;
+	const struct dynamic_dma_window_prop *direct64;
+	u64 dma_addr = 0;	/* 0 means no window was inherited */
+
+	dn = pci_device_to_OF_node(dev);
+	pcidn = PCI_DN(dn);
+	direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
+	if (direct64) {
+		window = kzalloc(sizeof(*window), GFP_KERNEL);
+		if (!window) {
+			remove_ddw(pdn);
+		} else {
+			window->device = pdn;
+			window->prop = direct64;
+			spin_lock(&direct_window_list_lock);
+			list_add(&window->list, &direct_window_list);
+			spin_unlock(&direct_window_list_lock);
+			dma_addr = direct64->dma_base;
+		}
+	}
+
+	return dma_addr;
+}
+
+static int query_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *query)
+{
+	struct device_node *dn;
+	struct pci_dn *pcidn;
+	u32 cfg_addr;
+	u64 buid;
+	int ret;
+
+	/*
+	 * Get the config address and phb buid of the PE window.
+	 * Rely on eeh to retrieve this for us.
+	 * Retrieve them from the pci device, not the node with the
+	 * dma-window property
+	 */
+	dn = pci_device_to_OF_node(dev);
+	pcidn = PCI_DN(dn);
+	cfg_addr = pcidn->eeh_config_addr;
+	if (pcidn->eeh_pe_config_addr)
+		cfg_addr = pcidn->eeh_pe_config_addr;
+	buid = pcidn->phb->buid;
+	ret = rtas_call(ddr_avail[0], 3, 5, query,
+		  cfg_addr, BUID_HI(buid), BUID_LO(buid));
+	dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
+		" returned %d\n", ddr_avail[0], cfg_addr, BUID_HI(buid),
+		BUID_LO(buid), ret);
+	return ret;
+}
+
+static int create_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *create, int page_shift, int window_shift)
+{
+	struct device_node *dn;
+	struct pci_dn *pcidn;
+	u32 cfg_addr;
+	u64 buid;
+	int ret;
+
+	/*
+	 * Get the config address and phb buid of the PE window.
+	 * Rely on eeh to retrieve this for us.
+	 * Retrieve them from the pci device, not the node with the
+	 * dma-window property
+	 */
+	dn = pci_device_to_OF_node(dev);
+	pcidn = PCI_DN(dn);
+	cfg_addr = pcidn->eeh_config_addr;
+	if (pcidn->eeh_pe_config_addr)
+		cfg_addr = pcidn->eeh_pe_config_addr;
+	buid = pcidn->phb->buid;
+
+	do {
+		/* extra outputs are LIOBN and dma-addr (hi, lo) */
+		ret = rtas_call(ddr_avail[1], 5, 4, &create[0], cfg_addr,
+				BUID_HI(buid), BUID_LO(buid), page_shift, window_shift);
+	} while(rtas_busy_delay(ret));
+	dev_info(&dev->dev,
+		"ibm,create-pe-dma-window(%x) %x %x %x %x %x returned %d "
+		"(liobn = 0x%x starting addr = %x %x)\n", ddr_avail[1],
+		 cfg_addr, BUID_HI(buid), BUID_LO(buid), page_shift,
+		 window_shift, ret, create[0], create[1], create[2]);
+	
+	return ret;
+}
+
+/*
+ * If the PE supports dynamic dma windows, and there is space for a table
+ * that can map all pages in a linear offset, then setup such a table,
+ * and record the dma-offset in the struct device.
+ *
+ * dev: the pci device we are checking
+ * pdn: the parent pe node with the ibm,dma_window property
+ * Future: also check if we can remap the base window for our base page size
+ *
+ * returns the dma offset for use by dma_set_mask
+ */
+static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
+{
+	int len, ret;
+	u32 query[4], create[3];
+	int page_shift;
+	u64 dma_addr, max_addr;
+	struct device_node *dn;
+	const u32 *uninitialized_var(ddr_avail);
+	struct direct_window *window;
+	struct property *uninitialized_var(win64);
+	struct dynamic_dma_window_prop *ddwprop;
+
+	mutex_lock(&direct_window_init_mutex);
+
+	dma_addr = dupe_ddw_if_already_created(dev, pdn);
+	if (dma_addr != 0)
+		goto out_unlock;
+
+	dma_addr = dupe_ddw_if_kexec(dev, pdn);
+	if (dma_addr != 0)
+		goto out_unlock;
+
+	/*
+	 * the ibm,ddw-applicable property holds the tokens for:
+	 * ibm,query-pe-dma-window
+	 * ibm,create-pe-dma-window
+	 * ibm,remove-pe-dma-window
+	 * for the given node in that order.
+	 * the property is actually in the parent, not the PE
+	 */
+	ddr_avail = of_get_property(pdn, "ibm,ddw-applicable", &len);
+	if (!ddr_avail || len < 3 * sizeof(u32))
+		goto out_unlock;
+
+	/*
+	 * Query if there is a second window of size to map the
+	 * whole partition.  Query returns number of windows, largest
+	 * block assigned to PE (partition endpoint), and two bitmasks
+	 * of page sizes: supported and supported for migrate-dma.
+	 */
+	dn = pci_device_to_OF_node(dev);
+	ret = query_ddw(dev, ddr_avail, &query[0]);
+	if (ret != 0)
+		goto out_unlock;
+
+	if (!query[0]) {
+		/*
+		 * no additional windows are available for this device.
+		 * We might be able to reallocate the existing window,
+		 * trading in for a larger page size.
+		 */
+		dev_dbg(&dev->dev, "no free dynamic windows");
+		goto out_unlock;
+	}
+	if (query[2] & 4) {
+		page_shift = 24; /* 16MB */
+	} else if (query[2] & 2) {
+		page_shift = 16; /* 64kB */
+	} else if (query[2] & 1) {
+		page_shift = 12; /* 4kB */
+	} else {
+		dev_dbg(&dev->dev, "no supported direct page size in mask %x",
+			  query[2]);
+		goto out_unlock;
+	}
+	/* verify the window * number of ptes will map the partition */
+	/* check largest block * page size > max memory hotplug addr */
+	max_addr = memory_hotplug_max();
+	if (query[1] < (max_addr >> page_shift)) {
+		dev_dbg(&dev->dev, "can't map partiton max 0x%llx with %u "
+			  "%llu-sized pages\n", max_addr,  query[1],
+			  1ULL << page_shift);
+		goto out_unlock;
+	}
+	len = order_base_2(max_addr);
+	win64 = kzalloc(sizeof(struct property), GFP_KERNEL);
+	if (!win64) {
+		dev_info(&dev->dev,
+			"couldn't allocate property for 64bit dma window\n");
+		goto out_unlock;
+	}
+	win64->name = kstrdup(DIRECT64_PROPNAME, GFP_KERNEL);
+	win64->value = ddwprop = kmalloc(sizeof(*ddwprop), GFP_KERNEL);
+	if (!win64->name || !win64->value) {
+		dev_info(&dev->dev,
+			"couldn't allocate property name and value\n");
+		goto out_free_prop;
+	}
+
+	ret = create_ddw(dev, ddr_avail, &create[0], page_shift, len);
+	if (ret != 0)
+		goto out_free_prop;
+
+	*ddwprop = (struct dynamic_dma_window_prop) {
+		.liobn = cpu_to_be32(create[0]),
+		.dma_base = cpu_to_be64(((u64)create[1] << 32) + (u64)create[2]),
+		.tce_shift = cpu_to_be32(page_shift),
+		.window_shift = cpu_to_be32(len)
+	};
+
+	dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %s\n",
+		  create[0], dn->full_name);
+
+	window = kzalloc(sizeof(*window), GFP_KERNEL);
+	if (!window)
+		goto out_clear_window;
+
+	ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
+			win64->value, tce_setrange_multi_pSeriesLP_walk);
+	if (ret) {
+		dev_info(&dev->dev, "failed to map direct window for %s: %d\n",
+			 dn->full_name, ret);
+		goto out_clear_window;
+	}
+
+	ret = prom_add_property(pdn, win64);
+	if (ret) {
+		dev_err(&dev->dev, "unable to add dma window property for %s: %d",
+			 pdn->full_name, ret);
+		goto out_clear_window;
+	}
+
+	window->device = pdn;
+	window->prop = ddwprop;
+	spin_lock(&direct_window_list_lock);
+	list_add(&window->list, &direct_window_list);
+	spin_unlock(&direct_window_list_lock);
+
+	dma_addr = of_read_number(&create[1], 2);
+	set_dma_offset(&dev->dev, dma_addr);
+	goto out_unlock;
+
+out_clear_window:
+	remove_ddw(pdn);
+
+out_free_prop:
+	kfree(win64->name);
+	kfree(win64->value);
+	kfree(win64);
+
+out_unlock:
+	mutex_unlock(&direct_window_init_mutex);
+	return dma_addr;
+}
+
 static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 {
 	struct device_node *pdn, *dn;
@@ -542,23 +996,129 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 
 	set_iommu_table_base(&dev->dev, pci->iommu_table);
 }
+
+static int dma_set_mask_pSeriesLP(struct device *dev, u64 dma_mask)
+{
+	bool ddw_enabled = false;
+	struct device_node *pdn, *dn;
+	struct pci_dev *pdev;
+	const void *dma_window = NULL;
+	u64 dma_offset;
+
+	if (!dev->dma_mask || !dma_supported(dev, dma_mask))
+		return -EIO;
+
+	/* only attempt to use a new window if 64-bit DMA is requested */
+	if (!disable_ddw && dma_mask == DMA_BIT_MASK(64)) {
+		pdev = to_pci_dev(dev);
+
+		dn = pci_device_to_OF_node(pdev);
+		dev_dbg(dev, "node is %s\n", dn->full_name);
+
+		/* 
+		 * the device tree might contain the dma-window properties
+		 * per-device and not necessarily for the bus. So we need to
+		 * search upwards in the tree until we either hit a dma-window
+		 * property, OR find a parent with a table already allocated.
+		 */
+		for (pdn = dn; pdn && PCI_DN(pdn) && !PCI_DN(pdn)->iommu_table;
+				pdn = pdn->parent) {
+			dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
+			if (dma_window)
+				break;
+		}
+		if (pdn && PCI_DN(pdn)) {
+			dma_offset = enable_ddw(pdev, pdn);
+			if (dma_offset != 0) {
+				dev_info(dev, "Using 64-bit direct DMA at offset %llx\n", dma_offset);
+				set_dma_offset(dev, dma_offset);
+				set_dma_ops(dev, &dma_direct_ops);
+				ddw_enabled = true;
+			}
+		}
+	}
+
+	/* fall-through to iommu ops */
+	if (!ddw_enabled) {
+		dev_info(dev, "Using 32-bit DMA via iommu\n");
+		set_dma_ops(dev, &dma_iommu_ops);
+	}
+
+	*dev->dma_mask = dma_mask;
+	return 0;
+}
+
 #else  /* CONFIG_PCI */
 #define pci_dma_bus_setup_pSeries	NULL
 #define pci_dma_dev_setup_pSeries	NULL
 #define pci_dma_bus_setup_pSeriesLP	NULL
 #define pci_dma_dev_setup_pSeriesLP	NULL
+#define dma_set_mask_pSeriesLP		NULL
 #endif /* !CONFIG_PCI */
 
+static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
+		void *data)
+{
+	struct direct_window *window;
+	struct memory_notify *arg = data;
+	int ret = 0;
+
+	switch (action) {
+	case MEM_GOING_ONLINE:
+		spin_lock(&direct_window_list_lock);
+		list_for_each_entry(window, &direct_window_list, list) {
+			ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
+					arg->nr_pages, window->prop);
+			/* XXX log error */
+		}
+		spin_unlock(&direct_window_list_lock);
+		break;
+	case MEM_CANCEL_ONLINE:
+	case MEM_OFFLINE:
+		spin_lock(&direct_window_list_lock);
+		list_for_each_entry(window, &direct_window_list, list) {
+			ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
+					arg->nr_pages, window->prop);
+			/* XXX log error */
+		}
+		spin_unlock(&direct_window_list_lock);
+		break;
+	default:
+		break;
+	}
+	if (ret && action != MEM_CANCEL_ONLINE)
+		return NOTIFY_BAD;
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block iommu_mem_nb = {
+	.notifier_call = iommu_mem_notifier,
+};
+
 static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *node)
 {
 	int err = NOTIFY_OK;
 	struct device_node *np = node;
 	struct pci_dn *pci = PCI_DN(np);
+	struct direct_window *window;
 
 	switch (action) {
 	case PSERIES_RECONFIG_REMOVE:
 		if (pci && pci->iommu_table)
 			iommu_free_table(pci->iommu_table, np->full_name);
+
+		spin_lock(&direct_window_list_lock);
+		list_for_each_entry(window, &direct_window_list, list) {
+			if (window->device == np) {
+				list_del(&window->list);
+				kfree(window);
+				break;
+			}
+		}
+		spin_unlock(&direct_window_list_lock);
+
+		remove_ddw(np);
 		break;
 	default:
 		err = NOTIFY_DONE;
@@ -588,6 +1148,7 @@ void iommu_init_early_pSeries(void)
 		ppc_md.tce_get   = tce_get_pSeriesLP;
 		ppc_md.pci_dma_bus_setup = pci_dma_bus_setup_pSeriesLP;
 		ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_pSeriesLP;
+		ppc_md.dma_set_mask = dma_set_mask_pSeriesLP;
 	} else {
 		ppc_md.tce_build = tce_build_pSeries;
 		ppc_md.tce_free  = tce_free_pSeries;
@@ -598,6 +1159,7 @@ void iommu_init_early_pSeries(void)
 
 
 	pSeries_reconfig_notifier_register(&iommu_reconfig_nb);
+	register_memory_notifier(&iommu_mem_nb);
 
 	set_pci_dma_ops(&dma_iommu_ops);
 }

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center


* [PATCH] ppc: update dynamic dma support
  2010-12-11  0:07     ` [PATCH 7/7 v3] " Nishanth Aravamudan
@ 2011-01-08  2:53       ` Nishanth Aravamudan
  2011-01-17 17:32         ` [PATCH v2] " Nishanth Aravamudan
  0 siblings, 1 reply; 21+ messages in thread
From: Nishanth Aravamudan @ 2011-01-08  2:53 UTC (permalink / raw)
  To: sonnyrao, miltonm, Benjamin Herrenschmidt, Paul Mackerras,
	Grant Likely, Anton Blanchard, linuxppc-dev

On 10.12.2010 [16:07:44 -0800], Nishanth Aravamudan wrote:
> On 09.12.2010 [11:09:20 -0800], Nishanth Aravamudan wrote:
> > On 26.10.2010 [20:35:17 -0700], Nishanth Aravamudan wrote:
> > > If firmware allows us to map all of a partition's memory for DMA on a
> > > particular bridge, create a 1:1 mapping of that memory. Add hooks for
> > > dealing with hotplug events. Dyanmic DMA windows can use larger than the
> > > default page size, and we use the largest one possible.
> > > 
> > > Not-yet-signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> > > 
> > > ---
> > > 
> > > I've tested this briefly on a machine with suitable firmware/hardware.
> > > Things seem to work well, but I want to do more exhaustive I/O testing
> > > before asking for upstream merging. I would really appreciate any
> > > feedback on the updated approach.
> > > 
> > > Specific questions:
> > > 
> > > Ben, did I hook into the dma_set_mask() platform callback as you
> > > expected? Anything I can do better or which perhaps might lead to
> > > gotchas later?
> > > 
> > > I've added a disable_ddw option, but perhaps it would be better to
> > > just disable the feature if iommu=force?
> > 
> > So for the final version, I probably should document this option in
> > kernel-parameters.txt w/ the patch, right?
> 
> Here's an updated version. Ben, think you can pick this up to your tree?

Hi Ben,

I have a small follow-on patch that tidies up the code a bit and deals
with an error condition on dlpar remove of ddw slots. I'm putting it
below as a follow-on patch, but I can roll it into the v3 patch and post
a v4 if you'd prefer?

Thanks,
Nish


pseries: ddw cleanups
    
Use symbolic constants to access RTAS responses.
    
Disable reconfig notifier's clearing of TCEs and removal of DMA window.
This is handled by firmware currently. If the kernel were to do it, we'd
need a new callback action before the isolation of the slot in question,
or else we'd always get permission errors (firmware revokes the window
automatically).
    
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
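
As a rough illustration of the "symbolic constants" change, the sketch
below shows how a struct of u32 fields can be handed to a routine that
fills an array of output words, so callers read query.page_size rather
than query[2]. It is a userspace approximation only: fake_query() is a
hypothetical stand-in for rtas_call(), its reply values are invented,
and the cast assumes the struct has no padding (checked at compile
time).

```c
#include <assert.h>
#include <stdint.h>

/* Same field order as the struct added to ppc-pci.h in this patch. */
struct ddw_query_response {
	uint32_t windows_available;
	uint32_t largest_available_block;
	uint32_t page_size;
	uint32_t migration_capable;
};

/* The overlay only works if the struct is exactly four packed words. */
_Static_assert(sizeof(struct ddw_query_response) == 4 * sizeof(uint32_t),
	       "ddw_query_response must be four u32 words with no padding");

/*
 * Hypothetical stand-in for rtas_call(): writes nout output words.
 * The reply values are made up for the example.
 */
static int fake_query(uint32_t *outputs, int nout)
{
	static const uint32_t reply[4] = { 1, 0x100000, 0x3, 0x1 };
	int i;

	assert(nout >= 0 && nout <= 4);
	for (i = 0; i < nout; i++)
		outputs[i] = reply[i];
	return 0;
}

/* Callers get named fields instead of magic array indices. */
static int query_named(struct ddw_query_response *query)
{
	return fake_query((uint32_t *)query, 4);
}
```

After query_named(&q) returns 0, q.windows_available and q.page_size
hold what used to be query[0] and query[2].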

diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index 43268f1..c17adf7 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -47,6 +47,20 @@ extern int rtas_setup_phb(struct pci_controller *phb);
 
 extern unsigned long pci_probe_only;
 
+/* Dynamic DMA Window support */
+struct ddw_query_response {
+        u32 windows_available;
+        u32 largest_available_block;
+        u32 page_size;
+        u32 migration_capable;
+};
+
+struct ddw_create_response {
+        u32 liobn;
+        u32 addr_hi;
+        u32 addr_lo;
+};
+
 /* ---- EEH internal-use-only related routines ---- */
 #ifdef CONFIG_EEH
 
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 4ba2338..b6f73c6 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -323,7 +323,7 @@ static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
 		dma_offset = next + be64_to_cpu(maprange->dma_base);
 
 		rc = plpar_tce_stuff((u64)be32_to_cpu(maprange->liobn),
-					    (u64)dma_offset,
+					     dma_offset,
 					     0, limit);
 		num_tce -= limit;
 	} while (num_tce > 0 && !rc);
@@ -383,7 +383,7 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
 		}
 
 		rc = plpar_tce_put_indirect(liobn,
-					    (u64)dma_offset,
+					    dma_offset,
 					    (u64)virt_to_abs(tcep),
 					    limit);
 
@@ -731,7 +731,8 @@ static u64 dupe_ddw_if_kexec(struct pci_dev *dev, struct device_node *pdn)
 	return dma_addr;
 }
 
-static int query_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *query)
+static int query_ddw(struct pci_dev *dev, const u32 *ddr_avail,
+			struct ddw_query_response *query)
 {
 	struct device_node *dn;
 	struct pci_dn *pcidn;
@@ -751,7 +752,7 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *query)
 	if (pcidn->eeh_pe_config_addr)
 		cfg_addr = pcidn->eeh_pe_config_addr;
 	buid = pcidn->phb->buid;
-	ret = rtas_call(ddr_avail[0], 3, 5, query,
+	ret = rtas_call(ddr_avail[0], 3, 5, (u32 *)query,
 		  cfg_addr, BUID_HI(buid), BUID_LO(buid));
 	dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
 		" returned %d\n", ddr_avail[0], cfg_addr, BUID_HI(buid),
@@ -759,7 +760,9 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *query)
 	return ret;
 }
 
-static int create_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *create, int page_shift, int window_shift)
+static int create_ddw(struct pci_dev *dev, const u32 *ddr_avail,
+			struct ddw_create_response *create, int page_shift,
+		       	int window_shift)
 {
 	struct device_node *dn;
 	struct pci_dn *pcidn;
@@ -782,14 +785,14 @@ static int create_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *create, in
 
 	do {
 		/* extra outputs are LIOBN and dma-addr (hi, lo) */
-		ret = rtas_call(ddr_avail[1], 5, 4, &create[0], cfg_addr,
+		ret = rtas_call(ddr_avail[1], 5, 4, (u32 *)create, cfg_addr,
 				BUID_HI(buid), BUID_LO(buid), page_shift, window_shift);
 	} while(rtas_busy_delay(ret));
 	dev_info(&dev->dev,
 		"ibm,create-pe-dma-window(%x) %x %x %x %x %x returned %d "
 		"(liobn = 0x%x starting addr = %x %x)\n", ddr_avail[1],
 		 cfg_addr, BUID_HI(buid), BUID_LO(buid), page_shift,
-		 window_shift, ret, create[0], create[1], create[2]);
+		 window_shift, ret, create->liobn, create->addr_hi, create->addr_lo);
 	
 	return ret;
 }
@@ -808,7 +811,8 @@ static int create_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *create, in
 static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 {
 	int len, ret;
-	u32 query[4], create[3];
+        struct ddw_query_response query;
+        struct ddw_create_response create;
 	int page_shift;
 	u64 dma_addr, max_addr;
 	struct device_node *dn;
@@ -846,11 +850,11 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	 * of page sizes: supported and supported for migrate-dma.
 	 */
 	dn = pci_device_to_OF_node(dev);
-	ret = query_ddw(dev, ddr_avail, &query[0]);
+	ret = query_ddw(dev, ddr_avail, &query);
 	if (ret != 0)
 		goto out_unlock;
 
-	if (!query[0]) {
+	if (query.windows_available == 0) {
 		/*
 		 * no additional windows are available for this device.
 		 * We might be able to reallocate the existing window,
@@ -859,23 +863,23 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 		dev_dbg(&dev->dev, "no free dynamic windows");
 		goto out_unlock;
 	}
-	if (query[2] & 4) {
+	if (query.page_size & 4) {
 		page_shift = 24; /* 16MB */
-	} else if (query[2] & 2) {
+	} else if (query.page_size & 2) {
 		page_shift = 16; /* 64kB */
-	} else if (query[2] & 1) {
+	} else if (query.page_size & 1) {
 		page_shift = 12; /* 4kB */
 	} else {
 		dev_dbg(&dev->dev, "no supported direct page size in mask %x",
-			  query[2]);
+			  query.page_size);
 		goto out_unlock;
 	}
 	/* verify the window * number of ptes will map the partition */
 	/* check largest block * page size > max memory hotplug addr */
 	max_addr = memory_hotplug_max();
-	if (query[1] < (max_addr >> page_shift)) {
+	if (query.largest_available_block < (max_addr >> page_shift)) {
 		dev_dbg(&dev->dev, "can't map partiton max 0x%llx with %u "
-			  "%llu-sized pages\n", max_addr,  query[1],
+			  "%llu-sized pages\n", max_addr,  query.largest_available_block,
 			  1ULL << page_shift);
 		goto out_unlock;
 	}
@@ -894,19 +898,17 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 		goto out_free_prop;
 	}
 
-	ret = create_ddw(dev, ddr_avail, &create[0], page_shift, len);
+	ret = create_ddw(dev, ddr_avail, &create, page_shift, len);
 	if (ret != 0)
 		goto out_free_prop;
 
-	*ddwprop = (struct dynamic_dma_window_prop) {
-		.liobn = cpu_to_be32(create[0]),
-		.dma_base = cpu_to_be64(((u64)create[1] << 32) + (u64)create[2]),
-		.tce_shift = cpu_to_be32(page_shift),
-		.window_shift = cpu_to_be32(len)
-	};
+	ddwprop->liobn = cpu_to_be32(create.liobn);
+	ddwprop->dma_base = cpu_to_be64(of_read_number(&create.addr_hi, 2));
+	ddwprop->tce_shift = cpu_to_be32(page_shift);
+	ddwprop->window_shift = cpu_to_be32(len);
 
 	dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %s\n",
-		  create[0], dn->full_name);
+		  create.liobn, dn->full_name);
 
 	window = kzalloc(sizeof(*window), GFP_KERNEL);
 	if (!window)
@@ -933,7 +935,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	list_add(&window->list, &direct_window_list);
 	spin_unlock(&direct_window_list_lock);
 
-	dma_addr = of_read_number(&create[1], 2);
+	dma_addr = of_read_number(&create.addr_hi, 2);
 	set_dma_offset(&dev->dev, dma_addr);
 	goto out_unlock;
 
@@ -1118,7 +1120,15 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
 		}
 		spin_unlock(&direct_window_list_lock);
 
-		remove_ddw(np);
+		/*
+		 * Because the notifier runs after isolation of the
+		 * slot, we are guaranteed any DMA window has already
+		 * been revoked and the TCEs have been marked invalid,
+		 * so we don't need a call to remove_ddw(np). However,
+		 * if an additional notifier action is added before the
+		 * isolate call, we should update this code for
+		 * completeness with such a call.
+		 */
 		break;
 	default:
 		err = NOTIFY_DONE;

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center


* [PATCH v2] ppc: update dynamic dma support
  2011-01-08  2:53       ` [PATCH] ppc: update dynamic dma support Nishanth Aravamudan
@ 2011-01-17 17:32         ` Nishanth Aravamudan
  2011-01-18  0:20           ` [PATCH v3] " Nishanth Aravamudan
  0 siblings, 1 reply; 21+ messages in thread
From: Nishanth Aravamudan @ 2011-01-17 17:32 UTC (permalink / raw)
  To: sonnyrao, miltonm, Benjamin Herrenschmidt, Paul Mackerras,
	Grant Likely, Anton Blanchard, linuxppc-dev

On 07.01.2011 [18:53:34 -0800], Nishanth Aravamudan wrote:
> On 10.12.2010 [16:07:44 -0800], Nishanth Aravamudan wrote:
> > On 09.12.2010 [11:09:20 -0800], Nishanth Aravamudan wrote:
> > > On 26.10.2010 [20:35:17 -0700], Nishanth Aravamudan wrote:
> > > > If firmware allows us to map all of a partition's memory for DMA on a
> > > > particular bridge, create a 1:1 mapping of that memory. Add hooks for
> > > > dealing with hotplug events. Dyanmic DMA windows can use larger than the
> > > > default page size, and we use the largest one possible.
> > > > 
> > > > Not-yet-signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> > > > 
> > > > ---
> > > > 
> > > > I've tested this briefly on a machine with suitable firmware/hardware.
> > > > Things seem to work well, but I want to do more exhaustive I/O testing
> > > > before asking for upstream merging. I would really appreciate any
> > > > feedback on the updated approach.
> > > > 
> > > > Specific questions:
> > > > 
> > > > Ben, did I hook into the dma_set_mask() platform callback as you
> > > > expected? Anything I can do better or which perhaps might lead to
> > > > gotchas later?
> > > > 
> > > > I've added a disable_ddw option, but perhaps it would be better to
> > > > just disable the feature if iommu=force?
> > > 
> > > So for the final version, I probably should document this option in
> > > kernel-parameters.txt w/ the patch, right?
> > 
> > Here's an updated version. Ben, think you can pick this up to your tree?
> 
> Hi Ben,
> 
> I have a small follow-on patch that tidies up the code a bit and deals
> with an error condition on dlpar remove of ddw slots. I'm putting it
> below as a follow-on patch, but I can roll it into the v3 patch and post
> a v4 if you'd prefer?

Sorry, found a few more cleanups (spaces instead of tabs, etc.).


pseries: ddw cleanups
    
Use symbolic constants to access RTAS responses.
    
Disable reconfig notifier's clearing of TCEs and removal of DMA window.
This is handled by firmware currently. If the kernel were to do it, we'd
need a new callback action before the isolation of the slot in question,
or else we'd always get permission errors (firmware revokes the window
automatically).

Fix up a few whitespace issues.
    
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
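
For reference, a small userspace sketch of how the create response's
addr_hi/addr_lo pair becomes the 64-bit DMA base. read_cells() is only a
host-endian stand-in for of_read_number(), and ddw_dma_base() is a
made-up helper name. The patch passes &create.addr_hi directly and relies
on addr_lo following it in memory; the sketch copies both words into a
local array first so it stays strictly portable.

```c
#include <assert.h>
#include <stdint.h>

/* Same field order as the struct added to ppc-pci.h in this patch. */
struct ddw_create_response {
	uint32_t liobn;
	uint32_t addr_hi;
	uint32_t addr_lo;
};

/*
 * Stand-in for of_read_number(): combine 1 or 2 cells, most
 * significant cell first, into one 64-bit value. Cells are taken
 * as host-endian here, which is an assumption of this sketch.
 */
static uint64_t read_cells(const uint32_t *cell, int size)
{
	uint64_t r = 0;

	assert(size >= 1 && size <= 2);
	while (size--)
		r = (r << 32) | *cell++;
	return r;
}

/* Hypothetical helper: 64-bit DMA base of a freshly created window. */
static uint64_t ddw_dma_base(const struct ddw_create_response *create)
{
	uint32_t cells[2] = { create->addr_hi, create->addr_lo };

	return read_cells(cells, 2);
}
```

The result is the same as the open-coded
((u64)create[1] << 32) + (u64)create[2] that this patch replaces.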

diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index 43268f1..ab37004 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -47,6 +47,20 @@ extern int rtas_setup_phb(struct pci_controller *phb);
 
 extern unsigned long pci_probe_only;
 
+/* Dynamic DMA Window support */
+struct ddw_query_response {
+	u32 windows_available;
+	u32 largest_available_block;
+	u32 page_size;
+	u32 migration_capable;
+};
+
+struct ddw_create_response {
+	u32 liobn;
+	u32 addr_hi;
+	u32 addr_lo;
+};
+
 /* ---- EEH internal-use-only related routines ---- */
 #ifdef CONFIG_EEH
 
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 4ba2338..28cf227 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -323,7 +323,7 @@ static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
 		dma_offset = next + be64_to_cpu(maprange->dma_base);
 
 		rc = plpar_tce_stuff((u64)be32_to_cpu(maprange->liobn),
-					    (u64)dma_offset,
+					     dma_offset,
 					     0, limit);
 		num_tce -= limit;
 	} while (num_tce > 0 && !rc);
@@ -383,7 +383,7 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
 		}
 
 		rc = plpar_tce_put_indirect(liobn,
-					    (u64)dma_offset,
+					    dma_offset,
 					    (u64)virt_to_abs(tcep),
 					    limit);
 
@@ -731,7 +731,8 @@ static u64 dupe_ddw_if_kexec(struct pci_dev *dev, struct device_node *pdn)
 	return dma_addr;
 }
 
-static int query_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *query)
+static int query_ddw(struct pci_dev *dev, const u32 *ddr_avail,
+			struct ddw_query_response *query)
 {
 	struct device_node *dn;
 	struct pci_dn *pcidn;
@@ -751,7 +752,7 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *query)
 	if (pcidn->eeh_pe_config_addr)
 		cfg_addr = pcidn->eeh_pe_config_addr;
 	buid = pcidn->phb->buid;
-	ret = rtas_call(ddr_avail[0], 3, 5, query,
+	ret = rtas_call(ddr_avail[0], 3, 5, (u32 *)query,
 		  cfg_addr, BUID_HI(buid), BUID_LO(buid));
 	dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
 		" returned %d\n", ddr_avail[0], cfg_addr, BUID_HI(buid),
@@ -759,7 +760,9 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *query)
 	return ret;
 }
 
-static int create_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *create, int page_shift, int window_shift)
+static int create_ddw(struct pci_dev *dev, const u32 *ddr_avail,
+			struct ddw_create_response *create, int page_shift,
+			int window_shift)
 {
 	struct device_node *dn;
 	struct pci_dn *pcidn;
@@ -782,15 +785,15 @@ static int create_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *create, in
 
 	do {
 		/* extra outputs are LIOBN and dma-addr (hi, lo) */
-		ret = rtas_call(ddr_avail[1], 5, 4, &create[0], cfg_addr,
+		ret = rtas_call(ddr_avail[1], 5, 4, (u32 *)create, cfg_addr,
 				BUID_HI(buid), BUID_LO(buid), page_shift, window_shift);
-	} while(rtas_busy_delay(ret));
+	} while (rtas_busy_delay(ret));
 	dev_info(&dev->dev,
 		"ibm,create-pe-dma-window(%x) %x %x %x %x %x returned %d "
 		"(liobn = 0x%x starting addr = %x %x)\n", ddr_avail[1],
 		 cfg_addr, BUID_HI(buid), BUID_LO(buid), page_shift,
-		 window_shift, ret, create[0], create[1], create[2]);
-	
+		 window_shift, ret, create->liobn, create->addr_hi, create->addr_lo);
+
 	return ret;
 }
 
@@ -808,7 +811,8 @@ static int create_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *create, in
 static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 {
 	int len, ret;
-	u32 query[4], create[3];
+	struct ddw_query_response query;
+	struct ddw_create_response create;
 	int page_shift;
 	u64 dma_addr, max_addr;
 	struct device_node *dn;
@@ -846,11 +850,11 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	 * of page sizes: supported and supported for migrate-dma.
 	 */
 	dn = pci_device_to_OF_node(dev);
-	ret = query_ddw(dev, ddr_avail, &query[0]);
+	ret = query_ddw(dev, ddr_avail, &query);
 	if (ret != 0)
 		goto out_unlock;
 
-	if (!query[0]) {
+	if (query.windows_available == 0) {
 		/*
 		 * no additional windows are available for this device.
 		 * We might be able to reallocate the existing window,
@@ -859,23 +863,23 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 		dev_dbg(&dev->dev, "no free dynamic windows");
 		goto out_unlock;
 	}
-	if (query[2] & 4) {
+	if (query.page_size & 4) {
 		page_shift = 24; /* 16MB */
-	} else if (query[2] & 2) {
+	} else if (query.page_size & 2) {
 		page_shift = 16; /* 64kB */
-	} else if (query[2] & 1) {
+	} else if (query.page_size & 1) {
 		page_shift = 12; /* 4kB */
 	} else {
 		dev_dbg(&dev->dev, "no supported direct page size in mask %x",
-			  query[2]);
+			  query.page_size);
 		goto out_unlock;
 	}
 	/* verify the window * number of ptes will map the partition */
 	/* check largest block * page size > max memory hotplug addr */
 	max_addr = memory_hotplug_max();
-	if (query[1] < (max_addr >> page_shift)) {
+	if (query.largest_available_block < (max_addr >> page_shift)) {
 		dev_dbg(&dev->dev, "can't map partiton max 0x%llx with %u "
-			  "%llu-sized pages\n", max_addr,  query[1],
+			  "%llu-sized pages\n", max_addr,  query.largest_available_block,
 			  1ULL << page_shift);
 		goto out_unlock;
 	}
@@ -894,19 +898,17 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 		goto out_free_prop;
 	}
 
-	ret = create_ddw(dev, ddr_avail, &create[0], page_shift, len);
+	ret = create_ddw(dev, ddr_avail, &create, page_shift, len);
 	if (ret != 0)
 		goto out_free_prop;
 
-	*ddwprop = (struct dynamic_dma_window_prop) {
-		.liobn = cpu_to_be32(create[0]),
-		.dma_base = cpu_to_be64(((u64)create[1] << 32) + (u64)create[2]),
-		.tce_shift = cpu_to_be32(page_shift),
-		.window_shift = cpu_to_be32(len)
-	};
+	ddwprop->liobn = cpu_to_be32(create.liobn);
+	ddwprop->dma_base = cpu_to_be64(of_read_number(&create.addr_hi, 2));
+	ddwprop->tce_shift = cpu_to_be32(page_shift);
+	ddwprop->window_shift = cpu_to_be32(len);
 
 	dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %s\n",
-		  create[0], dn->full_name);
+		  create.liobn, dn->full_name);
 
 	window = kzalloc(sizeof(*window), GFP_KERNEL);
 	if (!window)
@@ -933,7 +935,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	list_add(&window->list, &direct_window_list);
 	spin_unlock(&direct_window_list_lock);
 
-	dma_addr = of_read_number(&create[1], 2);
+	dma_addr = of_read_number(&create.addr_hi, 2);
 	set_dma_offset(&dev->dev, dma_addr);
 	goto out_unlock;
 
@@ -1015,7 +1017,7 @@ static int dma_set_mask_pSeriesLP(struct device *dev, u64 dma_mask)
 		dn = pci_device_to_OF_node(pdev);
 		dev_dbg(dev, "node is %s\n", dn->full_name);
 
-		/* 
+		/*
 		 * the device tree might contain the dma-window properties
 		 * per-device and not neccesarily for the bus. So we need to
 		 * search upwards in the tree until we either hit a dma-window
@@ -1118,7 +1120,15 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
 		}
 		spin_unlock(&direct_window_list_lock);
 
-		remove_ddw(np);
+		/*
+		 * Because the notifier runs after isolation of the
+		 * slot, we are guaranteed any DMA window has already
+		 * been revoked and the TCEs have been marked invalid,
+		 * so we don't need a call to remove_ddw(np). However,
+		 * if an additional notifier action is added before the
+		 * isolate call, we should update this code for
+		 * completeness with such a call.
+		 */
 		break;
 	default:
 		err = NOTIFY_DONE;

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
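
As an illustration of the "symbolic constants" cleanup in the patch above, here is a minimal user-space sketch of overlaying a struct on a run of 32-bit output cells, and of joining the hi/lo cells into one 64-bit address the way the removed shift-and-add expression did. The struct layout is copied from the patch; fake_create_window(), ddw_window_base() and the cell values are hypothetical stand-ins for illustration only, not kernel or firmware interfaces.

```c
#include <stdint.h>

/* Field layout mirrors struct ddw_create_response from the patch. */
struct ddw_create_response {
	uint32_t liobn;
	uint32_t addr_hi;
	uint32_t addr_lo;
};

/* The (u32 *) cast is only sound if the struct has no padding. */
_Static_assert(sizeof(struct ddw_create_response) == 3 * sizeof(uint32_t),
	       "response struct must be exactly three 32-bit cells");

/*
 * Hypothetical stand-in for a call that fills three 32-bit output
 * cells. The values are made up for the example.
 */
static void fake_create_window(uint32_t *outputs)
{
	outputs[0] = 0x80000001u;	/* LIOBN */
	outputs[1] = 0x08000000u;	/* DMA address, high word */
	outputs[2] = 0x00000000u;	/* DMA address, low word */
}

/* Join the two address cells into one 64-bit value, high cell first. */
static uint64_t ddw_window_base(const struct ddw_create_response *r)
{
	return ((uint64_t)r->addr_hi << 32) | r->addr_lo;
}
```

A caller would declare a struct ddw_create_response, pass it as (uint32_t *)&resp to the filling routine, and then read resp.liobn or ddw_window_base(&resp) rather than indexing create[0], create[1] and create[2] by number.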


* [PATCH v3] ppc: update dynamic dma support
  2011-01-17 17:32         ` [PATCH v2] " Nishanth Aravamudan
@ 2011-01-18  0:20           ` Nishanth Aravamudan
  0 siblings, 0 replies; 21+ messages in thread
From: Nishanth Aravamudan @ 2011-01-18  0:20 UTC (permalink / raw)
  To: sonnyrao, miltonm, Benjamin Herrenschmidt, Paul Mackerras,
	Grant Likely, Anton Blanchard, linuxppc-dev

On 17.01.2011 [09:32:10 -0800], Nishanth Aravamudan wrote:
> On 07.01.2011 [18:53:34 -0800], Nishanth Aravamudan wrote:
> > On 10.12.2010 [16:07:44 -0800], Nishanth Aravamudan wrote:
> > > On 09.12.2010 [11:09:20 -0800], Nishanth Aravamudan wrote:
> > > > On 26.10.2010 [20:35:17 -0700], Nishanth Aravamudan wrote:
> > > > > If firmware allows us to map all of a partition's memory for DMA on a
> > > > > particular bridge, create a 1:1 mapping of that memory. Add hooks for
> > > > > dealing with hotplug events. Dynamic DMA windows can use page sizes
> > > > > larger than the default, and we use the largest one possible.
> > > > > 
> > > > > Not-yet-signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> > > > > 
> > > > > ---
> > > > > 
> > > > > I've tested this briefly on a machine with suitable firmware/hardware.
> > > > > Things seem to work well, but I want to do more exhaustive I/O testing
> > > > > before asking for upstream merging. I would really appreciate any
> > > > > feedback on the updated approach.
> > > > > 
> > > > > Specific questions:
> > > > > 
> > > > > Ben, did I hook into the dma_set_mask() platform callback as you
> > > > > expected? Anything I can do better or which perhaps might lead to
> > > > > gotchas later?
> > > > > 
> > > > > I've added a disable_ddw option, but perhaps it would be better to
> > > > > just disable the feature if iommu=force?
> > > > 
> > > > So for the final version, I probably should document this option in
> > > > kernel-parameters.txt w/ the patch, right?
> > > 
> > > Here's an updated version. Ben, think you can pick this up to your tree?
> > 
> > Hi Ben,
> > 
> > I have a small follow-on patch that tidies up the code a bit and deals
> > with an error condition on dlpar remove of ddw slots. I'm putting it
> > below as a follow-on patch, but I can roll it into the v3 patch and post
> > a v4 if you'd prefer?
> 
> Sorry, found a few more cleanups (spaces instead of tabs, etc.).

Sigh, this is just embarrassing. Milton pointed out that there is no
reason to clutter asm/ppc-pci.h with RTAS-specific declarations that
only apply to DDW. So I have moved them into iommu.c in this version.

Thanks,
Nish

pseries: ddw cleanups
    
Use symbolic constants to access RTAS responses.
    
Disable reconfig notifier's clearing of TCEs and removal of DMA window.
This is handled by firmware currently. If the kernel were to do it, we'd
need a new callback action before the isolation of the slot in question,
or else we'd always get permission errors (firmware revokes the window
automatically).
    
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 4ba2338..e4050f6 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -285,6 +285,21 @@ struct direct_window {
 	const struct dynamic_dma_window_prop *prop;
 	struct list_head list;
 };
+
+/* Dynamic DMA Window support */
+struct ddw_query_response {
+	u32 windows_available;
+	u32 largest_available_block;
+	u32 page_size;
+	u32 migration_capable;
+};
+
+struct ddw_create_response {
+	u32 liobn;
+	u32 addr_hi;
+	u32 addr_lo;
+};
+
 static LIST_HEAD(direct_window_list);
 /* prevents races between memory on/offline and window creation */
 static DEFINE_SPINLOCK(direct_window_list_lock);
@@ -323,7 +338,7 @@ static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
 		dma_offset = next + be64_to_cpu(maprange->dma_base);
 
 		rc = plpar_tce_stuff((u64)be32_to_cpu(maprange->liobn),
-					    (u64)dma_offset,
+					     dma_offset,
 					     0, limit);
 		num_tce -= limit;
 	} while (num_tce > 0 && !rc);
@@ -383,7 +398,7 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
 		}
 
 		rc = plpar_tce_put_indirect(liobn,
-					    (u64)dma_offset,
+					    dma_offset,
 					    (u64)virt_to_abs(tcep),
 					    limit);
 
@@ -731,7 +746,8 @@ static u64 dupe_ddw_if_kexec(struct pci_dev *dev, struct device_node *pdn)
 	return dma_addr;
 }
 
-static int query_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *query)
+static int query_ddw(struct pci_dev *dev, const u32 *ddr_avail,
+			struct ddw_query_response *query)
 {
 	struct device_node *dn;
 	struct pci_dn *pcidn;
@@ -751,7 +767,7 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *query)
 	if (pcidn->eeh_pe_config_addr)
 		cfg_addr = pcidn->eeh_pe_config_addr;
 	buid = pcidn->phb->buid;
-	ret = rtas_call(ddr_avail[0], 3, 5, query,
+	ret = rtas_call(ddr_avail[0], 3, 5, (u32 *)query,
 		  cfg_addr, BUID_HI(buid), BUID_LO(buid));
 	dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
 		" returned %d\n", ddr_avail[0], cfg_addr, BUID_HI(buid),
@@ -759,7 +775,9 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *query)
 	return ret;
 }
 
-static int create_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *create, int page_shift, int window_shift)
+static int create_ddw(struct pci_dev *dev, const u32 *ddr_avail,
+			struct ddw_create_response *create, int page_shift,
+			int window_shift)
 {
 	struct device_node *dn;
 	struct pci_dn *pcidn;
@@ -782,15 +800,15 @@ static int create_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *create, in
 
 	do {
 		/* extra outputs are LIOBN and dma-addr (hi, lo) */
-		ret = rtas_call(ddr_avail[1], 5, 4, &create[0], cfg_addr,
+		ret = rtas_call(ddr_avail[1], 5, 4, (u32 *)create, cfg_addr,
 				BUID_HI(buid), BUID_LO(buid), page_shift, window_shift);
-	} while(rtas_busy_delay(ret));
+	} while (rtas_busy_delay(ret));
 	dev_info(&dev->dev,
 		"ibm,create-pe-dma-window(%x) %x %x %x %x %x returned %d "
 		"(liobn = 0x%x starting addr = %x %x)\n", ddr_avail[1],
 		 cfg_addr, BUID_HI(buid), BUID_LO(buid), page_shift,
-		 window_shift, ret, create[0], create[1], create[2]);
-	
+		 window_shift, ret, create->liobn, create->addr_hi, create->addr_lo);
+
 	return ret;
 }
 
@@ -808,7 +826,8 @@ static int create_ddw(struct pci_dev *dev, const u32 *ddr_avail, u32 *create, in
 static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 {
 	int len, ret;
-	u32 query[4], create[3];
+	struct ddw_query_response query;
+	struct ddw_create_response create;
 	int page_shift;
 	u64 dma_addr, max_addr;
 	struct device_node *dn;
@@ -846,11 +865,11 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	 * of page sizes: supported and supported for migrate-dma.
 	 */
 	dn = pci_device_to_OF_node(dev);
-	ret = query_ddw(dev, ddr_avail, &query[0]);
+	ret = query_ddw(dev, ddr_avail, &query);
 	if (ret != 0)
 		goto out_unlock;
 
-	if (!query[0]) {
+	if (query.windows_available == 0) {
 		/*
 		 * no additional windows are available for this device.
 		 * We might be able to reallocate the existing window,
@@ -859,23 +878,23 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 		dev_dbg(&dev->dev, "no free dynamic windows");
 		goto out_unlock;
 	}
-	if (query[2] & 4) {
+	if (query.page_size & 4) {
 		page_shift = 24; /* 16MB */
-	} else if (query[2] & 2) {
+	} else if (query.page_size & 2) {
 		page_shift = 16; /* 64kB */
-	} else if (query[2] & 1) {
+	} else if (query.page_size & 1) {
 		page_shift = 12; /* 4kB */
 	} else {
 		dev_dbg(&dev->dev, "no supported direct page size in mask %x",
-			  query[2]);
+			  query.page_size);
 		goto out_unlock;
 	}
 	/* verify the window * number of ptes will map the partition */
 	/* check largest block * page size > max memory hotplug addr */
 	max_addr = memory_hotplug_max();
-	if (query[1] < (max_addr >> page_shift)) {
+	if (query.largest_available_block < (max_addr >> page_shift)) {
 		dev_dbg(&dev->dev, "can't map partiton max 0x%llx with %u "
-			  "%llu-sized pages\n", max_addr,  query[1],
+			  "%llu-sized pages\n", max_addr,  query.largest_available_block,
 			  1ULL << page_shift);
 		goto out_unlock;
 	}
@@ -894,19 +913,17 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 		goto out_free_prop;
 	}
 
-	ret = create_ddw(dev, ddr_avail, &create[0], page_shift, len);
+	ret = create_ddw(dev, ddr_avail, &create, page_shift, len);
 	if (ret != 0)
 		goto out_free_prop;
 
-	*ddwprop = (struct dynamic_dma_window_prop) {
-		.liobn = cpu_to_be32(create[0]),
-		.dma_base = cpu_to_be64(((u64)create[1] << 32) + (u64)create[2]),
-		.tce_shift = cpu_to_be32(page_shift),
-		.window_shift = cpu_to_be32(len)
-	};
+	ddwprop->liobn = cpu_to_be32(create.liobn);
+	ddwprop->dma_base = cpu_to_be64(of_read_number(&create.addr_hi, 2));
+	ddwprop->tce_shift = cpu_to_be32(page_shift);
+	ddwprop->window_shift = cpu_to_be32(len);
 
 	dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %s\n",
-		  create[0], dn->full_name);
+		  create.liobn, dn->full_name);
 
 	window = kzalloc(sizeof(*window), GFP_KERNEL);
 	if (!window)
@@ -933,7 +950,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	list_add(&window->list, &direct_window_list);
 	spin_unlock(&direct_window_list_lock);
 
-	dma_addr = of_read_number(&create[1], 2);
+	dma_addr = of_read_number(&create.addr_hi, 2);
 	set_dma_offset(&dev->dev, dma_addr);
 	goto out_unlock;
 
@@ -1015,7 +1032,7 @@ static int dma_set_mask_pSeriesLP(struct device *dev, u64 dma_mask)
 		dn = pci_device_to_OF_node(pdev);
 		dev_dbg(dev, "node is %s\n", dn->full_name);
 
-		/* 
+		/*
 		 * the device tree might contain the dma-window properties
 		 * per-device and not neccesarily for the bus. So we need to
 		 * search upwards in the tree until we either hit a dma-window
@@ -1118,7 +1135,15 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
 		}
 		spin_unlock(&direct_window_list_lock);
 
-		remove_ddw(np);
+		/*
+		 * Because the notifier runs after isolation of the
+		 * slot, we are guaranteed any DMA window has already
+		 * been revoked and the TCEs have been marked invalid,
+		 * so we don't need a call to remove_ddw(np). However,
+		 * if an additional notifier action is added before the
+		 * isolate call, we should update this code for
+		 * completeness with such a call.
+		 */
 		break;
 	default:
 		err = NOTIFY_DONE;

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
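
For readers following the enable_ddw() hunks above, the page-size choice and the coverage check can be sketched in isolation as below. This is a user-space approximation under stated assumptions: the bit meanings (1 = 4kB, 2 = 64kB, 4 = 16MB) and the comparison are taken from the diff, while the helper names ddw_pick_page_shift() and ddw_block_covers() are hypothetical and do not exist in the kernel.

```c
#include <stdint.h>

/*
 * Pick the largest page size advertised in the query response mask.
 * Returns the page shift, or -1 if no supported size bit is set.
 */
static int ddw_pick_page_shift(uint32_t page_size_mask)
{
	if (page_size_mask & 4)
		return 24;	/* 16MB */
	if (page_size_mask & 2)
		return 16;	/* 64kB */
	if (page_size_mask & 1)
		return 12;	/* 4kB */
	return -1;
}

/*
 * Nonzero when largest_block pages of (1 << page_shift) bytes are
 * enough to map everything below max_addr. This is the inverse of
 * the failure test in the patch, which bails out when
 * largest_available_block < (max_addr >> page_shift).
 */
static int ddw_block_covers(uint32_t largest_block, uint64_t max_addr,
			    int page_shift)
{
	return largest_block >= (max_addr >> page_shift);
}
```

With 16MB pages, a partition whose maximum hotplug address is 64GB needs 4096 entries, so a largest available block of 4096 passes the check and 4095 does not.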


Thread overview: 21+ messages
2010-10-27  3:35 [RFC PATCH 0/7 v2] ppc: enable dynamic dma window support Nishanth Aravamudan
2010-10-27  3:35 ` [RFC PATCH 1/7 v2] macio: ensure all dma routines get copied over Nishanth Aravamudan
2010-10-27  3:35 ` [RFC PATCH 2/7 v2] ppc: add memory_hotplug_max Nishanth Aravamudan
2010-10-27  3:35 ` [RFC PATCH 3/7 v2] ppc: do not search for dma-window property on dlpar remove Nishanth Aravamudan
2010-11-29  1:38   ` Benjamin Herrenschmidt
2010-12-01  0:30     ` Nishanth Aravamudan
2010-12-04  0:30     ` Nishanth Aravamudan
2010-10-27  3:35 ` [RFC PATCH 4/7 v2] ppc: checking for pdn->parent is redundant Nishanth Aravamudan
2010-10-27  3:35 ` [RFC PATCH 5/7 v2] ppc/iommu: do not need to check for dma_window == NULL Nishanth Aravamudan
2010-10-27  3:35 ` [RFC PATCH 6/7 v2] ppc/iommu: pass phb only to iommu_table_setparms_lpar Nishanth Aravamudan
2010-12-09  4:24   ` Benjamin Herrenschmidt
2010-12-09 16:16     ` Nishanth Aravamudan
2010-10-27  3:35 ` [RFC PATCH 7/7 v2] ppc: add dynamic dma window support Nishanth Aravamudan
2010-12-09  4:17   ` Benjamin Herrenschmidt
2010-12-09 19:00     ` Nishanth Aravamudan
2010-12-09 19:09   ` Nishanth Aravamudan
2010-12-11  0:07     ` [PATCH 7/7 v3] " Nishanth Aravamudan
2011-01-08  2:53       ` [PATCH] ppc: update dynamic dma support Nishanth Aravamudan
2011-01-17 17:32         ` [PATCH v2] " Nishanth Aravamudan
2011-01-18  0:20           ` [PATCH v3] " Nishanth Aravamudan
2010-11-08 19:42 ` [RFC PATCH 0/7 v2] ppc: enable dynamic dma window support Nishanth Aravamudan
