linuxppc-dev.lists.ozlabs.org archive mirror
* [RFC PATCH 00/11] ppc: enable dynamic dma window support
@ 2010-10-08 17:33 Nishanth Aravamudan
  2010-10-08 17:33 ` [RFC PATCH 01/11] macio: ensure all dma routines get copied over Nishanth Aravamudan
                   ` (10 more replies)
  0 siblings, 11 replies; 17+ messages in thread
From: Nishanth Aravamudan @ 2010-10-08 17:33 UTC (permalink / raw)
  To: nacc; +Cc: linuxppc-dev, devicetree-discuss, linux-kernel, miltonm

Hi,

The following series, which builds upon the series of cleanups I posted
on 9/15 as "ppc iommu cleanups", enables the pseries firmware feature
dynamic dma windows. This feature will allow future devices to have a
64-bit DMA mapping covering all memory, coexisting with a smaller IOMMU
window in 32-bit PCI space.

Comments requested and welcome!

Nishanth Aravamudan (11):
  macio: ensure all dma routines get copied over
  ppc: allow direct and iommu to coexist
  ppc: Create ops to choose between direct window and iommu based on
    device mask
  ppc: add memory_hotplug_max
  ppc: do not search for dma-window property on dlpar remove
  ppc: checking for pdn->parent is redundant
  ppc/iommu: do not need to check for dma_window == NULL
  ppc/iommu: remove unneeded pci_dma_bus_setup_pSeriesLP
  ppc/iommu: pass phb only to iommu_table_setparms_lpar
  ppc/iommu: add routines to pseries iommu to map tces 1-1
  ppc: add dynamic dma window support

 arch/powerpc/include/asm/device.h      |   20 +-
 arch/powerpc/include/asm/dma-mapping.h |    6 +-
 arch/powerpc/include/asm/iommu.h       |    6 +-
 arch/powerpc/include/asm/mmzone.h      |    5 +
 arch/powerpc/kernel/Makefile           |    2 +-
 arch/powerpc/kernel/dma-choose64.c     |  167 +++++++++++
 arch/powerpc/mm/numa.c                 |   26 ++
 arch/powerpc/platforms/pseries/iommu.c |  475 +++++++++++++++++++++++++++----
 drivers/macintosh/macio_asic.c         |    7 +-
 9 files changed, 635 insertions(+), 79 deletions(-)
 create mode 100644 arch/powerpc/kernel/dma-choose64.c


* [RFC PATCH 01/11] macio: ensure all dma routines get copied over
  2010-10-08 17:33 [RFC PATCH 00/11] ppc: enable dynamic dma window support Nishanth Aravamudan
@ 2010-10-08 17:33 ` Nishanth Aravamudan
  2010-10-08 17:33 ` [RFC PATCH 02/11] ppc: allow direct and iommu to coexist Nishanth Aravamudan
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Nishanth Aravamudan @ 2010-10-08 17:33 UTC (permalink / raw)
  To: nacc; +Cc: linux-kernel, miltonm, Andreas Schwab, Paul Mackerras, linuxppc-dev

Also add a comment to dev_archdata, indicating that changes there need
to be verified against the driver code.
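
As a rough illustration of why a whole-struct copy is more robust than copying fields one at a time, here is a small userspace sketch. The type `archdata_sketch` and its member `new_field` are hypothetical stand-ins and are not the kernel's `struct dev_archdata`.

```c
/* Hypothetical stand-in for struct dev_archdata; not the kernel definition. */
struct archdata_sketch {
	const void *dma_ops;
	void *dma_data;
	unsigned long new_field;	/* imagine this member was added later */
};

/* Field-by-field copy: silently misses any member added afterwards. */
static void copy_by_field(struct archdata_sketch *dst,
			  const struct archdata_sketch *src)
{
	dst->dma_ops = src->dma_ops;
	dst->dma_data = src->dma_data;
}

/* Struct assignment: copies every member, including ones added later. */
static void copy_whole(struct archdata_sketch *dst,
		       const struct archdata_sketch *src)
{
	*dst = *src;
}
```

With the field-by-field variant, `new_field` stays at whatever the destination held before; with the assignment it follows the source.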

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/include/asm/device.h |    6 ++++++
 drivers/macintosh/macio_asic.c    |    7 +++----
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index a3954e4..16d25c0 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -9,6 +9,12 @@
 struct dma_map_ops;
 struct device_node;
 
+/*
+ * Arch extensions to struct device.
+ *
+ * When adding fields, consider macio_add_one_device in
+ * drivers/macintosh/macio_asic.c
+ */
 struct dev_archdata {
 	/* DMA operations on that device */
 	struct dma_map_ops	*dma_ops;
diff --git a/drivers/macintosh/macio_asic.c b/drivers/macintosh/macio_asic.c
index b6e7ddc..18bf7a9 100644
--- a/drivers/macintosh/macio_asic.c
+++ b/drivers/macintosh/macio_asic.c
@@ -387,11 +387,10 @@ static struct macio_dev * macio_add_one_device(struct macio_chip *chip,
 	/* Set the DMA ops to the ones from the PCI device, this could be
 	 * fishy if we didn't know that on PowerMac it's always direct ops
 	 * or iommu ops that will work fine
+         *
+         * To get all the fields, copy all archdata
 	 */
-	dev->ofdev.dev.archdata.dma_ops =
-		chip->lbus.pdev->dev.archdata.dma_ops;
-	dev->ofdev.dev.archdata.dma_data =
-		chip->lbus.pdev->dev.archdata.dma_data;
+        dev->ofdev.dev.archdata = chip->lbus.pdev->dev.archdata;
 #endif /* CONFIG_PCI */
 
 #ifdef DEBUG
-- 
1.7.1


* [RFC PATCH 02/11] ppc: allow direct and iommu to coexist
  2010-10-08 17:33 [RFC PATCH 00/11] ppc: enable dynamic dma window support Nishanth Aravamudan
  2010-10-08 17:33 ` [RFC PATCH 01/11] macio: ensure all dma routines get copied over Nishanth Aravamudan
@ 2010-10-08 17:33 ` Nishanth Aravamudan
  2010-10-08 23:38   ` Benjamin Herrenschmidt
  2010-10-08 17:33 ` [RFC PATCH 03/11] ppc: Create ops to choose between direct window and iommu based on device mask Nishanth Aravamudan
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Nishanth Aravamudan @ 2010-10-08 17:33 UTC (permalink / raw)
  To: nacc
  Cc: FUJITA Tomonori, linux-kernel, miltonm, Paul Mackerras,
	Andrew Morton, linuxppc-dev

Replace the union with just the multiple fields, ifdef on CONFIG_PPC64.

Future pseries boxes will allow a 64 bit dma mapping covering all
memory, coexisting with a smaller iommu window in 32 bit pci space.

The cell fixed mapping would also like both to coexist.
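
To make the difference concrete, here is a minimal userspace sketch of the two layouts. The type names (`dma_addr_sketch_t`, `archdata_union`, `archdata_split`) are hypothetical and only mirror the shape of the fields touched by this patch.

```c
#include <stdint.h>

typedef uint64_t dma_addr_sketch_t;	/* hypothetical stand-in for dma_addr_t */

/* Old layout: both members share storage, so only one is valid at a time. */
struct archdata_union {
	union {
		dma_addr_sketch_t dma_offset;
		void *iommu_table_base;
	} dma_data;
};

/* New layout: independent members, so a device can carry both at once. */
struct archdata_split {
	dma_addr_sketch_t dma_offset;
	void *iommu_table_base;
};
```

In the union layout, storing an iommu table pointer overwrites the direct offset; in the split layout a device can keep a direct offset for its 64-bit window and a table pointer for its 32-bit window.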

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
I used the ifdef guard of CONFIG_PPC64 according to the current makefile
for iommu.c.  One set is buried in the middle of iommu.h.
---
 arch/powerpc/include/asm/device.h      |   14 ++++++--------
 arch/powerpc/include/asm/dma-mapping.h |    4 ++--
 arch/powerpc/include/asm/iommu.h       |    6 ++++--
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 16d25c0..ed883ea 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -19,14 +19,12 @@ struct dev_archdata {
 	/* DMA operations on that device */
 	struct dma_map_ops	*dma_ops;
 
-	/*
-	 * When an iommu is in use, dma_data is used as a ptr to the base of the
-	 * iommu_table.  Otherwise, it is a simple numerical offset.
-	 */
-	union {
-		dma_addr_t	dma_offset;
-		void		*iommu_table_base;
-	} dma_data;
+	/* dma_offset is used by swiotlb and direct dma ops, but no iommu */
+	dma_addr_t	dma_offset;
+
+#ifdef CONFIG_PPC64
+	void		*iommu_table_base;
+#endif
 
 #ifdef CONFIG_SWIOTLB
 	dma_addr_t		max_direct_dma_addr;
diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index 8c9c6ad..644103a 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -100,7 +100,7 @@ static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops)
 static inline dma_addr_t get_dma_offset(struct device *dev)
 {
 	if (dev)
-		return dev->archdata.dma_data.dma_offset;
+		return dev->archdata.dma_offset;
 
 	return PCI_DRAM_OFFSET;
 }
@@ -108,7 +108,7 @@ static inline dma_addr_t get_dma_offset(struct device *dev)
 static inline void set_dma_offset(struct device *dev, dma_addr_t off)
 {
 	if (dev)
-		dev->archdata.dma_data.dma_offset = off;
+		dev->archdata.dma_offset = off;
 }
 
 /* this will be removed soon */
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index edfc980..0f605a4 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -70,15 +70,17 @@ struct iommu_table {
 
 struct scatterlist;
 
+#ifdef CONFIG_PPC64
 static inline void set_iommu_table_base(struct device *dev, void *base)
 {
-	dev->archdata.dma_data.iommu_table_base = base;
+	dev->archdata.iommu_table_base = base;
 }
 
 static inline void *get_iommu_table_base(struct device *dev)
 {
-	return dev->archdata.dma_data.iommu_table_base;
+	return dev->archdata.iommu_table_base;
 }
+#endif
 
 /* Frees table for an individual device node */
 extern void iommu_free_table(struct iommu_table *tbl, const char *node_name);
-- 
1.7.1


* [RFC PATCH 03/11] ppc: Create ops to choose between direct window and iommu based on device mask
  2010-10-08 17:33 [RFC PATCH 00/11] ppc: enable dynamic dma window support Nishanth Aravamudan
  2010-10-08 17:33 ` [RFC PATCH 01/11] macio: ensure all dma routines get copied over Nishanth Aravamudan
  2010-10-08 17:33 ` [RFC PATCH 02/11] ppc: allow direct and iommu to coexist Nishanth Aravamudan
@ 2010-10-08 17:33 ` Nishanth Aravamudan
  2010-10-08 23:43   ` Benjamin Herrenschmidt
  2010-10-08 23:44   ` Benjamin Herrenschmidt
  2010-10-08 17:33 ` [RFC PATCH 04/11] ppc: add memory_hotplug_max Nishanth Aravamudan
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 17+ messages in thread
From: Nishanth Aravamudan @ 2010-10-08 17:33 UTC (permalink / raw)
  To: nacc
  Cc: Anton Vorontsov, miltonm, linux-kernel, FUJITA Tomonori,
	Paul Mackerras, Scott Wood, Andrew Morton, linuxppc-dev

Also allow the coherent ops to be iommu if only the coherent mask is too
small, mostly for drivers that do not set the coherent mask but also
don't use the coherent api.
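
The selection described above can be sketched in userspace as follows. This is only an illustration: `direct_supported()` is an assumed stand-in for the direct ops' `dma_supported` check (here simply "the mask covers the highest address"), and the enum names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum ops_choice { OPS_DIRECT, OPS_IOMMU_COHERENT, OPS_IOMMU };

/* Assumption: the direct window is usable iff the mask covers max_addr. */
static bool direct_supported(uint64_t mask, uint64_t max_addr)
{
	return mask >= max_addr;
}

/*
 * Pick direct ops when both masks are wide enough, direct streaming ops
 * with iommu coherent allocations when only the coherent mask is too
 * small, and plain iommu ops otherwise.
 */
static enum ops_choice choose_sketch(uint64_t dma_mask, uint64_t coherent_mask,
				     uint64_t max_addr)
{
	if (direct_supported(dma_mask, max_addr)) {
		if (direct_supported(coherent_mask, max_addr))
			return OPS_DIRECT;
		return OPS_IOMMU_COHERENT;
	}
	return OPS_IOMMU;
}
```

A driver that sets a 64-bit streaming mask but leaves the coherent mask at 32 bits would land in the middle case.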

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/include/asm/dma-mapping.h |    2 +
 arch/powerpc/kernel/Makefile           |    2 +-
 arch/powerpc/kernel/dma-choose64.c     |  167 ++++++++++++++++++++++++++++++++
 3 files changed, 170 insertions(+), 1 deletions(-)
 create mode 100644 arch/powerpc/kernel/dma-choose64.c

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index 644103a..9ffb16a 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -68,6 +68,8 @@ static inline unsigned long device_to_mask(struct device *dev)
  */
 #ifdef CONFIG_PPC64
 extern struct dma_map_ops dma_iommu_ops;
+extern struct dma_map_ops dma_choose64_ops;
+extern struct dma_map_ops dma_iommu_coherent_ops;
 #endif
 extern struct dma_map_ops dma_direct_ops;
 
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 1dda701..21b8ea1 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -82,7 +82,7 @@ obj-y				+= time.o prom.o traps.o setup-common.o \
 				   udbg.o misc.o io.o dma.o \
 				   misc_$(CONFIG_WORD_SIZE).o
 obj-$(CONFIG_PPC32)		+= entry_32.o setup_32.o
-obj-$(CONFIG_PPC64)		+= dma-iommu.o iommu.o
+obj-$(CONFIG_PPC64)		+= dma-iommu.o iommu.o dma-choose64.o
 obj-$(CONFIG_KGDB)		+= kgdb.o
 obj-$(CONFIG_PPC_OF_BOOT_TRAMPOLINE)	+= prom_init.o
 obj-$(CONFIG_MODULES)		+= ppc_ksyms.o
diff --git a/arch/powerpc/kernel/dma-choose64.c b/arch/powerpc/kernel/dma-choose64.c
new file mode 100644
index 0000000..17c716f
--- /dev/null
+++ b/arch/powerpc/kernel/dma-choose64.c
@@ -0,0 +1,167 @@
+/*
+ * Copyright (C) 2006 Benjamin Herrenschmidt, IBM Corporation
+ *
+ * Provide default implementations of the DMA mapping callbacks for
+ * directly mapped busses.
+ */
+
+#include <linux/device.h>
+#include <linux/dma-mapping.h>
+#include <linux/bug.h>
+
+/*
+ * DMA operations that choose between a 64-bit direct mapping and an iommu
+ *
+ * This set of dma ops chooses between directing to a static 1:1 mapping
+ * that may require a 64 bit address and an iommu based on the declared
+ * streaming and coherent masks for the device.  The choice is made on
+ * the first dma map call.
+ */
+
+/* first BUG ops for calls out of sequence */
+
+void *dma_bug_alloc_coherent(struct device *dev, size_t size,
+				dma_addr_t *dma_handle, gfp_t flag)
+{
+	BUG();
+
+	return NULL;
+}
+
+void dma_bug_free_coherent(struct device *dev, size_t size,
+			      void *vaddr, dma_addr_t dma_handle)
+{
+	BUG();
+}
+
+static int dma_bug_dma_supported(struct device *dev, u64 mask)
+{
+	BUG();
+
+	return 0;
+}
+
+static int dma_bug_map_sg(struct device *dev, struct scatterlist *sgl,
+			     int nents, enum dma_data_direction direction,
+			     struct dma_attrs *attrs)
+{
+	BUG();
+
+	return 0;
+}
+
+
+static void dma_bug_unmap_sg(struct device *dev, struct scatterlist *sg,
+				int nents, enum dma_data_direction direction,
+				struct dma_attrs *attrs)
+{
+	BUG();
+}
+
+static dma_addr_t dma_bug_map_page(struct device *dev,
+					     struct page *page,
+					     unsigned long offset,
+					     size_t size,
+					     enum dma_data_direction dir,
+					     struct dma_attrs *attrs)
+{
+	BUG();
+
+	return DMA_ERROR_CODE;
+}
+
+
+static void dma_bug_unmap_page(struct device *dev,
+					 dma_addr_t dma_address,
+					 size_t size,
+					 enum dma_data_direction direction,
+					 struct dma_attrs *attrs)
+{
+	BUG();
+}
+
+
+static struct dma_map_ops *choose(struct device *dev)
+{
+	if (dma_direct_ops.dma_supported(dev, device_to_mask(dev))) {
+		if (dma_direct_ops.dma_supported(dev, dev->coherent_dma_mask))
+			return &dma_direct_ops;
+		return &dma_iommu_coherent_ops;
+	}
+	return &dma_iommu_ops;
+}
+
+void *dma_choose64_alloc_coherent(struct device *dev, size_t size,
+				dma_addr_t *dma_handle, gfp_t flag)
+{
+	struct dma_map_ops *new = choose(dev);
+
+	set_dma_ops(dev, new);
+	return new->alloc_coherent(dev, size, dma_handle, flag);
+}
+
+static int dma_choose64_map_sg(struct device *dev, struct scatterlist *sgl,
+			     int nents, enum dma_data_direction direction,
+			     struct dma_attrs *attrs)
+{
+	struct dma_map_ops *new = choose(dev);
+
+	set_dma_ops(dev, new);
+	return new->map_sg(dev, sgl, nents, direction, attrs);
+}
+
+
+static int dma_choose64_dma_supported(struct device *dev, u64 mask)
+{
+	return dma_direct_ops.dma_supported(dev, mask) ||
+		dma_iommu_ops.dma_supported(dev, mask);
+}
+
+static dma_addr_t dma_choose64_map_page(struct device *dev,
+					     struct page *page,
+					     unsigned long offset,
+					     size_t size,
+					     enum dma_data_direction dir,
+					     struct dma_attrs *attrs)
+{
+	struct dma_map_ops *new = choose(dev);
+
+	set_dma_ops(dev, new);
+	return new->map_page(dev, page, offset, size, dir, attrs);
+}
+
+struct dma_map_ops dma_choose64_ops = {
+	.alloc_coherent	= dma_choose64_alloc_coherent,
+	.free_coherent	= dma_bug_free_coherent,
+	.map_sg		= dma_choose64_map_sg,
+	.unmap_sg	= dma_bug_unmap_sg,
+	.dma_supported	= dma_choose64_dma_supported,
+	.map_page	= dma_choose64_map_page,
+	.unmap_page	= dma_bug_unmap_page,
+};
+EXPORT_SYMBOL(dma_choose64_ops);
+
+/* set these up to BUG() until we initialize them in the arch initcall below */
+struct dma_map_ops dma_iommu_coherent_ops = {
+	.alloc_coherent	= dma_bug_alloc_coherent,
+	.free_coherent	= dma_bug_free_coherent,
+	.map_sg		= dma_bug_map_sg,
+	.unmap_sg	= dma_bug_unmap_sg,
+	.dma_supported	= dma_bug_dma_supported,
+	.map_page	= dma_bug_map_page,
+	.unmap_page	= dma_bug_unmap_page,
+};
+EXPORT_SYMBOL(dma_iommu_coherent_ops);
+
+static int setup_choose64_ops(void)
+{
+	dma_iommu_coherent_ops = dma_direct_ops;
+	dma_iommu_coherent_ops.alloc_coherent = dma_iommu_ops.alloc_coherent;
+	dma_iommu_coherent_ops.free_coherent = dma_iommu_ops.free_coherent;
+
+	/* should we be stricter? */
+	dma_iommu_coherent_ops.dma_supported = dma_choose64_dma_supported;
+
+	return 0;
+}
+arch_initcall(setup_choose64_ops);
-- 
1.7.1


* [RFC PATCH 04/11] ppc: add memory_hotplug_max
  2010-10-08 17:33 [RFC PATCH 00/11] ppc: enable dynamic dma window support Nishanth Aravamudan
                   ` (2 preceding siblings ...)
  2010-10-08 17:33 ` [RFC PATCH 03/11] ppc: Create ops to choose between direct window and iommu based on device mask Nishanth Aravamudan
@ 2010-10-08 17:33 ` Nishanth Aravamudan
  2010-10-08 17:33 ` [RFC PATCH 05/11] ppc: do not search for dma-window property on dlpar remove Nishanth Aravamudan
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Nishanth Aravamudan @ 2010-10-08 17:33 UTC (permalink / raw)
  To: nacc
  Cc: linux-kernel, miltonm, H Hartley Sweeten, Paul Mackerras,
	Anton Blanchard, David Rientjes, Andrew Morton, linuxppc-dev

Add a function to get the maximum address that can be hotplug added.
This is needed to calculate the size of the tce table needed to cover
all memory in 1:1 mode.
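
As a back-of-the-envelope sketch of that calculation (not the kernel code): the maximum hotplug address is taken as the larger of the LMB size times the LMB count and the current end of DRAM, and the table size follows from dividing by the TCE page size. The function and parameter names below are hypothetical.

```c
#include <stdint.h>

static uint64_t max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/* lmb_size and lmb_count stand in for values read from the device tree. */
static uint64_t hotplug_max_sketch(uint64_t lmb_size, uint32_t lmb_count,
				   uint64_t end_of_dram)
{
	return max_u64(lmb_size * lmb_count, end_of_dram);
}

/* TCEs needed to map [0, max_addr) 1:1 with TCE pages of 1 << tce_shift. */
static uint64_t tces_needed(uint64_t max_addr, unsigned int tce_shift)
{
	uint64_t tce_size = 1ULL << tce_shift;

	return (max_addr + tce_size - 1) >> tce_shift;
}
```

For example, 64 LMBs of 256 MB give a 16 GB ceiling, which needs 4194304 TCEs at a 4 KB TCE page size.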

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
Comments on where to export?
---
 arch/powerpc/include/asm/mmzone.h |    5 +++++
 arch/powerpc/mm/numa.c            |   26 ++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/mmzone.h b/arch/powerpc/include/asm/mmzone.h
index aac87cb..fd3fd58 100644
--- a/arch/powerpc/include/asm/mmzone.h
+++ b/arch/powerpc/include/asm/mmzone.h
@@ -33,6 +33,9 @@ extern int numa_cpu_lookup_table[];
 extern cpumask_var_t node_to_cpumask_map[];
 #ifdef CONFIG_MEMORY_HOTPLUG
 extern unsigned long max_pfn;
+u64 memory_hotplug_max(void);
+#else
+#define memory_hotplug_max() memblock_end_of_DRAM()
 #endif
 
 /*
@@ -42,6 +45,8 @@ extern unsigned long max_pfn;
 #define node_start_pfn(nid)	(NODE_DATA(nid)->node_start_pfn)
 #define node_end_pfn(nid)	(NODE_DATA(nid)->node_end_pfn)
 
+#else
+#define memory_hotplug_max() memblock_end_of_DRAM()
 #endif /* CONFIG_NEED_MULTIPLE_NODES */
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 002878c..f98b0d2 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1246,4 +1246,30 @@ int hot_add_scn_to_nid(unsigned long scn_addr)
 	return nid;
 }
 
+static u64 hot_add_drconf_memory_max(void)
+{
+        struct device_node *memory = NULL;
+        unsigned int drconf_cell_cnt = 0;
+        u64 lmb_size = 0;
+        const u32 *dm = 0;
+
+        memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+        if (memory) {
+                drconf_cell_cnt = of_get_drconf_memory(memory, &dm);
+                lmb_size = of_get_lmb_size(memory);
+                of_node_put(memory);
+        }
+        return lmb_size * drconf_cell_cnt;
+}
+
+/*
+ * memory_hotplug_max - return max address of memory that may be added
+ *
+ * This is currently only used on systems that support drconfig memory
+ * hotplug.
+ */
+u64 memory_hotplug_max(void)
+{
+        return max(hot_add_drconf_memory_max(), memblock_end_of_DRAM());
+}
 #endif /* CONFIG_MEMORY_HOTPLUG */
-- 
1.7.1


* [RFC PATCH 05/11] ppc: do not search for dma-window property on dlpar remove
  2010-10-08 17:33 [RFC PATCH 00/11] ppc: enable dynamic dma window support Nishanth Aravamudan
                   ` (3 preceding siblings ...)
  2010-10-08 17:33 ` [RFC PATCH 04/11] ppc: add memory_hotplug_max Nishanth Aravamudan
@ 2010-10-08 17:33 ` Nishanth Aravamudan
  2010-10-08 17:33 ` [RFC PATCH 06/11] ppc: checking for pdn->parent is redundant Nishanth Aravamudan
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Nishanth Aravamudan @ 2010-10-08 17:33 UTC (permalink / raw)
  To: nacc
  Cc: devicetree-discuss, linux-kernel, miltonm, Paul Mackerras,
	Anton Blanchard, linuxppc-dev

The iommu_table pointer in the pci auxiliary struct of device_node has
not been used by the iommu ops since the dma refactor of
12d04eef927bf61328af2c7cbe756c96f98ac3bf, however this code still uses
it to find tables for dlpar. By only setting the PCI_DN iommu_table
pointer on nodes with dma window properties, we will be able to quickly
find the node for later checks, and can remove the table without looking
for the dma window property on dlpar remove.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 9184db3..8ab32da 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -455,9 +455,6 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
 		ppci->iommu_table = iommu_init_table(tbl, ppci->phb->node);
 		pr_debug("  created table: %p\n", ppci->iommu_table);
 	}
-
-	if (pdn != dn)
-		PCI_DN(dn)->iommu_table = ppci->iommu_table;
 }
 
 
@@ -571,8 +568,7 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
 
 	switch (action) {
 	case PSERIES_RECONFIG_REMOVE:
-		if (pci && pci->iommu_table &&
-		    of_get_property(np, "ibm,dma-window", NULL))
+		if (pci && pci->iommu_table)
 			iommu_free_table(pci->iommu_table, np->full_name);
 		break;
 	default:
-- 
1.7.1


* [RFC PATCH 06/11] ppc: checking for pdn->parent is redundant
  2010-10-08 17:33 [RFC PATCH 00/11] ppc: enable dynamic dma window support Nishanth Aravamudan
                   ` (4 preceding siblings ...)
  2010-10-08 17:33 ` [RFC PATCH 05/11] ppc: do not search for dma-window property on dlpar remove Nishanth Aravamudan
@ 2010-10-08 17:33 ` Nishanth Aravamudan
  2010-10-08 17:33 ` [RFC PATCH 07/11] ppc/iommu: do not need to check for dma_window == NULL Nishanth Aravamudan
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Nishanth Aravamudan @ 2010-10-08 17:33 UTC (permalink / raw)
  To: nacc; +Cc: linux-kernel, miltonm, Paul Mackerras, Anton Blanchard, linuxppc-dev

The device tree root is never a pci bus, and will not have a
PCI_DN(pdn), so the check for PCI_DN added in
650f7b3b2f0ead0673e90452cf3dedde97c537ba makes the check for pdn->parent
redundant and it can be removed.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 8ab32da..0ae5a60 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -530,10 +530,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 	}
 	pr_debug("  parent is %s\n", pdn->full_name);
 
-	/* Check for parent == NULL so we don't try to setup the empty EADS
-	 * slots on POWER4 machines.
-	 */
-	if (dma_window == NULL || pdn->parent == NULL) {
+	if (dma_window == NULL) {
 		pr_debug("  no dma window for device, linking to parent\n");
 		set_iommu_table_base(&dev->dev, PCI_DN(pdn)->iommu_table);
 		return;
-- 
1.7.1


* [RFC PATCH 07/11] ppc/iommu: do not need to check for dma_window == NULL
  2010-10-08 17:33 [RFC PATCH 00/11] ppc: enable dynamic dma window support Nishanth Aravamudan
                   ` (5 preceding siblings ...)
  2010-10-08 17:33 ` [RFC PATCH 06/11] ppc: checking for pdn->parent is redundant Nishanth Aravamudan
@ 2010-10-08 17:33 ` Nishanth Aravamudan
  2010-10-08 17:33 ` [RFC PATCH 08/11] ppc/iommu: remove unneeded pci_dma_bus_setup_pSeriesLP Nishanth Aravamudan
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Nishanth Aravamudan @ 2010-10-08 17:33 UTC (permalink / raw)
  To: nacc; +Cc: linux-kernel, miltonm, Paul Mackerras, Anton Blanchard, linuxppc-dev

The block in pci_dma_dev_setup_pSeriesLP for dma_window == NULL can be
removed because we will only terminate the loop if we had already allocated
an iommu table for that node or we found a window.  While there may be
no window for the device, the interesting part is whether we are reusing a
table or creating it for the first device under it.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c |    6 ------
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 0ae5a60..9d564b9 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -530,12 +530,6 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 	}
 	pr_debug("  parent is %s\n", pdn->full_name);
 
-	if (dma_window == NULL) {
-		pr_debug("  no dma window for device, linking to parent\n");
-		set_iommu_table_base(&dev->dev, PCI_DN(pdn)->iommu_table);
-		return;
-	}
-
 	pci = PCI_DN(pdn);
 	if (!pci->iommu_table) {
 		tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
-- 
1.7.1


* [RFC PATCH 08/11] ppc/iommu: remove unneeded pci_dma_bus_setup_pSeriesLP
  2010-10-08 17:33 [RFC PATCH 00/11] ppc: enable dynamic dma window support Nishanth Aravamudan
                   ` (6 preceding siblings ...)
  2010-10-08 17:33 ` [RFC PATCH 07/11] ppc/iommu: do not need to check for dma_window == NULL Nishanth Aravamudan
@ 2010-10-08 17:33 ` Nishanth Aravamudan
  2010-10-08 17:33 ` [RFC PATCH 09/11] ppc/iommu: pass phb only to iommu_table_setparms_lpar Nishanth Aravamudan
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Nishanth Aravamudan @ 2010-10-08 17:33 UTC (permalink / raw)
  To: nacc
  Cc: devicetree-discuss, linux-kernel, miltonm, Paul Mackerras,
	Anton Blanchard, linuxppc-dev

The work done in pci_dma_bus_setup_pSeriesLP will be done in
pci_dma_dev_setup_pSeriesLP, and therefore we can remove the bus setup
function for lpar.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c |   43 --------------------------------
 1 files changed, 0 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 9d564b9..d8bb9be 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -417,47 +417,6 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
 	pr_debug("ISA/IDE, window size is 0x%llx\n", pci->phb->dma_window_size);
 }
 
-
-static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
-{
-	struct iommu_table *tbl;
-	struct device_node *dn, *pdn;
-	struct pci_dn *ppci;
-	const void *dma_window = NULL;
-
-	dn = pci_bus_to_OF_node(bus);
-
-	pr_debug("pci_dma_bus_setup_pSeriesLP: setting up bus %s\n",
-		 dn->full_name);
-
-	/* Find nearest ibm,dma-window, walking up the device tree */
-	for (pdn = dn; pdn != NULL; pdn = pdn->parent) {
-		dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
-		if (dma_window != NULL)
-			break;
-	}
-
-	if (dma_window == NULL) {
-		pr_debug("  no ibm,dma-window property !\n");
-		return;
-	}
-
-	ppci = PCI_DN(pdn);
-
-	pr_debug("  parent is %s, iommu_table: 0x%p\n",
-		 pdn->full_name, ppci->iommu_table);
-
-	if (!ppci->iommu_table) {
-		tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
-				   ppci->phb->node);
-		iommu_table_setparms_lpar(ppci->phb, pdn, tbl, dma_window,
-			bus->number);
-		ppci->iommu_table = iommu_init_table(tbl, ppci->phb->node);
-		pr_debug("  created table: %p\n", ppci->iommu_table);
-	}
-}
-
-
 static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
 {
 	struct device_node *dn;
@@ -547,7 +506,6 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 #else  /* CONFIG_PCI */
 #define pci_dma_bus_setup_pSeries	NULL
 #define pci_dma_dev_setup_pSeries	NULL
-#define pci_dma_bus_setup_pSeriesLP	NULL
 #define pci_dma_dev_setup_pSeriesLP	NULL
 #endif /* !CONFIG_PCI */
 
@@ -588,7 +546,6 @@ void iommu_init_early_pSeries(void)
 			ppc_md.tce_free	 = tce_free_pSeriesLP;
 		}
 		ppc_md.tce_get   = tce_get_pSeriesLP;
-		ppc_md.pci_dma_bus_setup = pci_dma_bus_setup_pSeriesLP;
 		ppc_md.pci_dma_dev_setup = pci_dma_dev_setup_pSeriesLP;
 	} else {
 		ppc_md.tce_build = tce_build_pSeries;
-- 
1.7.1


* [RFC PATCH 09/11] ppc/iommu: pass phb only to iommu_table_setparms_lpar
  2010-10-08 17:33 [RFC PATCH 00/11] ppc: enable dynamic dma window support Nishanth Aravamudan
                   ` (7 preceding siblings ...)
  2010-10-08 17:33 ` [RFC PATCH 08/11] ppc/iommu: remove unneeded pci_dma_bus_setup_pSeriesLP Nishanth Aravamudan
@ 2010-10-08 17:33 ` Nishanth Aravamudan
  2010-10-08 17:33 ` [RFC PATCH 10/11] ppc/iommu: add routines to pseries iommu to map tces 1-1 Nishanth Aravamudan
  2010-10-08 17:33 ` [RFC PATCH 11/11] ppc: add dynamic dma window support Nishanth Aravamudan
  10 siblings, 0 replies; 17+ messages in thread
From: Nishanth Aravamudan @ 2010-10-08 17:33 UTC (permalink / raw)
  To: nacc; +Cc: linux-kernel, miltonm, Paul Mackerras, Anton Blanchard, linuxppc-dev

iommu_table_setparms_lpar needs either the phb or the subbusnumber
(not both), pass the phb to make it similar to iommu_table_setparms.

Note: In cases where a caller was previously passing bus->number to
iommu_table_setparms_lpar() rather than phb->bus->number, this can lead
to a different value in tbl->it_busno. The only example of this was
pci_dma_bus_setup_pSeriesLP(), removed in "ppc/iommu: remove
unneeded pci_dma_bus_setup_pSeriesLP".

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index d8bb9be..8ec81df 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -323,14 +323,13 @@ static void iommu_table_setparms(struct pci_controller *phb,
 static void iommu_table_setparms_lpar(struct pci_controller *phb,
 				      struct device_node *dn,
 				      struct iommu_table *tbl,
-				      const void *dma_window,
-				      int bussubno)
+				      const void *dma_window)
 {
 	unsigned long offset, size;
 
-	tbl->it_busno  = bussubno;
 	of_parse_dma_window(dn, dma_window, &tbl->it_index, &offset, &size);
 
+	tbl->it_busno = phb->bus->number;
 	tbl->it_base   = 0;
 	tbl->it_blocksize  = 16;
 	tbl->it_type = TCE_PCI;
@@ -493,8 +492,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 	if (!pci->iommu_table) {
 		tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
 				   pci->phb->node);
-		iommu_table_setparms_lpar(pci->phb, pdn, tbl, dma_window,
-			pci->phb->bus->number);
+		iommu_table_setparms_lpar(pci->phb, pdn, tbl, dma_window);
 		pci->iommu_table = iommu_init_table(tbl, pci->phb->node);
 		pr_debug("  created table: %p\n", pci->iommu_table);
 	} else {
-- 
1.7.1


* [RFC PATCH 10/11] ppc/iommu: add routines to pseries iommu to map tces 1-1
  2010-10-08 17:33 [RFC PATCH 00/11] ppc: enable dynamic dma window support Nishanth Aravamudan
                   ` (8 preceding siblings ...)
  2010-10-08 17:33 ` [RFC PATCH 09/11] ppc/iommu: pass phb only to iommu_table_setparms_lpar Nishanth Aravamudan
@ 2010-10-08 17:33 ` Nishanth Aravamudan
  2010-10-08 17:33 ` [RFC PATCH 11/11] ppc: add dynamic dma window support Nishanth Aravamudan
  10 siblings, 0 replies; 17+ messages in thread
From: Nishanth Aravamudan @ 2010-10-08 17:33 UTC (permalink / raw)
  To: nacc; +Cc: linux-kernel, miltonm, Paul Mackerras, Anton Blanchard, linuxppc-dev

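The range arithmetic used by the routines in this patch (round the start back to a TCE page boundary, then count TCEs) can be sketched in userspace as below. This is an illustration under assumptions: the struct and function names are hypothetical, the page shift is passed in rather than taken from PAGE_SHIFT, and the round-up uses add-then-shift rather than the OR-then-shift form in the patch.

```c
#include <stdint.h>

struct tce_range {
	uint64_t dma_offset;	/* start, aligned down to a TCE page */
	uint64_t num_tce;	/* TCE entries covering the range */
};

static struct tce_range tce_range_sketch(uint64_t start_pfn, uint64_t num_pfn,
					 unsigned int page_shift,
					 unsigned int tce_shift)
{
	uint64_t tce_size = 1ULL << tce_shift;
	uint64_t start = start_pfn << page_shift;
	uint64_t bytes = num_pfn << page_shift;
	struct tce_range r;

	/* extend the length by the amount the start is rounded back */
	bytes += start & (tce_size - 1);
	r.dma_offset = start & ~(tce_size - 1);

	/* round the byte count up to whole TCE pages */
	r.num_tce = (bytes + tce_size - 1) >> tce_shift;
	return r;
}
```

When the TCE page size equals the kernel page size the result is simply one TCE per page; the rounding only matters when TCE pages are larger than kernel pages.
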
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c |   98 ++++++++++++++++++++++++++++++++
 1 files changed, 98 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 8ec81df..451d2d1 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -270,6 +270,104 @@ static unsigned long tce_get_pSeriesLP(struct iommu_table *tbl, long tcenum)
 	return tce_ret;
 }
 
+/* this is compatible with cells for the device tree property */
+struct dynamic_dma_window_prop {
+	__be32	liobn;		/* tce table number */
+	__be32	dma_base[2];	/* address hi,lo */
+	__be32	tce_shift;	/* ilog2(tce_page_size) */
+	__be32	window_shift;	/* ilog2(tce_window_size) */
+};
+
+static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
+					unsigned long num_pfn, void *arg)
+{
+	struct dynamic_dma_window_prop *maprange = arg;
+	int rc;
+	u64 tce_size, num_tce, dma_offset;
+	u32 tce_shift;
+
+	tce_shift = be32_to_cpu(maprange->tce_shift);
+	tce_size = 1ULL << tce_shift;
+	num_tce = num_pfn << PAGE_SHIFT;
+	dma_offset = start_pfn << PAGE_SHIFT;
+
+	/* round back to the beginning of the tce page size */
+	num_tce += dma_offset & (tce_size - 1);
+	dma_offset &= ~(tce_size - 1);
+
+	/* convert to number of tces */
+	num_tce |= tce_size - 1;
+	num_tce >>= tce_shift;
+
+	rc = plpar_tce_stuff(maprange->liobn, dma_offset, 0, num_tce);
+
+	return rc;
+}
+
+static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
+					unsigned long num_pfn, void *arg)
+{
+	struct dynamic_dma_window_prop *maprange = arg;
+	u64 *tcep, tce_size, num_tce, dma_offset, next, proto_tce;
+	u32 tce_shift;
+	long rc = 0;
+	long l, limit;
+
+	local_irq_disable();	/* to protect tcep and the page behind it */
+	tcep = __get_cpu_var(tce_page);
+
+	if (!tcep) {
+		tcep = (u64 *)__get_free_page(GFP_ATOMIC);
+		if (!tcep) {
+			local_irq_enable();
+			return -ENOMEM;
+		}
+		__get_cpu_var(tce_page) = tcep;
+	}
+
+	proto_tce = TCE_PCI_READ | TCE_PCI_WRITE;
+
+	tce_shift = be32_to_cpu(maprange->tce_shift);
+	tce_size = 1ULL << tce_shift;
+	next = start_pfn << PAGE_SHIFT;
+	num_tce = num_pfn << PAGE_SHIFT;
+
+	/* round back to the beginning of the tce page size */
+	num_tce += next & (tce_size - 1);
+	next &= ~(tce_size - 1);
+
+	/* convert to number of tces */
+	num_tce |= tce_size - 1;
+	num_tce >>= tce_shift;
+
+	/* We can map max one pageful of TCEs at a time */
+	do {
+		/*
+		 * Set up the page with TCE data, looping through and setting
+		 * the values.
+		 */
+		limit = min_t(long, num_tce, 4096/TCE_ENTRY_SIZE);
+		dma_offset = next;
+
+		for (l = 0; l < limit; l++) {
+			tcep[l] = proto_tce | next;
+			next += tce_size;
+		}
+
+		rc = plpar_tce_put_indirect((u64)maprange->liobn,
+					    (u64)dma_offset,
+					    (u64)virt_to_abs(tcep),
+					    limit);
+
+		num_tce -= limit;
+	} while (num_tce > 0 && !rc);
+
+	/* error cleanup: caller will clear whole range */
+
+	local_irq_enable();
+	return rc;
+}
+
 #ifdef CONFIG_PCI
 static void iommu_table_setparms(struct pci_controller *phb,
 				 struct device_node *dn,
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 11/11] ppc: add dynamic dma window support
  2010-10-08 17:33 [RFC PATCH 00/11] ppc: enable dynamic dma window support Nishanth Aravamudan
                   ` (9 preceding siblings ...)
  2010-10-08 17:33 ` [RFC PATCH 10/11] ppc/iommu: add routines to pseries iommu to map tces 1-1 Nishanth Aravamudan
@ 2010-10-08 17:33 ` Nishanth Aravamudan
  10 siblings, 0 replies; 17+ messages in thread
From: Nishanth Aravamudan @ 2010-10-08 17:33 UTC (permalink / raw)
  To: nacc
  Cc: devicetree-discuss, linux-kernel, miltonm, Paul Mackerras,
	Anton Blanchard, linuxppc-dev

If firmware allows us to map all of a partition's memory for DMA on a
particular bridge, create a 1:1 mapping of that memory. Add hooks for
dealing with hotplug events. Dynamic DMA windows can use page sizes
larger than the default, and we use the largest one possible.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c |  319 +++++++++++++++++++++++++++++++-
 1 files changed, 315 insertions(+), 4 deletions(-)
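
Note on the window sizing in check_ddr_windowLP() below: a minimal
userspace sketch of the page-size choice and the "does it fit" check.
The bit meanings follow the way this patch reads the query mask; the
ex_* names are made up for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Largest supported TCE page shift for the given mask, or -1 if none. */
static int ex_pick_page_shift(uint32_t page_size_mask)
{
	if (page_size_mask & 4)
		return 24;	/* 16MB */
	if (page_size_mask & 2)
		return 16;	/* 64kB */
	if (page_size_mask & 1)
		return 12;	/* 4kB */
	return -1;
}

/*
 * Smallest n such that (1ULL << n) >= v, a stand-in for the kernel's
 * order_base_2(). Values above 1ULL << 63 saturate at 63 here.
 */
static int ex_order_base_2(uint64_t v)
{
	int n = 0;

	while (n < 63 && (1ULL << n) < v)
		n++;
	return n;
}

/* Nonzero if largest_block TCEs of the chosen size can cover max_addr. */
static int ex_window_fits(uint64_t max_addr, uint32_t largest_block,
			  int page_shift)
{
	return largest_block >= (max_addr >> page_shift);
}
```

For a partition whose maximum hotplug address is 4GB and a mask of 7,
this picks 16MB pages, needs 256 TCEs, and asks for a window shift
of 32.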

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 451d2d1..23ca0d1 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -33,6 +33,7 @@
 #include <linux/pci.h>
 #include <linux/dma-mapping.h>
 #include <linux/crash_dump.h>
+#include <linux/memory.h>
 #include <asm/io.h>
 #include <asm/prom.h>
 #include <asm/rtas.h>
@@ -45,6 +46,7 @@
 #include <asm/tce.h>
 #include <asm/ppc-pci.h>
 #include <asm/udbg.h>
+#include <asm/mmzone.h>
 
 #include "plpar_wrappers.h"
 
@@ -278,10 +280,19 @@ struct dynamic_dma_window_prop {
 	__be32	window_shift;	/* ilog2(tce_window_size) */
 };
 
+struct direct_window {
+	struct device_node *device;
+	const struct dynamic_dma_window_prop *prop;
+	struct list_head list;
+};
+static LIST_HEAD(direct_window_list);
+static DEFINE_SPINLOCK(direct_window_list_lock);
+#define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
+
 static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
-					unsigned long num_pfn, void *arg)
+					unsigned long num_pfn, const void *arg)
 {
-	struct dynamic_dma_window_prop *maprange = arg;
+	const struct dynamic_dma_window_prop *maprange = arg;
 	int rc;
 	u64 tce_size, num_tce, dma_offset;
 	u32 tce_shift;
@@ -305,9 +316,9 @@ static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
 }
 
 static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
-					unsigned long num_pfn, void *arg)
+					unsigned long num_pfn, const void *arg)
 {
-	struct dynamic_dma_window_prop *maprange = arg;
+	const struct dynamic_dma_window_prop *maprange = arg;
 	u64 *tcep, tce_size, num_tce, dma_offset, next, proto_tce;
 	u32 tce_shift;
 	long rc = 0;
@@ -368,6 +379,12 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
 	return rc;
 }
 
+static int tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn,
+					unsigned long num_pfn, void *arg)
+{
+	return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg);
+}
+
 #ifdef CONFIG_PCI
 static void iommu_table_setparms(struct pci_controller *phb,
 				 struct device_node *dn,
@@ -553,6 +570,246 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
 		       pci_name(dev));
 }
 
+/*
+ * If the PE supports dynamic dma windows, and there is space for a table
+ * that can map all pages in a linear offset, then setup such a table,
+ * and record the dma-offset in the struct device.
+ *
+ * dev: the pci device we are checking
+ * pdn: the parent pe node with the ibm,dma_window property
+ * Future: also check if we can remap the base window for our base page size
+ */
+static void check_ddr_windowLP(struct pci_dev *dev, struct device_node *pdn)
+{
+	int len, ret;
+	u32 query[4], create[3], cfg_addr;
+	int page_shift;
+	u64 dma_addr, buid, max_addr;
+	struct pci_dn *pcidn;
+	const u32 *uninitialized_var(ddr_avail);
+	struct direct_window *window;
+	struct property *uninitialized_var(win64);
+	struct dynamic_dma_window_prop *ddwprop;
+	const struct dynamic_dma_window_prop *direct64;
+
+	spin_lock(&direct_window_list_lock);
+
+	/* check if we already created a window */
+	list_for_each_entry(window, &direct_window_list, list) {
+		if (window->device == pdn) {
+			direct64 = window->prop;
+			goto set_device;
+		}
+	}
+	/* check if we kexec'd with a window */
+	direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
+	if (direct64)
+		goto create_window_listent;
+
+	ddr_avail = of_get_property(pdn, "ibm,ddw-applicable", &len);
+
+	if (!ddr_avail || len < 4 * sizeof(u32))
+		goto out_unlock;
+	/*
+	 * the ibm,ddw-applicable property holds the tokens for:
+	 * ibm,query-pe-dma-window
+	 * ibm,create-pe-dma-window
+	 * ibm,remove-pe-dma-window
+	 * for the given node in that order.
+	 *
+	 * Query if there is a second window of size to map the
+	 * whole partition.  Query returns number of windows, largest
+	 * block assigned to PE (partition endpoint), and two bitmasks
+	 * of page sizes: supported and supported for migrate-dma.
+	 */
+
+	/*
+	 * Get the config address and phb buid of the PE window.
+	 * Rely on eeh to retrieve this for us.
+	 * Retrieve them from the node with the dma window property.
+	 */
+	pcidn = PCI_DN(pdn);
+	cfg_addr = pcidn->eeh_config_addr;
+	if (pcidn->eeh_pe_config_addr)
+		cfg_addr = pcidn->eeh_pe_config_addr;
+	buid = pcidn->phb->buid;
+	ret = rtas_call(ddr_avail[0], 3, 5, &query[0],
+		  cfg_addr, BUID_HI(buid), BUID_LO(buid));
+	if (ret != 0) {
+		dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
+			" returned %d\n", ddr_avail[0], cfg_addr, BUID_HI(buid),
+			BUID_LO(buid), ret);
+		goto out_unlock;
+	}
+
+	if (!query[0]) {
+		/*
+		 * no additional windows are available for this device.
+		 * We might be able to reallocate the existing window,
+		 * trading in for a larger page size.
+		 */
+		dev_dbg(&dev->dev, "no free dynamic windows\n");
+		goto out_unlock;
+	}
+	if (query[2] & 4) {
+		page_shift = 24; /* 16MB */
+	} else if (query[2] & 2) {
+		page_shift = 16; /* 64kB */
+	} else if (query[2] & 1) {
+		page_shift = 12; /* 4kB */
+	} else {
+		dev_dbg(&dev->dev, "no supported direct page size in mask %x\n",
+			  query[2]);
+		goto out_unlock;
+	}
+	/* verify the window * number of ptes will map the partition */
+	/* check largest block * page size > max memory hotplug addr */
+	max_addr = memory_hotplug_max();
+	if (query[1] < (max_addr >> page_shift)) {
+		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %u "
+			  "%llu-sized pages\n", max_addr,  query[1],
+			  1ULL << page_shift);
+		goto out_unlock;
+	}
+	len = order_base_2(max_addr);
+	win64 = kzalloc(sizeof(struct property), GFP_KERNEL);
+	if (!win64) {
+		dev_info(&dev->dev,
+			"couldn't allocate property for 64bit dma window\n");
+		goto out_unlock;
+	}
+	win64->name = kstrdup(DIRECT64_PROPNAME, GFP_KERNEL);
+	win64->value = ddwprop = kmalloc(sizeof(*ddwprop), GFP_KERNEL);
+	if (!win64->name || !win64->value) {
+		dev_info(&dev->dev,
+			"couldn't allocate property name and value\n");
+		goto out_free_prop;
+	}
+	do {
+		/* extra outputs are LIOBN and dma-addr (hi, lo) */
+		ret = rtas_call(ddr_avail[1], 5, 4, &create[0], cfg_addr,
+				BUID_HI(buid), BUID_LO(buid), len, page_shift);
+	} while(rtas_busy_delay(ret));
+	if (ret) {
+		dev_info(&dev->dev,
+			"failed to create direct window: rtas returned %d"
+			" to ibm,create-pe-dma-window(%x) %x %x %x %x %x\n",
+			ret, ddr_avail[1], cfg_addr, BUID_HI(buid),
+			BUID_LO(buid), len, page_shift);
+		goto out_free_prop;
+	}
+
+	*ddwprop = (struct dynamic_dma_window_prop) {
+		.liobn = cpu_to_be32(create[0]),
+		.dma_base = {cpu_to_be32(create[1]), cpu_to_be32(create[2])},
+		.tce_shift = cpu_to_be32(page_shift),
+		.window_shift = cpu_to_be32(len)
+	};
+
+	dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %s\n",
+		  create[0], pdn->full_name);
+
+	ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
+			win64->value, tce_setrange_multi_pSeriesLP_walk);
+	if (ret) {
+		dev_info(&dev->dev, "failed to map direct window for %s\n",
+			 pdn->full_name);
+
+		goto out_clear_window;
+	}
+
+	ret = prom_add_property(pdn, win64);
+	if (ret) {
+		pr_err("%s: unable to add dma window property: %d\n",
+			 pdn->full_name, ret);
+		goto out_clear_window;
+	}
+
+	direct64 = ddwprop;
+
+create_window_listent:
+	window = kzalloc(sizeof(*window), GFP_KERNEL);
+	if (!window)
+		goto out_clear_window;
+	window->device = pdn;
+	window->prop = direct64;
+	list_add(&window->list, &direct_window_list);
+
+set_device:
+	dma_addr = of_read_number(&direct64->dma_base[0], 2);
+	set_dma_offset(&dev->dev, dma_addr);
+	set_dma_ops(&dev->dev, &dma_choose64_ops);
+
+	dev_dbg(&dev->dev, "Can use direct dma at %s (offset %llx)\n",
+		pdn->full_name, dma_addr);
+
+out_unlock:
+	spin_unlock(&direct_window_list_lock);
+	return;
+
+out_clear_window:
+	ret = tce_clearrange_multi_pSeriesLP(0,
+		memblock_end_of_DRAM() >> PAGE_SHIFT, win64->value);
+	if (ret)
+		dev_info(&dev->dev,
+			"failed to clear partial window for %s\n",
+			 pdn->full_name);
+
+	ret = rtas_call(ddr_avail[2], 1, 1, NULL, direct64->liobn);
+	if (ret) {
+		dev_info(&dev->dev,
+			"failed to remove direct window: rtas returned "
+			"%d to ibm,remove-pe-dma-window(%x) %x\n",
+			ret, ddr_avail[2], direct64->liobn);
+	}
+
+out_free_prop:
+	kfree(win64->name);
+	kfree(win64->value);
+	kfree(win64);
+
+	goto out_unlock;
+}
+
+#if 1 //def CLEAN_WINDOW_ON_REMOVE
+static void remove_ddr_windowLP(struct device_node *np)
+{
+	struct dynamic_dma_window_prop *dwp;
+	struct property *win64;
+	const u32 *ddr_avail;
+	int len, ret;
+
+	ddr_avail = of_get_property(np, "ibm,ddw-applicable", &len);
+
+	win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
+
+	if (!win64 || !ddr_avail || len < 4 * sizeof(u32))
+		return;
+
+	dwp = win64->value;
+
+	/* clear the whole window, note the arg is in kernel pages */
+	ret = tce_clearrange_multi_pSeriesLP(0,
+		1ULL << (dwp->window_shift - PAGE_SHIFT), dwp);
+	if (ret)
+		pr_warning("%s failed to clear tces in window.\n",
+			 np->full_name);
+
+	ret = rtas_call(ddr_avail[2], 1, 1, NULL, dwp->liobn);
+	if (ret)
+		pr_warning("%s: failed to remove direct window: rtas returned "
+			"%d to ibm,remove-pe-dma-window(%x) %x\n",
+			np->full_name, ret, ddr_avail[2], dwp->liobn);
+
+	ret = prom_remove_property(np, win64);
+	if (ret)
+		pr_warning("%s: failed to remove direct window property (%i)\n",
+			np->full_name, ret);
+}
+#else
+static void remove_ddr_windowLP(struct device_node *np) {}
+#endif
+
 static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 {
 	struct device_node *pdn, *dn;
@@ -598,6 +855,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 	}
 
 	set_iommu_table_base(&dev->dev, pci->iommu_table);
+	check_ddr_windowLP(dev, pdn);
 }
 #else  /* CONFIG_PCI */
 #define pci_dma_bus_setup_pSeries	NULL
@@ -605,16 +863,68 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 #define pci_dma_dev_setup_pSeriesLP	NULL
 #endif /* !CONFIG_PCI */
 
+static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
+		void *data)
+{
+	struct direct_window *window;
+	struct memory_notify *arg = data;
+	int ret = 0;
+
+	switch (action) {
+	case MEM_GOING_ONLINE:
+		spin_lock(&direct_window_list_lock);
+		list_for_each_entry(window, &direct_window_list, list) {
+			ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
+					arg->nr_pages, window->prop);
+			/* XXX log error */
+		}
+		spin_unlock(&direct_window_list_lock);
+		break;
+	case MEM_CANCEL_ONLINE:
+	case MEM_OFFLINE:
+		spin_lock(&direct_window_list_lock);
+		list_for_each_entry(window, &direct_window_list, list) {
+			ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
+					arg->nr_pages, window->prop);
+			/* XXX log error */
+		}
+		spin_unlock(&direct_window_list_lock);
+		break;
+	default:
+		break;
+	}
+	if (ret && action != MEM_CANCEL_ONLINE)
+		return NOTIFY_BAD;
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block iommu_mem_nb = {
+	.notifier_call = iommu_mem_notifier,
+};
+
 static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *node)
 {
 	int err = NOTIFY_OK;
 	struct device_node *np = node;
 	struct pci_dn *pci = PCI_DN(np);
+	struct direct_window *window;
 
 	switch (action) {
 	case PSERIES_RECONFIG_REMOVE:
 		if (pci && pci->iommu_table)
 			iommu_free_table(pci->iommu_table, np->full_name);
+
+		spin_lock(&direct_window_list_lock);
+		list_for_each_entry(window, &direct_window_list, list) {
+			if (window->device == np) {
+				list_del(&window->list);
+				break;
+			}
+		}
+		spin_unlock(&direct_window_list_lock);
+
+		remove_ddr_windowLP(np);
 		break;
 	default:
 		err = NOTIFY_DONE;
@@ -653,6 +963,7 @@ void iommu_init_early_pSeries(void)
 
 
 	pSeries_reconfig_notifier_register(&iommu_reconfig_nb);
+	register_memory_notifier(&iommu_mem_nb);
 
 	set_pci_dma_ops(&dma_iommu_ops);
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 02/11] ppc: allow direct and iommu to coexist
  2010-10-08 17:33 ` [RFC PATCH 02/11] ppc: allow direct and iommu to coexist Nishanth Aravamudan
@ 2010-10-08 23:38   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-08 23:38 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: FUJITA Tomonori, linux-kernel, miltonm, Paul Mackerras,
	Andrew Morton, linuxppc-dev

On Fri, 2010-10-08 at 10:33 -0700, Nishanth Aravamudan wrote:
> Replace the union with just the multiple fields, ifdef on CONFIG_PPC64.
> 
> Future pseries boxes will allow a 64 bit dma mapping covering all
> memory, coexisting with a smaller iommu window in 32 bit pci space.
> 
> The cell fixed mapping would also like both to coexist.
> 
> Signed-off-by: Milton Miller <miltonm@bga.com>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> ---
> I used the ifdef guard of CONFIG_PPC64 according to the current makefile
> for iommu.c.  One set is buried in the middle of iommu.h.

I dislike the ifdef's ...

Also, why remove the union ? IE. Do we really need them to co-exist for a
given device ? I'm doing something similar for another (not released
yet) processor where I'm flicking between direct and iommu at
set_dma_mask time, it's easy enough to change the union content.

Cheers,
Ben.

> ---
>  arch/powerpc/include/asm/device.h      |   14 ++++++--------
>  arch/powerpc/include/asm/dma-mapping.h |    4 ++--
>  arch/powerpc/include/asm/iommu.h       |    6 ++++--
>  3 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
> index 16d25c0..ed883ea 100644
> --- a/arch/powerpc/include/asm/device.h
> +++ b/arch/powerpc/include/asm/device.h
> @@ -19,14 +19,12 @@ struct dev_archdata {
>  	/* DMA operations on that device */
>  	struct dma_map_ops	*dma_ops;
>  
> -	/*
> -	 * When an iommu is in use, dma_data is used as a ptr to the base of the
> -	 * iommu_table.  Otherwise, it is a simple numerical offset.
> -	 */
> -	union {
> -		dma_addr_t	dma_offset;
> -		void		*iommu_table_base;
> -	} dma_data;
> +	/* dma_offset is used by swiotlb and direct dma ops, but no iommu */
> +	dma_addr_t	dma_offset;
> +
> +#ifdef CONFIG_PPC64
> +	void		*iommu_table_base;
> +#endif
>  
>  #ifdef CONFIG_SWIOTLB
>  	dma_addr_t		max_direct_dma_addr;
> diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
> index 8c9c6ad..644103a 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -100,7 +100,7 @@ static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops)
>  static inline dma_addr_t get_dma_offset(struct device *dev)
>  {
>  	if (dev)
> -		return dev->archdata.dma_data.dma_offset;
> +		return dev->archdata.dma_offset;
>  
>  	return PCI_DRAM_OFFSET;
>  }
> @@ -108,7 +108,7 @@ static inline dma_addr_t get_dma_offset(struct device *dev)
>  static inline void set_dma_offset(struct device *dev, dma_addr_t off)
>  {
>  	if (dev)
> -		dev->archdata.dma_data.dma_offset = off;
> +		dev->archdata.dma_offset = off;
>  }
>  
>  /* this will be removed soon */
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index edfc980..0f605a4 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -70,15 +70,17 @@ struct iommu_table {
>  
>  struct scatterlist;
>  
> +#ifdef CONFIG_PPC64
>  static inline void set_iommu_table_base(struct device *dev, void *base)
>  {
> -	dev->archdata.dma_data.iommu_table_base = base;
> +	dev->archdata.iommu_table_base = base;
>  }
>  
>  static inline void *get_iommu_table_base(struct device *dev)
>  {
> -	return dev->archdata.dma_data.iommu_table_base;
> +	return dev->archdata.iommu_table_base;
>  }
> +#endif
>  
>  /* Frees table for an individual device node */
>  extern void iommu_free_table(struct iommu_table *tbl, const char *node_name);

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 03/11] ppc: Create ops to choose between direct window and iommu based on device mask
  2010-10-08 17:33 ` [RFC PATCH 03/11] ppc: Create ops to choose between direct window and iommu based on device mask Nishanth Aravamudan
@ 2010-10-08 23:43   ` Benjamin Herrenschmidt
  2010-10-08 23:44   ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-08 23:43 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Anton Vorontsov, miltonm, linux-kernel, FUJITA Tomonori,
	Paul Mackerras, Scott Wood, Andrew Morton, linuxppc-dev

On Fri, 2010-10-08 at 10:33 -0700, Nishanth Aravamudan wrote:
> Also allow the coherent ops to be iommu if only the coherent mask is too
> small, mostly for drivers that do not set the coherent mask but also
> don't use the coherent api.

You are doing the transition at map_sg time which is a hot path, I don't
like that. Also you add all those "choose" variants of the dma ops...
not very nice at all.

You may want to look at the patches I posted to the list a while back
for doing direct DMA on Bimini:

> Signed-off-by: Milton Miller <miltonm@bga.com>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> ---
>  arch/powerpc/include/asm/dma-mapping.h |    2 +
>  arch/powerpc/kernel/Makefile           |    2 +-
>  arch/powerpc/kernel/dma-choose64.c     |  167 ++++++++++++++++++++++++++++++++
>  3 files changed, 170 insertions(+), 1 deletions(-)
>  create mode 100644 arch/powerpc/kernel/dma-choose64.c
> 
> diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
> index 644103a..9ffb16a 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -68,6 +68,8 @@ static inline unsigned long device_to_mask(struct device *dev)
>   */
>  #ifdef CONFIG_PPC64
>  extern struct dma_map_ops dma_iommu_ops;
> +extern struct dma_map_ops dma_choose64_ops;
> +extern struct dma_map_ops dma_iommu_coherent_ops;
>  #endif
>  extern struct dma_map_ops dma_direct_ops;
>  
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 1dda701..21b8ea1 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -82,7 +82,7 @@ obj-y				+= time.o prom.o traps.o setup-common.o \
>  				   udbg.o misc.o io.o dma.o \
>  				   misc_$(CONFIG_WORD_SIZE).o
>  obj-$(CONFIG_PPC32)		+= entry_32.o setup_32.o
> -obj-$(CONFIG_PPC64)		+= dma-iommu.o iommu.o
> +obj-$(CONFIG_PPC64)		+= dma-iommu.o iommu.o dma-choose64.o
>  obj-$(CONFIG_KGDB)		+= kgdb.o
>  obj-$(CONFIG_PPC_OF_BOOT_TRAMPOLINE)	+= prom_init.o
>  obj-$(CONFIG_MODULES)		+= ppc_ksyms.o
> diff --git a/arch/powerpc/kernel/dma-choose64.c b/arch/powerpc/kernel/dma-choose64.c
> new file mode 100644
> index 0000000..17c716f
> --- /dev/null
> +++ b/arch/powerpc/kernel/dma-choose64.c
> @@ -0,0 +1,167 @@
> +/*
> + * Copyright (C) 2006 Benjamin Herrenschmidt, IBM Corporation
> + *
> + * Provide default implementations of the DMA mapping callbacks for
> + * directly mapped busses.
> + */
> +
> +#include <linux/device.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/bug.h>
> +
> +/*
> + * DMA operations that choose between a 64-bit direct mapping and an iommu
> + *
> + * This set of dma ops chooses between directing to a static 1:1 mapping
> + * that may require a 64 bit address and an iommu based on the declared
> + * streaming and coherent masks for the device.  The choice is made on
> + * the first dma map call.
> + */
> +
> +/* first BUG ops for calls out of sequence */
> +
> +void *dma_bug_alloc_coherent(struct device *dev, size_t size,
> +				dma_addr_t *dma_handle, gfp_t flag)
> +{
> +	BUG();
> +
> +	return NULL;
> +}
> +
> +void dma_bug_free_coherent(struct device *dev, size_t size,
> +			      void *vaddr, dma_addr_t dma_handle)
> +{
> +	BUG();
> +}
> +
> +static int dma_bug_dma_supported(struct device *dev, u64 mask)
> +{
> +	BUG();
> +
> +	return 0;
> +}
> +
> +static int dma_bug_map_sg(struct device *dev, struct scatterlist *sgl,
> +			     int nents, enum dma_data_direction direction,
> +			     struct dma_attrs *attrs)
> +{
> +	BUG();
> +
> +	return 0;
> +}
> +
> +
> +static void dma_bug_unmap_sg(struct device *dev, struct scatterlist *sg,
> +				int nents, enum dma_data_direction direction,
> +				struct dma_attrs *attrs)
> +{
> +	BUG();
> +}
> +
> +static dma_addr_t dma_bug_map_page(struct device *dev,
> +					     struct page *page,
> +					     unsigned long offset,
> +					     size_t size,
> +					     enum dma_data_direction dir,
> +					     struct dma_attrs *attrs)
> +{
> +	BUG();
> +
> +	return DMA_ERROR_CODE;
> +}
> +
> +
> +static void dma_bug_unmap_page(struct device *dev,
> +					 dma_addr_t dma_address,
> +					 size_t size,
> +					 enum dma_data_direction direction,
> +					 struct dma_attrs *attrs)
> +{
> +	BUG();
> +}
> +
> +
> +static struct dma_map_ops *choose(struct device *dev)
> +{
> +	if (dma_direct_ops.dma_supported(dev, device_to_mask(dev))) {
> +		if (dma_direct_ops.dma_supported(dev, dev->coherent_dma_mask))
> +			return &dma_direct_ops;
> +		return &dma_iommu_coherent_ops;
> +	}
> +	return &dma_iommu_ops;
> +}
> +
> +void *dma_choose64_alloc_coherent(struct device *dev, size_t size,
> +				dma_addr_t *dma_handle, gfp_t flag)
> +{
> +	struct dma_map_ops *new = choose(dev);
> +
> +	set_dma_ops(dev, new);
> +	return new->alloc_coherent(dev, size, dma_handle, flag);
> +}
> +
> +static int dma_choose64_map_sg(struct device *dev, struct scatterlist *sgl,
> +			     int nents, enum dma_data_direction direction,
> +			     struct dma_attrs *attrs)
> +{
> +	struct dma_map_ops *new = choose(dev);
> +
> +	set_dma_ops(dev, new);
> +	return new->map_sg(dev, sgl, nents, direction, attrs);
> +}
> +
> +
> +static int dma_choose64_dma_supported(struct device *dev, u64 mask)
> +{
> +	return dma_direct_ops.dma_supported(dev, mask) ||
> +		dma_iommu_ops.dma_supported(dev, mask);
> +}
> +
> +static dma_addr_t dma_choose64_map_page(struct device *dev,
> +					     struct page *page,
> +					     unsigned long offset,
> +					     size_t size,
> +					     enum dma_data_direction dir,
> +					     struct dma_attrs *attrs)
> +{
> +	struct dma_map_ops *new = choose(dev);
> +
> +	set_dma_ops(dev, new);
> +	return new->map_page(dev, page, offset, size, dir, attrs);
> +}
> +
> +struct dma_map_ops dma_choose64_ops = {
> +	.alloc_coherent	= dma_choose64_alloc_coherent,
> +	.free_coherent	= dma_bug_free_coherent,
> +	.map_sg		= dma_choose64_map_sg,
> +	.unmap_sg	= dma_bug_unmap_sg,
> +	.dma_supported	= dma_choose64_dma_supported,
> +	.map_page	= dma_choose64_map_page,
> +	.unmap_page	= dma_bug_unmap_page,
> +};
> +EXPORT_SYMBOL(dma_choose64_ops);
> +
> +/* set these up to BUG() until we initialize them in the arch initcall below */
> +struct dma_map_ops dma_iommu_coherent_ops = {
> +	.alloc_coherent	= dma_bug_alloc_coherent,
> +	.free_coherent	= dma_bug_free_coherent,
> +	.map_sg		= dma_bug_map_sg,
> +	.unmap_sg	= dma_bug_unmap_sg,
> +	.dma_supported	= dma_bug_dma_supported,
> +	.map_page	= dma_bug_map_page,
> +	.unmap_page	= dma_bug_unmap_page,
> +};
> +EXPORT_SYMBOL(dma_iommu_coherent_ops);
> +
> +static int setup_choose64_ops(void)
> +{
> +	dma_iommu_coherent_ops = dma_direct_ops;
> +	dma_iommu_coherent_ops.alloc_coherent = dma_iommu_ops.alloc_coherent;
> +	dma_iommu_coherent_ops.free_coherent = dma_iommu_ops.free_coherent;
> +
> +	/* should we be stricter? */
> +	dma_iommu_coherent_ops.dma_supported = dma_choose64_dma_supported;
> +
> +	return 0;
> +}
> +arch_initcall(setup_choose64_ops);

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 03/11] ppc: Create ops to choose between direct window and iommu based on device mask
  2010-10-08 17:33 ` [RFC PATCH 03/11] ppc: Create ops to choose between direct window and iommu based on device mask Nishanth Aravamudan
  2010-10-08 23:43   ` Benjamin Herrenschmidt
@ 2010-10-08 23:44   ` Benjamin Herrenschmidt
  2010-10-10 15:09     ` FUJITA Tomonori
  1 sibling, 1 reply; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-08 23:44 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Anton Vorontsov, miltonm, linux-kernel, FUJITA Tomonori,
	Paul Mackerras, Scott Wood, Andrew Morton, linuxppc-dev

On Fri, 2010-10-08 at 10:33 -0700, Nishanth Aravamudan wrote:
> Also allow the coherent ops to be iommu if only the coherent mask is too
> small, mostly for drivers that do not set the coherent mask but also
> don't use the coherent api.

You are doing the transition at map_sg time which is a hot path, I don't
like that. Also you add all those "choose" variants of the dma ops...
not very nice at all.

You may want to look at the patches I posted to the list a while back
for doing direct DMA on Bimini:

[PATCH 1/2] powerpc/dma: Add optional platform override of dma_set_mask()
[PATCH 2/2] powerpc/dart_iommu: Support for 64-bit iommu bypass window on PCIe
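
A rough userspace illustration of picking the ops when the mask is set
rather than at map time. This is not code from those patches; all ex_*
names and the direct_limit field are hypothetical, and the real ops
structures are reduced to a name.

```c
#include <stdint.h>

/* Toy stand-in for struct dma_map_ops. */
struct ex_dma_ops {
	const char *name;
};

static const struct ex_dma_ops ex_direct_ops = { "direct" };
static const struct ex_dma_ops ex_iommu_ops = { "iommu" };

/* Toy stand-in for the per-device DMA state. */
struct ex_device {
	const struct ex_dma_ops *ops;
	uint64_t dma_mask;
	uint64_t direct_limit;	/* highest bus address the 1:1 window uses */
	int has_direct_window;
};

/*
 * Choose the ops once, when the driver declares its mask, so the
 * map/unmap paths never have to make the decision.
 */
static int ex_set_dma_mask(struct ex_device *dev, uint64_t mask)
{
	if (dev->has_direct_window && mask >= dev->direct_limit)
		dev->ops = &ex_direct_ops;
	else
		dev->ops = &ex_iommu_ops;

	dev->dma_mask = mask;
	return 0;
}
```

A device that sets a full 64-bit mask ends up on the direct ops, and
one that sets a 32-bit mask stays on the iommu ops, with no per-map
check.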

Cheers,
Ben.

> Signed-off-by: Milton Miller <miltonm@bga.com>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> ---
>  arch/powerpc/include/asm/dma-mapping.h |    2 +
>  arch/powerpc/kernel/Makefile           |    2 +-
>  arch/powerpc/kernel/dma-choose64.c     |  167 ++++++++++++++++++++++++++++++++
>  3 files changed, 170 insertions(+), 1 deletions(-)
>  create mode 100644 arch/powerpc/kernel/dma-choose64.c
> 
> diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
> index 644103a..9ffb16a 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -68,6 +68,8 @@ static inline unsigned long device_to_mask(struct device *dev)
>   */
>  #ifdef CONFIG_PPC64
>  extern struct dma_map_ops dma_iommu_ops;
> +extern struct dma_map_ops dma_choose64_ops;
> +extern struct dma_map_ops dma_iommu_coherent_ops;
>  #endif
>  extern struct dma_map_ops dma_direct_ops;
>  
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 1dda701..21b8ea1 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -82,7 +82,7 @@ obj-y				+= time.o prom.o traps.o setup-common.o \
>  				   udbg.o misc.o io.o dma.o \
>  				   misc_$(CONFIG_WORD_SIZE).o
>  obj-$(CONFIG_PPC32)		+= entry_32.o setup_32.o
> -obj-$(CONFIG_PPC64)		+= dma-iommu.o iommu.o
> +obj-$(CONFIG_PPC64)		+= dma-iommu.o iommu.o dma-choose64.o
>  obj-$(CONFIG_KGDB)		+= kgdb.o
>  obj-$(CONFIG_PPC_OF_BOOT_TRAMPOLINE)	+= prom_init.o
>  obj-$(CONFIG_MODULES)		+= ppc_ksyms.o
> diff --git a/arch/powerpc/kernel/dma-choose64.c b/arch/powerpc/kernel/dma-choose64.c
> new file mode 100644
> index 0000000..17c716f
> --- /dev/null
> +++ b/arch/powerpc/kernel/dma-choose64.c
> @@ -0,0 +1,167 @@
> +/*
> + * Copyright (C) 2006 Benjamin Herrenschmidt, IBM Corporation
> + *
> + * Provide default implementations of the DMA mapping callbacks for
> + * directly mapped busses.
> + */
> +
> +#include <linux/device.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/bug.h>
> +
> +/*
> + * DMA operations that choose between a 64-bit direct mapping and an iommu
> + *
> + * This set of dma ops chooses between directing to a static 1:1 mapping
> + * that may require a 64 bit address and an iommu based on the declared
> + * streaming and coherent masks for the device.  The choice is made on
> + * the first dma map call.
> + */
> +
> +/* first BUG ops for calls out of sequence */
> +
> +void *dma_bug_alloc_coherent(struct device *dev, size_t size,
> +				dma_addr_t *dma_handle, gfp_t flag)
> +{
> +	BUG();
> +
> +	return NULL;
> +}
> +
> +void dma_bug_free_coherent(struct device *dev, size_t size,
> +			      void *vaddr, dma_addr_t dma_handle)
> +{
> +	BUG();
> +}
> +
> +static int dma_bug_dma_supported(struct device *dev, u64 mask)
> +{
> +	BUG();
> +
> +	return 0;
> +}
> +
> +static int dma_bug_map_sg(struct device *dev, struct scatterlist *sgl,
> +			     int nents, enum dma_data_direction direction,
> +			     struct dma_attrs *attrs)
> +{
> +	BUG();
> +
> +	return 0;
> +}
> +
> +
> +static void dma_bug_unmap_sg(struct device *dev, struct scatterlist *sg,
> +				int nents, enum dma_data_direction direction,
> +				struct dma_attrs *attrs)
> +{
> +	BUG();
> +}
> +
> +static dma_addr_t dma_bug_map_page(struct device *dev,
> +					     struct page *page,
> +					     unsigned long offset,
> +					     size_t size,
> +					     enum dma_data_direction dir,
> +					     struct dma_attrs *attrs)
> +{
> +	BUG();
> +
> +	return DMA_ERROR_CODE;
> +}
> +
> +
> +static void dma_bug_unmap_page(struct device *dev,
> +					 dma_addr_t dma_address,
> +					 size_t size,
> +					 enum dma_data_direction direction,
> +					 struct dma_attrs *attrs)
> +{
> +	BUG();
> +}
> +
> +
> +static struct dma_map_ops *choose(struct device *dev)
> +{
> +	if (dma_direct_ops.dma_supported(dev, device_to_mask(dev))) {
> +		if (dma_direct_ops.dma_supported(dev, dev->coherent_dma_mask))
> +			return &dma_direct_ops;
> +		return &dma_iommu_coherent_ops;
> +	}
> +	return &dma_iommu_ops;
> +}
> +
> +void *dma_choose64_alloc_coherent(struct device *dev, size_t size,
> +				dma_addr_t *dma_handle, gfp_t flag)
> +{
> +	struct dma_map_ops *new = choose(dev);
> +
> +	set_dma_ops(dev, new);
> +	return new->alloc_coherent(dev, size, dma_handle, flag);
> +}
> +
> +static int dma_choose64_map_sg(struct device *dev, struct scatterlist *sgl,
> +			     int nents, enum dma_data_direction direction,
> +			     struct dma_attrs *attrs)
> +{
> +	struct dma_map_ops *new = choose(dev);
> +
> +	set_dma_ops(dev, new);
> +	return new->map_sg(dev, sgl, nents, direction, attrs);
> +}
> +
> +
> +static int dma_choose64_dma_supported(struct device *dev, u64 mask)
> +{
> +	return dma_direct_ops.dma_supported(dev, mask) ||
> +		dma_iommu_ops.dma_supported(dev, mask);
> +}
> +
> +static dma_addr_t dma_choose64_map_page(struct device *dev,
> +					     struct page *page,
> +					     unsigned long offset,
> +					     size_t size,
> +					     enum dma_data_direction dir,
> +					     struct dma_attrs *attrs)
> +{
> +	struct dma_map_ops *new = choose(dev);
> +
> +	set_dma_ops(dev, new);
> +	return new->map_page(dev, page, offset, size, dir, attrs);
> +}
> +
> +struct dma_map_ops dma_choose64_ops = {
> +	.alloc_coherent	= dma_choose64_alloc_coherent,
> +	.free_coherent	= dma_bug_free_coherent,
> +	.map_sg		= dma_choose64_map_sg,
> +	.unmap_sg	= dma_bug_unmap_sg,
> +	.dma_supported	= dma_choose64_dma_supported,
> +	.map_page	= dma_choose64_map_page,
> +	.unmap_page	= dma_bug_unmap_page,
> +};
> +EXPORT_SYMBOL(dma_choose64_ops);
> +
> +/* set these up to BUG() until we initialize them in the arch initcall below */
> +struct dma_map_ops dma_iommu_coherent_ops = {
> +	.alloc_coherent	= dma_bug_alloc_coherent,
> +	.free_coherent	= dma_bug_free_coherent,
> +	.map_sg		= dma_bug_map_sg,
> +	.unmap_sg	= dma_bug_unmap_sg,
> +	.dma_supported	= dma_bug_dma_supported,
> +	.map_page	= dma_bug_map_page,
> +	.unmap_page	= dma_bug_unmap_page,
> +};
> +EXPORT_SYMBOL(dma_iommu_coherent_ops);
> +
> +static int setup_choose64_ops(void)
> +{
> +	dma_iommu_coherent_ops = dma_direct_ops;
> +	dma_iommu_coherent_ops.alloc_coherent = dma_iommu_ops.alloc_coherent;
> +	dma_iommu_coherent_ops.free_coherent = dma_iommu_ops.free_coherent;
> +
> +	/* should we be stricter? */
> +	dma_iommu_coherent_ops.dma_supported = dma_choose64_dma_supported;
> +
> +	return 0;
> +}
> +arch_initcall(setup_choose64_ops);
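
The decision that choose() makes in the quoted patch can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions: enum dma_choice, direct_supported(), choose_model() and the max_direct_addr parameter are hypothetical stand-ins, and treating the direct window as usable whenever the mask covers its highest address is a simplification of whatever dma_direct_ops.dma_supported() really checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the three ops tables the patch selects between. */
enum dma_choice {
	CHOICE_DIRECT,		/* stands in for dma_direct_ops */
	CHOICE_IOMMU_COHERENT,	/* stands in for dma_iommu_coherent_ops */
	CHOICE_IOMMU,		/* stands in for dma_iommu_ops */
};

/*
 * Assumption: the 1:1 window can serve a device only if the device's
 * mask covers the highest bus address the window may hand out.
 */
static bool direct_supported(uint64_t mask, uint64_t max_direct_addr)
{
	return mask >= max_direct_addr;
}

/* Same branch structure as choose() in the patch, on plain integers. */
static enum dma_choice choose_model(uint64_t streaming_mask,
				    uint64_t coherent_mask,
				    uint64_t max_direct_addr)
{
	assert(max_direct_addr != 0);

	if (direct_supported(streaming_mask, max_direct_addr)) {
		if (direct_supported(coherent_mask, max_direct_addr))
			return CHOICE_DIRECT;
		/* streaming goes direct, coherent allocations use the iommu */
		return CHOICE_IOMMU_COHERENT;
	}
	return CHOICE_IOMMU;
}
```

With an assumed 40-bit direct window, a device that has a 64-bit streaming mask but only a 32-bit coherent mask ends up on the mixed table, which is the case the commit message calls out.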


* Re: [RFC PATCH 03/11] ppc: Create ops to choose between direct window and iommu based on device mask
  2010-10-08 23:44   ` Benjamin Herrenschmidt
@ 2010-10-10 15:09     ` FUJITA Tomonori
  2010-10-10 23:41       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 17+ messages in thread
From: FUJITA Tomonori @ 2010-10-10 15:09 UTC (permalink / raw)
  To: benh
  Cc: avorontsov, miltonm, linux-kernel, fujita.tomonori, paulus,
	scottwood, nacc, akpm, linuxppc-dev

On Sat, 09 Oct 2010 10:44:53 +1100
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Fri, 2010-10-08 at 10:33 -0700, Nishanth Aravamudan wrote:
> > Also allow the coherent ops to be iommu if only the coherent mask is too
> > small, mostly for drivers that do not set the coherent mask but also
> > don't use the coherent api.
> 
> You are doing the transition at map_sg time which is a hot path, I don't
> like that. Also you add all those "choose" variants of the dma ops...
> not very nice at all.

Agreed, looks hacky.


> You may want to look at the patches I posted to the list a while back
> for doing direct DMA on Bimini:
> 
> [PATCH 1/2] powerpc/dma: Add optional platform override of dma_set_mask()

Would it be cleaner if each ppc dma_map_ops had its own set_dma_mask
and dma_set_mask simply called dma_map_ops->set_dma_mask?
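
A minimal userspace sketch of that dispatch, to make the shape of the suggestion concrete. Everything here is hypothetical and not the kernel's actual API: struct dma_ops_model, struct device_model, the model_* functions and the rule that the iommu accepts any mask of at least 32 bits are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct device_model;

/* Hypothetical, cut-down ops table with an optional set_dma_mask hook. */
struct dma_ops_model {
	int (*dma_supported)(struct device_model *dev, uint64_t mask);
	int (*set_dma_mask)(struct device_model *dev, uint64_t mask);
};

struct device_model {
	uint64_t dma_mask;
	const struct dma_ops_model *ops;
};

/* Assumption: this backend can serve any mask of at least 32 bits. */
static int model_iommu_supported(struct device_model *dev, uint64_t mask)
{
	(void)dev;
	return mask >= 0xffffffffULL;
}

static int model_iommu_set_dma_mask(struct device_model *dev, uint64_t mask)
{
	if (!model_iommu_supported(dev, mask))
		return -EIO;
	dev->dma_mask = mask;
	return 0;
}

/* Backend that provides its own set_dma_mask. */
static const struct dma_ops_model model_iommu_ops = {
	.dma_supported	= model_iommu_supported,
	.set_dma_mask	= model_iommu_set_dma_mask,
};

/* Backend without the hook, to exercise the generic fallback. */
static const struct dma_ops_model model_plain_ops = {
	.dma_supported	= model_iommu_supported,
	.set_dma_mask	= NULL,
};

/* Generic entry point: defer to the ops hook when one is provided. */
static int dma_set_mask_model(struct device_model *dev, uint64_t mask)
{
	assert(dev != NULL && dev->ops != NULL);

	if (dev->ops->set_dma_mask)
		return dev->ops->set_dma_mask(dev, mask);
	if (!dev->ops->dma_supported(dev, mask))
		return -EIO;
	dev->dma_mask = mask;
	return 0;
}
```

In this model a rejected mask leaves dev->dma_mask untouched, and a backend that does not fill in set_dma_mask still gets the generic behaviour.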


> [PATCH 2/2] powerpc/dart_iommu: Support for 64-bit iommu bypass window on PCIe


* Re: [RFC PATCH 03/11] ppc: Create ops to choose between direct window and iommu based on device mask
  2010-10-10 15:09     ` FUJITA Tomonori
@ 2010-10-10 23:41       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-10 23:41 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: avorontsov, miltonm, linux-kernel, paulus, scottwood, nacc, akpm,
	linuxppc-dev

On Mon, 2010-10-11 at 00:09 +0900, FUJITA Tomonori wrote:
> On Sat, 09 Oct 2010 10:44:53 +1100
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
> > On Fri, 2010-10-08 at 10:33 -0700, Nishanth Aravamudan wrote:
> > > Also allow the coherent ops to be iommu if only the coherent mask is too
> > > small, mostly for drivers that do not set the coherent mask but also
> > > don't use the coherent api.
> > 
> > You are doing the transition at map_sg time which is a hot path, I don't
> > like that. Also you add all those "choose" variants of the dma ops...
> > not very nice at all.
> 
> Agreed, looks hacky.
> 
> 
> > You may want to look at the patches I posted to the list a while back
> > for doing direct DMA on Bimini:
> > 
> > [PATCH 1/2] powerpc/dma: Add optional platform override of dma_set_mask()
> 
> Would it be cleaner if each ppc dma_map_ops had its own set_dma_mask
> and dma_set_mask simply called dma_map_ops->set_dma_mask?

I'm not sure I parse what you wrote above :-)

I did try with various methods back then, and what ended up sucking the
least was basically to hook up dma_set_mask() at the arch level.

In fact, it makes sense to the extent that the arch is the one that
knows that there are multiple regions configured potentially with
different capabilities.

You can still do the switch within the dma_ops->set_dma_mask if you want
I suppose, especially if you end up hitting different attribute regions
within a single bus or such but from my experience, it gets really hacky
with multiple ops structures etc...
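
The arch-level hook described above can be sketched in userspace as below. This is a rough model under assumptions, not the code from the Bimini patches: the platform_dma_set_mask pointer, struct hook_device, the hook_* ops tables and the 40-bit and 32-bit window limits are all hypothetical. The point it illustrates is that the choice between direct and iommu ops happens once, when the driver sets its mask, and not on the map_sg path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops tables; only their identity matters in this model. */
struct hook_ops {
	const char *name;
};

static const struct hook_ops hook_direct_ops = { "direct" };
static const struct hook_ops hook_iommu_ops  = { "iommu" };

struct hook_device {
	uint64_t dma_mask;
	const struct hook_ops *ops;
};

/* Assumed limits: a 40-bit bypass window and a 32-bit iommu window. */
#define HOOK_DIRECT_TOP	((1ULL << 40) - 1)
#define HOOK_IOMMU_TOP	0xffffffffULL

/* Optional platform override, NULL when the platform has no bypass. */
static int (*platform_dma_set_mask)(struct hook_device *dev, uint64_t mask);

/* Platform hook: pick the ops table once, based on the requested mask. */
static int bypass_dma_set_mask(struct hook_device *dev, uint64_t mask)
{
	if (mask >= HOOK_DIRECT_TOP)
		dev->ops = &hook_direct_ops;
	else if (mask >= HOOK_IOMMU_TOP)
		dev->ops = &hook_iommu_ops;
	else
		return -EIO;
	dev->dma_mask = mask;
	return 0;
}

/* Arch-level entry point: defer to the platform hook when it is set. */
static int arch_dma_set_mask_model(struct hook_device *dev, uint64_t mask)
{
	assert(dev != NULL);

	if (platform_dma_set_mask)
		return platform_dma_set_mask(dev, mask);
	if (mask < HOOK_IOMMU_TOP)
		return -EIO;
	dev->dma_mask = mask;	/* default path leaves dev->ops alone */
	return 0;
}
```

A platform with a bypass window would install bypass_dma_set_mask() at setup time. After that, a driver asking for a 64-bit mask gets the direct ops and one asking for 32 bits stays on the iommu ops, with no per-mapping decision.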
> 
> > [PATCH 2/2] powerpc/dart_iommu: Support for 64-bit iommu bypass window on PCIe
> 

Cheers,
Ben.


Thread overview: 17+ messages
2010-10-08 17:33 [RFC PATCH 00/11] ppc: enable dynamic dma window support Nishanth Aravamudan
2010-10-08 17:33 ` [RFC PATCH 01/11] macio: ensure all dma routines get copied over Nishanth Aravamudan
2010-10-08 17:33 ` [RFC PATCH 02/11] ppc: allow direct and iommu to coexist Nishanth Aravamudan
2010-10-08 23:38   ` Benjamin Herrenschmidt
2010-10-08 17:33 ` [RFC PATCH 03/11] ppc: Create ops to choose between direct window and iommu based on device mask Nishanth Aravamudan
2010-10-08 23:43   ` Benjamin Herrenschmidt
2010-10-08 23:44   ` Benjamin Herrenschmidt
2010-10-10 15:09     ` FUJITA Tomonori
2010-10-10 23:41       ` Benjamin Herrenschmidt
2010-10-08 17:33 ` [RFC PATCH 04/11] ppc: add memory_hotplug_max Nishanth Aravamudan
2010-10-08 17:33 ` [RFC PATCH 05/11] ppc: do not search for dma-window property on dlpar remove Nishanth Aravamudan
2010-10-08 17:33 ` [RFC PATCH 06/11] ppc: checking for pdn->parent is redundant Nishanth Aravamudan
2010-10-08 17:33 ` [RFC PATCH 07/11] ppc/iommu: do not need to check for dma_window == NULL Nishanth Aravamudan
2010-10-08 17:33 ` [RFC PATCH 08/11] ppc/iommu: remove unneeded pci_dma_bus_setup_pSeriesLP Nishanth Aravamudan
2010-10-08 17:33 ` [RFC PATCH 09/11] ppc/iommu: pass phb only to iommu_table_setparms_lpar Nishanth Aravamudan
2010-10-08 17:33 ` [RFC PATCH 10/11] ppc/iommu: add routines to pseries iommu to map tces 1-1 Nishanth Aravamudan
2010-10-08 17:33 ` [RFC PATCH 11/11] ppc: add dynamic dma window support Nishanth Aravamudan
