[PATCH 0/2] arm64, cma, gicv3-its: Use CMA for allocation of large device tables

* [PATCH 0/2] arm64, cma, gicv3-its: Use CMA for allocation of large device tables
@ 2016-02-25 11:02 ` Robert Richter
  0 siblings, 0 replies; 30+ messages in thread
From: Robert Richter @ 2016-02-25 11:02 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, Catalin Marinas, Greg Kroah-Hartman,
	Thomas Gleixner
  Cc: Tirumalesh Chalamarla, linux-arm-kernel, linux-mm, linux-kernel,
	Robert Richter

From: Robert Richter <rrichter@cavium.com>

This series implements the use of CMA for allocation of large device
tables for the arm64 gicv3 interrupt controller.

There are 2 patches. The first adds early activation of CMA, which
needs to be done before interrupt initialization to make it available
to the gicv3. The second implements the use of CMA to allocate
gicv3-its device tables.

This solves the problem where memory allocation is limited to 4MB. It
makes a previous patch obsolete that was sent to the list to address
this by increasing FORCE_MAX_ZONEORDER instead.

Robert Richter (2):
  mm: cma: arm64: Introduce dma_activate_contiguous() for early
    activation
  irqchip, gicv3-its, cma: Use CMA for allocation of large device tables

 arch/arm64/kernel/irq.c          |  4 ++++
 drivers/base/dma-contiguous.c    | 14 ++++++++++++++
 drivers/irqchip/irq-gic-v3-its.c | 30 +++++++++++++++++++++---------
 include/linux/cma.h              |  1 +
 include/linux/dma-contiguous.h   |  8 ++++++++
 mm/cma.c                         |  6 +++++-
 6 files changed, 53 insertions(+), 10 deletions(-)

-- 
2.7.0.rc3

* [PATCH 1/2] mm: cma: arm64: Introduce dma_activate_contiguous() for early activation
  2016-02-25 11:02 ` Robert Richter
  (?)
@ 2016-02-25 11:02   ` Robert Richter
  -1 siblings, 0 replies; 30+ messages in thread
From: Robert Richter @ 2016-02-25 11:02 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, Catalin Marinas, Greg Kroah-Hartman,
	Thomas Gleixner
  Cc: Tirumalesh Chalamarla, linux-arm-kernel, linux-mm, linux-kernel,
	Robert Richter

From: Robert Richter <rrichter@cavium.com>

For the arm64 gicv3 interrupt controller we need CMA to allocate large
blocks of physically contiguous memory. Usually the page allocator is
limited to 2^(MAX_ORDER - 1) pages, typically 4MB with 4k pages.
A current gicv3-its device table may have a size of up to 16MB.
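
For illustration, the arithmetic behind these numbers can be sketched
in userspace C. This assumes MAX_ORDER is 11 and a 4k page size, which
is what yields the 4MB limit quoted above. The sketch_* names are made
up for this example and are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values: 4k pages and MAX_ORDER = 11, matching the 4MB limit. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_MAX_ORDER 11

/* Largest block the page allocator can return: 2^(MAX_ORDER - 1) pages. */
static unsigned long sketch_max_alloc_bytes(void)
{
	return (1UL << (SKETCH_MAX_ORDER - 1)) * SKETCH_PAGE_SIZE;
}

/* Smallest order whose block covers size bytes (stand-in for get_order()). */
static unsigned int sketch_get_order(unsigned long size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* True if a table of this size is too large for the page allocator. */
static int sketch_needs_cma(unsigned long size)
{
	return sketch_get_order(size) >= SKETCH_MAX_ORDER;
}
```

Under these assumptions a 16MB table needs order 12, while the largest
order the page allocator serves is 10, so it cannot be allocated that
way.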

Since the interrupt controller is initialized before other subsystems
(initcall functions), the current activation of contiguous DMA
allocation (core_initcall) comes too late, which makes CMA unusable
for gicv3. On the other hand, it is generally possible to activate it
right after the kernel's memory initialization.

This patch implements dma_activate_contiguous() to allow
architectures to activate contiguous DMA allocation earlier. It also
calls the new function on arm64 directly before interrupt
initialization and thus makes CMA usable for gicv3's memory
allocation.
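
The early call is only safe if a later activation attempt, presumably
from the existing core_initcall, does not set up the areas a second
time. A minimal userspace model of the "already activated" guard that
the mm/cma.c hunk adds, with made-up sketch_* names and calloc()
standing in for the kernel's bitmap allocation:

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MAX_AREAS 4

/* Simplified stand-in for struct cma: only what the guard looks at. */
struct sketch_area {
	unsigned long *bitmap;	/* NULL until the area has been activated */
	unsigned long nbits;
};

static struct sketch_area sketch_areas[SKETCH_MAX_AREAS];
static int sketch_area_count;
static int sketch_activations;	/* counts real activations, for illustration */

static int sketch_activate_area(struct sketch_area *area)
{
	size_t bits_per_word = 8 * sizeof(unsigned long);
	size_t words;

	assert(area != NULL);
	words = (area->nbits + bits_per_word - 1) / bits_per_word;
	area->bitmap = calloc(words ? words : 1, sizeof(unsigned long));
	if (!area->bitmap)
		return -1;
	sketch_activations++;
	return 0;
}

/* Can be called early from arch code and again later without harm. */
static int sketch_init_reserved_areas(void)
{
	int i;

	if (sketch_area_count && sketch_areas[0].bitmap)
		return 0;	/* already activated */

	for (i = 0; i < sketch_area_count; i++) {
		int ret = sketch_activate_area(&sketch_areas[i]);

		if (ret)
			return ret;
	}
	return 0;
}
```

Like the patch, the model only checks the bitmap of the first area to
decide whether activation has already happened.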

Signed-off-by: Robert Richter <rrichter@cavium.com>
---
 arch/arm64/kernel/irq.c        |  4 ++++
 drivers/base/dma-contiguous.c  | 14 ++++++++++++++
 include/linux/cma.h            |  1 +
 include/linux/dma-contiguous.h |  8 ++++++++
 mm/cma.c                       |  6 +++++-
 5 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index 9f17ec071ee0..913b32021f50 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -27,6 +27,7 @@
 #include <linux/init.h>
 #include <linux/irqchip.h>
 #include <linux/seq_file.h>
+#include <linux/dma-contiguous.h>
 
 unsigned long irq_err_count;
 
@@ -49,6 +50,9 @@ void __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
 
 void __init init_IRQ(void)
 {
+	/* early activate cma since some gic controllers need it */
+	dma_activate_contiguous();
+
 	irqchip_init();
 	if (!handle_arch_irq)
 		panic("No interrupt controller found.");
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index e167a1e1bccb..1c73d4899e8d 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -212,6 +212,20 @@ bool dma_release_from_contiguous(struct device *dev, struct page *pages,
 	return cma_release(dev_get_cma_area(dev), pages, count);
 }
 
+/**
+ * dma_activate_contiguous() - activate reserved areas for contiguous
+ *			       memory handling
+ *
+ * This function enables contiguous memory allocation. It can be used
+ * by archs for early initialization right after the kernel memory
+ * subsystem (like slab allocator) is available and if the
+ * core_initcall for it is too late.
+ */
+int __init dma_activate_contiguous(void)
+{
+	return cma_init_reserved_areas();
+}
+
 /*
  * Support for reserved memory regions defined in device tree
  */
diff --git a/include/linux/cma.h b/include/linux/cma.h
index 29f9e774ab76..c2ab619769e6 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -23,6 +23,7 @@ extern int __init cma_declare_contiguous(phys_addr_t base,
 			phys_addr_t size, phys_addr_t limit,
 			phys_addr_t alignment, unsigned int order_per_bit,
 			bool fixed, struct cma **res_cma);
+extern int __init cma_init_reserved_areas(void);
 extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 					unsigned int order_per_bit,
 					struct cma **res_cma);
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
index fec734df1524..07d542b8bb4d 100644
--- a/include/linux/dma-contiguous.h
+++ b/include/linux/dma-contiguous.h
@@ -111,6 +111,8 @@ static inline int dma_declare_contiguous(struct device *dev, phys_addr_t size,
 	return ret;
 }
 
+int __init dma_activate_contiguous(void);
+
 struct page *dma_alloc_from_contiguous(struct device *dev, size_t count,
 				       unsigned int order);
 bool dma_release_from_contiguous(struct device *dev, struct page *pages,
@@ -157,6 +159,12 @@ bool dma_release_from_contiguous(struct device *dev, struct page *pages,
 	return false;
 }
 
+static inline
+int dma_activate_contiguous(void)
+{
+	return -ENOSYS;
+}
+
 #endif
 
 #endif
diff --git a/mm/cma.c b/mm/cma.c
index ea506eb18cd6..be1f55782c25 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -142,10 +142,14 @@ static int __init cma_activate_area(struct cma *cma)
 	return -EINVAL;
 }
 
-static int __init cma_init_reserved_areas(void)
+int __init cma_init_reserved_areas(void)
 {
 	int i;
 
+	if (cma_area_count && cma_areas[0].bitmap)
+		/* Already activated */
+		return 0;
+
 	for (i = 0; i < cma_area_count; i++) {
 		int ret = cma_activate_area(&cma_areas[i]);
 
-- 
2.7.0.rc3

* [PATCH 2/2] irqchip, gicv3-its, cma: Use CMA for allocation of large device tables
  2016-02-25 11:02 ` Robert Richter
  (?)
@ 2016-02-25 11:02   ` Robert Richter
  -1 siblings, 0 replies; 30+ messages in thread
From: Robert Richter @ 2016-02-25 11:02 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, Catalin Marinas, Greg Kroah-Hartman,
	Thomas Gleixner, Jason Cooper
  Cc: Tirumalesh Chalamarla, linux-arm-kernel, linux-mm, linux-kernel,
	Robert Richter

From: Robert Richter <rrichter@cavium.com>

The gicv3-its device table may have a size of up to 16MB. With 4k
pagesize the maximum size of memory allocation is 4MB. Use CMA for
allocation of large tables.
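
The allocation strategy in the diff below can be modelled in userspace
roughly as follows. This is only a sketch under stated assumptions:
MAX_ORDER is taken to be 11 with 4k pages, the sketch_* names and the
backend hook are hypothetical, and malloc()/calloc() stand in for the
kernel allocators. The page allocator path in the diff asks for zeroed
pages while the CMA call takes no gfp flags, so the sketch zeroes the
CMA buffer explicitly rather than assume it arrives zeroed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_MAX_ORDER 11	/* assumed, as for 4k pages with a 4MB limit */

/* Hypothetical backend hook: returns memory or NULL, as a CMA area might. */
typedef void *(*sketch_cma_alloc_fn)(size_t bytes);

struct sketch_table {
	void *base;
	unsigned int order;	/* order actually used */
	int from_cma;		/* remembered so the right free path can be used */
};

static int sketch_alloc_table(struct sketch_table *t, unsigned int order,
			      sketch_cma_alloc_fn cma_alloc)
{
	size_t bytes;

	assert(t != NULL);
	t->base = NULL;
	t->from_cma = 0;

	if (order >= SKETCH_MAX_ORDER) {
		bytes = SKETCH_PAGE_SIZE << order;
		t->base = cma_alloc ? cma_alloc(bytes) : NULL;
		if (t->base) {
			memset(t->base, 0, bytes);
			t->from_cma = 1;
		} else {
			/* CMA failed: fall back to the largest regular order */
			order = SKETCH_MAX_ORDER - 1;
		}
	}
	if (!t->base) {
		bytes = SKETCH_PAGE_SIZE << order;
		t->base = calloc(1, bytes);	/* stand-in for zeroed pages */
	}
	t->order = order;
	return t->base ? 0 : -1;
}

/* Two toy backends: one that succeeds and one that always fails. */
static void *sketch_cma_ok(size_t bytes)
{
	return malloc(bytes);
}

static void *sketch_cma_fail(size_t bytes)
{
	(void)bytes;
	return NULL;
}
```

The from_cma flag is an addition of the sketch. The hunks quoted here
do not show how a CMA-backed table is released on the error path, so
the model at least records which allocator served the request.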

Signed-off-by: Robert Richter <rrichter@cavium.com>
---
 drivers/irqchip/irq-gic-v3-its.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 443ba8892f6f..c8914026d0e4 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -19,6 +19,7 @@
 #include <linux/bitmap.h>
 #include <linux/cpu.h>
 #include <linux/delay.h>
+#include <linux/dma-contiguous.h>
 #include <linux/interrupt.h>
 #include <linux/irqdomain.h>
 #include <linux/iort.h>
@@ -860,6 +861,7 @@ static int its_alloc_tables(struct its_node *its)
 		int alloc_pages;
 		u64 tmp;
 		void *base;
+		struct page *page;
 
 		if (type == GITS_BASER_TYPE_NONE)
 			continue;
@@ -881,13 +883,8 @@ static int its_alloc_tables(struct its_node *its)
 			 */
 			order = max(get_order((1UL << ids) * entry_size),
 				    order);
-			if (order >= MAX_ORDER) {
-				order = MAX_ORDER - 1;
-				pr_warn("ITS@0x%lx: Device Table too large, reduce its page order to %u\n",
-					its->phys_base, order);
-			}
 		}
-
+retry_alloc:
 		alloc_size = (1 << order) * PAGE_SIZE;
 		alloc_pages = (alloc_size / psz);
 		if (alloc_pages > GITS_BASER_PAGES_MAX) {
@@ -897,8 +894,22 @@ static int its_alloc_tables(struct its_node *its)
 				its->phys_base, order, alloc_pages);
 		}
 
-		base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
+		if (order >= MAX_ORDER) {
+			page = dma_alloc_from_contiguous(NULL, 1 << order, 0);
+			base = page ? page_address(page) : NULL;
+			if (!base) {
+				order = MAX_ORDER - 1;
+				pr_warn("ITS@0x%lx: Device table too large, reduce its page order to %u\n",
+					its->phys_base, order);
+				goto retry_alloc;
+			}
+		} else {
+			base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
+		}
+
 		if (!base) {
+			pr_err("ITS@0x%lx: Failed to allocate device table\n",
+				its->phys_base);
 			err = -ENOMEM;
 			goto out_free;
 		}
@@ -970,11 +981,12 @@ static int its_alloc_tables(struct its_node *its)
 			goto out_free;
 		}
 
-		pr_info("ITS: allocated %d %s @%lx (psz %dK, shr %d)\n",
+		pr_info("ITS: allocated %d %s @%lx (psz %dK, shr %d)%s\n",
 			(int)(alloc_size / entry_size),
 			its_base_type_string[type],
 			(unsigned long)virt_to_phys(base),
-			psz / SZ_1K, (int)shr >> GITS_BASER_SHAREABILITY_SHIFT);
+			psz / SZ_1K, (int)shr >> GITS_BASER_SHAREABILITY_SHIFT,
+			order >= MAX_ORDER ? " using CMA" : "");
 	}
 
 	return 0;
-- 
2.7.0.rc3

* Re: [PATCH 0/2] arm64, cma, gicv3-its: Use CMA for allocation of large device tables
  2016-02-25 11:02 ` Robert Richter
  (?)
@ 2016-02-29 10:46   ` Marc Zyngier
  -1 siblings, 0 replies; 30+ messages in thread
From: Marc Zyngier @ 2016-02-29 10:46 UTC (permalink / raw)
  To: Robert Richter, Will Deacon, Catalin Marinas, Greg Kroah-Hartman,
	Thomas Gleixner
  Cc: Tirumalesh Chalamarla, linux-arm-kernel, linux-mm, linux-kernel,
	Robert Richter

Hi Robert,

On 25/02/16 11:02, Robert Richter wrote:
> From: Robert Richter <rrichter@cavium.com>
> 
> This series implements the use of CMA for allocation of large device
> tables for the arm64 gicv3 interrupt controller.
> 
> There are 2 patches, the first is for early activation of cma, which
> needs to be done before interrupt initialization to make it available
> to the gicv3. The second implements the use of CMA to allocate
> gicv3-its device tables.
> 
> This solves the problem where mem allocation is limited to 4MB. A
> previous patch sent to the list to address this that instead increases
> FORCE_MAX_ZONEORDER becomes obsolete.

I think you're looking at the problem the wrong way. Instead of going
through CMA directly, I'd rather go through the normal DMA API
(dma_alloc_coherent), which can itself try CMA (should it be enabled).

That will give you all the benefit of the CMA allocation, and also make
the driver more robust. I've meant to do this for a while, but never
found the time. Any chance you could have a look?
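
To illustrate the layering idea, here is a rough userspace model in
which the driver calls a single allocation entry point and the layer
below decides whether CMA is used. Everything named sketch_* is
hypothetical. The real dma_alloc_coherent() takes a device, a size, a
DMA handle pointer and gfp flags, none of which are modelled here, and
the fallback order shown is an assumption of the sketch, not a
description of any architecture's implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocator layer: hides which backend serves a request. */
struct sketch_dma_ops {
	int cma_enabled;
	size_t buddy_limit;			/* largest size the fallback serves */
	void *(*cma_alloc)(size_t bytes);	/* may return NULL */
};

/* Single entry point for drivers; CMA is tried only if it is enabled. */
static void *sketch_dma_alloc(const struct sketch_dma_ops *ops, size_t bytes)
{
	void *p = NULL;

	assert(ops != NULL);
	if (ops->cma_enabled && ops->cma_alloc)
		p = ops->cma_alloc(bytes);
	if (!p && bytes <= ops->buddy_limit)
		p = calloc(1, bytes);	/* stand-in for the page allocator */
	return p;
}

/* Toy CMA backend that always succeeds and returns zeroed memory. */
static void *sketch_cma_backend(size_t bytes)
{
	return calloc(1, bytes);
}
```

With this shape the driver no longer compares the order against
MAX_ORDER itself, and a kernel built without CMA simply fails large
requests in one place.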

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

* [PATCH 0/2] arm64, cma, gicv3-its: Use CMA for allocation of large device tables
@ 2016-02-29 10:46   ` Marc Zyngier
  0 siblings, 0 replies; 30+ messages in thread
From: Marc Zyngier @ 2016-02-29 10:46 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Robert,

On 25/02/16 11:02, Robert Richter wrote:
> From: Robert Richter <rrichter@cavium.com>
> 
> This series implements the use of CMA for allocation of large device
> tables for the arm64 gicv3 interrupt controller.
> 
> There are 2 patches, the first is for early activation of cma, which
> needs to be done before interrupt initialization to make it available
> to the gicv3. The second implements the use of CMA to allocate
> gicv3-its device tables.
> 
> This solves the problem where mem allocation is limited to 4MB. A
> previous patch sent to the list to address this that instead increases
> FORCE_MAX_ZONEORDER becomes obsolete.

I think you're looking at the problem the wrong way. Instead of going
through CMA directly, I'd rather go through the normal DMA API
(dma_alloc_coherent), which can itself try CMA (should it be enabled).

That will give you all the benefit of the CMA allocation, and also make
the driver more robust. I meant to do this for a while, and never found
the time. Any chance you could have a look?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] arm64, cma, gicv3-its: Use CMA for allocation of large device tables
  2016-02-29 10:46   ` Marc Zyngier
  (?)
@ 2016-02-29 12:25     ` Robert Richter
  -1 siblings, 0 replies; 30+ messages in thread
From: Robert Richter @ 2016-02-29 12:25 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Will Deacon, Catalin Marinas, Greg Kroah-Hartman,
	Thomas Gleixner, Tirumalesh Chalamarla, linux-arm-kernel,
	linux-mm, linux-kernel

On 29.02.16 10:46:49, Marc Zyngier wrote:
> On 25/02/16 11:02, Robert Richter wrote:
> > From: Robert Richter <rrichter@cavium.com>
> > 
> > This series implements the use of CMA for allocation of large device
> > tables for the arm64 gicv3 interrupt controller.
> > 
> > There are 2 patches, the first is for early activation of cma, which
> > needs to be done before interrupt initialization to make it available
> > to the gicv3. The second implements the use of CMA to allocate
> > gicv3-its device tables.
> > 
> > This solves the problem where mem allocation is limited to 4MB. A
> > previous patch sent to the list to address this that instead increases
> > FORCE_MAX_ZONEORDER becomes obsolete.
> 
> I think you're looking at the problem the wrong way. Instead of going
> through CMA directly, I'd rather go through the normal DMA API
> (dma_alloc_coherent), which can itself try CMA (should it be enabled).
> 
> That will give you all the benefit of the CMA allocation, and also make
> the driver more robust. I meant to do this for a while, and never found
> the time. Any chance you could have a look?

I considered this first, and in fact the backend used is the
same. The problem is that irq initialization happens much earlier than
standard device probing. The gic does not even have its own struct
device and is not initialized like devices are. This makes the whole
dma_alloc_coherent() approach infeasible; at the least, it would
require introducing and using a dev struct for the gic. Even then it
might not work, as it could be too early during boot. I also think
there were reasons for not implementing the gic as a device.

I was rather following the approach of iommu/mmu implementations, which
use dma_alloc_from_contiguous() directly. I think this is closer
to the device tables for the ITS.

Code path of dma_alloc_coherent():

 dma_alloc_coherent()
    v
 dma_alloc_attrs()             <---- Requires get_dma_ops(dev) != NULL
    v
 dma_alloc_from_coherent()
    v
 ...

The difference is that dma_alloc_coherent() tries cma first and then
proceeds with ops->alloc() (which is __dma_alloc() for arm64) if
dma_alloc_from_coherent() fails. In my implementation I am directly
using dma_alloc_from_coherent() and only for large mem sizes.
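
The two-stage order in the diagram (a first-stage attempt, then a
fallback to ops->alloc(), with valid dma ops required up front) can be
modelled in userspace roughly as below. This is only a sketch: every
model_* name is invented for illustration, malloc() stands in for the
arch backend, and the kernel's real behaviour on missing ops differs
(the model just returns NULL):

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only; none of these types exist in the kernel. */
struct model_dma_ops {
	void *(*alloc)(size_t size);	/* stands in for ops->alloc() */
};

struct model_device {
	const struct model_dma_ops *ops;	/* NULL models "no dma ops" */
	void *pool;				/* first-stage pool, may be NULL */
	size_t pool_size;
	int pool_used;
};

/* First stage: hand out the device's pool if it exists and fits. */
static void *model_alloc_from_pool(struct model_device *dev, size_t size)
{
	if (!dev->pool || dev->pool_used || size > dev->pool_size)
		return NULL;
	dev->pool_used = 1;
	return dev->pool;
}

/* Second stage: generic backend, here simply malloc(). */
static void *model_backend_alloc(size_t size)
{
	return malloc(size);
}

static void *model_dma_alloc(struct model_device *dev, size_t size)
{
	void *p;

	/* The requirement marked in the diagram: a device with dma ops. */
	if (!dev || !dev->ops)
		return NULL;

	p = model_alloc_from_pool(dev, size);
	if (p)
		return p;

	return dev->ops->alloc ? dev->ops->alloc(size) : NULL;
}
```

The point of the model is the first check: without a device carrying
dma ops there is no way into either stage, which is the situation the
gic is in during early boot.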

So both approaches end up using the same allocation, but for gicv3-its
the generic dma framework is not used since the gic is not implemented
as a device.

Does this make sense to you?

Thanks,

-Robert

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] arm64, cma, gicv3-its: Use CMA for allocation of large device tables
  2016-02-29 12:25     ` Robert Richter
  (?)
@ 2016-02-29 13:30       ` Marc Zyngier
  -1 siblings, 0 replies; 30+ messages in thread
From: Marc Zyngier @ 2016-02-29 13:30 UTC (permalink / raw)
  To: Robert Richter
  Cc: Will Deacon, Catalin Marinas, Greg Kroah-Hartman,
	Thomas Gleixner, Tirumalesh Chalamarla, linux-arm-kernel,
	linux-mm, linux-kernel

On 29/02/16 12:25, Robert Richter wrote:
> On 29.02.16 10:46:49, Marc Zyngier wrote:
>> On 25/02/16 11:02, Robert Richter wrote:
>>> From: Robert Richter <rrichter@cavium.com>
>>>
>>> This series implements the use of CMA for allocation of large device
>>> tables for the arm64 gicv3 interrupt controller.
>>>
>>> There are 2 patches, the first is for early activation of cma, which
>>> needs to be done before interrupt initialization to make it available
>>> to the gicv3. The second implements the use of CMA to allocate
>>> gicv3-its device tables.
>>>
>>> This solves the problem where mem allocation is limited to 4MB. A
>>> previous patch sent to the list to address this that instead increases
>>> FORCE_MAX_ZONEORDER becomes obsolete.
>>
>> I think you're looking at the problem the wrong way. Instead of going
>> through CMA directly, I'd rather go through the normal DMA API
>> (dma_alloc_coherent), which can itself try CMA (should it be enabled).
>>
>> That will give you all the benefit of the CMA allocation, and also make
>> the driver more robust. I meant to do this for a while, and never found
>> the time. Any chance you could have a look?
> 
> I considered this first, and in fact the backend used is the
> same. The problem is that irq initialization happens much earlier than
> standard device probing. The gic does not even have its own struct
> device and is not initialized like devices are. This makes the whole
> dma_alloc_coherent() approach infeasible; at the least, it would
> require introducing and using a dev struct for the gic. Even then it
> might not work, as it could be too early during boot. I also think
> there were reasons for not implementing the gic as a device.
> 
> I was rather following the approach of iommu/mmu implementations, which
> use dma_alloc_from_contiguous() directly. I think this is closer
> to the device tables for the ITS.
> 
> Code path of dma_alloc_coherent():
> 
>  dma_alloc_coherent()
>     v
>  dma_alloc_attrs()             <---- Requires get_dma_ops(dev) != NULL
>     v
>  dma_alloc_from_coherent()
>     v
>  ...
> 
> The difference is that dma_alloc_coherent() tries cma first and then
> proceeds with ops->alloc() (which is __dma_alloc() for arm64) if
> dma_alloc_from_coherent() fails. In my implementation I am directly
> using dma_alloc_from_coherent() and only for large mem sizes.
> 
> So both approaches end up using the same allocation, but for gicv3-its
> the generic dma framework is not used since the gic is not implemented
> as a device.

And that's what I propose we change.

The core GIC itself indeed isn't a device, and I'm not proposing we make
it a device (yet). But the ITS is only used much later in the game, and
we could move the table allocation to a different time (when the actual
domains are allocated, for example...). Then, we'd have a set of devices
available, and the DMA API is our friend again.
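
As a rough userspace illustration of deferring the table allocation,
the sketch below records the required size at early init and only
allocates when the first domain is set up. All names are hypothetical
and calloc() stands in for whatever device-backed allocator would be
used at that point; it is not the ITS driver's code:

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace model; struct and function names are invented. */
struct model_its {
	void *dev_table;	/* NULL until the first domain needs it */
	size_t dev_table_size;
	int alloc_count;	/* number of real allocations performed */
};

/* Early init: remember the size, but allocate nothing this early. */
static void model_its_early_init(struct model_its *its, size_t table_size)
{
	its->dev_table = NULL;
	its->dev_table_size = table_size;
	its->alloc_count = 0;
}

/*
 * Domain setup: runs late enough that a proper allocator (and a
 * device to hand to it) would exist. Allocates the table once.
 */
static int model_its_domain_alloc(struct model_its *its)
{
	if (!its->dev_table) {
		its->dev_table = calloc(1, its->dev_table_size);
		if (!its->dev_table)
			return -1;
		its->alloc_count++;
	}
	return 0;
}
```

Calling model_its_domain_alloc() repeatedly allocates the table only on
the first call; later calls reuse it.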

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] arm64, cma, gicv3-its: Use CMA for allocation of large device tables
  2016-02-29 13:30       ` Marc Zyngier
  (?)
@ 2016-02-29 23:17         ` Laura Abbott
  -1 siblings, 0 replies; 30+ messages in thread
From: Laura Abbott @ 2016-02-29 23:17 UTC (permalink / raw)
  To: Marc Zyngier, Robert Richter
  Cc: Will Deacon, Catalin Marinas, Greg Kroah-Hartman,
	Thomas Gleixner, Tirumalesh Chalamarla, linux-arm-kernel,
	linux-mm, linux-kernel

On 02/29/2016 05:30 AM, Marc Zyngier wrote:
> On 29/02/16 12:25, Robert Richter wrote:
>> On 29.02.16 10:46:49, Marc Zyngier wrote:
>>> On 25/02/16 11:02, Robert Richter wrote:
>>>> From: Robert Richter <rrichter@cavium.com>
>>>>
>>>> This series implements the use of CMA for allocation of large device
>>>> tables for the arm64 gicv3 interrupt controller.
>>>>
>>>> There are 2 patches, the first is for early activation of cma, which
>>>> needs to be done before interrupt initialization to make it available
>>>> to the gicv3. The second implements the use of CMA to allocate
>>>> gicv3-its device tables.
>>>>
>>>> This solves the problem where mem allocation is limited to 4MB. A
>>>> previous patch sent to the list to address this that instead increases
>>>> FORCE_MAX_ZONEORDER becomes obsolete.
>>>
>>> I think you're looking at the problem the wrong way. Instead of going
>>> through CMA directly, I'd rather go through the normal DMA API
>>> (dma_alloc_coherent), which can itself try CMA (should it be enabled).
>>>
>>> That will give you all the benefit of the CMA allocation, and also make
>>> the driver more robust. I meant to do this for a while, and never found
>>> the time. Any chance you could have a look?
>>
>> I considered this first, and in fact the backend used is the
>> same. The problem is that irq initialization happens much earlier than
>> standard device probing. The gic does not even have its own struct
>> device and is not initialized like devices are. This makes the whole
>> dma_alloc_coherent() approach infeasible; at the least, it would
>> require introducing and using a dev struct for the gic. Even then it
>> might not work, as it could be too early during boot. I also think
>> there were reasons for not implementing the gic as a device.
>>
>> I was rather following the approach of iommu/mmu implementations, which
>> use dma_alloc_from_contiguous() directly. I think this is closer
>> to the device tables for the ITS.
>>
>> Code path of dma_alloc_coherent():
>>
>>   dma_alloc_coherent()
>>      v
>>   dma_alloc_attrs()             <---- Requires get_dma_ops(dev) != NULL
>>      v
>>   dma_alloc_from_coherent()
>>      v
>>   ...
>>
>> The difference is that dma_alloc_coherent() tries cma first and then
>> proceeds with ops->alloc() (which is __dma_alloc() for arm64) if
>> dma_alloc_from_coherent() fails. In my implementation I am directly
>> using dma_alloc_from_coherent() and only for large mem sizes.
>>
>> So both approaches end up using the same allocation, but for gicv3-its
>> the generic dma framework is not used since the gic is not implemented
>> as a device.
>
> And that's what I propose we change.
>
> The core GIC itself indeed isn't a device, and I'm not proposing we make
> it a device (yet). But the ITS is only used much later in the game, and
> we could move the table allocation to a different time (when the actual
> domains are allocated, for example...). Then, we'd have a set of devices
> available, and the DMA API is our friend again.
>
> 	M.
>

I did the first drop of CMA in the DMA APIs for arm64. When adding that,
it was decided to disallow dma_alloc calls without a valid device pointer
(c666e8d5cae7 "arm64: Warn on NULL device structure for dma APIs") so
if the GIC code wants to use dma_alloc it _must_ have a proper device.

If the device shift still isn't feasible, a better approach might be
what powerpc did for kvm (arch/powerpc/kvm/book3s_hv_builtin.c). This
calls the cma_alloc functions directly and skips trying to work around
the DMA layer.

With either option, I don't think the early initialization approach
proposed is great. If we want CMA early, it's probably best just to
explicitly initialize it early rather than trying to do it from
two places. Something like:

diff --git a/include/linux/cma.h b/include/linux/cma.h
index 29f9e77..a26712a 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -28,4 +28,5 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
                                         struct cma **res_cma);
  extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align);
  extern bool cma_release(struct cma *cma, const struct page *pages, unsigned int count);
+extern int __init cma_init_reserved_areas(void);
  #endif
diff --git a/init/main.c b/init/main.c
index 58c9e37..a92bdb8 100644
--- a/init/main.c
+++ b/init/main.c
@@ -81,6 +81,7 @@
  #include <linux/integrity.h>
  #include <linux/proc_ns.h>
  #include <linux/io.h>
+#include <linux/cma.h>
  
  #include <asm/io.h>
  #include <asm/bugs.h>
@@ -492,6 +493,7 @@ static void __init mm_init(void)
         pgtable_init();
         vmalloc_init();
         ioremap_huge_init();
+       cma_init_reserved_areas();
  }
  
  asmlinkage __visible void __init start_kernel(void)
diff --git a/mm/cma.c b/mm/cma.c
index ea506eb..42278d4 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -142,7 +142,7 @@ err:
         return -EINVAL;
  }
  
-static int __init cma_init_reserved_areas(void)
+int __init cma_init_reserved_areas(void)
  {
         int i;
  
@@ -155,7 +155,6 @@ static int __init cma_init_reserved_areas(void)
  
         return 0;
  }
-core_initcall(cma_init_reserved_areas);
  
  /**
   * cma_init_reserved_mem() - create custom contiguous area from reserved memory


Thanks,
Laura

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] arm64, cma, gicv3-its: Use CMA for allocation of large device tables
@ 2016-02-29 23:17         ` Laura Abbott
  0 siblings, 0 replies; 30+ messages in thread
From: Laura Abbott @ 2016-02-29 23:17 UTC (permalink / raw)
  To: Marc Zyngier, Robert Richter
  Cc: Will Deacon, Catalin Marinas, Greg Kroah-Hartman,
	Thomas Gleixner, Tirumalesh Chalamarla, linux-arm-kernel,
	linux-mm, linux-kernel

On 02/29/2016 05:30 AM, Marc Zyngier wrote:
> On 29/02/16 12:25, Robert Richter wrote:
>> On 29.02.16 10:46:49, Marc Zyngier wrote:
>>> On 25/02/16 11:02, Robert Richter wrote:
>>>> From: Robert Richter <rrichter@cavium.com>
>>>>
>>>> This series implements the use of CMA for allocation of large device
>>>> tables for the arm64 gicv3 interrupt controller.
>>>>
>>>> There are 2 patches, the first is for early activation of cma, which
>>>> needs to be done before interrupt initialization to make it available
>>>> to the gicv3. The second implements the use of CMA to allocate
>>>> gicv3-its device tables.
>>>>
>>>> This solves the problem where mem allocation is limited to 4MB. A
>>>> previous patch sent to the list to address this that instead increases
>>>> FORCE_MAX_ZONEORDER becomes obsolete.
>>>
>>> I think you're looking at the problem the wrong way. Instead of going
>>> through CMA directly, I'd rather go through the normal DMA API
>>> (dma_alloc_coherent), which can itself try CMA (should it be enabled).
>>>
>>> That will give you all the benefit of the CMA allocation, and also make
>>> the driver more robust. I meant to do this for a while, and never found
>>> the time. Any chance you could have a look?
>>
>> I was considering this first, and in fact the backend used is the
>> same. The problem is that irq initialization is much more earlier than
>> standard device probing. The gic even does not have its own struct
>> device and is not initialized like devices are. This makes the whole
>> dma_alloc_coherent() approach not feasable, at least this would
>> require introducing and using a dev struct for the gic. But still this
>> migth not work as it could be too early during boot. I also think
>> there were reasons not implementing the gic as a device.
>>
>> I was following more the approach of iommu/mmu implementations which
>> use dma_alloc_from_contiguous() directly. I think this is more close
>> to the device tables for its.
>>
>> Code path of dma_alloc_coherent():
>>
>>   dma_alloc_coherent()
>>      v
>>   dma_alloc_attrs()             <---- Requires get_dma_ops(dev) != NULL
>>      v
>>   dma_alloc_from_coherent()
>>      v
>>   ...
>>
>> The difference it that dma_alloc_coherent() tries cma first and then
>> proceeds with ops->alloc() (which is __dma_alloc() for arm64) if
>> dma_alloc_from_coherent() fails. In my implementation I am directly
>> using dma_alloc_from_coherent() and only for large mem sizes.
>>
>> So both approaches uses finally the same allocation, but for gicv3-its
>> the generic dma framework is not used since the gic is not implemented
>> as a device.
>
> And that's what I propose we change.
>
> The core GIC itself indeed isn't a device, and I'm not proposing we make
> it a device (yet). But the ITS is only used much later in the game, and
> we could move the table allocation to a different time (when the actual
> domains are allocated, for example...). Then, we'd have a set of devices
> available, and the DMA API is our friend again.
>
> 	M.
>

I did the first drop of CMA in the DMA APIs for arm64. When adding that,
it was decided to disallow dma_alloc calls without a valid device pointer
(c666e8d5cae7 "arm64: Warn on NULL device structure for dma APIs") so
if the GIC code wants to use dma_alloc it _must_ have a proper device.

If the device shift still isn't feasible, a better approach might be
what powerpc did for kvm (arch/powerpc/kvm/book3s_hv_builtin.c). This
calls the cma_alloc functions directly and skips trying to work around
the DMA layer.

With either option, I don't think the early initialization approach
proposed is great. If we want CMA early, it's probably be just to
explicitly initialize it early rather than trying to do it from
two places. Something like:

diff --git a/include/linux/cma.h b/include/linux/cma.h
index 29f9e77..a26712a 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -28,4 +28,5 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
                                         struct cma **res_cma);
  extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align);
  extern bool cma_release(struct cma *cma, const struct page *pages, unsigned int count);
+extern int __init cma_init_reserved_areas(void);
  #endif
diff --git a/init/main.c b/init/main.c
index 58c9e37..a92bdb8 100644
--- a/init/main.c
+++ b/init/main.c
@@ -81,6 +81,7 @@
  #include <linux/integrity.h>
  #include <linux/proc_ns.h>
  #include <linux/io.h>
+#include <linux/cma.h>
  
  #include <asm/io.h>
  #include <asm/bugs.h>
@@ -492,6 +493,7 @@ static void __init mm_init(void)
         pgtable_init();
         vmalloc_init();
         ioremap_huge_init();
+       cma_init_reserved_areas();
  }
  
  asmlinkage __visible void __init start_kernel(void)
diff --git a/mm/cma.c b/mm/cma.c
index ea506eb..42278d4 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -142,7 +142,7 @@ err:
         return -EINVAL;
  }
  
-static int __init cma_init_reserved_areas(void)
+int __init cma_init_reserved_areas(void)
  {
         int i;
  
@@ -155,7 +155,6 @@ static int __init cma_init_reserved_areas(void)
  
         return 0;
  }
-core_initcall(cma_init_reserved_areas);
  
  /**
   * cma_init_reserved_mem() - create custom contiguous area from reserved memory


Thanks,
Laura

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] arm64, cma, gicv3-its: Use CMA for allocation of large device tables
  2016-02-29 23:17         ` Laura Abbott
  (?)
@ 2016-03-01 12:40           ` Robert Richter
  -1 siblings, 0 replies; 30+ messages in thread
From: Robert Richter @ 2016-03-01 12:40 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Marc Zyngier, Will Deacon, Catalin Marinas, Greg Kroah-Hartman,
	Thomas Gleixner, Tirumalesh Chalamarla, linux-arm-kernel,
	linux-mm, linux-kernel

On 29.02.16 15:17:53, Laura Abbott wrote:
> On 02/29/2016 05:30 AM, Marc Zyngier wrote:
> >On 29/02/16 12:25, Robert Richter wrote:
> >>On 29.02.16 10:46:49, Marc Zyngier wrote:
> >>>On 25/02/16 11:02, Robert Richter wrote:
> >>>>From: Robert Richter <rrichter@cavium.com>
> >>>>
> >>>>This series implements the use of CMA for allocation of large device
> >>>>tables for the arm64 gicv3 interrupt controller.
> >>>>
> >>>>There are 2 patches, the first is for early activation of cma, which
> >>>>needs to be done before interrupt initialization to make it available
> >>>>to the gicv3. The second implements the use of CMA to allocate
> >>>>gicv3-its device tables.
> >>>>
> >>>>This solves the problem where mem allocation is limited to 4MB. A
> >>>>previous patch sent to the list to address this that instead increases
> >>>>FORCE_MAX_ZONEORDER becomes obsolete.
> >>>
> >>>I think you're looking at the problem the wrong way. Instead of going
> >>>through CMA directly, I'd rather go through the normal DMA API
> >>>(dma_alloc_coherent), which can itself try CMA (should it be enabled).
> >>>
> >>>That will give you all the benefit of the CMA allocation, and also make
> >>>the driver more robust. I meant to do this for a while, and never found
> >>>the time. Any chance you could have a look?
> >>
> >>I was considering this first, and in fact the backend used is the
> >>same. The problem is that irq initialization is much earlier than
> >>standard device probing. The gic does not even have its own struct
> >>device and is not initialized like devices are. This makes the whole
> >>dma_alloc_coherent() approach not feasible, at least this would
> >>require introducing and using a dev struct for the gic. But still this
> >>might not work as it could be too early during boot. I also think
> >>there were reasons for not implementing the gic as a device.
> >>
> >>I was following more the approach of iommu/mmu implementations which
> >>use dma_alloc_from_contiguous() directly. I think this is closer
> >>to the device tables for its.
> >>
> >>Code path of dma_alloc_coherent():
> >>
> >>  dma_alloc_coherent()
> >>     v
> >>  dma_alloc_attrs()             <---- Requires get_dma_ops(dev) != NULL
> >>     v
> >>  dma_alloc_from_coherent()
> >>     v
> >>  ...
> >>
> >>The difference is that dma_alloc_coherent() tries cma first and then
> >>proceeds with ops->alloc() (which is __dma_alloc() for arm64) if
> >>dma_alloc_from_coherent() fails. In my implementation I am directly
> >>using dma_alloc_from_coherent() and only for large mem sizes.
> >>
> >>So both approaches finally use the same allocation, but for gicv3-its
> >>the generic dma framework is not used since the gic is not implemented
> >>as a device.
> >
> >And that's what I propose we change.
> >
> >The core GIC itself indeed isn't a device, and I'm not proposing we make
> >it a device (yet). But the ITS is only used much later in the game, and
> >we could move the table allocation to a different time (when the actual
> >domains are allocated, for example...). Then, we'd have a set of devices
> >available, and the DMA API is our friend again.
> >
> >	M.
> >
> 
> I did the first drop of CMA in the DMA APIs for arm64. When adding that,
> it was decided to disallow dma_alloc calls without a valid device pointer
> (c666e8d5cae7 "arm64: Warn on NULL device structure for dma APIs") so
> if the GIC code wants to use dma_alloc it _must_ have a proper device.
> 
> If the device shift still isn't feasible, a better approach might be
> what powerpc did for kvm (arch/powerpc/kvm/book3s_hv_builtin.c). This
> calls the cma_alloc functions directly and skips trying to work around
> the DMA layer.
> 
> With either option, I don't think the early initialization approach
> proposed is great. If we want CMA early, it's probably best just to
> explicitly initialize it early rather than trying to do it from
> two places. Something like:

I wasn't sure whether this works for all archs if called directly in
mm_init(). If so, ok your proposed change would be better, though a
stub for !CONFIG_CMA needs to be added. Any comment on the change
below as a replacement for patch #1?
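
A minimal sketch of what such a stub could look like, assuming it goes
next to the new declaration in include/linux/cma.h. The empty __init
define is only there so the fragment also compiles outside the kernel
tree, and the return value of 0 for the !CONFIG_CMA case is an
assumption:

```c
/* Outside the kernel tree __init is unknown, so define it empty here. */
#ifndef __init
#define __init
#endif

#ifdef CONFIG_CMA
extern int __init cma_init_reserved_areas(void);
#else
/* Without CMA there are no reserved areas to activate. */
static inline int cma_init_reserved_areas(void)
{
	return 0;
}
#endif
```

With that, the call in mm_init() would not need an #ifdef of its own.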

On the other hand, if we use device enablement for its, then early cma
enablement is not needed anymore. Will check how that could work.

-Robert

> 
> diff --git a/include/linux/cma.h b/include/linux/cma.h
> index 29f9e77..a26712a 100644
> --- a/include/linux/cma.h
> +++ b/include/linux/cma.h
> @@ -28,4 +28,5 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
>                                         struct cma **res_cma);
>  extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align);
>  extern bool cma_release(struct cma *cma, const struct page *pages, unsigned int count);
> +extern int __init cma_init_reserved_areas(void);
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 58c9e37..a92bdb8 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -81,6 +81,7 @@
>  #include <linux/integrity.h>
>  #include <linux/proc_ns.h>
>  #include <linux/io.h>
> +#include <linux/cma.h>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -492,6 +493,7 @@ static void __init mm_init(void)
>         pgtable_init();
>         vmalloc_init();
>         ioremap_huge_init();
> +       cma_init_reserved_areas();
>  }
>  asmlinkage __visible void __init start_kernel(void)
> diff --git a/mm/cma.c b/mm/cma.c
> index ea506eb..42278d4 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -142,7 +142,7 @@ err:
>         return -EINVAL;
>  }
> -static int __init cma_init_reserved_areas(void)
> +int __init cma_init_reserved_areas(void)
>  {
>         int i;
> @@ -155,7 +155,6 @@ static int __init cma_init_reserved_areas(void)
>         return 0;
>  }
> -core_initcall(cma_init_reserved_areas);
>  /**
>   * cma_init_reserved_mem() - create custom contiguous area from reserved memory

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] arm64, cma, gicv3-its: Use CMA for allocation of large device tables
  2016-03-01 12:40           ` Robert Richter
  (?)
@ 2016-03-04 14:26             ` Vlastimil Babka
  -1 siblings, 0 replies; 30+ messages in thread
From: Vlastimil Babka @ 2016-03-04 14:26 UTC (permalink / raw)
  To: Robert Richter, Laura Abbott
  Cc: Marc Zyngier, Will Deacon, Catalin Marinas, Greg Kroah-Hartman,
	Thomas Gleixner, Tirumalesh Chalamarla, linux-arm-kernel,
	linux-mm, linux-kernel, Joonsoo Kim

On 03/01/2016 01:40 PM, Robert Richter wrote:
> On 29.02.16 15:17:53, Laura Abbott wrote:
>> On 02/29/2016 05:30 AM, Marc Zyngier wrote:
>>> On 29/02/16 12:25, Robert Richter wrote:
>>>> On 29.02.16 10:46:49, Marc Zyngier wrote:
>>>>> On 25/02/16 11:02, Robert Richter wrote:
>>>>>> From: Robert Richter <rrichter@cavium.com>
>>>>>>
>>>>>> This series implements the use of CMA for allocation of large device
>>>>>> tables for the arm64 gicv3 interrupt controller.
>>>>>>
>>>>>> There are 2 patches, the first is for early activation of cma, which
>>>>>> needs to be done before interrupt initialization to make it available
>>>>>> to the gicv3. The second implements the use of CMA to allocate
>>>>>> gicv3-its device tables.
>>>>>>
>>>>>> This solves the problem where mem allocation is limited to 4MB. A
>>>>>> previous patch sent to the list to address this that instead increases
>>>>>> FORCE_MAX_ZONEORDER becomes obsolete.
>>>>>
>>>>> I think you're looking at the problem the wrong way. Instead of going
>>>>> through CMA directly, I'd rather go through the normal DMA API
>>>>> (dma_alloc_coherent), which can itself try CMA (should it be enabled).
>>>>>
>>>>> That will give you all the benefit of the CMA allocation, and also make
>>>>> the driver more robust. I meant to do this for a while, and never found
>>>>> the time. Any chance you could have a look?
>>>>
>>>> I was considering this first, and in fact the backend used is the
>>>> same. The problem is that irq initialization is much earlier than
>>>> standard device probing. The gic does not even have its own struct
>>>> device and is not initialized like devices are. This makes the whole
>>>> dma_alloc_coherent() approach not feasible, at least this would
>>>> require introducing and using a dev struct for the gic. But still this
>>>> might not work as it could be too early during boot. I also think
>>>> there were reasons for not implementing the gic as a device.
>>>>
>>>> I was following more the approach of iommu/mmu implementations which
>>>> use dma_alloc_from_contiguous() directly. I think this is closer
>>>> to the device tables for its.
>>>>
>>>> Code path of dma_alloc_coherent():
>>>>
>>>>  dma_alloc_coherent()
>>>>     v
>>>>  dma_alloc_attrs()             <---- Requires get_dma_ops(dev) != NULL
>>>>     v
>>>>  dma_alloc_from_coherent()
>>>>     v
>>>>  ...
>>>>
>>>> The difference is that dma_alloc_coherent() tries cma first and then
>>>> proceeds with ops->alloc() (which is __dma_alloc() for arm64) if
>>>> dma_alloc_from_coherent() fails. In my implementation I am directly
>>>> using dma_alloc_from_coherent() and only for large mem sizes.
>>>>
>>>> So both approaches finally use the same allocation, but for gicv3-its
>>>> the generic dma framework is not used since the gic is not implemented
>>>> as a device.
>>>
>>> And that's what I propose we change.
>>>
>>> The core GIC itself indeed isn't a device, and I'm not proposing we make
>>> it a device (yet). But the ITS is only used much later in the game, and
>>> we could move the table allocation to a different time (when the actual
>>> domains are allocated, for example...). Then, we'd have a set of devices
>>> available, and the DMA API is our friend again.
>>>
>>> 	M.
>>>
>>
>> I did the first drop of CMA in the DMA APIs for arm64. When adding that,
>> it was decided to disallow dma_alloc calls without a valid device pointer
>> (c666e8d5cae7 "arm64: Warn on NULL device structure for dma APIs") so
>> if the GIC code wants to use dma_alloc it _must_ have a proper device.
>>
>> If the device shift still isn't feasible, a better approach might be
>> what powerpc did for kvm (arch/powerpc/kvm/book3s_hv_builtin.c). This
>> calls the cma_alloc functions directly and skips trying to work around
>> the DMA layer.
>>
>> With either option, I don't think the early initialization approach
>> proposed is great. If we want CMA early, it's probably best just to
>> explicitly initialize it early rather than trying to do it from
>> two places. Something like:
> 
> I wasn't sure whether this works for all archs if called directly in
> mm_init(). If so, ok your proposed change would be better, though a
> stub for !CONFIG_CMA needs to be added. Any comment on the change
> below as a replacement for patch #1?
> 
> On the other side, if we use device enablement for its, then early cma
> enablement is not needed anymore. Will check how that could work.

Hmm this reminds me of commit 080fe2068e1c7f19f5 where I've exposed
alloc_contig_range() and related stuff for allowing gigantic page
allocations without full CONFIG_CMA. Could this perhaps be generalized
for this case? Would alloc_contig_range() without the CMA pageblock
reservations be enough for you as well? Maybe then the "early CMA
initialization" problem would go away.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] arm64, cma, gicv3-its: Use CMA for allocation of large device tables
  2016-03-01 12:40           ` Robert Richter
  (?)
@ 2016-03-04 17:32             ` Marc Zyngier
  -1 siblings, 0 replies; 30+ messages in thread
From: Marc Zyngier @ 2016-03-04 17:32 UTC (permalink / raw)
  To: Robert Richter, Laura Abbott
  Cc: Will Deacon, Catalin Marinas, Greg Kroah-Hartman,
	Thomas Gleixner, Tirumalesh Chalamarla, linux-arm-kernel,
	linux-mm, linux-kernel

On 01/03/16 12:40, Robert Richter wrote:
> On 29.02.16 15:17:53, Laura Abbott wrote:
>> On 02/29/2016 05:30 AM, Marc Zyngier wrote:
>>> On 29/02/16 12:25, Robert Richter wrote:
>>>> On 29.02.16 10:46:49, Marc Zyngier wrote:
>>>>> On 25/02/16 11:02, Robert Richter wrote:
>>>>>> From: Robert Richter <rrichter@cavium.com>
>>>>>>
>>>>>> This series implements the use of CMA for allocation of large device
>>>>>> tables for the arm64 gicv3 interrupt controller.
>>>>>>
>>>>>> There are 2 patches, the first is for early activation of cma, which
>>>>>> needs to be done before interrupt initialization to make it available
>>>>>> to the gicv3. The second implements the use of CMA to allocate
>>>>>> gicv3-its device tables.
>>>>>>
>>>>>> This solves the problem where mem allocation is limited to 4MB. A
>>>>>> previous patch sent to the list, which addressed this by increasing
>>>>>> FORCE_MAX_ZONEORDER instead, becomes obsolete.
>>>>>
>>>>> I think you're looking at the problem the wrong way. Instead of going
>>>>> through CMA directly, I'd rather go through the normal DMA API
>>>>> (dma_alloc_coherent), which can itself try CMA (should it be enabled).
>>>>>
>>>>> That will give you all the benefit of the CMA allocation, and also make
>>>>> the driver more robust. I meant to do this for a while, and never found
>>>>> the time. Any chance you could have a look?
>>>>
>>>> I was considering this first, and in fact the backend used is the
>>>> same. The problem is that irq initialization is much earlier than
>>>> standard device probing. The gic does not even have its own struct
>>>> device and is not initialized like devices are. This makes the whole
>>>> dma_alloc_coherent() approach not feasible; at the least, this would
>>>> require introducing and using a dev struct for the gic. But still this
>>>> might not work as it could be too early during boot. I also think
>>>> there were reasons for not implementing the gic as a device.
>>>>
>>>> I was following more the approach of iommu/mmu implementations which
>>>> use dma_alloc_from_contiguous() directly. I think this is closer
>>>> to the device tables for its.
>>>>
>>>> Code path of dma_alloc_coherent():
>>>>
>>>>  dma_alloc_coherent()
>>>>     v
>>>>  dma_alloc_attrs()             <---- Requires get_dma_ops(dev) != NULL
>>>>     v
>>>>  dma_alloc_from_coherent()
>>>>     v
>>>>  ...
>>>>
>>>> The difference is that dma_alloc_coherent() tries cma first and then
>>>> proceeds with ops->alloc() (which is __dma_alloc() for arm64) if
>>>> dma_alloc_from_coherent() fails. In my implementation I am directly
>>>> using dma_alloc_from_coherent() and only for large mem sizes.
>>>>
>>>> So both approaches ultimately use the same allocation, but for gicv3-its
>>>> the generic dma framework is not used since the gic is not implemented
>>>> as a device.
>>>
>>> And that's what I propose we change.
>>>
>>> The core GIC itself indeed isn't a device, and I'm not proposing we make
>>> it a device (yet). But the ITS is only used much later in the game, and
>>> we could move the table allocation to a different time (when the actual
>>> domains are allocated, for example...). Then, we'd have a set of devices
>>> available, and the DMA API is our friend again.
>>>
>>> 	M.
>>>
>>
>> I did the first drop of CMA in the DMA APIs for arm64. When adding that,
>> it was decided to disallow dma_alloc calls without a valid device pointer
>> (c666e8d5cae7 "arm64: Warn on NULL device structure for dma APIs") so
>> if the GIC code wants to use dma_alloc it _must_ have a proper device.
>>
>> If the device shift still isn't feasible, a better approach might be
>> what powerpc did for kvm (arch/powerpc/kvm/book3s_hv_builtin.c). This
>> calls the cma_alloc functions directly and skips trying to work around
>> the DMA layer.
>>
>> With either option, I don't think the early initialization approach
>> proposed is great. If we want CMA early, it's probably best just to
>> explicitly initialize it early rather than trying to do it from
>> two places. Something like:
> 
> I wasn't sure whether this works for all archs if called directly in
> mm_init(). If so, ok your proposed change would be better, though a
> stub for !CONFIG_CMA needs to be added. Any comment on the change
> below as a replacement for patch #1?
> 
> On the other side, if we use device enablement for its, then early cma
> enablement is not needed anymore. Will check how that could work.

I'm planning to have a look at that next week. This would solve a number
of other issues (like the custom "needs flushing" flags we have so far),
and Will has been pestering me about it for quite a while now.

The only worry I have is that we end up in a dependency hell with PCI
being probed too early. I really wish we had proper device dependencies
sorted... But we do need to try that route before starting to hack
things like CMA.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2016-03-04 17:32 UTC | newest]

Thread overview:
2016-02-25 11:02 [PATCH 0/2] arm64, cma, gicv3-its: Use CMA for allocation of large device tables Robert Richter
2016-02-25 11:02 ` [PATCH 1/2] mm: cma: arm64: Introduce dma_activate_contiguous() for early activation Robert Richter
2016-02-25 11:02 ` [PATCH 2/2] irqchip, gicv3-its, cma: Use CMA for allocation of large device tables Robert Richter
2016-02-29 10:46 ` [PATCH 0/2] arm64, cma, gicv3-its: " Marc Zyngier
2016-02-29 12:25   ` Robert Richter
2016-02-29 13:30     ` Marc Zyngier
2016-02-29 23:17       ` Laura Abbott
2016-03-01 12:40         ` Robert Richter
2016-03-04 14:26           ` Vlastimil Babka
2016-03-04 17:32           ` Marc Zyngier
