linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] arm64: gicv3: its: Fixes and updates for ThunderX
@ 2015-05-03 20:49 Robert Richter
  2015-05-03 20:49 ` [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id Robert Richter
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Robert Richter @ 2015-05-03 20:49 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Robert Richter, Tirumalesh Chalamarla, Radha Mohan Chintakuntla,
	linux-kernel, linux-arm-kernel, Robert Richter

From: Robert Richter <rrichter@cavium.com>

This patch series adds fixes and updates for ThunderX.

Patches are unrelated each other and can be applied individually.


Radha Mohan Chintakuntla (1):
  arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX

Robert Richter (2):
  arm64: gicv3: its: Add range check for number of allocated pages
  arm64: gicv3: its: Read typer register outside the loop

Tirumalesh Chalamarla (1):
  arm64: gicv3: its: Encode domain number in PCI stream id

 arch/arm64/Kconfig                 |  1 +
 drivers/irqchip/irq-gic-v3-its.c   | 18 +++++++++++++-----
 include/linux/irqchip/arm-gic-v3.h |  1 +
 3 files changed, 15 insertions(+), 5 deletions(-)

-- 
2.1.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id
  2015-05-03 20:49 [PATCH 0/4] arm64: gicv3: its: Fixes and updates for ThunderX Robert Richter
@ 2015-05-03 20:49 ` Robert Richter
  2015-05-20 12:11   ` Marc Zyngier
  2015-05-03 20:49 ` [PATCH 2/4] arm64: gicv3: its: Add range check for number of allocated pages Robert Richter
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Robert Richter @ 2015-05-03 20:49 UTC (permalink / raw)
  To: Marc Zyngier, Thomas Gleixner, Jason Cooper
  Cc: Robert Richter, Tirumalesh Chalamarla, Radha Mohan Chintakuntla,
	linux-kernel, linux-arm-kernel, Robert Richter

From: Tirumalesh Chalamarla <tchalamarla@cavium.com>

PCI stream ids need to consider pci bridge number to be unique on the
system. Using only bus and devfn can't do the trick in systems that
have multiple pci bridges.

Signed-off-by: Tirumalesh Chalamarla <tchalamarla@cavium.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
---
 drivers/irqchip/irq-gic-v3-its.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 9687f8afebff..e30b4de04c6c 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1186,7 +1186,7 @@ static int its_get_pci_alias(struct pci_dev *pdev, u16 alias, void *data)
 {
 	struct its_pci_alias *dev_alias = data;
 
-	dev_alias->dev_id = alias;
+	dev_alias->dev_id = (pci_domain_nr(pdev->bus) << 16) | alias;
 	if (pdev != dev_alias->pdev)
 		dev_alias->count += its_pci_msi_vec_count(dev_alias->pdev);
 
-- 
2.1.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/4] arm64: gicv3: its: Add range check for number of allocated pages
  2015-05-03 20:49 [PATCH 0/4] arm64: gicv3: its: Fixes and updates for ThunderX Robert Richter
  2015-05-03 20:49 ` [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id Robert Richter
@ 2015-05-03 20:49 ` Robert Richter
  2015-05-20 12:14   ` Marc Zyngier
  2015-05-03 20:49 ` [PATCH 3/4] arm64: gicv3: its: Read typer register outside the loop Robert Richter
  2015-05-03 20:49 ` [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX Robert Richter
  3 siblings, 1 reply; 24+ messages in thread
From: Robert Richter @ 2015-05-03 20:49 UTC (permalink / raw)
  To: Marc Zyngier, Thomas Gleixner, Jason Cooper
  Cc: Robert Richter, Tirumalesh Chalamarla, Radha Mohan Chintakuntla,
	linux-kernel, linux-arm-kernel, Robert Richter

From: Robert Richter <rrichter@cavium.com>

The number of pages for the its table may exceed the maximum of 256.
Adding a range check and limitting the number to its maximum.

Based on a patch from Tirumalesh Chalamarla <tchalamarla@cavium.com>.

Signed-off-by: Robert Richter <rrichter@cavium.com>
---
 drivers/irqchip/irq-gic-v3-its.c   | 11 ++++++++++-
 include/linux/irqchip/arm-gic-v3.h |  1 +
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index e30b4de04c6c..a2619a1d5bb3 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -810,6 +810,7 @@ static int its_alloc_tables(struct its_node *its)
 		u64 entry_size = GITS_BASER_ENTRY_SIZE(val);
 		int order = get_order(psz);
 		int alloc_size;
+		int alloc_pages;
 		u64 tmp;
 		void *base;
 
@@ -837,6 +838,14 @@ static int its_alloc_tables(struct its_node *its)
 		}
 
 		alloc_size = (1 << order) * PAGE_SIZE;
+		alloc_pages = (alloc_size / psz);
+		if (alloc_pages > GITS_BASER_PAGES_MAX) {
+			alloc_pages = GITS_BASER_PAGES_MAX;
+			order = get_order(GITS_BASER_PAGES_MAX * psz);
+			pr_warn("%s: Device Table too large, reduce its page order to %u (%u pages)\n",
+				its->msi_chip.of_node->full_name, order, alloc_pages);
+		}
+
 		base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
 		if (!base) {
 			err = -ENOMEM;
@@ -865,7 +874,7 @@ static int its_alloc_tables(struct its_node *its)
 			break;
 		}
 
-		val |= (alloc_size / psz) - 1;
+		val |= alloc_pages - 1;
 
 		writeq_relaxed(val, its->base + GITS_BASER + i * 8);
 		tmp = readq_relaxed(its->base + GITS_BASER + i * 8);
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index ffbc034c8810..f28da189c4aa 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -229,6 +229,7 @@
 #define GITS_BASER_PAGE_SIZE_16K	(1UL << GITS_BASER_PAGE_SIZE_SHIFT)
 #define GITS_BASER_PAGE_SIZE_64K	(2UL << GITS_BASER_PAGE_SIZE_SHIFT)
 #define GITS_BASER_PAGE_SIZE_MASK	(3UL << GITS_BASER_PAGE_SIZE_SHIFT)
+#define GITS_BASER_PAGES_MAX		256
 
 #define GITS_BASER_TYPE_NONE		0
 #define GITS_BASER_TYPE_DEVICE		1
-- 
2.1.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/4] arm64: gicv3: its: Read typer register outside the loop
  2015-05-03 20:49 [PATCH 0/4] arm64: gicv3: its: Fixes and updates for ThunderX Robert Richter
  2015-05-03 20:49 ` [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id Robert Richter
  2015-05-03 20:49 ` [PATCH 2/4] arm64: gicv3: its: Add range check for number of allocated pages Robert Richter
@ 2015-05-03 20:49 ` Robert Richter
  2015-05-20 12:15   ` Marc Zyngier
  2015-05-03 20:49 ` [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX Robert Richter
  3 siblings, 1 reply; 24+ messages in thread
From: Robert Richter @ 2015-05-03 20:49 UTC (permalink / raw)
  To: Marc Zyngier, Thomas Gleixner, Jason Cooper
  Cc: Robert Richter, Tirumalesh Chalamarla, Radha Mohan Chintakuntla,
	linux-kernel, linux-arm-kernel, Robert Richter

From: Robert Richter <rrichter@cavium.com>

No need to read the typer register in the loop. Values do not change.

Signed-off-by: Robert Richter <rrichter@cavium.com>
---
 drivers/irqchip/irq-gic-v3-its.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index a2619a1d5bb3..916c4724c771 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -803,6 +803,8 @@ static int its_alloc_tables(struct its_node *its)
 	int psz = SZ_64K;
 	u64 shr = GITS_BASER_InnerShareable;
 	u64 cache = GITS_BASER_WaWb;
+	u64 typer = readq_relaxed(its->base + GITS_TYPER);
+	u32 ids = GITS_TYPER_DEVBITS(typer);
 
 	for (i = 0; i < GITS_BASER_NR_REGS; i++) {
 		u64 val = readq_relaxed(its->base + GITS_BASER + i * 8);
@@ -826,9 +828,6 @@ static int its_alloc_tables(struct its_node *its)
 		 * For other tables, only allocate a single page.
 		 */
 		if (type == GITS_BASER_TYPE_DEVICE) {
-			u64 typer = readq_relaxed(its->base + GITS_TYPER);
-			u32 ids = GITS_TYPER_DEVBITS(typer);
-
 			order = get_order((1UL << ids) * entry_size);
 			if (order >= MAX_ORDER) {
 				order = MAX_ORDER - 1;
-- 
2.1.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX
  2015-05-03 20:49 [PATCH 0/4] arm64: gicv3: its: Fixes and updates for ThunderX Robert Richter
                   ` (2 preceding siblings ...)
  2015-05-03 20:49 ` [PATCH 3/4] arm64: gicv3: its: Read typer register outside the loop Robert Richter
@ 2015-05-03 20:49 ` Robert Richter
  2015-05-05 10:53   ` Will Deacon
  3 siblings, 1 reply; 24+ messages in thread
From: Robert Richter @ 2015-05-03 20:49 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Robert Richter, Tirumalesh Chalamarla, Radha Mohan Chintakuntla,
	linux-kernel, linux-arm-kernel, Robert Richter

From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>

In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
which is bigger than the allowed max order. So we are forcing it only in
case of 4KB page size.

Signed-off-by: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1b8e97331ffb..c519f3777453 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -542,6 +542,7 @@ config XEN
 config FORCE_MAX_ZONEORDER
 	int
 	default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
+	default "13" if (ARCH_THUNDER && !ARM64_64K_PAGES)
 	default "11"
 
 menuconfig ARMV8_DEPRECATED
-- 
2.1.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX
  2015-05-03 20:49 ` [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX Robert Richter
@ 2015-05-05 10:53   ` Will Deacon
  2015-05-11  9:14     ` Robert Richter
  0 siblings, 1 reply; 24+ messages in thread
From: Will Deacon @ 2015-05-05 10:53 UTC (permalink / raw)
  To: Robert Richter
  Cc: Marc Zyngier, Catalin Marinas, Tirumalesh Chalamarla,
	Radha Mohan Chintakuntla, linux-kernel, linux-arm-kernel,
	Robert Richter

On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote:
> From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
> 
> In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
> which is bigger than the allowed max order. So we are forcing it only in
> case of 4KB page size.

Does this problem disappear if the ITS driver uses dma_alloc_coherent
instead? That would also allow us to remove the __flush_dcache_area abuse
from the driver.

Will

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX
  2015-05-05 10:53   ` Will Deacon
@ 2015-05-11  9:14     ` Robert Richter
  2015-05-12 12:30       ` Will Deacon
  0 siblings, 1 reply; 24+ messages in thread
From: Robert Richter @ 2015-05-11  9:14 UTC (permalink / raw)
  To: Will Deacon
  Cc: Robert Richter, Marc Zyngier, Catalin Marinas,
	Tirumalesh Chalamarla, Radha Mohan Chintakuntla, linux-kernel,
	linux-arm-kernel

On 05.05.15 11:53:29, Will Deacon wrote:
> On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote:
> > From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
> > 
> > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
> > which is bigger than the allowed max order. So we are forcing it only in
> > case of 4KB page size.
> 
> Does this problem disappear if the ITS driver uses dma_alloc_coherent
> instead? That would also allow us to remove the __flush_dcache_area abuse
> from the driver.

__get_free_pages() is also used internally in dma_alloc_coherent().

There is another case if the device brings dma mem with it. I am not
sure if it would be possible to assign some phys memory via devicetree
to the interrupt controller and then assign that range for its table
allocation.

Another option would be to allocate a hugepage. This would require
setting up hugepages during boottime. I need to figure out whether
that could work.

What about on the remaining 3 patches?

Thanks,

-Robert

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX
  2015-05-11  9:14     ` Robert Richter
@ 2015-05-12 12:30       ` Will Deacon
  2015-05-12 16:20         ` Robert Richter
  0 siblings, 1 reply; 24+ messages in thread
From: Will Deacon @ 2015-05-12 12:30 UTC (permalink / raw)
  To: Robert Richter
  Cc: Robert Richter, Marc Zyngier, Catalin Marinas,
	Tirumalesh Chalamarla, Radha Mohan Chintakuntla, linux-kernel,
	linux-arm-kernel

On Mon, May 11, 2015 at 10:14:38AM +0100, Robert Richter wrote:
> On 05.05.15 11:53:29, Will Deacon wrote:
> > On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote:
> > > From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
> > > 
> > > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
> > > which is bigger than the allowed max order. So we are forcing it only in
> > > case of 4KB page size.
> > 
> > Does this problem disappear if the ITS driver uses dma_alloc_coherent
> > instead? That would also allow us to remove the __flush_dcache_area abuse
> > from the driver.
> 
> __get_free_pages() is also used internally in dma_alloc_coherent().
> 
> There is another case if the device brings dma mem with it. I am not
> sure if it would be possible to assign some phys memory via devicetree
> to the interrupt controller and then assign that range for its table
> allocation.
> 
> Another option would be to allocate a hugepage. This would require
> setting up hugepages during boottime. I need to figure out whether
> that could work.
> 
> What about on the remaining 3 patches?

Marc would be the best guy to review those, but he's on holiday for a couple
of weeks at the moment.

Will

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX
  2015-05-12 12:30       ` Will Deacon
@ 2015-05-12 16:20         ` Robert Richter
  2015-05-12 17:24           ` Will Deacon
  0 siblings, 1 reply; 24+ messages in thread
From: Robert Richter @ 2015-05-12 16:20 UTC (permalink / raw)
  To: Will Deacon
  Cc: Robert Richter, Marc Zyngier, Catalin Marinas,
	Tirumalesh Chalamarla, Radha Mohan Chintakuntla, linux-kernel,
	linux-arm-kernel

Will,

On 12.05.15 13:30:57, Will Deacon wrote:
> On Mon, May 11, 2015 at 10:14:38AM +0100, Robert Richter wrote:
> > On 05.05.15 11:53:29, Will Deacon wrote:
> > > On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote:
> > > > From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
> > > > 
> > > > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
> > > > which is bigger than the allowed max order. So we are forcing it only in
> > > > case of 4KB page size.
> > > 
> > > Does this problem disappear if the ITS driver uses dma_alloc_coherent
> > > instead? That would also allow us to remove the __flush_dcache_area abuse
> > > from the driver.
> > 
> > __get_free_pages() is also used internally in dma_alloc_coherent().
> > 
> > There is another case if the device brings dma mem with it. I am not
> > sure if it would be possible to assign some phys memory via devicetree
> > to the interrupt controller and then assign that range for its table
> > allocation.
> > 
> > Another option would be to allocate a hugepage. This would require
> > setting up hugepages during boottime. I need to figure out whether
> > that could work.

For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
default pagesize) I see this different approaches:

 * set FORCE_MAX_ZONEORDER to 13 as default,

 * set FORCE_MAX_ZONEORDER to 13 if ARM_GIC_V3 is set,

 * set FORCE_MAX_ZONEORDER to 13 if ARCH_THUNDER is set (this patch),

 * use hugepages if enabled (defconfig has the following options
   enable: CGROUP_HUGETLB, TRANSPARENT_HUGEPAGE, HUGETLBFS, this might
   work with current default kernel without changing defconfig
   options),

 * use devicetree to reserve mem for gicv3 (need to check ACPI).

Do you see any direction?

> > What about on the remaining 3 patches?
> 
> Marc would be the best guy to review those, but he's on holiday for a couple
> of weeks at the moment.

Thanks for the note and your comments.

-Robert

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX
  2015-05-12 16:20         ` Robert Richter
@ 2015-05-12 17:24           ` Will Deacon
  2015-05-12 17:46             ` Robert Richter
  2015-05-20 12:22             ` Marc Zyngier
  0 siblings, 2 replies; 24+ messages in thread
From: Will Deacon @ 2015-05-12 17:24 UTC (permalink / raw)
  To: Robert Richter
  Cc: Robert Richter, Marc Zyngier, Catalin Marinas,
	Tirumalesh Chalamarla, Radha Mohan Chintakuntla, linux-kernel,
	linux-arm-kernel

On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
> On 12.05.15 13:30:57, Will Deacon wrote:
> > On Mon, May 11, 2015 at 10:14:38AM +0100, Robert Richter wrote:
> > > On 05.05.15 11:53:29, Will Deacon wrote:
> > > > On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote:
> > > > > From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
> > > > > 
> > > > > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
> > > > > which is bigger than the allowed max order. So we are forcing it only in
> > > > > case of 4KB page size.
> > > > 
> > > > Does this problem disappear if the ITS driver uses dma_alloc_coherent
> > > > instead? That would also allow us to remove the __flush_dcache_area abuse
> > > > from the driver.
> > > 
> > > __get_free_pages() is also used internally in dma_alloc_coherent().
> > > 
> > > There is another case if the device brings dma mem with it. I am not
> > > sure if it would be possible to assign some phys memory via devicetree
> > > to the interrupt controller and then assign that range for its table
> > > allocation.
> > > 
> > > Another option would be to allocate a hugepage. This would require
> > > setting up hugepages during boottime. I need to figure out whether
> > > that could work.
> 
> For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
> default pagesize) I see this different approaches:

16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
a sparse DeviceID space or both?

>  * set FORCE_MAX_ZONEORDER to 13 as default,
> 
>  * set FORCE_MAX_ZONEORDER to 13 if ARM_GIC_V3 is set,
> 
>  * set FORCE_MAX_ZONEORDER to 13 if ARCH_THUNDER is set (this patch),

I'm not hugely fond of these suggestions, as there's still no guarantee
that such a huge allocation is going to succeed and we end up bumping
MAX_ORDER for all platforms in defconfig if we enable THUNDER there.

>  * use hugepages if enabled (defconfig has the following options
>    enable: CGROUP_HUGETLB, TRANSPARENT_HUGEPAGE, HUGETLBFS, this might
>    work with current default kernel without changing defconfig
>    options),

I don't think hugepages help with DMA.

>  * use devicetree to reserve mem for gicv3 (need to check ACPI).

Using a carveout like this might be the best bet. I assume the memory used
by the ITS can never be reclaimed by the syste (and therefore there's no
issue with wastage)?

> Do you see any direction?

Dunno, does CMA also require the MAX_ORDER bump?

Will

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX
  2015-05-12 17:24           ` Will Deacon
@ 2015-05-12 17:46             ` Robert Richter
  2015-05-20 12:22             ` Marc Zyngier
  1 sibling, 0 replies; 24+ messages in thread
From: Robert Richter @ 2015-05-12 17:46 UTC (permalink / raw)
  To: Will Deacon
  Cc: Robert Richter, Marc Zyngier, Catalin Marinas,
	Tirumalesh Chalamarla, Radha Mohan Chintakuntla, linux-kernel,
	linux-arm-kernel

On 12.05.15 18:24:16, Will Deacon wrote:
> On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
> > For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
> > default pagesize) I see this different approaches:
> 
> 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
> a sparse DeviceID space or both?
> 
> >  * set FORCE_MAX_ZONEORDER to 13 as default,
> > 
> >  * set FORCE_MAX_ZONEORDER to 13 if ARM_GIC_V3 is set,
> > 
> >  * set FORCE_MAX_ZONEORDER to 13 if ARCH_THUNDER is set (this patch),
> 
> I'm not hugely fond of these suggestions, as there's still no guarantee
> that such a huge allocation is going to succeed and we end up bumping
> MAX_ORDER for all platforms in defconfig if we enable THUNDER there.

I actually was expecting this...

> >  * use hugepages if enabled (defconfig has the following options
> >    enable: CGROUP_HUGETLB, TRANSPARENT_HUGEPAGE, HUGETLBFS, this might
> >    work with current default kernel without changing defconfig
> >    options),
> 
> I don't think hugepages help with DMA.
> 
> >  * use devicetree to reserve mem for gicv3 (need to check ACPI).

I am quite a bit concerned letting firmware handle this. But if that
would solve it, fine.

> Using a carveout like this might be the best bet. I assume the memory used
> by the ITS can never be reclaimed by the syste (and therefore there's no
> issue with wastage)?
> 
> > Do you see any direction?
> 
> Dunno, does CMA also require the MAX_ORDER bump?

Looks promising at the first glance. Will look into it.

Thanks,

-Robert

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id
  2015-05-03 20:49 ` [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id Robert Richter
@ 2015-05-20 12:11   ` Marc Zyngier
  2015-05-20 12:48     ` Robert Richter
  0 siblings, 1 reply; 24+ messages in thread
From: Marc Zyngier @ 2015-05-20 12:11 UTC (permalink / raw)
  To: Robert Richter
  Cc: Thomas Gleixner, Jason Cooper, Tirumalesh Chalamarla,
	Radha Mohan Chintakuntla, linux-kernel, linux-arm-kernel,
	Robert Richter

On Sun, 3 May 2015 21:49:29 +0100
Robert Richter <rric@kernel.org> wrote:

> From: Tirumalesh Chalamarla <tchalamarla@cavium.com>
> 
> PCI stream ids need to consider pci bridge number to be unique on the
> system. Using only bus and devfn can't do the trick in systems that
> have multiple pci bridges.
> 
> Signed-off-by: Tirumalesh Chalamarla <tchalamarla@cavium.com>
> Signed-off-by: Robert Richter <rrichter@cavium.com>
> ---
>  drivers/irqchip/irq-gic-v3-its.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 9687f8afebff..e30b4de04c6c 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1186,7 +1186,7 @@ static int its_get_pci_alias(struct pci_dev *pdev, u16 alias, void *data)
>  {
>  	struct its_pci_alias *dev_alias = data;
>  
> -	dev_alias->dev_id = alias;
> +	dev_alias->dev_id = (pci_domain_nr(pdev->bus) << 16) | alias;
>  	if (pdev != dev_alias->pdev)
>  		dev_alias->count += its_pci_msi_vec_count(dev_alias->pdev);
>  

This feels very scary. We're now assuming that the domain number will
always be presented to the doorbell. What guarantee do we have that
this is always the case, irrespective of the platform?

Also, domains have no PCI reality, they are a Linux thing. And they can
be "randomly" assigned, unless you force the domain in DT with a
linux,pci-domain property. This looks even more wrong, specially
considering ACPI.

It really feels like we need a way to describe how the BDF numbering is
augmented. We also need to guarantee that we get the actual bridge
number, as opposed to the domain number.

Thoughts?

	M.
-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] arm64: gicv3: its: Add range check for number of allocated pages
  2015-05-03 20:49 ` [PATCH 2/4] arm64: gicv3: its: Add range check for number of allocated pages Robert Richter
@ 2015-05-20 12:14   ` Marc Zyngier
  0 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2015-05-20 12:14 UTC (permalink / raw)
  To: Robert Richter
  Cc: Thomas Gleixner, Jason Cooper, Tirumalesh Chalamarla,
	Radha Mohan Chintakuntla, linux-kernel, linux-arm-kernel,
	Robert Richter

On Sun, 3 May 2015 21:49:30 +0100
Robert Richter <rric@kernel.org> wrote:

> From: Robert Richter <rrichter@cavium.com>
> 
> The number of pages for the its table may exceed the maximum of 256.
> Adding a range check and limitting the number to its maximum.
> 
> Based on a patch from Tirumalesh Chalamarla <tchalamarla@cavium.com>.
> 
> Signed-off-by: Robert Richter <rrichter@cavium.com>

Looks good to me.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

	M.

> ---
>  drivers/irqchip/irq-gic-v3-its.c   | 11 ++++++++++-
>  include/linux/irqchip/arm-gic-v3.h |  1 +
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index e30b4de04c6c..a2619a1d5bb3 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -810,6 +810,7 @@ static int its_alloc_tables(struct its_node *its)
>  		u64 entry_size = GITS_BASER_ENTRY_SIZE(val);
>  		int order = get_order(psz);
>  		int alloc_size;
> +		int alloc_pages;
>  		u64 tmp;
>  		void *base;
>  
> @@ -837,6 +838,14 @@ static int its_alloc_tables(struct its_node *its)
>  		}
>  
>  		alloc_size = (1 << order) * PAGE_SIZE;
> +		alloc_pages = (alloc_size / psz);
> +		if (alloc_pages > GITS_BASER_PAGES_MAX) {
> +			alloc_pages = GITS_BASER_PAGES_MAX;
> +			order = get_order(GITS_BASER_PAGES_MAX * psz);
> +			pr_warn("%s: Device Table too large, reduce its page order to %u (%u pages)\n",
> +				its->msi_chip.of_node->full_name, order, alloc_pages);
> +		}
> +
>  		base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
>  		if (!base) {
>  			err = -ENOMEM;
> @@ -865,7 +874,7 @@ static int its_alloc_tables(struct its_node *its)
>  			break;
>  		}
>  
> -		val |= (alloc_size / psz) - 1;
> +		val |= alloc_pages - 1;
>  
>  		writeq_relaxed(val, its->base + GITS_BASER + i * 8);
>  		tmp = readq_relaxed(its->base + GITS_BASER + i * 8);
> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
> index ffbc034c8810..f28da189c4aa 100644
> --- a/include/linux/irqchip/arm-gic-v3.h
> +++ b/include/linux/irqchip/arm-gic-v3.h
> @@ -229,6 +229,7 @@
>  #define GITS_BASER_PAGE_SIZE_16K	(1UL << GITS_BASER_PAGE_SIZE_SHIFT)
>  #define GITS_BASER_PAGE_SIZE_64K	(2UL << GITS_BASER_PAGE_SIZE_SHIFT)
>  #define GITS_BASER_PAGE_SIZE_MASK	(3UL << GITS_BASER_PAGE_SIZE_SHIFT)
> +#define GITS_BASER_PAGES_MAX		256
>  
>  #define GITS_BASER_TYPE_NONE		0
>  #define GITS_BASER_TYPE_DEVICE		1



-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/4] arm64: gicv3: its: Read typer register outside the loop
  2015-05-03 20:49 ` [PATCH 3/4] arm64: gicv3: its: Read typer register outside the loop Robert Richter
@ 2015-05-20 12:15   ` Marc Zyngier
  0 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2015-05-20 12:15 UTC (permalink / raw)
  To: Robert Richter
  Cc: Thomas Gleixner, Jason Cooper, Tirumalesh Chalamarla,
	Radha Mohan Chintakuntla, linux-kernel, linux-arm-kernel,
	Robert Richter

On Sun, 3 May 2015 21:49:31 +0100
Robert Richter <rric@kernel.org> wrote:

> From: Robert Richter <rrichter@cavium.com>
> 
> No need to read the typer register in the loop. Values do not change.
> 
> Signed-off-by: Robert Richter <rrichter@cavium.com>

Fair enough.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.

> ---
>  drivers/irqchip/irq-gic-v3-its.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index a2619a1d5bb3..916c4724c771 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -803,6 +803,8 @@ static int its_alloc_tables(struct its_node *its)
>  	int psz = SZ_64K;
>  	u64 shr = GITS_BASER_InnerShareable;
>  	u64 cache = GITS_BASER_WaWb;
> +	u64 typer = readq_relaxed(its->base + GITS_TYPER);
> +	u32 ids = GITS_TYPER_DEVBITS(typer);
>  
>  	for (i = 0; i < GITS_BASER_NR_REGS; i++) {
>  		u64 val = readq_relaxed(its->base + GITS_BASER + i * 8);
> @@ -826,9 +828,6 @@ static int its_alloc_tables(struct its_node *its)
>  		 * For other tables, only allocate a single page.
>  		 */
>  		if (type == GITS_BASER_TYPE_DEVICE) {
> -			u64 typer = readq_relaxed(its->base + GITS_TYPER);
> -			u32 ids = GITS_TYPER_DEVBITS(typer);
> -
>  			order = get_order((1UL << ids) * entry_size);
>  			if (order >= MAX_ORDER) {
>  				order = MAX_ORDER - 1;



-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX
  2015-05-12 17:24           ` Will Deacon
  2015-05-12 17:46             ` Robert Richter
@ 2015-05-20 12:22             ` Marc Zyngier
  2015-05-20 12:31               ` Robert Richter
  1 sibling, 1 reply; 24+ messages in thread
From: Marc Zyngier @ 2015-05-20 12:22 UTC (permalink / raw)
  To: Will Deacon
  Cc: Robert Richter, Robert Richter, Catalin Marinas,
	Tirumalesh Chalamarla, Radha Mohan Chintakuntla, linux-kernel,
	linux-arm-kernel

On Tue, 12 May 2015 18:24:16 +0100
Will Deacon <will.deacon@arm.com> wrote:

> On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
> > On 12.05.15 13:30:57, Will Deacon wrote:
> > > On Mon, May 11, 2015 at 10:14:38AM +0100, Robert Richter wrote:
> > > > On 05.05.15 11:53:29, Will Deacon wrote:
> > > > > On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote:
> > > > > > From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
> > > > > > 
> > > > > > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
> > > > > > which is bigger than the allowed max order. So we are forcing it only in
> > > > > > case of 4KB page size.
> > > > > 
> > > > > Does this problem disappear if the ITS driver uses dma_alloc_coherent
> > > > > instead? That would also allow us to remove the __flush_dcache_area abuse
> > > > > from the driver.
> > > > 
> > > > __get_free_pages() is also used internally in dma_alloc_coherent().
> > > > 
> > > > There is another case if the device brings dma mem with it. I am not
> > > > sure if it would be possible to assign some phys memory via devicetree
> > > > to the interrupt controller and then assign that range for its table
> > > > allocation.
> > > > 
> > > > Another option would be to allocate a hugepage. This would require
> > > > setting up hugepages during boottime. I need to figure out whether
> > > > that could work.
> > 
> > For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
> > default pagesize) I see this different approaches:
> 
> 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
> a sparse DeviceID space or both?

That's probably due to the sparseness of the DeviceID space. With some
form of bridge number encoded on top of the BFD number, the device
table is enormous, and I don't see a nice way to avoid it...

	M.
-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX
  2015-05-20 12:22             ` Marc Zyngier
@ 2015-05-20 12:31               ` Robert Richter
  2015-05-20 16:48                 ` Catalin Marinas
  0 siblings, 1 reply; 24+ messages in thread
From: Robert Richter @ 2015-05-20 12:31 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Will Deacon, Robert Richter, Catalin Marinas,
	Tirumalesh Chalamarla, Radha Mohan Chintakuntla, linux-kernel,
	linux-arm-kernel

On 20.05.15 13:22:13, Marc Zyngier wrote:
> On Tue, 12 May 2015 18:24:16 +0100
> Will Deacon <will.deacon@arm.com> wrote:
> > On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
> > > On 12.05.15 13:30:57, Will Deacon wrote:

> > > For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
> > > default pagesize) I see this different approaches:
> > 
> > 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
> > a sparse DeviceID space or both?
> 
> That's probably due to the sparseness of the DeviceID space. With some
> form of bridge number encoded on top of the BFD number, the device
> table is enormous, and I don't see a nice way to avoid it...

Right. At the momement out of 21 bits (16MB) we currently have 2 spare
bits, which reduces the actually size used to 4MB. Though, for the
current cpu model we can reduce it at least to 8MB total.

I will come up with an additional patch setting this to 8MB.

As said before, I also write on a patch to use CMA.

Thanks,

-Robert

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id
  2015-05-20 12:11   ` Marc Zyngier
@ 2015-05-20 12:48     ` Robert Richter
  2015-05-22  8:26       ` Marc Zyngier
  0 siblings, 1 reply; 24+ messages in thread
From: Robert Richter @ 2015-05-20 12:48 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Robert Richter, Thomas Gleixner, Jason Cooper,
	Tirumalesh Chalamarla, Radha Mohan Chintakuntla, linux-kernel,
	linux-arm-kernel

Mark,

thanks for review, also of the other patches of this series.

See below

On 20.05.15 13:11:38, Marc Zyngier wrote:
> > -	dev_alias->dev_id = alias;
> > +	dev_alias->dev_id = (pci_domain_nr(pdev->bus) << 16) | alias;

> This feels very scary. We're now assuming that the domain number will
> always be presented to the doorbell. What guarantee do we have that
> this is always the case, irrespective of the platform?
> 
> Also, domains have no PCI reality, they are a Linux thing. And they can
> be "randomly" assigned, unless you force the domain in DT with a
> linux,pci-domain property. This looks even more wrong, specially
> considering ACPI.

The main problem here is that device ids (32 bits) are system
specific. Since we have more than one PCI root complex we need the
upper 16 bits in the devid for mapping. Using pci_domain_nr for this
fits our needs for now and shouldn't affect systems with a single RC
only as the domain nr is zero then.

The domain number is incremented during initialization beginnig with
zero and the order of it is fixed since it is taken from DT or ACPI
tables. So we have full controll of it. I don't see issues here.

> It really feels like we need a way to describe how the BDF numbering is
> augmented. We also need to guarantee that we get the actual bridge
> number, as opposed to the domain number.

But true, the obove is just intermediate. In the end we need some sort
of handler that is setup during cpu initialization that registers a
callback for the gic to determine the device id of that paricular
system.

-Robert

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX
  2015-05-20 12:31               ` Robert Richter
@ 2015-05-20 16:48                 ` Catalin Marinas
  2015-05-21  8:35                   ` Marc Zyngier
  0 siblings, 1 reply; 24+ messages in thread
From: Catalin Marinas @ 2015-05-20 16:48 UTC (permalink / raw)
  To: Robert Richter
  Cc: Marc Zyngier, Robert Richter, Will Deacon, linux-kernel,
	Tirumalesh Chalamarla, Radha Mohan Chintakuntla,
	linux-arm-kernel

On Wed, May 20, 2015 at 02:31:59PM +0200, Robert Richter wrote:
> On 20.05.15 13:22:13, Marc Zyngier wrote:
> > On Tue, 12 May 2015 18:24:16 +0100
> > Will Deacon <will.deacon@arm.com> wrote:
> > > On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
> > > > On 12.05.15 13:30:57, Will Deacon wrote:
> 
> > > > For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
> > > > default pagesize) I see this different approaches:
> > > 
> > > 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
> > > a sparse DeviceID space or both?
> > 
> > That's probably due to the sparseness of the DeviceID space. With some
> > form of bridge number encoded on top of the BFD number, the device
> > table is enormous, and I don't see a nice way to avoid it...
> 
> Right. At the momement out of 21 bits (16MB) we currently have 2 spare
> bits, which reduces the actually size used to 4MB. Though, for the
> current cpu model we can reduce it at least to 8MB total.
> 
> I will come up with an additional patch setting this to 8MB.
> 
> As said before, I also write on a patch to use CMA.

Can we not reserve a chunk of memory and pass the information to the
kernel via DT (/memreserve/ and a new GIC-specific binding)?

-- 
Catalin

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX
  2015-05-20 16:48                 ` Catalin Marinas
@ 2015-05-21  8:35                   ` Marc Zyngier
  2015-05-21 12:13                     ` Robert Richter
  0 siblings, 1 reply; 24+ messages in thread
From: Marc Zyngier @ 2015-05-21  8:35 UTC (permalink / raw)
  To: Catalin Marinas, Robert Richter
  Cc: Robert Richter, Will Deacon, linux-kernel, Tirumalesh Chalamarla,
	Radha Mohan Chintakuntla, linux-arm-kernel

On 20/05/15 17:48, Catalin Marinas wrote:
> On Wed, May 20, 2015 at 02:31:59PM +0200, Robert Richter wrote:
>> On 20.05.15 13:22:13, Marc Zyngier wrote:
>>> On Tue, 12 May 2015 18:24:16 +0100
>>> Will Deacon <will.deacon@arm.com> wrote:
>>>> On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
>>>>> On 12.05.15 13:30:57, Will Deacon wrote:
>>
>>>>> For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
>>>>> default pagesize) I see this different approaches:
>>>>
>>>> 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
>>>> a sparse DeviceID space or both?
>>>
>>> That's probably due to the sparseness of the DeviceID space. With some
>>> form of bridge number encoded on top of the BFD number, the device
>>> table is enormous, and I don't see a nice way to avoid it...
>>
>> Right. At the momement out of 21 bits (16MB) we currently have 2 spare
>> bits, which reduces the actually size used to 4MB. Though, for the
>> current cpu model we can reduce it at least to 8MB total.
>>
>> I will come up with an additional patch setting this to 8MB.
>>
>> As said before, I also write on a patch to use CMA.
> 
> Can we not reserve a chunk of memory and pass the information to the
> kernel via DT (/memreserve/ and a new GIC-specific binding)?

That would have to be done on a per-table basis then. And how would that
work with ACPI? I don't think the ACPI ITS table specifies anything in
that respect.

We're just facing the horrible reality that linear tables are not very
well suited to sparse addressing. Nobody copied the VAX MMU model for a
reason... until now.

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX
  2015-05-21  8:35                   ` Marc Zyngier
@ 2015-05-21 12:13                     ` Robert Richter
  2015-05-21 12:34                       ` Marc Zyngier
  0 siblings, 1 reply; 24+ messages in thread
From: Robert Richter @ 2015-05-21 12:13 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Robert Richter, Will Deacon, linux-kernel,
	Tirumalesh Chalamarla, Radha Mohan Chintakuntla,
	linux-arm-kernel

On 21.05.15 09:35:47, Marc Zyngier wrote:
> On 20/05/15 17:48, Catalin Marinas wrote:
> > On Wed, May 20, 2015 at 02:31:59PM +0200, Robert Richter wrote:
> >> On 20.05.15 13:22:13, Marc Zyngier wrote:
> >>> On Tue, 12 May 2015 18:24:16 +0100
> >>> Will Deacon <will.deacon@arm.com> wrote:
> >>>> On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
> >>>>> On 12.05.15 13:30:57, Will Deacon wrote:
> >>
> >>>>> For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
> >>>>> default pagesize) I see this different approaches:
> >>>>
> >>>> 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
> >>>> a sparse DeviceID space or both?
> >>>
> >>> That's probably due to the sparseness of the DeviceID space. With some
> >>> form of bridge number encoded on top of the BFD number, the device
> >>> table is enormous, and I don't see a nice way to avoid it...
> >>
> >> Right. At the momement out of 21 bits (16MB) we currently have 2 spare
> >> bits, which reduces the actually size used to 4MB. Though, for the
> >> current cpu model we can reduce it at least to 8MB total.
> >>
> >> I will come up with an additional patch setting this to 8MB.
> >>
> >> As said before, I also write on a patch to use CMA.
> > 
> > Can we not reserve a chunk of memory and pass the information to the
> > kernel via DT (/memreserve/ and a new GIC-specific binding)?
> 
> That would have to be done on a per-table basis then. And how would that
> work with ACPI? I don't think the ACPI ITS table specifies anything in
> that respect.
> 
> We're just facing the horrible reality that linear tables are not very
> well suited to sparse addressing. Nobody copied the VAX MMU model for a
> reason... until now.

We could still fall back to mem alloc if the DT or ACPI does not
provide a base address for the table.

For know I would prefer to just implement mem allocation with CMA.

-Robert

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX
  2015-05-21 12:13                     ` Robert Richter
@ 2015-05-21 12:34                       ` Marc Zyngier
  0 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2015-05-21 12:34 UTC (permalink / raw)
  To: Robert Richter
  Cc: Catalin Marinas, Robert Richter, Will Deacon, linux-kernel,
	Tirumalesh Chalamarla, Radha Mohan Chintakuntla

[off the list]

On 21/05/15 13:13, Robert Richter wrote:
> On 21.05.15 09:35:47, Marc Zyngier wrote:
>> On 20/05/15 17:48, Catalin Marinas wrote:
>>> On Wed, May 20, 2015 at 02:31:59PM +0200, Robert Richter wrote:
>>>> On 20.05.15 13:22:13, Marc Zyngier wrote:
>>>>> On Tue, 12 May 2015 18:24:16 +0100
>>>>> Will Deacon <will.deacon@arm.com> wrote:
>>>>>> On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
>>>>>>> On 12.05.15 13:30:57, Will Deacon wrote:
>>>>
>>>>>>> For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
>>>>>>> default pagesize) I see this different approaches:
>>>>>>
>>>>>> 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
>>>>>> a sparse DeviceID space or both?
>>>>>
>>>>> That's probably due to the sparseness of the DeviceID space. With some
>>>>> form of bridge number encoded on top of the BFD number, the device
>>>>> table is enormous, and I don't see a nice way to avoid it...
>>>>
>>>> Right. At the momement out of 21 bits (16MB) we currently have 2 spare
>>>> bits, which reduces the actually size used to 4MB. Though, for the
>>>> current cpu model we can reduce it at least to 8MB total.
>>>>
>>>> I will come up with an additional patch setting this to 8MB.
>>>>
>>>> As said before, I also write on a patch to use CMA.
>>>
>>> Can we not reserve a chunk of memory and pass the information to the
>>> kernel via DT (/memreserve/ and a new GIC-specific binding)?
>>
>> That would have to be done on a per-table basis then. And how would that
>> work with ACPI? I don't think the ACPI ITS table specifies anything in
>> that respect.
>>
>> We're just facing the horrible reality that linear tables are not very
>> well suited to sparse addressing. Nobody copied the VAX MMU model for a
>> reason... until now.
> 
> We could still fall back to mem alloc if the DT or ACPI does not
> provide a base address for the table.
> 
> For know I would prefer to just implement mem allocation with CMA.

I suppose your ITS implementation doesn't have support for the indirect
tables (where bit 62 of GITS_BASERn can be 1)? If it did, that would
solve most of the issues.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id
  2015-05-20 12:48     ` Robert Richter
@ 2015-05-22  8:26       ` Marc Zyngier
  2015-05-22 22:57         ` Chalamarla, Tirumalesh
  0 siblings, 1 reply; 24+ messages in thread
From: Marc Zyngier @ 2015-05-22  8:26 UTC (permalink / raw)
  To: Robert Richter
  Cc: Robert Richter, Thomas Gleixner, Jason Cooper,
	Tirumalesh Chalamarla, Radha Mohan Chintakuntla, linux-kernel,
	linux-arm-kernel

On 20/05/15 13:48, Robert Richter wrote:
> Mark,
> 
> thanks for review, also of the other patches of this series.
> 
> See below
> 
> On 20.05.15 13:11:38, Marc Zyngier wrote:
>>> -	dev_alias->dev_id = alias;
>>> +	dev_alias->dev_id = (pci_domain_nr(pdev->bus) << 16) | alias;
> 
>> This feels very scary. We're now assuming that the domain number will
>> always be presented to the doorbell. What guarantee do we have that
>> this is always the case, irrespective of the platform?
>>
>> Also, domains have no PCI reality, they are a Linux thing. And they can
>> be "randomly" assigned, unless you force the domain in DT with a
>> linux,pci-domain property. This looks even more wrong, specially
>> considering ACPI.
> 
> The main problem here is that device ids (32 bits) are system
> specific. Since we have more than one PCI root complex we need the
> upper 16 bits in the devid for mapping. Using pci_domain_nr for this
> fits our needs for now and shouldn't affect systems with a single RC
> only as the domain nr is zero then.
> 
> The domain number is incremented during initialization beginnig with
> zero and the order of it is fixed since it is taken from DT or ACPI
> tables. So we have full controll of it. I don't see issues here.

This may match what you have on ThunderX (as long as the kernel doesn't
adopt another behaviour when allocating the domain number). But other
platforms may have a completely different numbering, which will mess
them up entirely.

>> It really feels like we need a way to describe how the BDF numbering is
>> augmented. We also need to guarantee that we get the actual bridge
>> number, as opposed to the domain number.
> 
> But true, the obove is just intermediate. In the end we need some sort
> of handler that is setup during cpu initialization that registers a
> callback for the gic to determine the device id of that paricular
> system.

I don't really like the idea of a callback from the GIC - I'd prefer it
to be standalone, and rely on the topology information to build the
DeviceID. Mark Rutland had some ideas for DT (he posted an RFC a while
ago), maybe it would be good to get back to that and find out what we
can do. ACPI should also have similar information (IORT?).

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id
  2015-05-22  8:26       ` Marc Zyngier
@ 2015-05-22 22:57         ` Chalamarla, Tirumalesh
  2015-05-25 10:38           ` Marc Zyngier
  0 siblings, 1 reply; 24+ messages in thread
From: Chalamarla, Tirumalesh @ 2015-05-22 22:57 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Richter, Robert, Robert Richter, Thomas Gleixner, Jason Cooper,
	Tirumalesh Chalamarla, Radha Mohan Chintakuntla, linux-kernel,
	linux-arm-kernel


> On May 22, 2015, at 1:26 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> 
> On 20/05/15 13:48, Robert Richter wrote:
>> Mark,
>> 
>> thanks for review, also of the other patches of this series.
>> 
>> See below
>> 
>> On 20.05.15 13:11:38, Marc Zyngier wrote:
>>>> -	dev_alias->dev_id = alias;
>>>> +	dev_alias->dev_id = (pci_domain_nr(pdev->bus) << 16) | alias;
>> 
>>> This feels very scary. We're now assuming that the domain number will
>>> always be presented to the doorbell. What guarantee do we have that
>>> this is always the case, irrespective of the platform?
>>> 
>>> Also, domains have no PCI reality, they are a Linux thing. And they can
>>> be "randomly" assigned, unless you force the domain in DT with a
>>> linux,pci-domain property. This looks even more wrong, specially
>>> considering ACPI.
>> 
>> The main problem here is that device ids (32 bits) are system
>> specific. Since we have more than one PCI root complex we need the
>> upper 16 bits in the devid for mapping. Using pci_domain_nr for this
>> fits our needs for now and shouldn't affect systems with a single RC
>> only as the domain nr is zero then.
>> 
>> The domain number is incremented during initialization beginnig with
>> zero and the order of it is fixed since it is taken from DT or ACPI
>> tables. So we have full controll of it. I don't see issues here.
> 
> This may match what you have on ThunderX (as long as the kernel doesn't
> adopt another behaviour when allocating the domain number). But other
> platforms may have a completely different numbering, which will mess
> them up entirely.
> 
>>> It really feels like we need a way to describe how the BDF numbering is
>>> augmented. We also need to guarantee that we get the actual bridge
>>> number, as opposed to the domain number.
>> 
>> But true, the obove is just intermediate. In the end we need some sort
>> of handler that is setup during cpu initialization that registers a
>> callback for the gic to determine the device id of that paricular
>> system.
> 
> I don't really like the idea of a callback from the GIC - I'd prefer it
> to be standalone, and rely on the topology information to build the
> DeviceID. Mark Rutland had some ideas for DT (he posted an RFC a while
> ago), maybe it would be good to get back to that and find out what we
> can do. ACPI should also have similar information (IORT?).
> 

How can some one pass this from DT, especially in GIC entry. i still think it is bus owner responsibility and call back is better idea. 
but if some one has a better idea for DT and ACPI, we are fine as long as it works on ThunderX.   

Thanks,
Tirumalesh. 


> Thanks,
> 
> 	M.
> -- 
> Jazz is not dead. It just smells funny...


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id
  2015-05-22 22:57         ` Chalamarla, Tirumalesh
@ 2015-05-25 10:38           ` Marc Zyngier
  0 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2015-05-25 10:38 UTC (permalink / raw)
  To: Chalamarla, Tirumalesh, Mark Rutland
  Cc: Richter, Robert, Robert Richter, Thomas Gleixner, Jason Cooper,
	Tirumalesh Chalamarla, Radha Mohan Chintakuntla, linux-kernel,
	linux-arm-kernel

On Fri, 22 May 2015 23:57:40 +0100
"Chalamarla, Tirumalesh" <Tirumalesh.Chalamarla@caviumnetworks.com>
wrote:

Hi Tirumalesh,

> 
> > On May 22, 2015, at 1:26 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> > 
> > On 20/05/15 13:48, Robert Richter wrote:
> >> Mark,
> >> 
> >> thanks for review, also of the other patches of this series.
> >> 
> >> See below
> >> 
> >> On 20.05.15 13:11:38, Marc Zyngier wrote:
> >>>> -	dev_alias->dev_id = alias;
> >>>> +	dev_alias->dev_id = (pci_domain_nr(pdev->bus) << 16) | alias;
> >> 
> >>> This feels very scary. We're now assuming that the domain number will
> >>> always be presented to the doorbell. What guarantee do we have that
> >>> this is always the case, irrespective of the platform?
> >>> 
> >>> Also, domains have no PCI reality, they are a Linux thing. And they can
> >>> be "randomly" assigned, unless you force the domain in DT with a
> >>> linux,pci-domain property. This looks even more wrong, specially
> >>> considering ACPI.
> >> 
> >> The main problem here is that device ids (32 bits) are system
> >> specific. Since we have more than one PCI root complex we need the
> >> upper 16 bits in the devid for mapping. Using pci_domain_nr for this
> >> fits our needs for now and shouldn't affect systems with a single RC
> >> only as the domain nr is zero then.
> >> 
> >> The domain number is incremented during initialization beginnig with
> >> zero and the order of it is fixed since it is taken from DT or ACPI
> >> tables. So we have full controll of it. I don't see issues here.
> > 
> > This may match what you have on ThunderX (as long as the kernel doesn't
> > adopt another behaviour when allocating the domain number). But other
> > platforms may have a completely different numbering, which will mess
> > them up entirely.
> > 
> >>> It really feels like we need a way to describe how the BDF numbering is
> >>> augmented. We also need to guarantee that we get the actual bridge
> >>> number, as opposed to the domain number.
> >> 
> >> But true, the obove is just intermediate. In the end we need some sort
> >> of handler that is setup during cpu initialization that registers a
> >> callback for the gic to determine the device id of that paricular
> >> system.
> > 
> > I don't really like the idea of a callback from the GIC - I'd prefer it
> > to be standalone, and rely on the topology information to build the
> > DeviceID. Mark Rutland had some ideas for DT (he posted an RFC a while
> > ago), maybe it would be good to get back to that and find out what we
> > can do. ACPI should also have similar information (IORT?).
> > 
> 
> How can some one pass this from DT, especially in GIC entry. i still
> think it is bus owner responsibility and call back is better idea.
> but if some one has a better idea for DT and ACPI, we are fine as
> long as it works on ThunderX.   

A callback would have to be bus-specific, and depends from the observer
of the access. There is strictly no guarantee that a single write from
the device is performed using the same ID to the IOMMU and to the MSI
doorbell. Actually, they are very likely to be different. A generic
callback would have to know about the point where this access is
observed, and expressing this is a nightmare.

Also, I'm really opposed to having platform-specific code that has for
sole purpose to describe the hardware. This is why we have DT (and to a
lesser extent ACPI). We've been there on 32bit, and learned our lesson.
It doesn't scale, it leads to a bunch of hacks in all corners, and I
don't feel like being on the receiving end of something like this.

I really suggest you look at Mark's suggestion:


http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/333199.html
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-May/341584.html

because so far, this is the only proposal that makes any sense to me in
the long run. Feel free to comment on it and help us making something
that also work for your favorite SoC.

Thanks,

        M.
-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2015-05-25 10:38 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-03 20:49 [PATCH 0/4] arm64: gicv3: its: Fixes and updates for ThunderX Robert Richter
2015-05-03 20:49 ` [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id Robert Richter
2015-05-20 12:11   ` Marc Zyngier
2015-05-20 12:48     ` Robert Richter
2015-05-22  8:26       ` Marc Zyngier
2015-05-22 22:57         ` Chalamarla, Tirumalesh
2015-05-25 10:38           ` Marc Zyngier
2015-05-03 20:49 ` [PATCH 2/4] arm64: gicv3: its: Add range check for number of allocated pages Robert Richter
2015-05-20 12:14   ` Marc Zyngier
2015-05-03 20:49 ` [PATCH 3/4] arm64: gicv3: its: Read typer register outside the loop Robert Richter
2015-05-20 12:15   ` Marc Zyngier
2015-05-03 20:49 ` [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX Robert Richter
2015-05-05 10:53   ` Will Deacon
2015-05-11  9:14     ` Robert Richter
2015-05-12 12:30       ` Will Deacon
2015-05-12 16:20         ` Robert Richter
2015-05-12 17:24           ` Will Deacon
2015-05-12 17:46             ` Robert Richter
2015-05-20 12:22             ` Marc Zyngier
2015-05-20 12:31               ` Robert Richter
2015-05-20 16:48                 ` Catalin Marinas
2015-05-21  8:35                   ` Marc Zyngier
2015-05-21 12:13                     ` Robert Richter
2015-05-21 12:34                       ` Marc Zyngier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).