All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
@ 2014-05-14 15:50 Thomas Petazzoni
  2014-05-14 15:50 ` [RFC PATCHv1 1/7] ARM: extend machine_desc with additional flags Thomas Petazzoni
                   ` (8 more replies)
  0 siblings, 9 replies; 28+ messages in thread
From: Thomas Petazzoni @ 2014-05-14 15:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

Disclaimer: this is an RFC patch series.

Several of the latest Marvell Armada EBU platforms have a special
feature that is fairly uncommon on ARM platforms: hardware I/O
coherency. This feature allows to remove all cache maintenance
operations that are normally needed when doing DMA transfers (only an
I/O sync barrier is needed for DMA transfers from device).

This hardware I/O coherency mechanism needs a set of ARM core
requirements to operate properly:

 * On Armada 370 (a single core processor)

   - The cache policy of pages must be set to "write allocate".

 * On Armada XP (which has dual core and quad core variants)

   - The cache policy of pages must be set to "write allocate".
   - The SMP bit in the Auxiliary Functional Modes Control Register 0
     must be set (remember the CPU core is PJ4B)
   - The pages must be set as shareable.

 * On Armada 375/38x (which have single core and dual core variants)

   - The cache policy of pages must be set to "write allocate".
   - The SMP and TLB broadcast bits must be set in the Auxiliary
     Control Register (the core is a Cortex-A9)
   - The pages must be set as shareable.
   - The SCU must be enabled

All of these requirements are met when the kernel is configured with
CONFIG_SMP *and* when the processor is actually a multiple core
processors (otherwise, due to the CONFIG_SMP_ON_UP logic, is_smp()
returns false, and most of the requirements above are not met).

For example, as of today, Armada 370 is broken because even if we
build the kernel CONFIG_SMP, the pages do not have the "write
allocate" attribute, because Armada 370 is single core, therefore the
CONFIG_SMP_ON_UP logic decides that we're not on an SMP system, and
therefore is_smp() returns false. Therefore, the fact that hardware
I/O coherency is enabled on Armada 370 today is incorrect, and there
is no way to fix that without making changes to the ARM core code.

Similarly, the Armada 380 (single core variant of the Armada 385) will
have the same problem: it's a single core processor, but that requires
several characteristics of an SMP configuration to properly use
hardware I/O coherency.

Also:

 - Not being able to use hardware I/O coherency on Armada 370 and
   Armada 380 is not really an option, as it is a major feature of
   those SoCs.

 - Having to enable CONFIG_SMP is also not very nice, as it comes with
   other performance penalties that could be avoided by using a
   !CONFIG_SMP configuration on those single core systems. And again,
   enabling CONFIG_SMP does *not* work for single core processors due
   to CONFIG_SMP_ON_UP.

Therefore, this RFC patch series proposes one solution to allow these
requirements to be met in the situations we are interested. The patch
series is *only* a proposal, and I'm definitely interested in hearing
about other implementation possibilites to make things look nicer.

Basically, the patch series goes like this:

 * PATCH 1 adds a 'flags' field to 'struct machine_desc' so that each
   platform can tell the ARM kernel core some of its requirements. We
   maybe could have used the Device Tree as well, but accessing the
   Device Tree as early as in paging_init() is a bit problematic as
   far as I understand.

   Two flags are defined: MACHINE_NEEDS_CPOLICY_WRITEALLOC and
   MACHINE_NEEDS_SHAREABLE_PAGES. We need separate flags because
   Armada 370 cannot have the shareable attribute on page tables.

   I am completely open to discussing other possibilities to achieve
   the same goal.

 * PATCH 2 actually implements the logic behind
   MACHINE_NEEDS_CPOLICY_WRITEALLOC and MACHINE_NEEDS_SHAREABLE_PAGES.

 * PATCH 3 sets the SMP and TLB broadcast bits in the Cortex-A9 CP15
   register. This is normally done in proc-v7.S for real SMP systems,
   but if the platform (such as Armada 380) is detected as a single
   core, proc-v7.S will not set the SMP bit and TLB broadcast
   bits. Since we don't have an easy access to the 'struct
   machine_desc' from proc-v7.S, this logic has been added to
   mmu.c. This is done only is is_smp() is false and the platform has
   requested shareable pages.

 * PATCH 4 allows the SCU to be enabled even in !CONFIG_SMP
   configurations. This is needed for the Armada 380, which is a
   single core processor that has hardware I/O coherency support.

 * PATCH 5 splits the Armada 370 and Armada XP machine_desc structures
   into two separate structures, because the two platforms will have
   different flags.

 * PATCH 6 defines the appropriate flags for the Armada 370, Armada
   XP, Armada 375 and Armada 38x 'struct machine_desc'.

 * PATCH 7 now allows Armada 375 and Armada 38x to enable hardware I/O
   coherency even in non-SMP situations (i.e either CONFIG_SMP is
   disabled, or CONFIG_SMP is enabled but the processor is single
   core).

Thanks,

Thomas

Thomas Petazzoni (7):
  ARM: extend machine_desc with additional flags
  ARM: mm: implement the usage of the machine_desc flags
  ARM: mm: enable SMP bit and TLB broadcast bit on !SMP when needed
  ARM: kernel: allow the SCU to be enabled even on !SMP
  ARM: mvebu: split Armada 370 and Armada XP machine_desc
  ARM: mvebu: define the Armada 370/375/38x/XP machine_desc flags
  ARM: mvebu: I/O coherency no longer needs SMP on 375 and 38x

 arch/arm/include/asm/mach/arch.h |  5 +++++
 arch/arm/include/asm/smp_scu.h   |  2 +-
 arch/arm/kernel/smp_scu.c        |  2 --
 arch/arm/mach-mvebu/board-v7.c   | 28 +++++++++++++++++++++++-----
 arch/arm/mach-mvebu/coherency.c  | 13 +++----------
 arch/arm/mm/mmu.c                | 24 ++++++++++++++++++++----
 6 files changed, 52 insertions(+), 22 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 1/7] ARM: extend machine_desc with additional flags
  2014-05-14 15:50 [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
@ 2014-05-14 15:50 ` Thomas Petazzoni
  2014-05-14 15:50 ` [RFC PATCHv1 2/7] ARM: mm: implement the usage of the machine_desc flags Thomas Petazzoni
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Thomas Petazzoni @ 2014-05-14 15:50 UTC (permalink / raw)
  To: linux-arm-kernel

Processors implementing hardware I/O coherency rely on certain
SMP-related features to be enabled even on UP configurations: the
write allocate cache policy or the shareable attribute. These
decisions are made very early by the ARM core code, in
build_mem_type_table(), even before the machine specific
->init_early() function has been called.

In order to allow these machines to tell the ARM core code about these
specificities, while still preserving multiplatform support, this
commit extends the machine_desc structure with an additional 'flags'
field, and defines two flags: one that allows the machine to tell the
ARM core code that the cache policy should be write allocate, and
another that tells the ARM core code that the shareable attribute
should be set on page tables.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/include/asm/mach/arch.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
index 17a3fa2..f2a9b39 100644
--- a/arch/arm/include/asm/mach/arch.h
+++ b/arch/arm/include/asm/mach/arch.h
@@ -25,6 +25,10 @@ struct smp_operations;
 #define smp_init_ops(ops) (bool (*)(void))NULL
 #endif
 
+/* Possible flags in struct machine_desc */
+#define MACHINE_NEEDS_CPOLICY_WRITEALLOC BIT(0)
+#define MACHINE_NEEDS_SHAREABLE_PAGES    BIT(1)
+
 struct machine_desc {
 	unsigned int		nr;		/* architecture number	*/
 	const char		*name;		/* architecture name	*/
@@ -46,6 +50,7 @@ struct machine_desc {
 	unsigned char		reserve_lp2 :1;	/* never has lp2	*/
 	enum reboot_mode	reboot_mode;	/* default restart mode	*/
 	struct smp_operations	*smp;		/* SMP operations	*/
+	unsigned long		flags;
 	bool			(*smp_init)(void);
 	void			(*fixup)(struct tag *, char **,
 					 struct meminfo *);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 2/7] ARM: mm: implement the usage of the machine_desc flags
  2014-05-14 15:50 [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
  2014-05-14 15:50 ` [RFC PATCHv1 1/7] ARM: extend machine_desc with additional flags Thomas Petazzoni
@ 2014-05-14 15:50 ` Thomas Petazzoni
  2014-05-14 15:50 ` [RFC PATCHv1 3/7] ARM: mm: enable SMP bit and TLB broadcast bit on !SMP when needed Thomas Petazzoni
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Thomas Petazzoni @ 2014-05-14 15:50 UTC (permalink / raw)
  To: linux-arm-kernel

This commit implements the necessary changes in arch/arm/mm/mmu.c for
the two new flags of the machine_desc structure,
MACHINE_NEEDS_CPOLICY_WRITEALLOC and MACHINE_NEEDS_SHAREABLE_PAGES.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/mm/mmu.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index b68c6b2..ad6557e 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -391,7 +391,7 @@ SET_MEMORY_FN(nx, pte_set_nx)
 /*
  * Adjust the PMD section entries according to the CPU in use.
  */
-static void __init build_mem_type_table(void)
+static void __init build_mem_type_table(const struct machine_desc *mdesc)
 {
 	struct cachepolicy *cp;
 	unsigned int cr = get_cr();
@@ -414,7 +414,7 @@ static void __init build_mem_type_table(void)
 			cachepolicy = CPOLICY_WRITEBACK;
 		ecc_mask = 0;
 	}
-	if (is_smp())
+	if (is_smp() || (mdesc->flags & MACHINE_NEEDS_CPOLICY_WRITEALLOC))
 		cachepolicy = CPOLICY_WRITEALLOC;
 
 	/*
@@ -539,7 +539,7 @@ static void __init build_mem_type_table(void)
 		mem_types[MT_CACHECLEAN].prot_sect |= PMD_SECT_APX|PMD_SECT_AP_WRITE;
 #endif
 
-		if (is_smp()) {
+		if (is_smp() || (mdesc->flags & MACHINE_NEEDS_SHAREABLE_PAGES)) {
 			/*
 			 * Mark memory with the "shared" attribute
 			 * for SMP systems
@@ -1504,7 +1504,7 @@ void __init paging_init(const struct machine_desc *mdesc)
 {
 	void *zero_page;
 
-	build_mem_type_table();
+	build_mem_type_table(mdesc);
 	prepare_page_table();
 	map_lowmem();
 	dma_contiguous_remap();
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 3/7] ARM: mm: enable SMP bit and TLB broadcast bit on !SMP when needed
  2014-05-14 15:50 [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
  2014-05-14 15:50 ` [RFC PATCHv1 1/7] ARM: extend machine_desc with additional flags Thomas Petazzoni
  2014-05-14 15:50 ` [RFC PATCHv1 2/7] ARM: mm: implement the usage of the machine_desc flags Thomas Petazzoni
@ 2014-05-14 15:50 ` Thomas Petazzoni
  2014-05-14 15:50 ` [RFC PATCHv1 4/7] ARM: kernel: allow the SCU to be enabled even on !SMP Thomas Petazzoni
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Thomas Petazzoni @ 2014-05-14 15:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Cortex-A9 systems, configured in !SMP, proc-v7.S does not set the
SMP bit and the TLB broadcast bit. However, these are needed to use
the shareable attribute on page tables, which in turn is needed for
certain systems that provide hardware I/O coherency.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/mm/mmu.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index ad6557e..6929e67 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -539,6 +539,22 @@ static void __init build_mem_type_table(const struct machine_desc *mdesc)
 		mem_types[MT_CACHECLEAN].prot_sect |= PMD_SECT_APX|PMD_SECT_AP_WRITE;
 #endif
 
+		/*
+		 * On Cortex-A9 systems, configured in !SMP, proc-v7.S
+		 * has not set the SMP bit and the TLB broadcast
+		 * bit. However, these are needed to use the shareable
+		 * attribute on page tables, which in turn is needed
+		 * for certain systems that provide hardware I/O
+		 * coherency.
+		 */
+		if (read_cpuid_part_number() == ARM_CPU_PART_CORTEX_A9 &&
+		    !is_smp() && (mdesc->flags & MACHINE_NEEDS_SHAREABLE_PAGES)) {
+			u32 reg;
+			asm("mrc p15, 0, %0, c1, c0, 1" : "=r" (reg));
+			reg |= (1 << 6) | (1 << 0);
+			asm("mcr p15, 0, %0, c1, c0, 1" : : "r" (reg));
+		}
+
 		if (is_smp() || (mdesc->flags & MACHINE_NEEDS_SHAREABLE_PAGES)) {
 			/*
 			 * Mark memory with the "shared" attribute
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 4/7] ARM: kernel: allow the SCU to be enabled even on !SMP
  2014-05-14 15:50 [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
                   ` (2 preceding siblings ...)
  2014-05-14 15:50 ` [RFC PATCHv1 3/7] ARM: mm: enable SMP bit and TLB broadcast bit on !SMP when needed Thomas Petazzoni
@ 2014-05-14 15:50 ` Thomas Petazzoni
  2014-05-14 15:50 ` [RFC PATCHv1 5/7] ARM: mvebu: split Armada 370 and Armada XP machine_desc Thomas Petazzoni
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Thomas Petazzoni @ 2014-05-14 15:50 UTC (permalink / raw)
  To: linux-arm-kernel

Certain non-SMP platforms rely on the SCU being enabled to provide
hardware I/O coherency between DMA masters and the CPU. Therefore, the
scu_enable() hook should not be reduced to a no-op when CONFIG_SMP is
disabled.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/include/asm/smp_scu.h | 2 +-
 arch/arm/kernel/smp_scu.c      | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/smp_scu.h b/arch/arm/include/asm/smp_scu.h
index 0393fba..e78f72d 100644
--- a/arch/arm/include/asm/smp_scu.h
+++ b/arch/arm/include/asm/smp_scu.h
@@ -37,7 +37,7 @@ static inline int scu_power_mode(void __iomem *scu_base, unsigned int mode)
 }
 #endif
 
-#if defined(CONFIG_SMP) && defined(CONFIG_HAVE_ARM_SCU)
+#if defined(CONFIG_HAVE_ARM_SCU)
 void scu_enable(void __iomem *scu_base);
 #else
 static inline void scu_enable(void __iomem *scu_base) {}
diff --git a/arch/arm/kernel/smp_scu.c b/arch/arm/kernel/smp_scu.c
index 1aafa0d..09d21b0 100644
--- a/arch/arm/kernel/smp_scu.c
+++ b/arch/arm/kernel/smp_scu.c
@@ -22,7 +22,6 @@
 #define SCU_INVALIDATE		0x0c
 #define SCU_FPGA_REVISION	0x10
 
-#ifdef CONFIG_SMP
 /*
  * Get the number of CPU cores from the SCU configuration
  */
@@ -62,7 +61,6 @@ void scu_enable(void __iomem *scu_base)
 	 */
 	flush_cache_all();
 }
-#endif
 
 /*
  * Set the executing CPUs power mode as defined.  This will be in
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 5/7] ARM: mvebu: split Armada 370 and Armada XP machine_desc
  2014-05-14 15:50 [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
                   ` (3 preceding siblings ...)
  2014-05-14 15:50 ` [RFC PATCHv1 4/7] ARM: kernel: allow the SCU to be enabled even on !SMP Thomas Petazzoni
@ 2014-05-14 15:50 ` Thomas Petazzoni
  2014-05-14 15:50 ` [RFC PATCHv1 6/7] ARM: mvebu: define the Armada 370/375/38x/XP machine_desc flags Thomas Petazzoni
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Thomas Petazzoni @ 2014-05-14 15:50 UTC (permalink / raw)
  To: linux-arm-kernel

Armada 370 and Armada XP will need a different set of machine_desc
flags, so we need to separate their machine_desc definitions: one will
be taking care of the "marvell,armada370" compatible string, and the
other will be taking care of the "marvell,armadaxp" compatible string.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/mach-mvebu/board-v7.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mach-mvebu/board-v7.c b/arch/arm/mach-mvebu/board-v7.c
index 01cfce6..8b4b44b 100644
--- a/arch/arm/mach-mvebu/board-v7.c
+++ b/arch/arm/mach-mvebu/board-v7.c
@@ -176,17 +176,28 @@ static void __init mvebu_dt_init(void)
 	of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL);
 }
 
-static const char * const armada_370_xp_dt_compat[] = {
-	"marvell,armada-370-xp",
+static const char * const armada_370_dt_compat[] = {
+	"marvell,armada370",
 	NULL,
 };
 
-DT_MACHINE_START(ARMADA_370_XP_DT, "Marvell Armada 370/XP (Device Tree)")
-	.smp		= smp_ops(armada_xp_smp_ops),
+DT_MACHINE_START(ARMADA_370_DT, "Marvell Armada 370 (Device Tree)")
 	.init_machine	= mvebu_dt_init,
 	.init_time	= mvebu_timer_and_clk_init,
 	.restart	= mvebu_restart,
-	.dt_compat	= armada_370_xp_dt_compat,
+	.dt_compat	= armada_370_dt_compat,
+MACHINE_END
+
+static const char * const armada_xp_dt_compat[] = {
+	"marvell,armadaxp",
+	NULL,
+};
+
+DT_MACHINE_START(ARMADA_XP_DT, "Marvell Armada XP (Device Tree)")
+	.init_machine	= mvebu_dt_init,
+	.init_time	= mvebu_timer_and_clk_init,
+	.restart	= mvebu_restart,
+	.dt_compat	= armada_xp_dt_compat,
 MACHINE_END
 
 static const char * const armada_375_dt_compat[] = {
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 6/7] ARM: mvebu: define the Armada 370/375/38x/XP machine_desc flags
  2014-05-14 15:50 [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
                   ` (4 preceding siblings ...)
  2014-05-14 15:50 ` [RFC PATCHv1 5/7] ARM: mvebu: split Armada 370 and Armada XP machine_desc Thomas Petazzoni
@ 2014-05-14 15:50 ` Thomas Petazzoni
  2014-05-14 15:50 ` [RFC PATCHv1 7/7] ARM: mvebu: I/O coherency no longer needs SMP on 375 and 38x Thomas Petazzoni
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Thomas Petazzoni @ 2014-05-14 15:50 UTC (permalink / raw)
  To: linux-arm-kernel

The Armada 370 is a purely non-SMP processor, so it does not need the
MACHINE_NEEDS_SHAREABLE_PAGES flag, and more importantly this flag
cannot be set with this CPU core.

For all other processors, they are SMP-capable, but have cut-down
variants that only have one core. And for these variants, we need both
the cache policy to be write-allocate, and the shareable attribute to
be set on page tables in order for I/O coherency to work properly.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/mach-mvebu/board-v7.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm/mach-mvebu/board-v7.c b/arch/arm/mach-mvebu/board-v7.c
index 8b4b44b..d1f348e 100644
--- a/arch/arm/mach-mvebu/board-v7.c
+++ b/arch/arm/mach-mvebu/board-v7.c
@@ -186,6 +186,7 @@ DT_MACHINE_START(ARMADA_370_DT, "Marvell Armada 370 (Device Tree)")
 	.init_time	= mvebu_timer_and_clk_init,
 	.restart	= mvebu_restart,
 	.dt_compat	= armada_370_dt_compat,
+	.flags          = MACHINE_NEEDS_CPOLICY_WRITEALLOC,
 MACHINE_END
 
 static const char * const armada_xp_dt_compat[] = {
@@ -198,6 +199,8 @@ DT_MACHINE_START(ARMADA_XP_DT, "Marvell Armada XP (Device Tree)")
 	.init_time	= mvebu_timer_and_clk_init,
 	.restart	= mvebu_restart,
 	.dt_compat	= armada_xp_dt_compat,
+	.flags          = (MACHINE_NEEDS_CPOLICY_WRITEALLOC |
+			   MACHINE_NEEDS_SHAREABLE_PAGES),
 MACHINE_END
 
 static const char * const armada_375_dt_compat[] = {
@@ -210,6 +213,8 @@ DT_MACHINE_START(ARMADA_375_DT, "Marvell Armada 375 (Device Tree)")
 	.init_machine	= mvebu_dt_init,
 	.restart	= mvebu_restart,
 	.dt_compat	= armada_375_dt_compat,
+	.flags          = (MACHINE_NEEDS_CPOLICY_WRITEALLOC |
+			   MACHINE_NEEDS_SHAREABLE_PAGES),
 MACHINE_END
 
 static const char * const armada_38x_dt_compat[] = {
@@ -222,4 +227,6 @@ DT_MACHINE_START(ARMADA_38X_DT, "Marvell Armada 380/385 (Device Tree)")
 	.init_time	= mvebu_timer_and_clk_init,
 	.restart	= mvebu_restart,
 	.dt_compat	= armada_38x_dt_compat,
+	.flags          = (MACHINE_NEEDS_CPOLICY_WRITEALLOC |
+			   MACHINE_NEEDS_SHAREABLE_PAGES),
 MACHINE_END
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 7/7] ARM: mvebu: I/O coherency no longer needs SMP on 375 and 38x
  2014-05-14 15:50 [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
                   ` (5 preceding siblings ...)
  2014-05-14 15:50 ` [RFC PATCHv1 6/7] ARM: mvebu: define the Armada 370/375/38x/XP machine_desc flags Thomas Petazzoni
@ 2014-05-14 15:50 ` Thomas Petazzoni
  2014-05-14 17:04 ` [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms Catalin Marinas
  2014-05-14 17:07 ` Rob Herring
  8 siblings, 0 replies; 28+ messages in thread
From: Thomas Petazzoni @ 2014-05-14 15:50 UTC (permalink / raw)
  To: linux-arm-kernel

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/mach-mvebu/coherency.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
index d5a975b..8f3f006 100644
--- a/arch/arm/mach-mvebu/coherency.c
+++ b/arch/arm/mach-mvebu/coherency.c
@@ -322,16 +322,9 @@ static int coherency_type(void)
 	if (np) {
 		int type = (int) match->data;
 
-		/* Armada 370/XP coherency works in both UP and SMP */
-		if (type == COHERENCY_FABRIC_TYPE_ARMADA_370_XP)
-			return type;
-
-		/* Armada 375 coherency works only on SMP */
-		else if (type == COHERENCY_FABRIC_TYPE_ARMADA_375 && is_smp())
-			return type;
-
-		/* Armada 380 coherency works only on SMP */
-		else if (type == COHERENCY_FABRIC_TYPE_ARMADA_380 && is_smp())
+		if (type == COHERENCY_FABRIC_TYPE_ARMADA_370_XP ||
+		    type == COHERENCY_FABRIC_TYPE_ARMADA_375    ||
+		    type == COHERENCY_FABRIC_TYPE_ARMADA_380)
 			return type;
 	}
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-14 15:50 [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
                   ` (6 preceding siblings ...)
  2014-05-14 15:50 ` [RFC PATCHv1 7/7] ARM: mvebu: I/O coherency no longer needs SMP on 375 and 38x Thomas Petazzoni
@ 2014-05-14 17:04 ` Catalin Marinas
  2014-05-15  9:50   ` Thomas Petazzoni
  2014-05-19 13:38   ` Thomas Petazzoni
  2014-05-14 17:07 ` Rob Herring
  8 siblings, 2 replies; 28+ messages in thread
From: Catalin Marinas @ 2014-05-14 17:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 14, 2014 at 04:50:34PM +0100, Thomas Petazzoni wrote:
> This hardware I/O coherency mechanism needs a set of ARM core
> requirements to operate properly:
> 
>  * On Armada 370 (a single core processor)
> 
>    - The cache policy of pages must be set to "write allocate".

Arguably, I would make this the default for ARMv6+ CPUs even if UP. It's
a hint that the CPU may or may not ignore but it shouldn't break
anything (well, maybe some artificial benchmarks designed to show
that write-allocate caches are bad).

[...]

>  * On Armada 375/38x (which have single core and dual core variants)
> 
>    - The cache policy of pages must be set to "write allocate".
>    - The SMP and TLB broadcast bits must be set in the Auxiliary
>      Control Register (the core is a Cortex-A9)

What about setting this bit in the firmware/bootloader? It's a sane
initialisation firmware should do.

>    - The pages must be set as shareable.

Here you may have some conflict between the initial page tables set in
__create_page_tables as non-shareable (that's unless MPIDR shows it as
SMP but I guess not since smp-on-up kicks in). I have to think a bit
more about the implications (the ARM ARM has a chapter on mismatched
memory attributes and I think it talks about shareable vs
non-shareable).

>    - The SCU must be enabled

Again, could the firmware do this?

-- 
Catalin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-14 15:50 [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
                   ` (7 preceding siblings ...)
  2014-05-14 17:04 ` [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms Catalin Marinas
@ 2014-05-14 17:07 ` Rob Herring
  2014-05-15 10:01   ` Thomas Petazzoni
  8 siblings, 1 reply; 28+ messages in thread
From: Rob Herring @ 2014-05-14 17:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 14, 2014 at 10:50 AM, Thomas Petazzoni
<thomas.petazzoni@free-electrons.com> wrote:
> Hello,
>
> Disclaimer: this is an RFC patch series.
>
> Several of the latest Marvell Armada EBU platforms have a special
> feature that is fairly uncommon on ARM platforms: hardware I/O
> coherency. This feature allows to remove all cache maintenance
> operations that are normally needed when doing DMA transfers (only an
> I/O sync barrier is needed for DMA transfers from device).
>
> This hardware I/O coherency mechanism needs a set of ARM core
> requirements to operate properly:
>
>  * On Armada 370 (a single core processor)
>
>    - The cache policy of pages must be set to "write allocate".

Why do we want !SMP to be no write allocate in the first place? Seems
like we should always enable write-allocate at least for v7.

>  * On Armada XP (which has dual core and quad core variants)
>
>    - The cache policy of pages must be set to "write allocate".
>    - The SMP bit in the Auxiliary Functional Modes Control Register 0
>      must be set (remember the CPU core is PJ4B)

I know you don't like my answer of this should be done in the bootloader.

>    - The pages must be set as shareable.

What is the impact of setting shareable bit for !SMP? I would think it
would be a don't care for true UP systems. Does the 370 actually not
work with shareable pages? AMP systems could have an issue though.

>  * On Armada 375/38x (which have single core and dual core variants)
>
>    - The cache policy of pages must be set to "write allocate".
>    - The SMP and TLB broadcast bits must be set in the Auxiliary
>      Control Register (the core is a Cortex-A9)

bootloader :)

>    - The pages must be set as shareable.
>    - The SCU must be enabled

This is also needed on A9s using ACP port. (Could be in the bootloader as well.)

While I understand the we can't update the bootloader/firmware
arguments, we also would reasonably expect ACPI based server systems
from the same vendor. Those systems are going to have to get things
right in the firmware and thus will require firmware updates in the
early stages before production systems and OS's are ready. I don't
know that the vendor mindset will ever be changed other than by
refusing to accept changes upstream which should be fixed in firmware
(and distros also refusing to carry them).

> All of these requirements are met when the kernel is configured with
> CONFIG_SMP *and* when the processor is actually a multiple core
> processors (otherwise, due to the CONFIG_SMP_ON_UP logic, is_smp()
> returns false, and most of the requirements above are not met).

I think there are some other problems here with is_smp(). For example,
there are several cases where it is used for determining to read the
MPIDR register. For things such as matching MPIDR to DT cpu nodes, we
need that to work for !SMP kernels.

> For example, as of today, Armada 370 is broken because even if we
> build the kernel CONFIG_SMP, the pages do not have the "write
> allocate" attribute, because Armada 370 is single core, therefore the
> CONFIG_SMP_ON_UP logic decides that we're not on an SMP system, and
> therefore is_smp() returns false. Therefore, the fact that hardware
> I/O coherency is enabled on Armada 370 today is incorrect, and there
> is no way to fix that without making changes to the ARM core code.
>
> Similarly, the Armada 380 (single core variant of the Armada 385) will
> have the same problem: it's a single core processor, but that requires
> several characteristics of an SMP configuration to properly use
> hardware I/O coherency.
>
> Also:
>
>  - Not being able to use hardware I/O coherency on Armada 370 and
>    Armada 380 is not really an option, as it is a major feature of
>    those SoCs.
>
>  - Having to enable CONFIG_SMP is also not very nice, as it comes with
>    other performance penalties that could be avoided by using a
>    !CONFIG_SMP configuration on those single core systems. And again,
>    enabling CONFIG_SMP does *not* work for single core processors due
>    to CONFIG_SMP_ON_UP.
>
> Therefore, this RFC patch series proposes one solution to allow these
> requirements to be met in the situations we are interested. The patch
> series is *only* a proposal, and I'm definitely interested in hearing
> about other implementation possibilites to make things look nicer.
>
> Basically, the patch series goes like this:
>
>  * PATCH 1 adds a 'flags' field to 'struct machine_desc' so that each
>    platform can tell the ARM kernel core some of its requirements. We
>    maybe could have used the Device Tree as well, but accessing the
>    Device Tree as early as in paging_init() is a bit problematic as
>    far as I understand.
>
>    Two flags are defined: MACHINE_NEEDS_CPOLICY_WRITEALLOC and
>    MACHINE_NEEDS_SHAREABLE_PAGES. We need separate flags because
>    Armada 370 cannot have the shareable attribute on page tables.

I think these are too specific. We used to have arch_is_coherent()
which somewhat does what you need, but it was not multi-platform
friendly. I think you need something like "is_io_coherent" and/or
"is_mp".

Rob

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-14 17:04 ` [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms Catalin Marinas
@ 2014-05-15  9:50   ` Thomas Petazzoni
  2014-05-15 14:22     ` Catalin Marinas
  2014-05-19 13:38   ` Thomas Petazzoni
  1 sibling, 1 reply; 28+ messages in thread
From: Thomas Petazzoni @ 2014-05-15  9:50 UTC (permalink / raw)
  To: linux-arm-kernel

Catalin,

Thanks for having quickly taken a look.

On Wed, 14 May 2014 18:04:56 +0100, Catalin Marinas wrote:
> On Wed, May 14, 2014 at 04:50:34PM +0100, Thomas Petazzoni wrote:
> > This hardware I/O coherency mechanism needs a set of ARM core
> > requirements to operate properly:
> > 
> >  * On Armada 370 (a single core processor)
> > 
> >    - The cache policy of pages must be set to "write allocate".
> 
> Arguably, I would make this the default for ARMv6+ CPUs even if UP. It's
> a hint that the CPU may or may not ignore but it shouldn't break
> anything (well, maybe some artificial benchmarks designed to show
> that write-allocate caches are bad).

So, something like:

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index b68c6b2..a1d6845 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -57,7 +57,7 @@ pmd_t *top_pmd;
 #define CPOLICY_WRITEBACK      3
 #define CPOLICY_WRITEALLOC     4
 
-static unsigned int cachepolicy __initdata = CPOLICY_WRITEBACK;
+static unsigned int cachepolicy __initdata = CPOLICY_WRITEALLOC;
 static unsigned int ecc_mask __initdata = 0;
 pgprot_t pgprot_user;
 pgprot_t pgprot_kernel;
@@ -408,14 +408,14 @@ static void __init build_mem_type_table(void)
                if (cachepolicy > CPOLICY_WRITETHROUGH)
                        cachepolicy = CPOLICY_WRITETHROUGH;
 #endif
+               if (cachepolicy > CPOLICY_WRITEBACK)
+                       cachepolicy = CPOLICY_WRITEBACK;
        }
        if (cpu_arch < CPU_ARCH_ARMv5) {
                if (cachepolicy >= CPOLICY_WRITEALLOC)
                        cachepolicy = CPOLICY_WRITEBACK;
                ecc_mask = 0;
        }
-       if (is_smp())
-               cachepolicy = CPOLICY_WRITEALLOC;
 
        /*
         * Strip out features not present on earlier architectures.


> >  * On Armada 375/38x (which have single core and dual core variants)
> > 
> >    - The cache policy of pages must be set to "write allocate".
> >    - The SMP and TLB broadcast bits must be set in the Auxiliary
> >      Control Register (the core is a Cortex-A9)
> 
> What about setting this bit in the firmware/bootloader? It's a sane
> initialisation firmware should do.

I'm sorry, but I don't buy the "fix your bootloader" argument. For
various reasons:

 - The Linux kernel policy has always been to do *more* things in the
   kernel, to avoid relying on crappy vendor-specific bootloaders, and
   avoid having bugs that cannot be fixed at the kernel level. Going in
   the opposite direction looks completely backward.

 - Doing so means crazy complex dependencies of kernel versions against
   bootloader versions, which are a nightmare for users. Recently for
   example, Olof Johansson (which, you will admit, is not the least
   experienced in ARM stuff) had to struggle a long time to get a
   mainline kernel boot on Allwinner kernel, because it was mandatory
   to upgrade the bootloader to do so. He was annoyed, so you can
   imagine how normal users can be annoyed. For us, platform
   maintainers, who support users of the mainline kernel, it is a
   *complete* pain to have such dependencies of a kernel version
   towards a given bootloader version. We already have enough of such
   bootloader/kernel issues to not add more when there are sane
   solutions that allow the kernel to do its own thing without having
   to make crazy assumptions about what the bootloader has done.

 - The kernel is setting the SMP and TLB broadcast bit already in
   proc-v7.s when the system is SMP. What would these actions have to
   be done in the bootloader when the system is non-SMP, but I/O
   coherent, and done in the kernel when the system is SMP? This
   doesn't make sense.

 - Many platforms, including Marvell ones, use vendor-specific
   bootloaders that are *very* difficult to get fixed. Of course, it
   would be a lot nicer to have mainline U-Boot/Barebox support for
   those platforms, but it's not the case today.

So, really, I would like the kernel to do this.

> >    - The pages must be set as shareable.
> 
> Here you may have some conflict between the initial page tables set in
> __create_page_tables as non-shareable (that's unless MPIDR shows it as
> SMP but I guess not since smp-on-up kicks in).

For Armada 375/385/XP, MPIDR shows SMP, but not for Armada 370 (hence
the issue even when booting CONFIG_SMP=y on Armada 370). As for Armada
380 (which is a single core variant of the Armada 385), I haven't had
access to such platforms, so I'm not sure what the MPIDR will look
like. But since it's single core, most users will want to build
CONFIG_SMP=n for this platform, but still have hardware I/O coherency.

> I have to think a bit
> more about the implications (the ARM ARM has a chapter on mismatched
> memory attributes and I think it talks about shareable vs
> non-shareable).

Unfortunately, __create_page_tables is called so early that I don't
know if it's possible to access 'struct machine_desc' at this point to
know whether the platform needs shareable pages or not.

Also, don't we have the same problem with the TTB_FLAGS_{SMP,UP) ?

> >    - The SCU must be enabled
> 
> Again, could the firmware do this?

See above. If the kernel does it for SMP cases, why wouldn't it do it
also for !SMP I/O coherent cases? It's either it's *always* done by the
bootloader and we completely remove the SCU enabling code in the
kernel, and add to the ARM boot protocol that enabling the SCU is the
responsibility of the bootloader, or the kernel does the SCU enabling
in all situations.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-14 17:07 ` Rob Herring
@ 2014-05-15 10:01   ` Thomas Petazzoni
  2014-05-15 13:27     ` Will Deacon
  2014-05-15 14:44     ` Rob Herring
  0 siblings, 2 replies; 28+ messages in thread
From: Thomas Petazzoni @ 2014-05-15 10:01 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Rob Herring,

On Wed, 14 May 2014 12:07:28 -0500, Rob Herring wrote:

> > This hardware I/O coherency mechanism needs a set of ARM core
> > requirements to operate properly:
> >
> >  * On Armada 370 (a single core processor)
> >
> >    - The cache policy of pages must be set to "write allocate".
> 
> Why do we want !SMP to be no write allocate in the first place? Seems
> like we should always enable write-allocate at least for v7.

Ok, seems like it matches the suggestion from Catalin. I've sent a
quick patch in my reply to Catalin, if you could have a look and tell
if it's OK, I can use that in my next version.

> >  * On Armada XP (which has dual core and quad core variants)
> >
> >    - The cache policy of pages must be set to "write allocate".
> >    - The SMP bit in the Auxiliary Functional Modes Control Register 0
> >      must be set (remember the CPU core is PJ4B)
> 
> I know you don't like my answer of this should be done in the bootloader.

See my answer to Catalin about this.

> >    - The pages must be set as shareable.
> 
> What is the impact of setting shareable bit for !SMP? I would think it
> would be a don't care for true UP systems. Does the 370 actually not
> work with shareable pages? AMP systems could have an issue though.

No, the Armada 370 doesn't work with shareable pages. See my cover
letter that details the requirements for the different SoCs, and I've
only mentioned shareable pages as part of the requirements for Armada
XP, Armada 375 and Armada 38x.

> >  * On Armada 375/38x (which have single core and dual core variants)
> >
> >    - The cache policy of pages must be set to "write allocate".
> >    - The SMP and TLB broadcast bits must be set in the Auxiliary
> >      Control Register (the core is a Cortex-A9)
> 
> bootloader :)

No. It's done by the kernel for the SMP case, there's no reason it
should be done in the bootloader for the !SMP I/O coherent case. Or, we
remove the SMP and TLB broadcast bit fiddling in the kernel
*completely* and make it part of the ARM boot protocol that the
bootloader is required to do that.

I really don't see any justification for why it some situations the
bootloader would be responsible for it, and why in some other
situations the kernel would be responsible for it.

> >    - The pages must be set as shareable.
> >    - The SCU must be enabled
> 
> This is also needed on A9s using ACP port. (Could be in the bootloader as well.)
> 
> While I understand the we can't update the bootloader/firmware
> arguments, we also would reasonably expect ACPI based server systems
> from the same vendor. Those systems are going to have to get things
> right in the firmware and thus will require firmware updates in the
> early stages before production systems and OS's are ready. I don't
> know that the vendor mindset will ever be changed other than by
> refusing to accept changes upstream which should be fixed in firmware
> (and distros also refusing to carry them).

See above. It's not only about having issues to get the vendor to fix
the firmware. It's also about:

 * Having a consistent strategy with regard to what the kernel does and
   what the bootloader does. Why would the kernel set the SMP and TLB
   broadcast bit when CONFIG_SMP is set and the processor is SMP, but
   not when CONFIG_SMP is disabled and the system is I/O coherent?

 * User support issues. Having gazillions of bootloader updates every
   time a piece of configuration settings is rejected by the kernel
   maintainers is going to be a nightmare for users.

I certainly do understand the wish to put pressure on vendors to do
their firmware correctly, and I'm continuously pushing Marvell to be
more open, and I'm hoping to get them to do progress in the bootloader
area in the future.

However, by forcing users to do gazillions of bootloader upgrades, you
also make the mainline kernel very difficult to use for them. The
result is that those users will prefer to use the vendor BSP kernels
which can carry all the crazy fixes, and therefore the mainline kernel
is completely untested, unused, and basically just a sort of repository
of code that helps vendor reduce the overhead of bumping from one
kernel version to the other for their own vendor BSP. Is this really
what you want? Clearly not me.

> > All of these requirements are met when the kernel is configured with
> > CONFIG_SMP *and* when the processor is actually a multiple core
> > processors (otherwise, due to the CONFIG_SMP_ON_UP logic, is_smp()
> > returns false, and most of the requirements above are not met).
> 
> I think there are some other problems here with is_smp(). For example,
> there are several cases where it is used for determining to read the
> MPIDR register. For things such as matching MPIDR to DT cpu nodes, we
> need that to work for !SMP kernels.

Could you point more specifically where this is happening?

In fact, another possible solution would be to make is_smp() returns
"true" when the system is !SMP but has I/O coherency enabled. However,
I have the issue that Armada 370 does *not* want shareable pages, while
Armada XP/375/38x *require* that.

> > Basically, the patch series goes like this:
> >
> >  * PATCH 1 adds a 'flags' field to 'struct machine_desc' so that each
> >    platform can tell the ARM kernel core some of its requirements. We
> >    maybe could have used the Device Tree as well, but accessing the
> >    Device Tree as early as in paging_init() is a bit problematic as
> >    far as I understand.
> >
> >    Two flags are defined: MACHINE_NEEDS_CPOLICY_WRITEALLOC and
> >    MACHINE_NEEDS_SHAREABLE_PAGES. We need separate flags because
> >    Armada 370 cannot have the shareable attribute on page tables.
> 
> I think these are too specific. We used to have arch_is_coherent()
> which somewhat does what you need, but it was not multi-platform
> friendly. I think you need something like "is_io_coherent" and/or
> "is_mp".

I agree the flags are very specific, but unfortunately I haven't found
a better way of describing things. The main problem here is that the
different platforms have different requirements: Armada XP/375/38x
require shareable pages, but Armada 370 does not want them. So a single
"is_io_coherent" flag was not sufficient, because each platform has its
own requirements. I'm really open to suggestions on this, but a single
is_io_coherent flag doesn't work, unfortunately.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-15 10:01   ` Thomas Petazzoni
@ 2014-05-15 13:27     ` Will Deacon
  2014-05-15 13:44       ` Thomas Petazzoni
  2014-05-15 14:44     ` Rob Herring
  1 sibling, 1 reply; 28+ messages in thread
From: Will Deacon @ 2014-05-15 13:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 15, 2014 at 11:01:52AM +0100, Thomas Petazzoni wrote:
> On Wed, 14 May 2014 12:07:28 -0500, Rob Herring wrote:
> 
> > > This hardware I/O coherency mechanism needs a set of ARM core
> > > requirements to operate properly:
> > >
> > >  * On Armada 370 (a single core processor)
> > >
> > >    - The cache policy of pages must be set to "write allocate".
> > 
> > Why do we want !SMP to be no write allocate in the first place? Seems
> > like we should always enable write-allocate at least for v7.
> 
> Ok, seems like it matches the suggestion from Catalin. I've sent a
> quick patch in my reply to Catalin, if you could have a look and tell
> if it's OK, I can use that in my next version.

It's probably also worth mentioning in your commit log that modern ARM CPUs
can change the allocation policy dynamically based on the access pattern
(e.g. switch to no-allocate when there is a series of streaming stores), so
historical reasons for forcing write-no-allocate aren't applicable on these
cores.

> I really don't see any justification for why it some situations the
> bootloader would be responsible for it, and why in some other
> situations the kernel would be responsible for it.

Well, for better or worse, we are moving in the direction of Linux running
non-secure with a non-trivial amount of higher privileged software running
in the system. We can use this as a basis to decide whether or not Linux
should be setting configuration bits by looking at whether or not these bits
are accessible from non-secure svc mode.

I'm not denying that we've not been following this rule in the past, but
given where we're going with ARMv8 and arm64, the sooner people realise that
the firmware and bootloader have some duties in the way of system
configuration, then the less pain they will have later on when they try to
boot Linux.

Will

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-15 13:27     ` Will Deacon
@ 2014-05-15 13:44       ` Thomas Petazzoni
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Petazzoni @ 2014-05-15 13:44 UTC (permalink / raw)
  To: linux-arm-kernel

Will,

On Thu, 15 May 2014 14:27:54 +0100, Will Deacon wrote:

> > > >  * On Armada 370 (a single core processor)
> > > >
> > > >    - The cache policy of pages must be set to "write allocate".
> > > 
> > > Why do we want !SMP to be no write allocate in the first place? Seems
> > > like we should always enable write-allocate at least for v7.
> > 
> > Ok, seems like it matches the suggestion from Catalin. I've sent a
> > quick patch in my reply to Catalin, if you could have a look and tell
> > if it's OK, I can use that in my next version.
> 
> It's probably also worth mentioning in your commit log that modern ARM CPUs
> can change the allocation policy dynamically based on the access pattern
> (e.g. switch to no-allocate when there is a series of streaming stores), so
> historical reasons for forcing write-no-allocate aren't applicable on these
> cores.

Ok.

> > I really don't see any justification for why it some situations the
> > bootloader would be responsible for it, and why in some other
> > situations the kernel would be responsible for it.
> 
> Well, for better or worse, we are moving in the direction of Linux running
> non-secure with a non-trivial amount of higher privileged software running
> in the system. We can use this as a basis to decide whether or not Linux
> should be setting configuration bits by looking at whether or not these bits
> are accessible from non-secure svc mode.
> 
> I'm not denying that we've not been following this rule in the past, but
> given where we're going with ARMv8 and arm64, the sooner people realise that
> the firmware and bootloader have some duties in the way of system
> configuration, then the less pain they will have later on when they try to
> boot Linux.

Sure. I certainly do not deny that certain things have to be done by
the bootloader. What I strongly dislike is to be told to move into the
bootloader some initialization (such as the SMP bit, TLB broadcast bit,
and SCU initialization) which is currently done inside the kernel for
SMP configurations. Either this configuration can be done in the
kernel, and then it should be done in the kernel regardless if it's
needed for SMP or hardware I/O coherency. Or said configuration cannot
be done in the kernel (due to Linux running non-secure), and then it is
done in the bootloader in all cases, regardless of whether the
particular configuration is needed for SMP or hardware I/O coherency.

Asking me to push to the bootloader some elements of configurations
that are currently done by the kernel in the SMP case is really a poor
way of escaping the problems I'm trying to solve :-)

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-15  9:50   ` Thomas Petazzoni
@ 2014-05-15 14:22     ` Catalin Marinas
  2014-05-15 14:59       ` Rob Herring
  2014-05-19  9:17       ` Thomas Petazzoni
  0 siblings, 2 replies; 28+ messages in thread
From: Catalin Marinas @ 2014-05-15 14:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 15, 2014 at 10:50:10AM +0100, Thomas Petazzoni wrote:
> On Wed, 14 May 2014 18:04:56 +0100, Catalin Marinas wrote:
> > On Wed, May 14, 2014 at 04:50:34PM +0100, Thomas Petazzoni wrote:
> > > This hardware I/O coherency mechanism needs a set of ARM core
> > > requirements to operate properly:
> > > 
> > >  * On Armada 370 (a single core processor)
> > > 
> > >    - The cache policy of pages must be set to "write allocate".
> > 
> > Arguably, I would make this the default for ARMv6+ CPUs even if UP. It's
> > a hint that the CPU may or may not ignore but it shouldn't break
> > anything (well, maybe some artificial benchmarks designed to show
> > that write-allocate caches are bad).
> 
> So, something like:
> 
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index b68c6b2..a1d6845 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -57,7 +57,7 @@ pmd_t *top_pmd;
>  #define CPOLICY_WRITEBACK      3
>  #define CPOLICY_WRITEALLOC     4
>  
> -static unsigned int cachepolicy __initdata = CPOLICY_WRITEBACK;
> +static unsigned int cachepolicy __initdata = CPOLICY_WRITEALLOC;
>  static unsigned int ecc_mask __initdata = 0;
>  pgprot_t pgprot_user;
>  pgprot_t pgprot_kernel;
> @@ -408,14 +408,14 @@ static void __init build_mem_type_table(void)
>                 if (cachepolicy > CPOLICY_WRITETHROUGH)
>                         cachepolicy = CPOLICY_WRITETHROUGH;
>  #endif
> +               if (cachepolicy > CPOLICY_WRITEBACK)
> +                       cachepolicy = CPOLICY_WRITEBACK;
>         }

There is some more clean-up to do in this file as there are other
assumptions about cachepolicy. For example, early_cachepolicy() sets it
back to writeback (and possibly the initial kernel mapping to avoid
aliases, though having the same type, just the allocation hint doesn't
really cause problems).

> > >  * On Armada 375/38x (which have single core and dual core variants)
> > > 
> > >    - The cache policy of pages must be set to "write allocate".
> > >    - The SMP and TLB broadcast bits must be set in the Auxiliary
> > >      Control Register (the core is a Cortex-A9)
> > 
> > What about setting this bit in the firmware/bootloader? It's a sane
> > initialisation firmware should do.
> 
> I'm sorry, but I don't buy the "fix your bootloader" argument. For
> various reasons:

Well, for arm64 people should get used to this because I'm not taking
code for setting up things like SCU and ACTLR during boot.

>  - The Linux kernel policy has always been to do *more* things in the
>    kernel, to avoid relying on crappy vendor-specific bootloaders, and
>    avoid having bugs that cannot be fixed at the kernel level. Going in
>    the opposite direction looks completely backward.

There is no written policy here for how close to the hardware reset the
kernel should run. With the same arguments people may want to initialise
the DRAM controller in Linux while booting in SRAM temporarily (or
ROM/flash + XIP).

But in this particular case, let's look at ACTLR. It's a secure-only
register, so what if someone decides to deploy Linux as a non-secure OS
on this platform? Do you add a config option to have a different kernel
build?

>  - Doing so means crazy complex dependencies of kernel versions against
>    bootloader versions, which are a nightmare for users.

Because we didn't have a clear policy here, the secure extensions came
in later when we were already doing things like this. It's more
difficult to enforce it now but we face such issues regularly. For
example, on several occasions we had TI asking for per-SoC CPU
initialisation (possibly as addition to __v7_setup) for specific SMC
calls to the firmware which breaks single Image.

It make sense a lot of sense to place sane initialisation in a
SoC-specific firmware/bootloader but whether we want to enforce such
policy is Russell's call.

>  - The kernel is setting the SMP and TLB broadcast bit already in
>    proc-v7.s when the system is SMP. What would these actions have to
>    be done in the bootloader when the system is non-SMP, but I/O
>    coherent, and done in the kernel when the system is SMP? This
>    doesn't make sense.

Because of historical reasons, we can't remove the ACTLR setting in
proc-v7.S, as much as I would like. But when Linux runs in non-secure
mode, such initialisation is already done by firmware, so it's not
unheard of in the 32-bit ARM world.

> > >    - The pages must be set as shareable.
> > 
> > Here you may have some conflict between the initial page tables set in
> > __create_page_tables as non-shareable (that's unless MPIDR shows it as
> > SMP but I guess not since smp-on-up kicks in).
[...]
> > I have to think a bit
> > more about the implications (the ARM ARM has a chapter on mismatched
> > memory attributes and I think it talks about shareable vs
> > non-shareable).
> 
> Unfortunately, __create_page_tables is called so early that I don't
> know if it's possible to access 'struct machine_desc' at this point to
> know whether the platform needs shareable pages or not.
> 
> Also, don't we have the same problem with the TTB_FLAGS_{SMP,UP) ?

I don't think we do because we use the ALT_{SMP,UP} for TTB settings and
we have the smp-on-up fixup running before __v7_setup.

> > >    - The SCU must be enabled
> > 
> > Again, could the firmware do this?
> 
> See above. If the kernel does it for SMP cases, why wouldn't it do it
> also for !SMP I/O coherent cases?

The I/O coherency is an SoC property rather than an ARM architecture
property. I want to separate the two so that the kernel can boot a
significant part assuming sane architecture settings. Once you have DT
available and start loading device drivers, you are in the SoC realm and
you can do whatever initialisation for buses, interconnects, but not
going back to change architected settings.

> It's either it's *always* done by the
> bootloader and we completely remove the SCU enabling code in the
> kernel, and add to the ARM boot protocol that enabling the SCU is the
> responsibility of the bootloader, or the kernel does the SCU enabling
> in all situations.

I'm always for the former (and that's the case for arm64), though for
some older boards it's too late to change.

-- 
Catalin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-15 10:01   ` Thomas Petazzoni
  2014-05-15 13:27     ` Will Deacon
@ 2014-05-15 14:44     ` Rob Herring
  2014-05-19  9:31       ` Thomas Petazzoni
  1 sibling, 1 reply; 28+ messages in thread
From: Rob Herring @ 2014-05-15 14:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 15, 2014 at 5:01 AM, Thomas Petazzoni
<thomas.petazzoni@free-electrons.com> wrote:
> Dear Rob Herring,
>
> On Wed, 14 May 2014 12:07:28 -0500, Rob Herring wrote:
>
>> > This hardware I/O coherency mechanism needs a set of ARM core
>> > requirements to operate properly:
>> >
>> >  * On Armada 370 (a single core processor)
>> >
>> >    - The cache policy of pages must be set to "write allocate".
>>
>> Why do we want !SMP to be no write allocate in the first place? Seems
>> like we should always enable write-allocate at least for v7.
>
> Ok, seems like it matches the suggestion from Catalin. I've sent a
> quick patch in my reply to Catalin, if you could have a look and tell
> if it's OK, I can use that in my next version.

Looks fine to me.

>> >  * On Armada XP (which has dual core and quad core variants)
>> >
>> >    - The cache policy of pages must be set to "write allocate".
>> >    - The SMP bit in the Auxiliary Functional Modes Control Register 0
>> >      must be set (remember the CPU core is PJ4B)
>>
>> I know you don't like my answer of this should be done in the bootloader.
>
> See my answer to Catalin about this.

I'll reply in the other thread.

>> >    - The pages must be set as shareable.
>>
>> What is the impact of setting shareable bit for !SMP? I would think it
>> would be a don't care for true UP systems. Does the 370 actually not
>> work with shareable pages? AMP systems could have an issue though.
>
> No, the Armada 370 doesn't work with shareable pages. See my cover
> letter that details the requirements for the different SoCs, and I've
> only mentioned shareable pages as part of the requirements for Armada
> XP, Armada 375 and Armada 38x.

I asked because you only mentioned platforms requiring shared pages.
That didn't tell me whether some platforms can work either way. It
would take some hacks to test that out including dealing the initial
mappings.

>> >  * On Armada 375/38x (which have single core and dual core variants)
>> >
>> >    - The cache policy of pages must be set to "write allocate".
>> >    - The SMP and TLB broadcast bits must be set in the Auxiliary
>> >      Control Register (the core is a Cortex-A9)
>>
>> bootloader :)
>
> No. It's done by the kernel for the SMP case, there's no reason it
> should be done in the bootloader for the !SMP I/O coherent case. Or, we
> remove the SMP and TLB broadcast bit fiddling in the kernel
> *completely* and make it part of the ARM boot protocol that the
> bootloader is required to do that.

No, it depends on the A9 configuration. The TLB broadcast is not
modifiable in non-secure mode. You can control non-secure access to
the SMP bit, but it can work with it enabled by secure world. This is
what highbank does.

> I really don't see any justification for why it some situations the
> bootloader would be responsible for it, and why in some other
> situations the kernel would be responsible for it.

Like many other cases, we did not enforce what the bootloader must do
in the past. The mixture of platforms running in secure and non-secure
mode has led to some of it.

>> >    - The pages must be set as shareable.
>> >    - The SCU must be enabled
>>
>> This is also needed on A9s using ACP port. (Could be in the bootloader as well.)
>>
>> While I understand the we can't update the bootloader/firmware
>> arguments, we also would reasonably expect ACPI based server systems
>> from the same vendor. Those systems are going to have to get things
>> right in the firmware and thus will require firmware updates in the
>> early stages before production systems and OS's are ready. I don't
>> know that the vendor mindset will ever be changed other than by
>> refusing to accept changes upstream which should be fixed in firmware
>> (and distros also refusing to carry them).
>
> See above. It's not only about having issues to get the vendor to fix
> the firmware. It's also about:
>
>  * Having a consistent strategy with regard to what the kernel does and
>    what the bootloader does. Why would the kernel set the SMP and TLB
>    broadcast bit when CONFIG_SMP is set and the processor is SMP, but
>    not when CONFIG_SMP is disabled and the system is I/O coherent?

It is not that simple. The SMP bit has different meanings depending on
the core, but is more related to cache usage than running multi-core
or not. On the A15 for example, IIRC, the SMP bit basically means
disable cache lookups while the C bit only means disable cache
allocations (i.e. cache hits can occur with the "cache disabled").

>  * User support issues. Having gazillions of bootloader updates every
>    time a piece of configuration settings is rejected by the kernel
>    maintainers is going to be a nightmare for users.

It's an artifact of trying to upstream the kernel after shipping
rather than before. If these issues were fixed in the bootloader at
the same time as someone decided to just hack up the vendor kernel,
then probably none of us would even know about the issue.

> I certainly do understand the wish to put pressure on vendors to do
> their firmware correctly, and I'm continuously pushing Marvell to be
> more open, and I'm hoping to get them to do progress in the bootloader
> area in the future.
>
> However, by forcing users to do gazillions of bootloader upgrades, you
> also make the mainline kernel very difficult to use for them. The
> result is that those users will prefer to use the vendor BSP kernels
> which can carry all the crazy fixes, and therefore the mainline kernel
> is completely untested, unused, and basically just a sort of repository
> of code that helps vendor reduce the overhead of bumping from one
> kernel version to the other for their own vendor BSP. Is this really
> what you want? Clearly not me.

As long as upstream is not a requirement, you are not in a position
you can win. You can never keep up with someone that has the freedom
to just go and change whatever they want without having to deal with
those annoying kernel maintainers. You fix these issues, there will
just be other reasons why mainline is not usable (Do they still have
the random one line deletion of a goto statement in the scheduler).

Speaking from experience rebasing i.MX vendor kernels, nothing
upstream is easier to update than partially upstream if you try to
build upon what is upstream. Generally speaking, vendors don't care
what their delta against upstream is any more than they care about
upstream being production quality.

Obviously, Marvell does care about mainline to some extent or I
wouldn't be talking to you. But how do we get them to care enough to
make mainline production quality? Part of the problem is we have
distro's willing to take your money and vendor kernel (they are also
willing to take your money and upstream kernel as well).

>> > All of these requirements are met when the kernel is configured with
>> > CONFIG_SMP *and* when the processor is actually a multiple core
>> > processors (otherwise, due to the CONFIG_SMP_ON_UP logic, is_smp()
>> > returns false, and most of the requirements above are not met).
>>
>> I think there are some other problems here with is_smp(). For example,
>> there are several cases where it is used for determining to read the
>> MPIDR register. For things such as matching MPIDR to DT cpu nodes, we
>> need that to work for !SMP kernels.
>
> Could you point more specifically where this is happening?

Just grep for is_smp. IIRC, the DT cpu parsing code spews a warning
when running a !SMP kernel on an MP system. It may only be when the
MPIDR is not 0 like highbank which has 0x900.

> In fact, another possible solution would be to make is_smp() returns
> "true" when the system is !SMP but has I/O coherency enabled. However,
> I have the issue that Armada 370 does *not* want shareable pages, while
> Armada XP/375/38x *require* that.

We'd need to look more closely at how is_smp is used to redefine it's
meaning. We probably need to be able to describe "I have MP
extensions", "I'm MP on SMP kernel" and "I'm UP running an SMP
kernel".

>> > Basically, the patch series goes like this:
>> >
>> >  * PATCH 1 adds a 'flags' field to 'struct machine_desc' so that each
>> >    platform can tell the ARM kernel core some of its requirements. We
>> >    maybe could have used the Device Tree as well, but accessing the
>> >    Device Tree as early as in paging_init() is a bit problematic as
>> >    far as I understand.
>> >
>> >    Two flags are defined: MACHINE_NEEDS_CPOLICY_WRITEALLOC and
>> >    MACHINE_NEEDS_SHAREABLE_PAGES. We need separate flags because
>> >    Armada 370 cannot have the shareable attribute on page tables.
>>
>> I think these are too specific. We used to have arch_is_coherent()
>> which somewhat does what you need, but it was not multi-platform
>> friendly. I think you need something like "is_io_coherent" and/or
>> "is_mp".
>
> I agree the flags are very specific, but unfortunately I haven't found
> a better way of describing things. The main problem here is that the
> different platforms have different requirements: Armada XP/375/38x
> require shareable pages, but Armada 370 does not want them. So a single
> "is_io_coherent" flag was not sufficient, because each platform has its
> own requirements. I'm really open to suggestions on this, but a single
> is_io_coherent flag doesn't work, unfortunately.

With the write-allocate change, you only have a single flag so it is
really just a question of name and how to control the flag.

Rob

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-15 14:22     ` Catalin Marinas
@ 2014-05-15 14:59       ` Rob Herring
  2014-05-15 15:25         ` Catalin Marinas
  2014-05-19  9:17       ` Thomas Petazzoni
  1 sibling, 1 reply; 28+ messages in thread
From: Rob Herring @ 2014-05-15 14:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 15, 2014 at 9:22 AM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> On Thu, May 15, 2014 at 10:50:10AM +0100, Thomas Petazzoni wrote:
>> On Wed, 14 May 2014 18:04:56 +0100, Catalin Marinas wrote:
>> > On Wed, May 14, 2014 at 04:50:34PM +0100, Thomas Petazzoni wrote:
>> > > This hardware I/O coherency mechanism needs a set of ARM core
>> > > requirements to operate properly:

[...]

>> > >    - The SCU must be enabled
>> >
>> > Again, could the firmware do this?
>>
>> See above. If the kernel does it for SMP cases, why wouldn't it do it
>> also for !SMP I/O coherent cases?
>
> The I/O coherency is an SoC property rather than an ARM architecture
> property. I want to separate the two so that the kernel can boot a
> significant part assuming sane architecture settings. Once you have DT
> available and start loading device drivers, you are in the SoC realm and
> you can do whatever initialisation for buses, interconnects, but not
> going back to change architected settings.

The SCU has nothing to do with the architecture and really is part of
the SOC. Let's look at this another way. Are there any usecases where
you would not enable the SCU? If any cores are coherent or the ACP is
coherent, it must be on. So that leaves all core in AMP mode. In this
case, does it matter if the SCU is enabled or not?

Rob

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-15 14:59       ` Rob Herring
@ 2014-05-15 15:25         ` Catalin Marinas
  2014-05-15 19:11           ` Rob Herring
  0 siblings, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2014-05-15 15:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 15, 2014 at 03:59:31PM +0100, Rob Herring wrote:
> On Thu, May 15, 2014 at 9:22 AM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > On Thu, May 15, 2014 at 10:50:10AM +0100, Thomas Petazzoni wrote:
> >> On Wed, 14 May 2014 18:04:56 +0100, Catalin Marinas wrote:
> >> > On Wed, May 14, 2014 at 04:50:34PM +0100, Thomas Petazzoni wrote:
> >> > >    - The SCU must be enabled
> >> >
> >> > Again, could the firmware do this?
> >>
> >> See above. If the kernel does it for SMP cases, why wouldn't it do it
> >> also for !SMP I/O coherent cases?
> >
> > The I/O coherency is an SoC property rather than an ARM architecture
> > property. I want to separate the two so that the kernel can boot a
> > significant part assuming sane architecture settings. Once you have DT
> > available and start loading device drivers, you are in the SoC realm and
> > you can do whatever initialisation for buses, interconnects, but not
> > going back to change architected settings.
> 
> The SCU has nothing to do with the architecture and really is part of
> the SOC. 

Indeed, it's not part of the architecture but I don't see it any
different than other early configuration like SDRAM controller.

> Let's look at this another way. Are there any usecases where
> you would not enable the SCU? If any cores are coherent or the ACP is
> coherent, it must be on. So that leaves all core in AMP mode. In this
> case, does it matter if the SCU is enabled or not?

I don't fully follow the question. You may not enable the SCU if you
don't care at all about the SMP mode or other coherency like ACP.
Otherwise it should be enabled, the latest before the secondaries start.

-- 
Catalin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-15 15:25         ` Catalin Marinas
@ 2014-05-15 19:11           ` Rob Herring
  2014-05-16 15:11             ` Catalin Marinas
  0 siblings, 1 reply; 28+ messages in thread
From: Rob Herring @ 2014-05-15 19:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 15, 2014 at 10:25 AM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> On Thu, May 15, 2014 at 03:59:31PM +0100, Rob Herring wrote:
>> On Thu, May 15, 2014 at 9:22 AM, Catalin Marinas
>> <catalin.marinas@arm.com> wrote:
>> > On Thu, May 15, 2014 at 10:50:10AM +0100, Thomas Petazzoni wrote:
>> >> On Wed, 14 May 2014 18:04:56 +0100, Catalin Marinas wrote:
>> >> > On Wed, May 14, 2014 at 04:50:34PM +0100, Thomas Petazzoni wrote:
>> >> > >    - The SCU must be enabled
>> >> >
>> >> > Again, could the firmware do this?
>> >>
>> >> See above. If the kernel does it for SMP cases, why wouldn't it do it
>> >> also for !SMP I/O coherent cases?
>> >
>> > The I/O coherency is an SoC property rather than an ARM architecture
>> > property. I want to separate the two so that the kernel can boot a
>> > significant part assuming sane architecture settings. Once you have DT
>> > available and start loading device drivers, you are in the SoC realm and
>> > you can do whatever initialisation for buses, interconnects, but not
>> > going back to change architected settings.
>>
>> The SCU has nothing to do with the architecture and really is part of
>> the SOC.
>
> Indeed, it's not part of the architecture but I don't see it any
> different than other early configuration like SDRAM controller.
>
>> Let's look at this another way. Are there any usecases where
>> you would not enable the SCU? If any cores are coherent or the ACP is
>> coherent, it must be on. So that leaves all core in AMP mode. In this
>> case, does it matter if the SCU is enabled or not?
>
> I don't fully follow the question. You may not enable the SCU if you
> don't care at all about the SMP mode or other coherency like ACP.
> Otherwise it should be enabled, the latest before the secondaries start.

My question is should the kernel unconditionally enable the SCU
regardless of CONFIG_SMP or any other condition? Is there harm in
enabling the SCU when in AMP mode? I'm just looking to simplify the
kernel side since this probably can't be moved to the bootloader in
all cases.

Rob

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-15 19:11           ` Rob Herring
@ 2014-05-16 15:11             ` Catalin Marinas
  2014-05-19  9:19               ` Thomas Petazzoni
  0 siblings, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2014-05-16 15:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 15, 2014 at 08:11:13PM +0100, Rob Herring wrote:
> On Thu, May 15, 2014 at 10:25 AM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > On Thu, May 15, 2014 at 03:59:31PM +0100, Rob Herring wrote:
> >> On Thu, May 15, 2014 at 9:22 AM, Catalin Marinas
> >> <catalin.marinas@arm.com> wrote:
> >> > On Thu, May 15, 2014 at 10:50:10AM +0100, Thomas Petazzoni wrote:
> >> >> On Wed, 14 May 2014 18:04:56 +0100, Catalin Marinas wrote:
> >> >> > On Wed, May 14, 2014 at 04:50:34PM +0100, Thomas Petazzoni wrote:
> >> >> > >    - The SCU must be enabled
> >> >> >
> >> >> > Again, could the firmware do this?
> >> >>
> >> >> See above. If the kernel does it for SMP cases, why wouldn't it do it
> >> >> also for !SMP I/O coherent cases?
> >> >
> >> > The I/O coherency is an SoC property rather than an ARM architecture
> >> > property. I want to separate the two so that the kernel can boot a
> >> > significant part assuming sane architecture settings. Once you have DT
> >> > available and start loading device drivers, you are in the SoC realm and
> >> > you can do whatever initialisation for buses, interconnects, but not
> >> > going back to change architected settings.
> >>
> >> The SCU has nothing to do with the architecture and really is part of
> >> the SOC.
> >
> > Indeed, it's not part of the architecture but I don't see it any
> > different than other early configuration like SDRAM controller.
> >
> >> Let's look at this another way. Are there any usecases where
> >> you would not enable the SCU? If any cores are coherent or the ACP is
> >> coherent, it must be on. So that leaves all core in AMP mode. In this
> >> case, does it matter if the SCU is enabled or not?
> >
> > I don't fully follow the question. You may not enable the SCU if you
> > don't care at all about the SMP mode or other coherency like ACP.
> > Otherwise it should be enabled, the latest before the secondaries start.
> 
> My question is should the kernel unconditionally enable the SCU
> regardless of CONFIG_SMP or any other condition? Is there harm in
> enabling the SCU when in AMP mode?

I don't see why we couldn't always enable the SCU at boot, whether the
OS runs SMP, AMP or UP mode.

-- 
Catalin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-15 14:22     ` Catalin Marinas
  2014-05-15 14:59       ` Rob Herring
@ 2014-05-19  9:17       ` Thomas Petazzoni
  2014-05-19 10:42         ` Catalin Marinas
  1 sibling, 1 reply; 28+ messages in thread
From: Thomas Petazzoni @ 2014-05-19  9:17 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Catalin Marinas,

On Thu, 15 May 2014 15:22:05 +0100, Catalin Marinas wrote:

> > -static unsigned int cachepolicy __initdata = CPOLICY_WRITEBACK;
> > +static unsigned int cachepolicy __initdata = CPOLICY_WRITEALLOC;
> >  static unsigned int ecc_mask __initdata = 0;
> >  pgprot_t pgprot_user;
> >  pgprot_t pgprot_kernel;
> > @@ -408,14 +408,14 @@ static void __init build_mem_type_table(void)
> >                 if (cachepolicy > CPOLICY_WRITETHROUGH)
> >                         cachepolicy = CPOLICY_WRITETHROUGH;
> >  #endif
> > +               if (cachepolicy > CPOLICY_WRITEBACK)
> > +                       cachepolicy = CPOLICY_WRITEBACK;
> >         }
> 
> There is some more clean-up to do in this file as there are other
> assumptions about cachepolicy. For example, early_cachepolicy() sets it
> back to writeback (and possibly the initial kernel mapping to avoid
> aliases, though having the same type, just the allocation hint doesn't
> really cause problems).

Ok.

> > > >  * On Armada 375/38x (which have single core and dual core variants)
> > > > 
> > > >    - The cache policy of pages must be set to "write allocate".
> > > >    - The SMP and TLB broadcast bits must be set in the Auxiliary
> > > >      Control Register (the core is a Cortex-A9)
> > > 
> > > What about setting this bit in the firmware/bootloader? It's a sane
> > > initialisation firmware should do.
> > 
> > I'm sorry, but I don't buy the "fix your bootloader" argument. For
> > various reasons:
> 
> Well, for arm64 people should get used to this because I'm not taking
> code for setting up things like SCU and ACTLR during boot.

I have no problem about the bootloader doing some initialization, so if
that's a requirement for arm64, no problem. But it should be a
requirement regardless of the kernel configuration: in the current
kernel, some initialization is done when CONFIG_SMP is enabled. And now
that the same initialization is needed in !CONFIG_SMP, I'm asked to do
it in the bootloader. That's the thing I don't buy.

> >  - The Linux kernel policy has always been to do *more* things in the
> >    kernel, to avoid relying on crappy vendor-specific bootloaders, and
> >    avoid having bugs that cannot be fixed at the kernel level. Going in
> >    the opposite direction looks completely backward.
> 
> There is no written policy here for how close to the hardware reset the
> kernel should run. With the same arguments people may want to initialise
> the DRAM controller in Linux while booting in SRAM temporarily (or
> ROM/flash + XIP).

Indeed.

> But in this particular case, let's look at ACTLR. It's a secure-only
> register, so what if someone decides to deploy Linux as a non-secure OS
> on this platform? Do you add a config option to have a different kernel
> build?

The ACTLR *is* currently accessed by the Linux kernel to set the SMP
bit and TLB broadcast bit. So the potential problem you're pointing
already exists, and is not made worse by the patches I'm proposing.

> >  - Doing so means crazy complex dependencies of kernel versions against
> >    bootloader versions, which are a nightmare for users.
> 
> Because we didn't have a clear policy here, the secure extensions came
> in later when we were already doing things like this. It's more
> difficult to enforce it now but we face such issues regularly. For
> example, on several occasions we had TI asking for per-SoC CPU
> initialisation (possibly as addition to __v7_setup) for specific SMC
> calls to the firmware which breaks single Image.
> 
> It make sense a lot of sense to place sane initialisation in a
> SoC-specific firmware/bootloader but whether we want to enforce such
> policy is Russell's call.

Again, you're forgetting that I'm not asking to add *additional*
initialization. All the initialization I'm looking at is *already* done
by the kernel today. All what my patches are doing is make this
initialization also happen in other situations than just CONFIG_SMP.

> > Unfortunately, __create_page_tables is called so early that I don't
> > know if it's possible to access 'struct machine_desc' at this point to
> > know whether the platform needs shareable pages or not.
> > 
> > Also, don't we have the same problem with the TTB_FLAGS_{SMP,UP) ?
> 
> I don't think we do because we use the ALT_{SMP,UP} for TTB settings and
> we have the smp-on-up fixup running before __v7_setup.

Ok. For Armada 370, I don't need shareable pages. For Armada 380, I
need shareable pages, but even if the processor is single core, I
believe it pretends to be SMP in MPIDR.

> > > >    - The SCU must be enabled
> > > 
> > > Again, could the firmware do this?
> > 
> > See above. If the kernel does it for SMP cases, why wouldn't it do it
> > also for !SMP I/O coherent cases?
> 
> The I/O coherency is an SoC property rather than an ARM architecture
> property. I want to separate the two so that the kernel can boot a
> significant part assuming sane architecture settings. Once you have DT
> available and start loading device drivers, you are in the SoC realm and
> you can do whatever initialisation for buses, interconnects, but not
> going back to change architected settings.

But, yes, but I don't see how it relates with the discussion here, and
how it solves the problem. Could you be more specific?

> > It's either it's *always* done by the
> > bootloader and we completely remove the SCU enabling code in the
> > kernel, and add to the ARM boot protocol that enabling the SCU is the
> > responsibility of the bootloader, or the kernel does the SCU enabling
> > in all situations.
> 
> I'm always for the former (and that's the case for arm64), though for
> some older boards it's too late to change.

Indeed. So how do we move forward?

Thanks,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-16 15:11             ` Catalin Marinas
@ 2014-05-19  9:19               ` Thomas Petazzoni
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Petazzoni @ 2014-05-19  9:19 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Catalin Marinas,

On Fri, 16 May 2014 16:11:30 +0100, Catalin Marinas wrote:

> > My question is should the kernel unconditionally enable the SCU
> > regardless of CONFIG_SMP or any other condition? Is there harm in
> > enabling the SCU when in AMP mode?
> 
> I don't see why we couldn't always enable the SCU at boot, whether the
> OS runs SMP, AMP or UP mode.

Does this means that my "[RFC PATCHv1 4/7] ARM: kernel: allow the SCU
to be enabled even on !SMP" is OK ?

Thanks,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-15 14:44     ` Rob Herring
@ 2014-05-19  9:31       ` Thomas Petazzoni
  2014-05-19 16:53         ` Rob Herring
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Petazzoni @ 2014-05-19  9:31 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Rob Herring,

On Thu, 15 May 2014 09:44:11 -0500, Rob Herring wrote:

> >> >    - The pages must be set as shareable.
> >>
> >> What is the impact of setting shareable bit for !SMP? I would think it
> >> would be a don't care for true UP systems. Does the 370 actually not
> >> work with shareable pages? AMP systems could have an issue though.
> >
> > No, the Armada 370 doesn't work with shareable pages. See my cover
> > letter that details the requirements for the different SoCs, and I've
> > only mentioned shareable pages as part of the requirements for Armada
> > XP, Armada 375 and Armada 38x.
> 
> I asked because you only mentioned platforms requiring shared pages.

Nope. See again the cover letter text. It does clearly states that
Armada 370 does *not* want shareable pages, and that Armada XP, Armada
375 and Armada 38x *do* want shareable pages.

Armada 370 cannot work with shareable pages. Armada XP, 375 and 38x
cannot have I/O coherency working without shareable pages.

> > No. It's done by the kernel for the SMP case, there's no reason it
> > should be done in the bootloader for the !SMP I/O coherent case. Or, we
> > remove the SMP and TLB broadcast bit fiddling in the kernel
> > *completely* and make it part of the ARM boot protocol that the
> > bootloader is required to do that.
> 
> No, it depends on the A9 configuration. The TLB broadcast is not
> modifiable in non-secure mode. You can control non-secure access to
> the SMP bit, but it can work with it enabled by secure world. This is
> what highbank does.

So I'm not sure what your suggestion is here. Again, if the kernel does
it in the CONFIG_SMP case, why couldn't it also do it in
the !CONFIG_SMP case when needed?

> > See above. It's not only about having issues to get the vendor to fix
> > the firmware. It's also about:
> >
> >  * Having a consistent strategy with regard to what the kernel does and
> >    what the bootloader does. Why would the kernel set the SMP and TLB
> >    broadcast bit when CONFIG_SMP is set and the processor is SMP, but
> >    not when CONFIG_SMP is disabled and the system is I/O coherent?
> 
> It is not that simple. The SMP bit has different meanings depending on
> the core, but is more related to cache usage than running multi-core
> or not. On the A15 for example, IIRC, the SMP bit basically means
> disable cache lookups while the C bit only means disable cache
> allocations (i.e. cache hits can occur with the "cache disabled").

Hum, right, and?

> >  * User support issues. Having gazillions of bootloader updates every
> >    time a piece of configuration settings is rejected by the kernel
> >    maintainers is going to be a nightmare for users.
> 
> It's an artifact of trying to upstream the kernel after shipping
> rather than before. If these issues were fixed in the bootloader at
> the same time as someone decided to just hack up the vendor kernel,
> then probably none of us would even know about the issue.

Sorry, but that just doesn't work. The time it takes to mainline things
is *way* longer than the time it takes for the SoC vendor to start
providing early SoC releases and development boards with bootloaders to
customers. There will also be some bootloader pushed out to customers
before the code is mainlined in Linux.

> As long as upstream is not a requirement, you are not in a position
> you can win.

Have you seen the number of patches we have sent over the last two
years for these platforms? You must have missed them, or otherwise I
don't understand how you manage to conclude that upstream is not a
requirement.

> You can never keep up with someone that has the freedom
> to just go and change whatever they want without having to deal with
> those annoying kernel maintainers. You fix these issues, there will
> just be other reasons why mainline is not usable (Do they still have
> the random one line deletion of a goto statement in the scheduler).
> 
> Speaking from experience rebasing i.MX vendor kernels, nothing
> upstream is easier to update than partially upstream if you try to
> build upon what is upstream. Generally speaking, vendors don't care
> what their delta against upstream is any more than they care about
> upstream being production quality.
> 
> Obviously, Marvell does care about mainline to some extent or I
> wouldn't be talking to you. But how do we get them to care enough to
> make mainline production quality? Part of the problem is we have
> distro's willing to take your money and vendor kernel (they are also
> willing to take your money and upstream kernel as well).

Could you be more specific about what Marvell isn't doing today to be
mainline production ready? As I said, we've been working hard since two
years to push to mainline the support for their SoCs. Our patches can
certainly be criticized, but the end result is clear: their Armada 370,
XP and now 375/38x platforms are quite well supported in mainline. And
since the patches have been merged, they surely have matched the
mainline kernel quality requirements, no?

> >> > All of these requirements are met when the kernel is configured with
> >> > CONFIG_SMP *and* when the processor is actually a multiple core
> >> > processors (otherwise, due to the CONFIG_SMP_ON_UP logic, is_smp()
> >> > returns false, and most of the requirements above are not met).
> >>
> >> I think there are some other problems here with is_smp(). For example,
> >> there are several cases where it is used for determining to read the
> >> MPIDR register. For things such as matching MPIDR to DT cpu nodes, we
> >> need that to work for !SMP kernels.
> >
> > Could you point more specifically where this is happening?
> 
> Just grep for is_smp. IIRC, the DT cpu parsing code spews a warning
> when running a !SMP kernel on an MP system. It may only be when the
> MPIDR is not 0 like highbank which has 0x900.

I'll have a look at the other is_smp() call sites.

> > In fact, another possible solution would be to make is_smp() returns
> > "true" when the system is !SMP but has I/O coherency enabled. However,
> > I have the issue that Armada 370 does *not* want shareable pages, while
> > Armada XP/375/38x *require* that.
> 
> We'd need to look more closely at how is_smp is used to redefine it's
> meaning. We probably need to be able to describe "I have MP
> extensions", "I'm MP on SMP kernel" and "I'm UP running an SMP
> kernel".

Right.

> >> > Basically, the patch series goes like this:
> >> >
> >> >  * PATCH 1 adds a 'flags' field to 'struct machine_desc' so that each
> >> >    platform can tell the ARM kernel core some of its requirements. We
> >> >    maybe could have used the Device Tree as well, but accessing the
> >> >    Device Tree as early as in paging_init() is a bit problematic as
> >> >    far as I understand.
> >> >
> >> >    Two flags are defined: MACHINE_NEEDS_CPOLICY_WRITEALLOC and
> >> >    MACHINE_NEEDS_SHAREABLE_PAGES. We need separate flags because
> >> >    Armada 370 cannot have the shareable attribute on page tables.
> >>
> >> I think these are too specific. We used to have arch_is_coherent()
> >> which somewhat does what you need, but it was not multi-platform
> >> friendly. I think you need something like "is_io_coherent" and/or
> >> "is_mp".
> >
> > I agree the flags are very specific, but unfortunately I haven't found
> > a better way of describing things. The main problem here is that the
> > different platforms have different requirements: Armada XP/375/38x
> > require shareable pages, but Armada 370 does not want them. So a single
> > "is_io_coherent" flag was not sufficient, because each platform has its
> > own requirements. I'm really open to suggestions on this, but a single
> > is_io_coherent flag doesn't work, unfortunately.
> 
> With the write-allocate change, you only have a single flag so it is
> really just a question of name and how to control the flag.

Correct. Though I'm not sure how to name this flag. Let's say we name
is "is_io_coherent" and it controls whether shareable pages must be
enabled or not. Then this flag would have to be "false" on Armada 370
(because we do *not* want shareable pages on Armada 370), even though
the platform is I/O coherent.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-19  9:17       ` Thomas Petazzoni
@ 2014-05-19 10:42         ` Catalin Marinas
  2014-05-19 11:17           ` Thomas Petazzoni
  0 siblings, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2014-05-19 10:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, May 19, 2014 at 10:17:39AM +0100, Thomas Petazzoni wrote:
> On Thu, 15 May 2014 15:22:05 +0100, Catalin Marinas wrote:
> > > > >  * On Armada 375/38x (which have single core and dual core variants)
> > > > > 
> > > > >    - The cache policy of pages must be set to "write allocate".
> > > > >    - The SMP and TLB broadcast bits must be set in the Auxiliary
> > > > >      Control Register (the core is a Cortex-A9)
> > > > 
> > > > What about setting this bit in the firmware/bootloader? It's a sane
> > > > initialisation firmware should do.
> > > 
> > > I'm sorry, but I don't buy the "fix your bootloader" argument. For
> > > various reasons:
> > 
> > Well, for arm64 people should get used to this because I'm not taking
> > code for setting up things like SCU and ACTLR during boot.
> 
> I have no problem about the bootloader doing some initialization, so if
> that's a requirement for arm64, no problem. But it should be a
> requirement regardless of the kernel configuration: in the current
> kernel, some initialization is done when CONFIG_SMP is enabled. And now
> that the same initialization is needed in !CONFIG_SMP, I'm asked to do
> it in the bootloader. That's the thing I don't buy.

And that's a hard thing for me to sell for arm32 ;) because of the
current code already doing this.

One of the issues is secure vs non-secure kernel deployment and
secure-only registers like ACTLR (and even SCU). I've seen SoCs for
which the mainline kernel assumes secure mode but it's deployed as
non-secure in the field with out of tree modifications to make it work.
I would rather have a single kernel image for both cases.

> > > Unfortunately, __create_page_tables is called so early that I don't
> > > know if it's possible to access 'struct machine_desc' at this point to
> > > know whether the platform needs shareable pages or not.
> > > 
> > > Also, don't we have the same problem with the TTB_FLAGS_{SMP,UP) ?
> > 
> > I don't think we do because we use the ALT_{SMP,UP} for TTB settings and
> > we have the smp-on-up fixup running before __v7_setup.
> 
> Ok. For Armada 370, I don't need shareable pages. For Armada 380, I
> need shareable pages, but even if the processor is single core, I
> believe it pretends to be SMP in MPIDR.

You'd need to check this.

> > > It's either it's *always* done by the
> > > bootloader and we completely remove the SCU enabling code in the
> > > kernel, and add to the ARM boot protocol that enabling the SCU is the
> > > responsibility of the bootloader, or the kernel does the SCU enabling
> > > in all situations.
> > 
> > I'm always for the former (and that's the case for arm64), though for
> > some older boards it's too late to change.
> 
> Indeed. So how do we move forward?

As I said previously, it's not my call, just a recommendation. It's up
to the arm/arm-soc maintainers to take this. Give the arm32 SMP
initialisation precedent, forcing it for an existing platform is a
harder sell.

Whether we want to set a policy for new arm32 SoCs is for a different
thread.

-- 
Catalin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-19 10:42         ` Catalin Marinas
@ 2014-05-19 11:17           ` Thomas Petazzoni
  2014-05-19 15:19             ` Catalin Marinas
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Petazzoni @ 2014-05-19 11:17 UTC (permalink / raw)
  To: linux-arm-kernel

Catalin,

On Mon, 19 May 2014 11:42:19 +0100, Catalin Marinas wrote:

> > I have no problem about the bootloader doing some initialization, so if
> > that's a requirement for arm64, no problem. But it should be a
> > requirement regardless of the kernel configuration: in the current
> > kernel, some initialization is done when CONFIG_SMP is enabled. And now
> > that the same initialization is needed in !CONFIG_SMP, I'm asked to do
> > it in the bootloader. That's the thing I don't buy.
> 
> And that's a hard thing for me to sell for arm32 ;) because of the
> current code already doing this.
> 
> One of the issues is secure vs non-secure kernel deployment and
> secure-only registers like ACTLR (and even SCU). I've seen SoCs for
> which the mainline kernel assumes secure mode but it's deployed as
> non-secure in the field with out of tree modifications to make it work.
> I would rather have a single kernel image for both cases.

As things are today, I believe the Armada 370/XP/375/38x are always
booted in secure mode. Couldn't the kernel detect if it's running
secure or non-secure. If it's running secure, it can do all the ACTLR
configuration (and other things that require being in secure mode). If
it's running non-secure, then it doesn't do this configuration, and
assumes the firmware/bootloader did it.

> > > > Unfortunately, __create_page_tables is called so early that I don't
> > > > know if it's possible to access 'struct machine_desc' at this point to
> > > > know whether the platform needs shareable pages or not.
> > > > 
> > > > Also, don't we have the same problem with the TTB_FLAGS_{SMP,UP) ?
> > > 
> > > I don't think we do because we use the ALT_{SMP,UP} for TTB settings and
> > > we have the smp-on-up fixup running before __v7_setup.
> > 
> > Ok. For Armada 370, I don't need shareable pages. For Armada 380, I
> > need shareable pages, but even if the processor is single core, I
> > believe it pretends to be SMP in MPIDR.
> 
> You'd need to check this.

I got the confirmation. *But* this is only going to affect CONFIG_SMP
builds. Armada 380 being single core, we want I/O coherency to work on
it in !CONFIG_SMP configurations. And in !CONFIG_SMP configuration,
it's always ALT_UP() that is used, regardless of what MPIDR says. And
therefore the TTB_FLAGS_UP will be used.

> > > I'm always for the former (and that's the case for arm64), though for
> > > some older boards it's too late to change.
> > 
> > Indeed. So how do we move forward?
> 
> As I said previously, it's not my call, just a recommendation. It's up
> to the arm/arm-soc maintainers to take this. Give the arm32 SMP
> initialisation precedent, forcing it for an existing platform is a
> harder sell.

Ok. So I should keep going with the approach proposed in my patch
series, obviously after fixing the various other comments that were
posted?

Thanks,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-14 17:04 ` [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms Catalin Marinas
  2014-05-15  9:50   ` Thomas Petazzoni
@ 2014-05-19 13:38   ` Thomas Petazzoni
  1 sibling, 0 replies; 28+ messages in thread
From: Thomas Petazzoni @ 2014-05-19 13:38 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Catalin Marinas,

On Wed, 14 May 2014 18:04:56 +0100, Catalin Marinas wrote:

> >    - The pages must be set as shareable.
> 
> Here you may have some conflict between the initial page tables set in
> __create_page_tables as non-shareable (that's unless MPIDR shows it as
> SMP but I guess not since smp-on-up kicks in). I have to think a bit
> more about the implications (the ARM ARM has a chapter on mismatched
> memory attributes and I think it talks about shareable vs
> non-shareable).

Do you have more insights about this potential conflict?

The problem is that at the arch/arm/kernel/head.S stage, it's currently
not possible to do platform-specific code while keeping multiplatform
characteristics, because I don't see how we could get a machine_desc
pointer or any other platform-specific indicator so early in the boot.
Do you have suggestions?

Thanks,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-19 11:17           ` Thomas Petazzoni
@ 2014-05-19 15:19             ` Catalin Marinas
  0 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2014-05-19 15:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, May 19, 2014 at 12:17:57PM +0100, Thomas Petazzoni wrote:
> On Mon, 19 May 2014 11:42:19 +0100, Catalin Marinas wrote:
> > > I have no problem about the bootloader doing some initialization, so if
> > > that's a requirement for arm64, no problem. But it should be a
> > > requirement regardless of the kernel configuration: in the current
> > > kernel, some initialization is done when CONFIG_SMP is enabled. And now
> > > that the same initialization is needed in !CONFIG_SMP, I'm asked to do
> > > it in the bootloader. That's the thing I don't buy.
> > 
> > And that's a hard thing for me to sell for arm32 ;) because of the
> > current code already doing this.
> > 
> > One of the issues is secure vs non-secure kernel deployment and
> > secure-only registers like ACTLR (and even SCU). I've seen SoCs for
> > which the mainline kernel assumes secure mode but it's deployed as
> > non-secure in the field with out of tree modifications to make it work.
> > I would rather have a single kernel image for both cases.
> 
> As things are today, I believe the Armada 370/XP/375/38x are always
> booted in secure mode. Couldn't the kernel detect if it's running
> secure or non-secure. If it's running secure, it can do all the ACTLR
> configuration (and other things that require being in secure mode). If
> it's running non-secure, then it doesn't do this configuration, and
> assumes the firmware/bootloader did it.

Unfortunately there is no easy way to detect secure vs non-secure at
boot time. What we have for other platforms is a check for whether the
corresponding ACTLR bit is set and if yes, we avoid writing the
register (which would undef).

> > > > > Unfortunately, __create_page_tables is called so early that I don't
> > > > > know if it's possible to access 'struct machine_desc' at this point to
> > > > > know whether the platform needs shareable pages or not.
> > > > > 
> > > > > Also, don't we have the same problem with the TTB_FLAGS_{SMP,UP) ?
> > > > 
> > > > I don't think we do because we use the ALT_{SMP,UP} for TTB settings and
> > > > we have the smp-on-up fixup running before __v7_setup.
> > > 
> > > Ok. For Armada 370, I don't need shareable pages. For Armada 380, I
> > > need shareable pages, but even if the processor is single core, I
> > > believe it pretends to be SMP in MPIDR.
> > 
> > You'd need to check this.
> 
> I got the confirmation. *But* this is only going to affect CONFIG_SMP
> builds. Armada 380 being single core, we want I/O coherency to work on
> it in !CONFIG_SMP configurations. And in !CONFIG_SMP configuration,
> it's always ALT_UP() that is used, regardless of what MPIDR says. And
> therefore the TTB_FLAGS_UP will be used.

Good point. In this case you could have a coherency problem as the page
table walks are not shareable. In practice, it's only one CPU that
writes/reads the page tables I think it should be OK.

The other case is non-shareable cacheable access with the original
page table and later switched to shareable. I don't think the cache
lines would be tagged with a shareable attribute but it's worth asking
the hw people (but you are really into microarchitecture details).

According to the ARM ARM (A3.5.7) you shouldn't have different agents
accessing the same location with different shareability attributes (page
table walker is such agent):

  If the mismatched attributes for a location mean that multiple
  cacheable accesses to the location might be made with different
  shareability attributes, then coherency is guaranteed only if each
  thread of execution that accesses the location with a cacheable
  attribute performs a clean and invalidate of the location.

> > > > I'm always for the former (and that's the case for arm64), though for
> > > > some older boards it's too late to change.
> > > 
> > > Indeed. So how do we move forward?
> > 
> > As I said previously, it's not my call, just a recommendation. It's up
> > to the arm/arm-soc maintainers to take this. Give the arm32 SMP
> > initialisation precedent, forcing it for an existing platform is a
> > harder sell.
> 
> Ok. So I should keep going with the approach proposed in my patch
> series, obviously after fixing the various other comments that were
> posted?

I assume it's business as usual unless you hear otherwise from rmk or
arm-soc ;).

-- 
Catalin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms
  2014-05-19  9:31       ` Thomas Petazzoni
@ 2014-05-19 16:53         ` Rob Herring
  0 siblings, 0 replies; 28+ messages in thread
From: Rob Herring @ 2014-05-19 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, May 19, 2014 at 4:31 AM, Thomas Petazzoni
<thomas.petazzoni@free-electrons.com> wrote:
> Dear Rob Herring,
>
> On Thu, 15 May 2014 09:44:11 -0500, Rob Herring wrote:
>

>> > See above. It's not only about having issues to get the vendor to fix
>> > the firmware. It's also about:
>> >
>> >  * Having a consistent strategy with regard to what the kernel does and
>> >    what the bootloader does. Why would the kernel set the SMP and TLB
>> >    broadcast bit when CONFIG_SMP is set and the processor is SMP, but
>> >    not when CONFIG_SMP is disabled and the system is I/O coherent?
>>
>> It is not that simple. The SMP bit has different meanings depending on
>> the core, but is more related to cache usage than running multi-core
>> or not. On the A15 for example, IIRC, the SMP bit basically means
>> disable cache lookups while the C bit only means disable cache
>> allocations (i.e. cache hits can occur with the "cache disabled").
>
> Hum, right, and?

My only point is the function of the bit is not entirely consistent
from core to core and the kernel may not know the right thing to do.
The only consistent policy is to make it a bootloader problem. An A15
and A7 should always enable it because otherwise you are running with
caches disabled and the A15/A7 don't really support AMP operation. An
A9 may depend on the AMP usage of the cores for how to set the SMP
bit. I think the kernel can only set the SMP bit when CONFIG_SMP is
enabled and otherwise it must leave it to the bootloader to do.

Some examples:

A15/A7 - always set by bootloader
A9 SMP - only set by bootloader if AMP is never used, otherwise set by kernel
A9 AMP - cleared/untouched by bootloader, untouched by kernel

Then there's other cores I'm not familiar with. If they are not used
for AMP, then it should be easy decision for the bootloader to setup.

This raises another question of how you configure AMP setup. Perhaps
not a problem we want to solve here.

>> >  * User support issues. Having gazillions of bootloader updates every
>> >    time a piece of configuration settings is rejected by the kernel
>> >    maintainers is going to be a nightmare for users.
>>
>> It's an artifact of trying to upstream the kernel after shipping
>> rather than before. If these issues were fixed in the bootloader at
>> the same time as someone decided to just hack up the vendor kernel,
>> then probably none of us would even know about the issue.
>
> Sorry, but that just doesn't work. The time it takes to mainline things
> is *way* longer than the time it takes for the SoC vendor to start
> providing early SoC releases and development boards with bootloaders to
> customers. There will also be some bootloader pushed out to customers
> before the code is mainlined in Linux.

It can work and is possible because I did it at Calxeda and Intel does
it. Highbank support went upstream 6-9 months before Si and boards
arrived. It did require some changes after Si, but those were small
and most changes were in the firmware side. It does require a certain
mindset for the company to drive the planning and needs software
influence over the h/w design. It is not something easy to accomplish
especially when you get behind working on the previous generation that
you can't start early on the next gen. While your ELC presentation
showed things are improving with Marvell upstreaming (which is great
to see), I'd argue the 6 month embargo you had is still a problem.

Intel gets new platform support upstream before details are publicly
announced and chips are available beyond OEMs. You can argue Intel
platforms are a lot different, but that doesn't matter because that is
who the ARM vendors are competing against.

>> As long as upstream is not a requirement, you are not in a position
>> you can win.
>
> Have you seen the number of patches we have sent over the last two
> years for these platforms? You must have missed them, or otherwise I
> don't understand how you manage to conclude that upstream is not a
> requirement.

Sorry, I should say a requirement for production (by customers) and
for distribution kernel support. Obviously, that was not a requirement
if you look at the Marvell kernel for ubuntu. But that was a few years
back before you really started. Maybe things are changing.

>> You can never keep up with someone that has the freedom
>> to just go and change whatever they want without having to deal with
>> those annoying kernel maintainers. You fix these issues, there will
>> just be other reasons why mainline is not usable (Do they still have
>> the random one line deletion of a goto statement in the scheduler).
>>
>> Speaking from experience rebasing i.MX vendor kernels, nothing
>> upstream is easier to update than partially upstream if you try to
>> build upon what is upstream. Generally speaking, vendors don't care
>> what their delta against upstream is any more than they care about
>> upstream being production quality.
>>
>> Obviously, Marvell does care about mainline to some extent or I
>> wouldn't be talking to you. But how do we get them to care enough to
>> make mainline production quality? Part of the problem is we have
>> distro's willing to take your money and vendor kernel (they are also
>> willing to take your money and upstream kernel as well).
>
> Could you be more specific about what Marvell isn't doing today to be
> mainline production ready? As I said, we've been working hard since two
> years to push to mainline the support for their SoCs. Our patches can
> certainly be criticized, but the end result is clear: their Armada 370,
> XP and now 375/38x platforms are quite well supported in mainline. And
> since the patches have been merged, they surely have matched the
> mainline kernel quality requirements, no?

Obviously, what is mainlined matches mainline kernel quality
requirements and you all are doing a great job there. My guess is that
the vendor kernel is still used because of performance reasons (like
coherent i/o) and feature gaps. Is the vendor BSP really shrinking
over time and is what remains becoming more inline with what is
acceptable for mainline (i.e. no HAL layer)?

Rob

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2014-05-19 16:53 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-14 15:50 [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms Thomas Petazzoni
2014-05-14 15:50 ` [RFC PATCHv1 1/7] ARM: extend machine_desc with additional flags Thomas Petazzoni
2014-05-14 15:50 ` [RFC PATCHv1 2/7] ARM: mm: implement the usage of the machine_desc flags Thomas Petazzoni
2014-05-14 15:50 ` [RFC PATCHv1 3/7] ARM: mm: enable SMP bit and TLB broadcast bit on !SMP when needed Thomas Petazzoni
2014-05-14 15:50 ` [RFC PATCHv1 4/7] ARM: kernel: allow the SCU to be enabled even on !SMP Thomas Petazzoni
2014-05-14 15:50 ` [RFC PATCHv1 5/7] ARM: mvebu: split Armada 370 and Armada XP machine_desc Thomas Petazzoni
2014-05-14 15:50 ` [RFC PATCHv1 6/7] ARM: mvebu: define the Armada 370/375/38x/XP machine_desc flags Thomas Petazzoni
2014-05-14 15:50 ` [RFC PATCHv1 7/7] ARM: mvebu: I/O coherency no longer needs SMP on 375 and 38x Thomas Petazzoni
2014-05-14 17:04 ` [RFC PATCHv1 0/7] ARM core support for hardware I/O coherency in non-SMP platforms Catalin Marinas
2014-05-15  9:50   ` Thomas Petazzoni
2014-05-15 14:22     ` Catalin Marinas
2014-05-15 14:59       ` Rob Herring
2014-05-15 15:25         ` Catalin Marinas
2014-05-15 19:11           ` Rob Herring
2014-05-16 15:11             ` Catalin Marinas
2014-05-19  9:19               ` Thomas Petazzoni
2014-05-19  9:17       ` Thomas Petazzoni
2014-05-19 10:42         ` Catalin Marinas
2014-05-19 11:17           ` Thomas Petazzoni
2014-05-19 15:19             ` Catalin Marinas
2014-05-19 13:38   ` Thomas Petazzoni
2014-05-14 17:07 ` Rob Herring
2014-05-15 10:01   ` Thomas Petazzoni
2014-05-15 13:27     ` Will Deacon
2014-05-15 13:44       ` Thomas Petazzoni
2014-05-15 14:44     ` Rob Herring
2014-05-19  9:31       ` Thomas Petazzoni
2014-05-19 16:53         ` Rob Herring

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.