* [U-Boot] [PATCH v2 0/5] Enable caches for the RPi2
@ 2016-03-16 14:41 Alexander Graf
  2016-03-16 14:41 ` [U-Boot] [PATCH v2 1/5] arm64: Add 32bit arm compatible dcache definitions Alexander Graf
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Alexander Graf @ 2016-03-16 14:41 UTC (permalink / raw)
  To: u-boot

This patch set converts the Raspberry Pi 2 system to properly make use of
the caches available in it.

Because we're running in HYP mode, we first need to teach U-Boot how to
make use of HYP registers and the LPAE page layout which is mandated by
hardware when running in HYP mode.

Then while we're at it, also mark the frame buffer cached to speed up
screen updates.

With this patch set, my Raspberry Pi 3 running in AArch32 mode is a *lot*
faster than without.

Please verify that the code works on a RPi2 as well and doesn't break the
original Pi. In theory it should work, but I only have a 3 to test on
available here.

v1 -> v2:

  - Move to KConfig
  - Adapt to new config name "CONFIG_ARMV7_LPAE"

Alexander Graf (5):
  arm64: Add 32bit arm compatible dcache definitions
  arm: Add support for HYP mode and LPAE page tables
  lcd: Fix compile warning in 64bit mode
  RPi: Enable caches for rpi2
  bcm2835 video: Map fb as cached

 arch/arm/cpu/armv7/Kconfig    |   8 ++++
 arch/arm/include/asm/system.h | 105 +++++++++++++++++++++++++++++++++++++++---
 arch/arm/lib/cache-cp15.c     |  66 +++++++++++++++++++++++---
 arch/arm/mach-bcm283x/Kconfig |   1 +
 arch/arm/mach-bcm283x/init.c  |   7 +++
 common/lcd.c                  |   4 +-
 drivers/video/bcm2835.c       |   6 +++
 include/configs/rpi_2.h       |   1 -
 8 files changed, 182 insertions(+), 16 deletions(-)

-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 23+ messages in thread
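
The series depends on knowing at run time whether the CPU is in HYP mode.
Patch 2/5 does this by comparing the CPSR mode field with 0x1a. A minimal
host-side sketch of that comparison follows; the helper name and the
CPSR_MODE_* macros are made up for illustration and do not appear in the
patches:

```c
#include <stdbool.h>
#include <stdint.h>

/* CPSR.M[4:0] selects the processor mode on ARMv7-A. */
#define CPSR_MODE_MASK	0x1fu
#define CPSR_MODE_SVC	0x13u	/* Supervisor mode */
#define CPSR_MODE_HYP	0x1au	/* Hypervisor mode */

/* Hypothetical helper: true if a saved CPSR value indicates HYP mode. */
static bool cpsr_is_hyp(uint32_t cpsr)
{
	return (cpsr & CPSR_MODE_MASK) == CPSR_MODE_HYP;
}
```

On the target the CPSR value would come from an "mrs" instruction, as in
get_cpsr() from patch 2/5. Taking the value as a parameter keeps the check
testable on a build host.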

* [U-Boot] [PATCH v2 1/5] arm64: Add 32bit arm compatible dcache definitions
  2016-03-16 14:41 [U-Boot] [PATCH v2 0/5] Enable caches for the RPi2 Alexander Graf
@ 2016-03-16 14:41 ` Alexander Graf
  2016-03-27 22:25   ` [U-Boot] [U-Boot, v2, " Tom Rini
  2016-03-16 14:41 ` [U-Boot] [PATCH v2 2/5] arm: Add support for HYP mode and LPAE page tables Alexander Graf
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Alexander Graf @ 2016-03-16 14:41 UTC (permalink / raw)
  To: u-boot

We want to be able to reuse device drivers from 32bit code, so let's add
definitions for all the dcache options that 32bit code has.

While at it, fix up the DCACHE_OFF configuration. That was setting the bits
to declare a PTE a PTE and left the MAIR index bit at 0. Drop the useless
bits and make the index explicit.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/arm/include/asm/system.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
index ac1173d..832c1db 100644
--- a/arch/arm/include/asm/system.h
+++ b/arch/arm/include/asm/system.h
@@ -26,8 +26,12 @@ u64 get_page_table_size(void);
 #define MMU_SECTION_SHIFT	21
 #define MMU_SECTION_SIZE	(1 << MMU_SECTION_SHIFT)
 
+/* These constants need to be synced to the MT_ types in asm/armv8/mmu.h */
 enum dcache_option {
-	DCACHE_OFF = 0x3,
+	DCACHE_OFF = 0 << 2,
+	DCACHE_WRITETHROUGH = 3 << 2,
+	DCACHE_WRITEBACK = 4 << 2,
+	DCACHE_WRITEALLOC = 4 << 2,
 };
 
 #define isb()				\
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread
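
To illustrate the commit message above: each new arm64 value is an
attribute index shifted into bits [4:2] of the descriptor. The sketch below
copies the enum from the patch and adds a helper that recovers the index.
The helper name is hypothetical and is not part of U-Boot:

```c
#include <stdint.h>

/* Values copied from the patch: an attribute index shifted left by 2. */
enum dcache_option {
	DCACHE_OFF = 0 << 2,
	DCACHE_WRITETHROUGH = 3 << 2,
	DCACHE_WRITEBACK = 4 << 2,
	DCACHE_WRITEALLOC = 4 << 2,
};

/* Hypothetical helper: recover the MAIR attribute index from an option. */
static unsigned int dcache_option_attr_index(enum dcache_option option)
{
	return ((unsigned int)option >> 2) & 0x7u;
}
```

Passing the old DCACHE_OFF value 0x3 to the helper also yields index 0,
which matches the commit message: the two low bits only marked the entry as
a PTE and said nothing about caching.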

* [U-Boot] [PATCH v2 2/5] arm: Add support for HYP mode and LPAE page tables
  2016-03-16 14:41 [U-Boot] [PATCH v2 0/5] Enable caches for the RPi2 Alexander Graf
  2016-03-16 14:41 ` [U-Boot] [PATCH v2 1/5] arm64: Add 32bit arm compatible dcache definitions Alexander Graf
@ 2016-03-16 14:41 ` Alexander Graf
  2016-03-27 22:25   ` [U-Boot] [U-Boot, v2, " Tom Rini
  2016-03-16 14:41 ` [U-Boot] [PATCH v2 3/5] lcd: Fix compile warning in 64bit mode Alexander Graf
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Alexander Graf @ 2016-03-16 14:41 UTC (permalink / raw)
  To: u-boot

We currently always modify the SVC versions of registers and only support
the short descriptor PTE format.

Some boards however (like the RPi2) run in HYP mode. There, we need to modify
the HYP version of system registers and HYP mode only supports the long
descriptor PTE format.

So this patch introduces support for both long descriptor PTEs and HYP mode
registers.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v1 -> v2:

  - Switch to KConfig
  - Rename config define to CONFIG_ARMV7_LPAE
---
 arch/arm/cpu/armv7/Kconfig    |  8 ++++
 arch/arm/include/asm/system.h | 99 ++++++++++++++++++++++++++++++++++++++++---
 arch/arm/lib/cache-cp15.c     | 66 ++++++++++++++++++++++++++---
 3 files changed, 161 insertions(+), 12 deletions(-)

diff --git a/arch/arm/cpu/armv7/Kconfig b/arch/arm/cpu/armv7/Kconfig
index 6c5d5dd..afeaac8 100644
--- a/arch/arm/cpu/armv7/Kconfig
+++ b/arch/arm/cpu/armv7/Kconfig
@@ -31,4 +31,12 @@ config ARMV7_VIRT
 	---help---
 	Say Y here to boot in hypervisor (HYP) mode when booting non-secure.
 
+config ARMV7_LPAE
+	boolean "Use LPAE page table format" if EXPERT
+	depends on CPU_V7
+	default n
+	---help---
+	Say Y here to use the long descriptor page table format. This is
+	required if U-Boot runs in HYP mode.
+
 endif
diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
index 832c1db..9ae890a 100644
--- a/arch/arm/include/asm/system.h
+++ b/arch/arm/include/asm/system.h
@@ -176,7 +176,9 @@ void smc_call(struct pt_regs *args);
 #define CR_AFE	(1 << 29)	/* Access flag enable			*/
 #define CR_TE	(1 << 30)	/* Thumb exception enable		*/
 
-#ifndef PGTABLE_SIZE
+#if defined(CONFIG_ARMV7_LPAE) && !defined(PGTABLE_SIZE)
+#define PGTABLE_SIZE		(4096 * 5)
+#elif !defined(PGTABLE_SIZE)
 #define PGTABLE_SIZE		(4096 * 4)
 #endif
 
@@ -233,17 +235,50 @@ void save_boot_params_ret(void);
 #define wfi()
 #endif
 
+static inline unsigned long get_cpsr(void)
+{
+	unsigned long cpsr;
+
+	asm volatile("mrs %0, cpsr" : "=r"(cpsr): );
+	return cpsr;
+}
+
+static inline int is_hyp(void)
+{
+#ifdef CONFIG_ARMV7_LPAE
+	/* HYP mode requires LPAE ... */
+	return ((get_cpsr() & 0x1f) == 0x1a);
+#else
+	/* ... so without LPAE support we can optimize all hyp code away */
+	return 0;
+#endif
+}
+
 static inline unsigned int get_cr(void)
 {
 	unsigned int val;
-	asm volatile("mrc p15, 0, %0, c1, c0, 0	@ get CR" : "=r" (val) : : "cc");
+
+	if (is_hyp())
+		asm volatile("mrc p15, 4, %0, c1, c0, 0	@ get CR" : "=r" (val)
+								  :
+								  : "cc");
+	else
+		asm volatile("mrc p15, 0, %0, c1, c0, 0	@ get CR" : "=r" (val)
+								  :
+								  : "cc");
 	return val;
 }
 
 static inline void set_cr(unsigned int val)
 {
-	asm volatile("mcr p15, 0, %0, c1, c0, 0	@ set CR"
-	  : : "r" (val) : "cc");
+	if (is_hyp())
+		asm volatile("mcr p15, 4, %0, c1, c0, 0	@ set CR" :
+								  : "r" (val)
+								  : "cc");
+	else
+		asm volatile("mcr p15, 0, %0, c1, c0, 0	@ set CR" :
+								  : "r" (val)
+								  : "cc");
 	isb();
 }
 
@@ -261,12 +296,59 @@ static inline void set_dacr(unsigned int val)
 	isb();
 }
 
-#ifdef CONFIG_CPU_V7
+#ifdef CONFIG_ARMV7_LPAE
+/* Long-Descriptor Translation Table Level 1/2 Bits */
+#define TTB_SECT_XN_MASK	(1ULL << 54)
+#define TTB_SECT_NG_MASK	(1 << 11)
+#define TTB_SECT_AF		(1 << 10)
+#define TTB_SECT_SH_MASK	(3 << 8)
+#define TTB_SECT_NS_MASK	(1 << 5)
+#define TTB_SECT_AP		(1 << 6)
+/* Note: TTB AP bits are set elsewhere */
+#define TTB_SECT_MAIR(x)	((x & 0x7) << 2) /* Index into MAIR */
+#define TTB_SECT		(1 << 0)
+#define TTB_PAGETABLE		(3 << 0)
+
+/* TTBCR flags */
+#define TTBCR_EAE		(1 << 31)
+#define TTBCR_T0SZ(x)		((x) << 0)
+#define TTBCR_T1SZ(x)		((x) << 16)
+#define TTBCR_USING_TTBR0	(TTBCR_T0SZ(0) | TTBCR_T1SZ(0))
+#define TTBCR_IRGN0_NC		(0 << 8)
+#define TTBCR_IRGN0_WBWA	(1 << 8)
+#define TTBCR_IRGN0_WT		(2 << 8)
+#define TTBCR_IRGN0_WBNWA	(3 << 8)
+#define TTBCR_IRGN0_MASK	(3 << 8)
+#define TTBCR_ORGN0_NC		(0 << 10)
+#define TTBCR_ORGN0_WBWA	(1 << 10)
+#define TTBCR_ORGN0_WT		(2 << 10)
+#define TTBCR_ORGN0_WBNWA	(3 << 10)
+#define TTBCR_ORGN0_MASK	(3 << 10)
+#define TTBCR_SHARED_NON	(0 << 12)
+#define TTBCR_SHARED_OUTER	(2 << 12)
+#define TTBCR_SHARED_INNER	(3 << 12)
+#define TTBCR_EPD0		(0 << 7)
+
+/*
+ * Memory types
+ */
+#define MEMORY_ATTRIBUTES	((0x00 << (0 * 8)) | (0x88 << (1 * 8)) | \
+				 (0xcc << (2 * 8)) | (0xff << (3 * 8)))
+
+/* options available for data cache on each page */
+enum dcache_option {
+	DCACHE_OFF = TTB_SECT | TTB_SECT_MAIR(0),
+	DCACHE_WRITETHROUGH = TTB_SECT | TTB_SECT_MAIR(1),
+	DCACHE_WRITEBACK = TTB_SECT | TTB_SECT_MAIR(2),
+	DCACHE_WRITEALLOC = TTB_SECT | TTB_SECT_MAIR(3),
+};
+#elif defined(CONFIG_CPU_V7)
 /* Short-Descriptor Translation Table Level 1 Bits */
 #define TTB_SECT_NS_MASK	(1 << 19)
 #define TTB_SECT_NG_MASK	(1 << 17)
 #define TTB_SECT_S_MASK		(1 << 16)
 /* Note: TTB AP bits are set elsewhere */
+#define TTB_SECT_AP		(3 << 10)
 #define TTB_SECT_TEX(x)		((x & 0x7) << 12)
 #define TTB_SECT_DOMAIN(x)	((x & 0xf) << 5)
 #define TTB_SECT_XN_MASK	(1 << 4)
@@ -282,6 +364,7 @@ enum dcache_option {
 	DCACHE_WRITEALLOC = DCACHE_WRITEBACK | TTB_SECT_TEX(1),
 };
 #else
+#define TTB_SECT_AP		(3 << 10)
 /* options available for data cache on each page */
 enum dcache_option {
 	DCACHE_OFF = 0x12,
@@ -293,7 +376,11 @@ enum dcache_option {
 
 /* Size of an MMU section */
 enum {
-	MMU_SECTION_SHIFT	= 20,
+#ifdef CONFIG_ARMV7_LPAE
+	MMU_SECTION_SHIFT	= 21, /* 2MB */
+#else
+	MMU_SECTION_SHIFT	= 20, /* 1MB */
+#endif
 	MMU_SECTION_SIZE	= 1 << MMU_SECTION_SHIFT,
 };
 
diff --git a/arch/arm/lib/cache-cp15.c b/arch/arm/lib/cache-cp15.c
index 8e18538..1121dc3 100644
--- a/arch/arm/lib/cache-cp15.c
+++ b/arch/arm/lib/cache-cp15.c
@@ -34,11 +34,22 @@ static void cp_delay (void)
 
 void set_section_dcache(int section, enum dcache_option option)
 {
+#ifdef CONFIG_ARMV7_LPAE
+	u64 *page_table = (u64 *)gd->arch.tlb_addr;
+	/* Need to set the access flag to not fault */
+	u64 value = TTB_SECT_AP | TTB_SECT_AF;
+#else
 	u32 *page_table = (u32 *)gd->arch.tlb_addr;
-	u32 value;
+	u32 value = TTB_SECT_AP;
+#endif
+
+	/* Add the page offset */
+	value |= ((u32)section << MMU_SECTION_SHIFT);
 
-	value = (section << MMU_SECTION_SHIFT) | (3 << 10);
+	/* Add caching bits */
 	value |= option;
+
+	/* Set PTE */
 	page_table[section] = value;
 }
 
@@ -68,8 +79,9 @@ __weak void dram_bank_mmu_setup(int bank)
 	int	i;
 
 	debug("%s: bank: %d\n", __func__, bank);
-	for (i = bd->bi_dram[bank].start >> 20;
-	     i < (bd->bi_dram[bank].start >> 20) + (bd->bi_dram[bank].size >> 20);
+	for (i = bd->bi_dram[bank].start >> MMU_SECTION_SHIFT;
+	     i < (bd->bi_dram[bank].start >> MMU_SECTION_SHIFT) +
+		 (bd->bi_dram[bank].size >> MMU_SECTION_SHIFT);
 	     i++) {
 #if defined(CONFIG_SYS_ARM_CACHE_WRITETHROUGH)
 		set_section_dcache(i, DCACHE_WRITETHROUGH);
@@ -89,14 +101,56 @@ static inline void mmu_setup(void)
 
 	arm_init_before_mmu();
 	/* Set up an identity-mapping for all 4GB, rw for everyone */
-	for (i = 0; i < 4096; i++)
+	for (i = 0; i < ((4096ULL * 1024 * 1024) >> MMU_SECTION_SHIFT); i++)
 		set_section_dcache(i, DCACHE_OFF);
 
 	for (i = 0; i < CONFIG_NR_DRAM_BANKS; i++) {
 		dram_bank_mmu_setup(i);
 	}
 
-#ifdef CONFIG_CPU_V7
+#ifdef CONFIG_ARMV7_LPAE
+	/* Set up 4 PTE entries pointing to our 4 1GB page tables */
+	for (i = 0; i < 4; i++) {
+		u64 *page_table = (u64 *)(gd->arch.tlb_addr + (4096 * 4));
+		u64 tpt = gd->arch.tlb_addr + (4096 * i);
+		page_table[i] = tpt | TTB_PAGETABLE;
+	}
+
+	reg = TTBCR_EAE;
+#if defined(CONFIG_SYS_ARM_CACHE_WRITETHROUGH)
+	reg |= TTBCR_ORGN0_WT | TTBCR_IRGN0_WT;
+#elif defined(CONFIG_SYS_ARM_CACHE_WRITEALLOC)
+	reg |= TTBCR_ORGN0_WBWA | TTBCR_IRGN0_WBWA;
+#else
+	reg |= TTBCR_ORGN0_WBNWA | TTBCR_IRGN0_WBNWA;
+#endif
+
+	if (is_hyp()) {
+		/* Set HTCR to enable LPAE */
+		asm volatile("mcr p15, 4, %0, c2, c0, 2"
+			: : "r" (reg) : "memory");
+		/* Set HTTBR */
+		asm volatile("mcrr p15, 4, %0, %1, c2"
+			:
+			: "r"(gd->arch.tlb_addr + (4096 * 4)), "r"(0)
+			: "memory");
+		/* Set HMAIR */
+		asm volatile("mcr p15, 4, %0, c10, c2, 0"
+			: : "r" (MEMORY_ATTRIBUTES) : "memory");
+	} else {
+		/* Set TTBCR to enable LPAE */
+		asm volatile("mcr p15, 0, %0, c2, c0, 2"
+			: : "r" (reg) : "memory");
+		/* Set 64-bit TTBR0 */
+		asm volatile("mcrr p15, 0, %0, %1, c2"
+			:
+			: "r"(gd->arch.tlb_addr + (4096 * 4)), "r"(0)
+			: "memory");
+		/* Set MAIR */
+		asm volatile("mcr p15, 0, %0, c10, c2, 0"
+			: : "r" (MEMORY_ATTRIBUTES) : "memory");
+	}
+#elif defined(CONFIG_CPU_V7)
 	/* Set TTBR0 */
 	reg = gd->arch.tlb_addr & TTBR0_BASE_ADDR_MASK;
 #if defined(CONFIG_SYS_ARM_CACHE_WRITETHROUGH)
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread
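
The long-descriptor layout used by the patch above can be checked on a host
with a small sketch. The constants are copied from the LPAE branch of the
patch, with unsigned suffixes added so the shifts are well defined on a
host compiler. The two helper functions and NUM_SECTIONS are hypothetical
names introduced here, not U-Boot API:

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the LPAE branch of the patch above. */
#define MMU_SECTION_SHIFT	21		/* 2MB per section */
#define TTB_SECT		(1ULL << 0)	/* block (section) descriptor */
#define TTB_SECT_AP		(1ULL << 6)
#define TTB_SECT_AF		(1ULL << 10)	/* access flag */
#define TTB_SECT_MAIR(x)	(((uint64_t)(x) & 0x7) << 2)
#define MEMORY_ATTRIBUTES	((0x00UL << (0 * 8)) | (0x88UL << (1 * 8)) | \
				 (0xccUL << (2 * 8)) | (0xffUL << (3 * 8)))

/* Number of sections needed to identity-map the full 4GB space. */
#define NUM_SECTIONS \
	((unsigned int)((4096ULL * 1024 * 1024) >> MMU_SECTION_SHIFT))

/* Hypothetical helper: descriptor for one identity-mapped section. */
static uint64_t lpae_section_pte(unsigned int section, unsigned int mair_index)
{
	uint64_t value = TTB_SECT_AP | TTB_SECT_AF;

	assert(section < NUM_SECTIONS && mair_index < 8);
	value |= (uint64_t)section << MMU_SECTION_SHIFT;	/* output address */
	value |= TTB_SECT | TTB_SECT_MAIR(mair_index);		/* type and caching */
	return value;
}

/* Hypothetical helper: the 8-bit MAIR attribute that an index selects. */
static unsigned int mair_attr(unsigned int mair_index)
{
	assert(mair_index < 4);	/* only four attributes are defined above */
	return (unsigned int)((MEMORY_ATTRIBUTES >> (8 * mair_index)) & 0xff);
}
```

With 2MB sections, 4GB needs 2048 entries of 8 bytes each, which fill four
4096-byte tables. One more 4096-byte page holds the four first-level
entries, which is where PGTABLE_SIZE (4096 * 5) in the patch comes from.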

* [U-Boot] [PATCH v2 3/5] lcd: Fix compile warning in 64bit mode
  2016-03-16 14:41 [U-Boot] [PATCH v2 0/5] Enable caches for the RPi2 Alexander Graf
  2016-03-16 14:41 ` [U-Boot] [PATCH v2 1/5] arm64: Add 32bit arm compatible dcache definitions Alexander Graf
  2016-03-16 14:41 ` [U-Boot] [PATCH v2 2/5] arm: Add support for HYP mode and LPAE page tables Alexander Graf
@ 2016-03-16 14:41 ` Alexander Graf
  2016-03-27 22:25   ` [U-Boot] [U-Boot, v2, " Tom Rini
  2016-03-16 14:41 ` [U-Boot] [PATCH v2 4/5] RPi: Enable caches for rpi2 Alexander Graf
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Alexander Graf @ 2016-03-16 14:41 UTC (permalink / raw)
  To: u-boot

When compiling the code for 64bit, the lcd code emits warnings because it
tries to cast pointers to 32bit values. Fix it by casting them to longs
instead, actually properly aligning with the function prototype.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 common/lcd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/common/lcd.c b/common/lcd.c
index 51705ad..783626e 100644
--- a/common/lcd.c
+++ b/common/lcd.c
@@ -66,8 +66,8 @@ void lcd_sync(void)
 	int line_length;
 
 	if (lcd_flush_dcache)
-		flush_dcache_range((u32)lcd_base,
-			(u32)(lcd_base + lcd_get_size(&line_length)));
+		flush_dcache_range((ulong)lcd_base,
+			(ulong)(lcd_base + lcd_get_size(&line_length)));
 #endif
 }
 
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread
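
The reason for the warning can be shown with a host-side sketch: pushing an
address through a 32-bit type drops its upper half. The patch itself casts
to ulong. The sketch uses uint64_t and uintptr_t instead so that it behaves
the same on any build host, and both function names are hypothetical:

```c
#include <stdint.h>

/* Models the old (u32) cast applied to an address held in 64 bits. */
static uint64_t addr_via_u32(uint64_t addr)
{
	return (uint32_t)addr;	/* upper 32 bits are discarded */
}

/* Pointer to integer conversion that keeps every bit of the pointer. */
static uintptr_t addr_of(const void *ptr)
{
	return (uintptr_t)ptr;
}
```

Addresses below 4GB survive the 32-bit cast unchanged, so the old code
only produced a compiler warning on 32-bit-sized address maps and would
silently truncate anything above that.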

* [U-Boot] [PATCH v2 4/5] RPi: Enable caches for rpi2
  2016-03-16 14:41 [U-Boot] [PATCH v2 0/5] Enable caches for the RPi2 Alexander Graf
                   ` (2 preceding siblings ...)
  2016-03-16 14:41 ` [U-Boot] [PATCH v2 3/5] lcd: Fix compile warning in 64bit mode Alexander Graf
@ 2016-03-16 14:41 ` Alexander Graf
  2016-03-27 22:25   ` [U-Boot] [U-Boot,v2,4/5] " Tom Rini
  2016-03-16 14:41 ` [U-Boot] [PATCH v2 5/5] bcm2835 video: Map fb as cached Alexander Graf
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Alexander Graf @ 2016-03-16 14:41 UTC (permalink / raw)
  To: u-boot

Now that we have support for running with caches enabled in HYP mode,
opt in to that on the Raspberry Pi 2. This brings a significant performance
boost.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v1 -> v2:

  - Move to KConfig
  - Adapt to new define name
---
 arch/arm/mach-bcm283x/Kconfig | 1 +
 arch/arm/mach-bcm283x/init.c  | 7 +++++++
 include/configs/rpi_2.h       | 1 -
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-bcm283x/Kconfig b/arch/arm/mach-bcm283x/Kconfig
index 2315a13..1a7baf6 100644
--- a/arch/arm/mach-bcm283x/Kconfig
+++ b/arch/arm/mach-bcm283x/Kconfig
@@ -12,6 +12,7 @@ config TARGET_RPI
 config TARGET_RPI_2
 	bool "Raspberry Pi 2"
 	select CPU_V7
+	select ARMV7_LPAE
 
 endchoice
 
diff --git a/arch/arm/mach-bcm283x/init.c b/arch/arm/mach-bcm283x/init.c
index d2d366b..4fa94db 100644
--- a/arch/arm/mach-bcm283x/init.c
+++ b/arch/arm/mach-bcm283x/init.c
@@ -15,3 +15,10 @@ int arch_cpu_init(void)
 
 	return 0;
 }
+
+#ifdef CONFIG_ARMV7_LPAE
+void enable_caches(void)
+{
+	dcache_enable();
+}
+#endif
diff --git a/include/configs/rpi_2.h b/include/configs/rpi_2.h
index bea4ebd..13dc8de 100644
--- a/include/configs/rpi_2.h
+++ b/include/configs/rpi_2.h
@@ -10,7 +10,6 @@
 #define CONFIG_SKIP_LOWLEVEL_INIT
 #define CONFIG_BCM2836
 #define CONFIG_SYS_CACHELINE_SIZE		64
-#define CONFIG_SYS_DCACHE_OFF
 
 #include "rpi-common.h"
 
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [U-Boot] [PATCH v2 5/5] bcm2835 video: Map fb as cached
  2016-03-16 14:41 [U-Boot] [PATCH v2 0/5] Enable caches for the RPi2 Alexander Graf
                   ` (3 preceding siblings ...)
  2016-03-16 14:41 ` [U-Boot] [PATCH v2 4/5] RPi: Enable caches for rpi2 Alexander Graf
@ 2016-03-16 14:41 ` Alexander Graf
  2016-03-22 15:22   ` [U-Boot] [PATCH v3 " Alexander Graf
                     ` (2 more replies)
  2016-03-17  4:26 ` [U-Boot] [PATCH v2 0/5] Enable caches for the RPi2 Stephen Warren
  2016-03-25  4:13 ` Stephen Warren
  6 siblings, 3 replies; 23+ messages in thread
From: Alexander Graf @ 2016-03-16 14:41 UTC (permalink / raw)
  To: u-boot

The bcm2835 frame buffer is in RAM, so we can easily map it as cached and gain
all the glorious performance boost that brings with it.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 drivers/video/bcm2835.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/video/bcm2835.c b/drivers/video/bcm2835.c
index bff1fcb..fe49f2e 100644
--- a/drivers/video/bcm2835.c
+++ b/drivers/video/bcm2835.c
@@ -106,6 +106,12 @@ void lcd_ctrl_init(void *lcdbase)
 
 	gd->fb_base = bus_to_phys(
 		msg_setup->allocate_buffer.body.resp.fb_address);
+
+	/* Enable dcache for the frame buffer */
+        mmu_set_region_dcache_behaviour(gd->fb_base,
+		ALIGN(PAGE_SIZE, msg_setup->allocate_buffer.body.resp.fb_size),
+		DCACHE_WRITEBACK);
+	lcd_set_flush_dcache(1);
 }
 
 void lcd_enable(void)
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [U-Boot] [PATCH v2 0/5] Enable caches for the RPi2
  2016-03-16 14:41 [U-Boot] [PATCH v2 0/5] Enable caches for the RPi2 Alexander Graf
                   ` (4 preceding siblings ...)
  2016-03-16 14:41 ` [U-Boot] [PATCH v2 5/5] bcm2835 video: Map fb as cached Alexander Graf
@ 2016-03-17  4:26 ` Stephen Warren
  2016-03-17  5:35   ` Alexander Graf
  2016-03-17  7:58   ` Alexander Graf
  2016-03-25  4:13 ` Stephen Warren
  6 siblings, 2 replies; 23+ messages in thread
From: Stephen Warren @ 2016-03-17  4:26 UTC (permalink / raw)
  To: u-boot

On 03/16/2016 08:41 AM, Alexander Graf wrote:
> This patch set converts the Raspberry Pi 2 system to properly make use of
> the caches available in it.
>
> Because we're running in HYP mode, we first need to teach U-Boot how to
> make use of HYP registers and the LPAE page layout which is mandated by
> hardware when running in HYP mode.
>
> Then while we're at it, also mark the frame buffer cached to speed up
> screen updates.
>
> With this patch set, my Raspberry Pi 3 running in AArch32 mode is a *lot*
> faster than without.
>
> Please verify that the code works on a RPi2 as well and doesn't break the
> original Pi. In theory it should work, but I only have a 3 to test on
> available here.

This series mostly works OK. I found the following results, with my 
rpi_dev branch on github if you want to test the exact same commits:

RPi B+ (running rpi_1 build):
- Very minor transient corruption when running "ls mmc 0:2 /etc"

RPi 2 (running rpi_2 build):
RPi 3 (booting in 32-bit mode and running rpi_2 build):
RPi 3 (booting in 32-bit mode and running rpi_3_32b build):
- Obvious transient corruption when running "ls mmc 0:2 /etc"

RPi 3 (booting in 64-bit mode and running rpi_3 build):
- No issues

I suspect the transient corruptions that I saw were missing cache flush 
operations; during the large "memcpy" while scrolling the LCD, I would 
see corruption in the copied data. This would soon disappear, presumably
because writing new data to the bottom lines of the frame-buffer flushes
out old cache lines while doing a write-allocate?

Still, I'm not 100% sure this is an issue with these patches since the 
RPi B+ has a similar (although much less obvious) issue. Perhaps U-Boot 
is using the wrong VC/GPU cache alias (top 2 bits of 32-bit physical 
address) for the frame-buffer, i.e. the issue is in the GPU cache, not 
in the ARM cache?

As far as I can tell, USB worked fine in all cases (at least, the 
on-board hub/Ethernet device seemed to be enumerated without issue 
according to "usb tree" and "usb info".)

As such, I'm tempted to just ack the patches, but it'd be nice if you 
could take a look and see if something obvious is wrong.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [U-Boot] [PATCH v2 0/5] Enable caches for the RPi2
  2016-03-17  4:26 ` [U-Boot] [PATCH v2 0/5] Enable caches for the RPi2 Stephen Warren
@ 2016-03-17  5:35   ` Alexander Graf
  2016-03-17  7:58   ` Alexander Graf
  1 sibling, 0 replies; 23+ messages in thread
From: Alexander Graf @ 2016-03-17  5:35 UTC (permalink / raw)
  To: u-boot



> On 17.03.2016 at 05:26, Stephen Warren <swarren@wwwdotorg.org> wrote:
> 
>> On 03/16/2016 08:41 AM, Alexander Graf wrote:
>> This patch set converts the Raspberry Pi 2 system to properly make use of
>> the caches available in it.
>> 
>> Because we're running in HYP mode, we first need to teach U-Boot how to
>> make use of HYP registers and the LPAE page layout which is mandated by
>> hardware when running in HYP mode.
>> 
>> Then while we're at it, also mark the frame buffer cached to speed up
>> screen updates.
>> 
>> With this patch set, my Raspberry Pi 3 running in AArch32 mode is a *lot*
>> faster than without.
>> 
>> Please verify that the code works on a RPi2 as well and doesn't break the
>> original Pi. In theory it should work, but I only have a 3 to test on
>> available here.
> 
> This series mostly works OK. I found the following results, with my rpi_dev branch on github if you want to test the exact same commits:
> 
> RPi B+ (running rpi_1 build):
> - Very minor transient corruption when running "ls mmc 0:2 /etc"
> 
> RPi 2 (running rpi_2 build):
> RPi 3 (booting in 32-bit mode and running rpi_2 build):
> RPi 3 (booting in 32-bit mode and running rpi_3_32b build):
> - Obvious transient corruption when running "ls mmc 0:2 /etc"
> 
> RPi 3 (booting in 64-bit mode and running rpi_3 build):
> - No issues
> 
> I suspect the transient corruptions that I saw were missing cache flush operations; during the large "memcpy" while scrolling the LCD, I would see corruption in the copied data. This would soon disappear; presumably as new data is written to the bottom lines of the frame-buffer flushes out old cache lines while doing a write-allocate?
> 
> Still, I'm not 100% sure this is an issue with these patches since the RPi B+ has a similar (although much less obvious) issue. Perhaps U-Boot is using the wrong VC/GPU cache alias (top 2 bits of 32-bit physical address) for the frame-buffer, i.e. the issue is in the GPU cache, not in the ARM cache?
> 
> As far as I can tell, USB worked fine in all cases (at least, the on-board hub/Ethernet device seemed to be enumerated without issue according to "usb tree" and "usb info".)
> 
> As such, I'm tempted to just ack the patches, but it'd be nice if you could take a look and see if something obvious is wrong.

Could you please take all patches except for the one that actually sets the dcache policy in the video driver? That way we already get the non-fb speedups for rpi2/3 and I can check where the corruption comes from in parallel with a smaller patch queue.


Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [U-Boot] [PATCH v2 0/5] Enable caches for the RPi2
  2016-03-17  4:26 ` [U-Boot] [PATCH v2 0/5] Enable caches for the RPi2 Stephen Warren
  2016-03-17  5:35   ` Alexander Graf
@ 2016-03-17  7:58   ` Alexander Graf
  2016-03-19  2:12     ` Stephen Warren
  1 sibling, 1 reply; 23+ messages in thread
From: Alexander Graf @ 2016-03-17  7:58 UTC (permalink / raw)
  To: u-boot



On 17.03.16 05:26, Stephen Warren wrote:
> On 03/16/2016 08:41 AM, Alexander Graf wrote:
>> This patch set converts the Raspberry Pi 2 system to properly make use of
>> the caches available in it.
>>
>> Because we're running in HYP mode, we first need to teach U-Boot how to
>> make use of HYP registers and the LPAE page layout which is mandated by
>> hardware when running in HYP mode.
>>
>> Then while we're at it, also mark the frame buffer cached to speed up
>> screen updates.
>>
>> With this patch set, my Raspberry Pi 3 running in AArch32 mode is a *lot*
>> faster than without.
>>
>> Please verify that the code works on a RPi2 as well and doesn't break the
>> original Pi. In theory it should work, but I only have a 3 to test on
>> available here.
> 
> This series mostly works OK. I found the following results, with my
> rpi_dev branch on github if you want to test the exact same commits:
> 
> RPi B+ (running rpi_1 build):
> - Very minor transient corruption when running "ls mmc 0:2 /etc"
> 
> RPi 2 (running rpi_2 build):
> RPi 3 (booting in 32-bit mode and running rpi_2 build):
> RPi 3 (booting in 32-bit mode and running rpi_3_32b build):
> - Obvious transient corruption when running "ls mmc 0:2 /etc"
> 
> RPi 3 (booting in 64-bit mode and running rpi_3 build):
> - No issues
> 
> I suspect the transient corruptions that I saw were missing cache flush
> operations; during the large "memcpy" while scrolling the LCD, I would
> see corruption in the copied data. This would soon disappear; presumably
> as new data is written to the bottom lines of the frame-buffer flushes
> out old cache lines while doing a write-allocate?
> 
> Still, I'm not 100% sure this is an issue with these patches since the
> RPi B+ has a similar (although much less obvious) issue. Perhaps U-Boot
> is using the wrong VC/GPU cache alias (top 2 bits of 32-bit physical
> address) for the frame-buffer, i.e. the issue is in the GPU cache, not
> in the ARM cache?
> 
> As far as I can tell, USB worked fine in all cases (at least, the
> on-board hub/Ethernet device seemed to be enumerated without issue
> according to "usb tree" and "usb info".)
> 
> As such, I'm tempted to just ack the patches, but it'd be nice if you
> could take a look and see if something obvious is wrong.

Ugh. It helps when you get the parameters for ALIGN() right.

Please just squash the patch below into the last patch, then things
should work fine. If you like I can resend a v3, but I guess the change
is small enough?


Alex

diff --git a/drivers/video/bcm2835.c b/drivers/video/bcm2835.c
index fe49f2e..40fc418 100644
--- a/drivers/video/bcm2835.c
+++ b/drivers/video/bcm2835.c
@@ -109,7 +109,7 @@ void lcd_ctrl_init(void *lcdbase)

        /* Enable dcache for the frame buffer */
         mmu_set_region_dcache_behaviour(gd->fb_base,
-               ALIGN(PAGE_SIZE, msg_setup->allocate_buffer.body.resp.fb_size),
+               ALIGN(msg_setup->allocate_buffer.body.resp.fb_size, PAGE_SIZE),
                DCACHE_WRITEBACK);
        lcd_set_flush_dcache(1);
 }

^ permalink raw reply related	[flat|nested] 23+ messages in thread
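
Why the argument order matters can be seen with a simplified stand-in for
ALIGN(). The function below assumes the usual power-of-two round-up idiom
and is not U-Boot's actual macro. The frame buffer size is a made-up
example (1366 x 768 at 4 bytes per pixel):

```c
#include <stdbool.h>

/* The round-up idiom is only meaningful when 'a' is a power of two. */
static bool is_power_of_two(unsigned long a)
{
	return a != 0 && (a & (a - 1)) == 0;
}

/* Simplified stand-in for ALIGN(x, a): round x up to a multiple of a. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	return (x + (a - 1)) & ~(a - 1);
}
```

align_up(4196352, 4096) rounds the example size up to 4198400, the next
page boundary. With the arguments swapped, the "alignment" is the frame
buffer size, which is not a power of two, so the mask is meaningless and
the result has nothing to do with the size of the frame buffer.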

* [U-Boot] [PATCH v2 0/5] Enable caches for the RPi2
  2016-03-17  7:58   ` Alexander Graf
@ 2016-03-19  2:12     ` Stephen Warren
  0 siblings, 0 replies; 23+ messages in thread
From: Stephen Warren @ 2016-03-19  2:12 UTC (permalink / raw)
  To: u-boot

On 03/17/2016 01:58 AM, Alexander Graf wrote:
>
>
> On 17.03.16 05:26, Stephen Warren wrote:
>> On 03/16/2016 08:41 AM, Alexander Graf wrote:
>>> This patch set converts the Raspberry Pi 2 system to properly make use of
>>> the caches available in it.
>>>
>>> Because we're running in HYP mode, we first need to teach U-Boot how to
>>> make use of HYP registers and the LPAE page layout which is mandated by
>>> hardware when running in HYP mode.
>>>
>>> Then while we're at it, also mark the frame buffer cached to speed up
>>> screen updates.
>>>
>>> With this patch set, my Raspberry Pi 3 running in AArch32 mode is a *lot*
>>> faster than without.
>>>
>>> Please verify that the code works on a RPi2 as well and doesn't break the
>>> original Pi. In theory it should work, but I only have a 3 to test on
>>> available here.
>>
>> This series mostly works OK. I found the following results, with my
>> rpi_dev branch on github if you want to test the exact same commits:
>>
>> RPi B+ (running rpi_1 build):
>> - Very minor transient corruption when running "ls mmc 0:2 /etc"
>>
>> RPi 2 (running rpi_2 build):
>> RPi 3 (booting in 32-bit mode and running rpi_2 build):
>> RPi 3 (booting in 32-bit mode and running rpi_3_32b build):
>> - Obvious transient corruption when running "ls mmc 0:2 /etc"
>>
>> RPi 3 (booting in 64-bit mode and running rpi_3 build):
>> - No issues
...
> Ugh. It helps when you get the parameters for ALIGN() right.
>
> Please just squash the patch below into the last patch, then things
> should work fine. If you like I can resend a v3, but I guess the change
> is small enough?

With that fix squashed in, the series,
Tested-by: Stephen Warren <swarren@wwwdotorg.org>

(You probably want to Cc Tom Rini on the revised patches since he 
applies ARM board patches these days.)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [U-Boot] [PATCH v3 5/5] bcm2835 video: Map fb as cached
  2016-03-16 14:41 ` [U-Boot] [PATCH v2 5/5] bcm2835 video: Map fb as cached Alexander Graf
@ 2016-03-22 15:22   ` Alexander Graf
  2016-03-24  0:27   ` [U-Boot] [PATCH v4 " Alexander Graf
  2016-03-24  9:31   ` [U-Boot] [PATCH v5 " Alexander Graf
  2 siblings, 0 replies; 23+ messages in thread
From: Alexander Graf @ 2016-03-22 15:22 UTC (permalink / raw)
  To: u-boot

The bcm2835 frame buffer is in RAM, so we can easily map it as cached and gain
all the glorious performance boost that brings with it.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v2 -> v3:

  - Fix align parameters
  - Fix whitespace
---
 drivers/video/bcm2835.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/video/bcm2835.c b/drivers/video/bcm2835.c
index bff1fcb..7c99e2d 100644
--- a/drivers/video/bcm2835.c
+++ b/drivers/video/bcm2835.c
@@ -106,6 +106,12 @@ void lcd_ctrl_init(void *lcdbase)
 
 	gd->fb_base = bus_to_phys(
 		msg_setup->allocate_buffer.body.resp.fb_address);
+
+	/* Enable dcache for the frame buffer */
+	mmu_set_region_dcache_behaviour(gd->fb_base,
+		ALIGN(msg_setup->allocate_buffer.body.resp.fb_size, PAGE_SIZE),
+		DCACHE_WRITEBACK);
+	lcd_set_flush_dcache(1);
 }
 
 void lcd_enable(void)
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [U-Boot] [PATCH v4 5/5] bcm2835 video: Map fb as cached
  2016-03-16 14:41 ` [U-Boot] [PATCH v2 5/5] bcm2835 video: Map fb as cached Alexander Graf
  2016-03-22 15:22   ` [U-Boot] [PATCH v3 " Alexander Graf
@ 2016-03-24  0:27   ` Alexander Graf
  2016-03-24  2:05     ` Stephen Warren
  2016-03-24  9:31   ` [U-Boot] [PATCH v5 " Alexander Graf
  2 siblings, 1 reply; 23+ messages in thread
From: Alexander Graf @ 2016-03-24  0:27 UTC (permalink / raw)
  To: u-boot

The bcm2835 frame buffer is in RAM, so we can easily map it as cached and gain
all the glorious performance boost that brings with it.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v2 -> v3:

  - Fix align parameters
  - Fix whitespace

v3 -> v4:

  - Align start of fb as well to align with segments
  - Align fb size on segment size, not page size
---
 drivers/video/bcm2835.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/video/bcm2835.c b/drivers/video/bcm2835.c
index bff1fcb..a376537 100644
--- a/drivers/video/bcm2835.c
+++ b/drivers/video/bcm2835.c
@@ -44,6 +44,7 @@ void lcd_ctrl_init(void *lcdbase)
 	ALLOC_CACHE_ALIGN_BUFFER(struct msg_setup, msg_setup, 1);
 	int ret;
 	u32 w, h;
+	u32 fb_start, fb_end;
 
 	debug("bcm2835: Query resolution...\n");
 
@@ -106,6 +107,14 @@ void lcd_ctrl_init(void *lcdbase)
 
 	gd->fb_base = bus_to_phys(
 		msg_setup->allocate_buffer.body.resp.fb_address);
+
+	/* Enable dcache for the frame buffer */
+	fb_start = ALIGN(gd->fb_base, 1 << MMU_SECTION_SHIFT);
+	fb_end = gd->fb_base + msg_setup->allocate_buffer.body.resp.fb_size;
+	fb_end = ALIGN(fb_end, 1 << MMU_SECTION_SHIFT);
+	mmu_set_region_dcache_behaviour(fb_start, fb_end - fb_start,
+		DCACHE_WRITEBACK);
+	lcd_set_flush_dcache(1);
 }
 
 void lcd_enable(void)
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [U-Boot] [PATCH v4 5/5] bcm2835 video: Map fb as cached
  2016-03-24  0:27   ` [U-Boot] [PATCH v4 " Alexander Graf
@ 2016-03-24  2:05     ` Stephen Warren
  2016-03-24  7:33       ` Alexander Graf
  0 siblings, 1 reply; 23+ messages in thread
From: Stephen Warren @ 2016-03-24  2:05 UTC (permalink / raw)
  To: u-boot

On 03/23/2016 06:27 PM, Alexander Graf wrote:
> The bcm2835 frame buffer is in RAM, so we can easily map it as cached and gain
> all the glorious performance boost that brings with it.

Tested-by: Stephen Warren <swarren@wwwdotorg.org>

> diff --git a/drivers/video/bcm2835.c b/drivers/video/bcm2835.c

> @@ -44,6 +44,7 @@ void lcd_ctrl_init(void *lcdbase)

> +	/* Enable dcache for the frame buffer */
> +	fb_start = ALIGN(gd->fb_base, 1 << MMU_SECTION_SHIFT);
> +	fb_end = gd->fb_base + msg_setup->allocate_buffer.body.resp.fb_size;
> +	fb_end = ALIGN(fb_end, 1 << MMU_SECTION_SHIFT);
> +	mmu_set_region_dcache_behaviour(fb_start, fb_end - fb_start,
> +		DCACHE_WRITEBACK);

Shouldn't the start be rounded down and the end be rounded up to cover 
the entire FB RAM? Normally it might be problematic to push the 
boundaries outside the relevant data block since it would affect 
adjacent memory that might not then be aware of the cache setup. 
However, since the entire FB is in the VC portion of RAM and U-Boot is 
touching nothing else there, it should be fine in this case. (And even 
if it wasn't, the two ALIGNs would still want to move in opposite 
directions; start up and end down).
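
To illustrate the rounding discussed above, here is a minimal sketch and
not the code from the patch. The 2 MiB section size and all helper names
are assumptions made for the example; in U-Boot the real value comes from
MMU_SECTION_SHIFT and depends on the page table format in use.

```c
#include <stdint.h>

/* Assumption: 2 MiB sections (shift of 21). Hypothetical stand-in for
 * U-Boot's MMU_SECTION_SHIFT, whose value depends on the configuration. */
#define SKETCH_SECTION_SHIFT 21u
#define SKETCH_SECTION_SIZE  (UINT32_C(1) << SKETCH_SECTION_SHIFT)

/* Round an address down to the start of the section that contains it. */
static inline uint32_t section_round_down(uint32_t addr)
{
	return addr & ~(SKETCH_SECTION_SIZE - 1);
}

/* Round an address up to the next section boundary; an address that is
 * already aligned is returned unchanged. Wraps if addr lies within one
 * section of the 4 GiB limit. */
static inline uint32_t section_round_up(uint32_t addr)
{
	return (addr + SKETCH_SECTION_SIZE - 1) & ~(SKETCH_SECTION_SIZE - 1);
}

/* Compute the section-aligned range that fully covers [base, base + size):
 * the start moves down and the end moves up. */
static inline void fb_cached_range(uint32_t base, uint32_t size,
				   uint32_t *start, uint32_t *len)
{
	*start = section_round_down(base);
	*len = section_round_up(base + size) - *start;
}
```

With a hypothetical frame buffer at 0x3daf0000 of size 0x300000, this
yields start 0x3da00000 and length 0x400000. Rounding the start up instead
would give 0x3dc00000 and leave the first part of the frame buffer outside
the cached range.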

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [U-Boot] [PATCH v4 5/5] bcm2835 video: Map fb as cached
  2016-03-24  2:05     ` Stephen Warren
@ 2016-03-24  7:33       ` Alexander Graf
  0 siblings, 0 replies; 23+ messages in thread
From: Alexander Graf @ 2016-03-24  7:33 UTC (permalink / raw)
  To: u-boot



> On 24.03.2016 at 03:05, Stephen Warren <swarren@wwwdotorg.org> wrote:
> 
>> On 03/23/2016 06:27 PM, Alexander Graf wrote:
>> The bcm2835 frame buffer is in RAM, so we can easily map it as cached and gain
>> all the glorious performance boost that brings with it.
> 
> Tested-by: Stephen Warren <swarren@wwwdotorg.org>
> 
>> diff --git a/drivers/video/bcm2835.c b/drivers/video/bcm2835.c
> 
>> @@ -44,6 +44,7 @@ void lcd_ctrl_init(void *lcdbase)
> 
>> +    /* Enable dcache for the frame buffer */
>> +    fb_start = ALIGN(gd->fb_base, 1 << MMU_SECTION_SHIFT);
>> +    fb_end = gd->fb_base + msg_setup->allocate_buffer.body.resp.fb_size;
>> +    fb_end = ALIGN(fb_end, 1 << MMU_SECTION_SHIFT);
>> +    mmu_set_region_dcache_behaviour(fb_start, fb_end - fb_start,
>> +        DCACHE_WRITEBACK);
> 
> Shouldn't the start be rounded down and the end be rounded up to cover
> the entire FB RAM? Normally it might be problematic to push the
> boundaries outside the relevant data block since it would affect
> adjacent memory that might not then be aware of the cache setup.
> However, since the entire FB is in the VC portion of RAM and U-Boot is
> touching nothing else there, it should be fine in this case. (And even
> if it wasn't, the two ALIGNs would still want to move in opposite
> directions; start up and end down).

Yes, my bad. I'll send another version.

Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [U-Boot] [PATCH v5 5/5] bcm2835 video: Map fb as cached
  2016-03-16 14:41 ` [U-Boot] [PATCH v2 5/5] bcm2835 video: Map fb as cached Alexander Graf
  2016-03-22 15:22   ` [U-Boot] [PATCH v3 " Alexander Graf
  2016-03-24  0:27   ` [U-Boot] [PATCH v4 " Alexander Graf
@ 2016-03-24  9:31   ` Alexander Graf
  2016-03-25  3:28     ` Stephen Warren
  2016-03-27 22:30     ` [U-Boot] [U-Boot,v5,5/5] " Tom Rini
  2 siblings, 2 replies; 23+ messages in thread
From: Alexander Graf @ 2016-03-24  9:31 UTC (permalink / raw)
  To: u-boot

The bcm2835 frame buffer is in RAM, so we can easily map it as cached and gain
all the glorious performance boost that brings with it.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v2 -> v3:

  - Fix align parameters
  - Fix whitespace

v3 -> v4:

  - Align start of fb as well, to match MMU section boundaries
  - Align fb size on section size, not page size

v4 -> v5:

  - Align fb start down, not up
---
 drivers/video/bcm2835.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/video/bcm2835.c b/drivers/video/bcm2835.c
index bff1fcb..cd605e6 100644
--- a/drivers/video/bcm2835.c
+++ b/drivers/video/bcm2835.c
@@ -44,6 +44,7 @@ void lcd_ctrl_init(void *lcdbase)
 	ALLOC_CACHE_ALIGN_BUFFER(struct msg_setup, msg_setup, 1);
 	int ret;
 	u32 w, h;
+	u32 fb_start, fb_end;
 
 	debug("bcm2835: Query resolution...\n");
 
@@ -106,6 +107,14 @@ void lcd_ctrl_init(void *lcdbase)
 
 	gd->fb_base = bus_to_phys(
 		msg_setup->allocate_buffer.body.resp.fb_address);
+
+	/* Enable dcache for the frame buffer */
+	fb_start = gd->fb_base & ~(MMU_SECTION_SIZE - 1);
+	fb_end = gd->fb_base + msg_setup->allocate_buffer.body.resp.fb_size;
+	fb_end = ALIGN(fb_end, 1 << MMU_SECTION_SHIFT);
+	mmu_set_region_dcache_behaviour(fb_start, fb_end - fb_start,
+		DCACHE_WRITEBACK);
+	lcd_set_flush_dcache(1);
 }
 
 void lcd_enable(void)
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [U-Boot] [PATCH v5 5/5] bcm2835 video: Map fb as cached
  2016-03-24  9:31   ` [U-Boot] [PATCH v5 " Alexander Graf
@ 2016-03-25  3:28     ` Stephen Warren
  2016-03-27 22:30     ` [U-Boot] [U-Boot,v5,5/5] " Tom Rini
  1 sibling, 0 replies; 23+ messages in thread
From: Stephen Warren @ 2016-03-25  3:28 UTC (permalink / raw)
  To: u-boot

On 03/24/2016 03:31 AM, Alexander Graf wrote:
> The bcm2835 frame buffer is in RAM, so we can easily map it as cached and gain
> all the glorious performance boost that brings with it.

Tested-by: Stephen Warren <swarren@wwwdotorg.org>
Acked-by: Stephen Warren <swarren@wwwdotorg.org>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [U-Boot] [PATCH v2 0/5] Enable caches for the RPi2
  2016-03-16 14:41 [U-Boot] [PATCH v2 0/5] Enable caches for the RPi2 Alexander Graf
                   ` (5 preceding siblings ...)
  2016-03-17  4:26 ` [U-Boot] [PATCH v2 0/5] Enable caches for the RPi2 Stephen Warren
@ 2016-03-25  4:13 ` Stephen Warren
  2016-08-11 13:42   ` Alexander Graf
  6 siblings, 1 reply; 23+ messages in thread
From: Stephen Warren @ 2016-03-25  4:13 UTC (permalink / raw)
  To: u-boot

On 03/16/2016 08:41 AM, Alexander Graf wrote:
> This patch set converts the Raspberry Pi 2 system to properly make use of
> the caches available in it.
>
> Because we're running in HYP mode, we first need to teach U-Boot how to
> make use of HYP registers and the LPAE page layout which is mandated by
> hardware when running in HYP mode.
>
> Then while we're at it, also mark the frame buffer cached to speed up
> screen updates.
>
> With this patch set, my Raspberry Pi 3 running in AArch32 mode is a *lot*
> faster than without.
>
> Please verify that the code works on a RPi2 as well and doesn't break the
> original Pi. In theory it should work, but I only have a 3 to test on
> available here.

I did find one quirk with this series (as tested in my rpi_dev branch on 
github): HDMI console scrolling is now extremely fast for 32-bit builds. 
However, it's noticeably slower on the 64-bit RPi 3 build. I wonder if 
the DCACHE_* constants aren't optimal for AArch64? Perhaps this can all 
be explained instead by RPi 3 needing a slower core clock to support a 
fixed mini UART frequency; that probably slows down the ARM access to DRAM.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [U-Boot] [U-Boot, v2, 1/5] arm64: Add 32bit arm compatible dcache definitions
  2016-03-16 14:41 ` [U-Boot] [PATCH v2 1/5] arm64: Add 32bit arm compatible dcache definitions Alexander Graf
@ 2016-03-27 22:25   ` Tom Rini
  0 siblings, 0 replies; 23+ messages in thread
From: Tom Rini @ 2016-03-27 22:25 UTC (permalink / raw)
  To: u-boot

On Wed, Mar 16, 2016 at 03:41:20PM +0100, Alexander Graf wrote:

> We want to be able to reuse device drivers from 32bit code, so let's add
> definitions for all the dcache options that 32bit code has.
> 
> While at it, fix up the DCACHE_OFF configuration. That was setting the bits
> to declare a PTE a PTE and left the MAIR index bit at 0. Drop the useless
> bits and make the index explicit.
> 
> Signed-off-by: Alexander Graf <agraf@suse.de>

Applied to u-boot/master, thanks!

-- 
Tom

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [U-Boot] [U-Boot, v2, 2/5] arm: Add support for HYP mode and LPAE page tables
  2016-03-16 14:41 ` [U-Boot] [PATCH v2 2/5] arm: Add support for HYP mode and LPAE page tables Alexander Graf
@ 2016-03-27 22:25   ` Tom Rini
  0 siblings, 0 replies; 23+ messages in thread
From: Tom Rini @ 2016-03-27 22:25 UTC (permalink / raw)
  To: u-boot

On Wed, Mar 16, 2016 at 03:41:21PM +0100, Alexander Graf wrote:

> We currently always modify the SVC versions of registers and only support
> the short descriptor PTE format.
> 
> Some boards however (like the RPi2) run in HYP mode. There, we need to modify
> the HYP version of system registers and HYP mode only supports the long
> descriptor PTE format.
> 
> So this patch introduces support for both long descriptor PTEs and HYP mode
> registers.
> 
> Signed-off-by: Alexander Graf <agraf@suse.de>

Applied to u-boot/master, thanks!

-- 
Tom

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [U-Boot] [U-Boot, v2, 3/5] lcd: Fix compile warning in 64bit mode
  2016-03-16 14:41 ` [U-Boot] [PATCH v2 3/5] lcd: Fix compile warning in 64bit mode Alexander Graf
@ 2016-03-27 22:25   ` Tom Rini
  0 siblings, 0 replies; 23+ messages in thread
From: Tom Rini @ 2016-03-27 22:25 UTC (permalink / raw)
  To: u-boot

On Wed, Mar 16, 2016 at 03:41:22PM +0100, Alexander Graf wrote:

> When compiling the code for 64bit, the lcd code emits warnings because it
> tries to cast pointers to 32bit values. Fix it by casting them to longs
> instead, actually properly aligning with the function prototype.
> 
> Signed-off-by: Alexander Graf <agraf@suse.de>

Applied to u-boot/master, thanks!

-- 
Tom

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [U-Boot] [U-Boot,v2,4/5] RPi: Enable caches for rpi2
  2016-03-16 14:41 ` [U-Boot] [PATCH v2 4/5] RPi: Enable caches for rpi2 Alexander Graf
@ 2016-03-27 22:25   ` Tom Rini
  0 siblings, 0 replies; 23+ messages in thread
From: Tom Rini @ 2016-03-27 22:25 UTC (permalink / raw)
  To: u-boot

On Wed, Mar 16, 2016 at 03:41:23PM +0100, Alexander Graf wrote:

> Now that we have support for running with caches enabled in HYP mode,
> opt in to that on the Raspberry Pi 2. This brings a significant performance
> boost.
> 
> Signed-off-by: Alexander Graf <agraf@suse.de>

Applied to u-boot/master, thanks!

-- 
Tom

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [U-Boot] [U-Boot,v5,5/5] bcm2835 video: Map fb as cached
  2016-03-24  9:31   ` [U-Boot] [PATCH v5 " Alexander Graf
  2016-03-25  3:28     ` Stephen Warren
@ 2016-03-27 22:30     ` Tom Rini
  1 sibling, 0 replies; 23+ messages in thread
From: Tom Rini @ 2016-03-27 22:30 UTC (permalink / raw)
  To: u-boot

On Thu, Mar 24, 2016 at 10:31:11AM +0100, Alexander Graf wrote:

> The bcm2835 frame buffer is in RAM, so we can easily map it as cached and gain
> all the glorious performance boost that brings with it.
> 
> Signed-off-by: Alexander Graf <agraf@suse.de>
> Tested-by: Stephen Warren <swarren@wwwdotorg.org>
> Acked-by: Stephen Warren <swarren@wwwdotorg.org>

Applied to u-boot/master, thanks!

-- 
Tom

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [U-Boot] [PATCH v2 0/5] Enable caches for the RPi2
  2016-03-25  4:13 ` Stephen Warren
@ 2016-08-11 13:42   ` Alexander Graf
  0 siblings, 0 replies; 23+ messages in thread
From: Alexander Graf @ 2016-08-11 13:42 UTC (permalink / raw)
  To: u-boot



On 25.03.16 05:13, Stephen Warren wrote:
> On 03/16/2016 08:41 AM, Alexander Graf wrote:
>> This patch set converts the Raspberry Pi 2 system to properly make use of
>> the caches available in it.
>>
>> Because we're running in HYP mode, we first need to teach U-Boot how to
>> make use of HYP registers and the LPAE page layout which is mandated by
>> hardware when running in HYP mode.
>>
>> Then while we're at it, also mark the frame buffer cached to speed up
>> screen updates.
>>
>> With this patch set, my Raspberry Pi 3 running in AArch32 mode is a *lot*
>> faster than without.
>>
>> Please verify that the code works on a RPi2 as well and doesn't break the
>> original Pi. In theory it should work, but I only have a 3 to test on
>> available here.
> 
> I did find one quirk with this series (as tested in my rpi_dev branch on
> github): HDMI console scrolling is now extremely fast for 32-bit builds.
> However, it's noticeably slower on the 64-bit RPi 3 build. I wonder if
> the DCACHE_* constants aren't optimal for AArch64? Perhaps this can all
> be explained instead by RPi 3 needing a slower core clock to support a
> fixed mini UART frequency; that probably slows down the ARM access to DRAM.

I tried with the latest code and my patches that allow for disabled
uart, so that the core shouldn't get slowed down anymore.

It still does feel significantly slower than it should.

We set the memory type to WRITEBACK which translates to index 4 into MAIR

  DCACHE_WRITEBACK = 4 << 2,

->

  #define MT_NORMAL               4

So we need to look at what MAIR 4 looks like:

  #define MEMORY_ATTRIBUTES [...] |
        (UL(0xff) << (MT_NORMAL * 8)))

and that is

  Normal Memory, Outer Write-back non-transient
  Outer Read Allocate
  Outer Write Allocate
  Normal Memory, Inner Write-back non-transient
  Inner Read Allocate
  Inner Write Allocate

So we should be using as much cache as we can ;). I'm not quite sure why
it's still slower than you'd expect though. Hrm.
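
The lookup described above can be sketched in C as follows. This is a
minimal illustration, not U-Boot code: the helper names are hypothetical,
and since only the MT_NORMAL byte (0xff) of the MAIR value is quoted in
this mail, the other attribute slots are left at zero as placeholders.
The nibble decoding follows the ARMv8 MAIR encoding for Normal memory.

```c
#include <stdint.h>

/* Values quoted in the mail above. */
#define MT_NORMAL        4
#define DCACHE_WRITEBACK (4 << 2)

/* Cache policy encoded in one MAIR nibble (outer or inner). */
struct nibble_policy {
	int non_transient;	/* bit 3: 1 = non-transient */
	int write_back;		/* bit 2: 1 = write-back, 0 = write-through */
	int read_alloc;		/* bit 1: R, read allocate */
	int write_alloc;	/* bit 0: W, write allocate */
};

/* AttrIndx lives in bits [4:2] of a block/page descriptor. */
static inline unsigned int attr_index(uint64_t pte_attr)
{
	return (pte_attr >> 2) & 0x7;
}

/* Each MAIR index selects one 8-bit attribute field. */
static inline uint8_t mair_attr(uint64_t mair, unsigned int idx)
{
	return (mair >> (idx * 8)) & 0xff;
}

/* Decode a Normal-memory nibble. Not meaningful for the special values
 * 0b0000 (Device) and 0b0100 (Non-cacheable). */
static inline struct nibble_policy decode_nibble(uint8_t n)
{
	struct nibble_policy p;

	p.non_transient = (n >> 3) & 1;
	p.write_back = (n >> 2) & 1;
	p.read_alloc = (n >> 1) & 1;
	p.write_alloc = n & 1;
	return p;
}

/* Assumption: only the MT_NORMAL slot is filled in (0xff, as quoted);
 * the remaining slots are zero placeholders, not the real values. */
static inline uint64_t sketch_mair(void)
{
	return UINT64_C(0xff) << (MT_NORMAL * 8);
}
```

Decoding both nibbles of the 0xff attribute this way gives write-back,
non-transient, read allocate and write allocate for the outer and the
inner domain, which matches the list above.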


Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

