* [PATCH v4 0/2] arm: stm32mp1: activate data cache in SPL and before relocation
@ 2020-04-30 14:30 Patrick Delaunay
  2020-04-30 14:30 ` [PATCH v4 1/2] arm: stm32mp: " Patrick Delaunay
  2020-04-30 14:30 ` [PATCH v4 2/2] arm: stm32mp: activate data cache on DDR in SPL Patrick Delaunay
  0 siblings, 2 replies; 5+ messages in thread
From: Patrick Delaunay @ 2020-04-30 14:30 UTC (permalink / raw)
  To: u-boot


V4 is a cosmetic update of the previous series, V3:
"arm: stm32mp1: activate data cache in SPL and before relocation"
http://patchwork.ozlabs.org/project/uboot/list/?series=172557

This series depends on the ARM cache series:
"arm: caches: allow to activate dcache in SPL and in U-Boot pre-reloc"
http://patchwork.ozlabs.org/project/uboot/list/?series=172555

I moved the TLB to the .data section and simplified the implementation by
reusing the default weak function dram_bank_mmu_setup() for the MMU
configuration and mmu_set_region_dcache_behaviour() to set up the specific
behavior.

I also activate the data cache on DDR for SPL.

For information, the gain of the second patch is limited (a few ms) when
booting from SD card: the SDMMC IP uses its internal DMA, so the data cache
on DDR is barely used.

The gain should be better for other boot use cases.

Example of bootstage report on STM32MP157C-DK2, boot from SD card.

1/ For trusted boot chain with TF-A

a) Before

    STM32MP> bootstage report
    Timer summary in microseconds (9 records):
           Mark    Elapsed  Stage
              0          0  reset
        583,290    583,290  board_init_f
      2,348,898  1,765,608  board_init_r
      2,664,580    315,682  id=64
      2,704,027     39,447  id=65
      2,704,729        702  main_loop
      5,563,519  2,858,790  id=175

    Accumulated time:
                    41,696  dm_r
                   615,561  dm_f

b) After the series

    STM32MP> bootstage report
    Timer summary in microseconds (9 records):
           Mark    Elapsed  Stage
              0          0  reset
        583,401    583,401  board_init_f
        727,725    144,324  board_init_r
      1,043,362    315,637  id=64
      1,082,806     39,444  id=65
      1,083,507        701  main_loop
      3,680,827  2,597,320  id=175

    Accumulated time:
                    36,047  dm_f
                    41,718  dm_r

2/ And for the basic boot chain with SPL

a) Before:

    STM32MP> bootstage report
    Timer summary in microseconds (12 records):
           Mark    Elapsed  Stage
              0          0  reset
        195,613    195,613  SPL
        837,867    642,254  end SPL
        840,117      2,250  board_init_f
      2,739,639  1,899,522  board_init_r
      3,066,815    327,176  id=64
      3,103,377     36,562  id=65
      3,104,078        701  main_loop
      3,142,171     38,093  id=175

    Accumulated time:
                    38,124  dm_spl
                    41,956  dm_r
                   648,861  dm_f

b) After the series

    STM32MP> bootstage report
    Timer summary in microseconds (12 records):
           Mark    Elapsed  Stage
              0          0  reset
        195,859    195,859  SPL
        330,190    134,331  end SPL
        332,408      2,218  board_init_f
        482,688    150,280  board_init_r
        808,694    326,006  id=64
        845,029     36,335  id=65
        845,730        701  main_loop
      3,281,876  2,436,146  id=175

    Accumulated time:
                     3,169  dm_spl
                    36,041  dm_f
                    41,701  dm_r

    STM32MP> bootstage report
    Timer summary in microseconds (12 records):
           Mark    Elapsed  Stage
              0          0  reset
        211,036    211,036  SPL
        343,393    132,357  end SPL
        345,645      2,252  board_init_f
        496,596    150,951  board_init_r
        822,256    325,660  id=64
        858,451     36,195  id=65
        859,153        702  main_loop
      3,414,706  2,555,553  id=175

    Accumulated time:
                     3,132  dm_spl
                    36,005  dm_f
                    41,695  dm_r
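
As a rough cross-check of the reports above, the gain can be read as the
difference between the board_init_r marks before and after the series. The
sketch below only redoes that subtraction; the helper name and the enum
names are hypothetical, the values are copied from the "Before" reports and
from the first "After" report of each boot chain.

```c
#include <assert.h>
#include <stdint.h>

/* Time saved, in microseconds, between two marks of the same stage. */
static inline uint32_t bootstage_gain_us(uint32_t before_us, uint32_t after_us)
{
	/* A stage that got slower would be a regression, not a gain. */
	assert(before_us >= after_us);
	return before_us - after_us;
}

/* board_init_r marks copied from the bootstage reports (microseconds). */
enum {
	TFA_BEFORE_US = 2348898, /* trusted boot chain, before */
	TFA_AFTER_US  = 727725,  /* trusted boot chain, after */
	SPL_BEFORE_US = 2739639, /* basic boot chain, before */
	SPL_AFTER_US  = 482688,  /* basic boot chain, first "after" report */
};
```

bootstage_gain_us(TFA_BEFORE_US, TFA_AFTER_US) gives 1,621,173 us and
bootstage_gain_us(SPL_BEFORE_US, SPL_AFTER_US) gives 2,256,951 us, so
roughly 1.6 s and 2.2 s reached earlier at board_init_r.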


Changes in v4:
- fix commit message and comment and add Patrice Chotard reviewed-by
- fix commit message and add Patrice Chotard reviewed-by

Changes in v3:
- add Information in commit-message on early malloc and .BSS
- remove debug message "bye"

Changes in v2:
- create a new function early_enable_caches
- use TLB in .init section
- use the default weak dram_bank_mmu_setup() and
  use mmu_set_region_dcache_behaviour() to setup
  the early MMU configuration
- enable data cache on DDR in SPL, after DDR controller initialization
- new

Patrick Delaunay (2):
  arm: stm32mp: activate data cache in SPL and before relocation
  arm: stm32mp: activate data cache on DDR in SPL

 arch/arm/mach-stm32mp/cpu.c | 43 ++++++++++++++++++++++++++++++++++++-
 arch/arm/mach-stm32mp/spl.c | 19 ++++++++++++++++
 2 files changed, 61 insertions(+), 1 deletion(-)

-- 
2.17.1


* [PATCH v4 1/2] arm: stm32mp: activate data cache in SPL and before relocation
  2020-04-30 14:30 [PATCH v4 0/2] arm: stm32mp1: activate data cache in SPL and before relocation Patrick Delaunay
@ 2020-04-30 14:30 ` Patrick Delaunay
  2020-05-14  9:39   ` Patrick DELAUNAY
  2020-04-30 14:30 ` [PATCH v4 2/2] arm: stm32mp: activate data cache on DDR in SPL Patrick Delaunay
  1 sibling, 1 reply; 5+ messages in thread
From: Patrick Delaunay @ 2020-04-30 14:30 UTC (permalink / raw)
  To: u-boot

Activate the data cache in SPL and in U-Boot before relocation.

In arch_cpu_init(), the function early_enable_caches() sets up the early
TLB, early_tlb[] located in the .data section, and marks as cacheable:
- for SPL, all the SYSRAM
- for U-Boot, all the DDR
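
To illustrate what marking a region as cacheable means for the table, here
is a minimal host-side sketch that maps a region to the 1 MiB section
entries covering it. It is not the U-Boot implementation of
mmu_set_region_dcache_behaviour(); the helper, the struct and the ASSUMED_*
base and size values are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SHIFT 20u /* assumption: 1 MiB sections */
#define SECTION_SIZE  (UINT64_C(1) << SECTION_SHIFT)

/* Assumed memory map, for illustration only (not taken from this patch). */
#define ASSUMED_DDR_BASE    UINT64_C(0xC0000000)
#define ASSUMED_DDR_SIZE    UINT64_C(0x40000000) /* 1 GiB */
#define ASSUMED_SYSRAM_BASE UINT64_C(0x2FFC0000)
#define ASSUMED_SYSRAM_SIZE UINT64_C(0x00040000) /* 256 KiB */

struct section_range {
	uint32_t first; /* index of the first section entry touched */
	uint32_t count; /* number of section entries touched */
};

/* Round the start down and the end up to section boundaries. */
static inline struct section_range region_to_sections(uint64_t base,
						      uint64_t size)
{
	struct section_range r;
	uint64_t end;

	assert(size > 0);
	/* 64-bit math so that base + size may reach 4 GiB without wrapping */
	end = (base + size + SECTION_SIZE - 1) >> SECTION_SHIFT;
	r.first = (uint32_t)(base >> SECTION_SHIFT);
	r.count = (uint32_t)(end - r.first);
	return r;
}
```

With these assumed values, the DDR region covers 1024 entries starting at
index 3072, and the SYSRAM region falls inside the single entry 767, so
the whole 1 MiB section around it gets the same attributes.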

After relocation, the function enable_caches() (called by board_r)
reconfigures the MMU with the new TLB location (reserved in
board_f.c::reserve_mmu) and re-enables the data cache.

This patch reduces the execution time, particularly:
- for the device tree parsing in U-Boot pre-reloc stage
  (dm_extended_scan_fdt => dm_scan_fdt)
- for the I2C timing computation in SPL (stm32_i2c_choose_solution())

For example, the result on the STM32MP157C-DK2 board is:
   1.6 s gain for the trusted boot chain with TF-A
   2.2 s gain for the basic boot chain with SPL

For information, as the TLB is added to the .data section, the binary size
increases and so does the SPL load time by the ROM code (30 ms on DK2).
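
The size cost can be estimated with a short sketch. It assumes the ARMv7
short-descriptor format without LPAE (4096 first-level entries of 4 bytes
each); this patch only states PGTABLE_SIZE and the 16kB alignment, so the
L1_* names and values below are assumptions for illustration, and
early_table[] is a hypothetical stand-in for early_tlb[].

```c
#include <assert.h>
#include <stdint.h>

#define L1_ENTRIES     4096u   /* assumption: one entry per 1 MiB of 4 GiB */
#define L1_ENTRY_SIZE  4u      /* assumption: bytes per section entry */
#define L1_TABLE_SIZE  (L1_ENTRIES * L1_ENTRY_SIZE)
#define L1_TABLE_ALIGN 0x4000u /* alignment used for early_tlb[] */

/* Under these assumptions the table is exactly as large as its alignment. */
static_assert(L1_TABLE_SIZE == L1_TABLE_ALIGN, "table size is 16 KiB");

/*
 * Stand-in for early_tlb[]: the non-zero initializer keeps it in
 * initialized data, so its 16 KiB are stored in the binary image.
 */
static _Alignas(L1_TABLE_ALIGN) uint8_t early_table[L1_TABLE_SIZE] = { 1 };

/* Check that the table base has its low 14 bits clear. */
static inline int table_is_aligned(void)
{
	return ((uintptr_t)early_table & (L1_TABLE_ALIGN - 1)) == 0;
}
```

Under these assumptions, the table alone adds 16 KiB to the image that the
ROM code has to load, which is consistent with a longer SPL load time.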

But early malloc can't be used for the TLB because arch_cpu_init()
is executed before the early pool initialization done in spl_common_init(),
called by spl_early_init(), so early malloc is available too late for this
use case.
And if I initialize the MMU and the cache after this function, it is
too late, as dm_init_and_scan() and the FDT parsing are also called in
spl_common_init().

And .BSS can't be used in board_init_f(): only the stack and global data
can be used before the BSS initialization done in board_init_r().

So .data is the best solution without a hardcoded location, but if you
have a size issue for SPL you can deactivate the cache for SPL only
(with CONFIG_SPL_SYS_DCACHE_OFF).

Reviewed-by: Patrice Chotard <patrice.chotard@st.com>
Signed-off-by: Patrick Delaunay <patrick.delaunay@st.com>
---

Changes in v4:
- fix commit message and comment and add Patrice Chotard reviewed-by

Changes in v3:
- add Information in commit-message on early malloc and .BSS

Changes in v2:
- create a new function early_enable_caches
- use TLB in .init section
- use the default weak dram_bank_mmu_setup() and
  use mmu_set_region_dcache_behaviour() to setup
  the early MMU configuration
- enable data cache on DDR in SPL, after DDR controller initialization

 arch/arm/mach-stm32mp/cpu.c | 43 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-stm32mp/cpu.c b/arch/arm/mach-stm32mp/cpu.c
index 74d03fa7dd..bbffa3b9ff 100644
--- a/arch/arm/mach-stm32mp/cpu.c
+++ b/arch/arm/mach-stm32mp/cpu.c
@@ -75,6 +75,12 @@
 #define PKG_SHIFT	27
 #define PKG_MASK	GENMASK(2, 0)
 
+/*
+ * early TLB placed in the .data section so that it does not get cleared,
+ * with 16kB alignment (see TTBR0_BASE_ADDR_MASK)
+ */
+u8 early_tlb[PGTABLE_SIZE] __section(".data") __aligned(0x4000);
+
 #if !defined(CONFIG_SPL) || defined(CONFIG_SPL_BUILD)
 #ifndef CONFIG_TFABOOT
 static void security_init(void)
@@ -186,6 +192,32 @@ u32 get_bootmode(void)
 		    TAMP_BOOT_MODE_SHIFT;
 }
 
+/*
+ * initialize the MMU and activate cache in SPL or in U-Boot pre-reloc stage
+ * MMU/TLB is updated in enable_caches() for U-Boot after relocation
+ * or is deactivated in U-Boot entry function start.S::cpu_init_cp15
+ */
+static void early_enable_caches(void)
+{
+	/* I-cache is already enabled in start.S: cpu_init_cp15 */
+
+	if (CONFIG_IS_ENABLED(SYS_DCACHE_OFF))
+		return;
+
+	gd->arch.tlb_size = PGTABLE_SIZE;
+	gd->arch.tlb_addr = (unsigned long)&early_tlb;
+
+	dcache_enable();
+
+	if (IS_ENABLED(CONFIG_SPL_BUILD))
+		mmu_set_region_dcache_behaviour(STM32_SYSRAM_BASE,
+						STM32_SYSRAM_SIZE,
+						DCACHE_DEFAULT_OPTION);
+	else
+		mmu_set_region_dcache_behaviour(STM32_DDR_BASE, STM32_DDR_SIZE,
+						DCACHE_DEFAULT_OPTION);
+}
+
 /*
  * Early system init
  */
@@ -193,6 +225,8 @@ int arch_cpu_init(void)
 {
 	u32 boot_mode;
 
+	early_enable_caches();
+
 	/* early armv7 timer init: needed for polling */
 	timer_init();
 
@@ -225,7 +259,14 @@ int arch_cpu_init(void)
 
 void enable_caches(void)
 {
-	/* Enable D-cache. I-cache is already enabled in start.S */
+	/* I-cache is already enabled in start.S: icache_enable() not needed */
+
+	/* deactivate the data cache, early enabled in arch_cpu_init() */
+	dcache_disable();
+	/*
+	 * update MMU after relocation and enable the data cache
+	 * warning: the TLB location is updated in board_f.c::reserve_mmu
+	 */
 	dcache_enable();
 }
 
-- 
2.17.1


* [PATCH v4 2/2] arm: stm32mp: activate data cache on DDR in SPL
  2020-04-30 14:30 [PATCH v4 0/2] arm: stm32mp1: activate data cache in SPL and before relocation Patrick Delaunay
  2020-04-30 14:30 ` [PATCH v4 1/2] arm: stm32mp: " Patrick Delaunay
@ 2020-04-30 14:30 ` Patrick Delaunay
  2020-05-14  9:39   ` Patrick DELAUNAY
  1 sibling, 1 reply; 5+ messages in thread
From: Patrick Delaunay @ 2020-04-30 14:30 UTC (permalink / raw)
  To: u-boot

Activate the data cache on DDR to speed up the DDR accesses made by SPL:
- CONFIG_SPL_BSS_START_ADDR
- CONFIG_SYS_SPL_MALLOC_START

The cache is configured only once DDR is fully initialized,
to avoid speculative accesses and issues in get_ram_size().
The data cache is deactivated at the end of SPL, to flush the data cache
and the TLB.

Reviewed-by: Patrice Chotard <patrice.chotard@st.com>
Signed-off-by: Patrick Delaunay <patrick.delaunay@st.com>
---

Changes in v4:
- fix commit message and add Patrice Chotard reviewed-by

Changes in v3:
- remove debug message "bye"

Changes in v2:
- new

 arch/arm/mach-stm32mp/spl.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/arm/mach-stm32mp/spl.c b/arch/arm/mach-stm32mp/spl.c
index f85391c6af..e50a21c3b7 100644
--- a/arch/arm/mach-stm32mp/spl.c
+++ b/arch/arm/mach-stm32mp/spl.c
@@ -4,6 +4,7 @@
  */
 
 #include <common.h>
+#include <cpu_func.h>
 #include <dm.h>
 #include <hang.h>
 #include <spl.h>
@@ -115,4 +116,22 @@ void board_init_f(ulong dummy)
 		printf("DRAM init failed: %d\n", ret);
 		hang();
 	}
+
+	/*
+	 * activate cache on DDR only when DDR is fully initialized
+	 * to avoid speculative access and issue in get_ram_size()
+	 */
+	if (!CONFIG_IS_ENABLED(SYS_DCACHE_OFF))
+		mmu_set_region_dcache_behaviour(STM32_DDR_BASE, STM32_DDR_SIZE,
+						DCACHE_DEFAULT_OPTION);
+}
+
+void spl_board_prepare_for_boot(void)
+{
+	dcache_disable();
+}
+
+void spl_board_prepare_for_boot_linux(void)
+{
+	dcache_disable();
 }
-- 
2.17.1


* [PATCH v4 1/2] arm: stm32mp: activate data cache in SPL and before relocation
  2020-04-30 14:30 ` [PATCH v4 1/2] arm: stm32mp: " Patrick Delaunay
@ 2020-05-14  9:39   ` Patrick DELAUNAY
  0 siblings, 0 replies; 5+ messages in thread
From: Patrick DELAUNAY @ 2020-05-14  9:39 UTC (permalink / raw)
  To: u-boot

Hi,

> From: Patrick DELAUNAY <patrick.delaunay@st.com>
> Sent: Thursday, April 30, 2020 16:30
> 
> Activate the data cache in SPL and in U-Boot before relocation.
> 
> In arch_cpu_init(), the function early_enable_caches() sets up the early TLB,
> early_tlb[] located in the .data section, and marks as cacheable:
> - for SPL, all the SYSRAM
> - for U-Boot, all the DDR
> 
> After relocation, the function enable_caches() (called by board_r) reconfigures
> the MMU with the new TLB location (reserved in board_f.c::reserve_mmu) and
> re-enables the data cache.
> 
> This patch reduces the execution time, particularly:
> - for the device tree parsing in U-Boot pre-reloc stage
>   (dm_extended_scan_fdt => dm_scan_fdt)
> - for the I2C timing computation in SPL (stm32_i2c_choose_solution())
> 
> For example, the result on the STM32MP157C-DK2 board is:
>    1.6 s gain for the trusted boot chain with TF-A
>    2.2 s gain for the basic boot chain with SPL
> 
> For information, as the TLB is added to the .data section, the binary size
> increases and so does the SPL load time by the ROM code (30 ms on DK2).
> 
> But early malloc can't be used for the TLB because arch_cpu_init() is executed
> before the early pool initialization done in spl_common_init(), called by
> spl_early_init(), so early malloc is available too late for this use case.
> And if I initialize the MMU and the cache after this function, it is too late,
> as dm_init_and_scan() and the FDT parsing are also called in spl_common_init().
> 
> And .BSS can't be used in board_init_f(): only the stack and global data can be
> used before the BSS initialization done in board_init_r().
> 
> So .data is the best solution without a hardcoded location, but if you have a
> size issue for SPL you can deactivate the cache for SPL only (with
> CONFIG_SPL_SYS_DCACHE_OFF).
> 
> Reviewed-by: Patrice Chotard <patrice.chotard@st.com>
> Signed-off-by: Patrick Delaunay <patrick.delaunay@st.com>
> ---
> 
> Changes in v4:
> - fix commit message and comment and add Patrice Chotard reviewed-by
> 
> Changes in v3:
> - add Information in commit-message on early malloc and .BSS
> 
> Changes in v2:
> - create a new function early_enable_caches
> - use TLB in .init section
> - use the default weak dram_bank_mmu_setup() and
>   use mmu_set_region_dcache_behaviour() to setup
>   the early MMU configuration
> - enable data cache on DDR in SPL, after DDR controller initialization
> 
>  arch/arm/mach-stm32mp/cpu.c | 43 ++++++++++++++++++++++++++++++++++++-
>  1 file changed, 42 insertions(+), 1 deletion(-)
> 

Applied to u-boot-stm/master, thanks!

Regards

Patrick


* [PATCH v4 2/2] arm: stm32mp: activate data cache on DDR in SPL
  2020-04-30 14:30 ` [PATCH v4 2/2] arm: stm32mp: activate data cache on DDR in SPL Patrick Delaunay
@ 2020-05-14  9:39   ` Patrick DELAUNAY
  0 siblings, 0 replies; 5+ messages in thread
From: Patrick DELAUNAY @ 2020-05-14  9:39 UTC (permalink / raw)
  To: u-boot

Hi

> From: Patrick DELAUNAY <patrick.delaunay@st.com>
> Sent: Thursday, April 30, 2020 16:30
> 
> Activate the data cache on DDR to speed up the DDR accesses made by SPL:
> - CONFIG_SPL_BSS_START_ADDR
> - CONFIG_SYS_SPL_MALLOC_START
> 
> The cache is configured only once DDR is fully initialized, to avoid
> speculative accesses and issues in get_ram_size().
> The data cache is deactivated at the end of SPL, to flush the data cache and
> the TLB.
> 
> Reviewed-by: Patrice Chotard <patrice.chotard@st.com>
> Signed-off-by: Patrick Delaunay <patrick.delaunay@st.com>
> ---
> 
> Changes in v4:
> - fix commit message and add Patrice Chotard reviewed-by
> 
> Changes in v3:
> - remove debug message "bye"
> 
> Changes in v2:
> - new
> 
>  arch/arm/mach-stm32mp/spl.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 

Applied to u-boot-stm/master, thanks!

Regards

Patrick


Thread overview: 5+ messages
2020-04-30 14:30 [PATCH v4 0/2] arm: stm32mp1: activate data cache in SPL and before relocation Patrick Delaunay
2020-04-30 14:30 ` [PATCH v4 1/2] arm: stm32mp: " Patrick Delaunay
2020-05-14  9:39   ` Patrick DELAUNAY
2020-04-30 14:30 ` [PATCH v4 2/2] arm: stm32mp: activate data cache on DDR in SPL Patrick Delaunay
2020-05-14  9:39   ` Patrick DELAUNAY
