All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Reduce the complexity of add_map() and count_required_pts()
@ 2023-02-14 13:38 Ying-Chun Liu (PaulLiu)
  2023-02-14 13:38 ` [PATCH 1/2] arm64: Reduce add_map() complexity Ying-Chun Liu (PaulLiu)
  2023-02-14 13:38 ` [PATCH 2/2] arm64: Reduce PT size estimation complexity Ying-Chun Liu (PaulLiu)
  0 siblings, 2 replies; 19+ messages in thread
From: Ying-Chun Liu (PaulLiu) @ 2023-02-14 13:38 UTC (permalink / raw)
  To: u-boot; +Cc: Ying-Chun Liu (PaulLiu)

Reduce the complexity of add_map() and count_required_pts() to gain
better performance.

Marc Zyngier (2):
  arm64: Reduce add_map() complexity
  arm64: Reduce PT size estimation complexity

 arch/arm/cpu/armv8/cache_v8.c | 201 +++++++++++++---------------------
 1 file changed, 79 insertions(+), 122 deletions(-)

-- 
2.39.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/2] arm64: Reduce add_map() complexity
  2023-02-14 13:38 [PATCH 0/2] Reduce the complexity of add_map() and count_required_pts() Ying-Chun Liu (PaulLiu)
@ 2023-02-14 13:38 ` Ying-Chun Liu (PaulLiu)
  2023-03-07 17:52   ` Tom Rini
                     ` (2 more replies)
  2023-02-14 13:38 ` [PATCH 2/2] arm64: Reduce PT size estimation complexity Ying-Chun Liu (PaulLiu)
  1 sibling, 3 replies; 19+ messages in thread
From: Ying-Chun Liu (PaulLiu) @ 2023-02-14 13:38 UTC (permalink / raw)
  To: u-boot; +Cc: Marc Zyngier, Pierre-Clément Tosi, Ying-Chun Liu, Tom Rini

From: Marc Zyngier <maz@kernel.org>

In the add_map() function, for each level it populates, it iterates from
the root of the PT tree, making it ineficient if a mapping needs to occur
past level 1.

Instead, replace it with a recursive (and much simpler) algorithm
that keeps the complexity as low as possible. With this, mapping
512GB at level 2 goes from several seconds down to not measurable
on an A55 machine.

We keep the block mappings at level 1 for now though.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Pierre-Clément Tosi <ptosi@google.com>
[ Paul: pick from the Android tree. Fixup Pierre's commit. Rebase to the
  upstream ]
Signed-off-by: Ying-Chun Liu (PaulLiu) <paul.liu@linaro.org>
Cc: Tom Rini <trini@konsulko.com>
Link: https://android.googlesource.com/platform/external/u-boot/+/96ad729cf4cab53bdff8222bb3eb256f38b5c3a6
Link: https://android.googlesource.com/platform/external/u-boot/+/6be9330601d81545c7c941e3609f35bf68a09059
---
 arch/arm/cpu/armv8/cache_v8.c | 94 +++++++++++++++++------------------
 1 file changed, 46 insertions(+), 48 deletions(-)

diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
index f333ad8889..876344e1b4 100644
--- a/arch/arm/cpu/armv8/cache_v8.c
+++ b/arch/arm/cpu/armv8/cache_v8.c
@@ -299,61 +299,59 @@ static void split_block(u64 *pte, int level)
 	set_pte_table(pte, new_table);
 }
 
-/* Add one mm_region map entry to the page tables */
-static void add_map(struct mm_region *map)
+static void map_range(u64 virt, u64 phys, u64 size, int level,
+		      u64 *table, u64 attrs)
 {
-	u64 *pte;
-	u64 virt = map->virt;
-	u64 phys = map->phys;
-	u64 size = map->size;
-	u64 attrs = map->attrs | PTE_TYPE_BLOCK | PTE_BLOCK_AF;
-	u64 blocksize;
-	int level;
-	u64 *new_table;
+	u64 map_size = BIT_ULL(level2shift(level));
+	int i, idx;
 
-	while (size) {
-		pte = find_pte(virt, 0);
-		if (pte && (pte_type(pte) == PTE_TYPE_FAULT)) {
-			debug("Creating table for virt 0x%llx\n", virt);
-			new_table = create_table();
-			set_pte_table(pte, new_table);
-		}
+	idx = (virt >> level2shift(level)) & (MAX_PTE_ENTRIES - 1);
+	for (i = idx; size; i++) {
+		u64 next_size, *next_table;
 
-		for (level = 1; level < 4; level++) {
-			pte = find_pte(virt, level);
-			if (!pte)
-				panic("pte not found\n");
-
-			blocksize = 1ULL << level2shift(level);
-			debug("Checking if pte fits for virt=%llx size=%llx blocksize=%llx\n",
-			      virt, size, blocksize);
-			if (size >= blocksize && !(virt & (blocksize - 1))) {
-				/* Page fits, create block PTE */
-				debug("Setting PTE %p to block virt=%llx\n",
-				      pte, virt);
-				if (level == 3)
-					*pte = phys | attrs | PTE_TYPE_PAGE;
-				else
-					*pte = phys | attrs;
-				virt += blocksize;
-				phys += blocksize;
-				size -= blocksize;
-				break;
-			} else if (pte_type(pte) == PTE_TYPE_FAULT) {
-				/* Page doesn't fit, create subpages */
-				debug("Creating subtable for virt 0x%llx blksize=%llx\n",
-				      virt, blocksize);
-				new_table = create_table();
-				set_pte_table(pte, new_table);
-			} else if (pte_type(pte) == PTE_TYPE_BLOCK) {
-				debug("Split block into subtable for virt 0x%llx blksize=0x%llx\n",
-				      virt, blocksize);
-				split_block(pte, level);
-			}
+		if (level >= 1 &&
+		    size >= map_size && !(virt & (map_size - 1))) {
+			if (level == 3)
+				table[i] = phys | attrs | PTE_TYPE_PAGE;
+			else
+				table[i] = phys | attrs;
+
+			virt += map_size;
+			phys += map_size;
+			size -= map_size;
+
+			continue;
 		}
+
+		/* Going one level down */
+		if (pte_type(&table[i]) == PTE_TYPE_FAULT)
+			set_pte_table(&table[i], create_table());
+
+		next_table = (u64 *)(table[i] & GENMASK_ULL(47, PAGE_SHIFT));
+		next_size = min(map_size - (virt & (map_size - 1)), size);
+
+		map_range(virt, phys, next_size, level + 1, next_table, attrs);
+
+		virt += next_size;
+		phys += next_size;
+		size -= next_size;
 	}
 }
 
+static void add_map(struct mm_region *map)
+{
+	u64 attrs = map->attrs | PTE_TYPE_BLOCK | PTE_BLOCK_AF;
+	u64 va_bits;
+	int level = 0;
+
+	get_tcr(NULL, &va_bits);
+	if (va_bits < 39)
+		level = 1;
+
+	map_range(map->virt, map->phys, map->size, level,
+		  (u64 *)gd->arch.tlb_addr, attrs);
+}
+
 enum pte_type {
 	PTE_INVAL,
 	PTE_BLOCK,
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/2] arm64: Reduce PT size estimation complexity
  2023-02-14 13:38 [PATCH 0/2] Reduce the complexity of add_map() and count_required_pts() Ying-Chun Liu (PaulLiu)
  2023-02-14 13:38 ` [PATCH 1/2] arm64: Reduce add_map() complexity Ying-Chun Liu (PaulLiu)
@ 2023-02-14 13:38 ` Ying-Chun Liu (PaulLiu)
  2023-03-07 17:52   ` Tom Rini
  1 sibling, 1 reply; 19+ messages in thread
From: Ying-Chun Liu (PaulLiu) @ 2023-02-14 13:38 UTC (permalink / raw)
  To: u-boot; +Cc: Marc Zyngier, Pierre-Clément Tosi, Ying-Chun Liu, Tom Rini

From: Marc Zyngier <maz@kernel.org>

count_required_pts()'s complexity is high if mappings are not using the
largest possible block size (due to some other requirement such as tracking
dirty pages, for example).

Let's switch to a method that follows the pattern established with
the add_map() helper, and make it almost instantaneous instead of
taking a large amount of time if 2MB mappings are in use instead of
1GB.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Pierre-Clément Tosi <ptosi@google.com>
[ Paul: pick from the Android tree. Fixup Pierre's commit. Rebase to the
  upstream ]
Signed-off-by: Ying-Chun Liu (PaulLiu) <paul.liu@linaro.org>
Cc: Tom Rini <trini@konsulko.com>
Link: https://android.googlesource.com/platform/external/u-boot/+/5d756d147e31a1cdaaa261a50e526404ca5968f5
Link: https://android.googlesource.com/platform/external/u-boot/+/6be9330601d81545c7c941e3609f35bf68a09059
---
 arch/arm/cpu/armv8/cache_v8.c | 109 +++++++++++-----------------------
 1 file changed, 34 insertions(+), 75 deletions(-)

diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
index 876344e1b4..697334086f 100644
--- a/arch/arm/cpu/armv8/cache_v8.c
+++ b/arch/arm/cpu/armv8/cache_v8.c
@@ -352,98 +352,57 @@ static void add_map(struct mm_region *map)
 		  (u64 *)gd->arch.tlb_addr, attrs);
 }
 
-enum pte_type {
-	PTE_INVAL,
-	PTE_BLOCK,
-	PTE_LEVEL,
-};
-
-/*
- * This is a recursively called function to count the number of
- * page tables we need to cover a particular PTE range. If you
- * call this with level = -1 you basically get the full 48 bit
- * coverage.
- */
-static int count_required_pts(u64 addr, int level, u64 maxaddr)
+static void count_range(u64 virt, u64 size, int level, int *cntp)
 {
-	int levelshift = level2shift(level);
-	u64 levelsize = 1ULL << levelshift;
-	u64 levelmask = levelsize - 1;
-	u64 levelend = addr + levelsize;
-	int r = 0;
-	int i;
-	enum pte_type pte_type = PTE_INVAL;
+	u64 map_size = BIT_ULL(level2shift(level));
+	int i, idx;
 
-	for (i = 0; mem_map[i].size || mem_map[i].attrs; i++) {
-		struct mm_region *map = &mem_map[i];
-		u64 start = map->virt;
-		u64 end = start + map->size;
+	idx = (virt >> level2shift(level)) & (MAX_PTE_ENTRIES - 1);
+	for (i = idx; size; i++) {
+		u64 next_size;
 
-		/* Check if the PTE would overlap with the map */
-		if (max(addr, start) <= min(levelend, end)) {
-			start = max(addr, start);
-			end = min(levelend, end);
+		if (level >= 1 &&
+		    size >= map_size && !(virt & (map_size - 1))) {
+			virt += map_size;
+			size -= map_size;
 
-			/* We need a sub-pt for this level */
-			if ((start & levelmask) || (end & levelmask)) {
-				pte_type = PTE_LEVEL;
-				break;
-			}
+			continue;
+		}
 
-			/* Lv0 can not do block PTEs, so do levels here too */
-			if (level <= 0) {
-				pte_type = PTE_LEVEL;
-				break;
-			}
+		/* Going one level down */
+		(*cntp)++;
+		next_size = min(map_size - (virt & (map_size - 1)), size);
 
-			/* PTE is active, but fits into a block */
-			pte_type = PTE_BLOCK;
-		}
-	}
+		count_range(virt, next_size, level + 1, cntp);
 
-	/*
-	 * Block PTEs at this level are already covered by the parent page
-	 * table, so we only need to count sub page tables.
-	 */
-	if (pte_type == PTE_LEVEL) {
-		int sublevel = level + 1;
-		u64 sublevelsize = 1ULL << level2shift(sublevel);
-
-		/* Account for the new sub page table ... */
-		r = 1;
-
-		/* ... and for all child page tables that one might have */
-		for (i = 0; i < MAX_PTE_ENTRIES; i++) {
-			r += count_required_pts(addr, sublevel, maxaddr);
-			addr += sublevelsize;
-
-			if (addr >= maxaddr) {
-				/*
-				 * We reached the end of address space, no need
-				 * to look any further.
-				 */
-				break;
-			}
-		}
+		virt += next_size;
+		size -= next_size;
 	}
-
-	return r;
 }
 
-/* Returns the estimated required size of all page tables */
-__weak u64 get_page_table_size(void)
+static int count_ranges(void)
 {
-	u64 one_pt = MAX_PTE_ENTRIES * sizeof(u64);
-	u64 size = 0;
+	int i, count = 0, level = 0;
 	u64 va_bits;
-	int start_level = 0;
 
 	get_tcr(NULL, &va_bits);
 	if (va_bits < 39)
-		start_level = 1;
+		level = 1;
+
+	for (i = 0; mem_map[i].size || mem_map[i].attrs; i++)
+		count_range(mem_map[i].virt, mem_map[i].size, level, &count);
+
+	return count;
+}
+
+/* Returns the estimated required size of all page tables */
+__weak u64 get_page_table_size(void)
+{
+	u64 one_pt = MAX_PTE_ENTRIES * sizeof(u64);
+	u64 size;
 
 	/* Account for all page tables we would need to cover our memory map */
-	size = one_pt * count_required_pts(0, start_level - 1, 1ULL << va_bits);
+	size = one_pt * count_ranges();
 
 	/*
 	 * We need to duplicate our page table once to have an emergency pt to
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: Reduce add_map() complexity
  2023-02-14 13:38 ` [PATCH 1/2] arm64: Reduce add_map() complexity Ying-Chun Liu (PaulLiu)
@ 2023-03-07 17:52   ` Tom Rini
  2023-08-01  8:53   ` Oliver Graute
  2024-03-08 20:22   ` Fabio Estevam
  2 siblings, 0 replies; 19+ messages in thread
From: Tom Rini @ 2023-03-07 17:52 UTC (permalink / raw)
  To: Ying-Chun Liu (PaulLiu); +Cc: u-boot, Marc Zyngier, Pierre-Clément Tosi

[-- Attachment #1: Type: text/plain, Size: 1151 bytes --]

On Tue, Feb 14, 2023 at 09:38:13PM +0800, Ying-Chun Liu (PaulLiu) wrote:

> From: Marc Zyngier <maz@kernel.org>
> 
> In the add_map() function, for each level it populates, it iterates from
> the root of the PT tree, making it ineficient if a mapping needs to occur
> past level 1.
> 
> Instead, replace it with a recursive (and much simpler) algorithm
> that keeps the complexity as low as possible. With this, mapping
> 512GB at level 2 goes from several seconds down to not measurable
> on an A55 machine.
> 
> We keep the block mappings at level 1 for now though.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Pierre-Clément Tosi <ptosi@google.com>
> [ Paul: pick from the Android tree. Fixup Pierre's commit. Rebase to the
>   upstream ]
> Signed-off-by: Ying-Chun Liu (PaulLiu) <paul.liu@linaro.org>
> Cc: Tom Rini <trini@konsulko.com>
> Link: https://android.googlesource.com/platform/external/u-boot/+/96ad729cf4cab53bdff8222bb3eb256f38b5c3a6
> Link: https://android.googlesource.com/platform/external/u-boot/+/6be9330601d81545c7c941e3609f35bf68a09059

Applied to u-boot/next, thanks!

-- 
Tom

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] arm64: Reduce PT size estimation complexity
  2023-02-14 13:38 ` [PATCH 2/2] arm64: Reduce PT size estimation complexity Ying-Chun Liu (PaulLiu)
@ 2023-03-07 17:52   ` Tom Rini
  0 siblings, 0 replies; 19+ messages in thread
From: Tom Rini @ 2023-03-07 17:52 UTC (permalink / raw)
  To: Ying-Chun Liu (PaulLiu); +Cc: u-boot, Marc Zyngier, Pierre-Clément Tosi

[-- Attachment #1: Type: text/plain, Size: 1094 bytes --]

On Tue, Feb 14, 2023 at 09:38:14PM +0800, Ying-Chun Liu (PaulLiu) wrote:

> From: Marc Zyngier <maz@kernel.org>
> 
> count_required_pts()'s complexity is high if mappings are not using the
> largest possible block size (due to some other requirement such as tracking
> dirty pages, for example).
> 
> Let's switch to a method that follows the pattern established with
> the add_map() helper, and make it almost instantaneous instead of
> taking a large amount of time if 2MB mappings are in use instead of
> 1GB.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Pierre-Clément Tosi <ptosi@google.com>
> [ Paul: pick from the Android tree. Fixup Pierre's commit. Rebase to the
>   upstream ]
> Signed-off-by: Ying-Chun Liu (PaulLiu) <paul.liu@linaro.org>
> Cc: Tom Rini <trini@konsulko.com>
> Link: https://android.googlesource.com/platform/external/u-boot/+/5d756d147e31a1cdaaa261a50e526404ca5968f5
> Link: https://android.googlesource.com/platform/external/u-boot/+/6be9330601d81545c7c941e3609f35bf68a09059

Applied to u-boot/next, thanks!

-- 
Tom

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: Reduce add_map() complexity
  2023-02-14 13:38 ` [PATCH 1/2] arm64: Reduce add_map() complexity Ying-Chun Liu (PaulLiu)
  2023-03-07 17:52   ` Tom Rini
@ 2023-08-01  8:53   ` Oliver Graute
  2023-08-01 12:09     ` Oliver Graute
  2023-08-01 17:31     ` Marc Zyngier
  2024-03-08 20:22   ` Fabio Estevam
  2 siblings, 2 replies; 19+ messages in thread
From: Oliver Graute @ 2023-08-01  8:53 UTC (permalink / raw)
  To: Ying-Chun Liu (PaulLiu), Marc Zyngier
  Cc: u-boot, Marc Zyngier, Pierre-Clément Tosi, Tom Rini

On 14/02/23, Ying-Chun Liu (PaulLiu) wrote:
> From: Marc Zyngier <maz@kernel.org>
> 
> In the add_map() function, for each level it populates, it iterates from
> the root of the PT tree, making it ineficient if a mapping needs to occur
> past level 1.
> 
> Instead, replace it with a recursive (and much simpler) algorithm
> that keeps the complexity as low as possible. With this, mapping
> 512GB at level 2 goes from several seconds down to not measurable
> on an A55 machine.
> 
> We keep the block mappings at level 1 for now though.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Pierre-Clément Tosi <ptosi@google.com>
> [ Paul: pick from the Android tree. Fixup Pierre's commit. Rebase to the
>   upstream ]
> Signed-off-by: Ying-Chun Liu (PaulLiu) <paul.liu@linaro.org>
> Cc: Tom Rini <trini@konsulko.com>
> Link: https://android.googlesource.com/platform/external/u-boot/+/96ad729cf4cab53bdff8222bb3eb256f38b5c3a6
> Link: https://android.googlesource.com/platform/external/u-boot/+/6be9330601d81545c7c941e3609f35bf68a09059

Hello Marc,

this patch somehow broke the boot of my imx8qm board. I run into this
issue:

U-Boot 2023.07-rc6-00004-g5527698481 (Aug 01 2023 - 10:10:52 +0200)

Model: Advantech iMX8QM DMSSE20
Board: DMS-SE20A1 8GB
Build: SCFW 549b1e18, SECO-FW c9de51c0, ATF 5782363
Boot:  USB
DRAM:  8 GiB
"Error" handler, esr 0xbf000000
elr: 0000000080020938 lr : 00000000800209c8 (reloc)
elr: 00000000ffecf938 lr : 00000000ffecf9c8
x0 : 0000000000000001 x1 : 0000000060000000
x2 : 0000000010000000 x3 : 0000000000000002
x4 : 0000000040000000 x5 : 0060000000000401
x6 : 0000000000000800 x7 : 00000000fff44a60
x8 : 0068000000000481 x9 : 0000000000000008
x10: 000000000a200023 x11: 0000000000000002
x12: 0000000000000002 x13: 0000000080095a00
x14: 00000000ffffffff x15: 00000000ffecfd2c
x16: 000000008005454c x17: 0000000000000000
x18: 00000000fd6afd50 x19: 000000000fe00000
x20: 0000000000000000 x21: 0060000000000401
x22: 0000000060200000 x23: 0000000000200000
x24: 0000000040000808 x25: 00000000001fffff
x26: 0000000000000003 x27: 0000000060200000
x28: 0000000000000002 x29: 00000000fd6ab110

Code: a94573fb a8c67bfd d65f03c0 b9418a40 (8a160334)
Resetting CPU ...

resetting ...
SCI reboot request............................................................................................................................................................................................................................

After some bisecting this patch poped up: 

41e2787f5ec4249cb2e77a3ebd3c49035e3c6535 is the first bad commit
arm64: Reduce add_map() complexity

After I reverted everything on top of this patch its booting again with v2023.07

commit c1da6fdb5c239b432440721772d993e63cfdeb20
armv8: enable HAFDBS for other ELx when FEAT_HAFDBS is present

commit 836b8d4b205d2175b57cb9ef271e638b0c116e89
arm64: Use level-2 for largest block mappings when FEAT_HAFDBS is present

commit 6cdf6b7a340db4ddd008516181de7e08e3f8c213
arm64: Use FEAT_HAFDBS to track dirty pages when available


Do you have any idea whats going on here?

Best Regards,

Oliver

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: Reduce add_map() complexity
  2023-08-01  8:53   ` Oliver Graute
@ 2023-08-01 12:09     ` Oliver Graute
  2023-08-01 17:31     ` Marc Zyngier
  1 sibling, 0 replies; 19+ messages in thread
From: Oliver Graute @ 2023-08-01 12:09 UTC (permalink / raw)
  To: u-boot@lists.denx.de Marc Zyngier, Ying-Chun Liu (PaulLiu),
	jason.hui.liu, peng.fan
  Cc: Pierre-Clément Tosi, Tom Rini, oliver.graute

On 01/08/23, Oliver Graute wrote:
> On 14/02/23, Ying-Chun Liu (PaulLiu) wrote:
> > From: Marc Zyngier <maz@kernel.org>
> > 
> > In the add_map() function, for each level it populates, it iterates from
> > the root of the PT tree, making it ineficient if a mapping needs to occur
> > past level 1.
> > 
> > Instead, replace it with a recursive (and much simpler) algorithm
> > that keeps the complexity as low as possible. With this, mapping
> > 512GB at level 2 goes from several seconds down to not measurable
> > on an A55 machine.
> > 
> > We keep the block mappings at level 1 for now though.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > Signed-off-by: Pierre-Clément Tosi <ptosi@google.com>
> > [ Paul: pick from the Android tree. Fixup Pierre's commit. Rebase to the
> >   upstream ]
> > Signed-off-by: Ying-Chun Liu (PaulLiu) <paul.liu@linaro.org>
> > Cc: Tom Rini <trini@konsulko.com>
> > Link: https://android.googlesource.com/platform/external/u-boot/+/96ad729cf4cab53bdff8222bb3eb256f38b5c3a6
> > Link: https://android.googlesource.com/platform/external/u-boot/+/6be9330601d81545c7c941e3609f35bf68a09059
> 
> Hello Marc,
> 
> this patch somehow broke the boot of my imx8qm board. I run into this
> issue:
> 
> U-Boot 2023.07-rc6-00004-g5527698481 (Aug 01 2023 - 10:10:52 +0200)
> 
> Model: Advantech iMX8QM DMSSE20
> Board: DMS-SE20A1 8GB
> Build: SCFW 549b1e18, SECO-FW c9de51c0, ATF 5782363
> Boot:  USB
> DRAM:  8 GiB
> "Error" handler, esr 0xbf000000
> elr: 0000000080020938 lr : 00000000800209c8 (reloc)
> elr: 00000000ffecf938 lr : 00000000ffecf9c8
> x0 : 0000000000000001 x1 : 0000000060000000
> x2 : 0000000010000000 x3 : 0000000000000002
> x4 : 0000000040000000 x5 : 0060000000000401
> x6 : 0000000000000800 x7 : 00000000fff44a60
> x8 : 0068000000000481 x9 : 0000000000000008
> x10: 000000000a200023 x11: 0000000000000002
> x12: 0000000000000002 x13: 0000000080095a00
> x14: 00000000ffffffff x15: 00000000ffecfd2c
> x16: 000000008005454c x17: 0000000000000000
> x18: 00000000fd6afd50 x19: 000000000fe00000
> x20: 0000000000000000 x21: 0060000000000401
> x22: 0000000060200000 x23: 0000000000200000
> x24: 0000000040000808 x25: 00000000001fffff
> x26: 0000000000000003 x27: 0000000060200000
> x28: 0000000000000002 x29: 00000000fd6ab110
> 
> Code: a94573fb a8c67bfd d65f03c0 b9418a40 (8a160334)
> Resetting CPU ...
> 
> resetting ...
> SCI reboot request............................................................................................................................................................................................................................
> 
> After some bisecting this patch poped up: 
> 
> 41e2787f5ec4249cb2e77a3ebd3c49035e3c6535 is the first bad commit
> arm64: Reduce add_map() complexity
> 
> After I reverted everything on top of this patch its booting again with v2023.07
> 
> commit c1da6fdb5c239b432440721772d993e63cfdeb20
> armv8: enable HAFDBS for other ELx when FEAT_HAFDBS is present
> 
> commit 836b8d4b205d2175b57cb9ef271e638b0c116e89
> arm64: Use level-2 for largest block mappings when FEAT_HAFDBS is present
> 
> commit 6cdf6b7a340db4ddd008516181de7e08e3f8c213
> arm64: Use FEAT_HAFDBS to track dirty pages when available
> 
> 
> Do you have any idea whats going on here?

Is this behavior somehow releated to the known Cache coherency issue on
A53 Core on NXP imx8qm?

https://lore.kernel.org/linux-arm-kernel/ZDflS%2FCnEx8iCspk@FVFF77S0Q05N/T/#mf733406e618244b0b21fd25077febd69b31b686e

+Jason Liu
+Peng Fan

Best Regards,

Oliver

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: Reduce add_map() complexity
  2023-08-01  8:53   ` Oliver Graute
  2023-08-01 12:09     ` Oliver Graute
@ 2023-08-01 17:31     ` Marc Zyngier
  1 sibling, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2023-08-01 17:31 UTC (permalink / raw)
  To: Ying-Chun Liu (PaulLiu),
	Marc Zyngier, u-boot, Pierre-Clément Tosi, Tom Rini

On Tue, 01 Aug 2023 09:53:52 +0100,
Oliver Graute <oliver.graute@gmail.com> wrote:
> 
> On 14/02/23, Ying-Chun Liu (PaulLiu) wrote:
> > From: Marc Zyngier <maz@kernel.org>
> > 
> > In the add_map() function, for each level it populates, it iterates from
> > the root of the PT tree, making it ineficient if a mapping needs to occur
> > past level 1.
> > 
> > Instead, replace it with a recursive (and much simpler) algorithm
> > that keeps the complexity as low as possible. With this, mapping
> > 512GB at level 2 goes from several seconds down to not measurable
> > on an A55 machine.
> > 
> > We keep the block mappings at level 1 for now though.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > Signed-off-by: Pierre-Clément Tosi <ptosi@google.com>
> > [ Paul: pick from the Android tree. Fixup Pierre's commit. Rebase to the
> >   upstream ]
> > Signed-off-by: Ying-Chun Liu (PaulLiu) <paul.liu@linaro.org>
> > Cc: Tom Rini <trini@konsulko.com>
> > Link: https://android.googlesource.com/platform/external/u-boot/+/96ad729cf4cab53bdff8222bb3eb256f38b5c3a6
> > Link: https://android.googlesource.com/platform/external/u-boot/+/6be9330601d81545c7c941e3609f35bf68a09059
> 
> Hello Marc,
> 
> this patch somehow broke the boot of my imx8qm board. I run into this
> issue:
> 
> U-Boot 2023.07-rc6-00004-g5527698481 (Aug 01 2023 - 10:10:52 +0200)
> 
> Model: Advantech iMX8QM DMSSE20
> Board: DMS-SE20A1 8GB
> Build: SCFW 549b1e18, SECO-FW c9de51c0, ATF 5782363
> Boot:  USB
> DRAM:  8 GiB
> "Error" handler, esr 0xbf000000

SError.

> elr: 0000000080020938 lr : 00000000800209c8 (reloc)
> elr: 00000000ffecf938 lr : 00000000ffecf9c8
> x0 : 0000000000000001 x1 : 0000000060000000
> x2 : 0000000010000000 x3 : 0000000000000002
> x4 : 0000000040000000 x5 : 0060000000000401
> x6 : 0000000000000800 x7 : 00000000fff44a60
> x8 : 0068000000000481 x9 : 0000000000000008
> x10: 000000000a200023 x11: 0000000000000002
> x12: 0000000000000002 x13: 0000000080095a00
> x14: 00000000ffffffff x15: 00000000ffecfd2c
> x16: 000000008005454c x17: 0000000000000000
> x18: 00000000fd6afd50 x19: 000000000fe00000
> x20: 0000000000000000 x21: 0060000000000401
> x22: 0000000060200000 x23: 0000000000200000
> x24: 0000000040000808 x25: 00000000001fffff
> x26: 0000000000000003 x27: 0000000060200000
> x28: 0000000000000002 x29: 00000000fd6ab110
> 
> Code: a94573fb a8c67bfd d65f03c0 b9418a40 (8a160334)
> Resetting CPU ...
> 
> resetting ...
> SCI reboot request............................................................................................................................................................................................................................
> 
> After some bisecting this patch poped up: 
> 
> 41e2787f5ec4249cb2e77a3ebd3c49035e3c6535 is the first bad commit
> arm64: Reduce add_map() complexity
> 
> After I reverted everything on top of this patch its booting again with v2023.07
> 
> commit c1da6fdb5c239b432440721772d993e63cfdeb20
> armv8: enable HAFDBS for other ELx when FEAT_HAFDBS is present
> 
> commit 836b8d4b205d2175b57cb9ef271e638b0c116e89
> arm64: Use level-2 for largest block mappings when FEAT_HAFDBS is present
> 
> commit 6cdf6b7a340db4ddd008516181de7e08e3f8c213
> arm64: Use FEAT_HAFDBS to track dirty pages when available
> 
> 
> Do you have any idea whats going on here?

Not really. I can only tell you that your HW has generated a SError,
which is usually pretty fatal. There could be all sort of reasons why
this happen, such as mapping a piece of address space that you are not
supposed to access.

If the previous incarnation of the PT code was behaving in a better
way, the way to debug this is to dump both sets of page tables and
compare what they actually map, figuring out where things go wrong.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: Reduce add_map() complexity
  2023-02-14 13:38 ` [PATCH 1/2] arm64: Reduce add_map() complexity Ying-Chun Liu (PaulLiu)
  2023-03-07 17:52   ` Tom Rini
  2023-08-01  8:53   ` Oliver Graute
@ 2024-03-08 20:22   ` Fabio Estevam
  2024-03-09  9:52     ` Marc Zyngier
  2 siblings, 1 reply; 19+ messages in thread
From: Fabio Estevam @ 2024-03-08 20:22 UTC (permalink / raw)
  To: Ying-Chun Liu (PaulLiu)
  Cc: u-boot, Marc Zyngier, Pierre-Clément Tosi, Tom Rini,
	Marcel Ziswiler, Francesco Dolcini

Hi Paul and Tom,

On Tue, Feb 14, 2023 at 10:38 AM Ying-Chun Liu (PaulLiu)
<paul.liu@linaro.org> wrote:
>
> From: Marc Zyngier <maz@kernel.org>
>
> In the add_map() function, for each level it populates, it iterates from
> the root of the PT tree, making it ineficient if a mapping needs to occur
> past level 1.
>
> Instead, replace it with a recursive (and much simpler) algorithm
> that keeps the complexity as low as possible. With this, mapping
> 512GB at level 2 goes from several seconds down to not measurable
> on an A55 machine.
>
> We keep the block mappings at level 1 for now though.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Pierre-Clément Tosi <ptosi@google.com>
> [ Paul: pick from the Android tree. Fixup Pierre's commit. Rebase to the
>   upstream ]
> Signed-off-by: Ying-Chun Liu (PaulLiu) <paul.liu@linaro.org>
> Cc: Tom Rini <trini@konsulko.com>
> Link: https://android.googlesource.com/platform/external/u-boot/+/96ad729cf4cab53bdff8222bb3eb256f38b5c3a6
> Link: https://android.googlesource.com/platform/external/u-boot/+/6be9330601d81545c7c941e3609f35bf68a09059

I know this is an old thread, but this commit causes the following
boot regression on a colibri-imx8qxp board:

U-Boot 2024.04-rc3-00070-gecc9298a893b (Mar 08 2024 - 17:15:31 -0300)

CPU:   NXP i.MX8QXP RevC A35 at 1200 MHz at 51C

DRAM:  2 GiB
"Error" handler, esr 0xbf000000
elr: 0000000080020914 lr : 00000000800209c0 (reloc)
elr: 00000000ffec4914 lr : 00000000ffec49c0
x0 : 0060000070800401 x1 : 0000000070000000
x2 : 0000000010000000 x3 : 0000000000000002
x4 : 0000000040000000 x5 : 0060000000000401
x6 : 0000000000000c00 x7 : 00000000fff45140
x8 : 0060000000000000 x9 : 00000000fff45100
x10: 000000000a200023 x11: 0000000000000002
x12: 0000000000000002 x13: 00000000800a10e8
x14: 00000000ffffffff x15: 00000000ffec4cb8
x16: 0000000080056a88 x17: 0000000000000000
x18: 00000000fd6c1d70 x19: 000000000f600000
x20: 0000000000000000 x21: 0060000000000401
x22: 0000000070a00000 x23: 0000000000200000
x24: 0000000040000c28 x25: 00000000001fffff
x26: 0000000000000003 x27: 0000000070a00000
x28: 0000000000000002 x29: 00000000fd6bbfd0

Code: 1100047a a90573fb aa0103fb 2a0303fc (b5000113)
Resetting CPU ...

resetting ...

Reverting this commit on top of master fixes the boot regression.

Any ideas?

Thanks

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: Reduce add_map() complexity
  2024-03-08 20:22   ` Fabio Estevam
@ 2024-03-09  9:52     ` Marc Zyngier
  2024-03-09 12:29       ` Fabio Estevam
  0 siblings, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2024-03-09  9:52 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Ying-Chun Liu (PaulLiu),
	u-boot, Pierre-Clément Tosi, Tom Rini, Marcel Ziswiler,
	Francesco Dolcini

On Fri, 08 Mar 2024 20:22:53 +0000,
Fabio Estevam <festevam@gmail.com> wrote:
> 
> Hi Paul and Tom,
> 
> On Tue, Feb 14, 2023 at 10:38 AM Ying-Chun Liu (PaulLiu)
> <paul.liu@linaro.org> wrote:
> >
> > From: Marc Zyngier <maz@kernel.org>
> >
> > In the add_map() function, for each level it populates, it iterates from
> > the root of the PT tree, making it ineficient if a mapping needs to occur
> > past level 1.
> >
> > Instead, replace it with a recursive (and much simpler) algorithm
> > that keeps the complexity as low as possible. With this, mapping
> > 512GB at level 2 goes from several seconds down to not measurable
> > on an A55 machine.
> >
> > We keep the block mappings at level 1 for now though.
> >
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > Signed-off-by: Pierre-Clément Tosi <ptosi@google.com>
> > [ Paul: pick from the Android tree. Fixup Pierre's commit. Rebase to the
> >   upstream ]
> > Signed-off-by: Ying-Chun Liu (PaulLiu) <paul.liu@linaro.org>
> > Cc: Tom Rini <trini@konsulko.com>
> > Link: https://android.googlesource.com/platform/external/u-boot/+/96ad729cf4cab53bdff8222bb3eb256f38b5c3a6
> > Link: https://android.googlesource.com/platform/external/u-boot/+/6be9330601d81545c7c941e3609f35bf68a09059
> 
> I know this is an old thread, but this commit causes the following
> boot regression on a colibri-imx8qxp board:
> 
> U-Boot 2024.04-rc3-00070-gecc9298a893b (Mar 08 2024 - 17:15:31 -0300)
> 
> CPU:   NXP i.MX8QXP RevC A35 at 1200 MHz at 51C
> 
> DRAM:  2 GiB
> "Error" handler, esr 0xbf000000

SError. Not good.

> elr: 0000000080020914 lr : 00000000800209c0 (reloc)
> elr: 00000000ffec4914 lr : 00000000ffec49c0
> x0 : 0060000070800401 x1 : 0000000070000000
> x2 : 0000000010000000 x3 : 0000000000000002
> x4 : 0000000040000000 x5 : 0060000000000401
> x6 : 0000000000000c00 x7 : 00000000fff45140
> x8 : 0060000000000000 x9 : 00000000fff45100
> x10: 000000000a200023 x11: 0000000000000002
> x12: 0000000000000002 x13: 00000000800a10e8
> x14: 00000000ffffffff x15: 00000000ffec4cb8
> x16: 0000000080056a88 x17: 0000000000000000
> x18: 00000000fd6c1d70 x19: 000000000f600000
> x20: 0000000000000000 x21: 0060000000000401
> x22: 0000000070a00000 x23: 0000000000200000
> x24: 0000000040000c28 x25: 00000000001fffff
> x26: 0000000000000003 x27: 0000000070a00000
> x28: 0000000000000002 x29: 00000000fd6bbfd0
> 
> Code: 1100047a a90573fb aa0103fb 2a0303fc (b5000113)
> Resetting CPU ...
> 
> resetting ...
> 
> Reverting this commit on top of master fixes the boot regression.
> 
> Any ideas?

It would be good to narrow down which access is generating this. It is
an asynchronous error, so the code above won't help.

Alternatively, and if you are sure that this is due to this change,
dumping the page tables and comparing them before and after would
help.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: Reduce add_map() complexity
  2024-03-09  9:52     ` Marc Zyngier
@ 2024-03-09 12:29       ` Fabio Estevam
  2024-03-09 12:39         ` Marc Zyngier
  0 siblings, 1 reply; 19+ messages in thread
From: Fabio Estevam @ 2024-03-09 12:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Ying-Chun Liu (PaulLiu),
	u-boot, Pierre-Clément Tosi, Tom Rini, Marcel Ziswiler,
	Francesco Dolcini

Hi Marc,

On Sat, Mar 9, 2024 at 6:53 AM Marc Zyngier <maz@kernel.org> wrote:

> It would be good to narrow down which access is generating this. It is
> an asynchronous error, so the code above won't help.
>
> Alternatively, and if you are sure that this is due to this change,
> dumping the page tables and comparing them before and after would
> help.

Yes, I am sure the error is due to this change. It is 100% reproducible.

I am not familiar with this part of the code, so I would appreciate it
if you could
tell me how to dump the page tables so I can compare them before and after.

Thanks,

Fabio Estevam

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: Reduce add_map() complexity
  2024-03-09 12:29       ` Fabio Estevam
@ 2024-03-09 12:39         ` Marc Zyngier
  2024-03-09 14:36           ` Fabio Estevam
  0 siblings, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2024-03-09 12:39 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Ying-Chun Liu (PaulLiu),
	u-boot, Pierre-Clément Tosi, Tom Rini, Marcel Ziswiler,
	Francesco Dolcini

On Sat, 09 Mar 2024 12:29:10 +0000,
Fabio Estevam <festevam@gmail.com> wrote:
> 
> Hi Marc,
> 
> On Sat, Mar 9, 2024 at 6:53 AM Marc Zyngier <maz@kernel.org> wrote:
> 
> > It would be good to narrow down which access is generating this. It is
> > an asynchronous error, so the code above won't help.
> >
> > Alternatively, and if you are sure that this is due to this change,
> > dumping the page tables and comparing them before and after would
> > help.
> 
> Yes, I am sure the error is due to this change. It is 100% reproducible.

Can you figure out what memory access is triggering it? Even at
narrowing it to the subsystem level would be a good indication.

> I am not familiar with this part of the code, so I would appreciate
> it if you could tell me how to dump the page tables so I can compare
> them before and after.

You could just dump the entries as they are written. The order may not
be the same, but for a given VA you should observe the same entries
being written.

My hunch is that the new code is a lot more picky about the alignment
of things, and that could result in something being similarly
unaligned. But without access to the platform nor an idea of what gets
mapped, it's a bit hard to have a clue.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: Reduce add_map() complexity
  2024-03-09 12:39         ` Marc Zyngier
@ 2024-03-09 14:36           ` Fabio Estevam
  2024-03-15 11:56             ` Fabio Estevam
  0 siblings, 1 reply; 19+ messages in thread
From: Fabio Estevam @ 2024-03-09 14:36 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Ying-Chun Liu (PaulLiu),
	u-boot, Pierre-Clément Tosi, Tom Rini, Marcel Ziswiler,
	Francesco Dolcini

On Sat, Mar 9, 2024 at 9:39 AM Marc Zyngier <maz@kernel.org> wrote:

> Can you figure out what memory access is triggering it? Even at
> narrowing it to the subsystem level would be a good indication.

The problem happens so early that I am not able to narrow it down at
subsystem level.

> You could just dump the entries as they are written. The order may not
> be the same, but for a given VA you should observe the same entries
> being written.

Does the log below help?

https://pastebin.com/raw/1i1VBA0a

If not, please send me a debug patch and I will be glad to run it here.

Thanks

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: Reduce add_map() complexity
  2024-03-09 14:36           ` Fabio Estevam
@ 2024-03-15 11:56             ` Fabio Estevam
  2024-03-15 15:13               ` Pierre-Clément Tosi
  0 siblings, 1 reply; 19+ messages in thread
From: Fabio Estevam @ 2024-03-15 11:56 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Ying-Chun Liu (PaulLiu),
	u-boot, Pierre-Clément Tosi, Tom Rini, Marcel Ziswiler,
	Francesco Dolcini

Hi Marc,

On Sat, Mar 9, 2024 at 11:36 AM Fabio Estevam <festevam@gmail.com> wrote:

> Does the log below help?
>
> https://pastebin.com/raw/1i1VBA0a
>
> If not, please send me a debug patch and I will be glad to run it here.

I'm sorry to bother you, but have you had a chance to look at the log
I shared with you?

That's the only issue preventing colibri-imx8x from booting mainline U-Boot.

We would like to get this fixed for U-Boot 2024.04.

Thanks for your help

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: Reduce add_map() complexity
  2024-03-15 11:56             ` Fabio Estevam
@ 2024-03-15 15:13               ` Pierre-Clément Tosi
  2024-03-18 13:31                 ` Fabio Estevam
  0 siblings, 1 reply; 19+ messages in thread
From: Pierre-Clément Tosi @ 2024-03-15 15:13 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Marc Zyngier, Ying-Chun Liu (PaulLiu),
	u-boot, Tom Rini, Marcel Ziswiler, Francesco Dolcini

Hi Fabio,

On Fri, Mar 15, 2024 at 08:56:17AM -0300, Fabio Estevam wrote:
> Hi Marc,
> 
> On Sat, Mar 9, 2024 at 11:36 AM Fabio Estevam <festevam@gmail.com> wrote:
> 
> > Does the log below help?
> >
> > https://pastebin.com/raw/1i1VBA0a
> >
> > If not, please send me a debug patch and I will be glad to run it here.
> 
> I'm sorry to bother you, but have you had a chance to look at the log
> I shared with you?

I had a quick look through your logs and notice that U-Boot master attempts to
map addresses in the high VA range e.g.

  Checking if pte fits for virt=ffffffffe4000000 [...]

while the logs that boot successfully only use the low VA range e.g.

  Checking if pte fits for virt=80193000 [...]

Unless that has recently changed (since I last worked with U-Boot), U-Boot on
AArch64 only supports identity mappings and therefore was only taught how to
program TTBR0_ELx (i.e. is only able to map low virtual addresses). This means
that the code - with or without 41e2787f5ec4 - would be unable to map addresses
such as 0xffffffffe4000000.

Now, given that 41e2787f5ec4 only affects implementation details of add_map(),
I am surprised that reverting that commit changes the arguments received by the
function such as virt. As a reminder, add_map() is exclusively used on mem_map:

  for (i = 0; mem_map[i].size || mem_map[i].attrs; i++)
          add_map(&mem_map[i]);

> 
> That's the only issue preventing colibri-imx8x from booting mainline U-Boot.

If I read the U-Boot configs right i.e.

 - configs/colibri-imx8x_defconfig: CONFIG_ARCH_IMX8=y
 - arch/arm/mach-imx/imx8/Makefile: obj-y += cpu.o
 - arch/arm/mach-imx/imx8/cpu.c: struct mm_region *mem_map = imx8_mem_map;

There is a possibility that your mem_map is getting modified by MACH-specific
code. In particular, enable_caches() seems to dynamically get the MMU mappings
from some RPC mechanism (see get_owned_memreg() and sc_rm_get_memreg_info()).

Could it be that whatever services those requests might be returning unexpected
values? Instrumenting arch/arm/mach-imx/imx8/cpu.c and dumping mem_map (and
the RPC messages) with/without the patch would help clear this up.

HTH,

> 
> We would like to get this fixed for U-Boot 2024.04.
> 
> Thanks for your help

-- 
Pierre

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: Reduce add_map() complexity
  2024-03-15 15:13               ` Pierre-Clément Tosi
@ 2024-03-18 13:31                 ` Fabio Estevam
  2024-03-18 13:50                   ` Fabio Estevam
  2024-03-18 19:59                   ` Pierre-Clément Tosi
  0 siblings, 2 replies; 19+ messages in thread
From: Fabio Estevam @ 2024-03-18 13:31 UTC (permalink / raw)
  To: Pierre-Clément Tosi
  Cc: Marc Zyngier, Ying-Chun Liu (PaulLiu),
	u-boot, Tom Rini, Marcel Ziswiler, Francesco Dolcini

Hi Pierre,

On Fri, Mar 15, 2024 at 12:13 PM Pierre-Clément Tosi <ptosi@google.com> wrote:

> I had a quick look through your logs and notice that U-Boot master attempts to
> map addresses in the high VA range e.g.
>
>   Checking if pte fits for virt=ffffffffe4000000 [...]
>
> while the logs that boot successfully only use the low VA range e.g.
>
>   Checking if pte fits for virt=80193000 [...]
>
> Unless that has recently changed (since I last worked with U-Boot), U-Boot on
> AArch64 only supports identity mappings and therefore was only taught how to
> program TTBR0_ELx (i.e. is only able to map low virtual addresses). This means
> that the code - with or without 41e2787f5ec4 - would be unable to map addresses
> such as 0xffffffffe4000000.

Yes, I found it strange too. I may have done something wrong the last
time I instrumented the code.

I tried it again and no longer see these unexpected high virtual addresses.

Please find the new logs here:

https://pastebin.com/raw/qF3GbJry

> Now, given that 41e2787f5ec4 only affects implementation details of add_map(),
> I am surprised that reverting that commit changes the arguments received by the
> function such as virt. As a reminder, add_map() is exclusively used on mem_map:
>
>   for (i = 0; mem_map[i].size || mem_map[i].attrs; i++)
>           add_map(&mem_map[i]);
>
> >
> > That's the only issue preventing colibri-imx8x from booting mainline U-Boot.
>
> If I read the U-Boot configs right i.e.
>
>  - configs/colibri-imx8x_defconfig: CONFIG_ARCH_IMX8=y
>  - arch/arm/mach-imx/imx8/Makefile: obj-y += cpu.o
>  - arch/arm/mach-imx/imx8/cpu.c: struct mm_region *mem_map = imx8_mem_map;

Correct, these are the relevant files for the i.MXQ8XP.

> There is a possibility that your mem_map is getting modified by MACH-specific
> code. In particular, enable_caches() seems to dynamically get the MMU mappings
> from some RPC mechanism (see get_owned_memreg() and sc_rm_get_memreg_info()).
>
> Could it be that whatever services those requests might be returning unexpected
> values? Instrumenting arch/arm/mach-imx/imx8/cpu.c and dumping mem_map (and
> the RPC messages) with/without the patch would help clear this up.

I tried dumping the page table entries, but could not notice anything
that looked suspicious.

Please let me know if you have any suggestions.

Thanks

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: Reduce add_map() complexity
  2024-03-18 13:31                 ` Fabio Estevam
@ 2024-03-18 13:50                   ` Fabio Estevam
  2024-03-18 19:59                   ` Pierre-Clément Tosi
  1 sibling, 0 replies; 19+ messages in thread
From: Fabio Estevam @ 2024-03-18 13:50 UTC (permalink / raw)
  To: Pierre-Clément Tosi
  Cc: Marc Zyngier, Ying-Chun Liu (PaulLiu),
	u-boot, Tom Rini, Marcel Ziswiler, Francesco Dolcini

On Mon, Mar 18, 2024 at 10:31 AM Fabio Estevam <festevam@gmail.com> wrote:

> I tried dumping the page table entries, but could not notice anything
> that looked suspicious.

This looks suspicious:

With 41e2787f5ec4 reverted:

Checking if pte fits for virt=1c000000 size=64000000 blocksize=40000000
Checking if pte fits for virt=1c000000 size=64000000 blocksize=200000
Checking if pte fits for virt=1c200000 size=63e00000 blocksize=40000000
Checking if pte fits for virt=1c200000 size=63e00000 blocksize=200000
Checking if pte fits for virt=1c400000 size=63c00000 blocksize=40000000
....

In U-Boot master:

Checking if pte fits for virt=1c000000 size=64000000 blocksize=40000000
Checking if pte fits for virt=1c000000 size=24000000 blocksize=200000
Checking if pte fits for virt=1c200000 size=23e00000 blocksize=200000
Checking if pte fits for virt=1c400000 size=23c00000 blocksize=200000

The second entry has size=24000000 instead of size=64000000.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: Reduce add_map() complexity
  2024-03-18 13:31                 ` Fabio Estevam
  2024-03-18 13:50                   ` Fabio Estevam
@ 2024-03-18 19:59                   ` Pierre-Clément Tosi
  2024-03-18 20:09                     ` Fabio Estevam
  1 sibling, 1 reply; 19+ messages in thread
From: Pierre-Clément Tosi @ 2024-03-18 19:59 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Marc Zyngier, Ying-Chun Liu (PaulLiu),
	u-boot, Tom Rini, Marcel Ziswiler, Francesco Dolcini

Hi Fabio,

On Mon, Mar 18, 2024 at 10:31:25AM -0300, Fabio Estevam wrote:
> Please find the new logs here:
> 
> https://pastebin.com/raw/qF3GbJry

I notice that the mem_map in these logs have overlapping ranges, which results
in unnecessary work when creating the PTs. For this reason, it would make sense
to prune it in arch/arm/mach-imx/imx8/cpu.c before calling dcache_enable(), IMO.
On this point, you also have contiguous ranges with identical attributes
(mem_map[0] and mem_map[6]), which could be merged into a single entry. This
could result in more efficient MMU mappings if the mem_map entries can share a
block mapping. Also note that mem_map[4].size=0 so could be dropped.

In any case, if we assume that overlapping mem_map entries are a valid input to
the arch code (maybe not as it leads to potentially ambiguous behavior?), then
41e2787f5ec4 had removed support for splitting existing block mappings.

In your case, my assumption is that the function was then treating block
mappings in the range 0x1c000000-0x80000000 (which get mapped, at least
partially, by mem_map[0], mem_map[1], then mem_map[2]) as table mappings and was
issuing read/write instructions in that range (accessing them as PTEs). As those
seem to be device memory (I see they get mapped as MT_DEVICE_NGNRNE), these
accesses might explain the SError you were experiencing.

Would you mind testing [1] and giving it "Tested-by:" if it addresses the issue?

Thanks,

[1]: https://lore.kernel.org/u-boot/43haokus4jdxguk4arig5tsqcgq2wbezwpbj7oti6pdkvrfual@wa7vz2iypcv5/

-- 
Pierre

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] arm64: Reduce add_map() complexity
  2024-03-18 19:59                   ` Pierre-Clément Tosi
@ 2024-03-18 20:09                     ` Fabio Estevam
  0 siblings, 0 replies; 19+ messages in thread
From: Fabio Estevam @ 2024-03-18 20:09 UTC (permalink / raw)
  To: Pierre-Clément Tosi
  Cc: Marc Zyngier, Ying-Chun Liu (PaulLiu),
	u-boot, Tom Rini, Marcel Ziswiler, Francesco Dolcini

Hi Pierre,

On Mon, Mar 18, 2024 at 4:59 PM Pierre-Clément Tosi <ptosi@google.com> wrote:

> I notice that the mem_map in these logs have overlapping ranges, which results
> in unnecessary work when creating the PTs. For this reason, it would make sense
> to prune it in arch/arm/mach-imx/imx8/cpu.c before calling dcache_enable(), IMO.
> On this point, you also have contiguous ranges with identical attributes
> (mem_map[0] and mem_map[6]), which could be merged into a single entry. This
> could result in more efficient MMU mappings if the mem_map entries can share a
> block mapping. Also note that mem_map[4].size=0 so could be dropped.
>
> In any case, if we assume that overlapping mem_map entries are a valid input to
> the arch code (maybe not as it leads to potentially ambiguous behavior?), then
> 41e2787f5ec4 had removed support for splitting existing block mappings.
>
> In your case, my assumption is that the function was then treating block
> mappings in the range 0x1c000000-0x80000000 (which get mapped, at least
> partially, by mem_map[0], mem_map[1], then mem_map[2]) as table mappings and was
> issuing read/write instructions in that range (accessing them as PTEs). As those
> seem to be device memory (I see they get mapped as MT_DEVICE_NGNRNE), these
> accesses might explain the SError you were experiencing.
>
> Would you mind testing [1] and giving it "Tested-by:" if it addresses the issue?

Your patch fixed the boot regression. Thanks for your fix, appreciated it!

I have replied with my Tested-by tag.

Cheers,

Fabio Estevam

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-03-18 20:09 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-14 13:38 [PATCH 0/2] Reduce the complexity of add_map() and count_required_pts() Ying-Chun Liu (PaulLiu)
2023-02-14 13:38 ` [PATCH 1/2] arm64: Reduce add_map() complexity Ying-Chun Liu (PaulLiu)
2023-03-07 17:52   ` Tom Rini
2023-08-01  8:53   ` Oliver Graute
2023-08-01 12:09     ` Oliver Graute
2023-08-01 17:31     ` Marc Zyngier
2024-03-08 20:22   ` Fabio Estevam
2024-03-09  9:52     ` Marc Zyngier
2024-03-09 12:29       ` Fabio Estevam
2024-03-09 12:39         ` Marc Zyngier
2024-03-09 14:36           ` Fabio Estevam
2024-03-15 11:56             ` Fabio Estevam
2024-03-15 15:13               ` Pierre-Clément Tosi
2024-03-18 13:31                 ` Fabio Estevam
2024-03-18 13:50                   ` Fabio Estevam
2024-03-18 19:59                   ` Pierre-Clément Tosi
2024-03-18 20:09                     ` Fabio Estevam
2023-02-14 13:38 ` [PATCH 2/2] arm64: Reduce PT size estimation complexity Ying-Chun Liu (PaulLiu)
2023-03-07 17:52   ` Tom Rini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.