On Thu, Aug 05, 2021 at 11:52:05PM +0200, Marek Vasut wrote:
> On 8/2/21 4:44 PM, Tom Rini wrote:
> > On Mon, Aug 02, 2021 at 04:34:29PM +0200, Jan Kiszka wrote:
> > > On 02.08.21 16:27, Tom Rini wrote:
> > > > On Mon, Aug 02, 2021 at 04:03:01PM +0200, Jan Kiszka wrote:
> > > > > On 02.08.21 15:04, Tom Rini wrote:
> > > > > > On Mon, Aug 02, 2021 at 01:54:57PM +0200, Jan Kiszka wrote:
> > > > > > > On 02.08.21 13:38, Marek Vasut wrote:
> > > > > > > > On 8/2/21 1:36 PM, Jan Kiszka wrote:
> > > > > > > > > On 02.08.21 12:48, Marek Vasut wrote:
> > > > > > > > > > On 8/2/21 11:37 AM, Jan Kiszka wrote:
> > > > > > > > > > > On 02.08.21 02:54, Marek Vasut wrote:
> > > > > > > > > > > > On 7/29/21 6:58 PM, Tom Rini wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > [...]
> > > > > > > > > > > >
> > > > > > > > > > > > > > > so when did rcar3 introduce something there that shouldn't be
> > > > > > > > > > > > > > > reserved?  And you had phrased this to me on IRC as about reserving spot
> > > > > > > > > > > > > > > for ATAGS, and that not being needed of course on arm64.  But that's not
> > > > > > > > > > > > > > > what's going on.  Perhaps the answer is that rcar3 needs to introduce a
> > > > > > > > > > > > > > > board_lmb_reserve to free the normal arch one and provide whatever more
> > > > > > > > > > > > > > > narrow scope it needs.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Based on the commit message 2359fa7a878 ("arm: bootm: Disable LMB
> > > > > > > > > > > > > > reservation for command line and board info on arm64") , this is about ATAGS
> > > > > > > > > > > > > > and we really don't need to reserve those on arm64.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Commit 2359fa7a878 disables the entire arch_lmb_reserve function on
> > > > > > > > > > > > > aarch64, yes.  I assumed when we had talked that it was a small area
> > > > > > > > > > > > > being set aside and perhaps mis-recalled that ATAGS tended to live at
> > > > > > > > > > > > > DDR_BASE + 0x800 or so.
> > > > > > > > > > > >
> > > > > > > > > > > > That arch_lmb_reserve() is responsible for reserving architecture
> > > > > > > > > > > > specific memory. On arm32 it is ATAGS, on arm64 it is nothing as far as
> > > > > > > > > > > > I can tell (and see below regarding the TLB).
> > > > > > > > > > > >
> > > > > > > > > > > > > This reservation is not at that spot, and a lot
> > > > > > > > > > > > > more than that.
> > > > > > > > > > > >
> > > > > > > > > > > > Can you please elaborate on this "lot more" part ? Because as much as I
> > > > > > > > > > > > studied the reservation code, the "lot more" was ATAGS on arm32 and
> > > > > > > > > > > > nothing on arm64.
> > > > > > > > > > >
> > > > > > > > > > > See my commit log.
> > > > > > > > > >
> > > > > > > > > > This is not a particularly useful answer, considering the commit log says:
> > > > > > > > > > "lot of crucial things", "Possibly more", "likely also on other boards"
> > > > > > > > > > and other opaque statements. But really, the problem so far happens on
> > > > > > > > > > one K3 board.
> > > > > > > > >
> > > > > > > > > "Such things are the page table (tlb_addr),
> > > > > > > > > relocated U-Boot and the active stack."
> > > > > > > >
> > > > > > > > Please read the rest of my answer, I don't believe the TLB should be
> > > > > > > > reserved at all. DTTO for the stack. If you think otherwise, please
> > > > > > > > explain why.
> > > > > > >
> > > > > > > Marek, I've provided you with three generic examples of active memory
> > > > > > > blocks that are relevant while U-Boot is allocating from and also
> > > > > > > filling that LMB. Please follow those cases and explain to us why they
> > > > > > > aren't active - or at least prove why they are specific to the k3 (for
> > > > > > > which I found no traces).
> > > > > > >
> > > > > > > And stop following the TLB topic for now. That was only my first guess.
> > > > > > > The actual crash I'm seeing on my board comes from plain code
> > > > > > > overwriting. It could have been TLB as well. It could also have been the
> > > > > > > stack. All those become unprotected via your reservation removal.
> > > > > >
> > > > > > Jan, one thing I didn't see before is, are you also using
> > > > > > include/configs/ti_armv7_common.h in the end, like the K3 reference
> > > > > > platforms, and if not are you setting bootm_size in your environment? I
> > > > > > have one more idea on why this fails on your board but not Marek's.
> > > > > > Thanks.
> > > > >
> > > > > We are including that header but we didn't use DEFAULT_LINUX_BOOT_ENV,
> > > > > in fact. That left bootm_size undefined. Can you explain the impact?
> > > >
> > > > I suspect the answer here is that Marek does not see this problem
> > > > because on R-Car bootm_size is set to 0x10000000 and so no relocation of
> > > > the device tree / kernel / initrd happens to overwrite the running
> > > > U-Boot and blow everything up. If you don't revert this, and do set
> > > > bootm_size does everything work? Marek, if you unset bootm_size, do you
> > > > see failure? Thanks!
> > > >
> > >
> > > I currently do not see the error, even with unset bootm_size and Marek's
> > > patch back in. But fdt indeed moves down when adopting those settings.
> > > That makes sense for us anyway, I think our custom env values are rather
> > > for historic reasons, and one had an issue anyway (incorrect kernel
> > > alignment).
> > >
> > > But at least we understand why I was able to see this, sometimes.
> >
> > OK, thanks. Note that I'm not sure how I want to move forward here
> > because a very frequent user/developer problem is "device tree
> > relocated, everything crashed, why? oh, I'll just disable it (and lead
> > to another problem down the line)".
>
> In rcar with bootm_size unset it looks like this:
>
> => bdinfo
> boot_params = 0x000000007beee240
> DRAM bank   = 0x0000000000000000
> -> start    = 0x0000000048000000
> -> size     = 0x0000000038000000
> DRAM bank   = 0x0000000000000001
> -> start    = 0x0000000500000000
> -> size     = 0x0000000040000000
> DRAM bank   = 0x0000000000000002
> -> start    = 0x0000000600000000
> -> size     = 0x0000000040000000
> DRAM bank   = 0x0000000000000003
> -> start    = 0x0000000700000000
> -> size     = 0x0000000040000000
> flashstart  = 0x0000000008000000
> flashsize   = 0x0000000004000000
> flashoffset = 0x00000000000f5890
> baudrate    = 115200 bps
> relocaddr   = 0x000000007fee8000
> reloc off   = 0x000000007fee8000
> Build       = 64-bit
> current eth = ethernet@e6800000
> ...
> fdt_blob    = 0x000000007beda0e0
> new_fdt     = 0x000000007beda0e0
> fdt_size    = 0x000000000000dcc0
> multi_dtb_fit= 0x0000000049000000
> lmb_dump_all:
>  memory.cnt = 0x4
>  memory[0] [0x48000000-0x7fffffff], 0x38000000 bytes flags: 0
>  memory[1] [0x500000000-0x53fffffff], 0x40000000 bytes flags: 0
>  memory[2] [0x600000000-0x63fffffff], 0x40000000 bytes flags: 0
>  memory[3] [0x700000000-0x73fffffff], 0x40000000 bytes flags: 0
>  reserved.cnt = 0x1
>  reserved[0] [0x44100000-0x47efffff], 0x03e00000 bytes flags: 4
> arch_number = 0x0000000000000000
> TLB addr    = 0x000000007fff0000
> irq_sp      = 0x000000007beda0d0
> sp start    = 0x000000007beda0d0
> Early malloc usage: 1318 / 8000
>
> ...
>
> ## Loading kernel from FIT Image at 58000000 ...
>    Using 'conf-1' configuration
>    Trying 'kernel-1' kernel subimage
>      Description:  Linux kernel (Sat Jun 5 00:24:15 CEST 2021)
>      Type:         Kernel Image
>      Compression:  uncompressed
>      Data Start:   0x58000154
>      Data Size:    16662536 Bytes = 15.9 MiB
>      Architecture: AArch64
>      OS:           Linux
>      Load Address: 0x50200000
>      Entry Point:  0x50200000
>      Hash algo:    crc32
>      Hash value:   0655cd1f
>    Verifying Hash Integrity ... crc32+ OK
> ## Loading fdt from FIT Image at 58000000 ...
>    Using 'conf-1' configuration
>    Trying 'fdt-1' fdt subimage
>      Description:  Flattened Device Tree blob (Sat Jun 5 00:24:15 CEST 2021)
>      Type:         Flat Device Tree
>      Compression:  uncompressed
>      Data Start:   0x58fe42a4
>      Data Size:    74686 Bytes = 72.9 KiB
>      Architecture: AArch64
>      Hash algo:    crc32
>      Hash value:   287b2438
>    Verifying Hash Integrity ... crc32+ OK
>    Booting using the fdt blob at 0x58fe42a4
>    Loading Kernel Image
>    Loading Device Tree to 000000007ffea000, end 000000007ffff3bd ... OK

OK, I think we can say it's likely that in your case we're relocating
the start of the device tree just a bit past where U-Boot is running. A
bit of quick math says there's around 1MiB between relocaddr for U-Boot
and the start of the device tree relocation address.

-- 
Tom
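
For anyone who wants to redo the quick math, here is a minimal Python sketch. The four addresses are copied from the bdinfo and bootm output quoted above; the constant and function names are made up for illustration and are not U-Boot identifiers. It also assumes the "end" value printed by "Loading Device Tree to" is the last byte of the blob (inclusive).

```python
# Addresses copied from the quoted bdinfo / bootm output.
RELOCADDR = 0x7FEE8000  # "relocaddr": where U-Boot relocated itself
TLB_ADDR = 0x7FFF0000   # "TLB addr": page table location
FDT_START = 0x7FFEA000  # "Loading Device Tree to ..."
FDT_END = 0x7FFFF3BD    # "... end ..." (assumed inclusive)

MIB = 1024 * 1024


def gap_bytes(lower: int, upper: int) -> int:
    """Distance in bytes from a lower address up to a higher one."""
    return upper - lower


def in_range(addr: int, start: int, end: int) -> bool:
    """True if addr lies within the inclusive range [start, end]."""
    return start <= addr <= end


if __name__ == "__main__":
    gap = gap_bytes(RELOCADDR, FDT_START)
    print(f"relocaddr -> FDT start: {gap:#x} bytes ({gap / MIB:.3f} MiB)")
    inside = in_range(TLB_ADDR, FDT_START, FDT_END)
    print(f"TLB addr inside FDT destination range: {inside}")
```

With the values from this log the gap comes out at 0x102000 bytes, which matches the "around 1MiB" estimate. The second check only compares the printed TLB addr against the printed device tree destination range; it says nothing about whether that memory is still in use at that point.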