On 03/07/2017 04:46 AM, Tobias Klauser wrote:
> On 2017-03-03 at 04:04:41 +0100, Guenter Roeck wrote:
>> On 03/02/2017 08:38 AM, Tobias Klauser wrote:
>>> On 2017-03-01 at 20:45:21 +0100, Guenter Roeck wrote:
>>>> On Wed, Mar 01, 2017 at 07:58:17PM +0100, Sven Schmidt wrote:
>>>>> Hi Guenter, Tobias and Sandra,
>>>>>
>>>>> thanks for your effort here.
>>>>>
>>>>> On Tue, Feb 28, 2017 at 10:14:13AM -0800, Guenter Roeck wrote:
>>>>>> On Tue, Feb 28, 2017 at 10:53:56AM -0700, Sandra Loosemore wrote:
>>>>>>> On 02/28/2017 08:53 AM, Tobias Klauser wrote:
>>>>>>>> (adding Sandra Loosemore to Cc due to possible relation to gcc/binutils
>>>>>>>> for nios2)
>>>>>>>>
>>>>>>>> On 2017-02-26 at 22:03:38 +0100, Guenter Roeck wrote:
>>>>>>>>> Hi Sven,
>>>>>>>>>
>>>>>>>>> my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib:
>>>>>>>>> update LZ4 compressor module"). The test hangs early during boot before
>>>>>>>>> any console output is seen. Reverting the offending patch as well as the
>>>>>>>>> subsequent lz4 related patches fixes the problem. Disabling CONFIG_RD_LZ4
>>>>>>>>> and with it other LZ4 options also fixes it (as does adding "return -EINVAL;"
>>>>>>>>> at the top of the LZ4 decompression code). For reference, bisect log
>>>>>>>>> is attached.
>>>>>>>>>
>>>>>>>>> I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0
>>>>>>>>> and binutils 2.26.1. Scripts used to run the tests are available at
>>>>>>>>> https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2.
>>>>>>>>> Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied.
>>>>>>>>
>>>>>>>> Looks like this is somehow related to gcc/binutils. Using GCC 4.8.3 and
>>>>>>>> binutils 2.24.51 (both from Sourcery CodeBench Lite 2014.05) I can
>>>>>>>> get a kernel booting on latest master branch. AFAICT, none of the
>>>>>>>> LZ4_decompress_* functions are called during boot.
>>>>>>>>
>>>>>
>>>>> It seems a bit strange that code which is not actually called causes problems like that.
>>>>>
>>>> Yes, it is, though it is always possible. The code isn't exactly easy to
>>>> understand; there may be some hidden caveats such as global variables. It may
>>>> also be that some jump target exceeds its range (though why that would only
>>>> be seen with the LZ4 code is another question), or that the compiler gets
>>>> confused by the forced inlines (disabling that didn't make a difference,
>>>> though, nor did disabling -O3).
>>>>
>>>>> Please let me know if and how I may help you figure out what's happening, especially
>>>>> regarding the differences between the previous LZ4 and the current implementation.
>>>>>
>>>>
>>>> For my part I am all but clueless. Unless someone has an idea, we may have to
>>>> disable LZ4 support for nios2 for the time being. Does anyone have thoughts
>>>> on that ? Of course, that would not help if the problem also affects
>>>> recent gcc/binutils versions on other architectures.
>>>
>>> After some further investigations, I'd say this isn't "caused" by LZ4
>>> specifically but by a more general problem with one of the nios2 arch
>>> specific tools involved.
>>>
>>> I manually enabled random additional CONFIG_* options and in some cases
>>> I got the kernel to boot (with CONFIG_RD_LZ4 enabled and no return
>>> -EINVAL in place) while in others I didn't. So I'd rather suspect this
>>> problem to be connected to the size or structure of the generated vmlinux
>>> image.
>>>
>>> Or could this even be a problem with qemu? Did anyone already verify
>>> this on the 10m50 devboard? (Unfortunately I don't have any nios2
>>> devboard available right now, otherwise I would have done this...)
>>>
>>
>> That is of course always possible.
>>
>>> Other than that I'm also becoming all but clueless... One option I
>>> thought of was using the QEMU monitor to dump the CPU state after the
>>> hang but so far I didn't manage to get it to work (hints appreciated ;)
>>>
>>
>> Something like
>>
>> qemu-system-nios2 -M 10m50-ghrd -kernel vmlinux -no-reboot \
>>	-dtb arch/nios2/boot/dts/10m50_devboard.dtb \
>>	--append "rdinit=/sbin/init" -initrd busybox-nios2.cpio
>>
>> gives you a qemu monitor window. Use "info registers" to see registers.
>> Looks like it is stuck in init_bootmem_core, or at least that is what it
>> shows for me.
>
> Thanks a lot for the hint, this worked perfectly. I'm not all that
> familiar with qemu :-/
>
> Using the qemu gdbserver I can indeed confirm that it seems to be stuck
> in init_bootmem_core:
>
> (gdb) file vmlinux
> Reading symbols from vmlinux...done.
> (gdb) target remote localhost:1234
> Remote debugging using localhost:1234
> link_bootmem (bdata=) at mm/bootmem.c:80
> 80		if (bdata->node_min_pfn < ent->node_min_pfn) {
>
> This looks like a very weird place for it to get stuck...
>
> So I followed a different path and implemented early printk support for
> the 8250/16650 serial console on nios2, so I could get debug outputs
> earlier on (patch below, I'll also officially submit this later on).
>

That is great; I'll add that to my own tests to get some output.

> Now I get the following output on boot:
>
> Linux version 4.11.0-rc1-dirty (tobiask@ziws08) (gcc version 7.0.1 20170226 (experimental) (GCC) ) #46 Tue Mar 7 13:40:53 CET 2017
> bootconsole [early0] enabled
> Early console on uart16650 initialized at 0xf8001600
> OF: fdt: Error -11 processing FDT
> Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree!
>
> ---[ end Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree!
>
> Looks like the in-memory device tree somehow gets corrupted. Not sure
> yet why and how this is linked to the Kconfig options selected but at
> least we now have a possibility to use debug messages earlier on.
>

Interesting. I was able to confirm that the lz4 patch is not the root
cause. I was not able to reproduce the problem in v4.10, but after adding
more and more configuration options I get it to fail starting with commit
ac1820fb286b552 ("Merge tag 'for-next-dma_ops' of
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma"). No idea if
that is the root cause either. Kernel configuration for that case is
attached.

Of course ac1820fb286b552 doesn't crash anymore with that after applying
your patch below, and v4.11-rc1 crashes without any output :-(.

I think I'll add some logging into qemu to see where it puts the dtb.

Guenter

> ---%<---%<---
>
> Patch for 8250/16650 early printk support on nios2 (make sure to select
> CONFIG_EARLY_PRINTK):
>
> diff --git a/arch/nios2/Kconfig.debug b/arch/nios2/Kconfig.debug
> index 2fd08cbfdddb..35b5dd67b15a 100644
> --- a/arch/nios2/Kconfig.debug
> +++ b/arch/nios2/Kconfig.debug
> @@ -18,7 +18,7 @@ config EARLY_PRINTK
>  	bool "Activate early kernel debugging"
>  	default y
>  	select SERIAL_CORE_CONSOLE
> -	depends on SERIAL_ALTERA_JTAGUART_CONSOLE || SERIAL_ALTERA_UART_CONSOLE
> +	depends on SERIAL_ALTERA_JTAGUART_CONSOLE || SERIAL_ALTERA_UART_CONSOLE || SERIAL_8250_CONSOLE
>  	help
>  	  Enable early printk on console
>  	  This is useful for kernel debugging when your machine crashes very
> diff --git a/arch/nios2/kernel/early_printk.c b/arch/nios2/kernel/early_printk.c
> index c08e4c1486fc..24b4506f4969 100644
> --- a/arch/nios2/kernel/early_printk.c
> +++ b/arch/nios2/kernel/early_printk.c
> @@ -22,6 +22,8 @@ static unsigned long base_addr;
>
>  #if defined(CONFIG_SERIAL_ALTERA_JTAGUART_CONSOLE)
>
> +#define UART_NAME "altera_jtaguart"
> +
>  #define ALTERA_JTAGUART_DATA_REG		0
>  #define ALTERA_JTAGUART_CONTROL_REG		4
>  #define ALTERA_JTAGUART_CONTROL_WSPACE_MSK	0xFFFF0000
> @@ -53,6 +55,8 @@ static void early_console_write(struct console *con, const char *s, unsigned n)
>
>  #elif defined(CONFIG_SERIAL_ALTERA_UART_CONSOLE)
>
> +#define UART_NAME "altera_uart"
> +
>  #define ALTERA_UART_TXDATA_REG	4
>  #define ALTERA_UART_STATUS_REG	8
>  #define ALTERA_UART_STATUS_TRDY	0x0040
> @@ -80,9 +84,40 @@ static void early_console_write(struct console *con, const char *s, unsigned n)
>  	}
>  }
>
> +#elif defined(CONFIG_SERIAL_8250_CONSOLE)
> +
> +#define UART_NAME "uart16650"
> +
> +#define UART_LSR_TEMT	0x40 /* Transmitter empty */
> +#define UART_LSR_THRE	0x20 /* Transmit-hold-register empty */
> +#define BOTH_EMPTY	(UART_LSR_TEMT | UART_LSR_THRE)
> +
> +#define UART_GET_SR() \
> +	__builtin_ldwio((void *)(base_addr + 0x14))
> +#define UART_SET_TX(v) \
> +	__builtin_stwio((void *)(base_addr), v)
> +
> +static void early_console_putc(char c)
> +{
> +	while (!((UART_GET_SR() & BOTH_EMPTY) == BOTH_EMPTY))
> +		;
> +
> +	UART_SET_TX(c & 0xff);
> +}
> +
> +static void early_console_write(struct console *con, const char *s, unsigned n)
> +{
> +	while (n-- && *s) {
> +		early_console_putc(*s);
> +		if (*s == '\n')
> +			early_console_putc('\r');
> +		s++;
> +	}
> +}
> +
>  #else
> -# error Neither SERIAL_ALTERA_JTAGUART_CONSOLE nor SERIAL_ALTERA_UART_CONSOLE \
> -selected
> +# error Neither SERIAL_ALTERA_JTAGUART_CONSOLE, SERIAL_ALTERA_UART_CONSOLE, \
> +	nor SERIAL_8250_CONSOLE selected
>  #endif
>
>  static struct console early_console_prom = {
> @@ -95,7 +130,8 @@ static struct console early_console_prom = {
>  void __init setup_early_printk(void)
>  {
>  #if defined(CONFIG_SERIAL_ALTERA_JTAGUART_CONSOLE) || \
> -	defined(CONFIG_SERIAL_ALTERA_UART_CONSOLE)
> +	defined(CONFIG_SERIAL_ALTERA_UART_CONSOLE) || \
> +	defined(CONFIG_SERIAL_8250_CONSOLE)
>  	base_addr = of_early_console();
>  #else
>  	base_addr = 0;
> @@ -114,5 +150,5 @@ void __init setup_early_printk(void)
>
>  	early_console = &early_console_prom;
>  	register_console(early_console);
> -	pr_info("early_console initialized at 0x%08lx\n", base_addr);
> +	pr_info("Early console on %s initialized at 0x%08lx\n", UART_NAME, base_addr);
>  }
>