Linux kernel regressions
* [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c
@ 2021-06-15 11:11 Naresh Kamboju
  2021-06-15 11:50 ` Will Deacon
  2021-06-15 12:47 ` Mark Rutland
  0 siblings, 2 replies; 10+ messages in thread
From: Naresh Kamboju @ 2021-06-15 11:11 UTC (permalink / raw)
  To: Linux-Next Mailing List, linux-mm, Linux ARM, open list,
	Will Deacon, lkft-triage, regressions
  Cc: Andrew Morton, Stephen Rothwell, Arnd Bergmann, Ard Biesheuvel,
	Catalin Marinas, Mike Rapoport, Christophe Leroy

The following kernel crash was reported while booting the linux-next 20210615
tag on qemu_arm64 with an allmodconfig build.

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[    0.000000] Linux version 5.13.0-rc6-next-20210615 (tuxmake@ac7978cddede) (aarch64-linux-gnu-gcc (Debian 11.1.0-1) 11.1.0, GNU ld (GNU Binutils for Debian) 2.36.50.20210601) #1 SMP PREEMPT Tue Jun 15 10:20:51 UTC 2021
[    0.000000] Machine model: linux,dummy-virt
[    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
[    0.000000] printk: bootconsole [pl11] enabled
[    0.000000] efi: UEFI not found.
[    0.000000] NUMA: No NUMA configuration found
[    0.000000] NUMA: Faking a node at [mem 0x0000000040000000-0x00000000bfffffff]
[    0.000000] NUMA: NODE_DATA [mem 0xbfc00d40-0xbfc03fff]
[    0.000000] ------------[ cut here ]------------
[    0.000000] kernel BUG at arch/arm64/mm/physaddr.c:27!
[    0.000000] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G                T 5.13.0-rc6-next-20210615 #1 c150a8161d8ff395c5ae7ee0c3c8f22c3689fae4
[    0.000000] Hardware name: linux,dummy-virt (DT)
[    0.000000] pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO BTYPE=--)
[    0.000000] pc : __phys_addr_symbol+0x44/0xc0
[    0.000000] lr : __phys_addr_symbol+0x44/0xc0
[    0.000000] sp : ffff800014287b00
[    0.000000] x29: ffff800014287b00 x28: fc49a9b89db36f0a x27: ffffffffffffffff
[    0.000000] x26: 0000000000000280 x25: 0000000000000010 x24: ffff8000145a8000
[    0.000000] x23: 0000000008000000 x22: 0000000000000010 x21: 0000000000000000
[    0.000000] x20: ffff800010000000 x19: ffff00007fc00d40 x18: 0000000000000000
[    0.000000] x17: 00000000003ee000 x16: 00000000bfc12000 x15: 0000001000000000
[    0.000000] x14: 000000000000de8c x13: 0000001000000000 x12: 00000000f1f1f1f1
[    0.000000] x11: dfff800000000000 x10: ffff700002850eea x9 : 0000000000000000
[    0.000000] x8 : ffff00007fbe0d40 x7 : 0000000000000000 x6 : 000000000000003f
[    0.000000] x5 : 0000000000000040 x4 : 0000000000000005 x3 : ffff8000142bb0c0
[    0.000000] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[    0.000000] Call trace:
[    0.000000]  __phys_addr_symbol+0x44/0xc0
[    0.000000]  sparse_init_nid+0x98/0x6d0
[    0.000000]  sparse_init+0x460/0x4d4
[    0.000000]  bootmem_init+0x110/0x340
[    0.000000]  setup_arch+0x1b8/0x2e0
[    0.000000]  start_kernel+0x110/0x870
[    0.000000]  __primary_switched+0xa8/0xb0
[    0.000000] Code: 940ccf23 eb13029f 54000069 940cce60 (d4210000)
[    0.000000] random: get_random_bytes called from oops_exit+0x54/0xc0 with crng_init=0
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Kernel panic - not syncing: Oops - BUG: Fatal exception
[    0.000000] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>

--
Linaro LKFT
https://lkft.linaro.org


* Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c
  2021-06-15 11:11 [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c Naresh Kamboju
@ 2021-06-15 11:50 ` Will Deacon
  2021-06-17 12:15   ` Naresh Kamboju
  2021-06-15 12:47 ` Mark Rutland
  1 sibling, 1 reply; 10+ messages in thread
From: Will Deacon @ 2021-06-15 11:50 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Linux-Next Mailing List, linux-mm, Linux ARM, open list,
	lkft-triage, regressions, Andrew Morton, Stephen Rothwell,
	Arnd Bergmann, Ard Biesheuvel, Catalin Marinas, Mike Rapoport,
	Christophe Leroy

Hi Naresh,

On Tue, Jun 15, 2021 at 04:41:25PM +0530, Naresh Kamboju wrote:
> The following kernel crash was reported while booting the linux-next 20210615
> tag on qemu_arm64 with an allmodconfig build.
> 
> [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
> [    0.000000] Linux version 5.13.0-rc6-next-20210615
> (tuxmake@ac7978cddede) (aarch64-linux-gnu-gcc (Debian 11.1.0-1)
> 11.1.0, GNU ld (GNU Binutils for Debian) 2.36.50.20210601) #1 SMP
> PREEMPT Tue Jun 15 10:20:51 UTC 2021
> [    0.000000] Machine model: linux,dummy-virt
> [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
> [    0.000000] printk: bootconsole [pl11] enabled
> [    0.000000] efi: UEFI not found.
> [    0.000000] NUMA: No NUMA configuration found
> [    0.000000] NUMA: Faking a node at [mem
> 0x0000000040000000-0x00000000bfffffff]
> [    0.000000] NUMA: NODE_DATA [mem 0xbfc00d40-0xbfc03fff]
> [    0.000000] ------------[ cut here ]------------
> [    0.000000] kernel BUG at arch/arm64/mm/physaddr.c:27!
> [    0.000000] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G                T
> 5.13.0-rc6-next-20210615 #1 c150a8161d8ff395c5ae7ee0c3c8f22c3689fae4
> [    0.000000] Hardware name: linux,dummy-virt (DT)
> [    0.000000] pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO BTYPE=--)
> [    0.000000] pc : __phys_addr_symbol+0x44/0xc0
> [    0.000000] lr : __phys_addr_symbol+0x44/0xc0
> [    0.000000] sp : ffff800014287b00
> [    0.000000] x29: ffff800014287b00 x28: fc49a9b89db36f0a x27: ffffffffffffffff
> [    0.000000] x26: 0000000000000280 x25: 0000000000000010 x24: ffff8000145a8000
> [    0.000000] x23: 0000000008000000 x22: 0000000000000010 x21: 0000000000000000
> [    0.000000] x20: ffff800010000000 x19: ffff00007fc00d40 x18: 0000000000000000
> [    0.000000] x17: 00000000003ee000 x16: 00000000bfc12000 x15: 0000001000000000
> [    0.000000] x14: 000000000000de8c x13: 0000001000000000 x12: 00000000f1f1f1f1
> [    0.000000] x11: dfff800000000000 x10: ffff700002850eea x9 : 0000000000000000
> [    0.000000] x8 : ffff00007fbe0d40 x7 : 0000000000000000 x6 : 000000000000003f
> [    0.000000] x5 : 0000000000000040 x4 : 0000000000000005 x3 : ffff8000142bb0c0
> [    0.000000] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
> [    0.000000] Call trace:
> [    0.000000]  __phys_addr_symbol+0x44/0xc0
> [    0.000000]  sparse_init_nid+0x98/0x6d0
> [    0.000000]  sparse_init+0x460/0x4d4
> [    0.000000]  bootmem_init+0x110/0x340
> [    0.000000]  setup_arch+0x1b8/0x2e0
> [    0.000000]  start_kernel+0x110/0x870
> [    0.000000]  __primary_switched+0xa8/0xb0
> [    0.000000] Code: 940ccf23 eb13029f 54000069 940cce60 (d4210000)
> [    0.000000] random: get_random_bytes called from
> oops_exit+0x54/0xc0 with crng_init=0
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000] Kernel panic - not syncing: Oops - BUG: Fatal exception
> [    0.000000] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal
> exception ]---
> 
> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>

Thanks for the report, although since this appears to be part of a broader
testing effort, here are some things that I think would make the reports
even more useful:

  1. An indication as to whether or not this is a regression (i.e. do you
     have a known good build, perhaps even a bisection?)

  2. Either a link to the vmlinux, or faddr2line run on the backtrace.
     Looking at the above, I can't tell what sparse_init_nid+0x98/0x6d0
     actually is.

  3. The exact QEMU command-line you are using, so I can try to reproduce
     this locally. I think the 0-day bot wraps the repro up in a shell
     script for you.

  4. Whether or not the issue is reproducible.

  5. Information about the toolchain you used to build the kernel (it
     happens to be present here because it's in the kernel log, but
     generally I think it would be handy to specify that in the report).

Please can you provide that information for this crash? It would really
help in debugging it.

Thanks!

Will


* Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c
  2021-06-15 11:11 [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c Naresh Kamboju
  2021-06-15 11:50 ` Will Deacon
@ 2021-06-15 12:47 ` Mark Rutland
  2021-06-15 13:19   ` Mark Rutland
  1 sibling, 1 reply; 10+ messages in thread
From: Mark Rutland @ 2021-06-15 12:47 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Linux-Next Mailing List, linux-mm, Linux ARM, open list,
	Will Deacon, lkft-triage, regressions, Andrew Morton,
	Stephen Rothwell, Arnd Bergmann, Ard Biesheuvel, Catalin Marinas,
	Mike Rapoport, Christophe Leroy, Miles Chen

On Tue, Jun 15, 2021 at 04:41:25PM +0530, Naresh Kamboju wrote:
> The following kernel crash was reported while booting the linux-next 20210615
> tag on qemu_arm64 with an allmodconfig build.
> 
> [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
> [    0.000000] Linux version 5.13.0-rc6-next-20210615
> (tuxmake@ac7978cddede) (aarch64-linux-gnu-gcc (Debian 11.1.0-1)
> 11.1.0, GNU ld (GNU Binutils for Debian) 2.36.50.20210601) #1 SMP
> PREEMPT Tue Jun 15 10:20:51 UTC 2021
> [    0.000000] Machine model: linux,dummy-virt
> [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
> [    0.000000] printk: bootconsole [pl11] enabled
> [    0.000000] efi: UEFI not found.
> [    0.000000] NUMA: No NUMA configuration found
> [    0.000000] NUMA: Faking a node at [mem
> 0x0000000040000000-0x00000000bfffffff]
> [    0.000000] NUMA: NODE_DATA [mem 0xbfc00d40-0xbfc03fff]
> [    0.000000] ------------[ cut here ]------------
> [    0.000000] kernel BUG at arch/arm64/mm/physaddr.c:27!
> [    0.000000] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G                T
> 5.13.0-rc6-next-20210615 #1 c150a8161d8ff395c5ae7ee0c3c8f22c3689fae4
> [    0.000000] Hardware name: linux,dummy-virt (DT)
> [    0.000000] pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO BTYPE=--)
> [    0.000000] pc : __phys_addr_symbol+0x44/0xc0
> [    0.000000] lr : __phys_addr_symbol+0x44/0xc0
> [    0.000000] sp : ffff800014287b00
> [    0.000000] x29: ffff800014287b00 x28: fc49a9b89db36f0a x27: ffffffffffffffff
> [    0.000000] x26: 0000000000000280 x25: 0000000000000010 x24: ffff8000145a8000
> [    0.000000] x23: 0000000008000000 x22: 0000000000000010 x21: 0000000000000000
> [    0.000000] x20: ffff800010000000 x19: ffff00007fc00d40 x18: 0000000000000000
> [    0.000000] x17: 00000000003ee000 x16: 00000000bfc12000 x15: 0000001000000000
> [    0.000000] x14: 000000000000de8c x13: 0000001000000000 x12: 00000000f1f1f1f1
> [    0.000000] x11: dfff800000000000 x10: ffff700002850eea x9 : 0000000000000000
> [    0.000000] x8 : ffff00007fbe0d40 x7 : 0000000000000000 x6 : 000000000000003f
> [    0.000000] x5 : 0000000000000040 x4 : 0000000000000005 x3 : ffff8000142bb0c0
> [    0.000000] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
> [    0.000000] Call trace:
> [    0.000000]  __phys_addr_symbol+0x44/0xc0
> [    0.000000]  sparse_init_nid+0x98/0x6d0

From the looks of it, this is pgdat_to_phys, as introduced in next
commit:

  e1db6ef7336d817c ("mm/sparse: fix check_usemap_section_nr warnings")

It appears that allmodconfig doesn't have CONFIG_NEED_MULTIPLE_NODES=y,
but does have CONFIG_NUMA=y, and so *does* use the dynamically-allocated
node_data array (since contig_page_data is only defined for !NUMA).

I don't think that commit is correct.

Thanks,
Mark.

> [    0.000000]  sparse_init+0x460/0x4d4
> [    0.000000]  bootmem_init+0x110/0x340
> [    0.000000]  setup_arch+0x1b8/0x2e0
> [    0.000000]  start_kernel+0x110/0x870
> [    0.000000]  __primary_switched+0xa8/0xb0
> [    0.000000] Code: 940ccf23 eb13029f 54000069 940cce60 (d4210000)
> [    0.000000] random: get_random_bytes called from
> oops_exit+0x54/0xc0 with crng_init=0
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000] Kernel panic - not syncing: Oops - BUG: Fatal exception
> [    0.000000] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal
> exception ]---
> 
> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> 
> --
> Linaro LKFT
> https://lkft.linaro.org
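
As a rough illustration of why the check at physaddr.c:27 fires for this
pointer, here is a small userspace sketch. It assumes, as I understand it,
that __phys_addr_symbol() only accepts addresses inside the kernel image,
while a dynamically allocated NODE_DATA lives in the linear map. All
SKETCH_* constants and both helper names are hypothetical: the image start
is chosen to equal x20 from the register dump, the image end is invented,
and the linear-map base and RAM start assume a 48-bit VA layout with RAM
beginning at 0x40000000 (per the "Faking a node" line).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical kernel image bounds. The start equals x20 in the register
 * dump; the end is an invented value, not taken from the log. */
#define SKETCH_KERNEL_START 0xffff800010000000ULL
#define SKETCH_KERNEL_END   0xffff800018000000ULL

/* Assumed linear-map base (48-bit VA) and start of RAM from the boot log. */
#define SKETCH_PAGE_OFFSET  0xffff000000000000ULL
#define SKETCH_PHYS_OFFSET  0x0000000040000000ULL

/* The idea behind the symbol check: only kernel-image addresses pass. */
static inline bool is_kernel_image_addr(uint64_t va)
{
	return va >= SKETCH_KERNEL_START && va <= SKETCH_KERNEL_END;
}

/* Linear-map translation, which is what a NODE_DATA pointer needs. */
static inline uint64_t linear_to_phys(uint64_t va)
{
	return (va - SKETCH_PAGE_OFFSET) + SKETCH_PHYS_OFFSET;
}
```

Feeding in x19 (ffff00007fc00d40) from the dump, the image check rejects it,
while the linear-map arithmetic gives 0xbfc00d40, which is the start of the
NODE_DATA range printed in the boot log.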


* Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c
  2021-06-15 12:47 ` Mark Rutland
@ 2021-06-15 13:19   ` Mark Rutland
  2021-06-15 14:50     ` Qian Cai
  2021-06-16  0:29     ` Miles Chen
  0 siblings, 2 replies; 10+ messages in thread
From: Mark Rutland @ 2021-06-15 13:19 UTC (permalink / raw)
  To: Naresh Kamboju, Mike Rapoport, Miles Chen, Andrew Morton
  Cc: Linux-Next Mailing List, linux-mm, Linux ARM, open list,
	Will Deacon, lkft-triage, regressions, Stephen Rothwell,
	Arnd Bergmann, Ard Biesheuvel, Catalin Marinas, Christophe Leroy

On Tue, Jun 15, 2021 at 01:47:45PM +0100, Mark Rutland wrote:
> On Tue, Jun 15, 2021 at 04:41:25PM +0530, Naresh Kamboju wrote:
> > The following kernel crash was reported while booting the linux-next 20210615
> > tag on qemu_arm64 with an allmodconfig build.
> > 
> > [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
> > [    0.000000] Linux version 5.13.0-rc6-next-20210615
> > (tuxmake@ac7978cddede) (aarch64-linux-gnu-gcc (Debian 11.1.0-1)
> > 11.1.0, GNU ld (GNU Binutils for Debian) 2.36.50.20210601) #1 SMP
> > PREEMPT Tue Jun 15 10:20:51 UTC 2021
> > [    0.000000] Machine model: linux,dummy-virt
> > [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
> > [    0.000000] printk: bootconsole [pl11] enabled
> > [    0.000000] efi: UEFI not found.
> > [    0.000000] NUMA: No NUMA configuration found
> > [    0.000000] NUMA: Faking a node at [mem
> > 0x0000000040000000-0x00000000bfffffff]
> > [    0.000000] NUMA: NODE_DATA [mem 0xbfc00d40-0xbfc03fff]
> > [    0.000000] ------------[ cut here ]------------
> > [    0.000000] kernel BUG at arch/arm64/mm/physaddr.c:27!
> > [    0.000000] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> > [    0.000000] Modules linked in:
> > [    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G                T
> > 5.13.0-rc6-next-20210615 #1 c150a8161d8ff395c5ae7ee0c3c8f22c3689fae4
> > [    0.000000] Hardware name: linux,dummy-virt (DT)
> > [    0.000000] pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO BTYPE=--)
> > [    0.000000] pc : __phys_addr_symbol+0x44/0xc0
> > [    0.000000] lr : __phys_addr_symbol+0x44/0xc0
> > [    0.000000] sp : ffff800014287b00
> > [    0.000000] x29: ffff800014287b00 x28: fc49a9b89db36f0a x27: ffffffffffffffff
> > [    0.000000] x26: 0000000000000280 x25: 0000000000000010 x24: ffff8000145a8000
> > [    0.000000] x23: 0000000008000000 x22: 0000000000000010 x21: 0000000000000000
> > [    0.000000] x20: ffff800010000000 x19: ffff00007fc00d40 x18: 0000000000000000
> > [    0.000000] x17: 00000000003ee000 x16: 00000000bfc12000 x15: 0000001000000000
> > [    0.000000] x14: 000000000000de8c x13: 0000001000000000 x12: 00000000f1f1f1f1
> > [    0.000000] x11: dfff800000000000 x10: ffff700002850eea x9 : 0000000000000000
> > [    0.000000] x8 : ffff00007fbe0d40 x7 : 0000000000000000 x6 : 000000000000003f
> > [    0.000000] x5 : 0000000000000040 x4 : 0000000000000005 x3 : ffff8000142bb0c0
> > [    0.000000] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
> > [    0.000000] Call trace:
> > [    0.000000]  __phys_addr_symbol+0x44/0xc0
> > [    0.000000]  sparse_init_nid+0x98/0x6d0
> 
> From the looks of it, this is pgdat_to_phys, as introduced in next
> commit:
> 
>   e1db6ef7336d817c ("mm/sparse: fix check_usemap_section_nr warnings")
> 
> It appears that allmodconfig doesn't have CONFIG_NEED_MULTIPLE_NODES=y,
> but does have CONFIG_NUMA=y, and so *does* use the dynamically-allocated
> node_data array (since contig_page_data is only defined for !NUMA).
> 
> I don't think that commit is correct.

Looking some more, it looks like that's correct in isolation, but it
clashes with commit:

  5831eedad2ac6f38 ("mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA")

... and I reckon it'd be clearer and more robust to define
pgdat_to_phys() in the same ifdefs as contig_page_data so that
these stay in sync, e.g. have:

| #ifdef CONFIG_NUMA
| #define pgdat_to_phys(x)	virt_to_phys(x)
| #else /* CONFIG_NUMA */
| 
| extern struct pglist_data contig_page_data;
| ...
| #define pgdat_to_phys(x)	__pa_symbol(&contig_page_data)
|
| #endif /* CONFIG_NUMA */

... which'd also make clear that contig_page_data is the *only* expected
pglist_data.

Thanks,
Mark.

> Thanks,
> Mark.
> 
> > [    0.000000]  sparse_init+0x460/0x4d4
> > [    0.000000]  bootmem_init+0x110/0x340
> > [    0.000000]  setup_arch+0x1b8/0x2e0
> > [    0.000000]  start_kernel+0x110/0x870
> > [    0.000000]  __primary_switched+0xa8/0xb0
> > [    0.000000] Code: 940ccf23 eb13029f 54000069 940cce60 (d4210000)
> > [    0.000000] random: get_random_bytes called from
> > oops_exit+0x54/0xc0 with crng_init=0
> > [    0.000000] ---[ end trace 0000000000000000 ]---
> > [    0.000000] Kernel panic - not syncing: Oops - BUG: Fatal exception
> > [    0.000000] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal
> > exception ]---
> > 
> > Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> > 
> > --
> > Linaro LKFT
> > https://lkft.linaro.org
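
The layout suggested above can be tried out in userspace with stand-in
helpers. This is only a sketch: sketch_virt_to_phys() and sketch_pa_symbol()
are hypothetical placeholders for virt_to_phys() and __pa_symbol(), and
struct pglist_data is reduced to a single made-up field. Built without
-DCONFIG_NUMA it takes the !NUMA branch, where the macro ignores its
argument and always resolves to contig_page_data.

```c
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Reduced stand-in for the real structure. */
struct pglist_data {
	int node_id;
};

/* Placeholder translations; they just return the pointer value. */
static inline phys_addr_t sketch_virt_to_phys(const void *p)
{
	return (phys_addr_t)(uintptr_t)p;
}

static inline phys_addr_t sketch_pa_symbol(const void *p)
{
	return (phys_addr_t)(uintptr_t)p;
}

#ifdef CONFIG_NUMA
/* NUMA: node data is allocated at runtime, so use the linear-map helper. */
#define pgdat_to_phys(x)	sketch_virt_to_phys(x)
#else /* CONFIG_NUMA */
/* !NUMA: the one and only pglist_data is a kernel symbol. */
struct pglist_data contig_page_data;
#define pgdat_to_phys(x)	sketch_pa_symbol(&contig_page_data)
#endif /* CONFIG_NUMA */
```

Keeping the macro next to the declaration it depends on means a future
rename of the config symbol has only one place to get wrong.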


* Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c
  2021-06-15 13:19   ` Mark Rutland
@ 2021-06-15 14:50     ` Qian Cai
  2021-06-15 19:21       ` Mike Rapoport
  2021-06-16  0:29     ` Miles Chen
  1 sibling, 1 reply; 10+ messages in thread
From: Qian Cai @ 2021-06-15 14:50 UTC (permalink / raw)
  To: Mark Rutland, Naresh Kamboju, Mike Rapoport, Miles Chen, Andrew Morton
  Cc: Linux-Next Mailing List, linux-mm, Linux ARM, open list,
	Will Deacon, lkft-triage, regressions, Stephen Rothwell,
	Arnd Bergmann, Ard Biesheuvel, Catalin Marinas, Christophe Leroy



On 6/15/2021 9:19 AM, Mark Rutland wrote:
> Looking some more, it looks like that's correct in isolation, but it
> clashes with commit:
> 
>   5831eedad2ac6f38 ("mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA")

Just a data point. Reverting the commit alone fixed the same crash for me.


* Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c
  2021-06-15 14:50     ` Qian Cai
@ 2021-06-15 19:21       ` Mike Rapoport
  2021-06-15 23:34         ` Stephen Rothwell
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Rapoport @ 2021-06-15 19:21 UTC (permalink / raw)
  To: Qian Cai, Andrew Morton
  Cc: Mark Rutland, Naresh Kamboju, Miles Chen,
	Linux-Next Mailing List, linux-mm, Linux ARM, open list,
	Will Deacon, lkft-triage, regressions, Stephen Rothwell,
	Arnd Bergmann, Ard Biesheuvel, Catalin Marinas, Christophe Leroy

On Tue, Jun 15, 2021 at 10:50:31AM -0400, Qian Cai wrote:
> 
> 
> On 6/15/2021 9:19 AM, Mark Rutland wrote:
> > Looking some more, it looks like that's correct in isolation, but it
> > clashes with commit:
> > 
> >   5831eedad2ac6f38 ("mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA")
> 
> Just a data point. Reverting the commit alone fixed the same crash for me.

Yeah, that commit didn't take into account the change in
pgdat_to_phys().

The patch below should fix it. In the long run I think we should get rid of
contig_page_data and allocate NODE_DATA(0) for !NUMA case as well.

Andrew, can you please add this as a fixup to "mm: replace
CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA"?


diff --git a/mm/sparse.c b/mm/sparse.c
index a0e9cdb5bc38..6326cdf36c4f 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -347,7 +347,7 @@ size_t mem_section_usage_size(void)
 
 static inline phys_addr_t pgdat_to_phys(struct pglist_data *pgdat)
 {
-#ifndef CONFIG_NEED_MULTIPLE_NODES
+#ifndef CONFIG_NUMA
 	return __pa_symbol(pgdat);
 #else
 	return __pa(pgdat);

-- 
Sincerely yours,
Mike.
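
To see why the one-line change matters, the preprocessor state of the
failing build can be mimicked in userspace. This sketch assumes the config
described earlier in the thread: CONFIG_NUMA is set and
CONFIG_NEED_MULTIPLE_NODES no longer exists. The two function names are
hypothetical and only report which branch each guard would select.

```c
#include <stdbool.h>

/* Config of the failing build as described in the thread. */
#define CONFIG_NUMA 1
/* CONFIG_NEED_MULTIPLE_NODES is intentionally left undefined. */

/* Guard before the fixup: keyed on the removed symbol. */
static inline bool old_guard_uses_pa_symbol(void)
{
#ifndef CONFIG_NEED_MULTIPLE_NODES
	return true;	/* takes the __pa_symbol() branch */
#else
	return false;	/* takes the __pa() branch */
#endif
}

/* Guard after the fixup: keyed on CONFIG_NUMA. */
static inline bool new_guard_uses_pa_symbol(void)
{
#ifndef CONFIG_NUMA
	return true;	/* takes the __pa_symbol() branch */
#else
	return false;	/* takes the __pa() branch */
#endif
}
```

With this config the old guard always picks the __pa_symbol() branch, even
though NODE_DATA is dynamically allocated, while the new guard picks __pa().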


* Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c
  2021-06-15 19:21       ` Mike Rapoport
@ 2021-06-15 23:34         ` Stephen Rothwell
  2021-06-15 23:40           ` Miles Chen
  0 siblings, 1 reply; 10+ messages in thread
From: Stephen Rothwell @ 2021-06-15 23:34 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Qian Cai, Andrew Morton, Mark Rutland, Naresh Kamboju,
	Miles Chen, Linux-Next Mailing List, linux-mm, Linux ARM,
	open list, Will Deacon, lkft-triage, regressions, Arnd Bergmann,
	Ard Biesheuvel, Catalin Marinas, Christophe Leroy,
	Alistair Popple



Hi all,

On Tue, 15 Jun 2021 22:21:32 +0300 Mike Rapoport <rppt@linux.ibm.com> wrote:
>
> On Tue, Jun 15, 2021 at 10:50:31AM -0400, Qian Cai wrote:
> > 
> > On 6/15/2021 9:19 AM, Mark Rutland wrote:  
> > > Looking some more, it looks like that's correct in isolation, but it
> > > clashes with commit:
> > > 
> > >   5831eedad2ac6f38 ("mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA")  
> > 
> > Just a data point. Reverting the commit alone fixed the same crash for me.  
> 
> Yeah, that commit didn't take into account the change in
> pgdat_to_phys().
> 
> The patch below should fix it. In the long run I think we should get rid of
> contig_page_data and allocate NODE_DATA(0) for !NUMA case as well.
> 
> Andrew, can you please add this as a fixup to "mm: replace
> CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA"?
> 
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index a0e9cdb5bc38..6326cdf36c4f 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -347,7 +347,7 @@ size_t mem_section_usage_size(void)
>  
>  static inline phys_addr_t pgdat_to_phys(struct pglist_data *pgdat)
>  {
> -#ifndef CONFIG_NEED_MULTIPLE_NODES
> +#ifndef CONFIG_NUMA
>  	return __pa_symbol(pgdat);
>  #else
>  	return __pa(pgdat);

Added to linux-next today.

-- 
Cheers,
Stephen Rothwell



* Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c
  2021-06-15 23:34         ` Stephen Rothwell
@ 2021-06-15 23:40           ` Miles Chen
  0 siblings, 0 replies; 10+ messages in thread
From: Miles Chen @ 2021-06-15 23:40 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Mike Rapoport, Qian Cai, Andrew Morton, Mark Rutland,
	Naresh Kamboju, Linux-Next Mailing List, linux-mm, Linux ARM,
	open list, Will Deacon, lkft-triage, regressions, Arnd Bergmann,
	Ard Biesheuvel, Catalin Marinas, Christophe Leroy,
	Alistair Popple

On Wed, 2021-06-16 at 09:34 +1000, Stephen Rothwell wrote:
> Hi all,
> 
> On Tue, 15 Jun 2021 22:21:32 +0300 Mike Rapoport <rppt@linux.ibm.com> wrote:
> >
> > On Tue, Jun 15, 2021 at 10:50:31AM -0400, Qian Cai wrote:
> > > 
> > > On 6/15/2021 9:19 AM, Mark Rutland wrote:  
> > > > Looking some more, it looks like that's correct in isolation, but it
> > > > clashes with commit:
> > > > 
> > > >   5831eedad2ac6f38 ("mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA")  
> > > 
> > > Just a data point. Reverting the commit alone fixed the same crash for me.  
> > 
> > > Yeah, that commit didn't take into account the change in
> > > pgdat_to_phys().
> > 
> > The patch below should fix it. In the long run I think we should get rid of
> > contig_page_data and allocate NODE_DATA(0) for !NUMA case as well.
> > 
> > Andrew, can you please add this as a fixup to "mm: replace
> > CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA"?
> > 
> > 
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index a0e9cdb5bc38..6326cdf36c4f 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -347,7 +347,7 @@ size_t mem_section_usage_size(void)
> >  
> >  static inline phys_addr_t pgdat_to_phys(struct pglist_data *pgdat)
> >  {
> > -#ifndef CONFIG_NEED_MULTIPLE_NODES
> > +#ifndef CONFIG_NUMA
> >  	return __pa_symbol(pgdat);
> >  #else
> >  	return __pa(pgdat);
> 
> Added to linux-next today.
> 

Sorry for my late response.
Thanks for doing this.

Miles



* Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c
  2021-06-15 13:19   ` Mark Rutland
  2021-06-15 14:50     ` Qian Cai
@ 2021-06-16  0:29     ` Miles Chen
  1 sibling, 0 replies; 10+ messages in thread
From: Miles Chen @ 2021-06-16  0:29 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Naresh Kamboju, Mike Rapoport, Andrew Morton,
	Linux-Next Mailing List, linux-mm, Linux ARM, open list,
	Will Deacon, lkft-triage, regressions, Stephen Rothwell,
	Arnd Bergmann, Ard Biesheuvel, Catalin Marinas, Christophe Leroy

On Tue, 2021-06-15 at 14:19 +0100, Mark Rutland wrote:
> On Tue, Jun 15, 2021 at 01:47:45PM +0100, Mark Rutland wrote:
> > On Tue, Jun 15, 2021 at 04:41:25PM +0530, Naresh Kamboju wrote:
> > > The following kernel crash was reported while booting the linux-next 20210615
> > > tag on qemu_arm64 with an allmodconfig build.
> > > 
> > > [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
> > > [    0.000000] Linux version 5.13.0-rc6-next-20210615
> > > (tuxmake@ac7978cddede) (aarch64-linux-gnu-gcc (Debian 11.1.0-1)
> > > 11.1.0, GNU ld (GNU Binutils for Debian) 2.36.50.20210601) #1 SMP
> > > PREEMPT Tue Jun 15 10:20:51 UTC 2021
> > > [    0.000000] Machine model: linux,dummy-virt
> > > [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
> > > [    0.000000] printk: bootconsole [pl11] enabled
> > > [    0.000000] efi: UEFI not found.
> > > [    0.000000] NUMA: No NUMA configuration found
> > > [    0.000000] NUMA: Faking a node at [mem
> > > 0x0000000040000000-0x00000000bfffffff]
> > > [    0.000000] NUMA: NODE_DATA [mem 0xbfc00d40-0xbfc03fff]
> > > [    0.000000] ------------[ cut here ]------------
> > > [    0.000000] kernel BUG at arch/arm64/mm/physaddr.c:27!
> > > [    0.000000] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> > > [    0.000000] Modules linked in:
> > > [    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G                T
> > > 5.13.0-rc6-next-20210615 #1 c150a8161d8ff395c5ae7ee0c3c8f22c3689fae4
> > > [    0.000000] Hardware name: linux,dummy-virt (DT)
> > > [    0.000000] pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO BTYPE=--)
> > > [    0.000000] pc : __phys_addr_symbol+0x44/0xc0
> > > [    0.000000] lr : __phys_addr_symbol+0x44/0xc0
> > > [    0.000000] sp : ffff800014287b00
> > > [    0.000000] x29: ffff800014287b00 x28: fc49a9b89db36f0a x27: ffffffffffffffff
> > > [    0.000000] x26: 0000000000000280 x25: 0000000000000010 x24: ffff8000145a8000
> > > [    0.000000] x23: 0000000008000000 x22: 0000000000000010 x21: 0000000000000000
> > > [    0.000000] x20: ffff800010000000 x19: ffff00007fc00d40 x18: 0000000000000000
> > > [    0.000000] x17: 00000000003ee000 x16: 00000000bfc12000 x15: 0000001000000000
> > > [    0.000000] x14: 000000000000de8c x13: 0000001000000000 x12: 00000000f1f1f1f1
> > > [    0.000000] x11: dfff800000000000 x10: ffff700002850eea x9 : 0000000000000000
> > > [    0.000000] x8 : ffff00007fbe0d40 x7 : 0000000000000000 x6 : 000000000000003f
> > > [    0.000000] x5 : 0000000000000040 x4 : 0000000000000005 x3 : ffff8000142bb0c0
> > > [    0.000000] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
> > > [    0.000000] Call trace:
> > > [    0.000000]  __phys_addr_symbol+0x44/0xc0
> > > [    0.000000]  sparse_init_nid+0x98/0x6d0
> > 
> > From the looks of it, this is pgdat_to_phys, as introduced in next
> > commit:
> > 
> >   e1db6ef7336d817c ("mm/sparse: fix check_usemap_section_nr warnings")
> > 
> > It appears that allmodconfig doesn't have CONFIG_NEED_MULTIPLE_NODES=y,
> > but does have CONFIG_NUMA=y, and so *does* use the dynamically-allocated
> > node_data array (since contig_page_data is only defined for !NUMA).
> > 
> > I don't think that commit is correct.
> 
> Looking some more, it looks like that's correct in isolation, but it
> clashes with commit:
> 
>   5831eedad2ac6f38 ("mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA")
> 
> ... and I reckon it'd be clearer and more robust to define
> pgdat_to_phys() in the same ifdefs as contig_page_data so that
> these stay in sync, e.g. have:
> 
> | #ifdef CONFIG_NUMA
> | #define pgdat_to_phys(x)	virt_to_phys(x)
> | #else /* CONFIG_NUMA */
> | 
> | extern struct pglist_data contig_page_data;
> | ...
> | #define pgdat_to_phys(x)	__pa_symbol(&contig_page_data)
> |
> | #endif /* CONFIG_NUMA */
> 
> ... which'd also make clear that contig_page_data is the *only* expected
> pglist_data.

Thanks for your suggestion.
It looks clearer; I will submit another patch for this (after the
merge).

Miles

> Thanks,
> Mark.
> 
> > Thanks,
> > Mark.
> > 
> > > [    0.000000]  sparse_init+0x460/0x4d4
> > > [    0.000000]  bootmem_init+0x110/0x340
> > > [    0.000000]  setup_arch+0x1b8/0x2e0
> > > [    0.000000]  start_kernel+0x110/0x870
> > > [    0.000000]  __primary_switched+0xa8/0xb0
> > > [    0.000000] Code: 940ccf23 eb13029f 54000069 940cce60 (d4210000)
> > > [    0.000000] random: get_random_bytes called from
> > > oops_exit+0x54/0xc0 with crng_init=0
> > > [    0.000000] ---[ end trace 0000000000000000 ]---
> > > [    0.000000] Kernel panic - not syncing: Oops - BUG: Fatal exception
> > > [    0.000000] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal
> > > exception ]---
> > > 
> > > Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> > > 
> > > --
> > > Linaro LKFT
> > > https://lkft.linaro.org



* Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c
  2021-06-15 11:50 ` Will Deacon
@ 2021-06-17 12:15   ` Naresh Kamboju
  0 siblings, 0 replies; 10+ messages in thread
From: Naresh Kamboju @ 2021-06-17 12:15 UTC (permalink / raw)
  To: Will Deacon
  Cc: Linux-Next Mailing List, linux-mm, Linux ARM, open list,
	lkft-triage, regressions, Andrew Morton, Stephen Rothwell,
	Arnd Bergmann, Ard Biesheuvel, Catalin Marinas, Mike Rapoport,
	Christophe Leroy

Hi Will,

On Tue, 15 Jun 2021 at 17:20, Will Deacon <will@kernel.org> wrote:
>
> Hi Naresh,
>
> On Tue, Jun 15, 2021 at 04:41:25PM +0530, Naresh Kamboju wrote:
> > The following kernel crash was reported while booting the linux-next 20210615
> > tag on qemu_arm64 with an allmodconfig build.

<trim>

> Thanks for the report, although since this appears to be part of a broader
> testing effort, here are some things that I think would make the reports
> even more useful:
>
>   1. An indication as to whether or not this is a regression (i.e. do you
>      have a known good build, perhaps even a bisection?)
>
>   2. Either a link to the vmlinux, or faddr2line run on the backtrace.
>      Looking at the above, I can't tell what sparse_init_nid+0x98/0x6d0
>      actually is.
>
>   3. The exact QEMU command-line you are using, so I can try to reproduce
>      this locally. I think the 0-day bot wraps the repro up in a shell
>      script for you.
>
>   4. Whether or not the issue is reproducible.
>
>   5. Information about the toolchain you used to build the kernel (it
>      happens to be present here because it's in the kernel log, but
>      generally I think it would be handy to specify that in the report).
>
> Please can you provide that information for this crash? It would really
> help in debugging it.

Sorry for the incomplete bug report.

Thanks for sharing these details.
Next time I will include the suggested data points in my email report.

- Naresh

