All of lore.kernel.org
 help / color / mirror / Atom feed
* BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall
@ 2022-08-12 20:01 Max Schulze
  2022-08-15 14:22 ` Will Deacon
  0 siblings, 1 reply; 9+ messages in thread
From: Max Schulze @ 2022-08-12 20:01 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: catalin.marinas, will, naush

Hello,


I run a userspace program, which does image analysis. This is compiled from freepascal. The program freezes, I get below kernel oops. My program is calling SysUtils.ExecuteProcess('/sbin/shutdown') when finished, I have traced with strace and it hangs at the *clone syscall*.


I have 4 different devices where this happens. Tonight I built the latest kernel with debug infos (rpi-5.19.y commit c3a3eb5a3).

Log is attached.


a) might bad pointers being fed into the clone syscall be the culprit or is this purely a kernel issue? Do you have tips how to investigate?

b) how can I improve the debug logs, which kernel options should I add ?


Thanks,

Max


$ cat /proc/cmdline

coherent_pool=1M snd_bcm2835.enable_headphones=0 snd_bcm2835.enable_hdmi=0 video=HDMI-A-1:1920x1080M@60 smsc95xx.macaddr=<> vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000  console=tty1 root=PARTUUID=<> rootfstype=ext4 fsck.repair=yes rootwait kpti=0 nokaslr mitigations=off


[20:47:09] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[20:48:46] BUG: Bad page map in process projecta  pte:1110111111111111 pmd:800000001c40003
[20:48:46] addr:0000007fa1c00000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c00
[20:48:46] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
[20:48:46] CPU: 0 PID: 1069 Comm: projecta Tainted: G         C        5.19.0-v8+ #1
[20:48:46] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[20:48:46] Call trace:
[20:48:46]  dump_backtrace.part.0+0x1dc/0x1ec
[20:48:46]  show_stack+0x24/0x80
[20:48:46]  dump_stack_lvl+0x8c/0xb8
[20:48:46]  dump_stack+0x1c/0x38
[20:48:46]  print_bad_pte+0x2ec/0x350
[20:48:46]  vm_normal_page+0x16c/0x190
[20:48:46]  copy_page_range+0x45c/0x13c0
[20:48:46]  dup_mm+0x5bc/0x7f4
[20:48:46]  copy_process+0x1354/0x2370
[20:48:46]  kernel_clone+0xf0/0x590
[20:48:46]  __do_sys_clone+0xa4/0xe0
[20:48:46]  __arm64_sys_clone+0x74/0x90
[20:48:46]  invoke_syscall+0x68/0x1a0
[20:48:46]  el0_svc_common.constprop.0+0x88/0x170
[20:48:46]  do_el0_svc+0xcc/0xf0
[20:48:46]  el0_svc+0x30/0x70
[20:48:46]  el0t_64_sync_handler+0x1ac/0x1b0
[20:48:46]  el0t_64_sync+0x18c/0x190
[20:48:46] Disabling lock debugging due to kernel taint
[20:48:46] get_swap_device: Bad swap file entry 801111112111111
[20:48:46] BUG: Bad page map in process projecta  pte:1211111111111111 pmd:800000001c40003
[20:48:46] addr:0000007fa1c02000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c02
[20:48:46] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
[20:48:46] CPU: 0 PID: 1069 Comm: projecta Tainted: G    B    C        5.19.0-v8+ #1
[20:48:46] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[20:48:46] Call trace:
[20:48:46]  dump_backtrace.part.0+0x1dc/0x1ec
[20:48:46]  show_stack+0x24/0x80
[20:48:46]  dump_stack_lvl+0x8c/0xb8
[20:48:46]  dump_stack+0x1c/0x38
[20:48:46]  print_bad_pte+0x2ec/0x350
[20:48:46]  vm_normal_page+0x16c/0x190
[20:48:46]  copy_page_range+0x45c/0x13c0
[20:48:46]  dup_mm+0x5bc/0x7f4
[20:48:46]  copy_process+0x1354/0x2370
[20:48:46]  kernel_clone+0xf0/0x590
[20:48:46]  __do_sys_clone+0xa4/0xe0
[20:48:46]  __arm64_sys_clone+0x74/0x90
[20:48:46]  invoke_syscall+0x68/0x1a0
[20:48:46]  el0_svc_common.constprop.0+0x88/0x170
[20:48:46]  do_el0_svc+0xcc/0xf0
[20:48:46]  el0_svc+0x30/0x70
[20:48:46]  el0t_64_sync_handler+0x1ac/0x1b0
[20:48:46]  el0t_64_sync+0x18c/0x190
[20:48:46] ==================================================================
[20:48:46] BUG: KASAN: user-memory-access in __sync_icache_dcache+0xc0/0x190
[20:48:46] Read of size 8 at addr 0000004244444440 by task projecta/1069

[20:48:46] CPU: 0 PID: 1069 Comm: projecta Tainted: G    B    C        5.19.0-v8+ #1
[20:48:46] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[20:48:46] Call trace:
[20:48:46]  dump_backtrace.part.0+0x1dc/0x1ec
[20:48:46]  show_stack+0x24/0x80
[20:48:46]  dump_stack_lvl+0x8c/0xb8
[20:48:46]  print_report+0xcc/0x580
[20:48:46]  kasan_report+0xa8/0x170
[20:48:46]  __asan_load8+0x94/0xd0
[20:48:46]  __sync_icache_dcache+0xc0/0x190
[20:48:46]  set_pte_at+0x20c/0x280
[20:48:46]  copy_page_range+0x7fc/0x13c0
[20:48:46]  dup_mm+0x5bc/0x7f4
[20:48:46]  copy_process+0x1354/0x2370
[20:48:46]  kernel_clone+0xf0/0x590
[20:48:46]  __do_sys_clone+0xa4/0xe0
[20:48:46]  __arm64_sys_clone+0x74/0x90
[20:48:46]  invoke_syscall+0x68/0x1a0
[20:48:46]  el0_svc_common.constprop.0+0x88/0x170
[20:48:46]  do_el0_svc+0xcc/0xf0
[20:48:46]  el0_svc+0x30/0x70
[20:48:46]  el0t_64_sync_handler+0x1ac/0x1b0
[20:48:46]  el0t_64_sync+0x18c/0x190
[20:48:46] ==================================================================
[20:48:46] Unable to handle kernel paging request at virtual address 0000004244444440
[20:48:46] Mem abort info:
[20:48:46]   ESR = 0x0000000096000005
[20:48:46]   EC = 0x25: DABT (current EL), IL = 32 bits
[20:48:46]   SET = 0, FnV = 0
[20:48:46]   EA = 0, S1PTW = 0
[20:48:46]   FSC = 0x05: level 1 translation fault
[20:48:46] Data abort info:
[20:48:46]   ISV = 0, ISS = 0x00000005
[20:48:46]   CM = 0, WnR = 0
[20:48:46] user pgtable: 4k pages, 39-bit VAs, pgdp=000000004ef1b000
[20:48:46] [0000004244444440] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[20:48:46] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[20:48:46] Modules linked in: ov9281 rtc_pcf85063 regmap_i2c brcmfmac brcmutil cfg80211 v3d gpu_sched raspberrypi_hwmon drm_shmem_helper gpio_keys i2c_mux_pinctrl i2c_mux bcm2835_unicam rfkill v4l2_dv_timings i2c_brcmstb v4l2_fwnode joydev v4l2_async hid_microsoft rpivid_hevc(C) bcm2835_codec(C) bcm2835_isp(C) ff_memless bcm2835_v4l2(C) i2c_bcm2835 bcm2835_mmal_vchiq(C) v4l2_mem2mem videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops vc_sm_cma(C) videobuf2_v4l2 videobuf2_common videodev mc nvmem_rmem uio_pdrv_genirq uio drm fuse drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[20:48:46] CPU: 0 PID: 1069 Comm: projecta Tainted: G    B    C        5.19.0-v8+ #1
[20:48:46] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[20:48:46] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[20:48:46] pc : __sync_icache_dcache+0xc0/0x190
[20:48:46] lr : __sync_icache_dcache+0xc0/0x190
[20:48:46] sp : ffffffc00ce97590
[20:48:46] x29: ffffffc00ce97590 x28: ffffff804b540e50 x27: fffffffe00071028
[20:48:46] x26: 0000007fa1c02000 x25: fffffffe00bf5228 x24: 0000000000000000
[20:48:46] x23: fffffffe00000000 x22: 0000004244444440 x21: 1ffffff8019d2eba
[20:48:46] x20: 0000000000000000 x19: 0000004444444440 x18: 0000000000000000
[20:48:46] x17: 3d3d3d3d3d3d3d3d x16: 3d3d3d3d3d3d3d3d x15: 3d3d3d3d3d3d3d3d
[20:48:46] x14: 3d3d3d3d3d3d3d3d x13: 3d3d3d3d3d3d3d3d x12: ffffffb8014d3a11
[20:48:46] x11: 1ffffff8014d3a10 x10: ffffffb8014d3a10 x9 : dfffffc000000000
[20:48:46] x8 : ffffffc00a69d087 x7 : 0000000000000001 x6 : 00000047feb2c5f0
[20:48:46] x5 : ffffffc00a69d080 x4 : ffffffb8014d3a11 x3 : ffffffc0080b88a4
[20:48:46] x2 : 0000000000000000 x1 : ffffff804c330040 x0 : 0000000000000001
[20:48:46] Call trace:
[20:48:46]  __sync_icache_dcache+0xc0/0x190
[20:48:46]  set_pte_at+0x20c/0x280
[20:48:46]  copy_page_range+0x7fc/0x13c0
[20:48:46]  dup_mm+0x5bc/0x7f4
[20:48:46]  copy_process+0x1354/0x2370
[20:48:46]  kernel_clone+0xf0/0x590
[20:48:46]  __do_sys_clone+0xa4/0xe0
[20:48:46]  __arm64_sys_clone+0x74/0x90
[20:48:46]  invoke_syscall+0x68/0x1a0
[20:48:46]  el0_svc_common.constprop.0+0x88/0x170
[20:48:46]  do_el0_svc+0xcc/0xf0
[20:48:46]  el0_svc+0x30/0x70
[20:48:46]  el0t_64_sync_handler+0x1ac/0x1b0
[20:48:46]  el0t_64_sync+0x18c/0x190
[20:48:46] Code: d37ae673 8b170276 aa1603e0 940fa1ce (f8776a60)
[20:48:46] ---[ end trace 0000000000000000 ]---
[20:48:46] note: projecta[1069] exited with preempt_count 2
[20:53:57] brcmfmac: brcmf_cfg80211_set_power_mgmt: power save enabled
[20:57:57] kmemleak: 2434 new suspected memory leaks (see /sys/kernel/debug/kmemleak)


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall
  2022-08-12 20:01 BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall Max Schulze
@ 2022-08-15 14:22 ` Will Deacon
  2022-08-15 14:58   ` Max Schulze
  0 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2022-08-15 14:22 UTC (permalink / raw)
  To: Max Schulze; +Cc: linux-arm-kernel, catalin.marinas, naush

Hi,

On Fri, Aug 12, 2022 at 10:01:06PM +0200, Max Schulze wrote:
> I run a userspace program, which does image analysis. This is compiled
> from freepascal. The program freezes, I get below kernel oops. My program
> is calling SysUtils.ExecuteProcess('/sbin/shutdown') when finished, I have
> traced with strace and it hangs at the *clone syscall*.
> 
> I have 4 different devices where this happens. Tonight I built the latest
> kernel with debug infos (rpi-5.19.y commit c3a3eb5a3).
> 
> $ cat /proc/cmdline
> 
> coherent_pool=1M snd_bcm2835.enable_headphones=0 snd_bcm2835.enable_hdmi=0
> video=HDMI-A-1:1920x1080M@60 smsc95xx.macaddr=<>
> vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000  console=tty1
> root=PARTUUID=<> rootfstype=ext4 fsck.repair=yes rootwait kpti=0 nokaslr
> mitigations=off
> 
> 
> [20:47:09] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [20:48:46] BUG: Bad page map in process projecta  pte:1110111111111111 pmd:800000001c40003
> [20:48:46] addr:0000007fa1c00000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c00
> [20:48:46] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
> [20:48:46] CPU: 0 PID: 1069 Comm: projecta Tainted: G         C        5.19.0-v8+ #1
> [20:48:46] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
> [20:48:46] Call trace:
> [20:48:46]  dump_backtrace.part.0+0x1dc/0x1ec
> [20:48:46]  show_stack+0x24/0x80
> [20:48:46]  dump_stack_lvl+0x8c/0xb8
> [20:48:46]  dump_stack+0x1c/0x38
> [20:48:46]  print_bad_pte+0x2ec/0x350
> [20:48:46]  vm_normal_page+0x16c/0x190
> [20:48:46]  copy_page_range+0x45c/0x13c0
> [20:48:46]  dup_mm+0x5bc/0x7f4
> [20:48:46]  copy_process+0x1354/0x2370
> [20:48:46]  kernel_clone+0xf0/0x590
> [20:48:46]  __do_sys_clone+0xa4/0xe0
> [20:48:46]  __arm64_sys_clone+0x74/0x90
> [20:48:46]  invoke_syscall+0x68/0x1a0
> [20:48:46]  el0_svc_common.constprop.0+0x88/0x170
> [20:48:46]  do_el0_svc+0xcc/0xf0
> [20:48:46]  el0_svc+0x30/0x70
> [20:48:46]  el0t_64_sync_handler+0x1ac/0x1b0
> [20:48:46]  el0t_64_sync+0x18c/0x190
> [20:48:46] Disabling lock debugging due to kernel taint
> [20:48:46] get_swap_device: Bad swap file entry 801111112111111
> [20:48:46] BUG: Bad page map in process projecta  pte:1211111111111111 pmd:800000001c40003
> [20:48:46] addr:0000007fa1c02000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c02

I hate to say it, but this all looks like memory corruption hitting the
page table and possibly the 'struct page' array to me :/

Are you able to reproduce this problem using an upstream kernel, rather
than one built from the raspberry pi tree?

Another thing to consider (and apologies if this sounds silly) is the power
supply. Are all four of your devices using the same power supply? Are you
able to try something beefier?

Finally, you could enable CONFIG_MEMTEST and boot with memtest=17 on the
command line, just in case it's a rogue DMA or something.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall
  2022-08-15 14:22 ` Will Deacon
@ 2022-08-15 14:58   ` Max Schulze
  2022-08-18 17:14       ` Max Schulze
  0 siblings, 1 reply; 9+ messages in thread
From: Max Schulze @ 2022-08-15 14:58 UTC (permalink / raw)
  To: Will Deacon; +Cc: linux-arm-kernel, catalin.marinas, naush

Hi Will,


On 15.08.22 16:22, Will Deacon wrote:
> Hi,
> 
> On Fri, Aug 12, 2022 at 10:01:06PM +0200, Max Schulze wrote:
>> I run a userspace program, which does image analysis. This is compiled
>> from freepascal. The program freezes, I get below kernel oops. My program
>> is calling SysUtils.ExecuteProcess('/sbin/shutdown') when finished, I have
>> traced with strace and it hangs at the *clone syscall*.
>>
>> I have 4 different devices where this happens. Tonight I built the latest
>> kernel with debug infos (rpi-5.19.y commit c3a3eb5a3).
>>
>> $ cat /proc/cmdline
>>
>> coherent_pool=1M snd_bcm2835.enable_headphones=0 snd_bcm2835.enable_hdmi=0
>> video=HDMI-A-1:1920x1080M@60 smsc95xx.macaddr=<>
>> vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000  console=tty1
>> root=PARTUUID=<> rootfstype=ext4 fsck.repair=yes rootwait kpti=0 nokaslr
>> mitigations=off
>>
>>
>> [20:47:09] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>> [20:48:46] BUG: Bad page map in process projecta  pte:1110111111111111 pmd:800000001c40003
>> [20:48:46] addr:0000007fa1c00000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c00
>> [20:48:46] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
>> [20:48:46] CPU: 0 PID: 1069 Comm: projecta Tainted: G         C        5.19.0-v8+ #1
>> [20:48:46] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
>> [20:48:46] Call trace:
>> [20:48:46]  dump_backtrace.part.0+0x1dc/0x1ec
>> [20:48:46]  show_stack+0x24/0x80
>> [20:48:46]  dump_stack_lvl+0x8c/0xb8
>> [20:48:46]  dump_stack+0x1c/0x38
>> [20:48:46]  print_bad_pte+0x2ec/0x350
>> [20:48:46]  vm_normal_page+0x16c/0x190
>> [20:48:46]  copy_page_range+0x45c/0x13c0
>> [20:48:46]  dup_mm+0x5bc/0x7f4
>> [20:48:46]  copy_process+0x1354/0x2370
>> [20:48:46]  kernel_clone+0xf0/0x590
>> [20:48:46]  __do_sys_clone+0xa4/0xe0
>> [20:48:46]  __arm64_sys_clone+0x74/0x90
>> [20:48:46]  invoke_syscall+0x68/0x1a0
>> [20:48:46]  el0_svc_common.constprop.0+0x88/0x170
>> [20:48:46]  do_el0_svc+0xcc/0xf0
>> [20:48:46]  el0_svc+0x30/0x70
>> [20:48:46]  el0t_64_sync_handler+0x1ac/0x1b0
>> [20:48:46]  el0t_64_sync+0x18c/0x190
>> [20:48:46] Disabling lock debugging due to kernel taint
>> [20:48:46] get_swap_device: Bad swap file entry 801111112111111
>> [20:48:46] BUG: Bad page map in process projecta  pte:1211111111111111 pmd:800000001c40003
>> [20:48:46] addr:0000007fa1c02000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c02
> 

> I hate to say it, but this all looks like memory corruption hitting the
> page table and possibly the 'struct page' array to me :/

Perhaps a note on the occcurence: across devices, the "bad page map" differs at pte, but somehow is mostly consistent at pmd:800000001c40003 (though I have seen 800000001c02003 and 800000001c40003). Is this some "magic value"? Because when not, I think it would be highly unlikely to be the hardware.

It is not only my program that has the problem, I have seen

[Sun Aug 14 17:30:38 2022] BUG: Bad page map in process llvmpipe-3  pte:262d2626292a2627 pmd:800000001c01003

and
[Sat Aug 13 11:53:43 2022] BUG: Bad page map in process Xorg:disk$1  pte:a098a09aa29ea8a4 pmd:800000001c01003
[Sat Aug 13 11:53:43 2022] addr:00000055a961e000 vm_flags:200100073 anon_vma:ffffff804c07d8f8 mapping:0000000000000000 index:55a961e
[Sat Aug 13 11:53:43 2022] file:(null) fault:0x0 mmap:0x0 read_folio:0x0


Now I have a kernel with KASAN and KFENCE, I get this additional Output, is that of any help?

 14:26:12 kernel: BUG: Bad page map in process projecta  pte:8e8d8a938a918d88 pmd:800000001c01003
 14:26:12 kernel: addr:0000007fa5601000 vm_flags:00100073 anon_vma:ffffff805e9898f8 mapping:0000000000000000 index:7fa5601
 14:26:12 kernel: file:(null) fault:0x0 mmap:0x0 read_folio:0x0
 14:26:12 kernel: CPU: 3 PID: 1053 Comm: projecta Tainted: G         C        5.19.0-v8-0815+ #5
 14:26:12 kernel: Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
 14:26:12 kernel: Call trace:
 14:26:12 kernel:  dump_backtrace.part.0+0x1dc/0x1ec
 14:26:12 kernel:  show_stack+0x24/0x80
 14:26:12 kernel:  dump_stack_lvl+0x8c/0xb8
 14:26:12 kernel:  dump_stack+0x1c/0x38
 14:26:12 kernel:  print_bad_pte+0x2ec/0x350
 14:26:12 kernel:  vm_normal_page+0x16c/0x190
 14:26:12 kernel:  copy_page_range+0x45c/0x11f0
 14:26:12 kernel:  dup_mm+0x5bc/0x7f4
 14:26:12 kernel:  copy_process+0x1354/0x2370
 14:26:12 kernel:  kernel_clone+0xf0/0x590
 14:26:12 kernel:  __do_sys_clone+0xa4/0xe0
 14:26:12 kernel:  __arm64_sys_clone+0x74/0x90
 14:26:12 kernel:  invoke_syscall+0x68/0x1a0
 14:26:12 kernel:  el0_svc_common.constprop.0+0x88/0x170
 14:26:12 kernel:  do_el0_svc+0xcc/0xf0
 14:26:12 kernel:  el0_svc+0x30/0x70
 14:26:12 kernel:  el0t_64_sync_handler+0x1ac/0x1b0
 14:26:12 kernel:  el0t_64_sync+0x18c/0x190
 14:26:12 kernel: Disabling lock debugging due to kernel taint
 14:26:12 kernel: ==================================================================
 14:26:12 kernel: BUG: KASAN: wild-memory-access in __sync_icache_dcache+0xc0/0x190
 14:26:12 kernel: Read of size 8 at addr 000002284e2a4600 by task projecta/1053
 14:26:12 kernel: 
 14:26:12 kernel: CPU: 3 PID: 1053 Comm: projecta Tainted: G    B    C        5.19.0-v8-0815+ #5
 14:26:12 kernel: Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
 14:26:12 kernel: Call trace:
 14:26:12 kernel:  dump_backtrace.part.0+0x1dc/0x1ec
 14:26:12 kernel:  show_stack+0x24/0x80
 14:26:12 kernel:  dump_stack_lvl+0x8c/0xb8
 14:26:12 kernel:  print_report+0xcc/0x580
 14:26:12 kernel:  kasan_report+0xa8/0x170
 14:26:12 kernel:  __asan_load8+0x94/0xd0
 14:26:12 kernel:  __sync_icache_dcache+0xc0/0x190
 14:26:12 kernel:  set_pte_at+0x20c/0x280
 14:26:12 kernel:  copy_page_range+0x804/0x11f0
 14:26:12 kernel:  dup_mm+0x5bc/0x7f4
 14:26:12 kernel:  copy_process+0x1354/0x2370
 14:26:12 kernel:  kernel_clone+0xf0/0x590
 14:26:12 kernel:  __do_sys_clone+0xa4/0xe0
 14:26:12 kernel:  __arm64_sys_clone+0x74/0x90
 14:26:12 kernel:  invoke_syscall+0x68/0x1a0
 14:26:12 kernel:  el0_svc_common.constprop.0+0x88/0x170
 14:26:12 kernel:  do_el0_svc+0xcc/0xf0
 14:26:12 kernel:  el0_svc+0x30/0x70
 14:26:12 kernel:  el0t_64_sync_handler+0x1ac/0x1b0
 14:26:12 kernel:  el0t_64_sync+0x18c/0x190
 14:26:12 kernel: ==================================================================
 14:26:12 kernel: Unable to handle kernel paging request at virtual address 000002284e2a4600
 14:26:12 kernel: Mem abort info:
 14:26:12 kernel:   ESR = 0x0000000096000004
 14:26:12 kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
 14:26:12 kernel:   SET = 0, FnV = 0
 14:26:12 kernel:   EA = 0, S1PTW = 0
 14:26:12 kernel:   FSC = 0x04: level 0 translation fault
 14:26:12 kernel: Data abort info:
 14:26:12 kernel:   ISV = 0, ISS = 0x00000004
 14:26:12 kernel:   CM = 0, WnR = 0
 14:26:12 kernel: [000002284e2a4600] address between user and kernel address ranges
 14:26:12 kernel: Internal error: Oops: 96000004 [#1] PREEMPT SMP
 14:26:12 kernel: Modules linked in: ov9281 rtc_pcf85063 regmap_i2c rfkill v3d bcm2835_unicam rpivid_hevc(C) v4l2_dv_timings bcm2835_v4l2(C) v4l2_fwnode bcm2835_codec(C) v4l2_async videobuf2_vmalloc v4l2_mem2mem gpu_sched bcm2835_isp(C) bcm2835_mmal_vchiq(C) videobuf2_dma_contig drm_shmem_helper videobuf2_memops videobuf2_v4l2 i2c_mux_pinctrl videobuf2_common i2c_mux raspberrypi_hwmon videodev i2c_brcmstb joydev i2c_bcm2835 hid_microsoft ff_memless vc_sm_cma(C) mc nvmem_rmem uio_pdrv_genirq uio drm fuse drm_panel_orientation_quirks backlight ipv6
 14:26:12 kernel: CPU: 3 PID: 1053 Comm: projecta Tainted: G    B    C        5.19.0-v8-0815+ #5
 14:26:12 kernel: Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
 14:26:12 kernel: pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 14:26:12 kernel: pc : __sync_icache_dcache+0xc0/0x190
 14:26:12 kernel: lr : __sync_icache_dcache+0xc0/0x190
 14:26:12 kernel: sp : ffffffc00cbe7590
 14:26:12 kernel: x29: ffffffc00cbe7590 x28: fffffffe00070068 x27: fffffffe00be37e8
 14:26:12 kernel: x26: 0000007fa5601000 x25: 8e858a938a918d88 x24: 0000000000000000
 14:26:12 kernel: x23: fffffffe00000000 x22: 000002284e2a4600 x21: 1ffffff80197ceba
 14:26:12 kernel: x20: 0000000000000000 x19: 0000022a4e2a4600 x18: 0000000000000000
 14:26:12 kernel: x17: 3d3d3d3d3d3d3d3d x16: 3d3d3d3d3d3d3d3d x15: 3d3d3d3d3d3d3d3d
 14:26:12 kernel: x14: 3d3d3d3d3d3d3d3d x13: 3d3d3d3d3d3d3d3d x12: ffffffb8014cfe11
 14:26:12 kernel: x11: 1ffffff8014cfe10 x10: ffffffb8014cfe10 x9 : dfffffc000000000
 14:26:12 kernel: x8 : ffffffc00a67f087 x7 : 0000000000000001 x6 : 00000047feb301f0
 14:26:12 kernel: x5 : ffffffc00a67f080 x4 : ffffffb8014cfe11 x3 : ffffffc0080b88a4
 14:26:12 kernel: x2 : 0000000000000000 x1 : ffffff804a0e0040 x0 : 0000000000000001
 14:26:12 kernel: Call trace:
 14:26:12 kernel:  __sync_icache_dcache+0xc0/0x190
 14:26:12 kernel:  set_pte_at+0x20c/0x280
 14:26:12 kernel:  copy_page_range+0x804/0x11f0
 14:26:12 kernel:  dup_mm+0x5bc/0x7f4
 14:26:12 kernel:  copy_process+0x1354/0x2370
 14:26:12 kernel:  kernel_clone+0xf0/0x590
 14:26:12 kernel:  __do_sys_clone+0xa4/0xe0
 14:26:12 kernel:  __arm64_sys_clone+0x74/0x90
 14:26:12 kernel:  invoke_syscall+0x68/0x1a0
 14:26:12 kernel:  el0_svc_common.constprop.0+0x88/0x170
 14:26:12 kernel:  do_el0_svc+0xcc/0xf0
 14:26:12 kernel:  el0_svc+0x30/0x70
 14:26:12 kernel:  el0t_64_sync_handler+0x1ac/0x1b0
 14:26:12 kernel:  el0t_64_sync+0x18c/0x190
 14:26:12 kernel: Code: d37ae673 8b170276 aa1603e0 940fa1f6 (f8776a60) 
 14:26:12 kernel: ---[ end trace 0000000000000000 ]---
 14:26:12 kernel: note: projecta[1053] exited with preempt_count 2





> 
> Are you able to reproduce this problem using an upstream kernel, rather
> than one built from the raspberry pi tree?

I will try to do this, might take a while and report back.

NB: When I began building the kernel with more debug output, this is when I saw the "kmemleak: cannot insert..." I reported separately. They might be related.

> 
> Another thing to consider (and apologies if this sounds silly) is the power
> supply. Are all four of your devices using the same power supply? Are you
> able to try something beefier?

No, they are all using different power supplies. The one on my desk is 12V/5A, should be plenty when recommended minimum is 1A.

> 
> Finally, you could enable CONFIG_MEMTEST and boot with memtest=17 on the
> command line, just in case it's a rogue DMA or something.

I did this just now. Nothing to report:

Mon Aug 15 16:34:24 2022] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
[Mon Aug 15 16:34:24 2022] Linux version 5.19.0-v8-0815+ (root@raspberrypi) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #5 SMP PREEMPT Sun Aug 14 23:20:47 CEST 2022
[Mon Aug 15 16:34:24 2022] random: crng init done
[Mon Aug 15 16:34:24 2022] Machine model: Raspberry Pi Compute Module 4 Rev 1.0
[Mon Aug 15 16:34:24 2022] efi: UEFI not found.
[Mon Aug 15 16:34:24 2022] Reserved memory: created CMA memory pool at 0x000000001ac00000, size 320 MiB
[Mon Aug 15 16:34:24 2022] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[Mon Aug 15 16:34:24 2022] early_memtest: # of tests: 17
[Mon Aug 15 16:34:24 2022]   0x0000000000001000 - 0x0000000000210000 pattern 4c494e5558726c7a

[...] long lines of patterns. Full output at https://pastebin.ubuntu.com/p/KgStKrxDfr/

[Mon Aug 15 16:34:24 2022]   0x000000007fbfa87f - 0x000000007fbfa880 pattern 0000000000000000
[Mon Aug 15 16:34:24 2022] Zone ranges:
[Mon Aug 15 16:34:24 2022]   DMA      [mem 0x0000000000000000-0x000000003fffffff]
[Mon Aug 15 16:34:24 2022]   DMA32    [mem 0x0000000040000000-0x000000007fffffff]
[Mon Aug 15 16:34:24 2022]   Normal   empty
[Mon Aug 15 16:34:24 2022] Movable zone start for each node
[Mon Aug 15 16:34:24 2022] Early memory node ranges
[Mon Aug 15 16:34:24 2022]   node   0: [mem 0x0000000000000000-0x0000000037ffffff]
[Mon Aug 15 16:34:24 2022]   node   0: [mem 0x0000000040000000-0x000000007fffffff]
[Mon Aug 15 16:34:24 2022] Initmem setup node 0 [mem 0x0000000000000000-0x000000007fffffff]
[Mon Aug 15 16:34:24 2022] kasan: KernelAddressSanitizer initialized (generic)
[Mon Aug 15 16:34:24 2022] percpu: Embedded 29 pages/cpu s78888 r8192 d31704 u118784
[Mon Aug 15 16:34:24 2022] pcpu-alloc: s78888 r8192 d31704 u118784 alloc=29*4096
[Mon Aug 15 16:34:24 2022] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 
[Mon Aug 15 16:34:24 2022] Detected PIPT I-cache on CPU0
[Mon Aug 15 16:34:24 2022] CPU features: detected: Spectre-v2
[Mon Aug 15 16:34:24 2022] CPU features: detected: Spectre-v3a
[Mon Aug 15 16:34:24 2022] CPU features: detected: Spectre-v4
[Mon Aug 15 16:34:24 2022] CPU features: detected: Spectre-BHB
[Mon Aug 15 16:34:24 2022] CPU features: kernel page table isolation forced OFF by kpti command line option
[Mon Aug 15 16:34:24 2022] CPU features: detected: ARM errata 1165522, 1319367, or 1530923
[Mon Aug 15 16:34:24 2022] Built 1 zonelists, mobility grouping on.  Total pages: 483840
[Mon Aug 15 16:34:24 2022] Kernel command line: coherent_pool=1M snd_bcm2835.enable_headphones=0 snd_bcm2835.enable_hdmi=0 video=HDMI-A-1:1920x1080M@60 smsc95xx.macaddr=DC:___ vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000  console=tty1 root=PARTUUID=c-02 rootfstype=ext4 fsck.repair=yes rootwait kpti=0 nokaslr mitigations=off memtest=17
[Mon Aug 15 16:34:24 2022] Unknown kernel command line parameters "nokaslr", will be passed to user space.
[Mon Aug 15 16:34:24 2022] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[Mon Aug 15 16:34:24 2022] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[Mon Aug 15 16:34:24 2022] mem auto-init: stack:off, heap alloc:off, heap free:off

Thanks,

Max

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall
  2022-08-15 14:58   ` Max Schulze
@ 2022-08-18 17:14       ` Max Schulze
  0 siblings, 0 replies; 9+ messages in thread
From: Max Schulze @ 2022-08-18 17:14 UTC (permalink / raw)
  To: Will Deacon; +Cc: linux-arm-kernel, catalin.marinas, naush, linux-mm, akpm

Hello,


> On 15.08.22 16:22, Will Deacon wrote:
>>
>>>
[...]
>>>
>>>
>>> [20:47:09] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>> [20:48:46] BUG: Bad page map in process projecta  pte:1110111111111111 pmd:800000001c40003
>>> [20:48:46] addr:0000007fa1c00000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c00
>>> [20:48:46] file:(null) fault:0x0 mmap:0x0 read_folio:0x0

> 
>> I hate to say it, but this all looks like memory corruption hitting the
>> page table and possibly the 'struct page' array to me :/
> 
> Perhaps a note on the occcurence: across devices, the "bad page map" differs at pte, but somehow is mostly consistent at pmd:800000001c40003 (though I have seen 800000001c02003 and 800000001c40003). Is this some "magic value"? Because when not, I think it would be highly unlikely to be the hardware.
> 
> It is not only my program that has the problem, I have seen
> 
> [Sun Aug 14 17:30:38 2022] BUG: Bad page map in process llvmpipe-3  pte:262d2626292a2627 pmd:800000001c01003
> 
> and
> [Sat Aug 13 11:53:43 2022] BUG: Bad page map in process Xorg:disk$1  pte:a098a09aa29ea8a4 pmd:800000001c01003
> [Sat Aug 13 11:53:43 2022] addr:00000055a961e000 vm_flags:200100073 anon_vma:ffffff804c07d8f8 mapping:0000000000000000 index:55a961e
> [Sat Aug 13 11:53:43 2022] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
> 
> 
[..]

I am able to reproduce this on 6.0.0-rc1 . 
It looks like vm_normal_page does not recognize the page as being "normal" (?).
(mm/memory.c)
>               if (likely(!pte_special(pte)))
> 			goto check_pfn;
> 		if (vma->vm_ops && vma->vm_ops->find_special_page)
> 			return vma->vm_ops->find_special_page(vma, addr);
> 		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> 			return NULL;
> 		if (is_zero_pfn(pfn))
> 			return NULL;
> 		if (pte_devmap(pte))
>[...]
> 			return NULL;
> 
> 		print_bad_pte(vma, addr, pte, NULL);



What would be helpful to do next? Is the KASAN warning a consequent error or the cause?

[ 18:42:59] 
[ 18:44:17] BUG: Bad page map in process projecta  pte:212725231f242323 pmd:800000001c01003
[ 18:44:17] addr:0000007fa1000000 vm_flags:00100073 anon_vma:ffffff8054090c38 mapping:0000000000000000 index:7fa1000
[ 18:44:17] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
[ 18:44:17] CPU: 3 PID: 1135 Comm: projecta Tainted: G         C         6.0.0-rc1-v8-gc8f41281d1f4 #2
[ 18:44:17] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 18:44:17] Call trace:
[ 18:44:17] dump_backtrace.part.0 (arch/arm64/kernel/stacktrace.c:184) 
[ 18:44:17] show_stack (arch/arm64/kernel/stacktrace.c:191) 
[ 18:44:17] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) 
[ 18:44:17] dump_stack (lib/dump_stack.c:114) 
[ 18:44:17] print_bad_pte (mm/memory.c:567 (discriminator 12)) 
[ 18:44:17] vm_normal_page (mm/memory.c:638) 
[ 18:44:17] copy_page_range (mm/memory.c:951 mm/memory.c:1085 mm/memory.c:1171 mm/memory.c:1208 mm/memory.c:1232 mm/memory.c:1330) 
[ 18:44:17] dup_mm (kernel/fork.c:699 kernel/fork.c:1524) 
[ 18:44:17] copy_process (kernel/fork.c:1576 kernel/fork.c:2256) 
[ 18:44:17] kernel_clone (kernel/fork.c:2673) 
[ 18:44:17] __do_sys_clone (kernel/fork.c:2808) 
[ 18:44:17] __arm64_sys_clone (kernel/fork.c:2775) 
[ 18:44:17] invoke_syscall (arch/arm64/kernel/syscall.c:38 arch/arm64/kernel/syscall.c:52) 
[ 18:44:17] el0_svc_common.constprop.0 (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/syscall.c:150) 
[ 18:44:17] do_el0_svc (arch/arm64/kernel/syscall.c:207) 
[ 18:44:17] el0_svc (arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:625) 
[ 18:44:17] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:643) 
[ 18:44:17] el0t_64_sync (arch/arm64/kernel/entry.S:581) 
[ 18:44:17] Disabling lock debugging due to kernel taint
[ 18:44:17] BUG: Bad page map in process projecta  pte:2626262023222323 pmd:800000001c01003
[ 18:44:17] addr:0000007fa1001000 vm_flags:00100073 anon_vma:ffffff8054090c38 mapping:0000000000000000 index:7fa1001
[ 18:44:17] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
[ 18:44:17] CPU: 3 PID: 1135 Comm: projecta Tainted: G    B    C         6.0.0-rc1-v8-gc8f41281d1f4 #2
[ 18:44:17] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 18:44:17] Call trace:
[ 18:44:17] dump_backtrace.part.0 (arch/arm64/kernel/stacktrace.c:184) 
[ 18:44:17] show_stack (arch/arm64/kernel/stacktrace.c:191) 
[ 18:44:17] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) 
[ 18:44:17] dump_stack (lib/dump_stack.c:114) 
[ 18:44:17] print_bad_pte (mm/memory.c:567 (discriminator 12)) 
[ 18:44:17] vm_normal_page (mm/memory.c:638) 
[ 18:44:17] copy_page_range (mm/memory.c:951 mm/memory.c:1085 mm/memory.c:1171 mm/memory.c:1208 mm/memory.c:1232 mm/memory.c:1330) 
[ 18:44:17] dup_mm (kernel/fork.c:699 kernel/fork.c:1524) 
[ 18:44:17] copy_process (kernel/fork.c:1576 kernel/fork.c:2256) 
[ 18:44:17] kernel_clone (kernel/fork.c:2673) 
[ 18:44:17] __do_sys_clone (kernel/fork.c:2808) 
[ 18:44:17] __arm64_sys_clone (kernel/fork.c:2775) 
[ 18:44:17] invoke_syscall (arch/arm64/kernel/syscall.c:38 arch/arm64/kernel/syscall.c:52) 
[ 18:44:17] el0_svc_common.constprop.0 (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/syscall.c:150) 
[ 18:44:17] do_el0_svc (arch/arm64/kernel/syscall.c:207) 
[ 18:44:17] el0_svc (arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:625) 
[ 18:44:17] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:643) 
[ 18:44:17] el0t_64_sync (arch/arm64/kernel/entry.S:581) 
[ 18:44:17] ==================================================================
[ 18:44:17] BUG: KASAN: wild-memory-access in __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62) 
[ 18:44:17] Read of size 8 at addr 00000096808c8880 by task projecta/1135

[ 18:44:17] CPU: 3 PID: 1135 Comm: projecta Tainted: G    B    C         6.0.0-rc1-v8-gc8f41281d1f4 #2
[ 18:44:17] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 18:44:17] Call trace:
[ 18:44:17] dump_backtrace.part.0 (arch/arm64/kernel/stacktrace.c:184) 
[ 18:44:17] show_stack (arch/arm64/kernel/stacktrace.c:191) 
[ 18:44:17] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) 
[ 18:44:17] print_report (mm/kasan/report.c:438) 
[ 18:44:17] kasan_report (mm/kasan/report.c:162 mm/kasan/report.c:497) 
[ 18:44:17] __asan_load8 (mm/kasan/generic.c:256) 
[ 18:44:17] __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62) 
[ 18:44:17] copy_page_range (./arch/arm64/include/asm/pgtable.h:327 ./arch/arm64/include/asm/pgtable.h:358 mm/memory.c:994 mm/memory.c:1085 mm/memory.c:1171 mm/memory.c:1208 mm/memory.c:1232 mm/memory.c:1330) 
[ 18:44:17] dup_mm (kernel/fork.c:699 kernel/fork.c:1524) 
[ 18:44:17] copy_process (kernel/fork.c:1576 kernel/fork.c:2256) 
[ 18:44:17] kernel_clone (kernel/fork.c:2673) 
[ 18:44:17] __do_sys_clone (kernel/fork.c:2808) 
[ 18:44:17] __arm64_sys_clone (kernel/fork.c:2775) 
[ 18:44:17] invoke_syscall (arch/arm64/kernel/syscall.c:38 arch/arm64/kernel/syscall.c:52) 
[ 18:44:17] el0_svc_common.constprop.0 (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/syscall.c:150) 
[ 18:44:17] do_el0_svc (arch/arm64/kernel/syscall.c:207) 
[ 18:44:17] el0_svc (arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:625) 
[ 18:44:17] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:643) 
[ 18:44:17] el0t_64_sync (arch/arm64/kernel/entry.S:581) 
[ 18:44:17] ==================================================================
[ 18:44:17] Unable to handle kernel paging request at virtual address 00000096808c8880
[ 18:44:17] Mem abort info:
[ 18:44:17]   ESR = 0x0000000096000004
[ 18:44:17]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 18:44:17]   SET = 0, FnV = 0
[ 18:44:17]   EA = 0, S1PTW = 0
[ 18:44:17]   FSC = 0x04: level 0 translation fault
[ 18:44:17] Data abort info:
[ 18:44:17]   ISV = 0, ISS = 0x00000004
[ 18:44:17]   CM = 0, WnR = 0
[ 18:44:17] [00000096808c8880] address between user and kernel address ranges
[ 18:44:17] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 18:44:17] Modules linked in: rtc_pcf85063 regmap_i2c ov9281 rfkill bcm2835_unicam v4l2_dv_timings v4l2_fwnode v3d bcm2835_v4l2(C) v4l2_async bcm2835_codec(C) bcm2835_isp(C) videobuf2_vmalloc rpivid_hevc(C) v4l2_mem2mem drm_shmem_helper bcm2835_mmal_vchiq(C) gpu_sched videobuf2_dma_contig videobuf2_memops i2c_mux_pinctrl videobuf2_v4l2 videobuf2_common raspberrypi_hwmon i2c_mux videodev i2c_brcmstb i2c_bcm2835 vc_sm_cma(C) mc uio_pdrv_genirq nvmem_rmem uio drm fuse drm_panel_orientation_quirks backlight ipv6
[ 18:44:17] CPU: 3 PID: 1135 Comm: projecta Tainted: G    B    C         6.0.0-rc1-v8-gc8f41281d1f4 #2
[ 18:44:17] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 18:44:17] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 18:44:17] pc : __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62) 
[ 18:44:17] lr : __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62) 
[ 18:44:17] sp : ffffffc00d067630
[ 18:44:17] x29: ffffffc00d067630 x28: 0400000000000001 x27: 2626262023222323
[ 18:44:17] x26: 0000007fa1001000 x25: fffffffe010f2ce8 x24: 0000000000000000
[ 18:44:17] x23: fffffffe00000000 x22: 00000096808c8880 x21: 1ffffff801a0cece
[ 18:44:17] x20: 0000000000000000 x19: 00000098808c8880 x18: 0000000000000000
[ 18:44:17] x17: 3d3d3d3d3d3d3d3d x16: 3d3d3d3d3d3d3d3d x15: 3d3d3d3d3d3d3d3d
[ 18:44:17] x14: 3d3d3d3d3d3d3d3d x13: 3d3d3d3d3d3d3d3d x12: ffffffb8014cd81d
[ 18:44:17] x11: 1ffffff8014cd81c x10: ffffffb8014cd81c x9 : dfffffc000000000
[ 18:44:17] x8 : ffffffc00a66c0e7 x7 : 00000047feb327e4 x6 : 0000000000000001
[ 18:44:17] x5 : ffffffc00a66c0e0 x4 : ffffffb8014cd81d x3 : ffffffc0080b68e4
[ 18:44:17] x2 : 0000000000000000 x1 : ffffff804f3e0040 x0 : 0000000000000001
[ 18:44:17] Call trace:
[ 18:44:17] __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62) 
[ 18:44:17] copy_page_range (./arch/arm64/include/asm/pgtable.h:327 ./arch/arm64/include/asm/pgtable.h:358 mm/memory.c:994 mm/memory.c:1085 mm/memory.c:1171 mm/memory.c:1208 mm/memory.c:1232 mm/memory.c:1330) 
[ 18:44:17] dup_mm (kernel/fork.c:699 kernel/fork.c:1524) 
[ 18:44:17] copy_process (kernel/fork.c:1576 kernel/fork.c:2256) 
[ 18:44:17] kernel_clone (kernel/fork.c:2673) 
[ 18:44:17] __do_sys_clone (kernel/fork.c:2808) 
[ 18:44:17] __arm64_sys_clone (kernel/fork.c:2775) 
[ 18:44:17] invoke_syscall (arch/arm64/kernel/syscall.c:38 arch/arm64/kernel/syscall.c:52) 
[ 18:44:17] el0_svc_common.constprop.0 (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/syscall.c:150) 
[ 18:44:17] do_el0_svc (arch/arm64/kernel/syscall.c:207) 
[ 18:44:17] el0_svc (arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:625) 
[ 18:44:17] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:643) 
[ 18:44:17] el0t_64_sync (arch/arm64/kernel/entry.S:581) 
[ 18:44:17] Code: d37ae673 8b170276 aa1603e0 940f8ac1 (f8776a60)
All code
========
   0:	d37ae673 	lsl	x19, x19, #6
   4:	8b170276 	add	x22, x19, x23
   8:	aa1603e0 	mov	x0, x22
   c:	940f8ac1 	bl	0x3e2b10
  10:*	f8776a60 	ldr	x0, [x19, x23]		<-- trapping instruction

Code starting with the faulting instruction
===========================================
   0:	f8776a60 	ldr	x0, [x19, x23]
[ 18:44:17] ---[ end trace 0000000000000000 ]---
[ 18:44:17] note: projecta[1135] exited with preempt_count 2

Thanks,
Max

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall
@ 2022-08-18 17:14       ` Max Schulze
  0 siblings, 0 replies; 9+ messages in thread
From: Max Schulze @ 2022-08-18 17:14 UTC (permalink / raw)
  To: Will Deacon; +Cc: linux-arm-kernel, catalin.marinas, naush, linux-mm, akpm

Hello,


> On 15.08.22 16:22, Will Deacon wrote:
>>
>>>
[...]
>>>
>>>
>>> [20:47:09] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>> [20:48:46] BUG: Bad page map in process projecta  pte:1110111111111111 pmd:800000001c40003
>>> [20:48:46] addr:0000007fa1c00000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c00
>>> [20:48:46] file:(null) fault:0x0 mmap:0x0 read_folio:0x0

> 
>> I hate to say it, but this all looks like memory corruption hitting the
>> page table and possibly the 'struct page' array to me :/
> 
> Perhaps a note on the occcurence: across devices, the "bad page map" differs at pte, but somehow is mostly consistent at pmd:800000001c40003 (though I have seen 800000001c02003 and 800000001c40003). Is this some "magic value"? Because when not, I think it would be highly unlikely to be the hardware.
> 
> It is not only my program that has the problem, I have seen
> 
> [Sun Aug 14 17:30:38 2022] BUG: Bad page map in process llvmpipe-3  pte:262d2626292a2627 pmd:800000001c01003
> 
> and
> [Sat Aug 13 11:53:43 2022] BUG: Bad page map in process Xorg:disk$1  pte:a098a09aa29ea8a4 pmd:800000001c01003
> [Sat Aug 13 11:53:43 2022] addr:00000055a961e000 vm_flags:200100073 anon_vma:ffffff804c07d8f8 mapping:0000000000000000 index:55a961e
> [Sat Aug 13 11:53:43 2022] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
> 
> 
[..]

I am able to reproduce this on 6.0.0-rc1 . 
It looks like vm_normal_page does not recognize the page as being "normal" (?).
(mm/memory.c)
>               if (likely(!pte_special(pte)))
> 			goto check_pfn;
> 		if (vma->vm_ops && vma->vm_ops->find_special_page)
> 			return vma->vm_ops->find_special_page(vma, addr);
> 		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> 			return NULL;
> 		if (is_zero_pfn(pfn))
> 			return NULL;
> 		if (pte_devmap(pte))
>[...]
> 			return NULL;
> 
> 		print_bad_pte(vma, addr, pte, NULL);



What would be helpful to do next? Is the KASAN warning a consequent error or the cause?

[ 18:42:59] 
[ 18:44:17] BUG: Bad page map in process projecta  pte:212725231f242323 pmd:800000001c01003
[ 18:44:17] addr:0000007fa1000000 vm_flags:00100073 anon_vma:ffffff8054090c38 mapping:0000000000000000 index:7fa1000
[ 18:44:17] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
[ 18:44:17] CPU: 3 PID: 1135 Comm: projecta Tainted: G         C         6.0.0-rc1-v8-gc8f41281d1f4 #2
[ 18:44:17] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 18:44:17] Call trace:
[ 18:44:17] dump_backtrace.part.0 (arch/arm64/kernel/stacktrace.c:184) 
[ 18:44:17] show_stack (arch/arm64/kernel/stacktrace.c:191) 
[ 18:44:17] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) 
[ 18:44:17] dump_stack (lib/dump_stack.c:114) 
[ 18:44:17] print_bad_pte (mm/memory.c:567 (discriminator 12)) 
[ 18:44:17] vm_normal_page (mm/memory.c:638) 
[ 18:44:17] copy_page_range (mm/memory.c:951 mm/memory.c:1085 mm/memory.c:1171 mm/memory.c:1208 mm/memory.c:1232 mm/memory.c:1330) 
[ 18:44:17] dup_mm (kernel/fork.c:699 kernel/fork.c:1524) 
[ 18:44:17] copy_process (kernel/fork.c:1576 kernel/fork.c:2256) 
[ 18:44:17] kernel_clone (kernel/fork.c:2673) 
[ 18:44:17] __do_sys_clone (kernel/fork.c:2808) 
[ 18:44:17] __arm64_sys_clone (kernel/fork.c:2775) 
[ 18:44:17] invoke_syscall (arch/arm64/kernel/syscall.c:38 arch/arm64/kernel/syscall.c:52) 
[ 18:44:17] el0_svc_common.constprop.0 (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/syscall.c:150) 
[ 18:44:17] do_el0_svc (arch/arm64/kernel/syscall.c:207) 
[ 18:44:17] el0_svc (arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:625) 
[ 18:44:17] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:643) 
[ 18:44:17] el0t_64_sync (arch/arm64/kernel/entry.S:581) 
[ 18:44:17] Disabling lock debugging due to kernel taint
[ 18:44:17] BUG: Bad page map in process projecta  pte:2626262023222323 pmd:800000001c01003
[ 18:44:17] addr:0000007fa1001000 vm_flags:00100073 anon_vma:ffffff8054090c38 mapping:0000000000000000 index:7fa1001
[ 18:44:17] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
[ 18:44:17] CPU: 3 PID: 1135 Comm: projecta Tainted: G    B    C         6.0.0-rc1-v8-gc8f41281d1f4 #2
[ 18:44:17] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 18:44:17] Call trace:
[ 18:44:17] dump_backtrace.part.0 (arch/arm64/kernel/stacktrace.c:184) 
[ 18:44:17] show_stack (arch/arm64/kernel/stacktrace.c:191) 
[ 18:44:17] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) 
[ 18:44:17] dump_stack (lib/dump_stack.c:114) 
[ 18:44:17] print_bad_pte (mm/memory.c:567 (discriminator 12)) 
[ 18:44:17] vm_normal_page (mm/memory.c:638) 
[ 18:44:17] copy_page_range (mm/memory.c:951 mm/memory.c:1085 mm/memory.c:1171 mm/memory.c:1208 mm/memory.c:1232 mm/memory.c:1330) 
[ 18:44:17] dup_mm (kernel/fork.c:699 kernel/fork.c:1524) 
[ 18:44:17] copy_process (kernel/fork.c:1576 kernel/fork.c:2256) 
[ 18:44:17] kernel_clone (kernel/fork.c:2673) 
[ 18:44:17] __do_sys_clone (kernel/fork.c:2808) 
[ 18:44:17] __arm64_sys_clone (kernel/fork.c:2775) 
[ 18:44:17] invoke_syscall (arch/arm64/kernel/syscall.c:38 arch/arm64/kernel/syscall.c:52) 
[ 18:44:17] el0_svc_common.constprop.0 (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/syscall.c:150) 
[ 18:44:17] do_el0_svc (arch/arm64/kernel/syscall.c:207) 
[ 18:44:17] el0_svc (arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:625) 
[ 18:44:17] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:643) 
[ 18:44:17] el0t_64_sync (arch/arm64/kernel/entry.S:581) 
[ 18:44:17] ==================================================================
[ 18:44:17] BUG: KASAN: wild-memory-access in __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62) 
[ 18:44:17] Read of size 8 at addr 00000096808c8880 by task projecta/1135

[ 18:44:17] CPU: 3 PID: 1135 Comm: projecta Tainted: G    B    C         6.0.0-rc1-v8-gc8f41281d1f4 #2
[ 18:44:17] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 18:44:17] Call trace:
[ 18:44:17] dump_backtrace.part.0 (arch/arm64/kernel/stacktrace.c:184) 
[ 18:44:17] show_stack (arch/arm64/kernel/stacktrace.c:191) 
[ 18:44:17] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) 
[ 18:44:17] print_report (mm/kasan/report.c:438) 
[ 18:44:17] kasan_report (mm/kasan/report.c:162 mm/kasan/report.c:497) 
[ 18:44:17] __asan_load8 (mm/kasan/generic.c:256) 
[ 18:44:17] __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62) 
[ 18:44:17] copy_page_range (./arch/arm64/include/asm/pgtable.h:327 ./arch/arm64/include/asm/pgtable.h:358 mm/memory.c:994 mm/memory.c:1085 mm/memory.c:1171 mm/memory.c:1208 mm/memory.c:1232 mm/memory.c:1330) 
[ 18:44:17] dup_mm (kernel/fork.c:699 kernel/fork.c:1524) 
[ 18:44:17] copy_process (kernel/fork.c:1576 kernel/fork.c:2256) 
[ 18:44:17] kernel_clone (kernel/fork.c:2673) 
[ 18:44:17] __do_sys_clone (kernel/fork.c:2808) 
[ 18:44:17] __arm64_sys_clone (kernel/fork.c:2775) 
[ 18:44:17] invoke_syscall (arch/arm64/kernel/syscall.c:38 arch/arm64/kernel/syscall.c:52) 
[ 18:44:17] el0_svc_common.constprop.0 (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/syscall.c:150) 
[ 18:44:17] do_el0_svc (arch/arm64/kernel/syscall.c:207) 
[ 18:44:17] el0_svc (arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:625) 
[ 18:44:17] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:643) 
[ 18:44:17] el0t_64_sync (arch/arm64/kernel/entry.S:581) 
[ 18:44:17] ==================================================================
[ 18:44:17] Unable to handle kernel paging request at virtual address 00000096808c8880
[ 18:44:17] Mem abort info:
[ 18:44:17]   ESR = 0x0000000096000004
[ 18:44:17]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 18:44:17]   SET = 0, FnV = 0
[ 18:44:17]   EA = 0, S1PTW = 0
[ 18:44:17]   FSC = 0x04: level 0 translation fault
[ 18:44:17] Data abort info:
[ 18:44:17]   ISV = 0, ISS = 0x00000004
[ 18:44:17]   CM = 0, WnR = 0
[ 18:44:17] [00000096808c8880] address between user and kernel address ranges
[ 18:44:17] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 18:44:17] Modules linked in: rtc_pcf85063 regmap_i2c ov9281 rfkill bcm2835_unicam v4l2_dv_timings v4l2_fwnode v3d bcm2835_v4l2(C) v4l2_async bcm2835_codec(C) bcm2835_isp(C) videobuf2_vmalloc rpivid_hevc(C) v4l2_mem2mem drm_shmem_helper bcm2835_mmal_vchiq(C) gpu_sched videobuf2_dma_contig videobuf2_memops i2c_mux_pinctrl videobuf2_v4l2 videobuf2_common raspberrypi_hwmon i2c_mux videodev i2c_brcmstb i2c_bcm2835 vc_sm_cma(C) mc uio_pdrv_genirq nvmem_rmem uio drm fuse drm_panel_orientation_quirks backlight ipv6
[ 18:44:17] CPU: 3 PID: 1135 Comm: projecta Tainted: G    B    C         6.0.0-rc1-v8-gc8f41281d1f4 #2
[ 18:44:17] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[ 18:44:17] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 18:44:17] pc : __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62) 
[ 18:44:17] lr : __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62) 
[ 18:44:17] sp : ffffffc00d067630
[ 18:44:17] x29: ffffffc00d067630 x28: 0400000000000001 x27: 2626262023222323
[ 18:44:17] x26: 0000007fa1001000 x25: fffffffe010f2ce8 x24: 0000000000000000
[ 18:44:17] x23: fffffffe00000000 x22: 00000096808c8880 x21: 1ffffff801a0cece
[ 18:44:17] x20: 0000000000000000 x19: 00000098808c8880 x18: 0000000000000000
[ 18:44:17] x17: 3d3d3d3d3d3d3d3d x16: 3d3d3d3d3d3d3d3d x15: 3d3d3d3d3d3d3d3d
[ 18:44:17] x14: 3d3d3d3d3d3d3d3d x13: 3d3d3d3d3d3d3d3d x12: ffffffb8014cd81d
[ 18:44:17] x11: 1ffffff8014cd81c x10: ffffffb8014cd81c x9 : dfffffc000000000
[ 18:44:17] x8 : ffffffc00a66c0e7 x7 : 00000047feb327e4 x6 : 0000000000000001
[ 18:44:17] x5 : ffffffc00a66c0e0 x4 : ffffffb8014cd81d x3 : ffffffc0080b68e4
[ 18:44:17] x2 : 0000000000000000 x1 : ffffff804f3e0040 x0 : 0000000000000001
[ 18:44:17] Call trace:
[ 18:44:17] __sync_icache_dcache (./include/asm-generic/bitops/generic-non-atomic.h:127 arch/arm64/mm/flush.c:62) 
[ 18:44:17] copy_page_range (./arch/arm64/include/asm/pgtable.h:327 ./arch/arm64/include/asm/pgtable.h:358 mm/memory.c:994 mm/memory.c:1085 mm/memory.c:1171 mm/memory.c:1208 mm/memory.c:1232 mm/memory.c:1330) 
[ 18:44:17] dup_mm (kernel/fork.c:699 kernel/fork.c:1524) 
[ 18:44:17] copy_process (kernel/fork.c:1576 kernel/fork.c:2256) 
[ 18:44:17] kernel_clone (kernel/fork.c:2673) 
[ 18:44:17] __do_sys_clone (kernel/fork.c:2808) 
[ 18:44:17] __arm64_sys_clone (kernel/fork.c:2775) 
[ 18:44:17] invoke_syscall (arch/arm64/kernel/syscall.c:38 arch/arm64/kernel/syscall.c:52) 
[ 18:44:17] el0_svc_common.constprop.0 (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/syscall.c:150) 
[ 18:44:17] do_el0_svc (arch/arm64/kernel/syscall.c:207) 
[ 18:44:17] el0_svc (arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:625) 
[ 18:44:17] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:643) 
[ 18:44:17] el0t_64_sync (arch/arm64/kernel/entry.S:581) 
[ 18:44:17] Code: d37ae673 8b170276 aa1603e0 940f8ac1 (f8776a60)
All code
========
   0:	d37ae673 	lsl	x19, x19, #6
   4:	8b170276 	add	x22, x19, x23
   8:	aa1603e0 	mov	x0, x22
   c:	940f8ac1 	bl	0x3e2b10
  10:*	f8776a60 	ldr	x0, [x19, x23]		<-- trapping instruction

Code starting with the faulting instruction
===========================================
   0:	f8776a60 	ldr	x0, [x19, x23]
[ 18:44:17] ---[ end trace 0000000000000000 ]---
[ 18:44:17] note: projecta[1135] exited with preempt_count 2

Thanks,
Max


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall
  2022-08-18 17:14       ` Max Schulze
@ 2022-08-24 15:30         ` Will Deacon
  -1 siblings, 0 replies; 9+ messages in thread
From: Will Deacon @ 2022-08-24 15:30 UTC (permalink / raw)
  To: Max Schulze; +Cc: linux-arm-kernel, catalin.marinas, naush, linux-mm, akpm

On Thu, Aug 18, 2022 at 07:14:12PM +0200, Max Schulze wrote:
> > On 15.08.22 16:22, Will Deacon wrote:
> >>> [20:47:09] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> >>> [20:48:46] BUG: Bad page map in process projecta  pte:1110111111111111 pmd:800000001c40003
> >>> [20:48:46] addr:0000007fa1c00000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c00
> >>> [20:48:46] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
> 
> > 
> >> I hate to say it, but this all looks like memory corruption hitting the
> >> page table and possibly the 'struct page' array to me :/
> > 
> > Perhaps a note on the occcurence: across devices, the "bad page map"
> > differs at pte, but somehow is mostly consistent at pmd:800000001c40003
> > (though I have seen 800000001c02003 and 800000001c40003). Is this some
> > "magic value"? Because when not, I think it would be highly unlikely to
> > be the hardware.
> > 
> > It is not only my program that has the problem, I have seen
> > 
> > [Sun Aug 14 17:30:38 2022] BUG: Bad page map in process llvmpipe-3  pte:262d2626292a2627 pmd:800000001c01003
> > 
> > and
> > [Sat Aug 13 11:53:43 2022] BUG: Bad page map in process Xorg:disk$1  pte:a098a09aa29ea8a4 pmd:800000001c01003
> > [Sat Aug 13 11:53:43 2022] addr:00000055a961e000 vm_flags:200100073 anon_vma:ffffff804c07d8f8 mapping:0000000000000000 index:55a961e
> > [Sat Aug 13 11:53:43 2022] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
> > 
> > 
> [..]
> 
> I am able to reproduce this on 6.0.0-rc1 . 
> It looks like vm_normal_page does not recognize the page as being "normal" (?).
> (mm/memory.c)

I think the issue is much more fundamental than that; you appear to have
page-table corruption (for example, "pte:262d2626292a2627" and
"pte:1110111111111111" are definitely corrupted) and so anything dealing
with 'struct page' derived from the physical address in the pte is going to
go wonky.

From the logs here, the pmds look ok but these are the pte values I spotted:

0x1110111111111111
0x262d2626292a2627
0xa098a09aa29ea8a4
0x212725231f242323
0x2626262023222323

which don't seem to correspond to any sort of poison, but are possibly
artifacts of repeated patterns with random bits cleared?

Will


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall
@ 2022-08-24 15:30         ` Will Deacon
  0 siblings, 0 replies; 9+ messages in thread
From: Will Deacon @ 2022-08-24 15:30 UTC (permalink / raw)
  To: Max Schulze; +Cc: linux-arm-kernel, catalin.marinas, naush, linux-mm, akpm

On Thu, Aug 18, 2022 at 07:14:12PM +0200, Max Schulze wrote:
> > On 15.08.22 16:22, Will Deacon wrote:
> >>> [20:47:09] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> >>> [20:48:46] BUG: Bad page map in process projecta  pte:1110111111111111 pmd:800000001c40003
> >>> [20:48:46] addr:0000007fa1c00000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c00
> >>> [20:48:46] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
> 
> > 
> >> I hate to say it, but this all looks like memory corruption hitting the
> >> page table and possibly the 'struct page' array to me :/
> > 
> > Perhaps a note on the occcurence: across devices, the "bad page map"
> > differs at pte, but somehow is mostly consistent at pmd:800000001c40003
> > (though I have seen 800000001c02003 and 800000001c40003). Is this some
> > "magic value"? Because when not, I think it would be highly unlikely to
> > be the hardware.
> > 
> > It is not only my program that has the problem, I have seen
> > 
> > [Sun Aug 14 17:30:38 2022] BUG: Bad page map in process llvmpipe-3  pte:262d2626292a2627 pmd:800000001c01003
> > 
> > and
> > [Sat Aug 13 11:53:43 2022] BUG: Bad page map in process Xorg:disk$1  pte:a098a09aa29ea8a4 pmd:800000001c01003
> > [Sat Aug 13 11:53:43 2022] addr:00000055a961e000 vm_flags:200100073 anon_vma:ffffff804c07d8f8 mapping:0000000000000000 index:55a961e
> > [Sat Aug 13 11:53:43 2022] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
> > 
> > 
> [..]
> 
> I am able to reproduce this on 6.0.0-rc1 . 
> It looks like vm_normal_page does not recognize the page as being "normal" (?).
> (mm/memory.c)

I think the issue is much more fundamental than that; you appear to have
page-table corruption (for example, "pte:262d2626292a2627" and
"pte:1110111111111111" are definitely corrupted) and so anything dealing
with 'struct page' derived from the physical address in the pte is going to
go wonky.

From the logs here, the pmds look ok but these are the pte values I spotted:

0x1110111111111111
0x262d2626292a2627
0xa098a09aa29ea8a4
0x212725231f242323
0x2626262023222323

which don't seem to correspond to any sort of poison, but are possibly
artifacts of repeated patterns with random bits cleared?

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Aw: Re: BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall
  2022-08-24 15:30         ` Will Deacon
@ 2022-08-26  8:39           ` max.schulze
  -1 siblings, 0 replies; 9+ messages in thread
From: max.schulze @ 2022-08-26  8:39 UTC (permalink / raw)
  To: Will Deacon; +Cc: linux-arm-kernel, catalin.marinas, naush, linux-mm, akpm

> 
> 
> I think the issue is much more fundamental than that; you appear to have
> page-table corruption (for example, "pte:262d2626292a2627" and
> "pte:1110111111111111" are definitely corrupted) and so anything dealing
> with 'struct page' derived from the physical address in the pte is going to
> go wonky.
> 
>  From the logs here, the pmds look ok but these are the pte values I 
> spotted:
> 
> 0x1110111111111111
> 0x262d2626292a2627
> 0xa098a09aa29ea8a4
> 0x212725231f242323
> 0x2626262023222323
> 
> which don't seem to correspond to any sort of poison, but are possibly
> artifacts of repeated patterns with random bits cleared?
> 
> Will


Very valuable thought. I am streaming camera data, and generally the image captured from my desk is very dark (so these might be 8-bit brightness values from a mono ov9281 sensor).

I then set the camera to generate a test pattern, i.e. it will only produce output  like this

00000090: ffff ffff ffff ffff ffff 00ff 00ff 00ff
000000a0: 00ff 00ff 00ff 00ff 00ff 00ff 00ff 00ff

( also ff00 and 0000 ).

And those were the crashes that followed:

[   54.838582] BUG: Bad page map in process pool-upowerd  pte:ff00ffffffffff pmd:800000001801003
[   54.839187] BUG: Bad page map in process pool-upowerd  pte:ff00ff00ff00ff pmd:800000001801003
[   54.839615] BUG: Bad page map in process pool-upowerd  pte:ff00ff00ff00ff pmd:800000001801003
[   54.845439] BUG: Bad page map in process pool-upowerd  pte:ff00ff00ff00ff pmd:800000001801003

[   49.664066] BUG: Bad page map in process gmain  pte:ff00ffffffffff pmd:800000001801003
[   49.664515] BUG: Bad page map in process gmain  pte:ff00ff00ff00ff pmd:800000001801003
[   49.664879] BUG: Bad page map in process gmain  pte:ff00ff00ff00ff pmd:800000001801003
[   49.665245] BUG: Bad page map in process gmain  pte:ff00ff00ff00ff pmd:800000001801003
[   49.665610] BUG: Bad page map in process gmain  pte:ff00ff00ff00ff pmd:800000001801003
[   49.665973] BUG: Bad page map in process gmain  pte:ff00ff00ff00ff pmd:800000001801003
[   49.666336] BUG: Bad page map in process gmain  pte:ff00ff00ff00ff pmd:800000001801003
[   49.666698] BUG: Bad page map in process gmain  pte:ff00ff00ff00ff pmd:800000001801003
[   49.667061] BUG: Bad page map in process gmain  pte:ff00ff00ff00ff pmd:800000001801003

That confirms my suspicion of a dma write-out-of-bounds from a very separate angle (thanks again for the valuable idea).
I have since taken this to https://github.com/raspberrypi/linux/issues/5138 but will update the list once investigation has found a solution.

Thanks,
Max


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Aw: Re: BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall
@ 2022-08-26  8:39           ` max.schulze
  0 siblings, 0 replies; 9+ messages in thread
From: max.schulze @ 2022-08-26  8:39 UTC (permalink / raw)
  To: Will Deacon; +Cc: linux-arm-kernel, catalin.marinas, naush, linux-mm, akpm

> 
> 
> I think the issue is much more fundamental than that; you appear to have
> page-table corruption (for example, "pte:262d2626292a2627" and
> "pte:1110111111111111" are definitely corrupted) and so anything dealing
> with 'struct page' derived from the physical address in the pte is going to
> go wonky.
> 
>  From the logs here, the pmds look ok but these are the pte values I 
> spotted:
> 
> 0x1110111111111111
> 0x262d2626292a2627
> 0xa098a09aa29ea8a4
> 0x212725231f242323
> 0x2626262023222323
> 
> which don't seem to correspond to any sort of poison, but are possibly
> artifacts of repeated patterns with random bits cleared?
> 
> Will


Very valuable thought. I am streaming camera data, and generally the image captured from my desk is very dark (so these might be 8-bit brightness values from a mono ov9281 sensor).

I then set the camera to generate a test pattern, i.e. it will only produce output  like this

00000090: ffff ffff ffff ffff ffff 00ff 00ff 00ff
000000a0: 00ff 00ff 00ff 00ff 00ff 00ff 00ff 00ff

( also ff00 and 0000 ).

And those were the crashes that followed:

[   54.838582] BUG: Bad page map in process pool-upowerd  pte:ff00ffffffffff pmd:800000001801003
[   54.839187] BUG: Bad page map in process pool-upowerd  pte:ff00ff00ff00ff pmd:800000001801003
[   54.839615] BUG: Bad page map in process pool-upowerd  pte:ff00ff00ff00ff pmd:800000001801003
[   54.845439] BUG: Bad page map in process pool-upowerd  pte:ff00ff00ff00ff pmd:800000001801003

[   49.664066] BUG: Bad page map in process gmain  pte:ff00ffffffffff pmd:800000001801003
[   49.664515] BUG: Bad page map in process gmain  pte:ff00ff00ff00ff pmd:800000001801003
[   49.664879] BUG: Bad page map in process gmain  pte:ff00ff00ff00ff pmd:800000001801003
[   49.665245] BUG: Bad page map in process gmain  pte:ff00ff00ff00ff pmd:800000001801003
[   49.665610] BUG: Bad page map in process gmain  pte:ff00ff00ff00ff pmd:800000001801003
[   49.665973] BUG: Bad page map in process gmain  pte:ff00ff00ff00ff pmd:800000001801003
[   49.666336] BUG: Bad page map in process gmain  pte:ff00ff00ff00ff pmd:800000001801003
[   49.666698] BUG: Bad page map in process gmain  pte:ff00ff00ff00ff pmd:800000001801003
[   49.667061] BUG: Bad page map in process gmain  pte:ff00ff00ff00ff pmd:800000001801003

That confirms my suspicion of a dma write-out-of-bounds from a very separate angle (thanks again for the valuable idea).
I have since taken this to https://github.com/raspberrypi/linux/issues/5138 but will update the list once investigation has found a solution.

Thanks,
Max

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-08-26  8:44 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-12 20:01 BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall Max Schulze
2022-08-15 14:22 ` Will Deacon
2022-08-15 14:58   ` Max Schulze
2022-08-18 17:14     ` Max Schulze
2022-08-18 17:14       ` Max Schulze
2022-08-24 15:30       ` Will Deacon
2022-08-24 15:30         ` Will Deacon
2022-08-26  8:39         ` Aw: " max.schulze
2022-08-26  8:39           ` max.schulze

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.