* Arm64 crash while reading memory sysfs
@ 2021-05-25 15:25 ` Qian Cai (QUIC)
  0 siblings, 0 replies; 44+ messages in thread
From: Qian Cai (QUIC) @ 2021-05-25 15:25 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, David Hildenbrand, Catalin Marinas,
	Anshuman Khandual, Ard Biesheuvel, Linux Memory Management List,
	Will Deacon, Marc Zyngier, Linux Kernel Mailing List, Linux ARM

Reverting the patchset "arm64: drop pfn_valid_within() and simplify pfn_valid()" [1] from today's linux-next fixed a crash while reading files under /sys/devices/system/memory.

[1] https://lore.kernel.org/kvmarm/20210511100550.28178-1-rppt@kernel.org/

[  247.669668][ T1443] kernel BUG at include/linux/mm.h:1383!
[  247.675987][ T1443] Internal error: Oops - BUG: 0 [#1] SMP
[  247.681472][ T1443] Modules linked in: loop processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb i2c_algo_bit nvme mlx5_core i2c_core nvme_core firmware_class
[  247.696894][ T1443] CPU: 15 PID: 1443 Comm: ranbug Not tainted 5.13.0-rc3-next-20210524+ #11
[  247.705326][ T1443] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
[  247.713842][ T1443] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[  247.720536][ T1443] pc : test_pages_in_a_zone+0x23c/0x300
[  247.725935][ T1443] lr : test_pages_in_a_zone+0x23c/0x300
[  247.731327][ T1443] sp : ffff800023f8f670
[  247.735327][ T1443] x29: ffff800023f8f670 x28: 000000000000a000 x27: 000000000000a000
[  247.743156][ T1443] x26: ffffffbfffe00000 x25: ffff800011c6f738 x24: dfff800000000000
[  247.750984][ T1443] x23: 0000000000002000 x22: ffff009f7efa29c0 x21: 0000000000000000
[  247.758812][ T1443] x20: ffffffffffffffff x19: 0000000000008000 x18: ffff00084f9d3370
[  247.766640][ T1443] x17: 0000000000000000 x16: 0000000000000007 x15: 0000000000000078
[  247.774467][ T1443] x14: 0000000000000000 x13: ffff800011c6eea4 x12: ffff60136cee0574
[  247.782295][ T1443] x11: 1fffe0136cee0573 x10: ffff60136cee0573 x9 : dfff800000000000
[  247.790123][ T1443] x8 : ffff009b67702b9b x7 : 0000000000000001 x6 : ffff009b67702b98
[  247.797951][ T1443] x5 : 00009fec9311fa8d x4 : ffff009b67702b98 x3 : 1fffe00109f3a529
[  247.805778][ T1443] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000034
[  247.813606][ T1443] Call trace:
[  247.816738][ T1443]  test_pages_in_a_zone+0x23c/0x300
[  247.821784][ T1443]  valid_zones_show+0x1e0/0x298
[  247.826483][ T1443]  dev_attr_show+0x50/0xc8
[  247.830747][ T1443]  sysfs_kf_seq_show+0x164/0x368
[  247.835533][ T1443]  kernfs_seq_show+0x130/0x198
[  247.840143][ T1443]  seq_read_iter+0x344/0xd50
[  247.844581][ T1443]  kernfs_fop_read_iter+0x32c/0x4a8
[  247.849625][ T1443]  new_sync_read+0x2bc/0x4e8
[  247.854063][ T1443]  vfs_read+0x18c/0x340
[  247.858066][ T1443]  ksys_read+0xf8/0x1e0
[  247.862068][ T1443]  __arm64_sys_read+0x74/0xa8
[  247.866591][ T1443]  invoke_syscall.constprop.0+0xdc/0x1d8
[  247.872072][ T1443]  do_el0_svc+0xe4/0x298
[  247.876162][ T1443]  el0_svc+0x20/0x30
[  247.879906][ T1443]  el0_sync_handler+0xb0/0xb8
[  247.884429][ T1443]  el0_sync+0x178/0x180
[  247.888435][ T1443] Code: b0005ee1 912b8021 910b0021 97fc57ac (d4210000)
[  247.895217][ T1443] ---[ end trace 4ff9f5cbe7443f54 ]---
[  247.900522][ T1443] Kernel panic - not syncing: Oops - BUG: Fatal exception
[  247.907501][ T1443] SMP: stopping secondary CPUs
[  247.912122][ T1443] Kernel Offset: disabled
[  247.916296][ T1443] CPU features: 0x00000251,20000846
[  247.921340][ T1443] Memory Limit: none
[  247.925100][ T1443] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---



* Re: Arm64 crash while reading memory sysfs
  2021-05-25 15:25 ` Qian Cai (QUIC)
@ 2021-05-25 15:37   ` David Hildenbrand
  -1 siblings, 0 replies; 44+ messages in thread
From: David Hildenbrand @ 2021-05-25 15:37 UTC (permalink / raw)
  To: Qian Cai (QUIC), Mike Rapoport
  Cc: Andrew Morton, Catalin Marinas, Anshuman Khandual,
	Ard Biesheuvel, Linux Memory Management List, Will Deacon,
	Marc Zyngier, Linux Kernel Mailing List, Linux ARM

On 25.05.21 17:25, Qian Cai (QUIC) wrote:
> Reverting the patchset "arm64: drop pfn_valid_within() and simplify pfn_valid()" [1] from today's linux-next fixed a crash while reading files under /sys/devices/system/memory.
> 
> [1] https://lore.kernel.org/kvmarm/20210511100550.28178-1-rppt@kernel.org/
> 
> [  247.669668][ T1443] kernel BUG at include/linux/mm.h:1383!
> [  247.675987][ T1443] Internal error: Oops - BUG: 0 [#1] SMP
> [  247.681472][ T1443] Modules linked in: loop processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb i2c_algo_bit nvme mlx5_core i2c_core nvme_core firmware_class
> [  247.696894][ T1443] CPU: 15 PID: 1443 Comm: ranbug Not tainted 5.13.0-rc3-next-20210524+ #11
> [  247.705326][ T1443] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
> [  247.713842][ T1443] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> [  247.720536][ T1443] pc : test_pages_in_a_zone+0x23c/0x300
> [  247.725935][ T1443] lr : test_pages_in_a_zone+0x23c/0x300
> [  247.731327][ T1443] sp : ffff800023f8f670
> [  247.735327][ T1443] x29: ffff800023f8f670 x28: 000000000000a000 x27: 000000000000a000
> [  247.743156][ T1443] x26: ffffffbfffe00000 x25: ffff800011c6f738 x24: dfff800000000000
> [  247.750984][ T1443] x23: 0000000000002000 x22: ffff009f7efa29c0 x21: 0000000000000000
> [  247.758812][ T1443] x20: ffffffffffffffff x19: 0000000000008000 x18: ffff00084f9d3370
> [  247.766640][ T1443] x17: 0000000000000000 x16: 0000000000000007 x15: 0000000000000078
> [  247.774467][ T1443] x14: 0000000000000000 x13: ffff800011c6eea4 x12: ffff60136cee0574
> [  247.782295][ T1443] x11: 1fffe0136cee0573 x10: ffff60136cee0573 x9 : dfff800000000000
> [  247.790123][ T1443] x8 : ffff009b67702b9b x7 : 0000000000000001 x6 : ffff009b67702b98
> [  247.797951][ T1443] x5 : 00009fec9311fa8d x4 : ffff009b67702b98 x3 : 1fffe00109f3a529
> [  247.805778][ T1443] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000034
> [  247.813606][ T1443] Call trace:
> [  247.816738][ T1443]  test_pages_in_a_zone+0x23c/0x300
> [  247.821784][ T1443]  valid_zones_show+0x1e0/0x298
> [  247.826483][ T1443]  dev_attr_show+0x50/0xc8
> [  247.830747][ T1443]  sysfs_kf_seq_show+0x164/0x368
> [  247.835533][ T1443]  kernfs_seq_show+0x130/0x198
> [  247.840143][ T1443]  seq_read_iter+0x344/0xd50
> [  247.844581][ T1443]  kernfs_fop_read_iter+0x32c/0x4a8
> [  247.849625][ T1443]  new_sync_read+0x2bc/0x4e8
> [  247.854063][ T1443]  vfs_read+0x18c/0x340
> [  247.858066][ T1443]  ksys_read+0xf8/0x1e0
> [  247.862068][ T1443]  __arm64_sys_read+0x74/0xa8
> [  247.866591][ T1443]  invoke_syscall.constprop.0+0xdc/0x1d8
> [  247.872072][ T1443]  do_el0_svc+0xe4/0x298
> [  247.876162][ T1443]  el0_svc+0x20/0x30
> [  247.879906][ T1443]  el0_sync_handler+0xb0/0xb8
> [  247.884429][ T1443]  el0_sync+0x178/0x180
> [  247.888435][ T1443] Code: b0005ee1 912b8021 910b0021 97fc57ac (d4210000)
> [  247.895217][ T1443] ---[ end trace 4ff9f5cbe7443f54 ]---
> [  247.900522][ T1443] Kernel panic - not syncing: Oops - BUG: Fatal exception
> [  247.907501][ T1443] SMP: stopping secondary CPUs
> [  247.912122][ T1443] Kernel Offset: disabled
> [  247.916296][ T1443] CPU features: 0x00000251,20000846
> [  247.921340][ T1443] Memory Limit: none
> [  247.925100][ T1443] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---

That whole test_pages_in_a_zone() cruft has to go sooner or later. I have 
getting rid of that on my list (simply storing the single zone if any 
per memory block).

We run into an uninitialized memmap, because the poison check in 
page_zone()->page_to_nid() triggers. I assume the memmap of a memory 
hole does not get initialized properly?
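
To illustrate the failure mode hypothesized above, here is a minimal
user-space sketch in C. It is not kernel code: struct page_model, the
model_* helpers, the all-ones poison value and the position of the node
id inside flags are all assumptions made for this illustration. It only
shows how a lookup that insists on an initialized entry trips over a
range that was poisoned and never initialized afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: an uninitialized memmap entry has every bit of flags set. */
#define MODEL_POISON_PATTERN (~0UL)
/* Hypothetical layout: the node id lives in the top 8 bits of flags. */
#define MODEL_NID_SHIFT (sizeof(unsigned long) * 8 - 8)
#define MODEL_NID_MASK 0xffUL

/* Simplified stand-in for struct page; only the flags word is modelled. */
struct page_model {
	unsigned long flags;
};

/* Fill a memmap range with 0xff bytes, i.e. mark it as not yet initialized. */
static void model_poison_memmap(struct page_model *map, size_t nr)
{
	memset(map, 0xff, nr * sizeof(*map));
}

/* Initialize one entry, recording the node it belongs to. */
static void model_init_page(struct page_model *p, unsigned int nid)
{
	p->flags = ((unsigned long)nid & MODEL_NID_MASK) << MODEL_NID_SHIFT;
}

/* True if the entry still carries the poison pattern. */
static bool model_page_poisoned(const struct page_model *p)
{
	return p->flags == MODEL_POISON_PATTERN;
}

/* Node lookup that refuses to read a poisoned entry (the assert stands in for the BUG). */
static int model_page_to_nid(const struct page_model *p)
{
	assert(!model_page_poisoned(p));
	return (int)((p->flags >> MODEL_NID_SHIFT) & MODEL_NID_MASK);
}

/* Return the index of the first poisoned entry in [start, end), or -1 if none. */
static long model_first_poisoned(const struct page_model *map,
				 size_t start, size_t end)
{
	for (size_t i = start; i < end; i++)
		if (model_page_poisoned(&map[i]))
			return (long)i;
	return -1;
}
```

Poisoning eight entries, initializing all but index 5, and then calling
model_first_poisoned(map, 0, 8) reports index 5. Calling
model_page_to_nid() on that entry aborts on the assert, loosely
analogous to the BUG in the trace above.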

-- 
Thanks,

David / dhildenb



* Re: Arm64 crash while reading memory sysfs
  2021-05-25 15:25 ` Qian Cai (QUIC)
@ 2021-05-26  6:40   ` Mike Rapoport
  -1 siblings, 0 replies; 44+ messages in thread
From: Mike Rapoport @ 2021-05-26  6:40 UTC (permalink / raw)
  To: Qian Cai (QUIC)
  Cc: Andrew Morton, David Hildenbrand, Catalin Marinas,
	Anshuman Khandual, Ard Biesheuvel, Linux Memory Management List,
	Will Deacon, Marc Zyngier, Linux Kernel Mailing List, Linux ARM

Hi,

On Tue, May 25, 2021 at 03:25:59PM +0000, Qian Cai (QUIC) wrote:
> Reverting the patchset "arm64: drop pfn_valid_within() and simplify pfn_valid()" [1] from today's linux-next fixed a crash while reading files under /sys/devices/system/memory.

Can you please send the beginning of the boot log, up to the
	 "Memory: xK/yK available ..."
line?
 
> [1] https://lore.kernel.org/kvmarm/20210511100550.28178-1-rppt@kernel.org/
> 
> [  247.669668][ T1443] kernel BUG at include/linux/mm.h:1383!
> [  247.675987][ T1443] Internal error: Oops - BUG: 0 [#1] SMP
> [  247.681472][ T1443] Modules linked in: loop processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb i2c_algo_bit nvme mlx5_core i2c_core nvme_core firmware_class
> [  247.696894][ T1443] CPU: 15 PID: 1443 Comm: ranbug Not tainted 5.13.0-rc3-next-20210524+ #11
> [  247.705326][ T1443] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
> [  247.713842][ T1443] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> [  247.720536][ T1443] pc : test_pages_in_a_zone+0x23c/0x300
> [  247.725935][ T1443] lr : test_pages_in_a_zone+0x23c/0x300
> [  247.731327][ T1443] sp : ffff800023f8f670
> [  247.735327][ T1443] x29: ffff800023f8f670 x28: 000000000000a000 x27: 000000000000a000
> [  247.743156][ T1443] x26: ffffffbfffe00000 x25: ffff800011c6f738 x24: dfff800000000000
> [  247.750984][ T1443] x23: 0000000000002000 x22: ffff009f7efa29c0 x21: 0000000000000000
> [  247.758812][ T1443] x20: ffffffffffffffff x19: 0000000000008000 x18: ffff00084f9d3370
> [  247.766640][ T1443] x17: 0000000000000000 x16: 0000000000000007 x15: 0000000000000078
> [  247.774467][ T1443] x14: 0000000000000000 x13: ffff800011c6eea4 x12: ffff60136cee0574
> [  247.782295][ T1443] x11: 1fffe0136cee0573 x10: ffff60136cee0573 x9 : dfff800000000000
> [  247.790123][ T1443] x8 : ffff009b67702b9b x7 : 0000000000000001 x6 : ffff009b67702b98
> [  247.797951][ T1443] x5 : 00009fec9311fa8d x4 : ffff009b67702b98 x3 : 1fffe00109f3a529
> [  247.805778][ T1443] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000034
> [  247.813606][ T1443] Call trace:
> [  247.816738][ T1443]  test_pages_in_a_zone+0x23c/0x300
> [  247.821784][ T1443]  valid_zones_show+0x1e0/0x298
> [  247.826483][ T1443]  dev_attr_show+0x50/0xc8
> [  247.830747][ T1443]  sysfs_kf_seq_show+0x164/0x368
> [  247.835533][ T1443]  kernfs_seq_show+0x130/0x198
> [  247.840143][ T1443]  seq_read_iter+0x344/0xd50
> [  247.844581][ T1443]  kernfs_fop_read_iter+0x32c/0x4a8
> [  247.849625][ T1443]  new_sync_read+0x2bc/0x4e8
> [  247.854063][ T1443]  vfs_read+0x18c/0x340
> [  247.858066][ T1443]  ksys_read+0xf8/0x1e0
> [  247.862068][ T1443]  __arm64_sys_read+0x74/0xa8
> [  247.866591][ T1443]  invoke_syscall.constprop.0+0xdc/0x1d8
> [  247.872072][ T1443]  do_el0_svc+0xe4/0x298
> [  247.876162][ T1443]  el0_svc+0x20/0x30
> [  247.879906][ T1443]  el0_sync_handler+0xb0/0xb8
> [  247.884429][ T1443]  el0_sync+0x178/0x180
> [  247.888435][ T1443] Code: b0005ee1 912b8021 910b0021 97fc57ac (d4210000)
> [  247.895217][ T1443] ---[ end trace 4ff9f5cbe7443f54 ]---
> [  247.900522][ T1443] Kernel panic - not syncing: Oops - BUG: Fatal exception
> [  247.907501][ T1443] SMP: stopping secondary CPUs
> [  247.912122][ T1443] Kernel Offset: disabled
> [  247.916296][ T1443] CPU features: 0x00000251,20000846
> [  247.921340][ T1443] Memory Limit: none
> [  247.925100][ T1443] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---
> 

-- 
Sincerely yours,
Mike.


* RE: Arm64 crash while reading memory sysfs
  2021-05-26  6:40   ` Mike Rapoport
@ 2021-05-26 12:09     ` Qian Cai (QUIC)
  -1 siblings, 0 replies; 44+ messages in thread
From: Qian Cai (QUIC) @ 2021-05-26 12:09 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, David Hildenbrand, Catalin Marinas,
	Anshuman Khandual, Ard Biesheuvel, Linux Memory Management List,
	Will Deacon, Marc Zyngier, Linux Kernel Mailing List, Linux ARM



> -----Original Message-----
> From: Mike Rapoport <rppt@linux.ibm.com>
> Sent: Wednesday, May 26, 2021 2:40 AM
> To: Qian Cai (QUIC) <quic_qiancai@quicinc.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>; David Hildenbrand <david@redhat.com>; Catalin Marinas
> <catalin.marinas@arm.com>; Anshuman Khandual <anshuman.khandual@arm.com>; Ard Biesheuvel <ardb@kernel.org>; Linux
> Memory Management List <linux-mm@kvack.org>; Will Deacon <will@kernel.org>; Marc Zyngier <maz@kernel.org>; Linux Kernel
> Mailing List <linux-kernel@vger.kernel.org>; Linux ARM <linux-arm-kernel@lists.infradead.org>
> Subject: Re: Arm64 crash while reading memory sysfs
> 
> Hi,
> 
> On Tue, May 25, 2021 at 03:25:59PM +0000, Qian Cai (QUIC) wrote:
> > Reverting the patchset "arm64: drop pfn_valid_within() and simplify pfn_valid()" [1] from today's linux-next fixed a crash while reading files under /sys/devices/system/memory.
> 
> Can you please send the beginning of the boot log, up to the
> 	 "Memory: xK/yK available ..."
> line?

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x503f0002]
[    0.000000] Linux version 5.13.0-rc3-next-20210525+ (root@admin5) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #27 SMP Tue May 25 19:03:24 UTC 2021
[    0.000000] efi: EFI v2.70 by American Megatrends
[    0.000000] efi: ACPI 2.0=0x9ff5b40000 SMBIOS 3.0=0x9ff686fd98 ESRT=0x9ff1d18298 MEMRESERVE=0x9fe6dbed98
[    0.000000] esrt: Reserving ESRT space from 0x0000009ff1d18298 to 0x0000009ff1d182f8.
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x0000009FF5B40000 000024 (v02 ALASKA)
[    0.000000] ACPI: XSDT 0x0000009FF5B40028 000094 (v01 ALASKA A M I    01072009 AMI  00010013)
[    0.000000] ACPI: FACP 0x0000009FF5B400C0 000114 (v06 Ampere eMAG     00000003 INTL 20190509)
[    0.000000] ACPI: DSDT 0x0000009FF5B401D8 00765A (v05 ALASKA A M I    00000001 INTL 20190509)
[    0.000000] ACPI: FIDT 0x0000009FF5B47838 00009C (v01 ALASKA A M I    01072009 AMI  00010013)
[    0.000000] ACPI: DBG2 0x0000009FF5B478D8 000061 (v00 Ampere eMAG     00000000 INTL 20190509)
[    0.000000] ACPI: GTDT 0x0000009FF5B47940 000108 (v02 Ampere eMAG     00000001 INTL 20190509)
[    0.000000] ACPI: IORT 0x0000009FF5B47A48 000BCC (v00 Ampere eMAG     00000000 INTL 20190509)
[    0.000000] ACPI: MCFG 0x0000009FF5B48618 0000AC (v01 Ampere eMAG     00000001 INTL 20190509)
[    0.000000] ACPI: SSDT 0x0000009FF5B486C8 00002D (v02 Ampere eMAG     00000001 INTL 20190509)
[    0.000000] ACPI: SPMI 0x0000009FF5B486F8 000041 (v05 ALASKA A M I    00000000 AMI. 00000000)
[    0.000000] ACPI: APIC 0x0000009FF5B48740 000A68 (v04 Ampere eMAG     00000004      01000013)
[    0.000000] ACPI: PCCT 0x0000009FF5B491A8 0005D0 (v01 Ampere eMAG     00000003      01000013)
[    0.000000] ACPI: BERT 0x0000009FF5B49778 000030 (v01 Ampere eMAG     00000003 INTL 20190509)
[    0.000000] ACPI: HEST 0x0000009FF5B497A8 000328 (v01 Ampere eMAG     00000003 INTL 20190509)
[    0.000000] ACPI: SPCR 0x0000009FF5B49AD0 000050 (v02 A M I  APTIO V  01072009 AMI. 0005000D)
[    0.000000] ACPI: PPTT 0x0000009FF5B49B20 000CB8 (v01 Ampere eMAG     00000003      01000013)
[    0.000000] ACPI: SPCR: console: pl011,mmio32,0x12600000,115200
[    0.000000] NUMA: Failed to initialise from firmware
[    0.000000] NUMA: Faking a node at [mem 0x0000000090000000-0x0000009fffffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x9ffefbabc0-0x9ffefbffff]
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000090000000-0x0000009fffffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000090000000-0x0000000091ffffff]
[    0.000000]   node   0: [mem 0x0000000092000000-0x00000000928fffff]
[    0.000000]   node   0: [mem 0x0000000092900000-0x00000000fffbffff]
[    0.000000]   node   0: [mem 0x00000000fffc0000-0x00000000ffffffff]
[    0.000000]   node   0: [mem 0x0000000880000000-0x0000000fffffffff]
[    0.000000]   node   0: [mem 0x0000008800000000-0x0000009ff5aeffff]
[    0.000000]   node   0: [mem 0x0000009ff5af0000-0x0000009ff5b2ffff]
[    0.000000]   node   0: [mem 0x0000009ff5b30000-0x0000009ff5baffff]
[    0.000000]   node   0: [mem 0x0000009ff5bb0000-0x0000009ff7deffff]
[    0.000000]   node   0: [mem 0x0000009ff7df0000-0x0000009ff7e5ffff]
[    0.000000]   node   0: [mem 0x0000009ff7e60000-0x0000009ff7ffffff]
[    0.000000]   node   0: [mem 0x0000009ff8000000-0x0000009fffffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000090000000-0x0000009fffffffff]
[    0.000000] kasan: KernelAddressSanitizer initialized
[    0.000000] psci: probing for conduit method from ACPI.
[    0.000000] psci: PSCIv1.0 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: MIGRATE_INFO_TYPE not supported.
[    0.000000] psci: SMC Calling Convention v65535.65535
[    0.000000] ACPI: SRAT not present
[    0.000000] percpu: Embedded 10 pages/cpu s584592 r8192 d62576 u655360
[    0.000000] pcpu-alloc: s584592 r8192 d62576 u655360 alloc=10*65536
[    0.000000] pcpu-alloc: [0] 00 [0] 01 [0] 02 [0] 03 [0] 04 [0] 05 [0] 06 [0] 07
[    0.000000] pcpu-alloc: [0] 08 [0] 09 [0] 10 [0] 11 [0] 12 [0] 13 [0] 14 [0] 15
[    0.000000] pcpu-alloc: [0] 16 [0] 17 [0] 18 [0] 19 [0] 20 [0] 21 [0] 22 [0] 23
[    0.000000] pcpu-alloc: [0] 24 [0] 25 [0] 26 [0] 27 [0] 28 [0] 29 [0] 30 [0] 31
[    0.000000] Detected PIPT I-cache on CPU0
[    0.000000] CPU features: detected: GIC system register CPU interface
[    0.000000] CPU features: detected: Spectre-v2
[    0.000000] CPU features: detected: Spectre-v4
[    0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 2091012
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-5.13.0-rc3-next-20210525+ root=/dev/mapper/ubuntu--vg-ubuntu--lv ro cma=1024M iommu.passthrough=1
[    0.000000] Unknown command line parameters: BOOT_IMAGE=/vmlinuz-5.13.0-rc3-next-20210525+ cma=1024M
[    0.000000] Dentry cache hash table entries: 8388608 (order: 10, 67108864 bytes, linear)
[    0.000000] Inode-cache hash table entries: 4194304 (order: 9, 33554432 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:on, heap free:off
[    0.000000] Memory: 777216K/133955584K available (17920K kernel code, 118786K rwdata, 4416K rodata, 6080K init, 67276K bss, 17379072K reserved, 0K cma-reserved)

> 
> > [1] https://lore.kernel.org/kvmarm/20210511100550.28178-1-rppt@kernel.org/
> >
> > [  247.669668][ T1443] kernel BUG at include/linux/mm.h:1383!
> > [  247.675987][ T1443] Internal error: Oops - BUG: 0 [#1] SMP
> > [  247.681472][ T1443] Modules linked in: loop processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb i2c_algo_bit nvme mlx5_core i2c_core nvme_core firmware_class
> > [  247.696894][ T1443] CPU: 15 PID: 1443 Comm: ranbug Not tainted 5.13.0-rc3-next-20210524+ #11
> > [  247.705326][ T1443] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
> > [  247.713842][ T1443] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> > [  247.720536][ T1443] pc : test_pages_in_a_zone+0x23c/0x300
> > [  247.725935][ T1443] lr : test_pages_in_a_zone+0x23c/0x300
> > [  247.731327][ T1443] sp : ffff800023f8f670
> > [  247.735327][ T1443] x29: ffff800023f8f670 x28: 000000000000a000 x27: 000000000000a000
> > [  247.743156][ T1443] x26: ffffffbfffe00000 x25: ffff800011c6f738 x24: dfff800000000000
> > [  247.750984][ T1443] x23: 0000000000002000 x22: ffff009f7efa29c0 x21: 0000000000000000
> > [  247.758812][ T1443] x20: ffffffffffffffff x19: 0000000000008000 x18: ffff00084f9d3370
> > [  247.766640][ T1443] x17: 0000000000000000 x16: 0000000000000007 x15: 0000000000000078
> > [  247.774467][ T1443] x14: 0000000000000000 x13: ffff800011c6eea4 x12: ffff60136cee0574
> > [  247.782295][ T1443] x11: 1fffe0136cee0573 x10: ffff60136cee0573 x9 : dfff800000000000
> > [  247.790123][ T1443] x8 : ffff009b67702b9b x7 : 0000000000000001 x6 : ffff009b67702b98
> > [  247.797951][ T1443] x5 : 00009fec9311fa8d x4 : ffff009b67702b98 x3 : 1fffe00109f3a529
> > [  247.805778][ T1443] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000034
> > [  247.813606][ T1443] Call trace:
> > [  247.816738][ T1443]  test_pages_in_a_zone+0x23c/0x300
> > [  247.821784][ T1443]  valid_zones_show+0x1e0/0x298
> > [  247.826483][ T1443]  dev_attr_show+0x50/0xc8
> > [  247.830747][ T1443]  sysfs_kf_seq_show+0x164/0x368
> > [  247.835533][ T1443]  kernfs_seq_show+0x130/0x198
> > [  247.840143][ T1443]  seq_read_iter+0x344/0xd50
> > [  247.844581][ T1443]  kernfs_fop_read_iter+0x32c/0x4a8
> > [  247.849625][ T1443]  new_sync_read+0x2bc/0x4e8
> > [  247.854063][ T1443]  vfs_read+0x18c/0x340
> > [  247.858066][ T1443]  ksys_read+0xf8/0x1e0
> > [  247.862068][ T1443]  __arm64_sys_read+0x74/0xa8
> > [  247.866591][ T1443]  invoke_syscall.constprop.0+0xdc/0x1d8
> > [  247.872072][ T1443]  do_el0_svc+0xe4/0x298
> > [  247.876162][ T1443]  el0_svc+0x20/0x30
> > [  247.879906][ T1443]  el0_sync_handler+0xb0/0xb8
> > [  247.884429][ T1443]  el0_sync+0x178/0x180
> > [  247.888435][ T1443] Code: b0005ee1 912b8021 910b0021 97fc57ac (d4210000)
> > [  247.895217][ T1443] ---[ end trace 4ff9f5cbe7443f54 ]---
> > [  247.900522][ T1443] Kernel panic - not syncing: Oops - BUG: Fatal exception
> > [  247.907501][ T1443] SMP: stopping secondary CPUs
> > [  247.912122][ T1443] Kernel Offset: disabled
> > [  247.916296][ T1443] CPU features: 0x00000251,20000846
> > [  247.921340][ T1443] Memory Limit: none
> > [  247.925100][ T1443] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---
> >
> 
> --
> Sincerely yours,
> Mike.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Arm64 crash while reading memory sysfs
  2021-05-26 12:09     ` Qian Cai (QUIC)
@ 2021-05-26 13:04       ` Catalin Marinas
  -1 siblings, 0 replies; 44+ messages in thread
From: Catalin Marinas @ 2021-05-26 13:04 UTC (permalink / raw)
  To: Qian Cai (QUIC)
  Cc: Mike Rapoport, Andrew Morton, David Hildenbrand,
	Anshuman Khandual, Ard Biesheuvel, Linux Memory Management List,
	Will Deacon, Marc Zyngier, Linux Kernel Mailing List, Linux ARM

On Wed, May 26, 2021 at 12:09:14PM +0000, Qian Cai (QUIC) wrote:
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000090000000-0x0000000091ffffff]

Maybe de-selecting HOLES_IN_ZONE is not correct for arm64 in all
circumstances. In a configuration with 64K pages, MAX_ORDER is 14 and
MAX_ORDER_NR_PAGES is 8192, so a MAX_ORDER block covers a 2^29 address
range. However, the above range starts on a 2^28 boundary.

SECTION_SIZE_BITS is 29 in this configuration but the corresponding
mem_map[] in the first half of the first section is probably not marked
as reserved as we'd do for NOMAP.
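
The arithmetic above can be checked with a small standalone C sketch. This is
userspace code, not kernel code: the DEMO_* macros and demo_* helpers are
hypothetical names, and the relation MAX_ORDER_NR_PAGES = 1 << (MAX_ORDER - 1)
is assumed as it was defined in kernels of this period.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for the 64K-page arm64 configuration quoted in the thread. */
#define DEMO_PAGE_SHIFT          16                              /* 64K pages */
#define DEMO_MAX_ORDER           14
#define DEMO_MAX_ORDER_NR_PAGES  (1UL << (DEMO_MAX_ORDER - 1))   /* 8192 */

/* Bytes covered by one MAX_ORDER block: 8192 pages * 64K = 2^29. */
uint64_t demo_max_order_bytes(void)
{
    return (uint64_t)DEMO_MAX_ORDER_NR_PAGES << DEMO_PAGE_SHIFT;
}

/* True if addr is a multiple of align; align must be a power of two. */
bool demo_is_aligned(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}
```

With the first node range starting at 0x90000000, demo_is_aligned() holds for
an alignment of 2^28 but not for demo_max_order_bytes(), which is the
mismatch described above.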

-- 
Catalin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Arm64 crash while reading memory sysfs
  2021-05-26 12:09     ` Qian Cai (QUIC)
@ 2021-05-26 17:24       ` Mike Rapoport
  -1 siblings, 0 replies; 44+ messages in thread
From: Mike Rapoport @ 2021-05-26 17:24 UTC (permalink / raw)
  To: Qian Cai (QUIC)
  Cc: Andrew Morton, David Hildenbrand, Catalin Marinas,
	Anshuman Khandual, Ard Biesheuvel, Linux Memory Management List,
	Will Deacon, Marc Zyngier, Linux Kernel Mailing List, Linux ARM

On Wed, May 26, 2021 at 12:09:14PM +0000, Qian Cai (QUIC) wrote:
> > 
> > On Tue, May 25, 2021 at 03:25:59PM +0000, Qian Cai (QUIC) wrote:
> > > Reverting the patchset "arm64: drop pfn_valid_within() and simplify pfn_valid()" [1] from today's linux-next fixed a crash while reading files under /sys/devices/system/memory.

Does the issue persist if you only revert the latest patch in the series?
In next-20210525 it would be commit 
89fb47db72f2 ("arm64-drop-pfn_valid_within-and-simplify-pfn_valid-fix")
and commit
dfe215e9bac2 ("arm64: drop pfn_valid_within() and simplify pfn_valid()").

> > Can you please send the beginning of the boot log, up to the
> > 	 "Memory: xK/yK available ..."
> > line?
> 
> [    0.000000] NUMA: Failed to initialise from firmware
> [    0.000000] NUMA: Faking a node at [mem 0x0000000090000000-0x0000009fffffffff]
> [    0.000000] NUMA: NODE_DATA [mem 0x9ffefbabc0-0x9ffefbffff]
> [    0.000000] Zone ranges:
> [    0.000000]   Normal   [mem 0x0000000090000000-0x0000009fffffffff]
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000090000000-0x0000000091ffffff]
> [    0.000000]   node   0: [mem 0x0000000092000000-0x00000000928fffff]
> [    0.000000]   node   0: [mem 0x0000000092900000-0x00000000fffbffff]
> [    0.000000]   node   0: [mem 0x00000000fffc0000-0x00000000ffffffff]
> [    0.000000]   node   0: [mem 0x0000000880000000-0x0000000fffffffff]
> [    0.000000]   node   0: [mem 0x0000008800000000-0x0000009ff5aeffff]
> [    0.000000]   node   0: [mem 0x0000009ff5af0000-0x0000009ff5b2ffff]
> [    0.000000]   node   0: [mem 0x0000009ff5b30000-0x0000009ff5baffff]
> [    0.000000]   node   0: [mem 0x0000009ff5bb0000-0x0000009ff7deffff]
> [    0.000000]   node   0: [mem 0x0000009ff7df0000-0x0000009ff7e5ffff]
> [    0.000000]   node   0: [mem 0x0000009ff7e60000-0x0000009ff7ffffff]
> [    0.000000]   node   0: [mem 0x0000009ff8000000-0x0000009fffffffff]
> [    0.000000] Initmem setup node 0 [mem 0x0000000090000000-0x0000009fffffffff]
> [    0.000000] mem auto-init: stack:off, heap alloc:on, heap free:off
> [    0.000000] Memory: 777216K/133955584K available (17920K kernel code, 118786K rwdata, 4416K rodata, 6080K init, 67276K bss, 17379072K reserved, 0K cma-reserved)

The available and reserved sizes look weird. Can you post the log with
memblock=debug and mminit_loglevel=4 added to the kernel command line?
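
As a cross-check of the total in the quoted "Memory:" line, here is a minimal
standalone C sketch that sums the "Early memory node ranges" from the log. The
struct and function names are hypothetical; only the addresses are taken from
the log. It says nothing about why the available figure is so low.

```c
#include <stddef.h>
#include <stdint.h>

/* One "Early memory node ranges" entry; end is inclusive, as printed. */
struct demo_range {
    uint64_t start;
    uint64_t end;
};

/* Ranges copied from the quoted boot log. */
static const struct demo_range demo_ranges[] = {
    { 0x0000000090000000ULL, 0x0000000091ffffffULL },
    { 0x0000000092000000ULL, 0x00000000928fffffULL },
    { 0x0000000092900000ULL, 0x00000000fffbffffULL },
    { 0x00000000fffc0000ULL, 0x00000000ffffffffULL },
    { 0x0000000880000000ULL, 0x0000000fffffffffULL },
    { 0x0000008800000000ULL, 0x0000009ff5aeffffULL },
    { 0x0000009ff5af0000ULL, 0x0000009ff5b2ffffULL },
    { 0x0000009ff5b30000ULL, 0x0000009ff5baffffULL },
    { 0x0000009ff5bb0000ULL, 0x0000009ff7deffffULL },
    { 0x0000009ff7df0000ULL, 0x0000009ff7e5ffffULL },
    { 0x0000009ff7e60000ULL, 0x0000009ff7ffffffULL },
    { 0x0000009ff8000000ULL, 0x0000009fffffffffULL },
};

/* Sum of all range sizes, in KiB. */
uint64_t demo_total_kib(void)
{
    uint64_t bytes = 0;
    size_t i;

    for (i = 0; i < sizeof(demo_ranges) / sizeof(demo_ranges[0]); i++)
        bytes += demo_ranges[i].end - demo_ranges[i].start + 1;

    return bytes >> 10;
}
```

The sum comes to 133955584K, the same as the total in the "Memory:" line, so
the reported physical total is consistent with the node ranges.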
 
> > > [1] https://lore.kernel.org/kvmarm/20210511100550.28178-1-rppt@kernel.org/
> > >
> > > [  247.669668][ T1443] kernel BUG at include/linux/mm.h:1383!
> > > [  247.675987][ T1443] Internal error: Oops - BUG: 0 [#1] SMP
> > > [  247.681472][ T1443] Modules linked in: loop processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb i2c_algo_bit nvme mlx5_core i2c_core nvme_core firmware_class
> > > [  247.696894][ T1443] CPU: 15 PID: 1443 Comm: ranbug Not tainted 5.13.0-rc3-next-20210524+ #11
> > > [  247.705326][ T1443] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
> > > [  247.713842][ T1443] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> > > [  247.720536][ T1443] pc : test_pages_in_a_zone+0x23c/0x300
> > > [  247.725935][ T1443] lr : test_pages_in_a_zone+0x23c/0x300

Do we know what PFN triggers it? Can you please run with this patch:

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 70620d0dd923..b9d1dd0dae5f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1443,6 +1443,12 @@ struct zone *test_pages_in_a_zone(unsigned long start_pfn,
 				i++;
 			if (i == MAX_ORDER_NR_PAGES || pfn + i >= end_pfn)
 				continue;
+
+			if (!pfn_valid(pfn))
+				pr_info("%s: pfn %lx is not valid\n", __func__, pfn);
+			else if (PagePoisoned(pfn_to_page(pfn)))
+				dump_page(pfn_to_page(pfn), "");
+
 			/* Check if we got outside of the zone */
 			if (zone && !zone_spans_pfn(zone, pfn + i))
 				return NULL;


-- 
Sincerely yours,
Mike.

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: Arm64 crash while reading memory sysfs
  2021-05-26 13:04       ` Catalin Marinas
@ 2021-05-26 17:25         ` Mike Rapoport
  -1 siblings, 0 replies; 44+ messages in thread
From: Mike Rapoport @ 2021-05-26 17:25 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Qian Cai (QUIC),
	Andrew Morton, David Hildenbrand, Anshuman Khandual,
	Ard Biesheuvel, Linux Memory Management List, Will Deacon,
	Marc Zyngier, Linux Kernel Mailing List, Linux ARM

On Wed, May 26, 2021 at 02:04:26PM +0100, Catalin Marinas wrote:
> On Wed, May 26, 2021 at 12:09:14PM +0000, Qian Cai (QUIC) wrote:
> > [    0.000000] Early memory node ranges
> > [    0.000000]   node   0: [mem 0x0000000090000000-0x0000000091ffffff]
> 
> Maybe de-selecting HOLES_IN_ZONE is not correct for arm64 in all
> circumstances. In a configuration with 64K pages, MAX_ORDER is 14,
> MAX_ORDER_NR_PAGES is 8192, so a 2^29 address range. However, the above
> range starts on 2^28 boundary.
> 
> SECTION_SIZE_BITS is 29 in this configuration but the corresponding
> mem_map[] in the first half of the first section is probably not marked
> as reserved as we'd do for NOMAP.

We do initialize (or at least we should) the first half of the first section
in page_alloc::init_unavailable_range(), so the range [0x80000000 - 0x90000000]
will have struct pages marked as reserved.
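
To make the hole concrete, here is a minimal userspace C sketch. It assumes
SECTION_SIZE_BITS = 29 and 64K pages as discussed in this thread, and the
HOLE_* macros and hole_* helpers are hypothetical names. It derives the start
of the section containing the first memory range and the number of struct
pages that fall into the hole below it.

```c
#include <stdint.h>

/* Assumed values for the configuration discussed in the thread. */
#define HOLE_SECTION_SIZE_BITS  29   /* 512M sections */
#define HOLE_PAGE_SHIFT         16   /* 64K pages */

/* Start address of the sparsemem section that contains addr. */
uint64_t hole_section_start(uint64_t addr)
{
    return addr & ~((1ULL << HOLE_SECTION_SIZE_BITS) - 1);
}

/* Number of pages between the section start and the first real memory. */
uint64_t hole_nr_pages(uint64_t mem_start)
{
    return (mem_start - hole_section_start(mem_start)) >> HOLE_PAGE_SHIFT;
}
```

For memory starting at 0x90000000 the section starts at 0x80000000, so the
hole spans 0x80000000-0x8fffffff, i.e. 4096 pages of 64K.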

I think it should be fine to de-select HOLES_IN_ZONE as long as a MAX_ORDER
chunk does not exceed a section, because in that case we do have a memory map
there, and what HOLES_IN_ZONE and pfn_valid_within() protected against was
access to non-existent memory map entries.
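
The condition "a MAX_ORDER chunk does not exceed a section" can be written as
a simple inequality. The sketch below is userspace C with a hypothetical
helper name, and it assumes a MAX_ORDER chunk is 2^(MAX_ORDER - 1) pages.

```c
#include <stdbool.h>

/*
 * True if one MAX_ORDER chunk (2^(max_order - 1) pages of 2^page_shift
 * bytes) fits inside one section of 2^section_size_bits bytes.
 */
bool demo_max_order_fits_section(unsigned int page_shift,
                                 unsigned int max_order,
                                 unsigned int section_size_bits)
{
    return (max_order - 1) + page_shift <= section_size_bits;
}
```

For the 64K-page values quoted above (page shift 16, MAX_ORDER 14,
SECTION_SIZE_BITS 29) the chunk is exactly one section in size, so the
condition holds.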

We still have an issue with memory map initialization, though, and I have
probably missed something when decoupling "do we have memory there" from
pfn_valid().

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Arm64 crash while reading memory sysfs
  2021-05-26 17:24       ` Mike Rapoport
@ 2021-05-27  0:16         ` Qian Cai
  -1 siblings, 0 replies; 44+ messages in thread
From: Qian Cai @ 2021-05-27  0:16 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, David Hildenbrand, Catalin Marinas,
	Anshuman Khandual, Ard Biesheuvel, Linux Memory Management List,
	Will Deacon, Marc Zyngier, Linux Kernel Mailing List, Linux ARM



On 5/26/2021 1:24 PM, Mike Rapoport wrote:
> On Wed, May 26, 2021 at 12:09:14PM +0000, Qian Cai (QUIC) wrote:
>>>
>>> On Tue, May 25, 2021 at 03:25:59PM +0000, Qian Cai (QUIC) wrote:
>>>> Reverting the patchset "arm64: drop pfn_valid_within() and simplify pfn_valid()" [1] from today's linux-next fixed a crash while reading files under /sys/devices/system/memory.
> 
> Does the issue persist if you only revert the latest patch in the series?
> In next-20210525 it would be commit 
> 89fb47db72f2 ("arm64-drop-pfn_valid_within-and-simplify-pfn_valid-fix")
> and commit
> dfe215e9bac2 ("arm64: drop pfn_valid_within() and simplify pfn_valid()").

Reverting those two commits alone is enough to fix the issue.

> 
>>> Can you please send the beginning of the boot log, up to the
>>> 	 "Memory: xK/yK available ..."
>>> line?
>>
>> [    0.000000] NUMA: Failed to initialise from firmware
>> [    0.000000] NUMA: Faking a node at [mem 0x0000000090000000-0x0000009fffffffff]
>> [    0.000000] NUMA: NODE_DATA [mem 0x9ffefbabc0-0x9ffefbffff]
>> [    0.000000] Zone ranges:
>> [    0.000000]   Normal   [mem 0x0000000090000000-0x0000009fffffffff]
>> [    0.000000] Movable zone start for each node
>> [    0.000000] Early memory node ranges
>> [    0.000000]   node   0: [mem 0x0000000090000000-0x0000000091ffffff]
>> [    0.000000]   node   0: [mem 0x0000000092000000-0x00000000928fffff]
>> [    0.000000]   node   0: [mem 0x0000000092900000-0x00000000fffbffff]
>> [    0.000000]   node   0: [mem 0x00000000fffc0000-0x00000000ffffffff]
>> [    0.000000]   node   0: [mem 0x0000000880000000-0x0000000fffffffff]
>> [    0.000000]   node   0: [mem 0x0000008800000000-0x0000009ff5aeffff]
>> [    0.000000]   node   0: [mem 0x0000009ff5af0000-0x0000009ff5b2ffff]
>> [    0.000000]   node   0: [mem 0x0000009ff5b30000-0x0000009ff5baffff]
>> [    0.000000]   node   0: [mem 0x0000009ff5bb0000-0x0000009ff7deffff]
>> [    0.000000]   node   0: [mem 0x0000009ff7df0000-0x0000009ff7e5ffff]
>> [    0.000000]   node   0: [mem 0x0000009ff7e60000-0x0000009ff7ffffff]
>> [    0.000000]   node   0: [mem 0x0000009ff8000000-0x0000009fffffffff]
>> [    0.000000] Initmem setup node 0 [mem 0x0000000090000000-0x0000009fffffffff]
>> [    0.000000] mem auto-init: stack:off, heap alloc:on, heap free:off
>> [    0.000000] Memory: 777216K/133955584K available (17920K kernel code, 118786K rwdata, 4416K rodata, 6080K init, 67276K bss, 17379072K reserved, 0K cma-reserved)
> 
> The available and reserved sizes look weird. Can you post the log with
> memblock=debug and mminit_loglevel=4 added to the kernel command line?

http://www.lsbug.org/tmp/dmesg.txt

>  
>>>> [1] https://lore.kernel.org/kvmarm/20210511100550.28178-1-rppt@kernel.org/
>>>>
>>>> [  247.669668][ T1443] kernel BUG at include/linux/mm.h:1383!
>>>> [  247.675987][ T1443] Internal error: Oops - BUG: 0 [#1] SMP
>>>> [  247.681472][ T1443] Modules linked in: loop processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb i2c_algo_bit nvme mlx5_core i2c_core nvme_core firmware_class
>>>> [  247.696894][ T1443] CPU: 15 PID: 1443 Comm: ranbug Not tainted 5.13.0-rc3-next-20210524+ #11
>>>> [  247.705326][ T1443] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
>>>> [  247.713842][ T1443] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
>>>> [  247.720536][ T1443] pc : test_pages_in_a_zone+0x23c/0x300
>>>> [  247.725935][ T1443] lr : test_pages_in_a_zone+0x23c/0x300
> 
> Do we know what PFN triggers it? Can you please run with this patch:

Nothing useful showed up with this patch. Yes, I double-checked that the patch was applied.

> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 70620d0dd923..b9d1dd0dae5f 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1443,6 +1443,12 @@ struct zone *test_pages_in_a_zone(unsigned long start_pfn,
>  				i++;
>  			if (i == MAX_ORDER_NR_PAGES || pfn + i >= end_pfn)
>  				continue;
> +
> +			if (!pfn_valid(pfn))
> +				pr_info("%s: pfn %lx is not valid\n", __func__, pfn);
> +			else if (PagePoisoned(pfn_to_page(pfn)))
> +				dump_page(pfn_to_page(pfn), "");
> +
>  			/* Check if we got outside of the zone */
>  			if (zone && !zone_spans_pfn(zone, pfn + i))
>  				return NULL;
> 
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Arm64 crash while reading memory sysfs
  2021-05-27  0:16         ` Qian Cai
@ 2021-05-27  0:31           ` Andrew Morton
  -1 siblings, 0 replies; 44+ messages in thread
From: Andrew Morton @ 2021-05-27  0:31 UTC (permalink / raw)
  To: Qian Cai
  Cc: Mike Rapoport, David Hildenbrand, Catalin Marinas,
	Anshuman Khandual, Ard Biesheuvel, Linux Memory Management List,
	Will Deacon, Marc Zyngier, Linux Kernel Mailing List, Linux ARM,
	Stephen Rothwell

On Wed, 26 May 2021 20:16:14 -0400 Qian Cai <quic_qiancai@quicinc.com> wrote:

> 
> 
> On 5/26/2021 1:24 PM, Mike Rapoport wrote:
> > On Wed, May 26, 2021 at 12:09:14PM +0000, Qian Cai (QUIC) wrote:
> >>>
> >>> On Tue, May 25, 2021 at 03:25:59PM +0000, Qian Cai (QUIC) wrote:
> >>>> Reverting the patchset "arm64: drop pfn_valid_within() and simplify pfn_valid()" [1] from today's linux-next fixed a crash while reading files under /sys/devices/system/memory.
> > 
> > Does the issue persist if you only revert the latest patch in the series?
> > In next-20210525 it would be commit 
> > 89fb47db72f2 ("arm64-drop-pfn_valid_within-and-simplify-pfn_valid-fix")
> > and commit
> > dfe215e9bac2 ("arm64: drop pfn_valid_within() and simplify pfn_valid()").
> 
> Reverting those two commits alone is enough to fix the issue.

(cc Stephen)

Thanks, I'll drop

arm64-drop-pfn_valid_within-and-simplify-pfn_valid.patch
arm64-drop-pfn_valid_within-and-simplify-pfn_valid-fix.patch


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Arm64 crash while reading memory sysfs
  2021-05-27  0:31           ` Andrew Morton
@ 2021-05-27  7:25             ` Stephen Rothwell
  -1 siblings, 0 replies; 44+ messages in thread
From: Stephen Rothwell @ 2021-05-27  7:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Qian Cai, Mike Rapoport, David Hildenbrand, Catalin Marinas,
	Anshuman Khandual, Ard Biesheuvel, Linux Memory Management List,
	Will Deacon, Marc Zyngier, Linux Kernel Mailing List, Linux ARM

[-- Attachment #1: Type: text/plain, Size: 1250 bytes --]

Hi Andrew,

On Wed, 26 May 2021 17:31:41 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 26 May 2021 20:16:14 -0400 Qian Cai <quic_qiancai@quicinc.com> wrote:
> 
> > 
> > 
> > On 5/26/2021 1:24 PM, Mike Rapoport wrote:  
> > > On Wed, May 26, 2021 at 12:09:14PM +0000, Qian Cai (QUIC) wrote:  
> > >>>
> > >>> On Tue, May 25, 2021 at 03:25:59PM +0000, Qian Cai (QUIC) wrote:  
> > >>>> Reverting the patchset "arm64: drop pfn_valid_within() and simplify pfn_valid()" [1] from today's linux-next fixed a crash while reading files under /sys/devices/system/memory.
> > > 
> > > Does the issue persist if you only revert the latest patch in the series?
> > > In next-20210525 it would be commit 
> > > 89fb47db72f2 ("arm64-drop-pfn_valid_within-and-simplify-pfn_valid-fix")
> > > and commit
> > > dfe215e9bac2 ("arm64: drop pfn_valid_within() and simplify pfn_valid()").  
> > 
> > Reverting those two commits alone is enough to fix the issue.  
> 
> (cc Stephen)
> 
> Thanks, I'll drop
> 
> arm64-drop-pfn_valid_within-and-simplify-pfn_valid.patch
> arm64-drop-pfn_valid_within-and-simplify-pfn_valid-fix.patch

Reverted from linux-next for today as well.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Arm64 crash while reading memory sysfs
  2021-05-27  0:16         ` Qian Cai
@ 2021-05-27  8:56           ` Mike Rapoport
  -1 siblings, 0 replies; 44+ messages in thread
From: Mike Rapoport @ 2021-05-27  8:56 UTC (permalink / raw)
  To: Qian Cai
  Cc: Andrew Morton, David Hildenbrand, Catalin Marinas,
	Anshuman Khandual, Ard Biesheuvel, Linux Memory Management List,
	Will Deacon, Marc Zyngier, Linux Kernel Mailing List, Linux ARM

On Wed, May 26, 2021 at 08:16:14PM -0400, Qian Cai wrote:
> 
> On 5/26/2021 1:24 PM, Mike Rapoport wrote:
> > On Wed, May 26, 2021 at 12:09:14PM +0000, Qian Cai (QUIC) wrote:
> >>>
> >>> On Tue, May 25, 2021 at 03:25:59PM +0000, Qian Cai (QUIC) wrote:
> >>>> Reverting the patchset "arm64: drop pfn_valid_within() and simplify pfn_valid()" [1] from today's linux-next fixed a crash while reading files under /sys/devices/system/memory.
> > 
> > Does the issue persist if you only revert the latest patch in the series?
> > In next-20210525 it would be commit 
> > 89fb47db72f2 ("arm64-drop-pfn_valid_within-and-simplify-pfn_valid-fix")
> > and commit
> > dfe215e9bac2 ("arm64: drop pfn_valid_within() and simplify pfn_valid()").
> 
> Reverting those two commits alone is enough to fix the issue.
> 
> > 
> >>> Can you please send the beginning of the boot log, up to the
> >>> 	 "Memory: xK/yK available ..."
> >>> line?
> >>
> >> [    0.000000] NUMA: Failed to initialise from firmware
> >> [    0.000000] NUMA: Faking a node at [mem 0x0000000090000000-0x0000009fffffffff]
> >> [    0.000000] NUMA: NODE_DATA [mem 0x9ffefbabc0-0x9ffefbffff]
> >> [    0.000000] Zone ranges:
> >> [    0.000000]   Normal   [mem 0x0000000090000000-0x0000009fffffffff]
> >> [    0.000000] Movable zone start for each node
> >> [    0.000000] Early memory node ranges
> >> [    0.000000]   node   0: [mem 0x0000000090000000-0x0000000091ffffff]
> >> [    0.000000]   node   0: [mem 0x0000000092000000-0x00000000928fffff]
> >> [    0.000000]   node   0: [mem 0x0000000092900000-0x00000000fffbffff]
> >> [    0.000000]   node   0: [mem 0x00000000fffc0000-0x00000000ffffffff]
> >> [    0.000000]   node   0: [mem 0x0000000880000000-0x0000000fffffffff]
> >> [    0.000000]   node   0: [mem 0x0000008800000000-0x0000009ff5aeffff]
> >> [    0.000000]   node   0: [mem 0x0000009ff5af0000-0x0000009ff5b2ffff]
> >> [    0.000000]   node   0: [mem 0x0000009ff5b30000-0x0000009ff5baffff]
> >> [    0.000000]   node   0: [mem 0x0000009ff5bb0000-0x0000009ff7deffff]
> >> [    0.000000]   node   0: [mem 0x0000009ff7df0000-0x0000009ff7e5ffff]
> >> [    0.000000]   node   0: [mem 0x0000009ff7e60000-0x0000009ff7ffffff]
> >> [    0.000000]   node   0: [mem 0x0000009ff8000000-0x0000009fffffffff]
> >> [    0.000000] Initmem setup node 0 [mem 0x0000000090000000-0x0000009fffffffff]
> >> [    0.000000] mem auto-init: stack:off, heap alloc:on, heap free:off
> >> [    0.000000] Memory: 777216K/133955584K available (17920K kernel code, 118786K rwdata, 4416K rodata, 6080K init, 67276K bss, 17379072K reserved, 0K cma-reserved)
> > 
> > The available and reserved sizes look weird. Can you post the log with
> > memblock=debug and mminit_loglevel=4 added to the kernel command line?
> 
> http://www.lsbug.org/tmp/dmesg.txt

It seems cut in the middle and even then it's too long to be useful.

Let's drop memblock=debug for now and add this instead:

diff --git a/mm/memblock.c b/mm/memblock.c
index afaefa8fc6ab..3f888bef1994 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2055,6 +2055,8 @@ void __init memblock_free_all(void)
 {
 	unsigned long pages;
 
+	__memblock_dump_all();
+
 	free_unused_memmap();
 	reset_all_zones_managed_pages();
 
> >>>> [1] https://lore.kernel.org/kvmarm/20210511100550.28178-1-rppt@kernel.org/
> >>>>
> >>>> [  247.669668][ T1443] kernel BUG at include/linux/mm.h:1383!
> >>>> [  247.675987][ T1443] Internal error: Oops - BUG: 0 [#1] SMP
> >>>> [  247.681472][ T1443] Modules linked in: loop processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb i2c_algo_bit nvme mlx5_core i2c_core nvme_core firmware_class
> >>>> [  247.696894][ T1443] CPU: 15 PID: 1443 Comm: ranbug Not tainted 5.13.0-rc3-next-20210524+ #11
> >>>> [  247.705326][ T1443] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
> >>>> [  247.713842][ T1443] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> >>>> [  247.720536][ T1443] pc : test_pages_in_a_zone+0x23c/0x300
> >>>> [  247.725935][ T1443] lr : test_pages_in_a_zone+0x23c/0x300
> > 
> > Do we know what PFN triggers it? Can you please run with this patch:
> 
> Nothing useful showed up with this patch. Yes, I double-checked that the patch was applied.

Sorry, I've missed that the BUG is apparently triggered for pfn + i. Can
you please try this instead:


diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 70620d0dd923..d0e42e09ad84 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1447,6 +1447,13 @@ struct zone *test_pages_in_a_zone(unsigned long start_pfn,
 			if (zone && !zone_spans_pfn(zone, pfn + i))
 				return NULL;
 			page = pfn_to_page(pfn + i);
+
+			if (!pfn_valid(pfn + i))
+				pr_info("%s: pfn %lx is not valid\n", __func__, pfn + i);
+			else if (PagePoisoned(page))
+				dump_page(page, "");
+
+
 			if (zone && page_zone(page) != zone)
 				return NULL;
 			zone = page_zone(page);
 
-- 
Sincerely yours,
Mike.

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: Arm64 crash while reading memory sysfs
  2021-05-27  8:56           ` Mike Rapoport
@ 2021-05-27 14:33             ` Qian Cai
  -1 siblings, 0 replies; 44+ messages in thread
From: Qian Cai @ 2021-05-27 14:33 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, David Hildenbrand, Catalin Marinas,
	Anshuman Khandual, Ard Biesheuvel, Linux Memory Management List,
	Will Deacon, Marc Zyngier, Linux Kernel Mailing List, Linux ARM



On 5/27/2021 4:56 AM, Mike Rapoport wrote:
> Let's drop memblock=debug for now and add this instead:

[    0.000000][    T0] Booting Linux on physical CPU 0x0000000000 [0x503f0002]
[    0.000000][    T0] Linux version 5.13.0-rc3-next-20210526+ (root@admin5) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #31 SMP Thu May 27 12:32:40 UTC 2021
[    0.000000][    T0] efi: EFI v2.70 by American Megatrends
[    0.000000][    T0] efi: ACPI 2.0=0x9ff5b40000 SMBIOS 3.0=0x9ff686fd98 ESRT=0x9ff1d18298 MEMRESERVE=0x9fe6dbed98
[    0.000000][    T0] esrt: Reserving ESRT space from 0x0000009ff1d18298 to 0x0000009ff1d182f8.
[    0.000000][    T0] ACPI: Early table checksum verification disabled
[    0.000000][    T0] ACPI: RSDP 0x0000009FF5B40000 000024 (v02 ALASKA)
[    0.000000][    T0] ACPI: XSDT 0x0000009FF5B40028 000094 (v01 ALASKA A M I    01072009 AMI  00010013)
[    0.000000][    T0] ACPI: FACP 0x0000009FF5B400C0 000114 (v06 Ampere eMAG     00000003 INTL 20190509)
[    0.000000][    T0] ACPI: DSDT 0x0000009FF5B401D8 00765A (v05 ALASKA A M I    00000001 INTL 20190509)
[    0.000000][    T0] ACPI: FIDT 0x0000009FF5B47838 00009C (v01 ALASKA A M I    01072009 AMI  00010013)
[    0.000000][    T0] ACPI: DBG2 0x0000009FF5B478D8 000061 (v00 Ampere eMAG     00000000 INTL 20190509)
[    0.000000][    T0] ACPI: GTDT 0x0000009FF5B47940 000108 (v02 Ampere eMAG     00000001 INTL 20190509)
[    0.000000][    T0] ACPI: IORT 0x0000009FF5B47A48 000BCC (v00 Ampere eMAG     00000000 INTL 20190509)
[    0.000000][    T0] ACPI: MCFG 0x0000009FF5B48618 0000AC (v01 Ampere eMAG     00000001 INTL 20190509)
[    0.000000][    T0] ACPI: SSDT 0x0000009FF5B486C8 00002D (v02 Ampere eMAG     00000001 INTL 20190509)
[    0.000000][    T0] ACPI: SPMI 0x0000009FF5B486F8 000041 (v05 ALASKA A M I    00000000 AMI. 00000000)
[    0.000000][    T0] ACPI: APIC 0x0000009FF5B48740 000A68 (v04 Ampere eMAG     00000004      01000013)
[    0.000000][    T0] ACPI: PCCT 0x0000009FF5B491A8 0005D0 (v01 Ampere eMAG     00000003      01000013)
[    0.000000][    T0] ACPI: BERT 0x0000009FF5B49778 000030 (v01 Ampere eMAG     00000003 INTL 20190509)
[    0.000000][    T0] ACPI: HEST 0x0000009FF5B497A8 000328 (v01 Ampere eMAG     00000003 INTL 20190509)
[    0.000000][    T0] ACPI: SPCR 0x0000009FF5B49AD0 000050 (v02 A M I  APTIO V  01072009 AMI. 0005000D)
[    0.000000][    T0] ACPI: PPTT 0x0000009FF5B49B20 000CB8 (v01 Ampere eMAG     00000003      01000013)
[    0.000000][    T0] ACPI: SPCR: console: pl011,mmio32,0x12600000,115200
[    0.000000][    T0] earlycon: pl11 at MMIO32 0x0000000012600000 (options '115200')
[    0.000000][    T0] printk: bootconsole [pl11] enabled
[    0.000000][    T0] NUMA: Failed to initialise from firmware
[    0.000000][    T0] NUMA: Faking a node at [mem 0x0000000090000000-0x0000009fffffffff]
[    0.000000][    T0] NUMA: NODE_DATA [mem 0x9ffefbabc0-0x9ffefbffff]
[    0.000000][    T0] Zone ranges:
[    0.000000][    T0]   Normal   [mem 0x0000000090000000-0x0000009fffffffff]
[    0.000000][    T0] Movable zone start for each node
[    0.000000][    T0] Early memory node ranges
[    0.000000][    T0]   node   0: [mem 0x0000000090000000-0x0000000091ffffff]
[    0.000000][    T0]   node   0: [mem 0x0000000092000000-0x00000000928fffff]
[    0.000000][    T0]   node   0: [mem 0x0000000092900000-0x00000000fffbffff]
[    0.000000][    T0]   node   0: [mem 0x00000000fffc0000-0x00000000ffffffff]
[    0.000000][    T0]   node   0: [mem 0x0000000880000000-0x0000000fffffffff]
[    0.000000][    T0]   node   0: [mem 0x0000008800000000-0x0000009ff5aeffff]
[    0.000000][    T0]   node   0: [mem 0x0000009ff5af0000-0x0000009ff5b2ffff]
[    0.000000][    T0]   node   0: [mem 0x0000009ff5b30000-0x0000009ff5baffff]
[    0.000000][    T0]   node   0: [mem 0x0000009ff5bb0000-0x0000009ff7deffff]
[    0.000000][    T0]   node   0: [mem 0x0000009ff7df0000-0x0000009ff7e5ffff]
[    0.000000][    T0]   node   0: [mem 0x0000009ff7e60000-0x0000009ff7ffffff]
[    0.000000][    T0]   node   0: [mem 0x0000009ff8000000-0x0000009fffffffff]
[    0.000000][    T0] Initmem setup node 0 [mem 0x0000000090000000-0x0000009fffffffff]
[    0.000000][    T0] kasan: KernelAddressSanitizer initialized
[    0.000000][    T0] psci: probing for conduit method from ACPI.
[    0.000000][    T0] psci: PSCIv1.0 detected in firmware.
[    0.000000][    T0] psci: Using standard PSCI v0.2 function IDs
[    0.000000][    T0] psci: MIGRATE_INFO_TYPE not supported.
[    0.000000][    T0] psci: SMC Calling Convention v65535.65535
[    0.000000][    T0] ACPI: SRAT not present
[    0.000000][    T0] percpu: Embedded 10 pages/cpu s584592 r8192 d62576 u655360
[    0.000000][    T0] Detected PIPT I-cache on CPU0
[    0.000000][    T0] CPU features: detected: GIC system register CPU interface
[    0.000000][    T0] CPU features: detected: Spectre-v2
[    0.000000][    T0] CPU features: detected: Spectre-v4
[    0.000000][    T0] CPU features: detected: Kernel page table isolation (KPTI)
[    0.000000][    T0] Built 1 zonelists, mobility grouping on.  Total pages: 2091012
[    0.000000][    T0] Policy zone: Normal
[    0.000000][    T0] Kernel command line: BOOT_IMAGE=/vmlinuz-5.13.0-rc3-next-20210526+ root=/dev/mapper/ubuntu--vg-ubuntu--lv ro cma=1024M iommu.passthrough=1 earlycon mminit_loglevel=4
[    0.000000][    T0] Unknown command line parameters: BOOT_IMAGE=/vmlinuz-5.13.0-rc3-next-20210526+ cma=1024M mminit_loglevel=4
[    0.000000][    T0] Dentry cache hash table entries: 8388608 (order: 10, 67108864 bytes, linear)
[    0.000000][    T0] Inode-cache hash table entries: 4194304 (order: 9, 33554432 bytes, linear)
[    0.000000][    T0] mem auto-init: stack:off, heap alloc:on, heap free:off
[    0.000000][    T0] MEMBLOCK configuration:
[    0.000000][    T0]  memory size = 0x0000001ff0000000 reserved size = 0x0000000421e33ae8
[    0.000000][    T0]  memory.cnt  = 0xc
[    0.000000][    T0]  memory[0x0]     [0x0000000090000000-0x0000000091ffffff], 0x0000000002000000 bytes on node 0 flags: 0x0
[    0.000000][    T0]  memory[0x1]     [0x0000000092000000-0x00000000928fffff], 0x0000000000900000 bytes on node 0 flags: 0x4
[    0.000000][    T0]  memory[0x2]     [0x0000000092900000-0x00000000fffbffff], 0x000000006d6c0000 bytes on node 0 flags: 0x0
[    0.000000][    T0]  memory[0x3]     [0x00000000fffc0000-0x00000000ffffffff], 0x0000000000040000 bytes on node 0 flags: 0x4
[    0.000000][    T0]  memory[0x4]     [0x0000000880000000-0x0000000fffffffff], 0x0000000780000000 bytes on node 0 flags: 0x0
[    0.000000][    T0]  memory[0x5]     [0x0000008800000000-0x0000009ff5aeffff], 0x00000017f5af0000 bytes on node 0 flags: 0x0
[    0.000000][    T0]  memory[0x6]     [0x0000009ff5af0000-0x0000009ff5b2ffff], 0x0000000000040000 bytes on node 0 flags: 0x4
[    0.000000][    T0]  memory[0x7]     [0x0000009ff5b30000-0x0000009ff5baffff], 0x0000000000080000 bytes on node 0 flags: 0x0
[    0.000000][    T0]  memory[0x8]     [0x0000009ff5bb0000-0x0000009ff7deffff], 0x0000000002240000 bytes on node 0 flags: 0x4
[    0.000000][    T0]  memory[0x9]     [0x0000009ff7df0000-0x0000009ff7e5ffff], 0x0000000000070000 bytes on node 0 flags: 0x0
[    0.000000][    T0]  memory[0xa]     [0x0000009ff7e60000-0x0000009ff7ffffff], 0x00000000001a0000 bytes on node 0 flags: 0x4
[    0.000000][    T0]  memory[0xb]     [0x0000009ff8000000-0x0000009fffffffff], 0x0000000008000000 bytes on node 0 flags: 0x0
[    0.000000][    T0]  reserved.cnt  = 0x16
[    0.000000][    T0]  reserved[0x0]   [0x000000088b7c0000-0x000000088fffffff], 0x0000000004840000 bytes flags: 0x0
[    0.000000][    T0]  reserved[0x1]   [0x0000009be0000000-0x0000009be07fffff], 0x0000000000800000 bytes flags: 0x0
[    0.000000][    T0]  reserved[0x2]   [0x0000009be0da0000-0x0000009be819ffff], 0x0000000007400000 bytes flags: 0x0
[    0.000000][    T0]  reserved[0x3]   [0x0000009be81c0000-0x0000009f6c800255], 0x0000000384640256 bytes flags: 0x0
[    0.000000][    T0]  reserved[0x4]   [0x0000009f6c810000-0x0000009fe6daffff], 0x000000007a5a0000 bytes flags: 0x0
[    0.000000][    T0]  reserved[0x5]   [0x0000009fe6dbed98-0x0000009fe6dbeda7], 0x0000000000000010 bytes flags: 0x0
[    0.000000][    T0]  reserved[0x6]   [0x0000009fe6dc0000-0x0000009ff1d0ffff], 0x000000000af50000 bytes flags: 0x0
[    0.000000][    T0]  reserved[0x7]   [0x0000009ff1d18298-0x0000009ff1d182f7], 0x0000000000000060 bytes flags: 0x0
[    0.000000][    T0]  reserved[0x8]   [0x0000009ff1d1c600-0x0000009ff1d1c61f], 0x0000000000000020 bytes flags: 0x0
[    0.000000][    T0]  reserved[0x9]   [0x0000009ff1d1c640-0x0000009ff1d1ce47], 0x0000000000000808 bytes flags: 0x0
[    0.000000][    T0]  reserved[0xa]   [0x0000009ff1d1ce80-0x0000009ff1d1d70f], 0x0000000000000890 bytes flags: 0x0
[    0.000000][    T0]  reserved[0xb]   [0x0000009ff1d1d740-0x0000009ff1d1e787], 0x0000000000001048 bytes flags: 0x0
[    0.000000][    T0]  reserved[0xc]   [0x0000009ff1d1e7c0-0x0000009ff1d1f84f], 0x0000000000001090 bytes flags: 0x0
[    0.000000][    T0]  reserved[0xd]   [0x0000009ff1d1f880-0x0000009ff1d1fb1f], 0x00000000000002a0 bytes flags: 0x0
[    0.000000][    T0]  reserved[0xe]   [0x0000009ff1d1fb40-0x0000009ff1d1fcc7], 0x0000000000000188 bytes flags: 0x0
[    0.000000][    T0]  reserved[0xf]   [0x0000009ff1d1fd00-0x0000009ff5aeffff], 0x0000000003dd0300 bytes flags: 0x0
[    0.000000][    T0]  reserved[0x10]  [0x0000009ff5b30000-0x0000009ff5baffff], 0x0000000000080000 bytes flags: 0x0
[    0.000000][    T0]  reserved[0x11]  [0x0000009ff7df0000-0x0000009ff7e5ffff], 0x0000000000070000 bytes flags: 0x0
[    0.000000][    T0]  reserved[0x12]  [0x0000009ff8000000-0x0000009ffefa0007], 0x0000000006fa0008 bytes flags: 0x0
[    0.000000][    T0]  reserved[0x13]  [0x0000009ffefa0040-0x0000009ffefa00d0], 0x0000000000000091 bytes flags: 0x0
[    0.000000][    T0]  reserved[0x14]  [0x0000009ffefa0100-0x0000009ffefa0190], 0x0000000000000091 bytes flags: 0x0
[    0.000000][    T0]  reserved[0x15]  [0x0000009ffefa01c0-0x0000009fffffffff], 0x000000000105fe40 bytes flags: 0x0
[    0.000000][    T0] Memory: 777216K/133955584K available (17984K kernel code, 118722K rwdata, 4416K rodata, 6080K init, 67276K bss, 17379072K reserved, 0K cma-reserved)

> Sorry, I've missed that the BUG is apparently triggered for pfn + i. Can
> you please try this instead:

[  259.216661][ T1417] test_pages_in_a_zone: pfn 8000 is not valid
[  259.226547][ T1417] page:00000000f4aa8c5c is uninitialized and poisoned
[  259.226560][ T1417] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Arm64 crash while reading memory sysfs
  2021-05-27 14:33             ` Qian Cai
@ 2021-05-27 16:22               ` Mike Rapoport
  -1 siblings, 0 replies; 44+ messages in thread
From: Mike Rapoport @ 2021-05-27 16:22 UTC (permalink / raw)
  To: Qian Cai
  Cc: Andrew Morton, David Hildenbrand, Catalin Marinas,
	Anshuman Khandual, Ard Biesheuvel, Linux Memory Management List,
	Will Deacon, Marc Zyngier, Linux Kernel Mailing List, Linux ARM

On Thu, May 27, 2021 at 10:33:13AM -0400, Qian Cai wrote:
> 
> 
> On 5/27/2021 4:56 AM, Mike Rapoport wrote:
> > Let's drop memblock=debug for now and add this instead:
> 
> [    0.000000][    T0] Booting Linux on physical CPU 0x0000000000 [0x503f0002]
> [    0.000000][    T0] Linux version 5.13.0-rc3-next-20210526+ (root@admin5) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #31 SMP Thu May 27 12:32:40 UTC 2021
> [    0.000000][    T0] Inode-cache hash table entries: 4194304 (order: 9, 33554432 bytes, linear)
> [    0.000000][    T0] mem auto-init: stack:off, heap alloc:on, heap free:off
> [    0.000000][    T0] MEMBLOCK configuration:
> [    0.000000][    T0]  memory size = 0x0000001ff0000000 reserved size = 0x0000000421e33ae8
> [    0.000000][    T0]  memory.cnt  = 0xc
> [    0.000000][    T0] Memory: 777216K/133955584K available (17984K kernel code, 118722K rwdata, 4416K rodata, 6080K init, 67276K bss, 17379072K reserved, 0K cma-reserved)

I still cannot understand where most of the memory disappeared to, but
that seems to be an entirely different issue.
 
> > Sorry, I've missed that the BUG is apparently triggered for pfn + i. Can
> > you please try this instead:
> 
> [  259.216661][ T1417] test_pages_in_a_zone: pfn 8000 is not valid
> [  259.226547][ T1417] page:00000000f4aa8c5c is uninitialized and poisoned
> [  259.226560][ T1417] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))

Can you please try Anshuman's patch "arm64/mm: Drop HAVE_ARCH_PFN_VALID":

https://lore.kernel.org/lkml/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com

It seems to me that the check for memblock_is_memory() in arm64's
pfn_valid() is what makes init_unavailable_range() bail out for the
parts of a section that are not actually populated, and then we hit
VM_BUG_ON_PAGE(PagePoisoned(p)) for these pages.

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Arm64 crash while reading memory sysfs
  2021-05-27 16:22               ` Mike Rapoport
@ 2021-05-27 17:00                 ` Qian Cai
  -1 siblings, 0 replies; 44+ messages in thread
From: Qian Cai @ 2021-05-27 17:00 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, David Hildenbrand, Catalin Marinas,
	Anshuman Khandual, Ard Biesheuvel, Linux Memory Management List,
	Will Deacon, Marc Zyngier, Linux Kernel Mailing List, Linux ARM



On 5/27/2021 12:22 PM, Mike Rapoport wrote:
> On Thu, May 27, 2021 at 10:33:13AM -0400, Qian Cai wrote:
>>
>>
>> On 5/27/2021 4:56 AM, Mike Rapoport wrote:
>>> Let's drop memblock=debug for now and add this instead:
>>
>> [    0.000000][    T0] Booting Linux on physical CPU 0x0000000000 [0x503f0002]
>> [    0.000000][    T0] Linux version 5.13.0-rc3-next-20210526+ (root@admin5) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #31 SMP Thu May 27 12:32:40 UTC 2021
>> [    0.000000][    T0] Inode-cache hash table entries: 4194304 (order: 9, 33554432 bytes, linear)
>> [    0.000000][    T0] mem auto-init: stack:off, heap alloc:on, heap free:off
>> [    0.000000][    T0] MEMBLOCK configuration:
>> [    0.000000][    T0]  memory size = 0x0000001ff0000000 reserved size = 0x0000000421e33ae8
>> [    0.000000][    T0]  memory.cnt  = 0xc
>> [    0.000000][    T0] Memory: 777216K/133955584K available (17984K kernel code, 118722K rwdata, 4416K rodata, 6080K init, 67276K bss, 17379072K reserved, 0K cma-reserved)
> 
> I still cannot understand where most of the memory disappeared, but it
> seems entirely different issue.

Interesting, it seems that memory did come back after booting.

# cat /proc/meminfo
MemTotal:       116656448 kB
MemFree:        110464000 kB
MemAvailable:   101919872 kB
Buffers:           16320 kB
Cached:           118912 kB
SwapCached:         3136 kB
Active:            63360 kB
Inactive:         199936 kB
Active(anon):       9792 kB
Inactive(anon):   132480 kB
Active(file):      53568 kB
Inactive(file):    67456 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       8388544 kB
SwapFree:        8344704 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        125056 kB
Mapped:            44992 kB
Shmem:             14784 kB
KReclaimable:      92160 kB
Slab:            4943424 kB
SReclaimable:      92160 kB
SUnreclaim:      4851264 kB
KernelStack:       24832 kB
PageTables:        10240 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    66716736 kB
Committed_AS:     708096 kB
VmallocTotal:   133143461888 kB
VmallocUsed:       49600 kB
VmallocChunk:          0 kB
Percpu:            45056 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:     524288 kB
Hugetlb:               0 kB

>  
>>> Sorry, I've missed that the BUG is apparently triggered for pfn + i. Can
>>> you please try this instead:
>>
>> [  259.216661][ T1417] test_pages_in_a_zone: pfn 8000 is not valid
>> [  259.226547][ T1417] page:00000000f4aa8c5c is uninitialized and poisoned
>> [  259.226560][ T1417] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> 
> Can you please try Anshuman's patch "arm64/mm: Drop HAVE_ARCH_PFN_VALID":
> 
> https://lore.kernel.org/lkml/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com
> 
> It seems to me that the check for memblock_is_memory() in
> arm64::pfn_valid() is what makes init_unavailable_range() to bail out for
> section parts that are not actually populated and then we have
> VM_BUG_ON_PAGE(PagePoisoned(p)) for these pages.

That patch fixed it.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Arm64 crash while reading memory sysfs
  2021-05-27 16:22               ` Mike Rapoport
@ 2021-05-27 17:12                 ` David Hildenbrand
  -1 siblings, 0 replies; 44+ messages in thread
From: David Hildenbrand @ 2021-05-27 17:12 UTC (permalink / raw)
  To: Mike Rapoport, Qian Cai
  Cc: Andrew Morton, Catalin Marinas, Anshuman Khandual,
	Ard Biesheuvel, Linux Memory Management List, Will Deacon,
	Marc Zyngier, Linux Kernel Mailing List, Linux ARM

>> [  259.216661][ T1417] test_pages_in_a_zone: pfn 8000 is not valid
>> [  259.226547][ T1417] page:00000000f4aa8c5c is uninitialized and poisoned
>> [  259.226560][ T1417] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> 
> Can you please try Anshuman's patch "arm64/mm: Drop HAVE_ARCH_PFN_VALID":
> 
> https://lore.kernel.org/lkml/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com
> 
> It seems to me that the check for memblock_is_memory() in
> arm64::pfn_valid() is what makes init_unavailable_range() to bail out for
> section parts that are not actually populated and then we have
> VM_BUG_ON_PAGE(PagePoisoned(p)) for these pages.
> 

Oh, that makes sense to me.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Arm64 crash while reading memory sysfs
  2021-05-27 16:22               ` Mike Rapoport
@ 2021-05-27 17:50                 ` Catalin Marinas
  -1 siblings, 0 replies; 44+ messages in thread
From: Catalin Marinas @ 2021-05-27 17:50 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Qian Cai, Andrew Morton, David Hildenbrand, Anshuman Khandual,
	Ard Biesheuvel, Linux Memory Management List, Will Deacon,
	Marc Zyngier, Linux Kernel Mailing List, Linux ARM

On Thu, May 27, 2021 at 07:22:00PM +0300, Mike Rapoport wrote:
> On Thu, May 27, 2021 at 10:33:13AM -0400, Qian Cai wrote:
> > On 5/27/2021 4:56 AM, Mike Rapoport wrote:
> > > Let's drop memblock=debug for now and add this instead:
> > 
> > [    0.000000][    T0] Booting Linux on physical CPU 0x0000000000 [0x503f0002]
> > [    0.000000][    T0] Linux version 5.13.0-rc3-next-20210526+ (root@admin5) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #31 SMP Thu May 27 12:32:40 UTC 2021
> > [    0.000000][    T0] Inode-cache hash table entries: 4194304 (order: 9, 33554432 bytes, linear)
> > [    0.000000][    T0] mem auto-init: stack:off, heap alloc:on, heap free:off
> > [    0.000000][    T0] MEMBLOCK configuration:
> > [    0.000000][    T0]  memory size = 0x0000001ff0000000 reserved size = 0x0000000421e33ae8
> > [    0.000000][    T0]  memory.cnt  = 0xc
> > [    0.000000][    T0] Memory: 777216K/133955584K available (17984K kernel code, 118722K rwdata, 4416K rodata, 6080K init, 67276K bss, 17379072K reserved, 0K cma-reserved)
> 
> I still cannot understand where most of the memory disappeared, but it
> seems entirely different issue.
>  
> > > Sorry, I've missed that the BUG is apparently triggered for pfn + i. Can
> > > you please try this instead:
> > 
> > [  259.216661][ T1417] test_pages_in_a_zone: pfn 8000 is not valid
> > [  259.226547][ T1417] page:00000000f4aa8c5c is uninitialized and poisoned
> > [  259.226560][ T1417] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> 
> Can you please try Anshuman's patch "arm64/mm: Drop HAVE_ARCH_PFN_VALID":
> 
> https://lore.kernel.org/lkml/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com
> 
> It seems to me that the check for memblock_is_memory() in
> arm64::pfn_valid() is what makes init_unavailable_range() bail out for
> section parts that are not actually populated, and then we hit
> VM_BUG_ON_PAGE(PagePoisoned(p)) for these pages.

I acked Anshuman's patch, I think they all need to go in together.

-- 
Catalin
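
The failure mode described in the quoted analysis can be illustrated with a
small userspace model. This is only a sketch under stated assumptions, not
kernel code: every toy_ name, the 16-pfn "section", the POPULATED_END split and
the poison byte are hypothetical, and the real pfn_valid() and
init_unavailable_range() are considerably more involved. It shows only the
interaction: if the validity check used during hole initialization also
consults memblock, the unpopulated part of a section is skipped and its page
structs stay poisoned for a later walker to trip over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_PFNS       16    /* one toy "section" worth of page frames */
#define POPULATED_END 8     /* pfns [0, 8) are known to the toy memblock */
#define POISON_BYTE   0xff  /* stand-in for the page poison pattern */

struct toy_page { unsigned char flags; };

static struct toy_page toy_memmap[NR_PFNS];

static bool toy_memblock_is_memory(size_t pfn)
{
	return pfn < POPULATED_END;
}

/* Arch-style check: a memory map entry exists AND memblock knows the pfn. */
static bool toy_arch_pfn_valid(size_t pfn)
{
	return pfn < NR_PFNS && toy_memblock_is_memory(pfn);
}

/* Generic-style check (simplified): only asks whether a map entry exists. */
static bool toy_generic_pfn_valid(size_t pfn)
{
	return pfn < NR_PFNS;
}

typedef bool (*toy_pfn_valid_fn)(size_t pfn);

/* Model of boot: poison the map, init populated pages, then init the hole. */
static void toy_boot(toy_pfn_valid_fn pfn_valid)
{
	size_t pfn;

	assert(pfn_valid != NULL);
	memset(toy_memmap, POISON_BYTE, sizeof(toy_memmap));

	/* Pages backed by memory are always initialized. */
	for (pfn = 0; pfn < POPULATED_END; pfn++)
		toy_memmap[pfn].flags = 0;

	/* Hole initialization skips every pfn the check reports as invalid. */
	for (pfn = POPULATED_END; pfn < NR_PFNS; pfn++) {
		if (!pfn_valid(pfn))
			continue;
		toy_memmap[pfn].flags = 0;
	}
}

/* A walker that visits the whole section, as a sysfs reader might. */
static size_t toy_count_poisoned(void)
{
	size_t pfn, poisoned = 0;

	for (pfn = 0; pfn < NR_PFNS; pfn++)
		if (toy_memmap[pfn].flags == POISON_BYTE)
			poisoned++;
	return poisoned;
}
```

Calling toy_boot(toy_arch_pfn_valid) leaves the unpopulated half of the toy
section poisoned, while toy_boot(toy_generic_pfn_valid) initializes every page
struct in it, which is the effect expected from falling back to the generic
check.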

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Arm64 crash while reading memory sysfs
  2021-05-27 17:50                 ` Catalin Marinas
@ 2021-05-27 22:56                   ` Andrew Morton
  -1 siblings, 0 replies; 44+ messages in thread
From: Andrew Morton @ 2021-05-27 22:56 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Mike Rapoport, Qian Cai, David Hildenbrand, Anshuman Khandual,
	Ard Biesheuvel, Linux Memory Management List, Will Deacon,
	Marc Zyngier, Linux Kernel Mailing List, Linux ARM

On Thu, 27 May 2021 18:50:48 +0100 Catalin Marinas <catalin.marinas@arm.com> wrote:

> > Can you please try Anshuman's patch "arm64/mm: Drop HAVE_ARCH_PFN_VALID":
> > 
> > https://lore.kernel.org/lkml/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com
> > 
> > It seems to me that the check for memblock_is_memory() in
> > arm64::pfn_valid() is what makes init_unavailable_range() bail out for
> > section parts that are not actually populated, and then we hit
> > VM_BUG_ON_PAGE(PagePoisoned(p)) for these pages.
> 
> I acked Anshuman's patch, I think they all need to go in together.

That's neat.   Specifically which patches are we referring to here?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Arm64 crash while reading memory sysfs
  2021-05-27 22:56                   ` Andrew Morton
@ 2021-05-28  5:13                     ` Mike Rapoport
  -1 siblings, 0 replies; 44+ messages in thread
From: Mike Rapoport @ 2021-05-28  5:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Catalin Marinas, Qian Cai, David Hildenbrand, Anshuman Khandual,
	Ard Biesheuvel, Linux Memory Management List, Will Deacon,
	Marc Zyngier, Linux Kernel Mailing List, Linux ARM

On Thu, May 27, 2021 at 03:56:44PM -0700, Andrew Morton wrote:
> On Thu, 27 May 2021 18:50:48 +0100 Catalin Marinas <catalin.marinas@arm.com> wrote:
> 
> > > Can you please try Anshuman's patch "arm64/mm: Drop HAVE_ARCH_PFN_VALID":
> > > 
> > > https://lore.kernel.org/lkml/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com
> > > 
> > > It seems to me that the check for memblock_is_memory() in
> > > arm64::pfn_valid() is what makes init_unavailable_range() bail out for
> > > section parts that are not actually populated, and then we hit
> > > VM_BUG_ON_PAGE(PagePoisoned(p)) for these pages.
> > 
> > I acked Anshuman's patch, I think they all need to go in together.
> 
> That's neat.   Specifically which patches are we referring to here?

arm64: drop pfn_valid_within() and simplify pfn_valid():
https://lore.kernel.org/lkml/20210511100550.28178-5-rppt@kernel.org

arm64/mm: Drop HAVE_ARCH_PFN_VALID:
https://lore.kernel.org/lkml/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Arm64 crash while reading memory sysfs
  2021-05-28  5:13                     ` Mike Rapoport
@ 2021-06-08  7:06                       ` Anshuman Khandual
  -1 siblings, 0 replies; 44+ messages in thread
From: Anshuman Khandual @ 2021-06-08  7:06 UTC (permalink / raw)
  To: Mike Rapoport, Andrew Morton
  Cc: Catalin Marinas, Qian Cai, David Hildenbrand, Ard Biesheuvel,
	Linux Memory Management List, Will Deacon, Marc Zyngier,
	Linux Kernel Mailing List, Linux ARM



On 5/28/21 10:43 AM, Mike Rapoport wrote:
> On Thu, May 27, 2021 at 03:56:44PM -0700, Andrew Morton wrote:
>> On Thu, 27 May 2021 18:50:48 +0100 Catalin Marinas <catalin.marinas@arm.com> wrote:
>>
>>>> Can you please try Anshuman's patch "arm64/mm: Drop HAVE_ARCH_PFN_VALID":
>>>>
>>>> https://lore.kernel.org/lkml/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com
>>>>
>>>> It seems to me that the check for memblock_is_memory() in
>>>> arm64::pfn_valid() is what makes init_unavailable_range() bail out for
>>>> section parts that are not actually populated, and then we hit
>>>> VM_BUG_ON_PAGE(PagePoisoned(p)) for these pages.
>>>
>>> I acked Anshuman's patch, I think they all need to go in together.
>>
>> That's neat.   Specifically which patches are we referring to here?
> 
> arm64: drop pfn_valid_within() and simplify pfn_valid():
> https://lore.kernel.org/lkml/20210511100550.28178-5-rppt@kernel.org
> 
> arm64/mm: Drop HAVE_ARCH_PFN_VALID:
> https://lore.kernel.org/lkml/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com

I don't see the above patch (which drops HAVE_ARCH_PFN_VALID on arm64) in linux-next,
i.e. next-20210607. I might have missed some earlier context here, but don't we want
to fall back on the generic pfn_valid() after Mike's series?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Arm64 crash while reading memory sysfs
  2021-06-08  7:06                       ` Anshuman Khandual
@ 2021-06-14  8:25                         ` Mike Rapoport
  -1 siblings, 0 replies; 44+ messages in thread
From: Mike Rapoport @ 2021-06-14  8:25 UTC (permalink / raw)
  To: Anshuman Khandual, Andrew Morton
  Cc: Catalin Marinas, Qian Cai, David Hildenbrand, Ard Biesheuvel,
	Linux Memory Management List, Will Deacon, Marc Zyngier,
	Linux Kernel Mailing List, Linux ARM

On Tue, Jun 08, 2021 at 12:36:21PM +0530, Anshuman Khandual wrote:
> 
> 
> On 5/28/21 10:43 AM, Mike Rapoport wrote:
> > On Thu, May 27, 2021 at 03:56:44PM -0700, Andrew Morton wrote:
> >> On Thu, 27 May 2021 18:50:48 +0100 Catalin Marinas <catalin.marinas@arm.com> wrote:
> >>
> >>>> Can you please try Anshuman's patch "arm64/mm: Drop HAVE_ARCH_PFN_VALID":
> >>>>
> >>>> https://lore.kernel.org/lkml/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com
> >>>>
> >>>> It seems to me that the check for memblock_is_memory() in
> >>>> arm64::pfn_valid() is what makes init_unavailable_range() bail out for
> >>>> section parts that are not actually populated, and then we hit
> >>>> VM_BUG_ON_PAGE(PagePoisoned(p)) for these pages.
> >>>
> >>> I acked Anshuman's patch, I think they all need to go in together.
> >>
> >> That's neat.   Specifically which patches are we referring to here?
> > 
> > arm64: drop pfn_valid_within() and simplify pfn_valid():
> > https://lore.kernel.org/lkml/20210511100550.28178-5-rppt@kernel.org
> > 
> > arm64/mm: Drop HAVE_ARCH_PFN_VALID:
> > https://lore.kernel.org/lkml/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com
> 
> I don't see the above patch (which drops HAVE_ARCH_PFN_VALID on arm64) in linux-next,
> i.e. next-20210607. I might have missed some earlier context here, but don't we want
> to fall back on the generic pfn_valid() after Mike's series?

Andrew,

Can you please pick the two patches above?

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Arm64 crash while reading memory sysfs
  2021-06-14  8:25                         ` Mike Rapoport
@ 2021-06-15  0:13                           ` Andrew Morton
  -1 siblings, 0 replies; 44+ messages in thread
From: Andrew Morton @ 2021-06-15  0:13 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Anshuman Khandual, Catalin Marinas, Qian Cai, David Hildenbrand,
	Ard Biesheuvel, Linux Memory Management List, Will Deacon,
	Marc Zyngier, Linux Kernel Mailing List, Linux ARM

On Mon, 14 Jun 2021 11:25:54 +0300 Mike Rapoport <rppt@linux.ibm.com> wrote:

> On Tue, Jun 08, 2021 at 12:36:21PM +0530, Anshuman Khandual wrote:
> > 
> > 
> > On 5/28/21 10:43 AM, Mike Rapoport wrote:
> > > On Thu, May 27, 2021 at 03:56:44PM -0700, Andrew Morton wrote:
> > >> On Thu, 27 May 2021 18:50:48 +0100 Catalin Marinas <catalin.marinas@arm.com> wrote:
> > >>
> > >>>> Can you please try Anshuman's patch "arm64/mm: Drop HAVE_ARCH_PFN_VALID":
> > >>>>
> > >>>> https://lore.kernel.org/lkml/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com
> > >>>>
> > >>>> It seems to me that the check for memblock_is_memory() in
> > >>>> arm64::pfn_valid() is what makes init_unavailable_range() bail out for
> > >>>> section parts that are not actually populated, and then we hit
> > >>>> VM_BUG_ON_PAGE(PagePoisoned(p)) for these pages.
> > >>>
> > >>> I acked Anshuman's patch, I think they all need to go in together.
> > >>
> > >> That's neat.   Specifically which patches are we referring to here?
> > > 
> > > arm64: drop pfn_valid_within() and simplify pfn_valid():
> > > https://lore.kernel.org/lkml/20210511100550.28178-5-rppt@kernel.org
> > > 
> > > arm64/mm: Drop HAVE_ARCH_PFN_VALID:
> > > https://lore.kernel.org/lkml/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com
> > 
> > I don't see the above patch (which drops HAVE_ARCH_PFN_VALID on arm64) in linux-next,
> > i.e. next-20210607. I might have missed some earlier context here, but don't we want
> > to fall back on the generic pfn_valid() after Mike's series?
> 
> Andrew,
> 
> Can you please pick the two patches above?

I already had

include-linux-mmzoneh-add-documentation-for-pfn_valid.patch
memblock-update-initialization-of-reserved-pages.patch
arm64-decouple-check-whether-pfn-is-in-linear-map-from-pfn_valid.patch
arm64-drop-pfn_valid_within-and-simplify-pfn_valid.patch

and I just added

arm64-mm-drop-have_arch_pfn_valid.patch

so I think we're all good now?

and I don't think any of this is needed in 5.13 or -stable, correct?

I still have question marks over

https://lkml.kernel.org/r/YJ0Fhs5krPJ0FgiV@kernel.org and
https://lkml.kernel.org/r/d55f915c-ad01-e729-1e29-b57d78257cbb@quicinc.com

Is this all OK now?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Arm64 crash while reading memory sysfs
  2021-06-15  0:13                           ` Andrew Morton
@ 2021-06-15  6:05                             ` Mike Rapoport
  -1 siblings, 0 replies; 44+ messages in thread
From: Mike Rapoport @ 2021-06-15  6:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Anshuman Khandual, Catalin Marinas, Qian Cai, David Hildenbrand,
	Ard Biesheuvel, Linux Memory Management List, Will Deacon,
	Marc Zyngier, Linux Kernel Mailing List, Linux ARM

On Mon, Jun 14, 2021 at 05:13:51PM -0700, Andrew Morton wrote:
> On Mon, 14 Jun 2021 11:25:54 +0300 Mike Rapoport <rppt@linux.ibm.com> wrote:
> 
> > On Tue, Jun 08, 2021 at 12:36:21PM +0530, Anshuman Khandual wrote:
> > > 
> > > 
> > > On 5/28/21 10:43 AM, Mike Rapoport wrote:
> > > > On Thu, May 27, 2021 at 03:56:44PM -0700, Andrew Morton wrote:
> > > >> On Thu, 27 May 2021 18:50:48 +0100 Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > >>
> > > >>>> Can you please try Anshuman's patch "arm64/mm: Drop HAVE_ARCH_PFN_VALID":
> > > >>>>
> > > >>>> https://lore.kernel.org/lkml/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com
> > > >>>>
> > > >>>> It seems to me that the check for memblock_is_memory() in
> > > >>>> arm64::pfn_valid() is what makes init_unavailable_range() bail out for
> > > >>>> section parts that are not actually populated, and then we hit
> > > >>>> VM_BUG_ON_PAGE(PagePoisoned(p)) for these pages.
> > > >>>
> > > >>> I acked Anshuman's patch, I think they all need to go in together.
> > > >>
> > > >> That's neat.   Specifically which patches are we referring to here?
> > > > 
> > > > arm64: drop pfn_valid_within() and simplify pfn_valid():
> > > > https://lore.kernel.org/lkml/20210511100550.28178-5-rppt@kernel.org
> > > > 
> > > > arm64/mm: Drop HAVE_ARCH_PFN_VALID:
> > > > https://lore.kernel.org/lkml/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com
> > > 
> > > I don't see the above patch (which drops HAVE_ARCH_PFN_VALID on arm64) in linux-next,
> > > i.e. next-20210607. I might have missed some earlier context here, but don't we want
> > > to fall back on the generic pfn_valid() after Mike's series?
> > 
> > Andrew,
> > 
> > Can you please pick the two patches above?
> 
> I already had
> 
> include-linux-mmzoneh-add-documentation-for-pfn_valid.patch
> memblock-update-initialization-of-reserved-pages.patch
> arm64-decouple-check-whether-pfn-is-in-linear-map-from-pfn_valid.patch
> arm64-drop-pfn_valid_within-and-simplify-pfn_valid.patch
> 
> and I just added
> 
> arm64-mm-drop-have_arch_pfn_valid.patch
> 
> so I think we're all good now?

Yes. 
 
> and I don't think any of this is needed in 5.13 or -stable, correct?

Right.
 
> I still have question marks over
> 
> https://lkml.kernel.org/r/YJ0Fhs5krPJ0FgiV@kernel.org and
> https://lkml.kernel.org/r/d55f915c-ad01-e729-1e29-b57d78257cbb@quicinc.com
> 
> Is this all OK now?

Yes, it is.

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2021-06-15 18:43 UTC | newest]

Thread overview: 44+ messages
2021-05-25 15:25 Arm64 crash while reading memory sysfs Qian Cai (QUIC)
2021-05-25 15:25 ` Qian Cai (QUIC)
2021-05-25 15:37 ` David Hildenbrand
2021-05-25 15:37   ` David Hildenbrand
2021-05-26  6:40 ` Mike Rapoport
2021-05-26  6:40   ` Mike Rapoport
2021-05-26 12:09   ` Qian Cai (QUIC)
2021-05-26 12:09     ` Qian Cai (QUIC)
2021-05-26 13:04     ` Catalin Marinas
2021-05-26 13:04       ` Catalin Marinas
2021-05-26 17:25       ` Mike Rapoport
2021-05-26 17:25         ` Mike Rapoport
2021-05-26 17:24     ` Mike Rapoport
2021-05-26 17:24       ` Mike Rapoport
2021-05-27  0:16       ` Qian Cai
2021-05-27  0:16         ` Qian Cai
2021-05-27  0:31         ` Andrew Morton
2021-05-27  0:31           ` Andrew Morton
2021-05-27  7:25           ` Stephen Rothwell
2021-05-27  7:25             ` Stephen Rothwell
2021-05-27  8:56         ` Mike Rapoport
2021-05-27  8:56           ` Mike Rapoport
2021-05-27 14:33           ` Qian Cai
2021-05-27 14:33             ` Qian Cai
2021-05-27 16:22             ` Mike Rapoport
2021-05-27 16:22               ` Mike Rapoport
2021-05-27 17:00               ` Qian Cai
2021-05-27 17:00                 ` Qian Cai
2021-05-27 17:12               ` David Hildenbrand
2021-05-27 17:12                 ` David Hildenbrand
2021-05-27 17:50               ` Catalin Marinas
2021-05-27 17:50                 ` Catalin Marinas
2021-05-27 22:56                 ` Andrew Morton
2021-05-27 22:56                   ` Andrew Morton
2021-05-28  5:13                   ` Mike Rapoport
2021-05-28  5:13                     ` Mike Rapoport
2021-06-08  7:06                     ` Anshuman Khandual
2021-06-08  7:06                       ` Anshuman Khandual
2021-06-14  8:25                       ` Mike Rapoport
2021-06-14  8:25                         ` Mike Rapoport
2021-06-15  0:13                         ` Andrew Morton
2021-06-15  0:13                           ` Andrew Morton
2021-06-15  6:05                           ` Mike Rapoport
2021-06-15  6:05                             ` Mike Rapoport
